Data Mining Notes, Part 6: Recommender Systems
Posted by 是坠坠啊
Preface
This article is a relatively brief summary of the key points from the recommender systems part of the Data Mining course.
Textbook: Tan, Pang-Ning, et al. Introduction to Data Mining, 2018.
Basics of Recommendation Systems
Formal Model
U: the set of users
I: the set of items
U×I→R: the utility function mapping each (user, item) pair to R, the set of ratings
– Explicit: ask people to rate items (e.g. 1-5 stars)
– Implicit: learn from user behavior (e.g. click or not)
Goal:
– To predict the rating a user would give to an item.
Key Steps of Recommendation
1. Gather known ratings
2. Extrapolate unknown ratings from the known ones
Mainly interested in high unknown ratings
3. Evaluate extrapolation methods
Mainstream Methods
• Content-based recommendation
• Collaborative-based recommendation
Memory-based methods
User-oriented methods
Item-oriented methods
Model-based methods
Latent-factor-based methods
• Hybrid recommendation
Evaluation
Offline: split the data into a training set and a test set
User Survey
Online
Evaluation Metrics
Compare predictions with known ratings (metrics similar to those used for regression problems)
Root-mean-square error (RMSE)
Precision at Top 10:
% of the Top 10 recommendations that are hits (how many of the 10 were predicted correctly)
Rank Correlation:
Spearman’s correlation between system’s and user’s complete rankings
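The three rating-based metrics above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names and the toy data are made up for this example, and the Spearman formula used here assumes there are no tied scores.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating lists."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def precision_at_k(ranked_items, relevant_items, k=10):
    """Fraction of the top-k recommended items that are actually relevant."""
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / k

def spearman(system_scores, user_scores):
    """Spearman rank correlation (simple formula, assumes no ties)."""
    def ranks(scores):
        order = sorted(range(len(scores)), key=lambda idx: scores[idx])
        result = [0] * len(scores)
        for rank, idx in enumerate(order, start=1):
            result[idx] = rank
        return result
    n = len(system_scores)
    d2 = sum((a - b) ** 2
             for a, b in zip(ranks(system_scores), ranks(user_scores)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Toy data (hypothetical): four predicted vs. known ratings.
predicted = [4.0, 3.0, 5.0, 2.0]
actual = [5.0, 3.0, 4.0, 1.0]

error = rmse(predicted, actual)
p_at_4 = precision_at_k(["a", "b", "c", "d"], {"a", "c", "x"}, k=4)
rho = spearman(predicted, actual)
print(error, p_at_4, rho)
```

RMSE looks at every prediction, while Precision at Top k and rank correlation only care about ordering, so a method can score well on one and poorly on another.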
Another approach: 0/1 model (click or not) (metrics similar to those used for classification problems)
Coverage (can usually reach 100%):
Number of items/users for which system can make predictions
Precision:
Accuracy of predictions (precision & recall)
Receiver Operating Characteristic (ROC)
Tradeoff curve between false positives and false negatives
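For the 0/1 (click or not) setting, the metrics above might be computed as in the sketch below. All names and the toy scores are assumptions for illustration, and the ROC helper assumes the data contains at least one click and at least one non-click.

```python
def coverage(predictable, catalog):
    """Share of the catalog for which the system can produce a prediction."""
    catalog = set(catalog)
    return len(catalog & set(predictable)) / len(catalog)

def precision_recall(predicted, actual):
    """Precision and recall for 0/1 predictions against 0/1 ground truth."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def roc_points(scores, actual, thresholds):
    """(false positive rate, true positive rate) for each score threshold."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for s, a in zip(scores, actual) if s >= t and a == 1)
        fp = sum(1 for s, a in zip(scores, actual) if s >= t and a == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Toy data (hypothetical): predicted click scores and observed clicks.
scores = [0.9, 0.8, 0.4, 0.2]
actual = [1, 0, 1, 0]
predicted = [1 if s >= 0.5 else 0 for s in scores]

cov = coverage(["a", "b"], ["a", "b", "c", "d"])
prec, rec = precision_recall(predicted, actual)
points = roc_points(scores, actual, thresholds=[0.5, 0.3])
print(cov, prec, rec, points)
```

Lowering the threshold moves along the ROC curve: more true positives are caught, usually at the cost of more false positives.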
More Thoughts on Evaluation
Narrow focus on accuracy sometimes misses the point
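Prediction Coverage: niche items should be recommended too, which benefits the platform's growth
Prediction Diversity: depends on users' attitude toward new things; target users with a strong motivation to try something new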
Prediction Novelty
Real-time
Robustness: the system must remain stable on massive data
In practice, we care only to predict high ratings:
RMSE might penalize a method that does well for high ratings and badly for others (RMSE performs poorly on large-scale data or when errors are large)
Representative Recommendation Techniques
Content-based Recommendation
Main idea:
Recommend items to customer i similar to previous items rated highly by i
Item & User Profile
For each item, create an item profile
Profile is a set (vector) of features
Movie: author, title, actor, director,…
Text: the set of “important” words in document
User profile possibilities:
Weighted average of rated item profiles
Variation: weight by difference from average rating for item
Prediction
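The prediction formula is not preserved in these notes. A common heuristic, assumed here, is to score an unseen item by the cosine similarity between the user profile and the item profile. The sketch below builds the user profile as a rating-weighted average of item profiles; the feature vectors, item names, and function names are all hypothetical.

```python
import math

def user_profile(item_profiles, ratings):
    """Rating-weighted average of the profiles of the items the user rated."""
    dim = len(next(iter(item_profiles.values())))
    total = sum(ratings.values())
    profile = [0.0] * dim
    for item, rating in ratings.items():
        for d in range(dim):
            profile[d] += rating * item_profiles[item][d]
    return [value / total for value in profile]

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical binary item features, e.g. [actor A, director B, genre C].
item_profiles = {
    "m1": [1, 0, 1],
    "m2": [1, 1, 0],
    "m3": [0, 1, 0],
    "m4": [1, 0, 0],
}
ratings = {"m1": 5, "m2": 1}  # items this user has already rated

profile = user_profile(item_profiles, ratings)
scores = {item: cosine(profile, vec)
          for item, vec in item_profiles.items() if item not in ratings}
best = max(scores, key=scores.get)
print(profile, scores, best)
```

Because the user rated m1 highly and m2 poorly, the profile leans toward m1's features, so the unseen item sharing more of those features gets the higher score.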
Advantages & Disadvantages
Collaborative-based Recommendation (Collaborative Filtering)
Basic Assumption
Those who agreed in the past tend to agree again in the future
Two Mainstreams
1. Memory-based Methods
2. Model-based Methods
Cluster users and then recommend items the users in the cluster closest to the active user like.
Mine association rules and then use the rules to recommend items.
Learn a latent factor model from the data and then use the discovered factors to find items with high expected ratings.
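For contrast with the model-based variants just listed, a memory-based, user-oriented method can be sketched directly on the raw ratings. The version below is one simple possibility under stated assumptions: cosine similarity over co-rated items, then a similarity-weighted average of other users' ratings. The user names, items, and ratings are invented for illustration.

```python
import math

# Hypothetical user -> {item: rating} data.
ratings = {
    "alice": {"i1": 5, "i2": 3, "i3": 4},
    "bob":   {"i1": 4, "i2": 2, "i3": 5, "i4": 4},
    "carol": {"i1": 1, "i2": 5, "i4": 2},
}

def cosine_sim(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    norm_u = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine_sim(user, other)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None

prediction = predict("alice", "i4")
print(prediction)
```

Here alice agrees more with bob than with carol on the items they share, so the predicted rating for i4 is pulled toward bob's rating. This is the basic assumption of collaborative filtering in action.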
Optimization problem
Overfitting: add a regularization term
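The formula itself is not preserved in these notes. The standard regularized objective for a latent factor model is usually written as follows, where the symbols are assumptions: $r_{ui}$ is a known rating, $K$ the set of known (user, item) pairs, $p_u$ and $q_i$ the user and item factor vectors, and $\lambda$ the regularization strength.

```latex
\min_{P,\,Q} \sum_{(u,i) \in K} \left( r_{ui} - q_i^{\top} p_u \right)^2
  + \lambda \left( \sum_{u} \lVert p_u \rVert^2 + \sum_{i} \lVert q_i \rVert^2 \right)
```

The first term fits the known ratings; the second penalizes large factor values, which discourages fitting noise in sparsely rated users and items.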
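Stochastic Gradient Descent
GD vs. SGD
GD: fewer iterations, each step is slow, smooth descent
SGD: more iterations, each step is fast, fluctuating descent
Overall, the two end up with similar performance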
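A minimal SGD sketch for a regularized latent factor model is shown below. It is one possible implementation, not the course's: the hyperparameters (`k`, `lr`, `reg`, `epochs`), the initialization range, and the toy ratings are all assumptions chosen for illustration.

```python
import math
import random

def train_lfm(ratings, n_users, n_items, k=2, lr=0.02, reg=0.05,
              epochs=500, seed=0):
    """Fit user factors P and item factors Q by SGD on squared error + L2."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    samples = list(ratings)
    for _ in range(epochs):
        rng.shuffle(samples)  # visit ratings in random order each epoch
        for u, i, r in samples:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # One gradient step per rating: this is the "stochastic" part.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def rmse(ratings, P, Q):
    """Root-mean-square error of the factor model on the given ratings."""
    squared = [(r - sum(a * b for a, b in zip(P[u], Q[i]))) ** 2
               for u, i, r in ratings]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical (user index, item index, rating) triples.
ratings = [(0, 0, 5), (0, 1, 3), (0, 2, 4),
           (1, 0, 4), (1, 2, 5), (1, 3, 4),
           (2, 0, 1), (2, 1, 5), (2, 3, 2)]

P0, Q0 = train_lfm(ratings, n_users=3, n_items=4, epochs=0)  # untrained
P, Q = train_lfm(ratings, n_users=3, n_items=4)
before, after = rmse(ratings, P0, Q0), rmse(ratings, P, Q)
print(before, after)
```

Full-batch GD would instead sum the gradient over all ratings before each update, which gives a smoother but more expensive step.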
Extending LFM to Include Biases
Account for fixed effects
Account for temporal effects
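The fixed-effect part is commonly modeled as a baseline of the form global mean + user bias + item bias, to which the latent factor term is then added. The sketch below estimates those biases with simple averages, which is an assumption made for brevity (in practice they are usually learned jointly with the factors); the temporal extension, where biases vary with time, is not shown. All names and data are hypothetical.

```python
def fit_biases(ratings):
    """Estimate global mean, user biases, and item biases by averaging."""
    mu = sum(r for _, _, r in ratings) / len(ratings)

    user_dev = {}
    for u, _, r in ratings:
        user_dev.setdefault(u, []).append(r - mu)
    b_user = {u: sum(d) / len(d) for u, d in user_dev.items()}

    # Item bias: what is left after removing the mean and the user bias.
    item_dev = {}
    for u, i, r in ratings:
        item_dev.setdefault(i, []).append(r - mu - b_user[u])
    b_item = {i: sum(d) / len(d) for i, d in item_dev.items()}
    return mu, b_user, b_item

def baseline(mu, b_user, b_item, u, i):
    """Baseline prediction; unseen users or items fall back to zero bias."""
    return mu + b_user.get(u, 0.0) + b_item.get(i, 0.0)

# Hypothetical (user, item, rating) triples.
ratings = [("u1", "a", 5), ("u1", "b", 3), ("u2", "a", 4), ("u2", "b", 2)]

mu, b_user, b_item = fit_biases(ratings)
print(mu, b_user, b_item)
print(baseline(mu, b_user, b_item, "u1", "a"))
```

In this toy data u1 rates higher than average and item a is rated higher than item b, so both biases push the baseline for (u1, a) above the global mean.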
Extended Applications
Social Recommendation
Jointly factorize the rating matrix and social matrix
A missing rating for a user is predicted as a linear combination of ratings of the user and her social relations