Data Mining Notes, Part 6: Recommender Systems

Posted by 是坠坠啊

Preface

  • These notes cover the key points of the recommender systems part of the Data Mining course, as a fairly concise summary.
  • Textbook: Tan, Pang-Ning, et al. Introduction to Data Mining, 2018.

Basics of Recommendation Systems

Formal Model

U: the set of users

I: the set of items

U×I→R: the utility function, which maps each user-item pair to a rating in R (the set of ratings)

– Explicit: ask people to rate items (e.g., 1-5 stars).

– Implicit: learn from user behavior (e.g., click or not).

Goal:

– To predict the rating a user would give to an item.

Key Steps of Recommendation

1. Gather known ratings

2. Extrapolate unknown ratings from the known ones

  • Mainly interested in high unknown ratings

3. Evaluate extrapolation methods
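
Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration: the rating data is made up, and the predictor (the average known rating of each item) is a deliberately naive stand-in for the methods discussed later. All names are hypothetical.

```python
from collections import defaultdict

# Step 1: known ratings, stored sparsely as {(user, item): rating}.
# The data below is invented for illustration.
ratings = {
    ("alice", "A"): 5, ("alice", "B"): 3,
    ("bob", "A"): 4, ("bob", "C"): 2,
    ("carol", "B"): 4, ("carol", "C"): 1, ("carol", "D"): 5,
}

def item_means(ratings):
    """Average known rating per item (a naive placeholder predictor)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for (_, item), r in ratings.items():
        sums[item] += r
        counts[item] += 1
    return {item: sums[item] / counts[item] for item in sums}

def recommend(user, ratings, top_n=2):
    """Step 2: predict the user's unknown ratings and keep the highest ones."""
    means = item_means(ratings)
    unseen = [i for i in means if (user, i) not in ratings]
    # Sort by predicted rating (descending), then by item name for stable ties.
    ranked = sorted(unseen, key=lambda i: (-means[i], i))
    return [(i, means[i]) for i in ranked[:top_n]]

print(recommend("alice", ratings))  # [('D', 5.0), ('C', 1.5)]
```

Only the items the user has not rated are scored, and only the top of the ranking is returned, which reflects the point that high unknown ratings are what matter.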

Mainstream Methods

• Content-based recommendation

• Collaborative-based recommendation

  • Memory-based methods

    • User-oriented methods

    • Item-oriented methods

  • Model-based methods

    • Latent-factor-based methods

• Hybrid recommendation

Evaluation

Offline: split the data into a training set and a test set

User Survey

Online
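
For offline evaluation, the split can be as simple as holding out a random fraction of the known ratings. The sketch below assumes ratings are stored as a `{(user, item): rating}` dictionary; the function name, the 20% default, and the seed are illustrative choices, not something prescribed by the course.

```python
import random

def train_test_split(ratings, test_fraction=0.2, seed=0):
    """Randomly hold out a fraction of the known ratings for testing.

    ratings: {(user, item): rating}
    Returns (train, test) dictionaries with disjoint keys.
    """
    rows = sorted(ratings.items())      # fixed order so the seed fully decides the split
    rng = random.Random(seed)
    rng.shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    test = dict(rows[:n_test])
    train = dict(rows[n_test:])
    return train, test
```

A model is then fitted on `train` only, and its predictions are compared with the held-out ratings in `test`.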

Evaluation Metrics

  • Compare predictions with known ratings (metrics similar to those used for regression problems)

    • Root-mean-square error (RMSE)

    • Precision at Top 10:

      • % of those in Top 10 (how many of the 10 predictions are hits)

    • Rank Correlation:

      • Spearman’s correlation between system’s and user’s complete rankings

  • Another approach: 0/1 model (click or not), with metrics similar to those used for classification problems

    • Coverage: can usually reach 100%

      • Number of items/users for which system can make predictions

    • Precision:

      • Accuracy of predictions (precision & recall)

    • Receiver Operating Characteristic (ROC)

      • Tradeoff curve between false positives and false negatives

  • More Thoughts on Evaluation

    • Narrow focus on accuracy sometimes misses the point

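      • Prediction Coverage: niche items should be recommended too, which benefits the platform's growth

      • Prediction Diversity: depends on users' attitudes toward new things; target users with a strong motivation to try something new

      • Prediction Novelty

      • Real-time

      • Robustness: the system must remain stable with massive amounts of data

    • In practice, we care only to predict high ratings:

      • RMSE might penalize a method that does well for high ratings and badly for others (RMSE does not perform well on large-scale data or when distances are large)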
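
The metrics listed above are short enough to write out directly. The following is a minimal sketch using only the standard library; the function names and argument conventions are my own, and the Spearman formula used here assumes two complete rankings of the same items with no ties.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length lists of ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommended items that the user actually liked.
    Always divides by k, even if fewer than k items were recommended."""
    return sum(1 for item in recommended[:k] if item in relevant) / k

def spearman(rank_a, rank_b):
    """Spearman's rho for two complete rankings (no ties, at least 2 items).
    rank_a, rank_b: {item: rank position}, where 1 is the best rank."""
    n = len(rank_a)
    d2 = sum((rank_a[i] - rank_b[i]) ** 2 for i in rank_a)
    return 1 - 6 * d2 / (n * (n * n - 1))

def precision_recall(predicted_pos, actual_pos):
    """Precision and recall for the 0/1 model; both arguments are sets of items."""
    hits = len(predicted_pos & actual_pos)
    precision = hits / len(predicted_pos) if predicted_pos else 0.0
    recall = hits / len(actual_pos) if actual_pos else 0.0
    return precision, recall

print(rmse([1, 2], [3, 4]))                                      # 2.0
print(precision_at_k(["a", "b", "c", "d"], {"a", "c"}, k=4))     # 0.5
print(spearman({"a": 1, "b": 2, "c": 3}, {"a": 3, "b": 2, "c": 1}))  # -1.0
```

RMSE weights every rating equally, while `precision_at_k` looks only at the head of the list, which is why the two can disagree about which method is better.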

Representative Recommendation Techniques

Content-based Recommendation

Main idea:

  • Recommend items to customer i similar to previous items rated highly by i

Item & User Profile

  • For each item, create an item profile

  • Profile is a set (vector) of features

    • Movie: author, title, actor, director,…

    • Text: the set of “important” words in document

  • User profile possibilities:

    • Weighted average of rated item profiles

    • Variation: weight by difference from average rating for item
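
As a rough sketch of the two user-profile options, the function below averages the profiles of the items a user rated, weighting each by the rating or, in the variation, by the rating minus the item's average rating. The feature vectors, the `item_avg` argument, and the choice to divide by the number of rated items are assumptions made for illustration.

```python
def user_profile(rated, item_profiles, item_avg=None):
    """Build a user profile from the profiles of the items the user rated.

    rated:         {item: rating} for one user
    item_profiles: {item: feature vector (list of numbers)}
    item_avg:      optional {item: average rating}; if given, each item is
                   weighted by (rating - item average) instead of the raw rating
    """
    n = len(rated)
    dim = len(next(iter(item_profiles.values())))
    profile = [0.0] * dim
    for item, rating in rated.items():
        weight = rating - item_avg[item] if item_avg else rating
        for k, x in enumerate(item_profiles[item]):
            profile[k] += weight * x
    # Divide by the number of rated items (one simple normalisation choice).
    return [v / n for v in profile]

# Hypothetical data: two binary features per movie (e.g. two actors).
item_profiles = {"m1": [1, 0], "m2": [0, 1], "m3": [1, 1]}
rated = {"m1": 5, "m2": 1, "m3": 3}
print(user_profile(rated, item_profiles))
print(user_profile(rated, item_profiles, item_avg={"m1": 3, "m2": 3, "m3": 3}))
```

With the variation, features of items rated below average receive negative weight, so the profile expresses dislikes as well as likes.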

Prediction

Cosine similarity between the user profile and the item profile
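
A small sketch of this prediction step: score each item by the cosine of the angle between the user profile vector and the item profile vector, then rank items by that score. The vectors and function names are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_items(user_vec, item_profiles):
    """Return item names ordered from most to least similar to the user profile."""
    scores = {item: cosine(user_vec, vec) for item, vec in item_profiles.items()}
    return sorted(scores, key=lambda i: (-scores[i], i))

# Hypothetical profiles with two features each.
print(rank_items([1, 0], {"a": [0, 1], "b": [1, 0], "c": [1, 1]}))  # ['b', 'c', 'a']
```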

Advantages & Disadvantages

Collaborative-based Recommendation (Collaborative Filtering)

Basic Assumption

  • Those who agreed in the past tend to agree again in the future

Two Mainstreams

1. Memory-based Methods

Refinement: incorporate fixed effects (biases)
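
A user-oriented memory-based method can be sketched as follows. This is one common neighborhood formulation and is an assumption on my part, not necessarily the exact formula from the course: similarity is the Pearson correlation over co-rated items, and the fixed effect is handled in a simple way by predicting deviations from each user's mean rating. The parameter `k` and all names are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pearson(ra, rb):
    """Pearson correlation over items both users rated; 0.0 if undefined."""
    common = [i for i in ra if i in rb]
    if len(common) < 2:
        return 0.0
    ma = mean([ra[i] for i in common])
    mb = mean([rb[i] for i in common])
    num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
    den = (math.sqrt(sum((ra[i] - ma) ** 2 for i in common))
           * math.sqrt(sum((rb[i] - mb) ** 2 for i in common)))
    return num / den if den else 0.0

def predict(user, item, ratings, k=2):
    """Predict a rating from the k most similar users who rated the item.

    ratings: {user: {item: rating}}
    """
    base = mean(list(ratings[user].values()))       # the user's own mean rating
    sims = [(pearson(ratings[user], r), v) for v, r in ratings.items()
            if v != user and item in r]
    # Keep only positively correlated neighbours, most similar first.
    neighbours = sorted((s for s in sims if s[0] > 0), reverse=True)[:k]
    if not neighbours:
        return base                                 # fall back to the user's mean
    num = sum(s * (ratings[v][item] - mean(list(ratings[v].values())))
              for s, v in neighbours)
    den = sum(s for s, _ in neighbours)
    return base + num / den
```

The prediction is not clipped to the rating scale, so a caller may want to clamp it (for example to 1-5). An item-oriented version follows the same pattern with the roles of users and items swapped.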

2. Model-based Methods

  • Cluster users, then recommend items liked by the users in the cluster closest to the active user.

  • Mine association rules and then use the rules to recommend items.

  • Learn a latent factor model from the data and then use the discovered factors to find items with high expected ratings.

  • Optimization problem
  • Overfitting: add a regularization term
  • Stochastic Gradient Descent
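  • GD vs. SGD
    • GD: fewer iterations, each step is slow, the objective decreases smoothly

    • SGD: more iterations, each step is fast, the objective decreases with fluctuations

    • Overall, the resulting performance is about the same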

  • Extending LFM to Include Biases
    • Account for fixed effects (biases)

    • Account for temporal effects
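
The points above (optimization problem, regularization, SGD, biases) can be put together in a short sketch. It assumes the commonly used formulation in which a rating is predicted as global mean + user bias + item bias + dot product of latent factors, and the regularized squared error is minimized by SGD. The learning rate, regularization weight, number of factors, and number of epochs are arbitrary illustrative values, and temporal effects are not modeled.

```python
import math
import random

def train_lfm(ratings, n_factors=2, lr=0.01, reg=0.05, epochs=200, seed=0):
    """Fit a biased latent factor model with SGD.

    ratings: list of (user, item, rating) triples
    Returns predict(user, item), valid for users and items seen in training.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)       # global mean
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    b_u = {u: 0.0 for u in users}                           # user biases
    b_i = {i: 0.0 for i in items}                           # item biases
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    def predict(u, i):
        return mu + b_u[u] + b_i[i] + sum(a * b for a, b in zip(p[u], q[i]))

    samples = list(ratings)
    for _ in range(epochs):
        rng.shuffle(samples)                # SGD: one rating at a time, random order
        for u, i, r in samples:
            err = r - predict(u, i)
            # Gradient step on the regularized squared error for this one rating.
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            for k in range(n_factors):
                pu, qi = p[u][k], q[i][k]
                p[u][k] += lr * (err * qi - reg * pu)
                q[i][k] += lr * (err * pu - reg * qi)
    return predict

def rmse(predict, ratings):
    """RMSE of a predict(user, item) function on (user, item, rating) triples."""
    return math.sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings))
```

Full gradient descent would instead sum the gradient over all known ratings before each update, which is why each of its steps is slower but smoother. Setting `reg=0` removes the regularization term and makes overfitting on small data more likely.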

Extended Applications

Social Recommendation

Jointly factorize the rating matrix and social matrix
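
One way to write such a joint factorization is given below as an assumption, not as the formula from the course. The user factors $p_u$ are shared between the rating matrix $R$ (entries $r_{ui}$, item factors $q_i$) and the social matrix $S$ (entries $s_{uv}$, auxiliary factors $z_v$); $\alpha$ balances the two terms and $\lambda$ is the regularization weight. All of these symbols are introduced here for illustration.

```latex
\min_{P,Q,Z}\;
\sum_{(u,i)\in R}\bigl(r_{ui}-p_u^{\top}q_i\bigr)^2
+\alpha\sum_{(u,v)\in S}\bigl(s_{uv}-p_u^{\top}z_v\bigr)^2
+\lambda\bigl(\lVert P\rVert_F^2+\lVert Q\rVert_F^2+\lVert Z\rVert_F^2\bigr)
```

Because $p_u$ appears in both sums, a user's social connections influence the factors used to predict her ratings.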

A missing rating for a user is predicted as a linear combination of ratings of the user and her social relations
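
A minimal sketch of this idea, under stated assumptions: the user's own estimate (for example from a latent factor model) is blended with a trust-weighted average of her friends' ratings for the same item. The blending weight `alpha`, the trust weights, and all names are hypothetical.

```python
def social_predict(user, item, own_estimate, ratings, trust, alpha=0.5):
    """Linear combination of the user's own estimate and her friends' ratings.

    own_estimate: any estimate of the user's rating for the item
    ratings:      {user: {item: rating}}
    trust:        {user: {friend: non-negative trust weight}}
    alpha:        weight placed on the user's own estimate
    """
    friends = [(w, ratings[f][item]) for f, w in trust.get(user, {}).items()
               if item in ratings.get(f, {})]
    total = sum(w for w, _ in friends)
    if total == 0:
        return own_estimate               # no trusted friend rated the item
    social = sum(w * r for w, r in friends) / total
    return alpha * own_estimate + (1 - alpha) * social

ratings = {"bob": {"X": 4}, "carol": {"X": 2}}
trust = {"alice": {"bob": 3, "carol": 1}}
print(social_predict("alice", "X", 3.0, ratings, trust))  # 3.25
```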
