ML之CF：基于MovieLens电影评分数据集利用基于用户协同过滤算法(余弦相似度)实现对用户进行Top5电影推荐案例

Posted 2023-03-23 一个处女座的程序猿

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了ML之CF：基于MovieLens电影评分数据集利用基于用户协同过滤算法(余弦相似度)实现对用户进行Top5电影推荐案例相关的知识，希望对你有一定的参考价值。

基于MovieLens电影评分数据集利用基于用户协同过滤算法(余弦相似度)实现对用户进行Top5电影推荐案例

# 1、定义数据集

# 3、模型训练与推理

# 3.1、切分数据集：将数据集分为训练集和测试集

# 3.2、文本数据集再处理

# 构建用户-电影评分矩阵

# 3.3、计算用户之间的相似度：余弦相似度

# 3.4、模型评估：计算准确率和召回率

# 3.5、模型推理：为用户1推荐电影

基于MovieLens电影评分数据集利用基于用户协同过滤算法(余弦相似度)实现对用户进行Top5电影推荐案例

# 1、定义数据集

userId	movieId	rating	timestamp
1	1	4	964982703
1	3	4	964981247
1	6	4	964982224
1	47	5	964983815
1	50	5	964982931
1	70	3	964982400
1	101	5	964980868
1	110	4	964982176
1	151	5	964984041
1	157	5	964984100

movieId	title	genres
1	Toy Story (1995)	Adventure\|Animation\|Children\|Comedy\|Fantasy
2	Jumanji (1995)	Adventure\|Children\|Fantasy
3	Grumpier Old Men (1995)	Comedy\|Romance
4	Waiting to Exhale (1995)	Comedy\|Drama\|Romance
5	Father of the Bride Part II (1995)	Comedy
6	Heat (1995)	Action\|Crime\|Thriller
7	Sabrina (1995)	Comedy\|Romance
8	Tom and Huck (1995)	Adventure\|Children
9	Sudden Death (1995)	Action
10	GoldenEye (1995)	Action\|Adventure\|Thriller
11	American President, The (1995)	Comedy\|Drama\|Romance

        userId  movieId  rating   timestamp
0            1        1     4.0   964982703
1            1        3     4.0   964981247
2            1        6     4.0   964982224
3            1       47     5.0   964983815
4            1       50     5.0   964982931
...        ...      ...     ...         ...
100831     610   166534     4.0  1493848402
100832     610   168248     5.0  1493850091
100833     610   168250     5.0  1494273047
100834     610   168252     5.0  1493846352
100835     610   170875     3.0  1493846415

[100836 rows x 4 columns]

# 3、模型训练与推理

# 3.1、切分数据集：将数据集分为训练集和测试集

# 3.2、文本数据集再处理

# 构建用户-电影评分矩阵

train_matrix 
 movieId  1       2       3       4       ...  193583  193585  193587  193609
userId                                   ...                                
1           4.0     NaN     4.0     NaN  ...     NaN     NaN     NaN     NaN
2           NaN     NaN     NaN     NaN  ...     NaN     NaN     NaN     NaN
3           NaN     NaN     NaN     NaN  ...     NaN     NaN     NaN     NaN
4           NaN     NaN     NaN     NaN  ...     NaN     NaN     NaN     NaN
5           NaN     NaN     NaN     NaN  ...     NaN     NaN     NaN     NaN
...         ...     ...     ...     ...  ...     ...     ...     ...     ...
606         2.5     NaN     NaN     NaN  ...     NaN     NaN     NaN     NaN
607         4.0     NaN     NaN     NaN  ...     NaN     NaN     NaN     NaN
608         2.5     2.0     NaN     NaN  ...     NaN     NaN     NaN     NaN
609         3.0     NaN     NaN     NaN  ...     NaN     NaN     NaN     NaN
610         NaN     NaN     NaN     NaN  ...     NaN     NaN     NaN     NaN

[610 rows x 8975 columns]

# 3.3、计算用户之间的相似度：余弦相似度

user_similarity 
 userId 1   2   3   4   5   6   7   8   9    ... 602 603 604 605 606 607 608 609 610
userId                                      ...                                    
1        1   1   1   1   1   1   1   1   1  ...   1   1   1   1   1   1   1   1   1
2        1   1   1   1   1   1   1   1   1  ...   1   1   1   1   1   1   1   1   1
3        1   1   1   1   1   1   1   1   1  ...   1   1   1   1   1   1   1   1   1
4        1   1   1   1   1   1   1   1   1  ...   1   1   1   1   1   1   1   1   1
5        1   1   1   1   1   1   1   1   1  ...   1   1   1   1   1   1   1   1   1
...     ..  ..  ..  ..  ..  ..  ..  ..  ..  ...  ..  ..  ..  ..  ..  ..  ..  ..  ..
606      1   1   1   1   1   1   1   1   1  ...   1   1   1   1   1   1   1   1   1
607      1   1   1   1   1   1   1   1   1  ...   1   1   1   1   1   1   1   1   1
608      1   1   1   1   1   1   1   1   1  ...   1   1   1   1   1   1   1   1   1
609      1   1   1   1   1   1   1   1   1  ...   1   1   1   1   1   1   1   1   1
610      1   1   1   1   1   1   1   1   1  ...   1   1   1   1   1   1   1   1   1

[610 rows x 610 columns]

# 3.4、模型评估：计算准确率和召回率

     userId  movieId  rating
618     408   138036     5.0
123       1     2459     5.0
650     409     1234     5.0
162       1     3273     5.0
163       1     3386     5.0
precision: 0.026973684210526316
recall: 0.004065846886156287

# 3.5、模型推理：为用户1推荐电影

     userId  movieId  rating
618     408   138036     5.0
123       1     2459     5.0
650     409     1234     5.0
162       1     3273     5.0
163       1     3386     5.0
precision: 0.026973684210526316
recall: 0.004065846886156287
     userId  movieId  rating
460     405    32587     5.0
715     409     3814     5.0
286     410     3855     5.0
288     410     3910     5.0
487     406    56949     5.0

ML之KG：基于MovieLens电影评分数据集利用基于知识图谱的推荐算法(networkx+基于路径相似度的方法)实现对用户进行Top电影推荐案例

基于MovieLens电影评分数据集利用基于知识图谱的推荐算法(networkx+基于路径相似度的方法)实现对用户进行Top电影推荐案例

# 1、定义数据集

# 2、构建知识图谱

# 2.1、创建Graph

# 2.2、使用matplotlib和NetworkX绘制知识图谱

# T1、直接绘制

# T2、将网络转化为稀疏矩阵再绘制

# 3、基于知识图谱实现推理

基于MovieLens电影评分数据集利用基于知识图谱的推荐算法(networkx+基于路径相似度的方法)实现对用户进行Top电影推荐案例

# 1、定义数据集

userId	movieId	rating	timestamp
1	1	4	964982703
1	3	4	964981247
1	6	4	964982224
1	47	5	964983815
1	50	5	964982931
1	70	3	964982400
1	101	5	964980868
1	110	4	964982176
1	151	5	964984041
1	157	5	964984100

movieId	title	genres
1	Toy Story (1995)	Adventure\|Animation\|Children\|Comedy\|Fantasy
2	Jumanji (1995)	Adventure\|Children\|Fantasy
3	Grumpier Old Men (1995)	Comedy\|Romance
4	Waiting to Exhale (1995)	Comedy\|Drama\|Romance
5	Father of the Bride Part II (1995)	Comedy
6	Heat (1995)	Action\|Crime\|Thriller
7	Sabrina (1995)	Comedy\|Romance
8	Tom and Huck (1995)	Adventure\|Children
9	Sudden Death (1995)	Action
10	GoldenEye (1995)	Action\|Adventure\|Thriller
11	American President, The (1995)	Comedy\|Drama\|Romance

        userId  movieId  rating   timestamp
0            1        1     4.0   964982703
1            1        3     4.0   964981247
2            1        6     4.0   964982224
3            1       47     5.0   964983815
4            1       50     5.0   964982931
...        ...      ...     ...         ...
100831     610   166534     4.0  1493848402
100832     610   168248     5.0  1493850091
100833     610   168250     5.0  1494273047
100834     610   168252     5.0  1493846352
100835     610   170875     3.0  1493846415

[100836 rows x 4 columns]

# 2、构建知识图谱

# 2.1、创建Graph

graph 
 Graph with 9829 nodes and 100403 edges

# 2.2、使用matplotlib和NetworkX绘制知识图谱

# T1、直接绘制

# T2、将网络转化为稀疏矩阵再绘制

# 3、基于知识图谱实现推理

recommended_movies 
 Index(['movieId', 'score', 'title', 'genres'], dtype='object')
user_id= 1 
                          title                                       genres
0             Toy Story (1995)  Adventure|Animation|Children|Comedy|Fantasy
1     Air Up There, The (1994)                                       Comedy
2          Pretty Woman (1990)                               Comedy|Romance
3  Natural Born Killers (1994)                        Action|Crime|Thriller
4         Total Eclipse (1995)                                Drama|Romance




recommended_movies 
 Index(['movieId', 'score', 'title', 'genres'], dtype='object')
user_id= 11 
                               title                            genres
0    American President, The (1995)              Comedy|Drama|Romance
1              Jurassic Park (1993)  Action|Adventure|Sci-Fi|Thriller
2        Vampire in Brooklyn (1995)             Comedy|Horror|Romance
3        In the Line of Fire (1993)                   Action|Thriller
4  Silence of the Lambs, The (1991)             Crime|Horror|Thriller

以上是关于ML之CF：基于MovieLens电影评分数据集利用基于用户协同过滤算法(余弦相似度)实现对用户进行Top5电影推荐案例的主要内容，如果未能解决你的问题，请参考以下文章