Convert dataframe to dictionary of list of tuples
【Posted】: 2018-02-03 14:14:45 【Question】: I have a dataframe that looks like this:
user item \
0 b80344d063b5ccb3212f76538f3d9e43d87dca9e The Cove - Jack Johnson
1 b80344d063b5ccb3212f76538f3d9e43d87dca9e Entre Dos Aguas - Paco De Lucia
2 b80344d063b5ccb3212f76538f3d9e43d87dca9e Stronger - Kanye West
3 b80344d063b5ccb3212f76538f3d9e43d87dca9e Constellations - Jack Johnson
4 b80344d063b5ccb3212f76538f3d9e43d87dca9e Learn To Fly - Foo Fighters
rating
0 1
1 2
2 1
3 1
4 1
and I want to end up with the following structure:
dict-> list of tuples
user-> (item, rating)
b80344d063b5ccb3212f76538f3d9e43d87dca9e -> list((The Cove - Jack
Johnson, 1), ... , )
So far I can do this:
item_set = dict((user, set(items)) for user, items in \
data.groupby('user')['item'])
But this only gets me halfway there. How do I get the corresponding "rating" values out of the groupby?
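【Answer 1】: Set user as the index, convert each row to a tuple with df.apply, group on the index with df.groupby(level=0), collect each group into a list with dfGroupBy.agg, and convert the result to a dictionary with to_dict: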
In [1417]: df
Out[1417]:
user item \
0 b80344d063b5ccb3212f76538f3d9e43d87dca9e The Cove - Jack Johnson
1 b80344d063b5ccb3212f76538f3d9e43d87dca9e Entre Dos Aguas - Paco De Lucia
2 b80344d063b5ccb3212f76538f3d9e43d87dca9e Stronger - Kanye West
3 b80344d063b5ccb3212f76538f3d9e43d87dca9e Constellations - Jack Johnson
4 b80344d063b5ccb3212f76538f3d9e43d87dca9e Learn To Fly - Foo Fighters
rating
0 1
1 2
2 2
3 2
4 2
In [1418]: df.set_index('user').apply(tuple, 1)\
.groupby(level=0).agg(lambda x: list(x.values))\
.to_dict()
Out[1418]:
{'b80344d063b5ccb3212f76538f3d9e43d87dca9e': [('The Cove - Jack Johnson', 1),
  ('Entre Dos Aguas - Paco De Lucia', 2),
  ('Stronger - Kanye West', 2),
  ('Constellations - Jack Johnson', 2),
  ('Learn To Fly - Foo Fighters', 2)]}
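For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch. It rebuilds a frame like the one shown above (the sample rows are typed in by hand, and the names by_chain and by_zip are only illustrative), runs the same chain, and compares it with an alternative that zips the two columns per group. The zip variant is not part of the answer above, just another way that should give the same dictionary.

```python
import pandas as pd

# Hand-typed sample rows mirroring the frame printed in In [1417].
user = "b80344d063b5ccb3212f76538f3d9e43d87dca9e"
df = pd.DataFrame(
    {
        "user": [user] * 5,
        "item": [
            "The Cove - Jack Johnson",
            "Entre Dos Aguas - Paco De Lucia",
            "Stronger - Kanye West",
            "Constellations - Jack Johnson",
            "Learn To Fly - Foo Fighters",
        ],
        "rating": [1, 2, 2, 2, 2],
    }
)

# Same chain as in the answer: index by user, turn each row into an
# (item, rating) tuple, then collect the tuples of each user into a list.
by_chain = (
    df.set_index("user")
    .apply(tuple, axis=1)
    .groupby(level=0)
    .agg(lambda x: list(x.values))
    .to_dict()
)

# Alternative sketch (an assumption, not from the answer): iterate over the
# groups and zip the two columns of each group directly.
by_zip = {
    u: list(zip(group["item"], group["rating"]))
    for u, group in df.groupby("user")
}

print(by_chain == by_zip)
```

The zip version skips the row-wise apply, which may matter on larger frames, but both approaches keep the rows of each user in their original order.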
【Comments】:
Exactly what I was trying to achieve. Thanks!