Creating a Python dictionary from two pandas DataFrames
【Posted】:2021-11-11 03:16:05
【Question】:I am trying to create a dictionary from two pandas DataFrames. Below is a snapshot of the DataFrame that is supposed to hold the keys:
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000007.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000009.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000009.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000009.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000009.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000012.jpg
The following DataFrame snapshot holds the dictionary's values:
324,339,263,211,9
253,372,165,264,9
67,374,5,244,9
295,299,241,194,9
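So I want to add each pair of corresponding rows (one from each DataFrame) to a dictionary as a key and a value. This is what I have tried: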
import pandas as pd
import numpy as np
image_files=pd.read_csv('image_files.csv')
file = pd.read_csv('Training_dataset.csv')
image_anno_dict = {}
for image_file, row in zip(image_files, file.iterrows()):
    image_anno_dict[image_file] = np.array(row)
My expected output:
'C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg': [324,339,263,211,9]
'C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg': [253,372,165,264,9]
'C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg': [67,374,5,244,9]
.
.
.
But the code only works for the first row. Any suggestions for a solution?
print(image_files.head(5)):
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg
0 C:/Users/Yaman/PycharmProjects/Mindsporeprojec...
1 C:/Users/Yaman/PycharmProjects/Mindsporeprojec...
2 C:/Users/Yaman/PycharmProjects/Mindsporeprojec...
3 C:/Users/Yaman/PycharmProjects/Mindsporeprojec...
4 C:/Users/Yaman/PycharmProjects/Mindsporeprojec...
print(file.head(5)):
0 1 2 3 4
0 324 339 263 211 9
1 253 372 165 264 9
2 67 374 5 244 9
3 295 299 241 194 9
4 312 220 277 186 9
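【Comments】:
Please check now, thanks.
The expected output is not possible, because a dictionary always has unique keys.
Oh, got it. Is there any way to solve this?
You can use a list of tuples instead of a dict.
This is because I have multiple objects in one image, so I have to repeat the same image many times.
【Answer 1】:You can combine the two DataFrames using a pandas Series and then convert the result by calling its to_dict method. Here is working sample code: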
import pandas as pd
df1 = pd.DataFrame({'df1Keys': ['ab', 'bc', 'c', 'df', 'efg']})
df2 = pd.DataFrame({'df2Vlues': [1, 25, 3, 84, 545]})
#method 1
print(pd.Series(df2.df2Vlues.values,index=df1.df1Keys).to_dict())
#method 2
print(dict(zip(df1.df1Keys,df2.df2Vlues)))
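A minimal sketch of why the loop in the question only handles the first row, using made-up two-row frames (the names `keys_df` and `vals_df` are hypothetical). Iterating over a DataFrame directly yields its column labels, not its rows, so `zip` stops after one item when the key frame has a single column. Zipping a column (a Series), as in method 2 above, pairs the data row by row.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files in the question
keys_df = pd.DataFrame({0: ['a.jpg', 'b.jpg']})
vals_df = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})

# Iterating a DataFrame yields column labels, so zip produces a single pair
pairs_from_frame = list(zip(keys_df, vals_df.iterrows()))

# Iterating a column (Series) yields one value per row
pairs_from_column = list(zip(keys_df[0], vals_df.to_numpy().tolist()))

print(len(pairs_from_frame))   # 1
print(pairs_from_column)       # [('a.jpg', [1, 3]), ('b.jpg', [2, 4])]
```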
【Answer 2】:
import pandas as pd
import numpy as np
image_files = pd.read_csv('image_files.csv', header=None)
file = pd.read_csv('Training_dataset.csv')
image_anno_list = list(zip(image_files[0], file.apply(np.array, axis=1)))
Output:
>>> image_anno_list
[('C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\\000005.jpg',
array([324, 339, 263, 211, 9])),
('C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\\000005.jpg',
array([253, 372, 165, 264, 9])),
('C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\\000005.jpg',
array([ 67, 374, 5, 244, 9])),
('C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\\000005.jpg',
array([295, 299, 241, 194, 9])),
('C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\\000005.jpg',
array([312, 220, 277, 186, 9]))]
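If you use a dictionary instead, each repeated key overwrites the previous value, so only the last row for each image is kept: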
image_anno_dict = dict(zip(image_files[0], file.apply(np.array, axis=1)))
>>> image_anno_dict
{'C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\\000005.jpg':
 array([312, 220, 277, 186, 9])}
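The same comparison can be reproduced without the original CSV files. This is a sketch under assumptions: the inline CSV text and the shortened paths `img/000005.jpg` and `img/000007.jpg` are made up, and the annotation file is assumed to have a header row `0,1,2,3,4`, which would match the column names printed in the question.

```python
import io
import pandas as pd

# Made-up file contents; the layout is an assumption, not the real data
image_csv = "img/000005.jpg\nimg/000005.jpg\nimg/000007.jpg\n"
anno_csv = "0,1,2,3,4\n324,339,263,211,9\n253,372,165,264,9\n67,374,5,244,9\n"

image_files = pd.read_csv(io.StringIO(image_csv), header=None)
file = pd.read_csv(io.StringIO(anno_csv))

rows = file.to_numpy().tolist()
image_anno_list = list(zip(image_files[0], rows))   # keeps every row
image_anno_dict = dict(zip(image_files[0], rows))   # later duplicates overwrite earlier ones

print(len(image_anno_list))  # 3
print(len(image_anno_dict))  # 2
```

The list of tuples preserves all three annotations, while the dictionary ends up with one entry per distinct path.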
【Answer 3】:You can use collections.defaultdict with list as the default factory to create the dictionary, as shown below:
from collections import defaultdict
import pandas as pd
import numpy as np
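# header=None: the first line of image_files.csv is a path, not a column name
image_files = pd.read_csv('image_files.csv', header=None)
file = pd.read_csv('Training_dataset.csv')
image_anno_dict = defaultdict(list)
# Zip the path column with the value rows; zipping the DataFrame itself yields only column labels
for image_file, row in zip(image_files[0], file.to_numpy()):
    image_anno_dict[image_file].append(row)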
Output:
'C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg' :
[
[324,339,263,211,9], [253,372,165,264,9] , [67,374,5,244,9], ...
]
,
...
,
'C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000009.jpg' :
[
[253,372,165,264,9] , [67,374,5,244,9], ...
],
...
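A small self-contained sketch of the grouping idea, with made-up short paths in place of the real ones. Stacking each group with `np.stack` is an optional extra step that is not part of the answer; it gives one 2-D array of boxes per image.

```python
from collections import defaultdict
import numpy as np

# Made-up keys; the value rows are taken from the question's snapshot
paths = ['img/000005.jpg', 'img/000005.jpg', 'img/000009.jpg']
rows = [[324, 339, 263, 211, 9], [253, 372, 165, 264, 9], [67, 374, 5, 244, 9]]

image_anno_dict = defaultdict(list)
for path, row in zip(paths, rows):
    image_anno_dict[path].append(row)   # repeated keys accumulate instead of overwriting

# Optional: one (n_objects, 5) array per image
stacked = {path: np.stack(boxes) for path, boxes in image_anno_dict.items()}

print(stacked['img/000005.jpg'].shape)  # (2, 5)
```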