Creating a Python dictionary from two pandas DataFrames


【Title】: Creating a Python dictionary from two pandas DataFrames 【Posted】: 2021-11-11 03:16:05 【Question】:

I am trying to create a dictionary from two pandas DataFrames. Below is a snapshot of the DataFrame that is supposed to hold the keys:

C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000007.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000009.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000009.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000009.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000009.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000012.jpg

The following DataFrame snapshot holds the dictionary's values:

324,339,263,211,9
253,372,165,264,9
67,374,5,244,9
295,299,241,194,9

So I want to add each pair of corresponding rows to a dictionary as a key and a value. This is what I have tried:

import pandas as pd
import numpy as np
image_files=pd.read_csv('image_files.csv')
file = pd.read_csv('Training_dataset.csv')

image_anno_dict = {}

for image_file, row in zip(image_files,file.iterrows()):
    image_anno_dict[image_file]=np.array(row)

My expected output:

'C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg': [324,339,263,211,9]
'C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg': [253,372,165,264,9]
'C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg': [67,374,5,244,9]
.
.
.

But the code only works for the first row. Any suggestions for a solution?

print(image_files.head(5)):

C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg
0  C:/Users/Yaman/PycharmProjects/Mindsporeprojec...                         
1  C:/Users/Yaman/PycharmProjects/Mindsporeprojec...                         
2  C:/Users/Yaman/PycharmProjects/Mindsporeprojec...                         
3  C:/Users/Yaman/PycharmProjects/Mindsporeprojec...                         
4  C:/Users/Yaman/PycharmProjects/Mindsporeprojec...

print(file.head(5)):

     0    1    2    3  4
0  324  339  263  211  9
1  253  372  165  264  9
2   67  374    5  244  9
3  295  299  241  194  9
4  312  220  277  186  9

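【Comments】:

Please check now, thanks.
The expected output is not possible, because a dictionary always has unique keys.
Oh, I see. Is there any way to solve this?
You can use a list of tuples instead of a dict.
That is because I have multiple objects in one image, so I have to repeat the same image multiple times.

【Answer 1】: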

You can combine the two DataFrames with a pandas Series and then convert the result by calling its to_dict method. Here is working sample code:

import pandas as pd

 
df1 = pd.DataFrame({'df1Keys': ['ab', 'bc', 'c', 'df', 'efg']})
df2 = pd.DataFrame({'df2Vlues': [1, 25, 3, 84, 545]})

#method 1
print(pd.Series(df2.df2Vlues.values,index=df1.df1Keys).to_dict())

#method 2
print(dict(zip(df1.df1Keys,df2.df2Vlues))) 
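
Both methods assume the keys are unique. As a minimal sketch with made-up data (the column names `path` and `label` and the file names `a.jpg` and `b.jpg` are hypothetical, not from the question), the following shows what happens when a key repeats, as the image paths in the question do: the dictionary keeps only the last value, while a list of tuples keeps every pair.

```python
import pandas as pd

# Hypothetical data: 'a.jpg' appears twice, like a repeated image path.
keys = pd.DataFrame({'path': ['a.jpg', 'a.jpg', 'b.jpg']})
values = pd.DataFrame({'label': [1, 2, 3]})

# A dict can hold each key once, so the second 'a.jpg' overwrites the first.
as_dict = dict(zip(keys['path'], values['label']))

# A list of (key, value) tuples has no uniqueness constraint.
as_pairs = list(zip(keys['path'], values['label']))

print(as_dict)
print(as_pairs)
```

This is why the comments under the question suggest a list of tuples when the same path must appear more than once.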

【Comments】:

【Answer 2】:
import pandas as pd
import numpy as np

image_files = pd.read_csv('image_files.csv', header=None)
file = pd.read_csv('Training_dataset.csv')

image_anno_list = list(zip(image_files[0], file.apply(np.array, axis=1)))

Output:

>>> image_anno_list

[('C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\\000005.jpg',
  array([324, 339, 263, 211,   9])),
 ('C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\\000005.jpg',
  array([253, 372, 165, 264,   9])),
 ('C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\\000005.jpg',
  array([ 67, 374,   5, 244,   9])),
 ('C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\\000005.jpg',
  array([295, 299, 241, 194,   9])),
 ('C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\\000005.jpg',
  array([312, 220, 277, 186,   9]))]

If you use a dictionary instead, each path keeps only its last row, because dictionary keys are unique:

image_anno_dict = dict(zip(image_files[0], file.apply(np.array, axis=1)))
>>> image_anno_dict

{'C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\\000005.jpg':
 array([312, 220, 277, 186,   9])}
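
If a dictionary is still preferred, one option is to group the annotation rows by image path first, so that each key maps to all of its rows at once. The sketch below is not part of the original answer. It uses inline stand-in data: the shortened paths `000005.jpg` and `000007.jpg` and the pairing of rows to paths are illustrative assumptions, and it relies on the two DataFrames having the same number of rows in the same order.

```python
import numpy as np
import pandas as pd

# Stand-in data: shortened paths; annotation rows copied from the question.
image_files = pd.DataFrame(['000005.jpg', '000005.jpg', '000007.jpg'])
file = pd.DataFrame([[324, 339, 263, 211, 9],
                     [253, 372, 165, 264, 9],
                     [67, 374, 5, 244, 9]])

# Group annotation rows by the path at the same position; each group
# becomes one 2-D array with one row per object in that image.
paths = image_files[0].to_numpy()
image_anno_dict = {path: group.to_numpy()
                   for path, group in file.groupby(paths)}

for path, boxes in image_anno_dict.items():
    print(path, boxes.shape)
```

Each value is then a single array of shape (number of objects, 5), which may be more convenient than a list of separate arrays.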

【Comments】:

【Answer 3】:

You can use collections.defaultdict with list as the default factory, so that each path maps to a list of all its rows, as shown below:

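from collections import defaultdict
import pandas as pd
import numpy as np

# header=None: image_files.csv has no header row, so the first path is data
image_files = pd.read_csv('image_files.csv', header=None)
# pass header=None here as well if Training_dataset.csv has no header row
file = pd.read_csv('Training_dataset.csv')

image_anno_dict = defaultdict(list)

# iterate over the path column; iterating the DataFrame itself yields column names.
# iterrows() yields (index, row) pairs, so unpack the pair and drop the index
for image_file, (_, row) in zip(image_files[0], file.iterrows()):
    image_anno_dict[image_file].append(np.array(row))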

Output:

'C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg' :
 [
    [324,339,263,211,9], [253,372,165,264,9] , [67,374,5,244,9], ...
 ]
 ,
 ...
 , 
 'C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000009.jpg' : 
 [
     [253,372,165,264,9] , [67,374,5,244,9], ...
 ], 
 ...
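
For trying the idea without the CSV files, here is a self-contained sketch under stated assumptions: the two in-memory buffers stand in for `image_files.csv` and `Training_dataset.csv`, both are assumed to have no header row, and the shortened paths and their pairing with rows are illustrative. It also iterates over `file.to_numpy()` instead of `iterrows()`, which is a swapped-in variation rather than the answer's own loop.

```python
from collections import defaultdict
from io import StringIO

import pandas as pd

# In-memory stand-ins for the two CSV files (assumed to have no header rows).
image_csv = StringIO("000005.jpg\n000005.jpg\n000007.jpg\n")
anno_csv = StringIO("324,339,263,211,9\n253,372,165,264,9\n67,374,5,244,9\n")

image_files = pd.read_csv(image_csv, header=None)
file = pd.read_csv(anno_csv, header=None)

image_anno_dict = defaultdict(list)

# Iterating a 2-D NumPy array yields one row (a 1-D array) at a time.
for image_file, row in zip(image_files[0], file.to_numpy()):
    image_anno_dict[image_file].append(row)

for path, boxes in image_anno_dict.items():
    print(path, len(boxes))
```

A missing key in a defaultdict(list) starts as an empty list, so the loop needs no check for whether a path has been seen before.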

【Comments】:
