Convert a pandas DataFrame to a dictionary

Posted 2023-03-11

【Title】: Convert a pandas DataFrame to a dictionary 【Posted】: 2018-07-14 06:33:09 【Question】:

I have a pandas DataFrame like the following:

import pandas as pd

df = pd.DataFrame({'a': ['red', 'yellow', 'blue'], 'b': [0, 0, 1], 'c': [0, 1, 0], 'd': [1, 0, 0]})
df

which looks like

    a       b   c   d
0   red     0   0   1
1   yellow  0   1   0
2   blue    1   0   0

I want to convert it to a dictionary so that I get:

red     d
yellow  c
blue    b

The dataset is large, so please avoid any iterative approach. I have not been able to come up with a solution yet. Any help is appreciated.

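【Comments】:

Possible duplicate of Convert a Pandas DataFrame to a dictionary

pandas.pydata.org/pandas-docs/stable/generated/… Subset your data and then call to_dict, which pandas provides out of the box.

Is it possible to have two 1s in a row?

@tai: There is only one 1 in each row.

【Answer 1】: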

First, if you really want to convert this to a dictionary, it is somewhat better to turn the values you want as keys into the DataFrame's index:

df.set_index('a', inplace=True)

which looks like:

        b  c  d
a              
red     0  0  1
yellow  0  1  0
blue    1  0  0

Your data appears to be one-hot encoded. You first have to reverse that, using the method detailed here:

series = df.idxmax(axis=1)

which looks like:

a
red       d
yellow    c
blue      b
dtype: object

Almost there! Now call to_dict on the resulting Series (this is where having set column a as the index pays off):

series.to_dict()

which looks like:

{'blue': 'b', 'red': 'd', 'yellow': 'c'}

I think this is what you are looking for. As a one-liner:

df.set_index('a').idxmax(axis=1).to_dict()
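For reference, the steps above can be put together into a self-contained sketch. It assumes the sample data from the question, and the variable name `result` is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['red', 'yellow', 'blue'],
    'b': [0, 0, 1],
    'c': [0, 1, 0],
    'd': [1, 0, 0],
})

# set_index returns a new frame here, so df itself is left unchanged.
# idxmax(axis=1) picks, for each row, the label of the column holding the maximum.
result = df.set_index('a').idxmax(axis=1).to_dict()
print(result)
```

Note that idxmax returns the first column holding the row maximum, so a row with no 1 at all would silently map to b. If that can happen in real data, it is probably worth checking for it first.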

【Comments】:

Great explanation. I like the simple steps you took.

【Answer 2】:

You can try this.

df = df.set_index('a')
df.where(df > 0).stack().reset_index().drop(0, axis=1)


    a   level_1
0   red     d
1   yellow  c
2   blue    b
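The result above is still a DataFrame, not a dictionary. One possible way to finish the conversion, sketched under the assumption that each row holds exactly one 1, is to take the (a, column) pairs straight from the stacked index. The explicit dropna is a precaution, since some pandas versions keep NaN entries when stacking:

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['red', 'yellow', 'blue'],
    'b': [0, 0, 1],
    'c': [0, 1, 0],
    'd': [1, 0, 0],
}).set_index('a')

# Mask the zeros as NaN, then stack into a long Series indexed by (a, column label).
stacked = df.where(df > 0).stack().dropna()

# Each remaining index entry is an (a, column label) tuple, which dict() accepts directly.
mapping = dict(stacked.index.tolist())
print(mapping)
```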

【Comments】:

【Answer 3】:

You need dot and zip here

dict(zip(df.a,df.iloc[:,1:].dot(df.iloc[:,1:].columns)))
Out[508]: {'blue': 'b', 'red': 'd', 'yellow': 'c'}
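Why dot works here may not be obvious: multiplying a string by 0 gives an empty string and multiplying it by 1 gives the string itself, so the row-wise dot product of the 0/1 flags with the column names concatenates the names of the flagged columns. A step-by-step sketch, where the intermediate names `flags` and `labels` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['red', 'yellow', 'blue'],
    'b': [0, 0, 1],
    'c': [0, 1, 0],
    'd': [1, 0, 0],
})

flags = df.iloc[:, 1:]             # the 0/1 indicator columns b, c, d
labels = flags.dot(flags.columns)  # per row: 0*'b' + 0*'c' + 1*'d' == 'd'
mapping = dict(zip(df['a'], labels))
print(mapping)
```

A row with more than one 1 would produce the concatenated names (for example 'bc'), so this relies on there being only one 1 per row, as stated in the question comments.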

【Comments】:

Perhaps just df.set_index('a').dot(df.columns[1:]).to_dict()

【Answer 4】:

Hope this works:

import pandas as pd
df = pd.DataFrame({'a': ['red', 'yellow', 'blue'], 'b': [0, 0, 1], 'c': [0, 1, 0], 'd': [1, 0, 0]})

# For each row, take the label of the indicator column that holds the 1
df['e'] = df.iloc[:, 1:].idxmax(axis=1)

newdf = df[["a","e"]]

print (newdf.to_dict(orient='index'))

Output:

{0: {'a': 'red', 'e': 'd'}, 1: {'a': 'yellow', 'e': 'c'}, 2: {'a': 'blue', 'e': 'b'}}
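The dictionary printed by this answer is keyed by row number. If a flat color-to-column mapping is wanted instead, one possible adjustment (a sketch, not part of the original answer) is to index the two-column frame by a before converting; the name `flat` is illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['red', 'yellow', 'blue'],
    'b': [0, 0, 1],
    'c': [0, 1, 0],
    'd': [1, 0, 0],
})

# Label of the indicator column holding the 1 in each row
df['e'] = df.iloc[:, 1:].idxmax(axis=1)
newdf = df[['a', 'e']]

# Use column a as the index, then convert the remaining column e to a dict
flat = newdf.set_index('a')['e'].to_dict()
print(flat)
```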

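【Comments】:

Yes, I am using Python 2.7, and the question is tagged 3.x.

The output does not look like what the OP wants.

It seems I forgot to use the axis for columns. I also checked it on Python 3, and it works fine.

@bhushan, thanks for your answer, but the output is not right. I want a different format.

【Answer 5】:

You can convert the DataFrame to a dict using pandas to_dict with 'list' as the argument. Then iterate over the resulting dict and pick the label of the column whose value is 1.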

>>> {k: df.columns[1:][v.index(1)] for k, v in df.set_index('a').T.to_dict('list').items()}
{'yellow': 'c', 'blue': 'b', 'red': 'd'}

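【Comments】:

Thanks for the solution, but it is an iterative one, which is slow for my large dataset.

【Answer 6】:

Set column a as the index, then, for each row of df, find the index label whose value is 1, and finally use to_dict to convert the resulting Series to a dictionary.

Here is the code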

df.set_index('a').apply(lambda row:row[row==1].index[0],axis=1).to_dict()

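Alternatively, set the index to a, then use idxmax to find the column label of the maximum value in each row (older pandas versions let argmax serve this purpose, but Series.argmax now returns an integer position), and convert the result with to_dict

df.set_index('a').apply(lambda row: row.idxmax(), axis=1).to_dict()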

In both cases, the result is

{'blue': 'b', 'red': 'd', 'yellow': 'c'}

P.S. I use apply with axis=1 to iterate over the rows of df
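Because apply with axis=1 still calls the lambda once per row, it may be slow on the large dataset mentioned in the question. A possible alternative, sketched here with NumPy's argmax rather than taken from this answer, finds the position of the 1 in every row in a single array operation. The names `flags`, `positions` and `labels` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['red', 'yellow', 'blue'],
    'b': [0, 0, 1],
    'c': [0, 1, 0],
    'd': [1, 0, 0],
})

flags = df.iloc[:, 1:]                           # the 0/1 indicator columns
positions = flags.to_numpy().argmax(axis=1)      # position of the maximum in each row
labels = flags.columns.to_numpy()[positions]     # map positions back to column labels
mapping = dict(zip(df['a'], labels))
print(mapping)
```

As with idxmax, a row containing no 1 would map to the first indicator column, so this assumes exactly one 1 per row.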

【Comments】:
