Convert a pandas DataFrame to a dictionary with multiple keys
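【Title】: convert pandas dataframe to dictionary with multiple keys 【Posted】: 2019-02-11 00:21:51 【Question】: I am trying to convert a DataFrame into a dictionary whose keys are built from four values, all taken from columns. The DataFrame has several more columns, and I want to look up their values using the key built from those four columns. My loop-based approach works, but it eventually fails with a memory error. Is there a more efficient way to do this?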
The DataFrame looks like this:
Service  Bill Weight  Zone  Resi  UPS    FedEx  USPS  DHL
1DEA     1            2     N     33.02  9999   9999  9999
1DEA     2            2     N     33.02  9999   9999  9999
1DEA     3            2     N     33.02  9999   9999  9999
I want one key like this for each carrier:
price[('1DEA', '1', '2', 'N', 'UPS')]=33.02
price[('1DEA', '1', '2', 'N', 'FedEx')]=9999
I tried this:
price = {}
carriers = ['UPS', 'FedEx', 'USPS','DHL']
for carrier in carriers:
    for row in rate_keys.to_dict('records'):
        key = (row['Service'], row['Bill Weight'], row['Zone'],
               row['Resi'], carrier)
        rate_keys[key] = row[carrier]
【Answer 1】: IIUC, with a dict comprehension like this:
carriers = ['UPS', 'FedEx', 'USPS','DHL']
price = {(row['Service'], row['Bill Weight'], row['Zone'], row['Resi'], c): row[c]
         for c in carriers for _, row in df.iterrows()}
[Output]
{('1DEA', 1, 2, 'N', 'UPS'): 33.02,
 ('1DEA', 2, 2, 'N', 'UPS'): 33.02,
 ('1DEA', 3, 2, 'N', 'UPS'): 33.02,
 ('1DEA', 1, 2, 'N', 'FedEx'): 9999,
 ('1DEA', 2, 2, 'N', 'FedEx'): 9999,
 ('1DEA', 3, 2, 'N', 'FedEx'): 9999,
 ('1DEA', 1, 2, 'N', 'USPS'): 9999,
 ('1DEA', 2, 2, 'N', 'USPS'): 9999,
 ('1DEA', 3, 2, 'N', 'USPS'): 9999,
 ('1DEA', 1, 2, 'N', 'DHL'): 9999,
 ('1DEA', 2, 2, 'N', 'DHL'): 9999,
 ('1DEA', 3, 2, 'N', 'DHL'): 9999}
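Since the question mentions running out of memory, here is a sketch of the same comprehension built on `itertuples` instead of `iterrows`, which avoids creating a Series object for every row. The sample `df` is reconstructed from the question's table, and the names `key_cols` and `n_keys` are introduced here for illustration only.

```python
import pandas as pd

# Sample data reconstructed from the table in the question (an assumption).
df = pd.DataFrame({
    'Service': ['1DEA', '1DEA', '1DEA'],
    'Bill Weight': [1, 2, 3],
    'Zone': [2, 2, 2],
    'Resi': ['N', 'N', 'N'],
    'UPS': [33.02, 33.02, 33.02],
    'FedEx': [9999, 9999, 9999],
    'USPS': [9999, 9999, 9999],
    'DHL': [9999, 9999, 9999],
})

key_cols = ['Service', 'Bill Weight', 'Zone', 'Resi']  # hypothetical helper name
carriers = ['UPS', 'FedEx', 'USPS', 'DHL']
n_keys = len(key_cols)

# name=None yields plain tuples, so the space in 'Bill Weight' is not a problem.
price = {
    (*row[:n_keys], carrier): row[n_keys + j]
    for row in df[key_cols + carriers].itertuples(index=False, name=None)
    for j, carrier in enumerate(carriers)
}

print(price[('1DEA', 1, 2, 'N', 'UPS')])
```

Whether this is enough to avoid the memory error depends on the size of the real data; the resulting dictionary still holds one entry per row and carrier.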
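【Answer 2】: If you do
df = df.set_index(['Service', 'Bill Weight', 'Zone', 'Resi'])
you have essentially the same thing: the index tuple acts as the key, and the carrier column selects the value.
Output
print(df.loc[('1DEA', 1, 2, 'N')]['UPS'])
33.02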
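A minimal sketch of this lookup, assuming the four key columns are `Service`, `Bill Weight`, `Zone` and `Resi` as in the question's table. The sample `df` and the names `indexed` and `ups_rate` are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the table in the question (an assumption).
df = pd.DataFrame({
    'Service': ['1DEA', '1DEA', '1DEA'],
    'Bill Weight': [1, 2, 3],
    'Zone': [2, 2, 2],
    'Resi': ['N', 'N', 'N'],
    'UPS': [33.02, 33.02, 33.02],
    'FedEx': [9999, 9999, 9999],
    'USPS': [9999, 9999, 9999],
    'DHL': [9999, 9999, 9999],
})

# The four key columns become a MultiIndex; no separate dictionary is built.
indexed = df.set_index(['Service', 'Bill Weight', 'Zone', 'Resi'])

# Row label is the 4-tuple, column label is the carrier.
ups_rate = indexed.loc[('1DEA', 1, 2, 'N'), 'UPS']
print(ups_rate)
```

Keeping the data in the indexed DataFrame avoids holding a second copy of every rate in a Python dictionary, which may matter if memory is the bottleneck.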
【Answer 3】: You probably shouldn't update rate_keys while you are looping over it. I'm guessing the last line of your example script should be
price[key] = row[carrier]
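For reference, the whole loop with that correction might look like the sketch below. To keep it runnable without the asker's data, `records` is a hand-written stand-in for what `rate_keys.to_dict('records')` would return (an assumption); it is also built once, rather than once per carrier as in the question.

```python
# Stand-in for rate_keys.to_dict('records'); values copied from the question's table.
records = [
    {'Service': '1DEA', 'Bill Weight': 1, 'Zone': 2, 'Resi': 'N',
     'UPS': 33.02, 'FedEx': 9999, 'USPS': 9999, 'DHL': 9999},
    {'Service': '1DEA', 'Bill Weight': 2, 'Zone': 2, 'Resi': 'N',
     'UPS': 33.02, 'FedEx': 9999, 'USPS': 9999, 'DHL': 9999},
    {'Service': '1DEA', 'Bill Weight': 3, 'Zone': 2, 'Resi': 'N',
     'UPS': 33.02, 'FedEx': 9999, 'USPS': 9999, 'DHL': 9999},
]

carriers = ['UPS', 'FedEx', 'USPS', 'DHL']
price = {}
for row in records:
    for carrier in carriers:
        key = (row['Service'], row['Bill Weight'], row['Zone'],
               row['Resi'], carrier)
        # Write into the result dict, not into the source being iterated.
        price[key] = row[carrier]

print(len(price))
```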
【Answer 4】: First,
temp = df.set_index(['Service', 'Bill Weight', 'Zone', 'Resi']).to_dict()
Then we build the desired output from the nested dictionary,
price = dict((k + (i,), temp[i][k]) for i in temp for k in temp[i])
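The sketch below shows the intermediate structure this approach relies on: with the default orientation, `to_dict()` returns one inner dictionary per column, keyed by the index tuples. The sample `df` is reconstructed from the question's table, and the key columns are assumed to be `Service`, `Bill Weight`, `Zone` and `Resi`.

```python
import pandas as pd

# Sample data reconstructed from the table in the question (an assumption).
df = pd.DataFrame({
    'Service': ['1DEA', '1DEA', '1DEA'],
    'Bill Weight': [1, 2, 3],
    'Zone': [2, 2, 2],
    'Resi': ['N', 'N', 'N'],
    'UPS': [33.02, 33.02, 33.02],
    'FedEx': [9999, 9999, 9999],
    'USPS': [9999, 9999, 9999],
    'DHL': [9999, 9999, 9999],
})

# temp maps carrier -> {(Service, Bill Weight, Zone, Resi): rate}
temp = df.set_index(['Service', 'Bill Weight', 'Zone', 'Resi']).to_dict()

# Flatten the two levels by appending the carrier name to each index tuple.
price = {
    idx + (carrier,): rate
    for carrier, rates in temp.items()
    for idx, rate in rates.items()
}

print(temp['UPS'][('1DEA', 1, 2, 'N')])
```

Note that `temp` and `price` both exist in memory at the same time, so this route roughly doubles the footprint while the flattening runs.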
【Answer 5】: Set the index to every column except the carrier columns, then stack.
df.set_index(['Service', 'Bill Weight', 'Zone', 'Resi']).stack().to_dict()
{('1DEA', 1, 2, 'N', 'DHL'): 9999.0,
 ('1DEA', 1, 2, 'N', 'FedEx'): 9999.0,
 ('1DEA', 1, 2, 'N', 'UPS'): 33.02,
 ('1DEA', 1, 2, 'N', 'USPS'): 9999.0,
 ('1DEA', 2, 2, 'N', 'DHL'): 9999.0,
 ('1DEA', 2, 2, 'N', 'FedEx'): 9999.0,
 ('1DEA', 2, 2, 'N', 'UPS'): 33.02,
 ('1DEA', 2, 2, 'N', 'USPS'): 9999.0,
 ('1DEA', 3, 2, 'N', 'DHL'): 9999.0,
 ('1DEA', 3, 2, 'N', 'FedEx'): 9999.0,
 ('1DEA', 3, 2, 'N', 'UPS'): 33.02,
 ('1DEA', 3, 2, 'N', 'USPS'): 9999.0}
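A self-contained version of the stack approach, for trying it out. The sample `df` is reconstructed from the question's table, and `stacked` is a name introduced here to show the intermediate step.

```python
import pandas as pd

# Sample data reconstructed from the table in the question (an assumption).
df = pd.DataFrame({
    'Service': ['1DEA', '1DEA', '1DEA'],
    'Bill Weight': [1, 2, 3],
    'Zone': [2, 2, 2],
    'Resi': ['N', 'N', 'N'],
    'UPS': [33.02, 33.02, 33.02],
    'FedEx': [9999, 9999, 9999],
    'USPS': [9999, 9999, 9999],
    'DHL': [9999, 9999, 9999],
})

# stack() moves the carrier column labels into a fifth index level,
# leaving a Series with one rate per (key columns..., carrier) tuple.
stacked = df.set_index(['Service', 'Bill Weight', 'Zone', 'Resi']).stack()

# A Series with a MultiIndex converts to a dict keyed by tuples.
price = stacked.to_dict()

print(price[('1DEA', 1, 2, 'N', 'UPS')])
```

Because the carrier columns mix integers and floats, the stacked Series holds floats, which is why the rates above appear as 9999.0 rather than 9999.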
Comprehension
{(*r[:4], c): v for r in df.values for c, v in zip(df.columns[4:], r[4:])}
{('1DEA', 1, 2, 'N', 'DHL'): 9999,
 ('1DEA', 1, 2, 'N', 'FedEx'): 9999,
 ('1DEA', 1, 2, 'N', 'UPS'): 33.02,
 ('1DEA', 1, 2, 'N', 'USPS'): 9999,
 ('1DEA', 2, 2, 'N', 'DHL'): 9999,
 ('1DEA', 2, 2, 'N', 'FedEx'): 9999,
 ('1DEA', 2, 2, 'N', 'UPS'): 33.02,
 ('1DEA', 2, 2, 'N', 'USPS'): 9999,
 ('1DEA', 3, 2, 'N', 'DHL'): 9999,
 ('1DEA', 3, 2, 'N', 'FedEx'): 9999,
 ('1DEA', 3, 2, 'N', 'UPS'): 33.02,
 ('1DEA', 3, 2, 'N', 'USPS'): 9999}