Python Pandas: How to split a sorted dictionary in a DataFrame column
id  asn    orgs
0   3320   {'Deutsche Telekom AG': 2288}
1   47886  {'Joyent': 16, 'Equinix (Netherlands) B.V.': 7}
2   47601  {'fusion services': 1024, 'GCE Global Maritime': 16859}
3   33438  {'Highwinds Network Group': 893}
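I want to sort the "orgs" column, which actually holds dictionaries, and then extract the (k, v) pair with the highest value into two separate columns. Like this: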
id asn org value
0 3320 'Deutsche Telekom AG' 2288
1 47886 'Joyent' 16
2 47601 'GCE Global Maritime' 16859
3 33438 'Highwinds Network Group' 893
df.orgs.apply(lambda x : sorted(x.items(),key=operator.itemgetter(1),reverse=True))
id asn orgs
0 3320 [('Deutsche Telekom AG', 2288)]
1 47886 [('Joyent', 16),( 'Equinix (Netherlands) B.V.', 7)]
2 47601 [('GCE Global Maritime',16859),('fusion services', 1024)]
3 33438 [('Highwinds Network Group', 893)]
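Well, all you are asking for is the maximum, so the sorting is somewhat beside the point, isn't it?
@EdChum No, because I want both the key and the value of the pair with the maximum value, each in its own column.
[Answer 1]:
Another approach is to define a function that just calls min on the dict and returns a Series, so that you can assign the result to multiple columns (the function body is taken from @Alex Martelli's answer). Note that min picks the pair with the smallest value, which is what the output here shows; swap in max to get the pair with the highest value that the question asks for.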
In [17]:
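def func(x):
    # key whose value is smallest; use max(x, key=x.get) for the largest
    k = min(x, key=x.get)
    # return a Series so the result can be assigned to two columns
    return pd.Series([k, x[k]])

df[['orgs', 'value']] = df['orgs'].apply(func)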
asn id orgs value
0 3320 0 Deutsche Telekom AG 2288
1 47886 1 Equinix (Netherlands) B.V. 7
2 47601 2 fusion services 1024
3 33438 3 Highwinds Network Group 893
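Since the question asks for the pair with the highest value, the same idea can be sketched with max instead of min. This is only an illustration under assumptions: the helper name top_pair and the output column names org and value are hypothetical, and the frame is rebuilt from the sample data in the question.

```python
import pandas as pd

# Sample frame rebuilt from the data shown in the question
df = pd.DataFrame({
    'id': [0, 1, 2, 3],
    'asn': [3320, 47886, 47601, 33438],
    'orgs': [{'Deutsche Telekom AG': 2288},
             {'Joyent': 16, 'Equinix (Netherlands) B.V.': 7},
             {'fusion services': 1024, 'GCE Global Maritime': 16859},
             {'Highwinds Network Group': 893}],
})

def top_pair(d):
    """Hypothetical helper: return the (key, value) pair with the largest value."""
    k = max(d, key=d.get)  # key whose value is largest
    return pd.Series([k, d[k]], index=['org', 'value'])

# apply() with a Series-returning function yields a two-column DataFrame
result = df[['id', 'asn']].join(df['orgs'].apply(top_pair))
print(result)
```

Writing to new columns and joining them back leaves the original orgs column untouched, which makes it easier to re-run the extraction.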
If your data contains empty dicts, you can test len first:
In [34]:
df = pd.DataFrame({'id':[0,1,2,3,4],
                   'asn':[3320, 47886, 47601, 33438, 56],
                   'orgs':[{'Deutsche Telekom AG': 2288},
                           {'Joyent': 16, 'Equinix (Netherlands) B.V.': 7},
                           {'fusion services': 1024, 'GCE Global Maritime':16859},
                           {'Highwinds Network Group': 893}, {}]})
     asn  id  orgs
0   3320   0  {'Deutsche Telekom AG': 2288}
1  47886   1  {'Equinix (Netherlands) B.V.': 7, 'Joyent': 16}
2  47601   2  {'GCE Global Maritime': 16859, 'fusion service...
3  33438   3  {'Highwinds Network Group': 893}
4     56   4  {}
In [36]:
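def func(x):
    # only look for a key when the dict is non-empty
    if len(x) > 0:
        k = min(x, key=x.get)
        return pd.Series([k, x[k]])
    # empty dict: fill both columns with NaN (np.NaN was removed in NumPy 2.0)
    return pd.Series([np.nan, np.nan])

df[['orgs', 'value']] = df['orgs'].apply(func)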
asn id orgs value
0 3320 0 Deutsche Telekom AG 2288
1 47886 1 Equinix (Netherlands) B.V. 7
2 47601 2 fusion services 1024
3 33438 3 Highwinds Network Group 893
4 56 4 NaN NaN
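Thanks EdChum. I am getting this error: ValueError: min() arg is an empty sequence, I guess because I also have some empty cells. How can I modify this to handle that exception?
You can test whether the value is empty or wrap the call in a try/except; I will update my answer. Are the cells empty or NaN?
Use try/except for the empty ones.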
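The try/except route mentioned in the comments is not shown in the thread, so here is a minimal sketch of what it could look like. The helper name top_pair_or_nan and the column names are hypothetical, it uses max because the question asks for the highest value, and it only covers empty dicts, not cells that hold NaN instead of a dict.

```python
import numpy as np
import pandas as pd

def top_pair_or_nan(d):
    """Hypothetical helper: largest (key, value) pair, or NaNs for an empty dict."""
    try:
        k = max(d, key=d.get)
    except ValueError:
        # max() raises ValueError when the dict has no keys
        return pd.Series([np.nan, np.nan])
    return pd.Series([k, d[k]])

# One populated dict and one empty dict, as in the thread's sample data
orgs = pd.Series([{'Joyent': 16, 'Equinix (Netherlands) B.V.': 7}, {}])
out = orgs.apply(top_pair_or_nan)
out.columns = ['org', 'value']
print(out)
```

Compared with the len check, this keeps the common path free of an extra test, at the cost of hiding any other ValueError raised inside the try block.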
In [1]: import pandas as pd
In [2]: import operator
In [3]: df = pd.DataFrame({ 'id' : [0,1,2,3],
   ...: 'asn' : [3320, 47886, 47601, 33438],
   ...: 'orgs' : [{'Deutsche Telekom AG': 2288}, {'Joyent': 16, 'Equinix (Netherlands) B.V.': 7}, {'fusion services': 1024, 'GCE Global Maritime':16859}, {'Highwinds Network Group': 893}]
   ...: })
In [4]: df.orgs, df['value'] = zip(*df.orgs.apply(lambda x : sorted(x.items(),key=operator.itemgetter(1),reverse=True)[0]))
In [5]: df
asn id orgs value
0 3320 0 Deutsche Telekom AG 2288
1 47886 1 Joyent 16
2 47601 2 GCE Global Maritime 16859
3 33438 3 Highwinds Network Group 893
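I use zip(*...) over the first element of each sorted list of dict items, which splits the keys and the values into two sequences that can be assigned to two columns. For empty dicts, fall back to a placeholder pair: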
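To make the zip(*...) step concrete, this small sketch shows how it transposes a sequence of (key, value) pairs into one tuple of keys and one tuple of values. The variable names are illustrative only.

```python
# Top pair per row, as produced by sorted(...)[0] in the answer
pairs = [('Deutsche Telekom AG', 2288),
         ('Joyent', 16),
         ('GCE Global Maritime', 16859)]

# zip(*pairs) regroups the first elements together and the second elements together
names, values = zip(*pairs)

print(names)   # tuple of organisation names
print(values)  # tuple of the matching counts
```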
In [3]: df = pd.DataFrame({ 'id' : [0,1,2,3],
   ...: 'asn' : [3320, 47886, 47601, 33438],
   ...: 'orgs' : [{'Deutsche Telekom AG': 2288}, {'Joyent': 16, 'Equinix (Netherlands) B.V.': 7}, {'fusion services': 1024, 'GCE Global Maritime':16859}, {}]
   ...: })
In [4]: df.orgs.apply(lambda x : sorted(x.items(),key=operator.itemgetter(1),reverse=True)[0] if len(x) else ('',''))
0 (Deutsche Telekom AG, 2288)
1 (Joyent, 16)
2 (GCE Global Maritime, 16859)
3 (, )
Name: orgs, dtype: object
In [5]: df.orgs, df['value'] = zip(*df.orgs.apply(lambda x : sorted(x.items(),key=operator.itemgetter(1),reverse=True)[0] if len(x) else ('','')))
In [6]: df
asn id orgs value
0 3320 0 Deutsche Telekom AG 2288
1 47886 1 Joyent 16
2 47601 2 GCE Global Maritime 16859
3 33438 3
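Because only the top pair is needed, the full sort can be avoided. The following is a sketch of an alternative rather than the answerer's method: it uses the built-in max with its default argument to cover empty dicts, and the column names org and value are assumptions.

```python
import operator
import pandas as pd

# Sample frame with an empty dict in the last row, as in the session above
df = pd.DataFrame({
    'id': [0, 1, 2, 3],
    'asn': [3320, 47886, 47601, 33438],
    'orgs': [{'Deutsche Telekom AG': 2288},
             {'Joyent': 16, 'Equinix (Netherlands) B.V.': 7},
             {'fusion services': 1024, 'GCE Global Maritime': 16859},
             {}],
})

# max() scans each dict once; default supplies a placeholder for empty dicts
pairs = [max(d.items(), key=operator.itemgetter(1), default=(None, None))
         for d in df['orgs']]

# Build the two new columns in one step and attach them by index
top = pd.DataFrame(pairs, columns=['org', 'value'], index=df.index)
result = df[['id', 'asn']].join(top)
print(result)
```

Using None as the placeholder lets pandas treat the missing entries as nulls, so they can be found with isna() rather than by comparing against empty strings.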
