Find the largest-value rows within each group of a column and pivot in pandas

Posted 2023-03-11


Question (originally posted 2018-12-13 06:17:06):

I have the following pandas DataFrame:

id  val city    
4   78  a   
4   12  b   
4   50  c   

9   20  d   
9   8   e   
9   30  f   
9   17  g

I want to reshape it as shown below. Within each "id" group, keep the n rows with the largest "val" (n=2 in this example): 78 and 50 for id 4, and 30 and 20 for id 9.

id  val city    
4   78  a   
4   50  c   

9   30  f   
9   20  d

Finally, pivot the table as follows:

id  c_1stLrgst  c_1Lrgst_val    c_2ndLrgst  c_2Lrgst_val...c_nLrgst c_nLrgst_val
4   a           78              c           50
9   f           30              d           20

I can use df.groupby('id').nlargest(2, 'val') to get these groups, but I don't know what to do next.
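In the pandas versions I'm familiar with, nlargest is exposed on a SeriesGroupBy (a single selected column) rather than taking a column argument on the whole DataFrameGroupBy. The sketch below, assuming that behavior, selects the top-n "val" per id from the grouped Series and then uses the original row labels (the last index level) to pull the full rows back out of df. The variable names n, top_idx and top are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [4, 4, 4, 9, 9, 9, 9],
    'val': [78, 12, 50, 20, 8, 30, 17],
    'city': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
})

n = 2  # number of largest rows to keep per id

# SeriesGroupBy.nlargest returns a Series indexed by (id, original row label);
# the last index level holds the original row labels.
top_idx = df.groupby('id')['val'].nlargest(n).index.get_level_values(-1)

# Use those labels to recover the full rows (id, val, city).
top = df.loc[top_idx]
print(top)
```

With the sample data this keeps rows (4, 78, a), (4, 50, c), (9, 30, f) and (9, 20, d). Ties are resolved by nlargest's default keep='first'.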

import pandas as pd

# sample data from the question (the dict literal needs braces)
df_dict = {'id': [4, 4, 4, 9, 9, 9, 9],
           'val': [78, 12, 50, 20, 8, 30, 17],
           'city': ['a', 'b', 'c', 'd', 'e', 'f', 'g']}
df = pd.DataFrame(df_dict)


Answer 1:

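You can use sort_values + groupby.head, then a second groupby that aggregates each column into a list. Then split the lists into separate columns and concatenate the results.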

# sort by "val" descending and extract first 2 rows from each group
df_filtered = df.sort_values('val', ascending=False)\
                .groupby('id').head(2)

groupvars = ['city', 'val']

# group by id and collect the city and val values of each group into lists
g = df_filtered.groupby('id')[groupvars].agg(list)

# split each list column into one column per rank; index on g (res does not exist yet)
L = [pd.DataFrame(g[x].values.tolist(), index=g.index).add_prefix(x) for x in groupvars]

# concatenate results
res = pd.concat(L, axis=1)

print(res)

   city0 city1  val0  val1
id                        
4      a     c    78    50
9      f     d    30    20
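An alternative way to do the pivot step is to number the kept rows within each id with groupby.cumcount and then call DataFrame.pivot, which builds the wide table directly and generalizes to any n. This is a sketch, not the answerer's code: the rank column and the city_1, val_1, ... column names are hypothetical choices that follow the question's "one city column plus one value column per rank" layout. If a group has fewer than n rows, the missing cells come out as NaN.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [4, 4, 4, 9, 9, 9, 9],
    'val': [78, 12, 50, 20, 8, 30, 17],
    'city': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
})
n = 2

# Keep the n largest rows per id (same idea as the answer's first step).
top = df.sort_values('val', ascending=False).groupby('id').head(n)

# Hypothetical 'rank' column: 1 for the largest val in each id, 2 for the next, ...
top = top.assign(rank=top.groupby('id').cumcount() + 1)

# One row per id; the columns form a MultiIndex of (field, rank).
wide = top.pivot(index='id', columns='rank', values=['city', 'val'])

# Reorder to (rank, field) so each rank's city and val sit side by side.
wide = wide.swaplevel(axis=1).sort_index(axis=1)

# Flatten to city_1, val_1, city_2, val_2, ...
wide.columns = [f'{field}_{rank}' for rank, field in wide.columns]
print(wide)
```

With the sample data, id 4 becomes a, 78, c, 50 and id 9 becomes f, 30, d, 20, matching the target table apart from the column names.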

Comments:

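The first part works great! However, I need to explain the pivot better. I don't need the city names as column headers, so I've edited the question. I only need two columns per top-n value: one for the city name and one for the city's value. If I use city as the columns, I get as many columns as there are cities, instead of just city 1 and city 2.

@orak, please see the update; hopefully this is closer to what you're looking for.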
