Merge the data created in a loop (python)
【Title】: Merge the data created in a loop (python) 【Posted】: 2021-06-07 22:11:14 【Question】: I have a simple dataset:
import pandas as pd
data = [['A', 10,16], ['B', 15,11], ['C', 14,8]]
df = pd.DataFrame(data, columns = ['Name', 'Apple','Pear'])
Output
Name Apple Pear
0 A 10 16
1 B 15 11
2 C 14 8
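I want to rank the places by how close their fruit counts are, separately for apples and for pears. Rules:
- Compute the difference in apples, and in pears, between every pair of places.
- Rank the differences per place. The closer two places are in quantity, the lower the rank.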
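The first rule can be illustrated on the apple counts alone. This is a minimal sketch, assuming the three counts from the dataset above; the names `apple` and `pairwise` are only illustrative. It shows the matrix that the broadcasting expression in the code below produces.

```python
import numpy as np

# Apple counts for places A, B, C (taken from the dataset above).
apple = np.array([10, 15, 14])

# apple[:, None] has shape (3, 1), so the subtraction broadcasts to a 3 x 3
# matrix whose entry [i, j] is apple[j] - apple[i].
pairwise = np.abs(apple - apple[:, None])
print(pairwise)
# [[0 5 4]
#  [5 0 1]
#  [4 1 0]]
```

The zeros on the diagonal are each place compared with itself, which is why the code below filters out zero differences before ranking.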
# apple
dif = abs(df['Apple'].values - df['Apple'].values[:, None])
df_apple = pd.concat((df['Name'], pd.DataFrame(dif, columns = df['Name'])), axis=1)
df_apple1 = pd.melt(df_apple, id_vars = ['Name'], value_name='Difference_apple')
df_apple1 = df_apple1[df_apple1.Difference_apple != 0]
df_apple1['Ranking_apple'] = df_apple1.groupby('variable')['Difference_apple'].rank(method = 'dense', ascending = True)
df_apple1 = df_apple1[["variable","Name","Ranking_apple"]]
df_apple1
# Output - apple
variable Name Ranking_apple
1 A B 2.0
2 A C 1.0
3 B A 2.0
5 B C 1.0
6 C A 2.0
7 C B 1.0
# pear
dif = abs(df['Pear'].values - df['Pear'].values[:, None])
df_pear = pd.concat((df['Name'], pd.DataFrame(dif, columns = df['Name'])), axis=1)
df_pear1 = pd.melt(df_pear, id_vars = ['Name'], value_name='Difference_pear')
df_pear1 = df_pear1[df_pear1.Difference_pear != 0]
df_pear1['Ranking_pear'] = df_pear1.groupby('variable')['Difference_pear'].rank(method = 'dense', ascending = True)
df_pear1 = df_pear1[["variable","Name","Ranking_pear"]]
df_pear1
# output-pear
variable Name Ranking_pear
1 A B 1.0
2 A C 2.0
3 B A 2.0
5 B C 1.0
6 C A 2.0
7 C B 1.0
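That is the algorithm for each fruit. Since the logic is identical, I could run it in a loop over the fruits. I am not sure how to merge the two parts, because I need the final output to look like this: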
new_df = pd.merge(df_apple1, df_pear1, how='inner', left_on=['variable','Name'], right_on = ['variable','Name'])
new_df = new_df[["variable","Name","Ranking_apple","Ranking_pear"]]
new_df
# output
variable Name Ranking_apple Ranking_pear
0 A B 2.0 1.0
1 A C 1.0 2.0
2 B A 2.0 2.0
3 B C 1.0 1.0
4 C A 2.0 2.0
5 C B 1.0 1.0
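I'd appreciate any ideas. Thanks.
【Comments】:
What is the question? It looks like you already have the expected output. Are you just looking to generalize it?
Yes, I want to use one algorithm for multiple columns. Thanks.
Great, hopefully the answer gives you what you need.
【Answer 1】: If you want to generalize your approach to any number of fruits, you can do the following: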
import pandas as pd

data = [['A', 10,16], ['B', 15,11], ['C', 14,8]]
df = pd.DataFrame(data, columns = ['Name', 'Apple','Pear'])
# all fruit: every column except 'Name'
final = pd.DataFrame()
fruitcols = df.columns.values.tolist()
fruitcols.remove('Name')
for col in fruitcols:
    # pairwise absolute differences between places for this fruit
    dif = abs(df[col].values - df[col].values[:, None])
    diff_col = 'Difference_{}'.format(col)
    rank_col = 'Ranking_{}'.format(col)
    df_frt = pd.concat((df['Name'], pd.DataFrame(dif, columns = df['Name'])), axis=1)
    df_frt1 = pd.melt(df_frt, id_vars = ['Name'], value_name=diff_col)
    # drop zero differences (each place compared with itself)
    df_frt1 = df_frt1[df_frt1[diff_col] != 0].copy()
    df_frt1[rank_col] = df_frt1.groupby('variable')[diff_col].rank(method = 'dense', ascending = True)
    df_frt1 = df_frt1[["variable","Name",rank_col]]
    # concat aligns on the row index, which is the same for every fruit
    # as long as only the self-comparisons were dropped
    final = pd.concat([final, df_frt1], axis=1)
# keep only the first copy of the repeated 'variable' and 'Name' columns
final = final.loc[:,~final.columns.duplicated()]
final
variable Name Ranking_Apple Ranking_Pear
1 A B 2.0 1.0
2 A C 1.0 2.0
3 B A 2.0 2.0
5 B C 1.0 1.0
6 C A 2.0 2.0
7 C B 1.0 1.0
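An alternative sketch, not part of the original answer: wrap the per-fruit logic in a function and combine the pieces with a key-based merge, as the question does for two fruits. The helper names `rank_one_column` and `rank_all` are hypothetical. One deliberate difference, stated as an assumption about the intent: self-comparisons are dropped by comparing names rather than by testing for a zero difference, so two different places with equal counts are kept.

```python
from functools import reduce

import numpy as np
import pandas as pd


def rank_one_column(df, col, name_col="Name"):
    """Dense-rank the pairwise absolute differences of one column, in long format."""
    values = df[col].to_numpy()
    names = df[name_col].to_numpy()
    # Broadcasting builds an n x n matrix of absolute differences.
    wide = pd.DataFrame(np.abs(values - values[:, None]), columns=names)
    wide.insert(0, name_col, names)
    long = wide.melt(id_vars=[name_col], var_name="variable", value_name="diff")
    # Assumption: drop self-pairs by name, not by diff != 0, so ties survive.
    long = long[long[name_col] != long["variable"]].copy()
    rank_col = f"Ranking_{col}"
    long[rank_col] = long.groupby("variable")["diff"].rank(method="dense", ascending=True)
    return long[["variable", name_col, rank_col]]


def rank_all(df, name_col="Name"):
    """Rank every column except name_col and merge the results on the pair keys."""
    parts = [rank_one_column(df, c, name_col) for c in df.columns if c != name_col]
    return reduce(
        lambda left, right: left.merge(right, on=["variable", name_col], how="inner"),
        parts,
    )


data = [["A", 10, 16], ["B", 15, 11], ["C", 14, 8]]
df = pd.DataFrame(data, columns=["Name", "Apple", "Pear"])
result = rank_all(df)
print(result)
```

Because the merge matches rows on `variable` and `Name`, the result does not depend on every fruit producing the same row index, which the column-wise `pd.concat` approach relies on.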
【Comments】: