How to find all combinations of DataFrame rows?
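【Title】: How to find all combinations of DataFrame rows? 【Posted】: 2020-12-24 13:02:44
【Question】: I apologize if this question resembles others asked on this forum, but I could not find one that was similar enough. I have a df with 9 columns and 3 rows, and I want to find all possible combinations between these rows. I tried using combinations from the itertools package, but I can't seem to make it work. My desired output is a list of all possible combinations. Thanks.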
import pandas as pd
from itertools import combinations
df1 = pd.DataFrame({"Main1": ["Outcome1", "Outcome2", "Outcome3"],
"Main2": ["Outcome1", "Outcome2", "Outcome3"],
"Main3": ["Outcome1", "Outcome2", "Outcome3"],
"Main4": ["Outcome1", "Outcome2", "Outcome3"],
"Main5": ["Outcome1", "Outcome2", "Outcome3"],
"Main6": ["Outcome1", "Outcome2", "Outcome3"],
"Main7": ["Outcome1", "Outcome2", "Outcome3"],
"Main8": ["Outcome1", "Outcome2", "Outcome3"],
"Main9": ["Outcome1", "Outcome2", "Outcome3"]})
Main1 Main2 Main3 Main4 Main5 Main6 Main7 Main8 Main9
0 Outcome1 Outcome1 Outcome1 Outcome1 Outcome1 Outcome1 Outcome1 Outcome1 Outcome1
1 Outcome2 Outcome2 Outcome2 Outcome2 Outcome2 Outcome2 Outcome2 Outcome2 Outcome2
2 Outcome3 Outcome3 Outcome3 Outcome3 Outcome3 Outcome3 Outcome3 Outcome3 Outcome3
all_combinations = list(combinations(df1, 3))
Edit: a smaller sample and the desired output:
df1 = pd.DataFrame({"Main1": ["Outcome1", "Outcome2", "Outcome3"], "Main2": ["Outcome1", "Outcome2", "Outcome3"]})
The desired output looks like this:
[["Outcome1","Outcome1"], ["Outcome1","Outcome2"], ["Outcome1","Outcome3"], ["Outcome2","Outcome1"], ["Outcome2","Outcome2"], ["Outcome2","Outcome3"], ["Outcome3","Outcome1"], ["Outcome3","Outcome2"], ["Outcome3","Outcome3"]]
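【Comments】:
What is your expected output?
Hi! A list of all combinations of the outcomes. For example: the first combination would be Outcome1 in every row, the second would be Outcome2 in every row, the third would be Outcome2 in the first row and Outcome1 in every other row, and so on. Sorry for being unclear.
Maybe post a smaller sample and show the result for that sample.
Good idea. @RichieV I updated the post with a smaller sample and example output. Thanks!
Thanks. It looks like you want each output to contain as many items as there are columns in your df... but in your first example you were trying to get outputs of 3 items (as many as there are rows in the df)... which is it? Or it might be clearer if you posted a real sample from your dataset.
【Answer 1】: You are looking for the Cartesian product of the list with itself.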
from itertools import product
options = ['Outcome1', 'Outcome2', 'Outcome3']
result = product(options, options)
print(*result, sep='\n')
Output
('Outcome1', 'Outcome1')
('Outcome1', 'Outcome2')
('Outcome1', 'Outcome3')
('Outcome2', 'Outcome1')
('Outcome2', 'Outcome2')
('Outcome2', 'Outcome3')
('Outcome3', 'Outcome1')
('Outcome3', 'Outcome2')
('Outcome3', 'Outcome3')
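For context on why the attempt in the question did not produce this result: iterating over a DataFrame yields its column labels, so `combinations(df1, 3)` combines column names rather than cell values. The following is a minimal sketch, assuming the two-column sample from the question, that contrasts this with feeding each column's values into `product`. The variable names `label_combos` and `value_combos` are illustrative only.

```python
import pandas as pd
from itertools import combinations, product

# Two-column sample from the question's edit.
df1 = pd.DataFrame({
    "Main1": ["Outcome1", "Outcome2", "Outcome3"],
    "Main2": ["Outcome1", "Outcome2", "Outcome3"],
})

# Iterating a DataFrame yields column labels, not cell values,
# so combinations() only ever sees "Main1" and "Main2".
label_combos = list(combinations(df1, 2))
print(label_combos)

# Pick one value from each column by unpacking the columns into product().
value_combos = [list(t) for t in product(*(df1[c] for c in df1.columns))]
print(len(value_combos))
```

Unpacking the columns keeps the code unchanged if more columns are added, at the cost of the result growing multiplicatively with each column.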
【Answer 2】: Use a list comprehension
>>> [[i,j] for i in df1.Main1 for j in df1.Main2]
[['Outcome1', 'Outcome1'], ['Outcome1', 'Outcome2'], ['Outcome1', 'Outcome3'],
 ['Outcome2', 'Outcome1'], ['Outcome2', 'Outcome2'], ['Outcome2', 'Outcome3'],
 ['Outcome3', 'Outcome1'], ['Outcome3', 'Outcome2'], ['Outcome3', 'Outcome3']]
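A nested comprehension needs one `for` clause per column, so it suits the two-column sample but becomes unwieldy for nine columns. As a rough check, assuming plain lists in place of the DataFrame columns, the sketch below shows that the comprehension and `itertools.product` yield the same pairs in the same order. The names `main1`, `main2`, `nested`, and `via_product` are illustrative.

```python
from itertools import product

# Plain lists standing in for df1.Main1 and df1.Main2.
main1 = ["Outcome1", "Outcome2", "Outcome3"]
main2 = ["Outcome1", "Outcome2", "Outcome3"]

# One "for" clause per column; the rightmost clause varies fastest.
nested = [[i, j] for i in main1 for j in main2]

# product() also varies its rightmost argument fastest.
via_product = [list(pair) for pair in product(main1, main2)]

print(nested == via_product)
```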
【Answer 3】: Use itertools.product
For the smaller DataFrame
import numpy as np
import pandas as pd
from itertools import product
from pprint import pprint as pp
# Define the DataFrame
df1 = pd.DataFrame({"Main1": ["Outcome1", "Outcome2", "Outcome3"], "Main2": ["Outcome1", "Outcome2", "Outcome3"]})
# Take the product of the column values.
# Every column holds the same values, so after transposing, every row of the array is identical.
# We take the first row and repeat it to get the desired product.
all_combinations = list(product(np.transpose(df1.values)[0], repeat=2))
# Show the result
pp(all_combinations)
Output
[('Outcome1', 'Outcome1'),
('Outcome1', 'Outcome2'),
('Outcome1', 'Outcome3'),
('Outcome2', 'Outcome1'),
('Outcome2', 'Outcome2'),
('Outcome2', 'Outcome3'),
('Outcome3', 'Outcome1'),
('Outcome3', 'Outcome2'),
('Outcome3', 'Outcome3')]
For the original DataFrame
df1 = pd.DataFrame({"Main1": ["Outcome1", "Outcome2", "Outcome3"],
"Main2": ["Outcome1", "Outcome2", "Outcome3"],
"Main3": ["Outcome1", "Outcome2", "Outcome3"],
"Main4": ["Outcome1", "Outcome2", "Outcome3"],
"Main5": ["Outcome1", "Outcome2", "Outcome3"],
"Main6": ["Outcome1", "Outcome2", "Outcome3"],
"Main7": ["Outcome1", "Outcome2", "Outcome3"],
"Main8": ["Outcome1", "Outcome2", "Outcome3"],
"Main9": ["Outcome1", "Outcome2", "Outcome3"]})
all_combinations = list(product(np.transpose(df1.values)[0], repeat=3))
pp(all_combinations)
Output
[('Outcome1', 'Outcome1', 'Outcome1'),
('Outcome1', 'Outcome1', 'Outcome2'),
('Outcome1', 'Outcome1', 'Outcome3'),
('Outcome1', 'Outcome2', 'Outcome1'),
('Outcome1', 'Outcome2', 'Outcome2'),
('Outcome1', 'Outcome2', 'Outcome3'),
('Outcome1', 'Outcome3', 'Outcome1'),
('Outcome1', 'Outcome3', 'Outcome2'),
('Outcome1', 'Outcome3', 'Outcome3'),
('Outcome2', 'Outcome1', 'Outcome1'),
('Outcome2', 'Outcome1', 'Outcome2'),
('Outcome2', 'Outcome1', 'Outcome3'),
('Outcome2', 'Outcome2', 'Outcome1'),
('Outcome2', 'Outcome2', 'Outcome2'),
('Outcome2', 'Outcome2', 'Outcome3'),
('Outcome2', 'Outcome3', 'Outcome1'),
('Outcome2', 'Outcome3', 'Outcome2'),
('Outcome2', 'Outcome3', 'Outcome3'),
('Outcome3', 'Outcome1', 'Outcome1'),
('Outcome3', 'Outcome1', 'Outcome2'),
('Outcome3', 'Outcome1', 'Outcome3'),
('Outcome3', 'Outcome2', 'Outcome1'),
('Outcome3', 'Outcome2', 'Outcome2'),
('Outcome3', 'Outcome2', 'Outcome3'),
('Outcome3', 'Outcome3', 'Outcome1'),
('Outcome3', 'Outcome3', 'Outcome2'),
('Outcome3', 'Outcome3', 'Outcome3')]
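As a commenter noted, the question leaves open whether each combination should hold 3 items or one item per column. If one item per column were intended (an assumption, not something the asker confirmed), the result would have 3 to the power of 9 entries, and iterating lazily may be preferable to building the full list. A minimal sketch under that assumption, with `n_columns`, `lazy`, and `first_three` as illustrative names:

```python
from itertools import islice, product

options = ["Outcome1", "Outcome2", "Outcome3"]
n_columns = 9  # assumption: one slot per column, Main1 through Main9

# The number of results grows as len(options) ** n_columns.
total = len(options) ** n_columns
print(total)

# product() returns a lazy iterator, so nothing is materialized until consumed.
lazy = product(options, repeat=n_columns)

# Peek at the first few tuples without building the whole list.
first_three = list(islice(lazy, 3))
for combo in first_three:
    print(combo)
```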