How to find all combinations of DataFrame rows?

【Posted】: 2020-12-24 13:02:44

【Question】:

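Apologies if this question is similar to one already asked on this forum; I could not find one that was close enough. I have a df with 9 columns and 3 rows, and I want to find all possible combinations between these rows. I tried using combinations from the itertools package, but I can't get it to work. My desired output is a list of all possible combinations. Thanks, and sorry again if this duplicates another question.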

import pandas as pd
from itertools import combinations

df1 = pd.DataFrame({"Main1": ["Outcome1", "Outcome2", "Outcome3"],
                    "Main2": ["Outcome1", "Outcome2", "Outcome3"],
                    "Main3": ["Outcome1", "Outcome2", "Outcome3"],
                    "Main4": ["Outcome1", "Outcome2", "Outcome3"],
                    "Main5": ["Outcome1", "Outcome2", "Outcome3"],
                    "Main6": ["Outcome1", "Outcome2", "Outcome3"],
                    "Main7": ["Outcome1", "Outcome2", "Outcome3"],
                    "Main8": ["Outcome1", "Outcome2", "Outcome3"],
                    "Main9": ["Outcome1", "Outcome2", "Outcome3"]})

    Main1   Main2   Main3   Main4   Main5   Main6   Main7   Main8   Main9
0   Outcome1    Outcome1    Outcome1    Outcome1    Outcome1    Outcome1    Outcome1    Outcome1    Outcome1
1   Outcome2    Outcome2    Outcome2    Outcome2    Outcome2    Outcome2    Outcome2    Outcome2    Outcome2
2   Outcome3    Outcome3    Outcome3    Outcome3    Outcome3    Outcome3    Outcome3    Outcome3    Outcome3

all_combinations = list(combinations(df1, 3))
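
A note on why the attempt above does not give the expected result: iterating over a DataFrame yields its column labels, so combinations(df1, 3) combines the nine column names rather than the outcome values. The sketch below rebuilds the same 9-column frame with a dict comprehension (a shorthand that is not in the original post) and shows what the call actually returns.

```python
import pandas as pd
from itertools import combinations

# Same 9-column, 3-row frame as above, built with a dict comprehension
# (shorthand introduced here for brevity; not part of the original post).
outcomes = ["Outcome1", "Outcome2", "Outcome3"]
df1 = pd.DataFrame({f"Main{i}": outcomes for i in range(1, 10)})

# Iterating over a DataFrame yields column labels, not rows or cell values.
labels = list(df1)

# So this combines the labels: every 3-element subset of the 9 column names.
attempt = list(combinations(df1, 3))

print(labels[:3])
print(attempt[0])
print(len(attempt))  # number of 3-element subsets of 9 labels
```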

Edit: a smaller sample and the desired output:

df1 = pd.DataFrame({"Main1": ["Outcome1", "Outcome2", "Outcome3"], "Main2": ["Outcome1", "Outcome2", "Outcome3"]})

The desired output looks like this:

[["Outcome1","Outcome1"], ["Outcome1","Outcome2"], ["Outcome1","Outcome3"], ["Outcome2","Outcome1"], ["Outcome2","Outcome2"], ["Outcome2","Outcome3"], ["Outcome3","Outcome1"], ["Outcome3","Outcome2"], ["Outcome3","Outcome3"]] 

【Comments】:

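What is your expected output?

Hi! A list of all combinations of outcomes. For example: the first combination would be only Outcome1 in every row, the second would be only Outcome2 in every row, the third would be Outcome2 in the first row and Outcome1 in every other row, and so on. Sorry for being unclear.

Maybe post a smaller sample and show the result for that sample.

Good idea. @RichieV I updated the post with a smaller sample and example output. Thanks!

Thanks. It looks like you want each output to contain as many items as there are columns in your df... but in your first example you were trying to get outputs of 3 items (as many as there are rows in the df)... which is it? Alternatively, it might be clearer if you posted a real sample from your dataset.

【Answer 1】: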

You are looking for the Cartesian product of the list with itself.

from itertools import product

options = ['Outcome1', 'Outcome2', 'Outcome3']

result = product(options, options)
print(*result, sep='\n')

Output

('Outcome1', 'Outcome1')
('Outcome1', 'Outcome2')
('Outcome1', 'Outcome3')
('Outcome2', 'Outcome1')
('Outcome2', 'Outcome2')
('Outcome2', 'Outcome3')
('Outcome3', 'Outcome1')
('Outcome3', 'Outcome2')
('Outcome3', 'Outcome3')
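
The question asks for a list of lists, while product yields tuples lazily. A minimal sketch of the same idea using the repeat argument of itertools.product, with each tuple converted to a list; the names pairs and pairs_as_lists are illustrative only.

```python
from itertools import product

options = ["Outcome1", "Outcome2", "Outcome3"]

# repeat=2 is equivalent to passing the same iterable twice.
pairs = list(product(options, repeat=2))

# product yields tuples; convert each one if lists are required.
pairs_as_lists = [list(pair) for pair in pairs]

print(len(pairs_as_lists))
print(pairs_as_lists[:3])
```

With repeat, the number of items per combination becomes a single parameter instead of a repeated argument.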

【Answer 2】:

Use a list comprehension

>>> [[i,j] for i in df1.Main1 for j in df1.Main2]
[['Outcome1', 'Outcome1'], ['Outcome1', 'Outcome2'], ['Outcome1', 'Outcome3'],
 ['Outcome2', 'Outcome1'], ['Outcome2', 'Outcome2'], ['Outcome2', 'Outcome3'],
 ['Outcome3', 'Outcome1'], ['Outcome3', 'Outcome2'], ['Outcome3', 'Outcome3']]
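
The nested comprehension needs one for clause per column, which gets unwieldy beyond two columns. As a sketch of one way to generalize it, assuming one value should be picked from each column, the columns can be unpacked into itertools.product; the variable name columns is illustrative.

```python
import pandas as pd
from itertools import product

df1 = pd.DataFrame({
    "Main1": ["Outcome1", "Outcome2", "Outcome3"],
    "Main2": ["Outcome1", "Outcome2", "Outcome3"],
})

# One list of values per column; product picks one value from each.
columns = [df1[name].tolist() for name in df1.columns]

# Convert each tuple to a list to match the desired output format.
all_combinations = [list(combo) for combo in product(*columns)]

print(len(all_combinations))
```

Because the columns are read from df1.columns, the same code works unchanged if more columns are added.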

【Answer 3】:

Using itertools.product

For the smaller DataFrame

import numpy as np
import pandas as pd
from itertools import product
from pprint import pprint as pp

# Define dataframe
df1 = pd.DataFrame({"Main1": ["Outcome1", "Outcome2", "Outcome3"], "Main2": ["Outcome1", "Outcome2", "Outcome3"]})

# Take the product of the row values.
# Every column holds the same values, so after transposing, all rows are identical.
# We take the first row and repeat it to get the desired product.
all_combinations = list(product(np.transpose(df1.values)[0], repeat=2))

# Show result
pp(all_combinations)

Output

[('Outcome1', 'Outcome1'),
 ('Outcome1', 'Outcome2'),
 ('Outcome1', 'Outcome3'),
 ('Outcome2', 'Outcome1'),
 ('Outcome2', 'Outcome2'),
 ('Outcome2', 'Outcome3'),
 ('Outcome3', 'Outcome1'),
 ('Outcome3', 'Outcome2'),
 ('Outcome3', 'Outcome3')]

For the original DataFrame

df1 = pd.DataFrame({"Main1": ["Outcome1", "Outcome2", "Outcome3"],
                    "Main2": ["Outcome1", "Outcome2", "Outcome3"],
                    "Main3": ["Outcome1", "Outcome2", "Outcome3"],
                    "Main4": ["Outcome1", "Outcome2", "Outcome3"],
                    "Main5": ["Outcome1", "Outcome2", "Outcome3"],
                    "Main6": ["Outcome1", "Outcome2", "Outcome3"],
                    "Main7": ["Outcome1", "Outcome2", "Outcome3"],
                    "Main8": ["Outcome1", "Outcome2", "Outcome3"],
                    "Main9": ["Outcome1", "Outcome2", "Outcome3"]})
all_combinations = list(product(np.transpose(df1.values)[0], repeat=3))

pp(all_combinations)

Output

[('Outcome1', 'Outcome1', 'Outcome1'),
 ('Outcome1', 'Outcome1', 'Outcome2'),
 ('Outcome1', 'Outcome1', 'Outcome3'),
 ('Outcome1', 'Outcome2', 'Outcome1'),
 ('Outcome1', 'Outcome2', 'Outcome2'),
 ('Outcome1', 'Outcome2', 'Outcome3'),
 ('Outcome1', 'Outcome3', 'Outcome1'),
 ('Outcome1', 'Outcome3', 'Outcome2'),
 ('Outcome1', 'Outcome3', 'Outcome3'),
 ('Outcome2', 'Outcome1', 'Outcome1'),
 ('Outcome2', 'Outcome1', 'Outcome2'),
 ('Outcome2', 'Outcome1', 'Outcome3'),
 ('Outcome2', 'Outcome2', 'Outcome1'),
 ('Outcome2', 'Outcome2', 'Outcome2'),
 ('Outcome2', 'Outcome2', 'Outcome3'),
 ('Outcome2', 'Outcome3', 'Outcome1'),
 ('Outcome2', 'Outcome3', 'Outcome2'),
 ('Outcome2', 'Outcome3', 'Outcome3'),
 ('Outcome3', 'Outcome1', 'Outcome1'),
 ('Outcome3', 'Outcome1', 'Outcome2'),
 ('Outcome3', 'Outcome1', 'Outcome3'),
 ('Outcome3', 'Outcome2', 'Outcome1'),
 ('Outcome3', 'Outcome2', 'Outcome2'),
 ('Outcome3', 'Outcome2', 'Outcome3'),
 ('Outcome3', 'Outcome3', 'Outcome1'),
 ('Outcome3', 'Outcome3', 'Outcome2'),
 ('Outcome3', 'Outcome3', 'Outcome3')]
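
The comments on the question left open whether each combination should hold 3 items (one per row) or 9 items (one per column). The output above uses repeat=3. If one item per column is wanted instead, the same call applies with repeat=9, and the result grows to 3 ** 9 tuples. A minimal sketch under that assumption; n_columns is an illustrative name, and islice is used so that only the first few tuples are materialized.

```python
from itertools import islice, product

options = ["Outcome1", "Outcome2", "Outcome3"]
n_columns = 9  # assumption: one item per column of the original 9-column frame

# Number of tuples product would yield: len(options) ** n_columns
total = len(options) ** n_columns

# product is lazy, so a few tuples can be inspected without building the full list.
first_two = list(islice(product(options, repeat=n_columns), 2))

print(total)
for combo in first_two:
    print(combo)
```

Iterating lazily like this keeps memory use flat, which matters more as the number of columns or outcomes grows.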
