Python permutation within a group
Posted: 2019-12-10 14:20:47
Question: I'm trying to use itertools to find all possible combinations of a list within a group.
itertools.combinations(iterable, r)
For example, I have a CSV file containing:
customerID,storeID
C1,S1
C1,S2
C1,S3
C2,S1
C2,S2
C2,S4
C2,S5
The output I'm after is every possible pairwise combination of storeIDs for each customer. For example,
C1, S1, S2
C1, S1, S3
C1, S2, S3
C2, S1, S2
C2, S1, S4
C2, S1, S5
C2, S2, S4
C2, S2, S5
C2, S4, S5
I can easily get the combinations over all storeIDs, but I'm not sure how to do it only within each group.
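For reference, the expected rows above are exactly what itertools.combinations with r=2 yields for one customer's store list, so the only open problem is the grouping. A minimal check on the sample data, with the per-customer store lists copied by hand (the variable names are illustrative):

```python
import math
from itertools import combinations

# C1's stores from the sample CSV, typed in by hand
c1_stores = ['S1', 'S2', 'S3']
# r=2 yields every unordered pair exactly once, in input order
c1_pairs = list(combinations(c1_stores, 2))
print(c1_pairs)  # [('S1', 'S2'), ('S1', 'S3'), ('S2', 'S3')]

# C2 has 4 stores, so it should get comb(4, 2) == 6 pairs,
# matching the 6 C2 rows in the expected output
c2_pairs = list(combinations(['S1', 'S2', 'S4', 'S5'], 2))
print(len(c2_pairs), math.comb(4, 2))  # 6 6
```

math.comb needs Python 3.8 or newer.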
Answer 1: Your CSV appears to be sorted already. If that's the case, you can use itertools.groupby to grab the elements grouped by the first column:
import csv
from itertools import combinations, groupby
from operator import itemgetter

with open('myfile.csv', newline='') as fh:
    # skip header
    _ = next(fh)
    reader = csv.reader(fh)
    # itemgetter(0) will grab the first element as the grouping key
    for k, v in groupby(reader, key=itemgetter(0)):
        # v yields this customer's rows; keep only the storeID column
        chunk = [item[1] for item in v]
        group = list(combinations(chunk, 2))
        print(k, group)
C1 [('S1 ', 'S2 '), ('S1 ', 'S3 '), ('S2 ', 'S3 ')]
C2 [('S1 ', 'S2 '), ('S1 ', 'S4 '), ('S1 ', 'S5'), ('S2 ', 'S4 '), ('S2 ', 'S5'), ('S4 ', 'S5')]
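If you want the flat "customer, store, store" rows from the question rather than one list per customer, the same groupby approach can yield one tuple per pair. This is a sketch that inlines the question's sample data through io.StringIO so it runs without myfile.csv; the helper name pair_rows is hypothetical, and like the code above it assumes rows for a customer are consecutive:

```python
import csv
import io
from itertools import combinations, groupby
from operator import itemgetter

# the question's sample data, inlined so the sketch runs without a file
SAMPLE = """customerID,storeID
C1,S1
C1,S2
C1,S3
C2,S1
C2,S2
C2,S4
C2,S5
"""

def pair_rows(fh):
    """Yield (customerID, store_a, store_b) for every store pair within a customer.

    Assumes rows are already grouped by customerID, as groupby requires.
    """
    reader = csv.reader(fh)
    next(reader)  # skip header
    for customer, group_rows in groupby(reader, key=itemgetter(0)):
        stores = [row[1] for row in group_rows]
        for a, b in combinations(stores, 2):
            yield customer, a, b

rows = list(pair_rows(io.StringIO(SAMPLE)))
for row in rows:
    print(", ".join(row))  # e.g. "C1, S1, S2"
```

With a real file, pass the handle from open('myfile.csv', newline='') instead of the StringIO object.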
If it isn't sorted, you can still do this, but use a defaultdict to hold your entries:
from collections import defaultdict
from itertools import combinations
import csv

# customerID -> list of that customer's storeIDs, collected in any row order
stores = defaultdict(list)
with open('myfile.csv', newline='') as fh:
    reader = csv.reader(fh)
    # skip header
    next(reader)
    for customer, store in reader:
        stores[customer].append(store)

# pair up stores only after every row for a customer has been seen;
# running groupby on unsorted rows would split a customer into several
# chunks and miss the pairs that span chunks
groups = defaultdict(list)
for k, v in stores.items():
    groups[k].extend(combinations(v, 2))
print(groups)
defaultdict(<class 'list'>, {'C1': [('S1 ', 'S2 '), ('S1 ', 'S3 '), ('S2 ', 'S3 ')], 'C2': [('S1 ', 'S2 '), ('S1 ', 'S4 '), ('S1 ', 'S5'), ('S2 ', 'S4 '), ('S2 ', 'S5'), ('S4 ', 'S5')]})
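The reason unsorted input needs special handling is that itertools.groupby only merges consecutive rows with the same key. A small sketch on hypothetical unsorted rows (header already skipped) shows the split, plus two ways around it: collect per customer first, or sort by customerID before grouping:

```python
from collections import defaultdict
from itertools import combinations, groupby
from operator import itemgetter

# hypothetical unsorted rows: C1 and C2 each appear in two separate runs
rows = [['C1', 'S1'], ['C2', 'S1'], ['C1', 'S2'], ['C2', 'S4']]

# groupby only merges *consecutive* equal keys, so each customer shows up twice
keys = [k for k, _ in groupby(rows, key=itemgetter(0))]
print(keys)  # ['C1', 'C2', 'C1', 'C2']

# option 1: collect every store per customer first, then pair
stores = defaultdict(list)
for customer, store in rows:
    stores[customer].append(store)
by_dict = {k: list(combinations(v, 2)) for k, v in stores.items()}

# option 2: sort by customerID so groupby sees one run per customer
# (sorted is stable, so each customer's stores keep their file order)
by_sort = {
    k: list(combinations([r[1] for r in g], 2))
    for k, g in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0))
}
print(by_dict)  # {'C1': [('S1', 'S2')], 'C2': [('S1', 'S4')]}
print(by_dict == by_sort)  # True
```

Sorting costs O(n log n) but needs no extra dictionary; the defaultdict route is a single pass.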
Answer 2: Here is one way to solve this with pandas:
import itertools
import pandas as pd

df = pd.DataFrame({'customerID': ['C1', 'C1', 'C1', 'C2', 'C2', 'C2', 'C2'],
                   'storeID': ['S1', 'S2', 'S3', 'S1', 'S2', 'S4', 'S5']})

rows = []
for customerID in sorted(df['customerID'].unique()):
    # get subset of stores for this customer
    temp_df = df[df['customerID'] == customerID]
    # stores of interest, sorted so each pair comes out in a stable order
    stores = sorted(temp_df['storeID'].unique())
    for store1, store2 in itertools.combinations(stores, r=2):
        rows.append([customerID, store1, store2])

# build the result once instead of concatenating a frame per customer
output_df = pd.DataFrame(rows, columns=['customerID', 'store1', 'store2'])
You loop over the customers, subset the dataframe for each one, and create the combinations for each subset.
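A loop-free pandas alternative (a different technique from the answer above, shown as a sketch) is to merge the frame with itself on customerID, which pairs every store with every other store of the same customer, and then keep only rows where the first storeID sorts before the second. That filter drops self-pairs and keeps exactly one orientation of each pair:

```python
import pandas as pd

df = pd.DataFrame({'customerID': ['C1', 'C1', 'C1', 'C2', 'C2', 'C2', 'C2'],
                   'storeID': ['S1', 'S2', 'S3', 'S1', 'S2', 'S4', 'S5']})

# drop repeated (customer, store) rows so each pair appears once
d = df.drop_duplicates()
# self-join: every store of a customer against every store of the same customer
pairs = d.merge(d, on='customerID', suffixes=('_1', '_2'))
# '<' removes self-pairs and keeps one orientation of each unordered pair
pairs = pairs[pairs['storeID_1'] < pairs['storeID_2']]
output_df = (pairs.rename(columns={'storeID_1': 'store1', 'storeID_2': 'store2'})
                  .sort_values(['customerID', 'store1', 'store2'])
                  .reset_index(drop=True))
print(output_df)
```

The self-join materialises n x n rows per customer before filtering, so for customers with many stores the combinations-based loops use less memory.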