Dynamic assignment of unique values - Python
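[Posted]: 2020-02-13 19:13:28
[Question]: I am trying to assign unique values to a specific allocation, or group. The tricky part is that these unique values start and finish dynamically, so a group keeps the values it has already seen and takes on new unique values at different time periods. In the df below, the unique values are in Place, the groups available to choose from are in Available Group, and each time period is given in Period.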
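The broad guidelines I am trying to follow are:
1) No Group may hold more than 3 unique Places at any time.
2) The currently occurring unique Places should be distributed evenly across the Groups.
3) Once a Place has been assigned to a Group, it stays there until it finishes, unless the Group becomes unavailable (NA) or the Places (meetings) are unevenly distributed.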
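The first two guidelines can be checked mechanically once an assignment exists. The following is a minimal sketch, not part of the original post: the function name check_guidelines, the cap parameter and the toy frame are hypothetical, and "evenly distributed" is assumed to mean that group sizes within a period differ by at most one. It ignores the Available Group column.

```python
import pandas as pd

def check_guidelines(assigned, cap=3):
    """Return (cap_ok, balanced_ok) for a frame with Period, Place, Group columns."""
    rows = assigned.drop_duplicates(["Period", "Place"])
    # Rows are periods, columns are groups, cells count distinct places (0 if none).
    table = (rows.groupby(["Period", "Group"])["Place"].nunique()
                 .unstack(fill_value=0))
    # Guideline 1: no group holds more than `cap` places in any period.
    cap_ok = bool((table <= cap).all().all())
    # Guideline 2 (assumed reading): sizes within a period differ by at most one.
    spread = table.max(axis=1) - table.min(axis=1)
    balanced_ok = bool((spread <= 1).all())
    return cap_ok, balanced_ok

# Hypothetical toy assignment, not the data from the question.
toy = pd.DataFrame({
    "Period": [1, 1, 1, 1, 2, 2],
    "Place": ["CLUB", "HOME", "AWAY", "WORK", "CLUB", "AWAY"],
    "Group": [1, 1, 2, 2, 1, 2],
})
print(check_guidelines(toy))
```

A check like this only reports whether a given result respects the cap and the balance; it does not produce an assignment.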
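To keep track of how many Places are currently occurring, I have included Total, which is based on whether a Place value appears again. I am meeting the first two guidelines and part of the third: when a Place is assigned to a Group, it stays in the same Group until the Place finishes (does not appear again).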
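One possible reading of "currently occurring" is that a Place is active from the first period in which it appears to the last. The sketch below works on a hypothetical toy frame and is not claimed to reproduce the Total column of the question's data, which may follow a different rule.

```python
import pandas as pd

# Hypothetical toy data, not the frame from the question.
toy = pd.DataFrame({
    "Period": [1, 2, 2, 3, 3, 4],
    "Place": ["CLUB", "CLUB", "HOME", "HOME", "AWAY", "AWAY"],
})

# First and last period in which each place occurs.
span = toy.groupby("Place")["Period"].agg(Start="min", End="max")

# A place counts as active in period p when Start <= p <= End.
periods = sorted(int(p) for p in toy["Period"].unique())
active = {p: int(((span["Start"] <= p) & (span["End"] >= p)).sum())
          for p in periods}
print(active)
```

Here CLUB spans periods 1 to 2, HOME 2 to 3 and AWAY 3 to 4, so two places overlap in periods 2 and 3.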
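However, I am not consulting Available Group to see whether a Group is actually available. When a Group becomes unavailable, I would like to redistribute its Places among the remaining available Groups. With the df below, the places are allocated well while the number of unique places is growing. But once places start to finish and Group 2 becomes unavailable, the remaining places are not reassigned to Group 1, even though only 3 places are occurring at that point.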
import pandas as pd

df = pd.DataFrame({
    'Period' : [1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,5,6,6],
    'Place' : ['CLUB','CLUB','CLUB','HOME','HOME','AWAY','AWAY','WORK','WORK','AWAY','AWAY','GOLF','GOLF','CLUB','CLUB','POOL','POOL','HOME','HOME','WORK','WORK','AWAY','AWAY','POOL','POOL','TENNIS','TENNIS'],
    'Total' : [1,1,1,2,2,3,3,4,4,4,4,5,5,4,4,4,4,4,4,4,4,4,4,4,4,5,5],
    'Available Group' : ['1','2','1','2','1','2','1','2','1','1','2','1','2','2','1','2','1','2','1','2','1','1','2','1','2','2','1'],
})
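The reassignment described above could be sketched as a greedy step that runs once per period. This is only an illustration under stated assumptions: the function rebalance and its arguments are hypothetical, a place keeps its previous group when that group is still available and below the cap, and every other place goes to the least-loaded available group. It does not move places that could stay, so it covers guideline 2 only partially.

```python
def rebalance(current, active_places, available_groups, cap=3):
    """Map each active place to an available group.

    current: dict place -> group from the previous period (may be empty).
    """
    if not available_groups:
        raise ValueError("no group is available")
    load = {g: 0 for g in available_groups}
    assignment = {}
    pending = []
    # First pass: keep places whose previous group is still usable.
    for place in active_places:
        g = current.get(place)
        if g in load and load[g] < cap:
            assignment[place] = g
            load[g] += 1
        else:
            pending.append(place)
    # Second pass: send the rest to the least-loaded group (ties: lowest id).
    for place in pending:
        g = min(load, key=lambda k: (load[k], k))
        if load[g] >= cap:
            raise ValueError("not enough capacity in the available groups")
        assignment[place] = g
        load[g] += 1
    return assignment

# Group 2 drops out, so WORK has to move to group 1.
previous = {"AWAY": 1, "WORK": 2, "POOL": 1}
print(rebalance(previous, ["AWAY", "WORK", "POOL"], [1]))
```

In the example, AWAY and POOL stay where they were and WORK moves to group 1, which is the kind of move the question asks for when a group becomes unavailable.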
Attempt:
# df to store all unique places
uniquePlaces = pd.DataFrame(df["Place"].unique(), columns=["Place"])
# Start stores the index of df where the place first appears
uniquePlaces["Start"] = -1
# End stores the index of df where the place last appears
uniquePlaces["End"] = -1

def assign_place_label(group):
    ''' Create a label column that calculates the amount of unique meetings
    throughout the racing schedule '''
    label = uniquePlaces[uniquePlaces["Place"] == group.name].index[0]
    group["Place Label"] = label
    uniquePlaces.loc[label, "Start"] = group.index.min()
    uniquePlaces.loc[label, "End"] = group.index.max()
    return group

# Based on the Start and End of each place, assign an index to each place.
# Once an index is 'freed', it is reused by a new place appearing after that
def Assign_Meetings_group(up):
    up["Index"] = 0
    up["Freed"] = False
    max_ind = 0
    free_indx = []
    for i in range(len(up)):
        # places that ended before place i starts and are not yet released
        ind_freed = up.index[(up["End"] < up.iloc[i]["Start"]) & (~up["Freed"])]
        free = list(up.loc[ind_freed, "Index"])
        free_indx += free
        up.loc[ind_freed, "Freed"] = True
        if len(free_indx) > 0:
            m = min(free_indx)
            up.loc[i, "Index"] = m
            free_indx.remove(m)
        else:
            up.loc[i, "Index"] = max_ind
            max_ind += 1
    # three consecutive indexes share one group
    up["Group"] = up["Index"]//3 + 1
    return up

# group_keys=False keeps the original row index on pandas 2.x
df2 = df.groupby("Place", group_keys=False).apply(assign_place_label)
uniquePlaces = Assign_Meetings_group(uniquePlaces)
df3 = df2[df2['Period']!=0].drop_duplicates(subset = ['Period','Place'])
result = df3.merge(uniquePlaces[["Group"]], how="left", left_on="Place Label", right_index=True, sort=False)
Output:
Period Place Total Available Group Place Label Group
0 1 CLUB 1 1 0 1
1 2 CLUB 1 1 0 1
3 2 HOME 2 1 1 1
5 2 AWAY 3 1 2 1
7 3 WORK 4 1 3 2
9 3 AWAY 4 1 2 1
11 3 GOLF 5 1 4 2
13 3 HOME 5 1 1 1
15 4 CLUB 4 1 0 1
17 4 AWAY 3 1 2 1
19 4 POOL 3 1 5 1
21 5 WORK 3 1 3 2
23 5 POOL 2 1 5 1
25 6 GOLF 1 1 4 2
Expected output:
Period Place Total Available Group Place Label Group
0 1 CLUB 1 1 0 1
1 2 CLUB 1 1 0 1
3 2 HOME 2 1 1 1
5 2 AWAY 3 1 2 1
7 3 WORK 4 1 3 2
9 3 AWAY 4 1 2 1
11 3 GOLF 5 1 4 2
13 3 HOME 5 1 1 1
15 4 CLUB 4 1 0 1
17 4 AWAY 3 1 2 1
19 4 POOL 3 1 5 1
21 5 WORK 3 1 3 1
23 5 POOL 2 1 5 1
25 6 GOLF 1 1 4 1
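The index-reuse idea in the attempt above (take the smallest freed index, otherwise open a new one) can also be written with two heaps. This is a sketch rather than the poster's code: assign_slots is a hypothetical name, and the intervals are the first and last row numbers of each place read off the df in the question, in order of first appearance.

```python
import heapq

def assign_slots(intervals):
    """intervals: (start, end) pairs sorted by start, with end inclusive.
    Returns one slot index per interval, reusing the smallest freed slot."""
    free = []       # min-heap of released slot indexes
    busy = []       # min-heap of (end, slot) for intervals still running
    next_slot = 0
    slots = []
    for start, end in intervals:
        # Release every slot whose interval ended before this one starts.
        while busy and busy[0][0] < start:
            _, slot = heapq.heappop(busy)
            heapq.heappush(free, slot)
        if free:
            slot = heapq.heappop(free)
        else:
            slot = next_slot
            next_slot += 1
        heapq.heappush(busy, (end, slot))
        slots.append(slot)
    return slots

# CLUB, HOME, AWAY, WORK, GOLF, POOL, TENNIS
intervals = [(0, 14), (3, 18), (5, 22), (7, 20), (11, 12), (15, 24), (25, 26)]
slots = assign_slots(intervals)
groups = [s // 3 + 1 for s in slots]   # three slots per group
print(slots, groups)
```

Like the attempt, this never looks at Available Group, so a place whose slot maps to group 2 stays there even after group 2 becomes unavailable.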
[Answer 1]: Here is a solution to the problem; the details are in the code comments.
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Period' : [1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,5,6,6],
    'Place' : ['CLUB','CLUB','CLUB','HOME','HOME','AWAY','AWAY','WORK','WORK','AWAY','AWAY','GOLF','GOLF','CLUB','CLUB','POOL','POOL','HOME','HOME','WORK','WORK','AWAY','AWAY','POOL','POOL','TENNIS','TENNIS'],
    'Total' : [1,1,1,2,2,3,3,4,4,4,4,5,5,4,4,4,4,4,4,4,4,4,4,4,4,5,5],
    'Available Group' : ['1','2','1','2','1','2','1','2','1','1','2','1','2','2','1','2','1','2','1','2','1','1','2','1','2','2','1'],
})

# df to store all unique places
uniquePlaces = pd.DataFrame(df["Place"].unique(), columns=["Place"])
# Start stores the index of df where the place first appears
uniquePlaces["Start"] = -1
# End stores one past the index of df where the place last appears
uniquePlaces["End"] = -1

def assign_place_label(group):
    ''' Create a label column that calculates the amount of unique meetings
    throughout the racing schedule '''
    label = uniquePlaces[uniquePlaces["Place"] == group.name].index[0]
    group["Place Label"] = label
    uniquePlaces.loc[label, "Start"] = group.index.min()
    uniquePlaces.loc[label, "End"] = group.index.max()+1
    return group

# group_keys=False keeps the original row index on pandas 2.x;
# calc_groups relies on df2 being in the original row order
df2 = df.groupby("Place", group_keys=False).apply(assign_place_label)

def calc_groups(uniquePlaces, df2):
    ## groups need to change only when a place starts or finishes
    change_points = np.sort(uniquePlaces[["Start", "End"]].values.ravel()).reshape(-1,1)
    ## for each change point find boolean indexes for places (True if the place is in use at that point)
    inds = (change_points>=uniquePlaces["Start"].values) & (change_points<uniquePlaces["End"].values)
    ## all available indexes for places
    all_ind = set(uniquePlaces.index.values+1)
    prev_ind = np.array([0]*len(all_ind))
    result = []
    for ind in inds:
        ## copy prev_ind where the place exists
        new_ind = prev_ind * ind
        ## mark places with an index greater than the number of active places with -1
        new_ind[new_ind>sum(ind)] = -1
        ## mark existing places with index 0 with -1
        new_ind[(new_ind==0) & ind] = -1
        available_ind = all_ind - set(new_ind[new_ind>0])
        ## replace indexes marked by -1 with the minimum values from available_ind
        for i in range(len(new_ind)):
            if new_ind[i]==-1:
                new_ind[i] = min(available_ind)
                available_ind.remove(new_ind[i])
        result.append(new_ind)
        prev_ind = new_ind
    result = np.r_[result]
    repeats = np.r_[change_points[1:] - change_points[:-1], [[0]]].ravel()
    ## place indexes are calculated only at change points; now fill the gap between change points
    ## by repeating the index in the gap
    result = np.repeat(result, repeats, axis=0)
    df2["group"] = (result[np.arange(len(result)), df2["Place Label"].values]-1)//3 + 1
    return df2

df2 = calc_groups(uniquePlaces, df2)
print(df2.drop_duplicates(subset=['Period','Place']))
Result:
Period Place Total Available Group Place Label group
0 1 CLUB 1 1 0 1
1 2 CLUB 1 2 0 1
3 2 HOME 2 2 1 1
5 2 AWAY 3 2 2 1
7 3 WORK 4 2 3 2
9 3 AWAY 4 1 2 1
11 3 GOLF 5 1 4 2
13 4 CLUB 4 2 0 1
15 4 POOL 4 2 5 1
17 4 HOME 4 2 1 1
19 5 WORK 4 2 3 1
21 5 AWAY 4 1 2 1
23 5 POOL 4 1 5 1
25 6 TENNIS 5 2 6 1
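Two NumPy steps in calc_groups carry most of the work: broadcasting the change points against the Start/End arrays to see which places are in use, and np.repeat to stretch the per-change-point state back to one row per original row. The sketch below isolates those two steps on hypothetical half-open ranges; it uses np.unique where the answer uses np.sort, which only drops duplicate change points that would be repeated zero times anyway.

```python
import numpy as np

# Hypothetical half-open row ranges [start, end) for three places.
start = np.array([0, 2, 5])
end = np.array([4, 6, 7])

# Every start or end is a point where the set of active places can change.
change_points = np.unique(np.concatenate([start, end])).reshape(-1, 1)

# Broadcasting a (k, 1) column against (n,) arrays gives a (k, n) matrix:
# active[i, j] is True when place j is in use at change point i.
active = (change_points >= start) & (change_points < end)

# Each state holds until the next change point; the last point covers 0 rows.
repeats = np.append(np.diff(change_points.ravel()), 0)
per_row = np.repeat(active, repeats, axis=0)

print(per_row.shape)        # one row per original row, one column per place
print(per_row.sum(axis=1))  # number of active places at each row
```

Because the state is only computed at change points, the loop in calc_groups runs once per start or end rather than once per row of the data.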