Dynamic assignment of unique values - Python

Posted: 2020-02-13 19:13:28

Question:

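I am trying to assign unique values to specific groups. The tricky part is that these unique values start and end dynamically, so a group keeps the values it has already seen and takes on new unique values in different time periods. In the df, the unique values are in Place, the groups available to choose from are in Available Group, and each time period is in Period.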

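The broad guidelines I am trying to follow are:

1) Each Group cannot hold more than 3 unique Places at any one time

2) The current unique Places should be evenly distributed across the Groups

3) Once a Place is assigned to a Group, keep it there until the Group has finished, unless the Group becomes NA or the meetings are unevenly distributed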
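Guideline 1 can be checked mechanically on any candidate assignment. The following is only a minimal sketch: the helper name `places_per_group` and the small `assigned` frame are made up for illustration and are not part of the original post. It assumes the assignment lives in a `Group` column next to `Period` and `Place`.

```python
import pandas as pd

def places_per_group(assigned: pd.DataFrame) -> pd.Series:
    """Count the distinct Places held by each Group within each Period."""
    return assigned.groupby(["Period", "Group"])["Place"].nunique()

# Hypothetical assignment, invented purely to exercise the check
assigned = pd.DataFrame({
    "Period": [1, 2, 2, 2, 3, 3, 3, 3],
    "Place":  ["CLUB", "CLUB", "HOME", "AWAY", "WORK", "AWAY", "GOLF", "HOME"],
    "Group":  [1, 1, 1, 1, 2, 1, 2, 1],
})

counts = places_per_group(assigned)

# Guideline 1: no Group may hold more than 3 unique Places at once
violations = counts[counts > 3]
print(counts)
print("violations:", len(violations))
```

The same `counts` series could also be used to inspect how evenly Places are spread within a period, although what counts as "even" is left open in the guidelines.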

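To track how many Places are currently occurring, I have included Total, which is based on whether a Place value appears again. I am meeting my first two guidelines and part of the third: when a Place is assigned to a Group, it stays in the same Group until the Place finishes (does not appear again).

However, I am not referencing Available Group to find out whether a Group is available. When a Group becomes unavailable, I want to redistribute its Places among the remaining Available Groups. With the df below, Places are allocated fine while the number of unique Places grows. But once they start finishing and Group 2 becomes unavailable, those Places are not reassigned to Group 1, even though only 3 Places are occurring at that point.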
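One way to start referencing Available Group is to collect, for each Period, the set of groups that appear as available. This is only a sketch under assumptions: the tiny frame below is invented for illustration (it is not the df from the post), and it treats a group as available in a period if it shows up at least once in that period's rows.

```python
import pandas as pd

# Hypothetical data: group "2" is only available in period 2
toy = pd.DataFrame({
    "Period": [1, 2, 2, 3, 3],
    "Place": ["CLUB", "CLUB", "HOME", "HOME", "AWAY"],
    "Available Group": ["1", "2", "1", "1", "1"],
})

# Map each period to the sorted list of groups seen as available in it
available = {
    period: sorted(groups.unique())
    for period, groups in toy.groupby("Period")["Available Group"]
}
print(available)
```

A reassignment step could then consult `available[period]` before placing or keeping a Place in a Group.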

import pandas as pd

df = pd.DataFrame({
    'Period' : [1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,5,6,6],
    'Place' : ['CLUB','CLUB','CLUB','HOME','HOME','AWAY','AWAY','WORK','WORK','AWAY','AWAY','GOLF','GOLF','CLUB','CLUB','POOL','POOL','HOME','HOME','WORK','WORK','AWAY','AWAY','POOL','POOL','TENNIS','TENNIS'],
    'Total' : [1,1,1,2,2,3,3,4,4,4,4,5,5,4,4,4,4,4,4,4,4,4,4,4,4,5,5],
    'Available Group' : ['1','2','1','2','1','2','1','2','1','1','2','1','2','2','1','2','1','2','1','2','1','1','2','1','2','2','1'],
    })

Attempt:

# df to store all unique places
uniquePlaces = pd.DataFrame(df["Place"].unique(), columns=["Place"])

# Start stores index of df where the place appears 1st
uniquePlaces["Start"] = -1

# End stores index of df where the place appears last 
uniquePlaces["End"] = -1

def assign_place_label(group):

    ''' Label each Place with its row number in uniquePlaces and record
        the first and last df index at which that Place appears '''

    label = uniquePlaces[uniquePlaces["Place"] == group.name].index[0]
    group["Place Label"] = label
    uniquePlaces.loc[label, "Start"] = group.index.min()
    uniquePlaces.loc[label, "End"] = group.index.max()
    return group

# Based on Start and End of each place, assign index to each place.
# when 'freed' the index is reused to new place appearing after that
def Assign_Meetings_group(up):
    up["Index"] = 0
    up["Freed"] = False
    max_ind=0
    free_indx = []
    for i in range(len(up)):
        ind_freed = up.index[(up["End"]<up.iloc[i]["Start"]) & (~up["Freed"])]

        free = list(up.loc[ind_freed, "Index"])
        free_indx += free

        up.loc[ind_freed, "Freed"] = True

        if len(free_indx)>0:
            m = min(free_indx)
            up.loc[i, "Index"] = m
            free_indx.remove(m)

        else:
            up.loc[i, "Index"] = max_ind
            max_ind+=1

    up["Group"] = up["Index"]//3+1

    return up  

# group_keys=False keeps df's original row index on pandas >= 2.0
df2 = df.groupby("Place", group_keys=False).apply(assign_place_label)
uniquePlaces = Assign_Meetings_group(uniquePlaces)

df3 = df2[df2['Period']!=0].drop_duplicates(subset = ['Period','Place'])
result = df3.merge(uniquePlaces[["Group"]], how="left", left_on="Place Label", right_index=True, sort=False)

Output:

    Period Place  Total Available Group  Place Label  Group
0   1       CLUB  1      1               0            1    
1   2       CLUB  1      1               0            1    
3   2       HOME  2      1               1            1    
5   2       AWAY  3      1               2            1    
7   3       WORK  4      1               3            2    
9   3       AWAY  4      1               2            1    
11  3       GOLF  5      1               4            2    
13  3       HOME  5      1               1            1    
15  4       CLUB  4      1               0            1    
17  4       AWAY  3      1               2            1    
19  4       POOL  3      1               5            1    
21  5       WORK  3      1               3            2    
23  5       POOL  2      1               5            1    
25  6       GOLF  1      1               4            2 

Expected output:

    Period Place  Total Available Group  Place Label  Group
0   1       CLUB  1      1               0            1    
1   2       CLUB  1      1               0            1    
3   2       HOME  2      1               1            1    
5   2       AWAY  3      1               2            1    
7   3       WORK  4      1               3            2    
9   3       AWAY  4      1               2            1    
11  3       GOLF  5      1               4            2    
13  3       HOME  5      1               1            1    
15  4       CLUB  4      1               0            1    
17  4       AWAY  3      1               2            1    
19  4       POOL  3      1               5            1    
21  5       WORK  3      1               3            1    
23  5       POOL  2      1               5            1    
25  6       GOLF  1      1               4            1 


Answer 1:

Here is my solution to the problem; the details are in the code comments.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Period' : [1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,5,6,6],
    'Place' : ['CLUB','CLUB','CLUB','HOME','HOME','AWAY','AWAY','WORK','WORK','AWAY','AWAY','GOLF','GOLF','CLUB','CLUB','POOL','POOL','HOME','HOME','WORK','WORK','AWAY','AWAY','POOL','POOL','TENNIS','TENNIS'],
    'Total' : [1,1,1,2,2,3,3,4,4,4,4,5,5,4,4,4,4,4,4,4,4,4,4,4,4,5,5],
    'Available Group' : ['1','2','1','2','1','2','1','2','1','1','2','1','2','2','1','2','1','2','1','2','1','1','2','1','2','2','1'],
    })

# df to store all unique places
uniquePlaces = pd.DataFrame(df["Place"].unique(), columns=["Place"])

# Start stores index of df where the place appears 1st
uniquePlaces["Start"] = -1

# End stores index of df where the place appears last 
uniquePlaces["End"] = -1

def assign_place_label(group):

    ''' Label each Place with its row number in uniquePlaces and record
        the df index range [Start, End) over which that Place appears '''

    label = uniquePlaces[uniquePlaces["Place"] == group.name].index[0]
    group["Place Label"] = label
    uniquePlaces.loc[label, "Start"] = group.index.min()
    uniquePlaces.loc[label, "End"] = group.index.max()+1
    return group

# group_keys=False keeps df's original row index on pandas >= 2.0
df2 = df.groupby("Place", group_keys=False).apply(assign_place_label)


def calc_groups(uniquePlaces, df2):

    ## group need to be changed only when a group starts or finishes
    change_points = np.sort(uniquePlaces[["Start", "End"]].values.ravel()).reshape(-1,1)

    ## for each change points find boolean indxes for places (True if place is in use at that point)
    inds = (change_points>=uniquePlaces["Start"].values) & (change_points<uniquePlaces["End"].values)

    ## all available indexes for place
    all_ind = set(uniquePlaces.index.values+1)
    prev_ind = np.array([0]*len(all_ind))

    result = []
    for ind in inds:
        ## copy prev_ind where place exists
        new_ind = prev_ind * ind
        ## mark places with index greater than available places with -1
        new_ind[new_ind>sum(ind)] = -1
        ## mark existing places with index 0 with -1
        new_ind[(new_ind==0) & ind] = -1

        available_ind = all_ind - set(new_ind[new_ind>0])

        ## replace indxes marked by -1 with minimum values from available_ind
        for i in range(len(new_ind)):
            if new_ind[i]==-1:
                new_ind[i] = min(available_ind)
                available_ind.remove(new_ind[i])

        result.append(new_ind)
        prev_ind = new_ind

    result = np.r_[result]
    repeats = np.r_[change_points[1:] - change_points[:-1], [[0]]].ravel()

    ## place index calculated only for change points, now fill the gap between change points
    ## by repeating index in the gap
    result = np.repeat(result, repeats, axis=0)

    df2["group"] = (result[np.arange(len(result)), df2["Place Label"].values]-1)//3 + 1
    return df2


df2 = calc_groups(uniquePlaces, df2)
print(df2.drop_duplicates(subset=['Period','Place']))

Result:

Period   Place  Total Available Group  Place Label  group
0        1    CLUB      1               1            0      1
1        2    CLUB      1               2            0      1
3        2    HOME      2               2            1      1
5        2    AWAY      3               2            2      1
7        3    WORK      4               2            3      2
9        3    AWAY      4               1            2      1
11       3    GOLF      5               1            4      2
13       4    CLUB      4               2            0      1
15       4    POOL      4               2            5      1
17       4    HOME      4               2            1      1
19       5    WORK      4               2            3      1
21       5    AWAY      4               1            2      1
23       5    POOL      4               1            5      1
25       6  TENNIS      5               2            6      1
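The core of `calc_groups` is the broadcast comparison that builds `inds`, followed by the mapping from a place index to a group number. The toy example below isolates those two steps. It is a sketch only: the `start` and `end` arrays are invented values, not the ones computed from the df above, and intervals are treated as half-open (`Start <= row < End`) as in the answer.

```python
import numpy as np

# Hypothetical Start/End row indexes for three places
start = np.array([0, 3, 5])
end = np.array([5, 8, 7])

# Every Start or End is a point where the set of active places can change
change_points = np.sort(np.concatenate([start, end])).reshape(-1, 1)

# Broadcasting (6, 1) against (3,) gives a (6, 3) boolean matrix:
# row = change point, column = place, True if the place is active there
active = (change_points >= start) & (change_points < end)
print(active.sum(axis=1))  # number of active places at each change point

# Place indexes 1..3 fall in group 1, 4..6 in group 2, and so on
place_index = np.array([1, 2, 3, 4, 5, 6, 7])
group = (place_index - 1) // 3 + 1
print(group)
```

Because each group is just a block of three consecutive place indexes, keeping the active places packed into the lowest indexes is what keeps them in the lowest-numbered groups.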
