Create a new column in a df based on an appended list of dictionaries, looping over the list of dictionaries in Pandas

Posted: 2020-11-08 15:43:08

Question:

I have a df and a list of dictionaries, as shown below.

df:

Date                t_factor     
2020-02-01             5             
2020-02-02             23              
2020-02-03             14           
2020-02-04             23
2020-02-05             23  
2020-02-06             23          
2020-02-07             30            
2020-02-08             29            
2020-02-09             100
2020-02-10             38
2020-02-11             38               
2020-02-12             38                    
2020-02-13             70           
2020-02-14             70 

REQUEST_OBJ = {
    "blue": {
        "best": [
            {'type': 'quadratic',
             'from': '2020-02-03T20:00:00.000Z',
             'to': '2020-02-06T20:00:00.000Z',
             'days': 3,
             'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
            {'type': 'linear',
             'from': '2020-02-06T20:00:00.000Z',
             'to': '2020-02-10T20:00:00.000Z',
             'days': 3,
             'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
            {'type': 'polynomial',
             'from': '2020-02-10T20:00:00.000Z',
             'to': '2020-02-14T20:00:00.000Z',
             'days': 3,
             'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]}
        ]
    }
}

Step 1: From this, I want to change the "best" list in the dictionary as follows.

Step 1.1: Sort the list by the value of the "from" key in each dictionary.

[
 {"type": "quadratic",
  "from": "2020-02-03T20:00:00.000Z",
  "to": "2020-02-10T20:00:00.000Z",
  "days": 3,
  "coef": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
 {"type": "linear",
  "from": "2020-02-04T20:00:00.000Z",
  "to": "2020-02-03T20:00:00.000Z",
  "days": 3,
  "coef": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
 {"type": "polynomial",
  "from": "2020-02-05T20:00:00.000Z",
  "to": "2020-02-03T20:00:00.000Z",
  "days": 3,
  "coef": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]}
]
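
The sort in Step 1.1 can be sketched roughly as below. The `segments` list is a made-up, shortened stand-in for the "best" list (deliberately out of order), and parsing with `pd.Timestamp` is just one way to compare the dates; treat it as an illustration rather than the only approach.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the "best" list, deliberately out of order.
segments = [
    {"type": "polynomial", "from": "2020-02-10T20:00:00.000Z"},
    {"type": "quadratic", "from": "2020-02-03T20:00:00.000Z"},
    {"type": "linear", "from": "2020-02-06T20:00:00.000Z"},
]

# sorted() returns a new list and leaves the input untouched.
# Parsing each "from" string means real timestamps are compared, not raw text.
segments_sorted = sorted(segments, key=lambda d: pd.Timestamp(d["from"]))

print([d["type"] for d in segments_sorted])
```

Because `sorted()` returns a new list, the result has to be assigned back to a name, unlike `list.sort()`, which sorts in place.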
Step 1.2: Add a dictionary whose "from" value is the minimum date of df and whose "to" value is the "from" date of the first dictionary in the sorted list, with "days" = 0 and "coef" = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1].

{"type": "df_first",
 "from": "2020-02-01T20:00:00.000Z",
 "to": "2020-02-03T20:00:00.000Z",
 "days": 0,
 "coef": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]}
      
Step 1.3: Add a dictionary whose "from" value is 7 days after the minimum date of df and whose "to" value is one day after its "from".

{"type": "df_mid",
 "from": "2020-02-08T20:00:00.000Z",
 "to": "2020-02-09T20:00:00.000Z",
 "days": 0,
 "coef": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]}
      
Step 1.4: Add a dictionary whose "from" value is the maximum date of df, with "to" equal to "from".

{"type": "df_last",
 "from": "2020-02-14T20:00:00.000Z",
 "to": "2020-02-14T20:00:00.000Z",
 "days": 0,
 "coef": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]}
      
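
Steps 1.2 to 1.4 could be sketched like this. `make_segment` is a hypothetical helper (not part of the question's code), the DataFrame is rebuilt from the date range shown above, and the fixed 20:00 time in the format string is an assumption taken from the sample timestamps.

```python
import pandas as pd

# Rebuild only the Date column from the question's sample df.
df = pd.DataFrame({"Date": pd.date_range("2020-02-01", "2020-02-14", freq="D")})

def make_segment(seg_type, start, end):
    """Hypothetical helper: build one boundary dictionary."""
    fmt = "%Y-%m-%dT20:00:00.000Z"  # assumes the fixed 20:00 time used in the samples
    return {
        "type": seg_type,
        "from": start if isinstance(start, str) else start.strftime(fmt),
        "to": end if isinstance(end, str) else end.strftime(fmt),
        "days": 0,
        "coef": [0.1] * 6,
    }

dmin, dmax = df["Date"].min(), df["Date"].max()
first_from = "2020-02-03T20:00:00.000Z"  # "from" of the first dictionary after sorting

extra = [
    make_segment("df_first", dmin, first_from),                      # Step 1.2
    make_segment("df_mid", dmin + pd.Timedelta(days=7),
                 dmin + pd.Timedelta(days=8)),                       # Step 1.3
    make_segment("df_last", dmax, dmax),                             # Step 1.4
]

for seg in extra:
    print(seg["type"], seg["from"], seg["to"])
```

Returning a new dictionary (instead of appending inside the helper) keeps it clear which calls produce a value and which mutate a list.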
Step 1.5: Sort all the dictionaries by their "from" date.

Expected output:

[
 {"type": "df_first",
  "from": "2020-02-01T20:00:00.000Z",
  "to": "2020-02-03T20:00:00.000Z",
  "days": 0,
  "coef": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
 {"type": "quadratic",
  "from": "2020-02-03T20:00:00.000Z",
  "to": "2020-02-10T20:00:00.000Z",
  "days": 3,
  "coef": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
 {"type": "linear",
  "from": "2020-02-04T20:00:00.000Z",
  "to": "2020-02-03T20:00:00.000Z",
  "days": 3,
  "coef": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
 {"type": "polynomial",
  "from": "2020-02-05T20:00:00.000Z",
  "to": "2020-02-03T20:00:00.000Z",
  "days": 3,
  "coef": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
 {"type": "df_mid",
  "from": "2020-02-08T20:00:00.000Z",
  "to": "2020-02-09T20:00:00.000Z",
  "days": 0,
  "coef": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
 {"type": "df_last",
  "from": "2020-02-14T20:00:00.000Z",
  "to": "2020-02-14T20:00:00.000Z",
  "days": 0,
  "coef": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]}
]

Step 1.6: Replace the "to" value of each dictionary with the "from" value of the next dictionary. The "to" value of the last dictionary stays as it is.

Expected output:

[
 {"type": "df_first",
  "from": "2020-02-01T20:00:00.000Z",
  "to": "2020-02-03T20:00:00.000Z",
  "days": 0,
  "coef": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
 {"type": "quadratic",
  "from": "2020-02-03T20:00:00.000Z",
  "to": "2020-02-04T20:00:00.000Z",
  "days": 3,
  "coef": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
 {"type": "linear",
  "from": "2020-02-04T20:00:00.000Z",
  "to": "2020-02-05T20:00:00.000Z",
  "days": 3,
  "coef": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
 {"type": "polynomial",
  "from": "2020-02-05T20:00:00.000Z",
  "to": "2020-02-08T20:00:00.000Z",
  "days": 3,
  "coef": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
 {"type": "df_mid",
  "from": "2020-02-08T20:00:00.000Z",
  "to": "2020-02-14T20:00:00.000Z",
  "days": 0,
  "coef": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
 {"type": "df_last",
  "from": "2020-02-14T20:00:00.000Z",
  "to": "2020-02-14T20:00:00.000Z",
  "days": 0,
  "coef": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]}
]
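
One plain-Python way to do Step 1.6, without a round trip through a DataFrame, is to pair each dictionary with its successor. The three-item `segments` list here is a shortened, hypothetical sample that is assumed to be already sorted by "from".

```python
# Hypothetical, shortened sample; assumed to be already sorted by "from".
segments = [
    {"type": "df_first", "from": "2020-02-01T20:00:00.000Z", "to": "2020-02-03T20:00:00.000Z"},
    {"type": "quadratic", "from": "2020-02-03T20:00:00.000Z", "to": "2020-02-10T20:00:00.000Z"},
    {"type": "df_last", "from": "2020-02-14T20:00:00.000Z", "to": "2020-02-14T20:00:00.000Z"},
]

# zip(segments, segments[1:]) yields (current, next) pairs.
# The last dictionary never appears as "current", so its "to" is left unchanged.
for current, following in zip(segments, segments[1:]):
    current["to"] = following["from"]

for seg in segments:
    print(seg["type"], seg["from"], "->", seg["to"])
```

This mutates the dictionaries in place; copy them first if the original "to" values are still needed.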

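Step 2: Create a new column in df from the updated dictionaries. I want to create a new column in df based on the "type" specified in each dictionary and on the Date column.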

Explanation:

if "type" == "df_first":
    df['new_col'] = df['t_factor'] (only for the dates between the "from" and "to" dates specified in that dictionary)

elif "type" == "df_mid":
    df['new_col'] = df['t_factor'] (only for the dates between the "from" and "to" dates specified in that dictionary)

elif "type" == "quadratic":
     df['new_col'] = a0 + a1*(T) + a2*(T)**2 + previous value of df['new_col']
     where T = 1 for one day after the "from" date of that dictionary, and T is counted in days based on the Date value.

elif "type" == "linear":
     df['new_col'] = a0 + a1*(T) + previous value of df['new_col']
     where T = 1 for one day after the "from" date of that dictionary.

elif "type" == "polynomial":
     df['new_col'] = a0 + a1*(T) + a2*(T)**2  + a3*(T)**3  + a4*(T)**4  + a5*(T)**5 + previous value of df['new_col']
     where T = 1 for start_date of that dictionary.

elif "type" == "df_last":
    df['new_col'] = df['t_factor'] (only for the dates between the "from" and "to" dates specified in that dictionary)
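
To make the rules above concrete, here is a small sketch of one copy segment followed by one quadratic segment. It uses a shortened five-row frame with made-up segment boundaries, and it assumes that "previous value of df['new_col']" means the last value already filled in (forward-filled); that reading is an interpretation, not something the rules state explicitly.

```python
import numpy as np
import pandas as pd

# Shortened sample frame (first five rows of the question's data).
df = pd.DataFrame({
    "Date": pd.date_range("2020-02-01", periods=5, freq="D"),
    "t_factor": [5.0, 23.0, 14.0, 23.0, 23.0],
})
df["new_col"] = np.nan

# Copy segment (like df_first): take t_factor for 2020-02-01..2020-02-02.
first = df["Date"].between(pd.Timestamp("2020-02-01"), pd.Timestamp("2020-02-02"))
df.loc[first, "new_col"] = df.loc[first, "t_factor"]

# Quadratic segment starting at 2020-02-02, so T = 1 on 2020-02-03.
a0, a1, a2 = 0.1, 0.1, 0.1
start = pd.Timestamp("2020-02-02")
T = (df["Date"] - start).dt.days
quad = df["Date"].between(pd.Timestamp("2020-02-03"), pd.Timestamp("2020-02-05"))

# Assumption: the "previous value" is the last filled value, carried forward.
base = df["new_col"].ffill()
df.loc[quad, "new_col"] = (a0 + a1 * T + a2 * T**2 + base)[quad]

print(df)
```

The forward fill is computed before the assignment, so every row of the quadratic segment is offset from the same base value (23 here) rather than from the row just before it.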

I tried the code below:

import numpy as np
import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO("""Date                t_factor
2020-02-01             5             
2020-02-02             23              
2020-02-03             14           
2020-02-04             23
2020-02-05             23  
2020-02-06             23          
2020-02-07             30            
2020-02-08             29            
2020-02-09             100
2020-02-10             38
2020-02-11             38               
2020-02-12             38                    
2020-02-13             70           
2020-02-14             70"""), sep=r"\s+", parse_dates=[0])

REQUEST_OBJ = {
    "blue": {
        "best": [
            {'type': 'quadratic',
             'from': '2020-02-03T20:00:00.000Z',
             'to': '2020-02-06T20:00:00.000Z',
             'days': 3,
             'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
            {'type': 'linear',
             'from': '2020-02-06T20:00:00.000Z',
             'to': '2020-02-10T20:00:00.000Z',
             'days': 3,
             'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
            {'type': 'polynomial',
             'from': '2020-02-10T20:00:00.000Z',
             'to': '2020-02-14T20:00:00.000Z',
             'days': 3,
             'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]}
        ]
    }
}


def add_dct(lst, _type, _from, _to):
    lst.append({
        'type': _type,
        'from': _from if isinstance(_from, str) else _from.strftime("%Y-%m-%dT20:%M:%S.000Z"),
        'to': _to if isinstance(_to, str) else _to.strftime("%Y-%m-%dT20:%M:%S.000Z"),
        'days': 0,
        "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
    })

def fn_graph(df, REQUEST_OBJ):
    

    REQUIRED_KEYS = ["blue"]

    for bluewhite_category in REQUIRED_KEYS:
        print(bluewhite_category)
        if bluewhite_category in REQUEST_OBJ.keys():
            for bestworst_category in REQUEST_OBJ[bluewhite_category].keys():
                print(bestworst_category)
                param_obj_list = REQUEST_OBJ[bluewhite_category][bestworst_category]
                dmin, dmax = df['Date'].min(), df['Date'].max()
                #sort input list based on d['from']
                param_obj_list = sorted(param_obj_list, key=lambda d: pd.Timestamp(d['from']))
                # add a dictionary with d['from'] = dmin
                param_obj_list = add_dct(param_obj_list, 'df_first', dmin, param_obj_list[0]['from'])
                # add a dictionary with d['from'] as data_end
                param_obj_list = add_dct(param_obj_list, 'df_mid', dmin + pd.Timedelta(days=7), dmin + pd.Timedelta(days=8))
                # add dictionary with d['from'] as projection end
                param_obj_list = add_dct(param_obj_list, 'df_last', dmax, dmax)
                # sort the final list of dictionary based on d['from']
                param_obj_list = sorted(param_obj_list, key=lambda d: pd.Timestamp(d['from']))
                # Replace each 'to' date with the 'from' of the next dictionary
                df1ist = pd.DataFrame(param_obj_list)
                df1ist['to'] = df1ist['from'].shift(-1).fillna(df1ist['to'])
                param_obj_list = df1ist.to_dict('records')
                print(param_obj_list)
                kind = bluewhite_category + '_' + bestworst_category
                df['time_function'] = np.nan
                for d in param_obj_list:
                    a0, a1, a2, a3, a4, a5 = d['coef']

                    start = pd.Timestamp(d['from']).strftime('%Y-%m-%d')
                    end = pd.Timestamp(d['to']).strftime('%Y-%m-%d')

                    T = df['Date'].sub(pd.Timestamp(start)).dt.days
                    mask = df['Date'].between(start, end)  # both endpoints are included by default

                    if d['type'] == 'df_first':
                        df.loc[mask, 'time_function'] = df['t_factor']

                    elif d['type'] == 'quadratic':
                        df.loc[mask, 'time_function'] = a0 + a1 * T + a2 * (T)**2 + df['time_function'].ffill()

                    elif d['type'] == 'linear':
                        df.loc[mask, 'time_function'] = a0 + a1 * T + df['time_function'].ffill()

                    elif d['type'] == 'polynomial':
                        df.loc[mask, 'time_function'] = a0 + a1*(T) + a2*(T)**2 + a3 * \
                                    (T)**3 + a4*(T)**4 + a5*(T)**5 + df['time_function'].ffill()

                    elif d['type'] == 'df_mid':
                        df.loc[mask, 'time_function'] = df['t_factor']

                    elif d['type'] == 'df_last':
                        df.loc[mask, 'time_function'] = df['t_factor']
                        
                
        else:
            return df
    return df

fn_graph(df, REQUEST_OBJ)

And I got the following error:

AttributeError: 'NoneType' object has no attribute 'append'

Comments:

I tried and fixed this error myself. The working code is below.

Answer 1:

How I fixed the error

I just changed

param_obj_list = add_dct(param_obj_list, 'df_first', dmin, param_obj_list[0]['from'])

to

add_dct(param_obj_list, 'df_first', dmin, param_obj_list[0]['from'])
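
The reason this matters: `list.append` modifies the list in place and returns `None`, and a function with no `return` statement also returns `None`. Assigning the result back therefore overwrites the list with `None`, and the next call fails on `None.append`. A minimal sketch with hypothetical helper names (`add_item`, `add_item_returning`):

```python
def add_item(lst, item):
    # Mutates lst in place; with no return statement the function returns None.
    lst.append(item)

def add_item_returning(lst, item):
    # Alternative design: return the list so that reassignment is safe.
    lst.append(item)
    return lst

items = [1, 2]
result = add_item(items, 3)
print(result)   # the helper's return value
print(items)    # the list itself was still updated

items2 = add_item_returning(items, 4)
print(items2 is items)
```

Either style works as long as the call site matches it: call the mutating helper without assignment, or have the helper return the list and assign the result.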

The same change applies to the df_mid and df_last calls. The complete code is as follows:

import numpy as np
import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO("""Date                t_factor
2020-02-01             5             
2020-02-02             23              
2020-02-03             14           
2020-02-04             23
2020-02-05             23  
2020-02-06             23          
2020-02-07             30            
2020-02-08             29            
2020-02-09             100
2020-02-10             38
2020-02-11             38               
2020-02-12             38                    
2020-02-13             70           
2020-02-14             70"""), sep=r"\s+", parse_dates=[0])

REQUEST_OBJ = {
    "blue": {
        "best": [
            {'type': 'quadratic',
             'from': '2020-02-03T20:00:00.000Z',
             'to': '2020-02-06T20:00:00.000Z',
             'days': 3,
             'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
            {'type': 'linear',
             'from': '2020-02-06T20:00:00.000Z',
             'to': '2020-02-10T20:00:00.000Z',
             'days': 3,
             'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
            {'type': 'polynomial',
             'from': '2020-02-10T20:00:00.000Z',
             'to': '2020-02-14T20:00:00.000Z',
             'days': 3,
             'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]}
        ]
    }
}


def add_dct(lst, _type, _from, _to):
    # Appends in place and returns None, so callers must not reassign the result
    lst.append({
        'type': _type,
        'from': _from if isinstance(_from, str) else _from.strftime("%Y-%m-%dT20:%M:%S.000Z"),
        'to': _to if isinstance(_to, str) else _to.strftime("%Y-%m-%dT20:%M:%S.000Z"),
        'days': 0,
        "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
    })

def fn_graph(df, REQUEST_OBJ):


    REQUIRED_KEYS = ["blue"]

    for bluewhite_category in REQUIRED_KEYS:
        print(bluewhite_category)
        if bluewhite_category in REQUEST_OBJ.keys():
            for bestworst_category in REQUEST_OBJ[bluewhite_category].keys():
                print(bestworst_category)
                param_obj_list = REQUEST_OBJ[bluewhite_category][bestworst_category]
                dmin, dmax = df['Date'].min(), df['Date'].max()
                #sort input list based on d['from']
                param_obj_list = sorted(param_obj_list, key=lambda d: pd.Timestamp(d['from']))
                # add a dictionary with d['from'] = dmin
                add_dct(param_obj_list, 'df_first', dmin, param_obj_list[0]['from'])
                # add a dictionary with d['from'] as data_end
                add_dct(param_obj_list, 'df_mid', dmin + pd.Timedelta(days=7), dmin + pd.Timedelta(days=8))
                # add dictionary with d['from'] as projection end
                add_dct(param_obj_list, 'df_last', dmax, dmax)
                # sort the final list of dictionary based on d['from']
                param_obj_list = sorted(param_obj_list, key=lambda d: pd.Timestamp(d['from']))
                # Replace each 'to' date with the 'from' of the next dictionary
                df1ist = pd.DataFrame(param_obj_list)
                df1ist['to'] = df1ist['from'].shift(-1).fillna(df1ist['to'])
                param_obj_list = df1ist.to_dict('records')
                print(param_obj_list)
                kind = bluewhite_category + '_' + bestworst_category
                df['time_function'] = np.nan
                for d in param_obj_list:
                    a0, a1, a2, a3, a4, a5 = d['coef']

                    start = pd.Timestamp(d['from']).strftime('%Y-%m-%d')
                    end = pd.Timestamp(d['to']).strftime('%Y-%m-%d')

                    T = df['Date'].sub(pd.Timestamp(start)).dt.days
                    mask = df['Date'].between(start, end)  # both endpoints are included by default

                    if d['type'] == 'df_first':
                        df.loc[mask, 'time_function'] = df['t_factor']

                    elif d['type'] == 'quadratic':
                        df.loc[mask, 'time_function'] = a0 + a1 * T + a2 * (T)**2 + df['time_function'].ffill()

                    elif d['type'] == 'linear':
                        df.loc[mask, 'time_function'] = a0 + a1 * T + df['time_function'].ffill()

                    elif d['type'] == 'polynomial':
                        df.loc[mask, 'time_function'] = a0 + a1*(T) + a2*(T)**2 + a3 * \
                                    (T)**3 + a4*(T)**4 + a5*(T)**5 + df['time_function'].ffill()

                    elif d['type'] == 'df_mid':
                        df.loc[mask, 'time_function'] = df['t_factor']

                    elif d['type'] == 'df_last':
                        df.loc[mask, 'time_function'] = df['t_factor']
                        
                
        else:
            return df
    return df

fn_graph(df, REQUEST_OBJ)
