Change the value of a column to NaN based on the value of another column inside a loop

Original title: Change the value of a column into a nan based on the value of another column inside a loop
Posted: 2020-10-12 10:03:58

Question:

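I have a large number of columns with the suffix "mean" or "sum". Sometimes the value in a "mean" column is NaN. When that happens, I want to set the value in the matching "sum" column to NaN as well. I have a large number of variables, so I need (?) to use a loop. I created a fake data frame and added the 3 things I tried, based on similar posts on SO. Unfortunately, none of them works.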

import numpy as np
import pandas as pd

original_data_set = pd.DataFrame({
    'customerId': [1, 2],
    'usage_1_sum': [100, 200],
    'usage_1_mean': [np.nan, 100],
    'usage_2_sum': [420, 330],
    'usage_2_mean': [45, np.nan],
})

print('original dataset')
original_data_set

desired_data_set = pd.DataFrame({
    'customerId': [1, 2],
    'usage_1_sum': [np.nan, 200],
    'usage_1_mean': [np.nan, 100],
    'usage_2_sum': [420, np.nan],
    'usage_2_mean': [45, np.nan],
})

print('desired dataset')
desired_data_set



holder_set = original_data_set.copy()

for number in range(1,3):
    holder_set['usage_{}_sum'.format(number)] = (
        holder_set['usage_{}_sum'.format(number)]
        .where(holder_set['usage_{}_mean'.format(number)] == np.nan, np.nan)
    )

print('using an np.where statement changed all sum variables into NaN with no discretion')
holder_set


holder_set = original_data_set.copy()

for number in range(1,3):
    conditions = [holder_set['usage_{}_mean'.format(number)]==np.nan]
    outcome = [np.nan]
    holder_set['usage_{}_sum'.format(number)] = np.select(conditions, outcome, default=holder_set['usage_{}_sum'.format(number)])
    
    
print('using an np.select did not have any effect on the dataframe')
holder_set


holder_set = original_data_set.copy()

for number in range(1,3):
    holder_set.loc[holder_set['usage_{}_mean'.format(number)]==np.nan, 'usage_{}_sum'.format(number)] = 12

print('using a loc did not have any effect on the dataframe')
holder_set

Comments:

Maybe try looking at the DataFrame.where() function. You should be able to index directly into the problem region without writing the for loop yourself.

Answer 1:

Assume the original data frame is df:

df = pd.DataFrame({'customerId': [1, 2], 'usage_1_sum': [100, 200], 'usage_1_mean': [
                  np.nan, 100], 'usage_2_sum': [420, 330], 'usage_2_mean': [45, np.nan]})

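Use Series.str.endswith to filter the columns that end with _mean. Then, for each of those columns, set the corresponding value in the _sum column to NaN wherever the value in the mean column is NaN: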

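for col in df.columns[df.columns.str.endswith('_mean')]:
    # Slice off the '_mean' suffix; str.rstrip('_mean') would strip a set of
    # characters rather than the suffix and can eat into the base name.
    df.loc[df[col].isna(), col[:-len('_mean')] + '_sum'] = np.nan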

Result:

# print(df)
   customerId  usage_1_sum  usage_1_mean  usage_2_sum  usage_2_mean
0           1          NaN           NaN        420.0          45.0
1           2        200.0         100.0          NaN           NaN
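
The three attempts in the question all build their condition with == np.nan. NaN never compares equal to anything, including itself, so that condition is False in every row; isna() is the check that actually finds the missing means. As a minimal sketch (not part of the original answer), the question's own numbered loop can be kept and paired with Series.mask, which replaces values with NaN wherever the condition is True. The names never_matches and matches_nan are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'customerId': [1, 2],
    'usage_1_sum': [100, 200],
    'usage_1_mean': [np.nan, 100],
    'usage_2_sum': [420, 330],
    'usage_2_mean': [45, np.nan],
})

# NaN is not equal to NaN, so this boolean mask is False everywhere.
never_matches = df['usage_1_mean'] == np.nan
# isna() is True exactly where the mean is missing.
matches_nan = df['usage_1_mean'].isna()

for number in range(1, 3):
    mean_col = 'usage_{}_mean'.format(number)
    sum_col = 'usage_{}_sum'.format(number)
    # mask() with no replacement value puts NaN where the condition holds.
    df[sum_col] = df[sum_col].mask(df[mean_col].isna())
```

Note that the _sum columns become float columns after this, because an integer column cannot hold NaN.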

