Calculate nanmean in groupby and apply this mean to DF column according to subgroup

Posted 2023-03-11

[Posted]: 2021-07-04 23:21:48 [Question]:

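I'm trying to fill in missing values in the user_score column of my DataFrame. The data is currently stored as strings, including the value 'tbd'. I want to replace the 'tbd' values with NaN, convert the column to float, compute the mean user_score per game genre, and then fill each NaN in user_score with the mean of its own genre (rather than with the overall user_score mean).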

games['user_score'] = games['user_score'].replace('tbd', np.nan, inplace=True)
games['user_score'] = games['user_score'].astype(float)
#genre_mean = games.groupby('genre').agg({'user_score': np.mean})
games['user_score'] = games.groupby('genre', sort=False)['user_score'].apply(lambda x: x.fillna(x.mean()))
print(games.groupby('genre').agg({'user_score': np.mean}))
print(games['user_score'].head(10))

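Now, when I print the groupby at the end of my code, it shows the mean user_score of every genre as NaN. I then tried using .nanmean() inside the apply function, but that gave me an error. How can I fill the missing values in the user_score column with the mean user_score of each game's genre?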

Thanks!

[Answer 1]:

user_score is wiped out on the very first line:

games['user_score'] = games['user_score'].replace('tbd', np.nan, inplace=True)

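If you assign to user_score and also pass inplace=True, the column is cleared: with inplace=True, replace modifies the Series in place and returns None (at least in the pandas 1.x releases current when this was asked), and that None is what gets assigned back to user_score. Every later step, including the per-genre mean, then sees only missing values.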

Either drop inplace and assign the result back:

games['user_score'] = games['user_score'].replace('tbd', np.nan)

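Or use inplace without assigning back (on newer pandas versions with Copy-on-Write enabled, this chained form may not update games, so the assignment form is the safer choice):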

games['user_score'].replace('tbd', np.nan, inplace=True)
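Putting the fix together with the rest of the question, here is a minimal end-to-end sketch. The sample data is made up for illustration; only the column names genre and user_score come from the question. It also swaps the question's apply(lambda ...) for transform('mean'), which is an assumption about what works best here rather than part of the original answer: transform returns a result aligned to the original row index, and mean() skips NaN by default, so no nanmean is needed.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
games = pd.DataFrame({
    "genre": ["Action", "Action", "Action", "Sports", "Sports", "Puzzle"],
    "user_score": ["8.0", "tbd", "6.0", "7.0", np.nan, "tbd"],
})

# Replace 'tbd' with NaN and assign the result back (no inplace=True).
games["user_score"] = games["user_score"].replace("tbd", np.nan)
games["user_score"] = games["user_score"].astype(float)

# Per-genre mean, broadcast back to one value per original row.
# mean() ignores NaN by default, so it behaves like a nanmean here.
genre_mean = games.groupby("genre")["user_score"].transform("mean")

# Fill each missing score with the mean of its own genre.
games["user_score"] = games["user_score"].fillna(genre_mean)

print(games)
```

Note that a genre with no valid scores at all (Puzzle in this sample) has a NaN mean, so its rows remain NaN and would need a separate fallback such as the overall mean.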

[Comments]:

Thank you so much, that worked!! It makes a lot of sense now, too.
