How to find the elementwise harmonic mean across two Pandas dataframes

Posted 2023-03-12


【Title】How to find the elementwise harmonic mean across two Pandas dataframes 【Published】2021-03-13 01:20:28 【Question】:

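Similar to this post: efficient function to find harmonic mean across different pandas dataframes. I have two Pandas dataframes of identical shape, and I want the harmonic mean of each pair of elements, one from each dataframe at the same position. The solution given in that post uses Panel, which is now deprecated.

If I do this: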

import pandas as pd
import numpy as np
from scipy.stats.mstats import hmean

df1 = pd.DataFrame(dict(x=np.random.randint(5, 10, 5), y=np.random.randint(1, 6, 5)))
df2 = pd.DataFrame(dict(x=np.random.randint(5, 10, 5), y=np.random.randint(1, 6, 5)))
dfs_dictionary = {'DF1': df1, 'DF2': df2}
df=pd.concat(dfs_dictionary)
print(df)

       x  y
DF1 0  9  4
    1  6  4
    2  7  2
    3  5  2
    4  5  2
DF2 0  9  2
    1  7  1
    2  7  1
    3  9  5
    4  8  3

x = df.groupby(level = 1).apply(hmean, axis = None).reset_index()
print(x)
   index         0
0      0  4.114286
1      1  2.564885
2      2  2.240000
3      3  3.956044
4      4  3.453237

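I only get a single column of values. Why? Based on the original df, I expected two columns: one with the hmean of the x values and one with the hmean of the y values. How can I get what I want?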

【Comments】:

【Answer 1】:

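The reason is that you pass axis=None to hmean, which flattens the data. Remember that with groupby().apply() the argument is the whole group as a sub-DataFrame; with level=1 that is every row sharing the same inner index, e.g. df.xs(0, level=1). Flattening that group pools the x and y values into a single number. Just remove axis=None: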

x = df.groupby(level = 1).apply(hmean).reset_index()

You will get:

   index                                        0
0      0                 [6.461538461538462, 3.0]
1      1  [5.833333333333333, 2.4000000000000004]
2      2                               [8.0, 3.0]
3      3  [6.857142857142858, 2.4000000000000004]
4      4   [6.461538461538462, 2.857142857142857]

Or you can use agg:

x = df.groupby(level = 1).agg({'x': hmean, 'y': hmean})

which gives:

          x         y
0  6.461538  3.000000
1  5.833333  2.400000
2  8.000000  3.000000
3  6.857143  2.400000
4  6.461538  2.857143

If you have more columns than just x and y:

x = df.groupby(level=1).agg({c: hmean for c in df.columns})
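
For exactly two frames, the groupby can be skipped altogether, since the harmonic mean of two numbers a and b is 2ab / (a + b). The following is a minimal sketch rather than part of the original answer; the fixed sample values are made up for reproducibility, and it assumes both frames share the same index and columns and contain only positive numbers. It uses scipy.stats.hmean (not the mstats variant from the question) purely as a cross-check.

```python
import numpy as np
import pandas as pd
from scipy.stats import hmean

# Fixed sample data (hypothetical values, chosen so the result is reproducible)
df1 = pd.DataFrame({"x": [9, 6, 7], "y": [4, 4, 2]})
df2 = pd.DataFrame({"x": [9, 7, 7], "y": [2, 1, 1]})

# Harmonic mean of two numbers a, b is 2ab / (a + b);
# pandas applies the arithmetic elementwise on aligned labels.
direct = 2 * df1 * df2 / (df1 + df2)

# Cross-check: stack both frames into shape (2, rows, cols) and reduce over axis 0
stacked = np.stack([df1.to_numpy(), df2.to_numpy()])
via_scipy = pd.DataFrame(hmean(stacked, axis=0), index=df1.index, columns=df1.columns)

print(direct)
print(np.allclose(direct.to_numpy(), via_scipy.to_numpy()))
```

The stacked form generalises to more than two frames by adding them to the list passed to np.stack, whereas the closed-form expression only holds for a pair.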

【Comments】:

Is there no vectorized version?
It is vectorized.
Even the apply version?

【Answer 2】:

Try removing the axis = None argument.

【Comments】:
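
To see what that argument changes, here is a small sketch on a single group. The array holds row 0 of DF1 and row 0 of DF2 from the output printed in the question; the variable names are illustrative, and scipy.stats.hmean is used here on the assumption that it treats the axis argument the same way as the mstats version in the question.

```python
import numpy as np
from scipy.stats import hmean

# One group as groupby(level=1) would see it:
# row 0 of DF1 and row 0 of DF2, with columns x and y
group = np.array([[9, 4],
                  [9, 2]])

# axis=None pools all four values into one scalar
flat = hmean(group, axis=None)

# axis=0 (the default) reduces down the rows, giving one value per column
per_column = hmean(group, axis=0)

print(flat)        # a single number for the whole group
print(per_column)  # one entry for x, one for y
```

The pooled value is about 4.114286, the first entry of the single-column output in the question, while the per-column result keeps x and y separate.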
