How do I calculate the standard deviation with a pivot table in Pandas?

Posted 2023-03-12

【Title】How do I calculate the standard deviation with a pivot table in Pandas? 【Asked】2015-08-23 17:17:14 【Question】:

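I have a set of data about players in particular sports, with a specific number recorded for each player. I want to use a pivot table in Pandas to split the data by sport and, for each sport, show the mean "number" value across everyone who plays that sport. (So for basketball it would be the average over all basketball players; the number essentially represents a preference.)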

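I can do this easily with a pivot table, but I can't figure out how to do the same thing for the standard deviation. I can pass np.mean to get the mean, but there doesn't seem to be an np.std. I know there is std(), but I'm not sure how to use it in this context.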

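Is a pivot table the wrong tool for this task? How should I find the standard deviation of the numeric data across all players of a particular sport?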

【Comments】:

When asking questions about pandas, it is best to include some code that generates a sample DataFrame. For an example, see ***.com/questions/13404468/t-test-in-pandas-python.

【Answer 1】:

If you have a DataFrame (df) with a column named "sport", it is straightforward:

df.groupby(by=['sport']).std()
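To make the groupby approach concrete, here is a minimal sketch. The sample data and the column names "sport" and "number" are assumptions for illustration, since the question does not show its DataFrame. Selecting the numeric column explicitly avoids trying to aggregate non-numeric columns such as player names.

```python
import pandas as pd

# Hypothetical sample data; the column names "sport" and "number" are assumed.
df = pd.DataFrame({
    "sport": ["basketball", "basketball", "basketball", "tennis", "tennis"],
    "number": [2.0, 4.0, 6.0, 1.0, 3.0],
})

# Group by sport and take the standard deviation of the numeric column only.
# pandas uses the sample standard deviation (ddof=1) by default.
std_by_sport = df.groupby("sport")["number"].std()
print(std_by_sport)
```

The result is a Series indexed by sport, so std_by_sport["basketball"] gives the value for a single sport.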

【Discussion】:

【Answer 2】:

df.pivot_table(values='number', index='sport', aggfunc='std')
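Since the question asks for both the mean and the standard deviation, one possible sketch passes a list of aggregation names to pivot_table. The sample data is hypothetical, and the column names "sport" and "number" follow the call above rather than the asker's actual data.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the pivot_table call above.
df = pd.DataFrame({
    "sport": ["basketball", "basketball", "basketball", "tennis", "tennis"],
    "number": [2.0, 4.0, 6.0, 1.0, 3.0],
})

# A list of aggregation names produces one column group per statistic.
summary = df.pivot_table(values="number", index="sport", aggfunc=["mean", "std"])
print(summary)
```

With a list for aggfunc the columns form a MultiIndex, so a single value is read with, for example, summary.loc["basketball", ("std", "number")].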

【Discussion】:

【Answer 3】:

Which version of numpy are you using? 1.9.2 has np.std:

np.std?
Type:        function
String form: <function std at 0x0000000003EE47B8>
File:        c:\anaconda3\lib\site-packages\numpy\core\fromnumeric.py
Definition:  np.std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False)
Docstring:
Compute the standard deviation along the specified axis.

Returns the standard deviation, a measure of the spread of a distribution,
of the array elements. The standard deviation is computed for the
flattened array by default, otherwise over the specified axis.
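One detail worth checking when comparing results: the signature above shows that np.std defaults to ddof=0 (population standard deviation), while pandas' std() defaults to ddof=1 (sample standard deviation). The short sketch below, using made-up values, shows how the two can differ on the same data.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only.
values = [2.0, 4.0, 6.0]

population_std = np.std(values)        # ddof=0: divide by n
sample_std = np.std(values, ddof=1)    # ddof=1: divide by n - 1
pandas_std = pd.Series(values).std()   # pandas defaults to ddof=1

print(population_std, sample_std, pandas_std)
```

Passing ddof=1 to np.std, or ddof=0 to the pandas std() method, makes the two libraries agree.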

【Discussion】:

If I use np.std instead of sum, I get this error: gist.github.com/anonymous/0f439fff9af42d600639