Why do we divide by n-1 rather than n when computing the variance of a drawn sample?

Posted 2023-03-30

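If you work through the derivation once, you arrive at the formula with n-1 in the denominator. The underlying reason is that the sample variance plays a different role from the population variance. For the population variance, dividing by n is simply correct. The sample variance, however, is not computed for its own sake: it is used to estimate the population variance. With n in the denominator the result is not an unbiased estimator, that is, its expected value does not equal the population variance (it is smaller by a factor of (n-1)/n), so there is little point in discussing its efficiency. Only with n-1 in the denominator is the sample variance unbiased, so that on average it reflects the population variance. When the sample is large enough, that is, when n is large enough, dividing by n or by n-1 makes almost no difference, because the two results differ only by the factor (n-1)/n. How large n has to be depends on the precision you need.

Reference answer A: Because our goal is to use the sample variance to estimate the population variance.
Dividing by n and dividing by n-1 give two different estimators of the variance, and each has its own advantages.
When you study parameter estimation, you will learn that the divide-by-n version is the maximum likelihood estimator (for normally distributed data), while the divide-by-n-1 version is that estimator rescaled to remove its bias.
It can be proved that the divide-by-n-1 version is unbiased, which is why it is used more often. That does not make it better than the divide-by-n version in every respect; it is just that when the sample size is small, we put more weight on the unbiasedness of the estimate.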
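
The unbiasedness claim in the answers above can be checked with a short derivation. This is only a sketch, and it assumes something the original answers do not state explicitly: that x_1, ..., x_n are independent draws from a population with mean \mu and finite variance \sigma^2, with \overline{x} denoting the sample mean.

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \overline{x})^2
  &= \sum_{i=1}^{n} (x_i - \mu)^2 - n(\overline{x} - \mu)^2, \\
E\!\left[\sum_{i=1}^{n} (x_i - \mu)^2\right] &= n\sigma^2,
\qquad
E\!\left[n(\overline{x} - \mu)^2\right] = n \cdot \frac{\sigma^2}{n} = \sigma^2, \\
E\!\left[\sum_{i=1}^{n} (x_i - \overline{x})^2\right] &= (n-1)\sigma^2 .
\end{align*}
```

Dividing the last line by n therefore gives an expected value of (n-1)\sigma^2/n, which falls short of \sigma^2, while dividing by n-1 gives exactly \sigma^2.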

Why does MATLAB divide by n-1 when computing the variance?

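The variance estimate comes in two forms, biased and unbiased. The former divides by n; the latter divides by n-1, which is known as Bessel's correction and removes the bias that arises when a sample is used to estimate the population variance. MATLAB uses the latter by default.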
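
The difference between the two forms can be seen in a small simulation. The following is a minimal sketch in Python using only the standard library; it is not MATLAB code, and the function name `average_variance_estimates`, the sample size, the trial count, and the seed are all illustrative choices. It assumes samples drawn from a standard normal distribution, so the true population variance is 1.

```python
import random
import statistics


def average_variance_estimates(n=5, trials=20000, seed=0):
    """Average the divide-by-n and divide-by-(n-1) estimates over many samples.

    Each sample holds n draws from a standard normal distribution,
    whose true variance is 1.0 (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    biased_total = 0.0
    unbiased_total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        biased_total += statistics.pvariance(sample)   # divides by n
        unbiased_total += statistics.variance(sample)  # divides by n - 1
    return biased_total / trials, unbiased_total / trials


biased_avg, unbiased_avg = average_variance_estimates()
print(f"average when dividing by n:     {biased_avg:.3f}")
print(f"average when dividing by n - 1: {unbiased_avg:.3f}")
```

With n = 5, the divide-by-n average should land near (n-1)/n = 0.8 and the divide-by-(n-1) average near 1, which is the systematic underestimate that the correction removes.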
Here is a quoted explanation of the difference between the two:
In statistics, Bessel's correction, named after Friedrich Bessel, is the use of n − 1 instead of n in the formula for the sample variance and sample standard deviation, where n is the number of observations in a sample: it corrects the bias in the estimation of the population variance, and some (but not all) of the bias in the estimation of the population standard deviation.

That is, when estimating the population variance and standard deviation from a sample when the population mean is unknown, the sample variance is a biased estimator of the population variance, and systematically underestimates it. Multiplying the standard sample variance by n/(n − 1) (equivalently, using 1/(n − 1) instead of 1/n) corrects for this, and gives an unbiased estimator of the population variance. The cost of this correction is that the unbiased estimator has uniformly higher mean squared error than the biased estimator.

A subtle point is that, while the sample variance (using Bessel's correction) is an unbiased estimate of the population variance, its square root, the sample standard deviation, is a biased estimate of the population standard deviation; because the square root is a concave function, the bias is downward, by Jensen's inequality. There is no general formula for an unbiased estimator of the population standard deviation, though there are correction factors for particular distributions, such as the normal; see unbiased estimation of standard deviation for details.

One can understand Bessel's correction intuitively as the degrees of freedom in the residuals vector:

(x_1-\overline{x},\,\dots,\,x_n-\overline{x}),

where \overline{x} is the mean. While there are n independent samples, there are only n − 1 independent residuals, as they sum to 0.

Reference answer A: Mathematically, the formula for the sample variance divides by (n-1).

The formula for the population variance divides by n.

In terms of MATLAB syntax:
s = std(X,flag) for flag = 0, is the same as std(X). For flag = 1, std(X,1) returns the standard deviation using (2) above, producing the second moment of the set of values about their mean.

flag=0 divides by (n-1); it is also the default.
flag=1 divides by n.
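
To make the two normalizations concrete, here is a minimal Python sketch (standard library only) that mirrors the two cases. It is an analogue for illustration, not MATLAB code: the helper `std_with_flag` and the data values are hypothetical, and only the meaning of flag = 0 and flag = 1 is taken from the description above.

```python
import math
import statistics


def std_with_flag(values, flag=0):
    """Standard deviation with a MATLAB-style flag (hypothetical helper).

    flag = 0 divides the sum of squared deviations by n - 1 (the default);
    flag = 1 divides it by n.
    """
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    divisor = n if flag == 1 else n - 1
    return math.sqrt(sum_sq / divisor)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values
std_flag0 = std_with_flag(data, 0)  # divide by n - 1
std_flag1 = std_with_flag(data, 1)  # divide by n

# Python's own library draws the same distinction:
# statistics.stdev divides by n - 1, statistics.pstdev divides by n.
print(std_flag0, statistics.stdev(data))
print(std_flag1, statistics.pstdev(data))
```

Each printed pair should agree up to floating-point rounding, since `statistics.stdev` and `statistics.pstdev` use the n - 1 and n divisors respectively.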

Reference: http://www.mathworks.com/help/techdoc/ref/std.html

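Reference answer B: Why the variance is divided by n-1 is a topic from mathematical statistics and probability theory.
In mathematical terms, it is done to guarantee the "unbiasedness" of the estimate of the random quantity.

You can search Baidu using "方差无偏性" (unbiasedness of variance) as the keyword.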
