An Intuitive Understanding of the Law of Total Variance (Variance Decomposition Formula)
Posted by Jie Qiao
Law of Iterated Expectations (LIE)
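Before discussing variance decomposition, we first need the law of iterated expectations (also called the law of total expectation). Given a random variable X, we can partition its values into several parts according to the value of another variable Y.
Under such a partition, the overall mean of X equals the probability-weighted average of the means of the individual parts: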
$$\operatorname{E}[X] = \operatorname{E}[\operatorname{E}[X \mid Y]]$$
For example, suppose X is split into three equally likely parts whose means are 70, 60, and 80. Then
$$\begin{aligned}
E[X] &= E[E[X \mid Y]] \\
&= \sum_{i=1}^{3} P(Y = y_i)\, E[X \mid Y = y_i] \\
&= \frac{70 + 60 + 80}{3} \\
&= 70
\end{aligned}$$
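The example above can be checked numerically. The following is a minimal sketch using made-up scores (the data and the labels `y1`, `y2`, `y3` are assumptions chosen so that the part means are 70, 60, and 80). The parts are deliberately given unequal sizes to show that the part means must be weighted by $P(Y = y_i)$; the plain average of 70, 60, and 80 matches $E[X]$ only when the parts are equally likely.

```python
from statistics import fmean

# Hypothetical scores grouped by a made-up label Y; the part means are 70, 60, 80.
groups = {
    "y1": [65, 70, 75],
    "y2": [55, 60, 65, 60],
    "y3": [80, 80],
}

all_values = [x for xs in groups.values() for x in xs]
n = len(all_values)

# Left-hand side: E[X], the plain mean over all observations.
overall_mean = fmean(all_values)

# Right-hand side: E[E[X|Y]], part means weighted by P(Y = y_i) = |part| / n.
iterated_mean = sum(len(xs) / n * fmean(xs) for xs in groups.values())

# Unweighted average of the part means; equals E[X] only for equally likely parts.
naive_mean = fmean([fmean(xs) for xs in groups.values()])

print("E[X]            =", overall_mean)
print("E[E[X|Y]]       =", iterated_mean)
print("unweighted mean =", naive_mean)
```

With these unequal part sizes the first two numbers agree, while the unweighted mean of 70 does not.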
Formally, in the continuous case,
$$\begin{aligned}
E[E[X \mid Y]] &= \int p(y) \int x\, p(x \mid y)\, dx\, dy \\
&= \iint x\, p(x, y)\, dx\, dy \\
&= \int x\, p(x)\, dx \\
&= E[X]
\end{aligned}$$
Mathematical Derivation of the Law of Total Variance
Another important rule is the law of total variance:
$$\operatorname{Var}(X) = \operatorname{E}[\operatorname{Var}(X \mid Y)] + \operatorname{Var}(\operatorname{E}[X \mid Y])$$
It decomposes the variance into two components, which can be derived as follows:
$$\begin{aligned}
\operatorname{E}[\operatorname{Var}(X \mid Y)] &= \operatorname{E}\big[\operatorname{E}[X^2 \mid Y] - (\operatorname{E}[X \mid Y])^2\big] && \text{(definition of variance)} \\
&= \operatorname{E}\big[\operatorname{E}[X^2 \mid Y]\big] - \operatorname{E}\big[(\operatorname{E}[X \mid Y])^2\big] && \text{(linearity of expectation)} \\
&= \operatorname{E}[X^2] - \operatorname{E}\big[(\operatorname{E}[X \mid Y])^2\big] && \text{(law of iterated expectations)}
\end{aligned}$$

$$\begin{aligned}
\operatorname{Var}(\operatorname{E}[X \mid Y]) &= \operatorname{E}\big[(\operatorname{E}[X \mid Y])^2\big] - \big(\operatorname{E}[\operatorname{E}[X \mid Y]]\big)^2 && \text{(definition of variance)} \\
&= \operatorname{E}\big[(\operatorname{E}[X \mid Y])^2\big] - (\operatorname{E}[X])^2 && \text{(law of iterated expectations)}
\end{aligned}$$

$$\therefore\ \operatorname{E}[\operatorname{Var}(X \mid Y)] + \operatorname{Var}(\operatorname{E}[X \mid Y]) = \operatorname{E}[X^2] - (\operatorname{E}[X])^2 = \operatorname{Var}(X)$$
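How should we interpret this?
- What is $\operatorname{E}[\operatorname{Var}(X \mid Y)]$? Intuitively, it is the average of the variances within each part of the partition, so it measures the average within-group variation.
- What is $\operatorname{Var}(\operatorname{E}[X \mid Y])$? It measures how much the group means differ from one another, so it captures the between-group variation.
The total variance is therefore the sum of the within-group and between-group variation. This is the law of total variance.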
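The decomposition can be verified on a small data set. The sketch below uses hypothetical grouped scores (the data and group labels are assumptions for illustration) and treats the data as the whole population, so it uses the population variance `statistics.pvariance` and the weights $P(Y = y_i) = |\text{group}| / n$.

```python
from statistics import fmean, pvariance

# Hypothetical scores grouped by a made-up label Y.
groups = {
    "y1": [65, 70, 75],
    "y2": [55, 60, 65, 60],
    "y3": [80, 80],
}

all_values = [x for xs in groups.values() for x in xs]
n = len(all_values)

# P(Y = y), E[X | Y = y] and Var(X | Y = y) for each group.
weights = {y: len(xs) / n for y, xs in groups.items()}
group_means = {y: fmean(xs) for y, xs in groups.items()}
group_vars = {y: pvariance(xs) for y, xs in groups.items()}

# Left-hand side: Var(X) over all observations.
total_var = pvariance(all_values)

# E[Var(X|Y)]: weighted average of the within-group variances.
within = sum(weights[y] * group_vars[y] for y in groups)

# Var(E[X|Y]): weighted variance of the group means around the overall mean.
overall_mean = sum(weights[y] * group_means[y] for y in groups)
between = sum(weights[y] * (group_means[y] - overall_mean) ** 2 for y in groups)

print("Var(X)          =", total_var)
print("within + between =", within + between)
```

The two printed values agree up to floating-point rounding. Using the sample variance (dividing by n - 1) instead would break the exact equality, which is why the sketch uses `pvariance`.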
Connection to k-means clustering
Readers familiar with clustering may have noticed that k-means has two equivalent formulations of its objective. The first is minimizing the within-cluster sum of squares (WCSS):
$$\underset{\mathbf{S}}{\operatorname{arg\,min}} \sum_{i=1}^{k} \sum_{\mathbf{x} \in S_i} \lVert \mathbf{x} - \boldsymbol{\mu}_i \rVert^2 = \underset{\mathbf{S}}{\operatorname{arg\,min}} \sum_{i=1}^{k} |S_i| \operatorname{Var} S_i$$
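The identity between the two sides of this objective, and its link to the law of total variance, can be checked on a toy clustering. The sketch below is an illustration under assumptions: the points and the cluster assignment are made up, the helper names (`centroid`, `sq_dist`, `cluster_variance`) are hypothetical, and $\operatorname{Var} S_i$ is taken to be the mean squared distance of the points in $S_i$ to their centroid $\boldsymbol{\mu}_i$.

```python
# Hypothetical 2-D points, already assigned to two clusters.
clusters = [
    [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5)],
    [(8.0, 8.0), (9.0, 7.0)],
]

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def cluster_variance(points):
    """Mean squared distance to the cluster centroid (population variance)."""
    mu = centroid(points)
    return sum(sq_dist(p, mu) for p in points) / len(points)

# Left-hand side: within-cluster sum of squares.
wcss = sum(sq_dist(p, centroid(c)) for c in clusters for p in c)

# Right-hand side: sum over clusters of |S_i| * Var(S_i).
weighted_var = sum(len(c) * cluster_variance(c) for c in clusters)

# Link to total variance: total sum of squares = within part + between part.
all_points = [p for c in clusters for p in c]
global_mu = centroid(all_points)
total_ss = sum(sq_dist(p, global_mu) for p in all_points)
bcss = sum(len(c) * sq_dist(centroid(c), global_mu) for c in clusters)

print("WCSS              =", wcss)
print("sum |S_i| Var S_i =", weighted_var)
print("total SS          =", total_ss, "vs WCSS + BCSS =", wcss + bcss)
```

Dividing `wcss` by the number of points gives the within-group term $\operatorname{E}[\operatorname{Var}(X \mid Y)]$ with Y as the cluster label, and dividing `bcss` by it gives the between-group term. Since the total is fixed by the data, a smaller within-cluster term means a larger between-cluster term.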