Entropy, relative entropy and mutual information
Posted by mtandhj
Entropy
\[
H(X) = -\sum_{x} p(x) \log p(x).
\]
Entropy is nonnegative, and it attains its minimum value \(0\) if and only if \(X\) is deterministic, i.e. \(P(X = x_0) = 1\) for some \(x_0\).
Proof:
Since \(0 < p(x) \le 1\) for every \(x\) in the support, each term \(p(x) \log \frac{1}{p(x)}\) is nonnegative, so
\[
\begin{array}{ll}
H(X)
& = -\sum_{x} p(x) \log p(x) \\
& = \sum_{x} p(x) \log \frac{1}{p(x)} \\
& \ge 0.
\end{array}
\]
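For a quick numerical illustration (a minimal Python sketch, using base-2 logarithms and made-up distributions): a deterministic distribution has entropy 0, a uniform distribution over \(n\) outcomes has entropy \(\log n\), and the value is never negative.

```python
from math import log2

def entropy(p):
    """H(X) = -sum_x p(x) log p(x) = sum_x p(x) log(1/p(x)); zero-probability terms contribute 0."""
    return sum(px * log2(1.0 / px) for px in p if px > 0)

print(entropy([1.0, 0.0, 0.0]))            # deterministic X: minimum value 0.0
print(entropy([0.25, 0.25, 0.25, 0.25]))   # uniform over 4 outcomes: log2(4) = 2.0 bits
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits; entropy is never negative
```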
Joint Entropy
\[
H(X,Y) := -\mathbb{E}_{p(x, y)} [\log p(X, Y)] = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x, y).
\]
Conditional Entropy
\[
\begin{array}{ll}
H(Y|X)
&:= \mathbb{E}_{p(x)} [H(Y|X=x)] \\
&= \sum_{x \in \mathcal{X}} p(x) H(Y|X=x) \\
&= - \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x) p(y|x) \log p(y|x) \\
&= - \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(y|x).
\end{array}
\]
Note the distinction between \(H(Y|X)\) and \(H(Y|X=x)\): the former is the average of the latter over \(p(x)\).
Chain Rule
\[
H(X, Y) = H(X) + H(Y|X).
\]
Proof:
From \(p(y|x) = \frac{p(x, y)}{p(x)}\) and the derivation above:
\[
\begin{array}{ll}
H(Y|X)
&= H(X, Y) + \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x) \\
&= H(X, Y) - H(X).
\end{array}
\]
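The chain rule is easy to spot-check numerically. Below is a minimal Python sketch (base-2 logarithms; the joint table `p_xy` is an arbitrary made-up example) that computes \(H(X,Y)\), \(H(X)\) and \(H(Y|X)\) directly from the definitions.

```python
from math import log2

# Arbitrary joint distribution p(x, y) over a 2x3 alphabet (rows: x, columns: y).
p_xy = [[0.10, 0.20, 0.10],
        [0.30, 0.05, 0.25]]

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

p_x  = [sum(row) for row in p_xy]              # marginal p(x)
H_X  = H(p_x)
H_XY = H([p for row in p_xy for p in row])     # joint entropy H(X, Y)

# H(Y|X) = -sum_{x,y} p(x,y) log p(y|x)
H_Y_given_X = -sum(p_xy[x][y] * log2(p_xy[x][y] / p_x[x])
                   for x in range(2) for y in range(3) if p_xy[x][y] > 0)

print(H_XY, H_X + H_Y_given_X)  # the two numbers agree: H(X,Y) = H(X) + H(Y|X)
```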
Corollary:
\[
H(X, Y|Z) = H(X|Z) + H(Y|X, Z).
\]
\[
\begin{array}{ll}
H(Y|X, Z)
&= \mathbb{E}_{p(x, z)} [H(Y|X=x, Z=z)] \\
&= -\sum_{x, y, z} p(x, z) p(y|x, z) \log p(y|x, z) \\
&= -\sum_{x, y, z} p(x, y, z) [\log p(x, y|z) - \log p(x|z)] \\
&= \mathbb{E}_{z} [H(X, Y|Z=z)] - \mathbb{E}_{z} [H(X|Z=z)] \\
&= H(X, Y|Z) - H(X|Z).
\end{array}
\]
Mutual Information
\[
I(X;Y) := H(X) - H(X|Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log \frac{p(x, y)}{p(x) p(y)}.
\]
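Both expressions for \(I(X;Y)\) can be evaluated on the same joint table to confirm they agree (a Python sketch; the table is again a made-up example).

```python
from math import log2

p_xy = [[0.10, 0.20, 0.10],
        [0.30, 0.05, 0.25]]
p_x = [sum(row) for row in p_xy]
p_y = [sum(p_xy[x][y] for x in range(2)) for y in range(3)]

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# I(X;Y) = H(X) - H(X|Y)
H_X = H(p_x)
H_X_given_Y = -sum(p_xy[x][y] * log2(p_xy[x][y] / p_y[y])
                   for x in range(2) for y in range(3) if p_xy[x][y] > 0)
I_1 = H_X - H_X_given_Y

# I(X;Y) = sum_{x,y} p(x,y) log [ p(x,y) / (p(x) p(y)) ]
I_2 = sum(p_xy[x][y] * log2(p_xy[x][y] / (p_x[x] * p_y[y]))
          for x in range(2) for y in range(3) if p_xy[x][y] > 0)

print(I_1, I_2)  # both expressions give the same value
```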
Relative Entropy
\[
D(p\|q) := \mathbb{E}_p \Big[\log \frac{p(X)}{q(X)}\Big] = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)}.
\]
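A direct computation of \(D(p\|q)\) (a Python sketch with made-up distributions; note that relative entropy is not symmetric in \(p\) and \(q\)):

```python
from math import log2

def kl(p, q):
    """D(p||q) = sum_x p(x) log [p(x)/q(x)]; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * log2(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]
print(kl(p, q), kl(q, p))   # both nonnegative, but generally different
print(kl(p, p))             # D(p||p) = 0
```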
Chain Rules
Chain Rule for Entropy
Let \((X_1, X_2, \ldots, X_n) \sim p(x_1, x_2, \ldots, x_n)\). Then
\[
H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^n H(X_i|X_{i-1}, \ldots, X_1).
\]
Proof:
Induction, together with \(H(X, Y) = H(X) + H(Y|X)\).
Chain Rule for Mutual Information
Definition:
\[
I(X;Y|Z) := H(X|Z) - H(X|Y, Z) = \mathbb{E}_{p(x, y, z)} \Big[\log \frac{p(X, Y|Z)}{p(X|Z) p(Y|Z)}\Big].
\]
Property:
\[
I(X_1, X_2, \ldots, X_n; Y) = \sum_{i=1}^n I(X_i;Y|X_{i-1}, \ldots, X_1).
\]
Proof:
\[
\begin{array}{ll}
I(X_1, X_2, \ldots, X_n; Y)
& = H(X_1, \ldots, X_n) + H(Y) - H(X_1, \ldots, X_n, Y) \\
&= H(X_1, \ldots, X_{n-1}) + H(X_n|X_1, \ldots, X_{n-1}) + H(Y) - H(X_1, \ldots, X_n, Y) \\
&= I(X_1, X_2, \ldots, X_{n-1}; Y) + H(X_n|X_1, \ldots, X_{n-1}) - H(X_n|X_1, \ldots, X_{n-1}, Y) \\
&= I(X_1, X_2, \ldots, X_{n-1}; Y) + I(X_n;Y|X_1, \ldots, X_{n-1}),
\end{array}
\]
and the claim follows by induction.
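A numerical check of the chain rule with \(n = 2\) (a Python sketch; the joint distribution over three binary variables is generated at random, and all quantities are computed from entropies):

```python
from math import log2
from itertools import product
import random

random.seed(0)
# Random joint distribution p(x1, x2, y) over binary variables (a made-up example).
keys = list(product([0, 1], repeat=3))
w = [random.random() for _ in keys]
p = {k: v / sum(w) for k, v in zip(keys, w)}

def marg(p, idx):
    """Marginal distribution over the coordinates listed in idx."""
    out = {}
    for k, v in p.items():
        kk = tuple(k[i] for i in idx)
        out[kk] = out.get(kk, 0.0) + v
    return out

def H(p):
    return -sum(v * log2(v) for v in p.values() if v > 0)

# Mutual information and conditional mutual information written in terms of entropies.
I_x1x2_y = H(marg(p, [0, 1])) + H(marg(p, [2])) - H(p)             # I(X1,X2;Y)
I_x1_y   = H(marg(p, [0])) + H(marg(p, [2])) - H(marg(p, [0, 2]))  # I(X1;Y)
# I(X2;Y|X1) = H(X2|X1) - H(X2|X1,Y) = H(X1,X2) - H(X1) - H(X1,X2,Y) + H(X1,Y)
I_x2_y_x1 = H(marg(p, [0, 1])) - H(marg(p, [0])) - H(p) + H(marg(p, [0, 2]))

print(I_x1x2_y, I_x1_y + I_x2_y_x1)  # chain rule: the two values agree
```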
Chain Rule for Relative Entropy
Definition:
\[
\begin{array}{ll}
D(p(y|x)\|q(y|x))
&:= \mathbb{E}_{p(x, y)} \Big[\log \frac{p(Y|X)}{q(Y|X)}\Big] \\
&= \sum_x p(x) \sum_y p(y|x) \log \frac{p(y|x)}{q(y|x)}.
\end{array}
\]
Property:
\[
D(p(x, y) \| q(x, y)) = D(p(x) \| q(x)) + D(p(y|x) \| q(y|x)).
\]
Proof:
\[
\begin{array}{ll}
D(p(x, y)\| q(x, y))
&= \sum_{x, y} p(x, y) \log \frac{p(x, y)}{q(x, y)} \\
&= \sum_{x, y} p(x, y) \log \frac{p(y|x) p(x)}{q(y|x) q(x)} \\
&= \sum_{x, y} p(x, y) \Big[\log \frac{p(y|x)}{q(y|x)} + \log \frac{p(x)}{q(x)}\Big] \\
&= D(p(x)\|q(x)) + D(p(y|x)\|q(y|x)).
\end{array}
\]
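This decomposition can be verified numerically (a Python sketch with two made-up joint distributions over a \(2 \times 2\) alphabet):

```python
from math import log2

# Two made-up joint distributions over a 2x2 alphabet (rows: x, columns: y).
p = [[0.30, 0.20],
     [0.10, 0.40]]
q = [[0.25, 0.25],
     [0.25, 0.25]]

def kl_flat(p, q):
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p_x = [sum(row) for row in p]
q_x = [sum(row) for row in q]

D_joint = kl_flat([v for row in p for v in row], [v for row in q for v in row])
D_marg  = kl_flat(p_x, q_x)
# Conditional relative entropy D(p(y|x)||q(y|x)) = sum_x p(x) sum_y p(y|x) log [p(y|x)/q(y|x)]
D_cond  = sum(p[x][y] * log2((p[x][y] / p_x[x]) / (q[x][y] / q_x[x]))
              for x in range(2) for y in range(2) if p[x][y] > 0)

print(D_joint, D_marg + D_cond)  # chain rule for relative entropy: the two values agree
```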
Remark: by symmetry,
\[
D(p(x, y) \| q(x, y)) = D(p(y) \| q(y)) + D(p(x|y) \| q(x|y)).
\]
Hence, when \(p(x) = q(x)\), we obtain
\[
D(p(x, y) \| q(x, y)) = D(p(y|x) \| q(y|x)) \ge D(p(y) \| q(y)).
\]
1. \(D(p(y|x)\|q(y|x)) = D(p(x, y)\| p(x) q(y|x))\);
2. \(D(p(x_1, x_2, \ldots, x_n)\| q(x_1, x_2, \ldots, x_n)) = \sum_{i=1}^n D(p(x_i|x_{i-1}, \ldots, x_1)\|q(x_i|x_{i-1}, \ldots, x_1))\);
3. \(D(p(y)\| q(y)) \le D(p(y|x)\|q(y|x))\) whenever \(q(x) = p(x)\).
The proofs of 1, 2, and 3 all follow from the identities above with minor manipulation.
Jensen's Inequality
If \(f\) is a convex function, then
\[
\mathbb{E} [f(X)] \ge f(\mathbb{E}[X]).
\]
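For example, with the convex function \(f(x) = x^2\) and uniform samples (a Monte-Carlo sketch in Python; the sample size and range are arbitrary choices):

```python
import random

random.seed(0)
xs = [random.uniform(-2, 2) for _ in range(10_000)]

def f(x):
    """A convex function."""
    return x * x

lhs = sum(f(x) for x in xs) / len(xs)   # E[f(X)]
rhs = f(sum(xs) / len(xs))              # f(E[X])

print(lhs >= rhs, lhs, rhs)  # Jensen: E[f(X)] >= f(E[X]) for convex f
```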
Properties
- \(D(p\|q) \ge 0\), with equality if and only if \(p = q\).
- \(I(X; Y) \ge 0\), with equality if and only if \(X\) and \(Y\) are independent.
- \(D(p(y|x)\|q(y|x)) \ge 0\) (by the property above), with equality if and only if \(p(y|x) = q(y|x)\) for all \(x\) with \(p(x) > 0\).
- \(I(X; Y|Z) \ge 0\), with equality if and only if \(X\) and \(Y\) are conditionally independent given \(Z\).
- \(H(X|Y) \le H(X)\), with equality if and only if \(X\) and \(Y\) are independent.
- \(H(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^n H(X_i)\), with equality if and only if all the variables are independent.
Log Sum Inequality
This part concerns the joint convexity of relative entropy in the pair \((p, q)\) (a consequence of the log sum inequality):
\[
D(\lambda p_1 + (1-\lambda) p_2 \,\|\, \lambda q_1 + (1-\lambda) q_2) \le \lambda D(p_1 \| q_1) + (1-\lambda) D(p_2 \| q_2), \quad 0 \le \lambda \le 1.
\]
One proof goes through the convexity of \(p \log \frac{p}{q}\); a more interesting proof constructs a new pair of joint distributions \(p(x, c)\), \(q(x, c)\) via
\[
p(x|c=0) = p_1(x), \quad p(x|c=1) = p_2(x), \quad q(x|c=0) = q_1(x), \quad q(x|c=1) = q_2(x), \quad p(c=0) = q(c=0) = \lambda, \quad p(c=1) = q(c=1) = 1-\lambda,
\]
so that the marginals are \(p(x) = \lambda p_1(x) + (1-\lambda) p_2(x)\) and \(q(x) = \lambda q_1(x) + (1-\lambda) q_2(x)\), and then applies \(D(p(y)\| q(y)) \le D(p(y|x)\|q(y|x))\) with \(c\) playing the role of the conditioning variable.
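The joint convexity of \(D(\cdot\|\cdot)\) can be spot-checked numerically (a Python sketch with made-up distribution pairs; the small tolerance guards against floating-point rounding):

```python
from math import log2

def kl(p, q):
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p1, q1 = [0.7, 0.2, 0.1], [0.3, 0.3, 0.4]
p2, q2 = [0.1, 0.5, 0.4], [0.5, 0.2, 0.3]

for lam in [0.0, 0.25, 0.5, 0.75, 1.0]:
    p_mix = [lam * a + (1 - lam) * b for a, b in zip(p1, p2)]
    q_mix = [lam * a + (1 - lam) * b for a, b in zip(q1, q2)]
    lhs = kl(p_mix, q_mix)
    rhs = lam * kl(p1, q1) + (1 - lam) * kl(p2, q2)
    print(lam, lhs <= rhs + 1e-12)   # convexity of D(.||.) in the pair (p, q)
```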
- \(H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x)\) is a concave function of \(p\).
- \(I(X; Y) = \sum_{x, y} p(y|x) p(x) \log \frac{p(y|x)}{p(y)}\) is a concave function of \(p(x)\) for fixed \(p(y|x)\), and a convex function of \(p(y|x)\) for fixed \(p(x)\).
We prove only the second half. Take any \(p_1(y|x), p_2(y|x)\); since \(p(x)\) is fixed, for all \(0 \le \lambda \le 1\) define
\[
\begin{array}{l}
p(x, y) := \lambda p_1(x, y) + (1-\lambda) p_2(x, y) = [\lambda p_1(y|x) + (1-\lambda) p_2(y|x)] p(x), \\
p(y) := \sum_x p(x, y) = \lambda \sum_x p_1(x, y) + (1-\lambda) \sum_{x} p_2(x, y), \\
q(x, y) := p(x) p(y) = \lambda p(x) \sum_{x'} p_1(x', y) + (1-\lambda) p(x) \sum_{x'} p_2(x', y) =: \lambda q_1(x, y) + (1-\lambda) q_2(x, y).
\end{array}
\]
Moreover,
\[
I(X; Y) = D(p(x, y)\| p(x) p(y)) = D(p(x, y)\| q(x, y)),
\]
and since the KL divergence is convex in the pair \((p, q)\), \(I(X;Y)\) is convex in \(p(y|x)\) for fixed \(p(x)\).
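The convexity in \(p(y|x)\) can likewise be spot-checked by mixing two channels under a fixed input distribution (a Python sketch with made-up channel matrices):

```python
from math import log2

p_x = [0.4, 0.6]                       # fixed input distribution p(x)
W1 = [[0.9, 0.1], [0.2, 0.8]]          # channel 1: rows are p1(y|x)
W2 = [[0.5, 0.5], [0.7, 0.3]]          # channel 2: rows are p2(y|x)

def mi(p_x, W):
    """I(X;Y) = sum_{x,y} p(x) W(y|x) log [ W(y|x) / p(y) ]."""
    p_y = [sum(p_x[x] * W[x][y] for x in range(2)) for y in range(2)]
    return sum(p_x[x] * W[x][y] * log2(W[x][y] / p_y[y])
               for x in range(2) for y in range(2) if W[x][y] > 0)

for lam in [0.0, 0.25, 0.5, 0.75, 1.0]:
    W = [[lam * W1[x][y] + (1 - lam) * W2[x][y] for y in range(2)] for x in range(2)]
    lhs = mi(p_x, W)
    rhs = lam * mi(p_x, W1) + (1 - lam) * mi(p_x, W2)
    print(lam, lhs <= rhs + 1e-12)   # I is convex in p(y|x) for fixed p(x)
```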
Data-Processing Inequality
Suppose the data form a Markov chain \(X \rightarrow Y \rightarrow Z\), i.e. \(P(X, Y, Z) = P(X) P(Y|X) P(Z|Y)\); for example, \(Y = f(X)\) and \(Z = g(Y)\). Then
\[
I(Y, Z; X) = I(X;Y) + I(X;Z|Y) = I(X;Z) + I(X;Y|Z),
\]
and the Markov property gives \(p(x, z|y) = p(x|y) p(z|y)\), so
\[
I(X;Z|Y) = \sum_{x, y, z} p(x, y, z) \log \frac{p(x, z|y)}{p(x|y) p(z|y)} = \sum_{x, y, z} p(x, y, z) \log 1 = 0,
\qquad
I(X;Y|Z) = \sum_{x, y, z} p(x, y, z) \log \frac{p(x|y)}{p(x|z)} \ge 0.
\]
Hence
\[
I(X;Z) \le I(X;Y), \qquad I(X;Y|Z) \le I(X;Y).
\]
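A numerical instance of \(I(X;Z) \le I(X;Y)\) for a Markov chain \(X \rightarrow Y \rightarrow Z\) (a Python sketch with made-up transition matrices):

```python
from math import log2
from itertools import product

# Markov chain X -> Y -> Z over binary alphabets: p(x, y, z) = p(x) p(y|x) p(z|y).
p_x  = [0.3, 0.7]
p_yx = [[0.8, 0.2], [0.1, 0.9]]        # rows: p(y|x)
p_zy = [[0.6, 0.4], [0.25, 0.75]]      # rows: p(z|y)

p = {(x, y, z): p_x[x] * p_yx[x][y] * p_zy[y][z]
     for x, y, z in product([0, 1], repeat=3)}

def marg(p, idx):
    out = {}
    for k, v in p.items():
        kk = tuple(k[i] for i in idx)
        out[kk] = out.get(kk, 0.0) + v
    return out

def H(p):
    return -sum(v * log2(v) for v in p.values() if v > 0)

def I(p, a, b):
    """Mutual information between the coordinate groups a and b of the joint distribution p."""
    return H(marg(p, a)) + H(marg(p, b)) - H(marg(p, a + b))

print(I(p, [0], [1]), I(p, [0], [2]))  # I(X;Y) >= I(X;Z) for this Markov chain
```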
Sufficient Statistics
- Consider a family of probability distributions \(\{f_{\theta}(x)\}\).
- Let \(X \sim f_{\theta}(x)\) and let \(T(X)\) be a statistic of \(X\); then
\[
\theta \rightarrow X \rightarrow T(X)
\]
forms a Markov chain.
- Hence, by the data-processing inequality,
\[
I(\theta; X) \ge I(\theta; T(X)).
\]
Sufficient Statistics and Compression
Definition (sufficient statistic): a function \(T(X)\) is a sufficient statistic for a family of probability distributions \(\{f_{\theta}(x)\}\) if the conditional distribution of \(X\) given \(T(X) = t\) does not depend on \(\theta\), i.e.
\[
f_{\theta}(x) = f(x|t) f_{\theta}(t)
\;\Rightarrow\;
\theta \rightarrow T(X) \rightarrow X
\;\Rightarrow\;
I(\theta; T(X)) \ge I(\theta; X).
\]
Combined with the inequality in the other direction, in this case \(I(\theta; T(X)) = I(\theta; X)\).
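As a concrete check, take two i.i.d. Bernoulli(\(\theta\)) observations and \(T(X) = X_1 + X_2\), which is sufficient for the Bernoulli family (a Python sketch; the two-point prior on \(\theta\) is an arbitrary choice for the illustration):

```python
from math import log2
from itertools import product

# theta takes two values with a uniform prior; X = (X1, X2) i.i.d. Bernoulli(theta).
thetas = [0.3, 0.8]
prior  = [0.5, 0.5]

# Joint p(theta, x1, x2) and p(theta, t) with T(X) = X1 + X2.
p_joint = {(i, x1, x2): prior[i] * (thetas[i] if x1 else 1 - thetas[i])
                                 * (thetas[i] if x2 else 1 - thetas[i])
           for i, x1, x2 in product(range(2), [0, 1], [0, 1])}
p_t = {}
for (i, x1, x2), v in p_joint.items():
    p_t[(i, x1 + x2)] = p_t.get((i, x1 + x2), 0.0) + v

def H(p):
    return -sum(v * log2(v) for v in p.values() if v > 0)

def marg(p, idx):
    out = {}
    for k, v in p.items():
        kk = tuple(k[i] for i in idx)
        out[kk] = out.get(kk, 0.0) + v
    return out

def I(p):
    """I(first coordinate; remaining coordinates) of the joint distribution p."""
    rest = list(range(1, len(next(iter(p)))))
    return H(marg(p, [0])) + H(marg(p, rest)) - H(p)

# Equal (up to floating-point rounding): T(X) = X1 + X2 is sufficient for the Bernoulli family.
print(I(p_joint), I(p_t))
```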
Definition (minimal sufficient statistic): a sufficient statistic \(T(X)\) is minimal if, for every other sufficient statistic \(U(X)\) of \(\{f_{\theta}(x)\}\),
\[
\theta \rightarrow T(X) \rightarrow U(X) \rightarrow X.
\]