Entropy, relative entropy and mutual information

Posted by mtandhj


Entropy

\[ H(X) = -\sum_{x} p(x) \log p(x). \]

Entropy is nonnegative, and it attains its minimum value \(0\) if and only if \(X\) is deterministic, i.e. \(P(X = x_0) = 1\) for some \(x_0\).

Proof:

Since \(0 < p(x) \le 1\), each term satisfies \(\log \frac{1}{p(x)} \ge \log 1 = 0\) by the monotonicity of \(\log\), so

\[ \begin{array}{ll} H(X) &= -\sum_{x} p(x) \log p(x) \\ &= \sum_{x} p(x) \log \frac{1}{p(x)} \\ &\ge \log 1 = 0. \end{array} \]
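
As a quick sanity check of both claims (a minimal NumPy sketch; the distributions are made-up examples):

```python
import numpy as np

def H(p):
    """Shannon entropy in bits; terms with p(x) = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

print(H([0.5, 0.5]))   # 1.0 bit: a fair coin
print(H([1.0, 0.0]))   # 0.0: deterministic X attains the minimum
print(H([0.25] * 4))   # 2.0 bits: uniform over four outcomes
```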

Joint Entropy

\[ H(X,Y) := -\mathbb{E}_{p(x, y)} [\log p(x, y)] = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x, y). \]

Conditional Entropy

\[ \begin{array}{ll} H(Y|X) &= \mathbb{E}_{p(x)} [H(Y|X=x)] \\ &= \sum_{x \in \mathcal{X}} p(x) H(Y|X=x) \\ &= -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x) p(y|x) \log p(y|x) \\ &= -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(y|x). \end{array} \]

Note the distinction between \(H(Y|X)\) and \(H(Y|X=x)\): the former is the average of the latter over \(p(x)\).
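
The following sketch makes the distinction concrete on a small made-up joint table (NumPy, with `H` as an ad-hoc entropy helper):

```python
import numpy as np

def H(p):
    """Shannon entropy in bits of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# A made-up joint distribution p(x, y): rows index x, columns index y.
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
px = pxy.sum(axis=1)                                  # marginal p(x)

# H(Y|X=x) for each x, then the p(x)-weighted average H(Y|X).
H_Y_given_x = np.array([H(row / row.sum()) for row in pxy])
H_Y_given_X = np.sum(px * H_Y_given_x)
print(H_Y_given_x)   # entropy of each conditional p(y|x): ~0.722 bits each
print(H_Y_given_X)   # their average over p(x): ~0.722 bits
```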

Chain Rule

\[ H(X, Y) = H(X) + H(Y|X). \]

Proof:

Using \(p(y|x) = \frac{p(x, y)}{p(x)}\) together with the derivation above:

\[ \begin{array}{ll} H(Y|X) &= H(X, Y) + \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x) \\ &= H(X, Y) - H(X). \end{array} \]
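
A numeric check of the chain rule on a made-up joint table (the helper `H` is the same ad-hoc entropy function as in the earlier sketches):

```python
import numpy as np

def H(p):
    """Shannon entropy in bits; works on joint tables via flattening."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# A made-up joint p(x, y).
pxy = np.array([[0.3, 0.2],
                [0.1, 0.4]])
px = pxy.sum(axis=1)

# H(Y|X) computed directly as -sum_{x,y} p(x,y) log p(y|x).
H_Y_given_X = -np.sum(pxy * np.log2(pxy / px[:, None]))

print(np.isclose(H(pxy), H(px) + H_Y_given_X))   # True: H(X,Y) = H(X) + H(Y|X)
```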

Corollary:

\[ H(X, Y|Z) = H(X|Z) + H(Y|X, Z). \]

\[ \begin{array}{ll} H(Y|X, Z) &= \mathbb{E}_{p(x, z)} [H(Y|X=x, Z=z)] \\ &= -\sum_{x, y, z} p(x, z) p(y|x, z) \log p(y|x, z) \\ &= -\sum_{x, y, z} p(x, y, z) [\log p(x, y|z) - \log p(x|z)] \\ &= \mathbb{E}_{z} [H(X, Y|Z=z)] - \mathbb{E}_{z} [H(X|Z=z)] = H(X, Y|Z) - H(X|Z). \end{array} \]

Mutual Information

\[ I(X;Y) = H(X) - H(X|Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log \frac{p(x, y)}{p(x)p(y)}. \]

  • \(I(X; Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = I(Y;X) = H(X) + H(Y) - H(X, Y) \ge 0\)

  • \(I(X; X) = H(X)\) (both identities are checked numerically after this list)
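
A minimal NumPy sketch; the joint table below is a made-up example:

```python
import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

pxy = np.array([[0.3, 0.2],
                [0.1, 0.4]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

# I(X;Y) via the double sum of p(x,y) log [p(x,y) / (p(x)p(y))] ...
I_sum = np.sum(pxy * np.log2(pxy / np.outer(px, py)))
# ... and via entropies.
print(np.isclose(I_sum, H(px) + H(py) - H(pxy)))      # True
print(np.isclose(I_sum, H(px) - (H(pxy) - H(py))))    # True: H(X) - H(X|Y)

# I(X;X) = H(X): the joint distribution of (X, X) is diagonal.
pxx = np.diag(px)
print(np.isclose(H(px) + H(px) - H(pxx), H(px)))      # True
```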

Relative Entropy

\[ D(p \| q) := \mathbb{E}_p \Big[\log \frac{p(x)}{q(x)}\Big] = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)}. \]
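
A minimal sketch of \(D(p \| q)\) in NumPy, illustrating that it is asymmetric and vanishes iff \(p = q\) (the example distributions are arbitrary):

```python
import numpy as np

def D(p, q):
    """Relative entropy D(p||q) in bits; assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

p, q = [0.5, 0.5], [0.9, 0.1]
print(D(p, q), D(q, p))   # ~0.737 vs ~0.531: D is not symmetric
print(D(p, p))            # 0.0: D(p||q) = 0 iff p = q
```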

Chain Rules

Chain Rule for Entropy

For \((X_1, X_2, \ldots, X_n) \sim p(x_1, x_2, \ldots, x_n)\):

\[ H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^n H(X_i|X_{i-1}, \ldots, X_1). \]

Proof:

By induction, using \(H(X, Y) = H(X) + H(Y|X)\).
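
A numeric check for \(n = 3\) binary variables, with each conditional entropy computed directly from the conditional distributions (a NumPy sketch on a random joint):

```python
import numpy as np

rng = np.random.default_rng(0)

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# A random joint p(x1, x2, x3) over three binary variables.
p = rng.random((2, 2, 2))
p /= p.sum()

p12 = p.sum(axis=2)   # p(x1, x2)
p1 = p12.sum(axis=1)  # p(x1)

# H(X2|X1) and H(X3|X1,X2) from the conditional distributions directly.
H2_given_1 = -np.sum(p12 * np.log2(p12 / p1[:, None]))
H3_given_12 = -np.sum(p * np.log2(p / p12[:, :, None]))

print(np.isclose(H(p), H(p1) + H2_given_1 + H3_given_12))   # True
```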

Chain Rule for Mutual Information

Conditional Mutual Information

Definition:

\[ I(X;Y|Z) := H(X|Z) - H(X|Y, Z) = \mathbb{E}_{p(x, y, z)} \Big[\log \frac{p(X, Y|Z)}{p(X|Z)p(Y|Z)}\Big]. \]

Property:

\[ I(X_1, X_2, \ldots, X_n; Y) = \sum_{i=1}^n I(X_i;Y|X_{i-1}, \ldots, X_1). \]

Proof:

\[ \begin{array}{ll} I(X_1, X_2, \ldots, X_n; Y) &= H(X_1, \ldots, X_n) + H(Y) - H(X_1, \ldots, X_n, Y) \\ &= H(X_1, \ldots, X_{n-1}) + H(X_n|X_1, \ldots, X_{n-1}) + H(Y) - H(X_1, \ldots, X_n, Y) \\ &= I(X_1, X_2, \ldots, X_{n-1};Y) + H(X_n|X_1, \ldots, X_{n-1}) - H(X_n|X_1, \ldots, X_{n-1}, Y) \\ &= I(X_1, X_2, \ldots, X_{n-1};Y) + I(X_n;Y|X_1, \ldots, X_{n-1}). \end{array} \]
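
A numeric check of the two-term case \(I(X_1, X_2; Y) = I(X_1;Y) + I(X_2;Y|X_1)\), writing every term as a combination of joint entropies (a NumPy sketch on a random joint):

```python
import numpy as np

rng = np.random.default_rng(1)

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# A random joint p(x1, x2, y) over three binary variables.
p = rng.random((2, 2, 2))
p /= p.sum()
p12 = p.sum(axis=2)       # p(x1, x2)
p1y = p.sum(axis=1)       # p(x1, y)
p1 = p.sum(axis=(1, 2))   # p(x1)
py = p.sum(axis=(0, 1))   # p(y)

I_12_Y = H(p12) + H(py) - H(p)                  # I(X1, X2; Y)
I_1_Y = H(p1) + H(py) - H(p1y)                  # I(X1; Y)
# I(X2; Y|X1) = H(X2|X1) - H(X2|X1, Y), each written via joint entropies.
I_2_Y_given_1 = (H(p12) - H(p1)) - (H(p) - H(p1y))

print(np.isclose(I_12_Y, I_1_Y + I_2_Y_given_1))   # True
```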

Chain Rule for Relative Entropy

Definition:

\[ \begin{array}{ll} D(p(y|x) \| q(y|x)) &:= \mathbb{E}_{p(x, y)} \Big[\log \frac{p(Y|X)}{q(Y|X)}\Big] \\ &= \sum_x p(x) \sum_y p(y|x) \log \frac{p(y|x)}{q(y|x)}. \end{array} \]

Property:

\[ D(p(x, y) \| q(x, y)) = D(p(x) \| q(x)) + D(p(y|x) \| q(y|x)). \]

Proof:

\[ \begin{array}{ll} D(p(x, y) \| q(x, y)) &= \sum_{x, y} p(x, y) \log \frac{p(x, y)}{q(x, y)} \\ &= \sum_{x, y} p(x, y) \log \frac{p(y|x)p(x)}{q(y|x)q(x)} \\ &= \sum_{x, y} p(x, y) \Big(\log \frac{p(y|x)}{q(y|x)} + \log \frac{p(x)}{q(x)}\Big) \\ &= D(p(x) \| q(x)) + D(p(y|x) \| q(y|x)). \end{array} \]
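
A numeric check of this chain rule on two random joint distributions (a NumPy sketch; `D` is an ad-hoc helper):

```python
import numpy as np

rng = np.random.default_rng(2)

def D(p, q):
    """D(p||q) in bits for arrays of matching shape."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    return np.sum(p * np.log2(p / q))

# Two random joint distributions over the same 3 x 4 alphabet.
p = rng.random((3, 4)); p /= p.sum()
q = rng.random((3, 4)); q /= q.sum()
px, qx = p.sum(axis=1), q.sum(axis=1)

# D(p(y|x) || q(y|x)): an expectation over p(x) of row-wise divergences.
D_cond = sum(px[i] * D(p[i] / px[i], q[i] / qx[i]) for i in range(3))

print(np.isclose(D(p, q), D(px, qx) + D_cond))   # True
```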

Additionally, conditioning on \(y\) instead gives

\[ D(p(x, y) \| q(x, y)) = D(p(y) \| q(y)) + D(p(x|y) \| q(x|y)). \]

Hence, when \(p(x) = q(x)\), we obtain

\[ D(p(x, y) \| q(x, y)) = D(p(y|x) \| q(y|x)) \ge D(p(y) \| q(y)). \]

  1. \(D(p(y|x) \| q(y|x)) = D(p(x, y) \| p(x)q(y|x))\)

  2. \(D(p(x_1, x_2, \ldots, x_n) \| q(x_1, x_2, \ldots, x_n)) = \sum_{i=1}^n D(p(x_i|x_{i-1}, \ldots, x_1) \| q(x_i|x_{i-1}, \ldots, x_1))\)

  3. \(D(p(y) \| q(y)) \le D(p(y|x) \| q(y|x))\) when \(q(x) = p(x)\).

Items 1, 2 and 3 can all be proved by slight variations of the derivations above.

Jensen's Inequality

If \(f\) is convex, then

\[ \mathbb{E} [f(X)] \ge f(\mathbb{E}[X]). \]
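
A short illustration with the convex function \(f(x) = x^2\) (a NumPy sketch; for this \(f\) the gap is exactly \(\mathrm{Var}(X)\)):

```python
import numpy as np

rng = np.random.default_rng(3)

# E[f(X)] >= f(E[X]) for convex f; here f(x) = x**2.
x = rng.normal(loc=1.0, scale=2.0, size=100_000)
print(np.mean(x**2), np.mean(x)**2)      # ~5.0 vs ~1.0
print(np.mean(x**2) >= np.mean(x)**2)    # True; the gap is Var(X)
```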

Properties

  • \(D(p \| q) \ge 0\), with equality if and only if \(p = q\).
  • \(I(X; Y) \ge 0\), with equality if and only if \(X\) and \(Y\) are independent.
  • \(D(p(y|x) \| q(y|x)) \ge 0\) (by the property above), with equality if and only if \(p(y|x) = q(y|x)\) for all \(x\) with \(p(x) > 0\).
  • \(I(X; Y|Z) \ge 0\), with equality if and only if \(X\) and \(Y\) are conditionally independent given \(Z\).
  • \(H(X|Y) \le H(X)\), with equality if and only if \(X\) and \(Y\) are independent.
  • \(H(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^n H(X_i)\), with equality if and only if all the \(X_i\) are independent.

Log Sum Inequality

  • \(D(p \| q)\) is convex in the pair \((p, q)\), i.e. for all \(0 \le \lambda \le 1\):

    \[ D(\lambda p_1 + (1-\lambda)p_2 \| \lambda q_1 + (1-\lambda)q_2) \le \lambda D(p_1 \| q_1) + (1-\lambda)D(p_2 \| q_2). \]

This can be proved from the convexity of \(p \log \frac{p}{q}\) (the log sum inequality). A more interesting proof constructs joint distributions \(p(x, c)\) and \(q(x, c)\), where \(c\) is a binary label, whose \(x\)-marginals are exactly the two mixtures:

\[ \sum_c p(x, c) = \lambda p_1(x) + (1-\lambda) p_2(x), \qquad \sum_c q(x, c) = \lambda q_1(x) + (1-\lambda) q_2(x), \]

\[ p(x|c=0) = p_1, \; p(x|c=1) = p_2, \; q(x|c=0) = q_1, \; q(x|c=1) = q_2, \; p(c=0) = q(c=0) = \lambda, \; p(c=1) = q(c=1) = 1-\lambda, \]

and then applies \(D(p(y) \| q(y)) \le D(p(y|x) \| q(y|x))\) (property 3 above, with \(c\) in the role of \(x\); note \(p(c) = q(c)\)).

  • \(H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x)\) is a concave function of \(p\).
  • \(I(X; Y) = \sum_{x, y} p(y|x)p(x) \log \frac{p(y|x)}{p(y)}\) is concave in \(p(x)\) for fixed \(p(y|x)\), and convex in \(p(y|x)\) for fixed \(p(x)\).

We prove only the second half. Given any \(p_1(y|x), p_2(y|x)\), since \(p(x)\) is fixed, for all \(0 \le \lambda \le 1\):

\[ \begin{array}{ll} p(x, y) &:= \lambda p_1(x, y) + (1-\lambda) p_2(x, y) = [\lambda p_1(y|x) + (1-\lambda) p_2(y|x)]\, p(x), \\ p(y) &:= \sum_x p(x, y) = \lambda \sum_x p_1(x, y) + (1-\lambda) \sum_x p_2(x, y), \\ q(x, y) &:= p(x)p(y) = \lambda p(x) \sum_{x'} p_1(x', y) + (1-\lambda) p(x) \sum_{x'} p_2(x', y) =: \lambda q_1(x, y) + (1-\lambda) q_2(x, y). \end{array} \]

\[ I(X; Y) = D(p(x, y) \| p(x)p(y)) = D(p(x, y) \| q(x, y)), \]

and since the KL divergence is convex in the pair \((p, q)\), \(I\) is convex in \(p(y|x)\).
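
The joint convexity itself can be checked numerically at random points (a NumPy sketch; the distributions are random):

```python
import numpy as np

rng = np.random.default_rng(4)

def D(p, q):
    return np.sum(p * np.log2(p / q))

def rand_dist(k):
    p = rng.random(k)
    return p / p.sum()

# Check D(λp1 + (1-λ)p2 || λq1 + (1-λ)q2) <= λD(p1||q1) + (1-λ)D(p2||q2).
p1, p2, q1, q2 = (rand_dist(5) for _ in range(4))
for lam in np.linspace(0.0, 1.0, 11):
    lhs = D(lam * p1 + (1 - lam) * p2, lam * q1 + (1 - lam) * q2)
    rhs = lam * D(p1, q1) + (1 - lam) * D(p2, q2)
    assert lhs <= rhs + 1e-12
print("joint convexity holds at all sampled points")
```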

Data-Processing Inequality

Suppose \(X \rightarrow Y \rightarrow Z\) forms a Markov chain, i.e. \(P(X, Y, Z) = P(X)P(Y|X)P(Z|Y)\); for example \(Y = f(X)\), \(Z = g(Y)\). Then

\[ I(Y, Z; X) = I(X;Y) + I(X;Z|Y) = I(X;Z) + I(X;Y|Z), \]

\[ I(X;Z|Y) = \sum_{x, y, z} p(x, y, z) \log \frac{p(x, z|y)}{p(x|y)p(z|y)} = \sum_{x, y, z} p(x, y, z) \log 1 = 0, \]

\[ I(X;Y|Z) = \sum_{x, y, z} p(x, y, z) \log \frac{p(x|y)}{p(x|z)} \ge 0. \]

Hence

\[ I(X;Z) \le I(X;Y), \qquad I(X;Y|Z) \le I(X;Y). \]
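
A numeric check of \(I(X;Z) \le I(X;Y)\) on a randomly generated Markov chain (a NumPy sketch; the alphabet sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def I(pab):
    """Mutual information of a 2-D joint table, in bits."""
    return H(pab.sum(axis=1)) + H(pab.sum(axis=0)) - H(pab)

# A random Markov chain X -> Y -> Z: p(x, y, z) = p(x) p(y|x) p(z|y).
px = np.array([0.3, 0.7])
pyx = rng.random((2, 3)); pyx /= pyx.sum(axis=1, keepdims=True)   # p(y|x)
pzy = rng.random((3, 2)); pzy /= pzy.sum(axis=1, keepdims=True)   # p(z|y)

pxy = px[:, None] * pyx     # p(x, y)
pxz = pxy @ pzy             # p(x, z) = sum_y p(x, y) p(z|y)

print(I(pxz) <= I(pxy))     # True: I(X;Z) <= I(X;Y)
```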

Sufficient Statistics

Statistics and Mutual Information

  • Consider a family of probability distributions \(\{f_{\theta}(x)\}\).

  • Let \(X \sim f_{\theta}(x)\) and let \(T(X)\) be any statistic of \(X\); then

    \[ \theta \rightarrow X \rightarrow T(X) \]

  • \(I(\theta; X) \ge I(\theta; T(X))\), by the data-processing inequality.

Sufficient Statistics and Compression

Definition (sufficient statistic): a function \(T(X)\) is called a sufficient statistic for a family of distributions \(\{f_{\theta}(x)\}\) if, given \(T(X) = t\), the conditional distribution of \(X\) does not depend on \(\theta\). For example, for i.i.d. Bernoulli(\(\theta\)) samples \(X_1, \ldots, X_n\), the count \(T = \sum_i X_i\) is sufficient. Formally,

\[ f_{\theta}(x) = f(x|t) f_{\theta}(t) \Rightarrow \theta \rightarrow T(X) \rightarrow X \Rightarrow I(\theta; T(X)) \ge I(\theta; X). \]

Combining this with the inequality in the previous subsection, \(I(\theta; T(X)) = I(\theta; X)\): a sufficient statistic preserves all the mutual information with \(\theta\).

Definition (minimal sufficient statistic): a sufficient statistic \(T(X)\) is minimal if, for every other sufficient statistic \(U(X)\) of \(\{f_{\theta}(x)\}\),

\[ \theta \rightarrow T(X) \rightarrow U(X) \rightarrow X. \]
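
The Bernoulli example mentioned above can be checked numerically: with \(\theta\) uniform over two values and \(T(X) = \sum_i X_i\), the statistic loses no information about \(\theta\) (a NumPy sketch; the prior and parameter values are arbitrary choices):

```python
import numpy as np
from itertools import product

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def I(pab):
    return H(pab.sum(axis=1)) + H(pab.sum(axis=0)) - H(pab)

# theta uniform over {0.3, 0.7}; X = (X1, X2, X3) i.i.d. Bernoulli(theta).
thetas, n = [0.3, 0.7], 3
xs = list(product([0, 1], repeat=n))

# Joint p(theta, x) over the 2 x 8 outcome table.
p_tx = np.array([[0.5 * th**sum(x) * (1 - th)**(n - sum(x)) for x in xs]
                 for th in thetas])

# Collapse x to the sufficient statistic T(x) = x1 + x2 + x3.
p_tT = np.zeros((2, n + 1))
for j, x in enumerate(xs):
    p_tT[:, sum(x)] += p_tx[:, j]

print(np.isclose(I(p_tx), I(p_tT)))   # True: T loses no information about theta
```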
