Optimization in Machine Learning: Chapter 1 Mathematics Background

Posted by fish04


1. Notation

$\|x\|$: Euclidean norm ($\ell_2$ norm),

$$
\|x\|^2 = x^Tx = \sum_{i=1}^d x_i^2
$$
$\mathbb{R}_+ = \{x\in \mathbb{R}: x\geq 0\}$

2. Cauchy-Schwarz inequality

Let $u,v\in \mathbb{R}^d$. Then
$$
|u^Tv|\leq \|u\|\cdot \|v\|
$$
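
A quick numerical sanity check of the norm identity from Section 1 and the Cauchy-Schwarz inequality; a minimal sketch, with the dimension and random seed being arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
u, v = rng.standard_normal(d), rng.standard_normal(d)

# Norm identity: ||u||^2 = u^T u = sum_i u_i^2
assert np.isclose(np.linalg.norm(u) ** 2, u @ u)
assert np.isclose(u @ u, np.sum(u ** 2))

# Cauchy-Schwarz: |u^T v| <= ||u|| * ||v||
assert abs(u @ v) <= np.linalg.norm(u) * np.linalg.norm(v)
```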

3. Spectral Norm

Let $A\in \mathbb{R}^{m\times d}$ be a matrix. Then
$$
\|A\| = \max_{\|v\|=1}\|Av\|
$$
where $v\in \mathbb{R}^d$. Besides,
$$
\|Av\| \leq \|A\|\cdot \|v\|
$$
Furthermore, we have the triangle inequality:
$$
\|A+B\|\leq \|A\|+\|B\|
$$
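
In numpy the spectral norm is `np.linalg.norm(A, 2)` (the largest singular value). A minimal sketch checking the two inequalities above on random matrices of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 4))
v = rng.standard_normal(4)

def spectral_norm(M):
    # ord=2 gives the largest singular value, i.e. max_{||v||=1} ||Mv||
    return np.linalg.norm(M, 2)

# ||Av|| <= ||A|| * ||v||
assert np.linalg.norm(A @ v) <= spectral_norm(A) * np.linalg.norm(v) + 1e-12

# Triangle inequality: ||A + B|| <= ||A|| + ||B||
assert spectral_norm(A + B) <= spectral_norm(A) + spectral_norm(B) + 1e-12
```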

4. Mean Value Theorem

$\large\text{Theorem 1.3}$: Let $a<b$ be real numbers, and let $h:[a,b]\rightarrow \mathbb{R}$ be a continuous function which is differentiable on $(a,b)$. Then there exists $c\in (a,b)$ such that
$$
h'(c) = \frac{h(b)-h(a)}{b-a}
$$
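
A worked example under the theorem's assumptions: for $h(x)=x^2$ on $[0,2]$ the secant slope is $(h(2)-h(0))/2=2$, and $h'(c)=2c=2$ gives $c=1\in(0,2)$. A minimal numeric sketch of the same computation (the function is my own choice):

```python
import numpy as np
from scipy.optimize import brentq

h = lambda x: x ** 2          # continuous on [0, 2], differentiable on (0, 2)
dh = lambda x: 2 * x          # derivative of h
a, b = 0.0, 2.0
slope = (h(b) - h(a)) / (b - a)   # secant slope, equals 2 here

# Find c in (a, b) with h'(c) equal to the secant slope (brentq needs a sign change)
c = brentq(lambda x: dh(x) - slope, a, b)
print(c)  # 1.0
```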

5. Differentiability

$\large\text{Definition 1.5}$: Let $f: \text{dom}(f)\rightarrow \mathbb{R}^m$ with $\text{dom}(f)\subseteq \mathbb{R}^d$. $f$ is called differentiable at $x$ in the interior of $\text{dom}(f)$ if there exists a matrix $A\in \mathbb{R}^{m\times d}$ and an error function $r:\mathbb{R}^d \rightarrow \mathbb{R}^m$ defined in some neighborhood of $0\in \mathbb{R}^d$ such that for all $y$ in some neighborhood of $x$,
$$
f(y) = f(x)+A(y-x)+r(y-x)
$$
where
$$
\lim_{v\rightarrow 0} \frac{\|r(v)\|}{\|v\|}=0
$$
Here $A$ is the Jacobian $Df(x)$, with entries $Df(x)_{ij} = \frac{\partial f_i}{\partial x_j}(x)$.
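
A sketch checking the definition numerically for the example map $f(x) = (x_1^2 + x_2,\ \sin x_1)$, whose Jacobian is known in closed form (the function is my own choice): the error term $r(v) = f(x+v) - f(x) - Df(x)v$ should vanish faster than $\|v\|$.

```python
import numpy as np

def f(x):
    # example map f: R^2 -> R^2
    return np.array([x[0] ** 2 + x[1], np.sin(x[0])])

def Df(x):
    # Jacobian with entries Df(x)_ij = d f_i / d x_j
    return np.array([[2 * x[0], 1.0],
                     [np.cos(x[0]), 0.0]])

x = np.array([1.0, -0.5])
rng = np.random.default_rng(2)
direction = rng.standard_normal(2)
direction /= np.linalg.norm(direction)

for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    v = t * direction
    r = f(x + v) - f(x) - Df(x) @ v                  # error term r(v)
    print(t, np.linalg.norm(r) / np.linalg.norm(v))  # ratio should tend to 0
```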

6. Convex Set

$\large\text{Theorem 1.9}$: Let $f:\text{dom}(f)\rightarrow \mathbb{R}^m$ be differentiable, $X\subseteq \text{dom}(f)$ a convex set, $B\in \mathbb{R}_+$. If $X$ is non-empty and open, the following statements are equivalent:
$$
\begin{align}
&(i)\ \|f(x)-f(y)\|\leq B\|x-y\|\ \text{for all}\ x,y\in X\quad (\large\textbf{B-Lipschitz}) \\
&(ii)\ \|Df(x)\| \leq B,\ \forall x\in X
\end{align}
$$
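
A small illustration of the equivalence, assuming the linear map $f(x)=Ax$ (my own example): here $Df(x)=A$ at every point, so $B=\|A\|$ satisfies (ii), and (i) then follows with the same constant.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 5))
f = lambda x: A @ x                 # differentiable with Df(x) = A everywhere
B = np.linalg.norm(A, 2)            # spectral norm, so (ii) holds with this B

x, y = rng.standard_normal(5), rng.standard_normal(5)
# (i) B-Lipschitz: ||f(x) - f(y)|| <= B * ||x - y||
assert np.linalg.norm(f(x) - f(y)) <= B * np.linalg.norm(x - y) + 1e-12
```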

7. Epigraph

$$
\textbf{epi}(f)=\{(x,\alpha)\in \mathbb{R}^{d+1}: x\in \text{dom}(f),\ \alpha\geq f(x)\}
$$
$\large\text{Lemma 1.11}$: $f$ is a convex function if and only if $\textbf{epi}(f)$ is a convex set.

$\large\text{Lemma 1.15}$: Suppose that $\text{dom}(f)$ is open and $f$ is differentiable. Then $f$ is convex if and only if $\text{dom}(f)$ is convex and

$$
f(y)\geq f(x)+\nabla f(x)^T(y-x)
$$
holds for all $x,y\in \text{dom}(f)$.
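
A minimal numerical check of the first-order characterization for the convex quadratic $f(x)=\|x\|^2$ (my own example), whose gradient is $\nabla f(x)=2x$:

```python
import numpy as np

f = lambda x: x @ x            # f(x) = ||x||^2, a convex function
grad = lambda x: 2 * x         # its gradient

rng = np.random.default_rng(4)
for _ in range(1000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    # first-order condition: f(y) >= f(x) + grad f(x)^T (y - x)
    assert f(y) >= f(x) + grad(x) @ (y - x) - 1e-12
```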

$\large\textbf{Lemma 1.16}$: Suppose that $\text{dom}(f)$ is open and $f$ is differentiable. Then $f$ is convex if and only if $\text{dom}(f)$ is convex and
$$
\large(\nabla f(y)-\nabla f(x))^T(y-x)\geq 0
$$
holds for all $x,y\in \text{dom}(f)$.
$\textbf{Proof}$: If $f$ is convex, then from the first-order characterization (Lemma 1.15) we have
$$
\begin{align}
f(y)&\geq f(x)+\nabla f(x)^T(y-x)\\
f(x)&\geq f(y)+\nabla f(y)^T(x-y)
\end{align}
$$
for all $x,y\in \text{dom}(f)$. Adding up these two inequalities, we get
$$
\nabla f(x)^T(y-x)+\nabla f(y)^T(x-y) = (\nabla f(y)-\nabla f(x))^T(x-y)\leq 0,
$$
which is exactly $(\nabla f(y)-\nabla f(x))^T(y-x)\geq 0$, proving the "only if" direction.

8. Second-Order Characterization of Convexity

$\large\text{Lemma 1.17}$: Suppose that $\text{dom}(f)$ is open and $f$ is twice differentiable. Then $f$ is convex if and only if $\text{dom}(f)$ is convex and

$$
\large\nabla^2 f(x)\succeq 0
$$
holds for all $x\in \text{dom}(f)$. (Positive semidefinite: $M\succeq 0$ means $x^TMx\geq 0$ for all $x$.)
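
A sketch of the second-order check for the log-sum-exp function $f(x)=\log\sum_i e^{x_i}$ that reappears in Section 9. Its Hessian is $\text{diag}(p)-pp^T$ with $p=\text{softmax}(x)$; this closed form is a standard fact, not derived in the text above. The smallest eigenvalue should be non-negative:

```python
import numpy as np

def lse_hessian(x):
    # Hessian of f(x) = log(sum_i exp(x_i)): diag(p) - p p^T with p = softmax(x)
    p = np.exp(x - x.max())
    p /= p.sum()
    return np.diag(p) - np.outer(p, p)

rng = np.random.default_rng(5)
x = rng.standard_normal(4)
eigvals = np.linalg.eigvalsh(lse_hessian(x))
print(eigvals)                     # all eigenvalues >= 0 (up to round-off)
assert eigvals.min() >= -1e-12     # Hessian is positive semidefinite
```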

$\large\text{Lemma 1.25}$: Let $f:\text{dom}(f)\rightarrow \mathbb{R}$ be strictly convex. Then $f$ has at most one global minimum.

$\large\text{Lemma 1.27}$: Suppose $f:\text{dom}(f)\rightarrow \mathbb{R}$ is convex and differentiable, and let $X\subseteq \text{dom}(f)$ be a convex set. A point $x^*\in X$ is a $\textbf{minimizer}$ of $f$ over $X$ if and only if
$$
\large\nabla f(x^*)^T(x-x^*)\geq 0, \quad\forall x\in X
$$
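
A worked instance of the optimality condition, assuming $f(x)=\|x-c\|^2$ minimized over the box $X=[0,1]^2$ with $c=(2, 0.5)$ (my own example): the minimizer is the projection $x^*=(1, 0.5)$, and $\nabla f(x^*)^T(x-x^*)=2(1-x_1)\geq 0$ for every feasible $x$.

```python
import numpy as np

c = np.array([2.0, 0.5])
grad = lambda x: 2 * (x - c)          # gradient of f(x) = ||x - c||^2
x_star = np.array([1.0, 0.5])         # projection of c onto the box [0, 1]^2

rng = np.random.default_rng(6)
for _ in range(1000):
    x = rng.uniform(0.0, 1.0, size=2)  # arbitrary feasible point in X
    # optimality condition: grad f(x*)^T (x - x*) >= 0 for all x in X
    assert grad(x_star) @ (x - x_star) >= -1e-12
```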

9. log-sum-exp function

$\large\textbf{Statement}$: The log-sum-exp function is a convex function.
$\textbf{Proof}$: Let $f(x) = \log\left(\sum_i e^{x_i}\right)$. For any $x,y$ and $\theta\in[0,1]$,
$$
\begin{align}
f(\theta x+(1-\theta)y)&=\log\left(\sum_i e^{\theta x_i+(1-\theta)y_i}\right)\\
&=\log\left(\sum_i e^{\theta x_i}e^{(1-\theta)y_i}\right)\\
&=\log\left(\sum_i u_i^{\theta}v_i^{1-\theta}\right)
\end{align}
$$
where $u_i = e^{x_i}$ and $v_i = e^{y_i}$.
By Hölder's inequality,
$$
\sum_i |x_iy_i|\leq \left(\sum_i |x_i|^p\right)^{1/p}\left(\sum_i |y_i|^q\right)^{1/q}
$$
where $1/p+1/q=1$. Applying it with $p=1/\theta$ and $q=1/(1-\theta)$, we therefore get
$$
\begin{align}
\log\left(\sum_i u_i^{\theta}v_i^{1-\theta}\right) &\leq \log\left[\left(\sum_i |u_i|^{\theta\cdot \frac{1}{\theta}}\right)^{\theta}\left(\sum_i |v_i|^{(1-\theta)\cdot \frac{1}{1-\theta}}\right)^{1-\theta}\right]\\
&= \theta\log\left(\sum_i u_i\right)+(1-\theta)\log\left(\sum_i v_i\right)\\
&= \theta f(x)+(1-\theta)f(y),
\end{align}
$$
which is exactly the convexity inequality.
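
A direct numerical check of the convexity inequality $f(\theta x+(1-\theta)y)\leq \theta f(x)+(1-\theta)f(y)$, using `scipy.special.logsumexp`; a minimal sketch with random points and random $\theta$:

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(7)
for _ in range(1000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    theta = rng.uniform()
    lhs = logsumexp(theta * x + (1 - theta) * y)
    rhs = theta * logsumexp(x) + (1 - theta) * logsumexp(y)
    # convexity: f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y)
    assert lhs <= rhs + 1e-12
```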
