CVX notes
Posted yychi
1. PSD
\(M\) is a positive semidefinite (PSD) matrix \(\iff\) all principal submatrices \(P\) of \(M\) are PSD.
Note: The forward direction follows by considering the quadratic form \(x^T M x\) restricted to vectors \(x\) supported on the index subset that defines the principal submatrix. The converse is trivially true, since \(M\) is a principal submatrix of itself.
\(M\) is PSD \(\iff\) all principal minors of \(M\) are non-negative.
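\[ x^T M x = \sum_{i,j} M_{ij} x_i x_j \]
Taking \(x\) to be the standard basis vector \(e_i\) gives \(M_{ii} \ge 0\) for every \(i\), hence \(\mathbf{tr}(M) \ge 0\). Taking \(x = e_i \pm e_j\) (zero except for the positions \(i, j\)) gives \(M_{ii} + M_{jj} \pm 2M_{ij} \ge 0\). A sharper bound comes from the \(2 \times 2\) principal submatrix on the indices \(\{i, j\}\), which is PSD, so its determinant is non-negative:
\[ M_{ii}M_{jj} - M_{ij}^2 \ge 0 \implies |M_{ij}| \le \sqrt{M_{ii}M_{jj}} \le \frac{M_{ii} + M_{jj}}{2} \]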
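The entrywise bounds above can be checked numerically. The following is a minimal sketch assuming NumPy; the helper name `is_psd` and the tolerances are illustrative choices, not part of the notes.

```python
import numpy as np

def is_psd(M, tol=1e-10):
    """Return True if the symmetric matrix M has no eigenvalue below -tol."""
    return bool(np.all(np.linalg.eigvalsh(M) >= -tol))

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 3))
M = B @ B.T  # Gram matrix, PSD by construction

# A principal submatrix keeps the same subset of rows and columns.
idx = [0, 2, 4]
sub = M[np.ix_(idx, idx)]

# Entrywise bounds: |M_ij| <= sqrt(M_ii M_jj) <= (M_ii + M_jj) / 2.
d = np.diag(M)
geo = np.sqrt(np.outer(d, d))
ari = (d[:, None] + d[None, :]) / 2
bounds_hold = bool(np.all(np.abs(M) <= geo + 1e-12) and np.all(geo <= ari + 1e-12))
```

The eigenvalue test is used here because it is simple; checking every principal minor would be exponentially more expensive.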
2. Matrix norm
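General definition of a norm: a function \(\|\cdot\|\) on a vector space is a norm if \(\|x\| \ge 0\) with equality only for \(x = 0\), \(\|\alpha x\| = |\alpha| \|x\|\), and \(\|x + y\| \le \|x\| + \|y\|\).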
Matrix norm:
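- Frobenius norm: \(\|A\|_F := \sqrt{\langle A, A \rangle_F} = \sqrt{\mathbf{tr}(A^*A)}\)
- Induced norm: \(\|A\|_p := \sup\limits_{\|x\|_p = 1} \|Ax\|_p\)
- Nuclear norm: \(\|A\|_{nuclear} := \sum_i \sigma_i(A)\) (the sum of the singular values)
- Spectral norm: \(\|A\|_{spectral} := \sigma_1(A) = \sqrt{\lambda_{\max}(A^*A)}\) (the largest singular value; it equals the largest eigenvalue \(\lambda_1\) when \(A\) is PSD)
Spectral radius: \(\rho(A) := \max_i |\lambda_i(A)|\)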
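As a small illustration, these norms can be computed with NumPy and compared against the singular values. The matrix `A` below is an arbitrary example chosen for this sketch; for a general (non-symmetric) matrix the spectral norm is the largest singular value, which can exceed the spectral radius.

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [1.0, 1.0]])

sigma = np.linalg.svd(A, compute_uv=False)  # singular values, descending

fro = np.linalg.norm(A, "fro")   # Frobenius norm: sqrt of the sum of sigma_i^2
nuc = np.linalg.norm(A, "nuc")   # nuclear norm: sum of sigma_i
spec = np.linalg.norm(A, 2)      # spectral norm: largest sigma_i (induced 2-norm)
rho = max(abs(np.linalg.eigvals(A)))  # spectral radius: largest |eigenvalue|
```

For this triangular `A` the eigenvalues are 2 and 1, so the spectral radius is 2, while the spectral norm is strictly larger.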
3. Duality
Two ==equivalent== ways to represent a convex set:
- standard representation: The family of points in the set
- dual representation: The set of halfspaces containing the set (the set is their intersection)
A closed convex set \(S\) is the intersection of all closed halfspaces \(H\) containing it.
Let \(S \subseteq \mathbb{R}^n\) be a convex set containing the origin. The polar of \(S\) is defined as follows
\[ S^{\circ} := \{y ~|~ y^Tx \le 1, ~\forall x \in S\} \]
- the polar is one way of representing all the halfspaces containing a convex set
- every halfspace \(a^Tx \le b\) with \(b > 0\) can be written as a “normalized” inequality \(y^T x \le 1\) by dividing by \(b\) (a halfspace containing the origin always has \(b \ge 0\))
- \(S^{\circ}\) can be thought of as the normalized representations of the halfspaces containing \(S\)
Properties of the polar:
- \(S^{\circ\circ} = S\) (for closed convex \(S\) containing the origin)
- \(S^{\circ}\) is a closed convex set containing the origin
- When 0 is in the interior of \(S\), \(S^{\circ}\) is bounded
- When \(S\) is non-convex, \(S^{\circ} = (\mathbf{conv}(S))^{\circ}\), and \(S^{\circ\circ} = \mathbf{conv}(S)\) (up to closure)
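A concrete instance may help: for the cube \(S = [-1, 1]^n\) (the \(\ell_\infty\) unit ball), the polar is the \(\ell_1\) unit ball, because \(\sup_{x \in S} y^Tx = \|y\|_1\). The sketch below checks this by brute force over the cube's vertices; the function names are hypothetical and it is only practical for small \(n\).

```python
import itertools
import numpy as np

def support_of_box(y):
    """sup of y^T x over the cube [-1, 1]^n, by enumerating its 2^n vertices."""
    n = len(y)
    vertices = itertools.product([-1.0, 1.0], repeat=n)
    return max(float(np.dot(y, v)) for v in vertices)

def in_polar_of_box(y, tol=1e-12):
    """y is in the polar of S = [-1, 1]^n iff y^T x <= 1 for every x in S."""
    return support_of_box(y) <= 1.0 + tol
```

Enumerating vertices suffices because a linear function attains its maximum over a polytope at a vertex.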
Polar duality of convex cones
- \(K^{\circ\circ} = K\)
- \(K^{\circ}\) is closed and convex
Conjugation of convex functions
Let \(f: \mathbb{R}^n \to \mathbb{R}\cup\{\infty\}\) be a convex function. The ==conjugate== of \(f\) is
\[ f^*(y) := \sup\limits_{x}(y^Tx - f(x)) \]
Properties of the conjugate
- \(f^{**} = f\) (when \(f\) is closed and convex)
- \(f^*\) is convex (it is a supremum of affine functions of \(y\))
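To make the definition concrete, the conjugate of a scalar function can be approximated by taking the supremum over a grid. This is a rough sketch with a hypothetical helper `conjugate_on_grid`; it assumes the maximizer lies inside the grid. For \(f(x) = \frac{1}{2}x^2\) the exact conjugate is \(f^*(y) = \frac{1}{2}y^2\).

```python
import numpy as np

def conjugate_on_grid(f, xs, ys):
    """Approximate f*(y) = sup_x (y*x - f(x)) by maximizing over the grid xs."""
    # Rows index y, columns index x.
    values = ys[:, None] * xs[None, :] - f(xs)[None, :]
    return values.max(axis=1)

xs = np.linspace(-10.0, 10.0, 20001)  # grid for x, step 0.001
ys = np.linspace(-3.0, 3.0, 13)       # points y at which f* is evaluated

f = lambda x: 0.5 * x**2
f_star = conjugate_on_grid(f, xs, ys)
```

The grid value never exceeds the true supremum, so the approximation errs from below.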
Convex sets
Convex functions
Affine functions are convex: \(f(x) = a^T x+b\)
Affine functions are both convex and concave.
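Every norm is convex. Proof: let \(\pi(x)\) be a norm of \(x\), then for \(\lambda \in [0,1]\),
\[ \pi(\lambda x + (1-\lambda) y) \le \pi(\lambda x) + \pi((1-\lambda) y) = \lambda \pi(x) + (1-\lambda)\pi(y) \]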
\(f\) is convex \(\iff\) \(\mathbf{epi}(f)\) is convex
1. Closed convex
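A convex function \(f\) is called closed if its epigraph is a closed set.
- a function \(f\) which is convex and continuous on a closed domain is a closed function (e.g. norms)
- all differentiable convex functions with \(\mathbf{dom} f = \mathbb{R}^n\) are closed
- when considering a convex function, it is usually taken to be \(\infty\) outside \(\mathbf{dom} f\)
- Jensen's inequality: for \(\alpha_i \ge 0\) with \(\sum_i \alpha_i = 1\), \(f(\sum_i \alpha_i x_i) \le \sum_i \alpha_i f(x_i)\)
Consequently, for a convex combination \(x = \sum_i \alpha_i x_i\): \(f(x) = f(\sum_i \alpha_i x_i) \le \sum_i \alpha_i f(x_i) \le \max\limits_{i} f(x_i)\)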
2. Level sets
Note: the convexity of level sets does not characterize convex functions, but quasiconvex functions.
- a convex \(f\) is closed \(\implies\) all its level sets are closed
Some convex sets
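- the norm ball \(\{x \in \mathbb{R}^n ~|~ \|x\| \le 1\}\) is convex and closed
- the ellipsoid \(\{x ~|~ (x-a)^T Q (x-a) \le r^2\}\) is convex and closed
pf: \(\langle x, y \rangle := x^TQy\) satisfies the three properties of an inner product
- bilinearity
- symmetry
- positivity
These three properties hold \(\iff\) \(Q\) is symmetric positive definite, so the ellipsoid is a ball of the norm \(\sqrt{x^TQx}\) induced by this inner product.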
3. Operations preserving convexity of functions
- stability under taking weighted sums: \(f, g \mapsto \lambda f + \mu g, \; \lambda, \mu \ge 0\)
- stability under affine substitutions of the argument: \(x \mapsto Ax+b\), i.e. \(f(x) \mapsto \phi(x) = f(Ax+b)\)
- stability under taking pointwise sup: \(\{f_i\}_{i \in \mathcal{I}} \mapsto g(x) := \sup\limits_{i \in \mathcal{I}} f_i(x)\); the pointwise supremum of a family of convex functions \(\{f_i\}_{i \in \mathcal{I}}\) is again convex
- stability under partial minimization: if \(f(x,y)\) is jointly convex in \((x,y)\), then \(g(x) = \inf\limits_{y} f(x,y)\) is convex (provided \(g\) is proper, i.e., \(> -\infty\) everywhere and finite at least at one point)
- stability under perspective: \(f(x) \mapsto g(x,t) = t f(x/t), \; \mathbf{dom}\, g = \{(x,t) ~|~ x/t \in \mathbf{dom} f, \; t > 0\}\)
4. Detect convexity
Necessary and Sufficient Convexity Condition for smooth function:
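- a differentiable function \(f\) of one variable is convex \(\iff\) \(f'(x)\) is monotonically non-decreasing
- a twice-differentiable function \(f\) of one variable is convex \(\iff\) \(f''(x) \ge 0\)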
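For functions of several variables the second-order test reads \(\nabla^2 f(x) \succeq 0\). As an illustrative sketch (the choice of log-sum-exp and the sample count are assumptions of this example, not from the notes), the Hessian of \(f(x) = \log\sum_i e^{x_i}\) can be sampled at random points and its smallest eigenvalue inspected:

```python
import numpy as np

def logsumexp_hessian(x):
    """Hessian of f(x) = log(sum_i exp(x_i)): diag(p) - p p^T with p = softmax(x)."""
    z = np.exp(x - np.max(x))  # shift by the max for numerical stability
    p = z / z.sum()
    return np.diag(p) - np.outer(p, p)

rng = np.random.default_rng(1)
min_eigs = [np.linalg.eigvalsh(logsumexp_hessian(rng.standard_normal(4))).min()
            for _ in range(100)]
```

Sampling can only refute convexity, never prove it; here the smallest eigenvalue is zero up to rounding at every sample, consistent with log-sum-exp being convex but not strictly convex.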
subgradient property is characteristic of convex functions:
5. Subgradient
6. Optimality conditions
\(x^* \in \mathbf{dom} f\) is a minimizer of \(f\) \(\iff\) \(0 \in \partial f(x^*)\)
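A standard worked case is \(f(x) = \frac{1}{2}(x-a)^2 + \lambda|x|\) for scalar \(x\): the condition \(0 \in (x^* - a) + \lambda\,\partial|x^*|\) yields the soft-thresholding formula. The sketch below uses hypothetical helper names and checks the condition directly.

```python
import numpy as np

def soft_threshold(a, lam):
    """Minimizer of 0.5*(x - a)**2 + lam*|x| for scalar a and lam >= 0."""
    return np.sign(a) * max(abs(a) - lam, 0.0)

def zero_in_subdifferential(x, a, lam, tol=1e-12):
    """Check 0 in (x - a) + lam * d|x|, where d|x| is {sign(x)} if x != 0, else [-1, 1]."""
    if x != 0:
        return abs((x - a) + lam * np.sign(x)) <= tol
    return abs(a) <= lam + tol
```

For example, with `a = 0.3` and `lam = 1` the minimizer is `0`, and the condition holds because `0.3` lies in the interval `[-1, 1]` scaled by `lam`.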
7. Strong convexity
A differentiable function \(f\) is strongly convex with parameter \(\mu > 0\) if, for all \(x, y\),
\[ f(y) \ge f(x) + \nabla f(x)^T(y-x) + \frac{\mu}{2} \|y-x\|^2 \]
- \(f\) is not necessarily differentiable (see the equivalent definition)
- if \(f\) is non-smooth, replace the gradient by a subgradient
- strong convexity \(\implies\) strict convexity
Note: Intuitively speaking, strong convexity means that there exists a quadratic lower bound on the growth of the function.
Equivalent definition
\[ \begin{split}
&(i)~f(y)\ge f(x)+\nabla f(x)^T(y-x)+\frac{\mu}{2}\lVert y-x \rVert^2,~\forall x, y. \\
&(ii)~g(x) = f(x)-\frac{\mu}{2}\lVert x \rVert^2~\text{is convex}. \\
&(iii)~\langle \nabla f(x) - \nabla f(y),x-y \rangle \ge \mu \lVert x-y \rVert^2,~\forall x, y. \\
&(iv)~f(\alpha x+ (1-\alpha) y) \le \alpha f(x) + (1-\alpha) f(y) - \frac{\alpha (1-\alpha)\mu}{2}\lVert x-y \rVert^2,~\forall x, y,~\alpha \in [0,1]. \\
&(v)~\nabla^2 f(x) \succeq \mu \boldsymbol{I},~\forall x.
\end{split} \]
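The characterizations can be sanity-checked on a quadratic \(f(x) = \frac{1}{2}x^TQx\), for which (v) gives \(\mu = \lambda_{\min}(Q)\). This is a sketch under that assumption; the random matrix, the helper `check_pair`, and the tolerance are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
Q = B @ B.T + 0.5 * np.eye(4)     # symmetric positive definite
mu = np.linalg.eigvalsh(Q).min()  # (v): Hessian Q >= mu * I

f = lambda x: 0.5 * x @ Q @ x
grad = lambda x: Q @ x

def check_pair(x, y, alpha, tol=1e-9):
    """Check conditions (i), (iii) and (iv) for one pair of points."""
    d2 = np.dot(x - y, x - y)
    c1 = f(y) >= f(x) + grad(x) @ (y - x) + 0.5 * mu * d2 - tol
    c3 = (grad(x) - grad(y)) @ (x - y) >= mu * d2 - tol
    c4 = (f(alpha * x + (1 - alpha) * y)
          <= alpha * f(x) + (1 - alpha) * f(y)
          - 0.5 * alpha * (1 - alpha) * mu * d2 + tol)
    return bool(c1 and c3 and c4)

results = [check_pair(rng.standard_normal(4), rng.standard_normal(4), rng.uniform())
           for _ in range(200)]
```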
Lagrange Duality
Consider an optimization problem in standard form (not necessarily convex)
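\[ \begin{split}
\underset{x}{\text{minimize}} \quad & f_0(x) \\
\text{subject to} \quad & f_i(x) \le 0, ~i=1,\cdots,m \\
& h_i(x) = 0, ~i=1,\cdots,p
\end{split} \]
The Lagrangian is
\[ L(x,\boldsymbol{\lambda},\boldsymbol{\mu}) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \mu_i h_i(x) \]
The Lagrange dual function is defined as
\[ g(\boldsymbol{\lambda}, \boldsymbol{\mu}) = \inf_{x} L(x,\boldsymbol{\lambda},\boldsymbol{\mu}) \]
Lagrange dual problem
\[ \begin{split}
\underset{\boldsymbol{\lambda}, \boldsymbol{\mu}}{\text{maximize}} \quad & g(\boldsymbol{\lambda}, \boldsymbol{\mu}) \\
\text{subject to} \quad & \boldsymbol{\lambda} \succeq \mathbf{0}
\end{split} \]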
Weak duality
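\[ d^* \le p^* \]
- \(d^*\): optimal value of the dual problem
- \(p^*\): optimal value of the primal problem
- duality gap: \(p^* - d^*\)
- always holds, even when the primal problem is not convex
Strong duality
\[ d^* = p^* \]
- for a convex problem, a constraint qualification \(\implies\) strong duality
- Slater's Constraint Qualification: a convex problem is strictly feasible (i.e., \(\exists ~x \in \mathbf{int}\, \mathcal{D}\) with \(f_i(x) < 0\) for all \(i\) and \(h_i(x) = 0\) for all \(i\))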
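A small worked example, chosen for this sketch rather than taken from the notes: the least-norm problem, minimize \(x^Tx\) subject to \(Ax = b\). Its Lagrangian \(x^Tx + \mu^T(Ax - b)\) is minimized at \(x = -\frac{1}{2}A^T\mu\), which gives \(g(\mu) = -\frac{1}{4}\mu^TAA^T\mu - b^T\mu\). The constraints are affine and the problem is feasible, so strong duality is expected; the code checks both weak and strong duality numerically.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 5))  # full row rank with probability 1
b = rng.standard_normal(2)

def g(mu):
    """Dual function: inf_x [x^T x + mu^T (A x - b)], attained at x = -A^T mu / 2."""
    return -0.25 * mu @ (A @ A.T) @ mu - b @ mu

# Primal optimum: the least-norm solution of A x = b.
x_star = A.T @ np.linalg.solve(A @ A.T, b)
p_star = x_star @ x_star

# Dual optimum: maximizer of the concave quadratic g.
mu_star = -2.0 * np.linalg.solve(A @ A.T, b)
d_star = g(mu_star)

# An arbitrary multiplier, used to illustrate weak duality.
mu_rand = rng.standard_normal(2)
```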
Complementary slackness
KKT conditions
Tangent cone
Let \(M\) be a (nonempty) convex set and \(x^* \in M\). The tangent cone of \(M\) at \(x^*\) is the cone
\[ \begin{split}
T_M(x^*) &= \{h \in \mathbb{R}^n ~|~ x^* + th \in M \text{ for all sufficiently small } t > 0 \} \\
&= \{s(y - x^*) ~|~ y \in M, \; s \ge 0\}
\end{split} \]
- Geometrically, this is the set of all directions leading from \(x^*\) inside \(M\)
- convex but not necessarily closed
- fact: if \(x^*\) is a minimizer of \(f\) over \(M\), then \(h^T \nabla f(x^*) \ge 0\) for all \(h \in T_M(x^*)\) (every direction in the tangent cone is feasible, so none of them can be a descent direction)
- \(T_M(x^*) = \mathbb{R}^n \iff x^* \in \mathbf{int} M\)
e.g. a polyhedron
\[ M = \{x ~|~ Ax \le b\} = \{x ~|~ a_i^Tx \le b_i, \; i = 1,\dots,m\} \]
the tangent cone at \(x^*\) is
\[ T_M(x^*) = \{h ~|~ a_i^T h \le 0 ~\text{for all } i \text{ with } a_i^T x^* = b_i\} \]
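The polyhedral formula translates directly into code: find the constraints that are active at \(x^*\) and test the direction against those rows only. The function name and the unit-square example are assumptions of this sketch.

```python
import numpy as np

def in_tangent_cone(A, b, x_star, h, tol=1e-9):
    """Check h in T_M(x*) for M = {x : A x <= b}, assuming x* is in M."""
    active = np.abs(A @ x_star - b) <= tol  # rows with a_i^T x* = b_i
    return bool(np.all(A[active] @ h <= tol))

# Unit square [0, 1]^2 written as A x <= b.
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([1.0, 1.0, 0.0, 0.0])

corner = np.array([0.0, 0.0])  # active constraints: -x1 <= 0 and -x2 <= 0
inside = np.array([0.5, 0.5])  # no active constraints
```

At the corner only directions with non-negative components are admissible, while at the interior point every direction is, matching \(T_M(x^*) = \mathbb{R}^n\) for interior points.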
Normal cone: the polar cone of the tangent cone
\[ N_M(x^*) = \{g \in \mathbb{R}^n ~|~ \langle g, y-x^* \rangle \le 0, ~\forall y \in M\} \]
- the normal cone is the polar of the tangent cone, i.e.,
\[ N_M(x^*) = (T_M(x^*))^{\circ} = \{g \in \mathbb{R}^n ~|~ \langle g, h \rangle \le 0, ~\forall h \in T_M(x^*)\} \]
- fact: if \(x^*\) is a minimizer, then \(-\nabla f(x^*) \in N_M(x^*)\).
Algorithm convergence
| Method / setting | Stepsize rule | Convergence rate | Iteration complexity |
| --- | --- | --- | --- |
| Gradient descent | | | |
| strongly convex & smooth | \(\eta_t = \frac{2}{\mu + L}\) | \(O\left(\left(\frac{\kappa -1}{\kappa +1}\right)^t\right)\) | \(O\left(\frac{\log\frac{1}{\epsilon}}{\log\frac{\kappa+1}{\kappa-1}}\right)\) |
| convex & smooth | \(\eta_t = \frac{1}{L}\) | \(O(\frac{1}{t})\) | \(O(\frac{1}{\epsilon})\) |
| Frank-Wolfe | | | |
| (strongly) convex & smooth | \(\eta_t \asymp \frac{1}{t}\) | \(O(\frac{1}{t})\) | \(O(\frac{1}{\epsilon})\) |
| Projected GD | | | |
| convex & smooth | \(\eta_t = \frac{1}{L}\) | \(O(\frac{1}{t})\) | \(O(\frac{1}{\epsilon})\) |
| strongly convex & smooth | \(\eta_t = \frac{1}{L}\) | \(O\left((1-\frac{1}{\kappa})^t\right)\) | \(O(\kappa\log\frac{1}{\epsilon})\) |
| Subgradient method | | | |
| convex & Lipschitz | \(\eta_t \asymp \frac{1}{\sqrt{t}}\) | \(O(\frac{1}{\sqrt{t}})\) | \(O(\frac{1}{\epsilon^2})\) |
| strongly convex & Lipschitz | \(\eta_t \asymp \frac{1}{t}\) | \(O\left(\frac{1}{t}\right)\) | \(O(\frac{1}{\epsilon})\) |
| Proximal GD | | | |
| convex & smooth (w.r.t. \(f\)) | \(\eta_t = \frac{1}{L}\) | \(O(\frac{1}{t})\) | \(O(\frac{1}{\epsilon})\) |
| strongly convex & smooth (w.r.t. \(f\)) | \(\eta_t = \frac{1}{L}\) | \(O\left((1-\frac{1}{\kappa})^t\right)\) | \(O(\kappa\log\frac{1}{\epsilon})\) |

Here \(L\) is the smoothness constant, \(\mu\) the strong convexity parameter, and \(\kappa = L/\mu\) the condition number.
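The first row of the table can be observed on a toy problem. The sketch below runs gradient descent with \(\eta = \frac{2}{\mu + L}\) on a diagonal quadratic (an arbitrary example, with \(\kappa = L/\mu = 10\)) and records the distance to the minimizer, which should shrink at least as fast as \(\left(\frac{\kappa-1}{\kappa+1}\right)^t\).

```python
import numpy as np

Q = np.diag([1.0, 4.0, 10.0])  # f(x) = 0.5 x^T Q x, minimizer x* = 0
mu, L = 1.0, 10.0              # smallest / largest eigenvalue of Q
kappa = L / mu
eta = 2.0 / (mu + L)
rate = (kappa - 1.0) / (kappa + 1.0)

x = np.array([1.0, 1.0, 1.0])
x0_norm = np.linalg.norm(x)
errors = []
for t in range(1, 51):
    x = x - eta * (Q @ x)      # gradient step, since grad f(x) = Q x
    errors.append(np.linalg.norm(x))
```

For a quadratic the bound is tight along the eigenvectors belonging to \(\mu\) and \(L\), which is why this stepsize balances the two extreme eigenvalues.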
