Hefei University of Technology Linux Lab 2: Typesetting a Scientific Paper with LaTeX
Posted 上衫_
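I. Objective
Learn how to typeset with LaTeX by using it to edit a scientific paper.
II. Task and Requirements
Given a PDF file supplied by the instructor that follows the IEEE journal paper format, reproduce it with the LaTeX typesetting system so that the compiled output is an identical PDF file.
III. Procedure and Results
1. Open the IEEE template supplied by the instructor.
2. Following the IEEE template, build the overall skeleton of the paper first: the title and the body text.
3. Add the mathematical formulas to the paper.
4. Add the figures to the paper.
5. Add the citations to the paper.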
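Steps 2 to 5 can be tried out on a much smaller document first. The following is a minimal sketch, not the instructor's template: the title, author, equation, citation key and bibliography entry are all placeholders, and `example-image` is assumed to be available (it ships with the `mwe` package in common TeX distributions).

```latex
\documentclass[journal]{IEEEtran}
\usepackage{amsmath}   % display math helpers
\usepackage{graphicx}  % \includegraphics
\usepackage{cite}      % compressed numeric citations

\begin{document}

% Step 2: skeleton (title, author, abstract, body)
\title{Placeholder Title}
\author{First~Author}
\maketitle

\begin{abstract}
Placeholder abstract.
\end{abstract}

\section{Introduction}
\IEEEPARstart{B}{ODY} text with a citation~\cite{placeholder}.

% Step 3: a numbered display equation
\begin{equation}
  \ell(b) = \max\{0,\, 1 + b\}
\end{equation}

% Step 4: a single-column figure (example-image is a placeholder graphic)
\begin{figure}[!t]
  \centering
  \includegraphics[width=0.8\columnwidth]{example-image}
  \caption{Placeholder caption.}
\end{figure}

% Step 5: references, written inline here so the sketch is self-contained
\begin{thebibliography}{1}
\bibitem{placeholder}
A.~Author, ``Placeholder title,'' placeholder venue, year.
\end{thebibliography}

\end{document}
```

The full source in this report uses `\bibliographystyle` and `\bibliography` with a `.bib` file instead of an inline `thebibliography` list, which requires a BibTeX pass between LaTeX runs before the citations resolve.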
IV. Code
\documentclass[journal]{IEEEtran}
\hyphenation{op-tical net-works semi-conduc-tor}
\makeatletter
\newcommand{\rmnum}[1]{\romannumeral #1}
\newcommand{\Rmnum}[1]{\expandafter\@slowromancap\romannumeral #1@}
\makeatother
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{multirow}
\usepackage[text={190mm,250mm},centering]{geometry}
\usepackage{indentfirst}
\usepackage{graphicx}
\usepackage{epstopdf}
\usepackage{cite}
\usepackage{stfloats}
\begin{document}
\title{Optimizing Top-k Multiclass SVM via\\Semismooth Newton Algorithm}
\author{San~Zhang,~Si~Li,~\IEEEmembership{Fellow,~IEEE}%
\thanks{S. Zhang and S. Li are with the Department of Automation, Hefei University, Anhui Province, 230031, China (e-mails: sanzhang, sili@mails.hfut.edu.cn).}}
\markboth{Journal of \LaTeX\ Class Files,~Vol.~14, No.~8, August~2015}%
{Shell \MakeLowercase{\textit{et al.}}: Bare Demo of IEEEtran.cls for IEEE Journals}
\maketitle
\begin{abstract}
Top-k performance has recently received increasing attention in large data categories. Advances like top-k multiclass SVM have consistently improved the top-k accuracy. However, the key ingredient in the state-of-the-art optimization scheme based upon Stochastic Dual Coordinate Ascent (SDCA) relies on the sorting method, which yields $O(d \log d)$ complexity. In this paper, we leverage the semismoothness of the problem and propose an optimized top-k multiclass SVM algorithm, which employs a semismooth Newton algorithm for the key building block to improve the training speed. Our method enjoys a local superlinear convergence rate in theory. In practice, experimental results confirm the validity. Our algorithm is 4 times faster than the existing method in large synthetic problems; moreover, on real-world datasets it also shows significant improvement in training time.
\end{abstract}
\begin{IEEEkeywords}
Multiclass SVM, top-k error, SDCA optimization, root finding, semismooth Newton method.
\end{IEEEkeywords}
\IEEEpeerreviewmaketitle
\section{Introduction}
\IEEEPARstart{M}{ULTICLASS} classification is a fundamental problem in pattern recognition and has been attracting much attention in the machine learning community
\cite{Bishop2006Pattern,Hsu2002comparison,Rifkin2004In,Yuan2012Recent,Rocha2014Multiclass}.
The major challenges posed by large-scale datasets for training multiclass classifiers lie not only in the data size, but also in the number of data categories \cite{Deng2010What,Zhou2014Learning,Russakovsky2015Imagenet}.
For example, there are 1000 object categories for the image classification task in the ImageNet visual recognition challenge \cite{Russakovsky2015Imagenet}. When the number of object classes increases, an important issue, i.e., overlapping categories, emerges. Many real-world classification tasks involve large numbers of overlapping categories
\cite{Cai2007Exploiting},
which lead to class ambiguity. Thus, it is customary to report top-k accuracy
for large-scale object recognition problems \cite{Krizhevsky2012Imagenet,Simonyan2014Very,Szegedy2015Going,He2016Deep}, where the top-k accuracy is the fraction of test data for which the correct label is among the top k labels predicted by the model. However, all these reported top-k error rates are based on the top-1 error. Recently, Lapin et al. generalized
Crammer and Singer's Multiclass Support Vector Machine (MSVM) \cite{Crammer2001algorithmic}
to the top-k MSVM, which leads to improvements in top-k performance
\cite{Lapin2015Top,Lapin2016Loss}.
Since the direct extension of MSVM to the nonconvex top-k zero-one loss leads to a computationally intractable problem, it instead minimizes a surrogate function, the so-called top-k hinge loss, which is a tight convex upper bound of the top-k zero-one loss. Furthermore, a highly efficient SDCA procedure
\cite{Zhang2015Stochastic}
is proposed to solve the optimization problem.
\section{Top-$k$ Multiclass Support Vector Machine}
We first review the well-known MSVM proposed by Crammer and Singer \cite{Crammer2001algorithmic}.
We are given a set of $n$ instance-label pairs $(\boldsymbol{x}_i, y_i)$, $i = 1, \ldots, n$, where $\boldsymbol{x}_i \in \mathbb{R}^p$ and the associated label $y_i$ is an integer from the set $\mathcal{Y} = \{1, \ldots, Y\}$. Let the weight vector $\mathbf{w}_j$ be the $j$-th column of the parameter matrix $\mathbf{W}_{p \times Y}$. Crammer and Singer's MSVM solves the following problem
\begin{equation}
\min_{\mathbf{W}} \frac{\lambda}{2} \sum_{j \in \mathcal{Y}} \left\|\mathbf{w}_j\right\|^2 + \frac{1}{n} \sum_{i=1}^{n} \ell\left(\mathbf{W}; \boldsymbol{x}_i, y_i\right),
\end{equation}
where $\lambda$ is referred to as the positive regularization parameter and $\ell(\cdot)$ is called the loss function of example $(\boldsymbol{x}_i, y_i)$. The loss function is defined as
\[\ell\left(\mathbf{W}; \boldsymbol{x}_i, y_i\right) := \max_{j \in \mathcal{Y}} \left\{\mathbb{I}\left(j \neq y_i\right) + \left\langle \mathbf{w}_j - \mathbf{w}_{y_i}, \boldsymbol{x}_i \right\rangle\right\},\]
where $\mathbb{I}(\cdot)$ is the indicator function, which takes the value one
if its argument is true and zero otherwise. Then the multiclass decision function has the form
\[\underset{j \in \mathcal{Y}}{\operatorname{argmax}} \left\langle \mathbf{w}_j, \boldsymbol{x} \right\rangle.\]
Let $\boldsymbol{e}_j$ be the $j$-th unit vector in $\mathbb{R}^Y$ and $\mathbf{1}$ the vector with ones in all elements. For every $i$, let $\boldsymbol{c}_i = \mathbf{1} - \boldsymbol{e}_{y_i}$ and $\boldsymbol{b}_i = \mathbf{W}^\top \boldsymbol{x}_i - (\mathbf{W}^\top \boldsymbol{x}_i)_{y_i}$. To lighten the notation we denote the $j$-th largest component of $\boldsymbol{b}_i$ by $b_{[j]}$, i.e., $b_{[1]} \geq b_{[2]} \geq \cdots \geq b_{[Y]}$.
Thus, the loss function can be rewritten as \[\ell\left(\boldsymbol{b}_i\right) = \max\left\{0, \left(\boldsymbol{c}_i + \boldsymbol{b}_i\right)_{[1]}\right\}.\]
Recently, Lapin et al. extended the above loss function to the top-$k$ hinge loss \cite{Lapin2015Top,Lapin2016Loss},
\begin{equation}
\ell_k\left(\boldsymbol{b}_i\right) = \max\left\{0, \frac{1}{k} \sum_{j=1}^{k} \left(\boldsymbol{c}_i + \boldsymbol{b}_i\right)_{[j]}\right\},
\end{equation}
where $1 \leq k < Y$. We show that the top-$k$ multiclass SVM can be cast as an unconstrained optimization problem
\begin{equation}
\min_{\mathbf{W}} \frac{\lambda}{2} \sum_{j \in \mathcal{Y}} \left\|\mathbf{w}_j\right\|^2 + \frac{1}{n} \sum_{i=1}^{n} \ell_k\left(\mathbf{W}; \boldsymbol{x}_i, y_i\right).
\end{equation}
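% Worked example of the top-k hinge loss (an editor's illustration with
% made-up numbers, not part of the original paper; these lines are comments
% and do not change the compiled PDF).
% Assume Y = 3, y_i = 1 and class scores W^T x_i = (2, 1.5, 0). Then
%   b_i       = (2 - 2, 1.5 - 2, 0 - 2) = (0, -0.5, -2),
%   c_i       = 1 - e_{y_i}             = (0, 1, 1),
%   c_i + b_i = (0, 0.5, -1), sorted as 0.5 >= 0 >= -1.
% Hence
%   k = 1: \ell_1(b_i) = \max\{0, 0.5\}         = 0.5,
%   k = 2: \ell_2(b_i) = \max\{0, (0.5 + 0)/2\} = 0.25.
% For k = 1 this is the Crammer and Singer loss; a larger k averages the
% k largest margin violations.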
\subsection{Dual Problem of Top-k MSVM}
To solve the top-$k$ MSVM problem (3) using the SDCA framework, one may first derive its dual form. Following the notation given in \cite{Shalev-Shwartz2016Accelerated}, let $\mathbf{X}_i \in \mathbb{R}^{pY \times Y}$ be the matrix whose $j$-th column is $\operatorname{vec}\left(\boldsymbol{x}_i\left(\boldsymbol{e}_j - \boldsymbol{e}_{y_i}\right)^\top\right)$ and $\boldsymbol{w} = \operatorname{vec}(\mathbf{W})$. Then,
\[\boldsymbol{b}_i = \mathbf{X}_i^\top \boldsymbol{w}.\]
Hence we can reformulate the primal optimization problem of top-$k$ MSVM as
\begin{equation}
\min_{\boldsymbol{w} \in \mathbb{R}^{pY}} P(\boldsymbol{w}) := \frac{\lambda}{2} \boldsymbol{w}^\top \boldsymbol{w} + \frac{1}{n} \sum_{i=1}^{n} \ell_k\left(\boldsymbol{w}; \mathbf{X}_i, y_i\right).
\end{equation}
We obtain its equivalent optimization problem
\begin{equation}
\begin{aligned}
\min\quad & \frac{1}{2}\left\|\boldsymbol{\alpha}_i\right\|_2^2 + \boldsymbol{a}_i^\top \boldsymbol{\alpha}_i + \frac{1}{2}\left(\mathbf{1}^\top \boldsymbol{\alpha}_i\right)^2 \\
\text{s.t.}\quad & 0 \leq -\boldsymbol{\alpha}_i \leq \frac{1}{k} \sum -\boldsymbol{\alpha}_i, \\
& \sum -\boldsymbol{\alpha}_i \leq 1, \\
& \alpha_i^{y_i} = 0,
\end{aligned}
\end{equation}
where
\[\boldsymbol{a}_i = \frac{1}{\rho_i}\left(\boldsymbol{c}_i + \mathbf{X}_i^\top \hat{\boldsymbol{w}}\right), \quad \rho_i = \frac{1}{n\lambda}\left\|\boldsymbol{x}_i\right\|^2.\]
Here calculating $\mathbf{X}_i^\top \hat{\boldsymbol{w}}$ still takes $O\left(pY^2\right)$ operations, which is too expensive. We reshape the vector $\hat{\boldsymbol{w}}$ into a $p$-by-$Y$ matrix $\hat{\mathbf{W}}$. Thus the computation
\[\mathbf{X}_i^\top \hat{\boldsymbol{w}} = \hat{\mathbf{W}}^\top \boldsymbol{x}_i - \left(\hat{\mathbf{W}}^\top \boldsymbol{x}_i\right)_{y_i}\]
takes $O(pY)$ operations. To avoid heavy notation,
we drop the subscript of $\boldsymbol{a}_i$ and let $\boldsymbol{z} = -\boldsymbol{\alpha}_i^{\backslash y_i}$ and $s = \sum z_j$. The above optimization problem (5) can then be rewritten as
\begin{equation}
\begin{aligned}
\min_{\boldsymbol{z}, s}\quad & \frac{1}{2}\|\boldsymbol{z} - \boldsymbol{a}\|^2 + \frac{1}{2} s^2 \\
\text{s.t.}\quad & s = \sum z_j, \\
& s \leq 1, \\
& 0 \leq z_j \leq s / k.
\end{aligned}
\end{equation}
Once problem (6) is solved, a sufficient increase of the
dual objective is achieved. For the primal problem,
the process leads to the update
\begin{equation}
\boldsymbol{w} = \boldsymbol{w} + \frac{1}{n\lambda} \mathbf{X}_i\left(\boldsymbol{\alpha}_i - \boldsymbol{\alpha}_i^{\text{old}}\right).
\end{equation}
Pseudo-code of the SDCA algorithm for the top-$k$ MSVM is
given in Algorithm 1. To obtain the first $\boldsymbol{w}$, we can initialize
$\boldsymbol{\alpha}_i = 0$ and then $\boldsymbol{w} = 0$.
\begin{table}[h]
\begin{tabular}{l}
\hline
Algorithm 1 Stochastic Dual Coordinate Ascent Algorithm \\
for Top-k MSVM \\ \hline
\textbf{Require:} $\boldsymbol{\alpha}, \lambda, k, \epsilon$ \\
1: $\boldsymbol{w} \leftarrow \sum_i \frac{1}{n\lambda} \mathbf{X}_i \boldsymbol{\alpha}_i$ \\
2: while $\boldsymbol{\alpha}$ is not optimal do \\
3: \quad Randomly permute the training examples \\
4: \quad for $i = 1, \ldots, n$ do \\
5: \quad\quad $\boldsymbol{\alpha}_i^{\text{old}} \leftarrow \boldsymbol{\alpha}_i$ \\
6: \quad\quad Update $\boldsymbol{\alpha}_i$ by solving sub-problem (6) \\
7: \quad\quad $\boldsymbol{w} \leftarrow \boldsymbol{w} + \frac{1}{n\lambda} \mathbf{X}_i\left(\boldsymbol{\alpha}_i - \boldsymbol{\alpha}_i^{\text{old}}\right)$ \\
8: \quad end for \\
9: end while \\
\textbf{Ensure:} $\boldsymbol{w}, \boldsymbol{\alpha}$ \\ \hline
\end{tabular}
\end{table}
\section{Experiments}
In this section, we first demonstrate the performance of our semismooth Newton method on synthetic data. Then, we apply our algorithm to the top-k multiclass SVM problem to show its efficiency compared with the existing method in \cite{Lapin2015Top}. Our algorithms are implemented in C with a Matlab interface and run on a 3.1\,GHz Intel Xeon (E5-2687W) Linux machine with 128\,GB of memory. The compiler used is GCC 4.8.4. Both our code and the libsdca package of \cite{Lapin2015Top} are compiled with the ``-O3'' optimization flag set. The experiments are carried out in Matlab 2016a. All the implementations will be released publicly on a website.
\subsection{Efficiency of the Proposed Algorithm}
To investigate the scalability of our algorithm in the problem dimension, three synthetic problems are randomly generated with $d$ ranging between 50,000 and 2,000,000. In the first test problem, $a_j$ is randomly chosen from the uniform
distribution $U(15, 25)$ as in \cite{Cominetti2014Newtons,Kiwiel2008Variable}. In the second test, following the setup of \cite{Lapin2015Top,Gong2011Efficient}, data entries are sampled from the normal distribution $N(0, 1)$. In the third synthetic problem, $a_j$ is chosen by independent draws from the uniform distribution $U(-1, 1)$. For a pure comparison, we consider the problem without the constraint $s \le r$. Thus, the knapsack problem, which corresponds to the $s = r$ case, will not occur in these synthetic problems.
We first present numerical results to investigate the scalability of our proposed algorithm compared with the sorting-based method for different values of $k = 1, 5, 10$. Fig. 1(a), 1(b) and 1(c) correspond to the first, the second and the third
test problems, respectively. They show that the running times grow linearly with the problem size for both the sorting-based method and our proposed algorithm. However, our algorithm is consistently much faster than the sorting-based method. When the problem size $d \ge 2 \times 10^6$, our proposed algorithm is 2.5 times faster in the first problem, and 4 times faster in both the second and the third problems.
In addition to the superlinear convergence, the semismooth Newton method reaches accurate solutions in a few iterations. Our numerical results suggest that it usually takes 3 to 5 iterations to converge.
\begin{table}[h]
\caption{Datasets used in the experimental evaluation.}
\begin{tabular}{l|llll}
\hline
Dataset & Classes & Features & Training size & Testing size \\ \hline
FMD & 10 & 2,048 & 500 & 500 \\
News20 & 20 & 15,478 & 15,935 & 3,993 \\
Letter & 26 & 16 & 15,000 & 5,000 \\
Indoor67 & 67 & 4,096 & 5,360 & 1,340 \\
Caltech101 & 101 & 784 & 4,100 & 4,100 \\
Flowers & 102 & 2,048 & 2,040 & 6,149 \\
CUB & 200 & 2,048 & 5,994 & 5,794 \\
SUN397 & 397 & 4,096 & 19,850 & 19,850 \\
ALOI & 1,000 & 128 & 86,400 & 21,600 \\
ImageNet & 1,000 & 2,048 & 1,281,167 & 50,000 \\ \hline
\end{tabular}
\end{table}
\ifCLASSOPTIONcaptionsoff
\newpage
\fi
\section{Concluding Remarks}
In this paper, we leverage the semismoothness of the optimization problem and develop an optimized top-$k$ multiclass SVM. While our proposed semismooth Newton method enjoys a local superlinear convergence rate, we also present an efficient algorithm to obtain the starting point, which works quite well in practice for the Newton iteration. Experimental results on both synthetic and real-world datasets show that our proposed method scales better with larger numbers of categories and offers faster convergence compared with the existing sorting-based algorithm. We note that there are many other semismooth scenarios, such as the ReLU activation function
in deep neural networks and the hinge loss in the empirical risk minimization problem. It would be very appealing to exploit the semismooth structure and propose more efficient machine learning algorithms in future work.
\begin{figure*}[ht]
\centering
\includegraphics[width=5.5cm]{scale1.eps}
\hspace{10pt} % horizontal gap between adjacent images
\includegraphics[width=5.5cm]{scale2.eps}
\hspace{10pt}
\includegraphics[width=5.5cm]{scale3.eps}
\caption{Scaling of our algorithm compared with sorting method. Left: $a_j \sim U(10,25)$. Middle: $a_j \sim U(0,1)$. Right: $a_j \sim U(-1,1)$.}
\end{figure*}
\section*{Acknowledgements}
The authors would like to thank the reviewers for their valuable suggestions on improving this paper. Thanks also go to Wu Wang for the helpful email exchange.
\bibliographystyle{ieeetr}
\bibliography{projection}
\end{document}
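The body text refers to Fig. 1(a), 1(b) and 1(c), while the figure environment in the listing places three images side by side without sub-labels. One possible way to obtain the (a)/(b)/(c) labels is the `subfig` package, sketched below under the assumption that it is installed. The label names `fig:scale1` to `fig:scale3` and `fig:scaling` are hypothetical and do not appear in the original source.

```latex
% In the preamble (package options as suggested in the comments of the
% IEEEtran journal template):
\usepackage[caption=false,font=footnotesize]{subfig}

% In the body, in place of the three bare \includegraphics calls:
\begin{figure*}[!t]
\centering
\subfloat[]{\includegraphics[width=5.5cm]{scale1.eps}\label{fig:scale1}}
\hfil
\subfloat[]{\includegraphics[width=5.5cm]{scale2.eps}\label{fig:scale2}}
\hfil
\subfloat[]{\includegraphics[width=5.5cm]{scale3.eps}\label{fig:scale3}}
\caption{Scaling of our algorithm compared with sorting method. Left: $a_j \sim U(10,25)$. Middle: $a_j \sim U(0,1)$. Right: $a_j \sim U(-1,1)$.}
\label{fig:scaling}
\end{figure*}
```

With these labels in place, a panel can be referenced as `Fig.~\ref{fig:scaling}\subref{fig:scale1}` rather than typing the panel letter by hand. Whether the result matches the instructor's PDF exactly would need to be checked against it.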
V. Results