Stata: A Complete Guide to Machine Learning Classifiers

Posted by Stata 连享会 (Lianxh)


🍎 Lianxh homepage: lianxh.cn

New! The lianxh command is now available:
search Stata posts, tutorials, manuals and forums at any time. Install it with:
. ssc install lianxh

Lianxh · Most popular course

🍓 2021 Stata Winter Camp
⌚ January 25 to February 4, 2021

🌲 Instructors: Lian Yujun (连玉君, Sun Yat-sen University); Jiang Ting (江艇, Renmin University of China)

👉 Course page:


Author: Fan Jiacheng (樊嘉诚), Sun Yat-sen University

E-Mail: fanjch676@163.com


鐩綍

  • 1. 寮曡█

  • 2. 鐞嗚浠嬬粛

    • 2.1 鏀寔鍚戦噺鏈?/p>

    • 2.2 鍐崇瓥鏍?/p>

    • 2.3 绁炵粡缃戠粶

  • 3. 鍛戒护浠嬬粛鍜屽畨瑁?/p>

    • 3.1 鍩烘湰浠嬬粛

    • 3.2 瀹夎鏂规硶

    • 3.3 璇硶鍙婇€夐」

    • 3.4 娉ㄦ剰浜嬮」

  • 4. Stata 瀹炴搷

    • 4.1 鏁版嵁缁撴瀯鎻忚堪

    • 4.2 妯″瀷璁粌鍜岀粨鏋?/p>

    • 4.3 缁撴灉姹囨€?/p>

  • 5. 鎬荤粨

  • 6. 鍙傝€冭祫鏂?/p>




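1. Introduction

"How can we tell a good watermelon from a bad one by its color, its stem, and the sound it makes when tapped?" This is a classification problem, the kind we face all the time in daily life. Classification also runs through much academic research. Examples include identifying business cycles to judge where the economy is heading, and using listed firms' financial statements to give early warning of financial distress or crisis. Computer vision, spam filtering, and medical diagnosis are likewise classification problems at heart.

Consider the classic Logit regression from econometrics. It was designed for discrete-choice problems, but it can also be viewed as a classification algorithm. In the era of big data, however, many classification tasks face high-dimensional data, low data quality, and imbalanced samples. Machine learning (ML) algorithms have become extremely popular in recent years and offer new ways to tackle these problems.

The command introduced in this post, c_ml_stata, uses Python to bring machine learning classifiers into Stata. It covers many classification algorithms, such as support vector machines, decision trees, and neural networks. It also supports cross-validation (CV), which makes it a powerful tool for classification in Stata.

The rest of the post is organized as follows:
- Section 2 gives a brief theoretical introduction to some of the algorithms available in the command.
- Section 3 introduces the command and explains how to install it.
- Section 4 uses the command and its bundled data sets to work through a classification problem in Stata.
- Section 5 concludes.

2. Theory

There are many machine learning classification algorithms. Space is limited, so we only briefly introduce the theory behind some of the algorithms offered by c_ml_stata. The goal is a clearer picture of classification problems, of the algorithms themselves, and of how the command is used later on. Readers familiar with these algorithms can skip ahead. This section covers support vector machines (SVM), decision trees, and neural networks (NN).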

2.1 鏀寔鍚戦噺鏈?/span>

2.1.1 鏀寔鍚戦噺涓庨棿闅?/span>

鏀寔鍚戦噺鏈烘槸涓€绉?span>浜屽垎绫诲櫒锛屽畠鐨勫熀鏈€濇兂鏄熀浜庤缁冮泦 鍦ㄦ牱鏈┖闂翠腑瀵绘壘涓€涓?span>鍒掑垎瓒呭钩闈?/strong>锛屽皢涓嶅悓绫诲埆鐨勬牱鏈垝鍒嗗紑銆?/p> Stata锛氭満鍣ㄥ涔犲垎绫诲櫒澶у叏

鏀寔鍚戦噺鏈哄涔犳柟娉曞寘鎷敱绠€鑷崇箒鐨勪竴绯诲垪妯″瀷锛氬綋璁粌鏁版嵁绾挎€у彲鍒嗘椂锛岄€氳繃纭棿闅旀渶澶у寲 (hard margin maximization) 锛屽涔犱竴涓嚎鎬у垎绫诲櫒锛屽嵆绾挎€у彲鍒嗘敮鎸佸悜閲忔満锛涘綋璁粌鏁版嵁杩戜技绾挎€у彲鍒嗘椂锛岄€氳繃杞棿闅旀渶澶у寲 (soft margin maximization) 锛屼篃瀛︿範涓€涓嚎鎬у垎绫诲櫒锛屽嵆绾挎€ф敮鎸佸悜閲忔満锛涘綋璁粌鏁版嵁绾挎€т笉鍙垎鏃讹紝閫氳繃浣跨敤鏍告柟娉?(kernel method) 鍙婅蒋闂撮殧鏈€澶у寲锛屽涔?/strong>闈炵嚎鎬ф敮鎸佸悜閲忔満銆?/p>

鎴戜滑浠庢渶绠€鍗曠殑绾挎€у彲鍒嗘敮鎸佸悜閲忔満浣滀负寮曞叆锛屽亣瀹氱粰瀹氫竴涓壒寰佺┖闂磋缁冩暟鎹泦 锛屽叾涓紝 琛ㄧず鍏锋湁 涓壒寰佺殑鐗瑰緛鍚戦噺 (feature vector) 锛?span class="mq-90"> 琛ㄧず鍥犲彉閲忕殑涓や釜涓嶅悓鐨勫垎绫绘爣绛?(class label) 銆傚苟涓斿亣瀹?span>璁粌鏁版嵁闆嗘槸绾挎€у彲鍒嗙殑銆傛垜浠殑鐩殑鏄壘鍒颁竴涓垝鍒嗚秴骞抽潰 灏嗗疄渚嬪垝鍒嗗埌涓嶅悓鐨勭被锛屽叾涓紝 涓烘硶鍚戦噺锛?span class="mq-99"> 涓烘埅璺濄€傛晠鍙畾涔夌浉搴旂殑鍒嗙被鍑芥暟涓?/p>

鐗瑰埆鍦帮紝鑻? 琛ㄧず鏍锋湰鐐逛綅浜庡垝鍒嗚秴骞抽潰涓婏紝琚О涔嬩负鈥?span>鏀寔鍚戦噺鈥?(support vector) 銆?/p>

涓€鑸湴锛屽浜庣嚎鎬у彲鍒嗙殑鏁版嵁锛屽瓨鍦ㄦ棤绌峰涓垝鍒嗚秴骞抽潰鍙互灏嗕袱绫绘暟鎹纭湴鍒嗗紑锛岄偅涔堝浣曡幏寰椾竴涓敮涓€鐨勬渶浼樺垝鍒嗚秴骞抽潰鍛紵鍦ㄤ笅鍥句腑锛屾湁 涓変釜鐐癸紝琛ㄧず 涓疄渚嬶紝涓旈娴嬪垎绫绘椂鍧囧湪鍒掑垎瓒呭钩闈㈢殑涓€渚с€傚洜涓? 鐐硅窛鍒嗙被瓒呭钩闈㈣緝杩滐紝灏辨瘮杈冪‘淇? 鐐硅姝g‘鍒嗙被鐨勫彲淇″害杈冮珮锛?span class="mq-122"> 鐐圭鍒掑垎瓒呭钩闈㈣緝杩戯紝鍥犳鍙兘鎬€鐤戝垝鍒嗘槸鍚﹀噯纭?(鍙俊搴﹁緝浣? 锛涜€? 鐐逛粙浜庣偣 鍜? 涔嬮棿锛岄娴嬪叾鍒嗙被缁撴灉鐨勫彲淇″害浠嬩簬 涓? 涔嬮棿銆備笂杩扮洿瑙夊紩瀵兼垜浠浣曠‘瀹氭渶浼樼殑鍒掑垎瓒呭钩闈⑩€斺€?span>涓€鑸潵璇达紝涓€涓偣绂诲垝鍒嗚秴骞抽潰鐨勮繙杩戝彲浠ヨ〃绀哄垎绫荤粨鏋滅殑鍙俊搴︼紝鑰屾渶浼樼殑鍒掑垎瓒呭钩闈㈠簲浣垮緱鎵€鏈夌偣鐨勫垎绫荤粨鏋滃彲淇″害灏藉彲鑳界殑楂?/strong>銆傜敱鍩烘湰鐨勭┖闂村嚑浣曠煡璇嗗彲浠ョ煡閬擄紝 鎭板ソ鍙互鐩稿鍦拌〃绀虹 涓疄渚嬪埌鍒掑垎瓒呭钩闈㈢殑璺濈銆傛澶栵紝鑻? 鍒欒〃绀哄垎绫绘纭€傚湪姝わ紝寮曞叆涓€涓潪甯稿叧閿殑姒傚康鈥斺€?闂撮殧 (margin)锛屽畾涔変负

鍩轰簬鏈€鍒濈殑鐩磋锛屾垜浠殑鐩爣鑷劧鏄笇鏈涘嵆浣挎槸绂诲垝鍒嗚秴骞抽潰鏈€杩戠殑鐐癸紝鍏跺垝鍒嗙粨鏋滅殑鍙俊搴︿篃杈冮珮銆傚洜姝わ紝鏈€澶у寲闂撮殧鏄竴涓笉閿欑殑鎯虫硶銆傜劧鑰岋紝娉ㄦ剰鍒板鏋滄垜浠垚姣斾緥鍦版敼鍙? 鍜? 锛岃櫧鐒舵垜浠殑闂撮殧鍙樺寲浜嗭紝浣嗘槸瓒呭钩闈㈡湰韬苟鏈敼鍙樸€傚洜姝わ紝涓烘眰寰楀敮涓€瑙o紝瀵规硶鍚戦噺鏂藉姞鏌愮绾︽潫鏄繀瑕佺殑 (濡傝鑼冨寲锛岀害鏉? ) 銆傛渶缁堬紝鎴戜滑鍙互鍐欏嚭濡備笅鏈€浼樺寲闂

鐣ュ井閬楁喚鐨勬槸锛屼互涓婃渶浼樺寲闂鏄€滈潪鍑糕€?(Non-convex) 鐨勶紝姹傝В杩囩▼杈冧负澶嶆潅銆傚垢杩愮殑鏄紝涓婅堪鏈€浼樺寲闂鍙互閫氳繃绛変环鍙樻崲锛岃浆鎹负浠ヤ笅鍑?(Convex) 闂锛?/p>

姹傚緱鏈€浼樿В 鍗冲彲寰楀埌鏈€浼樺垝鍒嗚秴骞抽潰銆傚彲浠ヨ瘉鏄庯紝璇ユ渶浼樺垝鍒嗚秴骞抽潰鏄瓨鍦ㄤ笖鍞竴鐨勩€?/p>
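Here is a minimal Python sketch of the margin defined above. Python is used for the sketches in this post because c_ml_stata itself hands the work to Python. The toy points and the hyperplane $(w, b)$ are hypothetical.

The sketch evaluates two quantities:
- the margin $\hat{\gamma} = \min_i y_i(w^\top x_i + b)$;
- its scaled version $\hat{\gamma}/\lVert w \rVert$.

It also shows why a normalization is needed: doubling $(w, b)$ doubles $\hat{\gamma}$, while the hyperplane and the geometric distance stay the same.

```python
import math

def functional_margin(w, b, X, y):
    """min_i y_i (w^T x_i + b), the margin gamma-hat from the text."""
    return min(yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
               for xi, yi in zip(X, y))

def geometric_margin(w, b, X, y):
    """Functional margin divided by ||w||: distance of the closest point to the hyperplane."""
    return functional_margin(w, b, X, y) / math.sqrt(sum(wj * wj for wj in w))

# Hypothetical, linearly separable toy data in R^2 with labels +1 / -1.
X = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
y = [+1, +1, -1, -1]

w, b = (1.0, 1.0), -2.0     # candidate hyperplane x1 + x2 - 2 = 0
w2, b2 = (2.0, 2.0), -4.0   # the same hyperplane with (w, b) doubled

print(functional_margin(w, b, X, y), functional_margin(w2, b2, X, y))
print(geometric_margin(w, b, X, y), geometric_margin(w2, b2, X, y))
```

The first line prints 2.0 and 4.0. The second prints the same distance, about 1.414 (that is, $\sqrt{2}$), twice. Fixing the scale of $(w, b)$ removes this ambiguity.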

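2.1.2 The soft margin

So far we have assumed that a hyperplane can separate the training set exactly and completely. In practice, however, it is often hard to tell whether the data are linearly separable. Even if they are, it is hard to rule out that the apparent separability comes from overfitting.

To reduce the influence of "noise" in the data, we therefore allow the SVM to misclassify some samples. This calls for the concept of a soft margin:
- The earlier constraint requires every sample to be classified correctly, i.e. $y_i(w^\top x_i + b) \ge 1$. This can be thought of as a "hard margin".
- A soft margin allows some samples to violate this constraint, although such samples should be as few as possible.

The optimization objective can therefore be written as

$$\min_{w, b} \ \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{N} \ell_{0/1}\big(y_i (w^\top x_i + b) - 1\big),$$

where $C > 0$ is the regularization parameter. A larger $C$ means less tolerance for misclassification and a narrower margin. Conversely, a smaller $C$ means more tolerance for misclassification and a wider margin. $\ell_{0/1}$ is the indicator (0/1 loss) function:

$$\ell_{0/1}(z) = \begin{cases} 1, & z < 0, \\ 0, & \text{otherwise}. \end{cases}$$

Because $\ell_{0/1}$ is non-convex and discontinuous, it is usually replaced by the hinge loss $\max\big(0, 1 - y_i(w^\top x_i + b)\big)$. Introducing slack variables $\xi_i \ge 0$, the problem can then be rewritten as

$$\min_{w, b, \xi} \ \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{N} \xi_i \quad \text{s.t.} \quad y_i (w^\top x_i + b) \ge 1 - \xi_i, \ \xi_i \ge 0, \ i = 1, \dots, N.$$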
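The next Python sketch shows how the slack variables and $C$ enter the objective. It uses hypothetical data and a fixed, not optimized, $(w, b)$.

For a given $(w, b)$, the smallest feasible slack for each sample is exactly its hinge loss. The code computes those slacks, the number of 0/1-loss violations, and the objective for a small and a large $C$.

```python
def decision(w, b, x):
    """f(x) = w^T x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def slacks(w, b, X, y):
    """Smallest feasible slack per sample: xi_i = max(0, 1 - y_i f(x_i)), i.e. the hinge loss."""
    return [max(0.0, 1.0 - yi * decision(w, b, xi)) for xi, yi in zip(X, y)]

def zero_one_violations(w, b, X, y):
    """Number of samples with y_i f(x_i) - 1 < 0, i.e. the sum of the 0/1 loss terms."""
    return sum(1 for xi, yi in zip(X, y) if yi * decision(w, b, xi) - 1 < 0)

def soft_margin_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * sum_i xi_i for a fixed (w, b)."""
    return 0.5 * sum(wj * wj for wj in w) + C * sum(slacks(w, b, X, y))

# Hypothetical data: the last point is a "noisy" +1 sample lying among the -1 class.
X = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0), (0.5, 0.0)]
y = [+1, +1, -1, -1, +1]
w, b = (1.0, 1.0), -2.0

print(slacks(w, b, X, y))
print(zero_one_violations(w, b, X, y))
for C in (0.1, 10.0):
    print(C, soft_margin_objective(w, b, X, y, C))
```

Only the noisy point needs a positive slack, and it adds $C \cdot \xi_i$ to the objective. With a large $C$ the violation dominates the objective, so a solver would trade margin width for fitting that point. With a small $C$ it is cheap to leave the point misclassified.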

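2.1.3 The kernel method

Many real-world problems are not linearly separable. For such data, the kernel method is commonly used to map samples from the original space into a higher-dimensional feature space (as shown in the figure below), in which the samples become linearly separable.

Let $\phi(x)$ denote the feature vector obtained by mapping $x$. The model corresponding to a separating hyperplane in the feature space can then be written as

$$f(x) = w^\top \phi(x) + b.$$

The optimization problem therefore becomes

$$\min_{w, b, \xi} \ \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{N} \xi_i \quad \text{s.t.} \quad y_i \big(w^\top \phi(x_i) + b\big) \ge 1 - \xi_i, \ \xi_i \ge 0, \ i = 1, \dots, N.$$

Solving it requires the inner products $\phi(x_i)^\top \phi(x_j)$ of samples $x_i$ and $x_j$ after they are mapped into the feature space. Since the feature space may be very high-dimensional, computing these inner products directly is usually very hard. Instead, we look for a function

$$\kappa(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle = \phi(x_i)^\top \phi(x_j)$$

that can greatly simplify the computation. Such a function $\kappa(\cdot, \cdot)$ is called a kernel function. A kernel cannot be chosen arbitrarily: it must satisfy certain conditions, which we do not discuss here for reasons of space. Some commonly used kernels are:

  • Linear kernel: $\kappa(x_i, x_j) = x_i^\top x_j$;
  • Polynomial kernel: $\kappa(x_i, x_j) = (x_i^\top x_j)^d$, where $d \ge 1$ is the degree of the polynomial;
  • Gaussian kernel: $\kappa(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right)$, where $\sigma > 0$ is the bandwidth (width) of the Gaussian kernel;
  • Laplacian kernel: $\kappa(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert}{\sigma}\right)$, with $\sigma > 0$;
  • Sigmoid kernel: $\kappa(x_i, x_j) = \tanh(\beta x_i^\top x_j + \theta)$, where $\tanh$ is the hyperbolic tangent, $\beta > 0$ and $\theta < 0$.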
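The five kernels above can be written directly in Python. The default parameter values ($\sigma = 1$, $\beta = 1$, $\theta = -1$) and the two test points are arbitrary choices for illustration.

The sketch also checks the idea behind the kernel trick on the degree-2 polynomial kernel in $\mathbb{R}^2$. Its explicit feature map is $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. Evaluating $\kappa$ in the original 2-D space gives the same number as the inner product in the 3-D feature space.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def sq_dist(u, v):
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))

def linear_kernel(u, v):
    return dot(u, v)

def polynomial_kernel(u, v, d=2):
    return dot(u, v) ** d

def gaussian_kernel(u, v, sigma=1.0):
    return math.exp(-sq_dist(u, v) / (2 * sigma ** 2))

def laplacian_kernel(u, v, sigma=1.0):
    return math.exp(-math.sqrt(sq_dist(u, v)) / sigma)

def sigmoid_kernel(u, v, beta=1.0, theta=-1.0):
    return math.tanh(beta * dot(u, v) + theta)

def phi_poly2(x):
    """Explicit feature map of the degree-2 polynomial kernel on R^2: (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

u, v = (1.0, 2.0), (3.0, -1.0)
print(polynomial_kernel(u, v, d=2))        # kernel evaluated in the original 2-D space
print(dot(phi_poly2(u), phi_poly2(v)))     # same value via the 3-D feature space (up to rounding)
print(gaussian_kernel(u, v), laplacian_kernel(u, v), sigmoid_kernel(u, v))
```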

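2.1.4 Further remarks

The SVM problem can be turned into its dual problem and solved with the method of Lagrange multipliers. In the end, the SVM reduces to a quadratic programming problem that efficient algorithms such as SMO (Sequential Minimal Optimization) can solve. Since our aim is only a basic understanding of SVM theory, we do not go into the details of the solution procedure.

SVMs also have the following strengths and weaknesses, which deserve attention when using the method:

  • Strengths:
    • The optimal solution is determined by only a small number of support vectors. The computational complexity therefore depends on the number of support vectors rather than on the dimension of the sample space, which to some extent avoids the "curse of dimensionality".
    • Compared with other "black box" machine learning methods, SVMs rest on a more complete theoretical foundation, and their results are relatively robust.
  • Weaknesses:
    • SVM algorithms are hard to apply to very large training samples, and they are sensitive to missing data and to the choice of kernel.
    • The classic SVM handles only binary classification, whereas real applications often involve multi-class tasks. In practice, multi-class problems are handled by combining several binary SVMs, e.g. one-vs-one or one-vs-rest.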

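2.2 Decision trees

2.2.1 The basic model

A decision tree is a tree-structured method for classification and regression. As the name suggests, it has a tree structure (see the figure below):
- A tree consists of nodes and directed edges (branches).
- Nodes include the root node, internal nodes, and leaf nodes, which can be pictured as the "root" and the "leaves" of a tree.
- Directed edges can be thought of as the "branches".

Classification with a decision tree works as follows:
1. Start from the root node and test one feature of a sample (the splitting criterion).
2. Send the sample to a child node according to the test result.
3. Repeat this testing and assignment recursively until a leaf node is reached.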

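2.2.2 Feature selection

One of the key issues in decision tree learning is feature selection, i.e. which feature to test and split on at each step. We therefore need a criterion for selecting features. Intuitively, suppose a feature has better discriminating power. After splitting on it, the samples in each child node should then belong to the same class as far as possible, so the "purity" of the nodes is higher.

To see how features are selected, we introduce four concepts in turn:
1. information entropy;
2. conditional entropy;
3. information gain;
4. the information gain ratio.

Information entropy is a measure of the purity of a set of samples. Let $X$ be a random variable with $n$ discrete values and probability distribution

$$P(X = x_i) = p_i, \quad i = 1, 2, \dots, n.$$

Its entropy is defined as

$$H(X) = -\sum_{i=1}^{n} p_i \log p_i.$$

If $p_i = 0$, we define $0 \log 0 = 0$. The logarithm is usually taken to base 2 or base $e$. The larger the entropy, the greater the uncertainty of the random variable. One can show that

$$0 \le H(X) \le \log n.$$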
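The empirical entropy can be computed from relative frequencies. The Python sketch below does this; the three label lists are hypothetical. Values that never occur have $p_i = 0$ and simply do not appear in the sum, which matches the convention $0 \log 0 = 0$.

```python
import math
from collections import Counter

def entropy(labels, base=2):
    """Empirical entropy H = -sum_i p_i log p_i, with p_i estimated by relative frequencies.
    Values that never occur have p_i = 0 and do not appear in the sum (0 log 0 = 0)."""
    n = len(labels)
    return sum(-(c / n) * math.log(c / n, base) for c in Counter(labels).values())

pure = ["good"] * 5                    # one class only: no uncertainty
balanced = ["good", "bad"] * 4         # two equally likely classes
uniform4 = [1, 2, 3, 4]                # four equally likely values: the maximum, log2(4)
print(entropy(pure), entropy(balanced), entropy(uniform4))
```

In bits (base 2) the three values are 0, 1 and 2, consistent with the bound $0 \le H(X) \le \log n$.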

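For a pair of random variables $(X, Y)$, the joint probability distribution is

$$P(X = x_i, Y = y_j) = p_{ij}, \quad i = 1, \dots, n; \ j = 1, \dots, m.$$

The conditional entropy $H(Y \mid X)$ measures the uncertainty of $Y$ once $X$ is known. It is defined as the expectation, over $X$, of the entropy of the conditional distribution of $Y$ given $X$:

$$H(Y \mid X) = \sum_{i=1}^{n} p_i \, H(Y \mid X = x_i),$$

where $p_i = P(X = x_i)$. When the probabilities in the entropy and conditional entropy are estimated from data, the resulting quantities are called the empirical entropy and the empirical conditional entropy.

Information gain measures how much knowing feature $X$ reduces the uncertainty about class $Y$. Accordingly, the information gain $g(D, A)$ of feature $A$ on training set $D$ is the difference between two quantities:
- the empirical entropy $H(D)$ of $D$;
- the empirical conditional entropy $H(D \mid A)$ of $D$ given $A$.

That is,

$$g(D, A) = H(D) - H(D \mid A).$$

A feature with stronger discriminating power has a higher information gain. Feature selection by information gain therefore works as follows: for the training set $D$, compute the information gain of every feature and choose the feature with the largest gain.

However, information gain as a splitting criterion is biased toward features with many distinct values, which is unfair to the other features. The information gain ratio was introduced to correct this. The information gain ratio $g_R(D, A)$ of feature $A$ on training set $D$ is the ratio of its information gain $g(D, A)$ to the entropy $H_A(D)$ of $D$ with respect to the values of $A$:

$$g_R(D, A) = \frac{g(D, A)}{H_A(D)}, \qquad H_A(D) = -\sum_{i=1}^{k} \frac{|D_i|}{|D|} \log \frac{|D_i|}{|D|},$$

where $k$ is the number of distinct values of $A$ and $D_i$ is the subset of $D$ in which $A$ takes its $i$-th value.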
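The Python sketch below computes $g(D, A)$ and $g_R(D, A)$ as defined above, using base-2 logarithms. It compares two hypothetical features on a hypothetical four-row data set:
- `color`, a two-valued feature that separates the classes perfectly;
- `row_id`, which is unique for every row.

The comparison shows the bias that motivates the gain ratio.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(feature, labels):
    """H(D|A) = sum_i |D_i|/|D| * H(D_i), where D_i holds the samples with the i-th value of A."""
    groups = defaultdict(list)
    for a, y in zip(feature, labels):
        groups[a].append(y)
    n = len(labels)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def information_gain(feature, labels):
    """g(D, A) = H(D) - H(D|A)."""
    return entropy(labels) - conditional_entropy(feature, labels)

def gain_ratio(feature, labels):
    """g_R(D, A) = g(D, A) / H_A(D); H_A(D) is the entropy of A's own values."""
    h_a = entropy(feature)
    return information_gain(feature, labels) / h_a if h_a > 0 else 0.0

# Hypothetical data: two classes, one informative feature and one ID-like feature.
labels = ["good", "good", "bad", "bad"]
color = ["green", "green", "pale", "pale"]    # perfectly informative, 2 values
row_id = ["r1", "r2", "r3", "r4"]             # unique per row, 4 values

print(information_gain(color, labels), information_gain(row_id, labels))
print(gain_ratio(color, labels), gain_ratio(row_id, labels))
```

Both features reach the maximal gain of 1 bit, so information gain alone cannot tell them apart. The gain ratio halves the score of `row_id` (1 / 2 = 0.5) because its own value entropy is 2 bits.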

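2.2.3 Growing the tree

There are several algorithms for growing decision trees, such as the classic ID3 and C4.5. To illustrate the process, we describe one of them, ID3. Its core idea is to use information gain as the feature selection criterion at each node and to build the tree recursively. Specifically:

  1. Starting from the root node, compute the information gain of every candidate feature. Choose the feature with the largest gain as the node's feature, and create one child node for each value of that feature;
  2. Apply the same procedure recursively to each child node to build the tree;
  3. Stop when the information gain of every remaining feature is very small or no features are left.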
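The three steps above can be sketched in Python for categorical features. The tree representation and the `eps` threshold are my own choices for illustration, and so is the tiny watermelon-style data set. A node also stops when it is already pure, which standard ID3 does as well; such a node has zero gain anyway. A leaf stores the majority label. At prediction time, a feature value never seen in training falls back to the majority label of that node.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, feat):
    groups = defaultdict(list)
    for r, y in zip(rows, labels):
        groups[r[feat]].append(y)
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())

def id3(rows, labels, features, eps=1e-9):
    """Return a leaf label, or {"feature": f, "children": {value: subtree}, "default": majority}."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:          # pure node or no features left
        return majority
    best = max(features, key=lambda f: info_gain(rows, labels, f))   # step 1
    if info_gain(rows, labels, best) < eps:            # step 3: gain too small, stop
        return majority
    remaining = [f for f in features if f != best]
    children = {}
    for value in sorted({r[best] for r in rows}):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        children[value] = id3([rows[i] for i in idx],  # step 2: recurse on each child
                              [labels[i] for i in idx], remaining, eps)
    return {"feature": best, "children": children, "default": majority}

def predict(tree, row):
    """Walk from the root to a leaf, following the branch that matches each test."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], tree["default"])
    return tree

# Hypothetical data: "sound" is more informative than "color".
rows = [
    {"color": "green", "sound": "dull"},  {"color": "green", "sound": "crisp"},
    {"color": "pale", "sound": "dull"},   {"color": "pale", "sound": "crisp"},
    {"color": "green", "sound": "dull"},  {"color": "pale", "sound": "dull"},
    {"color": "pale", "sound": "dull"},
]
labels = ["good", "bad", "good", "bad", "good", "bad", "bad"]
tree = id3(rows, labels, ["color", "sound"])
print(tree)
print(predict(tree, {"color": "green", "sound": "dull"}))
```

The root splits on `sound` (gain of about 0.29 bits versus about 0.13 for `color`). The "dull" branch then splits on `color`.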

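C4.5 is similar to ID3; the difference is that C4.5 uses the information gain ratio as its feature selection criterion. There are many other tree-growing methods as well, such as CART.

2.2.4 Further remarks

A grown decision tree usually needs to be pruned. As the name suggests, pruning cuts some subtrees or leaf nodes off the grown tree. This simplifies the tree's structure and helps prevent overfitting. The pros and cons of decision trees are summarized below:

  • Advantages:
    • Easy to understand and implement, and the meaning of the features used by the tree is easy to interpret;
    • Insensitive to missing values and able to handle large data sets.
  • Disadvantages:
    • Data with a time ordering require a lot of preprocessing;
    • Correlations between attributes are ignored.

Many extensions build on the decision tree:
- random forests, which combine trees with bagging;
- gradient boosting decision trees (GBDT), which combine trees with boosting;
- extreme gradient boosting (XGBoost), which also combines trees with boosting.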

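2.3 Neural networks

2.3.1 The neuron

Neural networks are among the most popular machine learning algorithms today and can handle regression, classification, and many other problems. Their most basic building block is the neuron, whose structure is shown in the figure below.

A basic neuron consists of inputs, weights, a bias or threshold, an activation function, and an output.

Consider a neuron with several inputs and a single output. It receives $n$ input signals $x = (x_1, x_2, \dots, x_n)$, which are transmitted through weighted connections. The weighted total input is compared with the neuron's threshold and then passed through the activation function to produce the output.

Writing the weight of the $i$-th input as $w_i$, the weighted total input is $\sum_{i=1}^{n} w_i x_i$. With threshold $\theta$ and activation function $f$, the output of the neuron is

$$y = f\left(\sum_{i=1}^{n} w_i x_i - \theta\right).$$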

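The activation function $f$ is usually nonlinear. Commonly used activation functions include:

  • Sigmoid: $f(x) = \dfrac{1}{1 + e^{-x}}$;

  • tanh: $f(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$;

  • ReLU: $f(x) = \max(0, x)$.

ReLU is the most common choice in practice. It is simple, fast to compute, and converges quickly, and it helps alleviate problems such as vanishing gradients.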

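2.3.2 Feedforward neural networks

A neural network is generally made up of an input layer, one or more hidden layers, and an output layer.

  • Input layer: consists of neurons without activation functions;
  • Hidden and output layers: consist of neurons with activation functions; neurons in the same layer usually share the same activation function.

The most classic and most common architecture, the feedforward neural network, consists of exactly these input, hidden, and output layers. It has the following properties:
- It may contain several hidden layers.
- Each neuron is fully connected to every neuron in the next layer.
- There are no connections within a layer and none that skip layers.

The figure below shows this structure.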

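2.3.3 Classification with neural networks

Consider a multi-class problem with $K$ classes. We can define the network output as a $K$-dimensional column vector $\hat{y}$, whose entries may be continuous or discrete (e.g. a one-hot vector of 0s and 1s). The setup is as follows:
- The network has $L$ hidden layers.
- For simplicity, all hidden and output neurons use the same activation function $f$.
- The input layer receives the $n$-dimensional feature vector $x$.

The feedforward network can then be written recursively as

$$h^{(0)} = x, \qquad h^{(l)} = f\big(W^{(l)} h^{(l-1)} + b^{(l)}\big), \quad l = 1, \dots, L + 1, \qquad \hat{y} = h^{(L+1)},$$

where $W^{(l)}$ and $b^{(l)}$ are the weight matrix and bias vector of layer $l$, and $\hat{y}$ is the output of the network. Suppose the output $\hat{y}$ is continuous and we want a class label or discrete value. We can then use the softmax function to turn it into "probabilities":

$$p_k = \frac{\exp(\hat{y}_k)}{\sum_{j=1}^{K} \exp(\hat{y}_j)}, \quad k = 1, \dots, K.$$

We then either apply a threshold, or set the largest $p_k$ to 1 and the others to 0, to obtain the classification result.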
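Next is a Python sketch of the recursion $h^{(l)} = f(W^{(l)} h^{(l-1)} + b^{(l)})$, followed by softmax and an argmax. The weights are hypothetical and hand-picked, not trained, and the network has one hidden layer ($L = 1$) with three neurons, $n = 2$ and $K = 3$. ReLU is used in every layer, as in the "same $f$ everywhere" simplification in the text. Classes are reported as $1, \dots, K$ to match the outcome coding that c_ml_stata expects.

```python
import math

def relu(v):
    return [max(0.0, z) for z in v]

def dense(W, b, h):
    """W h + b, with W given as a list of rows."""
    return [sum(wij * hj for wij, hj in zip(row, h)) + bi for row, bi in zip(W, b)]

def forward(x, layers, f=relu):
    """h^(0) = x; h^(l) = f(W^(l) h^(l-1) + b^(l)) for every hidden and output layer."""
    h = list(x)
    for W, b in layers:
        h = f(dense(W, b, h))
    return h

def softmax(v):
    m = max(v)                          # subtracting the max avoids overflow, same result
    e = [math.exp(z - m) for z in v]
    s = sum(e)
    return [ez / s for ez in e]

# Hypothetical weights: 2 inputs -> 3 hidden neurons -> 3 outputs.
layers = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]], [0.0, 0.0, 0.0]),                  # W^(1), b^(1)
    ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0, 0.0]),     # W^(2), b^(2)
]
x = [2.0, 1.0]
y_hat = forward(x, layers)
p = softmax(y_hat)
predicted_class = p.index(max(p)) + 1   # classes coded 1..K
print(y_hat, [round(pk, 3) for pk in p], predicted_class)
```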

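2.3.4 Further remarks

The weights $W$ and biases $b$ of a feedforward network are unknown parameters, usually estimated with the error backpropagation (BP) algorithm. A few points deserve attention when using neural networks:

  1. The architecture (how many hidden layers, and how many neurons in each) must be set in advance. It strongly affects convergence and whether the model overfits;
  2. To ease computation and remove the effect of different units of measurement, input data are usually standardized (normalized);
  3. Because neural networks are very expressive, strategies such as early stopping, regularization, and dropout are often needed to avoid overfitting.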
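Point 2 in Python: the sketch below standardizes each feature as $z = (x - \bar{x})/s$ with the sample standard deviation. The numbers are hypothetical. They are meant only to mimic features on very different scales, like x1 (thousands) and x2 (tens) in Section 4.

```python
import statistics

def standardize(column):
    """z = (x - mean) / sd, using the sample standard deviation."""
    mu = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(v - mu) / sd for v in column]

# Hypothetical columns on very different scales.
x1 = [3291.0, 6165.0, 15906.0]
x2 = [12.0, 21.0, 41.0]
z1, z2 = standardize(x1), standardize(x2)
print([round(z, 3) for z in z1])
print([round(z, 3) for z in z2])
```

After the transformation both columns have mean 0 and standard deviation 1, so neither dominates the weighted sums inside the network purely because of its units.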


3. The command and its installation

Having made it through the somewhat tedious theory above, we finally arrive at the star of this post: c_ml_stata. Machine learning algorithms are complex, far more so than the introduction above suggests. Even so, the command integrates many of them in a concise, direct way and is very easy to call. This section covers the command itself, its installation, its syntax and options, and some caveats.

3.1 Overview

c_ml_stata, written by Giovanni Cerulli, implements machine learning classification algorithms in Stata 16. It relies on Python's scikit-learn library for model training, prediction, and related tasks. Its main features are:

  1. Many classification algorithms: the command provides classification trees, bagging and random forests, boosting, regularized multinomial, K-nearest neighbors, neural networks, naive Bayes, and support vector machines. This makes it easy to try and compare models.
  2. Cross-validation: the cross_validation option uses grid search with K-fold cross-validation to choose the optimal hyperparameters and tune the classification model.

3.2 Installation

c_ml_stata requires Stata 16.0 or later. Type ssc install c_ml_stata in the Stata command window to install it, or open the package page with:

. view net describe c_ml_stata  // package description

The page lists the complete program files and ancillary files, mainly:

  • example_c_ml_stata.do: the example do-file provided by the author.
  • c_ml_stata_data_example.dta: the author's example in-sample (training) data.
  • c_ml_stata_data_new_example.dta: the author's example out-of-sample (prediction) data.

To do everything from the command line, run the following two commands:

. net install c_ml_stata  // install the package
. net get     c_ml_stata  // download the ancillary files (do-file, .dta, etc.)
                          // into the current working directory

3.3 璇硶鍙婇€夐」

c_ml_stata 涓昏鍛戒护鐨勮娉曟牸寮忎负锛?/p>

c_ml_stata outcome [varlist], mlmodel(modeltype) out_sample(filename)
in_prediction(name) out_prediction(name) cross_validation(name)
seed(integer) [save_graph_cv(name)]

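The arguments are:

  • outcome: a numeric, discrete dependent variable (or label) indicating the class. If the dependent variable has $M$ classes, recode them to the values $1, 2, \dots, M$. For example, a binary variable taking the values 0 and 1 should be recoded to 1 and 2. Note: outcome must not contain missing values.
  • varlist: an optional list of numeric variables representing the independent variables (features). If a feature is categorical, first generate the corresponding numeric dummy variables. Note: varlist must not contain missing values.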
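The Python sketch below shows the logic of the two data requirements above:
- drop every row with a missing value in the outcome or the features;
- recode the outcome's distinct values to $1, \dots, M$.

Rows as dicts and None as the missing-value marker are hypothetical conventions chosen for the illustration. In Stata itself the same steps would typically be `drop if missing(y, x1, x2, x3, x4)` followed by `egen y_new = group(y)`. The `group()` function numbers the distinct values of y as 1, 2, ... in sorted order.

```python
def prepare(rows, outcome, features):
    """Drop rows with a missing value (None) in the outcome or any feature,
    then recode the outcome's distinct values to 1..M (smallest value -> 1)."""
    needed = [outcome] + list(features)
    kept = [r for r in rows if all(r.get(v) is not None for v in needed)]
    levels = sorted({r[outcome] for r in kept})
    code = {level: i + 1 for i, level in enumerate(levels)}
    clean = [{**r, outcome: code[r[outcome]]} for r in kept]   # copies; input rows untouched
    return clean, code

# Hypothetical rows: a 0/1 outcome, one feature, and two rows with missing values.
rows = [
    {"y": 0, "x1": 1.5},
    {"y": 1, "x1": 2.0},
    {"y": None, "x1": 3.0},
    {"y": 1, "x1": None},
]
clean, code = prepare(rows, "y", ["x1"])
print(code)     # maps 0 -> 1 and 1 -> 2
print(clean)
```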

Options:

  • mlmodel(modeltype): the machine learning classification algorithm (model) to use. The choices are:

    • tree: Classification tree
    • randomforest: Bagging and random forests
    • boost: Boosting (boosted trees)
    • regularizedmultinomial: Regularized multinomial
    • nearestneighbor: Nearest neighbor (K-nearest neighbors)
    • neuralnet: Neural network
    • naivebayes: Naive Bayes
    • svm: Support vector machine
  • out_sample(filename): a new, out-of-sample data set (test set) used for out-of-sample testing. It contains only the features (no dependent variable). filename is the name of the file holding this data set.

  • in_prediction(name): saves the fitted values for the in-sample data (training and validation sets); name is the file name.

  • out_prediction(name): saves the predictions for the out-of-sample data (test set); name is the file name. The out-of-sample data come from out_sample().

  • cross_validation(name): setting name to "CV" runs cross-validation, which is 10-fold by default.

  • seed(integer): the random-number seed (an integer).

  • [save_graph_cv(name)]: optional. Saves a graph of the model's classification accuracy on the training and validation sets during cross-validation, which helps choose the optimal hyperparameters and model.

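Stored results of c_ml_stata:

  • The predicted classes for the test set and the predicted probability of each class.
  • The optimal hyperparameters selected by cross-validation (which ones are stored depends on the model), the best training-set accuracy, and the best validation-set accuracy. These can be viewed with ereturn list: numeric results are stored as scalars, string results as macros.

3.4 Caveats

  • Running c_ml_stata requires Stata 16 and Python (version 2.7 or later). It also needs the Python library scikit-learn and the Stata Function Interface (SFI) module, which ships with Stata 16.
  • Neither outcome nor varlist may contain missing values. Check the data set for missing values, and drop them, before using the command.
  • It is advisable to install the latest version of the command and keep it up to date with: ssc install c_ml_stata, replace.
  • For more details, see help c_ml_stata.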

4. Hands-on in Stata

We use the data set provided with c_ml_stata, stored in c_ml_stata_data_example.dta. The author gives little information about what the data represent, so the example only demonstrates the workflow and has little substantive meaning.

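4.1 Describing the data

The data set has 74 observations, with 4 explanatory variables (named x1, x2, x3, x4) and 1 dependent variable (named y). The 4 explanatory variables are continuous and the dependent variable is categorical (discrete), so we describe them with summarize and tab, respectively. The output confirms that the input data have no missing values, as c_ml_stata requires. The variable to be classified (the outcome input), y, is already numerically coded: 1, 2, and 3 denote the three classes.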

. use "c_ml_stata_data_example.dta", clear
. tab y

          y |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         42       56.76       56.76
          2 |         22       29.73       86.49
          3 |         10       13.51      100.00
------------+-----------------------------------
      Total |         74      100.00

. sum x1-x4

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          x1 |         74    6165.257    2949.496       3291      15906
          x2 |         74     21.2973    5.785503         12         41
          x3 |         74    3019.459    777.1936       1760       4840
          x4 |         74    187.9324    22.26634        142        233
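The tab output already gives a useful yardstick for the accuracies reported later. A classifier that always predicts the most frequent class (y = 1) is right 42/74, or about 0.568, of the time in this sample.

The Python check below reproduces that arithmetic, with the counts copied from the table above. Cross-validation folds will not reproduce these full-sample shares exactly.

```python
# Class counts copied from the `tab y` output above.
freq = {1: 42, 2: 22, 3: 10}
total = sum(freq.values())
shares = [round(100 * c / total, 2) for c in freq.values()]   # reproduces the Percent column
majority = max(freq, key=freq.get)                             # most frequent class
baseline = freq[majority] / total                              # accuracy of always predicting it
print(total, shares, majority, round(baseline, 4))
```

For comparison, the validation accuracies reported below are 0.5679 for the SVM, essentially at this baseline, and 0.6375 for the decision tree.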

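4.2 Model training and results

c_ml_stata offers several supervised classification algorithms. We use x1, x2, x3, x4 as explanatory variables and y as the label to train the models. Below we take the support vector machine as an example and explain in detail how to call the command and read its output.

4.2.1 Support vector machine

Setting mlmodel() to svm classifies with a support vector machine. The files and options work as follows:
- The in-sample predictions are saved in in_pred_svm.dta.
- The out-of-sample data, which contain only the features x1, x2, x3, x4, are read from c_ml_stata_data_new_example.dta.
- The out-of-sample predictions are saved in out_pred_svm.dta.
- Specifying CV in cross_validation() runs cross-validation, and the results are saved automatically in CV.dta.
- If needed, the save_graph_cv() option plots and saves the cross-validation results.

The code and part of the output are as follows: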

. c_ml_stata y x1-x4, mlmodel(svm)   ///
out_sample("c_ml_stata_data_new_example") ///
in_prediction("in_pred_svm") ///
out_prediction("out_pred_svm") ///
cross_validation("CV") ///
seed(10) save_graph_cv("graph_cv_svm")

-------------------------------------------
CROSS-VALIDATION RESULTS TABLE
-------------------------------------------
The best score is:
0.5678571428571428
-------------------------------------------
The best parameters are:
{'C': 1, 'gamma': 0.1}
1
0.1
-------------------------------------------
The best estimator is:
SVC(C=1, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma=0.1, kernel='rbf',
max_iter=-1, probability=True, random_state=None, shrinking=True, tol=0.001,
verbose=False)
-------------------------------------------
The best index is:
0
-------------------------------------------

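Part of in_pred_svm.dta is shown in the figure below. Its variables are:
- index: the observation number, matching the observation numbers in the original data;
- label_in_pred: the in-sample predicted label;
- Prob_1, Prob_2, Prob_3: these may represent the probability that the prediction is not class 1, 2, and 3, respectively.

out_pred_svm.dta has the same structure as in_pred_svm.dta. Here label_out_pre is the out-of-sample predicted label; the SVM classifies the out-of-sample observations as class ….

For the SVM, as discussed in Section 2, the main hyperparameters are the regularization parameter $C$ and the kernel parameter (usually called gamma). ereturn list returns the selected hyperparameters:
- the optimal regularization parameter is $C = 1$;
- the kernel parameter is 0.1.

In cross-validation, the training-set accuracy is 1 and the test-set (i.e. validation-set) accuracy is 0.5679.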

. ereturn list

scalars:
e(OPT_C) = 1
e(OPT_GAMMA) = .1
e(TEST_ACCURACY) = .5678571428571428
e(TRAIN_ACCURACY) = 1
e(BEST_INDEX) = 94.5
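The output above reports kernel='rbf' and gamma=0.1. scikit-learn parameterizes its RBF kernel as $\exp(-\gamma \lVert x - x' \rVert^2)$. Comparing this with the Gaussian kernel in Section 2.1.3 gives $\gamma = 1/(2\sigma^2)$, so the selected $\gamma = 0.1$ corresponds to a bandwidth of $\sigma = \sqrt{5} \approx 2.236$.

The Python check below assumes that parameterization. The two test points are arbitrary.

```python
import math

def rbf_sklearn(a, b, gamma):
    """RBF kernel in scikit-learn's parameterization: exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def gaussian(a, b, sigma):
    """Gaussian kernel as written in Section 2.1.3: exp(-||a - b||^2 / (2 sigma^2))."""
    return math.exp(-sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / (2 * sigma ** 2))

gamma = 0.1                            # e(OPT_GAMMA) from the output above
sigma = math.sqrt(1 / (2 * gamma))     # gamma = 1 / (2 sigma^2)  =>  sigma = sqrt(1 / (2 gamma))
print(round(sigma, 4))

a, b = (1.0, 2.0), (3.0, 0.5)
print(rbf_sklearn(a, b, gamma), gaussian(a, b, sigma))   # the two forms agree
```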

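The detailed cross-validation results can be found in CV.dta. In addition, save_graph_cv("graph_cv_svm") plots the cross-validation results and saves them as graph_cv_svm.gph, shown below. As index changes (i.e. across hyperparameter combinations), the model's performance on both the training and validation sets stays the same. Note that this is very rare in practice.

4.2.2 Decision tree

Setting mlmodel() to tree classifies with a decision tree. The main hyperparameter of the tree is the number of leaf nodes (leaves). The cross-validation plot shows how performance changes as index increases (more leaves):
- Training-set performance keeps improving.
- Validation-set performance first rises and then falls, which signals overfitting.

This yields an optimal number of leaves, the one with the highest validation accuracy. ereturn list shows the selected hyperparameter: the optimal number of leaves is 3. In cross-validation, the training-set accuracy is 0.8108 and the test-set (i.e. validation-set) accuracy is 0.6375.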

. c_ml_stata y x1-x4, mlmodel(tree)    ///
out_sample("c_ml_stata_data_new_example") ///
in_prediction("in_pred_ctree") ///
out_prediction("out_pred_ctree") ///
cross_validation("CV") ///
seed(10) save_graph_cv("graph_cv_ctree")

. ereturn list

scalars:
e(OPT_LEAVES) = 3
e(TEST_ACCURACY) = .6375
e(TRAIN_ACCURACY) = .8108095884215288
e(BEST_INDEX) = 2
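The selection rule described above can be written in a few lines of Python: keep the number of leaves with the highest validation accuracy, not the one with the highest training accuracy.

The `results` structure and every accuracy number in it are made up for illustration. They are not the contents of CV.dta or the output above.

```python
def best_by_validation(results):
    """Pick the row with the highest validation accuracy.

    results: list of dicts with keys "leaves", "train_acc", "val_acc"
    (a hypothetical layout). Ties go to the first, simplest candidate,
    because max() keeps the first maximum it sees.
    """
    return max(results, key=lambda r: r["val_acc"])

# Made-up curve: training accuracy keeps rising with more leaves,
# validation accuracy peaks and then falls (overfitting).
results = [
    {"leaves": 2, "train_acc": 0.70, "val_acc": 0.58},
    {"leaves": 3, "train_acc": 0.78, "val_acc": 0.63},
    {"leaves": 4, "train_acc": 0.85, "val_acc": 0.61},
    {"leaves": 5, "train_acc": 0.92, "val_acc": 0.55},
]
best = best_by_validation(results)
most_complex_fit = max(results, key=lambda r: r["train_acc"])
print(best["leaves"], most_complex_fit["leaves"])
```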
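4.2.3 Neural network

Setting mlmodel() to neuralnet classifies with a neural network. Its main hyperparameters are the number of layers and the number of neurons.

The in-sample and out-of-sample predictions show that the network performs poorly on this data set. It "brute-forces" every observation into the same class, with an accuracy of only …. The cross-validation plot also shows that the network is quite sensitive to the choice of hyperparameters. Meanwhile, the training-set accuracy stays at … throughout, with no improvement.

4.3 Summary of results

c_ml_stata offers 8 classification models in total. We trained each of them on the bundled in-sample data. The best training and validation accuracies are: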

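Model                   | Best training accuracy | Best validation accuracy
------------------------+------------------------+-------------------------
Decision tree           | 0.8108                 | 0.6375
Random forest           | …                      | …
Boosted trees           | …                      | …
K-nearest neighbors     | …                      | …
Neural network          | …                      | …
Naive Bayes             | …                      | …
Support vector machine  | 1.0000                 | 0.5679

Note: the regularized multinomial model failed to converge and is excluded from the comparison.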

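5. Conclusion

To wrap up, this post has covered three things:
- what classification problems are;
- how machine learning classification algorithms work;
- how to use c_ml_stata, a command that combines Python and Stata.

The growing variety of machine learning algorithms and their wide range of applications are plain to see. Still, methods should never become an end in themselves, and the economic meaning behind the results always deserves careful thought.

6. References

  • 周志华 (Zhou Zhihua), 机器学习 (Machine Learning), Tsinghua University Press, 2016.
  • 李航 (Li Hang), 统计学习方法 (Statistical Learning Methods), Tsinghua University Press, 2012.
  • Related Lianxh posts:
    • Stata: 交叉验证简介 (An introduction to cross-validation)
    • 人工神经网络与Stata应用 (Artificial neural networks and their applications in Stata)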
