码农手记 (Coder's Notes) | Spark Performance Tuning
Posted by 极链科技
Column: 码农手记 (Coder's Notes)
Author: 吴宏伟, senior back-end developer, 极链科技
Editor: 六六
Keywords: Spark, development tuning, resource parameter tuning
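1. When you submit a Spark job with spark-submit, the job starts a corresponding Driver process.
2. Depending on the deploy mode you use (deploy-mode: client/cluster), the Driver process may start on the local machine or on one of the worker nodes in the cluster.
3. The Driver process itself occupies a certain amount of memory and a certain number of CPU cores, according to the parameters we set.
4. The first thing the Driver process does is ask the cluster manager (a Spark Standalone cluster or YARN) for the resources the Spark job needs. Here "resources" means Executor processes: a number of Executor processes are started on the worker nodes, and each Executor occupies a certain amount of memory and a certain number of CPU cores.
5. Once the resources needed for the job have been granted, the Driver process starts scheduling and executing the job code we wrote.
6. The Driver process splits our Spark job code into multiple stages, each of which executes part of the code. It creates a batch of tasks for each stage and assigns those tasks to the Executor processes for execution.
7. A task is the smallest unit of computation. Every task in a stage runs exactly the same logic (a fragment of the code we wrote); only the data each task processes differs. When all tasks of a stage have finished, the intermediate results are written to local disk files on each node, and the Driver then schedules the next stage. The input of the next stage's tasks is the intermediate output of the previous stage.
8. Spark divides a job into stages at shuffle operators: the code that runs before a shuffle operator is grouped into one stage.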
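The stage-splitting rule in step 8 can be illustrated with a small plain-Python sketch. It is not Spark code: the operator chain is just a list of names, the set of shuffle operators is an assumption made for this example, and real Spark builds stages from a DAG of RDD dependencies rather than a linear list.

```python
# Operators treated as shuffle operators in this sketch (an assumption for
# illustration; the real set in Spark is larger).
SHUFFLE_OPS = {"reduceByKey", "groupByKey", "join", "distinct", "repartition"}


def split_into_stages(operators):
    """Cut a linear chain of operator names into stages at each shuffle operator."""
    stages, current = [], []
    for op in operators:
        # Everything collected before a shuffle operator forms one stage.
        if op in SHUFFLE_OPS and current:
            stages.append(current)
            current = []
        current.append(op)
    if current:
        stages.append(current)
    return stages


job = ["textFile", "flatMap", "map", "reduceByKey", "map", "join", "filter"]
stages = split_into_stages(job)
for number, stage in enumerate(stages):
    print(f"stage {number}: {' -> '.join(stage)}")
```

Two shuffle operators in the chain give three stages, which is why removing a shuffle also removes a round of disk writes and network transfer between stages.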
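The rest of this article looks at Spark tuning from two angles: development tuning and resource parameter tuning.
Development tuning mainly covers RDD lineage design, sensible use of operators, and optimization of special operations. Keep these principles in mind throughout development, and apply them flexibly to your own Spark jobs according to the specific business and the actual application scenario.
One point to watch during development: create only one RDD for a given piece of data. Do not create several RDDs that all represent the same data.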
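Where possible, avoid shuffle operators. The shuffle is the most expensive part of a running Spark job. Put simply, a shuffle pulls records that share a key but are spread across several nodes in the cluster onto a single node, so that they can be aggregated, joined, and so on. Operators such as reduceByKey and join all trigger a shuffle.
During a shuffle, each node first writes its records for a given key to local disk files, and the other nodes then pull those records from each node's disk files over the network. When all records for the same key are pulled to one node for aggregation, that node may be handling too many keys to hold them in memory, so data spills to disk files. A shuffle can therefore involve a large amount of disk I/O as well as network transfer, and disk I/O and network transfer are the main reasons shuffle performance is poor.
So during development, avoid reduceByKey, join, distinct, repartition, and other operators that shuffle whenever you can, and prefer map-style operators that do not. A Spark job with no shuffle, or with only a few shuffles, has much lower overhead.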
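4 Use high-performance operators
· Use reduceByKey/aggregateByKey instead of groupByKey
groupByKey performs no local aggregation at all, so every record is transferred between cluster nodes. With reduceByKey, records that share a key are pre-aggregated locally on each node first, and only then sent to other nodes for global aggregation.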
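The difference can be made concrete by counting how many records would cross the network in each case. The following is a plain-Python simulation, not Spark code: each inner list stands for one node's partition, and the function names are made up for this sketch.

```python
from collections import defaultdict


def shuffle_group_by_key(partitions):
    """groupByKey style: every (key, value) record is shipped unchanged."""
    shuffled = [record for part in partitions for record in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    totals = {key: sum(values) for key, values in grouped.items()}
    return totals, len(shuffled)


def shuffle_reduce_by_key(partitions):
    """reduceByKey style: pre-aggregate inside each partition, then merge."""
    shuffled = []
    for part in partitions:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value          # local (map-side) aggregation
        shuffled.extend(local.items())   # one record per key per partition
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value             # global aggregation after the shuffle
    return dict(totals), len(shuffled)


partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("b", 1), ("a", 1)],
]
grouped_totals, grouped_records = shuffle_group_by_key(partitions)
reduced_totals, reduced_records = shuffle_reduce_by_key(partitions)
print(grouped_totals, grouped_records)
print(reduced_totals, reduced_records)
```

Both approaches produce the same totals, but the pre-aggregated version ships one record per key per partition instead of every input record. The saving grows with the number of repeated keys in each partition.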
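· Use mapPartitions instead of plain map
With mapPartitions-style operators, one function call processes all the data in a partition rather than a single record, so performance is somewhat higher. Sometimes, however, mapPartitions causes OOM (out of memory) problems. A single function call has to process an entire partition, and if memory is insufficient, garbage collection cannot reclaim enough objects, so an OOM exception is quite likely. Use this kind of operation with care!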
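A plain-Python sketch of the trade-off follows. `expensive_setup` is a hypothetical stand-in for any per-call cost (building a parser, loading a lookup table); neither function is a Spark API. The per-partition version is written as a generator that yields records one by one, which is one common way to keep memory bounded, because the partition is never materialized as a whole.

```python
def expensive_setup():
    """Hypothetical costly initialization; counts how often it runs."""
    expensive_setup.calls += 1
    return lambda x: x * 2


expensive_setup.calls = 0


def map_style(partition):
    # map: the function body, including the setup, runs once per record.
    return [expensive_setup()(x) for x in partition]


def map_partitions_style(partition):
    # mapPartitions: setup runs once per partition, records are then streamed.
    transform = expensive_setup()
    for x in partition:
        yield transform(x)


partition = range(1000)

expensive_setup.calls = 0
per_record = map_style(partition)
calls_per_record = expensive_setup.calls

expensive_setup.calls = 0
per_partition = list(map_partitions_style(partition))
calls_per_partition = expensive_setup.calls

print(calls_per_record, calls_per_partition)
```

If the per-partition function instead collected all of its results into a list before returning, the whole partition's output would sit in memory at once, which is the situation the OOM warning above describes.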
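· Use foreachPartition instead of foreach
The principle is similar to "use mapPartitions instead of map": one function call processes all the data in a partition rather than a single record. In practice, foreachPartition-style operators help performance considerably. Suppose the foreach function writes every record of an RDD to MySQL. With the ordinary foreach operator the records are written one at a time, and each function call may create a database connection, so connections are created and destroyed constantly and performance is very poor. With foreachPartition, which processes a whole partition at once, only one database connection has to be created per partition, followed by a batch insert, and performance is much better. In practice, for writing around 10,000 records to MySQL, performance improved by more than 30%.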
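· Use coalesce after filter
When a filter operator removes a large share of an RDD's data, it is usually advisable to follow it with coalesce, manually reducing the number of partitions and compacting the RDD's data into fewer partitions. After the filter, every partition has lost much of its data. If the subsequent computation proceeds as usual, each task processes a partition that holds little data, which wastes some resources, and the more tasks there are, the slower the job may actually become. Once coalesce has reduced the partition count and compacted the data, fewer tasks are enough to process all the partitions. In some scenarios this gives a modest performance gain.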
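A simplified plain-Python model of the effect follows. The `coalesce` function below only imitates the idea of merging existing partitions without redistributing individual records. It is not Spark's implementation, and the filter condition and partition counts are invented for the example.

```python
def coalesce(partitions, num_partitions):
    """Merge neighbouring partitions into at most num_partitions groups."""
    if num_partitions >= len(partitions):
        return [list(part) for part in partitions]
    merged = [[] for _ in range(num_partitions)]
    for index, part in enumerate(partitions):
        # Map old partition index to a new one, keeping neighbours together.
        merged[index * num_partitions // len(partitions)].extend(part)
    return merged


# Eight partitions of 100 records each.
partitions = [list(range(i * 100, (i + 1) * 100)) for i in range(8)]

# A selective filter keeps one record in ten, leaving eight sparse partitions.
filtered = [[x for x in part if x % 10 == 0] for part in partitions]

# Compact the survivors into two partitions, so two tasks instead of eight.
compacted = coalesce(filtered, 2)

print([len(part) for part in filtered])
print([len(part) for part in compacted])
```

After the filter each of the eight partitions holds only ten records, so eight tasks would each do very little work. After compaction two tasks handle forty records each.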
· Use repartitionAndSortWithinPartitions instead of repartition followed by a sort
repartitionAndSortWithinPartitions is an operator recommended by the official Spark documentation: if you need to sort after repartitioning, use repartitionAndSortWithinPartitions directly. The operator sorts while it performs the repartitioning shuffle, and doing the shuffle and the sort together is likely to be faster than shuffling first and sorting afterwards.
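To show what the operator returns, here is a plain-Python model of its output contract: records are routed to a partition by key, and each partition comes back sorted by key. The function is a stand-in written for this example and does not reproduce Spark's performance benefit, which comes from pushing the sort into the shuffle machinery.

```python
def repartition_and_sort_within_partitions(records, num_partitions,
                                           partition_func=hash):
    """Route (key, value) records to partitions, then sort each partition by key."""
    buckets = [[] for _ in range(num_partitions)]
    for key, value in records:
        buckets[partition_func(key) % num_partitions].append((key, value))
    return [sorted(bucket, key=lambda kv: kv[0]) for bucket in buckets]


records = [(7, "g"), (2, "b"), (5, "e"), (4, "d"), (1, "a"), (8, "h")]
result = repartition_and_sort_within_partitions(records, 2)
for index, part in enumerate(result):
    print(index, part)
```

Note that the result is sorted only within each partition. There is no global ordering across partitions.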
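For num-executors, the number of Executor processes for a Spark job is generally best chosen according to the size of the cluster. Setting too few or too many Executor processes is bad either way: with too few, the job cannot make full use of the cluster's resources; with too many, most queues may not be able to supply enough resources.
2 executor-memory
A setting of up to about 8G of memory per Executor process is reasonable. num-executors multiplied by executor-memory must not exceed the maximum memory of the queue, and a Spark cluster can cap the memory each executor may use. If you share the resource queue with other people on your team, the memory you request should preferably not exceed 1/3 to 1/2 of the queue's maximum total memory.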
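3 executor-cores
Sets the number of CPU cores for each Executor process, which determines how many task threads each Executor process can run in parallel.
Up to 4 cores per Executor is a reasonable setting. Start from the maximum CPU core limit of the resource queue, then use the number of Executors you have configured to decide how many CPU cores each Executor process can be given.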
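The arithmetic behind these three parameters can be checked before submitting a job. The helper below is a sketch: the queue limits are hypothetical numbers, `shared_fraction` encodes the "no more than 1/3 to 1/2 of the queue" advice for memory, and the only real names used are the spark-submit options `--num-executors`, `--executor-memory`, and `--executor-cores`.

```python
def plan_resources(num_executors, executor_memory_gb, executor_cores,
                   queue_memory_gb, queue_cores, shared_fraction=0.5):
    """Validate a resource request against queue limits and build CLI options."""
    total_memory_gb = num_executors * executor_memory_gb
    total_cores = num_executors * executor_cores
    problems = []
    if total_memory_gb > queue_memory_gb * shared_fraction:
        problems.append(
            f"memory {total_memory_gb}G exceeds {shared_fraction:.0%} "
            f"of the queue's {queue_memory_gb}G"
        )
    if total_cores > queue_cores:
        problems.append(f"{total_cores} cores exceed the queue's {queue_cores}")
    options = [
        "--num-executors", str(num_executors),
        "--executor-memory", f"{executor_memory_gb}G",
        "--executor-cores", str(executor_cores),
    ]
    return options, problems


# Hypothetical shared queue: 400G of memory and 200 cores.
ok_options, ok_problems = plan_resources(20, 8, 4, queue_memory_gb=400, queue_cores=200)
big_options, big_problems = plan_resources(40, 8, 4, queue_memory_gb=400, queue_cores=200)

print("spark-submit " + " ".join(ok_options), ok_problems)
print("spark-submit " + " ".join(big_options), big_problems)
```

The first request uses 160G and 80 cores and passes. The second asks for 320G, which is more than half of the shared queue, so the helper flags it.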
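driver-memory sets the memory of the Driver process. Driver memory usually does not need to be set, or a modest value is enough.
If you need to use the collect operator to pull all of an RDD's data to the Driver for processing, you must make sure the Driver has enough memory, otherwise it will run out of memory (OOM).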
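spark.default.parallelism sets the default number of tasks for each stage. If you do not set it, Spark sets the number of tasks from the number of underlying HDFS blocks, by default one task per HDFS block. Generally speaking, the number Spark picks by default is on the low side.
A value of up to about 3 times num-executors * executor-cores is reasonable. If there are too few tasks, some Executor processes may have no task to run at all, and their resources are simply wasted.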
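A quick calculation shows why a low task count wastes resources. The numbers below are hypothetical, the multiplier of 3 is just the upper end of the range mentioned above, and both helper functions are written for this example rather than taken from Spark.

```python
def suggested_parallelism(num_executors, executor_cores, factor=3):
    """Task count as a multiple of the total number of task slots."""
    return num_executors * executor_cores * factor


def idle_cores(num_tasks, num_executors, executor_cores):
    """Cores left with nothing to run when a stage has too few tasks."""
    return max(0, num_executors * executor_cores - num_tasks)


num_executors, executor_cores = 20, 4          # 80 task slots in total
hdfs_blocks = 30                               # default: one task per block

wasted = idle_cores(hdfs_blocks, num_executors, executor_cores)
parallelism = suggested_parallelism(num_executors, executor_cores)

print(f"{wasted} of {num_executors * executor_cores} cores idle with {hdfs_blocks} tasks")
print(f"--conf spark.default.parallelism={parallelism}")
```

With 30 tasks on 80 slots, 50 cores sit idle for the whole stage. Raising the task count to a small multiple of the slot count keeps every core busy and evens out tasks of different lengths.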
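spark.storage.memoryFraction sets the fraction of Executor memory that persisted RDD data may occupy. The default is 0.6. Depending on the persistence strategy you choose, when memory is insufficient the data may not be persisted, or it may be written to disk.
If the Spark job performs many RDD persistence operations, this value can be raised somewhat. If the job has many shuffle operations and few persistence operations, lowering it somewhat is more appropriate. If you find the job running slowly because of frequent GC (the job's GC time can be observed in the Spark web UI), tasks do not have enough memory for executing user code, and lowering this value is likewise recommended.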
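spark.shuffle.memoryFraction sets the fraction of Executor memory that a task may use for aggregation during a shuffle, after it has pulled the output of the previous stage's tasks. The default is 0.2. If a shuffle aggregation uses more memory than this 20% limit, the excess data spills to disk files, which degrades performance severely.
If the Spark job has few RDD persistence operations and many shuffle operations, lower the memory fraction for persistence and raise the memory fraction for shuffle. If you find the job running slowly because of frequent GC, tasks do not have enough memory for executing user code, and lowering this value is likewise recommended.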
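How the two fractions carve up an Executor's memory can be worked out by hand. The sketch below assumes the fractions discussed above are Spark's legacy settings spark.storage.memoryFraction and spark.shuffle.memoryFraction, which apply only to the older static memory model. It also ignores the additional safety margins Spark applies internally, so the figures are rough upper bounds.

```python
def memory_split(executor_memory_gb, storage_fraction=0.6, shuffle_fraction=0.2):
    """Approximate memory available for persistence, shuffle, and user code."""
    if storage_fraction + shuffle_fraction > 1.0:
        raise ValueError("storage and shuffle fractions must sum to at most 1.0")
    storage_gb = executor_memory_gb * storage_fraction
    shuffle_gb = executor_memory_gb * shuffle_fraction
    return {
        "storage_gb": round(storage_gb, 2),
        "shuffle_gb": round(shuffle_gb, 2),
        "user_code_gb": round(executor_memory_gb - storage_gb - shuffle_gb, 2),
    }


# Defaults on an 8G executor.
default_split = memory_split(8)

# Shuffle-heavy job with little persistence: shift memory towards shuffle.
shuffle_heavy = memory_split(8, storage_fraction=0.4, shuffle_fraction=0.4)

# Frequent GC: lower both fractions so user code gets more room.
gc_relief = memory_split(8, storage_fraction=0.4, shuffle_fraction=0.2)

print(default_split)
print(shuffle_heavy)
print(gc_relief)
```

The last case shows why lowering either fraction helps with GC pressure: whatever is not reserved for persistence or shuffle is what remains for the objects created by user code.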