MapReduce Shuffle Explained


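MapReduce is Hadoop's native compute framework. In brief: its processes are heavyweight and slow to start, but it can handle very large data volumes, and it is slow compared with Spark's micro-batch processing or Flink's real-time processing. Shuffle is something anyone who writes MR jobs has to master. It is not especially hard, but it is not trivial either.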

A MapReduce program has five stages:

  • input
  • map
  • shuffle
  • reduce
  • output

Of these, shuffle is the stage to focus on, for a simple reason: it matters a great deal.

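1. What the shuffle does:

1. Partitioning:

  • Decides which Reducer handles the current key; identical keys always go to the same Reducer
  • By default, the partition is the key's hash value modulo the number of reduce tasks (source shown below)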
    public int getPartition(K2 key, V2 value, int numReduceTasks) {
          return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
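As a minimal sketch of the default partitioning rule, the snippet below applies the same arithmetic to a few sample keys. The class name `HashPartitionSketch` and the method `partitionFor` are made up for illustration, and plain Java `String` and `Integer` keys stand in for Hadoop's Writable key types, whose hash codes differ.

```java
class HashPartitionSketch {
    // Same arithmetic as the partitioner shown above:
    // clear the sign bit of the hash, then take the remainder.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 3;
        // "a" appears twice on purpose: equal keys always land in the same partition.
        Object[] keys = {"a", "b", "a", Integer.valueOf(-7)};
        for (Object key : keys) {
            System.out.println(key + " -> partition " + partitionFor(key, numReduceTasks));
        }
    }
}
```

Masking with `Integer.MAX_VALUE` clears the sign bit, so a key with a negative hash code (here the `Integer` -7) still maps to a non-negative partition index.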

     

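2. Grouping:

  • Merges the values that share the same key
  • Records with equal keys are placed in the same group
  • In a MapReduce job, the map method is called once per input line (with the default TextInputFormat), and the reduce method is called once per distinct key

3. Sorting: keys are sorted in lexicographic order (for Text keys; other key types sort by their own comparator)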
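The grouping and sorting steps can be imitated in plain Java. This is only a sketch under simplifying assumptions: `SortAndGroupSketch` and `sortAndGroup` are hypothetical names, everything happens in memory for a single partition, and a `TreeMap` stands in for the framework's sort.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SortAndGroupSketch {
    // Collects the values of equal keys into one list.
    // TreeMap iterates keys in their natural order (lexicographic for String).
    static Map<String, List<Integer>> sortAndGroup(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
            Map.entry("hive", 1), Map.entry("hadoop", 1),
            Map.entry("hive", 1), Map.entry("spark", 1));
        // Each printed line corresponds to one reduce call: a key plus all of its values.
        sortAndGroup(mapOutput).forEach((key, values) -> System.out.println(key + " -> " + values));
    }
}
```

Each entry of the returned map corresponds to the input of one reduce call: a single key together with every value emitted for it.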

 

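2. The shuffle process in detail:

1. Map-side shuffle:
  1. Spill:
    1. The output of every map call goes into a circular buffer (in memory, 100 MB by default). The circular buffer is worth studying separately and is not covered in detail here.
    2. Partition: each key-value pair is assigned a partition and tagged with it
    3. Sort: records are sorted within each partition
    4. When the circular buffer is 80% full, the partitioned and sorted data is written to disk as a file; this eventually produces multiple small files
  2. Merge:
    1. The small files produced by the spills are merged
    2. Data belonging to the same partition is sorted
  3. (Map task finishes) The ApplicationMaster is notified, and the Reducers actively pull the data, which begins the reduce-side shuffle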
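The spill steps above can be sketched as a toy in-memory model. It is not how Hadoop implements the buffer: `SpillSketch`, `Rec`, `collect`, and `spill` are hypothetical names, capacity is counted in records rather than bytes, each "spill file" is just a list, and the spill runs synchronously instead of in a background thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SpillSketch {
    // One buffered map output record, tagged with its partition.
    static final class Rec {
        final int partition;
        final String key;
        final int value;

        Rec(int partition, String key, int value) {
            this.partition = partition;
            this.key = key;
            this.value = value;
        }
    }

    private final int spillThreshold;   // counted in records here, not bytes
    private final int numReduceTasks;
    private final List<Rec> buffer = new ArrayList<>();
    final List<List<Rec>> spills = new ArrayList<>();   // each inner list models one spill file

    SpillSketch(int capacity, int numReduceTasks) {
        this.spillThreshold = Math.max(1, capacity * 80 / 100);   // spill at 80% full
        this.numReduceTasks = numReduceTasks;
    }

    // Tag the record with its partition, buffer it, and spill once the threshold is reached.
    void collect(String key, int value) {
        int partition = (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        buffer.add(new Rec(partition, key, value));
        if (buffer.size() >= spillThreshold) {
            spill();
        }
    }

    // Sort by partition, then by key, and store the run as one "file".
    void spill() {
        if (buffer.isEmpty()) {
            return;
        }
        List<Rec> run = new ArrayList<>(buffer);
        run.sort(Comparator.comparingInt((Rec r) -> r.partition).thenComparing(r -> r.key));
        spills.add(run);
        buffer.clear();
    }

    public static void main(String[] args) {
        SpillSketch mapTask = new SpillSketch(5, 2);
        for (String word : new String[] {"d", "c", "b", "a", "e"}) {
            mapTask.collect(word, 1);
        }
        mapTask.spill();   // flush whatever is left when the map task ends
        System.out.println(mapTask.spills.size() + " spill files");
    }
}
```

With a capacity of 5 records the threshold is 4, so the first four records form one sorted spill and the remaining record is flushed as a second one. The map-side merge step would then combine these runs into a single partitioned, sorted file.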

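2. Reduce-side shuffle:

  1. Start multiple threads that pull this reducer's partition of the map output from each machine
  2. Merge:
    1. Merge this partition's data from the output of every map task
    2. Sort all of the data belonging to this partition
  3. Group: merge the values that share the same key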
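Because every fetched map output is already sorted, the reduce-side merge can be pictured as a k-way merge of sorted runs. The sketch below illustrates that idea with a priority queue; `MergeSketch` and `merge` are hypothetical names, keys are plain strings, and the real framework merges on-disk and in-memory segments rather than Java lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class MergeSketch {
    // Merges runs that are each already sorted into one sorted list (k-way merge).
    static List<String> merge(List<List<String>> sortedRuns) {
        // Each heap entry is {runIndex, positionInRun}, ordered by the key it points at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> sortedRuns.get(a[0]).get(a[1]).compareTo(sortedRuns.get(b[0]).get(b[1])));
        for (int i = 0; i < sortedRuns.size(); i++) {
            if (!sortedRuns.get(i).isEmpty()) {
                heap.add(new int[] {i, 0});
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] head = heap.poll();
            List<String> run = sortedRuns.get(head[0]);
            merged.add(run.get(head[1]));
            // Advance within the run that just supplied the smallest key.
            if (head[1] + 1 < run.size()) {
                heap.add(new int[] {head[0], head[1] + 1});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Three runs, as if fetched from three different map tasks.
        List<List<String>> fetched = List.of(
            List.of("a", "c", "e"), List.of("b", "c"), List.of("d"));
        System.out.println(merge(fetched));
    }
}
```

After the merge, equal keys sit next to each other, which is what lets the grouping step hand each key and its values to a single reduce call.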

 

3. Optimizing the MapReduce shuffle:
  1. Combiner:
    1. Performs a merge ahead of time in the map stage; in general this amounts to running the reduce early, which lowers the load on the reducers
    2. Not every program is suited to a combiner
  2. Compress:
    1. Can greatly reduce disk and network I/O
  3. Setting up compression in Hadoop:
    1. hadoop checknative shows which compression codecs are supported natively
    2. Common compression formats: snappy, lzo, lz4
    3. To change the natively supported codecs: replace lib/native
  4. Where a MapReduce program can apply compression:
    1. Input
    2. Intermediate map output (both properties must be set together)
      1. mapreduce.map.output.compress
      2. mapreduce.map.output.compress.codec (the default is DefaultCodec)
    3. Reduce output
      1. mapreduce.output.fileoutputformat.compress
      2. mapreduce.output.fileoutputformat.compress.codec
  5. How to enable compression:
    1. In the cluster configuration files
    2. On the conf object, which applies to the current program only
    3. As a runtime parameter: -Dmapreduce.output.fileoutputformat.compress=true ….

 
