Big Data: Hadoop Streaming Programming in Practice with C++, PHP, and Python

Posted 2020-10-29


Preface: this article, compiled by the editors of 小常识网 (cha138.com), introduces Hadoop Streaming programming in C++, PHP, and Python.

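The Streaming framework lets programs written in any language run as MapReduce jobs in Hadoop, which makes it easy to port existing programs to the Hadoop platform and greatly extends Hadoop's reach. Streaming launches the mapper and the reducer as ordinary executables: each one reads input lines from standard input and writes key/value records to standard output, with the key and value separated by a tab by default. Below, we implement Hadoop WordCount in C++, PHP, and Python in turn.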
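To make that stdin/stdout contract concrete, here is a minimal sketch in Python that simulates it in a single process, assuming text input and the default tab separator. It is not Hadoop code: the names wc_mapper, wc_reducer, and run_streaming are hypothetical, and sorting on the key merely stands in for Hadoop's shuffle-and-sort phase.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def wc_mapper(lines: Iterable[str]) -> Iterator[str]:
    """Mapper: emit one 'word<TAB>1' record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield "%s\t%s" % (word, 1)


def wc_reducer(records: Iterable[str]) -> List[str]:
    """Reducer: sum the counts for each key and emit 'word<TAB>total'."""
    counts: Dict[str, int] = {}
    for record in records:
        key, value = record.split("\t", 1)
        counts[key] = counts.get(key, 0) + int(value)
    # dicts preserve insertion order (Python 3.7+), so sorted input
    # produces sorted output
    return ["%s\t%d" % (key, total) for key, total in counts.items()]


def run_streaming(
    lines: Iterable[str],
    mapper: Callable[[Iterable[str]], Iterable[str]],
    reducer: Callable[[Iterable[str]], Iterable[str]],
) -> List[str]:
    """Simulate `cat input | mapper | sort | reducer` in one process."""
    mapped = list(mapper(lines))
    # Stand-in for Hadoop's shuffle-and-sort: order records by key,
    # i.e. by the text before the first tab.
    shuffled = sorted(mapped, key=lambda record: record.split("\t", 1)[0])
    return list(reducer(shuffled))


if __name__ == "__main__":
    for record in run_streaming(["hello hadoop", "hello streaming"], wc_mapper, wc_reducer):
        print(record)
```

Because the reducer only ever sees records that have passed through the sort, all records for one word arrive next to each other, which is the property every reducer in this article depends on.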

Exercise 1: WordCount in C++

Implementation:

1) The WordCount Mapper in C++, saved as mapper.cpp. The complete code:

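    #include <iostream>
    #include <string>
    using namespace std;

    int main() {
        string key;
        string value = "1";
        // cin >> key reads one whitespace-separated word at a time
        while (cin >> key) {
            // emit "word<TAB>1" for every word
            cout << key << "\t" << value << endl;
        }
        return 0;
    }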

2) The WordCount Reducer in C++, saved as reducer.cpp. The complete code:

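    #include <iostream>
    #include <map>
    #include <string>
    using namespace std;

    int main() {
        string key;
        string value;
        map<string, int> word2count;
        map<string, int>::iterator it;
        // each input line is "word<TAB>1"; >> skips the tab and the newline
        while (cin >> key >> value) {
            it = word2count.find(key);
            if (it != word2count.end()) {
                (it->second)++;  // word seen before: the mapper always emits 1
            } else {
                word2count.insert(make_pair(key, 1));  // first occurrence
            }
        }
        // print the totals only after all input has been read;
        // std::map iterates in sorted key order
        for (it = word2count.begin(); it != word2count.end(); ++it) {
            cout << it->first << "\t" << it->second << endl;
        }
        return 0;
    }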

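Steps to test and run the C++ WordCount

1) Install the C++ compiler

On Linux, if no C++ compiler is installed, install one from the online package repositories:

    yum -y install gcc-c++

2) Compile the C++ sources into executables

The C++ programs must be compiled into executables before they can run:

    g++ -o mapper mapper.cpp
    g++ -o reducer reducer.cpp

3) Local test

Before running the C++ WordCount on the cluster, test it locally on Linux and debug it until it works, so that it will also run correctly on the cluster. In this pipeline, sort stands in for Hadoop's shuffle-and-sort step, which delivers the mapper output to the reducer grouped by key:

    cat djt.txt | ./mapper | sort | ./reducer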

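4) Run on the cluster

Change to the Hadoop installation directory and submit the C++ WordCount job. The -file options ship the compiled mapper and reducer executables to the task nodes along with the job:

    hadoop jar /usr/java/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
        -Dmapred.reduce.tasks=2 \
        -mapper "./mapper" \
        -reducer "./reducer" \
        -file mapper \
        -file reducer \
        -input /dajiangtai/djt.txt \
        -output /dajiangtai/out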

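If the expected word counts appear in /dajiangtai/out (one part file per reduce task, viewable with hadoop fs -cat /dajiangtai/out/part-*), the C++ WordCount works.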
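The option -Dmapred.reduce.tasks=2 starts two reduce tasks, and every record for a given word must reach the same task, which is what lets each reducer produce a final total. The Python sketch below illustrates the idea under an explicit assumption: Hadoop's default HashPartitioner hashes the key with Java's hashCode(), whereas this sketch uses zlib.crc32 as a stand-in, so the real key-to-reducer assignment on a cluster will differ. The names partition and split_by_reducer are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def partition(key: str, num_reducers: int) -> int:
    """Pick a reduce task for a key. ASSUMPTION: crc32 stands in for Java's hashCode()."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def split_by_reducer(keys: Iterable[str], num_reducers: int) -> Dict[int, List[str]]:
    """Group distinct keys by the reduce task that would receive them."""
    buckets: Dict[int, List[str]] = defaultdict(list)
    for key in keys:
        buckets[partition(key, num_reducers)].append(key)
    # each reduce task sees its keys in sorted order
    return {task: sorted(set(ks)) for task, ks in buckets.items()}


if __name__ == "__main__":
    words = ["hello", "hadoop", "hello", "streaming", "php", "python"]
    for task, keys in sorted(split_by_reducer(words, 2).items()):
        # illustrative label: each reduce task writes its own part file
        print("reduce task %d -> %s" % (task, keys))
```

Whatever hash function is used, a given word always maps to exactly one task, so the two output part files hold disjoint sets of words.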

Exercise 2: WordCount in PHP

Implementation:

1) The WordCount Mapper in PHP, saved as wc_mapper.php. The complete code:

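    #!/usr/bin/php
    <?php
    error_reporting(E_ALL ^ E_NOTICE);
    // read the input line by line from standard input
    while (($line = fgets(STDIN)) !== false) {
        $line = trim($line);
        // split on non-word characters and drop empty pieces
        $words = preg_split('/\W/', $line, 0, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            // emit "word<TAB>1"
            echo $word, chr(9), "1", PHP_EOL;
        }
    }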
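This mapper tokenizes differently from the C++ and Python versions: preg_split('/\W/', ...) splits on any non-word character, while cin >> key and line.split() split only on whitespace. The Python sketch below contrasts the two rules, assuming ASCII input (Python's re module stands in for PHP's PCRE here, and the two \W classes can differ on non-ASCII text). The function names are hypothetical.

```python
import re
from typing import List


def tokenize_non_word(line: str) -> List[str]:
    """Like the PHP mapper: split on \\W and drop empty pieces."""
    return [w for w in re.split(r"\W", line.strip()) if w]


def tokenize_whitespace(line: str) -> List[str]:
    """Like the C++ and Python mappers: split on runs of whitespace."""
    return line.split()


if __name__ == "__main__":
    sample = "Hello, world! it's Hadoop"
    print(tokenize_non_word(sample))    # ['Hello', 'world', 'it', 's', 'Hadoop']
    print(tokenize_whitespace(sample))  # ['Hello,', 'world!', "it's", 'Hadoop']
```

As a result, the three WordCount programs can report different counts for the same djt.txt when it contains punctuation.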

2) The WordCount Reducer in PHP, saved as wc_reducer.php. The complete code:

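    #!/usr/bin/php
    <?php
    error_reporting(E_ALL ^ E_NOTICE);
    $word2count = array();
    // each input line is "word<TAB>count"
    while (($line = fgets(STDIN)) !== false) {
        $line = trim($line);
        if ($line === '') {
            continue;  // skip blank lines
        }
        list($word, $count) = explode(chr(9), $line);
        $count = intval($count);
        if (!isset($word2count[$word])) {
            $word2count[$word] = 0;  // first occurrence of this word
        }
        $word2count[$word] += $count;
    }
    // print each word with its total count
    foreach ($word2count as $word => $count) {
        echo $word, chr(9), $count, PHP_EOL;
    }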

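Steps to test and run the PHP WordCount

1) Install PHP

On Linux, if PHP is not installed, install it from the online package repositories:

    yum -y install php

2) Local test

Before running the PHP WordCount on the cluster, test it locally on Linux and debug it until it works, so that it will also run correctly on the cluster. The test command is:

    cat djt.txt | php wc_mapper.php | sort | php wc_reducer.php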

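3) Run on the cluster

PHP must be installed on every node that runs tasks, because the mapper and reducer are launched there with the php command. This job also writes to /dajiangtai/out, the same directory as the C++ job, and Hadoop refuses to start a job whose output directory already exists, so delete it first (for example with hadoop fs -rm -r /dajiangtai/out). Then change to the Hadoop installation directory and submit the PHP WordCount job:

    hadoop jar /usr/java/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
        -Dmapred.reduce.tasks=2 \
        -mapper "php wc_mapper.php" \
        -reducer "php wc_reducer.php" \
        -file wc_mapper.php \
        -file wc_reducer.php \
        -input /dajiangtai/djt.txt \
        -output /dajiangtai/out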

If the expected word counts appear in the output, the PHP WordCount works.

Exercise 3: WordCount in Python

Implementation:

1) The WordCount Mapper in Python, saved as Mapper.py. The complete code:

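    #!/usr/bin/env python
    import sys

    # read the input line by line from standard input
    for line in sys.stdin:
        line = line.strip()
        # split() with no argument splits on runs of whitespace
        # and never returns empty strings
        words = line.split()
        for word in words:
            # emit "word<TAB>1"; print(...) works in both Python 2 and 3
            print('%s\t%s' % (word, 1))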

2) The WordCount Reducer in Python, saved as Reducer.py. The complete code:

    #!/usr/bin/env python
    from operator import itemgetter
    import sys

    word2count = {}
    for line in sys.stdin:
        line = line.strip()
        try:
            # each input line is "word<TAB>count"
            word, count = line.split('\t', 1)
            count = int(count)
            word2count[word] = word2count.get(word, 0) + count
        except ValueError:
            # skip malformed lines (no tab, or a non-integer count)
            pass

    # output the words in alphabetical order
    sorted_word2count = sorted(word2count.items(), key=itemgetter(0))
    for word, count in sorted_word2count:
        print('%s\t%s' % (word, count))
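This reducer keeps every distinct word in a dictionary until the input ends. Because Hadoop hands each reducer its input already sorted by key, a reducer can instead total one run of identical keys at a time, so memory no longer grows with the number of distinct words. The sketch below is such a variant using itertools.groupby; it is not the article's version, and the names parse and reduce_sorted are hypothetical.

```python
from itertools import groupby
from typing import Iterable, Iterator, Tuple


def parse(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Turn 'word<TAB>count' lines into (word, count) pairs, skipping bad lines."""
    for line in lines:
        parts = line.rstrip("\n").split("\t", 1)
        if len(parts) != 2:
            continue  # no tab separator
        try:
            count = int(parts[1])
        except ValueError:
            continue  # non-integer count
        yield parts[0], count


def reduce_sorted(lines: Iterable[str]) -> Iterator[str]:
    """Sum counts per word; input must already be grouped (sorted) by word."""
    # groupby only merges *adjacent* equal keys, hence the sorted-input requirement
    for word, group in groupby(parse(lines), key=lambda pair: pair[0]):
        yield "%s\t%d" % (word, sum(count for _, count in group))


if __name__ == "__main__":
    sorted_input = ["hadoop\t1\n", "hello\t1\n", "hello\t1\n", "streaming\t1\n"]
    for record in reduce_sorted(sorted_input):
        print(record)
```

To use it as Reducer.py, iterate over reduce_sorted(sys.stdin) and print each record. The trade-off is the dependence on sorted input: fed unsorted lines, it emits a separate partial total for each run of a word, which is why the local test pipes the mapper output through sort.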

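Steps to test and run the Python WordCount

1) Install Python

On Linux, if Python is not installed, install it from the online package repositories:

    yum -y install python27

2) Local test

Before running the Python WordCount on the cluster, test it locally on Linux and debug it until it works, so that it will also run correctly on the cluster. The test command is:

    cat djt.txt | python Mapper.py | sort | python Reducer.py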

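3) Run on the cluster

Python must be installed on every node that runs tasks, because the mapper and reducer are launched there with the python command. If /dajiangtai/out still holds the output of an earlier job, delete it first (for example with hadoop fs -rm -r /dajiangtai/out), since Hadoop refuses to start a job whose output directory already exists. Then change to the Hadoop installation directory and submit the Python WordCount job:

    hadoop jar /usr/java/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
        -Dmapred.reduce.tasks=2 \
        -mapper "python Mapper.py" \
        -reducer "python Reducer.py" \
        -file Mapper.py \
        -file Reducer.py \
        -input /dajiangtai/djt.txt \
        -output /dajiangtai/out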

If the expected word counts appear in the output, the Python WordCount works.
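The three job submissions differ only in the mapper, the reducer, and the files shipped with -file. The sketch below is a hypothetical Python helper (build_streaming_cmd is not part of Hadoop) that assembles the same argument list as the commands above, keeping the generic -D option ahead of the streaming options, as Hadoop Streaming requires.

```python
import shlex
from typing import List, Sequence

# jar path taken from the commands in this article
STREAMING_JAR = "/usr/java/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar"


def build_streaming_cmd(
    mapper: str,
    reducer: str,
    files: Sequence[str],
    input_path: str,
    output_path: str,
    reduce_tasks: int = 2,
    jar: str = STREAMING_JAR,
) -> List[str]:
    """Return the hadoop jar argument list; generic -D options come first."""
    cmd = ["hadoop", "jar", jar, "-Dmapred.reduce.tasks=%d" % reduce_tasks,
           "-mapper", mapper, "-reducer", reducer]
    for path in files:
        cmd += ["-file", path]  # ship each local file with the job
    cmd += ["-input", input_path, "-output", output_path]
    return cmd


if __name__ == "__main__":
    cmd = build_streaming_cmd("python Mapper.py", "python Reducer.py",
                              ["Mapper.py", "Reducer.py"],
                              "/dajiangtai/djt.txt", "/dajiangtai/out")
    # shlex.join quotes each argument so the line can be pasted into a shell
    print(shlex.join(cmd))
```

Passing the returned list to subprocess.run(cmd, check=True) would submit the job directly; shlex.join requires Python 3.8 or later.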
