Running the wordcount Word-Counting Program Bundled with Hadoop
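0. Preface
The previous post, "Hadoop First Experience: Quickly Setting Up a Hadoop Pseudo-Distributed Environment", set up a Hadoop environment. This post uses the wordcount program that ships with Hadoop to work through a word-counting example.
1. Counting words with the example program
(1) The wordcount program
The wordcount program lives under Hadoop's share directory: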
    # pwd
    /usr/local/hadoop/share/hadoop/mapreduce
    # ls
    hadoop-mapreduce-client-app-2.6.5.jar         hadoop-mapreduce-client-jobclient-2.6.5-tests.jar
    hadoop-mapreduce-client-common-2.6.5.jar      hadoop-mapreduce-client-shuffle-2.6.5.jar
    hadoop-mapreduce-client-core-2.6.5.jar        hadoop-mapreduce-examples-2.6.5.jar
    hadoop-mapreduce-client-hs-2.6.5.jar          lib
    hadoop-mapreduce-client-hs-plugins-2.6.5.jar  lib-examples
    hadoop-mapreduce-client-jobclient-2.6.5.jar   sources
The program we need is packaged in hadoop-mapreduce-examples-2.6.5.jar.
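On a different Hadoop release the version suffix in the jar name changes, so hard-coding 2.6.5 may not work. The following is a minimal sketch for locating the jar with a glob. It assumes the install directory is /usr/local/hadoop as in the listing above; the `find_examples_jar` helper and its use of a `HADOOP_HOME` variable are illustrative names, not something the original setup defines.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the MapReduce examples jar.
# Argument 1 (optional) is the Hadoop install directory; it falls back to
# $HADOOP_HOME and then to /usr/local/hadoop (the path used in this article).
find_examples_jar() {
    home="${1:-${HADOOP_HOME:-/usr/local/hadoop}}"
    # The glob avoids hard-coding the version number (2.6.5 here).
    for jar in "$home"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar; do
        if [ -f "$jar" ]; then
            printf '%s\n' "$jar"
            return 0
        fi
    done
    echo "examples jar not found under $home" >&2
    return 1
}
```

Running `hadoop jar "$(find_examples_jar)"` without a program name should print the list of valid example programs, wordcount among them.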
(2) Create the HDFS data directories
Create a directory to hold the input files of the MapReduce job:
    # hadoop fs -mkdir -p /data/wordcount
Create a directory to hold the output files of the MapReduce job:
    # hadoop fs -mkdir /output
List the two directories just created:
    # hadoop fs -ls /
    drwxr-xr-x   - root supergroup          0 2017-09-01 20:34 /data
    drwxr-xr-x   - root supergroup          0 2017-09-01 20:35 /output
(3) Create a word file and upload it to HDFS
The word file looks like this:
    # cat myword.txt
    leaf yyh
    yyh xpleaf
    katy ling
    yeyonghao leaf
    xpleaf katy
Upload the file to HDFS:
    # hadoop fs -put myword.txt /data/wordcount
Check the uploaded file and its contents in HDFS:
    # hadoop fs -ls /data/wordcount
    -rw-r--r--   1 root supergroup         57 2017-09-01 20:40 /data/wordcount/myword.txt
    # hadoop fs -cat /data/wordcount/myword.txt
    leaf yyh
    yyh xpleaf
    katy ling
    yeyonghao leaf
    xpleaf katy
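A quick way to confirm the upload is intact is to compare the size reported by `hadoop fs -ls` (57 bytes here) with the local file. The sketch below uses only standard tools; `file_stats` is an illustrative helper name. It also prints the line and word counts, which are handy later when reading the job counters: with the default text input format, "Map input records" is the number of lines and "Map output records" is the number of words.

```shell
#!/bin/sh
# Hypothetical helper: print size, line count and word count of a local file.
file_stats() {
    f="$1"
    # tr strips the padding that some wc implementations add.
    bytes=$(wc -c < "$f" | tr -d '[:space:]')
    lines=$(wc -l < "$f" | tr -d '[:space:]')
    words=$(wc -w < "$f" | tr -d '[:space:]')
    echo "bytes=$bytes lines=$lines words=$words"
}
```

Called as `file_stats myword.txt`, the `bytes` value should equal the size column in the HDFS listing.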
(4) Run the wordcount program
Run the following command:
    # hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount /data/wordcount /output/wordcount
    ...
    17/09/01 20:48:14 INFO mapreduce.Job: Job job_local1719603087_0001 completed successfully
    17/09/01 20:48:14 INFO mapreduce.Job: Counters: 38
        File System Counters
            FILE: Number of bytes read=585940
            FILE: Number of bytes written=1099502
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=114
            HDFS: Number of bytes written=48
            HDFS: Number of read operations=15
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=4
        Map-Reduce Framework
            Map input records=5
            Map output records=10
            Map output bytes=97
            Map output materialized bytes=78
            Input split bytes=112
            Combine input records=10
            Combine output records=6
            Reduce input groups=6
            Reduce shuffle bytes=78
            Reduce input records=6
            Reduce output records=6
            Spilled Records=12
            Shuffled Maps =1
            Failed Shuffles=0
            Merged Map outputs=1
            GC time elapsed (ms)=92
            CPU time spent (ms)=0
            Physical memory (bytes) snapshot=0
            Virtual memory (bytes) snapshot=0
            Total committed heap usage (bytes)=241049600
        Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
        File Input Format Counters
            Bytes Read=57
        File Output Format Counters
            Bytes Written=48
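One thing to watch when repeating this step: MapReduce refuses to write into an output directory that already exists, so running the same command a second time fails until /output/wordcount is removed. The following is a sketch of a rerun wrapper under that assumption. `run_wordcount`, `HADOOP_CMD` and `EXAMPLES_JAR` are illustrative names that do not appear in the original setup; setting `HADOOP_CMD=echo` gives a dry run that only prints the commands.

```shell
#!/bin/sh
# Illustrative settings: override them from the environment if needed.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"
EXAMPLES_JAR="${EXAMPLES_JAR:-/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar}"

# Hypothetical wrapper: remove a stale output directory, then run wordcount.
# $1 = HDFS input directory, $2 = HDFS output directory.
run_wordcount() {
    input_dir="$1"
    output_dir="$2"
    # -f keeps rm quiet when the directory does not exist yet.
    "$HADOOP_CMD" fs -rm -r -f "$output_dir" || return 1
    "$HADOOP_CMD" jar "$EXAMPLES_JAR" wordcount "$input_dir" "$output_dir"
}
```

With the paths from this article the call would be `run_wordcount /data/wordcount /output/wordcount`. Note that the wrapper deletes the previous results, so copy them elsewhere first if they are still needed.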
(5) View the results
The counts are written to the part-r-00000 file under the output directory:
    # hadoop fs -cat /output/wordcount/part-r-00000
    katy	2
    leaf	2
    ling	1
    xpleaf	2
    yeyonghao	1
    yyh	2
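For a file this small, the result can be cross-checked locally with standard tools. The pipeline below is a rough local equivalent of what wordcount computes, not the Hadoop implementation itself: it splits on whitespace, sorts in byte order, and prints each word with its count separated by a tab. `count_words` is an illustrative name.

```shell
#!/bin/sh
# Hypothetical local cross-check: read text on stdin, print "word<TAB>count".
count_words() {
    # One token per line, drop empty lines, sort in byte order,
    # count duplicates, then swap the columns that uniq -c produces.
    tr -s '[:space:]' '\n' \
        | sed '/^$/d' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ printf "%s\t%d\n", $2, $1 }'
}
```

Running `count_words < myword.txt` and comparing it with the output of `hadoop fs -cat /output/wordcount/part-r-00000` should show the same six lines. Those six lines add up to 48 bytes, which agrees with the "Bytes Written=48" counter in the job output.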
3. References
http://www.aboutyun.com/thread-7713-1-1.html
This article comes from the "香飘叶子" blog. Please retain this attribution: http://xpleaf.blog.51cto.com/9315560/1962271