[0004] Hadoop's "Hello World": running the MapReduce wordcount example

Posted by sunzebo

Purpose:

Get a first hands-on feel for Hadoop MapReduce.

Environment:

Hadoop 2.6.4

1 Prepare the input file

paper.txt can hold any English text; put whatever you like in it.
$ hadoop fs -mkdir /input
$ ls
Desktop  Documents  Downloads  examples.desktop  hadoop-2.6.4.tar.gz  Music  paper.txt  Pictures  Public  Templates  Videos
$ hadoop fs -put paper.txt /input
$ hadoop fs -ls /input
Found 1 items
-rw-r--r--   1 hadoop supergroup       1762 2016-10-23 00:45 /input/paper.txt
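The bundled example's mapper tokenizes each line with Java's `StringTokenizer`, which splits only on spaces, tabs, newlines, carriage returns and form feeds. Punctuation therefore stays attached to words and case is kept, so `Always` and `always` are counted separately in the output. The Python sketch below imitates that map step on a local copy of the file, assuming plain text read line by line (one record per line, as with the default text input); `split_records`, `map_line` and `map_stats` are hypothetical helper names, not Hadoop APIs.

```python
import re
from typing import Iterator, List, Tuple

# Default delimiter set of java.util.StringTokenizer: space, tab, \n, \r, \f.
_TOKEN_DELIMS = re.compile(r"[ \t\n\r\f]+")


def split_records(text: str) -> List[str]:
    """Split text into line records (LF, CR and CRLF all end a line)."""
    records = re.split(r"\r\n|\r|\n", text)
    if records and records[-1] == "":
        records.pop()  # a trailing line terminator does not start a new record
    return records


def map_line(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (token, 1) for every token in one line, like the WordCount mapper."""
    for token in _TOKEN_DELIMS.split(line):
        if token:  # re.split yields '' around leading/trailing delimiters
            yield token, 1


def map_stats(text: str) -> Tuple[int, int]:
    """Return (map input records, map output records) for a whole file's text."""
    records = split_records(text)
    return len(records), sum(1 for rec in records for _ in map_line(rec))


if __name__ == "__main__":
    print(list(map_line("Always dream, always")))
    # [('Always', 1), ('dream,', 1), ('always', 1)]
```

Running `map_stats(open("paper.txt", encoding="utf-8").read())` locally should give figures comparable to `Map input records` and `Map output records` in the job counters.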

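Note: there is no need to create the output directory /output in advance; the job creates it. In fact it must not exist yet: if it does, the job refuses to start and reports a FileAlreadyExistsException.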
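Because the job refuses to overwrite an existing output directory, rerunning it means removing /output first. A minimal Python sketch that builds those two commands and can optionally run them through `subprocess`; it assumes `hadoop` is on the PATH and reuses this post's jar path, and `wordcount_commands` and `run_all` are hypothetical names.

```python
import subprocess
from typing import List

# Examples jar path as used in this post (Hadoop 2.6.4 under /opt).
EXAMPLES_JAR = "/opt/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar"


def wordcount_commands(input_dir: str, output_dir: str,
                       overwrite: bool = False,
                       jar: str = EXAMPLES_JAR) -> List[List[str]]:
    """Build the hadoop commands for one wordcount run.

    With overwrite=True the previous output directory is deleted first,
    because the job refuses to start if output_dir already exists.
    """
    commands: List[List[str]] = []
    if overwrite:
        # -f: stay quiet if the path does not exist; -r: delete recursively
        commands.append(["hadoop", "fs", "-rm", "-f", "-r", output_dir])
    commands.append(["hadoop", "jar", jar, "wordcount", input_dir, output_dir])
    return commands


def run_all(commands: List[List[str]]) -> None:
    """Run each command in turn; stop at the first non-zero exit status."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in wordcount_commands("/input", "/output", overwrite=True):
        print(" ".join(cmd))  # dry run: print instead of executing
```

`run_all(wordcount_commands("/input", "/output", overwrite=True))` would actually execute them. When HDFS trash is enabled, `hadoop fs -rm` moves the directory to the trash rather than deleting it outright.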

2 Run the job

$ hadoop jar /opt/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount /input /output
16/10/23 00:51:09 INFO client.RMProxy: Connecting to ResourceManager at ssmaster/192.168.249.144:8032
16/10/23 00:51:11 INFO input.FileInputFormat: Total input paths to process : 1
16/10/23 00:51:12 INFO mapreduce.JobSubmitter: number of splits:1
16/10/23 00:51:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1477208120905_0001
16/10/23 00:51:14 INFO impl.YarnClientImpl: Submitted application application_1477208120905_0001
16/10/23 00:51:14 INFO mapreduce.Job: The url to track the job: http://ssmaster:8088/proxy/application_1477208120905_0001/
16/10/23 00:51:14 INFO mapreduce.Job: Running job: job_1477208120905_0001
16/10/23 00:51:38 INFO mapreduce.Job: Job job_1477208120905_0001 running in uber mode : false
16/10/23 00:51:38 INFO mapreduce.Job:  map 0% reduce 0%
16/10/23 00:52:17 INFO mapreduce.Job: map 100% reduce 0%
16/10/23 00:52:39 INFO mapreduce.Job: map 100% reduce 100%
16/10/23 00:52:41 INFO mapreduce.Job: Job job_1477208120905_0001 completed successfully
16/10/23 00:52:41 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=2061
        FILE: Number of bytes written=217797
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1863
        HDFS: Number of bytes written=1425
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=35792
        Total time spent by all reduces in occupied slots (ms)=18540
        Total time spent by all map tasks (ms)=35792
        Total time spent by all reduce tasks (ms)=18540
        Total vcore-milliseconds taken by all map tasks=35792
        Total vcore-milliseconds taken by all reduce tasks=18540
        Total megabyte-milliseconds taken by all map tasks=36651008
        Total megabyte-milliseconds taken by all reduce tasks=18984960
    Map-Reduce Framework
        Map input records=11
        Map output records=303
        Map output bytes=2969
        Map output materialized bytes=2061
        Input split bytes=101
        Combine input records=303
        Combine output records=158
        Reduce input groups=158
        Reduce shuffle bytes=2061
        Reduce input records=158
        Reduce output records=158
        Spilled Records=316
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=1093
        CPU time spent (ms)=5550
        Physical memory (bytes) snapshot=442781696
        Virtual memory (bytes) snapshot=1448112128
        Total committed heap usage (bytes)=276299776
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1762
    File Output Format Counters
        Bytes Written=1425
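The counters tie together: the 11 input lines (Map input records) became 303 tokens (Map output records); the combiner folded these into 158 (word, count) pairs; and with one map task and one reducer, those 158 pairs are also the reduce input records, the reduce input groups and the 158 output lines of part-r-00000. A simplified, in-memory Python sketch of that data flow, assuming a whitespace tokenizer and a single reducer; it is not Hadoop's implementation, and `simulate_wordcount` is a hypothetical name.

```python
from collections import Counter
from itertools import groupby
from typing import Dict, List, Tuple


def simulate_wordcount(splits: List[List[str]]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Simulate map -> combine -> shuffle/sort -> reduce for word count.

    Each element of `splits` is one input split (a list of lines) handled by
    its own map task and combiner; all map output goes to a single reducer.
    Returns (word counts, counters).
    """
    counters: Counter = Counter()
    shuffled: List[Tuple[str, int]] = []
    for split in splits:
        map_out = []
        for line in split:
            counters["Map input records"] += 1
            # whitespace split approximates the mapper's StringTokenizer
            map_out.extend((word, 1) for word in line.split())
        counters["Map output records"] += len(map_out)
        # Combiner: the reducer's summing logic applied locally to this map task
        counters["Combine input records"] += len(map_out)
        local: Counter = Counter()
        for word, n in map_out:
            local[word] += n
        counters["Combine output records"] += len(local)
        shuffled.extend(local.items())
    # Shuffle/sort: the reducer receives every combined pair, sorted by key
    shuffled.sort(key=lambda kv: kv[0])
    counters["Reduce input records"] += len(shuffled)
    result: Dict[str, int] = {}
    for word, group in groupby(shuffled, key=lambda kv: kv[0]):
        counters["Reduce input groups"] += 1
        result[word] = sum(n for _, n in group)
    counters["Reduce output records"] += len(result)
    return result, dict(counters)


if __name__ == "__main__":
    # One split, as in this job (Launched map tasks=1)
    print(simulate_wordcount([["a b a", "b c"]])[1])
    # Two splits: the same word can survive in both combiners
    print(simulate_wordcount([["a b a"], ["b c"]])[1])
```

With two splits, `b` survives in both combiners, so Reduce input records (4) exceeds Reduce input groups (3); with this job's single map task the two counters coincide at 158.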

You can also follow the job's progress in the YARN ResourceManager web UI:

http://ssmaster:8088/cluster

Cluster Metrics

  Apps Submitted          1
  Apps Pending            0
  Apps Running            1
  Apps Completed          0
  Containers Running      2
  Memory Used             3 GB
  Memory Total            8 GB
  Memory Reserved         0 B
  VCores Used             2
  VCores Total            8
  VCores Reserved         0
  Active Nodes            1
  Decommissioned Nodes    0
  Lost Nodes              0
  Unhealthy Nodes         0
  Rebooted Nodes          0

All Applications

  ID                  application_1477208120905_0001
  User                hadoop
  Name                word count
  Application Type    MAPREDUCE
  Queue               default
  StartTime           Sun, 23 Oct 2016 07:51:13 GMT
  FinishTime          N/A
  State               RUNNING
  FinalStatus         UNDEFINED
  Progress            -
  Tracking UI         ApplicationMaster
  Blacklisted Nodes   0
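The same figures can be read as JSON from the ResourceManager REST API, which is served on the same port as the web UI. A sketch assuming the `/ws/v1/cluster/metrics` and `/ws/v1/cluster/apps/{appid}` resources are reachable at `ssmaster:8088`; the mapping from web-UI column names to JSON fields is an assumption, and the helper names are hypothetical.

```python
import json
from typing import Any, Dict
from urllib.request import Request, urlopen

# ResourceManager web address from this post; adjust for your cluster.
RM_URL = "http://ssmaster:8088"


def fetch_json(path: str, base: str = RM_URL) -> Dict[str, Any]:
    """GET a ResourceManager REST resource and decode the JSON body."""
    req = Request(base + path, headers={"Accept": "application/json"})
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)


def summarize_metrics(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Pick the fields behind a few 'Cluster Metrics' columns (mapping assumed)."""
    m = doc["clusterMetrics"]
    return {
        "Apps Submitted": m["appsSubmitted"],
        "Apps Running": m["appsRunning"],
        "Containers Running": m["containersAllocated"],
        "Memory Used (MB)": m["allocatedMB"],
        "Memory Total (MB)": m["totalMB"],
        "VCores Used": m["allocatedVirtualCores"],
        "VCores Total": m["totalVirtualCores"],
        "Active Nodes": m["activeNodes"],
    }


def summarize_app(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Pick a few columns of the application list for one application."""
    app = doc["app"]
    keys = ("id", "user", "name", "queue", "state", "finalStatus", "progress")
    return {key: app[key] for key in keys}
```

For example, `summarize_app(fetch_json("/ws/v1/cluster/apps/application_1477208120905_0001"))` would return the state and progress of this run; memory comes back in MB rather than the GB shown in the UI.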

3 View the output

$ hadoop fs -ls /output
Found 2 items
-rw-r--r--   1 hadoop supergroup          0 2016-10-23 00:52 /output/_SUCCESS
-rw-r--r--   1 hadoop supergroup       1425 2016-10-23 00:52 /output/part-r-00000
$ hadoop fs -cat /output/part-r-00000
Always    1
Dream    1
There    1
a    4
all    1
along    1
always    1
...........
...........
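Capitalised words come first in the output because Hadoop compares Text keys byte by byte, and uppercase ASCII letters (0x41 to 0x5A) sort before lowercase ones (0x61 to 0x7A). Each line holds a word and its count; the reducer writes them tab-separated, though the tab appears as spaces in the listing above. A small Python sketch for parsing and checking such a file, assuming it has been copied locally (for instance with `hadoop fs -get`); the function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def parse_part_file(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Parse 'word<TAB>count' lines from a wordcount part-r-* file."""
    pairs: List[Tuple[str, int]] = []
    for line in lines:
        if not line.strip():
            continue
        # rsplit on any whitespace also copes with tabs expanded to spaces
        word, count = line.rsplit(maxsplit=1)
        pairs.append((word, int(count)))
    return pairs


def in_text_key_order(words: List[str]) -> bool:
    """True if words follow the byte order Hadoop uses for Text keys (UTF-8 bytes)."""
    encoded = [w.encode("utf-8") for w in words]
    return encoded == sorted(encoded)
```

Over the complete file, the number of parsed lines should equal `Reduce output records` (158) and the sum of all counts should equal `Map output records` (303), since every token emitted by the mapper ends up in exactly one count.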

Q Summary

Very simple; running it didn't leave much of an impression.

Next steps:

  • Write my own MapReduce wordcount program
  • Set up a fully distributed cluster, run the same program on a large file, and observe the speed
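For the first follow-up item, one low-ceremony route is Hadoop Streaming, which runs any executable that reads lines on stdin and writes `key<TAB>value` lines on stdout as the mapper and reducer. This is a swapped-in technique, not the Java API the bundled example uses. A minimal Python sketch, assuming Python 3 is installed on every node; the file name `wc_streaming.py` is hypothetical.

```python
#!/usr/bin/env python3
"""Word count for Hadoop Streaming (hypothetical file name: wc_streaming.py).

`wc_streaming.py map` acts as the mapper, `wc_streaming.py reduce` as the reducer;
both read lines on stdin and write 'word<TAB>count' lines on stdout.
"""
import sys
from typing import Iterable, Iterator, Optional


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'word<TAB>1' for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum counts of adjacent equal keys; input must already be sorted by key."""
    current: Optional[str] = None
    total = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        word, _, count = line.partition("\t")
        if word == current:
            total += int(count)
        else:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = word, int(count)
    if current is not None:
        yield f"{current}\t{total}"


if __name__ == "__main__":
    stage = {"map": mapper, "reduce": reducer}.get(sys.argv[1] if len(sys.argv) > 1 else "")
    if stage is None:
        print("usage: wc_streaming.py map|reduce", file=sys.stderr)
    else:
        for out in stage(sys.stdin):
            print(out)
```

It can be tried without a cluster: `cat paper.txt | python3 wc_streaming.py map | LC_ALL=C sort | python3 wc_streaming.py reduce`, where `LC_ALL=C` makes `sort` use byte order like Hadoop. On the cluster, the script is passed to the streaming jar (under share/hadoop/tools/lib in a 2.x install; check the exact file name) with `-files`, the two stage commands with `-mapper` and `-reducer`, plus `-input` and an `-output` directory that does not exist yet.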
