[0004] Hadoop's Hello World: Running the MapReduce wordcount Example
Posted by sunzebo
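Goal:
Get a first feel for Hadoop MapReduce.
Environment:
Hadoop 2.6.4
1 Prepare the input file
paper.txt is the input file. Its content is typically an English article, but any text will do.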
$ hadoop fs -mkdir /input
$ ls
Desktop  Documents  Downloads  examples.desktop  hadoop-2.6.4.tar.gz  Music  paper.txt  Pictures  Public  Templates  Videos
$ hadoop fs -put paper.txt /input
$ hadoop fs -ls /input
Found 1 items
-rw-r--r--   1 hadoop supergroup       1762 2016-10-23 00:45 /input/paper.txt
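Note: do not create the output directory /output in advance. The job creates it itself, and it refuses to start if the directory already exists.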
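Because the job creates /output itself, submitting it a second time while /output still exists fails with an "output directory already exists" error. Below is a minimal sketch of a cleanup step, assuming the same /output path used in this post. `clean_output` is a hypothetical helper name, not part of Hadoop; `hadoop fs -test -d` and `hadoop fs -rm -r` are standard file system shell commands.

```shell
# Hypothetical helper: delete an HDFS directory only if it exists,
# so the wordcount job can be resubmitted with the same output path.
clean_output() {
  dir="$1"
  # "hadoop fs -test -d" exits with 0 when the path is an existing directory.
  if hadoop fs -test -d "$dir"; then
    # Recursively remove the stale output directory.
    hadoop fs -rm -r "$dir"
  fi
}

# Usage (assumes the /output path from this post):
# clean_output /output
```

The existence check keeps the helper quiet on the first run, when there is nothing to delete yet.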
2 Run the job
$ hadoop jar /opt/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount /input /output
16/10/23 00:51:09 INFO client.RMProxy: Connecting to ResourceManager at ssmaster/192.168.249.144:8032
16/10/23 00:51:11 INFO input.FileInputFormat: Total input paths to process : 1
16/10/23 00:51:12 INFO mapreduce.JobSubmitter: number of splits:1
16/10/23 00:51:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1477208120905_0001
16/10/23 00:51:14 INFO impl.YarnClientImpl: Submitted application application_1477208120905_0001
16/10/23 00:51:14 INFO mapreduce.Job: The url to track the job: http://ssmaster:8088/proxy/application_1477208120905_0001/
16/10/23 00:51:14 INFO mapreduce.Job: Running job: job_1477208120905_0001
16/10/23 00:51:38 INFO mapreduce.Job: Job job_1477208120905_0001 running in uber mode : false
16/10/23 00:51:38 INFO mapreduce.Job: map 0% reduce 0%
16/10/23 00:52:17 INFO mapreduce.Job: map 100% reduce 0%
16/10/23 00:52:39 INFO mapreduce.Job: map 100% reduce 100%
16/10/23 00:52:41 INFO mapreduce.Job: Job job_1477208120905_0001 completed successfully
16/10/23 00:52:41 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=2061
        FILE: Number of bytes written=217797
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1863
        HDFS: Number of bytes written=1425
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=35792
        Total time spent by all reduces in occupied slots (ms)=18540
        Total time spent by all map tasks (ms)=35792
        Total time spent by all reduce tasks (ms)=18540
        Total vcore-milliseconds taken by all map tasks=35792
        Total vcore-milliseconds taken by all reduce tasks=18540
        Total megabyte-milliseconds taken by all map tasks=36651008
        Total megabyte-milliseconds taken by all reduce tasks=18984960
    Map-Reduce Framework
        Map input records=11
        Map output records=303
        Map output bytes=2969
        Map output materialized bytes=2061
        Input split bytes=101
        Combine input records=303
        Combine output records=158
        Reduce input groups=158
        Reduce shuffle bytes=2061
        Reduce input records=158
        Reduce output records=158
        Spilled Records=316
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=1093
        CPU time spent (ms)=5550
        Physical memory (bytes) snapshot=442781696
        Virtual memory (bytes) snapshot=1448112128
        Total committed heap usage (bytes)=276299776
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1762
    File Output Format Counters
        Bytes Written=1425
While the job runs, its status can be checked on the web monitoring page:
http://ssmaster:8088/cluster
Cluster Metrics
Apps Submitted | Apps Pending | Apps Running | Apps Completed | Containers Running | Memory Used | Memory Total | Memory Reserved | VCores Used | VCores Total | VCores Reserved | Active Nodes | Decommissioned Nodes | Lost Nodes | Unhealthy Nodes | Rebooted Nodes |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0 | 1 | 0 | 2 | 3 GB | 8 GB | 0 B | 2 | 8 | 0 | 1 | 0 | 0 | 0 | 0 |

ID | User | Name | Application Type | Queue | StartTime | FinishTime | State | FinalStatus | Progress | Tracking UI | Blacklisted Nodes |
---|---|---|---|---|---|---|---|---|---|---|---|
application_1477208120905_0001 | hadoop | word count | MAPREDUCE | default | Sun, 23 Oct 2016 07:51:13 GMT | N/A | RUNNING | UNDEFINED | | ApplicationMaster | 0 |
3 View the output
$ hadoop fs -ls /output
Found 2 items
-rw-r--r--   1 hadoop supergroup          0 2016-10-23 00:52 /output/_SUCCESS
-rw-r--r--   1 hadoop supergroup       1425 2016-10-23 00:52 /output/part-r-00000
$ hadoop fs -cat /output/part-r-00000
Always 1
Dream 1
There 1
a 4
all 1
along 1
always 1
...........
...........
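For a file this small, the result can be cross-checked locally without Hadoop. The sketch below approximates what the bundled wordcount example does: it splits each line on whitespace, counts case-sensitively (which is why `Always` and `always` appear as separate entries above), and prints the words in byte order with a tab between word and count. `local_wordcount` is a hypothetical helper name, and the pipeline is an approximation for comparison, not the Hadoop implementation.

```shell
# Hypothetical helper: count whitespace-separated words in a local file
# and print "word<TAB>count", sorted in byte order.
local_wordcount() {
  # 1. Turn spaces, tabs, carriage returns and form feeds into newlines
  #    and squeeze runs of newlines, giving one word per line.
  # 2. Drop any empty line left by leading whitespace.
  # 3. Sort in byte order (uppercase before lowercase) and count duplicates.
  # 4. Reorder the "count word" output of uniq into "word<TAB>count".
  tr -s ' \t\r\f' '\n' < "$1" \
    | grep -v '^$' \
    | LC_ALL=C sort \
    | LC_ALL=C uniq -c \
    | awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Usage (assumes paper.txt is still in the current directory):
# local_wordcount paper.txt > local.txt
# hadoop fs -cat /output/part-r-00000 | diff - local.txt
```

If the two outputs match, `diff` prints nothing. This only works while the input fits comfortably on one machine, which is exactly the case MapReduce is not designed for.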
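Q Summary
Very simple; running a prebuilt example does not give much of a feel for MapReduce yet.
Next steps:
- Write my own MapReduce wordcount program
- Set up a fully distributed cluster, process a large file with the same program, and observe the speed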