The virtual machines used in this article start from a pseudo-distributed configuration; that setup is not repeated here, see my earlier post: http://www.cnblogs.com/VeryGoodVeryGood/p/8507795.html
This article mainly draws on 给力星's post 《Hadoop集群安装配置教程_Hadoop2.6.0_Ubuntu/CentOS》 and the book 《Hadoop应用开发技术详解》 (by 刘刚).
Three virtual machines are used to build the distributed Hadoop environment; their topology is shown in the figure below.
The role of each node in the Hadoop cluster is listed in the following table.
| Hostname | Hadoop role | IP address | Result of jps | Hadoop user | Hadoop install directory |
| --- | --- | --- | --- | --- | --- |
| Master | master | 192.168.8.210 | Jps, NameNode, SecondaryNameNode, ResourceManager, JobHistoryServer | hadoop | /usr/local/hadoop |
| Slave1 | slave | 192.168.8.211 | Jps, DataNode, NodeManager | hadoop | /usr/local/hadoop |
| Slave2 | slave | 192.168.8.212 | Jps, DataNode, NodeManager | hadoop | /usr/local/hadoop |
| Windows | development machine | 192.168.0.169 | | | |
I. Network Setup
1. Set the virtual machines to bridged networking mode
See this post for the network configuration steps: http://blog.csdn.net/zhongyoubing/article/details/71081464
2. Change each hostname to match the table above; the configuration file is /etc/hostname
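For example, /etc/hostname on the Master node should contain nothing but the hostname itself (use Slave1 or Slave2 accordingly on the other two machines); a minimal way to set it:
echo "Master" | sudo tee /etc/hostname    # overwrite /etc/hostname with the single line "Master"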
3. Set the IP-to-hostname mappings in the configuration file /etc/hosts; the file is identical on every node
127.0.0.1       localhost
192.168.8.210   Master
192.168.8.211   Slave1
192.168.8.212   Slave2
4. Reboot, then check that the nodes can ping each other
ping Master -c 3
ping Slave1 -c 3
ping Slave2 -c 3
II. Passwordless SSH Login Between Nodes
Master:
rm -r ~/.ssh                              # remove the old key directory, if any
ssh Master                                # log in to Master itself once so that ~/.ssh is recreated
cd ~/.ssh
ssh-keygen -t rsa                         # press Enter at every prompt
cat ./id_rsa.pub >> ./authorized_keys     # allow passwordless login to Master itself
scp ~/.ssh/id_rsa.pub [email protected]:/home/hadoop/
scp ~/.ssh/id_rsa.pub [email protected]:/home/hadoop/
Slave1 & Slave2:
rm -r ~/.ssh                              # remove the old key directory, if any
mkdir ~/.ssh
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
rm ~/id_rsa.pub                           # the copied public key is no longer needed
Master:
Log in to node Slave2
ssh Slave2
Log out
exit
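Passwordless login to Slave1 can be verified the same way (assuming the public key was copied to both slaves as above):
ssh Slave1
exit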
III. Configuring the Distributed Environment on the Master Node
The configuration files are located in /usr/local/hadoop/etc/hadoop/
slaves
Slave1
Slave2
core-site.xml
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Master:9000</value>
    </property>
</configuration>
hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>Master:50090</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>
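dfs.replication is set to 2 here because the cluster has exactly two DataNodes (Slave1 and Slave2). Once the cluster is running, the value actually in effect can be checked with the command below (a quick check, not part of the original steps):
hdfs getconf -confKey dfs.replication    # prints the effective replication factor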
mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>Master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>Master:19888</value>
    </property>
</configuration>
yarn-site.xml
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>Master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
IV. Configuring the Distributed Environment on the Other Nodes
Master:
cd /usr/local
sudo rm -r ./hadoop/tmp                            # delete the temporary files first
sudo rm -r ./hadoop/logs/*                         # delete the logs
tar -zcf ~/hadoop.master.tar.gz ./hadoop
cd ~
scp ./hadoop.master.tar.gz Slave1:/home/hadoop
scp ./hadoop.master.tar.gz Slave2:/home/hadoop     # copy the archive to Slave2 as well
Slave1 & Slave2:
sudo rm -r /usr/local/hadoop                       # remove any old installation first
sudo tar -zxf ~/hadoop.master.tar.gz -C /usr/local
sudo chown -R hadoop /usr/local/hadoop
V. Starting Hadoop
Master:
hdfs namenode -format                              # format the NameNode; run this only once, before the first start
start-all.sh
mr-jobhistory-daemon.sh start historyserver
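Note that start-all.sh is marked as deprecated in Hadoop 2.x; an equivalent way to bring up the daemons (a sketch using the standard scripts, not the exact commands from the original post) is:
start-dfs.sh                                       # HDFS daemons: NameNode, SecondaryNameNode, DataNodes
start-yarn.sh                                      # YARN daemons: ResourceManager, NodeManagers
mr-jobhistory-daemon.sh start historyserver        # MapReduce job history server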
Check the running processes
jps
Check the DataNodes
hdfs dfsadmin -report
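If everything started correctly, the report should list two live DataNodes (Slave1 and Slave2). Assuming the usual Hadoop 2.x report wording, the summary line can be filtered out with:
hdfs dfsadmin -report | grep "Live datanodes"      # expect a count of 2
The NameNode web UI at http://Master:50070 shows the same information.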
Slave1 & Slave2:
Check the running processes
jps
VI. A Distributed Example
1. Create a file test.txt with the following content (a shell one-liner for generating it is shown after the listing)
Hello world
Hello world
Hello world
Hello world
Hello world
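A quick way to produce this file from the shell (a sketch, assuming it is run in the hadoop user's home directory on Master):
for i in 1 2 3 4 5; do echo "Hello world"; done > test.txt    # write five identical lines to test.txt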
2. Create a user directory in HDFS
hdfs dfs -mkdir -p /user/hadoop
3. Create the input directory (a relative path, so it resolves to /user/hadoop/input)
hdfs dfs -mkdir input
4. Copy the local file into input
hdfs dfs -put ./test.txt input
5. Check that the upload succeeded
hdfs dfs -ls /user/hadoop/input
6. Run the WordCount example
hdfs dfs -rm -r output    # the output directory must not already exist when a Hadoop job runs, otherwise the job fails with an error
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /user/hadoop/input/test.txt /user/hadoop/output    # run from /usr/local/hadoop, since the jar path is relative to the installation directory
7. View the results
hdfs dfs -cat output/*
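With the five-line test.txt above, each of the two words appears five times, so the output should look roughly like this (word and count separated by a tab):
Hello	5
world	5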
8. Fetch the results back to the local filesystem
rm -r ./output            # remove any previous local copy first
hdfs dfs -get output ./output
cat ./output/*
9. Delete the output directories
hdfs dfs -rm -r output    # remove the output directory in HDFS
rm -r ./output            # remove the local copy
That's all.