Hadoop Fully Distributed Cluster Setup
I. Environment
VirtualBox:
OS: Ubuntu 14.04
Master node
IP: 172.16.1.110, 192.168.1.110
Hostname: Master
Install user:password -> hadoop:hadoop
OpenStack:
OS: CentOS 7.2
Slave1 node:
IP: 192.168.200.101
Hostname: Slave1
Install user:password -> hadoop:hadoop
Slave2 node:
IP: 192.168.200.102
Hostname: Slave2
Install user:password -> hadoop:hadoop
Slave3 node:
IP: 192.168.200.103
Hostname: Slave3
Install user:password -> hadoop:hadoop
Slave4 node:
IP: 192.168.200.104
Hostname: Slave4
Install user:password -> hadoop:hadoop
Hadoop version: hadoop-2.7.1.tar.gz
JDK version: jdk-7u71-linux-x64.tar.gz
(Hadoop 3.x requires JDK 8 or later)
II. Setup
1. Install SSH on Ubuntu (the master)
apt-get install openssh-server
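On Ubuntu 14.04 the daemon is managed by Upstart; a quick check that it came up (the service name ssh is the 14.04 default):
sudo service ssh status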
2. Disable the firewall on all nodes (the commands below target the CentOS 7 slaves; see the Ubuntu note after them)
systemctl stop firewalld
systemctl disable firewalld
# flush the firewall rules
iptables -F
iptables -X
iptables -Z
setenforce 0    # put SELinux into permissive mode (CentOS)
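Note: systemctl and firewalld exist only on the CentOS 7 slaves. Ubuntu 14.04 has neither; a rough equivalent on the master is:
sudo ufw disable
sudo iptables -F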
3. Set the hostname on each node (run the matching command on that node; see the Ubuntu note below)
hostnamectl set-hostname Master
hostnamectl set-hostname Slave1
hostnamectl set-hostname Slave2
hostnamectl set-hostname Slave3
hostnamectl set-hostname Slave4
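hostnamectl is a systemd tool, so it works on the CentOS 7 slaves but not on Ubuntu 14.04. On the master, a sketch of the traditional way:
sudo hostname Master
echo Master | sudo tee /etc/hostname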
4. Configure /etc/hosts on the master node, then distribute it to all slaves
vi /etc/hosts
172.16.1.110 Master
192.168.200.101 Slave1
192.168.200.102 Slave2
192.168.200.103 Slave3
192.168.200.104 Slave4
scp /etc/hosts Slave1:/etc/hosts
scp /etc/hosts Slave2:/etc/hosts
scp /etc/hosts Slave3:/etc/hosts
scp /etc/hosts Slave4:/etc/hosts
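The four scp commands can be collapsed into one loop; a minimal sketch:
for i in 1 2 3 4; do scp /etc/hosts Slave$i:/etc/hosts; done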
5. Create a hadoop user on all nodes and switch to it
Ubuntu:
useradd -m hadoop -s /bin/bash
passwd hadoop
adduser hadoop sudo
CentOS:
[root@slave* ~]# useradd -m hadoop -s /bin/bash
[root@slave* ~]# passwd hadoop
[root@slave* ~]# usermod -aG wheel hadoop    # grant sudo via the wheel group; "adduser -g hadoop sudo" does not work on CentOS
su hadoop
6. Generate an SSH key pair on the master and copy the public key to every node
hadoop@master:~$ ssh-keygen -t rsa
hadoop@master:~$ ssh-copy-id -i .ssh/id_rsa.pub Slave1
hadoop@master:~$ ssh-copy-id -i .ssh/id_rsa.pub Slave2
hadoop@master:~$ ssh-copy-id -i .ssh/id_rsa.pub Slave3
hadoop@master:~$ ssh-copy-id -i .ssh/id_rsa.pub Slave4
hadoop@master:~$ ssh-copy-id -i .ssh/id_rsa.pub Master
hadoop@master:~$ ssh Slave1    # Slave* is not a literal hostname; verify each node, or loop as sketched below
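To check every node in one pass, a sketch (each line should print the remote hostname without a password prompt):
for h in Master Slave1 Slave2 Slave3 Slave4; do ssh $h hostname; done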
7. Download and extract the JDK and Hadoop on the master node
hadoop@master:~$ wget ftp://192.168.100.10/jdk-7u71-linux-x64.tar.gz
hadoop@master:~$ wget ftp://192.168.100.10/hadoop-2.7.1.tar.gz
hadoop@master:~$ mkdir app
hadoop@master:~$ tar -zxvf jdk-7u71-linux-x64.tar.gz -C app/
hadoop@master:~$ tar -zxvf hadoop-2.7.1.tar.gz -C app/
8. Configure environment variables on the master node
hadoop@master:~$ vi .bashrc    # or /etc/profile or .profile
export JAVA_HOME=/home/hadoop/app/jdk1.7.0_71
export PATH=$JAVA_HOME/bin:$PATH
export HADOOP_HOME=/home/hadoop/app/hadoop-2.7.1
export PATH=$PATH:$HADOOP_HOME/bin
hadoop@master:~$ source .bashrc
hadoop@master:~$ java -version
hadoop@master:~$ hadoop version
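If the variables are in effect, the two commands report the matching versions; abbreviated expected output:
java version "1.7.0_71"
Hadoop 2.7.1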
9. Configure Hadoop on the master node
(1) Configure hadoop-env.sh
hadoop@master:~$ vi app/hadoop-2.7.1/etc/hadoop/hadoop-env.sh
# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/home/hadoop/app/jdk1.7.0_71
(2) Configure core-site.xml
hadoop@master:~$ vi app/hadoop-2.7.1/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://Master:9000</value>
<description>HDFS URI: filesystem://namenode-host:port (fs.default.name is the deprecated spelling)</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/app/tmp</value>
<description>Local directory for Hadoop temporary files on the NameNode</description>
</property>
</configuration>
(3) Configure hdfs-site.xml
hadoop@master:~$ vi app/hadoop-2.7.1/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/app/hdfs/name</value>
<description>Where the NameNode stores the HDFS namespace metadata (dfs.name.dir is the deprecated name)</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/app/hdfs/data</value>
<description>Physical storage location of data blocks on each DataNode (dfs.data.dir is the deprecated name)</description>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
<description>Replication factor; the default is 3, and it should not exceed the number of DataNodes</description>
</property>
</configuration>
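Hadoop creates these directories on demand, but pre-creating them on the master (before the app directory is copied to the slaves in step 10) surfaces permission problems early; a sketch matching the paths above:
mkdir -p ~/app/tmp ~/app/hdfs/name ~/app/hdfs/data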
(4) Configure mapred-site.xml
hadoop@master:~$ cp app/hadoop-2.7.1/etc/hadoop/mapred-site.xml.template app/hadoop-2.7.1/etc/hadoop/mapred-site.xml
hadoop@master:~$ vi app/hadoop-2.7.1/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
(5) Configure yarn-site.xml
hadoop@master:~$ vi app/hadoop-2.7.1/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>Master:8088</value>
</property>
</configuration>
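This yarn-site.xml only fixes the web UI address; the NodeManagers on the slaves also need to know where the ResourceManager itself runs, otherwise they fall back to 0.0.0.0. A property worth adding inside the same configuration block (an addition for this cluster layout, not part of the original config):
<property>
<name>yarn.resourcemanager.hostname</name>
<value>Master</value>
</property>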
(6) Configure slaves (listing Master here makes it run a DataNode and NodeManager as well, which matches the jps output in step 13)
hadoop@master:~$ vi app/hadoop-2.7.1/etc/hadoop/slaves
Master
Slave1
Slave2
Slave3
Slave4
10. Distribute the configured JDK, Hadoop, and environment file from the master to all slaves
hadoop@master:~$ scp -r app Slave1:/home/hadoop
hadoop@master:~$ scp -r app Slave2:/home/hadoop
hadoop@master:~$ scp -r app Slave3:/home/hadoop
hadoop@master:~$ scp -r app Slave4:/home/hadoop
hadoop@master:~$ scp .bashrc Slave1:/home/hadoop
hadoop@master:~$ scp .bashrc Slave2:/home/hadoop
hadoop@master:~$ scp .bashrc Slave3:/home/hadoop
hadoop@master:~$ scp .bashrc Slave4:/home/hadoop
11. Source the environment file on every slave node
[hadoop@slave1 ~]$ source .bashrc
[hadoop@slave2 ~]$ source .bashrc
[hadoop@slave3 ~]$ source .bashrc
[hadoop@slave4 ~]$ source .bashrc
12. Format the NameNode on the master
hadoop@master:~$ hdfs namenode -format
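Format only once. Reformatting gives the NameNode a new clusterID while the DataNodes keep the old one, so they refuse to register; if you must reformat, a destructive cleanup sketch (wipes all HDFS data) to run first:
for h in Master Slave1 Slave2 Slave3 Slave4; do ssh $h rm -rf ~/app/hdfs/data ~/app/tmp; done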
13. Start the Hadoop cluster from the master
hadoop@master:~$ app/hadoop-2.7.1/sbin/start-all.sh
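start-all.sh is deprecated in Hadoop 2.x; the recommended equivalent is to start HDFS and YARN separately:
hadoop@master:~$ app/hadoop-2.7.1/sbin/start-dfs.sh
hadoop@master:~$ app/hadoop-2.7.1/sbin/start-yarn.sh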
hadoop@master:~$ jps
5231 DataNode
5591 ResourceManager
5764 Jps
5082 NameNode
5446 SecondaryNameNode
5729 NodeManager
hadoop@slave1:~$ jps
10412 DataNode
10607 Jps
10507 NodeManager
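Beyond jps, you can confirm that every DataNode registered with the NameNode; for this five-node cluster the count should be 5:
hadoop@master:~$ hdfs dfsadmin -report | grep -c "Name:"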
14. Test via the web UIs
http://172.16.1.110:50070/dfshealth.html#tab-overview
http://172.16.1.110:8088/cluster
or
http://192.168.1.110:50070/dfshealth.html#tab-overview
15. Run MapReduce jobs to test
Estimate pi:
hadoop@master:~$ hadoop jar app/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 1 1
Word count:
With the input on HDFS:
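If you have no input.txt yet, any throwaway file works (hypothetical content):
hadoop@master:~$ echo "hello hadoop hello world" > input.txt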
hadoop@master:~$ hadoop fs -put input.txt /
hadoop@master:~$ hadoop jar app/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /input.txt /output
hadoop@master:~$ hadoop fs -cat /output/part-r-00000 |head -3
or
hadoop@master:~$ hadoop fs -rm -r /output    # "-rmr" is deprecated in Hadoop 2.x
hadoop@master:~$ hadoop jar app/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount hdfs://Master:9000/input.txt /output
or (with the input on the local filesystem):
hadoop@master:~$ hadoop jar app/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount file:///home/hadoop/input.txt /output
Appendix:
Setting up an NTP server
Ubuntu (master, as the time source):
root@master:~# apt-get install ntp
root@master:~# vi /etc/ntp.conf
server 127.127.1.0            # use the local clock as the time source
fudge 127.127.1.0 stratum 10
root@master:~# service ntp restart
root@master:~# service ntp status
* NTP server is running
CentOS (slaves, syncing from the master):
[root@slave1 ~]# yum search ntpd
[root@slave1 ~]# yum install ntpdate -y
[root@slave1 ~]# ntpdate Master
5 Feb 09:34:25 ntpdate[17039]: step time server 172.16.1.110 offset -34689.864848 sec
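ntpdate is a one-shot sync; to keep the slaves aligned over time, a hedged option is a root cron entry on each slave:
[root@slave1 ~]# crontab -e
*/30 * * * * /usr/sbin/ntpdate Master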
Setting up an FTP server
root@master:~# apt-get install vsftpd
root@master:~# vi /etc/vsftpd.conf
anon_root=/            # directory served to anonymous users
anonymous_enable=YES   # allow anonymous logins
root@master:~# service vsftpd restart
vsftpd stop/waiting
vsftpd start/running, process 8325
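To confirm anonymous access works (assuming curl is available; any FTP client will do):
root@master:~# curl ftp://127.0.0.1/ | head -3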