Setting Up a Fully Distributed Hadoop Cluster


I. Environment

VirtualBox:

        OS: Ubuntu 14.04

            Master node:

                IP: 172.16.1.110, 192.168.1.110

                Hostname: Master

                User:password -> hadoop:hadoop

OpenStack:

        OS: CentOS 7.2

            Slave1 node:

                IP: 192.168.200.101

                Hostname: Slave1

                User:password -> hadoop:hadoop

            Slave2 node:

                IP: 192.168.200.102

                Hostname: Slave2

                User:password -> hadoop:hadoop

            Slave3 node:

                IP: 192.168.200.103

                Hostname: Slave3

                User:password -> hadoop:hadoop

            Slave4 node:

                IP: 192.168.200.104

                Hostname: Slave4

                User:password -> hadoop:hadoop

Hadoop version: hadoop-2.7.1.tar.gz

JDK version: jdk-7u71-linux-x64.tar.gz

(Note: the Hadoop 2.7.1 used here runs on JDK 7; Hadoop 3.x requires JDK 8 or later.)

II. Setup

1. Install SSH on the Ubuntu master

apt-get install openssh-server

2. Disable the firewall on all nodes (systemctl/firewalld and setenforce apply to the CentOS 7 slaves; see the Ubuntu note after the commands)

systemctl stop firewalld

systemctl disable firewalld

# flush the firewall rules

iptables -F

iptables -X

iptables -Z

setenforce 0        # put SELinux in permissive mode (temporary; edit /etc/selinux/config for a permanent change)
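
Note: Ubuntu 14.04 runs Upstart, so it has neither systemctl nor firewalld; on the master, disabling ufw is the equivalent step (a sketch, not part of the original commands):

root@master:~# ufw disable
root@master:~# ufw status        # should report "Status: inactive"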

3. Set the hostname on each node (one command per node; see the Ubuntu 14.04 note after the commands)

hostnamectl set-hostname Master

hostnamectl set-hostname Slave1

hostnamectl set-hostname Slave2

hostnamectl set-hostname Slave3

hostnamectl set-hostname Slave4
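
Note: hostnamectl requires systemd and is not available on Ubuntu 14.04; a sketch of the equivalent on the master:

root@master:~# echo Master > /etc/hostname
root@master:~# hostname Master        # takes effect immediately; /etc/hostname persists it across reboots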

4. Configure hosts on the master, then push the file to every slave

vi /etc/hosts

172.16.1.110 Master

192.168.200.101 Slave1

192.168.200.102 Slave2

192.168.200.103 Slave3

192.168.200.104 Slave4

scp /etc/hosts Slave1:/etc/hosts

scp /etc/hosts Slave2:/etc/hosts

scp /etc/hosts Slave3:/etc/hosts

scp /etc/hosts Slave4:/etc/hosts

5. Create a hadoop user on every node and switch to it

Ubuntu:

useradd -m hadoop -s /bin/bash

passwd hadoop

adduser hadoop sudo

CentOS:

[root@slave* ~]# useradd -m hadoop -s /bin/bash

[root@slave* ~]# passwd hadoop

[root@slave* ~]# usermod -aG wheel hadoop        # grant sudo rights via the wheel group

su hadoop

6. Generate an SSH key pair on the master and copy the public key to every node (the master included)

hadoop@master:~$ ssh-keygen -t rsa

hadoop@master:~$ ssh-copy-id -i .ssh/id_rsa.pub Slave1

hadoop@master:~$ ssh-copy-id -i .ssh/id_rsa.pub Slave2

hadoop@master:~$ ssh-copy-id -i .ssh/id_rsa.pub Slave3

hadoop@master:~$ ssh-copy-id -i .ssh/id_rsa.pub Slave4

hadoop@master:~$ ssh-copy-id -i .ssh/id_rsa.pub Master

hadoop@master:~$ ssh Slave1        # spot-check passwordless login; see the loop below
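
To verify passwordless login to all five nodes in one pass, a small loop (hostnames as configured in /etc/hosts above):

hadoop@master:~$ for h in Master Slave1 Slave2 Slave3 Slave4; do ssh $h hostname; done
# should print the five hostnames without a single password prompt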

7. Download and extract the JDK and Hadoop on the master (192.168.100.10 is a local FTP server)

hadoop@master:~$ wget ftp://192.168.100.10/jdk-7u71-linux-x64.tar.gz

hadoop@master:~$ wget ftp://192.168.100.10/hadoop-2.7.1.tar.gz

hadoop@master:~$ mkdir app

hadoop@master:~$ tar -zxvf jdk-7u71-linux-x64.tar.gz -C app/

hadoop@master:~$ tar -zxvf hadoop-2.7.1.tar.gz -C app/

8. Configure environment variables on the master

hadoop@master:~$ vi .bashrc                # or /etc/profile or .profile

export JAVA_HOME=/home/hadoop/app/jdk1.7.0_71

export PATH=$JAVA_HOME/bin:$PATH

export HADOOP_HOME=/home/hadoop/app/hadoop-2.7.1

export PATH=$PATH:$HADOOP_HOME/bin

                                                                  

hadoop@master:~$ source .bashrc

hadoop@master:~$ java -version

hadoop@master:~$ hadoop version

9. Configure Hadoop on the master

(1) Configure hadoop-env.sh

hadoop@master:~$ vi app/hadoop-2.7.1/etc/hadoop/hadoop-env.sh

# The java implementation to use.

#export JAVA_HOME=${JAVA_HOME}

export JAVA_HOME=/home/hadoop/app/jdk1.7.0_71

(2) Configure core-site.xml (fs.defaultFS is the current name of the deprecated fs.default.name)

hadoop@master:~$ vi app/hadoop-2.7.1/etc/hadoop/core-site.xml

<configuration>

<property>

    <name>fs.defaultFS</name>

    <value>hdfs://Master:9000</value>

    <description>HDFS URI: filesystem://namenode-host:port</description>

</property>

<property>

    <name>hadoop.tmp.dir</name>

    <value>/home/hadoop/app/tmp</value>

    <description>Base directory for Hadoop's local temporary files</description>

</property>

</configuration>

(3) Configure hdfs-site.xml (dfs.namenode.name.dir and dfs.datanode.data.dir are the current names of the deprecated dfs.name.dir and dfs.data.dir)

hadoop@master:~$ vi app/hadoop-2.7.1/etc/hadoop/hdfs-site.xml

<configuration>

<property>

    <name>dfs.namenode.name.dir</name>

    <value>/home/hadoop/app/hdfs/name</value>

    <description>Where the namenode stores HDFS namespace metadata</description>

</property>

<property>

    <name>dfs.datanode.data.dir</name>

    <value>/home/hadoop/app/hdfs/data</value>

    <description>Physical location of data blocks on each datanode</description>

</property>

<property>

    <name>dfs.replication</name>

    <value>3</value>

    <description>Replication factor; the default is 3; it must not exceed the number of datanodes (5 here)</description>

</property>

</configuration>

(4) Configure mapred-site.xml

hadoop@master:~$ cp app/hadoop-2.7.1/etc/hadoop/mapred-site.xml.template app/hadoop-2.7.1/etc/hadoop/mapred-site.xml

hadoop@master:~$ vi app/hadoop-2.7.1/etc/hadoop/mapred-site.xml

<configuration>

<property>

        <name>mapreduce.framework.name</name>

        <value>yarn</value>

</property>

</configuration>

(5) Configure yarn-site.xml

hadoop@master:~$ vi app/hadoop-2.7.1/etc/hadoop/yarn-site.xml

<configuration>

<property>

        <name>yarn.nodemanager.aux-services</name>

        <value>mapreduce_shuffle</value>

</property>

<property>

        <name>yarn.resourcemanager.webapp.address</name>

        <value>Master:8088</value>

</property>

</configuration>
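
Note: only the ResourceManager's web address is configured above, so its RPC addresses fall back to the 0.0.0.0 default, and NodeManagers on the slaves may fail to register with it. A common addition (not part of the original configuration) is to pin the ResourceManager host explicitly, inside the same <configuration> element:

<property>
        <name>yarn.resourcemanager.hostname</name>
        <value>Master</value>
</property>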

(6) Configure slaves (Master is listed as well, so the master also runs a DataNode and a NodeManager; the jps output in step 13 reflects this)

hadoop@master:~$ vi app/hadoop-2.7.1/etc/hadoop/slaves

Master

Slave1

Slave2

Slave3

Slave4

10. Push the configured JDK, Hadoop, and environment file from the master to every slave

hadoop@master:~$ scp -r app Slave1:/home/hadoop

hadoop@master:~$ scp -r app Slave2:/home/hadoop

hadoop@master:~$ scp -r app Slave3:/home/hadoop

hadoop@master:~$ scp -r app Slave4:/home/hadoop

hadoop@master:~$ scp .bashrc Slave1:/home/hadoop

hadoop@master:~$ scp .bashrc Slave2:/home/hadoop

hadoop@master:~$ scp .bashrc Slave3:/home/hadoop

hadoop@master:~$ scp .bashrc Slave4:/home/hadoop
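
The eight scp commands above can be collapsed into a single loop; a convenience sketch:

hadoop@master:~$ for i in 1 2 3 4; do scp -r app .bashrc Slave$i:/home/hadoop; done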

11. Source the environment file on every slave

[hadoop@slave1 ~]$ source .bashrc

[hadoop@slave2 ~]$ source .bashrc

[hadoop@slave3 ~]$ source .bashrc

[hadoop@slave4 ~]$ source .bashrc

12. Format the namenode on the master

hadoop@master:~$ hdfs namenode -format
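
Format the namenode exactly once. A later re-format generates a new clusterID that no longer matches the datanodes' existing data directories, and the datanodes will refuse to start. If a re-format is ever unavoidable, first clear the storage paths from the configs above on every node (a destructive sketch; this wipes all HDFS data):

rm -rf ~/app/tmp ~/app/hdfs        # run on every node, only before a deliberate re-format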

13. Start the Hadoop cluster from the master

hadoop@master:~$ app/hadoop-2.7.1/sbin/start-all.sh        # deprecated wrapper; equivalent to start-dfs.sh followed by start-yarn.sh

hadoop@master:~$ jps

5231 DataNode

5591 ResourceManager

5764 Jps

5082 NameNode

5446 SecondaryNameNode

5729 NodeManager

hadoop@slave1:~$ jps

10412 DataNode

10607 Jps

10507 NodeManager
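
Beyond jps, the namenode can confirm how many datanodes actually registered; with the slaves file above, the report should show five live datanodes:

hadoop@master:~$ hdfs dfsadmin -report | grep "Live datanodes"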

14. Check the web UIs (port 50070 is the NameNode UI, 8088 the YARN ResourceManager UI)

http://172.16.1.110:50070/dfshealth.html#tab-overview

http://172.16.1.110:8088/cluster

http://192.168.1.110:50070/dfshealth.html#tab-overview

15. Run MapReduce jobs as a final test

Estimate pi:

hadoop@master:~$ hadoop jar app/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 1 1

Word count:

With the input on HDFS:
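
input.txt is assumed to already exist in the master's home directory; any text file will do, e.g. a throwaway sample:

hadoop@master:~$ echo "hello hadoop hello world" > input.txt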

hadoop@master:~$ hadoop fs -put input.txt /

hadoop@master:~$ hadoop jar app/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /input.txt /output

hadoop@master:~$ hadoop fs -cat /output/part-r-00000 |head -3

hadoop@master:~$ hadoop fs -rm -r /output        # -rmr is deprecated in Hadoop 2.x

hadoop@master:~$ hadoop jar app/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount hdfs://Master:9000/input.txt /output

Or, with the input on the local filesystem:

hadoop@master:~$ hadoop jar app/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount file:///home/hadoop/input.txt /output

Appendix:

Set up an NTP server (keeps the cluster clocks in sync)

Ubuntu (master):

root@master:~# apt-get install ntp

root@master:~# vi /etc/ntp.conf

server 127.127.1.0        # use the local clock as the reference

fudge 127.127.1.0 stratum 10        # advertise it at a low stratum

root@master:~# service ntp restart

root@master:~# service ntp status

* NTP server is running

CentOS (slaves):

[root@slave1 ~]# yum search ntpd

[root@slave1 ~]# yum install ntpdate -y

[root@slave1 ~]# ntpdate Master

5 Feb 09:34:25 ntpdate[17039]: step time server 172.16.1.110 offset -34689.864848 sec
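
ntpdate performs a one-shot sync. To keep the slaves continuously aligned, a cron entry is a common follow-up (a sketch; /usr/sbin/ntpdate is the usual CentOS location):

[root@slave1 ~]# (crontab -l 2>/dev/null; echo "*/10 * * * * /usr/sbin/ntpdate Master") | crontab -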

Set up an FTP server

root@master:~# apt-get install vsftpd

root@master:~# vi /etc/vsftpd.conf

anon_root=/        # directory exposed to anonymous clients

anonymous_enable=YES        # allow anonymous access

root@master:~# service vsftpd restart

vsftpd stop/waiting

vsftpd start/running, process 8325
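
A quick check that anonymous access works is to fetch any world-readable file from a slave (/etc/hostname is only an example; note that anon_root=/ exposes the whole filesystem tree, so tighten this outside a lab):

[root@slave1 ~]# wget ftp://172.16.1.110/etc/hostname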
