Hadoop 3.3.2 Fully Distributed Cluster Setup, No-Brainer Edition
I. Preparation
1. Servers
Prepare three Linux servers (node-01, node-02, node-03).
Turn off the firewall on all of them: systemctl stop firewalld && systemctl disable firewalld
1) IP mapping (run the following on every server)
Run vim /etc/hosts and append the lines below at the end of the file, replacing the IPs with your own:
192.168.2.101 node-01
192.168.2.102 node-02
192.168.2.103 node-03
2) Set the hostname
On each server run vim /etc/sysconfig/network and edit the HOSTNAME line.
For example, on node-02, set: HOSTNAME=node-02
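Note: on CentOS 7 and later the hostname is no longer read from /etc/sysconfig/network; there, set it with hostnamectl instead, which takes effect immediately. For example, on node-02:
hostnamectl set-hostname node-02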
2. Add a user
Create a hadoop user on each of node-01, node-02, and node-03.
Run two commands: 1) adduser hadoop 2) passwd hadoop to set its password.
3. Set up passwordless SSH
(DSA keys are disabled by default in OpenSSH 7.0+; if key login fails on newer systems, substitute ssh-keygen -t rsa below, rename the files accordingly, and keep dfs.ha.fencing.ssh.private-key-files in hdfs-site.xml in sync.)
Generate keys =================================
On node-01 run: 1) su hadoop 2) ssh-keygen -t dsa -P '' 3) cd /home/hadoop/.ssh/
4) ls -l ; you should see two files, id_dsa and id_dsa.pub 5) cat id_dsa.pub >> authorized_keys
6) scp id_dsa.pub node-02:$(pwd)/node-01.pub ; scp id_dsa.pub node-03:$(pwd)/node-01.pub
On node-02 run: 1) su hadoop 2) ssh-keygen -t dsa -P '' 3) cd /home/hadoop/.ssh/
4) ls -l ; you should see two files, id_dsa and id_dsa.pub 5) cat id_dsa.pub >> authorized_keys
6) scp id_dsa.pub node-01:$(pwd)/node-02.pub ; scp id_dsa.pub node-03:$(pwd)/node-02.pub
On node-03 run: 1) su hadoop 2) ssh-keygen -t dsa -P '' 3) cd /home/hadoop/.ssh/
4) ls -l ; you should see two files, id_dsa and id_dsa.pub 5) cat id_dsa.pub >> authorized_keys
6) scp id_dsa.pub node-01:$(pwd)/node-03.pub ; scp id_dsa.pub node-02:$(pwd)/node-03.pub
Add keys =================================
On node-01 run: 1) su hadoop 2) cd /home/hadoop/.ssh/
3) ls -l ; two new files, node-02.pub and node-03.pub
4) cat node-02.pub >> authorized_keys ; cat node-03.pub >> authorized_keys
On node-02 run: 1) su hadoop 2) cd /home/hadoop/.ssh/
3) ls -l ; two new files, node-01.pub and node-03.pub
4) cat node-01.pub >> authorized_keys ; cat node-03.pub >> authorized_keys
On node-03 run: 1) su hadoop 2) cd /home/hadoop/.ssh/
3) ls -l ; two new files, node-01.pub and node-02.pub
4) cat node-01.pub >> authorized_keys ; cat node-02.pub >> authorized_keys
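To confirm the key exchange worked, a quick check (run as hadoop on each node; every login should succeed without a password prompt):
ssh node-01 hostname
ssh node-02 hostname
ssh node-03 hostname
If you are still prompted for a password, sshd is usually rejecting loose permissions; fix them with: chmod 700 /home/hadoop/.ssh ; chmod 600 /home/hadoop/.ssh/authorized_keys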
4. Downloads
jdk-8u301-linux-x64.tar.gz, zookeeper3.7.0.tar.gz, hadoop-3.3.2.tar.gz
Copy all three archives to each of node-01, node-02, and node-03.
II. Install the software (same steps on all three Linux servers)
1. JDK
1) Remove OpenJDK (skip this step if it is not installed)
First run rpm -qa | grep jdk in the shell to see what is installed, then run yum -y remove copy-jdk-configs-* (yum also removes the OpenJDK packages that depend on it)
2) Copy jdk-8u301-linux-x64.tar.gz into /usr/local and unpack it: tar -zxvf jdk-8u301-linux-x64.tar.gz
3) Run vim /etc/profile and append the following at the end of the file
# jdk 1.8
export JAVA_HOME=/usr/local/jdk1.8.0_301
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
4) Create a symlink; Hadoop needs it later
ln -s /usr/local/jdk1.8.0_301/bin/java /bin/java
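Reload the profile and verify the JDK is visible (the version string assumes the 8u301 tarball above):
source /etc/profile
java -version    # should print java version "1.8.0_301"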
2. ZooKeeper
1) Copy zookeeper3.7.0.tar.gz into /opt and unpack it: tar -zxvf zookeeper3.7.0.tar.gz
2) chown -R hadoop:hadoop /opt/zookeeper-3.7.0/
3) Run vim /etc/profile and append the following at the end of the file
# zookeeper
export ZOOKEEPER_HOME=/opt/zookeeper-3.7.0
export PATH=$PATH:$ZOOKEEPER_HOME/bin
4) Run cp /opt/zookeeper-3.7.0/conf/zoo_sample.cfg /opt/zookeeper-3.7.0/conf/zoo.cfg, then edit zoo.cfg as follows: change the dataDir value and append the three server.N entries
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper/data
clientPort=2181
server.1=node-01:2888:3888
server.2=node-02:2888:3888
server.3=node-03:2888:3888
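One more required step that is easy to miss: each node must have a myid file under dataDir whose content matches its server.N line in zoo.cfg, otherwise ZooKeeper will not start in replicated mode. A sketch, assuming the dataDir above (run as root):
mkdir -p /var/zookeeper/data
chown -R hadoop:hadoop /var/zookeeper
echo 1 > /var/zookeeper/data/myid    # write 1 on node-01, 2 on node-02, 3 on node-03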
3. Install and configure Hadoop
First log in to node-01 as the hadoop user.
1) Copy hadoop-3.3.2.tar.gz into /opt and unpack it: tar -zxvf hadoop-3.3.2.tar.gz (if you unpacked as root, also run chown -R hadoop:hadoop /opt/hadoop-3.3.2/)
2) cd /opt/hadoop-3.3.2/etc/hadoop/ and edit hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and workers
==================hadoop-env.sh
Append at the end: export JAVA_HOME=/usr/local/jdk1.8.0_301
==================core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/hadoop/ha/data</value>
</property>
<property>
<name>hadoop.http.staticuser.user</name>
<value>hadoop</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>node-01:2181,node-02:2181,node-03:2181</value>
</property>
</configuration>
==================hdfs-site.xml
<configuration>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2,nn3</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>node-01:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>node-02:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn3</name>
<value>node-03:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>node-01:9870</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>node-02:9870</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn3</name>
<value>node-03:9870</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://node-01:8485;node-02:8485;node-03:8485/mycluster</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>${hadoop.tmp.dir}/jn</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_dsa</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
</configuration>
==================mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
</configuration>
==================yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-cluster</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>node-03</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>node-02</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>node-03:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>node-02:8088</value>
</property>
<property>
<name>hadoop.zk.address</name>
<value>node-01:2181,node-02:2181,node-03:2181</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>10240</value>
</property>
</configuration>
==================workers
node-01
node-02
node-03
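With the config files done, also append Hadoop to /etc/profile on all three nodes; the hdfs, start-dfs.sh, and start-yarn.sh commands used below assume its bin and sbin directories are on PATH:
# hadoop
export HADOOP_HOME=/opt/hadoop-3.3.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Then run source /etc/profile to apply it.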
3) Distribute the hadoop-3.3.2 tree to node-02 and node-03
Run these three commands (scp's recursive flag is lowercase -r):
cd /opt ; scp -r /opt/hadoop-3.3.2/ node-02:$(pwd) ; scp -r /opt/hadoop-3.3.2/ node-03:$(pwd)
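Before starting anything, make sure the hadoop.tmp.dir path from core-site.xml exists and is writable by the hadoop user on every node; the NameNode and JournalNode cannot create it under root-owned /var themselves. A sketch, assuming the /var/hadoop/ha/data value above (run as root):
mkdir -p /var/hadoop/ha/data
chown -R hadoop:hadoop /var/hadoop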
III. Start the services (in order)
1. Start ZooKeeper (it needs a quorum: more than half the nodes up)
On node-01, node-02, and node-03 run:
1) su hadoop 2) zkServer.sh start 3) jps ; if QuorumPeerMain is listed, it worked
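You can also check the ensemble state directly:
zkServer.sh status    # one node should report Mode: leader, the other two Mode: follower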
2. Start Hadoop
1) Start the JournalNodes
On node-01, node-02, and node-03 run: hdfs --daemon start journalnode
Once all three are done, jps should list a JournalNode process on each node.
2) Format the NameNode
On node-01 run:
a) Format: hdfs namenode -format
Near the end of the output (roughly ten lines from the bottom) you should see ... has been successfully formatted.
b) Start it: hdfs --daemon start namenode
On node-02 and node-03 run:
c) Sync the metadata from nn1: hdfs namenode -bootstrapStandby
d) Start them: hdfs --daemon start namenode
3) Start the DataNodes
On node-01, node-02, and node-03 run: hdfs --daemon start datanode
4) Choose the active NameNode
On node-01 run: hdfs haadmin -transitionToActive nn1
Note: nn2 or nn3 works too; the id just has to match one of the dfs.ha.namenodes.mycluster values in hdfs-site.xml.
Because automatic failover is enabled, this command may refuse to run; in that case use hdfs haadmin -transitionToActive --forcemanual nn1
Manual switching commands: hdfs haadmin -transitionToActive / -transitionToStandby <nn-id>
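Alternatively, since dfs.ha.automatic-failover.enabled is true in hdfs-site.xml, you can let ZooKeeper elect the active NameNode instead of forcing one manually: initialize the HA state in ZooKeeper once, then start a ZKFC next to every NameNode:
hdfs zkfc -formatZK           # run once, on node-01
hdfs --daemon start zkfc      # run on node-01, node-02, and node-03
jps should then show a DFSZKFailoverController on each node, and hdfs haadmin -getServiceState nn1 reports which NameNode was elected active.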
5) Start the YARN cluster
On node-01, run: start-yarn.sh
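A quick sanity check, assuming the rm1/rm2 ids configured above:
jps                                   # ResourceManager on node-02/node-03, NodeManager on all three
yarn rmadmin -getServiceState rm1     # prints active or standby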
IV. Summary
Start HDFS: start-dfs.sh
Stop HDFS: stop-dfs.sh
Start YARN: start-yarn.sh
Stop YARN: stop-yarn.sh
Start everything: start-all.sh
Stop everything: stop-all.sh
Single-daemon start: hadoop-daemon.sh start xxx (deprecated in Hadoop 3) or hdfs --daemon start xxx;
yarn --daemon start resourcemanager / nodemanager
It's late, off to bed for now ... I'll flesh out the details when I get a chance.