Install Hadoop 3.0 on multiple nodes

Posted by starheng

In this walkthrough two Ubuntu machines are used: 192.168.9.128 as the master (hostname hadoop-master) and 192.168.9.153 as the slave (hostname hadoop). Note that both machines must have the same login user, and the JDK and Hadoop must be installed and run as that user. Passwordless SSH login also needs to be configured between the two machines; see the earlier post for how. When Hadoop starts, the master logs into the slave over SSH without a password, using the master's username; if that user does not exist on the slave, startup fails, which is why the master and slave need the same user.
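
A minimal sketch of that prerequisite setup (the earlier post has the details; "user" below stands for the shared account). On both machines, map the hostnames in /etc/hosts:

192.168.9.128 hadoop-master
192.168.9.153 hadoop

Then, on the master and as the shared user, generate a key pair and copy the public key to both hosts:

ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
ssh-copy-id user@hadoop-master
ssh-copy-id user@hadoop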

Step1. Download, extract, and install the JDK; see the earlier post for details.
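
A minimal sketch, assuming OpenJDK 8 from the Ubuntu repositories (the earlier post may use a different JDK or install path):

sudo apt-get install -y openjdk-8-jdk
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # typical Ubuntu location; verify on your machine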

Step2. On the master, download and extract Hadoop; the install directory is /opt/hadoop3.0-distributed/hadoop-3.0.0.
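
For example (the archive URL is an assumption; any Apache mirror carrying hadoop-3.0.0.tar.gz works):

wget https://archive.apache.org/dist/hadoop/common/hadoop-3.0.0/hadoop-3.0.0.tar.gz
sudo mkdir -p /opt/hadoop3.0-distributed
sudo tar -xzf hadoop-3.0.0.tar.gz -C /opt/hadoop3.0-distributed
sudo chown -R $USER:$USER /opt/hadoop3.0-distributed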

Step3. Edit the configuration files under etc/hadoop/:

core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:///opt/hadoop3.0-distributed/hadoop-3.0.0/usr/tmp</value>
  </property>
</configuration>

---------------------------------------

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///opt/hadoop3.0-distributed/hadoop-3.0.0/nameNode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///opt/hadoop3.0-distributed/hadoop-3.0.0/dataNode</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop:9001</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>hadoop-master:9870</value>
  </property>
</configuration>
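
The dfs.namenode.name.dir and dfs.datanode.data.dir paths above sit inside the Hadoop install. Formatting the NameNode (Step5) and starting the DataNode will normally create them, but you can also create them up front on the master (and on the slave after Step4):

mkdir -p /opt/hadoop3.0-distributed/hadoop-3.0.0/nameNode
mkdir -p /opt/hadoop3.0-distributed/hadoop-3.0.0/dataNode
mkdir -p /opt/hadoop3.0-distributed/hadoop-3.0.0/usr/tmp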

---------------------------------------

mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>
      /opt/hadoop3.0-distributed/hadoop-3.0.0/etc/hadoop,
      /opt/hadoop3.0-distributed/hadoop-3.0.0/share/hadoop/common/*,
      /opt/hadoop3.0-distributed/hadoop-3.0.0/share/hadoop/common/lib/*,
      /opt/hadoop3.0-distributed/hadoop-3.0.0/share/hadoop/hdfs/*,
      /opt/hadoop3.0-distributed/hadoop-3.0.0/share/hadoop/hdfs/lib/*,
      /opt/hadoop3.0-distributed/hadoop-3.0.0/share/hadoop/mapreduce/*,
      /opt/hadoop3.0-distributed/hadoop-3.0.0/share/hadoop/mapreduce/lib/*,
      /opt/hadoop3.0-distributed/hadoop-3.0.0/share/hadoop/yarn/*,
      /opt/hadoop3.0-distributed/hadoop-3.0.0/share/hadoop/yarn/lib/*
    </value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/opt/hadoop3.0-distributed/hadoop-3.0.0</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=/opt/hadoop3.0-distributed/hadoop-3.0.0</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=/opt/hadoop3.0-distributed/hadoop-3.0.0</value>
  </property>
</configuration>

--------------------------------------

yarn-site.xml

<configuration>

  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop-master:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop-master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop-master:8040</value>
  </property>
  <property>
    <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
    <value>99.9</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
    <description>Whether virtual memory limits will be enforced for containers</description>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4</value>
    <description>Ratio of virtual memory to physical memory when setting memory limits for containers</description>
  </property>
</configuration>

The yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage property above is there to fix the unhealthy nodes reported at 192.168.9.128:8088/cluster/nodes after starting Hadoop (the error said disk usage had exceeded 90%).

The yarn.nodemanager.vmem-check-enabled and yarn.nodemanager.vmem-pmem-ratio properties are there to fix the error that appeared when running the MapReduce example: 2.4GB of 2.1GB virtual memory used. Killing container.

-----------------------------

Add the slave's hostname to the workers file:

hadoop

-------------------------------

As in the earlier post, add JAVA_HOME to hadoop-env.sh, mapred-env.sh, and yarn-env.sh.
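
For example, add a line like this to each of those files (the JDK path is an assumption; use your actual install location):

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64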

Step4. Copy the installation to the slave with scp -r /opt/hadoop3.0-distributed [email protected]:/opt/hadoop3.0-distributed, where user is the account shared by the two Ubuntu machines.

Step5. Format the NameNode: bin/hdfs namenode -format

Step6. Start Hadoop: sbin/start-all.sh. Check the log files; if there are error messages, resolve them before continuing. If startup succeeds, both http://192.168.9.128:8088 and http://192.168.9.128:9870 should open.
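
You can also check the daemons with jps on each machine. Given the configuration above you would roughly expect NameNode and ResourceManager on hadoop-master, and DataNode, NodeManager, and SecondaryNameNode on hadoop (the secondary NameNode lands on the slave because dfs.namenode.secondary.http-address points at hadoop):

jps    # run on hadoop-master, then on hadoop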

Step7. Run the MapReduce example as a further check; see the earlier post. You can also run the wordcount example, which counts the words in yarn-site.xml: bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar wordcount /user/hduser/input/yarn-site.xml output/wordcount
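
If the input file is not in HDFS yet, a typical sequence looks like this (the /user/hduser path matches the command above; adjust it to your own user, and note that output/wordcount must not already exist):

bin/hdfs dfs -mkdir -p /user/hduser/input
bin/hdfs dfs -put etc/hadoop/yarn-site.xml /user/hduser/input/
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar wordcount /user/hduser/input/yarn-site.xml output/wordcount
bin/hdfs dfs -cat output/wordcount/part-r-00000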
