In practice two Ubuntu machines are used, one with IP 192.168.9.128 and one with IP 192.168.9.153. The first serves as the master (hostname hadoop-master) and the second as the slave (hostname hadoop). Note that the two machines must have the same login user, and the JDK and Hadoop must both be installed under that user. Passwordless SSH login between the two machines also has to be configured; see the earlier article for the procedure. When Hadoop starts, the master logs into the slave over passwordless SSH using the master's user name; if that user does not exist on the slave, startup fails. This is why the master and the slave need the same user.
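As a reminder, a minimal sketch of the passwordless SSH setup described in the earlier article, run as the shared user (here called user, matching Step 4 below):
# on hadoop-master, as the shared user
ssh-keygen -t rsa                 # accept the defaults
ssh-copy-id user@192.168.9.153    # copy the public key to the slave
ssh user@192.168.9.153            # should now log in without a password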
Step 1. Download, extract and install the JDK; see the earlier article for details.
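For reference, a rough sketch of that JDK install; the tarball name and the /opt/jdk1.8.0_161 path are assumptions and should match whatever JDK you actually downloaded:
# on both machines, as the shared user
tar -xzf jdk-8u161-linux-x64.tar.gz -C /opt/
export JAVA_HOME=/opt/jdk1.8.0_161     # also add these two lines to ~/.bashrc
export PATH=$JAVA_HOME/bin:$PATH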
Step 2. On the master, download and extract Hadoop into /opt/hadoop3.0-distributed/hadoop-3.0.0.
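A sketch of that download and extraction; the mirror URL is an assumption, any Apache mirror works:
# on hadoop-master
mkdir -p /opt/hadoop3.0-distributed
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.0.0/hadoop-3.0.0.tar.gz
tar -xzf hadoop-3.0.0.tar.gz -C /opt/hadoop3.0-distributed/
cd /opt/hadoop3.0-distributed/hadoop-3.0.0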
Step 3. Edit the configuration files under etc/hadoop/:
core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:///opt/hadoop3.0-distributed/hadoop-3.0.0/usr/tmp</value>
  </property>
</configuration>
---------------------------------------
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///opt/hadoop3.0-distributed/hadoop-3.0.0/nameNode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///opt/hadoop3.0-distributed/hadoop-3.0.0/dataNode</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop:9001</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>hadoop-master:9870</value>
  </property>
</configuration>
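If the logs later complain about missing or non-writable storage directories, the paths referenced in core-site.xml and hdfs-site.xml above can be created by hand (repeat on the slave after the scp in Step 4):
# run from /opt/hadoop3.0-distributed/hadoop-3.0.0
mkdir -p nameNode dataNode usr/tmp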
---------------------------------------
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>
      /opt/hadoop3.0-distributed/hadoop-3.0.0/etc/hadoop,
      /opt/hadoop3.0-distributed/hadoop-3.0.0/share/hadoop/common/*,
      /opt/hadoop3.0-distributed/hadoop-3.0.0/share/hadoop/common/lib/*,
      /opt/hadoop3.0-distributed/hadoop-3.0.0/share/hadoop/hdfs/*,
      /opt/hadoop3.0-distributed/hadoop-3.0.0/share/hadoop/hdfs/lib/*,
      /opt/hadoop3.0-distributed/hadoop-3.0.0/share/hadoop/mapreduce/*,
      /opt/hadoop3.0-distributed/hadoop-3.0.0/share/hadoop/mapreduce/lib/*,
      /opt/hadoop3.0-distributed/hadoop-3.0.0/share/hadoop/yarn/*,
      /opt/hadoop3.0-distributed/hadoop-3.0.0/share/hadoop/yarn/lib/*
    </value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/opt/hadoop3.0-distributed/hadoop-3.0.0</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=/opt/hadoop3.0-distributed/hadoop-3.0.0</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=/opt/hadoop3.0-distributed/hadoop-3.0.0</value>
  </property>
</configuration>
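The long mapreduce.application.classpath value above just lists the standard Hadoop library directories under the install path. As a cross-check, bin/hadoop classpath prints the classpath the Hadoop scripts themselves compute:
# prints the classpath Hadoop itself uses; compare it with the value above
bin/hadoop classpath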
--------------------------------------
yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop-master:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop-master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop-master:8040</value>
  </property>
  <property>
    <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
    <value>99.9</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
    <description>Whether virtual memory limits will be enforced for containers</description>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4</value>
    <description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
  </property>
</configuration>
The yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage property (the part marked in blue) fixes the unhealthy nodes shown at 192.168.9.128:8088/cluster/nodes after Hadoop starts (the reported error is that disk usage exceeds 90%).
The yarn.nodemanager.vmem-check-enabled and yarn.nodemanager.vmem-pmem-ratio properties (the part marked in red) fix the error seen when running the MapReduce example: 2.4GB of 2.1GB virtual memory used. Killing container.
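Once the cluster is up (Step 6), node health can also be checked from the command line, which is a quick way to confirm the unhealthy-node problem is gone:
# lists all NodeManagers and their state (RUNNING, UNHEALTHY, ...)
bin/yarn node -list -all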
-----------------------------
Add the slave to the workers file:
hadoop
-------------------------------
As in the earlier article, add JAVA_HOME to hadoop-env.sh, mapred-env.sh and yarn-env.sh.
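The line to add is the same in all three files; the JDK path below is an assumption and should match your own install:
# append to etc/hadoop/hadoop-env.sh, mapred-env.sh and yarn-env.sh
export JAVA_HOME=/opt/jdk1.8.0_161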
Step 4. Copy the installation to the slave with scp -r /opt/hadoop3.0-distributed user@192.168.9.153:/opt/hadoop3.0-distributed, where user is the user shared by the two Ubuntu machines.
Step 5. Format the NameNode: bin/hdfs namenode -format
Step 6. Start Hadoop: sbin/start-all.sh. Check the log files; if there are error messages, fix the problems they point to. If startup succeeds, both http://192.168.9.128:8088 and http://192.168.9.128:9870 can be opened.
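Besides the two web UIs, jps (shipped with the JDK) is a quick way to confirm the expected daemons are running; with the configuration above, roughly the following should show up:
# on hadoop-master
jps    # expect NameNode and ResourceManager
# on the slave (hadoop)
jps    # expect DataNode, NodeManager and SecondaryNameNode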
Step 7. Run the MapReduce example for further verification; see the earlier article. You can also run the wordcount example: bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar wordcount /user/hduser/input/yarn-site.xml output/wordcount, which counts the words in yarn-site.xml.
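The input path /user/hduser/input/yarn-site.xml has to exist in HDFS before the job runs; a sketch of the full sequence, assuming the local etc/hadoop/yarn-site.xml is used as input:
# upload the input file into HDFS
bin/hdfs dfs -mkdir -p /user/hduser/input
bin/hdfs dfs -put etc/hadoop/yarn-site.xml /user/hduser/input/
# run the wordcount example
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar wordcount /user/hduser/input/yarn-site.xml output/wordcount
# inspect the result (the relative output path resolves under the current user's HDFS home)
bin/hdfs dfs -cat output/wordcount/part-r-00000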