Hadoop Environment Setup
Posted by 徐伟的博客
Configure a Static IP on Linux
1. Edit /etc/sysconfig/network-scripts/ifcfg-ens33
[root@hadoop-senior01 ~]# vi /etc/sysconfig/network-scripts/ifcfg-ens33
TYPE="Ethernet"
BOOTPROTO="static" # static IP
DEFROUTE="yes"
PEERDNS="yes"
PEERROUTES="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_PEERDNS="yes"
IPV6_PEERROUTES="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens33"
UUID="a95ef74c-c9df-4d72-bae5-5820a50e6228"
DEVICE="ens33"
ONBOOT="yes"
IPADDR=192.168.95.128 # IP address
GATEWAY=192.168.95.2 # gateway
NETMASK=255.255.255.0 # subnet mask
DNS1=180.76.76.76 # DNS
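After saving the file, restart the network service so the new settings take effect (the usual step on CentOS 7, which this guide uses; other distributions may use a different service name):
[root@hadoop-senior01 ~]# systemctl restart network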
2. Test network connectivity
[root@hadoop-senior01 ~]# ping www.baidu.com
PING www.baidu.com (14.215.177.38) 56(84) bytes of data.
64 bytes from 14.215.177.38 (14.215.177.38): icmp_seq=1 ttl=128 time=10.8 ms
64 bytes from 14.215.177.38 (14.215.177.38): icmp_seq=2 ttl=128 time=10.7 ms
64 bytes from 14.215.177.38 (14.215.177.38): icmp_seq=3 ttl=128 time=10.2 ms
64 bytes from 14.215.177.38 (14.215.177.38): icmp_seq=4 ttl=128 time=9.48 ms
64 bytes from 14.215.177.38 (14.215.177.38): icmp_seq=5 ttl=128 time=9.93 ms
^C
--- www.baidu.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4008ms
rtt min/avg/max/mdev = 9.485/10.249/10.850/0.512 ms
Change the Hostname
1. View and set the hostname
[root@hadoop-senior01 ~]# hostname
192.168.95.128
[root@hadoop-senior01 ~]# hostnamectl set-hostname hadoop-senior01.ibeifeng.com # change the hostname
2. Configure the hosts file on the local Windows machine
C:\Windows\System32\drivers\etc\hosts
## hadoop-senior
192.168.95.128 hadoop-senior01.ibeifeng.com hadoop-senior01
3. Configure the network mapping
[root@hadoop-senior01 ~]# vi /etc/hosts
# 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
# ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.95.128 hadoop-senior01.ibeifeng.com hadoop-senior01
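To confirm the mapping works, ping the new hostname (a quick sanity check):
[root@hadoop-senior01 ~]# ping -c 3 hadoop-senior01.ibeifeng.com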
4. Reboot
init 6
Create a Regular User
[root@hadoop-senior01 ~]# useradd beifeng
[root@hadoop-senior01 ~]# echo '123456' | passwd --stdin beifeng
[root@hadoop-senior01 ~]# su - beifeng
Configure sudo Privileges
[root@hadoop-senior01 ~]# visudo
## Allow root to run any commands anywhere
root ALL=(ALL) ALL # find this line
beifeng ALL=(ALL) ALL # add this line below it
Method 2
su -
echo 'beifeng ALL=(ALL) ALL' >> /etc/sudoers
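Note that appending to /etc/sudoers directly skips the syntax check that visudo performs, so Method 1 is safer. Either way, the grant can be verified as the new user:
[beifeng@hadoop-senior01 ~]$ sudo -l # list the sudo rules that apply to beifeng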
Set Up the Environment
1. Plan the directory layout
[beifeng@hadoop-senior01 opt]$ sudo rm -rf ./* # remove everything under /opt
[beifeng@hadoop-senior01 opt]$ sudo mkdir software # create the needed directories
[beifeng@hadoop-senior01 opt]$ sudo mkdir modules
[beifeng@hadoop-senior01 opt]$ sudo mkdir datas
[beifeng@hadoop-senior01 opt]$ sudo mkdir tools
[beifeng@hadoop-senior01 opt]$ ll
total 0
drwxr-xr-x. 2 root root 6 Nov 10 07:25 datas
drwxr-xr-x. 2 root root 6 Nov 10 07:25 modules
drwxr-xr-x. 2 root root 6 Nov 10 07:24 software
drwxr-xr-x. 2 root root 6 Nov 10 07:25 tools
[beifeng@hadoop-senior01 opt]$ sudo chown -R beifeng:beifeng * # change the owner
[beifeng@hadoop-senior01 opt]$ ll
total 0
drwxr-xr-x. 2 beifeng beifeng 6 Nov 10 07:25 datas
drwxr-xr-x. 2 beifeng beifeng 6 Nov 10 07:25 modules
drwxr-xr-x. 2 beifeng beifeng 6 Nov 10 07:24 software
drwxr-xr-x. 2 beifeng beifeng 6 Nov 10 07:25 tools
2. Install the rz tool (to upload files from the local machine)
[beifeng@hadoop-senior01 opt]$ sudo yum -y install lrzsz
[beifeng@hadoop-senior01 ~]$ cd /opt/software/ # go to software
[beifeng@hadoop-senior01 software]$ rz # upload local files into this directory
3. Extract the archives
[beifeng@hadoop-senior01 software]$ tar -zxf jdk-7u67-linux-x64.tar.gz -C /opt/modules/
[beifeng@hadoop-senior01 software]$ tar -zxf hadoop-2.5.0.tar.gz -C /opt/modules/
4. Configure the JAVA environment variables
[beifeng@hadoop-senior01 ~]$ sudo vi /etc/profile
#JAVA_HOME
export JAVA_HOME=/opt/modules/jdk1.7.0_67
export PATH=$JAVA_HOME/bin:$PATH
Method 2
su -
echo '#JAVA_HOME
export JAVA_HOME=/opt/modules/jdk1.7.0_67
export PATH=$JAVA_HOME/bin:$PATH' >> /etc/profile
[beifeng@hadoop-senior01 jdk1.7.0_67]$ su -
[root@hadoop-senior01 ~]# source /etc/profile # reload
[root@hadoop-senior01 ~]# exit
[beifeng@hadoop-senior01 ~]$ java -version # check the version
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
5. Delete the doc directory (English documentation; removing it frees disk space)
[beifeng@hadoop-senior01 hadoop-2.5.0]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 38G 7.1G 31G 19% /
devtmpfs 474M 0 474M 0% /dev
tmpfs 489M 84K 489M 1% /dev/shm
tmpfs 489M 7.2M 482M 2% /run
tmpfs 489M 0 489M 0% /sys/fs/cgroup
/dev/sda1 297M 152M 146M 51% /boot
tmpfs 98M 16K 98M 1% /run/user/42
tmpfs 98M 0 98M 0% /run/user/0
[beifeng@hadoop-senior01 share]$ rm -rf doc/
[beifeng@hadoop-senior01 share]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 38G 5.6G 33G 15% /
devtmpfs 474M 0 474M 0% /dev
tmpfs 489M 84K 489M 1% /dev/shm
tmpfs 489M 7.2M 482M 2% /run
tmpfs 489M 0 489M 0% /sys/fs/cgroup
/dev/sda1 297M 152M 146M 51% /boot
tmpfs 98M 16K 98M 1% /run/user/42
tmpfs 98M 0 98M 0% /run/user/0
View File Paths
[root@hadoop-senior01 etc]# pwd # print the current directory
/opt/modules/hadoop-2.5.0/etc
[root@hadoop-senior01 hadoop-2.5.0]# ls | sed "s:^:`pwd`/: " # prefix each entry with the current directory to show full paths
/opt/modules/hadoop-2.5.0/bin
/opt/modules/hadoop-2.5.0/etc
/opt/modules/hadoop-2.5.0/include
/opt/modules/hadoop-2.5.0/lib
/opt/modules/hadoop-2.5.0/libexec
/opt/modules/hadoop-2.5.0/sbin
/opt/modules/hadoop-2.5.0/share
[root@hadoop-senior01 modules]# find /opt/modules/hadoop-2.5.0/etc/hadoop/ # find prints the absolute path of every file under the given directory
/opt/modules/hadoop-2.5.0/etc/hadoop/
/opt/modules/hadoop-2.5.0/etc/hadoop/capacity-scheduler.xml
/opt/modules/hadoop-2.5.0/etc/hadoop/configuration.xsl
/opt/modules/hadoop-2.5.0/etc/hadoop/container-executor.cfg
/opt/modules/hadoop-2.5.0/etc/hadoop/core-site.xml
/opt/modules/hadoop-2.5.0/etc/hadoop/hadoop-env.cmd
/opt/modules/hadoop-2.5.0/etc/hadoop/hadoop-env.sh
/opt/modules/hadoop-2.5.0/etc/hadoop/hadoop-metrics.properties
/opt/modules/hadoop-2.5.0/etc/hadoop/hadoop-metrics2.properties
/opt/modules/hadoop-2.5.0/etc/hadoop/hadoop-policy.xml
/opt/modules/hadoop-2.5.0/etc/hadoop/hdfs-site.xml
/opt/modules/hadoop-2.5.0/etc/hadoop/httpfs-env.sh
/opt/modules/hadoop-2.5.0/etc/hadoop/httpfs-log4j.properties
/opt/modules/hadoop-2.5.0/etc/hadoop/httpfs-signature.secret
/opt/modules/hadoop-2.5.0/etc/hadoop/httpfs-site.xml
/opt/modules/hadoop-2.5.0/etc/hadoop/log4j.properties
/opt/modules/hadoop-2.5.0/etc/hadoop/mapred-env.cmd
/opt/modules/hadoop-2.5.0/etc/hadoop/mapred-env.sh
/opt/modules/hadoop-2.5.0/etc/hadoop/mapred-queues.xml.template
/opt/modules/hadoop-2.5.0/etc/hadoop/mapred-site.xml.template
/opt/modules/hadoop-2.5.0/etc/hadoop/slaves
/opt/modules/hadoop-2.5.0/etc/hadoop/ssl-client.xml.example
/opt/modules/hadoop-2.5.0/etc/hadoop/ssl-server.xml.example
/opt/modules/hadoop-2.5.0/etc/hadoop/yarn-env.cmd
/opt/modules/hadoop-2.5.0/etc/hadoop/yarn-env.sh
/opt/modules/hadoop-2.5.0/etc/hadoop/yarn-site.xml
Configure HDFS, Start It, and Test Reading/Writing Files
1. Set the Java installation directory
Note: point the Hadoop, YARN, and MapReduce modules at the Java installation in the following files:
etc/hadoop/hadoop-env.sh
etc/hadoop/mapred-env.sh
etc/hadoop/yarn-env.sh
[root@hadoop-senior01 ~]# echo ${JAVA_HOME} # check the location
/opt/modules/jdk1.7.0_67
# The java implementation to use.
export JAVA_HOME=${JAVA_HOME} # replace this with the install path
export JAVA_HOME=/opt/modules/jdk1.7.0_67
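One way to apply the same value to all three env files in a single step (a sketch; it only rewrites uncommented export lines, so a file where JAVA_HOME is commented out still needs a manual edit):
[beifeng@hadoop-senior01 hadoop-2.5.0]$ for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/opt/modules/jdk1.7.0_67|' etc/hadoop/$f; done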
[root@hadoop-senior01 hadoop-2.5.0]# bin/hadoop # with no arguments, prints the hadoop script usage
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
fs run a generic filesystem user client
version print the version
jar <jar> run a jar file
checknative [-a|-h] check native hadoop and compression libraries availability
distcp <srcurl> <desturl> copy file or directories recursively
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
classpath prints the class path needed to get the
Hadoop jar and the required libraries
daemonlog get/set the log level for each daemon
or
CLASSNAME run the class named CLASSNAME
2. Configure the HDFS-related XML properties
core-site.xml
Note: set the address and RPC port of the NameNode (master node).
fs.defaultFS is the default filesystem URI.
etc/hadoop/core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop-senior01.ibeifeng.com:8020</value>
</property>
Specify where Hadoop stores the files it generates at runtime:
[beifeng@hadoop-senior01 hadoop-2.5.0]$ mkdir -p data/tmp # create the local data directory
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/modules/hadoop-2.5.0/data/tmp</value>
</property>
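For reference, combining the two properties above, the configuration block of etc/hadoop/core-site.xml ends up as:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop-senior01.ibeifeng.com:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/modules/hadoop-2.5.0/data/tmp</value>
    </property>
</configuration>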
slaves file (etc/hadoop/slaves), which lists the DataNode hosts:
hadoop-senior01.ibeifeng.com
hdfs-site.xml
Number of replicas per block; a pseudo-distributed setup has only one DataNode, so set it to 1:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
3. Format the HDFS filesystem
[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs namenode -format # format
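If the format succeeded, the NameNode metadata directory appears under hadoop.tmp.dir (the path below follows the configuration above):
[beifeng@hadoop-senior01 hadoop-2.5.0]$ ls data/tmp/dfs/name/current # should contain fsimage and VERSION files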
4. Start HDFS and test reading/writing files
[beifeng@hadoop-senior01 hadoop-2.5.0]$ jps
47713 Jps
[beifeng@hadoop-senior01 hadoop-2.5.0]$ sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /opt/modules/hadoop-2.5.0/logs/hadoop-beifeng-namenode-hadoop-senior01.ibeifeng.com.out
[beifeng@hadoop-senior01 hadoop-2.5.0]$ sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /opt/modules/hadoop-2.5.0/logs/hadoop-beifeng-datanode-hadoop-senior01.ibeifeng.com.out
[beifeng@hadoop-senior01 hadoop-2.5.0]$ jps # verify the daemons started
47811 DataNode
47875 Jps
47737 NameNode
5. Disable the firewall to reach the web UI
[root@hadoop-senior01 ~]# firewall-cmd --state # check the firewall status
running
[root@hadoop-senior01 ~]# systemctl stop firewalld.service # stop the firewall
[root@hadoop-senior01 ~]# firewall-cmd --state
not running
[root@hadoop-senior01 ~]# systemctl disable firewalld.service # keep firewalld from starting at boot
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service.
On versions before CentOS 7:
service iptables stop # stop
chkconfig iptables off # disable at boot
HDFS web UI: http://hadoop-senior01.ibeifeng.com:50070
Create Directories
[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -mkdir -p temp/conf # a relative path (no leading /) is created under the HDFS home directory
[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -mkdir /text
/user/beifeng/temp/conf # shown in the web UI
/text # shown in the web UI
Upload Files
[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -put etc/hadoop/ # with no destination given, the directory is uploaded into the HDFS home directory
In the web UI:
/user/beifeng/hadoop
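The upload can also be confirmed from the command line:
[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -ls /user/beifeng/hadoop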
Read a File
[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -cat /user/beifeng/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Download Files
[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -get /user/beifeng/hadoop/hdfs-site.xml /home/beifeng/Downloads
[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -get /user/beifeng/hadoop/hdfs-site.xml /home/beifeng/Downloads/get-hdfs-site.xml
[beifeng@hadoop-senior01 ~]$ cd Downloads/
[beifeng@hadoop-senior01 Downloads]$ ls
hdfs-site.xml
[beifeng@hadoop-senior01 Downloads]$ ls
get-hdfs-site.xml hdfs-site.xml
Configure YARN, Start It, and Run MapReduce on YARN
1. Configure etc/hadoop/mapred-site.xml:
Rename mapred-site.xml.template to mapred-site.xml (see the command below).
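For example, keeping the template intact by copying it:
[beifeng@hadoop-senior01 hadoop-2.5.0]$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml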
Tell MapReduce to run on YARN:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
2. Configure etc/hadoop/yarn-site.xml:
How reducers fetch map output (the shuffle auxiliary service):
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
Specify where the ResourceManager runs:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-senior01.ibeifeng.com</value>
</property>
3. Start YARN
[beifeng@hadoop-senior01 hadoop-2.5.0]$ sbin/yarn-daemon.sh start resourcemanager
[beifeng@hadoop-senior01 hadoop-2.5.0]$ sbin/yarn-daemon.sh start nodemanager
[beifeng@hadoop-senior01 hadoop-2.5.0]$ jps
2690 NameNode
8402 Jps
8309 NodeManager
2749 DataNode
8061 ResourceManager
[beifeng@hadoop-senior01 hadoop-2.5.0]$ sudo find /tmp/ -name '*.pid' # locate the daemon pid files
/tmp/hadoop-beifeng-namenode.pid
/tmp/hadoop-beifeng-datanode.pid
/tmp/yarn-beifeng-resourcemanager.pid
/tmp/yarn-beifeng-nodemanager.pid
YARN web UI: port 8088
HDFS web UI: port 50070
http://hadoop-senior01.ibeifeng.com:8088/
Run the MapReduce WordCount Program on YARN
1. Create a directory (the input path)
[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -mkdir -p /user/beifeng/wordcount/input # create the input path
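The local file to be uploaded must exist first; a small sample can be created like this (the contents are just an example, and /opt/datas was created during the directory planning step):
[beifeng@hadoop-senior01 hadoop-2.5.0]$ echo 'hadoop yarn mapreduce hadoop hdfs hadoop' > /opt/datas/test1.input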
2. Upload the file to be processed into input
[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -put /opt/datas/test1.input /user/beifeng/wordcount/input # upload the local file
List the programs available in this jar
[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
3. Run the program (the output path must not exist beforehand)
[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /user/beifeng/wordcount/input /user/beifeng/wordcount/output # jar - program - input - output
4. View the results
[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -text /user/beifeng/wordcount/output/part* # view the results
Authentication 1
Authorization 1
Availability 2
Browse 2
Building 1
Built 1
By 1
C 1
CHANGES.txt 3
CLI 1
Cache 2
Capacity 1
Centralized 1
Circuit 1
Cluster 5
Cluster. 2
Commands 2
Common 2
Compatibility 1
Compatibilty 1
Configuration 4
Configure 1
Copy 2
DataNode 1
Deploy 1
Deprecated 1
Dist 1
DistCp 1
Distributed 2
Download 3
Edits 1
Daemon Start/Stop Commands
sbin/hadoop-daemon.sh start datanode
sbin/hadoop-daemon.sh start namenode
sbin/yarn-daemon.sh start nodemanager
sbin/yarn-daemon.sh start resourcemanager
sbin/hadoop-daemon.sh stop datanode
sbin/hadoop-daemon.sh stop namenode
sbin/yarn-daemon.sh stop nodemanager
sbin/yarn-daemon.sh stop resourcemanager
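The four start commands can be collected into a small startup script, e.g. (a minimal sketch; HADOOP_HOME follows the layout used above):
#!/bin/bash
# start-hadoop.sh: start the pseudo-distributed daemons in order
HADOOP_HOME=/opt/modules/hadoop-2.5.0
cd "$HADOOP_HOME" || exit 1
sbin/hadoop-daemon.sh start namenode
sbin/hadoop-daemon.sh start datanode
sbin/yarn-daemon.sh start resourcemanager
sbin/yarn-daemon.sh start nodemanager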