Hadoop Environment Setup

Posted by 徐伟的博客 (Xu Wei's blog)


Configure a static IP on Linux

1. Edit /etc/sysconfig/network-scripts/ifcfg-ens33
[root@hadoop-senior01 ~]# vi /etc/sysconfig/network-scripts/ifcfg-ens33

TYPE="Ethernet"
BOOTPROTO="static" #static IP
DEFROUTE="yes"
PEERDNS="yes"
PEERROUTES="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_PEERDNS="yes"
IPV6_PEERROUTES="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens33"
UUID="a95ef74c-c9df-4d72-bae5-5820a50e6228"
DEVICE="ens33"
ONBOOT="yes"
IPADDR=192.168.95.128  #IP address
GATEWAY=192.168.95.2  #gateway
NETMASK=255.255.255.0  #subnet mask
DNS1=180.76.76.76   #DNS server
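
After saving, restart the network service so the static IP takes effect (a minimal check, assuming CentOS 7 with the legacy network service available):

[root@hadoop-senior01 ~]# systemctl restart network #apply the new configuration
[root@hadoop-senior01 ~]# ip addr show ens33 #confirm the address is assigned
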
2. Test network connectivity
[root@hadoop-senior01 ~]# ping www.baidu.com
PING www.baidu.com (14.215.177.38) 56(84) bytes of data.
64 bytes from 14.215.177.38 (14.215.177.38): icmp_seq=1 ttl=128 time=10.8 ms
64 bytes from 14.215.177.38 (14.215.177.38): icmp_seq=2 ttl=128 time=10.7 ms
64 bytes from 14.215.177.38 (14.215.177.38): icmp_seq=3 ttl=128 time=10.2 ms
64 bytes from 14.215.177.38 (14.215.177.38): icmp_seq=4 ttl=128 time=9.48 ms
64 bytes from 14.215.177.38 (14.215.177.38): icmp_seq=5 ttl=128 time=9.93 ms
^C
--- www.baidu.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4008ms
rtt min/avg/max/mdev = 9.485/10.249/10.850/0.512 ms

Change the hostname

1. hostname
[root@hadoop-senior01 ~]# hostname
192.168.95.128
[root@hadoop-senior01 ~]# hostnamectl set-hostname hadoop-senior01.ibeifeng.com #change the hostname
2. Configure the hosts file on the local Windows machine

C:\Windows\System32\drivers\etc\hosts

## hadoop-senior
192.168.95.128   hadoop-senior01.ibeifeng.com hadoop-senior01
3. Configure the network mapping
[root@hadoop-senior01 ~]# vi /etc/hosts

# 127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
# ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.95.128   hadoop-senior01.ibeifeng.com hadoop-senior01
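
To confirm the mapping resolves (a quick check; the short alias should answer from 192.168.95.128):

[root@hadoop-senior01 ~]# ping -c 3 hadoop-senior01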
4. Reboot

init 6

Create a regular user

[root@hadoop-senior01 ~]# useradd beifeng
[root@hadoop-senior01 ~]# echo '123456' | passwd --stdin beifeng
[root@hadoop-senior01 ~]# su - beifeng

Grant sudo privileges

[root@hadoop-senior01 ~]# visudo
## Allow root to run any commands anywhere

root    ALL=(ALL)       ALL    #find this line
beifeng ALL=(ALL)       ALL    #add this line below it

Method 2

su -
echo 'beifeng ALL=(ALL) ALL' >> /etc/sudoers
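
To verify the grant took effect (sudo -l lists what the user may run):

[root@hadoop-senior01 ~]# su - beifeng
[beifeng@hadoop-senior01 ~]$ sudo -l #should show (ALL) ALL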

Set up the environment

1. Plan the directory layout

[beifeng@hadoop-senior01 opt]$ sudo rm -rf ./* #remove everything under /opt
[beifeng@hadoop-senior01 opt]$ sudo mkdir software #create the needed directories
[beifeng@hadoop-senior01 opt]$ sudo mkdir modules
[beifeng@hadoop-senior01 opt]$ sudo mkdir datas
[beifeng@hadoop-senior01 opt]$ sudo mkdir tools
[beifeng@hadoop-senior01 opt]$ ll
total 0
drwxr-xr-x. 2 root root 6 Nov 10 07:25 datas
drwxr-xr-x. 2 root root 6 Nov 10 07:25 modules
drwxr-xr-x. 2 root root 6 Nov 10 07:24 software
drwxr-xr-x. 2 root root 6 Nov 10 07:25 tools
[beifeng@hadoop-senior01 opt]$ sudo chown -R beifeng:beifeng * #change the owner
[beifeng@hadoop-senior01 opt]$ ll
total 0
drwxr-xr-x. 2 beifeng beifeng 6 Nov 10 07:25 datas
drwxr-xr-x. 2 beifeng beifeng 6 Nov 10 07:25 modules
drwxr-xr-x. 2 beifeng beifeng 6 Nov 10 07:24 software
drwxr-xr-x. 2 beifeng beifeng 6 Nov 10 07:25 tools

2. Install rz (to upload files from the local machine)

[beifeng@hadoop-senior01 opt]$ sudo yum -y install lrzsz
[beifeng@hadoop-senior01 ~]$ cd /opt/software/ #go to software
[beifeng@hadoop-senior01 software]$ rz #upload local files into this directory

3. Extract the archives

[beifeng@hadoop-senior01 software]$ tar -zxf jdk-7u67-linux-x64.tar.gz -C /opt/modules/
[beifeng@hadoop-senior01 software]$ tar -zxf hadoop-2.5.0.tar.gz -C /opt/modules/

4. Configure the JAVA environment variables

[beifeng@hadoop-senior01 ~]$ sudo vi /etc/profile
#JAVA_HOME
export JAVA_HOME=/opt/modules/jdk1.7.0_67
export PATH=$JAVA_HOME/bin:$PATH

Method 2

su -
echo '#JAVA_HOME
export JAVA_HOME=/opt/modules/jdk1.7.0_67
export PATH=$JAVA_HOME/bin:$PATH' >> /etc/profile
[beifeng@hadoop-senior01 jdk1.7.0_67]$ su -
[root@hadoop-senior01 ~]# source /etc/profile #reload
[root@hadoop-senior01 ~]# exit
[beifeng@hadoop-senior01 ~]$ java -version #check the version
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

5. Delete the doc directory (offline documentation; not needed here)

[beifeng@hadoop-senior01 hadoop-2.5.0]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3        38G  7.1G   31G  19% /
devtmpfs        474M     0  474M   0% /dev
tmpfs           489M   84K  489M   1% /dev/shm
tmpfs           489M  7.2M  482M   2% /run
tmpfs           489M     0  489M   0% /sys/fs/cgroup
/dev/sda1       297M  152M  146M  51% /boot
tmpfs            98M   16K   98M   1% /run/user/42
tmpfs            98M     0   98M   0% /run/user/0
[beifeng@hadoop-senior01 share]$ rm -rf doc/
[beifeng@hadoop-senior01 share]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3        38G  5.6G   33G  15% /
devtmpfs        474M     0  474M   0% /dev
tmpfs           489M   84K  489M   1% /dev/shm
tmpfs           489M  7.2M  482M   2% /run
tmpfs           489M     0  489M   0% /sys/fs/cgroup
/dev/sda1       297M  152M  146M  51% /boot
tmpfs            98M   16K   98M   1% /run/user/42
tmpfs            98M     0   98M   0% /run/user/0

View file paths

[root@hadoop-senior01 etc]# pwd #print the current directory
/opt/modules/hadoop-2.5.0/etc
[root@hadoop-senior01 hadoop-2.5.0]# ls | sed "s:^:$(pwd)/:" #print the absolute path of every entry
/opt/modules/hadoop-2.5.0/bin
/opt/modules/hadoop-2.5.0/etc
/opt/modules/hadoop-2.5.0/include
/opt/modules/hadoop-2.5.0/lib
/opt/modules/hadoop-2.5.0/libexec
/opt/modules/hadoop-2.5.0/sbin
/opt/modules/hadoop-2.5.0/share
[root@hadoop-senior01 modules]# find /opt/modules/hadoop-2.5.0/etc/hadoop/ #find prints the absolute path of every file under the directory
/opt/modules/hadoop-2.5.0/etc/hadoop/
/opt/modules/hadoop-2.5.0/etc/hadoop/capacity-scheduler.xml
/opt/modules/hadoop-2.5.0/etc/hadoop/configuration.xsl
/opt/modules/hadoop-2.5.0/etc/hadoop/container-executor.cfg
/opt/modules/hadoop-2.5.0/etc/hadoop/core-site.xml
/opt/modules/hadoop-2.5.0/etc/hadoop/hadoop-env.cmd
/opt/modules/hadoop-2.5.0/etc/hadoop/hadoop-env.sh
/opt/modules/hadoop-2.5.0/etc/hadoop/hadoop-metrics.properties
/opt/modules/hadoop-2.5.0/etc/hadoop/hadoop-metrics2.properties
/opt/modules/hadoop-2.5.0/etc/hadoop/hadoop-policy.xml
/opt/modules/hadoop-2.5.0/etc/hadoop/hdfs-site.xml
/opt/modules/hadoop-2.5.0/etc/hadoop/httpfs-env.sh
/opt/modules/hadoop-2.5.0/etc/hadoop/httpfs-log4j.properties
/opt/modules/hadoop-2.5.0/etc/hadoop/httpfs-signature.secret
/opt/modules/hadoop-2.5.0/etc/hadoop/httpfs-site.xml
/opt/modules/hadoop-2.5.0/etc/hadoop/log4j.properties
/opt/modules/hadoop-2.5.0/etc/hadoop/mapred-env.cmd
/opt/modules/hadoop-2.5.0/etc/hadoop/mapred-env.sh
/opt/modules/hadoop-2.5.0/etc/hadoop/mapred-queues.xml.template
/opt/modules/hadoop-2.5.0/etc/hadoop/mapred-site.xml.template
/opt/modules/hadoop-2.5.0/etc/hadoop/slaves
/opt/modules/hadoop-2.5.0/etc/hadoop/ssl-client.xml.example
/opt/modules/hadoop-2.5.0/etc/hadoop/ssl-server.xml.example
/opt/modules/hadoop-2.5.0/etc/hadoop/yarn-env.cmd
/opt/modules/hadoop-2.5.0/etc/hadoop/yarn-env.sh
/opt/modules/hadoop-2.5.0/etc/hadoop/yarn-site.xml

Configure HDFS, start it, and test reading and writing files

1. Set the JAVA installation directory

Note: point the Hadoop, YARN, and MapReduce modules at the JAVA installation in the following files:
etc/hadoop/hadoop-env.sh
etc/hadoop/mapred-env.sh
etc/hadoop/yarn-env.sh

[root@hadoop-senior01 ~]# echo ${JAVA_HOME} #check the installation path
/opt/modules/jdk1.7.0_67
# The java implementation to use.
export JAVA_HOME=${JAVA_HOME} #replace this default line in each file with the explicit path:
export JAVA_HOME=/opt/modules/jdk1.7.0_67
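
A quick way to apply the same change to all three env scripts (a sketch using sed; it only rewrites uncommented "export JAVA_HOME=" lines, so double-check each file afterwards):

[beifeng@hadoop-senior01 hadoop-2.5.0]$ sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/opt/modules/jdk1.7.0_67|' etc/hadoop/hadoop-env.sh etc/hadoop/mapred-env.sh etc/hadoop/yarn-env.sh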
[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/hadoop #print the hadoop script usage
Usage: hadoop [--config confdir] COMMAND
       where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME

2. Configure the HDFS-related XML properties

core-site.xml
Note: set the address and RPC port of the master NameNode.
fs.defaultFS is the default file system URI.
etc/hadoop/core-site.xml

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-senior01.ibeifeng.com:8020</value>
</property>

Specify where Hadoop stores the files it generates at runtime:

[beifeng@hadoop-senior01 hadoop-2.5.0]$ mkdir -p data/tmp #create the local tmp directory
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/modules/hadoop-2.5.0/data/tmp</value>
</property>
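
Put together, core-site.xml contains (a sketch combining the two properties above):

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop-senior01.ibeifeng.com:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/modules/hadoop-2.5.0/data/tmp</value>
    </property>
</configuration>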

slaves (lists the DataNode hosts, one per line)

hadoop-senior01.ibeifeng.com

hdfs-site.xml
The number of block replicas; a pseudo-distributed cluster has only one DataNode, so set it to 1:

<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>

3. Format the HDFS file system


[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs namenode -format #format (run once; reformatting wipes HDFS metadata)

4. Start HDFS and test reading and writing files

[beifeng@hadoop-senior01 hadoop-2.5.0]$ jps
47713 Jps
[beifeng@hadoop-senior01 hadoop-2.5.0]$ sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /opt/modules/hadoop-2.5.0/logs/hadoop-beifeng-namenode-hadoop-senior01.ibeifeng.com.out
[beifeng@hadoop-senior01 hadoop-2.5.0]$ sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /opt/modules/hadoop-2.5.0/logs/hadoop-beifeng-datanode-hadoop-senior01.ibeifeng.com.out
[beifeng@hadoop-senior01 hadoop-2.5.0]$ jps #verify the daemons started
47811 DataNode
47875 Jps
47737 NameNode

5. Disable the firewall and open the web UI

[root@hadoop-senior01 ~]# firewall-cmd --state #check the firewall status
running
[root@hadoop-senior01 ~]# systemctl stop firewalld.service #stop the firewall
[root@hadoop-senior01 ~]# firewall-cmd --state
not running
[root@hadoop-senior01 ~]# systemctl disable firewalld.service #keep firewalld from starting at boot
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service.

On versions before CentOS 7:
service iptables stop #stop
chkconfig iptables off #disable at boot

HDFS web UI: http://hadoop-senior01.ibeifeng.com:50070

Create directories
[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -mkdir -p temp/conf #relative path (no leading /), so it is created under /user/beifeng
[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -mkdir /text
/user/beifeng/temp/conf #shown in the web UI
/text #shown in the web UI
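
The same check from the command line instead of the web UI:

[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -ls -R /user/beifeng
[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -ls /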


Upload files

[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -put etc/hadoop/ /user/beifeng/hadoop #upload the local config directory
Shown in the web UI as:
/user/beifeng/hadoop

Read a file
[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -cat /user/beifeng/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>

</configuration>
Download files
[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -get /user/beifeng/hadoop/hdfs-site.xml /home/beifeng/Downloads
[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -get /user/beifeng/hadoop/hdfs-site.xml /home/beifeng/Downloads/get-hdfs-site.xml #download under a different name
[beifeng@hadoop-senior01 ~]$ cd Downloads/
[beifeng@hadoop-senior01 Downloads]$ ls
hdfs-site.xml
[beifeng@hadoop-senior01 Downloads]$ ls
get-hdfs-site.xml  hdfs-site.xml

Configure YARN, start it, and run MapReduce on YARN

1. Configure etc/hadoop/mapred-site.xml

Copy mapred-site.xml.template to mapred-site.xml first, as shown below.
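
A sketch of the copy (using cp keeps the original template around):

[beifeng@hadoop-senior01 hadoop-2.5.0]$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml

Then specify that MapReduce runs on YARN: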

    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

2. Configure etc/hadoop/yarn-site.xml

Set how reducers fetch map output (the auxiliary shuffle service):

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

Specify the ResourceManager host:

    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop-senior01.ibeifeng.com</value>
    </property>
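
Combined, yarn-site.xml contains (a sketch of the two properties together):

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop-senior01.ibeifeng.com</value>
    </property>
</configuration>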

3. Start YARN

[beifeng@hadoop-senior01 hadoop-2.5.0]$ sbin/yarn-daemon.sh start resourcemanager
[beifeng@hadoop-senior01 hadoop-2.5.0]$ sbin/yarn-daemon.sh start nodemanager
[beifeng@hadoop-senior01 hadoop-2.5.0]$ jps
2690 NameNode
8402 Jps
8309 NodeManager
2749 DataNode
8061 ResourceManager
[beifeng@hadoop-senior01 hadoop-2.5.0]$ sudo find /tmp/ -name '*.pid' #locate the daemon pid files
/tmp/hadoop-beifeng-namenode.pid
/tmp/hadoop-beifeng-datanode.pid
/tmp/yarn-beifeng-resourcemanager.pid
/tmp/yarn-beifeng-nodemanager.pid

YARN web UI: port 8088
HDFS web UI: port 50070
http://hadoop-senior01.ibeifeng.com:8088/

Run the MapReduce WordCount example on YARN

1. Create the input directory
[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -mkdir -p /user/beifeng/wordcount/input #create the input path
2. Upload the file to be processed to input

[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -put /opt/datas/test1.input /user/beifeng/wordcount/input #upload the local file
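
Here /opt/datas/test1.input is assumed to already exist; a quick way to create some sample input beforehand (hypothetical contents):

[beifeng@hadoop-senior01 hadoop-2.5.0]$ echo "hadoop yarn mapreduce
hadoop hdfs hadoop" > /opt/datas/test1.input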

List the example programs bundled in the jar

[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
3. Run the program (specifying the output path)
[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /user/beifeng/wordcount/input /user/beifeng/wordcount/output #jar - program - input path - output path
4. Check the results
[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -text /user/beifeng/wordcount/output/part* #view the results

Authentication  1
Authorization   1
Availability    2
Browse  2
Building    1
Built   1
By  1
C   1
CHANGES.txt 3
CLI 1
Cache   2
Capacity    1
Centralized 1
Circuit 1
Cluster 5
Cluster.    2
Commands    2
Common  2
Compatibility   1
Compatibilty    1
Configuration   4
Configure   1
Copy    2
DataNode    1
Deploy  1
Deprecated  1
Dist    1
DistCp  1
Distributed 2
Download    3
Edits   1
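
To see the most frequent words first, the same output can be piped through sort (a sketch; sort -k2 -nr orders by the count column, descending):

[beifeng@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -text /user/beifeng/wordcount/output/part* | sort -k2 -nr | head -10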

Daemon start and stop commands

sbin/hadoop-daemon.sh start datanode
sbin/hadoop-daemon.sh start namenode
sbin/yarn-daemon.sh start nodemanager
sbin/yarn-daemon.sh start resourcemanager
sbin/hadoop-daemon.sh stop datanode
sbin/hadoop-daemon.sh stop namenode
sbin/yarn-daemon.sh stop nodemanager
sbin/yarn-daemon.sh stop resourcemanager
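
A minimal wrapper combining these into one startup script (a sketch; assumes Hadoop lives in /opt/modules/hadoop-2.5.0):

#!/bin/bash
# start-hadoop.sh - start the pseudo-distributed daemons in order
HADOOP_HOME=/opt/modules/hadoop-2.5.0
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode
$HADOOP_HOME/sbin/hadoop-daemon.sh start datanode
$HADOOP_HOME/sbin/yarn-daemon.sh start resourcemanager
$HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager
jps #confirm NameNode, DataNode, ResourceManager, NodeManager are up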
