Hadoop, Zookeeper, HBase, and Hive: A Distributed Cluster Setup Walkthrough

Posted by twt企业IT社区

Author: Zhao Hai, systems architect at a city commercial bank, focused on the planning and design of bank data center solutions.

Homepage: http://www.talkwithtrend.com/home/space.php?p=blog&t=&uid=353923


1 Basic Architecture

(Architecture diagram omitted from the text version.)


2 How It Works

(Diagram omitted from the text version.)

3 Hadoop Cluster Setup


----------------------------------------

Environment

----------------------------------------

Red Hat Linux 7.1 (Node50 = namenode; Node51-53 = datanodes)

hadoop-2.7.1.tar.gz

jdk-8u77-linux-x64.tar.gz


----------------------------------------

Environment Configuration

----------------------------------------

1. Add the hadoop application user. (Node50 & Node51 & Node52 & Node53)

# groupadd hadoop

# useradd -g hadoop hadoop

# passwd hadoop


2. Configure SSH mutual trust (passwordless login)

node50-node53: "/etc/hosts"

192.168.239.50 node50

192.168.239.51 node51

192.168.239.52 node52

192.168.239.53 node53


"node50-node53: /home/hadoop/.ssh/authorized_keys"

# su - hadoop

# ssh-keygen -t rsa

# cd /home/hadoop/.ssh

# cp id_rsa.pub authorized_keys


"node50:"

# scp hadoop@node51:/home/hadoop/.ssh/id_rsa.pub /tmp/id_rsa.pub.node51

# scp hadoop@node52:/home/hadoop/.ssh/id_rsa.pub /tmp/id_rsa.pub.node52

# scp hadoop@node53:/home/hadoop/.ssh/id_rsa.pub /tmp/id_rsa.pub.node53

# cat /tmp/id_rsa.pub.node51 >> /home/hadoop/.ssh/authorized_keys

# cat /tmp/id_rsa.pub.node52 >> /home/hadoop/.ssh/authorized_keys

# cat /tmp/id_rsa.pub.node53 >> /home/hadoop/.ssh/authorized_keys

# scp /home/hadoop/.ssh/authorized_keys hadoop@node51:/home/hadoop/.ssh/authorized_keys

# scp /home/hadoop/.ssh/authorized_keys hadoop@node52:/home/hadoop/.ssh/authorized_keys

# scp /home/hadoop/.ssh/authorized_keys hadoop@node53:/home/hadoop/.ssh/authorized_keys
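With the keys merged, verify the trust works in every direction before moving on. A quick check, run as the hadoop user on each node (the first pass will ask once per host key; answer yes):

# su - hadoop

# for h in node50 node51 node52 node53; do ssh $h hostname; done

Each ssh call should print the remote hostname without asking for a password.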


----------------------------------------

Installation Preparation

----------------------------------------

"node50: 解压&修改权限"

# cd /home/hadoop

# tar -zxvf hadoop-2.7.1.tar.gz

# tar -zxvf jdk-8u77-linux-x64.tar.gz

# chown -R hadoop:hadoop hadoop-2.7.1

# chown -R hadoop:hadoop jdk1.8.0_77

# mv jdk1.8.0_77 ./hadoop-2.7.1/java


"node50-node53: /home/hadoop/.bashrc"

...

# Hadoop Config

export JAVA_HOME=/home/hadoop/hadoop-2.7.1/java

export JRE_HOME=$JAVA_HOME/jre

export CLASSPATH=.:$JRE_HOME/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:/home/hadoop/hadoop-2.7.1/bin:/home/hadoop/hadoop-2.7.1/sbin

export JAVA_LIBRARY_PATH=/home/hadoop/hadoop-2.7.1/lib/native
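After updating the profile, source it and confirm the JDK and Hadoop binaries resolve (on node50 for now; the other nodes will pass the same check once the files are copied out in the steps below):

# su - hadoop

# source ~/.bashrc

# java -version        <== should report 1.8.0_77

# hadoop version       <== should report 2.7.1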


----------------------------------------

Configuration Preparation (Node50)

----------------------------------------

# su - hadoop

# cd /home/hadoop/hadoop-2.7.1

# mkdir dfs

# mkdir ./dfs/name

# mkdir ./dfs/data

# mkdir tmp


----------------------------------------

Cluster Configuration (Node50)

----------------------------------------

1) /home/hadoop/hadoop-2.7.1/etc/hadoop/core-site.xml

<configuration>

<property>

               <name>hadoop.tmp.dir</name>

               <value>file:/home/hadoop/hadoop-2.7.1/tmp</value>

</property>

<property>

               <name>fs.defaultFS</name>

               <value>hdfs://node50:9000</value>

</property>

<property>

               <name>io.native.lib.available</name>

               <value>true</value>

</property>

</configuration>


2) /home/hadoop/hadoop-2.7.1/etc/hadoop/hdfs-site.xml

<configuration>

<property>

<name>dfs.namenode.name.dir</name>

<value>file:/home/hadoop/hadoop-2.7.1/dfs/name</value>

</property>

<property>

<name>dfs.datanode.data.dir</name>

<value>file:/home/hadoop/hadoop-2.7.1/dfs/data</value>

</property>

<property>

<name>dfs.replication</name>

<value>3</value>

</property>

<property>

<name>dfs.namenode.secondary.http-address</name>

<value>node50:9001</value>

</property>

<property>

<name>dfs.webhdfs.enabled</name>

<value>true</value>

</property>

</configuration>


3) /home/hadoop/hadoop-2.7.1/etc/hadoop/mapred-site.xml

(Hadoop ships only mapred-site.xml.template, which is ignored at runtime; create the real file first: # cp mapred-site.xml.template mapred-site.xml)

<configuration>

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

<property>

<name>mapreduce.jobhistory.address</name>

<value>node50:10020</value>

</property>

<property>

<name>mapreduce.jobhistory.webapp.address</name>

<value>node50:19888</value>

</property>

</configuration>


4) /home/hadoop/hadoop-2.7.1/etc/hadoop/yarn-site.xml

<configuration>

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

<property>

<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>

<value>org.apache.hadoop.mapred.ShuffleHandler</value>

</property>

<property>

<name>yarn.resourcemanager.address</name>

<value>node50:8032</value>

</property>

<property>

<name>yarn.resourcemanager.scheduler.address</name>

<value>node50:8030</value>

</property>

<property>

<name>yarn.resourcemanager.resource-tracker.address</name>

<value>node50:8031</value>

</property>

<property>

<name>yarn.resourcemanager.admin.address</name>

<value>node50:8033</value>

</property>

<property>

<name>yarn.resourcemanager.webapp.address</name>

<value>node50:8088</value>

</property>

</configuration>


5) /home/hadoop/hadoop-2.7.1/etc/hadoop/slaves

node51

node52

node53


6) /home/hadoop/hadoop-2.7.1/etc/hadoop/hadoop-env.sh

...

# The java implementation to use.

export JAVA_HOME=/home/hadoop/hadoop-2.7.1/java

...


7) /home/hadoop/hadoop-2.7.1/etc/hadoop/yarn-env.sh

...

# some Java parameters

# export JAVA_HOME=/home/y/libexec/jdk1.6.0/

export JAVA_HOME=/home/hadoop/hadoop-2.7.1/java

...


----------------------------------------

Copying the Configuration (Node50 & Node51 & Node52 & Node53)

----------------------------------------

# Copy the Hadoop directory and the environment file (.bashrc) to all nodes via scp.
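One way to script this, run as the hadoop user on node50 (a sketch assuming the layout above):

# for h in node51 node52 node53; do scp -r /home/hadoop/hadoop-2.7.1 hadoop@$h:/home/hadoop/; done

# for h in node51 node52 node53; do scp /home/hadoop/.bashrc hadoop@$h:/home/hadoop/; done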


----------------------------------------

Starting the Cluster (Node50)

----------------------------------------

1) Format HDFS.

# hdfs namenode -format


2) Start the Hadoop cluster.

# /home/hadoop/hadoop-2.7.1/sbin/start-dfs.sh

# /home/hadoop/hadoop-2.7.1/sbin/start-yarn.sh


----------------------------------------

Cluster Verification (Node50 & Node51 & Node52 & Node53)

----------------------------------------

1)http://192.168.239.50:50070/

2)http://192.168.239.50:8088/


// namenode

# hdfs dfsadmin -report

# jps


// datanode

# jps


// namenode

# hadoop fs -ls /

# hadoop fs -mkdir /test
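To exercise the whole HDFS + YARN + MapReduce path, the examples jar bundled with the distribution can run a small job; a successful run ends by printing an estimate of Pi:

# hadoop jar /home/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 2 10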


----------------------------------------

Troubleshooting Notes

----------------------------------------

 1 Commands print a native-library warning

 【Symptom】

 “WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable”

 

 【Diagnosis】

 // Enable DEBUG logging on the console.

 # export HADOOP_ROOT_LOGGER=DEBUG,console

 # hadoop fs -ls /

 

 // Locate the error:

 "NativeCodeLoader: /lib/libc.so.6: version `GLIBC_2.14' not found"

 ==> Initial diagnosis: a system C library (glibc) version problem.

 

 # rpm -qa | grep glibc

 ==> Confirms the installed glibc version is indeed older than the native library expects.

 

 # strings /lib64/libc.so.6 | grep GLIBC_

 ==> GLIBC_2.14 is missing from the list of versions this libc exports.

 ==> So there is no way around it: download a newer glibc, then compile and install it, as sketched below.
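 A minimal sketch of that build, assuming the glibc-2.14 source tarball is at hand; installing under a separate prefix rather than over /lib64 keeps the system libc intact, with the new library exposed only to the hadoop session via LD_LIBRARY_PATH:

 # tar -zxvf glibc-2.14.tar.gz

 # mkdir glibc-build && cd glibc-build

 # ../glibc-2.14/configure --prefix=/opt/glibc-2.14

 # make && make install

 # export LD_LIBRARY_PATH=/opt/glibc-2.14/lib:$LD_LIBRARY_PATH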


2 Datanodes fail to register


【Symptom】

 // After the cluster starts, the web UI on port 50070 shows no datanode entries, while jps on each datanode shows the DataNode process is already running.

 

【Diagnosis】

 // Check the slaves configuration file.

==> Make sure every datanode is listed in it.

 

 // Check the datanode log:

 /home/hadoop/hadoop-2.7.1/logs/hadoop-hadoop-datanode-node51.log

 ==> Locate the error:

 "node50:9000 connection error: refused ......"

 

 # netstat -an | grep 9000

 ==> Odd: the namenode is listening only on the loopback address.

 "tcp        0      0 127.0.0.1:9000     0.0.0.0:*               LISTEN"

 

 # cat /etc/hosts

 ==> The root cause is in the first line:

 127.0.0.1 node50

 ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

 192.168.239.50 node50

 192.168.239.51 node51

 192.168.239.52 node52

 192.168.239.53 node53

 ==> Restore the first line to:

 “127.0.0.1 localhost localhost.localdomain localhost6 localhost6.localdomain6”
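 After correcting /etc/hosts, restart HDFS; the namenode should now bind to 192.168.239.50:9000 and the datanodes should be able to register:

 # stop-dfs.sh

 # start-dfs.sh

 # netstat -an | grep 9000     <== should now show 192.168.239.50:9000 in LISTEN

 # hdfs dfsadmin -report       <== should list all three datanodes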


4 Zookeeper Cluster Setup


----------------------------------------

Environment

----------------------------------------

Red Hat Linux 7.1 (Node50-53)

zookeeper-3.4.8.tar.gz


----------------------------------------

Installation Preparation (Node50)

----------------------------------------

# tar -zxvf zookeeper-3.4.8.tar.gz

# chown -R hadoop:hadoop zookeeper-3.4.8

# mv zookeeper-3.4.8 /home/hadoop/


----------------------------------------

Environment Configuration (Node50 & Node51 & Node52 & Node53)

----------------------------------------

/home/hadoop/.bashrc

...

# Zookeeper Config

...

export ZOOKEEPER_HOME=/home/hadoop/zookeeper-3.4.8

export PATH=$PATH:$ZOOKEEPER_HOME/bin


----------------------------------------

Configuration Preparation (Node50)

----------------------------------------

# su - hadoop

# cd /home/hadoop/zookeeper-3.4.8

# mkdir data

# mkdir log


----------------------------------------

Cluster Configuration (Node50)

----------------------------------------

1) /home/hadoop/zookeeper-3.4.8/conf/zoo.cfg

# The number of milliseconds of each tick

tickTime=2000

# The number of ticks that the initial 

# synchronization phase can take

initLimit=10

# The number of ticks that can pass between 

# sending a request and getting an acknowledgement

syncLimit=5

# the directory where the snapshot is stored.

# do not use /tmp for storage, /tmp here is just 

# example sakes.

dataDir=/home/hadoop/zookeeper-3.4.8/data

dataLogDir=/home/hadoop/zookeeper-3.4.8/log

# the port at which the clients will connect

clientPort=2181

# The number of snapshots to retain in dataDir

#autopurge.snapRetainCount=3

# Purge task interval in hours

# Set to "0" to disable auto purge feature

#autopurge.purgeInterval=1

server.1=node50:2888:3888

server.2=node51:2888:3888

server.3=node52:2888:3888

server.4=node53:2888:3888

2) /home/hadoop/zookeeper-3.4.8/data/myid

1

==> Note: each server writes its own ID into myid (node50 = 1, node51 = 2, node52 = 3, node53 = 4).
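Since the copy step below replicates node50's files (including its myid of 1), the file has to be overwritten on each of the other nodes afterwards; for example:

// node51

# echo 2 > /home/hadoop/zookeeper-3.4.8/data/myid

// node52

# echo 3 > /home/hadoop/zookeeper-3.4.8/data/myid

// node53

# echo 4 > /home/hadoop/zookeeper-3.4.8/data/myid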


----------------------------------------

Copying the Configuration (Node50 & Node51 & Node52 & Node53)

----------------------------------------

# Copy the Zookeeper directory and the environment file to all nodes via scp (same loop pattern as in section 3), then fix each node's myid as noted above.


----------------------------------------

Starting the Cluster (Node50 & Node51 & Node52 & Node53)

----------------------------------------

# zkServer.sh start


----------------------------------------

Cluster Verification (Node50 & Node51 & Node52 & Node53)

----------------------------------------

// Node50:

[hadoop@node50 ~]$ jps

3446 QuorumPeerMain    <== this should appear

2711 DataNode

3064 ResourceManager

2603 NameNode

2878 SecondaryNameNode

4478 Jps


// Node51-53:

[hadoop@node52 ~]$ jps

2356 DataNode

2501 QuorumPeerMain    <== this should appear

2942 Jps
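Beyond jps, zkServer.sh reports each node's role in the quorum; on a healthy ensemble exactly one node answers "Mode: leader" and the rest "Mode: follower":

# zkServer.sh status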


5 HBase Setup


----------------------------------------

Environment

----------------------------------------

Red Hat Linux 7.1 (Node50-53)

hbase-1.1.4.tar.gz

jdk-8u77-linux-x64.tar.gz


----------------------------------------

Installation Preparation (Node50)

----------------------------------------

# tar -zxvf hbase-1.1.4.tar.gz

# chown -R hadoop:hadoop hbase-1.1.4

# mv hbase-1.1.4 /home/hadoop/


----------------------------------------

Configuration Preparation (Node50)

----------------------------------------

# su - hadoop

# cd /home/hadoop/hbase-1.1.4

# mkdir data


----------------------------------------

Cluster Configuration (Node50)

----------------------------------------

1) /home/hadoop/hbase-1.1.4/conf/hbase-site.xml

<configuration>

   <property>

           <name>hbase.rootdir</name>  

           <value>hdfs://node50:9000/hbase</value>  

   </property>

   <property>

           <name>hbase.cluster.distributed</name>

           <value>true</value>

   </property>

   <property>

           <name>hbase.zookeeper.quorum</name>

           <value>node50,node51,node52,node53</value>

   </property>

   <property>  

           <name>dfs.replication</name>

           <value>3</value>

   </property>

   <property>  

           <name>hbase.zookeeper.property.dataDir</name>

           <value>/home/hadoop/zookeeper-3.4.8/data</value>

   </property>

</configuration>


2) /home/hadoop/hbase-1.1.4/conf/hbase-env.sh


...

# The java implementation to use.  Java 1.7+ required.

export JAVA_HOME=/home/hadoop/hadoop-2.7.1/java


# see http://wiki.apache.org/hadoop/PerformanceTuning

export HBASE_OPTS="-XX:+UseConcMarkSweepGC"


# Configure PermSize. Only needed in JDK7. You can safely remove it for JDK8+

export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"

export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"


# export HBASE_SLAVE_SLEEP=0.1

# Tell HBase whether it should manage it's own instance of Zookeeper or not.

export HBASE_MANAGES_ZK=false


3) /home/hadoop/hbase-1.1.4/conf/regionservers

node51

node52

node53


----------------------------------------

Copying the Configuration (Node50 & Node51 & Node52 & Node53)

----------------------------------------

# Copy the HBase directory and the environment file to all nodes via scp (same loop pattern as in section 3).


----------------------------------------

Starting the Cluster (Node50)

----------------------------------------

# start-hbase.sh


----------------------------------------

Cluster Verification (Node50 & Node51 & Node52 & Node53)

----------------------------------------

// Node50:

[hadoop@node50 logs]$ jps

5396 Jps

3446 QuorumPeerMain

2711 DataNode

3064 ResourceManager

2603 NameNode

2878 SecondaryNameNode

3599 HMaster     <===


// Node51-53:

[hadoop@node52 conf]$ jps

3346 Jps

2356 DataNode

2501 QuorumPeerMain

2665 HRegionServer     <===
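For a functional check beyond jps, a short HBase shell session (the table name is purely illustrative) confirms the master and regionservers are actually serving requests:

# hbase shell

hbase(main):001:0> status

hbase(main):002:0> create 't1', 'cf1'

hbase(main):003:0> list

hbase(main):004:0> disable 't1'

hbase(main):005:0> drop 't1'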


6 Hive Setup


----------------------------------------

Environment

----------------------------------------

Red Hat Linux 7.1 (Node50)

apache-hive-2.0.0-bin.tar.gz

mysql-5.7.12-1.el7.x86_64.rpm-bundle.tar


----------------------------------------

Installation Preparation

----------------------------------------

# tar -xvf mysql-5.7.12-1.el7.x86_64.rpm-bundle.tar

# rpm -Uvh mysql-community-common-5.7.12-1.el7.x86_64.rpm

# rpm -Uvh mysql-community-libs-5.7.12-1.el7.x86_64.rpm

# rpm -Uvh mysql-community-libs-compat-5.7.12-1.el7.x86_64.rpm

# rpm -Uvh mysql-community-client-5.7.12-1.el7.x86_64.rpm

# rpm -Uvh mysql-community-server-5.7.12-1.el7.x86_64.rpm


# tar -zxvf apache-hive-2.0.0-bin.tar.gz

# chown -R hadoop:hadoop apache-hive-2.0.0-bin

# mv apache-hive-2.0.0-bin /home/hadoop/hive-2.0.0


----------------------------------------

Configuring the Database

----------------------------------------

# systemctl start mysqld.service


//Retrieve the temporary password mysql generated on first start

# grep 'temporary password' /var/log/mysqld.log


//Log in with the temporary password

# mysql -u root -p


//Change the root password (at the mysql prompt; choose your own value)

mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY '<new_password>';


//Create the hive database and grant privileges

mysql> create database hive;

mysql> grant all on hive.* to 'root'@'%' identified by 'ROOT@root@123';

mysql> grant all on *.* to 'root'@'%' identified by 'ROOT@root@123';


//This restart matters; skipping it leads to connection errors when Hive first talks to mysql.

# systemctl restart mysqld.service


----------------------------------------

Configuring Hive

----------------------------------------

1) /home/hadoop/hive-2.0.0/conf/hive-env.sh

...

export HADOOP_HEAPSIZE=1024

HADOOP_HOME=/home/hadoop/hadoop-2.7.1

export HIVE_CONF_DIR=/home/hadoop/hive-2.0.0/conf

export HIVE_AUX_JARS_PATH=/home/hadoop/hive-2.0.0/lib

...

//The lines above are the ones to modify.


2) /home/hadoop/hive-2.0.0/conf/hive-site.xml

<configuration>  

<property>  

<name>javax.jdo.option.ConnectionURL</name>  

<value>jdbc:mysql://node50:3306/hive?createDatabaseIfNotExist=true</value>

</property>  

<property>  

<name>javax.jdo.option.ConnectionDriverName</name>  

<value>com.mysql.jdbc.Driver</value>

</property>  

<property>  

<name>javax.jdo.option.ConnectionUserName</name>  

<value>root</value>  

</property>  

<property>  

<name>javax.jdo.option.ConnectionPassword</name>  

<value>ROOT@root@123</value>  

</property>  

</configuration>  


----------------------------------------

Environment Variables

----------------------------------------

1) /home/hadoop/.bashrc

...

# Hive Config

export HIVE_HOME=/home/hadoop/hive-2.0.0

export PATH=$PATH:$HIVE_HOME/bin


2) Copy the mysql JDBC driver jar into Hive's lib directory

# cp mysql-connector-java-5.1.18-bin.jar /home/hadoop/hive-2.0.0/lib/


----------------------------------------

Starting & Verifying Hive

----------------------------------------

# schematool -dbType mysql -initSchema

# hive

hive> show tables;
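A short smoke test (table name illustrative) exercises both the mysql-backed metastore and the MapReduce execution path; the count query should launch a YARN job and return 0:

hive> create table t_demo (id int, name string);

hive> show tables;

hive> select count(*) from t_demo;

hive> drop table t_demo;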




