Installing a hadoop (3.2.4), hbase (2.4.0), hive (3.1.0), and phoenix (5.1.2) cluster on Ubuntu

Posted fan_bigdata


Cluster Installation

1. Environment Preparation

1.1 Server preparation

192.168.12.253 ds1
192.168.12.38  ds2
192.168.12.39  ds3

1.2 Set the hostname (all nodes)

On 192.168.12.253: hostnamectl set-hostname ds1

On 192.168.12.38: hostnamectl set-hostname ds2

On 192.168.12.39: hostnamectl set-hostname ds3

1.3 Map IPs to hostnames (all nodes)

	vi /etc/hosts

Add the following:

192.168.12.253 ds1
192.168.12.38 ds2
192.168.12.39 ds3
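The mapping step above can also be scripted so it is safe to re-run during provisioning. A minimal sketch, shown against a scratch file — on the real nodes, point HOSTS_FILE at /etc/hosts and run as root:

```shell
# Append each mapping only if it is not already present.
# HOSTS_FILE defaults to a scratch file for safe testing;
# set HOSTS_FILE=/etc/hosts (as root) on the actual nodes.
HOSTS_FILE="${HOSTS_FILE:-./hosts.scratch}"
touch "$HOSTS_FILE"
for entry in "192.168.12.253 ds1" "192.168.12.38 ds2" "192.168.12.39 ds3"; do
    grep -qxF "$entry" "$HOSTS_FILE" || echo "$entry" >> "$HOSTS_FILE"
done
```

Running it a second time adds nothing, since each entry is only appended when absent.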

1.4 Disable the firewall (all nodes)

	sudo systemctl stop ufw 
	sudo systemctl disable ufw

1.5 Adjust the SSH configuration (all nodes)

	vim /etc/ssh/sshd_config

Change PermitEmptyPasswords no to PermitEmptyPasswords yes, then restart sshd (sudo systemctl restart sshd) so the change takes effect.

1.6 Configure passwordless SSH (all nodes)

Generate an SSH key (run on every node):

	ssh-keygen -t rsa 

Set up mutual trust among ds1, ds2, and ds3 (run on every node):

	ssh-copy-id -i ~/.ssh/id_rsa.pub ds1
	ssh-copy-id -i ~/.ssh/id_rsa.pub ds2
	ssh-copy-id -i ~/.ssh/id_rsa.pub ds3

Repeat the same commands on the remaining nodes to complete the mutual trust setup.
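The three ssh-copy-id invocations can be wrapped in one loop. This sketch only prints the commands; remove the `echo` to execute them for real (which requires the nodes to be reachable and will prompt for each password):

```shell
# Print the key-distribution command for every node; remove `echo`
# to actually run them on a live cluster.
distribute_key() {
    for host in ds1 ds2 ds3; do
        echo ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "$host"
    done
}
distribute_key
```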

1.7 Install the JDK (all nodes)

Install (the full JDK, not just the JRE, so tools such as jps are available):

	sudo apt-get install openjdk-8-jdk

Check the version:

	java -version

Installation path:

	/usr/lib/jvm/java-8-openjdk-amd64

1.8 Install MySQL (ds1)

Install:

	sudo apt-get update

	sudo apt-get install mysql-server

Initial hardening:

	sudo mysql_secure_installation

Check the service status:

	systemctl status mysql.service

Edit the configuration file mysqld.cnf:

	cd /etc/mysql/mysql.conf.d
	vim mysqld.cnf

Comment out the line bind-address = 127.0.0.1
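Commenting the line out can be done with a one-line sed. A sketch against a scratch copy — on ds1, point CNF at the real file (/etc/mysql/mysql.conf.d/mysqld.cnf) and run with sudo:

```shell
# Comment out bind-address so MySQL listens on all interfaces.
# CNF defaults to a scratch copy here; on ds1 set
# CNF=/etc/mysql/mysql.conf.d/mysqld.cnf and run as root.
CNF="${CNF:-./mysqld.cnf.scratch}"
printf 'bind-address\t\t= 127.0.0.1\n' > "$CNF"   # sample line for the demo
sed -i 's/^[[:space:]]*bind-address/#bind-address/' "$CNF"
```

Restart MySQL afterwards so the change takes effect.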

Configure MySQL environment variables:

	vim /etc/profile

Add:

	export MYSQL_HOME=/usr/share/mysql 
	export PATH=$MYSQL_HOME/bin:$PATH

Reload the environment:

	source /etc/profile

Start and stop the MySQL service:

Stop:

	sudo service mysql stop

Start:

	sudo service mysql start

Log in to MySQL (note there is no space between -p and the password):

	mysql -u root -pintrocks1234

2. Install the ZooKeeper Cluster

2.1 Download the ZooKeeper package

Download: https://www.apache.org/dyn/closer.lua/zookeeper/zookeeper-3.8.0/apache-zookeeper-3.8.0-bin.tar.gz

2.2 Upload and extract

Upload the archive to /home/intellif

Extract:

	 tar -zxvf apache-zookeeper-3.8.0-bin.tar.gz  -C /opt

Fix permissions:

	chmod -R 755 apache-zookeeper-3.8.0-bin

2.3 Edit the configuration file

	cd apache-zookeeper-3.8.0-bin/conf
	cp zoo_sample.cfg zoo.cfg
	vim zoo.cfg

Add the server list, and point dataDir at the directory that will hold the myid file (see 2.6):

dataDir=/bigdata/zookeeper
server.1=ds1:2888:3888
server.2=ds2:2888:3888
server.3=ds3:2888:3888

Note: there must be no space after 3888, otherwise startup later fails with: Address unresolved: ds1:3888
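The trailing-space pitfall above can be guarded against mechanically. A sketch on a scratch file — run the same sed against the real zoo.cfg:

```shell
# Strip trailing whitespace from every line of the config; a trailing
# space after the port is what triggers "Address unresolved: ds1:3888".
CFG="${CFG:-./zoo.cfg.scratch}"
printf 'server.1=ds1:2888:3888 \n' > "$CFG"   # sample line with a trailing space
sed -i 's/[[:space:]]*$//' "$CFG"
```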

2.4 Distribute to ds2 and ds3

Copy ZooKeeper to ds2 and ds3:

	scp -r apache-zookeeper-3.8.0-bin ds2:/opt
	scp -r apache-zookeeper-3.8.0-bin ds3:/opt

2.5 Configure environment variables (all nodes)

	vim  ~/.profile

Add the following two lines:

	export ZOOKEEPER_HOME=/opt/apache-zookeeper-3.8.0-bin
	export PATH=$ZOOKEEPER_HOME/bin:$PATH 

Apply the changes:

	source ~/.profile

2.6 Create the myid file

First create the directory /bigdata/zookeeper (all nodes):

	mkdir -p /bigdata/zookeeper
	cd /bigdata/zookeeper

On ds1:

	echo 1 > myid

On ds2:

	echo 2 > myid

On ds3:

	echo 3 > myid
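Since the hostnames follow a dsN pattern, the myid can also be derived from the hostname instead of typed per node. A small sketch — the dsN naming is this guide's convention, not a ZooKeeper requirement:

```shell
# Derive the myid from a dsN hostname: ds1 -> 1, ds2 -> 2, ds3 -> 3.
# On each node you could then run:
#   mkdir -p /bigdata/zookeeper && myid_for "$(hostname)" > /bigdata/zookeeper/myid
myid_for() { echo "${1#ds}"; }
myid_for ds3   # prints 3
```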

2.7 Start, stop, and check status (all nodes)

Start:

	zkServer.sh start

Stop:

	zkServer.sh stop

Check status:

	zkServer.sh status

3. Install the Hadoop HA Cluster

3.1 Cluster planning

Role             ds1  ds2  ds3
NameNode         yes  yes  no
DataNode         yes  yes  yes
JournalNode      yes  yes  yes
NodeManager      yes  yes  yes
ResourceManager  yes  no   no
Zookeeper        yes  yes  yes
ZKFC             yes  yes  no

3.2 Download the package

Hadoop version: 3.2.4

Download: https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.2.4/hadoop-3.2.4.tar.gz

3.3 Upload and extract

Upload to /home/intellif on the server

Extract:

	tar -zxvf hadoop-3.2.4.tar.gz 
	mv hadoop-3.2.4/ /opt

3.4 Edit the configuration files

Hadoop's core configuration files:

File             Description
hadoop-env.sh    Environment variables used by the scripts that run Hadoop
mapred-env.sh    Environment variables for running MapReduce (overrides hadoop-env.sh)
yarn-env.sh      Environment variables for running YARN (overrides hadoop-env.sh)
core-site.xml    Hadoop Core settings, e.g. I/O settings common to HDFS, MapReduce and YARN
hdfs-site.xml    HDFS daemon settings, including the namenode and datanodes
mapred-site.xml  MapReduce daemon settings, including the job history server
yarn-site.xml    YARN daemon settings, including the resource manager and node managers
workers          Hostnames of the machines that run datanodes and node managers

	cd  /opt/hadoop-3.2.4/etc/hadoop

3.4.1 Edit hadoop-env.sh

	vim hadoop-env.sh

Append:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HDFS_NAMENODE_USER=hdfs
export HDFS_DATANODE_USER=hdfs
export HDFS_ZKFC_USER=hdfs
export HDFS_JOURNALNODE_USER=hdfs

3.4.2 Edit yarn-env.sh

	vim yarn-env.sh

Append (the JDK path):

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

3.4.3 Edit core-site.xml

	vim core-site.xml

Replace the empty <configuration> element with the following:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/bigdata/hadoop/tmpdir</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>ds1:2181,ds2:2181,ds3:2181</value>
  </property>

  <property>
    <name>hadoop.proxyuser.hdfs.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hdfs.groups</name>
    <value>*</value>
  </property>
</configuration>

3.4.4 Edit hdfs-site.xml

	vim hdfs-site.xml

Replace the file contents with:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <!-- Must match the fs.defaultFS configured in core-site.xml -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <!-- Logical names for the two namenodes; nn1/nn2 are arbitrary, choose your own -->
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <!-- RPC address of nn1 -->
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>ds1:8020</value>
  </property>
  <!-- HTTP address of nn1 -->
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>ds1:50070</value>
  </property>
  <!-- RPC address of nn2 -->
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>ds2:8020</value>
  </property>
  <!-- HTTP address of nn2 -->
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>ds2:50070</value>
  </property>
  <!-- Shared edits directory: tells the cluster which machines run a JournalNode -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://ds1:8485;ds2:8485;ds3:8485/mycluster</value>
  </property>
  <!-- Directory where the journalnodes store their edits -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/bigdata/hadoop/journal</value>
  </property>
  <!-- Enable automatic failover -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- Failover implementation (no whitespace inside the value, or class lookup fails) -->
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Fencing method -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <!-- Where the generated private key is stored -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hdfs/.ssh/id_rsa</value>
  </property>
  <!-- Directory where the namenode stores its metadata -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///bigdata/hadoop/namenode</value>
  </property>
  <!-- Directory where the datanodes store data blocks -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///bigdata/hadoop/datanode</value>
  </property>
  <!-- Block replication factor of 3; must not exceed the number of datanodes -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- HDFS permission checking; false lets any user operate on HDFS and use plugins -->
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
</configuration>
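Before distributing these files, a crude sanity check catches unbalanced <property> blocks — a common copy-paste error that Hadoop only reports at startup. (xmllint gives a stricter well-formedness check if it is installed.) Demonstrated on a scratch file:

```shell
# Count opening vs closing <property> tags; a mismatch usually
# means a truncated or duplicated block in the site file.
check_props() {
    [ "$(grep -c '<property>' "$1")" -eq "$(grep -c '</property>' "$1")" ]
}
printf '<property>\n  <name>x</name>\n</property>\n' > ./site.scratch.xml
check_props ./site.scratch.xml && echo "balanced"
```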

3.4.5 Edit mapred-site.xml

	vim mapred-site.xml

Replace the file contents with:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>  
        /opt/hadoop-3.2.4/share/hadoop/common/*,
        /opt/hadoop-3.2.4/share/hadoop/common/lib/*,
        /opt/hadoop-3.2.4/share/hadoop/hdfs/*,
        /opt/hadoop-3.2.4/share/hadoop/hdfs/lib/*,
        /opt/hadoop-3.2.4/share/hadoop/mapreduce/*,
        /opt/hadoop-3.2.4/share/hadoop/mapreduce/lib/*,
        /opt/hadoop-3.2.4/share/hadoop/yarn/*,
        /opt/hadoop-3.2.4/share/hadoop/yarn/lib/*
    </value>
  </property>
</configuration>

3.4.6 Edit yarn-site.xml

	vim  yarn-site.xml

Replace the file contents with:
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<configuration>
<property>
    <!-- Use the fair scheduler -->
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
  <!-- yarn ha configuration-->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- Cluster name -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster1</value>
  </property>
  <!-- This machine's id in the HA cluster; must match one of the values in
       yarn.resourcemanager.ha.rm-ids. Remove this property on nodes that do
       not run a resource manager. -->
  <property>
    <name>yarn.resourcemanager.ha.id</name>
    <value>rm1</value>
  </property>
  <!-- List of resource manager ids in the HA cluster -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- Which machines make up the HA RM cluster -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>ds1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>ds2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>ds1:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>ds2:8088</value>
  </property>
  <property>
    <name>hadoop.zk.address</name>
    <value>ds1:2181,ds2:2181,ds3:2181</value>
  </property>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <!-- Whether to run a thread that checks each task's physical memory use and
       kills tasks that exceed their allocation; default true -->
  <property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
  </property>
  <!-- Whether to run a thread that checks each task's virtual memory use and
       kills tasks that exceed their allocation; default true -->
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
  <!-- Memory available to YARN on this node, default 8 GB; lower it if the node
       has less. YARN does not detect physical memory itself. -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>204800</value>
  </property>
  <!-- Minimum physical memory a single task may request, default 1024 MB;
       tune for your workload -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>8192</value>
  </property>
  <!-- Maximum physical memory a single task may request, default 8192 MB -->
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>614400</value>
  </property>

  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>8192</value>
  </property> 
  <property>
    <name>yarn.app.mapreduce.am.command-opts</name>
    <value>-Xmx6553m</value>
  </property>
  <!-- Virtual CPU cores available to YARN on this node, default 8 -->
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>32</value>
  </property>
  <!-- Maximum virtual cores a single task may request, default 4; requests
       above this throw InvalidResourceRequestException -->
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>32</value>
  </property>
</configuration>
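The memory values above imply how many containers a node can hold, and a quick sanity calculation is worth doing. Note also that yarn.scheduler.maximum-allocation-mb (614400) is larger than the per-node yarn.nodemanager.resource.memory-mb (204800); a single container can never actually obtain more memory than one node offers:

```shell
# With 204800 MB per node and a minimum allocation of 8192 MB,
# at most 25 minimum-size containers fit on one node.
node_mb=204800
min_mb=8192
containers=$((node_mb / min_mb))
echo "$containers"
```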