Installing HBase on CentOS 8

Posted by PacosonSWJTU


【README】

1. Parts of this article are adapted from:

https://computingforgeeks.com/how-to-install-apache-hadoop-hbase-on-centos-7/

2. This article installs HBase on a single machine (for learning and discussion only).


【1】Update the system

Hadoop and HBase are dynamic systems; to let HBase make full use of system resources and the network, disable SELinux and the firewall before installing it:

sudo systemctl disable --now firewalld
sudo setenforce 0
sudo sed -i 's/^SELINUX=.*/SELINUX=permissive/g' /etc/selinux/config
cat /etc/selinux/config | grep SELINUX= | grep -v '#'

Update the system packages and reboot:

sudo yum -y install epel-release
sudo yum -y install vim wget curl bash-completion
sudo yum -y update
sudo reboot


【2】Install Java

sudo yum -y install java-1.8.0-openjdk java-1.8.0-openjdk-devel

Verify the Java version:

[root@centos202 ~]# java -version
java version "1.8.0_271"
Java(TM) SE Runtime Environment (build 1.8.0_271-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.271-b09, mixed mode)

Set the JAVA_HOME environment variable (the `\$` escapes keep the `$(...)` expressions unexpanded in the heredoc, so JAVA_HOME is computed when the profile script runs):

cat <<EOF | sudo tee /etc/profile.d/hadoop_java.sh
export JAVA_HOME=\$(dirname \$(dirname \$(readlink \$(readlink \$(which javac)))))
export PATH=\$PATH:\$JAVA_HOME/bin
EOF

Reload the $PATH settings and verify:

source /etc/profile.d/hadoop_java.sh
[root@centos202 profile.d]# echo $JAVA_HOME
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.312.b07-2.el8_5.x86_64

【3】Create a hadoop user

Create a dedicated hadoop account:

sudo adduser hadoop
passwd hadoop
sudo usermod -aG wheel hadoop

Generate an SSH key for passwordless login:

[root@centos202 ~]# sudo su - hadoop
[hadoop@centos202 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): 
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:HuqG5V6O7Od64sQdmXUFedVNc/GEda4ujA+xuivxs2k hadoop@centos202
The key's randomart image is:
+---[RSA 3072]----+
|            .o.B@|
|            ..ooB|
|          . ..  o|
|         + .   . |
|        S .   .  |
|     .o+ o = .   |
|     ++o+ + o .  |
|    .+=+Eo o .   |
|     =OOO=  .    |
+----[SHA256]-----+

Add the hadoop user's public key to the passwordless-login authorization list:

[hadoop@centos202 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@centos202 ~]$ chmod 0600 ~/.ssh/authorized_keys

Log in to the local machine with the generated SSH key:

[hadoop@centos202 ~]$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is SHA256:EdoFy44sWPaZHE6jgJCVGkbGKxK63ToPAP24sQ2Gj3Y.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Last login: Sun Mar  5 03:56:07 2023

【4】Download and install Hadoop

Download Hadoop; releases are listed at https://hadoop.apache.org/releases.html.

Option 1: download directly with wget (substitute the desired version for $RELEASE):

wget https://www-eu.apache.org/dist/hadoop/common/hadoop-$RELEASE/hadoop-$RELEASE.tar.gz

Option 2: download it on a Windows 10 machine (e.g. through a proxy) and transfer it to CentOS with rz (the approach used in this article).

Version used here: hadoop-3.2.4.tar.gz

[root@centos202 hadoop]# ls -l
total 480832
-rwxrwxrwx. 1 root root 492368219 Jan 30 22:20 hadoop-3.2.4.tar.gz

Extract:

tar -xzvf hadoop-3.2.4.tar.gz 
# result:
[root@centos202 hadoop-3.2.4]# pwd
/usr/local/hadoop/hadoop-3.2.4
[root@centos202 hadoop-3.2.4]# ls
bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share

Add the Hadoop home directory to $PATH:

cat <<EOF | sudo tee /etc/profile.d/hadoop_java.sh
export JAVA_HOME=\$(dirname \$(dirname \$(readlink \$(readlink \$(which javac)))))
export HADOOP_HOME=/usr/local/hadoop/hadoop-3.2.4
export HADOOP_HDFS_HOME=\$HADOOP_HOME
export HADOOP_MAPRED_HOME=\$HADOOP_HOME
export YARN_HOME=\$HADOOP_HOME
export HADOOP_COMMON_HOME=\$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=\$HADOOP_HOME/lib/native
export PATH=\$PATH:\$JAVA_HOME/bin:\$HADOOP_HOME/bin:\$HADOOP_HOME/sbin
EOF

Run source to load the environment variables defined in hadoop_java.sh into the current shell:

source /etc/profile.d/hadoop_java.sh

Check the Hadoop version:

[root@centos202 hadoop-3.2.4]# hadoop version
Hadoop 3.2.4
Source code repository Unknown -r 7e5d9983b388e372fe640f21f048f2f2ae6e9eba
Compiled by ubuntu on 2022-07-12T11:58Z
Compiled with protoc 2.5.0
From source with checksum ee031c16fe785bbb35252c749418712
This command was run using /usr/local/hadoop/hadoop-3.2.4/share/hadoop/common/hadoop-common-3.2.4.jar

【5】Configure Hadoop

All Hadoop configuration files live under /usr/local/hadoop/hadoop-3.2.4/etc/hadoop:

[root@centos202 hadoop]# pwd
/usr/local/hadoop/hadoop-3.2.4/etc/hadoop
[root@centos202 hadoop]# ls 
capacity-scheduler.xml      hadoop-policy.xml                 kms-acls.xml          mapred-queues.xml.template     yarn-env.cmd
configuration.xsl           hadoop-user-functions.sh.example  kms-env.sh            mapred-site.xml                yarn-env.sh
container-executor.cfg      hdfs-site.xml                     kms-log4j.properties  shellprofile.d                 yarnservice-log4j.properties
core-site.xml               httpfs-env.sh                     kms-site.xml          ssl-client.xml.example         yarn-site.xml
hadoop-env.cmd              httpfs-log4j.properties           log4j.properties      ssl-server.xml.example
hadoop-env.sh               httpfs-signature.secret           mapred-env.cmd        user_ec_policies.xml.template
hadoop-metrics2.properties  httpfs-site.xml                   mapred-env.sh         workers

Several of these files must be edited to complete the Hadoop installation.


【5.1】hadoop-env.sh 

Edit the JAVA_HOME setting in hadoop-env.sh (around line 54):

vim hadoop-env.sh
export JAVA_HOME=$(dirname $(dirname $(readlink $(readlink $(which javac))))) 

Next, edit core-site.xml.

core-site.xml holds the information Hadoop needs at cluster startup. Its properties include:

  • the port number of the Hadoop instance;

  • the memory allocated for the file system;

  • the memory limit for data storage;

  • read/write buffer sizes.

Add the file-system property inside the <configuration> element (fs.default.name is the legacy name of fs.defaultFS; Hadoop 3 accepts both):

<configuration>
        <property>
                <name>fs.default.name</name>
                <value>hdfs://localhost:9000</value>
                <description>The default file system URI</description>
        </property>
 </configuration>

【5.2】hdfs-site.xml 

This file must be configured on every host in the cluster. It defines:

  • the local file-system paths of the namenode and datanode;

  • the replication factor.

First, create the namenode and datanode directories and change their owner to hadoop:hadoop:

[hadoop@centos202 hadoop]$ sudo mkdir -p /hadoop/hdfs/{namenode,datanode}
[sudo] password for hadoop: 
[hadoop@centos202 hadoop]$ 
[hadoop@centos202 hadoop]$ sudo chown -R hadoop:hadoop /hadoop

Then edit hdfs-site.xml as follows (dfs.name.dir and dfs.data.dir are the legacy names of dfs.namenode.name.dir and dfs.datanode.data.dir; Hadoop 3 accepts both):

<configuration>
   <property>
         <name>dfs.replication</name>
         <value>1</value>
   </property>
     
   <property>
         <name>dfs.name.dir</name>
         <value>file:///hadoop/hdfs/namenode</value>
   </property>
               
   <property>
        <name>dfs.data.dir</name>
        <value>file:///hadoop/hdfs/datanode</value>
  </property>
</configuration>

【5.3】mapred-site.xml 

This file selects the MapReduce framework. Edit it as follows:

<configuration>
   <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
   </property>
</configuration>

【5.4】yarn-site.xml 

yarn-site.xml defines the resource-management and job-scheduling logic. Edit it as follows:

<configuration>
   <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
   </property>
</configuration>

【6】Verify the Hadoop configuration (start Hadoop)

Switch to the hadoop user:

sudo su - hadoop

【6.1】Format the HDFS namenode

Formatting deletes all directories in HDFS. The temporary directories hold the datanode and namenode data, so formatting the namenode empties them.

In short: the namenode maintains metadata about the datanodes; formatting wipes that metadata so the storage can be reused for new data.

See also https://stackoverflow.com/questions/27143409/what-the-command-hadoop-namenode-format-will-do
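The format step itself is a single command (run it once, as the hadoop user, before the first start of HDFS):

```shell
# Format the HDFS namenode. Run this once, before the first start-dfs.sh;
# re-running it erases all existing HDFS metadata.
hdfs namenode -format
```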

【6.2】Start HDFS

[hadoop@centos202 ~]$ start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [centos202]
centos202: Warning: Permanently added 'centos202,192.168.163.202' (ECDSA) to the list of known hosts.
2023-03-05 05:18:42,008 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@centos202 ~]$ 

【6.3】Start YARN

[hadoop@centos202 ~]$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers

【6.4】Hadoop web UIs

1) The default web UI ports in Hadoop 3.x are:

  • namenode (Hadoop dashboard): 9870

  • resource manager (cluster overview): 8088

  • MapReduce job history server: 19888

List the ports Hadoop is actually using:

[hadoop@centos202 ~]$ ss -tunelp | grep java
tcp   LISTEN 0      128          0.0.0.0:8030       0.0.0.0:*    users:(("java",pid=15007,fd=320)) uid:1000 ino:81561 sk:1 <-> 
tcp   LISTEN 0      128          0.0.0.0:8031       0.0.0.0:*    users:(("java",pid=15007,fd=310)) uid:1000 ino:80766 sk:2 <-> 
tcp   LISTEN 0      128          0.0.0.0:8032       0.0.0.0:*    users:(("java",pid=15007,fd=330)) uid:1000 ino:81957 sk:3 <-> 
tcp   LISTEN 0      128          0.0.0.0:8033       0.0.0.0:*    users:(("java",pid=15007,fd=299)) uid:1000 ino:80016 sk:4 <-> 
tcp   LISTEN 0      128          0.0.0.0:41059      0.0.0.0:*    users:(("java",pid=15158,fd=306)) uid:1000 ino:88126 sk:5 <-> 
tcp   LISTEN 0      128        127.0.0.1:44965      0.0.0.0:*    users:(("java",pid=14546,fd=279)) uid:1000 ino:74525 sk:6 <-> 
tcp   LISTEN 0      128          0.0.0.0:8040       0.0.0.0:*    users:(("java",pid=15158,fd=317)) uid:1000 ino:88014 sk:7 <-> 
tcp   LISTEN 0      128          0.0.0.0:9864       0.0.0.0:*    users:(("java",pid=14546,fd=308)) uid:1000 ino:74543 sk:8 <-> 
tcp   LISTEN 0      128        127.0.0.1:9000       0.0.0.0:*    users:(("java",pid=14418,fd=285)) uid:1000 ino:70479 sk:9 <-> 
tcp   LISTEN 0      128          0.0.0.0:8042       0.0.0.0:*    users:(("java",pid=15158,fd=328)) uid:1000 ino:88858 sk:a <-> 
tcp   LISTEN 0      128          0.0.0.0:9866       0.0.0.0:*    users:(("java",pid=14546,fd=278)) uid:1000 ino:74483 sk:b <-> 
tcp   LISTEN 0      128          0.0.0.0:9867       0.0.0.0:*    users:(("java",pid=14546,fd=309)) uid:1000 ino:74560 sk:c <-> 
tcp   LISTEN 0      128          0.0.0.0:9868       0.0.0.0:*    users:(("java",pid=14770,fd=279)) uid:1000 ino:77941 sk:d <-> 
tcp   LISTEN 0      128          0.0.0.0:9870       0.0.0.0:*    users:(("java",pid=14418,fd=274)) uid:1000 ino:70244 sk:e <-> 
tcp   LISTEN 0      128          0.0.0.0:8088       0.0.0.0:*    users:(("java",pid=15007,fd=289)) uid:1000 ino:78820 sk:10 <->
tcp   LISTEN 0      128          0.0.0.0:13562      0.0.0.0:*    users:(("java",pid=15158,fd=327)) uid:1000 ino:89419 sk:11 <->

2) Visit centos202:9870 for the Hadoop data dashboard (the VM's hostname is centos202; the IP address also works).

3) Visit centos202:8088 for the cluster overview.

【6.5】Create an HDFS directory

[hadoop@centos202 ~]$ hadoop fs -mkdir /test
[hadoop@centos202 ~]$ 
[hadoop@centos202 ~]$ hadoop fs -ls /
drwxr-xr-x   - hadoop supergroup          0 2023-03-05 05:29 /test
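Beyond mkdir and ls, a quick way to confirm that reads and writes work is to round-trip a small file (the file names below are just examples):

```shell
# Round-trip a small file through HDFS to confirm reads and writes.
# (File names here are illustrative.)
echo "hello hdfs" > /tmp/hello.txt
hadoop fs -put /tmp/hello.txt /test/hello.txt
hadoop fs -cat /test/hello.txt          # should print: hello hdfs
hadoop fs -get /test/hello.txt /tmp/hello.copy.txt
```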

【Note】To stop the Hadoop services, stop HDFS and YARN:

[hadoop@centos202 ~]$ stop-dfs.sh
Stopping namenodes on [localhost]
Stopping datanodes
Stopping secondary namenodes [centos202]
[hadoop@centos202 ~]$ stop-yarn.sh
Stopping nodemanagers
Stopping resourcemanager
[hadoop@centos202 ~]$ 

【7】Install HBase

【7.1】Download and install HBase

  1. HBase releases: see http://apache.mirror.gtcomm.net/hbase/

  2. This article uses hbase-2.4.15. Download it with wget, or download it locally (e.g. through a proxy) and transfer it to CentOS with rz (the approach used here).

  3. Extract:

sudo tar -xzvf hbase-2.4.15-bin.tar.gz
[hadoop@centos202 hbase-2.4.15]$ pwd
/usr/local/hbase/hbase-2.4.15
[hadoop@centos202 hbase-2.4.15]$ ls
bin  CHANGES.md  conf  docs  hbase-webapps  LEGAL  lib  LICENSE.txt  NOTICE.txt  README.txt  RELEASENOTES.md

  4. Update the $PATH environment variable:

cat <<EOF | sudo tee /etc/profile.d/hadoop_java.sh
export JAVA_HOME=\$(dirname \$(dirname \$(readlink \$(readlink \$(which javac)))))
export HADOOP_HOME=/usr/local/hadoop/hadoop-3.2.4
export HADOOP_HDFS_HOME=\$HADOOP_HOME
export HADOOP_MAPRED_HOME=\$HADOOP_HOME
export YARN_HOME=\$HADOOP_HOME
export HADOOP_COMMON_HOME=\$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=\$HADOOP_HOME/lib/native
export HBASE_HOME=/usr/local/hbase/hbase-2.4.15
export PATH=\$PATH:\$JAVA_HOME/bin:\$HADOOP_HOME/bin:\$HADOOP_HOME/sbin:\$HBASE_HOME/bin
EOF

5. Reload the shell environment variables and verify HBASE_HOME:

[hadoop@centos202 hbase-2.4.15]$ source /etc/profile.d/hadoop_java.sh
[hadoop@centos202 conf]$ echo $HBASE_HOME
/usr/local/hbase/hbase-2.4.15 

6. Edit hbase-env.sh and set JAVA_HOME:

[hadoop@centos202 conf]$ pwd
/usr/local/hbase/hbase-2.4.15/conf
[hadoop@centos202 conf]$ 
[hadoop@centos202 conf]$ 
[hadoop@centos202 conf]$ vim hbase-env.sh

Change line 28 to:

export JAVA_HOME=$(dirname $(dirname $(readlink $(readlink $(which javac)))))

【7.2】Configure HBase (standalone)

1) Configure HBase the same way as Hadoop; all HBase configuration files are under /usr/local/hbase/hbase-2.4.15/conf.

2) In standalone mode, all daemons (HMaster, HRegionServer, and ZooKeeper) run in a single JVM on one machine.

【7.2.1】Create the HBase root directories

[hadoop@centos202 conf]$ sudo mkdir -p /hadoop/hbase/hfile
[hadoop@centos202 conf]$ sudo mkdir -p /hadoop/zookeeper
[hadoop@centos202 conf]$ sudo chown -R hadoop:hadoop /hadoop/

【7.2.2】Edit hbase-site.xml

<configuration>
   <property>
      <name>hbase.rootdir</name>
      <value>file:///hadoop/hbase/hfile</value>
   </property>
    
   <property>
      <name>hbase.zookeeper.property.dataDir</name>
      <value>/hadoop/zookeeper</value>
   </property>
</configuration>

【Note】By default, unless you configure hbase.rootdir, HBase data is still stored under /tmp/.
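As an optional variation (not part of this standalone setup): if you would rather have HBase persist to the HDFS instance configured earlier, hbase.rootdir would point at the NameNode address from core-site.xml instead of a local path:

```xml
<!-- Optional: store HBase data on HDFS instead of the local file system. -->
<property>
   <name>hbase.rootdir</name>
   <value>hdfs://localhost:9000/hbase</value>
</property>
```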


【7.3】Start HBase

Start HBase:

[hadoop@centos202 conf]$ start-hbase.sh 

【Note】For installing HBase on a cluster, see option 2 of https://computingforgeeks.com/how-to-install-apache-hadoop-hbase-on-centos-7/.


【7.4】Managing HMaster and HRegionServer (for reference only)

The HMaster server controls the HBase cluster. You can start up to 9 backup HMaster servers, for a total of 10.

HRegionServers manage the data in StoreFiles as directed by the HMaster. Normally, one HRegionServer runs on each node of the cluster.

HMaster and HRegionServer instances are started and stopped with local-master-backup.sh and local-regionservers.sh, respectively:

local-master-backup.sh start 2 # start a backup HMaster
local-regionservers.sh start 3 # start additional RegionServers
local-regionservers.sh stop 3 # stop RegionServers

【Note】

Each HMaster uses two ports (16000 and 16010 by default).


【8】Start the HBase shell

  1. Hadoop and HBase should be running before you start the HBase shell:

start-dfs.sh 
start-yarn.sh  
start-hbase.sh   

【Note】start-all.sh can be used instead of start-dfs.sh and start-yarn.sh.

  2. Start the HBase shell:

hbase shell 

  3. Stop HBase:

[hadoop@centos202 conf]$ stop-hbase.sh 
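Once the shell is up, a quick smoke test might look like the following (the table and column-family names are made up for illustration). These commands are typed inside hbase shell, not bash:

```ruby
# Inside `hbase shell`: create a table, write a cell, read it back, clean up.
create 'demo_table', 'cf'
put 'demo_table', 'row1', 'cf:greeting', 'hello hbase'
get 'demo_table', 'row1'
scan 'demo_table'
disable 'demo_table'
drop 'demo_table'
```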
