hbase 数据库深入使用与相关数据的加载

Posted 2020-10-30

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了hbase 数据库深入使用与相关数据的加载相关的知识，希望对你有一定的参考价值。

一：hbase 数据检索流程

二：hbase 数据库java api 调用

三：hbase 各个服务的作用

四：hbase 与mapreduce集成

五：hbase 使用BulkLoad 加载数据

一：hbase 数据检索流程

1.1 hbase 数据检索流程图：

技术分享图片

1.2 hbase 读的流程：

读流程：
1、client请求zookeeper集群（root/meta）(meta)
        --有多少table，table有哪些region（startrow、stoprow）
2、client找到region对应的region server
3、region server响应客户端请求

1.3. hhbase 写的流程

1、client请求zookeeper集群，该数据应该写入哪个region
2、向region所在的region server 发起写请求
3、数据先写进HLOG（WAL）
4、然后写入memstore（flush） 
5、当memstore达到阀值，写入storefile（compact）
6、当storefile达到阀值，合并成新的storefile
7、当region达到阀值，当前region会划分为两个新的region（split）

1.4 hbase 读写流程存储核心的三个机制

1. flush机制：当memstore满了以后会flush陈一个storefile
2. compact机制：当storefile达到阀值，合并storefile，合并过程中cell版本合并和数据删除
3. split机制：当region不断增大，达到阀值，region会分成两个新的region

二：hbase 数据库java api 调用

2.1 eclipse 环境配置

 更改maven 的源：
 上传repository.tar.gz

cd .m2
mv repository repository.bak2016612 
rz repository.tar.gz 
tar -zxvf repository.tar.gz 

cd /home/hadoop/yangyang/hbase
cp -p hbase-site.xml log4j.properties /home/hadoop/workspace/studyhbase/src/main/rescourse

更改eclipse 的pom.xml

增加：

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>0.98.6-hadoop2</version>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>0.98.6-hadoop2</version>
</dependency>

2.2 hbase java api 掉用：

package org.apache.hadoop.hbase;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class HbaseOperation {

    /**
     * 
     * @param args
     * @throws IOException
     */

    public static HTable getTable(String tableName) throws IOException {
        // Get configuration
        Configuration conf = HBaseConfiguration.create();

        // Get Table
        HTable table = new HTable(conf, tableName);

        return table;
    }

    public static void getData() throws IOException {
        HTable table = HbaseOperation.getTable("user");
        // Get Data
        Get get = new Get(Bytes.toBytes("1001"));
        Result result = table.get(get);
        Cell[] cells = result.rawCells();

        for (Cell cell : cells) {
            System.out.print(Bytes.toString(CellUtil.cloneFamily(cell)) + ":");
            System.out.print(Bytes.toString(CellUtil.cloneQualifier(cell))
                    + "==>");
            System.out.println(Bytes.toString(CellUtil.cloneValue(cell)));
        }
        table.close();
    }

    /**
     * 
     * @param args
     * @throws IOException
     */

    public static void putData() throws IOException {
        HTable table = HbaseOperation.getTable("user");

        Put put = new Put(Bytes.toBytes("1004"));

        put.add(Bytes.toBytes("info"), Bytes.toBytes("name"),
                Bytes.toBytes("zhaoliu"));
        put.add(Bytes.toBytes("info"), Bytes.toBytes("age"),
                Bytes.toBytes("50"));
        put.add(Bytes.toBytes("info"), Bytes.toBytes("sex"),
                Bytes.toBytes("male"));

        table.put(put);
        table.close();
    }

    public static void main(String[] args) throws IOException {
        HTable table = HbaseOperation.getTable("user");

        Scan scan = new Scan();
        scan.setStartRow(Bytes.toBytes("1001")) ;
        scan.setStopRow(Bytes.toBytes("1002")) ;
        ResultScanner resultScanner = table.getScanner(scan);

        for (Result res : resultScanner) {
            Cell[] ress = res.rawCells();
            for (Cell cell : ress) {
                System.out.print(Bytes.toString(CellUtil.cloneRow(cell))
                        + "\t");
                System.out.print(Bytes.toString(CellUtil.cloneFamily(cell))
                        + ":");
                System.out.print(Bytes.toString(CellUtil.cloneQualifier(cell))
                        + "==>");
                System.out.println(Bytes.toString(CellUtil.cloneValue(cell)));
            }
            table.close();

        }

    }
}

三：hbase 各个服务的作用

3.1 Hmaster 作用：

    1、为region server分配region
    2、负责region server的负责均衡
    3、发现失效的region server，需要重新分配其上的region
    4、监听zk,基于zookeeper感应region server的上下线
    5、监听zk,基于zookeeper来保证HA
    6、不参与客户端数据读写访问
    7、负载低（通常情况下可以把它和其他服务器（NN/SNN）整合在一起）
    8、无单点故障（SPOF）

3.2 Hregionserver 作用：

    1、维护master分配给它的region
    2、响应客户端的IO访问请求（读写）
    3、处理region的flush、compact、split
    4、维护region的cache

3.4 zookeeper 作用：

    1、保证集群里面只有一个master（HA）
    2、保存了root region的位置（meta），访问入口地址
    3、实时监控region server的状态，及时通知region server上下线消息给master
    4、存储了hbase的schema，包括哪些table，每个表有哪些列簇

四：hbase 与mapreduce集成

4.1 hbase 获取jar命令

bin/hbase mapredcp

4.2 配置环境变量

vim .bash_profile
export HADOOP_HOME=/home/hadoop/yangyang/hadoop
export HBASE_HOME=/home/hadoop/yangyang/hbase
export HADOOP_CLASSPATH=`$HBASE_HOME/bin/hbase mapredcp`
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:${HADOOP_HOME}/bin:${MAVEN_HOME}/bin:${HBASE_HOME}:${HADOOP_CLASSPATH}
soure .bash_profile

技术分享图片

4.3 统计一个hbase表：

cd /home/hadoop/yangyang/hadoop

bin/yarn jar /home/hadoop/yangyang/hbase/lib/hbase-server-0.98.6-cdh5.3.6.jar rowcounter user

技术分享图片

4.4 导入一个生成的hbase 表的in.tsv

vim in.tsv
---
10010   zhangsan        30      shanghai
10011   lisi    31      beijin
10012   wangwu  32      shanghai
10013   zaoliu  30      beijin

hdfs dfs -put in.tsv /input

yarn jar /home/hadoop/yangyang/hbase/lib/hbase-server-0.98.6-cdh5.3.6.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:age,info:address user /input/in.tsv

技术分享图片

五：使用BulkLoad加载数据

vim out.tsv 
110     zhangsan        30      shanghai
111     lisi    31      beijin
112     wangwu  32      shanghai
113     zaoliu  30      beijin

hdfs dfs -put out.tsv /input 
<!-- 将tsv 文件转换成hfile 文件（在hdfs 上面）-->
yarn jar /home/hadoop/yangyang/hbase/lib/hbase-server-0.98.6-cdh5.3.6.jar importtsv -Dimporttsv.bulk.output=/hfileoutput/ -Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:age,info:tel user /input/out.tsv

<!-- 将hfile 加载到hbase 的表中
yarn jar /home/hadoop/yangyang/hbase/lib/hbase-server-0.98.6-cdh5.3.6.jar completebulkload /hfileoutput user

技术分享图片

5.1 hbase 表中提取相关字段，生成新的表

提取hbase 表中的user 表的name 与age 字段生成新表student

package org.apache.hadoop.studyhbase;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class User2StudentMapReduce extends Configured implements Tool{

    // step 1: Mapper
    public static class ReadUserMapper //
            extends TableMapper<ImmutableBytesWritable, Put>{

        @Override
        protected void map(ImmutableBytesWritable key, Result value,
                Context context)
                        throws IOException, InterruptedException {
            // user: name & age  ->  student: name & age : put
            // create Put
            Put put = new Put(key.get()) ;
            // add column
            for(Cell cell: value.rawCells()){
                // add family: info
                if("info".equals(Bytes.toString(CellUtil.cloneFamily(cell)))){
                    // add column: name
                    if("name".equals(Bytes.toString(CellUtil.cloneQualifier(cell)))){
                        put.add(cell) ;
                    //  CellUtil.cloneValue(cell)
                    //  put.add(family, qualifier, value) ;
                    }
                    // add column: age
                    else if("age".equals(Bytes.toString(CellUtil.cloneQualifier(cell)))){
                        put.add(cell) ;
                    }
                }
            }
            // context output
            context.write(key, put);
        }

    }

    // step 2: Reducer
    public static class WriteStudentReducer //
            extends TableReducer<ImmutableBytesWritable, Put, NullWritable>{

        @Override
        protected void reduce(ImmutableBytesWritable key, Iterable<Put> values,
                Context context) throws IOException, InterruptedException {
            for(Put put : values){
                context.write(NullWritable.get(), put);
            }
        }
    }

    // step 3: Driver
    public int run(String[] args) throws Exception {
        // 1) Configuration
        Configuration conf = this.getConf();

        // 2) create job
        Job job = Job.getInstance(conf, this.getClass().getSimpleName()) ;
        job.setJarByClass(User2StudentMapReduce.class);

        // 3) set job
        // input -> mapper -> shuffle -> reducer -> output
        Scan scan = new Scan() ;
        scan.setCacheBlocks(false);
        scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs

        TableMapReduceUtil.initTableMapperJob(
                  "user",        // input table
                  scan,               // Scan instance to control CF and attribute selection
                  ReadUserMapper.class,     // mapper class
                  ImmutableBytesWritable.class,         // mapper output key
                  Put.class,  // mapper output value
                  job //
          );

        TableMapReduceUtil.initTableReducerJob(
                  "student",        // output table
                  WriteStudentReducer.class,    // reducer class
                  job //
            );

        job.setNumReduceTasks(1);   // at least one, adjust as required

        boolean isSuccess = job.waitForCompletion(true);
        if (!isSuccess) {
          throw new IOException("error with job!");
        }
        return isSuccess ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        int status = ToolRunner.run(//
                conf, //
                new User2StudentMapReduce(), //
                args //
            );
        System.exit(status);
    }

}

去hbase 上面新建空表student
create ‘student‘,‘info‘

技术分享图片

导出新生成jar包User2Student.jar运行：

cd /home/hadoop/jars
yarn jar User2Student.jar

技术分享图片

查询hbase进行验证：

技术分享图片

以上是关于hbase 数据库深入使用与相关数据的加载的主要内容，如果未能解决你的问题，请参考以下文章

hbase 数据库深入使用与相关数据的加载

一：hbase 数据检索流程

1.1 hbase 数据检索流程图：

1.2 hbase 读的流程：

1.3. hhbase 写的流程

1.4 hbase 读写流程存储核心的三个机制

二：hbase 数据库java api 调用

2.1 eclipse 环境配置

更改eclipse 的pom.xml

增加：

2.2 hbase java api 掉用：

三：hbase 各个服务的作用

3.1 Hmaster 作用：

3.2 Hregionserver 作用：

3.4 zookeeper 作用：

四：hbase 与mapreduce集成

4.1 hbase 获取jar命令

4.2 配置环境变量

4.3 统计一个hbase表：

4.4 导入一个生成的hbase 表的in.tsv

五： 使用BulkLoad加载数据

5.1 hbase 表中提取相关字段，生成新的表

导出新生成jar包User2Student.jar运行：

查询hbase进行验证：

五：使用BulkLoad加载数据