HBase Getting-Started Notes

Posted by master-dragon

HBase installation

  • Configuration file conf/hbase-env.sh
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_171.jdk/Contents/Home
export HBASE_CLASSPATH=/Users/mubi/hadoop/hbase-2.2.6/conf
# let HBase start and stop its own ZooKeeper instead of using an external one
export HBASE_MANAGES_ZK=true
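  • Configuration file conf/hbase-site.xml (the properties go inside its <configuration> element)
<!-- Option 1: store data on the local filesystem (standalone mode) -->
<property>
  <name>hbase.rootdir</name>
  <value>file:///Users/mubi/hadoop/data/hbase</value>
</property>
<!-- Option 2, instead of option 1: store data on HDFS; host:port must match fs.defaultFS in Hadoop's core-site.xml -->
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:9000/hbase</value>
</property>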
  • Environment variables
export HBASE_HOME=/Users/mubi/hadoop/hbase-2.2.6
export PATH=$PATH:$HBASE_HOME/bin

https://hbase.apache.org/book.html#datamodel

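  • Commands (run from the HBase installation directory)
./bin/start-hbase.sh   # start HBase
./bin/stop-hbase.sh    # stop HBase
hbase shell            # open the interactive HBase shell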
  • Normal startup
mubi@mubideMacBook-Pro hbase-2.2.6 $ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/mubi/hadoop/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/mubi/hadoop/hbase-2.2.6/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
Version 2.2.6, r88c9a386176e2c2b5fd9915d0e9d3ce17d0e456e, Tue Sep 15 17:36:14 CST 2020
Took 0.0027 seconds
hbase(main):001:0> list
TABLE
0 row(s)
Took 0.4003 seconds
=> []
hbase(main):002:0>

HBase basics

HBase official documentation

HBase architecture


Read path

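  • First fetch the metadata (the hbase:meta table, whose location is kept in ZooKeeper) to find out which RegionServer holds the row
  • Then perform the actual read from memory (MemStore, BlockCache) and from disk (StoreFiles)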
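
The two read steps can be sketched in Python under a toy model, not the HBase client API: the hypothetical `REGIONS` list stands in for hbase:meta, mapping each Region's start key to a made-up RegionServer name, and `read_cell` merges the MemStore with the StoreFiles so that the newest timestamp wins. The sample values come from row 201603 later in these notes.

```python
import bisect

# Hypothetical stand-in for hbase:meta: (region start key, RegionServer), sorted by start key.
# Each Region covers row keys from its start key up to (not including) the next start key.
REGIONS = [("", "rs1"), ("201602", "rs2"), ("201700", "rs3")]


def locate_region(row_key: str) -> str:
    """Step 1: find the RegionServer whose key range contains row_key."""
    start_keys = [start for start, _ in REGIONS]
    idx = bisect.bisect_right(start_keys, row_key) - 1
    return REGIONS[idx][1]


def read_cell(memstore: dict, storefiles: list, column: str):
    """Step 2: read one column from memory and disk.

    Each source maps column -> (timestamp, value); the newest timestamp wins,
    whether that version is still in the MemStore or already in a StoreFile.
    """
    candidates = [src[column] for src in [memstore, *storefiles] if column in src]
    return max(candidates)[1] if candidates else None


print(locate_region("201601"))  # rs1
print(locate_region("201603"))  # rs2
memstore = {"course:physics": (1610862848284, "60")}
storefiles = [{"course:physics": (6, "48"), "info:name": (6, "sun")}]
print(read_cell(memstore, storefiles, "course:physics"))  # 60
print(read_cell(memstore, storefiles, "info:name"))  # sun
```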

Write path


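A write is first appended to the HLog (the write-ahead log, WAL) and then placed in the in-memory buffer (the MemStore); the MemStore is later persisted to disk by a flush.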
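
A toy Python model of the write order, assuming a hypothetical `ToyRegion` class with a cell-count `flush_size` in place of HBase's byte-based threshold. It shows why the WAL comes first: if the process dies before a flush, the MemStore is lost but can be rebuilt by replaying the log.

```python
class ToyRegion:
    """Hypothetical model of one Region's write path; not the HBase API."""

    def __init__(self, flush_size: int):
        self.wal = []         # write-ahead log (HLog): every edit is appended here first
        self.memstore = {}    # in-memory buffer: (row, column) -> value
        self.storefiles = []  # immutable on-disk files, one per flush
        self.flush_size = flush_size  # hypothetical threshold, counted in cells

    def put(self, row: str, column: str, value: str) -> None:
        self.wal.append((row, column, value))  # 1. log the edit for durability
        self.memstore[(row, column)] = value   # 2. then buffer it in memory
        if len(self.memstore) >= self.flush_size:
            self.flush()                       # 3. persist once the buffer is full

    def flush(self) -> None:
        # Write the buffer out sorted by key (HFiles are sorted), then empty it.
        self.storefiles.append(dict(sorted(self.memstore.items())))
        self.memstore = {}
        # Flushed edits no longer need replaying; HBase archives old WAL files
        # once everything in them has been flushed.
        self.wal.clear()

    def crash_and_recover(self) -> None:
        """Lose the MemStore, then rebuild it by replaying the WAL."""
        self.memstore = {}
        for row, column, value in self.wal:
            self.memstore[(row, column)] = value


region = ToyRegion(flush_size=2)
region.put("201601", "info:name", "liu")
region.put("201601", "info:age", "15")    # second cell reaches flush_size: flush
region.put("201602", "info:name", "wang")
region.crash_and_recover()
print(len(region.storefiles), region.memstore)
# 1 {('201602', 'info:name'): 'wang'}
```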

Flush

Relevant settings in hbase-default.xml (override them in hbase-site.xml):

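  • hbase.regionserver.global.memstore.size: upper limit on the total size of all MemStores in a RegionServer; when it is reached, MemStores are flushed to StoreFiles and updates are blocked until usage drops. Default 0.4, i.e. 40% of the heap
  • hbase.regionserver.optionalcacheflushinterval: maximum time an edit may stay in a MemStore before it is flushed automatically; default 3600000 ms (1 hour)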

These two mechanisms work at the RegionServer level, so they can flush the MemStores of many Regions at once.

  • hbase.hregion.memstore.flush.size: when a single Region's MemStore exceeds this size, that Region is flushed; default 128 MB
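
The three triggers can be condensed into one hypothetical decision function. The constants are the defaults listed above; the function name, the check order and the sample heap size are assumptions for illustration, and the real flush logic covers more cases (for example a lower watermark for the global limit).

```python
MB = 1024 * 1024

# Defaults listed above; the decision function itself is a simplified, hypothetical sketch.
REGION_FLUSH_SIZE = 128 * MB            # hbase.hregion.memstore.flush.size
GLOBAL_MEMSTORE_FRACTION = 0.4          # hbase.regionserver.global.memstore.size
OPTIONAL_FLUSH_INTERVAL_MS = 3_600_000  # hbase.regionserver.optionalcacheflushinterval


def flush_reason(region_bytes, total_memstore_bytes, heap_bytes, oldest_edit_age_ms):
    """Return which flush trigger fires for one Region, or None."""
    if total_memstore_bytes >= GLOBAL_MEMSTORE_FRACTION * heap_bytes:
        return "global"    # all MemStores on the RegionServer use too much heap
    if oldest_edit_age_ms >= OPTIONAL_FLUSH_INTERVAL_MS:
        return "periodic"  # an edit has stayed in memory for an hour
    if region_bytes > REGION_FLUSH_SIZE:
        return "region"    # this Region's MemStore is over 128 MB
    return None


heap = 4096 * MB
print(flush_reason(130 * MB, 500 * MB, heap, 1_000))     # region
print(flush_reason(10 * MB, 1700 * MB, heap, 1_000))     # global
print(flush_reason(10 * MB, 500 * MB, heap, 3_600_000))  # periodic
print(flush_reason(10 * MB, 500 * MB, heap, 1_000))      # None
```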

Compaction: merging small files

A MemStore may be flushed while it holds only a little data, so flushing can leave many small StoreFiles on disk; these need to be merged periodically.

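  • hbase.hregion.majorcompaction: interval between automatic major compactions (which rewrite all StoreFiles of a Store into one); default 7 days. Major compaction is very resource-intensive, so production clusters often disable it (set it to 0) and run major_compact manually during off-peak hours
  • hbase.hstore.compactionThreshold: when the number of StoreFiles in a Store (one column family within a Region) passes this threshold, a compaction is triggered automatically; default 3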
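
A compaction is essentially a k-way merge of already-sorted files. The sketch below uses Python's `heapq.merge` on a hypothetical tuple layout and also trims each cell to `max_versions`, standing in for the column family's VERSIONS setting; it ignores delete markers and TTL, which real compactions also handle.

```python
import heapq


def compact(storefiles, max_versions):
    """Merge sorted StoreFiles into one, keeping at most max_versions per cell.

    Each StoreFile is a list of (row, column, -timestamp, value) tuples sorted
    ascending; negating the timestamp puts the newest version of a cell first.
    """
    merged = []
    kept = {}  # (row, column) -> versions kept so far
    for row, column, neg_ts, value in heapq.merge(*storefiles):
        count = kept.get((row, column), 0)
        if count < max_versions:
            merged.append((row, column, neg_ts, value))
            kept[(row, column)] = count + 1
    return merged


old_file = [("201603", "course:physics", -6, "48")]
new_file = [
    ("201603", "course:english", -8, "73"),
    ("201603", "course:physics", -1610862848284, "60"),
]
print(compact([old_file, new_file], max_versions=1))
# [('201603', 'course:english', -8, '73'), ('201603', 'course:physics', -1610862848284, '60')]
```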

HBase data model

Basic table operations

Create and list a table

hbase(main):002:0>    create 'student','info','course'
Created table student
Took 1.3815 seconds
=> Hbase::Table - student
hbase(main):003:0>
hbase(main):004:0* list
TABLE
student
1 row(s)
Took 0.0295 seconds
=> ["student"]
hbase(main):005:0>

Alter / describe the table schema

hbase(main):002:0> alter 'student','NAME'=>'course','VERSIONS'=>'3'
Updating all regions with the new schema...
1/1 regions updated.
Done.
Took 2.2785 seconds
hbase(main):003:0> desc 'student'
Table student is ENABLED
student
COLUMN FAMILIES DESCRIPTION
NAME => 'course', VERSIONS => '3', EVICT_BLOCKS_ON_CLOSE => 'false',
NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false',
DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0',
BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false',
CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE',
BLOCKCACHE => 'true', BLOCKSIZE => '65536'

NAME => 'info', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false',
NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false',
DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0',
BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false',
CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE',
BLOCKCACHE => 'true', BLOCKSIZE => '65536'

2 row(s)

QUOTAS
0 row(s)
Took 0.1168 seconds
hbase(main):004:0>

Insert data

info column family: name, age, sex, dept (the sex values are pinyin: nv = female, nan = male)
course column family: english, math, physics

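# put 'table', 'row key', 'family:qualifier', value[, timestamp]
# A trailing number (4, 7, 6, 3, 8 below) is an explicit cell timestamp, not part of the value
put 'student','201601','info:name','liu',4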
put 'student','201601','info:age',15
put 'student','201601','info:sex','nv'
put 'student','201601','info:dept','PE'

put 'student','201602','info:name','wang'
put 'student','201602','info:age',16,7
put 'student','201602','info:sex','nan'
put 'student','201602','info:dept','PC'

put 'student','201603','info:name','sun',6
put 'student','201603','info:age',19
put 'student','201603','info:sex','nv'
put 'student','201603','info:dept','JAVA'

put 'student','201601','course:english',72,3
put 'student','201601','course:math',79
put 'student','201601','course:physics',82

put 'student','201602','course:english',62
put 'student','201602','course:math',68,8
put 'student','201602','course:physics',49

put 'student','201603','course:english',73,8
put 'student','201603','course:math',69
put 'student','201603','course:physics',48,6
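
The puts above fill a map from row key to column (family:qualifier) to timestamp to value, and `get` returns the newest version of each column. A hypothetical `ToyTable` class models that shape; it is a sketch of the logical data model, not the HBase client API.

```python
import time


class ToyTable:
    """Hypothetical model of the logical data model (not the HBase client API):
    row key -> "family:qualifier" -> {timestamp: value}.
    """

    def __init__(self, families):
        self.families = set(families)  # column families are declared up front (create/alter)
        self.rows = {}

    def put(self, row, column, value, ts=None):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        if ts is None:
            ts = int(time.time() * 1000)  # default: current time in milliseconds
        # Values are stored as strings, so 15 is kept as "15".
        self.rows.setdefault(row, {}).setdefault(column, {})[ts] = str(value)

    def get(self, row):
        """Newest version of each column in the row, ordered by column name."""
        cells = self.rows.get(row, {})
        return {col: versions[max(versions)] for col, versions in sorted(cells.items())}


student = ToyTable(["info", "course"])
student.put("201601", "info:name", "liu", 4)  # explicit timestamp 4
student.put("201601", "info:age", 15)         # timestamp = now
print(student.get("201601"))  # {'info:age': '15', 'info:name': 'liu'}
```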

Viewing data with get

hbase(main):004:0> get 'student','201601'
COLUMN                CELL
 course:english       timestamp=3, value=72
 course:math          timestamp=1610862600400, value=79
 course:physics       timestamp=1610862683257, value=82
 info:age             timestamp=1610862599809, value=15
 info:dept            timestamp=1610862599915, value=PE
 info:name            timestamp=4, value=liu
 info:sex             timestamp=1610862599862, value=nv
1 row(s)
Took 0.0528 seconds
hbase(main):005:0> get 'student','201602'
COLUMN                CELL
 course:physics       timestamp=1610862689724, value=49
 info:age             timestamp=7, value=16
 info:dept            timestamp=1610862600086, value=PC
 info:name            timestamp=1610862599977, value=wang
 info:sex             timestamp=1610862600046, value=nan
1 row(s)
Took 0.0257 seconds
hbase(main):006:0> get 'student','201603'
COLUMN                CELL
 course:english       timestamp=8, value=73
 course:math          timestamp=1610862600605, value=69
 course:physics       timestamp=6, value=48
 info:age             timestamp=1610862600198, value=19
 info:dept            timestamp=1610862600282, value=JAVA
 info:name            timestamp=6, value=sun
 info:sex             timestamp=1610862600239, value=nv
1 row(s)
Took 0.0182 seconds
hbase(main):007:0>

Updating data with put

hbase(main):006:0> get 'student','201603'
COLUMN                CELL
 course:english       timestamp=8, value=73
 course:math          timestamp=1610862600605, value=69
 course:physics       timestamp=6, value=48
 info:age             timestamp=1610862600198, value=19
 info:dept            timestamp=1610862600282, value=JAVA
 info:name            timestamp=6, value=sun
 info:sex             timestamp=1610862600239, value=nv
1 row(s)
Took 0.0182 seconds
hbase(main):007:0> put 'student','201603','course:physics',60
Took 0.0085 seconds
hbase(main):008:0> get 'student','201603'
COLUMN                CELL
 course:english       timestamp=8, value=73
 course:math          timestamp=1610862600605, value=69
 course:physics       timestamp=1610862848284, value=60
 info:age             timestamp=1610862600198, value=19
 info:dept            timestamp=1610862600282, value=JAVA
 info:name            timestamp=6, value=sun
 info:sex             timestamp=1610862600239, value=nv
1 row(s)
Took 0.0361 seconds
hbase(main):009:0>
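
The update does not overwrite in place: the put adds a new version of course:physics with a newer timestamp, and `get` shows the newest one. Because the course family was altered to VERSIONS => '3', the old value 48 is still retained. A minimal sketch with a hypothetical helper `put_version` (in real HBase, surplus versions may stay on disk until a compaction removes them, but reads never return more than VERSIONS):

```python
def put_version(versions, ts, value, max_versions):
    """Store a new version, then keep only the newest max_versions.

    `versions` maps timestamp -> value for one cell; max_versions plays the
    role of the column family's VERSIONS setting.
    """
    versions[ts] = value
    for old_ts in sorted(versions, reverse=True)[max_versions:]:
        del versions[old_ts]


physics = {}  # course:physics of row 201603; the course family has VERSIONS => 3
put_version(physics, 6, "48", max_versions=3)
put_version(physics, 1610862848284, "60", max_versions=3)
print(physics[max(physics)])  # 60 (what get shows)
print(sorted(physics))        # [6, 1610862848284] (the old version is still kept)
```

With VERSIONS => 1 (the info family), the same helper keeps only the newest version, and a put carrying an older timestamp does not change what `get` returns.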

get queries

hbase(main):012:0> get 'student','201603'
COLUMN                CELL
 course:english       timestamp=8, value=73
 course:math          timestamp=1610862600605, value=69
 course:physics       timestamp=1610862848284, value=60
 info:age             timestamp=1610862600198, value=19
 info:dept            timestamp=1610862600282, value=JAVA
 info:name            timestamp=6, value=sun
 info:sex             timestamp=1610862600239, value=nv
1 row(s)
Took 0.0078 seconds
hbase(main):013:0> get 'student','201603',COLUMN=>'course',TIMERANGE=>[5,8]
COLUMN                CELL
 course:physics       timestamp=6, value=48
1 row(s)
Took 0.0118 seconds
hbase(main):014:0>
hbase(main):014:0> get 'student','201603',COLUMN=>'course',TIMERANGE=>[7,8]
COLUMN                CELL
0 row(s)
Took 0.0051 seconds
hbase(main):015:0>
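
TIMERANGE => [min, max] includes min but excludes max, which is why [5,8] does not return course:english (timestamp 8) and [7,8] returns nothing. The [5,8] query also finds the old course:physics value 48 (timestamp 6), which survives only because the course family keeps 3 versions. A hypothetical helper `get_in_timerange` reproduces both results on the data above:

```python
def get_in_timerange(versions_by_column, family, min_ts, max_ts):
    """Newest version per column of `family` with min_ts <= timestamp < max_ts."""
    result = {}
    for column, versions in sorted(versions_by_column.items()):
        if not column.startswith(family + ":"):
            continue
        in_range = [ts for ts in versions if min_ts <= ts < max_ts]
        if in_range:
            ts = max(in_range)
            result[column] = (ts, versions[ts])
    return result


# Row 201603 after the update, including the retained older physics version.
row_201603 = {
    "course:english": {8: "73"},
    "course:math": {1610862600605: "69"},
    "course:physics": {6: "48", 1610862848284: "60"},
    "info:name": {6: "sun"},
}
print(get_in_timerange(row_201603, "course", 5, 8))  # {'course:physics': (6, '48')}
print(get_in_timerange(row_201603, "course", 7, 8))  # {}
```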

scan queries

hbase(main):015:0> scan 'student',COLUMN => 'info:name'
ROW                   COLUMN+CELL
 201601               column=info:name, timestamp=4, value=liu
 201602               column=info:name, timestamp=1610862599977, value=wang
 201603               column=info:name, timestamp=6, value=sun
3 row(s)
Took 0.0206 seconds
hbase(main):016:0> scan 'student',COLUMN => 'info:dept'
ROW                   COLUMN+CELL
 201601               column=info:dept, timestamp=1610862599915, value=PE
 201602               column=info:dept, timestamp=1610862600086, value=PC
 201603               column=info:dept, timestamp=1610862600282, value=JAVA
3 row(s)
Took 0.0107 seconds
hbase(main):017:0>
hbase(main):027:0> scan 'student',COLUMN => 'course'
ROW                   COLUMN+CELL
 201601               column=course:english, timestamp=3, value=72
 201601               column=course:math, timestamp=1610862600400, value=79
 201601               column=course:physics, timestamp=1610862683257, value=82
 201602               column=course:physics, timestamp=1610862689724, value=49
 201603               column=course:english, timestamp=8, value=73
 201603               column=course:math, timestamp=1610862600605, value=69
 201603               column=course:physics, timestamp=1610862848284, value=60
3 row(s)
Took 0.0120 seconds
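
HBase stores rows sorted by row key, compared byte by byte, so a scan returns rows in that order, and COLUMN narrows the output to one family or one family:qualifier. A hypothetical `scan` function over a plain dict (newest values only) shows both points; note that "10" sorts before "9", so fixed-width keys such as 201601 keep numeric order and key order in step.

```python
def scan(table, column):
    """Yield (row, column, value) in row-key order.

    `column` is either a whole family ("course") or one "family:qualifier".
    Keys are compared as Python strings, which matches HBase's byte-wise
    ordering for ASCII row keys like these.
    """
    for row in sorted(table):
        for col in sorted(table[row]):
            if col == column or col.startswith(column + ":"):
                yield row, col, table[row][col]


# Newest values only; the keys are deliberately inserted out of order.
student = {
    "201603": {"course:math": "69", "info:name": "sun"},
    "201601": {"course:english": "72", "info:name": "liu"},
}
for cell in scan(student, "info:name"):
    print(cell)
# ('201601', 'info:name', 'liu')
# ('201603', 'info:name', 'sun')
print(sorted(["9", "10"]))  # ['10', '9']: lexicographic, not numeric
```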

Filters

RowFilter
hbase(main):020:0> scan 'student',FILTER=>"RowFilter(=,'substring:2')"
ROW                   COLUMN+CELL
 201601               column=course:english, timestamp=3, value=72
 201601               column=course:math, timestamp=1610862600400, value=79
 201601               column=course:physics, timestamp=1610862683257, value=82
 201601               column=info:age, timestamp=1610862599809, value=15
 201601               column=info:dept, timestamp=1610862599915, value=PE
 201601               column=info:name, timestamp=4, value=liu
 201601               column=info:sex, timestamp=1610862599862, value=nv
 201602               column=course:physics, timestamp=1610862689724, value=49
 201602               column=info:age, timestamp=7, value=16
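
'substring:2' matches every row key in the table, because 201601, 201602 and 201603 all contain 2. HBase's SubstringComparator performs a case-insensitive containment test; the hypothetical function below mimics that behavior on plain strings (the filter language also offers other comparators such as binary:, binaryprefix: and regexstring:).

```python
def row_filter_substring(row_keys, substring):
    """Keep the row keys that contain `substring`, ignoring case, in row-key order.

    Mirrors the intent of RowFilter(=, 'substring:...'), whose SubstringComparator
    does a case-insensitive containment test.
    """
    needle = substring.lower()
    return [key for key in sorted(row_keys) if needle in key.lower()]


rows = ["201603", "201601", "201602"]
print(row_filter_substring(rows, "2"))   # ['201601', '201602', '201603']
print(row_filter_substring(rows, "03"))  # ['201603']
```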
