HBase operations

Posted by jason-dong


Preface: this article was compiled by the editors of 小常识网 (cha138.com). It mainly covers HBase operations; we hope you find it useful.

Notes taken while watching a video
Video: HBase tutorial
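1. Differences from traditional relational databases
HBase                                              Traditional RDBMS
Distributed                                        Single-node
Columns can be added or removed dynamically        Columns fixed when the table is created
One data type: raw bytes (shown as strings)        Numeric, character and other types
Null values are not stored                         Null values are stored
No SQL support                                     SQL
Access only by rowkey, rowkey range or full scan   Rich queries
Column(-family)-oriented                           Row-oriented
Unstructured / semi-structured (JSON-like)         Structured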

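2. HBase features:
Distributed
Fast random writes and simple key-based reads. Single-record updates work by writing to an existing rowkey and column, which stores a newer version, and operations on a single row are atomic.
Billions of rows and millions of columns (relational databases limit the number of columns)
Column-oriented storage
No SQL; access is through the Java API (a layer such as Apache Phoenix can be put on top to query it with SQL)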

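3. Can HBase replace a relational database?
No multi-row transactions, so transactional data such as trades stays in MySQL
No rich queries such as joins
It can only serve as a complement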

4. Role of the HMaster
1. Manages the RegionServers
2. Handles DDL (metadata definitions)
 
5. Role of the RegionServer
1. Handles DML
2. Maintains the WAL (write-ahead log)
 
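6. Basic concepts:
DML (Data Manipulation Language) commands let users query a database and manipulate the data already in it.
insert, delete, update and select are all DML.
DDL (Data Definition Language) statements define and manage database objects, e.g. create, alter and drop.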
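Sections 4 to 6 split the work between the HMaster and the RegionServers. The sketch below sorts the shell commands used later in this post into DDL and DML and names the daemon that mainly serves each kind. This is a simplification. DML clients locate a region and then talk to its RegionServer directly. Admin commands such as flush and major_compact are left out on purpose. `route` and the two command sets are hypothetical names for illustration only.

```python
# Hypothetical, simplified classification of the shell commands in this post.
DDL_COMMANDS = {"create", "alter", "disable", "drop", "truncate",
                "create_namespace", "drop_namespace"}
DML_COMMANDS = {"put", "get", "scan", "delete", "deleteall"}


def route(command: str) -> str:
    """Return which daemon mainly handles a shell command in this sketch."""
    if command in DDL_COMMANDS:
        return "HMaster"        # schema / metadata changes
    if command in DML_COMMANDS:
        return "RegionServer"   # reads and writes of table data
    raise ValueError(f"not classified in this sketch: {command}")
```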
 
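7. HBase logical view:
It resembles a SortedMap whose key is the three-dimensional coordinate (rowkey, column, version). A query must supply the rowkey; column and version are optional, depending on how fine-grained the query is.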
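The sorted-map view can be modelled in a few lines. This is a minimal in-memory sketch, not the HBase client API, and `LogicalTable` is a hypothetical name. Keys are (rowkey, column, timestamp) and are ordered by rowkey, then column, then newest timestamp first. The rowkey is mandatory while column and version narrow the result. Unlike a real get, which returns only the newest version by default, this `get` returns every stored version.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str, int]  # (rowkey, "family:qualifier", timestamp)


class LogicalTable:
    """Hypothetical model of the (rowkey, column, version) -> value map."""

    def __init__(self) -> None:
        self.cells: Dict[Key, str] = {}

    def put(self, row: str, column: str, ts: int, value: str) -> None:
        self.cells[(row, column, ts)] = value

    def _sorted_keys(self) -> List[Key]:
        # Order by rowkey, then column, then timestamp descending (newest first).
        return sorted(self.cells, key=lambda k: (k[0], k[1], -k[2]))

    def get(self, row: str, column: Optional[str] = None,
            ts: Optional[int] = None) -> List[Tuple[str, int, str]]:
        """Rowkey is required; column and version only narrow the result."""
        result = []
        for r, c, t in self._sorted_keys():
            if r != row:
                continue
            if column is not None and c != column:
                continue
            if ts is not None and t != ts:
                continue
            result.append((c, t, self.cells[(r, c, t)]))
        return result
```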
 
8. HBase physical storage:
1. table = n regions, split horizontally by rowkey range
2. region = n stores, one store per column family
3. store = 1 MemStore (in memory) + n HFiles (files on HDFS); every MemStore flush produces one new HFile
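The three-level layout can be sketched as plain data structures. This is a minimal model under stated assumptions, and `Table`, `Region` and `Store` are hypothetical classes. Each region owns the rowkeys from its start key up to the next region's start key. A rowkey is routed to the region with the greatest start key that is not larger than it. Each column family gets its own store, and a flush turns the MemStore into one new immutable HFile.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Store:
    memstore: Dict[str, str] = field(default_factory=dict)
    hfiles: List[Dict[str, str]] = field(default_factory=list)

    def flush(self) -> None:
        # Each flush writes the sorted MemStore out as one new HFile.
        if self.memstore:
            self.hfiles.append(dict(sorted(self.memstore.items())))
            self.memstore = {}


@dataclass
class Region:
    start_key: str              # inclusive; "" means "from the first rowkey"
    stores: Dict[str, Store]    # one store per column family


class Table:
    def __init__(self, families: List[str], split_keys: List[str]) -> None:
        self._starts = [""] + sorted(split_keys)
        self.regions = [Region(s, {f: Store() for f in families})
                        for s in self._starts]

    def region_for(self, rowkey: str) -> Region:
        # Greatest start key <= rowkey.
        idx = bisect.bisect_right(self._starts, rowkey) - 1
        return self.regions[idx]

    def put(self, rowkey: str, family: str, qualifier: str, value: str) -> None:
        store = self.region_for(rowkey).stores[family]
        store.memstore[f"{rowkey}/{qualifier}"] = value  # simplified cell key
```

With split keys "4", "8" and "c" the table has four regions. Rowkey "123" lands in the first region and "9abc" in the one starting at "8".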

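9. HBase design recommendations
1. Define your own namespace (the equivalent of a database)
2. Define a sensible schema
3. Pre-split sensibly when creating a table (split strategies: pre-split, auto-split, force-split)
4. Choose a suitable field as the rowkey, such as a phone number or IMSI
5. Keep column family and column names short to save storage space
6. Set a suitable number of versions; keeping 3 is recommended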
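Pre-splitting with HexStringSplit, used in step 5 below, spreads split points evenly over a hex keyspace. As far as we understand, that keyspace defaults to "00000000" to "FFFFFFFF". The sketch below reproduces that even-division idea. `hex_split_keys` is a hypothetical helper, and HBase's own `HexStringSplit` may format or place boundaries differently.

```python
from typing import List


def hex_split_keys(num_regions: int, width: int = 8) -> List[str]:
    """Evenly spaced split keys over the `width`-digit hex keyspace (sketch)."""
    if num_regions < 2:
        raise ValueError("need at least 2 regions to produce split keys")
    keyspace = 16 ** width            # number of distinct hex keys of this width
    step = keyspace // num_regions
    # n regions need n - 1 boundaries.
    return [format(step * i, f"0{width}x") for i in range(1, num_regions)]
```

For 4 regions this yields "40000000", "80000000" and "c0000000". Such splits only balance load when rowkeys are spread over the hex alphabet, for example when they start with a hash. A raw phone number starting with "1" would always fall into the first region.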

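10. HBase operations
1. put: single or batch; there is no separate update method. It behaves like a map: writing an existing key stores a new version.
2. delete: single or batch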
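The "no update, just put" behaviour amounts to a map of versioned values. In this minimal sketch, `VersionedMap` is hypothetical and a counter stands in for HBase's millisecond timestamps. A put on an existing row and column adds a newer version, and a read returns the version with the largest timestamp. As a simplification, `delete` drops the cell immediately. Real HBase writes a delete marker instead, as the raw scans later in this post show.

```python
import itertools
from typing import Dict, List, Optional, Tuple

_clock = itertools.count(1)  # stand-in for HBase's millisecond timestamps


class VersionedMap:
    def __init__(self) -> None:
        self.data: Dict[Tuple[str, str], Dict[int, str]] = {}

    def put(self, row: str, column: str, value: str,
            ts: Optional[int] = None) -> None:
        # No update method: every put just adds a version.
        versions = self.data.setdefault((row, column), {})
        versions[ts if ts is not None else next(_clock)] = value

    def put_batch(self, cells: List[Tuple[str, str, str]]) -> None:
        for row, column, value in cells:
            self.put(row, column, value)

    def get(self, row: str, column: str) -> Optional[str]:
        versions = self.data.get((row, column))
        if not versions:
            return None
        return versions[max(versions)]  # newest timestamp wins

    def delete(self, row: str, column: str) -> None:
        # Simplified: real HBase writes a tombstone instead of removing data.
        self.data.pop((row, column), None)
```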
 
11. Hands-on walkthrough:
./hbase shell
1) Basic status queries
hbase(main):006:0> status
1 active master, 0 backup masters, 2 servers, 0 dead, 1.0000 average load
Took 0.0175 seconds 
                                                            
hbase(main):007:0> whoami
hadoop (auth:SIMPLE)
    groups: hadoop
Took 0.0006 seconds

2) Show the usage of a specific command

hbase(main):012:0> help "status"
Show cluster status. Can be summary, simple, detailed, or replication. The
default is summary. Examples:
  hbase> status
  hbase> status simple
  hbase> status summary
  hbase> status detailed
  hbase> status replication
  hbase> status replication, source
  hbase> status replication, sink
hbase(main):013:0> 

3) List namespaces (tab completion works in the shell)

hbase(main):013:0> list_namespace
NAMESPACE                                                                       
default                                                                         
hbase                                                                           
2 row(s)
Took 0.1524 seconds                                                             
hbase(main):014:0> 

4) Create a namespace

create             create_namespace   
hbase(main):019:0> create_namespace 'gp'
Took 0.2463 seconds                                                             
hbase(main):020:0> 
hbase(main):020:0> list_namespace
NAMESPACE                                                                       
default                                                                         
gp                                                                              
hbase                                                                           
3 row(s)
Took 0.0270 seconds    
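Table names are qualified as `namespace:table`, and a table created without a prefix goes into the built-in `default` namespace. This small sketch shows the split. `parse_table_name` is a hypothetical helper, not an HBase function.

```python
from typing import Tuple


def parse_table_name(name: str) -> Tuple[str, str]:
    """Split 'namespace:table'; names without a prefix live in 'default'."""
    namespace, sep, table = name.partition(":")
    if not sep:
        return "default", name
    if not namespace or not table:
        raise ValueError(f"malformed table name: {name!r}")
    return namespace, table
```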

5) Create a table with pre-split regions:

create 'namespace:tablename', 'columnfamily', ...
hbase(main):024:0> create 'gp:test', 'info', {NUMREGIONS => 4, SPLITALGO => 'HexStringSplit'}
Created table gp:test
Took 2.6835 seconds                                                             
=> Hbase::Table - gp:test
hbase(main):025:0> desc 'gp:test'
Table gp:test is ENABLED                                                        
gp:test                                                                         
COLUMN FAMILIES DESCRIPTION                                                     
{NAME => info, VERSIONS => 1, EVICT_BLOCKS_ON_CLOSE => false, NEW_VERSION_
BEHAVIOR => false, KEEP_DELETED_CELLS => FALSE, CACHE_DATA_ON_WRITE => fals
e, DATA_BLOCK_ENCODING => NONE, TTL => FOREVER, MIN_VERSIONS => 0, REPLIC
ATION_SCOPE => 0, BLOOMFILTER => ROW, CACHE_INDEX_ON_WRITE => false, IN_ME
MORY => false, CACHE_BLOOMS_ON_WRITE => false, PREFETCH_BLOCKS_ON_OPEN => f
alse, COMPRESSION => NONE, BLOCKCACHE => true, BLOCKSIZE => 65536}       
1 row(s)
Took 0.3126 seconds                                                             
hbase(main):026:0>

6) Alter the table so that the column family keeps 3 versions instead of 1

hbase(main):028:0> alter 'gp:test', {NAME => 'info', VERSIONS => 3}
Updating all regions with the new schema...
4/4 regions updated.
Done.
Took 2.3734 seconds                                                             
hbase(main):029:0> desc 'gp:test'
Table gp:test is ENABLED                                                        
gp:test                                                                         
COLUMN FAMILIES DESCRIPTION                                                     
{NAME => info, VERSIONS => 3, EVICT_BLOCKS_ON_CLOSE => false, NEW_VERSION_
BEHAVIOR => false, KEEP_DELETED_CELLS => FALSE, CACHE_DATA_ON_WRITE => fals
e, DATA_BLOCK_ENCODING => NONE, TTL => FOREVER, MIN_VERSIONS => 0, REPLIC
ATION_SCOPE => 0, BLOOMFILTER => ROW, CACHE_INDEX_ON_WRITE => false, IN_ME
MORY => false, CACHE_BLOOMS_ON_WRITE => false, PREFETCH_BLOCKS_ON_OPEN => f
alse, COMPRESSION => NONE, BLOCKCACHE => true, BLOCKSIZE => 65536}       
1 row(s)
Took 0.0597 seconds                                                             
hbase(main):030:0>
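VERSIONS => 3 means at most three versions of each cell are retained, and reads return the newest first. In this minimal sketch, `Cell` is a hypothetical class. It trims extra versions immediately on write for simplicity. Real HBase filters them at read time and physically removes them during compaction.

```python
from typing import Dict, List, Tuple


class Cell:
    """Hypothetical cell that keeps at most `max_versions` timestamps."""

    def __init__(self, max_versions: int) -> None:
        self.max_versions = max_versions
        self.versions: Dict[int, str] = {}

    def put(self, ts: int, value: str) -> None:
        self.versions[ts] = value
        # Drop the oldest timestamps beyond the retention limit.
        for old_ts in sorted(self.versions, reverse=True)[self.max_versions:]:
            del self.versions[old_ts]

    def get(self, n: int = 1) -> List[Tuple[int, str]]:
        # Newest first, similar in spirit to `get ..., {VERSIONS => n}`.
        return sorted(self.versions.items(), reverse=True)[:n]
```

With `Cell(3)`, writing timestamps 1 to 4 leaves versions 2, 3 and 4, and `get()` returns only the newest one.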

7) Insert data:

Syntax: put 'namespace:tablename', 'rowkey', 'columnfamily:column', 'value'[, timestamp] (the version is optional and defaults to the current timestamp)
hbase(main):030:0> put 'gp:test', '123', 'info:col1', 'v1'
Took 0.2623 seconds                                                                                                                       
hbase(main):033:0> scan 'gp:test'
ROW                   COLUMN+CELL                                               
 123                  column=info:col1, timestamp=1534082352792, value=v1       
1 row(s)
Took 0.1840 seconds 

8) Insert a row with an explicit timestamp, then query with get:

hbase(main):035:0> put 'gp:test', '456', 'info:col1', 'v2', 12
Took 0.0188 seconds                                                             
hbase(main):036:0> scan 'gp:test'
ROW                   COLUMN+CELL                                               
 123                  column=info:col1, timestamp=1534082352792, value=v1       
 456                  column=info:col1, timestamp=12, value=v2                  
2 row(s)
Took 0.0526 seconds                                                             
hbase(main):037:0> get 'gp:test', '123'
COLUMN                CELL                                                      
 info:col1            timestamp=1534082352792, value=v1                         
1 row(s)
Took 0.0783 seconds                                                             
hbase(main):038:0> 

9) get a specific column of rowkey '123'

hbase(main):038:0> put 'gp:test', '123', 'info:col2', 'v3'
Took 0.0487 seconds                                                             
hbase(main):039:0> get 'gp:test', '123', 'info:col1'
COLUMN                CELL                                                      
 info:col1            timestamp=1534082352792, value=v1                         
1 row(s)
Took 0.0104 seconds                                                             
hbase(main):040:0>

10) Delete a specific column of a row:

hbase(main):022:0> delete 'gp:test', '123', 'info:col1'
hbase(main):043:0> scan 'gp:test'
ROW                   COLUMN+CELL                                               
 123                  column=info:col2, timestamp=1534082891558, value=v3       
 456                  column=info:col1, timestamp=12, value=v2                  
2 row(s)
Took 0.0606 seconds                                                             
hbase(main):044:0>

11) Delete an entire row:

hbase(main):044:0> deleteall 'gp:test', '456'
Took 0.0225 seconds                                                             
hbase(main):045:0> scan 'gp:test'
ROW                   COLUMN+CELL                                               
 123                  column=info:col2, timestamp=1534082891558, value=v3       
1 row(s)
Took 0.0687 seconds                                                             
hbase(main):046:0> 

A delete does not remove the data immediately; it only writes a delete marker (tombstone).
The markers are visible with a raw scan:
hbase(main):050:0> scan 'gp:test', {RAW => true, VERSIONS => 10}
ROW                   COLUMN+CELL                                               
 123                  column=info:col1, timestamp=1534082352792, type=Delete    
 123                  column=info:col1, timestamp=1534082352792, value=v1       
 123                  column=info:col2, timestamp=1534082891558, value=v3       
 456                  column=info:, timestamp=1534083246672, type=DeleteFamily  
 456                  column=info:col1, timestamp=12, value=v2                  
2 row(s)
Took 0.1143 seconds                                                             
hbase(main):051:0> 
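A delete is really a write, just like a put, that inserts a cell of type Delete* (here Delete and DeleteFamily).
At this point the data is still in the MemStore and has not yet been flushed to an HFile.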
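The raw scan above can be reproduced with a small model of delete markers. This is a sketch under simplifying assumptions: `KV`, `masked` and `scan` are hypothetical names, and only the Delete, DeleteColumn and DeleteFamily marker types are modelled. A marker hides matching cells whose timestamp is not newer than its own. A normal scan filters them out, while a raw scan returns everything, markers included.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KV:
    row: str
    column: str   # "family:qualifier", or "family:" for a family-wide marker
    ts: int
    kind: str     # "Put", "Delete", "DeleteColumn" or "DeleteFamily"
    value: str = ""


def masked(cell: KV, marker: KV) -> bool:
    """True if `marker` hides `cell` (only cells at or before the marker's ts)."""
    if cell.row != marker.row or cell.ts > marker.ts:
        return False
    if marker.kind == "DeleteFamily":
        return cell.column.split(":")[0] == marker.column.split(":")[0]
    if marker.kind == "DeleteColumn":   # every older version of one column
        return cell.column == marker.column
    if marker.kind == "Delete":         # exactly one version of one column
        return cell.column == marker.column and cell.ts == marker.ts
    return False


def scan(cells: List[KV], raw: bool = False) -> List[KV]:
    """A raw scan returns markers and masked cells; a normal scan hides them."""
    if raw:
        return list(cells)
    markers = [c for c in cells if c.kind != "Put"]
    return [c for c in cells
            if c.kind == "Put" and not any(masked(c, m) for m in markers)]
```

Feeding in the five cells from the raw scan above leaves only `v3` visible to a normal scan.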

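12) After flush and major_compact, the data is physically removed. In the raw scans below, the flush already drops the deleted cells (v1 and v2) while the delete markers remain; the major compaction then removes the markers as well.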

hbase(main):051:0> flush 'gp:test'
Took 0.8562 seconds                                                             
hbase(main):055:0> scan 'gp:test', {RAW => true, VERSIONS => 10}
ROW                   COLUMN+CELL                                               
 123                  column=info:col1, timestamp=1534082352792, type=Delete    
 123                  column=info:col2, timestamp=1534082891558, value=v3       
 456                  column=info:, timestamp=1534083246672, type=DeleteFamily  
2 row(s)
Took 0.0718 seconds 
hbase(main):002:0> major_compact 'gp:test'
Took 0.3532 seconds
hbase(main):001:0> scan 'gp:test', {RAW => true, VERSIONS => 10}
ROW                   COLUMN+CELL                                               
 123                  column=info:col2, timestamp=1534082891558, value=v3       
1 row(s)
Took 0.8065 seconds                                                             
hbase(main):002:0> 
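In production, major compactions are rarely triggered manually, because they are I/O-heavy and can stall reads and writes.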
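What the last raw scan shows can be modelled as a merge. A major compaction rewrites all HFiles of a store into one and keeps only live Put cells, discarding both the masked cells and the delete markers. Real HBase can be told to keep deleted cells via KEEP_DELETED_CELLS, which this sketch ignores. `major_compact` and `is_masked` here are hypothetical helpers, not HBase APIs, and only the Delete and DeleteFamily markers from the transcript are handled.

```python
from typing import List, Tuple

# (row, column, timestamp, kind, value); kind is "Put", "Delete" or "DeleteFamily"
Cell = Tuple[str, str, int, str, str]


def is_masked(cell: Cell, marker: Cell) -> bool:
    row, col, ts, _, _ = cell
    m_row, m_col, m_ts, m_kind, _ = marker
    if row != m_row or ts > m_ts:
        return False
    if m_kind == "DeleteFamily":
        return col.split(":")[0] == m_col.split(":")[0]
    return m_kind == "Delete" and col == m_col and ts == m_ts


def major_compact(hfiles: List[List[Cell]]) -> List[List[Cell]]:
    """Merge every HFile into a single one that holds only live Put cells."""
    merged = sorted((c for f in hfiles for c in f),
                    key=lambda c: (c[0], c[1], -c[2]))
    markers = [c for c in merged if c[3] != "Put"]
    live = [c for c in merged
            if c[3] == "Put" and not any(is_masked(c, m) for m in markers)]
    return [live]  # one output HFile; markers are not carried over
```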

13) Truncate the table, then drop the table and the namespace

hbase(main):003:0> truncate 'gp:test'
Truncating gp:test table (it may take a while):
Disabling table...
Truncating table...
Took 2.1177 seconds                                                             
hbase(main):004:0> scan 'gp:test'
ROW                   COLUMN+CELL                                               
0 row(s)
Took 1.1058 seconds                                                             
hbase(main):005:0> disable 'gp:test'
Took 0.5193 seconds                                                             
hbase(main):006:0> scan 'gp:test'
ROW                   COLUMN+CELL                                               
org.apache.hadoop.hbase.TableNotEnabledException: gp:test is disabled.
 at org.apache.hadoop.hbase.client.ConnectionImplementation.relocateRegion(ConnectionImplementation.java:714)
 at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:328)
 at org.apache.hadoop.hbase.client.ScannerCallable.prepare(ScannerCallable.java:139)
 at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.prepare(ScannerCallableWithReplicas.java:399)
 at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
 at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
ERROR: Table gp:test is disabled!
For usage try help "scan"
Took 0.1323 seconds                                                             
hbase(main):007:0> drop 'gp:test'
Took 0.3581 seconds                                                             
hbase(main):008:0> drop
drop             drop_all         drop_namespace   
hbase(main):008:0> list
list                         list_deadservers             
list_labels                  list_locks                   
list_namespace               list_namespace_tables        
list_peer_configs            list_peers                   
list_procedures              list_quota_snapshots         
list_quota_table_sizes       list_quotas                  
list_regions                 list_replicated_tables       
list_rsgroups                list_security_capabilities   
list_snapshot_sizes          list_snapshots               
list_table_snapshots         
hbase(main):008:0> list_namespace
list_namespace          list_namespace_tables   
hbase(main):008:0> list_namespace 'gp'
NAMESPACE                                                                       
gp                                                                              
1 row(s)
Took 0.1517 seconds                                                             
hbase(main):009:0> drop_namespace 'gp'
Took 0.2719 seconds                                                             
hbase(main):010:0> list_namespace
NAMESPACE                                                                       
default                                                                         
hbase                                                                           
2 row(s)
Took 0.0322 seconds                                                             
hbase(main):011:0>
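The transcript follows a fixed lifecycle. A scan needs an enabled table. A table must be disabled before drop. A namespace can only be dropped once it holds no tables. The shell's truncate disables, drops and recreates the table. The sketch below models that state machine. `Cluster` and its methods are hypothetical and only mirror the shell command names.

```python
from typing import Dict


class Cluster:
    """Hypothetical state model: namespace -> {table name -> enabled?}."""

    def __init__(self) -> None:
        self.namespaces: Dict[str, Dict[str, bool]] = {"default": {}, "hbase": {}}

    def create_namespace(self, ns: str) -> None:
        if ns in self.namespaces:
            raise ValueError(f"namespace {ns} already exists")
        self.namespaces[ns] = {}

    def create(self, ns: str, table: str) -> None:
        self.namespaces[ns][table] = True  # new tables start enabled

    def disable(self, ns: str, table: str) -> None:
        self.namespaces[ns][table] = False

    def scan(self, ns: str, table: str) -> str:
        if not self.namespaces[ns][table]:
            raise RuntimeError(f"Table {ns}:{table} is disabled!")
        return "ok"

    def drop(self, ns: str, table: str) -> None:
        if self.namespaces[ns][table]:
            raise RuntimeError(f"Table {ns}:{table} must be disabled first")
        del self.namespaces[ns][table]

    def truncate(self, ns: str, table: str) -> None:
        # Disable, drop and recreate; the table ends up enabled and empty.
        self.disable(ns, table)
        self.drop(ns, table)
        self.create(ns, table)

    def drop_namespace(self, ns: str) -> None:
        if self.namespaces[ns]:
            raise RuntimeError(f"namespace {ns} still has tables")
        del self.namespaces[ns]
```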

 

 
 
 
 
 
