Hive and HBase: Importing Data in Both Directions

Posted by 汤高




1. Create a Hive table linked to HBase

hive> create table hive_hbase_1(key int,value string)
    > stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:info")
    > TBLPROPERTIES ("hbase.table.name" = "userinfo");

OK

Time taken: 8.896 seconds

hive>
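The `hbase.columns.mapping` property pairs each Hive column, in declaration order, with an HBase target; the special entry `:key` marks the column that becomes the HBase row key. As a rough illustration of the convention only (this is not Hive's actual parser), it can be sketched in Python:

```python
def parse_hbase_mapping(mapping, hive_columns):
    """Sketch: pair Hive columns (in declaration order) with the
    comma-separated hbase.columns.mapping entries. ":key" marks
    the column that maps to the HBase row key; every other entry
    is a "family:qualifier" pair."""
    targets = mapping.split(",")
    if len(targets) != len(hive_columns):
        raise ValueError("mapping must list one entry per Hive column")
    pairs = {}
    for col, target in zip(hive_columns, targets):
        if target == ":key":
            pairs[col] = ("<row key>", None)
        else:
            family, qualifier = target.split(":", 1)
            pairs[col] = (family, qualifier)
    return pairs

# For hive_hbase_1 above: key -> row key, value -> cf1:info
print(parse_hbase_mapping(":key,cf1:info", ["key", "value"]))
```

The same convention covers the multi-column, multi-family table created later in this post (`:key,info:col1,info:col2,city:area`).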

 

 

Check the tables in HBase:

hbase(main):001:0> list

TABLE                                                                                                                                          

blog                                                                                                                                           

friend                                                                                                                                         

friend02                                                                                                                                       

heroes                                                                                                                                         

heroesIndex                                                                                                                                    

myt                                                                                                                                            

stu                                                                                                                                            

table                                                                                                                                          

tanggao                                                                                                                                        

tanggao11                                                                                                                                      

tanggao111                                                                                                                                     

tanggaozhou                                                                                                                                    

test

userinfo

word                                                    

word2

16 row(s) in 0.7760 seconds

 

=> ["blog", "friend", "friend02", "heroes", "heroesIndex", "myt", "stu", "table", "tanggao", "tanggao11", "tanggao111", "tanggaozhou", "test", "userinfo", "word", "word2"]

hbase(main):002:0>

 

 

2. Import data with SQL

i. Prepare the data: place an a.txt file under /user/tg on HDFS.

1	tanggao
2	zhousiyuan
3	mother
4	father

a) Create the Hive table

 

hive> create table  famaly(id int,name string) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;

OK

Time taken: 0.483 seconds

In Hive 2.0.0, a relative path is resolved against your user directory under /user on HDFS; in my case that is /user/tg.
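Because the table is declared with tab-delimited fields, a.txt must use literal tab characters between the id and the name. A small Python sketch of preparing the file (the /user/tg upload path is specific to this walkthrough):

```python
# Write the tab-delimited input file that the famaly table expects.
rows = [(1, "tanggao"), (2, "zhousiyuan"), (3, "mother"), (4, "father")]
with open("a.txt", "w") as f:
    for rid, name in rows:
        f.write(f"{rid}\t{name}\n")

# Then upload it to your HDFS user directory, e.g.:
#   hdfs dfs -put a.txt /user/tg/a.txt
```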

hive> load data inpath 'a.txt' overwrite into table famaly;

Loading data to table default.famaly

OK

Time taken: 2.139 seconds

Check the data:

hive> select * from famaly;

OK

1	tanggao
2	zhousiyuan
3	mother
4	father

Time taken: 2.912 seconds, Fetched: 4 row(s)

Check the table schema:

hive> desc famaly;

OK

id                  int                                     

name                string                                  

Time taken: 0.549 seconds, Fetched: 2 row(s)

A second way to view the schema, with the same result:

hive> describe famaly;

OK

id                  int                                     

name                string                                  

Time taken: 0.087 seconds, Fetched: 2 row(s)

hive>

 

Import data into hive_hbase_1 with SQL:

 

hive> insert  overwrite table hive_hbase_1 select * from famaly where id=1;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.

Query ID = tg_20160528223128_abf71520-622b-42b5-94c3-3bbb5492b558

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1464498775870_0001, Tracking URL = http://master:8088/proxy/application_1464498775870_0001/

Kill Command = /software/hadoop-2.6.4/bin/hadoop job  -kill job_1464498775870_0001

Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0

2016-05-28 22:57:58,093 Stage-0 map = 0%,  reduce = 0%

2016-05-28 22:58:28,439 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 3.81 sec

MapReduce Total cumulative CPU time: 3 seconds 810 msec

Ended Job = job_1464498775870_0001

MapReduce Jobs Launched:

Stage-Stage-0: Map: 1   Cumulative CPU: 3.91 sec   HDFS Read: 10964 HDFS Write: 0 SUCCESS

Total MapReduce CPU Time Spent: 3 seconds 910 msec

OK

Time taken: 89.327 seconds

 

Check the data; the row just inserted is there:

hive> select * from hive_hbase_1;

OK

1	tanggao

Time taken: 0.916 seconds, Fetched: 1 row(s)

hive>

 

Log in to HBase and check the data there:

hbase(main):002:0> scan 'userinfo'

ROW                   COLUMN+CELL                                               

 1                    column=cf1:info, timestamp=1464501508097, value=tanggao   

1 row(s) in 0.7990 seconds

 

hbase(main):003:0>

 

Sure enough, the row added through Hive is already in HBase.

 

Now try the reverse: insert a row in HBase and see whether it also shows up in Hive.

hbase(main):003:0> put 'userinfo','fid','cf1:info','tangshaoyan'

0 row(s) in 0.2270 seconds

 

hbase(main):004:0>

 

Check Hive:

hive> select * from hive_hbase_1;

OK

1	tanggao
NULL	tangshaoyan

Time taken: 0.235 seconds, Fetched: 2 row(s)

hive>

 

The row just inserted in HBase is indeed visible in Hive. Note that the key column shows NULL because the row key 'fid' cannot be parsed as an int.

 

 

Accessing an existing HBase table from Hive

 

The HBase table heroes already exists:

hbase(main):007:0> scan 'heroes'

ROW                                  COLUMN+CELL                                                                                               

 0                                   column=info:email, timestamp=1463743975381, [email protected]                                                

 0                                   column=info:name, timestamp=1463743975381, value=peter                                                    

 0                                   column=info:power, timestamp=1463743975381, value=Idotknow                                                

 1                                   column=info:email, timestamp=1463743975391, [email protected]                                                

 1                                   column=info:name, timestamp=1463743975391, value=hiro                                                     

 1                                   column=info:power, timestamp=1463743975391, value=Idotknow                                                

 2                                   column=info:email, timestamp=1463743975396, [email protected]                                                

 2                                   column=info:name, timestamp=1463743975396, value=sylar                                                    

 2                                   column=info:power, timestamp=1463743975396, value=Idotknow                                                

 3                                   column=info:email, timestamp=1463743975399, [email protected]                                                

 3                                   column=info:name, timestamp=1463743975399, value=claire                                                   

 3                                   column=info:power, timestamp=1463743975399, value=Idotknow                                                

 4                                   column=info:email, timestamp=1463743975403, [email protected]                                                

 4                                   column=info:name, timestamp=1463743975403, value=noah                                                     

 4                                   column=info:power, timestamp=1463743975403, value=Idotknow                                                

5 row(s) in 0.2140 seconds

 

hbase(main):008:0>

 

Use CREATE EXTERNAL TABLE, so that dropping the Hive table later will not delete the underlying HBase table:

hive> create external table  hbase_hive_1(key int,value string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = "info:name")
    > TBLPROPERTIES("hbase.table.name" = "heroes");

OK

Time taken: 0.424 seconds

hive> select * from hbase_hive_1;

OK

0	peter
1	hiro
2	sylar
3	claire
4	noah

Time taken: 0.222 seconds, Fetched: 5 row(s)

hive>

 

After these steps, Hive can query the data that already existed in HBase.

 

 

 

3. Multiple Columns and Column Families

 

 

 

hive> CREATE TABLE hive_hbase_add1(key int, value1 string, value2 int, value3 int)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:col1,info:col2,city:area")
    > TBLPROPERTIES("hbase.table.name" = "student_info");

OK

Time taken: 2.624 seconds

hive> select * from hbase_hive_1;

OK

0	peter
1	hiro
2	sylar
3	claire
4	noah

Time taken: 0.225 seconds, Fetched: 5 row(s)

hive> set hive.cli.print.header=true;

hive> select * from hbase_hive_1;

OK

hbase_hive_1.key	hbase_hive_1.value

0	peter
1	hiro
2	sylar
3	claire
4	noah

Time taken: 0.203 seconds, Fetched: 5 row(s)

hive> desc hbase_hive_1;

OK

col_name	data_type	comment

key                 int                                     

value               string                                  

Time taken: 0.198 seconds, Fetched: 2 row(s)

hive> insert overwrite table hive_hbase_add1 select key ,value,key+1,value from hbase_hive_1;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.

Query ID = tg_20160528223128_abf71520-622b-42b5-94c3-3bbb5492b558

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1464498775870_0002, Tracking URL = http://master:8088/proxy/application_1464498775870_0002/

Kill Command = /software/hadoop-2.6.4/bin/hadoop job  -kill job_1464498775870_0002

Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0

2016-05-28 23:48:40,536 Stage-0 map = 0%,  reduce = 0%

2016-05-28 23:49:06,565 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 3.52 sec

MapReduce Total cumulative CPU time: 3 seconds 520 msec

Ended Job = job_1464498775870_0002

MapReduce Jobs Launched:

Stage-Stage-0: Map: 1   Cumulative CPU: 3.52 sec   HDFS Read: 5082 HDFS Write: 0 SUCCESS

Total MapReduce CPU Time Spent: 3 seconds 520 msec

OK

_col0	_col1	_col2	_col3

Time taken: 86.323 seconds

 

 

 

 

 

Run the INSERT again, this time producing an int (key+1000) for the fourth column:

hive> insert overwrite table hive_hbase_add1 select key ,value,key+1,key+1000 from hbase_hive_1;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.

Query ID = tg_20160528223128_abf71520-622b-42b5-94c3-3bbb5492b558

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1464498775870_0003, Tracking URL = http://master:8088/proxy/application_1464498775870_0003/

Kill Command = /software/hadoop-2.6.4/bin/hadoop job  -kill job_1464498775870_0003

Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0

2016-05-28 23:54:01,682 Stage-0 map = 0%,  reduce = 0%

2016-05-28 23:54:24,725 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 4.23 sec

MapReduce Total cumulative CPU time: 4 seconds 230 msec

Ended Job = job_1464498775870_0003

MapReduce Jobs Launched:

Stage-Stage-0: Map: 1   Cumulative CPU: 4.23 sec   HDFS Read: 5166 HDFS Write: 0 SUCCESS

Total MapReduce CPU Time Spent: 4 seconds 230 msec

OK

key	value	c2	c3

Time taken: 72.543 seconds

 

 

hive> select * from hive_hbase_add1;

OK

hive_hbase_add1.key	hive_hbase_add1.value1	hive_hbase_add1.value2	hive_hbase_add1.value3

0	peter	1	NULL
1	hiro	2	NULL
2	sylar	3	NULL
3	claire	4	NULL
4	noah	5	NULL

Time taken: 0.454 seconds, Fetched: 5 row(s)

The fourth column's type did not match (a string was selected into an INT column), so those values were not stored and read back as NULL.
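When a value cannot be converted to the declared Hive type, the cell surfaces as NULL rather than failing the query. Roughly, as a sketch of the idea (not Hive's actual coercion code):

```python
def to_hive_int(raw):
    """Sketch: a value that cannot be parsed as an integer
    surfaces as NULL (None here) for a Hive INT column."""
    try:
        return int(raw)
    except (TypeError, ValueError):
        return None

print(to_hive_int("1000"))   # a numeric string converts cleanly
print(to_hive_int("peter"))  # a name cannot, so it becomes NULL
```

The same rule explains the NULL row key seen earlier after putting the non-numeric row key 'fid' into userinfo.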

 

With a value of the correct type (the second INSERT above, which used key+1000), the column is populated:

hive> select * from hive_hbase_add1;

OK

hive_hbase_add1.key	hive_hbase_add1.value1	hive_hbase_add1.value2	hive_hbase_add1.value3

0	peter	1	1000
1	hiro	2	1001
2	sylar	3	1002
3	claire	4	1003
4	noah	5	1004

Time taken: 0.746 seconds, Fetched: 5 row(s)

hive>

 

Log in to HBase and check the data:

 

hbase(main):008:0> list

TABLE                                            

blog                                             

friend                                           

friend02                                         

heroes                                           

heroesIndex                                      

myt                                              

stu                                              

student_info                                     

table                                            

tanggao                                          

tanggao11                                        

tanggao111                                       

tanggaozhou                                      

test                                             

userinfo
word
word2

17 row(s) in 0.1140 seconds

 

=> ["blog", "friend", "friend02", "heroes", "heroesIndex", "myt", "stu", "student_info", "table", "tanggao", "tanggao11", "tanggao111", "tanggaozhou", "test", "userinfo", "word", "word2"]

hbase(main):009:0> scan 'student_info'
ROW           COLUMN+CELL
 0            column=city:area, timestamp=1464504863521, value=1000
 0            column=info:col1, timestamp=1464504863521, value=peter
 0            column=info:col2, timestamp=1464504863521, value=1
 1            column=city:area, timestamp=1464504863521, value=1001
