Two-Way Data Import Between Hive and HBase
Posted by 汤高
1. Create a Hive table mapped to HBase
hive> create table hive_hbase_1(key int,value string)
> stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:info")
> TBLPROPERTIES ("hbase.table.name" = "userinfo");
OK
Time taken: 8.896 seconds
hive>
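In hbase.columns.mapping, the leading ":key" binds the first Hive column (key) to the HBase row key, and cf1:info binds value to the qualifier info in column family cf1. Because userinfo does not exist yet, Hive creates the HBase table itself. A minimal sketch of the same pattern, with made-up table and column names:

-- Sketch only: demo_hive / demo_hbase are hypothetical names.
CREATE TABLE demo_hive (rowkey INT, info STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  -- ":key" maps to the HBase row key; the remaining entries pair
  -- with the remaining Hive columns in declaration order.
  "hbase.columns.mapping" = ":key,cf1:info"
)
TBLPROPERTIES ("hbase.table.name" = "demo_hbase");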
Check the tables in HBase; the userinfo table Hive just created now appears in the list:
hbase(main):001:0> list
TABLE
blog
friend
friend02
heroes
heroesIndex
myt
stu
table
tanggao
tanggao11
tanggao111
tanggaozhou
test
userinfo
word
word2
16 row(s) in 0.7760 seconds
=> ["blog", "friend", "friend02", "heroes", "heroesIndex", "myt", "stu", "table", "tanggao", "tanggao11", "tanggao111", "tanggaozhou", "test", "userinfo", "word", "word2"]
hbase(main):002:0>
2. Import data with SQL
a) Prepare the data: place a file a.txt in the /user/tg directory on HDFS, containing:
1	tanggao
2	zhousiyuan
3	mother
4	father
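If a.txt is still on the local filesystem, staging it is one command (local path assumed):

$ hdfs dfs -put a.txt /user/tg/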
b) Create the Hive staging table:
hive> create table famaly(id int,name string) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;
OK
Time taken: 0.483 seconds
In Hive 2.0.0, a relative path in LOAD DATA is resolved against your HDFS home directory, /user/<username>; in my case that is /user/tg.
hive> load data inpath 'a.txt' overwrite into table famaly;
Loading data to table default.famaly
OK
Time taken: 2.139 seconds
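Note that LOAD DATA INPATH moves the file from its original HDFS location into the table's warehouse directory rather than copying it. If you prefer not to rely on the home-directory default, the absolute-path form should behave the same:

hive> load data inpath '/user/tg/a.txt' overwrite into table famaly;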
Inspect the data:
hive> select * from famaly;
OK
1	tanggao
2	zhousiyuan
3	mother
4	father
Time taken: 2.912 seconds, Fetched: 4 row(s)
View the table schema:
hive> desc famaly;
OK
id	int
name	string
Time taken: 0.549 seconds, Fetched: 2 row(s)
describe gives the same result:
hive> describe famaly;
OK
id	int
name	string
Time taken: 0.087 seconds, Fetched: 2 row(s)
hive>
Import data into hive_hbase_1 with SQL:
hive> insert overwrite table hive_hbase_1 select * from famaly where id=1;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = tg_20160528223128_abf71520-622b-42b5-94c3-3bbb5492b558
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1464498775870_0001, Tracking URL = http://master:8088/proxy/application_1464498775870_0001/
Kill Command = /software/hadoop-2.6.4/bin/hadoop job -kill job_1464498775870_0001
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2016-05-28 22:57:58,093 Stage-0 map = 0%, reduce = 0%
2016-05-28 22:58:28,439 Stage-0 map = 100%, reduce = 0%, Cumulative CPU 3.81 sec
MapReduce Total cumulative CPU time: 3 seconds 810 msec
Ended Job = job_1464498775870_0001
MapReduce Jobs Launched:
Stage-Stage-0: Map: 1 Cumulative CPU: 3.91 sec HDFS Read: 10964 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 910 msec
OK
Time taken: 89.327 seconds
Check the data; the row just inserted is there:
hive> select * from hive_hbase_1;
OK
1	tanggao
Time taken: 0.916 seconds, Fetched: 1 row(s)
hive>
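One caveat: for an HBase-backed table, INSERT OVERWRITE does not empty the table first. Every selected row becomes an HBase put, so only rows with colliding keys are replaced. Loading the rest of the family would therefore leave row 1 untouched; a sketch (not run in this session):

hive> insert overwrite table hive_hbase_1 select * from famaly where id > 1;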
Log in to HBase and check the data there:
hbase(main):002:0> scan 'userinfo'
ROW COLUMN+CELL
1 column=cf1:info, timestamp=1464501508097, value=tanggao
1 row(s) in 0.7990 seconds
hbase(main):003:0>
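A single row can also be fetched directly instead of scanning; for example:

hbase> get 'userinfo', '1'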
Neat: the row added from Hive is already in HBase. Now try the opposite direction: insert a row in HBase and see whether it shows up in Hive.
hbase(main):003:0> put 'userinfo','fid','cf1:info','tangshaoyan'
0 row(s) in 0.2270 seconds
hbase(main):004:0>
Check from Hive:
hive> select * from hive_hbase_1;
OK
1	tanggao
NULL	tangshaoyan
Time taken: 0.235 seconds, Fetched: 2 row(s)
hive>
Sure enough, the row just written from HBase now shows in Hive. The key column is NULL because the row key 'fid' cannot be parsed as an int.
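With a row key that does parse as an int, the key would come through intact. For example (a hypothetical extra row, not part of this session):

hbase> put 'userinfo', '5', 'cf1:info', 'somebody'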
Accessing an existing HBase table from Hive
Starting point: the heroes table already exists in HBase:
hbase(main):007:0> scan 'heroes'
ROW COLUMN+CELL
0 column=info:email, timestamp=1463743975381, value=[email protected]
0 column=info:name, timestamp=1463743975381, value=peter
0 column=info:power, timestamp=1463743975381, value=Idotknow
1 column=info:email, timestamp=1463743975391, value=[email protected]
1 column=info:name, timestamp=1463743975391, value=hiro
1 column=info:power, timestamp=1463743975391, value=Idotknow
2 column=info:email, timestamp=1463743975396, value=[email protected]
2 column=info:name, timestamp=1463743975396, value=sylar
2 column=info:power, timestamp=1463743975396, value=Idotknow
3 column=info:email, timestamp=1463743975399, value=[email protected]
3 column=info:name, timestamp=1463743975399, value=claire
3 column=info:power, timestamp=1463743975399, value=Idotknow
4 column=info:email, timestamp=1463743975403, value=[email protected]
4 column=info:name, timestamp=1463743975403, value=noah
4 column=info:power, timestamp=1463743975403, value=Idotknow
5 row(s) in 0.2140 seconds
hbase(main):008:0>
Use CREATE EXTERNAL TABLE:
hive> create external table hbase_hive_1(key int,value string)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = "info:name")
> TBLPROPERTIES("hbase.table.name" = "heroes");
OK
Time taken: 0.424 seconds
hive> select * from hbase_hive_1;
OK
0	peter
1	hiro
2	sylar
3	claire
4	noah
Time taken: 0.222 seconds, Fetched: 5 row(s)
hive>
With that, Hive can read data that already existed in HBase. Note that only info:name is mapped, so the other columns in heroes (info:email, info:power) are not visible through this Hive table.
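Because the table is EXTERNAL, Hive holds only its metadata: dropping it detaches the mapping without deleting heroes from HBase (dropping a managed handler table such as hive_hbase_1, by contrast, also removes the underlying userinfo table). For illustration only, not run here:

hive> drop table hbase_hive_1;  -- heroes itself would survive in HBase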
3. Multiple columns and column families
hive> CREATE TABLE hive_hbase_add1(key int, value1 string, value2 int, value3 int)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:col1,info:col2,city:area")
> TBLPROPERTIES("hbase.table.name" = "student_info");
OK
Time taken: 2.624 seconds
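This single Hive table spans two column families: info holds col1 and col2, while city holds area. Since student_info did not exist in HBase, Hive creates it, which can be verified from the HBase shell:

hbase> describe 'student_info'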
hive> select * from hbase_hive_1;
OK
0	peter
1	hiro
2	sylar
3	claire
4	noah
Time taken: 0.225 seconds, Fetched: 5 row(s)
hive> set hive.cli.print.header=true;
hive> select * from hbase_hive_1;
OK
hbase_hive_1.key	hbase_hive_1.value
0	peter
1	hiro
2	sylar
3	claire
4	noah
Time taken: 0.203 seconds, Fetched: 5 row(s)
hive> desc hbase_hive_1;
OK
col_name	data_type	comment
key	int
value	string
Time taken: 0.198 seconds, Fetched: 2 row(s)
hive> insert overwrite table hive_hbase_add1 select key ,value,key+1,value from hbase_hive_1;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = tg_20160528223128_abf71520-622b-42b5-94c3-3bbb5492b558
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1464498775870_0002, Tracking URL = http://master:8088/proxy/application_1464498775870_0002/
Kill Command = /software/hadoop-2.6.4/bin/hadoop job -kill job_1464498775870_0002
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2016-05-28 23:48:40,536 Stage-0 map = 0%, reduce = 0%
2016-05-28 23:49:06,565 Stage-0 map = 100%, reduce = 0%, Cumulative CPU 3.52 sec
MapReduce Total cumulative CPU time: 3 seconds 520 msec
Ended Job = job_1464498775870_0002
MapReduce Jobs Launched:
Stage-Stage-0: Map: 1 Cumulative CPU: 3.52 sec HDFS Read: 5082 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 520 msec
OK
_col0	_col1	_col2	_col3
Time taken: 86.323 seconds
hive> select * from hive_hbase_add1;
OK
hive_hbase_add1.key	hive_hbase_add1.value1	hive_hbase_add1.value2	hive_hbase_add1.value3
0	peter	1	NULL
1	hiro	2	NULL
2	sylar	3	NULL
3	claire	4	NULL
4	noah	5	NULL
Time taken: 0.454 seconds, Fetched: 5 row(s)
The fourth select expression was a string (value), which does not match the int type of value3, so nothing landed in that column: the cells are all NULL. Re-run the insert with an int expression, key+1000, in the fourth position:
hive> insert overwrite table hive_hbase_add1 select key ,value,key+1,key+1000 from hbase_hive_1;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = tg_20160528223128_abf71520-622b-42b5-94c3-3bbb5492b558
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1464498775870_0003, Tracking URL = http://master:8088/proxy/application_1464498775870_0003/
Kill Command = /software/hadoop-2.6.4/bin/hadoop job -kill job_1464498775870_0003
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2016-05-28 23:54:01,682 Stage-0 map = 0%, reduce = 0%
2016-05-28 23:54:24,725 Stage-0 map = 100%, reduce = 0%, Cumulative CPU 4.23 sec
MapReduce Total cumulative CPU time: 4 seconds 230 msec
Ended Job = job_1464498775870_0003
MapReduce Jobs Launched:
Stage-Stage-0: Map: 1 Cumulative CPU: 4.23 sec HDFS Read: 5166 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 230 msec
OK
key	value	c2	c3
Time taken: 72.543 seconds
The second insert overwrote the same row keys, and the fourth column now holds the expected values:
hive> select * from hive_hbase_add1;
OK
hive_hbase_add1.key	hive_hbase_add1.value1	hive_hbase_add1.value2	hive_hbase_add1.value3
0	peter	1	1000
1	hiro	2	1001
2	sylar	3	1002
3	claire	4	1003
4	noah	5	1004
Time taken: 0.746 seconds, Fetched: 5 row(s)
hive>
Log in to HBase and check the data:
hbase(main):008:0> list
TABLE
blog
friend
friend02
heroes
heroesIndex
myt
stu
student_info
table
tanggao
tanggao11
tanggao111
tanggaozhou
test
userinfo
word
word2
17 row(s) in 0.1140 seconds
=> ["blog", "friend", "friend02", "heroes", "heroesIndex", "myt", "stu", "student_info", "table", "tanggao", "tanggao11", "tanggao111", "tanggaozhou", "test", "userinfo", "word", "word2"]
hbase(main):009:0> scan 'student_info'
ROW    COLUMN+CELL
0      column=city:area, timestamp=1464504863521, value=1000
0      column=info:col1, timestamp=1464504863521, value=peter
0      column=info:col2, timestamp=1464504863521, value=1
1      column=city:area, timestamp=1464504863521, value=1001
The rows written from Hive have landed across both column families, info and city, in HBase.