Importing data into Hive with mongodump
Posted by abcdwxc
Overview: export the MongoDB data with mongodump, upload the resulting BSON dump to HDFS, and then create an external table over it in Hive.
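For reference, the users collection in mydb used below holds just two documents. A collection like it could be seeded from the mongo shell roughly as follows (the field values mirror the query results at the end; NumberLong keeps age as a 64-bit integer to match the bigint column defined later, and the _id values are generated by MongoDB):
mongo mydb --eval 'db.users.insertMany([{user_id: "abc123", age: NumberLong(58), status: "D"}, {user_id: "bcd001", age: NumberLong(45), status: "C"}])'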
1. Export the collection with mongodump
mongodump --host=localhost:27017 --db=mydb --collection=users --out=/tmp/root/mongodump0712
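If the MongoDB instance requires authentication, mongodump accepts the usual credential flags; the values below are placeholders:
mongodump --host=localhost:27017 --db=mydb --collection=users --username=<user> --password=<password> --authenticationDatabase=admin --out=/tmp/root/mongodump0712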
[root@localhost root]# mongodump --host=localhost:27017 --db=mydb --collection=users --out=/tmp/root/mongodump0712
2018-07-12T10:07:27.894+0800 writing mydb.users to
2018-07-12T10:07:27.896+0800 done dumping mydb.users (2 documents)
[root@localhost root]# cd /tmp/root
[root@localhost root]# ls
3604abd2-a359-4c53-a7b4-e4ea84185801 3604abd2-a359-4c53-a7b4-e4ea841858017799130181720133073.pipeout dump hive.log hive.log.2018-07-11 mongodump0712
[root@localhost root]# ll
total 624
drwx------. 2 root root 6 Jul 12 09:34 3604abd2-a359-4c53-a7b4-e4ea84185801
-rw-r--r--. 1 root root 0 Jul 12 09:34 3604abd2-a359-4c53-a7b4-e4ea841858017799130181720133073.pipeout
drwxr-xr-x. 5 root root 44 Jul 12 10:04 dump
-rw-r--r--. 1 root root 88700 Jul 12 09:39 hive.log
-rw-r--r--. 1 root root 547126 Jul 11 21:07 hive.log.2018-07-11
drwxr-xr-x. 3 root root 18 Jul 12 10:07 mongodump0712
[root@localhost root]# cd mongodump0712/
[root@localhost mongodump0712]# ls
mydb
[root@localhost mongodump0712]# cd mydb
[root@localhost mydb]# ls
users.bson users.metadata.json
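As an optional check before uploading, bsondump (shipped with the MongoDB tools) prints each document in the dump as JSON:
bsondump /tmp/root/mongodump0712/mydb/users.bson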
2. Upload the dump file to HDFS
hdfs dfs -mkdir /user/hive/warehouse/mongo
hdfs dfs -put /tmp/root/mongodump0712/mydb/users.bson /user/hive/warehouse/mongo/
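A quick listing, not part of the original session, can confirm the BSON file is in place before the external table is created:
hdfs dfs -ls /user/hive/warehouse/mongo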
[root@localhost mydb]# hdfs dfs -mkdir /user/hive/warehouse/mongo
[root@localhost mydb]# hdfs dfs -put /tmp/root/mongodump0712/mydb/users.bson /user/hive/warehouse/mongo/
3. Create the table and test it
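The DDL below relies on the BSONSerDe and the BSON input/output formats shipped with the mongo-hadoop connector, so the connector jars are assumed to be on Hive's classpath already (for example via hive.aux.jars.path). If they are not, they can be added for the session with ADD JAR; the paths below are placeholders:
ADD JAR /path/to/mongo-hadoop-core.jar;
ADD JAR /path/to/mongo-hadoop-hive.jar;
ADD JAR /path/to/mongo-java-driver.jar;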
hive> create EXTERNAL table muser
> (
> id string,
> userid string,
> age bigint,
> status string
> )
    > row format serde 'com.mongodb.hadoop.hive.BSONSerDe'
    > WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","userid":"user_id","age":"age","status":"status"}')
    > stored as inputformat 'com.mongodb.hadoop.mapred.BSONFileInputFormat'
    > outputformat 'com.mongodb.hadoop.hive.output.HiveBSONFileOutputFormat'
    > location '/user/hive/warehouse/muser';
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:hdfs://ns1/user/hive/warehouse/muser is not a directory or unable to create one)
The DDL fails because the location /user/hive/warehouse/muser does not exist on HDFS and Hive is unable to create it.
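One fix would be to pre-create that directory (a possible command, not run in this session):
hdfs dfs -mkdir -p /user/hive/warehouse/muser
The simpler route, taken below, is to recreate the table with location pointing at /user/hive/warehouse/mongo, the directory that already holds users.bson.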
hive> create EXTERNAL table muser
> (
> id string,
> userid string,
> age bigint,
> status string
> )
    > row format serde 'com.mongodb.hadoop.hive.BSONSerDe'
    > WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","userid":"user_id","age":"age","status":"status"}')
    > stored as inputformat 'com.mongodb.hadoop.mapred.BSONFileInputFormat'
    > outputformat 'com.mongodb.hadoop.hive.output.HiveBSONFileOutputFormat'
    > location '/user/hive/warehouse/mongo';
OK
Time taken: 0.123 seconds
hive> select * from muser;
OK
5b456e33a93daf7ae53e6419 abc123 58 D
5b45705ca93daf7ae53e8b2a bcd001 45 C
Time taken: 0.181 seconds, Fetched: 2 row(s)
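Since muser is an external table over /user/hive/warehouse/mongo, a refreshed dump only needs to be re-uploaded to that directory to become visible to later queries. A quick sanity check, not run in the original session, is to compare a count against the document count mongodump reported:
select count(*) from muser;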