Syncing MongoDB Data into Hive

Posted by shujuxiong

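Problem:

To import MongoDB data into Hive, I followed the article at https://blog.csdn.net/thriving_fcl/article/details/51471248 and created an external table in Hive mapped to a MongoDB collection. Running the statement failed with: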

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. com/mongodb/util/JSON

The CREATE TABLE statement was:

CREATE EXTERNAL TABLE mongotohive
(
  id string,
  userid string,
  age bigint,
  status string
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","userid":"user_id","age":"age","status":"status"}')
TBLPROPERTIES('mongo.uri'='mongodb://localhost:27017/mydb.users');

The MongoDB data:

db.users.find()
{ "_id" : ObjectId("5b456e33a93daf7ae53e6419"), "user_id" : "abc123", "age" : 58, "status" : "D" }
{ "_id" : ObjectId("5b45705ca93daf7ae53e8b2a"), "user_id" : "bcd001", "age" : 45, "status" : "C" }

 

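Solution:

The error message names the class com/mongodb/util/JSON, which suggests that the MongoDB Java driver was not on Hive's classpath. After placing the three jars mongo-hadoop-core-2.0.0.jar, mongo-hadoop-hive-2.0.0.jar, and mongo-java-driver-3.7.1.jar in Hive's lib directory, the same statement ran successfully: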

 
hive> CREATE EXTERNAL TABLE mongotohive
    > ( 
    >   id string,
    >   userid string,
    >   age bigint,
    >   status string
    > )
    > STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
    > WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","userid":"user_id","age":"age","status":"status"}')
    > TBLPROPERTIES('mongo.uri'='mongodb://localhost:27017/mydb.users');
OK
Time taken: 1.431 seconds
hive> select * from mongotohive;
OK
5b456e33a93daf7ae53e6419        abc123  58      D
5b45705ca93daf7ae53e8b2a        bcd001  45      C
Time taken: 0.601 seconds, Fetched: 2 row(s)
hive>
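
Before retrying the CREATE TABLE statement, it can help to confirm that all three jars are actually in place. The following is a minimal sketch, not part of the original post: the function name missing_jars is hypothetical, and the jar file names are the exact versions mentioned above (adjust them if your versions differ).

```shell
#!/bin/sh
# Hypothetical helper: print the required jars that are NOT present
# in the given Hive lib directory. Prints nothing if all are present.
# Usage: missing_jars /path/to/hive/lib
missing_jars() {
    lib_dir="$1"
    for jar in \
        mongo-hadoop-core-2.0.0.jar \
        mongo-hadoop-hive-2.0.0.jar \
        mongo-java-driver-3.7.1.jar
    do
        # Report each jar that does not exist as a regular file
        [ -f "$lib_dir/$jar" ] || echo "$jar"
    done
}
```

Assuming the HIVE_HOME environment variable points at your Hive installation (an assumption; the post does not say where Hive is installed), running missing_jars "$HIVE_HOME/lib" lists any jar that still needs to be copied. Empty output means all three are present.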
 
Source: https://www.cnblogs.com/abcdwxc/p/9295794.html

 
