Source code: a read/write JSON SerDe for Apache Hive
JsonSerde - a read/write SerDe for JSON Data
This library enables Apache Hive to read and write in JSON format. It includes support for serialization and deserialization (SerDe) as well as JSON conversion UDF.
Features
Read data stored in JSON format
Convert data to JSON format during INSERT INTO <table>
Support for JSON arrays and maps
Support for nested data structures
Support for Cloudera's Distribution Including Apache Hadoop (CDH)
Support for multiple versions of Hadoop
Installation
Download the latest binaries (json-serde-X.Y.Z-jar-with-dependencies.jar and json-udf-X.Y.Z-jar-with-dependencies.jar) from congiu.net/hive-json-serde. Choose the correct version for CDH 4, CDH 5, or Hadoop 2.3. Place the JARs in hive/lib, or load them with ADD JAR in Hive.
JSON Data Files
Upload JSON files to HDFS with hadoop fs -put or LOAD DATA LOCAL. JSON records in data files must appear one per line; an empty line produces a NULL record. This is because Hadoop partitions files as text, using CR/LF as the separator to distribute work, so a record that spans several lines is not read as a single unit.
The following example will work.
{ "key" : 10 }
{ "key" : 20 }
The following example will not work.
{
"key" : 10
}
{
"key" : 20
}
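If existing data looks like the second example, one option is to rewrite it before uploading. The following is a minimal sketch in Python, not part of this library: to_json_lines is a hypothetical helper name, and it assumes the input is small enough to read into memory as a single string. It parses each JSON value in turn and re-emits it as one compact record per line.

```python
import json


def to_json_lines(text):
    """Re-emit concatenated JSON values as one compact record per line.

    Hypothetical helper for illustration; not part of the SerDe.
    """
    decoder = json.JSONDecoder()
    records = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip the
        # spaces and newlines between records by hand.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # Parse one JSON value; the second result is the index where it ended.
        value, pos = decoder.raw_decode(text, pos)
        # Without indent, json.dumps never emits a raw newline inside a record.
        records.append(json.dumps(value))
    return "\n".join(records) + ("\n" if records else "")


# The multi-line input from the example that does not work.
multi_line = '{\n"key" : 10\n}\n{\n"key" : 20\n}\n'
print(to_json_lines(multi_line), end="")
```

The result has one record per line and no empty lines, which matches the layout of the example that works.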
Loading a JSON File and Querying Data
This example uses json-serde/src/test/scripts/test-without-cr-lf.json.