[Source Code] A Read/Write JSON SerDe for Apache Hive

Posted by MATLAB的科学与工程应用


JsonSerde - a read/write SerDe for JSON Data

This library enables Apache Hive to read and write data in JSON format. It includes a serializer/deserializer (SerDe) as well as a JSON conversion UDF.

Features

  • Read data stored in JSON format

  • Convert data to JSON format during INSERT INTO <table>

  • Support for JSON arrays and maps

  • Support for nested data structures

  • Support for Cloudera's Distribution Including Apache Hadoop (CDH)

  • Support for multiple versions of Hadoop

Installation

Download the latest binaries (json-serde-X.Y.Z-jar-with-dependencies.jar and json-udf-X.Y.Z-jar-with-dependencies.jar) from congiu.net/hive-json-serde. Choose the correct version for CDH 4, CDH 5 or Hadoop 2.3. Place the JARs into hive/lib or use ADD JAR in Hive.

JSON Data Files

Upload JSON files to HDFS with hadoop fs -put or LOAD DATA LOCAL. JSON records in data files must appear one per line; an empty line produces a NULL record. This is because Hadoop partitions files as text, using line breaks (CR/LF) as the separator when it distributes work, so a record that spans several lines is never seen as a whole.

The following example will work.

{ "key" : 10 }
{ "key" : 20 }
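
The one-record-per-line rule can be checked before a file is uploaded. The following is a minimal sketch in Python, not part of this library: the function name find_bad_lines is hypothetical, and it only tests that each line parses as a complete JSON value on its own, which is a simplification of whatever the SerDe itself accepts.

```python
import json


def find_bad_lines(lines):
    """Return (line_number, reason) pairs for lines that are empty
    or that do not parse as a complete JSON value by themselves."""
    problems = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            # An empty line would be read as a NULL record.
            problems.append((number, "empty line"))
            continue
        try:
            json.loads(stripped)
        except json.JSONDecodeError as err:
            problems.append((number, "not a complete JSON value: " + err.msg))
    return problems


# The two-record file shown above passes the check.
good = ['{ "key" : 10 }', '{ "key" : 20 }']
print(find_bad_lines(good))
```

A record split across several lines is reported once per physical line, because none of its fragments is valid JSON in isolation.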

The following example will not work.

{
"key" : 10
}
{
"key" : 20
}
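
A file laid out like this can be rewritten into the one-record-per-line form before it is uploaded. The sketch below is one possible approach in Python and is not shipped with this library: the helper name to_json_lines is hypothetical, and it assumes the whole file fits in memory and consists of JSON values separated only by whitespace.

```python
import json


def to_json_lines(text):
    """Yield each top-level JSON value found in `text` as a single-line string."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # raw_decode returns the parsed value and the index where it stopped.
        value, pos = decoder.raw_decode(text, pos)
        yield json.dumps(value)


sample = '{\n"key" : 10\n}\n{\n"key" : 20\n}\n'
for record in to_json_lines(sample):
    print(record)
# prints:
# {"key": 10}
# {"key": 20}
```

Writing each yielded string followed by a newline produces a file in the layout of the working example.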

Loading a JSON File and Querying Data

This example uses the sample file json-serde/src/test/scripts/test-without-cr-lf.json.

