How to convert a JSON string column to a map column in Hive?

Posted: 2019-02-19 10:55:40

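I need to get all the unique key-value pairs across all rows. Each row has different keys and values. Please see the column in the image above.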

For example, one row looks like this:

"START_TIME":1549002807568,"PARSING.QUERY_FORMED":1549002807586,"CUBES_WITH_PERMISSIONS":1549002807568,"PARSING.CUBE_MATCH_SELECTED":1549002807586,"POTENTIAL_COMPLETIONS_ADDED":1549002807587,"QUERY_PARSED":1549002807586,"SUGGESTIONS_FORMED":1549002807606,"PARSING.SEQUENCES_GENERATED":1549002807568,"PARSING.NGRAM_MATCHES_CACHED":1549002807585

Answer 1:

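I tested this with two rows of data. All key-value pairs are the same in both rows, except that the second JSON has an additional key, NEW_KEY, and a different value for PARSING.NGRAM_MATCHES_CACHED.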

with data as
(
select stack(2, --data example
'"START_TIME":1549002807568,"PARSING.QUERY_FORMED":1549002807586,"CUBES_WITH_PERMISSIONS":1549002807568,"PARSING.CUBE_MATCH_SELECTED":1549002807586,"POTENTIAL_COMPLETIONS_ADDED":1549002807587,"QUERY_PARSED":1549002807586,"SUGGESTIONS_FORMED":1549002807606,"PARSING.SEQUENCES_GENERATED":1549002807568,"PARSING.NGRAM_MATCHES_CACHED":1549002807585',
'"NEW_KEY":12345,"START_TIME":1549002807568,"PARSING.QUERY_FORMED":1549002807586,"CUBES_WITH_PERMISSIONS":1549002807568,"PARSING.CUBE_MATCH_SELECTED":1549002807586,"POTENTIAL_COMPLETIONS_ADDED":1549002807587,"QUERY_PARSED":1549002807586,"SUGGESTIONS_FORMED":1549002807606,"PARSING.SEQUENCES_GENERATED":1549002807568,"PARSING.NGRAM_MATCHES_CACHED":154900280758'
) as str
)

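-- Collect the distinct key:value pairs from all rows, join them with commas, and convert the result to a map
select str_to_map(concat_ws(',', collect_set(key_value)), ',', ':')
from
(
  -- Strip the double quotes, split each row on commas, and explode into one key:value pair per output row
  select explode(split(regexp_replace(str, '["]', ''), ',')) key_value from data
) s;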

Result:

OK
"START_TIME":"1549002807568","PARSING.QUERY_FORMED":"1549002807586","CUBES_WITH_PERMISSIONS":"1549002807568","PARSING.CUBE_MATCH_SELECTED":"1549002807586","POTENTIAL_COMPLETIONS_ADDED":"1549002807587","QUERY_PARSED":"1549002807586","SUGGESTIONS_FORMED":"1549002807606","PARSING.SEQUENCES_GENERATED":"1549002807568","PARSING.NGRAM_MATCHES_CACHED":"154900280758","NEW_KEY":"12345"
Time taken: 158.414 seconds, Fetched: 1 row(s)

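Of course, "PARSING.NGRAM_MATCHES_CACHED" appears only once in the result, because a map cannot hold the same key twice; in this run the value from the second row (154900280758) was kept. All keys in the resulting map are unique. Please read the comments in the code.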
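The query above does not say which value is kept when a key appears with different values in different rows, and collect_set gives no ordering guarantee. Below is a minimal sketch of one way to make that choice explicit. It assumes, purely for illustration, that the largest value per key should win, that values are integers, and that keys and values contain no commas or colons. The CTE names pairs and resolved, the alias merged, and the sample rows are hypothetical.

```sql
with data as
(
  -- Two small sample rows (hypothetical); key A appears with two different values
  select stack(2,
    '"A":1,"B":2',
    '"A":3,"C":4'
  ) as str
),
pairs as
(
  -- One row per key:value pair, split into separate key and value columns
  select split(key_value, ':')[0]                 as k,
         cast(split(key_value, ':')[1] as bigint) as v
  from
  (
    select explode(split(regexp_replace(str, '["]', ''), ',')) as key_value
    from data
  ) s
),
resolved as
(
  -- Assumption: when a key repeats, keep its largest value
  select k, max(v) as v
  from pairs
  group by k
)
-- Each key now occurs exactly once, so the map contents no longer depend on row order
select str_to_map(concat_ws(',', collect_list(concat(k, ':', cast(v as string)))), ',', ':') as merged
from resolved;
```

With these sample rows, key A should map to 3. If one map per row is wanted instead of a single merged map, applying str_to_map(regexp_replace(str, '["]', ''), ',', ':') directly to the column is likely enough.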
