How to convert a string to an array of structs in Hive and explode it?

【Posted】: 2018-03-01 05:53:26

【Question】:

I have data in Hive in the following format. The table is test(seq string, result string):

|seq  | result |
----------------
|0001 | [{"offerId":"Default_XYZ","businessName":"Apple","businessGroup":"Default","businessIssue":"Default","interactionId":"-4930126168287369915","campaignID":"P-1","rank":"1"},{"offerId":"Default_NAV","businessName":"Orange","businessGroup":"Default","businessIssue":"Default","interactionId":"-7830126168223452134","campaignID":"P-1","rank":"2"}] |

The output should look like this:

|seq  | offerId     | businessName   | businessGroup| businessIssue | interactionId        | campaignId | rank |
----------------------------------------------------------------------------------------------------------------
|0001 | Default_XYZ | Apple          | Default      | Default       | -4930126168287369915 | P-1        | 1    |
|0001 | Default_NAV | Orange         | Default      | Default       | -7830126168223452134 | P-1        | 2    |

I tried converting the string to an array of structs, but a direct CAST does not work.

Any help?

[Edit - tried the following query]

 select sequenceNumber, offerId, businessName, rank
 from (
   select sequenceNumber,
          collect_list(oid['offerId']) as offerid_list,
          collect_list(oid['businessName']) as businessName_list,
          collect_list(oid['rank']) as rank_list
   from (
     select sequenceNumber,
            str_to_map(translate(offer_Id,'','')) as oid
     from test
     lateral view explode (split(translate(result, '[]"',''),"\\,")) oid as offer_id
   ) x
   group by sequenceNumber
 ) y
 lateral view explode(offerid_list) olist as offerId
 lateral view explode(businessName_list) olist as businessName
 lateral view explode(rank_list) rlist as rank

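【Comments】:

Have you tried any queries so far?

Yes. I tried a few but did not get the desired result. I have edited my question to include that query.

Just to make sure I understand: it looks like you have two string columns, one of which is JSON. If you can add your seq as part of the JSON, you should be able to use a JSON SerDe.

【Answer 1】: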

I found a solution to my problem:

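-- Assumes result holds a JSON array of flat objects, e.g. [{"k":"v",...},{"k":"v",...}],
-- with the keys always in the same order and no ',' or ':' inside the values.
select
  seq,
  split(split(results, ',')[0], ':')[1] as offerId,
  split(split(results, ',')[1], ':')[1] as businessName,
  split(split(results, ',')[2], ':')[1] as businessGroup,
  split(split(results, ',')[3], ':')[1] as businessIssue,
  split(split(results, ',')[4], ':')[1] as interactionId,
  split(split(results, ',')[5], ':')[1] as campaignId,
  -- the last record still ends with '}', so strip it from rank
  regexp_replace(split(split(results, ',')[6], ':')[1], '[\\]|}]', '') as rank
from
(
  -- translate() works on single characters, not a regex: this removes ", \, [, | and ]
  -- split() then cuts the string between objects, at each "},"
  select seq,
         split(translate(result, '"\\[|]|\""', ''), '\\},') as r
  from test
) t1
LATERAL VIEW explode(r) rr AS results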
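
The positional lookups above depend on every object listing its keys in the same order. A minimal sketch of a key-based variant turns each record into a map with str_to_map and reads the fields by name. It assumes the same flat objects, with no ',', ':', '{' or '}' inside the values; the aliases rec and kv are only illustrative.

```sql
-- Sketch only: assumes result looks like [{"k":"v",...},{"k":"v",...}]
-- and that values never contain ',' ':' '{' or '}'.
select
  seq,
  kv['offerId']       as offerId,
  kv['businessName']  as businessName,
  kv['businessGroup'] as businessGroup,
  kv['businessIssue'] as businessIssue,
  kv['interactionId'] as interactionId,
  kv['campaignID']    as campaignId,   -- map keys are case-sensitive
  kv['rank']          as rank
from
(
  select seq,
         -- drop the braces, then build a map from the "key:value" pairs
         str_to_map(translate(rec, '{}', ''), ',', ':') as kv
  from test
  -- remove quotes and square brackets, then cut between objects at "},"
  lateral view explode(split(translate(result, '"[]', ''), '\\},')) r as rec
) t1
```

Because each exploded row is already one complete record, no collect_list or second explode is needed, which avoids the cross product that several independent lateral views would produce.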

【Discussion】:

【Answer 2】:

You can try the get_json_object function.

-- JSON keys are case-sensitive: the data uses "businessGroup" and "campaignID"
select seq, get_json_object(result,'$[0].offerId') as offerId,
            get_json_object(result,'$[0].businessName') as businessName,
            get_json_object(result,'$[0].businessGroup') as businessGroup,
            get_json_object(result,'$[0].businessIssue') as businessIssue,
            get_json_object(result,'$[0].interactionId') as interactionId,
            get_json_object(result,'$[0].campaignID') as campaignId,
            get_json_object(result,'$[0].rank') as rank
    from test
    UNION ALL
select seq, get_json_object(result,'$[1].offerId') as offerId,
            get_json_object(result,'$[1].businessName') as businessName,
            get_json_object(result,'$[1].businessGroup') as businessGroup,
            get_json_object(result,'$[1].businessIssue') as businessIssue,
            get_json_object(result,'$[1].interactionId') as interactionId,
            get_json_object(result,'$[1].campaignID') as campaignId,
            get_json_object(result,'$[1].rank') as rank
    from test
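
The two UNION ALL branches hard-code array indexes 0 and 1. A minimal sketch that copes with any number of elements splits the array into one JSON object per row and then reads the fields with json_tuple, which parses each object once. It assumes result is a valid JSON array of flat objects with no surrounding whitespace, and that the marker '|||' never occurs in the data; the aliases e, item and j are only illustrative.

```sql
-- Sketch only: breaks on nested objects or on values that contain "},{" or "|||".
select t.seq,
       j.offerId, j.businessName, j.businessGroup, j.businessIssue,
       j.interactionId, j.campaignId, j.rank
from test t
lateral view explode(
  split(
    regexp_replace(
      regexp_replace(t.result, '^\\[|\\]$', ''),  -- drop the outer [ and ]
      '\\}\\s*,\\s*\\{', '}|||{'),                -- mark the boundary between objects
    '\\|\\|\\|')                                  -- split on that marker
) e as item
-- each item is now one object such as {"offerId":"...","rank":"1"}
lateral view json_tuple(item, 'offerId', 'businessName', 'businessGroup',
                        'businessIssue', 'interactionId', 'campaignID', 'rank')
  j as offerId, businessName, businessGroup, businessIssue,
       interactionId, campaignId, rank
```

The same explode step also works with get_json_object(item, '$.offerId') and so on, if keeping the original function is preferred.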

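【Discussion】:

Thanks, Kaushik. What if the result array contains more than 2 elements? We cannot change the Hive query dynamically.

@Naveen: I am not sure whether that can be done. However, as hlagos suggested, you can use a JSON SerDe for this purpose. Because of the many function calls, even your solution will not perform well on large datasets, so you may need a different approach. All the best.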
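
The JSON SerDe route mentioned in the comments could look roughly like the sketch below. It assumes the data can be rewritten so that each line is one JSON object that also contains seq, for example {"seq":"0001","result":[{...},{...}]}, and that the hive-hcatalog-core jar providing org.apache.hive.hcatalog.data.JsonSerDe is on the classpath. The table name test_json and the location are hypothetical, and the sketch has not been checked against a specific Hive version.

```sql
-- Sketch only: table name and path are placeholders, not from the original post.
create external table test_json (
  seq    string,
  result array<struct<
           offerId:string, businessName:string, businessGroup:string,
           businessIssue:string, interactionId:string, campaignID:string,
           rank:string>>
)
row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
stored as textfile
location '/tmp/test_json';

-- result is now a real array of structs, so it explodes directly
select seq,
       r.offerId, r.businessName, r.businessGroup, r.businessIssue,
       r.interactionId, r.campaignID as campaignId, r.rank
from test_json
lateral view explode(result) x as r;
```

With this layout the number of elements per row no longer matters, and no string splitting happens at query time.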
