Lateral Flatten Snowpipe data with mixture of arrays and dict
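Asked: 2020-05-29 12:26:10
Question:
I have two differently structured JSON files being ingested through Snowpipe. The only difference between them is that one has many nested arrays instead of nested dictionaries. I am trying to figure out how to convert structure 1 into one final table. I have already converted structure 2 into a table, and the code is included below.
I know I need to use lateral flatten, but I have not been successful.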
**Structure 1: Nested Arrays (Need help on)**
This json lives within a table and in column **JSONTEXT**
[
  {
    "ID": "xxx-xxxx-xxxx xxx-xxx",
    "caseTypeID": "xx-xxxx-xxxx-xxxxx",
    "content": {
      "AccountID": "xx-xxxxx-xxxx-xxxx xxxx-xxxxx",
      "AccountName": "XXXX",
      "Address": {
        "pxObjClass": "Data-Address-Postal"
      },
      "Addresses": [],
      "AllKickoffsComplete": "true",
      "BillingContactList": [],
      "ClientCurrency": "USD",
      "ClientID": "XXXXXX",
      "ClientNSID": "XXXXXXXX-00",
      "ClientName": "XXXXX XXXX Inc.",
      "CompanyPhoneNumber": "XXX-XXX-XXXX",
      "CrmSearchOrg": "XXXX",
      "EEList": [
        {
          "AccountID": "xxx-xxxxx-xxxx-xxxxx xxxx-xxxxx",
          "AccountName": "XXXX",
          "AllowanceList": [
            {
              "AllowanceAmount": "327",
              "AllowanceName": "Car Allowance",
              "pxObjClass": "xxxxx-xxxxx-xxxxx"
            }
          ]
**Structure 2: Nested Dictionaries**
This json lives within a table and in column **JSONTEXT**
[
"OppID": "xxxx-xxxxx",
"pxObjClass": "xx-xxxxx-xxxx-xxxxxx",
"pxPages":
"EEList":
"Country": "xxx",
"CountryName": "xxx",
"Currency": "xxx",
"EstimatedICPCost": "xxxxxxxxxxx",
"ICPCurrency": "xxxxx",
"ICPID": "xxxxxxxxx.",
"ICPNSID": "xxxx-xx",
"ICPName": "xxx xx xx.",
"LocalMonthlySalary": "xxxxxx",
"MinFee": "xxxx",
"MonthlyGrossCost": "xxxxx",
"NewOrRepeatCustomer": "xxxxx",
"OppCloseDate": "xxx-xxx-xx",
"OppID": "xxx-xxxx",
"OpportunityName": "xxx - xxx xxx - xxx - xxxx",
"ReferralSource": "xxxxxx",
"pxObjClass": "Index-xx-xxxx-xxxx-xxxxxx",
"pxSubscript": "EEList"
,
"pyID": "xxxxxx",
"pzInsKey": "xxxx-xxxx-xxxx xxxxx-xxx"
,
]
Here is my code for the second structure, which works.
create or replace table xxxx
as select
value:ID::varchar as ID,
value:caseTypeID::varchar as caseTypeID,
value:content:AccountID::varchar as AccountID,
value:content:AccountName::varchar as AccountName,
value:content:AllKickoffsComplete::boolean as AllKickoffsComplete,
value:content:ClientCurrency::varchar as ClientCurrency,
value:content:ClientID::varchar as ClientID,
value:content:ClientNSID::varchar as ClientNSID,
value:content:ClientName::varchar as ClientName,
value:content:CompanyAddressCountryName::varchar as CompanyAddressCountryName,
value:content:CompanyPhoneNumber::varchar as CompanyPhoneNumber,
value:content:CreateNew::boolean as CreateNew,
value:content:CrmSearchOrg::varchar as CrmSearchOrg,
value:content:EEList:AccountID::varchar as EE_AccountID,
value:content:EEList:AccountName::varchar as EE_AccountName
from new_raw_json,
lateral flatten (input =>jsontext);
Here is the code I have tried; it only works when you pass in jsontext[Nth].
select
value:ID::varchar as ID,
value:EEListID::varchar as EEListID,
value:caseTypeID::varchar as caseTypeID
from new_raw_json,
lateral flatten (input => jsontext[0]:content:EEList);
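Any help is appreciated!
Comments:
I think the main issue is that the first row of my table is actually a list and not just a raw json. I'm trying to rebuild in Python how the data comes over!
Answer 1:
You can chain multiple lateral views using FLATTEN to keep exploding into nested structures (arrays within arrays).
A well-defined approach could look like this (only a few columns are projected here, to illustrate the levels):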
SELECT
outer_object.value:caseTypeID AS caseTypeID,
outer_object.value:content.AccountID AS parentAccountID,
eelist_object.value:AccountID AS eeListAccountID,
allowance_object.value:AllowanceName
FROM
new_raw_json,
LATERAL FLATTEN (input => jsontext) outer_object,
LATERAL FLATTEN (input => outer_object.value:content.EEList) eelist_object,
LATERAL FLATTEN (input => eelist_object.value:AllowanceList) allowance_object;
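To see what rows the chained query above should produce, here is a rough Python model of the same three-level explosion. It is only a sketch: the sample document, its placeholder values, and the `flatten_chain` helper are made up for illustration and are not part of Snowflake. Each nested `for` loop stands in for one `LATERAL FLATTEN`.

```python
import json

# Hypothetical sample shaped like Structure 1 (all values are placeholders).
JSONTEXT = json.loads("""
[
  {
    "ID": "case-1",
    "caseTypeID": "type-1",
    "content": {
      "AccountID": "acct-parent",
      "EEList": [
        {
          "AccountID": "acct-ee-1",
          "AllowanceList": [
            {"AllowanceAmount": "327", "AllowanceName": "Car Allowance"},
            {"AllowanceAmount": "50", "AllowanceName": "Phone Allowance"}
          ]
        },
        {"AccountID": "acct-ee-2", "AllowanceList": []}
      ]
    }
  }
]
""")


def flatten_chain(document):
    """Yield one row per (outer element, EEList element, AllowanceList element)."""
    for outer in document:                                 # FLATTEN(input => jsontext)
        content = outer.get("content", {})
        for ee in content.get("EEList", []):               # FLATTEN(... content.EEList)
            for allowance in ee.get("AllowanceList", []):  # FLATTEN(... AllowanceList)
                yield {
                    "caseTypeID": outer.get("caseTypeID"),
                    "parentAccountID": content.get("AccountID"),
                    "eeListAccountID": ee.get("AccountID"),
                    "AllowanceName": allowance.get("AllowanceName"),
                }


rows = list(flatten_chain(JSONTEXT))
for row in rows:
    print(row)
```

Note that the second employee has an empty AllowanceList and so contributes no rows. FLATTEN behaves the same way by default; passing `OUTER => TRUE` keeps such rows, with NULLs for the flattened columns.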
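Note that this explodes only one identified multi-valued path (List -> EEList -> AllowanceList). It is not clear from the question whether all paths must be exploded (for example List -> EEList -> Addresses AND AllowanceList), or whether some of them can be stored as VARIANT (or other complex) types in the final result.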
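For the second option, the idea is to explode only down to EEList and leave AllowanceList nested in its own column. The Python sketch below models that shape; the sample data and the `flatten_ee_only` helper are hypothetical, and the nested list plays the role a VARIANT column would play in Snowflake.

```python
# Hypothetical sample shaped like Structure 1 (all values are placeholders).
document = [
    {
        "ID": "case-1",
        "content": {
            "EEList": [
                {
                    "AccountID": "acct-ee-1",
                    "AllowanceList": [
                        {"AllowanceAmount": "327", "AllowanceName": "Car Allowance"},
                    ],
                },
                {"AccountID": "acct-ee-2", "AllowanceList": []},
            ]
        },
    }
]


def flatten_ee_only(doc):
    """Yield one row per EEList element, keeping AllowanceList nested."""
    for outer in doc:
        for ee in outer.get("content", {}).get("EEList", []):
            yield {
                "ID": outer.get("ID"),
                "eeListAccountID": ee.get("AccountID"),
                # Left as a nested value, like a VARIANT column.
                "AllowanceList": ee.get("AllowanceList", []),
            }


rows = list(flatten_ee_only(document))
for row in rows:
    print(row)
```

With this layout every employee keeps exactly one row, including those with no allowances, and the nested column can still be exploded later if needed.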
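For example, if the AllowanceList values need to be repeated for every address listed under Addresses, this can be achieved by combining two exploded query results (one chaining List -> Addresses and another chaining List -> EEList -> AllowanceList).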
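A sketch of that combination in Python, under a few assumptions: Addresses sits directly under content (as in the sample in the question), each address has a made-up `City` field, and the two exploded results are matched on the top-level `ID`. The helper names are hypothetical.

```python
from itertools import product

# Hypothetical sample; "City" is an invented field used only for illustration.
document = [
    {
        "ID": "case-1",
        "content": {
            "Addresses": [{"City": "Austin"}, {"City": "Boston"}],
            "EEList": [
                {
                    "AccountID": "acct-ee-1",
                    "AllowanceList": [
                        {"AllowanceName": "Car Allowance"},
                        {"AllowanceName": "Phone Allowance"},
                    ],
                }
            ],
        },
    }
]


def explode_addresses(doc):
    """First exploded result: List -> Addresses."""
    for outer in doc:
        for address in outer["content"].get("Addresses", []):
            yield {"ID": outer["ID"], "City": address.get("City")}


def explode_allowances(doc):
    """Second exploded result: List -> EEList -> AllowanceList."""
    for outer in doc:
        for ee in outer["content"].get("EEList", []):
            for allowance in ee.get("AllowanceList", []):
                yield {
                    "ID": outer["ID"],
                    "eeListAccountID": ee.get("AccountID"),
                    "AllowanceName": allowance.get("AllowanceName"),
                }


def combine(allowances, addresses):
    """Pair every allowance row with every address row that shares its ID."""
    return [
        {**allowance, **address}
        for allowance, address in product(allowances, addresses)
        if allowance["ID"] == address["ID"]
    ]


combined = combine(list(explode_allowances(document)), list(explode_addresses(document)))
for row in combined:
    print(row)
```

Two allowances and two addresses give four rows here. In SQL the same shape would come from joining the two flattened queries on the shared key, so the row count grows multiplicatively with each list.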
Comments:
Thanks for the suggestions and guidance, much appreciated!
@Harsh J, can you take a look at my post? I'm facing a Snowflake-related issue: ***.com/questions/62823080/…