How to create a subset from a variant column in Snowflake?
【Posted】: 2020-05-20 09:54:07 【Question】: For example, suppose my variant column "xyz" contains data like the following:
{
"post_new_visits": "Repeat",
"post_new1_week": "Thursday",
"post_new2_appt": "Weekday",
"post_new3_site": "12:50AM",
"post_new4_channel": "5.0",
"pre_new2_appt": "Weekday",
"pre_new3_site": "12:50AM",
"pre_new4_channel": "5.0"
}
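From the variant column above, I want to create a new variant column that contains only the "post*" key-value pairs, so the output should look like this:
{
"post_new_visits": "Repeat",
"post_new1_week": "Thursday",
"post_new2_appt": "Weekday",
"post_new3_site": "12:50AM",
"post_new4_channel": "5.0"
}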
Is there any way to achieve this?
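【Comments on the question】:
There is a code formatting option in the question editor; I suggest you use it.
It is in Snowflake.
【Answer 1】:
Perhaps you could flatten the object and rebuild the JSON. For example: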
-- Sample table with two rows; each row holds one JSON object in a VARIANT column.
create table tmp ( v variant )
as
select
parse_json(
'{ "post_new_visits": "Repeat",
"post_new1_week": "Thursday",
"post_new2_appt": "Weekday",
"post_new3_site": "12:50AM",
"post_new4_channel": "5.0",
"pre_new2_appt": "Weekday",
"pre_new3_site": "12:50AM",
"pre_new4_channel": "5.0" }'
)
union all
select
parse_json(
'{ "post_new_visits": "New",
"post_new1_week": "Friday",
"post_new2_appt": "Weekday",
"post_new3_site": "13:50AM",
"post_new4_channel": "4.0",
"pre_new2_appt": "Weekday",
"pre_new3_site": "14:50AM",
"pre_new4_channel": "2.0" }'
);
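-- Flatten each object into key/value rows, keep the "post" keys,
-- then rebuild one object per input row (v2.seq identifies the input row).
select
    OBJECT_AGG(v2.key, v2.value)
from
    tmp,
    lateral flatten(input => v) v2
where
    v2.key like 'post%'
group by
    v2.seq;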
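【Comments】:
This solution works for a single value (the one passed as the argument to parse_json) but not for all the values in a column; I mean it cannot be applied to a table.
Do you have multiple variant columns or multiple rows? It should work for multiple rows (the whole table).
with tmp as ( select parse_json(variant_column) v from table_name ) select OBJECT_AGG(v2.key, v2.value) as persisted_vars from tmp, lateral flatten(input => v) v2 where v2.key like 'post_evar%' group by v2.seq;
Using flatten has a huge performance impact. In my case I have about 500 key-value pairs and a million records, so it is too slow. Is there another way besides flatten, such as a UDF or a stored procedure?
FLATTEN/OBJECT_AGG is indeed slow. But you can always write a JavaScript UDF that does what you want.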
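The JavaScript UDF mentioned in the last comment is not shown in the thread. A minimal sketch of what it might look like follows. The function name `filter_keys_by_prefix` and its `prefix` parameter are hypothetical, and the usage query assumes the `tmp` table with a variant column `v` from the answer above. Whether this actually runs faster than FLATTEN/OBJECT_AGG on a given data set is an assumption that would need to be measured.

```sql
-- Hypothetical helper, not from the original thread: returns a copy of the
-- input object containing only the keys that start with the given prefix.
create or replace function filter_keys_by_prefix(v variant, prefix string)
returns variant
language javascript
as
$$
  // Unquoted SQL argument names are exposed to JavaScript in uppercase: V, PREFIX.
  // Anything that is not a plain JSON object is passed over (undefined maps to SQL NULL).
  if (V === null || V === undefined || typeof V !== 'object' || Array.isArray(V)) {
    return undefined;
  }
  var out = {};
  for (var key in V) {
    // Plain prefix comparison, so "_" is matched literally (unlike in LIKE).
    if (Object.prototype.hasOwnProperty.call(V, key) && key.indexOf(PREFIX) === 0) {
      out[key] = V[key];
    }
  }
  return out;
$$;

-- Usage: one output row per input row, with no flatten or group by.
select filter_keys_by_prefix(v, 'post') as post_only
from tmp;
```

Because the function is applied once per row, the result lines up with the source rows directly, which makes it straightforward to select it alongside other columns or to store it as a new variant column.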