Getting sum of distinct values when grouping by an unnested value in Google BigQuery
【Posted】: 2021-04-28 19:42:22
【Question】: I am querying a Google BigQuery table that has many rows, but the ones I am interested in look like this:
date fullVisitorId hits.product.productSKU hits.product.v2ProductName hits.transaction.transactionId
20210427 63546815 MM52AF panda 149816182
20210427 65198162 KGSA5A giraffe 321498182
I am trying to calculate the total number of transactions by counting distinct hits.transaction.transactionId values.
with t1 as
(
SELECT
DATE_TRUNC(PARSE_DATE("%Y%m%d", date), MONTH) as month,
fullVisitorId,
product.productSKU as sku,
product.v2ProductName as v2,
case when hits.ecommerceaction.action_type = '2' then 1 else 0 end as pdp_visitor,
count(case when hits.ecommerceaction.action_type = '2' then fullvisitorid else null end) AS views_pdp,
count(case when hits.ecommerceaction.action_type = '3' then fullvisitorid else null end) AS add_cart,
count(case when hits.ecommerceaction.action_type = '6' then hits.transaction.transactionid else null end) AS conversions,
count(distinct(hits.transaction.transactionId)) as transaction_id_cnt,
FROM `table` AS nr,
UNNEST(hits) hits,
UNNEST(product) product
GROUP BY 1,2,3,4,5
)
select
month,
sku,
v2,
sum(views_pdp) as pdp
,sum(add_cart) as add_cart
,sum(conversions) as conversions
,sum(transaction_id_cnt) as transactions
from t1
group by 1, 2, 3
order by 1 desc;
This returns:
month sku v2 pdp add_cart conversions transactions
2021-04-01 AHBS 615 10146410 365569 46885 46640
2021-03-01 HERD 154 10074095 399483 58162 57811
But transactions is incorrect. I get the correct output using this:
with t1 as
(
SELECT
DATE_TRUNC(PARSE_DATE("%Y%m%d", date), MONTH) as month,
fullVisitorId,
ARRAY_AGG(DISTINCT product.productSKU IGNORE NULLS) AS productSKU_list, -- changed this
ARRAY_AGG(DISTINCT product.v2ProductName IGNORE NULLS) AS productName_list, -- changed this
case when hits.ecommerceaction.action_type = '2' then 1 else 0 end as pdp_visitor,
0 AS views_impressions,
count(case when hits.ecommerceaction.action_type = '2' then fullvisitorid else null end) AS views_pdp,
count(case when hits.ecommerceaction.action_type = '3' then fullvisitorid else null end) AS add_cart,
0 AS add_shortlist,
count(case when hits.ecommerceaction.action_type = '5' then fullvisitorid else null end) AS checkouts,
count(case when hits.ecommerceaction.action_type = '6' then hits.transaction.transactionid else null end) AS conversions,
count(distinct(hits.transaction.transactionId)) as transaction_id_cnt,
FROM `table` AS nr,
UNNEST(hits) hits,
UNNEST(product) product
GROUP BY 1,2,5
)
select
month,
sum(views_pdp) as pdp
,sum(add_cart) as add_cart
,sum(conversions) as conversions
,sum(transaction_id_cnt) as transactions
from t1
group by 1
order by 1 desc;
which returns the correct transactions:
month pdp add_cart conversions transactions
2021-04-01 9978511 396333 46885 30917
2021-03-01 15101718 568904 58162 23017
But using this:
...
ARRAY_AGG(DISTINCT product.productSKU IGNORE NULLS) AS productSKU_list,
ARRAY_AGG(DISTINCT product.v2ProductName IGNORE NULLS) AS productName_list,
...
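does not allow me to group by or select productSKU_list and productName_list in the second select statement.
I believe this happens because, if an order is placed with multiple items in the basket, Google BigQuery will hold multiple rows with the same hits.transaction.transactionId.
I tried to confirm this with: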
select distinct(hits.transaction.transactionId), count(distinct hits.transaction.transactionId) as total
FROM `table` AS nr,
UNNEST(hits) hits,
UNNEST(product) product
WHERE _TABLE_SUFFIX between '200101' AND '210428'
GROUP BY 1
order by 2 desc
But I get:
transactionId total
ABSAD54 1
515ABDG 1
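A note on the check above: grouping by transactionId and then counting distinct transactionId inside each group can only ever yield 1, so it cannot reveal duplicate rows. Counting rows per transaction (COUNT(*)) is what would show them. The sketch below illustrates this with Python's sqlite3 module as a stand-in for BigQuery; the table name flat, the transaction IDs, and the row layout are invented for illustration and only mimic what UNNEST(hits), UNNEST(product) would produce.

```python
import sqlite3

# Hypothetical flattened rows: one row per (transaction, product), mimicking
# the result of UNNEST(hits) + UNNEST(product). IDs are invented.
rows = [
    ("T1", "MM52AF"),
    ("T1", "KGSA5A"),  # same order, second basket item
    ("T2", "MM52AF"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flat (transaction_id TEXT, sku TEXT)")
conn.executemany("INSERT INTO flat VALUES (?, ?)", rows)

check = conn.execute(
    """
    SELECT transaction_id,
           COUNT(DISTINCT transaction_id) AS distinct_ids,  -- always 1 per group
           COUNT(*)                       AS row_count      -- rows per transaction
    FROM flat
    GROUP BY transaction_id
    ORDER BY row_count DESC, transaction_id
    """
).fetchall()

for row in check:
    print(row)
```

In this toy data the distinct count is 1 for every transaction, while the row count shows that T1 occupies two rows.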
So at this point I am lost, because I am not sure why I get the correct answer when I use the second script, or when I comment out this part of the first query:
--product.productSKU,
--product.v2ProductName,
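Any tips on how Google BigQuery works are welcome.
My goal is to get the correct transactions output, as achieved in the second script, while still being able to group by and return the values of product.productSKU and product.v2ProductName.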
【Answer 1】: In your second query, you need to aggregate them again:
select
month,
sum(views_pdp) as pdp
,sum(add_cart) as add_cart
,sum(conversions) as conversions
,sum(transaction_id_cnt) as transactions
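,ARRAY_CONCAT_AGG(productSKU_list) AS productSKU_list -- ARRAY_AGG does not accept an ARRAY argument; ARRAY_CONCAT_AGG concatenates the per-visitor arrays (duplicates are kept)
,ARRAY_CONCAT_AGG(productName_list) AS productName_list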
from t1
group by month
order by month desc;
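As for why the first query overcounts, one likely explanation is that it computes COUNT(DISTINCT transactionId) per (visitor, SKU, name) group and then sums those counts, so an order with several products contributes once per product. The sketch below reproduces that effect on toy data, using Python's sqlite3 as a stand-in for BigQuery. The table flat, the visitor and transaction IDs, and the helper summed_distinct are all hypothetical.

```python
import sqlite3

# Hypothetical flattened rows: (visitor, transaction_id, sku). Invented data.
rows = [
    ("v1", "T1", "MM52AF"),
    ("v1", "T1", "KGSA5A"),  # T1 contains two products
    ("v2", "T2", "MM52AF"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flat (visitor TEXT, transaction_id TEXT, sku TEXT)")
conn.executemany("INSERT INTO flat VALUES (?, ?, ?)", rows)


def summed_distinct(group_cols):
    """Sum per-group distinct counts, as the two-step queries do."""
    sql = f"""
        WITH t1 AS (
            SELECT {group_cols}, COUNT(DISTINCT transaction_id) AS cnt
            FROM flat
            GROUP BY {group_cols}
        )
        SELECT SUM(cnt) FROM t1
    """
    return conn.execute(sql).fetchone()[0]


by_visitor_sku = summed_distinct("visitor, sku")  # shape of the first query
by_visitor = summed_distinct("visitor")           # shape of the second query
true_total = conn.execute(
    "SELECT COUNT(DISTINCT transaction_id) FROM flat"
).fetchone()[0]

print(by_visitor_sku, by_visitor, true_total)
```

Grouping by visitor alone matches the true total here only because each transaction belongs to a single visitor group. Adding the SKU to the grouping splits T1 across two groups, and the sum counts it twice.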
【Comments】:
Is it possible to have it in the same structure as my first query returned, with the SKU and name as single items rather than aggregated?
If they are not part of the grouping (which is not what you want), then no.
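If per-SKU rows are acceptable on the understanding that a multi-item order counts once for each SKU it contains, one possible approach is to apply COUNT(DISTINCT ...) directly in a single GROUP BY on month and SKU, and to compute the monthly total separately. The sketch below assumes that interpretation and again uses Python's sqlite3 as a stand-in for BigQuery; the table flat and the transaction IDs are invented, and the per-SKU figures will not sum to the monthly total.

```python
import sqlite3

# Hypothetical flattened rows: (month, transaction_id, sku, name). Invented IDs.
rows = [
    ("2021-04-01", "T1", "MM52AF", "panda"),
    ("2021-04-01", "T1", "KGSA5A", "giraffe"),  # T1 contains two products
    ("2021-04-01", "T2", "MM52AF", "panda"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flat (month TEXT, transaction_id TEXT, sku TEXT, name TEXT)"
)
conn.executemany("INSERT INTO flat VALUES (?, ?, ?, ?)", rows)

# Distinct transactions per SKU: T1 counts toward both SKUs it contains.
per_sku = conn.execute(
    """
    SELECT month, sku, name, COUNT(DISTINCT transaction_id) AS transactions
    FROM flat
    GROUP BY month, sku, name
    ORDER BY sku
    """
).fetchall()

# Distinct transactions per month, computed without the SKU in the grouping.
per_month = conn.execute(
    """
    SELECT month, COUNT(DISTINCT transaction_id) AS transactions
    FROM flat
    GROUP BY month
    """
).fetchall()

print(per_sku)
print(per_month)
```

In this toy data the per-SKU counts add up to 3 while the month has only 2 distinct transactions, which is why the two figures have to be reported separately.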