Getting the sum of distinct values when grouping by an unnested value in Google BigQuery

Posted: 2021-04-28 19:42:22

【Question】

I'm querying a Google BigQuery table that has many columns, but the ones I'm interested in look like this:

date             fullVisitorId       hits.product.productSKU     hits.product.v2ProductName     hits.transaction.transactionId

20210427        63546815            MM52AF                      panda                           149816182
20210427        65198162            KGSA5A                      giraffe                         321498182
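The columns above are a flattened view. Assuming the usual Google Analytics export layout, where `hits` is a repeated record and `hits.product` is a repeated record inside each hit, the comma joins with `UNNEST(hits)` and `UNNEST(product)` produce one row per product per hit. The Python sketch below emulates that flattening on a hypothetical session (all names and values are made up) to show that one transaction with two products yields two rows but only one distinct transactionId.

```python
# Hypothetical nested session, loosely shaped like the GA export:
# each session has a repeated `hits` field, each hit a repeated `product` field.
sessions = [
    {
        "date": "20210427",
        "fullVisitorId": "63546815",
        "hits": [
            # A transaction hit with two products in the basket.
            {"transactionId": "149816182",
             "product": [{"productSKU": "MM52AF", "v2ProductName": "panda"},
                         {"productSKU": "KGSA5A", "v2ProductName": "giraffe"}]},
            # A hit without products: a comma (cross) join with an empty
            # array contributes no rows at all.
            {"transactionId": None, "product": []},
        ],
    },
]


def unnest(sessions):
    """Emulate `FROM t, UNNEST(hits) hits, UNNEST(product) product`:
    one output row per (session, hit, product) combination."""
    rows = []
    for s in sessions:
        for hit in s["hits"]:
            for p in hit["product"]:
                rows.append((s["date"], s["fullVisitorId"], p["productSKU"],
                             p["v2ProductName"], hit["transactionId"]))
    return rows


rows = unnest(sessions)
print(len(rows))                  # 2 rows for the one transaction...
print(len({r[4] for r in rows}))  # ...but only 1 distinct transactionId
```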

I'm trying to calculate the total number of transactions by counting distinct hits.transaction.transactionId values.

with t1 as 
(
SELECT
    DATE_TRUNC(PARSE_DATE("%Y%m%d", date), MONTH) as month,
    fullVisitorId,
    product.productSKU as sku,
    product.v2ProductName as v2,
    case when hits.ecommerceaction.action_type = '2' then 1 else 0 end as pdp_visitor,
    count(case when hits.ecommerceaction.action_type = '2' then fullvisitorid else null end) AS views_pdp,
    count(case when hits.ecommerceaction.action_type = '3' then fullvisitorid else null end) AS add_cart,
    count(case when hits.ecommerceaction.action_type = '6' then hits.transaction.transactionid else null end) AS conversions,
    count(distinct(hits.transaction.transactionId)) as transaction_id_cnt,
FROM `table` AS nr, 
    UNNEST(hits) hits,
    UNNEST(product) product
GROUP BY 1,2,3,4,5
)
select 
    month,
    sku,
    v2,
    sum(views_pdp) as pdp 
    ,sum(add_cart) as add_cart
    ,sum(conversions) as conversions
    ,sum(transaction_id_cnt) as transactions
from t1
group by 1, 2, 3
order by 1 desc;

This returns:

month               sku          v2      pdp            add_cart    conversions     transactions    
2021-04-01          AHBS         615     10146410       365569      46885           46640
2021-03-01          HERD         154     10074095       399483      58162           57811

However, transactions is incorrect. I get the correct output using this:


with t1 as 
(
SELECT
    DATE_TRUNC(PARSE_DATE("%Y%m%d", date), MONTH) as month,
    fullVisitorId,
    ARRAY_AGG(DISTINCT product.productSKU IGNORE NULLS) AS productSKU_list, -- changed this
    ARRAY_AGG(DISTINCT product.v2ProductName IGNORE NULLS) AS productName_list, -- changed this
    case when hits.ecommerceaction.action_type = '2' then 1 else 0 end as pdp_visitor,
    0 AS views_impressions,
    count(case when hits.ecommerceaction.action_type = '2' then fullvisitorid else null end) AS views_pdp,
    count(case when hits.ecommerceaction.action_type = '3' then fullvisitorid else null end) AS add_cart,
    0 AS add_shortlist,
    count(case when hits.ecommerceaction.action_type = '5' then fullvisitorid else null end) AS checkouts,
    count(case when hits.ecommerceaction.action_type = '6' then hits.transaction.transactionid else null end) AS conversions,
    count(distinct(hits.transaction.transactionId)) as transaction_id_cnt,
FROM `table` AS nr, 
    UNNEST(hits) hits,
    UNNEST(product) product
GROUP BY 1,2,5
)
select 
    month,
    sum(views_pdp) as pdp 
    ,sum(add_cart) as add_cart
    ,sum(conversions) as conversions
    ,sum(transaction_id_cnt) as transactions
from t1
group by 1
order by 1 desc;

which returns the correct transactions:

month       pdp         add_cart     conversions      transactions  
2021-04-01  9978511     396333       46885            30917 
2021-03-01  15101718    568904       58162            23017

However, using this:

...
ARRAY_AGG(DISTINCT product.productSKU IGNORE NULLS) AS productSKU_list, 
ARRAY_AGG(DISTINCT product.v2ProductName IGNORE NULLS) AS productName_list,
...

doesn't let me group by or select productSKU_list and productName_list in the second SELECT statement.
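BigQuery does not allow GROUP BY on ARRAY-typed expressions, which is why the lists can't be grouped on directly. As a rough Python analogy with hypothetical data: a list is unhashable and cannot be a grouping key, and the usual workaround is to convert it to a canonical scalar key first. In BigQuery the analogous step would presumably be a string built with ARRAY_TO_STRING; that is an assumption about what would fit this query, not something tested here.

```python
from collections import defaultdict

# Hypothetical rows of t1 from the second script:
# (month, productSKU_list, transaction_id_cnt)
t1 = [
    ("2021-04-01", ["MM52AF", "KGSA5A"], 1),
    ("2021-04-01", ["KGSA5A", "MM52AF"], 1),
    ("2021-04-01", ["MM52AF"], 2),
]

# A Python list is unhashable, so it cannot serve as a grouping key directly.
try:
    by_list = {skus: cnt for _, skus, cnt in t1}
except TypeError as exc:
    print("cannot group by a list:", exc)

# Convert each list to a canonical, hashable key (a sorted tuple), then sum.
totals = defaultdict(int)
for month, skus, cnt in t1:
    totals[(month, tuple(sorted(skus)))] += cnt

print(dict(totals))
# {('2021-04-01', ('KGSA5A', 'MM52AF')): 2, ('2021-04-01', ('MM52AF',)): 2}
```

Note that grouping by the whole list gives one row per distinct basket, not one row per SKU.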

I believe this is because, when an order contains multiple items in the basket, BigQuery has multiple rows with the same hits.transaction.transactionId. I tried to confirm this with:

select distinct(hits.transaction.transactionId), count(distinct hits.transaction.transactionId) as total
FROM `table` AS nr, 
    UNNEST(hits) hits,
    UNNEST(product) product
WHERE _TABLE_SUFFIX between '200101' AND '210428'  
GROUP BY 1
order by 2 desc

But I get:

transactionId   total   
ABSAD54         1   
515ABDG         1
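A check of this shape cannot reveal duplicates: once rows are grouped by transactionId, each group holds a single transactionId value, so COUNT(DISTINCT transactionId) is 1 for every group by construction. Counting rows with COUNT(*) per transactionId is what exposes the fan-out. A minimal demonstration using Python's built-in sqlite3 module on hypothetical data (SQLite rather than BigQuery, but the GROUP BY semantics are the same for this case):

```python
import sqlite3

# Hypothetical flattened rows, as produced by UNNEST(hits), UNNEST(product):
# transaction T1 had two products in the basket, T2 had one.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE flat (transactionId TEXT, productSKU TEXT)")
con.executemany("INSERT INTO flat VALUES (?, ?)",
                [("T1", "MM52AF"), ("T1", "KGSA5A"), ("T2", "MM52AF")])

# Same shape as the check above: each group contains only one
# transactionId value, so COUNT(DISTINCT ...) can only ever be 1.
distinct_per_group = con.execute("""
    SELECT transactionId, COUNT(DISTINCT transactionId) AS total
    FROM flat GROUP BY transactionId ORDER BY transactionId
""").fetchall()

# COUNT(*) counts the rows in each group, which shows the duplication.
rows_per_group = con.execute("""
    SELECT transactionId, COUNT(*) AS row_count
    FROM flat GROUP BY transactionId ORDER BY row_count DESC, transactionId
""").fetchall()

print(distinct_per_group)  # [('T1', 1), ('T2', 1)]
print(rows_per_group)      # [('T1', 2), ('T2', 1)]
```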

So at this point I'm lost: I'm not sure why I get the correct answer when I use the second script, or when I comment out this part of the first query:

 --product.productSKU,
 --product.v2ProductName,
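The difference between the two scripts is the grain of t1. Summing COUNT(DISTINCT ...) across groups only gives a correct total when every distinct value lands in exactly one group. With the SKU in t1's GROUP BY, a transaction containing n different SKUs lands in n groups and is summed n times; without it, each transaction falls into a single group (assuming a transactionId belongs to one visitor). A small sqlite3 reproduction on hypothetical data, using simplified versions of both query shapes:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE flat (
    month TEXT, fullVisitorId TEXT, productSKU TEXT, transactionId TEXT)""")
# Hypothetical data: visitor V1 bought two SKUs in one transaction T1,
# visitor V2 bought one SKU in T2. The true monthly total is 2 transactions.
con.executemany("INSERT INTO flat VALUES (?, ?, ?, ?)", [
    ("2021-04-01", "V1", "MM52AF", "T1"),
    ("2021-04-01", "V1", "KGSA5A", "T1"),
    ("2021-04-01", "V2", "MM52AF", "T2"),
])

# Like the first script: the inner grain includes the SKU, so T1 is counted
# once in the MM52AF group and again in the KGSA5A group before summing.
with_sku = con.execute("""
    WITH t1 AS (
      SELECT month, fullVisitorId, productSKU,
             COUNT(DISTINCT transactionId) AS transaction_id_cnt
      FROM flat GROUP BY month, fullVisitorId, productSKU)
    SELECT month, SUM(transaction_id_cnt) FROM t1 GROUP BY month
""").fetchall()

# Like the second script (or the first with the SKU columns commented out):
# the SKU is not part of the inner grain, so T1 is counted once.
without_sku = con.execute("""
    WITH t1 AS (
      SELECT month, fullVisitorId,
             COUNT(DISTINCT transactionId) AS transaction_id_cnt
      FROM flat GROUP BY month, fullVisitorId)
    SELECT month, SUM(transaction_id_cnt) FROM t1 GROUP BY month
""").fetchall()

print(with_sku)     # [('2021-04-01', 3)]
print(without_sku)  # [('2021-04-01', 2)]
```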

Any tips on how Google BigQuery works are welcome.

My goal is to get the correct transactions output, which the second script achieves, while still being able to group by and return the values of product.productSKU and product.v2ProductName.

【Answer 1】

In your second query, you need to aggregate them again:

select 
    month,
    sum(views_pdp) as pdp 
    ,sum(add_cart) as add_cart
    ,sum(conversions) as conversions
    ,sum(transaction_id_cnt) as transactions
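    ,ARRAY_CONCAT_AGG(productSKU_list) AS productSKU_list  -- ARRAY_AGG does not accept an ARRAY argument
    ,ARRAY_CONCAT_AGG(productName_list) AS productName_list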
from t1
group by month
order by month desc;
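In BigQuery, ARRAY_AGG does not accept an ARRAY argument (arrays of arrays are not supported); ARRAY_CONCAT_AGG is the aggregate that merges ARRAY values across rows into one flat array. The Python sketch below emulates that merge on hypothetical t1 rows and shows that duplicates coming from different visitors are kept.

```python
# Hypothetical rows of t1: (month, productSKU_list), one row per visitor group.
t1 = [
    ("2021-04-01", ["MM52AF", "KGSA5A"]),
    ("2021-04-01", ["MM52AF"]),
    ("2021-03-01", ["HERD"]),
]


def concat_agg(rows):
    """Emulate ARRAY_CONCAT_AGG(productSKU_list) ... GROUP BY month:
    concatenate the per-row arrays into one flat list per month."""
    out = {}
    for month, skus in rows:
        out.setdefault(month, []).extend(skus)
    return out


merged = concat_agg(t1)
print(merged["2021-04-01"])    # ['MM52AF', 'KGSA5A', 'MM52AF']  (duplicates kept)

# Order-preserving de-duplication, if one list of distinct SKUs is wanted.
distinct = {month: list(dict.fromkeys(skus)) for month, skus in merged.items()}
print(distinct["2021-04-01"])  # ['MM52AF', 'KGSA5A']
```

If a distinct list is needed in BigQuery itself, an outer SELECT over the aggregated result could apply something like ARRAY(SELECT DISTINCT sku FROM UNNEST(productSKU_list) AS sku); treat that as an untested sketch.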

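【Comments】

Is it possible to have it in a structure like the one my first query returns, i.e. with the SKU and name as single items rather than aggregated?
If they are not part of the grouping (which is not what you want), then no.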
