Google BigQuery SQL: Calculate users that come from other shops

【Posted】2021-03-29 13:07:40 【Question】:

I need to count, per shop, the unique users whose first visit was at a different shop. I have two tables. Visits:

ShopID  UserID
10      1001
11      1002
12      1001
13      1002
14      1001
15      1003
16      1005
17      1002
18      1003
10      1005
11      1003
12      1002
13      1005

and First visits:

UserID  First ShopID
1001       10
1002       13
1003       18
1005       16

The required output is:

ShopID  Total Users from other shops
10               0
11               2
12               2
13               1
14               1
15               1
16               0
17               1
18               0

I can calculate this for a single ShopID, but not dynamically for every ShopID:

SELECT 
shopid,
COUNT (DISTINCT UserID) AS TOTAL_USERS
FROM project.dataset.table_visits
WHERE shopid=12
AND UserID IN
(
    SELECT UserID
    FROM project.dataset.table_first_visit
    WHERE shopid<>12
)
GROUP BY shopid

How can this be done dynamically for every ShopID?

【Comments】:

【Answer 1】:

Try this:

with visits as (
  select 10 as shopid, 1001 as userid union all
  select 11, 1002 union all
  select 12, 1001 union all
  select 13, 1002 union all
  select 14, 1001 union all
  select 15, 1003 union all
  select 16, 1005 union all
  select 17, 1002 union all
  select 18, 1003 union all
  select 10, 1005 union all
  select 11, 1003 union all
  select 12, 1002 union all
  select 13, 1005)
, first_visit as (
  select 1001 as userid, 10 as first_shopid union all
  select 1002, 13 union all
  select 1003, 18 union all
  select 1005, 16
)
select
  shopid,
  count(distinct if(shopid != first_shopid, userid, null)) as users_from_other_shop
from visits join first_visit using(userid)
group by shopid
order by shopid
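
As a cross-check on the sample data, the same join-and-conditional-count logic can be sketched in plain Python. This is only an illustration and not part of the answer: the names `visits`, `first_visit`, and `users_from_other_shops` are assumptions that mirror the CTEs above. Note that it yields 1 for shop 10, because user 1005 visited shop 10 but first visited shop 16, whereas the expected output in the question lists 0 there.

```python
from collections import defaultdict

# Sample data copied from the question: (shopid, userid) visit pairs.
visits = [
    (10, 1001), (11, 1002), (12, 1001), (13, 1002), (14, 1001),
    (15, 1003), (16, 1005), (17, 1002), (18, 1003), (10, 1005),
    (11, 1003), (12, 1002), (13, 1005),
]
# userid -> shop of the first visit.
first_visit = {1001: 10, 1002: 13, 1003: 18, 1005: 16}


def users_from_other_shops(visits, first_visit):
    """Count distinct users per shop whose first visit was at another shop."""
    users = defaultdict(set)
    for shopid, userid in visits:
        if userid not in first_visit:
            continue  # inner-join semantics: drop visits without a first-visit row
        users[shopid]  # touch the key so shops with a zero count still appear
        if first_visit[userid] != shopid:
            users[shopid].add(userid)
    return {shopid: len(u) for shopid, u in sorted(users.items())}


result = users_from_other_shops(visits, first_visit)
for shopid, n in result.items():
    print(shopid, n)
```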

【Comments】:

【Answer 2】:

Hmm... I think you want a left join and aggregation:

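select v.shopid,
       count(*) as total_visits,
       count(distinct v.userId) as total_users,
       -- fv.userId is null when the visited shop is not the user's first shop
       count(distinct case when fv.userId is null then v.userId end) as total_users_from_other_shops
from `project.dataset.table_visits` v left join
     `project.dataset.table_first_visit` fv
     on fv.userId = v.userId and
        fv.first_shopid = v.shopid  -- match on the shop too, otherwise the count is always 0
group by v.shopid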
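
The left-join idea can be sketched in plain Python to show why the join has to match on both the user and the shop. This is a hedged illustration, not the answer's code: `left_join_counts` and its `match_shop` flag are hypothetical names. With a user-only match, every visit row finds a first-visit row, so the "is null" branch never fires and every shop gets 0.

```python
from collections import defaultdict

# Sample data copied from the question: (shopid, userid) visit pairs.
visits = [
    (10, 1001), (11, 1002), (12, 1001), (13, 1002), (14, 1001),
    (15, 1003), (16, 1005), (17, 1002), (18, 1003), (10, 1005),
    (11, 1003), (12, 1002), (13, 1005),
]
# (userid, first_shopid) rows of the first-visit table.
first_visit_rows = {(1001, 10), (1002, 13), (1003, 18), (1005, 16)}


def left_join_counts(visits, first_visit_rows, match_shop=True):
    """Count users per shop whose visit row has no first-visit match."""
    first_users = {userid for userid, _ in first_visit_rows}
    unmatched = defaultdict(set)
    for shopid, userid in visits:
        unmatched[shopid]  # keep shops with a zero count
        if match_shop:
            matched = (userid, shopid) in first_visit_rows
        else:
            matched = userid in first_users
        if not matched:  # plays the role of "fv.userId is null"
            unmatched[shopid].add(userid)
    return {shopid: len(u) for shopid, u in sorted(unmatched.items())}


with_shop = left_join_counts(visits, first_visit_rows, match_shop=True)
user_only = left_join_counts(visits, first_visit_rows, match_shop=False)
print(with_shop)
print(user_only)
```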

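【Comments】:

【Answer 3】:

Consider the join-less solution below (I would expect it to be more efficient in terms of execution duration, and an order of magnitude more efficient in terms of slot consumption):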

select shopid, sum(flag) users_from_other_shop
from (
  select distinct shopid, userid, 1 flag
  from `project.dataset.table_visits` 
  union all 
  select distinct first_shopid, userid, -1 
  from `project.dataset.table_first_visit` 
)
group by shopid   

If applied to the sample data in your question, the output is: [output image not preserved]
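
For the sample data, the union-and-sum logic can be reproduced in plain Python. This is a sketch under assumptions rather than the answer's own output: `flag_sum` is a hypothetical helper name. It also makes one precondition visible: each first-visit (shop, user) pair must appear in the visits table too, otherwise the -1 is subtracted without a matching +1.

```python
from collections import Counter

# Sample data copied from the question: (shopid, userid) visit pairs.
visits = [
    (10, 1001), (11, 1002), (12, 1001), (13, 1002), (14, 1001),
    (15, 1003), (16, 1005), (17, 1002), (18, 1003), (10, 1005),
    (11, 1003), (12, 1002), (13, 1005),
]
# (userid, first_shopid) rows of the first-visit table.
first_visit = [(1001, 10), (1002, 13), (1003, 18), (1005, 16)]


def flag_sum(visits, first_visit):
    """Sum +1 per distinct visit pair and -1 per first-visit pair, by shop."""
    totals = Counter()
    for shopid, userid in set(visits):  # select distinct shopid, userid, 1
        totals[shopid] += 1
    for userid, first_shopid in set(first_visit):  # select distinct first_shopid, userid, -1
        totals[first_shopid] -= 1
    return dict(sorted(totals.items()))


totals = flag_sum(visits, first_visit)
for shopid, n in totals.items():
    print(shopid, n)
```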

【Comments】:
