2 tables join: Impressions count before Conversion
[Posted]: 2017-02-08 12:04:27
[Question]: I honestly don't know how to write a query like this. I have 2 tables in Google BigQuery:
First table (impressions):
+-----------+--------+------------+-------+
| Timestamp | UserID | Event_Type | Count |
+-----------+--------+------------+-------+
| 100 | 111 | impression | 2 |
| 105 | 111 | impression | 1 |
| 110 | 111 | impression | 1 |
| 120 | 111 | impression | 2 |
| 100 | 222 | impression | 1 |
| 105 | 222 | impression | 1 |
| 110 | 222 | impression | 1 |
| 120 | 222 | impression | 1 |
+-----------+--------+------------+-------+
Second table (conversions):
+-----------+--------+------------+-------+
| Timestamp | UserID | Event_Type | Count |
+-----------+--------+------------+-------+
| 115 | 111 | conversion | 1 |
| 117 | 222 | conversion | 1 |
+-----------+--------+------------+-------+
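What I want is the number of impressions each user needed before converting, so I have to count all impressions that occurred before the conversion (by Timestamp, which is actually in Unix format).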
+--------+--------------------+
| UserID | Impressions Needed |
+--------+--------------------+
| 111 | 4 |
| 222 | 3 |
+--------+--------------------+
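I can join these tables on UserID and get the total number of impressions and conversions, and I can union them and sort by UserID and Timestamp, but I don't know how to get from there to the final answer, so unfortunately I have nothing to show here. I hope there is a way to do this and that someone here can help me.
The answer is (standard SQL):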
SELECT t2.User_ID, COUNT(t1.User_ID) as ImpressionsNeeded
FROM
(
SELECT MIN(Event_Time) as Event_Time, User_ID, Advertiser_ID, Campaign_ID, count(*) AS Conv_Count
FROM `db.dcm_account111111.activity_111111_*`
WHERE _TABLE_SUFFIX BETWEEN '20170101' AND '20170110' AND Advertiser_ID = '888888' AND Campaign_ID = '888888' AND Event_Sub_Type = 'POSTCLICK'
GROUP BY User_ID, Advertiser_ID, Campaign_ID
) as t2
LEFT JOIN
(
SELECT Event_Time, User_ID, Advertiser_ID, Campaign_ID, count(*) AS Imps_Count
FROM `db.dcm_account111111.impression_111111_*`
WHERE _TABLE_SUFFIX BETWEEN '20170101' AND '20170110' AND Advertiser_ID = '888888' AND Campaign_ID = '888888'
GROUP BY Event_Time, User_ID, Advertiser_ID, Campaign_ID
) as t1
ON t1.User_ID = t2.User_ID AND t1.Advertiser_ID = t2.Advertiser_ID AND t1.Campaign_ID = t2.campaign_ID AND t1.Event_Time < t2.Event_Time
GROUP BY t2.User_ID
ORDER BY ImpressionsNeeded DESC
[Comments]:
[Answer 1]: This sounds like a left join and aggregation:
select t2.userid, count(t1.userid)
from table2 t2 left join
table1 t1
on t1.userid = t2.userid and
t1.event_type = 'impression' and
t1.timestamp < t2.timestamp
group by t2.userid;
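As a quick local check of the join-and-aggregate idea, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database and runs the same kind of LEFT JOIN. SQLite is an assumption made for convenience (the real tables live in BigQuery), and the table names `impressions` and `conversions` and the column name `cnt` are hypothetical stand-ins. On this sample, counting joined rows gives 3 for user 111, while summing the Count column gives the 4 shown in the question's expected output, so the right aggregate depends on whether a row stands for one impression or several.

```python
import sqlite3

# In-memory SQLite stand-in for the two BigQuery tables (assumption for local testing).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE impressions (ts INTEGER, userid INTEGER, event_type TEXT, cnt INTEGER);
CREATE TABLE conversions (ts INTEGER, userid INTEGER, event_type TEXT, cnt INTEGER);
INSERT INTO impressions VALUES
  (100, 111, 'impression', 2), (105, 111, 'impression', 1),
  (110, 111, 'impression', 1), (120, 111, 'impression', 2),
  (100, 222, 'impression', 1), (105, 222, 'impression', 1),
  (110, 222, 'impression', 1), (120, 222, 'impression', 1);
INSERT INTO conversions VALUES
  (115, 111, 'conversion', 1), (117, 222, 'conversion', 1);
""")

join_rows = conn.execute("""
SELECT c.userid,
       COUNT(i.userid)         AS rows_before,        -- number of impression rows
       COALESCE(SUM(i.cnt), 0) AS impressions_before  -- sum of the Count column
FROM conversions AS c
LEFT JOIN impressions AS i
  ON i.userid = c.userid
 AND i.event_type = 'impression'
 AND i.ts < c.ts                                      -- only impressions before the conversion
GROUP BY c.userid
ORDER BY c.userid
""").fetchall()

for row in join_rows:
    print(row)  # (userid, rows_before, impressions_before)
```

The LEFT JOIN keeps users who converted without any earlier impression; for them COUNT returns 0 and the COALESCE turns the NULL sum into 0.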
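[Comments]:
I've added sample code above (it uses your answer, but it doesn't work).
@EdgardGomezSennovskaya . . . Switch to standard SQL.
I have the "Use Legacy SQL" checkbox ticked.
@EdgardGomezSennovskaya . . . Uncheck that box.
It seems to work now. However, I get a user_id 'AMsySZbs1sNtG7tCkB942LhR8B1y' with 3146 impressions before it. When I simply query the COUNT for that user_id from the impressions table (same date range), I get 255.
[Answer 2]: The query below covers the more general case, where you can determine how many impressions led up to each conversion (not just the first one). As a bonus, it uses no explicit JOIN or GROUP BY.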
#standardSQL
WITH all_events AS (
SELECT ts, UserID, Event_Type, cnt FROM Impressions UNION ALL
SELECT ts, UserID, Event_Type, cnt FROM Conversions
)
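SELECT ts as ConversionTS, UserID, cum_sum -
IFNULL(
-- subtract the running total at the previous conversion only; summing all
-- earlier totals would over-subtract from the third conversion onward
SUM(cum_sum) OVER(PARTITION BY UserID, Event_Type ORDER BY ts
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING), 0
) AS Impressions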
FROM (
SELECT ts, UserID, Event_Type,
SUM(IF(Event_Type = 'impression', cnt, 0)) OVER(PARTITION BY UserID
ORDER BY ts) AS cum_sum
FROM all_events
)
WHERE Event_Type = 'conversion'
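The query above can be tested with the sample data below; place these two CTEs ahead of all_events in the same WITH clause: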
WITH Impressions AS (
SELECT 100 AS ts, 111 AS UserID, 'impression' AS Event_Type, 2 AS cnt UNION ALL SELECT 105, 111, 'impression', 1 UNION ALL SELECT 110, 111, 'impression', 1 UNION ALL
SELECT 120, 111, 'impression', 2 UNION ALL SELECT 123, 111, 'impression', 2 UNION ALL SELECT 125, 111, 'impression', 1 UNION ALL SELECT 130, 111, 'impression', 1 UNION ALL
SELECT 140, 111, 'impression', 2 UNION ALL SELECT 100, 222, 'impression', 1 UNION ALL SELECT 105, 222, 'impression', 1 UNION ALL SELECT 110, 222, 'impression', 1 UNION ALL
SELECT 120, 222, 'impression', 1 UNION ALL SELECT 130, 222, 'impression', 1 UNION ALL SELECT 135, 222, 'impression', 1 UNION ALL SELECT 140, 222, 'impression', 1 UNION ALL
SELECT 150, 222, 'impression', 1
),
Conversions AS (
SELECT 115 AS ts, 111 AS UserID, 'conversion' AS Event_Type, 1 AS cnt UNION ALL
SELECT 135, 111, 'conversion', 1 UNION ALL SELECT 117, 222, 'conversion', 1 UNION ALL SELECT 147, 222, 'conversion', 1
)
The expected result is:
ConversionTS UserID Impressions
115 111 4
135 111 6
117 222 3
147 222 4
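The running-total idea can also be checked locally. The sketch below is an assumption-laden stand-in, not the BigQuery query itself: it loads the same sample events into an in-memory SQLite table (window functions need SQLite 3.25 or later), uses CASE in place of IF, and uses LAG to fetch the running total at the previous conversion. The table name `all_events` follows the answer; the lowercase column names and the CTE names `running` and `conv` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE all_events (ts INTEGER, userid INTEGER, event_type TEXT, cnt INTEGER);
INSERT INTO all_events VALUES
  (100, 111, 'impression', 2), (105, 111, 'impression', 1), (110, 111, 'impression', 1),
  (120, 111, 'impression', 2), (123, 111, 'impression', 2), (125, 111, 'impression', 1),
  (130, 111, 'impression', 1), (140, 111, 'impression', 2),
  (100, 222, 'impression', 1), (105, 222, 'impression', 1), (110, 222, 'impression', 1),
  (120, 222, 'impression', 1), (130, 222, 'impression', 1), (135, 222, 'impression', 1),
  (140, 222, 'impression', 1), (150, 222, 'impression', 1),
  (115, 111, 'conversion', 1), (135, 111, 'conversion', 1),
  (117, 222, 'conversion', 1), (147, 222, 'conversion', 1);
""")

window_rows = conn.execute("""
WITH running AS (
  -- cumulative impressions per user up to and including each event
  SELECT ts, userid, event_type,
         SUM(CASE WHEN event_type = 'impression' THEN cnt ELSE 0 END)
           OVER (PARTITION BY userid ORDER BY ts) AS cum_sum
  FROM all_events
),
conv AS (
  SELECT ts, userid, cum_sum FROM running WHERE event_type = 'conversion'
)
SELECT ts AS conversion_ts, userid,
       -- impressions since the previous conversion (0 baseline for the first one)
       cum_sum - LAG(cum_sum, 1, 0) OVER (PARTITION BY userid ORDER BY ts) AS impressions
FROM conv
ORDER BY userid, ts
""").fetchall()

for row in window_rows:
    print(row)  # (conversion_ts, userid, impressions)
```

Subtracting only the previous conversion's running total keeps the per-conversion counts independent of how many conversions a user has.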
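[Comments]:
[Answer 3]: OK, I figured it out. Because the conversions table has multiple rows for the same User_ID, my results were being multiplied. So I have to use MIN when querying the conversions table, and only then LEFT JOIN the impressions table. I have fixed the code above. Thanks, Gordon!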
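The multiplication effect described here is easy to reproduce on a small scale. The following sketch assumes an in-memory SQLite database in place of BigQuery, reuses the question's impression rows, and adds one hypothetical second conversion for user 111 at timestamp 135. The table and column names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE impressions (ts INTEGER, userid INTEGER);
CREATE TABLE conversions (ts INTEGER, userid INTEGER);
INSERT INTO impressions VALUES
  (100, 111), (105, 111), (110, 111), (120, 111),
  (100, 222), (105, 222), (110, 222), (120, 222);
-- user 111 converts twice (the 135 row is a hypothetical addition)
INSERT INTO conversions VALUES (115, 111), (135, 111), (117, 222);
""")

# Naive join: every conversion row pairs with all earlier impressions,
# so a user with several conversions has impressions counted repeatedly.
naive = conn.execute("""
SELECT c.userid, COUNT(i.userid)
FROM conversions AS c
LEFT JOIN impressions AS i ON i.userid = c.userid AND i.ts < c.ts
GROUP BY c.userid
ORDER BY c.userid
""").fetchall()

# Fix: reduce conversions to the first one per user before joining.
first_only = conn.execute("""
SELECT c.userid, COUNT(i.userid)
FROM (SELECT userid, MIN(ts) AS ts FROM conversions GROUP BY userid) AS c
LEFT JOIN impressions AS i ON i.userid = c.userid AND i.ts < c.ts
GROUP BY c.userid
ORDER BY c.userid
""").fetchall()

print(naive)       # user 111 is inflated: 3 rows before ts 115 plus 4 before ts 135
print(first_only)  # one conversion per user, so each impression row is counted once
```

Collapsing to MIN(ts) answers the "impressions before the first conversion" question; counting per conversion needs a different approach, such as a running total.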
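[Comments]:
The way this site works is that you mark the answer that solved your question as accepted, rather than posting your own comment as an answer. I suggest deleting this and ticking @Gordon's answer to mark it as correct.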