2 tables join: Impressions count before Conversion


【Posted】: 2017-02-08 12:04:27 【Question】:

I really don't know how to write a query like this. I have 2 tables in Google BigQuery:

First table (impressions):

+-----------+--------+------------+-------+
| Timestamp | UserID | Event_Type | Count |
+-----------+--------+------------+-------+
|       100 |    111 | impression |     2 |
|       105 |    111 | impression |     1 |
|       110 |    111 | impression |     1 |
|       120 |    111 | impression |     2 |
|       100 |    222 | impression |     1 |
|       105 |    222 | impression |     1 |
|       110 |    222 | impression |     1 |
|       120 |    222 | impression |     1 |
+-----------+--------+------------+-------+ 

Second table (conversions):

+-----------+--------+------------+-------+
| Timestamp | UserID | Event_Type | Count |
+-----------+--------+------------+-------+
|       115 |    111 | conversion |     1 |
|       117 |    222 | conversion |     1 |
+-----------+--------+------------+-------+ 

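What I want to get is the number of impressions each user needed before converting, so I have to count all impressions that happened before the conversion (by timestamp, which is actually in Unix format).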

+--------+--------------------+
| UserID | Impressions Needed |
+--------+--------------------+
|    111 |                  4 |
|    222 |                  3 |
+--------+--------------------+

I can join these tables by UserID and get the total numbers of impressions and conversions, and I can union them and sort by UserID and timestamp, but I have no idea how to get the final answer, so unfortunately I have nothing to show here. I hope there is a way to do this and that someone here can help.

The answer is (standard SQL):

SELECT
  t2.User_ID,
  COUNT(t1.User_ID) AS ImpressionsNeeded
FROM (
  SELECT
    MIN(Event_Time) AS Event_Time,
    User_ID,
    Advertiser_ID,
    Campaign_ID,
    COUNT(*) AS Conv_Count
  FROM `db.dcm_account111111.activity_111111_*`
  WHERE _TABLE_SUFFIX BETWEEN '20170101' AND '20170110'
    AND Advertiser_ID = '888888'
    AND Campaign_ID = '888888'
    AND Event_Sub_Type = 'POSTCLICK'
  GROUP BY User_ID, Advertiser_ID, Campaign_ID
) AS t2
LEFT JOIN (
  SELECT
    Event_Time,
    User_ID,
    Advertiser_ID,
    Campaign_ID,
    COUNT(*) AS Imps_Count
  FROM `db.dcm_account111111.impression_111111_*`
  WHERE _TABLE_SUFFIX BETWEEN '20170101' AND '20170110'
    AND Advertiser_ID = '888888'
    AND Campaign_ID = '888888'
  GROUP BY Event_Time, User_ID, Advertiser_ID, Campaign_ID
) AS t1
  ON t1.User_ID = t2.User_ID
  AND t1.Advertiser_ID = t2.Advertiser_ID
  AND t1.Campaign_ID = t2.campaign_ID
  AND t1.Event_Time < t2.Event_Time
GROUP BY t2.User_ID
ORDER BY ImpressionsNeeded DESC

【Comments】:

【Answer 1】:

This sounds like a left join and aggregation:

select t2.userid, count(t1.userid)
from table2 t2 left join
     table1 t1
     on t1.userid = t2.userid and
        t1.event_type = 'impression' and
        t1.timestamp < t2.timestamp
group by t2.userid;
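
To see what this join returns on the sample rows from the question, here is a minimal sketch that runs the same query shape in an in-memory SQLite database from Python. SQLite stands in for BigQuery purely for illustration, and the column names `ts` and `cnt` and the function name `impressions_before_conversion` are assumptions, not part of the original answer. The sketch also returns `SUM(t1.cnt)` next to `COUNT(t1.userid)`, because the question's Count column can hold more than one impression per row.

```python
import sqlite3

# Sample rows copied from the question: (ts, userid, event_type, cnt).
IMPRESSIONS = [
    (100, 111, "impression", 2), (105, 111, "impression", 1),
    (110, 111, "impression", 1), (120, 111, "impression", 2),
    (100, 222, "impression", 1), (105, 222, "impression", 1),
    (110, 222, "impression", 1), (120, 222, "impression", 1),
]
CONVERSIONS = [(115, 111, "conversion", 1), (117, 222, "conversion", 1)]


def impressions_before_conversion():
    """Return (userid, matching impression rows, summed impression counts)."""
    conn = sqlite3.connect(":memory:")
    # table1 = impressions, table2 = conversions, as in the answer above.
    for name in ("table1", "table2"):
        conn.execute(
            f"CREATE TABLE {name} "
            "(ts INTEGER, userid INTEGER, event_type TEXT, cnt INTEGER)"
        )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", IMPRESSIONS)
    conn.executemany("INSERT INTO table2 VALUES (?, ?, ?, ?)", CONVERSIONS)
    rows = conn.execute(
        """
        SELECT t2.userid,
               COUNT(t1.userid)         AS impression_rows,
               COALESCE(SUM(t1.cnt), 0) AS impressions_needed
        FROM table2 t2
        LEFT JOIN table1 t1
          ON t1.userid = t2.userid
         AND t1.event_type = 'impression'
         AND t1.ts < t2.ts
        GROUP BY t2.userid
        ORDER BY t2.userid
        """
    ).fetchall()
    conn.close()
    return rows


print(impressions_before_conversion())
```

On this data the row count for user 111 is 3 while the summed count is 4, which is the figure the question expects. If each row in the real table represents exactly one impression, the two columns agree and `COUNT` is enough.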

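【Comments】:

I've added sample code above (I used your answer, but it didn't work).
@EdgardGomezSennovskaya . . . Switch to standard SQL.
I have the "Use Legacy SQL" checkbox checked.
@EdgardGomezSennovskaya . . . Uncheck that box.
It seems to work now. However, I get a user_id 'AMsySZbs1sNtG7tCkB942LhR8B1y' with 3146 impressions before the conversion. When I simply query the COUNT for that user_id from the impression table (same date range), I get 255.

【Answer 2】:

The query below covers the more general case, where you can determine how many impressions led to each conversion (not just the first one). As an added bonus, it uses no explicit JOIN or GROUP BY.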

#standardSQL
WITH all_events AS (
  SELECT ts, UserID, Event_Type, cnt FROM Impressions UNION ALL
  SELECT ts, UserID, Event_Type, cnt FROM Conversions
)
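SELECT ts as ConversionTS, UserID, cum_sum - 
  IFNULL(
    -- Subtract only the running total at the previous conversion; summing all
    -- earlier running totals would over-subtract from the third conversion on.
    SUM(cum_sum) OVER(PARTITION BY UserID, Event_Type ORDER BY ts 
    ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING), 0
  ) AS Impressions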
FROM (
  SELECT ts, UserID, Event_Type, 
    SUM(IF(Event_Type = 'impression', cnt, 0)) OVER(PARTITION BY UserID 
      ORDER BY ts) AS cum_sum
  FROM all_events
)
WHERE Event_Type = 'conversion'

The query above can be tested with the data below (as an example):

WITH Impressions AS (
  SELECT 100 AS ts, 111 AS UserID, 'impression' AS Event_Type, 2 AS cnt UNION ALL SELECT 105, 111, 'impression', 1 UNION ALL SELECT 110, 111, 'impression', 1 UNION ALL
  SELECT 120, 111, 'impression', 2 UNION ALL SELECT 123, 111, 'impression', 2 UNION ALL SELECT 125, 111, 'impression', 1 UNION ALL SELECT 130, 111, 'impression', 1 UNION ALL
  SELECT 140, 111, 'impression', 2 UNION ALL SELECT 100, 222, 'impression', 1 UNION ALL SELECT 105, 222, 'impression', 1 UNION ALL SELECT 110, 222, 'impression', 1 UNION ALL
  SELECT 120, 222, 'impression', 1 UNION ALL SELECT 130, 222, 'impression', 1 UNION ALL SELECT 135, 222, 'impression', 1 UNION ALL SELECT 140, 222, 'impression', 1 UNION ALL
  SELECT 150, 222, 'impression', 1 
),
Conversions AS (
  SELECT 115 AS ts, 111 AS UserID, 'conversion' AS Event_Type, 1 AS cnt UNION ALL
  SELECT 135, 111, 'conversion', 1 UNION ALL SELECT 117, 222, 'conversion', 1 UNION ALL SELECT 147, 222, 'conversion', 1 
)

The expected result is as follows:

ConversionTS    UserID  Impressions  
115             111     4    
135             111     6    
117             222     3    
147             222     4    
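
The idea behind this query can also be traced by hand: keep a running total of impressions per user, and at each conversion report the difference from the running total at that user's previous conversion. The following plain-Python sketch walks through that logic on the same test data. The function name `impressions_per_conversion` and the two dictionaries are illustrative assumptions, and the sketch assumes no impression shares a timestamp with a conversion for the same user.

```python
from collections import defaultdict

# Test data from the answer above: (ts, user_id, event_type, cnt).
IMPRESSIONS = [
    (100, 111, "impression", 2), (105, 111, "impression", 1),
    (110, 111, "impression", 1), (120, 111, "impression", 2),
    (123, 111, "impression", 2), (125, 111, "impression", 1),
    (130, 111, "impression", 1), (140, 111, "impression", 2),
    (100, 222, "impression", 1), (105, 222, "impression", 1),
    (110, 222, "impression", 1), (120, 222, "impression", 1),
    (130, 222, "impression", 1), (135, 222, "impression", 1),
    (140, 222, "impression", 1), (150, 222, "impression", 1),
]
CONVERSIONS = [
    (115, 111, "conversion", 1), (135, 111, "conversion", 1),
    (117, 222, "conversion", 1), (147, 222, "conversion", 1),
]


def impressions_per_conversion(events):
    """Return (conversion_ts, user_id, impressions since previous conversion)."""
    running = defaultdict(int)             # impressions seen so far, per user
    at_last_conversion = defaultdict(int)  # running total at the last conversion
    result = []
    # Sorting the tuples orders the merged events by timestamp first.
    for ts, user, event_type, cnt in sorted(events):
        if event_type == "impression":
            running[user] += cnt
        elif event_type == "conversion":
            result.append((ts, user, running[user] - at_last_conversion[user]))
            at_last_conversion[user] = running[user]
    return result


for row in impressions_per_conversion(IMPRESSIONS + CONVERSIONS):
    print(row)
```

The output is ordered by timestamp rather than by user, but it contains the same four rows as the expected result listed above.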

【Comments】:

【Answer 3】:

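OK, I figured it out. Because the conversions table has multiple rows for the same User_ID, my results were multiplied. So I have to use MIN when querying the conversions table, and only then LEFT JOIN the impressions table. I've fixed the code above. Thanks, Gordon!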
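
The multiplication described here is easy to reproduce on a tiny data set. The sketch below is a hedged illustration using an in-memory SQLite database from Python, not the asker's BigQuery tables. The table names, the user `u1`, the timestamps, and the function name `count_impressions` are all made up for the example. One user has four impressions and three conversions; joining against every conversion row counts the earlier impressions once per conversion, while reducing the conversions to `MIN(ts)` per user first counts them once.

```python
import sqlite3


def count_impressions(dedupe_conversions):
    """Count impressions before conversion for each user.

    dedupe_conversions=False joins against every conversion row;
    dedupe_conversions=True keeps only the first conversion per user.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE impressions (ts INTEGER, user_id TEXT)")
    conn.execute("CREATE TABLE conversions (ts INTEGER, user_id TEXT)")
    conn.executemany(
        "INSERT INTO impressions VALUES (?, ?)",
        [(100, "u1"), (105, "u1"), (110, "u1"), (120, "u1")],
    )
    conn.executemany(
        "INSERT INTO conversions VALUES (?, ?)",
        [(115, "u1"), (130, "u1"), (140, "u1")],
    )
    if dedupe_conversions:
        # One row per user: the earliest conversion only.
        conv_source = (
            "(SELECT MIN(ts) AS ts, user_id FROM conversions GROUP BY user_id)"
        )
    else:
        conv_source = "conversions"
    rows = conn.execute(
        f"""
        SELECT c.user_id, COUNT(i.user_id) AS impressions_needed
        FROM {conv_source} AS c
        LEFT JOIN impressions AS i
          ON i.user_id = c.user_id AND i.ts < c.ts
        GROUP BY c.user_id
        """
    ).fetchall()
    conn.close()
    return rows


print(count_impressions(dedupe_conversions=False))  # inflated by the fan-out
print(count_impressions(dedupe_conversions=True))   # first conversion only
```

Without deduplication the three conversions match 3, 4, and 4 earlier impressions, so the user is reported with 11, even though only 4 impression rows exist. With `MIN(ts)` the result is 3, the number of impressions before the first conversion.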

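【Comments】:

The way this site works is that you mark the answer that solved your problem as accepted, rather than adding your own comment as an answer. I suggest deleting this and ticking @Gordon's answer to mark it as correct.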
