Optimizing a Dynamic 7-Day Cohort Firebase BigQuery


Posted: 2018-08-23 16:09:49

Question:

I wrote the following query against our mobile app data. Our user base is large, so when I add an ORDER BY at the end, the request fails with a 400 error: "Resources exceeded during query execution: The query could not be executed in the allotted memory".

Question: what can I do to optimize the query while still keeping the ORDER BY at the end?

I have rewritten the query against Firebase's demo dataset, but I suspect that dataset is too small to reproduce the problem (mine has 5-10 million records).

SELECT 
  f.user_pseudo_id,
  f.event_timestamp, 
  DATE(TIMESTAMP_MICROS(f.event_timestamp)) as event_timestamp_date,
  f.event_name,
  f.user_first_touch_timestamp,
  DATE(TIMESTAMP_MICROS(f.user_first_touch_timestamp)) as user_first_touch_date,
  CASE WHEN r.has_appRemove >= 1 THEN "removed" ELSE "not-removed" END AS status_after_first7days
FROM `firebase-analytics-sample-data.ios_dataset.app_events_*` f
LEFT JOIN (
    SELECT user_pseudo_id, 1 has_appRemove
    FROM `firebase-analytics-sample-data.ios_dataset.app_events_*`
    WHERE DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) >= DATE_SUB(CURRENT_DATE(), INTERVAL 10 DAY)
      AND DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) < DATE_SUB(CURRENT_DATE(), INTERVAL 9 DAY)
      AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 10 DAY))
      AND _TABLE_SUFFIX < FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY))
      AND platform = "ANDROID"  -- same case as the outer filter; string comparison is case-sensitive
      AND event_name = "app_remove"
    GROUP BY user_pseudo_id
    ) r on f.user_pseudo_id = r.user_pseudo_id
WHERE
  DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) >= DATE_SUB(CURRENT_DATE(), INTERVAL 10 DAY)
  AND DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) < DATE_SUB(CURRENT_DATE(), INTERVAL 9 DAY)
  AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 10 DAY))
  AND _TABLE_SUFFIX < FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY))
  AND platform = "ANDROID" 
ORDER BY 1,2 ASC
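To make the date filters concrete, here is a small Python sketch (not from the original question) that reproduces the `DATE_SUB`/`FORMAT_DATE` arithmetic for a fixed `CURRENT_DATE()`. It assumes one export table per `YYYYMMDD` suffix, and the helper name `cohort_window` is hypothetical. It shows that the cohort is the users whose first touch falls on the single day 10 days ago, and that the `_TABLE_SUFFIX` range covers 7 daily tables, i.e. that cohort's first 7 days.

```python
from datetime import date, timedelta


def cohort_window(today: date) -> dict:
    """Reproduce the query's date filters for a given CURRENT_DATE().

    Hypothetical helper; assumes one export table per YYYYMMDD suffix.
    """
    cohort_start = today - timedelta(days=10)  # DATE_SUB(CURRENT_DATE(), INTERVAL 10 DAY), inclusive
    cohort_end = today - timedelta(days=9)     # DATE_SUB(CURRENT_DATE(), INTERVAL 9 DAY), exclusive
    suffix_end = today - timedelta(days=3)     # DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY), exclusive

    # First-touch dates satisfying >= cohort_start AND < cohort_end.
    first_touch_dates = [cohort_start + timedelta(days=i)
                         for i in range((cohort_end - cohort_start).days)]
    # FORMAT_DATE('%Y%m%d', ...) suffixes satisfying the _TABLE_SUFFIX range.
    table_suffixes = [(cohort_start + timedelta(days=i)).strftime("%Y%m%d")
                      for i in range((suffix_end - cohort_start).days)]
    return {"first_touch_dates": first_touch_dates, "table_suffixes": table_suffixes}


window = cohort_window(date(2018, 8, 23))
print(window["first_touch_dates"])  # [datetime.date(2018, 8, 13)]
print(window["table_suffixes"])     # ['20180813', '20180814', '20180815', '20180816', '20180817', '20180818', '20180819']
```

Because `_TABLE_SUFFIX` is compared as a string, it is the zero-padded `YYYYMMDD` format that makes the range comparison chronological.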

Comments on the question:

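How many rows does this produce? Could you use a LIMIT? cloud.google.com/bigquery/docs/…

Hi Elliot, unfortunately I can't apply a LIMIT, because I use this query to write data into a long-term table (daily, via a BigQuery scheduled query). I used Mikhail's partitioning answer and it did the job :)

Great! I'm glad Mikhail's answer worked for you. Note that if you are writing the results to another table, the ORDER BY is pointless: tables don't preserve row order; only query results do.

Answer 1: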

Instead of the join, you can use a window (analytic) function, as in the example below (not tested):

#standardSQL
SELECT 
  user_pseudo_id,
  event_timestamp, 
  DATE(TIMESTAMP_MICROS(event_timestamp)) AS event_timestamp_date,
  event_name,
  user_first_touch_timestamp,
  DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) AS user_first_touch_date,
  COUNTIF(event_name = "app_remove") OVER(PARTITION BY user_pseudo_id) > 0 isRemoved
FROM `firebase-analytics-sample-data.ios_dataset.app_events_*` 
WHERE
  DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) >= DATE_SUB(CURRENT_DATE(), INTERVAL 10 DAY)
  AND DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) < DATE_SUB(CURRENT_DATE(), INTERVAL 9 DAY)
  AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 10 DAY))
  AND _TABLE_SUFFIX < FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY))
  AND platform = "ANDROID" 
ORDER BY 1,2 ASC
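As a sanity check that the window-function rewrite returns the same rows as the original LEFT JOIN, the sketch below runs both query shapes against a tiny, hypothetical events table in an in-memory SQLite database through Python's `sqlite3`. This is a stand-in, not BigQuery. SQLite has no `COUNTIF`, so `SUM(event_name = 'app_remove')` plays that role. Window functions need SQLite 3.25 or later. The date and platform filters are omitted, and the toy data says nothing about BigQuery memory limits.

```python
import sqlite3

# Hypothetical toy events: (user_pseudo_id, event_timestamp, event_name).
EVENTS = [
    ("u1", 1, "first_open"),
    ("u1", 2, "screen_view"),
    ("u1", 3, "app_remove"),
    ("u2", 1, "first_open"),
    ("u2", 2, "screen_view"),
]

# Shape of the question's query: a second pass over the table, joined back per user.
JOIN_SQL = """
SELECT f.user_pseudo_id, f.event_timestamp, f.event_name,
       r.has_appRemove IS NOT NULL AS is_removed
FROM events f
LEFT JOIN (
    SELECT user_pseudo_id, 1 AS has_appRemove
    FROM events
    WHERE event_name = 'app_remove'
    GROUP BY user_pseudo_id
) r ON f.user_pseudo_id = r.user_pseudo_id
ORDER BY 1, 2
"""

# Shape of the answer's query: one pass, per-user count attached by a window function.
WINDOW_SQL = """
SELECT user_pseudo_id, event_timestamp, event_name,
       SUM(event_name = 'app_remove') OVER (PARTITION BY user_pseudo_id) > 0 AS is_removed
FROM events
ORDER BY 1, 2
"""


def run(sql: str) -> list:
    """Load the toy events into a fresh in-memory database and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_pseudo_id TEXT, event_timestamp INTEGER, event_name TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", EVENTS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


join_rows = run(JOIN_SQL)
window_rows = run(WINDOW_SQL)
print(join_rows == window_rows)  # True
```

The design difference is that the window version reads the events once and attaches the per-user flag to every row, whereas the join version references the table a second time in the subquery.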
