Filter by MIN(date) over partition | Data Studio
Posted: 2020-06-23 16:31:10

Question:
I'm trying to connect date parameters in BigQuery to Data Studio, so I added some date variables to my query. However, I'm running into problems filtering on this date.
Here is my query:
SELECT first_item,
  COUNT(*) AS first_purchases,
  SUM(purchases_within_90_days) AS purchased_within_90_days,
  SUM(purchases_within_180_days) AS purchased_within_180_days,
  SUM(purchases_within_270_days) AS purchased_within_270_days,
  SUM(revenue90days) as total_revenue_90,
  SUM(revenue180days) as total_revenue_180,
  SUM(revenue270days) as total_revenue_270
FROM (
  SELECT email, first_item, processed_at,
    SUM(purch_90_days) OVER(PARTITION BY email) AS purchases_within_90_days,
    SUM(rev_90) OVER(PARTITION BY email) AS revenue90days,
    SUM(purch_180days) OVER(PARTITION BY email) AS purchases_within_180_days,
    SUM(rev_180) OVER(PARTITION BY email) AS revenue180days,
    SUM(purch_270days) OVER(PARTITION BY email) AS purchases_within_270_days,
    SUM(rev_270) OVER(PARTITION BY email) AS revenue270days
  FROM (
    SELECT email, first_item, processed_at,
      SUM(purchases_within_90_days) as purch_90_days,
      SUM(purchases_within_180_days) as purch_180days,
      SUM(purchases_within_270_days) as purch_270days,
      SUM(revenue_within_90_days) as rev_90,
      SUM(revenue_within_180_days) as rev_180,
      SUM(revenue_within_270_days) as rev_270
    FROM (
      SELECT email, processed_at, first_item,
        MAX(CASE WHEN hours_since_first_purchase < 90 * 24 AND hours_since_first_purchase > 0 THEN 1 ELSE 0 END) AS purchases_within_90_days,
        MAX(CASE WHEN hours_since_first_purchase < 180 * 24 AND hours_since_first_purchase > 0 THEN 1 ELSE 0 END) AS purchases_within_180_days,
        MAX(CASE WHEN hours_since_first_purchase < 270 * 24 AND hours_since_first_purchase > 0 THEN 1 ELSE 0 END) AS purchases_within_270_days,
        SUM(CASE WHEN hours_since_first_purchase < 90 * 24 AND hours_since_first_purchase > 0 THEN price ELSE 0 END) AS revenue_within_90_days,
        SUM(CASE WHEN hours_since_first_purchase < 180 * 24 AND hours_since_first_purchase > 0 THEN price ELSE 0 END) AS revenue_within_180_days,
        SUM(CASE WHEN hours_since_first_purchase < 270 * 24 AND hours_since_first_purchase > 0 THEN price ELSE 0 END) AS revenue_within_270_days
      FROM (
        SELECT order_number, email, processed_at, sku, price, hours_since_first_purchase, first_date,
          CASE
            WHEN hours_since_first_purchase = 0 OR hours_since_first_purchase is null then sku
            else null
          end as first_item
        FROM (
          SELECT order_number, customer.id, email,
            MIN(processed_at) over(partition by email) as first_date,
            processed_at, title, price, sku,
            CASE
              WHEN ROW_NUMBER() OVER(PARTITION BY customer.id ORDER BY processed_at) = 1 THEN null
              ELSE TIMESTAMP_DIFF(processed_at, FIRST_VALUE(processed_at) OVER(PARTITION BY customer.id ORDER BY processed_at), HOUR)
            END AS hours_since_first_purchase
          FROM (
            SELECT * EXCEPT(instance, line_items) FROM (
              SELECT *, ROW_NUMBER() OVER(PARTITION BY id) AS instance
              FROM `table.orders`
            ), UNNEST(line_items) as item
            -- identify duplicate rows
            WHERE instance = 1
          )
          order by email desc
        )
        where first_date > PARSE_DATE('%Y%m%d', @DS_START_DATE) and first_date < PARSE_DATE('%Y%m%d', @DS_END_DATE)
        --where first_date <= '2019-09-28'--and first_date > '2020-06-07'
      )
      group by first_item, email, processed_at
    )
    where email <> ""
    group by email, first_item, processed_at
    order by processed_at asc
  )
  order by processed_at asc
)
where first_item is not null and first_item <> "" and first_item <> "unknown" and first_item not like '%variant%' and first_item not like '%product%'
group by first_item
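To make the nested subqueries easier to follow, here is a rough Python model of their core logic. It is a sketch under simplifying assumptions, not a translation of the query: the order rows and the `first_purchase_stats` function are hypothetical, rows are keyed by email only (the query uses customer.id for hours_since_first_purchase and email for first_date), and only one first item is kept per customer, whereas the query treats every line item whose hours_since_first_purchase is 0 or NULL as a first item.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical order rows: (email, processed_at, sku, price).
ORDERS = [
    ("a@example.com", datetime(2020, 1, 1, 9), "SKU-1", 20.0),
    ("a@example.com", datetime(2020, 2, 15, 9), "SKU-2", 35.0),
    ("b@example.com", datetime(2020, 3, 1, 12), "SKU-3", 50.0),
    ("b@example.com", datetime(2020, 9, 30, 12), "SKU-1", 10.0),
]


def first_purchase_stats(orders, window_days):
    """Per email: (first_item, repeat_flag, repeat_revenue) within window_days.

    Simplified model: the first purchase is the earliest row per email, and a
    later row counts when 0 < hours since the first purchase < window_days * 24.
    """
    by_email = defaultdict(list)
    for email, processed_at, sku, price in orders:
        by_email[email].append((processed_at, sku, price))

    stats = {}
    for email, rows in by_email.items():
        rows.sort()  # like ORDER BY processed_at
        first_ts, first_item, _ = rows[0]  # like MIN(processed_at) per email
        repeat_flag, repeat_revenue = 0, 0.0
        for processed_at, _, price in rows[1:]:
            # like TIMESTAMP_DIFF(processed_at, first purchase, HOUR); diffs are >= 0 here
            hours = (processed_at - first_ts) // timedelta(hours=1)
            if 0 < hours < window_days * 24:
                repeat_flag = 1  # like MAX(CASE ... THEN 1 ELSE 0 END)
                repeat_revenue += price  # like SUM(CASE ... THEN price ELSE 0 END)
        stats[email] = (first_item, repeat_flag, repeat_revenue)
    return stats


print(first_purchase_stats(ORDERS, 90))
# {'a@example.com': ('SKU-1', 1, 35.0), 'b@example.com': ('SKU-3', 0, 0.0)}
```

With a 270-day window, the second customer's repeat purchase (213 days after the first) would also be counted.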
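When I try to filter on the first_date variable, the query fails in Data Studio. Is there anything I can do to filter on this new variable I added?
I get the error "Query returned an error".
This is the line that causes the error: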
where first_date > PARSE_DATE('%Y%m%d', @DS_START_DATE) and first_date < PARSE_DATE('%Y%m%d', @DS_END_DATE)
When I replace that line with the following, my query runs perfectly:
where first_date <= '2019-09-28'--and first_date > '2020-06-07'
Update:
This is very close to working. It works when I apply one filter, but when I apply a second filter it throws the same error.
It works when I add this line:
where cast(first_date as date) <= PARSE_DATE('%Y%m%d', @DS_END_DATE)
But it throws the error again when I use this:
where cast(first_date as date) <= PARSE_DATE('%Y%m%d', @DS_END_DATE) and cast(first_date as date) >= PARSE_DATE('%Y%m%d', @DS_START_DATE)
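Comments:
You cannot reference a column alias in any other clause at the same level as the SELECT that defines it. Use a subquery or a CTE.
Are you running this query in Data Studio?
Yes, sorry. I've updated the text above to reflect this. I'm running it in Data Studio and it gives me an error. I can filter the first_date variable with a hardcoded date, but not with what I currently have.
Could you share the error you're getting? I can't tell whether your problem is related to the parameters in the query or to something else. Also, if possible, share your query as text so that others can reproduce your problem more easily.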
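Answer 1:
Your first_date field is probably not a DATE but a TIMESTAMP. PARSE_DATE returns a DATE, while first_date takes its type from processed_at through MIN(); the TIMESTAMP_DIFF call on processed_at suggests that column is a TIMESTAMP, so the two sides of the comparison have different types.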
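The mismatch has a close analogue in Python, shown here only as an illustration (this is Python behavior, not BigQuery's, and the values are hypothetical): ordering a `datetime` against a `date` raises `TypeError`, while converting the `datetime` to a `date` first, much like `CAST(first_date AS DATE)`, makes the comparison valid.

```python
from datetime import date, datetime

# Hypothetical values: FIRST_DATE plays the TIMESTAMP, START plays the DATE.
FIRST_DATE = datetime(2020, 6, 1, 14, 30)
START = datetime.strptime("20200601", "%Y%m%d").date()  # like PARSE_DATE('%Y%m%d', ...)


def raw_comparison_fails(ts: datetime, d: date) -> bool:
    """True if ordering a datetime against a date raises TypeError."""
    try:
        ts >= d  # Python does not implicitly convert between the two types
    except TypeError:
        return True
    return False


def compare_as_dates(ts: datetime, d: date) -> bool:
    """Convert first, like CAST(first_date AS DATE) >= PARSE_DATE(...)."""
    return ts.date() >= d


print(raw_comparison_fails(FIRST_DATE, START))  # True
print(compare_as_dates(FIRST_DATE, START))      # True
```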
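To demonstrate the problem, I'll use a public table (bigquery-public-data.covid19_italy.data_by_region).
As shown in the image below, this table has a TIMESTAMP field named date.
To reproduce your problem, I'll try to access this table through Data Studio.
In Data Studio, if I try your approach, I get an error, as shown below:
1 - Query
2 - Error
However, if I change the query to the one below, it works fine, as you can see in the images.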
SELECT * FROM `bigquery-public-data.covid19_italy.data_by_region` WHERE cast(date as date) < PARSE_DATE('%Y%m%d',@DS_START_DATE)
1 - Updated query
2 - Working dashboard
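A minimal Python sketch of what the working query does, assuming the Data Studio parameter arrives as a YYYYMMDD string (which is what the '%Y%m%d' format passed to PARSE_DATE implies). The `parse_ds_date` and `rows_before` helpers and the sample rows are made up for illustration.

```python
from datetime import date, datetime


def parse_ds_date(value: str) -> date:
    """Mirror PARSE_DATE('%Y%m%d', @DS_START_DATE) for a YYYYMMDD string."""
    return datetime.strptime(value, "%Y%m%d").date()


def rows_before(rows, ds_start_date: str):
    """Mirror WHERE CAST(date AS DATE) < PARSE_DATE('%Y%m%d', @DS_START_DATE)."""
    cutoff = parse_ds_date(ds_start_date)
    return [row for row in rows if row["date"].date() < cutoff]


# Made-up rows with a timestamp-like "date" field.
ROWS = [
    {"id": 1, "date": datetime(2020, 3, 1, 17, 0)},
    {"id": 2, "date": datetime(2020, 3, 2, 17, 0)},
    {"id": 3, "date": datetime(2020, 3, 3, 17, 0)},
]

print([row["id"] for row in rows_before(ROWS, "20200302")])  # [1]
```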
Comments:
Added the updated text above. This works almost perfectly.
I used BETWEEN instead of AND, and that seems to work. Thanks!
You're welcome :) If this post answered your question, please consider awarding the bounty.
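For reference, `x BETWEEN a AND b` in standard SQL is inclusive on both ends, equivalent to `x >= a AND x <= b`. The Python sketch below mirrors an inclusive range filter on `CAST(first_date AS DATE)` using both Data Studio parameters; the helper names and the customer rows are hypothetical.

```python
from datetime import date, datetime


def parse_ds_date(value: str) -> date:
    """Mirror PARSE_DATE('%Y%m%d', ...) for a YYYYMMDD parameter string."""
    return datetime.strptime(value, "%Y%m%d").date()


def first_date_between(rows, ds_start_date: str, ds_end_date: str):
    """Mirror CAST(first_date AS DATE) BETWEEN start AND end (inclusive)."""
    start = parse_ds_date(ds_start_date)
    end = parse_ds_date(ds_end_date)
    return [row for row in rows if start <= row["first_date"].date() <= end]


# Made-up customers with a timestamp-like first_date.
CUSTOMERS = [
    {"email": "a@example.com", "first_date": datetime(2020, 5, 31, 23, 0)},
    {"email": "b@example.com", "first_date": datetime(2020, 6, 1, 0, 30)},
    {"email": "c@example.com", "first_date": datetime(2020, 6, 7, 22, 0)},
    {"email": "d@example.com", "first_date": datetime(2020, 6, 8, 1, 0)},
]

print([c["email"] for c in first_date_between(CUSTOMERS, "20200601", "20200607")])
# ['b@example.com', 'c@example.com']
```

Converting to a date before comparing matters for the upper bound: a first_date of 2020-06-07 22:00 is kept because its date equals the end date, whereas comparing the raw timestamp against midnight of the end date would drop it.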