在一行 BigQuery 中获取今年的销售额和去年的销售额(对于同一周数)
Posted
技术标签:
【中文标题】在一行 BigQuery 中获取今年的销售额和去年的销售额(对于同一周数)【英文标题】:Getiing This Year sales and Last Year Sales (for the same week number) in one row BigQuery 【发布时间】:2020-02-25 04:35:52 【问题描述】:在 BigQuery 中,我试图在一行中创建一个包含 Sales TY(今年财务)和 Sales LY 的视图。按照下面的示例,销售 LY 度量应针对相同的相应周数 TY。
我正在使用以下代码:
SUM(table_1.sales) OVER (PARTITION BY table_1.client ORDER BY table_1.fw_end_date DESC ROWS BETWEEN 53 FOLLOWING AND 53 FOLLOWING ) as SALES_LY
问题是有时缺少周数,这意味着窗口函数不会反映它,因为它计算的是行数而不是实际周数。
在下面的示例中,缺少本财政年度的第 19 周和第 20 周,因此 sales_ly 度量的相应结果是错误的。
是否可以通过添加/加入具有 NULL 值的缺失日期的附加行来修复?
row_number client fy wofy fw_end_date year sales sales_LY
1 1111 2020 34 2020-02-23 TY 74.97971177 75.63215281
2 1111 2020 33 2020-02-16 TY 42.42109122 68.14894689
3 1111 2020 32 2020-02-09 TY 19.85037174 87.12654065
4 1111 2020 31 2020-02-02 TY 16.56226835 3.122137476
5 1111 2020 30 2020-01-26 TY 1.800185225 10.74736963
6 1111 2020 29 2020-01-19 TY 24.75012318 38.63631908
7 1111 2020 28 2020-01-12 TY 25.25663409 1.387588554
8 1111 2020 27 2020-01-05 TY 72.26309414 0.873563634
9 1111 2020 26 2019-12-29 TY 48.42015566 81.57363107
10 1111 2020 25 2019-12-22 TY 29.94857681 41.80248501
11 1111 2020 24 2019-12-15 TY 60.84110104 26.66495665
12 1111 2020 23 2019-12-08 TY 6.810985936 48.21324987
13 1111 2020 22 2019-12-01 TY 55.25000762 8.681731452
14 1111 2020 21 2019-11-24 TY 7.314435129 61.50365765
15 1111 2020 18 2019-11-03 TY 28.86674749 84.68936948 --wrong
16 1111 2020 17 2019-10-27 TY 93.06304298
17 1111 2020 16 2019-10-20 TY 31.18234699
18 1111 2020 15 2019-10-13 TY 56.40700057
19 1111 2020 14 2019-10-06 TY 70.7995385
20 1111 2020 13 2019-09-29 TY 88.80009525
21 1111 2020 12 2019-09-22 TY 75.12011037
22 1111 2020 11 2019-09-15 TY 28.54977137
23 1111 2020 10 2019-09-08 TY 38.05238915
24 1111 2020 9 2019-09-01 TY 7.256419393
25 1111 2020 8 2019-08-25 TY 32.81145188
26 1111 2020 7 2019-08-18 TY 32.04938194
27 1111 2020 6 2019-08-11 TY 53.890913
28 1111 2020 5 2019-08-04 TY 17.4527262
29 1111 2020 4 2019-07-28 TY 66.08187866
30 1111 2020 3 2019-07-21 TY 9.331124689
31 1111 2020 2 2019-07-14 TY 61.35972079
32 1111 2020 1 2019-07-07 TY 68.02729471
33 1111 2019 52 2019-06-23 LY 65.09319706
34 1111 2019 51 2019-06-16 LY 46.3647103
35 1111 2019 50 2019-06-09 LY 45.04742519
36 1111 2019 49 2019-06-02 LY 10.72003618
37 1111 2019 48 2019-05-26 LY 69.73143446
38 1111 2019 47 2019-05-19 LY 8.3106988
39 1111 2019 46 2019-05-12 LY 18.53940931
40 1111 2019 45 2019-05-05 LY 78.27473501
41 1111 2019 44 2019-04-28 LY 4.799286544
42 1111 2019 43 2019-04-21 LY 94.58933139
43 1111 2019 42 2019-04-14 LY 27.06059414
44 1111 2019 41 2019-04-07 LY 53.42375151
45 1111 2019 40 2019-03-31 LY 22.92189479
46 1111 2019 39 2019-03-24 LY 95.3449253
47 1111 2019 38 2019-03-17 LY 62.34159147
48 1111 2019 37 2019-03-10 LY 73.99278829
49 1111 2019 36 2019-03-03 LY 39.517017
50 1111 2019 35 2019-02-24 LY 50.46146309
51 1111 2019 34 2019-02-17 LY 75.63215281
52 1111 2019 33 2019-02-10 LY 68.14894689
53 1111 2019 32 2019-02-03 LY 87.12654065
54 1111 2019 31 2019-01-27 LY 3.122137476
55 1111 2019 30 2019-01-20 LY 10.74736963
56 1111 2019 29 2019-01-13 LY 38.63631908
57 1111 2019 28 2019-01-06 LY 1.387588554
58 1111 2019 27 2018-12-30 LY 0.873563634
59 1111 2019 26 2018-12-23 LY 81.57363107
60 1111 2019 25 2018-12-16 LY 41.80248501
61 1111 2019 24 2018-12-09 LY 26.66495665
62 1111 2019 23 2018-12-02 LY 48.21324987
63 1111 2019 22 2018-11-25 LY 8.681731452
64 1111 2019 21 2018-11-18 LY 61.50365765
65 1111 2019 20 2018-11-11 LY 84.68936948
【问题讨论】:
【参考方案1】:不要跳过 53 行,而是去搜索确切的 1 年前日期:
WITH data AS (
SELECT *
FROM `fh-bigquery.weather_gsod.all`
WHERE date BETWEEN '2018-12-01' AND '2020-02-24'
AND name LIKE 'SAN FRANCISCO INTERNATIONAL A'
), main_query AS (
SELECT name, date, temp
, ARRAY_AGG(STRUCT(date, temp)) OVER(PARTITION BY name ORDER BY date ROWS BETWEEN 366 PRECEDING AND 310 PRECEDING ) over_array
FROM data a
)
SELECT * EXCEPT(over_array)
, (SELECT temp FROM UNNEST(over_array) WHERE date=DATE_SUB(a.date, INTERVAL 1 year)) prev_year
FROM main_query a
ORDER BY name, date DESC
我用几天来做这个 - 你可以用几周来做同样的事情。
解决了数周:
WITH data AS (
SELECT ROUND(AVG(temp),1) temp, DATE_TRUNC(date, week) week, name
FROM `fh-bigquery.weather_gsod.all`
WHERE date BETWEEN '2018-12-01' AND '2020-02-24'
AND name LIKE 'SAN FRANCISCO INTERNATIONAL A'
GROUP BY name, week
), main_query AS (
SELECT name, week, temp
, ARRAY_AGG(STRUCT(week, temp))
OVER(PARTITION BY name ORDER BY week
ROWS BETWEEN 53 PRECEDING AND 40 PRECEDING ) over_array
FROM data a
)
SELECT * EXCEPT(over_array)
, (SELECT temp FROM UNNEST(over_array)
WHERE EXTRACT(YEAR FROM week)+1=EXTRACT(YEAR FROM a.week)
AND EXTRACT(WEEK FROM week)=EXTRACT(WEEK FROM a.week)) prev_year
FROM main_query a
ORDER BY name, week DESC
【讨论】:
您好 Felipe,感谢您的回答。我已经尝试过您的代码,它为 prev_year 提供了 NULL。我想这是因为我使用的是财政周结束日期而不是日历日。例如,对于 fy_end_date 2020-02-21,去年的对应周是 2019-02-17(就财务周数而言,它们都是第 34 周),而 BigQuery 会搜索 2019-02-21 周而不是“ t 存在于数据集中,因为它不是周末结束日期。可以修吗?以上是关于在一行 BigQuery 中获取今年的销售额和去年的销售额(对于同一周数)的主要内容,如果未能解决你的问题,请参考以下文章