Count distinct number of customers per fiscal year and display all dates in query result
【Posted】: 2021-03-17 10:42:48
【Question】:
DB-Fiddle
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
order_date DATE,
customerID VARCHAR(255)
);
INSERT INTO customers
(order_date, customerID
)
VALUES
('2020-01-15', 'Customer_01'),
('2020-02-03', 'Customer_01'),
('2020-02-15', 'Customer_01'),
('2020-03-18', 'Customer_01'),
('2020-03-20', 'Customer_01'),
('2020-04-22', 'Customer_01'),
('2021-01-19', 'Customer_01'),
('2020-01-25', 'Customer_02'),
('2020-02-26', 'Customer_02'),
('2020-11-23', 'Customer_02'),
('2021-01-17', 'Customer_02'),
('2021-02-20', 'Customer_02');
Expected result:
order_date | quantity
           | (fiscal year)
-----------|----------------------------------------------------
2020-01-15 | 1 --> Customer_01 appears the first time between 2019-03 and 2020-02
2020-01-25 | 1 --> Customer_02 appears the first time between 2019-03 and 2020-02
2020-02-03 | 0
2020-02-15 | 0
2020-02-26 | 0
2020-03-18 | 1 --> Customer_01 appears the first time between 2020-03 and 2021-02
2020-03-20 | 0
2020-04-22 | 0
2020-11-23 | 1 --> Customer_02 appears the first time between 2020-03 and 2021-02
2021-01-17 | 0
2021-01-19 | 0
2021-02-20 | 0
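In the result above, I want to list all order dates and count the distinct number of customers per fiscal year. The fiscal year starts two months after the calendar year, so it runs from March to February (e.g. from 2020-03 to 2021-02).
For example, Customer_01 first appears in the fiscal year 2020-03 to 2021-02 on 2020-03-18, so this order_date is assigned a 1.
If the customer appears again within the same fiscal year, each subsequent order_date is assigned a 0.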
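The rule described above can be cross-checked outside any SQL dialect. The following is a minimal Python sketch, not part of the original question: the helper names `fiscal_year` and `first_order_flags` are made up for illustration, and the fiscal year is labelled by the calendar year in which it starts, which is what subtracting two months and taking the year produces.

```python
from datetime import date

# Sample rows copied from the INSERT statement in the question.
ORDERS = [
    ("2020-01-15", "Customer_01"), ("2020-02-03", "Customer_01"),
    ("2020-02-15", "Customer_01"), ("2020-03-18", "Customer_01"),
    ("2020-03-20", "Customer_01"), ("2020-04-22", "Customer_01"),
    ("2021-01-19", "Customer_01"), ("2020-01-25", "Customer_02"),
    ("2020-02-26", "Customer_02"), ("2020-11-23", "Customer_02"),
    ("2021-01-17", "Customer_02"), ("2021-02-20", "Customer_02"),
]


def fiscal_year(d: date) -> int:
    """Hypothetical helper: fiscal year runs March..February, labelled by its start year."""
    return d.year if d.month >= 3 else d.year - 1


def first_order_flags(orders):
    """Hypothetical helper: map each order_date to the number of customers
    whose first order of the fiscal year falls on that date (0 otherwise)."""
    seen = set()   # (fiscal year, customer) pairs already encountered
    result = {}
    # ISO date strings sort chronologically, so sorted() walks the orders in date order.
    for day, customer in sorted(orders):
        key = (fiscal_year(date.fromisoformat(day)), customer)
        result[day] = result.get(day, 0) + (0 if key in seen else 1)
        seen.add(key)
    return result


if __name__ == "__main__":
    for day, quantity in first_order_flags(ORDERS).items():
        print(day, "|", quantity)
```

For the sample data this yields a 1 on 2020-01-15, 2020-01-25, 2020-03-18 and 2020-11-23 and a 0 on every other date, which agrees with the expected result table.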
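With reference to this question for MariaDB, I was able to achieve the expected result, as you can see in the DB-Fiddle.
However, I now want to get the same result using PostgreSQL.
So far, I have modified the query to: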
SELECT
order_date,
SUM(rn = 1) AS quantity
FROM
(SELECT
order_date,
row_number() over(PARTITION BY DATE_PART('year', (order_date - INTERVAL '2 month')::date), customerID ORDER BY order_date) rn
FROM customers
) t
GROUP BY 1;
However, I now get the error function sum(boolean) does not exist on the SUM(rn = 1) part.
What is the equivalent syntax for SUM(rn = 1) in PostgreSQL to achieve the expected result?
【Comments】:
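【Answer 1】:
This question has two parts. One is determining the fiscal year. The second is doing the distinct count.
The first is handled by subtracting two months and truncating the result to the year. The second would logically be: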
select c.*,
       count(distinct customerId) over (partition by fyyyy order by order_date)
from (select c.*, date_trunc('year', order_date - interval '2 month') as fyyyy
      from customers c
     ) c
order by order_date;
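Unfortunately, this does not work in Postgres, which does not implement DISTINCT for window functions. But you can just count the first time each customer appears: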
select c.*,
count(*) filter (where seqnum = 1) over (partition by fyyyy order by order_date)
from (select c.*,
date_trunc('year', order_date - interval '2 month') as fyyyy,
row_number() over (partition by customerId, date_trunc('year', order_date - interval '2 month')
order by order_date
) as seqnum
from customers c
) c
order by order_date;
Here is a dbfiddle.
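To make it easier to see what the windowed query with `filter (where seqnum = 1)` computes, here is a rough Python sketch of the same logic. It is an illustration only: the function names are hypothetical, and it assumes the default window frame, under which rows sharing an order_date see the same running total. As far as I can tell, the query yields a running count of distinct customers within each fiscal year rather than the 0/1 flag from the question's expected table.

```python
from datetime import date
from itertools import groupby

# Sample rows copied from the INSERT statement in the question.
ORDERS = [
    ("2020-01-15", "Customer_01"), ("2020-02-03", "Customer_01"),
    ("2020-02-15", "Customer_01"), ("2020-03-18", "Customer_01"),
    ("2020-03-20", "Customer_01"), ("2020-04-22", "Customer_01"),
    ("2021-01-19", "Customer_01"), ("2020-01-25", "Customer_02"),
    ("2020-02-26", "Customer_02"), ("2020-11-23", "Customer_02"),
    ("2021-01-17", "Customer_02"), ("2021-02-20", "Customer_02"),
]


def fiscal_year(day: str) -> int:
    """Hypothetical helper: year of (order_date - 2 months)."""
    d = date.fromisoformat(day)
    return d.year if d.month >= 3 else d.year - 1


def running_distinct_customers(orders):
    """Hypothetical helper: for each row, count the first appearances
    (seqnum = 1) seen so far within the row's fiscal year."""
    seen = set()    # (fiscal year, customer) pairs already counted
    totals = {}     # fiscal year -> distinct customers counted so far
    rows = []
    # Rows with the same order_date are peers and share one running total.
    for day, group in groupby(sorted(orders), key=lambda r: r[0]):
        fy = fiscal_year(day)
        peers = list(group)
        for _, customer in peers:
            if (fy, customer) not in seen:
                seen.add((fy, customer))
                totals[fy] = totals.get(fy, 0) + 1
        rows.extend((d, customer, totals.get(fy, 0)) for d, customer in peers)
    return rows


if __name__ == "__main__":
    for row in running_distinct_customers(ORDERS):
        print(*row, sep=" | ")
```

On the sample data the count rises from 1 to 2 inside each fiscal year and restarts at 1 on 2020-03-18. A per-row 0/1 value like the one in the question would instead come from the comparison itself, for example by casting it with (seqnum = 1)::int.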
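【Comments】:
So there is no way to get the expected result using PostgreSQL? Is it possible in Redshift, or does the same issue appear there?
The question is tagged Postgres, not Redshift. You already know they are different. In any case, you can just replace filter with sum( (seqnum = 1)::int ) to get this to work in Redshift.
【Answer 2】: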
After further investigation, I came up with the following solution:
DB-Fiddle
SELECT
order_date,
(CASE WHEN t.rolling_count > 1 THEN 0 ELSE t.rolling_count END) AS quantity
FROM
(SELECT
order_date,
(row_number() over(PARTITION BY DATE_PART('year', (order_date - INTERVAL '2 month')::date), customerID ORDER BY order_date)) AS rolling_count
FROM customers
ORDER BY 1
) t
GROUP BY 1,2
ORDER BY 1;
For comparison, here is the MariaDB version of the query:
SELECT
order_date,
(CASE WHEN t.rolling_count > 1 THEN 0 ELSE t.rolling_count END) AS quantity
FROM
(SELECT
order_date,
(row_number() over(PARTITION BY YEAR(order_date - INTERVAL 2 MONTH), customerID ORDER BY order_date)) AS rolling_count
FROM customers
ORDER BY 1
) t
GROUP BY 1
ORDER BY 1;
【Comments】: