Count based on the max status on a given date, with grouped data
Posted: 2021-12-30 07:33:19
Question:
My example is a ticketing system containing entries for status updates and ticket creation.
Fiddle: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=a5ff4600adbab185eb14b08586f1bd29
ID | TICKETID | STATUS | TICKET_CREATED | STATUS_CHANGED |
---|---|---|---|---|
1 | 1 | other_error | 01-JAN-20 | 01-JAN-20 08.00.00 |
2 | 2 | tech_error | 01-JAN-20 | 01-JAN-20 09.00.00 |
3 | 3 | unknown | 01-JAN-20 | 01-JAN-20 09.10.00 |
4 | 4 | unknown | 01-JAN-20 | 01-JAN-20 09.20.00 |
5 | 4 | tech_error | 01-JAN-20 | 02-JAN-20 09.30.00 |
6 | 1 | solved | 01-JAN-20 | 02-JAN-20 10.00.00 |
7 | 2 | solved | 01-JAN-20 | 02-JAN-20 07.00.00 |
8 | 5 | tech_error | 02-JAN-20 | 02-JAN-20 08.00.00 |
9 | 6 | unknown | 02-JAN-20 | 02-JAN-20 08.30.00 |
10 | 6 | solved | 02-JAN-20 | 02-JAN-20 09.30.00 |
11 | 5 | solved | 02-JAN-20 | 03-JAN-20 08.00.00 |
12 | 4 | unknown | 01-JAN-20 | 03-JAN-20 09.00.00 |
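I want to evaluate the data by ticket creation date and get three things for a given date:
- (done) How many tickets were created in total on the given date
- (done) How many tickets were created with status "unknown" on the given date
- (not done) How many tickets were in "unknown" status in total on the given date? Tricky! What matters is the status with the greatest STATUS_CHANGED before midnight of the given date.
Expected result for 01.01.2020: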
TICKET_CREATED | Total Created | Tickets created in Unknown status | Total tickets in Unknown status |
---|---|---|---|
01-JAN-20 | 4 | 2 | 2 |
Explanation: on 01-JAN-20, tickets 3 and 4 were in "unknown" status at the end of the day.
Expected result for 02.01.2020:
TICKET_CREATED | Total Created | Tickets created in Unknown status | Total tickets in Unknown status |
---|---|---|---|
02-JAN-20 | 2 | 1 | 1 |
Explanation: on 02-JAN-20, only ticket 3 was in "unknown" status at the end of the day.
Current solution for parts 1 + 2:
select ticket_created,
count(*) as "Total Created",
sum(case when status = 'unknown' then 1 else 0 end) as "Unknown tickets created",
'?' as "Total tickets in Unknown status"
from myTable
where id in
(select min(id) as id
from myTable
where ticket_created = to_date('01.01.2020', 'DD.MM.YYYY')
group by ticketid)
group by ticket_created
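Can you give me some hints on how to approach point 3?
Comments:
Question: does "Total tickets in Unknown status" also need to count tickets that still have status "unknown" from before the date you are interested in? (You may want to change your sample data.)
Answer 1:
Assuming I've understood your logic correctly, this is how I would achieve your goal: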
with ticket_info as (select id,
ticketid,
status,
ticket_created,
status_changed,
row_number() over (partition by ticketid, trunc(status_changed) order by status_changed desc) rn_per_id_day_desc,
row_number() over (partition by ticketid order by status_changed) rn_per_id_asc
from mytable)
select ticket_created,
count(distinct case when trunc(ticket_created) = to_date('01/01/2020', 'dd/mm/yyyy') then ticketid end) as "Total Created",
count(case when rn_per_id_asc = 1 and status = 'unknown' then 1 end) as "Unknown tickets created",
count(case when rn_per_id_day_desc = 1 and status = 'unknown' then 1 end) as "Total tickets in Unknown status"
from ticket_info
where status_changed >= to_timestamp('01/01/2020', 'dd/mm/yyyy')
and status_changed < to_timestamp('01/01/2020', 'dd/mm/yyyy') + interval '1' day
group by ticket_created;
db<>fiddle
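You can see that, first, I've used a couple of row_number() analytic functions to label the rows. One labels each id's rows in the order they changed (which lets us identify the first row for each id, i.e. the row where the ticket was created); the other labels the rows per id and day in descending order (which lets us identify the last row of the day for each id).
With that information, we can work out all three of your cases:
- Tickets created on a day: here I used a distinct count, but you could change it to count(case when rn_per_id_asc = 1 then 1 end), which might be more efficient and easier to understand.
- Tickets created as "unknown" on a day: here I used a conditional count: if it is the first row and the status is unknown, count it.
- Tickets in "unknown" status at the end of the day: here I used another conditional count: if it is the last row of the day and the status is unknown, count it.
ETA: I've amended the logic of the third part to count active tickets whose status is unknown at the end of the day. I think this should do the trick: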
with date_of_interest as (select start_date + level -1 dt,
start_date + level next_dt
from (select to_date('01/01/2020', 'dd/mm/yyyy') start_date,
to_date('03/01/2020', 'dd/mm/yyyy') end_date
from dual)
connect by level <= (end_date - start_date) + 1),
ticket_info as (select mt.id,
mt.ticketid,
mt.status,
mt.ticket_created,
mt.status_changed,
row_number() over (partition by mt.ticketid, doi.dt order by mt.status_changed) rn_per_id_asc,
row_number() over (partition by mt.ticketid, doi.dt order by mt.status_changed desc) rn_per_id_desc,
doi.dt,
doi.next_dt
from mytable mt
inner join date_of_interest doi on mt.status_changed < doi.next_dt
)
select dt,
count(case when ticket_created = dt and rn_per_id_asc = 1 then 1 end) as "Total Created",
count(case when ticket_created = dt and rn_per_id_asc = 1 and status = 'unknown' then 1 end) as "Unknown tickets created",
count(case when rn_per_id_desc = 1 and status = 'unknown' then 1 end) as "Total tickets in Unknown status"
from ticket_info
group by dt
order by dt;
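You'll notice that I've updated the query to run across multiple days. If the query will only ever be run for one date at a time, you can replace the date_of_interest subquery like so: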
with date_of_interest as (select dt,
dt + 1 next_dt
from (select to_date('03/01/2020', 'dd/mm/yyyy') dt
from dual)),
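Updated db<>fiddle
Note that this won't be the most efficient way of doing things. Over time, as more and more records appear, the query will get slower. If you can find a way to easily identify active tickets, especially if you can get that information into an index, so much the better.
Comments:
Hi, thanks for your solution; unfortunately it doesn't work :( The "Total tickets in Unknown status" column should give all tickets with status unknown up to midnight, not just those from the specific date. That's why it's so hard ;( So as of 03.Jan, I want to know how many "unknown" tickets there are in total at that point.
So with your current solution, if I enter the date 02.Jan, I don't get the correct result ;( Maybe let's remove the date completely and output all ticket_created dates to make it simpler? :) dbfiddle.uk/…
I added a new fiddle to demonstrate the problem. "Total tickets in Unknown status" doesn't give the correct result.
@TeaCup OK, so you want to know, for a given day, how many active tickets were in unknown status at the end of that day? That's certainly possible! I've edited my answer accordingly.
Answer 2:
Here is a solution that calculates the third metric separately, then joins it to the metrics you already know.
with cte_ranges as (
  select id, status, ticketid, ticket_created
  , status_changed as started
  , coalesce(
      lead(status_changed) over (partition by ticketid order by status_changed)
    , current_timestamp) as ended
  from myTable
  where trunc(ticket_created) between DATE'2020-01-01' and DATE'2020-01-02'
)
select q.ticket_date as "Ticket Created"
, q.total_tickets as "Total Created"
, q.total_unknown as "Unknown tickets created"
, endofday.total_unknown "Total tickets in Unknown status"
from (
  select trunc(t.ticket_created) as ticket_date
  , count(distinct t.ticketid) as total_tickets
  , count(distinct case when t.status = 'unknown' then t.ticketid end) as total_unknown
  from cte_ranges t
  group by trunc(t.ticket_created)
) q
left join (
  select trunc(cast(dt as date)) as ticket_date
  , count(distinct case when status = 'unknown' then ticketid end) as total_unknown
  from cte_ranges
  join (
    select distinct cast(trunc(ticket_created)+1 as timestamp) - interval '1' second as dt
    from cte_ranges
  ) cutoff on dt between started and ended
  group by cast(dt as date)
) endofday on endofday.ticket_date = q.ticket_date;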
Ticket Created | Total Created | Unknown tickets created | Total tickets in Unknown status |
---|---|---|---|
01-JAN-20 | 4 | 2 | 2 |
02-JAN-20 | 2 | 1 | 1 |
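db<>fiddle here
The trick is to first use LEAD to calculate the range during which each status was active.
Then join the cutoff times (the last second of each day) to those ranges. That way you get the days on which a status was still in effect.
Both subqueries use the CTE, so you only need to change the date criteria in the CTE.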
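As a rough, portable illustration of the LEAD-based range idea, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database through Python's built-in sqlite3 module. It is not the Oracle query above: the table layout, the ISO-text timestamps, the helper names `make_db` and `unknown_at`, and the '9999-12-31' sentinel for an open-ended status are all assumptions made for this example, and it needs an SQLite build with window-function support (3.25 or later). Unlike the `between` join above, it treats each range as half-open.

```python
import sqlite3

# Sample rows from the question: (id, ticketid, status, status_changed as ISO text).
ROWS = [
    (1, 1, "other_error", "2020-01-01 08:00:00"),
    (2, 2, "tech_error", "2020-01-01 09:00:00"),
    (3, 3, "unknown", "2020-01-01 09:10:00"),
    (4, 4, "unknown", "2020-01-01 09:20:00"),
    (5, 4, "tech_error", "2020-01-02 09:30:00"),
    (6, 1, "solved", "2020-01-02 10:00:00"),
    (7, 2, "solved", "2020-01-02 07:00:00"),
    (8, 5, "tech_error", "2020-01-02 08:00:00"),
    (9, 6, "unknown", "2020-01-02 08:30:00"),
    (10, 6, "solved", "2020-01-02 09:30:00"),
    (11, 5, "solved", "2020-01-03 08:00:00"),
    (12, 4, "unknown", "2020-01-03 09:00:00"),
]

# Each status row becomes a half-open range [started, ended); the last
# status of a ticket gets a far-future end so it stays active.
QUERY = """
WITH ranges AS (
    SELECT ticketid, status,
           status_changed AS started,
           LEAD(status_changed, 1, '9999-12-31 00:00:00')
               OVER (PARTITION BY ticketid ORDER BY status_changed) AS ended
    FROM mytable
)
SELECT COUNT(DISTINCT ticketid)
FROM ranges
WHERE status = 'unknown'
  AND started <= :cutoff
  AND ended > :cutoff
"""


def make_db():
    """Create an in-memory database holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable "
        "(id INTEGER, ticketid INTEGER, status TEXT, status_changed TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", ROWS)
    return conn


def unknown_at(conn, cutoff):
    """Count tickets whose active status at `cutoff` is 'unknown'."""
    return conn.execute(QUERY, {"cutoff": cutoff}).fetchone()[0]


if __name__ == "__main__":
    conn = make_db()
    for day in ("2020-01-01", "2020-01-02", "2020-01-03"):
        print(day, unknown_at(conn, day + " 23:59:59"))
```

With the sample data this gives 2 for 01-JAN-20 (tickets 3 and 4) and 1 for 02-JAN-20 (ticket 3), matching the expected results in the question.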
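Answer 3:
The solution is simple:
- Use the LEAD function to find the end date of each status change
  - For example, 4, unknown, 01-JAN-20 09.20.00 has an end date of 02-JAN-20 09.30.00
- Test whether this date range intersects the specified date range
  - To check 02-JAN-20, you would actually use the range [02-JAN-20, 03-JAN-20)
  - Note that it intersects some part of [01-JAN-20 09.20.00, 02-JAN-20 09.30.00)
- Checking whether date ranges intersect is trivial
The following query illustrates the above logic. It uses a set of dates rather than one specific date: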
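The intersection test in the last step can be sketched on its own. The snippet below is a minimal illustration in Python, assuming half-open ranges as in the example above; the function name `overlaps` and the variable names are hypothetical and only serve this example.

```python
from datetime import datetime


def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open ranges [a_start, a_end) and [b_start, b_end) share any instant."""
    return a_start < b_end and b_start < a_end


# Ticket 4's first "unknown" period, taken from the sample data.
status_range = (datetime(2020, 1, 1, 9, 20), datetime(2020, 1, 2, 9, 30))
# The day 02-JAN-20 expressed as [02-JAN-20, 03-JAN-20).
day_range = (datetime(2020, 1, 2), datetime(2020, 1, 3))

if __name__ == "__main__":
    print(overlaps(*status_range, *day_range))  # prints True
```

Two ranges that merely touch, such as [01-JAN-20, 02-JAN-20) and [02-JAN-20, 03-JAN-20), do not count as intersecting under this half-open convention.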
WITH calendar(calendar_date) AS (
SELECT DATE'2020-01-01' FROM DUAL UNION ALL
SELECT DATE'2020-01-02' FROM DUAL UNION ALL
SELECT DATE'2020-01-03' FROM DUAL UNION ALL
SELECT DATE'2020-01-04' FROM DUAL UNION ALL
SELECT DATE'2020-01-05' FROM DUAL
), tablecopy AS (
SELECT mytable.*
, LEAD(status_changed, 1, DATE'9999-12-31') OVER (PARTITION BY ticketid ORDER BY status_changed) AS next_changed
FROM mytable
)
SELECT calendar.calendar_date
, COUNT(DISTINCT CASE WHEN tablecopy.ticket_created = calendar.calendar_date THEN ticketid END) AS "Created on this date"
, COUNT(DISTINCT CASE WHEN tablecopy.ticket_created = calendar.calendar_date AND tablecopy.status = 'unknown' THEN ticketid END) AS "Created unknown on this date"
, COUNT(DISTINCT CASE WHEN tablecopy.status = 'unknown' THEN ticketid END) AS "Total unknown through this date"
FROM calendar
LEFT JOIN tablecopy ON tablecopy.ticket_created = calendar.calendar_date OR (
tablecopy.status = 'unknown' AND
tablecopy.status_changed < calendar.calendar_date + INTERVAL '1' DAY AND
tablecopy.next_changed > calendar.calendar_date
)
GROUP BY calendar.calendar_date
ORDER BY calendar.calendar_date
DB<>Fiddle for the above query
DB<>Fiddle for a variation that checks one date
Note that tickets #3 and 4 have last status = unknown, so these tickets will remain in that status indefinitely.