Finding a count of rows in an arbitrary date range using Oracle
Posted: 2011-09-16 21:00:42
Question: The question I need to answer is: "What is the maximum number of page requests we have received in any 60-minute period?"
I have a table that looks something like this:
date_page_requested date;
page varchar(80);
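I'm looking for the maximum number of rows in any 60-minute time slice.
I think analytic functions might get me there, but so far I've drawn a blank.
I'd appreciate a pointer in the right direction.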
Comments on the question:
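What is the granularity of the 60-minute period? If a page request happened at '2011-09-15 08:30:59.535', should the 60-minute period start at '08', '30', '59' or '535' (the hour, minute, second or millisecond)?
Answer 1:
You already have some workable options in the answers; here is one that uses Oracle's "windowing functions with a logical offset" instead of joins or correlated subqueries.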
First, the test table:
create table t pctfree 0 nologging as
select date '2011-09-15' + level / (24 * 4) as date_page_requested
from dual
connect by level <= (24 * 4);

insert into t values (to_date('2011-09-15 11:11:11', 'YYYY-MM-DD HH24:MI:SS'));

commit;
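T now contains one row for every quarter hour of the day, plus one extra row at 11:11:11 AM. The query works in three steps. Step 1: for each row, count the rows that fall within the hour starting at that row's time: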
with x as (select date_page_requested
                , count(*) over (order by date_page_requested
                                 range between current row
                                           and interval '1' hour following) as hour_count
           from t)
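Step 1's frame can be illustrated outside the database. The following Python sketch is an added illustration, not part of the answer: it rebuilds table T in memory and computes the same per-row hour_count. It assumes the usual RANGE-frame semantics, namely that both ends of `current row ... interval '1' hour following` are inclusive and that rows sharing the current row's timestamp are counted too; the helper name `hour_counts` is hypothetical.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta

def hour_counts(times, window=timedelta(hours=1)):
    """For each request time t, count requests in [t, t + window].

    Mirrors COUNT(*) OVER (ORDER BY t RANGE BETWEEN CURRENT ROW
    AND INTERVAL '1' HOUR FOLLOWING): both ends are inclusive, and rows
    with the same timestamp as t are counted as well.
    """
    ts = sorted(times)
    # bisect_left finds the first row equal to t; bisect_right finds the
    # first row strictly after t + window.
    return [(t, bisect_right(ts, t + window) - bisect_left(ts, t)) for t in ts]

# Rebuild table T: one row per quarter hour (00:15 .. 24:00) plus 11:11:11.
base = datetime(2011, 9, 15)
times = [base + timedelta(minutes=15 * level) for level in range(1, 97)]
times.append(datetime(2011, 9, 15, 11, 11, 11))

counts = dict(hour_counts(times))
print(counts[datetime(2011, 9, 15, 10, 15)])      # 6: 10:15 .. 11:15, plus 11:11:11
print(counts[datetime(2011, 9, 15, 11, 11, 11)])  # 5
```

Because the upper bound is inclusive, a row exactly one hour later is still counted, which is why an ordinary quarter-hour row sees five rows rather than four.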
Then rank the rows by hour count:
, y as (select date_page_requested
             , hour_count
             , row_number() over (order by hour_count desc, date_page_requested asc) as rn
        from x)
Finally, select the earliest row that has the most rows following it:
select to_char(date_page_requested, 'YYYY-MM-DD HH24:MI:SS')
     , hour_count
from y
where rn = 1;
If several 60-minute windows tie on hour count, the query above returns only the first of them.
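Steps 2 and 3 can be sketched in Python in the same hedged way (again an added illustration; `busiest_windows` is a hypothetical name, and the block recomputes the counts so it runs on its own). Sorting by count descending and then by time ascending mirrors the row_number() ordering, so the first element is what `rn = 1` returns. The sketch also collects every start tied for the top count, which is what you would get in Oracle by filtering on rank() over (order by hour_count desc), without the date tiebreaker, instead of row_number().

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta

def busiest_windows(times, window=timedelta(hours=1)):
    """Return (first_start, best_count, all_tied_starts); assumes times is non-empty.

    first_start matches the row_number() query (rn = 1): highest count,
    earliest start on ties. all_tied_starts is what a rank() = 1 filter keeps.
    """
    ts = sorted(times)
    counts = [(t, bisect_right(ts, t + window) - bisect_left(ts, t)) for t in ts]
    # Same ordering as: ORDER BY hour_count DESC, date_page_requested ASC
    ranked = sorted(counts, key=lambda tc: (-tc[1], tc[0]))
    first_start, best = ranked[0]
    tied = [t for t, c in ranked if c == best]
    return first_start, best, tied

base = datetime(2011, 9, 15)
times = [base + timedelta(minutes=15 * level) for level in range(1, 97)]
times.append(datetime(2011, 9, 15, 11, 11, 11))

start, best, tied = busiest_windows(times)
print(start, best)                           # 2011-09-15 10:15:00 6
print([t.strftime("%H:%M") for t in tied])   # ['10:15', '10:30', '10:45', '11:00']
```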
Comments:
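Thanks a lot. Several of these produced the "correct" result, but this one produced it very quickly, which means I'll be able to adapt it to run over much larger time ranges.
Answer 2:
This should give you what you need; the first row returned should be the hour with the most pages.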
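select number_of_pages
     , hour_requested
       -- hh24 (24-hour clock) keeps 01:00 and 13:00 in separate buckets
from (select to_char(date_page_requested, 'dd/mm/yyyy hh24') hour_requested
           , count(*) number_of_pages
      from pages
      group by to_char(date_page_requested, 'dd/mm/yyyy hh24')) p
-- descending, so the busiest hour comes first
order by number_of_pages desc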
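A limitation of grouping by clock hour is that a burst straddling an hour boundary is split between two buckets. The following Python sketch uses made-up timestamps (purely hypothetical data) to compare the two approaches: eight requests between 22:50 and 23:09 yield at most 4 per clock-hour bucket, whereas a window that starts at each request finds all 8.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical data: four requests just before 23:00 and four just after.
times = ([datetime(2011, 9, 15, 22, m) for m in (50, 53, 56, 59)]
         + [datetime(2011, 9, 15, 23, m) for m in (0, 3, 6, 9)])

# Bucketing approach: one bucket per calendar hour.
per_bucket = Counter(t.replace(minute=0, second=0, microsecond=0) for t in times)
bucket_max = max(per_bucket.values())

# Sliding approach: for each request t, count requests in [t, t + 1 hour].
window = timedelta(hours=1)
sliding_max = max(sum(t <= u <= t + window for u in times) for t in times)

print(bucket_max, sliding_max)  # 4 8
```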
Comments:
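That would work well if the 60-minute periods all started on the hour (e.g. 15:00) and ended on the hour (e.g. 15:59), but what if the 60-minute span of maximum activity is 22:29 to 23:28?
Answer 3:
How about something like this?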
SELECT TOP 1
ranges.date_start,
COUNT(data.page) AS Tally
FROM (SELECT DISTINCT
date_page_requested AS date_start,
DATEADD(HOUR,1,date_page_requested) AS date_end
FROM @Table) ranges
JOIN @Table data
ON data.date_page_requested >= ranges.date_start
AND data.date_page_requested < ranges.date_end
GROUP BY ranges.date_start
ORDER BY Tally DESC
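The self-join above treats each range as half-open: `>= date_start` and `< date_end`. A hedged Python sketch of the same idea (not SQL Server code; `half_open_counts` is a hypothetical name), run against the first answer's test data, shows that the boundary choice matters: the 10:15 window no longer includes the 11:15 row, so the best count is 5, whereas the first answer's `range between current row and interval '1' hour following`, which includes both ends, reports 6 for the same data. Note also that when several ranges tie on Tally, `TOP 1` with `ORDER BY Tally DESC` may return any one of them.

```python
from datetime import datetime, timedelta

def half_open_counts(times, window=timedelta(hours=1)):
    """For each distinct start time s, count requests d with s <= d < s + window.

    Mirrors the self-join: ranges are built from DISTINCT request times and
    joined on date_page_requested >= date_start AND < date_end.
    """
    return {s: sum(s <= d < s + window for d in times) for s in set(times)}

base = datetime(2011, 9, 15)
times = [base + timedelta(minutes=15 * level) for level in range(1, 97)]
times.append(datetime(2011, 9, 15, 11, 11, 11))

counts = half_open_counts(times)
print(counts[datetime(2011, 9, 15, 10, 15)])  # 5: the 11:15 row is now excluded
print(max(counts.values()))                    # 5
```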
Comments:
Note that this is SQL Server (T-SQL), but you should get the idea.
Nice. Adding SELECT TOP 1 and ORDER BY Tally DESC gives you just the single best answer.
Answer 4:
For PostgreSQL, I'd probably start with something like this for minute-aligned "windows". You don't need OLAP window functions.
select w.ts,
date_trunc('minute', w.ts) as hour_start,
date_trunc('minute', w.ts) + interval '1' hour as hour_end,
(select count(*)
from weblog
where ts between date_trunc('minute', w.ts) and
(date_trunc('minute', w.ts) + interval '1' hour) ) as num_pages
from weblog w
group by ts, hour_start, hour_end
order by num_pages desc
Oracle also has a trunc() function, but I'm not sure of the format. I'll look it up in a bit, or else go see a friend's comedy show.
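For reference, Oracle's counterpart to `date_trunc('minute', ...)` is `TRUNC(date_value, 'MI')`. The Python sketch below (an added illustration with the hypothetical name `minute_aligned_counts`) mirrors this answer's logic on the first answer's test data: each window starts at the request time truncated to the minute and, because `BETWEEN` is inclusive at both ends, also includes a request exactly one hour later.

```python
from datetime import datetime, timedelta

def minute_aligned_counts(times):
    """For each request t, count requests in [trunc_minute(t), trunc_minute(t) + 1 hour].

    Mirrors date_trunc('minute', ts) combined with BETWEEN (inclusive at both ends).
    """
    result = {}
    for t in times:
        start = t.replace(second=0, microsecond=0)  # like date_trunc('minute', ts)
        end = start + timedelta(hours=1)
        result[t] = sum(start <= u <= end for u in times)
    return result

base = datetime(2011, 9, 15)
times = [base + timedelta(minutes=15 * level) for level in range(1, 97)]
times.append(datetime(2011, 9, 15, 11, 11, 11))

counts = minute_aligned_counts(times)
print(counts[datetime(2011, 9, 15, 11, 11, 11)])  # window 11:11:00 to 12:11:00 -> 5
print(max(counts.values()))                        # 6
```

The nested count is quadratic, like the correlated subquery it imitates; it is meant to show the window boundaries, not to be fast.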
Answer 5:
WITH ranges AS
( SELECT
date_page_requested AS StartDate,
date_page_requested + (1/24) AS EndDate,
ROW_NUMBER() OVER(ORDER BY date_page_requested) AS RowNo
FROM
@Table
)
SELECT
a.StartDate AS StartDate,
MAX(b.RowNo) - a.RowNo + 1 AS Tally
FROM
ranges a
JOIN
ranges b
ON a.StartDate <= b.StartDate
AND b.StartDate < a.EndDate
GROUP BY a.StartDate
, a.RowNo
ORDER BY Tally DESC
Or:
WITH ranges AS
( SELECT
date_page_requested AS StartDate,
date_page_requested + (1/24) AS EndDate,
ROW_NUMBER() OVER(ORDER BY date_page_requested) AS RowNo
FROM
@Table
)
SELECT
a.StartDate AS StartDate,
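( -- if no row lies past the window, MIN() is NULL: count up to the last row instead
SELECT COALESCE(MIN(b.RowNo), (SELECT COUNT(*) FROM ranges) + 1) - a.RowNo
FROM ranges b
WHERE b.StartDate >= a.EndDate
) AS Tally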
FROM
ranges a
ORDER BY Tally DESC
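Both variants rely on the same observation: once rows are numbered in timestamp order, the number of rows in a window equals the row number of the first row past the window minus the row number of the window's first row, so no per-window COUNT is needed. A minimal Python sketch of that idea follows (hypothetical name `rownum_tally`; a 0-based list index plays the role of RowNo, and the window is half-open like the first variant's `b.StartDate < a.EndDate`). When no row lies past the window, `bisect_left` returns the list length; in SQL, MIN over an empty set yields NULL instead, so a window that runs past the last row needs separate handling there.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

def rownum_tally(times, window=timedelta(hours=1)):
    """Count rows in [start, start + window) via index arithmetic on sorted rows.

    Returns (best_start, best_count); the earliest start wins ties.
    """
    ts = sorted(times)
    best_start, best_count = None, 0
    for i, start in enumerate(ts):
        # Index of the first row at or after start + window ("first row past the window").
        j = bisect_left(ts, start + window)
        if j - i > best_count:  # strict '>' keeps the earliest start on ties
            best_start, best_count = start, j - i
    return best_start, best_count

base = datetime(2011, 9, 15)
times = [base + timedelta(minutes=15 * level) for level in range(1, 97)]
times.append(datetime(2011, 9, 15, 11, 11, 11))

start, count = rownum_tally(times)
print(start, count)  # 2011-09-15 10:15:00 5
```

Sorting once and using a binary search per row keeps this at O(n log n), which is the same advantage the row-number trick aims for over counting every pair in a self-join.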