How can I improve performance of queries with window functions that seem to ignore indexes?
Posted: 2019-12-11 16:56:31
Question:
Or, do I need to create additional indexes? Or, can I eliminate the self-joins from the final query?
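I have a query that uses window functions, and I need it to run smoothly. I could drop the window functions and switch to GROUP BY, but I suspect that would be slower?
This query is used inside a view that feeds a frequently pulled external report. In other words, end users run this query often, in some cases many times for a large report, and they are directly affected by its execution time. At the moment the query on its own runs very fast, but later I have to self-join it, and then it slows to a crawl.
The underlying table has an index defined on every column referenced.
But when I EXPLAIN this query, it uses none of the indexes and does a full table scan instead.
The table currently has 28,000 rows, but it will grow over time (by about 10,000 rows per day).
The EDGE_VP, EDGE_RM, and EDGE_ASM columns contain email addresses; the string functions strip the domain.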
SELECT DISTINCT SS_TIMESTAMP,
CASE WHEN INSTR(EDGE_VP,'@oracle.com')=0 THEN EDGE_VP ELSE SUBSTR(EDGE_VP,1,INSTR(EDGE_VP,'@oracle.com')-1) END AS EDGE_VP,
CASE WHEN INSTR(EDGE_RM,'@oracle.com')=0 THEN EDGE_RM ELSE SUBSTR(EDGE_RM,1,INSTR(EDGE_RM,'@oracle.com')-1) END AS EDGE_RM,
CASE WHEN INSTR(EDGE_ASM,'@oracle.com')=0 THEN EDGE_ASM ELSE SUBSTR(EDGE_ASM,1,INSTR(EDGE_ASM,'@oracle.com')-1) END AS EDGE_ASM,
NVL(SUM(CASE WHEN OPPTY_STATUS = 'Open' THEN ARR_PIPELINE END) OVER (PARTITION BY SS_TIMESTAMP, EDGE_VP, EDGE_RM, EDGE_ASM),0) AS PIPELINE,
NVL(SUM(CASE WHEN OPPTY_STATUS = 'Open' THEN ARR_BEST END) OVER (PARTITION BY SS_TIMESTAMP, EDGE_VP, EDGE_RM, EDGE_ASM),0) AS BEST,
NVL(SUM(CASE WHEN OPPTY_STATUS = 'Open' THEN ARR_FORECAST END) OVER (PARTITION BY SS_TIMESTAMP, EDGE_VP, EDGE_RM, EDGE_ASM),0) AS FORECAST,
NVL(SUM(CASE WHEN OPPTY_STATUS = 'Won' THEN ARR END) OVER (PARTITION BY SS_TIMESTAMP, EDGE_VP, EDGE_RM, EDGE_ASM),0) AS CLOSED,
COUNT(*) OVER (PARTITION BY SS_TIMESTAMP, EDGE_VP, EDGE_RM, EDGE_ASM) AS ROW_COUNT
FROM SS_EDGE_FORECAST
WHERE EDGE_ASM NOT IN('Email_Address1', 'Email_Address2')
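For comparison, the GROUP BY alternative that the question raises could be sketched as follows. This is only a sketch and not part of the original post: it groups on the raw columns, just as the window functions partition on them, so it should return one row per partition without needing DISTINCT. One caveat: if two different raw addresses strip down to the same name and happen to have identical totals, the DISTINCT version would merge them into one row, while this version would return both.

```sql
-- Sketch only: same columns and filter as the query above,
-- with plain aggregates instead of window functions plus DISTINCT.
SELECT SS_TIMESTAMP,
       CASE WHEN INSTR(EDGE_VP,'@oracle.com')=0 THEN EDGE_VP
            ELSE SUBSTR(EDGE_VP,1,INSTR(EDGE_VP,'@oracle.com')-1) END AS EDGE_VP,
       CASE WHEN INSTR(EDGE_RM,'@oracle.com')=0 THEN EDGE_RM
            ELSE SUBSTR(EDGE_RM,1,INSTR(EDGE_RM,'@oracle.com')-1) END AS EDGE_RM,
       CASE WHEN INSTR(EDGE_ASM,'@oracle.com')=0 THEN EDGE_ASM
            ELSE SUBSTR(EDGE_ASM,1,INSTR(EDGE_ASM,'@oracle.com')-1) END AS EDGE_ASM,
       NVL(SUM(CASE WHEN OPPTY_STATUS = 'Open' THEN ARR_PIPELINE END),0) AS PIPELINE,
       NVL(SUM(CASE WHEN OPPTY_STATUS = 'Open' THEN ARR_BEST END),0)     AS BEST,
       NVL(SUM(CASE WHEN OPPTY_STATUS = 'Open' THEN ARR_FORECAST END),0) AS FORECAST,
       NVL(SUM(CASE WHEN OPPTY_STATUS = 'Won'  THEN ARR END),0)          AS CLOSED,
       COUNT(*) AS ROW_COUNT
FROM SS_EDGE_FORECAST
WHERE EDGE_ASM NOT IN ('Email_Address1', 'Email_Address2')
-- GROUP BY refers to the raw table columns, not the SELECT aliases.
GROUP BY SS_TIMESTAMP, EDGE_VP, EDGE_RM, EDGE_ASM;
```

Whether this is faster than the window-function form depends on the plan Oracle picks, so it is worth comparing both with EXPLAIN PLAN on real data.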
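This is the extended query that I use later. Note that the query from above appears at the top, in the WITH clause.
At the end of this query I currently self-join the WITH query; maybe I should use window functions here as well?
I can't tell you how long it takes to execute, because it hangs when I try to run it, and I don't have the patience to wait for hours.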
CREATE OR REPLACE FORCE VIEW "EDGE_FORECAST_OVER_TIME" AS
WITH basequery AS (SELECT DISTINCT SS_TIMESTAMP,
CASE WHEN INSTR(EDGE_VP,'@oracle.com')=0 THEN EDGE_VP ELSE SUBSTR(EDGE_VP,1,INSTR(EDGE_VP,'@oracle.com')-1) END AS EDGE_VP,
CASE WHEN INSTR(EDGE_RM,'@oracle.com')=0 THEN EDGE_RM ELSE SUBSTR(EDGE_RM,1,INSTR(EDGE_RM,'@oracle.com')-1) END AS EDGE_RM,
CASE WHEN INSTR(EDGE_ASM,'@oracle.com')=0 THEN EDGE_ASM ELSE SUBSTR(EDGE_ASM,1,INSTR(EDGE_ASM,'@oracle.com')-1) END AS EDGE_ASM,
NVL(SUM(CASE WHEN OPPTY_STATUS = 'Open' THEN ARR_PIPELINE END) OVER (PARTITION BY SS_TIMESTAMP, EDGE_VP, EDGE_RM, EDGE_ASM),0) AS PIPELINE,
NVL(SUM(CASE WHEN OPPTY_STATUS = 'Open' THEN ARR_BEST END) OVER (PARTITION BY SS_TIMESTAMP, EDGE_VP, EDGE_RM, EDGE_ASM),0) AS BEST,
NVL(SUM(CASE WHEN OPPTY_STATUS = 'Open' THEN ARR_FORECAST END) OVER (PARTITION BY SS_TIMESTAMP, EDGE_VP, EDGE_RM, EDGE_ASM),0) AS FORECAST,
NVL(SUM(CASE WHEN OPPTY_STATUS = 'Won' THEN ARR END) OVER (PARTITION BY SS_TIMESTAMP, EDGE_VP, EDGE_RM, EDGE_ASM),0) AS CLOSED,
COUNT(*) OVER (PARTITION BY SS_TIMESTAMP, EDGE_VP, EDGE_RM, EDGE_ASM) AS ROW_COUNT
FROM SS_EDGE_FORECAST
WHERE EDGE_ASM NOT IN('Email_Address1', 'Email_Address2'))
SELECT ss.TIMESTAMP,
ss.TIMESTAMP_DATE,
ss.FREQUENCY,
ss.PREV_TIMESTAMP,
ss.PREV_F_TIMESTAMP,
ss.PREV_H_TIMESTAMP,
ss.PREV_D_TIMESTAMP,
ss.PREV_W_TIMESTAMP,
ss.PREV_M_TIMESTAMP,
ss.PREV_Q_TIMESTAMP,
ss.PREV_Y_TIMESTAMP,
ss.PREV_TIMESTAMP_DATE,
ss.PREV_F_TIMESTAMP_DATE,
ss.PREV_H_TIMESTAMP_DATE,
ss.PREV_D_TIMESTAMP_DATE,
ss.PREV_W_TIMESTAMP_DATE,
ss.PREV_M_TIMESTAMP_DATE,
ss.PREV_Q_TIMESTAMP_DATE,
ss.PREV_Y_TIMESTAMP_DATE,
ss.DAYS_SINCE_PREV_TIMESTAMP,
ss.DAYS_SINCE_PREV_F_TIMESTAMP,
ss.DAYS_SINCE_PREV_H_TIMESTAMP,
ss.DAYS_SINCE_PREV_D_TIMESTAMP,
ss.DAYS_SINCE_PREV_W_TIMESTAMP,
ss.DAYS_SINCE_PREV_M_TIMESTAMP,
ss.DAYS_SINCE_PREV_Q_TIMESTAMP,
ss.DAYS_SINCE_PREV_Y_TIMESTAMP,
ss.DAYS_SINCE_PREV_TS_DATE,
ss.DAYS_SINCE_PREV_F_TS_DATE,
ss.DAYS_SINCE_PREV_H_TS_DATE,
ss.DAYS_SINCE_PREV_D_TS_DATE,
ss.DAYS_SINCE_PREV_W_TS_DATE,
ss.DAYS_SINCE_PREV_M_TS_DATE,
ss.DAYS_SINCE_PREV_Q_TS_DATE,
ss.DAYS_SINCE_PREV_Y_TS_DATE,
bq.EDGE_VP,
bq.EDGE_RM,
bq.EDGE_ASM,
bq.PIPELINE,
bq.BEST,
bq.FORECAST,
bq.CLOSED,
bq.PIPELINE + bq.BEST AS PIPE_BEST,
bq.CLOSED + bq.FORECAST AS CLOSED_FORECAST,
bqp.PIPELINE AS PREV_PIPELINE,
bqp.BEST AS PREV_BEST,
bqp.FORECAST AS PREV_FORECAST,
bqp.CLOSED AS PREV_CLOSED,
bqp.PIPELINE + bqp.BEST AS PREV_PIPE_BEST,
bqp.CLOSED + bqp.FORECAST AS PREV_CLOSED_FORECAST,
bq.PIPELINE - bqp.PIPELINE AS PIPELINE_DIFF,
bq.BEST - bqp.BEST AS BEST_DIFF,
bq.FORECAST - bqp.FORECAST AS FORECAST_DIFF,
bq.CLOSED - bqp.CLOSED AS CLOSED_DIFF,
(bq.PIPELINE + bq.BEST) - (bqp.PIPELINE + bqp.BEST) AS PIPE_BEST_DIFF,
(bq.CLOSED + bq.FORECAST) - (bqp.CLOSED + bqp.FORECAST) AS CLOSED_FORECAST_DIFF,
bqpf.PIPELINE AS PREV_F_PIPELINE,
bqpf.BEST AS PREV_F_BEST,
bqpf.FORECAST AS PREV_F_FORECAST,
bqpf.CLOSED AS PREV_F_CLOSED,
bqpf.PIPELINE + bqpf.BEST AS PREV_F_PIPE_BEST,
bqpf.CLOSED + bqpf.FORECAST AS PREV_F_CLOSED_FORECAST,
bq.PIPELINE - bqpf.PIPELINE AS F_PIPELINE_DIFF,
bq.BEST - bqpf.BEST AS F_BEST_DIFF,
bq.FORECAST - bqpf.FORECAST AS F_FORECAST_DIFF,
bq.CLOSED - bqpf.CLOSED AS F_CLOSED_DIFF,
(bq.PIPELINE + bq.BEST) - (bqpf.PIPELINE + bqpf.BEST) AS F_PIPE_BEST_DIFF,
(bq.CLOSED + bq.FORECAST) - (bqpf.CLOSED + bqpf.FORECAST) AS F_CLOSED_FORECAST_DIFF,
bqph.PIPELINE AS PREV_H_PIPELINE,
bqph.BEST AS PREV_H_BEST,
bqph.FORECAST AS PREV_H_FORECAST,
bqph.CLOSED AS PREV_H_CLOSED,
bqph.PIPELINE + bqph.BEST AS PREV_H_PIPE_BEST,
bqph.CLOSED + bqph.FORECAST AS PREV_H_CLOSED_FORECAST,
bq.PIPELINE - bqph.PIPELINE AS H_PIPELINE_DIFF,
bq.BEST - bqph.BEST AS H_BEST_DIFF,
bq.FORECAST - bqph.FORECAST AS H_FORECAST_DIFF,
bq.CLOSED - bqph.CLOSED AS H_CLOSED_DIFF,
(bq.PIPELINE + bq.BEST) - (bqph.PIPELINE + bqph.BEST) AS H_PIPE_BEST_DIFF,
(bq.CLOSED + bq.FORECAST) - (bqph.CLOSED + bqph.FORECAST) AS H_CLOSED_FORECAST_DIFF,
bqpd.PIPELINE AS PREV_D_PIPELINE,
bqpd.BEST AS PREV_D_BEST,
bqpd.FORECAST AS PREV_D_FORECAST,
bqpd.CLOSED AS PREV_D_CLOSED,
bqpd.PIPELINE + bqpd.BEST AS PREV_D_PIPE_BEST,
bqpd.CLOSED + bqpd.FORECAST AS PREV_D_CLOSED_FORECAST,
bq.PIPELINE - bqpd.PIPELINE AS D_PIPELINE_DIFF,
bq.BEST - bqpd.BEST AS D_BEST_DIFF,
bq.FORECAST - bqpd.FORECAST AS D_FORECAST_DIFF,
bq.CLOSED - bqpd.CLOSED AS D_CLOSED_DIFF,
(bq.PIPELINE + bq.BEST) - (bqpd.PIPELINE + bqpd.BEST) AS D_PIPE_BEST_DIFF,
(bq.CLOSED + bq.FORECAST) - (bqpd.CLOSED + bqpd.FORECAST) AS D_CLOSED_FORECAST_DIFF,
bqpw.PIPELINE AS PREV_W_PIPELINE,
bqpw.BEST AS PREV_W_BEST,
bqpw.FORECAST AS PREV_W_FORECAST,
bqpw.CLOSED AS PREV_W_CLOSED,
bqpw.PIPELINE + bqpw.BEST AS PREV_W_PIPE_BEST,
bqpw.CLOSED + bqpw.FORECAST AS PREV_W_CLOSED_FORECAST,
bq.PIPELINE - bqpw.PIPELINE AS W_PIPELINE_DIFF,
bq.BEST - bqpw.BEST AS W_BEST_DIFF,
bq.FORECAST - bqpw.FORECAST AS W_FORECAST_DIFF,
bq.CLOSED - bqpw.CLOSED AS W_CLOSED_DIFF,
(bq.PIPELINE + bq.BEST) - (bqpw.PIPELINE + bqpw.BEST) AS W_PIPE_BEST_DIFF,
(bq.CLOSED + bq.FORECAST) - (bqpw.CLOSED + bqpw.FORECAST) AS W_CLOSED_FORECAST_DIFF,
bqpm.PIPELINE AS PREV_M_PIPELINE,
bqpm.BEST AS PREV_M_BEST,
bqpm.FORECAST AS PREV_M_FORECAST,
bqpm.CLOSED AS PREV_M_CLOSED,
bqpm.PIPELINE + bqpm.BEST AS PREV_M_PIPE_BEST,
bqpm.CLOSED + bqpm.FORECAST AS PREV_M_CLOSED_FORECAST,
bq.PIPELINE - bqpm.PIPELINE AS M_PIPELINE_DIFF,
bq.BEST - bqpm.BEST AS M_BEST_DIFF,
bq.FORECAST - bqpm.FORECAST AS M_FORECAST_DIFF,
bq.CLOSED - bqpm.CLOSED AS M_CLOSED_DIFF,
(bq.PIPELINE + bq.BEST) - (bqpm.PIPELINE + bqpm.BEST) AS M_PIPE_BEST_DIFF,
(bq.CLOSED + bq.FORECAST) - (bqpm.CLOSED + bqpm.FORECAST) AS M_CLOSED_FORECAST_DIFF,
bqpq.PIPELINE AS PREV_Q_PIPELINE,
bqpq.BEST AS PREV_Q_BEST,
bqpq.FORECAST AS PREV_Q_FORECAST,
bqpq.CLOSED AS PREV_Q_CLOSED,
bqpq.PIPELINE + bqpq.BEST AS PREV_Q_PIPE_BEST,
bqpq.CLOSED + bqpq.FORECAST AS PREV_Q_CLOSED_FORECAST,
bq.PIPELINE - bqpq.PIPELINE AS Q_PIPELINE_DIFF,
bq.BEST - bqpq.BEST AS Q_BEST_DIFF,
bq.FORECAST - bqpq.FORECAST AS Q_FORECAST_DIFF,
bq.CLOSED - bqpq.CLOSED AS Q_CLOSED_DIFF,
(bq.PIPELINE + bq.BEST) - (bqpq.PIPELINE + bqpq.BEST) AS Q_PIPE_BEST_DIFF,
(bq.CLOSED + bq.FORECAST) - (bqpq.CLOSED + bqpq.FORECAST) AS Q_CLOSED_FORECAST_DIFF,
bqpy.PIPELINE AS PREV_Y_PIPELINE,
bqpy.BEST AS PREV_Y_BEST,
bqpy.FORECAST AS PREV_Y_FORECAST,
bqpy.CLOSED AS PREV_Y_CLOSED,
bqpy.PIPELINE + bqpy.BEST AS PREV_Y_PIPE_BEST,
bqpy.CLOSED + bqpy.FORECAST AS PREV_Y_CLOSED_FORECAST,
bq.PIPELINE - bqpy.PIPELINE AS Y_PIPELINE_DIFF,
bq.BEST - bqpy.BEST AS Y_BEST_DIFF,
bq.FORECAST - bqpy.FORECAST AS Y_FORECAST_DIFF,
bq.CLOSED - bqpy.CLOSED AS Y_CLOSED_DIFF,
(bq.PIPELINE + bq.BEST) - (bqpy.PIPELINE + bqpy.BEST) AS Y_PIPE_BEST_DIFF,
(bq.CLOSED + bq.FORECAST) - (bqpy.CLOSED + bqpy.FORECAST) AS Y_CLOSED_FORECAST_DIFF,
bq.ROW_COUNT,
bqp.ROW_COUNT AS PREV_ROW_COUNT,
bq.ROW_COUNT - bqp.ROW_COUNT AS NET_ROWS_ADDED
FROM basequery bq
LEFT JOIN SNAPSHOTS ss ON ss.TIMESTAMP = bq.SS_TIMESTAMP AND ss.TABLE_NAME = 'EDGE_FORECAST'
LEFT JOIN basequery bqp ON bqp.SS_TIMESTAMP = ss.PREV_TIMESTAMP
AND bqp.EDGE_VP = bq.EDGE_VP
AND bqp.EDGE_RM = bq.EDGE_RM
AND bqp.EDGE_ASM = bq.EDGE_ASM
LEFT JOIN basequery bqpf ON bqp.SS_TIMESTAMP = ss.PREV_F_TIMESTAMP
AND bqp.EDGE_VP = bq.EDGE_VP
AND bqp.EDGE_RM = bq.EDGE_RM
AND bqp.EDGE_ASM = bq.EDGE_ASM
LEFT JOIN basequery bqph ON bqp.SS_TIMESTAMP = ss.PREV_H_TIMESTAMP
AND bqp.EDGE_VP = bq.EDGE_VP
AND bqp.EDGE_RM = bq.EDGE_RM
AND bqp.EDGE_ASM = bq.EDGE_ASM
LEFT JOIN basequery bqpd ON bqp.SS_TIMESTAMP = ss.PREV_D_TIMESTAMP
AND bqp.EDGE_VP = bq.EDGE_VP
AND bqp.EDGE_RM = bq.EDGE_RM
AND bqp.EDGE_ASM = bq.EDGE_ASM
LEFT JOIN basequery bqpw ON bqp.SS_TIMESTAMP = ss.PREV_W_TIMESTAMP
AND bqp.EDGE_VP = bq.EDGE_VP
AND bqp.EDGE_RM = bq.EDGE_RM
AND bqp.EDGE_ASM = bq.EDGE_ASM
LEFT JOIN basequery bqpm ON bqp.SS_TIMESTAMP = ss.PREV_M_TIMESTAMP
AND bqp.EDGE_VP = bq.EDGE_VP
AND bqp.EDGE_RM = bq.EDGE_RM
AND bqp.EDGE_ASM = bq.EDGE_ASM
LEFT JOIN basequery bqpq ON bqp.SS_TIMESTAMP = ss.PREV_Q_TIMESTAMP
AND bqp.EDGE_VP = bq.EDGE_VP
AND bqp.EDGE_RM = bq.EDGE_RM
AND bqp.EDGE_ASM = bq.EDGE_ASM
LEFT JOIN basequery bqpy ON bqp.SS_TIMESTAMP = ss.PREV_Y_TIMESTAMP
AND bqp.EDGE_VP = bq.EDGE_VP
AND bqp.EDGE_RM = bq.EDGE_RM
AND bqp.EDGE_ASM = bq.EDGE_ASM
ORDER BY ss.TIMESTAMP DESC,
bq.EDGE_VP ASC,
bq.EDGE_RM ASC,
bq.EDGE_ASM ASC
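On the question of replacing a self-join with a window function, one possible direction is LAG. The sketch below is an assumption rather than anything from the original thread, and it only covers the "previous snapshot" comparison for a single measure. It is equivalent to the join on ss.PREV_TIMESTAMP only if PREV_TIMESTAMP is always the immediately preceding snapshot and every VP/RM/ASM combination appears in each snapshot; otherwise LAG returns the last snapshot in which the combination occurred, where the join would return NULL.

```sql
-- Sketch only: previous-snapshot PIPELINE via LAG instead of a self-join.
WITH basequery AS (
  SELECT SS_TIMESTAMP, EDGE_VP, EDGE_RM, EDGE_ASM,
         NVL(SUM(CASE WHEN OPPTY_STATUS = 'Open' THEN ARR_PIPELINE END),0) AS PIPELINE
  FROM SS_EDGE_FORECAST
  WHERE EDGE_ASM NOT IN ('Email_Address1', 'Email_Address2')
  GROUP BY SS_TIMESTAMP, EDGE_VP, EDGE_RM, EDGE_ASM
)
SELECT bq.SS_TIMESTAMP,
       bq.EDGE_VP,
       bq.EDGE_RM,
       bq.EDGE_ASM,
       bq.PIPELINE,
       -- value of PIPELINE in the preceding snapshot for the same VP/RM/ASM
       LAG(bq.PIPELINE) OVER (PARTITION BY bq.EDGE_VP, bq.EDGE_RM, bq.EDGE_ASM
                              ORDER BY bq.SS_TIMESTAMP) AS PREV_PIPELINE
FROM basequery bq;
```

The other look-backs (PREV_F, PREV_H, PREV_D and so on) point at arbitrary earlier snapshots, so they would most likely still need joins against the SNAPSHOTS table.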
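Comments on the question:
Please explain the votes to close.
Ignore the close votes. Some people mistakenly think that every complex SQL question belongs on the DBA site.
Answer 1:
For your first query (select distinct), you want a composite index: SS_EDGE_FORECAST(SS_TIMESTAMP, EDGE_VP, EDGE_RM, EDGE_ASM). This should help the analytic functions. Four separate single-column indexes won't help much.
You can also include the other columns used in the expressions in the index, after the key columns.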
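As a sketch of what that composite index might look like (the index names here are hypothetical, and the second statement is just one reading of "include the other columns", using the columns the query's expressions reference):

```sql
-- Composite index matching the PARTITION BY column list.
CREATE INDEX ss_edge_forecast_part_ix
  ON SS_EDGE_FORECAST (SS_TIMESTAMP, EDGE_VP, EDGE_RM, EDGE_ASM);

-- Optional wider variant: the partition keys first, followed by the columns
-- used in the SUM(CASE ...) expressions, so that the query could in principle
-- be answered from the index alone.
CREATE INDEX ss_edge_forecast_cover_ix
  ON SS_EDGE_FORECAST (SS_TIMESTAMP, EDGE_VP, EDGE_RM, EDGE_ASM,
                       OPPTY_STATUS, ARR_PIPELINE, ARR_BEST, ARR_FORECAST, ARR);
```

Only one of the two would normally be needed; the wider one costs more space and slows down the nightly load somewhat.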
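Comments:
Yes, I had the same thought before posting this question. I created that index, but it still isn't used. You can see it in the second screenshot at the bottom.
Oh! Maybe I also need the ID column in that index. The table has a composite primary key with two columns (ID and SS_TIMESTAMP).
I figured out why the index wasn't being used... I had mistyped my LEFT JOINs in a terrible way that was also hard to notice. See my answer for details.
Answer 2:
As @Gordon wrote, it makes little sense for you to have an index on each column. Oracle generally will not use more than one index to access a table (except when you have multiple OR predicates). So your option is a multi-column index.
Some indexes also cannot be used at all, for example because rows whose key is NULL are not stored in a single-column index.
Try using: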
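alter session set optimizer_index_cost_adj=1;
This lowers the optimizer's estimated cost of index access for the session. If the index still isn't used under this setting, it probably cannot be used at all.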
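A minimal way to run that experiment might look like the following. It uses Oracle's OPTIMIZER_INDEX_COST_ADJ session parameter (default 100) and the standard EXPLAIN PLAN and DBMS_XPLAN.DISPLAY facilities; the query being explained is a cut-down version of the one in the question, chosen here only for illustration.

```sql
-- Make index access look very cheap to the optimizer, for this session only.
ALTER SESSION SET optimizer_index_cost_adj = 1;

EXPLAIN PLAN FOR
SELECT DISTINCT SS_TIMESTAMP, EDGE_VP, EDGE_RM, EDGE_ASM,
       COUNT(*) OVER (PARTITION BY SS_TIMESTAMP, EDGE_VP, EDGE_RM, EDGE_ASM) AS ROW_COUNT
FROM SS_EDGE_FORECAST
WHERE EDGE_ASM NOT IN ('Email_Address1', 'Email_Address2');

-- Show the plan that was just generated.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Restore the default so later statements are costed normally.
ALTER SESSION SET optimizer_index_cost_adj = 100;
```

If the plan still shows TABLE ACCESS FULL with the parameter at 1, the index is probably not usable for this query shape rather than merely judged too expensive.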
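Answer 3:
I solved this by:
a) Moving the smaller subqueries into materialized views with constraints and indexes. (This is a data-warehouse snapshot reporting workload and the tables are updated every day at 2:00 a.m., so I don't have to worry about constant "rebuild" overhead during the day.)
b) Fixing a copy/paste failure in my LEFT JOINs. Good grief... when I finally noticed this oversight, I wanted to punch myself in the face.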
LEFT JOIN basequery bqp ON bqp.SS_TIMESTAMP = ss.PREV_TIMESTAMP
AND bqp.EDGE_VP = bq.EDGE_VP
AND bqp.EDGE_RM = bq.EDGE_RM
AND bqp.EDGE_ASM = bq.EDGE_ASM
LEFT JOIN basequery bqpf ON bqp.SS_TIMESTAMP = ss.PREV_F_TIMESTAMP -- problem here
AND bqp.EDGE_VP = bq.EDGE_VP -- should be
AND bqp.EDGE_RM = bq.EDGE_RM -- bqpf.EDGE_RM etc.
AND bqp.EDGE_ASM = bq.EDGE_ASM
Every one of my dozen or so LEFT JOINs had this problem. No wonder the server choked on this query every time I tested it.
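Following the comments in the snippet above, the corrected joins would presumably read as below, with each join condition referring to that join's own alias. Only two of the joins are shown; the rest follow the same pattern.

```sql
-- Fragment of the FROM clause: every predicate uses the alias being joined.
LEFT JOIN basequery bqpf ON bqpf.SS_TIMESTAMP = ss.PREV_F_TIMESTAMP
                        AND bqpf.EDGE_VP  = bq.EDGE_VP
                        AND bqpf.EDGE_RM  = bq.EDGE_RM
                        AND bqpf.EDGE_ASM = bq.EDGE_ASM
LEFT JOIN basequery bqph ON bqph.SS_TIMESTAMP = ss.PREV_H_TIMESTAMP
                        AND bqph.EDGE_VP  = bq.EDGE_VP
                        AND bqph.EDGE_RM  = bq.EDGE_RM
                        AND bqph.EDGE_ASM = bq.EDGE_ASM
```

In the mistyped version, the ON clauses for bqpf, bqph and the later aliases never mentioned those aliases at all, so each of those joins had no restricting condition on its own rows and multiplied the result set.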
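After fixing the LEFT JOIN typos and reinforcing the query with a few helpful materialized views, it executes in 0.1 seconds, which is very exciting to me considering how much heavy lifting this query does and how useful it will be.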
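The answer does not show the materialized views themselves. A rough sketch of what one might look like for the base query is given below; the object names are hypothetical, the GROUP BY form of the query is an assumption, and the refresh is assumed to be triggered by the nightly load.

```sql
-- Sketch only: precompute the per-snapshot aggregates once per load.
CREATE MATERIALIZED VIEW edge_forecast_base_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
AS
SELECT SS_TIMESTAMP,
       CASE WHEN INSTR(EDGE_VP,'@oracle.com')=0 THEN EDGE_VP
            ELSE SUBSTR(EDGE_VP,1,INSTR(EDGE_VP,'@oracle.com')-1) END AS EDGE_VP,
       CASE WHEN INSTR(EDGE_RM,'@oracle.com')=0 THEN EDGE_RM
            ELSE SUBSTR(EDGE_RM,1,INSTR(EDGE_RM,'@oracle.com')-1) END AS EDGE_RM,
       CASE WHEN INSTR(EDGE_ASM,'@oracle.com')=0 THEN EDGE_ASM
            ELSE SUBSTR(EDGE_ASM,1,INSTR(EDGE_ASM,'@oracle.com')-1) END AS EDGE_ASM,
       NVL(SUM(CASE WHEN OPPTY_STATUS = 'Open' THEN ARR_PIPELINE END),0) AS PIPELINE,
       NVL(SUM(CASE WHEN OPPTY_STATUS = 'Open' THEN ARR_BEST END),0)     AS BEST,
       NVL(SUM(CASE WHEN OPPTY_STATUS = 'Open' THEN ARR_FORECAST END),0) AS FORECAST,
       NVL(SUM(CASE WHEN OPPTY_STATUS = 'Won'  THEN ARR END),0)          AS CLOSED,
       COUNT(*) AS ROW_COUNT
FROM SS_EDGE_FORECAST
WHERE EDGE_ASM NOT IN ('Email_Address1', 'Email_Address2')
GROUP BY SS_TIMESTAMP, EDGE_VP, EDGE_RM, EDGE_ASM;

-- Non-unique index on the columns the self-joins compare.
CREATE INDEX edge_forecast_base_mv_ix
  ON edge_forecast_base_mv (SS_TIMESTAMP, EDGE_VP, EDGE_RM, EDGE_ASM);

-- Complete refresh, to be run after the nightly load has finished.
BEGIN
  DBMS_MVIEW.REFRESH('EDGE_FORECAST_BASE_MV', 'C');
END;
/
```

The view in the question could then select from edge_forecast_base_mv in place of the WITH subquery, so the aggregation is not recomputed for each of the self-joins.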