How to do performance tuning on this query

Posted 2020-12-17 03:58:06

Question:

I have the following query, which takes a very long time (about 2 hours) to execute:

```sql
CREATE TABLE #compareList
(
     id  INT IDENTITY(1,1),
     poy_no varchar(max),
     poy_stat_cd varchar(max),
     poy_eff_dd datetime,
     poy_exp_dd datetime,
     [Name] [nvarchar] (max)
);

DECLARE @poy_no varchar(max), @poy_stat_cd varchar(max),
        @poy_eff_dd datetime, @poy_exp_dd datetime, @remarks nvarchar(max)

DECLARE C_Compare CURSOR STATIC FOR
    SELECT b.poy_no, b.poy_stat_cd, b.poy_eff_dd, b.poy_exp_dd, a.remarks
    FROM table1 a

OPEN C_Compare

FETCH NEXT FROM C_Compare
    INTO @poy_no, @poy_stat_cd, @poy_eff_dd, @poy_exp_dd, @remarks

WHILE @@FETCH_STATUS = 0
BEGIN
    INSERT INTO #compareList
        SELECT @poy_no, @poy_stat_cd, @poy_eff_dd, @poy_exp_dd, @remarks

    FETCH NEXT FROM C_Compare
        INTO @poy_no, @poy_stat_cd, @poy_eff_dd, @poy_exp_dd, @remarks
END

CLOSE C_Compare;
DEALLOCATE C_Compare;

-- This query has the performance issue
SELECT
    COUNT(1)
FROM
    #compareList a,
    (SELECT
          pid, single_string_name, original_script_name,
          surname, first_name, middle_name
      FROM
          DJ_PERSON WITH (INDEX (NCIndex_all_needed_columns))) AS p,
    (SELECT pid, desc1 FROM PERSON_DESC) AS pd,
    DESC1 AS d
WHERE
    p.pid = pd.pid
    AND pd.desc1 = d.d1id
    AND replace(replace(replace(rtrim(ltrim(a.name)), ' ',''), ',',''), '.','') != ''
    AND (replace(replace(replace(a.Name, ' ',''), ',',''), '.','') = replace(replace(replace(p.single_string_name, ' ',''), ',',''), '.','')
        COLLATE database_default
        OR replace(replace(replace(a.Name, ' ',''), ',',''), '.','') = replace(replace(replace(p.original_script_name, ' ',''), ',',''), '.','')
        COLLATE database_default
        OR
         replace(replace(replace(a.Name, ' ',''), ',',''), '.','') = replace(replace(replace(p.surname+p.first_name+p.middle_name, ' ',''), ',',''), '.','')
        )
```

Below are the row counts for each table. The PERSON and PERSON_DESC tables have high row counts.

PERSON - 4638768

PERSON_DESC - 2040027

#compareList - 26

I tried applying clustered and non-clustered indexes on the PERSON and PERSON_DESC tables.

On the PERSON table, I applied an index on pid, single_string_name, original_script_name, surname, first_name, middle_name.

On the PERSON_DESC table, I applied an index on pid, desc1.

Here is the STATISTICS IO output:

```
Table '#compareList________________________________________________________________________________________________________0000000001C5'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 8055799, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 16, logical reads 43232, physical reads 5431, read-ahead reads 42753, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'PERSON'. Scan count 1, logical reads 42966, physical reads 1, read-ahead reads 10440, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'DESC'. Scan count 1, logical reads 7060, physical reads 1, read-ahead reads 7054, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'DESC1'. Scan count 1, logical reads 1, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
```

What changes can I make to reduce the execution time of this query?

Comments:

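Bad habits to kick: using old-style JOINs. That old-style comma-separated list of tables was replaced with the proper ANSI JOIN syntax in the ANSI-92 SQL Standard (more than 25 years ago), and its use is discouraged.

Why are you using a cursor to perform a simple insert into a table? Is something missing from the original query?

Like @marc_s said, fix those old-style JOINs. You are also JOINing on those heavily modified columns, which kills any indexes you have.

If there isn't something seriously wrong with the query... taking 2 hours to run means either your storage is very slow or you are running it on a potato, because based on your statistics you are reading ~62GB and processing it in 2 hours.

@DaleK: Well, you can cut potatoes into so-called "chips", and databases run great on those.

Answer: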

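You have a serious sargability problem caused by all the function calls in your WHERE clause, so few, if any, of your indexes will be used. I have a few suggestions.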
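As a minimal illustration of the difference (the name literals are made up, and the two queries are not equivalent; they only contrast the two predicate shapes): when the indexed column is wrapped in a function, SQL Server has to evaluate the function for every row, whereas comparing the bare column lets it seek an index on that column.

```sql
-- Non-sargable: REPLACE must be evaluated for every row's single_string_name,
-- so an index on that column can at best be scanned, not seeked.
SELECT p.pid
FROM DJ_PERSON AS p
WHERE REPLACE(p.single_string_name, ' ', '') = N'JohnSmith';

-- Sargable: the column is compared unchanged, so an index whose leading key
-- column is single_string_name can be used for a seek.
SELECT p.pid
FROM DJ_PERSON AS p
WHERE p.single_string_name = N'John Smith';
```

A seek also needs an index whose leading key column is the one being compared. If the index on PERSON described in the question has its key columns in the order listed (pid first), even the second predicate could not seek it.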


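First, if you have any way to limit the records that need to be tested before calling any functions, do so: put the results into a temp table, then run the function-based WHERE clause against it. Something like: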

```sql
SELECT columns, computed columns that can be computed here (one side of each comparison)
INTO #MyTempTable
FROM MyTable
WHERE my_sargable_conditions;

-- Potentially add some indexes on the temp table's computed columns

SELECT columns
FROM #MyTempTable
WHERE my_non_sargable_conditions;
```
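Applied to this query, that pattern might look like the sketch below. It assumes the plain equi-joins to PERSON_DESC and DESC1 are the only conditions that can be applied before any function call; #PersonClean and its column and index names are hypothetical. If those joins don't actually filter out many rows, the gain comes mostly from computing each REPLACE chain once per PERSON row and then matching the 26 names against indexed columns.

```sql
-- Hypothetical sketch: compute the PERSON side of each comparison once.
SELECT p.pid,
       REPLACE(REPLACE(REPLACE(p.single_string_name,   ' ', ''), ',', ''), '.', '') AS ssn_clean,
       REPLACE(REPLACE(REPLACE(p.original_script_name, ' ', ''), ',', ''), '.', '') AS osn_clean,
       REPLACE(REPLACE(REPLACE(p.surname + p.first_name + p.middle_name, ' ', ''), ',', ''), '.', '') AS full_clean
INTO #PersonClean
FROM DJ_PERSON AS p
JOIN PERSON_DESC AS pd ON pd.pid = p.pid   -- sargable equi-joins only
JOIN DESC1 AS d ON d.d1id = pd.desc1;

-- These fail if a name column is (n)varchar(max): (max) columns cannot be index keys.
CREATE INDEX IX_PersonClean_ssn  ON #PersonClean (ssn_clean);
CREATE INDEX IX_PersonClean_osn  ON #PersonClean (osn_clean);
CREATE INDEX IX_PersonClean_full ON #PersonClean (full_clean);

-- Normalize the 26 names once and compare them with the precomputed columns.
SELECT COUNT(1)
FROM (
    SELECT REPLACE(REPLACE(REPLACE(c.[Name], ' ', ''), ',', ''), '.', '') COLLATE database_default AS name_clean
    FROM #compareList AS c
) AS a
JOIN #PersonClean AS t
    ON t.ssn_clean = a.name_clean
    OR t.osn_clean = a.name_clean
    OR t.full_clean = a.name_clean
WHERE a.name_clean <> '';
```

Whether the optimizer seeks all three indexes for the OR is its decision, so check the actual execution plan; if it doesn't, the same temp table can be combined with the UNION ALL idea below.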

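Second, ORing multiple conditions together is a well-known performance problem, because the optimizer often cannot use an index for any of them. It can be worked around with UNION ALL, one branch per condition, e.g. (note that a row satisfying more than one condition is then counted once per branch, whereas the OR version counts it only once):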

```sql
-- One branch per OR'd condition; the outer query counts the rows from all three branches.
SELECT COUNT(1)
FROM (
    SELECT a.id
    FROM #compareList a, DJ_PERSON p, PERSON_DESC pd, DESC1 d
    WHERE p.pid = pd.pid
    AND pd.desc1 = d.d1id
    AND replace(replace(replace(rtrim(ltrim(a.[Name])), ' ',''), ',',''), '.','') != ''
    AND replace(replace(replace(a.[Name], ' ',''), ',',''), '.','') = replace(replace(replace(p.single_string_name, ' ',''), ',',''), '.','') COLLATE database_default

    UNION ALL

    SELECT a.id
    FROM #compareList a, DJ_PERSON p, PERSON_DESC pd, DESC1 d
    WHERE p.pid = pd.pid
    AND pd.desc1 = d.d1id
    AND replace(replace(replace(rtrim(ltrim(a.[Name])), ' ',''), ',',''), '.','') != ''
    AND replace(replace(replace(a.[Name], ' ',''), ',',''), '.','') = replace(replace(replace(p.original_script_name, ' ',''), ',',''), '.','') COLLATE database_default

    UNION ALL

    SELECT a.id
    FROM #compareList a, DJ_PERSON p, PERSON_DESC pd, DESC1 d
    WHERE p.pid = pd.pid
    AND pd.desc1 = d.d1id
    AND replace(replace(replace(rtrim(ltrim(a.[Name])), ' ',''), ',',''), '.','') != ''
    AND replace(replace(replace(a.[Name], ' ',''), ',',''), '.','') = replace(replace(replace(p.surname+p.first_name+p.middle_name, ' ',''), ',',''), '.','')
) AS matches;
```

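Third, if the first two suggestions don't help, you may need to consider materializing the data you use in your WHERE clause. As an example, I mean taking: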

```sql
replace(replace(replace(p.single_string_name, ' ',''), ',',''), '.','') COLLATE database_default
```

and storing that value in a new column on table p (DJ_PERSON), which you can then index. You may have to write triggers to keep it maintained.
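One alternative to the triggers mentioned above, sketched here under assumptions: because REPLACE is deterministic, SQL Server can maintain the value itself as a persisted computed column, which can then be indexed. The column and index names are hypothetical, and the indexes only work if the source columns are not (n)varchar(max). Indexing a computed column also requires the usual SET options (for example ANSI_NULLS and QUOTED_IDENTIFIER ON), and adding a persisted column computes and stores the value for all 4.6 million existing rows, so try it outside busy hours.

```sql
-- Hypothetical names. The GO separators let later batches compile against the new columns.
ALTER TABLE DJ_PERSON ADD
    ssn_clean  AS REPLACE(REPLACE(REPLACE(single_string_name,   ' ', ''), ',', ''), '.', '') PERSISTED,
    osn_clean  AS REPLACE(REPLACE(REPLACE(original_script_name, ' ', ''), ',', ''), '.', '') PERSISTED,
    full_clean AS REPLACE(REPLACE(REPLACE(surname + first_name + middle_name, ' ', ''), ',', ''), '.', '') PERSISTED;
GO

CREATE INDEX IX_DJ_PERSON_ssn_clean  ON DJ_PERSON (ssn_clean)  INCLUDE (pid);
CREATE INDEX IX_DJ_PERSON_osn_clean  ON DJ_PERSON (osn_clean)  INCLUDE (pid);
CREATE INDEX IX_DJ_PERSON_full_clean ON DJ_PERSON (full_clean) INCLUDE (pid);
GO

-- The PERSON side of each comparison is now a plain, indexed column.
SELECT COUNT(1)
FROM #compareList AS a
CROSS APPLY (
    SELECT REPLACE(REPLACE(REPLACE(a.[Name], ' ', ''), ',', ''), '.', '') COLLATE database_default AS name_clean
) AS n
JOIN DJ_PERSON AS p
    ON p.ssn_clean = n.name_clean
    OR p.osn_clean = n.name_clean
    OR p.full_clean = n.name_clean
JOIN PERSON_DESC AS pd ON pd.pid = p.pid
JOIN DESC1 AS d ON d.d1id = pd.desc1
WHERE n.name_clean <> '';
```

Since the column values are recomputed automatically on INSERT and UPDATE, there is no trigger to keep in sync, at the cost of some extra storage and write overhead on DJ_PERSON.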

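That said, since part of your data is already in a temp table, #compareList, you should store the comparison value directly in the temp table, i.e. add another column that stores: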

```sql
replace(replace(replace(rtrim(ltrim(a.[Name])), ' ',''), ',',''), '.','')
```

and then possibly index it.
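A minimal sketch of that, assuming the normalized names fit in 450 characters (so the column can be an index key on any SQL Server version) and that the PERSON name columns use the database's default collation; Name_clean and the index name are hypothetical. Because REPLACE already strips every space, the LTRIM/RTRIM make no difference to the result, so this one column serves both the `!= ''` filter and the three comparisons.

```sql
-- COLLATE database_default gives the column the current database's collation
-- rather than tempdb's, so the comparisons below need no explicit COLLATE.
ALTER TABLE #compareList ADD Name_clean nvarchar(450) COLLATE database_default NULL;
GO

-- Raises a truncation error (rather than silently cutting) if a value exceeds 450 characters.
UPDATE #compareList
SET Name_clean = REPLACE(REPLACE(REPLACE(LTRIM(RTRIM([Name])), ' ', ''), ',', ''), '.', '');

-- Optional: with only 26 rows this index will matter little.
CREATE INDEX IX_compareList_Name_clean ON #compareList (Name_clean);

SELECT COUNT(1)
FROM #compareList AS a
JOIN DJ_PERSON AS p
    ON a.Name_clean = REPLACE(REPLACE(REPLACE(p.single_string_name,   ' ', ''), ',', ''), '.', '')
    OR a.Name_clean = REPLACE(REPLACE(REPLACE(p.original_script_name, ' ', ''), ',', ''), '.', '')
    OR a.Name_clean = REPLACE(REPLACE(REPLACE(p.surname + p.first_name + p.middle_name, ' ', ''), ',', ''), '.', '')
JOIN PERSON_DESC AS pd ON pd.pid = p.pid
JOIN DESC1 AS d ON d.d1id = pd.desc1
WHERE a.Name_clean <> '';
```

This only removes the work on the #compareList side; the PERSON side is still computed per row unless it is materialized as described in the third suggestion.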

Comments:

Thanks, Dale. Yes, it did help.
