Tuning a query to parse XML data on SQL Server 2014

Posted: 2016-12-07 18:30:22

Question:

I have a table in a SQL Server 2014 database that stores audit information about record changes in a VARCHAR(MAX) column (a poor man's CDC).

The data is formatted like this:

<span class="fieldname">Assigned To</span>
   changed from <span class="oldvalue">user1</span>
   to <span class="newvalue">user2</span><br />
<span class="fieldname">Status</span>
   changed from <span class="oldvalue">QA</span>
   to <span class="newvalue">Development</span><br />
<span class="fieldname">Progress</span>
   changed from <span class="oldvalue">Yes</span>
   to <span class="newvalue">No</span><br />
...

I need to parse this information and transpose it so that it looks like this:

Record    FieldName      OldValue   NewValue
------    ---------      --------   --------
1234      Assigned To    user1      user2
1234      Status         QA         Development
1234      Progress       Yes        No

The stored procedure tries to do this by converting the data to XML and then using XPath to retrieve the parts it needs:

;WITH TT AS (
   SELECT TransId,
      CAST('<root><rec>' + REPLACE(REPLACE(TransDescription, 'Ticket reopened... Status', 'Status'), '<br />', '</rec><rec>') + '</rec></root>' AS XML) TransXml
   FROM dbo.Trans
   WHERE TransDate >= '11/1/2016'
      AND (TransDescription LIKE '%Ticket reopened... Status%' OR TransDescription LIKE '%Status%'))
SELECT TransId,
   TransXml,
   FieldName = T.V.value('span[@class="fieldname"][1]', 'varchar(255)'),
   OldValue = NULLIF(T.V.value('span[@class="oldvalue"][1]', 'varchar(255)'), 'nothing'),
   NewValue = NULLIF(T.V.value('span[@class="newvalue"][1]', 'varchar(255)'), 'nothing')
INTO #tmp
FROM TT
   CROSS APPLY TT.TransXml.nodes('root/rec') T(V);

Here is the execution plan: https://www.brentozar.com/pastetheplan/?id=rJF2GRB7g

The corresponding IO statistics:

Table 'Trans'. Scan count 9, logical reads 27429, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 2964994, physical reads 0, read-ahead reads 0, lob logical reads 2991628, lob physical reads 0, lob read-ahead reads 0.

This query is very slow (the example covers only 10 days of data), and it gets slower and slower as the data grows.

What are my options for tuning this query?

Comments on the question:

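You only need TransDescription LIKE '%Status%'; it already covers every row matched by LIKE '%Ticket reopened... Status%'. I don't think it will help much performance-wise, though.

@HoneyBadger Yes, the leading '%' makes the predicate non-sargable, so performance already suffers because of that. According to the IO statistics, though, the part that is really getting hammered is the 'Worktable'.

What you really need to speed this up is some XML indexes. However, since you are creating the XML on the fly, that isn't going to happen. In effect, this is roughly equivalent to a CROSS JOIN and will get exponentially slower over time.

See ***.com/questions/24196516/… for a detailed discussion and for how indexes can help. If you want to do this via XML, you really need to store the XML so that you can index it.

@LaughingVergil Dumping the CTE into a temp table first and then shredding the XML did solve the problem. Thanks! Could you post that as an answer so I can accept it?

Answer 1: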

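What you really need to speed this up is some XML indexes. However, since you are creating the XML on the fly, that isn't going to happen. In effect, this is roughly equivalent to a CROSS JOIN and will get exponentially slower over time.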

See "cross apply xml query performs exponentially worse as xml document grows" for a detailed discussion and for how indexes can help. If you want to do this via XML, you really need to store the XML so that you can index it.
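In the comments above, the asker reports that materializing the CTE into a temporary table before shredding the XML solved the problem. Below is a minimal sketch of that two-step approach. It is not the asker's actual code. The one-row #Trans stand-in, the #TransXml table name and the '20161101' cutoff are assumptions; against the real database you would read from dbo.Trans instead. The sketch also applies the comment's point that LIKE '%Status%' on its own is enough.

```sql
-- Hypothetical stand-in for dbo.Trans so the sketch runs on its own;
-- against the real database, skip this setup and read from dbo.Trans.
IF OBJECT_ID('tempdb..#Trans') IS NOT NULL DROP TABLE #Trans;
CREATE TABLE #Trans (
    TransId          INT          NOT NULL,
    TransDate        DATE         NOT NULL,
    TransDescription VARCHAR(MAX) NULL
);
INSERT INTO #Trans (TransId, TransDate, TransDescription)
VALUES (1234, '20161201',
        '<span class="fieldname">Status</span> changed from <span class="oldvalue">QA</span> to <span class="newvalue">Development</span><br />');

-- Step 1: convert each description to XML once and store the result,
-- so nodes() reads a stored xml column instead of an expression in the plan.
IF OBJECT_ID('tempdb..#TransXml') IS NOT NULL DROP TABLE #TransXml;
SELECT TransId,
       TransXml = CAST('<root><rec>'
                       + REPLACE(REPLACE(TransDescription, 'Ticket reopened... Status', 'Status'),
                                 '<br />', '</rec><rec>')
                       + '</rec></root>' AS XML)
INTO #TransXml
FROM #Trans
WHERE TransDate >= '20161101'
  AND TransDescription LIKE '%Status%';

-- Step 2: shred the stored XML into one row per <rec> element.
IF OBJECT_ID('tempdb..#tmp') IS NOT NULL DROP TABLE #tmp;
SELECT X.TransId,
       FieldName = T.V.value('span[@class="fieldname"][1]', 'varchar(255)'),
       OldValue  = NULLIF(T.V.value('span[@class="oldvalue"][1]', 'varchar(255)'), 'nothing'),
       NewValue  = NULLIF(T.V.value('span[@class="newvalue"][1]', 'varchar(255)'), 'nothing')
INTO #tmp
FROM #TransXml AS X
CROSS APPLY X.TransXml.nodes('root/rec') AS T(V);

-- The trailing <br /> produces an empty <rec>, which shreds to an all-NULL row.
SELECT * FROM #tmp WHERE FieldName IS NOT NULL;
```

Whether this removes the Worktable reads in your own plan is something to confirm with SET STATISTICS IO ON, because the benefit reported in the comments comes from the asker's data, not from this sample.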

Answer 2:

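A CROSS JOIN is one of those things that doesn't scale well: as the tables get bigger, the "Number of Executions" on the "Nested Loops" operators grows exponentially. In the execution plan you submitted, that figure is over 600,000 for each loop. Your logical reads are actually quite low, but the same pages are processed over and over. (If your query ever outgrows the buffer size and spills to disk, that is when it will really hurt.)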

Here is a solution that lets you take advantage of an XML index, which may help in your case.

--PREPARE SAMPLE DATA
IF OBJECT_ID('tempdb..#Trans') IS NOT NULL DROP TABLE #Trans;
CREATE TABLE #Trans(TransID INT
                    ,TransDate DATE
                    ,TransDescription VARCHAR(MAX) --VARCHAR(MAX), AS IN THE QUESTION
                    )
INSERT INTO #Trans VALUES
(
1, '20160101'
,'<span class="fieldname">Assigned To</span>
   changed from <span class="oldvalue">user1</span>
   to <span class="newvalue">user2</span><br />
<span class="fieldname">Status</span>
   changed from <span class="oldvalue">QA</span>
   to <span class="newvalue">Development</span><br />
<span class="fieldname">Progress</span>
   changed from <span class="oldvalue">Yes</span>
   to <span class="newvalue">No</span><br />')
,(2, '20160101'
,'<span class="fieldname">Assigned To</span>
   changed from <span class="oldvalue">user1</span>
   to <span class="newvalue">user2</span><br />
<span class="fieldname">Status</span>
   changed from <span class="oldvalue">QA</span>
   to <span class="newvalue">Development</span><br />
<span class="fieldname">Progress</span>
   changed from <span class="oldvalue">Yes</span>
   to <span class="newvalue">No</span><br />')
,(3, '20160101'
,'<span class="fieldname">Assigned To</span>
   changed from <span class="oldvalue">user1</span>
   to <span class="newvalue">user2</span><br />
<span class="fieldname">Status</span>
   changed from <span class="oldvalue">QA</span>
   to <span class="newvalue">Development</span><br />
<span class="fieldname">Progress</span>
   changed from <span class="oldvalue">Yes</span>
   to <span class="newvalue">No</span><br />')

---------------------------------------------------------------------------------------------------
--RUN EVERYTHING BELOW THIS LINE TOGETHER; THE ORIGINAL QUERY SHOWS UP AS APPROX. 93% OF THE OVERALL COST

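--BUILD A TEMP TABLE TO RECEIVE XML-FORMATTED DATA
--(A PRIMARY XML INDEX REQUIRES A CLUSTERED PRIMARY KEY ON THE TABLE;
-- THE KEY IS LEFT UNNAMED SO CONCURRENT SESSIONS DON'T COLLIDE ON THE NAME)
IF OBJECT_ID('tempdb..#XmlData') IS NOT NULL DROP TABLE #XmlData;
CREATE TABLE #XmlData (
    TransId INT NOT NULL PRIMARY KEY CLUSTERED,
    TransXml xml NOT NULL
) 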

--INSERT DATA INTO XML TABLE
INSERT INTO #XmlData
SELECT TransId,
    CAST('<root><rec>' + REPLACE(REPLACE(TransDescription, 'Ticket reopened... Status', 'Status'), '<br />', '</rec><rec>') + '</rec></root>' AS XML) TransXml
FROM #Trans
WHERE TransDate >= '11/1/2015'
    AND (TransDescription LIKE '%Ticket reopened... Status%' OR TransDescription LIKE '%Status%')

--CREATE AN XML INDEX
CREATE PRIMARY XML INDEX PXML_TransXml
ON #XmlData(TransXml)

--APPLY NODES QUERY AGAINST XML INDEX
SELECT TransId,
   TransXml,
   FieldName = T.V.value('span[@class="fieldname"][1]', 'varchar(255)'),
   OldValue = NULLIF(T.V.value('span[@class="oldvalue"][1]', 'varchar(255)'), 'nothing'),
   NewValue = NULLIF(T.V.value('span[@class="newvalue"][1]', 'varchar(255)'), 'nothing')
FROM #XmlData TT
   CROSS APPLY TT.TransXml.nodes('root/rec') T(V);

---------------------------------
--Original Query
;WITH TT AS (
   SELECT TransId,
      CAST('<root><rec>' + REPLACE(REPLACE(TransDescription, 'Ticket reopened... Status', 'Status'), '<br />', '</rec><rec>') + '</rec></root>' AS XML) TransXml
   FROM #Trans--dbo.Trans
   WHERE TransDate >= '11/1/2015'
      AND (TransDescription LIKE '%Ticket reopened... Status%' OR TransDescription LIKE '%Status%'))

SELECT TransId,
   TransXml,
   FieldName = T.V.value('span[@class="fieldname"][1]', 'varchar(255)'),
   OldValue = NULLIF(T.V.value('span[@class="oldvalue"][1]', 'varchar(255)'), 'nothing'),
   NewValue = NULLIF(T.V.value('span[@class="newvalue"][1]', 'varchar(255)'), 'nothing')
--INTO #tmp
FROM TT
   CROSS APPLY TT.TransXml.nodes('root/rec') T(V);
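This answer builds the XML index on a temporary table, so the XML is converted and indexed again on every run. Answer 1 instead suggests storing the XML so that the index can persist. Below is a minimal sketch of that idea. It assumes dbo.Trans from the question with a unique TransId. The side table dbo.TransXmlStore, its key PK_TransXmlStore and the index PXML_TransXmlStore are hypothetical names, and the '20161101' cutoff is only an example.

```sql
-- Hypothetical side table that keeps the converted XML permanently.
-- A primary XML index requires a clustered primary key on the table.
IF OBJECT_ID('dbo.TransXmlStore', 'U') IS NULL
BEGIN
    CREATE TABLE dbo.TransXmlStore (
        TransId  INT NOT NULL CONSTRAINT PK_TransXmlStore PRIMARY KEY CLUSTERED,
        TransXml XML NOT NULL
    );
    CREATE PRIMARY XML INDEX PXML_TransXmlStore ON dbo.TransXmlStore (TransXml);
END
GO

-- Convert and store only audit rows that are not in the side table yet.
-- Rerun this step (e.g. from a scheduled job) to pick up new rows.
-- Assumes TransId is unique in dbo.Trans.
INSERT INTO dbo.TransXmlStore (TransId, TransXml)
SELECT tr.TransId,
       CAST('<root><rec>'
            + REPLACE(REPLACE(tr.TransDescription, 'Ticket reopened... Status', 'Status'),
                      '<br />', '</rec><rec>')
            + '</rec></root>' AS XML)
FROM dbo.Trans AS tr
WHERE tr.TransDescription LIKE '%Status%'
  AND NOT EXISTS (SELECT 1 FROM dbo.TransXmlStore AS s WHERE s.TransId = tr.TransId);
GO

-- Shred from the persisted, indexed column; the date filter stays on dbo.Trans.
SELECT s.TransId,
       FieldName = T.V.value('span[@class="fieldname"][1]', 'varchar(255)'),
       OldValue  = NULLIF(T.V.value('span[@class="oldvalue"][1]', 'varchar(255)'), 'nothing'),
       NewValue  = NULLIF(T.V.value('span[@class="newvalue"][1]', 'varchar(255)'), 'nothing')
FROM dbo.TransXmlStore AS s
JOIN dbo.Trans AS tr ON tr.TransId = s.TransId
CROSS APPLY s.TransXml.nodes('root/rec') AS T(V)
WHERE tr.TransDate >= '20161101';
```

This approach has costs:

- The index adds storage and write overhead.
- The side table has to be kept in step with dbo.Trans.

Whether the index actually speeds up this nodes()/value() pattern is something to check in the execution plan rather than assume.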
