SQL Server : cleaning old data takes too long
【Posted】2020-07-21 08:29:55

【Question】I'm trying to clean old data out of my SQL Server database (about 5,000 entries per table), but because I loop one CURSOR inside another, it takes far too long (more than an hour).
BEGIN
    DECLARE @UserId int
    DECLARE @productNum varchar(50)

    DECLARE user_ids CURSOR FOR
        SELECT id
        FROM Users
        WHERE productId IN (SELECT ap.id
                            FROM Account AS a, AccountProduct AS ap
                            WHERE a.id = ap.accountId
                              AND a.name IN ('XXXXXX', 'XXXXXX', 'XXXXXX', 'XXXXXX'))

    DECLARE product_cur CURSOR FOR
        SELECT ap.id
        FROM Account AS a, AccountProduct AS ap
        WHERE a.id = ap.accountId
          AND a.name IN ('XXXXXX', 'XXXXXX', 'XXXXXX', 'XXXXXX')

    OPEN user_ids
    FETCH NEXT FROM user_ids INTO @UserId

    WHILE @@FETCH_STATUS = 0
    BEGIN
        OPEN product_cur
        FETCH NEXT FROM product_cur INTO @productNum

        WHILE @@FETCH_STATUS = 0
        BEGIN
            DELETE FROM UserRole
            WHERE userId = @UserId
              AND productId = (SELECT id
                               FROM AccountProduct
                               WHERE number = @productNum)

            DELETE FROM AccountProduct
            WHERE number = @productNum

            FETCH NEXT FROM product_cur INTO @productNum
        END

        CLOSE product_cur

        DELETE FROM Users
        WHERE id = @UserId
          AND accountId IN (SELECT id FROM Account
                            WHERE name IN ('XXXXXX', 'XXXXXX', 'XXXXXX', 'XXXXXX'))

        FETCH NEXT FROM user_ids INTO @UserId
    END

    CLOSE user_ids
    DEALLOCATE user_ids
    DEALLOCATE product_cur
END
Do you know of a better way to get this task done?
【Comments】:
Bad habits to kick : using old-style JOINs. The old-style comma-separated list of tables was replaced with the proper ANSI JOIN syntax in the ANSI-92 SQL standard (more than 25 years ago), and its use is discouraged.
Not using a CURSOR would be a start, especially a cursor inside a cursor. SQL is a set-based language, and you should use a set-based solution.
Is the above the complete SQL? For example, you reference the cursor room_cur, but you never declare it.
@Larnu, that was my mistake; room_cur should have been product_cur, and I have edited the question. Sorry for the error.
【Answer 1】:
This is too long for a comment, but it should be enough to give you the right idea. However, the SQL above does not appear to be the complete SQL, so I can't give you SQL with identical behaviour (for example, you reference the cursor room_cur in a FETCH statement, but no cursor room_cur is declared in the SQL).
SQL is a set-based language, and it excels at set-based solutions. Cursors are not a set-based solution; they are iterative tasks, and SQL Server is bad at those. That is by design. SQL is not a programming language, so writing it like one means poor performance.
For a DELETE statement, you simply treat it like any other statement, as DELETE removes rows from the table defined in the FROM out of the dataset that is returned. For the DELETE on Users, that (probably) means you want something like this:
DELETE U
FROM dbo.Users U
JOIN dbo.Account A ON U.accountId = A.id
WHERE A.[name] IN ('XXXXXX', 'XXXXXX', 'XXXXXX', 'XXXXXX');
This simply DELETEs the rows in dbo.Users for which a joined row is found in dbo.Account whose value of name is in the IN clause.
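Extending the same idea to the whole cleanup is a sketch only, assuming the table and column names shown in the question (UserRole.userId, UserRole.productId, Users.accountId, AccountProduct.accountId); the three cursor-driven deletes collapse into three set-based statements, ordered so that referencing rows are removed first:

```sql
-- Sketch: set-based version of the whole cleanup, assuming the
-- question's schema. Delete child rows (UserRole) first, then Users,
-- then the AccountProduct rows themselves.
DELETE UR
FROM dbo.UserRole UR
JOIN dbo.AccountProduct AP ON UR.productId = AP.id
JOIN dbo.Account A ON AP.accountId = A.id
WHERE A.[name] IN ('XXXXXX', 'XXXXXX', 'XXXXXX', 'XXXXXX');

DELETE U
FROM dbo.Users U
JOIN dbo.Account A ON U.accountId = A.id
WHERE A.[name] IN ('XXXXXX', 'XXXXXX', 'XXXXXX', 'XXXXXX');

DELETE AP
FROM dbo.AccountProduct AP
JOIN dbo.Account A ON AP.accountId = A.id
WHERE A.[name] IN ('XXXXXX', 'XXXXXX', 'XXXXXX', 'XXXXXX');
```

With indexes on the join columns, each statement runs as a single set operation instead of roughly 5,000 individual round trips per table.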
【Comments】:
【Answer 2】: You can commit frequently as you delete. I suggest deleting one user ID at a time inside a transaction and committing as you go.
Go for batch-based deletion.
DECLARE @UserIdsToDelete TABLE(RowNo int, UserId int)
DECLARE @ProductsToDelete TABLE(RowNo int, ProductId int)
INSERT INTO @UserIdsToDelete
SELECT ROW_NUMBER() OVER (ORDER BY UserId) as RowNo, UserId
FROM Users
WHERE productId IN (SELECT ap.id
FROM Account AS a, AccountProduct AS ap
WHERE a.id = ap.accountId
AND a.name IN ('XXXXXX', 'XXXXXX', 'XXXXXX', 'XXXXXX'))
INSERT INTO @ProductsToDelete
SELECT ROW_NUMBER() OVER (ORDER BY ap.id) as RowNo, ap.id
FROM Account AS a, AccountProduct AS ap
WHERE a.id = ap.accountId
AND a.name IN ('XXXXXX', 'XXXXXX', 'XXXXXX', 'XXXXXX')
DECLARE @UserIdForDeletion INT
DECLARE @RowNoForDeletion INT = 1
SET @UserIdForDeletion = (SELECT UserID from
@UserIdsToDelete
WHERE RowNO = @RowNoForDeletion )
-- Deletion of Users
WHILE (@UserIdForDeletion IS NOT NULL )
BEGIN
BEGIN TRY
SET XACT_ABORT ON
BEGIN TRANSACTION
DELETE FROM UserRole
WHERE UserId = @UserIdForDeletion
AND productId IN (SELECT ProductID from @ProductsToDelete)
DELETE FROM Users
WHERE id = @UserIdForDeletion
AND accountId IN (SELECT id FROM Account
WHERE name IN ('XXXXXX', 'XXXXXX', 'XXXXXX', 'XXXXXX'))
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
IF XACT_STATE() <> 0
ROLLBACK TRANSACTION;
THROW;
END CATCH
SET @RowNoForDeletion += 1;
SET @UserIdForDeletion = (SELECT UserID from
@UserIdsToDelete
WHERE RowNO = @RowNoForDeletion )
END
-- Delete the account products
BEGIN TRY
SET XACT_ABORT ON
BEGIN TRANSACTION
DELETE FROM AccountProduct
WHERE id IN (SELECT ProductId FROM @ProductsToDelete)
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
IF XACT_STATE() <> 0
ROLLBACK TRANSACTION;
THROW;
END CATCH
【Comments】:
【Answer 3】: Don't use cursors, as they will certainly hurt the performance of your query. Why not perform the deletes in smaller batches to speed up execution? How to delete large data of table in SQL without log?
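The batching idea can be sketched as follows, assuming the question's AccountProduct and Account tables (the batch size of 1000 is an arbitrary choice). Repeating DELETE TOP (N) in a loop keeps each transaction, and therefore lock duration and log growth, small:

```sql
-- Sketch: batched delete, assuming the question's schema.
-- Each iteration removes at most 1000 rows in its own small transaction.
WHILE 1 = 1
BEGIN
    DELETE TOP (1000) AP
    FROM dbo.AccountProduct AP
    JOIN dbo.Account A ON AP.accountId = A.id
    WHERE A.[name] IN ('XXXXXX', 'XXXXXX', 'XXXXXX', 'XXXXXX');

    IF @@ROWCOUNT = 0 BREAK;  -- nothing left to delete
END
```

The same loop shape applies to the UserRole and Users deletes; run those first so no rows still reference the AccountProduct rows being removed.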
【Comments】: