Performance of DELETE with NOT IN (SELECT ...)


Posted: 2016-03-19 16:54:28

Question:

I have these two tables and want to delete from ms_author every author that does not exist in author.

author (1.6 million rows)

+-------+-------------+------+-----+-------+
| Field | Type        | Null | Key | index |
+-------+-------------+------+-----+-------+
| id    | text        | NO   | PRI | true  |
| name  | text        | YES  |     |       |
+-------+-------------+------+-----+-------+

ms_author (120 million rows)

+-------+-------------+------+-----+-------+
| Field | Type        | Null | Key | index |
+-------+-------------+------+-----+-------+
| id    | text        | NO   | PRI |       |
| name  | text        | YES  |     | true  |
+-------+-------------+------+-----+-------+

This is my query:

DELETE
FROM ms_author AS m
WHERE m.name NOT IN
      (SELECT a.name
       FROM author AS a);

I tried to estimate the query duration: roughly 130 hours. Is there a faster way to achieve this?

EDIT:

EXPLAIN VERBOSE output:

Delete on public.ms_author m  (cost=0.00..2906498718724.75 rows=59946100 width=6)
  ->  Seq Scan on public.ms_author m  (cost=0.00..2906498718724.75 rows=59946100 width=6)
        Output: m.ctid
        Filter: (NOT (SubPlan 1))
        SubPlan 1
          ->  Materialize  (cost=0.00..44334.43 rows=1660295 width=15)
                Output: a.name
                ->  Seq Scan on public.author a  (cost=0.00..27925.95 rows=1660295 width=15)
                      Output: a.name

Index on author(name):

create index author_name on author(name);

Index on ms_author(name):

create index ms_author_name on ms_author(name);

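Comments:

I think it would be better to use join or exists.

Is text an indexed field?

I think what danihp means is: is the author.name column indexed?

@jarlh: author.name is not indexed, whereas ms_author.name currently is.

@a_horse_with_no_name: I assume that by "output" you mean the table representation, right? I made it by hand. ;)

Answer 1:

I'm a big fan of the "anti-join". It works efficiently for both large and small datasets: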

delete from ms_author ma
where not exists (
  select null
  from author a
  where ma.name = a.name
)
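
One thing worth checking before swapping one form for the other: NOT IN and NOT EXISTS treat NULLs differently, and both name columns are nullable according to the table descriptions in the question. The following is a minimal sketch of that difference. The scratch tables demo_author and demo_ms_author and their rows are made up for illustration and are not part of the original schema.

```sql
-- Hypothetical scratch tables; names and rows are invented for this sketch.
CREATE TEMP TABLE demo_author (name text);
CREATE TEMP TABLE demo_ms_author (id text PRIMARY KEY, name text);

INSERT INTO demo_author VALUES ('alice'), (NULL);
INSERT INTO demo_ms_author VALUES ('1', 'alice'), ('2', 'bob');

-- NOT IN: 'bob' NOT IN ('alice', NULL) evaluates to NULL rather than true,
-- so a single NULL in the subquery result means no row qualifies.
SELECT m.id
FROM demo_ms_author AS m
WHERE m.name NOT IN (SELECT a.name FROM demo_author AS a);

-- NOT EXISTS: only asks whether a matching row exists.
-- No demo_author row equals 'bob', so the row with id '2' qualifies.
SELECT m.id
FROM demo_ms_author AS m
WHERE NOT EXISTS (
  SELECT 1
  FROM demo_author AS a
  WHERE a.name = m.name
);
```

The reverse case matters too: a row in ms_author whose name is NULL is never matched by the NOT IN form, but it does satisfy NOT EXISTS, so the anti-join version would delete it. If neither column actually contains NULLs, the two forms select the same rows.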

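Comments:

This is the way to go. NOT IN (SELECT ...) is a tricky clause. There are usually better alternatives.

Thanks! :) It took about 10 hours. Compared with roughly 130 hours, that is a huge improvement! ;)

Answer 2:

A DELETE query that uses NOT IN usually results in a nested loop anti-join, which performs poorly. You can rewrite your query as follows: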

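DELETE FROM ms_author AS m
WHERE m.id IN
      -- ids of ms_author rows whose name has no match in author
      (SELECT m2.id
       FROM ms_author AS m2
       LEFT JOIN author AS a ON m2.name = a.name
       WHERE a.name IS NULL);

Another advantage of this approach is that the rows are deleted by the primary key "id", which should be much faster.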
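
Before running a delete of this size, it may help to preview what it would do. The sketch below is not from the original answer. It assumes the table and column names from the question and relies on the fact that plain EXPLAIN (without ANALYZE) only plans a statement and does not execute it.

```sql
-- How many rows would the anti-join remove? Read-only.
SELECT count(*)
FROM ms_author AS m
LEFT JOIN author AS a ON m.name = a.name
WHERE a.name IS NULL;

-- Show the plan for the rewritten DELETE without executing it.
-- Look for a Hash or Merge Anti Join instead of "Filter: (NOT (SubPlan 1))".
EXPLAIN
DELETE FROM ms_author AS m
WHERE NOT EXISTS (
  SELECT 1
  FROM author AS a
  WHERE a.name = m.name
);
```

Comparing the estimated cost here with the 2906498718724.75 reported in the question gives a rough idea of whether the planner picked a better strategy. The estimates are planner units, not wall-clock time.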
