SQL: find missing IDs in a table

Posted 2023-04-12


【Title】: SQL: find missing IDs in a table 【Posted】: 2009-09-07 14:12:15 【Question】:

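I have a table with a unique auto-increment primary key. Over time, entries may be deleted from the table, so there are "holes" in this field's values. For example, the table data might look like this: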

 ID  | Value    | More fields...
---------------------------------
 2   | Cat      | ... 
 3   | Fish     | ...
 6   | Dog      | ...
 7   | Aardvark | ...
 9   | Owl      | ...
 10  | Pig      | ...
 11  | Badger   | ...
 15  | Mongoose | ...
 19  | Ferret   | ...

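I'm interested in a query that will return the list of missing IDs in the table. For the data above, the expected result is:

 ID
-----
 1
 4
 5
 8
 12
 13
 14
 16
 17
 18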

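Notes:

- Assume that the first ID was originally 1.
- The maximum ID to check is the last one, i.e. assume there were no further entries after the current last one (see the additional note on this point below).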

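One drawback of the above requirement is that the list will not return IDs that were created after ID 19 and then deleted. I currently handle this case in code, because I keep track of the largest ID ever created. However, it would be a nice "bonus" (but definitely not a must) if the query could take MaxID as a parameter and also return the IDs between the current max and MaxID.

I'm currently using MySQL but am thinking of moving to SQL Server, so I'd like the query to work on both. Also, if you use anything that won't run on SQLite, please point it out, thanks.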

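【Question discussion】:

【Answer 1】:

I landed on this page hoping to find a solution for SQLite, since this was the only answer I found when searching for the same problem for SQLite.

The final solution I found comes from this post: Float Middle Blog - SQLITE answer

Hope it helps others :-)

The simple solution is: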

SELECT DISTINCT id +1
FROM mytable
WHERE id + 1 NOT IN (SELECT DISTINCT id FROM mytable);

Genius.
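To see what the comments below point at, here is a minimal sketch that runs this query against the question's sample data in an in-memory SQLite database, using Python's built-in sqlite3 module. The helper name next_id_gaps is hypothetical, and an ORDER BY was added only to make the output stable:

```python
import sqlite3

# The question's sample data; the table name mytable follows the answer's query
SAMPLE_IDS = [2, 3, 6, 7, 9, 10, 11, 15, 19]


def next_id_gaps(ids):
    """Run the answer's 'id + 1 NOT IN' query on an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO mytable (id) VALUES (?)", [(i,) for i in ids])
    rows = conn.execute(
        """
        SELECT DISTINCT id + 1
        FROM mytable
        WHERE id + 1 NOT IN (SELECT DISTINCT id FROM mytable)
        ORDER BY 1
        """
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(next_id_gaps(SAMPLE_IDS))  # [4, 8, 12, 16, 20]
```

Only the first ID of each gap is reported, plus max(id) + 1 (20 here); IDs 1, 5, 13, 14, 17 and 18 never appear.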

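【Comments】:

OK, so I got downvoted for helping people who might land on this page like I did. Nice. May I suggest (more than a year later) a small improvement that removes the largest value SQLite returns, which is always max(id) + 1: just add at the end of the query: AND id

This only partially works. If you have ID 24 but not 25 or 26, this query returns ID 25 but not ID 26, so you will miss the ID 26 case.

Searched for hours and found many complicated solutions. What a simple and genius answer. It helped me a lot. Many thanks @NikBurns

Note that if you use this solution you will miss values. I tested it, and as @conradkleinespel mentioned, if several identity values are missing in a row, it only catches the first one.

【Answer 2】: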

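This question comes up often, and sadly the most common (and most portable) answer is to create a temporary table holding the IDs that should exist, and then do a left join. The approach is the same in MySQL and SQL Server; the main differences are the temporary table syntax and, in MySQL, the fact that a WHILE loop can only run inside a stored program such as a procedure.

In MySQL: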

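DELIMITER //

CREATE PROCEDURE find_missing_ids()
BEGIN
    DECLARE cur_id INT DEFAULT 1;
    DECLARE max_id INT;

    SELECT MAX(id) INTO max_id FROM tbl;

    -- Temporary table holding every ID that should exist
    DROP TEMPORARY TABLE IF EXISTS IDSeq;
    CREATE TEMPORARY TABLE IDSeq
    (
        id int
    );

    -- WHILE is only allowed inside stored programs in MySQL
    WHILE cur_id < max_id DO
        INSERT INTO IDSeq VALUES (cur_id);
        SET cur_id = cur_id + 1;
    END WHILE;

    -- IDs in IDSeq with no matching row in tbl are the missing ones
    SELECT
        s.id
    FROM
        IDSeq s
        LEFT JOIN tbl t ON
            s.id = t.id
    WHERE t.id IS NULL;

    DROP TEMPORARY TABLE IDSeq;
END //

DELIMITER ;

CALL find_missing_ids();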

In SQL Server:

declare @id int
declare @maxid int

set @id = 1
select @maxid = max(id) from tbl

create table #IDSeq
(
    id int
)

while @id < @maxid --whatever your max is
begin
    insert into #IDSeq values(@id)

    set @id = @id + 1
end

select 
    s.id 
from 
    #IDSeq s 
    left join tbl t on 
        s.id = t.id 
where t.id is null

drop table #IDSeq
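The question also asks about SQLite, which has no procedural WHILE loop at all. A minimal sketch of the same sequence-table plus LEFT JOIN idea, assuming Python's sqlite3 module fills the helper table instead of a loop in SQL; the function name and the optional max_id parameter (covering the question's "bonus" MaxID) are hypothetical:

```python
import sqlite3


def missing_via_seq_table(ids, max_id=None):
    """Sequence table + LEFT JOIN, sketched for SQLite.

    The helper table is filled from Python because SQLite has no WHILE loop.
    max_id is an optional upper bound; the larger of it and MAX(id) is used.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO tbl (id) VALUES (?)", [(i,) for i in ids])

    current_max = conn.execute("SELECT MAX(id) FROM tbl").fetchone()[0] or 0
    upper = max(current_max, max_id or 0)

    # Temporary table holding every ID that should exist: 1..upper
    conn.execute("CREATE TEMP TABLE IDSeq (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO IDSeq (id) VALUES (?)",
                     [(i,) for i in range(1, upper + 1)])

    # Rows of IDSeq with no partner in tbl are the missing IDs
    rows = conn.execute(
        """
        SELECT s.id
        FROM IDSeq s
        LEFT JOIN tbl t ON s.id = t.id
        WHERE t.id IS NULL
        ORDER BY s.id
        """
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(missing_via_seq_table([2, 3, 6, 7, 9, 10, 11, 15, 19]))
# [1, 4, 5, 8, 12, 13, 14, 16, 17, 18]
```

Declaring id as the primary key of IDSeq gives the join an index to use, which matters once the range gets large.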

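【Comments】:

I'm not sure what the scenario is in this environment, but what if there are, say, a thousand records instead of 20... and this is called by code on a web page serving 50-60 users at the same time? Is creating and dropping these records every time efficient? And that is leaving aside the part that creates and drops the temporary table.

@daemonkid: Man, what a straw man. If you have to solve this again and again for 50-60 users, you obviously need a permanent table. You obviously have to adapt it to your specific scenario, but this is a solution to the problem of finding missing IDs. +1

I'm not sure I'll accept it, but I'll consider it. Thanks Eric.

While it works on the sample data, it doesn't work for a large number of records. If you want a range larger than 100,000, the "while" alone takes several seconds to complete.

Works very well for a large number of records. I compared it with another answer on a table with 600k records.

【Answer 3】:

Here is a query for SQL Server: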

;WITH Missing (missnum, maxid)
AS
(
 SELECT 1 AS missnum, (select max(id) from @TT)
 UNION ALL
 SELECT missnum + 1, maxid FROM Missing
 WHERE missnum < maxid
)
SELECT missnum
FROM Missing
LEFT OUTER JOIN @TT tt on tt.id = Missing.missnum
WHERE tt.id is NULL
OPTION (MAXRECURSION 0);

Hope this helps.
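WITH RECURSIVE is also available in SQLite (3.8.3 and later) and MySQL (8.0 and later), so the same idea ports with small changes; note that MySQL 8.0 caps recursion depth through the cte_max_recursion_depth system variable (1000 by default). A minimal sketch in Python's sqlite3, assuming a regular table named tt in place of the table variable @TT: the RECURSIVE keyword replaces the leading semicolon, OPTION (MAXRECURSION 0) is dropped because SQLite has no such option, and a hypothetical bound parameter covers the "bonus" MaxID:

```python
import sqlite3

# Anchor row starts at 1; the upper bound is the larger of MAX(id) and the
# optional parameter (NULL/None means "just use MAX(id)").
MISSING_QUERY = """
WITH RECURSIVE Missing (missnum, maxid) AS (
    SELECT 1, MAX(COALESCE((SELECT MAX(id) FROM tt), 0), COALESCE(?, 0))
    UNION ALL
    SELECT missnum + 1, maxid FROM Missing
    WHERE missnum < maxid
)
SELECT missnum
FROM Missing
LEFT OUTER JOIN tt ON tt.id = Missing.missnum
WHERE tt.id IS NULL
ORDER BY missnum
"""


def missing_via_recursive_cte(ids, max_id=None):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tt (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO tt (id) VALUES (?)", [(i,) for i in ids])
    rows = conn.execute(MISSING_QUERY, (max_id,)).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(missing_via_recursive_cte([2, 3, 6, 7, 9, 10, 11, 15, 19], max_id=21))
# [1, 4, 5, 8, 12, 13, 14, 16, 17, 18, 20, 21]
```

In SQLite the two-argument MAX(x, y) is a scalar function returning the larger value, which is what lets the parameter raise but never lower the bound.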

【Comments】:

What would the equivalent query be in MySQL?

【Answer 4】:

PostgreSQL only, inspired by other answers here.

SELECT all_ids AS missing_ids
FROM generate_series((SELECT MIN(id) FROM your_table), (SELECT MAX(id) FROM your_table)) all_ids
EXCEPT 
SELECT id FROM your_table
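The query is a set difference: every ID in the generated range EXCEPT the IDs that exist. A plain-Python sketch of the same logic (the function name and the lower/upper parameters are hypothetical) makes one detail visible: because the series starts at MIN(id), ID 1 is not reported for the question's data unless the lower bound is set to 1:

```python
def missing_ids(ids, lower=None, upper=None):
    """Plain-Python version of generate_series(MIN(id), MAX(id)) EXCEPT SELECT id.

    lower defaults to MIN(id), as in the query; pass lower=1 when IDs are
    assumed to start at 1. upper can raise the bound above MAX(id) (the
    question's "bonus" MaxID) but never lowers it.
    """
    present = set(ids)
    if not present:
        return []
    lo = min(present) if lower is None else lower
    hi = max(present) if upper is None else max(max(present), upper)
    # A set difference (like EXCEPT) has no defined order, so sort the result
    return sorted(set(range(lo, hi + 1)) - present)


sample = [2, 3, 6, 7, 9, 10, 11, 15, 19]
print(missing_ids(sample))           # [4, 5, 8, 12, 13, 14, 16, 17, 18]
print(missing_ids(sample, lower=1))  # [1, 4, 5, 8, 12, 13, 14, 16, 17, 18]
```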

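【Comments】:

I found this very useful for Postgres. The link to the generate_series function is: postgresql.org/docs/10/functions-srf.html. I'm very glad I found this solution. Thanks!

For some reason the IDs came out in a very random order, so I added ORDER BY missing_ids at the end.

【Answer 5】:

I know this is an old question and already has an accepted answer, but using a temporary table isn't really necessary. Fixed the formatting (sorry for the double post).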

DECLARE @TEST_ID integer, @LAST_ID integer, @ID integer

SET @TEST_ID = 1 -- start compare with this ID 
SET @LAST_ID = 100 -- end compare with this ID

WHILE @TEST_ID <= @LAST_ID 
BEGIN 
  SELECT @ID = (SELECT <column> FROM <table> WHERE <column> = @TEST_ID) 
  IF @ID IS NULL 
  BEGIN 
    PRINT 'Missing ID: ' + CAST(@TEST_ID AS VARCHAR(10)) 
  END 
  SET @TEST_ID = @TEST_ID + 1 
END
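The loop above issues one lookup per candidate ID, and because @LAST_ID is set by hand it can go past the current MAX(id), which covers the question's "bonus". A minimal sketch of the same probing loop against SQLite via Python's sqlite3, assuming a table named mytable; make_table and probe_each_id are hypothetical helpers:

```python
import sqlite3


def make_table(ids):
    """Hypothetical helper: build an in-memory SQLite table named mytable."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO mytable (id) VALUES (?)", [(i,) for i in ids])
    return conn


def probe_each_id(conn, test_id, last_id):
    """Look up every ID from test_id to last_id one at a time, like the WHILE loop."""
    missing = []
    while test_id <= last_id:
        row = conn.execute("SELECT id FROM mytable WHERE id = ?", (test_id,)).fetchone()
        if row is None:  # no row: the T-SQL subquery assigns NULL to @ID here
            missing.append(test_id)
        test_id += 1
    return missing


conn = make_table([2, 3, 6, 7, 9, 10, 11, 15, 19])
print(probe_each_id(conn, 1, 21))
# [1, 4, 5, 8, 12, 13, 14, 16, 17, 18, 20, 21]
```

The cost is one round trip per ID in the range, so it is simple but slow for wide ranges.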

【Comments】:

【Answer 6】:

This is an Oracle-only solution. It doesn't solve the complete problem, but I'm leaving it here for others who may be using Oracle.

select level id           -- generate 1 .. 19
from dual
connect by level <= 19

minus                     -- remove from that set

select id                 -- everything that is currently in the 
from table                -- actual table

【Comments】:

【Answer 7】:

A single query can find the missing IDs:

SELECT distinct number
FROM master..spt_values
WHERE number BETWEEN 1 and (SELECT max(id) FROM MyTable)
AND number NOT IN (SELECT id FROM MyTable)

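【Comments】:

Note: this answer is SQL Server specific.

Strange.. it doesn't work on MS SQL Server 2008. The table contains 600k records and this query only checks 2150.

@naXa That's because spt_values is just a table that contains a bunch of random numbers. This answer is completely wrong.

@Stijn Exactly what I was implying

【Answer 8】:

I just found a solution for Postgres: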

select min(gs) 
from generate_series(1, 1999) as gs 
where gs not in (select id from mytable)

【Comments】:

A small improvement to get all the missing IDs: select gs from generate_series(1, (select MAX(id) from mytable)) as gs where gs not in (select id from mytable)

【Answer 9】:

Get the missing rows from a table:

DECLARE @MaxID INT = (SELECT MAX(ID) FROM TABLE1)
SELECT SeqID AS MissingSeqID
FROM (SELECT ROW_NUMBER() OVER (ORDER BY column_id) SeqID from sys.columns) LkUp
LEFT JOIN dbo.TABLE1 t ON t.ID = LkUp.SeqID
WHERE t.ID is null and SeqID < @MaxID

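【Comments】:

【Answer 10】:

Update: this method took too long, so I wrote a Linux command that finds the gaps in a text file. It works in reverse order, so first dump all the IDs into a text file like this: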

nohup mysql --password=xx -e 'select id from tablename order by id desc' databasename > /home/ids.txt &

The first line and the last two lines only record how long the run takes. About 1.5 million IDs took me 57 seconds, on a slow server. Set the maximum ID in i and let it run.

T="$(date +%s)"; \
i=1574115; \
while read line; do \
    if  [[ "$line" != "$i" ]] ; then \
        if [[ $i -lt 1 ]] ; then break; fi; \
        if  [[ $line -gt 1 ]] ; then \
            missingsequenceend=$(( $line + 1 )); \
            minusstr="-"; \
            missingsequence="$missingsequenceend$minusstr$i"; \
            expectnext=$(( $line - 1 )); \
            i=$expectnext; \
            echo -e "$missingsequence"; \
        fi; \
    else \
        i=$(( $i - 1 )); \
    fi; \
done \
< /home/ids.txt; \
T="$(($(date +%s)-T))"; \
echo "Time in seconds: $T"

Sample output:

1494505-1494507
47566-47572
Time in seconds: 57
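The shell loop walks the descending ID list and reports each skipped range as low-high. A sketch of the same idea in Python (not a line-by-line port; the function name is hypothetical), where top plays the role of the i=... value and, unlike the script, a gap reaching down to ID 1 is also reported:

```python
def gap_ranges_desc(ids_desc, top):
    """Return missing (low, high) ranges from IDs sorted in descending order.

    top is the highest ID that should exist (the i=... value in the script).
    Ranges come out highest first, in the same order the script prints them.
    """
    gaps = []
    expected = top  # the next ID we expect to read
    for current in ids_desc:
        if current < expected:
            # IDs current+1 .. expected were skipped
            gaps.append((current + 1, expected))
        expected = current - 1
    if expected >= 1:
        # IDs below the smallest existing one
        gaps.append((1, expected))
    return gaps


for low, high in gap_ranges_desc([19, 15, 11, 10, 9, 7, 6, 3, 2], 19):
    print(f"{low}-{high}")
```

When feeding it /home/ids.txt, skip the id header line that mysql writes unless it is run with --skip-column-names (-N), and convert each remaining line with int().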

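Also, I got syntax errors with the code in Eric's answer, but after changing the delimiter, putting semicolons in the right places and wrapping it in a stored procedure, it works.

Make sure to set the correct maximum ID, database name and table name (in the select query). If you want to change the procedure name, change it in all 3 places.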

use dbname;
drop procedure if exists dorepeat;
delimiter #
CREATE PROCEDURE dorepeat()
BEGIN
set @id = 1;
set @maxid = 1573736;
drop table if exists IDSeq;
create temporary table IDSeq
(
    id int
);

WHILE @id < @maxid DO
    insert into IDSeq values(@id);
    set @id = @id + 1;
END WHILE;

select 
    s.id 
from 
    IDSeq s 
    left join tablename t on 
        s.id = t.id 
 where t.id is null;

drop table if exists IDSeq;

END#
delimiter ;
CALL dorepeat;

I also found this query elsewhere, but I haven't tested it yet.

SELECT a.id+1 AS start, MIN(b.id) - 1 AS end
    FROM tablename AS a, tablename AS b
    WHERE a.id < b.id
    GROUP BY a.id
    HAVING start < MIN(b.id)
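Since the query above is untested, here is a quick check against the question's sample data in SQLite via Python's sqlite3. Assumptions: the aliases are renamed to gap_start/gap_end and the HAVING clause repeats the expression instead of the alias, to avoid relying on alias resolution in HAVING or on end being a keyword; an ORDER BY is added; gap_ranges_sql is a hypothetical helper:

```python
import sqlite3

# Self-join: for each existing ID a, find the smallest existing ID b above it;
# a gap exists when a + 1 < that b
GAP_RANGE_QUERY = """
SELECT a.id + 1 AS gap_start, MIN(b.id) - 1 AS gap_end
FROM tablename AS a, tablename AS b
WHERE a.id < b.id
GROUP BY a.id
HAVING a.id + 1 < MIN(b.id)
ORDER BY a.id
"""


def gap_ranges_sql(ids):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO tablename (id) VALUES (?)", [(i,) for i in ids])
    rows = conn.execute(GAP_RANGE_QUERY).fetchall()
    conn.close()
    return rows


print(gap_ranges_sql([2, 3, 6, 7, 9, 10, 11, 15, 19]))
# [(4, 5), (8, 8), (12, 14), (16, 18)]
```

It finds every gap between two existing IDs but not one before the smallest ID (ID 1 here) or after the largest. The join pairs every row with every larger row before grouping, so it can get slow on large tables.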

【Comments】:

【Answer 11】:

This problem can be solved with a single query:

select lft.id + 1 as missing_ids
from tbl as lft left outer join tbl as rght on lft.id + 1 = rght.id
where rght.id is null and lft.id between 1 and (Select max(id)-1 from tbl)

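Tested on MySQL.

【Comments】:

This only finds missing IDs that are exactly one greater than an existing ID. In the OP's case it misses most of the missing IDs. sqlfiddle.com/#!2/45412/1

【Answer 12】:

Try this in MySQL: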

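DELIMITER ||
DROP PROCEDURE IF EXISTS proc_missing ||
CREATE PROCEDURE proc_missing()
BEGIN
    -- Range to scan (this example restricts it to the rows of user_id 13)
    SET @minID = (SELECT MIN(`id`) FROM `tbl_name` WHERE `user_id`=13);
    SET @maxID = (SELECT MAX(`id`) FROM `tbl_name` WHERE `user_id`=13);
    -- Result table
    DROP TEMPORARY TABLE IF EXISTS temp_missing;
    CREATE TEMPORARY TABLE temp_missing (`missing_id` INT);
    REPEAT
        SET @tableID = (SELECT `id` FROM `tbl_name` WHERE `id` = @minID);
        IF (@tableID IS NULL) THEN
            -- Store the probed ID; @tableID is NULL at this point
            INSERT INTO temp_missing SET `missing_id` = @minID;
        END IF;
        SET @minID = @minID + 1;
    UNTIL @minID > @maxID  -- REPEAT stops once this becomes true
    END REPEAT;
    SELECT `missing_id` FROM temp_missing;
END ||
DELIMITER ;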

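【Comments】:

【Answer 13】:

A few days ago I was working on a production report and found that some numbers were missing. The missing numbers were very important, so I was asked to find a list of all of them for investigation. I posted a blog entry here, with a full demo, including a script to find missing numbers/IDs in a sample table.

The suggested script is long, so I won't include it here. These are the basic steps used:

1. Create a temporary table and store all the distinct numbers.
2. Find the NextID values, i.e. those with something missing before them, and store them in a TempTable.
3. Create a temporary table to store the missing-number details.
4. Start finding the missing IDs with a WHILE loop.
5. Select the missing data from the #MissingID temp table.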
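Since the blog's script is not reproduced here, the following is only a hypothetical plain-Python walk-through of the listed steps, not that script. It assumes positive IDs that are meant to start at 1, and uses Python lists in place of the temp tables:

```python
def missing_via_next_ids(ids):
    """Walk through the listed steps with Python lists standing in for temp tables."""
    # Step 1: the distinct numbers, in order
    distinct = sorted(set(ids))

    # Step 2: the "NextID" values, i.e. IDs with a gap right before them,
    # paired with the existing ID just below the gap (0 = before the first ID)
    next_ids = []
    previous = 0
    for current in distinct:
        if current - previous > 1:
            next_ids.append((previous, current))
        previous = current

    # Steps 3-5: walk each gap with a while loop and collect the missing numbers
    missing = []
    for previous, next_id in next_ids:
        n = previous + 1
        while n < next_id:
            missing.append(n)
            n += 1
    return missing


print(missing_via_next_ids([2, 3, 6, 7, 9, 10, 11, 15, 19]))
# [1, 4, 5, 8, 12, 13, 14, 16, 17, 18]
```

Finding the gap boundaries first means the WHILE loop only runs over the missing numbers themselves rather than over the whole ID range.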

【Comments】:

【Answer 14】:

Converting the SQL CTE (from Paul Svirin) into an Oracle version gives the following (replace :YOURTABLE with your table name):

WITH Missing (missnum,maxid) as (
  SELECT 1 missnum, (select max(id) from :YOURTABLE) maxid from dual
  UNION ALL
  SELECT m.missnum + 1,m.maxid 
  FROM Missing m
  WHERE m.missnum < m.maxid
)
SELECT missnum
FROM Missing
LEFT OUTER JOIN :YOURTABLE tt on tt.id = Missing.missnum
WHERE tt.id is NULL

【Comments】:

【Answer 15】:

Building on @PaulSvirin's answer, I extended it with a UNION to show all the data in my table, including the missing records as NULLs.

WITH Missing(missnum, maxid) AS
          (SELECT (SELECT MIN(tmMIN.TETmeetingID)
                   FROM tblTETMeeting AS tmMIN)
                      AS missnum,
                  (SELECT MAX(tmMAX.TETmeetingID)
                   FROM tblTETMeeting AS tmMAX)
                      AS maxid
           UNION ALL
           SELECT missnum + 1, maxid
           FROM Missing
           WHERE missnum < maxid)
SELECT missnum AS TETmeetingID,
       tt.DateID,
       tt.WeekNo,
       tt.TETID
FROM Missing LEFT JOIN tblTETMeeting tt ON tt.TETmeetingID = Missing.missnum
WHERE tt.TETmeetingID IS NULL
UNION
SELECT tt.TETmeetingID,
       tt.DateID,
       tt.WeekNo,
       tt.TETID
FROM tblTETMeeting AS tt
OPTION ( MAXRECURSION 0 )

Works great!

TETmeetingID    DateID  WeekNo  TETID
29  3063    21  1
30  null    null    null
31  null    null    null
32  null    null    null
33  null    null    null
34  3070    22  1
35  3073    23  1

【Comments】:

【Answer 16】:

The simplest solution for me: create a select that lists every ID up to the maximum sequence value (e.g. 1000000), then filter:

with listids as (
Select Rownum idnumber From dual Connect By Rownum <= 1000000)

select * from listids
where idnumber not in (select id from table where id <=1000000)

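【Comments】:

【Answer 17】:

A modified version borrowing from @Eric's proposal. It works on SQL Server and stores the start and end values of each missing range in a temp table. If a gap is only one value wide, it stores NULL as the end value to make that easier to see.

It produces output like this: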

|StartId| EndId |
|-------|-------|
|     1 | 10182 |
| 10189 | NULL  |
| 10246 | 15000 |

Here is the script; replace myTable and id with your own table and identity column.

declare @id bigint
declare @endId bigint
declare @maxid bigint
declare @previousid bigint=0

set @id = 1
select @maxid = max(id) from myTable

create table #IDGaps
(
    startId bigint,
    endId bigint
)

while @id < @maxid
begin
    if NOT EXISTS(select id from myTable where id=@id)
    BEGIN
        SET @previousid=@id
        select top 1 @endId=id from myTable where id>@id order by id

        IF @id=@endId-1
            insert into #IDGaps values(@id,null)
        ELSE
            insert into #IDGaps values(@id,@endId-1)

        SET @id=@endId
        
    END
    ELSE
        set @id = @id + 1
end

select * from #IDGaps

drop table #IDGaps
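The same loop can be sketched in plain Python to show how the StartId/EndId rows are formed. This is a hypothetical port, assuming IDs are meant to start at 1 (like set @id = 1); bisect stands in for the "select top 1 ... where id>@id" lookup of the next existing ID, and None plays the role of the NULL EndId:

```python
from bisect import bisect_right


def id_gaps_with_null_end(ids):
    """Return (start_id, end_id) pairs for each gap below MAX(id).

    end_id is None when the gap is a single ID, mirroring the NULL EndId.
    """
    ordered = sorted(set(ids))
    if not ordered:
        return []
    present = set(ordered)
    max_id = ordered[-1]
    gaps = []
    current = 1
    while current < max_id:
        if current not in present:
            # Smallest existing ID above current (exists because current < max_id)
            end_id = ordered[bisect_right(ordered, current)]
            gaps.append((current, None if current == end_id - 1 else end_id - 1))
            current = end_id  # jump past the gap, like SET @id=@endId
        else:
            current += 1
    return gaps


print(id_gaps_with_null_end([2, 3, 6, 7, 9, 10, 11, 15, 19]))
# [(1, None), (4, 5), (8, None), (12, 14), (16, 18)]
```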

【Comments】:

【Answer 18】:

Try this query. This single query is enough to get the missing numbers (replace TABLE_NAME with the name of the table you are using):

select sno as missing from(SELECT @row := @row + 1 as sno FROM 
(select 0 union all select 1 union all select 3 union all select 4 union all 
select 5 union all select 6 union all select 6 union all select 7 union all 
select 8 union all select 9) t,(select 0 union all select 1 union all select 3 
union all select 4 union all select 5 union all select 6 union all select 6 
union all select 7 union all select 8 union all select 9) t2,(select 0 
union all select 1 union all select 3 union all select 4 union all select 5 
union all select 6 union all select 6 union all select 7 union all select 8 
union all select 9) t3, (select 0 union all select 1 union all select 3 union 
all select 4 union all select 5 union all select 6 union all select 6 union all 
select 7 union all select 8 union all select 9) t4, 
(SELECT @row:=0) as b where @row<1000) as a where a.sno  not in 
  (select distinct b.no from 
(select b.*,if(@mn=0,@mn:=b.no,@mn) as min,(@mx:=b.no) as max from 
  (select ID as no from TABLE_NAME as a) as b,
        (select @mn:=0,@mx:=0) as x order by no) as b) and 
         a.sno between @mn and @mx;

【Comments】:

This is not a simple query

【Answer 19】:

SELECT DISTINCT id -1
FROM users
WHERE id != 1 AND id - 1 NOT IN (SELECT DISTINCT id FROM users)

Explanation: (id - 1) checks whether the previous ID exists in the table.

(id != 1) skips the row whose id is 1, since its previous ID would be 0 (zero).
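This is the mirror image of the id + 1 query in the first answer: it reports the last ID of each gap instead of the first, so longer gaps are only partly covered. A minimal check on the question's sample data with Python's sqlite3, assuming a table named users as in the answer (previous_id_gaps is a hypothetical helper; ORDER BY added for stable output):

```python
import sqlite3


def previous_id_gaps(ids):
    """Run the answer's 'id - 1 NOT IN' query on an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO users (id) VALUES (?)", [(i,) for i in ids])
    rows = conn.execute(
        """
        SELECT DISTINCT id - 1
        FROM users
        WHERE id != 1 AND id - 1 NOT IN (SELECT DISTINCT id FROM users)
        ORDER BY 1
        """
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(previous_id_gaps([2, 3, 6, 7, 9, 10, 11, 15, 19]))  # [1, 5, 8, 14, 18]
```

IDs 4, 12, 13, 16 and 17 are not reported.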

【Comments】:

Nice try, but it misses some missing IDs

【Answer 20】:

This is what I use to find the missing IDs in a table named tablename:

select a.id+1 missing_ID from tablename a where a.id+1 not in (select id from tablename b where b.id=a.id+1) and a.id!=(select id from tablename c order by id desc limit 1)

It returns the missing IDs. If two (2) or more consecutive IDs are missing, it returns only the first one.

【Comments】:
