创建和填充数字表的最佳方法是啥?
Posted
技术标签:
【中文标题】创建和填充数字表的最佳方法是啥?【英文标题】:What is the best way to create and populate a numbers table?创建和填充数字表的最佳方法是什么? 【发布时间】:2010-11-26 12:12:38 【问题描述】:我见过许多不同的方法来创建和填充数字表。但是,创建和填充一个的最佳方法是什么?从最重要到最不重要定义“最佳”:
使用最佳索引创建的表 生成的行最快 用于创建和填充的简单代码如果您不知道数字表是什么,请看这里:Why should I consider using an auxiliary numbers table?
【问题讨论】:
据我所知,这或多或少是 ***.com/questions/10819/… 的副本 目前最好的方法是投票支持不需要物理分配的虚拟表的内置实现。目前可以在这里完成:https://feedback.azure.com/forums/908035-sql-server/suggestions/32890519-add-a-built-in-table-of-numbers @LouisSomers,我喜欢这种方法。但是,如果您需要您的程序在他们开始添加该功能之前很久就可以运行,那么您将需要自行构建。 @KM 哈哈,没错,我在同一条船上,我只是想为我认为比管理工作室中的黑暗主题更重要的功能筹集更多选票。 .. @LouisSomers,试图点击你的链接添加我的投票,但 M$ 的那些流浪汉正在远离那个平台。 【参考方案1】:这里有一些代码示例取自网络和此问题的答案。
对于每个方法,我修改了原始代码,因此每个方法都使用相同的表和列:NumbersTest 和 Number,具有 10,000 行或尽可能接近的行数。另外,我提供了原产地的链接。
METHOD 1这是来自here的一个非常慢的循环方法 平均 13.01 秒 跑了 3 次最高移除,这里是以秒为单位的时间:12.42、13.60
DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest(Number INT IDENTITY(1,1))
SET NOCOUNT ON
WHILE COALESCE(SCOPE_IDENTITY(), 0) < 100000
BEGIN
INSERT dbo.NumbersTest DEFAULT VALUES
END
SET NOCOUNT OFF
-- Add a primary key/clustered index to the numbers table
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE())/1000.0)+' seconds'
SELECT COUNT(*) FROM NumbersTest
方法 2 这是一个更快的循环来自 here 平均 1.1658 秒 最高跑了 11 次,这里是以秒为单位的次数:1.117、1.140、1.203、1.170、1.173、1.156、1.203、1.153、1.173、1.170
DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest (Number INT NOT NULL);
DECLARE @i INT;
SELECT @i = 1;
SET NOCOUNT ON
WHILE @i <= 10000
BEGIN
INSERT INTO dbo.NumbersTest(Number) VALUES (@i);
SELECT @i = @i + 1;
END;
SET NOCOUNT OFF
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE())/1000.0)+' seconds'
SELECT COUNT(*) FROM NumbersTest
方法 3 这是基于 here 代码的单个 INSERT 平均 488.6 毫秒 最高运行 11 次,以下是毫秒数:686、673、623、686,343,343,376,360,343,453
DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest (Number int not null)
;WITH Nums(Number) AS
(SELECT 1 AS Number
UNION ALL
SELECT Number+1 FROM Nums where Number<10000
)
insert into NumbersTest(Number)
select Number from Nums option(maxrecursion 10000)
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest
方法 4 这是来自here 的“半循环”方法 平均 348.3 毫秒(由于代码中间的“GO”,很难获得好的时机,任何建议都将不胜感激) 最高运行 11 次,以下是毫秒数:356、360、283、346、360、376、326、373、330、373
DROP TABLE NumbersTest
DROP TABLE #RunDate
CREATE TABLE #RunDate (RunDate datetime)
INSERT INTO #RunDate VALUES(GETDATE())
CREATE TABLE NumbersTest (Number int NOT NULL);
INSERT NumbersTest values (1);
GO --required
INSERT NumbersTest SELECT Number + (SELECT COUNT(*) FROM NumbersTest) FROM NumbersTest
GO 14 --will create 16384 total rows
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
SELECT CONVERT(varchar(20),datediff(ms,RunDate,GETDATE()))+' milliseconds' FROM #RunDate
SELECT COUNT(*) FROM NumbersTest
方法 5 这是来自 Philip Kelley's answer 的单个 INSERT 平均 92.7 毫秒 最高运行 11 次,以下是毫秒数:80、96、96、93、110、110、80、76、93、93
DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest (Number int not null)
;WITH
Pass0 as (select 1 as C union all select 1), --2 rows
Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows
Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows
Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows
Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65536 rows
--I removed Pass5, since I'm only populating the Numbers table to 10,000
Tally as (select row_number() over(order by C) as Number from Pass4)
INSERT NumbersTest
(Number)
SELECT Number
FROM Tally
WHERE Number <= 10000
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest
方法 6 这是来自 Mladen Prajdic answer 的单个 INSERT 平均 82.3 毫秒 最高运行 11 次,以下是毫秒数:80、80、93、76、93、63、93、76、93、76
DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest (Number int not null)
INSERT INTO NumbersTest(Number)
SELECT TOP 10000 row_number() over(order by t1.number) as N
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number);
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest
方法 7 这里是基于 here 的代码的单个 INSERT 平均 56.3 毫秒 最高运行 11 次,以下是毫秒数:63、50、63、46、60、63、63、46、63、46
DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
INTO NumbersTest
FROM sys.objects s1 --use sys.columns if you don't get enough rows returned to generate all the numbers you need
CROSS JOIN sys.objects s2 --use sys.columns if you don't get enough rows returned to generate all the numbers you need
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest
看了这么多方法,我很喜欢方法7,速度最快,代码也比较简单。
【讨论】:
多年后看到这篇文章。我会对 100 万行或更多行的计时感兴趣。有一天我可能会尝试,但 10000 可能是合理需要的数量。 虽然很有趣,但时间对我来说似乎并不重要。特别是因为如果我需要一个数字表,我会创建一次并一遍又一遍地使用它。 非常感谢!我知道这是旧的,但对于那些登陆这里的人,我建议创建一个 100,000 的数字表,以便您可以将它与日期结合使用。 方法 7 创建了一个包含 9604 行的表。 @Dave,HA,一条评论说他们在回答它四年后 9604 行!回答它六年后,你说它给出了随机结果。随机结果意味着您获得随机值。如果 sys.objects 中的行数很少,您将始终获得从 1 开始的连续整数值,可能小于 10,000。我在新数据库(sys.objects 中的 76 行)上尝试了方法 7,它可以创建 5,776 行(76*76)。如果您按照上一条评论中的建议添加CROSS JOIN sys.objects s3
,您将获得 438,976 行 (76*76*76)。【参考方案2】:
我用这个快得要命:
insert into Numbers(N)
select top 1000000 row_number() over(order by t1.number) as N
from master..spt_values t1
cross join master..spt_values t2
【讨论】:
请注意,Azure SQL 数据库不支持此功能。【参考方案3】:如果您只是在 SQL Server Management Studio 或 sqlcmd.exe
中执行此操作,则可以使用批处理分隔符允许您重复批处理的事实:
CREATE TABLE Number (N INT IDENTITY(1,1) PRIMARY KEY NOT NULL);
GO
INSERT INTO Number DEFAULT VALUES;
GO 100000
这将使用下一个标识的默认值将 100000 条记录插入到 Numbers
表中。
很慢。它与@KM.'s answer中的 METHOD 1 相比,这是示例中最慢的。但是,它与代码一样简单。您可以通过在插入批处理之后添加主键约束来加快速度。
【讨论】:
@Bacon Bits,我可以只插入(a)特定列吗? @Azimuth 您可以使用此方法,只要您可以编写单个 INSERT 语句,该语句在重复执行时将为每一行创建数据。所有批处理转发器所做的就是告诉客户端(SSMS 或 sqlcmd.exe)重复完全相同的查询 N 次。您可以通过多种方式利用 T-SQL 做到这一点,但我怀疑它很快就会变得轻量级。【参考方案4】:我从以下模板开始,该模板源自 Itzik Ben-Gan 的大量印刷品:
;WITH
Pass0 as (select 1 as C union all select 1), --2 rows
Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows
Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows
Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows
Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65536 rows
Pass5 as (select 1 as C from Pass4 as A, Pass4 as B),--4,294,967,296 rows
Tally as (select row_number() over(order by C) as Number from Pass5)
select Number from Tally where Number <= 1000000
“WHERE N
由于这是一个 WITH 子句,它可以像这样用于 INSERT...SELECT...:
-- Sample use: create one million rows
CREATE TABLE dbo.Example (ExampleId int not null)
DECLARE @RowsToCreate int
SET @RowsToCreate = 1000000
-- "Table of numbers" data generator, as per Itzik Ben-Gan (from multiple sources)
;WITH
Pass0 as (select 1 as C union all select 1), --2 rows
Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows
Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows
Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows
Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65536 rows
Pass5 as (select 1 as C from Pass4 as A, Pass4 as B),--4,294,967,296 rows
Tally as (select row_number() over(order by C) as Number from Pass5)
INSERT Example (ExampleId)
select Number
from Tally
where Number <= @RowsToCreate
在构建表后对其进行索引将是对其进行索引的最快方法。
哦,我将其称为“Tally”表。我认为这是一个常用术语,您可以通过谷歌搜索找到大量技巧和示例。
【讨论】:
【参考方案5】:这是我利用 SQL Server 2008 中引入的Table Valued Constructors 提出的一个简短而快速的内存解决方案:
它将返回 1,000,000 行,但是您可以添加/删除 CROSS JOIN,或使用 TOP 子句来修改它。
;WITH v AS (SELECT * FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) v(z))
SELECT N FROM (SELECT ROW_NUMBER() OVER (ORDER BY v1.z)-1 N FROM v v1
CROSS JOIN v v2 CROSS JOIN v v3 CROSS JOIN v v4 CROSS JOIN v v5 CROSS JOIN v v6) Nums
请注意,这可以在运行中快速计算,或者(甚至更好)存储在永久表中(只需在 SELECT N
段之后添加一个 INTO
子句),并在 N
字段上使用主键提高效率。
【讨论】:
如果你想要一个即时数字表,我喜欢这个主意。当你使用它来生成一个实际的表时,它比其他的要慢。 @KM。我刚刚在我的设置上进行了测试,不到一秒钟。但是假设它需要 10 秒,而另一个只需要 1 秒(设置永久表)。 IMO,考虑到这仍然是微不足道的,您只需设置一次永久表。其他因素,比如代码冗长,对我来说更重要。 1分钟对1秒?这会有点不同,但我的查询并没有那么慢。【参考方案6】:对于正在寻找 Azure 解决方案的任何人
SET NOCOUNT ON
CREATE TABLE Numbers (n bigint PRIMARY KEY)
GO
DECLARE @numbers table(number int);
WITH numbers(number) as (
SELECT 1 AS number
UNION all
SELECT number+1 FROM numbers WHERE number<10000
)
INSERT INTO @numbers(number)
SELECT number FROM numbers OPTION(maxrecursion 10000)
INSERT INTO Numbers(n) SELECT number FROM @numbers
来自 sql azure 团队博客 http://azure.microsoft.com/blog/2010/09/16/create-a-numbers-table-in-sql-azure/
【讨论】:
【参考方案7】:我知道这个线程很旧并且已经回答,但是有一种方法可以从方法 7 中挤出一点额外的性能:
而不是这个(本质上是方法 7,但有一些易于使用的润色):
DECLARE @BIT AS BIT = 0
IF OBJECT_ID('tempdb..#TALLY') IS NOT NULL
DROP TABLE #TALLY
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
INTO #TALLY
FROM sys.objects s1 --use sys.columns if you don't get enough rows returned to generate all the numbers you need
CROSS JOIN sys.objects s2 --use sys.co
ALTER TABLE #TALLY ADD PRIMARY KEY(Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
试试这个:
DECLARE @BIT AS BIT = 0
IF OBJECT_ID('tempdb..#TALLY') IS NOT NULL
DROP TABLE #TALLY
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
INTO #TALLY
FROM (SELECT @BIT [X] UNION ALL SELECT @BIT) [T2]
CROSS JOIN (SELECT @BIT [X] UNION ALL SELECT @BIT) [T4]
CROSS JOIN (SELECT @BIT [X] UNION ALL SELECT @BIT) [T8]
CROSS JOIN (SELECT @BIT [X] UNION ALL SELECT @BIT) [T16]
CROSS JOIN (SELECT @BIT [X] UNION ALL SELECT @BIT) [T32]
CROSS JOIN (SELECT @BIT [X] UNION ALL SELECT @BIT) [T64]
CROSS JOIN (SELECT @BIT [X] UNION ALL SELECT @BIT) [T128]
CROSS JOIN (SELECT @BIT [X] UNION ALL SELECT @BIT) [T256]
CROSS JOIN (SELECT @BIT [X] UNION ALL SELECT @BIT) [T512]
CROSS JOIN (SELECT @BIT [X] UNION ALL SELECT @BIT) [T1024]
CROSS JOIN (SELECT @BIT [X] UNION ALL SELECT @BIT) [T2048]
CROSS JOIN (SELECT @BIT [X] UNION ALL SELECT @BIT) [T4096]
CROSS JOIN (SELECT @BIT [X] UNION ALL SELECT @BIT) [T8192]
CROSS JOIN (SELECT @BIT [X] UNION ALL SELECT @BIT) [T16384]
ALTER TABLE #TALLY ADD PRIMARY KEY(Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
在我的服务器上,这需要大约 10 毫秒,而从 sys.objects 中选择时大约需要 16-20 毫秒。它还具有不依赖于 sys.objects 中有多少对象的额外好处。虽然它很安全,但从技术上讲,它是一种依赖,而且另一个运行得更快。我认为如果你改变,速度提升取决于使用 BIT:
DECLARE @BIT AS BIT = 0
到:
DECLARE @BIT AS BIGINT = 0
它使我的服务器上的总时间增加了约 8-10 毫秒。也就是说,当您扩展到 1,000,000 条记录时,BIT vs BIGINT 不会再明显影响我的查询,但它仍然运行在 ~680ms 左右,而从 sys.objects 运行 ~730ms。
【讨论】:
【参考方案8】:我在 BIRT 中主要使用数字表来模拟报表,而不必摆弄记录集的动态创建。
我对日期也是如此,有一张从过去 10 年到未来 10 年的表格(以及一天中的几个小时以获取更详细的报告)。即使您的“真实”数据表中没有这些日期的数据,也能够获取 所有 日期的值,这是一个巧妙的技巧。
我有一个用于创建这些的脚本,例如(这是来自记忆):
drop table numbers; commit;
create table numbers (n integer primary key); commit;
insert into numbers values (0); commit;
insert into numbers select n+1 from numbers; commit;
insert into numbers select n+2 from numbers; commit;
insert into numbers select n+4 from numbers; commit;
insert into numbers select n+8 from numbers; commit;
insert into numbers select n+16 from numbers; commit;
insert into numbers select n+32 from numbers; commit;
insert into numbers select n+64 from numbers; commit;
每行的行数翻倍,因此生成真正巨大的表格并不需要太多。
我不确定我是否同意你的观点,即快速创建很重要,因为你只创建一次。其成本在所有对其的访问中摊销,因此时间相当微不足道。
【讨论】:
每个 commits; 导致 Msg 3902, Level 16, State 1, Line 1 COMMIT TRANSACTION 请求没有对应的 BEGIN TRANSACTION。跨度> @KM,第一点很容易通过开始事务来解决(DB/2,我选择的 DBMS,通常配置为自动启动事务)。而且,如果你想要更多的行,你只需要添加更多的插入。每一个都会使范围翻倍,因此如果您愿意,很容易获得大数字。我也更愿意尽可能提供通用 SQL 解决方案,而不是将解决方案限制为特定供应商。【参考方案9】:这里有几个额外的方法:方法 1
IF OBJECT_ID('dbo.Numbers', 'U') IS NOT NULL
DROP TABLE dbo.Numbers
GO
CREATE TABLE Numbers (Number int NOT NULL PRIMARY KEY);
GO
DECLARE @i int = 1;
INSERT INTO dbo.Numbers (Number)
VALUES (1),(2);
WHILE 2*@i < 1048576
BEGIN
INSERT INTO dbo.Numbers (Number)
SELECT Number + 2*@i
FROM dbo.Numbers;
SET @i = @@ROWCOUNT;
END
GO
SELECT COUNT(*) FROM Numbers AS RowCownt --1048576 rows
方法二
IF OBJECT_ID('dbo.Numbers', 'U') IS NOT NULL
DROP TABLE dbo.Numbers
GO
CREATE TABLE dbo.Numbers (Number int NOT NULL PRIMARY KEY);
GO
DECLARE @i INT = 0;
INSERT INTO dbo.Numbers (Number)
VALUES (1);
WHILE @i <= 9
BEGIN
INSERT INTO dbo.Numbers (Number)
SELECT N.Number + POWER(4, @i) * D.Digit
FROM dbo.Numbers AS N
CROSS JOIN (VALUES(1),(2),(3)) AS D(Digit)
ORDER BY D.Digit, N.Number
SET @i = @i + 1;
END
GO
SELECT COUNT(*) FROM dbo.Numbers AS RowCownt --1048576 rows
方法3
IF OBJECT_ID('dbo.Numbers', 'U') IS NOT NULL
DROP TABLE dbo.Numbers
GO
CREATE TABLE Numbers (Number int identity NOT NULL PRIMARY KEY, T bit NULL);
WITH
T1(T) AS (SELECT T FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) AS T(T)) --10 rows
,T2(T) AS (SELECT A.T FROM T1 AS A CROSS JOIN T1 AS B CROSS JOIN T1 AS C) --1,000 rows
,T3(T) AS (SELECT A.T FROM T2 AS A CROSS JOIN T2 AS B CROSS JOIN T2 AS C) --1,000,000,000 rows
INSERT INTO dbo.Numbers(T)
SELECT TOP (1048576) NULL
FROM T3;
ALTER TABLE Numbers
DROP COLUMN T;
GO
SELECT COUNT(*) FROM dbo.Numbers AS RowCownt --1048576 rows
方法 4,取自 Alex Kuznetsov 的 Defensive Database Programming 书
IF OBJECT_ID('dbo.Numbers', 'U') IS NOT NULL
DROP TABLE dbo.Numbers
GO
CREATE TABLE Numbers (Number int NOT NULL PRIMARY KEY);
GO
DECLARE @i INT = 1 ;
INSERT INTO dbo.Numbers (Number)
VALUES (1);
WHILE @i < 524289 --1048576
BEGIN;
INSERT INTO dbo.Numbers (Number)
SELECT Number + @i
FROM dbo.Numbers;
SET @i = @i * 2 ;
END
GO
SELECT COUNT(*) FROM dbo.Numbers AS RowCownt --1048576 rows
方法 5,取自 Erland Sommarskog 的 Arrays and Lists in SQL Server 2005 and Beyond 文章
IF OBJECT_ID('dbo.Numbers', 'U') IS NOT NULL
DROP TABLE dbo.Numbers
GO
CREATE TABLE Numbers (Number int NOT NULL PRIMARY KEY);
GO
WITH digits (d) AS (
SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL
SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL
SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL
SELECT 0)
INSERT INTO Numbers (Number)
SELECT Number
FROM (SELECT i.d + ii.d * 10 + iii.d * 100 + iv.d * 1000 +
v.d * 10000 + vi.d * 100000 AS Number
FROM digits i
CROSS JOIN digits ii
CROSS JOIN digits iii
CROSS JOIN digits iv
CROSS JOIN digits v
CROSS JOIN digits vi) AS Numbers
WHERE Number > 0
GO
SELECT COUNT(*) FROM dbo.Numbers AS RowCownt --999999 rows
总结: 在这 5 种方法中,方法 3 似乎是最快的。
【讨论】:
【参考方案10】:一些建议的方法基于系统对象(例如,基于“sys.objects”)。他们假设这些系统对象包含足够的记录来生成我们的数字。
我不会基于不属于我的应用程序且我无法完全控制的任何内容。例如:这些 sys 表的内容可能会发生变化,这些表在新版本的 SQL 中可能不再有效等。
作为一种解决方案,我们可以创建自己的记录表。然后我们使用那个代替这些与系统相关的对象(如果我们事先知道范围,那么包含所有数字的表应该没问题,否则我们可以使用那个进行交叉连接)。
基于 CTE 的解决方案运行良好,但存在与嵌套循环相关的限制。
【讨论】:
【参考方案11】:这是对已接受答案的重新打包 - 但以一种让您自己比较它们的方式 - 比较了前 3 种算法(并且 cmets 解释了为什么排除其他方法)并且您可以针对您的自己的设置,看看它们在你想要的序列大小下的表现如何。
SET NOCOUNT ON;
--
-- Set the count of numbers that you want in your sequence ...
--
DECLARE @NumberOfNumbers int = 10000000;
--
-- Some notes on choosing a useful length for your sequence ...
-- For a sequence of 100 numbers -- winner depends on preference of min/max/avg runtime ... (I prefer PhilKelley algo here - edit the algo so RowSet2 is max RowSet CTE)
-- For a sequence of 1k numbers -- winner depends on preference of min/max/avg runtime ... (Sadly PhilKelley algo is generally lowest ranked in this bucket, but could be tweaked to perform better)
-- For a sequence of 10k numbers -- a clear winner emerges for this bucket
-- For a sequence of 100k numbers -- do not test any looping methods at this size or above ...
-- the previous winner fails, a different method is need to guarantee the full sequence desired
-- For a sequence of 1MM numbers -- the statistics aren't changing much between the algorithms - choose one based on your own goals or tweaks
-- For a sequence of 10MM numbers -- only one of the methods yields the desired sequence, and the numbers are much closer than for smaller sequences
DECLARE @TestIteration int = 0;
DECLARE @MaxIterations int = 10;
DECLARE @MethodName varchar(128);
-- SQL SERVER 2017 Syntax/Support needed
DROP TABLE IF EXISTS #TimingTest
CREATE TABLE #TimingTest (MethodName varchar(128), TestIteration int, StartDate DateTime2, EndDate DateTime2, ElapsedTime decimal(38,0), ItemCount decimal(38,0), MaxNumber decimal(38,0), MinNumber decimal(38,0))
--
-- Conduct the test ...
--
WHILE @TestIteration < @MaxIterations
BEGIN
-- Be sure that the test moves forward
SET @TestIteration += 1;
/* -- This method has been removed, as it is BY FAR, the slowest method
-- This test shows that, looping should be avoided, likely at all costs, if one places a value / premium on speed of execution ...
--
-- METHOD - Fast looping
--
-- Prep for the test
DROP TABLE IF EXISTS [Numbers].[Test];
CREATE TABLE [Numbers].[Test] (Number INT NOT NULL);
-- Method information
SET @MethodName = 'FastLoop';
-- Record the start of the test
INSERT INTO #TimingTest(MethodName, TestIteration, StartDate)
SELECT @MethodName, @TestIteration, GETDATE()
-- Run the algorithm
DECLARE @i INT = 1;
WHILE @i <= @NumberOfNumbers
BEGIN
INSERT INTO [Numbers].[Test](Number) VALUES (@i);
SELECT @i = @i + 1;
END;
ALTER TABLE [Numbers].[Test] ADD CONSTRAINT PK_Numbers_Test_Number PRIMARY KEY CLUSTERED (Number)
-- Record the end of the test
UPDATE tt
SET
EndDate = GETDATE()
FROM #TimingTest tt
WHERE tt.MethodName = @MethodName
and tt.TestIteration = @TestIteration
-- And the stats about the numbers in the sequence
UPDATE tt
SET
ItemCount = results.ItemCount,
MaxNumber = results.MaxNumber,
MinNumber = results.MinNumber
FROM #TimingTest tt
CROSS JOIN (
SELECT COUNT(Number) as ItemCount, MAX(Number) as MaxNumber, MIN(Number) as MinNumber FROM [Numbers].[Test]
) results
WHERE tt.MethodName = @MethodName
and tt.TestIteration = @TestIteration
*/
/* -- This method requires GO statements, which would break the script, also - this answer does not appear to be the fastest *AND* seems to perform "magic"
--
-- METHOD - "Semi-Looping"
--
-- Prep for the test
DROP TABLE IF EXISTS [Numbers].[Test];
CREATE TABLE [Numbers].[Test] (Number INT NOT NULL);
-- Method information
SET @MethodName = 'SemiLoop';
-- Record the start of the test
INSERT INTO #TimingTest(MethodName, TestIteration, StartDate)
SELECT @MethodName, @TestIteration, GETDATE()
-- Run the algorithm
INSERT [Numbers].[Test] values (1);
-- GO --required
INSERT [Numbers].[Test] SELECT Number + (SELECT COUNT(*) FROM [Numbers].[Test]) FROM [Numbers].[Test]
-- GO 14 --will create 16384 total rows
ALTER TABLE [Numbers].[Test] ADD CONSTRAINT PK_Numbers_Test_Number PRIMARY KEY CLUSTERED (Number)
-- Record the end of the test
UPDATE tt
SET
EndDate = GETDATE()
FROM #TimingTest tt
WHERE tt.MethodName = @MethodName
and tt.TestIteration = @TestIteration
-- And the stats about the numbers in the sequence
UPDATE tt
SET
ItemCount = results.ItemCount,
MaxNumber = results.MaxNumber,
MinNumber = results.MinNumber
FROM #TimingTest tt
CROSS JOIN (
SELECT COUNT(Number) as ItemCount, MAX(Number) as MaxNumber, MIN(Number) as MinNumber FROM [Numbers].[Test]
) results
WHERE tt.MethodName = @MethodName
and tt.TestIteration = @TestIteration
*/
--
-- METHOD - Philip Kelley's algo
-- (needs tweaking to match the desired length of sequence in order to optimize its performance, relies more on the coder to properly tweak the algorithm)
--
-- Prep for the test
DROP TABLE IF EXISTS [Numbers].[Test];
CREATE TABLE [Numbers].[Test] (Number INT NOT NULL);
-- Method information
SET @MethodName = 'PhilKelley';
-- Record the start of the test
INSERT INTO #TimingTest(MethodName, TestIteration, StartDate)
SELECT @MethodName, @TestIteration, GETDATE()
-- Run the algorithm
; WITH
RowSet0 as (select 1 as Item union all select 1), -- 2 rows -- We only have to name the column in the first select, the second/union select inherits the column name
RowSet1 as (select 1 as Item from RowSet0 as A, RowSet0 as B), -- 4 rows
RowSet2 as (select 1 as Item from RowSet1 as A, RowSet1 as B), -- 16 rows
RowSet3 as (select 1 as Item from RowSet2 as A, RowSet2 as B), -- 256 rows
RowSet4 as (select 1 as Item from RowSet3 as A, RowSet3 as B), -- 65536 rows (65k)
RowSet5 as (select 1 as Item from RowSet4 as A, RowSet4 as B), -- 4294967296 rows (4BB)
-- Add more RowSetX to get higher and higher numbers of rows
-- Each successive RowSetX results in squaring the previously available number of rows
Tally as (select row_number() over (order by Item) as Number from RowSet5) -- This is what gives us the sequence of integers, always select from the terminal CTE expression
-- Note: testing of this specific use case has shown that making Tally as a sub-query instead of a terminal CTE expression is slower (always) - be sure to follow this pattern closely for max performance
INSERT INTO [Numbers].[Test] (Number)
SELECT o.Number
FROM Tally o
WHERE o.Number <= @NumberOfNumbers
ALTER TABLE [Numbers].[Test] ADD CONSTRAINT PK_Numbers_Test_Number PRIMARY KEY CLUSTERED (Number)
-- Record the end of the test
UPDATE tt
SET
EndDate = GETDATE()
FROM #TimingTest tt
WHERE tt.MethodName = @MethodName
and tt.TestIteration = @TestIteration
-- And the stats about the numbers in the sequence
UPDATE tt
SET
ItemCount = results.ItemCount,
MaxNumber = results.MaxNumber,
MinNumber = results.MinNumber
FROM #TimingTest tt
CROSS JOIN (
SELECT COUNT(Number) as ItemCount, MAX(Number) as MaxNumber, MIN(Number) as MinNumber FROM [Numbers].[Test]
) results
WHERE tt.MethodName = @MethodName
and tt.TestIteration = @TestIteration
--
-- METHOD - Mladen Prajdic answer
--
-- Prep for the test
DROP TABLE IF EXISTS [Numbers].[Test];
CREATE TABLE [Numbers].[Test] (Number INT NOT NULL);
-- Method information
SET @MethodName = 'MladenPrajdic';
-- Record the start of the test
INSERT INTO #TimingTest(MethodName, TestIteration, StartDate)
SELECT @MethodName, @TestIteration, GETDATE()
-- Run the algorithm
INSERT INTO [Numbers].[Test](Number)
SELECT TOP (@NumberOfNumbers) row_number() over(order by t1.number) as N
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
ALTER TABLE [Numbers].[Test] ADD CONSTRAINT PK_Numbers_Test_Number PRIMARY KEY CLUSTERED (Number)
-- Record the end of the test
UPDATE tt
SET
EndDate = GETDATE()
FROM #TimingTest tt
WHERE tt.MethodName = @MethodName
and tt.TestIteration = @TestIteration
-- And the stats about the numbers in the sequence
UPDATE tt
SET
ItemCount = results.ItemCount,
MaxNumber = results.MaxNumber,
MinNumber = results.MinNumber
FROM #TimingTest tt
CROSS JOIN (
SELECT COUNT(Number) as ItemCount, MAX(Number) as MaxNumber, MIN(Number) as MinNumber FROM [Numbers].[Test]
) results
WHERE tt.MethodName = @MethodName
and tt.TestIteration = @TestIteration
--
-- METHOD - Single INSERT
--
-- Prep for the test
DROP TABLE IF EXISTS [Numbers].[Test];
-- The Table creation is part of this algorithm ...
-- Method information
SET @MethodName = 'SingleInsert';
-- Record the start of the test
INSERT INTO #TimingTest(MethodName, TestIteration, StartDate)
SELECT @MethodName, @TestIteration, GETDATE()
-- Run the algorithm
SELECT TOP (@NumberOfNumbers) IDENTITY(int,1,1) AS Number
INTO [Numbers].[Test]
FROM sys.objects s1 -- use sys.columns if you don't get enough rows returned to generate all the numbers you need
CROSS JOIN sys.objects s2 -- use sys.columns if you don't get enough rows returned to generate all the numbers you need
ALTER TABLE [Numbers].[Test] ADD CONSTRAINT PK_Numbers_Test_Number PRIMARY KEY CLUSTERED (Number)
-- Record the end of the test
UPDATE tt
SET
EndDate = GETDATE()
FROM #TimingTest tt
WHERE tt.MethodName = @MethodName
and tt.TestIteration = @TestIteration
-- And the stats about the numbers in the sequence
UPDATE tt
SET
ItemCount = results.ItemCount,
MaxNumber = results.MaxNumber,
MinNumber = results.MinNumber
FROM #TimingTest tt
CROSS JOIN (
SELECT COUNT(Number) as ItemCount, MAX(Number) as MaxNumber, MIN(Number) as MinNumber FROM [Numbers].[Test]
) results
WHERE tt.MethodName = @MethodName
and tt.TestIteration = @TestIteration
END
-- Calculate the timespan for each of the runs
UPDATE tt
SET
ElapsedTime = DATEDIFF(MICROSECOND, StartDate, EndDate)
FROM #TimingTest tt
--
-- Report the results ...
--
SELECT
MethodName, AVG(ElapsedTime) / AVG(ItemCount) as TimePerRecord, CAST(AVG(ItemCount) as bigint) as SequenceLength,
MAX(ElapsedTime) as MaxTime, MIN(ElapsedTime) as MinTime,
MAX(MaxNumber) as MaxNumber, MIN(MinNumber) as MinNumber
FROM #TimingTest tt
GROUP by tt.MethodName
ORDER BY TimePerRecord ASC, MaxTime ASC, MinTime ASC
【讨论】:
【参考方案12】:这样就可以了。 这种方法的优点:
-
对上下限值的更多控制。如果在任何时候您不得不取消正在运行的查询,您可以修改下限以重新开始该过程。
没有可能影响查询运行时间的主键或身份约束。
CREATE TABLE Numbers(N INT);
--
DECLARE @lower_range INT= 1;
DECLARE @upper_range INT= 10000;
--
WHILE(@lower_range <= @upper_range)
BEGIN
INSERT INTO Numbers(N)
VALUES(@lower_range);
SET @lower_range = @lower_range + 1;
END;
--
SELECT *
FROM Numbers
ORDER BY N;
【讨论】:
以上是关于创建和填充数字表的最佳方法是啥?的主要内容,如果未能解决你的问题,请参考以下文章
设计我的 mysql 表的最佳方法是啥?我需要用偏好图代表用户