mysql - 缺失范围和连续范围
Posted frank_quan
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了mysql - 缺失范围和连续范围相关的知识,希望对你有一定的参考价值。
初始化数据
# 创建表 DROP TABLE IF EXISTS g; CREATE TABLE g( a INT )ENGINE=INNODB; # 初始化数据 INSERT INTO g SELECT 1; INSERT INTO g SELECT 2; INSERT INTO g SELECT 3; INSERT INTO g SELECT 100; INSERT INTO g SELECT 101; INSERT INTO g SELECT 103; INSERT INTO g SELECT 104; INSERT INTO g SELECT 105; INSERT INTO g SELECT 106;
对于g表其缺失范围如4-16所示
对于g表其连续范围如4-17所示
对于缺失范围的问题,可以通过下列步骤来解决
1)找到间断点之前的值,然后对该值加1,即为start_range;
2)找到间断点之前的值,然后对该值减1,即为end_range;
对于间断点之前的值,可以用如下sql:
SELECT a FROM g AS A WHERE NOT EXISTS( SELECT * FROM g AS B WHERE A.a+1=B.a )
查出的106是无用的,因为它是表中的最大值,所以将其过滤掉。断点之前的值,对该值加1操作,即为start_range,可以通过以下sql语句得到:
SELECT a+1 start_range FROM g AS A WHERE NOT EXISTS( SELECT * FROM g AS B WHERE A.a+1=B.a ) AND a < (SELECT MAX(a) FROM g)
最后通过子查询为每个最小间断点返回表g中下一个已有的值并减一,即得到间断点end_range,最终sql语句如下所示:
SELECT a+1 start_range, ( SELECT MIN(a)-1 FROM g C WHERE C.a > A.a ) AS end_range FROM g AS A WHERE NOT EXISTS( SELECT * FROM g AS B WHERE A.a+1=B.a ) AND a < (SELECT MAX(a) FROM g)
这只是该问题的解决方案之一,更为简单直观的方法是,将表g中的数据进行移位匹配,如果是连续的值,那么其差值应该为1,如果不是连续的值就应该大于1。
对于表g,进行移位匹配后应该得到如表4-18所示的内容。
可以看出next - cur的值等于1表示连续的值,否则不连续,不连续的值为(3,100)、(101,103),而我们要求的不连续范围为(4,99)、(102,102),也就是(cur+1,next-1)就是我们要的确实范围,要得到图4-18所示的内容,可以执行下述SQL:
SELECT A.a AS cur,( SELECT MIN(a) FROM g AS B WHERE B.a > A.a ) AS next FROM g AS A
而要得到最终的结果,只需要对cur加1,next减1即可。该好处另外一个好处是无需处理最大值,因为最大数next的值为NULL。该解决方案的SQL语句如下所示:
SELECT cur+1 AS start_range, NEXT-1 AS end_range FROM ( SELECT A.a AS cur,( SELECT MIN(a) FROM g AS B WHERE B.a > A.a ) AS NEXT FROM g AS A ) AS C WHERE NEXT-cur > 1;
连续范围,如果采用子查询方案,我们要手动创建一个列,并对这个列进行分组。这个列应该是每个连续分组的最大值,对于{1,2,3}来说,这个最大值就应该是3。计算一组连续组中最大值所依据的原理是:返回大于或等于当前值且后面一个值为间断点的最小值。下面是该子查询的SQL:
SELECT a,( SELECT MIN(a) FROM g AS A WHERE NOT EXISTS( SELECT * FROM g AS B WHERE A.a+1 = B.a ) AND A.a >= C.a ) AS MAX FROM g AS C
剩下的工作就就简单了,在上一步查询中执行如下语句对max列进行分组,得到分组中的最小值和最大值,这就是我们要的连续范围。
SELECT MIN(a) AS start_range, MAX(a) AS end_range FROM( SELECT a,( SELECT MIN(a) FROM g AS A WHERE NOT EXISTS( SELECT * FROM g AS B WHERE A.a+1 = B.a ) AND A.a >= C.a ) AS MAX FROM g AS C ) AS D GROUP BY MAX;
上述查询给出了连续范围问题的解决方案,但其性能是值得商榷的。这里的扫描成本是O(N²)。对于表中数据量非常大的情况,其性能又会变得十分糟糕。因此解决连续范围问题的最优方案是采用行号方法。
SELECT MIN(a) AS start_range, MAX(a) AS end_range FROM ( SELECT a, rn, a-rn AS diff FROM ( SELECT a, @a:=@a+1 rn FROM g, (SELECT @a:=0) AS aa ) AS b ) AS c GROUP BY diff;
以上是关于mysql - 缺失范围和连续范围的主要内容,如果未能解决你的问题,请参考以下文章