Reading Notes on SQL进阶教程 (Advanced SQL Tutorial) 08: Processing Number Sequences

Posted by 躺柒

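1. Processing ordered sets is not what SQL is directly meant for

1.1. By default, SQL disregards order when it processes data

2. There are two ways to handle data

2.1. The first treats data as a set whose order is ignored

2.2. The second treats data as an ordered set

2.2.1. First, use a self-join to generate the combinations of start and end points

2.2.2. Then, in a subquery, describe the relationship that the elements inside must satisfy

2.2.2.1. To express universal quantification in SQL, convert the universally quantified proposition into the negation of an existentially quantified one and use the NOT EXISTS predicate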

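3. Generating sequential numbers

3.1. Sequence objects

3.1.1. CONNECT BY (Oracle)

3.1.2. WITH clause (DB2, SQL Server)

3.1.3. Methods that depend on the database implementation

3.2. Examples

3.2.1.

3.2.1.1. -- Generate sequential numbers (1): the numbers 0 to 99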

    SELECT D1.digit + (D2.digit * 10)  AS seq
      FROM Digits D1 CROSS JOIN Digits D2
     ORDER BY seq;

3.2.1.2. -- Generate sequential numbers (2): the numbers 1 to 542

    SELECT D1.digit + (D2.digit * 10) + (D3.digit * 100) AS seq
      FROM Digits D1 CROSS JOIN Digits D2
            CROSS JOIN Digits D3
     WHERE D1.digit + (D2.digit * 10)
                    + (D3.digit * 100) BETWEEN 1 AND 542
     ORDER BY seq;

3.2.1.3. -- Create a sequence view (covering 0 to 999)

    CREATE VIEW Sequence (seq)
    AS SELECT D1.digit + (D2.digit * 10) + (D3.digit * 100)
        FROM Digits D1 CROSS JOIN Digits D2
                CROSS JOIN Digits D3;

3.2.1.3.1. -- Get 1 to 100 from the sequence view

    SELECT seq
      FROM Sequence
     WHERE seq BETWEEN 1 AND 100
     ORDER BY seq;
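
The digit-combination queries above can be tried end to end. The following is a minimal sketch using Python's standard `sqlite3` module. The choice of SQLite and the way the `Digits` table is populated (ten rows, 0 to 9) are assumptions made for illustration; the notes do not show the table definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumption: Digits holds the ten digits 0-9, one per row.
conn.execute("CREATE TABLE Digits (digit INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Digits VALUES (?)", [(d,) for d in range(10)])

# View covering 0-999: ones digit + tens digit * 10 + hundreds digit * 100.
conn.execute("""
    CREATE VIEW Sequence AS
    SELECT D1.digit + (D2.digit * 10) + (D3.digit * 100) AS seq
      FROM Digits D1 CROSS JOIN Digits D2
            CROSS JOIN Digits D3
""")

# Pull 1-100 out of the view, as in the last query above.
one_to_100 = [row[0] for row in conn.execute(
    "SELECT seq FROM Sequence WHERE seq BETWEEN 1 AND 100 ORDER BY seq")]
total_rows = conn.execute("SELECT COUNT(*) FROM Sequence").fetchone()[0]

print(one_to_100[:5], one_to_100[-1], total_rows)
```

Because the cross join pairs every digit with every other digit exactly once, each number in the range appears once and no ordering step is needed to build the set; `ORDER BY` only affects presentation.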

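3.3. Von Neumann's method defines the natural numbers with recursive sets: define 0 to obtain 1, define 1 to obtain 2, so an order is built in

3.3.1. It suits order-related problems such as rankings and running totals

3.4. The solution here discards the notion of order entirely and treats a number merely as a combination of digits. This approach better reflects the character of SQL

4. Finding all missing numbers

4.1. Examples

4.1.1. -- EXCEPT version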

    SELECT seq
      FROM Sequence
     WHERE seq BETWEEN 1 AND 12
    EXCEPT
    SELECT seq FROM SeqTbl;

4.1.1.1. -- NOT IN version

    SELECT seq
      FROM Sequence
     WHERE seq BETWEEN 1 AND 12
      AND seq NOT IN (SELECT seq FROM SeqTbl);

4.1.2. -- SQL statement that specifies the range of sequential numbers dynamically

    SELECT seq
      FROM Sequence
     WHERE seq BETWEEN (SELECT MIN(seq) FROM SeqTbl)
                  AND (SELECT MAX(seq) FROM SeqTbl)
    EXCEPT
    SELECT seq FROM SeqTbl;

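4.1.2.1. Very handy when querying tables whose upper and lower bounds are not fixed

4.1.2.2. The two subqueries are uncorrelated, so each runs only once

4.1.2.3. With an index on the "seq" column, the extreme-value functions (MIN and MAX) can run faster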
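
To see the dynamic-range query at work, here is a small sketch in Python with `sqlite3`. The contents of `SeqTbl` are hypothetical sample data chosen so that a few numbers are missing, and the `Sequence` view is rebuilt from an assumed `Digits` table of 0 to 9.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Digits (digit INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Digits VALUES (?)", [(d,) for d in range(10)])
conn.execute("""
    CREATE VIEW Sequence AS
    SELECT D1.digit + (D2.digit * 10) + (D3.digit * 100) AS seq
      FROM Digits D1 CROSS JOIN Digits D2 CROSS JOIN Digits D3
""")

# Hypothetical sample data: 3, 9 and 10 are absent between 1 and 12.
conn.execute("CREATE TABLE SeqTbl (seq INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO SeqTbl VALUES (?)",
                 [(n,) for n in (1, 2, 4, 5, 6, 7, 8, 11, 12)])

# The full range MIN..MAX minus the numbers actually present.
rows = conn.execute("""
    SELECT seq
      FROM Sequence
     WHERE seq BETWEEN (SELECT MIN(seq) FROM SeqTbl)
                   AND (SELECT MAX(seq) FROM SeqTbl)
    EXCEPT
    SELECT seq FROM SeqTbl
""").fetchall()
missing = sorted(row[0] for row in rows)

print(missing)
```

The bounds come from the data itself, so the same statement keeps working if `SeqTbl` later starts at a different number or grows past 12.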

5. Seat reservations

5.1. Can three people sit together?

5.1.1.

5.1.1.1. -- Find the required vacant seats (1): ignoring row breaks

    SELECT S1.seat   AS start_seat, '~', S2.seat AS end_seat
      FROM Seats S1, Seats S2
     WHERE S2.seat = S1.seat + (:head_cnt - 1)  -- determines the start and end points
       AND NOT EXISTS
              (SELECT *
                 FROM Seats S3
                WHERE S3.seat BETWEEN S1.seat AND S2.seat
                  AND S3.status <> '未预订');  -- '未预订' means "not reserved"

5.1.1.1.1. ":head_cnt" is a parameter for the number of vacant seats needed
5.1.1.1.2. Without the minus 1, one seat too many would be taken

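5.1.1.2. Step 1: generate the combinations of start and end points with a self-join

5.1.1.2.1. This is the S2.seat = S1.seat + (:head_cnt-1) part
5.1.1.2.2. It excludes combinations whose length is not 3, such as 1~8 or 2~3

5.1.1.3. Step 2: describe the condition that every point between the start and the end must satisfy

5.1.1.3.1. The condition for the points in the sequence is "every seat has the status '未预订' (not reserved)"

5.1.1.4. -- Find the required vacant seats (2): taking row breaks into account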

    SELECT S1.seat   AS start_seat, '~', S2.seat AS end_seat
      FROM Seats2 S1, Seats2 S2
     WHERE S2.seat = S1.seat + (:head_cnt - 1)  -- determines the start and end points
       AND NOT EXISTS
              (SELECT *
                 FROM Seats2 S3
                WHERE S3.seat BETWEEN S1.seat AND S2.seat
                  AND (    S3.status <> '未预订'
                        OR S3.row_id <> S1.row_id));

5.1.1.4.1. Every seat has the status "未预订" (not reserved) and the same row ID
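
The two-step pattern (self-join for the endpoints, NOT EXISTS for "all seats in between are free") can be exercised with a short Python and `sqlite3` sketch. The seat layout below is hypothetical sample data, the helper name `vacant_runs` is made up for this example, and '已预订' ("reserved") is an assumed status value for taken seats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Seats (seat INTEGER PRIMARY KEY, status TEXT NOT NULL)")

# Hypothetical sample data: seats 1-15, of which 1, 2, 6, 12 and 13 are taken.
FREE, TAKEN = "未预订", "已预订"
taken_seats = {1, 2, 6, 12, 13}
conn.executemany(
    "INSERT INTO Seats VALUES (?, ?)",
    [(n, TAKEN if n in taken_seats else FREE) for n in range(1, 16)],
)

def vacant_runs(head_cnt):
    """Return (start_seat, end_seat) pairs of head_cnt consecutive free seats."""
    sql = """
        SELECT S1.seat AS start_seat, S2.seat AS end_seat
          FROM Seats S1, Seats S2
         WHERE S2.seat = S1.seat + (:head_cnt - 1)   -- step 1: endpoints
           AND NOT EXISTS                            -- step 2: no taken seat inside
               (SELECT *
                  FROM Seats S3
                 WHERE S3.seat BETWEEN S1.seat AND S2.seat
                   AND S3.status <> '未预订')
         ORDER BY start_seat
    """
    return conn.execute(sql, {"head_cnt": head_cnt}).fetchall()

print(vacant_runs(3))
print(vacant_runs(5))
```

"All seats are free" is never written directly; the query instead rejects any endpoint pair for which a non-free seat exists in between, which is the NOT EXISTS rewrite of universal quantification.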

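5.2. How many people can sit together at most?

5.2.1.

5.2.1.1. Condition 1: every seat between the start and end points has the status "未预订" (not reserved)

5.2.1.2. Condition 2: the seat just before the start point does not have the status "未预订"

5.2.1.3. Condition 3: the seat just after the end point does not have the status "未预订"

5.2.2. -- Phase 1: create a view that stores all the sequences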

    CREATE VIEW Sequences (start_seat, end_seat, seat_cnt) AS
    SELECT S1.seat  AS start_seat,
           S2.seat  AS end_seat,
           S2.seat - S1.seat + 1 AS seat_cnt
      FROM Seats3 S1, Seats3 S2
     WHERE S1.seat <= S2.seat  -- step 1: generate the start/end combinations
       AND NOT EXISTS          -- step 2: conditions every point in the sequence must satisfy
           (SELECT *
              FROM Seats3 S3
             WHERE (     S3.seat BETWEEN S1.seat AND S2.seat
                     AND S3.status <> '未预订')                        -- negation of condition 1
                OR (S3.seat = S1.seat - 1 AND S3.status = '未预订')    -- negation of condition 2
                OR (S3.seat = S2.seat + 1 AND S3.status = '未预订'));  -- negation of condition 3

5.2.2.1. -- Phase 2: find the longest sequence

    SELECT start_seat, '~', end_seat, seat_cnt
      FROM Sequences
     WHERE seat_cnt = (SELECT MAX(seat_cnt) FROM Sequences);
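
The two phases can be checked together with the sketch below (Python with `sqlite3`). The `Seats3` contents are hypothetical sample data, and '已预订' ("reserved") is again an assumed value for taken seats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Seats3 (seat INTEGER PRIMARY KEY, status TEXT NOT NULL)")

# Hypothetical sample data: seats 1-10, of which 1, 6 and 8 are taken.
FREE, TAKEN = "未预订", "已预订"
taken_seats = {1, 6, 8}
conn.executemany(
    "INSERT INTO Seats3 VALUES (?, ?)",
    [(n, TAKEN if n in taken_seats else FREE) for n in range(1, 11)],
)

# Phase 1: every maximal run of free seats (cannot be extended either way).
conn.execute("""
    CREATE VIEW Sequences AS
    SELECT S1.seat AS start_seat,
           S2.seat AS end_seat,
           S2.seat - S1.seat + 1 AS seat_cnt
      FROM Seats3 S1, Seats3 S2
     WHERE S1.seat <= S2.seat
       AND NOT EXISTS
           (SELECT *
              FROM Seats3 S3
             WHERE (S3.seat BETWEEN S1.seat AND S2.seat AND S3.status <> '未预订')
                OR (S3.seat = S1.seat - 1 AND S3.status = '未预订')
                OR (S3.seat = S2.seat + 1 AND S3.status = '未预订'))
""")
all_runs = conn.execute(
    "SELECT start_seat, end_seat, seat_cnt FROM Sequences ORDER BY start_seat"
).fetchall()

# Phase 2: keep only the longest run.
longest = conn.execute("""
    SELECT start_seat, end_seat, seat_cnt
      FROM Sequences
     WHERE seat_cnt = (SELECT MAX(seat_cnt) FROM Sequences)
""").fetchall()

print(all_runs)
print(longest)
```

A run that touches the first or last seat still qualifies, because the neighbouring seat simply does not exist and so cannot violate conditions 2 or 3.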

6. Monotonic increase and monotonic decrease

6.1. Examples

6.1.1.

6.1.2. -- SQL statement that generates the start/end combinations

    SELECT S1.deal_date  AS start_date,
          S2.deal_date  AS end_date
      FROM MyStock S1, MyStock S2
     WHERE S1.deal_date < S2.deal_date;

6.1.2.1. -- SQL statement that finds monotonically increasing intervals: subsets are output too

    SELECT S1.deal_date   AS start_date,
           S2.deal_date   AS end_date
      FROM MyStock S1, MyStock S2
     WHERE S1.deal_date < S2.deal_date  -- step 1: generate the start/end combinations
       AND NOT EXISTS
               (SELECT *  -- step 2: condition every date in the interval must satisfy
                  FROM MyStock S3, MyStock S4
                 WHERE S3.deal_date BETWEEN S1.deal_date AND S2.deal_date
                   AND S4.deal_date BETWEEN S1.deal_date AND S2.deal_date
                   AND S3.deal_date < S4.deal_date
                   AND S3.price >= S4.price);

6.1.2.1.1. -- Exclude subsets and keep only the longest intervals

    SELECT MIN(start_date) AS start_date,      -- push the start point back as far as possible
           end_date
      FROM  (SELECT S1.deal_date AS start_date,
                    MAX(S2.deal_date) AS end_date  -- push the end point forward as far as possible
               FROM MyStock S1, MyStock S2
              WHERE S1.deal_date < S2.deal_date
                AND NOT EXISTS
                    (SELECT *
                       FROM MyStock S3, MyStock S4
                      WHERE S3.deal_date BETWEEN S1.deal_date AND S2.deal_date
                        AND S4.deal_date BETWEEN S1.deal_date AND S2.deal_date
                        AND S3.deal_date < S4.deal_date
                        AND S3.price >= S4.price)
              GROUP BY S1.deal_date) TMP
     GROUP BY end_date;
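
As a check on the "longest interval" query, the sketch below runs it in Python with `sqlite3`. The prices in `MyStock` are hypothetical sample data, and dates are stored as ISO-8601 text so that string comparison matches date order (an assumption of this example, not something the notes specify).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyStock (deal_date TEXT PRIMARY KEY, price INTEGER NOT NULL)")

# Hypothetical sample data: rises, stalls, falls, then rises again.
conn.executemany("INSERT INTO MyStock VALUES (?, ?)", [
    ("2007-01-06", 1000), ("2007-01-08", 1050), ("2007-01-09", 1050),
    ("2007-01-12", 900),  ("2007-01-13", 880),  ("2007-01-14", 870),
    ("2007-01-16", 920),  ("2007-01-17", 1000),
])

# Inner query: for each start, the furthest end with strictly rising prices.
# Outer query: for each end, the earliest such start.
longest_rising = conn.execute("""
    SELECT MIN(start_date) AS start_date, end_date
      FROM (SELECT S1.deal_date AS start_date,
                   MAX(S2.deal_date) AS end_date
              FROM MyStock S1, MyStock S2
             WHERE S1.deal_date < S2.deal_date
               AND NOT EXISTS
                   (SELECT *
                      FROM MyStock S3, MyStock S4
                     WHERE S3.deal_date BETWEEN S1.deal_date AND S2.deal_date
                       AND S4.deal_date BETWEEN S1.deal_date AND S2.deal_date
                       AND S3.deal_date < S4.deal_date
                       AND S3.price >= S4.price)
             GROUP BY S1.deal_date) TMP
     GROUP BY end_date
     ORDER BY end_date
""").fetchall()

print(longest_rising)
```

The unchanged price on 2007-01-09 ends the first interval at 2007-01-08, because the condition `S3.price >= S4.price` treats a flat day as a break in the increase.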

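Reading Notes on SQL进阶教程 (Advanced SQL Tutorial) 05: Correlated Subqueries

1. Correlated subqueries

1.1. Correlated subqueries and self-joins are equivalent in many cases

1.2. For comparing rows in SQL, the key technique is the correlated subquery, especially the "self-correlated subquery" that is combined with a self-join

1.3. Drawbacks

  • 1.3.1. Poor readability

    • 1.3.1.1. Especially in the running-total and moving-average exercises, where combining it with aggregation makes the internal processing very hard to follow
  • 1.3.2. Poor performance

    • 1.3.2.1. Especially with scalar subqueries in the SELECT clause, performance may degrade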

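2. Growth, decline, and no change

2.1. Time-series analysis using a table organized as a time series

2.2. Examples

  • 2.2.1. -- Find years whose sales equal the previous year's (1): using a correlated subquery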
   SELECT year, sale
     FROM Sales S1
    WHERE sale = (SELECT sale
                   FROM Sales S2
                   WHERE S2.year = S1.year -1)
    ORDER BY year;
  • 2.2.2. The condition S2.year = S1.year -1 shifts the rows being compared by one

  • 2.2.3. -- Find years whose sales equal the previous year's (2): using a self-join

   SELECT S1.year, S1.sale
     FROM Sales S1,
         Sales S2
    WHERE S2.sale = S1.sale
     AND S2.year = S1.year -1
    ORDER BY year;
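
The equivalence of the two forms can be confirmed on a small table. This is a sketch in Python with `sqlite3`; the figures in `Sales` are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (year INTEGER PRIMARY KEY, sale INTEGER NOT NULL)")

# Hypothetical sample data with no gaps in the years.
conn.executemany("INSERT INTO Sales VALUES (?, ?)", [
    (1990, 50), (1991, 51), (1992, 52), (1993, 52),
    (1994, 50), (1995, 50), (1996, 49), (1997, 55),
])

# (1) Correlated subquery: look up last year's sale for each row.
by_subquery = conn.execute("""
    SELECT year, sale
      FROM Sales S1
     WHERE sale = (SELECT sale
                     FROM Sales S2
                    WHERE S2.year = S1.year - 1)
     ORDER BY year
""").fetchall()

# (2) Self-join: pair each row with the row of the year before.
by_self_join = conn.execute("""
    SELECT S1.year, S1.sale
      FROM Sales S1, Sales S2
     WHERE S2.sale = S1.sale
       AND S2.year = S1.year - 1
     ORDER BY S1.year
""").fetchall()

print(by_subquery)
print(by_self_join)
```

For the earliest year the subquery returns no row, so the comparison evaluates to unknown and that year is filtered out, just as the self-join finds no partner row for it.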

3. Showing the comparison with the previous year as a list

3.1. Examples

  • 3.1.1. -- Find whether sales grew, declined, or stayed flat (1): using a correlated subquery
   SELECT S1.year, S1.sale,
         CASE WHEN sale =
               (SELECT sale
                   FROM Sales S2
                 WHERE S2.year = S1.year -1) THEN '→'  -- flat
               WHEN sale >
               (SELECT sale
                   FROM Sales S2
                 WHERE S2.year = S1.year -1) THEN '↑'  -- growth
               WHEN sale <
               (SELECT sale
                   FROM Sales S2
                 WHERE S2.year = S1.year -1) THEN '↓'  -- decline
         ELSE '—' END AS var
     FROM Sales S1
    ORDER BY year;
  • 3.1.2. -- Find whether sales grew, declined, or stayed flat (2): using a self-join (the earliest year does not appear in the result)
   SELECT S1.year, S1.sale,
         CASE WHEN S1.sale = S2.sale THEN '→'
               WHEN S1.sale > S2.sale THEN '↑'
               WHEN S1.sale < S2.sale THEN '↓'
         ELSE '—' END AS var
     FROM Sales S1, Sales S2
    WHERE S2.year = S1.year -1
    ORDER BY year;
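
The difference between the two versions, namely whether the earliest year is kept, shows up clearly when both are run on the same data. The sketch below uses Python with `sqlite3` and hypothetical `Sales` figures; for the "no previous year" marker it uses a plain '-' instead of the dash character in the listings above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (year INTEGER PRIMARY KEY, sale INTEGER NOT NULL)")
conn.executemany("INSERT INTO Sales VALUES (?, ?)", [
    (1990, 50), (1991, 51), (1992, 52), (1993, 52),
    (1994, 50), (1995, 50), (1996, 49), (1997, 55),
])  # hypothetical sample data

# (1) Correlated subqueries in CASE: every year is listed.
with_subquery = conn.execute("""
    SELECT S1.year, S1.sale,
           CASE WHEN sale = (SELECT sale FROM Sales S2
                              WHERE S2.year = S1.year - 1) THEN '→'
                WHEN sale > (SELECT sale FROM Sales S2
                              WHERE S2.year = S1.year - 1) THEN '↑'
                WHEN sale < (SELECT sale FROM Sales S2
                              WHERE S2.year = S1.year - 1) THEN '↓'
                ELSE '-' END AS var
      FROM Sales S1
     ORDER BY S1.year
""").fetchall()

# (2) Self-join: the earliest year has no partner row and drops out.
with_self_join = conn.execute("""
    SELECT S1.year, S1.sale,
           CASE WHEN S1.sale = S2.sale THEN '→'
                WHEN S1.sale > S2.sale THEN '↑'
                WHEN S1.sale < S2.sale THEN '↓'
                ELSE '-' END AS var
      FROM Sales S1, Sales S2
     WHERE S2.year = S1.year - 1
     ORDER BY S1.year
""").fetchall()

print(with_subquery)
print(with_self_join)
```

In the first query the earliest year falls through to the ELSE branch because every comparison against the empty subquery result is unknown; in the second it never reaches the CASE expression at all.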

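4. When the time axis has gaps

4.1. Compare with the nearest point in the past

4.2. Examples

  • 4.2.1. -- Find years whose sales equal those of the nearest past year
   SELECT year, sale
     FROM Sales2 S1
    WHERE sale =
     (SELECT sale
         FROM Sales2 S2
       WHERE S2.year =
         (SELECT MAX(year)  -- condition 2: the most recent of the years that satisfy condition 1
             FROM Sales2 S3
           WHERE S1.year > S3.year))  -- condition 1: years earlier than the year in question
    ORDER BY year;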
  • 4.2.2. -- Self-join version
   SELECT S2.year AS pre_year,
         S1.year AS now_year
     FROM Sales2 S1, Sales2 S2
    WHERE S1.sale = S2.sale
     AND S2.year = (SELECT MAX(year)
                       FROM Sales2 S3
                     WHERE S1.year > S3.year)
    ORDER BY now_year;
  • 4.2.3. -- Find the difference in sales between each year and the nearest past year (1): the earliest year is not included in the result
   SELECT S2.year AS pre_year,
         S1.year AS now_year,
         S2.sale AS pre_sale,
         S1.sale AS now_sale,
         S1.sale - S2.sale  AS diff
     FROM Sales2 S1, Sales2 S2
    WHERE S2.year = (SELECT MAX(year)
                       FROM Sales2 S3
                     WHERE S1.year > S3.year)
    ORDER BY now_year;
  • 4.2.5. Using an extreme-value function (MAX or MIN) incurs a sort
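
The "nearest past year" lookup can be tried on a table with missing years. This is a sketch in Python with `sqlite3`; the `Sales2` rows are hypothetical sample data in which 1991, 1995 and 1996 are absent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales2 (year INTEGER PRIMARY KEY, sale INTEGER NOT NULL)")

# Hypothetical sample data with gaps in the time axis.
conn.executemany("INSERT INTO Sales2 VALUES (?, ?)", [
    (1990, 50), (1992, 50), (1993, 52), (1994, 55), (1997, 55),
])

# Years whose sales equal those of the nearest earlier year on record.
same_as_previous = conn.execute("""
    SELECT year, sale
      FROM Sales2 S1
     WHERE sale =
           (SELECT sale
              FROM Sales2 S2
             WHERE S2.year =
                   (SELECT MAX(year)            -- most recent of the earlier years
                      FROM Sales2 S3
                     WHERE S1.year > S3.year))  -- years before S1.year
     ORDER BY year
""").fetchall()

# Difference between each year and the nearest earlier year on record.
diffs = conn.execute("""
    SELECT S2.year AS pre_year,
           S1.year AS now_year,
           S2.sale AS pre_sale,
           S1.sale AS now_sale,
           S1.sale - S2.sale AS diff
      FROM Sales2 S1, Sales2 S2
     WHERE S2.year = (SELECT MAX(year)
                        FROM Sales2 S3
                       WHERE S1.year > S3.year)
     ORDER BY now_year
""").fetchall()

print(same_as_previous)
for row in diffs:
    print(row)
```

Replacing `S1.year - 1` with `MAX(year)` over the earlier years is what lets 1992 be compared with 1990 and 1997 with 1994, even though the intervening years are missing.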

5. Moving sums and moving averages

5.1. Examples

  • 5.1.1. -- Running total: using a window function
   SELECT prc_date, prc_amt,
         SUM(prc_amt) OVER (ORDER BY prc_date) AS onhand_amt
     FROM Accounts;
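  • 5.1.2. Window functions were introduced precisely to solve this kind of problem, so the code here is very concise

    • 5.1.2.1. If your database supports window functions, they are worth considering
  • 5.1.3. In terms of performance, the table scan and the sort each happen only once

    • 5.1.3.1. Though this depends on the specific database
  • 5.1.4. -- Running total: using von Neumann-style recursive sets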

   SELECT prc_date, A1.prc_amt,
         (SELECT SUM(prc_amt)
           FROM Accounts A2
           WHERE A1.prc_date >= A2.prc_date ) AS onhand_amt
     FROM Accounts A1
    ORDER BY prc_date;
  • 5.1.5. -- Moving sum (1): using a window function
   SELECT prc_date, prc_amt,
         SUM(prc_amt) OVER (ORDER BY prc_date
                           ROWS 2 PRECEDING) AS onhand_amt
     FROM Accounts;
  • 5.1.6. -- Moving sum (2): intervals with fewer than 3 rows are output too
   SELECT prc_date, A1.prc_amt,
         (SELECT SUM(prc_amt)
           FROM Accounts A2
           WHERE A1.prc_date >= A2.prc_date
             AND (SELECT COUNT(*)
                   FROM Accounts A3
                   WHERE A3.prc_date
                     BETWEEN A2.prc_date AND A1.prc_date  ) <= 3 )
                 AS mvg_sum
     FROM Accounts A1
    ORDER BY prc_date;
  • 5.1.7. A3.prc_date ranges over the interval that starts at A2.prc_date and ends at A1.prc_date

  • 5.1.8. -- Moving sum (3): intervals with fewer than 3 rows are treated as invalid

   SELECT prc_date, A1.prc_amt,
    (SELECT SUM(prc_amt)
       FROM Accounts A2
     WHERE A1.prc_date >= A2.prc_date
       AND (SELECT COUNT(*)
               FROM Accounts A3
             WHERE A3.prc_date
               BETWEEN A2.prc_date AND A1.prc_date  ) <= 3
     HAVING  COUNT(*) =3) AS mvg_sum  -- intervals with fewer than 3 rows are not shown
     FROM Accounts A1
    ORDER BY prc_date;

5.2. The basic idea is to use von Neumann-style recursive sets
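
To compare the window-function and correlated-subquery forms side by side, here is a sketch in Python with `sqlite3`. The `Accounts` rows are hypothetical sample data, and the example assumes the bundled SQLite is version 3.25 or later, since earlier versions have no window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (prc_date TEXT PRIMARY KEY, prc_amt INTEGER NOT NULL)")

# Hypothetical sample data: deposits and withdrawals by date (ISO-8601 text).
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [
    ("2006-10-26", 12000), ("2006-10-28", 2500), ("2006-10-31", -15000),
    ("2006-11-03", 34000), ("2006-11-04", -5000), ("2006-11-06", 7200),
    ("2006-11-11", 11000),
])

def column(sql):
    """Run a query and return its last column as a list."""
    return [row[-1] for row in conn.execute(sql)]

# Running total, window-function form (needs SQLite >= 3.25).
running_window = column("""
    SELECT prc_date, SUM(prc_amt) OVER (ORDER BY prc_date) AS onhand_amt
      FROM Accounts ORDER BY prc_date""")

# Running total, correlated-subquery form: sum every row up to the current date.
running_subquery = column("""
    SELECT prc_date,
           (SELECT SUM(prc_amt) FROM Accounts A2
             WHERE A1.prc_date >= A2.prc_date) AS onhand_amt
      FROM Accounts A1 ORDER BY prc_date""")

# Moving sum over the current row and the two before it, window-function form.
moving_window = column("""
    SELECT prc_date,
           SUM(prc_amt) OVER (ORDER BY prc_date ROWS 2 PRECEDING) AS mvg_sum
      FROM Accounts ORDER BY prc_date""")

# Moving sum, correlated-subquery form: keep A2 rows at most 3 rows back from A1.
moving_subquery = column("""
    SELECT prc_date,
           (SELECT SUM(prc_amt) FROM Accounts A2
             WHERE A1.prc_date >= A2.prc_date
               AND (SELECT COUNT(*) FROM Accounts A3
                     WHERE A3.prc_date BETWEEN A2.prc_date AND A1.prc_date) <= 3
           ) AS mvg_sum
      FROM Accounts A1 ORDER BY prc_date""")

print(running_window)
print(moving_window)
```

Both pairs produce the same numbers; the subquery versions restate "the rows up to here" as a set condition on dates, which is the recursive-set idea the notes refer to.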

6. Finding overlapping time periods

6.1. Examples

  • 6.1.1. -- Find overlapping stays
   SELECT reserver, start_date, end_date
     FROM Reservations R1
    WHERE EXISTS
         (SELECT *
               FROM Reservations R2
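              WHERE R1.reserver <> R2.reserver  -- compare with guests other than oneself
                AND ( R1.start_date BETWEEN R2.start_date AND R2.end_date
                                   -- condition (1): own check-in date falls within another guest's stay
                   OR R1.end_date  BETWEEN R2.start_date AND R2.end_date));
                                   -- condition (2): own check-out date falls within another guest's stay
  • 6.1.2. -- Upgraded version: also output stays that completely contain another guest's stay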
   SELECT reserver, start_date, end_date
    FROM Reservations R1
   WHERE EXISTS
         (SELECT *
             FROM Reservations R2
           WHERE R1.reserver <> R2.reserver
             AND (  (     R1.start_date BETWEEN R2.start_date
                                           AND R2.end_date
                       OR R1.end_date   BETWEEN R2.start_date
                                           AND R2.end_date)
                   OR (    R2.start_date BETWEEN R1.start_date
                                           AND R1.end_date
                       AND R2.end_date   BETWEEN R1.start_date
                                           AND R1.end_date)));
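
The case that the upgraded version adds, a stay that fully contains someone else's, is easiest to see on data. The sketch below uses Python with `sqlite3`; the guest names and dates are hypothetical, and dates are stored as ISO-8601 text so that BETWEEN compares them in date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Reservations
                (reserver TEXT PRIMARY KEY, start_date TEXT, end_date TEXT)""")

# Hypothetical sample data: Uchida's stay completely contains Yamamoto's.
conn.executemany("INSERT INTO Reservations VALUES (?, ?, ?)", [
    ("Kimura",   "2006-10-26", "2006-10-27"),
    ("Araki",    "2006-10-28", "2006-10-31"),
    ("Hori",     "2006-10-31", "2006-11-01"),
    ("Yamamoto", "2006-11-04", "2006-11-04"),
    ("Uchida",   "2006-11-03", "2006-11-05"),
    ("Mizutani", "2006-11-06", "2006-11-06"),
])

# Basic version: my check-in or check-out date falls inside another stay.
basic = [row[0] for row in conn.execute("""
    SELECT reserver FROM Reservations R1
     WHERE EXISTS
           (SELECT * FROM Reservations R2
             WHERE R1.reserver <> R2.reserver
               AND (   R1.start_date BETWEEN R2.start_date AND R2.end_date
                    OR R1.end_date   BETWEEN R2.start_date AND R2.end_date))
     ORDER BY reserver""")]

# Upgraded version: additionally, another stay lies entirely inside mine.
upgraded = [row[0] for row in conn.execute("""
    SELECT reserver FROM Reservations R1
     WHERE EXISTS
           (SELECT * FROM Reservations R2
             WHERE R1.reserver <> R2.reserver
               AND (  (    R1.start_date BETWEEN R2.start_date AND R2.end_date
                        OR R1.end_date   BETWEEN R2.start_date AND R2.end_date)
                   OR (    R2.start_date BETWEEN R1.start_date AND R1.end_date
                       AND R2.end_date   BETWEEN R1.start_date AND R1.end_date)))
     ORDER BY reserver""")]

print(basic)
print(upgraded)
```

Uchida is missed by the basic version because neither endpoint of that stay lies inside Yamamoto's single day; the extra containment condition picks it up.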
