Generating more rows based on a column if criteria are not matched (SQL)

【Title】: Generating more rows based on a column if criteria are not matched (SQL) 【Posted】: 2019-09-20 15:20:27 【Question】:

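I have the tables below. The first holds the expected number of files plus a few other fields; it is a lookup (reference) table.

The Oracle version I am using is Oracle Database 18c Enterprise E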

id             nm    expected_num_files     file_name_expr
1             CVS             3               cvs_d.*.zip
2             CVS             2               cvs_w.*.gz
3             Rite-aid        4               ra_d.*.gz
5             Walgreen        2               wal_d*.txt

I also have an audit table that records the files actually received. The two tables can be joined on id.

audit_id    id  file_nm
123          1  cvs_d1.zip
124          1  cvs_d2.zip
125          2  cvs_w1.gz
126          1  cvs_d3.zip

In the ideal situation, all files have been received.

Ideal result of select id, count(*) from auditlog group by id:

   id              count_files 
    1                 3           
    2                 2           
    3                 4           
    5                 2 

Current result from the audit table. At the moment only some of the files have been received:

   id              count_files
    1                 3           
    2                 1  

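To reach the ideal situation, I need to populate a final table with dummy records built from the lookup table, using a placeholder audit_id (-1) for every file that has not arrived.

The final output table should look like the one below.

If I run select id, count(*) from auditlog group by id against the final table, I get the ideal result shown above.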

audit_id  id  file_nm
    123      1  cvs_d1.zip
    124      1  cvs_d2.zip
    126      1  cvs_d3.zip
    -1       2  cvs_w.*.gz
    125      2  cvs_w1.gz
    -1       3  ra_d.*.gz
    -1       3  ra_d.*.gz
    -1       3  ra_d.*.gz
    -1       3  ra_d.*.gz
    -1       5  wal_d*.txt
    -1       5  wal_d*.txt

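The initial rows are easy to produce, but the rows with -1 have to be generated from a column: the number of files not yet received (expected minus received) for each id.

Explanation of the final table: id 1 has 3 records in the audit table, so those are copied into the final table as they are. id 2 has one record in the audit table, which is copied, and the one remaining expected file is filled in with a -1 row. ids 3 and 5 have no audit records, so all of their rows are -1 rows.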
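
To make the gap concrete, here is a minimal Python sketch (not part of the original post) that counts, for each id, how many -1 placeholder rows the final table needs. The names `expected`, `received_ids` and `missing` are made up for illustration; the numbers are copied from the two tables above.

```python
from collections import Counter

# id -> expected_num_files, copied from the lookup table in the question.
expected = {1: 3, 2: 2, 3: 4, 5: 2}

# The id column of the four rows currently in the audit table
# (audit_id 123, 124, 125, 126, in that order).
received_ids = [1, 1, 2, 1]

received = Counter(received_ids)

# Number of -1 placeholder rows each id needs. Clamped at zero in case
# more files arrive than were expected.
missing = {
    file_id: max(count - received[file_id], 0)
    for file_id, count in expected.items()
}

print(missing)  # {1: 0, 2: 1, 3: 4, 5: 2}
```

Those seven placeholder rows plus the four real audit rows give the eleven rows of the desired final table.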

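【Comments】:

file_name_expr does not match file_nm.
I have read this three times and I still do not understand what you are trying to do. Could you elaborate?
@Sens, can we chat?
@Achyuth I am only on the app right now and I do not know whether it works there. But please @Sens, can you look at the question now?

【Answer 1】: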

For the data you provided, you can leave the tables as they are and create a view that presents the data in the shape you need:

WITH /* The following is the lookup data you provided: */
     lookup(id, nm, expected_num_files, file_name_expr) AS 
    (SELECT 1, 'CVS',     3,'cvs_d.*.zip' from dual union all
     SELECT 2, 'CVS',     2,'cvs_w.*.gz'  from dual union all
     SELECT 3, 'Rite-aid',4,'ra_d.*.gz'   from dual union all
     SELECT 5, 'Walgreen',2,'wal_d*.txt'  from dual)
   , /* This is the current auditlog as you described: */
     auditlog(audit_id, id, file_nm) AS
    (select 123, 1, 'cvs_d1.zip' from dual union all
     select 124, 1, 'cvs_d2.zip' from dual union all
     select 125, 2, 'cvs_w1.gz'  from dual union all
     select 126, 1, 'cvs_d3.zip' from dual)
   , /* Row generator: the numbers 1 .. MAX(expected_num_files), inclusive: */
     rn AS (SELECT LEVEL rn FROM dual 
        CONNECT BY LEVEL <= (SELECT MAX(expected_num_files) FROM lookup))
/* This is the select you can put into a view: */
SELECT NVL(a.audit_id, -1) AS audit_id
     , NVL(a.id,l.id)      AS id
     , NVL(a.file_nm, l.file_name_expr) AS file_nm
  FROM lookup l
 /* Create a Row for every expected file: */
  JOIN rn r
    ON r.rn <= l.expected_num_files
 FULL JOIN (SELECT a.*
                 , row_number() over(PARTITION BY id ORDER BY audit_id) AS rn 
              FROM auditlog a) a
   ON a.id = l.id
   AND a.rn = r.rn
 ORDER BY 2,1

Result:

AUDIT_ID | ID | FILE_NM
---------+----+-----------
   123   |  1 | cvs_d1.zip
   124   |  1 | cvs_d2.zip
   126   |  1 | cvs_d3.zip
   -1    |  2 | cvs_w.*.gz
   125   |  2 | cvs_w1.gz
   -1    |  3 | ra_d.*.gz
   -1    |  3 | ra_d.*.gz
   -1    |  3 | ra_d.*.gz
   -1    |  3 | ra_d.*.gz
   -1    |  5 | wal_d*.txt
   -1    |  5 | wal_d*.txt
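
The query works by numbering the received files per id and matching file number k to expected slot k; slots with no match fall back to -1 and the file name pattern. A rough Python sketch of that pairing, under the assumption that the sample data above is all there is (the function name `fill_slots` is hypothetical, and audit rows whose id is absent from the lookup are not handled):

```python
from itertools import zip_longest

LOOKUP = [  # (id, expected_num_files, file_name_expr)
    (1, 3, "cvs_d.*.zip"),
    (2, 2, "cvs_w.*.gz"),
    (3, 4, "ra_d.*.gz"),
    (5, 2, "wal_d*.txt"),
]
AUDIT = [  # (audit_id, id, file_nm)
    (123, 1, "cvs_d1.zip"),
    (124, 1, "cvs_d2.zip"),
    (125, 2, "cvs_w1.gz"),
    (126, 1, "cvs_d3.zip"),
]


def fill_slots(lookup, audit):
    """Pair slot k of each id with the k-th received file, if there is one."""
    rows = []
    for file_id, expected, expr in lookup:
        # Plays the role of ROW_NUMBER() OVER (PARTITION BY id ORDER BY audit_id).
        got = sorted(a for a in audit if a[1] == file_id)
        # zip_longest keeps surplus audit rows too, like the FULL JOIN does.
        for _slot, hit in zip_longest(range(1, expected + 1), got):
            # Plays the role of the NVL calls: fall back to -1 and the pattern.
            rows.append(hit if hit is not None else (-1, file_id, expr))
    return sorted(rows, key=lambda r: (r[1], r[0]))  # ORDER BY 2, 1


final_rows = fill_slots(LOOKUP, AUDIT)
for row in final_rows:
    print(row)
```

With the sample data this yields the same eleven rows as the result table above.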

Another way to write the query is as follows:

WITH lookup(id, nm, expected_num_files, file_name_expr) AS 
    (SELECT 1, 'CVS',     3,'cvs_d.*.zip' from dual union all
     SELECT 2, 'CVS',     2,'cvs_w.*.gz'  from dual union all
     SELECT 3, 'Rite-aid',4,'ra_d.*.gz'   from dual union all
     SELECT 5, 'Walgreen',2,'wal_d*.txt'  from dual)
   , auditlog(audit_id, id, file_nm) AS
    (select 123, 1, 'cvs_d1.zip' from dual union all
     select 124, 1, 'cvs_d2.zip' from dual union all
     select 125, 2, 'cvs_w1.gz'  from dual union all
     select 126, 1, 'cvs_d3.zip' from dual) 
   , rn AS (SELECT LEVEL rn FROM dual 
        CONNECT BY LEVEL <= (SELECT MAX(expected_num_files) FROM lookup))
SELECT * FROM auditlog
UNION ALL 
SELECT -1, l.id, l.file_name_expr
  FROM lookup l
  JOIN rn r
    ON r.rn <= l.expected_num_files - NVL((SELECT COUNT(*) FROM auditlog WHERE id = l.id),0)
 ORDER BY 2,1
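
For anyone who wants to try the UNION ALL idea without an Oracle instance, below is a hedged port to SQLite driven from Python's standard `sqlite3` module. It is a sketch, not the answer's own code: SQLite has no CONNECT BY, so a recursive CTE (named `nums` here, an arbitrary choice) stands in for the row generator, and the CTE name `padded` is also made up. It finishes with the question's check query to confirm that every id reaches its expected count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lookup (id INTEGER, nm TEXT,
                         expected_num_files INTEGER, file_name_expr TEXT);
    CREATE TABLE auditlog (audit_id INTEGER, id INTEGER, file_nm TEXT);
    INSERT INTO lookup VALUES (1, 'CVS', 3, 'cvs_d.*.zip');
    INSERT INTO lookup VALUES (2, 'CVS', 2, 'cvs_w.*.gz');
    INSERT INTO lookup VALUES (3, 'Rite-aid', 4, 'ra_d.*.gz');
    INSERT INTO lookup VALUES (5, 'Walgreen', 2, 'wal_d*.txt');
    INSERT INTO auditlog VALUES (123, 1, 'cvs_d1.zip');
    INSERT INTO auditlog VALUES (124, 1, 'cvs_d2.zip');
    INSERT INTO auditlog VALUES (125, 2, 'cvs_w1.gz');
    INSERT INTO auditlog VALUES (126, 1, 'cvs_d3.zip');
""")

# Shared WITH clause: real audit rows plus one -1 row per missing file.
PADDED = """
WITH RECURSIVE nums(n) AS (      -- 1 .. MAX(expected_num_files), inclusive
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM nums
     WHERE n < (SELECT MAX(expected_num_files) FROM lookup)
),
padded(audit_id, id, file_nm) AS (
    SELECT audit_id, id, file_nm FROM auditlog
    UNION ALL
    SELECT -1, l.id, l.file_name_expr
      FROM lookup l
      JOIN nums r
        ON r.n <= l.expected_num_files
                  - (SELECT COUNT(*) FROM auditlog a WHERE a.id = l.id)
)
"""

rows = conn.execute(
    PADDED + "SELECT * FROM padded ORDER BY id, audit_id"
).fetchall()
counts = conn.execute(
    PADDED + "SELECT id, COUNT(*) FROM padded GROUP BY id ORDER BY id"
).fetchall()

print(counts)  # [(1, 3), (2, 2), (3, 4), (5, 2)]
```

The per-id counts match expected_num_files in the lookup table, which is the "ideal result" the question asks for.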

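【Comments】:

Is there any alternative to the cross join? I have millions of records, so it will hurt performance.
I have changed the query slightly to use a join instead of a cross join. I am not sure whether it is more efficient.

【Answer 2】: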

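"Is there any alternative to the cross join? I have millions of records, so it will hurt performance."

That is always hard to judge. CROSS APPLY may be faster. At the very least, it avoids generating a large number of rows that end up being discarded. It may be worth a try.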

SELECT coalesce(al.audit_id,-1) audit_id, 
       s.id, 
       coalesce(al.file_nm, s.file_name_expr) file_nm
FROM audit_summary s 
CROSS APPLY ( SELECT rownum rn FROM dual CONNECT BY rownum <= s.expected_num_files ) ef
LEFT JOIN LATERAL ( SELECT row_number() over ( partition by al.id ORDER BY al.audit_id ) rn, 
                           al.file_nm, 
                           al.audit_id, 
                           al.id 
                    FROM   audit_log al 
                    WHERE  al.id = s.id) al ON al.rn = ef.rn
ORDER BY 2,3,1;
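
As a back-of-the-envelope illustration of the "rows that end up being discarded" point, the Python sketch below compares how many (lookup row, number) pairs exist when every lookup row is paired with one shared generator of 1..MAX before filtering, versus when each row generates only its own slots. This is simple counting on the sample data, with made-up variable names; it says nothing about what the Oracle optimizer actually does with either query.

```python
# id -> expected_num_files, from the sample data in the question.
expected = {1: 3, 2: 2, 3: 4, 5: 2}

max_n = max(expected.values())

# Shared generator: each lookup row can be paired with every number 1..max_n,
# and the join condition then filters out the pairs above its own count.
candidate_pairs = len(expected) * max_n

# Per-row generator (the CROSS APPLY idea): each row produces exactly
# expected_num_files numbers, so nothing has to be filtered out afterwards.
kept_pairs = sum(expected.values())

discarded = candidate_pairs - kept_pairs
print(candidate_pairs, kept_pairs, discarded)  # 16 11 5
```

The gap grows when a few ids expect many files and most expect only a few, because the shared generator is sized by the largest count.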

Here is a complete example with data (audit_summary and audit_log stand in for the lookup and audit tables):

WITH audit_summary (id, nm, expected_num_files, file_name_expr ) AS
( SELECT 1, 'CVS', 3, 'cvs_d.*.zip' FROM DUAL UNION ALL
  SELECT 2, 'CVS', 2, 'cvs_w.*.gz' FROM DUAL UNION ALL
  SELECT 3, 'Rite-aid', 4, 'ra_d.*.gz' FROM DUAL UNION ALL
  SELECT 4, 'Walgreen', 2, 'wal_d*.txt' FROM DUAL),
audit_log (audit_id, id, file_nm) AS 
( SELECT 123, 1, 'cvs_d1.zip' FROM DUAL UNION ALL
  SELECT 124, 1, 'cvs_d2.zip' FROM DUAL UNION ALL
  SELECT 125, 2, 'cvs_w1.gz' FROM DUAL UNION ALL
  SELECT 126, 1, 'cvs_d3.zip' FROM DUAL )
SELECT coalesce(al.audit_id,-1) audit_id, 
       s.id, 
       coalesce(al.file_nm, s.file_name_expr) file_nm
FROM audit_summary s 
CROSS APPLY ( SELECT rownum rn FROM dual CONNECT BY rownum <= s.expected_num_files ) ef
LEFT JOIN LATERAL ( SELECT row_number() over ( partition by al.id ORDER BY al.audit_id ) rn, 
                           al.file_nm, 
                           al.audit_id, 
                           al.id 
                    FROM   audit_log al 
                    WHERE  al.id = s.id) al ON al.rn = ef.rn
ORDER BY 2,3,1;

Result:

+----------+----+------------+
| AUDIT_ID | ID |  FILE_NM   |
+----------+----+------------+
|      123 |  1 | cvs_d1.zip |
|      124 |  1 | cvs_d2.zip |
|      126 |  1 | cvs_d3.zip |
|       -1 |  2 | cvs_w.*.gz |
|      125 |  2 | cvs_w1.gz  |
|       -1 |  3 | ra_d.*.gz  |
|       -1 |  3 | ra_d.*.gz  |
|       -1 |  3 | ra_d.*.gz  |
|       -1 |  3 | ra_d.*.gz  |
|       -1 |  4 | wal_d*.txt |
|       -1 |  4 | wal_d*.txt |
+----------+----+------------+

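I am of two minds about whether the LEFT JOIN LATERAL is worthwhile. It is equivalent to a plain LEFT JOIN with the al.id = s.id condition moved out of the lateral view and into the join condition. My vague idea in making it a LEFT JOIN LATERAL was that, if you suspect a particular file is missing, you can run the query piecemeal (by audit_summary.id).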
