Generating more rows based on a column if criteria are not matched (SQL)
Posted: 2019-09-20 15:20:27
Question:
I am using Oracle Database 18c Enterprise Edition.
I have the tables below. The first one holds the expected number of files along with a few other fields. It is a lookup (reference) table.
id | nm       | expected_num_files | file_name_expr
---+----------+--------------------+---------------
1  | CVS      | 3                  | cvs_d.*.zip
2  | CVS      | 2                  | cvs_w.*.gz
3  | Rite-aid | 4                  | ra_d.*.gz
5  | Walgreen | 2                  | wal_d*.txt
I also have an audit table that records the files actually received. The two tables can be joined on id.
audit_id | id | file_nm
---------+----+-----------
123      | 1  | cvs_d1.zip
124      | 1  | cvs_d2.zip
125      | 2  | cvs_w1.gz
126      | 1  | cvs_d3.zip
In the ideal case, all files have been received.
Ideal result of select id, count(*) from auditlog group by id:
id | count_files
---+------------
1  | 3
2  | 2
3  | 4
5  | 2
Current result from the audit table. In the current situation I have received only some of the files:
id | count_files
---+------------
1  | 3
2  | 1
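To reach the ideal case, I need to populate a final table with dummy records taken from the lookup table, using a placeholder audit_id.
The final output table should look like the one below.
If I run the query select id, count(*) from auditlog group by id against the final table, I get the ideal result shown above.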
audit_id | id | file_nm
---------+----+-----------
123      | 1  | cvs_d1.zip
124      | 1  | cvs_d2.zip
126      | 1  | cvs_d3.zip
-1       | 2  | cvs_w.*.gz
125      | 2  | cvs_w1.gz
-1       | 3  | ra_d.*.gz
-1       | 3  | ra_d.*.gz
-1       | 3  | ra_d.*.gz
-1       | 3  | ra_d.*.gz
-1       | 5  | wal_d*.txt
-1       | 5  | wal_d*.txt
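The rows for received files are easy to generate. The rows with -1, however, have to be generated from the number of files not yet received, that is, expected_num_files minus the number of files in the audit table.
Explanation of the final table: id 1 has 3 records in the audit table, so all three are carried over. id 2 has one record in the audit table, which is carried over, and the remaining expected row is filled in with -1. Ids with no audit records at all are filled entirely with -1 rows.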
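The number of dummy rows needed per id can be checked with a simple aggregate. This is only a sketch: the question never names the lookup table, so the name lookup is an assumption, while auditlog is the name used in the question's own query.

```sql
-- Assumption: the reference table is called lookup; auditlog is the audit table.
-- COUNT(a.audit_id) counts only matched audit rows, so ids with no files get 0.
SELECT l.id
     , l.expected_num_files
     , COUNT(a.audit_id)                        AS received_files
     , l.expected_num_files - COUNT(a.audit_id) AS missing_files
FROM lookup l
LEFT JOIN auditlog a
  ON a.id = l.id
GROUP BY l.id, l.expected_num_files
ORDER BY l.id;
```

For the sample data above, id 1 is complete, id 2 is short by one file, id 3 by four, and id 5 by two, which matches the number of -1 rows in the desired output.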
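Comments:
file_name_expr does not match file_nm.
I have read this three times and I still do not understand what you are trying to do. Could you elaborate?
@Sens, can we chat?
@Achyuth I am only on the app right now, and I do not know whether chat works there. But please.
@Sens can you take a look at the question now?
Answer 1:
For the data you provided, you can leave the tables as they are and create a view that delivers the data in the shape you need: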
WITH /* The following is the lookup data you provided: */
lookup(id, nm, expected_num_files, file_name_expr) AS
(SELECT 1, 'CVS', 3,'cvs_d.*.zip' from dual union all
SELECT 2, 'CVS', 2,'cvs_w.*.gz' from dual union all
SELECT 3, 'Rite-aid',4,'ra_d.*.gz' from dual union all
SELECT 5, 'Walgreen',2,'wal_d*.txt' from dual)
, /* This is the current auditlog as you described: */
auditlog(audit_id, id, file_nm) AS
(select 123, 1, 'cvs_d1.zip' from dual union all
select 124, 1, 'cvs_d2.zip' from dual union all
select 125, 2, 'cvs_w1.gz' from dual union all
select 126, 1, 'cvs_d3.zip' from dual)
, rn AS (SELECT LEVEL rn FROM dual
/* <= rather than <, so that the largest expected_num_files is included: */
CONNECT BY LEVEL <= (SELECT MAX(expected_num_files) FROM lookup))
/* This is the select you can put into a view: */
SELECT NVL(a.audit_id, -1) AS audit_id
, NVL(a.id,l.id) AS id
, NVL(a.file_nm, l.file_name_expr) AS file_nm
FROM lookup l
/* Create a Row for every expected file: */
JOIN rn r
ON r.rn <= l.expected_num_files
FULL JOIN (SELECT a.*
, row_number() over(PARTITION BY id ORDER BY audit_id) AS rn
FROM auditlog a) a
ON a.id = l.id
AND a.rn = r.rn
ORDER BY 2,1
Result:
AUDIT_ID | ID | FILE_NM
---------+----+-----------
123 | 1 | cvs_d1.zip
124 | 1 | cvs_d2.zip
126 | 1 | cvs_d3.zip
-1 | 2 | cvs_w.*.gz
125 | 2 | cvs_w1.gz
-1 | 3 | ra_d.*.gz
-1 | 3 | ra_d.*.gz
-1 | 3 | ra_d.*.gz
-1 | 3 | ra_d.*.gz
-1 | 5 | wal_d*.txt
-1 | 5 | wal_d*.txt
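To turn the select into the view mentioned above, the inline sample data can be replaced by the real tables. The following is a minimal sketch, assuming the real tables are named lookup and auditlog and have the columns shown in the question; the view name auditlog_filled is hypothetical. The row generator uses <= so that the largest expected_num_files value is covered.

```sql
-- Assumptions: tables lookup and auditlog exist; auditlog_filled is a made-up name.
CREATE OR REPLACE VIEW auditlog_filled AS
WITH rn AS (
  -- One row per possible file position: 1 .. largest expected_num_files
  SELECT LEVEL AS rn
  FROM dual
  CONNECT BY LEVEL <= (SELECT MAX(expected_num_files) FROM lookup)
)
SELECT NVL(a.audit_id, -1)              AS audit_id
     , NVL(a.id, l.id)                  AS id
     , NVL(a.file_nm, l.file_name_expr) AS file_nm
FROM lookup l
JOIN rn r
  ON r.rn <= l.expected_num_files
FULL JOIN (SELECT a.*
                , ROW_NUMBER() OVER (PARTITION BY id ORDER BY audit_id) AS rn
           FROM auditlog a) a
  ON a.id = l.id
 AND a.rn = r.rn;

-- Sanity check: each id should now report its expected number of files.
SELECT id, COUNT(*) AS count_files
FROM auditlog_filled
GROUP BY id
ORDER BY id;
```

Because of the FULL JOIN, an id that has received more files than expected keeps all of its audit rows, so its count can exceed expected_num_files.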
Another way to write the query is as follows:
WITH lookup(id, nm, expected_num_files, file_name_expr) AS
(SELECT 1, 'CVS', 3,'cvs_d.*.zip' from dual union all
SELECT 2, 'CVS', 2,'cvs_w.*.gz' from dual union all
SELECT 3, 'Rite-aid',4,'ra_d.*.gz' from dual union all
SELECT 5, 'Walgreen',2,'wal_d*.txt' from dual)
, auditlog(audit_id, id, file_nm) AS
(select 123, 1, 'cvs_d1.zip' from dual union all
select 124, 1, 'cvs_d2.zip' from dual union all
select 125, 2, 'cvs_w1.gz' from dual union all
select 126, 1, 'cvs_d3.zip' from dual)
, rn AS (SELECT LEVEL rn FROM dual
/* <= rather than <, so that the largest expected_num_files is included: */
CONNECT BY LEVEL <= (SELECT MAX(expected_num_files) FROM lookup))
SELECT * FROM auditlog
UNION ALL
SELECT -1, l.id, l.file_name_expr
FROM lookup l
JOIN rn r
ON r.rn <= l.expected_num_files - NVL((SELECT COUNT(*) FROM auditlog WHERE id = l.id),0)
ORDER BY 2,1
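Comments:
Is there any alternative to a cross join? I have millions of records, so it would hurt performance.
I have changed the query slightly to use a join instead of a cross join. I am not sure whether it is more efficient.
Answer 2:
"Is there any alternative to a cross join? I have millions of records, so it would hurt performance."
That is always hard to judge. CROSS APPLY might be faster. At the very least, it avoids generating a large number of rows that are thrown away in the end. It may be worth a try.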
SELECT coalesce(al.audit_id,-1) audit_id,
s.id,
coalesce(al.file_nm, s.file_name_expr) file_nm
FROM audit_summary s
CROSS APPLY ( SELECT rownum rn FROM dual CONNECT BY rownum <= s.expected_num_files ) ef
LEFT JOIN LATERAL ( SELECT row_number() over ( partition by al.id ORDER BY al.audit_id ) rn,
al.file_nm,
al.audit_id,
al.id
FROM audit_log al
WHERE al.id = s.id) al ON al.rn = ef.rn
ORDER BY 2,3,1;
Here is a complete example with data:
WITH audit_summary (id, nm, expected_num_files, file_name_expr ) AS
( SELECT 1, 'CVS', 3, 'cvs_d.*.zip' FROM DUAL UNION ALL
SELECT 2, 'CVS', 2, 'cvs_w.*.gz' FROM DUAL UNION ALL
SELECT 3, 'Rite-aid', 4, 'ra_d.*.gz' FROM DUAL UNION ALL
SELECT 4, 'Walgreen', 2, 'wal_d*.txt' FROM DUAL),
audit_log (audit_id, id, file_nm) AS
( SELECT 123, 1, 'cvs_d1.zip' FROM DUAL UNION ALL
SELECT 124, 1, 'cvs_d2.zip' FROM DUAL UNION ALL
SELECT 125, 2, 'cvs_w1.gz' FROM DUAL UNION ALL
SELECT 126, 1, 'cvs_d3.zip' FROM DUAL )
SELECT coalesce(al.audit_id,-1) audit_id,
s.id,
coalesce(al.file_nm, s.file_name_expr) file_nm
FROM audit_summary s
CROSS APPLY ( SELECT rownum rn FROM dual CONNECT BY rownum <= s.expected_num_files ) ef
LEFT JOIN LATERAL ( SELECT row_number() over ( partition by al.id ORDER BY al.audit_id ) rn,
al.file_nm,
al.audit_id,
al.id
FROM audit_log al
WHERE al.id = s.id) al ON al.rn = ef.rn
ORDER BY 2,3,1;
+----------+----+------------+
| AUDIT_ID | ID | FILE_NM    |
+----------+----+------------+
|      123 |  1 | cvs_d1.zip |
|      124 |  1 | cvs_d2.zip |
|      126 |  1 | cvs_d3.zip |
|       -1 |  2 | cvs_w.*.gz |
|      125 |  2 | cvs_w1.gz  |
|       -1 |  3 | ra_d.*.gz  |
|       -1 |  3 | ra_d.*.gz  |
|       -1 |  3 | ra_d.*.gz  |
|       -1 |  3 | ra_d.*.gz  |
|       -1 |  4 | wal_d*.txt |
|       -1 |  4 | wal_d*.txt |
+----------+----+------------+
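I am of two minds about whether the LEFT JOIN LATERAL is useful. It is equivalent to a plain LEFT JOIN with the al.id = s.id condition moved from the lateral view into the join condition. My vague idea behind LEFT JOIN LATERAL was that, if you suspect a particular file is missing, you can run the query piecewise (by audit_summary.id).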
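For comparison, the plain LEFT JOIN form described here might look like the sketch below. It reuses the audit_summary and audit_log names from this answer, moves al.id = s.id into the ON clause, and has not been benchmarked against the lateral version.

```sql
-- Same output as the LEFT JOIN LATERAL query; only the correlation has moved.
SELECT COALESCE(al.audit_id, -1)              AS audit_id
     , s.id
     , COALESCE(al.file_nm, s.file_name_expr) AS file_nm
FROM audit_summary s
-- One generated row per expected file of this summary row
CROSS APPLY (SELECT ROWNUM AS rn
             FROM dual
             CONNECT BY ROWNUM <= s.expected_num_files) ef
-- Number the received files per id, then match them to the generated rows
LEFT JOIN (SELECT ROW_NUMBER() OVER (PARTITION BY al.id ORDER BY al.audit_id) AS rn
                , al.file_nm
                , al.audit_id
                , al.id
           FROM audit_log al) al
  ON al.id = s.id
 AND al.rn = ef.rn
ORDER BY 2, 3, 1;
```

In this form the inline view numbers every row of audit_log before joining, whereas the lateral version evaluates the inline view per audit_summary row. Which one performs better on millions of rows is something to measure on the real data.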