Transform n columns of events occurring together into two columns of pairs of occurrences
[Posted]: 2021-08-06 09:14:35
[Question]: I currently have a table like this:
A | B | C | D | n |
---|---|---|---|---|
1 | 1 | 0 | 0 | 50 |
0 | 0 | 1 | 0 | 100 |
0 | 1 | 1 | 1 | 200 |
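The first row means that events A and B occurred together 50 times, the second row means that event C occurred on its own 100 times, and so on.
In reality I have about 10 events and as many combinations as 10 events allow. I would like to transform the table into 3 columns, where the rows corresponding to the table above look like this: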
Event1 | Event2 | n |
---|---|---|
A | B | 50 |
C | C | 100 |
B | C | 200 |
B | D | 200 |
C | D | 200 |
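I don't know where to start with this. I thought about looking at SQL explode functions, or pivoting the data, but I don't know how to apply them to this data.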
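[Comments]:
Is there a primary key in the table, e.g. id?
@forpas The table is derived from a table that has a primary key; the column headers (events) are all primary keys. This table itself has no primary key and is used only for counting/analysis purposes.
Why are A->A, B->B and D->D omitted?
@berihulel As I mentioned in the post, I only included the rows relevant to the table at the top. None of A, B or D occurred on its own.
@15150776 . . . Why is C/C needed in the second example but A/A not needed in the first?
[Answer 1]: There are many ways to do this. This one can be extended to handle more columns simply by adding UNION terms to the data2 CTE term. I'm sure there are ways to shorten it, but this version is fairly clear.
I used MySQL 8.0.26 for this test case.
Note: MariaDB 10.5.0 has some bugs in this area. If you happen to try MariaDB, some edits may be needed to work around those bugs. Nothing scary. Just annoying.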
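data0: supplies your initial table.
data1: just forces an ordering of that data and then assigns a unique id to each row.
data2: partially normalizes the data via UNION. Add terms to handle more columns.
data: adds a COUNT (cnt) per id to identify events that have no pairing.
tcross: generates a partial cross product of pairs, preventing reflexive pairs and the like.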
-- data0: the sample table from the question
WITH data0 (a, b, c, d, n) AS (
    SELECT 1, 1, 0, 0, 50 UNION
    SELECT 0, 0, 1, 0, 100 UNION
    SELECT 0, 1, 1, 1, 200
)
-- data1: give every source row a unique id
, data1 (a, b, c, d, n, id) AS (
    SELECT t.*, ROW_NUMBER() OVER (ORDER BY n) FROM data0 AS t
)
-- data2: one row per (source row, event that occurred in it)
, data2 (event, n, id) AS (
    SELECT 'A', n, id FROM data1 WHERE a = 1 UNION
    SELECT 'B', n, id FROM data1 WHERE b = 1 UNION
    SELECT 'C', n, id FROM data1 WHERE c = 1 UNION
    SELECT 'D', n, id FROM data1 WHERE d = 1
)
-- data: cnt = number of events in the same source row
, data (event, n, id, cnt) AS (
    SELECT t.*, COUNT(*) OVER (PARTITION BY id) FROM data2 AS t
)
-- tcross: pair each event with every later event of the same row;
-- an event that is alone in its row (cnt = 1) is paired with itself
, tcross (event1, event2, n, id) AS (
    SELECT t1.event, COALESCE(t2.event, t1.event), t1.n, t1.id
    FROM data AS t1
    LEFT JOIN data AS t2
      ON t1.id = t2.id
     AND t1.event < t2.event
    WHERE t2.event IS NOT NULL OR t1.cnt = 1
)
SELECT event1, event2, n
FROM tcross
ORDER BY id, event1, event2
;
Result:
+--------+--------+-----+
| event1 | event2 | n |
+--------+--------+-----+
| A | B | 50 |
| C | C | 100 |
| B | C | 200 |
| B | D | 200 |
| C | D | 200 |
+--------+--------+-----+
Fully working test case
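The answer notes that more columns only need extra UNION terms in data2. A minimal sketch of that, assuming a hypothetical fifth event column e and one invented row (A and E together, n = 75); neither is part of the question, and the query is written against MySQL 8 like the rest of this answer:

```sql
-- Hypothetical extension: column e and the n = 75 row are invented for illustration.
WITH data0 (a, b, c, d, e, n) AS (
    SELECT 1, 1, 0, 0, 0, 50  UNION ALL
    SELECT 0, 0, 1, 0, 0, 100 UNION ALL
    SELECT 0, 1, 1, 1, 0, 200 UNION ALL
    SELECT 1, 0, 0, 0, 1, 75
)
, data1 (a, b, c, d, e, n, id) AS (
    SELECT t.*, ROW_NUMBER() OVER (ORDER BY n) FROM data0 AS t
)
, data2 (event, n, id) AS (
    SELECT 'A', n, id FROM data1 WHERE a = 1 UNION ALL
    SELECT 'B', n, id FROM data1 WHERE b = 1 UNION ALL
    SELECT 'C', n, id FROM data1 WHERE c = 1 UNION ALL
    SELECT 'D', n, id FROM data1 WHERE d = 1 UNION ALL
    SELECT 'E', n, id FROM data1 WHERE e = 1   -- the only new term in data2
)
, data (event, n, id, cnt) AS (
    SELECT t.*, COUNT(*) OVER (PARTITION BY id) FROM data2 AS t
)
, tcross (event1, event2, n, id) AS (
    SELECT t1.event, COALESCE(t2.event, t1.event), t1.n, t1.id
    FROM data AS t1
    LEFT JOIN data AS t2
      ON t1.id = t2.id
     AND t1.event < t2.event
    WHERE t2.event IS NOT NULL OR t1.cnt = 1
)
SELECT event1, event2, n
FROM tcross
ORDER BY id, event1, event2;
```

Apart from the sample data and the wider column lists, the only change is the extra term for E in data2; the pairing logic in data and tcross is untouched. If it behaves as intended, the invented row should contribute a single extra pair, A | E | 75.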
[Comments]:
[Answer 2]: You can use a brute-force approach with union all, like this:
select 'A' as event1, 'B' as event2, n
from t
where a = 1 and b = 1
union all
select 'A' as event1, 'C' as event2, n
from t
where a = 1 and c = 1
union all
select 'A' as event1, 'D' as event2, n
from t
where a = 1 and d = 1
union all
select 'B' as event1, 'C' as event2, n
from t
where b = 1 and c = 1
union all
select 'B' as event1, 'D' as event2, n
from t
where b = 1 and d = 1
union all
select 'C' as event1, 'D' as event2, n
from t
where c = 1 and d = 1;
You can also use a join approach:
select x.event1, x.event2, t.n
from t join
(select 1 as a, 1 as b, 0 as c, 0 as d, 'A' as event1, 'B' as event2 union all
select 1 as a, 0 as b, 1 as c, 0 as d, 'A' as event1, 'C' as event2 union all
select 1 as a, 0 as b, 0 as c, 1 as d, 'A' as event1, 'D' as event2 union all
select 0 as a, 1 as b, 1 as c, 0 as d, 'B' as event1, 'C' as event2 union all
select 0 as a, 1 as b, 0 as c, 1 as d, 'B' as event1, 'D' as event2 union all
select 0 as a, 0 as b, 1 as c, 1 as d, 'C' as event1, 'D' as event2
) x
on (x.a = t.a or x.a = 0) and
(x.b = t.b or x.b = 0) and
(x.c = t.c or x.c = 0) and
(x.d = t.d or x.d = 0);
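As written, both queries only emit rows for pairs, so an event that occurs on its own (the C | C | 100 row in the question) produces nothing. A minimal sketch of one way to produce those rows, assuming the flag columns only ever hold 0 or 1 (never NULL); the sample data is inlined as a CTE named t so the sketch runs on its own in MySQL 8:

```sql
-- Sample rows from the question, inlined for a self-contained run.
WITH t (a, b, c, d, n) AS (
    SELECT 1, 1, 0, 0, 50  UNION ALL
    SELECT 0, 0, 1, 0, 100 UNION ALL
    SELECT 0, 1, 1, 1, 200
)
SELECT s.event AS event1, s.event AS event2, s.n
FROM (
    -- Exactly one flag is set in these rows, so the CASE picks that event.
    SELECT CASE WHEN a = 1 THEN 'A'
                WHEN b = 1 THEN 'B'
                WHEN c = 1 THEN 'C'
                ELSE 'D'
           END AS event,
           n
    FROM t
    WHERE a + b + c + d = 1   -- rows where a single event occurred alone
) AS s;
```

For the sample data this returns the single row C | C | 100. Against a real table t, the final SELECT (without the CTE) could be appended to either query above with one more union all.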
[Comments]:
This does not return the second row of the expected result.