Transform n columns of events occurring together into two columns of pairs of occurrences

Posted: 2021-08-06 09:14:35

Question:

I currently have a table like this:

A B C D n
1 1 0 0 50
0 0 1 0 100
0 1 1 1 200

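The first row means that events A and B occurred together 50 times, the second row means that event C occurred on its own 100 times, and so on.

In reality I have about 10 events, and as many combinations as 10 events allow. I would like to transform the table into 3 columns, where the rows derived from the table above would look like this: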

Event1 Event2 n
A B 50
C C 100
B C 200
B D 200
C D 200

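I don't know where to start with this. I thought about looking into SQL explode functions, or pivoting the data, but I'm not sure how to apply them here.

Comments:

Is there a primary key in the table, such as an id?

@forpas This table is derived from a table that has a primary key; the column headers (the events) are all primary keys. This table itself has no primary key and is used only for counting/analysis purposes.

Why are A->A, B->B and D->D omitted?

@berihulel As I mentioned in the post, I only included the rows that relate to the table at the top. None of A, B or D occurred on its own.

@15150776 . . . Why is C/C needed in the second example but A/A not needed in the first?

Answer 1: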

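There are many ways to do this. This one can be extended to handle more columns simply by adding UNION terms to the data2 CTE. I'm sure there are ways to shorten it, but this version is fairly clear.

I used MySQL 8.0.26 for this test case.

Note: MariaDB 10.5.0 has some bugs in this area. If you happen to try MariaDB, a few edits may be needed to work around them. Nothing scary, just annoying.

data0 provides your initial table.
data1 just forces an ordering on that data and then assigns each row a unique id.
data2 partially normalizes the data via UNION. Add terms here to handle more columns.
data adds a COUNT (cnt) within each id to identify events that have no pairing.
tcross generates a partial Cartesian product of the pairs, preventing reflexive pairs and so on.
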
WITH data0 (a, b, c, d, n) AS (
         SELECT 1, 1, 0, 0,  50 UNION
         SELECT 0, 0, 1, 0, 100 UNION
         SELECT 0, 1, 1, 1, 200
     )
   , data1 (a, b, c, d, n, id) AS (
         SELECT t.*, ROW_NUMBER() OVER (ORDER BY n) FROM data0 AS t
     )
   , data2 (event, n, id) AS (
         SELECT 'A', n, id FROM data1 WHERE a = 1 UNION
         SELECT 'B', n, id FROM data1 WHERE b = 1 UNION
         SELECT 'C', n, id FROM data1 WHERE c = 1 UNION
         SELECT 'D', n, id FROM data1 WHERE d = 1
     )
   , data (event, n, id, cnt) AS (
         SELECT t.*, COUNT(*) OVER (PARTITION BY id) FROM data2 AS t
     )
   , tcross (event1, event2, n, id) AS (
         SELECT t1.event, COALESCE(t2.event, t1.event), t1.n, t1.id
           FROM      data AS t1
           LEFT JOIN data AS t2
             ON t1.id = t2.id
            AND t1.event < t2.event
          WHERE t2.event IS NOT NULL OR t1.cnt = 1
     )
SELECT event1, event2, n
  FROM tcross
 ORDER BY id, event1, event2
;

Result:

+--------+--------+-----+
| event1 | event2 | n   |
+--------+--------+-----+
| A      | B      |  50 |
| C      | C      | 100 |
| B      | C      | 200 |
| B      | D      | 200 |
| C      | D      | 200 |
+--------+--------+-----+
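To illustrate the "add a UNION term to data2" remark, here is a minimal sketch with a fifth event column. It is not the answer's own test case: it runs on SQLite through Python's sqlite3 module rather than MySQL, the column e and the row (D, E, 300) are hypothetical, an INTEGER PRIMARY KEY stands in for ROW_NUMBER(), and a correlated subquery stands in for COUNT(*) OVER so that no window functions are needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the question's three rows plus an extra column e
# and one made-up row in which D and E occur together 300 times.
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, a INT, b INT, c INT, d INT, e INT, n INT)"
)
conn.executemany(
    "INSERT INTO t (a, b, c, d, e, n) VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 0, 0, 0, 50),
        (0, 0, 1, 0, 0, 100),
        (0, 1, 1, 1, 0, 200),
        (0, 0, 0, 1, 1, 300),
    ],
)

QUERY = """
WITH data2 (event, n, id) AS (
    SELECT 'A', n, id FROM t WHERE a = 1 UNION ALL
    SELECT 'B', n, id FROM t WHERE b = 1 UNION ALL
    SELECT 'C', n, id FROM t WHERE c = 1 UNION ALL
    SELECT 'D', n, id FROM t WHERE d = 1 UNION ALL
    SELECT 'E', n, id FROM t WHERE e = 1      -- the one added term
),
data (event, n, id, cnt) AS (
    -- cnt = number of events set in the same source row
    SELECT s.event, s.n, s.id,
           (SELECT COUNT(*) FROM data2 AS x WHERE x.id = s.id)
      FROM data2 AS s
)
SELECT t1.event AS event1,
       COALESCE(t2.event, t1.event) AS event2,
       t1.n AS n
  FROM data AS t1
  LEFT JOIN data AS t2
    ON t1.id = t2.id
   AND t1.event < t2.event
 WHERE t2.event IS NOT NULL OR t1.cnt = 1
 ORDER BY t1.id, event1, event2
"""

pairs = conn.execute(QUERY).fetchall()
for row in pairs:
    print(row)
```

Only the data2 CTE had to change for the new column; the pairing logic is untouched, which is what makes this shape convenient for around 10 events.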



Answer 2:

You can use a brute-force approach with union all, like this:

select 'A' as event1, 'B' as event2, n
from t
where a = 1 and b = 1
union all
select 'A' as event1, 'C' as event2, n
from t
where a = 1 and c = 1
union all
select 'A' as event1, 'D' as event2, n
from t
where a = 1 and d = 1
union all
select 'B' as event1, 'C' as event2, n
from t
where b = 1 and c = 1
union all
select 'B' as event1, 'D' as event2, n
from t
where b = 1 and d = 1
union all
select 'C' as event1, 'D' as event2, n
from t
where c = 1 and d = 1;
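With about 10 events the brute-force query needs 45 branches, so it may be easier to generate it than to type it. The following is a rough sketch of that idea, not part of the answer: build_pair_query is a hypothetical helper, the table name t and count column n are taken from the query above, and SQLite (via Python's sqlite3) is used only to make the example runnable. The column names are interpolated into the SQL text, so they must come from a trusted list.

```python
import sqlite3
from itertools import combinations


def build_pair_query(events, table="t", count_col="n"):
    """Build one SELECT per unordered pair of event columns, joined by UNION ALL.

    `events` must be trusted column names: they are placed directly
    into the SQL text, not bound as parameters.
    """
    branches = [
        f"SELECT '{x.upper()}' AS event1, '{y.upper()}' AS event2, {count_col} "
        f"FROM {table} WHERE {x} = 1 AND {y} = 1"
        for x, y in combinations(events, 2)
    ]
    return "\nUNION ALL\n".join(branches)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INT, b INT, c INT, d INT, n INT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 0, 0, 50), (0, 0, 1, 0, 100), (0, 1, 1, 1, 200)],
)

sql = build_pair_query(["a", "b", "c", "d"])
rows = sorted(conn.execute(sql).fetchall())
for row in rows:
    print(row)
```

Like the hand-written query, the generated one emits only true pairs, so a row in which a single event occurred on its own (C with n = 100 in the sample data) produces no output.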

You can also use a join approach:

select x.event1, x.event2, t.n
from t join
     (select 1 as a, 1 as b, 0 as c, 0 as d, 'A' as event1, 'B' as event2 union all
      select 1 as a, 0 as b, 1 as c, 0 as d, 'A' as event1, 'C' as event2 union all
      select 1 as a, 0 as b, 0 as c, 1 as d, 'A' as event1, 'D' as event2 union all
      select 0 as a, 1 as b, 1 as c, 0 as d, 'B' as event1, 'C' as event2 union all
      select 0 as a, 1 as b, 0 as c, 1 as d, 'B' as event1, 'D' as event2 union all
      select 0 as a, 0 as b, 1 as c, 1 as d, 'C' as event1, 'D' as event2
     ) x
     on (x.a = t.a or x.a = 0) and
        (x.b = t.b or x.b = 0) and
        (x.c = t.c or x.c = 0) and
        (x.d = t.d or x.d = 0);

  

Comments:

This does not return the second row of the expected result.
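One possible way to address the comment above is to append a branch that picks up rows in which exactly one event is set and emits that event paired with itself. This is only a sketch, not the answerer's fix: it assumes every flag is strictly 0 or 1 (so a + b + c + d = 1 means "exactly one event"), and it is run on SQLite through Python's sqlite3 module purely so that it can be tried out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INT, b INT, c INT, d INT, n INT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 0, 0, 50), (0, 0, 1, 0, 100), (0, 1, 1, 1, 200)],
)

QUERY = """
SELECT x.event1, x.event2, t.n
  FROM t
  JOIN (SELECT 1 AS a, 1 AS b, 0 AS c, 0 AS d, 'A' AS event1, 'B' AS event2 UNION ALL
        SELECT 1, 0, 1, 0, 'A', 'C' UNION ALL
        SELECT 1, 0, 0, 1, 'A', 'D' UNION ALL
        SELECT 0, 1, 1, 0, 'B', 'C' UNION ALL
        SELECT 0, 1, 0, 1, 'B', 'D' UNION ALL
        SELECT 0, 0, 1, 1, 'C', 'D'
       ) AS x
    ON (x.a = t.a OR x.a = 0)
   AND (x.b = t.b OR x.b = 0)
   AND (x.c = t.c OR x.c = 0)
   AND (x.d = t.d OR x.d = 0)
UNION ALL
-- Added branch: rows where a single event occurred on its own.
SELECT CASE WHEN a = 1 THEN 'A' WHEN b = 1 THEN 'B' WHEN c = 1 THEN 'C' ELSE 'D' END,
       CASE WHEN a = 1 THEN 'A' WHEN b = 1 THEN 'B' WHEN c = 1 THEN 'C' ELSE 'D' END,
       n
  FROM t
 WHERE a + b + c + d = 1
"""

# Sorted in Python to keep the compound query itself free of ORDER BY.
rows = sorted(conn.execute(QUERY).fetchall())
for row in rows:
    print(row)
```

The same extra branch can be appended to the union all version; in both cases the pair-generating part of the query stays as it was.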
