Count combinations of columns in a PostgreSQL matrix
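【Title】: count combination of columns in postgresql matrix 【Posted】: 2019-07-13 21:01:04 【Question】: I have a table in Postgres that looks like the one below
I would like a Postgres SQL query that counts, for each combination of 2 columns, the rows in which both columns are Y (i.e. YY).
I expect output like this:
combination count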
AB 2
AC 1
AD 2
AZ 1
BC 1
BD 3
BZ 2
CD 2
CZ 0
DZ 1
Can anyone help me?
【Answer 1】:
WITH stacked AS (
SELECT id
, unnest(array['A', 'B', 'C', 'D', 'Z']) AS col_name
, unnest(array[a, b, c, d, z]) AS col_value
FROM test t
)
SELECT combo, sum(cnt) AS count
FROM (
SELECT t1.id, t1.col_name || t2.col_name AS combo
, (CASE WHEN t1.col_value = 'Y' AND t2.col_value = 'Y' THEN 1 ELSE 0 END) AS cnt
FROM stacked t1
INNER JOIN stacked t2
ON t1.id = t2.id
AND t1.col_name < t2.col_name) t3
GROUP BY combo
ORDER BY combo
yields
| combo | count |
|-------+-------|
| AB | 2 |
| AC | 1 |
| AD | 2 |
| AZ | 2 |
| BC | 1 |
| BD | 3 |
| BZ | 2 |
| CD | 2 |
| CZ | 0 |
| DZ | 1 |
The unnest-based method for unpivoting the table comes from Stew's post, here.
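To make the unpivot step easier to try out, here is a minimal sketch. The table definition and rows are hypothetical (the question's original data is not shown); they were chosen so that the pair and triple counts agree with the result tables in this answer. The column types (`integer` id, `text` flags) and the use of `'N'` for non-matching cells are likewise assumptions.

```sql
-- Hypothetical sample table; the question's original data is not shown.
CREATE TABLE test (
    id integer PRIMARY KEY,
    a  text,
    b  text,
    c  text,
    d  text,
    z  text
);

INSERT INTO test (id, a, b, c, d, z) VALUES
    (1, 'Y', 'Y', 'N', 'Y', 'Y'),
    (2, 'Y', 'Y', 'N', 'N', 'Y'),
    (3, 'Y', 'N', 'Y', 'Y', 'N'),
    (4, 'N', 'Y', 'Y', 'Y', 'N'),
    (5, 'N', 'Y', 'N', 'Y', 'N');

-- The unpivot step on its own: each row of test becomes five
-- (col_name, col_value) rows, one per column.
SELECT id
     , unnest(array['A', 'B', 'C', 'D', 'Z']) AS col_name
     , unnest(array[a, b, c, d, z]) AS col_value
FROM test
ORDER BY id, col_name;
```

Both arrays have five elements, so the two `unnest` calls advance together and each column name stays paired with its value.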
To count occurrences of YYY across 3 columns, you could use:
WITH stacked AS (
SELECT id
, unnest(array['A', 'B', 'C', 'D', 'Z']) AS col_name
, unnest(array[a, b, c, d, z]) AS col_value
FROM test t
)
SELECT combo, sum(cnt) AS count
FROM (
SELECT t1.id, t1.col_name || t2.col_name || t3.col_name AS combo
, (CASE WHEN t1.col_value = 'Y'
AND t2.col_value = 'Y'
AND t3.col_value = 'Y' THEN 1 ELSE 0 END) AS cnt
FROM stacked t1
INNER JOIN stacked t2
ON t1.id = t2.id
AND t1.col_name < t2.col_name
INNER JOIN stacked t3
ON t1.id = t3.id
AND t2.col_name < t3.col_name
) sub
GROUP BY combo
ORDER BY combo
;
yields
| combo | count |
|-------+-------|
| ABC | 0 |
| ABD | 1 |
| ABZ | 2 |
| ACD | 1 |
| ACZ | 0 |
| ADZ | 1 |
| BCD | 1 |
| BCZ | 0 |
| BDZ | 1 |
| CDZ | 0 |
Alternatively, to handle combinations of N columns, you could use WITH RECURSIVE.
For example, for N = 3:
WITH RECURSIVE result AS (
WITH stacked AS (
SELECT id
, unnest(array['A', 'B', 'C', 'D', 'Z']) AS col_name
, unnest(array[a, b, c, d, z]) AS col_value
FROM test t)
SELECT id, array[col_name] AS path, array[col_value] AS path_val, col_name AS last_name
FROM stacked
UNION
SELECT r.id, path || s.col_name, path_val || s.col_value, s.col_name
FROM result r
INNER JOIN stacked s
ON r.id = s.id
AND s.col_name > r.last_name
WHERE array_length(r.path, 1) < 3) -- Change 3 to your value for N
SELECT combo, sum(cnt)
FROM (
SELECT id, array_to_string(path, '') AS combo, (CASE WHEN 'Y' = all(path_val) THEN 1 ELSE 0 END) AS cnt
FROM result
WHERE array_length(path, 1) = 3) t -- Change 3 to your value for N
GROUP BY combo
ORDER BY combo
Note that N = 3 is used in two places in the SQL above.
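If repeating N is a concern, one possible variation is to keep it in a one-row CTE and join against that. This is only a sketch: `params` and its column `n` are hypothetical names, and the unpivot here uses the multi-argument form of `unnest` in the FROM clause (PostgreSQL 9.4 or later) rather than two `unnest` calls in the SELECT list. It assumes the same `test` table with an `id` column as above.

```sql
WITH RECURSIVE params AS (
    SELECT 3 AS n                       -- hypothetical: the only place N is set
), stacked AS (
    -- unpivot: one (col_name, col_value) row per column of each test row
    SELECT t.id, u.col_name, u.col_value
    FROM test t
    CROSS JOIN LATERAL unnest(
        array['A', 'B', 'C', 'D', 'Z'],
        array[t.a, t.b, t.c, t.d, t.z]
    ) AS u(col_name, col_value)
), result AS (
    -- paths of length 1
    SELECT id, array[col_name] AS path, array[col_value] AS path_val, col_name AS last_name
    FROM stacked
    UNION
    -- extend each path with a column name that sorts after its last one
    SELECT r.id, r.path || s.col_name, r.path_val || s.col_value, s.col_name
    FROM result r
    INNER JOIN stacked s
        ON r.id = s.id
       AND s.col_name > r.last_name
    CROSS JOIN params p
    WHERE array_length(r.path, 1) < p.n
)
SELECT array_to_string(r.path, '') AS combo
     , sum(CASE WHEN 'Y' = ALL (r.path_val) THEN 1 ELSE 0 END) AS count
FROM result r
CROSS JOIN params p
WHERE array_length(r.path, 1) = p.n
GROUP BY combo
ORDER BY combo;
```

Changing the literal in `params` is then enough to switch to pairs (2), triples (3), and so on.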
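【Comments】:
Thank you, Ubuntu. This is exactly what I wanted. Wonderful. Any idea how to do combinations of 3 columns? Like YYY. Just curious. Thanks in advance!
Awesome!! The N case works for everything. Thanks!
【Answer 2】: I would do this using a lateral join: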
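-- assumes t has an id column identifying each row, as in the first answer
with vals as (
select t.id, v.which, v.val
from t cross join lateral
(values ('A', A), ('B', B), ('C', C), ('D', D), ('Z', Z)
) v(which, val)
)
select (v1.which || v2.which) as combo,
sum( (v1.val = 'Y' and v2.val = 'Y')::int ) as count
from vals v1 join
vals v2
on v1.id = v2.id and -- pair values taken from the same row
v1.which < v2.which -- count each unordered pair once
group by combo
order by combo;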
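I think a lateral join is a more direct way of unpivoting the values. There is no need to turn the values into an array that is then unnested, let alone unnest two arrays and keep the values aligned.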
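As a variation on the lateral-join unpivot, the count can also be written with an aggregate `FILTER` clause (PostgreSQL 9.4 or later) instead of casting a boolean to an integer. This is a sketch under assumptions: it presumes a table `t(id, a, b, c, d, z)` in which `id` uniquely identifies a row; that table definition is not shown in the thread.

```sql
-- Sketch only: assumes a table t(id, a, b, c, d, z) where id uniquely
-- identifies a row (an assumption; the table definition is not shown).
WITH vals AS (
    -- unpivot: one (which, val) row per column of each row of t
    SELECT t.id, v.which, v.val
    FROM t
    CROSS JOIN LATERAL (
        VALUES ('A', t.a), ('B', t.b), ('C', t.c), ('D', t.d), ('Z', t.z)
    ) AS v(which, val)
)
SELECT v1.which || v2.which AS combo
     , count(*) FILTER (WHERE v1.val = 'Y' AND v2.val = 'Y') AS count
FROM vals v1
JOIN vals v2
  ON v1.id = v2.id          -- compare values from the same source row
 AND v1.which < v2.which    -- keep each unordered pair once
GROUP BY combo
ORDER BY combo;
```

Because the filter sits inside the aggregate rather than in a WHERE clause, pairs that never occur together still appear in the result with a count of 0.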