How to count all combined occurrences in SQL?
【Title】How to count all combined occurrences in SQL? 【Posted】2011-05-06 21:03:41 【Question】Is there any way to get the counts of all combinations of elements in a single SQL query, without using temporary tables or procedures?
Consider these three tables:
products (id, product_name)
transactions (id, date)
transaction_has_products (id, product_id, transaction_id)
Sample data
products
1 AAA
2 BBB
3 CCC
transactions
1 some_date
2 some_date
transaction_has_products
1 1 1
2 2 1
3 3 1
4 1 2
5 2 2
The result should be:
AAA, BBB = 2
AAA, CCC = 1
BBB, CCC = 1
AAA, BBB, CCC = 1
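【Comments】:
It's just an example; fixed.
It helps to use accurate data... thanks for the fix.
【Answer 1】: Not easily, because the last row matches a different number of products than the other rows. You might be able to use some sort of GROUP_CONCAT() operator (available in MySQL; implemented in other DBMSs such as Informix and possibly PostgreSQL), but I'm not confident about that.
Pairwise matching: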
-- Count how often each pair of products appears in the same transaction.
SELECT p1.product_name AS name1, p2.product_name AS name2, COUNT(*)
  FROM (SELECT p.product_name, h.transaction_id
          FROM products AS p
          JOIN transaction_has_products AS h ON h.product_id = p.id
       ) AS p1
  JOIN (SELECT p.product_name, h.transaction_id
          FROM products AS p
          JOIN transaction_has_products AS h ON h.product_id = p.id
       ) AS p2
    ON p1.transaction_id = p2.transaction_id
   AND p1.product_name < p2.product_name   -- each unordered pair once
 GROUP BY p1.product_name, p2.product_name;
Handling three-way matches is not trivial; extending it further than that is certainly rather hard.
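To make the three-way case concrete, here is a minimal sketch that extends the pairwise self-join with a third join. It is an illustration rather than part of the original answer: it runs against SQLite through Python's `sqlite3` module, and the helper `make_sample_db`, the CTE name `tp`, and the variable names are assumptions introduced for the example. The schema and rows are the question's sample data.

```python
import sqlite3

def make_sample_db():
    """In-memory SQLite database holding the question's sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (id INTEGER PRIMARY KEY, product_name TEXT);
        CREATE TABLE transaction_has_products (
            id INTEGER PRIMARY KEY, product_id INTEGER, transaction_id INTEGER);
        INSERT INTO products VALUES (1, 'AAA'), (2, 'BBB'), (3, 'CCC');
        INSERT INTO transaction_has_products VALUES
            (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 1, 2), (5, 2, 2);
    """)
    return conn

# Same idea as the pairwise query, with one more self-join.
# The name ordering p1 < p2 < p3 keeps each unordered triple exactly once.
TRIPLES_SQL = """
WITH tp AS (  -- one row per (transaction, product name)
    SELECT p.product_name, h.transaction_id
    FROM products AS p
    JOIN transaction_has_products AS h ON h.product_id = p.id
)
SELECT p1.product_name, p2.product_name, p3.product_name, COUNT(*)
FROM tp AS p1
JOIN tp AS p2 ON p2.transaction_id = p1.transaction_id
             AND p1.product_name < p2.product_name
JOIN tp AS p3 ON p3.transaction_id = p1.transaction_id
             AND p2.product_name < p3.product_name
GROUP BY p1.product_name, p2.product_name, p3.product_name
ORDER BY 1, 2, 3
"""

conn = make_sample_db()
triples = conn.execute(TRIPLES_SQL).fetchall()
print(triples)  # [('AAA', 'BBB', 'CCC', 1)]
```

Each additional combination size needs another join and another query (or another UNION ALL branch), which is why this approach does not scale to "all combinations" in general.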
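【Discussion】:
In the end it should also be "group by p1.product_name, p2.product_name", thanks.
【Answer 2】: If you know in advance what all the products will be, you can do it by pivoting the data like this.
If you don't know the products in advance, you could build this query dynamically in a stored procedure. Either approach stops being practical once the number of products gets large, but I think that is probably true however this requirement is implemented.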
select
    product_combination,
    case product_combination
        when 'AAA, BBB' then aaa_bbb
        when 'AAA, CCC' then aaa_ccc
        when 'BBB, CCC' then bbb_ccc
        when 'AAA, BBB, CCC' then aaa_bbb_ccc
    end as number_of_transactions
from
(
    select 'AAA, BBB' as product_combination union all
    select 'AAA, CCC' union all
    select 'BBB, CCC' union all
    select 'AAA, BBB, CCC'
) as combination_list
cross join
(
    select
        sum(case when aaa = 1 and bbb = 1 then 1 else 0 end) as aaa_bbb,
        sum(case when aaa = 1 and ccc = 1 then 1 else 0 end) as aaa_ccc,
        sum(case when bbb = 1 and ccc = 1 then 1 else 0 end) as bbb_ccc,
        sum(case when aaa = 1 and bbb = 1 and ccc = 1 then 1 else 0 end) as aaa_bbb_ccc
    from
    (
        select
            count(case when a.product_name = 'AAA' then 1 else null end) as aaa,
            count(case when a.product_name = 'BBB' then 1 else null end) as bbb,
            count(case when a.product_name = 'CCC' then 1 else null end) as ccc,
            b.transaction_id
        from
            products a
        inner join
            transaction_has_products b
        on
            a.id = b.product_id
        group by
            b.transaction_id
    ) as product_matrix
) as combination_counts
Result:
product_combination    number_of_transactions
AAA, BBB               2
AAA, CCC               1
BBB, CCC               1
AAA, BBB, CCC          1
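The remark about building the pivot query dynamically can be sketched in client code instead of a stored procedure. The following is only an illustration under stated assumptions: it targets SQLite via Python's `sqlite3`, the function `count_combinations` and helper `make_sample_db` are hypothetical names, and the per-product flag uses `MAX(CASE ...)` (0 or 1) rather than the `COUNT(CASE ...)` of the hand-written query, so a product listed twice in one transaction still counts as present.

```python
import sqlite3
from itertools import combinations

def make_sample_db():
    """In-memory SQLite database holding the question's sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (id INTEGER PRIMARY KEY, product_name TEXT);
        CREATE TABLE transaction_has_products (
            id INTEGER PRIMARY KEY, product_id INTEGER, transaction_id INTEGER);
        INSERT INTO products VALUES (1, 'AAA'), (2, 'BBB'), (3, 'CCC');
        INSERT INTO transaction_has_products VALUES
            (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 1, 2), (5, 2, 2);
    """)
    return conn

def count_combinations(conn, names):
    """Build and run a pivot query covering every combination of 2+ names."""
    idx = range(len(names))
    combos = [c for r in range(2, len(names) + 1) for c in combinations(idx, r)]
    if not combos:
        return {}
    # One 0/1 flag column per product; names are bound as parameters, not inlined.
    flag_cols = ", ".join(
        f"MAX(CASE WHEN p.product_name = ? THEN 1 ELSE 0 END) AS p{i}" for i in idx)
    # One conditional sum per combination over those flag columns.
    sum_cols = ", ".join(
        "SUM(CASE WHEN " + " AND ".join(f"p{i} = 1" for i in combo)
        + f" THEN 1 ELSE 0 END) AS c{k}"
        for k, combo in enumerate(combos))
    sql = f"""
        SELECT {sum_cols}
        FROM (
            SELECT h.transaction_id, {flag_cols}
            FROM products AS p
            JOIN transaction_has_products AS h ON p.id = h.product_id
            GROUP BY h.transaction_id
        ) AS product_matrix
    """
    row = conn.execute(sql, list(names)).fetchone()
    return {", ".join(names[i] for i in combo): n for combo, n in zip(combos, row)}

conn = make_sample_db()
names = [r[0] for r in conn.execute(
    "SELECT product_name FROM products ORDER BY product_name")]
counts = count_combinations(conn, names)
print(counts)
```

The number of generated columns is 2^n - n - 1 for n products, so, as the answer says, this only stays practical for a small product list.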
【Discussion】:
【Answer 3】: It depends on how much control you have over the query (this is T-SQL; it may need changes for PostgreSQL):
-- Count the transactions that contain every product in the list.
SELECT COUNT(*) FROM transactions t WHERE
(
    SELECT COUNT(DISTINCT tp.product_id)
    FROM transaction_has_products tp
    WHERE tp.[transaction_id] = t.id AND tp.product_id IN (1, 2, 3)
) = 3
where (1, 2, 3) is the list of product IDs you want to check, and the 3 in "= 3" is the number of entries in that list.
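Because the ID list and its length have to stay in sync, it can help to generate both from one value. A minimal sketch, assuming SQLite through Python's `sqlite3` (so no T-SQL bracket quoting); `count_transactions_with_all` and `make_sample_db` are hypothetical helper names, and the data is the question's sample data.

```python
import sqlite3

def make_sample_db():
    """In-memory SQLite database holding the question's sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE transactions (id INTEGER PRIMARY KEY, date TEXT);
        CREATE TABLE transaction_has_products (
            id INTEGER PRIMARY KEY, product_id INTEGER, transaction_id INTEGER);
        INSERT INTO transactions VALUES (1, 'some_date'), (2, 'some_date');
        INSERT INTO transaction_has_products VALUES
            (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 1, 2), (5, 2, 2);
    """)
    return conn

def count_transactions_with_all(conn, product_ids):
    """Count the transactions that contain every product ID in product_ids."""
    ids = sorted(set(product_ids))  # de-duplicate so the length check is exact
    if not ids:
        raise ValueError("product_ids must not be empty")
    placeholders = ", ".join("?" for _ in ids)
    sql = f"""
        SELECT COUNT(*) FROM transactions AS t
        WHERE (
            SELECT COUNT(DISTINCT tp.product_id)
            FROM transaction_has_products AS tp
            WHERE tp.transaction_id = t.id
              AND tp.product_id IN ({placeholders})
        ) = ?
    """
    # The final parameter is the list length, i.e. the "= 3" of the query above.
    return conn.execute(sql, [*ids, len(ids)]).fetchone()[0]

conn = make_sample_db()
print(count_transactions_with_all(conn, [1, 2, 3]))  # 1
print(count_transactions_with_all(conn, [1, 2]))     # 2
```

This answers the question for one combination at a time; covering every combination still means one call (or one subquery) per combination.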
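【Discussion】:
【Answer 4】:
- Generate all possible combinations. I leaned on this answer for that: https://***.com/a/9135162/2244766 (it's a bit tricky and I don't fully understand the logic... but it works!)
- Create a subquery that aggregates transaction_has_products into an array of products per transaction_id
- Join the two using the array containment operator
After the steps above, you end up with something like this: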
with all_combis as (
    with RECURSIVE y1 as (
        with x1 as (
            --select id from products
            select distinct product_id as a from transaction_has_products
        )
        select array[a] as b, a as c, 1 as d
        from x1
        union all
        select b || a, a, d + 1
        from x1
        join y1 on (a < c)
    )
    select *
    from y1
)
, grouped_transactions as (
    SELECT
        array_agg(product_id) as products
    FROM transaction_has_products
    GROUP BY transaction_id
)
-- count a joined column, not *, so a combination that matches no transaction reports 0 instead of 1
SELECT all_combis.b, count(grouped_transactions.products)
from all_combis
left JOIN grouped_transactions ON grouped_transactions.products @> all_combis.b
--WHERE array_upper(b, 1) > 1 -- or whatever
GROUP BY all_combis.b
order by array_upper(b, 1) desc, count(grouped_transactions.products) desc
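You can join your products table to replace the product IDs with product names, but I guess you can take it from here. Here's the fiddle (sqlfiddle is having a bad day today, so check it on your own database in case it throws some odd error such as a timeout).
GL, HF :D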
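For databases without array types, a related single-query variant is possible. The sketch below is not the array-containment query above: instead of generating all combinations globally and joining on `@>`, it uses a recursive CTE to enumerate the product subsets inside each transaction and then groups identical subsets. It assumes SQLite via Python's `sqlite3`, that a product appears at most once per transaction, and that product names contain no ", "; `make_sample_db` and the CTE name `combos` are hypothetical.

```python
import sqlite3

def make_sample_db():
    """In-memory SQLite database holding the question's sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (id INTEGER PRIMARY KEY, product_name TEXT);
        CREATE TABLE transaction_has_products (
            id INTEGER PRIMARY KEY, product_id INTEGER, transaction_id INTEGER);
        INSERT INTO products VALUES (1, 'AAA'), (2, 'BBB'), (3, 'CCC');
        INSERT INTO transaction_has_products VALUES
            (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 1, 2), (5, 2, 2);
    """)
    return conn

COMBOS_SQL = """
WITH RECURSIVE combos(transaction_id, last_id, names, size) AS (
    -- seed: every single product of every transaction
    SELECT h.transaction_id, h.product_id, p.product_name, 1
    FROM transaction_has_products AS h
    JOIN products AS p ON p.id = h.product_id
    UNION ALL
    -- step: extend a subset with a higher product ID from the same transaction
    SELECT c.transaction_id, h.product_id,
           c.names || ', ' || p.product_name, c.size + 1
    FROM combos AS c
    JOIN transaction_has_products AS h
      ON h.transaction_id = c.transaction_id AND h.product_id > c.last_id
    JOIN products AS p ON p.id = h.product_id
)
SELECT names, COUNT(*)
FROM combos
WHERE size > 1            -- drop single products, as in the expected result
GROUP BY names, size
ORDER BY size, names
"""

conn = make_sample_db()
rows = conn.execute(COMBOS_SQL).fetchall()
for names, n in rows:
    print(f"{names} = {n}")
```

On the sample data this prints the four lines listed in the question. The intermediate result grows as 2^k for a transaction with k products, so the same size caveat applies here.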
【Discussion】: