Query using analytic function for similar product

Posted 2023-03-30

【Title】: Query using analytic function for similar product 【Posted】: 2021-07-05 21:52:52 【Question】:

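I am trying to find a query that matches similar customers.

To simplify, consider this scenario:

I have a table containing customer names and the products they purchased. A customer can purchase the same and/or different products multiple times.

Suppose the raw data is: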

CustomerName | ProductName
A            | p1
A            | p2
A            | p3
A            | p1
B            | p1
B            | p2
B            | p4
B            | p5
C            | p2

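In the query I am looking for, I want to see all pairs of customers that have at least 1 purchased product in common, then list the products customer 2 purchased that customer 1 did not, along with a count of how many similar (distinct) products customers 1 and 2 share. Based on the demo raw data, the result should be:

Customers A and B both purchased p1 and p2, so every record shown for this pair has 2 in CountSimilarity. Customer B also purchased p4 and p5, so those are the products that should appear in the output. For the pair B-A the shared count is the same 2, but customer A also purchased p3, which B did not, so that is the product that should appear in the output. The same applies to the pairs C-A and C-B.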

CustomerName1 | CustomerName2 | ProductName2 | CountSimilarity
A             | B             | p4           | 2
A             | B             | p5           | 2
B             | A             | p3           | 2
C             | A             | p1           | 1
C             | A             | p3           | 1
C             | B             | p1           | 1
C             | B             | p4           | 1
C             | B             | p5           | 1

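With Tim's help on my previous question, I wrote the following query, which shows almost what I need. Unfortunately the count is too high - I need to count distinct product names, which does not work with over partition by.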

select distinct t1.cname, t1.pname, t2.cname, t3.pname pnamet3,
count(*) over (partition by t1.cname, t2.cname) cnt
from MyTable t1 
inner join MyTable t2 on t1.pname = t2.pname and t1.cname != t2.cname
inner join MyTable t3 on t2.cname = t3.cname and t2.pname != t3.pname

Any suggestions on how to approach this query?

The environment is SQL Server.

Thanks
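
SQL Server rejects COUNT(DISTINCT ...) combined with an OVER clause, which is the limitation mentioned above. A commonly used workaround is to add an ascending and a descending DENSE_RANK over the same partition and subtract 1. The sketch below applies that idea to the query in the question. It is an illustration under stated assumptions, not the accepted fix: the table variable @MyTable and the aliases cname1, cname2 and pname2 are hypothetical stand-ins, pname is assumed to be NOT NULL, and the NOT EXISTS filter is an addition that drops products customer 1 already bought.

```sql
-- Hypothetical stand-in for MyTable, loaded with the sample data from the question.
declare @MyTable table (cname char(1) not null, pname char(2) not null);

insert @MyTable (cname, pname) values
('A','p1'), ('A','p2'), ('A','p3'), ('A','p1'),
('B','p1'), ('B','p2'), ('B','p4'), ('B','p5'),
('C','p2');

select distinct
       t1.cname as cname1,
       t2.cname as cname2,
       t3.pname as pname2,
       -- ascending rank + descending rank - 1 equals the number of distinct
       -- t1.pname values in the partition (valid because pname is NOT NULL)
       dense_rank() over (partition by t1.cname, t2.cname order by t1.pname)
     + dense_rank() over (partition by t1.cname, t2.cname order by t1.pname desc)
     - 1 as cnt
from @MyTable t1
     inner join @MyTable t2 on t1.pname = t2.pname and t1.cname <> t2.cname
     inner join @MyTable t3 on t2.cname = t3.cname
-- keep only products of customer 2 that customer 1 never bought
where not exists (select 1
                  from @MyTable t4
                  where t4.cname = t1.cname
                    and t4.pname = t3.pname)
order by cname1, cname2, pname2;
```

On the sample data this should return the eight rows of the expected output, with cnt equal to 2 for the pairs A-B and B-A and 1 for the pairs that start with C.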

【Comments】:

This question deals with DISTINCT in window functions.

Can you post your expected result here? If possible, also include the table schema and data.

【Solution 1】:

The query needs to be built from DISTINCT values using CROSS JOIN, imo. Something like this:

drop table if exists #dts;
go
create table #dts(
  CustomerName        char(1) not null,
  ProductName        char(2) not null);

insert #dts(CustomerName, ProductName) values 
('A','p1'),
('A','p2'),
('A','p3'),
('A','p1'),
('B','p1'),
('B','p2'),
('B','p4'),
('B','p5'),
('C','p2');

with
-- every distinct customer
unq_cust_cte as (
    select distinct CustomerName 
    from #dts),
-- every distinct (customer, product) combination, so repeat purchases count once
unq_prod_cte as (
    select distinct CustomerName, ProductName 
    from #dts),
-- all ordered pairs of two different customers
xjoin_cte as (
    select c1.CustomerName cn1, c2.CustomerName cn2
    from unq_cust_cte c1
         cross join unq_cust_cte c2
    where c1.CustomerName<>c2.CustomerName),
-- number of distinct products both customers of a pair bought;
-- pairs with nothing in common produce no rows here
similar_count_cte as (
    select x.cn1, x.cn2, count(*) similar_count
    from xjoin_cte x
         join unq_prod_cte p1 on x.cn1=p1.CustomerName
         join unq_prod_cte p2 on x.cn2=p2.CustomerName
    where p1.ProductName=p2.ProductName
    group by x.cn1, x.cn2)
-- for each pair, list the products bought by cn2 but not by cn1
select sc.*, excpt.ProductName2
from similar_count_cte sc
     cross apply (select up.ProductName
                  from unq_prod_cte up
                  where sc.cn2=up.CustomerName
                  except
                  select up.ProductName
                  from unq_prod_cte up
                  where sc.cn1=up.CustomerName) excpt(ProductName2)
order by sc.cn1, sc.cn2, excpt.ProductName2;

cn1 cn2 similar_count   ProductName2
A   B   2               p4
A   B   2               p5
B   A   2               p3
C   A   1               p1
C   A   1               p3
C   B   1               p1
C   B   1               p4
C   B   1               p5
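
While COUNT(DISTINCT ...) is not accepted together with OVER in SQL Server, it is accepted together with GROUP BY, so the same result can probably be reached with fewer CTEs. The following is a minimal sketch of that variant, not the answerer's method: the table variable @dts stands in for the temp table #dts, and the names pair_cte, d1, d2, d, pr and x are made up for the illustration.

```sql
-- Hypothetical table variable mirroring #dts from the answer above.
declare @dts table (CustomerName char(1) not null, ProductName char(2) not null);

insert @dts (CustomerName, ProductName) values
('A','p1'), ('A','p2'), ('A','p3'), ('A','p1'),
('B','p1'), ('B','p2'), ('B','p4'), ('B','p5'),
('C','p2');

with pair_cte as (
    -- one row per ordered pair of customers sharing at least one product;
    -- COUNT(DISTINCT ...) counts a repeatedly purchased product only once
    select d1.CustomerName cn1, d2.CustomerName cn2,
           count(distinct d1.ProductName) similar_count
    from @dts d1
         join @dts d2 on d1.ProductName = d2.ProductName
                     and d1.CustomerName <> d2.CustomerName
    group by d1.CustomerName, d2.CustomerName)
-- DISTINCT removes duplicates caused by repeat purchases of cn2
select distinct pr.cn1, pr.cn2, pr.similar_count, d.ProductName ProductName2
from pair_cte pr
     join @dts d on d.CustomerName = pr.cn2
-- keep only products that cn1 never bought
where not exists (select 1
                  from @dts x
                  where x.CustomerName = pr.cn1
                    and x.ProductName = d.ProductName)
order by cn1, cn2, ProductName2;
```

The NOT EXISTS filter plays the role of the EXCEPT inside the CROSS APPLY: a pair such as A-C, where customer 2 bought nothing that customer 1 lacks, produces no rows in either version.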

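【Comments】:

I got an error about the naming in the order by, but I changed it to sc.cn1 instead of x.cn1 and it worked! Thank you very much! I had never seen the "cross apply" operator before; I will try to read up on it!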
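
For readers who are also new to CROSS APPLY: it evaluates the table expression on its right once for every row on its left, and that expression may refer to columns of the left row, which a plain join to a derived table cannot do. Below is a small, self-contained sketch of the operator on its own. The table variable @purchases and the aliases c, p and lastp are invented for the example and do not come from the thread.

```sql
-- Hypothetical sample data, only for demonstrating CROSS APPLY.
declare @purchases table (CustomerName char(1) not null, ProductName char(2) not null);

insert @purchases (CustomerName, ProductName) values
('A','p1'), ('A','p2'), ('A','p3'),
('B','p4'), ('B','p5');

-- For each customer, fetch the alphabetically last product they bought.
select c.CustomerName, lastp.ProductName
from (select distinct CustomerName from @purchases) c
     cross apply (select top (1) p.ProductName
                  from @purchases p
                  where p.CustomerName = c.CustomerName  -- refers to the outer row
                  order by p.ProductName desc) lastp
order by c.CustomerName;
```

With this data the query returns p3 for customer A and p5 for customer B. A left row for which the right side is empty is dropped, as with an inner join; OUTER APPLY keeps such rows and fills the right-hand columns with NULL.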
