Choosing highest value in a GROUP BY Join

Posted 2023-02-16

[Title] Choosing highest value in a GROUP BY Join [Posted] 2021-03-10 13:34:39 [Question]:

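I have a pages table containing URLs and their associated categories. I join it to ITSELF through a constraints table, use GROUP BY to get unique URLs, and then order by highest score.

Problem: the highest-scoring row in a URL group is not always the one that gets selected.

(Background: in production, this will be used to work out which pages in the "from" category should hyperlink to pages in the "to" category.)

I think there is something in this answer, but I don't know how to adapt it:

Current query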

SELECT keyword, URL, score FROM
    (   
        SELECT keyword, URL, score
        FROM pages
        JOIN constraints
        ON pages.category = constraints.to
        AND constraints.from IN (SELECT category FROM pages WHERE URL = 'https://www.example.net')
        ORDER BY score DESC
    )   
AS x        
GROUP BY URL;

pages

+---------+-------------------------+----------+-------+
| keyword | URL                     | category | score |
+---------+-------------------------+----------+-------+
| Cat     | https://www.example.org | 1        | 100   |
+---------+-------------------------+----------+-------+
| Dog     | https://www.example.com | 2        | 50    |
+---------+-------------------------+----------+-------+
| Fish    | https://www.example.com | 2        | 60    |
+---------+-------------------------+----------+-------+
| Mouse   | https://www.example.net | 3        | 1     |
+---------+-------------------------+----------+-------+

constraints

+------+----+
| from | to |
+------+----+
| 1    | 2  |
+------+----+
| 2    | 1  |
+------+----+
| 3    | 2  |
+------+----+

Current output:

+---------+-------------------------+-------+
| keyword | URL                     | score |
+---------+-------------------------+-------+
| Dog     | https://www.example.com | 50    |
+---------+-------------------------+-------+

The Dog row is selected, even though its score is lower than that of the Fish row.
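One way to see where the ambiguity comes from is to run only the inner join on the sample data. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; the question is about MySQL), with the reserved column names "from" and "to" double-quoted for SQLite. The join yields two candidate rows for the same URL, and GROUP BY URL then has to collapse them into one. Because keyword and score are neither grouped nor aggregated, MySQL is free to return either row, and an ORDER BY inside the derived table is not guaranteed to influence that choice.

```python
import sqlite3

# In-memory SQLite database used purely for illustration (not MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pages (keyword TEXT, URL TEXT, category INTEGER, score INTEGER);
    CREATE TABLE constraints ("from" INTEGER, "to" INTEGER);
    INSERT INTO pages VALUES
        ('Cat',   'https://www.example.org', 1, 100),
        ('Dog',   'https://www.example.com', 2, 50),
        ('Fish',  'https://www.example.com', 2, 60),
        ('Mouse', 'https://www.example.net', 3, 1);
    INSERT INTO constraints VALUES (1, 2), (2, 1), (3, 2);
""")

# The inner query from the question, without the outer GROUP BY.
candidates = conn.execute("""
    SELECT keyword, URL, score
    FROM pages
    JOIN constraints
      ON pages.category = constraints."to"
     AND constraints."from" IN (
             SELECT category FROM pages WHERE URL = 'https://www.example.net')
    ORDER BY score DESC
""").fetchall()

# Both rows share one URL, so GROUP BY URL must pick a single representative.
for row in candidates:
    print(row)
# ('Fish', 'https://www.example.com', 60)
# ('Dog', 'https://www.example.com', 50)
```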

Desired output:

+---------+-------------------------+-------+
| keyword | URL                     | score |
+---------+-------------------------+-------+
| Fish    | https://www.example.com | 60    |
+---------+-------------------------+-------+

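Edit: reduced the tables to a minimal reproducible example, added the current output, and explained things a bit better.

[Comments]:

What version of MySQL are you using?

Please describe how you arrive at the "desired output". (In other words, don't make others guess or assume.)

@Strawberry Sorry, I thought I had narrowed it down, but I'll try again.

I'm lost. You have "4" in the constraints table, but it isn't in your data. You specify what you need for some URL in terms of other URLs, but which URL is the result based on?

Great. Tell us when.

[Answer 1]:

Based on your data, I tried to solve the problem. My logic is as follows: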

In table x, I join the "from" pages to the "to" pages.

In table y, I use dense_rank() to compute a ranking.

From table y, I keep only rank 1 and apply group by.

My query is as follows:

select to_keyword as Keyword, to_url as URL, to_score as Score
from (
    -- y: rank the candidate rows within each target URL, highest score first
    select from_url, to_keyword, to_url, to_score,
           dense_rank() over (partition by to_url order by to_score desc) as rnk
    from (
        -- x: pair every source ("from") page with its allowed target ("to") pages
        select p.url as from_url, u.Keyword as to_keyword, u.url as to_url,
               u.Category as to_category, u.Score as to_score
        from pages p
        inner join constraints c on p.Category = c.from
        inner join pages u on u.Category = c.to
    ) x
) y
where rnk = 1 and from_url = 'https://www.b.com/'  -- URL of the source page
group by 1, 2, 3
order by to_url
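
As a rough check of this approach against the sample data in the question, here is a minimal sketch using Python's built-in sqlite3 module. Several things in it are assumptions made for illustration: SQLite (3.25 or newer, for window functions) stands in for MySQL 8+, the table is spelled constraints as in the question, the reserved column names are double-quoted for SQLite, the GROUP BY lists its columns by name, and the source URL https://www.example.net comes from the question rather than the https://www.b.com/ used above. The helper names load_sample and best_targets are hypothetical.

```python
import sqlite3


def load_sample(conn):
    """Create the two tables from the question and fill in its sample rows."""
    conn.executescript("""
        CREATE TABLE pages (keyword TEXT, URL TEXT, category INTEGER, score INTEGER);
        CREATE TABLE constraints ("from" INTEGER, "to" INTEGER);
        INSERT INTO pages VALUES
            ('Cat',   'https://www.example.org', 1, 100),
            ('Dog',   'https://www.example.com', 2, 50),
            ('Fish',  'https://www.example.com', 2, 60),
            ('Mouse', 'https://www.example.net', 3, 1);
        INSERT INTO constraints VALUES (1, 2), (2, 1), (3, 2);
    """)


def best_targets(conn, source_url):
    """Return the top-scoring row per target URL reachable from source_url."""
    sql = """
        SELECT to_keyword, to_url, to_score
        FROM (
            -- Pair source pages with target pages and rank the targets per URL.
            SELECT p.URL     AS from_url,
                   u.keyword AS to_keyword,
                   u.URL     AS to_url,
                   u.score   AS to_score,
                   DENSE_RANK() OVER (
                       PARTITION BY u.URL ORDER BY u.score DESC
                   ) AS rnk
            FROM pages p
            JOIN constraints c ON p.category = c."from"
            JOIN pages u       ON u.category = c."to"
        ) AS y
        WHERE rnk = 1 AND from_url = ?
        GROUP BY to_keyword, to_url, to_score  -- collapse duplicate pairings
        ORDER BY to_url
    """
    return conn.execute(sql, (source_url,)).fetchall()


conn = sqlite3.connect(":memory:")
load_sample(conn)
print(best_targets(conn, "https://www.example.net"))
# [('Fish', 'https://www.example.com', 60)]
```

If two rows for the same target URL tie on score, dense_rank() gives both of them rank 1 and both are returned; row_number() would keep exactly one of them.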
