MYSQL - Combine rows with multiple duplicate values and delete duplicates afterwards

Posted 2023-02-24

【Originally posted】: 2015-09-26 09:33:28 【Question】:

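So I have my database set up as a single table. In that table I have collected the source URL and the description (I am scraping product descriptions from a number of pages). Unfortunately, when a page has several paragraphs, I end up with multiple rows in the database for the same URL/source page.

What I would like to do is this: if there are multiple rows with the same URL, combine the descriptions from each row and then delete the duplicate rows for that URL.

My table is structured like this: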

table             
+----+----------------------------+-------------+
| id | url                        | description |
+----+----------------------------+-------------+
|  1 | http://example.com/page-a  | paragraph 1 |
|  2 | http://example.com/page-a  | paragraph 2 |
|  3 | http://example.com/page-a  | paragraph 3 |
|  4 | http://example.com/page-b  | paragraph 1 |
|  5 | http://example.com/page-b  | paragraph 2 |
+----+----------------------------+-------------+

This is what I want it to look like:

table             
+----+----------------------------+-------------------------------------+
| id | url                        | description                         |
+----+----------------------------+-------------------------------------+
|  1 | http://example.com/page-a  | paragraph 1 paragraph 2 paragraph 3 |
|  2 | http://example.com/page-b  | paragraph 1 paragraph 2             |
+----+----------------------------+-------------------------------------+

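I am not worried about the IDs being updated correctly. I just want to merge the rows that share a URL so that their paragraphs end up in the same field, and then delete the duplicates.

Any help would be greatly appreciated!

【Comments】:

【Answer 1】:

Create a new temporary table, truncate the original table, and re-insert the data: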

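-- `t` stands for the name of your table.
-- Build one row per url, concatenating its descriptions in id order.
create temporary table tempt as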
    select (@rn := @rn + 1) as id, url,
           group_concat(description order by id separator ' ') as description
    from t cross join (select @rn := 0) params
    group by url 
    order by min(id);

-- Do lots of testing and checking here to be sure you have the data you want.

truncate table t;

insert into t(id, url, description)
    select id, url, description
    from tempt;

If id is already auto-incremented in the table, you do not need to supply a value for it.
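
As a rough sketch of that last point: assuming `t.id` is declared `AUTO_INCREMENT` (an assumption, since the question does not show the table definition), the re-insert could leave `id` out and let MySQL number the rows. `tempt` is the temporary table created above.

```sql
-- Assumes t.id is AUTO_INCREMENT; the question does not show the schema.
-- TRUNCATE TABLE resets the AUTO_INCREMENT counter to its start value.
truncate table t;

-- id is omitted from the column list, so MySQL generates it.
insert into t (url, description)
    select url, description
    from tempt
    order by id;   -- tempt.id from the step above, used only for ordering
```

Note that `truncate table` cannot be rolled back, so it may be worth checking the contents of `tempt` (or keeping a copy of `t`) before running it.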

【Comments】:

Thanks for contributing, but I keep getting errors. Thanks anyway!

【Answer 2】:

Filtering the table is easy; just insert the result into a new table:

SELECT url, GROUP_CONCAT(description ORDER BY description SEPARATOR ' ') AS description
FROM `table`
GROUP BY url
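
One way the "insert into a new table" step might look is sketched below. The table name `table_merged` is made up for illustration. Ordering by `id` instead of by `description` is a swapped-in choice: it keeps the paragraphs in the order they were stored, whereas ordering by `description` sorts them alphabetically. `GROUP_CONCAT` output is cut off at `group_concat_max_len` (1024 by default), so the sketch raises that limit for the session first; the value used is arbitrary.

```sql
-- Raise the GROUP_CONCAT length limit for this session (default is 1024).
-- 1000000 is an arbitrary illustrative value.
SET SESSION group_concat_max_len = 1000000;

-- `table_merged` is a hypothetical name for the de-duplicated copy.
CREATE TABLE table_merged AS
SELECT MIN(id) AS id,
       url,
       GROUP_CONCAT(description ORDER BY id SEPARATOR ' ') AS description
FROM `table`
GROUP BY url;
```

Once the merged rows look right, the original table can be dropped and the new one renamed into its place with `RENAME TABLE`.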

【Comments】:

【Answer 3】:

In SQL (SQL Server syntax):

SELECT MIN(id) as [ID],url, description= STUFF((SELECT '; ' 
+ ic.description FROM dbo.My_Table AS ic
WHERE ic.url= c.url
FOR XML PATH(''), TYPE).value('.','nvarchar(max)'), 1, 2, '')
FROM dbo.My_Table AS c
GROUP BY url
ORDER BY url;

【Comments】:
