再战mysql 数据去重

Posted jarjune

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了再战mysql 数据去重相关的知识,希望对你有一定的参考价值。

年初时,写过一篇去重的,在小表中还能用用,在大表中真的是效率低下,现在给了一次优化
https://www.cnblogs.com/jarjune/p/8328013.html

继上一篇文章

方法三:

DELIMITER //

DROP PROCEDURE IF EXISTS delete_rows_2;

CREATE PROCEDURE delete_rows_2(IN TABLENAME VARCHAR(50), IN FIELDNAMES VARCHAR(100), IN AUTOFIELD VARCHAR(50))
BEGIN

DECLARE DELETE_TABLE_ROWS_SQL VARCHAR(1000);

SET DELETE_TABLE_ROWS_SQL = CONCAT(\'
		DELETE 
		FROM 
			\', TABLENAME ,\' 
		WHERE 
			(\', FIELDNAMES ,\') IN ( 
				SELECT \', FIELDNAMES ,\' 
				FROM (
					SELECT 
						\', FIELDNAMES ,\' 
					FROM 
						\', TABLENAME ,\' 
					GROUP BY 
						\', FIELDNAMES ,\' 
					HAVING 
						COUNT(1) > 1 
				) t1
			) 
		AND \', AUTOFIELD ,\' NOT IN ( 
			SELECT \', AUTOFIELD ,\' 
			FROM (
				SELECT 
					MAX(\', AUTOFIELD ,\') \', AUTOFIELD ,\' 
				FROM 
					\', TABLENAME ,\'
				GROUP BY 
					\', FIELDNAMES ,\' 
				HAVING 
					COUNT(1) > 1 
				) t2
			)
\');

SET @DELETE_TABLE_ROWS_SQL = DELETE_TABLE_ROWS_SQL;

PREPARE DELETE_TABLE_ROWS_SQL_PRE FROM @DELETE_TABLE_ROWS_SQL;
EXECUTE DELETE_TABLE_ROWS_SQL_PRE;

END//

DELIMITER ;

CALL delete_rows_1(\'表名\', \'字段1,字段2,字段3...\', \'主键(唯一)字段\');

之后发现删除的效率还是挺低,又优化成

方法三(优化):

DELIMITER //

DROP PROCEDURE IF EXISTS delete_rows_2;

CREATE PROCEDURE delete_rows_2(IN TABLENAME VARCHAR(50), IN FIELDNAMES VARCHAR(100), IN AUTOFIELD VARCHAR(50))
BEGIN

DECLARE DELETE_TABLE_ROWS_SQL VARCHAR(1000);

SET DELETE_TABLE_ROWS_SQL = CONCAT(\'
		DELETE 
		FROM 
			\', TABLENAME ,\' 
		WHERE 
			\', AUTOFIELD ,\' IN ( 
				SELECT 
					\', AUTOFIELD ,\' 
				FROM
					(
					SELECT 
						\', AUTOFIELD ,\' 
					FROM 
						\', TABLENAME ,\' 
					WHERE 
						(\', FIELDNAMES ,\') IN ( 
							SELECT 
								\', FIELDNAMES ,\' 
							FROM 
								\', TABLENAME ,\' 
							GROUP BY 
								\', FIELDNAMES ,\' 
							HAVING 
								COUNT(1) > 1 
						) 
					AND \', AUTOFIELD ,\' NOT IN ( 
						SELECT 
							MAX(\', AUTOFIELD ,\') 
						FROM 
							\', TABLENAME ,\'
						GROUP BY 
							\', FIELDNAMES ,\' 
						HAVING 
							COUNT(1) > 1 
					) 
				) t2 
			) 
	\');

SET @DELETE_TABLE_ROWS_SQL = DELETE_TABLE_ROWS_SQL;

PREPARE DELETE_TABLE_ROWS_SQL_PRE FROM @DELETE_TABLE_ROWS_SQL;
EXECUTE DELETE_TABLE_ROWS_SQL_PRE;

END//

DELIMITER ;

CALL delete_rows_2(\'表名\', \'字段1,字段2,字段3...\', \'主键字段\');

由于上述都要group by 两次,又换了一种思路

方法四

DELETE t1
FROM
	l_weijij_47 t1,
	(
		SELECT
			f01,
			f02,
			f03,
			MAX(seq_value) seq_value
		FROM
			l_weijij_47
		GROUP BY
			f01,
			f02,
			f03
		HAVING
			COUNT(1) > 1
		ORDER BY NULL
	) t2
where
	t1.f01 = t2.f01
AND t1.f02 = t2.f02
AND t1.f03 = t2.f03
and t1.seq_value < t2.seq_value

注:group by默认会进行排序,所以要加上order by NULL就避免了排序
group by a,b,c的时候,a,b,c一定要加索引才快

综上,方法四是目前在用的去重。

以上是关于再战mysql 数据去重的主要内容,如果未能解决你的问题,请参考以下文章

mysql去重distinct,太牛了!

linux中怎么查看mysql数据库版本

mysql多表查询去重

mysql的去重问题

部分代码片段

求mysql 语句去重并按重复个数排序