年初时,写过一篇去重的,在小表中还能用用,在大表中真的是效率低下,现在给了一次优化
https://www.cnblogs.com/jarjune/p/8328013.html
继上一篇文章
方法三:
DELIMITER //
DROP PROCEDURE IF EXISTS delete_rows_2;
CREATE PROCEDURE delete_rows_2(IN TABLENAME VARCHAR(50), IN FIELDNAMES VARCHAR(100), IN AUTOFIELD VARCHAR(50))
BEGIN
DECLARE DELETE_TABLE_ROWS_SQL VARCHAR(1000);
SET DELETE_TABLE_ROWS_SQL = CONCAT(\'
DELETE
FROM
\', TABLENAME ,\'
WHERE
(\', FIELDNAMES ,\') IN (
SELECT \', FIELDNAMES ,\'
FROM (
SELECT
\', FIELDNAMES ,\'
FROM
\', TABLENAME ,\'
GROUP BY
\', FIELDNAMES ,\'
HAVING
COUNT(1) > 1
) t1
)
AND \', AUTOFIELD ,\' NOT IN (
SELECT \', AUTOFIELD ,\'
FROM (
SELECT
MAX(\', AUTOFIELD ,\') \', AUTOFIELD ,\'
FROM
\', TABLENAME ,\'
GROUP BY
\', FIELDNAMES ,\'
HAVING
COUNT(1) > 1
) t2
)
\');
SET @DELETE_TABLE_ROWS_SQL = DELETE_TABLE_ROWS_SQL;
PREPARE DELETE_TABLE_ROWS_SQL_PRE FROM @DELETE_TABLE_ROWS_SQL;
EXECUTE DELETE_TABLE_ROWS_SQL_PRE;
END//
DELIMITER ;
CALL delete_rows_1(\'表名\', \'字段1,字段2,字段3...\', \'主键(唯一)字段\');
之后发现删除的效率还是挺低,又优化成
方法三(优化):
DELIMITER //
DROP PROCEDURE IF EXISTS delete_rows_2;
CREATE PROCEDURE delete_rows_2(IN TABLENAME VARCHAR(50), IN FIELDNAMES VARCHAR(100), IN AUTOFIELD VARCHAR(50))
BEGIN
DECLARE DELETE_TABLE_ROWS_SQL VARCHAR(1000);
SET DELETE_TABLE_ROWS_SQL = CONCAT(\'
DELETE
FROM
\', TABLENAME ,\'
WHERE
\', AUTOFIELD ,\' IN (
SELECT
\', AUTOFIELD ,\'
FROM
(
SELECT
\', AUTOFIELD ,\'
FROM
\', TABLENAME ,\'
WHERE
(\', FIELDNAMES ,\') IN (
SELECT
\', FIELDNAMES ,\'
FROM
\', TABLENAME ,\'
GROUP BY
\', FIELDNAMES ,\'
HAVING
COUNT(1) > 1
)
AND \', AUTOFIELD ,\' NOT IN (
SELECT
MAX(\', AUTOFIELD ,\')
FROM
\', TABLENAME ,\'
GROUP BY
\', FIELDNAMES ,\'
HAVING
COUNT(1) > 1
)
) t2
)
\');
SET @DELETE_TABLE_ROWS_SQL = DELETE_TABLE_ROWS_SQL;
PREPARE DELETE_TABLE_ROWS_SQL_PRE FROM @DELETE_TABLE_ROWS_SQL;
EXECUTE DELETE_TABLE_ROWS_SQL_PRE;
END//
DELIMITER ;
CALL delete_rows_2(\'表名\', \'字段1,字段2,字段3...\', \'主键字段\');
由于上述都要group by 两次,又换了一种思路
方法四
DELETE t1
FROM
l_weijij_47 t1,
(
SELECT
f01,
f02,
f03,
MAX(seq_value) seq_value
FROM
l_weijij_47
GROUP BY
f01,
f02,
f03
HAVING
COUNT(1) > 1
ORDER BY NULL
) t2
where
t1.f01 = t2.f01
AND t1.f02 = t2.f02
AND t1.f03 = t2.f03
and t1.seq_value < t2.seq_value
注:group by默认会进行排序,所以要加上order by NULL就避免了排序
group by a,b,c的时候,a,b,c一定要加索引才快
综上,方法四是目前在用的去重。