sql 在MySQL中查找重复条目

Posted

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了sql 在MySQL中查找重复条目相关的知识,希望对你有一定的参考价值。

-- =============================================================================
-- Find Duplicate Input With MySQL
-- http://kvz.io/blog/2013/03/04/find-duplicate-input-with-mysql/
--
-- At my company we have employees creating customer accounts every day.
-- Sometimes we make mistakes, for instance, we forget to check if the company
-- already was a customer (maybe 10y ago they may have had a product).
--
-- Duplicate accounts can cause all sorts of problems, so I wanted way to detect
-- them with SQL.
--
-- The problem was, the company names may have been entered with different
-- punctuation, whitespace, etc. So I needed similar names to surface from the
-- depths of our database, not just exact matches (that would have been too easy
-- :)
--
-- For the solution I turned to SOUNDEX for fuzzy matching similar sounding
-- company names, and then review the results myself (false positives are
-- possible, but since they would be few, it becomes a simple task to
-- doublecheck) and report back to our company.
--
-- I thought I'd share
--
-- * partly because it could be useful to others (obviously this could be used
--   to detect all kinds of user generated typos and similar entries);
-- * mostly because I'm curious to find if there is a better (more performant)
--   way to write this query.
-- =============================================================================
 
--
-- Select all the individual company names that have a
-- soundex_code that occurs more than once (I now use a subquery for that)
SELECT
  `id`,
  `customer_name`,
  SOUNDEX(`customer_name`) AS soundex_code
FROM `customers`
WHERE SOUNDEX(`customer_name`) IN (
  -- Subquery: select all soundex_codes that occur more than once,
  -- (this does not return the individual company names that share them)
  SELECT SOUNDEX(`customer_name`) AS soundex_code
  FROM `customers`
  WHERE 1 = 1
    AND `is_active` = 1
    -- More specific criteria to define who you want to compare
  GROUP BY soundex_code
  HAVING COUNT(*) > 1
)
ORDER BY
  soundex_code,
  `customer_name`
 
--
-- This e.g. returns:
--
-- -----|------------------|------------------
-- id      customer_name       soundex_code
-- -----|------------------|------------------
-- 291     F.S. Hosting        F2352
-- 1509    FS hosting          F2352
-- 9331    R  Schmit           R253
-- 9332    R Schmit            R253

以上是关于sql 在MySQL中查找重复条目的主要内容,如果未能解决你的问题,请参考以下文章

在sql表条目中查找重复单词[关闭]

sql 查找因主键(PK)违规而可能阻止查询的所有重复条目

从表 mysql 中获取重复的条目

在 SQL 中查找重复的值对

sql 在MySQL中查找重复值

如何使用 Propel ORM 在数据库中查找重复条目?