C# LINQ: find duplicates in a List

Posted 2023-03-31


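【Title】: C# LINQ find duplicates in List 【Posted】: 2013-08-31 10:55:22 【Question】:

Using LINQ, from a List<int>, how can I retrieve a list that contains the entries repeated more than once, along with their values?

【Question discussion】:

【Answer 1】:

The easiest way to solve the problem is to group the elements by their value and then pick a representative of each group that has more than one element. In LINQ, this translates to: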

var query = lst.GroupBy(x => x)
              .Where(g => g.Count() > 1)
              .Select(y => y.Key)
              .ToList();

If you want to know how many times each element is repeated, you can use:

var query = lst.GroupBy(x => x)
              .Where(g => g.Count() > 1)
              .Select(y => new { Element = y.Key, Counter = y.Count() })
              .ToList();

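This returns a List of an anonymous type, and each element has the properties Element and Counter, which give you the information you need.

Lastly, if it is a dictionary you are looking for, you can use: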

var query = lst.GroupBy(x => x)
              .Where(g => g.Count() > 1)
              .ToDictionary(x => x.Key, y => y.Count());

This returns a dictionary with your element as the key and the number of times it is repeated as the value.
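As a minimal sketch, the list and dictionary variants above can be wrapped in small methods and tried on sample data. The class name GroupByDuplicates and the method names are hypothetical and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, used only to illustrate the queries above.
public static class GroupByDuplicates
{
    // Distinct values that occur more than once, in order of first appearance.
    public static List<int> Values(IEnumerable<int> source) =>
        source.GroupBy(x => x)
              .Where(g => g.Count() > 1)
              .Select(g => g.Key)
              .ToList();

    // Each duplicated value mapped to the number of times it occurs.
    public static Dictionary<int, int> Counts(IEnumerable<int> source) =>
        source.GroupBy(x => x)
              .Where(g => g.Count() > 1)
              .ToDictionary(g => g.Key, g => g.Count());
}
```

For the input 1, 2, 3, 1, 4, 2, 1, Values returns 1 and 2, and Counts maps 1 to 3 and 2 to 2.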

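【Discussion】:

Now just a wonder: suppose the duplicated ints are distributed across n int arrays. I'm using a dictionary and for loops to find out which array contains a duplicate and remove it according to a distribution logic. Is there a faster way (a LINQ one, I wonder) to achieve that result? Thanks in advance for your interest. I'm doing something like this: for (int i = 0; i ... for (int k = 0; k ... (the rest of the snippet is truncated)

If you want to find duplicates in a list of arrays, take a look at SelectMany.

I'm searching for duplicates in an array of lists, but I don't see how SelectMany can help me work it out.

To check whether a collection has more than one element, it is more efficient to use Skip(1).Any() instead of Count(). Imagine a collection with 1000 elements. Skip(1).Any() detects that there is more than one as soon as it finds the second element. Using Count() requires visiting the whole collection.

【Answer 2】:

Find out whether an enumerable contains any duplicates: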

var anyDuplicate = enumerable.GroupBy(x => x.Key).Any(g => g.Count() > 1);

Find out whether all values in an enumerable are unique:

var allUnique = enumerable.GroupBy(x => x.Key).All(g => g.Count() == 1);
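The two one-liners assume elements that expose a Key property. A minimal sketch with a hypothetical Entry type shows both checks side by side, plus a HashSet-based variant (an alternative technique, not the answer's) that can stop at the first repeated key. All names below are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type with a Key property, assumed for illustration.
public sealed class Entry
{
    public int Key { get; set; }
    public string Name { get; set; } = "";
}

public static class DuplicateChecks
{
    // True when at least two elements share the same Key.
    public static bool AnyDuplicate(IEnumerable<Entry> source) =>
        source.GroupBy(x => x.Key).Any(g => g.Count() > 1);

    // True when every Key occurs exactly once (also true for an empty sequence).
    public static bool AllUnique(IEnumerable<Entry> source) =>
        source.GroupBy(x => x.Key).All(g => g.Count() == 1);

    // Alternative: stops at the first repeated Key instead of grouping everything.
    public static bool AnyDuplicateEarlyExit(IEnumerable<Entry> source)
    {
        var seen = new HashSet<int>();
        return source.Any(x => !seen.Add(x.Key));
    }
}
```

For any input, AnyDuplicate and AllUnique return opposite values, and AnyDuplicateEarlyExit agrees with AnyDuplicate.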

【Discussion】:

Is there any case where these are not boolean opposites? anyDuplicate == !allUnique in all cases.

@GarrGodfrey They are always boolean opposites.

【Answer 3】:

Another way is to use a HashSet:

var hash = new HashSet<int>();
var duplicates = list.Where(i => !hash.Add(i));

If you want unique values in the duplicates list:

var myhash = new HashSet<int>();
var mylist = new List<int>() { 1, 1, 2, 2, 3, 3, 3, 4, 4, 4 };
var duplicates = mylist.Where(item => !myhash.Add(item)).Distinct().ToList();

Here is the same solution as a set of generic extension methods:

public static class Extensions
{
  // Core overload: an element is a duplicate when its key is already in the set.
  public static IEnumerable<TSource> GetDuplicates<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector, IEqualityComparer<TKey> comparer)
  {
    var hash = new HashSet<TKey>(comparer);
    // ToList() runs the query once, so the HashSet is not reused by a later enumeration.
    return source.Where(item => !hash.Add(selector(item))).ToList();
  }

  public static IEnumerable<TSource> GetDuplicates<TSource>(this IEnumerable<TSource> source, IEqualityComparer<TSource> comparer)
  {
    return source.GetDuplicates(x => x, comparer);
  }

  public static IEnumerable<TSource> GetDuplicates<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector)
  {
    return source.GetDuplicates(selector, null);
  }

  public static IEnumerable<TSource> GetDuplicates<TSource>(this IEnumerable<TSource> source)
  {
    return source.GetDuplicates(x => x, null);
  }
}
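One thing to watch with the HashSet approach is deferred execution: without ToList(), the Where filter runs again on every enumeration while the HashSet still holds the values from the previous pass. The following sketch (class and method names are hypothetical) enumerates the same lazy query twice to make the effect visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Enumerates the same lazy query twice and returns both counts.
    public static int[] CountTwice(IEnumerable<int> source)
    {
        var hash = new HashSet<int>();
        // No ToList(): the filter is re-evaluated on each enumeration,
        // but the HashSet keeps what earlier passes added to it.
        IEnumerable<int> duplicates = source.Where(i => !hash.Add(i));
        int first = duplicates.Count();   // pass 1: only real repeats fail Add
        int second = duplicates.Count();  // pass 2: every value is already in the set
        return new[] { first, second };
    }
}
```

For the source 1, 2, 3, 4, 5, 2 the first count is 1 and the second is 6, which is why the extension methods materialize the result with ToList().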

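【Discussion】:

This doesn't work as expected. Using a List<int> { 1, 2, 3, 4, 5, 2 } as the source, the result is an IEnumerable<int> with one element whose value is 1 (where the correct duplicate value is 2).

@BCA Yesterday, I thought you were wrong. Look at this example: dotnetfiddle.net/GUnhUl

Your fiddle prints the correct result. However, I added the line Console.WriteLine("Count: {0}", duplicates.Count()); directly below it and it prints 6. Unless I'm missing something about the requirements of this function, there should only be 1 item in the resulting collection.

@BCA Yesterday, this was a bug caused by LINQ deferred execution. I added ToList to fix it, but this means the method is executed as soon as it is called, not when you iterate over the result.

var hash = new HashSet<int>(); var duplicates = list.Where(i => !hash.Add(i)); produces a list that includes every repeated occurrence. So if your list contains four 2s, the duplicates list will contain three 2s, because only one 2 can be added to the HashSet. If you want the list to contain a single value for each duplicate, use this code instead: var duplicates = mylist.Where(item => !myhash.Add(item)).ToList().Distinct().ToList();

【Answer 4】:

Find only duplicate values: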

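var duplicates = list.GroupBy(x => x).Any(g => g.Count() > 1);

For example:

var list = new[] { 1, 2, 3, 1, 4, 2 };

GroupBy groups the numbers by their key, so each group holds every occurrence of one value and its count is the number of repetitions. After that, we just check whether any value is repeated more than once.

Find only unique values:

var unique = list.GroupBy(x => x).All(g => g.Count() == 1);

For example:

var list = new[] { 1, 2, 3, 1, 4, 2 };

GroupBy groups the numbers by their key, so each group holds every occurrence of one value and its count is the number of repetitions. After that, we just check that every value occurs only once, which means it is unique.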
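Any and All each produce a single bool rather than the values themselves. A minimal sketch of how the same grouping could return the values instead, assuming a plain sequence of ints (the class and method names are hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OccurrenceFilters
{
    // Values that appear more than once.
    public static List<int> Duplicated(IEnumerable<int> source) =>
        source.GroupBy(x => x)
              .Where(g => g.Count() > 1)
              .Select(g => g.Key)
              .ToList();

    // Values that appear exactly once.
    public static List<int> OccurringOnce(IEnumerable<int> source) =>
        source.GroupBy(x => x)
              .Where(g => g.Count() == 1)
              .Select(g => g.Key)
              .ToList();
}
```

For 1, 2, 3, 1, 4, 2, Duplicated returns 1 and 2, and OccurringOnce returns 3 and 4.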

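【Discussion】:

The code below will also find the unique items: var unique = list.Distinct(x => x)

Your ANY syntax does not return the duplicates; it only tells you whether there are any. Use the ALL syntax in the first example too, that should sort it!

Both examples return only a boolean, which is not what the OP asked for.

【Answer 5】:

You can do this: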

var list = new[] { 1, 2, 3, 1, 4, 2 };
var duplicateItems = list.Duplicates();

Using these extension methods:

public static class Extensions
{
    // Returns every element whose key occurs more than once.
    public static IEnumerable<TSource> Duplicates<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector)
    {
        var grouped = source.GroupBy(selector);
        var moreThan1 = grouped.Where(i => i.IsMultiple());
        return moreThan1.SelectMany(i => i);
    }

    // Only TSource is declared here so that list.Duplicates() can infer its type argument.
    public static IEnumerable<TSource> Duplicates<TSource>(this IEnumerable<TSource> source)
    {
        return source.Duplicates(i => i);
    }

    // True when the sequence has at least two elements; stops after the second one.
    public static bool IsMultiple<T>(this IEnumerable<T> source)
    {
        using (var enumerator = source.GetEnumerator())
        {
            return enumerator.MoveNext() && enumerator.MoveNext();
        }
    }
}

Using IsMultiple() in the Duplicates method is faster than Count(), because it does not iterate over the whole collection.
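To illustrate the early-exit idea on a general lazy sequence (not specifically on the groups returned by GroupBy), the sketch below counts how many items are actually pulled by Skip(1).Any(), which behaves like IsMultiple(), and by Count() > 1. The class, the Numbers generator and the method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EarlyExitDemo
{
    // Yields 1..count and reports each item as it is produced.
    private static IEnumerable<int> Numbers(int count, Action onYield)
    {
        for (int i = 1; i <= count; i++)
        {
            onYield();
            yield return i;
        }
    }

    // Number of items pulled from the source by Skip(1).Any().
    public static int PulledBySkipAny(int length)
    {
        int pulled = 0;
        _ = Numbers(length, () => pulled++).Skip(1).Any();
        return pulled;
    }

    // Number of items pulled from the source by Count() > 1.
    public static int PulledByCount(int length)
    {
        int pulled = 0;
        _ = Numbers(length, () => pulled++).Count() > 1;
        return pulled;
    }
}
```

With a length of 1000, PulledBySkipAny returns 2 while PulledByCount returns 1000. Whether this matters for a particular IGrouping depends on how that grouping implements counting.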

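【Discussion】:

If you look at the reference source for Grouping, you'll see that Count() is precomputed, so your solution is likely slower.

@Johnbot. You are right, in this case it is faster and the implementation will probably never change... but it depends on an implementation detail of the class behind IGrouping. With my implementation, you know it will never iterate over the whole collection.

So computing [Count()] is basically different from iterating the whole list. Count() is precomputed, whereas iterating the whole list is not.

@rehan khan: I don't understand the difference between Count() and Count()

@RehanKhan: IsMultiple does not perform a Count(); it stops right after 2 items. Just like Take(2).Count >= 2;

【Answer 6】:

I created an extension in response to this question that you can include in your projects. I think it covers most cases when you search for duplicates in a List or with LINQ.

Example: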

// Dummy class to compare in the list
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Surname { get; set; }
    public Person(int id, string name, string surname)
    {
        this.Id = id;
        this.Name = name;
        this.Surname = surname;
    }
}

// The extension static class
public static class Extention
{
    public static IEnumerable<T> getMoreThanOnceRepeated<T>(this IEnumerable<T> extList, Func<T, object> groupProps) where T : class
    {
        // Return only the second and later repetitions
        return extList
            .GroupBy(groupProps)
            .SelectMany(z => z.Skip(1)); // Skip the first occurrence and return all the others that repeat
    }

    public static IEnumerable<T> getAllRepeated<T>(this IEnumerable<T> extList, Func<T, object> groupProps) where T : class
    {
        // Get all the rows that have repeats
        return extList
            .GroupBy(groupProps)
            .Where(z => z.Count() > 1) // Keep only the groups with more than one element
            .SelectMany(z => z); // Everything that passed the filter is returned
    }
}

// How to use it:
void DuplicateExample()
{
    // Populate the list
    List<Person> PersonsLst = new List<Person>()
    {
        new Person(1, "Ricardo", "Figueiredo"),   // first duplicate in the example
        new Person(2, "Ana", "Figueiredo"),
        new Person(3, "Ricardo", "Figueiredo"),   // second duplicate in the example
        new Person(4, "Margarida", "Figueiredo"),
        new Person(5, "Ricardo", "Figueiredo")    // third duplicate in the example
    };

    Console.WriteLine("All:");
    PersonsLst.ForEach(z => Console.WriteLine("{0} -> {1} {2}", z.Id, z.Name, z.Surname));
    /* OUTPUT:
        All:
        1 -> Ricardo Figueiredo
        2 -> Ana Figueiredo
        3 -> Ricardo Figueiredo
        4 -> Margarida Figueiredo
        5 -> Ricardo Figueiredo
        */

    Console.WriteLine("All lines with repeated data");
    PersonsLst.getAllRepeated(z => new { z.Name, z.Surname })
        .ToList()
        .ForEach(z => Console.WriteLine("{0} -> {1} {2}", z.Id, z.Name, z.Surname));
    /* OUTPUT:
        All lines with repeated data
        1 -> Ricardo Figueiredo
        3 -> Ricardo Figueiredo
        5 -> Ricardo Figueiredo
        */
    Console.WriteLine("Only Repeated more than once");
    PersonsLst.getMoreThanOnceRepeated(z => new { z.Name, z.Surname })
        .ToList()
        .ForEach(z => Console.WriteLine("{0} -> {1} {2}", z.Id, z.Name, z.Surname));
    /* OUTPUT:
        Only Repeated more than once
        3 -> Ricardo Figueiredo
        5 -> Ricardo Figueiredo
        */
}

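【Discussion】:

Consider using Skip(1).Any() instead of Count(). If you have 1000 duplicates, Skip(1).Any() will stop after it finds the second one. Count() will visit all 1000 elements.

If you add this extension method, consider using HashSet.Add instead of GroupBy, as suggested in one of the other answers. As soon as HashSet.Add finds a duplicate, it stops. Your GroupBy keeps grouping all elements even after a group with more than one element has been found.

【Answer 7】:

There is an answer, but I don't understand why it doesn't work: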

var anyDuplicate = enumerable.GroupBy(x => x.Key).Any(g => g.Count() > 1);

In that case my solution is this:

var duplicates = model.list
                    .GroupBy(s => s.SAME_ID)
                    .Where(g => g.Count() > 1).Count() > 0;
if (duplicates)
{
    doSomething();
}

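【Discussion】:

The first syntax doesn't work because it is really a boolean extension: the Any method returns true if at least one element satisfies the predicate, and false otherwise. So your code only tells you whether you have duplicates, not what they are.

【Answer 8】:

A complete set of Linq to SQL extensions for duplicate functions, checked against MS SQL Server. They do not use .ToList() or IEnumerable. These queries are executed in SQL Server rather than in memory; only the results are returned to memory.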

using System;
using System.Linq;
using System.Linq.Expressions;

public static class Linq2SqlExtensions
{
    // Result type that pairs a key with its number of occurrences.
    public class CountOfT<T>
    {
        public T Key { get; set; }
        public int Count { get; set; }
    }

    // Keys that occur more than once.
    public static IQueryable<TKey> Duplicates<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
        => source.GroupBy(groupBy).Where(w => w.Count() > 1).Select(s => s.Key);

    // All rows whose key occurs more than once.
    public static IQueryable<TSource> GetDuplicates<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
        => source.GroupBy(groupBy).Where(w => w.Count() > 1).SelectMany(s => s);

    // Duplicated keys with their counts, as CountOfT<TKey>.
    public static IQueryable<CountOfT<TKey>> DuplicatesCounts<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
        => source.GroupBy(groupBy).Where(w => w.Count() > 1).Select(y => new CountOfT<TKey> { Key = y.Key, Count = y.Count() });

    // Duplicated keys with their counts, as tuples.
    public static IQueryable<Tuple<TKey, int>> DuplicatesCountsAsTuble<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
        => source.GroupBy(groupBy).Where(w => w.Count() > 1).Select(s => Tuple.Create(s.Key, s.Count()));
}
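Without a database at hand, the shape of such an IQueryable extension can still be tried against an in-memory array wrapped with AsQueryable(). This is only a sketch under that assumption: the class QueryableDuplicatesSketch and the method DuplicateKeys are hypothetical stand-ins, and with a real provider the same expression would be translated by that provider rather than run by LINQ to Objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-in for one of the extensions above, kept self-contained.
public static class QueryableDuplicatesSketch
{
    // Keys that occur more than once, expressed against IQueryable.
    public static IQueryable<TKey> DuplicateKeys<TSource, TKey>(
        this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
        => source.GroupBy(groupBy).Where(g => g.Count() > 1).Select(g => g.Key);

    // Runs the query on an in-memory array exposed as IQueryable.
    public static List<string> Demo()
    {
        var names = new[] { "Ana", "Ricardo", "Ana", "Margarida", "Ricardo" }.AsQueryable();
        return names.DuplicateKeys(n => n).ToList();
    }
}
```

Demo returns "Ana" and "Ricardo", the two names that appear more than once.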

【Discussion】:

【Answer 9】:

LINQ query:

var query = from s2 in (from s in someList group s by new { s.Column1, s.Column2 } into sg select sg) where s2.Count() > 1 select s2;

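【Discussion】:

【Answer 10】:

This simpler approach, which does not use grouping, just gets the distinct elements, then iterates over them and checks how many times each occurs in the list. If the count is greater than 1, the item occurs more than once, so it is added to Repeteditemlist.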

var mylist = new List<int>() { 1, 1, 2, 3, 3, 3, 4, 4, 4 };
var distList = mylist.Distinct().ToList();
var Repeteditemlist = new List<int>();
foreach (var item in distList)
{
    // Count the occurrences of this distinct value in the original list
    if (mylist.Count(e => e == item) > 1)
    {
        Repeteditemlist.Add(item);
    }
}
foreach (var item in Repeteditemlist)
{
    Console.WriteLine(item);
}

Expected output:

1 3 4
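The loop above scans the whole list once for every distinct value. As an alternative sketch (a different technique from the answer's, with hypothetical names), a Dictionary can keep running counts so that the list is traversed only once:

```csharp
using System;
using System.Collections.Generic;

public static class CountingDuplicates
{
    // Returns each repeated value once, in the order its second occurrence is seen.
    public static List<int> Find(IEnumerable<int> source)
    {
        var counts = new Dictionary<int, int>();
        var repeated = new List<int>();
        foreach (var item in source)
        {
            // seen is 0 when the value has not been encountered before.
            counts.TryGetValue(item, out int seen);
            counts[item] = seen + 1;
            if (seen == 1)
            {
                repeated.Add(item); // second occurrence: report the value once
            }
        }
        return repeated;
    }
}
```

For the same list 1, 1, 2, 3, 3, 3, 4, 4, 4, Find returns 1, 3 and 4.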

【Discussion】:

【Answer 11】:

Remove duplicates by key:

myTupleList = myTupleList.GroupBy(tuple => tuple.Item1).Select(group => group.First()).ToList();

【Discussion】:

The question is not about removing duplicates.
