Capturing Duplicates from ArrayList

Posted 2023-02-25


Originally asked: 2014-09-22 13:28:30

Question:

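I'm having trouble removing duplicate objects from an ArrayList. I parse XML into objects I call IssueFeed. Each one holds a symptom, a problem and a solution.

Most of my objects are unique and don't share the same symptom, problem and solution, but some have the same symptom with a different problem.

I'm trying to accomplish a few things:

1. Capture the objects that share a symptom into a duplicates ArrayList.
2. Remove the duplicate items from the main list, leaving at least 1 item with that symptom to display.
3. When the user taps an item we know has duplicates, set the duplicate-data ArrayList on my listview/adapter.

Steps I've already taken:

1. I have tried sorting the objects, and I'm able to capture the duplicates, but I don't know how to remove all but one of them from the main list.
2. I have looped over the list looking for objects that are not the object itself but have the same symptom, then removing them and updating my duplicate array and main array.

Some code

IssueFeed - object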

public IssueFeed(String symptom, String problem, String solution) {
    this.symptom = symptom;
    this.problem = problem;
    this.solution = solution;
}

public String getSymptom() {
    return symptom;
}

public String getProblem() {
    return problem;
}

public String getSolution() {
    return solution;
}

My ArrayList<IssueFeed>s

duplicateDatalist = new ArrayList<IssueFeed>(); // list of objects that share a symptom

list_of_non_dupes = new ArrayList<IssueFeed>(); // list of only objects with a unique symptom

mIssueList = mIssueParser.parseLocally(params[0]); // returns ArrayList<IssueFeed> of all objects

I can get the duplicates with the sort code below.

Collections.sort(mIssueList, new Comparator<IssueFeed>() {
    @Override
    public int compare(IssueFeed s1, IssueFeed s2) {
        // equalsIgnoreCase rather than matches(): matches() would treat the symptom as a regex,
        // and ignoring case agrees with the compareToIgnoreCase ordering below
        if (s1.getSymptom().equalsIgnoreCase(s2.getSymptom())) {
            if (!duplicateDatalist.contains(s1)) {
                duplicateDatalist.add(s1);
                System.out.println("Dupe s1 added " + s1.getSymptom() + ", " + s1.getProblem());
            }
            if (!duplicateDatalist.contains(s2)) {
                duplicateDatalist.add(s2);
                System.out.println("Dupe s2 added " + s2.getSymptom() + ", " + s2.getProblem());
            }
        }
        return s1.getSymptom().compareToIgnoreCase(s2.getSymptom());
    }
});
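The comparator above collects duplicates as a side effect of compare(). Which pairs the sort compares, and how often, is an implementation detail of Collections.sort. Below is a minimal sketch of an alternative that keeps the comparator pure. It sorts first, then walks the sorted list once. The first item of each run of equal symptoms stays in the main list, and every member of a run longer than one goes into the duplicate list.

It assumes case-insensitive symptom matching, as compareToIgnoreCase implies, and uses a stripped-down IssueFeed. The names SortThenGroup and dedupeSorted are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SortThenGroup {
    // Minimal stand-in for the question's IssueFeed (assumption: only these fields matter)
    static class IssueFeed {
        private final String symptom, problem, solution;
        IssueFeed(String symptom, String problem, String solution) {
            this.symptom = symptom;
            this.problem = problem;
            this.solution = solution;
        }
        String getSymptom() { return symptom; }
        String getProblem() { return problem; }
        String getSolution() { return solution; }
    }

    // Sorts by symptom (ignoring case), keeps the first item of each run of equal
    // symptoms, and copies every item of a run longer than 1 into duplicates.
    static List<IssueFeed> dedupeSorted(List<IssueFeed> issues, List<IssueFeed> duplicates) {
        List<IssueFeed> sorted = new ArrayList<>(issues);
        // List.sort on an ArrayList is stable, so equal symptoms keep their original order
        sorted.sort(Comparator.comparing(IssueFeed::getSymptom, String.CASE_INSENSITIVE_ORDER));
        List<IssueFeed> mainList = new ArrayList<>();
        int start = 0;
        while (start < sorted.size()) {
            int end = start + 1;
            // extend the run while the next symptom equals the run's first symptom
            while (end < sorted.size()
                    && sorted.get(end).getSymptom().equalsIgnoreCase(sorted.get(start).getSymptom())) {
                end++;
            }
            mainList.add(sorted.get(start)); // one representative per symptom
            if (end - start > 1) {
                duplicates.addAll(sorted.subList(start, end)); // the whole shared-symptom group
            }
            start = end;
        }
        return mainList;
    }

    public static void main(String[] args) {
        List<IssueFeed> issues = Arrays.asList(
                new IssueFeed("sym2", "p2", "s2"),
                new IssueFeed("sym1", "p1", "s1"),
                new IssueFeed("sym3", "p3", "s3"),
                new IssueFeed("sym1", "p4", "s4"));
        List<IssueFeed> duplicates = new ArrayList<>();
        List<IssueFeed> mainList = dedupeSorted(issues, duplicates);
        for (IssueFeed f : mainList) System.out.println("main: " + f.getSymptom() + ", " + f.getProblem());
        for (IssueFeed f : duplicates) System.out.println("dupe: " + f.getSymptom() + ", " + f.getProblem());
    }
}
```

Because equal keys end up adjacent after sorting, a single linear pass is enough to separate representatives from duplicates.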

Now I need to create the new non-dupe list, but this code just adds all of the objects. :/

for (int j = 0; j < mIssueList.size(); j++) {
    IssueFeed obj = mIssueList.get(j);

    for (int i = 0; i < mIssueList.size(); i++) {
        IssueFeed obj_two = mIssueList.get(i); // was get(j), which always compared obj with itself

        // When i == j, obj matches its own symptom, so every object ends up being added
        if (obj.getSymptom().equals(obj_two.getSymptom())) {
            if (!list_of_non_dupes.contains(obj_two)) {
                list_of_non_dupes.add(obj_two);
            }
            break;
        } else {
            if (!list_of_non_dupes.contains(obj_two)) {
                list_of_non_dupes.add(obj_two);
            }
        }
    }
}
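The nested loop above adds everything because each object eventually matches its own symptom. One way to make the idea work is to keep the object at index j only if no earlier index holds the same symptom. The sketch below does that, assuming exact, case-sensitive String.equals matching and a trimmed IssueFeed without the solution field. FirstOccurrenceFilter and firstPerSymptom are hypothetical names.

This is O(n²), which may be acceptable for a small XML feed. The HashMap-based approach in the answers scales better.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FirstOccurrenceFilter {
    // Minimal stand-in for the question's IssueFeed (solution omitted for brevity)
    static class IssueFeed {
        private final String symptom, problem;
        IssueFeed(String symptom, String problem) { this.symptom = symptom; this.problem = problem; }
        String getSymptom() { return symptom; }
        String getProblem() { return problem; }
    }

    // Returns one object per symptom: the first one seen in list order.
    static List<IssueFeed> firstPerSymptom(List<IssueFeed> issues) {
        List<IssueFeed> nonDupes = new ArrayList<>();
        for (int j = 0; j < issues.size(); j++) {
            IssueFeed obj = issues.get(j);
            boolean seenEarlier = false;
            // only look at indices before j, so obj is never compared with itself
            for (int i = 0; i < j; i++) {
                if (issues.get(i).getSymptom().equals(obj.getSymptom())) {
                    seenEarlier = true;
                    break;
                }
            }
            if (!seenEarlier) {
                nonDupes.add(obj);
            }
        }
        return nonDupes;
    }

    public static void main(String[] args) {
        List<IssueFeed> issues = Arrays.asList(
                new IssueFeed("sym1", "p1"), new IssueFeed("sym2", "p2"),
                new IssueFeed("sym1", "p4"));
        for (IssueFeed f : firstPerSymptom(issues)) {
            System.out.println(f.getSymptom() + ", " + f.getProblem());
        }
    }
}
```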


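Answer 1:

You should iterate over the ArrayList twice. With this approach you don't even need to sort the ArrayList to find the duplicates (Collections.sort is an O(n log n) operation), and you can process the list in linear time. You also don't need to override equals() and hashCode() for the IssueFeed objects.

In the first pass, fill a HashMap keyed by symptom that records how many times each symptom occurs in the ArrayList. It could look like this: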

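class SymptomInfo {
    int incidence;  // how many times this symptom occurs in the list
    boolean used;   // set to true once the first object with this symptom has been kept
}

HashMap<String, SymptomInfo> symptomIncidence = new HashMap<String, SymptomInfo>();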

However, if you read from and write to the HashMap from multiple threads, you may need a thread-safe map such as ConcurrentHashMap.
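For the multi-threaded case, java.util.concurrent.ConcurrentHashMap is the usual thread-safe replacement. Its merge() method performs the read, increment and write for a key as one atomic operation. Below is a minimal sketch that counts symptom incidence from a parallel stream. To stay short, it counts plain strings rather than IssueFeed objects. ConcurrentCount and countSymptoms are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ConcurrentCount {
    // Counts how often each symptom occurs; safe even when many threads update the map at once.
    static ConcurrentMap<String, Integer> countSymptoms(List<String> symptoms) {
        ConcurrentMap<String, Integer> incidence = new ConcurrentHashMap<>();
        // parallelStream() spreads the work over several threads;
        // merge() stores 1 for a new key or atomically adds 1 to the existing count
        symptoms.parallelStream().forEach(s -> incidence.merge(s, 1, Integer::sum));
        return incidence;
    }

    public static void main(String[] args) {
        List<String> symptoms = Arrays.asList("sym1", "sym2", "sym3", "sym1");
        System.out.println(countSymptoms(symptoms));
    }
}
```

With a plain HashMap, the same concurrent get-then-put pattern can lose updates or corrupt the map.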

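In the second pass over the ArrayList, look up each object's symptom in the hash map to get the total number of occurrences of that symptom. This is a quick and easy way to decide whether the object belongs in duplicateDatalist or list_of_non_dupes. Also, the first time you encounter an object with a particular symptom, set used to true. If you later encounter an object whose symptom already has used set to true, you know it is a repeat and can remove it from the main list.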
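Below is a minimal sketch of the two passes described above. It assumes exact String.equals matching on the symptom and uses a trimmed-down IssueFeed. SymptomInfo follows the answer's class. The following are assumptions:

- The names TwoPassSplit and split.
- The duplicate list holds every member of a shared-symptom group, matching the asker's "same symptom, different problem" goal.

The sketch builds a new main list instead of calling remove() on the ArrayList. Each ArrayList removal shifts the remaining elements, which would undo the linear-time benefit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TwoPassSplit {
    // Minimal stand-in for the question's IssueFeed (solution omitted for brevity)
    static class IssueFeed {
        private final String symptom, problem;
        IssueFeed(String symptom, String problem) { this.symptom = symptom; this.problem = problem; }
        String getSymptom() { return symptom; }
        String getProblem() { return problem; }
    }

    static class SymptomInfo {
        int incidence;  // total occurrences of this symptom in the list
        boolean used;   // true once one object with this symptom has been kept
    }

    // Returns a new main list with one object per symptom and fills the other two lists.
    static List<IssueFeed> split(List<IssueFeed> issues, List<IssueFeed> duplicateDatalist,
                                 List<IssueFeed> listOfNonDupes) {
        Map<String, SymptomInfo> symptomIncidence = new HashMap<>();
        // Pass 1: count how often each symptom appears
        for (IssueFeed feed : issues) {
            symptomIncidence.computeIfAbsent(feed.getSymptom(), k -> new SymptomInfo()).incidence++;
        }
        // Pass 2: classify each object and keep only the first one per symptom
        List<IssueFeed> mainList = new ArrayList<>();
        for (IssueFeed feed : issues) {
            SymptomInfo info = symptomIncidence.get(feed.getSymptom());
            if (info.incidence > 1) {
                duplicateDatalist.add(feed);   // shares its symptom with another object
            } else {
                listOfNonDupes.add(feed);      // symptom appears exactly once
            }
            if (!info.used) {
                info.used = true;              // first object with this symptom represents it
                mainList.add(feed);
            }
        }
        return mainList;
    }

    public static void main(String[] args) {
        List<IssueFeed> mIssueList = Arrays.asList(
                new IssueFeed("sym1", "p1"), new IssueFeed("sym2", "p2"),
                new IssueFeed("sym3", "p3"), new IssueFeed("sym1", "p4"));
        List<IssueFeed> dupes = new ArrayList<>();
        List<IssueFeed> nonDupes = new ArrayList<>();
        List<IssueFeed> mainList = split(mIssueList, dupes, nonDupes);
        // prints "main: 3, dupes: 2, non-dupes: 2"
        System.out.println("main: " + mainList.size() + ", dupes: " + dupes.size()
                + ", non-dupes: " + nonDupes.size());
    }
}
```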

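Comments:

Thanks for the suggestion. Between your answer and Syam's, I've solved my original problem.

Answer 2:

If you can modify the IssueFeed class, consider overriding equals() and hashCode() and using a Set to find duplicates. For example: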

import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class IssueFeed {
    private String symptom;
    private String problem;
    private String solution;

    public IssueFeed(String symptom, String problem, String solution) {
        this.symptom = symptom;
        this.problem = problem;
        this.solution = solution;
    }

    public String getSymptom() {
        return symptom;
    }

    public String getProblem() {
        return problem;
    }

    public String getSolution() {
        return solution;
    }

    // Only symptom takes part in hashCode() and equals(), so two feeds with the same
    // symptom count as equal even when their problem or solution differs
    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((symptom == null) ? 0 : symptom.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        IssueFeed other = (IssueFeed) obj;
        if (symptom == null) {
            if (other.symptom != null)
                return false;
        } else if (!symptom.equals(other.symptom))
            return false;
        return true;
    }

    @Override
    public String toString() {
        return "IssueFeed [symptom=" + symptom + ", problem=" + problem
                + ", solution=" + solution + "]";
    }
}

public class Sample {

    public static void main(String[] args) {
        List<IssueFeed> mainList = new ArrayList<IssueFeed>(
                Arrays.asList(new IssueFeed[] {
                        new IssueFeed("sym1", "p1", "s1"),
                        new IssueFeed("sym2", "p2", "s2"),
                        new IssueFeed("sym3", "p3", "s3"),
                        new IssueFeed("sym1", "p1", "s1") }));
        System.out.println("Initial List : " + mainList);
        Set<IssueFeed> list_of_non_dupes = new LinkedHashSet<IssueFeed>();
        List<IssueFeed> duplicateDatalist = new ArrayList<IssueFeed>();
        for (IssueFeed feed : mainList) {
            // add() returns false when an equal element (same symptom) is already in the set
            if (!list_of_non_dupes.add(feed)) {
                duplicateDatalist.add(feed);
            }
        }
        mainList = new ArrayList<IssueFeed>(list_of_non_dupes); // Remove the duplicate items from the main list, leaving at least 1 item with that symptom to display
        list_of_non_dupes.removeAll(duplicateDatalist); // list of only objects with a unique symptom
        System.out.println("Final main list : " + mainList);
        System.out.println("Unique symptom : " + list_of_non_dupes);
        System.out.println("Duplicate symptom : " + duplicateDatalist);
    }
}

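Comments:

I think list_of_non_dupes is meant for the objects in mainList that have a unique symptom; see the OP's comments in the code.

I think this is the OP's second requirement: "remove the duplicate items from the main list, leaving at least 1 item with that symptom to display". However, the OP says that is the main list, and seems to be working with 3 lists. We can do what we mentioned in linear time without overriding .equals() and .hashCode().

Modified to include all 3 lists. :)

@SyamS This is very close to what I need, but the reason for the duplicate list is to capture objects that have the same symptom but a different problem, e.g. new IssueFeed("sym1", "p1", "s1"), new IssueFeed("sym2", "p2", "s2"), new IssueFeed("sym3", "p3", "s3"), new IssueFeed("sym1", "p4", "s4")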
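The asker's last point is that the duplicate list should hold objects with the same symptom but different problems, so that tapping an item can show them all. One option, not taken from either answer, is to group the whole list by symptom into an insertion-ordered map. The keys then give one row per symptom for the main list, and the value for a tapped symptom is the list to hand to the adapter.

The sketch below uses Collectors.groupingBy with a LinkedHashMap factory (Java 8+). SymptomGroups and groupBySymptom are hypothetical names. groupingBy throws a NullPointerException on a null key, so this assumes every parsed symptom is non-null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class SymptomGroups {
    // Minimal stand-in for the question's IssueFeed
    static class IssueFeed {
        private final String symptom, problem, solution;
        IssueFeed(String symptom, String problem, String solution) {
            this.symptom = symptom;
            this.problem = problem;
            this.solution = solution;
        }
        String getSymptom() { return symptom; }
        String getProblem() { return problem; }
        String getSolution() { return solution; }
    }

    // Groups issues by symptom, keeping symptoms in first-seen order.
    static Map<String, List<IssueFeed>> groupBySymptom(List<IssueFeed> issues) {
        return issues.stream().collect(Collectors.groupingBy(
                IssueFeed::getSymptom, LinkedHashMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<IssueFeed> issues = Arrays.asList(
                new IssueFeed("sym1", "p1", "s1"), new IssueFeed("sym2", "p2", "s2"),
                new IssueFeed("sym3", "p3", "s3"), new IssueFeed("sym1", "p4", "s4"));
        Map<String, List<IssueFeed>> groups = groupBySymptom(issues);

        // Main list: one representative (the first) per symptom
        List<IssueFeed> mainList = new ArrayList<>();
        for (List<IssueFeed> group : groups.values()) {
            mainList.add(group.get(0));
        }
        // When the user taps an item, its group holds every problem for that symptom
        List<IssueFeed> tapped = groups.get(mainList.get(0).getSymptom());
        for (IssueFeed f : tapped) {
            System.out.println(f.getSymptom() + ": " + f.getProblem());
        }
    }
}
```

A group with more than one entry marks a symptom that has duplicates, so the same map can also drive the decision of whether tapping a row should open the duplicate list.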
