Compare two iterators and check which elements were added, removed, or are the same in both

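I am writing some code that receives two iterators, each backed by a huge list of data. For simplicity, you can imagine that both are lists of numbers. The same number can appear in only one of the lists or in both.

What it should do is walk through both lists and, as it goes, determine which of the items encountered exist in both lists, which exist only in list 1, and which exist only in list 2.

I could build sets containing all the values of the first and second list and diff them, but the code will be used to compare really large data sets (millions of records). Loading two sets into memory is not an option; it has to be "streaming".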
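
For contrast, the in-memory approach ruled out above could look roughly like this. It is a minimal sketch with hypothetical names (`InMemoryDiff`, `onlyIn`, `inBoth`) that are not part of the original code. It holds every element of both inputs in memory at once, which is exactly what makes it unsuitable for millions of records.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, for illustration only: diffs two sets held fully in memory.
class InMemoryDiff {

    // Returns the elements of a that are not in b, keeping a's iteration order.
    static <T> Set<T> onlyIn(Set<T> a, Set<T> b) {
        Set<T> result = new LinkedHashSet<>(a);
        result.removeAll(b);
        return result;
    }

    // Returns the elements present in both a and b, keeping a's iteration order.
    static <T> Set<T> inBoth(Set<T> a, Set<T> b) {
        Set<T> result = new LinkedHashSet<>(a);
        result.retainAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> one = new LinkedHashSet<>(List.of(1, 2, 3, 10));
        Set<Integer> two = new LinkedHashSet<>(List.of(3, 4, 10, 12));

        System.out.println("Only in list 1: " + onlyIn(one, two)); // [1, 2]
        System.out.println("In both lists:  " + inBoth(one, two)); // [3, 10]
        System.out.println("Only in list 2: " + onlyIn(two, one)); // [4, 12]
    }
}
```

Each call copies a whole set, so memory use grows with the size of the inputs rather than staying constant.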

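I can sort both lists before processing, so you can assume that they will be ordered.

This is the best I could come up with, but in some cases it gets stuck in an infinite loop: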

public class ChangeScanner {

    public static <T> void compareEntriesOfTwoStreams(Iterator<T> sourceOne,
                                                      Iterator<T> sourceTwo,
                                                      Comparator<T> comparator) {
        T valueInOne = sourceOne.next();
        T valueInTwo = sourceTwo.next();

        while (sourceOne.hasNext() || sourceTwo.hasNext()) {
            if (comparator.compare(valueInOne, valueInTwo) == 0) {
                System.out.println("Present in both list 1 and 2: " + valueInOne);
                valueInOne = getNextValue(valueInOne, sourceOne);
                valueInTwo = getNextValue(valueInTwo, sourceTwo);

            } else if (comparator.compare(valueInOne, valueInTwo) < 0) {

                System.out.println("Present in list 1, Not present in list 2: " + valueInOne);
                valueInOne = getNextValue(valueInOne, sourceOne);

            } else if (comparator.compare(valueInOne, valueInTwo) > 0) {

                System.out.println("Not present in list 1, Present in list 2: " + valueInTwo);
                valueInTwo = getNextValue(valueInTwo, sourceTwo);

            }
        }
    }

    private static <T> T getNextValue(T current, Iterator<T> iterator) {
        if (iterator.hasNext()) {
            return iterator.next();
        }

        return current;
    }
}

And a simple JUnit test to demonstrate it:

@Test
public void testIteratorComparingFail() {

    List<String> tableOne = Lists.newArrayList("1", "2", "3", "4", "5", "6", "7", "8", "9", "14", "15", "16", "17");
    List<String> tableTwo = Lists.newArrayList("8", "9", "10", "11", "12", "13");

    ChangeScanner.compareEntriesOfTwoStreams(tableOne.iterator(), tableTwo.iterator(), String::compareTo);
}
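
Note that the comparison assumes both inputs are sorted by the same comparator that is passed in. With `String::compareTo`, the test data above is not in ascending order, because "14" sorts before "9" as text. Below is a minimal sketch, assuming every string holds a valid integer, of a numeric comparator and a sortedness check. The names `NumericStringOrder`, `NUMERIC`, and `isSorted` are hypothetical and not part of the original code.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, for illustration only.
class NumericStringOrder {

    // Assumption: every string parses as an integer that fits in a long.
    static final Comparator<String> NUMERIC = Comparator.comparingLong(Long::parseLong);

    // Returns true if no element is smaller than its predecessor under the comparator.
    static <T> boolean isSorted(List<T> values, Comparator<? super T> comparator) {
        for (int i = 1; i < values.size(); i++) {
            if (comparator.compare(values.get(i - 1), values.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> tableOne = List.of("1", "2", "3", "4", "5", "6", "7", "8", "9", "14", "15", "16", "17");

        System.out.println(isSorted(tableOne, String::compareTo)); // false: "9" > "14" as text
        System.out.println(isSorted(tableOne, NUMERIC));           // true
    }
}
```

Passing a comparator such as `NUMERIC` to both the sort step and the comparison keeps the two consistent.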

What I am basically doing right now is:

Order everything from small to large before checking.

check 1 and 8 -> 1 is smaller so I know that 1 is not present in list 2. Advance one.
check 2 and 8 -> 2 is smaller so I know that 2 is not present in list 2. Advance one.
...
check 8 and 8 -> it exists in both lists. Advance one and two.
check 9 and 9 -> it exists in both lists. Advance one and two.
check 14 and 10 -> 14 is larger than 10 so I know that 10 is not present in list 1. Two will now loop further to catch up. Advance two.
check 14 and 11 -> 14 is larger than 11 so I know that 11 is not present in list 1. Advance two.
check 14 and 12 -> 14 is larger than 12 so I know that 12 is not present in list 1. Advance two.
check 14 and 13 -> 14 is larger than 13 so I know that 13 is not present in list 1. Advance two.

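This all works fine until one of the iterators runs out. In this case, iterator 2 is now finished (13 is its last element).

The current logic gets stuck on the last check because it can no longer advance iterator 2, while iterator 1 still has more elements: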

check 14 and 13 -> 14 is larger than 13 so I know that 13 is not present in list 1. Advance two.
check 14 and 13 -> 14 is larger than 13 so I know that 13 is not present in list 1. Advance two.
check 14 and 13 -> 14 is larger than 13 so I know that 13 is not present in list 1. Advance two.

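This is where I can't figure out what to do. I'm pretty sure I have to include some extra logic for when either iterator is finished.

Two questions:

I have been looking for a third-party library that can do this, because I'd rather not invent it myself. If there is one, please let me know :)

If not, I would like to know which checks I can add to handle one of the two iterators ending.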

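Answer

This is an interesting challenge, because an iterator can only be consumed once, and the code must not prematurely discard a value read from an iterator.

I could only come up with a recursive solution, but it would be better if you could rewrite it as a loop: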

static <T> void diff(Iterator<T> lefts, Iterator<T> rights, Comparator<T> comparator,
        Consumer<T> onlyLeft, Consumer<T> equals, Consumer<T> onlyRight) {
    while (lefts.hasNext() && rights.hasNext()) {
        recur(lefts.next(), rights.next(), lefts, rights, comparator, onlyLeft, equals, onlyRight);
    }
    if (!lefts.hasNext()) {
        rights.forEachRemaining(onlyRight);
    }
    if (!rights.hasNext()) {
        lefts.forEachRemaining(onlyLeft);
    }
}

static <T> void recur(T left, T right, Iterator<T> lefts, Iterator<T> rights,
        Comparator<T> comparator, Consumer<T> onlyLeft, Consumer<T> equals,
        Consumer<T> onlyRight) {
    if (comparator.compare(left, right) == 0) {
        equals.accept(left);
    } else if (comparator.compare(left, right) < 0) {
        onlyLeft.accept(left);
        if (lefts.hasNext()) {
            recur(lefts.next(), right, lefts, rights, comparator, onlyLeft, equals, onlyRight);
        } else {
            onlyRight.accept(right);
        }
    } else {
        onlyRight.accept(right);
        if (rights.hasNext()) {
            recur(left, rights.next(), lefts, rights, comparator, onlyLeft, equals, onlyRight);
        } else {
            onlyLeft.accept(left);
        }
    }
}

A simple test:

public static void main(String[] args) {
    List<String> tableOne = Lists.newArrayList("1", "2", "3", "4", "5", "6");
    List<String> tableTwo = Lists.newArrayList("2", "2", "5", "7", "8");

    diff(tableOne.iterator(), tableTwo.iterator(), String::compareTo,
            left -> System.out.println("Left " + left),
            both -> System.out.println("Both " + both),
            right -> System.out.println("Right " + right));

}

Output

Left 1
Both 2
Right 2
Left 3
Left 4
Both 5
Left 6
Right 7
Right 8
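
The recursive version above can also be written as a loop, which avoids deep call stacks when one input has a long run of values missing from the other. The following is a minimal sketch, not the answer author's code: the class name `IterativeDiff` and the `hasLeft`/`hasRight` flags are assumptions introduced here. The flags track whether a value is still pending, so the end of an iterator is never confused with a value.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical loop-based variant of diff(), for illustration only.
class IterativeDiff {

    static <T> void diff(Iterator<T> lefts, Iterator<T> rights, Comparator<? super T> comparator,
            Consumer<? super T> onlyLeft, Consumer<? super T> both, Consumer<? super T> onlyRight) {
        // hasLeft/hasRight say whether left/right currently hold an unreported value.
        boolean hasLeft = lefts.hasNext();
        boolean hasRight = rights.hasNext();
        T left = hasLeft ? lefts.next() : null;
        T right = hasRight ? rights.next() : null;

        // Merge step: report the smaller value (or the match) and advance only that side.
        while (hasLeft && hasRight) {
            int order = comparator.compare(left, right);
            if (order == 0) {
                both.accept(left);
            } else if (order < 0) {
                onlyLeft.accept(left);
            } else {
                onlyRight.accept(right);
            }
            if (order <= 0) { // left was reported, fetch the next left value
                hasLeft = lefts.hasNext();
                if (hasLeft) left = lefts.next();
            }
            if (order >= 0) { // right was reported, fetch the next right value
                hasRight = rights.hasNext();
                if (hasRight) right = rights.next();
            }
        }

        // One side is exhausted: everything still pending on the other side is unmatched.
        while (hasLeft) {
            onlyLeft.accept(left);
            hasLeft = lefts.hasNext();
            if (hasLeft) left = lefts.next();
        }
        while (hasRight) {
            onlyRight.accept(right);
            hasRight = rights.hasNext();
            if (hasRight) right = rights.next();
        }
    }

    public static void main(String[] args) {
        List<String> tableOne = List.of("1", "2", "3", "4", "5", "6");
        List<String> tableTwo = List.of("2", "2", "5", "7", "8");

        diff(tableOne.iterator(), tableTwo.iterator(), String::compareTo,
                left -> System.out.println("Left " + left),
                same -> System.out.println("Both " + same),
                right -> System.out.println("Right " + right));
    }
}
```

For the same input as the test above, this sketch reports the elements in the same order as the recursive version.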

Another answer

We can write the function in pseudocode, covering the corner cases:

while list1 and list2 both have elements
    if (list1.next < list2.next)
        keep advancing list1; these are in list1 and not in list2
    else if (list1.next > list2.next)
        keep advancing list2; these are in list2 and not in list1
    else if (list1.next == list2.next)
        advance both list1 and list2; these are common to both lists

while (list1.hasNext)
    all remaining are only in list1

while (list2.hasNext)
    all remaining are only in list2

Working code:

public static <T> void compareEntriesOfTwoStreams(Iterator<T> sourceOne, Iterator<T> sourceTwo,
        Comparator<T> comparator) {
    // null marks an exhausted (or missing) source, so the inputs must not contain null elements
    T valueInOne = (sourceOne != null && sourceOne.hasNext()) ? sourceOne.next() : null;
    T valueInTwo = (sourceTwo != null && sourceTwo.hasNext()) ? sourceTwo.next() : null;
    while (valueInOne != null && valueInTwo != null) {

        if (comparator.compare(valueInOne, valueInTwo) > 0) {
            // advance sourcetwo
            while (valueInTwo != null && comparator.compare(valueInOne, valueInTwo) > 0) {
                System.out.println("Not present in list 1, Present in list 2: " + valueInTwo);
                valueInTwo = sourceTwo.hasNext() ? sourceTwo.next() : null;
            }

        } else if (comparator.compare(valueInOne, valueInTwo) < 0) {
            // advance sourceone 
            while (valueInOne != null && comparator.compare(valueInOne, valueInTwo) < 0) {
                // this will advance
                System.out.println("Not present in list 2, Present in list 1: " + valueInOne);
                valueInOne = sourceOne.hasNext() ? sourceOne.next() : null;
            }

        } else if (comparator.compare(valueInOne, valueInTwo) ==0) {
            System.out.println("present in both list:" + valueInOne);
            valueInTwo = sourceTwo.hasNext() ? sourceTwo.next() : null;
            valueInOne = sourceOne.hasNext() ? sourceOne.next() : null;
            // if either source is now exhausted, the outer loop ends and the rest is drained below
        }

    }

    while (valueInOne != null) {
        // all these are only in list1
        System.out.println("Not present in list 2, Present in list 1: " + valueInOne);
        valueInOne = sourceOne.hasNext() ? sourceOne.next() : null;
    }

    while (valueInTwo != null) {
        // these are only in list2
        System.out.println("Not present in list 1, Present in list 2: " + valueInTwo);
        valueInTwo = sourceTwo.hasNext() ? sourceTwo.next() : null;
    }
}

Sample run:

compareEntriesOfTwoStreams(Stream.of(1,2,3,10).iterator(), Stream.of(3,4,10,12).iterator(), Integer::compare);

Output:

Not present in list 2, Present in list 1: 1
Not present in list 2, Present in list 1: 2
present in both list:3
Not present in list 1, Present in list 2: 4
present in both list:10
Not present in list 1, Present in list 2: 12

Another answer

Adding the following lines inside the while loop solved the problem.

        if(!sourceOne.hasNext())
          {
            sourceTwo.next();
          }
        if(!sourceTwo.hasNext())
          {
            sourceOne.next();
          }

Full code:

 public static <T> void compareEntriesOfTwoStreams(Iterator<T> sourceOne,
                                                  Iterator<T> sourceTwo,
                                                  Comparator<T> comparator) {
    T valueInOne = sourceOne.next();
    T valueInTwo = sourceTwo.next();

    while (sourceOne.hasNext() || sourceTwo.hasNext()) {

        if (comparator.compare(valueInOne, valueInTwo) == 0) {
            System.out.println("Present in both list 1 and 2: " + valueInOne);
            valueInOne = getNextValue(valueInOne, sourceOne);
            valueInTwo = getNextValue(valueInTwo, sourceTwo);

        } else if (comparator.compare(valueInOne, valueInTwo) < 0) {

            System.out.println("Present in list 1, Not present in list 2: " + valueInOne);
            valueInOne = getNextValue(valueInOne, sourceOne);

        } else if (comparator.compare(valueInOne, valueInTwo) > 0) {

            System.out.println("Not present in list 1, Present in list 2: " + valueInTwo);
            valueInTwo = getNextValue(valueInTwo, sourceTwo);

        }
        if(!sourceOne.hasNext())
        {
            sourceTwo.next();
        }
        if(!sourceTwo.hasNext())
        {
            sourceOne.next();
        }

    }
}

private static <T> T getNextValue(T current, Iterator<T> iterator) {
    if (iterator.hasNext()) {
        return iterator.next();
    }

    return current;
}
