Compare two iterators and check which elements were added, removed, or are the same in both
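I'm writing some code that receives two iterators backed by two huge lists of data. For simplicity, you can imagine both are lists of numbers. The same number may exist in only one of the lists or in both.
What it should do is walk through both lists and, while doing so, determine which of the encountered items exist in both lists, which exist only in list 1, and which exist only in list 2.
I could build sets containing all values of the first and the second list and use them to compute the difference, but the code will be used to compare really large data sets (millions of records). Loading two sets into memory is not an option; it has to be "streaming".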
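For reference, the set-based approach that is ruled out here for memory reasons might look like the sketch below. It can still serve as a correctness oracle when testing a streaming implementation on small inputs. The class name `SetDiffBaseline` and the helpers `onlyIn` and `inBoth` are hypothetical names chosen for this illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SetDiffBaseline {

    // Hypothetical helper: elements of a that do not occur in b.
    // Both collections are held in memory, which is exactly what
    // the streaming version has to avoid for large inputs.
    static <T> Set<T> onlyIn(Collection<T> a, Collection<T> b) {
        Set<T> result = new LinkedHashSet<>(a);
        result.removeAll(new HashSet<>(b));
        return result;
    }

    // Hypothetical helper: elements that occur in both a and b.
    static <T> Set<T> inBoth(Collection<T> a, Collection<T> b) {
        Set<T> result = new LinkedHashSet<>(a);
        result.retainAll(new HashSet<>(b));
        return result;
    }

    public static void main(String[] args) {
        List<Integer> one = Arrays.asList(1, 2, 3, 10);
        List<Integer> two = Arrays.asList(3, 4, 10, 12);
        System.out.println("Only in list 1: " + onlyIn(one, two)); // [1, 2]
        System.out.println("In both lists:  " + inBoth(one, two)); // [3, 10]
        System.out.println("Only in list 2: " + onlyIn(two, one)); // [4, 12]
    }
}
```

Note that sets collapse duplicates, so this baseline only matches a streaming scan when neither input contains repeated values.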
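I can sort the two lists before processing, so you can assume they will be ordered.
This is the best I could come up with, but in some cases it gets stuck in an infinite loop: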
public class ChangeScanner {

    public static <T> void compareEntriesOfTwoStreams(Iterator<T> sourceOne,
                                                      Iterator<T> sourceTwo,
                                                      Comparator<T> comparator) {
        T valueInOne = sourceOne.next();
        T valueInTwo = sourceTwo.next();
        while (sourceOne.hasNext() || sourceTwo.hasNext()) {
            if (comparator.compare(valueInOne, valueInTwo) == 0) {
                System.out.println("Present in both list 1 and 2: " + valueInOne);
                valueInOne = getNextValue(valueInOne, sourceOne);
                valueInTwo = getNextValue(valueInTwo, sourceTwo);
            } else if (comparator.compare(valueInOne, valueInTwo) < 0) {
                System.out.println("Present in list 1, Not present in list 2: " + valueInOne);
                valueInOne = getNextValue(valueInOne, sourceOne);
            } else if (comparator.compare(valueInOne, valueInTwo) > 0) {
                System.out.println("Not present in list 1, Present in list 2: " + valueInTwo);
                valueInTwo = getNextValue(valueInTwo, sourceTwo);
            }
        }
    }

    private static <T> T getNextValue(T current, Iterator<T> iterator) {
        if (iterator.hasNext()) {
            return iterator.next();
        }
        return current;
    }
}
And a simple JUnit test to demonstrate it:
@Test
public void testIteratorComparingFail() {
    List<String> tableOne = Lists.newArrayList("1", "2", "3", "4", "5", "6", "7", "8", "9", "14", "15", "16", "17");
    List<String> tableTwo = Lists.newArrayList("8", "9", "10", "11", "12", "13");
    ChangeScanner.compareEntriesOfTwoStreams(tableOne.iterator(), tableTwo.iterator(), String::compareTo);
}
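One caveat about this test data: `String::compareTo` compares lexicographically, so "10" sorts before "9", and neither list is actually sorted under the comparator passed in. A merge-style scan is only reliable when both inputs are sorted by the same comparator that the scan uses. The sketch below assumes the values are decimal integer strings; `NumericOrder`, `NUMERIC`, and `isSorted` are hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NumericOrder {

    // Hypothetical comparator: orders decimal strings by their numeric value.
    static final Comparator<String> NUMERIC = Comparator.comparingInt(Integer::parseInt);

    // Returns true if the list is in non-decreasing order under the comparator.
    static <T> boolean isSorted(List<T> values, Comparator<T> comparator) {
        for (int i = 1; i < values.size(); i++) {
            if (comparator.compare(values.get(i - 1), values.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> tableTwo = Arrays.asList("8", "9", "10", "11", "12", "13");
        System.out.println(isSorted(tableTwo, String::compareTo)); // false: "9" > "10" lexicographically
        System.out.println(isSorted(tableTwo, NUMERIC));           // true
    }
}
```

Passing a numeric comparator such as `NUMERIC` to the scan (or using `Integer` elements with `Integer::compare`) keeps the sort order and the comparison consistent.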
What I'm basically doing right now is:
Order everything from small to large before checking.
check 1 and 8 -> 1 is smaller so I know that 1 is not present in list 2. Advance one.
check 2 and 8 -> 2 is smaller so I know that 2 is not present in list 2. Advance one.
...
check 8 and 8 -> it exists in both lists. Advance one and two.
check 9 and 9 -> it exists in both lists. Advance one and two.
check 14 and 10 -> 14 is larger than 10 so I know that 10 is not present in list 1. Two will now loop further to catch up. Advance two.
check 14 and 11 -> 14 is larger than 11 so I know that 11 is not present in list 1. Advance two.
check 14 and 12 -> 14 is larger than 12 so I know that 12 is not present in list 1. Advance two.
check 14 and 13 -> 14 is larger than 13 so I know that 13 is not present in list 1. Advance two.
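This all works fine until one of the iterators runs out. In this case, iterator 2 is now finished (13 is its last element).
The current logic gets stuck on the last check, because it can no longer advance iterator 2 while iterator 1 still has more elements: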
check 14 and 13 -> 14 is larger than 13 so I know that 13 is not present in list 1. Advance two.
check 14 and 13 -> 14 is larger than 13 so I know that 13 is not present in list 1. Advance two.
check 14 and 13 -> 14 is larger than 13 so I know that 13 is not present in list 1. Advance two.
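This is where I can't figure out what to do. I'm fairly sure I have to include some extra logic for when either iterator is finished.
Two questions:
I've been looking for a third-party library that can do this, because I'd rather not invent it myself. If there is one, please let me know :)
If not, I'd like to know which checks I can add to handle one of the two iterators ending.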
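This is an interesting challenge, because an iterator can only be consumed once, and the code must not prematurely discard a value it has read from an iterator.
I could only come up with a recursive solution, but it would be better if you could rewrite it as a loop: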
static <T> void diff(Iterator<T> lefts, Iterator<T> rights, Comparator<T> comparator,
                     Consumer<T> onlyLeft, Consumer<T> equals, Consumer<T> onlyRight) {
    while (lefts.hasNext() && rights.hasNext()) {
        recur(lefts.next(), rights.next(), lefts, rights, comparator, onlyLeft, equals, onlyRight);
    }
    if (!lefts.hasNext()) {
        rights.forEachRemaining(onlyRight);
    }
    if (!rights.hasNext()) {
        lefts.forEachRemaining(onlyLeft);
    }
}

static <T> void recur(T left, T right, Iterator<T> lefts, Iterator<T> rights,
                      Comparator<T> comparator, Consumer<T> onlyLeft, Consumer<T> equals,
                      Consumer<T> onlyRight) {
    if (comparator.compare(left, right) == 0) {
        equals.accept(left);
    } else if (comparator.compare(left, right) < 0) {
        onlyLeft.accept(left);
        if (lefts.hasNext()) {
            recur(lefts.next(), right, lefts, rights, comparator, onlyLeft, equals, onlyRight);
        } else {
            onlyRight.accept(right);
        }
    } else {
        onlyRight.accept(right);
        if (rights.hasNext()) {
            recur(left, rights.next(), lefts, rights, comparator, onlyLeft, equals, onlyRight);
        } else {
            onlyLeft.accept(left);
        }
    }
}
A simple test:
public static void main(String[] args) {
    List<String> tableOne = Lists.newArrayList("1", "2", "3", "4", "5", "6");
    List<String> tableTwo = Lists.newArrayList("2", "2", "5", "7", "8");
    diff(tableOne.iterator(), tableTwo.iterator(), String::compareTo,
            left -> System.out.println("Left " + left),
            both -> System.out.println("Both " + both),
            right -> System.out.println("Right " + right));
}
Output:
Left 1
Both 2
Right 2
Left 3
Left 4
Both 5
Left 6
Right 7
Right 8
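The recursive version above can be turned into a loop, as the answer suggests. This matters for large inputs, since each consecutive unmatched element adds a stack frame in the recursive form and a long run may exhaust the stack. What follows is a minimal sketch, not the answer author's code: the class name `IterativeDiff` and the `hasLeft`/`hasRight` flags are assumptions made for this illustration. The flags record whether a value has been read but not yet reported, so no value is discarded and `null` elements are not mistaken for the end of an iterator.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class IterativeDiff {

    // Both iterators must be sorted according to the given comparator.
    static <T> void diff(Iterator<T> lefts, Iterator<T> rights, Comparator<T> comparator,
                         Consumer<T> onlyLeft, Consumer<T> both, Consumer<T> onlyRight) {
        // hasLeft/hasRight are true while left/right hold a value that was read
        // from the iterator but has not been reported yet.
        boolean hasLeft = lefts.hasNext();
        boolean hasRight = rights.hasNext();
        T left = hasLeft ? lefts.next() : null;
        T right = hasRight ? rights.next() : null;

        while (hasLeft && hasRight) {
            int order = comparator.compare(left, right);
            if (order == 0) {
                both.accept(left);
            } else if (order < 0) {
                onlyLeft.accept(left);
            } else {
                onlyRight.accept(right);
            }
            if (order <= 0) { // left was reported, so read the next left value
                hasLeft = lefts.hasNext();
                left = hasLeft ? lefts.next() : null;
            }
            if (order >= 0) { // right was reported, so read the next right value
                hasRight = rights.hasNext();
                right = hasRight ? rights.next() : null;
            }
        }

        // At most one side still has data: report the pending value, then drain the rest.
        if (hasLeft) {
            onlyLeft.accept(left);
            lefts.forEachRemaining(onlyLeft);
        }
        if (hasRight) {
            onlyRight.accept(right);
            rights.forEachRemaining(onlyRight);
        }
    }

    public static void main(String[] args) {
        List<String> tableOne = Arrays.asList("1", "2", "3", "4", "5", "6");
        List<String> tableTwo = Arrays.asList("2", "2", "5", "7", "8");
        diff(tableOne.iterator(), tableTwo.iterator(), String::compareTo,
                left -> System.out.println("Left " + left),
                both -> System.out.println("Both " + both),
                right -> System.out.println("Right " + right));
    }
}
```

Run on the same two lists, this sketch prints the same nine lines as the recursive version, in the same order.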
We can sketch the function in pseudocode that highlights the corner cases:
while list1 and list2 both have elements
    if (list1.next < list2.next)
        keep advancing list1; these are in list1 and not in list2
    else if (list1.next > list2.next)
        keep advancing list2; these are in list2 and not in list1
    else if (list1.next == list2.next)
        advance both list1 and list2; these are common to both lists
while (list1.hasNext)
    all remaining are only in list1
while (list2.hasNext)
    all remaining are only in list2
Working code:
public static <T> void compareEntriesOfTwoStreams(Iterator<T> sourceOne, Iterator<T> sourceTwo,
                                                  Comparator<T> comparator) {
    // null marks "no current value", so the inputs must not contain null elements
    T valueInOne = (sourceOne != null && sourceOne.hasNext()) ? sourceOne.next() : null;
    T valueInTwo = (sourceTwo != null && sourceTwo.hasNext()) ? sourceTwo.next() : null;
    while (valueInOne != null && valueInTwo != null) {
        if (comparator.compare(valueInOne, valueInTwo) > 0) {
            // advance sourceTwo
            while (valueInTwo != null && comparator.compare(valueInOne, valueInTwo) > 0) {
                System.out.println("Not present in list 1, Present in list 2: " + valueInTwo);
                valueInTwo = sourceTwo.hasNext() ? sourceTwo.next() : null;
            }
        } else if (comparator.compare(valueInOne, valueInTwo) < 0) {
            // advance sourceOne
            while (valueInOne != null && comparator.compare(valueInOne, valueInTwo) < 0) {
                System.out.println("Not present in list 2, Present in list 1: " + valueInOne);
                valueInOne = sourceOne.hasNext() ? sourceOne.next() : null;
            }
        } else if (comparator.compare(valueInOne, valueInTwo) == 0) {
            // present in both lists; advance both
            System.out.println("present in both list:" + valueInOne);
            valueInTwo = sourceTwo.hasNext() ? sourceTwo.next() : null;
            valueInOne = sourceOne.hasNext() ? sourceOne.next() : null;
        }
    }
    while (valueInOne != null) {
        // all these are only in list 1
        System.out.println("Not present in list 2, Present in list 1: " + valueInOne);
        valueInOne = sourceOne.hasNext() ? sourceOne.next() : null;
    }
    while (valueInTwo != null) {
        // these are only in list 2
        System.out.println("Not present in list 1, Present in list 2: " + valueInTwo);
        valueInTwo = sourceTwo.hasNext() ? sourceTwo.next() : null;
    }
}
Sample run:
compareEntriesOfTwoStreams(Stream.of(1,2,3,10).iterator(), Stream.of(3,4,10,12).iterator(), Integer::compare);
Output:
Not present in list 2, Present in list 1: 1
Not present in list 2, Present in list 1: 2
present in both list:3
Not present in list 1, Present in list 2: 4
present in both list:10
Not present in list 1, Present in list 2: 12
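Adding the following lines inside the while loop stops the infinite loop. Be aware that they do so by reading and discarding elements from the other iterator, so those elements are never reported, and next() throws NoSuchElementException when both iterators run out in the same pass.
if (!sourceOne.hasNext()) {
    sourceTwo.next(); // the value read here is discarded
}
if (!sourceTwo.hasNext()) {
    sourceOne.next(); // the value read here is discarded
}
Full code: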
public static <T> void compareEntriesOfTwoStreams(Iterator<T> sourceOne,
                                                  Iterator<T> sourceTwo,
                                                  Comparator<T> comparator) {
    T valueInOne = sourceOne.next();
    T valueInTwo = sourceTwo.next();
    while (sourceOne.hasNext() || sourceTwo.hasNext()) {
        if (comparator.compare(valueInOne, valueInTwo) == 0) {
            System.out.println("Present in both list 1 and 2: " + valueInOne);
            valueInOne = getNextValue(valueInOne, sourceOne);
            valueInTwo = getNextValue(valueInTwo, sourceTwo);
        } else if (comparator.compare(valueInOne, valueInTwo) < 0) {
            System.out.println("Present in list 1, Not present in list 2: " + valueInOne);
            valueInOne = getNextValue(valueInOne, sourceOne);
        } else if (comparator.compare(valueInOne, valueInTwo) > 0) {
            System.out.println("Not present in list 1, Present in list 2: " + valueInTwo);
            valueInTwo = getNextValue(valueInTwo, sourceTwo);
        }
        if (!sourceOne.hasNext()) {
            sourceTwo.next();
        }
        if (!sourceTwo.hasNext()) {
            sourceOne.next();
        }
    }
}

private static <T> T getNextValue(T current, Iterator<T> iterator) {
    if (iterator.hasNext()) {
        return iterator.next();
    }
    return current;
}