What's a good, generic algorithm for collapsing a set of potentially-overlapping ranges?

Posted: 2010-11-17 00:36:37

Question:

I have a method that gets a number of objects of this class:

class Range<T>
{
    public T Start;
    public T End;
}

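In my case T is DateTime, but for simplicity let's use int. I would like a method that collapses these ranges into ones that cover the same "area" but do not overlap.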

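If I had the following ranges

1 to 5
3 to 9
11 to 15
12 to 14
13 to 20

the method should give me

1 to 9
11 to 20

I guess it would be called a union? I imagine the method signature could look something like this: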

public static IEnumerable<Range<T>> Collapse<T>(
    this IEnumerable<Range<T>> ranges,
    IComparer<T> comparer)
{
    ...
}

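I've looked at some other similar questions here, but I haven't found an implementation of this. This answer and some other answers to the same question describe algorithms, but I'm not quite sure I understand them. I'm also not especially good at implementing algorithms, so I was hoping someone here could help me out.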
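As a point of reference for the answers that follow, the requested behaviour can be sketched in a few lines of Python. This is only an illustration, not code from the question: the function name `collapse` and the representation of a range as a `(start, end)` tuple are assumptions, and ranges that merely touch at an endpoint are treated as overlapping.

```python
from datetime import datetime


def collapse(ranges):
    """Merge overlapping (start, end) tuples; works for any orderable type."""
    merged = []
    for start, end in sorted(ranges):  # tuples sort by start, then end
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the last merged range: extend its end.
            last_start, last_end = merged[-1]
            merged[-1] = (last_start, max(last_end, end))
        else:
            merged.append((start, end))
    return merged


def day(d):
    return datetime(2010, 11, d)


print(collapse([(1, 5), (3, 9), (11, 15), (12, 14), (13, 20)]))
# [(1, 9), (11, 20)]
print(collapse([(day(1), day(5)), (day(3), day(9))]) == [(day(1), day(9))])
# True
```

Because only `<=`, `max` and sorting are used, the same function should work unchanged for integers and for `datetime` values.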

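Comments:

+1, I love a good algorithm smackdown!

***.com/questions/149577/… ***.com/questions/628837/…

@nlucaroni - given the OP's use of generics, comparers and so on, can you point to an example that actually answers it in .NET terms? Also, some of those questions are about testing for intersection, not about finding the minimal set of ranges.

Yes, that's not the same thing.

Answer 1: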

A Go version of the algorithm, based on the Python answer:

package main

import (
    "fmt"
    "sort"
)

type TupleList [][]int

// Methods required by sort.Interface.
func (s TupleList) Len() int {
    return len(s)
}

func (s TupleList) Less(i, j int) bool {
    // Order by range start; the merge loop below depends on this.
    return s[i][0] < s[j][0]
}

func (s TupleList) Swap(i, j int) {
    s[i], s[j] = s[j], s[i]
}

func main() {
    ranges := TupleList{
        {11, 15},
        {3, 9},
        {12, 14},
        {13, 20},
        {1, 5},
    }
    fmt.Print(ranges)
    sort.Sort(ranges)
    fmt.Print("\n")
    fmt.Print(ranges)
    fmt.Print("\n")
    result := TupleList{}

    var cur []int
    for _, t := range ranges {
        if cur == nil {
            cur = t
            continue
        }
        cStart, cStop := cur[0], cur[1]
        if t[0] <= cStop {
            // Overlap: extend the current range.
            cur = []int{cStart, max(t[1], cStop)}
        } else {
            // Gap: the current range is complete.
            result = append(result, cur)
            cur = t
        }
    }
    result = append(result, cur)
    fmt.Print(result)
}

// max returns the larger of two ints.
func max(v1, v2 int) int {
    if v1 <= v2 {
        return v2
    }
    return v1
}

Answer 2:

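Here is a slight variation. I didn't need to collapse an unordered list; instead I wanted to maintain an ordered list, which is more efficient in my case. I'm posting it here in case it is useful to anyone else reading this thread. It could obviously be made generic quite easily.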

        private static List<Tuple<int, int>> Insert(List<Tuple<int, int>> ranges, int startIndex, int endIndex)
        {
            if (ranges == null || ranges.Count == 0)
                return new List<Tuple<int, int>> { new Tuple<int, int>(startIndex, endIndex) };

            // Position at which the new range belongs in the ordered list.
            var newIndex = ranges.Count;
            for (var i = 0; i < ranges.Count; i++)
            {
                if (ranges[i].Item1 > startIndex)
                {
                    newIndex = i;
                    break;
                }
            }

            // Seed min/max from whichever range comes first once the new one is
            // in place; otherwise a new range sorting before ranges[0] is lost.
            var min = newIndex == 0 ? startIndex : ranges[0].Item1;
            var max = newIndex == 0 ? endIndex : ranges[0].Item2;

            var newRanges = new List<Tuple<int, int>>();
            for (var i = 0; i <= ranges.Count; i++)
            {
                int rangeStart;
                int rangeEnd;
                if (i == newIndex)
                {
                    rangeStart = startIndex;
                    rangeEnd = endIndex;
                }
                else
                {
                    var range = ranges[i > newIndex ? i - 1 : i];
                    rangeStart = range.Item1;
                    rangeEnd = range.Item2;
                }

                if (rangeStart > max && rangeEnd > max)
                {
                    newRanges.Add(new Tuple<int, int>(min, max));
                    min = rangeStart;
                }
                max = rangeEnd > max ? rangeEnd : max;
            }
            newRanges.Add(new Tuple<int, int>(min, max));

            return newRanges;
        }

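The idea of keeping the list ordered and merging one new range into it can also be sketched with binary search. The Python below is a hypothetical illustration, not a port of the C# above: the name `insert_range` is made up, ranges are `(start, end)` tuples, and it relies on the input list already being sorted and non-overlapping, so that both the starts and the ends are in ascending order.

```python
from bisect import bisect_left, bisect_right


def insert_range(ranges, start, end):
    """Return a new sorted, non-overlapping list with (start, end) merged in.

    Assumes `ranges` is already sorted and non-overlapping.
    """
    starts = [s for s, _ in ranges]
    ends = [e for _, e in ranges]
    lo = bisect_left(ends, start)   # index of first range whose end >= start
    hi = bisect_right(starts, end)  # one past the last range whose start <= end
    if lo < hi:
        # ranges[lo:hi] all overlap the new range, so absorb them.
        start = min(start, ranges[lo][0])
        end = max(end, ranges[hi - 1][1])
    return ranges[:lo] + [(start, end)] + ranges[hi:]


print(insert_range([(1, 5), (11, 20)], 3, 9))  # [(1, 9), (11, 20)]
print(insert_range([(5, 10)], 1, 3))           # [(1, 3), (5, 10)]
```

Building the `starts` and `ends` lists is still linear in the number of ranges, so this sketch mainly shows how the affected slice can be located; a structure that keeps those keys around would be needed to benefit from the binary search.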
Answer 3:

A Ruby version. Sorting the ranges before merging seems to be a good idea.

def merge a , b
    return b if a.nil?
    if b.begin <= a.end
        (a.begin..[a.end, b.end].max)   #overlap: keep the larger end
    else
        [a , b ]     #no overlap
    end
end

ranges = [(1..5),(11..15),(3..9),(12..14),(13..20)]
sorted_ranges = ranges.sort_by { |r| r.begin }   #sorted by the start of the range

merged_ranges = sorted_ranges.inject([]) do |m , r|
       last = m.pop
       m << merge(last , r)
       m.flatten
end

puts merged_ranges

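Why sorting first matters can be seen by running a single merging pass over the question's data in an arbitrary order. The following Python sketch is an illustration only; `sweep` is a hypothetical helper that trusts whatever order it is given.

```python
def sweep(ranges):
    """Single merging pass that trusts the order it is given."""
    merged = []
    for start, end in ranges:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


data = [(1, 5), (11, 15), (3, 9), (12, 14), (13, 20)]
print(sweep(data))          # [(1, 5), (11, 20)]  <- the stretch 6..9 is lost
print(sweep(sorted(data)))  # [(1, 9), (11, 20)]
```

Without sorting, the pass only compares each range with the most recent one, so (3, 9) is silently absorbed by (11, 15) instead of extending (1, 5).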
Answer 4:

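Here is a simple looping implementation, but at least it's clear.

- It works for both DateTime and int in my simple tests
- Most of the complexity is in the Overlap/Combine methods on the range
- The algorithm is actually easy to understand, with no floating variables
- It adds some functionality to the Range class that could be useful in general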


using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class CollapseRange
{
    public static IEnumerable<Range<T>> Collapse<T>(this IEnumerable<Range<T>> me)
        where T:struct
    {
        var result = new List<Range<T>>();
        var sorted = me.OrderBy(x => x.Start).ToList();
        if (sorted.Count == 0) return result; // nothing to collapse
        do {
            // Take the earliest remaining range and absorb everything that overlaps it.
            var first = sorted.FirstOrDefault();
            sorted.Remove(first);
            while (sorted.Any(x => x.Overlap(first))) {
                var other = sorted.FirstOrDefault(x => x.Overlap(first));
                first = first.Combine(other);
                sorted.Remove(other);
            }
            result.Add(first);
        } while (sorted.Count > 0);
        return result;
    }
}

[DebuggerDisplay("Range {Start} - {End}")]
public class Range<T> where T : struct
{
    public T Start { set; get; }
    public T End { set; get; }
    public bool Overlap(Range<T> other)
    {
        return (Within(other.Start) || Within(other.End) || other.Within(this.Start) || other.Within(this.End));
    }
    public bool Within(T point)
    {
        var Comp = Comparer<T>.Default;
        var st = Comp.Compare(point, this.Start);
        var ed = Comp.Compare(this.End, point);
        return (st >= 0 && ed >= 0);
    }
    /// <summary>Combines two ranges, updating the current range</summary>
    public void Merge(Range<T> other)
    {
        var Comp = Comparer<T>.Default;
        if (Comp.Compare(this.Start, other.Start) > 0) this.Start = other.Start;
        if (Comp.Compare(other.End, this.End) > 0) this.End = other.End;
    }
    /// <summary>Combines two ranges, returning a new range in their place</summary>
    public Range<T> Combine(Range<T> other)
    {
        var Comp = Comparer<T>.Default;
        var newRange = new Range<T>() { Start = this.Start, End = this.End };
        newRange.Start = (Comp.Compare(this.Start, other.Start) > 0) ? other.Start : this.Start;
        newRange.End = (Comp.Compare(other.End, this.End) > 0) ? other.End : this.End;
        return newRange;
    }
}

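Since most of the work sits in the overlap and combine operations, it may help to see them in isolation. The Python sketch below mirrors the `Within`, `Overlap` and `Combine` methods above using a hypothetical `Range` named tuple, and assumes closed ranges with start <= end. Under that assumption the four `Within` checks can be reduced to two comparisons, which the snippet verifies exhaustively on small integer ranges.

```python
from collections import namedtuple
from itertools import product

Range = namedtuple("Range", ["start", "end"])


def within(r, point):
    return r.start <= point <= r.end


def overlap_four_way(a, b):
    # Same structure as the Overlap method above.
    return (within(a, b.start) or within(a, b.end)
            or within(b, a.start) or within(b, a.end))


def overlap(a, b):
    # Closed ranges overlap exactly when each one starts before the other ends.
    return a.start <= b.end and b.start <= a.end


def combine(a, b):
    return Range(min(a.start, b.start), max(a.end, b.end))


# Compare both overlap tests on every pair of small integer ranges.
small = [Range(s, e) for s in range(6) for e in range(s, 6)]
assert all(overlap(a, b) == overlap_four_way(a, b)
           for a, b in product(small, small))

print(overlap(Range(1, 5), Range(3, 9)))  # True
print(overlap(Range(1, 5), Range(6, 9)))  # False
print(combine(Range(1, 5), Range(3, 9)))  # Range(start=1, end=9)
```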
Comments:

I had never seen the DebuggerDisplay attribute before. That's awesome :D

Answer 5:

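Throwing another hat into the ring. This is very similar to Gary W's implementation (from which I got the sorted-list approach), but it is done as a test case and adds some helpful functions to the Range class.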

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

import junit.framework.TestCase;

public class Range2Test extends TestCase {
    public void testCollapse() throws Exception {
        Set<Range<Integer>> set = new HashSet<Range<Integer>>();
        set.add(new Range<Integer>(1, 5));
        set.add(new Range<Integer>(3, 9));
        set.add(new Range<Integer>(11, 15));
        set.add(new Range<Integer>(12, 14));
        set.add(new Range<Integer>(13, 20));
        Set<Range<Integer>> expected = new HashSet<Range<Integer>>();
        expected.add(new Range<Integer>(1, 9));
        expected.add(new Range<Integer>(11, 20));
        assertEquals(expected, collapse(set));
    }

    private static <T extends Comparable<T>> Set<Range<T>> collapse(Set<Range<T>> ranges) {
        if (ranges == null)
            return null;
        if (ranges.size() < 2)
            return new HashSet<Range<T>>(ranges);
        ArrayList<Range<T>> list = new ArrayList<Range<T>>(ranges);
        Collections.sort(list); // by start, then by end (see compareTo)
        Set<Range<T>> result = new HashSet<Range<T>>();
        Range<T> r = list.get(0);
        for (Range<T> range : list) {
            if (r.overlaps(range)) {
                r = r.union(range);
            } else {
                result.add(r);
                r = range;
            }
        }
        result.add(r);
        return result;
    }

    private static class Range<T extends Comparable<T>> implements Comparable<Range<T>> {
        public Range(T start, T end) {
            if (start == null || end == null)
                throw new NullPointerException("Range requires start and end.");
            this.start = start;
            this.end = end;
        }
        public T    start;
        public T    end;

        private boolean contains(T t) {
            return start.compareTo(t) <= 0 && t.compareTo(end) <= 0;
        }

        public boolean overlaps(Range<T> that) {
            return this.contains(that.start) || that.contains(this.start);
        }

        public Range<T> union(Range<T> that) {
            T start = this.start.compareTo(that.start) < 0 ? this.start : that.start;
            T end = this.end.compareTo(that.end) > 0 ? this.end : that.end;
            return new Range<T>(start, end);
        }

        public String toString() {
            return String.format("%s - %s", start, end);
        }

        public int hashCode() {
            final int prime = 31;
            int result = 1;
            result = prime * result + end.hashCode();
            result = prime * result + start.hashCode();
            return result;
        }

        @SuppressWarnings("unchecked")
        public boolean equals(Object obj) {
            if (this == obj)                    return true;
            if (obj == null)                    return false;
            if (getClass() != obj.getClass())   return false;
            Range<T> that = (Range<T>) obj;
            return end.equals(that.end) && start.equals(that.start);
        }

        public int compareTo(Range<T> that) {
            int result = this.start.compareTo(that.start);
            if (result != 0)
                return result;
            return this.end.compareTo(that.end);
        }
    }
}

Answer 6:

This seems to work and is easy to understand.

    public static IEnumerable<Range<T>> Collapse<T>(this IEnumerable<Range<T>> me, IComparer<T> comparer)
    {
        // Sort by start using the supplied comparer.
        List<Range<T>> orderedList = me.OrderBy(r => r.Start, comparer).ToList();
        List<Range<T>> newList = new List<Range<T>>();

        T max = orderedList[0].End;
        T min = orderedList[0].Start;

        foreach (var item in orderedList.Skip(1))
        {
            // The item lies entirely past the current range: emit it and start a new one.
            if (comparer.Compare(item.End, max) > 0 && comparer.Compare(item.Start, max) > 0)
            {
                newList.Add(new Range<T> { Start = min, End = max });
                min = item.Start;
            }
            max = comparer.Compare(max, item.End) > 0 ? max : item.End;
        }
        newList.Add(new Range<T> { Start = min, End = max });

        return newList;
    }

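Here is the variation I mentioned in the comments. It is basically the same thing, but with some checks added and with the results yielded one at a time instead of being collected into a list before returning.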

    public static IEnumerable<Range<T>> Collapse<T>(this IEnumerable<Range<T>> ranges, IComparer<T> comparer)
    {
        if(ranges == null || !ranges.Any())
            yield break;

        if (comparer == null)
            comparer = Comparer<T>.Default;

        var orderedList = ranges.OrderBy(r => r.Start, comparer);
        var firstRange = orderedList.First();

        T min = firstRange.Start;
        T max = firstRange.End;

        foreach (var current in orderedList.Skip(1))
        {
            if (comparer.Compare(current.End, max) > 0 && comparer.Compare(current.Start, max) > 0)
            {
                // Built with an object initializer, as in the first version.
                yield return new Range<T> { Start = min, End = max };
                min = current.Start;
            }
            max = comparer.Compare(max, current.End) > 0 ? max : current.End;
        }
        yield return new Range<T> { Start = min, End = max };
    }

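The yielding variant maps naturally onto a generator. The sketch below is a loose Python analogue rather than a translation; `collapse_lazy` is a hypothetical name, ranges are `(start, end)` tuples, and it assumes start <= end for every range, which is what allows a single `start > high` test to detect a gap.

```python
def collapse_lazy(ranges):
    """Yield merged (start, end) tuples one at a time, in ascending order."""
    ordered = sorted(ranges)
    if not ordered:
        return  # nothing to yield for empty input
    low, high = ordered[0]
    for start, end in ordered[1:]:
        if start > high:
            # Gap found: the range collected so far is complete.
            yield (low, high)
            low = start
        high = max(high, end)
    yield (low, high)


merged = collapse_lazy([(11, 15), (3, 9), (12, 14), (13, 20), (1, 5)])
print(next(merged))  # (1, 9)
print(list(merged))  # [(11, 20)]
```

Sorting still has to consume the whole input up front, so only the output side is lazy here.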
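Comments:

You should check for an empty list; other than that, nice approach.

Yes, I went with a slight variation of this solution. Thanks =)

One simplification for the if statement inside the foreach: you only need to check whether comparer.Compare(item.Start, max) > 0, since item.End will then be greater as well... Of course, this simplification should only be used if ranges are always positive (item.Start < item.End).

Answer 7: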

This could probably be optimized...

using System.Collections.Generic;
using System.Linq;
using System;
static class Range
{
    public static Range<T> Create<T>(T start, T end)
    {
        return new Range<T>(start, end);
    }
    public static IEnumerable<Range<T>> Normalize<T>(
        this IEnumerable<Range<T>> ranges)
    {
        return Normalize<T>(ranges, null);
    }
    public static IEnumerable<Range<T>> Normalize<T>(
        this IEnumerable<Range<T>> ranges, IComparer<T> comparer)
    {
        var list = ranges.ToList();
        if (comparer == null) comparer = Comparer<T>.Default;
        for (int i = list.Count - 1; i >= 0; i--)
        {
            var item = list[i];

            for (int j = 0; j < i; j++)
            {
                Range<T>? newValue = TryMerge<T>(comparer, item, list[j]);

                // did we find a useful transformation?
                if (newValue != null)
                {
                    list[j] = newValue.GetValueOrDefault();
                    list.RemoveAt(i);
                    break;
                }
            }
        }
        list.Sort((x, y) =>
        {
            int t = comparer.Compare(x.Start, y.Start);
            if (t == 0) t = comparer.Compare(x.End, y.End);
            return t;
        });
        return list.AsEnumerable();
    }

    private static Range<T>? TryMerge<T>(IComparer<T> comparer, Range<T> item, Range<T> other)
    {
        if (comparer.Compare(other.End, item.Start) == 0)
        { // adjacent ranges
            return new Range<T>(other.Start, item.End);
        }
        if (comparer.Compare(item.End, other.Start) == 0)
        { // adjacent ranges
            return new Range<T>(item.Start, other.End);
        }
        if (comparer.Compare(item.Start, other.Start) <= 0
            && comparer.Compare(item.End, other.End) >= 0)
        { // item fully swallows other
            return item;
        }
        if (comparer.Compare(other.Start, item.Start) <= 0
            && comparer.Compare(other.End, item.End) >= 0)
        { // other fully swallows item
            return other;
        }
        if (comparer.Compare(item.Start, other.Start) <= 0
            && comparer.Compare(item.End, other.Start) >= 0
            && comparer.Compare(item.End, other.End) <= 0)
        { // partial overlap
            return new Range<T>(item.Start, other.End);
        }
        if (comparer.Compare(other.Start, item.Start) <= 0
            && comparer.Compare(other.End, item.Start) >= 0
            && comparer.Compare(other.End, item.End) <= 0)
        { // partial overlap
            return new Range<T>(other.Start, item.End);
        }
        return null;
    }
}
public struct Range<T>
{
    private readonly T start, end;
    public T Start { get { return start; } }
    public T End { get { return end; } }
    public Range(T start, T end)
    {
        this.start = start;
        this.end = end;
    }
    public override string ToString()
    {
        return start + " to " + end;
    }
}

static class Program
{
    static void Main()
    {
        var data = new[]
        {
            Range.Create(1,5), Range.Create(3,9),
            Range.Create(11,15), Range.Create(12,14),
            Range.Create(13,20)
        };
        var result = data.Normalize();
        foreach (var item in result)
        {
            Console.WriteLine(item);
        }
    }
}

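The "adjacent ranges" cases above only fire when two ranges share an endpoint. For discrete values such as integers or whole days, ranges like (1, 5) and (6, 9) leave no value uncovered between them, so one may want to merge those as well. The Python sketch below shows one possible way to do that; the function name and the `step` parameter (the distance between neighbouring values) are assumptions introduced for illustration, not part of the C# code.

```python
from datetime import date, timedelta


def collapse_discrete(ranges, step=1):
    """Merge (start, end) tuples that overlap or are exactly `step` apart."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + step:
            # Overlapping, touching, or directly adjacent: extend the last range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(collapse_discrete([(1, 5), (6, 9)]))  # [(1, 9)]
print(collapse_discrete([(1, 5), (7, 9)]))  # [(1, 5), (7, 9)]

days = [(date(2010, 11, 1), date(2010, 11, 5)),
        (date(2010, 11, 6), date(2010, 11, 9))]
print(collapse_discrete(days, step=timedelta(days=1))
      == [(date(2010, 11, 1), date(2010, 11, 9))])  # True
```

For continuous values there is a real gap between 5 and 6, so this kind of merging only makes sense when the caller can state what "next to each other" means for the type in question.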
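Comments:

@Mitch - yes, I would probably refactor it into a TryMerge method, i.e. if (TryMerge(other, item, out result)) { list[j] = result; list.RemoveAt(i); }

This one seems to work well. Do you have a clever way to also merge ranges that sit right next to each other? So if you have (1,5) and (6,9) you would get (1,9)? Of course, that might get a bit complicated with dates... Maybe one could go through the list afterwards and check whether any end points are neighbours, or something like that... And what happened to the TryMerge method?

(1,5) (6,9) => (1,9) would need special handling, since most values are continuous - i.e. there is a definite gap between 5 and 6. As for TryMerge - that was just a lot of churn in the code above; actually, there is a better way to simplify it - will update...

Answer 8: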

The idea of collapsing a list just screamed "reduce" to me. It didn't turn out quite as elegant as I had hoped, though.

from functools import reduce  # reduce is not a builtin in Python 3


def collapse(output, next_range):
    last_start, last_end = output[-1]
    next_start, next_end = next_range
    if next_start <= last_end:
        # Overlaps the last collected range: extend it in place.
        output[-1] = (last_start, max(next_end, last_end))
    else:
        output.append(next_range)
    return output

ranges = [
  (11, 15),
  (3, 9),
  (12, 14),
  (13, 20),
  (1, 5)]

ranges.sort()
result = [ranges.pop(0)]
reduce(collapse, ranges, result)

print(result)

Thanks to yairchu for typing in the data so I could cut and paste it :)

Answer 9:

A non-verbose solution in Python:

ranges = [
  (11, 15),
  (3, 9),
  (12, 14),
  (13, 20),
  (1, 5)]

result = []
cur = None
for start, stop in sorted(ranges): # sorts by start
  if cur is None:
    cur = (start, stop)
    continue
  cStart, cStop = cur
  if start <= cStop:
    cur = (cStart, max(stop, cStop))
  else:
    result.append(cur)
    cur = (start, stop)
result.append(cur)

print(result)

Answer 10:

static void Main(string[] args) {
    List<Range<int>> ranges = new List<Range<int>>()
    {
        new Range<int>(3,9),
        new Range<int>(1,5),
        new Range<int>(11,15),
        new Range<int>(12,14),
        new Range<int>(13,20),
    };

    var orderedRanges = ranges.OrderBy(r => r.Start);
    var lastRange = new Range<int>(orderedRanges.First().Start, orderedRanges.First().End);

    List<Range<int>> newranges = new List<Range<int>>();
    newranges.Add(lastRange);

    foreach (var range in orderedRanges.Skip(1)) {
        if (range.Start >= lastRange.Start && range.Start <= lastRange.End && range.End > lastRange.End) {
            // Overlaps the last range and reaches further: extend it.
            lastRange.End = range.End;
        }
        else if (range.Start > lastRange.End) {
            // Starts past the last range: begin a new one.
            lastRange = new Range<int>(range.Start, range.End);
            newranges.Add(lastRange);
        }
    }

    foreach (var r in newranges) {
        Console.WriteLine("{0}, {1}", r.Start, r.End);
    }
}

Something like that. I haven't verified that it works for all inputs.
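One way to gain confidence across many inputs is a randomized check against properties any correct result should satisfy. The Python sketch below exercises its own small `collapse` function rather than the C# above, so it only illustrates the approach; `covered_points` and `check` are hypothetical helpers, and the test data is limited to closed integer ranges.

```python
import random


def collapse(ranges):
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_points(ranges):
    """Set of integers covered by closed integer ranges."""
    return {p for start, end in ranges for p in range(start, end + 1)}


def check(trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        ranges = []
        for _ in range(rng.randint(1, 8)):
            start = rng.randint(0, 30)
            ranges.append((start, start + rng.randint(0, 10)))
        result = collapse(ranges)
        # The result must cover exactly the same points as the input...
        assert covered_points(result) == covered_points(ranges)
        # ...and its ranges must be ascending with no overlap between neighbours.
        for (_, prev_end), (next_start, _) in zip(result, result[1:]):
            assert prev_end < next_start
    return True


print(check())  # True
```

The same two properties (identical coverage, no overlapping neighbours) could be asserted against any of the implementations in this thread.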
