Sorting a multiset as ascending subsequences with one occurrence of each available element

Posted 2023-02-23


Question (posted 2013-10-31 18:45:45):

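Suppose we are given a multiset, for example

A = {1, 1, 1, 2, 2, 3, 3, 3}.

What is the simplest way to order the elements like this:

(1, 2, 3, 1, 2, 3, 1, 3),

i.e., a sequence made of ascending subsequences, each built from the elements of the set that are still available?

How can this be done in C++ and in Python? Is there a library for it? How would you do it "by hand"?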
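One compact way to pin down the requested order (an illustrative sketch, not part of the original question; it assumes the elements are numbers, so they are mutually comparable): the k-th copy of every value belongs to the k-th ascending run, so the target sequence is simply the elements sorted by the key (copy index, value). `ascending_runs_order` is a hypothetical helper name.

```python
def ascending_runs_order(values):
    seen = {}    # value -> how many copies of it have been keyed so far
    keyed = []
    for v in sorted(values):
        k = seen.get(v, 0)      # this is copy number k of v, so it goes in run k
        seen[v] = k + 1
        keyed.append((k, v))
    keyed.sort()                # order by run index first, then by value within a run
    return [v for _, v in keyed]

print(ascending_runs_order([1, 1, 1, 2, 2, 3, 3, 3]))  # [1, 2, 3, 1, 2, 3, 1, 3]
```

This also answers the B = {2, 2} case raised in the comments: both copies get different run indices, so the result is (2, 2).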

Comments on the question:

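Will this always be sorted numerically, or will your multiset occasionally contain non-numbers? Likewise, do you always have a fixed count of each number, or can the counts vary? Finally, what have you tried so far?

I need it because I think it will be useful for an exercise. It is not from any school. Only numbers.

I think you need to specify more constraints. E.g., any sequence can easily be split into ascending subsequences: just make every subsequence length 1. But presumably that's not what you want? Also, what is the answer for the multiset B = {2, 2}?

I want to use as many elements as possible in each subsequence. For {2, 2} it is just (2, 2).

If you can use a second container, it's not hard to rewrite the elements of the set into the second container in the order you specified.

Answer 1: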

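You can implement it as a counting sort: first count how many times each element occurs, using the element itself as the index into an array that stores the number of occurrences of each value. Then sweep through that array repeatedly, outputting every index whose count is still non-zero and decrementing that count, until every count is zero.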

This may not be the best (or most efficient) way to implement it, but it's the first solution that came to mind.
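A minimal Python sketch of this counting-sort idea, assuming the elements are non-negative integers no larger than a known bound (`round_robin_counting` and `max_value` are hypothetical names, not from the answer):

```python
def round_robin_counting(values, max_value):
    # counts[v] = number of copies of v; assumes 0 <= v <= max_value for every v
    counts = [0] * (max_value + 1)
    for v in values:
        counts[v] += 1

    result = []
    remaining = len(values)
    while remaining:
        # one ascending pass: emit every value that still has a copy left
        for v in range(max_value + 1):
            if counts[v]:
                result.append(v)
                counts[v] -= 1
                remaining -= 1
    return result

print(round_robin_counting([1, 1, 1, 2, 2, 3, 3, 3], 3))  # [1, 2, 3, 1, 2, 3, 1, 3]
```

Each pass scans the whole count array, so the cost is roughly (number of passes) × (value range) on top of the initial count; that is fine when the value range is small.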

Comments:

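Python seems to have a Counter class that does the counting for you. (Note that it doesn't seem to keep the keys sorted the way you might expect.)

@AaronMcDaid I was thinking more along the lines of C++, but cool :) That could definitely be used in this implementation.

Answer 2: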

Assuming you are willing to modify the original multiset (or work on a copy of it), do something like this:

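#include <iostream>
#include <set>
int main() {
    std::multiset<int> data = {1, 1, 1, 2, 2, 3, 3, 3};
    while (!data.empty()) {
        auto x = data.begin();
        while (x != data.end()) {
            auto value = *x;
            std::cout << value << '\n';
            data.erase(x);                // erase *one* element, not every copy of value
            x = data.upper_bound(value);  // find the next *different* value
        }
    }
}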

This isn't very efficient. If you have a huge data set, you may need to think about what your constraints are (memory or time?).
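For comparison, here is a Python transcription of the same erase-one-then-jump loop. It is a sketch, not part of the answer: Python has no built-in multiset, so it assumes a sorted list with `bisect.bisect_right` standing in for `std::multiset::upper_bound`, and `passes_by_erasing` is a hypothetical name. `list.pop` is O(n), so this only illustrates the logic.

```python
from bisect import bisect_right

def passes_by_erasing(values):
    data = sorted(values)   # work on a sorted copy; the caller's data is untouched
    out = []
    while data:
        i = 0
        while i < len(data):
            value = data.pop(i)            # remove *one* occurrence
            out.append(value)
            i = bisect_right(data, value)  # index of the next *different* value
    return out

print(passes_by_erasing([1, 1, 1, 2, 2, 3, 3, 3]))  # [1, 2, 3, 1, 2, 3, 1, 3]
```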


Answer 3:

In Python, you can use groupby to get a matrix of groups of equal items from a sorted list:

from itertools import groupby, zip_longest

A = [1, 1, 1, 2, 2, 3, 3, 3]

groups = []
for k, g in groupby(sorted(A)):
    groups.append(list(g))

print(groups)
# [[1, 1, 1], [2, 2], [3, 3, 3]]

More concisely, you can do the same thing with a list comprehension:

groups = [list(g) for _, g in groupby(sorted(A))]
# [[1, 1, 1], [2, 2], [3, 3, 3]]

Alternatively, you can use Counter, Python's version of a multiset, and sort its keys to get the same nested list:

from collections import Counter
c = Counter(A)
groups = [[k] * c[k] for k in sorted(c)]
# [[1, 1, 1], [2, 2], [3, 3, 3]]

Once you have the nested list groups, use zip_longest (called izip_longest in Python 2) to transpose the matrix, then flatten the result and drop the None padding values:

print([e for t in zip_longest(*groups) for e in t if e is not None])

This prints

[1, 2, 3, 1, 2, 3, 1, 3]
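To see why the transpose works, here is the intermediate result for the example. The sentinel variant at the end is a suggested tweak (not part of the answer) for data that might itself contain None; `_PAD` is a hypothetical name.

```python
from itertools import groupby, zip_longest

A = [1, 1, 1, 2, 2, 3, 3, 3]
groups = [list(g) for _, g in groupby(sorted(A))]

# Row k of the transpose holds the k-th copy of each value;
# shorter groups are padded with the fill value (None by default).
rows = list(zip_longest(*groups))
print(rows)  # [(1, 2, 3), (1, 2, 3), (1, None, 3)]

# A private sentinel object cannot collide with real data the way None could.
_PAD = object()
flat = [e for row in zip_longest(*groups, fillvalue=_PAD)
        for e in row if e is not _PAD]
print(flat)  # [1, 2, 3, 1, 2, 3, 1, 3]
```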

Comments:

Why not build groups directly with a comprehension? [list(g) for _, g in groupby(sorted(A))].

Answer 4:

Here is how to do it by hand in Python, without importing any libraries:

A = (1, 1, 1, 2, 2, 3, 3, 3)

# sorted list of the unique elements of A
a = sorted(set(A))

# how many times each unique element occurs in A
countList = [A.count(elem) for elem in a]

outList = []
while a:
    # every remaining element still has at least minEntry copies,
    # so the ascending run `a` can be repeated minEntry times
    minEntry = min(countList)
    outList += a * minEntry

    # keep only the elements that still have copies left over
    leftover = [(elem, n - minEntry) for elem, n in zip(a, countList) if n > minEntry]
    a = [elem for elem, _ in leftover]
    countList = [n for _, n in leftover]

print(outList)

This is the output:

[1, 2, 3, 1, 2, 3, 1, 3]


Answer 5:

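If you can use a second sequential container, then in C++ you can simply move the elements of the original container into the second one using the standard algorithms std::unique_copy and std::set_difference.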
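One reading of this idea, sketched in Python for illustration: repeatedly copy out the distinct values of the sorted remainder (what std::unique_copy does), then subtract that run from the remainder with multiplicity (what std::set_difference does on sorted ranges, removing one matching copy per element). The helpers `unique_copy`, `multiset_difference`, and `passes_via_difference` are hypothetical hand-written stand-ins, not real library functions.

```python
def unique_copy(sorted_vals):
    # keep the first element of each run of equal values, like std::unique_copy
    out = []
    for v in sorted_vals:
        if not out or out[-1] != v:
            out.append(v)
    return out

def multiset_difference(a, b):
    # sorted-range difference with multiplicity, like std::set_difference:
    # each element of b cancels at most one equal element of a
    out, j = [], 0
    for v in a:
        while j < len(b) and b[j] < v:
            j += 1
        if j < len(b) and b[j] == v:
            j += 1          # this copy of v is cancelled by b[j]
        else:
            out.append(v)
    return out

def passes_via_difference(values):
    rest = sorted(values)
    result = []
    while rest:
        layer = unique_copy(rest)      # one ascending run of the available values
        result.extend(layer)
        rest = multiset_difference(rest, layer)
    return result

print(passes_via_difference([1, 1, 1, 2, 2, 3, 3, 3]))  # [1, 2, 3, 1, 2, 3, 1, 3]
```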


Answer 6:

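def Test(seq):
    Seq = list(seq)    # work on a copy so the caller's list is not modified
    newlist = []
    while Seq:
        # one ascending pass: each value that is still available, in sorted order
        # (a bare set does not guarantee sorted iteration order)
        newlist.append(sorted(set(Seq)))
        # remove one occurrence of each value used in this pass
        for Del in newlist[-1]:
            Seq.remove(Del)
    return [y for x in newlist for y in x]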


Answer 7:

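In C++, instead of modifying the data structure, you can prepare a list of iterators to the beginnings of the equal ranges, and then dereference/increment those iterators in turn: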

#include <iostream>
#include <list>
#include <set>
#include <utility>

int main()
{
    std::multiset<int> A = {1, 1, 1, 2, 2, 3, 3, 3};

    // build a list of iterator pairs, one [first, second) pair per equal range
    std::list< std::pair<std::multiset<int>::iterator,
                         std::multiset<int>::iterator> > iters;
    for (auto it = A.begin(); it != A.end(); it = A.upper_bound(*it))
        iters.push_back(A.equal_range(*it));

    // for each non-empty subrange, print what its first iterator points to,
    // then advance that iterator by one position within its subrange;
    // if the subrange is empty, drop it from the list
    while (!iters.empty()) {
        for (auto it = iters.begin(); it != iters.end(); ) {
            if (it->first != it->second) {
                std::cout << *it->first << ' ';
                ++it->first;   // consume one copy of this value
                ++it;          // move on to the next distinct value
            } else {
                it = iters.erase(it);
            }
        }
    }
    std::cout << '\n';
}
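The same round-robin over per-value ranges can be sketched in Python (an illustrative translation, not part of the answer; `round_robin_over_groups` and `_DONE` are hypothetical names). Each run of equal values gets its own iterator, playing the role of an equal_range pair, and exhausted runs are dropped the way `iters.erase(it)` drops empty subranges.

```python
from itertools import groupby

_DONE = object()  # sentinel returned by next() when a run is exhausted

def round_robin_over_groups(values):
    # list(g) is needed: groupby's sub-iterators stop being valid
    # once the groupby object itself advances to the next group
    iters = [iter(list(g)) for _, g in groupby(sorted(values))]
    out = []
    while iters:
        alive = []
        for it in iters:
            v = next(it, _DONE)
            if v is not _DONE:
                out.append(v)       # consume one copy of this value
                alive.append(it)    # this run may still have copies left
            # an exhausted run is simply not carried into the next pass
        iters = alive
    return out

print(round_robin_over_groups([1, 1, 1, 2, 2, 3, 3, 3]))  # [1, 2, 3, 1, 2, 3, 1, 3]
```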

