Sorting a container multiple times: what container and what approach to use

Posted 2023-02-21


【Title】Sorting a container multiple times, what container and what approach to use 【Posted】2017-07-01 15:16:31 【Question】:

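I have some data that I need to print. For simplicity, let's say it is a container (vector) of people with some parameters. In different parts of my program I need to print all of them sorted by a different parameter. My questions are:

1.) Which container should I choose? (Personally, I chose vector.)

2.) Which approach is better: sorting the whole vector every time, or making copies of the vector and keeping each copy sorted? In my solution I sort the same vector every time, but for huge vectors this may not be the right approach because of speed.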

#include <iostream>
#include <string>

class Person
{
private:
    std::string name;
    std::string surname;
    int age;
public:
    Person(std::string name, std::string surname, int age) : name{name}, surname{surname}, age{age} {}
    void print() const { std::cout << name << " " << surname << " " << age << std::endl; }
    // Comparison functions, one per sort criterion
    static bool sortName(Person const &A, Person const &B) { return A.name < B.name; }
    static bool sortSurname(Person const &A, Person const &B) { return A.surname < B.surname; }
    static bool sortAge(Person const &A, Person const &B) { return A.age < B.age; }
};

Main:

#include <algorithm>
#include <cstddef>
#include <vector>

int main()
{
    std::vector<Person> persons;
    Person person1("John", "Smith", 30);
    Person person2("Mark", "Cooper", 28);
    Person person3("George", "Orwell", 19);

    persons.push_back(person1);
    persons.push_back(person2);
    persons.push_back(person3);

    // Re-sort the same vector in place before each printout
    std::sort(persons.begin(), persons.end(), Person::sortSurname);
    for (std::size_t i = 0; i < persons.size(); ++i)
    {
        persons[i].print();
    }

    // do some other stuff here ... and then ...
    std::sort(persons.begin(), persons.end(), Person::sortName);
    for (std::size_t i = 0; i < persons.size(); ++i)
    {
        persons[i].print();
    }

    // do some other stuff here ... and then ...
    std::sort(persons.begin(), persons.end(), Person::sortAge);
    for (std::size_t i = 0; i < persons.size(); ++i)
    {
        persons[i].print();
    }

    return 0;
}
【Comments】:

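You need to quantify "huge": sorting a vector of fewer than 10,000 elements versus one of more than 10,000,000 may call for different approaches. Measure in between the two. The size of the vector's elements will also affect the choice, I think...

【Answer 1】: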

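boost::multi_index_container lets you define a container of any type with any number of different indexes, or views.

The container updates the indexes automatically on insertion and removal.

It is a large template library and takes a little getting used to, but the documentation is good and there are lots of examples.

Here is an implementation expressed that way: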

#include <iostream>
#include <string>
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/mem_fun.hpp>

class Person {
private:
    std::string name;
    std::string surname;
    int age;
public:
    Person(std::string name, std::string surname, int age) : name{name}, surname{surname}, age{age} {}

    auto get_name() const -> const std::string& { return name; }
    auto get_surname() const -> const std::string& { return surname; }
    auto get_age() const -> int { return age; }

    void print() const { std::cout << name << " " << surname << " " << age << std::endl; }
};

namespace bmi = boost::multi_index;

// Empty tag types that name the three indexes
struct by_name {};
struct by_surname {};
struct by_age {};

// One container, three ordered (non-unique) views over the same elements
using PersonTable = boost::multi_index_container<Person,
        bmi::indexed_by<
                bmi::ordered_non_unique<bmi::tag<by_name>, bmi::const_mem_fun<Person,std::string const&,&Person::get_name>>,
                bmi::ordered_non_unique<bmi::tag<by_surname>, bmi::const_mem_fun<Person,std::string const&,&Person::get_surname>>,
                bmi::ordered_non_unique<bmi::tag<by_age>, bmi::const_mem_fun<Person,int,&Person::get_age>>
        >
>;

int main()
{
    PersonTable people;
    people.insert(Person("John", "Smith", 30));
    people.insert(Person("Mark", "Cooper", 28));
    people.insert(Person("George", "Orwell", 19));

    std::cout << "by name" << std::endl;
    for (auto&& person : people.get<by_name>())
    {
        person.print();
    }
    std::cout << "\nby surname" << std::endl;
    for (auto&& person : people.get<by_surname>())
    {
        person.print();
    }
    std::cout << "\nby age" << std::endl;
    for (auto&& person : people.get<by_age>())
    {
        person.print();
    }
}
Expected output:

by name
George Orwell 19
John Smith 30
Mark Cooper 28

by surname
Mark Cooper 28
George Orwell 19
John Smith 30

by age
George Orwell 19
Mark Cooper 28
John Smith 30

Documentation here: http://www.boost.org/doc/libs/1_64_0/libs/multi_index/doc/index.html


【Answer 2】:

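Consider replacing your vector of Person with a vector of pointers to Person. That way, two Persons can be swapped cheaply by just swapping the pointers. Then use the comparison functors defined in the class to put the pointers in the desired order, and print.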


【Answer 3】:

I would use 3 std::set instances holding std::shared_ptr<Person>, each ordered by the corresponding field of Person:

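#include <memory>
#include <set>

int main()
{
    // The lambdas below read name/surname/age directly, which assumes those members are
    // accessible here; in the question's Person they are private (add getters or make them public).
    std::shared_ptr<Person> person1 = std::make_shared<Person>("John", "Smith", 30);
    std::shared_ptr<Person> person2 = std::make_shared<Person>("Mark", "Cooper", 28);
    std::shared_ptr<Person> person3 = std::make_shared<Person>("George", "Orwell", 19);

    auto byName = [](const std::shared_ptr<Person>& a, const std::shared_ptr<Person>& b) {
        return a->name < b->name;
    };
    auto bySurname = [](const std::shared_ptr<Person>& a, const std::shared_ptr<Person>& b) {
        return a->surname < b->surname;
    };
    auto byAge = [](const std::shared_ptr<Person>& a, const std::shared_ptr<Person>& b) {
        return a->age < b->age;
    };

    // The comparator's type must be part of the set's type: a plain
    // std::set<std::shared_ptr<Person>> uses std::less and cannot be given a lambda.
    // Note: people whose keys compare equal count as duplicates in a std::set
    // (std::multiset keeps them).
    std::set<std::shared_ptr<Person>, decltype(byName)> persons1(byName);
    std::set<std::shared_ptr<Person>, decltype(bySurname)> persons2(bySurname);
    std::set<std::shared_ptr<Person>, decltype(byAge)> persons3(byAge);

    persons1.insert(person1);
    persons1.insert(person2);
    persons1.insert(person3);

    persons2.insert(person1);
    persons2.insert(person2);
    persons2.insert(person3);

    persons3.insert(person1);
    persons3.insert(person2);
    persons3.insert(person3);

    return 0;
}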
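With std::shared_ptr you don't waste memory on copies when storing the objects in multiple containers: each Person exists once, and every set holds only a pointer to it. std::set is an already sorted container, so you don't have to sort it each time you use it; just enumerate the elements from beginning to end.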


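【Answer 4】:

IMO, the approach you are using now, i.e. sorting at runtime when needed, is fine. For larger data sets you first need to evaluate your memory and processing-power requirements. For example, a very large data set may not fit in memory at all, so you could not sort it there. And if you decide on a multithreaded solution, synchronization issues arise. In such cases you need a specialized solution such as a DBMS, where you can query sorted data at runtime as needed, with features such as indexes to optimize query time.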


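【Answer 5】:

It mainly depends on 3 factors:
1. the data size
2. what kind of performance you are looking for
3. how much space (memory) you can trade off for #2

In general, std::sort() has N log N average performance:

Complexity: On average, linearithmic in the distance between first and last: performs approximately N*log2(N) (where N is this distance) comparisons of elements, and up to that many element swaps (or moves).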
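To put the N*log2(N) figure in perspective for the two sizes mentioned in the question's comments (10,000 and 10,000,000 elements), here is a rough worked estimate of the comparisons per full sort, ignoring constant factors:

```latex
\begin{aligned}
N = 10^{4}: &\quad N \log_2 N \approx 10^{4} \times 13.29 \approx 1.3 \times 10^{5} \\
N = 10^{7}: &\quad N \log_2 N \approx 10^{7} \times 23.25 \approx 2.3 \times 10^{8} \\
\text{ratio:} &\quad \frac{10^{7} \cdot 7\log_2 10}{10^{4} \cdot 4\log_2 10} = 10^{3} \times \tfrac{7}{4} = 1750
\end{aligned}
```

So a full re-sort of 10 million elements costs roughly 1,750 times the comparisons of a 10,000-element sort, before counting the cost of moving large elements around.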

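Now, if your use case calls the sort very frequently, it may make sense to pre-sort and save the vectors; in that case you can get a considerable performance gain. With this design, though, you have to consider whether the collection is modifiable. If it is, how often is it modified? Then you have to consider the impact on average-case insertion performance.

In short, it depends.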


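【Answer 6】:

Instead of sorting the vector of objects (which is rather expensive for complex objects with many fields), you should build several index vectors for the objects stored in the main vector and sort them by the various criteria.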

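#include <algorithm>
#include <cstddef>
#include <vector>
...

::std::vector< Person > persons;
//  add persons...

// One index per element of persons. The vector must be sized, not merely
// reserved, otherwise generate() below has no elements to fill.
::std::vector< ::std::size_t > sorted_indexes(persons.size());
{
    // Fill with 0, 1, 2, ... (::std::iota from <numeric> does the same).
    ::std::size_t index{0};
    ::std::generate
    (
        sorted_indexes.begin()
    ,   sorted_indexes.end()
    ,   [&index]{ return index++; }
    );
}

// Sort the indexes by comparing the persons they refer to.
::std::sort
(
    sorted_indexes.begin()
,   sorted_indexes.end()
,   [&persons](::std::size_t const left, ::std::size_t const right)
    {
        return Person::sortSurname(persons[left], persons[right]);
    }
);
for(auto person_index: sorted_indexes)
{
    persons[person_index].print();
}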
sortSurname should also take const references to avoid copying:

static bool sortSurname(Person const & left, Person const & right) { return left.surname < right.surname; }

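【Comments】:

What's with all the redundant ::s in ::std? Isn't this just a more complicated version of the solution I proposed with std::reference_wrapper?

@JesperJuhl The :: in ::std seems redundant only until you run into some other std namespace in a non-global scope and spend days figuring out why things don't work the way they should. The reference-wrapper solution uses basically the same approach, but storing indexes has the advantage that you can add more items to the main vector, which would invalidate all references. In this example that doesn't really matter, though, because the original vector does not change.

@VTT, yes, you're right about the const reference... I edited it.

【Answer 7】: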

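If the vector is small or the elements are cheap to copy, you can simply re-sort whenever needed without any problem.

If the elements of the vector are large and expensive to copy, you can sort the vector once in one of the ways you need, and then create a second vector of std::reference_wrappers sorted in a different way. This gives you a second "view" of the original vector that neither modifies the original nor copies the elements into the second vector.

As for the choice of container: use std::vector unless you specifically need some special property of one of the other containers.

In any case, benchmark the different solutions (with an optimized build) and measure their performance before settling on one.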
