How to group consecutive integers in an array?

Posted 2023-02-22


Asked: 2021-01-31 07:10:28

Question:

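How can I split a list of integers into sublists and then return a std::map<int, string> that maps each int to the string obtained by joining its sublist?

Each sublist must consist of consecutive values, increasing monotonically.

Example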

input:
1,2,3, 6,7,8,9, 12, 14,15

output:
1  -> "1-2-3"
2  -> "1-2-3"
3  -> "1-2-3"
6  -> "6-7-8-9"
7  -> "6-7-8-9"
8  -> "6-7-8-9"
9  -> "6-7-8-9"
12 -> "12"
14 -> "14-15"
15 -> "14-15"

I tried the following code and got it working. Thanks, everyone, for the ideas:

#include <iostream>
#include <map>
#include <string>
using namespace std;

// Group consecutive runs of lis[0..num) and map every value to its run, e.g. 6 -> "6-7-8-9"
void split(int* lis, int num, map<int, string> &dict)
{
    if (num <= 0)
        return;                         // nothing to group (also avoids reading lis[0])

    int start = 0, end = 0;
    while (true)
    {
        // Extend the run beginning at `start` as long as the values stay consecutive
        string str = to_string(lis[start]);
        for (int j = start + 1; j < num; j++)
        {
            if (lis[j] - 1 == lis[j - 1])
            {
                end = j;
                str = str + "-" + to_string(lis[j]);
            }
            else
                break;
        }
        // Every value in lis[start..end] maps to the same joined string
        for (int j = start; j <= end; j++)
            dict[lis[j]] = str;
        start = end = end + 1;
        if (end == num)
            return;
    }
}

int main(void)
{
    int lis[10] = { 1,3,5,6,7,8,11,12,13,19 };
    map<int, string> dict;
    split(lis, 10, dict);
    for (int i = 0; i < 10; i++)
        cout << lis[i] << "\t" << dict[lis[i]] << '\n';
    return 0;
}

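Comments:

What exactly do you want to split on? Build the sequence of adjacent differences, i.e. 1,1,3,1,1,1,3,2,1. From that, determine what each cluster looks like. Good luck.

Please show any attempt you have made to solve this problem, and explain why you are stuck. Homework requests won't get much response unless you can show that you have put in some effort.

Use two indexes: sequence start and sequence end. First set the sequence-start index to 0 (the first index in the "list"), then walk through the list with the sequence-end index. If the difference between the current element (indicated by the sequence-end index) and the next element is greater than 1, you have a complete sequence, delimited by the start and end indexes. Store it in some suitable way, set both indexes to the next element, and continue.

@Some programmer dude Thanks for a good idea!

Answer 1: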

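This is a good example of using STL containers and algorithms.

We can start with a std::vector of arbitrary values. By putting them into a std::set, we sort them and remove duplicates in one step.

Then we iterate over that data and use std::adjacent_find to find the first value that is not equal to the previous value + 1. We do this in a loop until we reach the end of the data.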

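Each time we find the end of such a sequence, we build the requested string and store it in a std::map, once for every source value in that sequence.

We then set the start iterator to the current end iterator and continue searching.

Finally, we show the result to the user.

There are of course many possible solutions. Please see one example below: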

#include <iostream>
#include <sstream>
#include <vector>
#include <set>
#include <iterator>
#include <algorithm>
#include <string>
#include <map>
#include <iomanip>

int main() 
{
    // Some test data. Not sorted, with duplicates
    std::vector testData{ 15,14,12,14,3,3,12,1,7,8,7,8,9,2,1,1,6,6,9,15 };

    // Sort and remove duplicates. Maybe you want this, maybe not; up to you. If not, remove this line and work with the original data
    std::set data(testData.begin(), testData.end());

    // Here we will store the result
    std::map<int, std::string> result;

    // We will start the evaluation at the beginning of our data
    auto startOfSequence = data.begin();

    // Find all sequences
    while (startOfSequence != data.end()) 
    {
        // Find the first adjacent pair whose second value is not the first value + 1,
        // then step one past it to get the end of the current sequence
        auto endOfSequence = std::adjacent_find(startOfSequence, data.end(), [](const auto& v1, const auto& v2) { return v2 != v1 + 1; });
        if (endOfSequence != data.end()) std::advance(endOfSequence, 1);

        // Build resulting string
        std::ostringstream oss;
        bool writeDash = false;
        for (auto it = startOfSequence; it != endOfSequence; ++it) {
            oss << (writeDash ? "-" : "") << std::to_string(*it);
            writeDash = true;
        }

        // Copy result to map
        for (auto it = startOfSequence; it != endOfSequence; ++it)
            result[*it] = oss.str();

        // Continue to search for the next sequence
        startOfSequence = endOfSequence;
    }
    // Show result on the screen. Or use the map in whichever way you want.
    for (const auto& [value, text] : result) std::cout << std::left << std::setw(2) << value << " -> \"" << text << "\"\n";

    return 0;
}

I am using CTAD, so you must compile with C++17 enabled.

Developed, compiled and tested with Microsoft Visual Studio Community 2019 version 16.8.2.

Additionally compiled and tested with gcc 10.2 and clang 11.0.1.

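EDIT

In the meantime, the OP posted their own code, so I adapted my function to that style.

But in C++ we should never use C-style arrays, and especially never using namespace std;

Anyway, please see the solution below.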

#include <map>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>
#include <iostream>


void split(int* lis, int num, std::map<int, std::string>& dict) 
{
    // We will start the evaluation at the beginning of our data
    auto startOfSequence = lis;
    auto endOfList = lis + num;

    // Find all sequences
    while (startOfSequence != endOfList) 
    {
        // Find the first adjacent pair whose second value is not the first value + 1,
        // then step one past it to get the end of the current sequence
        auto endOfSequence = std::adjacent_find(startOfSequence, endOfList,
            [](const auto& v1, const auto& v2) { return v2 != v1 + 1; });
        if (endOfSequence != endOfList) std::advance(endOfSequence, 1);

        // Build resulting string
        std::ostringstream oss;
        bool writeDash = false;
        for (auto it = startOfSequence; it != endOfSequence; ++it) {
            oss << (writeDash ? "-" : "") << std::to_string(*it);
            writeDash = true;
        }

        // Copy result to map
        for (auto it = startOfSequence; it != endOfSequence; ++it)
            dict[*it] = oss.str();

        // Continue to search for the next sequence
        startOfSequence = endOfSequence;
    }
}

int main() 
{
    int lis[10] = { 1,3,5,6,7,8,11,12,13,19 };
    std::map<int, std::string> dict;
    split(lis, 10, dict);
    for (int i = 0; i < 10; i++)
        std::cout << lis[i] << "\t" << dict[lis[i]] << '\n';
    return 0;
}

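Comments:

@Suman Bhadra: You tried to edit my code and add "<int>" to vector and set. With C++17 this is not necessary: the compiler deduces the correct data type. This is called CTAD. See here: en.cppreference.com/w/cpp/language/…

Answer 2: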

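I'd like to add another solution to this problem that gives the same result but performs better on large inputs.

For example, suppose your input is an array of size 100000 whose integers are all consecutive, i.e. [1, 2, 3, ..., 100000].

With the approach of storing the joined string for every value, you would store a string of roughly 589,000 characters (488,895 digits plus 99,999 dashes) 1e5 times in the map, which would almost certainly run out of memory at runtime (O(n^2) memory complexity).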

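So instead of using the array element's value as the map key directly, you can first give every array value a key identifying the sequence it belongs to, and then retrieve the sequence via sequenceOf[keyOfValue[arrayValue]] (mp[keyOfValue[a[i]]] in the code below). This results in O(n) memory complexity, where n is the size of the array.

Code example: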

#include <iostream>
#include <map>
#include <string>
#include <vector>
using namespace std;

// Join a sequence with dashes, e.g. {6,7,8,9} -> "6-7-8-9" (sequence must not be empty)
string convert_to_string(const vector<int>& sequence)
{
    string ret = "";

    for (size_t i = 0; i < sequence.size(); i++)
    {
        ret += to_string(sequence[i]);
        ret += "-";
    }
    ret.pop_back();   // drop the trailing dash
    return ret;
}

int main()
{
    int a[] = { 1,2,3, 6,7,8,9, 12, 14,15 };
    int size = sizeof(a) / sizeof(a[0]);

    map<int, string> mp;       // sequence key -> joined sequence string
    map<int, int> keyOfValue;  // array value  -> sequence key
    vector<int> sequence;
    int key = 0;

    for (int i = 0; i < size; i++)
    {
        sequence.push_back(a[i]);

        // Keep extending while the next element continues the sequence
        if (i != size - 1 && a[i + 1] == a[i] + 1)
            continue;

        // store all sequence element keys to be the same
        for (size_t j = 0; j < sequence.size(); j++)
            keyOfValue[sequence[j]] = key;

        // store the sequence only once.
        string value = convert_to_string(sequence);
        mp[key++] = value;
        sequence.clear();
    }

    // How to retrieve the value
    for (int i = 0; i < size; i++)
        cout << mp[keyOfValue[a[i]]] << endl;
}

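Instead of a std::map to store the sequences, you could also use a std::vector<string>, which reduces that lookup from logarithmic to O(1) time, but I used map because that is what the question asked for.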
