Splitting a string by a character

Posted 2023-02-19

[Published]: 2012-04-20 23:08:18 [Question]:

I know this is a very simple question, but I just want to settle it for myself once and for all.

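I simply want to split a string into an array, using a character as the split delimiter (much like C#'s well-known .Split() function). Of course I could apply a brute-force approach, but I'd like to know whether there is a better way than that.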

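I have searched, and so far the closest solution approach seems to be strtok(); however, because of its inconvenience (converting your string to a char array, etc.) I don't like using it. Is there an easier way to do this?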

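Note: I want to stress this because people may ask "why doesn't brute force work?". My brute-force solution was to create a loop and call substr() inside it. However, since substr() takes a starting position and a length, it fails when I want to split a date: the user might enter it as 7/12/2012 or 07/3/2011, so I can't really know the length until I compute the position of the next '/' delimiter.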

[Question comments]:

Possible duplicate of Splitting String C++

Does this answer your question? How do I iterate over the words of a string?

[Answer 1]:

One solution I've been using for a long time is a split that works with both vectors and lists:

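#include <vector>
#include <string>
#include <list>
#include <memory>   // std::allocator

// Splits `line` at every character found in `sep` (a single char, or a set of
// chars such as ",:"). Works with any container taking <value, allocator>,
// e.g. std::vector or std::list. Empty fields are kept.
template< template<typename,typename> class Container, typename Separator >
Container<std::string,std::allocator<std::string> > split( const std::string& line, Separator sep ) {
    std::size_t pos = 0;
    std::size_t next = 0;
    Container<std::string,std::allocator<std::string> > fields;
    while ( next != std::string::npos ) {
        // Find the next separator at or after `pos`.
        next = line.find_first_of( sep, pos );
        // Take everything up to it, or the rest of the line if none is left.
        std::string field = next == std::string::npos ? line.substr(pos) : line.substr(pos,next-pos);
        fields.push_back( field );
        pos = next + 1;
    }
    return fields;
}

int main() {
    auto res1 = split<std::vector>( "abc,def", ",:" );  // {"abc", "def"}
    auto res2 = split<std::list>( "abc,def", ',' );     // {"abc", "def"}
}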

[Comments]:

[Answer 2]:

I know this solution isn't the most sensible one, but it works. It is offered here as a variation on the solutions to this problem.

#include <iostream>
#include <vector>
#include <string>
using namespace std;
// visited[] has one slot per input character; longer inputs would overflow it.
const int maximumSize=40;
vector<int> visited(maximumSize, 0);
string word;
void showContentVectorString(vector<string>& input)
{
    for(int i=0; i<input.size(); ++i)
    {
        cout<<input[i]<<", ";
    }
    return;
}
// Visits the characters left to right; every separator closes the current word.
void dfs(int current, int previous, string& input, vector<string>& output, char symbol)
{
    if(visited[current]==1)
    {
        return;
    }
    visited[current]=1;
    string stringSymbol;
    stringSymbol.push_back(symbol);
    if(input[current]!=stringSymbol[0])
    {
        word.push_back(input[current]);
    }
    else
    {
        output.push_back(word);
        word.clear();
    }
    // The last character flushes whatever word is still pending.
    if(current==(input.size()-1))
    {
        output.push_back(word);
        word.clear();
    }
    for(int next=(current+1); next<input.size(); ++next)
    {
        if(next==previous)
        {
            continue;
        }
        dfs(next, current, input, output, symbol);
    }
    return;
}
void solve()
{
    string testString="this_is_a_test_string";
    vector<string> vectorOfStrings;
    dfs(0, -1, testString, vectorOfStrings, '_');
    cout<<"vectorOfStrings <- ";
    showContentVectorString(vectorOfStrings);
    return;
}
int main()
{
    solve();
    return 0;
}
The output is:

vectorOfStrings <- this, is, a, test, string,
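For comparison, a minimal sketch of the same character-by-character idea without recursion, the visited array, or global state (split_by_char is a hypothetical name, not part of the answer). Each character is looked at once, so there is no fixed input-length limit:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: accumulate characters into `word` and close the word
// at every separator, like the DFS above, but as a plain loop.
std::vector<std::string> split_by_char(const std::string& input, char symbol) {
    std::vector<std::string> output;
    std::string word;
    for (char c : input) {
        if (c == symbol) {
            output.push_back(word);  // a separator ends the current word
            word.clear();
        } else {
            word.push_back(c);       // any other character extends it
        }
    }
    if (!input.empty()) {
        output.push_back(word);      // flush the last word
    }
    return output;
}
```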

[Comments]:

[Answer 3]:

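I'm inherently averse to stringstream, though I'm not sure why. Today I wrote this function to split a std::string by any arbitrary character or string into a vector. I know this question is old, but I wanted to share an alternative way of splitting a std::string.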

This code omits the delimiter parts from the results entirely, though it could easily be modified to include them.

#include <string>
#include <vector>

void split(std::string str, std::string splitBy, std::vector<std::string>& tokens)
{
    /* Store the original string in the array, so we can loop the rest
     * of the algorithm. */
    tokens.push_back(str);

    // An empty delimiter would be "found" at index 0 forever, so stop here.
    if(splitBy.empty())
        return;

    // Store the split index in a 'size_t' (unsigned integer) type.
    std::size_t splitAt;
    // Store the size of what we're splicing out.
    std::size_t splitLen = splitBy.size();
    // Create a string for temporarily storing the fragment we're processing.
    std::string frag;
    // Loop infinitely - break is internal.
    while(true)
    {
        /* Store the last string in the vector, which is the only logical
         * candidate for processing. */
        frag = tokens.back();
        /* The index where the split is. */
        splitAt = frag.find(splitBy);
        // If we didn't find a new split point...
        if(splitAt == std::string::npos)
        {
            // Break the loop and (implicitly) return.
            break;
        }
        /* Put everything from the left side of the split where the string
         * being processed used to be. */
        tokens.back() = frag.substr(0, splitAt);
        /* Push everything from the right side of the split to the next empty
         * index in the vector. */
        tokens.push_back(frag.substr(splitAt+splitLen, frag.size()-(splitAt+splitLen)));
    }
}

To use it, just call it like this:

std::string foo = "This is some string I want to split by spaces.";
std::vector<std::string> results;
split(foo, " ", results);

You can now access all the results in the vector as you like. It's that simple: no stringstream, no third-party libraries, no falling back to C!
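The answer mentions the delimiters could easily be kept in the results. A sketch under that reading, with a hypothetical split_keep(str, splitBy, keepDelims) signature: it advances a start offset instead of re-copying the remaining tail after every split, and it guards against an empty delimiter, for which find() would match at position 0 every time.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split on every occurrence of the (possibly
// multi-character) delimiter; optionally store each delimiter as a token.
std::vector<std::string> split_keep(const std::string& str, const std::string& splitBy,
                                    bool keepDelims) {
    std::vector<std::string> tokens;
    if (splitBy.empty()) {             // nothing sensible to split on
        tokens.push_back(str);
        return tokens;
    }
    std::size_t start = 0;
    std::size_t at;
    while ((at = str.find(splitBy, start)) != std::string::npos) {
        tokens.push_back(str.substr(start, at - start));  // text before the delimiter
        if (keepDelims) {
            tokens.push_back(splitBy);                     // the delimiter itself
        }
        start = at + splitBy.size();                       // continue after it
    }
    tokens.push_back(str.substr(start));                   // text after the last delimiter
    return tokens;
}
```

With keepDelims set to false this yields the same tokens as the answer's split().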

[Comments]:

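Do you have any argument for why this would be better? I'm not a big fan of some things in standard C++ either (e.g. the extremely verbose streams, but they are being replaced by fmtlib, so I'm happy). But when I can write fewer lines of code, I tend to set those feelings aside, since that greatly reduces the chance of introducing bugs in the first place.

[Answer 4]: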

Boost has the split() you are looking for in algorithm/string.hpp:

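#include <boost/algorithm/string.hpp>
#include <string>
#include <vector>

std::string sample = "07/3/2011";
std::vector<std::string> strs;
boost::split(strs, sample, boost::is_any_of("/"));  // strs == {"07", "3", "2011"}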

[Comments]:

[Answer 5]:

Use a vector, a string and a stringstream. A bit cumbersome, but it does the job.

#include <string>
#include <vector>
#include <sstream>

std::stringstream test("this_is_a_test_string");
std::string segment;
std::vector<std::string> seglist;

while(std::getline(test, segment, '_'))
{
   seglist.push_back(segment);
}

This produces a vector with the same contents as

std::vector<std::string> seglist{ "this", "is", "a", "test", "string" };
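The same loop wrapped in a reusable function, as a minimal sketch (split_getline is a hypothetical name). One behavior worth knowing: std::getline yields an empty field for a leading or doubled delimiter, but a trailing delimiter does not produce a trailing empty field, because the final getline call extracts nothing and fails.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: the getline loop from the answer, taking any input string.
std::vector<std::string> split_getline(const std::string& s, char delim) {
    std::istringstream in(s);
    std::string segment;
    std::vector<std::string> seglist;
    while (std::getline(in, segment, delim)) {
        seglist.push_back(segment);
    }
    return seglist;
}
```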

[Comments]:

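Actually, this approach is exactly what I was looking for. Easy to understand, no external libraries, very simple. Thanks @thelazydeveloper! If you want to improve performance, you can add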

seglist.reserve(std::count_if(str.begin(), str.end(), [&](char c) { return c == splitChar; }) + (str.empty() ? 0 : 1));

if the original string to split is stored in str (and the delimiter in splitChar).

[Answer 6]:

For those who don't have (or want, or need) C++20, this C++11 solution might be an option.

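It is templated on the output iterator, so you can supply your own destination to which the split items are appended, and it lets you choose how multiple consecutive separator characters are handled.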

Yes, it uses std::regex, but if you're already in C++11 happy land, why not use it?

#include <algorithm>  // std::copy
#include <iterator>   // std::begin, std::end
#include <regex>
#include <string>

////////////////////////////////////////////////////////////////////////////
//
// Split string "s" into substrings delimited by the character "sep"
// skip_empty indicates what to do with multiple consecutive separation
// characters:
//
// Given s="aap,,noot,,,mies"
//       sep=','
//
// then output gets the following written into it:
//      skip_empty=true  => "aap" "noot" "mies"
//      skip_empty=false => "aap" "" "noot" "" "" "mies"
//
////////////////////////////////////////////////////////////////////////////
template <typename OutputIterator>
void string_split(std::string const& s, char sep, OutputIterator output, bool skip_empty=true) {
    // Escape sep so punctuation such as '.' or '|' is matched literally.
    // Caution: escaping a letter changes its meaning (e.g. 'd' becomes \d, "any digit").
    std::regex  rxSplit( std::string("\\")+sep+(skip_empty ? "+" : "") );

    // Submatch index -1 yields the text between matches, i.e. the fields.
    std::copy(std::sregex_token_iterator(std::begin(s), std::end(s), rxSplit, -1),
              std::sregex_token_iterator(), output);
}
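If you'd rather not pull in <regex> (or want letters as separators), the semantics documented in the comment block can be reproduced with std::string::find. This is a sketch with a hypothetical name, string_split_noregex, and the same parameters; with skip_empty=true it drops every empty field, including a leading one, so its output may differ from the regex version at the edges of the string.

```cpp
#include <iterator>  // std::back_inserter, used in the usage example
#include <string>
#include <vector>    // used in the usage example

// Hypothetical regex-free variant with the same interface as string_split.
template <typename OutputIterator>
void string_split_noregex(std::string const& s, char sep, OutputIterator output,
                          bool skip_empty = true) {
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type end = s.find(sep, start);
        std::string field = s.substr(start, end == std::string::npos ? std::string::npos
                                                                     : end - start);
        if (!(skip_empty && field.empty())) {
            *output++ = field;       // append to the caller's destination
        }
        if (end == std::string::npos) {
            break;                   // that was the last field
        }
        start = end + 1;             // step past the separator
    }
}
```

Usage: std::vector<std::string> out; string_split_noregex("aap,,noot,,,mies", ',', std::back_inserter(out));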

[Comments]:

[Answer 7]:

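Since nobody has posted this yet: a C++20 solution using ranges is very simple. You can use std::ranges::views::split to break up the input, and then transform each piece into a std::string or std::string_view element.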

#include <ranges>
#include <string>
#include <vector>


...

// The input to transform
const auto str = std::string{"Hello World"};

// Function to transform a range into a std::string
// Replace this with 'std::string_view' to make it a view instead.
auto to_string = [](auto&& r) -> std::string {
    // std::ranges::data avoids dereferencing begin() of an empty piece.
    const auto data = std::ranges::data(r);
    const auto size = static_cast<std::size_t>(std::ranges::distance(r));

    return std::string{data, size};
};

// Not const: split_view's begin() and end() are non-const member functions.
auto range = str |
             std::ranges::views::split(' ') |
             std::ranges::views::transform(to_string);

for (auto&& token : range) {
    // each 'token' is the split string
}

This approach can be composed into almost anything, even a simple split function that returns a std::vector<std::string>:

auto split(const std::string& str, char delimiter) -> std::vector<std::string>
{
    auto range = str |
                 std::ranges::views::split(delimiter) |
                 std::ranges::views::transform(to_string);

    return {std::ranges::begin(range), std::ranges::end(range)};
}


[Comments]:

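1. Why do you use str | range instead of range? 2. Are transform and to_string necessary? It seems token could be declared as string_view, which would make transform unnecessary. 3. split_view's begin and end functions are non-const, so the program appears to be ill-formed, since the range-based for loop uses a const range. Oh, I see: regarding 2, constructing a string_view from a range is a C++23 feature.

This is rather hard to read and not at all clear compared with the other answers.

[Answer 8]: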

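What about the erase() function? If you know the exact positions in the string where it should be split, you can use erase() to "extract" the fields from the string.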

#include <string>

std::string date("01/02/2019");
std::string day(date);
std::string month(date);
std::string year(date);

day.erase(2, std::string::npos); // "01"
month.erase(0, 3).erase(2); // "02"
year.erase(0,6); // "2019"

[Comments]:

[Answer 9]:

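Another possibility is to imbue the stream with a locale that uses a special ctype facet. A stream uses its ctype facet to determine what counts as "whitespace", which it treats as separators. With a ctype facet that classifies your separator characters as whitespace, reading becomes quite trivial. Here is one way to implement the facet: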

#include <locale>
#include <vector>

struct field_reader: std::ctype<char> {

    field_reader(): std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table() {
        // Every character starts out as "not whitespace" (mask value 0).
        static std::vector<std::ctype_base::mask>
            rc(table_size, std::ctype_base::mask());

        // we'll assume dates are either a/b/c or a-b-c:
        rc['/'] = std::ctype_base::space;
        rc['-'] = std::ctype_base::space;
        return &rc[0];
    }
};

We use imbue to tell the stream to use a locale containing that facet, then read the data from the stream:

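#include <iterator>
#include <sstream>
#include <string>

std::istringstream in("07/3/2011");
in.imbue(std::locale(std::locale(), new field_reader));  // the locale takes ownership of the facet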

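With that in place, splitting becomes almost trivial: just initialize a vector with a couple of istream_iterators that read the pieces from the string (embedded in the istringstream):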

// The extra parentheses around the first argument avoid the "most vexing parse".
std::vector<std::string> fields((std::istream_iterator<std::string>(in)),
                                std::istream_iterator<std::string>());

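If you only use this in one place, it is obviously overkill. If you use it a lot, however, it can go a long way toward keeping the rest of your code clean.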
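Put together, the pieces of this answer form a small self-contained unit. A sketch, assuming a hypothetical split_date helper; the facet is the one from the answer, so both '/' and '-' act as separators:

```cpp
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// The facet from the answer: '/' and '-' are the only "whitespace".
struct field_reader : std::ctype<char> {
    field_reader() : std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table() {
        static std::vector<std::ctype_base::mask> rc(table_size, std::ctype_base::mask());
        rc['/'] = std::ctype_base::space;
        rc['-'] = std::ctype_base::space;
        return &rc[0];
    }
};

// Hypothetical helper: imbue, then let operator>> skip the "whitespace".
std::vector<std::string> split_date(const std::string& text) {
    std::istringstream in(text);
    in.imbue(std::locale(std::locale(), new field_reader));  // locale owns the facet
    return std::vector<std::string>(std::istream_iterator<std::string>(in),
                                    std::istream_iterator<std::string>());
}
```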

[Comments]:

[Answer 10]:

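Another way for those who like RegEx (C++11/boost). Personally, I'm a big fan of RegEx for this kind of data. IMO it is far more powerful than simply splitting a string on a delimiter, because you can choose to be smarter about what constitutes "valid" data if you want.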

#include <string>
#include <algorithm>    // copy
#include <iterator>     // back_inserter
#include <regex>        // regex, sregex_token_iterator
#include <vector>

int main()
{
    std::string str = "08/04/2012";
    std::vector<std::string> tokens;
    std::regex re("\\d+");  // a token is a run of one or more digits

    //start/end points of tokens in str
    std::sregex_token_iterator
        begin(str.begin(), str.end(), re),
        end;

    std::copy(begin, end, std::back_inserter(tokens));  // tokens == {"08", "04", "2012"}
}
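To illustrate being smarter about what counts as valid data, a sketch that validates the whole date with capture groups instead of collecting every digit run. The accepted pattern is an assumption (one or two digits for day and month, four for the year, '/' or '-' as separator), and parse_date is a hypothetical name:

```cpp
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns {day-or-month, day-or-month, year} for a
// well-formed date, or std::nullopt if the whole string doesn't match.
std::optional<std::vector<std::string>> parse_date(const std::string& s) {
    static const std::regex re(R"((\d{1,2})[/-](\d{1,2})[/-](\d{4}))");
    std::smatch m;
    if (!std::regex_match(s, m, re)) {
        return std::nullopt;  // not a date of the expected shape
    }
    return std::vector<std::string>{m[1].str(), m[2].str(), m[3].str()};
}
```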

[Comments]:

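So you're including an entire regex matcher in your code just to split a string. Sad...

@Dev No, including a regex matcher in order to be smarter about what constitutes valid data, e.g. selecting the digits, while also allowing other separators such as dots or hyphens.

This is bad in terms of both binary size and overall efficiency, but since neither of those is a concern in this case, I won't go on.

@Dev If someone has such extreme constraints on binary size, they should reconsider using C++ at all, or at least its standard library (string, vector, etc.), since all of it has a similar effect. As for efficiency, the best advice comes from Donald Knuth: "premature optimization is the root of all evil". In other words, before optimizing, the first task is to establish whether there is a problem at all and then find its cause by objective means such as profiling, rather than wasting time chasing every possible micro-optimization.

"Since neither of those is a concern in this case" - me.

[Answer 11]: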

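Is there a reason you don't want to convert the string to a character array (char*)? Calling .c_str() is quite easy. You could also use a loop together with the .find() function.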

See the reference documentation for: the string class, string::find(), string::c_str().
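One catch with the char* route: c_str() returns a const char*, while std::strtok writes NUL bytes into its argument, so a writable copy is needed. A minimal sketch (split_strtok is a hypothetical name):

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: copy into a writable, NUL-terminated buffer, then
// tokenize it with std::strtok.
std::vector<std::string> split_strtok(const std::string& s, const char* delims) {
    std::vector<char> buf(s.begin(), s.end());
    buf.push_back('\0');
    std::vector<std::string> out;
    for (char* tok = std::strtok(buf.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        out.emplace_back(tok);  // copy each token out of the buffer
    }
    return out;
}
```

Note that strtok treats a run of delimiters as one (so "a//b" gives two fields, not three) and keeps internal state between calls, so it is not reentrant.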

[Comments]:

[Answer 12]:

Take a look at boost::tokenizer.

If you want to roll your own, you can use std::string::find() to determine the split points.
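A minimal sketch of that find()-based approach (split_find is a hypothetical name). It also resolves the question's problem with substr() needing a length: the length is simply the distance from the current start to the next delimiter that find() returns.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split `s` on `delim`, keeping empty fields.
std::vector<std::string> split_find(const std::string& s, char delim) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    std::string::size_type next;
    while ((next = s.find(delim, start)) != std::string::npos) {
        fields.push_back(s.substr(start, next - start));  // length = next - start
        start = next + 1;                                  // skip the delimiter
    }
    fields.push_back(s.substr(start));                     // the last field
    return fields;
}
```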

[Comments]:

Thanks for the string find tip. Always like hearing std solutions!
