A better solution for comparing string patterns?

Posted 2023-02-16

【Title】: A better solution for comparing string patterns? 【Posted】: 2020-12-30 07:25:31 【Question】:

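Task: create a function that returns true if two strings share the same letter pattern, and false otherwise.

I found a way to solve this task, but I think it could be simpler and shorter. In both strings I convert every group of equal letters to one specific character, and then check whether the two converted strings are identical. Any ideas for a simpler solution?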

#include <cstdlib>
#include <iostream>
#include <string>

using namespace std;

bool LetterPattern(string str1, string str2) {
    // Check whether the strings have the same size
    if (str1.length() != str2.length()) {
        return false;
    }
    else {
        // Check for the ABC / XYZ case: count repeated letters in each string
        int counter = 0;
        for (size_t i = 0; i + 1 < str1.length(); i++) {
            for (size_t k = i + 1; k < str1.length(); k++) {
                if (str1[i] == str1[k]) {
                    counter++;
                }
            }
        }
        int counter2 = 0;
        for (size_t i = 0; i + 1 < str2.length(); i++) {
            for (size_t k = i + 1; k < str2.length(); k++) {
                if (str2[i] == str2[k]) {
                    counter2++;
                }
            }
        }

        if (counter == 0 && counter2 == 0) {
            return true;
        }
        // I added the above part because the program below couldn't return 1 for completely different letter formats
        // like XYZ ABC DEF etc.

        // Convert equal letters to the same char in str1
        for (size_t i = 0; i + 1 < str1.length(); i++) {
            for (size_t k = i + 1; k < str1.length(); k++) {
                if (str1[i] == str1[k]) {
                    str1[k] = (char)i;
                }
            }
            str1[i] = (char)i;
        }
    }
    // Convert equal letters to the same char in str2
    for (size_t i = 0; i + 1 < str2.length(); i++) {
        for (size_t k = i + 1; k < str2.length(); k++) {
            if (str2[i] == str2[k]) {
                str2[k] = (char)i;
            }
        }
        str2[i] = (char)i;
    }
    if (str1 == str2) { // After converting the strings, check whether they are the same
        return true;
    }
    else {
        return false;
    }
}

int main() {
    cout << "Please enter two string variable: ";
    string str1, str2;
    cin >> str1 >> str2;
    cout << "Same Letter Pattern: " << LetterPattern(str1, str2);

    system("pause>0");
}

Examples:

str1	str2	result
AABB	CCDD	true
ABAB	CDCD	true
AAFFG	AAFGF	false
asdasd	qweqwe	true
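
The "convert equal letters to the same character, then compare" idea from the question can be sketched more compactly by replacing each character with the index of its first occurrence. This is only an illustrative sketch, not the asker's code: the names `firstOccurrencePattern` and `sameLetterPattern` are hypothetical, and it assumes C++17 for `try_emplace`.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: replaces each character by the index of its first
// occurrence, e.g. "ABAB" -> {0, 1, 0, 1} and "AAFFG" -> {0, 0, 2, 2, 4}.
std::vector<std::size_t> firstOccurrencePattern(const std::string& s) {
    std::unordered_map<char, std::size_t> firstSeen;
    std::vector<std::size_t> pattern;
    pattern.reserve(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        // try_emplace inserts only if the character has not been seen yet.
        auto it = firstSeen.try_emplace(s[i], i).first;
        pattern.push_back(it->second);
    }
    return pattern;
}

// Two strings share a letter pattern if their normalized forms are equal.
// Strings of different length produce vectors of different length.
bool sameLetterPattern(const std::string& a, const std::string& b) {
    return firstOccurrencePattern(a) == firstOccurrencePattern(b);
}
```

Because the normalized form is a vector of indices rather than a modified string, it cannot collide with real character values, and no special case is needed for strings whose letters are all different.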

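【Comments on the question】:

Maybe it's just me, but I'm not entirely sure what "share the same letter pattern" means (and I can't work it out from the code either). Do you mean having a common subsequence? As always, a few examples would help.

OK, I had it completely wrong. You mean there is a set of letter substitutions that turns one string into the other. AFAIK this should be solvable with the strings and a map that remembers the substitutions you have used so far.

【Answer 1】:

If you want to see whether one string is a Caesar cipher (more precisely, a one-to-one letter substitution) of the other, you could do this: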

#include <array>
#include <cstddef>
#include <optional>
#include <string>

bool LetterPatternImpl(const std::string& str1, const std::string& str2) {
    if (str1.length() != str2.length()) { return false; }

    std::array<std::optional<char>, 256> mapping; // char has limited range,
                                                  // else we might use std::map
    for (std::size_t i = 0; i != str1.length(); ++i) {
        auto index = static_cast<unsigned char>(str1[i]);

        if (!mapping[index]) { mapping[index] = str2[i]; }
        if (*mapping[index] != str2[i]) { return false; }
    }
    return true;
}

bool LetterPattern(const std::string& str1, const std::string& str2) {
    // Both ways needed
    // so ABC <-> ZZZ should return false.
    return LetterPatternImpl(str1, str2) && LetterPatternImpl(str2, str1);
}

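【Comments】:

【Answer 2】:

In one pass over the strings, build key-value pairs that define which characters correspond to each other.

In a second pass, check whether each character of the first/second string is compatible with the character at the same index in the second/first string. If there is no incompatibility, return true; otherwise return false.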
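
The two passes described above could look roughly like the following. This is a minimal sketch under my own assumptions, not the answerer's code: the function name `samePatternTwoPass` and the choice of two `std::unordered_map` tables (one per direction) are illustrative.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Sketch of the two-pass idea; names are hypothetical.
bool samePatternTwoPass(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;

    std::unordered_map<char, char> aToB;  // character of a -> character of b
    std::unordered_map<char, char> bToA;  // character of b -> character of a

    // Pass 1: record the correspondence. A later pair silently overwrites
    // an earlier one, so conflicts are not detected yet.
    for (std::size_t i = 0; i < a.size(); ++i) {
        aToB[a[i]] = b[i];
        bToA[b[i]] = a[i];
    }

    // Pass 2: every position must agree with the recorded correspondence
    // in both directions; an overwritten pair shows up here as a mismatch.
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (aToB.at(a[i]) != b[i] || bToA.at(b[i]) != a[i]) return false;
    }
    return true;
}
```

With "ABC" and "ZZZ", the first pass leaves `bToA['Z'] == 'C'`, so the second pass already fails at index 0, where the character in the first string is 'A'.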

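【Comments】:

【Answer 3】:

First, we can compare the sizes of the two strings. If they are equal, we continue.

By iterating over one of the strings we can fill a map. The keys of the map are the characters of the first string, and the values are the corresponding characters of the second string.

On reaching the n-th character, we check whether the map already has a key equal to that character.

If yes: check that its value equals the n-th character of the second string.

If no: add a new key-value pair to the map (the key is the n-th character of the first string, the value is the n-th character of the second string).

1. After doing this, we should do it again for the other string. For example, if in the first step the characters of the first string were the keys, then in the second step we should swap the strings so that the characters of the second string become the keys.

If both passes give true, the answer is true. Otherwise it is false.

2. Instead of swapping the strings and repeating the iteration, we can prevent duplicate values from being added to the map.

To understand paragraphs 1 and 2, imagine a single pass over the strings "ABC" and "ZZZ".

Note that an array can be used instead of a map.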
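
Variant 2 above (a single pass that refuses duplicate values) might be sketched as follows. The function name `samePatternOnePass` and the extra `std::set` of already used values are my own assumptions for illustration. The answer only says that duplicate values must be prevented, not how.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Sketch of the single-pass variant; names are hypothetical.
bool samePatternOnePass(const std::string& first, const std::string& second) {
    if (first.size() != second.size()) return false;

    std::map<char, char> mapping;  // key: char of first, value: char of second
    std::set<char> usedValues;     // values that already belong to some key

    for (std::size_t n = 0; n < first.size(); ++n) {
        auto it = mapping.find(first[n]);
        if (it != mapping.end()) {
            // Key already known: its value must match the n-th character.
            if (it->second != second[n]) return false;
        } else {
            // New key: reject it if its value is already taken by another key.
            // insert() returns {iterator, false} when the element exists.
            if (!usedValues.insert(second[n]).second) return false;
            mapping.emplace(first[n], second[n]);
        }
    }
    return true;
}
```

For "ABC" and "ZZZ", the pair 'A' -> 'Z' is stored first. At 'B' the value 'Z' is already in `usedValues`, so the function returns false without a second pass.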

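【Comments】:

In the case of std::string, the range of possible values is small enough that a simple array can be used instead of a map.

With a map, the time complexity will be higher than O(n), but the space complexity will be much lower for a large alphabet or small samples.

@Swift-FridayPie: in the worst case (the whole alphabet is covered), the map's space complexity will be higher.

@YvesDaust In the worst case the array approach may fail or take 4 GB of space. I wouldn't assume the string is latin1.

@Swift-FridayPie Unless you start decoding multi-byte characters, a std::string can usually hold only 256 distinct characters.

【Answer 4】:

Last but not least, there is an additional solution that uses "counting".

If we read the requirement, you are only interested in a boolean result. That means that as soon as we have a second association for a letter in the first string, the result is false.

Example: if we have an 'a' and at the same position in the second string a 'b', and then at some later position in the first string again an 'a', but at the same position in the second string a 'c', then we have 2 different associations for the letter 'a'. And that is false.

If there is only one association per letter, then everything is OK.

How do we accomplish "association" and "counting"? For the "association" we use an associative container, std::unordered_map. We associate each letter from the first string with a std::set of the already processed letters from the second string. The std::set's insert function will not add duplicate letters from the second string. So, if there is again a 'b' associated with an 'a', that is completely fine.

But if there is a different associated letter, then the std::set will contain 2 elements. That is the indicator for a false result.

In that case we stop evaluating characters immediately. This leads to very compact and fast code.

Please see: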

#include <cstddef>
#include <iostream>
#include <string>
#include <unordered_map>
#include <utility>
#include <set>
#include <vector>

bool letterPattern(const std::string& s1, const std::string& s2) {

    // Here we will store the result of the function
    bool result{ s1.length() == s2.length() };

    // And here all associations
    std::unordered_map<char, std::set<char>> association{};

    // Add associations. Stop if result = false
    for (std::size_t index{}; result && index < s1.length(); ++index)
        if (const auto& [iter, ok] = association[s1[index]].insert(s2[index]); ok)
            result = association[s1[index]].size() == 1;

    return result;
}

// Some driver test code
int main() {
    std::vector<std::pair<std::string,std::string>> testData{
        {"AABB", "CCDD"},
        {"ABAB", "CDCD"},
        {"AAFFG", "AAFGF"},
        {"asdasd", "qweqwe"}
    };

    for (const auto& p : testData)
        std::cout << std::boolalpha << letterPattern(p.first, p.second) << "\t for: '" << p.first << "' and '" << p.second << "'\n";

    return 0;
}

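【Comments】:

【Answer 5】:

Not sure it is better, but here is a C++17 solution that builds a regular expression from the letters of the first string and matches it against the second: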

#include <iostream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <tuple>
#include <regex>

bool match(const std::string &pattern, const std::string &s) {
  std::unordered_map<char, int> indexes;
  std::ostringstream builder;
  int ref = 1;

  for (char c : pattern) {
    if (auto backref = indexes.find(c); backref != indexes.end()) {
      // Letter seen before: emit a backreference to its capture group
      builder << '\\' << backref->second;
    } else {
      if (ref > 1) {
        // New letter: it must differ from every group captured so far
        builder << "(?!";
        for (int n = 1; n < ref; n += 1) {
          if (n != 1) {
            builder << '|';
          }
          builder << '\\' << n;
        }
        builder << ')';
      }
      builder << "(.)";
      indexes.emplace(c, ref++);
    }
  }

  // std::cout << builder.str() << '\n';
  return std::regex_match(s, std::regex{builder.str()});
}

int main() {
  std::tuple<std::string, std::string, bool> tests[] = {
      {"AABB", "CCDD", true},
      {"ABAB", "CDCD", true},
      {"AAFFG", "AAFGF", false},
      {"asdasd", "qweqwe", true},
      {"abc", "zzz", false}
  };

  std::cout << std::boolalpha;
  for (const auto &[s1, s2, expected] : tests) {
    if (match(s1, s2) == expected) {
      std::cout << s1 << " => " << s2 << " = " << expected << ": PASS\n";
    } else {
      std::cout << s1 << " => " << s2 << " = " << (!expected) << ": FAIL\n";
    }
  }

  return 0;
}

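【Comments】:

@Jarod42 Updated to catch that.

I really wouldn't say "better" :) A hard-coded regex might already not be considered simple, so generating one... and it doesn't even reduce or hide the complexity, since the explicit mapping is still there. Nice try ;-)

【Answer 6】:

A simple (perhaps not very efficient) approach: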

#include <cstddef>
#include <iostream>
#include <string>
#include <unordered_map>

using namespace std;

int main(void) {
    string s1, s2;
    unordered_map<string, char> subs;

    cout<<"Enter the strings: ";
    cin >> s1 >> s2;

    if (s1.length() != s2.length())
        cout<<"False"<<endl;
    else {
        // Record, for each character of s2, the character of s1 at the same index
        for (size_t i=0; i<s1.length(); ++i) {
            string key(1, s2[i]);
            subs[key] = s1[i];
        }

        string s1_2 = "";

        // Rebuild s1 from s2 using the recorded substitutions
        for (size_t i=0; i<s2.length(); ++i) {
            string key(1, s2[i]);
            s1_2 += subs[key];
        }

        if (s1 == s1_2)
            cout<<"True"<<endl;
        else
            cout<<"False"<<endl;
    }
    return 0;
}

Time complexity O(n); space complexity O(n).

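【Comments】:

AFAIK this doesn't work. For example, asdasd and qweqwe fail this test.

Yes, you're right: the substitution can use any letter, and the distance won't be the same.

Note: C-style VLAs like the one you used are not part of standard C++ (although some popular compilers support them as an extension).

@Shawn I don't know why I did that! Updated the code.

【Answer 7】:

If I understand correctly and:

AABB - CCDD = true
AAFFG - AAFGF = false
asdasd - qweqwe = true

This is not a pattern; it checks whether the second string is the result of encrypting the first one by substitution. You can do this in a simpler way by trying to build the substitution table. If that fails, i.e. there is more than one association between source and result, the result is false.

The simplest case is when we have to check the whole string. If we need to find out whether any substring is a substitution of the pattern contained in the second string, the complexity grows proportionally: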

#include <cstddef>
#include <string>
#include <vector>
#include <map>
#include <optional>
#include <limits>
#include <type_traits>

bool is_similar (const std::string& s1, const std::string& s2)
{
    if(s1.length() != s2.length()) return false;
    using TCh = std::decay_t<decltype(s1)>::value_type;
    // for non-unicode characters can use an array
    //std::optional<TCh> table[ std::numeric_limits<TCh>::max ];
    // std::optional used for clarity, in reality may use `TCh`
    // and compare with zero char
    std::map< TCh, std::optional<TCh>> table;

    for (std::size_t it = 0; it < s1.length(); ++it)
    {
       if( table[s1[it]].has_value() && table[s1[it]] != s2[it] ) return false;
       if( table[s2[it]].has_value() && table[s2[it]] != s1[it] ) return false;
       table[s1[it]] = s2[it];
       //table[s2[it]] = s1[it]; if symmetric
    }
    return true;
}

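【Comments】:

std::string may contain NUL characters.

As I understand it, you consider AABB <-> BBCC to be false; not sure about the OP. The other answers accept that pattern.

【Answer 8】:

If we find a new character, we map it to the character at the same position in the other string. The next time we find it, we check against that mapping.

Suppose we have "aa" and "cd". First iteration: 'a'='c'. Second iteration: we already have 'a'='c' (from the first iteration), so we need a 'c' in the second string. But in the second string it is 'd', so the function simply returns false.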

#include <cstddef>
#include <iostream>
#include <map>
#include <string>

using namespace std;

// if you want to use map
bool LetterPattern_with_map(string str1,string str2)
{
    if(str1.size()!=str2.size()) return false;
    map<char,char> mp;
    for(size_t i=0;i<str1.size();i++)
    {
        // a value of 0 means "no mapping stored yet"
        if(!mp[str1[i]]) { mp[str1[i]]=str2[i]; continue; }
        if(mp[str1[i]]!=str2[i]) return false;
    }
    return true;
}

// if you want to use array instead of map
bool LetterPattern_with_array(string str1,string str2)
{
    if(str1.size()!=str2.size()) return false;
    int check[256]={0}; // indexed by unsigned char; 0 means "no mapping stored yet"
    for(size_t i=0;i<str1.size();i++)
    {
        const unsigned char from=static_cast<unsigned char>(str1[i]);
        const int to=static_cast<unsigned char>(str2[i])+1; // +1 keeps 0 free as the "empty" marker
        if(!check[from]) { check[from]=to; continue; }
        if(check[from]!=to) return false;
    }
    return true;
}

int main()
{
    cout << "Please enter two string variable: ";
    string str1, str2;
    cin >> str1 >> str2;
    cout << "Same Letter Pattern: " << LetterPattern_with_map(str1, str2)<<'\n';
    cout << "Same Letter Pattern: " << LetterPattern_with_array(str1, str2);
}

【Comments】:
