使用 C++ 检查两个字符串是不是是字谜

Posted 2023-03-30

技术标签:

【中文标题】使用 C++ 检查两个字符串是不是是字谜【英文标题】：Check whether two strings are anagrams using C++使用 C++ 检查两个字符串是否是字谜 【发布时间】：2013-08-18 12:16:24 【问题描述】：

我想出的下面的程序用于检查两个字符串是否是字谜。它适用于小弦但适用于较大的弦（我试过：听过，入伍）它给了我一个“不！”

帮助！

#include<iostream.h> 
#include<string.h>
#include<stdio.h>

int main()

    char str1[100], str2[100];
    gets(str1);
    gets(str2);
    int i,j;
    int n1=strlen(str1);
    int n2=strlen(str2);
    int c=0;
    if(n1!=n2)
    
          cout<<"\nThey are not anagrams ! ";
          return 0;
    
    else 
    
         for(i=0;i<n1;i++)
             for(j=0;j<n2;j++)
                 if(str1[i]==str2[j])
                     ++c;
    
    if(c==n1)
        cout<<"yes ! anagram !! ";
    else 
        cout<<"no ! ";

    system("pause");
    return 0;

【问题讨论】：

iostream.h 是一个非常古老的非标准标头。而你正在使用 C++。使用std::cin 读取std::string，因为gets 很糟糕，即使在C 中也是如此，而且C 字符串在C++ 中通常较差（我假设您想要一个单词，因为它用于字谜）。使用std::sort对它们进行排序，然后进行比较。你的算法是错误的，简单明了，和字符串的长度无关。尝试使用“aa”、“aa”。多想想。看在上帝的份上，不要使用gets()。 【参考方案1】：

我很懒，所以我会使用标准库功能对两个字符串进行排序，然后比较它们：

#include <string>
#include <algorithm>

bool is_anagram(std::string s1, std::string s2)

  std::sort(s1.begin(), s1.end());
  std::sort(s2.begin(), s2.end());
  return s1 == s2;

一个小的优化可能是在排序之前检查字符串的大小是否相同。

但是如果这个算法被证明是一个瓶颈，我会暂时摆脱我的一些懒惰并将它与一个简单的计数解决方案进行比较：

std::unordered_map<char, unsigned int> m

s1

char

s2

char

0

【讨论】：

即使您的 3 线解决方案确实解决了字谜问题，但 OP 显然在理解算法方面存在问题。想详细说明为什么他的解决方案是错误的，为什么你的解决方案是正确的？ @DanielKO 我没有读过 OP 的算法，表面上看起来太长而且太复杂了。我认为为什么这个工作应该很明显，所以我没有解释。 @DanielKO，教 OP 如何自己调试它会更好，但对于答案来说并不实用（而且不是一个有效的问题）。这个答案很好地提供了一个干净清晰的解决方案。【参考方案2】：

当要求查找 aa 和 aa 是否是字谜时，该算法也会失败。尝试在心理上或在调试器中跟踪算法的步骤以找出原因；这样你会学到更多。

顺便说一句.. 查找字谜的常用方法是计算每个字母在字符串中出现的次数。每个字母的计数应该相等。这种方法的时间复杂度为 O(n)，而不是 O(n²)。

【讨论】：

【参考方案3】：

bool areAnagram(char *str1, char *str2)

    // Create two count arrays and initialize all values as 0
    int count1[NO_OF_CHARS] = 0;
    int count2[NO_OF_CHARS] = 0;
    int i;

    // For each character in input strings, increment count in
    // the corresponding count array
    for (i = 0; str1[i] && str2[i];  i++)
    
        count1[str1[i]]++;
        count2[str2[i]]++;
    

    // If both strings are of different length. Removing this condition
    // will make the program fail for strings like "aaca" and "aca"
    if (str1[i] || str2[i])
      return false;

    // Compare count arrays
    for (i = 0; i < NO_OF_CHARS; i++)
        if (count1[i] != count2[i])
            return false;

    return true;

【讨论】：

如果字符串包含负字符，这会导致 UB。您还可以通过从 count1 中减去而不是添加到 count2 来将内存使用量减半。【参考方案4】：

我看到以下两种主要方法：

排序然后比较统计每个字母出现的次数

有趣的是，Suraj 的好解决方案得到了 1 分（在撰写本文时由我决定），但有一个得到 22 分。解释是人们并不在意性能——这对于短字符串来说很好。

排序实现只有 3 行长，但是对于长字符串，计数比它快。它要快得多（O(N) 与 O(NlogN)）。使用 500 MB 长的字符串得到以下结果。

排序

计数

多线程计数

多线程尝试是一种天真的尝试，它试图通过在单独的线程中计数来使速度加倍，每个线程一个线程。内存访问是瓶颈，这是多线程使事情变得更糟的一个例子。

我很高兴看到一些可以加快计数解决方案的想法（由擅长内存延迟问题、缓存的人思考）。

【讨论】：

【参考方案5】：

#include<stdio.h>
#include<string.h>
int is_anagram(char* str1, char* str2)
    if(strlen(str1)==strspn(str1,str2) && strlen(str1)==strspn(str2,str1) &&
    strlen(str1)==strlen(str2))
    return 1;
    return 0;

int main()
    char* str1 = "stream";
    char* str2 = "master";
    if(is_anagram(str1,str2))
    printf("%s and %s  are anagram to each other",str1,str2);
    else
    printf("%s and %s  are not anagram to each other",str1,str2);
    return 0;

【讨论】：

【参考方案6】：

#include<iostream>
#include<unordered_map>
using namespace std;

int checkAnagram (string &str1, string &str2)

    unordered_map<char,int> count1, count2;
    unordered_map<char,int>::iterator it1, it2;
    int isAnagram = 0;

    if (str1.size() != str2.size()) 
        return -1;
    

    for (unsigned int i = 0; i < str1.size(); i++) 
        if (count1.find(str1[i]) != count1.end())
            count1[str1[i]]++;
         else 
            count1.insert(pair<char,int>(str1[i], 1));
        
    

    for (unsigned int i = 0; i < str2.size(); i++) 
        if (count2.find(str2[i]) != count2.end()) 
            count2[str2[i]]++;
         else 
            count2.insert(pair<char,int>(str2[i], 1));
        
    

    for (unordered_map<char, int>::iterator itUm1 = count1.begin(); itUm1 != count1.end(); itUm1++) 
        unordered_map<char, int>::iterator itUm2 = count2.find(itUm1->first);
        if (itUm2 != count2.end()) 
            if (itUm1->second != itUm2->second)
                isAnagram = -1;
                break;
            
        
    

    return isAnagram;


int main(void)

    string str1("WillIamShakespeare");
    string str2("IamaWeakishSpeller");

    cout << "checkAnagram() for " << str1 << "," << str2 << " : " << checkAnagram(str1, str2) << endl;

    return 0;

【讨论】：

【参考方案7】：

有趣的是，有时最好的问题却是最简单的。

这里的问题是如何推断两个单词是否是字谜 - 一个单词本质上是一个未排序的多字符集。

我们知道我们必须进行排序，但理想情况下我们希望避免排序的时间复杂性。

事实证明，在许多情况下，我们可以消除许多在线性时间中不相似的单词，方法是遍历它们并将字符值异或到累加器中。如果两个字符串都是字谜，则无论排序如何，两个字符串中所有字符的总异或必须为零。这是因为任何与自身异或的东西都会变成零。

当然，反之亦然。仅仅因为累加器为零并不意味着我们有一个字谜匹配。

使用这些信息，我们可以在没有排序的情况下消除许多非字谜，至少短路非字谜的情况。

#include <iostream>
#include <string>
#include <algorithm>

//
// return a sorted copy of a string
//
std::string sorted(std::string in)

    std::sort(in.begin(), in.end());
    return in;


//
// check whether xor-ing the values in two ranges results in zero.
// @pre first2 addresses a range that is at least as big as (last1-first1)
//
bool xor_is_zero(std::string::const_iterator first1,
                 std::string::const_iterator last1,
                 std::string::const_iterator first2)

    char x = 0;
    while (first1 != last1) 
        x ^= *first1++;
        x ^= *first2++;
    
    return x == 0;


//
// deduce whether two strings are the same length
//
bool same_size(const std::string& l, const std::string& r)

    return l.size() == r.size();


//
// deduce whether two words are anagrams of each other
// I have passed by const ref because we may not need a copy
//
bool is_anagram(const std::string& l, const std::string& r)

    return same_size(l, r)
    && xor_is_zero(l.begin(), l.end(), r.begin())
    && sorted(l) == sorted(r);


// test

int main()  

    using namespace std;

    auto s1 = "apple"s;
    auto s2 = "eppla"s;

    cout << is_anagram(s1, s2) << '\n';

    s2 = "pppla"s;

    cout << is_anagram(s1, s2) << '\n';

    return 0;

预期：

1
0

【讨论】：

【参考方案8】：

试试这个：

// Anagram. Two words are said to be anagrams of each other if the letters from one word can be rearranged to form the other word. 
// From the above definition it is clear that two strings are anagrams if all characters in both strings occur same number of times. 
// For example "xyz" and "zxy" are anagram strings, here every character 'x', 'y' and 'z' occur only one time in both strings. 

#include <map>
#include <string>
#include <cctype>
#include <iostream> 
#include <algorithm>
#include <unordered_map>

using namespace std;

bool IsAnagram_1( string w1, string w2 )

    // Compare string lengths
    if ( w1.length() != w2.length() )
        return false;

    sort( w1.begin(), w1.end() );
    sort( w2.begin(), w2.end() );

    return w1 == w2;


map<char, size_t> key_word( const string & w )

    // Declare a map which is an associative container that will store a key value and a mapped value pairs
    // The key value is a letter in a word and the maped value is the number of times this letter appears in the word
   map<char, size_t> m;

   // Step over the characters of string w and use each character as a key value in the map
   for ( auto & c : w )
   
        // Access the mapped value directly by its corresponding key using the bracket operator
        ++m[toupper( c )];  
   

   return ( m );



bool IsAnagram_2( const string & w1, const string & w2 )  

    // Compare string lengths
    if ( w1.length() != w2.length() )
        return false;

    return ( key_word( w1 ) == key_word( w2 ) );



bool IsAnagram_3( const string & w1, const string & w2 )  

    // Compare string lengths
    if ( w1.length() != w2.length() )
        return false;

    // Instantiate a count map, std::unordered_map<char, unsigned int> m
    unordered_map<char, size_t> m;

    // Loop over the characters of string w1 incrementing the count for each character
   for ( auto & c : w1 )
   
        // Access the mapped value directly by its corresponding key using the bracket operator
        ++m[toupper(c)];    
   

    // Loop over the characters of string w2 decrementing the count for each character
   for ( auto & c : w2 )
   
        // Access the mapped value directly by its corresponding key using the bracket operator
        --m[toupper(c)];    
   

   // Check to see if the mapped values are all zeros
   for ( auto & c : w2 )
   
        if ( m[toupper(c)] != 0 )
            return false;
   

   return true;



int main( )

    string word1, word2;

    cout << "Enter first word: ";
    cin >> word1;

    cout << "Enter second word: ";
    cin >> word2;

    if ( IsAnagram_1( word1, word2 ) )
        cout << "\nAnagram" << endl;
    else 
        cout << "\nNot Anagram" << endl;


    if ( IsAnagram_2( word1, word2 ) )
        cout << "\nAnagram" << endl;
    else 
        cout << "\nNot Anagram" << endl;


    if ( IsAnagram_3( word1, word2 ) )
        cout << "\nAnagram" << endl;
    else 
        cout << "\nNot Anagram" << endl;

    system("pause");

    return 0;

【讨论】：

【参考方案9】：

在这种方法中，我还处理了空字符串和重复字符。享受它并评论任何限制。

#include <iostream>
#include <map>
#include <string>

using namespace std;

bool is_anagram( const string a, const string b )
    std::map<char, int> m;
    int count = 0;

    for (int i = 0; i < a.length(); i++) 
        map<char, int>::iterator it = m.find(a[i]);
        if (it == m.end()) 
            m.insert(m.begin(), pair<char, int>(a[i], 1));
         else 
            m[a[i]]++;
        
    

    for (int i = 0; i < b.length(); i++) 
        map<char, int>::iterator it = m.find(b[i]);
        if (it == m.end()) 
            m.insert(m.begin(), pair<char, int>(b[i], 1));
         else 
            m[b[i]]--;
        
    

    if (a.length() <= b.length()) 
        for (int i = 0; i < a.length(); i++) 
            if (m[a[i]] >= 0) 
                count++;
             else
                return false;
        
        if (count == a.length() && a.length() > 0)
            return true;
        else
            return false;
     else 
        for (int i = 0; i < b.length(); i++) 
            if (m[b[i]] >= 0) 
                count++;
             else 
                return false;
            
        
        if (count == b.length() && b.length() > 0)
            return true;
        else
            return false;
    
    return true;

【讨论】：

【参考方案10】：

检查两个字符串对于每个唯一字符的计数是否相同。

bool is_Anagram_String(char* str1,char* str2)

    int first_len=(int)strlen(str1);
    int sec_len=(int)strlen(str2);

    if (first_len!=sec_len) 
         return false;

    int letters[256] = 0;

    int num_unique_chars = 0;
    int num_completed_t = 0;

    for(int i=0;i<first_len;++i)
        int char_letter=(int)str1[i];
        if(letters[char_letter]==0)
            ++num_unique_chars;
        ++letters[char_letter];
    

    for (int i = 0; i < sec_len; ++i) 

        int c = (int) str2[i];

        if (letters[c] == 0)  // Found more of char c in t than in s.
            return false;
        

        --letters[c];

        if (letters[c] == 0) 

            ++num_completed_t;

            if (num_completed_t == num_unique_chars) 
                // it’s a match if t has been processed completely
                return i == sec_len - 1;
            
        
   
    return false;

【讨论】：

【参考方案11】：

#include <iostream>
#include <string.h>

using namespace std;

const int MAX = 100;

char cadA[MAX];
char cadB[MAX];
bool chrLocate;


int i,m,n,j, contaChr;

void buscaChr(char [], char []);

int main() 

    cout << "Ingresa CadA: ";
    cin.getline(cadA, sizeof(cadA));
    cout << "Ingresa CadB: ";
    cin.getline(cadB, sizeof(cadA));    

    if ( strlen(cadA) == strlen(cadB) ) 

        buscaChr(cadA,cadB);        

     else 
        cout << "No son Anagramas..." << endl;
    

    return 0;



void buscaChr(char a[], char b[]) 

    j = 0;  
    contaChr = 0;

    for ( i = 0; ( (i < strlen(a)) && contaChr < 2 ); i++ ) 

        for ( m = 0; m < strlen(b); m++ ) 

            if ( a[i] == b[m]) 
                j++;
                contaChr++;
                a[i] = '-';
                b[m] = '+';
             else  contaChr = 0;  
               
    

    if ( j ==  strlen(a)) 
        cout << "SI son Anagramas..." << endl;
     else 
        cout << "No son Anagramas..." << endl;

【讨论】：

您要回答问题的哪一部分？【参考方案12】：

您的算法不正确。您正在检查第一个单词中的每个字符，以查看该字符在第二个单词中出现的次数。如果这两个词是 'aaaa' 和 'aaaa'，那么你的计数是 16。对你的代码做一个小的改动就可以让它工作，但是因为你有一个双循环，所以复杂度为 N^2 .

for(i=0;i<n1;i++)
for(j=0;j<n2;j++)
if(str1[i]==str2[j])
++c, str2[j] = 0; // 'cross off' letters as they are found.

我用字谜比较做了一些测试。比较两个每个 72 个字符的字符串（字符串始终是真正的字谜以获得最大比较次数），使用几个不同的 STL 容器执行 256 次相同测试...

template<typename STORAGE>
bool isAnagram(const string& s1, const string& s2, STORAGE& asciiCount)

    for(auto& v : s1)
    
        asciiCount[v]++;    
    
    for(auto& v : s2)
    
        if(--asciiCount[static_cast<unsigned char>(v)] == -1)
        
            return false;
        
    

    return true;

其中 STORAGE asciiCount =

map<char, int> storage;           // 738us 
unordered_map<char, int> storage; // 260us 
vector<int> storage(256);         // 43us

// g++ -std=c++17 -O3 -Wall -pedantic

这是我能达到的最快速度。

这些是使用 coliru 在线编译器 + 和 std::chrono::steady_clock::time_point 进行测量的粗略测试，但是它们给出了性能提升的一般概念。 vector 具有相同的性能，仅使用 256 个字节，尽管字符串的长度限制为 255 个字符（对于无符号字符计数，也更改为：--asciiCount[static_cast(v)] == 255）。假设矢量是最快的。一个改进是只分配一个 C 风格的数组 unsigned char asciiCount[256];在堆栈上（因为 STL 容器在堆上动态分配内存）您可以将此存储空间减少到 128 字节、64 甚至 32 字节（ascii 字符通常在 0..127 范围内，而 A-Z+az 64.127，以及大小写为 64..95 或 96 ...127）虽然不确定将其安装在缓存行或一半内会发现什么收益。

有更好的方法吗？为了速度、内存、代码优雅？

【讨论】：

【参考方案13】：

1。删除匹配字符的简单快速方法

bool checkAnagram(string s1, string s2) 
    for (char i : s1) 
        unsigned int pos = s2.find(i,0);
        if (pos != string::npos) 
            s2.erase(pos,1);
         else 
            return false;
        
    
    return s2.empty();

2。转换为素数。漂亮但非常昂贵，长字符串需要特殊的 Big Integer 类型。

// https://en.wikipedia.org/wiki/List_of_prime_numbers
int primes[255] = 2, 3, 5, 7, 11, 13, 17, 19, ... , 1613; 

bool checkAnagramPrimes(string s1, string s2) 
    long c1 = 1;
    for (char i : s1) 
        c1 = c1 * primes[i];
    
    long c2 = 1;
    for (char i : s2) 
        c2 = c2 * primes[i];
        if (c2 > c1) 
            return false;
        
    
    return c1 == c2;

【讨论】：

另见How to add 2 arbitrarily sized integers in C++?。它为 Botan、Crypto++ 和 OpenSSL 提供 C++ 大整数类。【参考方案14】：

    string key="listen";
    string key1="silent";
    string temp=key1;  
    int len=0;
    //assuming both strings are of equal length
    for (int i=0;i<key.length();i++)
            for (int j=0;j<key.length();j++)
                if(key[i]==temp[j])
                    len++;
                    temp[j] = ' ';//to deal with the duplicates
                    break;
                

            
        
cout << (len==key.length());    //if true: means the words are anagrams

【讨论】：

【参考方案15】：

而不是使用现代 c++ 中已弃用的 dot h 标头。

试试这个解决方案。

        #include <iostream>
        #include <string>
        #include <map>

        int main()
          std::string word_1 ;
          std::cout << "Enter first word: ";
          std::cin >> word_1;

          std::string word_2 ;
          std::cout << "Enter second word: ";
          std::cin >> word_2;

          if(word_1.length() == word_2.length())
            std::map<char, int> word_1_map;
            std::map<char, int> word_2_map;
            for(auto& c: word_1)
              word_1_map[std::tolower(c)]++;
            for(auto& c: word_2)
              word_2_map[std::tolower(c)]++;
            if(word_1_map == word_2_map)
              std::cout << "Anagrams" << std::endl;
            
            else
              std::cout << "Not Anagrams" << std::endl;
            
          else
            std::cout << "Length Mismatch" << std::endl;

【讨论】：

【参考方案16】：

#include <bits/stdc++.h>
using namespace std;
#define NO_OF_CHARS 256

int main()
   bool ans = true;
    string word1 = "rest";
    string word2 = "tesr";
    unordered_map<char,int>maps;
    for(int i = 0 ; i <5 ; i++)
    
        maps[word1[i]] +=1;

    
    for(int i = 0 ; i <5 ; i++)
    
        maps[word2[i]]-=1 ;

    

    for(auto i : maps)
    
        if(i.second!=0)
        
            ans = false;
        
    

    cout<<ans;

【讨论】：

如果您添加对修复内容的解释将会很有帮助：D @Gagandeep 您的解决方案的时间复杂度是多少？【参考方案17】：

Well if you don't want to sort than this code will give you perfect output. 
#include <iostream>
using namespace std;
int main()
string a="gf da";
string b="da gf";
int al,bl;
int counter =0;
al =a.length();
bl =b.length();
for(int i=0 ;i<al;i++)
    for(int j=0;j<bl;j++)
        if(a[i]==b[j])
            if(j!=bl)
                b[j]=b[b.length()-counter-1];
                bl--;
                counter++;
                break;
            else
                bl--;
                counter++;
            
                
        
    

if(counter==al)
    cout<<"true";

else
    cout<<"false";

return 0;

【讨论】：

【参考方案18】：

这是检查字谜的最简单、最快的方法

bool anagram(string a, string b) 
    int a_sum = 0, b_sum = 0, i = 0;
    while (a[i] != '\0') 
        a_sum += (int)a[i]; // (int) cast not necessary
        b_sum += (int)b[i];
        i++;
    
    return a_sum == b_sum;

只需添加 ASCII 值并检查总和是否相等。

例如：字符串 a = "nap" 和字符串 b = "pan"a_sum = 110 + 97 + 112 = 319b_sum = 112 + 97 + 110 = 319

【讨论】：

测试用例 "ad" vs "bc" 是 "97 + 100" vs "98 + 99" 怎么样。您的解决方案会给出假阳性结果好收获。如果我想出一个解决方案，我会编辑它。谢谢

以上是关于使用 C++ 检查两个字符串是不是是字谜的主要内容，如果未能解决你的问题，请参考以下文章