How to sort file names with numbers and letters in order in C?

Posted 2023-03-27


【Posted】2012-12-01 04:09:40
【Question】

I have used the following code to sort the files alphabetically. It sorts them in plain strcmp() (lexicographic) order:

for (int i = 0; i < maxcnt; i++) {
    for (int j = i + 1; j < maxcnt; j++) {
        if (strcmp(Array[i], Array[j]) > 0) {
            strcpy(temp, Array[i]);
            strcpy(Array[i], Array[j]);
            strcpy(Array[j], temp);
        }
    }
}

But I need to sort them in the order you see in Windows Explorer.

How can I sort them that way? Please help.

【Comments】

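When comparing, first extract the numbers from both strings, then compare the numbers.

@Wimmel: That relies on all file names having the same prefix, which I don't think is necessarily the case. It also relies on every file name containing exactly one (not zero, not two) numeric component. Unrelated: why copy the strings to swap them? Just swap the pointers (there is also std::swap...).

Possible duplicate of How to get the sort order in Delphi as in Windows Explorer? The answers to that question are not Delphi-specific.

Not to blow my own trumpet too hard, but this rather obscure old file manager contains code that does this, and it is written in plain C. It's GPL.

【Answer 1】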

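For a C answer, here is a replacement for strcasecmp(). The function recursively handles strings made of alternating numeric and non-numeric substrings. You can use it with qsort():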

#include <ctype.h>   // isdigit, tolower
#include <stdlib.h>  // strtol

// qsort() passes pointers to the elements, so this works directly for a
// 2D array such as char names[N][LEN]; for an array of char *, dereference first.
int strcasecmp_withNumbers(const void *void_a, const void *void_b) {
   const char *a = void_a;
   const char *b = void_b;

   if (!a || !b)  // if one doesn't exist, other wins by default
      return a ? 1 : b ? -1 : 0;

   if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {  // if both start with numbers
      char *remainderA;
      char *remainderB;
      long valA = strtol(a, &remainderA, 10);
      long valB = strtol(b, &remainderB, 10);
      if (valA != valB)
         return valA < valB ? -1 : 1;  // avoid truncating a long difference to int
      // if you wish 7 == 007, comment out the next two lines
      else if (remainderB - b != remainderA - a) // equal with diff lengths
         return (remainderB - b) - (remainderA - a); // set 007 before 7
      else // if numerical parts equal, recurse
         return strcasecmp_withNumbers(remainderA, remainderB);
   }

   if (isdigit((unsigned char)*a) || isdigit((unsigned char)*b))  // if just one is a number
      return isdigit((unsigned char)*a) ? -1 : 1; // numbers always come first

   while (*a && *b) {  // non-numeric characters
      if (isdigit((unsigned char)*a) || isdigit((unsigned char)*b))
         return strcasecmp_withNumbers(a, b); // recurse
      if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
         return tolower((unsigned char)*a) - tolower((unsigned char)*b);
      a++;
      b++;
   }

   return *a ? 1 : *b ? -1 : 0;
}

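Notes:

Windows needs stricmp() instead of the Unix equivalent strcasecmp().

If the numbers are really big, the code above will (obviously) give incorrect results.

Leading zeros are ignored here. In my field this is a feature, not a bug: we usually want UAL0123 to match UAL123. But this may or may not be what you need.

See also Sort on a string that may contain a number and How to implement a natural sort algorithm in c++?, although the answers there (or in their links) are certainly long and rambling compared with the code above, at least about four times as long.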
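One way to sidestep the big-number caveat is to never convert digit runs to integers. Instead, skip leading zeros, treat the run with more significant digits as larger, and compare runs of equal length digit by digit. The sketch below takes that approach. It also shows the small adapter qsort() needs when the names are stored as an array of char * rather than a 2D char array. The names natcasecmp, natcasecmp_ptr and sort_names are made up for this illustration. The ordering rules (numbers before letters, "007" before "7") are meant to mirror the answer above. The sketch uses tolower() instead of stricmp()/strcasecmp(), so it should build unchanged on Windows and Unix.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: case-insensitive natural comparison that compares
   digit runs as strings, so numbers of any length are handled. */
static int natcasecmp(const char *a, const char *b)
{
    while (*a && *b) {
        unsigned char ca = (unsigned char)*a, cb = (unsigned char)*b;
        if (isdigit(ca) && isdigit(cb)) {
            const char *sa = a, *sb = b;
            while (*sa == '0') sa++;                   /* skip leading zeros */
            while (*sb == '0') sb++;
            const char *ea = sa, *eb = sb;
            while (isdigit((unsigned char)*ea)) ea++;  /* end of each digit run */
            while (isdigit((unsigned char)*eb)) eb++;
            ptrdiff_t na = ea - sa, nb = eb - sb;      /* significant digits */
            if (na != nb)
                return na < nb ? -1 : 1;               /* fewer digits = smaller value */
            for (; sa < ea; sa++, sb++)                /* same width: compare MSD first */
                if (*sa != *sb)
                    return *sa < *sb ? -1 : 1;
            if ((ea - a) != (eb - b))                  /* same value, different padding */
                return (ea - a) > (eb - b) ? -1 : 1;   /* "007" before "7" */
            a = ea;
            b = eb;
        } else if (isdigit(ca) || isdigit(cb)) {
            return isdigit(ca) ? -1 : 1;               /* numbers come first */
        } else {
            int ta = tolower(ca), tb = tolower(cb);
            if (ta != tb)
                return ta < tb ? -1 : 1;
            a++;
            b++;
        }
    }
    return *a ? 1 : *b ? -1 : 0;                       /* shorter string first */
}

/* qsort() passes pointers to the elements; for an array of char *
   each element is itself a pointer, so dereference once. */
static int natcasecmp_ptr(const void *x, const void *y)
{
    return natcasecmp(*(const char *const *)x, *(const char *const *)y);
}

/* Sort an array of string pointers in natural order. */
static void sort_names(const char **names, size_t n)
{
    qsort(names, n, sizeof *names, natcasecmp_ptr);
}
```

For example, sorting {"z100.txt", "z10.txt", "z2.txt", "Abc.txt", "10.txt", "1.txt"} with sort_names() should give 1.txt, 10.txt, Abc.txt, z2.txt, z10.txt, z100.txt.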


【Answer 2】

Natural sorting is the way to go. I have working code for my scenario; you can probably use it after adapting it to your needs:

    #ifndef JSW_NATURAL_COMPARE
    #define JSW_NATURAL_COMPARE
    #include <string>
    int natural_compare(const char *a, const char *b);
    int natural_compare(const std::string& a, const std::string& b);
    #endif
    #include <cctype>
    namespace {
      // Note: This is a convenience for the natural_compare 
      // function, it is *not* designed for general use
      class int_span {
        int _ws;
        int _zeros;
        const char *_value;
        const char *_end;
      public:
        int_span(const char *src)
        {
          const char *start = src;
          // Save and skip leading whitespace
          while (std::isspace(*(unsigned char*)src)) ++src;
          _ws = src - start;
          // Save and skip leading zeros
          start = src;
          while (*src == '0') ++src;
          _zeros = src - start;
          // Save the edges of the value
          _value = src;
          while (std::isdigit(*(unsigned char*)src)) ++src;
          _end = src;
        }
        bool is_int() const { return _value != _end; }
        const char *value() const { return _value; }
        int whitespace() const { return _ws; }
        int zeros() const { return _zeros; }
        int digits() const { return _end - _value; }
        int non_value() const { return whitespace() + zeros(); }
      };
      inline int safe_compare(int a, int b)
      {
        return a < b ? -1 : a > b;
      }
    }
    int natural_compare(const char *a, const char *b)
    {
      int cmp = 0;
      while (cmp == 0 && *a != '\0' && *b != '\0') {
        int_span lhs(a), rhs(b);
        if (lhs.is_int() && rhs.is_int()) {
          if (lhs.digits() != rhs.digits()) {
            // For differing widths (excluding leading characters),
            // the value with fewer digits takes priority
            cmp = safe_compare(lhs.digits(), rhs.digits());
          }
          else {
            int digits = lhs.digits();
            a = lhs.value();
            b = rhs.value();
            // For matching widths (excluding leading characters),
            // search from MSD to LSD for the larger value
            while (--digits >= 0 && cmp == 0)
              cmp = safe_compare(*a++, *b++);
          }
          if (cmp == 0) {
            // If the values are equal, we need a tie   
            // breaker using leading whitespace and zeros
            if (lhs.non_value() != rhs.non_value()) {
              // For differing widths of combined whitespace and 
              // leading zeros, the smaller width takes priority
              cmp = safe_compare(lhs.non_value(), rhs.non_value());
            }
            else {
              // For matching widths of combined whitespace 
              // and leading zeros, more whitespace takes priority
              cmp = safe_compare(rhs.whitespace(), lhs.whitespace());
            }
          }
        }
        else {
          // No special logic unless both spans are integers
          cmp = safe_compare(*a++, *b++);
        }
      }
      // All else being equal so far, the shorter string takes priority
      return cmp == 0 ? safe_compare(*a, *b) : cmp;
    }
    #include <string>
    int natural_compare(const std::string& a, const std::string& b)
    {
      return natural_compare(a.c_str(), b.c_str());
    }
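The tie-breaking in the code above differs from the first answer's strcasecmp_withNumbers(). When two numbers have the same value, the one with fewer leading zeros (plus whitespace) wins here, so 7 sorts before 007; the first answer puts 007 before 7. Below is a rough C transcription of just the int_span rules, applied to two strings that both start with a number. The names span, scan_span and compare_number_spans are hypothetical and do not come from the original code.

```c
#include <ctype.h>
#include <stddef.h>

/* Mirrors int_span: leading whitespace, leading zeros, significant digits. */
struct span { int ws, zeros, digits; const char *value; };

static struct span scan_span(const char *s)
{
    struct span sp = {0, 0, 0, NULL};
    while (isspace((unsigned char)*s)) { sp.ws++; s++; }
    while (*s == '0') { sp.zeros++; s++; }
    sp.value = s;
    while (isdigit((unsigned char)*s)) { sp.digits++; s++; }
    return sp;
}

static int cmp_int(int x, int y) { return (x > y) - (x < y); }

/* Rules in order: fewer significant digits first; equal width compares
   MSD to LSD; equal value puts fewer whitespace+zeros first; a remaining
   tie puts more whitespace first. */
static int compare_number_spans(const char *a, const char *b)
{
    struct span l = scan_span(a), r = scan_span(b);
    if (l.digits != r.digits)
        return cmp_int(l.digits, r.digits);
    for (int i = 0; i < l.digits; i++)
        if (l.value[i] != r.value[i])
            return cmp_int((unsigned char)l.value[i], (unsigned char)r.value[i]);
    if (l.ws + l.zeros != r.ws + r.zeros)
        return cmp_int(l.ws + l.zeros, r.ws + r.zeros);
    return cmp_int(r.ws, l.ws);
}
```

With these rules, compare_number_spans("7", "007") is negative, and " 07" sorts before "007" because both have two non-value characters but " 07" has more whitespace.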


【Answer 3】

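What you want to do is perform a "natural sort". Here is a blog post about it, which I believe explains an implementation in Python. Here is a Perl module that does it. There also seems to be a similar question: How to implement a natural sort algorithm in c++?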
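A common way to implement natural sort is to split each name into alternating runs of digits and non-digits and then compare the runs pairwise, comparing digit runs numerically. Below is a minimal C sketch of only the splitting step. The names next_chunk and count_chunks are illustrative and are not taken from the linked post or module.

```c
#include <ctype.h>
#include <stddef.h>

/* Length of the chunk starting at s: a maximal run of digits or of
   non-digits. Sets *is_number accordingly; returns 0 at end of string. */
static size_t next_chunk(const char *s, int *is_number)
{
    size_t n = 0;
    if (*s == '\0') { *is_number = 0; return 0; }
    *is_number = isdigit((unsigned char)*s) != 0;
    while (s[n] != '\0' && (isdigit((unsigned char)s[n]) != 0) == *is_number)
        n++;
    return n;
}

/* Number of chunks, e.g. "z100.txt" -> "z", "100", ".txt" -> 3. */
static size_t count_chunks(const char *s)
{
    size_t count = 0;
    int is_number;
    size_t len;
    while ((len = next_chunk(s, &is_number)) > 0) {
        count++;
        s += len;
    }
    return count;
}
```

A full comparator would walk both names chunk by chunk with next_chunk(). It would compare digit chunks by value (or by length, then digits) and text chunks with an ordinary string comparison.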


【Answer 4】

Given that this question has a c++ tag, you can build on @Joseph Quinsey's answer and create a natural_less function to pass to the standard library.

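#include <algorithm>
#include <string>
#include <vector>
using namespace std;

// Defined in @Joseph Quinsey's answer; assumes that file is compiled as C
extern "C" int strcasecmp_withNumbers(const void *void_a, const void *void_b);

bool natural_less(const string& lhs, const string& rhs)
{
    return strcasecmp_withNumbers(lhs.c_str(), rhs.c_str()) < 0;
}

void example(vector<string>& data)
{
    std::sort(data.begin(), data.end(), natural_less);
}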

I took the time to write some working code as an exercise: https://github.com/kennethlaskoski/natural_less


【Answer 5】

Adapted from this answer:

std::string toUpper(std::string s); // defined below

bool compareNat(const std::string& a, const std::string& b)
{
    if (a.empty())
        return !b.empty(); // "" is not less than "" (std::sort needs a strict ordering)
    if (b.empty())
        return false;
    if (std::isdigit((unsigned char)a[0]) && !std::isdigit((unsigned char)b[0]))
        return true;
    if (!std::isdigit((unsigned char)a[0]) && std::isdigit((unsigned char)b[0]))
        return false;
    if (!std::isdigit((unsigned char)a[0]) && !std::isdigit((unsigned char)b[0]))
    {
        if (a[0] == b[0])
            return compareNat(a.substr(1), b.substr(1));
        return (toUpper(a) < toUpper(b));
        //toUpper() is a function to convert a std::string to uppercase.
    }

    // Both strings begin with digit --> parse both numbers
    std::istringstream issa(a);
    std::istringstream issb(b);
    int ia, ib;
    issa >> ia;
    issb >> ib;
    if (ia != ib)
        return ia < ib;

    // Numbers are the same --> remove numbers and recurse
    std::string anew, bnew;
    std::getline(issa, anew);
    std::getline(issb, bnew);
    return (compareNat(anew, bnew));
}

The toUpper() function:

std::string toUpper(std::string s)
{
    for (int i = 0; i < (int)s.length(); i++)
        s[i] = std::toupper((unsigned char)s[i]);
    return s;
}

Usage:

#include <iostream> // std::cout
#include <string>
#include <algorithm> // std::sort, std::copy
#include <iterator> // std::ostream_iterator
#include <sstream> // std::istringstream
#include <vector>
#include <cctype> // std::isdigit

// toUpper() and compareNat() from above go here, after the includes

int main()
{
    std::vector<std::string> str;
    str.push_back("20.txt");
    str.push_back("10.txt");
    str.push_back("1.txt");
    str.push_back("z2.txt");
    str.push_back("z10.txt");
    str.push_back("z100.txt");
    str.push_back("1_t.txt");
    str.push_back("abc.txt");
    str.push_back("Abc.txt");
    str.push_back("bcd.txt");

    std::sort(str.begin(), str.end(), compareNat);
    std::copy(str.begin(), str.end(),
              std::ostream_iterator<std::string>(std::cout, "\n"));
    return 0;
}

【Comments】

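This one isn't very efficient; a more efficient and more complete solution is this one.

The solution above works, but it still puts un-numbered files with the same name at the bottom. That is not the result we want.

【Answer 6】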

Your problem is that you attach a meaning to certain parts of the file names.

Lexicographically, Slide1 comes before Slide10, which comes before Slide5.

You want Slide5 to come before Slide10 because you interpret the substrings 5 and 10 as integers.

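You will run into more problems if the file names contain month names and you want them sorted by date (i.e. January before August). You have to adapt the sort to that interpretation: the "natural" order depends on how you interpret the names, and there is no universal solution.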
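To illustrate sorting by such an interpretation, here is a sketch of a comparator that orders names by a leading English month name. It assumes the month is spelled out in full at the very start of the name. month_index and month_cmp are hypothetical helpers, and putting non-month names last is an arbitrary choice.

```c
#include <ctype.h>
#include <stddef.h>

/* Index 0..11 of a full English month name at the start of s
   (case-insensitive prefix match), or -1 if none matches. */
static int month_index(const char *s)
{
    static const char *const months[12] = {
        "january", "february", "march", "april", "may", "june",
        "july", "august", "september", "october", "november", "december"
    };
    for (int m = 0; m < 12; m++) {
        size_t i = 0;
        /* stops at s's terminating '\0' because no month contains it */
        while (months[m][i] != '\0' &&
               tolower((unsigned char)s[i]) == months[m][i])
            i++;
        if (months[m][i] == '\0')
            return m;
    }
    return -1;
}

/* Order by calendar month; names without a leading month sort last. */
static int month_cmp(const char *a, const char *b)
{
    int ma = month_index(a), mb = month_index(b);
    if (ma < 0) ma = 12;
    if (mb < 0) mb = 12;
    return (ma > mb) - (ma < mb);
}
```

In a real comparator, month_cmp() would typically be one step, with ties broken by a natural or plain string comparison of the rest of the name.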

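Another approach is to format the file names so that your ordering and the lexicographic order agree. In your case, you would use fixed-width numbers with leading zeros. Slide1 becomes Slide01, and sorting the names lexicographically then gives the result you want.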
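The zero-padding idea also works without renaming anything: build a padded sort key in memory and compare the keys with strcmp(). The sketch below assumes that no digit run is longer than the chosen width (10 in padded_cmp) and that names fit in 256 bytes; otherwise padded_cmp falls back to plain strcmp(). padded_key and padded_cmp are hypothetical names.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy name into key, left-padding every digit run with zeros to `width`
   characters ("Slide5" -> "Slide005" for width 3). Returns 0 on success,
   -1 if a digit run is wider than `width` or key is too small. */
static int padded_key(const char *name, size_t width, char *key, size_t keysize)
{
    size_t out = 0;
    while (*name) {
        size_t run = 0;
        while (isdigit((unsigned char)name[run]))
            run++;
        if (run > 0) {
            if (run > width)
                return -1;
            for (size_t i = run; i < width; i++) {     /* padding zeros */
                if (out + 1 >= keysize) return -1;
                key[out++] = '0';
            }
            for (size_t i = 0; i < run; i++) {         /* the digits themselves */
                if (out + 1 >= keysize) return -1;
                key[out++] = name[i];
            }
            name += run;
        } else {
            if (out + 1 >= keysize) return -1;
            key[out++] = *name++;
        }
    }
    if (keysize == 0) return -1;
    key[out] = '\0';
    return 0;
}

/* Compare two names by their padded keys. */
static int padded_cmp(const char *a, const char *b)
{
    char ka[256], kb[256];
    if (padded_key(a, 10, ka, sizeof ka) != 0 ||
        padded_key(b, 10, kb, sizeof kb) != 0)
        return strcmp(a, b);                           /* fall back to plain order */
    return strcmp(ka, kb);
}
```

As with renaming, Slide01 and Slide1 end up with identical keys, so they compare equal.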

However, you often cannot influence an application's output, so you cannot enforce your format directly.

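What I would do in these cases is write a small script or function that renames the files to a suitable format, then sort them with a standard sorting algorithm. The advantage is that you don't need to adapt your sort and can use existing software for the sorting. The disadvantage is that this is not feasible in some cases (because the file names have to stay fixed).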

【Comments】

In my case, the file names are fixed.
