Trouble with recursive file search

Posted: 2015-12-23 20:03:18

Question:

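I am trying to write a script that recursively scans a disk, returns the total number of directories and files, and shows the largest file found. I know there is plenty of information on this site and elsewhere on the web, but every example I have found so far seems to have trouble scanning a disk truly recursively. The script below uses the FindFirstFile and FindNextFile functions to scan for files. If the file attributes indicate that the function found a directory, the script writes its name to a list to be searched later. Once the current directory has been searched, the script takes the first item of the list (a directory still to be scanned), removes that item from the list, and scans that directory.

My problem is that after scanning a few hundred directories and subdirectories, the program ends with one of these errors: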

Unhandled exception at ..... (ntdll.dll) in File Lister 6.0.exe: .....: Stack overflow

Unhandled exception at ..... (ntdll.dll) in File Lister 6.0.exe: ......: Access violation writing location 0x00f30fe8.

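However, the directory being scanned when the crash happens is never the same.

I have been trying to solve this but cannot find a clear cause, so I would be grateful if anyone could help.

My code is below. I know it looks like beginner code, and I apologize for that.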

    #include <windows.h>
    #include <iostream>
    #include <fstream>
    #include <conio.h>
    #include <ctype.h>
    #include <string>
    #include <string.h>
    #include <stdio.h>
    #include <direct.h>
    #include <list>

    using namespace std;


    list <string> myList;
    std::list<string>::iterator it;

       int siZe;
       int siZekB;
       bool  bSearchSubdirectories=true;
       HANDLE handle;
       LPCTSTR strPattern;
       std::string temp;
       std::string temp1;
       std::string temp2;
       std::string temp3;
       WIN32_FIND_DATA search_data;
       int DIRPlace;
       int MaxDir=400000000;

       int telDIR=1;
       int telFILES=1;
       double LargestFile=0;
       string LargestFileName;
       string SDir;


    std::string string_to_hex(const std::string& input)
    {
        static const char* const lut = "0123456789ABCDEF";
        size_t len = input.length();

        std::string output;
        output.reserve(2 * len);
        for (size_t i = 0; i < len; ++i)
        {
            const unsigned char c = input[i];
            output.push_back(lut[c >> 4]);
            output.push_back(lut[c & 15]);
        }
        return output;
    }



    int SearchDirectory(string FileSearch ,string refvecFiles,
                        bool bSearchSubdirectories)
    {

       WIN32_FIND_DATA search_data;
       memset(&search_data, 0, sizeof(WIN32_FIND_DATA));
       HANDLE handle = FindFirstFile((refvecFiles+FileSearch).c_str(), &search_data);
       temp = refvecFiles;
       while(handle != INVALID_HANDLE_VALUE)
       {
               do
               {
                   if (search_data.cFileName[0]!='.')
                   {
                       temp2=search_data.dwFileAttributes;
                       temp3=string_to_hex(temp2);
                       DIRPlace=strlen(temp3.c_str())-1;
                       switch (temp3[DIRPlace-1])
                       {
                          case '1':    //Directory
                               temp = refvecFiles;
                               temp1=search_data.cFileName;
                               temp2=search_data.dwFileAttributes;
                               myList.push_back(temp+temp1);
                               telDIR++;
                               break;
                          default:   //Other types (Files etc)
                               siZe=(search_data.nFileSizeHigh * (MAXDWORD+1)) + search_data.nFileSizeLow;
                               siZekB=((search_data.nFileSizeHigh * (MAXDWORD+1)) + search_data.nFileSizeLow)/1024;
                               temp = refvecFiles;
                               temp1=search_data.cFileName;
                               temp2=search_data.dwFileAttributes;
                               if (siZekB>LargestFile)
                               {
                                  LargestFile=siZekB;
                                  LargestFileName=temp.substr(0, temp.size()) + "\\" + temp1;
                               }
                               telFILES++;
                               break;
                       }
                   }
               }
               while      (FindNextFile(handle, &search_data) != FALSE && telDIR<MaxDir);
               string line,SearD, LineFiller,FrontString, BackString;

               it=myList.begin();
               SearD=*it+"\\\\";
               myList.remove(*it);
               if (SearD.length()>60)
               {
                   FrontString=SearD.substr(0,10);
                   BackString=SearD.substr(SearD.length()-44);
                   LineFiller="......";
                   FrontString=FrontString+LineFiller+BackString;
               }
               else
               {
                   FrontString=SearD;
               }
               cout<<"Exploring:                                                              "<<"\r";
               cout<<"Exploring: "<<FrontString<<"\r\r";
               SearchDirectory("\\*",SearD, false);
       }
       FindClose(handle);
       return 0;
    }





    int main(int argc, char* argv[])
    {
        std::cout<< "Enter directory to be searched: ";
        getline(cin,SDir);
        std::cout<< "\n";
        try
        {
            SearchDirectory("\\*",SDir, false);
        }
        catch (int e)
        {
            cout << "An exception occurred. Exception Nr. " << e << '\n';
        }
        std::cout<< "\n"<<"\n";
        std::cout<<"Directories found: "<< telDIR<< "\n";
        std::cout<<"Files found: "<< telFILES<< "\n";
        std::cout<<"Largest File: "<<LargestFileName << " ("<< LargestFile << " kB)"<<"\n";
        std::cout<<"press any key";
        getch();
    }

Edit: below is a snapshot of the debugger call stack.

>   File Lister 6.0.exe!std::operator<<<std::char_traits<char> >(std::basic_ostream<char,std::char_traits<char> > & _Ostr=..., const char * _Val=0x000d51c8)  Line 791 + 0x20 bytes   C++
    File Lister 6.0.exe!SearchDirectory(std::basic_string<char,std::char_traits<char>,std::allocator<char> > FileSearch="\*", std::basic_string<char,std::char_traits<char>,std::allocator<char> > refvecFiles="c:\\boost\\numeric\\interval\\", bool bSearchSubdirectories=false)  Line 110 + 0x16 bytes   C++
    File Lister 6.0.exe!SearchDirectory(std::basic_string<char,std::char_traits<char>,std::allocator<char> > FileSearch="\*", std::basic_string<char,std::char_traits<char>,std::allocator<char> > refvecFiles="c:\\boost\\numeric\\conversion\\", bool bSearchSubdirectories=false)  Line 113  C++
    File Lister 6.0.exe!SearchDirectory(std::basic_string<char,std::char_traits<char>,std::allocator<char> > FileSearch="\*", std::basic_string<char,std::char_traits<char>,std::allocator<char> > refvecFiles="c:\\boost\\multi_index\\detail\\", bool bSearchSubdirectories=false)  Line 113  C++
    File Lister 6.0.exe!SearchDirectory(std::basic_string<char,std::char_traits<char>,std::allocator<char> > FileSearch="\*", std::basic_string<char,std::char_traits<char>,std::allocator<char> > refvecFiles="c:\\boost\\multiprecision\\traits\\", bool bSearchSubdirectories=false)  Line 113   C++
    File Lister 6.0.exe!SearchDirectory(std::basic_string<char,std::char_traits<char>,std::allocator<char> > FileSearch="\*", std::basic_string<char,std::char_traits<char>,std::allocator<char> > refvecFiles="c:\\boost\\multiprecision\\detail\\", bool bSearchSubdirectories=false)  Line 113   C++
    File Lister 6.0.exe!SearchDirectory(std::basic_string<char,std::char_traits<char>,std::allocator<char> > FileSearch="\*", std::basic_string<char,std::char_traits<char>,std::allocator<char> > refvecFiles="c:\\boost\\multiprecision\\cpp_int\\", bool bSearchSubdirectories=false)  Line 113  C++
    File Lister 6.0.exe!SearchDirectory(std::basic_string<char,std::char_traits<char>,std::allocator<char> > FileSearch="\*", std::basic_string<char,std::char_traits<char>,std::allocator<char> > refvecFiles="c:\\boost\\multiprecision\\concepts\\", bool bSearchSubdirectories=false)  Line 113 C++

The debugger stops in ostream:

    if (_State == ios_base::goodbit
        && _Ostr.rdbuf()->sputn(_Val, _Count) != _Count)
        _State |= ios_base::badbit;

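Comments:

People keep adding the C++ tag to your C++11 questions. Please take the hint!

If you are getting a crash, the first thing you should do is run your program in a debugger and let the debugger catch the crash as it happens. That tells you where the crash occurred, and you can examine the function call stack and walk up it to your own code (if the debugger has not already stopped there), then inspect the values of all variables. At the very least, you should edit your question to show us where in your code the crash happens, preferably with the values of the relevant variables.

Possible duplicate of "How do you iterate through every file/directory recursively in standard C++?"

Answer 1: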

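On the face of it, you have implemented an infinite recursion. It does not actually run forever, only until you run out of resources (here, stack space). Notably, at the bottom of SearchDirectory() you call SearchDirectory() with the same parameters it was originally called with. You need to make sure the recursion eventually stops.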

Answer 2:

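OK, I found the answer. It all came down to infinite recursion. In case anyone is interested, I have rewritten the code. It scans a 650 GB hard disk (over 750k files) in 100 seconds and returns some useful data: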

    #include <windows.h>
    #include <iostream>
    #include <fstream>
    #include <conio.h>
    #include <ctype.h>
    #include <string>
    #include <string.h>
    #include <stdio.h>
    #include <direct.h>
    #include <list>
    #include <vector>
    #include <ctime>
    #include <iomanip>

    using namespace std;

    vector <string> myList;

       int siZe;
       int siZekB;
       bool  bSearchSubdirectories=true;
       HANDLE handle;
       LPCTSTR strPattern;
       std::string temp;
       std::string temp1;
       std::string temp2;
       std::string temp3;
       WIN32_FIND_DATA search_data;
       int DIRPlace;

       int telDIR=0;
       int telFILES=0;
       int telFILESHidden=0;
       double LargestFile=0;
       double LargestFileHidden=0;
       double TotalUsed=0;
       double TotalUsedHidden=0;
       string LargestFileName;
       string LargestFileNameHidden;
       string SDir;
       string line,SearD, LineFiller,FrontString, BackString;
       int SearchDirectory(string FileSearch ,string refvecFiles,
                        bool bSearchSubdirectories);


    int main(int argc, char* argv[])
    {
        system("mode CON: COLS=140 LINES=22");
        std::cout<< "Enter directory to be explored: ";
        getline(cin,SDir);
        std::cout<< "\n";

        clock_t begin = clock();

        SearchDirectory("\\*.*",SDir, false);

        while (!myList.empty())
        {
            SearD=myList.back();
            myList.pop_back();

            SearchDirectory("\\*.*",SearD.c_str(), false);
        }

        clock_t end = clock();
        double elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;

        std::cout<< "\n"<<"\n";
        system("cls");
        std::cout<<"Directory explored in      : "<<elapsed_secs<<" seconds\n";
        std::cout<<"Explored (sub)directories  : "<< telDIR<< "\n";
        std::cout<<"Unexplored directories     : "<<myList.size()<<"\n\n";
        std::cout<<setprecision (2)<<fixed<<"Files found                : "<< telFILES<< " using "<<TotalUsed<<" kB storage\n";
        std::cout<<setprecision (2)<<fixed<<"Largest File               : "<<LargestFileName << " ("<< LargestFile << " kB)"<<"\n\n";

        std::cout<<setprecision (2)<<fixed<<"Hidden Files found         : "<< telFILESHidden<< " using "<<TotalUsedHidden<<" kB storage\n";
        std::cout<<setprecision (2)<<fixed<<"Largest Hidden File        : "<<LargestFileNameHidden << " ("<< LargestFileHidden << " kB)"<<"\n";

        std::cout<<"press any key";
        getch();
    }


    int SearchDirectory(string FileSearch ,string refvecFiles,
                        bool bSearchSubdirectories)
    {

       WIN32_FIND_DATA search_data;
       temp = refvecFiles+ "\\\\";
       refvecFiles += "\\\\*.*";

       HANDLE handle = FindFirstFile(refvecFiles.c_str(), &search_data);

       if (INVALID_HANDLE_VALUE == handle)
       {
           return 0;
       }

       do
       {
           if((search_data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY))      //Directory
           {
               string filePath =search_data.cFileName;
               if (strcmp(".", filePath.c_str()) && strcmp("..", filePath.c_str()))
               {
                   if (filePath == "$RECYCLE.BIN" || filePath == "$Recycle.Bin")
                       continue;
                   temp1=search_data.cFileName;
                   temp2=search_data.dwFileAttributes;
                   myList.push_back(temp+temp1);
                   telDIR++;
               }
           }
           else
           {
               siZe=(search_data.nFileSizeHigh * (MAXDWORD+1)) + search_data.nFileSizeLow;
               siZekB=((search_data.nFileSizeHigh * (MAXDWORD+1)) + search_data.nFileSizeLow)/1024;
               temp1=search_data.cFileName;
               temp2=search_data.dwFileAttributes;
               if ((search_data.dwFileAttributes & FILE_ATTRIBUTE_HIDDEN) == 0)
               {
                   TotalUsed=TotalUsed+siZekB;
                   if (siZekB>LargestFile)
                   {
                       LargestFile=siZekB;
                       LargestFileName=temp.substr(0, temp.size()) + "\\" + temp1;
                   }
                   telFILES++;
               }
               else
               {
                   TotalUsedHidden=TotalUsedHidden+siZekB;
                   if (siZekB>LargestFileHidden)
                   {
                       LargestFileHidden=siZekB;
                       LargestFileNameHidden=temp.substr(0, temp.size()) + "\\" + temp1;
                   }
                   telFILESHidden++;
               }
           }
       }
       while      (FindNextFile(handle, &search_data) != 0);
       FindClose(handle);
       return 0;
    }
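Two details of the code above may be worth a closer look. Testing dwFileAttributes with a bitmask, as this version does, recognizes a directory even when other attribute bits are set, which the hex-string comparison in the question could miss. The size expression is less safe: as far as I can tell, MAXDWORD+1 is evaluated in 32-bit unsigned arithmetic and wraps to 0, so the high half of the size would be dropped for files of 4 GB or more. The sketch below shows both operations in portable C++. The constants are local stand-ins that mirror the documented Windows values, and the function names are hypothetical.

```cpp
#include <cstdint>

// Local stand-ins for the Windows constants (assumed to match winnt.h);
// real code would use the macros from <windows.h> instead.
constexpr std::uint32_t kAttrHidden    = 0x00000002;  // FILE_ATTRIBUTE_HIDDEN
constexpr std::uint32_t kAttrDirectory = 0x00000010;  // FILE_ATTRIBUTE_DIRECTORY

// True when the directory bit is set, whatever other bits accompany it.
constexpr bool isDirectory(std::uint32_t attributes) {
    return (attributes & kAttrDirectory) != 0;
}

// True when the hidden bit is set.
constexpr bool isHidden(std::uint32_t attributes) {
    return (attributes & kAttrHidden) != 0;
}

// Combines the two 32-bit halves reported in WIN32_FIND_DATA
// (nFileSizeHigh, nFileSizeLow) into one 64-bit byte count.
// The cast happens before the shift so that no bits are lost.
constexpr std::uint64_t fileSize(std::uint32_t high, std::uint32_t low) {
    return (static_cast<std::uint64_t>(high) << 32) | low;
}
```

With these helpers, a size would be obtained as fileSize(search_data.nFileSizeHigh, search_data.nFileSizeLow) and stored in a 64-bit variable rather than an int.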
