How to list on-the-fly all the functions/symbols available in C code on a Linux architecture?

【Posted】: 2013-03-24 15:17:31

【Question】:

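Assume main.c uses symbols from shared libraries as well as local functions declared in main.c.

Is there an elegant way to print a list of all available function names and symbols at run time?

It should be possible, since the data is loaded into the .code segment.

【Question comments】:

Not with a C library function, but an API that works with the import/export sections can be used. C functions can be used if they appear as labels in the code section.

【Answer 1】: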

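Since I also needed to retrieve all loaded symbol names at run time, I did some research based on R..'s answer. Here is a detailed solution for Linux shared libraries in ELF format. It works with my gcc 4.3.4, and hopefully with newer versions too.

I mainly used the following resources to develop this solution:

    ELF manpage
    Some sample code (found while searching for "dl_iterate_phdr")

Here is my code. I used self-explanatory variable names and added detailed comments to make it easy to follow. If anything is wrong or missing, please let me know... (Edit: I just realized that the question is about C, while my code is C++. If you leave out the vector and the string, it should work for C as well.)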

#include <link.h>
#include <string>
#include <vector>

using namespace std;

/* Callback for dl_iterate_phdr.
 * Is called by dl_iterate_phdr for every loaded shared lib until something
 * other than 0 is returned by one call of this function.
 */
int retrieve_symbolnames(struct dl_phdr_info* info, size_t info_size, void* symbol_names_vector)
{
    /* ElfW is a macro that creates proper typenames for the used system architecture
     * (e.g. on a 32 bit system, ElfW(Dyn*) becomes "Elf32_Dyn*") */
    ElfW(Dyn*) dyn;
    ElfW(Sym*) sym = 0;
    ElfW(Word*) hash;

    char* strtab = 0;
    char* sym_name = 0;
    ElfW(Word) sym_cnt = 0;

    /* the void pointer (3rd argument) should be a pointer to a vector<string>
     * in this example -> cast it to make it usable */
    vector<string>* symbol_names = reinterpret_cast<vector<string>*>(symbol_names_vector);

    /* Iterate over all headers of the current shared lib
     * (first call is for the executable itself) */
    for (size_t header_index = 0; header_index < info->dlpi_phnum; header_index++)
    {
        /* Further processing is only needed if the dynamic section is reached */
        if (info->dlpi_phdr[header_index].p_type == PT_DYNAMIC)
        {
            /* Get a pointer to the first entry of the dynamic section.
             * Its address is the shared lib's address + the virtual address */
            dyn = (ElfW(Dyn)*)(info->dlpi_addr + info->dlpi_phdr[header_index].p_vaddr);

            /* Iterate over all entries of the dynamic section until the
             * end of the dynamic section is reached. This is indicated by
             * an entry with d_tag == DT_NULL.
             *
             * Only the following entries need to be processed to find the
             * symbol names:
             *  - DT_HASH   -> second word of the hash is the number of symbols
             *  - DT_STRTAB -> pointer to the beginning of a string table that
             *                 contains the symbol names
             *  - DT_SYMTAB -> pointer to the beginning of the symbols table
             */
            while (dyn->d_tag != DT_NULL)
            {
                if (dyn->d_tag == DT_HASH)
                {
                    /* Get a pointer to the hash */
                    hash = (ElfW(Word*))dyn->d_un.d_ptr;

                    /* The 2nd word is the number of symbols */
                    sym_cnt = hash[1];
                }
                else if (dyn->d_tag == DT_STRTAB)
                {
                    /* Get the pointer to the string table */
                    strtab = (char*)dyn->d_un.d_ptr;
                }
                else if (dyn->d_tag == DT_SYMTAB)
                {
                    /* Get the pointer to the first entry of the symbol table */
                    sym = (ElfW(Sym*))dyn->d_un.d_ptr;
                }

                /* move pointer to the next entry */
                dyn++;
            }

            /* Iterate over the symbol table only after the whole dynamic
             * section has been read, so the result does not depend on the
             * order of the DT_HASH, DT_STRTAB and DT_SYMTAB entries. */
            if (sym != 0 && strtab != 0)
            {
                for (ElfW(Word) sym_index = 0; sym_index < sym_cnt; sym_index++)
                {
                    /* get the name of the i-th symbol.
                     * This is located at the address of st_name
                     * relative to the beginning of the string table. */
                    sym_name = &strtab[sym[sym_index].st_name];

                    symbol_names->push_back(string(sym_name));
                }
            }
        }
    }

    /* Returning something != 0 stops further iterations,
     * since only the first entry, which is the executable itself, is needed
     * 1 is returned after processing the first entry.
     *
     * If the symbols of all loaded dynamic libs shall be found,
     * the return value has to be changed to 0.
     */
    return 1;
}

int main()
{
    vector<string> symbolNames;
    dl_iterate_phdr(retrieve_symbolnames, &symbolNames);

    return 0;
}
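
The callback above collects every entry of the dynamic symbol table, which includes data objects as well as functions. If only function names are wanted, each entry's type can be checked before its name is stored. The following is a minimal sketch under assumptions: the helper names is_function and is_defined_here are made up for illustration and are not part of the answer, and it relies on the ELF32_ST_TYPE macro from glibc's <elf.h>, which extracts the same low four bits of st_info as the 64-bit variant.

```cpp
#include <elf.h>
#include <link.h>

// Hypothetical helper (not from the answer): true if the symbol table
// entry describes a function. The symbol type lives in the low four bits
// of st_info for both 32-bit and 64-bit ELF.
bool is_function(const ElfW(Sym)& symbol)
{
    return ELF32_ST_TYPE(symbol.st_info) == STT_FUNC;
}

// Hypothetical helper: true if the symbol is defined in this module
// rather than imported from another one (imports use SHN_UNDEF).
bool is_defined_here(const ElfW(Sym)& symbol)
{
    return symbol.st_shndx != SHN_UNDEF;
}
```

Inside the symbol loop, a check such as `if (is_function(sym[sym_index]))` before the push_back would keep only function names.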

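【Comments】:

Using DT_HASH to get the symbol count seems unreliable. When I ran the code above, interestingly there was never a DT_HASH entry. Also, symbol_count should be initialized to 0, otherwise shenanigans ensue.

It turns out that you may get DT_GNU_HASH instead of DT_HASH. Does anyone know how to get sym_cnt from the GNU hash?

@justin.m.chase: DT_GNU_HASH offers no easy way to get the symbol count short of walking all the hash buckets and counting. You can see my code here: git.musl-libc.org/cgit/musl/tree/src/ldso/…

Thanks for this snippet. I found that calling it from a .so itself gives you the relevant DT_GNU_HASH, and that you need code more like github.com/axlecrusher/hgengine3/blob/C++/Mercury3/…, which by itself did not solve all of my problems.

【Answer 2】: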

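On dynamically linked ELF-based systems, you may have a function dl_iterate_phdr available. If so, it can be used to gather information on each loaded shared library file, and the information you get is sufficient to examine the symbol tables. The process is basically:

    1. Get the address of the program headers from the dl_phdr_info structure passed back to you.
    2. Use the PT_DYNAMIC program header to find the _DYNAMIC table for the module.
    3. Use the DT_SYMTAB, DT_STRTAB, and DT_HASH entries of _DYNAMIC to find the list of symbols. DT_HASH is only needed to get the length of the symbol table, since it does not seem to be stored anywhere else.

The types you need should all be in <elf.h> and <link.h>.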
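
The first two steps can be sketched as follows. This is only an illustration under assumptions: the ModuleInfo struct and the function names collect_module and list_loaded_modules are invented for the example, and the code assumes glibc's dl_iterate_phdr as declared in <link.h>. It records each loaded module's name, its load address, and whether it has a PT_DYNAMIC program header. It does not parse the symbol tables.

```cpp
#include <link.h>
#include <string>
#include <vector>

// Hypothetical record type for this sketch.
struct ModuleInfo
{
    std::string name;   // empty for the main executable
    ElfW(Addr) base;    // load address (dlpi_addr)
    bool has_dynamic;   // true if a PT_DYNAMIC program header exists
};

// Callback invoked by dl_iterate_phdr once per loaded module.
static int collect_module(struct dl_phdr_info* info, size_t, void* data)
{
    std::vector<ModuleInfo>* modules = static_cast<std::vector<ModuleInfo>*>(data);

    ModuleInfo module;
    module.name = info->dlpi_name ? info->dlpi_name : "";
    module.base = info->dlpi_addr;
    module.has_dynamic = false;

    // Scan the program headers for the dynamic segment.
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; ++i)
    {
        if (info->dlpi_phdr[i].p_type == PT_DYNAMIC)
        {
            module.has_dynamic = true;
        }
    }

    modules->push_back(module);
    return 0; // 0 means: continue with the next module
}

std::vector<ModuleInfo> list_loaded_modules()
{
    std::vector<ModuleInfo> modules;
    dl_iterate_phdr(collect_module, &modules);
    return modules;
}
```

Calling list_loaded_modules() in a dynamically linked program should return the executable first, followed by the shared libraries it has loaded.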

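【Comments】:

What about symbols that are not dynamically linked? Or is something like libc a shared library too?

Yes, assuming you are using dynamic linking, "libc" is a shared library, and you can get its symbol table this way as well.

【Answer 3】: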

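This is not really a C-specific question. It depends on the operating system, on the binary format, and (for debug symbols and unmangled C++ symbol names) even on the compiler. There is no generic way, and no truly elegant way either.

The most portable and future-proof way is probably to run an external program such as nm, which is specified by POSIX. The GNU version found on Linux probably has many extensions, which you should avoid if you aim for portability and future-proofing.

Its output should stay stable, and even if the binary format changes, it will be updated and keep working. Just run it with the right switches, capture its output (probably by running it through popen to avoid a temporary file), and parse that.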
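
A minimal sketch of this approach, under assumptions: it uses nm's POSIX -P option, whose output lines have the form "name type value size" (value and size may be missing for undefined symbols), and it assumes nm is on the PATH and the path contains no single quote. The names NmSymbol, parse_nm_line and run_nm are made up for this example.

```cpp
#include <cstdio>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type for this sketch.
struct NmSymbol
{
    std::string name;
    char type; // e.g. 'T' for a text (code) symbol, 'U' for undefined
};

// Parse one line of `nm -P` output. Returns false for lines that do not
// start with a name followed by a one-letter type.
bool parse_nm_line(const std::string& line, NmSymbol* out)
{
    std::istringstream fields(line);
    std::string name, type;
    if (!(fields >> name >> type) || type.size() != 1)
    {
        return false;
    }
    out->name = name;
    out->type = type[0];
    return true;
}

// Run nm on the given file through popen and collect the parsed symbols.
std::vector<NmSymbol> run_nm(const std::string& path)
{
    std::vector<NmSymbol> symbols;
    const std::string command = "nm -P '" + path + "' 2>/dev/null";

    FILE* pipe = popen(command.c_str(), "r");
    if (!pipe)
    {
        return symbols;
    }

    char* line = 0;
    size_t capacity = 0;
    while (::getline(&line, &capacity, pipe) != -1)
    {
        NmSymbol symbol;
        if (parse_nm_line(line, &symbol))
        {
            symbols.push_back(symbol);
        }
    }
    free(line);
    pclose(pipe);
    return symbols;
}
```

On Linux, run_nm("/proc/self/exe") would inspect the running program itself. That path is Linux-specific, and a stripped binary will yield no symbols.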

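【Answer 4】:

I updated the code from Kanalpiroge's answer so that it also works when DT_HASH is missing (e.g. on RHEL). It is written for 64-bit, but modifying it to support 32-bit should be relatively easy. Inspiration came from here: https://chromium-review.googlesource.com/c/crashpad/crashpad/+/876879/18/snapshot/elf/elf_image_reader.cc#b512.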

#include <link.h>
#include <cstdint>
#include <string>
#include <vector>

using namespace std;

static uint32_t GetNumberOfSymbolsFromGnuHash(Elf64_Addr gnuHashAddress)
{
    // See https://flapenguin.me/2017/05/10/elf-lookup-dt-gnu-hash/ and
    // https://sourceware.org/ml/binutils/2006-10/msg00377.html
    typedef struct
    {
        uint32_t nbuckets;
        uint32_t symoffset;
        uint32_t bloom_size;
        uint32_t bloom_shift;
    } Header;

    const Header* header = (const Header*)gnuHashAddress;
    // Byte pointers are used because arithmetic on void* is a GNU extension.
    const uint8_t* bucketsAddress = (const uint8_t*)gnuHashAddress + sizeof(Header) + (sizeof(uint64_t) * header->bloom_size);

    // Locate the chain that handles the largest index bucket.
    uint32_t lastSymbol = 0;
    const uint32_t* bucketAddress = (const uint32_t*)bucketsAddress;
    for (uint32_t i = 0; i < header->nbuckets; ++i)
    {
        uint32_t bucket = *bucketAddress;
        if (lastSymbol < bucket)
        {
            lastSymbol = bucket;
        }
        bucketAddress++;
    }

    if (lastSymbol < header->symoffset)
    {
        return header->symoffset;
    }

    // Walk the bucket's chain to add the chain length to the total.
    const uint8_t* chainBaseAddress = bucketsAddress + (sizeof(uint32_t) * header->nbuckets);
    for (;;)
    {
        const uint32_t* chainEntry = (const uint32_t*)(chainBaseAddress + (lastSymbol - header->symoffset) * sizeof(uint32_t));
        lastSymbol++;

        // If the low bit is set, this entry is the end of the chain.
        if (*chainEntry & 1)
        {
            break;
        }
    }

    return lastSymbol;
}

/* Callback for dl_iterate_phdr.
 * Is called by dl_iterate_phdr for every loaded shared lib until something
 * other than 0 is returned by one call of this function.
 */
int retrieve_symbolnames(struct dl_phdr_info* info, size_t info_size, void* symbol_names_vector)
{
    /* ElfW is a macro that creates proper typenames for the used system architecture
     * (e.g. on a 32 bit system, ElfW(Dyn*) becomes "Elf32_Dyn*") */
    ElfW(Dyn*) dyn;
    ElfW(Sym*) sym = 0;
    ElfW(Word*) hash;

    char* strtab = 0;
    char* sym_name = 0;
    ElfW(Word) sym_cnt = 0;

    /* the void pointer (3rd argument) should be a pointer to a vector<string>
     * in this example -> cast it to make it usable */
    vector<string>* symbol_names = reinterpret_cast<vector<string>*>(symbol_names_vector);

    /* Iterate over all headers of the current shared lib
     * (first call is for the executable itself) */
    for (size_t header_index = 0; header_index < info->dlpi_phnum; header_index++)
    {
        /* Further processing is only needed if the dynamic section is reached */
        if (info->dlpi_phdr[header_index].p_type == PT_DYNAMIC)
        {
            /* Get a pointer to the first entry of the dynamic section.
             * Its address is the shared lib's address + the virtual address */
            dyn = (ElfW(Dyn)*)(info->dlpi_addr + info->dlpi_phdr[header_index].p_vaddr);

            /* Iterate over all entries of the dynamic section until the
             * end of the dynamic section is reached. This is indicated by
             * an entry with d_tag == DT_NULL.
             *
             * Only the following entries need to be processed to find the
             * symbol names:
             *  - DT_HASH     -> second word of the hash is the number of symbols
             *  - DT_GNU_HASH -> used to count the symbols if DT_HASH is missing
             *  - DT_STRTAB   -> pointer to the beginning of a string table that
             *                   contains the symbol names
             *  - DT_SYMTAB   -> pointer to the beginning of the symbols table
             */
            while (dyn->d_tag != DT_NULL)
            {
                if (dyn->d_tag == DT_HASH)
                {
                    /* Get a pointer to the hash */
                    hash = (ElfW(Word*))dyn->d_un.d_ptr;

                    /* The 2nd word is the number of symbols */
                    sym_cnt = hash[1];
                }
                else if (dyn->d_tag == DT_GNU_HASH && sym_cnt == 0)
                {
                    sym_cnt = GetNumberOfSymbolsFromGnuHash(dyn->d_un.d_ptr);
                }
                else if (dyn->d_tag == DT_STRTAB)
                {
                    /* Get the pointer to the string table */
                    strtab = (char*)dyn->d_un.d_ptr;
                }
                else if (dyn->d_tag == DT_SYMTAB)
                {
                    /* Get the pointer to the first entry of the symbol table */
                    sym = (ElfW(Sym*))dyn->d_un.d_ptr;
                }

                /* move pointer to the next entry */
                dyn++;
            }

            /* Iterate over the symbol table only after the whole dynamic
             * section has been read, so the result does not depend on the
             * order of the hash, DT_STRTAB and DT_SYMTAB entries. */
            if (sym != 0 && strtab != 0)
            {
                for (ElfW(Word) sym_index = 0; sym_index < sym_cnt; sym_index++)
                {
                    /* get the name of the i-th symbol.
                     * This is located at the address of st_name
                     * relative to the beginning of the string table. */
                    sym_name = &strtab[sym[sym_index].st_name];

                    symbol_names->push_back(string(sym_name));
                }
            }
        }
    }

    /* Returning something != 0 stops further iterations,
     * since only the first entry, which is the executable itself, is needed
     * 1 is returned after processing the first entry.
     *
     * If the symbols of all loaded dynamic libs shall be found,
     * the return value has to be changed to 0.
     */
    return 1;
}

int main()
{
    vector<string> symbolNames;
    dl_iterate_phdr(retrieve_symbolnames, &symbolNames);

    return 0;
}
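
For the 32-bit case mentioned above, one possible adaptation is to size the bloom filter words with ElfW(Addr) instead of a fixed uint64_t. The sketch below is an assumption-laden variant, not the answer's code: the function name count_symbols_gnu_hash is invented, and it assumes the DT_GNU_HASH layout described in the links cited in the code (a four-word header, then bloom words of native address size, then the buckets, then the chains).

```cpp
#include <link.h>
#include <cstdint>

// Hypothetical helper: number of entries in the dynamic symbol table,
// derived from a DT_GNU_HASH table on either a 32-bit or 64-bit system.
uint32_t count_symbols_gnu_hash(const void* gnu_hash)
{
    const uint32_t* words = static_cast<const uint32_t*>(gnu_hash);
    const uint32_t nbuckets = words[0];
    const uint32_t symoffset = words[1];
    const uint32_t bloom_size = words[2];
    // words[3] is bloom_shift, which is not needed for counting.

    // Each bloom word has the native address size (4 or 8 bytes).
    const uint32_t bloom_words32 =
        bloom_size * static_cast<uint32_t>(sizeof(ElfW(Addr)) / sizeof(uint32_t));
    const uint32_t* buckets = words + 4 + bloom_words32;
    const uint32_t* chains = buckets + nbuckets;

    // Find the highest symbol index that starts a bucket.
    uint32_t last = 0;
    for (uint32_t i = 0; i < nbuckets; ++i)
    {
        if (buckets[i] > last)
        {
            last = buckets[i];
        }
    }

    // No hashed symbols at all: only the unhashed ones before symoffset exist.
    if (last < symoffset)
    {
        return symoffset;
    }

    // Follow that bucket's chain; the low bit marks its final entry.
    while ((chains[last - symoffset] & 1) == 0)
    {
        ++last;
    }
    return last + 1;
}
```

In the callback, this could stand in for GetNumberOfSymbolsFromGnuHash by passing `(const void*)dyn->d_un.d_ptr`.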

【Answer 5】:

It should be dl_iterate_phdr(retrieve_symbolnames, &symbolNames);
