Is it possible to create a function dynamically, during runtime in C++?

Posted 2023-03-25

【Title】: Is it possible to create a function dynamically, during runtime in C++? 【Posted】: 2012-06-16 11:33:26 【Question】:

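C++ is a static, compiled language; templates are resolved at compile time, and so on.

But is it possible to create a function at run time that is not described in the source code and was not converted to machine language during compilation, so that a user can throw data at it that the source did not anticipate?

I am aware this cannot happen in a straightforward way, but surely it must be possible: there are plenty of programming languages that are not compiled and create that sort of thing dynamically, and they are implemented in C or C++.

Would this be achievable if factories were created for all primitive types, along with suitable data structures to organize them into more complex objects such as user types and functions?

Any information on the subject, as well as pointers to online material, is welcome. Thanks!

EDIT: I am aware that it is possible; I am mostly interested in the implementation details :)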

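【Comments on the question】:

Can you give an example of what you expect?

@DarenThomas That gets tricky when dealing with C++, though. The parser is not trivial.

@LuchianGrigore - The idea is not to parse code directly, but to have a visual editor for data structures and functions in which things can be tested (performance is not critical). The entire program structure can then be serialized to C++ code (each component "knows" how), which can be compiled in the conventional way. I have a vision of a new way of programming that is less about typing and more about visual and conceptual expression, but I need some kind of runtime to get it running before it is saved to C++ source and compiled. It does not need to be compiled directly to machine code, it just needs to run.

Modern operating systems usually don't allow you to allocate memory and then mark it as executable. While it certainly is possible (malware does it), I would use a scripting engine instead.

See also: c++ - How to generate and run native code dynamically? - Stack Overflow

【Answer 1】: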

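Yes, of course, and without any of the tools mentioned in the other answers: simply use the C++ compiler.

Just follow these steps from within your C++ program (on Linux, but it must be similar on other operating systems):

1. Write the C++ source into a file (e.g. /tmp/prog.cc), using ofstream.

2. Compile it with system("c++ /tmp/prog.cc -o /tmp/prog.so -shared -fPIC");

3. Load the resulting library dynamically, e.g. using dlopen().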
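The three steps above can be sketched roughly as follows. This is a minimal illustration, not the answerer's code: the function name compile_and_load, the generated symbol name "generated", and the int(int) signature are assumptions made for the example, and it presumes a Linux system with a c++ compiler driver on the PATH.

```cpp
#include <cstdlib>    // std::system
#include <dlfcn.h>    // dlopen, dlsym
#include <fstream>
#include <string>

using IntFn = int (*)(int);

// Hypothetical helper: wraps `body` in a one-function translation unit,
// compiles it to a shared object, loads it, and returns the function.
// Returns nullptr if any step fails (for example, no compiler installed).
IntFn compile_and_load(const std::string& body) {
    const std::string src = "/tmp/prog.cc";
    const std::string lib = "/tmp/prog.so";

    {   // Step 1: write the source; the scope closes the file before compiling.
        std::ofstream out(src);
        if (!out) return nullptr;
        // extern "C" keeps the symbol name unmangled so dlsym can find it.
        out << "extern \"C\" int generated(int x) { " << body << " }\n";
    }

    // Step 2: compile the file into a shared library.
    const std::string cmd = "c++ " + src + " -o " + lib + " -shared -fPIC";
    if (std::system(cmd.c_str()) != 0) return nullptr;

    // Step 3: load the library and look up the symbol.
    void* handle = dlopen(lib.c_str(), RTLD_NOW);
    if (!handle) return nullptr;
    // The handle is deliberately never passed to dlclose(): the returned
    // pointer refers to code that lives inside the loaded library.
    return reinterpret_cast<IntFn>(dlsym(handle, "generated"));
}
```

A caller would write something like compile_and_load("return x * 2;") and check the result for nullptr before calling it. On older glibc versions the program has to be linked with -ldl. The fixed /tmp paths are kept only to mirror the answer; real code would want unique temporary file names.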

【Comments】:

...assuming a compiler is installed on the target machine.

@MathieuRodic Don't worry, we'll statically link the compiler into our own program. :)

【Answer 2】:

You can also feed the bytecode directly to a function and cast it to a function type, as shown below.

For example:

byte[3] func = { 0x90, 0x0f, 0x1 }
*reinterpret_cast<void**>(&func)()

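【Comments】:

What a hack! How do you know the bytecode of any function?

Does this really work?

This doesn't work at all. The first problem is precedence: () binds tighter than *, so it parses as *( reinterpret_cast<void**>(&func)() ). This fails because you cannot call a void ** (it is not a function type). Fixing the precedence does not help: if you dereference first, it tries to call a void *, which is not a function type either. If you manage to correctly convert &func (the address of the array, presumably: byte[3] func is actually a syntax error) into the address of a function pointer (void (**)()) and dereference it, it will crash because...

... it now interprets the contents of the array (0x90, 0x0f, 0x1) as a function pointer, which is nonsense. If you want it to compile, you need something like unsigned char func[] = { 0x90, 0x0f, 0x1 }; reinterpret_cast<void (*)()>(func)(); (without the pointer-to-pointer business). It will probably still crash at run time, but at least it now asks the CPU to execute code from a byte array. (It will probably crash because, even if you have the right bytecode for your processor architecture, the stack and data segments are most likely not marked executable on any modern OS.)

@Jay I'm not splitting hairs. The declaration syntax (the least important part) is byte func[3], not byte[3] func. If you write &func, you get a pointer to the whole array. That is still a single level of indirection. If you try to dereference it twice (in your code: once with *, once with (), the function call operator), you end up treating the 3 bytes as a memory address. Even if you ignore all the precedence and type errors, this cannot work.

【Answer 3】: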

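Yes, JIT compilers do it all the time. They allocate a piece of memory that the OS has given execute permission, fill it with code, cast the pointer to a function pointer, and call it. Pretty simple.

EDIT: Here is an example of how to do it on Linux: http://burnttoys.blogspot.de/2011/04/how-to-allocate-executable-memory-on.html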
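As a rough sketch of that allocate, fill, cast sequence on a POSIX system: the code below is not taken from the linked post. The name make_constant_function is made up for this example, and the machine code bytes assume an x86 or x86-64 CPU. It maps the memory writable first and only afterwards switches it to executable, which is one common way to avoid pages that are writable and executable at the same time.

```cpp
#include <cstdint>
#include <cstring>     // std::memcpy
#include <sys/mman.h>  // mmap, mprotect, munmap

using IntFn = int (*)();

// Hypothetical helper: emits "mov eax, imm32; ret" into fresh memory and
// returns it as a callable function. Assumes x86 or x86-64.
// Returns nullptr if the OS refuses one of the steps.
IntFn make_constant_function(std::int32_t value) {
    unsigned char code[] = {
        0xb8, 0, 0, 0, 0,   // mov eax, imm32 (immediate filled in below)
        0xc3                // ret
    };
    std::memcpy(code + 1, &value, sizeof value);  // x86 is little-endian

    // 1. Ask the OS for writable (not yet executable) memory.
    void* mem = mmap(nullptr, sizeof code, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return nullptr;

    // 2. Fill it with machine code.
    std::memcpy(mem, code, sizeof code);

    // 3. Switch the pages from writable to executable.
    if (mprotect(mem, sizeof code, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, sizeof code);
        return nullptr;
    }

    // 4. Cast the data pointer to a function pointer.
    return reinterpret_cast<IntFn>(mem);
}
```

Calling make_constant_function(42)() should yield 42 on such a machine. The mapping is never released in this sketch; a real JIT would keep the address and size around and call munmap once the function is no longer needed.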

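【Comments】:

【Answer 4】:

Below is an example of C++ runtime compilation based on the method mentioned earlier (write the code to an output file, compile it via system(), load it via dlopen() and dlsym()). See also the example in a related question. The difference here is that it dynamically compiles a class rather than a function. This is achieved by adding a C-style maker() function to the code that is compiled dynamically. References:

https://www.linuxjournal.com/article/3687
http://www.tldp.org/HOWTO/C++-dlopen/thesolution.html

The example only works on Linux (Windows has the LoadLibrary and GetProcAddress functions instead) and requires the same compiler to be available on the target machine.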

baseclass.h

#ifndef BASECLASS_H
#define BASECLASS_H
class A
{
protected:
    double m_input;     // or use a pointer to a larger input object
public:
    virtual double f(double x) const = 0;
    void init(double input) { m_input=input; }
    virtual ~A() {};
};
#endif /* BASECLASS_H */

main.cpp

#include "baseclass.h"
#include <cstdlib>      // EXIT_FAILURE, etc
#include <string>
#include <iostream>
#include <fstream>
#include <dlfcn.h>      // dynamic library loading, dlopen() etc
#include <memory>       // std::shared_ptr
#include <sys/wait.h>   // WEXITSTATUS

// compile code, instantiate class and return pointer to base class
// https://www.linuxjournal.com/article/3687
// http://www.tldp.org/HOWTO/C++-dlopen/thesolution.html
// https://***.com/questions/11016078/
// https://***.com/questions/10564670/
std::shared_ptr<A> compile(const std::string& code)
{
    // temporary cpp/library output files
    std::string outpath="/tmp";
    std::string headerfile="baseclass.h";
    std::string cppfile=outpath+"/runtimecode.cpp";
    std::string libfile=outpath+"/runtimecode.so";
    std::string logfile=outpath+"/runtimecode.log";
    std::ofstream out(cppfile.c_str(), std::ofstream::out);

    // copy required header file to outpath
    std::string cp_cmd="cp " + headerfile + " " + outpath;
    system(cp_cmd.c_str());

    // add necessary header to the code
    std::string newcode =   "#include \"" + headerfile + "\"\n\n"
                            + code + "\n\n"
                            "extern \"C\" {\n"
                            "A* maker()\n"
                            "{\n"
                            "    return (A*) new B(); \n"
                            "}\n"
                            "} // extern C\n";

    // output code to file
    // (a failed open sets failbit, which bad() does not report)
    if(!out) {
        std::cout << "cannot open " << cppfile << std::endl;
        exit(EXIT_FAILURE);
    }
    out << newcode;
    out.flush();
    out.close();

    // compile the code
    // (system() runs /bin/sh, so use the portable "> file 2>&1" redirection)
    std::string cmd = "g++ -Wall -Wextra " + cppfile + " -o " + libfile
                      + " -O2 -shared -fPIC > " + logfile + " 2>&1";
    int ret = system(cmd.c_str());
    if(WEXITSTATUS(ret) != EXIT_SUCCESS) {
        std::cout << "compilation failed, see " << logfile << std::endl;
        exit(EXIT_FAILURE);
    }

    // load dynamic library
    void* dynlib = dlopen (libfile.c_str(), RTLD_LAZY);
    if(!dynlib) {
        std::cerr << "error loading library:\n" << dlerror() << std::endl;
        exit(EXIT_FAILURE);
    }

    // loading symbol from library and assign to pointer
    // (to be cast to function pointer later)
    void* create = dlsym(dynlib, "maker");
    const char* dlsym_error=dlerror();
    if(dlsym_error != NULL)  {
        std::cerr << "error loading symbol:\n" << dlsym_error << std::endl;
        exit(EXIT_FAILURE);
    }

    // execute "create" function
    // (casting to function pointer first)
    // https://***.com/questions/8245880/
    A* a = reinterpret_cast<A*(*)()> (create)();

    // cannot close dynamic lib here, because all functions of the class
    // object will still refer to the library code
    // dlclose(dynlib);

    return std::shared_ptr<A>(a);
}


int main(int argc, char** argv)
{
    double input=2.0;
    double x=5.1;
    // code to be compiled at run-time
    // class needs to be called B and derived from A
    std::string code =  "class B : public A {\n"
                        "    double f(double x) const \n"
                        "    {\n"
                        "        return m_input*x;\n"
                        "    }\n"
                        "};";

    std::cout << "compiling.." << std::endl;
    std::shared_ptr<A> a = compile(code);
    a->init(input);
    std::cout << "f(" << x << ") = " << a->f(x) << std::endl;

    return EXIT_SUCCESS;
}

Output

$ g++ -Wall -std=c++11 -O2 -c main.cpp -o main.o   # c++11 required for std::shared_ptr
$ g++ main.o -o main -ldl
$ ./main
compiling..
f(5.1) = 10.2

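【Comments】:

Why create the string newcode when it could be written directly to the stream without creating a temporary object?

【Answer 5】:

Besides simply using an embedded scripting language (Lua is great for embedding) or writing your own C++ compiler for use at run time, if you really want to use C++ you can just use an existing compiler.

For example, Clang is a C++ compiler built as a set of libraries that can easily be embedded in another program. It was designed for programs such as IDEs that need to analyze and manipulate C++ source code in various ways, but with the LLVM compiler infrastructure as its backend it can also generate code at run time and hand you a function pointer that you can call to run the generated code.

Clang LLVM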

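【Comments】:

【Answer 6】:

Have a look at libtcc; it is simple, fast, reliable, and suits your need. I use it whenever I need to compile C functions "on the fly".

In the archive you will find the file examples/libtcc_test.c, which can give you a good head start. This little tutorial might also help you: http://blog.mister-muffin.de/2011/10/22/discovering-tcc/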

#include <stdlib.h>
#include <stdio.h>
#include "libtcc.h"

int add(int a, int b) { return a + b; }

char my_program[] =
"int fib(int n) {\n"
"    if (n <= 2) return 1;\n"
"    else return fib(n-1) + fib(n-2);\n"
"}\n"
"int foobar(int n) {\n"
"    printf(\"fib(%d) = %d\\n\", n, fib(n));\n"
"    printf(\"add(%d, %d) = %d\\n\", n, 2 * n, add(n, 2 * n));\n"
"    return 1337;\n"
"}\n";

int main(int argc, char **argv)
{
    TCCState *s;
    int (*foobar_func)(int);
    void *mem;

    s = tcc_new();
    tcc_set_output_type(s, TCC_OUTPUT_MEMORY);
    tcc_compile_string(s, my_program);
    tcc_add_symbol(s, "add", add);

    mem = malloc(tcc_relocate(s, NULL));
    tcc_relocate(s, mem);

    foobar_func = tcc_get_symbol(s, "foobar");

    tcc_delete(s);

    printf("foobar returned: %d\n", foobar_func(32));

    free(mem);
    return 0;
}

Ask me in the comments if you run into any problems using the library!

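【Comments】:

【Answer 7】:

Essentially, you will need to write a C++ compiler inside your program (not a trivial task) and then do the same thing JIT compilers do to run the code. You were actually 90% of the way there with this paragraph:

"I am aware this cannot happen in a straightforward way, but surely it must be possible: there are plenty of programming languages that are not compiled and create that sort of thing dynamically, and they are implemented in C or C++."

Exactly: those programs carry an interpreter with them. You run a Python program by saying python MyProgram.py. Here python is compiled C code that is able to interpret and run your program on the fly. You would need to do something along those lines, but using a C++ compiler.

If you need dynamic functions that badly, use a different language :)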
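To make the "programs that carry an interpreter" idea concrete, here is a deliberately tiny sketch of an interpreter written in C++. It is an illustration added for this purpose, not something from the answer: the function name eval_rpn and the postfix (RPN) mini-language with a single variable x are assumptions chosen to keep the example short. The "function" is just a string that can arrive at run time.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Evaluates a postfix (RPN) expression such as "x x * 1 +" for a given x.
// Tokens are separated by whitespace: numbers, the variable x, and + - * /.
double eval_rpn(const std::string& program, double x) {
    std::vector<double> stack;
    std::istringstream tokens(program);
    std::string tok;
    while (tokens >> tok) {
        if (tok == "+" || tok == "-" || tok == "*" || tok == "/") {
            if (stack.size() < 2) throw std::runtime_error("stack underflow");
            const double rhs = stack.back(); stack.pop_back();
            const double lhs = stack.back(); stack.pop_back();
            switch (tok[0]) {
                case '+': stack.push_back(lhs + rhs); break;
                case '-': stack.push_back(lhs - rhs); break;
                case '*': stack.push_back(lhs * rhs); break;
                default:  stack.push_back(lhs / rhs); break;
            }
        } else if (tok == "x") {
            stack.push_back(x);               // the function's argument
        } else {
            stack.push_back(std::stod(tok));  // throws on a malformed number
        }
    }
    if (stack.size() != 1) throw std::runtime_error("malformed expression");
    return stack.back();
}
```

For instance, eval_rpn("x x * 1 +", 3.0) computes 3*3+1. Nothing here is compiled to machine code at run time; the host program simply walks the user's program, which is the trade-off the answer describes.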

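【Comments】:

【Answer 8】:

A typical approach is to combine a C++ project (or whatever language it is written in) with a scripting language. Lua is one of the top favorites, since it is well documented, small, and has bindings for a lot of languages.

But if you are not looking in that direction, perhaps you could consider using dynamic libraries?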

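【Comments】:

I can't find a reason for the downvote. Next time you downvote, please give a reason.

【Answer 9】:

Yes - you can write a C++ compiler, in C++, with some extra features: write your own functions, then compile and run them automatically (or not)...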

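【Comments】:

I am not looking to compile the objects that are created dynamically at run time into machine code, just to execute them, even if not with top performance and efficiency.

【Answer 10】:

Have a look at ExpressionTrees in .NET - I think this is basically what you want to achieve. Create a tree of subexpressions and then evaluate it. In an object-oriented design, each node can know how to evaluate itself by recursing into its child nodes. Your visual language would then create this tree, and you can write a simple interpreter to execute it.

Also, check out Ptolemy II as an example, in Java, of how such a visual programming language can be written.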
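A C++ counterpart of the expression tree idea described above might look like the sketch below. The class names (Expr, Constant, Variable, Binary) and the factory helpers are invented for this illustration and are not part of .NET or Ptolemy II; the tree handles only one variable and four operators.

```cpp
#include <memory>
#include <utility>

// Base node: every node knows how to evaluate itself for a given x.
struct Expr {
    virtual ~Expr() = default;
    virtual double eval(double x) const = 0;
};
using ExprPtr = std::unique_ptr<Expr>;

struct Constant : Expr {
    double value;
    explicit Constant(double v) : value(v) {}
    double eval(double) const override { return value; }
};

struct Variable : Expr {
    double eval(double x) const override { return x; }
};

struct Binary : Expr {
    char op;            // one of '+', '-', '*', '/'
    ExprPtr lhs, rhs;
    Binary(char o, ExprPtr l, ExprPtr r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
    double eval(double x) const override {
        const double a = lhs->eval(x);   // recurse into the children
        const double b = rhs->eval(x);
        switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            default:  return a / b;
        }
    }
};

// Small factory helpers so that a tree can be assembled at run time,
// for example by a visual editor.
ExprPtr constant(double v) { return ExprPtr(new Constant(v)); }
ExprPtr variable() { return ExprPtr(new Variable()); }
ExprPtr binary(char op, ExprPtr l, ExprPtr r) {
    return ExprPtr(new Binary(op, std::move(l), std::move(r)));
}
```

Building binary('+', binary('*', variable(), variable()), constant(1.0)) gives a tree for x*x+1 that can be evaluated with eval(x) as often as needed. Because every node is an ordinary object, the same tree could also be given a method that prints itself as C++ source, which is close to the serialization step the question's author describes in the comments.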

【Comments】:

【Answer 11】:

You could have a look at Runtime Compiled C++ (or see the RCC++ blog and videos), or perhaps try one of its alternatives.

【Comments】:

【Answer 12】:

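Expanding on Jay's answer using opcodes, the following works on Linux.

First, learn the opcodes from your compiler. Write your own myfunc.cpp, e.g.

double f(double x) { return x*x; }

Compile it:

$ g++ -O2 -c myfunc.cpp

Disassemble the function f:

$ gdb -batch -ex "file ./myfunc.o" -ex "set disassembly-flavor intel" -ex "disassemble/rs f"
Dump of assembler code for function _Z1fd:
   0x0000000000000000 <+0>:     f2 0f 59 c0     mulsd  xmm0,xmm0
   0x0000000000000004 <+4>:     c3      ret
End of assembler dump.

This means that x*x in assembly is mulsd xmm0,xmm0 followed by ret, and in machine code f2 0f 59 c0 c3.

The next step is to write your own function in machine code: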

opcode.cpp

#include <cstdlib>          // EXIT_FAILURE etc
#include <cstdio>           // printf(), fopen() etc
#include <cstring>          // memcpy()
#include <sys/mman.h>       // mmap()

// allocate memory and fill it with machine code instructions
// returns pointer to memory location and length in bytes
void* gencode(size_t& length)
{
    // machine code
    unsigned char opcode[] = {
        0xf2, 0x0f, 0x59, 0xc0,         // mulsd  xmm0,xmm0
        0xc3                            // ret
    };
    // allocate memory which allows code execution
    // https://en.wikipedia.org/wiki/NX_bit
    void* buf = mmap(NULL,sizeof(opcode),PROT_READ|PROT_WRITE|PROT_EXEC,
                     MAP_PRIVATE|MAP_ANON,-1,0);
    if(buf == MAP_FAILED) {
        perror("mmap");
        exit(EXIT_FAILURE);
    }
    // copy machine code to executable memory location
    memcpy(buf, opcode, sizeof(opcode));
    // return: pointer to memory location with executable code
    length = sizeof(opcode);
    return buf;
}

// print the disassembly of buf
void print_asm(const void* buf, size_t length)
{
    FILE* fp = fopen("/tmp/opcode.bin", "w");
    if(fp!=NULL) {
        fwrite(buf, length, 1, fp);
        fclose(fp);
    }
    system("objdump -D -M intel -b binary -mi386 /tmp/opcode.bin");
}

int main(int, char**)
{
    // generate machine code and point myfunc() to it
    size_t length;
    void* code=gencode(length);
    double (*myfunc)(double);   // function pointer
    myfunc = reinterpret_cast<double(*)(double)>(code);

    double x=1.5;
    printf("f(%f)=%f\n", x,myfunc(x));
    print_asm(code,length);     // for debugging
    return EXIT_SUCCESS;
}

$ g++ -O2 opcode.cpp -o opcode
$ ./opcode
f(1.500000)=2.250000

/tmp/opcode.bin:     file format binary


Disassembly of section .data:

00000000 <.data>:
   0:   f2 0f 59 c0             mulsd  xmm0,xmm0
   4:   c3                      ret

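【Comments】:

【Answer 13】:

If you are not after performance, the easiest solution is to embed a scripting language interpreter, e.g. for Lua or Python.

【Comments】:

I am not looking to embed a third-party interpreted language, but rather to create those facilities myself, tailored to my own needs.

-1. I don't think this answers the question. Nowhere did he ask "What languages support this?" He asked: "Can I do it in C++?"

Dear Vlad @Vlad, do you know of any open-source projects that embed Python? I would like to see one, thanks!

See AppsWithPythonScripting and Embedding Python in C/C++.

【Answer 14】:

It worked for me like this. You have to use the -fpermissive flag. I am using CodeBlocks 17.12.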

#include <cstddef>

using namespace std;
int main()
{
    char func[] = {'\x90', '\x0f', '\x1'};
    void (*func2)() = reinterpret_cast<void*>(&func);
    func2();
    return 0;
}

【Comments】: