pybind11: send MPI communicator from Python to CPP

Posted: 2022-01-22 04:02:21

Question:

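I have a C++ class that I intend to call from Python through the mpi4py interface, so that each node instantiates the class. On the C++ side, I am using the Open MPI library (installed via Homebrew) and pybind11.

The C++ class is as follows: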

#include <pybind11/pybind11.h>
#include <iostream>
#include <chrono>
#include <thread>
#include <vector>
#include <mpi.h>
// #define PyMPI_HAVE_MPI_Message 1
// #include <mpi4py/mpi4py.h>


namespace py = pybind11;

class SomeComputation
{
    float multiplier;
    std::vector<int> test;
    MPI_Comm comm_;

public:
    // Fill `test` with this process's rank in the stored communicator
    void Init()
    {
        int rank;
        MPI_Comm_rank(comm_, &rank);
        test.clear();
        test.resize(10, rank);
    }

    void set_comm(MPI_Comm comm)
    {
        this->comm_ = comm;
    }

    SomeComputation(float multiplier_) : multiplier(multiplier_) {}
    ~SomeComputation() { std::cout << "Destructor Called!\n"; }

    float compute(float input)
    {
        std::this_thread::sleep_for(std::chrono::milliseconds((int)input * 10));
        for (int i = 0; i != 10; ++i)
        {
            std::cout << test[i] << " ";
        }
        std::cout << std::endl;
        return multiplier * input;
    }
};

PYBIND11_MODULE(module_name, handle)
{
    py::class_<SomeComputation>(handle, "Cpp_computation")
        .def(py::init<float>()) // constructor argument types are template arguments
        .def("set_comm", &SomeComputation::set_comm)
        .def("compute", &SomeComputation::compute)
        .def("cpp_init", &SomeComputation::Init);
}


Here is the Python side that instantiates this C++ class:

from build.module_name import * 
import time

from mpi4py import MPI


comm = MPI.COMM_WORLD
rank = comm.Get_rank()


m = Cpp_computation(44.0) # send communicator to cpp
m.cpp_init()
i = 0
while i < 5:
    print(m.compute(i))
    time.sleep(1)
    i+=1

I have already tried "Sharing an MPI communicator using pybind11", but I ran into a long, unhelpful error (full message):

[...]
/Users/purusharth/Documents/hiwi/pympicontroller/pybind11/include/pybind11/pybind11.h:1398:22:   required from 'pybind11::class_<type_, options>& pybind11::class_<type_, options>::def(const char*, Func&&, const Extra& ...) [with Func = void (SomeComputation::*)(ompi_communicator_t*); Extra = ; type_ = SomeComputation; options = ]'
/Users/purusharth/Documents/hiwi/pympicontroller/main.cpp:79:7:   required from here
/opt/homebrew/Cellar/gcc/11.2.0_3/include/c++/11/type_traits:1372:38: error: invalid use of incomplete type 'struct ompi_communicator_t'
 1372 |     : public integral_constant<bool, __is_base_of(_Base, _Derived)>
      |                                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /Users/purusharth/Documents/hiwi/pympicontroller/main.cpp:6:
/opt/homebrew/Cellar/open-mpi/4.1.2/include/mpi.h:419:16: note: forward declaration of 'struct ompi_communicator_t'
  419 | typedef struct ompi_communicator_t *MPI_Comm;
      |                ^~~~~~~~~~~~~~~~~~~

[...]

/Users/purusharth/Documents/hiwi/pympicontroller/pybind11/include/pybind11/pybind11.h:1398:22:   required from 'pybind11::class_<type_, options>& pybind11::class_<type_, options>::def(const char*, Func&&, const Extra& ...) [with Func = void (SomeComputation::*)(ompi_communicator_t*); Extra = ; type_ = SomeComputation; options = ]'
/Users/purusharth/Documents/hiwi/pympicontroller/main.cpp:79:7:   required from here
/Users/purusharth/Documents/hiwi/pympicontroller/pybind11/include/pybind11/detail/descr.h:40:19: error: invalid use of incomplete type 'struct ompi_communicator_t'
   40 |         return &typeid(Ts)..., nullptr;
      |                   ^~~~~~~~~~
In file included from /Users/purusharth/Documents/hiwi/pympicontroller/main.cpp:6:
/opt/homebrew/Cellar/open-mpi/4.1.2/include/mpi.h:419:16: note: forward declaration of 'struct ompi_communicator_t'
  419 | typedef struct ompi_communicator_t *MPI_Comm;
      |                ^~~~~~~~~~~~~~~~~~~

[...]

                 from /Users/purusharth/Documents/hiwi/pympicontroller/main.cpp:1:
/Users/purusharth/Documents/hiwi/pympicontroller/pybind11/include/pybind11/detail/descr.h:40:42: error: could not convert '<expression error>, nullptr' from '<brace-enclosed initializer list>' to 'std::array<const std::type_info*, 3>'
   40 |         return &typeid(Ts)..., nullptr;
      |                                          ^
      |                                          |
      |                                          <brace-enclosed initializer list>

[...]

In file included from /Users/purusharth/Documents/hiwi/pympicontroller/main.cpp:1:
/Users/purusharth/Documents/hiwi/pympicontroller/pybind11/include/pybind11/pybind11.h: In instantiation of 'void pybind11::cpp_function::initialize(Func&&, Return (*)(Args ...), const Extra& ...) [with Func = pybind11::cpp_function::cpp_function<void, SomeComputation, ompi_communicator_t*, pybind11::name, pybind11::is_method, pybind11::sibling>(void (SomeComputation::*)(ompi_communicator_t*), const pybind11::name&, const pybind11::is_method&, const pybind11::sibling&)::<lambda(SomeComputation*, ompi_communicator_t*)>; Return = void; Args = SomeComputation*, ompi_communicator_t*; Extra = pybind11::name, pybind11::is_method, pybind11::sibling]':
[..]
/Users/purusharth/Documents/hiwi/pympicontroller/pybind11/include/pybind11/pybind11.h:1398:22:   required from 'pybind11::class_<type_, options>& pybind11::class_<type_, options>::def(const char*, Func&&, const Extra& ...) [with Func = void (SomeComputation::*)(ompi_communicator_t*); Extra = ; type_ = SomeComputation; options = ]'
/Users/purusharth/Documents/hiwi/pympicontroller/main.cpp:79:7:   required from here
/Users/purusharth/Documents/hiwi/pympicontroller/pybind11/include/pybind11/pybind11.h:266:73:   in 'constexpr' expansion of 'pybind11::detail::descr<18, SomeComputation, ompi_communicator_t>::types()'
/Users/purusharth/Documents/hiwi/pympicontroller/pybind11/include/pybind11/pybind11.h:266:39: error: 'constexpr' call flows off the end of the function
  266 |         PYBIND11_DESCR_CONSTEXPR auto types = decltype(signature)::types();
      |                                       ^~~~~

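The errors point to .def("set_comm", &SomeComputation::set_comm)

What causes these errors, and how should they be resolved?

Update: I have added an answer below that uses a custom type caster, as explained in this answer. But is this the only way?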

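Comments:

As far as I can tell, MPI_Comm is only declared, not defined, so you should store it as a pointer (MPI_Comm* comm_;) rather than by value. See ***.com/questions/8972588/… Of course, this may not be the real problem. In that case, could you post the complete main.cpp? The error refers to line 79, but your code is shorter than that.

Could it be that you have not included all the required MPI headers? Several errors refer to an incomplete type, so you may be missing an include that "completes" the type definition.

All information should be added to the question itself (as described in the site guidelines and sample code guidelines), not just linked to. For one thing, external pages disappear (there are many questions on SO where the OP did not expect that to happen, but it did). For long errors (such as C++ compiler output), it is fine to post the core error messages together with a link to the full output. (If you are unsure how to trim a long error message, you can ask for help in chat, which requires only 20 rep, or leave a comment to the same effect.)

Answer 1: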

Based on this answer: https://***.com/a/62449190/4593199

I was able to pass the MPI communicator by writing a custom pybind11 type caster for it.

#include <pybind11/pybind11.h>
#include <mpi.h>
#include <mpi4py/mpi4py.h>
#include <iostream>
#include <stdexcept>
#include <vector>

namespace py = pybind11;

// Thin wrapper so that pybind11 sees a complete type instead of MPI_Comm
struct mpi4py_comm {
  mpi4py_comm() = default;
  mpi4py_comm(MPI_Comm value) : value(value) {}
  operator MPI_Comm () { return value; }

  MPI_Comm value;
};


namespace pybind11 { namespace detail {
  template <> struct type_caster<mpi4py_comm> {
    public:
      PYBIND11_TYPE_CASTER(mpi4py_comm, _("mpi4py_comm"));

      // Python -> C++
      bool load(handle src, bool) {
        PyObject *py_src = src.ptr();

        // Check that we have been passed an mpi4py communicator
        if (PyObject_TypeCheck(py_src, &PyMPIComm_Type)) {
          // Convert to regular MPI communicator
          value.value = *PyMPIComm_Get(py_src);
        } else {
          return false;
        }

        return !PyErr_Occurred();
      }

      // C++ -> Python
      static handle cast(mpi4py_comm src,
                         return_value_policy /* policy */,
                         handle /* parent */)
      {
        // Create an mpi4py handle
        return PyMPIComm_New(src.value);
      }
  };
}} // namespace pybind11::detail


// Receive a communicator and print this process's rank in it ten times
void print_comm(mpi4py_comm comm)
{
        int rank;
        std::vector<int> test;
        MPI_Comm_rank(comm.value, &rank);

        test.clear();
        test.resize(10, rank);

        for (int i = 0; i != 10; ++i) {
            std::cout << test[i] << " ";
        }
        std::cout << std::endl;
}


class SomeComputation
{
    float multiplier;
    std::vector<int> test;
    MPI_Comm comm_;

public:
    void Init()
    {
        int rank;
        MPI_Comm_rank(comm_, &rank);
        test.clear();
        test.resize(10, rank);
    }
    SomeComputation(float multiplier_) : multiplier(multiplier_) {}
    ~SomeComputation() { std::cout << "Destructor Called!\n"; }

    // Takes the wrapper type; the implicit conversion yields the MPI_Comm
    void set_comm(mpi4py_comm comm)
    {
        this->comm_ = comm;
    }

    float compute(float input)
    {
        // std::this_thread::sleep_for(std::chrono::milliseconds((int)input * 10));
        for (int i = 0; i != 10; ++i)
        {
            std::cout << test[i] << " ";
        }
        std::cout << std::endl;
        return multiplier * input;
    }
};


mpi4py_comm get_comm()
{
  return MPI_COMM_WORLD; // Just return MPI_COMM_WORLD for demonstration
}

PYBIND11_MODULE(native, m)
{
  // import the mpi4py API
  if (import_mpi4py() < 0) {
    throw std::runtime_error("Could not load mpi4py API.");
  }

  // register the test functions
  m.def("print_comm", &print_comm, "Do something with the mpi4py communicator.");
  m.def("get_comm", &get_comm, "Return some communicator.");

  py::class_<SomeComputation>(m, "Cpp_computation")
      .def(py::init<float>()) // constructor argument types are template arguments
      .def("set_comm", &SomeComputation::set_comm)
      .def("compute", &SomeComputation::compute)
      .def("cpp_init", &SomeComputation::Init);
}

This compiles and runs successfully, but is there a more elegant way?

Answer 2:

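Using void * as the parameter type compiled successfully for me. It is ABI-compatible with the pybind11 interface because MPI_Comm is a pointer in Open MPI, the implementation used here. Other implementations, such as MPICH, define MPI_Comm as an int, so this cast is not portable. All I had to change was:

void set_comm(void* comm) {
  this->comm_ = (MPI_Comm)comm;
}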

I also added the MPI libraries and include folders to setup.py as follows (replace the folders as needed for your MPI implementation):

ext_modules = [
    Pybind11Extension("module_name",
        ["src/main.cpp"],
        include_dirs=["/etc/alternatives/mpi-x86_64-linux-gnu"],
        library_dirs=["/usr/lib/x86_64-linux-gnu/openmpi/lib"],
        libraries=["mpi", "mpi_cxx"],
    ),
]

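Comments:

Ah, I see. However, I get an error at runtime saying that MPI_Comm_rank is undefined. ImportError: build/mpi_lib.cpython-310-x86_64-linux-gnu.so: undefined symbol: MPI_Comm_rank

Did you compile with mpicc? I used the following command line: CC=mpicxx CXX=mpicxx python setup.py develop

Even when I compile it with mpicc, it crashes with the same error when called from Python. Could you upload your CMake file? I am not sure about setup.py.

I did not use CMake; all the build instructions are in setup.py. I also managed to remove the requirement to compile with mpicxx by adding the libraries directly. See the edited answer. I compiled this with python setup.py develop. Let me know if you need more components.

Adding mpi_cxx to the libraries fixes the ImportError above.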
