Wrapping an std::vector using boost::python vector_indexing_suite

Posted 2023-02-21


Question (posted 2014-11-22 12:44:07):

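I'm developing a C++ library with Python bindings (using boost::python) that represents data stored in files. Most of my semi-technical users will interact with it from Python, so I need to make it as Pythonic as possible. However, C++ programmers will also use the API, so I don't want to compromise on the C++ side just to accommodate the Python bindings.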

A large part of the library will consist of containers. To make them intuitive for Python users, I'd like them to behave like Python lists, i.e.:

# an example compound class
class Foo:
    def __init__( self, _val ):
        self.val = _val

# add it to a list
foo = Foo(0.0)
vect = []
vect.append(foo)

# change the value of the *original* instance
foo.val = 666.0
# which also changes the instance inside the container
print vect[0].val # outputs 666.0

Test setup

#include <boost/python.hpp>
#include <boost/python/suite/indexing/vector_indexing_suite.hpp>
#include <boost/python/register_ptr_to_python.hpp>
#include <boost/shared_ptr.hpp>

struct Foo {
    double val;

    Foo(double a) : val(a) {}
    bool operator == (const Foo& f) const { return val == f.val; }
};

/* insert the test module wrapping code here */

int main() {
    Py_Initialize();
    inittest();

    boost::python::object globals = boost::python::import("__main__").attr("__dict__");

    boost::python::exec(
        "import test\n"

        "foo = test.Foo(0.0)\n"         // make a new Foo instance
        "vect = test.FooVector()\n"     // make a new vector of Foos
        "vect.append(foo)\n"            // add the instance to the vector

        "foo.val = 666.0\n"             // assign a new value to the instance
                                        //   which should change the value in vector

        "print 'Foo =', foo.val\n"      // and print the results
        "print 'vector[0] =', vect[0].val\n",

        globals, globals
    );

    return 0;
}
The `shared_ptr` way

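Using shared_ptr, I can get the same behaviour as above, but it also means I have to represent all data in C++ with shared pointers, which is not nice from many points of view.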

BOOST_PYTHON_MODULE( test ) {
    // wrap Foo
    boost::python::class_< Foo, boost::shared_ptr<Foo> >("Foo", boost::python::init<double>())
        .def_readwrite("val", &Foo::val);

    // wrap vector of shared_ptr Foos
    boost::python::class_< std::vector < boost::shared_ptr<Foo> > >("FooVector")
        .def(boost::python::vector_indexing_suite<std::vector< boost::shared_ptr<Foo> >, true >());
}

In my test setup, this produces the same output as pure Python:

Foo = 666.0
vector[0] = 666.0
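For comparison, the same aliasing can be reproduced in plain C++ without Boost.Python. The sketch below is an illustration only, not part of the bindings: it uses `std::shared_ptr` in place of `boost::shared_ptr`, and the function name `shared_ptr_vector_after_mutation` is hypothetical. Because the vector stores a second handle to the original object, a write through `foo` is visible through `vect[0]`:

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Foo {
    double val;
    explicit Foo(double a) : val(a) {}
};

// Mirrors the Python session: create foo, append it, then mutate foo.
// Returns the value observed through the container afterwards.
double shared_ptr_vector_after_mutation() {
    auto foo = std::make_shared<Foo>(0.0);
    std::vector<std::shared_ptr<Foo>> vect;
    vect.push_back(foo);                  // copies the pointer, not the Foo
    foo->val = 666.0;                     // mutate through the original handle
    assert(vect[0].get() == foo.get());   // both handles refer to one object
    return vect[0]->val;
}
```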

The `vector<Foo>` way

Using the vector directly gives a nice, clean setup on the C++ side. However, the result does not behave the same way as pure Python.

BOOST_PYTHON_MODULE( test ) {
    // wrap Foo
    boost::python::class_< Foo >("Foo", boost::python::init<double>())
        .def_readwrite("val", &Foo::val);

    // wrap vector of Foos
    boost::python::class_< std::vector < Foo > >("FooVector")
        .def(boost::python::vector_indexing_suite<std::vector< Foo > >());
}

This produces:

Foo = 666.0
vector[0] = 0.0

This is "wrong": changing the original instance did not change the value inside the container.
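Plain C++ shows why: appending to a `std::vector<Foo>` copy-constructs a new `Foo` inside the vector. A minimal Boost-free sketch, assuming the wrapped `append` ends up doing the equivalent of a by-value `push_back` (the function name `value_vector_after_mutation` is hypothetical):

```cpp
#include <cassert>
#include <vector>

struct Foo {
    double val;
    explicit Foo(double a) : val(a) {}
};

// Same sequence as the test setup, but the vector stores Foo by value.
double value_vector_after_mutation() {
    Foo foo(0.0);
    std::vector<Foo> vect;
    vect.push_back(foo);        // copy-constructs a new Foo inside the vector
    foo.val = 666.0;            // only the original object changes
    assert(&vect[0] != &foo);   // two distinct objects
    return vect[0].val;         // still the copied value
}
```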

I hope I'm not asking for too much

Interestingly, this code works no matter which of the two wrappings I use:

footwo = vect[0]
footwo.val = 555.0
print vect[0].val

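This means that boost::python is able to handle "fake shared ownership" (through its by_proxy return mechanism). Is there any way to achieve the same effect when inserting new elements?

If the answer is no, I'd love to hear other suggestions: is there an example in the Python toolkit where a similar collection wrapping is implemented but does not behave exactly like a Python list?

Thank you very much for reading this :)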

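Comments:

If you have vectors of numbers, you may want to look at swig and numpy: wiki.scipy.org/Cookbook/SWIG_NumPy_examples. It wraps passing pointers to and from numpy arrays (which std::vectors can easily be converted to).

Thanks! At the lowest level, each layer has a set of "channel" components, each of which is a bit like an array (with some extras on top), so I'll definitely take a look. However, that is not the case for the higher-level objects, so I'm still stuck :(

As noted in the Boost.Python documentation, Python containers do not map easily onto C++ containers. More detail about how the collections are used in Python, in C++, or between the two languages might help point toward a better solution. Ownership semantics also need to be clearly defined. In Boost.Python, object ownership is fairly explicit: it is either shared between the languages, or one language has exclusive ownership. There is no concept of "fake shared ownership".

@TannerSansbury As mentioned in the original question, the goal is a library with Python bindings. There are two APIs, one in C++ and one in Python, and the library itself does not provide a way to pass ownership between them. "Fake shared ownership" does not refer to the two languages, but to two Python objects in a Python-only scenario (using the proxy mechanism). I know containers don't map easily between the languages, but I was hoping there is a way to provide a single implementation with bindings, rather than implementing two separate libraries with the same functionality.

There is a good chance that a solution with a single implementation exists. The question explains well how the collection should be used, but not how the collection itself will be used. Will Python users pass the collection to C++ functions that operate on it? Will those functions mutate the collection? Its elements? Will they operate on a copy? These subtle semantics often determine the ideal solution (proxies, custom converters, custom held types, monkey patching, etc.).

Answer 1: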

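Due to the semantic differences between the languages, it is often difficult to apply a single reusable solution to all scenarios when collections are involved. The biggest issue is that while Python collections directly support references, C++ collections require a level of indirection, such as a shared_ptr element type. Without that indirection, C++ collections cannot support the same functionality as Python collections. For instance, consider two indexes that refer to the same object: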

s = Spam()
spams = []
spams.append(s)
spams.append(s)

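Without a pointer-like element type, a C++ collection cannot have two indexes referring to the same object. Nevertheless, depending on usage and requirements, there may be options that give Python users a Pythonic interface while still keeping a single implementation in C++.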
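A small C++ sketch of that difference, assuming `std::shared_ptr` as the pointer-like element type (the `Spam` struct and both function names are hypothetical stand-ins for the Python snippet above):

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Spam {
    int val = 0;
};

// Appends the same object twice to a vector of shared_ptr and reports
// whether both indexes refer to one object.
bool shared_indices_alias() {
    auto s = std::make_shared<Spam>();
    std::vector<std::shared_ptr<Spam>> spams;
    spams.push_back(s);
    spams.push_back(s);
    assert(s.use_count() == 3);   // s plus the two copies in the vector
    spams[0]->val = 7;            // write through index 0
    return spams[0] == spams[1] && spams[1]->val == 7;
}

// The same operations on a value vector yield two independent copies.
bool value_indices_alias() {
    Spam s;
    std::vector<Spam> spams;
    spams.push_back(s);
    spams.push_back(s);
    spams[0].val = 7;
    return &spams[0] == &spams[1] || spams[1].val == 7;
}
```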

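The most Pythonic solution would be to use a custom converter that converts a Python iterable into a C++ collection. See this answer for implementation details. Consider this option if:

- the collection's elements are cheap to copy;
- the C++ functions operate only on rvalue types (i.e. `std::vector<>` or `const std::vector<>&`). This limitation prevents C++ from making changes to the Python collection or its elements.

Alternatively, enhance the `vector_indexing_suite` capabilities, reusing as much of its functionality as possible, such as its proxies for safely handling index deletion and reallocation of the underlying collection:

- expose the model with a custom HeldType that functions as a smart pointer and delegates either to the instance or to the element proxy objects returned from `vector_indexing_suite`;
- monkey patch the collection methods that insert elements into the collection, so that the custom HeldType is set to delegate to an element proxy.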

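When exposing a class to Boost.Python, the HeldType is the type of object that gets embedded within a Boost.Python object. When accessing the wrapped type's object, Boost.Python invokes get_pointer() for the HeldType. The object_holder class below provides the ability to return a handle either to an instance it owns or to an element proxy: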

/// @brief smart pointer type that will delegate to a python
///        object if one is set.
template <typename T>
class object_holder
{
public:

  typedef T element_type;

  object_holder(element_type* ptr)
    : ptr_(ptr),
      object_()
  {}

  element_type* get() const
  {
    if (!object_.is_none())
    {
      return boost::python::extract<element_type*>(object_)();
    }
    return ptr_ ? ptr_.get() : NULL;
  }

  void reset(boost::python::object object)
  {
    // Verify the new object holds the expected element.
    boost::python::extract<element_type*> extractor(object);
    if (!extractor.check()) return;

    object_ = object;
    ptr_.reset();
  }

private:
  boost::shared_ptr<element_type> ptr_;
  boost::python::object object_;
};

/// @brief Helper function used to extract the pointed to object from
///        an object_holder.  Boost.Python will use this through ADL.
template <typename T>
T* get_pointer(const object_holder<T>& holder)
{
  return holder.get();
}
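Stripped of Boost.Python, the idea behind `object_holder` is a holder that owns its instance until it is told to delegate elsewhere. The following is a hypothetical, simplified analogue, not part of Boost.Python: `delegating_holder` and its `std::function` delegate stand in for the `boost::python::object` proxy, and the free `get_pointer` overload mirrors the ADL hook:

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical analogue of object_holder: owns an instance until reset()
// switches it to delegate to another source of T*.
template <typename T>
class delegating_holder {
public:
    explicit delegating_holder(T* ptr) : owned_(ptr) {}

    T* get() const {
        if (delegate_) return delegate_();  // delegation wins once set
        return owned_.get();
    }

    // Switch to delegation and release the originally owned instance.
    void reset(std::function<T*()> delegate) {
        if (!delegate || !delegate()) return;  // ignore invalid targets
        delegate_ = std::move(delegate);
        owned_.reset();
    }

private:
    std::shared_ptr<T> owned_;
    std::function<T*()> delegate_;
};

// Plays the role of get_pointer(): found through ADL.
template <typename T>
T* get_pointer(const delegating_holder<T>& holder) { return holder.get(); }
```

Once `reset()` succeeds, the originally owned instance is released, just as `ptr_.reset()` does in `object_holder`.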
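With the indirection supported, the only thing remaining is patching the collection to set the object_holder. One clean and reusable way to do this is with def_visitor, a generic interface that allows class_ objects to be extended non-intrusively. For instance, vector_indexing_suite uses this capability.

The custom_vector_indexing_suite class below monkey patches the append() method to delegate to the original method, and then invokes object_holder.reset() with a proxy to the newly set element. This results in the object_holder referring to the element contained within the collection.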

/// @brief Indexing suite that will reset the element's HeldType to
///        that of the proxy during element insertion.
template <typename Container,
          typename HeldType>
class custom_vector_indexing_suite
  : public boost::python::def_visitor<
      custom_vector_indexing_suite<Container, HeldType>>
{
private:

  friend class boost::python::def_visitor_access;

  template <typename ClassT>
  void visit(ClassT& cls) const
  {
    // Define vector indexing support.
    cls.def(boost::python::vector_indexing_suite<Container>());

    // Monkey patch element setters with custom functions that
    // delegate to the original implementation then obtain a
    // handle to the proxy.
    cls
      .def("append", make_append_wrapper(cls.attr("append")))
      // repeat for __setitem__ (slice and non-slice) and extend
      ;
  }

  /// @brief Return a patched 'append' function.
  static boost::python::object make_append_wrapper(
    boost::python::object original_fn)
  {
    namespace python = boost::python;
    return python::make_function([original_fn](
          python::object self,
          HeldType& value)
        {
          // Copy into the collection.
          original_fn(self, value.get());
          // Reset handle to delegate to a proxy for the newly copied element.
          value.reset(self[-1]);
        },
      // Call policies.
      python::default_call_policies(),
      // Describe the signature.
      boost::mpl::vector<
        void,           // return
        python::object, // self (collection)
        HeldType>()     // value
      );
  }
};

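The wrapping needs to occur at runtime, and custom functor objects cannot be defined directly on the class via def(), so the make_function() function must be used. For functors, it requires both CallPolicies and an MPL front-extensible sequence representing the signature.

Here is a complete example that demonstrates using object_holder to delegate to proxies and custom_vector_indexing_suite to patch the collection.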
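The reason `self[-1]` is a good delegation target is that Boost.Python's element proxies refer to a container plus an index rather than a raw address. A rough, Boost-free sketch of that idea and of the patched `append()` follows; the names `element_proxy` and `append_and_proxy` are hypothetical, and the real proxies also track erasure, which this sketch omits:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical proxy in the spirit of vector_indexing_suite's element
// proxies: it remembers the container and an index, not an address.
template <typename T>
struct element_proxy {
    std::vector<T>* container;
    std::size_t index;
    T& operator*() const { return (*container)[index]; }
    T* operator->() const { return &(*container)[index]; }
};

// Analogue of the patched append(): copy the value into the container,
// then return a proxy for the element just inserted (the self[-1] step).
template <typename T>
element_proxy<T> append_and_proxy(std::vector<T>& container, const T& value) {
    container.push_back(value);
    return element_proxy<T>{&container, container.size() - 1};
}
```

Because the proxy looks the element up on every access, it keeps working after later appends reallocate the vector.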

#include <vector>
#include <boost/shared_ptr.hpp>
#include <boost/python.hpp>
#include <boost/python/suite/indexing/vector_indexing_suite.hpp>

/// @brief Mockup type.
struct spam
{
  int val;

  spam(int val) : val(val) {}
  bool operator==(const spam& rhs) const { return val == rhs.val; }
};

/// @brief Mockup function that operates on a collection of spam instances.
void modify_spams(std::vector<spam>& spams)
{
  for (auto& spam : spams)
    spam.val *= 2;
}

/// @brief smart pointer type that will delegate to a python
///        object if one is set.
template <typename T>
class object_holder
{
public:

  typedef T element_type;

  object_holder(element_type* ptr)
    : ptr_(ptr),
      object_()
  {}

  element_type* get() const
  {
    if (!object_.is_none())
    {
      return boost::python::extract<element_type*>(object_)();
    }
    return ptr_ ? ptr_.get() : NULL;
  }

  void reset(boost::python::object object)
  {
    // Verify the new object holds the expected element.
    boost::python::extract<element_type*> extractor(object);
    if (!extractor.check()) return;

    object_ = object;
    ptr_.reset();
  }

private:
  boost::shared_ptr<element_type> ptr_;
  boost::python::object object_;
};

/// @brief Helper function used to extract the pointed to object from
///        an object_holder.  Boost.Python will use this through ADL.
template <typename T>
T* get_pointer(const object_holder<T>& holder)
{
  return holder.get();
}

/// @brief Indexing suite that will reset the element's HeldType to
///        that of the proxy during element insertion.
template <typename Container,
          typename HeldType>
class custom_vector_indexing_suite
  : public boost::python::def_visitor<
      custom_vector_indexing_suite<Container, HeldType>>
{
private:

  friend class boost::python::def_visitor_access;

  template <typename ClassT>
  void visit(ClassT& cls) const
  {
    // Define vector indexing support.
    cls.def(boost::python::vector_indexing_suite<Container>());

    // Monkey patch element setters with custom functions that
    // delegate to the original implementation then obtain a
    // handle to the proxy.
    cls
      .def("append", make_append_wrapper(cls.attr("append")))
      // repeat for __setitem__ (slice and non-slice) and extend
      ;
  }

  /// @brief Return a patched 'append' function.
  static boost::python::object make_append_wrapper(
    boost::python::object original_fn)
  {
    namespace python = boost::python;
    return python::make_function([original_fn](
          python::object self,
          HeldType& value)
        {
          // Copy into the collection.
          original_fn(self, value.get());
          // Reset handle to delegate to a proxy for the newly copied element.
          value.reset(self[-1]);
        },
      // Call policies.
      python::default_call_policies(),
      // Describe the signature.
      boost::mpl::vector<
        void,           // return
        python::object, // self (collection)
        HeldType>()     // value
      );
  }

  // .. make_setitem_wrapper
  // .. make_extend_wrapper
};

BOOST_PYTHON_MODULE(example)
{
  namespace python = boost::python;

  // Expose spam.  Use a custom holder to allow for transparent delegation
  // to different instances.
  python::class_<spam, object_holder<spam>>("Spam", python::init<int>())
    .def_readwrite("val", &spam::val)
    ;

  // Expose a vector of spam.
  python::class_<std::vector<spam>>("SpamVector")
    .def(custom_vector_indexing_suite<
      std::vector<spam>, object_holder<spam>>())
    ;

  python::def("modify_spams", &modify_spams);
}

Interactive usage:

>>> import example
>>> spam = example.Spam(5)
>>> spams = example.SpamVector()
>>> spams.append(spam)
>>> assert(spams[0].val == 5)
>>> spam.val = 21
>>> assert(spams[0].val == 21)
>>> example.modify_spams(spams)
>>> assert(spam.val == 42)
>>> spams.append(spam)
>>> spam.val = 100
>>> assert(spams[1].val == 100)
>>> assert(spams[0].val == 42) # The container does not provide indirection.

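Since vector_indexing_suite is still in use, the underlying C++ container should only be modified through the Python object's API. For instance, invoking push_back on the container may cause the underlying memory to be reallocated and cause problems for existing Boost.Python proxies. On the other hand, the elements themselves can be modified safely, as the modify_spams() function above does.

Comments:

Wow, this answer is amazing. I wish I were teaching advanced Python/C++ students so I could use it in class. Thanks for sharing.

Answer 2: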

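Unfortunately, the answer is no: you can't do what you want. In Python everything is a pointer, and lists are containers of pointers. The C++ vector of shared pointers works because the underlying data structure is more or less equivalent to a Python list. What you are asking for is a C++ vector of allocated memory that behaves like a vector of pointers, which can't be done.

Let's look at what happens with a Python list, using C++-equivalent pseudocode: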

foo = Foo(0.0)     # Foo* foo = new Foo(0.0)
vect = []          # std::vector<Foo*> vect
vect.append(foo)   # vect.push_back(foo)

At this point foo and vect[0] both point to the same allocated memory, so changing *foo changes *vect[0].

Now the vector<Foo> version:

foo = Foo(0.0)      # Foo* foo = new Foo(0.0)
vect = FooVector()  # std::vector<Foo> vect
vect.append(foo)    # vect.push_back(*foo)

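Here, vect[0] has its own allocated memory and is a copy of *foo. Fundamentally, you can't make vect[0] be the same memory as *foo.

As a side note, be careful with the lifetime management of footwo when using std::vector<Foo>: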

footwo = vect[0]    # Foo* footwo = &vect[0]

A subsequent append may need to move the storage allocated for the vector, which can invalidate footwo (&vect[0] may change).
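The invalidation rule can be checked directly: when `size() == capacity()`, `push_back` must reallocate, after which pointers and references such as `&vect[0]` must be treated as invalid, while an index remains usable. A small sketch (the helper `push_back_reallocates` is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Foo {
    double val;
    explicit Foo(double a) : val(a) {}
};

// Returns true if appending one element forced the vector to reallocate,
// i.e. every pointer or reference (such as &vect[0]) taken before the
// call must be considered invalid afterwards.
bool push_back_reallocates(std::vector<Foo>& vect, const Foo& value) {
    const std::size_t old_capacity = vect.capacity();
    vect.push_back(value);
    return vect.capacity() != old_capacity;
}
```

Usage: fill the vector until `size() == capacity()`, keep an index instead of `&vect[0]`, and the index still reaches the element after the reallocating `push_back`.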

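Comments:

Thank you very much for the answer, but I don't believe this is true. I do understand that pointers (especially shared pointers) closely mirror the way Python handles data (shared pointers even include reference counting). However, boost::python allows non-shared values to be exposed in a "fake shared" way using the "proxy" mechanism. It does this only when the element already exists in the vector, not when it is inserted into the vector. I'd love to know whether there is any way to achieve the same effect on insertion (this might involve turning the "source" into a proxy).

This only applies if NoProxy is set to true, or if the underlying C++ container is modified without going through the Python object's API. When proxies are used, the Python container checks for invalidation and reassociates proxies with C++ objects as needed. It even handles deleted indexes that proxies still reference, by copying the C++ object before deletion and making the copy a detached object that the proxy refers to. See this test case.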
