How to implement serialization in C++

Posted 2023-02-22


Asked: 2010-12-21 01:08:45

Question:

Whenever I find myself needing to serialize objects in a C++ program, I fall back on this pattern:

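#include <istream>
#include <ostream>
#include <cstddef>

class Serializable {
  public:
    virtual ~Serializable() {}

    // Reads the class ID, then lets the matching subclass read its own data.
    static Serializable *deserialize(std::istream &is);

    void serialize(std::ostream &os) {
        os << getClassID() << ' ';  // tag first, separated so >> can read it back
        serializeMe(os);            // then the subclass-specific payload
    }

  protected:
    virtual int getClassID() = 0;
    virtual void serializeMe(std::ostream &os) = 0;
};

// Defined after ExampleClass (and the other subclasses) have been declared.
Serializable *Serializable::deserialize(std::istream &is) {
    int id;
    is >> id;
    switch (id) {
      case EXAMPLE_ID:
        return new ExampleClass(is);
      //...
    }
    return NULL;  // unknown class ID
}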

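This works well in practice. However, I've heard that switching on class IDs like this is evil and an anti-pattern. What is the standard, object-oriented way of handling serialization in C++?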

Comments on the question:

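@SergeyK: What has changed recently? I certainly haven't heard about it.

I mean this answer: ***.com/a/10332336/1065190

@SergeyK.: Ah! I think the comment section of that answer is probably the best place to discuss it. In fact, I have already started. It seems rather esoteric to me, especially the idea of merging serialization with automatic getters and setters (mixing different concepts is usually bad). It also reminds me of a certain Qt project... In the end you have quasi-C++ and you lose portability, because you depend on the availability of the tool that is supposed to turn it into proper, compilable C++. I'm not holding my breath. Better to keep it all as plain source code and recompile it on the target platform.

Qt does have a tool called the Meta-Object Compiler that generates meta-information for your C++ project.

Possible duplicate of How to serialize in c++?

Answer 1: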

Use something like Boost Serialization. While by no means a standard, it is (for the most part) a well-written library that does the heavy lifting for you.

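The last time I had to manually parse a predefined record structure with a clear inheritance tree, I ended up using the factory pattern with registrable classes (i.e. a map from keys to (template) creator functions rather than lots of switch statements) to try to avoid the problem you are running into.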

Edit: a basic C++ implementation of the object factory mentioned in the previous paragraph.

#include <map>
#include <cstddef>  // NULL

/**
* A class for creating objects, with the type of object created based on a key
*
* @param K the key
* @param T the super class that all created classes derive from
*/
template<typename K, typename T>
class Factory {
private:
    typedef T *(*CreateObjectFunc)();

    /**
    * Maps keys (K) to functions (CreateObjectFunc)
    * When creating a new type, we simply call the function with the required key
    */
    std::map<K, CreateObjectFunc> mObjectCreator;

    /**
    * Pointers to this function are inserted into the map and called when creating objects
    *
    * @param S the type of class to create
    * @return a new object of type S, as a pointer to its base class T
    */
    template<typename S>
    static T* createObject() {
        return new S();
    }
public:

    /**
    * Registers a class so that it can be created via createObject()
    *
    * @param S the class to register, this must be a subclass of T
    * @param id the id to associate with the class. This ID must be unique
    */
    template<typename S>
    void registerClass(K id) {
        if (mObjectCreator.find(id) != mObjectCreator.end()) {
            //your error handling here
            return;
        }
        mObjectCreator[id] = &createObject<S>;
    }

    /**
    * Returns true if a given key exists
    *
    * @param id the id to check exists
    * @return true if the id exists
    */
    bool hasClass(K id) {
        return mObjectCreator.find(id) != mObjectCreator.end();
    }

    /**
    * Creates an object based on an id. It returns NULL if the key doesn't exist
    *
    * @param id the id of the object to create
    * @return the new object, or NULL if the object id doesn't exist
    */
    T* createObject(K id) {
        //Don't use hasClass here as doing so would involve two lookups
        typename std::map<K, CreateObjectFunc>::iterator iter = mObjectCreator.find(id);
        if (iter == mObjectCreator.end()) {
            return NULL;
        }
        //calls the required createObject() function
        return ((*iter).second)();
    }
};
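To connect the factory idea back to the question, here is a self-contained sketch (an illustration, not the answerer's code) in which a registry of creator functions replaces the `switch` in `deserialize`. To stay short it uses a plain map of function pointers rather than the `Factory` template above. The `Shape`, `Circle` and `Square` classes, the string keys and the virtual `save`/`load` methods are all hypothetical, and C++11 is assumed.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical polymorphic base: each subclass reports its key and handles its payload.
class Shape {
public:
    virtual ~Shape() {}
    virtual std::string key() const = 0;
    virtual void save(std::ostream& os) const = 0;
    virtual void load(std::istream& is) = 0;
};

class Circle : public Shape {
public:
    double radius = 0.0;
    std::string key() const override { return "circle"; }
    void save(std::ostream& os) const override { os << radius; }
    void load(std::istream& is) override { is >> radius; }
};

class Square : public Shape {
public:
    double side = 0.0;
    std::string key() const override { return "square"; }
    void save(std::ostream& os) const override { os << side; }
    void load(std::istream& is) override { is >> side; }
};

// Registry: key -> creator function, instead of a hand-written switch.
using Creator = std::unique_ptr<Shape> (*)();

template <typename S>
std::unique_ptr<Shape> create() { return std::unique_ptr<Shape>(new S()); }

std::map<std::string, Creator>& registry() {
    static std::map<std::string, Creator> creators;  // built on first use
    return creators;
}

void serialize(std::ostream& os, const Shape& s) {
    os << s.key() << ' ';  // tag first, then the payload
    s.save(os);
    os << '\n';
}

// Returns nullptr at end of stream or for an unknown key.
std::unique_ptr<Shape> deserialize(std::istream& is) {
    std::string key;
    if (!(is >> key)) return nullptr;
    auto it = registry().find(key);
    if (it == registry().end()) return nullptr;
    std::unique_ptr<Shape> obj = it->second();
    obj->load(is);
    return obj;
}
```

Registration happens once at startup, e.g. `registry()["circle"] = &create<Circle>;`. Supporting a new class then means adding one registration line instead of editing a central `switch`.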

Comments:

Why does createObject return T* when its comment says "@return a object with the type of S"? Shouldn't it be S*? I'm new to C++ and struggling to understand this.

Answer 2:

Serialization is a touchy subject in C++...

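A quick distinction:

- Serialization: short-lived structures, a single encoder/decoder
- Messaging: longer lifetime, encoders/decoders in multiple languages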

Both are useful and both have their place.

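Boost.Serialization is usually the most recommended library for serialization, though its odd choice of operator&, which serializes or deserializes depending on const-ness, really strikes me as an abuse of operator overloading.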
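To make the `operator&` idiom concrete without depending on Boost, here is a hypothetical sketch: two made-up archive classes, `TextOArchive` and `TextIArchive`, give the same `ar & field` expression saving and loading behavior, so a single `serialize` member template describes a class for both directions. Here the direction comes from the archive type; the output archive takes its argument by const reference and the input archive by non-const reference. This only imitates the idea; Boost's real archives do much more (object tracking, class versioning, pointers).

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical output archive: `ar & value` writes the value.
class TextOArchive {
public:
    explicit TextOArchive(std::ostream& os) : os_(os) {}
    template <typename T>
    TextOArchive& operator&(const T& value) {
        os_ << value << ' ';  // whitespace-separated text format
        return *this;
    }
private:
    std::ostream& os_;
};

// Hypothetical input archive: the same `ar & value` reads into the value.
class TextIArchive {
public:
    explicit TextIArchive(std::istream& is) : is_(is) {}
    template <typename T>
    TextIArchive& operator&(T& value) {
        is_ >> value;
        return *this;
    }
private:
    std::istream& is_;
};

struct Point {
    int x = 0;
    int y = 0;
    // One member template describes the layout for both directions.
    template <typename Archive>
    void serialize(Archive& ar) {
        ar & x & y;
    }
};
```

Because save and load share one field list, they cannot drift out of sync; the price is that `&` means two different things depending on the archive. Strings containing whitespace would need quoting or a length prefix in this text format.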

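For messaging, I would rather suggest Google Protocol Buffers. They provide a clean syntax for describing messages and generate encoders and decoders for a wide variety of languages. There is also another advantage when performance matters: by design, they allow lazy deserialization (i.e. decoding only part of the blob at a time).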

Moving on

Now, as for the implementation details, it really depends on what you want.

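- You need versioning: even for regular serialization, you will probably need backward compatibility with previous versions.
- You may or may not need a tag + factory system. It is only necessary for polymorphic classes, and you will need one factory per inheritance tree (kind)... and of course the code can be templatized!
- Pointers/references are going to bite you... they refer to a location in memory that will change after deserialization. I usually take a tangential approach: each object of each kind is given an id that is unique within its kind, so I serialize the id rather than a pointer. Some frameworks handle this for you, as long as you have no circular dependencies and serialize the pointed-to/referenced objects first.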
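A minimal sketch of the "serialize the id, not the pointer" approach, assuming a hypothetical `Person` type whose `id` is unique within its kind and whose `manager` pointer may be null: the writer stores the pointee's id (or -1), and the reader works in two passes, first creating every object, then turning stored ids back into pointers. Because no object needs its target to exist while it is being read, this two-pass version also copes with cycles.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical example type.
struct Person {
    int id = 0;                 // unique within the Person kind
    std::string name;           // assumed to contain no whitespace
    Person* manager = nullptr;  // must never be written as a raw address
};

void savePeople(std::ostream& os, const std::vector<std::unique_ptr<Person>>& people) {
    os << people.size() << '\n';
    for (const auto& p : people) {
        // Store the pointee's id (or -1 for "no manager") instead of the pointer.
        os << p->id << ' ' << p->name << ' '
           << (p->manager ? p->manager->id : -1) << '\n';
    }
}

std::vector<std::unique_ptr<Person>> loadPeople(std::istream& is) {
    std::size_t count = 0;
    is >> count;
    std::vector<std::unique_ptr<Person>> people;
    std::vector<int> managerIds;
    std::map<int, Person*> byId;
    // Pass 1: create every object and remember which id each pointer referred to.
    for (std::size_t i = 0; i < count && is; ++i) {
        std::unique_ptr<Person> p(new Person());
        int managerId = -1;
        is >> p->id >> p->name >> managerId;
        byId[p->id] = p.get();
        managerIds.push_back(managerId);
        people.push_back(std::move(p));
    }
    // Pass 2: all objects exist now, so ids can be resolved back into pointers.
    for (std::size_t i = 0; i < people.size(); ++i) {
        auto it = byId.find(managerIds[i]);
        people[i]->manager = (it != byId.end()) ? it->second : nullptr;
    }
    return people;
}
```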

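Personally, I try as hard as I can to keep the serialization/deserialization code separate from the code that actually runs the class. In particular, I try to isolate it in its own source files, so that changes to this part of the code don't break binary compatibility.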
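A minimal sketch of that separation, assuming the class exposes enough public accessors: the hypothetical `Temperature` class contains no serialization code at all, and the free functions `writeTemperature`/`readTemperature` would live in their own source file (shown inline here only so the example is self-contained).

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>

// --- temperature.h (hypothetical): the class knows nothing about streams ---
class Temperature {
public:
    explicit Temperature(double celsius = 0.0) : celsius_(celsius) {}
    double celsius() const { return celsius_; }
    void setCelsius(double c) { celsius_ = c; }
private:
    double celsius_;
};

// --- temperature_io.cpp (hypothetical): all the serialization code ---
// Changing the stored format only touches this translation unit.
void writeTemperature(std::ostream& os, const Temperature& t) {
    os << t.celsius() << '\n';
}

bool readTemperature(std::istream& is, Temperature& t) {
    double c = 0.0;
    if (!(is >> c)) return false;  // leave t untouched on failure
    t.setCelsius(c);
    return true;
}
```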

On versioning

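I usually try to keep the serialization and deserialization of a given version together, which makes it easier to check that they really are symmetric. I also try to abstract version handling directly into my serialization framework (plus a few other things), because DRY should be respected :)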
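One way to lay this out, sketched under assumptions: a simple text format with a leading version number, and a hypothetical `Config` struct that gained a `timeoutMs` field in version 2. Each version's reader and writer sit side by side so their symmetry is easy to check, and the version dispatch lives in one place. Version 1 data still loads, with the new field falling back to its default.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical record: version 1 had only `name`; version 2 added `timeoutMs`.
struct Config {
    std::string name;      // assumed to contain no whitespace
    int timeoutMs = 1000;  // default used when reading version 1 data
};

const int kLatestVersion = 2;

// --- version 1: writer and reader kept together ---
void writeV1(std::ostream& os, const Config& c) { os << c.name; }
void readV1(std::istream& is, Config& c) { is >> c.name; }

// --- version 2: adds timeoutMs ---
void writeV2(std::ostream& os, const Config& c) { os << c.name << ' ' << c.timeoutMs; }
void readV2(std::istream& is, Config& c) { is >> c.name >> c.timeoutMs; }

// Version handling in one place: write a header, dispatch on it when reading.
void save(std::ostream& os, const Config& c, int version = kLatestVersion) {
    os << version << ' ';
    switch (version) {
    case 1: writeV1(os, c); break;
    case 2: writeV2(os, c); break;
    default:
        throw std::invalid_argument("cannot write Config version " + std::to_string(version));
    }
    os << '\n';
}

Config load(std::istream& is) {
    int version = 0;
    Config c;
    if (!(is >> version)) throw std::runtime_error("Stream is corrupted: missing version");
    switch (version) {
    case 1: readV1(is, c); break;
    case 2: readV2(is, c); break;
    default:
        throw std::runtime_error("No version " + std::to_string(version) +
                                 " for object Config: only 1 and 2");
    }
    if (!is) {
        throw std::runtime_error("Config (version " + std::to_string(version) +
                                 ") was not completely deserialized");
    }
    return c;
}
```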

On error handling

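To make errors easier to detect, I usually use a pair of "markers" (special bytes) to separate one object from the next. This lets me throw immediately during deserialization, because I can detect when the stream gets out of sync (i.e. something consumed a few bytes too many or too few).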
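A minimal sketch of the marker idea, assuming a text stream and two arbitrarily chosen marker characters (`'<'` before and `'>'` after each object; a binary format would use special bytes instead). If the reader does not find the closing marker exactly where it expects it, it has consumed too many or too few bytes, and it throws on the spot. `Pair` is a hypothetical payload type.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>

const char kBegin = '<';  // arbitrary marker values chosen for this sketch
const char kEnd = '>';

struct Pair { int a = 0; int b = 0; };  // hypothetical payload

void writePair(std::ostream& os, const Pair& p) {
    os << kBegin << p.a << ' ' << p.b << kEnd;
}

// Throws as soon as the markers do not line up, i.e. the stream is out of sync.
Pair readPair(std::istream& is) {
    char c = 0;
    if (!(is >> c) || c != kBegin) {
        throw std::runtime_error("Stream is corrupted: missing begin marker");
    }
    Pair p;
    is >> p.a >> p.b;
    if (!(is >> c) || c != kEnd) {
        throw std::runtime_error("Stream is corrupted: missing end marker");
    }
    return p;
}
```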

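If you want permissive deserialization, i.e. deserializing the rest of the stream even if something failed earlier, you will have to switch to byte counts: each object is preceded by its byte count and may only consume that many bytes (and is expected to consume all of them). This approach is nice because it allows partial deserialization: you can store the part of the stream an object needs and deserialize it only when necessary.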
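A sketch of byte-count framing, assuming each record is written as its length, a space, and exactly that many bytes of payload. The reader copies each payload into its own buffer, so a record that fails to parse, or is not consumed completely, is counted and skipped while the rest of the stream is still read. The stored payload string could also be kept as-is and parsed later, which is the partial deserialization described in the paragraph above. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Frame: "<byte count> <payload>\n"; the payload is opaque to the framing layer.
void writeRecord(std::ostream& os, const std::string& payload) {
    os << payload.size() << ' ' << payload << '\n';
}

// Returns false at end of stream or if the frame itself is damaged.
// (A real reader should also cap n to avoid huge allocations on corrupt input.)
bool readRecord(std::istream& is, std::string& payload) {
    std::size_t n = 0;
    if (!(is >> n) || is.get() != ' ') return false;
    payload.assign(n, '\0');
    if (n == 0) return true;
    return static_cast<bool>(is.read(&payload[0], static_cast<std::streamsize>(n)));
}

// Permissive reading: parse each record as an int, skipping the ones that fail.
std::vector<int> readInts(std::istream& is, std::size_t& failures) {
    std::vector<int> values;
    failures = 0;
    std::string payload;
    while (readRecord(is, payload)) {
        std::istringstream record(payload);
        int v = 0;
        // The record must parse and must be consumed completely.
        if (record >> v && (record >> std::ws).eof()) values.push_back(v);
        else ++failures;
    }
    return values;
}
```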

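The tag (your class ID) is useful here, not (only) for dispatching, but simply to check that you really are deserializing the right type of object. It also makes nice error messages possible.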

Here are some error messages/exceptions you might want:

- No version X for object TYPE: only Y and Z
- Stream is corrupted: here are the next few bytes BBBBBBBBBBBBBBBBBBB
- TYPE (version X) was not completely deserialized
- Trying to deserialize a TYPE1 in TYPE2
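A sketch of using the tag as a check rather than (only) as a dispatch key, assuming each object is written as a type name followed by its fields. The `expectTag` helper and the `Vec2` type are hypothetical; the exception texts follow the message styles listed above.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Reads the next tag and checks it against the type the caller wants to fill.
void expectTag(std::istream& is, const std::string& expected) {
    std::string found;
    if (!(is >> found)) {
        throw std::runtime_error("Stream is corrupted: expected a " + expected + " tag");
    }
    if (found != expected) {
        throw std::runtime_error("Trying to deserialize a " + found + " in " + expected);
    }
}

struct Vec2 { double x = 0, y = 0; };  // hypothetical type written with tag "Vec2"

Vec2 readVec2(std::istream& is) {
    expectTag(is, "Vec2");
    Vec2 v;
    if (!(is >> v.x >> v.y)) {
        throw std::runtime_error("Vec2 was not completely deserialized");
    }
    return v;
}
```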

Note that, as far as I know, both Boost.Serialization and protobuf do help with error and version handling.

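protobuf has some extra benefits too, thanks to its ability to nest messages:

- byte counts are naturally supported, as is versioning
- you can do lazy deserialization (i.e. store the message and only deserialize it when someone asks for it)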

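The counterpart is that polymorphism is harder to handle, because of the fixed format of messages. You have to design your messages carefully for that.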


Answer 3:

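Unfortunately, serialization will never be completely painless in C++, at least not in the foreseeable future, simply because C++ lacks the critical language feature that makes easy serialization possible in other languages: reflection. That is, if you create a class Foo, C++ has no mechanism for programmatically inspecting the class at runtime to determine which member variables it contains.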

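Because of this, there is no way to write a generic serialization function. One way or another, you have to implement a dedicated serialization function for every class. Boost.Serialization is no exception; it simply gives you a convenient framework and a nice set of tools to help you do it.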
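To illustrate the point with a minimal C++17 sketch: the per-class code does not go away, but it can shrink to a single list of members. The `fields()` convention and the `saveFields`/`loadFields` helpers are hypothetical (not Boost.Serialization's API); they rely on `std::tie` and `std::apply` so that one generic function handles every class that provides such a list.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <tuple>

// Generic writer: works for any type whose fields() returns a tuple of references.
template <typename T>
void saveFields(std::ostream& os, T& obj) {
    std::apply([&os](auto&... field) { ((os << field << ' '), ...); }, obj.fields());
}

// Generic reader: reads the same fields back in the same order.
template <typename T>
void loadFields(std::istream& is, T& obj) {
    std::apply([&is](auto&... field) { ((is >> field), ...); }, obj.fields());
}

// Hypothetical class: the only serialization-specific code is the member list.
struct Foo {
    int id = 0;
    double weight = 0.0;
    std::string tag;  // assumed to contain no whitespace
    auto fields() { return std::tie(id, weight, tag); }
};
```

The member list still has to be written and kept up to date by hand, which is exactly the limitation described above; the sketch only removes the duplication between reading and writing.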

Comments:

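The C++ Middleware Writer writes serialization functions automatically.

Actually, C++ does have (some) compile-time reflection, available through metatemplate libraries. It can be exploited by abusing preprocessor directives and Boost.Fusion. I'd rather not go down that road :x

Answer 4: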

Yacoby's answer can be extended further.

I believe that if you actually implement a reflection system, serialization can be done in a way similar to managed languages.

We have been using this automated approach for years.

I am one of the implementers of the C++ postprocessor and reflection library: the LSDC tool and the Linderdaum Engine Core (iObject + RTTI + Linker/Loader). See the source code at http://www.linderdaum.com

The class factory abstracts away the process of instantiating a class.

To initialize specific members, you can add some intrusive RTTI and autogenerate the load/save procedures for them.

Suppose you have an iObject class at the top of your hierarchy.

#include <cstddef>
#include <map>
#include <string>
#include <vector>

using std::map;
using std::string;
using std::vector;

class iMetaClass;
class iProperty;

// Base class with intrusive RTTI
class iObject
{
public:
    iObject(): FMetaClass(NULL) {}
    virtual ~iObject() {}

    const iMetaClass* FMetaClass;
};

// The iMetaClass stores the list of properties and provides the Construct() method
class iMetaClass: public iObject
{
public:
    iMetaClass(): FSuperClass(NULL) {}
    virtual iObject* Construct() const = 0;
    /// List of all the properties (excluding the ones from base class)
    vector<iProperty*> FProperties;
    /// Support the hierarchy
    iMetaClass* FSuperClass;
    /// Name of the class
    string FName;
};

// The NativeMetaClass<T> template implements the Construct() method.
template <class T> class NativeMetaClass: public iMetaClass
{
public:
    virtual iObject* Construct() const
    {
        iObject* Res = new T();
        Res->FMetaClass = this;
        return Res;
    }
};

// mlNode is the representation of the markup language: xml, json or whatever else.
// The hierarchy might have come from the XML file or JSON or some custom script
class mlNode
{
public:
    string FName;
    string FValue;
    vector<mlNode*> FChildren;
};

class iProperty: public iObject
{
public:
    /// Name of the property as it appears in the script
    string FName;
    /// Load the property from internal tree representation
    virtual void Load( iObject* TheObject, mlNode* Node ) const = 0;
    /// Serialize the property to some internal representation
    virtual mlNode* Save( iObject* TheObject ) const = 0;
};

/// function to save a single field
typedef mlNode* ( *SaveFunction_t )( iObject* Obj );

/// function to load a single field from mlNode
typedef void ( *LoadFunction_t )( iObject* Obj, mlNode* Node );

// The implementation for a scalar/iObject field
// The array-based property requires somewhat different implementation
// Load/Save functions are autogenerated by some tool.
class clFieldProperty : public iProperty
{
public:
    clFieldProperty(): FLoadFunction(NULL), FSaveFunction(NULL) {}
    virtual ~clFieldProperty() {}

    /// Load single field of an object
    virtual void Load( iObject* TheObject, mlNode* Node ) const
    {
        FLoadFunction(TheObject, Node);
    }
    /// Save single field of an object
    virtual mlNode* Save( iObject* TheObject ) const
    {
        return FSaveFunction(TheObject);
    }
public:
    // these pointers are set in property registration code
    LoadFunction_t FLoadFunction;
    SaveFunction_t FSaveFunction;
};

// The Loader class stores the list of metaclasses
class Loader: public iObject
{
public:
    void RegisterMetaclass(iMetaClass* C) { FClasses[C->FName] = C; }
    iObject* CreateByName(const string& ClassName)
    {
        map<string, iMetaClass*>::const_iterator It = FClasses.find(ClassName);
        return (It != FClasses.end()) ? It->second->Construct() : NULL;
    }

    /// The implementation is an almost trivial iteration of all the properties
    /// in the metaclass and calling the iProperty's Load/Save methods for each field
    void LoadFromNode(mlNode* Source, iObject** Result);

    /// Create the tree-based representation of the object
    mlNode* Save(iObject* Source);

    map<string, iMetaClass*> FClasses;
};

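When you define a ConcreteClass derived from iObject, you use some extension and a code-generator tool to produce the list of save/load procedures and the registration code.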

Let's look at the code for this example.

Somewhere in the framework we have an empty formal definition:

#define PROPERTY(...)

/// vec3 is a custom type with implementation omitted for brevity
/// ConcreteClass2 is also omitted
class ConcreteClass: public iObject
{
public:
    ConcreteClass(): FInt(10), FString("Default"), FEmbedded(NULL) {}

    /// Inform the tool about our properties
    PROPERTY(Name=Int, Type=int,  FieldName=FInt)
    /// We can also provide get/set accessors
    PROPERTY(Name=Pos, Type=vec3, Getter=GetPos, Setter=SetPos)
    /// And the other field
    PROPERTY(Name=Str, Type=string, FieldName=FString)
    /// And the embedded object
    PROPERTY(Name=Embedded, Type=ConcreteClass2, FieldName=FEmbedded)

    /// public field
    int FInt;
    /// public field
    string FString;
    /// public embedded object
    ConcreteClass2* FEmbedded;

    /// Getter
    vec3 GetPos() const { return FPos; }
    /// Setter
    void SetPos(const vec3& Pos) { FPos = Pos; }
private:
    vec3 FPos;
};

The autogenerated registration code is:

// Forward declarations of the autogenerated accessors (defined below)
void Load_ConcreteClass_FInt_Field(iObject* Dest, mlNode* Val);
mlNode* Save_ConcreteClass_FInt_Field(iObject* Source);

/// Call this to add everything to the loader (linker)
void Register_ConcreteClass(Loader* L)
{
    iMetaClass* C = new NativeMetaClass<ConcreteClass>();
    C->FName = "ConcreteClass";

    clFieldProperty* P;
    P = new clFieldProperty();
    P->FName = "Int";
    P->FLoadFunction = &Load_ConcreteClass_FInt_Field;
    P->FSaveFunction = &Save_ConcreteClass_FInt_Field;
    C->FProperties.push_back(P);
    // ... same for FString and GetPos/SetPos

    C->FSuperClass = L->FClasses["iObject"];
    L->RegisterMetaclass(C);
}

// The autogenerated loaders (no error checking for brevity;
// std::stoi/std::to_string from <string> stand in for the engine's conversion helpers):
void Load_ConcreteClass_FInt_Field(iObject* Dest, mlNode* Val)
{
    dynamic_cast<ConcreteClass*>(Dest)->FInt = std::stoi(Val->FValue);
}

mlNode* Save_ConcreteClass_FInt_Field(iObject* Source)
{
    mlNode* Res = new mlNode();
    Res->FValue = std::to_string( dynamic_cast<ConcreteClass*>(Source)->FInt );
    return Res;
}
/// similar code for FString and GetPos/SetPos pair with obvious changes

Now, if you have a JSON-like hierarchical script

Object("ConcreteClass")
{
    Int 50
    Str 10
    Pos 1.5 2.2 3.3
    Embedded("ConcreteClass2")
    {
        SomeProp Value
    }
}

the Linker/Loader object will resolve all the classes and properties in its Save/Load methods.

Sorry for the long post; the implementation gets even bigger once all the error handling goes in.

Comments:

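I have seen uglier... but not often. I really don't like extra passes during compilation that modify my code. I don't mind adding extra code (like protobuf files), but when the extra pass mangles the files and ends up causing a miscompilation, tracking down the bug is a nightmare.

Well, we are working around a lack of tooling in the language itself; it cannot be done seamlessly. There is no garbage in your sources: the generated meta-information goes into new source files. If you don't include them in your project, there is no garbage (but no factory/serialization either). Compilation errors look tricky (once you miss something in a PROPERTY declaration), but you can get used to them, just as we did with errors in template classes. I am not advocating this as the solution. It cannot match binary serialization for speed, so it is only suitable for small configurations.

I used to see this style of solution a lot... back in the early days of C++, especially before RTTI and templates, and certainly before metaprogramming. I have certainly written something similar more than once myself. Eventually, after a few years of experience, I came to the personal conclusion that this tries to take a solution that suits dynamic languages with built-in reflection and transplant it into a language that doesn't support it. What you are actually creating is a mini dynamic type system... and that is not C++.

Oh, about the "pre-metaprogramming" thing. Suppose we have invented a set of templates to wrap our getters/setters and all the save/load mess. Then, once again, you have to register all of them (adding some "markup" to the source, but this time in native C++ only). With generated code it is a breeze. As for the learning curve: yes, with lots of new-age developers used to garbage collection and managed environments, implementing a C++ reflector is unnecessary. Really, all of this is just battle-tested legacy.

Answer 5: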

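Maybe I'm not being clever here, but I think you end up writing the same kind of code you wrote, simply because C++ has no runtime mechanism for doing anything different. The question is whether that code is written by hand by the developer, generated through template metaprogramming (which I suspect is how boost.serialization does it), or generated by an external tool such as an IDL compiler/code generator.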

Which of these three mechanisms (and perhaps there are other possibilities) to use should be evaluated on a per-project basis.

Comments:

Exactly what I wanted to say!

Answer 6:

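I guess the closest thing to a standard way would be Boost.Serialization. I'd like to hear in what context you heard that thing about class IDs. In the case of serialization I really can't think of another way (unless, of course, you know what type to expect when deserializing). Also, One size does not fit all.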

