protobuf (Protocol Buffers) for serializing data: gobs, pickling, strings, XML; cPickle, implemented in C, can be up to 1000 times faster than pickle

Posted yuanjiangw



 

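Scenario:

Browser request ---> Python generates the data ---> Python generates the Excel file ---> browser downloads the Excel file

Goal:

Refactor into

Browser request ---> Python generates the data ---> Go generates the Excel file ---> browser downloads the Excel file

Second-stage goal:

Implement the entire backend in Go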

 

https://developers.google.com/protocol-buffers/

https://developers.google.com/protocol-buffers/docs/pythontutorial

Protocol Buffer Basics: Go  |  Protocol Buffers  |  Google Developers
https://developers.google.com/protocol-buffers/docs/gotutorial

 

First-stage exploration:

1. Python --- Protocol Buffers --- Go 


Protocol Buffer Basics: Python  |  Protocol Buffers  |  Google Developers
https://developers.google.com/protocol-buffers/docs/pythontutorial

Ways to serialize and retrieve structured data

How do you serialize and retrieve structured data like this? There are a few ways to solve this problem:

  • Use gobs to serialize Go data structures. This is a good solution in a Go-specific environment, but it doesn't work well if you need to share data with applications written for other platforms.
  • You can invent an ad-hoc way to encode the data items into a single string – such as encoding 4 ints as "12:3:-23:67". This is a simple and flexible approach, although it does require writing one-off encoding and parsing code, and the parsing imposes a small run-time cost. This works best for encoding very simple data.
  • Serialize the data to XML. This approach can be very attractive since XML is (sort of) human readable and there are binding libraries for lots of languages. This can be a good choice if you want to share data with other applications/projects. However, XML is notoriously space intensive, and encoding/decoding it can impose a huge performance penalty on applications. Also, navigating an XML DOM tree is considerably more complicated than navigating simple fields in a class normally would be.

Protocol buffers are the flexible, efficient, automated solution to solve exactly this problem. With protocol buffers, you write a .proto description of the data structure you wish to store. From that, the protocol buffer compiler creates a class that implements automatic encoding and parsing of the protocol buffer data with an efficient binary format. The generated class provides getters and setters for the fields that make up a protocol buffer and takes care of the details of reading and writing the protocol buffer as a unit. Importantly, the protocol buffer format supports the idea of extending the format over time in such a way that the code can still read data encoded with the old format.
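To make the "efficient binary format" concrete, the Go sketch below hand-encodes one field the way the protobuf wire format does: a key of (field_number << 3) | wire_type, followed by the value, both written as base-128 varints. It assumes a hypothetical message `message Test1 { int32 a = 1; }` with a = 150, and relies on the standard library's `binary.PutUvarint`/`binary.Uvarint` using the same unsigned varint as protobuf. The helper names `encodeVarintField` and `decodeField1` are made up for illustration; real code would use protoc-generated types, with `proto.Marshal` from `google.golang.org/protobuf/proto` on the Go side and `SerializeToString()` on the Python side.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// wireTypeVarint is wire type 0 ("varint") in the protobuf wire format.
const wireTypeVarint = 0

// encodeVarintField appends one varint-typed field to buf: the key
// (fieldNumber<<3 | wire type) followed by the value, both as base-128
// varints. Protobuf sign-extends negative int32 values to 64 bits
// (10 varint bytes); this sketch only accepts non-negative values.
func encodeVarintField(buf []byte, fieldNumber int, value uint64) []byte {
	tmp := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(tmp, uint64(fieldNumber)<<3|wireTypeVarint)
	buf = append(buf, tmp[:n]...)
	n = binary.PutUvarint(tmp, value)
	return append(buf, tmp[:n]...)
}

// decodeField1 scans a buffer of varint-typed fields and returns the value
// of field 1 (the last occurrence wins if it repeats). Unknown field numbers,
// such as fields added by a newer schema, are skipped instead of rejected.
// Only wire type 0 is handled in this sketch.
func decodeField1(buf []byte) (uint64, error) {
	var result uint64
	for len(buf) > 0 {
		key, n := binary.Uvarint(buf)
		if n <= 0 {
			return 0, errors.New("malformed key")
		}
		buf = buf[n:]
		if key&7 != wireTypeVarint {
			return 0, fmt.Errorf("wire type %d not handled in this sketch", key&7)
		}
		val, m := binary.Uvarint(buf)
		if m <= 0 {
			return 0, errors.New("malformed value")
		}
		buf = buf[m:]
		if key>>3 == 1 {
			result = val
		}
	}
	return result, nil
}

func main() {
	// Hypothetical message: message Test1 { int32 a = 1; } with a = 150.
	old := encodeVarintField(nil, 1, 150)
	fmt.Printf("% x\n", old) // 08 96 01

	// Same data plus a field 2 that an older reader does not know about.
	newer := encodeVarintField(encodeVarintField(nil, 1, 150), 2, 7)
	v, err := decodeField1(newer)
	fmt.Println(v, err) // 150 <nil>
}
```

Skipping unknown field numbers is a simplified picture of how old code can keep reading data written with a newer schema; the generated code handles all wire types, not just varints.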

 

1/3. gobs (Go) and pickling (Python)

https://golang.org/pkg/encoding/gob/

Package gob manages streams of gobs - binary values exchanged between an Encoder (transmitter) and a Decoder (receiver). A typical use is transporting arguments and results of remote procedure calls (RPCs) such as those provided by package "net/rpc".

The implementation compiles a custom codec for each data type in the stream and is most efficient when a single Encoder is used to transmit a stream of values, amortizing the cost of compilation.

Poor cross-language support: limited to Go.
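A minimal gob round trip, assuming a hypothetical `Row` type that stands in for one row of report data. A single `gob.Encoder` writes the whole stream, which is the usage the package documentation describes as most efficient, and the decoder reads until `io.EOF`. Both ends have to be Go programs, which is the cross-language limitation just noted.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"errors"
	"fmt"
	"io"
	"log"
)

// Row is a hypothetical record type, e.g. one row of report data.
type Row struct {
	Name  string
	Score int
}

// encodeRows writes every row through one Encoder, so the type
// description for Row is transmitted only once for the whole stream.
func encodeRows(rows []Row) (*bytes.Buffer, error) {
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)
	for _, r := range rows {
		if err := enc.Encode(r); err != nil {
			return nil, err
		}
	}
	return &buf, nil
}

// decodeRows reads rows with one matching Decoder until the stream ends.
func decodeRows(r io.Reader) ([]Row, error) {
	dec := gob.NewDecoder(r)
	var out []Row
	for {
		var row Row
		err := dec.Decode(&row)
		if errors.Is(err, io.EOF) {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		out = append(out, row)
	}
}

func main() {
	rows := []Row{{Name: "alice", Score: 90}, {Name: "bob", Score: 85}}
	buf, err := encodeRows(rows)
	if err != nil {
		log.Fatal(err)
	}
	got, err := decodeRows(buf)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(got) // [{alice 90} {bob 85}]
}
```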

Use Python pickling. This is the default approach since it's built into the language, but it doesn't deal well with schema evolution, and also doesn't work very well if you need to share data with applications written in C++ or Java. 

11.1. pickle — Python object serialization — Python 2.7.16 documentation
https://docs.python.org/2/library/pickle.html

The pickle module implements a fundamental, but powerful algorithm for serializing and de-serializing a Python object structure. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream is converted back into an object hierarchy. Pickling (and unpickling) is alternatively known as “serialization”, “marshalling,” [1] or “flattening”, however, to avoid confusion, the terms used here are “pickling” and “unpickling”.

This documentation describes both the pickle module and the cPickle module.

11.1. pickle — Python object serialization — Python 2.7.16 documentation
https://docs.python.org/2/library/pickle.html#module-cPickle

The cPickle module supports serialization and de-serialization of Python objects, providing an interface and functionality nearly identical to the pickle module. There are several differences, the most important being performance and subclassability.

First, cPickle can be up to 1000 times faster than pickle because the former is implemented in C. Second, in the cPickle module the callables Pickler() and Unpickler() are functions, not classes. This means that you cannot use them to derive custom pickling and unpickling subclasses. Most applications have no need for this functionality and should benefit from the greatly improved performance of the cPickle module.

The pickle data stream produced by pickle and cPickle are identical, so it is possible to use pickle and cPickle interchangeably with existing pickles. [10]

There are additional minor differences in API between cPickle and pickle, however for most applications, they are interchangeable. More documentation is provided in the pickle module documentation, which includes a list of the documented differences.

 

2/3. string

The drawback of a plain string is that it can only describe simple data structures.
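The ad-hoc approach from the Protocol Buffers tutorial (encoding 4 ints as "12:3:-23:67") takes only a few lines of Go with `strconv` and `strings`; `encodeInts` and `decodeInts` are hypothetical names. The parser has no notion of nesting, field names, or optional fields, which is why this approach only suits simple data.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// encodeInts writes the ints as a colon-separated string, e.g. "12:3:-23:67".
func encodeInts(nums []int) string {
	parts := make([]string, len(nums))
	for i, n := range nums {
		parts[i] = strconv.Itoa(n)
	}
	return strings.Join(parts, ":")
}

// decodeInts is the matching one-off parser: it only understands a flat
// list of integers, with no field names, nesting, or versioning.
func decodeInts(s string) ([]int, error) {
	if s == "" {
		return nil, nil
	}
	fields := strings.Split(s, ":")
	nums := make([]int, len(fields))
	for i, f := range fields {
		n, err := strconv.Atoi(f)
		if err != nil {
			return nil, fmt.Errorf("field %d: %w", i, err)
		}
		nums[i] = n
	}
	return nums, nil
}

func main() {
	s := encodeInts([]int{12, 3, -23, 67})
	fmt.Println(s) // 12:3:-23:67
	nums, err := decodeInts(s)
	fmt.Println(nums, err) // [12 3 -23 67] <nil>
}
```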

3/3. XML

Good cross-language support, but high resource consumption and poor performance.
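A short `encoding/xml` round trip, again with a hypothetical `Row` type and hypothetical helper names, shows where the space overhead comes from: every value is wrapped in an opening and a closing tag, so the markup is larger than the data it carries.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// Row is a hypothetical record type. Without struct tags, encoding/xml
// uses the type name as the element name and each field name as a child.
type Row struct {
	Name  string
	Score int
}

// rowToXML marshals a Row; every value gets an opening and a closing tag.
func rowToXML(r Row) (string, error) {
	data, err := xml.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

// rowFromXML parses the XML text back into a Row.
func rowFromXML(s string) (Row, error) {
	var r Row
	err := xml.Unmarshal([]byte(s), &r)
	return r, err
}

func main() {
	s, err := rowToXML(Row{Name: "alice", Score: 90})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(s) // <Row><Name>alice</Name><Score>90</Score></Row>

	back, err := rowFromXML(s)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(back) // {alice 90}
}
```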

 
