protocol-buffer message format


Preface: This article was compiled by the editors of 小常识网 (cha138.com). It introduces the protocol-buffer message format, and we hope it is of some use to you.


This article introduces the format of Google protocol-buffer messages and points to note when using them. It does not cover the related APIs.

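A message is made up of one or more fields, much like a struct in C. Each field follows a fixed format:
Field format: qualifier① | data type② | field name③ | = | field number④ | [default value⑤]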
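To make the five parts of the field format concrete, here is a minimal Python sketch that splits a single proto2 field declaration into qualifier, type, name, number and optional default. It assumes a simplified grammar: one field per line, terminated by ";", with only the default option supported. Real .proto files allow much more (other options, groups, maps, comments), so treat this as an illustration rather than a parser. The result_per_page field is a hypothetical example.

```python
import re
from typing import NamedTuple, Optional


class FieldDecl(NamedTuple):
    label: str              # ① required / optional / repeated
    type_name: str          # ② scalar type, or a message/enum name
    name: str               # ③ field name
    number: int             # ④ field number
    default: Optional[str]  # ⑤ default value text, or None if absent


# Simplified grammar (assumption): one declaration per line, only [default = ...].
FIELD_RE = re.compile(
    r"^\s*(required|optional|repeated)\s+"      # ① qualifier
    r"([A-Za-z_][\w.]*)\s+"                     # ② type, possibly qualified (pkg.Type)
    r"([A-Za-z_]\w*)\s*=\s*"                    # ③ name, then '='
    r"(\d+)\s*"                                 # ④ field number
    r"(?:\[\s*default\s*=\s*([^\]]+?)\s*\])?"   # ⑤ optional default
    r"\s*;\s*$"
)


def parse_field(line: str) -> FieldDecl:
    """Split one simplified proto2 field declaration into its five parts."""
    m = FIELD_RE.match(line)
    if m is None:
        raise ValueError(f"not a simple proto2 field declaration: {line!r}")
    label, type_name, name, number, default = m.groups()
    return FieldDecl(label, type_name, name, int(number), default)


print(parse_field("optional int32 result_per_page = 3 [default = 10];"))
```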

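The qualifier is one of three values. required means the field must be set exactly once. optional means it is set zero or one time. repeated means it may be set any number of times, including zero, and the order is preserved.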

protocol-buffer scalar data types

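Notes:
N means the encoded size is not fixed but depends on the magnitude or length of the value. For int32, for example, a small value between 0 and 127 is encoded in a single byte. Enums are encoded as varints in the same way as int32, which for non-negative values is identical to uint32.

fixed32 and int32 trade space against speed. fixed32 always occupies exactly 4 bytes and is cheaper to encode and decode. int32 is a varint of 1 to 5 bytes, or 10 bytes for a negative value, so it is usually smaller when values are small. In short, fixed32 favors time efficiency and int32 favors space efficiency.

Depending on the project, fixed32 is generally the choice. Where the volume of transmitted data is strictly limited, int32 may be the better option.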
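The size rules above can be checked with a short Python sketch. It implements base-128 varint encoding, which int32, uint32, enum and the other varint types use, next to the fixed-width 4-byte encoding of fixed32. The function names are my own. The byte layout follows the protobuf wire format, including the rule that a negative int32 is sign-extended to 64 bits before encoding.

```python
import struct


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, least-significant group first,
    with the high bit set on every byte except the last."""
    if value < 0:
        # Negative int32/int64 values are sign-extended to 64 bits, hence 10 bytes.
        value += 1 << 64
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte
            return bytes(out)


def encode_fixed32(value: int) -> bytes:
    """fixed32: always exactly 4 bytes, little-endian, unsigned."""
    return struct.pack("<I", value)


for v in (1, 127, 128, 300, 2**28, -1):
    print(v, "->", len(encode_varint(v)), "byte(s) as a varint")
print(300, "->", len(encode_fixed32(300)), "bytes as fixed32")
```

With this sketch, 0 to 127 fit in one byte, 300 takes two, 2^28 already needs five, and -1 needs ten, while fixed32 is always four bytes. This is why fixed32 is the more compact choice when values are often 2^28 or larger.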

Notes on enums

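When defining a message type, you may need one of its fields to hold only one of a predefined set of values. Suppose, for example, that every search request needs a new category field whose value is one of UNIVERSAL, WEB, IMAGES, LOCAL, NEWS, PRODUCTS or VIDEO. All this takes is adding an enum-typed field to the message.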

An enum can be defined inside a message or outside any message. An enum defined at the top level can be reused by messages in any other .proto file that imports it.

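Using other message types as field types
PB allows a message type to be used as a field type. For example, a search response message may contain result messages. To do this, define a Result message and then, in the .proto file, add a field to the search response message whose type is Result.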
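On the wire, a field whose type is another message is stored as a length-delimited value (wire type 2). It is written as the field key, then the byte length of the nested message, then the nested message's own encoding. The Python sketch below hand-encodes a hypothetical Result inside a SearchResponse to show this layout. The schema in the comment and all helper names are assumptions for illustration; a real project would use protoc-generated code instead.

```python
WIRE_LEN = 2  # length-delimited: strings, bytes and embedded messages


def encode_varint(value: int) -> bytes:
    """Base-128 varint for non-negative integers."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def key(field_number: int, wire_type: int) -> bytes:
    """Field key: varint of (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


def string_field(field_number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    return key(field_number, WIRE_LEN) + encode_varint(len(data)) + data


def message_field(field_number: int, encoded_message: bytes) -> bytes:
    # An embedded message is written exactly like a bytes field:
    # key, length of the nested encoding, then the nested encoding itself.
    return key(field_number, WIRE_LEN) + encode_varint(len(encoded_message)) + encoded_message


# Hypothetical schema, for illustration only:
#   message Result { required string url = 1; optional string title = 2; }
#   message SearchResponse { repeated Result result = 1; }
result = string_field(1, "http://example.com") + string_field(2, "Example")
# A repeated message field is simply the same key written once per element.
response = message_field(1, result) + message_field(1, result)
print(len(result), len(response))
```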

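Here the Result message type is defined in the same file as SearchResponse. What if the Result message you want to use is already defined in another .proto file?
You can use the definitions in another .proto file by importing it. To do so, add an import statement at the top of the current .proto file.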

Nested types:
Message types can be nested to any depth, like nested classes in C++.

protobuf recommends naming fields in lower_snake_case, that is, lowercase words separated by underscores: first_name rather than firstName.

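The field number (④ in the field format) is what allows the two communicating sides to identify each other's fields. Of course, a given field number must be declared with the same qualifier and data type on both sides.
Field numbers range from 1 to 2^29 - 1 (536,870,911).
Numbers 1 to 15 are the most efficient in both time and space, and larger numbers are encoded less efficiently than 1 to 15. Two adjacent numbers generally encode with the same efficiency. The exception is a pair that straddles a boundary at which the encoded key grows by one byte, such as 15 and 16. The next boundaries are 2047/2048 and 262143/262144.
Numbers 19000 to 19999 are reserved for the protobuf implementation itself and cannot be used in your own messages. The protocol buffer compiler reports an error if you use one.
protobuf also recommends assigning numbers 1 to 15 to the fields that are transmitted most often.
The field numbers in a message need not be contiguous, as long as each one is valid and no two fields in the same message share a number.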
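The 15/16 boundary comes from how a field's key is encoded. The key is the varint of (field_number << 3) | wire_type, so it grows by one byte each time that value needs another 7 bits. This Python sketch computes the key size. It also rejects numbers outside 1 to 2^29 - 1 and numbers inside the reserved 19000 to 19999 range. The helper names are hypothetical.

```python
MAX_FIELD_NUMBER = (1 << 29) - 1   # 536,870,911
RESERVED = range(19000, 20000)     # 19000-19999, reserved for the implementation


def varint_len(value: int) -> int:
    """Number of bytes a non-negative integer takes as a base-128 varint."""
    n = 1
    while value >= 0x80:
        value >>= 7
        n += 1
    return n


def key_size(field_number: int, wire_type: int = 0) -> int:
    """Bytes needed for a field key, i.e. varint((field_number << 3) | wire_type)."""
    if not 1 <= field_number <= MAX_FIELD_NUMBER:
        raise ValueError(f"field number out of range: {field_number}")
    if field_number in RESERVED:
        raise ValueError(f"field number {field_number} is reserved")
    return varint_len((field_number << 3) | wire_type)


for n in (1, 15, 16, 2047, 2048, 262143, 262144, MAX_FIELD_NUMBER):
    print(n, key_size(n))
```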
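Recommendation: once a project is in production, every field added in a later version upgrade should be optional or repeated, and required should be avoided wherever possible. A new required field forces everyone on the network to upgrade at the same time. Optional or repeated fields allow a smooth, gradual upgrade.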
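Optional fields allow smooth upgrades because a parser skips field numbers it does not know. A required field, by contrast, makes the parser reject any message that lacks it. The simplified Python decoder below illustrates both behaviors and handles only wire types 0 and 2. It is a sketch of the idea, not how the official libraries are written; those typically preserve unknown fields rather than discarding them. The function names and sample bytes are assumptions.

```python
from typing import Dict, Set, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode(buf: bytes, known: Set[int], required: Set[int]) -> Dict[int, object]:
    """Decode wire types 0 (varint) and 2 (length-delimited) only."""
    fields: Dict[int, object] = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if number in known:
            fields[number] = value  # last occurrence wins, as for a non-repeated field
        # Unknown field numbers are skipped, so an old reader tolerates new optional fields.
    missing = required - set(fields)
    if missing:
        raise ValueError(f"missing required field(s): {sorted(missing)}")
    return fields


# Newer writer: field 1 = 150, plus a newly added optional string field 3 = "hi".
new_msg = b"\x08\x96\x01" + b"\x1a\x02hi"
print(decode(new_msg, known={1}, required={1}))  # an old reader still accepts it
```

Here an old reader that only knows field 1 accepts a message from a newer writer that added field 3. A newer reader that declared field 3 as required would instead reject every message from an old writer.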

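protocol-buffer lets you declare optional fields. As the name suggests, such a field may or may not be set in a given message. If it is not set, parsing assigns it a default value based on the field's type. Alternatively, you can give an optional field its own default value in the message definition by adding a [default = value] option.

If no value is set, the field takes its default. For scalar and enum fields you can choose the default yourself, as with the PhoneType field of the PhoneNumber message. Otherwise a system default is used: 0 for numeric types, the empty string for string, and false for bool. For an enum, the default is the first value listed in the enum definition.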
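The lookup order described above can be sketched in Python. A reader first uses the value on the wire, then the declared default, and finally the type default or the first enum value. The dictionary standing in for a parsed message, the field_value helper and the PhoneType values are all hypothetical. Generated protobuf classes apply these rules for you when you read an unset field.

```python
from enum import IntEnum
from typing import Any, Dict, Optional


class PhoneType(IntEnum):
    # Hypothetical values, modeled on the PhoneNumber example mentioned above.
    MOBILE = 0
    HOME = 1
    WORK = 2


# Defaults used when the .proto declares no explicit default.
TYPE_DEFAULTS: Dict[str, Any] = {
    "int32": 0, "int64": 0, "uint32": 0, "uint64": 0,
    "float": 0.0, "double": 0.0, "bool": False,
    "string": "", "bytes": b"",
}


def field_value(parsed: Dict[str, Any], name: str, type_name: str,
                declared_default: Any = None,
                enum_type: Optional[type] = None) -> Any:
    """Value of an optional field as a reader sees it."""
    if name in parsed:                # set on the wire: use the real value
        return parsed[name]
    if declared_default is not None:  # [default = ...] in the .proto
        return declared_default
    if enum_type is not None:         # enum: first value listed
        return next(iter(enum_type))
    return TYPE_DEFAULTS[type_name]   # 0 / "" / False / b""


phone = {"number": "555-4321"}  # 'type' was never set
print(field_value(phone, "type", "enum", PhoneType.HOME, PhoneType).name)  # HOME
print(field_value(phone, "type", "enum", enum_type=PhoneType).name)        # MOBILE
```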

This article brings together material found online as a record of my own learning.
Main reference:
http://blog.sina.com.cn/s/blog_abea023b0101dxce.html

protobuf protocol-buffers: serializing data with gobs, pickling, strings and XML; cPickle, implemented in C, is up to 1000 times faster than pickle

 

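Scenario:

Browser request ---> Python generates the data ---> Python generates the Excel file ---> browser downloads the Excel file

Goal:

Refactor into

Browser request ---> Python generates the data ---> Go generates the Excel file ---> browser downloads the Excel file

Second-stage goal:

Implement the entire backend in Go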

 

https://developers.google.com/protocol-buffers/

https://developers.google.com/protocol-buffers/docs/pythontutorial

Protocol Buffer Basics: Go  |  Protocol Buffers  |  Google Developers
https://developers.google.com/protocol-buffers/docs/gotutorial

 

Stage-one exploration:

1、python  --- protocol-buffers --- golang 

Protocol Buffer Basics: Go  |  Protocol Buffers  |  Google Developers
https://developers.google.com/protocol-buffers/docs/gotutorial

Protocol Buffer Basics: Python  |  Protocol Buffers  |  Google Developers
https://developers.google.com/protocol-buffers/docs/pythontutorial

Ways to serialize and retrieve structured data

How do you serialize and retrieve structured data like this? There are a few ways to solve this problem:

  • Use gobs to serialize Go data structures. This is a good solution in a Go-specific environment, but it doesn't work well if you need to share data with applications written for other platforms.
  • You can invent an ad-hoc way to encode the data items into a single string – such as encoding 4 ints as "12:3:-23:67". This is a simple and flexible approach, although it does require writing one-off encoding and parsing code, and the parsing imposes a small run-time cost. This works best for encoding very simple data.
  • Serialize the data to XML. This approach can be very attractive since XML is (sort of) human readable and there are binding libraries for lots of languages. This can be a good choice if you want to share data with other applications/projects. However, XML is notoriously space intensive, and encoding/decoding it can impose a huge performance penalty on applications. Also, navigating an XML DOM tree is considerably more complicated than navigating simple fields in a class normally would be.
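The ad-hoc string approach from the second bullet is easy to sketch in Python, and the sketch also shows its limits. The format only works for a flat list of integers. Any value containing ':' or any nested structure would need a new one-off encoder and parser. The function names are illustrative.

```python
from typing import List


def encode_ints(values: List[int]) -> str:
    """One-off format: decimal integers joined by ':'."""
    return ":".join(str(v) for v in values)


def decode_ints(text: str) -> List[int]:
    """Matching one-off parser; assumes a flat list of ints and nothing else."""
    return [int(part) for part in text.split(":")] if text else []


encoded = encode_ints([12, 3, -23, 67])
print(encoded)               # 12:3:-23:67
print(decode_ints(encoded))  # [12, 3, -23, 67]
```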

Protocol buffers are the flexible, efficient, automated solution to solve exactly this problem. With protocol buffers, you write a .proto description of the data structure you wish to store. From that, the protocol buffer compiler creates a class that implements automatic encoding and parsing of the protocol buffer data with an efficient binary format. The generated class provides getters and setters for the fields that make up a protocol buffer and takes care of the details of reading and writing the protocol buffer as a unit. Importantly, the protocol buffer format supports the idea of extending the format over time in such a way that the code can still read data encoded with the old format.

 

1/3、gobs -- golang pickling -- python   

https://golang.org/pkg/encoding/gob/

Package gob manages streams of gobs - binary values exchanged between an Encoder (transmitter) and a Decoder (receiver). A typical use is transporting arguments and results of remote procedure calls (RPCs) such as those provided by package "net/rpc".

The implementation compiles a custom codec for each data type in the stream and is most efficient when a single Encoder is used to transmit a stream of values, amortizing the cost of compilation.

Poor cross-language support: gobs are limited to Go.

Use Python pickling. This is the default approach since it's built into the language, but it doesn't deal well with schema evolution, and also doesn't work very well if you need to share data with applications written in C++ or Java.

11.1. pickle — Python object serialization — Python 2.7.16 documentation
https://docs.python.org/2/library/pickle.html

The pickle module implements a fundamental, but powerful algorithm for serializing and de-serializing a Python object structure. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream is converted back into an object hierarchy. Pickling (and unpickling) is alternatively known as “serialization”, “marshalling,” [1] or “flattening”, however, to avoid confusion, the terms used here are “pickling” and “unpickling”.

This documentation describes both the pickle module and the cPickle module.

11.1. pickle — Python object serialization — Python 2.7.16 documentation
https://docs.python.org/2/library/pickle.html#module-cPickle

The cPickle module supports serialization and de-serialization of Python objects, providing an interface and functionality nearly identical to the pickle module. There are several differences, the most important being performance and subclassability.

First, cPickle can be up to 1000 times faster than pickle because the former is implemented in C. Second, in the cPickle module the callables Pickler() and Unpickler() are functions, not classes. This means that you cannot use them to derive custom pickling and unpickling subclasses. Most applications have no need for this functionality and should benefit from the greatly improved performance of the cPickle module.

The pickle data stream produced by pickle and cPickle are identical, so it is possible to use pickle and cPickle interchangeably with existing pickles. [10]

There are additional minor differences in API between cPickle and pickle, however for most applications, they are interchangeable. More documentation is provided in the pickle module documentation, which includes a list of the documented differences.
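A minimal pickle round trip in Python 3 looks like the code below. Python 3 has no separate cPickle module; the standard pickle module uses its C accelerator automatically when one is available. The record is made-up sample data. The pickle documentation warns never to unpickle data from untrusted sources. The resulting bytes are also Python-specific, so a Go service in the scenario above could not read them without a pickle decoder.

```python
import pickle

# Made-up sample data: the kind of rows the Python side might hand over.
record = {"name": "report.xlsx", "rows": [[1, "a"], [2, "b"]], "ok": True}

data = pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)  # object -> bytes
restored = pickle.loads(data)                                   # bytes -> object

print(type(data).__name__, restored == record)  # bytes True
```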

 

2/3、string

The drawback of plain strings is that they can only describe simple data structures.

3/3、XML

Good cross-language support, but high resource consumption and poor performance.
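To put a rough number on XML's overhead, this Python sketch encodes the four integers from the ad-hoc string example (12, 3, -23, 67) in two ways. The first is XML built with xml.etree.ElementTree. The second is four fixed-width signed 32-bit integers packed with struct. The element names are arbitrary, and only the relative size matters. The binary form also relies on both sides knowing the layout in advance, which is the job a .proto schema does for protocol buffers.

```python
import struct
import xml.etree.ElementTree as ET

values = [12, 3, -23, 67]

# XML: self-describing and human-readable, but a tag pair is repeated for every value.
root = ET.Element("values")
for v in values:
    ET.SubElement(root, "value").text = str(v)
xml_bytes = ET.tostring(root)

# Compact binary: four little-endian signed 32-bit ints, layout agreed out of band.
bin_bytes = struct.pack("<4i", *values)

print(len(xml_bytes), "bytes of XML vs", len(bin_bytes), "bytes of binary")
```

Here the XML comes out at 85 bytes against 16 for the binary form, before even counting the cost of parsing it.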

 
