Protobuf-net Deserialize Open Street Maps

Posted 2023-02-16


Originally posted: 2011-01-11 22:20:42

Question:

For the life of me, I cannot deserialize the protobuf file from Open Street Maps.

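I am trying to deserialize the following extract: http://download.geofabrik.de/osm/north-america/us-northeast.osm.pbf to get the nodes, and I am using http://code.google.com/p/protobuf-net/ as the library. I have tried deserializing into a bunch of different objects, but they all come out null.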

The original .proto files can be found here: http://trac.openstreetmap.org/browser/applications/utils/export/osm2pgsql/protobuf

Any suggestions?

Comments:

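I'm the author of protobuf-net; I'm on "work" time right now, but I'll try to look at this later today to see what's wrong.
I know who you are Marc, I downloaded your software. I love the "work" in quotes, haha. Thanks for the help (and the framework)!

Answer 1: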

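Right; the problem is that this isn't just protobuf. It is a hybrid file format (defined here) that contains protobuf inside various other formats internally. It also incorporates compression (although that looks to be optional).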
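A minimal sketch of the outer framing, assuming .NET Core 2.1 or later for `BinaryPrimitives` (nothing in this thread uses it): each fileblock starts with a 4-byte length in network byte order, so the prefix can be read directly as big-endian instead of being read as little-endian `Fixed32` and swapped afterwards. `PbfFraming` and `TryReadHeaderLength` are hypothetical names.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

// Hypothetical helper: reads the int4 (big-endian) BlockHeader length that
// precedes every fileblock in an .osm.pbf file.
public static class PbfFraming
{
    // Returns false at a clean end of file; throws if the prefix is cut short.
    public static bool TryReadHeaderLength(Stream stream, out int length)
    {
        Span<byte> prefix = stackalloc byte[4];
        int read = 0;
        while (read < 4)
        {
            int n = stream.Read(prefix.Slice(read));
            if (n == 0) break; // no more data
            read += n;
        }
        if (read == 0) { length = 0; return false; }
        if (read < 4) throw new EndOfStreamException("Truncated length prefix.");
        length = BinaryPrimitives.ReadInt32BigEndian(prefix); // network byte order
        return true;
    }
}
```

In the loop below, `TryReadHeaderLength(file, out length)` could stand in for `Serializer.TryReadLengthPrefix(...)` plus the manual byte swap.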

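I've pulled apart what I can from the spec, and I have a C# reader here that uses protobuf-net to handle the chunks. It happily reads through to the end of that file, and I can tell you there are 4515 blocks (BlockHeader). When it gets to the Blob, I'm a bit confused about how the spec distinguishes OSMHeader from OSMData; I'm open to suggestions here! I've also used ZLIB.NET to handle the zlib compression being used. In the absence of getting my head around that, I've settled for processing the ZLIB data and validating it against the claimed size, to check that it is at least sane.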
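For the "validate against the claimed size" step, here is a sketch that swaps ZLIB.NET for the built-in `System.IO.Compression.ZLibStream` (an assumption: that class needs .NET 6 or later, which postdates this thread). `BlobInflater.Inflate` is a hypothetical helper; `zlibData` and `rawSize` stand for the Blob's `zlib_data` and `raw_size` fields.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper (assumes .NET 6+): inflate a Blob's zlib_data and
// check the result against the Blob's raw_size.
public static class BlobInflater
{
    public static byte[] Inflate(byte[] zlibData, int rawSize)
    {
        using var input = new MemoryStream(zlibData);
        using var zlib = new ZLibStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream(rawSize);
        zlib.CopyTo(output); // decompress everything
        byte[] payload = output.ToArray();
        if (payload.Length != rawSize)
            throw new InvalidDataException(
                $"Inflated {payload.Length} bytes, but raw_size says {rawSize}.");
        return payload;
    }
}
```

The returned bytes are the serialized HeaderBlock or PrimitiveBlock, so they could be handed to `Serializer.Deserialize<T>` via a `MemoryStream`.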

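If you can figure out (or ask the author) how they distinguish OSMHeader from OSMData, I'll happily plug in the rest. I hope you don't mind that I've stopped here, but it has been a few hours ;p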

using System;
using System.IO;
using OpenStreetMap; // where my .proto-generated entities are living
using ProtoBuf; // protobuf-net
using zlib; // ZLIB.NET

class OpenStreetMapParser
{
    static void Main()
    {
        using (var file = File.OpenRead("us-northeast.osm.pbf"))
        {
            // from http://wiki.openstreetmap.org/wiki/ProtocolBufBinary:
            //A file contains a header followed by a sequence of fileblocks. The design is intended to allow future random-access to the contents of the file and skipping past not-understood or unwanted data.
            //The format is a repeating sequence of:
            //int4: length of the BlockHeader message in network byte order
            //serialized BlockHeader message
            //serialized Blob message (size is given in the header)

            int length, blockCount = 0;
            while (Serializer.TryReadLengthPrefix(file, PrefixStyle.Fixed32, out length))
            {
                // I'm just being lazy and re-using something "close enough" here
                // note that v2 has a big-endian option, but Fixed32 assumes little-endian - we
                // actually need the other way around (network byte order):
                uint len = (uint)length;
                len = ((len & 0xFF) << 24) | ((len & 0xFF00) << 8) | ((len & 0xFF0000) >> 8) | ((len & 0xFF000000) >> 24);
                length = (int)len;

                BlockHeader header;
                // again, v2 has capped-streams built in, but I'm deliberately
                // limiting myself to v1 features
                using (var tmp = new LimitedStream(file, length))
                {
                    header = Serializer.Deserialize<BlockHeader>(tmp);
                }
                Blob blob;
                using (var tmp = new LimitedStream(file, header.datasize))
                {
                    blob = Serializer.Deserialize<Blob>(tmp);
                }
                if (blob.zlib_data == null) throw new NotSupportedException("I'm only handling zlib here!");

                using (var ms = new MemoryStream(blob.zlib_data))
                using (var zlib = new ZLibStream(ms))
                { // at this point I'm very unclear how the OSMHeader and OSMData are packed - it isn't clear
                    // read this to the end, to check we can parse the zlib
                    int payloadLen = 0;
                    while (zlib.ReadByte() >= 0) payloadLen++;
                    if (payloadLen != blob.raw_size) throw new FormatException("Screwed that up...");
                }
                blockCount++;
                Console.WriteLine("Read block " + blockCount.ToString());
            }
            Console.WriteLine("all done");
            Console.ReadLine();
        }
    }
}

// read-only, forward-only base: subclasses just supply ReadNextBlock
abstract class InputStream : Stream
{
    protected abstract int ReadNextBlock(byte[] buffer, int offset, int count);
    public sealed override int Read(byte[] buffer, int offset, int count)
    {
        int bytesRead, totalRead = 0;
        while (count > 0 && (bytesRead = ReadNextBlock(buffer, offset, count)) > 0)
        {
            count -= bytesRead;
            offset += bytesRead;
            totalRead += bytesRead;
            pos += bytesRead;
        }
        return totalRead;
    }
    long pos;
    public override void Write(byte[] buffer, int offset, int count)
    {
        throw new NotImplementedException();
    }
    public override void SetLength(long value)
    {
        throw new NotImplementedException();
    }
    public override long Position
    {
        get
        {
            return pos;
        }
        set
        {
            if (pos != value) throw new NotImplementedException();
        }
    }
    public override long Length
    {
        get { throw new NotImplementedException(); }
    }
    public override void Flush()
    {
        // read-only stream: nothing to flush, so this is a no-op rather than an error
    }
    public override bool CanWrite
    {
        get { return false; }
    }
    public override bool CanRead
    {
        get { return true; }
    }
    public override bool CanSeek
    {
        get { return false; }
    }
    public override long Seek(long offset, SeekOrigin origin)
    {
        throw new NotImplementedException();
    }
}

class ZLibStream : InputStream
{   // uses ZLIB.NET: http://www.componentace.com/download/download.php?editionid=25
    private ZInputStream reader; // seriously, why isn't this a stream?
    public ZLibStream(Stream stream)
    {
        reader = new ZInputStream(stream);
    }
    public override void Close()
    {
        reader.Close();
        base.Close();
    }
    protected override int ReadNextBlock(byte[] buffer, int offset, int count)
    {
        // OMG! reader.Read is the base-stream, reader.read is decompressed! yeuch
        return reader.read(buffer, offset, count);
    }
}

// deliberately doesn't dispose the base-stream
class LimitedStream : InputStream
{
    private Stream stream;
    private long remaining;
    public LimitedStream(Stream stream, long length)
    {
        if (length < 0) throw new ArgumentOutOfRangeException("length");
        if (stream == null) throw new ArgumentNullException("stream");
        if (!stream.CanRead) throw new ArgumentException("stream");
        this.stream = stream;
        this.remaining = length;
    }
    protected override int ReadNextBlock(byte[] buffer, int offset, int count)
    {
        if (count > remaining) count = (int)remaining;
        int bytesRead = stream.Read(buffer, offset, count);
        if (bytesRead > 0) remaining -= bytesRead;
        return bytesRead;
    }
}
Comments:

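This is really awesome. Thanks for getting a head start; I'll see what I can get! (You're the man.) I'm going to try working backwards from github.com/scrosby/OSM-binary/tree/master/src.java/crosby/…
I don't understand the comment "where my .proto-generated entities are living", nor where you got OpenStreetMap from.
@RenéNyffenegger It looks like the external link to the .proto has been removed in the meantime; presumably OpenStreetMap is the namespace declared in the .proto file, or one I overrode on the command line for convenience.
@MarcGravell Is there any chance you could take a look at this, Marc: ***.com/questions/59599088 It's about OSMSharp, and its author Ben Abelshausen uses JonPerl's answer below.

Answer 2: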

Following the outline Marc set up, I figured out the last part by looking at http://git.openstreetmap.nl/index.cgi/pbf2osm.git/tree/src/main.c?h=35116112eb0066c7729a963b292faa608ddc8ad7

Here is the final code.

using System;
using System.Diagnostics;
using System.IO;
using crosby.binary;
using OSMPBF;
using PerlLLC.Tools;
using ProtoBuf;
using zlib;

namespace OpenStreetMapOperations
{
    class OpenStreetMapParser
    {
        static void Main()
        {
            using (var file = File.OpenRead(StaticTools.AssemblyDirectory + @"\us-pacific.osm.pbf"))
            {
                // from http://wiki.openstreetmap.org/wiki/ProtocolBufBinary:
                //A file contains a header followed by a sequence of fileblocks. The design is intended to allow future random-access to the contents of the file and skipping past not-understood or unwanted data.
                //The format is a repeating sequence of:
                //int4: length of the BlockHeader message in network byte order
                //serialized BlockHeader message
                //serialized Blob message (size is given in the header)

                int length, blockCount = 0;
                while (Serializer.TryReadLengthPrefix(file, PrefixStyle.Fixed32, out length))
                {
                    // I'm just being lazy and re-using something "close enough" here
                    // note that v2 has a big-endian option, but Fixed32 assumes little-endian - we
                    // actually need the other way around (network byte order):
                    length = IntLittleEndianToBigEndian((uint)length);

                    BlockHeader header;
                    // again, v2 has capped-streams built in, but I'm deliberately
                    // limiting myself to v1 features
                    using (var tmp = new LimitedStream(file, length))
                    {
                        header = Serializer.Deserialize<BlockHeader>(tmp);
                    }
                    Blob blob;
                    using (var tmp = new LimitedStream(file, header.datasize))
                    {
                        blob = Serializer.Deserialize<Blob>(tmp);
                    }
                    if (blob.zlib_data == null) throw new NotSupportedException("I'm only handling zlib here!");

                    HeaderBlock headerBlock;
                    PrimitiveBlock primitiveBlock;

                    using (var ms = new MemoryStream(blob.zlib_data))
                    using (var zlib = new ZLibStream(ms))
                    {
                        // BlockHeader.type says which message the decompressed payload holds
                        if (header.type == "OSMHeader")
                            headerBlock = Serializer.Deserialize<HeaderBlock>(zlib);

                        if (header.type == "OSMData")
                            primitiveBlock = Serializer.Deserialize<PrimitiveBlock>(zlib);
                    }
                    blockCount++;
                    Trace.WriteLine("Read block " + blockCount.ToString());
                }
                Trace.WriteLine("all done");
            }
        }

        // reverses the byte order of a 4-byte number (the swap works in either direction)
        static int IntLittleEndianToBigEndian(uint i)
        {
            return (int)(((i & 0xff) << 24) + ((i & 0xff00) << 8) + ((i & 0xff0000) >> 8) + ((i >> 24) & 0xff));
        }
    }

    // read-only, forward-only base: subclasses just supply ReadNextBlock
    abstract class InputStream : Stream
    {
        protected abstract int ReadNextBlock(byte[] buffer, int offset, int count);
        public sealed override int Read(byte[] buffer, int offset, int count)
        {
            int bytesRead, totalRead = 0;
            while (count > 0 && (bytesRead = ReadNextBlock(buffer, offset, count)) > 0)
            {
                count -= bytesRead;
                offset += bytesRead;
                totalRead += bytesRead;
                pos += bytesRead;
            }
            return totalRead;
        }
        long pos;
        public override void Write(byte[] buffer, int offset, int count)
        {
            throw new NotImplementedException();
        }
        public override void SetLength(long value)
        {
            throw new NotImplementedException();
        }
        public override long Position
        {
            get
            {
                return pos;
            }
            set
            {
                if (pos != value) throw new NotImplementedException();
            }
        }
        public override long Length
        {
            get { throw new NotImplementedException(); }
        }
        public override void Flush()
        {
            // read-only stream: nothing to flush, so this is a no-op rather than an error
        }
        public override bool CanWrite
        {
            get { return false; }
        }
        public override bool CanRead
        {
            get { return true; }
        }
        public override bool CanSeek
        {
            get { return false; }
        }
        public override long Seek(long offset, SeekOrigin origin)
        {
            throw new NotImplementedException();
        }
    }

    class ZLibStream : InputStream
    {   // uses ZLIB.NET: http://www.componentace.com/download/download.php?editionid=25
        private ZInputStream reader; // seriously, why isn't this a stream?
        public ZLibStream(Stream stream)
        {
            reader = new ZInputStream(stream);
        }
        public override void Close()
        {
            reader.Close();
            base.Close();
        }
        protected override int ReadNextBlock(byte[] buffer, int offset, int count)
        {
            // OMG! reader.Read is the base-stream, reader.read is decompressed! yeuch
            return reader.read(buffer, offset, count);
        }
    }

    // deliberately doesn't dispose the base-stream
    class LimitedStream : InputStream
    {
        private Stream stream;
        private long remaining;
        public LimitedStream(Stream stream, long length)
        {
            if (length < 0) throw new ArgumentOutOfRangeException("length");
            if (stream == null) throw new ArgumentNullException("stream");
            if (!stream.CanRead) throw new ArgumentException("stream");
            this.stream = stream;
            this.remaining = length;
        }
        protected override int ReadNextBlock(byte[] buffer, int offset, int count)
        {
            if (count > remaining) count = (int)remaining;
            int bytesRead = stream.Read(buffer, offset, count);
            if (bytesRead > 0) remaining -= bytesRead;
            return bytesRead;
        }
    }
}
Comments:

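Is there a problem reading the nodes during deserialization? This code runs without errors for me, but I get nothing when looking for data in primitiveBlock.
Sorry, I never got a notification. Did you figure it out? I remember being able to access the data, although we no longer use this code. After looking at another project I finally got the code working, but after running into more Open Street Maps issues we decided to go with another solution.
@jonperl Could you take a look at this: ***.com/questions/59599088/…

Answer 3: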
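One possible reason for an apparently empty `primitiveBlock` (an assumption, not confirmed in this thread): in the OSM PBF format, extracts commonly store nodes in each `PrimitiveGroup`'s `dense` field (`DenseNodes`) rather than in the plain `nodes` list, and the ids and coordinates there are delta-coded. A sketch of that decoding, using the format description's formula latitude = 1e-9 × (lat_offset + granularity × lat), with granularity defaulting to 100; `DenseNodeDecoder` is a hypothetical name and the three inputs stand for the `id`, `lat` and `lon` arrays.

```csharp
using System.Collections.Generic;

// Hypothetical decoder for DenseNodes: each array entry is the difference from the
// previous entry, and raw coordinates are scaled by granularity into nanodegrees.
public static class DenseNodeDecoder
{
    public static List<(long Id, double Lat, double Lon)> Decode(
        IReadOnlyList<long> idDeltas, IReadOnlyList<long> latDeltas, IReadOnlyList<long> lonDeltas,
        int granularity = 100, long latOffset = 0, long lonOffset = 0)
    {
        var nodes = new List<(long Id, double Lat, double Lon)>(idDeltas.Count);
        long id = 0, lat = 0, lon = 0;
        for (int i = 0; i < idDeltas.Count; i++)
        {
            id += idDeltas[i];   // undo the delta coding
            lat += latDeltas[i];
            lon += lonDeltas[i];
            nodes.Add((id,
                1e-9 * (latOffset + (long)granularity * lat),   // degrees
                1e-9 * (lonOffset + (long)granularity * lon)));
        }
        return nodes;
    }
}
```

`granularity`, `lat_offset` and `lon_offset` come from the enclosing `PrimitiveBlock`; tags (`keys_vals`) index into the block's string table and are not handled here.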

Yes, it comes from protogen, in Fileformat.cs (generated from the OSM fileformat.proto file; the relevant part is below).

package OSM_PROTO;

message Blob {
  optional bytes raw = 1;
  optional int32 raw_size = 2;
  optional bytes zlib_data = 3;
  optional bytes lzma_data = 4;
  optional bytes bzip2_data = 5;
}

message BlockHeader {
  required string type = 1;
  optional bytes indexdata = 2;
  required int32 datasize = 3;
}

Here is the declaration of BlockHeader in the generated file:

public sealed partial class BlockHeader : pb::GeneratedMessage<BlockHeader, BlockHeader.Builder> ...

-> using pb = global::Google.ProtocolBuffers;

(ProtocolBuffers.dll) comes with this package:

http://code.google.com/p/protobuf-csharp-port/downloads/detail?name=protobuf-csharp-port-2.4.1.473-full-binaries.zip&can=2&q=
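To make the wire format of the `BlockHeader` message above concrete, here is a hand-written decoder. It is a sketch for illustration only (in practice protobuf-net or the generated protobuf-csharp-port classes do this work), and it handles just the two wire types `BlockHeader` uses: varint (0) for `datasize` and length-delimited (2) for `type` and `indexdata`. Each field starts with a key equal to `(field_number << 3) | wire_type`. `BlockHeaderWire.Decode` is a hypothetical name.

```csharp
using System.IO;
using System.Text;

// Hypothetical decoder for the BlockHeader message (fields 1: type, 2: indexdata, 3: datasize).
public static class BlockHeaderWire
{
    public static (string Type, int DataSize) Decode(byte[] bytes)
    {
        string type = null;
        int dataSize = -1;
        int pos = 0;
        while (pos < bytes.Length)
        {
            ulong key = ReadVarint(bytes, ref pos);
            int field = (int)(key >> 3);
            int wireType = (int)(key & 7);
            if (wireType == 0)
            {
                ulong value = ReadVarint(bytes, ref pos);
                if (field == 3) dataSize = (int)value;                     // datasize
            }
            else if (wireType == 2)
            {
                int len = (int)ReadVarint(bytes, ref pos);
                if (len < 0 || pos + len > bytes.Length) throw new EndOfStreamException();
                if (field == 1) type = Encoding.UTF8.GetString(bytes, pos, len); // type
                pos += len;                                                // indexdata is skipped
            }
            else throw new InvalidDataException($"Unexpected wire type {wireType}.");
        }
        if (type == null || dataSize < 0) throw new InvalidDataException("Missing required field.");
        return (type, dataSize);
    }

    // Base-128 varint: 7 bits per byte, least significant group first, high bit = "more".
    static ulong ReadVarint(byte[] bytes, ref int pos)
    {
        ulong result = 0;
        for (int shift = 0; shift < 64; shift += 7)
        {
            if (pos >= bytes.Length) throw new EndOfStreamException();
            byte b = bytes[pos++];
            result |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new InvalidDataException("Varint too long.");
    }
}
```

For example, `type = "OSMData"` and `datasize = 300` encode as `0A 07 'O' 'S' 'M' 'D' 'a' 't' 'a' 18 AC 02`.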

Comments:

Answer 4:

Have you tried a smaller area, such as us-pacific.osm.pbf?

It would also be useful to post the error message you end up getting.

Comments:

Still getting nulls. I tried var f = Serializer.Deserialize(file);
