Protobuf-net Deserialize Open Street Maps
Posted: 2011-01-11 22:20:42

Question: For the life of me, I cannot deserialize the protobuf file from Open Street Maps.
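I am trying to deserialize the following extract to get the nodes: http://download.geofabrik.de/osm/north-america/us-northeast.osm.pbf, and I am using http://code.google.com/p/protobuf-net/ as the library. I have tried deserializing into a bunch of different objects, but they all come back null.
The original .proto files can be found here: http://trac.openstreetmap.org/browser/applications/utils/export/osm2pgsql/protobuf
Any suggestions?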
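Comments:
I'm the author of protobuf-net; I'm on "work" time right now, but I'll try to look at this later today and see what is wrong.
I know who you are Marc, I downloaded your software. I like the "work" in quotes, haha. Thanks for the help (and the framework)!

Answer 1:
Right; the problem is that this isn't just protobuf. It is a hybrid file format (defined here) that contains protobuf internally among various other formats. It also incorporates compression (although that looks to be optional).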
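The framing of that hybrid format can be read without protobuf-net at all. A minimal sketch, assuming .NET Core 2.1 or later for `System.Buffers.Binary.BinaryPrimitives`; `OsmPbfFraming` and `TryReadBlockHeaderLength` are hypothetical names used only for illustration. It reads the 4-byte BlockHeader length that precedes each fileblock, interpreting it in network byte order (big-endian), and returns false on a clean end of file.

```csharp
using System.Buffers.Binary;
using System.IO;

static class OsmPbfFraming
{
    // Hypothetical helper: reads the int4 BlockHeader length that precedes every
    // fileblock. Returns false if the stream ends before any byte of it is read.
    public static bool TryReadBlockHeaderLength(Stream stream, out int length)
    {
        var prefix = new byte[4];
        int total = 0;
        while (total < 4)
        {
            int read = stream.Read(prefix, total, 4 - total);
            if (read <= 0) break;
            total += read;
        }
        if (total == 0) { length = 0; return false; }        // clean end of file
        if (total < 4) throw new EndOfStreamException("Truncated length prefix");

        // Network byte order is big-endian, regardless of the host CPU.
        length = BinaryPrimitives.ReadInt32BigEndian(prefix);
        return true;
    }
}
```

After this call, the next `length` bytes are the serialized BlockHeader, whose `datasize` field gives the size of the Blob that follows.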
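I've pulled apart what I can from the spec, and I have a C# reader here that uses protobuf-net to handle the chunks. It happily reads through that file to the end; I can tell you there are 4515 blocks (BlockHeader). When it gets to the Blob, I'm a bit confused as to how the spec distinguishes OSMHeader from OSMData; I'm open to suggestions here! I've also used ZLIB.NET to handle the zlib compression that is being used. Since I haven't got to the bottom of that, I settled for decompressing the ZLIB data and validating it against the claimed size, to check it is at least sane.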
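If you can figure out (or ask the author) how they are separating OSMHeader and OSMData, I'll happily crank something else in. I hope you don't mind that I've stopped here, but it has been a few hours ;p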
```csharp
using System;
using System.IO;
using OpenStreetMap; // where my .proto-generated entities are living
using ProtoBuf; // protobuf-net
using zlib; // ZLIB.NET

class OpenStreetMapParser
{
    static void Main()
    {
        using (var file = File.OpenRead("us-northeast.osm.pbf"))
        {
            // from http://wiki.openstreetmap.org/wiki/ProtocolBufBinary:
            // A file contains a header followed by a sequence of fileblocks. The design is intended to allow future random-access to the contents of the file and skipping past not-understood or unwanted data.
            // The format is a repeating sequence of:
            // int4: length of the BlockHeader message in network byte order
            // serialized BlockHeader message
            // serialized Blob message (size is given in the header)

            int length, blockCount = 0;
            while (Serializer.TryReadLengthPrefix(file, PrefixStyle.Fixed32, out length))
            {
                // I'm just being lazy and re-using something "close enough" here
                // note that v2 has a big-endian option, but Fixed32 assumes little-endian - we
                // actually need the other way around (network byte order):
                uint len = (uint)length;
                len = ((len & 0xFF) << 24) | ((len & 0xFF00) << 8) | ((len & 0xFF0000) >> 8) | ((len & 0xFF000000) >> 24);
                length = (int)len;

                BlockHeader header;
                // again, v2 has capped-streams built in, but I'm deliberately
                // limiting myself to v1 features
                using (var tmp = new LimitedStream(file, length))
                {
                    header = Serializer.Deserialize<BlockHeader>(tmp);
                }

                Blob blob;
                using (var tmp = new LimitedStream(file, header.datasize))
                {
                    blob = Serializer.Deserialize<Blob>(tmp);
                }
                if (blob.zlib_data == null) throw new NotSupportedException("I'm only handling zlib here!");

                using (var ms = new MemoryStream(blob.zlib_data))
                using (var inflater = new ZLibStream(ms))
                {
                    // at this point I'm very unclear how the OSMHeader and OSMData are packed - it isn't clear
                    // read this to the end, to check we can parse the zlib
                    int payloadLen = 0;
                    while (inflater.ReadByte() >= 0) payloadLen++;
                    if (payloadLen != blob.raw_size) throw new FormatException("Screwed that up...");
                }
                blockCount++;
                Console.WriteLine("Read block " + blockCount.ToString());
            }
            Console.WriteLine("all done");
            Console.ReadLine();
        }
    }
}

abstract class InputStream : Stream
{
    protected abstract int ReadNextBlock(byte[] buffer, int offset, int count);

    public sealed override int Read(byte[] buffer, int offset, int count)
    {
        int bytesRead, totalRead = 0;
        while (count > 0 && (bytesRead = ReadNextBlock(buffer, offset, count)) > 0)
        {
            count -= bytesRead;
            offset += bytesRead;
            totalRead += bytesRead;
            pos += bytesRead;
        }
        return totalRead;
    }

    long pos;

    // read-only, forward-only stream: per the Stream contract, unsupported members throw NotSupportedException
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override long Position
    {
        get { return pos; }
        set { if (pos != value) throw new NotSupportedException(); }
    }
    public override long Length { get { throw new NotSupportedException(); } }
    // flushing a read-only stream is legal, so this is a no-op rather than an exception
    public override void Flush() { }
    public override bool CanWrite { get { return false; } }
    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
}

class ZLibStream : InputStream
{
    // uses ZLIB.NET: http://www.componentace.com/download/download.php?editionid=25
    private ZInputStream reader; // seriously, why isn't this a stream?

    public ZLibStream(Stream stream)
    {
        reader = new ZInputStream(stream);
    }

    public override void Close()
    {
        reader.Close();
        base.Close();
    }

    protected override int ReadNextBlock(byte[] buffer, int offset, int count)
    {
        // OMG! reader.Read is the base-stream, reader.read is decompressed! yeuch
        return reader.read(buffer, offset, count);
    }
}

// deliberately doesn't dispose the base-stream
class LimitedStream : InputStream
{
    private Stream stream;
    private long remaining;

    public LimitedStream(Stream stream, long length)
    {
        if (length < 0) throw new ArgumentOutOfRangeException("length");
        if (stream == null) throw new ArgumentNullException("stream");
        if (!stream.CanRead) throw new ArgumentException("stream");
        this.stream = stream;
        this.remaining = length;
    }

    protected override int ReadNextBlock(byte[] buffer, int offset, int count)
    {
        if (count > remaining) count = (int)remaining;
        int bytesRead = stream.Read(buffer, offset, count);
        if (bytesRead > 0) remaining -= bytesRead;
        return bytesRead;
    }
}
```
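An alternative to wrapping the file in a capped stream is to read each message's bytes into a buffer first and hand protobuf-net a `MemoryStream` over that buffer. A minimal sketch; `StreamHelpers.ReadFully` is a hypothetical name (.NET 7 later added a built-in `Stream.ReadExactly`, which this does not rely on). The trade-off is one extra copy per block in exchange for not needing a custom `Stream` subclass.

```csharp
using System;
using System.IO;

static class StreamHelpers
{
    // Hypothetical helper: reads exactly 'count' bytes or throws, so the source
    // stream is left positioned at the start of the next message.
    public static byte[] ReadFully(Stream stream, int count)
    {
        if (count < 0) throw new ArgumentOutOfRangeException(nameof(count));
        var buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read <= 0) throw new EndOfStreamException($"Needed {count} bytes, got {offset}");
            offset += read;
        }
        return buffer;
    }
}
```

In the loop above, the header read would then become `header = Serializer.Deserialize<BlockHeader>(new MemoryStream(StreamHelpers.ReadFully(file, length)));`, and likewise for the Blob using `header.datasize`.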
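Comments:
This is awesome. Thanks for getting a head start; I'll see what I can get! (You're the man.)
I'm going to try to work backwards from github.com/scrosby/OSM-binary/tree/master/src.java/crosby/…
I don't understand the comment "where my .proto-generated entities are living", nor where you got OpenStreetMap from.
@RenéNyffenegger It looks like the external link to the .proto has since been removed; presumably OpenStreetMap is the namespace declared in the .proto file, or a namespace I overrode at the command line for convenience.
@MarcGravell Any chance you could take a look at this, Marc: ***.com/questions/59599088 It's about OSMSharp, whose author, Ben Abelshausen, uses JonPerl's answer below.

Answer 2: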
Following Marc's outline, I figured out the last part by looking at http://git.openstreetmap.nl/index.cgi/pbf2osm.git/tree/src/main.c?h=35116112eb0066c7729a963b292faa608ddc8ad7
Here is the final code:
```csharp
using System;
using System.Diagnostics;
using System.IO;
using crosby.binary;
using OSMPBF;
using PerlLLC.Tools;
using ProtoBuf;
using zlib;

namespace OpenStreetMapOperations
{
    class OpenStreetMapParser
    {
        static void Main()
        {
            using (var file = File.OpenRead(StaticTools.AssemblyDirectory + @"\us-pacific.osm.pbf"))
            {
                // from http://wiki.openstreetmap.org/wiki/ProtocolBufBinary:
                // A file contains a header followed by a sequence of fileblocks. The design is intended to allow future random-access to the contents of the file and skipping past not-understood or unwanted data.
                // The format is a repeating sequence of:
                // int4: length of the BlockHeader message in network byte order
                // serialized BlockHeader message
                // serialized Blob message (size is given in the header)

                int length, blockCount = 0;
                while (Serializer.TryReadLengthPrefix(file, PrefixStyle.Fixed32, out length))
                {
                    // I'm just being lazy and re-using something "close enough" here
                    // note that v2 has a big-endian option, but Fixed32 assumes little-endian - we
                    // actually need the other way around (network byte order):
                    length = IntLittleEndianToBigEndian((uint)length);

                    BlockHeader header;
                    // again, v2 has capped-streams built in, but I'm deliberately
                    // limiting myself to v1 features
                    using (var tmp = new LimitedStream(file, length))
                    {
                        header = Serializer.Deserialize<BlockHeader>(tmp);
                    }

                    Blob blob;
                    using (var tmp = new LimitedStream(file, header.datasize))
                    {
                        blob = Serializer.Deserialize<Blob>(tmp);
                    }
                    if (blob.zlib_data == null) throw new NotSupportedException("I'm only handling zlib here!");

                    HeaderBlock headerBlock;
                    PrimitiveBlock primitiveBlock;
                    using (var ms = new MemoryStream(blob.zlib_data))
                    using (var inflater = new ZLibStream(ms))
                    {
                        // BlockHeader.type says which message the decompressed Blob contains
                        if (header.type == "OSMHeader")
                            headerBlock = Serializer.Deserialize<HeaderBlock>(inflater);
                        if (header.type == "OSMData")
                            primitiveBlock = Serializer.Deserialize<PrimitiveBlock>(inflater);
                    }
                    blockCount++;
                    Trace.WriteLine("Read block " + blockCount.ToString());
                }
                Trace.WriteLine("all done");
            }
        }

        // Reverses the byte order of a 4-byte number (network byte order to little-endian)
        static int IntLittleEndianToBigEndian(uint i)
        {
            return (int)(((i & 0xff) << 24) + ((i & 0xff00) << 8) + ((i & 0xff0000) >> 8) + ((i >> 24) & 0xff));
        }
    }

    abstract class InputStream : Stream
    {
        protected abstract int ReadNextBlock(byte[] buffer, int offset, int count);

        public sealed override int Read(byte[] buffer, int offset, int count)
        {
            int bytesRead, totalRead = 0;
            while (count > 0 && (bytesRead = ReadNextBlock(buffer, offset, count)) > 0)
            {
                count -= bytesRead;
                offset += bytesRead;
                totalRead += bytesRead;
                pos += bytesRead;
            }
            return totalRead;
        }

        long pos;

        // read-only, forward-only stream: per the Stream contract, unsupported members throw NotSupportedException
        public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
        public override void SetLength(long value) { throw new NotSupportedException(); }
        public override long Position
        {
            get { return pos; }
            set { if (pos != value) throw new NotSupportedException(); }
        }
        public override long Length { get { throw new NotSupportedException(); } }
        // flushing a read-only stream is legal, so this is a no-op rather than an exception
        public override void Flush() { }
        public override bool CanWrite { get { return false; } }
        public override bool CanRead { get { return true; } }
        public override bool CanSeek { get { return false; } }
        public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    }

    class ZLibStream : InputStream
    {
        // uses ZLIB.NET: http://www.componentace.com/download/download.php?editionid=25
        private ZInputStream reader; // seriously, why isn't this a stream?

        public ZLibStream(Stream stream)
        {
            reader = new ZInputStream(stream);
        }

        public override void Close()
        {
            reader.Close();
            base.Close();
        }

        protected override int ReadNextBlock(byte[] buffer, int offset, int count)
        {
            // OMG! reader.Read is the base-stream, reader.read is decompressed! yeuch
            return reader.read(buffer, offset, count);
        }
    }

    // deliberately doesn't dispose the base-stream
    class LimitedStream : InputStream
    {
        private Stream stream;
        private long remaining;

        public LimitedStream(Stream stream, long length)
        {
            if (length < 0) throw new ArgumentOutOfRangeException("length");
            if (stream == null) throw new ArgumentNullException("stream");
            if (!stream.CanRead) throw new ArgumentException("stream");
            this.stream = stream;
            this.remaining = length;
        }

        protected override int ReadNextBlock(byte[] buffer, int offset, int count)
        {
            if (count > remaining) count = (int)remaining;
            int bytesRead = stream.Read(buffer, offset, count);
            if (bytesRead > 0) remaining -= bytesRead;
            return bytesRead;
        }
    }
}
```
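If primitiveBlock appears to contain no nodes, one thing worth checking is that many extracts store nodes in the `dense` field (DenseNodes) of each PrimitiveGroup rather than in `nodes`. In the OSM PBF format, DenseNodes holds parallel `id`, `lat` and `lon` arrays that are delta-coded, and coordinates are in units of the block's `granularity` (default 100 nanodegrees) plus `lat_offset`/`lon_offset`. A minimal sketch of that decoding, taking plain arrays instead of the generated types; `DenseNodeDecoder` and `DecodedNode` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

readonly struct DecodedNode
{
    public DecodedNode(long id, double lat, double lon) { Id = id; Lat = lat; Lon = lon; }
    public long Id { get; }
    public double Lat { get; }
    public double Lon { get; }
}

static class DenseNodeDecoder
{
    // ids, lats and lons are the delta-coded arrays from DenseNodes; granularity and
    // the offsets come from the enclosing PrimitiveBlock (defaults per the format).
    public static List<DecodedNode> Decode(IList<long> ids, IList<long> lats, IList<long> lons,
        int granularity = 100, long latOffset = 0, long lonOffset = 0)
    {
        if (ids.Count != lats.Count || ids.Count != lons.Count)
            throw new ArgumentException("DenseNodes arrays must have the same length");

        var nodes = new List<DecodedNode>(ids.Count);
        long id = 0, lat = 0, lon = 0;
        for (int i = 0; i < ids.Count; i++)
        {
            // Each entry is the difference from the previous one, so keep running sums.
            id += ids[i];
            lat += lats[i];
            lon += lons[i];
            // degrees = 1e-9 * (offset + granularity * value)
            nodes.Add(new DecodedNode(id,
                1e-9 * (latOffset + (long)granularity * lat),
                1e-9 * (lonOffset + (long)granularity * lon)));
        }
        return nodes;
    }
}
```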
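Comments:
Is there a problem reading the nodes during deserialization? This code runs without errors for me, but I get nothing when I look for data in primitiveBlock.
Sorry, I never got a notification. Did you figure it out? I remember being able to access the data, although we no longer use this code.
After looking at another project I finally got the code working, but after running into more Open Street Maps problems we decided to go with a different solution.
@jonperl could you take a look at this: ***.com/questions/59599088/…

Answer 3:
Yes, it is generated by protogen into Fileformat.cs (based on the OSM Fileformat.proto file; code below).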
```proto
package OSM_PROTO;

message Blob {
  optional bytes raw = 1;
  optional int32 raw_size = 2;
  optional bytes zlib_data = 3;
  optional bytes lzma_data = 4;
  optional bytes bzip2_data = 5;
}

message BlockHeader {
  required string type = 1;
  optional bytes indexdata = 2;
  required int32 datasize = 3;
}
```
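To see what the deserializer is doing with these messages, here is a minimal hand-written decoder for just the BlockHeader fields above, using the standard protobuf wire format (each field starts with a varint key of `field << 3 | wireType`). It is an illustration only, not a replacement for generated code; `BlockHeaderWire` is a hypothetical name, and unknown fields are skipped only for wire types 0 (varint) and 2 (length-delimited).

```csharp
using System;
using System.IO;
using System.Text;

static class BlockHeaderWire
{
    // Decodes 'type' (field 1, string) and 'datasize' (field 3, int32) from a
    // serialized BlockHeader; indexdata and other fields are skipped.
    public static (string Type, int DataSize) Parse(byte[] message)
    {
        string type = null;
        int dataSize = -1;
        int pos = 0;
        while (pos < message.Length)
        {
            ulong key = ReadVarint(message, ref pos);
            int field = (int)(key >> 3);
            int wireType = (int)(key & 7);
            if (wireType == 0)
            {
                ulong value = ReadVarint(message, ref pos);
                if (field == 3) dataSize = (int)value;
            }
            else if (wireType == 2)
            {
                int len = (int)ReadVarint(message, ref pos);
                if (pos + len > message.Length) throw new InvalidDataException("Truncated field");
                if (field == 1) type = Encoding.UTF8.GetString(message, pos, len);
                pos += len; // skip indexdata (field 2) and unknown fields
            }
            else throw new NotSupportedException($"Wire type {wireType} not handled");
        }
        return (type, dataSize);
    }

    // Base-128 varint: 7 bits per byte, least significant group first.
    static ulong ReadVarint(byte[] buffer, ref int pos)
    {
        ulong result = 0;
        for (int shift = 0; shift < 64; shift += 7)
        {
            if (pos >= buffer.Length) throw new InvalidDataException("Truncated varint");
            byte b = buffer[pos++];
            result |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new InvalidDataException("Varint too long");
    }
}
```

For example, the bytes `0A 07 "OSMData" 18 AC 02` decode to type "OSMData" and datasize 300.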
Here is how BlockHeader is declared in the generated file:

```csharp
public sealed partial class BlockHeader : pb::GeneratedMessage<BlockHeader, BlockHeader.Builder> ...
// -> using pb = global::Google.ProtocolBuffers;
```

The pb alias refers to ProtocolBuffers.dll, which comes with this package:
http://code.google.com/p/protobuf-csharp-port/downloads/detail?name=protobuf-csharp-port-2.4.1.473-full-binaries.zip&can=2&q=
Answer 4:
Have you tried a smaller area, such as us-pacific.osm.pbf?
It would also help to post the error message.
Comments:
Still getting nulls. I tried var f = Serializer.Deserialize