Read binary file into struct (translating instructions)
[Posted]: 2021-11-14 08:23:34
[Question]:
Reading binary files and working with structs is new territory for me.
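I understand how to read a file and have tried various ways of reading the raw data, but it seems I need to use struct.
I am trying to translate these instructions into Python code:
The beginning of a binary merge file contains an array of GWI_file_header_struct structures (defined in file INET_INT.H) for the various channels, followed by the interlaced 32-bit floating point data. The first 4 bytes in the header are the length of the header for 1 channel, in bytes (i.e. 516 = 0x0204). To read the number of channels stored in the file, read the "channelsPerFile" field of the first structure (e.g. to see how many headers there are). After the headers, the data is saved in an interlaced form, where points are stored in the order that they are acquired in time.
My main confusion is how to translate this into:
struct.unpack(...)
The INET_INT.H struct: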
typedef struct GWI_file_header_struct // This struct is at the beginning of GWI iNet BINARY files that contain waves.
//
// Macintosh:
//
// file type: 'GWID'
// creator type: 'ioNe' NETWORK_DATA_CREATOR
// ----------------------------------
// HEADER INFORMATION
iNetINT32 headerSizeInBytes; // contains length, in bytes, of this header (this does not include any data) bytes 0..3, base 0
// ----------------------------------
// FILE INFORMATION
iNetINT32 int32key; // 32bit key that should contain 0x12345678 (this will help you make sure your byte lanes are ok).
// bytes 4..7, base 0
iNetINT32 file_endian; // endian mode of stored data on disk: 0 = bigEndian_ion, 1 = littleEndian_ion
// bytes 8..11, base 0
iNetINT16 int16key; // 16bit key that should contain 0x55b4; (this field should consume 2 bytes
// in the struct -- no padding) (i.e. INET_INT16_KEY = 0x55b4)
// bytes 12..13, base 0
iNetINT16 zero; // set to 0 (this field should consume 2 bytes in the struct -- no padding)
// bytes 14..15, base 0
// # of seconds since Jan 1, Midnight, 1904 that the acquisition started (this is used to compute the
// date of acquisition). This overflows in 2030.
// Strip Chart: 1st digitized point in entire stream (i.e. 1st pt of 1st scan)
// Osc Mode: 1st point in current scan, secsSince1904_Int64 units
// bytes 16..19, base 0
iNetUINT32 acquisition_SecsSince1904_FixedUint32_OverflowIn2030;
// ----------------------------------
// # OF POINTS STORED
//
// This file contains a set of scans. Each scan is 1 to .5billion points long. For example,
// we might have 100 scans, each 1000 points long. In this example:
//
// pointsPerScanThisChannel_LSW = 1000
// pointsPerScanThisChannel_MSW = 0
//
// numScansStoredBeforeLastScan = 99
//
// numPointsInLastPartialScan_LSW = 1000
// numPointsInLastPartialScan_MSW = 0
//
// Each channel can have a different number of points per scan due to the sampleRateChanMULTiplier
iNetUINT32 pointsPerScanThisChannel_LSW;
iNetUINT32 pointsPerScanThisChannel_MSW;
// # points per scan = (pointsPerScanThisChannel_MSW * 2^32) + pointsPerScanThisChannel_LSW
// bytes 20..23, base 0
// bytes 24..27, base 0
iNetUINT32 numScansStoredBeforeLastScan_LSW;
// # of complete scans stored in file
// bytes 28..31, base 0
// iNetUINT32 numScansStoredBeforeLastScan_MSW;
// this is defined below, at the end of the struct
iNetUINT32 numPointsInLastPartialScan_LSW;
iNetUINT32 numPointsInLastPartialScan_MSW;
// # points stored in last scan if it is partially complete = (numPointsInLastPartialScan_MSW * 2^32) + numPointsInLastPartialScan_LSW
// bytes 32..35, base 0
// bytes 36..39, base 0
// ----------------------------------
// TIME INFORMATION
iNetFLT32 firstPoint_Time_Secs; // time of 1st point, units are seconds
// bytes 40..43, base 0
iNetFLT32 endUser_channel_samplePeriod_Secs;
// time between points for this channel,
// units are seconds. Notice that channels
// can have different sample rates, which
// is the master_endUser_SampleRate / sampleRate_Divider,
// where 'sampleRate_Divider' is an integer.
// bytes 44..47, base 0
// ----------------------------------
// TYPE OF DATA STORED
iNetINT32 arrayDataType; // Type of src array data. iNetDataType:
//
// 0 iNetDT_INT16: 16bit integer, signed
// 2 iNetDT_UINT16: 16bit integer, unsigned
// 3 iNetDT_INT32: 32bit integer, signed
// 4 iNetDT_UINT32: 32bit integer, unsigned
// 5 iNetDT_FLT32: 32bit float (IEEE flt32 format)
// 6 iNetDT_Double: 'double', as determined by the compiler
// (e.g. flt64, flt80, flt96, flt128)
// see 'bytesPerDataPoint' field to see
// how many bytes
// bytes 48..51, base 0
iNetINT32 bytesPerDataPoint; // # of bytes for each datapoint (e.g. 4 for 32bit signed integer)
// bytes 52..55, base 0
iNetStr31 verticalUnitsLabel; // pascal string of vertical units label (e.g. "Volts")
// bytes 56..87, base 0
iNetStr31 horizontalUnitsLabel; // horizontal units label, e.g. "Secs", pascal string (0th char is the # of valid chars)
// bytes 88..119, base 0
iNetStr31 userName; // user named set by user, e.g. "Pressure 1" , pascal string (0th char is the # of valid chars)
// bytes 120..151, base 0
iNetStr31 chanName; // name of channel, e.g. "Ch1 Vin+", pascal string (0th char is the # of valid chars)
// bytes 152..183, base 0
// ----------------------------------
// DATA MAPPING
//
iNetINT32 minCode; // if data is stored in integer format, this contains the mapping from integer
iNetINT32 maxCode; // to engineering units (e.g. +/-2048 A/D data is mapped to +/- 10V, minCode = -2048,
iNetFLT32 minEU; // maxCode = +2047, minEU = -10.000, maxEU = +9.995.
iNetFLT32 maxEU; //
// bytes 184..187, base 0
// bytes 188..191, base 0
// bytes 192..195, base 0
// bytes 196..199, base 0
// ----------------------------------
// iNet NETWORK ADDRESS (this does not need
// to be filled in, 0L's are ok)
iNetINT32 netNum; // channel network # (this pertains to iNet only; use 0 otherwise)
// bytes 200..203, base 0
iNetINT32 devNum; // channel device # (this pertains to iNet only; use 0 otherwise)
// bytes 204..207, base 0
iNetINT32 modNum; // channel module # (this pertains to iNet only; use 0 otherwise)
// bytes 208..211, base 0
iNetINT32 chNum; // channel channel # (this pertains to iNet only; use 0 otherwise)
// bytes 212..215, base 0
// ----------------------------------
// END USER NOTES
iNetStr255 notes; // pascal string that contains notes about the data stored.
// bytes 216..471, base 0
// ----------------------------------
// MAPPING
iNetFLT32 /* must remain flt32 */ internal1; // Mapping from internal engineering units (e.g. Volts) to external engineering
iNetFLT32 /* must remain flt32 */ external1; // units (e.g. mmHg). This is used for 2 point linear mapping/calibration to
iNetFLT32 /* must remain flt32 */ internal2; // a new, user defined, coordinate system. instruNet World does not read these values
iNetFLT32 /* must remain flt32 */ external2; // from the wave files, yet instead reads them from the instrNet.prf file -- they
// are only stored for the benefit of other software that might read this file. gsw 12/1/96
// bytes 472..475, base 0
// bytes 476..479, base 0
// bytes 480..483, base 0
// bytes 484..487, base 0
iNetFLT32 flt32key; // flt32 key set to 1234.56 (i.e. INET_FLT32_KEY), Used to test floating point code. gsw 12/1/96
// bytes 488..491, base 0
iNetINT32 sampleRate_Divider; // this channel is digitized at the master_endUser_SampleRate divided
// this 'sampleRate_Divider' (i.e. sampleRateChanMULT_integerRatio_N_int64)
// (helpful with FileType Binary Merge), gsw 1/29/97. Note: This field was introduced 1/29/97 and
// files saved before that time set it to 0.
// bytes 492..495, base 0
iNetINT32 channelsPerFile; // # of channels per file (i.e. interlaced after array of headers) (helpful with FileType Binary Merge), gsw 1/29/97
// Note: This field was introduced 1/29/97 and files saved before that time set it to 0.
// bytes 496..499, base 0
// ----------------------------------
// EXPANSION FIELDS
#if 1 // gsw 12/23/09
// # of complete scans stored in file, MS 32bits
// bytes 500..503, base 0
iNetUINT32 numScansStoredBeforeLastScan_MSW;
#else
iNetINT32 expansion8; // expansion fields that are preset to
#endif
iNetINT32 expansion9; // 0 and then ignored
iNetINT32 expansion10; // bytes 500..503, base 0
// bytes 504..507, base 0
// bytes 508..511, base 0
// ----------------------------------
// KEY TO TEST STRUCT PACKING
iNetINT32 int32key_StructTest; // 32bit key that should contain 0x12345678; (i.e. INET_INT32_KEY)
// bytes 512..515, base 0
// ----------------------------------
// ACTUAL DATA
/* iNetFLT32 *data[1]; */ // contains array of data of type 'arrayDataType'
GWI_file_header_struct;
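For what it's worth, the byte offsets in the comments above can be turned into a single `struct` format string. The following is only a sketch: it assumes little-endian data (swap `<` for `>` otherwise), and the names `HEADER_FORMAT`, `HEADER`, `ChannelHeader` and `parse_header` are made up for illustration. The `p` format code reads a Pascal string (first byte is the length), which matches the iNetStr31 and iNetStr255 fields.

```python
import struct
from collections import namedtuple

# Assumption: little-endian file. "<" also disables alignment padding,
# which matches the "no padding" notes in the header comments.
HEADER_FORMAT = (
    "<"
    "3i"            # headerSizeInBytes, int32key, file_endian     bytes 0..11
    "2h"            # int16key, zero                               bytes 12..15
    "I"             # acquisition seconds since 1904               bytes 16..19
    "2I"            # pointsPerScanThisChannel_LSW / _MSW          bytes 20..27
    "I"             # numScansStoredBeforeLastScan_LSW             bytes 28..31
    "2I"            # numPointsInLastPartialScan_LSW / _MSW        bytes 32..39
    "2f"            # firstPoint_Time_Secs, sample period          bytes 40..47
    "2i"            # arrayDataType, bytesPerDataPoint             bytes 48..55
    "32p32p32p32p"  # four iNetStr31 Pascal strings                bytes 56..183
    "2i2f"          # minCode, maxCode, minEU, maxEU               bytes 184..199
    "4i"            # netNum, devNum, modNum, chNum                bytes 200..215
    "256p"          # notes (iNetStr255)                           bytes 216..471
    "4f"            # internal1, external1, internal2, external2   bytes 472..487
    "f"             # flt32key                                     bytes 488..491
    "2i"            # sampleRate_Divider, channelsPerFile          bytes 492..499
    "I"             # numScansStoredBeforeLastScan_MSW             bytes 500..503
    "2i"            # expansion9, expansion10                      bytes 504..511
    "i"             # int32key_StructTest                          bytes 512..515
)
HEADER = struct.Struct(HEADER_FORMAT)  # HEADER.size should be 516

ChannelHeader = namedtuple("ChannelHeader", [
    "headerSizeInBytes", "int32key", "file_endian", "int16key", "zero",
    "acquisition_SecsSince1904_FixedUint32_OverflowIn2030",
    "pointsPerScanThisChannel_LSW", "pointsPerScanThisChannel_MSW",
    "numScansStoredBeforeLastScan_LSW",
    "numPointsInLastPartialScan_LSW", "numPointsInLastPartialScan_MSW",
    "firstPoint_Time_Secs", "endUser_channel_samplePeriod_Secs",
    "arrayDataType", "bytesPerDataPoint",
    "verticalUnitsLabel", "horizontalUnitsLabel", "userName", "chanName",
    "minCode", "maxCode", "minEU", "maxEU",
    "netNum", "devNum", "modNum", "chNum", "notes",
    "internal1", "external1", "internal2", "external2",
    "flt32key", "sampleRate_Divider", "channelsPerFile",
    "numScansStoredBeforeLastScan_MSW", "expansion9", "expansion10",
    "int32key_StructTest",
])


def parse_header(raw):
    """Unpack one 516-byte channel header into a ChannelHeader tuple."""
    return ChannelHeader._make(HEADER.unpack(raw[:HEADER.size]))
```

Checking that `HEADER.size` equals the `headerSizeInBytes` value read from the file, and that both key fields hold 0x12345678, is a cheap way to confirm the format string lines up with the file.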
Final code and results:
Code
import struct

# Current 3 channels: Ch11 Vin+, Ch13 Vin+ and Ch15 Vin+
# Header info extracted using provided header struct (INET_INT.H)
# After the header, the data is saved in an interlaced form,
# where points are stored in the order that they are acquired in time.
# 3 channels: A[0], B[0], C[0], A[1], B[1], C[1]...
# After header = 516 header size x 3 channels = 1,548 bytes
# Start of data at 1,548 bytes?
with open(file, "rb") as f:  # `file` holds the path to the binary merge file
    # First 12 bytes: headerSizeInBytes, int32key, file_endian (little-endian)
    header_size, int32key, file_endian = struct.unpack('<3i', f.read(12))

    # channel name 1: iNetStr31 at bytes 152..183 (byte 0 = length, then the text)
    f.seek(152)
    raw = f.read(32)
    chan = raw[1:1 + raw[0]]

    # channel name 2: same field, one header further into the file
    f.seek(152 + header_size)
    raw2 = f.read(32)
    chan2 = raw2[1:1 + raw2[0]]

print(header_size, int32key, file_endian)
print("channel 1: {}".format(chan))
print("channel 2: {}".format(chan2))
Result
516 305419896 1
channel 1: b'Ch11 Vin+'
channel 2: b'Ch13 Vin+'
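[Comments]:
There is some information here that may help, but you will probably need to read the struct documentation in more detail.
Things are not as complicated as they look if you can read and understand the struct definition in "INET_INT.H", although this file format does have a peculiarity to do with those interlaced channel values. If you need some help translating it, though, the first step is to post the contents of that .h file.
@gimix Thanks for the starting point. I was able to find the INET_INT.H file and have edited my post to include the part that seems relevant.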
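[Answer 1]:
OK, this is not a complete answer, but I find comments really hard to read here.
The first step is to read the first 12 bytes (three 4-byte integers) and unpack them so we can check the endianness. Let's try big-endian first: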
from struct import *

with open(file, "rb") as f:
    byte = f.read(12)
    header_size, int32key, file_endian = unpack('>3i', byte)
We expect int32key to be 305419896 (= 0x12345678). If we get any other value, we switch to little-endian, i.e. change the unpack format string to <3i.
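The try-one-then-the-other check described above could be wrapped up roughly like this. It is just a sketch; `detect_byte_order` and `INT32_KEY` are names invented here, not part of any library.

```python
import struct

INT32_KEY = 0x12345678  # value the int32key field should contain (305419896)


def detect_byte_order(first_12_bytes):
    """Return '>' or '<', whichever byte order yields the expected int32key."""
    for order in (">", "<"):
        _size, key, _endian = struct.unpack(order + "3i", first_12_bytes)
        if key == INT32_KEY:
            return order
    raise ValueError("int32key not found in either byte order")
```

The returned character can then be used as the prefix of every later format string, so the rest of the parsing code does not need to care which byte order the file uses.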
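At this point we can use the same logic to read the rest of the header and get all the information we need to read the data for the first channel. I hope this is a good start for you.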
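As a rough illustration of that next step, the sketch below reads only the two fields needed to locate the data: `headerSizeInBytes` at bytes 0..3 and `channelsPerFile` at bytes 496..499 of the first header (offsets taken from the struct comments in the question). The function name `read_layout` is made up, and it assumes the byte order is already known.

```python
import struct


def read_layout(f, order="<"):
    """Return (header_size, channels, data_offset) for an open binary file."""
    f.seek(0)
    (header_size,) = struct.unpack(order + "i", f.read(4))
    f.seek(496)  # channelsPerFile lives at bytes 496..499 of the first header
    (channels,) = struct.unpack(order + "i", f.read(4))
    # One header per channel, stored back to back, then the interlaced data.
    return header_size, channels, header_size * channels
```

With a 516-byte header and 3 channels this gives a data offset of 1548, which matches the guess in the question's code comments. Note that, according to the header comments, files saved before 1/29/97 store 0 in `channelsPerFile`, so that case would need separate handling.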
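[Discussion]:
This was very helpful, I was able to use little-endian unpack('...
The channel data definitely comes after the headers. It consists of arrays of data of the type specified by the arrayDataType header field. Note that you may need to convert the values as described in the "DATA MAPPING" section.
Got it, this answers my question and I have updated my post. @gimix Thank you very much for your help!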
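To make the last two points concrete, here is a minimal sketch of splitting the interlaced samples back into channels and of the integer-to-engineering-units conversion. It rests on assumptions not confirmed by the thread: every channel is stored as float32 at the same sample rate (no sampleRate_Divider differences), and the minCode/maxCode to minEU/maxEU mapping is a straight two-point linear interpolation. `deinterleave` and `code_to_eu` are hypothetical helper names.

```python
import struct


def deinterleave(data, channels, order="<"):
    """Split interlaced float32 samples A0,B0,C0,A1,... into per-channel lists."""
    frame = 4 * channels                      # bytes per time step
    usable = len(data) - len(data) % frame    # ignore a trailing partial frame
    values = struct.unpack(order + "%df" % (usable // 4), data[:usable])
    return [list(values[ch::channels]) for ch in range(channels)]


def code_to_eu(code, min_code, max_code, min_eu, max_eu):
    """Map an integer A/D code to engineering units (assumed linear mapping)."""
    return min_eu + (code - min_code) * (max_eu - min_eu) / (max_code - min_code)
```

`data` here would be everything read from the file after the last header. The conversion only matters when arrayDataType is one of the integer types; float32 data is presumably already in engineering units.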