Read binary file into struct (translating instructions)

【Title】: Read binary file into struct (translating instructions) 【Posted】: 2021-11-14 08:23:34 【Question】:

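Reading binary files and working with structs is new territory for me.

I understand how to read the file and have tried various ways of reading the raw data, but it seems I need to use struct.

I am trying to translate these instructions into Python code:

A binary merge file begins with an array of GWI_file_header_struct structures (defined in the file INET_INT.H), one per channel, followed by the interlaced 32-bit floating point data. The first 4 bytes of a header are the length, in bytes, of the header for 1 channel (i.e. 516 = 0x0204). To read the number of channels stored in the file, read the "channelsPerFile" field of the first structure (e.g. to see how many headers there are). After the headers, the data is saved in interlaced form, with points stored in the order in which they were acquired in time.

My main confusion is how to translate this into: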

struct.unpack(...)

The INET_INT.H struct:

typedef struct GWI_file_header_struct  //  This struct is at the beginning of GWI iNet BINARY files that contain waves.
                                    //
                                    //  Macintosh:
                                    //
                                    //      file type:      'GWID'
                                    //      creator type:   'ioNe'      NETWORK_DATA_CREATOR   
    
                                    //  ----------------------------------
                                    //  HEADER INFORMATION
    
iNetINT32 headerSizeInBytes;        //  contains length, in bytes, of this header (this does not include any data)  bytes 0..3, base 0 

                                    //  ----------------------------------
                                    //  FILE INFORMATION
    
iNetINT32 int32key;                 //  32bit key that should contain 0x12345678 (this will help you make sure your byte lanes are ok). 
                                    //   bytes 4..7, base 0 

iNetINT32 file_endian;              //  endian mode of stored data on disk: 0 = bigEndian_ion, 1 = littleEndian_ion
                                    //   bytes 8..11, base 0 

iNetINT16 int16key;                 //  16bit key that should contain 0x55b4; (this field should consume 2 bytes
                                    //   in the struct -- no padding) (i.e. INET_INT16_KEY = 0x55b4)
                                    //   bytes 12..13, base 0 

iNetINT16 zero;                     //  set to 0 (this field should consume 2 bytes in the struct -- no padding)
                                    //   bytes 14..15, base 0 

                                    //  # of seconds since Jan 1, Midnight, 1904 that the acquisition started (this is used to compute the
                                    //  date of acquisition). This overflows in 2030.
                                    //  Strip Chart: 1st digitized point in entire stream (i.e. 1st pt of 1st scan)
                                    //  Osc Mode:    1st point in current scan, secsSince1904_Int64 units 
                                    //   bytes 16..19, base 0 
iNetUINT32 acquisition_SecsSince1904_FixedUint32_OverflowIn2030;
                                    
                                    //  ----------------------------------
                                    //  # OF POINTS STORED
                                    //
                                    //  This file contains a set of scans.  Each scan is 1 to .5billion points long.  For example,
                                    //  we might have 100 scans, each 1000 points long. In this example:
                                    //
                                    //      pointsPerScanThisChannel_LSW = 1000
                                    //      pointsPerScanThisChannel_MSW = 0
                                    //
                                    //      numScansStoredBeforeLastScan = 99
                                    //
                                    //      numPointsInLastPartialScan_LSW = 1000
                                    //      numPointsInLastPartialScan_MSW = 0
                                    //
                                    //  Each channel can have a different number of points per scan due to the sampleRateChanMULTiplier

iNetUINT32 pointsPerScanThisChannel_LSW;    
iNetUINT32 pointsPerScanThisChannel_MSW;    
                                    //  # points per scan =  (pointsPerScanThisChannel_MSW * 2^32) + pointsPerScanThisChannel_LSW
                                    //   bytes 20..23, base 0 
                                    //   bytes 24..27, base 0 

iNetUINT32 numScansStoredBeforeLastScan_LSW;            
                                    //  # of complete scans stored in file 
                                    //   bytes 28..31, base 0 

//  iNetUINT32 numScansStoredBeforeLastScan_MSW;    
                                    //  this is defined below, at the end of the struct

iNetUINT32 numPointsInLastPartialScan_LSW;  
iNetUINT32 numPointsInLastPartialScan_MSW;  
                                    //  # points stored in last scan if it is partially complete = (numPointsInLastPartialScan_MSW * 2^32) + numPointsInLastPartialScan_LSW
                                    //   bytes 32..35, base 0 
                                    //   bytes 36..39, base 0 

                                    //  ----------------------------------
                                    //  TIME INFORMATION

iNetFLT32 firstPoint_Time_Secs;     //  time of 1st point, units are seconds
                                    //   bytes 40..43, base 0 

iNetFLT32 endUser_channel_samplePeriod_Secs;
                                    //  time between points for this channel,
                                    //  units are seconds.  Notice that channels
                                    //  can have different sample rates, which
                                    //  is the master_endUser_SampleRate / sampleRate_Divider,
                                    //  where 'sampleRate_Divider' is an integer.
                                    //   bytes 44..47, base 0 

                                    //  ----------------------------------
                                    //  TYPE OF DATA STORED

iNetINT32 arrayDataType;            //  Type of src array data. iNetDataType:
                                    //
                                    //  0   iNetDT_INT16:   16bit integer, signed
                                    //  2   iNetDT_UINT16:  16bit integer, unsigned
                                    //  3   iNetDT_INT32:   32bit integer, signed
                                    //  4   iNetDT_UINT32:  32bit integer, unsigned
                                    //  5   iNetDT_FLT32:   32bit float (IEEE flt32 format)
                                    //  6   iNetDT_Double:  'double', as determined by the compiler
                                    //                      (e.g. flt64, flt80, flt96, flt128)
                                    //                      see 'bytesPerDataPoint' field to see
                                    //                      how many bytes
                                    //   bytes 48..51, base 0 
                                            
iNetINT32 bytesPerDataPoint;        //  # of bytes for each datapoint (e.g. 4 for 32bit signed integer)
                                    //   bytes 52..55, base 0 

iNetStr31 verticalUnitsLabel;       //  pascal string of vertical units label (e.g. "Volts")
                                    //   bytes 56..87, base 0 

iNetStr31 horizontalUnitsLabel;     //  horizontal units label, e.g. "Secs", pascal string (0th char is the # of valid chars)   
                                    //   bytes 88..119, base 0 

iNetStr31 userName;                 //  user named set by user, e.g. "Pressure 1" , pascal string (0th char is the # of valid chars)   
                                    //   bytes 120..151, base 0 

iNetStr31 chanName;                 //  name of channel, e.g. "Ch1 Vin+", pascal string (0th char is the # of valid chars)   
                                    //   bytes 152..183, base 0 

                                    //  ----------------------------------
                                    //  DATA MAPPING
                                    //
iNetINT32 minCode;                  //  if data is stored in integer format, this contains the mapping from integer 
iNetINT32 maxCode;                  //  to engineering units (e.g. +/-2048 A/D data is mapped to +/- 10V, minCode = -2048,
iNetFLT32 minEU;                    //  maxCode = +2047, minEU = -10.000, maxEU = +9.995.
iNetFLT32 maxEU;                    //  
                                    //   bytes 184..187, base 0 
                                    //   bytes 188..191, base 0 
                                    //   bytes 192..195, base 0 
                                    //   bytes 196..199, base 0 

                                    //  ----------------------------------
                                    //  iNet NETWORK ADDRESS (this does not need
                                    //  to be filled in, 0L's are ok)

iNetINT32 netNum;                   //  channel network # (this pertains to iNet only; use 0 otherwise)
                                    //   bytes 200..203, base 0 

iNetINT32 devNum;                   //  channel device # (this pertains to iNet only; use 0 otherwise)
                                    //   bytes 204..207, base 0 

iNetINT32 modNum;                   //  channel module # (this pertains to iNet only; use 0 otherwise)
                                    //   bytes 208..211, base 0 

iNetINT32 chNum;                    //  channel channel # (this pertains to iNet only; use 0 otherwise)
                                    //   bytes 212..215, base 0 
        
                                    //  ----------------------------------
                                    //  END USER NOTES

iNetStr255 notes;                   //  pascal string that contains notes about the data stored.
                                    //   bytes 216..471, base 0 

                                    //  ----------------------------------
                                    //  MAPPING

iNetFLT32 /* must remain flt32 */ internal1;    //  Mapping from internal engineering units (e.g. Volts) to external engineering                     
iNetFLT32 /* must remain flt32 */ external1;    //  units (e.g. mmHg).  This is used for 2 point linear mapping/calibration to  
iNetFLT32 /* must remain flt32 */ internal2;    //  a new, user defined, coordinate system.  instruNet World does not read these values
iNetFLT32 /* must remain flt32 */ external2;    //  from the wave files, yet instead reads them from the instrNet.prf file -- they
                                    //  are only stored for the benefit of other software that might read this file. gsw 12/1/96
                                    //   bytes 472..475, base 0 
                                    //   bytes 476..479, base 0 
                                    //   bytes 480..483, base 0 
                                    //   bytes 484..487, base 0 

iNetFLT32 flt32key;                 //  flt32 key set to 1234.56 (i.e. INET_FLT32_KEY), Used to test floating point code. gsw 12/1/96
                                    //   bytes 488..491, base 0 

iNetINT32 sampleRate_Divider;       //  this channel is digitized at the master_endUser_SampleRate divided 
                                    //  this 'sampleRate_Divider' (i.e. sampleRateChanMULT_integerRatio_N_int64)
                                    //  (helpful with FileType Binary Merge), gsw 1/29/97. Note: This field was introduced 1/29/97 and
                                    //  files saved before that time set it to 0.
                                    //   bytes 492..495, base 0 
                                    
iNetINT32 channelsPerFile;          //  # of channels per file (i.e. interlaced after array of headers) (helpful with FileType Binary Merge), gsw 1/29/97
                                    //  Note: This field was introduced 1/29/97 and files saved before that time set it to 0.
                                    //   bytes 496..499, base 0 
                                    
                                    //  ----------------------------------
                                    //  EXPANSION FIELDS

                                    #if 1   //  gsw 12/23/09

                                    //  # of complete scans stored in file, MS 32bits
                                    //   bytes 500..503, base 0 
iNetUINT32 numScansStoredBeforeLastScan_MSW;    

                                    #else
                                    iNetINT32 expansion8;               //  expansion fields that are preset to 
                                    #endif

iNetINT32 expansion9;               //  0 and then ignored
iNetINT32 expansion10;              //   bytes 500..503, base 0 
                                    //   bytes 504..507, base 0 
                                    //   bytes 508..511, base 0 

                                    //  ----------------------------------
                                    //  KEY TO TEST STRUCT PACKING

iNetINT32 int32key_StructTest;      //  32bit key that should contain 0x12345678; (i.e. INET_INT32_KEY)
                                    //   bytes 512..515, base 0 
                                    
                                    //  ----------------------------------
                                    //  ACTUAL DATA

/* iNetFLT32 *data[1]; */           //  contains array of data of type 'arrayDataType'

 GWI_file_header_struct;

Final code and results:

Code

import struct

file = "merged.bin"  # placeholder path; replace with your binary merge file
# Current 3 channels: Ch11 Vin+, Ch13 Vin+ and Ch15 Vin+
# Header info extracted using provided header struct (INET_INT.H)
# After the header, the data is saved in an interlaced form,
# where points are stored in the order that they are acquired in time.
# 3 channels: A[0], B[0], C[0], A[1], B[1], C[1]...
# After header = 516 header size x 3 channels = 1,548 bytes
# Start of data at 1,548 bytes?
with open(file, "rb") as f:
    # First 12 bytes: headerSizeInBytes, int32key, file_endian (little-endian)
    byte = f.read(12)
    header_size, int32key, file_endian = struct.unpack('<3i', byte)
    # channel name 1 (chanName starts at byte 152 of the first header)
    f.seek(152)
    chan = f.read(183-152)
    # The leading byte is the Pascal length byte (9 == b'\t'); strip it and the padding
    chan = struct.unpack("<31s", chan)[0].rstrip(b'\x00').lstrip(b'\t')
    # channel name 2 (same field, one header further into the file)
    f.seek(152+header_size)
    chan2 = f.read(183-152)
    chan2 = struct.unpack("<31s", chan2)[0].rstrip(b'\x00').lstrip(b'\t')
print(header_size, int32key, file_endian)
print("channel 1: {}".format(chan))
print("channel 2: {}".format(chan2))

Results

516 305419896 1
channel 1: b'Ch11 Vin+'
channel 2: b'Ch13 Vin+'
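
The b'\t' that shows up in front of the channel name is most likely the Pascal length byte (0x09, and "Ch11 Vin+" has nine characters), since the header comments say the 0th char holds the number of valid chars. A minimal sketch of decoding such a field by its length byte is shown here; the helper name read_pascal_string and the ASCII encoding are assumptions, not something INET_INT.H specifies.

```python
def read_pascal_string(header, offset, field_size=32):
    """Decode a Pascal string field: byte 0 is the character count, the text follows.

    field_size=32 matches the iNetStr31 fields (e.g. chanName at bytes 152..183).
    ASCII is an assumption; older Macintosh files may use a different encoding.
    """
    field = header[offset:offset + field_size]
    length = field[0]  # number of valid characters
    return field[1:1 + length].decode("ascii", errors="replace")


# Synthetic header: 152 filler bytes, then a 32-byte chanName field.
fake_header = bytes(152) + b"\tCh11 Vin+" + bytes(22)
name = read_pascal_string(fake_header, 152)
print(name)
```

Reading the count from the first byte avoids relying on lstrip(b'\t'), which would only work for names that happen to be nine characters long.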

【Comments】:

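There is some information here that may help, but you will probably need to read the struct documentation in more detail.

It is not as complicated as it looks if you can read and understand the struct definition in "INET_INT.H", although this file format has a peculiarity in the way the channel values are interlaced. If you need help translating it as a first step, you will need to post the contents of that .h file.

@gimix Thanks for the starting point. I was able to find the INET_INT.H file and have edited my post to include the part that seems relevant.

【Answer 1】: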

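OK, this is not a complete answer, but I find comments really hard to read here.

The first step is to read the first 12 bytes (three 4-byte integers) and unpack them so that we can check the byte order. Let's try big-endian first: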

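import struct

file = "merged.bin"  # placeholder path; replace with your binary merge file
with open(file, "rb") as f:
    byte = f.read(12)  # headerSizeInBytes, int32key, file_endian
# '>' = big-endian, '3i' = three 4-byte signed integers
header_size, int32key, file_endian = struct.unpack('>3i', byte)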

We expect int32key to be 305419896 (= 0x12345678). If we get any other value, we switch to little-endian, i.e. change the unpack format string to <3i
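
The check described above can be sketched as a small helper that tries both byte orders and keeps the one that yields the expected key. The function name detect_byte_order and the constant name INT32_KEY are made up for this illustration; only the key value 0x12345678 comes from INET_INT.H.

```python
import struct

INT32_KEY = 0x12345678  # value int32key should contain, per INET_INT.H


def detect_byte_order(first12):
    """Return '>' (big-endian) or '<' (little-endian) for the first 12 header bytes."""
    for order in (">", "<"):
        header_size, int32key, file_endian = struct.unpack(order + "3i", first12)
        if int32key == INT32_KEY:
            return order
    raise ValueError("int32key not found; not a GWI header, or the file is damaged")


# Synthetic example: a little-endian header start like the one in the question.
sample = struct.pack("<3i", 516, INT32_KEY, 1)
print(detect_byte_order(sample))
```

The returned character can be prepended to every later format string, so the rest of the parsing code does not need to care which byte order the file uses.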

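At this point we can use the same logic to read the rest of the header and get all the information we need to read the data for the first channel. I hope this is a good start for you.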
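
As a rough sketch of "the same logic" applied to the whole header, the struct from the question can be written as a single format string. The type mapping is an assumption inferred from the byte offsets in the comments (iNetINT32 as i, iNetUINT32 as I, iNetINT16 as h, iNetFLT32 as f, iNetStr31 as a 32-byte Pascal string, iNetStr255 as a 256-byte one); the names HEADER_FORMAT, GWIHeader and parse_header are hypothetical.

```python
import struct
from collections import namedtuple

# Assumed mapping of the INET_INT.H types to struct codes; no padding is used,
# so the format must be prefixed with '<' or '>' (never native '@').
HEADER_FORMAT = (
    "3i2h"          # headerSizeInBytes, int32key, file_endian, int16key, zero
    "6I"            # acquisition secs, pointsPerScan LSW/MSW, numScans LSW, lastPartialScan LSW/MSW
    "2f"            # firstPoint_Time_Secs, endUser_channel_samplePeriod_Secs
    "2i"            # arrayDataType, bytesPerDataPoint
    "32p32p32p32p"  # verticalUnitsLabel, horizontalUnitsLabel, userName, chanName
    "2i2f"          # minCode, maxCode, minEU, maxEU
    "4i"            # netNum, devNum, modNum, chNum
    "256p"          # notes
    "5f"            # internal1, external1, internal2, external2, flt32key
    "2i"            # sampleRate_Divider, channelsPerFile
    "I"             # numScansStoredBeforeLastScan_MSW
    "3i"            # expansion9, expansion10, int32key_StructTest
)

GWIHeader = namedtuple("GWIHeader", (
    "headerSizeInBytes int32key file_endian int16key zero "
    "acquisition_SecsSince1904_FixedUint32_OverflowIn2030 "
    "pointsPerScanThisChannel_LSW pointsPerScanThisChannel_MSW "
    "numScansStoredBeforeLastScan_LSW "
    "numPointsInLastPartialScan_LSW numPointsInLastPartialScan_MSW "
    "firstPoint_Time_Secs endUser_channel_samplePeriod_Secs "
    "arrayDataType bytesPerDataPoint "
    "verticalUnitsLabel horizontalUnitsLabel userName chanName "
    "minCode maxCode minEU maxEU netNum devNum modNum chNum notes "
    "internal1 external1 internal2 external2 flt32key "
    "sampleRate_Divider channelsPerFile numScansStoredBeforeLastScan_MSW "
    "expansion9 expansion10 int32key_StructTest"
))


def parse_header(buf, order="<"):
    """Unpack one 516-byte channel header from the start of buf."""
    fmt = order + HEADER_FORMAT
    return GWIHeader._make(struct.unpack(fmt, buf[:struct.calcsize(fmt)]))


# The format should cover exactly the 516 bytes stated in the question.
print(struct.calcsize("<" + HEADER_FORMAT))
```

With a parsed first header, the data should start at headerSizeInBytes * channelsPerFile, which for the 3-channel file in the question is 516 * 3 = 1,548 bytes. The 'p' code returns each Pascal string as bytes with the length byte already removed.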

【Comments】:

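This was very helpful. I was able to use little-endian unpack('

The channel data definitely comes after the headers. It consists of arrays of data of the type specified by the arrayDataType header field. Note that you may need to convert the values as described in the "DATA MAPPING" section.

Got it, this answers my question and I have updated my post. @gimix Thank you very much for your help!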
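
To illustrate the two points in the comment above, here is a hedged sketch of splitting the interlaced data into channels and of mapping integer codes to engineering units. It assumes 32-bit float samples and that every channel has the same number of points per scan (the header notes that sampleRate_Divider can make them differ, which this sketch does not handle). The linear interpolation in code_to_eu is an assumption based on the minCode/maxCode/minEU/maxEU example in the header comments, and both function names are made up.

```python
import struct


def deinterleave(raw, n_channels, order="<"):
    """Split float32 samples stored as A[0], B[0], C[0], A[1], ... into one list per channel."""
    n_values = len(raw) // 4
    n_values -= n_values % n_channels  # ignore a trailing incomplete group
    values = struct.unpack("{}{}f".format(order, n_values), raw[:n_values * 4])
    return [list(values[ch::n_channels]) for ch in range(n_channels)]


def code_to_eu(code, min_code, max_code, min_eu, max_eu):
    """Map an integer A/D code to engineering units, assuming a linear mapping."""
    return min_eu + (code - min_code) * (max_eu - min_eu) / (max_code - min_code)


# Synthetic data: 3 channels, 2 points each.
raw = struct.pack("<6f", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
channels = deinterleave(raw, 3)
print(channels)
print(code_to_eu(-2048, -2048, 2047, -10.0, 9.995))
```

For files whose arrayDataType is already 32-bit float, the mapping step would presumably not be needed; it only matters when the data is stored as integers.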
