COFF - Object File Format Analysis


 

 

 

G Common Object File Format (COFF)

 

Overall structure

File header

Optional header

Section headers

Raw data sections

COFF relocation information

Line number information

Symbol table

Additional symbols

String table

 

This section describes the Common Object File Format (COFF) used by the linker. COFF files are the intermediate object files that the linker combines into an executable file.

 

 

Overall structure

The COFF object format is used both for object files (.o extension) and for executable files. Some of the information is only present in object files; other information is only present in executable files.

 

Table G-1   COFF file components

Section                   Description
File header               Contains general information; always present.
Optional header           Contains information about an executable file; usually only present in executables.
Section header            Contains information about the different COFF sections; one for each section.
Raw data sections         One for each section containing raw data, such as machine instructions and initialized variables.
Relocation information    Contains information about unresolved references to symbols in other modules; one for each section having external references. Usually only present in object files and not in executable files.
Line number information   Contains debugging information about source line numbers; one for each section if compiled with the -g option.
Symbol table              Contains information about all the symbols in the object file; present if not stripped from an executable file.
String table              Contains long symbol names (names longer than 8 bytes).

The following figure shows the COFF file structure:

[Figure: COFF file structure]

 

File header

The file header contains general information about the object file and has the following structure from the file filehdr.h:

struct filehdr {
    unsigned short  f_magic;    /* magic number */
    unsigned short  f_nscns;    /* number of sections */
    long            f_timdat;   /* date stamp */
    long            f_symptr;   /* fileptr to symtab */
    long            f_nsyms;    /* symtab count */
    unsigned short  f_opthdr;   /* sizeof(optional hdr) */
    unsigned short  f_flags;    /* flags (file attributes) */
};

 


Table G-2   COFF header fields

Field       Description
f_magic     Magic number used to identify the file as a COFF file. It has the value 0x170 for the PowerPC family of processors.
f_nscns     Number of sections this file contains.
f_timdat    Creation time of the file represented as a 32 bit value.
f_symptr    File offset of the symbol table.
f_nsyms     Number of entries in the symbol table.
f_opthdr    Number of bytes in the Optional Header.
f_flags     Bit field containing the following flags:

            F_RELFLG (0x1)    Set if the COFF file does not contain relocation information; normally true only for executable files.
            F_EXEC (0x2)      Set if the file is executable and all references are resolved.
            F_LNNO (0x4)      Set if the COFF file does not contain line number information; this symbolic debugging information can be stripped with the -s option or the strip program.
            F_LSYMS (0x8)     Set if the COFF file does not contain local symbols; these symbols can be stripped with the -X and -x options to the assembler and linker.
            F_AR32W (0x200)   Set if the file uses big-endian byte order.
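As a rough illustration of how the header fields and flags described above might be read, the sketch below decodes a 20-byte file header from a raw buffer. It assumes that each long occupies 4 bytes and that the file is stored big-endian (as F_AR32W suggests for PowerPC). The names coff_filehdr, rd16, rd32, and parse_filehdr are hypothetical helpers for this example and are not part of filehdr.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag values as listed for f_flags. */
#define F_RELFLG 0x0001  /* no relocation information */
#define F_EXEC   0x0002  /* executable, all references resolved */
#define F_LNNO   0x0004  /* no line number information */
#define F_LSYMS  0x0008  /* no local symbols */
#define F_AR32W  0x0200  /* big-endian byte order */

/* Hypothetical in-memory copy of struct filehdr using fixed-width types. */
struct coff_filehdr {
    uint16_t f_magic;
    uint16_t f_nscns;
    int32_t  f_timdat;
    int32_t  f_symptr;
    int32_t  f_nsyms;
    uint16_t f_opthdr;
    uint16_t f_flags;
};

/* Read big-endian values (the on-disk byte order is an assumption). */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Decode the header; returns 0 on success, -1 if the buffer is too short. */
static int parse_filehdr(const unsigned char *buf, size_t len,
                         struct coff_filehdr *out)
{
    if (len < 20)
        return -1;
    out->f_magic  = rd16(buf + 0);
    out->f_nscns  = rd16(buf + 2);
    out->f_timdat = (int32_t)rd32(buf + 4);
    out->f_symptr = (int32_t)rd32(buf + 8);
    out->f_nsyms  = (int32_t)rd32(buf + 12);
    out->f_opthdr = rd16(buf + 16);
    out->f_flags  = rd16(buf + 18);
    return 0;
}
```

A caller could then test individual bits, for example `hdr.f_flags & F_EXEC`, to tell a fully linked executable from an object file.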

 

Optional header

The optional header contains information about an executable file and has the following structure from the file aouthdr.h:

typedef struct aouthdr {
    short   magic;              /* a.out magic */
    short   vstamp;             /* version stamp */
    long    tsize;              /* .text size */
    long    dsize;              /* .data size */
    long    bsize;              /* .bss size */
    long    entry;              /* entry point */
    long    text_start;         /* fileptr to .text */
    long    data_start;         /* fileptr to .data */
} AOUTHDR;

 


Table G-3   COFF optional (executable) header fields

Field         Description
magic         Value 0x10b.
vstamp        Set by the option -VS, but not used by the linker.
tsize         Size of the .text section.
dsize         Size of the .data section.
bsize         Size of the .bss section.
entry         Entry point in the executable program where execution will begin. The default entry point is the symbol start defined in the file function main(). The -e option can change this to any other symbol in the program.
text_start    File offset to the .text section in the COFF file.
data_start    File offset to the .data section in the COFF file.
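The optional header fields above can be read in much the same way as the file header. The following is a minimal sketch that assumes 4-byte long fields (giving a 28-byte header) and big-endian storage. The names coff_aouthdr, AOUT_MAGIC, AOUT_SIZE, and parse_aouthdr are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define AOUT_MAGIC 0x010b  /* documented value of the magic field */
#define AOUT_SIZE  28      /* 2 + 2 + 6 * 4 bytes, assuming 4-byte long */

/* Hypothetical fixed-width copy of struct aouthdr. */
struct coff_aouthdr {
    uint16_t magic;
    uint16_t vstamp;
    uint32_t tsize;
    uint32_t dsize;
    uint32_t bsize;
    uint32_t entry;
    uint32_t text_start;
    uint32_t data_start;
};

/* Big-endian readers (byte order is an assumption). */
static uint16_t be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * buf points just past the file header; f_opthdr is the size taken from
 * the file header. Returns 0 on success, -1 if the optional header is
 * missing, too short, or has an unexpected magic value.
 */
static int parse_aouthdr(const unsigned char *buf, size_t f_opthdr,
                         struct coff_aouthdr *out)
{
    if (f_opthdr < AOUT_SIZE)
        return -1;
    out->magic = be16(buf + 0);
    if (out->magic != AOUT_MAGIC)
        return -1;
    out->vstamp     = be16(buf + 2);
    out->tsize      = be32(buf + 4);
    out->dsize      = be32(buf + 8);
    out->bsize      = be32(buf + 12);
    out->entry      = be32(buf + 16);
    out->text_start = be32(buf + 20);
    out->data_start = be32(buf + 24);
    return 0;
}
```

Passing f_opthdr explicitly lets the same routine reject object files, where the optional header is usually absent.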

 

Section headers

There is one section header for each section in the COFF file; the number of sections is given by the f_nscns field in the COFF File Header. Section headers have the following structure from the file scnhdr.h:

struct scnhdr {                     /* modified COFF */
    char            s_name[8];      /* section name */
    long            s_paddr;        /* physical address */
    long            s_vaddr;        /* virtual address */
    long            s_size;         /* size of section in bytes */
    long            s_scnptr;       /* fileptr to raw data */
    long            s_relptr;       /* fileptr to reloc */
    long            s_lnnoptr;      /* fileptr to lineno */
    unsigned long   s_nreloc;       /* reloc count (unsigned short in standard COFF) */
    unsigned long   s_nlnno;        /* line number count (unsigned short in standard COFF) */
    long            s_flags;        /* flags */
};

#define SCNHDR struct scnhdr

#define SCNHSZ sizeof(SCNHDR)
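To make the relationship between f_opthdr, f_nscns, and the section headers concrete, here is a small sketch of how a reader might compute where each section header starts. It assumes that the section headers follow the optional header immediately and that the file header is 20 bytes (4-byte long). The header size is passed in rather than hard-coded because it depends on which variant of the structure is in use. FILHSZ and scnhdr_offset are names invented for this example.

```c
#include <stddef.h>

#define FILHSZ 20  /* size of the file header, assuming 4-byte long */

/*
 * File offset of section header `index` (0-based).
 * f_opthdr: size of the optional header from the file header (0 if absent).
 * scnhsz:   size of one section header in the file.
 */
static size_t scnhdr_offset(size_t f_opthdr, size_t index, size_t scnhsz)
{
    return FILHSZ + f_opthdr + index * scnhsz;
}
```

For an executable with a 28-byte optional header, the first section header would start at offset 48 under these assumptions.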

Table G-4   COFF section header fields

Field        Description
s_name[8]    Eight byte null terminated section name. Standard names include .text, .data, and .bss.
s_paddr      Physical start address of the section. It is usually set to the same value as s_vaddr, but can be set to a different value with a command in the linker command language. This can be useful when initialized data is physically allocated to a ROM address, but moved to a logical address in RAM at start-up.
s_vaddr      Logical start address of the section as allocated by the assembler or linker.
s_size       Size in bytes of the memory allocated to the section.
s_scnptr     File offset to the raw data of the section. Note that the .bss section does not have any raw data since it will be initialized by the operating system.
s_relptr     File offset to the relocation information of the section.
s_lnnoptr    File offset to the line number information of the section.
s_nreloc     Number of relocation information entries.
s_nlnno      Number of line number information entries.
s_flags      Bit field containing the following flags:

             STYP_TEXT (0x20)    Set for a .text section.
             STYP_DATA (0x40)    Set for a .data section.
             STYP_BSS (0x80)     Set for a .bss section.
             STYP_INFO (0x200)   Set for a .comment section.
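The difference between s_paddr and s_vaddr can be illustrated with a start-up copy loop. The sketch below shows one way initialized data stored at a physical (ROM) address might be moved to its logical (RAM) address before the program runs. The load_region structure and relocate_region function are hypothetical; real start-up code depends on the target and the linker command file.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor derived from one section header. */
struct load_region {
    const void *paddr;  /* where the bytes physically live, e.g. ROM (s_paddr) */
    void       *vaddr;  /* where the program expects them, e.g. RAM (s_vaddr) */
    size_t      size;   /* number of bytes (s_size) */
};

/*
 * Copy the region only when the physical and logical addresses differ.
 * Returns the number of bytes copied.
 */
static size_t relocate_region(const struct load_region *r)
{
    if (r->paddr == r->vaddr || r->size == 0)
        return 0;
    memcpy(r->vaddr, r->paddr, r->size);
    return r->size;
}
```

When s_paddr equals s_vaddr, which the text describes as the usual case, the function does nothing.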

The following table shows the correspondence between the type-spec as defined on p.409 and the COFF section flags assigned to the output section.

Table G-5   type-spec - COFF section flag correspondence

type-spec   Section flags (s_flags)
BSS         STYP_BSS
COMMENT     STYP_INFO
CONST       STYP_DATA
DATA        STYP_DATA
TEXT        STYP_TEXT

 

Raw data sections

The Raw Data Sections contain the actual raw data for each section.

Table G-6   COFF section names

.text       Machine instructions, constant data, and strings.
.sdata2     Small constant data; see the Set size limit for "small const" variables (-Xsmall-const=n), p.106.
.data       Initialized data.
.sdata      Small initialized data; see the Set size limit for "small data" variables (-Xsmall-data=n), p.106.
.bss        Uninitialized data; does not have any raw data.
.sbss       Small uninitialized data.
.comment    Comments from #ident directives in C.
.init       Code that is to be executed before the main() function.
.fini       Code that is to be executed when the user program has finished execution.
.eini       The instructions of the .fini code; the .init, .fini, and .eini sections should be placed one after another in memory.

 

COFF relocation information

The Relocation Information segment contains information about unresolved references. Since compilers and assemblers do not know at what absolute memory address a symbol will be allocated, and since they are unaware of definitions of symbols in other files, every reference to such a symbol will create a relocation entry. The relocation entry will point to the address where the reference is being made, and to the symbol table entry that contains the symbol that is referenced. The linker will use this information to fill in the correct address after it has allocated addresses to all symbols.

When an offset is added to a symbol in the assembly source,

lwz     r3,(var+16)(r0)

move.l  var+16,d0

that offset is stored in the address field of the instruction, so that adding the real address of the symbol to the address field will yield a correct reference.

The relocation segment does not exist in executable files.
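The patching step described above can be sketched for the simplest case, a 16-bit absolute address field. The example assumes big-endian storage and that the caller already knows the byte offset of the field within the section's raw data. The function name patch_abs16 and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Patch a 16-bit big-endian address field located at byte offset
 * field_off in the section's raw data. The field already holds the
 * offset written in the source (16 for "var+16"); the final symbol
 * address is added to it and the result is truncated to 16 bits.
 */
static void patch_abs16(unsigned char *section, size_t field_off,
                        uint32_t sym_addr)
{
    uint16_t field = (uint16_t)((section[field_off] << 8) |
                                 section[field_off + 1]);
    uint16_t fixed = (uint16_t)(field + sym_addr);

    section[field_off]     = (unsigned char)(fixed >> 8);
    section[field_off + 1] = (unsigned char)(fixed & 0xff);
}
```

For instance, if var ends up at address 0x2000, a field that held 16 becomes 0x2010 after patching.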

A relocation entry has the following structure from the file reloc.h:

struct reloc {                  /* modified COFF */
    long            r_vaddr;    /* address of the reference */
    long            r_symndx;   /* index into the symbol table */
    unsigned short  r_type;     /* relocation type */
    unsigned short  r_offset;   /* high 16 bits of the offset */
};

#define RELOC   struct reloc

#define RELSZ   sizeof(RELOC)

#define RELSZ   10              /* sizeof(RELOC) */

 


Table G-7   COFF relocation entry fields

Field       Description
r_vaddr     The relative address of the area within the current section to be patched with the correct address. The value is an offset measured from the start of the section.
r_symndx    Index into the symbol table pointing to the entry describing the symbol that is referenced at r_vaddr.
r_type      Type of addressing mode used; it describes whether the mode is absolute or relative, and the size of the addressing mode. See the table below for relocation types used by the Wind River tools.
r_offset    The high 16 bits of any offset that is added to the symbol in the R_HVRT16, R_LVRT16, and R_HAVRT16 relocation modes. Since the address field in the instruction is only 16 bits, it cannot represent a large offset. Example:

                addis r13,r0,(var+0x123456)@ha

            The address field in the addis instruction will contain 0x3456 and r_offset will contain 0x12.
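The split between r_offset and the instruction's address field can be shown with a little arithmetic. The sketch below rebuilds the full offset from the two halves and then derives the high, low, and adjusted-high 16-bit parts of a final address. The adjusted-high rule used here (add one to the upper half when the lower half is negative as a signed 16-bit value) is the usual meaning of @ha on PowerPC. All function names are hypothetical.

```c
#include <stdint.h>

/* Combine r_offset (high 16 bits) with the instruction field (low 16 bits). */
static uint32_t full_offset(uint16_t r_offset, uint16_t field)
{
    return ((uint32_t)r_offset << 16) | field;
}

/* Lower 16 bits of an address. */
static uint16_t part_lo(uint32_t addr)
{
    return (uint16_t)(addr & 0xffffu);
}

/* Upper 16 bits of an address. */
static uint16_t part_hi(uint32_t addr)
{
    return (uint16_t)(addr >> 16);
}

/*
 * Adjusted upper 16 bits: adding 0x8000 before shifting bumps the upper
 * half by one exactly when the lower half has its sign bit set.
 */
static uint16_t part_ha(uint32_t addr)
{
    return (uint16_t)((addr + 0x8000u) >> 16);
}
```

With the values from the example, full_offset(0x12, 0x3456) gives 0x123456, which the linker would add to the address of var before taking the adjusted high part.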

  

 

Table G-8   COFF relocation types

Relocation type   Number   Description
R_RELWORD         16       16 bit absolute address: lwz r3,var(r0)
R_HVRT16          131      Higher 16 bits of an absolute address: addis r3,r0,var@h
R_LVRT16          132      Lower 16 bits of an absolute address: lwz r3,var@l(r0)
R_HAVRT16         136      Adjusted higher 16 bits of an absolute address. If the lower 16 bits is a negative number, one is added to the upper 16 bits: addis r3,r0,var@ha
R_PCR16S2         137      16 bit PC relative address where the lower two bits are ignored: bc 4,2,label
R_PCR26S2         138      26 bit PC relative address where the lower two bits are ignored: bl func
R_REL16S2         139      16 bit absolute address where the lower two bits are ignored: bca 4,2,label
R_REL26S2         140      26 bit absolute address where the lower two bits are ignored: bla func
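For the branch relocations, "the lower two bits are ignored" means the displacement or address is written into the instruction with its two low bits masked off, leaving the instruction's own flag bits untouched. The following sketch shows this for 32-bit PowerPC branch instructions. The field masks are an assumption based on the standard PowerPC branch encodings, no range checking is done, and the function names are invented for the example.

```c
#include <stdint.h>

#define FIELD26 0x03FFFFFCu  /* 26-bit field with the low two bits excluded */
#define FIELD16 0x0000FFFCu  /* 16-bit field with the low two bits excluded */

/* 26 bit PC relative: store (target - pc) in the field. */
static uint32_t apply_pcr26s2(uint32_t insn, uint32_t pc, uint32_t target)
{
    uint32_t disp = target - pc;  /* wraps around for backward branches */
    return (insn & ~FIELD26) | (disp & FIELD26);
}

/* 16 bit PC relative: same idea with the narrower field. */
static uint32_t apply_pcr16s2(uint32_t insn, uint32_t pc, uint32_t target)
{
    uint32_t disp = target - pc;
    return (insn & ~FIELD16) | (disp & FIELD16);
}

/* 26 bit absolute: store the target address itself in the field. */
static uint32_t apply_rel26s2(uint32_t insn, uint32_t target)
{
    return (insn & ~FIELD26) | (target & FIELD26);
}
```

A real linker would also verify that the displacement fits in the field before patching; that check is omitted here to keep the sketch short.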

 

Line number information

The line number information segment contains the mapping from source line numbers to machine instruction addresses used by symbolic debuggers. This information is only available if the -g option is specified to the compiler.

Line number entries for a section form groups of pairs where the first pair in a group is a pointer to the function containing the source. After that, every source line that has generated any instruction has an entry specifying the line number relative to the beginning of the function, and the corresponding instruction address. Normally only the .text section has line number information. The following table demonstrates the layout of the line number entries:

A line number entry has the following structure from the file linenum.h:

struct lineno {
    union {
        long        l_symndx;   /* symbol table index of the function (when l_lnno is zero) */
        long        l_paddr;    /* instruction address of the source line */
    } l_addr;
    unsigned long   l_lnno;     /* line number (unsigned short in standard COFF) */
};

#define LINENO      struct lineno

#define LINESZ      sizeof(LINENO)

#define LINESZ      6

Table G-9   COFF line number fields

Field       Description
l_symndx    Symbol table index for a new function; only valid if l_lnno is set to zero.
l_paddr     Instruction address corresponding to the source line l_lnno.
l_lnno      Source line relative to the start of the current function.
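The grouping of line number entries can be illustrated with a lookup that maps an instruction address back to a function and a line. The sketch below works on a hypothetical in-memory array rather than the on-disk layout. An entry with a zero line number starts a new group and carries the function's symbol table index; every other entry carries an instruction address. The names line_entry, line_hit, and find_line are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one line number entry. */
struct line_entry {
    uint32_t addr_or_symndx;  /* l_symndx when lnno == 0, otherwise l_paddr */
    uint32_t lnno;            /* l_lnno */
};

struct line_hit {
    long     symndx;  /* symbol index of the enclosing function, -1 if none */
    uint32_t lnno;    /* line relative to the function start, 0 if none */
};

/* Return the entry with the greatest instruction address that is <= addr. */
static struct line_hit find_line(const struct line_entry *tab, size_t n,
                                 uint32_t addr)
{
    struct line_hit hit = { -1, 0 };
    uint32_t best_addr = 0;
    long cur_fn = -1;

    for (size_t i = 0; i < n; i++) {
        if (tab[i].lnno == 0) {
            /* Group header: remember which function follows. */
            cur_fn = (long)tab[i].addr_or_symndx;
            continue;
        }
        if (tab[i].addr_or_symndx <= addr &&
            (hit.lnno == 0 || tab[i].addr_or_symndx >= best_addr)) {
            best_addr  = tab[i].addr_or_symndx;
            hit.symndx = cur_fn;
            hit.lnno   = tab[i].lnno;
        }
    }
    return hit;
}
```

A debugger would then use the returned symbol index to find the function's name and absolute starting line in the symbol table.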

 

Symbol table

The symbol table is an array of entries containing information about the symbols referenced in the COFF file. A symbol table entry has the following structure from the file syms.h:

struct syment {
    union {
        char        _n_name[8];     /* symbol name, if it fits in 8 bytes */
        struct {
            long    _n_zeroes;      /* zero if the name is in the string table */
            long    _n_offset;      /* offset into the string table */
        } _n_n;
        char        *_n_nptr[2];    /* allows for overlays */
    } _n;
    long            n_value;        /* value of the symbol */
    short           n_scnum;        /* section number */
    unsigned short  n_type;         /* type of the symbol */
    char            n_sclass;       /* storage class */
    char            n_numaux;       /* number of auxiliary entries */
    short           n_pad;          /* padding to a multiple of four bytes */
};

#define SYMENT      struct syment

#define SYMESZ      20

#define SYMESZ      18

#define n_name      _n._n_name

#define n_nptr      _n._n_nptr[1]

#define n_zeroes    _n._n_n._n_zeroes

#define n_offset    _n._n_n._n_offset

 


Table G-10   COFF symbol table fields

Field       Description
n_name      Name of the symbol if the length is less than or equal to 8 bytes. If it is less than 8 bytes the name is terminated by a null character.
n_zeroes    Zero if a symbol name is longer than 8 bytes. This field overlaps the first 4 bytes of n_name.
n_offset    An offset into the String Table if n_zeroes is zero.
n_nptr      This pointer allows for overlays.
n_value     A value whose content depends on the symbol type. Normally it contains the address of the symbol, or its size if the symbol is a common block. A zero value indicates an undefined symbol if n_scnum is also zero.
n_scnum     Section number of the symbol starting with one. A zero value indicates one of two things:
            If n_value is zero then the symbol is an undefined symbol that must be defined in another file.
            If n_value is not zero then the symbol is a common block of size n_value. All common blocks with the same name are combined by the linker and put in the .bss section, unless some other file defines that symbol in a section.
n_type      Type of the symbol; only set if compiled with -g.
n_sclass    Storage class of the symbol. There are over 20 storage classes, but most are used only with the -g compiler option. The two classes of interest to the linker are C_EXT, external storage, and C_STAT, static (local to the file) storage.
n_numaux    Number of auxiliary entries used by the symbol.
n_pad       Pad the structure to a multiple of four bytes.

 

Any auxiliary entries to a symbol are stored immediately after the symbol in the table. They are mainly used for symbolic debugging (-g option) and are not discussed here.
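The rules for n_scnum, n_value, and n_numaux can be summarized in a few lines of code. The sketch below classifies a symbol as undefined, common, or defined, and walks a table while stepping over auxiliary entries. It uses a hypothetical in-memory subset of the symbol entry and assumes each auxiliary entry occupies one slot in the same table. Any nonzero section number is treated as "defined" here, which is a simplification.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory subset of struct syment. */
struct sym {
    int32_t n_value;
    int16_t n_scnum;
    uint8_t n_numaux;
};

enum sym_kind { SYM_UNDEFINED, SYM_COMMON, SYM_DEFINED };

/* Apply the n_scnum / n_value rules from the field descriptions. */
static enum sym_kind classify(const struct sym *s)
{
    if (s->n_scnum == 0)
        return s->n_value == 0 ? SYM_UNDEFINED : SYM_COMMON;
    return SYM_DEFINED;
}

/* Count primary symbols, skipping the auxiliary entries after each one. */
static size_t count_primary(const struct sym *tab, size_t nsyms)
{
    size_t count = 0;

    for (size_t i = 0; i < nsyms; i += 1 + (size_t)tab[i].n_numaux)
        count++;
    return count;
}
```

For a common symbol, n_value would then be read as the size of the block rather than as an address.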

 

Additional symbols

Wind River uses special COFF symbols as follows:

Table G-11   Special COFF Symbols

Extension           Description
!sn!section-name    Long section-name.
!cd!name            COMDAT-section-name. See Mark sections as COMDAT for linker collapse (-Xcomdat), p.71.
!sf!flags           Section flags (a: allocate, w: write, x: execute, b: bss/nocode).
!al!value           Section alignment.
!wk!symbol-name     Weak symbol. See weak pragma, p.138.
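A tool that reads these symbols needs to recognize the four-character markers. The sketch below is one possible way to do that, assuming each extension appears as a literal prefix at the start of the symbol name. The enum values and the function special_symbol are hypothetical names for this example.

```c
#include <stddef.h>
#include <string.h>

enum special_kind {
    SP_NONE, SP_LONG_SECTION, SP_COMDAT, SP_FLAGS, SP_ALIGN, SP_WEAK
};

/*
 * Classify a symbol name by its prefix. If rest is not NULL, *rest is set
 * to the text after the prefix (or to the whole name when no prefix matches).
 */
static enum special_kind special_symbol(const char *name, const char **rest)
{
    static const struct {
        const char       *prefix;
        enum special_kind kind;
    } table[] = {
        { "!sn!", SP_LONG_SECTION },
        { "!cd!", SP_COMDAT },
        { "!sf!", SP_FLAGS },
        { "!al!", SP_ALIGN },
        { "!wk!", SP_WEAK },
    };

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        size_t n = strlen(table[i].prefix);
        if (strncmp(name, table[i].prefix, n) == 0) {
            if (rest)
                *rest = name + n;
            return table[i].kind;
        }
    }
    if (rest)
        *rest = name;
    return SP_NONE;
}
```

Keeping the prefixes in a table makes it easy to add or remove markers without touching the matching loop.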

 

String table

The string table contains the null terminated names of symbols longer than eight characters. Those symbols point into the string table through an offset, n_offset. The first four bytes of the string table contain the size of the table and after that all strings are stored sequentially.
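Putting the n_zeroes, n_offset, and string table rules together, a symbol's name could be recovered roughly as follows. This is a sketch under several assumptions: the fields are stored big-endian, n_offset is measured from the start of the string table (so valid offsets begin after the 4-byte size field), and the caller supplies a 9-byte scratch buffer for short names. The function names be32 and symbol_name are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Big-endian 32-bit read (byte order is an assumption). */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * name8:       the 8-byte name field of a symbol table entry.
 * strtab:      start of the string table, including its 4-byte size field.
 * strtab_size: total size of the string table in bytes.
 * out:         scratch buffer for short names (8 characters plus a null).
 * Returns the name, or NULL if the string table reference is invalid.
 */
static const char *symbol_name(const unsigned char name8[8],
                               const char *strtab, size_t strtab_size,
                               char out[9])
{
    if (be32(name8) == 0) {
        /* Long name: the second word is an offset into the string table. */
        uint32_t off = be32(name8 + 4);
        if (strtab == NULL || off < 4 || off >= strtab_size)
            return NULL;
        if (memchr(strtab + off, '\0', strtab_size - off) == NULL)
            return NULL;  /* not terminated inside the table */
        return strtab + off;
    }
    /* Short name: up to 8 bytes, not null terminated when exactly 8 long. */
    memcpy(out, name8, 8);
    out[8] = '\0';
    return out;
}
```

Copying short names into a separate buffer avoids reading past the 8-byte field when the name uses all eight characters.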

 

 

 

 


Copyright © 2002, Wind River Systems, Inc. All rights reserved.
