Kernel Analysis - Week 7

Posted llguanli

刘文学 (Liu Wenxue) + Original work; please credit the source when reposting: http://blog.csdn.net/wdxz6547/article/details/51112486 + "Linux Kernel Analysis" MOOC course http://mooc.study.163.com/course/USTC-1000029000

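Questions this article sets out to answer:

Core questions

  1. How does a source file (.c, .cpp, .java, .go) become a binary file?
  2. How is a binary file loaded and run?

Supporting questions

  1. What does a binary file look like? Do different languages produce different binary formats? The focus here is the ELF format.
  2. What is the difference between static and dynamic linking?
  3. How does an executable file map onto a process's address space?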

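How a source file (.c, .cpp, .java, .go) becomes a binary file

C source -> preprocess -> compile to assembly (.s) -> assemble to object code (.o) -> link into an executable

  1. Preprocessing: pull in the #include'd files and expand macros. The output is named hello.cpp here; the conventional extension for preprocessed C is .i, which is why the next step passes -x cpp-output explicitly.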

gcc -E -o hello.cpp hello.c

  2. Compilation: translate the preprocessed source into assembly

gcc -x cpp-output -S -o hello.s hello.cpp

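  3. Assembly: produce a binary object file. Everything before this step is readable text; this step emits
    machine instructions, but the result is not yet an executable.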

gcc -x assembler -c hello.s -o hello.o

  4. Linking (produces an ELF executable)

gcc -o hello hello.o                  # dynamically linked (the default)
gcc -o hello.static hello.o -static   # statically linked

$ readelf -h hello.o

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          320 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           64 (bytes)
  Number of section headers:         13
  Section header string table index: 10

$ readelf -h hello

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x400440
  Start of program headers:          64 (bytes into file)
  Start of section headers:          4504 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         9
  Size of section headers:           64 (bytes)
  Number of section headers:         30
  Section header string table index: 27

$ readelf -h hello.static

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 03 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - GNU
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x400f4e
  Start of program headers:          64 (bytes into file)
  Start of section headers:          789968 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         6
  Size of section headers:           64 (bytes)
  Number of section headers:         31
  Section header string table index: 28

$ ldd hello
linux-vdso.so.1 => (0x00007fff06ffe000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc6c2d40000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc6c3125000)
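
The Magic, Class and Type fields that readelf prints above can also be read directly from the first bytes of the file. The following is a minimal sketch, not part of the original post: it assumes the Linux <elf.h> header, a 64-bit ELF image already read into memory, and a host with the same byte order as the file. The function name elf64_type is hypothetical.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the e_type field (ET_REL, ET_EXEC, ET_DYN, ...) of a 64-bit ELF
 * image held in memory, or -1 if the buffer is too short, lacks the ELF
 * magic, or is not ELFCLASS64.
 * Assumption: host byte order equals the file's byte order.
 */
int elf64_type(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr hdr;

    if (len < sizeof(hdr))
        return -1;
    memcpy(&hdr, buf, sizeof(hdr));   /* copy to avoid unaligned access */

    if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0)
        return -1;                    /* magic must be 7f 45 4c 46 */
    if (hdr.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;

    return hdr.e_type;
}
```

For the three files above this would yield ET_REL for hello.o and ET_EXEC for hello and hello.static, matching the Type lines in the readelf output.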

Executable file formats

A.out --> COFF --> PE (Windows)
               --> ELF (Linux)

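Relationship between the ABI and the object file format: object files are sometimes loosely called ABI files, because an object file is already in a binary-compatible format, that is, it already contains machine instructions for one particular CPU architecture.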

ELF

Object files take part in both program linking (building a program) and program execution (running a program)

Linking View                        Execution View
============                        ==============
ELF header                          ELF header
Program header table (optional)     Program header table
Section 1                           Segment 1
...                                 Segment 2
Section n                           ...
Section header table                Section header table (optional)

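The ELF header sits at the start of the file and holds a road map describing how the file is organized.

The program header table tells the system how to create a process's memory image. This matches the readelf output above: the relocatable hello.o has no program headers, while the hello executable has 9.

The section header table describes the file's sections: every section has an entry in this table,
and each entry gives the section's name, size and similar information.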
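
To make the role of the program header table concrete, here is a small sketch that walks the table and counts the PT_LOAD entries, which are the ones that end up in the process image. It is an illustration under stated assumptions, not code from the original post: the image is a 64-bit ELF file already read into memory and already validated, the host byte order matches the file, and the name elf64_count_load_segments is hypothetical.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the PT_LOAD entries in the program header table of a 64-bit ELF
 * image held in memory. Returns -1 if the header or any table entry does
 * not fit inside the buffer.
 */
int elf64_count_load_segments(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr eh;
    int count = 0;

    if (len < sizeof(eh))
        return -1;
    memcpy(&eh, buf, sizeof(eh));

    for (size_t i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        /* e_phoff: where the table starts; e_phentsize: size of one entry */
        size_t off = eh.e_phoff + i * (size_t)eh.e_phentsize;

        if (off > len || len - off < sizeof(ph))
            return -1;
        memcpy(&ph, buf + off, sizeof(ph));
        if (ph.p_type == PT_LOAD)
            count++;
    }
    return count;
}
```

The e_phoff, e_phentsize and e_phnum fields used here are the "Start of program headers", "Size of program headers" and "Number of program headers" lines in the readelf output.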

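Mapping between an executable file and the process address space

When creating or extending a process image, the system logically copies a segment of the file into a virtual memory segment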

           File Offset   File                  Virtual Address
           ===========   ====                  ===============
                     0   ELF header
                         Program header table
                         Other information
                 0x100   Text segment          0x8048100
                         ...
                         0x2be00 bytes         0x8073eff  // 0x8048100 + 0x2be00 - 1
               0x2bf00   Data segment          0x8074f00
                         ...
                         0x4e00 bytes          0x8079cff
               0x30d00   Other information
                         ...
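
The arithmetic behind the table can be written as a one-line formula: a byte at file offset off inside a segment lands at vaddr + (off - file_offset). The sketch below checks that formula against the numbers in the table. The struct and function names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* A loadable segment, reduced to the three fields the mapping needs. */
struct segment {
    uint64_t file_offset;  /* where the segment starts in the file (p_offset) */
    uint64_t vaddr;        /* where it starts in memory (p_vaddr) */
    uint64_t size;         /* number of bytes taken from the file (p_filesz) */
};

/*
 * Translate a file offset into the virtual address it is mapped at.
 * Returns 0 if the offset lies outside the segment.
 */
uint64_t offset_to_vaddr(const struct segment *seg, uint64_t off)
{
    if (off < seg->file_offset || off - seg->file_offset >= seg->size)
        return 0;
    return seg->vaddr + (off - seg->file_offset);
}
```

With the text segment from the table (file offset 0x100, address 0x8048100, 0x2be00 bytes), the last byte at file offset 0x2beff maps to 0x8073eff, which is the end address the table shows.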

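Statically linked ELF executables and the process address space

Static linking generally places all code in a single code segment

A dynamically linked process has multiple code segments: one for the executable itself plus one for each shared library mapped into it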

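How a binary file is loaded and run

From the earlier chapters we can guess the basic approach to running a binary file:

Start a new process whose main job is to load and run the executable, which splits into a load step and a run step. When the code reaches the point of loading the executable, it calls the execve system call. That call should load the executable's contents into memory and reset the stack and key registers such as sp and ip, then run the code the executable designates. This necessarily involves register manipulation.

We use bash as the example to explain how a program gets run (other shells are similar).

  1. The shell passes the command-line arguments and the environment to bash's main function; main parses the command line and hands it to the execve system call

First, we type a command in bash

$ ./hello

bash is itself a C program, so it must have a main function too. How the shell gets from there to execve is omitted here.
To see the execve call made for your program, trace it with strace. The prototype of execve is:

int execve(const char *filename, char *const argv[], char *const envp[]);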

$ strace ./hello

execve("./hello", ["./hello"], [/* 78 vars */]) = 0
brk(0)                                  = 0xacd000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f08182cc000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=122541, ...}) = 0
mmap(NULL, 122541, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f08182ae000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\37\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1840928, ...}) = 0
mmap(NULL, 3949248, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f0817ce7000
mprotect(0x7f0817ea2000, 2093056, PROT_NONE) = 0
mmap(0x7f08180a1000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1ba000) = 0x7f08180a1000
mmap(0x7f08180a7000, 17088, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f08180a7000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f08182ad000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f08182ab000
arch_prctl(ARCH_SET_FS, 0x7f08182ab740) = 0
mprotect(0x7f08180a1000, 16384, PROT_READ) = 0
mprotect(0x600000, 4096, PROT_READ)     = 0
mprotect(0x7f08182ce000, 4096, PROT_READ) = 0
munmap(0x7f08182ae000, 122541)          = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 10), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f08182cb000
write(1, "hello kernel", 12hello kernel)            = 12
exit_group(0)                           = ?
+++ exited with 0 +++
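
The first line of the trace is the execve call itself. What a shell does around that call can be sketched in a few lines of user-space C: fork a child, replace the child's image with execve, and wait for it in the parent. This is a simplified illustration, not bash's actual code; the function name run_program is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Run `path` with the given argv in a child process and return its exit
 * status, or -1 if fork/waitpid fails or the child did not exit normally.
 */
int run_program(const char *path, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: on success execve never returns, the image is replaced. */
        execve(path, argv, environ);
        _exit(127);               /* reached only if execve failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_program("./hello", argv) with argv set to { "./hello", NULL } would correspond to the execve("./hello", ["./hello"], ...) line in the trace. The exit code 127 for a failed execve follows the usual shell convention and is a choice made for this sketch.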

bash's main ultimately runs the command by calling the execve system call

http://code.woboq.org/linux/linux/fs/exec.c.html#1628

SYSCALL_DEFINE3(execve,
        const char __user *, filename,
        const char __user *const __user *, argv,
        const char __user *const __user *, envp)
{
    return do_execve(getname(filename), argv, envp);
}

http://code.woboq.org/linux/linux/fs/exec.c.html#do_execve

int do_execve(struct filename *filename,
    const char __user *const __user *__argv,
    const char __user *const __user *__envp)
{
    struct user_arg_ptr argv = { .ptr.native = __argv }; /* wrap the user-space argv/envp pointers; the strings are copied later by copy_strings */
    struct user_arg_ptr envp = { .ptr.native = __envp };
    return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
}

/*
 * sys_execve() executes a new program.
 */
static int do_execveat_common(int fd, struct filename *filename,
                  struct user_arg_ptr argv,
                  struct user_arg_ptr envp,
                  int flags)
{
    file = do_open_execat(fd, filename, flags);
    retval = PTR_ERR(file);
    if (IS_ERR(file))
        goto out_unmark;
    sched_exec();

    ...
    retval = copy_strings(bprm->envc, envp, bprm);
    if (retval < 0)
        goto out;
    retval = copy_strings(bprm->argc, argv, bprm);
    if (retval < 0)
        goto out;
    retval = exec_binprm(bprm);
    if (retval < 0)
        goto out;
    ...
}


static int exec_binprm(struct linux_binprm *bprm)
{
    pid_t old_pid, old_vpid;
    int ret;

    /* Need to fetch pid before load_binary changes it */
    old_pid = current->pid;
    rcu_read_lock();
    old_vpid = task_pid_nr_ns(current, task_active_pid_ns(current->parent));
    rcu_read_unlock();

    ret = search_binary_handler(bprm);
    if (ret >= 0) {
        audit_bprm(bprm);
        trace_sched_process_exec(current, old_pid, bprm);
        ptrace_event(PTRACE_EVENT_EXEC, old_vpid);
        proc_exec_connector(current);
    }

    return ret;
}

int search_binary_handler(struct linux_binprm *bprm)
{
    ...
    list_for_each_entry(fmt, &formats, lh) {
        if (!try_module_get(fmt->module))
            continue;
        read_unlock(&binfmt_lock);
        bprm->recursion_depth++;
        retval = fmt->load_binary(bprm);
        read_lock(&binfmt_lock);
        put_binfmt(fmt);
        bprm->recursion_depth--;
        if (retval < 0 && !bprm->mm) {
            /* we got to flush_old_exec() and failed after it */
            read_unlock(&binfmt_lock);
            force_sigsegv(SIGSEGV, current);
            return retval;
        }
        if (retval != -ENOEXEC || !bprm->file) {
            read_unlock(&binfmt_lock);
            return retval;
        }
    }
    ...
}
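
The loop above is a chain of responsibility: each registered format gets a chance to load the file, and a return value of -ENOEXEC means "not my format, try the next one". A user-space model of that idea, with locking, module reference counting and the real loaders left out, might look like the sketch below. All names in it (struct format, load_elf, load_script, find_handler) are hypothetical stand-ins, not kernel symbols.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* User-space stand-in for struct linux_binfmt: one loader per format. */
struct format {
    const char *name;
    int (*load_binary)(const unsigned char *buf, size_t len);
};

static int load_elf(const unsigned char *buf, size_t len)
{
    if (len < 4 || memcmp(buf, "\177ELF", 4) != 0)
        return -ENOEXEC;          /* not mine: let the next handler try */
    return 0;
}

static int load_script(const unsigned char *buf, size_t len)
{
    if (len < 2 || buf[0] != '#' || buf[1] != '!')
        return -ENOEXEC;
    return 0;
}

static const struct format formats[] = {
    { "script", load_script },
    { "elf",    load_elf    },
};

/*
 * Try each handler in turn. Returns the name of the first handler that
 * accepts the image, or NULL if a handler failed with another error or
 * every handler answered -ENOEXEC.
 */
const char *find_handler(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < sizeof(formats) / sizeof(formats[0]); i++) {
        int ret = formats[i].load_binary(buf, len);
        if (ret != -ENOEXEC)
            return ret == 0 ? formats[i].name : NULL;
    }
    return NULL;
}
```

As in the kernel loop, any result other than -ENOEXEC ends the search, whether it is success or a different error.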


http://code.woboq.org/linux/linux/include/linux/binfmts.h.html#linux_binfmt
/*
 * This structure defines the functions that are used to load the binary formats that
 * linux accepts.
 */
struct linux_binfmt {
    struct list_head lh;
    struct module *module;
    int (*load_binary)(struct linux_binprm *);
    int (*load_shlib)(struct file *);
    int (*core_dump)(struct coredump_params *cprm);
    unsigned long min_coredump; /* minimal dump size */
};

http://code.woboq.org/linux/linux/fs/binfmt_elf.c.html#1089

static struct linux_binfmt elf_format = {
    .module     = THIS_MODULE,
    .load_binary    = load_elf_binary,
    .load_shlib = load_elf_library,
    .core_dump  = elf_core_dump,
    .min_coredump   = ELF_EXEC_PAGESIZE,
};

static int load_elf_binary(struct linux_binprm *bprm)
{
    ...
    start_thread(regs, elf_entry, bprm->p);
    retval = 0;
    ...
}


void
start_thread(struct pt_regs *regs, unsigned long new_ip, unsigned long new_sp)
{
    start_thread_common(regs, new_ip, new_sp,
                __USER_CS, __USER_DS, 0);
}


http://code.woboq.org/linux/linux/arch/x86/kernel/process_64.c.html#start_thread_common

static void
start_thread_common(struct pt_regs *regs, unsigned long new_ip,
            unsigned long new_sp,
            unsigned int _cs, unsigned int _ss, unsigned int _ds)
{
    loadsegment(fs, 0);
    loadsegment(es, _ds);
    loadsegment(ds, _ds);
    load_gs_index(0);
    regs->ip        = new_ip;
    regs->sp        = new_sp;
    regs->cs        = _cs;
    regs->ss        = _ss;
    regs->flags     = X86_EFLAGS_IF;
    force_iret();
}

Binary formats currently supported by Linux

binfmt_script - support for interpreted scripts that start with a #! line;

static struct linux_binfmt script_format = {
    .module     = THIS_MODULE,
    .load_binary    = load_script,
};

binfmt_misc - support different binary formats, according to runtime configuration of the Linux kernel;
binfmt_misc detects binaries via a magic or filename extension and invokes a specified wrapper. This
should obsolete binfmt_java, binfmt_em86 and binfmt_mz.

static struct linux_binfmt misc_format = {
    .module = THIS_MODULE,
    .load_binary = load_misc_binary,
};

binfmt_elf - support elf format;

binfmt_aout - support a.out format;

binfmt_flat - support for flat format;
binfmt_elf_fdpic - Support for elf FDPIC binaries;

som_format - support for the SOM format used by HP-UX;

static struct linux_binfmt som_format = {
    .module     = THIS_MODULE,
    .load_binary    = load_som_binary,
    .load_shlib = load_som_library,
    .core_dump  = som_core_dump,
    .min_coredump   = SOM_PAGESIZE
};

flat_format - support for the flat format;

static struct linux_binfmt flat_format = {
    .module     = THIS_MODULE,
    .load_binary    = load_flat_binary,
    .core_dump  = flat_core_dump,
    .min_coredump   = PAGE_SIZE
};

binfmt_em86 - support for Intel elf binaries running on Alpha machines.

static struct linux_binfmt em86_format = {
    .module     = THIS_MODULE,
    .load_binary    = load_em86,
};

elf_fdpic_format - support for ELF FDPIC binaries;

static struct linux_binfmt elf_fdpic_format = {
    .module     = THIS_MODULE,
    .load_binary    = load_elf_fdpic_binary,
#ifdef CONFIG_ELF_CORE
    .core_dump  = elf_fdpic_core_dump,
#endif
    .min_coredump   = ELF_EXEC_PAGESIZE,
};

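Each format is registered with register_binfmt

execve -> do_execve -> do_execveat_common -> exec_binprm -> search_binary_handler
-> load_elf_binary -> start_thread -> start_thread_common

Here start_thread_common sets the saved user-mode instruction pointer (regs->ip) to the new entry point and the saved stack pointer (regs->sp) to the new stack, so that execution begins in the new program when the system call returns to user mode.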

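Mapping between an executable file and the process address space

For ELF files, see how load_elf_binary maps each PT_LOAD segment (through elf_map); load_elf_library does the same job for a library loaded with the uselib system call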

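Static linking versus dynamic linking (shared libraries)

Linking is the process of collecting the various pieces of code and data and combining them into a single file
that can be loaded (copied) into memory and run.

Today linking is performed automatically by a program called the linker.

Linking falls into three cases: 1. linking at compile time, which is what we usually call static linking; 2. linking at load time; 3. linking at run time. Load-time and run-time linking are together called dynamic linking.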

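Static linking

The static linker takes a set of relocatable object files and command-line arguments as input and produces a fully linked executable object file that can be loaded and run.

The linker has two tasks:

a) Symbol resolution: object files define and reference symbols; the linker associates each symbol reference with exactly one symbol definition.
b) Relocation: the compiler and assembler generate code and data sections that start at address 0. Once linking has fixed the virtual address of every segment in the executable, the linker rewrites all references to those symbols so that they point at the final addresses, thereby relocating the sections.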
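
As an illustration of the relocation step, the sketch below patches one 4-byte field using the PC-relative rule that x86-64 uses for ordinary call instructions (relocation type R_X86_64_PC32, value = S + A - P). It is a simplified model of what a linker does for a single relocation entry, assuming a little-endian host; the function name relocate_pc32 and its parameters are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/*
 * Apply one PC-relative 32-bit relocation (the R_X86_64_PC32 rule):
 *     value = S + A - P
 * S: final address of the referenced symbol
 * A: addend stored with the relocation entry
 * P: final address of the 4-byte field being patched
 * `section` holds the section's bytes, `section_addr` is its final
 * virtual address, and `offset` locates the field inside the section.
 */
void relocate_pc32(unsigned char *section, uint64_t section_addr,
                   uint64_t offset, uint64_t sym_addr, int64_t addend)
{
    uint64_t place = section_addr + offset;
    int32_t value = (int32_t)(sym_addr + (uint64_t)addend - place);

    memcpy(section + offset, &value, sizeof(value));
}
```

For a call instruction at 0x400500 whose operand sits one byte later, a target symbol at 0x400600 and an addend of -4, the patched operand is 0xfb: the CPU adds it to the address of the next instruction (0x400505) and arrives at 0x400600.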

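Object files

a) Relocatable object file: contains binary code and data in a form that can be combined with other relocatable object files at link time (name.o).
b) Executable object file: contains binary code and data in a form that can be copied into memory and run (for example a.out).
c) Shared object file: a special kind of relocatable object file that can be loaded into memory and linked dynamically, at either load time or run time (name.so).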

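Is it the kernel that loads the shared libraries an executable depends on?

No. The kernel maps only the executable and the dynamic linker named in its PT_INTERP segment (/lib64/ld-linux-x86-64.so.2 in the ldd output above). The dynamic linker, not the static linker ld, then loads the shared libraries in user space, which is what the open and mmap calls on libc.so.6 in the strace output show.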

Dynamic linking

Dynamic linking is split into load-time dynamic linking and run-time dynamic linking. The following session demonstrates both.

$ ls

dllibexample.c  dllibexample.h  main.c  shlibexample.c  shlibexample.h

$ gcc -fPIC -shared shlibexample.c -o libshlibexample.so

$ gcc -fPIC -shared dllibexample.c -o libdllibexample.so

$ gcc main.c -o main -L . -lshlibexample -ldl

$ export LD_LIBRARY_PATH=$PWD

$ ./main

This is a Main program!
Calling SharedLibApi() function of libshlibexample.so!
This is a shared libary!
Calling DynamicalLoadingLibApi() function of libdllibexample.so!
This is a Dynamical Loading libary!
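
The source of main.c is not shown in this post. The run-time half of the demo (the part linked with -ldl) typically rests on dlopen and dlsym from <dlfcn.h>. The following is a minimal sketch of that mechanism, not the course's actual main.c; the helper name lookup_symbol is hypothetical. On older glibc versions it must be linked with -ldl, as in the gcc command above.

```c
#include <dlfcn.h>
#include <stddef.h>

/*
 * Look up `symbol` in `library` at run time.
 * library == NULL searches the main program and the shared objects that
 * were loaded with it. Returns NULL if the library or symbol is missing.
 * The handle is deliberately left open so the returned address stays valid.
 */
void *lookup_symbol(const char *library, const char *symbol)
{
    void *handle = dlopen(library, RTLD_NOW);

    if (handle == NULL)
        return NULL;
    return dlsym(handle, symbol);
}
```

In the demo above, a call such as lookup_symbol("libdllibexample.so", "DynamicalLoadingLibApi") would return the address of the function whose name appears in the program output; the exact signature of that function is not given in the post, so the caller would have to cast the result to the right function pointer type.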

Debugging

Tracing a statically linked program

qemu-system-x86_64 -kernel ../linux-3.18.6/arch/x86/boot/bzImage -initrd ../rootfs.img -S -s

(gdb) file ../linux-3.18.6/vmlinux
Reading symbols from ../linux-3.18.6/vmlinux...done.
(gdb) remote target:1234
Undefined remote command: "target:1234".  Try "help remote".
(gdb) target remote:1234
Remote debugging using :1234
0x0000000000000000 in irq_stack_union ()
(gdb) b sys_execve
Breakpoint 1 at 0xffffffff811626f0: file fs/exec.c, line 1604.
(gdb) b load_elf_binary
Breakpoint 2 at 0xffffffff811aa260: load_elf_binary. (2 locations)
(gdb) b start_thread
Breakpoint 3 at 0xffffffff810013b0: file arch/x86/kernel/process_64.c, line 249.

Tracing a dynamically linked program

TODO

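Summary:

  1. An executable is loaded through a system call.

    To run an executable, the process enters kernel mode through the execve system call. The kernel then loads the executable file and overwrites the current process's program image with it. When execve returns, execution resumes in the new program rather than in the caller.

  2. The new program keeps the same PID and inherits every file descriptor that was open when execve was called, except those marked close-on-exec.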
