Linux system call from kernel crashing (weird offset)

Posted 2023-02-16

Asked: 2012-03-18 13:57:03

Question:

I'm trying to invoke a system call from a kernel module. I have the following code:

    set_fs( get_ds() );    // lets our module do the system-calls

    // Save everything before systemcalling
    asm ("push %rax");
    asm ("push %rdi");
    asm ("push %rcx");
    asm ("push %rsi");
    asm ("push %rdx");
    asm ("push %r10");
    asm ("push %r8");
    asm ("push %r9");
    asm ("push %r11");
    asm ("push %r12");
    asm ("push %r15");
    asm ("push %rbp");
    asm ("push %rbx");

    // Invoke the long sys_mknod(const char __user *filename, int mode, unsigned dev);
    asm volatile ("movq $133, %rax");        // system call number
    asm volatile ("lea path(%rip), %rdi");   // path is char path[] = ".."
    asm volatile ("movq mode, %rsi");        // mode is S_IFCHR | ...
    asm volatile ("movq dev, %rdx");         // dev is 70 >> 8
    asm volatile ("syscall");

    // POP EVERYTHING
    asm ("pop %rbx");
    asm ("pop %rbp");
    asm ("pop %r15");
    asm ("pop %r12");
    asm ("pop %r11");
    asm ("pop %r9");
    asm ("pop %r8");
    asm ("pop %r10");
    asm ("pop %rdx");
    asm ("pop %rsi");
    asm ("pop %rcx");
    asm ("pop %rdi");
    asm ("pop %rax");

    set_fs( savedFS );    // restore the former address-limit value

This code doesn't work and crashes the system (it is a kernel module).

The dump of that piece of code, with relocation info, is:

  2c:    50                      push  %rax 
  2d:    57                      push  %rdi 
  2e:    51                      push  %rcx 
  2f:    56                      push  %rsi 
  30:    52                      push  %rdx 
  31:    41 52                    push  %r10 
  33:    41 50                    push  %r8 
  35:    41 51                    push  %r9 
  37:    41 53                    push  %r11 
  39:    41 54                    push  %r12 
  3b:    41 57                    push  %r15 
  3d:    55                      push  %rbp 
  3e:    53                      push  %rbx 
  3f:    48 c7 c0 85 00 00 00     mov    $0x85,%rax 
  46:    48 8d 3d 00 00 00 00     lea    0x0(%rip),%rdi        # 4d <init_module+0x4d> 
            49: R_X86_64_PC32    path-0x4 
  4d:    48 83 c7 04              add    $0x4,%rdi 
  51:    48 8b 34 25 00 00 00     mov    0x0,%rsi 
  58:    00 
            55: R_X86_64_32S    mode 
  59:    48 8b 14 25 00 00 00     mov    0x0,%rdx 
  60:    00 
            5d: R_X86_64_32S    dev 
  61:    0f 05                    syscall 
  63:    5b                      pop    %rbx 
  64:    5d                      pop    %rbp 
  65:    41 5f                    pop    %r15 
  67:    41 5c                    pop    %r12 
  69:    41 5b                    pop    %r11 
  6b:    41 59                    pop    %r9 
  6d:    41 58                    pop    %r8 
  6f:    41 5a                    pop    %r10 
  71:    5a                      pop    %rdx 
  72:    5e                      pop    %rsi 
  73:    59                      pop    %rcx 
  74:    5f                      pop    %rdi 
  75:    58                      pop    %rax

I wonder: why is there a -0x4 offset in 49: R_X86_64_PC32 path-0x4?

I mean: mode and dev should be resolved automatically with no problem, but what about path? Why the -0x4 offset?

I tried to "compensate" for it with

    lea 0x0(%rip),%rdi    // this somehow picks up the -0x4 offset
    add $0x4, %rdi
    ....

but the code still crashes.

Where am I going wrong?
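Note on the -0x4 addend: an R_X86_64_PC32 relocation stores S + A - P, where S is the symbol address, A the addend, and P the address of the 4-byte field being patched. RIP-relative addressing, however, is measured from the next instruction, which in the dump above (field at 0x49, next instruction at 0x4d) sits 4 bytes after the field. The -4 addend cancels exactly that gap. The following is a minimal sketch of the arithmetic in plain user-space C; the helper names and the load addresses used with it are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Value stored for an R_X86_64_PC32 relocation: S + A - P
 * (S = symbol address, A = addend, P = address of the patched field). */
static int32_t pc32_value(uint64_t sym, int64_t addend, uint64_t field)
{
    return (int32_t)(sym + (uint64_t)addend - field);
}

/* Address the CPU forms for disp(%rip): RIP already points at the
 * next instruction when the displacement is added. */
static uint64_t rip_relative_target(uint64_t next_insn, int32_t disp)
{
    return next_insn + (uint64_t)(int64_t)disp;
}

/* Resolve the lea at offset 0x46 of the dump, given a hypothetical
 * load address for the function and a hypothetical address of path. */
static uint64_t resolve_lea(uint64_t func_base, uint64_t path_addr)
{
    uint64_t field = func_base + 0x49;      /* relocated 4-byte field */
    uint64_t next_insn = func_base + 0x4d;  /* instruction after the lea */

    assert(next_insn - field == 4);         /* the gap the addend cancels */

    int32_t disp = pc32_value(path_addr, -4, field);
    return rip_relative_target(next_insn, disp);
}
```

With a hypothetical function base of 0x1000 and path at 0x2000, resolve_lea returns 0x2000: the lea already yields the address of path, so an extra add $0x4, %rdi would move the pointer 4 bytes past the start of the string.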

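Comments:

You can't invoke system calls from inside the kernel. The kernel provides system calls to applications. What are you really trying to do? Can't you avoid working in kernel land?

That's why I put in set_fs(get_ds()); it should raise the segment limit so that system calls can be invoked. This is an exam exercise (cs.usfca.edu/~cruse/cs635), so I need to figure out how to do the exercise on amd64.

From the syscall calling convention comment: rcx return address for syscall/sysret, C arg3.

Answer 1: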

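My guess is that this is a stack problem. Unlike int $0x80, the syscall instruction does not set up a stack for the kernel. If you look at the actual code at system_call:, you will see something like SWAPGS_UNSAFE_STACK. The core of this macro is the SwapGS instruction (see page 152 of the document linked in the original answer). On entry to kernel mode, the kernel needs a way to get a pointer to its own data structures, and this instruction provides it: it swaps the user's %gs base with a value saved in a model-specific register, and from there the kernel can find the kernel-mode stack.

As you can imagine, when the syscall entry point is reached while you are already in kernel mode, this swap produces the wrong value, and the kernel starts trying to use a bogus stack. You could try issuing SwapGS manually beforehand, so that the kernel's own SwapGS gives the expected result, and see whether that works.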
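The swap described above can be illustrated with a toy model in user-space C. This is only a sketch of the exchange itself, not kernel code: the struct cpu_model, the function gs_after_entry, and the sample base values are hypothetical, and the model says nothing about whether the manual SwapGS workaround is enough to make the call succeed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of the two values that SWAPGS exchanges. */
struct cpu_model {
    uint64_t gs_base;         /* base currently used for %gs accesses */
    uint64_t kernel_gs_base;  /* value parked in the model-specific register */
};

static void swapgs(struct cpu_model *cpu)
{
    assert(cpu != NULL);
    uint64_t tmp = cpu->gs_base;
    cpu->gs_base = cpu->kernel_gs_base;
    cpu->kernel_gs_base = tmp;
}

/* GS base seen by the syscall entry path after its own swapgs.
 * from_kernel:  non-zero if the caller is already in kernel mode,
 *               i.e. the bases were swapped on an earlier entry.
 * manual_swaps: extra swapgs instructions issued by the caller first. */
static uint64_t gs_after_entry(uint64_t user_gs, uint64_t kernel_gs,
                               int from_kernel, int manual_swaps)
{
    struct cpu_model cpu;

    if (from_kernel) {
        cpu.gs_base = kernel_gs;
        cpu.kernel_gs_base = user_gs;
    } else {
        cpu.gs_base = user_gs;
        cpu.kernel_gs_base = kernel_gs;
    }

    for (int i = 0; i < manual_swaps; i++)
        swapgs(&cpu);

    swapgs(&cpu);  /* the swap performed by the entry path itself */
    return cpu.gs_base;
}
```

In this model, an entry from user mode ends with the kernel base active, an entry from kernel mode ends with the user base active (the bogus case), and one manual swap before the entry restores the expected result.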

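Comments:

Segmentation fault, damn. Anyway, I think you're right: a lot is going on, and I should read through all of the system call routines to find out what's wrong. That will take a lot of time.

Sorry, the link I gave you above was wrong. It should be fixed now. I would also look at the pseudocode for the syscall instruction on page 345 of the AMD spec. At the very least, it will help you understand what Linux does in the system_call routine. By the way, what exactly happens when you say it crashes? Does the machine hang? Does it print a kernel oops? If it doesn't hang the machine, you could run dmesg and look for anything odd.

Answer 2: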

It seems you can't do it this way. See the comment before system_call:

 /*
  * Register setup:
  * rax  system call number
  * rdi  arg0
  * rcx  return address for syscall/sysret, C arg3
  * rsi  arg1
  * rdx  arg2
  * r10  arg3    (--> moved to rcx for C)
  * r8   arg4
  * r9   arg5
  * r11  eflags for syscall/sysret, temporary for C
  * r12-r15,rbp,rbx saved by C code, not touched.
  *
  * Interrupts are off on entry.
  * Only called from user space.
  *
  * XXX  if we had a free scratch register we could save the RSP into the stack frame
  *      and report it properly in ps. Unfortunately we haven't.
  *
  * When user can change the frames always force IRET. That is because
  * it deals with uncanonical addresses better. SYSRET has trouble
  * with them due to bugs in both AMD and Intel CPUs.
  */

So you can't invoke syscall from the kernel. But you can try using int $0x80 for this purpose. As far as I can see, the kernel_execve stub uses this trick.
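For contrast, here is how the register setup from that comment looks when the syscall instruction is used where it is meant to be used, in user space. This is a minimal sketch assuming GCC or Clang on x86-64 Linux; raw_syscall3 and check_raw_getpid are hypothetical helper names, and getpid stands in for mknod because creating device nodes needs privileges. On other platforms the sketch falls back to the libc syscall() wrapper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Issue a system call with up to three arguments from user space.
 * rax = system call number, rdi/rsi/rdx = arg0..arg2; the instruction
 * itself overwrites rcx (return address) and r11 (saved flags). */
static long raw_syscall3(long nr, long a0, long a1, long a2)
{
#if defined(__linux__) && defined(__x86_64__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a0), "S"(a1), "d"(a2)
                      : "rcx", "r11", "memory");
    return ret;
#else
    /* Fallback for other platforms: let libc pick the mechanism. */
    return syscall(nr, a0, a1, a2);
#endif
}

/* Sanity helper: the raw path and libc must agree on the process ID. */
static void check_raw_getpid(void)
{
    assert(raw_syscall3(SYS_getpid, 0, 0, 0) == (long)getpid());
}
```

The clobber list mirrors the comment above: rcx and r11 are not preserved across syscall, which is also why the fourth argument travels in r10 rather than rcx.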

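Comments:

Can I invoke int 0x80 from a 64-bit program? Will it work? And I need to put the 32-bit system call index in rax, don't I?

It turns out I can't use int 0x80 in a 64-bit kernel module. I wonder whether there is some kind of VA-whence-protection mechanism.

Why do you need syscall? Can you try calling sys_mknod directly?

No, sys_mknod is not "documented" and should not be used as long as you aren't in user mode.

Do you think invoking a system call from the kernel is a good and documented way?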
