
Understanding System Calls in Depth

2020-05-17  Minority

Lab environment: based on "Debugging and Tracing the Linux Kernel Boot Process"

Lab requirements:

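  • Pick the system call whose number matches the last two digits of your student ID
  • Trigger that system call with assembly instructions
  • Use gdb to trace how the kernel handles the system call
  • Focus on the entry code that saves the context, the code that restores it, and the return from the system call, paying particular attention to how the kernel stack changes during the call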

1. Choosing the system call

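My student ID ends in 31, but looking it up in syscall_32.tbl shows that syscall 31 is stty. Searching further in the system call table file, I found that both this call and syscall 32, gtty, map to sys_ni_syscall. Some more reading showed that both system calls are obsolete, which is why their service routine is set to sys_ni_syscall.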

System call table
stty is mapped to sys_ni_syscall

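Background:
Even though syscalls 31 and 32 are obsolete, their slots cannot be handed to other system calls, because some old code may still use them. If they were reassigned, a user application that invoked one of these obsolete calls would get a completely unexpected result (opening a file, say), which would be very confusing. The "ni" in sys_ni_syscall stands for "not implemented": the routine simply returns -ENOSYS.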
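To make the idea concrete, here is a toy user-space model of a system call table. It is not kernel code: the handler type, fake_utime and dispatch() are made-up names for illustration. Retired slots keep a placeholder that, like sys_ni_syscall, returns -ENOSYS, so every other entry keeps its number.

```c
#include <errno.h>   /* ENOSYS */
#include <stddef.h>  /* NULL */

/* Hypothetical handler type: every slot receives the same raw arguments. */
typedef long (*syscall_fn)(long a1, long a2);

/* Stand-in for sys_ni_syscall: a retired slot always fails with -ENOSYS. */
static long ni_syscall(long a1, long a2)
{
    (void)a1;
    (void)a2;
    return -ENOSYS;
}

/* Hypothetical stand-in for a live call such as utime; just reports success. */
static long fake_utime(long a1, long a2)
{
    (void)a1;
    (void)a2;
    return 0;
}

/* Slots 31 (stty) and 32 (gtty) are kept, so no other number shifts. */
static const syscall_fn table[] = {
    [30] = fake_utime,
    [31] = ni_syscall,
    [32] = ni_syscall,
};

#define NR_SYSCALLS (sizeof(table) / sizeof(table[0]))

/* Toy dispatcher: out-of-range or empty slots also report -ENOSYS. */
long dispatch(unsigned long nr, long a1, long a2)
{
    if (nr >= NR_SYSCALLS || table[nr] == NULL)
        return -ENOSYS;
    return table[nr](a1, a2);
}
```

Because the slots are never reused, an old binary that still issues syscall 31 gets a clean ENOSYS error instead of silently running some unrelated call.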


I therefore analyze the system call just before 31 instead, namely syscall 30, utime:

# The format is:
# <number> <abi> <name> <entry point> <compat entry point>
30  i386    utime           sys_utime32         __ia32_sys_utime32

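utime changes a file's access and modification times. Its 32-bit entry point is sys_utime32; searching for sys_utime32 leads to its implementation in fs/utimes.c, which works by calling do_utimes. do_utimes is implemented as follows: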

/*
 * do_utimes - change times on filename or file descriptor
 * @dfd: open file descriptor, -1 or AT_FDCWD
 * @filename: path name or NULL
 * @times: new times or NULL
 * @flags: zero or more flags (only AT_SYMLINK_NOFOLLOW for the moment)
 *
 * If filename is NULL and dfd refers to an open file, then operate on
 * the file.  Otherwise look up filename, possibly using dfd as a
 * starting point.
 *
 * If times==NULL, set access and modification to current time,
 * must be owner or have write permission.
 * Else, update from *times, must be owner or super user.
 */
long do_utimes(int dfd, const char __user *filename, struct timespec64 *times,
           int flags)
{
    int error = -EINVAL;

    if (times && (!nsec_valid(times[0].tv_nsec) ||
              !nsec_valid(times[1].tv_nsec))) {
        goto out;
    }

    if (flags & ~AT_SYMLINK_NOFOLLOW)
        goto out;

    if (filename == NULL && dfd != AT_FDCWD) {
        struct fd f;

        if (flags & AT_SYMLINK_NOFOLLOW)
            goto out;

        f = fdget(dfd);
        error = -EBADF;
        if (!f.file)
            goto out;

        error = utimes_common(&f.file->f_path, times);
        fdput(f);
    } else {
        struct path path;
        int lookup_flags = 0;

        if (!(flags & AT_SYMLINK_NOFOLLOW))
            lookup_flags |= LOOKUP_FOLLOW;
retry:
        error = user_path_at(dfd, filename, lookup_flags, &path);
        if (error)
            goto out;

        error = utimes_common(&path, times);
        path_put(&path);
        if (retry_estale(error, lookup_flags)) {
            lookup_flags |= LOOKUP_REVAL;
            goto retry;
        }
    }

out:
    return error;
}

2. Triggering the system call (directly and via assembly)

The following code triggers the utime system call directly:

#include <sys/stat.h>
#include <utime.h>
#include <stdio.h>
#include <string.h>   /* strcmp */

int main(int argc, char *argv[])
{
    char *pathname;
    struct stat sb;
    struct utimbuf utb;

    if (argc != 2 || strcmp(argv[1], "--help") == 0){
        printf("%s file\n", argv[0]);
        return 1;
    }

    pathname = argv[1];

    // Get the file's current timestamps
    if (stat(pathname, &sb) == -1)
        return 1;

    // Set the last-modification time to the last-access time
    utb.actime = sb.st_atime;
    utb.modtime = sb.st_atime;        /* Make modify time same as access time */
    // Call utime
    if (utime(pathname, &utb) == -1)  /* Update file times */
        return 1;

    return 0;
}
Result of running the program

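Next, I modify the program above to invoke utime from assembly. This essentially means passing utime's arguments in registers with assembly instructions and trapping into the kernel with software interrupt 0x80. The kernel's system call entry code (called system_call in older kernels) then dispatches to the corresponding service routine, here sys_utime32. Because this code runs on behalf of the user process, it executes in process context rather than interrupt context: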

#include <sys/stat.h>
#include <utime.h>
#include <stdio.h>
#include <string.h>

/* Build as a 32-bit program, e.g. gcc a.c -static -m32 */
int main(int argc, char *argv[])
{
    char *pathname;
    struct stat sb;
    struct utimbuf utb;

    if (argc != 2 || strcmp(argv[1], "--help") == 0){
        printf("%s file\n", argv[0]);
        return 1;
    }

    pathname = argv[1];

    // Get the file's current timestamps
    if (stat(pathname, &sb) == -1)
        return 1;

    // Set the last-modification time to the last-access time
    utb.actime = sb.st_atime;
    utb.modtime = sb.st_atime;        /* Make modify time same as access time */
    int flag;

    asm volatile(
    "movl %1, %%ebx\n\t"  // ebx = 1st argument: pathname
    "movl %2, %%ecx\n\t"  // ecx = 2nd argument: pointer to utb
    "movl $30, %%eax\n\t" // eax = system call number (30 = utime on i386)
    "int $0x80\n\t"       // trap into the kernel via software interrupt 0x80
    "movl %%eax, %0\n\t"  // the kernel returns its result in eax; store it in flag
    :"=m"(flag)
    :"r"(pathname),"r"(&utb)
    :"eax","ebx","ecx","memory"  // registers and memory changed by the asm
    );

    if (flag < 0)  /* the raw syscall returns -errno on failure, not just -1 */
        return 1;

    return 0;
}
Changing the last-modification time via assembly
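The value that int $0x80 leaves in eax is the raw kernel result: 0 on success, or a negative error code -errno (for example -ENOENT) on failure, so a check on the raw value should treat any result from -4095 to -1 as a failure, not just -1. The C library wrappers translate this into the familiar "return -1 and set errno" convention. A minimal sketch of that translation, assuming the usual Linux rule for the error range (raw_to_libc is a hypothetical name):

```c
#include <errno.h>

/*
 * Hypothetical helper: convert a raw kernel return value (what int $0x80
 * leaves in eax) into the libc convention of "return -1 and set errno".
 * Assumes the Linux rule that raw values -4095..-1 encode -errno.
 */
long raw_to_libc(long raw)
{
    /* Unsigned compare, like cmp $-4095 / jae: matches -4095..-1 only */
    if ((unsigned long)raw >= (unsigned long)-4095L) {
        errno = (int)-raw;
        return -1;
    }
    return raw;
}
```

The same range check appears in the 64-bit disassembly later on, as cmp $0xfffffffffffff001,%rax followed by jae __syscall_error.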

3. Tracing the kernel's handling of the system call with gdb

3.1 Setting up the gdb environment

For setting up the environment for this part, see https://www.jianshu.com/p/17b8261fe74e

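First start qemu with qemu-system-x86_64 -kernel ../arch/x86/boot/bzImage -initrd rootfs.cpio.gz (mind the paths). Then copy the compiled executable that triggers utime through assembly into rootfs/home/, and create a file named b.test in the same directory. Repack the root filesystem image with the following command (run it inside rootfs), then restart qemu.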

find . -print0 | cpio --null -ov --format=newc | gzip -9 > ../rootfs.cpio.gz
# Restart qemu
qemu-system-x86_64 -kernel ../arch/x86/boot/bzImage -initrd rootfs.cpio.gz
Starting qemu (adjust the paths to match your machine)
copy assembly
qemu after the restart

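Close qemu, then in a terminal run qemu-system-x86_64 -kernel ./arch/x86/boot/bzImage -initrd ./busybox-1.31.1/rootfs.cpio.gz -S -s -nographic -append "console=ttyS0" to run qemu in the terminal for debugging (-S keeps the CPU halted at startup and -s starts a gdb server on TCP port 1234; quit with killall qemu-system-x86_64). Open a second terminal and run the commands below to load vmlinux and connect to the gdb server. As a test, set a breakpoint at start_kernel; qemu stops once it reaches "Booting the kernel":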

gdb
file vmlinux
target remote:1234
b start_kernel
c
....

Possible errors and fixes:

  • ERROR: running file vmlinux may report an error because gdb refuses to auto-load the kernel's vmlinux-gdb.py helper script.
  • Fix:
vi ~/.gdbinit
================ add the following lines ==============
add-auto-load-safe-path /home/dfx/linux-5.4.34/scripts/gdb/vmlinux-gdb.py
set auto-load safe-path /
python import sys; sys.path.append("/home/dfx/linux-5.4.34/scripts/gdb")

3.2 Analyzing the system call

Compile a.c into a 32-bit executable with gcc a.c -static -m32, then disassemble it with objdump -S a.out > a32.s to see how utime is called.

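main calls utime
utime call 0x80ea9f0
As shown, utime does not execute a system call instruction inline; instead it calls through 0x80ea9f0. Running x 0x80ea9f0 in gdb shows the value stored at that address. In a static 32-bit glibc build this is most likely _dl_sysinfo, a pointer to the kernel entry stub (such as the vDSO's __kernel_vsyscall), which in turn uses sysenter or int $0x80.
Value at address 0x80ea9f0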

Since that path is harder to follow, I switched to analyzing the 64-bit utime. Using the same method, the 64-bit disassembly (excerpt) is:

000000000043f250 <utime>:
  43f250:       b8 84 00 00 00          mov    $0x84,%eax
  43f255:       0f 05                   syscall
  43f257:       48 3d 01 f0 ff ff       cmp    $0xfffffffffffff001,%rax
  43f25d:       0f 83 4d 52 00 00       jae    4444b0 <__syscall_error>
  43f263:       c3                      retq
  43f264:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  43f26b:       00 00 00 
  43f26e:       66 90                   xchg   %ax,%ax

The code shows that utime's system call number is 0x84 (132), and the 64-bit system call table maps it to __x64_sys_utime.

64-bit system call table

3.3 Tracing with gdb

__x64_sys_utime打断点,然后在qemu运行64位的程序(注意要重新打包rootfs),可以看到成功跟踪到了utime.c文件的相关代码


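The trace shows that the handler goes straight into do_utimes. fs/utimes.c also contains the following comment:
futimesat(), utimes() and utime() are older versions of utimensat() that are provided for compatibility with traditional C libraries.
On modern architectures, we always use libc wrappers around utimensat() instead.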

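In other words, utime exists for compatibility with the C library, and the modern interface is utimensat, system call 320 in the 32-bit table (280 on x86-64). Whether utime or utimensat is used, the work is ultimately done by do_utimes().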


utimensat's entry in the system call table
Body of utimensat

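The tracing sessions are shown below. The first walks through the overall call flow while watching the stack change; the second steps into some of the functions to look at the details: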

(gdb) b __x64_sys_utime
Note: breakpoints 1, 2, 3, 4, 5 and 6 also set at pc 0xffffffff81206f07.
Breakpoint 8 at 0xffffffff81206f07: file fs/utimes.c, line 204.
(gdb) c
Continuing.

(gdb) bt
#0  __x64_sys_utime (regs=0xffffc900001b7f58) at fs/utimes.c:204
#1  0xffffffff81002603 in do_syscall_64 (nr=<optimized out>, regs=0xffffc900001b7f58) at arch/x86/entry/common.c:290
#2  0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:175
#3  0x0000000000000000 in ?? ()
(gdb) n

Breakpoint 7, do_utimes (dfd=-100, filename=0x4a1024 "./b.test", times=0xffffc900001b7ee0, flags=0) at fs/utimes.c:90
90  {
(gdb) n
93      if (times && (!nsec_valid(times[0].tv_nsec) ||
(gdb) n
94                !nsec_valid(times[1].tv_nsec))) {
(gdb) n
93      if (times && (!nsec_valid(times[0].tv_nsec) ||
(gdb) n
98      if (flags & ~AT_SYMLINK_NOFOLLOW)
(gdb) n
101     if (filename == NULL && dfd != AT_FDCWD) {
(gdb) n
119             lookup_flags |= LOOKUP_FOLLOW;
(gdb) n
121         error = user_path_at(dfd, filename, lookup_flags, &path);
(gdb) n
122         if (error)
(gdb) n
125         error = utimes_common(&path, times);
(gdb) n
126         path_put(&path);
(gdb) n
127         if (retry_estale(error, lookup_flags)) {
(gdb) n
135 }
(gdb) bt
#0  do_utimes (dfd=-100, filename=0x4a1024 "./b.test", times=0xffffc900001b7ee0, flags=0) at fs/utimes.c:119
#1  0xffffffff81206f64 in __do_sys_utime (times=<optimized out>, filename=<optimized out>) at fs/utimes.c:215
#2  __se_sys_utime (times=<optimized out>, filename=<optimized out>) at fs/utimes.c:204
#3  __x64_sys_utime (regs=<optimized out>) at fs/utimes.c:204
#4  0xffffffff81002603 in do_syscall_64 (nr=<optimized out>, regs=0x4a1024) at arch/x86/entry/common.c:290
#5  0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:175
#6  0x0000000000000000 in ?? ()
(gdb) n
__x64_sys_utime (regs=0xffff8880070a4140) at fs/utimes.c:204
204 SYSCALL_DEFINE2(utime, char __user *, filename, struct utimbuf __user *, times)
(gdb) bt
#0  __x64_sys_utime (regs=0xffff8880070a4140) at fs/utimes.c:204
#1  0xffffffff81002603 in do_syscall_64 (nr=<optimized out>, regs=0x0 <fixed_percpu_data>) at arch/x86/entry/common.c:290
#2  0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:175
#3  0x0000000000000000 in ?? ()
(gdb) n
do_syscall_64 (nr=18446612682188144960, regs=0xffffc900001b7f58) at arch/x86/entry/common.c:300
300     syscall_return_slowpath(regs);
(gdb) n
301 }
(gdb) bt
#0  do_syscall_64 (nr=<optimized out>, regs=<optimized out>) at arch/x86/entry/common.c:301
#1  0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:175
#2  0x0000000000000000 in ?? ()
(gdb) n
entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:184
184     movq    RCX(%rsp), %rcx
(gdb) bt
#0  entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:184
#1  0x0000000000000000 in ?? ()
(gdb) n
185     movq    RIP(%rsp), %r11
(gdb) n
187     cmpq    %rcx, %r11  /* SYSRET requires RCX == RIP */
(gdb) n
188     jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
205     shl $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
(gdb) n
206     sar $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
(gdb) n
210     cmpq    %rcx, %r11
(gdb) n
211     jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
213     cmpq    $__USER_CS, CS(%rsp)        /* CS must match SYSRET */
(gdb) n
214     jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
216     movq    R11(%rsp), %r11
(gdb) n
217     cmpq    %r11, EFLAGS(%rsp)      /* R11 == RFLAGS */
(gdb) n
218     jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
238     testq   $(X86_EFLAGS_RF|X86_EFLAGS_TF), %r11
(gdb) n
239     jnz swapgs_restore_regs_and_return_to_usermode
(gdb) n
243     cmpq    $__USER_DS, SS(%rsp)        /* SS must match SYSRET */
(gdb) n
244     jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:253
253     POP_REGS pop_rdi=0 skip_r11rcx=1
(gdb) bt
#0  syscall_return_via_sysret () at arch/x86/entry/entry_64.S:253
#1  0x0000000000000000 in ?? ()
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:259
259     movq    %rsp, %rdi
(gdb) n
260     movq    PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rsp
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:262
262     pushq   RSP-RDI(%rdi)   /* RSP */
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:263
263     pushq   (%rdi)      /* RDI */
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:271
271     SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
(gdb) n
273     popq    %rdi
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:274
274     popq    %rsp
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:275
275     USERGS_SYSRET64
(gdb) n
0x000000000043f257 in ?? ()
(gdb) n
Cannot find bounds of current function
(gdb) 


Breakpoint 1, __x64_sys_utime (regs=0xffffc900001b7f58) at fs/utimes.c:204
204 SYSCALL_DEFINE2(utime, char __user *, filename, struct utimbuf __user *, times)
(gdb) n

Breakpoint 7, do_utimes (dfd=-100, filename=0x4a1024 "./b.test", times=0xffffc900001b7ee0, flags=0) at fs/utimes.c:90
90  {
(gdb) n
93      if (times && (!nsec_valid(times[0].tv_nsec) ||
(gdb) n
94                !nsec_valid(times[1].tv_nsec))) {
(gdb) n
93      if (times && (!nsec_valid(times[0].tv_nsec) ||
(gdb) n
98      if (flags & ~AT_SYMLINK_NOFOLLOW)
(gdb) n
101     if (filename == NULL && dfd != AT_FDCWD) {
(gdb) n
119             lookup_flags |= LOOKUP_FOLLOW;
(gdb) n
121         error = user_path_at(dfd, filename, lookup_flags, &path);
(gdb) n
122         if (error)
(gdb) n
125         error = utimes_common(&path, times);
(gdb) n
126         path_put(&path);
(gdb) n
127         if (retry_estale(error, lookup_flags)) {
(gdb) s
retry_estale (flags=<optimized out>, error=<optimized out>) at ./include/linux/namei.h:91
91      return error == -ESTALE && !(flags & LOOKUP_REVAL);
(gdb) n
do_utimes (dfd=118112576, filename=0x64 <error: Cannot access memory at address 0x64>, times=0xffffc900001b7ee0, flags=0) at fs/utimes.c:135
135 }
(gdb) n
__x64_sys_utime (regs=0xffff8880070a4140) at fs/utimes.c:204
204 SYSCALL_DEFINE2(utime, char __user *, filename, struct utimbuf __user *, times)
(gdb) n
do_syscall_64 (nr=18446612682188144960, regs=0xffffc900001b7f58) at arch/x86/entry/common.c:300
300     syscall_return_slowpath(regs);
(gdb) s
syscall_return_slowpath (regs=<optimized out>) at arch/x86/entry/common.c:300
300     syscall_return_slowpath(regs);
(gdb) s
get_current () at ./arch/x86/include/asm/current.h:15
15      return this_cpu_read_stable(current_task);
(gdb) s
syscall_return_slowpath (regs=<optimized out>) at arch/x86/entry/common.c:256
256     u32 cached_flags = READ_ONCE(ti->flags);
(gdb) n
270     if (unlikely(cached_flags & SYSCALL_EXIT_WORK_FLAGS))
(gdb) n
273     local_irq_disable();
(gdb) s
arch_local_irq_disable () at arch/x86/entry/common.c:273
273     local_irq_disable();
(gdb) s
native_irq_disable () at ./arch/x86/include/asm/irqflags.h:49
49      asm volatile("cli": : :"memory");
(gdb) s
syscall_return_slowpath (regs=<optimized out>) at arch/x86/entry/common.c:274
274     prepare_exit_to_usermode(regs);
(gdb) n
do_syscall_64 (nr=<optimized out>, regs=<optimized out>) at arch/x86/entry/common.c:300
300     syscall_return_slowpath(regs);
(gdb) n
301 }
(gdb) n
entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:184
184     movq    RCX(%rsp), %rcx
(gdb) n
185     movq    RIP(%rsp), %r11
(gdb) n
187     cmpq    %rcx, %r11  /* SYSRET requires RCX == RIP */
(gdb) n
188     jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
205     shl $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
(gdb) n
206     sar $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
(gdb) n
210     cmpq    %rcx, %r11
(gdb) n
211     jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
213     cmpq    $__USER_CS, CS(%rsp)        /* CS must match SYSRET */
(gdb) n
214     jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
216     movq    R11(%rsp), %r11
(gdb) n
217     cmpq    %r11, EFLAGS(%rsp)      /* R11 == RFLAGS */
(gdb) n
218     jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
238     testq   $(X86_EFLAGS_RF|X86_EFLAGS_TF), %r11
(gdb) n
239     jnz swapgs_restore_regs_and_return_to_usermode
(gdb) n
243     cmpq    $__USER_DS, SS(%rsp)        /* SS must match SYSRET */
(gdb) n
244     jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:253
253     POP_REGS pop_rdi=0 skip_r11rcx=1
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:259
259     movq    %rsp, %rdi
(gdb) n
260     movq    PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rsp
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:262
262     pushq   RSP-RDI(%rdi)   /* RSP */
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:263
263     pushq   (%rdi)      /* RDI */
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:271
271     SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
(gdb) n
273     popq    %rdi
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:274
274     popq    %rsp
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:275
275     USERGS_SYSRET64
(gdb) n
0x000000000043f257 in ?? ()
(gdb) n
Cannot find bounds of current function
(gdb) 
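The RCX(%rsp), RIP(%rsp), EFLAGS(%rsp), CS(%rsp) and SS(%rsp) operands in the trace are byte offsets into struct pt_regs, the frame that entry_SYSCALL_64 builds on the kernel stack when it saves the context: it pushes __USER_DS as SS, the user RSP, RFLAGS (taken from r11), __USER_CS as CS and RIP (taken from rcx), then orig_ax and the general registers. The struct below mirrors that layout as I read arch/x86/include/asm/ptrace.h in Linux 5.4; pt_regs_mirror is an illustrative name, so check it against your own tree. With 21 eight-byte fields the frame is 168 bytes, and 0xffffc900001b7f58 (the regs value in the trace) + 168 = 0xffffc900001b8000, which is consistent with pt_regs sitting at the very top of the task's kernel stack; the times pointer 0xffffc900001b7ee0 lies just below it in the same stack.

```c
#include <stddef.h>  /* offsetof */

/*
 * Illustrative mirror of x86-64 struct pt_regs (Linux 5.4, as I read
 * arch/x86/include/asm/ptrace.h). Lowest address first, so r15 is the
 * last value the entry code pushes and ss the first.
 */
struct pt_regs_mirror {
    unsigned long r15, r14, r13, r12, bp, bx;  /* callee-saved registers */
    unsigned long r11, r10, r9, r8;            /* r11 holds user RFLAGS after syscall */
    unsigned long ax, cx, dx, si, di;          /* ax carries the return value */
    unsigned long orig_ax;                     /* system call number */
    unsigned long ip, cs, flags, sp, ss;       /* user RIP, CS, RFLAGS, RSP, SS */
};

/* 21 eight-byte slots and no padding: 168 bytes on x86-64. */
_Static_assert(sizeof(struct pt_regs_mirror) == 21 * sizeof(unsigned long),
               "unexpected padding");
/* RIP(%rsp) in entry_64.S corresponds to this offset (16 * 8 = 128). */
_Static_assert(offsetof(struct pt_regs_mirror, ip) == 16 * sizeof(unsigned long),
               "ip must follow orig_ax");
```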

4. Analysis and summary

Roughly, the utime system call proceeds as follows (corrections welcome):

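  1. The libc utime wrapper puts the system call number 0x84 (132) in %rax and the arguments in %rdi and %rsi, then executes the syscall instruction.
  2. The CPU jumps to entry_SYSCALL_64, which switches to the kernel stack and saves the user registers there as a struct pt_regs (saving the context). It then calls do_syscall_64, which takes the system call number from %rax, looks it up in sys_call_table and calls __x64_sys_utime. The handler pulls its arguments out of the saved registers and calls do_utimes to do the real work. Here dfd is AT_FDCWD (-100) and filename is "./b.test", so do_utimes looks up the path instead of operating on an open file descriptor; had times been NULL, both times would have been set to the current time.
  3. Before returning, syscall_return_slowpath disables interrupts and calls prepare_exit_to_usermode. Back in entry_SYSCALL_64, a series of cmpq/jne checks decides whether the fast SYSRET path is safe; if any check fails, execution jumps to swapgs_restore_regs_and_return_to_usermode and returns via IRET instead (e.g. jne swapgs_restore_regs_and_return_to_usermode). In this trace every check passed, so syscall_return_via_sysret restored the registers (restoring the context), switched back to the user page tables and stack, and USERGS_SYSRET64 returned to user space at 0x43f257, the instruction right after syscall.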
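The checks in step 3 can be restated in C. This is a simplified user-space model, not kernel code: struct saved_user_state, is_canonical and can_use_sysret are hypothetical names, and the constants assume x86-64 Linux with 4-level paging (__USER_CS = 0x33, __USER_DS = 0x2b, __VIRTUAL_MASK_SHIFT = 47). It follows the order of the cmpq/jne sequence in the trace; returning false corresponds to jumping to swapgs_restore_regs_and_return_to_usermode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for x86-64 Linux with 4-level paging. */
#define USER_CS            0x33u     /* __USER_CS */
#define USER_DS            0x2bu     /* __USER_DS */
#define VIRTUAL_MASK_SHIFT 47        /* __VIRTUAL_MASK_SHIFT */
#define EFLAGS_TF          0x100u    /* X86_EFLAGS_TF, trap flag */
#define EFLAGS_RF          0x10000u  /* X86_EFLAGS_RF, resume flag */

/* Hypothetical subset of the saved user state that the checks read. */
struct saved_user_state {
    uint64_t rip, rcx, r11, rflags, cs, ss;
};

/* Same trick as the shl/sar pair: sign-extend from bit 47 and compare. */
static bool is_canonical(uint64_t addr)
{
    unsigned shift = 64 - (VIRTUAL_MASK_SHIFT + 1);
    /* The signed conversion and arithmetic shift rely on GCC/Clang behaviour. */
    return (uint64_t)((int64_t)(addr << shift) >> shift) == addr;
}

/* true: the fast SYSRET path may be used; false: fall back to IRET. */
bool can_use_sysret(const struct saved_user_state *s)
{
    if (s->rcx != s->rip)                  /* SYSRET requires RCX == RIP */
        return false;
    if (!is_canonical(s->rip))             /* non-canonical RIP would fault in kernel */
        return false;
    if (s->cs != USER_CS)                  /* CS must match SYSRET */
        return false;
    if (s->r11 != s->rflags)               /* R11 == RFLAGS */
        return false;
    if (s->r11 & (EFLAGS_RF | EFLAGS_TF))  /* SYSRET can't restore RF; TF would trap */
        return false;
    if (s->ss != USER_DS)                  /* SS must match SYSRET */
        return false;
    return true;
}
```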
