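C++ `ptrace`: Process Tracing and the Low-Level Principles Behind Debuggers

Hello everyone, and welcome to today's talk, "C++ ptrace: Process Tracing and the Low-Level Principles Behind Debuggers". I'm your old friend, and today we'll skip the heavy theory and just chat about ptrace, which sounds a little mysterious but is actually a lot of fun.

Opening: What is ptrace, and why learn it?

Imagine you want to quietly watch what your program is doing, see every step it takes, and even change its behavior. What do you do? This is where ptrace makes its entrance!

ptrace is a powerful Linux system call that lets one process (the tracer) control another process (the tracee). In short, the tracer can pause the tracee, inspect its memory and registers, and even modify them. Sounds a bit like a movie hacker, doesn't it?

Why learn ptrace?

How debuggers work underneath: On Linux, debuggers such as gdb and lldb are built on ptrace. Understanding ptrace is like learning the dragon-slaying art: debugging problems stop being scary.
Security research: Malware analysis and vulnerability hunting both lean on ptrace. It lets you see a program's run-time details up close and spot potential security issues.
Program analysis and optimization: You can use ptrace to collect run-time data about a program, which helps you optimize the code and improve efficiency.
Bragging rights: Know ptrace, and you can stand a little taller when showing off in front of your colleagues!

Basic ptrace usage: tracing Hello, World!

Let's start with the simplest possible example and use ptrace to trace a "Hello, World!" program.

First, we need two programs: the tracer and the tracee.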

Tracee (hello.c):

#include <stdio.h>
#include <unistd.h>

int main() {
  printf("Hello, World!\n");
  return 0;
}

Tracer (tracer.c):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>

int main(int argc, char *argv[]) {
  pid_t pid;
  int status;
  long ptrace_ret;

  if (argc < 2) {
    fprintf(stderr, "Usage: %s <program>\n", argv[0]);
    exit(1);
  }

  pid = fork();

  if (pid == 0) {
    // Child process (tracee)
    ptrace(PTRACE_TRACEME, 0, NULL, NULL); // Important
    execv(argv[1], &argv[1]); // Execute the target program
    perror("execv"); // Should not reach here
    exit(1);
  } else if (pid > 0) {
    // Parent process (tracer)
    wait(&status); // Wait for the tracee to stop (usually at exec)

    printf("Tracee stopped, status: %d\n", status);

    // Tracee is now stopped, we can examine its state
    ptrace_ret = ptrace(PTRACE_CONT, pid, NULL, NULL);
    if(ptrace_ret == -1){
      perror("ptrace CONT");
      return 1;
    }

    wait(NULL); // Wait for the tracee to exit (pass &status instead of NULL to read its exit code)
    printf("Tracee exited\n");
  } else {
    perror("fork");
    exit(1);
  }

  return 0;
}

Compile the two programs:

gcc hello.c -o hello
gcc tracer.c -o tracer

Run the tracer:

./tracer ./hello

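Code walkthrough:

fork(): the tracer creates a child process.
ptrace(PTRACE_TRACEME, 0, NULL, NULL): in the child (the tracee), call ptrace with the PTRACE_TRACEME request. This tells the kernel that this process wants to be traced by its parent. This is the key step! In this fork-and-exec setup, the tracer gains control only because the tracee asked to be traced.
execv(argv[1], &argv[1]): the child executes the target program (hello). execv replaces the current process image with the new program.
wait(&status): in the parent (the tracer), wait blocks until the child stops. When the traced child's execv succeeds, the kernel sends it SIGTRAP, which stops it, so the tracer wakes up here.
ptrace(PTRACE_CONT, pid, NULL, NULL): the tracer calls ptrace with the PTRACE_CONT request, telling the kernel to resume the tracee.
wait(NULL): the tracer waits for the tracee to finish.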

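Output (the raw status 1407 is 0x57f: the low byte 0x7f means "stopped", and the next byte is the signal number 5, SIGTRAP):

Tracee stopped, status: 1407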
Hello, World!
Tracee exited
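
The number that wait() hands back is a packed status word rather than a plain signal number, so it is worth decoding it with the macros from <sys/wait.h>. Here is a minimal sketch; `stop_signal` and `print_status` are names made up for this illustration and are not part of any API.

```c
#include <stdio.h>
#include <sys/wait.h>

/* Returns the signal that stopped the tracee, or -1 if the status
 * does not describe a stop. (Hypothetical helper name.) */
int stop_signal(int status) {
    return WIFSTOPPED(status) ? WSTOPSIG(status) : -1;
}

/* Prints a readable description of a wait() status. */
void print_status(int status) {
    if (WIFSTOPPED(status))
        printf("stopped by signal %d\n", WSTOPSIG(status));
    else if (WIFEXITED(status))
        printf("exited with code %d\n", WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        printf("killed by signal %d\n", WTERMSIG(status));
    else
        printf("unrecognized status 0x%x\n", (unsigned)status);
}
```

For the stop reported right after exec in the tracer, `stop_signal(status)` gives 5, which is SIGTRAP on Linux.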

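The core ptrace requests: like buttons on a remote control

The first argument to ptrace is a request that tells the kernel what you want to do. The common ones are listed in the table below:

Request	Description
`PTRACE_TRACEME`	Declares that this process is to be traced by its parent. It must be called in the tracee, normally right before `execve`.
`PTRACE_PEEKTEXT`	Reads one word from the tracee's memory. Conventionally used for instructions.
`PTRACE_PEEKDATA`	Reads one word from the tracee's memory. Conventionally used for data. On Linux it is equivalent to `PTRACE_PEEKTEXT`.
`PTRACE_PEEKUSER`	Reads one word at a given offset in the tracee's USER area (`struct user`), which holds the registers and other per-process information.
`PTRACE_POKETEXT`	Writes one word into the tracee's memory. Use with care!
`PTRACE_POKEDATA`	Writes one word into the tracee's memory. On Linux it is equivalent to `PTRACE_POKETEXT`. Use with care!
`PTRACE_POKEUSER`	Writes one word at a given offset in the tracee's USER area. This can change the tracee's register values. Very powerful!
`PTRACE_CONT`	Resumes a stopped tracee. A non-zero `data` argument is delivered to the tracee as a signal.
`PTRACE_SINGLESTEP`	Resumes the tracee for a single instruction, after which it stops again.
`PTRACE_KILL`	Sends `SIGKILL` to the tracee to terminate it. This request is deprecated; sending `SIGKILL` with `kill` is preferred.
`PTRACE_ATTACH`	Starts tracing a process that is already running. This needs permission over the target: normally the same user ID (subject to restrictions such as Yama's `ptrace_scope`), or root.
`PTRACE_DETACH`	Stops tracing the tracee and lets it run freely.
`PTRACE_GETREGS`	Copies the tracee's general-purpose registers into a `struct user_regs_struct`.
`PTRACE_SETREGS`	Sets the tracee's general-purpose registers from a `struct user_regs_struct` that you provide.
`PTRACE_GETREGSET`	Reads registers in a more generic way, with support for different register sets (for example, the floating-point registers).
`PTRACE_SETREGSET`	Writes registers in a more generic way.
`PTRACE_GETSIGINFO`	Retrieves information about the signal that stopped the tracee (for example, the signal number and the sender's PID).
`PTRACE_SETSIGINFO`	Sets the information for the signal that will be delivered to the tracee.
`PTRACE_LISTEN`	Restarts a stopped tracee without letting it execute, so the tracer keeps receiving its stop notifications (used for group-stops). It only works on tracees attached with `PTRACE_SEIZE`.
`PTRACE_SEIZE`	A more modern alternative to `PTRACE_ATTACH`. It attaches without stopping the tracee (no `SIGSTOP` is sent), which makes attaching more reliable, especially for multi-threaded programs, and it is required for `PTRACE_INTERRUPT` and `PTRACE_LISTEN`.
`PTRACE_INTERRUPT`	Stops a running tracee without sending it a signal. It only works on tracees attached with `PTRACE_SEIZE`.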
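
To make a few of these buttons concrete, here is a small sketch that attaches to an already running child with PTRACE_SEIZE, stops it with PTRACE_INTERRUPT, and lets it go again with PTRACE_DETACH. The function name `seize_demo` and its negative return codes are invented for this example, and it assumes a Linux kernel and C library recent enough to provide PTRACE_SEIZE.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: seize a child, interrupt it, detach.
 * Returns 0 on success, or a negative number naming the failed step. */
int seize_demo(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        for (;;) pause();               /* child: sleep until killed */
    }

    int rc = 0;
    int status = 0;
    if (ptrace(PTRACE_SEIZE, pid, NULL, NULL) == -1)
        rc = -2;                        /* could not attach */
    else if (ptrace(PTRACE_INTERRUPT, pid, NULL, NULL) == -1)
        rc = -3;                        /* could not request a stop */
    else if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
        rc = -4;                        /* tracee did not report a stop */
    else if (ptrace(PTRACE_DETACH, pid, NULL, NULL) == -1)
        rc = -5;                        /* could not detach */

    kill(pid, SIGKILL);                 /* clean up the child either way */
    waitpid(pid, NULL, 0);
    return rc;
}
```

Note that the child never called PTRACE_TRACEME here: with SEIZE (or ATTACH) the tracer takes the initiative, which is how a debugger hooks onto a process that is already running.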

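Reading and modifying the tracee's memory: the art of peeking and poking

PTRACE_PEEKTEXT, PTRACE_PEEKDATA, PTRACE_POKETEXT and PTRACE_POKEDATA are among the most frequently used ptrace requests. They let the tracer read and modify the tracee's memory.

Example: reading a string from the tracee

Suppose we want to read the "Hello, World!" string from hello.c. We need to know the address of that string in the tracee's memory, and the simplest way is to find it with gdb first. gcc usually compiles a printf of a constant string that ends in a newline into a call to puts, so we can break on puts and look at its first argument (held in RDI on x86-64):

gdb hello
(gdb) break puts
(gdb) run
(gdb) x/s $rdi

Suppose gdb prints the address 0x400644. (An address like this implies a non-PIE executable. With a PIE build and ASLR the address changes from run to run, so build hello with gcc -no-pie for this experiment.) We can then modify tracer.c to read the string: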

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <string.h>

int main(int argc, char *argv[]) {
  pid_t pid;
  int status;
  long ptrace_ret;
  long word;
  char buffer[1024] = {0}; // Zeroed, so a failed read prints an empty string
  char *addr = (char *)0x400644; // Replace with the actual address from gdb
  int i;

  if (argc < 2) {
    fprintf(stderr, "Usage: %s <program>\n", argv[0]);
    exit(1);
  }

  pid = fork();

  if (pid == 0) {
    // Child process (tracee)
    ptrace(PTRACE_TRACEME, 0, NULL, NULL);
    execv(argv[1], &argv[1]);
    perror("execv");
    exit(1);
  } else if (pid > 0) {
    // Parent process (tracer)
    wait(&status); // Wait for the tracee to stop

    printf("Tracee stopped, status: %d\n", status);

    // Read the string from tracee's memory
    for (i = 0; i < sizeof(buffer); i += sizeof(long)) {
      errno = 0;
      word = ptrace(PTRACE_PEEKDATA, pid, addr + i, NULL);
      if (errno != 0) {
        perror("ptrace PEEKDATA");
        break;
      }
      memcpy(buffer + i, &word, sizeof(long));
      if (memchr(&word, 0, sizeof(long)) != NULL) {
        break; // Found null terminator
      }
    }

    buffer[sizeof(buffer) - 1] = '\0'; // Ensure null termination

    printf("String from tracee: %s\n", buffer);

    ptrace_ret = ptrace(PTRACE_CONT, pid, NULL, NULL);
    if(ptrace_ret == -1){
      perror("ptrace CONT");
      return 1;
    }
    wait(NULL);
    printf("Tracee exited\n");
  } else {
    perror("fork");
    exit(1);
  }

  return 0;
}

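Code walkthrough:

char *addr = (char *)0x400644: the address of the string. Be sure to replace it with the actual address you found with gdb!
ptrace(PTRACE_PEEKDATA, pid, addr + i, NULL): reads one long (4 bytes on 32-bit Linux, 8 bytes on 64-bit Linux) from address addr + i in the tracee.
memcpy(buffer + i, &word, sizeof(long)): copies the word that was read into buffer.
memchr(&word, 0, sizeof(long)) != NULL: checks whether the word contains a null character ('\0'). If it does, the whole string has been read.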

Result:

Tracee stopped, status: 1407
String from tracee: Hello, World!
Hello, World!
Tracee exited

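Notes:

By convention PTRACE_PEEKTEXT is used for code and PTRACE_PEEKDATA for data. Linux does not have separate text and data address spaces, so the two requests are equivalent and either one can read any mapped address.
ptrace reads memory one long (one machine word) at a time.
Handle errors carefully. A PEEK request returns the word itself, so -1 can be valid data: clear errno before the call and check it afterwards.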
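
Reading is only half of the story. The sketch below shows the write side with PTRACE_POKEDATA as a read-modify-write of one word. To keep it self-contained it does not exec another program: the child is a plain fork, so the global `g_msg` sits at the same address in both processes and no gdb lookup is needed. The names `g_msg` and `poke_demo` are invented for this illustration.

```c
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* After fork() this buffer has the same virtual address in parent and child. */
static volatile char g_msg[16] __attribute__((aligned(sizeof(long)))) = "hello";

/* Hypothetical demo: patch the first byte of g_msg inside the child.
 * Returns 0 if the child saw the patched byte, negative otherwise. */
int poke_demo(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1) _exit(2);
        raise(SIGSTOP);                      /* hand control to the tracer */
        _exit(g_msg[0] == 'J' ? 0 : 1);      /* report what the child sees */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status)) return -2;

    /* Read one word; -1 may be valid data, so errno decides. */
    errno = 0;
    long word = ptrace(PTRACE_PEEKDATA, pid, (void *)g_msg, NULL);
    int ok = !(word == -1 && errno != 0);

    if (ok) {
        ((unsigned char *)&word)[0] = 'J';   /* change only the first byte */
        ok = ptrace(PTRACE_POKEDATA, pid, (void *)g_msg, (void *)word) != -1;
    }
    /* data == 0: resume without delivering the SIGSTOP */
    if (ok) ok = ptrace(PTRACE_CONT, pid, NULL, NULL) != -1;

    if (!ok) {
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        return -3;
    }
    if (waitpid(pid, &status, 0) != pid) return -4;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -5;
}
```

Because POKE always writes a whole word, changing a single byte means reading the word, patching it, and writing it back. The parent's own copy of `g_msg` still says "hello" afterwards, since the two processes no longer share that page once it is written.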

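Reading and modifying registers: controlling the tracee's soul

The PTRACE_GETREGS and PTRACE_SETREGS requests let the tracer read and modify the tracee's register values. This is very powerful: registers are where the CPU keeps its working data, so changing them changes what the program does.

Example: changing the tracee's instruction pointer (RIP/EIP)

Suppose we want hello.c to skip its printf call and return straight away. We can do that by pointing the instruction pointer (RIP/EIP) at the instruction that follows the call.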

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <sys/user.h>

int main(int argc, char *argv[]) {
  pid_t pid;
  int status;
  long ptrace_ret;
  struct user_regs_struct regs;
  unsigned long printf_address;

  if (argc < 2) {
    fprintf(stderr, "Usage: %s <program>\n", argv[0]);
    exit(1);
  }

  pid = fork();

  if (pid == 0) {
    // Child process (tracee)
    ptrace(PTRACE_TRACEME, 0, NULL, NULL);
    execv(argv[1], &argv[1]);
    perror("execv");
    exit(1);
  } else if (pid > 0) {
    // Parent process (tracer)
    wait(&status); // Wait for the tracee to stop

    printf("Tracee stopped, status: %d\n", status);

    // Get the current register values
    ptrace_ret = ptrace(PTRACE_GETREGS, pid, NULL, &regs);
    if (ptrace_ret == -1) {
      perror("ptrace GETREGS");
      return 1;
    }

    // Address of the instruction right after the printf call in main.
    // Placeholder value: find the real one with gdb (disassemble main).
    // Caveat: at this exec stop RIP is still at the program's entry point,
    // not inside main, so jumping from here also skips runtime start-up.
    printf_address = 0x400600;

    // Point the instruction pointer past the printf call (x86_64 only)
    regs.rip = printf_address;

    // Set the modified register values
    ptrace_ret = ptrace(PTRACE_SETREGS, pid, NULL, &regs);
    if (ptrace_ret == -1) {
      perror("ptrace SETREGS");
      return 1;
    }

    ptrace_ret = ptrace(PTRACE_CONT, pid, NULL, NULL);
     if(ptrace_ret == -1){
      perror("ptrace CONT");
      return 1;
    }

    wait(NULL);
    printf("Tracee exited\n");
  } else {
    perror("fork");
    exit(1);
  }

  return 0;
}

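Code walkthrough:

struct user_regs_struct regs: defines a user_regs_struct to hold the register values.
ptrace(PTRACE_GETREGS, pid, NULL, &regs): fetches the tracee's general-purpose registers and stores them in regs.
regs.rip = printf_address: changes the rip (instruction pointer) field in regs.
Note: on x86-64 the instruction pointer register is rip; on 32-bit x86 it is eip.
Important: printf_address must be the real address of the instruction after the printf call, found with gdb!
ptrace(PTRACE_SETREGS, pid, NULL, &regs): writes the modified regs back into the tracee's registers.

Result:

"Hello, World!" is not printed. Be aware, though, that this jump is made at the exec stop, before the C runtime has started and before main has a stack frame, so the tracee will most likely crash (for example with SIGSEGV) rather than return cleanly. A robust version first runs the tracee up to the call site, for example with a breakpoint, and only then changes RIP.

Notes:

Modifying registers is a very dangerous operation and needs great care.
Different architectures have different register names and structure definitions.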
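
Here is a hedged, self-contained sketch of the GETREGS / SETREGS round trip that does not depend on addresses looked up in gdb. It assumes x86-64 Linux. The child stops itself at a known point by executing an `int3` instruction (a stand-in for a breakpoint), and the tracer rewrites RAX before resuming it. The names `trap_and_read_rax` and `setregs_demo` are invented for this example.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* x86-64 only. Executes int3 and returns whatever RAX holds afterwards.
 * RAX is 0 going in; only a tracer can make it come out different. */
static long trap_and_read_rax(void) {
    long value = 0;
    __asm__ volatile("int3" : "+a"(value));
    return value;
}

static int give_up(pid_t pid, int code) {
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return code;
}

/* Hypothetical demo. Returns 0 if the child saw RAX == 42. */
int setregs_demo(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1) _exit(2);
        _exit(trap_and_read_rax() == 42 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status)) return -2;
    if (WSTOPSIG(status) != SIGTRAP) return give_up(pid, -3);

    struct user_regs_struct regs;
    if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) == -1) return give_up(pid, -4);

    regs.rax = 42;                               /* the actual modification */
    if (ptrace(PTRACE_SETREGS, pid, NULL, &regs) == -1) return give_up(pid, -5);

    /* data == 0 swallows the SIGTRAP instead of delivering it */
    if (ptrace(PTRACE_CONT, pid, NULL, NULL) == -1) return give_up(pid, -6);

    if (waitpid(pid, &status, 0) != pid) return -7;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -8;
}
```

The design point is that the tracee is stopped at a place where its state is well defined, so changing one register has a predictable effect. Changing RIP works the same way: only the field name differs.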

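Single-stepping: following the clues like Sherlock Holmes

The PTRACE_SINGLESTEP request lets the tracer single-step the tracee. After each instruction the tracee stops, and the tracer can inspect its state.

Example: single-stepping hello.c and printing every instruction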

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <sys/user.h>
#include <string.h> // memcpy

int main(int argc, char *argv[]) {
  pid_t pid;
  int status;
  long ptrace_ret;
  struct user_regs_struct regs;
  unsigned char instruction[16]; // Maximum instruction length

  if (argc < 2) {
    fprintf(stderr, "Usage: %s <program>\n", argv[0]);
    exit(1);
  }

  pid = fork();

  if (pid == 0) {
    // Child process (tracee)
    ptrace(PTRACE_TRACEME, 0, NULL, NULL);
    execv(argv[1], &argv[1]);
    perror("execv");
    exit(1);
  } else if (pid > 0) {
    // Parent process (tracer)
    wait(&status); // Wait for the tracee to stop

    printf("Tracee stopped, status: %d\n", status);

    while (1) {
      // Single step the tracee
      ptrace_ret = ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL);
      if (ptrace_ret == -1) {
        perror("ptrace SINGLESTEP");
        break;
      }

      wait(&status); // Wait for the tracee to stop after single step

      if (WIFEXITED(status)) {
        printf("Tracee exited with status: %d\n", WEXITSTATUS(status));
        break;
      }

      // Get the current register values
      ptrace_ret = ptrace(PTRACE_GETREGS, pid, NULL, &regs);
      if (ptrace_ret == -1) {
        perror("ptrace GETREGS");
        break;
      }

      // Read 16 bytes at the current instruction pointer, one word at a time
      for (size_t i = 0; i < sizeof(instruction); i += sizeof(long)) {
        errno = 0;
        long word = ptrace(PTRACE_PEEKTEXT, pid, (void *)(regs.rip + i), NULL);
        if (errno != 0) {
          perror("ptrace PEEKTEXT");
          break;
        }
        memcpy(instruction + i, &word, sizeof(long));
      }

      printf("RIP: 0x%llx, Instruction: ", regs.rip);
      for(int i = 0; i < sizeof(instruction); i++){
          printf("%02x ", instruction[i]);
      }
      printf("\n");
    }

    printf("Tracer finished\n");
  } else {
    perror("fork");
    exit(1);
  }

  return 0;
}

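Code walkthrough:

ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL): single-steps the tracee.
wait(&status): waits for the tracee to stop.
WIFEXITED(status): checks whether the tracee exited normally.
ptrace(PTRACE_PEEKTEXT, pid, (void *)(regs.rip + i), NULL): reads instruction bytes starting at the tracee's instruction pointer.
Finally, the instruction pointer and the 16 bytes found there are printed.

Result:

The tracer prints the address and raw bytes of every instruction the tracee executes. That covers not just main in hello.c but also the dynamic loader and the C library, so expect a very large amount of output.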
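
If you only want a feel for how fine-grained single-stepping is, counting the steps is enough. The following sketch single-steps a forked child from a known stop until it exits and returns how many PTRACE_SINGLESTEP requests that took. The name `singlestep_demo`, the exit code 7 and the safety cap are arbitrary choices made for this illustration.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: single-step a child until it exits.
 * Returns the number of single-step requests issued, or -1 on error. */
long singlestep_demo(void) {
    const long max_steps = 1000000;          /* arbitrary safety cap */
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1) _exit(2);
        raise(SIGSTOP);                      /* stop so the tracer takes over */
        _exit(7);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid) return -1;
    if (!WIFSTOPPED(status)) return -1;      /* child ended before stopping */

    for (long steps = 0; steps < max_steps; steps++) {
        /* Execute exactly one instruction, then stop again. */
        if (ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL) == -1) break;
        if (waitpid(pid, &status, 0) != pid) return -1;
        if (WIFEXITED(status))
            return WEXITSTATUS(status) == 7 ? steps + 1 : -1;
        if (!WIFSTOPPED(status)) return -1;  /* killed by a signal */
    }

    kill(pid, SIGKILL);                      /* cap reached or step failed */
    waitpid(pid, NULL, 0);
    return -1;
}
```

The exact count depends on the C library and on how the program was linked, so treat it as a rough measure rather than a constant.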

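Advanced techniques: breakpoints, signal handling, multithreading

ptrace can do far more than this. Some more advanced techniques include:

Breakpoints: Set a breakpoint in the tracee's code so that it stops when execution reaches a particular location. This is done by patching the tracee's memory to replace an instruction with int 3 (the breakpoint instruction, opcode 0xCC on x86) and then catching the resulting SIGTRAP in the tracer.
Signal handling: The tracer controls which signals the tracee receives. It can keep a signal from reaching the tracee or change the signal that is delivered.
Multithreading: Tracing multi-threaded programs is more complicated. Each thread is traced individually by its thread ID, so the tracer has to attach to every thread (PTRACE_ATTACH or PTRACE_SEIZE), can follow newly created threads with the PTRACE_O_TRACECLONE option, and has to handle the stop events of each thread.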
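
The breakpoint recipe can be sketched in a self-contained way. The version below assumes x86-64 Linux and traces a forked child, so the address of `target` is the same in both processes. It patches the first byte of `target` to 0xCC, waits for the SIGTRAP, then restores the original byte and rewinds RIP by one so the real instruction runs. The names `target`, `give_up_bp` and `breakpoint_demo` are invented here; this is a simplified illustration and not how any particular debugger is written.

```c
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* The function we put a breakpoint on. */
__attribute__((noinline)) static int target(void) { return 7; }

static int give_up_bp(pid_t pid, int code) {
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return code;
}

/* Hypothetical demo (x86-64 only). Returns 0 if the breakpoint was hit
 * at target() and the child then finished normally. */
int breakpoint_demo(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        int (*volatile fn)(void) = target;   /* force a real call */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1) _exit(2);
        raise(SIGSTOP);
        _exit(fn());
    }

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status)) return -2;

    void *addr = (void *)(uintptr_t)target;
    errno = 0;
    long orig = ptrace(PTRACE_PEEKTEXT, pid, addr, NULL);
    if (orig == -1 && errno != 0) return give_up_bp(pid, -3);

    /* Insert the breakpoint: low byte of the word becomes int3 (0xCC). */
    long patched = (orig & ~0xffL) | 0xcc;
    if (ptrace(PTRACE_POKETEXT, pid, addr, (void *)patched) == -1) return give_up_bp(pid, -4);
    if (ptrace(PTRACE_CONT, pid, NULL, NULL) == -1) return give_up_bp(pid, -5);

    if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status)) return -6;
    if (WSTOPSIG(status) != SIGTRAP) return give_up_bp(pid, -7);

    struct user_regs_struct regs;
    if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) == -1) return give_up_bp(pid, -8);
    if (regs.rip != (uintptr_t)addr + 1) return give_up_bp(pid, -9);  /* RIP is just past int3 */

    /* Remove the breakpoint and re-run the original instruction. */
    if (ptrace(PTRACE_POKETEXT, pid, addr, (void *)orig) == -1) return give_up_bp(pid, -10);
    regs.rip = (uintptr_t)addr;
    if (ptrace(PTRACE_SETREGS, pid, NULL, &regs) == -1) return give_up_bp(pid, -11);
    if (ptrace(PTRACE_CONT, pid, NULL, NULL) == -1) return give_up_bp(pid, -12);

    if (waitpid(pid, &status, 0) != pid) return -13;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 7) ? 0 : -14;
}
```

A real debugger that wants the breakpoint to stay armed would single-step over the restored instruction and then write 0xCC back again. This sketch removes the breakpoint for good to stay short.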

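The downsides of ptrace: performance and security

Powerful as it is, ptrace has some drawbacks:

Performance: Every stop involves context switches between the tracee, the kernel and the tracer, which slows the traced program down noticeably.
Security risk: ptrace can be abused by malicious programs, for example to inject code or steal sensitive information.
Permissions: Attaching to another process requires appropriate privileges.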
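
On many Linux distributions the permission side is governed by the Yama security module, which exposes its policy in /proc/sys/kernel/yama/ptrace_scope: 0 is the classic same-user rule, 1 restricts attaching to descendants, 2 restricts it to administrators, and 3 disables attaching altogether. A small sketch for checking it follows. The function name `read_ptrace_scope` is made up for this example, and the file does not exist when Yama is not enabled.

```c
#include <stdio.h>

/* Returns the Yama ptrace_scope value (0 to 3), or -1 if the file is
 * missing or unreadable, for example when Yama is not enabled. */
int read_ptrace_scope(void) {
    FILE *f = fopen("/proc/sys/kernel/yama/ptrace_scope", "r");
    if (f == NULL) return -1;

    int scope = -1;
    if (fscanf(f, "%d", &scope) != 1) scope = -1;
    fclose(f);
    return scope;
}
```

If PTRACE_ATTACH fails with EPERM on a process you own, this value is one of the first things worth checking.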

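Summary: the value of ptrace

ptrace is a powerful tool for debugging, security research, program analysis and more. Mastering it gives you a much deeper understanding of how programs actually run and helps you solve all kinds of tricky problems.

I hope today's talk helps you get started with ptrace. Remember, practice is the only test of truth. Write more code, run more experiments, and you will become a ptrace master!

Thank you, everyone!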
