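The `subprocess` module: interacting efficiently with external commands and building pipelines - 智猿学院: lectures on frontend and backend development, databases, artificial intelligence, cloud computing, and other fields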

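Okay, today let's talk about Python's subprocess module. Think of it as a universal remote: it lets your Python code tell the computer to do all sorts of jobs, such as running a command or executing a script. Don't be intimidated. It sounds fancy, but it is pretty comfortable to use.

Opening: why do we need subprocess?

Imagine you are writing a program and suddenly need to call an external program: say you want ffmpeg to process a video, or grep to search file contents. Are you going to rewrite ffmpeg or grep? Of course not! This is exactly the problem subprocess solves: it lets you run external programs much as you would on the command line and collect their output.

subprocess.run(): the baton you will use most

subprocess.run() is the most commonly used function in the module. It runs a command, waits for it to finish, and returns a CompletedProcess object that holds the result.

Let's start with the simplest example: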

import subprocess

result = subprocess.run(['ls', '-l'], capture_output=True, text=True)

print("Command output:")
print(result.stdout)
print("Exit status:", result.returncode)

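What does this code do?

subprocess.run(['ls', '-l'], ...): This is the key line. It tells Python to run the command ls -l. Note that the command and its arguments go into a list as separate items.
capture_output=True: With this argument, subprocess captures both standard output and standard error and stores them on the result object as result.stdout and result.stderr.
text=True: This tells subprocess that the output is text and should be decoded into a string. Without it, the output is a bytes object.
result.stdout: The command's standard output.
result.returncode: The command's exit status. By convention, 0 means success and any non-zero value means failure.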
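To see the effect of text=True for yourself, here is a minimal sketch. It assumes nothing beyond the standard library; the child command (the current Python interpreter printing one line) is only a stand-in chosen so the example runs on any platform.

```python
import subprocess
import sys

# Hypothetical child command: the current interpreter printing one line.
cmd = [sys.executable, "-c", "print('hello')"]

raw = subprocess.run(cmd, capture_output=True)                 # stdout is bytes
decoded = subprocess.run(cmd, capture_output=True, text=True)  # stdout is str

print(type(raw.stdout).__name__, repr(raw.stdout))
print(type(decoded.stdout).__name__, repr(decoded.stdout))
print("exit status:", decoded.returncode)
```

The only difference between the two calls is the type of result.stdout: bytes in the first case, a decoded string in the second.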

More advanced usage: input, output, and error handling

Input (input): To feed data to a command's standard input, use the input argument.
```
import subprocess

result = subprocess.run(['grep', 'hello'], capture_output=True, text=True, input='hello world\n')
print(result.stdout) # Output: hello world
```
Here we use grep to search for "hello" and pass "hello world\n" to it as input.
Standard error (stderr): If the command fails, its error message goes to standard error. You can read it from result.stderr.
```
import subprocess

result = subprocess.run(['ls', 'nonexistent_file'], capture_output=True, text=True)
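print(result.stderr) # e.g. with GNU ls: ls: cannot access 'nonexistent_file': No such file or directory
print(result.returncode) # e.g. 2 with GNU ls; the exact non-zero value varies by platform
```
As you can see, ls could not find the file, so it reported an error and returned a non-zero exit status.
check=True: If you want subprocess to raise an exception automatically when the command fails, add check=True.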
```
import subprocess

try:
    result = subprocess.run(['ls', 'nonexistent_file'], capture_output=True, text=True, check=True)
except subprocess.CalledProcessError as e:
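    print("Command failed:", e)
    print("Error output:", e.stderr)
```
This way, if ls cannot find the file, run() raises subprocess.CalledProcessError, which you can handle in the except block. The exception carries the exit status as e.returncode and, because the output was captured, the child's output as e.stdout and e.stderr.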
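The three options above (input, stderr, check=True) can be combined in one call. The following is a small sketch rather than a recipe: the child program is a hypothetical stand-in, written inline and run with the current interpreter so that the example does not depend on grep or ls being installed.

```python
import subprocess
import sys

# Hypothetical child: echo stdin in upper case, complain on stderr, exit with 3.
child = (
    "import sys\n"
    "sys.stdout.write(sys.stdin.read().upper())\n"
    "sys.stderr.write('something went wrong\\n')\n"
    "sys.exit(3)\n"
)

failure = None
try:
    subprocess.run(
        [sys.executable, "-c", child],
        input="hello world\n",   # sent to the child's standard input
        capture_output=True,     # collect stdout and stderr
        text=True,               # work with str instead of bytes
        check=True,              # raise on a non-zero exit status
    )
except subprocess.CalledProcessError as exc:
    failure = exc
    print("exit status:", exc.returncode)
    print("stdout so far:", exc.stdout.strip())
    print("stderr:", exc.stderr.strip())
```

Note that the output produced before the failure is not lost: it travels with the exception.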

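Pipes: chaining commands together

Pipes are a very powerful concept on Unix systems: they feed the output of one command into the input of another. subprocess can build pipelines too.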

import subprocess

# Equivalent of: ls -l | grep "txt"
p1 = subprocess.Popen(['ls', '-l'], stdout=subprocess.PIPE)
p2 = subprocess.Popen(['grep', 'txt'], stdin=p1.stdout, stdout=subprocess.PIPE, text=True)
p1.stdout.close()  # Allow p1 to receive a SIGPIPE if p2 exits.
output, error = p2.communicate()
p1.wait()  # Reap p1 so it does not linger as a zombie process

print(output)

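This code emulates ls -l | grep "txt". What does it do?

subprocess.Popen(['ls', '-l'], stdout=subprocess.PIPE): Creates a Popen object that runs ls -l. stdout=subprocess.PIPE connects the standard output of ls -l to a pipe.
subprocess.Popen(['grep', 'txt'], stdin=p1.stdout, stdout=subprocess.PIPE, text=True): Creates a second Popen object that runs grep "txt". stdin=p1.stdout connects grep's standard input to the output pipe of ls -l. stdout=subprocess.PIPE connects grep's standard output to a pipe as well, so that Python can read it.
p1.stdout.close(): This matters. Once p2 has started, the parent Python process still holds its own copy of the read end of p1's pipe. Closing that copy leaves p2 as the only reader, so if p2 exits early, p1 receives SIGPIPE on its next write instead of blocking on a pipe that nobody reads.
p2.communicate(): By this point grep is already running, because Popen starts the process immediately. communicate() reads grep's output until end-of-file, waits for grep to exit, and returns a (stdout, stderr) tuple. stderr is None here because it was not redirected to a pipe.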
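The same steps can be wrapped in a small helper. This is a sketch under stated assumptions: pipe_two is a hypothetical name, not part of subprocess, and the two child commands are inline Python stand-ins for ls and grep so that the example runs without Unix tools.

```python
import subprocess
import sys

def pipe_two(first_cmd, second_cmd):
    """Run `first_cmd | second_cmd` and return the second command's stdout as text."""
    with subprocess.Popen(first_cmd, stdout=subprocess.PIPE) as first:
        with subprocess.Popen(
            second_cmd, stdin=first.stdout, stdout=subprocess.PIPE, text=True
        ) as second:
            first.stdout.close()  # the parent no longer needs its copy of the read end
            output, _ = second.communicate()
    # Leaving the with blocks waits for both processes, so neither is left unreaped.
    return output

# Hypothetical stand-ins for `ls` and `grep txt`.
producer = [sys.executable, "-c", "print('a.txt'); print('b.py'); print('c.txt')"]
filter_code = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if 'txt' in line:\n"
    "        sys.stdout.write(line)\n"
)
consumer = [sys.executable, "-c", filter_code]

result = pipe_two(producer, consumer)
print(result)
```

Using Popen as a context manager is a design choice here: it closes the pipe objects and waits for each child on exit, which takes care of the cleanup that is easy to forget when calling Popen by hand.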

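Popen objects: finer-grained control

subprocess.run() is already convenient, but when you need finer control, for example running a command without blocking your program or reading its output as it is produced, use subprocess.Popen().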

import subprocess
import threading

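def read_output(process):
    # Read line by line until the child closes its stdout (readline returns '')
    while True:
        line = process.stdout.readline()
        if not line:
            break
        print("Output:", line.strip())

# -c 4 sends four packets and then exits (Linux/macOS; on Windows use -n 4).
# Without a count, ping on Linux/macOS keeps running until it is interrupted.
process = subprocess.Popen(['ping', '-c', '4', '8.8.8.8'], stdout=subprocess.PIPE, text=True)

# Start a thread to read the output
thread = threading.Thread(target=read_output, args=(process,))
thread.start()

# Wait for the command to finish
process.wait()

print("Command finished")
thread.join()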

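This code uses ping to show how to read a command's output while the command is still running. Note that on Linux and macOS ping keeps running until interrupted unless it is given a packet count such as -c 4.

subprocess.Popen(['ping', ...], stdout=subprocess.PIPE, text=True): Creates a Popen object that runs ping against 8.8.8.8. stdout=subprocess.PIPE connects ping's standard output to a pipe.
read_output(process): This function reads from the pipe line by line and prints each line.
threading.Thread(target=read_output, args=(process,)): Creates a thread that runs read_output, so the main thread stays free for other work.
process.wait(): Waits for ping to exit. This is safe here only because another thread is draining the pipe; calling wait() while nobody reads a stdout pipe can deadlock once the pipe buffer fills up.
thread.join(): Waits for the reader thread to finish.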
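When the main program has nothing else to do while the command runs, the thread is optional: you can iterate over process.stdout directly. Here is a minimal sketch of that variant. The child program is a hypothetical stand-in for ping, run with the current interpreter so the example finishes quickly and works on any platform.

```python
import subprocess
import sys

# Hypothetical child: print three lines, flushing each one immediately.
child = "for i in range(3):\n    print('tick', i, flush=True)\n"

lines = []
with subprocess.Popen(
    [sys.executable, "-c", child],
    stdout=subprocess.PIPE,
    text=True,
    bufsize=1,  # line-buffered on the parent side (only meaningful in text mode)
) as process:
    for line in process.stdout:  # yields each line as soon as it arrives
        lines.append(line.rstrip("\n"))
        print("Output:", lines[-1])

# Leaving the with block waits for the child, so returncode is set here.
print("exit status:", process.returncode)
```

Whether lines really arrive one at a time also depends on the child: many programs buffer their output in larger blocks when writing to a pipe, which is why the stand-in flushes explicitly.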

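Security: better safe than sorry

subprocess is very powerful, but it also carries security risks. If you are careless, your program may end up running malicious commands.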

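Command injection: The most common security problem is command injection. If you splice user input into a command string and hand that string to a shell, an attacker can craft input that runs arbitrary commands.

import subprocess

# Dangerous code: user input is concatenated into a string that a shell interprets
user_input = input("Enter a file name: ")
command = "ls " + user_input
subprocess.run(command, shell=True)

# If the user enters "file.txt; rm -rf /", the shell runs rm -rf / after ls

To avoid command injection, never build shell command strings from user input. Pass the command as a list and leave shell=True off: each list item reaches the program as a single argument, and no shell ever parses it. If you really must build a shell string, escape every piece of user input with shlex.quote(). Do not apply shlex.quote() to the items of an argument list, because the added quotes would become part of the argument.

import subprocess

# Safe code: argument list, no shell
user_input = input("Enter a file name: ")
# "--" marks the end of options, so input such as "-la" is treated as a file name
command = ['ls', '--', user_input]
subprocess.run(command)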
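You can check how arguments travel without putting any files at risk. The sketch below assumes a harmless payload and a hypothetical child program that simply reports its first argument; neither comes from a real attack.

```python
import shlex
import subprocess
import sys

hostile = "file.txt; echo injected"  # hypothetical attacker-controlled value

# Hypothetical child that reports exactly what it received as its first argument.
echo_arg = [sys.executable, "-c", "import sys; print(sys.argv[1])"]

# List form, no shell: the whole string arrives as one literal argument.
received = subprocess.run(
    echo_arg + [hostile], capture_output=True, text=True, check=True
).stdout.rstrip("\n")
print(repr(received))

# Only when a shell string is unavoidable: quote each untrusted piece (POSIX shells).
quoted = shlex.quote(hostile)
print(quoted)
```

The child receives the semicolon and everything after it as plain text, which is why the list form is the safer default. shlex.quote() is designed for POSIX shells, so it is not a fix for command strings on Windows.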

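Privileges: By default, subprocess runs commands as the current user. If your program needs higher privileges for a command, it can go through sudo or another privilege-escalation tool. Be careful, though: do not abuse privileges, and run commands with the least privilege that gets the job done.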

Summary of subprocess parameters

For easy reference, here are the commonly used parameters of subprocess.run() and subprocess.Popen() in a table:

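| Parameter | Meaning | Applies to |
| --- | --- | --- |
| `args` | The command to run, as a list of arguments or a single string. A list is executed directly, without a shell. A string is interpreted by a shell only when `shell=True`; otherwise, on POSIX, the whole string is taken as the name of the program to run. | `run`, `Popen` |
| `stdin` | Standard input: a file object, or `subprocess.PIPE` to feed input through a pipe. | `run`, `Popen` |
| `stdout` | Standard output: a file object, or `subprocess.PIPE` to write the output to a pipe. | `run`, `Popen` |
| `stderr` | Standard error: a file object, or `subprocess.PIPE` to write error output to a pipe. `subprocess.STDOUT` merges error output into standard output. | `run`, `Popen` |
| `shell` | Whether to run the command through a shell. When `True`, `args` should normally be a single string. | `run`, `Popen` |
| `cwd` | Working directory for the command. | `run`, `Popen` |
| `env` | Environment variables for the child process, as a mapping. It replaces the inherited environment rather than extending it. | `run`, `Popen` |
| `timeout` | Timeout in seconds. When it expires, `run` kills the child and raises `subprocess.TimeoutExpired`. | `run` (also `Popen.communicate()` and `Popen.wait()`, but not the `Popen` constructor) |
| `check` | Whether to raise `subprocess.CalledProcessError` when the exit status is not 0. | `run` |
| `capture_output` | Whether to capture standard output and standard error. When `True`, `result.stdout` and `result.stderr` hold the command's output. | `run` |
| `text` | Whether to treat the streams as text. When `True`, `result.stdout` and `result.stderr` are strings; otherwise they are bytes. | `run`, `Popen` |
| `input` | Data sent to the command's standard input: a string in text mode, bytes otherwise. | `run` (also `Popen.communicate()`) |
| `bufsize` | Buffer size for the pipe file objects. | `run`, `Popen` |
| `close_fds` | Whether to close inherited file descriptors other than 0, 1, and 2 in the child. | `run`, `Popen` |
| `preexec_fn` | A callable run in the child just before the command executes (POSIX only; not safe in multithreaded programs). | `run`, `Popen` |
| `pass_fds` | File descriptors to keep open in the child (POSIX only). | `run`, `Popen` |
| `restore_signals` | Whether to reset signals that Python set to be ignored, such as SIGPIPE, to their defaults in the child (POSIX only). | `run`, `Popen` |
| `start_new_session` | Whether to start the child in a new session (POSIX only). | `run`, `Popen` |
| `encoding` | Encoding used for the streams; setting it implies text mode. | `run`, `Popen` |
| `errors` | How encoding and decoding errors are handled; setting it implies text mode. | `run`, `Popen` |
| `universal_newlines` | Older alias for `text`, kept for backward compatibility. | `run`, `Popen` |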
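Three of these parameters, cwd, env, and timeout, have not appeared in an example so far. Here is a minimal sketch of them; the GREETING variable and the inline child programs are hypothetical and exist only for this illustration.

```python
import os
import subprocess
import sys
import tempfile

# cwd and env: the child reports its working directory and one variable.
report = "import os; print(os.getcwd()); print(os.environ['GREETING'])"
with tempfile.TemporaryDirectory() as workdir:
    child_env = dict(os.environ, GREETING="hello")  # extend, do not replace
    result = subprocess.run(
        [sys.executable, "-c", report],
        cwd=workdir, env=child_env, capture_output=True, text=True, check=True,
    )
    reported_cwd, greeting = result.stdout.splitlines()
    same_dir = os.path.samefile(reported_cwd, workdir)

# timeout: a child that sleeps too long is killed and TimeoutExpired is raised.
timed_out = False
try:
    subprocess.run([sys.executable, "-c", "import time; time.sleep(30)"], timeout=1)
except subprocess.TimeoutExpired:
    timed_out = True

print(greeting, same_dir, timed_out)
```

Copying os.environ before adding to it is deliberate: passing a dictionary with only the new variable would strip PATH and everything else from the child's environment.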

Practical case studies

Automated deployment: You can use subprocess to run deployment tools such as rsync or scp.

import subprocess

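def deploy(host, user, source_dir, target_dir):
    # Authentication is left to SSH (keys or ssh-agent); never hard-code a password
    command = [
        'rsync',
        '-avz',
        '--delete',  # removes remote files that no longer exist locally; use with care
        source_dir,
        f'{user}@{host}:{target_dir}'
    ]
    # check=True raises CalledProcessError if rsync exits with a non-zero status
    subprocess.run(command, check=True)
    print("Deployment finished")

deploy('example.com', 'user', '/path/to/local/code', '/path/to/remote/code')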

Data processing: You can use subprocess to call data-processing tools such as awk or sed.

import subprocess

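def process_data(input_file, output_file):
    # Print the first and third whitespace-separated fields of every line
    command = [
        'awk',
        '{print $1, $3}',
        input_file
    ]
    # check=True stops here if awk fails, instead of writing a partial or empty file
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    with open(output_file, 'w') as f:
        f.write(result.stdout)
    print("Data processing finished")

process_data('input.txt', 'output.txt')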
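For large outputs you may not want to hold everything in memory first. One alternative, sketched here under assumptions, is to pass an open file as stdout so the child writes into it directly. run_to_file is a hypothetical helper name, and the child is an inline Python stand-in for awk with its input data embedded so the example is self-contained.

```python
import os
import subprocess
import sys
import tempfile

def run_to_file(command, output_path):
    """Run `command` and write its standard output straight into `output_path`."""
    with open(output_path, "wb") as out:
        subprocess.run(command, stdout=out, check=True)

# Hypothetical stand-in for awk: print the first and third field of each row.
child = (
    "rows = ['a b c', 'd e f']\n"
    "for row in rows:\n"
    "    fields = row.split()\n"
    "    print(fields[0], fields[2])\n"
)
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "output.txt")
    run_to_file([sys.executable, "-c", child], target)
    with open(target, encoding="utf-8") as f:
        written = f.read()

print(written)
```

One trade-off to keep in mind: if the command fails halfway, the file may already contain partial output, so you may want to write to a temporary file and rename it on success.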

System monitoring: You can use subprocess to collect system information such as CPU or memory usage.

import subprocess

def get_cpu_usage():
    # Linux-specific: relies on procps top in batch mode (-b), one iteration (-n1)
    command = ['top', '-bn1']
    result = subprocess.run(command, capture_output=True, text=True)
    lines = result.stdout.splitlines()
    for line in lines:
        if line.startswith('%Cpu(s)'):
            # First field after the colon, e.g. "1.2 us" (user-space CPU time only)
            cpu_usage = line.split(',')[0].split(':')[1].strip()
            return cpu_usage
    return None

cpu_usage = get_cpu_usage()
print("CPU usage (user):", cpu_usage)
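Parsing is easier to test when it is separated from the subprocess call. The sketch below computes overall usage as 100 minus the idle percentage. parse_cpu_line is a hypothetical helper, and the sample line is an assumption modelled on the summary line printed by procps top; other versions or locales may format it differently.

```python
import re

def parse_cpu_line(line):
    """Return overall CPU usage in percent (100 minus idle), or None if not found."""
    match = re.search(r"([\d.]+)\s*id\b", line)
    if match is None:
        return None
    return round(100.0 - float(match.group(1)), 1)

# Assumed sample, modelled on the summary line printed by procps top.
sample = "%Cpu(s):  1.2 us,  0.5 sy,  0.0 ni, 98.1 id,  0.1 wa,  0.0 hi,  0.1 si,  0.0 st"
usage = parse_cpu_line(sample)
print("CPU usage:", usage)
```

With the parser isolated like this, you can feed it each line of the real top output and check the format against a saved sample before relying on it.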

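Wrap-up: subprocess belongs in your toolbox

The subprocess module is a very powerful tool in Python. It lets you interact with external commands easily and build all kinds of functionality on top of them. Once you have its basic usage down and keep the security issues in mind, it will make your Python programs more capable and more flexible. Remember: fluency with subprocess can take you from being a Python programmer to being a real system administrator!

I hope today's walkthrough was helpful. Happy coding, everyone!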
