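How to Use the `ctypes` Library to Call `DLL` or `.so` Shared Libraries and Handle C Pointers - 智猿学院: Lectures on Frontier Technology in Front-End and Back-End Development, Databases, AI, Cloud Computing, and More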

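All right, let's begin today's lecture. The topic is using Python's ctypes library to call shared libraries (DLL or .so) and to handle C pointers. ctypes is Python's foreign function library. It provides C-compatible data types and lets Python code call functions in DLLs or shared libraries. We will look in depth at how to do this with ctypes, with a focus on pointer handling.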

1. ctypes Basics

First, let's cover the basic usage of ctypes. ctypes lets us load a shared library and declare the parameter types and return type of each C function.

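Loading a shared library:

```
import ctypes

# Windows
# my_dll = ctypes.cdll.LoadLibrary("my_library.dll")  # or ctypes.CDLL("my_library.dll"); use ctypes.WinDLL for stdcall functions

# Linux / macOS
my_lib = ctypes.cdll.LoadLibrary("./my_library.so")  # or ctypes.CDLL("./my_library.so")

# If the shared library is on the system library search path
# my_lib = ctypes.CDLL("my_library.so")
```

Here, ctypes.cdll.LoadLibrary and ctypes.CDLL both load a C shared library whose functions use the standard cdecl calling convention. On Windows, ctypes.WinDLL is also available for functions that use the stdcall convention, such as most of the Win32 API. You must supply a path unless the library is on the system's library search path.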
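Here is a minimal runnable sketch. It assumes Linux or macOS, where the C standard library is a shared library that ctypes can locate. `ctypes.util.find_library` resolves a bare library name to something `CDLL` can load. The C standard library stands in here for the hypothetical `my_library`.

```python
import ctypes
import ctypes.util

# find_library("c") returns e.g. "libc.so.6" on Linux, or None if it cannot locate the library.
# CDLL(None) then falls back to the symbols already loaded into the Python process (POSIX only),
# which include the C standard library.
libc_name = ctypes.util.find_library("c")
libc = ctypes.CDLL(libc_name)

# abs() takes and returns int, which matches the ctypes defaults, so no signature is declared here
print(libc_name)
print(libc.abs(-5))  # 5
```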

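Declaring function signatures:

After loading the library, declare each C function's signature: its parameter types and its return type. If the signature is wrong or missing, ctypes guesses the conversions from the Python arguments and assumes an int return value. That can crash the program or silently produce wrong results.
```
my_lib.my_function.argtypes = [ctypes.c_int, ctypes.c_char_p]  # two parameters: int and char*
my_lib.my_function.restype = ctypes.c_int  # return type: int
```
argtypes is a list of the ctypes types of the function's parameters, and restype is the function's return type. ctypes provides a set of types that correspond to C data types, such as c_int, c_float, and c_char_p.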
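This small runnable illustration again assumes a POSIX system. It uses the C standard library's `atof` (declared in C as `double atof(const char *s);`) in place of the hypothetical `my_function`. Setting `restype` makes ctypes read the return value as a double. Setting `argtypes` makes ctypes reject an argument of the wrong type before the call reaches C.

```python
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))

# double atof(const char *s);
libc.atof.argtypes = [ctypes.c_char_p]
libc.atof.restype = ctypes.c_double  # without this, ctypes would read the result as an int

print(libc.atof(b"3.5"))  # 3.5

# A str is not a char*: with argtypes declared, ctypes raises instead of passing garbage
try:
    libc.atof("3.5")
except ctypes.ArgumentError as exc:
    print("rejected:", exc)
```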

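2. Mapping C Data Types to ctypes

The table below lists some common C data types and their ctypes counterparts:

| C data type | `ctypes` type | Description |
|---|---|---|
| `int` | `c_int` | C `int` |
| `float` | `c_float` | C `float` |
| `double` | `c_double` | C `double` |
| `char` | `c_char` | C `char` (a single byte) |
| `char*` | `c_char_p` | C `char` pointer to a NUL-terminated string |
| `void*` | `c_void_p` | C `void` pointer, a generic pointer |
| `int*` | `POINTER(c_int)` | pointer to C `int` |
| `float*` | `POINTER(c_float)` | pointer to C `float` |
| `double*` | `POINTER(c_double)` | pointer to C `double` |
| `char**` | `POINTER(c_char_p)` | pointer to a `char` pointer (for example, an array of strings) |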
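Sizes on the C side depend on the platform. For example, `long` is 4 bytes on 64-bit Windows but 8 bytes on 64-bit Linux. A quick way to check the mapping on the machine at hand is `ctypes.sizeof`, as in this minimal sketch:

```python
import ctypes

# Print the size in bytes of several ctypes types on the current platform
for name in ("c_char", "c_int", "c_long", "c_float", "c_double", "c_char_p", "c_void_p"):
    print(f"{name:<9} {ctypes.sizeof(getattr(ctypes, name))} bytes")

# Data pointer types all have the same size as void*
IntPtr = ctypes.POINTER(ctypes.c_int)
print(ctypes.sizeof(IntPtr) == ctypes.sizeof(ctypes.c_void_p))  # True
```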

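3. Handling Pointers

Pointers are a core concept in C and also the most intricate part of ctypes. We need to understand how to create, pass, and dereference them.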

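Creating pointers:

ctypes provides the POINTER() function for creating pointer types.
```
int_ptr = ctypes.POINTER(ctypes.c_int)  # define an int pointer type (a type, not an instance)
```
You can then use ctypes.pointer() or ctypes.byref() to obtain a pointer to an existing value.
```
# Using ctypes.pointer()
x = ctypes.c_int(10)
ptr_x = ctypes.pointer(x)  # ptr_x is a full pointer object pointing to x

# Using ctypes.byref()
y = ctypes.c_int(20)
ptr_y = ctypes.byref(y)  # ptr_y is a lightweight reference to y, usable only as a function argument
```
Both forms point at the existing variable, and neither copies its value. The difference is in what you get back. ctypes.pointer() constructs a complete pointer object (an instance of POINTER(c_int)) that you can dereference, index, store, and pass around. ctypes.byref() returns a lightweight reference that can only be passed as a function argument, and it is usually faster because it skips building the full pointer object. When a function just needs a pointer argument, byref() is the better choice. Use pointer() when you need to work with the pointer itself.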
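The following sketch needs only ctypes itself. It shows that writing through a `pointer()` object changes the original variable, and that the pointer's target lives at the same address as that variable:

```python
import ctypes

x = ctypes.c_int(10)
ptr_x = ctypes.pointer(x)  # full pointer object of type POINTER(c_int)
ref_x = ctypes.byref(x)    # lightweight reference, only for passing to foreign functions

# Writing through the pointer modifies x itself, because no copy was made
ptr_x.contents.value = 42
print(x.value)  # 42

# The pointer's target lives at the same address as x
print(ctypes.addressof(ptr_x.contents) == ctypes.addressof(x))  # True

# pointer() results are real pointer objects; byref() results are not
print(isinstance(ptr_x, ctypes.POINTER(ctypes.c_int)))  # True
print(isinstance(ref_x, ctypes.POINTER(ctypes.c_int)))  # False
```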

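Passing pointers:

Passing a pointer to a C function is simple: pass the pointer, or the byref() reference, as the argument.

```
# Assume the C function is declared as: void increment(int *x);
my_lib.increment.argtypes = [ctypes.POINTER(ctypes.c_int)]
my_lib.increment.restype = None  # void return type

value = ctypes.c_int(5)
ptr_value = ctypes.byref(value)
my_lib.increment(ptr_value)
print(value.value)  # prints 6, because the C function modified value in place
```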
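Because `my_lib` and its `increment` function are hypothetical, here is a self-contained sketch instead. It builds a stand-in for `void increment(int *x);` from a Python function using `ctypes.CFUNCTYPE`, which is covered in section 6. Calling the stand-in goes through the same argument conversion as a real C function, so a `POINTER(c_int)` parameter accepts both `byref()` and `pointer()`:

```python
import ctypes

# Stand-in for the C function `void increment(int *x);`
@ctypes.CFUNCTYPE(None, ctypes.POINTER(ctypes.c_int))
def increment(x):
    x[0] += 1  # same as (*x)++ in C

value = ctypes.c_int(5)
increment(ctypes.byref(value))    # lightweight reference
increment(ctypes.pointer(value))  # full pointer object
print(value.value)  # 7
```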

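Dereferencing pointers:

To get the value a pointer points to, use the contents attribute.
```
print(ptr_x.contents.value)  # prints 10
```
The contents attribute returns the object the pointer points to. That object's value attribute holds the actual value.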
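Pointer objects can also be indexed like C pointers, where `ptr[0]` means the same as `*ptr`. Indexing performs no bounds checking. Dereferencing a NULL pointer object raises `ValueError` instead of crashing. A short sketch:

```python
import ctypes

x = ctypes.c_int(10)
ptr_x = ctypes.pointer(x)

print(ptr_x.contents.value, ptr_x[0])  # 10 10

ptr_x[0] = 99   # equivalent to *ptr_x = 99 in C
print(x.value)  # 99

null_ptr = ctypes.POINTER(ctypes.c_int)()  # a NULL int*
print(bool(null_ptr))  # False
try:
    null_ptr.contents
except ValueError as exc:
    print("error:", exc)  # error: NULL pointer access
```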

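Handling NULL pointers:

In C, a NULL pointer marks the absence of a valid address. In ctypes, passing None where a pointer is expected passes NULL, and a NULL pointer object such as ctypes.POINTER(ctypes.c_int)() evaluates to False. Unless the C function is documented to accept NULL, check the pointer before passing it so the program does not crash.

```
# Assume the C function is declared as: void process_data(int *data);
my_lib.process_data.argtypes = [ctypes.POINTER(ctypes.c_int)]
my_lib.process_data.restype = None

data_ptr = None  # e.g. a NULL result from an earlier call

# Truth testing catches both None and NULL pointer objects
if data_ptr:
    my_lib.process_data(data_ptr)
else:
    print("Null pointer detected, skipping function call.")
```

A more robust approach is for the C code itself to check for NULL.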
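To see what a NULL check on the C side looks like from Python, the sketch below builds a hypothetical stand-in for a `process_data` function with `ctypes.CFUNCTYPE`. It returns int here so the result can be observed. Passing `None` delivers a NULL pointer, which the function detects by truth testing:

```python
import ctypes

# Hypothetical stand-in for `int process_data(int *data);`, returning -1 on NULL
@ctypes.CFUNCTYPE(ctypes.c_int, ctypes.POINTER(ctypes.c_int))
def process_data(data):
    if not data:  # NULL pointer: what `if (data == NULL)` does in C
        return -1
    return data[0] * 2

print(process_data(None))                            # -1
print(process_data(ctypes.byref(ctypes.c_int(21))))  # 42
```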

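4. Arrays and Pointers

In C, an array name implicitly converts to a pointer to the array's first element. ctypes likewise lets you use arrays where pointers are expected.

Creating arrays:

Multiplying a ctypes type by a positive integer creates an array type (a subclass of ctypes.Array). Calling that type creates an array instance.

```
# Create an array of 5 ints
IntArray5 = ctypes.c_int * 5
my_array = IntArray5(1, 2, 3, 4, 5)
```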
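Array instances behave much like fixed-size Python sequences. Here is a minimal sketch of building one from a Python list and reading it back:

```python
import ctypes

values = [1, 2, 3, 4, 5]
my_array = (ctypes.c_int * len(values))(*values)  # unpack the list into the constructor

print(len(my_array))   # 5
print(list(my_array))  # [1, 2, 3, 4, 5]
print(my_array[1:3])   # [2, 3]  (slicing copies into a Python list)
print(ctypes.sizeof(my_array) == 5 * ctypes.sizeof(ctypes.c_int))  # True

my_array[0] = 10       # elements are mutable; out-of-range indexes raise IndexError
print(list(my_array))  # [10, 2, 3, 4, 5]
```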

Arrays as pointers:

You can pass an array to a C function that takes a pointer parameter.

```
# Assume the C function is declared as: int sum_array(int *arr, int size);
my_lib.sum_array.argtypes = [ctypes.POINTER(ctypes.c_int), ctypes.c_int]
my_lib.sum_array.restype = ctypes.c_int

result = my_lib.sum_array(my_array, len(my_array))
print(result)  # prints 15
```

Here, my_array is implicitly converted to a pointer to its first element.
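This self-contained sketch uses a hypothetical `CFUNCTYPE` stand-in for `sum_array`. A `POINTER(c_int)` parameter accepts a `c_int` array directly. When you need an actual pointer object, `ctypes.cast` performs the same array-to-pointer conversion explicitly.

```python
import ctypes

# Stand-in for `int sum_array(int *arr, int size);`
@ctypes.CFUNCTYPE(ctypes.c_int, ctypes.POINTER(ctypes.c_int), ctypes.c_int)
def sum_array(arr, size):
    return sum(arr[i] for i in range(size))

my_array = (ctypes.c_int * 5)(1, 2, 3, 4, 5)
print(sum_array(my_array, len(my_array)))  # 15

# Explicit decay: a POINTER(c_int) aimed at element 0
first = ctypes.cast(my_array, ctypes.POINTER(ctypes.c_int))
print(first[0], first[4])  # 1 5
```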

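Returning arrays from C functions:

If a C function returns a pointer to an array, declare the matching pointer return type. Python has no way to know how many elements the pointer refers to, so the length must come from somewhere else (here, the size we asked for). The memory itself belongs to the C side.

```
# Assume the C function is declared as: int* create_array(int size);
my_lib.create_array.argtypes = [ctypes.c_int]
my_lib.create_array.restype = ctypes.POINTER(ctypes.c_int)

size = 10
ptr = my_lib.create_array(size)
if not ptr:
    raise MemoryError("create_array returned NULL")

# Copy the C array into a Python list; slicing a pointer needs an explicit end index
values = ptr[:size]

# Print the results
for v in values:
    print(v)

# Free the C array's memory (assumes the C code allocated it and provides a free_array function)
my_lib.free_array.argtypes = [ctypes.POINTER(ctypes.c_int)]
my_lib.free_array.restype = None
my_lib.free_array(ptr)
```

Important: if the C code allocates memory dynamically (for example with malloc), the Python code must release it explicitly to avoid a memory leak. Preferably, do this through a deallocation function exported by the same library. The example above assumes the C code provides free_array for this purpose.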
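Since `create_array` and `free_array` are hypothetical, the following sketch imitates the same ownership pattern with the C standard library's `malloc` and `free`, assuming Linux or macOS. The buffer is allocated in C, filled and read through a typed pointer, and copied out. It is then freed in a `finally` block so that it is released even if an error occurs.

```python
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.malloc.argtypes = [ctypes.c_size_t]
libc.malloc.restype = ctypes.c_void_p  # raw address as int (None for NULL); the int default would truncate it
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None

size = 10
raw = libc.malloc(size * ctypes.sizeof(ctypes.c_int))
if not raw:
    raise MemoryError("malloc failed")

try:
    ptr = ctypes.cast(raw, ctypes.POINTER(ctypes.c_int))  # view the block as int*
    for i in range(size):
        ptr[i] = i * i
    values = ptr[:size]  # copy out before the memory is freed
finally:
    libc.free(raw)

print(values)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```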

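5. Structures and Pointers

Structures are a common way to organize data in C. ctypes lets us define Python classes that correspond to C structs.

Defining a structure:

```
class Point(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]

# Create a Point instance
p = Point(10, 20)
print(p.x, p.y)  # prints 10 20
```

_fields_ is a list of (name, type) pairs for the struct's members, given in the same order as in the C declaration.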
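ctypes lays out Structure fields using the platform's native C alignment, so padding appears where a C compiler would normally insert it. The field descriptors expose each member's offset. In this short sketch, the `Mixed` struct is made up for illustration.

```python
import ctypes

class Point(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]

# Hypothetical struct: a char followed by an int, which forces padding after the char
class Mixed(ctypes.Structure):
    _fields_ = [("tag", ctypes.c_char), ("value", ctypes.c_int)]

print(Point.x.offset, Point.y.offset, ctypes.sizeof(Point))  # 0 4 8 on common platforms
print(Mixed.value.offset == ctypes.alignment(ctypes.c_int))  # True: 'value' is aligned
print(ctypes.sizeof(Mixed))
```

If the C side packs its struct (for example with `#pragma pack`), set `_pack_` on the Structure subclass to match.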

Structure pointers:

```
# Create a pointer to the Point structure
PointPtr = ctypes.POINTER(Point)
ptr_p = ctypes.pointer(p)  # ctypes.byref(p) also works as a call argument, but it has no .contents

# Assume the C function is declared as: void move_point(Point *p, int dx, int dy);
my_lib.move_point.argtypes = [PointPtr, ctypes.c_int, ctypes.c_int]
my_lib.move_point.restype = None

my_lib.move_point(ptr_p, 5, -2)
print(p.x, p.y)  # prints 15 18, because the C function modified p
print(ptr_p.contents.x, ptr_p.contents.y)  # prints 15 18
```
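Here is a self-contained version that uses a hypothetical `CFUNCTYPE` stand-in for `move_point`. It modifies the struct through the pointer just as the C function would:

```python
import ctypes

class Point(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]

# Stand-in for `void move_point(Point *p, int dx, int dy);`
@ctypes.CFUNCTYPE(None, ctypes.POINTER(Point), ctypes.c_int, ctypes.c_int)
def move_point(p, dx, dy):
    p.contents.x += dx  # p->x += dx
    p.contents.y += dy  # p->y += dy

pt = Point(10, 20)
move_point(ctypes.byref(pt), 5, -2)
print(pt.x, pt.y)  # 15 18
```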

Nested structures:

```
class Rectangle(ctypes.Structure):
    _fields_ = [("top_left", Point), ("bottom_right", Point)]

rect = Rectangle(Point(0, 0), Point(100, 100))
print(rect.top_left.x, rect.bottom_right.y)  # prints 0 100
```
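Reading a nested struct field returns an object that shares memory with the outer struct, not a copy. Assignments through that object therefore change the outer structure, as this small sketch shows:

```python
import ctypes

class Point(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]

class Rectangle(ctypes.Structure):
    _fields_ = [("top_left", Point), ("bottom_right", Point)]

rect = Rectangle(Point(0, 0), Point(100, 100))

corner = rect.bottom_right  # a view into rect's memory, not a copy
corner.x = 50
print(rect.bottom_right.x)  # 50

print(Rectangle.bottom_right.offset == ctypes.sizeof(Point))  # True
print(ctypes.sizeof(Rectangle) == 2 * ctypes.sizeof(Point))   # True
```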

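6. Callback Functions

ctypes lets us pass Python functions to C code as callbacks.

Defining a callback type:

```
# Define a callback type that takes two int parameters and returns an int
CallbackType = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_int)

# Define a Python function
def my_callback(x, y):
    print("Callback called with", x, y)
    return x + y

# Create a callback function instance
callback_func = CallbackType(my_callback)
```

ctypes.CFUNCTYPE defines a callback type. Its first argument is the return type, and the remaining arguments are the parameter types. It uses the cdecl calling convention; on Windows, ctypes.WINFUNCTYPE creates stdcall callbacks.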
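A callback instance is an ordinary C function pointer. You can call it from Python, and its address is what C code receives. The sketch below uses only ctypes, with no external library. It takes the raw address and wraps it again with the same prototype, which is essentially what a C caller does through `int (*callback)(int, int)`:

```python
import ctypes

CallbackType = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_int)

@CallbackType
def add(x, y):
    return x + y

print(add(1, 2))  # 3, called directly from Python

# The raw function pointer value that C code would receive
address = ctypes.cast(add, ctypes.c_void_p).value

# Rebuild a callable from the bare address, as a C caller would use it
from_address = CallbackType(address)
print(from_address(2, 3))  # 5
```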

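Passing a callback to C code:

```
# Assume the C function is declared as: int call_callback(int x, int y, int (*callback)(int, int));
my_lib.call_callback.argtypes = [ctypes.c_int, ctypes.c_int, CallbackType]
my_lib.call_callback.restype = ctypes.c_int

result = my_lib.call_callback(1, 2, callback_func)
print(result)  # prints 3 (the return value of callback_func)
```

The C code then calls the Python callback we defined. Keep a reference to callback_func for as long as the C code might call it. If the callback object is garbage-collected while C still holds its address, the program will crash when C makes the call.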
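For a runnable counterpart, this sketch uses a real C function that takes a callback: the C standard library's `qsort`, assuming Linux or macOS. The comparison function receives pointers to two elements and returns a negative, zero, or positive int:

```python
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))

# int (*compar)(const void *, const void *), specialised here to int elements
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int))

# void qsort(void *base, size_t nmemb, size_t size, compar)
libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]
libc.qsort.restype = None

def py_cmp(a, b):
    return (a[0] > b[0]) - (a[0] < b[0])

cmp_func = CMPFUNC(py_cmp)  # keep this reference alive while C may call it

data = (ctypes.c_int * 6)(5, 1, 7, 33, 99, -3)
libc.qsort(data, len(data), ctypes.sizeof(ctypes.c_int), cmp_func)
print(list(data))  # [-3, 1, 5, 7, 33, 99]
```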

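7. String Handling

Strings in C are usually represented as char* or const char*. ctypes provides the c_char_p type for C strings.

Passing a Python string to C code:

```
# Assume the C function is declared as: int string_length(const char *str);
my_lib.string_length.argtypes = [ctypes.c_char_p]
my_lib.string_length.restype = ctypes.c_int

my_string = "Hello, world!"
# Encode the Python str to bytes
encoded_string = my_string.encode('utf-8')
result = my_lib.string_length(encoded_string)
print(result)  # prints 13
```

The Python str must be encoded to bytes because c_char_p expects a byte sequence, not a str. ctypes passes a pointer to the bytes object's internal buffer, which must not be modified. A C function that writes into its argument therefore needs a mutable buffer from ctypes.create_string_buffer instead.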
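This runnable sketch uses the C standard library's `strlen` in place of the hypothetical `string_length`, assuming Linux or macOS. It shows that a C length function counts encoded bytes, not characters, and what `create_string_buffer` gives you:

```python
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen("Hello, world!".encode("utf-8")))  # 13
print(libc.strlen("héllo".encode("utf-8")))          # 6: 'é' is two bytes in UTF-8
print(len("héllo"))                                  # 5 characters

# A writable, NUL-terminated buffer that C code may modify
buf = ctypes.create_string_buffer(b"Hello", 16)  # 16 bytes, zero-filled after the text
print(ctypes.sizeof(buf), buf.value)             # 16 b'Hello'
print(libc.strlen(buf))                          # 5
```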

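Returning strings from C code:

```
# Assume the C function is declared as: char* get_string();
my_lib.get_string.argtypes = []
my_lib.get_string.restype = ctypes.c_char_p

result = my_lib.get_string()
# Decode the C string into a Python str
if result:
    decoded_string = result.decode('utf-8')
    print(decoded_string)
else:
    print("Received a NULL pointer from C function")
```

The returned C string has to be decoded into a Python str, and you must check for a NULL return. With restype set to c_char_p, ctypes copies the data into a bytes object (or returns None for NULL) and discards the original pointer. This pattern therefore suits strings whose memory the C side keeps ownership of, such as static strings.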
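When the C function returns memory that the caller must free, declare `restype` as `c_void_p` instead so the address survives. Read the text with `ctypes.string_at`, then free the memory. This sketch uses POSIX `strdup` and `free` from the C standard library as a stand-in for such a function, assuming Linux or macOS:

```python
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))

# char *strdup(const char *s);  returns malloc'ed memory owned by the caller
libc.strdup.argtypes = [ctypes.c_char_p]
libc.strdup.restype = ctypes.c_void_p  # keep the raw address so it can be freed
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None

raw = libc.strdup(b"Hello from C")
if not raw:
    raise MemoryError("strdup returned NULL")
try:
    text = ctypes.string_at(raw).decode("utf-8")  # copy bytes up to the NUL terminator
finally:
    libc.free(raw)

print(text)  # Hello from C
```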

8. `void*` Pointers

void* is C's generic pointer and can point to data of any type. ctypes provides the c_void_p type to represent void* pointers.

Using `void*` pointers:

```
# Assume the C function is declared as: void process_data(void *data, int type);
my_lib.process_data.argtypes = [ctypes.c_void_p, ctypes.c_int]
my_lib.process_data.restype = None

# Process integer data
int_data = ctypes.c_int(123)
my_lib.process_data(ctypes.byref(int_data), 1)  # type=1 means integer

# Process floating-point data
float_data = ctypes.c_float(3.14)
my_lib.process_data(ctypes.byref(float_data), 2)  # type=2 means float

# Process string data
string_data = "Hello".encode('utf-8')
my_lib.process_data(ctypes.create_string_buffer(string_data), 3)  # type=3 means string
```

Handle data types carefully when working with void* pointers. Typically you pass a type identifier along with the pointer so the C code knows how to interpret the data it points to.
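On the receiving side, `ctypes.cast` turns a `void*` back into a typed pointer. The sketch below is a hypothetical stand-in for `process_data` that dispatches on the type flag. It returns a double instead of void so the result can be checked:

```python
import ctypes

# Hypothetical stand-in for `double process_data(void *data, int type);`
@ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_void_p, ctypes.c_int)
def process_data(data, kind):
    if not data:  # c_void_p arguments arrive as an int address, or None for NULL
        return -1.0
    if kind == 1:
        return float(ctypes.cast(data, ctypes.POINTER(ctypes.c_int)).contents.value)
    if kind == 2:
        return ctypes.cast(data, ctypes.POINTER(ctypes.c_float)).contents.value
    if kind == 3:
        return float(len(ctypes.string_at(data)))  # length of the NUL-terminated string
    return -1.0

print(process_data(ctypes.byref(ctypes.c_int(123)), 1))        # 123.0
print(process_data(ctypes.byref(ctypes.c_float(0.5)), 2))      # 0.5
print(process_data(ctypes.create_string_buffer(b"Hello"), 3))  # 5.0
```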

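9. Error Handling

Error handling is important when calling C code through ctypes, because an error in C code can crash the whole Python process.

Checking return values:

Many C functions report success or failure through their return value. Check it, and raise an exception when it signals an error.

```
# Assume the C function returns 0 on success and non-zero on failure
result = my_lib.some_function()
if result != 0:
    raise RuntimeError("C function failed with error code: {}".format(result))
```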
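Many C library functions also set `errno` on failure. Loading the library with `use_errno=True` makes ctypes save `errno` right after each call, and `ctypes.get_errno()` reads that saved value. This sketch uses the C standard library's `fopen`, assuming Linux or macOS. The helper `open_for_reading` is hypothetical, and the path used is assumed not to exist.

```python
import ctypes
import ctypes.util
import errno
import os

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

# FILE *fopen(const char *path, const char *mode);  FILE* is treated as an opaque void*
libc.fopen.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
libc.fopen.restype = ctypes.c_void_p

def open_for_reading(path):
    handle = libc.fopen(path, b"r")
    if not handle:
        err = ctypes.get_errno()  # errno saved by ctypes right after the call
        raise OSError(err, os.strerror(err), path.decode())
    return handle  # the caller is responsible for fclose()

try:
    open_for_reading(b"/nonexistent-dir/file.txt")
except OSError as exc:
    print(exc.errno == errno.ENOENT)  # True
    print(exc)
```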

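Dealing with crashes in C code:

C has no exceptions. On Linux and macOS, a fault in C code, such as an invalid memory access, is not turned into a Python exception. The process simply crashes, so a try...except block cannot catch it. On Windows, ctypes uses structured exception handling to turn some access violations into OSError, but you should not rely on this. Test the C code carefully and validate arguments before they cross into C. Running Python with -X faulthandler at least prints a Python traceback when a crash happens.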
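Using errcheck:

ctypes provides the errcheck hook, which runs custom error checking after each call to a foreign function.
```
def check_result(result, func, args):
    if result == -1:
        raise OSError("C function failed")
    return result

my_lib.my_function.errcheck = check_result
```
The errcheck function receives three arguments: the return value (already converted according to restype), the function object, and a tuple of the arguments passed to the call. Whatever it returns becomes the result of the call, so it normally returns result unchanged and raises an exception to report an error.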
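In this self-contained sketch, `errcheck` is attached to a `CFUNCTYPE` stand-in; it works on any ctypes function object. The stand-in is the hypothetical `checked_div` below, which follows the common C convention of returning -1 on failure.

```python
import ctypes

# Hypothetical stand-in for `int checked_div(int a, int b);`, returning -1 on error
@ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_int)
def checked_div(a, b):
    return -1 if b == 0 else a // b

def check_result(result, func, args):
    if result == -1:
        raise ValueError(f"C function failed for arguments {args}")
    return result  # becomes the return value of the call

checked_div.errcheck = check_result

print(checked_div(10, 2))  # 5
try:
    checked_div(1, 0)
except ValueError as exc:
    print(exc)  # C function failed for arguments (1, 0)
```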

10. Example: A Dynamic Array

Here is a more complete example that uses ctypes to call C code that operates on a dynamic array.

C code (dynamic_array.c):

```
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int *data;
    int size;
    int capacity;
} DynamicArray;

DynamicArray* create_dynamic_array(int initial_capacity) {
    DynamicArray* arr = (DynamicArray*)malloc(sizeof(DynamicArray));
    if (arr == NULL) {
        return NULL;
    }
    arr->data = (int*)malloc(initial_capacity * sizeof(int));
    if (arr->data == NULL) {
        free(arr);
        return NULL;
    }
    arr->size = 0;
    arr->capacity = initial_capacity;
    return arr;
}

int add_element(DynamicArray* arr, int element) {
    if (arr == NULL) {
        return -1; // Error: Null pointer
    }
    if (arr->size == arr->capacity) {
        // Resize the array (grow to 1 if the capacity is 0, otherwise double it)
        int new_capacity = arr->capacity > 0 ? arr->capacity * 2 : 1;
        int *new_data = (int*)realloc(arr->data, new_capacity * sizeof(int));
        if (new_data == NULL) {
            return -2; // Error: Reallocation failed
        }
        arr->data = new_data;
        arr->capacity = new_capacity;
    }
    arr->data[arr->size] = element;
    arr->size++;
    return 0; // Success
}

int get_element(DynamicArray* arr, int index, int *element) {
    if (arr == NULL) {
        return -1; // Error: Null pointer
    }
    if (index < 0 || index >= arr->size) {
        return -2; // Error: Index out of bounds
    }
    *element = arr->data[index];
    return 0; // Success
}

int get_size(DynamicArray* arr) {
    if (arr == NULL) {
        return -1; // Indicate error
    }
    return arr->size;
}

void free_dynamic_array(DynamicArray* arr) {
    if (arr != NULL) {
        free(arr->data);
        free(arr);
    }
}
```

Compiling the C code:

```
gcc -shared -fPIC -o dynamic_array.so dynamic_array.c  # Linux/macOS
# or
# gcc -shared -o dynamic_array.dll dynamic_array.c  # Windows (MinGW)
```

Python code:

```
import ctypes

# Load the shared library
dynamic_array_lib = ctypes.CDLL("./dynamic_array.so")  # or "./dynamic_array.dll" on Windows

# Mirror the C struct; field order and types must match the C definition exactly
class DynamicArray(ctypes.Structure):
    _fields_ = [("data", ctypes.POINTER(ctypes.c_int)),
                ("size", ctypes.c_int),
                ("capacity", ctypes.c_int)]

# Declare the function signatures
dynamic_array_lib.create_dynamic_array.argtypes = [ctypes.c_int]
dynamic_array_lib.create_dynamic_array.restype = ctypes.POINTER(DynamicArray)

dynamic_array_lib.add_element.argtypes = [ctypes.POINTER(DynamicArray), ctypes.c_int]
dynamic_array_lib.add_element.restype = ctypes.c_int

dynamic_array_lib.get_element.argtypes = [ctypes.POINTER(DynamicArray), ctypes.c_int, ctypes.POINTER(ctypes.c_int)]
dynamic_array_lib.get_element.restype = ctypes.c_int

dynamic_array_lib.get_size.argtypes = [ctypes.POINTER(DynamicArray)]
dynamic_array_lib.get_size.restype = ctypes.c_int

dynamic_array_lib.free_dynamic_array.argtypes = [ctypes.POINTER(DynamicArray)]
dynamic_array_lib.free_dynamic_array.restype = None

# Create the dynamic array
initial_capacity = 2
arr = dynamic_array_lib.create_dynamic_array(initial_capacity)

if not arr:
    raise OSError("Failed to create dynamic array")

try:
    # Add elements and check each return code; the third call triggers a resize in C
    for value in (10, 20, 30):
        if dynamic_array_lib.add_element(arr, value) != 0:
            raise RuntimeError(f"add_element({value}) failed")

    # Get an element through an out-parameter
    element = ctypes.c_int()
    result = dynamic_array_lib.get_element(arr, 1, ctypes.byref(element))
    if result == 0:
        print("Element at index 1:", element.value)  # prints 20
    else:
        print("Failed to get element, error code:", result)

    # Get the size
    size = dynamic_array_lib.get_size(arr)
    print("Array size:", size)  # prints 3

    # Because the struct layout is mirrored, the fields can also be read directly
    print("Capacity:", arr.contents.capacity)                # prints 4
    print("Data:", arr.contents.data[:arr.contents.size])  # prints [10, 20, 30]
finally:
    # Free the C memory exactly once, even if an error occurred above
    dynamic_array_lib.free_dynamic_array(arr)
```

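Summary
Mastering the ctypes library is essential for calling C shared libraries, and understanding how C data types map to ctypes types is the foundation. The key skill is handling pointers correctly: creating, passing, and dereferencing them, and checking for NULL.

Key Points
Declaring function signatures correctly is essential; otherwise the results are unpredictable. Memory management matters: if the C code allocates memory, the Python side must make sure it is released. Error handling is mandatory if you want to avoid crashes.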
