Hallucination Defense in Function Calling: How to Force an LLM to Generate Only Tool Arguments Within a Predefined Schema - 智猿学院


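Hello to all the programmers, architects, and developers who are enthusiastic about large language model (LLM) applications!

Today we take a close look at a topic that is critical when building LLM-based applications: how to defend against hallucinations in function calling, and how to force the model to generate output that strictly follows our predefined tool parameter Schema. As LLM capabilities have grown, function calling has become the main bridge through which a model interacts with external systems and extends what it can do. That capability carries risk: the model may invent functions that do not exist, use wrong parameter names, produce values of the wrong type, or emit malformed data. These hallucinations undermine application stability and can also lead to security vulnerabilities and unpredictable behavior.

This lecture presents a systematic set of defenses from a programmer's point of view, covering Schema design, up-front constraints, post-generation validation, and intelligent retries, with code examples throughout so the ideas can be turned into working solutions. The goal is a layered line of defense that makes the LLM a reliable tool executor rather than an unpredictable "illusionist".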

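1. Function Calling: The Bridge Between LLMs and the Outside World

Let us first review how function calling works. Function calling, also called tool use, is a core capability of modern LLMs. After recognizing the user's intent, the model generates a structured JSON object that names the function to call and supplies its arguments. The application parses this JSON, executes the corresponding function, and feeds the result back to the LLM, which makes complex multi-step interactions possible.

The workflow is roughly as follows:

Tool Definition: The developer gives the LLM a list of available tools (functions). Each tool has a unique name, a natural-language description of what it does, and, most importantly, a JSON Schema describing its parameters.
User Request: The user sends the LLM a request in natural language.
Intent Recognition & Tool Selection: The LLM analyzes the request and decides whether a tool call is needed to satisfy it.
Parameter Generation: If the LLM decides to call a tool, it uses that tool's JSON Schema and the information in the request to generate a JSON object containing the function name and argument values.
External Execution: The application receives the JSON object, parses it, calls the real backend function, and collects the result.
Result Feedback: The execution result is sent back to the LLM as context for the rest of the conversation.

This mechanism greatly widens what an LLM application can do: querying databases, sending email, booking flights, controlling smart-home devices, and so on. It ties "thinking" to "acting".

A simple tool definition example: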

# Assume an OpenAI-API-style tool definition
tools_definition = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current real-time weather for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g. Beijing, Shanghai"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit, Celsius or Fahrenheit"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

# The LLM might generate a call like this.
# Note that "arguments" is a JSON *string*, so its inner quotes are escaped:
# {
#   "tool_calls": [
#     {
#       "id": "call_abc123",
#       "function": {
#         "name": "get_current_weather",
#         "arguments": "{\"location\": \"New York\", \"unit\": \"fahrenheit\"}"
#       },
#       "type": "function"
#     }
#   ]
# }
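
The external-execution and result-feedback steps described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: `TOOL_REGISTRY`, `execute_tool_call`, and the canned `get_current_weather` backend are hypothetical names introduced here, and the sketch deliberately skips the validation that later sections of this lecture add.

```python
import json

def get_current_weather(location: str, unit: str = "celsius") -> dict:
    """Hypothetical backend function; returns canned data for illustration."""
    return {"location": location, "unit": unit, "temperature": 22}

# Hypothetical registry mapping tool names to Python callables.
TOOL_REGISTRY = {"get_current_weather": get_current_weather}

def execute_tool_call(tool_call: dict) -> dict:
    """Run one tool call and build the `tool` message that goes back to the LLM."""
    name = tool_call["function"]["name"]
    # The arguments arrive as a JSON string and must be decoded first.
    args = json.loads(tool_call["function"]["arguments"])
    result = TOOL_REGISTRY[name](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result, ensure_ascii=False),
    }

example_call = {
    "id": "call_abc123",
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "arguments": '{"location": "New York", "unit": "fahrenheit"}',
    },
}
tool_message = execute_tool_call(example_call)
print(tool_message)
```

Every line of `execute_tool_call` trusts the model's output: the name lookup, the JSON decoding, and the keyword expansion can each fail on a hallucinated call, which is exactly what the rest of this lecture guards against.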

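2. "Hallucinations" in Function Calling: The Risks

Powerful as function calling is, an LLM generates text probabilistically. It is not a strict logic engine, so tool calls can go wrong in several ways. The resulting arguments do not match expectations, which causes application errors or worse.

Hallucinations in function calling fall roughly into the following categories:

Function Name Hallucination: The LLM generates a function name that is not in the tools list.
- Example: The user asks to "look up a stock price", but the LLM tries to call send_email, which was never registered in the tools list.
Argument Name Hallucination: The LLM generates parameter names that the tool Schema does not define.
- Example: get_current_weather only takes location and unit, but the LLM generates {"city": "Beijing", "temp_unit": "celsius"}.
Argument Type Hallucination: The LLM generates a value whose data type does not match the Schema.
- Example: location should be a string, but the LLM generates 12345 (number). unit should be an enum string, but the LLM generates {"value": "celsius"} (object).
Argument Value Hallucination (within type): The LLM generates a value of the correct type that is wrong semantically or in terms of business logic. This is the most subtle category and the hardest to defend against.
- Example: unit should be celsius or fahrenheit, but the LLM generates kelvin. product_id should be an ID that exists in the database, but the LLM generates a random ID that does not exist.
Missing Required Arguments: The LLM omits a parameter that the Schema marks as required.
- Example: get_current_weather needs location, but the LLM generates only {"unit": "celsius"}.
Extra/Unknown Arguments: The LLM generates surplus parameters. JSON Schema allows additional properties by default, so these are rejected only when the Schema sets additionalProperties: false.
- Example: Besides location and unit, the LLM adds a third field: {"location": "Beijing", "unit": "celsius", "time": "now"}.
Malformed JSON: The arguments string generated by the LLM is not valid JSON.
- Example: {"location": "Beijing", "unit": "celsius" (missing the closing brace).

At best these hallucinations crash the program. At worst they cause business-logic errors such as sending email to the wrong address or deleting the wrong data. A strong defense mechanism is therefore a foundation of any production-grade LLM application.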
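
To make the categories above concrete, here is a minimal standard-library sketch that labels a raw tool call with the kinds of hallucination it exhibits. It handles only flat schemas like the weather example; `classify_tool_call` and the label strings are hypothetical names chosen for illustration, and a real system would more likely rely on a full JSON Schema validator.

```python
import json

WEATHER_SCHEMA = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["location"],
}
KNOWN_TOOLS = {"get_current_weather": WEATHER_SCHEMA}
JSON_TYPES = {"string": str, "integer": int, "number": (int, float),
              "boolean": bool, "object": dict, "array": list}

def has_type(value, type_name: str) -> bool:
    """Check a decoded JSON value against a JSON Schema type name."""
    if isinstance(value, bool) and type_name != "boolean":
        return False  # in Python, bool is a subclass of int; JSON keeps them apart
    return isinstance(value, JSON_TYPES[type_name])

def classify_tool_call(name: str, arguments: str) -> list[str]:
    """Return a list of hallucination labels; an empty list means no issue found."""
    if name not in KNOWN_TOOLS:
        return ["function_name"]
    try:
        args = json.loads(arguments)
    except json.JSONDecodeError:
        return ["malformed_json"]
    if not isinstance(args, dict):
        return ["wrong_type:arguments"]
    schema = KNOWN_TOOLS[name]
    props = schema["properties"]
    problems = [f"missing_required:{key}"
                for key in schema.get("required", []) if key not in args]
    for key, value in args.items():
        if key not in props:
            problems.append(f"unknown_argument:{key}")
        elif not has_type(value, props[key]["type"]):
            problems.append(f"wrong_type:{key}")
        elif "enum" in props[key] and value not in props[key]["enum"]:
            problems.append(f"invalid_value:{key}")
    return problems

print(classify_tool_call("get_current_weather", '{"city": "Beijing", "temp_unit": "celsius"}'))
```

Note that a wrong argument name usually shows up as two findings at once: the expected key is missing and an unknown key is present.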

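3. First Line of Defense: Solid Schema Definitions and Prompt Engineering

The first and most important step is to work at the source: the tool definitions and the prompt. A clear, precise, tightly constrained JSON Schema combined with a suitable prompt noticeably lowers the probability of hallucination.

3.1 Making Full Use of JSON Schema

JSON Schema does more than describe a data structure. It is also a capable validation language, and its keywords let us give the LLM very explicit parameter constraints.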

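JSON Schema keyword	Purpose	Example	Hallucination defense
`type`	Defines the data type of a parameter (`string`, `number`, `integer`, `boolean`, `array`, `object`)	`"type": "string"`	Guards against type hallucination
`description`	Describes the parameter's purpose and expected values in detail	`"description": "City name entered by the user, e.g. Beijing, Shanghai"`	Helps the LLM understand the parameter and reduces semantic errors
`required`	Lists the parameters that must be provided	`"required": ["location", "product_id"]`	Guards against missing required arguments
`enum`	Restricts the value to one of a predefined list	`"enum": ["celsius", "fahrenheit"]`	Guards against value hallucination (finite sets)
`pattern`	Constrains the format of a string with a regular expression	`"pattern": "^\\d{4}-\\d{2}-\\d{2}$"` (date format)	Guards against string-format hallucination
`minimum`/`maximum`	Restricts the range of a numeric parameter	`"minimum": 1, "maximum": 100`	Guards against out-of-range numbers
`minLength`/`maxLength`	Restricts the length of a string parameter	`"minLength": 2, "maxLength": 50`	Guards against string-length hallucination
`items`	Defines the Schema of each element of an array parameter	`"items": {"type": "string"}`	Guards against wrong array element types
`properties`	Defines the sub-properties of an object parameter and their Schemas		Defines object structure; used together with `additionalProperties`
`additionalProperties`	Key setting: `false` rejects any property the Schema does not define	`"additionalProperties": false`	Strongly recommended; guards against extra/unknown arguments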

A more complex tool Schema example:

complex_tool_definition = [
    {
        "type": "function",
        "function": {
            "name": "create_order",
            "description": "Create a new sales order",
            "parameters": {
                "type": "object",
                "properties": {
                    "customer_id": {
                        "type": "string",
                        "description": "Unique customer identifier; letters and digits only",
                        "pattern": "^[A-Za-z0-9]+$",
                        "minLength": 5,
                        "maxLength": 20
                    },
                    "items": {
                        "type": "array",
                        "description": "List of products in the order",
                        "minItems": 1,
                        "items": {
                            "type": "object",
                            "properties": {
                                "product_id": {
                                    "type": "string",
                                    "description": "Unique product identifier",
                                    # Raw string so that \d reaches the regex engine unchanged
                                    "pattern": r"^P-\d{3}$"
                                },
                                "quantity": {
                                    "type": "integer",
                                    "description": "Quantity; must be a positive integer",
                                    "minimum": 1
                                },
                                "price_per_unit": {
                                    "type": "number",
                                    "description": "Unit price, to two decimal places",
                                    "minimum": 0.01
                                }
                            },
                            "required": ["product_id", "quantity", "price_per_unit"],
                            "additionalProperties": False  # No extra properties on each item (Python False, serialized as JSON false)
                        }
                    },
                    "shipping_address": {
                        "type": "object",
                        "description": "Delivery address for the order",
                        "properties": {
                            "street": {"type": "string"},
                            "city": {"type": "string"},
                            "zip_code": {"type": "string", "pattern": r"^\d{5}$"}
                        },
                        "required": ["street", "city", "zip_code"],
                        "additionalProperties": False  # No extra properties on the address object
                    },
                    "payment_method": {
                        "type": "string",
                        "enum": ["credit_card", "paypal", "bank_transfer"],
                        "description": "Payment method"
                    },
                    "notes": {
                        "type": "string",
                        "description": "Order notes, optional",
                        "maxLength": 200
                    }
                },
                "required": ["customer_id", "items", "shipping_address", "payment_method"],
                "additionalProperties": False  # No extra properties on the order object as a whole
            }
        }
    }
]

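In this create_order example, pattern constrains the format of customer_id and product_id, minLength/maxLength constrain length, minItems constrains the array size, minimum constrains numeric ranges, and, most importantly, additionalProperties: false prevents the LLM from generating any undefined extra parameter. Together they give the LLM very clear boundaries.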
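
Because `additionalProperties` must be repeated on every nested object, it is easy to forget one. The following is a small sketch of a schema "linter" that walks a tool Schema and reports object nodes that still allow extra properties. `find_open_objects` is a hypothetical helper written for this lecture, and it only follows `properties` and `items`, which is enough for schemas shaped like the one above.

```python
def find_open_objects(schema: dict, path: str = "$") -> list[str]:
    """Return the paths of object schemas that do not set additionalProperties to False."""
    open_paths = []
    if schema.get("type") == "object":
        if schema.get("additionalProperties") is not False:
            open_paths.append(path)
        for name, sub_schema in schema.get("properties", {}).items():
            open_paths.extend(find_open_objects(sub_schema, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        open_paths.extend(find_open_objects(schema["items"], f"{path}[]"))
    return open_paths

# A trimmed-down order schema in which the item objects were left open by mistake.
order_schema = {
    "type": "object",
    "properties": {
        "customer_id": {"type": "string"},
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"product_id": {"type": "string"}},
            },
        },
        "shipping_address": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "additionalProperties": False,
        },
    },
    "additionalProperties": False,
}

print(find_open_objects(order_schema))  # ['$.items[]']
```

Running such a check in a unit test over every registered tool is one way to keep the "no extra properties" rule from silently eroding as schemas evolve.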

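3.2 Stronger Prompt Engineering

In addition to a strong Schema, a clear prompt helps steer the LLM toward following the rules.

System Message: State explicitly in the system message that strict compliance matters.

system_message = {
    "role": "system",
    "content": "You are a strict tool-execution assistant. Your task is to call the provided tools precisely according to the user's request and to strictly follow each tool's parameter Schema. Never generate undefined functions, undefined parameters, values of the wrong type or format, or extra parameters. If the request cannot be fulfilled with the provided tools, tell the user and decline."
}

Few-shot Examples: Where possible, provide a few high-quality tool call examples so the LLM can learn the correct pattern.
Negative Examples: Occasionally it also helps to show an incorrect call and point out what is wrong with it, so the LLM avoids similar mistakes. Use this sparingly, since it can confuse the model.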
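
A few-shot example can be supplied as a short, already-completed exchange placed before the real user request. The sketch below assumes the OpenAI-style message format used elsewhere in this lecture (an assistant message carrying `tool_calls`, answered by a `tool` message with the same `tool_call_id`); `build_messages` and the example contents are hypothetical.

```python
import json

def build_messages(system_prompt: str, user_query: str) -> list[dict]:
    """Assemble a conversation with one worked tool-call example before the real query."""
    example_call = {
        "id": "call_example_1",
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "arguments": json.dumps({"location": "Beijing", "unit": "celsius"}),
        },
    }
    few_shot = [
        {"role": "user", "content": "What is the weather in Beijing, in Celsius?"},
        # The example assistant turn shows a call that matches the Schema exactly.
        {"role": "assistant", "content": None, "tool_calls": [example_call]},
        {"role": "tool", "tool_call_id": example_call["id"],
         "content": json.dumps({"temperature": 22, "unit": "celsius"})},
        {"role": "assistant", "content": "It is currently 22°C in Beijing."},
    ]
    return [{"role": "system", "content": system_prompt},
            *few_shot,
            {"role": "user", "content": user_query}]

messages = build_messages("You are a strict tool-execution assistant.",
                          "How hot is it in New York, in Fahrenheit?")
print([m["role"] for m in messages])
```

Each example costs tokens on every request, so one or two well-chosen exchanges per tool are usually a more reasonable starting point than a long catalogue.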

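4. Second Line of Defense: Post-Generation Validation and Intelligent Retries

Even with an excellent Schema and prompt, the LLM remains a probabilistic model and will occasionally slip. Strict client-side validation of every tool call is therefore an indispensable second line of defense. It consists of JSON format validation, JSON Schema validation, and deeper business-logic validation.

4.1 JSON Format Validation

This is the most basic check. The arguments field generated by the LLM is a string, and it must parse as a valid JSON object.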

import json
from typing import Optional

def parse_llm_tool_call_arguments(arg_string: str) -> Optional[dict]:
    """Try to parse the LLM-generated argument string as JSON; return None on failure."""
    try:
        return json.loads(arg_string)
    except json.JSONDecodeError as e:
        print(f"Error: the LLM-generated argument string is not valid JSON: {e}")
        return None

# Example usage
faulty_arg_string = '{"location": "Beijing", "unit": "celsius"'  # missing closing }
parsed_args = parse_llm_tool_call_arguments(faulty_arg_string)
if parsed_args is None:
    print("JSON parsing failed; this error needs to be handled.")

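4.2 JSON Schema Validation: Strict Structure and Type Checks

Once the JSON string has been parsed, the next step is to run a JSON Schema validator to check structure, types, required fields, enum values, patterns, and so on against the expected Schema. In Python, the jsonschema library is the usual choice.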

from jsonschema import validate, ValidationError
import json

# The Schema of the get_current_weather tool defined earlier
weather_tool_schema = {
    "type": "object",
    "properties": {
        "location": {
            "type": "string",
            "description": "City name, e.g. Beijing, Shanghai"
        },
        "unit": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"],
            "description": "Temperature unit, Celsius or Fahrenheit"
        }
    },
    "required": ["location"],
    "additionalProperties": False  # Key setting: reject extra properties
}

def validate_llm_arguments_with_schema(args: dict, schema: dict) -> bool:
    """
    Validate LLM-generated arguments against a JSON Schema.
    Returns True if validation passes; prints the error and returns False otherwise.
    """
    try:
        validate(instance=args, schema=schema)
        return True
    except ValidationError as e:
        print(f"JSON Schema validation failed: {e.message}")
        print(f"Path: {' -> '.join(map(str, e.path))}")
        print(f"Schema path: {' -> '.join(map(str, e.schema_path))}")
        return False

# Simulated LLM-generated arguments
valid_args = {"location": "Shanghai", "unit": "celsius"}
invalid_type_args = {"location": 123, "unit": "celsius"}  # wrong type for location
invalid_enum_args = {"location": "Shenzhen", "unit": "kelvin"}  # unit not in the enum
missing_required_args = {"unit": "fahrenheit"}  # location is missing
extra_property_args = {"location": "Guangzhou", "unit": "celsius", "time": "now"}  # extra property

print("\n--- Validation examples ---")
print(f"Valid args: {validate_llm_arguments_with_schema(valid_args, weather_tool_schema)}")
print(f"Invalid type args: {validate_llm_arguments_with_schema(invalid_type_args, weather_tool_schema)}")
print(f"Invalid enum args: {validate_llm_arguments_with_schema(invalid_enum_args, weather_tool_schema)}")
print(f"Missing required args: {validate_llm_arguments_with_schema(missing_required_args, weather_tool_schema)}")
print(f"Extra property args: {validate_llm_arguments_with_schema(extra_property_args, weather_tool_schema)}")

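4.3 Business-Logic Validation: Checks Beyond the Schema

A JSON Schema validator can confirm that data is well formed, but it cannot confirm that the data makes business sense. It cannot know whether a product_id really exists in your product database, or whether a date falls inside the range your business allows. Checks of this kind have to consult your backend services or database.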

from datetime import datetime, timedelta

# A simple in-memory product database for illustration
PRODUCT_DB = {
    "P-001": {"name": "Laptop", "stock": 10},
    "P-002": {"name": "Mouse", "stock": 50}
}

def is_valid_product_id(product_id: str) -> bool:
    """Check whether the product ID exists in the database."""
    return product_id in PRODUCT_DB

def is_sufficient_stock(product_id: str, quantity: int) -> bool:
    """Check whether there is enough stock for the requested quantity."""
    if product_id in PRODUCT_DB:
        return PRODUCT_DB[product_id]["stock"] >= quantity
    return False

def is_valid_shipping_date(date_str: str) -> bool:
    """Check that the delivery date is in the future and within a reasonable window."""
    try:
        delivery_date = datetime.strptime(date_str, "%Y-%m-%d")
        now = datetime.now()
        # Assume delivery is only possible within the next 7 days
        return now < delivery_date < now + timedelta(days=7)
    except ValueError:
        return False

# Validate against business logic
def validate_create_order_business_logic(order_args: dict) -> list[str]:
    """Run business-logic checks on the arguments of the create_order tool."""
    errors = []

    # Validate the product list
    for item in order_args.get("items", []):
        product_id = item.get("product_id")
        quantity = item.get("quantity")

        if not is_valid_product_id(product_id):
            errors.append(f"Product ID '{product_id}' does not exist.")
        elif not is_sufficient_stock(product_id, quantity):
            errors.append(f"Insufficient stock for product '{product_id}'; requested quantity {quantity}.")

    # The shipping address could also be checked against the service area; omitted here
    # address = order_args.get("shipping_address")
    # if not is_serviceable_area(address.get("zip_code")):
    #     errors.append(f"ZIP code '{address.get('zip_code')}' is outside the service area.")

    return errors

# Example: create_order arguments generated by the LLM
order_args_from_llm = {
    "customer_id": "CUST123",
    "items": [
        {"product_id": "P-001", "quantity": 1, "price_per_unit": 999.99},
        {"product_id": "P-003", "quantity": 5, "price_per_unit": 10.00}  # P-003 does not exist
    ],
    "shipping_address": {"street": "Main St", "city": "Anytown", "zip_code": "12345"},
    "payment_method": "credit_card"
}

business_errors = validate_create_order_business_logic(order_args_from_llm)
if business_errors:
    print("\nBusiness-logic validation failed:")
    for error in business_errors:
        print(f"- {error}")
else:
    print("\nBusiness-logic validation passed.")

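4.4 Intelligent Retries and Error Feedback

When validation fails, we should not give up immediately. Instead, we feed the failure details (including the specific error messages) back to the LLM and ask it to correct the call. This forms a validate-feedback-retry loop.

Core idea:

Capture the LLM's initial tool call.
Run all validations (JSON parsing, Schema validation, business-logic validation).
If validation fails, build a new message containing the original user request, the LLM's previous incorrect output, and the detailed validation errors.
Send this new message to the LLM as part of the context and instruct it to correct the call based on the errors.
Cap the number of retries to prevent an infinite loop.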

import openai
import json
from jsonschema import validate, ValidationError

# Assume your OpenAI API key is already configured
# openai.api_key = "YOUR_OPENAI_API_KEY"

# Tool definition (simplified; reuses weather_tool_schema from above)
tools_definition = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current real-time weather for a given city",
            "parameters": weather_tool_schema
        }
    }
]

# Mock LLM API call
def call_llm_with_tools(messages: list, tools: list, tool_choice="auto"):
    """
    Mock of an OpenAI chat-completions call; returns tool_calls as plain dicts.
    In a real application this would call client.chat.completions.create
    (or openai.ChatCompletion.create in the legacy SDK).
    """
    # For example:
    # client = openai.OpenAI()
    # response = client.chat.completions.create(
    #     model="gpt-4-0125-preview",
    #     messages=messages,
    #     tools=tools,
    #     tool_choice=tool_choice
    # )
    # return response.choices[0].message.tool_calls

    # --- Mocked LLM responses ---
    def tool_call(call_id: str, arguments: str) -> list:
        return [{
            "id": call_id,
            "type": "function",
            "function": {"name": "get_current_weather", "arguments": arguments},
        }]

    user_prompt = messages[-1]["content"]
    if "New York" in user_prompt and "Fahrenheit" in user_prompt:
        return tool_call("call_1", '{"location": "New York", "unit": "fahrenheit"}')
    elif "Beijing" in user_prompt and "Celsius" in user_prompt:
        return tool_call("call_2", '{"location": "Beijing", "unit": "celsius"}')
    elif "Shanghai" in user_prompt and "no unit" in user_prompt:
        # Simulate the LLM emitting an extra, undefined parameter
        return tool_call("call_3", '{"location": "Shanghai", "unknown_param": "test"}')
    elif "fix your tool call" in user_prompt and "Additional properties" in user_prompt:
        # Simulate the corrected output after feedback; now matches the Schema
        return tool_call("call_4", '{"location": "Shanghai"}')
    else:
        return []  # No tool call

MAX_RETRIES = 3

def append_feedback(messages: list, attempted_call: str, error_message: str) -> None:
    """Record the failed attempt in the history and ask the LLM to correct it."""
    print(f"Validation failed: {error_message}")
    messages.append({"role": "assistant", "content": f"I attempted the tool call {attempted_call}, but it failed validation."})
    messages.append({"role": "user", "content": f"Please fix your tool call. Error: {error_message}"})

def process_user_request_with_retries(user_prompt: str, tools: list, tool_schemas: dict):
    messages = [{"role": "user", "content": user_prompt}]
    tool_call_successful = False
    retries = 0

    while not tool_call_successful and retries < MAX_RETRIES:
        print(f"\n--- Attempt {retries + 1}/{MAX_RETRIES} ---")

        # 1. Call the LLM
        llm_response_tool_calls = call_llm_with_tools(messages, tools)

        if not llm_response_tool_calls:
            print("The LLM did not request a tool call, or declined to call a tool.")
            tool_call_successful = True  # Nothing to validate; let the LLM answer as plain chat
            break

        first_tool_call = llm_response_tool_calls[0]  # We handle only the first tool call here
        # The mock returns dicts. The real OpenAI SDK returns objects, accessed as
        # first_tool_call.function.name and first_tool_call.function.arguments.
        function_name = first_tool_call["function"]["name"]
        arguments_str = first_tool_call["function"]["arguments"]
        attempted_call = f"`{function_name}` with arguments `{arguments_str}`"

        print(f"LLM suggests calling tool: {function_name}")
        print(f"Raw argument string: {arguments_str}")

        # 2. JSON parsing
        parsed_args = parse_llm_tool_call_arguments(arguments_str)
        if parsed_args is None:
            append_feedback(messages, attempted_call, "The generated arguments are not valid JSON.")
            retries += 1
            continue

        # 3. Tool name check and JSON Schema validation
        if function_name not in tool_schemas:
            append_feedback(messages, attempted_call, f"The tool `{function_name}` does not exist.")
            retries += 1
            continue

        try:
            validate(instance=parsed_args, schema=tool_schemas[function_name])
        except ValidationError as e:
            # Pass the validator's own message back so the LLM knows what to fix
            error_message = (
                "The tool arguments do not match the expected JSON Schema. "
                f"Details: {e.message} (path: {list(e.path)})"
            )
            append_feedback(messages, attempted_call, error_message)
            retries += 1
            continue

        # 4. Business-logic validation (if needed)
        # business_errors = validate_create_order_business_logic(parsed_args)  # for create_order
        # if business_errors:
        #     append_feedback(messages, attempted_call, "Business-logic validation failed: " + "; ".join(business_errors))
        #     retries += 1
        #     continue

        # All validations passed
        print(f"All validations passed. Ready to execute tool: {function_name}, arguments: {parsed_args}")
        # Actually execute the tool function
        # tool_execution_result = execute_tool(function_name, parsed_args)
        # Return the result to the LLM
        # messages.append({"role": "tool", "tool_call_id": first_tool_call["id"], "content": tool_execution_result})
        tool_call_successful = True  # Success; end the loop

    if not tool_call_successful:
        print(f"\nReached the maximum number of attempts ({MAX_RETRIES}); the tool call failed. Human intervention or a default action may be needed.")

    return tool_call_successful, messages

# Map tool names to Schemas for easy lookup
tool_schemas_map = {tool['function']['name']: tool['function']['parameters'] for tool in tools_definition}

# Test cases
print("--- Scenario 1: the LLM calls the tool correctly ---")
success, final_messages = process_user_request_with_retries("Tell me the current weather in New York in Fahrenheit.", tools_definition, tool_schemas_map)
print(f"Final result: {'success' if success else 'failure'}")

print("\n--- Scenario 2: the LLM generates an extra parameter (needs correction) ---")
success, final_messages = process_user_request_with_retries("Check the weather in Shanghai, no unit please.", tools_definition, tool_schemas_map)
print(f"Final result: {'success' if success else 'failure'}")

# To see real corrections, use a real LLM client
# from openai import OpenAI
# client = OpenAI()
#
# def call_llm_with_tools_real(messages: list, tools: list, tool_choice="auto"):
#     response = client.chat.completions.create(
#         model="gpt-4-0125-preview",
#         messages=messages,
#         tools=tools,
#         tool_choice=tool_choice
#     )
#     # Convert SDK objects to dicts so the loop above can use dict access
#     return [tc.model_dump() for tc in (response.choices[0].message.tool_calls or [])]
#
# # Replace the mock function
# # call_llm_with_tools = call_llm_with_tools_real
# # success, final_messages = process_user_request_with_retries("Check the weather in Shanghai, no unit please.", tools_definition, tool_schemas_map)

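The process_user_request_with_retries function above builds a retry loop. Each time validation fails, the detailed error is added to the conversation history and sent to the LLM as a new user message that tells it what to correct. This makes the system considerably more robust against hallucinated tool calls.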
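
An alternative way to feed the error back, sketched here under the assumption of an OpenAI-style chat API, is to replay the model's own tool call and answer it with a `tool` message whose content is the validation error. In that message format each assistant tool call is expected to be followed by a `tool` message carrying the same `tool_call_id`. `build_retry_messages` is a hypothetical helper, not part of any SDK.

```python
import json

def build_retry_messages(messages: list[dict], tool_call: dict, error_message: str) -> list[dict]:
    """Return a new history that replays the failed call and reports the error as its result."""
    feedback = {
        "error": error_message,
        "hint": "Correct the arguments and call the tool again.",
    }
    return messages + [
        # The assistant turn that contained the invalid call, replayed as-is.
        {"role": "assistant", "content": None, "tool_calls": [tool_call]},
        # The "result" of that call is the validation error, not real tool output.
        {"role": "tool", "tool_call_id": tool_call["id"],
         "content": json.dumps(feedback, ensure_ascii=False)},
    ]

history = [{"role": "user", "content": "Check the weather in Shanghai."}]
bad_call = {
    "id": "call_3",
    "type": "function",
    "function": {"name": "get_current_weather",
                 "arguments": '{"location": "Shanghai", "unknown_param": "test"}'},
}
retry_history = build_retry_messages(
    history, bad_call,
    "Additional properties are not allowed ('unknown_param' was unexpected)",
)
print([m["role"] for m in retry_history])
```

Compared with plain-text assistant and user messages, this keeps the transcript in the shape the model was trained to continue, though which variant corrects more reliably is something to measure for your own model and prompts.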

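5. Advanced Strategies and Best Practices

Beyond the two lines of defense above, several advanced strategies and best practices strengthen the defense further.

5.1 Using the `tool_choice` Parameter

The OpenAI API provides a tool_choice parameter that gives much tighter control over tool calls in some scenarios:

"none": Forces the LLM not to call any tool. When you know the request does not involve tools, this saves cost and prevents accidental tool calls.
"auto" (default): Lets the LLM decide whether to call a tool and which one.
{"type": "function", "function": {"name": "my_specific_tool"}}: Forces the LLM to call the specified tool. This is useful when the user's intent is unambiguous, or when a particular stage of a multi-step flow should allow only one tool.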

# Force the LLM to call the get_current_weather tool
# response = client.chat.completions.create(
#     model="gpt-4-0125-preview",
#     messages=[{"role": "user", "content": "What is the weather like in New York?"}],
#     tools=tools_definition,
#     tool_choice={"type": "function", "function": {"name": "get_current_weather"}}
# )

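Forcing a specific tool with tool_choice effectively avoids function name hallucination, because the LLM is told explicitly that it may only generate a call to that tool.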
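
The three tool_choice forms described above can be produced by one small helper that also refuses to force a tool that was never registered. This is a sketch only; `build_tool_choice` and its parameters are hypothetical names, and the returned values simply mirror the options listed in this section.

```python
from typing import Optional, Union

def build_tool_choice(tools: list[dict],
                      forced_tool: Optional[str] = None,
                      allow_tools: bool = True) -> Union[str, dict]:
    """Pick a tool_choice value: "none", "auto", or a specific function."""
    if not allow_tools:
        return "none"
    if forced_tool is None:
        return "auto"
    registered = {tool["function"]["name"] for tool in tools}
    if forced_tool not in registered:
        # Forcing an unregistered name would be our own "function name hallucination".
        raise ValueError(f"Tool {forced_tool!r} is not in the tools list")
    return {"type": "function", "function": {"name": forced_tool}}

tools = [{"type": "function", "function": {"name": "get_current_weather"}}]
print(build_tool_choice(tools, forced_tool="get_current_weather"))
```

A multi-step workflow could call this helper once per stage, passing the single tool that stage permits.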

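5.2 Pydantic and the `instructor` Library: Strong Tools in the Python Ecosystem

For Python developers, Pydantic and instructor are a strong combination for building robust function-calling interfaces.

Pydantic: Pydantic lets you define data models with Python type annotations, and it can automatically generate a JSON Schema that matches the model. You can therefore define a tool's parameters as a Python class and let Pydantic handle both Schema generation and data validation.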

import json
from typing import Literal, Optional

from pydantic import BaseModel, Field, ValidationError

# Temperature unit, Celsius or Fahrenheit.
# Literal is used as a type alias (it cannot be subclassed).
WeatherUnit = Literal["celsius", "fahrenheit"]

class GetCurrentWeatherArgs(BaseModel):
    """Get the current real-time weather for a given city."""
    location: str = Field(description="City name, e.g. Beijing, Shanghai")
    unit: Optional[WeatherUnit] = Field(default="celsius", description="Temperature unit")

# Pydantic generates the JSON Schema automatically
weather_schema = GetCurrentWeatherArgs.model_json_schema()
print("Weather Schema generated by Pydantic:")
print(json.dumps(weather_schema, indent=2))

# Pydantic can also validate data directly
try:
    valid_data = GetCurrentWeatherArgs(location="Beijing", unit="celsius")
    print(f"\nValid Pydantic data: {valid_data.model_dump_json()}")
    invalid_data = GetCurrentWeatherArgs(location="Shanghai", unit="kelvin")  # raises ValidationError
except ValidationError as e:
    print(f"\nPydantic validation error: {e}")

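Defining tool parameters with Pydantic keeps your code and the Schema definition in sync and gives you solid runtime validation.

instructor: instructor is a library built on top of the OpenAI API that uses Pydantic models to make the LLM produce structured, Schema-conforming responses, with a built-in automatic retry mechanism. It parses the LLM output directly into a Pydantic model instance, and if parsing fails it automatically feeds the validation errors back to the LLM for correction.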

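# Requires: pip install instructor openai pydantic
import instructor
from openai import OpenAI
from pydantic import BaseModel, Field, ValidationError
from typing import Literal, Optional

# 1. Define the tool's parameter model with Pydantic
# Temperature unit, Celsius or Fahrenheit (Literal used as a type alias)
WeatherUnit = Literal["celsius", "fahrenheit"]

class GetCurrentWeatherArgs(BaseModel):
    """Get the current real-time weather for a given city."""
    location: str = Field(description="City name, e.g. Beijing, Shanghai")
    unit: Optional[WeatherUnit] = Field(default="celsius", description="Temperature unit")

# 2. Create an instructor client
# instructor converts the Pydantic model into an OpenAI tool definition and handles parsing and validation
# (newer instructor releases also offer instructor.from_openai for the same purpose)
client = instructor.patch(OpenAI())

def get_weather_with_instructor(query: str) -> GetCurrentWeatherArgs:
    """
    Use instructor to have the LLM produce the weather tool arguments.
    instructor handles the tool definition, the LLM call, Pydantic parsing and validation,
    and retries automatically when validation fails.
    """
    try:
        # response_model tells instructor that the expected output is a GetCurrentWeatherArgs object.
        # Internally, instructor converts GetCurrentWeatherArgs into a tool Schema and passes it to the LLM,
        # then parses the LLM's tool call and tries to build a GetCurrentWeatherArgs instance from the arguments.
        # If parsing or validation fails, instructor retries until it succeeds or max_retries is reached.
        weather_args = client.chat.completions.create(
            model="gpt-4-0125-preview",
            response_model=GetCurrentWeatherArgs,
            messages=[
                {"role": "system", "content": "You are a precise weather assistant dedicated to retrieving weather information."},
                {"role": "user", "content": query}
            ],
            max_retries=3  # instructor's built-in retries
        )
        return weather_args
    except ValidationError as e:
        print(f"Instructor final validation failed: {e}")
        raise
    except Exception as e:
        # Depending on the instructor version, exhausted retries may surface
        # as a library-specific exception rather than a Pydantic ValidationError.
        print(f"Error during the Instructor call or retries: {e}")
        raise

# Example calls
print("\n--- Instructor example ---")
try:
    # Normal case
    weather_params = get_weather_with_instructor("What is the weather in New York, in Fahrenheit?")
    print(f"Arguments obtained: {weather_params.model_dump()}")

    # If the LLM makes a mistake (for example, initially returns unit='kelvin'), instructor should correct it.
    # Note: a real LLM call is needed to observe the correction;
    # if the LLM is always right, this example will not show any correction.
    weather_params_corrected = get_weather_with_instructor("Check the weather in Shanghai in Celsius, and be very precise.")
    print(f"Arguments after possible correction: {weather_params_corrected.model_dump()}")

    # Extreme case the LLM cannot fix (e.g. the request is too vague to produce valid arguments).
    # This may make Instructor hit max_retries and raise an exception.
    # weather_params_fail = get_weather_with_instructor("Just give me some weather information.")
    # print(f"Result of the failed attempt: {weather_params_fail.model_dump()}")

except Exception as e:
    print(f"Instructor demo caught an error: {e}")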

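The instructor library greatly simplifies the validation and retry logic around function calling and is a strong tool for Python developers building reliable LLM applications.

5.3 Dynamic Schemas and Precomputed Enum Values

For some parameters the valid values are not static; they come from a database query or an external API. A product_id, for instance, must be the ID of a product currently in stock.

In that case we can query the valid values before each LLM call and inject them into the enum field of the JSON Schema.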

import json
from typing import List

def get_active_product_ids() -> List[str]:
    """Simulate fetching the list of currently active product IDs from the database."""
    return ["P-001", "P-002", "P-004"]  # Assume P-003 has been delisted

def get_dynamic_create_order_schema(product_ids: List[str]):
    """Build the create_order tool Schema with an up-to-date product_id enum."""
    schema = {
        "type": "object",
        "properties": {
            "customer_id": {
                "type": "string",
                "description": "Unique customer identifier",
                "pattern": "^[A-Za-z0-9]+$"
            },
            "items": {
                "type": "array",
                "minItems": 1,
                "items": {
                    "type": "object",
                    "properties": {
                        "product_id": {
                            "type": "string",
                            "description": "Unique product identifier; must be one of the products currently on sale",
                            "enum": product_ids  # injected dynamically
                        },
                        "quantity": {
                            "type": "integer",
                            "minimum": 1
                        }
                    },
                    "required": ["product_id", "quantity"],
                    "additionalProperties": False
                }
            },
            # ... other properties ...
        },
        "required": ["customer_id", "items"],
        "additionalProperties": False
    }
    return schema

# Fetch the currently active product IDs
active_product_ids = get_active_product_ids()

# Build the tool definition with the dynamic enum
dynamic_tools_definition = [
    {
        "type": "function",
        "function": {
            "name": "create_order",
            "description": "Create a new sales order",
            "parameters": get_dynamic_create_order_schema(active_product_ids)
        }
    }
]

print("\n--- Dynamic Schema example ---")
print(json.dumps(dynamic_tools_definition[0]['function']['parameters']['properties']['items']['items']['properties']['product_id'], indent=2))

# Now, if the LLM tries to generate "product_id": "P-003", Schema validation will catch it.

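This approach makes Schema generation more complex, but it substantially improves the accuracy of the generated arguments and defends against the business-invalid values in the "argument value hallucination" category.

5.4 Human Intervention and Fallback Mechanisms

Even with a thorough defense system, there will be cases the LLM cannot understand or cannot correct. In rare cases it may also produce output that is "valid but wrong", such as a product_id that does exist but is not the one the user meant.

Logging: Record every failed validation attempt, including the original request, the LLM output, the error message, and the retry count. These logs are valuable data for debugging and for improving the model.
Human review: For high-risk or business-critical operations, add a human review step. If automatic retries hit the limit without success, hand the request over to a person.
Safe fallback: Define an explicit fallback strategy. For example, if an order cannot be created, do not simply raise an error; tell the user "We are unable to process your request right now. Please try again later or contact customer service.", and log the detailed error.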
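
The three practices above can be combined in a few lines. The sketch below uses only the standard library; `log_failed_attempt`, `handle_exhausted_retries`, the record fields, and the status strings are hypothetical names chosen for illustration, and a production system would route the review request to whatever ticketing or queueing system it already uses.

```python
import json
import logging

logger = logging.getLogger("tool_call_defense")

FALLBACK_MESSAGE = ("We are unable to process your request right now. "
                    "Please try again later or contact customer service.")

def log_failed_attempt(user_request: str, llm_output: str, error: str, attempt: int) -> dict:
    """Log one failed validation attempt as a structured record and return it."""
    record = {
        "user_request": user_request,
        "llm_output": llm_output,
        "error": error,
        "attempt": attempt,
    }
    logger.warning("tool call validation failed: %s", json.dumps(record, ensure_ascii=False))
    return record

def handle_exhausted_retries(failures: list[dict], high_risk: bool) -> dict:
    """Decide what happens once every retry has failed."""
    return {
        # High-risk operations go to a person; others just fail safely.
        "status": "needs_human_review" if high_risk else "failed",
        "user_message": FALLBACK_MESSAGE,  # what the end user sees
        "failures": failures,              # what the reviewer or developer sees
    }

failures = [
    log_failed_attempt("Order five P-003", '{"product_id": "P-003"}',
                       "Product ID 'P-003' does not exist.", attempt)
    for attempt in (1, 2, 3)
]
outcome = handle_exhausted_retries(failures, high_risk=True)
print(outcome["status"], "-", outcome["user_message"])
```

Keeping the user-facing message separate from the detailed failure records means the user never sees raw validation errors, while nothing is lost for debugging.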

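6. Performance Considerations and Trade-offs

These defenses add computational overhead and potential latency:

Compute cost: Every LLM call and retry consumes API quota. Complex Schema validation and business-logic validation also need compute resources.
Latency: Retries mean a single request may need several API round trips, which noticeably increases response time.
Complexity: Introducing Pydantic, jsonschema, retry logic, and so on makes the codebase more complex.

When designing the system, weigh these against the application's tolerance for errors, its performance requirements, and its cost budget. For low-risk scenarios without strict real-time requirements, some validation or retry strategies can be relaxed. For high-risk scenarios such as financial transactions or medical diagnosis, the strictest defenses are required.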
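
One way to make this trade-off explicit is to attach a defense policy to each risk tier. The tiers, field names, and numbers below are purely illustrative assumptions, not recommendations; the point of the sketch is that the worst-case latency grows linearly with the number of LLM attempts a policy allows.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DefensePolicy:
    max_attempts: int            # total LLM calls allowed per request, retries included
    run_business_checks: bool    # whether to hit the backend for business-logic validation
    require_human_review: bool   # whether a person must approve before execution

# Illustrative tiers only; tune them to your own risk and latency budget.
POLICIES = {
    "low": DefensePolicy(max_attempts=1, run_business_checks=False, require_human_review=False),
    "medium": DefensePolicy(max_attempts=3, run_business_checks=True, require_human_review=False),
    "high": DefensePolicy(max_attempts=3, run_business_checks=True, require_human_review=True),
}

def worst_case_llm_latency(policy: DefensePolicy, seconds_per_call: float) -> float:
    """Upper bound on time spent in LLM calls if every attempt is used."""
    return policy.max_attempts * seconds_per_call

for tier, policy in POLICIES.items():
    print(tier, worst_case_llm_latency(policy, seconds_per_call=2.0))
```

Even in the "low" tier, Schema validation itself stays on: it is cheap compared with an LLM round trip, so only the retries and backend checks are relaxed here.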

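7. Looking Ahead

LLM technology is still developing quickly, and progress can be expected in the following areas:

Stronger built-in constraints: Future LLMs may understand and follow JSON Schema better, reducing hallucinations.
Smarter error feedback: Models may understand validation errors more precisely and correct themselves more effectively.
Standardized frameworks: More unified and capable frameworks may emerge to simplify defining, validating, and executing LLM function calls.

Until then, we as developers must stay clear-eyed about the limitations of LLMs and build layered defenses, treating LLM output as "unvalidated external input" that is strictly checked and handled.

Closing Remarks

Function calling is a powerful feature for LLM-based applications: it lets the LLM connect to the outside world and take real actions. It also inherits the LLM's probabilistic nature and uncertainty. This lecture covered a comprehensive set of hallucination defenses, from Schema design and prompt engineering up front, to JSON parsing, Schema validation, and business-logic validation afterwards, along with intelligent retries and higher-level libraries such as Pydantic and instructor.

The core idea is to build layered defenses: treat LLM output as external input that requires strict review, and use clear constraints, runtime validation, and intelligent error handling to keep its behavior reliable and safe. Only then can we fully use what LLMs offer and build stable, efficient, and trustworthy systems.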