大家好,作为一名编程专家,今天我们来深入探讨一个在大型语言模型(LLM)应用中日益关键的概念:“Verification Nodes”(验证节点)。随着LLM能力的飞速提升,它们生成的内容——无论是代码、文本、数据还是复杂指令——正被广泛应用于生产系统。然而,LLM的“幻觉”(hallucination)、逻辑错误、事实偏差以及潜在的安全风险,使得我们不能盲目信任其输出。如何系统、自动化地校验LLM生成内容的正确性,成为了构建可靠AI应用的核心挑战。
今天,我将围绕“利用确定性算法(如代码运行结果)来校验LLM生成内容的正确性”这一主题,为大家详细讲解验证节点的设计理念、技术实现、应用场景以及面临的挑战。
LLM的挑战与验证需求
在深入验证节点之前,我们必须首先理解为何需要它们。LLM的强大能力常常掩盖了其固有的局限性,这些局限性在实际应用中可能导致严重后果。
1. 幻觉问题 (Hallucination)
LLM在生成内容时,有时会“编造”事实、数据或引用,这些内容听起来非常合理,但实际上是虚构的。例如,LLM可能会生成一篇看似专业的技术文章,其中引用了不存在的论文或概念。对于依赖事实准确性的应用来说,这是致命的缺陷。
2. 代码错误 (Code Errors)
LLM在代码生成方面表现出色,但生成的代码往往存在语法错误、逻辑错误、API使用不当或安全漏洞。这些错误如果未经校验直接部署,可能导致系统崩溃、数据损坏甚至安全漏洞。例如,一个LLM可能会生成一段Python代码来处理文件,但忘记关闭文件句柄,导致资源泄露。
3. 事实性错误 (Factual Inaccuracies)
即使不涉及“幻觉”,LLM也可能因为训练数据的时间限制、数据偏差或未能充分理解上下文而生成过时或不准确的事实信息。例如,询问最新的经济数据或法律条文时,LLM可能给出几年前的信息。
4. 逻辑不一致 (Logical Inconsistencies)
在处理复杂的多步骤任务或需要严格逻辑推理的场景时,LLM可能会在生成的不同部分之间产生逻辑冲突或不一致。例如,生成一个软件架构设计,但在不同模块的交互描述上出现矛盾。
5. 安全风险 (Security Risks)
LLM生成的代码或配置可能无意中引入安全漏洞,如SQL注入、跨站脚本(XSS)漏洞、不安全的API密钥处理等。更甚者,恶意用户可能通过“提示注入”(prompt injection)等攻击方式,诱导LLM生成恶意代码或指令。
6. 信任危机 (Trust Crisis)
上述所有问题最终都会导致用户对LLM生成内容失去信任。如果一个AI系统频繁出错,用户将不再愿意依赖它,这严重阻碍了AI技术的广泛采纳和应用。
为了解决这些问题,我们需要一个可靠、自动化且高效的机制来对LLM的输出进行“质量控制”。这就是“验证节点”的用武之地。
什么是验证节点 (Verification Nodes)?
验证节点是一组专门设计用于接收、分析并使用确定性算法校验LLM生成内容的独立服务或组件。它们的核心思想是:将LLM的非确定性(或至少是难以预测的)生成过程与确定性、可重复的验证过程解耦。
想象一下一个工厂的生产线:LLM是负责生产产品的机器,而验证节点则是产线末端的质量检测站。这些检测站不依赖于猜测或模糊的判断,而是严格按照预设的标准和程序(确定性算法)来检查每一个产品。
确定性算法的核心作用
确定性算法指的是,给定相同的输入,算法总会产生相同的输出。这与LLM的生成过程形成鲜明对比,LLM即使在相同的提示下,也可能生成略有不同的文本。在验证场景中,确定性是至关重要的,因为它保证了验证结果的可靠性和可重复性。
例如,运行一段Python代码,如果输入数据和环境不变,其执行结果(包括是否报错、输出内容、变量状态等)应该是完全一致的。这就是确定性。通过这种确定性,我们可以客观地判断LLM生成的代码是否“正确”地完成了任务。
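为了更直观地体会这一点,下面给出一个极简的示意(其中 `run_and_check` 是本文虚构的示例函数名,并非任何现成库的API):对同一段代码、同一组输入和同一预期输出,无论重复校验多少次,结论都完全一致。

```python
import subprocess

def run_and_check(code: str, arg: str, expected_stdout: str) -> bool:
    """确定性校验示意: 相同代码 + 相同输入 + 相同预期 => 恒定不变的结论。"""
    proc = subprocess.run(
        ["python", "-c", code, arg],
        capture_output=True, text=True, timeout=5,
    )
    return proc.returncode == 0 and proc.stdout.strip() == expected_stdout

llm_code = "import sys; print(int(sys.argv[1]) * 2)"  # 假想的LLM输出: 将输入翻倍
# 重复三次校验,结果始终一致,这正是验证环节所依赖的确定性
print([run_and_check(llm_code, "21", "42") for _ in range(3)])  # [True, True, True]
```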
验证节点与传统验证方式的对比
| 特性 | 人工审查 | RAG (Retrieval Augmented Generation) | 验证节点 (Verification Nodes) |
|---|---|---|---|
| 自动化程度 | 低 | 中(增强生成,但仍需人工验证) | 高 |
| 准确性 | 高(依赖专家经验),但易受主观性影响 | 提高事实准确性,但无法校验逻辑或代码正确性 | 极高(基于确定性算法),客观、可重复 |
| 速度 | 慢,无法扩展 | 相对较快 | 极快,可并行处理,高吞吐量 |
| 成本 | 高昂的人力成本 | 数据检索和模型推理成本 | 计算资源成本(沙箱、执行环境),开发成本高 |
| 校验范围 | 广,可处理复杂语义和主观判断 | 主要处理事实性内容 | 代码逻辑、数据格式、API行为、结构化输出等 |
| 核心机制 | 人类认知与判断 | 外部知识库检索 | 确定性算法执行 |
确定性算法的核心作用与分类
现在,让我们深入探讨验证节点可以采用的各种确定性算法。这些算法是验证节点能够提供可靠校验的基石。
1. 代码执行与结果校验
这是最直接也最强大的验证方式之一。如果LLM生成了代码(无论是Python、JavaScript、SQL还是Shell脚本),验证节点可以直接在沙箱环境中执行这些代码,并检查其运行时行为和输出。
1.1 Python 代码执行
我们可以通过运行LLM生成的Python代码,捕获其标准输出、错误输出以及执行结果。
场景示例: LLM被要求生成一个计算斐波那契数列第n项的函数。
LLM生成的代码 (示例):
# fibonacci_calculator.py
def fibonacci(n):
if n <= 0:
return 0
elif n == 1:
return 1
else:
a, b = 0, 1
for _ in range(2, n + 1):
a, b = b, a + b
return b
if __name__ == '__main__':
import sys
try:
num = int(sys.argv[1])
print(f"Fibonacci({num}) = {fibonacci(num)}")
except (IndexError, ValueError):
print("Usage: python fibonacci_calculator.py <integer>")
sys.exit(1)
验证节点中的 Python 执行逻辑:
为了安全和隔离,我们通常会在一个独立的进程甚至容器中执行LLM生成的代码。
import subprocess
import json
import os
import tempfile
import time
def execute_python_code(code: str, test_inputs: list, timeout: int = 10) -> dict:
"""
在沙箱环境中执行Python代码并测试。
:param code: LLM生成的Python代码字符串。
:param test_inputs: 一个列表,包含要传递给代码的测试输入参数。
例如: [[5], [10], [-1], ["invalid"]]
:param timeout: 代码执行的最大超时时间(秒)。
:return: 包含执行结果的字典。
"""
results = []
# 创建临时文件来存放LLM生成的代码
with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
f.write(code)
code_file_path = f.name
try:
for inputs in test_inputs:
input_args = [str(arg) for arg in inputs]
command = ["python", code_file_path] + input_args
try:
# 使用subprocess运行代码,设置超时,并捕获stdout和stderr
process = subprocess.run(
command,
capture_output=True,
text=True,
timeout=timeout,
check=False # 不抛出CalledProcessError,我们手动检查returncode
)
# 检查返回值
if process.returncode == 0:
status = "success"
else:
status = "error"
results.append({
"input": inputs,
"status": status,
"stdout": process.stdout.strip(),
"stderr": process.stderr.strip(),
"return_code": process.returncode
})
except subprocess.TimeoutExpired:
results.append({
"input": inputs,
"status": "timeout",
"stdout": "",
"stderr": f"Code execution timed out after {timeout} seconds.",
"return_code": -1
})
except Exception as e:
results.append({
"input": inputs,
"status": "internal_error",
"stdout": "",
"stderr": f"Internal verification error: {str(e)}",
"return_code": -1
})
finally:
# 清理临时文件
if os.path.exists(code_file_path):
os.remove(code_file_path)
return {"test_results": results}
# 假设LLM生成了上面的fibonacci函数代码
llm_generated_code = """
# fibonacci_calculator.py
def fibonacci(n):
if n <= 0:
return 0
elif n == 1:
return 1
else:
a, b = 0, 1
for _ in range(2, n + 1):
a, b = b, a + b
return b
if __name__ == '__main__':
import sys
try:
num = int(sys.argv[1])
print(f"Fibonacci({num}) = {fibonacci(num)}")
except (IndexError, ValueError):
print("Usage: python fibonacci_calculator.py <integer>")
sys.exit(1)
"""
test_cases = [
[5], # 预期输出: Fibonacci(5) = 5
[10], # 预期输出: Fibonacci(10) = 55
[-1], # 预期输出: Fibonacci(-1) = 0 (该实现将 n<=0 统一返回 0)
[0], # 预期输出: Fibonacci(0) = 0
["abc"] # 预期输出: Usage: python fibonacci_calculator.py <integer>
]
# 执行验证
verification_output = execute_python_code(llm_generated_code, test_cases)
print("--- Verification Output ---")
print(json.dumps(verification_output, indent=2))
# 进一步分析结果
expected_outputs = {
"Fibonacci(5) = 5",
"Fibonacci(10) = 55",
"Fibonacci(0) = 0",
"Fibonacci(-1) = 0"
}
overall_status = "PASSED"
for result in verification_output["test_results"]:
if result["status"] == "success":
if result["input"] in [[5], [10], [0]] and result["stdout"] not in expected_outputs:
print(f"FAIL: Input {result['input']} - Unexpected stdout: {result['stdout']}")
overall_status = "FAILED"
elif result["input"] in [[-1], ["abc"]] and "Usage: python fibonacci_calculator.py" not in result["stdout"] and "Usage: python fibonacci_calculator.py" not in result["stderr"]:
print(f"FAIL: Input {result['input']} - Expected usage error, got: {result['stdout']} / {result['stderr']}")
overall_status = "FAILED"
else:
print(f"PASS: Input {result['input']} - Output: {result['stdout']}")
elif result["status"] == "error":
print(f"FAIL: Input {result['input']} - Runtime Error: {result['stderr']}")
overall_status = "FAILED"
elif result["status"] == "timeout":
print(f"FAIL: Input {result['input']} - Timeout: {result['stderr']}")
overall_status = "FAILED"
else:
print(f"FAIL: Input {result['input']} - Internal Error: {result['stderr']}")
overall_status = "FAILED"
print(f"nOverall Verification Status: {overall_status}")
说明:
- `subprocess.run` 是Python中执行外部命令的标准方式。`timeout`参数防止恶意或死循环代码无限运行,`capture_output=True`捕获标准输出和标准错误,`text=True`将输出解码为文本,`check=False`允许我们手动处理非零返回码。
- 将LLM代码写入临时文件并执行,确保隔离。
- 沙箱化: 在生产环境中,`subprocess`应该被更强大的沙箱机制(如Docker容器、gVisor或Firecracker microVM)取代,以彻底隔离LLM生成的代码,防止其访问敏感资源或执行恶意操作(参见下面的示意)。
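作为上面“沙箱化”建议的一个简化示意,下面的草图用 `docker run` 替代裸 `subprocess` 来执行LLM生成的Python代码。注意: 镜像名 `python:3.11-slim`、内存/CPU限额等均为示例假设,需按实际环境调整;gVisor、Firecracker 等方案的接入方式与此不同,这里不展开。

```python
import os
import subprocess
import tempfile

def execute_python_in_docker(code: str, args: list, timeout: int = 15) -> dict:
    """在一次性Docker容器中执行LLM生成的Python代码(假设本机已安装Docker)。"""
    with tempfile.TemporaryDirectory() as workdir:
        code_path = os.path.join(workdir, "main.py")
        with open(code_path, "w") as f:
            f.write(code)
        command = [
            "docker", "run", "--rm",
            "--network", "none",         # 禁用网络,防止外联
            "--memory", "128m",          # 限制内存(示例取值)
            "--cpus", "0.5",             # 限制CPU(示例取值)
            "-v", f"{workdir}:/app:ro",  # 代码以只读方式挂载进容器
            "python:3.11-slim",          # 示例镜像,可替换为内部基础镜像
            "python", "/app/main.py", *[str(a) for a in args],
        ]
        try:
            proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
            status = "success" if proc.returncode == 0 else "error"
            return {"status": status, "stdout": proc.stdout.strip(),
                    "stderr": proc.stderr.strip(), "return_code": proc.returncode}
        except subprocess.TimeoutExpired:
            return {"status": "timeout", "stdout": "",
                    "stderr": f"Execution timed out after {timeout} seconds.", "return_code": -1}
```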
1.2 JavaScript (Node.js) 代码执行
类似地,如果LLM生成了JavaScript代码,我们可以使用Node.js来执行。
LLM生成的代码 (示例):
// array_sum.js
function sumArray(arr) {
if (!Array.isArray(arr)) {
throw new Error("Input must be an array.");
}
return arr.reduce((acc, current) => acc + current, 0);
}
if (require.main === module) {
const input = JSON.parse(process.argv[2]);
try {
const result = sumArray(input);
console.log(JSON.stringify({ success: true, result: result }));
} catch (error) {
console.error(JSON.stringify({ success: false, error: error.message }));
process.exit(1);
}
}
验证节点中的 Node.js 执行逻辑:
import subprocess
import json
import os
import tempfile
import time
def execute_nodejs_code(code: str, test_inputs: list, timeout: int = 10) -> dict:
"""
在沙箱环境中执行Node.js代码并测试。
:param code: LLM生成的JavaScript代码字符串。
:param test_inputs: 一个列表,包含要传递给代码的测试输入数据(通常是JSON格式)。
例如: [[[1,2,3]], [[10,20]], ["not_an_array"]]
:param timeout: 代码执行的最大超时时间(秒)。
:return: 包含执行结果的字典。
"""
results = []
with tempfile.NamedTemporaryFile(mode='w', suffix='.js', delete=False) as f:
f.write(code)
code_file_path = f.name
try:
for inputs in test_inputs:
# Node.js通常通过命令行参数或stdin接收JSON输入
input_json = json.dumps(inputs[0]) # 假设每个测试用例只传递一个参数
command = ["node", code_file_path, input_json]
try:
process = subprocess.run(
command,
capture_output=True,
text=True,
timeout=timeout,
check=False
)
output = {}
try:
# 尝试解析stdout为JSON
output = json.loads(process.stdout.strip())
except json.JSONDecodeError:
output = {"success": False, "error": f"Invalid JSON output: {process.stdout.strip()}"}
results.append({
"input": inputs,
"status": "success" if process.returncode == 0 and output.get("success") else "error",
"stdout": process.stdout.strip(),
"stderr": process.stderr.strip(),
"parsed_output": output,
"return_code": process.returncode
})
except subprocess.TimeoutExpired:
results.append({
"input": inputs,
"status": "timeout",
"stdout": "",
"stderr": f"Code execution timed out after {timeout} seconds.",
"return_code": -1
})
except Exception as e:
results.append({
"input": inputs,
"status": "internal_error",
"stdout": "",
"stderr": f"Internal verification error: {str(e)}",
"return_code": -1
})
finally:
if os.path.exists(code_file_path):
os.remove(code_file_path)
return {"test_results": results}
# 假设LLM生成了上面的sumArray函数代码
llm_generated_js_code = """
// array_sum.js
function sumArray(arr) {
if (!Array.isArray(arr)) {
throw new Error("Input must be an array.");
}
return arr.reduce((acc, current) => acc + current, 0);
}
if (require.main === module) {
const input = JSON.parse(process.argv[2]);
try {
const result = sumArray(input);
console.log(JSON.stringify({ success: true, result: result }));
} catch (error) {
console.error(JSON.stringify({ success: false, error: error.message }));
process.exit(1);
}
}
"""
js_test_cases = [
[[1,2,3]], # 预期: {success: true, result: 6}
[[10,20,30]],# 预期: {success: true, result: 60}
[[]], # 预期: {success: true, result: 0}
["not_an_array"] # 预期: {success: false, error: "Input must be an array."}
]
js_verification_output = execute_nodejs_code(llm_generated_js_code, js_test_cases)
print("n--- JS Verification Output ---")
print(json.dumps(js_verification_output, indent=2))
js_overall_status = "PASSED"
for result in js_verification_output["test_results"]:
if result["status"] == "success":
if result["input"][0] == [1,2,3] and result["parsed_output"].get("result") != 6:
js_overall_status = "FAILED"
print(f"FAIL: Input {result['input']} - Expected 6, got {result['parsed_output'].get('result')}")
elif result["input"][0] == [10,20,30] and result["parsed_output"].get("result") != 60:
js_overall_status = "FAILED"
print(f"FAIL: Input {result['input']} - Expected 60, got {result['parsed_output'].get('result')}")
elif result["input"][0] == [] and result["parsed_output"].get("result") != 0:
js_overall_status = "FAILED"
print(f"FAIL: Input {result['input']} - Expected 0, got {result['parsed_output'].get('result')}")
else:
print(f"PASS: Input {result['input']} - Output: {result['parsed_output']}")
elif result["status"] == "error":
if result["input"][0] == "not_an_array" and "Input must be an array." in result["parsed_output"].get("error", ""):
print(f"PASS: Input {result['input']} - Correctly handled error: {result['parsed_output'].get('error')}")
else:
js_overall_status = "FAILED"
print(f"FAIL: Input {result['input']} - Unexpected error: {result['stderr']} / {result['parsed_output']}")
else:
js_overall_status = "FAILED"
print(f"FAIL: Input {result['input']} - Status: {result['status']}, Error: {result['stderr']}")
print(f"nOverall JS Verification Status: {js_overall_status}")
1.3 SQL 查询执行
LLM可能生成SQL查询来提取或修改数据。验证节点可以在一个受控的、隔离的数据库实例(例如SQLite内存数据库或临时Dockerized PostgreSQL/MySQL)中执行这些查询,并检查结果集。
LLM生成的SQL (示例):
-- employees_by_department.sql
SELECT name, email FROM employees WHERE department = 'Sales' ORDER BY name;
验证节点中的 SQL 执行逻辑 (使用 SQLite 内存数据库):
import sqlite3
import json
def execute_sql_query(sql_query: str, db_schema: dict) -> dict:
"""
在内存SQLite数据库中执行SQL查询并返回结果。
:param sql_query: LLM生成的SQL查询字符串。
:param db_schema: 定义数据库表的字典,例如:
{"employees": ["id INTEGER PRIMARY KEY", "name TEXT", "email TEXT", "department TEXT"]}
:return: 包含查询结果或错误信息的字典。
"""
conn = None
try:
conn = sqlite3.connect(':memory:') # 使用内存数据库
cursor = conn.cursor()
# 根据db_schema创建表
for table_name, columns in db_schema.items():
create_table_sql = f"CREATE TABLE IF NOT EXISTS {table_name} ({', '.join(columns)})"
cursor.execute(create_table_sql)
# 插入一些模拟数据
cursor.execute("INSERT INTO employees (name, email, department) VALUES ('Alice', '[email protected]', 'Sales')")
cursor.execute("INSERT INTO employees (name, email, department) VALUES ('Bob', '[email protected]', 'Engineering')")
cursor.execute("INSERT INTO employees (name, email, department) VALUES ('Charlie', '[email protected]', 'Sales')")
cursor.execute("INSERT INTO employees (name, email, department) VALUES ('David', '[email protected]', 'Marketing')")
conn.commit()
# 执行LLM生成的查询
cursor.execute(sql_query)
rows = cursor.fetchall()
column_names = [description[0] for description in cursor.description]
return {
"status": "success",
"columns": column_names,
"rows": rows
}
except sqlite3.Error as e:
return {
"status": "error",
"message": str(e)
}
finally:
if conn:
conn.close()
# 假设LLM生成了上面的SQL查询
llm_generated_sql = "SELECT name, email FROM employees WHERE department = 'Sales' ORDER BY name;"
llm_generated_malicious_sql = "DROP TABLE employees; SELECT * FROM users;" # 恶意SQL示例
db_schema_definition = {
"employees": ["id INTEGER PRIMARY KEY", "name TEXT", "email TEXT", "department TEXT"]
}
# 验证正确SQL
sql_verification_output = execute_sql_query(llm_generated_sql, db_schema_definition)
print("n--- SQL Verification Output (Correct) ---")
print(json.dumps(sql_verification_output, indent=2))
expected_rows = [('Alice', '[email protected]'), ('Charlie', '[email protected]')]
if sql_verification_output["status"] == "success" and
sql_verification_output["columns"] == ['name', 'email'] and
sql_verification_output["rows"] == expected_rows:
print("Overall SQL Verification Status: PASSED")
else:
print("Overall SQL Verification Status: FAILED")
# 验证恶意SQL (应被隔离和捕获)
malicious_sql_verification_output = execute_sql_query(llm_generated_malicious_sql, db_schema_definition)
print("n--- SQL Verification Output (Malicious) ---")
print(json.dumps(malicious_sql_verification_output, indent=2))
if malicious_sql_verification_output["status"] == "error":
print("Overall Malicious SQL Verification Status: PASSED (Correctly identified error/malice)")
else:
print("Overall Malicious SQL Verification Status: FAILED (Malicious SQL executed successfully)")
说明:
- 使用内存数据库确保每次验证都在一个干净、隔离的环境中进行。
- 通过定义`db_schema`来模拟目标数据库的结构,使LLM的查询有目标可依。
- SQL注入防护: 对于生产系统,绝不能直接执行用户或LLM生成的、未经参数化的SQL,这里仅为演示确定性执行。实际应用中,LLM生成的SQL应先经过AST解析、安全审计,或仅限于预定义的模板(参见下面基于 sqlparse 的示意)。
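针对上面提到的“AST解析/安全审计”,下面给出一个基于开源库 sqlparse 的最小示意(假设环境中已通过 `pip install sqlparse` 安装该库,白名单策略仅为示例): 只有通过静态审计的单条只读语句,才允许进入沙箱数据库执行。

```python
import sqlparse  # pip install sqlparse

ALLOWED_STATEMENT_TYPES = {"SELECT"}  # 示例白名单: 仅放行只读查询

def audit_sql(sql: str) -> dict:
    """执行前的静态审计: 拒绝多语句与白名单之外的语句类型。"""
    statements = [s for s in sqlparse.parse(sql) if str(s).strip()]
    if len(statements) != 1:
        return {"status": "rejected", "reason": "Exactly one SQL statement is required."}
    stmt_type = statements[0].get_type()  # 如 'SELECT'、'DROP'、'UNKNOWN'
    if stmt_type not in ALLOWED_STATEMENT_TYPES:
        return {"status": "rejected", "reason": f"Statement type '{stmt_type}' is not allowed."}
    return {"status": "accepted", "type": stmt_type}

print(audit_sql("SELECT name FROM employees WHERE department = 'Sales';"))
print(audit_sql("DROP TABLE employees; SELECT * FROM users;"))  # 预期: rejected
```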
1.4 Shell 脚本执行
对于生成自动化脚本或系统命令的场景,验证节点可以执行Shell脚本。
LLM生成的Shell脚本 (示例):
#!/bin/bash
# count_files.sh
if [ -z "$1" ]; then
echo "Usage: $0 <directory>"
exit 1
fi
DIR=$1
if [ ! -d "$DIR" ]; then
echo "Error: Directory '$DIR' not found."
exit 1
fi
echo "Number of files in $DIR: $(find "$DIR" -maxdepth 1 -type f | wc -l)"
验证节点中的 Shell 执行逻辑:
import subprocess
import os
import tempfile
def execute_shell_script(script_content: str, test_args: list, timeout: int = 10) -> dict:
"""
在沙箱环境中执行Shell脚本。
:param script_content: LLM生成的Shell脚本字符串。
:param test_args: 传递给脚本的命令行参数。
:param timeout: 超时时间(秒)。
:return: 包含执行结果的字典。
"""
results = []
with tempfile.NamedTemporaryFile(mode='w', suffix='.sh', delete=False) as f:
f.write(script_content)
script_file_path = f.name
# 赋予执行权限
os.chmod(script_file_path, 0o755)
# 创建一个临时目录用于测试,避免影响系统
with tempfile.TemporaryDirectory() as test_dir:
# 在测试目录中创建一些文件
with open(os.path.join(test_dir, "file1.txt"), "w") as f: f.write("test")
with open(os.path.join(test_dir, "file2.log"), "w") as f: f.write("test")
os.mkdir(os.path.join(test_dir, "subdir"))
for args in test_args:
# 如果参数是目录占位符,则替换为我们的临时测试目录
processed_args = [test_dir if arg == "<temp_dir>" else arg for arg in args]
full_command = [script_file_path] + processed_args
try:
process = subprocess.run(
full_command,
capture_output=True,
text=True,
timeout=timeout,
check=False
)
results.append({
"input_args": args,
"status": "success" if process.returncode == 0 else "error",
"stdout": process.stdout.strip(),
"stderr": process.stderr.strip(),
"return_code": process.returncode
})
except subprocess.TimeoutExpired:
results.append({
"input_args": args,
"status": "timeout",
"stdout": "",
"stderr": f"Script execution timed out after {timeout} seconds.",
"return_code": -1
})
except Exception as e:
results.append({
"input_args": args,
"status": "internal_error",
"stdout": "",
"stderr": f"Internal verification error: {str(e)}",
"return_code": -1
})
finally:
if os.path.exists(script_file_path):
os.remove(script_file_path)
return {"test_results": results}
# 假设LLM生成了上面的Shell脚本
llm_generated_shell_script = """
#!/bin/bash
# count_files.sh
if [ -z "$1" ]; then
echo "Usage: $0 <directory>"
exit 1
fi
DIR=$1
if [ ! -d "$DIR" ]; then
echo "Error: Directory '$DIR' not found."
exit 1
fi
echo "Number of files in $DIR: $(find "$DIR" -maxdepth 1 -type f | wc -l)"
"""
shell_test_cases = [
[], # 预期: Usage error
["/nonexistent"],# 预期: Directory not found error
["<temp_dir>"] # 预期: Number of files in <temp_dir>: 2 (因为我们创建了2个文件)
]
shell_verification_output = execute_shell_script(llm_generated_shell_script, shell_test_cases)
print("n--- Shell Verification Output ---")
print(json.dumps(shell_verification_output, indent=2))
shell_overall_status = "PASSED"
for result in shell_verification_output["test_results"]:
if result["input_args"] == [] and "Usage: " not in result["stdout"]:
print(f"FAIL: Input {result['input_args']} - Expected usage error, got: {result['stdout']}")
shell_overall_status = "FAILED"
elif result["input_args"] == ["/nonexistent"] and "Error: Directory '/nonexistent' not found." not in result["stderr"]:
print(f"FAIL: Input {result['input_args']} - Expected directory not found error, got: {result['stderr']}")
shell_overall_status = "FAILED"
elif result["input_args"] == ["<temp_dir>"] and "Number of files in" not in result["stdout"] and "2" not in result["stdout"]:
print(f"FAIL: Input {result['input_args']} - Expected file count, got: {result['stdout']}")
shell_overall_status = "FAILED"
else:
print(f"PASS: Input {result['input_args']} - Output: {result['stdout']} / {result['stderr']}")
print(f"nOverall Shell Verification Status: {shell_overall_status}")
2. 数据结构与格式校验 (Schema Validation)
LLM经常被要求生成JSON、XML或其他结构化数据。我们可以使用预定义的Schema来验证这些数据的结构和类型。
场景示例: LLM生成一个用户配置的JSON对象。
LLM生成的JSON (示例):
{
"username": "john_doe",
"email": "[email protected]",
"preferences": {
"theme": "dark",
"notifications": true
},
"roles": ["admin", "editor"]
}
JSON Schema 定义 (示例):
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "User Configuration",
"description": "Schema for user configuration data",
"type": "object",
"required": ["username", "email", "preferences"],
"properties": {
"username": {
"type": "string",
"pattern": "^[a-z0-9_]{3,16}$"
},
"email": {
"type": "string",
"format": "email"
},
"preferences": {
"type": "object",
"required": ["theme", "notifications"],
"properties": {
"theme": {
"type": "string",
"enum": ["dark", "light", "system"]
},
"notifications": {
"type": "boolean"
}
},
"additionalProperties": false
},
"roles": {
"type": "array",
"items": {
"type": "string"
},
"uniqueItems": true
}
},
"additionalProperties": false
}
验证节点中的 JSON Schema 校验逻辑 (使用 jsonschema 库):
import json
from jsonschema import validate, ValidationError
def validate_json_with_schema(json_data: dict, schema: dict) -> dict:
"""
使用JSON Schema验证JSON数据。
:param json_data: LLM生成的JSON数据(Python字典)。
:param schema: JSON Schema定义(Python字典)。
:return: 包含验证结果的字典。
"""
try:
validate(instance=json_data, schema=schema)
return {"status": "success", "message": "JSON is valid against schema."}
except ValidationError as e:
return {"status": "error", "message": f"JSON validation failed: {e.message}", "path": list(e.path)}
except Exception as e:
return {"status": "internal_error", "message": f"Internal error during schema validation: {str(e)}"}
# 假设LLM生成了正确和错误的JSON
llm_generated_valid_json = {
"username": "john_doe",
"email": "[email protected]",
"preferences": {
"theme": "dark",
"notifications": True
},
"roles": ["admin", "editor"]
}
llm_generated_invalid_json = {
"username": "johndoe!", # 不符合pattern
"email": "invalid-email", # 不符合format
"preferences": {
"theme": "red", # 不符合enum
"notifications": "yes" # 类型错误
},
"extra_field": "should not be here" # additionalProperties: false
}
schema_definition = {
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "User Configuration",
"description": "Schema for user configuration data",
"type": "object",
"required": ["username", "email", "preferences"],
"properties": {
"username": {
"type": "string",
"pattern": "^[a-z0-9_]{3,16}$"
},
"email": {
"type": "string",
"format": "email"
},
"preferences": {
"type": "object",
"required": ["theme", "notifications"],
"properties": {
"theme": {
"type": "string",
"enum": ["dark", "light", "system"]
},
"notifications": {
"type": "boolean"
}
},
"additionalProperties": False
},
"roles": {
"type": "array",
"items": {
"type": "string"
},
"uniqueItems": True
}
},
"additionalProperties": False
}
# 验证正确JSON
valid_json_verification = validate_json_with_schema(llm_generated_valid_json, schema_definition)
print("n--- Valid JSON Verification ---")
print(json.dumps(valid_json_verification, indent=2))
print(f"Overall Valid JSON Status: {valid_json_verification['status'].upper()}")
# 验证错误JSON
invalid_json_verification = validate_json_with_schema(llm_generated_invalid_json, schema_definition)
print("n--- Invalid JSON Verification ---")
print(json.dumps(invalid_json_verification, indent=2))
print(f"Overall Invalid JSON Status: {invalid_json_verification['status'].upper()} (Expected error)")
3. 正则表达式匹配 (Regex)
对于文本内容的格式、关键词或模式验证,正则表达式是非常高效和确定性的工具。
场景示例: 验证LLM生成的日志行是否符合特定格式。
LLM生成的日志行 (示例):
[2023-10-27 10:30:05] INFO: User 'admin' logged in from 192.168.1.100
[2023-10-27 10:30:10] ERROR: Failed to connect to DB: connection refused
Regex 规则 (示例):
^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\] (INFO|WARN|ERROR): .*
验证节点中的 Regex 校验逻辑:
import re
def validate_text_with_regex(text: str, regex_pattern: str) -> dict:
"""
使用正则表达式验证文本。
:param text: LLM生成的文本字符串。
:param regex_pattern: 正则表达式模式。
:return: 包含验证结果的字典。
"""
try:
if re.match(regex_pattern, text):
return {"status": "success", "message": "Text matches regex pattern."}
else:
return {"status": "error", "message": "Text does not match regex pattern."}
except re.error as e:
return {"status": "internal_error", "message": f"Invalid regex pattern: {str(e)}"}
# 假设LLM生成了日志行
llm_generated_log_line_valid = "[2023-10-27 10:30:05] INFO: User 'admin' logged in from 192.168.1.100"
llm_generated_log_line_invalid = "10/27/2023 ERROR: Something went wrong"
log_regex_pattern = r"^[d{4}-d{2}-d{2} d{2}:d{2}:d{2}] (INFO|WARN|ERROR): .*"
# 验证正确日志
valid_log_verification = validate_text_with_regex(llm_generated_log_line_valid, log_regex_pattern)
print("n--- Valid Log Regex Verification ---")
print(json.dumps(valid_log_verification, indent=2))
print(f"Overall Valid Log Status: {valid_log_verification['status'].upper()}")
# 验证错误日志
invalid_log_verification = validate_text_with_regex(llm_generated_log_line_invalid, log_regex_pattern)
print("n--- Invalid Log Regex Verification ---")
print(json.dumps(invalid_log_verification, indent=2))
print(f"Overall Invalid Log Status: {invalid_log_verification['status'].upper()} (Expected error)")
4. API 调用与响应验证
如果LLM生成了API请求参数或完整的API调用指令,验证节点可以实际执行这些API请求,并验证响应的状态码、响应头和响应体是否符合预期。
场景示例: LLM生成一个查询用户信息的API请求。
LLM生成的API指令 (示例):
{
"method": "GET",
"url_path": "/api/v1/users/123",
"headers": {
"Authorization": "Bearer <TOKEN>"
}
}
验证节点中的 API 调用逻辑 (使用 requests 库和模拟):
import requests
import json
from unittest.mock import patch, Mock
def mock_get_user_api(url, headers=None, **kwargs):
"""模拟一个API响应(接受并忽略timeout等额外关键字参数)。"""
if url == "http://mockapi.com/api/v1/users/123" and "Bearer mock_token" in (headers or {}).get("Authorization", ""):
return Mock(status_code=200, json=lambda: {"id": 123, "name": "Test User", "email": "[email protected]"})
elif url == "http://mockapi.com/api/v1/users/456":
return Mock(status_code=404, json=lambda: {"error": "User not found"})
else:
return Mock(status_code=400, json=lambda: {"error": "Bad Request"})
def verify_api_call(api_config: dict, expected_response: dict) -> dict:
"""
执行API调用并验证响应。
:param api_config: 包含method, url_path, headers等的字典。
:param expected_response: 预期的响应状态码、JSON体等。
:return: 包含验证结果的字典。
"""
base_url = "http://mockapi.com" # 在实际环境中,这里会是真实的API基地址
method = api_config.get("method", "GET").upper()
url = base_url + api_config.get("url_path", "")
headers = api_config.get("headers", {})
body = api_config.get("body")
try:
# 使用patch来模拟requests.get, requests.post等
# 在真实场景中,这里直接调用requests.request
with patch('requests.get', side_effect=mock_get_user_api) as mock_get, \
patch('requests.post', side_effect=mock_get_user_api) as mock_post: # 仅为演示,post也复用get的mock
response = None
if method == "GET":
response = requests.get(url, headers=headers, timeout=5)
elif method == "POST":
response = requests.post(url, headers=headers, json=body, timeout=5)
# ... 其他HTTP方法
if response is None:
return {"status": "error", "message": f"Unsupported HTTP method: {method}"}
# 校验状态码
if response.status_code != expected_response.get("status_code"):
return {"status": "error", "message": f"Unexpected status code. Expected {expected_response.get('status_code')}, got {response.status_code}"}
# 校验响应体(如果是JSON)
if "json_body" in expected_response:
try:
response_json = response.json()
# 简单比较,实际可能需要更复杂的子集或模式匹配
if not all(item in response_json.items() for item in expected_response["json_body"].items()):
return {"status": "error", "message": f"Unexpected JSON response body. Expected partial {expected_response['json_body']}, got {response_json}"}
except json.JSONDecodeError:
return {"status": "error", "message": "Response is not valid JSON when JSON body was expected."}
return {"status": "success", "message": "API call and response validated successfully."}
except requests.exceptions.RequestException as e:
return {"status": "error", "message": f"API request failed: {str(e)}"}
except Exception as e:
return {"status": "internal_error", "message": f"Internal error during API verification: {str(e)}"}
# LLM生成的API配置
llm_generated_api_call_valid = {
"method": "GET",
"url_path": "/api/v1/users/123",
"headers": {
"Authorization": "Bearer mock_token"
}
}
llm_generated_api_call_invalid_user = {
"method": "GET",
"url_path": "/api/v1/users/456",
"headers": {
"Authorization": "Bearer mock_token"
}
}
# 预期响应
expected_valid_response = {
"status_code": 200,
"json_body": {"id": 123, "name": "Test User"} # 只校验部分字段
}
expected_invalid_user_response = {
"status_code": 404,
"json_body": {"error": "User not found"}
}
# 验证正确API
valid_api_verification = verify_api_call(llm_generated_api_call_valid, expected_valid_response)
print("n--- Valid API Call Verification ---")
print(json.dumps(valid_api_verification, indent=2))
print(f"Overall Valid API Status: {valid_api_verification['status'].upper()}")
# 验证错误API
invalid_api_verification = verify_api_call(llm_generated_api_call_invalid_user, expected_invalid_user_response)
print("n--- Invalid User API Call Verification ---")
print(json.dumps(invalid_api_verification, indent=2))
print(f"Overall Invalid User API Status: {invalid_api_verification['status'].upper()} (Expected error)")
5. 单元测试框架 (Generated Tests)
如果LLM生成了某个特定功能的代码,它也可以被要求生成相应的单元测试。验证节点可以使用标准的单元测试框架(如Python的pytest、Java的JUnit、JavaScript的Jest)来执行这些测试。
场景示例: LLM生成一个排序函数及其测试用例。
LLM生成的Python代码 (示例):
# sort_function.py
def custom_sort(arr):
return sorted(arr)
LLM生成的 Pytest 测试 (示例):
# test_sort_function.py
import pytest
from sort_function import custom_sort
def test_empty_list():
assert custom_sort([]) == []
def test_sorted_list():
assert custom_sort([1, 2, 3, 4, 5]) == [1, 2, 3, 4, 5]
def test_reverse_sorted_list():
assert custom_sort([5, 4, 3, 2, 1]) == [1, 2, 3, 4, 5]
def test_unsorted_list():
assert custom_sort([3, 1, 4, 1, 5, 9, 2, 6]) == [1, 1, 2, 3, 4, 5, 6, 9]
def test_list_with_duplicates():
assert custom_sort([3, 1, 2, 3, 1]) == [1, 1, 2, 3, 3]
def test_single_element_list():
assert custom_sort([42]) == [42]
验证节点中的 Pytest 执行逻辑:
import subprocess
import os
import tempfile
import json
def run_pytest_tests(code: str, tests: str, timeout: int = 30) -> dict:
"""
在临时文件中写入代码和测试,然后使用pytest执行。
:param code: LLM生成的代码字符串。
:param tests: LLM生成的测试代码字符串。
:param timeout: Pytest执行的超时时间(秒)。
:return: 包含测试结果的字典。
"""
results = {"status": "error", "message": "Unknown error"}
code_file_path = None
test_file_path = None
report_file_path = None
try:
# 创建一个独立的临时目录,让代码文件与测试文件位于同一目录,便于测试导入被测模块
temp_dir = tempfile.mkdtemp()
# 注意: 示例测试通过 "from sort_function import custom_sort" 导入被测模块,
# 因此代码文件必须按该模块名落盘;通用实现中应将模块名作为参数传入
code_file_path = os.path.join(temp_dir, "sort_function.py")
with open(code_file_path, 'w') as f:
f.write(code)
test_file_path = os.path.join(temp_dir, "test_sort_function.py")
with open(test_file_path, 'w') as f:
f.write(tests)
# 运行pytest,并以JSON格式输出报告(需要安装 pytest-json-report 插件)
command = ["pytest", test_file_path, "--json-report", "--json-report-file", os.path.join(temp_dir, "report.json")]
process = subprocess.run(
command,
capture_output=True,
text=True,
timeout=timeout,
check=False,
cwd=temp_dir # 在临时目录下执行pytest
)
report_file_path = os.path.join(temp_dir, "report.json")
if os.path.exists(report_file_path):
with open(report_file_path, 'r') as f:
pytest_report = json.load(f)
summary = pytest_report.get("summary", {})
passed = summary.get("passed", 0)
failed = summary.get("failed", 0)
errors = summary.get("errors", 0)
total = passed + failed + errors
results = {
"status": "success" if failed == 0 and errors == 0 else "failure",
"message": f"{passed}/{total} tests passed. {failed} failed, {errors} errors.",
"test_details": pytest_report.get("tests", []),
"stdout": process.stdout.strip(),
"stderr": process.stderr.strip()
}
else:
results = {
"status": "error",
"message": "Pytest report file not found.",
"stdout": process.stdout.strip(),
"stderr": process.stderr.strip()
}
except subprocess.TimeoutExpired:
results = {"status": "timeout", "message": f"Pytest execution timed out after {timeout} seconds."}
except Exception as e:
results = {"status": "internal_error", "message": f"Internal error during pytest execution: {str(e)}"}
finally:
# 清理临时文件
if code_file_path and os.path.exists(code_file_path):
os.remove(code_file_path)
if test_file_path and os.path.exists(test_file_path):
os.remove(test_file_path)
if report_file_path and os.path.exists(report_file_path):
os.remove(report_file_path)
return results
# 假设LLM生成了排序函数及其测试
llm_sort_code = """
# sort_function.py
def custom_sort(arr):
return sorted(arr)
"""
llm_sort_tests = """
# test_sort_function.py
import pytest
# 假设sort_function.py在同一目录
from sort_function import custom_sort
def test_empty_list():
assert custom_sort([]) == []
def test_sorted_list():
assert custom_sort([1, 2, 3, 4, 5]) == [1, 2, 3, 4, 5]
def test_reverse_sorted_list():
assert custom_sort([5, 4, 3, 2, 1]) == [1, 2, 3, 4, 5]
def test_unsorted_list():
assert custom_sort([3, 1, 4, 1, 5, 9, 2, 6]) == [1, 1, 2, 3, 4, 5, 6, 9]
def test_list_with_duplicates():
assert custom_sort([3, 1, 2, 3, 1]) == [1, 1, 2, 3, 3]
def test_single_element_list():
assert custom_sort([42]) == [42]
"""
# 执行单元测试
pytest_verification_output = run_pytest_tests(llm_sort_code, llm_sort_tests)
print("n--- Pytest Verification Output ---")
print(json.dumps(pytest_verification_output, indent=2))
print(f"Overall Pytest Verification Status: {pytest_verification_output['status'].upper()}")
6. 数学与逻辑断言
对于涉及数值计算或简单逻辑判断的LLM输出,可以直接在验证节点中执行数学计算或逻辑表达式,并与期望结果进行比较。
场景示例: LLM生成一个计算器函数的表达式或结果。
LLM生成的表达式 (示例): (10 + 5) * 2 - 7,预期结果: 23
验证节点中的逻辑:
def evaluate_expression(expression: str, expected_result: float) -> dict:
"""
评估数学表达式并与预期结果比较。
注意:直接eval用户或LLM输入非常危险,这里仅为演示简单逻辑。
生产环境需要更安全的解析器或白名单机制。
"""
try:
# 极度危险!生产环境应使用抽象语法树(AST)解析器或受限的数学库
# eval() 允许执行任意代码,存在严重安全漏洞。
result = eval(expression)
if result == expected_result:
return {"status": "success", "message": f"Expression evaluates correctly: {result}"}
else:
return {"status": "error", "message": f"Expression evaluated to {result}, but expected {expected_result}"}
except SyntaxError:
return {"status": "error", "message": f"Invalid expression syntax: {expression}"}
except Exception as e:
return {"status": "error", "message": f"Error evaluating expression: {str(e)}"}
# 假设LLM生成了表达式
llm_expression = "(10 + 5) * 2 - 7"
expected_value = 23
# 验证表达式
expression_verification = evaluate_expression(llm_expression, expected_value)
print("n--- Expression Verification ---")
print(json.dumps(expression_verification, indent=2))
print(f"Overall Expression Status: {expression_verification['status'].upper()}")
# 错误示例
llm_expression_wrong = "(10 + 5) * 2 - 8"
expression_verification_wrong = evaluate_expression(llm_expression_wrong, expected_value)
print("n--- Wrong Expression Verification ---")
print(json.dumps(expression_verification_wrong, indent=2))
print(f"Overall Wrong Expression Status: {expression_verification_wrong['status'].upper()} (Expected error)")
# 语法错误示例
llm_expression_syntax_error = "(10 + 5) * 2 -"
expression_verification_syntax_error = evaluate_expression(llm_expression_syntax_error, expected_value)
print("n--- Syntax Error Expression Verification ---")
print(json.dumps(expression_verification_syntax_error, indent=2))
print(f"Overall Syntax Error Expression Status: {expression_verification_syntax_error['status'].upper()} (Expected error)")
验证节点的架构与工作流程
为了有效地运用这些确定性算法,验证节点需要一个健壮的架构和清晰的工作流程。
核心架构组件
验证节点通常由任务接收与分发、沙箱执行、结果比对和报告生成等核心组件构成。下面给出一个综合前文各语言执行逻辑的通用代码验证函数参考实现:
import subprocess
import os
import json
import tempfile
def verify_llm_generated_code(code: str, language: str, expected_output: str, test_cases: list = None, timeout: int = 10) -> dict:
"""
通过执行代码来验证LLM生成的内容。
:param code: LLM生成的代码字符串。
:param language: 代码的语言('python', 'nodejs', 'bash')。
:param expected_output: 预期的标准输出或特定结果。
:param test_cases: 针对代码的测试用例列表,每个用例是一个字典,包含’args’(命令行参数)和’expected_stdout’。
:param timeout: 代码执行的最大超时时间(秒)。
:return: 包含验证结果的字典。
"""
results = []
# 根据语言确定文件后缀和执行命令
if language == 'python':
suffix = '.py'
interpreter = 'python'
elif language == 'nodejs':
suffix = '.js'
interpreter = 'node'
elif language == 'bash':
suffix = '.sh'
interpreter = 'bash'
else:
return {"overall_status": "error", "message": f"Unsupported language: {language}"}
code_file_path = None
try:
with tempfile.NamedTemporaryFile(mode='w', suffix=suffix, delete=False) as f:
f.write(code)
code_file_path = f.name
if language == 'bash':
os.chmod(code_file_path, 0o755) # Give execute permission for bash scripts
# 如果没有提供test_cases,则生成一个简单的默认测试
if not test_cases:
test_cases = [{"args": [], "expected_stdout": expected_output}]
overall_test_status = "PASSED"
for i, tc in enumerate(test_cases):
test_args = tc.get("args", [])
expected_stdout_for_case = tc.get("expected_stdout", expected_output)
command = [interpreter, code_file_path] + test_args
try:
process = subprocess.run(
command,
capture_output=True,
text=True,
timeout=timeout,
check=False
)
current_status = "FAILED"
if process.returncode == 0:
if expected_stdout_for_case in process.stdout.strip():
current_status = "PASSED"
else:
current_status = "FAILED - Output Mismatch"
else:
current_status = "FAILED - Runtime Error"
if current_status.startswith("FAILED"):
overall_test_status = "FAILED"
results.append({
"test_case_index": i,
"args": test_args,
"status": current_status,
"stdout": process.stdout.strip(),
"stderr": process.stderr.strip(),
"return_code": process.returncode
})
except subprocess.TimeoutExpired:
results.append({
"test_case_index": i,
"args": test_args,
"status": "TIMEOUT",
"stdout": "",
"stderr": f"Code execution timed out after {timeout} seconds.",
"return_code": -1
})
overall_test_status = "FAILED"
except Exception as e:
results.append({
"test_case_index": i,
"args": test_args,
"status": "INTERNAL_VERIFICATION_ERROR",
"stdout": "",
"stderr": f"Internal verification error: {str(e)}",
"return_code": -1
})
overall_test_status = "FAILED"
finally:
if code_file_path and os.path.exists(code_file_path):
os.remove(code_file_path)
return {
"overall_status": overall_test_status,
"test_results": results
}
# --- 示例使用 ---
# 1. Python代码验证
python_code = """
def greet(name):
return f"Hello, {name}!"
if __name__ == '__main__':
import sys
if len(sys.argv) > 1:
print(greet(sys.argv[1]))
else:
print(greet("World"))
"""
python_test_cases = [
{"args": ["Alice"], "expected_stdout": "Hello, Alice!"},
{"args": [], "expected_stdout": "Hello, World!"},
{"args": ["Bob"], "expected_stdout": "Hello, Bob!"}
]
python_verification_result = verify_llm_generated_code(python_code, 'python', "Hello, World!", python_test_cases)
print("--- Python Code Verification Result ---")
print(json.dumps(python_verification_result, indent=2))
# 2. Node.js代码验证
nodejs_code = """
const greet = (name) => `Hello, ${name}!`;
if (require.main === module) {
const name = process.argv[2] || "World";
console.log(greet(name));
}
"""
nodejs_test_cases = [
{"args": ["Charlie"], "expected_stdout": "Hello, Charlie!"},
{"args": [], "expected_stdout": "Hello, World!"}
]
nodejs_verification_result = verify_llm_generated_code(nodejs_code, 'nodejs', "Hello, World!", nodejs_test_cases)
print("\n--- Node.js Code Verification Result ---")
print(json.dumps(nodejs_verification_result, indent=2))
# 3. Bash脚本验证
bash_code = """
#!/bin/bash
if [ -z "$1" ]; then
echo "Hello, World!"
else
echo "Hello, $1!"
fi
"""
bash_test_cases = [
{"args": ["David"], "expected_stdout": "Hello, David!"},
{"args": [], "expected_stdout": "Hello, World!"}
]
bash_verification_result = verify_llm_generated_code(bash_code, 'bash', "Hello, World!", bash_test_cases)
print("\n--- Bash Script Verification Result ---")
print(json.dumps(bash_verification_result, indent=2))
# 4. 模拟一个失败的Python代码
python_failing_code = """
def greet(name):
raise ValueError("Something went wrong!")
if __name__ == '__main__':
import sys
try:
if len(sys.argv) > 1:
print(greet(sys.argv[1]))
else:
print(greet("World"))
except ValueError as e:
print(f"Error: {e}")
sys.exit(1)
"""
python_failing_test_cases = [
{"args": ["Alice"], "expected_stdout": "Hello, Alice!"} # 预期输出不会匹配
]
python_failing_verification_result = verify_llm_generated_code(python_failing_code, 'python', "Hello, Alice!", python_failing_test_cases)
print("\n--- Failing Python Code Verification Result ---")
print(json.dumps(python_failing_verification_result, indent=2))
### 工作流程
1. **LLM生成内容:** 用户向LLM发出请求,LLM生成代码、配置、文档或其他结构化/非结构化内容。
2. **验证任务分发:** LLM的输出被发送到一个任务调度器(Task Scheduler)。调度器根据输出的类型(例如,Python代码、JSON配置、SQL查询)和所需的验证逻辑,将任务分发给相应的验证节点。
3. **验证节点执行:**
* 接收到任务的验证节点在一个**