各位同仁,各位技术先锋们,
今天,我们聚焦一个在软件开发领域日益凸显的痛点——代码审查与测试的效率与质量。在高速迭代的现代开发实践中,传统的人工代码审查和测试流程往往成为瓶颈,耗时耗力,且容易受主观因素影响,导致质量参差不齐。然而,随着人工智能,特别是大型语言模型(LLMs)的飞速发展,我们看到了一个前所未有的机遇:构建一个智能、自动化、自进化的“Peer Review Circuits”系统。
我们将深入探讨如何设计并实现这样一个由三个独立Agent组成的闭环自动化图,它能够自主地进行代码审查、测试生成与执行,并最终协调整个流程,形成一个高效、高质量、低干预的开发反馈回路。
Part 1: ‘Peer Review Circuits’:闭环代码审查与测试的宏伟愿景
1.1 什么是“Peer Review Circuits”?
“Peer Review Circuits”是一个概念模型,它将传统的代码审查和测试过程转化为一个自动化、持续运行、自我修正的反馈循环。这里的“Circuits”(电路)强调的是其闭环、连续和迭代的特性,如同电流在电路中循环往复,不断地对代码质量进行检测、反馈、修正和优化。
在这个系统中,代码不再是静止的文本,而是一个动态的实体,在被提交后便立即进入一个由智能Agent组成的自动化“电路”。这些Agent协同工作,从多个维度(代码规范、潜在缺陷、功能正确性、性能、安全性等)对代码进行全面评估,并根据评估结果采取行动——无论是自动修正、提供详细反馈,还是触发人工介入。
1.2 为什么需要“Peer Review Circuits”?
传统的人工代码审查和测试面临诸多挑战:
- 效率瓶颈: 人工审查耗时,往往导致合并请求(PR)长时间挂起,拖慢开发节奏。
- 一致性缺失: 不同的审查者有不同的经验和标准,导致审查结果缺乏一致性。
- 主观性强: 审查质量受限于审查者的情绪、疲劳度和专业领域知识。
- 测试覆盖率不足: 人工编写测试用例可能遗漏边缘情况,且维护成本高昂。
- 重复性工作: 许多代码风格、简单错误和回归测试是重复性的,占用开发人员宝贵的时间。
“Peer Review Circuits”旨在解决这些问题,通过自动化带来以下核心优势:
- 加速反馈循环: 代码提交后立即获得反馈,大大缩短了从开发到部署的周期。
- 提升代码质量: 自动化工具能执行更严格、更一致的检查,减少缺陷流入主干。
- 释放开发生产力: 将重复性、机械性的工作交给Agent,让开发人员专注于更具创造性的任务。
- 持续学习与改进: 智能Agent可以通过学习历史数据和人工反馈不断优化其审查和测试能力。
- 增强CI/CD管道: 无缝集成到现有的持续集成/持续部署流程中,进一步强化自动化交付能力。
想象一下,一个能够在你提交代码的瞬间,就完成静态分析、生成并运行测试、甚至自动修复一些小问题的系统,这将是开发效率和代码质量的一次革命性飞跃。
Part 2: 拆解问题:Agent化方法论的必然性
为了构建“Peer Review Circuits”,我们需要一种能够处理复杂、动态任务的自动化实体——Agent。每个Agent都将拥有特定的职责和能力,并通过明确的协议进行通信,共同完成整个闭环。
2.1 代码审查的困境与Agent的应对
人工代码审查是一个高认知负荷的任务。审查者需要理解代码逻辑、关注代码风格、检测潜在缺陷、评估架构影响,甚至考虑安全性。这个过程耗时且容易出错。
Agent的介入可以解决这些问题:
- 静态分析Agent: 可以快速执行 linting、格式化检查、复杂度分析、安全漏洞扫描等标准化的任务,确保代码符合预设规范。
- 语义理解Agent(LLM-powered): 能够理解代码意图,识别不符合最佳实践的模式,甚至根据上下文提出更智能的优化建议,超越传统静态分析工具的范畴。
2.2 测试的挑战与Agent的赋能
测试是软件质量的基石,但测试的编写、维护和执行同样充满挑战。
- 测试覆盖率: 如何确保测试覆盖到所有关键路径和边缘情况?
- 测试用例生成: 编写高质量的测试用例本身就是一项需要经验和洞察力的工作。
- 回归测试: 代码变更后,如何高效地运行并验证所有相关功能未受影响?
- 环境配置: 测试环境的搭建和管理往往复杂。
Agent在测试领域可以发挥巨大作用:
- 测试生成Agent: 利用LLM理解代码功能,自动生成单元测试、集成测试甚至模拟用户行为的端到端测试。
- 测试执行Agent: 自动化地在隔离环境中运行测试套件,收集结果,并生成详细报告。
- 覆盖率分析Agent: 评估测试覆盖率,并根据覆盖率差距建议或生成新的测试用例。
2.3 Agent架构:构建一个智能协作系统
我们的核心思想是构建一个三Agent架构,每个Agent专注于一个核心任务,并通过协作和信息交换形成一个完整的闭环。
- 代码审查Agent (Code Review Agent – CRA): 专注于代码质量、风格和潜在缺陷的静态分析。
- 测试生成与执行Agent (Test Generation & Execution Agent – TGEA): 专注于代码功能正确性、鲁棒性和覆盖率的动态验证。
- 重构与编排Agent (Refactoring & Orchestration Agent – ROA): 作为系统的“大脑”,协调CRA和TGEA的活动,综合它们的反馈,并做出决策,甚至直接进行代码重构或修复。
这三个Agent将协同工作,形成一个高度自动化的代码质量保障流程。
Part 3: 3-Agent架构设计与通信协议
3.1 核心Agent角色与职责
为了实现“Peer Review Circuits”,我们定义以下三个核心Agent及其职责:
| Agent 名称 | 核心职责 | 关键输入 | 关键输出 | 核心技术栈 |
|---|---|---|---|---|
| 代码审查Agent (CRA) | 静态代码质量分析、风格检查、潜在缺陷识别、安全漏洞扫描、最佳实践遵循。 | 待审查代码(文件或Diff)、项目配置、审查规则。 | 详细审查报告(问题、严重性、建议)。 | LLM、ESLint/Pylint/SonarQube、自定义规则。 |
| 测试生成与执行Agent (TGEA) | 根据代码生成测试用例、执行现有测试、评估测试覆盖率、收集测试结果。 | 待测试代码、CRA审查报告(可选)、现有测试套件。 | 测试报告(通过/失败、覆盖率)、新生成的测试用例。 | LLM、Pytest/JUnit/Jest、Coverage.py/Istanbul。 |
| 重构与编排Agent (ROA) | 接收并综合CRA和TGEA的报告、决策下一步行动、协调Agent交互、自动应用修复或建议、版本控制系统交互。 | CRA审查报告、TGEA测试报告、原始代码。 | 决策指令、自动修复补丁、PR描述、通知信息。 | LLM、决策引擎、Git API、Patching工具。 |
3.2 Agent间通信协议
Agent之间的通信是整个系统顺畅运行的关键。我们采用基于JSON的标准化消息格式,通过内部API或消息队列进行异步通信。
通用消息结构示例:
{
"message_id": "uuid-v4-string",
"timestamp": "ISO-8601-datetime",
"sender": "AgentName",
"receiver": "AgentName",
"event_type": "code_submitted" | "review_request" | "review_report" | "test_request" | "test_report" | "action_decision" | "refactoring_proposal",
"payload": {
// Specific data based on event_type
}
}
示例一:ROA向CRA请求代码审查
{
"message_id": "msg-001",
"timestamp": "2023-10-27T10:00:00Z",
"sender": "OrchestrationAgent",
"receiver": "CodeReviewAgent",
"event_type": "review_request",
"payload": {
"repository_url": "[email protected]:your-org/your-repo.git",
"branch_name": "feature/new-login",
"commit_hash": "a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6q7r8s9t0",
"target_files": ["src/auth/login.py", "src/auth/user.py"], // 可选,指定审查文件
"diff_content": "..." // 可选,直接提供diff内容
}
}
示例二:CRA向ROA发送审查报告
{
"message_id": "msg-002",
"timestamp": "2023-10-27T10:05:00Z",
"sender": "CodeReviewAgent",
"receiver": "OrchestrationAgent",
"event_type": "review_report",
"payload": {
"commit_hash": "a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6q7r8s9t0",
"status": "completed", // completed, failed
"findings": [
{
"file": "src/auth/login.py",
"line": 42,
"severity": "ERROR",
"rule": "E0001: Unused import",
"message": "`requests` imported but not used.",
"suggestion": "Remove `import requests`.",
"auto_fixable": true,
"fix_patch": "--- a/src/auth/login.pyn+++ b/src/auth/login.pyn@@ -39,7 +39,6 @@n n-import requestsn if __name__ == '__main__':"
},
{
"file": "src/auth/user.py",
"line": 15,
"severity": "WARNING",
"rule": "W0002: Function too long",
"message": "Function `create_user` has 60 lines, consider refactoring.",
"suggestion": "Break down `create_user` into smaller, focused functions.",
"auto_fixable": false
}
],
"summary": {
"errors": 1,
"warnings": 1,
"info": 0
}
}
}
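为了更直观地说明上述消息信封的构造与投递,下面给出一个最小示意(仅作草图:假设用Python标准库的 queue 充当进程内消息总线,`AgentMessage`、`message_bus` 等名称均为示例,生产环境可替换为RabbitMQ、Kafka等消息队列):
```python
import json
import queue
import uuid
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

# 进程内消息总线,仅用于示意;生产环境可替换为RabbitMQ/Kafka等
message_bus: "queue.Queue[str]" = queue.Queue()

VALID_EVENT_TYPES = {
    "code_submitted", "review_request", "review_report",
    "test_request", "test_report", "action_decision", "refactoring_proposal",
}

@dataclass
class AgentMessage:
    sender: str
    receiver: str
    event_type: str
    payload: dict
    message_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def to_json(self) -> str:
        # 发送前校验event_type,避免下游Agent收到未知事件
        if self.event_type not in VALID_EVENT_TYPES:
            raise ValueError(f"Unknown event_type: {self.event_type}")
        return json.dumps(asdict(self), ensure_ascii=False)

# 构造一条 review_request 并投递
msg = AgentMessage(
    sender="OrchestrationAgent",
    receiver="CodeReviewAgent",
    event_type="review_request",
    payload={"branch_name": "feature/new-login", "target_files": ["src/auth/login.py"]},
)
message_bus.put(msg.to_json())

# 消费端反序列化
received = json.loads(message_bus.get())
print(received["event_type"], received["payload"]["target_files"])
```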
3.3 闭环工作流概览
- 代码提交/PR创建: 开发者将代码推送到版本控制系统(VCS),或创建新的合并请求(Pull Request)。
- ROA触发: ROA通过VCS的webhook或轮询机制检测到新的代码提交。
- ROA -> CRA: ROA向CRA发送审查请求,附带代码更改的详细信息。
- CRA执行审查: CRA分析代码,生成审查报告,并将其发送回ROA。
- ROA -> TGEA: ROA将原始代码和(可选的)CRA报告传递给TGEA,请求生成和执行测试。
- TGEA执行测试: TGEA根据代码生成新测试,运行现有测试,收集结果和覆盖率,生成测试报告,并发送回ROA。
- ROA决策: ROA综合CRA和TGEA的报告,根据预设的策略(如:所有错误必须修复、测试覆盖率不得下降、关键测试必须通过等)做出决策。
  - 决策1:自动修复并重新审查/测试。 如果存在可自动修复的简单错误(如格式、未使用的导入),ROA直接应用补丁,然后重新启动审查-测试循环。
  - 决策2:生成修复建议和PR。 如果问题复杂或涉及逻辑修改,ROA生成详细的修复建议,并创建一个新的分支和PR,通知开发者进行审查和批准。
  - 决策3:通知开发者。 如果问题严重(如关键测试失败、大量高优先级审查问题),ROA直接向开发者发送详细报告,要求其手动介入。
  - 决策4:批准合并。 如果所有检查都通过,且没有需要人工干预的问题,ROA可以自动合并代码(或将其标记为可合并)。
- 循环迭代: 如果需要人工介入,开发者根据ROA的反馈修改代码并重新提交,整个循环再次启动,直到代码符合所有质量标准。
这个流程确保了每次代码变更都经过严格的自动化质量把关,显著提升了开发效率和代码质量。
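作为对上述决策策略的补充,下面给出一个最小化的决策函数草图(假设输入沿用本文的CRA/TGEA报告结构,`decide_next_action` 与80%覆盖率阈值均为示例,实际策略应可配置,并可由ROA的规则引擎或LLM辅助实现,详见4.3节):
```python
from typing import Dict, Any

def decide_next_action(cra_report: Dict[str, Any],
                       tgea_report: Dict[str, Any],
                       min_coverage: float = 80.0) -> str:
    """根据CRA/TGEA报告在四类决策中选其一,字段名沿用本文的报告示例。"""
    errors = cra_report.get("summary", {}).get("ERROR", 0)
    tests_failed = tgea_report.get("status") in ("failed", "error")
    coverage = (tgea_report.get("coverage", {})
                .get("totals", {})
                .get("percent_covered", 0.0))
    auto_fixable = [f for f in cra_report.get("findings", [])
                    if f.get("auto_fixable") and f.get("fix_patch")]

    if tests_failed or (errors > 0 and not auto_fixable):
        return "notify_developer"     # 决策3:问题严重,需要人工介入
    if auto_fixable:
        return "auto_fix_and_retest"  # 决策1:自动修复并重新审查/测试
    if coverage < min_coverage:
        return "propose_fix_pr"       # 决策2:需要修改但不宜自动处理(此处以覆盖率不足为例)
    return "approve_merge"            # 决策4:批准合并
```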
Part 4: Agent深度解析与实现细节
现在,让我们深入到每个Agent的内部工作原理和实现细节。
4.1 Agent 1: 代码审查Agent (CRA)
CRA的核心目标是识别代码中的静态问题,包括但不限于风格违规、潜在错误、复杂性过高、安全漏洞以及不符合最佳实践的代码模式。
CRA的工作流程:
- 接收审查请求: 从ROA接收 `review_request` 消息,解析出仓库URL、分支、提交哈希和目标文件。
- 代码拉取与准备: 克隆或拉取指定分支的代码到隔离的临时目录。
- 执行静态分析: 运行一系列静态分析工具。
- LLM洞察与补充: 将静态分析结果和代码片段输入LLM,请求更深层次的语义分析和优化建议。
- 报告生成: 整合所有发现,生成结构化的审查报告。
- 发送审查报告: 将 `review_report` 消息发送回ROA。
关键技术与实现考虑:
- 多语言支持: CRA需要能够处理多种编程语言。这通常通过集成不同的静态分析工具实现(如Python的Pylint/Flake8,JavaScript的ESLint,Java的SonarQube等)。
- LLM集成策略:
  - 问题解释与建议: 将静态分析工具识别出的简单问题,通过LLM进行更友好的解释,并提供更具上下文的修改建议。
  - 模式识别: 对于传统工具难以识别的复杂代码模式(如反模式、性能陷阱),LLM可以通过学习大量代码库来识别并给出警告。
  - 安全审查: LLM可以辅助识别潜在的安全漏洞,例如不安全的API使用、SQL注入风险等。
- 自定义规则: 允许项目团队定义自己的审查规则和最佳实践。
- Patch生成: 对于格式化、未使用的导入等简单且明确的修复,CRA应能够生成兼容Git的 `diff` 或 `patch` 格式,方便ROA直接应用(可参考下文的difflib示意)。
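针对上面“Patch生成”一条,下面是一个用标准库 difflib 生成统一diff的最小示意(文件名与修复内容均为假设;生成的补丁带 a/、b/ 前缀,便于后续通过 `git apply` 应用):
```python
import difflib

def make_unified_diff(original: str, fixed: str, rel_path: str) -> str:
    """生成Git风格的统一diff文本,供ROA作为fix_patch应用。"""
    diff_lines = difflib.unified_diff(
        original.splitlines(keepends=True),
        fixed.splitlines(keepends=True),
        fromfile=f"a/{rel_path}",
        tofile=f"b/{rel_path}",
    )
    return "".join(diff_lines)

# 示例:移除未使用的导入
original_code = "import requests\nimport os\n\nprint(os.getcwd())\n"
fixed_code = "import os\n\nprint(os.getcwd())\n"
print(make_unified_diff(original_code, fixed_code, "src/auth/login.py"))
```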
CRA Python代码示例(简化版):
import os
import subprocess
import json
import logging
from typing import List, Dict, Any, Optional
from openai import OpenAI # 假设使用OpenAI API
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
class CodeReviewAgent:
def __init__(self, llm_api_key: str, workspace_dir: str = "cra_workspace"):
self.client = OpenAI(api_key=llm_api_key)
self.workspace_dir = workspace_dir
os.makedirs(self.workspace_dir, exist_ok=True)
logging.info("CodeReviewAgent initialized.")
def _run_pylint(self, file_path: str) -> List[Dict[str, Any]]:
"""Runs Pylint on a given Python file."""
try:
# -f json 输出json格式,--disable=C0114,C0115,C0116 禁用一些默认的文档字符串检查
result = subprocess.run(
['pylint', '--output-format=json', '--disable=C0114,C0115,C0116', file_path],
capture_output=True, text=True, check=False
)
if result.stdout:
return json.loads(result.stdout)
return []
except Exception as e:
logging.error(f"Pylint execution failed for {file_path}: {e}")
return []
def _run_eslint(self, file_path: str) -> List[Dict[str, Any]]:
"""Runs ESLint on a given JavaScript file."""
# Assuming ESLint is installed globally or in project's node_modules
try:
# -f json 输出json格式
result = subprocess.run(
['eslint', '-f', 'json', file_path],
capture_output=True, text=True, check=False
)
if result.stdout:
# ESLint outputs an array of results, one per file
return json.loads(result.stdout)[0]['messages'] if json.loads(result.stdout) else []
return []
except FileNotFoundError:
logging.warning("ESLint command not found. Skipping JavaScript linting.")
return []
except Exception as e:
logging.error(f"ESLint execution failed for {file_path}: {e}")
return []
def _get_llm_insights(self, code_snippet: str, issue_description: str) -> Optional[Dict[str, Any]]:
"""Uses LLM to get more detailed insights and suggestions."""
prompt = f"""Given the following code snippet and an identified issue, please provide a more detailed explanation of the issue, suggest a specific fix, and if possible, provide a code patch in Git diff format.
Code Snippet:
{code_snippet}
Identified Issue: {issue_description}
Please respond in JSON format with keys: 'explanation', 'suggestion', 'fix_patch' (if applicable).
Example:
{{
"explanation": "This is why it's an issue...",
"suggestion": "To fix this, you can...",
"fix_patch": "--- a/file.py\n+++ b/file.py\n@@ -L,C +L,C\n -old line\n +new line"
}}
"""
try:
response = self.client.chat.completions.create(
model="gpt-4", # 或者其他适合的模型
messages=[
{"role": "system", "content": "You are a helpful code review assistant."},
{"role": "user", "content": prompt}
],
response_format={"type": "json_object"}
)
content = response.choices[0].message.content
return json.loads(content)
except Exception as e:
logging.error(f"LLM insight generation failed: {e}")
return None
def review_code(self, repo_path: str, target_files: List[str]) -> Dict[str, Any]:
"""Main method to review code files."""
all_findings = []
for file_path_relative in target_files:
full_file_path = os.path.join(repo_path, file_path_relative)
if not os.path.exists(full_file_path):
logging.warning(f"File not found: {full_file_path}. Skipping.")
continue
file_extension = os.path.splitext(full_file_path)[1]
file_content = ""
try:
with open(full_file_path, 'r', encoding='utf-8') as f:
file_content = f.read()
except Exception as e:
logging.error(f"Could not read file {full_file_path}: {e}")
continue
if file_extension == '.py':
pylint_findings = self._run_pylint(full_file_path)
for finding in pylint_findings:
issue_description = f"Pylint [{finding['type']}] {finding['symbol']}: {finding['message']}"
code_snippet = self._get_code_snippet(file_content, finding['line'], finding['column'])
llm_insights = self._get_llm_insights(code_snippet, issue_description)
all_findings.append({
"file": file_path_relative,
"line": finding['line'],
"column": finding['column'],
"severity": finding['type'].upper(), # Pylint uses 'error', 'warning', 'refactor', 'convention'
"rule": finding['symbol'],
"message": finding['message'],
"suggestion": llm_insights.get('suggestion') if llm_insights else "No specific LLM suggestion.",
"auto_fixable": False, # Pylint doesn't directly offer auto-fix patches in output
"fix_patch": llm_insights.get('fix_patch') if llm_insights else None,
"llm_explanation": llm_insights.get('explanation') if llm_insights else None
})
elif file_extension in ['.js', '.jsx', '.ts', '.tsx']:
eslint_findings = self._run_eslint(full_file_path)
for finding in eslint_findings:
issue_description = f"ESLint [{finding['severity']}] {finding['ruleId']}: {finding['message']}"
code_snippet = self._get_code_snippet(file_content, finding['line'], finding['column'])
llm_insights = self._get_llm_insights(code_snippet, issue_description)
all_findings.append({
"file": file_path_relative,
"line": finding['line'],
"column": finding['column'],
"severity": self._map_eslint_severity(finding['severity']),
"rule": finding['ruleId'],
"message": finding['message'],
"suggestion": llm_insights.get('suggestion') if llm_insights else "No specific LLM suggestion.",
"auto_fixable": finding.get('fix') is not None, # ESLint can provide fixes
"fix_patch": llm_insights.get('fix_patch') if llm_insights else None, # LLM patch might be better
"llm_explanation": llm_insights.get('explanation') if llm_insights else None
})
else:
logging.info(f"No static analysis tool configured for {file_extension} files.")
summary = self._summarize_findings(all_findings)
return {"status": "completed", "findings": all_findings, "summary": summary}
def _get_code_snippet(self, code_content: str, line: int, column: int, context_lines: int = 3) -> str:
"""Extracts a code snippet around a given line."""
lines = code_content.splitlines()
start_line = max(0, line - 1 - context_lines)
end_line = min(len(lines), line - 1 + context_lines + 1)
snippet_lines = lines[start_line:end_line]
# Prepend line numbers for better context in LLM prompt
return "n".join(f"{start_line + i + 1}: {l}" for i, l in enumerate(snippet_lines))
def _map_eslint_severity(self, eslint_severity: int) -> str:
if eslint_severity == 2: return "ERROR"
if eslint_severity == 1: return "WARNING"
return "INFO"
def _summarize_findings(self, findings: List[Dict[str, Any]]) -> Dict[str, int]:
summary = {"ERROR": 0, "WARNING": 0, "INFO": 0}
for finding in findings:
severity = finding['severity']
if severity in summary:
summary[severity] += 1
return summary
# Example usage (within ROA or a test harness)
# if __name__ == "__main__":
# # This part would typically be orchestrated by ROA
# # For demonstration, let's create a dummy repo
# dummy_repo_path = "temp_repo"
# os.makedirs(dummy_repo_path, exist_ok=True)
# with open(os.path.join(dummy_repo_path, "test_file.py"), "w") as f:
# f.write("""
# import os
# import requests # unused import
# def long_function_name(a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z):
# x = 1
# y = 2
# z = 3
# if x > 0:
# print(y)
# else:
# print(z)
# return x + y + z + a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q + r + s + t + u + v + w + x + y + z
# def another_function():
# pass
# """)
# with open(os.path.join(dummy_repo_path, "test_file.js"), "w") as f:
# f.write("""
# const unusedVar = 1;
# function someFunc(arg1, arg2) {
# if (arg1 === 1) {
# console.log('one');
# } else if (arg1 === 2) {
# console.log('two');
# } else {
# console.log('other');
# }
# return arg1 + arg2;
# }
# """)
# cra = CodeReviewAgent(llm_api_key=os.getenv("OPENAI_API_KEY"))
# target_files = ["test_file.py", "test_file.js"]
# report = cra.review_code(dummy_repo_path, target_files)
# print(json.dumps(report, indent=2))
# # Clean up
# # import shutil
# # shutil.rmtree(dummy_repo_path)
审查报告数据结构示例:
| 字段 | 类型 | 描述 | 示例值 |
|---|---|---|---|
| `file` | string | 发现问题的相对文件路径。 | `src/auth/login.py` |
| `line` | integer | 问题所在的行号。 | 42 |
| `column` | integer | 问题所在的列号。 | 1 |
| `severity` | enum | 问题严重性:ERROR、WARNING、INFO。 | ERROR |
| `rule` | string | 触发的审查规则ID(如Pylint的E0001)。 | E0001: Unused import |
| `message` | string | 问题的简短描述。 | `requests` imported but not used. |
| `suggestion` | string | LLM或工具提供的具体修改建议。 | Remove `import requests`. |
| `auto_fixable` | boolean | 是否可以自动修复。 | true |
| `fix_patch` | string | 如果`auto_fixable`为true,提供Git diff格式的补丁内容。 | `--- a/src/auth/login.py\n+++ b/src/auth/login.py\n@@ -39,7 +39,6 @@\n \n-import requests\n if __name__ == '__main__':` |
| `llm_explanation` | string | LLM对问题的详细解释,提供更深层次的上下文。 | The 'requests' library is imported but never utilized within the 'login.py' file. This constitutes dead code, increasing file size and potentially misleading readers. Removing it will improve code cleanliness and reduce unnecessary dependencies. |
4.2 Agent 2: 测试生成与执行Agent (TGEA)
TGEA负责确保代码的功能正确性。它不仅执行现有的测试,更重要的是能够根据代码的意图和上下文,智能地生成新的测试用例。
TGEA的工作流程:
- 接收测试请求: 从ROA接收 `test_request` 消息,解析出仓库URL、分支、提交哈希和目标文件。
- 代码拉取与准备: 克隆或拉取指定分支的代码到隔离的测试环境。
- 分析代码与意图: 利用LLM分析目标文件的函数签名、docstrings、现有测试和相关上下文,理解其预期行为。
- 生成新测试用例: 根据分析结果和覆盖率目标,生成新的单元测试或集成测试。
- 执行测试: 运行所有(现有和新生成的)测试套件。
- 收集覆盖率: 使用代码覆盖率工具收集测试覆盖率数据。
- 报告生成: 整合测试结果和覆盖率数据,生成结构化的测试报告。
- 发送测试报告: 将 `test_report` 消息发送回ROA。
关键技术与实现考虑:
- LLM驱动的测试生成:
  - 函数签名分析: LLM可以解析函数或方法的签名,推断输入类型和可能的边界条件。
  - Docstring理解: 如果代码有良好的文档字符串,LLM可以利用它们来理解函数的功能和预期行为,从而生成更精准的测试。
  - 现有测试学习: 分析项目中的现有测试模式,学习如何编写符合项目风格和质量标准的测试用例。
  - 边缘情况推断: LLM可以根据代码逻辑推断出可能导致错误或特殊行为的边缘情况,并生成相应的测试。
- 测试框架集成: 能够与流行的测试框架(如Python的Pytest,Java的JUnit,JavaScript的Jest/Mocha)无缝集成,执行测试并解析结果。
- 隔离测试环境: 测试应在独立的、可重现的环境中运行(如Docker容器),以避免环境污染和依赖冲突(参见下文示意)。
- 增量测试: 仅对受代码变更影响的部分进行测试,以提高效率。这需要精确的依赖分析。
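针对上文“隔离测试环境”一条,下面给出一个通过Docker CLI在一次性容器中运行pytest的最小草图(假设宿主机已安装Docker,镜像 python:3.11-slim 与挂载路径均为示例值):
```python
import subprocess

def run_tests_in_container(repo_path: str, image: str = "python:3.11-slim") -> int:
    """在一次性容器中安装pytest并运行测试,返回退出码。"""
    cmd = [
        "docker", "run", "--rm",          # --rm 保证容器用完即弃,环境可复现
        "-v", f"{repo_path}:/workspace",  # 挂载待测代码
        "-w", "/workspace",
        image,
        "sh", "-c", "pip install -q pytest pytest-cov && pytest --cov=. -q",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout)
    if result.returncode != 0:
        print(result.stderr)
    return result.returncode

# 用法示例(路径为假设值)
# exit_code = run_tests_in_container("/tmp/tgea_workspace/my-repo")
```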
TGEA Python代码示例(简化版):
import os
import subprocess
import json
import logging
from typing import List, Dict, Any, Optional
from openai import OpenAI # 假设使用OpenAI API
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
class TestGenerationExecutionAgent:
def __init__(self, llm_api_key: str, workspace_dir: str = "tgea_workspace"):
self.client = OpenAI(api_key=llm_api_key)
self.workspace_dir = workspace_dir
os.makedirs(self.workspace_dir, exist_ok=True)
logging.info("TestGenerationExecutionAgent initialized.")
def _get_function_details(self, file_content: str, file_path_relative: str) -> List[Dict[str, Any]]:
"""Uses LLM to extract function signatures and docstrings from code."""
prompt = f"""Given the following Python code from file '{file_path_relative}', identify all functions/methods. For each, extract its name, full signature (including parameters and their types/defaults if specified), and its docstring (if present).
Code:
```python
{file_content}
```
Please respond in JSON format as an object with a single key "functions", whose value is an array of objects. Each object should have 'name', 'signature', 'docstring', 'start_line', and 'end_line' keys.
Example:
{{
"functions": [
{{
"name": "my_function",
"signature": "def my_function(param1: int, param2: str = 'default') -> bool:",
"docstring": "This function does X and returns Y.",
"start_line": 10,
"end_line": 25
}}
]
}}
"""
try:
response = self.client.chat.completions.create(
model="gpt-4",
messages=[
{"role": "system", "content": "You are a helpful assistant for code analysis."},
{"role": "user", "content": prompt}
],
response_format={"type": "json_object"}
)
content = response.choices[0].message.content
return json.loads(content).get("functions", [])
except Exception as e:
logging.error(f"LLM function detail extraction failed: {e}")
return []
def _generate_unit_tests(self, function_details: Dict[str, Any], code_snippet: str, file_path_relative: str) -> Optional[str]:
"""Uses LLM to generate unit tests for a given function."""
prompt = f"""Given the following Python function details and its code snippet from '{file_path_relative}', generate comprehensive unit tests using `pytest`. Focus on typical cases, edge cases, and error handling. Ensure proper imports and setup.
Function Details:
{json.dumps(function_details, indent=2)}
Code Snippet:
{code_snippet}
Generate only the Python code for the test file. Do not include any explanations or extra text.
The test file should be named test_{function_details['name']}.py.
"""
try:
response = self.client.chat.completions.create(
model="gpt-4",
messages=[
{"role": "system", "content": "You are an expert in writing pytest unit tests."},
{"role": "user", "content": prompt}
]
)
return response.choices[0].message.content
except Exception as e:
logging.error(f"LLM test generation failed for {function_details[‘name’]}: {e}")
return None
def _execute_pytest(self, test_dir: str, target_files: List[str]) -> Dict[str, Any]:
"""Executes pytest in a given directory and collects results and coverage."""
try:
# Run pytest and collect coverage
# --cov=. --cov-report=json:coverage.json
# target_files can be used to limit scope, e.g., pytest tests/test_my_module.py
cmd = ['pytest', '--json-report', '--cov=.', '--cov-report=json:coverage.json']  # 以子进程cwd(test_dir)为覆盖率统计根目录
# If specific files are passed, add them to the command
if target_files:
# Need to map relative paths to test file paths
# For simplicity, we assume tests are in `test_dir/test_*.py`
# A more robust solution would map changed source files to relevant test files
pass # Skipping for now for simplicity, assuming pytest finds tests in test_dir
result = subprocess.run(
cmd,
capture_output=True, text=True, check=False, cwd=test_dir
)
test_report = {"status": "failed", "details": "No report found.", "coverage": {}}
if os.path.exists(os.path.join(test_dir, ".report.json")):
with open(os.path.join(test_dir, ".report.json"), 'r') as f:
test_report['details'] = json.load(f)
test_report['status'] = "passed" if test_report['details']['summary']['passed'] == test_report['details']['summary']['total'] else "failed"
if os.path.exists(os.path.join(test_dir, "coverage.json")):
with open(os.path.join(test_dir, "coverage.json"), 'r') as f:
test_report['coverage'] = json.load(f)
return test_report
except Exception as e:
logging.error(f"Pytest execution failed in {test_dir}: {e}")
return {"status": "error", "details": str(e), "coverage": {}}
def generate_and_run_tests(self, repo_path: str, target_files: List[str]) -> Dict[str, Any]:
"""Main method to generate and run tests."""
test_output_dir = os.path.join(self.workspace_dir, "tests", os.path.basename(repo_path))
os.makedirs(test_output_dir, exist_ok=True)
all_test_reports = []
newly_generated_tests_count = 0
for file_path_relative in target_files:
full_file_path = os.path.join(repo_path, file_path_relative)
if not os.path.exists(full_file_path):
logging.warning(f"File not found: {full_file_path}. Skipping test generation.")
continue
file_content = ""
try:
with open(full_file_path, 'r', encoding='utf-8') as f:
file_content = f.read()
except Exception as e:
logging.error(f"Could not read file {full_file_path}: {e}")
continue
if os.path.splitext(full_file_path)[1] == '.py':
function_details_list = self._get_function_details(file_content, file_path_relative)
for func_detail in function_details_list:
# Extract snippet for the function
lines = file_content.splitlines()
func_snippet = "n".join(lines[func_detail['start_line']-1:func_detail['end_line']])
test_code = self._generate_unit_tests(func_detail, func_snippet, file_path_relative)
if test_code:
test_file_name = f"test_{func_detail['name']}.py"
test_file_path = os.path.join(test_output_dir, test_file_name)
with open(test_file_path, 'w', encoding='utf-8') as f:
f.write(test_code)
logging.info(f"Generated test file: {test_file_path}")
newly_generated_tests_count += 1
else:
logging.info(f"Test generation not supported for {os.path.splitext(full_file_path)[1]} files.")
# Now run all tests (including existing ones and newly generated ones)
# For simplicity, we'll assume existing tests are also in `test_output_dir`
# In a real scenario, you'd copy existing tests or point to them.
overall_test_report = self._execute_pytest(test_output_dir, target_files)
overall_test_report['newly_generated_tests_count'] = newly_generated_tests_count
return overall_test_report
# Example usage (within ROA or a test harness)
# if __name__ == "__main__":
#     # This part would typically be orchestrated by ROA
#     dummy_repo_path = "temp_repo_for_tests"
#     os.makedirs(dummy_repo_path, exist_ok=True)
#     with open(os.path.join(dummy_repo_path, "my_module.py"), "w") as f:
#         # 用三单引号包裹文件内容,避免与模块内docstring的三双引号冲突
#         f.write('''
# def add(a: int, b: int) -> int:
#     """Adds two integers and returns the sum."""
#     return a + b
# def subtract(a: int, b: int) -> int:
#     """Subtracts b from a."""
#     return a - b
# def divide(numerator: float, denominator: float) -> float:
#     """Divides numerator by denominator. Raises ValueError if denominator is zero."""
#     if denominator == 0:
#         raise ValueError("Cannot divide by zero")
#     return numerator / denominator
# ''')
#     tgea = TestGenerationExecutionAgent(llm_api_key=os.getenv("OPENAI_API_KEY"))
#     target_files = ["my_module.py"]
#     report = tgea.generate_and_run_tests(dummy_repo_path, target_files)
#     print(json.dumps(report, indent=2))
#     # Clean up
#     # import shutil
#     # shutil.rmtree(dummy_repo_path)
**测试报告数据结构示例:**
# Test result structures
# This defines the expected test report structure from the TGEA.
# It includes overall status, detailed results for each test, and coverage information.
test_report_example = {
"status": "passed", # "passed", "failed", "error"
"newly_generated_tests_count": 3,
"details": {
"summary": {
"total": 5,
"passed": 5,
"failed": 0,
"skipped": 0,
"errors": 0,
"xpassed": 0,
"xfailed": 0,
"duration": 0.123
},
"tests": [
{
"nodeid": "test_add.py::test_add_positive_numbers",
"outcome": "passed",
"duration": 0.001,
"call": {
"longrepr": None,
"when": "call",
"wasxfail": False
}
},
{
"nodeid": "test_add.py::test_add_negative_numbers",
"outcome": "passed",
"duration": 0.001,
"call": {
"longrepr": None,
"when": "call",
"wasxfail": False
}
},
{
"nodeid": "test_divide.py::test_divide_by_zero",
"outcome": "passed", # In this case, passing means it raised ValueError as expected
"duration": 0.002,
"call": {
"longrepr": None,
"when": "call",
"wasxfail": False
}
}
# ... more tests
]
},
"coverage": {
"meta": {
"version": "6.5.0",
"timestamp": "2023-10-27T10:30:00Z"
},
"files": {
"my_module.py": {
"executed_pct": 100.0,
"num_statements": 10,
"missing_statements": 0,
"excluded_statements": 0,
"missing_branches": 0,
"num_branches": 0,
"executed_branches": 0,
"summary": {
"covered_lines": 10,
"num_statements": 10,
"percent_covered": 100.0,
"missing_lines": 0
}
}
},
"totals": {
"covered_lines": 10,
"num_statements": 10,
"percent_covered": 100.0,
"missing_lines": 0
}
}
}
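基于上述覆盖率数据结构,下面给出一个检查覆盖率阈值、找出覆盖不足文件的最小示意(80%阈值与函数名 `check_coverage` 均为示例假设,字段名沿用上面的报告结构):
```python
from typing import Dict, Any, List, Tuple

def check_coverage(report: Dict[str, Any], threshold: float = 80.0) -> Tuple[bool, List[str]]:
    """返回(是否达标, 覆盖率低于阈值的文件列表),供ROA的策略评估使用。"""
    coverage = report.get("coverage", {})
    total_pct = coverage.get("totals", {}).get("percent_covered", 0.0)
    weak_files = [
        path for path, data in coverage.get("files", {}).items()
        if data.get("summary", {}).get("percent_covered", 0.0) < threshold
    ]
    return (total_pct >= threshold and not weak_files), weak_files

# 用法示例:test_report_example 即上文定义的示例报告
# ok, weak = check_coverage(test_report_example, threshold=80.0)
# print(f"coverage ok: {ok}, files below threshold: {weak}")
```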
4.3 Agent 3: 重构与编排Agent (ROA)
ROA是整个“Peer Review Circuits”的控制中心和决策者。它负责接收和综合CRA和TGEA的报告,根据预定义的策略和LLM的推理能力,决定下一步的行动,并与版本控制系统交互。
ROA的工作流程:
- 检测代码变更: 通过VCS webhook或定期轮询,检测到新的代码提交或PR创建。
- 触发审查与测试: 向CRA和TGEA发送相应的请求,并跟踪它们的执行状态。
- 接收并综合报告: 等待并接收CRA的 `review_report` 和TGEA的 `test_report`。
- 分析与决策:
  - 问题分类: 将CRA报告中的问题(ERROR、WARNING、INFO)和TGEA报告中的测试结果(passed、failed、error)进行分类和优先级排序。
  - 策略评估: 根据预设的策略(例如,ERROR级别的问题不允许合并,测试覆盖率不能低于X%,所有测试必须通过)。
  - LLM辅助决策: 对于复杂的场景,LLM可以帮助ROA理解报告的语义,权衡不同类型问题的严重性,并提出更合理的行动建议。
- 执行行动: 根据决策结果,执行以下一个或多个操作:
  - 自动应用修复: 对于CRA报告中带有 `fix_patch` 的自动修复项,ROA直接应用补丁,并提交到一个新的分支。
  - 触发重新审查/测试: 如果应用了修复,ROA会再次触发CRA和TGEA,以验证修复效果。
  - 创建PR并建议修改: 如果存在无法自动修复的问题,ROA创建一个新的PR,并在PR描述中详细列出所有问题、建议和LLM的解释。
  - 通知开发者: 通过Slack、邮件或VCS评论通知开发者。
  - 批准/合并代码: 如果所有检查通过,ROA可以自动批准PR或将其合并到目标分支。
  - 回滚/阻止合并: 如果存在严重问题(如关键测试失败),ROA阻止合并并可能回滚变更。
- 更新VCS状态: 根据行动结果,更新PR状态(如“通过”、“失败”、“需要修改”)。
关键技术与实现考虑:
- VCS集成: 必须能够与Git、GitHub、GitLab、Bitbucket等版本控制系统进行深度集成,包括:
  - 接收webhook事件(可参考下文的Flask示意)。
  - 拉取代码。
  - 创建新分支。
  - 应用补丁。
  - 提交代码。
  - 创建/更新PR。
  - 添加评论。
  - 设置PR状态。
- 决策引擎: 实现一个灵活的规则引擎来处理各种决策逻辑,例如:
  - `IF CRA.errors > 0 THEN block_merge`
  - `IF TGEA.status == "failed" THEN request_rework`
  - `IF CRA.auto_fixable_errors > 0 AND TGEA.status == "passed" THEN auto_apply_fix_and_retest`
- LLM用于高级决策与重构:
  - 综合报告解释: LLM可以对CRA和TGEA的原始报告进行更高层次的抽象和总结,帮助ROA做出更明智的决策。
  - 上下文感知重构: 对于CRA识别出的“函数过长”或TGEA识别出的“覆盖率不足”,ROA可以利用LLM生成更复杂的重构建议,甚至直接生成重构后的代码补丁。
  - PR描述生成: 自动生成清晰、有条理的PR描述,包含所有审查和测试结果,以及建议的下一步行动。
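在进入ROA的代码示例之前,先补充一个接收VCS webhook并触发ROA的最小草图(假设使用Flask接收GitHub的push事件;payload字段名以GitHub webhook文档为准,此处仅作示意,OrchestrationAgent 即下文实现):
```python
# 假设使用Flask接收GitHub push事件;字段名以GitHub webhook文档为准
from flask import Flask, request, jsonify

app = Flask(__name__)
# roa = OrchestrationAgent(github_token=..., llm_api_key=...)  # 见下文实现

@app.route("/webhooks/push", methods=["POST"])
def on_push():
    event = request.get_json(force=True)
    repo_url = event["repository"]["clone_url"]
    branch = event["ref"].removeprefix("refs/heads/")  # Python 3.9+: refs/heads/feature/x -> feature/x
    commit = event["after"]
    head = event.get("head_commit") or {}
    changed = sorted(set(head.get("added", []) + head.get("modified", [])))
    # roa.process_code_change(repo_url, branch, commit, changed)
    return jsonify({"status": "accepted", "commit": commit}), 202

if __name__ == "__main__":
    app.run(port=8080)
```
实际部署中,webhook处理应尽快返回,并把 process_code_change 放入后台任务队列异步执行,以免阻塞VCS的回调。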
ROA Python代码示例(简化版):
import os
import json
import logging
import subprocess
from typing import List, Dict, Any
from github import Github # 假设使用PyGithub库与GitHub交互
from openai import OpenAI # 假设使用OpenAI API
from cra_agent import CodeReviewAgent # 导入之前定义的Agent
from tgea_agent import TestGenerationExecutionAgent # 导入之前定义的Agent
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
class OrchestrationAgent:
def __init__(self, github_token: str, llm_api_key: str, workspace_dir: str = "roa_workspace"):
self.github = Github(github_token)
self.llm_client = OpenAI(api_key=llm_api_key)
self.workspace_dir = workspace_dir
os.makedirs(self.workspace_dir, exist_ok=True)
self.cra = CodeReviewAgent(llm_api_key=llm_api_key)
self.tgea = TestGenerationExecutionAgent(llm_api_key=llm_api_key)
logging.info("OrchestrationAgent initialized.")
def _clone_repo(self, repo_url: str, branch_name: str, commit_hash: str) -> str:
"""Clones a repository to a temporary directory."""
repo_name = repo_url.split('/')[-1].replace('.git', '')
local_repo_path = os.path.join(self.workspace_dir, repo_name, commit_hash)
os.makedirs(local_repo_path, exist_ok=True)
try:
# Clone and checkout specific commit
subprocess.run(['git', 'clone', repo_url, local_repo_path], check=True)
subprocess.run(['git', '-C', local_repo_path, 'checkout', commit_hash], check=True)
logging.info(f"Cloned {repo_url} to {local_repo_path} and checked out {commit_hash}")
return local_repo_path
except subprocess.CalledProcessError as e:
logging.error(f"Failed to clone/checkout repo {repo_url}: {e}")
raise
def _apply_patch(self, repo_path: str, patch_content: str) -> bool:
"""Applies a git patch to the repository."""
try:
# Use 'git apply' to apply the patch
process = subprocess.run(
['git', 'apply', '--whitespace=fix', '-'],
input=patch_content,
text=True,
capture_output=True,
check=True,
cwd=repo_path
)
logging.info(f"Patch applied successfully to {repo_path}:n{process.stdout}")
return True
except subprocess.CalledProcessError as e:
logging.error(f"Failed to apply patch to {repo_path}:n{e.stderr}")
return False
def _create_github_pr(self, repo_full_name: str, base_branch: str, head_branch: str, title: str, body: str) -> Any:
"""Creates a GitHub Pull Request."""
try:
repo = self.github.get_repo(repo_full_name)  # repo_full_name 形如 "org/repo"
pr = repo.create_pull(title=title, body=body, head=head_branch, base=base_branch)
logging.info(f"Created PR: {pr.html_url}")
return pr
except Exception as e:
logging.error(f"Failed to create GitHub PR: {e}")
raise
def _commit_and_push_fixes(self, repo_path: str, branch_name: str, commit_message: str) -> str:
"""Commits changes and pushes to a new branch."""
try:
subprocess.run(['git', '-C', repo_path, 'add', '.'], check=True)
subprocess.run(['git', '-C', repo_path, 'commit', '-m', commit_message], check=True)
subprocess.run(['git', '-C', repo_path, 'push', '-u', 'origin', branch_name], check=True)
logging.info(f"Committed and pushed to branch {branch_name}")
return subprocess.run(['git', '-C', repo_path, 'rev-parse', 'HEAD'], capture_output=True, text=True, check=True).stdout.strip()
except subprocess.CalledProcessError as e:
logging.error(f"Failed to commit/push fixes: {e.stderr}")
raise
def process_code_change(self, repo_url: str, branch_name: str, commit_hash: str, changed_files: List[str]):
"""Main entry point for processing a code change."""
logging.info(f"Processing commit {commit_hash} in {repo_url}/{branch_name}")
local_repo_path = self._clone_repo(repo_url, branch_name, commit_hash)
# Step 1: Request Code Review from CRA
logging.info("Requesting code review...")
cra_report = self.cra.review_code(local_repo_path, changed_files)
logging.info(f"CRA Report Summary: {cra_report['summary']}")
# Step 2: Request Test Generation & Execution from TGEA
logging.info("Requesting test generation and execution...")
tgea_report = self.tgea.generate_and_run_tests(local_repo_path, changed_files)
logging.info(f"TGEA Report Status: {tgea_report['status']}, Newly generated tests: {tgea_report.get('newly_generated_tests_count', 0)}")
# Step 3: Analyze Reports and Make Decision
pr_title = f"Automated Review for {commit_hash[:7]}"
pr_body_parts = []
action_required = False
auto_fix_patches = []
# Analyze CRA report
if cra_report['summary']['ERROR'] > 0:
pr_body_parts.append(f"**Code Review Errors:** {cra_report['summary']['ERROR']} critical issues found.")
action_required = True
if cra_report['summary']['WARNING'] > 0:
pr_body_parts.append(f"**Code Review Warnings:** {cra_report['summary']['WARNING']} warnings found.")
for finding in cra_report['findings']:
if finding['auto_fixable'] and finding['fix_patch']:
auto_fix_patches.append(finding['fix_patch'])
pr_body_parts.append(f"- [{finding['severity']}] {finding['file']}:{finding['line']} - {finding['message']}n Suggestion: {finding['suggestion']}n Explanation: {finding['llm_explanation']}")
# Analyze TGEA report
if tgea_report['status'] == "failed" or tgea_report['status'] == "error":
pr_body_parts.append(f"**Test Execution Failed!**")
pr_body_parts.append(json.dumps(tgea_report['details'], indent=2))
action_required = True
else:
pr_body_parts.append(f"**Tests Passed!** Coverage: {tgea_report['coverage'].get('totals', {}).get('percent_covered', 'N/A')}%")
if tgea_report['newly_generated_tests_count'] > 0:
pr_body_parts.append(f"Generated {tgea_report['newly_generated_tests_count']} new tests.")
final_pr_body = "nn".join(pr_body_parts)
# Decision Logic
if not action_required and not auto_fix_patches:
logging.info("All checks passed, no fixes needed. Code is ready for merge.")
# In a real system, you might auto-merge here or mark PR as approved
print(f"Code for {commit_hash[:7]} is clean and tests pass. Ready for merge.")
return True
if auto_fix_patches and not action_required:
logging.info("Applying auto-fixable patches and re-running checks...")
# Create a new branch for auto-fixes
fix_branch_name = f"auto-fix/{branch_name}-{commit_hash[:7]}"
subprocess.run(['git', '-C', local_repo_path, 'checkout', '-b', fix_branch_name], check=True)
for patch in auto_fix_patches:
self._apply_patch(local_repo_path, patch)
# Commit and push fixes
new_commit_hash = self._commit_and_push_fixes(local_repo_path, fix_branch_name, f"Auto-fix for {commit_hash[:7]}")
# Re-trigger the process for the new commit
logging.info(f"Auto-fixes applied. Re-triggering review for new commit {new_commit_hash[:7]}")
# This would be an internal loop or message back to self
self.process_code_change(repo_url, fix_branch_name, new_commit_hash, changed_files)
return True
if action_required or auto_fix_patches: # If there are issues or fixes to be proposed
logging.info("Issues found or fixes proposed. Creating/updating PR.")
# Assume we are dealing with a PR, or create a new one
# For simplicity, we'll just print the PR details.
# In a real system, you'd interact with the actual PR object
print(f"n--- PR Title ---n{pr_title}nn--- PR Body ---n{final_pr_body}")
# Example: self._create_github_pr(repo_full_name, branch_name, f"agent-review/{commit_hash[:7]}", pr_title, final_pr_body)
return False
# Example usage (simulating a webhook trigger)
# if __name__ == "__