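Good afternoon, fellow engineers!

Today we are here to explore an exciting and challenging frontier topic: building an autonomous development Agent. With AI advancing so quickly, it is natural to ask what software development will look like in the future. Could there be a system that does more than assist us, one that independently closes the whole development loop: understanding requirements, writing code, validating it with tests, fixing problems, and finally submitting the code?

The answer is yes, and we are turning it into reality step by step. Today I will walk through how to build such an "autonomous development Agent": a closed-loop system that writes code, runs tests, repairs itself based on error output, and finally opens a Pull Request (PR). We will dissect it from the architecture and core components down to the implementation details.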

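01. The Autonomous Development Agent: Vision and Core Idea

Imagine describing a requirement to a system that then understands, plans, codes, tests, and corrects on its own, until fully functional code that passes all tests lands in your version control system. This is the "autonomous development Agent" we are after: an agent that can emulate, and even surpass, the workflow of a junior developer.

Its core idea is closed-loop feedback. The traditional development process is linear: requirement -> development -> testing -> fixing -> submission. The autonomous development Agent internalizes this process as a dynamic loop. It does more than generate code: it "perceives" the result of running the code (through tests), "understands" why something failed (through error analysis), and "acts" to fix the problem (through code changes). Every stage of the loop is AI-driven, with the aim of minimizing human intervention.

The Agent is not just a code generator. It is an intelligent "developer" with the following key capabilities:

Understanding and planning: turning high-level requirements into an executable technical approach and concrete steps.
Code generation and modification: writing new code according to the plan, or iterating on existing code.
Environment interaction: running code and tests in a real environment and capturing the output.
Error analysis and reflection: parsing run results and error messages to diagnose the root cause.
Self-correction: adjusting the plan or modifying the code directly, based on the diagnosis.
Version control and collaboration: committing the final result to the repository, ready for team collaboration.

We will build our Agent around these six capabilities.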

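02. High-Level Architecture: An Iterative Closed-Loop System

To realize this vision, the autonomous development Agent needs a modular, extensible architecture. I designed it as an iterative execution loop with a large language model (LLM) at its core.

Core components at a glance: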

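| Component | Responsibility | Input | Output | Enabling technology |
| --- | --- | --- | --- | --- |
| Planner | Decomposes the task and produces a detailed development plan. | User requirement/Issue, context from earlier tasks | Structured task list with steps and goals | LLM (Chain-of-Thought, ReAct) |
| Coder | Generates or modifies code, both feature code and test code. | Planner output, existing file contents, code-structure context | Proposed code blocks, paths of files to modify | LLM (in-context learning, RAG) |
| Executor | Runs code and tests in a sandbox and captures the output. | Coder output (file contents), commands to run | Command results (stdout, stderr, exit code) | Sandbox (Docker, venv), CLI tools (pytest) |
| Reflector | Analyzes results and errors, diagnoses problems, proposes fixes. | Executor output (error logs), original plan, code changes | Diagnosis, fix strategy, or direct code-change suggestions | LLM (error parsing, self-reflection) |
| Committer | Handles version control: creates the branch, commits code, opens the PR. | Code that passed the tests, PR description | Results of Git operations, PR link | Git CLI/API, LLM (PR description generation) |
| Memory/Context Manager | Stores and retrieves the Agent's interaction history, code snippets, file structure, and so on. | Inputs and outputs of all components | Context information | Vector database, KV store |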

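Workflow at a glance:

Initialization: the user submits a requirement (for example, a GitHub Issue).
Planning phase: the planner takes the requirement, combines it with project context, and produces a detailed, executable development plan.
Coding phase: the coder follows the plan and generates or modifies code files one by one (feature code and test code).
Execution phase: the executor runs the generated code and tests in an isolated environment and captures all output.
Reflection phase: the reflector analyzes the executor's output.
- If the tests pass and there are no other errors, the flow moves on to the commit phase.
- If a test fails or a runtime error occurs, the reflector diagnoses the problem and sends fix suggestions to the planner or the coder, returning to the planning or coding phase. This forms the self-repair loop.
Commit phase: once all tests pass and the code-quality checks are clean, the committer creates a new branch, commits the code to version control, and opens a Pull Request.

Now let us look at the implementation of each core component in turn.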
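The control flow above can be sketched as a small loop. This is a minimal illustration rather than the implementation built later in this talk: `run_loop`, `code_fn`, `test_fn`, and `fix_fn` are hypothetical names, and the three callables merely stand in for the coder, the executor, and the reflector.

```python
from typing import Callable, Dict, Tuple

Files = Dict[str, str]  # file path -> file content


def run_loop(
    code_fn: Callable[[], Files],
    test_fn: Callable[[Files], Tuple[bool, str]],
    fix_fn: Callable[[Files, str], Files],
    max_iterations: int = 5,
) -> Tuple[bool, Files, int]:
    """Code once, then alternate test and fix until the tests pass
    or the iteration budget is used up.

    Returns (success, final_files, number_of_test_runs).
    """
    files = code_fn()
    for attempt in range(1, max_iterations + 1):
        passed, log = test_fn(files)
        if passed:
            return True, files, attempt
        files = fix_fn(files, log)  # feed the failure log back into the fix step
    return False, files, max_iterations


# Toy stand-ins: the first draft contains a bug, and the "reflector" repairs it.
def toy_code() -> Files:
    return {"calc.py": "def add(a, b): return a - b"}


def toy_test(files: Files) -> Tuple[bool, str]:
    scope: dict = {}
    exec(files["calc.py"], scope)  # acceptable for a toy; real code runs in a sandbox
    ok = scope["add"](2, 3) == 5
    return ok, "" if ok else "add(2, 3) returned the wrong value"


def toy_fix(files: Files, log: str) -> Files:
    return {"calc.py": files["calc.py"].replace("a - b", "a + b")}


success, final_files, runs = run_loop(toy_code, toy_test, toy_fix)
print(success, runs)
```

The iteration cap matters: without it, a model that keeps producing the same wrong fix would loop forever.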

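03. Core Components in Depth: The Planner

The planner is the Agent's "brain". It turns a vague requirement into clear, executable steps. The quality of this phase directly affects the efficiency of the coding and repair phases that follow.

Core tasks:

Requirement understanding: extract the key information and goals from the user's input.
Task decomposition: break a large task into smaller, more specific subtasks.
Strategy: define an implementation strategy for each subtask, including the files to modify, the functions to add, and the test cases.

Implementation:

We rely mainly on the LLM's comprehension and reasoning abilities. A carefully designed prompt guides the LLM to think in a structured way.

Prompt engineering example:

Suppose our task is: "Add a divide function to math_utils.py that handles a zero divisor by returning a specific error message, and write unit tests for it."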

# planner_prompt.py
import json

PLANNER_SYSTEM_PROMPT = """
You are a senior software engineer responsible for turning user requirements into a detailed development plan.
Your goals are:
1. Understand the user requirement in depth.
2. Break the requirement down into a series of specific, executable subtasks.
3. For each subtask, specify the files to operate on, the expected change type (add, modify, delete), and the implementation approach.
4. Think about likely edge cases and the testing strategy.

Output your plan as JSON and make sure every step is clear and unambiguous.
Your output must contain only JSON, without any additional explanation.

JSON Schema:
{
  "plan": [
    {
      "step": "string", // step title
      "description": "string", // detailed explanation
      "files_to_operate": ["string"], // list of file paths involved
      "operation_type": "string", // "ADD", "MODIFY", "DELETE"
      "expected_outcome": "string", // expected result or state
      "implementation_notes": "string" // notes or ideas for the implementation
    }
  ]
}
"""

PLANNER_USER_PROMPT_TEMPLATE = """
User requirement:
{user_request}

Project context (partial file structure, if relevant):
{project_context}

Based on the information above, generate a detailed development plan.
"""

def generate_plan(llm_client, user_request, project_context):
    """
    Generate a development plan with the LLM.
    """
    user_prompt = PLANNER_USER_PROMPT_TEMPLATE.format(
        user_request=user_request,
        project_context=project_context
    )

    response = llm_client.chat.completions.create(
        model="gpt-4o", # or another suitable LLM
        messages=[
            {"role": "system", "content": PLANNER_SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt}
        ],
        response_format={"type": "json_object"},
        temperature=0.7
    )

    # Assumes the LLM returned a valid JSON string; json.loads raises otherwise
    plan_json_str = response.choices[0].message.content
    return json.loads(plan_json_str)

# Example call
# llm_client = OpenAI(...) # assumes the LLM client is already initialized
# user_req = "Add a `divide` function to `src/math_utils.py` that handles division by zero, and write unit tests for it."
# proj_context = """
# src/
#   __init__.py
#   math_utils.py
# tests/
#   __init__.py
#   test_math_utils.py
# """
# plan = generate_plan(llm_client, user_req, proj_context)
# print(json.dumps(plan, indent=2))

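Example plan output (JSON):

{
  "plan": [
    {
      "step": "Create or modify `src/math_utils.py`",
      "description": "Define the `divide` function in `src/math_utils.py`.",
      "files_to_operate": ["src/math_utils.py"],
      "operation_type": "MODIFY",
      "expected_outcome": "The `divide` function is defined and importable.",
      "implementation_notes": "The function takes two parameters: `numerator` and `denominator`. It must handle a `denominator` of zero by returning a specific error message or raising a custom exception."
    },
    {
      "step": "Create or modify `tests/test_math_utils.py`",
      "description": "Write unit tests for the `divide` function in `src/math_utils.py`.",
      "files_to_operate": ["tests/test_math_utils.py"],
      "operation_type": "MODIFY",
      "expected_outcome": "All main behaviors and edge cases of `divide` pass their tests.",
      "implementation_notes": "Test at least: normal division, division by zero (expected error), division with negative numbers, floating-point division. Use the `pytest` framework."
    }
  ]
}

In this way the planner refines a high-level task into a series of actionable steps and gives the coding phase clear guidance.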
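The planner code assumes the model returns valid JSON that follows the schema. A defensive check before handing the plan to the coder is cheap. The sketch below is one possible validator, not part of the original design: `validate_plan`, `REQUIRED_KEYS`, and `ALLOWED_OPERATIONS` are hypothetical names derived from the schema in the system prompt.

```python
import json
from typing import Any, Dict, List

# Keys and operation types taken from the JSON schema in the planner prompt.
REQUIRED_KEYS = {
    "step", "description", "files_to_operate",
    "operation_type", "expected_outcome", "implementation_notes",
}
ALLOWED_OPERATIONS = {"ADD", "MODIFY", "DELETE"}


def validate_plan(raw: str) -> List[Dict[str, Any]]:
    """Parse the planner's reply and reject plans that break the schema."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    steps = data.get("plan") if isinstance(data, dict) else None
    if not isinstance(steps, list) or not steps:
        raise ValueError("'plan' must be a non-empty list")
    for i, step in enumerate(steps):
        if not isinstance(step, dict):
            raise ValueError(f"step {i} is not an object")
        missing = REQUIRED_KEYS - step.keys()
        if missing:
            raise ValueError(f"step {i} is missing keys: {sorted(missing)}")
        if step["operation_type"] not in ALLOWED_OPERATIONS:
            raise ValueError(f"step {i} has unknown operation_type {step['operation_type']!r}")
        files = step["files_to_operate"]
        if not isinstance(files, list) or not files or not all(isinstance(p, str) for p in files):
            raise ValueError(f"step {i} must list at least one file path")
    return steps


good = json.dumps({"plan": [{
    "step": "Create src/math_utils.py",
    "description": "Define divide.",
    "files_to_operate": ["src/math_utils.py"],
    "operation_type": "ADD",
    "expected_outcome": "divide is importable.",
    "implementation_notes": "Raise ValueError on a zero divisor.",
}]})
steps = validate_plan(good)
print(len(steps), steps[0]["operation_type"])
```

Because every failure surfaces as a `ValueError`, the caller can catch one exception type and ask the model to regenerate the plan.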

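04. Core Components in Depth: The Coder

The coder is the Agent's "hands". It turns the planner's instructions into actual code: generating new code blocks, modifying existing code, and even generating new test files.

Core tasks:

Code generation: produce feature code from the plan and the context.
Test generation: write matching unit tests for new functionality.
Code modification: during self-repair, modify existing code following the reflector's suggestions.

Implementation:

This also relies on the LLM, with more emphasis on context management and precise control over code structure.

Context management (Retrieval-Augmented Generation, RAG):

The LLM's context window is limited. For a large project we cannot put all the code into the prompt, so we need:

File retrieval: fetch the code of relevant files based on the planner's files_to_operate and the task content.
Snippet extraction: if a file is too large, extract only the function or class definitions relevant to the task.
Structured context: besides code, provide the project structure, dependencies, coding conventions, and similar information.

Prompt engineering example: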
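Snippet extraction can be done with the standard library alone. The sketch below is a minimal example under the assumption that the relevant definition is a top-level function or class; `extract_definition` is a hypothetical helper, not a component defined elsewhere in this talk.

```python
import ast
from typing import Optional


def extract_definition(source: str, name: str) -> Optional[str]:
    """Return the source text of the top-level function or class called `name`,
    or None if the module does not define it."""
    tree = ast.parse(source)
    for node in tree.body:
        is_definition = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        if is_definition and node.name == name:
            # get_source_segment slices the original text using the node's position info
            return ast.get_source_segment(source, node)
    return None


sample = '''import math

def area(r):
    return math.pi * r * r

def divide(a, b):
    if b == 0:
        raise ValueError("Cannot divide by zero.")
    return a / b
'''

snippet = extract_definition(sample, "divide")
print(snippet)
```

Only the `divide` definition reaches the prompt, so unrelated code such as `area` does not consume context. Nested definitions and decorators would need extra handling.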

# coder_prompt.py
import re

CODER_SYSTEM_PROMPT = """
You are a senior Python engineer. Your task is to modify or create code precisely, based on the given development plan and file contents.
Follow the instructions strictly and output only the complete content of the modified file (if the file does not exist, output the complete content of the new file).
Do not include any explanatory text besides the code itself.

Your output format should be:
```python
# <file_path>
<file_content>
```

If several files need to change, repeat the format above.
"""

CODER_USER_PROMPT_TEMPLATE = """
Development plan step:
{plan_step_description}

Current file content (if the file already exists):

{file_path}

{file_content}

Based on the plan step above, modify or create {file_path} accordingly and output the complete content of the resulting file.
"""

def generate_code(llm_client, plan_step, file_path, current_file_content):
    """
    Generate or modify code with the LLM.
    """
    user_prompt = CODER_USER_PROMPT_TEMPLATE.format(
        plan_step_description=plan_step["description"] + "\n" + plan_step["implementation_notes"],
        file_path=file_path,
        # The keyword must match the {file_content} placeholder in the template
        file_content=current_file_content if current_file_content else "# File does not exist; create a new file"
    )

    response = llm_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": CODER_SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt}
        ],
        temperature=0.2 # a lower temperature usually makes code output more deterministic
    )

    # Parse the LLM output to extract the file path and content
    output_text = response.choices[0].message.content.strip()

    # Find the fenced block: first line "# <path>", then the file body
    code_blocks = re.findall(r'```python\n# (.+?)\n(.*?)\n```', output_text, re.DOTALL)

    if not code_blocks:
        raise ValueError("The LLM did not return the expected code block format.")

    # Return the first matched file and content; a real system may need to handle several files
    return code_blocks[0][0].strip(), code_blocks[0][1]

# Assume `src/math_utils.py` is initially empty or contains only imports
# current_math_utils_content = ""
# file_path_math_utils, new_math_utils_code = generate_code(llm_client, plan["plan"][0], "src/math_utils.py", current_math_utils_content)
# print(f"Generated {file_path_math_utils}:\n{new_math_utils_code}")

# Assume `tests/test_math_utils.py` is initially empty
# current_test_utils_content = ""
# file_path_test_utils, new_test_utils_code = generate_code(llm_client, plan["plan"][1], "tests/test_math_utils.py", current_test_utils_content)
# print(f"Generated {file_path_test_utils}:\n{new_test_utils_code}")

Example coder output (src/math_utils.py):

# src/math_utils.py
def divide(numerator: float, denominator: float) -> float:
    """
    Performs division of two numbers.

    Args:
        numerator: The dividend.
        denominator: The divisor.

    Returns:
        The result of the division.

    Raises:
        ValueError: If the denominator is zero.
    """
    if denominator == 0:
        raise ValueError("Cannot divide by zero.")
    return numerator / denominator

Example coder output (tests/test_math_utils.py):

# tests/test_math_utils.py
import pytest
from src.math_utils import divide

def test_divide_positive_numbers():
    assert divide(10, 2) == 5.0
    assert divide(7, 3) == pytest.approx(2.333333)

def test_divide_negative_numbers():
    assert divide(-10, 2) == -5.0
    assert divide(10, -2) == -5.0
    assert divide(-10, -2) == 5.0

def test_divide_by_zero():
    with pytest.raises(ValueError, match="Cannot divide by zero."):
        divide(10, 0)
    with pytest.raises(ValueError, match="Cannot divide by zero."):
        divide(0, 0)

def test_divide_floating_point_numbers():
    assert divide(10.5, 2.5) == pytest.approx(4.2)
    assert divide(1, 3) == pytest.approx(0.3333333333333333)

def test_divide_zero_numerator():
    assert divide(0, 5) == 0.0

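When generating code, the coder tries to follow Python best practices and type hints, as well as the context supplied in the prompt (for example, the project's existing code style and naming conventions).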
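The coder prompt allows several files per reply, while `generate_code` returns only the first one. The sketch below shows one way to collect every file block, assuming the model follows the requested format exactly. `parse_file_blocks` and `FENCE` are hypothetical names; the fence string is built at runtime only so that this example contains no literal code fence.

```python
import re
from typing import Dict

FENCE = "`" * 3  # three backticks, assembled at runtime

# One block = fence + "python", a "# <path>" line, the file body, closing fence.
BLOCK_RE = re.compile(
    re.escape(FENCE) + r"python\n# (?P<path>[^\n]+)\n(?P<body>.*?)\n" + re.escape(FENCE),
    re.DOTALL,
)


def parse_file_blocks(reply: str) -> Dict[str, str]:
    """Map each file path in the model reply to its file content."""
    files = {m.group("path").strip(): m.group("body") for m in BLOCK_RE.finditer(reply)}
    if not files:
        raise ValueError("no file blocks found in the model reply")
    return files


# A hand-written reply standing in for real model output.
reply = (
    f"{FENCE}python\n# src/a.py\nX = 1\n{FENCE}\n"
    f"{FENCE}python\n# tests/test_a.py\nfrom src.a import X\n\n"
    f"def test_x():\n    assert X == 1\n{FENCE}\n"
)
files = parse_file_blocks(reply)
print(sorted(files))
```

A generated file that itself contains a code fence would confuse this pattern, so a production system may prefer a structured (for example JSON) reply format.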

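05. Core Components in Depth: The Executor

The executor is the Agent's "test bench". It runs code in a controlled environment and captures its output. This is a key step toward self-repair, because it provides objective facts about how the code behaves.

Core tasks:

Environment isolation: provide a clean, reproducible execution environment that prevents side effects and environment pollution.
Command execution: run the specified shell commands, such as unit tests, linters, and static analysis tools.
Result capture: capture the command's standard output, standard error, and exit code.

Implementation:

Sandbox: a Docker container or a Python venv virtual environment is recommended. Docker offers stronger isolation and suits multi-language projects; venv is lighter and suits pure-Python projects, but it isolates only Python packages, not the file system or network.
Python subprocess: use the subprocess module to execute shell commands.

Execution flow:

Write all files generated by the coder into the sandbox's working directory.
Build and run the test command (for example, pytest).
Parse the pytest output, decide whether the tests passed, and extract detailed error information.

Code example (using subprocess and pytest):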

# executor.py
import json
import os
import shutil
import subprocess

class Executor:
    def __init__(self, base_work_dir="agent_workspace"):
        self.base_work_dir = base_work_dir
        # Use a dedicated temporary work directory for this executor
        self.current_work_dir = os.path.join(self.base_work_dir, "run_" + os.urandom(4).hex())
        os.makedirs(self.current_work_dir, exist_ok=True)
        print(f"Executor initialized with work directory: {self.current_work_dir}")

    def _setup_environment(self, files_to_write):
        """
        Write the files into the work directory and set up a Python virtual environment (if needed).
        """
        for file_path, content in files_to_write.items():
            full_path = os.path.join(self.current_work_dir, file_path)
            os.makedirs(os.path.dirname(full_path), exist_ok=True)
            with open(full_path, "w", encoding="utf-8") as f:
                f.write(content)
            print(f"Wrote file: {full_path}")

        # The project root is assumed to be self.current_work_dir.
        # Optionally create a virtual environment and install dependencies into it:
        # venv_path = os.path.join(self.current_work_dir, ".venv")
        # if not os.path.exists(venv_path):
        #     subprocess.run(["python3", "-m", "venv", venv_path], cwd=self.current_work_dir, check=True)
        #     # Install dependencies with the venv's own pip (POSIX layout; Windows uses "Scripts")
        #     pip_exec = os.path.join(venv_path, "bin", "pip")
        #     subprocess.run([pip_exec, "install", "pytest"], cwd=self.current_work_dir, check=True)

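    def execute_tests(self, files_to_write):
        """
        Run the pytest suite inside the work directory.

        Args:
            files_to_write: a dict mapping file paths (relative to the work directory) to file contents.

        Returns:
            A dict with stdout, stderr, exit_code, the boolean test_passed, and test_report.
        """
        self._setup_environment(files_to_write)

        # Build the pytest command.
        # Assumes pytest and the pytest-json-report plugin (which provides --json-report)
        # are on PATH or installed in the virtual environment.
        # The target is "." because cwd is already the work directory; passing the
        # relative work directory again would resolve to a nested path that does not exist.
        command = ["pytest", "--json-report", "--json-report-file=.pytest_result.json", "."]

        # Put the work directory on PYTHONPATH so tests can import `src.*`
        env = {**os.environ, "PYTHONPATH": os.path.abspath(self.current_work_dir)}

        try:
            # Run the command and capture its output
            process = subprocess.run(
                command,
                cwd=self.current_work_dir,
                env=env,
                capture_output=True,
                text=True,
                check=False # do not raise, even if the command exits with a non-zero code
            )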

            stdout = process.stdout
            stderr = process.stderr
            exit_code = process.returncode

            # Load the JSON report, if the plugin produced one
            test_report = {}
            report_file = os.path.join(self.current_work_dir, ".pytest_result.json")
            if os.path.exists(report_file):
                with open(report_file, 'r') as f:
                    try:
                        test_report = json.load(f)
                    except json.JSONDecodeError:
                        print("Failed to decode pytest JSON report.")

            # pytest exit codes: 0 = all tests passed, 1 = some tests failed,
            # 2 = interrupted, 3 = internal error, 4 = usage error,
            # 5 = no tests collected. Only 0 counts as success; the report
            # is kept for diagnosis, not for overriding the exit code.
            test_passed = (exit_code == 0)

            return {
                "stdout": stdout,
                "stderr": stderr,
                "exit_code": exit_code,
                "test_passed": test_passed,
                "test_report": test_report
            }

        except FileNotFoundError:
            return {
                "stdout": "",
                "stderr": "Error: pytest command not found. Please ensure pytest is installed and in PATH.",
                "exit_code": 127,
                "test_passed": False,
                "test_report": {}
            }
        except Exception as e:
            return {
                "stdout": "",
                "stderr": f"An unexpected error occurred during execution: {e}",
                "exit_code": 1,
                "test_passed": False,
                "test_report": {}
            }
        finally:
            # Clean up the work directory (optional; keep it while debugging)
            shutil.rmtree(self.current_work_dir, ignore_errors=True)
            print(f"Cleaned up work directory: {self.current_work_dir}")

# Example call
# files = {
#     "src/math_utils.py": new_math_utils_code,
#     "tests/test_math_utils.py": new_test_utils_code
# }
# executor = Executor()
# results = executor.execute_tests(files)
# print(results)

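The executor's result contains everything needed downstream: standard output, standard error, the exit code, and a structured test report. This information is the basis for the reflector's diagnosis.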
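Generated code can hang, for example in an accidental infinite loop, and the executor above would then block forever. A timeout on the subprocess is a simple guard. The sketch below is an illustration under assumptions: `run_with_timeout` is a hypothetical helper, and the exit code 124 for a timeout is an arbitrary convention borrowed from the `timeout` shell utility.

```python
import subprocess
import sys
from typing import Dict, List, Union

Result = Dict[str, Union[str, int, bool]]


def run_with_timeout(command: List[str], timeout_s: float) -> Result:
    """Run a command and report a timeout as a failed result instead of hanging."""
    try:
        proc = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when this elapses
            check=False,
        )
        return {
            "stdout": proc.stdout,
            "stderr": proc.stderr,
            "exit_code": proc.returncode,
            "timed_out": False,
        }
    except subprocess.TimeoutExpired:
        return {
            "stdout": "",
            "stderr": f"Timed out after {timeout_s}s",
            "exit_code": 124,  # arbitrary marker for "timed out"
            "timed_out": True,
        }


quick = run_with_timeout([sys.executable, "-c", "print('ok')"], timeout_s=10)
slow = run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=0.5)
print(quick["stdout"].strip(), slow["timed_out"])
```

Returning the timeout in the same result shape lets the reflector treat "the tests never finished" as just another failure to diagnose.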

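06. Core Components in Depth: The Reflector

The reflector is the Agent's "judgment". It turns the raw data from the executor into actionable insight. This is the core of self-repair: the Agent improves its code through what the reflector concludes.

Core tasks:

Error parsing: extract key error information from stderr and the test report, such as stack traces, error types, and failing test cases.
Diagnosis: infer the root cause from the error information together with the original plan and the code (for example a syntax error, a logic error, an unhandled boundary condition, or a faulty test case).
Fix strategy: propose concrete code changes or adjustments to the plan.

Implementation:

This again relies mainly on the LLM, but it needs more elaborate prompt engineering to guide logical reasoning and problem solving.

Prompt engineering example: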

# reflector_prompt.py
import json

REFLECTOR_SYSTEM_PROMPT = """
You are an experienced software debugging expert. You analyze code execution results and error logs, diagnose problems, and propose concrete fixes.
Your tasks are:
1. Read the execution result (stdout, stderr, exit code) and the test report carefully.
2. Identify failing test cases or runtime errors.
3. Analyze the root cause using the original code and the development plan.
4. Propose clear, specific code changes, or adjustments to the development plan.
5. Your output must contain the diagnosis and the fix suggestions.
6. If the fix is a code change, provide the complete new file content in the "code_changes" field.

Output format (JSON):
{
  "diagnosis": "string", // brief diagnosis of the problem
  "root_cause": "string", // root cause analysis
  "severity": "HIGH" | "MEDIUM" | "LOW", // severity of the error
  "fix_strategy": "string", // overview of the fix strategy
  "code_changes": [ // list of concrete code changes
    {
      "file_path": "string", // path of the file to modify
      "new_content": "string" // complete file content after the change
    }
  ],
  "plan_adjustment_needed": "boolean", // whether the original plan needs adjusting
  "plan_adjustment_suggestion": "string" // the suggested adjustment, if needed
}
"""

REFLECTOR_USER_PROMPT_TEMPLATE = """
--- Original development plan step ---
{original_plan_step}

--- Current code ---
{current_code_files}

--- Execution result ---
Stdout:
{stdout}

Stderr:
{stderr}

Exit Code: {exit_code}
Test Passed: {test_passed}

--- Pytest JSON report (if present) ---
{pytest_report_json}

Based on the information above, diagnose the problem and provide fix suggestions.
"""

def reflect_on_execution(llm_client, original_plan_step, current_code_files, execution_results):
    """
    Reflect on the execution result with the LLM and generate fix suggestions.
    """
    user_prompt = REFLECTOR_USER_PROMPT_TEMPLATE.format(
        original_plan_step=original_plan_step,
        current_code_files="\n".join([f"# {path}\n{content}" for path, content in current_code_files.items()]),
        stdout=execution_results["stdout"],
        stderr=execution_results["stderr"],
        exit_code=execution_results["exit_code"],
        test_passed=execution_results["test_passed"],
        pytest_report_json=json.dumps(execution_results["test_report"], indent=2) if execution_results["test_report"] else "N/A"
    )

    response = llm_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": REFLECTOR_SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt}
        ],
        response_format={"type": "json_object"},
        temperature=0.3
    )

    reflection_json_str = response.choices[0].message.content
    return json.loads(reflection_json_str)

# Suppose the execution succeeded
# execution_results_success = {
#     "stdout": "============================= 5 passed in 0.01s =============================\n",
#     "stderr": "",
#     "exit_code": 0,
#     "test_passed": True,
#     "test_report": {...} # the real pytest report
# }
# current_code = {
#     "src/math_utils.py": new_math_utils_code,
#     "tests/test_math_utils.py": new_test_utils_code
# }
# reflection_success = reflect_on_execution(llm_client, plan["plan"][0], current_code, execution_results_success)
# print(json.dumps(reflection_success, indent=2))

# Suppose `divide` has a bug: on division by zero it returns None instead of raising
# current_math_utils_buggy = """
# def divide(numerator: float, denominator: float) -> float:
#     if denominator == 0:
#         return None # Bug: should raise ValueError
#     return numerator / denominator
# """
# buggy_files = {
#     "src/math_utils.py": current_math_utils_buggy,
#     "tests/test_math_utils.py": new_test_utils_code # the tests are correct and expect ValueError
# }
# # After the executor runs, test_divide_by_zero fails.
# # The report below is abridged and illustrative; pytest prints failures to stdout.
# execution_results_failure = {
#     "stdout": "...E   Failed: DID NOT RAISE <class 'ValueError'>\n...",
#     "stderr": "",
#     "exit_code": 1,
#     "test_passed": False,
#     "test_report": {
#         "summary": {"passed": 4, "failed": 1, "total": 5},
#         "tests": [
#             {"nodeid": "tests/test_math_utils.py::test_divide_by_zero", "outcome": "failed",
#              "call": {"outcome": "failed", "longrepr": "Failed: DID NOT RAISE <class 'ValueError'>"}}
#         ]
#     }
# }
# reflection_failure = reflect_on_execution(llm_client, plan["plan"][0], buggy_files, execution_results_failure)
# print(json.dumps(reflection_failure, indent=2))

Example reflector output (for the division-by-zero bug):

{
  "diagnosis": "The divide function does not raise ValueError as expected when the divisor is zero.",
  "root_cause": "In the `divide` function in `src/math_utils.py`, when `denominator` is 0 the function returns `None` instead of raising a `ValueError`. This contradicts the behavior expected by the unit test `test_divide_by_zero`.",
  "severity": "HIGH",
  "fix_strategy": "Modify `divide` so that it raises `ValueError` when the divisor is zero.",
  "code_changes": [
    {
      "file_path": "src/math_utils.py",
      "new_content": "def divide(numerator: float, denominator: float) -> float:\n    \"\"\"\n    Performs division of two numbers.\n\n    Args:\n        numerator: The dividend.\n        denominator: The divisor.\n\n    Returns:\n        The result of the division.\n\n    Raises:\n        ValueError: If the denominator is zero.\n    \"\"\"\n    if denominator == 0:\n        raise ValueError(\"Cannot divide by zero.\")\n    return numerator / denominator"
    }
  ],
  "plan_adjustment_needed": false,
  "plan_adjustment_suggestion": ""
}

In this example the reflector pinpoints the problem and returns code that can be applied directly as the fix. Its output feeds back into the coding phase, forming the iterative self-repair loop.
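Full test reports can be large, and most of their content is noise for diagnosis. Before building the reflector prompt, it can help to reduce the report to the failing tests only. The sketch below assumes a report shaped like the abridged example in this section (a top-level `tests` list whose entries carry `nodeid`, `outcome`, and per-stage `longrepr` text, modeled on pytest-json-report); `summarize_failures` is a hypothetical helper, and the field names should be checked against the plugin version in use.

```python
from typing import Any, Dict, List


def summarize_failures(report: Dict[str, Any], max_chars: int = 300) -> List[str]:
    """Return one line per non-passing test: its id plus a truncated error text."""
    lines: List[str] = []
    for test in report.get("tests", []):
        if test.get("outcome") not in ("failed", "error"):
            continue
        message = ""
        # Take the error text from the first stage that reports one.
        for stage in ("setup", "call", "teardown"):
            message = str(test.get(stage, {}).get("longrepr", ""))
            if message:
                break
        lines.append(f"{test.get('nodeid', '<unknown>')}: {message[:max_chars]}")
    return lines


# Hand-written sample report (assumed shape, not real plugin output).
sample_report = {
    "summary": {"passed": 1, "failed": 1, "total": 2},
    "tests": [
        {"nodeid": "tests/test_math_utils.py::test_divide_positive_numbers", "outcome": "passed"},
        {
            "nodeid": "tests/test_math_utils.py::test_divide_by_zero",
            "outcome": "failed",
            "call": {"longrepr": "Failed: DID NOT RAISE <class 'ValueError'>"},
        },
    ],
}
failures = summarize_failures(sample_report)
print(failures)
```

Truncating each message keeps the prompt within budget even when a single failure prints a very long traceback.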

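07. Core Components in Depth: The Committer

Once the code passes all tests and the reflector reports no remaining problems, the committer integrates the result into version control and creates a Pull Request for human review.

Core tasks:

Version control operations: create a new branch, add files, commit code.
PR generation: automatically write the Pull Request title and description from the Agent's activity and the final code changes.
Cleanup: remove the temporary workspace after the task is done.

Implementation:

Git CLI: call the Git command-line tool through subprocess.
GitHub/GitLab API: used to create the Pull Request.

Code example (Git operations via subprocess):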

# committer.py
import subprocess
import os
import json

class Committer:
    def __init__(self, repo_path, remote_name="origin"):
        self.repo_path = repo_path
        self.remote_name = remote_name
        # Make sure repo_path is a valid Git repository
        if not os.path.isdir(os.path.join(repo_path, ".git")):
            raise ValueError(f"'{repo_path}' is not a valid Git repository.")

    def _run_git_command(self, command_args):
        """
        Run a Git command inside the repository path.
        """
        full_command = ["git"] + command_args
        print(f"Running Git command: {' '.join(full_command)}")
        try:
            result = subprocess.run(
                full_command,
                cwd=self.repo_path,
                capture_output=True,
                text=True,
                check=True # raise CalledProcessError if the command fails
            )
            return result.stdout.strip()
        except subprocess.CalledProcessError as e:
            print(f"Git command failed: {e.cmd}")
            print(f"Stdout: {e.stdout}")
            print(f"Stderr: {e.stderr}")
            raise

    def create_and_checkout_branch(self, branch_name):
        """
        Create a new branch and switch to it.
        """
        self._run_git_command(["checkout", "-b", branch_name])
        print(f"Created and checked out branch: {branch_name}")

    def add_and_commit(self, files_to_add, commit_message):
        """
        Stage the files and commit them.
        """
        for file in files_to_add:
            self._run_git_command(["add", file])
        self._run_git_command(["commit", "-m", commit_message])
        print(f"Committed changes with message: '{commit_message}'")

    def push_branch(self, branch_name):
        """
        Push the branch to the remote repository.
        """
        self._run_git_command(["push", "-u", self.remote_name, branch_name])
        print(f"Pushed branch '{branch_name}' to remote.")

    def create_pull_request(self, pr_title, pr_body, head_branch, base_branch="main"):
        """
        Create a Pull Request (needs an external tool or API; simulated here).

        Args:
            pr_title: PR title.
            pr_body: PR description.
            head_branch: source branch (the branch created by the Agent).
            base_branch: target branch (usually main/master).

        Returns:
            The PR URL or the creation result.
        """
        # A real implementation would call the GitHub/GitLab API or the GitHub CLI here.
        # With the GitHub CLI, `gh` must be invoked directly; _run_git_command would
        # prefix the call with "git" and fail:
        # subprocess.run(["gh", "pr", "create", "--title", pr_title, "--body", pr_body,
        #                 "--head", head_branch, "--base", base_branch],
        #                cwd=self.repo_path, check=True)

        print("\n--- Simulating Pull Request Creation ---")
        print(f"Title: {pr_title}")
        print(f"Body:\n{pr_body}")
        print(f"Source Branch: {head_branch}")
        print(f"Target Branch: {base_branch}")
        print("----------------------------------------")

        # Return a simulated PR URL
        return f"https://github.com/your_org/your_repo/pull/{os.urandom(2).hex()}"

def generate_pr_description(llm_client, user_request, original_plan, final_code_changes):
    """
    Generate the PR title and description with the LLM.
    """
    PR_SYSTEM_PROMPT = """
    You are a skilled software engineer. Based on the development task, the plan, and the final code changes, write a clear, professional Pull Request title and description.
    The title should be concise. The description should cover:
    1. The problem solved or the feature implemented.
    2. The key changes.
    3. Any details reviewers should be aware of.
    4. How to test (optional but recommended).

    The output format must be JSON:
    {
      "title": "string",
      "body": "string"
    }
    """
    PR_USER_PROMPT_TEMPLATE = """
    User requirement: {user_request}
    Original development plan: {original_plan}
    Summary of final code changes: {final_code_changes_summary}

    Based on the information above, generate a Pull Request title and description.
    """

    # Summarize final_code_changes as a list of paths to stay within the context window
    changes_summary = []
    for path, _ in final_code_changes.items():
        changes_summary.append(f"- Modified: {path}")

    user_prompt = PR_USER_PROMPT_TEMPLATE.format(
        user_request=user_request,
        original_plan=json.dumps(original_plan, indent=2),
        final_code_changes_summary="\n".join(changes_summary)
    )

    response = llm_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": PR_SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt}
        ],
        response_format={"type": "json_object"},
        temperature=0.5
    )
    return json.loads(response.choices[0].message.content)

# Example call (must run inside a real Git repository)
# committer = Committer(repo_path="path/to/your/git/repo")
# branch_name = "feature/add-divide-function-" + os.urandom(2).hex()
# committer.create_and_checkout_branch(branch_name)

# # Assume the code generated earlier is written to the real file system
# # (the Agent's main loop handles this part)
# # with open(os.path.join(committer.repo_path, "src/math_utils.py"), "w") as f:
# #     f.write(new_math_utils_code)
# # with open(os.path.join(committer.repo_path, "tests/test_math_utils.py"), "w") as f:
# #     f.write(new_test_utils_code)

# committer.add_and_commit(["src/math_utils.py", "tests/test_math_utils.py"], "feat: Add divide function with zero division handling")
# committer.push_branch(branch_name)

# pr_info = generate_pr_description(llm_client, user_req, plan, final_code_changes_dict)
# pr_url = committer.create_pull_request(pr_info["title"], pr_info["body"], branch_name)
# print(f"Pull Request created: {pr_url}")

The committer delivers the Agent's work in the form of a standard PR, which makes human review and integration straightforward.
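Branch names derived from free-form requests need sanitizing, because Git rejects ref names that contain spaces and characters such as `~`, `^`, `:`, `?`, `*`, and `[`. The sketch below is a conservative approach that keeps only lowercase ASCII letters, digits, and hyphens. `make_branch_name` and its parameters are hypothetical names, and the `feature/agent` prefix simply mirrors the naming used in this talk.

```python
import re
import secrets


def make_branch_name(request: str, prefix: str = "feature/agent", max_slug_len: int = 30) -> str:
    """Build a Git-safe branch name from a free-form request string."""
    # Collapse every run of characters outside [a-z0-9] into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", request.lower()).strip("-")
    # Truncate, drop a trailing hyphen left by the cut, and fall back to "task"
    # when nothing usable remains (for example, a request with no ASCII letters).
    slug = slug[:max_slug_len].rstrip("-") or "task"
    # A short random suffix keeps repeated requests from colliding.
    return f"{prefix}-{slug}-{secrets.token_hex(2)}"


name = make_branch_name("Add `reverse_string` to utils/string_utils.py!")
fallback = make_branch_name("添加函数")
print(name)
print(fallback)
```

The whitelist is stricter than Git requires, which trades some readability for never having to reason about the full ref-name rules.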

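08. The Closed Loop in Practice: A Complete Development Run

Now let us put all the components together and walk through the full closed loop of the autonomous development Agent.

Scenario: in an existing Python project, add a utils/string_utils.py file containing a reverse_string function, and write tests for it.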

import os
import json
import time
from openai import OpenAI # assumes the OpenAI client is installed and configured

# Assumes the classes and functions of all components are defined and importable:
# Executor, Committer, generate_pr_description, generate_plan, generate_code, reflect_on_execution

class AutonomousDevAgent:
    def __init__(self, llm_client, repo_path, work_dir_base="agent_workspace"):
        self.llm_client = llm_client
        self.repo_path = repo_path
        self.executor = Executor(base_work_dir=work_dir_base)
        self.committer = Committer(repo_path=self.repo_path)
        self.current_files_state = {} # in-memory state of the files the Agent is working on
        self.original_request = ""
        self.current_plan = {}
        self.branch_name = ""

    def _update_file_state(self, file_path, content):
        """Update the file state held in the Agent's memory."""
        self.current_files_state[file_path] = content
        # The actual write to disk happens in executor._setup_environment,
        # which rewrites the work directory on every execution

    def run(self, user_request):
        self.original_request = user_request
        # Keep only ASCII letters and digits so the result is a valid Git branch name
        slug = "".join(c if c.isascii() and c.isalnum() else "-" for c in user_request.lower())
        slug = "-".join(part for part in slug.split("-") if part)[:30].rstrip("-") or "task"
        self.branch_name = f"feature/agent-{slug}-{os.urandom(2).hex()}"

        print("\n--- Agent Started ---")
        print(f"User Request: {user_request}")
        print(f"Target Branch: {self.branch_name}")

        try:
            # 1. Create and switch to the new branch
            self.committer.create_and_checkout_branch(self.branch_name)

            # 2. Planning phase
            print("\n[PLANNING] Generating initial plan...")
            # Assumes the initial project context can be collected
            project_context = self._get_initial_project_context()
            self.current_plan = generate_plan(self.llm_client, user_request, project_context)
            print(f"Plan generated: {json.dumps(self.current_plan, indent=2)}")

            max_iterations = 5 # guard against an endless loop
            iteration = 0
            all_tests_passed = False

            while iteration < max_iterations:
                print(f"\n--- Iteration {iteration + 1} ---")
                coding_ok = True

                # 3. Coding phase: walk through every step of the plan
                for step_idx, step in enumerate(self.current_plan["plan"]):
                    print(f"\n[STEP {step_idx + 1}/{len(self.current_plan['plan'])}] {step['step']}")

                    file_path = step["files_to_operate"][0] # simplification: one file per step
                    current_content = self.current_files_state.get(file_path, "")

                    print(f"[CODING] Generating/modifying code for {file_path}...")
                    try:
                        _, new_content = generate_code(self.llm_client, step, file_path, current_content)
                        self._update_file_state(file_path, new_content)
                        print(f"Code for {file_path} generated/modified.")
                    except Exception as e:
                        print(f"Error during code generation: {e}. Attempting to reflect.")
                        coding_ok = False
                        # Build a synthetic execution result so the reflector can handle the coding error itself
                        reflection_result = reflect_on_execution(
                            self.llm_client,
                            json.dumps(step, ensure_ascii=False),
                            self.current_files_state,
                            {"stdout": "", "stderr": f"Code generation failed: {e}", "exit_code": 1, "test_passed": False, "test_report": {}}
                        )
                        self._apply_reflection_changes(reflection_result)
                        break # leave the step loop and retry in the next iteration

                if coding_ok:
                    # 4. Execution phase: run the tests once every planned file exists.
                    # Testing after each single step would fail on the first step,
                    # because pytest exits with code 5 when it collects no tests.
                    print("[EXECUTING] Running tests...")
                    execution_results = self.executor.execute_tests(self.current_files_state)
                    print(f"Execution Results: Test Passed = {execution_results['test_passed']}")

                    if execution_results["test_passed"]:
                        all_tests_passed = True
                        print("\n[SUCCESS] All tests passed for this iteration!")
                        break # all tests pass, leave the iteration loop

                    # 5. Reflection phase
                    print("[REFLECTING] Tests failed. Analyzing errors...")
                    reflection_result = reflect_on_execution(
                        self.llm_client,
                        json.dumps(self.current_plan, ensure_ascii=False),
                        self.current_files_state,
                        execution_results
                    )
                    print(f"Reflection Diagnosis: {reflection_result['diagnosis']}")
                    # Apply the reflector's changes before the next iteration
                    self._apply_reflection_changes(reflection_result)

                iteration += 1
                if iteration == max_iterations:
                    print(f"\n[FAILURE] Max iterations ({max_iterations}) reached. Agent failed to complete the task.")
                    return False

            if all_tests_passed:
                # 6. Commit phase
                print("\n[COMMITTING] Preparing Pull Request...")
                pr_info = generate_pr_description(self.llm_client, self.original_request, self.current_plan, self.current_files_state)

                # Write the final code into the real repository path (not the executor's temporary directory)
                for file_path, content in self.current_files_state.items():
                    full_path = os.path.join(self.repo_path, file_path)
                    os.makedirs(os.path.dirname(full_path), exist_ok=True)
                    with open(full_path, "w") as f:
                        f.write(content)

                files_to_commit = list(self.current_files_state.keys())
                self.committer.add_and_commit(files_to_commit, pr_info["title"])
                self.committer.push_branch(self.branch_name)
                pr_url = self.committer.create_pull_request(pr_info["title"], pr_info["body"], self.branch_name)
                print(f"Pull Request created: {pr_url}")
                print("\n--- Agent Completed Successfully ---")
                return True
            else:
                print("\n--- Agent Failed to Complete Task ---")
                return False

        except Exception as e:
            print(f"\n[CRITICAL ERROR] Agent encountered a critical error: {e}")
            return False

    def _get_initial_project_context(self):
        """
        Simulate collecting project context, such as the file list and parts of file contents.
        In a real Agent this would be a RAG component that retrieves relevant information from the codebase.
        """
        context_files = [
            "README.md",
            "requirements.txt",
            "src/__init__.py",
            "tests/__init__.py"
        ]
        context_str = "Project Structure:\n"
        for root, dirs, files in os.walk(self.repo_path):
            for name in files:
                if ".git" not in root and "__pycache__" not in root:
                    context_str += f"- {os.path.relpath(os.path.join(root, name), self.repo_path)}\n"

        # The content of a few key files could be read here
        # for file in context_files:
        #     try:
        #         with open(os.path.join(self.repo_path, file), 'r') as f:
        #             content = f.read(500) # read only the first 500 characters
        #             context_str += f"\n--- {file} ---\n{content}\n...\n"
        #     except FileNotFoundError:
        #         pass
        return context_str

    def _apply_reflection_changes(self, reflection_result):
        """
        根据反思器的建议更新Agent的内部文件状态或规划。
        """
        if reflection_result["code_changes"]:
            for change in reflection_result["code_changes"]:
                file_path = change["file_path"]
                new_content = change["new_content"]
                self._update_file_state(file_path, new_content)
                print(f"Applied code change to: {file_path}")

        if reflection_result["plan_adjustment_needed"]:
            print(f"Plan adjustment suggested: {reflection_result['plan_adjustment_suggestion']}")
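            # The planner could be called here to regenerate or revise the plan:
            # self.current_plan = generate_plan(self.llm_client, reflection_result['plan_adjustment_suggestion'], self._get_initial_project_context())
            # Simplified for now: only print the suggestion; the plan itself is not modified.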

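Simulated run, step by step:

Initialization: the Agent is instantiated, connected to the LLM, and pointed at a local Git repository path.
User request: agent.run("Add a reverse_string function to utils/string_utils.py and write tests for it")
Branch creation: the Agent creates a new branch in the local Git repository.
Planning: the planner produces a plan, for example:
- Create utils/string_utils.py and implement reverse_string.
- Create tests/test_string_utils.py and write the tests.
Iteration loop (first pass):
- Coding: the coder generates the contents of utils/string_utils.py and of tests/test_string_utils.py.
- Execution: the executor runs pytest.
- Reflection:
  - Case A (ideal): all tests pass, and the Agent proceeds directly to the commit stage.
  - Case B (common): tests fail. For example, reverse_string has a small bug (perhaps it does not handle empty strings or non-string input). The reflector parses the error log, diagnoses the problem, and suggests a change to reverse_string.
Iteration loop (second pass, if the first failed):
- Coding: following the reflector's suggestion, the coder modifies reverse_string in utils/string_utils.py.
- Execution: the executor runs pytest again.
- Reflection: if the tests pass this time, the Agent proceeds to the commit stage; if errors remain, it keeps iterating.
Commit: once all tests pass, the Agent commits the final code to the new branch, pushes it to the remote repository, and creates the PR with an LLM-generated title and description.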

This closed loop shows how the Agent uses failures as feedback and iterates on the code on its own until the task is complete.
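To make Case B concrete, here is a minimal sketch of what the two iterations might produce. The function bodies and the check are illustrative assumptions, not output from an actual Agent run; only the name reverse_string comes from the walkthrough. The sketch assumes the generated test requires non-string input to be rejected with a TypeError.

```python
# Hypothetical first draft (iteration 1): fine for ordinary strings,
# but it silently accepts non-string input such as lists.
def reverse_string_v1(s):
    return s[::-1]


# Hypothetical repaired version (iteration 2), written after the reflector
# reads the failing test and suggests validating the input type.
def reverse_string(s):
    if not isinstance(s, str):
        raise TypeError(f"expected str, got {type(s).__name__}")
    return s[::-1]


# A check of the kind the Agent might place in tests/test_string_utils.py.
def check_reverse_string(func):
    assert func("abc") == "cba"
    assert func("") == ""
    try:
        func(["a", "b"])
    except TypeError:
        return True   # non-string input was rejected, as the test requires
    return False      # non-string input slipped through
```

In this sketch the first draft fails the check and the repaired version passes it, which is the signal the reflector would use to decide between another iteration and the commit stage.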

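09. Challenges and Future Outlook

Building a truly robust autonomous development Agent involves a number of challenges:

Context management and scale: LLM context windows are finite. For large, complex codebases, efficiently retrieving, summarizing, and injecting the relevant context without "forgetting" or "hallucinating" remains an ongoing challenge. RAG, hierarchical memory, and smart caching are key.
Non-determinism and robustness: LLM output is somewhat random. The Agent has to be designed to tolerate and recover from that non-determinism, for example by retrying, validating outputs, or using stricter prompt engineering.
Cost and latency: Every LLM call and every code execution costs time and compute. Streamlining the Agent's decision process, cutting unnecessary iterations, and using more efficient models are all necessary.
Security and sandboxing: The code the Agent generates must run in a strictly sandboxed environment, to prevent malicious code injection or damage to the host system.
Evaluation and explainability: How can the quality, efficiency, and security of Agent-generated code be assessed objectively? When the Agent gets something wrong, how can its decision path be traced and the cause understood?
Human-in-the-Loop: The end goal is not to replace humans entirely but to empower human developers. Designing the right interaction points, such as PR review, requirement clarification, and high-level guidance, will be key.
Advanced reasoning and cross-module coordination: Agents today are good at local fixes and adding features. For tasks that require complex architectural changes or refactoring across multiple modules, or even across projects, their planning and execution abilities still need substantial improvement.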
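The "retry and validate the output" idea from the non-determinism item can be sketched as follows. Here generate stands in for any LLM call and is an assumption, as are the function name generate_valid_python and the retry limit. Validation in this sketch is only a Python syntax check via ast.parse, which is far weaker than actually running the tests.

```python
import ast
from typing import Callable, Optional


def generate_valid_python(
    generate: Callable[[str], str],  # hypothetical stand-in for an LLM call
    prompt: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Call `generate` until it returns syntactically valid Python.

    Returns the first candidate that parses, or None if every attempt fails.
    """
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        candidate = generate(prompt + feedback)
        try:
            ast.parse(candidate)
            return candidate
        except SyntaxError as exc:
            # Feed the parse error back so the next attempt can correct it.
            feedback = (
                f"\n\nPrevious attempt {attempt} failed to parse: "
                f"{exc.msg} (line {exc.lineno})"
            )
    return None
```

Bounding the number of attempts also caps the cost and latency of a single step, which ties this item to the one that follows it in the list.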

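Looking ahead, autonomous development Agents will become more intelligent, more autonomous, and more collaborative:

Long-term memory and knowledge bases: Agents will be able to learn from past tasks and build project-specific knowledge bases, improving both the efficiency and the quality of their problem solving.
Multi-Agent collaboration: One Agent might handle the frontend and another the backend, communicating and cooperating to complete complex tasks together.
Deeper integration: Seamless integration with IDEs, CI/CD pipelines, and project management tools, making the Agent an integral part of the development workflow.
Domain-specific optimization: Tuning for specific programming languages, frameworks, or business domains, so that the Agent performs at an expert level within that domain.

Closing Remarks

The autonomous development Agent we discussed today is more than a technical concept; it points to where the software development paradigm is heading. By combining the reasoning ability of LLMs with engineering practice, we are gradually building an intelligent system that can understand, act, reflect, and repair itself. This promises to raise development efficiency significantly and free developers to focus on more creative and strategic work. Many challenges lie ahead, but every successful closed-loop iteration brings us a step closer to that future. Let us look forward to, and take part in, this transformation of software development.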
