feat: step isolation — each step runs in independent sub-loop
Main loop becomes a coordinator that reviews step summaries and may revise the plan. Each step gets its own chat history and scratchpad, preventing context pollution across steps.

- Add run_step_loop with 50-iteration limit and isolated context
- Replace advance_step with step_done (sub-loop only)
- Add coordinator review after each step completion
- Add scratchpad 8K capacity check
- Add 33 unit tests for state, tools, and message building
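Since the src/agent.rs diff is suppressed on this page, the following is only a minimal sketch of what an isolated sub-loop with a 50-iteration cap could look like. `StepOutcome`, `Turn`, `MAX_STEP_ITERATIONS`, the closure-based `next_turn` parameter, and the choice to report a failure when the cap is hit are all assumptions made for illustration; only the name `run_step_loop` and the limit of 50 come from the commit message.

```rust
/// Hypothetical outcome type for one isolated step.
#[derive(Debug, Clone, PartialEq)]
pub enum StepOutcome {
    Done { summary: String },
    Failed { error: String },
}

/// Hypothetical result of a single model turn inside the sub-loop.
pub enum Turn {
    /// An ordinary tool call; its output is appended to the step-local history.
    Tool(String),
    /// The `step_done` tool: ends the sub-loop with a summary.
    StepDone(String),
}

/// Iteration cap taken from the commit message.
pub const MAX_STEP_ITERATIONS: usize = 50;

/// Runs one step with its own history, so nothing from earlier steps leaks in.
/// `next_turn` stands in for the LLM call plus tool execution.
pub fn run_step_loop<F>(step_context: &str, mut next_turn: F) -> StepOutcome
where
    F: FnMut(&[String]) -> Turn,
{
    // Fresh history for every step: only the step context is carried in.
    let mut history: Vec<String> = vec![step_context.to_string()];

    for _ in 0..MAX_STEP_ITERATIONS {
        match next_turn(&history) {
            Turn::Tool(output) => history.push(output),
            Turn::StepDone(summary) => return StepOutcome::Done { summary },
        }
    }

    // Assumption: hitting the cap is treated as a failed step.
    StepOutcome::Failed {
        error: format!("step exceeded {} iterations", MAX_STEP_ITERATIONS),
    }
}
```

In this sketch the coordinator would only ever see the returned `StepOutcome`, never the step-local `history`, which is the isolation property the commit message describes.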
973
src/agent.rs
File diff suppressed because it is too large
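@@ -1,25 +1,27 @@
-You are an AI agent in the [execution phase]. Focus on completing the task for the current step.
+You are the coordinator of an AI agent in the [execution phase]. Each step is completed by an independent sub-executor; you review the results and coordinate overall progress.
 
-Available tools:
-- execute: run a shell command
-- read_file / write_file / list_files: file operations
-- start_service / stop_service: manage background services
-- update_requirement: update the project requirement
-- advance_step: complete the current step and move to the next one (a summary is required)
-- update_scratchpad: save key information that persists across steps
+## Your role
 
-Workflow:
-1. Read the "current step" description below
-2. Use tools to perform the required operations
-3. When finished, call advance_step(summary=...) to move to the next step
-4. After the last step, reply with a brief summary (without calling a tool) to finish
+- Review the execution summary of each step
+- Based on the result, decide whether to continue to the next step, revise the remaining plan, or stop execution
+- Maintain the global scratchpad, recording key information that spans steps
+
+## Available tools
 
+- update_plan: revise the execution plan (provide the full step list; the system diffs it automatically)
+- update_scratchpad: update the global scratchpad (key information persisted across steps)
+- update_requirement: update the project requirement description
+
+## Workflow
+
+When you receive a step's execution summary:
+1. Review the summary and judge whether the step achieved its intended goal
+2. If the remaining plan needs adjusting, use update_plan
+3. If no adjustment is needed, reply to confirm and continue (no tool call required)
+
 Environment:
-- The working directory is an isolated project workspace; the Python venv is already activated (.venv/)
-- Install dependencies with `uv add <package>` or `pip install <package>`
+- The working directory is an isolated project workspace
 - Static file access: /api/projects/{project_id}/files/{filename}
-- Background service access: /api/projects/{project_id}/app/ (the start command must listen on 0.0.0.0:$PORT)
-- [Important] The app is served through a reverse proxy; fetch/XHR requests in front-end HTML/JS must use relative paths (e.g. fetch('todos')) and must never use paths starting with / (e.g. fetch('/todos')), otherwise they return 404
-- Knowledge base tools: kb_search(query) searches for relevant passages; kb_read() reads the full text
+- Background service access: /api/projects/{project_id}/app/
 
 Please reply in Chinese.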
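The coordinator prompt says that update_plan takes the full step list and that the system diffs it automatically. A minimal sketch of one way such a diff could behave, consistent with the plan_diff_* tests in src/state.rs, is shown here: steps in the unchanged prefix keep their status and summary, and everything from the first mismatch onward is reset to pending. The free function form, the reduced `Step` struct, and `Step::new` are assumptions for illustration; the real `apply_plan_diff` is a method on `AgentState` whose body is not shown on this page.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum StepStatus {
    Pending,
    Done,
}

/// Reduced, hypothetical version of the project's `Step`.
#[derive(Debug, Clone, PartialEq)]
pub struct Step {
    pub title: String,
    pub description: String,
    pub status: StepStatus,
    pub summary: Option<String>,
}

impl Step {
    /// Convenience constructor for a pending step (illustrative only).
    pub fn new(title: &str, description: &str) -> Self {
        Step {
            title: title.to_string(),
            description: description.to_string(),
            status: StepStatus::Pending,
            summary: None,
        }
    }
}

/// Merges a revised plan into the old one.
/// Steps before the first title/description mismatch keep their old state;
/// every step from the mismatch onward is taken from the new plan as pending.
pub fn apply_plan_diff(old: &[Step], new_steps: Vec<Step>) -> Vec<Step> {
    // Length of the common prefix in which nothing changed.
    let keep = old
        .iter()
        .zip(new_steps.iter())
        .take_while(|(o, n)| o.title == n.title && o.description == n.description)
        .count();

    new_steps
        .into_iter()
        .enumerate()
        .map(|(i, n)| {
            if i < keep {
                // Unchanged step: preserve status and summary.
                old[i].clone()
            } else {
                // Changed, added, or downstream step: start over.
                Step { status: StepStatus::Pending, summary: None, ..n }
            }
        })
        .collect()
}
```

Because the result is built from the new list, steps that the coordinator removes simply disappear, and appended steps arrive as pending.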
34
src/prompts/step_execution.md
Normal file
@@ -0,0 +1,34 @@
+You are a step executor, responsible for completing the step currently assigned to you.
+
+## Available tools
+
+- execute: run a shell command
+- read_file / write_file / list_files: file operations
+- start_service / stop_service: manage background services
+- kb_search / kb_read: search and read the knowledge base
+- update_scratchpad: record intermediate state within this step (discarded when the step ends; put the essentials in the summary)
+- wait_for_approval: pause execution and wait for user confirmation
+- step_done: **must be called when the current step is complete**, providing a summary of the work done in this step
+
+## Workflow
+
+1. Read the current step's description and context
+2. Use tools to perform the required operations
+3. When finished, call step_done(summary=...) to report the result
+
+## Rules
+
+- **Focus on the current step**; do nothing outside its scope
+- When finished you **must** call step_done(summary); the summary should concisely state what this step did and what the outcome was
+- Use wait_for_approval(reason) when user confirmation is needed
+- update_scratchpad is for recording intermediate state within this step; it is working memory, not a log, so keep only information that is currently useful
+
+## Environment
+
+- The working directory is an isolated project workspace; the Python venv is already activated (.venv/)
+- Install dependencies with `uv add <package>` or `pip install <package>`
+- Static file access: /api/projects/{project_id}/files/{filename}
+- Background service access: /api/projects/{project_id}/app/ (the start command must listen on 0.0.0.0:$PORT)
+- [Important] The app is served through a reverse proxy; fetch/XHR requests in front-end HTML/JS must use relative paths (e.g. fetch('todos')) and must never use paths starting with / (e.g. fetch('/todos')), otherwise they return 404
+
+Please reply in Chinese.
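The executor prompt names two tools that end or pause a step, step_done and wait_for_approval. As a rough sketch of how a sub-loop might turn those calls into the `StepResult` type added in src/state.rs, consider the following. The function `terminal_result`, its string-based signature, and the decision to leave the summary empty while waiting for approval are assumptions; the actual dispatch lives in the suppressed src/agent.rs diff.

```rust
/// Same shape as the types added in src/state.rs (PartialEq added here for testing).
#[derive(Debug, Clone, PartialEq)]
pub enum StepResultStatus {
    Done,
    Failed { error: String },
    NeedsApproval { message: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct StepResult {
    pub status: StepResultStatus,
    pub summary: String,
}

/// Hypothetical helper: returns `Some` when a tool call should stop the sub-loop,
/// and `None` for ordinary tools such as `execute` or `read_file`.
pub fn terminal_result(tool_name: &str, argument: &str) -> Option<StepResult> {
    match tool_name {
        // `step_done(summary)` finishes the step with the given summary.
        "step_done" => Some(StepResult {
            status: StepResultStatus::Done,
            summary: argument.to_string(),
        }),
        // `wait_for_approval(reason)` hands control back so the user can confirm.
        "wait_for_approval" => Some(StepResult {
            status: StepResultStatus::NeedsApproval {
                message: argument.to_string(),
            },
            summary: String::new(),
        }),
        // Every other tool keeps the sub-loop running.
        _ => None,
    }
}
```

Returning `None` for non-terminal tools lets the caller keep a single `match` on the result while the loop continues.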
339
src/state.rs
@@ -2,6 +2,36 @@ use serde::{Deserialize, Serialize};
 
 use crate::llm::ChatMessage;
 
+// --- Step result (returned by run_step_loop) ---
+
+#[derive(Debug, Clone)]
+pub struct StepResult {
+    pub status: StepResultStatus,
+    pub summary: String,
+}
+
+#[derive(Debug, Clone)]
+pub enum StepResultStatus {
+    Done,
+    Failed { error: String },
+    NeedsApproval { message: String },
+}
+
+/// Check scratchpad size. Limit: ~8K tokens ≈ 24K bytes.
+const SCRATCHPAD_MAX_BYTES: usize = 24_000;
+
+pub fn check_scratchpad_size(content: &str) -> Result<(), String> {
+    if content.len() > SCRATCHPAD_MAX_BYTES {
+        Err(format!(
+            "Scratchpad 超出容量限制(当前 {} 字节,上限 {} 字节)。请精简内容后重试。",
+            content.len(),
+            SCRATCHPAD_MAX_BYTES,
+        ))
+    } else {
+        Ok(())
+    }
+}
+
 // --- Agent phase state machine ---
 
 #[derive(Debug, Clone, Serialize, Deserialize)]
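One plausible use of `check_scratchpad_size` is inside the update_scratchpad tool handler, rejecting an oversized update and leaving the previous content intact so the model can retry with a shorter note. The sketch below assumes exactly that; `apply_scratchpad_update` is a hypothetical helper, and the error text is an English stand-in for the message in the diff.

```rust
/// Limit from src/state.rs: ~8K tokens, approximated as 24K bytes.
const SCRATCHPAD_MAX_BYTES: usize = 24_000;

/// Simplified copy of the check in the diff (English message is a stand-in).
pub fn check_scratchpad_size(content: &str) -> Result<(), String> {
    // `str::len` counts UTF-8 bytes, not characters.
    if content.len() > SCRATCHPAD_MAX_BYTES {
        Err(format!(
            "scratchpad too large: {} bytes (limit {} bytes)",
            content.len(),
            SCRATCHPAD_MAX_BYTES
        ))
    } else {
        Ok(())
    }
}

/// Hypothetical handler: validate first, then replace the scratchpad.
/// On error the existing scratchpad is left untouched.
pub fn apply_scratchpad_update(current: &mut String, new_content: &str) -> Result<(), String> {
    check_scratchpad_size(new_content)?;
    *current = new_content.to_string();
    Ok(())
}
```

Validating before assignment means a failed update never leaves the scratchpad half-written, and the error string can be returned to the model as the tool result.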
@@ -205,3 +235,312 @@ impl AgentState {
         msgs
     }
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    fn make_step(order: i32, title: &str, desc: &str, status: StepStatus) -> Step {
+        Step {
+            order,
+            title: title.into(),
+            description: desc.into(),
+            status,
+            summary: None,
+            user_feedbacks: Vec::new(),
+            db_id: String::new(),
+        }
+    }
+
+    // --- check_scratchpad_size ---
+
+    #[test]
+    fn scratchpad_empty_ok() {
+        assert!(check_scratchpad_size("").is_ok());
+    }
+
+    #[test]
+    fn scratchpad_under_limit_ok() {
+        let content = "a".repeat(24_000);
+        assert!(check_scratchpad_size(&content).is_ok());
+    }
+
+    #[test]
+    fn scratchpad_over_limit_err() {
+        let content = "a".repeat(24_001);
+        let err = check_scratchpad_size(&content).unwrap_err();
+        assert!(err.contains("24001"));
+        assert!(err.contains("24000"));
+    }
+
+    #[test]
+    fn scratchpad_exactly_at_limit() {
+        let content = "a".repeat(SCRATCHPAD_MAX_BYTES);
+        assert!(check_scratchpad_size(&content).is_ok());
+    }
+
+    #[test]
+    fn scratchpad_multibyte_counts_bytes_not_chars() {
+        // 8000 Chinese characters = 24000 bytes (UTF-8), exactly at limit
+        let content = "你".repeat(8000);
+        assert_eq!(content.len(), 24000);
+        assert!(check_scratchpad_size(&content).is_ok());
+
+        // One more char pushes over
+        let content_over = format!("{}你", content);
+        assert!(check_scratchpad_size(&content_over).is_err());
+    }
+
+    // --- first_actionable_step ---
+
+    #[test]
+    fn first_actionable_all_done() {
+        let state = AgentState {
+            phase: AgentPhase::Executing { step: 1 },
+            steps: vec![
+                make_step(1, "A", "a", StepStatus::Done),
+                make_step(2, "B", "b", StepStatus::Done),
+            ],
+            current_step_chat_history: Vec::new(),
+            scratchpad: String::new(),
+        };
+        assert_eq!(state.first_actionable_step(), None);
+    }
+
+    #[test]
+    fn first_actionable_skips_done() {
+        let state = AgentState {
+            phase: AgentPhase::Executing { step: 2 },
+            steps: vec![
+                make_step(1, "A", "a", StepStatus::Done),
+                make_step(2, "B", "b", StepStatus::Pending),
+                make_step(3, "C", "c", StepStatus::Pending),
+            ],
+            current_step_chat_history: Vec::new(),
+            scratchpad: String::new(),
+        };
+        assert_eq!(state.first_actionable_step(), Some(2));
+    }
+
+    #[test]
+    fn first_actionable_finds_running() {
+        let state = AgentState {
+            phase: AgentPhase::Executing { step: 2 },
+            steps: vec![
+                make_step(1, "A", "a", StepStatus::Done),
+                make_step(2, "B", "b", StepStatus::Running),
+            ],
+            current_step_chat_history: Vec::new(),
+            scratchpad: String::new(),
+        };
+        assert_eq!(state.first_actionable_step(), Some(2));
+    }
+
+    #[test]
+    fn first_actionable_finds_waiting_approval() {
+        let state = AgentState {
+            phase: AgentPhase::Executing { step: 1 },
+            steps: vec![
+                make_step(1, "A", "a", StepStatus::WaitingApproval),
+                make_step(2, "B", "b", StepStatus::Pending),
+            ],
+            current_step_chat_history: Vec::new(),
+            scratchpad: String::new(),
+        };
+        assert_eq!(state.first_actionable_step(), Some(1));
+    }
+
+    #[test]
+    fn first_actionable_skips_failed() {
+        let state = AgentState {
+            phase: AgentPhase::Executing { step: 2 },
+            steps: vec![
+                make_step(1, "A", "a", StepStatus::Failed),
+                make_step(2, "B", "b", StepStatus::Pending),
+            ],
+            current_step_chat_history: Vec::new(),
+            scratchpad: String::new(),
+        };
+        assert_eq!(state.first_actionable_step(), Some(2));
+    }
+
+    // --- apply_plan_diff ---
+
+    #[test]
+    fn plan_diff_identical_keeps_done() {
+        let mut state = AgentState::new();
+        state.steps = vec![
+            Step { status: StepStatus::Done, summary: Some("did A".into()),
+                   ..make_step(1, "A", "desc A", StepStatus::Done) },
+            make_step(2, "B", "desc B", StepStatus::Pending),
+        ];
+
+        let new_steps = vec![
+            make_step(1, "A", "desc A", StepStatus::Pending),
+            make_step(2, "B", "desc B", StepStatus::Pending),
+        ];
+        state.apply_plan_diff(new_steps);
+
+        assert!(matches!(state.steps[0].status, StepStatus::Done));
+        assert_eq!(state.steps[0].summary.as_deref(), Some("did A"));
+        assert!(matches!(state.steps[1].status, StepStatus::Pending));
+    }
+
+    #[test]
+    fn plan_diff_change_invalidates_from_mismatch() {
+        let mut state = AgentState::new();
+        state.steps = vec![
+            Step { status: StepStatus::Done, summary: Some("did A".into()),
+                   ..make_step(1, "A", "desc A", StepStatus::Done) },
+            Step { status: StepStatus::Done, summary: Some("did B".into()),
+                   ..make_step(2, "B", "desc B", StepStatus::Done) },
+            make_step(3, "C", "desc C", StepStatus::Pending),
+        ];
+
+        // Change step 2's description → invalidates 2 and 3
+        let new_steps = vec![
+            make_step(1, "A", "desc A", StepStatus::Pending),
+            make_step(2, "B", "desc B CHANGED", StepStatus::Pending),
+            make_step(3, "C", "desc C", StepStatus::Pending),
+        ];
+        state.apply_plan_diff(new_steps);
+
+        assert!(matches!(state.steps[0].status, StepStatus::Done)); // kept
+        assert!(matches!(state.steps[1].status, StepStatus::Pending)); // invalidated
+        assert!(state.steps[1].summary.is_none()); // summary cleared
+        assert!(matches!(state.steps[2].status, StepStatus::Pending)); // invalidated
+    }
+
+    #[test]
+    fn plan_diff_add_new_steps() {
+        let mut state = AgentState::new();
+        state.steps = vec![
+            Step { status: StepStatus::Done, summary: Some("did A".into()),
+                   ..make_step(1, "A", "desc A", StepStatus::Done) },
+        ];
+
+        let new_steps = vec![
+            make_step(1, "A", "desc A", StepStatus::Pending),
+            make_step(2, "New", "new step", StepStatus::Pending),
+        ];
+        state.apply_plan_diff(new_steps);
+
+        assert_eq!(state.steps.len(), 2);
+        assert!(matches!(state.steps[0].status, StepStatus::Done));
+        assert!(matches!(state.steps[1].status, StepStatus::Pending));
+        assert_eq!(state.steps[1].title, "New");
+    }
+
+    #[test]
+    fn plan_diff_remove_steps() {
+        let mut state = AgentState::new();
+        state.steps = vec![
+            Step { status: StepStatus::Done, summary: Some("did A".into()),
+                   ..make_step(1, "A", "desc A", StepStatus::Done) },
+            make_step(2, "B", "desc B", StepStatus::Pending),
+            make_step(3, "C", "desc C", StepStatus::Pending),
+        ];
+
+        // New plan only has 1 step (same as step 1)
+        let new_steps = vec![
+            make_step(1, "A", "desc A", StepStatus::Pending),
+        ];
+        state.apply_plan_diff(new_steps);
+
+        assert_eq!(state.steps.len(), 1);
+        assert!(matches!(state.steps[0].status, StepStatus::Done));
+    }
+
+    // --- build_step_context ---
+
+    #[test]
+    fn step_context_includes_all_sections() {
+        let state = AgentState {
+            phase: AgentPhase::Executing { step: 2 },
+            steps: vec![
+                Step { status: StepStatus::Done, summary: Some("installed deps".into()),
+                       ..make_step(1, "Setup", "install deps", StepStatus::Done) },
+                make_step(2, "Build", "compile code", StepStatus::Running),
+                make_step(3, "Test", "run tests", StepStatus::Pending),
+            ],
+            current_step_chat_history: Vec::new(),
+            scratchpad: "key=value".into(),
+        };
+
+        let ctx = state.build_step_context("Build a web app");
+
+        assert!(ctx.contains("## 需求\nBuild a web app"));
+        assert!(ctx.contains("## 计划概览"));
+        assert!(ctx.contains("1. Setup done"));
+        assert!(ctx.contains("2. Build >> current"));
+        assert!(ctx.contains("3. Test"));
+        assert!(ctx.contains("## 当前步骤(步骤 2)"));
+        assert!(ctx.contains("标题:Build"));
+        assert!(ctx.contains("描述:compile code"));
+        assert!(ctx.contains("## 已完成步骤摘要"));
+        assert!(ctx.contains("installed deps"));
+        assert!(ctx.contains("## 备忘录\nkey=value"));
+    }
+
+    #[test]
+    fn step_context_user_feedback() {
+        let state = AgentState {
+            phase: AgentPhase::Executing { step: 1 },
+            steps: vec![
+                Step {
+                    user_feedbacks: vec!["please use React".into()],
+                    ..make_step(1, "Setup", "setup project", StepStatus::Running)
+                },
+            ],
+            current_step_chat_history: Vec::new(),
+            scratchpad: String::new(),
+        };
+
+        let ctx = state.build_step_context("Build app");
+        assert!(ctx.contains("用户反馈"));
+        assert!(ctx.contains("please use React"));
+    }
+
+    // --- build_messages ---
+
+    #[test]
+    fn build_messages_planning() {
+        let state = AgentState::new();
+        let msgs = state.build_messages("system prompt", "requirement text");
+
+        assert_eq!(msgs.len(), 2);
+        assert_eq!(msgs[0].role, "system");
+        assert_eq!(msgs[0].content.as_deref(), Some("system prompt"));
+        assert_eq!(msgs[1].role, "user");
+        assert_eq!(msgs[1].content.as_deref(), Some("requirement text"));
+    }
+
+    #[test]
+    fn build_messages_executing_includes_history() {
+        let state = AgentState {
+            phase: AgentPhase::Executing { step: 1 },
+            steps: vec![make_step(1, "Do thing", "details", StepStatus::Running)],
+            current_step_chat_history: vec![
+                ChatMessage { role: "assistant".into(), content: Some("let me help".into()), tool_calls: None, tool_call_id: None },
+            ],
+            scratchpad: String::new(),
+        };
+
+        let msgs = state.build_messages("sys", "req");
+        assert_eq!(msgs.len(), 3); // system + user context + 1 history
+        assert_eq!(msgs[2].role, "assistant");
+    }
+
+    #[test]
+    fn build_messages_completed_minimal() {
+        let state = AgentState {
+            phase: AgentPhase::Completed,
+            steps: Vec::new(),
+            current_step_chat_history: Vec::new(),
+            scratchpad: String::new(),
+        };
+
+        let msgs = state.build_messages("sys", "req");
+        assert_eq!(msgs.len(), 1); // only system
+    }
+}
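The first_actionable_* tests pin down the observable behaviour of `first_actionable_step` without showing its body: done and failed steps are skipped, while pending, running, and waiting-for-approval steps count as actionable. A minimal sketch that satisfies those cases is given here. The free function form and the reduced `Step` struct are assumptions for illustration, and the real `StepStatus` enum may have variants not listed.

```rust
/// Statuses that appear in the tests; the real enum may contain more.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum StepStatus {
    Pending,
    Running,
    WaitingApproval,
    Done,
    Failed,
}

/// Reduced, hypothetical version of the project's `Step`.
pub struct Step {
    pub order: i32,
    pub status: StepStatus,
}

/// Returns the `order` of the first step that still needs work, if any.
pub fn first_actionable_step(steps: &[Step]) -> Option<i32> {
    steps
        .iter()
        .find(|s| {
            matches!(
                s.status,
                StepStatus::Pending | StepStatus::Running | StepStatus::WaitingApproval
            )
        })
        .map(|s| s.order)
}
```

Listing the actionable statuses explicitly, rather than excluding `Done` and `Failed`, means a newly added status would not become actionable by accident.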