Add global knowledge base with RAG search

- KB module: fastembed (AllMiniLML6V2) for CPU embedding, SQLite for vector storage with brute-force cosine similarity search
- Chunking by ## headings, embeddings stored as BLOB in kb_chunks table
- API: GET/PUT /api/kb for full-text read/write with auto re-indexing
- Agent tools: kb_search (top-5 semantic search) and kb_read (full text) available in both planning and execution phases
- Frontend: Settings menu in sidebar footer, KB editor as independent view with markdown textarea and save button
- Also: extract shared db_err/ApiResult to api/mod.rs, add context management design doc

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 08:15:50 +00:00
parent 1aa81896b5
commit d9d3bc340c
19 changed files with 2283 additions and 53 deletions
--- a/doc/context.md
+++ b/doc/context.md
@@ -0,0 +1,54 @@
+# Context Management: Current State and Design
+
+## Current State
+
+There is currently no limit on context length, so a request can exceed the model's token limit.
+
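+### Existing mitigations
+
+1. **Clear on phase transition**: `step_messages` is `clear()`ed on planning→executing and step→step transitions, so messages do not accumulate across phases
+2. **Per-tool output truncation**: bash output is capped at 8000 bytes, and overly long read_file output is also truncated
+3. **Step context summaries**: completed steps keep only a summary (`step_summaries`), not their full output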
+
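+### Risk scenarios
+
+- Within a single execution step, too many tool call rounds (repeated bash, read_file) make `step_messages` grow without bound
+- Every round pushes the LLM's assistant message and the tool result into `step_messages`, with no upper limit
+- Eventually the full messages array exceeds the model's context window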
+
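+## Proposed Design
+
+### Strategy: sliding window + summary of early messages
+
+When the length of `step_messages` exceeds a threshold, keep the most recent N rounds in full and fold the earlier tool call/result pairs into a single summary message.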
+
+```
+[system prompt]
+[user: step context]
+[summary of early tool interactions]  ← compressed history
+[recent assistant + tool messages]    ← last N rounds kept in full
+```
+
+### Implementation details
+
+1. **Token estimation**: a rough estimate from character count (1 token ≈ 3-4 chars for mixed Chinese/English text); no exact tokenizer is needed
+2. **Threshold**: configurable, with a default of e.g. 80000 chars (roughly 20k-27k tokens at 3-4 chars per token), leaving headroom for the system prompt and the response
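+3. **Compaction trigger**: check the total length every time messages are built, and compact when it exceeds the threshold
+4. **Compaction method**:
+   - Simple version: drop the early tool call/result pairs and replace them with `[N tool calls executed; see the most recent results below]`
+   - Advanced version: have the LLM generate a summary (one extra API call, but better quality)
+5. **Never compacted**: system prompt, user context, and the most recent 2-3 rounds of full interaction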
+
+### Where to implement
+
+In `run_agent_loop`, insert the compaction logic after messages are built and before the LLM is called:
+
+```rust
+// inside run_agent_loop in agent.rs, around L706-L725
+let (mut messages, tools) = match &state.phase { ... };
+
+// compact the context
+compact_messages(&mut messages, MAX_CONTEXT_CHARS);
+```
+
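+The `compact_messages` function scans from the front, keeps the system/user head, computes the total length, and, when over the limit, replaces the early assistant+tool messages with a summary.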
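
The "simple version" of compaction described in doc/context.md could look roughly like the sketch below. It is not part of this commit. The `Role` and `Message` types, the `total_chars` helper, and the extra `keep_rounds` parameter are hypothetical stand-ins, since the real message type in agent.rs is not shown here, and the choice of a user-role placeholder message is likewise an assumption. A "round" is taken to mean one assistant message plus the tool results that follow it.

```rust
/// Hypothetical message types; the real ones in agent.rs are not shown in this commit.
#[derive(Debug, Clone, PartialEq)]
enum Role {
    System,
    User,
    Assistant,
    Tool,
}

#[derive(Debug, Clone)]
struct Message {
    role: Role,
    content: String,
}

/// Rough size estimate by character count (no tokenizer involved).
fn total_chars(messages: &[Message]) -> usize {
    messages.iter().map(|m| m.content.chars().count()).sum()
}

/// Simple variant: when over `max_chars`, drop early assistant/tool messages
/// and leave a one-line placeholder. At least one round is always kept.
/// Returns the number of messages folded away (0 if nothing changed).
fn compact_messages(messages: &mut Vec<Message>, max_chars: usize, keep_rounds: usize) -> usize {
    if total_chars(messages) <= max_chars {
        return 0;
    }
    // Head: the leading system/user messages are never compacted.
    let head_len = messages
        .iter()
        .take_while(|m| matches!(m.role, Role::System | Role::User))
        .count();
    // Each assistant message after the head opens a new round.
    let round_starts: Vec<usize> = (head_len..messages.len())
        .filter(|&i| messages[i].role == Role::Assistant)
        .collect();
    let keep = keep_rounds.max(1);
    if round_starts.len() <= keep {
        return 0; // nothing old enough to fold
    }
    // The tail begins at an assistant message, so tool results stay paired.
    let tail_start = round_starts[round_starts.len() - keep];
    let tool_calls = messages[head_len..tail_start]
        .iter()
        .filter(|m| m.role == Role::Tool)
        .count();
    let folded = tail_start - head_len;
    drop(messages.drain(head_len..tail_start));
    messages.insert(
        head_len,
        Message {
            role: Role::User, // assumption: placeholder is sent as a user message
            content: format!(
                "[{} earlier tool calls executed; see the most recent results below]",
                tool_calls
            ),
        },
    );
    folded
}
```

Note that this variant does not guarantee the result fits under `max_chars`: if the kept rounds alone exceed the threshold, the per-tool output truncation is still the only safeguard.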