add gen_voice tool, message timestamps, image multimodal, group chat, whisper STT

- gen_voice: IndexTTS2 voice cloning via tools/gen_voice script, ref audio cached on server to avoid re-upload
- Message timestamps: created_at column in messages table, prepended to content in API calls so the LLM sees message times
- Image understanding: photos converted to base64 multimodal content for vision-capable models
- Group chat: independent session contexts per chat_id, sendMessageDraft disabled in groups (private chat only)
- Voice transcription: whisper service integration, transcribed text injected with a [语音消息] ("voice message") prefix
- Integration tests marked #[ignore] (require external services)
- Reference voice asset: assets/ref_voice.mp3
- .gitignore: target/, noc.service, config/state/db files
2026-04-09 20:12:15 +01:00
parent 9d5dd4eb16
commit ec1bd7cb25
6 changed files with 370 additions and 54 deletions
--- a/doc/todo.md
+++ b/doc/todo.md
@@ -11,34 +11,40 @@
 - [ ] Context awareness: automatically adjust behavior based on time, location, and calendar

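 ### Memory and growth
- [ ] Long-term memory (MEMORY.md): persistent memory across sessions
- [ ] Semantic search: embedding-based memory retrieval
+- [x] Persistent memory slots (memory_slots): 100 cross-session memory slots, injected into the system prompt
+- [ ] AutoMem: periodically in the background (e.g. every 10 messages), analyze the conversation automatically and let the LLM decide whether to SKIP/UPDATE/INSERT a memory, with no manual trigger from the user (see luke)
+- [ ] Layered memory: core memory (identity/principles, always injected) + long-term memory (preferences/facts, retrieved via RAG) + scratch (current task) (see xg's three-layer and luke's four-layer architectures)
+- [ ] Semantic search: embedding-based memory retrieval (BGE-M3/Gemini embedding + Qdrant vector store)
+- [ ] Memory merging: when a new memory has cosine similarity ≥ 0.7 with an existing one, merge them with the LLM instead of inserting (see xg)
+- [ ] Two-pass associative recall: a first pass of direct retrieval, then a second pass of associative retrieval seeded with the top-K results; deduplicate and merge (see xg/luke 2-pass recall)
+- [ ] Time decay: weight memories by exponential decay over time so that recent memories rank first (see xg's 30-day half-life)
 - [ ] Self-reflection: periodically review conversation quality and refine its own behavior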

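+### Knowledge graph (see luke concept graph)
+- [ ] Concept graph: Aho-Corasick pattern matching of key concepts in user messages, automatically injecting related knowledge
+- [ ] update_concept tool: lets the LLM dynamically add/update concept nodes and their relations
+- [ ] LRU cache: keep hot concepts in memory for microsecond-level matching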
+
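+### Tool system
+- [x] spawn_agent (Claude Code sub-agent)
+- [x] update_scratch / update_memory
+- [x] send_file / agent_status / kill_agent
+- [x] External script tool discovery (tools/ directory)
+- [ ] run_code tool: execute Python/Shell code in a secure sandbox, capture the output, and return it (see luke run_python)
+- [ ] gen_image tool: call an image generation API (Gemini/FLUX/local model)
+- [ ] gen_voice tool: TTS speech synthesis, sent as a voice message (see luke ElevenLabs / xg Fish-Speech)
+- [ ] set_timer tool: lets the LLM schedule delayed/timed tasks that fire a callback when due (see luke timer system)
+- [ ] web_search tool: web search + summarization, without spawning a full agent every time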
+
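 ### Perception
- [x] Image understanding: multimodal vision input
- [ ] Voice transcription: speech-to-text via whisper API
- [ ] Screen/screenshot analysis
 - [ ] Link preview/summary
+- [ ] Speech-to-text (STT): automatically transcribe incoming voice messages (xg currently uses FunASR, luke uses Whisper)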

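 ### Interaction experience
- [x] Group support: independent contexts
- [x] Streaming output: sendMessageDraft + editMessageText
- [x] Markdown rendering
- [ ] Typing indicator
- [ ] Inline keyboard interaction
 - [ ] Voice reply (TTS)
+- [ ] Streamed sentence-by-sentence sending: split long replies at periods/question marks and send them in batches for a more natural feel
+- [ ] Multi-channel support: one bot core serving Telegram + WebSocket + HTTP (see luke MxN multiplexing architecture)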

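-### Tool ecosystem
- [x] Script tool discovery (tools/ + --schema)
- [x] Async sub-agent (spawn_agent)
- [x] Feishu to-do management
- [ ] Web search / fetch
- [ ] More script tools
- [ ] MCP protocol support
-
-### Reliability
- [ ] API retry strategy (exponential backoff)
- [ ] Usage tracking
- [ ] Context pruning (prune tool output only)
- [ ] Model failover
+### Context management
+- [ ] Smart context allocation: configurable shares for system prompt / memory / message history / tool output, reserving 60-70% for tool output (see luke's conservative allocation strategy)
+- [ ] Rolling-window optimization for conversation history: currently a hard cap of 100 messages; could switch to a token budget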
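
Two of the planned memory items in the todo diff above carry concrete numbers: merge when cosine similarity is at least 0.7, and decay with a 30-day half-life. A minimal Rust sketch of how those two rules might be expressed follows. Every name in it (`cosine_similarity`, `should_merge`, `decay_weight`, `recall_score`) is hypothetical and not taken from this repository, and multiplying similarity by the decay weight is only one assumed way of combining them.

```rust
// Hypothetical constants taken from the todo items; not from the codebase.
const MERGE_THRESHOLD: f32 = 0.7; // cosine similarity at which memories merge
const HALF_LIFE_DAYS: f32 = 30.0; // age at which a memory's weight halves

/// Cosine similarity of two embedding vectors.
/// Returns 0.0 if the lengths differ or either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// True when a new memory is close enough to an existing one that it
/// should be merged (by the LLM) rather than inserted as a new entry.
fn should_merge(new_embedding: &[f32], existing_embedding: &[f32]) -> bool {
    cosine_similarity(new_embedding, existing_embedding) >= MERGE_THRESHOLD
}

/// Exponential time decay: weight = 0.5^(age / half_life).
/// Negative ages are clamped to zero.
fn decay_weight(age_days: f32) -> f32 {
    0.5f32.powf(age_days.max(0.0) / HALF_LIFE_DAYS)
}

/// Assumed ranking score: similarity scaled by the time-decay weight.
fn recall_score(similarity: f32, age_days: f32) -> f32 {
    similarity * decay_weight(age_days)
}
```

Under these assumptions a memory that is 30 days old keeps half its weight, so an older match needs roughly twice the similarity of a fresh one to rank equally.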