2 Commits

Author SHA1 Message Date
Fam Zheng
ec1bd7cb25 add gen_voice tool, message timestamps, image multimodal, group chat, whisper STT
- gen_voice: IndexTTS2 voice cloning via tools/gen_voice script, ref audio
  cached on server to avoid re-upload
- Message timestamps: created_at column in messages table, prepended to
  content in API calls so LLM sees message times
- Image understanding: photos converted to base64 multimodal content
  for vision-capable models
- Group chat: independent session contexts per chat_id, sendMessageDraft
  disabled in groups (private chat only)
- Voice transcription: whisper service integration, transcribed text
  injected as [语音消息] prefix
- Integration tests marked #[ignore] (require external services)
- Reference voice asset: assets/ref_voice.mp3
- .gitignore: target/, noc.service, config/state/db files
2026-04-09 20:12:15 +01:00
Fam Zheng
128f2481c0 add tool calling, SQLite persistence, group chat, image vision, voice transcription
Major features:
- OpenAI function calling with tool call loop (streaming SSE parsing)
- Built-in tools: spawn_agent (async claude -p), agent_status, kill_agent,
  update_scratch, send_file
- Script-based tool discovery: tools/ dir with --schema convention
- Feishu todo management script (tools/manage_todo)
- SQLite persistence: conversations, messages, config, scratch_area tables
- Sliding window context (100 msgs, slide 50, auto-summarize)
- Conversation summary generation via LLM on window slide
- Group chat support with independent session contexts
- Image understanding: multimodal vision input (base64 to API)
- Voice transcription via faster-whisper Docker service
- Configurable persona stored in DB
- diag command for session diagnostics
- System prompt restructured: persona + tool instructions separated
- RUST_BACKTRACE=1 in service, clippy in deploy pipeline
- .gitignore for config/state/db files
2026-04-09 16:38:28 +01:00