nuonuo/doc/p6_multiturn.md
Fam Zheng d923aa1e31 NuoNuo: Hippocampal memory module prototype
Hopfield + Hebbian hybrid memory system for LLMs.
Two nights of experiments (16 iterations), validated on LongMemEval (ICLR 2025).

Architecture:
- Single-hop: Two-Stage Hopfield (NN top-20 → softmax settle)
- Multi-hop: Hebbian W matrix with WTA pattern separation
- 64% on LongMemEval (500 questions), retrieval-only, no LLM dependency
- 4ms latency @ 20K memories, ~1GB VRAM

Key findings:
- Hopfield attention solved noise tolerance (20% → 100% vs flat Hebbian)
- WTA pattern separation enables 20K+ capacity
- Multi-hop associative chains (6 hops, CosSim=1.0) — RAG can't do this
- MiniLM-L6 is optimal (discrimination gap > absolute similarity)
- Paraphrase cue augmentation: 55% → 100% on synthetic, 36% → 64% on benchmark
- SNN encoder viable (CosSim 0.99) but not needed for current architecture
2026-04-07 10:37:24 +01:00

P6: Multi-turn Conversation Validation

Scenario

Three days of conversation (DB troubleshooting → deployment → monitoring), 12 memories, stored with heuristic paraphrase augmentation.
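The sketch below shows cue augmentation under one assumption: each memory is stored under several rule-generated rewrites of its cue, so a query phrased differently still lands close to one stored key. The rewrite rules, `augment_cue`, `remember`, `recall` and the bag-of-words `embed` are hypothetical stand-ins. This document does not describe the prototype's actual paraphrase heuristics or its MiniLM-L6 encoder.

```python
import math
import re
from collections import Counter

# Hypothetical rewrite rules (pattern, replacement); the prototype's real heuristics are not shown.
REWRITES = [
    (r"\bdb\b", "database"),
    (r"\bdatabase\b", "db"),
    (r"\bhow to\b", "how do i"),
    (r"\bdeploy\b", "deployment"),
]


def augment_cue(cue: str) -> list:
    """Return the lowercased cue plus each distinct variant from applying one rewrite rule."""
    base = cue.lower()
    variants = [base]
    for pattern, repl in REWRITES:
        variant = re.sub(pattern, repl, base)
        if variant not in variants:
            variants.append(variant)
    return variants


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a sentence encoder such as MiniLM-L6."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for tokens missing from b
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# Every cue variant is stored as its own key, all pointing at the same memory text.
STORE = []  # list of (cue_vector, memory_text)


def remember(cue: str, memory: str) -> None:
    for variant in augment_cue(cue):
        STORE.append((embed(variant), memory))


def recall(query: str):
    """Return (memory_text, best cosine) over all stored cue variants."""
    q = embed(query)
    vec, memory = max(STORE, key=lambda item: cosine(q, item[0]))
    return memory, cosine(q, vec)


remember("DB is slow", "missing index on created_at")
```

With this toy encoder, the query "database is slow" matches the augmented variant exactly (cosine 1.0). Against the original cue "db is slow" alone it would score only 2/3.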

Cross-session recall: 12/12 (100%)

| Query | Source day | Result |
|---|---|---|
| DB is slow again | Day 1 | ✓ "missing index on created_at" |
| How big is the users table? | Day 1 | ✓ "2.3 million rows" |
| Who can access the database? | Day 1 | ✓ "Alice, Bob, Charlie" |
| What Postgres version? | Day 1 | ✓ "PostgreSQL 15.2" |
| How to deploy? | Day 2 | ✓ "blue-green via GitHub Actions" |
| How to rollback? | Day 2 | ✓ "switch load balancer" |
| Who approves deploys? | Day 2 | ✓ "Alice or David" |
| Monitoring dashboard? | Day 3 | ✓ "grafana.internal" |
| What alerts? | Day 3 | ✓ "PagerDuty" |
| DB slow, what index? | Cross-day | ✓ "created_at" |
| Deploy logs? | Cross-day | ✓ "Loki" |
| Database monitoring exporter | Cross-day | ✓ "pg_exporter" |

All 12 results had similarity = 1.0. At this small scale (12 memories), Hopfield + augmentation recalls perfectly.
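The commit notes describe single-hop recall as "NN top-20 → softmax settle". A minimal sketch of that two-stage path follows. The settle step is the standard modern-Hopfield update `xi <- sum_j softmax(beta * <x_j, xi>)_j * x_j`, restricted to the shortlist. Only the top-20 shortlist comes from the prototype's description. `two_stage_retrieve`, beta = 16, the three settle steps and the 3-d toy keys are assumptions for illustration, not the prototype's code.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores, beta):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(beta * (s - m)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def two_stage_retrieve(query, keys, values, k=20, beta=16.0, steps=3):
    """Stage 1: cosine top-k shortlist. Stage 2: modern-Hopfield settle over the shortlist.

    Returns (value, attention weight of the winning candidate).
    """
    q = normalize(query)
    unit_keys = [normalize(key) for key in keys]
    # Stage 1: nearest neighbours by cosine similarity.
    shortlist = sorted(range(len(keys)), key=lambda i: dot(q, unit_keys[i]), reverse=True)[:k]
    cands = [unit_keys[i] for i in shortlist]
    # Stage 2: repeatedly replace the state with the attention-weighted mix of candidates.
    xi = q
    for _ in range(steps):
        weights = softmax([dot(c, xi) for c in cands], beta)
        xi = [sum(w * c[d] for w, c in zip(weights, cands)) for d in range(len(xi))]
    weights = softmax([dot(c, xi) for c in cands], beta)
    best = max(range(len(cands)), key=lambda j: weights[j])
    return values[shortlist[best]], weights[best]


KEYS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
VALUES = ["missing index on created_at", "2.3 million rows", "PostgreSQL 15.2"]
noisy_cue = [0.9, 0.3, 0.1]  # mostly points at the first key, plus noise
print(two_stage_retrieve(noisy_cue, KEYS, VALUES))
```

With beta = 16 the softmax is nearly one-hot, so the settled state sits almost exactly on a single stored pattern even when the cue is noisy. The shortlist keeps the settle step's cost proportional to k rather than to the full memory count.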

Multi-hop

"database is slow" → hop1: "missing index" → hop2: "missing index" → hop3: "2.3 million rows"

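Hop 2 loops back to itself. In the Hebbian W, the strongest association of "missing index" is still "missing index", because its own outer product contributes the most. Multi-hop retrieval therefore needs deduplication: memories already visited must not be candidates for the next hop.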
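The self-loop and the proposed fix can be reproduced with a toy Hebbian matrix `W = sum_i p_i p_i^T + link_weight * sum_(a->b) p_b p_a^T`. The auto-associative term `p_i p_i^T` is what pulls a cue back onto itself. This is a sketch under stated assumptions, and all of the following are hypothetical:

- the orthonormal 3-d codes;
- `link_weight = 0.5`, chosen smaller than the self term to mirror the observation above;
- the `min_score` stopping rule;
- all function names.

The plain argmax stands in for the prototype's WTA pattern separation and does not reproduce it.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_w(patterns, links, self_weight=1.0, link_weight=0.5):
    """Hebbian W = self_weight * sum_i p_i p_i^T + link_weight * sum_(a->b) p_b p_a^T."""
    dim = len(next(iter(patterns.values())))
    w = [[0.0] * dim for _ in range(dim)]

    def add_outer(post, pre, scale):
        for r in range(dim):
            for c in range(dim):
                w[r][c] += scale * post[r] * pre[c]

    for p in patterns.values():
        add_outer(p, p, self_weight)  # auto-associative term: the source of the self-loop
    for src, dst in links:
        add_outer(patterns[dst], patterns[src], link_weight)  # hetero-associative link src -> dst
    return w


def matvec(w, x):
    return [dot(row, x) for row in w]


def multi_hop(start, patterns, w, hops, dedup=True, min_score=0.1):
    """Follow W from `start`; with dedup=True, visited memories are never next-hop candidates."""
    path = [start]
    for _ in range(hops):
        y = matvec(w, patterns[path[-1]])
        candidates = [n for n in patterns if not (dedup and n in path)]
        if not candidates:
            break
        best = max(candidates, key=lambda n: dot(patterns[n], y))
        if dot(patterns[best], y) < min_score:
            break  # no remaining memory is associated strongly enough
        path.append(best)
    return path


PATTERNS = {
    "missing index": [1.0, 0.0, 0.0],
    "2.3 million rows": [0.0, 1.0, 0.0],
    "PostgreSQL 15.2": [0.0, 0.0, 1.0],
}
W = build_w(PATTERNS, links=[("missing index", "2.3 million rows")])

print(multi_hop("missing index", PATTERNS, W, hops=2, dedup=False))
# ['missing index', 'missing index', 'missing index']
print(multi_hop("missing index", PATTERNS, W, hops=2))
# ['missing index', '2.3 million rows']
```

Without deduplication the walk never leaves "missing index". Once visited memories are excluded, the next hop reaches "2.3 million rows", and the `min_score` cutoff stops the walk instead of jumping to an unrelated memory.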

Memory Conflicts

Two conflicting PostgreSQL versions were stored, 15.2 and 16.1:

  • Top-1: "Upgraded to 16.1" (sim=1.0) ← the newer version ranks first
  • Top-2: "version 15.2" (sim=0.0) ← the old version is still returned

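The current behavior is acceptable (both are returned, newest first). Better options:

  • On detecting a newer value for the same cue → automatically replace the old memory
  • Or mark the old memory as "superseded"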
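The second option (mark rather than delete) can be sketched as a store that compares each new cue against existing cues on write. Older entries that are similar enough get flagged as superseded. The `MemoryStore` class, the 0.8 same-cue threshold, the cue strings and the bag-of-words `embed` are hypothetical. A real version would compare cue embeddings from the same encoder used for retrieval.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional


def embed(text: str) -> Counter:
    """Toy bag-of-words vector standing in for the real cue encoder."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Memory:
    cue: str
    value: str
    seq: int                             # insertion order; higher means newer
    superseded_by: Optional[int] = None  # seq of the memory that replaced this one


class MemoryStore:
    def __init__(self, same_cue_threshold: float = 0.8):
        self.items = []
        self.threshold = same_cue_threshold

    def add(self, cue: str, value: str) -> Memory:
        new = Memory(cue, value, seq=len(self.items))
        for old in self.items:
            # Same cue (by similarity) and still current -> mark superseded, but keep it stored.
            if old.superseded_by is None and cosine(embed(old.cue), embed(cue)) >= self.threshold:
                old.superseded_by = new.seq
        self.items.append(new)
        return new

    def recall(self, query: str, include_superseded: bool = False) -> list:
        q = embed(query)
        pool = [m for m in self.items if include_superseded or m.superseded_by is None]
        # Rank by cue similarity, breaking ties in favour of newer memories.
        return sorted(pool, key=lambda m: (cosine(q, embed(m.cue)), m.seq), reverse=True)


store = MemoryStore()
store.add("what postgres version", "version 15.2")
store.add("what postgres version", "Upgraded to 16.1")
store.add("how to deploy", "blue-green via GitHub Actions")
print([m.value for m in store.recall("postgres version")])
# ['Upgraded to 16.1', 'blue-green via GitHub Actions']
print([m.value for m in store.recall("postgres version", include_superseded=True)])
# ['Upgraded to 16.1', 'version 15.2', 'blue-green via GitHub Actions']
```

Keeping superseded entries makes the change reversible, and callers can still request history via `include_superseded=True`. Deleting the entries instead would implement the first option. A pure similarity threshold can also merge two genuinely different facts that have near-identical cues, so the threshold would need tuning against real data.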

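To Improve

  1. Multi-hop deduplication: exclude already-visited memories from next-hop candidates
  2. Memory update detection: a new value for the same cue automatically overrides the old one
  3. Large-scale validation: 12 memories is a small test; a cross-session test with 100+ memories is needed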