Hopfield + Hebbian hybrid memory system for LLMs.

Two nights of experiments (16 iterations), validated on LongMemEval (ICLR 2025).

Architecture:
- Single-hop: Two-Stage Hopfield (NN top-20 → softmax settle)
- Multi-hop: Hebbian W matrix with WTA pattern separation
- 64% on LongMemEval (500 questions), retrieval-only, no LLM dependency
- 4ms latency @ 20K memories, ~1GB VRAM

Key findings:
- Hopfield attention solved noise tolerance (20% → 100% vs flat Hebbian)
- WTA pattern separation enables 20K+ capacity
- Multi-hop associative chains (6 hops, CosSim=1.0), which RAG can't do
- MiniLM-L6 is optimal (discrimination gap > absolute similarity)
- Paraphrase cue augmentation: 55% → 100% on synthetic, 36% → 64% on benchmark
- SNN encoder viable (CosSim 0.99) but not needed for current architecture
P6: Multi-turn conversation validation
Scenario
A conversation spanning 3 days (DB troubleshooting → deployment → monitoring): 12 memories plus heuristic paraphrase augmentation.
Cross-session recall: 12/12 (100%)
| Query | Source day | Result |
|---|---|---|
| DB is slow again | Day 1 | ✓ "missing index on created_at" |
| How big is the users table? | Day 1 | ✓ "2.3 million rows" |
| Who can access the database? | Day 1 | ✓ "Alice, Bob, Charlie" |
| What Postgres version? | Day 1 | ✓ "PostgreSQL 15.2" |
| How to deploy? | Day 2 | ✓ "blue-green via GitHub Actions" |
| How to rollback? | Day 2 | ✓ "switch load balancer" |
| Who approves deploys? | Day 2 | ✓ "Alice or David" |
| Monitoring dashboard? | Day 3 | ✓ "grafana.internal" |
| What alerts? | Day 3 | ✓ "PagerDuty" |
| DB slow, what index? | Cross | ✓ "created_at" |
| Deploy logs? | Cross | ✓ "Loki" |
| Database monitoring exporter | Cross | ✓ "pg_exporter" |
All 12 queries returned similarity=1.0. Hopfield + augmentation is perfect at small scale (12 memories).
Multi-hop
"database is slow" → hop1: "missing index" → hop2: "missing index" → hop3: "2.3 million rows"
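Hop 2 looped back onto itself: in the Hebbian W matrix, the strongest association for "missing index" is still "missing index", because its own outer product contributes the most. Multi-hop needs deduplication: a memory that has already been visited must not take part in the next hop.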
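The loop and the proposed fix can be reproduced on a toy example. The sketch below is not the project's code: it assumes orthonormal memory vectors, a W built from full-strength self outer products plus weaker (0.5) next-memory links, and a hypothetical helper named `multi_hop`. It only illustrates why the self term wins and how masking visited memories breaks the loop.

```python
import numpy as np

def multi_hop(W, memories, cue, hops=3, dedup=True):
    """Follow associative hops through W; optionally skip visited memories."""
    visited = []
    state = cue
    for _ in range(hops):
        if dedup and len(visited) == len(memories):
            break  # every memory has already been visited
        # Project the current state through W, then score each stored memory.
        scores = memories @ (W @ state)
        if dedup and visited:
            scores[visited] = -np.inf  # visited memories cannot win again
        idx = int(np.argmax(scores))
        visited.append(idx)
        state = memories[idx]
    return visited

# Toy setup (all values are assumptions for illustration).
dim = 5
basis = np.eye(dim)
memories = basis[:4]   # four orthonormal "memories"
cue = basis[4]         # a query vector distinct from every memory

W = np.zeros((dim, dim))
for i in range(4):
    W += np.outer(memories[i], memories[i])            # self term (strongest)
for i in range(3):
    W += 0.5 * np.outer(memories[i + 1], memories[i])  # weaker link to the next memory
W += np.outer(memories[0], cue)                        # cue -> first memory

looping = multi_hop(W, memories, cue, dedup=False)
chained = multi_hop(W, memories, cue, dedup=True)
print("without dedup:", looping)
print("with dedup:   ", chained)
```

Without masking, the chain stays on memory 0 because the self term outweighs the link to memory 1; with masking, it advances one memory per hop.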
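Memory conflicts
Two conflicting PostgreSQL versions were stored (15.2 and 16.1):
- Top-1: "Upgraded to 16.1" (sim=1.0) ← the newer version ranks first
- Top-2: "version 15.2" (sim=0.0) ← the old version is also returned
The current behavior is acceptable (both are returned, with the newer one first). A better approach:
- Detect an update for the same cue → automatically replace the old memory
- Or mark the old memory as "superseded"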
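One way the "superseded" option could look is sketched below. This is an illustration under assumptions, not the existing implementation: `MemoryStore`, `Memory`, and the 0.95 cosine threshold for treating two cues as "the same cue" are all hypothetical, and the cue vectors stand in for whatever embedding the system produces.

```python
from dataclasses import dataclass
import numpy as np

@dataclass(eq=False)
class Memory:
    cue: np.ndarray          # unit-norm cue embedding
    value: str
    superseded: bool = False

class MemoryStore:
    def __init__(self, threshold=0.95):
        # Assumed cutoff: cues at or above this cosine similarity count as the same cue.
        self.threshold = threshold
        self.items = []

    def add(self, cue, value):
        cue = cue / np.linalg.norm(cue)
        for m in self.items:
            # An existing live memory with (nearly) the same cue is marked, not deleted.
            if not m.superseded and float(m.cue @ cue) >= self.threshold:
                m.superseded = True
        self.items.append(Memory(cue, value))

    def recall(self, query, k=2):
        query = query / np.linalg.norm(query)
        live = [m for m in self.items if not m.superseded]
        live.sort(key=lambda m: float(m.cue @ query), reverse=True)
        return [m.value for m in live[:k]]

store = MemoryStore()
version_cue = np.array([1.0, 0.0])
store.add(version_cue, "version 15.2")
store.add(np.array([0.99, 0.05]), "Upgraded to 16.1")   # near-identical cue
store.add(np.array([0.0, 1.0]), "grafana.internal")     # unrelated cue
print(store.recall(version_cue, k=1))
```

Marking rather than deleting keeps the old value available for audit or history queries while removing it from normal recall.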
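To improve
- Multi-hop deduplication: exclude already-visited memories from the next hop's candidates
- Memory update detection: a new value for the same cue automatically overwrites the old value
- Large-scale validation: 12 memories is a small test; a cross-session test with 100+ memories is needed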