NuoNuo: Hippocampal memory module prototype

Hopfield + Hebbian hybrid memory system for LLMs.
Two nights of experiments (16 iterations), validated on LongMemEval (ICLR 2025).

Architecture:
- Single-hop: Two-Stage Hopfield (NN top-20 → softmax settle)
- Multi-hop: Hebbian W matrix with WTA pattern separation
- 64% on LongMemEval (500 questions), retrieval-only, no LLM dependency
- 4ms latency @ 20K memories, ~1GB VRAM

Key findings:
- Hopfield attention solved noise tolerance (20% → 100% vs flat Hebbian)
- WTA pattern separation enables 20K+ capacity
- Multi-hop associative chains (6 hops, CosSim=1.0) — RAG can't do this
- MiniLM-L6 is optimal (discrimination gap > absolute similarity)
- Paraphrase cue augmentation: 55% → 100% on synthetic, 36% → 64% on benchmark
- SNN encoder viable (CosSim 0.99) but not needed for current architecture
2026-04-07 10:37:24 +01:00
commit d923aa1e31
65 changed files with 13148 additions and 0 deletions

doc/p2_auto_paraphrase.md Normal file

@@ -0,0 +1,52 @@
# P2: Auto Paraphrase Generation
## Key data
| Strategy | bg=0 | bg=500 | bg=2000 | Implementation cost |
|------|------|--------|---------|---------|
| None | 95% | 65% | 55% | - |
| Heuristic (synonym swap) | 95% | 85% | **75%** | Zero cost |
| Oracle (hard cases only) | 100% | 95% | **95%** | Requires LLM |
| Oracle (full coverage) | 100% | 100% | **100%** | Requires LLM |
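## Findings
1. **The heuristic is already valuable**: 55% → 75% (+20pp) at bg=2000, with no LLM required
2. **Oracle full coverage = 100%**: shows that the problem can be solved entirely through paraphrasing
3. **Most failures can be fixed by paraphrasing**: 8 of the 9 failures have an oracle fix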
## Failure categories
| Type | Example | Cause | Fix |
|------|------|------|---------|
| Vocabulary gap | "Ship the release" ↔ "deploy" (cos=0.46) | Entirely different words | LLM paraphrase ✓ |
| Concept mapping | "Need observability" ↔ "monitoring" (cos=0.26) | Abstract → concrete | LLM paraphrase ✓ |
| Domain knowledge | "Fix login issue" ↔ "auth bug" (cos=0.65) | Requires knowing login = auth | LLM paraphrase ✓ |
| Competition | "DB terrible" ↔ "DB slow" (cos=0.72), but a bg entry wins | cos is high enough, but a bg entry is closer | Increase augmentation density |
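
The "DB terrible" ↔ "DB slow" case can be illustrated with a minimal sketch: retrieval fails when some other stored entry is closer to the query than the intended cue, even though the cue's own cosine similarity is fairly high. The 2-D vectors, the `0.7` threshold, and all function names below are made-up stand-ins for illustration, not real MiniLM embeddings or code from this repository.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query, memories):
    """Return the key of the stored vector closest to the query."""
    return max(memories, key=lambda key: cosine(query, memories[key]))

def is_competition_failure(query, target_key, memories, threshold=0.7):
    """True when the target is similar enough, yet another entry still wins."""
    target_sim = cosine(query, memories[target_key])
    return target_sim >= threshold and retrieve(query, memories) != target_key

# Toy vectors (assumed): the target cue sits at cos ~0.72 to the query,
# while one background entry sits at cos ~0.90 and takes the top slot.
query = (1.0, 0.0)
memories = {
    "db_slow": (0.72, 0.694),
    "background_1": (0.90, 0.436),
}
```

In this toy setup, removing `background_1` makes the same query succeed, which matches the observation that accuracy drops as the number of background entries grows.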
## Practical deployment strategy
### At store time (asynchronous, no impact on latency)
```
1. The user says something
2. Extract (cue, target)
3. Store the original cue synchronously
4. Asynchronously: the LLM generates 3-5 paraphrases → append them to the store
```
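
The store-time flow above could be sketched with `asyncio` roughly as follows. This is only an illustration under stated assumptions: `CueStore`, `fake_llm_paraphrase`, and `store_memory` are hypothetical names, the LLM call is replaced by a stub, and the real module may store embeddings rather than raw strings.

```python
import asyncio

class CueStore:
    """Hypothetical in-memory store of (cue, target) pairs."""
    def __init__(self):
        self.entries = []

    def add(self, cue, target):
        self.entries.append((cue, target))

async def fake_llm_paraphrase(cue, n=3):
    """Stub standing in for the real LLM call; returns n dummy variants."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return [f"{cue} (variant {i})" for i in range(1, n + 1)]

async def store_memory(store, cue, target, paraphrase=fake_llm_paraphrase):
    # Step 3: store the original cue right away so it is retrievable at once.
    store.add(cue, target)

    # Step 4: generate paraphrases in the background and append them later.
    async def augment():
        for variant in await paraphrase(cue):
            store.add(variant, target)

    return asyncio.create_task(augment())

async def demo():
    store = CueStore()
    # The target string here is an invented example.
    task = await store_memory(store, "The database is slow again", "DB performance issue")
    count_before = len(store.entries)  # only the original cue so far
    await task                         # wait for the background augmentation
    return count_before, len(store.entries)
```

Calling `asyncio.run(demo())` returns the entry counts before and after augmentation. The caller is never blocked on the paraphrase call, which is the point of doing step 4 asynchronously.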
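### Heuristic fallback (when the LLM is unavailable)
The current heuristic rules have been validated as effective (+20pp) and can serve as a baseline:
- Strip common prefixes ("Can you", "I need to", "How do I")
- Substitute synonyms (deploy↔release, database↔DB, fix↔resolve)
- Add the "issue with X" pattern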
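
A minimal sketch of how these three rules might be combined is shown below. The prefix and synonym lists come from the bullets above; everything else is an assumption, in particular the function name `heuristic_paraphrases` and the choice to use the prefix-stripped cue as the `X` in "issue with X", which the notes do not specify.

```python
import re

PREFIXES = ("can you ", "i need to ", "how do i ")
SYNONYMS = {
    "deploy": "release", "release": "deploy",
    "database": "DB", "db": "database",
    "fix": "resolve", "resolve": "fix",
}

def heuristic_paraphrases(cue):
    """Return rule-based variants of a cue (no LLM involved)."""
    variants = []

    # Rule 1: strip a common leading phrase, case-insensitively.
    stripped = cue
    lowered = cue.lower()
    for prefix in PREFIXES:
        if lowered.startswith(prefix):
            stripped = cue[len(prefix):]
            break
    if stripped != cue:
        variants.append(stripped)

    # Rule 2: swap each known word for its synonym.
    def swap(match):
        word = match.group(0)
        return SYNONYMS.get(word.lower(), word)

    swapped = re.sub(r"[A-Za-z]+", swap, stripped)
    if swapped != stripped:
        variants.append(swapped)

    # Rule 3: wrap the stripped cue in the "issue with X" pattern (assumed X).
    variants.append("issue with " + stripped.rstrip("?.!"))
    return variants
```

For example, `heuristic_paraphrases("Can you fix the database?")` yields a prefix-stripped variant, a synonym-swapped variant, and an "issue with ..." variant. The variants need not be grammatical, since they only serve as extra retrieval cues.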
### LLM prompt (to be validated once the Gateway is back up)
```
Generate 3-5 different ways a user might say this:
"The database is slow again"
Requirements:
- Same core meaning, different wording
- Include informal/colloquial versions
- Include technical jargon alternatives
```
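
The prompt above could be wrapped in a small helper like the sketch below. The template text is taken from the prompt; `build_prompt` and `parse_paraphrases` are hypothetical names, and the parser assumes the model answers with one paraphrase per line (optionally bulleted or numbered), which the prompt itself does not guarantee.

```python
import re

PROMPT_TEMPLATE = """Generate 3-5 different ways a user might say this:
"{cue}"
Requirements:
- Same core meaning, different wording
- Include informal/colloquial versions
- Include technical jargon alternatives"""

def build_prompt(cue):
    """Fill the cue into the paraphrase prompt."""
    return PROMPT_TEMPLATE.format(cue=cue)

def parse_paraphrases(response, max_items=5):
    """Extract one paraphrase per non-empty line (assumed response format)."""
    results = []
    for line in response.splitlines():
        # Drop a leading bullet ("-", "*") or numbering ("1.", "2)").
        text = re.sub(r"^\s*(?:[-*]|\d+[.)])\s*", "", line)
        text = text.strip().strip('"')
        if text:
            results.append(text)
    return results[:max_items]
```

Capping the result at five items keeps the augmentation within the 3-5 range requested by the prompt even if the model returns more.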