NuoNuo: Hippocampal memory module prototype

Hopfield + Hebbian hybrid memory system for LLMs.
Two nights of experiments (16 iterations), validated on LongMemEval (ICLR 2025).

Architecture:
- Single-hop: Two-Stage Hopfield (NN top-20 → softmax settle)
- Multi-hop: Hebbian W matrix with WTA pattern separation
- 64% on LongMemEval (500 questions), retrieval-only, no LLM dependency
- 4ms latency @ 20K memories, ~1GB VRAM

Key findings:
- Hopfield attention solved noise tolerance (20% → 100% vs flat Hebbian)
- WTA pattern separation enables 20K+ capacity
- Multi-hop associative chains (6 hops, CosSim=1.0); RAG can't do this
- MiniLM-L6 is optimal (discrimination gap > absolute similarity)
- Paraphrase cue augmentation: 55% → 100% on synthetic, 36% → 64% on benchmark
- SNN encoder viable (CosSim 0.99) but not needed for current architecture
2026-04-07 10:37:24 +01:00
commit d923aa1e31
65 changed files with 13148 additions and 0 deletions
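The single-hop path named in the commit message (NN top-20 → softmax settle) could look roughly like the sketch below. Only the two-stage shape and the shortlist size of 20 come from the commit message; the cosine similarity, the `beta` and `steps` values, the update rule (a softmax-weighted mean, as in modern Hopfield networks), and all function names are assumptions made for illustration, not the repository's implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def normalize(v):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def two_stage_recall(query, memories, top_k=20, beta=16.0, steps=3):
    """Stage 1: nearest-neighbour shortlist. Stage 2: softmax settling.

    `memories` must be non-empty. `beta` and `steps` are assumed values.
    Returns (index of the winning memory, settled state vector).
    """
    # Stage 1: indices of the top_k memories most similar to the query.
    ranked = sorted(range(len(memories)),
                    key=lambda i: cosine(query, memories[i]), reverse=True)
    shortlist = ranked[:top_k]

    state = normalize(query)
    weights = []
    for _ in range(steps):
        # Stage 2: softmax over scaled similarities (shifted by the max for stability).
        sims = [beta * cosine(state, memories[i]) for i in shortlist]
        peak = max(sims)
        exps = [math.exp(s - peak) for s in sims]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Move the state to the softmax-weighted mean of the shortlisted memories.
        state = normalize([
            sum(w * memories[i][d] for w, i in zip(weights, shortlist))
            for d in range(len(state))
        ])

    best = max(zip(weights, shortlist))[1]
    return best, state


memories = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
best, state = two_stage_recall([0.9, 0.3, 0.1], memories)
print(best)  # 0
```

The shortlist keeps the softmax cheap when the store is large, and the settling loop pulls a noisy cue toward the dominant stored pattern, which is one plausible reading of the noise-tolerance finding.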
--- a/doc/longmemeval_benchmark.md
+++ b/doc/longmemeval_benchmark.md
@@ -0,0 +1,62 @@
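+# LongMemEval Benchmark Results
+
+## Dataset
+
+LongMemEval (ICLR 2025, MIT License): 500 questions across 6 types, drawn from real multi-turn, multi-session conversations.
+
+## Results
+
+### Retrieval-only (final approach)
+
+| Type | v1 (old extraction) | v2 (improved extraction) | Gain (pp) |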
+|------|------------|-------------|------|
+| single-session-user | 81% | **86%** | +5 |
+| single-session-assistant | 25% | **82%** | **+57** |
+| knowledge-update | 53% | **71%** | +18 |
+| multi-session | 23% | **53%** | +30 |
+| temporal-reasoning | 29% | **61%** | +32 |
+| preference | 0% | **27%** | +27 |
+| **Overall** | **36%** | **64%** | **+28** |
+
+### Adding Gemma 4 reasoning makes results worse
+
+| | Retrieval-only | + Gemma 4 |
+|--|---------------|-----------|
+| Overall | **64%** | 40% |
+
+Gemma is too conservative: even when the relevant information was retrieved, it answers "Not mentioned". It is not worth the extra 1.7s/query of latency.
+
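+## Key improvements (v1 → v2)
+
+1. **No truncation of assistant replies**: store them in chunks (500 characters per chunk) → single-session-assistant 25% → 82%
+2. **User statements as memories**: store every sentence the user says as its own memory → multi-session +30pp
+3. **Preference extraction**: regex matching on "I like/prefer/use/enjoy" → preference 0% → 27%
+4. **Date metadata**: store the session date → helps temporal-reasoning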
+
+## Performance
+
+- 56ms/query (embedding + Hopfield recall)
+- 22 memories per question on average
+- No external LLM dependency
+
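+## Analysis by type
+
+### Strong
+- **single-session-user (86%)**: information the user stated explicitly → stored directly, retrieved directly; a natural fit
+- **single-session-assistant (82%)**: chunked storage fixed the truncation of long replies
+
+### Medium
+- **knowledge-update (71%)**: both old and new information are retrieved; top-1 is usually the new value
+- **temporal-reasoning (61%)**: date information is in the context, but retrieval does no date arithmetic
+- **multi-session (53%)**: requires aggregation across sessions; top-K recalls part of the evidence but not all of it
+
+### Weak
+- **preference (27%)**: preferences are implicit and regex extraction has limited coverage. Needs LLM-based extraction or more rules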
+
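+## Positioning
+
+64% on LongMemEval is a **competitive retrieval baseline**. RAG baselines in the paper are typically at 40-60%, and SOTA (with LLM reasoning) at 70-80%. Our retrieval-only 64% already exceeds most RAG baselines.
+
+## Conclusion
+
+**Retrieval-only is the right choice.** Simple, fast, no dependencies. The headroom is in the extraction strategy (better memory chunking and preference detection), not in the retrieval architecture.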
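Two of the extraction changes in the doc above (500-character chunking of assistant replies, and regex matching on "I like/prefer/use/enjoy") can be sketched as follows. This is a minimal illustration, not the code in this commit: the function names, the exact regex, and the choice to cut chunks at fixed character offsets are all assumptions; only the chunk size and the four verbs come from the doc.

```python
import re

CHUNK_SIZE = 500  # characters per chunk, as stated in the doc

# Hypothetical pattern: the four verbs come from the doc, the rest is illustrative.
PREFERENCE_RE = re.compile(
    r"\bI\s+(like|prefer|use|enjoy)\s+([^.!?\n]+)", re.IGNORECASE
)


def chunk_reply(text, size=CHUNK_SIZE):
    """Split an assistant reply into consecutive chunks of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def extract_preferences(utterance):
    """Return (verb, object) pairs for explicit first-person preference statements."""
    return [
        (m.group(1).lower(), m.group(2).strip())
        for m in PREFERENCE_RE.finditer(utterance)
    ]


print([len(c) for c in chunk_reply("x" * 1200)])
# [500, 500, 200]
print(extract_preferences("I prefer dark roast. Also, I use Vim for everything!"))
# [('prefer', 'dark roast'), ('use', 'Vim for everything')]
print(extract_preferences("Could you suggest something less spicy?"))
# []
```

The last call shows the coverage gap behind the 27% preference score: an implicit preference contains none of the trigger verbs, so a rule of this kind stores nothing for it.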