NuoNuo: Hippocampal memory module prototype
Hopfield + Hebbian hybrid memory system for LLMs. Two nights of experiments (16 iterations), validated on LongMemEval (ICLR 2025).

Architecture:
- Single-hop: Two-Stage Hopfield (NN top-20 → softmax settle)
- Multi-hop: Hebbian W matrix with WTA pattern separation
- 64% on LongMemEval (500 questions), retrieval-only, no LLM dependency
- 4ms latency @ 20K memories, ~1GB VRAM

Key findings:
- Hopfield attention solved noise tolerance (20% → 100% vs flat Hebbian)
- WTA pattern separation enables 20K+ capacity
- Multi-hop associative chains (6 hops, CosSim=1.0) — RAG can't do this
- MiniLM-L6 is optimal (discrimination gap > absolute similarity)
- Paraphrase cue augmentation: 55% → 100% on synthetic, 36% → 64% on benchmark
- SNN encoder viable (CosSim 0.99) but not needed for current architecture
# NuoNuo: Hippocampal Memory Module — Architecture v2

## Project Goals

Add a hippocampus-like long-term memory module to an LLM (e.g. Gemma 4):

- No traditional RAG (vector database + retrieval)
- Memories live in network weights (Hebbian) and explicit patterns (Hopfield)
- Paraphrase-tolerant fuzzy retrieval
- Multi-hop associative reasoning (A→B→C, which RAG cannot do)
- Nightly consolidation/forgetting

## Core Architecture

```
┌─────────────────────────────────────────────────────────┐
│ Query Embedding (from Sentence Transformer)             │
│                        ↓                                │
│ ┌──── Stage 1: NN Pre-filter ─────────────────────┐     │
│ │ cosine(query, stored_cues) → top-20 candidates  │     │
│ │ O(N) brute force, O(log N) with FAISS           │     │
│ └──────────────────────┬──────────────────────────┘     │
│                        ↓                                │
│ ┌──── Stage 2: Hopfield Settle ───────────────────┐     │
│ │ softmax(β · query @ candidates^T) → attention   │     │
│ │ Iterate 3 steps → converge to nearest attractor │     │
│ │ Aggregate attention by memory_id (cue variants) │     │
│ └──────────────────────┬──────────────────────────┘     │
│                        ↓                                │
│ ┌──── Optional: Multi-hop Hebbian Chain ──────────┐     │
│ │ Settled cue → WTA code → W @ code → next target │     │
│ │ Repeat for N hops (A → B → C → ...)             │     │
│ └──────────────────────┬──────────────────────────┘     │
│                        ↓                                │
│ Retrieved memories                                      │
└─────────────────────────────────────────────────────────┘
```

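The two-stage retrieval path can be sketched in a few lines of NumPy. This is a minimal sketch with illustrative names (`two_stage_recall` is not the actual `hippocampus.py` API), and it omits the per-`memory_id` aggregation of cue variants:

```python
import numpy as np

def two_stage_recall(query, cues, beta=16.0, top_k=20, steps=3):
    """query: (d,), cues: (N, d); all vectors assumed L2-normalized.

    Returns the top-k candidate indices and the settled attention over them.
    """
    # Stage 1: NN pre-filter — cosine similarity, keep top-k candidates.
    sims = cues @ query                       # (N,) cosine scores
    cand_idx = np.argsort(-sims)[:top_k]      # top-20 candidate rows
    cand = cues[cand_idx]                     # (top_k, d)

    # Stage 2: Hopfield settle — iterate softmax attention toward an attractor.
    xi = query.copy()
    for _ in range(steps):
        attn = np.exp(beta * (cand @ xi))     # softmax(β · ξ @ candidates^T)
        attn /= attn.sum()
        xi = attn @ cand                      # new state: attention-weighted cues
        xi /= np.linalg.norm(xi)
    return cand_idx, attn
```

With β = 16 the attention sharpens quickly, so three iterations are typically enough for the state to collapse onto the nearest stored cue.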
## Biological Analogy

| Brain Region | System Component | Function |
|--------------|------------------|----------|
| Entorhinal cortex (EC) | Sentence Transformer | Perceptual encoding |
| Dentate gyrus (DG) | WTA Pattern Separation | Sparsification/orthogonalization |
| CA3 | Hebbian W matrix | Associative storage + multi-hop |
| CA1 | Hopfield attention | Retrieval output |
| Sleep replay | W rebuild | Consolidation/forgetting |

## Experimental Validation Summary

| Capability | Result | Experiment |
|------------|--------|------------|
| Paraphrase recall (+ augmentation) | **95%** | exp07e |
| Multi-hop (3 hops, 500 bg) | **100%** (sim=1.0) | exp07b, 07c |
| Scale (20K memories) | **80%** | exp07d |
| Exact cue recall | **100%** | exp02c |
| Memory capacity | **20K+** | exp02d |
| Recall latency | **4ms** @ 20K | exp05, 07d |
| SNN encoder roundtrip | **CosSim 0.99** | exp01b |

## Recommended Parameters

| Parameter | Value | Notes |
|-----------|-------|-------|
| embed_dim | 384-768 | Depends on the Sentence Transformer |
| code_dim | 16384 | Hebbian capacity 20K+ |
| k (WTA) | 50 | Balances noise tolerance and capacity |
| β (Hopfield) | 16.0 | Moderate sharpness |
| hopfield_top_k | 20 | Candidate set size; smaller is more stable |
| hopfield_steps | 3 | Convergence iterations |
| cue_variants | 3-5 per memory | LLM-generated paraphrases |

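With these parameters, the DG-style WTA encoder and the CA3 Hebbian store reduce to a few operations. A minimal sketch (function names are illustrative, and the demo dimensions below are shrunk from code_dim=16384 / k=50 for readability):

```python
import numpy as np

def wta_code(emb, proj, k=50):
    """Project the embedding up, keep only the k strongest units (winner-take-all)."""
    h = proj @ emb                             # (code_dim,)
    code = np.zeros_like(h)
    code[np.argpartition(-h, k)[:k]] = 1.0     # sparse binary code, exactly k ones
    return code

def hebbian_store(W, cue_code, target_code):
    """Outer-product Hebbian update: co-active cue/target units are wired together."""
    W += np.outer(target_code, cue_code)
    return W

def hebbian_recall(W, cue_code, k=50):
    """One associative hop: W @ cue, then WTA to clean up the retrieved code."""
    h = W @ cue_code
    code = np.zeros_like(h)
    code[np.argpartition(-h, k)[:k]] = 1.0
    return code
```

Chaining `hebbian_recall` N times is the multi-hop A→B→C path: because the WTA codes are sparse and nearly orthogonal, each hop's signal (overlap k) dominates crosstalk from other stored pairs.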
## VRAM Budget (RTX 4090, 24GB)

| Component | Size |
|-----------|------|
| Hebbian W (16384²) | 1024 MB |
| WTA projection (384×16384) | 24 MB |
| Hopfield store (20K × 384 × 2) | ~60 MB |
| Sentence Transformer | ~90 MB |
| Gemma 4B (fp16) | ~8 GB |
| **Total** | **~9.2 GB** |
| **Headroom** | **~14.8 GB** |

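The memory-structure sizes follow from plain fp32 arithmetic (a quick back-of-envelope check, not a profiler measurement):

```python
# fp32 = 4 bytes per element for the memory structures; the fp16 LLM
# dominates the budget and is taken from its published weight size.
MB = 1024 ** 2
hebbian_w = 16384 ** 2 * 4 / MB          # W matrix: 1024.0 MB
wta_proj  = 384 * 16384 * 4 / MB         # projection: 24.0 MB
hopfield  = 20_000 * 384 * 2 * 4 / MB    # cue + target vector per memory: ~58.6 MB
print(hebbian_w, wta_proj, round(hopfield, 1))
```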
## Integration with Gemma

Recommended approach: **Context Injection**

```python
# 1. User input → embed
query_emb = encoder.encode(user_input)

# 2. Recall memories
results = memory.recall(query_emb, top_k=3)
chain = memory.recall_chain(query_emb, hops=2)

# 3. Format and inject
context = format_memories(results + chain)
prompt = f"[Recalled memories]\n{context}\n\n[User]\n{user_input}"

# 4. Generate response
response = gemma.generate(prompt)
response_emb = encoder.encode(response)

# 5. Store new memory (with LLM-generated paraphrases)
paraphrases = gemma.generate(
    f"Generate 3 paraphrases of: {user_input}"
).splitlines()  # assumes one paraphrase per line
memory.store(query_emb, response_emb,
             cue_variants=[encoder.encode(p) for p in paraphrases])
```

## File Structure

```
src/nuonuo/
├── hippocampus.py       # Final module v2 (Hopfield + Hebbian hybrid)
├── encoder.py           # SNN spike encoder/decoder
├── memory.py            # STDP + Hebbian memory (historical)
├── consolidation.py     # Sleep consolidation (historical)
└── __init__.py

doc/
├── architecture.md      # This file
├── findings.md          # Key findings and counterintuitive conclusions
├── exp01_*.md           # SNN Encoder
├── exp02_*.md           # Associative Recall
├── exp03_*.md           # Consolidation
├── exp04_*.md           # Real Embeddings
├── exp05_*.md           # Benchmarks
├── exp06_*.md           # BioHash
└── exp07_*.md           # Hopfield (breakthrough)
```