NuoNuo: Hippocampal memory module prototype
Hopfield + Hebbian hybrid memory system for LLMs. Two nights of experiments (16 iterations), validated on LongMemEval (ICLR 2025).

Architecture:
- Single-hop: Two-Stage Hopfield (NN top-20 → softmax settle)
- Multi-hop: Hebbian W matrix with WTA pattern separation
- 64% on LongMemEval (500 questions), retrieval-only, no LLM dependency
- 4ms latency @ 20K memories, ~1GB VRAM

Key findings:
- Hopfield attention solved noise tolerance (20% → 100% vs flat Hebbian)
- WTA pattern separation enables 20K+ capacity
- Multi-hop associative chains (6 hops, CosSim=1.0) — RAG can't do this
- MiniLM-L6 is optimal (discrimination gap > absolute similarity)
- Paraphrase cue augmentation: 55% → 100% on synthetic, 36% → 64% on benchmark
- SNN encoder viable (CosSim 0.99) but not needed for current architecture
# NuoNuo: Hippocampal Memory Module — Architecture v2

## Project Goals

Add a hippocampus-like long-term memory module to an LLM (e.g. Gemma 4):

- No traditional RAG (vector database + retrieval)
- Memories live in network weights (Hebbian) and explicit patterns (Hopfield)
- Paraphrase-tolerant fuzzy retrieval
- Multi-hop associative reasoning (A→B→C, which RAG cannot do)
- Nightly consolidation/forgetting

## Core Architecture

```
┌─────────────────────────────────────────────────────────┐
│ Query Embedding (from Sentence Transformer)             │
│                        ↓                                │
│ ┌──── Stage 1: NN Pre-filter ─────────────────────┐     │
│ │ cosine(query, stored_cues) → top-20 candidates  │     │
│ │ O(N) brute force, O(log N) with FAISS           │     │
│ └──────────────────────┬──────────────────────────┘     │
│                        ↓                                │
│ ┌──── Stage 2: Hopfield Settle ───────────────────┐     │
│ │ softmax(β · query @ candidates^T) → attention   │     │
│ │ Iterate 3 steps → converge to nearest attractor │     │
│ │ Aggregate attention by memory_id (cue variants) │     │
│ └──────────────────────┬──────────────────────────┘     │
│                        ↓                                │
│ ┌──── Optional: Multi-hop Hebbian Chain ──────────┐     │
│ │ Settled cue → WTA code → W @ code → next target │     │
│ │ Repeat for N hops (A → B → C → ...)             │     │
│ └──────────────────────┬──────────────────────────┘     │
│                        ↓                                │
│ Retrieved memories                                      │
└─────────────────────────────────────────────────────────┘
```

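The two-stage retrieval path can be sketched in a few lines of NumPy. This is a minimal sketch with illustrative names (`two_stage_recall` is not the actual `hippocampus.py` API), and it omits the per-`memory_id` aggregation of cue variants:

```python
import numpy as np

def two_stage_recall(query, cues, beta=16.0, top_k=20, steps=3):
    """query: (d,), cues: (N, d); all vectors assumed L2-normalized.

    Returns the top-k candidate indices and the settled attention over them.
    """
    # Stage 1: NN pre-filter — cosine similarity, keep top-k candidates.
    sims = cues @ query                       # (N,) cosine scores
    cand_idx = np.argsort(-sims)[:top_k]      # top-20 candidate rows
    cand = cues[cand_idx]                     # (top_k, d)

    # Stage 2: Hopfield settle — iterate softmax attention toward an attractor.
    xi = query.copy()
    for _ in range(steps):
        attn = np.exp(beta * (cand @ xi))     # softmax(β · ξ @ candidates^T)
        attn /= attn.sum()
        xi = attn @ cand                      # new state: attention-weighted cues
        xi /= np.linalg.norm(xi)
    return cand_idx, attn
```

With β = 16 the attention sharpens quickly, so three iterations are typically enough for the state to collapse onto the nearest stored cue.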
## Biological Analogy

| Brain Region | System Component | Function |
|--------------|------------------|----------|
| Entorhinal cortex (EC) | Sentence Transformer | Perceptual encoding |
| Dentate gyrus (DG) | WTA Pattern Separation | Sparsification/orthogonalization |
| CA3 | Hebbian W matrix | Associative storage + multi-hop |
| CA1 | Hopfield attention | Retrieval output |
| Sleep replay | W rebuild | Consolidation/forgetting |

## Experimental Validation Summary

| Capability | Result | Experiment |
|------------|--------|------------|
| Paraphrase recall (+ augmentation) | **95%** | exp07e |
| Multi-hop (3 hops, 500 bg) | **100%** (sim=1.0) | exp07b, 07c |
| Scale (20K memories) | **80%** | exp07d |
| Exact cue recall | **100%** | exp02c |
| Memory capacity | **20K+** | exp02d |
| Recall latency | **4ms** @ 20K | exp05, 07d |
| SNN encoder roundtrip | **CosSim 0.99** | exp01b |

## Recommended Parameters

| Parameter | Value | Notes |
|-----------|-------|-------|
| embed_dim | 384-768 | Depends on the Sentence Transformer |
| code_dim | 16384 | Hebbian capacity 20K+ |
| k (WTA) | 50 | Balances noise tolerance and capacity |
| β (Hopfield) | 16.0 | Moderate sharpness |
| hopfield_top_k | 20 | Candidate set size; smaller is more stable |
| hopfield_steps | 3 | Convergence iterations |
| cue_variants | 3-5 per memory | LLM-generated paraphrases |

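With these parameters, the DG-style WTA encoder and the CA3 Hebbian store reduce to a few operations. A minimal sketch (function names are illustrative, and the demo dimensions below are shrunk from code_dim=16384 / k=50 for readability):

```python
import numpy as np

def wta_code(emb, proj, k=50):
    """Project the embedding up, keep only the k strongest units (winner-take-all)."""
    h = proj @ emb                             # (code_dim,)
    code = np.zeros_like(h)
    code[np.argpartition(-h, k)[:k]] = 1.0     # sparse binary code, exactly k ones
    return code

def hebbian_store(W, cue_code, target_code):
    """Outer-product Hebbian update: co-active cue/target units are wired together."""
    W += np.outer(target_code, cue_code)
    return W

def hebbian_recall(W, cue_code, k=50):
    """One associative hop: W @ cue, then WTA to clean up the retrieved code."""
    h = W @ cue_code
    code = np.zeros_like(h)
    code[np.argpartition(-h, k)[:k]] = 1.0
    return code
```

Chaining `hebbian_recall` N times is the multi-hop A→B→C path: because the WTA codes are sparse and nearly orthogonal, each hop's signal (overlap k) dominates crosstalk from other stored pairs.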
## VRAM Budget (RTX 4090, 24GB)

| Component | Size |
|-----------|------|
| Hebbian W (16384²) | 1024 MB |
| WTA projection (384×16384) | 24 MB |
| Hopfield store (20K × 384 × 2) | ~60 MB |
| Sentence Transformer | ~90 MB |
| Gemma 4B (fp16) | ~8 GB |
| **Total** | **~9.2 GB** |
| **Headroom** | **~14.8 GB** |

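The memory-structure sizes follow from plain fp32 arithmetic (a quick back-of-envelope check, not a profiler measurement):

```python
# fp32 = 4 bytes per element for the memory structures; the fp16 LLM
# dominates the budget and is taken from its published weight size.
MB = 1024 ** 2
hebbian_w = 16384 ** 2 * 4 / MB          # W matrix: 1024.0 MB
wta_proj  = 384 * 16384 * 4 / MB         # projection: 24.0 MB
hopfield  = 20_000 * 384 * 2 * 4 / MB    # cue + target vector per memory: ~58.6 MB
print(hebbian_w, wta_proj, round(hopfield, 1))
```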
## Integration with Gemma

Recommended approach: **Context Injection**

```python
# 1. User input → embed
query_emb = encoder.encode(user_input)

# 2. Recall memories
results = memory.recall(query_emb, top_k=3)
chain = memory.recall_chain(query_emb, hops=2)

# 3. Format and inject
context = format_memories(results + chain)
prompt = f"[Recalled memories]\n{context}\n\n[User]\n{user_input}"

# 4. Generate response
response = gemma.generate(prompt)
response_emb = encoder.encode(response)

# 5. Store new memory (with LLM-generated paraphrases)
paraphrases = gemma.generate(
    f"Generate 3 paraphrases of: {user_input}"
).splitlines()  # assumes one paraphrase per line
memory.store(query_emb, response_emb,
             cue_variants=[encoder.encode(p) for p in paraphrases])
```

## File Structure

```
src/nuonuo/
├── hippocampus.py       # Final module v2 (Hopfield + Hebbian hybrid)
├── encoder.py           # SNN spike encoder/decoder
├── memory.py            # STDP + Hebbian memory (historical)
├── consolidation.py     # Sleep consolidation (historical)
└── __init__.py

doc/
├── architecture.md      # This file
├── findings.md          # Key findings and counterintuitive conclusions
├── exp01_*.md           # SNN Encoder
├── exp02_*.md           # Associative Recall
├── exp03_*.md           # Consolidation
├── exp04_*.md           # Real Embeddings
├── exp05_*.md           # Benchmarks
├── exp06_*.md           # BioHash
└── exp07_*.md           # Hopfield (breakthrough)
```