Fam Zheng d923aa1e31 NuoNuo: Hippocampal memory module prototype
Hopfield + Hebbian hybrid memory system for LLMs.
Two nights of experiments (16 iterations), validated on LongMemEval (ICLR 2025).

Architecture:
- Single-hop: Two-Stage Hopfield (NN top-20 → softmax settle)
- Multi-hop: Hebbian W matrix with WTA pattern separation
- 64% on LongMemEval (500 questions), retrieval-only, no LLM dependency
- 4ms latency @ 20K memories, ~1GB VRAM

Key findings:
- Hopfield attention solved noise tolerance (20% → 100% vs flat Hebbian)
- WTA pattern separation enables 20K+ capacity
- Multi-hop associative chains (6 hops, CosSim=1.0) — RAG can't do this
- MiniLM-L6 is optimal (discrimination gap > absolute similarity)
- Paraphrase cue augmentation: 55% → 100% on synthetic, 36% → 64% on benchmark
- SNN encoder viable (CosSim 0.99) but not needed for current architecture
2026-04-07 10:37:24 +01:00

# NuoNuo: Hippocampal Memory Module — Architecture v2
## Project Goals
Add a hippocampus-like long-term memory module to an LLM (e.g. Gemma 4):
- No conventional RAG (vector database + retrieval)
- Memories are stored in network weights (Hebbian) and explicit patterns (Hopfield)
- Paraphrase-tolerant fuzzy retrieval
- Multi-hop associative reasoning (A→B→C), which RAG cannot do
- Nightly consolidation and forgetting
## Core Architecture
```
┌─────────────────────────────────────────────────────────┐
│ Query Embedding (from Sentence Transformer) │
│ ↓ │
│ ┌──── Stage 1: NN Pre-filter ────────────────────────┐ │
│ │ cosine(query, stored_cues) → top-20 candidates │ │
│ │ O(N) brute force, O(log N) with FAISS │ │
│ └─────────────────────┬──────────────────────────────┘ │
│ ↓ │
│ ┌──── Stage 2: Hopfield Settle ──────────────────────┐ │
│ │ softmax(β · query @ candidates^T) → attention │ │
│ │ Iterate 3 steps → converge to nearest attractor │ │
│ │ Aggregate attention by memory_id (cue variants) │ │
│ └─────────────────────┬──────────────────────────────┘ │
│ ↓ │
│ ┌──── Optional: Multi-hop Hebbian Chain ─────────────┐ │
│ │ Settled cue → WTA code → W @ code → next target │ │
│ │ Repeat for N hops (A → B → C → ...) │ │
│ └─────────────────────┬──────────────────────────────┘ │
│ ↓ │
│ Retrieved memories │
└─────────────────────────────────────────────────────────┘
```
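The two retrieval stages above can be sketched in a few lines of NumPy. This is illustrative only: `recall`, its signature, and the flat `cues` matrix are assumptions for this sketch, not the actual NuoNuo API, and the per-`memory_id` aggregation over cue variants is omitted.

```python
# Sketch of Two-Stage Hopfield retrieval, assuming stored cues are
# L2-normalized rows of a (N, d) NumPy array. Function and variable
# names are hypothetical.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def recall(query, cues, beta=16.0, top_k=20, steps=3):
    q = query / np.linalg.norm(query)
    sims = cues @ q                          # Stage 1: cosine vs all stored cues
    idx = np.argsort(sims)[-top_k:]          # keep top-k candidates (brute force, O(N))
    cand = cues[idx]                         # (top_k, d) candidate matrix
    xi, attn = q, None
    for _ in range(steps):                   # Stage 2: iterate toward nearest attractor
        attn = softmax(beta * (cand @ xi))   # modern-Hopfield attention over candidates
        xi = attn @ cand                     # weighted recombination of candidates
        xi /= np.linalg.norm(xi)
    return idx, attn                         # candidate indices + settled attention
```

With a moderate β (16.0 in the parameter table), the attention sharpens over the iterations until it concentrates on the nearest stored attractor.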
## Biological Analogy
| Brain region | System component | Function |
|----------|----------|------|
| Entorhinal cortex (EC) | Sentence Transformer | Perceptual encoding |
| Dentate gyrus (DG) | WTA pattern separation | Sparsification/orthogonalization |
| CA3 | Hebbian W matrix | Associative storage + multi-hop |
| CA1 | Hopfield attention | Retrieval output |
| Sleep replay | W rebuild | Consolidation/forgetting |
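A minimal sketch of the DG and CA3 analogs, under stated assumptions: a fixed random projection stands in for the WTA expansion, codes are k-sparse binary vectors, and W is a plain outer-product Hebbian store. Function names are illustrative, not the actual NuoNuo API.

```python
# Hypothetical sketch of WTA pattern separation + Hebbian association.
import numpy as np

def wta_code(emb, proj, k=50):
    """DG analog: expand the embedding, keep only the k largest units."""
    h = proj @ emb                            # (code_dim,) expansion
    code = np.zeros_like(h)
    code[np.argpartition(h, -k)[-k:]] = 1.0   # winner-take-all: k-sparse binary code
    return code

def hebbian_store(W, cue_code, target_code):
    """CA3 analog: outer-product association cue -> target (updates W in place)."""
    W += np.outer(target_code, cue_code)

def hebbian_hop(W, cue_code, k=50):
    """One associative hop: project through W, then re-sparsify with WTA."""
    h = W @ cue_code
    out = np.zeros_like(h)
    out[np.argpartition(h, -k)[-k:]] = 1.0
    return out
```

Chaining `hebbian_hop` calls gives the multi-hop A → B → C retrieval shown in the architecture diagram; the re-sparsification after each hop is what keeps interference from background associations low.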
## Experimental Validation Summary
| Capability | Result | Experiment |
|------|----------|------|
| Paraphrase recall (+ augmentation) | **95%** | exp07e |
| Multi-hop (3 hops, 500 background memories) | **100%** (sim=1.0) | exp07b, 07c |
| Scale (20K memories) | **80%** | exp07d |
| Exact cue recall | **100%** | exp02c |
| Memory capacity | **20K+** | exp02d |
| Recall latency | **4 ms** @ 20K | exp05, 07d |
| SNN encoder roundtrip | **CosSim 0.99** | exp01b |
## Recommended Parameters
| Parameter | Value | Notes |
|------|-----|------|
| embed_dim | 384-768 | Depends on the Sentence Transformer |
| code_dim | 16384 | Hebbian capacity 20K+ |
| k (WTA) | 50 | Balances noise tolerance and capacity |
| β (Hopfield) | 16.0 | Moderate sharpness |
| hopfield_top_k | 20 | Candidate set size; smaller is more stable |
| hopfield_steps | 3 | Settling iterations |
| cue_variants | 3-5 per memory | LLM-generated paraphrases |
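The recommendations above can be collected into a single config object; this is a sketch, and the field names are assumptions rather than the actual NuoNuo API.

```python
# Hypothetical config mirroring the parameter table above.
from dataclasses import dataclass

@dataclass
class HippocampusConfig:
    embed_dim: int = 384        # Sentence Transformer output dim (e.g. MiniLM-L6)
    code_dim: int = 16384       # WTA code size; Hebbian capacity 20K+
    k_wta: int = 50             # WTA sparsity: noise tolerance vs capacity
    beta: float = 16.0          # Hopfield inverse temperature (moderate sharpness)
    hopfield_top_k: int = 20    # candidate set size; smaller is more stable
    hopfield_steps: int = 3     # settle iterations
    cue_variants: int = 3       # LLM-generated paraphrases per memory (3-5)
```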
## VRAM Budget (RTX 4090, 24 GB)
| Component | Size |
|------|------|
| Hebbian W (16384²) | 1024 MB |
| WTA projection (384×16384) | 24 MB |
| Hopfield store (20K × 384 × 2) | ~60 MB |
| Sentence Transformer | ~90 MB |
| Gemma 4B (fp16) | ~8 GB |
| **Total** | **~9.2 GB** |
| **Headroom** | **~14.8 GB** |
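The table's numbers follow from simple arithmetic, assuming fp32 (4 bytes per float) for the Hebbian W and WTA projection, and two fp32 vectors (cue and target) per entry in the Hopfield store:

```python
# Sanity check of the VRAM table (assumes fp32 storage for these components).
MB = 1024 ** 2
w_hebbian_mb = 16384 * 16384 * 4 / MB    # dense code_dim x code_dim matrix
wta_proj_mb = 384 * 16384 * 4 / MB       # embed_dim -> code_dim projection
hopfield_mb = 20_000 * 384 * 2 * 4 / MB  # 20K memories, cue + target vectors each
```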
## Gemma Integration
Recommended approach: **Context Injection**
```python
# 1. Embed the user input
query_emb = encoder.encode(user_input)

# 2. Recall memories: direct matches plus a multi-hop associative chain
results = memory.recall(query_emb, top_k=3)
chain = memory.recall_chain(query_emb, hops=2)

# 3. Format and inject into the prompt
context = format_memories(results + chain)
prompt = f"[Recalled memories]\n{context}\n\n[User]\n{user_input}"

# 4. Generate the response
response = gemma.generate(prompt)

# 5. Store the new memory (with LLM-generated paraphrases as cue variants)
paraphrases = gemma.generate(f"Generate 3 paraphrases of: {user_input}").splitlines()
response_emb = encoder.encode(response)
memory.store(query_emb, response_emb,
             cue_variants=[encoder.encode(p) for p in paraphrases])
```
## File Structure
```
src/nuonuo/
├── hippocampus.py       # Final module v2 (Hopfield + Hebbian hybrid)
├── encoder.py           # SNN spike encoder/decoder
├── memory.py            # STDP + Hebbian memory (historical)
├── consolidation.py     # Sleep consolidation (historical)
└── __init__.py
doc/
├── architecture.md      # This file
├── findings.md          # Key findings and counterintuitive conclusions
├── exp01_*.md           # SNN Encoder
├── exp02_*.md           # Associative Recall
├── exp03_*.md           # Consolidation
├── exp04_*.md           # Real Embeddings
├── exp05_*.md           # Benchmarks
├── exp06_*.md           # BioHash
└── exp07_*.md           # Hopfield (the breakthrough)
```