NuoNuo: Hippocampal memory module prototype
Hopfield + Hebbian hybrid memory system for LLMs. Two nights of experiments (16 iterations), validated on LongMemEval (ICLR 2025).

Architecture:
- Single-hop: Two-Stage Hopfield (NN top-20 → softmax settle)
- Multi-hop: Hebbian W matrix with WTA pattern separation
- 64% on LongMemEval (500 questions), retrieval-only, no LLM dependency
- 4ms latency @ 20K memories, ~1GB VRAM

Key findings:
- Hopfield attention solved noise tolerance (20% → 100% vs flat Hebbian)
- WTA pattern separation enables 20K+ capacity
- Multi-hop associative chains (6 hops, CosSim=1.0); RAG can't do this
- MiniLM-L6 is optimal (discrimination gap > absolute similarity)
- Paraphrase cue augmentation: 55% → 100% on synthetic, 36% → 64% on benchmark
- SNN encoder viable (CosSim 0.99) but not needed for current architecture
61
doc/exp05_benchmark.md
Normal file
@@ -0,0 +1,61 @@
# Experiment 5: Performance Benchmark

## Learning throughput

| code_dim | k | Throughput | Time for 5000 memories |
|----------|---|------------|------------------------|
| 8192 | 50 | **794/s** | 6.3 s |
| 16384 | 50 | 211/s | 23.7 s |
| 32768 | 50 | 54/s | 92.7 s |

The bottleneck is the outer-product update, which costs O(code_dim²) per memory.
At code_dim = 16384, 211 memories/s means a full day of conversation (assuming 1000 memories) takes only ~5 seconds to learn.
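
To make the O(code_dim²) cost concrete, here is a minimal NumPy sketch of a dense Hebbian outer-product update with k-winner-take-all (WTA) codes. The function names, the binary codes, and the learning rate are assumptions for illustration; the benchmarked implementation may differ in all of them.

```python
import numpy as np


def wta_code(x: np.ndarray, k: int) -> np.ndarray:
    """k-winner-take-all: set the k largest entries of x to 1.0, zero the rest."""
    code = np.zeros_like(x, dtype=np.float32)
    winners = np.argpartition(x, -k)[-k:]  # indices of the k largest values
    code[winners] = 1.0
    return code


def hebbian_learn(W: np.ndarray, cue: np.ndarray, target: np.ndarray,
                  lr: float = 1.0) -> None:
    """Store the association cue -> target in W, in place.

    np.outer builds a dense (code_dim, code_dim) matrix, so every stored
    memory touches code_dim**2 entries regardless of how sparse the codes are.
    """
    W += lr * np.outer(target, cue)


def hebbian_recall(W: np.ndarray, cue: np.ndarray, k: int) -> np.ndarray:
    """One-hop recall: a matrix-vector product followed by WTA cleanup."""
    return wta_code(W @ cue, k)
```

Doubling `code_dim` quadruples the entries touched per update, which roughly matches the measured drop from 794/s to 211/s to 54/s.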

## Recall latency

| code_dim | k | Latency |
|----------|---|---------|
| 8192 | 50 | **0.35 ms** |
| 16384 | 50 | 1.26 ms |
| 32768 | 50 | 4.63 ms |

**At 16384 dims: 1.3 ms/query**, which is more than fast enough for LLM conversation (an LLM needs ~20 ms to generate a single token).

## Multi-hop latency

| Hops | Latency (code_dim=16384) |
|------|--------------------------|
| 1 | 1.26 ms |
| 2 | 2.45 ms |
| 3 | 3.64 ms |
| 5 | 6.03 ms |
| 10 | 12.05 ms |

Latency grows linearly at ~1.2 ms/hop. Ten hops take 12 ms, still far faster than LLM inference.
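
The ~1.2 ms/hop figure can be checked with an ordinary least-squares fit over the table above. This is only a sanity check on the reported numbers; the helper name `fit_line` is made up for this sketch.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Values copied from the multi-hop latency table (code_dim=16384).
hops = [1, 2, 3, 5, 10]
latency_ms = [1.26, 2.45, 3.64, 6.03, 12.05]

slope, intercept = fit_line(hops, latency_ms)
print(f"{slope:.2f} ms/hop, fixed overhead {intercept:.2f} ms")
```

The fitted slope is close to 1.2 ms/hop and the intercept is close to zero, so each extra hop costs about as much as a single recall.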

## GPU memory

| code_dim | W matrix | Total usage |
|----------|----------|-------------|
| 4096 | 64 MB | 70 MB |
| 8192 | 256 MB | 268 MB |
| **16384** | **1024 MB** | **1048 MB** |
| 32768 | 4096 MB | 4144 MB |

Recommended: **16384 dims = 1 GB of VRAM**, which coexists comfortably with Gemma 4B on an RTX 4090 (24 GB).
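
The W-matrix column follows directly from the matrix shape. The sketch below assumes a dense square matrix stored at 4 bytes per element (float32); the table does not state the dtype, but these are the values that reproduce its numbers when "MB" is read as MiB.

```python
def w_matrix_mib(code_dim: int, bytes_per_element: int = 4) -> float:
    """Size in MiB of a dense code_dim x code_dim matrix.

    bytes_per_element=4 (float32) is an assumption inferred from the table.
    """
    return code_dim * code_dim * bytes_per_element / 2**20


for code_dim in (4096, 8192, 16384, 32768):
    print(code_dim, w_matrix_mib(code_dim))
```

Under the same assumption, halving the element size (for example with a 2-byte dtype) would halve each figure. The benchmark does not measure that variant.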

## End-to-end pipeline (including the embedding model)

| Step | Latency |
|------|---------|
| Embedding (all-MiniLM-L6-v2) | 1.8 ms |
| Hebbian Recall (1-hop) | 1.3 ms |
| **Total** | **3.1 ms** |

Embedding and recall take comparable time. The ~3 ms total is far below LLM generation latency.
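
The document does not say how the per-step latencies were measured. One common approach is a warm-up phase followed by the median of repeated wall-clock timings, as in the sketch below. The helper name and its defaults are hypothetical. For GPU work, the timed callable would also have to wait for the device to finish (for example via a synchronization call) before returning, otherwise only the kernel launch is timed.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(fn: Callable[[], object], warmup: int = 10,
                      iters: int = 100) -> float:
    """Median wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):  # discard first calls (caches, lazy init)
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)
```

Timing the embedding step and the recall step separately with such a helper would give a per-step breakdown like the table above.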

## Conclusions

- code_dim=16384 is the best trade-off: 1 GB of VRAM, 1.3 ms recall, 211 memories/s learning
- Performance is not the bottleneck at all; LLM inference is
- 32768 dims is also an option if more capacity is needed (4 GB, but learning is ~4x slower)