
Fam Zheng d923aa1e31 NuoNuo: Hippocampal memory module prototype

Hopfield + Hebbian hybrid memory system for LLMs.
Two nights of experiments (16 iterations), validated on LongMemEval (ICLR 2025).

Architecture:
- Single-hop: Two-Stage Hopfield (NN top-20 → softmax settle)
- Multi-hop: Hebbian W matrix with WTA pattern separation
- 64% on LongMemEval (500 questions), retrieval-only, no LLM dependency
- 4ms latency @ 20K memories, ~1GB VRAM

Key findings:
- Hopfield attention solved noise tolerance (20% → 100% vs flat Hebbian)
- WTA pattern separation enables 20K+ capacity
- Multi-hop associative chains (6 hops, CosSim=1.0) — RAG can't do this
- MiniLM-L6 is optimal (discrimination gap > absolute similarity)
- Paraphrase cue augmentation: 55% → 100% on synthetic, 36% → 64% on benchmark
- SNN encoder viable (CosSim 0.99) but not needed for current architecture

2026-04-07 10:37:24 +01:00

1.6 KiB


Experiment 5: Performance Benchmark

Learning throughput

code_dim	k	Throughput	Time for 5,000 memories
8192	50	794/s	6.3s
16384	50	211/s	23.7s
32768	50	54/s	92.7s

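The bottleneck is the outer-product update, which costs O(code_dim²) per memory. Each doubling of code_dim cuts throughput by roughly 4x (794 → 211 → 54/s). At 16384 dimensions, 211/s means a full day of conversation (assuming 1000 memories) is learned in only ~5 s.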
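To make the O(code_dim²) cost concrete, here is a minimal sketch of a dense Hebbian outer-product update. It assumes memories are stored as binary sparse codes with k active units (our reading of the `k` column, matching the WTA pattern separation in the commit notes) and that W is a dense matrix. `k_wta` and `hebbian_store` are hypothetical names, and a real implementation would use GPU tensor ops rather than Python loops.

```python
from typing import List

Matrix = List[List[float]]


def k_wta(x: List[float], k: int) -> List[float]:
    """k-winners-take-all: 1.0 for the k largest entries, 0.0 elsewhere."""
    winners = set(sorted(range(len(x)), key=lambda i: x[i], reverse=True)[:k])
    return [1.0 if i in winners else 0.0 for i in range(len(x))]


def hebbian_store(W: Matrix, cue: List[float], target: List[float], lr: float = 1.0) -> None:
    """Dense Hebbian update W += lr * outer(target, cue), in place.

    The double loop visits all code_dim² entries of W, which is the
    quadratic per-memory learning cost.
    """
    for i, t in enumerate(target):
        for j, c in enumerate(cue):
            W[i][j] += lr * t * c


# Assumption: k = number of active units in each sparse code.
d, k = 8, 2
W = [[0.0] * d for _ in range(d)]
cue = k_wta([0.9, 0.1, 0.8, 0.0, 0.2, 0.3, 0.05, 0.4], k)     # units 0 and 2 win
target = k_wta([0.0, 0.7, 0.1, 0.2, 0.95, 0.0, 0.3, 0.1], k)  # units 1 and 4 win
hebbian_store(W, cue, target)
```

A sparse update could touch only the k² active pairs; the roughly 4x slowdown per doubling in the table is consistent with the dense, quadratic variant shown here.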

Recall latency

code_dim	k	Latency per query
8192	50	0.35 ms
16384	50	1.26 ms
32768	50	4.63 ms

At 16384 dimensions, recall takes 1.3 ms/query, which is more than fast enough for LLM conversation (the LLM itself needs ~20 ms to generate a single token).
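Recall latency also grows roughly quadratically (0.35 → 1.26 → 4.63 ms, about 3.6x per doubling), which fits a recall step dominated by one dense matrix-vector product with W. The sketch below assumes exactly that: multiply the cue code by W, then clean up the result with k-WTA. `recall` and `k_wta` are hypothetical names, and the toy W is built by hand rather than learned.

```python
from typing import List

Matrix = List[List[float]]


def k_wta(x: List[float], k: int) -> List[float]:
    """k-winners-take-all: 1.0 for the k largest entries, 0.0 elsewhere."""
    winners = set(sorted(range(len(x)), key=lambda i: x[i], reverse=True)[:k])
    return [1.0 if i in winners else 0.0 for i in range(len(x))]


def recall(W: Matrix, cue: List[float], k: int) -> List[float]:
    """One recall hop: dense W @ cue (code_dim² multiply-adds), then k-WTA cleanup."""
    activation = [sum(w * c for w, c in zip(row, cue)) for row in W]
    return k_wta(activation, k)


# Hypothetical toy memory: cue units {0, 1} were associated with target units {3, 4}.
d, k = 6, 2
W = [[0.0] * d for _ in range(d)]
for i in (3, 4):
    for j in (0, 1):
        W[i][j] = 1.0

full = recall(W, [1.0, 1.0, 0.0, 0.0, 0.0, 0.0], k)
partial = recall(W, [1.0, 0.0, 0.0, 0.0, 0.0, 0.0], k)  # half of the cue is missing
```

Both calls return the stored target (units 3 and 4): the k-WTA step restores a full sparse code even when the cue is incomplete.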

Multi-hop latency

Hops	Latency (code_dim=16384)
1	1.26 ms
2	2.45 ms
3	3.64 ms
5	6.03 ms
10	12.05 ms

Latency grows linearly at ~1.2 ms/hop. Even 10 hops (12 ms) is still far faster than LLM inference.
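The linear growth follows if each hop is one more recall whose output becomes the next cue. A minimal sketch under that assumption, with hypothetical names (`multi_hop`, `k_wta`) and a hand-built toy chain of four items:

```python
from typing import List

Matrix = List[List[float]]


def k_wta(x: List[float], k: int) -> List[float]:
    """k-winners-take-all: 1.0 for the k largest entries, 0.0 elsewhere."""
    winners = set(sorted(range(len(x)), key=lambda i: x[i], reverse=True)[:k])
    return [1.0 if i in winners else 0.0 for i in range(len(x))]


def multi_hop(W: Matrix, cue: List[float], k: int, hops: int) -> List[List[float]]:
    """Follow an associative chain by feeding each recalled code back in as the cue.

    Every hop costs one O(code_dim²) matrix-vector product, so total
    latency grows linearly with the number of hops.
    """
    chain = []
    code = cue
    for _ in range(hops):
        activation = [sum(w * c for w, c in zip(row, code)) for row in W]
        code = k_wta(activation, k)
        chain.append(code)
    return chain


d, k = 8, 2
# Hypothetical toy codes: item n is encoded by units 2n and 2n+1.
codes = [[1.0 if i // 2 == n else 0.0 for i in range(d)] for n in range(4)]
W = [[0.0] * d for _ in range(d)]
for src, dst in zip(codes, codes[1:]):  # store the chain 0 -> 1 -> 2 -> 3
    for i in range(d):
        for j in range(d):
            W[i][j] += dst[i] * src[j]

chain = multi_hop(W, codes[0], k, hops=3)
```

Starting from item 0, three hops walk through items 1, 2 and 3, each hop paying the same matrix-vector cost.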

GPU memory

code_dim	W matrix	Total
4096	64 MB	70 MB
8192	256 MB	268 MB
16384	1024 MB	1048 MB
32768	4096 MB	4144 MB

Recommendation: 16384 dimensions, which needs 1 GB of VRAM and fits comfortably alongside Gemma 4B on an RTX 4090 (24 GB).
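The W column is exactly what a dense code_dim × code_dim matrix takes at 4 bytes per entry, which suggests float32 storage (an inference from the numbers, not stated above); the "MB" figures appear to be MiB (2^20 bytes). A quick check, with `w_matrix_mib` as a hypothetical helper:

```python
def w_matrix_mib(code_dim: int, bytes_per_entry: int = 4) -> float:
    """Memory for a dense code_dim x code_dim matrix in MiB (4 bytes/entry assumes float32)."""
    return code_dim * code_dim * bytes_per_entry / 2**20


for d in (4096, 8192, 16384, 32768):
    print(f"{d}: {w_matrix_mib(d):.0f} MiB")
# 4096: 64 MiB
# 8192: 256 MiB
# 16384: 1024 MiB
# 32768: 4096 MiB
```

Because the size scales with code_dim², each doubling of the dimension quadruples the footprint of W.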

End-to-end pipeline (including the embedding model)

Step	Latency
Embedding (all-MiniLM-L6-v2)	1.8 ms
Hebbian Recall (1-hop)	1.3 ms
Total	3.1 ms

Embedding and recall take comparable time, and the 3 ms total is far below LLM generation latency.
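A per-stage breakdown like the table above can be gathered with a small timing harness. This is a sketch, not the benchmark code used here: `time_pipeline`, `embed_stub` and `recall_stub` are hypothetical, and the stubs stand in for the all-MiniLM-L6-v2 encoder and the Hebbian recall. If the stages run on a GPU, each stage must be followed by a device synchronize (e.g. `torch.cuda.synchronize()` in PyTorch) before reading the timer, because CUDA kernels launch asynchronously.

```python
import statistics
import time
from typing import Any, Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def time_pipeline(stages: List[Stage], query: Any, repeats: int = 100) -> Dict[str, float]:
    """Median wall-clock latency (ms) of each stage and of the whole pipeline.

    Each stage receives the previous stage's output, mirroring
    embedding -> recall.
    """
    samples: Dict[str, List[float]] = {name: [] for name, _ in stages}
    samples["Total"] = []
    for _ in range(repeats):
        x, total = query, 0.0
        for name, fn in stages:
            start = time.perf_counter()
            x = fn(x)
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            samples[name].append(elapsed_ms)
            total += elapsed_ms
        samples["Total"].append(total)
    return {name: statistics.median(v) for name, v in samples.items()}


def embed_stub(text: str) -> List[float]:
    """Hypothetical stand-in for the all-MiniLM-L6-v2 encoder (384-dim output)."""
    return [float(len(text))] * 384


def recall_stub(vec: List[float]) -> float:
    """Hypothetical stand-in for one Hebbian recall hop."""
    return sum(vec)


report = time_pipeline(
    [("Embedding", embed_stub), ("Hebbian Recall (1-hop)", recall_stub)], "hello", repeats=20
)
```

Medians are used instead of means so that occasional scheduler or warm-up spikes do not dominate millisecond-scale measurements.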

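Conclusions

- code_dim=16384 is the best trade-off: 1 GB VRAM, 1.3 ms recall, 211/s learning
- Performance is not a bottleneck at all; LLM inference is
- 32768 dimensions also works if more capacity is needed (4 GB, but learning is ~4x slower)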
