
Fam Zheng d923aa1e31 NuoNuo: Hippocampal memory module prototype

Hopfield + Hebbian hybrid memory system for LLMs.
Two nights of experiments (16 iterations), validated on LongMemEval (ICLR 2025).

Architecture:
- Single-hop: Two-Stage Hopfield (NN top-20 → softmax settle)
- Multi-hop: Hebbian W matrix with WTA pattern separation
- 64% on LongMemEval (500 questions), retrieval-only, no LLM dependency
- 4ms latency @ 20K memories, ~1GB VRAM
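
The single-hop path above (nearest-neighbour shortlist, then a softmax settle) can be sketched roughly as follows. This is a minimal illustration and not the project's code: the function name `two_stage_recall`, the inverse temperature `beta`, the number of settle `steps`, and the brute-force shortlist are all assumptions; only the top-20 shortlist size comes from the description above.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else list(v)

def two_stage_recall(query, memories, top_k=20, beta=8.0, steps=3):
    """Return (index of the best memory, settled state vector).

    Stage 1: shortlist the top_k memories by cosine similarity.
    Stage 2: repeat a softmax-weighted update over the shortlist
    (modern-Hopfield style). beta and steps are assumed values.
    """
    if not memories:
        raise ValueError("no memories stored")
    q = normalize(query)
    mems = [normalize(m) for m in memories]

    # Stage 1: brute-force nearest-neighbour shortlist.
    ranked = sorted(range(len(mems)), key=lambda i: dot(q, mems[i]), reverse=True)
    cand = ranked[:top_k]

    # Stage 2: softmax settle restricted to the shortlist.
    state, weights = q, []
    for _ in range(steps):
        scores = [beta * dot(state, mems[i]) for i in cand]
        mx = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - mx) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        mixed = [sum(w * mems[i][d] for w, i in zip(weights, cand))
                 for d in range(len(state))]
        state = normalize(mixed)

    best = cand[max(range(len(cand)), key=lambda j: weights[j])]
    return best, state
```

Restricting the softmax to a shortlist keeps the settle step cheap regardless of how many memories are stored, which is one plausible reading of why the design is split into two stages.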

Key findings:
- Hopfield attention solved noise tolerance (20% → 100% vs flat Hebbian)
- WTA pattern separation enables 20K+ capacity
- Multi-hop associative chains (6 hops, CosSim=1.0), which single-pass RAG retrieval does not provide
- MiniLM-L6 is optimal (discrimination gap > absolute similarity)
- Paraphrase cue augmentation: 55% → 100% on synthetic, 36% → 64% on benchmark
- SNN encoder viable (CosSim 0.99) but not needed for current architecture
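
To make the WTA and multi-hop findings concrete, here is a toy sketch of a Hebbian hetero-associative matrix with k-winners-take-all codes. It is an illustration under stated assumptions, not the project's implementation: the class name `HebbianChain`, the binary codes, and applying WTA directly to the input vector (with no projection step beforehand) are all hypothetical simplifications.

```python
def wta(vec, k):
    """k-winners-take-all: 1 for the k largest entries, 0 elsewhere."""
    winners = sorted(range(len(vec)), key=lambda i: vec[i], reverse=True)[:k]
    code = [0] * len(vec)
    for i in winners:
        code[i] = 1
    return code

class HebbianChain:
    """Stores cue -> target links in a single weight matrix W."""

    def __init__(self, dim, k):
        self.dim, self.k = dim, k
        self.W = [[0.0] * dim for _ in range(dim)]

    def store(self, cue, target):
        # Hebbian outer-product update on the sparse codes:
        # W += code(target) * code(cue)^T
        x, y = wta(cue, self.k), wta(target, self.k)
        for i in range(self.dim):
            if y[i]:
                for j in range(self.dim):
                    self.W[i][j] += x[j]

    def hop(self, cue, hops=1):
        # Each hop multiplies by W, then re-sparsifies with WTA so
        # crosstalk from overlapping codes is suppressed.
        x = wta(cue, self.k)
        for _ in range(hops):
            y = [sum(w * xj for w, xj in zip(row, x)) for row in self.W]
            x = wta(y, self.k)
        return x

# Toy chain A -> B -> C; A and B deliberately share one active unit.
A = [0.9, 0.8, 0.1, 0.2, 0.0, 0.1]
B = [0.1, 0.7, 0.9, 0.2, 0.1, 0.0]
C = [0.0, 0.1, 0.2, 0.9, 0.8, 0.1]
memory = HebbianChain(dim=6, k=2)
memory.store(A, B)
memory.store(B, C)
two_hops = memory.hop(A, hops=2)
```

In this toy case the WTA step after each hop is what keeps the recalled code clean even though the codes for A and B overlap; without it, the crosstalk would accumulate from hop to hop.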

2026-04-07 10:37:24 +01:00


LongMemEval Benchmark Results

Dataset

LongMemEval (ICLR 2025, MIT License): 500 questions across 6 types, drawn from real multi-turn, multi-session conversations.

Results

Retrieval-only (final approach)

Type	v1 (old extraction)	v2 (improved extraction)	Gain (pp)
single-session-user	81%	86%	+5
single-session-assistant	25%	82%	+57
knowledge-update	53%	71%	+18
multi-session	23%	53%	+30
temporal-reasoning	29%	61%	+32
preference	0%	27%	+27
Overall	36%	64%	+28

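Adding Gemma 4 inference makes results worse

	Retrieval-only	+ Gemma 4
Overall	64%	40%

Gemma is too conservative: even when the relevant information has been retrieved, it answers "Not mentioned". The result does not justify the extra 1.7 s/query of latency.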

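Key improvements (v1 → v2)

- Do not truncate assistant replies: store them in segments (500 characters each) → single-session-assistant 25% → 82%
- User statements as memories: every sentence the user says is stored as its own memory → multi-session +30pp
- Preference extraction: regex matching on "I like/prefer/use/enjoy" → preference 0% → 27%
- Date metadata: store the session date → helps with temporal questions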
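
The four extraction changes above can be sketched in a few lines. This is a hedged illustration only: the function names, the memory dictionary layout, the sentence splitter, and the exact regular expression are assumptions. The source states only the 500-character segment size, the "I like/prefer/use/enjoy" pattern, per-sentence storage of user text, and the session date.

```python
import re

# Assumed pattern: captures the clause that starts with "I like/prefer/use/enjoy".
PREFERENCE_RE = re.compile(r"\bI\s+(?:like|prefer|use|enjoy)\b[^.!?]*", re.IGNORECASE)

def chunk_text(text, size=500):
    """Split a long reply into fixed-size character segments (no truncation)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def split_sentences(text):
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def extract_memories(session_date, turns):
    """turns is a list of (role, text) pairs; returns a list of memory dicts."""
    memories = []
    for role, text in turns:
        if role == "assistant":
            # Store every segment of the reply instead of truncating it.
            for piece in chunk_text(text):
                memories.append({"kind": "assistant_chunk", "text": piece,
                                 "date": session_date})
        elif role == "user":
            for sentence in split_sentences(text):
                # Every user sentence becomes its own memory.
                memories.append({"kind": "user_statement", "text": sentence,
                                 "date": session_date})
                # Regex-based preference extraction on the same sentence.
                for match in PREFERENCE_RE.finditer(sentence):
                    memories.append({"kind": "preference",
                                     "text": match.group(0).strip(),
                                     "date": session_date})
    return memories
```

A regex of this kind only catches explicitly phrased preferences, so implicit ones would pass through as plain user statements.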

Performance

- 56 ms/query (embedding + Hopfield recall)
- 22 memories per question on average
- No external LLM dependency

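Per-type analysis

Strengths

- single-session-user (86%): information the user stated explicitly is stored directly and retrieved directly, so it is a natural fit
- single-session-assistant (82%): segmented storage fixed the truncation of long replies

Moderate

- knowledge-update (71%): both the old and the new information are retrieved, and top-1 is usually the new value
- temporal-reasoning (61%): date information is present in the context, but retrieval performs no date arithmetic
- multi-session (53%): requires aggregation across sessions; top-K recalls part of the evidence but not all of it

Weaknesses

- preference (27%): preferences are implicit, and regex extraction has limited coverage. This needs LLM-based extraction or more rules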

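Positioning

On LongMemEval, 64% is a competitive retrieval baseline. RAG baselines in the paper typically fall in the 40-60% range, and SOTA systems (with LLM inference) reach 70-80%. Our retrieval-only 64% already exceeds most of those RAG baselines.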

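Conclusion

Retrieval-only is the right choice: simple, fast, and dependency-free. The remaining headroom is in the extraction strategy (better memory segmentation and preference detection), not in the retrieval architecture.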
