nuonuo/doc/exp06_biohash.md
Fam Zheng d923aa1e31 NuoNuo: Hippocampal memory module prototype
Hopfield + Hebbian hybrid memory system for LLMs.
Two nights of experiments (16 iterations), validated on LongMemEval (ICLR 2025).

Architecture:
- Single-hop: Two-Stage Hopfield (NN top-20 → softmax settle)
- Multi-hop: Hebbian W matrix with WTA pattern separation
- 64% on LongMemEval (500 questions), retrieval-only, no LLM dependency
- 4ms latency @ 20K memories, ~1GB VRAM

Key findings:
- Hopfield attention solved noise tolerance (20% → 100% vs flat Hebbian)
- WTA pattern separation enables 20K+ capacity
- Multi-hop associative chains (6 hops, CosSim=1.0) — RAG can't do this
- MiniLM-L6 is optimal (discrimination gap > absolute similarity)
- Paraphrase cue augmentation: 55% → 100% on synthetic, 36% → 64% on benchmark
- SNN encoder viable (CosSim 0.99) but not needed for current architecture
2026-04-07 10:37:24 +01:00

# Experiment 6: BioHash — Learnable Fly Algorithm
## Background
Inspired by Dasgupta et al. 2017 (Science): the fly olfactory circuit is a random projection followed by WTA (winner-take-all).
BioHash replaces the random projection with a learnable one, trained with a contrastive loss.
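The fly-circuit baseline can be sketched in a few lines: expand the input with a random projection, then keep only the top-k activations. This is a minimal NumPy sketch; `fly_hash`, the dimensions, and `k` are illustrative assumptions, and BioHash would replace `proj` with weights trained by a contrastive loss.

```python
import numpy as np

def fly_hash(x, proj, k=32):
    """Fly-algorithm encoding: random projection, then winner-take-all.

    Projects x into a high-dimensional space and keeps only the top-k
    activations as a k-sparse binary code.
    """
    y = proj @ x                       # expansion: dim_in -> dim_code
    code = np.zeros_like(y)
    code[np.argsort(y)[-k:]] = 1.0     # WTA: k active units, rest zero
    return code

rng = np.random.default_rng(0)
dim_in, dim_code = 384, 2048           # e.g. MiniLM-L6 embeddings -> 2048-d codes
proj = rng.standard_normal((dim_code, dim_in)) / np.sqrt(dim_in)

x = rng.standard_normal(dim_in)
noisy = x + 0.1 * rng.standard_normal(dim_in)
c1, c2 = fly_hash(x, proj), fly_hash(noisy, proj)
print(int(c1.sum()), int((c1 * c2).sum()))  # k active bits; the noisy copy shares most
```

The key property is that nearby inputs land on overlapping sets of winners, which is exactly the "code overlap" measured below.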
## Results
### Code Overlap (Neighborhood Preservation)
| Method | Positive Overlap | Negative Overlap | Gap | SNR |
|------|-----------------|-----------------|-----|-----|
| Random | 0.220 | 0.004 | 0.216 | 55x |
| BioHash (noise=0.2) | **0.572** | 0.060 | **0.512** | 9.5x |

BioHash's positive overlap rose 2.6x: it really did learn to map similar embeddings to overlapping codes.
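The overlap numbers in the table can be computed as the mean fraction of shared active bits between paired codes (paraphrase pairs for positive, unrelated pairs for negative). A minimal sketch, with the `mean_overlap` helper name being an assumption:

```python
import numpy as np

def mean_overlap(codes_a, codes_b):
    """Mean fraction of shared active bits between row-paired k-sparse codes."""
    k = codes_a.sum(axis=1, keepdims=True)            # active bits per code
    shared = (codes_a * codes_b).sum(axis=1, keepdims=True)
    return float((shared / k).mean())

# toy check: identical codes overlap fully, disjoint codes not at all
a = np.array([[1, 1, 0, 0], [0, 1, 1, 0]], dtype=float)
print(mean_overlap(a, a))        # 1.0
print(mean_overlap(a, 1.0 - a))  # 0.0
```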
### Paraphrase Recall (Small Scale)
| Method | Exact (10 pairs) | Paraphrase (10 pairs) |
|------|-----------|-----------|
| Random | 10/10 | 8/10 |
| BioHash | 10/10 | **10/10** |

At small scale, BioHash is perfect.
### Scale Test (Large Scale, the Core Problem)
| bg memories | Random | BioHash |
|-------------|--------|---------|
| 0 | 100% | 100% |
| 100 | 60% | 40% |
| 500 | 60% | 20% |

**BioHash is actually worse at large scale.** The reason: although positive overlap rose, negative overlap rose 15x as well, so the signal-to-noise ratio dropped from 55x to 9.5x.
## Core Conclusions
### The bottleneck is not the hash function but the Hebbian W matrix
W @ code = Σ target_i · overlap(cue_i, query)

This formula means that no matter how good the hash is, the weighted sum over a large number of memories inevitably drowns out the signal of any single memory. This is an inherent limitation of outer-product associative memory (Hopfield networks have the same problem).
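The interference argument can be reproduced directly: store N random cue-target pairs as a sum of outer products and read out with an exact cue. In this illustrative sketch (random unit vectors, d=256, all names are assumptions), recall similarity collapses as N grows even though the cue is noise-free:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 256

def recall_sim(n_mem):
    """Store n_mem cue->target pairs via outer products, then recall pair 0.

    Readout: W @ cue_0 = target_0 + sum over i>0 of target_i * (cue_i . cue_0),
    i.e. the stored target plus crosstalk from every other memory.
    """
    cues = rng.standard_normal((n_mem, d))
    cues /= np.linalg.norm(cues, axis=1, keepdims=True)
    targets = rng.standard_normal((n_mem, d))
    targets /= np.linalg.norm(targets, axis=1, keepdims=True)
    W = targets.T @ cues               # Hebbian sum of outer products
    out = W @ cues[0]
    return float(out @ targets[0] / np.linalg.norm(out))  # cosine to true target

for n in (1, 100, 1000, 10000):
    print(n, round(recall_sim(n), 3))
```

The crosstalk term grows like sqrt(N/d), so the cosine to the true target degrades with memory count regardless of how discriminative the codes are, which is exactly the scaling failure in the table above.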
### The Value of BioHash
- ✅ Small-scale paraphrase recall of 100% (vs 80%)
- ✅ Proves that a learned projection really does preserve neighborhood structure
- ❌ Does not solve the W matrix's scaling problem
- **Correct usage**: use BioHash for encoding, but retrieve with a code-based index (not the W-matrix weighted sum)
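That correct usage, treating codes as an index rather than summing them into W, can be sketched as an overlap-based nearest-neighbor lookup. The `CodeIndex` class and its method names are hypothetical:

```python
import numpy as np

class CodeIndex:
    """Retrieve by overlap between k-sparse codes, not by W @ code.

    Each stored memory keeps its own code, so one memory's signal is
    never summed into (and drowned by) everyone else's.
    """
    def __init__(self):
        self.codes, self.payloads = [], []

    def add(self, code, payload):
        self.codes.append(code)
        self.payloads.append(payload)

    def query(self, code):
        overlaps = np.stack(self.codes) @ code   # shared active bits per memory
        return self.payloads[int(np.argmax(overlaps))]

idx = CodeIndex()
idx.add(np.array([1, 1, 0, 0], float), "memory A")
idx.add(np.array([0, 0, 1, 1], float), "memory B")
print(idx.query(np.array([1, 0, 0, 0], float)))  # -> memory A
```

Because each comparison is per-memory, retrieval quality depends only on the code's discrimination gap, which is precisely what BioHash improves.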
### Revised Architecture Recommendation
```
Single-hop retrieval: NN lookup in embedding space (or a code-based Jaccard index)
Multi-hop association: Hebbian W matrix (starting from the NN result, with an exact, noise-free cue)
Encoding layer:       BioHash (better code quality than random; improves propagation along multi-hop chains)
```
The W matrix's role narrows to **multi-hop only**, which is its genuinely irreplaceable capability.