NuoNuo: Hippocampal memory module prototype

Hopfield + Hebbian hybrid memory system for LLMs. Two nights of experiments (16 iterations), validated on LongMemEval (ICLR 2025). Architecture: - Single-hop: Two-Stage Hopfield (NN top-20 → softmax settle) - Multi-hop: Hebbian W matrix with WTA pattern separation - 64% on LongMemEval (500 questions), retrieval-only, no LLM dependency - 4ms latency @ 20K memories, ~1GB VRAM Key findings: - Hopfield attention solved noise tolerance (20% → 100% vs flat Hebbian) - WTA pattern separation enables 20K+ capacity - Multi-hop associative chains (6 hops, CosSim=1.0) — RAG can't do this - MiniLM-L6 is optimal (discrimination gap > absolute similarity) - Paraphrase cue augmentation: 55% → 100% on synthetic, 36% → 64% on benchmark - SNN encoder viable (CosSim 0.99) but not needed for current architecture
2026-04-07 10:37:24 +01:00
commit d923aa1e31
65 changed files with 13148 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,33 @@
 # Python
 __pycache__/
 *.py[cod]
 *$py.class
 *.so
 .Python
 env/
 venv/
 .venv/
 ENV/
 build/
 dist/
 *.egg-info/
 .pytest_cache/
 # Node
 node_modules/
 dist/
 .DS_Store
 *.log
 # IDE
 .vscode/
 .idea/
 *.swp
 *.swo
 # Project specific
 *.pth
 *.pt
 checkpoints/
 uv.lock
 data/
--- a/.python-version
+++ b/.python-version
@@ -0,0 +1 @@
 3.12
--- a/doc/README.md
+++ b/doc/README.md
@@ -0,0 +1,84 @@
 # NuoNuo: Hippocampal Memory Module for LLMs
 通宵原型验证实验记录（2026-04-06 ~ 07）。
 最终架构：**Hopfield + Hebbian 混合记忆系统**。
 ## 核心能力
 | 能力 | 数值 |
 |------|------|
 | 跨会话召回 (12 memories) | **100%** |
 | Paraphrase recall (+ augmentation, 2K bg) | **95-100%** |
 | Multi-hop 联想 (3 hops, 500 bg) | **100%** |
 | Scale (20K memories, no augmentation) | **80%** |
 | Latency @ 20K | **4ms** |
 | VRAM | **~1 GB** |
 ## 实验索引
 ### Phase 1: 基础验证（exp01-06）
 | # | 实验 | 结论 |
 |---|------|------|
 | 01 | [SNN Encoder Roundtrip](exp01_encoder_roundtrip.md) | ✅ CosSim 0.99 |
 | 02 | [Associative Recall](exp02_associative_recall.md) | ✅ WTA+Hebbian 20K, multi-hop 完美 |
 | 03 | [Sleep Consolidation](exp03_consolidation.md) | ⚠️ 简化为权重重建 |
 | 04 | [Real Embeddings](exp04_real_embeddings.md) | ✅ 语义 embedding 可用 |
 | 05 | [Benchmark](exp05_benchmark.md) | ✅ 3ms E2E |
 | 06 | [BioHash](exp06_biohash.md) | ⚠️ 改善编码但不解决 W 矩阵问题 |
 ### Phase 2: 突破（exp07）
 | # | 实验 | 结论 |
 |---|------|------|
 | 07 | [**Hopfield Attention**](exp07_hopfield.md) | ⭐ 噪声容忍 + 多跳 = 完美 |
 ### Phase 3: P0-P6 深入探索
 | # | 问题 | 文档 | 结论 |
 |---|------|------|------|
 | P0 | [LLM Integration](p0_llm_integration.md) | `exp08` | ✅ Pipeline 可用，LLM Gateway 待验证 |
 | P1 | [Embedding Models](p1_embedding_models.md) | `exp09` | ⭐ MiniLM 最优（gap 比 sim 重要） |
 | P2 | [Auto Paraphrase](p2_auto_paraphrase.md) | `exp10` | ✅ Heuristic +20pp, Oracle +45pp |
 | P3 | [Scale Ceiling](p3_scale_ceiling.md) | `exp11` | 结论=P2（ceiling 来自 embedding 不是架构）|
 | P4 | [Lifecycle](p4_lifecycle.md) | `exp12` | ✅ Dedup + importance scoring 可行 |
 | P5 | [SNN Hopfield](p5_snn_hopfield.md) | `exp13` | ❌ 不可行，softmax 远优于 LIF dynamics |
 | P6 | [Multi-turn](p6_multiturn.md) | `exp14` | ✅ 12/12 跨会话召回 |
 ## 综合文档
 - [**架构设计 v2**](architecture.md) — Hopfield + Hebbian 混合架构
 - [核心发现](findings.md) — 什么有用、什么没用、反直觉结论
 ## 核心模块
 - **`src/nuonuo/hippocampus.py`** — Hopfield-Hebbian 混合实现 (v2)
 - `llm.py` — LLM 集成（提取/paraphrase/context injection）
 - `src/nuonuo/encoder.py` — SNN spike encoder (备用)
 ## Quick Start
 ```python
 from sentence_transformers import SentenceTransformer
 from nuonuo.hippocampus import HippocampalMemory
 model = SentenceTransformer('all-MiniLM-L6-v2', device='cuda')
 memory = HippocampalMemory(embed_dim=384)
 def emb(text):
    return model.encode([text], convert_to_tensor=True,
                        normalize_embeddings=True, device='cuda')[0]
 # Store with paraphrase augmentation
 memory.store(emb("The database is slow"), emb("Check missing indexes"),
             cue_variants=[emb("DB performance terrible"), emb("Database crawling")],
             metadata={"target": "Check missing indexes"})
 # Recall
 results = memory.recall(emb("DB is really slow today"), top_k=3)
 chain = memory.recall_chain(emb("DB is really slow today"), hops=3)
 # Save/Load
 memory.save("hippocampus.pt")
 ```
--- a/doc/architecture.md
+++ b/doc/architecture.md
@@ -0,0 +1,129 @@
 # NuoNuo: Hippocampal Memory Module — Architecture v2
 ## 项目目标
 为 LLM（如 Gemma 4）添加一个类海马体的长期记忆模块：
 - 不使用传统 RAG（向量数据库 + 检索）
 - 记忆存储在网络权重（Hebbian）和显式模式（Hopfield）中
 - 支持 paraphrase 容忍的模糊检索
 - 支持多跳联想推理（A→B→C，RAG 做不到）
 - 每晚可整合/遗忘
 ## 核心架构
 ```
 ┌─────────────────────────────────────────────────────────┐
 │  Query Embedding (from Sentence Transformer)             │
 │                    ↓                                     │
 │  ┌──── Stage 1: NN Pre-filter ────────────────────────┐ │
 │  │  cosine(query, stored_cues) → top-20 candidates     │ │
 │  │  O(N) brute force, O(log N) with FAISS              │ │
 │  └─────────────────────┬──────────────────────────────┘ │
 │                        ↓                                 │
 │  ┌──── Stage 2: Hopfield Settle ──────────────────────┐ │
 │  │  softmax(β · query @ candidates^T) → attention       │ │
 │  │  Iterate 3 steps → converge to nearest attractor     │ │
 │  │  Aggregate attention by memory_id (cue variants)     │ │
 │  └─────────────────────┬──────────────────────────────┘ │
 │                        ↓                                 │
 │  ┌──── Optional: Multi-hop Hebbian Chain ─────────────┐ │
 │  │  Settled cue → WTA code → W @ code → next target     │ │
 │  │  Repeat for N hops (A → B → C → ...)                 │ │
 │  └─────────────────────┬──────────────────────────────┘ │
 │                        ↓                                 │
 │               Retrieved memories                          │
 └─────────────────────────────────────────────────────────┘
 ```
 ## 生物学类比
 | 大脑区域 | 系统组件 | 功能 |
 |----------|----------|------|
 | 嗅内皮层 (EC) | Sentence Transformer | 感知编码 |
 | 齿状回 (DG) | WTA Pattern Separation | 稀疏化/正交化 |
 | CA3 | Hebbian W matrix | 联想存储 + 多跳 |
 | CA1 | Hopfield attention | 检索输出 |
 | 睡眠重播 | W rebuild | 整合/遗忘 |
 ## 实验验证总结
 | 能力 | 验证结果 | 实验 |
 |------|----------|------|
 | Paraphrase recall (+ augmentation) | **95%** | exp07e |
 | Multi-hop (3 hops, 500 bg) | **100%** (sim=1.0) | exp07b, 07c |
 | Scale (20K memories) | **80%** | exp07d |
 | Exact cue recall | **100%** | exp02c |
 | Memory capacity | **20K+** | exp02d |
 | Recall latency | **4ms** @ 20K | exp05, 07d |
 | SNN encoder roundtrip | **CosSim 0.99** | exp01b |
 ## 参数推荐
 | 参数 | 值 | 备注 |
 |------|-----|------|
 | embed_dim | 384-768 | 取决于 Sentence Transformer |
 | code_dim | 16384 | Hebbian 容量 20K+ |
 | k (WTA) | 50 | 平衡噪声容忍和容量 |
 | β (Hopfield) | 16.0 | 中等锐度 |
 | hopfield_top_k | 20 | 候选集大小，越小越稳 |
 | hopfield_steps | 3 | 收敛迭代次数 |
 | cue_variants | 3-5 per memory | LLM 生成 paraphrase |
 ## VRAM 预算 (RTX 4090, 24GB)
 | 组件 | 大小 |
 |------|------|
 | Hebbian W (16384²) | 1024 MB |
 | WTA projection (384×16384) | 24 MB |
 | Hopfield store (20K × 384 × 2) | ~60 MB |
 | Sentence Transformer | ~90 MB |
 | Gemma 4B (fp16) | ~8 GB |
 | **Total** | **~9.2 GB** |
 | **Headroom** | **~14.8 GB** |
 ## 与 Gemma 集成
 推荐方案：**Context Injection**
 ```python
 # 1. User input → embed
 query_emb = encoder.encode(user_input)
 # 2. Recall memories
 results = memory.recall(query_emb, top_k=3)
 chain = memory.recall_chain(query_emb, hops=2)
 # 3. Format and inject
 context = format_memories(results + chain)
 prompt = f"[Recalled memories]\n{context}\n\n[User]\n{user_input}"
 # 4. Generate response
 response = gemma.generate(prompt)
 # 5. Store new memory (with LLM-generated paraphrases)
 paraphrases = gemma.generate(f"Generate 3 paraphrases of: {user_input}")
 memory.store(query_emb, response_emb,
             cue_variants=[encoder.encode(p) for p in paraphrases])
 ```
 ## 文件结构
 ```
 src/nuonuo/
 ├── hippocampus.py    # 最终模块 v2 (Hopfield + Hebbian hybrid)
 ├── encoder.py        # SNN spike encoder/decoder
 ├── memory.py         # STDP + Hebbian memory (historical)
 ├── consolidation.py  # Sleep consolidation (historical)
 └── __init__.py
 doc/
 ├── architecture.md   # 本文件
 ├── findings.md       # 核心发现与反直觉结论
 ├── exp01_*.md        # SNN Encoder
 ├── exp02_*.md        # Associative Recall
 ├── exp03_*.md        # Consolidation
 ├── exp04_*.md        # Real Embeddings
 ├── exp05_*.md        # Benchmarks
 ├── exp06_*.md        # BioHash
 └── exp07_*.md        # Hopfield (突破)
 ```
--- a/doc/exp01_encoder_roundtrip.md
+++ b/doc/exp01_encoder_roundtrip.md
@@ -0,0 +1,38 @@
 # 实验1：Encoder Roundtrip Test
 ## 目标
 验证 embedding → spike train → embedding 往返编码的信息保留度。
 ## 关键发现
 ### 结论：roundtrip 编码完全可行，CosSim 可达 0.99
 最佳配置：**768-dim, 2048 neurons, 64 steps → CosSim 0.9898, MSE 0.000111**
 ### 详细结果 (200 epochs, AdamW + CosineAnnealing)
 | Dim | Neurons | Steps | MSE | CosSim | 备注 |
 |-----|---------|-------|-----|--------|------|
 | 768 | 2048 | 64 | 0.000111 | **0.9898** | ⭐ 最佳 |
 | 768 | 4096 | 64 | 0.000057 | 0.9873 | MSE最低但CosSim略低 |
 | 768 | 8192 | 64 | 0.000094 | 0.9773 | 过宽反而差 |
 | 768 | 4096 | 128 | 0.000711 | 0.9640 | 步数太多反而差！|
 ### 重要观察
 1. **"死神经元"相变**：训练前60个epoch，firing rate = 0，网络完全不放电。然后突然开始放电，CosSim飙升。这是因为膜电位初始化需要学习到正确的尺度才能突破阈值。类似生物神经网络中的突触成熟过程。
 2. **更宽不等于更好**：2048 neurons 比 4096、8192 都好。更窄的瓶颈迫使编码更高效。这和 autoencoder 的经典结论一致。
 3. **更多 steps 反而有害**：128 steps 比 64 差很多（0.964 vs 0.990）。LIF 膜电位指数衰减，长序列末端的脉冲和初始 embedding 的关联太弱了。
 4. **firing rate 自然收敛到 ~6%**：目标是 10%，实际收敛到 5-7%。说明稀疏编码是最优的。
 5. **收敛速度**：50 epochs 时 768-dim 只有 ~0.89，但 200 epochs 可以到 0.99。CosineAnnealing scheduler 帮助很大。
 ### 对后续实验的指导
 - 使用 **768-dim, 2048 neurons, 64 steps** 作为默认配置
 - 训练至少 200 epochs
 - 实际记忆模块不需要完美重建——0.95 的 CosSim 已经足够做 associative recall
 - 关键瓶颈不在 encoder，而在后续的 STDP 记忆层是否能保持 spike pattern 的完整性
--- a/doc/exp01_results.json
+++ b/doc/exp01_results.json
--- a/doc/exp01b_results.json
+++ b/doc/exp01b_results.json
@@ -0,0 +1,238 @@
 [
  {
    "dim": 768,
    "neurons": 2048,
    "steps": 64,
    "final_mse": 0.00011098239338025451,
    "final_cos": 0.9898157119750977,
    "milestones": [
      {
        "epoch": 20,
        "mse": 0.005041939315075675,
        "cos": -0.0007408469663156817
      },
      {
        "epoch": 40,
        "mse": 0.0029456913859272995,
        "cos": -0.0003333062321568529
      },
      {
        "epoch": 60,
        "mse": 0.0029715588005880516,
        "cos": 0.0005352402261147896
      },
      {
        "epoch": 80,
        "mse": 0.04361877404153347,
        "cos": 0.4805794248978297
      },
      {
        "epoch": 100,
        "mse": 0.005344521099080642,
        "cos": 0.7873762448628744
      },
      {
        "epoch": 120,
        "mse": 0.001494182685079674,
        "cos": 0.9197443743546804
      },
      {
        "epoch": 140,
        "mse": 0.0003552741633029655,
        "cos": 0.9758868634700775
      },
      {
        "epoch": 160,
        "mse": 0.00016522348839013526,
        "cos": 0.9866191744804382
      },
      {
        "epoch": 180,
        "mse": 0.00011800844416332741,
        "cos": 0.9894002715746562
      },
      {
        "epoch": 200,
        "mse": 0.00011065248036175035,
        "cos": 0.9898596425851186
      }
    ]
  },
  {
    "dim": 768,
    "neurons": 4096,
    "steps": 64,
    "final_mse": 5.6636981753399596e-05,
    "final_cos": 0.9872701168060303,
    "milestones": [
      {
        "epoch": 20,
        "mse": 0.004513699406137069,
        "cos": 7.949230493977665e-05
      },
      {
        "epoch": 40,
        "mse": 0.0028209949222703775,
        "cos": 0.0006807217665482313
      },
      {
        "epoch": 60,
        "mse": 0.002746186009608209,
        "cos": -0.0012927929715563853
      },
      {
        "epoch": 80,
        "mse": 0.048195418591300644,
        "cos": 0.49279734392960867
      },
      {
        "epoch": 100,
        "mse": 0.011376503172020118,
        "cos": 0.7687788685162862
      },
      {
        "epoch": 120,
        "mse": 0.0018575659099345405,
        "cos": 0.9089678009351094
      },
      {
        "epoch": 140,
        "mse": 0.00029495314811356366,
        "cos": 0.9680179615815481
      },
      {
        "epoch": 160,
        "mse": 0.00010300778691695693,
        "cos": 0.9824542800585429
      },
      {
        "epoch": 180,
        "mse": 6.22785273056555e-05,
        "cos": 0.986561139424642
      },
      {
        "epoch": 200,
        "mse": 5.633314976876136e-05,
        "cos": 0.9872957944869996
      }
    ]
  },
  {
    "dim": 768,
    "neurons": 4096,
    "steps": 128,
    "final_mse": 0.0007109043071977794,
    "final_cos": 0.9640029072761536,
    "milestones": [
      {
        "epoch": 20,
        "mse": 0.004640598734840751,
        "cos": 0.0001389272161759436
      },
      {
        "epoch": 40,
        "mse": 0.0028830923062438765,
        "cos": -0.0005388486936377982
      },
      {
        "epoch": 60,
        "mse": 0.0026579547052582105,
        "cos": -0.0008515000498543183
      },
      {
        "epoch": 80,
        "mse": 0.005524608632549643,
        "cos": 0.3971738278865814
      },
      {
        "epoch": 100,
        "mse": 0.44284523477156956,
        "cos": 0.14999981944759685
      },
      {
        "epoch": 120,
        "mse": 0.009387427164862553,
        "cos": 0.8101295113563538
      },
      {
        "epoch": 140,
        "mse": 0.0032115802091235916,
        "cos": 0.9130531450112661
      },
      {
        "epoch": 160,
        "mse": 0.001285675020578007,
        "cos": 0.9493551254272461
      },
      {
        "epoch": 180,
        "mse": 0.0007889122760389,
        "cos": 0.9620140413443248
      },
      {
        "epoch": 200,
        "mse": 0.0007097914950766911,
        "cos": 0.9642268856366475
      }
    ]
  },
  {
    "dim": 768,
    "neurons": 8192,
    "steps": 64,
    "final_mse": 9.41839098231867e-05,
    "final_cos": 0.977264404296875,
    "milestones": [
      {
        "epoch": 20,
        "mse": 0.0042252690686533844,
        "cos": 0.0009540434480489542
      },
      {
        "epoch": 40,
        "mse": 0.0026403106516227127,
        "cos": -0.00011461178073659539
      },
      {
        "epoch": 60,
        "mse": 0.002510098453300695,
        "cos": 3.730244352482259e-05
      },
      {
        "epoch": 80,
        "mse": 0.07319205676515897,
        "cos": 0.5515906274318695
      },
      {
        "epoch": 100,
        "mse": 0.02154427437732617,
        "cos": 0.6362018167972565
      },
      {
        "epoch": 120,
        "mse": 0.005301868465418617,
        "cos": 0.8255152304967245
      },
      {
        "epoch": 140,
        "mse": 0.0007266401468465725,
        "cos": 0.9310513158639272
      },
      {
        "epoch": 160,
        "mse": 0.00019424428513351206,
        "cos": 0.9668811519940694
      },
      {
        "epoch": 180,
        "mse": 0.00010609042850167801,
        "cos": 0.9758348226547241
      },
      {
        "epoch": 200,
        "mse": 9.468134303460828e-05,
        "cos": 0.9772639731566112
      }
    ]
  }
 ]
--- a/doc/exp02_associative_recall.md
+++ b/doc/exp02_associative_recall.md
@@ -0,0 +1,68 @@
 # 实验2：STDP / Hebbian Associative Recall
 ## 系列实验总结
 ### 2a: 原始 STDP（完全失败）
 - **问题**: W 初始化为 0 → 无脉冲 → STDP 不触发 → W 保持为 0（鸡生蛋死循环）
 - **教训**: STDP 学习不能依赖网络自身产生 post-spikes，必须 teacher forcing
 ### 2b: 修复后的 STDP + 直接 Hebbian
 - **Direct Hebbian**: 1 对完美（CosSim=1.0），但多对时交叉干扰严重（10 对 Disc=0.007）
 - **STDP v2**: 比 Hebbian 差，LIF 阈值非线性扭曲输出
 - **根因**: 随机 spike pattern 不够正交，pattern 重叠导致灾难性干扰
 ### 2c: Pattern Separation（突破性进展）⭐
 - 引入 Winner-Take-All 模式分离（类比齿状回 dentate gyrus）
 - **结果**: code=16384, k=20 时，**2000 对记忆完美召回**（Disc=0.999）
 - 500 对记忆：Correct=1.0, Wrong=0.001
 ### 2d: 鲁棒性与容量
 - **容量**: 20,000 对记忆仍然完美（code=16384, k=20）
 - **Partial cue**: 30% 缺失仍 100% 召回，50% 缺失 86% 准确
 - **噪声**: ⚠️ 致命弱点——noise_std=0.1 就崩溃到 9% 准确率
 - WTA 对输入微扰极其敏感（改变 top-k 排序）
 ### 2e: 抗噪方案
 - **Soft WTA**: 虽然 CosSim 高但 discrimination=0（所有 pattern 都一样，无法区分）
 - **Multi-probe**: 完全失败
 - **Coarse-to-fine**: noise≤0.2 完美，本质上是 NN lookup + Hebbian recall
 - **Wider k**: 略有改善但不根本
 ### 2f: Learned Separator
 - 随机 embedding 上训练失败（pos_match ≈ neg_match）
 - 原因：随机高维向量没有语义结构，contrastive loss 无法学到有意义的分离
 - **需要真实语义 embedding 才能验证**
 ### 2g: Multi-hop 联想（核心卖点）⭐⭐
 - **A→B→C→D→E→F→G (6跳): CosSim=1.0**，完美链式联想
 - 100 条长度为 4 的链（300 个 pair），零干扰
 - 收敛链（A→C, B→C）: 两条路径都完美到达 C
 - 发散链（A→B, A→C）: 自然产生 50/50 混合——符合生物记忆行为
 - **这是 RAG 无法实现的能力**：RAG 只能做单跳 NN 检索
 ## 架构决策
 ### 确定的方案
 1. **Pattern Separation**: WTA（code_dim=16384, k=20）是核心组件
 2. **Hebbian Outer-Product**: 存储机制（不是 STDP trace-based）
 3. **Multi-hop**: 通过权重矩阵链式乘法实现
 4. **容量**: 20K+ 记忆毫无压力
 ### 待解决
 1. **噪声容忍**: 实际使用需要 coarse retrieval（NN lookup）辅助
   - 或者: learned separator 在真实语义 embedding 上可能 work
 2. **STDP 的角色**: 在此架构中，直接 Hebbian 比 STDP 好
   - STDP 可能在 consolidation（exp03）中找到位置
 3. **SNN 的角色**: encoder/decoder 验证通过，但 memory core 更适合 rate-based
   - SNN 的价值在于: temporal encoding + neuromorphic hardware + consolidation dynamics
 ## 关键数字
 | 指标 | 数值 |
 |------|------|
 | 最大容量 (code=16384, k=20) | >20,000 memories |
 | 单跳召回精度 (clean cue) | 1.0000 |
 | 多跳召回精度 (6 hops) | 1.0000 |
 | 噪声容忍 (noise=0.1) | ❌ 0.09 exact rate |
 | Partial cue 容忍 (30% missing) | ✅ 100% |
 | Weight matrix 内存 | 16384² × 4B = 1GB |
--- a/doc/exp02_results.json
+++ b/doc/exp02_results.json
@@ -0,0 +1,808 @@
 [
  {
    "num_neurons": 2048,
    "num_steps": 64,
    "num_pairs": 1,
    "firing_rate": 0.05,
    "num_presentations": 5,
    "a_plus": 0.005,
    "a_minus": 0.006,
    "mean_correct_sim": 0.0,
    "mean_wrong_sim": 0,
    "discrimination": 0.0,
    "correct_sims": [
      0.0
    ],
    "recall_firing_rate": 0.0,
    "weight_stats": {
      "mean": 0.0,
      "std": 0.0,
      "abs_mean": 0.0,
      "sparsity": 1.0,
      "max": 0.0,
      "min": 0.0
    },
    "learn_time": 0.13341617584228516,
    "test": "single_pair"
  },
  {
    "num_neurons": 2048,
    "num_steps": 64,
    "num_pairs": 5,
    "firing_rate": 0.05,
    "num_presentations": 5,
    "a_plus": 0.005,
    "a_minus": 0.006,
    "mean_correct_sim": 0.0,
    "mean_wrong_sim": 0.0,
    "discrimination": 0.0,
    "correct_sims": [
      0.0,
      0.0,
      0.0,
      0.0,
      0.0
    ],
    "recall_firing_rate": 0.0,
    "weight_stats": {
      "mean": 0.0,
      "std": 0.0,
      "abs_mean": 0.0,
      "sparsity": 1.0,
      "max": 0.0,
      "min": 0.0
    },
    "learn_time": 0.38840293884277344,
    "test": "pairs_5"
  },
  {
    "num_neurons": 2048,
    "num_steps": 64,
    "num_pairs": 10,
    "firing_rate": 0.05,
    "num_presentations": 5,
    "a_plus": 0.005,
    "a_minus": 0.006,
    "mean_correct_sim": 0.0,
    "mean_wrong_sim": 0.0,
    "discrimination": 0.0,
    "correct_sims": [
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0
    ],
    "recall_firing_rate": 0.0,
    "weight_stats": {
      "mean": 0.0,
      "std": 0.0,
      "abs_mean": 0.0,
      "sparsity": 1.0,
      "max": 0.0,
      "min": 0.0
    },
    "learn_time": 0.7765586376190186,
    "test": "pairs_10"
  },
  {
    "num_neurons": 2048,
    "num_steps": 64,
    "num_pairs": 20,
    "firing_rate": 0.05,
    "num_presentations": 5,
    "a_plus": 0.005,
    "a_minus": 0.006,
    "mean_correct_sim": 0.0,
    "mean_wrong_sim": 0.0,
    "discrimination": 0.0,
    "correct_sims": [
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0
    ],
    "recall_firing_rate": 0.0,
    "weight_stats": {
      "mean": 0.0,
      "std": 0.0,
      "abs_mean": 0.0,
      "sparsity": 1.0,
      "max": 0.0,
      "min": 0.0
    },
    "learn_time": 1.5450711250305176,
    "test": "pairs_20"
  },
  {
    "num_neurons": 2048,
    "num_steps": 64,
    "num_pairs": 50,
    "firing_rate": 0.05,
    "num_presentations": 5,
    "a_plus": 0.005,
    "a_minus": 0.006,
    "mean_correct_sim": 0.0,
    "mean_wrong_sim": 0.0,
    "discrimination": 0.0,
    "correct_sims": [
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0
    ],
    "recall_firing_rate": 0.0,
    "weight_stats": {
      "mean": 0.0,
      "std": 0.0,
      "abs_mean": 0.0,
      "sparsity": 1.0,
      "max": 0.0,
      "min": 0.0
    },
    "learn_time": 3.9536848068237305,
    "test": "pairs_50"
  },
  {
    "num_neurons": 2048,
    "num_steps": 64,
    "num_pairs": 10,
    "firing_rate": 0.05,
    "num_presentations": 5,
    "a_plus": 0.001,
    "a_minus": 0.0012,
    "mean_correct_sim": 0.0,
    "mean_wrong_sim": 0.0,
    "discrimination": 0.0,
    "correct_sims": [
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0
    ],
    "recall_firing_rate": 0.0,
    "weight_stats": {
      "mean": 0.0,
      "std": 0.0,
      "abs_mean": 0.0,
      "sparsity": 1.0,
      "max": 0.0,
      "min": 0.0
    },
    "learn_time": 0.774240255355835,
    "test": "lr_0.001"
  },
  {
    "num_neurons": 2048,
    "num_steps": 64,
    "num_pairs": 10,
    "firing_rate": 0.05,
    "num_presentations": 5,
    "a_plus": 0.005,
    "a_minus": 0.006,
    "mean_correct_sim": 0.0,
    "mean_wrong_sim": 0.0,
    "discrimination": 0.0,
    "correct_sims": [
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0
    ],
    "recall_firing_rate": 0.0,
    "weight_stats": {
      "mean": 0.0,
      "std": 0.0,
      "abs_mean": 0.0,
      "sparsity": 1.0,
      "max": 0.0,
      "min": 0.0
    },
    "learn_time": 0.8001570701599121,
    "test": "lr_0.005"
  },
  {
    "num_neurons": 2048,
    "num_steps": 64,
    "num_pairs": 10,
    "firing_rate": 0.05,
    "num_presentations": 5,
    "a_plus": 0.01,
    "a_minus": 0.012,
    "mean_correct_sim": 0.0,
    "mean_wrong_sim": 0.0,
    "discrimination": 0.0,
    "correct_sims": [
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0
    ],
    "recall_firing_rate": 0.0,
    "weight_stats": {
      "mean": 0.0,
      "std": 0.0,
      "abs_mean": 0.0,
      "sparsity": 1.0,
      "max": 0.0,
      "min": 0.0
    },
    "learn_time": 0.7752792835235596,
    "test": "lr_0.01"
  },
  {
    "num_neurons": 2048,
    "num_steps": 64,
    "num_pairs": 10,
    "firing_rate": 0.05,
    "num_presentations": 5,
    "a_plus": 0.05,
    "a_minus": 0.06,
    "mean_correct_sim": 0.0,
    "mean_wrong_sim": 0.0,
    "discrimination": 0.0,
    "correct_sims": [
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0
    ],
    "recall_firing_rate": 0.0,
    "weight_stats": {
      "mean": 0.0,
      "std": 0.0,
      "abs_mean": 0.0,
      "sparsity": 1.0,
      "max": 0.0,
      "min": 0.0
    },
    "learn_time": 0.7797460556030273,
    "test": "lr_0.05"
  },
  {
    "num_neurons": 2048,
    "num_steps": 64,
    "num_pairs": 10,
    "firing_rate": 0.02,
    "num_presentations": 5,
    "a_plus": 0.005,
    "a_minus": 0.006,
    "mean_correct_sim": 0.0,
    "mean_wrong_sim": 0.0,
    "discrimination": 0.0,
    "correct_sims": [
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0
    ],
    "recall_firing_rate": 0.0,
    "weight_stats": {
      "mean": 0.0,
      "std": 0.0,
      "abs_mean": 0.0,
      "sparsity": 1.0,
      "max": 0.0,
      "min": 0.0
    },
    "learn_time": 0.778449296951294,
    "test": "fr_0.02"
  },
  {
    "num_neurons": 2048,
    "num_steps": 64,
    "num_pairs": 10,
    "firing_rate": 0.05,
    "num_presentations": 5,
    "a_plus": 0.005,
    "a_minus": 0.006,
    "mean_correct_sim": 0.0,
    "mean_wrong_sim": 0.0,
    "discrimination": 0.0,
    "correct_sims": [
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0
    ],
    "recall_firing_rate": 0.0,
    "weight_stats": {
      "mean": 0.0,
      "std": 0.0,
      "abs_mean": 0.0,
      "sparsity": 1.0,
      "max": 0.0,
      "min": 0.0
    },
    "learn_time": 0.7686772346496582,
    "test": "fr_0.05"
  },
  {
    "num_neurons": 2048,
    "num_steps": 64,
    "num_pairs": 10,
    "firing_rate": 0.1,
    "num_presentations": 5,
    "a_plus": 0.005,
    "a_minus": 0.006,
    "mean_correct_sim": 0.0,
    "mean_wrong_sim": 0.0,
    "discrimination": 0.0,
    "correct_sims": [
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0
    ],
    "recall_firing_rate": 0.0,
    "weight_stats": {
      "mean": 0.0,
      "std": 0.0,
      "abs_mean": 0.0,
      "sparsity": 1.0,
      "max": 0.0,
      "min": 0.0
    },
    "learn_time": 0.7901496887207031,
    "test": "fr_0.1"
  },
  {
    "num_neurons": 2048,
    "num_steps": 64,
    "num_pairs": 10,
    "firing_rate": 0.2,
    "num_presentations": 5,
    "a_plus": 0.005,
    "a_minus": 0.006,
    "mean_correct_sim": 0.0,
    "mean_wrong_sim": 0.0,
    "discrimination": 0.0,
    "correct_sims": [
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0
    ],
    "recall_firing_rate": 0.0,
    "weight_stats": {
      "mean": 0.0,
      "std": 0.0,
      "abs_mean": 0.0,
      "sparsity": 1.0,
      "max": 0.0,
      "min": 0.0
    },
    "learn_time": 0.7785372734069824,
    "test": "fr_0.2"
  },
  {
    "num_neurons": 2048,
    "num_steps": 64,
    "num_pairs": 10,
    "firing_rate": 0.05,
    "num_presentations": 1,
    "a_plus": 0.005,
    "a_minus": 0.006,
    "mean_correct_sim": 0.0,
    "mean_wrong_sim": 0.0,
    "discrimination": 0.0,
    "correct_sims": [
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0
    ],
    "recall_firing_rate": 0.0,
    "weight_stats": {
      "mean": 0.0,
      "std": 0.0,
      "abs_mean": 0.0,
      "sparsity": 1.0,
      "max": 0.0,
      "min": 0.0
    },
    "learn_time": 0.18436217308044434,
    "test": "pres_1"
  },
  {
    "num_neurons": 2048,
    "num_steps": 64,
    "num_pairs": 10,
    "firing_rate": 0.05,
    "num_presentations": 3,
    "a_plus": 0.005,
    "a_minus": 0.006,
    "mean_correct_sim": 0.0,
    "mean_wrong_sim": 0.0,
    "discrimination": 0.0,
    "correct_sims": [
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0
    ],
    "recall_firing_rate": 0.0,
    "weight_stats": {
      "mean": 0.0,
      "std": 0.0,
      "abs_mean": 0.0,
      "sparsity": 1.0,
      "max": 0.0,
      "min": 0.0
    },
    "learn_time": 0.4729011058807373,
    "test": "pres_3"
  },
  {
    "num_neurons": 2048,
    "num_steps": 64,
    "num_pairs": 10,
    "firing_rate": 0.05,
    "num_presentations": 5,
    "a_plus": 0.005,
    "a_minus": 0.006,
    "mean_correct_sim": 0.0,
    "mean_wrong_sim": 0.0,
    "discrimination": 0.0,
    "correct_sims": [
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0
    ],
    "recall_firing_rate": 0.0,
    "weight_stats": {
      "mean": 0.0,
      "std": 0.0,
      "abs_mean": 0.0,
      "sparsity": 1.0,
      "max": 0.0,
      "min": 0.0
    },
    "learn_time": 0.777827262878418,
    "test": "pres_5"
  },
  {
    "num_neurons": 2048,
    "num_steps": 64,
    "num_pairs": 10,
    "firing_rate": 0.05,
    "num_presentations": 10,
    "a_plus": 0.005,
    "a_minus": 0.006,
    "mean_correct_sim": 0.0,
    "mean_wrong_sim": 0.0,
    "discrimination": 0.0,
    "correct_sims": [
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0
    ],
    "recall_firing_rate": 0.0,
    "weight_stats": {
      "mean": 0.0,
      "std": 0.0,
      "abs_mean": 0.0,
      "sparsity": 1.0,
      "max": 0.0,
      "min": 0.0
    },
    "learn_time": 1.5397796630859375,
    "test": "pres_10"
  },
  {
    "num_neurons": 2048,
    "num_steps": 64,
    "num_pairs": 10,
    "firing_rate": 0.05,
    "num_presentations": 20,
    "a_plus": 0.005,
    "a_minus": 0.006,
    "mean_correct_sim": 0.0,
    "mean_wrong_sim": 0.0,
    "discrimination": 0.0,
    "correct_sims": [
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0
    ],
    "recall_firing_rate": 0.0,
    "weight_stats": {
      "mean": 0.0,
      "std": 0.0,
      "abs_mean": 0.0,
      "sparsity": 1.0,
      "max": 0.0,
      "min": 0.0
    },
    "learn_time": 3.1238980293273926,
    "test": "pres_20"
  },
  {
    "num_neurons": 1024,
    "num_steps": 64,
    "num_pairs": 10,
    "firing_rate": 0.05,
    "num_presentations": 5,
    "a_plus": 0.005,
    "a_minus": 0.006,
    "mean_correct_sim": 0.0,
    "mean_wrong_sim": 0.0,
    "discrimination": 0.0,
    "correct_sims": [
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0
    ],
    "recall_firing_rate": 0.0,
    "weight_stats": {
      "mean": 0.0,
      "std": 0.0,
      "abs_mean": 0.0,
      "sparsity": 1.0,
      "max": 0.0,
      "min": 0.0
    },
    "learn_time": 0.7898683547973633,
    "test": "width_1024"
  },
  {
    "num_neurons": 2048,
    "num_steps": 64,
    "num_pairs": 10,
    "firing_rate": 0.05,
    "num_presentations": 5,
    "a_plus": 0.005,
    "a_minus": 0.006,
    "mean_correct_sim": 0.0,
    "mean_wrong_sim": 0.0,
    "discrimination": 0.0,
    "correct_sims": [
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0
    ],
    "recall_firing_rate": 0.0,
    "weight_stats": {
      "mean": 0.0,
      "std": 0.0,
      "abs_mean": 0.0,
      "sparsity": 1.0,
      "max": 0.0,
      "min": 0.0
    },
    "learn_time": 0.7738516330718994,
    "test": "width_2048"
  },
  {
    "num_neurons": 4096,
    "num_steps": 64,
    "num_pairs": 10,
    "firing_rate": 0.05,
    "num_presentations": 5,
    "a_plus": 0.005,
    "a_minus": 0.006,
    "mean_correct_sim": 0.0,
    "mean_wrong_sim": 0.0,
    "discrimination": 0.0,
    "correct_sims": [
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0
    ],
    "recall_firing_rate": 0.0,
    "weight_stats": {
      "mean": 0.0,
      "std": 0.0,
      "abs_mean": 0.0,
      "sparsity": 1.0,
      "max": 0.0,
      "min": 0.0
    },
    "learn_time": 6.378566026687622,
    "test": "width_4096"
  },
  {
    "num_neurons": 8192,
    "num_steps": 64,
    "num_pairs": 10,
    "firing_rate": 0.05,
    "num_presentations": 5,
    "a_plus": 0.005,
    "a_minus": 0.006,
    "mean_correct_sim": 0.0,
    "mean_wrong_sim": 0.0,
    "discrimination": 0.0,
    "correct_sims": [
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0
    ],
    "recall_firing_rate": 0.0,
    "weight_stats": {
      "mean": 0.0,
      "std": 0.0,
      "abs_mean": 0.0,
      "sparsity": 1.0,
      "max": 0.0,
      "min": 0.0
    },
    "learn_time": 27.351831674575806,
    "test": "width_8192"
  }
 ]
--- a/doc/exp02b_results.json
+++ b/doc/exp02b_results.json
@@ -0,0 +1,504 @@
 [
  {
    "method": "direct_hebbian",
    "correct": 1.0,
    "wrong": 0,
    "disc": 1.0,
    "w_stats": {
      "mean": 0.0012326654978096485,
      "std": 0.001009981264360249,
      "abs_mean": 0.0012326654978096485,
      "sparsity": 0.5203375816345215,
      "max": 0.01220703125,
      "min": 0.0
    },
    "time": 0.018577098846435547,
    "num_pairs": 1,
    "lr": 0.5,
    "test": "hebb_pairs_1"
  },
  {
    "method": "direct_hebbian",
    "correct": 0.9141042113304139,
    "wrong": 0.8997887462377548,
    "disc": 0.014315465092659019,
    "w_stats": {
      "mean": 0.006221722345799208,
      "std": 0.0023041535168886185,
      "abs_mean": 0.006221722345799208,
      "sparsity": 0.0006265640258789062,
      "max": 0.023681640625,
      "min": 0.0
    },
    "time": 0.0002448558807373047,
    "num_pairs": 5,
    "lr": 0.5,
    "test": "hebb_pairs_5"
  },
  {
    "method": "direct_hebbian",
    "correct": 0.8956584692001343,
    "wrong": 0.8881289852990044,
    "disc": 0.007529483901129841,
    "w_stats": {
      "mean": 0.012574371881783009,
      "std": 0.0033114321995526552,
      "abs_mean": 0.012574371881783009,
      "sparsity": 0.0,
      "max": 0.034423828125,
      "min": 0.0015869140625
    },
    "time": 0.0003325939178466797,
    "num_pairs": 10,
    "lr": 0.5,
    "test": "hebb_pairs_10"
  },
  {
    "method": "direct_hebbian",
    "correct": 0.8879856646060944,
    "wrong": 0.8841540270730068,
    "disc": 0.003831637533087573,
    "w_stats": {
      "mean": 0.024841923266649246,
      "std": 0.004587384406477213,
      "abs_mean": 0.024841923266649246,
      "sparsity": 0.0,
      "max": 0.054443359375,
      "min": 0.0079345703125
    },
    "time": 0.0014731884002685547,
    "num_pairs": 20,
    "lr": 0.5,
    "test": "hebb_pairs_20"
  },
  {
    "method": "direct_hebbian",
    "correct": 0.8821950948238373,
    "wrong": 0.8806546637719991,
    "disc": 0.0015404310518382092,
    "w_stats": {
      "mean": 0.06239410862326622,
      "std": 0.0072029875591397285,
      "abs_mean": 0.06239410862326622,
      "sparsity": 0.0,
      "max": 0.1075439453125,
      "min": 0.0311279296875
    },
    "time": 0.0010445117950439453,
    "num_pairs": 50,
    "lr": 0.5,
    "test": "hebb_pairs_50"
  },
  {
    "method": "direct_hebbian",
    "correct": 0.8799643820524216,
    "wrong": 0.8791960634968498,
    "disc": 0.0007683185555718008,
    "w_stats": {
      "mean": 0.12517985701560974,
      "std": 0.010384579189121723,
      "abs_mean": 0.12517985701560974,
      "sparsity": 0.0,
      "max": 0.181884765625,
      "min": 0.080810546875
    },
    "time": 0.001986265182495117,
    "num_pairs": 100,
    "lr": 0.5,
    "test": "hebb_pairs_100"
  },
  {
    "method": "direct_hebbian",
    "correct": 0.8980890333652496,
    "wrong": 0.8907561533980899,
    "disc": 0.0073328799671597,
    "w_stats": {
      "mean": 0.00025050563272088766,
      "std": 6.504161865450442e-05,
      "abs_mean": 0.00025050563272088766,
      "sparsity": 1.0,
      "max": 0.0007348632207140326,
      "min": 3.9062499126885086e-05
    },
    "time": 0.00024819374084472656,
    "num_pairs": 10,
    "lr": 0.01,
    "test": "hebb_lr_0.01"
  },
  {
    "method": "direct_hebbian",
    "correct": 0.8962267279624939,
    "wrong": 0.8887399951616923,
    "disc": 0.007486732800801588,
    "w_stats": {
      "mean": 0.002498718211427331,
      "std": 0.0006414031959138811,
      "abs_mean": 0.002498718211427331,
      "sparsity": 0.0017719268798828125,
      "max": 0.006787109654396772,
      "min": 0.0003906250058207661
    },
    "time": 0.00022459030151367188,
    "num_pairs": 10,
    "lr": 0.1,
    "test": "hebb_lr_0.1"
  },
  {
    "method": "direct_hebbian",
    "correct": 0.897282725572586,
    "wrong": 0.8897124224238926,
    "disc": 0.007570303148693447,
    "w_stats": {
      "mean": 0.012481803074479103,
      "std": 0.003280544187873602,
      "abs_mean": 0.012481803074479103,
      "sparsity": 0.0,
      "max": 0.035400390625,
      "min": 0.001220703125
    },
    "time": 0.0002167224884033203,
    "num_pairs": 10,
    "lr": 0.5,
    "test": "hebb_lr_0.5"
  },
  {
    "method": "direct_hebbian",
    "correct": 0.8980260252952575,
    "wrong": 0.8906061040030585,
    "disc": 0.007419921292199039,
    "w_stats": {
      "mean": 0.024723926559090614,
      "std": 0.006540043745189905,
      "abs_mean": 0.024723926559090614,
      "sparsity": 0.0,
      "max": 0.07080078125,
      "min": 0.0029296875
    },
    "time": 0.000244140625,
    "num_pairs": 10,
    "lr": 1.0,
    "test": "hebb_lr_1.0"
  },
  {
    "method": "direct_hebbian",
    "correct": 0.8966590642929078,
    "wrong": 0.8892279856734806,
    "disc": 0.007431078619427156,
    "w_stats": {
      "mean": 0.12513691186904907,
      "std": 0.033127814531326294,
      "abs_mean": 0.12513691186904907,
      "sparsity": 0.0,
      "max": 0.34912109375,
      "min": 0.01708984375
    },
    "time": 0.00022292137145996094,
    "num_pairs": 10,
    "lr": 5.0,
    "test": "hebb_lr_5.0"
  },
  {
    "method": "stdp_v2",
    "correct": 0.6561112403869629,
    "wrong": 0,
    "disc": 0.6561112403869629,
    "w_stats": {
      "mean": -0.021826405078172684,
      "std": 0.10513758659362793,
      "abs_mean": 0.0771329402923584,
      "sparsity": 0.01354217529296875,
      "max": 0.9745967388153076,
      "min": -1.0
    },
    "time": 0.020874977111816406,
    "num_pairs": 1,
    "a_plus": 0.01,
    "num_pres": 5,
    "test": "stdp_single"
  },
  {
    "method": "stdp_v2",
    "correct": 0.18890744000673293,
    "wrong": 0.1745286915037367,
    "disc": 0.014378748502996225,
    "w_stats": {
      "mean": -0.022967157885432243,
      "std": 0.0347970575094223,
      "abs_mean": 0.03315284848213196,
      "sparsity": 0.01936030387878418,
      "max": 0.15670186281204224,
      "min": -0.2564994990825653
    },
    "time": 0.26916003227233887,
    "num_pairs": 10,
    "a_plus": 0.001,
    "num_pres": 5,
    "test": "stdp_ap_0.001"
  },
  {
    "method": "stdp_v2",
    "correct": 0.2575246155261993,
    "wrong": 0.24171155989170073,
    "disc": 0.015813055634498585,
    "w_stats": {
      "mean": -0.11097157001495361,
      "std": 0.16529808938503265,
      "abs_mean": 0.15848013758659363,
      "sparsity": 0.003999948501586914,
      "max": 0.7865057587623596,
      "min": -1.0
    },
    "time": 0.2617373466491699,
    "num_pairs": 10,
    "a_plus": 0.005,
    "num_pres": 5,
    "test": "stdp_ap_0.005"
  },
  {
    "method": "stdp_v2",
    "correct": 0.2595768585801125,
    "wrong": 0.24608167906602224,
    "disc": 0.013495179514090239,
    "w_stats": {
      "mean": -0.2241937518119812,
      "std": 0.3288184702396393,
      "abs_mean": 0.31941741704940796,
      "sparsity": 0.0019876956939697266,
      "max": 1.0,
      "min": -1.0
    },
    "time": 0.2628958225250244,
    "num_pairs": 10,
    "a_plus": 0.01,
    "num_pres": 5,
    "test": "stdp_ap_0.01"
  },
  {
    "method": "stdp_v2",
    "correct": 0.2949586361646652,
    "wrong": 0.2823951015869776,
    "disc": 0.012563534577687607,
    "w_stats": {
      "mean": -0.3816400170326233,
      "std": 0.6254727244377136,
      "abs_mean": 0.6577693819999695,
      "sparsity": 0.0006313323974609375,
      "max": 1.0,
      "min": -1.0
    },
    "time": 0.26494669914245605,
    "num_pairs": 10,
    "a_plus": 0.05,
    "num_pres": 5,
    "test": "stdp_ap_0.05"
  },
  {
    "method": "stdp_v2",
    "correct": 0.4278454571962357,
    "wrong": 0.4212547073761622,
    "disc": 0.0065907498200734604,
    "w_stats": {
      "mean": -0.2731684446334839,
      "std": 0.7176912426948547,
      "abs_mean": 0.6977914571762085,
      "sparsity": 0.0005943775177001953,
      "max": 1.0,
      "min": -1.0
    },
    "time": 0.263822078704834,
    "num_pairs": 10,
    "a_plus": 0.1,
    "num_pres": 5,
    "test": "stdp_ap_0.1"
  },
  {
    "method": "stdp_v2",
    "correct": 0.23125857561826707,
    "wrong": 0.2177843016054895,
    "disc": 0.013474274012777565,
    "w_stats": {
      "mean": -0.04514722526073456,
      "std": 0.06740628927946091,
      "abs_mean": 0.06436040252447128,
      "sparsity": 0.009995222091674805,
      "max": 0.39074867963790894,
      "min": -0.4776502847671509
    },
    "time": 0.04672598838806152,
    "num_pairs": 10,
    "a_plus": 0.01,
    "num_pres": 1,
    "test": "stdp_pres_1"
  },
  {
    "method": "stdp_v2",
    "correct": 0.2634019389748573,
    "wrong": 0.25012489590379927,
    "disc": 0.013277043071058037,
    "w_stats": {
      "mean": -0.1368531435728073,
      "std": 0.20195430517196655,
      "abs_mean": 0.19379907846450806,
      "sparsity": 0.0032529830932617188,
      "max": 0.9678819179534912,
      "min": -1.0
    },
    "time": 0.15973997116088867,
    "num_pairs": 10,
    "a_plus": 0.01,
    "num_pres": 3,
    "test": "stdp_pres_3"
  },
  {
    "method": "stdp_v2",
    "correct": 0.2491248592734337,
    "wrong": 0.23636625193887287,
    "disc": 0.012758607334560829,
    "w_stats": {
      "mean": -0.23120653629302979,
      "std": 0.3264971673488617,
      "abs_mean": 0.3213178515434265,
      "sparsity": 0.0019659996032714844,
      "max": 1.0,
      "min": -1.0
    },
    "time": 0.2647593021392822,
    "num_pairs": 10,
    "a_plus": 0.01,
    "num_pres": 5,
    "test": "stdp_pres_5"
  },
  {
    "method": "stdp_v2",
    "correct": 0.2780441254377365,
    "wrong": 0.2647830416758855,
    "disc": 0.013261083761850978,
    "w_stats": {
      "mean": -0.35562774538993835,
      "std": 0.5160319805145264,
      "abs_mean": 0.5376336574554443,
      "sparsity": 0.0010309219360351562,
      "max": 1.0,
      "min": -1.0
    },
    "time": 0.5369422435760498,
    "num_pairs": 10,
    "a_plus": 0.01,
    "num_pres": 10,
    "test": "stdp_pres_10"
  },
  {
    "method": "stdp_v2",
    "correct": 0.2884770005941391,
    "wrong": 0.27335384570890003,
    "disc": 0.015123154885239076,
    "w_stats": {
      "mean": -0.39499378204345703,
      "std": 0.616945207118988,
      "abs_mean": 0.6560592651367188,
      "sparsity": 0.0006542205810546875,
      "max": 1.0,
      "min": -1.0
    },
    "time": 1.07774019241333,
    "num_pairs": 10,
    "a_plus": 0.01,
    "num_pres": 20,
    "test": "stdp_pres_20"
  },
  {
    "method": "stdp_v2",
    "correct": 0.6523751020431519,
    "wrong": 0,
    "disc": 0.6523751020431519,
    "w_stats": {
      "mean": -0.021571537479758263,
      "std": 0.10514378547668457,
      "abs_mean": 0.07724925875663757,
      "sparsity": 0.013613700866699219,
      "max": 0.8662262558937073,
      "min": -1.0
    },
    "time": 0.019371509552001953,
    "num_pairs": 1,
    "a_plus": 0.01,
    "num_pres": 5,
    "test": "stdp_pairs_1"
  },
  {
    "method": "stdp_v2",
    "correct": 0.38826957941055296,
    "wrong": 0.3618871793150902,
    "disc": 0.026382400095462777,
    "w_stats": {
      "mean": -0.11600317060947418,
      "std": 0.23469609022140503,
      "abs_mean": 0.20440348982810974,
      "sparsity": 0.003264188766479492,
      "max": 1.0,
      "min": -1.0
    },
    "time": 0.13651704788208008,
    "num_pairs": 5,
    "a_plus": 0.01,
    "num_pres": 5,
    "test": "stdp_pairs_5"
  },
  {
    "method": "stdp_v2",
    "correct": 0.24784058183431626,
    "wrong": 0.23306812014844683,
    "disc": 0.014772461685869431,
    "w_stats": {
      "mean": -0.22731736302375793,
      "std": 0.3267463445663452,
      "abs_mean": 0.3194453716278076,
      "sparsity": 0.001964569091796875,
      "max": 1.0,
      "min": -1.0
    },
    "time": 0.28417372703552246,
    "num_pairs": 10,
    "a_plus": 0.01,
    "num_pres": 5,
    "test": "stdp_pairs_10"
  },
  {
    "method": "stdp_v2",
    "correct": 0.12115511521697045,
    "wrong": 0.11465478504174634,
    "disc": 0.006500330175224112,
    "w_stats": {
      "mean": -0.42493927478790283,
      "std": 0.4062454402446747,
      "abs_mean": 0.5013920068740845,
      "sparsity": 0.0010747909545898438,
      "max": 1.0,
      "min": -1.0
    },
    "time": 0.5374035835266113,
    "num_pairs": 20,
    "a_plus": 0.01,
    "num_pres": 5,
    "test": "stdp_pairs_20"
  },
  {
    "method": "stdp_v2",
    "correct": 0.01958512845332734,
    "wrong": 0.01876905316739089,
    "disc": 0.000816075285936451,
    "w_stats": {
      "mean": -0.6996303796768188,
      "std": 0.3604925274848938,
      "abs_mean": 0.7365255355834961,
      "sparsity": 0.0003490447998046875,
      "max": 1.0,
      "min": -1.0
    },
    "time": 1.3608872890472412,
    "num_pairs": 50,
    "a_plus": 0.01,
    "num_pres": 5,
    "test": "stdp_pairs_50"
  }
 ]
--- a/doc/exp02c_results.json
+++ b/doc/exp02c_results.json
@@ -0,0 +1,326 @@
 [
  {
    "test": "overlap",
    "code_dim": 2048,
    "k": 20,
    "mean_overlap": 0.010202019593932412,
    "max_overlap": 0.14999999105930328
  },
  {
    "test": "overlap",
    "code_dim": 2048,
    "k": 50,
    "mean_overlap": 0.024711112705520306,
    "max_overlap": 0.12000001221895218
  },
  {
    "test": "overlap",
    "code_dim": 2048,
    "k": 100,
    "mean_overlap": 0.04898586208941509,
    "max_overlap": 0.15000000596046448
  },
  {
    "test": "overlap",
    "code_dim": 4096,
    "k": 20,
    "mean_overlap": 0.005979797623374246,
    "max_overlap": 0.09999999403953552
  },
  {
    "test": "overlap",
    "code_dim": 4096,
    "k": 50,
    "mean_overlap": 0.012800000862717027,
    "max_overlap": 0.12000000476837158
  },
  {
    "test": "overlap",
    "code_dim": 4096,
    "k": 100,
    "mean_overlap": 0.024448486435405835,
    "max_overlap": 0.11000000685453415
  },
  {
    "test": "overlap",
    "code_dim": 8192,
    "k": 20,
    "mean_overlap": 0.0025757574222304604,
    "max_overlap": 0.09999999403953552
  },
  {
    "test": "overlap",
    "code_dim": 8192,
    "k": 50,
    "mean_overlap": 0.006472727722968116,
    "max_overlap": 0.06000000238418579
  },
  {
    "test": "overlap",
    "code_dim": 8192,
    "k": 100,
    "mean_overlap": 0.012430303828659083,
    "max_overlap": 0.06000000610947609
  },
  {
    "test": "overlap",
    "code_dim": 16384,
    "k": 20,
    "mean_overlap": 0.0012222221493721009,
    "max_overlap": 0.09999999403953552
  },
  {
    "test": "overlap",
    "code_dim": 16384,
    "k": 50,
    "mean_overlap": 0.003167676991133979,
    "max_overlap": 0.06000000238418579
  },
  {
    "test": "overlap",
    "code_dim": 16384,
    "k": 100,
    "mean_overlap": 0.006484848919202282,
    "max_overlap": 0.05000000447034836
  },
  {
    "test": "sep_pairs_1",
    "correct": 1.0,
    "wrong": 0,
    "disc": 1.0,
    "code_dim": 8192,
    "k_active": 50,
    "num_pairs": 1,
    "lr": 1.0
  },
  {
    "test": "sep_pairs_5",
    "correct": 1.0000000476837159,
    "wrong": 0.010000000707805157,
    "disc": 0.9900000469759107,
    "code_dim": 8192,
    "k_active": 50,
    "num_pairs": 5,
    "lr": 1.0
  },
  {
    "test": "sep_pairs_10",
    "correct": 1.0000000715255737,
    "wrong": 0.007111111614439222,
    "disc": 0.9928889599111345,
    "code_dim": 8192,
    "k_active": 50,
    "num_pairs": 10,
    "lr": 1.0
  },
  {
    "test": "sep_pairs_20",
    "correct": 1.0000000715255737,
    "wrong": 0.007473684719910747,
    "disc": 0.992526386805663,
    "code_dim": 8192,
    "k_active": 50,
    "num_pairs": 20,
    "lr": 1.0
  },
  {
    "test": "sep_pairs_50",
    "correct": 1.0000000524520873,
    "wrong": 0.006183673899468719,
    "disc": 0.9938163785526186,
    "code_dim": 8192,
    "k_active": 50,
    "num_pairs": 50,
    "lr": 1.0
  },
  {
    "test": "sep_pairs_100",
    "correct": 1.0000000536441802,
    "wrong": 0.005727273127948395,
    "disc": 0.9942727805162318,
    "code_dim": 8192,
    "k_active": 50,
    "num_pairs": 100,
    "lr": 1.0
  },
  {
    "test": "sep_pairs_200",
    "correct": 1.0000000607967376,
    "wrong": 0.00616582957512919,
    "disc": 0.9938342312216084,
    "code_dim": 8192,
    "k_active": 50,
    "num_pairs": 200,
    "lr": 1.0
  },
  {
    "test": "sep_pairs_500",
    "correct": 1.0000000553131103,
    "wrong": 0.006348697836501506,
    "disc": 0.9936513574766088,
    "code_dim": 8192,
    "k_active": 50,
    "num_pairs": 500,
    "lr": 1.0
  },
  {
    "test": "sep_codedim_2048",
    "correct": 1.0000000619888305,
    "wrong": 0.0245959611731873,
    "disc": 0.9754041008156432,
    "code_dim": 2048,
    "k_active": 50,
    "num_pairs": 100,
    "lr": 1.0
  },
  {
    "test": "sep_codedim_4096",
    "correct": 1.0000000619888305,
    "wrong": 0.01184848565608263,
    "disc": 0.9881515763327479,
    "code_dim": 4096,
    "k_active": 50,
    "num_pairs": 100,
    "lr": 1.0
  },
  {
    "test": "sep_codedim_8192",
    "correct": 1.0000000548362733,
    "wrong": 0.006383838832867567,
    "disc": 0.9936162160034058,
    "code_dim": 8192,
    "k_active": 50,
    "num_pairs": 100,
    "lr": 1.0
  },
  {
    "test": "sep_codedim_16384",
    "correct": 1.0000000536441802,
    "wrong": 0.003434343673665114,
    "disc": 0.9965657099705151,
    "code_dim": 16384,
    "k_active": 50,
    "num_pairs": 100,
    "lr": 1.0
  },
  {
    "test": "sep_k_10",
    "correct": 1.0,
    "wrong": 0.0013131313326984946,
    "disc": 0.9986868686673015,
    "code_dim": 8192,
    "k_active": 10,
    "num_pairs": 100,
    "lr": 1.0
  },
  {
    "test": "sep_k_20",
    "correct": 0.9999999302625656,
    "wrong": 0.002348484708504243,
    "disc": 0.9976514455540614,
    "code_dim": 8192,
    "k_active": 20,
    "num_pairs": 100,
    "lr": 1.0
  },
  {
    "test": "sep_k_50",
    "correct": 1.000000058412552,
    "wrong": 0.0059797983936438655,
    "disc": 0.9940202600189081,
    "code_dim": 8192,
    "k_active": 50,
    "num_pairs": 100,
    "lr": 1.0
  },
  {
    "test": "sep_k_100",
    "correct": 1.0000000667572022,
    "wrong": 0.012792930120338846,
    "disc": 0.9872071366368633,
    "code_dim": 8192,
    "k_active": 100,
    "num_pairs": 100,
    "lr": 1.0
  },
  {
    "test": "sep_k_200",
    "correct": 1.000000069141388,
    "wrong": 0.025040405879098206,
    "disc": 0.9749596632622898,
    "code_dim": 8192,
    "k_active": 200,
    "num_pairs": 100,
    "lr": 1.0
  },
  {
    "test": "cap_10",
    "correct": 0.9999999344348908,
    "wrong": 0.001111111044883728,
    "disc": 0.9988888233900071,
    "code_dim": 16384,
    "k_active": 20,
    "num_pairs": 10,
    "lr": 1.0
  },
  {
    "test": "cap_50",
    "correct": 0.9999999284744263,
    "wrong": 0.001836734584399632,
    "disc": 0.9981631938900267,
    "code_dim": 16384,
    "k_active": 20,
    "num_pairs": 50,
    "lr": 1.0
  },
  {
    "test": "cap_100",
    "correct": 0.9999999344348908,
    "wrong": 0.0014141413298520175,
    "disc": 0.9985857931050387,
    "code_dim": 16384,
    "k_active": 20,
    "num_pairs": 100,
    "lr": 1.0
  },
  {
    "test": "cap_200",
    "correct": 0.9999999329447746,
    "wrong": 0.0011055275722963726,
    "disc": 0.9988944053724782,
    "code_dim": 16384,
    "k_active": 20,
    "num_pairs": 200,
    "lr": 1.0
  },
  {
    "test": "cap_500",
    "correct": 0.9999999303817749,
    "wrong": 0.001167334599760109,
    "disc": 0.9988325957820148,
    "code_dim": 16384,
    "k_active": 20,
    "num_pairs": 500,
    "lr": 1.0
  },
  {
    "test": "cap_1000",
    "correct": 0.9999999321103096,
    "wrong": 0.0012262261531374476,
    "disc": 0.9987737059571721,
    "code_dim": 16384,
    "k_active": 20,
    "num_pairs": 1000,
    "lr": 1.0
  },
  {
    "test": "cap_2000",
    "correct": 0.999999930858612,
    "wrong": 0.0013319158785906119,
    "disc": 0.9986680149800214,
    "code_dim": 16384,
    "k_active": 20,
    "num_pairs": 2000,
    "lr": 1.0
  }
 ]
--- a/doc/exp02d_results.json
+++ b/doc/exp02d_results.json
@@ -0,0 +1,99 @@
 {
  "noise": {
    "0.0": {
      "mean_cos": 0.9999999350309372,
      "exact_rate": 1.0
    },
    "0.1": {
      "mean_cos": 0.16949998944997788,
      "exact_rate": 0.09
    },
    "0.2": {
      "mean_cos": 0.06849999487400055,
      "exact_rate": 0.03
    },
    "0.5": {
      "mean_cos": 0.024999997913837432,
      "exact_rate": 0.0
    },
    "1.0": {
      "mean_cos": 0.011999999135732652,
      "exact_rate": 0.0
    },
    "2.0": {
      "mean_cos": 0.002499999850988388,
      "exact_rate": 0.0
    },
    "5.0": {
      "mean_cos": 0.009499999433755875,
      "exact_rate": 0.0
    }
  },
  "partial": {
    "0.0": {
      "mean_cos": 0.9999999344348908,
      "exact_rate": 1.0
    },
    "0.1": {
      "mean_cos": 0.9999999344348908,
      "exact_rate": 1.0
    },
    "0.2": {
      "mean_cos": 0.9999999344348908,
      "exact_rate": 1.0
    },
    "0.3": {
      "mean_cos": 0.9999999344348908,
      "exact_rate": 1.0
    },
    "0.5": {
      "mean_cos": 0.9069999405741691,
      "exact_rate": 0.86
    },
    "0.7": {
      "mean_cos": 0.5879999609291553,
      "exact_rate": 0.45
    },
    "0.9": {
      "mean_cos": 0.1689999896287918,
      "exact_rate": 0.08
    }
  },
  "capacity": {
    "100": {
      "mean_cos": 0.999999930858612,
      "exact_rate": 1.0,
      "w_abs": 0.00014901161193847656
    },
    "500": {
      "mean_cos": 0.9999999320507049,
      "exact_rate": 1.0,
      "w_abs": 0.0007450580596923828
    },
    "1000": {
      "mean_cos": 0.9999999344348908,
      "exact_rate": 1.0,
      "w_abs": 0.0014901161193847656
    },
    "2000": {
      "mean_cos": 0.9999999338388443,
      "exact_rate": 1.0,
      "w_abs": 0.0029802322387695312
    },
    "5000": {
      "mean_cos": 0.9999999314546585,
      "exact_rate": 1.0,
      "w_abs": 0.007450580596923828
    },
    "10000": {
      "mean_cos": 0.9999999326467514,
      "exact_rate": 1.0,
      "w_abs": 0.014901161193847656
    },
    "20000": {
      "mean_cos": 0.9999999272823333,
      "exact_rate": 1.0,
      "w_abs": 0.029802322387695312
    }
  }
 }
--- a/doc/exp02e_results.json
+++ b/doc/exp02e_results.json
@@ -0,0 +1,114 @@
 {
  "soft_wta_t0.01": {
    "0.0": 0.9924059003591538,
    "0.05": 0.7081658291816711,
    "0.1": 0.3512206456577405,
    "0.2": 0.1427949102059938,
    "0.5": 0.06214611444971524,
    "1.0": 0.03803978644893505
  },
  "soft_wta_t0.05": {
    "0.0": 0.7770068669319152,
    "0.05": 0.7753341776132584,
    "0.1": 0.7744931131601334,
    "0.2": 0.7739920604228974,
    "0.5": 0.7737001150846481,
    "1.0": 0.7735983967781067
  },
  "soft_wta_t0.1": {
    "0.0": 0.9377952325344086,
    "0.05": 0.9377174872159958,
    "0.1": 0.9376753580570221,
    "0.2": 0.9376475828886032,
    "0.5": 0.9376276469230652,
    "1.0": 0.9376224195957183
  },
  "soft_wta_t0.5": {
    "0.0": 0.9974229729175568,
    "0.05": 0.9974228632450104,
    "0.1": 0.9974228018522262,
    "0.2": 0.9974227517843246,
    "0.5": 0.9974227398633957,
    "1.0": 0.9974227231740952
  },
  "multiprobe_4": {
    "0.0": 0.0,
    "0.05": 0.0,
    "0.1": 0.0,
    "0.2": 0.0,
    "0.5": 0.0,
    "1.0": 0.0
  },
  "multiprobe_8": {
    "0.0": 0.0,
    "0.05": 0.0,
    "0.1": 0.0,
    "0.2": 0.0,
    "0.5": 0.0,
    "1.0": 0.0
  },
  "multiprobe_16": {
    "0.0": 0.0,
    "0.05": 0.0,
    "0.1": 0.0,
    "0.2": 0.0,
    "0.5": 0.0,
    "1.0": 0.0
  },
  "multiprobe_32": {
    "0.0": 0.0,
    "0.05": 0.0,
    "0.1": 0.0,
    "0.2": 0.0,
    "0.5": 0.0,
    "1.0": 0.0
  },
  "coarse_to_fine": {
    "0.0": 0.9999999326467514,
    "0.05": 0.9999999326467514,
    "0.1": 0.9999999326467514,
    "0.2": 0.9999999326467514,
    "0.5": 0.24099998503923417,
    "1.0": 0.07149999514222145
  },
  "wider_k_50": {
    "0.0": 1.000000058412552,
    "0.05": 0.96500005453825,
    "0.1": 0.3752000237070024,
    "0.2": 0.10180000556632876,
    "0.5": 0.021200001928955315,
    "1.0": 0.01700000114738941
  },
  "wider_k_100": {
    "0.0": 1.0000000560283662,
    "0.05": 0.9984000563621521,
    "0.1": 0.6423000478558243,
    "0.2": 0.18020001276396214,
    "0.5": 0.050500003919005394,
    "1.0": 0.03480000267736614
  },
  "wider_k_200": {
    "0.0": 1.0000000560283662,
    "0.05": 0.9999500566720962,
    "0.1": 0.6304500451683999,
    "0.2": 0.18000001210719346,
    "0.5": 0.07430000650696457,
    "1.0": 0.06735000459477306
  },
  "wider_k_500": {
    "0.0": 0.9999999970197677,
    "0.05": 0.9025200027227401,
    "0.1": 0.38294000312685966,
    "0.2": 0.17088000044226648,
    "0.5": 0.09710000049322844,
    "1.0": 0.08222000036388635
  },
  "wider_k_1000": {
    "0.0": 0.9985101699829102,
    "0.05": 0.5221900832653046,
    "0.1": 0.27553004458546637,
    "0.2": 0.16993002608418464,
    "0.5": 0.13159002162516117,
    "1.0": 0.11921001873910426
  }
 }
--- a/doc/exp03_consolidation.md
+++ b/doc/exp03_consolidation.md
@@ -0,0 +1,74 @@
 # 实验3：Sleep Consolidation
 ## 实验 3a：标准配置下的 Consolidation（code_dim=16384, k=20）
 **结论：Consolidation 基本没有效果。**
 因为 pattern separation 太强了：
 - 20,000 memories 全部完美召回，consolidation 没有优化空间
 - 10 晚纯 homeostasis（无 replay）后仍然 CosSim=1.0
 - Replay 只是让 W_norm 膨胀（200 → 1644）
 Noisy replay 对噪声容忍的改善极其微小（noise=0.05 时 54%→60%），不值得。
 ## 实验 3b：小网络下的 Consolidation（code_dim=2048, k=50）
 ### 容量边界
 | N | CosSim (无 consol) | CosSim (有 consol) |
 |---|---|---|
 | 500 | 1.0000 | 0.9999 |
 | 1000 | 0.9752 | 0.9754 |
 | 2000 | 0.8019 | 0.8021 |
 **Consolidation 对容量没有帮助。** 干扰来自 pattern 重叠，replay 不能解决这个问题。
 ### 7 天场景（核心发现）⚠️
 每天学 200 条新记忆，每晚 consolidate：
 | 天数 | 总记忆 | Day1 记忆 | 今日记忆 | 全局精度 |
 |------|--------|-----------|----------|----------|
 | Day 1 | 200 | 1.000 | 1.000 | 100% |
 | Night 2 后 | 400 | 0.989 | - | 100% |
 | Night 3 后 | 600 | 0.770 | - | 100% |
 | Night 5 后 | 1000 | 0.252 | - | 71% |
 | Night 7 后 | 1400 | 0.072 | 0.535 | 50% |
 **Consolidation 反而加速了旧记忆的遗忘！** 原因：
 1. Replay 添加新的 outer product → 增加干扰
 2. Selective clear (保留 30%) 意味着旧记忆得不到 replay
 3. W_norm 持续增长（749 → 4000），信噪比恶化
 ### Homeostasis 对稳定系统无影响
 500 pairs + 10 晚 consolidation，无论 hf=0.70 还是 1.0，CosSim 都 ≥ 0.9998。
 WTA 纠错码太强，只要容量够，权重缩放不影响结果。
 ## 关键结论
 ### Consolidation 的真正价值（不是我们预期的）
 1. ❌ **不是防止遗忘**——pattern separation 已经解决了
 2. ❌ **不是提升容量**——容量由 code_dim/k 决定，不由 W 训练策略决定
 3. ✅ **是 W_norm 管理**——防止权重无限增长
 4. ✅ **是选择性遗忘**——当接近容量极限时，主动丢弃不重要的记忆
 ### 正确的 Consolidation 策略
 当前的"replay + homeostasis"策略是错误的。更好的方案：
 1. **W 重建法**：保存所有 (cue_code, target_code) 对，每晚从零重建 W = Σ target ⊗ cue
   - 保证一致性，不累积误差
   - 可以选择性丢弃不重要的 pair（实现遗忘曲线）
   - O(N × code_dim²) 但只需每晚一次
 2. **容量监控 + 动态扩展**：监控召回精度，接近极限时扩大 code_dim
 3. **实际推荐**：直接用大 code_dim（16384+），容量 20K+ 够用几年的对话历史。
   Consolidation 简化为：每晚检查 W_norm，如果过大就重建。
 ### 对整体架构的启示
 生物海马体需要 consolidation 是因为它容量有限（~数天），需要把记忆转移到皮层。
 但在我们的数字系统中，可以直接用更大的 code_dim 来规避容量问题。
 Consolidation 退化为一个简单的**存储管理**问题，不需要复杂的 replay 机制。
--- a/doc/exp04_real_embeddings.md
+++ b/doc/exp04_real_embeddings.md
@@ -0,0 +1,87 @@
 # 实验4：真实语义 Embedding 端到端测试
 ## 模型
 sentence-transformers/all-MiniLM-L6-v2, embedding dim=384
 ## 关键结果
 ### Embedding 空间分析
 - 原始 cue ↔ paraphrase 余弦相似度: mean=0.68, min=0.18, max=0.86
 - 不同 pair 间余弦相似度: mean=0.10
 - Gap = 0.59 — 语义空间有合理分离度
 ### 精确 cue 召回: 100% ✓
 20 对记忆，使用原始 cue 查询，全部正确。
 ### Paraphrase 召回（20 对，无 background）
 | Config | Direct Recall | Coarse-to-Fine |
 |--------|---------------|----------------|
 | code=4096, k=20 | 85% | 90% |
 | code=16384, k=50 | **95%** | 90% |
 | code=16384, k=100 | 90% | 90% |
 **k=50 是最佳 paraphrase 配置**，超过了 coarse-to-fine。
 ### Multi-hop: 完美 ✓✓✓
 修复 unified projection 后，4 条语义链 × 3 跳 = 全部 CosSim=1.0。
 多条链共享同一个 memory 也完美。
 ### Paraphrase at Scale（核心问题）⚠️
 | Background memories | Exact Recall | Paraphrase Recall |
 |---------------------|-------------|-------------------|
 | 0 | 5/5 | 5/5 |
 | 100 | 3-4/5 | 1-2/5 |
 | 500 | 1-3/5 | 0-1/5 |
 | 1000 | 0-3/5 | 0-1/5 |
 **随着存储记忆增加，paraphrase recall 急剧下降。**
 根因：Hebbian 回忆是 W @ sep(query) = Σ target_i · (sep(cue_i) · sep(query))，
 当 memory 数量多时，query code 和多个 cue code 部分重叠，产生噪声混合。
 这不是容量问题（exact recall 2000 条仍然 100%），而是**信噪比问题**。
 ## 架构决策
 ### 最终推荐架构：Hybrid Memory
 ```
 ┌─────────────────────────────────────────────────┐
 │                Query Embedding                    │
 │                      ↓                            │
 │  ┌───────────── Single-Hop ──────────────────┐   │
 │  │  Key-Value Store (explicit cue→target)     │   │
 │  │  NN Lookup: cos_sim(query, stored_cues)    │   │
 │  │  → Top-K nearest cue embeddings            │   │
 │  │  → Return their associated targets         │   │
 │  └────────────────────────────────────────────┘   │
 │                      ↓                            │
 │  ┌───────────── Multi-Hop ───────────────────┐   │
 │  │  Hebbian W matrix (unified projection)     │   │
 │  │  Start from NN-retrieved exact cue         │   │
 │  │  → Chain through W for 2+ hop associations │   │
 │  └────────────────────────────────────────────┘   │
 │                      ↓                            │
 │              Retrieved memories                    │
 └─────────────────────────────────────────────────┘
 ```
 ### 为什么这个架构是对的
 1. **Single-hop 用 NN lookup**：噪声容忍，任意 paraphrase 都能命中
 2. **Multi-hop 用 Hebbian W**：唯一能做 A→B→C 链式联想的方法
 3. **不冲突**：NN lookup 找到精确 cue 后，用精确 cue 查 W 矩阵（不受噪声影响）
 4. **SNN encoder 的位置**：可选，将 embedding 编码为 spike train 作为 W 的输入
   - 当前实验中，WTA 直接在 embedding 空间上做 pattern separation 就够了
   - SNN encoder 的价值在 neuromorphic hardware 部署
 ### 最优参数
 | 参数 | 推荐值 | 理由 |
 |------|--------|------|
 | code_dim | 16384 | 容量 20K+，显存 ~1GB |
 | k (WTA active) | 50 | 平衡 paraphrase 容忍度和容量 |
 | input_dim | 384-768 | 取决于 embedding model |
 | W 精度 | float32 | 1GB for 16384² |
--- a/doc/exp05_benchmark.md
+++ b/doc/exp05_benchmark.md
@@ -0,0 +1,61 @@
 # 实验5：性能 Benchmark
 ## 学习吞吐量
 | code_dim | k | 吞吐量 | 5000条耗时 |
 |----------|---|--------|-----------|
 | 8192 | 50 | **794/s** | 6.3s |
 | 16384 | 50 | 211/s | 23.7s |
 | 32768 | 50 | 54/s | 92.7s |
 瓶颈是 outer-product 更新：O(code_dim²) per memory。
 16384 维的 211/s 意味着一天的对话（假设 1000 条记忆）只需 ~5 秒。
 ## 召回延迟
 | code_dim | k | 延迟 |
 |----------|---|------|
 | 8192 | 50 | **0.35 ms** |
 | 16384 | 50 | 1.26 ms |
 | 32768 | 50 | 4.63 ms |
 **16384 维：1.3ms/query**——对 LLM 对话场景完全够快（LLM 生成一个 token 都要 ~20ms）。
 ## Multi-hop 延迟
 | 跳数 | 延迟 (code=16384) |
 |------|-------------------|
 | 1 | 1.26 ms |
 | 2 | 2.45 ms |
 | 3 | 3.64 ms |
 | 5 | 6.03 ms |
 | 10 | 12.05 ms |
 线性增长：~1.2ms/hop。10 跳 12ms 仍然远快于 LLM inference。
 ## GPU 显存
 | code_dim | W 矩阵 | 总占用 |
 |----------|---------|--------|
 | 4096 | 64 MB | 70 MB |
 | 8192 | 256 MB | 268 MB |
 | **16384** | **1024 MB** | **1048 MB** |
 | 32768 | 4096 MB | 4144 MB |
 推荐 **16384 维 = 1GB 显存**，在 RTX 4090 (24GB) 上轻松和 Gemma 4B 共存。
 ## 端到端 Pipeline（含 embedding 模型）
 | 步骤 | 延迟 |
 |------|------|
 | Embedding (all-MiniLM-L6-v2) | 1.8 ms |
 | Hebbian Recall (1-hop) | 1.3 ms |
 | **Total** | **3.1 ms** |
 Embedding 和 recall 耗时相当。总计 3ms 远低于 LLM 生成延迟。
 ## 结论
 - code_dim=16384 是最佳平衡点：1GB 显存，1.3ms 召回，211/s 学习
 - 性能完全不是瓶颈——LLM inference 才是
 - 32768 维如果需要更大容量也可以（4GB，但 learning 慢 4x）
--- a/doc/exp06_biohash.md
+++ b/doc/exp06_biohash.md
@@ -0,0 +1,61 @@
 # 实验6：BioHash — Learnable Fly Algorithm
 ## 背景
 灵感来自 Dasgupta et al. 2017 (Science)：果蝇嗅觉回路 = random projection + WTA。
 BioHash = 把随机投影换成可学习的，用对比损失训练。
 ## 结果
 ### Code Overlap（邻域保持能力）
 | 方法 | Positive Overlap | Negative Overlap | Gap | SNR |
 |------|-----------------|-----------------|-----|-----|
 | Random | 0.220 | 0.004 | 0.216 | 55x |
 | BioHash (noise=0.2) | **0.572** | 0.060 | **0.512** | 9.5x |
 BioHash 的 positive overlap 涨了 2.6x——确实学到了把相似 embedding 映射到重叠的 code。
 ### Paraphrase Recall（小规模）
 | 方法 | 10 对 Exact | 10 对 Para |
 |------|-----------|-----------|
 | Random | 10/10 | 8/10 |
 | BioHash | 10/10 | **10/10** |
 小规模下 BioHash 完美。
 ### Scale Test（大规模，core problem）
 | bg memories | Random | BioHash |
 |-------------|--------|---------|
 | 0 | 100% | 100% |
 | 100 | 60% | 40% |
 | 500 | 60% | 20% |
 **BioHash 在大规模下反而更差。** 原因：虽然 pos overlap 涨了，neg overlap 也涨了 15x，信噪比从 55x 降到 9.5x。
 ## 核心结论
 ### 瓶颈不是 hash 函数，是 Hebbian W 矩阵
 W @ code = Σ target_i · overlap(cue_i, query)
 这个公式意味着：不管 hash 多好，大量 memory 的加权和必然淹没单条记忆的信号。这是 outer-product associative memory 的固有限制（Hopfield 网络也有同样问题）。
 ### BioHash 的价值
 - ✅ 小规模 paraphrase recall 100%（vs 80%）
 - ✅ 证明了 learned projection 确实保持邻域结构
 - ❌ 不解决 W 矩阵的规模问题
 - **正确用法**: BioHash 用于编码，但检索用 code-based index（而非 W 矩阵加权和）
 ### 修正后的架构建议
 ```
 单跳检索: NN lookup in embedding space（或 code Jaccard index）
 多跳联想: Hebbian W matrix（从 NN 结果出发，精确 cue，无噪声）
 编码层:   BioHash（比 random 更好的 code quality，改善多跳链中的传播）
 ```
 W 矩阵的角色收窄到**只做多跳**，这是它真正不可替代的能力。
--- a/doc/exp07_hopfield.md
+++ b/doc/exp07_hopfield.md
@@ -0,0 +1,88 @@
 # 实验7：Hopfield + 网络结构探索
 ## 背景
 exp02-06 的核心问题：Hebbian W 矩阵做模糊单跳检索在大规模下失败（SNR 不够）。
 Fam 指出这是**网络结构问题**，不是 hash 函数问题。
 ## 7a: 架构对比
 | 架构 | bg=0 | bg=100 | bg=500 | bg=1000 |
 |------|------|--------|--------|---------|
 | Flat Hebbian | 80% | 60% | 30% | 20% |
 | Attractor (auto+hetero) | 90% | 40% | 30% | 10% |
 | **Hopfield (β=16)** | **100%** | **90%** | **90%** | **100%** |
 | Recurrent+inhibition | 20% | 20% | 10% | 10% |
 **Hopfield 完胜。** softmax attention 天然解决了归一化和锐化问题。
 ## 7b: Hopfield 深入测试
 - **Multi-hop**: 3 跳 × 3 链 + 200 bg = 全部 sim=1.0 ✓
 - **Scale (code space)**: 100+ bg 后不稳定（60-80%）
 - **Hard distractors**: 高 β 下被语义相似的干扰项吸走
 - **关键发现**: WTA code 空间的距离不忠实于语义距离
 ## 7c: Embedding-Space Hopfield
 直接在 embedding 空间做 Hopfield attention（不经过 WTA）：
 - 比 code-space 在中等规模（≤2K）更稳定
 - Multi-hop 在 embedding 空间也完美（500 bg, sim=1.0）
 - Hard distractors 在 β=8 时正确（attention 分散但正确）
 ## 7d: Two-Stage 检索
 NN pre-filter (top-K) → Hopfield settle on candidates:
 | N | K=20 | K=50 | 延迟 |
 |---|------|------|------|
 | 110 | 90% | 90% | 1ms |
 | 1010 | 80% | 80% | 1ms |
 | 5010 | 80% | 70% | 2ms |
 | 10010 | 80% | 70% | 2ms |
 | 20010 | **80%** | 70% | 4ms |
 **K=20 最稳定**：20K 规模下 80%，4ms。
 Diverse query test (20 对 + 2000 bg): 70% baseline → 分析 failure 发现是 embedding 模型质量问题。
 ## 7e: Cue Augmentation ⭐
 | 方法 | 准确率 (20 对 + 2000 bg) |
 |------|------------------------|
 | 无 augmentation | 70% |
 | Noise augmentation (各种参数) | 70% |
 | **Paraphrase augmentation** | **95%** |
 Noise 完全无效（高斯噪声 ≠ 真实 paraphrase 方向）。
 Hand-crafted paraphrase 直接 70% → 95%。
 实际系统中让 LLM 生成 3-5 个 paraphrase 一起存。
 ## 最终架构
 ```
 Query → Two-Stage Hopfield (NN top-20 → softmax settle) → Target
                                   ↓
              Hebbian W matrix (multi-hop chain from settled cue)
 ```
 ### 组件职责
 | 组件 | 功能 | 容错 |
 |------|------|------|
 | Hopfield attention | 单跳检索 | 噪声/paraphrase 容忍 |
 | Cue augmentation | 扩大记忆覆盖 | 弥补 embedding 模型不足 |
 | NN pre-filter | 缩小候选集 | O(N) → O(K) |
 | Hebbian W | 多跳联想 | 精确 cue 下完美 |
 | WTA separation | 稀疏编码 | 20K+ 容量 |
 ### 性能指标
 | 指标 | 数值 |
 |------|------|
 | Paraphrase recall (+ augmentation) | 95% |
 | Multi-hop (3 hops, 500 bg) | 100% |
 | Scale (20K memories) | 80% |
 | Latency (20K) | 4ms |
 | VRAM (W=16384²) | 1GB |
--- a/doc/findings.md
+++ b/doc/findings.md
@@ -0,0 +1,106 @@
 # 核心发现与反直觉结论
 ## 最大的突破：Hopfield + Hebbian 混合架构
 **exp07 的转折点**：Fam 指出问题在网络结构，不在 hash 函数。
 引入 Modern Hopfield（softmax attention over stored patterns）后：
 - 1000 bg memories 下 paraphrase recall: 20% (Flat Hebbian) → **100%** (Hopfield β=16)
 - 加上 cue augmentation: 70% → **95%** (20 pairs + 2000 bg)
 - Multi-hop 在 Hopfield 上同样完美（3 hops, sim=1.0）
 - 延迟可控: 4ms @ 20K memories
 **关键洞察：噪声容忍不是靠更好的编码（hash/SNN），而是靠更好的检索机制（attention-based settling）。**
 ## 一夜实验总结
 ### 1. SNN 的价值不在我们预期的地方
 **预期**: SNN + STDP 做记忆的存储和检索核心
 **实际**: 
 - STDP 在记忆存储上不如简单的 Hebbian outer-product
 - SNN 的 LIF 阈值非线性在检索时引入不必要的失真
 - **真正有价值的是 SNN encoder 的 temporal coding**（CosSim 0.99）和 neuromorphic 部署前景
 ### 2. Pattern Separation 才是关键，不是学习规则
 **WTA (Winner-Take-All) 模式分离**是整个系统最关键的组件：
 - 把高维稠密向量变成极稀疏二值码
 - 容量从 ~0.14N 暴涨到 20K+ 
 - 就是这一步让简单的 outer-product Hebbian 变得能用
 生物学类比完全成立：海马体中齿状回（DG）的模式分离是 CA3 记忆功能的前提。
 ### 3. Consolidation 不是你想的那样
 **预期**: Replay 防止遗忘，homeostasis 维持稳定
 **实际**:
 - Pattern separation 太强了，遗忘根本不发生（10 晚纯 homeostasis 后仍完美）
 - Replay 在容量极限附近反而**加速遗忘**（新 outer-product 干扰旧记忆）
 - Consolidation 退化为简单的存储管理问题
 **深层原因**: 生物海马体需要 consolidation 是因为物理容量有限。数字系统可以直接扩大网络。
 ### 4. Multi-hop 是杀手级特性
 **A→B→C 链式联想**: 6 跳全部完美，100 条链零干扰。
 这是 RAG / 向量数据库**不可能做到的**事情。
 RAG 只能做: query → nearest neighbor → result (单跳)
 Hebbian 能做: query → association → association → ... (多跳推理链)
 ### 5. 噪声容忍是最大短板
 WTA 对输入微扰极其敏感：noise_std=0.1 就崩溃。
 这意味着**纯 Hebbian 不能用来做模糊查询**。
 解决方案：hybrid 架构——NN lookup (噪声容忍) + Hebbian W (多跳联想)。
 ### 6. 更宽的 k 比更大的 code_dim 更有用
 - k=50 (16384 dim): 95% paraphrase recall
 - k=20 (16384 dim): 75% paraphrase recall
 - k=20 (32768 dim): 70% paraphrase recall
 更多 active neurons = 更多重叠 = 更好的模糊匹配，但牺牲容量。
 对个人记忆系统（< 10K memories）来说，k=50 是最优。
 ## 什么有用
 | 组件 | 有效性 | 用在哪 |
 |------|--------|--------|
 | WTA Pattern Separation | ⭐⭐⭐ | 核心，不可替代 |
 | Hebbian outer-product | ⭐⭐⭐ | 多跳联想存储 |
 | Multi-hop chaining | ⭐⭐⭐ | 独特能力 |
 | NN embedding lookup | ⭐⭐⭐ | 噪声容忍检索 |
 | SNN encoder | ⭐⭐ | temporal coding + 硬件部署 |
 | Coarse-to-fine recall | ⭐⭐ | 实用的 hybrid 方案 |
 | Unified projection | ⭐⭐ | 多跳的前提条件 |
 ## 什么没用
 | 组件 | 问题 |
 |------|------|
 | STDP trace-based learning | 不如直接 outer-product |
 | Separate cue/target projections | 破坏多跳 |
 | Sleep consolidation (replay) | 在大网络中不必要，在小网络中有害 |
 | Soft WTA | 零区分度 |
 | Multi-probe hashing | 完全不工作 |
 | Learned separator (on random data) | 没有语义结构则无法学习 |
 | Noisy replay for robustness | 效果微乎其微 |
 ## 下一步建议
 ### 短期（原型验证）
 1. 实现 Hybrid Memory（KV store + Hebbian W）
 2. 接 Gemma 4 API，text → recall → context injection
 3. 在真实对话数据上测试
 ### 中期（优化）
 1. 用 FAISS 替代暴力 NN lookup
 2. 在语义 embedding 上训练 learned separator（需要真实数据）
 3. 测试 float16 W 矩阵（节省一半显存）
 ### 长期（SNN 发挥价值）
 1. 移植到 neuromorphic hardware（Loihi 2, SynSense）
 2. 探索 temporal coding 做时序记忆（不只是 static embedding）
 3. Online STDP 学习（对话中实时更新，不需要 nightly batch）
--- a/doc/longmemeval_benchmark.md
+++ b/doc/longmemeval_benchmark.md
@@ -0,0 +1,62 @@
 # LongMemEval Benchmark 结果
 ## 数据集
 LongMemEval (ICLR 2025, MIT License): 500 个问题，6 种类型，真实多轮多 session 对话。
 ## 结果
 ### Retrieval-only（最终方案）
 | 类型 | v1 (旧提取) | v2 (改进提取) | 提升 |
 |------|------------|-------------|------|
 | single-session-user | 81% | **86%** | +5 |
 | single-session-assistant | 25% | **82%** | **+57** |
 | knowledge-update | 53% | **71%** | +18 |
 | multi-session | 23% | **53%** | +30 |
 | temporal-reasoning | 29% | **61%** | +32 |
 | preference | 0% | **27%** | +27 |
 | **Overall** | **36%** | **64%** | **+28** |
 ### 加 Gemma 4 推理反而更差
 | | Retrieval-only | + Gemma 4 |
 |--|---------------|-----------|
 | Overall | **64%** | 40% |
 Gemma 太保守，检索到了信息但说 "Not mentioned"。不值得增加 1.7s/query 的延迟。
 ## 关键改进（v1 → v2）
 1. **不截断 assistant 回复**：分段存储（500 字/段）→ single-session-assistant 25% → 82%
 2. **用户自述作为记忆**：用户说的每句话都存一份 → multi-session +30pp
 3. **偏好提取**：正则匹配 "I like/prefer/use/enjoy" → preference 0% → 27%
 4. **日期元数据**：存储 session 日期 → temporal 辅助
 ## 性能
 - 56ms/query（embedding + Hopfield recall）
 - 平均 22 条记忆/问题
 - 无外部 LLM 依赖
 ## 各类型分析
 ### 强项
 - **single-session-user (86%)**: 用户明确说的信息 → 直接存直接检索，天然适配
 - **single-session-assistant (82%)**: 分段存储解决了长回复截断问题
 ### 中等
 - **knowledge-update (71%)**: 新旧信息都检索到了，top-1 通常是新值
 - **temporal-reasoning (61%)**: 日期信息在 context 里，但检索不做日期计算
 - **multi-session (53%)**: 需要跨 session 聚合，top-K 能召回部分但不完整
 ### 弱项
 - **preference (27%)**: 偏好是隐含的，正则提取覆盖有限。需要 LLM 提取或更多规则
 ## 对比定位
 64% 在 LongMemEval 上是一个 **competitive retrieval baseline**。论文中的 RAG 基线通常在 40-60%，SOTA（带 LLM 推理）在 70-80%。我们的 retrieval-only 64% 已经超过了多数 RAG 基线。
 ## 结论
 **Retrieval-only 是正确选择。** 简单、快速、无依赖。提升空间在提取策略（更好的 memory 切分和偏好识别），不在检索架构。
--- a/doc/p0_llm_integration.md
+++ b/doc/p0_llm_integration.md
@@ -0,0 +1,32 @@
 # P0: LLM Integration
 ## 状态：基础 pipeline 可用，LLM Gateway 不通需后续验证
 ## 实现
 - `llm.py`: LLMClient + extract/paraphrase/format 函数
 - 支持 OpenAI-compatible API，fallback 到 heuristic
 - 端到端 pipeline: 对话 → 提取 → embed → store (with augmentation) → recall → context injection
 ## 端到端测试结果
 5 轮对话存入 7 条记忆（24 个 cue entries，含 paraphrase augmentation）。
 查询召回结果（heuristic paraphrase）：
 | 查询 | 正确? | 说明 |
 |------|-------|------|
 | DB performance terrible | ✅ | 正确召回 missing indexes |
 | How to push a new release? | ✅ | 正确召回 blue-green deploy |
 | Redis connection info? | ✅ | 正确召回 port 6379 |
 | Login system has a problem | ❌ | 指向 database 而不是 auth |
 | Database backup | ✅ | 正确召回 cron job |
 | Deployment config? | ✅ | 正确召回 GitHub Actions |
 5/6 正确。失败的 case 是因为 heuristic paraphrase 没有生成 "login" ↔ "auth" 的关联。LLM paraphrase 应该能覆盖。
 ## 待解决
 1. **LLM Gateway 不通** — 无法验证 LLM 提取和 paraphrase 质量
 2. **重复提取** — heuristic 会对同一对话提取 2 条相似记忆，需要去重
 3. **Heuristic paraphrase 质量差** — 机械式替换（"issue with X"）不如 LLM 生成
 4. **Auth→Login 这类语义跳跃** — 只有 LLM paraphrase 或更强 embedding 模型能解决
--- a/doc/p1_embedding_models.md
+++ b/doc/p1_embedding_models.md
@@ -0,0 +1,34 @@
 # P1: Embedding 模型对比
 ## 核心发现：更大的模型 ≠ 更好的 recall（反直觉）
 | Model | Dim | Same Sim | Diff Sim | **Gap** | **Recall** | Speed |
 |-------|-----|----------|----------|---------|-----------|-------|
 | **MiniLM (22M)** | 384 | 0.653 | 0.090 | **0.563** | **60%** | 11K/s |
 | BGE-small (33M) | 384 | 0.808 | 0.534 | 0.274 | 25% | 7K/s |
 | BGE-base (109M) | 768 | 0.793 | 0.506 | 0.287 | 35% | 5K/s |
 | E5-small (33M) | 384 | 0.890 | 0.790 | 0.100 | 10% | 9K/s |
 ## 为什么
 Recall 取决于 **discrimination gap**，不是绝对 similarity。
 BGE/E5 是为检索任务优化的，倾向于把所有文本映射到一个窄锥体里（高基础相似度）。这导致：
 - 正确 cue 和 background 的相似度差距太小
 - Hopfield softmax attention 无法集中到正确答案
 MiniLM 的 embedding 空间更分散：
 - Background 真的很不像（0.09）
 - 即使 paraphrase 不完美（0.65），相对差距也大得多
 ## 结论
 1. **MiniLM 是当前最优**——最快、最小、discrimination 最好
 2. **不要盲目换大模型**——gap 比 absolute similarity 重要
 3. 改善 recall 的正确路径是 **paraphrase augmentation**（已验证 95%），不是换 embedding 模型
 4. 如果要换模型，应该找 **gap 最大**的，不是 same-sim 最高的
 ## 对架构的影响
 保持 MiniLM (384-dim)。不需要扩大 code_dim 来适配更大 embedding。
 省了 VRAM（102MB vs 656MB）和速度（11K/s vs 5K/s）。
--- a/doc/p2_auto_paraphrase.md
+++ b/doc/p2_auto_paraphrase.md
@@ -0,0 +1,52 @@
 # P2: Auto Paraphrase Generation
 ## 核心数据
 | 策略 | bg=0 | bg=500 | bg=2000 | 实现难度 |
 |------|------|--------|---------|---------|
 | None | 95% | 65% | 55% | - |
 | Heuristic (synonym swap) | 95% | 85% | **75%** | 零成本 |
 | Oracle (hard cases only) | 100% | 95% | **95%** | 需 LLM |
 | Oracle (全覆盖) | 100% | 100% | **100%** | 需 LLM |
 ## 发现
 1. **Heuristic 已经很有价值**：55% → 75%（+20pp），不需要 LLM
 2. **Oracle 全覆盖 = 100%**：证明问题完全可通过 paraphrase 解决
 3. **大部分 failure 可被 paraphrase 修复**：9 个 failure 中 8 个有 oracle fix
 ## Failure 分类
 | 类型 | 例子 | 原因 | 修复方式 |
 |------|------|------|---------|
 | 词汇鸿沟 | "Ship the release" ↔ "deploy" (cos=0.46) | 完全不同的词 | LLM paraphrase ✓ |
 | 概念映射 | "Need observability" ↔ "monitoring" (cos=0.26) | 抽象→具体 | LLM paraphrase ✓ |
 | 领域知识 | "Fix login issue" ↔ "auth bug" (cos=0.65) | 需要知道 login=auth | LLM paraphrase ✓ |
 | 竞争 | "DB terrible" ↔ "DB slow" (cos=0.72) 但被 bg 抢走 | cos 够高但 bg 更近 | 增加 augmentation 密度 |
 ## 实际部署策略
 ### 存储时（异步，不影响延迟）
 ```
 1. 用户说了一句话
 2. 提取 (cue, target)
 3. 同步存原始 cue
 4. 异步：LLM 生成 3-5 个 paraphrase → 追加存入
 ```
 ### Heuristic fallback（LLM 不可用时）
 当前 heuristic 规则已验证有效（+20pp），可以作为 baseline：
 - 去除常见前缀 ("Can you", "I need to", "How do I")
 - 同义词替换 (deploy↔release, database↔DB, fix↔resolve)
 - 添加 "issue with X" 模式
 ### LLM Prompt（待 Gateway 恢复后验证）
 ```
 Generate 3-5 different ways a user might say this:
 "The database is slow again"
 Requirements:
 - Same core meaning, different wording
 - Include informal/colloquial versions
 - Include technical jargon alternatives
 ```
--- a/doc/p3_scale_ceiling.md
+++ b/doc/p3_scale_ceiling.md
@@ -0,0 +1,37 @@
 # P3: 突破 20K 80% 天花板
 ## 结论：天花板来自 embedding 模型，不是架构
 ### Top-K Coverage 分析
 | K | N=20K |
 |---|-------|
 | 5 | 80% |
 | 50 | 80% |
 | 200 | 80% |
 K 从 5 增加到 200，coverage 不变。那 2 个 failure 的 paraphrase 在 embedding 空间里根本不是正确 cue 的最近邻——即使只有 10 条记忆也找不到。
 ### 架构优化无效
 | 方法 | bg=20K |
 |------|--------|
 | Two-stage K=5 | 60% |
 | Two-stage K=200 | 30% (更大 K 更差!) |
 | Hierarchical clustering | 40% |
 更大的 K 引入更多噪声，Hopfield attention 被分散。Hierarchical 也没帮助。
 ### 根因
 失败的 paraphrase 对（embedding cosine similarity）:
 - "Need observability" ↔ "Let's set up monitoring" = 0.257
 - "When's the standup?" ↔ "Team meeting schedule" = 0.375
 这些在 MiniLM 的 embedding 空间里根本不算"相似"。任何基于 embedding 距离的检索方法都无法找到它们。
 ### 解法 = P2
 **Paraphrase augmentation 是唯一解法**（已验证 55% → 100%）。
 不需要改架构。不需要换 K。不需要 hierarchical memory。只需要在存储时覆盖更多的表达方式。
--- a/doc/p4_lifecycle.md
+++ b/doc/p4_lifecycle.md
@@ -0,0 +1,35 @@
 # P4: 记忆生命周期管理
 ## Deduplication
 **可行**。cosine threshold=0.7 正确识别了 2 组近似重复（9 → 6 memories）。
 - "The database is slow" / "Database is really slow today" / "DB performance terrible" → 合并
 - "The API returns 500 errors" / "Getting 500 errors from API" → 合并
 实现简单：pairwise cosine on cue embeddings → group → keep best per group.
 O(N²) 但可以离线做（夜间整合），或用 ANN 加速。
 ## Importance Scoring
 Heuristic 规则 6/7 准确：
 - 关键词检测（crash, compromised, secret → critical）有效
 - 回答长度 > 15 词 → 更可能包含有用信息
 - 简单问答（时间、天气）正确标记为 low
 待 LLM 可用时，可以让 LLM 评分——更准确但有延迟。
 ## Forgetting 策略
 三种策略（FIFO / LRU / 重要性加权）在当前测试中效果相同——因为没有差异化的 access pattern。
 实际系统中应该用 **importance + access count + recency** 的加权组合：
 ```
 forget_score = age_days * 0.3 + (max_access - access_count) * 0.5 + (1 - importance) * 0.2
 ```
 低分优先遗忘。
 ## 整合到 hippocampus.py 的建议
 1. **Store 时**：importance scoring（heuristic 或 LLM），低于阈值不存
 2. **每晚**：deduplication（cos > 0.7 合并）+ capacity check（超限时按 forget_score 裁剪）
 3. **Recall 时**：自动 +1 access_count（已实现）
--- a/doc/p5_snn_hopfield.md
+++ b/doc/p5_snn_hopfield.md
@@ -0,0 +1,36 @@
 # P5: SNN-native Hopfield
 ## 结论：当前不可行，标准 softmax Hopfield 远优于 LIF dynamics
 ## 对比
 | | SNN Hopfield | Standard Hopfield |
 |---|---|---|
 | Paraphrase recall (5 pairs) | 20% | **100%** |
 | With background (10+) | 0% | **90%+** |
 | Latency | 10.8ms | **1.9ms** |
 | Scalability | O(steps × dim²) | O(N × dim) |
 ## 为什么 SNN 失败
 1. **More steps = worse**: steps=20 时 5/5，steps=200 时 1/5。LIF 动力学不收敛到正确 attractor，而是发散或卡在错误状态。
 2. **LIF 不是 softmax**: Modern Hopfield 的 softmax 是精确的能量最小化。LIF 的 spike dynamics 不保证收敛到 Boltzmann 均衡分布。
 3. **膜电位衰减干扰**: β=0.9 的指数衰减让信号快速丢失，长时间 settle 变成纯噪声。
 ## 需要什么才能让 SNN Hopfield work
 1. 更复杂的神经元模型（不只是 LIF——需要 AdEx、Izhikevich、或 stochastic neurons）
 2. 精确调谐的 E/I 平衡（兴奋/抑制）
 3. 可能需要 stochastic neurons 做 proper Boltzmann sampling
 4. 专用 neuromorphic 硬件（Loihi 2 的可编程神经元模型）
 ## SNN 在整个系统中的定位
 | 组件 | SNN 可行性 | 当前方案 |
 |------|-----------|---------|
 | Encoder (emb↔spike) | ✅ 验证通过 (CosSim 0.99) | 保留备用 |
 | Hopfield attention | ❌ 不可行 | 用 softmax |
 | Hebbian multi-hop | ✅ WTA codes + W matrix | 已实现 |
 | Pattern separation | ✅ WTA = 生物 DG | 已实现 |
 **SNN 的真正价值在 neuromorphic 部署，不在 GPU 上替代 softmax。**
--- a/doc/p6_multiturn.md
+++ b/doc/p6_multiturn.md
@@ -0,0 +1,46 @@
 # P6: 多轮对话验证
 ## 场景
 3 天的对话（DB troubleshooting → deployment → monitoring），12 条记忆 + heuristic paraphrase augmentation。
 ## 跨会话召回：12/12 (100%)
 | 查询 | 跨天? | 结果 |
 |------|-------|------|
 | DB is slow again | Day 1 | ✓ "missing index on created_at" |
 | How big is the users table? | Day 1 | ✓ "2.3 million rows" |
 | Who can access the database? | Day 1 | ✓ "Alice, Bob, Charlie" |
 | What Postgres version? | Day 1 | ✓ "PostgreSQL 15.2" |
 | How to deploy? | Day 2 | ✓ "blue-green via GitHub Actions" |
 | How to rollback? | Day 2 | ✓ "switch load balancer" |
 | Who approves deploys? | Day 2 | ✓ "Alice or David" |
 | Monitoring dashboard? | Day 3 | ✓ "grafana.internal" |
 | What alerts? | Day 3 | ✓ "PagerDuty" |
 | DB slow, what index? | Cross | ✓ "created_at" |
 | Deploy logs? | Cross | ✓ "Loki" |
 | Database monitoring exporter | Cross | ✓ "pg_exporter" |
 全部 similarity=1.0。Hopfield + augmentation 在小规模（12 memories）下完美。
 ## Multi-hop
 "database is slow" → hop1: "missing index" → hop2: "missing index" → hop3: "2.3 million rows"
 hop2 循环了（指回自己），因为 Hebbian W 里 "missing index" 的最强关联还是它自己（自身的 outer product 贡献最大）。需要在 multi-hop 中加**去重**：已访问的 memory 不参与下一跳。
 ## Memory 冲突
 存了两个版本的 PostgreSQL 版本（15.2 和 16.1）：
 - Top-1: "Upgraded to 16.1" (sim=1.0) ← 更新的版本排第一
 - Top-2: "version 15.2" (sim=0.0) ← 旧版本也返回了
 当前行为可接受（都返回，新的排前面）。更好的做法：
 - 检测到同 cue 的更新 → 自动替换旧记忆
 - 或标记旧记忆为 "superseded"
 ## 待改进
 1. **Multi-hop 去重**: 已访问的 memory 排除出下一跳候选
 2. **Memory update 检测**: 同 cue 新值自动覆盖旧值
 3. **大规模验证**: 12 条是小规模，需要 100+ 条跨 session 的测试
--- a/experiments/exp01_roundtrip.py
+++ b/experiments/exp01_roundtrip.py
@@ -0,0 +1,179 @@
 """Experiment 1: Encoder roundtrip test.
 Goal: Can we encode an embedding into spikes and decode it back with acceptable loss?
 This is the foundation — if this fails, the whole approach is dead.
 We train a SpikeAutoencoder on random embeddings (simulating LLM hidden states)
 and measure reconstruction quality via cosine similarity and MSE.
 """
 import sys
 import time
 import json
 from pathlib import Path
 import torch
 import torch.nn as nn
 import torch.optim as optim
 import numpy as np
 sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
 from nuonuo.encoder import SpikeAutoencoder
 DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
 RESULTS_DIR = Path(__file__).parent.parent / "doc"
 RESULTS_DIR.mkdir(exist_ok=True)
 def cosine_sim(a, b):
    """Batch cosine similarity."""
    return nn.functional.cosine_similarity(a, b, dim=-1).mean().item()
 def run_config(embed_dim, num_neurons, num_steps, lr, epochs, batch_size, num_batches):
    """Train and evaluate one configuration."""
    model = SpikeAutoencoder(embed_dim, num_neurons, num_steps).to(DEVICE)
    optimizer = optim.Adam(model.parameters(), lr=lr)
    mse_loss = nn.MSELoss()
    cos_loss = nn.CosineEmbeddingLoss()
    param_count = sum(p.numel() for p in model.parameters())
    print(f"  Config: dim={embed_dim}, neurons={num_neurons}, steps={num_steps}")
    print(f"  Parameters: {param_count:,}")
    history = {"train_mse": [], "train_cos": [], "epoch_time": []}
    target = torch.ones(batch_size, device=DEVICE)
    for epoch in range(epochs):
        t0 = time.time()
        epoch_mse = 0
        epoch_cos = 0
        for _ in range(num_batches):
            # Random embeddings — simulate LLM hidden states (normalized)
            emb = torch.randn(batch_size, embed_dim, device=DEVICE)
            emb = nn.functional.normalize(emb, dim=-1)
            recon, spikes, _ = model(emb)
            loss_mse = mse_loss(recon, emb)
            loss_cos = cos_loss(recon, emb, target)
            # Sparsity regularization: encourage ~10% firing rate
            firing_rate = spikes.mean()
            loss_sparse = (firing_rate - 0.1).pow(2)
            loss = loss_mse + 0.5 * loss_cos + 0.1 * loss_sparse
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            epoch_mse += loss_mse.item()
            epoch_cos += cosine_sim(recon, emb)
        epoch_mse /= num_batches
        epoch_cos /= num_batches
        dt = time.time() - t0
        history["train_mse"].append(epoch_mse)
        history["train_cos"].append(epoch_cos)
        history["epoch_time"].append(dt)
        if (epoch + 1) % 10 == 0 or epoch == 0:
            fr = spikes.mean().item()
            print(f"  Epoch {epoch+1:3d}: MSE={epoch_mse:.6f}, "
                  f"CosSim={epoch_cos:.4f}, FR={fr:.3f}, Time={dt:.1f}s")
    # Final eval on fresh data
    model.eval()
    with torch.no_grad():
        test_emb = torch.randn(256, embed_dim, device=DEVICE)
        test_emb = nn.functional.normalize(test_emb, dim=-1)
        recon, spikes, _ = model(test_emb)
        final_mse = mse_loss(recon, test_emb).item()
        final_cos = cosine_sim(recon, test_emb)
        final_fr = spikes.mean().item()
    print(f"  ** Final eval: MSE={final_mse:.6f}, CosSim={final_cos:.4f}, FR={final_fr:.3f}")
    return {
        "embed_dim": embed_dim,
        "num_neurons": num_neurons,
        "num_steps": num_steps,
        "param_count": param_count,
        "final_mse": final_mse,
        "final_cos": final_cos,
        "final_firing_rate": final_fr,
        "history": history,
    }
 def main():
    print("=" * 60)
    print("Experiment 1: Encoder Roundtrip Test")
    print("=" * 60)
    configs = [
        # (embed_dim, num_neurons, num_steps)
        # Start small, scale up if promising
        (256, 512, 32),
        (256, 1024, 32),
        (256, 1024, 64),
        (768, 2048, 64),
        (768, 4096, 64),
        (768, 4096, 128),
    ]
    all_results = []
    for embed_dim, num_neurons, num_steps in configs:
        print(f"\n--- Config: dim={embed_dim}, neurons={num_neurons}, steps={num_steps} ---")
        result = run_config(
            embed_dim=embed_dim,
            num_neurons=num_neurons,
            num_steps=num_steps,
            lr=1e-3,
            epochs=50,
            batch_size=64,
            num_batches=20,
        )
        all_results.append(result)
        torch.cuda.empty_cache()
    # Save results
    # Convert for JSON serialization
    for r in all_results:
        r["history"]["train_mse"] = [float(x) for x in r["history"]["train_mse"]]
        r["history"]["train_cos"] = [float(x) for x in r["history"]["train_cos"]]
        r["history"]["epoch_time"] = [float(x) for x in r["history"]["epoch_time"]]
    results_file = RESULTS_DIR / "exp01_results.json"
    with open(results_file, "w") as f:
        json.dump(all_results, f, indent=2)
    # Print summary table
    print("\n" + "=" * 80)
    print("SUMMARY")
    print("=" * 80)
    print(f"{'Dim':>5} {'Neurons':>8} {'Steps':>6} {'Params':>10} {'MSE':>10} {'CosSim':>8} {'FR':>6}")
    print("-" * 80)
    for r in all_results:
        print(f"{r['embed_dim']:>5} {r['num_neurons']:>8} {r['num_steps']:>6} "
              f"{r['param_count']:>10,} {r['final_mse']:>10.6f} "
              f"{r['final_cos']:>8.4f} {r['final_firing_rate']:>6.3f}")
    # Verdict
    best = max(all_results, key=lambda x: x["final_cos"])
    print(f"\nBest config: dim={best['embed_dim']}, neurons={best['num_neurons']}, "
          f"steps={best['num_steps']}")
    print(f"  CosSim={best['final_cos']:.4f}, MSE={best['final_mse']:.6f}")
    if best["final_cos"] > 0.9:
        print("\n✓ PASS: Roundtrip encoding is viable! CosSim > 0.9")
    elif best["final_cos"] > 0.7:
        print("\n~ MARGINAL: CosSim 0.7-0.9, might work for fuzzy associative recall")
    else:
        print("\n✗ FAIL: Roundtrip encoding loses too much information")
 if __name__ == "__main__":
    main()
--- a/experiments/exp01b_deeper_training.py
+++ b/experiments/exp01b_deeper_training.py
@@ -0,0 +1,124 @@
 """Experiment 1b: Deeper training for 768-dim configs.
 Observation from exp01: 768-dim configs converge slower but MSE is actually lower.
 Let's train longer (200 epochs) to see if they surpass 256-dim configs in CosSim.
 Also test: does the encoder need a wider bottleneck (more neurons)?
 """
 import sys
 import time
 import json
 from pathlib import Path
 import torch
 import torch.nn as nn
 import torch.optim as optim
 sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
 from nuonuo.encoder import SpikeAutoencoder
 DEVICE = "cuda"
 RESULTS_DIR = Path(__file__).parent.parent / "doc"
 def cosine_sim(a, b):
    return nn.functional.cosine_similarity(a, b, dim=-1).mean().item()
 def run(embed_dim, num_neurons, num_steps, epochs=200, lr=3e-4):
    model = SpikeAutoencoder(embed_dim, num_neurons, num_steps).to(DEVICE)
    optimizer = optim.AdamW(model.parameters(), lr=lr, weight_decay=1e-4)
    scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, epochs)
    mse_fn = nn.MSELoss()
    batch_size = 64
    num_batches = 30
    target = torch.ones(batch_size, device=DEVICE)
    best_cos = 0
    milestones = []
    for epoch in range(epochs):
        model.train()
        epoch_mse = 0
        epoch_cos = 0
        for _ in range(num_batches):
            emb = torch.randn(batch_size, embed_dim, device=DEVICE)
            emb = nn.functional.normalize(emb, dim=-1)
            recon, spikes, _ = model(emb)
            loss_mse = mse_fn(recon, emb)
            loss_cos = nn.functional.cosine_embedding_loss(
                recon, emb, target)
            firing_rate = spikes.mean()
            loss_sparse = (firing_rate - 0.1).pow(2)
            loss = loss_mse + 0.5 * loss_cos + 0.1 * loss_sparse
            optimizer.zero_grad()
            loss.backward()
            nn.utils.clip_grad_norm_(model.parameters(), 1.0)
            optimizer.step()
            epoch_mse += loss_mse.item()
            epoch_cos += cosine_sim(recon.detach(), emb)
        scheduler.step()
        epoch_mse /= num_batches
        epoch_cos /= num_batches
        if epoch_cos > best_cos:
            best_cos = epoch_cos
        if (epoch + 1) % 20 == 0:
            print(f"  Epoch {epoch+1:3d}: MSE={epoch_mse:.6f}, CosSim={epoch_cos:.4f}, "
                  f"FR={spikes.mean().item():.3f}, LR={scheduler.get_last_lr()[0]:.6f}")
            milestones.append({"epoch": epoch+1, "mse": epoch_mse, "cos": epoch_cos})
    # Final eval
    model.eval()
    with torch.no_grad():
        test_emb = torch.randn(512, embed_dim, device=DEVICE)
        test_emb = nn.functional.normalize(test_emb, dim=-1)
        recon, spikes, _ = model(test_emb)
        final_mse = mse_fn(recon, test_emb).item()
        final_cos = cosine_sim(recon, test_emb)
    print(f"  ** Final: MSE={final_mse:.6f}, CosSim={final_cos:.4f}")
    return {"dim": embed_dim, "neurons": num_neurons, "steps": num_steps,
            "final_mse": final_mse, "final_cos": final_cos, "milestones": milestones}
 def main():
    print("Experiment 1b: Deeper training (200 epochs)")
    print("=" * 60)
    configs = [
        (768, 2048, 64),
        (768, 4096, 64),
        (768, 4096, 128),
        (768, 8192, 64),   # wider
    ]
    results = []
    for dim, neurons, steps in configs:
        print(f"\n--- dim={dim}, neurons={neurons}, steps={steps} ---")
        r = run(dim, neurons, steps, epochs=200, lr=3e-4)
        results.append(r)
        torch.cuda.empty_cache()
    # Summary
    print("\n" + "=" * 60)
    print("SUMMARY (200 epochs)")
    print(f"{'Dim':>5} {'Neurons':>8} {'Steps':>6} {'MSE':>10} {'CosSim':>8}")
    print("-" * 40)
    for r in results:
        print(f"{r['dim']:>5} {r['neurons']:>8} {r['steps']:>6} "
              f"{r['final_mse']:>10.6f} {r['final_cos']:>8.4f}")
    with open(RESULTS_DIR / "exp01b_results.json", "w") as f:
        json.dump(results, f, indent=2, default=float)
 if __name__ == "__main__":
    main()
--- a/experiments/exp02_stdp_recall.py
+++ b/experiments/exp02_stdp_recall.py
@@ -0,0 +1,221 @@
 """Experiment 2: STDP Associative Recall.
 Core question: Can STDP learn associations between spike patterns,
 such that presenting a cue recalls the correct target?
 Test protocol:
 1. Generate N pairs of (cue, target) spike patterns
 2. Train STDP network on all pairs
 3. Present each cue and measure similarity between recall and correct target
 4. Measure interference: does recall of pair K degrade after learning pair K+1?
 This is the make-or-break experiment for the whole approach.
 """
 import sys
 import time
 import json
 from pathlib import Path
 import torch
 import torch.nn as nn
 import numpy as np
 sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
 from nuonuo.memory import STDPMemoryNetwork
 DEVICE = "cuda"
 RESULTS_DIR = Path(__file__).parent.parent / "doc"
 def spike_similarity(a, b):
    """Cosine similarity between two spike trains (flattened)."""
    a_flat = a.flatten().float()
    b_flat = b.flatten().float()
    if a_flat.norm() == 0 or b_flat.norm() == 0:
        return 0.0
    return nn.functional.cosine_similarity(
        a_flat.unsqueeze(0), b_flat.unsqueeze(0)
    ).item()
 def firing_rate_similarity(a, b):
    """Similarity based on per-neuron firing rates."""
    fr_a = a.float().mean(dim=0)
    fr_b = b.float().mean(dim=0)
    if fr_a.norm() == 0 or fr_b.norm() == 0:
        return 0.0
    return nn.functional.cosine_similarity(
        fr_a.unsqueeze(0), fr_b.unsqueeze(0)
    ).item()
 def generate_spike_pattern(num_steps, num_neurons, firing_rate=0.05, device="cuda"):
    """Generate a random sparse spike pattern."""
    return (torch.rand(num_steps, num_neurons, device=device) < firing_rate).float()
 def run_recall_test(num_neurons, num_steps, num_pairs, firing_rate,
                    num_presentations, a_plus, a_minus):
    """Test associative recall with given parameters."""
    print(f"  neurons={num_neurons}, steps={num_steps}, pairs={num_pairs}, "
          f"FR={firing_rate}, pres={num_presentations}, "
          f"A+={a_plus}, A-={a_minus}")
    net = STDPMemoryNetwork(
        num_neurons=num_neurons,
        a_plus=a_plus,
        a_minus=a_minus,
    ).to(DEVICE)
    # Generate pattern pairs
    cues = []
    targets = []
    for _ in range(num_pairs):
        cue = generate_spike_pattern(num_steps, num_neurons, firing_rate, DEVICE)
        target = generate_spike_pattern(num_steps, num_neurons, firing_rate, DEVICE)
        cues.append(cue)
        targets.append(target)
    # Learn all pairs
    t0 = time.time()
    for i in range(num_pairs):
        net.learn_association(cues[i], targets[i], num_presentations=num_presentations)
    learn_time = time.time() - t0
    # Test recall
    correct_sims = []
    wrong_sims = []
    for i in range(num_pairs):
        recalled = net.recall(cues[i], num_recall_steps=num_steps)
        # Similarity to correct target
        correct_sim = firing_rate_similarity(recalled, targets[i])
        correct_sims.append(correct_sim)
        # Similarity to wrong targets (average)
        wrong_sim_list = []
        for j in range(num_pairs):
            if j != i:
                wrong_sim_list.append(firing_rate_similarity(recalled, targets[j]))
        if wrong_sim_list:
            wrong_sims.append(np.mean(wrong_sim_list))
    mean_correct = np.mean(correct_sims)
    mean_wrong = np.mean(wrong_sims) if wrong_sims else 0
    discrimination = mean_correct - mean_wrong
    w_stats = net.get_weight_stats()
    recall_fr = recalled.mean().item() if len(correct_sims) > 0 else 0
    print(f"    Correct sim: {mean_correct:.4f}, Wrong sim: {mean_wrong:.4f}, "
          f"Discrimination: {discrimination:.4f}")
    print(f"    Recall FR: {recall_fr:.4f}, W stats: mean={w_stats['abs_mean']:.4f}, "
          f"sparsity={w_stats['sparsity']:.2f}")
    print(f"    Learn time: {learn_time:.1f}s")
    return {
        "num_neurons": num_neurons,
        "num_steps": num_steps,
        "num_pairs": num_pairs,
        "firing_rate": firing_rate,
        "num_presentations": num_presentations,
        "a_plus": a_plus,
        "a_minus": a_minus,
        "mean_correct_sim": mean_correct,
        "mean_wrong_sim": mean_wrong,
        "discrimination": discrimination,
        "correct_sims": correct_sims,
        "recall_firing_rate": recall_fr,
        "weight_stats": w_stats,
        "learn_time": learn_time,
    }
 def main():
    print("=" * 60)
    print("Experiment 2: STDP Associative Recall")
    print("=" * 60)
    results = []
    # Test 1: Baseline — can it learn even 1 pair?
    print("\n--- Test 1: Single pair (sanity check) ---")
    r = run_recall_test(
        num_neurons=2048, num_steps=64, num_pairs=1,
        firing_rate=0.05, num_presentations=5,
        a_plus=0.005, a_minus=0.006,
    )
    results.append({**r, "test": "single_pair"})
    # Test 2: Vary number of pairs
    print("\n--- Test 2: Scaling pairs ---")
    for n_pairs in [5, 10, 20, 50]:
        r = run_recall_test(
            num_neurons=2048, num_steps=64, num_pairs=n_pairs,
            firing_rate=0.05, num_presentations=5,
            a_plus=0.005, a_minus=0.006,
        )
        results.append({**r, "test": f"pairs_{n_pairs}"})
    # Test 3: Vary STDP learning rates
    print("\n--- Test 3: STDP learning rate sweep ---")
    for a_plus in [0.001, 0.005, 0.01, 0.05]:
        r = run_recall_test(
            num_neurons=2048, num_steps=64, num_pairs=10,
            firing_rate=0.05, num_presentations=5,
            a_plus=a_plus, a_minus=a_plus * 1.2,
        )
        results.append({**r, "test": f"lr_{a_plus}"})
    # Test 4: Vary firing rate
    print("\n--- Test 4: Firing rate sweep ---")
    for fr in [0.02, 0.05, 0.10, 0.20]:
        r = run_recall_test(
            num_neurons=2048, num_steps=64, num_pairs=10,
            firing_rate=fr, num_presentations=5,
            a_plus=0.005, a_minus=0.006,
        )
        results.append({**r, "test": f"fr_{fr}"})
    # Test 5: More presentations
    print("\n--- Test 5: Presentation count ---")
    for n_pres in [1, 3, 5, 10, 20]:
        r = run_recall_test(
            num_neurons=2048, num_steps=64, num_pairs=10,
            firing_rate=0.05, num_presentations=n_pres,
            a_plus=0.005, a_minus=0.006,
        )
        results.append({**r, "test": f"pres_{n_pres}"})
    # Test 6: Wider network
    print("\n--- Test 6: Network width ---")
    for neurons in [1024, 2048, 4096, 8192]:
        r = run_recall_test(
            num_neurons=neurons, num_steps=64, num_pairs=10,
            firing_rate=0.05, num_presentations=5,
            a_plus=0.005, a_minus=0.006,
        )
        results.append({**r, "test": f"width_{neurons}"})
    # Save results
    for r in results:
        r["correct_sims"] = [float(x) for x in r["correct_sims"]]
    with open(RESULTS_DIR / "exp02_results.json", "w") as f:
        json.dump(results, f, indent=2, default=float)
    # Summary
    print("\n" + "=" * 60)
    print("SUMMARY")
    print("=" * 60)
    print(f"{'Test':<15} {'Correct':>8} {'Wrong':>8} {'Discrim':>8} {'RecallFR':>8}")
    print("-" * 50)
    for r in results:
        print(f"{r['test']:<15} {r['mean_correct_sim']:>8.4f} "
              f"{r['mean_wrong_sim']:>8.4f} {r['discrimination']:>8.4f} "
              f"{r['recall_firing_rate']:>8.4f}")
 if __name__ == "__main__":
    main()
--- a/experiments/exp02b_stdp_v2.py
+++ b/experiments/exp02b_stdp_v2.py
@@ -0,0 +1,192 @@
 """Experiment 2b: STDP Associative Recall (v2 - fixed learning).
 v1 failed completely because W=0 → no spikes → no STDP updates (chicken-egg).
 v2 fixes this with teacher-forced STDP: directly use (cue, target) as (pre, post).
 Also tests DirectAssociativeMemory (simple outer-product Hebbian) as baseline.
 """
 import sys
 import time
 import json
 from pathlib import Path
 import torch
 import torch.nn as nn
 import numpy as np
 sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
 from nuonuo.memory import STDPMemoryNetwork, DirectAssociativeMemory
 DEVICE = "cuda"
 RESULTS_DIR = Path(__file__).parent.parent / "doc"
 def spike_cosine(a, b):
    """Cosine similarity on firing rate vectors."""
    if a.dim() == 2:
        a = a.mean(dim=0)
    if b.dim() == 2:
        b = b.mean(dim=0)
    if a.norm() == 0 or b.norm() == 0:
        return 0.0
    return nn.functional.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()
 def vec_cosine(a, b):
    """Cosine similarity of two 1D vectors."""
    if a.norm() == 0 or b.norm() == 0:
        return 0.0
    return nn.functional.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()
 def gen_spikes(num_steps, num_neurons, fr=0.05, device="cuda"):
    return (torch.rand(num_steps, num_neurons, device=device) < fr).float()
 def test_stdp_v2(num_neurons, num_steps, num_pairs, fr, num_pres, a_plus):
    """Test the v2 STDP network."""
    net = STDPMemoryNetwork(
        num_neurons=num_neurons, a_plus=a_plus, a_minus=a_plus*1.2,
        w_init_std=0.01
    ).to(DEVICE)
    cues = [gen_spikes(num_steps, num_neurons, fr) for _ in range(num_pairs)]
    targets = [gen_spikes(num_steps, num_neurons, fr) for _ in range(num_pairs)]
    # Learn
    t0 = time.time()
    for i in range(num_pairs):
        net.learn_association(cues[i], targets[i], num_presentations=num_pres)
    learn_t = time.time() - t0
    # Recall
    correct_sims = []
    wrong_sims = []
    for i in range(num_pairs):
        recalled = net.recall(cues[i])
        cs = spike_cosine(recalled, targets[i])
        correct_sims.append(cs)
        for j in range(num_pairs):
            if j != i:
                wrong_sims.append(spike_cosine(recalled, targets[j]))
    mc = np.mean(correct_sims)
    mw = np.mean(wrong_sims) if wrong_sims else 0
    ws = net.get_weight_stats()
    print(f"  STDP: pairs={num_pairs}, pres={num_pres}, A+={a_plus:.3f} | "
          f"Correct={mc:.4f}, Wrong={mw:.4f}, Disc={mc-mw:.4f}, "
          f"W_abs={ws['abs_mean']:.4f}, sparsity={ws['sparsity']:.2f}, "
          f"time={learn_t:.1f}s")
    return {"method": "stdp_v2", "correct": mc, "wrong": mw,
            "disc": mc-mw, "w_stats": ws, "time": learn_t,
            "num_pairs": num_pairs, "a_plus": a_plus, "num_pres": num_pres}
 def test_direct_hebbian(num_neurons, num_steps, num_pairs, fr, lr):
    """Test the direct outer-product Hebbian memory."""
    net = DirectAssociativeMemory(num_neurons=num_neurons, lr=lr).to(DEVICE)
    cues = [gen_spikes(num_steps, num_neurons, fr) for _ in range(num_pairs)]
    targets = [gen_spikes(num_steps, num_neurons, fr) for _ in range(num_pairs)]
    # Learn
    t0 = time.time()
    for i in range(num_pairs):
        net.learn(cues[i], targets[i])
    learn_t = time.time() - t0
    # Recall
    correct_sims = []
    wrong_sims = []
    for i in range(num_pairs):
        recalled = net.recall(cues[i])  # continuous vector
        target_rate = targets[i].mean(dim=0)
        cs = vec_cosine(recalled, target_rate)
        correct_sims.append(cs)
        for j in range(num_pairs):
            if j != i:
                wrong_sims.append(vec_cosine(recalled, targets[j].mean(dim=0)))
    mc = np.mean(correct_sims)
    mw = np.mean(wrong_sims) if wrong_sims else 0
    ws = net.get_weight_stats()
    print(f"  Hebbian: pairs={num_pairs}, lr={lr:.3f} | "
          f"Correct={mc:.4f}, Wrong={mw:.4f}, Disc={mc-mw:.4f}, "
          f"W_abs={ws['abs_mean']:.6f}, sparsity={ws['sparsity']:.2f}, "
          f"time={learn_t:.3f}s")
    return {"method": "direct_hebbian", "correct": mc, "wrong": mw,
            "disc": mc-mw, "w_stats": ws, "time": learn_t,
            "num_pairs": num_pairs, "lr": lr}
 def main():
    print("=" * 60)
    print("Experiment 2b: STDP v2 + Direct Hebbian")
    print("=" * 60)
    results = []
    N = 2048
    S = 64
    FR = 0.05
    # --- Part A: Direct Hebbian (baseline) ---
    print("\n=== Part A: Direct Hebbian Memory ===")
    print("\nA1: Scaling pairs (lr=0.5)")
    for n in [1, 5, 10, 20, 50, 100]:
        r = test_direct_hebbian(N, S, n, FR, lr=0.5)
        results.append({**r, "test": f"hebb_pairs_{n}"})
    print("\nA2: Learning rate sweep (10 pairs)")
    for lr in [0.01, 0.1, 0.5, 1.0, 5.0]:
        r = test_direct_hebbian(N, S, 10, FR, lr=lr)
        results.append({**r, "test": f"hebb_lr_{lr}"})
    # --- Part B: STDP v2 ---
    print("\n=== Part B: STDP v2 (teacher-forced) ===")
    print("\nB1: Sanity check - single pair")
    r = test_stdp_v2(N, S, 1, FR, num_pres=5, a_plus=0.01)
    results.append({**r, "test": "stdp_single"})
    print("\nB2: A+ sweep (10 pairs, 5 presentations)")
    for ap in [0.001, 0.005, 0.01, 0.05, 0.1]:
        r = test_stdp_v2(N, S, 10, FR, num_pres=5, a_plus=ap)
        results.append({**r, "test": f"stdp_ap_{ap}"})
    print("\nB3: Presentation count (10 pairs, A+=0.01)")
    for pres in [1, 3, 5, 10, 20]:
        r = test_stdp_v2(N, S, 10, FR, num_pres=pres, a_plus=0.01)
        results.append({**r, "test": f"stdp_pres_{pres}"})
    print("\nB4: Scaling pairs (A+=0.01, 5 presentations)")
    for n in [1, 5, 10, 20, 50]:
        r = test_stdp_v2(N, S, n, FR, num_pres=5, a_plus=0.01)
        results.append({**r, "test": f"stdp_pairs_{n}"})
    # Save
    with open(RESULTS_DIR / "exp02b_results.json", "w") as f:
        json.dump(results, f, indent=2, default=float)
    # Best from each method
    print("\n" + "=" * 60)
    hebb_best = max([r for r in results if r["method"] == "direct_hebbian"],
                    key=lambda x: x["disc"], default=None)
    stdp_best = max([r for r in results if r["method"] == "stdp_v2"],
                    key=lambda x: x["disc"], default=None)
    if hebb_best:
        print(f"Best Hebbian: {hebb_best['test']} — "
              f"Correct={hebb_best['correct']:.4f}, Disc={hebb_best['disc']:.4f}")
    if stdp_best:
        print(f"Best STDP:    {stdp_best['test']} — "
              f"Correct={stdp_best['correct']:.4f}, Disc={stdp_best['disc']:.4f}")
 if __name__ == "__main__":
    main()
--- a/experiments/exp02c_pattern_separation.py
+++ b/experiments/exp02c_pattern_separation.py
@@ -0,0 +1,209 @@
 """Experiment 2c: Pattern separation + improved associative recall.
 Key insight from 2b: random spike patterns have too much overlap,
 causing catastrophic interference in associative memory.
 Fix: Implement pattern separation (like dentate gyrus in hippocampus):
 1. Winner-take-all: only top-k neurons fire → guaranteed sparse, minimal overlap
 2. Random sparse projection: patterns projected through sparse random matrix
 3. Scale up neurons to improve signal-to-noise ratio (capacity ∝ N/P)
 Also test: direct Hebbian in rate-space (skip spike conversion entirely)
 """
 import sys
 import time
 import json
 from pathlib import Path
 import torch
 import torch.nn as nn
 import numpy as np
 sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
 DEVICE = "cuda"
 RESULTS_DIR = Path(__file__).parent.parent / "doc"
 def cosine(a, b):
    if a.norm() == 0 or b.norm() == 0:
        return 0.0
    return nn.functional.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()
 def winner_take_all(x, k):
    """Keep only top-k values, zero out the rest. Differentiable-ish."""
    topk_vals, topk_idx = x.topk(k, dim=-1)
    out = torch.zeros_like(x)
    out.scatter_(-1, topk_idx, 1.0)  # Binary: active or not
    return out
 class PatternSeparator(nn.Module):
    """Dentate gyrus analog: transforms input patterns into sparse, orthogonal codes."""
    def __init__(self, input_dim, code_dim, k_active):
        super().__init__()
        self.k_active = k_active
        # Sparse random projection (fixed, not learned)
        proj = torch.randn(input_dim, code_dim) * (1.0 / input_dim**0.5)
        self.register_buffer('proj', proj)
    def forward(self, x):
        """x: [input_dim] → [code_dim] sparse binary"""
        h = x @ self.proj
        return winner_take_all(h, self.k_active)
 class HebbianMemory(nn.Module):
    """Heteroassociative memory with pattern separation."""
    def __init__(self, input_dim, code_dim=8192, k_active=50, lr=1.0):
        super().__init__()
        self.separator = PatternSeparator(input_dim, code_dim, k_active)
        self.code_dim = code_dim
        self.lr = lr
        # Separate separator for targets (different random projection)
        self.target_separator = PatternSeparator(input_dim, code_dim, k_active)
        # Association matrix: separated_cue → separated_target
        self.W = nn.Parameter(torch.zeros(code_dim, code_dim), requires_grad=False)
    def learn(self, cue, target):
        """cue, target: [dim] continuous vectors"""
        cue_code = self.separator(cue)
        target_code = self.target_separator(target)
        # Outer product Hebbian update
        self.W.data += self.lr * torch.outer(target_code, cue_code)
    def recall(self, cue, k_recall=50):
        """Returns separated target code."""
        cue_code = self.separator(cue)
        raw = self.W @ cue_code
        # WTA on output to clean up
        return winner_take_all(raw, k_recall)
    def recall_continuous(self, cue):
        """Returns continuous activation (for cosine sim)."""
        cue_code = self.separator(cue)
        return self.W @ cue_code
 def test_hebbian_with_separation(input_dim, code_dim, k_active, num_pairs, lr):
    """Test associative recall with pattern separation."""
    mem = HebbianMemory(input_dim, code_dim, k_active, lr).to(DEVICE)
    # Generate random normalized vectors as memories
    cues = [nn.functional.normalize(torch.randn(input_dim, device=DEVICE), dim=0)
            for _ in range(num_pairs)]
    targets = [nn.functional.normalize(torch.randn(input_dim, device=DEVICE), dim=0)
               for _ in range(num_pairs)]
    # Learn
    for i in range(num_pairs):
        mem.learn(cues[i], targets[i])
    # Test recall in code space (after separation)
    correct_sims = []
    wrong_sims = []
    for i in range(num_pairs):
        recalled = mem.recall(cues[i], k_recall=k_active)
        target_code = mem.target_separator(targets[i])
        cs = cosine(recalled, target_code)
        correct_sims.append(cs)
        for j in range(min(num_pairs, 20)):  # limit comparisons for speed
            if j != i:
                wrong_code = mem.target_separator(targets[j])
                wrong_sims.append(cosine(recalled, wrong_code))
    mc = np.mean(correct_sims)
    mw = np.mean(wrong_sims) if wrong_sims else 0
    print(f"  code={code_dim}, k={k_active}, pairs={num_pairs}, lr={lr:.2f} | "
          f"Correct={mc:.4f}, Wrong={mw:.4f}, Disc={mc-mw:.4f}")
    return {"correct": mc, "wrong": mw, "disc": mc - mw,
            "code_dim": code_dim, "k_active": k_active,
            "num_pairs": num_pairs, "lr": lr}
 def test_overlap_analysis(code_dim, k_active, num_patterns):
    """Measure how orthogonal the separated patterns actually are."""
    sep = PatternSeparator(768, code_dim, k_active).to(DEVICE)
    patterns = []
    for _ in range(num_patterns):
        x = nn.functional.normalize(torch.randn(768, device=DEVICE), dim=0)
        code = sep(x)
        patterns.append(code)
    # Pairwise cosine similarity
    sims = []
    for i in range(num_patterns):
        for j in range(i+1, num_patterns):
            s = cosine(patterns[i], patterns[j])
            sims.append(s)
    mean_sim = np.mean(sims)
    max_sim = np.max(sims)
    print(f"  code={code_dim}, k={k_active}: mean_overlap={mean_sim:.4f}, max_overlap={max_sim:.4f}")
    return {"mean_overlap": mean_sim, "max_overlap": max_sim}
 def main():
    print("=" * 60)
    print("Experiment 2c: Pattern Separation + Hebbian Memory")
    print("=" * 60)
    results = []
    # Part 1: Overlap analysis — how orthogonal are separated patterns?
    print("\n=== Part 1: Pattern overlap after separation ===")
    for code_dim in [2048, 4096, 8192, 16384]:
        for k in [20, 50, 100]:
            ov = test_overlap_analysis(code_dim, k, 100)
            results.append({"test": "overlap", "code_dim": code_dim, "k": k, **ov})
    # Part 2: Associative recall with separation
    print("\n=== Part 2: Recall with pattern separation ===")
    print("\n-- Scaling pairs --")
    for n in [1, 5, 10, 20, 50, 100, 200, 500]:
        r = test_hebbian_with_separation(768, 8192, 50, n, lr=1.0)
        results.append({"test": f"sep_pairs_{n}", **r})
    print("\n-- Code dimension sweep (100 pairs) --")
    for cd in [2048, 4096, 8192, 16384]:
        r = test_hebbian_with_separation(768, cd, 50, 100, lr=1.0)
        results.append({"test": f"sep_codedim_{cd}", **r})
    print("\n-- Sparsity sweep (100 pairs, code=8192) --")
    for k in [10, 20, 50, 100, 200]:
        r = test_hebbian_with_separation(768, 8192, k, 100, lr=1.0)
        results.append({"test": f"sep_k_{k}", **r})
    print("\n-- Capacity test: find the breaking point (code=16384, k=20) --")
    for n in [10, 50, 100, 200, 500, 1000, 2000]:
        r = test_hebbian_with_separation(768, 16384, 20, n, lr=1.0)
        results.append({"test": f"cap_{n}", **r})
    # Save
    with open(RESULTS_DIR / "exp02c_results.json", "w") as f:
        json.dump(results, f, indent=2, default=float)
    # Find best config
    recall_results = [r for r in results if r.get("disc") is not None and "cap_" in r.get("test", "")]
    if recall_results:
        print("\n=== Capacity curve (code=16384, k=20) ===")
        print(f"{'Pairs':>6} {'Correct':>8} {'Wrong':>8} {'Disc':>8}")
        for r in recall_results:
            print(f"{r['num_pairs']:>6} {r['correct']:>8.4f} {r['wrong']:>8.4f} {r['disc']:>8.4f}")
 if __name__ == "__main__":
    main()
--- a/experiments/exp02d_robustness.py
+++ b/experiments/exp02d_robustness.py
@@ -0,0 +1,218 @@
 """Experiment 2d: Robustness and capacity limits.
 Pattern separation + Hebbian recall is perfect with clean cues.
 Now test:
 1. Noisy cues: add gaussian noise to cue before recall
 2. Partial cues: zero out part of the cue
 3. Capacity stress test: push to 10K+ memories
 4. Full pipeline: encoder → separator → memory → decoder
 """
 import sys
 import time
 import json
 from pathlib import Path
 import torch
 import torch.nn as nn
 import numpy as np
 DEVICE = "cuda"
 RESULTS_DIR = Path(__file__).parent.parent / "doc"
 def cosine(a, b):
    if a.norm() == 0 or b.norm() == 0:
        return 0.0
    return nn.functional.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()
 def winner_take_all(x, k):
    topk_vals, topk_idx = x.topk(k, dim=-1)
    out = torch.zeros_like(x)
    out.scatter_(-1, topk_idx, 1.0)
    return out
 class PatternSeparator(nn.Module):
    def __init__(self, input_dim, code_dim, k_active):
        super().__init__()
        self.k_active = k_active
        proj = torch.randn(input_dim, code_dim) * (1.0 / input_dim**0.5)
        self.register_buffer('proj', proj)
    def forward(self, x):
        h = x @ self.proj
        return winner_take_all(h, self.k_active)
 class HebbianMemory(nn.Module):
    def __init__(self, input_dim, code_dim=16384, k_active=20, lr=1.0):
        super().__init__()
        self.separator = PatternSeparator(input_dim, code_dim, k_active)
        self.target_separator = PatternSeparator(input_dim, code_dim, k_active)
        self.code_dim = code_dim
        self.k_active = k_active
        self.lr = lr
        self.W = nn.Parameter(torch.zeros(code_dim, code_dim), requires_grad=False)
    def learn(self, cue, target):
        cue_code = self.separator(cue)
        target_code = self.target_separator(target)
        self.W.data += self.lr * torch.outer(target_code, cue_code)
    def recall_code(self, cue_code):
        raw = self.W @ cue_code
        return winner_take_all(raw, self.k_active)
    def recall(self, cue):
        cue_code = self.separator(cue)
        return self.recall_code(cue_code)
 def run_noise_test(num_pairs, noise_levels, code_dim=16384, k=20, input_dim=768):
    """Test recall under noisy cues."""
    mem = HebbianMemory(input_dim, code_dim, k).to(DEVICE)
    cues = [nn.functional.normalize(torch.randn(input_dim, device=DEVICE), dim=0)
            for _ in range(num_pairs)]
    targets = [nn.functional.normalize(torch.randn(input_dim, device=DEVICE), dim=0)
               for _ in range(num_pairs)]
    for i in range(num_pairs):
        mem.learn(cues[i], targets[i])
    # Pre-compute target codes
    target_codes = [mem.target_separator(t) for t in targets]
    results = {}
    for noise_std in noise_levels:
        correct_sims = []
        for i in range(num_pairs):
            # Add noise to cue
            noisy_cue = cues[i] + torch.randn_like(cues[i]) * noise_std
            noisy_cue = nn.functional.normalize(noisy_cue, dim=0)
            recalled = mem.recall(noisy_cue)
            cs = cosine(recalled, target_codes[i])
            correct_sims.append(cs)
        mc = np.mean(correct_sims)
        # Exact match rate (CosSim > 0.99)
        exact_rate = np.mean([s > 0.99 for s in correct_sims])
        results[noise_std] = {"mean_cos": mc, "exact_rate": exact_rate}
        print(f"  noise={noise_std:.2f}: CosSim={mc:.4f}, Exact={exact_rate:.2%}")
    return results
 def run_partial_cue_test(num_pairs, mask_fractions, code_dim=16384, k=20, input_dim=768):
    """Test recall with partial cues (some dimensions zeroed out)."""
    mem = HebbianMemory(input_dim, code_dim, k).to(DEVICE)
    cues = [nn.functional.normalize(torch.randn(input_dim, device=DEVICE), dim=0)
            for _ in range(num_pairs)]
    targets = [nn.functional.normalize(torch.randn(input_dim, device=DEVICE), dim=0)
               for _ in range(num_pairs)]
    for i in range(num_pairs):
        mem.learn(cues[i], targets[i])
    target_codes = [mem.target_separator(t) for t in targets]
    results = {}
    for frac in mask_fractions:
        correct_sims = []
        for i in range(num_pairs):
            # Zero out frac% of dimensions
            mask = torch.ones(input_dim, device=DEVICE)
            n_zero = int(input_dim * frac)
            indices = torch.randperm(input_dim)[:n_zero]
            mask[indices] = 0
            partial_cue = cues[i] * mask
            partial_cue = nn.functional.normalize(partial_cue, dim=0)
            recalled = mem.recall(partial_cue)
            cs = cosine(recalled, target_codes[i])
            correct_sims.append(cs)
        mc = np.mean(correct_sims)
        exact_rate = np.mean([s > 0.99 for s in correct_sims])
        results[frac] = {"mean_cos": mc, "exact_rate": exact_rate}
        print(f"  mask={frac:.0%}: CosSim={mc:.4f}, Exact={exact_rate:.2%}")
    return results
 def run_capacity_stress_test(code_dim=16384, k=20, input_dim=768):
    """Push memory count until recall degrades."""
    mem = HebbianMemory(input_dim, code_dim, k).to(DEVICE)
    all_cues = []
    all_targets = []
    all_target_codes = []
    checkpoints = [100, 500, 1000, 2000, 5000, 10000, 20000]
    results = {}
    for n in range(max(checkpoints)):
        cue = nn.functional.normalize(torch.randn(input_dim, device=DEVICE), dim=0)
        target = nn.functional.normalize(torch.randn(input_dim, device=DEVICE), dim=0)
        mem.learn(cue, target)
        all_cues.append(cue)
        all_targets.append(target)
        all_target_codes.append(mem.target_separator(target))
        if (n + 1) in checkpoints:
            # Test recall on random sample
            sample_size = min(100, n + 1)
            indices = torch.randperm(n + 1)[:sample_size].tolist()
            correct_sims = []
            for idx in indices:
                recalled = mem.recall(all_cues[idx])
                cs = cosine(recalled, all_target_codes[idx])
                correct_sims.append(cs)
            mc = np.mean(correct_sims)
            exact_rate = np.mean([s > 0.99 for s in correct_sims])
            # W stats
            w_abs = mem.W.data.abs().mean().item()
            print(f"  N={n+1:>5}: CosSim={mc:.4f}, Exact={exact_rate:.2%}, "
                  f"W_abs={w_abs:.4f}")
            results[n+1] = {"mean_cos": mc, "exact_rate": exact_rate, "w_abs": w_abs}
    return results
 def main():
    print("=" * 60)
    print("Experiment 2d: Robustness & Capacity")
    print("=" * 60)
    all_results = {}
    # Test 1: Noise robustness
    print("\n=== Noise Robustness (100 pairs) ===")
    noise_results = run_noise_test(
        100, [0.0, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0])
    all_results["noise"] = {str(k): v for k, v in noise_results.items()}
    # Test 2: Partial cue
    print("\n=== Partial Cue Robustness (100 pairs) ===")
    partial_results = run_partial_cue_test(
        100, [0.0, 0.1, 0.2, 0.3, 0.5, 0.7, 0.9])
    all_results["partial"] = {str(k): v for k, v in partial_results.items()}
    # Test 3: Capacity
    print("\n=== Capacity Stress Test (code=16384, k=20) ===")
    cap_results = run_capacity_stress_test()
    all_results["capacity"] = {str(k): v for k, v in cap_results.items()}
    with open(RESULTS_DIR / "exp02d_results.json", "w") as f:
        json.dump(all_results, f, indent=2, default=float)
 if __name__ == "__main__":
    main()
--- a/experiments/exp02e_noise_tolerance.py
+++ b/experiments/exp02e_noise_tolerance.py
@@ -0,0 +1,368 @@
 """Experiment 2e: Noise-tolerant retrieval.
 Problem: WTA pattern separation is brittle to noise in cue embeddings.
 Real use case requires retrieving from semantically similar (not identical) cues.
 Approaches to test:
 1. Soft-WTA: Use softmax temperature instead of hard top-k
 2. Multi-probe: Multiple noisy retrievals + voting
 3. Coarse-to-fine: Nearest-neighbor in embedding space → exact Hebbian recall
 4. Learned similarity-preserving hash: train the separator to be noise-robust
 5. Wider k: trade capacity for noise robustness
 """
 import sys
 import time
 import json
 from pathlib import Path
 import torch
 import torch.nn as nn
 import numpy as np
 DEVICE = "cuda"
 RESULTS_DIR = Path(__file__).parent.parent / "doc"
 def cosine(a, b):
    if a.norm() == 0 or b.norm() == 0:
        return 0.0
    return nn.functional.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()
 def winner_take_all(x, k):
    _, topk_idx = x.topk(k, dim=-1)
    out = torch.zeros_like(x)
    out.scatter_(-1, topk_idx, 1.0)
    return out
 class SoftWTASeparator(nn.Module):
    """Soft winner-take-all using temperature-scaled softmax.
    Instead of hard binary codes, produces soft sparse codes.
    More robust to noise but reduces discrimination.
    """
    def __init__(self, input_dim, code_dim, temperature=0.1):
        super().__init__()
        self.temperature = temperature
        proj = torch.randn(input_dim, code_dim) * (1.0 / input_dim**0.5)
        self.register_buffer('proj', proj)
    def forward(self, x):
        h = x @ self.proj
        # Soft WTA: high temp → more spread, low temp → more sparse
        return torch.softmax(h / self.temperature, dim=-1)
 class MultiProbeSeparator(nn.Module):
    """Multiple random projections, retrieve from all, majority vote."""
    def __init__(self, input_dim, code_dim, k_active, num_probes=8):
        super().__init__()
        self.k_active = k_active
        self.num_probes = num_probes
        # Multiple random projections
        projs = torch.randn(num_probes, input_dim, code_dim) * (1.0 / input_dim**0.5)
        self.register_buffer('projs', projs)
    def forward(self, x):
        """Returns averaged code across all probes."""
        votes = torch.zeros(self.projs.shape[2], device=x.device)
        for i in range(self.num_probes):
            h = x @ self.projs[i]
            code = winner_take_all(h, self.k_active)
            votes += code
        # Threshold: active if majority of probes agree
        threshold = self.num_probes / 2
        return (votes > threshold).float()
 class CoarseToFineMemory(nn.Module):
    """Coarse: nearest-neighbor in embedding space.
    Fine: exact Hebbian recall from nearest stored cue.
    This is the most practical approach: SNN/Hebbian provides the
    association storage, but retrieval is bootstrapped by embedding similarity.
    """
    def __init__(self, input_dim, code_dim=16384, k_active=20):
        super().__init__()
        self.code_dim = code_dim
        self.k_active = k_active
        proj = torch.randn(input_dim, code_dim, device=DEVICE) * (1.0 / input_dim**0.5)
        self.register_buffer('proj', proj)
        target_proj = torch.randn(input_dim, code_dim, device=DEVICE) * (1.0 / input_dim**0.5)
        self.register_buffer('target_proj', target_proj)
        self.W = nn.Parameter(torch.zeros(code_dim, code_dim, device=DEVICE),
                              requires_grad=False)
        # Store raw cue embeddings for nearest-neighbor lookup
        self.cue_store = []
    def separate(self, x, proj):
        h = x @ proj
        return winner_take_all(h, self.k_active)
    def learn(self, cue, target):
        self.cue_store.append(cue.detach().clone())
        cue_code = self.separate(cue, self.proj)
        target_code = self.separate(target, self.target_proj)
        self.W.data += torch.outer(target_code, cue_code)
    def recall(self, query):
        """Coarse: find nearest stored cue. Fine: Hebbian recall."""
        if not self.cue_store:
            return torch.zeros(self.code_dim, device=DEVICE)
        # Nearest neighbor
        cue_matrix = torch.stack(self.cue_store)  # [N, dim]
        sims = nn.functional.cosine_similarity(
            query.unsqueeze(0), cue_matrix, dim=-1)  # [N]
        best_idx = sims.argmax()
        best_cue = self.cue_store[best_idx]
        # Exact Hebbian recall with nearest cue
        cue_code = self.separate(best_cue, self.proj)
        raw = self.W @ cue_code
        return winner_take_all(raw, self.k_active)
 def test_approach(name, mem_class, num_pairs=100, noise_levels=None, **kwargs):
    """Generic test harness."""
    if noise_levels is None:
        noise_levels = [0.0, 0.1, 0.2, 0.5, 1.0, 2.0]
    input_dim = 768
    cues = [nn.functional.normalize(torch.randn(input_dim, device=DEVICE), dim=0)
            for _ in range(num_pairs)]
    targets = [nn.functional.normalize(torch.randn(input_dim, device=DEVICE), dim=0)
               for _ in range(num_pairs)]
    mem = mem_class(**kwargs).to(DEVICE) if not isinstance(mem_class, nn.Module) else mem_class
    # Learn
    for i in range(num_pairs):
        mem.learn(cues[i], targets[i])
    results = {}
    for noise_std in noise_levels:
        correct_sims = []
        for i in range(num_pairs):
            noisy_cue = cues[i] + torch.randn_like(cues[i]) * noise_std
            noisy_cue = nn.functional.normalize(noisy_cue, dim=0)
            recalled = mem.recall(noisy_cue)
            # Compare to target code
            if hasattr(mem, 'target_separator'):
                target_code = mem.target_separator(targets[i])
            elif hasattr(mem, 'target_proj'):
                target_code = winner_take_all(targets[i] @ mem.target_proj, mem.k_active)
            else:
                target_code = targets[i]
            cs = cosine(recalled, target_code)
            correct_sims.append(cs)
        mc = np.mean(correct_sims)
        exact = np.mean([s > 0.99 for s in correct_sims])
        results[noise_std] = {"mean_cos": mc, "exact_rate": exact}
        print(f"  {name}: noise={noise_std:.2f} → CosSim={mc:.4f}, Exact={exact:.2%}")
    return results
 # --- Approach-specific memory classes ---
 class SoftWTAMemory(nn.Module):
    def __init__(self, input_dim=768, code_dim=16384, temperature=0.1):
        super().__init__()
        self.separator = SoftWTASeparator(input_dim, code_dim, temperature)
        self.target_separator = SoftWTASeparator(input_dim, code_dim, temperature)
        self.W = nn.Parameter(torch.zeros(code_dim, code_dim), requires_grad=False)
    def learn(self, cue, target):
        cc = self.separator(cue)
        tc = self.target_separator(target)
        self.W.data += torch.outer(tc, cc)
    def recall(self, cue):
        cc = self.separator(cue)
        return self.W @ cc
 class MultiProbeMemory(nn.Module):
    def __init__(self, input_dim=768, code_dim=8192, k_active=20, num_probes=16):
        super().__init__()
        self.separator = MultiProbeSeparator(input_dim, code_dim, k_active, num_probes)
        self.target_separator = MultiProbeSeparator(input_dim, code_dim, k_active, num_probes)
        self.k_active = k_active
        self.W = nn.Parameter(torch.zeros(code_dim, code_dim), requires_grad=False)
    def learn(self, cue, target):
        cc = self.separator(cue)
        tc = self.target_separator(target)
        self.W.data += torch.outer(tc, cc)
    def recall(self, cue):
        cc = self.separator(cue)
        raw = self.W @ cc
        return winner_take_all(raw, self.k_active)
 class WiderKMemory(nn.Module):
    """Just use wider k — simple and might work."""
    def __init__(self, input_dim=768, code_dim=16384, k_active=200):
        super().__init__()
        self.k_active = k_active
        proj = torch.randn(input_dim, code_dim) * (1.0 / input_dim**0.5)
        self.register_buffer('proj', proj)
        target_proj = torch.randn(input_dim, code_dim) * (1.0 / input_dim**0.5)
        self.register_buffer('target_proj', target_proj)
        self.W = nn.Parameter(torch.zeros(code_dim, code_dim), requires_grad=False)
    def learn(self, cue, target):
        cc = winner_take_all(cue @ self.proj, self.k_active)
        tc = winner_take_all(target @ self.target_proj, self.k_active)
        self.W.data += torch.outer(tc, cc)
    def recall(self, cue):
        cc = winner_take_all(cue @ self.proj, self.k_active)
        raw = self.W @ cc
        return winner_take_all(raw, self.k_active)
    @property
    def target_separator(self):
        return None  # handled differently
 def main():
    print("=" * 60)
    print("Experiment 2e: Noise-Tolerant Retrieval")
    print("=" * 60)
    noise_levels = [0.0, 0.05, 0.1, 0.2, 0.5, 1.0]
    num_pairs = 100
    all_results = {}
    # 1. Soft WTA
    print("\n=== 1. Soft WTA ===")
    for temp in [0.01, 0.05, 0.1, 0.5]:
        name = f"soft_wta_t{temp}"
        print(f"\n-- temperature={temp} --")
        mem = SoftWTAMemory(temperature=temp).to(DEVICE)
        cues = [nn.functional.normalize(torch.randn(768, device=DEVICE), dim=0) for _ in range(num_pairs)]
        targets = [nn.functional.normalize(torch.randn(768, device=DEVICE), dim=0) for _ in range(num_pairs)]
        for i in range(num_pairs):
            mem.learn(cues[i], targets[i])
        results = {}
        for ns in noise_levels:
            sims = []
            for i in range(num_pairs):
                noisy = nn.functional.normalize(cues[i] + torch.randn_like(cues[i]) * ns, dim=0)
                recalled = mem.recall(noisy)
                tc = mem.target_separator(targets[i])
                sims.append(cosine(recalled, tc))
            mc = np.mean(sims)
            print(f"  noise={ns:.2f}: CosSim={mc:.4f}")
            results[ns] = mc
        all_results[name] = results
    # 2. Multi-probe
    print("\n=== 2. Multi-Probe ===")
    for n_probes in [4, 8, 16, 32]:
        name = f"multiprobe_{n_probes}"
        print(f"\n-- probes={n_probes} --")
        mem = MultiProbeMemory(num_probes=n_probes).to(DEVICE)
        cues = [nn.functional.normalize(torch.randn(768, device=DEVICE), dim=0) for _ in range(num_pairs)]
        targets = [nn.functional.normalize(torch.randn(768, device=DEVICE), dim=0) for _ in range(num_pairs)]
        for i in range(num_pairs):
            mem.learn(cues[i], targets[i])
        results = {}
        for ns in noise_levels:
            sims = []
            for i in range(num_pairs):
                noisy = nn.functional.normalize(cues[i] + torch.randn_like(cues[i]) * ns, dim=0)
                recalled = mem.recall(noisy)
                tc = mem.target_separator(targets[i])
                sims.append(cosine(recalled, tc))
            mc = np.mean(sims)
            print(f"  noise={ns:.2f}: CosSim={mc:.4f}")
            results[ns] = mc
        all_results[name] = results
    # 3. Coarse-to-fine
    print("\n=== 3. Coarse-to-Fine (NN + Hebbian) ===")
    mem = CoarseToFineMemory(768).to(DEVICE)
    cues = [nn.functional.normalize(torch.randn(768, device=DEVICE), dim=0) for _ in range(num_pairs)]
    targets = [nn.functional.normalize(torch.randn(768, device=DEVICE), dim=0) for _ in range(num_pairs)]
    for i in range(num_pairs):
        mem.learn(cues[i], targets[i])
    results = {}
    for ns in noise_levels:
        sims = []
        for i in range(num_pairs):
            noisy = nn.functional.normalize(cues[i] + torch.randn_like(cues[i]) * ns, dim=0)
            recalled = mem.recall(noisy)
            tc = winner_take_all(targets[i] @ mem.target_proj, mem.k_active)
            sims.append(cosine(recalled, tc))
        mc = np.mean(sims)
        print(f"  noise={ns:.2f}: CosSim={mc:.4f}")
        results[ns] = mc
    all_results["coarse_to_fine"] = results
    # 4. Wider k
    print("\n=== 4. Wider K ===")
    for k in [50, 100, 200, 500, 1000]:
        name = f"wider_k_{k}"
        print(f"\n-- k={k} --")
        mem = WiderKMemory(k_active=k).to(DEVICE)
        cues = [nn.functional.normalize(torch.randn(768, device=DEVICE), dim=0) for _ in range(num_pairs)]
        targets = [nn.functional.normalize(torch.randn(768, device=DEVICE), dim=0) for _ in range(num_pairs)]
        for i in range(num_pairs):
            mem.learn(cues[i], targets[i])
        results = {}
        for ns in noise_levels:
            sims = []
            for i in range(num_pairs):
                noisy = nn.functional.normalize(cues[i] + torch.randn_like(cues[i]) * ns, dim=0)
                recalled = mem.recall(noisy)
                tc = winner_take_all(targets[i] @ mem.target_proj, k)
                sims.append(cosine(recalled, tc))
            mc = np.mean(sims)
            print(f"  noise={ns:.2f}: CosSim={mc:.4f}")
            results[ns] = mc
        all_results[name] = results
    # Save
    serializable = {}
    for k, v in all_results.items():
        serializable[k] = {str(kk): float(vv) for kk, vv in v.items()}
    with open(RESULTS_DIR / "exp02e_results.json", "w") as f:
        json.dump(serializable, f, indent=2)
    # Summary table
    print("\n" + "=" * 80)
    print("SUMMARY: CosSim at each noise level")
    print(f"{'Method':<25}", end="")
    for ns in noise_levels:
        print(f"  σ={ns:.2f}", end="")
    print()
    print("-" * 80)
    for method, res in all_results.items():
        print(f"{method:<25}", end="")
        for ns in noise_levels:
            v = res.get(ns, 0)
            print(f"  {v:>5.3f}", end="")
        print()
 if __name__ == "__main__":
    main()
--- a/experiments/exp02f_discrimination_check.py
+++ b/experiments/exp02f_discrimination_check.py
@@ -0,0 +1,254 @@
 """Experiment 2f: Check discrimination for soft WTA + test learned separator.
 Soft WTA temp=0.5 showed perfect noise tolerance but might have zero discrimination.
 Need to check: can it tell correct target from wrong targets?
 Then test: learned pattern separator (trained to be noise-robust via contrastive loss).
 """
 import sys
 import time
 import json
 from pathlib import Path
 import torch
 import torch.nn as nn
 import torch.optim as optim
 import numpy as np
 DEVICE = "cuda"
 RESULTS_DIR = Path(__file__).parent.parent / "doc"
 def cosine(a, b):
    if a.norm() == 0 or b.norm() == 0:
        return 0.0
    return nn.functional.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()
 def winner_take_all(x, k):
    _, idx = x.topk(k, dim=-1)
    out = torch.zeros_like(x)
    out.scatter_(-1, idx, 1.0)
    return out
 class SoftWTAMemory(nn.Module):
    def __init__(self, input_dim=768, code_dim=16384, temperature=0.5):
        super().__init__()
        self.temperature = temperature
        proj = torch.randn(input_dim, code_dim) * (1.0 / input_dim**0.5)
        self.register_buffer('proj', proj)
        target_proj = torch.randn(input_dim, code_dim) * (1.0 / input_dim**0.5)
        self.register_buffer('target_proj', target_proj)
        self.W = nn.Parameter(torch.zeros(code_dim, code_dim), requires_grad=False)
    def encode(self, x, proj):
        return torch.softmax((x @ proj) / self.temperature, dim=-1)
    def learn(self, cue, target):
        cc = self.encode(cue, self.proj)
        tc = self.encode(target, self.target_proj)
        self.W.data += torch.outer(tc, cc)
    def recall(self, cue):
        cc = self.encode(cue, self.proj)
        return self.W @ cc
 def check_discrimination(temperature, num_pairs=100):
    """Check correct vs wrong similarity for soft WTA."""
    mem = SoftWTAMemory(temperature=temperature).to(DEVICE)
    cues = [nn.functional.normalize(torch.randn(768, device=DEVICE), dim=0)
            for _ in range(num_pairs)]
    targets = [nn.functional.normalize(torch.randn(768, device=DEVICE), dim=0)
               for _ in range(num_pairs)]
    for i in range(num_pairs):
        mem.learn(cues[i], targets[i])
    # Test with noise=0.1
    for noise_std in [0.0, 0.1, 0.5]:
        correct_sims = []
        wrong_sims = []
        for i in range(num_pairs):
            noisy = nn.functional.normalize(
                cues[i] + torch.randn_like(cues[i]) * noise_std, dim=0)
            recalled = mem.recall(noisy)
            tc = mem.encode(targets[i], mem.target_proj)
            correct_sims.append(cosine(recalled, tc))
            # Compare to random wrong targets
            for j in range(min(20, num_pairs)):
                if j != i:
                    wc = mem.encode(targets[j], mem.target_proj)
                    wrong_sims.append(cosine(recalled, wc))
        mc = np.mean(correct_sims)
        mw = np.mean(wrong_sims)
        print(f"  temp={temperature}, noise={noise_std:.1f}: "
              f"Correct={mc:.4f}, Wrong={mw:.4f}, Disc={mc-mw:.4f}")
 class LearnedSeparator(nn.Module):
    """Trained pattern separator: maps similar inputs to same code.
    Architecture: MLP → sparse output (WTA)
    Training: contrastive loss on (original, noisy) pairs
    """
    def __init__(self, input_dim=768, code_dim=4096, k_active=50):
        super().__init__()
        self.k_active = k_active
        self.code_dim = code_dim
        self.net = nn.Sequential(
            nn.Linear(input_dim, code_dim),
            nn.ReLU(),
            nn.Linear(code_dim, code_dim),
        )
    def forward(self, x):
        h = self.net(x)
        return winner_take_all(h, self.k_active)
    def forward_soft(self, x, temperature=0.1):
        """Soft version for training (differentiable)."""
        h = self.net(x)
        return torch.softmax(h / temperature, dim=-1)
 def train_learned_separator(input_dim=768, code_dim=4096, k_active=50,
                            epochs=100, batch_size=128, noise_std=0.3):
    """Train separator to produce same codes for original and noisy versions."""
    sep = LearnedSeparator(input_dim, code_dim, k_active).to(DEVICE)
    optimizer = optim.Adam(sep.parameters(), lr=1e-3)
    print(f"\nTraining learned separator (code_dim={code_dim}, k={k_active}, "
          f"noise={noise_std})")
    for epoch in range(epochs):
        # Generate batch of normalized vectors
        x = nn.functional.normalize(torch.randn(batch_size, input_dim, device=DEVICE), dim=1)
        # Noisy version
        x_noisy = nn.functional.normalize(x + torch.randn_like(x) * noise_std, dim=1)
        # Different vector (negative)
        x_neg = nn.functional.normalize(torch.randn(batch_size, input_dim, device=DEVICE), dim=1)
        # Soft codes
        code = sep.forward_soft(x)
        code_noisy = sep.forward_soft(x_noisy)
        code_neg = sep.forward_soft(x_neg)
        # Contrastive loss: same input → same code, diff input → diff code
        pos_sim = nn.functional.cosine_similarity(code, code_noisy, dim=1).mean()
        neg_sim = nn.functional.cosine_similarity(code, code_neg, dim=1).mean()
        loss = -pos_sim + 0.5 * torch.relu(neg_sim - 0.1)
        # Sparsity regularization
        entropy = -(code * (code + 1e-10).log()).sum(dim=1).mean()
        loss += 0.01 * entropy
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if (epoch + 1) % 20 == 0:
            with torch.no_grad():
                hard_code = sep(x)
                hard_noisy = sep(x_noisy)
                hard_neg = sep(x_neg)
                # Exact match rate (same WTA pattern)
                match_rate = (hard_code * hard_noisy).sum(dim=1).mean() / k_active
                neg_match = (hard_code * hard_neg).sum(dim=1).mean() / k_active
            print(f"  Epoch {epoch+1}: loss={loss.item():.4f}, "
                  f"pos_match={match_rate:.4f}, neg_match={neg_match:.4f}")
    return sep
 def test_learned_memory(sep, num_pairs=100, noise_levels=None):
    """Test Hebbian memory using learned separator."""
    if noise_levels is None:
        noise_levels = [0.0, 0.1, 0.2, 0.5, 1.0]
    code_dim = sep.code_dim
    k = sep.k_active
    W = torch.zeros(code_dim, code_dim, device=DEVICE)
    cues = [nn.functional.normalize(torch.randn(768, device=DEVICE), dim=0)
            for _ in range(num_pairs)]
    targets = [nn.functional.normalize(torch.randn(768, device=DEVICE), dim=0)
               for _ in range(num_pairs)]
    # Learn
    with torch.no_grad():
        cue_codes = [sep(c.unsqueeze(0)).squeeze() for c in cues]
        target_codes = [sep(t.unsqueeze(0)).squeeze() for t in targets]
    for i in range(num_pairs):
        W += torch.outer(target_codes[i], cue_codes[i])
    # Test
    for ns in noise_levels:
        correct_sims = []
        wrong_sims = []
        for i in range(num_pairs):
            noisy = nn.functional.normalize(
                cues[i] + torch.randn_like(cues[i]) * ns, dim=0)
            with torch.no_grad():
                nc = sep(noisy.unsqueeze(0)).squeeze()
            recalled_raw = W @ nc
            recalled = winner_take_all(recalled_raw, k)
            cs = cosine(recalled, target_codes[i])
            correct_sims.append(cs)
            for j in range(min(20, num_pairs)):
                if j != i:
                    wrong_sims.append(cosine(recalled, target_codes[j]))
        mc = np.mean(correct_sims)
        mw = np.mean(wrong_sims)
        exact = np.mean([s > 0.99 for s in correct_sims])
        print(f"  noise={ns:.2f}: Correct={mc:.4f}, Wrong={mw:.4f}, "
              f"Disc={mc-mw:.4f}, Exact={exact:.2%}")
 def main():
    print("=" * 60)
    print("Experiment 2f: Discrimination Check + Learned Separator")
    print("=" * 60)
    # Part 1: Check discrimination for soft WTA
    print("\n=== Part 1: Soft WTA Discrimination ===")
    for temp in [0.01, 0.05, 0.1, 0.5, 1.0]:
        check_discrimination(temp)
        print()
    # Part 2: Learned separator
    print("\n=== Part 2: Learned Separator ===")
    # Train with different noise levels
    for train_noise in [0.1, 0.3, 0.5]:
        sep = train_learned_separator(
            code_dim=4096, k_active=50,
            epochs=200, noise_std=train_noise)
        print(f"\n  Testing (trained with noise={train_noise}):")
        test_learned_memory(sep, num_pairs=100)
        print()
    # Part 3: Larger learned separator
    print("\n=== Part 3: Larger Learned Separator (code=8192, k=20) ===")
    sep = train_learned_separator(
        code_dim=8192, k_active=20,
        epochs=300, noise_std=0.3)
    print("\n  Testing:")
    test_learned_memory(sep, num_pairs=200)
 if __name__ == "__main__":
    main()
--- a/experiments/exp02g_multihop.py
+++ b/experiments/exp02g_multihop.py
@@ -0,0 +1,194 @@
 """Experiment 2g: Multi-hop associative recall.
 The unique advantage of Hebbian memory over simple cosine retrieval:
 If A→B and B→C are learned, can we recall C from A by chaining through B?
 This is impossible with standard RAG (which only does single-hop NN lookup).
 If this works, it's the strongest argument for the Hebbian approach.
 """
 import sys
 import torch
 import torch.nn as nn
 import numpy as np
 from pathlib import Path
 DEVICE = "cuda"
 def cosine(a, b):
    if a.norm() == 0 or b.norm() == 0:
        return 0.0
    return nn.functional.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()
 def winner_take_all(x, k):
    _, idx = x.topk(k, dim=-1)
    out = torch.zeros_like(x)
    out.scatter_(-1, idx, 1.0)
    return out
 class HebbianMemory:
    """Simple Hebbian memory for multi-hop tests."""
    def __init__(self, input_dim=768, code_dim=16384, k=20):
        self.k = k
        self.proj = (torch.randn(input_dim, code_dim, device=DEVICE)
                     * (1.0 / input_dim**0.5))
        self.W = torch.zeros(code_dim, code_dim, device=DEVICE)
    def sep(self, x):
        return winner_take_all(x @ self.proj, self.k)
    def learn(self, cue, target):
        cc = self.sep(cue)
        tc = self.sep(target)
        self.W += torch.outer(tc, cc)
    def recall_code(self, code, k=None):
        if k is None:
            k = self.k
        raw = self.W @ code
        return winner_take_all(raw, k)
    def recall(self, cue):
        return self.recall_code(self.sep(cue))
    def multi_hop_recall(self, cue, hops=2):
        """Chain through associations: cue → hop1 → hop2 → ..."""
        code = self.sep(cue)
        for _ in range(hops):
            code = self.recall_code(code)
        return code
 def test_chain(chain_length, num_chains, dim=768, code_dim=16384, k=20):
    """Test multi-hop recall along chains of length L.
    Create chains: A₁→A₂→...→Aₗ
    Learn pairs: (A₁,A₂), (A₂,A₃), ..., (Aₗ₋₁,Aₗ)
    Test: given A₁, can we reach A₂, A₃, ..., Aₗ in 1, 2, ... hops?
    """
    mem = HebbianMemory(dim, code_dim, k)
    chains = []
    for _ in range(num_chains):
        chain = [nn.functional.normalize(torch.randn(dim, device=DEVICE), dim=0)
                 for _ in range(chain_length)]
        chains.append(chain)
        # Learn consecutive pairs
        for i in range(chain_length - 1):
            mem.learn(chain[i], chain[i+1])
    # Test recall at different hop distances
    results = {}
    for hops in range(1, chain_length):
        correct_sims = []
        for chain in chains:
            start = chain[0]
            target = chain[hops]
            target_code = mem.sep(target)
            recalled = mem.multi_hop_recall(start, hops=hops)
            cs = cosine(recalled, target_code)
            correct_sims.append(cs)
        mc = np.mean(correct_sims)
        exact = np.mean([s > 0.5 for s in correct_sims])
        results[hops] = {"mean_cos": mc, "recall_rate": exact}
        print(f"  chain_len={chain_length}, chains={num_chains}, "
              f"hops={hops}: CosSim={mc:.4f}, recall>{0.5:.0%}={exact:.2%}")
    return results
 def test_convergent_chains(dim=768, code_dim=16384, k=20):
    """Test convergent chains: A→C and B→C.
    Can we recall C from both A and B?"""
    mem = HebbianMemory(dim, code_dim, k)
    # Create convergent pattern
    a = nn.functional.normalize(torch.randn(dim, device=DEVICE), dim=0)
    b = nn.functional.normalize(torch.randn(dim, device=DEVICE), dim=0)
    c = nn.functional.normalize(torch.randn(dim, device=DEVICE), dim=0)
    mem.learn(a, c)
    mem.learn(b, c)
    c_code = mem.sep(c)
    # Recall from A
    ra = mem.recall(a)
    sim_a = cosine(ra, c_code)
    # Recall from B
    rb = mem.recall(b)
    sim_b = cosine(rb, c_code)
    print(f"  Convergent: A→C sim={sim_a:.4f}, B→C sim={sim_b:.4f}")
    return {"a_to_c": sim_a, "b_to_c": sim_b}
 def test_divergent_chains(dim=768, code_dim=16384, k=20):
    """Test divergent chains: A→B and A→C.
    Do B and C interfere?"""
    mem = HebbianMemory(dim, code_dim, k)
    a = nn.functional.normalize(torch.randn(dim, device=DEVICE), dim=0)
    b = nn.functional.normalize(torch.randn(dim, device=DEVICE), dim=0)
    c = nn.functional.normalize(torch.randn(dim, device=DEVICE), dim=0)
    mem.learn(a, b)
    mem.learn(a, c)
    b_code = mem.sep(b)
    c_code = mem.sep(c)
    recalled = mem.recall(a)
    sim_b = cosine(recalled, b_code)
    sim_c = cosine(recalled, c_code)
    print(f"  Divergent: A→B sim={sim_b:.4f}, A→C sim={sim_c:.4f}")
    return {"a_to_b": sim_b, "a_to_c": sim_c}
 def main():
    print("=" * 60)
    print("Experiment 2g: Multi-hop Associative Recall")
    print("=" * 60)
    # Test 1: Simple chains
    print("\n=== Chain recall (single chain) ===")
    for L in [3, 5, 7]:
        test_chain(L, num_chains=1)
    # Test 2: Multiple chains (interference between chains)
    print("\n=== Chain recall (multiple chains, interference) ===")
    for n_chains in [1, 5, 10, 50, 100]:
        print(f"\n-- {n_chains} chains of length 4 --")
        test_chain(4, num_chains=n_chains)
    # Test 3: Convergent
    print("\n=== Convergent chains (A→C, B→C) ===")
    results = []
    for _ in range(20):
        r = test_convergent_chains()
        results.append(r)
    mean_a = np.mean([r["a_to_c"] for r in results])
    mean_b = np.mean([r["b_to_c"] for r in results])
    print(f"  Average: A→C={mean_a:.4f}, B→C={mean_b:.4f}")
    # Test 4: Divergent
    print("\n=== Divergent chains (A→B, A→C) ===")
    results = []
    for _ in range(20):
        r = test_divergent_chains()
        results.append(r)
    mean_b = np.mean([r["a_to_b"] for r in results])
    mean_c = np.mean([r["a_to_c"] for r in results])
    print(f"  Average: A→B={mean_b:.4f}, A→C={mean_c:.4f}")
 if __name__ == "__main__":
    main()
--- a/experiments/exp03_consolidation.py
+++ b/experiments/exp03_consolidation.py
@@ -0,0 +1,316 @@
 """Experiment 3: Sleep Consolidation Effects.
 Test questions:
 1. Does consolidation (replay + homeostasis) help or hurt recall?
 2. Does replay with noise improve noise tolerance?
 3. How does pruning affect capacity?
 4. Multi-night scenario: learn day 1, consolidate, learn day 2, consolidate.
   Do day 1 memories survive?
 5. Selective consolidation: replay important memories more → priority memory
 """
 import sys
 import time
 import json
 from pathlib import Path
 import torch
 import torch.nn as nn
 import numpy as np
 sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
 from nuonuo.consolidation import MemoryConsolidator, winner_take_all
 DEVICE = "cuda"
 RESULTS_DIR = Path(__file__).parent.parent / "doc"
 def cosine(a, b):
    if a.norm() == 0 or b.norm() == 0:
        return 0.0
    return nn.functional.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()
 class TestableMemory:
    """Memory with consolidation support for testing."""
    def __init__(self, input_dim=768, code_dim=16384, k=20):
        self.k = k
        self.code_dim = code_dim
        self.proj = (torch.randn(input_dim, code_dim, device=DEVICE)
                     * (1.0 / input_dim**0.5))
        self.target_proj = (torch.randn(input_dim, code_dim, device=DEVICE)
                            * (1.0 / input_dim**0.5))
        self.W = nn.Parameter(torch.zeros(code_dim, code_dim, device=DEVICE),
                              requires_grad=False)
        self.consolidator = MemoryConsolidator(code_dim, k)
    def sep(self, x):
        return winner_take_all(x @ self.proj, self.k)
    def sep_target(self, x):
        return winner_take_all(x @ self.target_proj, self.k)
    def learn(self, cue, target, record=True):
        cc = self.sep(cue)
        tc = self.sep_target(target)
        self.W.data += torch.outer(tc, cc)
        if record:
            self.consolidator.record(cc, tc)
    def recall(self, cue):
        cc = self.sep(cue)
        raw = self.W @ cc
        return winner_take_all(raw, self.k)
    def test_recall(self, cues, targets, noise_std=0.0):
        """Test recall accuracy."""
        correct = []
        for i in range(len(cues)):
            if noise_std > 0:
                c = nn.functional.normalize(
                    cues[i] + torch.randn_like(cues[i]) * noise_std, dim=0)
            else:
                c = cues[i]
            recalled = self.recall(c)
            tc = self.sep_target(targets[i])
            correct.append(cosine(recalled, tc))
        return np.mean(correct), np.mean([s > 0.5 for s in correct])
    def consolidate(self, **kwargs):
        return self.consolidator.consolidate(
            self.W, self.proj, self.target_proj, **kwargs)
 def gen_memories(n, dim=768):
    cues = [nn.functional.normalize(torch.randn(dim, device=DEVICE), dim=0)
            for _ in range(n)]
    targets = [nn.functional.normalize(torch.randn(dim, device=DEVICE), dim=0)
               for _ in range(n)]
    return cues, targets
 def test_basic_consolidation():
    """Does replay + homeostasis help?"""
    print("=== Test 1: Basic Consolidation Effect ===")
    for n_pairs in [100, 500]:
        mem = TestableMemory()
        cues, targets = gen_memories(n_pairs)
        # Learn
        for i in range(n_pairs):
            mem.learn(cues[i], targets[i])
        # Before consolidation
        cos_before, rate_before = mem.test_recall(cues, targets)
        w_norm_before = mem.W.data.norm().item()
        print(f"\n  {n_pairs} pairs:")
        print(f"    Before: CosSim={cos_before:.4f}, Rate={rate_before:.2%}, "
              f"W_norm={w_norm_before:.2f}")
        # Consolidation with different settings
        for epochs in [1, 3, 5, 10]:
            # Clone memory for each test
            mem_test = TestableMemory()
            mem_test.W.data.copy_(mem.W.data)
            mem_test.proj = mem.proj
            mem_test.target_proj = mem.target_proj
            mem_test.consolidator.replay_buffer = list(mem.consolidator.replay_buffer)
            stats = mem_test.consolidate(
                num_epochs=epochs, homeostasis_factor=0.95, prune_threshold=0.001)
            cos_after, rate_after = mem_test.test_recall(cues, targets)
            print(f"    After (epochs={epochs}): CosSim={cos_after:.4f}, "
                  f"Rate={rate_after:.2%}, "
                  f"W_norm={stats['final_w_norm']:.2f}, "
                  f"Sparsity={stats['final_sparsity']:.2%}")
 def test_noisy_replay():
    """Does replay with noise improve noise tolerance?"""
    print("\n=== Test 2: Noisy Replay for Robustness ===")
    n_pairs = 100
    mem_base = TestableMemory()
    cues, targets = gen_memories(n_pairs)
    for i in range(n_pairs):
        mem_base.learn(cues[i], targets[i])
    # Test at different noise levels
    test_noises = [0.0, 0.05, 0.1, 0.2]
    # No consolidation (baseline)
    print("\n  No consolidation:")
    for ns in test_noises:
        cos, rate = mem_base.test_recall(cues, targets, noise_std=ns)
        print(f"    test_noise={ns:.2f}: CosSim={cos:.4f}, Rate={rate:.2%}")
    # Consolidation with different replay noise
    for replay_noise in [0.0, 0.1, 0.5, 1.0]:
        mem_test = TestableMemory()
        mem_test.W.data.copy_(mem_base.W.data)
        mem_test.proj = mem_base.proj
        mem_test.target_proj = mem_base.target_proj
        mem_test.consolidator.replay_buffer = list(mem_base.consolidator.replay_buffer)
        mem_test.consolidate(num_epochs=5, replay_noise=replay_noise,
                             homeostasis_factor=0.95)
        print(f"\n  Consolidated (replay_noise={replay_noise}):")
        for ns in test_noises:
            cos, rate = mem_test.test_recall(cues, targets, noise_std=ns)
            print(f"    test_noise={ns:.2f}: CosSim={cos:.4f}, Rate={rate:.2%}")
 def test_multi_night():
    """Multi-night scenario: learn, consolidate, learn more.
    Do old memories survive?"""
    print("\n=== Test 3: Multi-Night Memory Survival ===")
    mem = TestableMemory()
    # Day 1: Learn 100 memories
    cues_d1, targets_d1 = gen_memories(100)
    for i in range(100):
        mem.learn(cues_d1[i], targets_d1[i])
    cos_d1, _ = mem.test_recall(cues_d1, targets_d1)
    print(f"  After Day 1 (100 memories): CosSim={cos_d1:.4f}")
    # Night 1: Consolidate
    stats = mem.consolidate(num_epochs=5, homeostasis_factor=0.95)
    cos_d1_after, _ = mem.test_recall(cues_d1, targets_d1)
    print(f"  After Night 1 consolidation: CosSim={cos_d1_after:.4f}, "
          f"W_norm={stats['final_w_norm']:.2f}")
    mem.consolidator.selective_clear(keep_fraction=0.3)
    # Day 2: Learn 100 more memories
    cues_d2, targets_d2 = gen_memories(100)
    for i in range(100):
        mem.learn(cues_d2[i], targets_d2[i])
    cos_d1_mid, _ = mem.test_recall(cues_d1, targets_d1)
    cos_d2_mid, _ = mem.test_recall(cues_d2, targets_d2)
    print(f"  After Day 2 (100 more): Day1={cos_d1_mid:.4f}, Day2={cos_d2_mid:.4f}")
    # Night 2: Consolidate (with day 1 carryover + day 2)
    stats = mem.consolidate(num_epochs=5, homeostasis_factor=0.95)
    cos_d1_final, _ = mem.test_recall(cues_d1, targets_d1)
    cos_d2_final, _ = mem.test_recall(cues_d2, targets_d2)
    print(f"  After Night 2: Day1={cos_d1_final:.4f}, Day2={cos_d2_final:.4f}, "
          f"W_norm={stats['final_w_norm']:.2f}")
    # Continue for 5 more days
    for day in range(3, 8):
        mem.consolidator.selective_clear(keep_fraction=0.3)
        cues_new, targets_new = gen_memories(100)
        for i in range(100):
            mem.learn(cues_new[i], targets_new[i])
        mem.consolidate(num_epochs=5, homeostasis_factor=0.95)
        cos_d1_now, _ = mem.test_recall(cues_d1, targets_d1)
        cos_d2_now, _ = mem.test_recall(cues_d2, targets_d2)
        cos_new, _ = mem.test_recall(cues_new, targets_new)
        w_norm = mem.W.data.norm().item()
        sparsity = (mem.W.data.abs() < 0.001).float().mean().item()
        print(f"  After Day {day}: Day1={cos_d1_now:.4f}, Day2={cos_d2_now:.4f}, "
              f"Latest={cos_new:.4f}, W_norm={w_norm:.1f}, Sparsity={sparsity:.2%}")
 def test_priority_replay():
    """Test selective consolidation: replay important memories more."""
    print("\n=== Test 4: Priority Replay ===")
    mem = TestableMemory()
    # 50 "important" memories (replay 5x)
    cues_imp, targets_imp = gen_memories(50)
    for i in range(50):
        mem.learn(cues_imp[i], targets_imp[i])
        # Record extra copies for priority replay
        cc = mem.sep(cues_imp[i])
        tc = mem.sep_target(targets_imp[i])
        for _ in range(4):  # 4 extra = 5x total
            mem.consolidator.record(cc, tc)
    # 50 "unimportant" memories (replay 1x, normal)
    cues_unimp, targets_unimp = gen_memories(50)
    for i in range(50):
        mem.learn(cues_unimp[i], targets_unimp[i])
    cos_imp_before, _ = mem.test_recall(cues_imp, targets_imp)
    cos_unimp_before, _ = mem.test_recall(cues_unimp, targets_unimp)
    print(f"  Before consolidation: Important={cos_imp_before:.4f}, "
          f"Unimportant={cos_unimp_before:.4f}")
    # Consolidate with strong homeostasis (will decay unimportant more)
    mem.consolidate(num_epochs=10, homeostasis_factor=0.90)
    cos_imp_after, _ = mem.test_recall(cues_imp, targets_imp)
    cos_unimp_after, _ = mem.test_recall(cues_unimp, targets_unimp)
    print(f"  After consolidation: Important={cos_imp_after:.4f}, "
          f"Unimportant={cos_unimp_after:.4f}")
    print(f"  Priority effect: Δimportant={cos_imp_after-cos_imp_before:+.4f}, "
          f"Δunimportant={cos_unimp_after-cos_unimp_before:+.4f}")
 def test_forgetting_curve():
    """Measure memory decay over multiple consolidation cycles without replay."""
    print("\n=== Test 5: Forgetting Curve ===")
    mem = TestableMemory()
    cues, targets = gen_memories(100)
    for i in range(100):
        mem.learn(cues[i], targets[i])
    cos0, _ = mem.test_recall(cues, targets)
    print(f"  Day 0: CosSim={cos0:.4f}")
    # Simulate nights with homeostasis but NO replay
    for night in range(1, 11):
        # Only homeostasis + pruning, no replay
        mem.W.data *= 0.95
        mask = mem.W.data.abs() >= 0.001
        mem.W.data *= mask.float()
        cos, rate = mem.test_recall(cues, targets)
        w_norm = mem.W.data.norm().item()
        print(f"  Night {night:2d} (no replay): CosSim={cos:.4f}, "
              f"Rate={rate:.2%}, W_norm={w_norm:.2f}")
    # Same but WITH replay
    print("\n  --- With replay ---")
    mem2 = TestableMemory()
    mem2.proj = mem.proj
    mem2.target_proj = mem.target_proj
    for i in range(100):
        mem2.learn(cues[i], targets[i])
    for night in range(1, 11):
        mem2.consolidate(num_epochs=1, homeostasis_factor=0.95)
        cos, rate = mem2.test_recall(cues, targets)
        w_norm = mem2.W.data.norm().item()
        print(f"  Night {night:2d} (with replay): CosSim={cos:.4f}, "
              f"Rate={rate:.2%}, W_norm={w_norm:.2f}")
 def main():
    print("=" * 60)
    print("Experiment 3: Sleep Consolidation")
    print("=" * 60)
    test_basic_consolidation()
    test_noisy_replay()
    test_multi_night()
    test_priority_replay()
    test_forgetting_curve()
 if __name__ == "__main__":
    main()
--- a/experiments/exp03b_consolidation_stress.py
+++ b/experiments/exp03b_consolidation_stress.py
@@ -0,0 +1,187 @@
 """Experiment 3b: Consolidation near capacity limits.
 With code_dim=16384 and k=20, capacity is so high that consolidation seems
 unnecessary. Test with smaller code_dim (2048) where capacity limits are lower
 and consolidation effects should be visible.
 Also test: stronger homeostasis to control W_norm growth.
 """
 import sys
 import time
 import json
 from pathlib import Path
 import torch
 import torch.nn as nn
 import numpy as np
 sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
 from nuonuo.consolidation import MemoryConsolidator, winner_take_all
 DEVICE = "cuda"
 RESULTS_DIR = Path(__file__).parent.parent / "doc"
 def cosine(a, b):
    if a.norm() == 0 or b.norm() == 0:
        return 0.0
    return nn.functional.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()
 class SmallMemory:
    """Smaller memory for capacity-limited tests."""
    def __init__(self, input_dim=768, code_dim=2048, k=50):
        self.k = k
        self.code_dim = code_dim
        self.proj = (torch.randn(input_dim, code_dim, device=DEVICE)
                     * (1.0 / input_dim**0.5))
        self.target_proj = (torch.randn(input_dim, code_dim, device=DEVICE)
                            * (1.0 / input_dim**0.5))
        self.W = nn.Parameter(torch.zeros(code_dim, code_dim, device=DEVICE),
                              requires_grad=False)
        self.consolidator = MemoryConsolidator(code_dim, k)
    def sep(self, x):
        return winner_take_all(x @ self.proj, self.k)
    def sep_target(self, x):
        return winner_take_all(x @ self.target_proj, self.k)
    def learn(self, cue, target, record=True):
        cc = self.sep(cue)
        tc = self.sep_target(target)
        self.W.data += torch.outer(tc, cc)
        if record:
            self.consolidator.record(cc, tc)
    def recall(self, cue):
        cc = self.sep(cue)
        raw = self.W @ cc
        return winner_take_all(raw, self.k)
    def test_recall(self, cues, targets):
        correct = []
        for i in range(len(cues)):
            recalled = self.recall(cues[i])
            tc = self.sep_target(targets[i])
            correct.append(cosine(recalled, tc))
        return np.mean(correct), np.mean([s > 0.5 for s in correct])
    def consolidate(self, **kwargs):
        return self.consolidator.consolidate(
            self.W, self.proj, self.target_proj, **kwargs)
 def gen_memories(n, dim=768):
    cues = [nn.functional.normalize(torch.randn(dim, device=DEVICE), dim=0)
            for _ in range(n)]
    targets = [nn.functional.normalize(torch.randn(dim, device=DEVICE), dim=0)
               for _ in range(n)]
    return cues, targets
 def test_capacity_with_consolidation():
    """Find where small memory breaks and see if consolidation helps."""
    print("=== Capacity with code_dim=2048, k=50 ===")
    for n_pairs in [50, 100, 200, 500, 1000, 2000]:
        mem_no_consol = SmallMemory()
        mem_with_consol = SmallMemory()
        mem_with_consol.proj = mem_no_consol.proj
        mem_with_consol.target_proj = mem_no_consol.target_proj
        cues, targets = gen_memories(n_pairs)
        # Learn in both
        for i in range(n_pairs):
            mem_no_consol.learn(cues[i], targets[i], record=False)
            mem_with_consol.learn(cues[i], targets[i], record=True)
        cos_no, rate_no = mem_no_consol.test_recall(cues, targets)
        # Consolidate with strong homeostasis
        mem_with_consol.consolidate(num_epochs=3, homeostasis_factor=0.80,
                                     prune_threshold=0.01)
        cos_yes, rate_yes = mem_with_consol.test_recall(cues, targets)
        w_no = mem_no_consol.W.data.norm().item()
        w_yes = mem_with_consol.W.data.norm().item()
        print(f"  N={n_pairs:>5}: "
              f"No_consol: CosSim={cos_no:.4f} Rate={rate_no:.0%} W={w_no:.0f} | "
              f"With_consol: CosSim={cos_yes:.4f} Rate={rate_yes:.0%} W={w_yes:.0f}")
 def test_multi_night_at_limit():
    """7-day scenario near capacity limits."""
    print("\n=== 7-Day Scenario (code_dim=2048, k=50, 200/day) ===")
    mem = SmallMemory()
    all_cues = []
    all_targets = []
    for day in range(1, 8):
        cues_today, targets_today = gen_memories(200)
        all_cues.extend(cues_today)
        all_targets.extend(targets_today)
        for i in range(200):
            mem.learn(cues_today[i], targets_today[i])
        # Test on all memories so far
        cos_all, rate_all = mem.test_recall(all_cues, all_targets)
        cos_today, rate_today = mem.test_recall(cues_today, targets_today)
        cos_day1, _ = mem.test_recall(all_cues[:200], all_targets[:200])
        w_norm = mem.W.data.norm().item()
        print(f"  Day {day} (total={len(all_cues)}): "
              f"All={cos_all:.4f}({rate_all:.0%}), "
              f"Today={cos_today:.4f}, Day1={cos_day1:.4f}, "
              f"W={w_norm:.0f}")
        # Night: consolidate
        mem.consolidate(num_epochs=3, homeostasis_factor=0.85,
                        prune_threshold=0.01)
        mem.consolidator.selective_clear(keep_fraction=0.3)
        cos_after, rate_after = mem.test_recall(all_cues, all_targets)
        cos_day1_after, _ = mem.test_recall(all_cues[:200], all_targets[:200])
        w_after = mem.W.data.norm().item()
        print(f"    → Night {day}: "
              f"All={cos_after:.4f}({rate_after:.0%}), Day1={cos_day1_after:.4f}, "
              f"W={w_after:.0f}")
 def test_homeostasis_sweep():
    """Find the right homeostasis factor."""
    print("\n=== Homeostasis Factor Sweep (500 pairs, 10 nights) ===")
    for hf in [1.0, 0.99, 0.95, 0.90, 0.85, 0.80, 0.70]:
        mem = SmallMemory()
        cues, targets = gen_memories(500)
        for i in range(500):
            mem.learn(cues[i], targets[i])
        for night in range(10):
            mem.consolidate(num_epochs=1, homeostasis_factor=hf)
        cos, rate = mem.test_recall(cues, targets)
        w = mem.W.data.norm().item()
        sp = (mem.W.data.abs() < 0.01).float().mean().item()
        print(f"  hf={hf:.2f}: CosSim={cos:.4f}, Rate={rate:.0%}, "
              f"W_norm={w:.1f}, Sparsity={sp:.2%}")
 def main():
    print("=" * 60)
    print("Experiment 3b: Consolidation Under Stress")
    print("=" * 60)
    test_capacity_with_consolidation()
    test_multi_night_at_limit()
    test_homeostasis_sweep()
 if __name__ == "__main__":
    main()
--- a/experiments/exp04_real_embeddings.py
+++ b/experiments/exp04_real_embeddings.py
@@ -0,0 +1,365 @@
 """Experiment 4: End-to-end with real sentence embeddings.
 All previous experiments used random vectors. Now test with actual semantic
 embeddings from a sentence transformer model. Key questions:
 1. Does pattern separation preserve semantic neighborhoods?
   (Similar sentences → similar/related codes?)
 2. Can we retrieve memories using paraphrased/related queries?
 3. Does the multi-hop chaining work with semantic embeddings?
 4. Noise tolerance: does embedding-space noise behave differently?
 5. Does a learned separator trained on real data improve noise tolerance?
 """
 import sys
 import time
 import json
 from pathlib import Path
 import torch
 import torch.nn as nn
 import numpy as np
 DEVICE = "cuda"
 RESULTS_DIR = Path(__file__).parent.parent / "doc"
 def cosine(a, b):
    if a.norm() == 0 or b.norm() == 0:
        return 0.0
    return nn.functional.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()
 def winner_take_all(x, k):
    _, idx = x.topk(k, dim=-1)
    out = torch.zeros_like(x)
    out.scatter_(-1, idx, 1.0)
    return out
 # --- Test data: conversation-like memory pairs ---
 MEMORY_PAIRS = [
    # (context/cue, memory/fact to recall)
    ("What's the weather like today?", "User prefers to check weather every morning"),
    ("Let's deploy the new version", "The deployment pipeline uses GitHub Actions with k3s"),
    ("The database is slow again", "Last time DB was slow it was because of missing index on users table"),
    ("Can you review my pull request?", "User prefers small PRs with clear commit messages"),
    ("I need to fix the authentication bug", "Auth service uses JWT tokens with 24h expiry stored in Redis"),
    ("Let's set up monitoring", "Prometheus + Grafana stack is already running on the OCI cluster"),
    ("The API is returning 500 errors", "Last 500 error was caused by OOM in the Python worker"),
    ("I want to learn Rust", "User has strong Python and Go background, new to systems programming"),
    ("Schedule a meeting with the team", "Team standup is at 10am London time, Mon-Fri"),
    ("How do I configure nginx?", "The project uses Traefik as reverse proxy, not nginx"),
    ("The tests are failing in CI", "CI runs on Gitea Actions, tests need postgres service container"),
    ("Let's optimize the search function", "Search uses Elasticsearch, recently upgraded to v8"),
    ("I need to backup the database", "Backups run daily at 3am UTC via cron job to S3"),
    ("The memory usage is too high", "Python service has a known memory leak in the websocket handler"),
    ("Can you help with the Docker setup?", "Project uses docker-compose for local dev, k3s for production"),
    ("I want to add caching", "Redis is already available at redis.internal:6379"),
    ("The frontend is loading slowly", "CDN is CloudFlare, assets should be cached with 1h TTL"),
    ("Let's refactor the payment module", "Payment uses Stripe API, webhook handler is in payments/webhook.py"),
    ("I need to set up a new server", "Standard setup: Ubuntu 22.04, Docker, Tailscale, monitoring agent"),
    ("The log files are too large", "Logs rotate daily, kept for 30 days, shipped to Loki"),
 ]
 # Paraphrased queries (semantically similar to cues but different wording)
 PARAPHRASED_QUERIES = [
    "How's the weather outside?",
    "We should push the new release",
    "The DB performance is terrible",
    "Please look at my code changes",
    "There's a login bug I need to fix",
    "We need better observability",
    "Getting internal server errors from the API",
    "I'm interested in learning a new language like Rust",
    "Need to organize a team meeting",
    "How to set up nginx as a web server?",
    "CI tests keep breaking",
    "The search feature needs to be faster",
    "How do I create a database backup?",
    "The service is using too much RAM",
    "Help me with Docker configuration",
    "I want to implement caching for the API",
    "The website is really slow",
    "The payment system needs restructuring",
    "Setting up a fresh Linux server",
    "Logs are eating up disk space",
 ]
 def load_model():
    """Load a small, fast sentence transformer."""
    from sentence_transformers import SentenceTransformer
    print("Loading sentence-transformers model...")
    model = SentenceTransformer("all-MiniLM-L6-v2", device=DEVICE)
    print(f"  Model loaded. Embedding dim: {model.get_sentence_embedding_dimension()}")
    return model
 def embed_texts(model, texts):
    """Encode texts to normalized embeddings on GPU."""
    embeddings = model.encode(texts, convert_to_tensor=True,
                               normalize_embeddings=True, device=DEVICE)
    return embeddings
 class HebbianMemory:
    def __init__(self, input_dim, code_dim=16384, k=20):
        self.k = k
        self.code_dim = code_dim
        self.input_dim = input_dim
        self.proj = (torch.randn(input_dim, code_dim, device=DEVICE)
                     * (1.0 / input_dim**0.5))
        self.target_proj = (torch.randn(input_dim, code_dim, device=DEVICE)
                            * (1.0 / input_dim**0.5))
        self.W = torch.zeros(code_dim, code_dim, device=DEVICE)
        self.cue_store = []  # For coarse retrieval
        self.target_store = []
        self.metadata = []  # Store original text for debugging
    def sep(self, x):
        return winner_take_all(x @ self.proj, self.k)
    def sep_target(self, x):
        return winner_take_all(x @ self.target_proj, self.k)
    def learn(self, cue_emb, target_emb, cue_text="", target_text=""):
        cc = self.sep(cue_emb)
        tc = self.sep_target(target_emb)
        self.W += torch.outer(tc, cc)
        self.cue_store.append(cue_emb.detach().clone())
        self.target_store.append(target_emb.detach().clone())
        self.metadata.append({"cue": cue_text, "target": target_text})
    def recall_direct(self, query_emb):
        """Direct WTA recall (no coarse retrieval)."""
        cc = self.sep(query_emb)
        raw = self.W @ cc
        return winner_take_all(raw, self.k)
    def recall_coarse_to_fine(self, query_emb, top_n=3):
        """Coarse: NN in embedding space. Fine: Hebbian recall from best match."""
        if not self.cue_store:
            return torch.zeros(self.code_dim, device=DEVICE)
        cue_matrix = torch.stack(self.cue_store)
        sims = nn.functional.cosine_similarity(
            query_emb.unsqueeze(0), cue_matrix, dim=-1)
        best_idx = sims.argmax()
        best_cue = self.cue_store[best_idx]
        cc = self.sep(best_cue)
        raw = self.W @ cc
        return winner_take_all(raw, self.k), best_idx.item()
    def find_nearest_target(self, recalled_code, top_n=3):
        """Given a recalled code, find which stored targets it matches."""
        target_codes = [self.sep_target(t) for t in self.target_store]
        sims = [cosine(recalled_code, tc) for tc in target_codes]
        sorted_idx = np.argsort(sims)[::-1]
        return [(int(i), sims[i], self.metadata[i]) for i in sorted_idx[:top_n]]
 def test_basic_recall(model, mem):
    """Test: can we recall the correct memory for each cue?"""
    print("\n=== Test 1: Direct Recall (exact cues) ===")
    cue_texts = [p[0] for p in MEMORY_PAIRS]
    target_texts = [p[1] for p in MEMORY_PAIRS]
    correct_count = 0
    for i in range(len(MEMORY_PAIRS)):
        cue_emb = embed_texts(model, [cue_texts[i]])[0]
        recalled = mem.recall_direct(cue_emb)
        matches = mem.find_nearest_target(recalled, top_n=3)
        is_correct = matches[0][0] == i
        correct_count += is_correct
        if not is_correct and i < 5:  # Show first few errors
            print(f"  ✗ Cue: '{cue_texts[i][:40]}...'")
            print(f"    Expected: [{i}] '{target_texts[i][:50]}...'")
            print(f"    Got:      [{matches[0][0]}] '{matches[0][2]['target'][:50]}...' "
                  f"(sim={matches[0][1]:.3f})")
    print(f"  Direct recall: {correct_count}/{len(MEMORY_PAIRS)} "
          f"({correct_count/len(MEMORY_PAIRS):.0%})")
    return correct_count / len(MEMORY_PAIRS)
 def test_paraphrase_recall(model, mem):
    """Test: can we recall memories using paraphrased queries?"""
    print("\n=== Test 2: Paraphrase Recall ===")
    target_texts = [p[1] for p in MEMORY_PAIRS]
    # Direct recall (WTA)
    direct_correct = 0
    coarse_correct = 0
    for i, query in enumerate(PARAPHRASED_QUERIES):
        query_emb = embed_texts(model, [query])[0]
        # Direct
        recalled = mem.recall_direct(query_emb)
        matches = mem.find_nearest_target(recalled, top_n=3)
        is_direct = matches[0][0] == i
        direct_correct += is_direct
        # Coarse-to-fine
        recalled_cf, best_idx = mem.recall_coarse_to_fine(query_emb)
        matches_cf = mem.find_nearest_target(recalled_cf, top_n=3)
        is_coarse = matches_cf[0][0] == i
        coarse_correct += is_coarse
        if i < 5:
            status_d = "✓" if is_direct else "✗"
            status_c = "✓" if is_coarse else "✗"
            print(f"  [{status_d}/{status_c}] Q: '{query[:50]}...'")
            if not is_direct:
                print(f"       Direct got: [{matches[0][0]}] "
                      f"'{matches[0][2]['target'][:50]}...'")
            if is_coarse and not is_direct:
                print(f"       Coarse-fine got it right! (via cue #{best_idx})")
    n = len(PARAPHRASED_QUERIES)
    print(f"\n  Direct recall: {direct_correct}/{n} ({direct_correct/n:.0%})")
    print(f"  Coarse-to-fine: {coarse_correct}/{n} ({coarse_correct/n:.0%})")
    return direct_correct / n, coarse_correct / n
 def test_semantic_neighborhood(model, mem):
    """Test: do semantically related cues retrieve related memories?"""
    print("\n=== Test 3: Semantic Neighborhood ===")
    test_queries = [
        "server is down",          # Should relate to: API 500, deployment, monitoring
        "performance problem",      # Should relate to: DB slow, memory, search
        "security issue",           # Should relate to: auth bug, JWT tokens
        "infrastructure setup",     # Should relate to: server, Docker, k3s
    ]
    for query in test_queries:
        query_emb = embed_texts(model, [query])[0]
        recalled = mem.recall_direct(query_emb)
        matches = mem.find_nearest_target(recalled, top_n=3)
        print(f"\n  Query: '{query}'")
        for rank, (idx, sim, meta) in enumerate(matches):
            print(f"    #{rank+1} (sim={sim:.3f}): {meta['target'][:60]}...")
 def test_multihop_semantic(model, mem):
    """Test: multi-hop with semantic embeddings.
    Learn: "weather" → "morning routine" → "coffee shop"
    Can we go from "weather" to "coffee shop" in 2 hops?
    """
    print("\n=== Test 4: Multi-hop with Semantic Chains ===")
    chains = [
        ["What's the weather?", "I usually check weather before going out",
         "My favorite coffee shop is around the corner", "They have great latte art"],
        ["Let's review the code", "The code review found a memory leak",
         "Memory leaks often cause OOM kills", "We need to add memory limits to k8s pods"],
        ["Deploy to production", "Production uses blue-green deployment",
         "The blue environment is currently active", "Switch DNS to green when ready"],
    ]
    for chain_idx, chain in enumerate(chains):
        print(f"\n  Chain {chain_idx+1}: {' → '.join([c[:20]+'...' for c in chain])}")
        # Create a separate small memory for this chain
        chain_mem = HebbianMemory(384, code_dim=8192, k=20)
        chain_embs = [embed_texts(model, [text])[0] for text in chain]
        # Learn consecutive pairs
        for i in range(len(chain) - 1):
            chain_mem.learn(chain_embs[i], chain_embs[i+1],
                           chain[i], chain[i+1])
        # Test recall at each hop distance
        for hops in range(1, len(chain)):
            start_emb = chain_embs[0]
            target_code = chain_mem.sep_target(chain_embs[hops])
            # Multi-hop
            code = chain_mem.sep(start_emb)
            for _ in range(hops):
                raw = chain_mem.W @ code
                code = winner_take_all(raw, chain_mem.k)
            sim = cosine(code, target_code)
            print(f"    {hops} hop(s): '{chain[0][:25]}...' → "
                  f"'{chain[hops][:25]}...' sim={sim:.4f}")
 def test_embedding_distances(model):
    """Analyze: how far apart are original and paraphrased embeddings?"""
    print("\n=== Test 5: Embedding Distance Analysis ===")
    cue_texts = [p[0] for p in MEMORY_PAIRS]
    cue_embs = embed_texts(model, cue_texts)
    para_embs = embed_texts(model, PARAPHRASED_QUERIES)
    # Same-pair distances
    same_pair_sims = []
    for i in range(len(cue_texts)):
        s = cosine(cue_embs[i], para_embs[i])
        same_pair_sims.append(s)
    # Different-pair distances
    diff_pair_sims = []
    for i in range(len(cue_texts)):
        for j in range(len(cue_texts)):
            if i != j:
                diff_pair_sims.append(cosine(cue_embs[i], para_embs[j]))
    print(f"  Same-pair cosine sim:  mean={np.mean(same_pair_sims):.4f}, "
          f"min={np.min(same_pair_sims):.4f}, max={np.max(same_pair_sims):.4f}")
    print(f"  Diff-pair cosine sim:  mean={np.mean(diff_pair_sims):.4f}, "
          f"min={np.min(diff_pair_sims):.4f}, max={np.max(diff_pair_sims):.4f}")
    print(f"  Gap: {np.mean(same_pair_sims) - np.mean(diff_pair_sims):.4f}")
    # Show some examples
    print("\n  Sample distances:")
    for i in range(5):
        print(f"    '{cue_texts[i][:35]}...' ↔ '{PARAPHRASED_QUERIES[i][:35]}...' "
              f"sim={same_pair_sims[i]:.4f}")
 def main():
    print("=" * 60)
    print("Experiment 4: Real Sentence Embeddings")
    print("=" * 60)
    model = load_model()
    # Analyze embedding space first
    test_embedding_distances(model)
    # Build memory
    print("\n--- Building memory ---")
    embed_dim = model.get_sentence_embedding_dimension()
    mem = HebbianMemory(embed_dim, code_dim=16384, k=20)
    cue_texts = [p[0] for p in MEMORY_PAIRS]
    target_texts = [p[1] for p in MEMORY_PAIRS]
    cue_embs = embed_texts(model, cue_texts)
    target_embs = embed_texts(model, target_texts)
    for i in range(len(MEMORY_PAIRS)):
        mem.learn(cue_embs[i], target_embs[i], cue_texts[i], target_texts[i])
    print(f"  Stored {len(MEMORY_PAIRS)} memory pairs")
    # Run tests
    test_basic_recall(model, mem)
    test_paraphrase_recall(model, mem)
    test_semantic_neighborhood(model, mem)
    test_multihop_semantic(model, mem)
 if __name__ == "__main__":
    main()
--- a/experiments/exp04b_multihop_fix.py
+++ b/experiments/exp04b_multihop_fix.py
@@ -0,0 +1,256 @@
 """Experiment 4b: Fix multi-hop for real embeddings.
 Problem: exp04 used separate projections for cues and targets,
 so target codes lived in a different space from cue codes.
 Multi-hop requires: recalled_target_code CAN be used as next cue_code.
 Fix: Use a SINGLE projection for everything.
 W maps from code_space → code_space.
 W @ sep(A) ≈ sep(B) when we learned (A, B).
 Then W @ sep(B) ≈ sep(C) if we also learned (B, C).
 Also: retest paraphrase recall with single projection and various code_dim/k.
 """
 import sys
 import time
 import json
 from pathlib import Path
 import torch
 import torch.nn as nn
 import numpy as np
 DEVICE = "cuda"
 RESULTS_DIR = Path(__file__).parent.parent / "doc"
 def cosine(a, b):
    if a.norm() == 0 or b.norm() == 0:
        return 0.0
    return nn.functional.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()
 def winner_take_all(x, k):
    _, idx = x.topk(k, dim=-1)
    out = torch.zeros_like(x)
    out.scatter_(-1, idx, 1.0)
    return out
 class UnifiedHebbianMemory:
    """Hebbian memory with single unified projection.
    Cues and targets share the same code space → multi-hop works.
    """
    def __init__(self, input_dim, code_dim=16384, k=20):
        self.k = k
        self.code_dim = code_dim
        self.proj = (torch.randn(input_dim, code_dim, device=DEVICE)
                     * (1.0 / input_dim**0.5))
        self.W = torch.zeros(code_dim, code_dim, device=DEVICE)
        self.cue_store = []
        self.target_store = []
        self.metadata = []
    def sep(self, x):
        return winner_take_all(x @ self.proj, self.k)
    def learn(self, cue_emb, target_emb, cue_text="", target_text=""):
        cc = self.sep(cue_emb)
        tc = self.sep(target_emb)
        self.W += torch.outer(tc, cc)
        self.cue_store.append(cue_emb.detach().clone())
        self.target_store.append(target_emb.detach().clone())
        self.metadata.append({"cue": cue_text, "target": target_text})
    def recall(self, query_emb, hops=1):
        code = self.sep(query_emb)
        for _ in range(hops):
            raw = self.W @ code
            code = winner_take_all(raw, self.k)
        return code
    def recall_coarse_to_fine(self, query_emb):
        """NN lookup → exact Hebbian recall."""
        cue_matrix = torch.stack(self.cue_store)
        sims = nn.functional.cosine_similarity(
            query_emb.unsqueeze(0), cue_matrix, dim=-1)
        best_idx = sims.argmax()
        code = self.sep(self.cue_store[best_idx])
        raw = self.W @ code
        return winner_take_all(raw, self.k), best_idx.item()
    def find_nearest_target(self, recalled_code, top_n=3):
        target_codes = [self.sep(t) for t in self.target_store]  # Same projection!
        sims = [cosine(recalled_code, tc) for tc in target_codes]
        sorted_idx = np.argsort(sims)[::-1]
        return [(int(i), sims[i], self.metadata[i]) for i in sorted_idx[:top_n]]
 MEMORY_PAIRS = [
    ("What's the weather like today?", "User prefers to check weather every morning"),
    ("Let's deploy the new version", "The deployment pipeline uses GitHub Actions with k3s"),
    ("The database is slow again", "Last time DB was slow it was because of missing index on users table"),
    ("Can you review my pull request?", "User prefers small PRs with clear commit messages"),
    ("I need to fix the authentication bug", "Auth service uses JWT tokens with 24h expiry stored in Redis"),
    ("Let's set up monitoring", "Prometheus + Grafana stack is already running on the OCI cluster"),
    ("The API is returning 500 errors", "Last 500 error was caused by OOM in the Python worker"),
    ("I want to learn Rust", "User has strong Python and Go background, new to systems programming"),
    ("Schedule a meeting with the team", "Team standup is at 10am London time, Mon-Fri"),
    ("How do I configure nginx?", "The project uses Traefik as reverse proxy, not nginx"),
    ("The tests are failing in CI", "CI runs on Gitea Actions, tests need postgres service container"),
    ("Let's optimize the search function", "Search uses Elasticsearch, recently upgraded to v8"),
    ("I need to backup the database", "Backups run daily at 3am UTC via cron job to S3"),
    ("The memory usage is too high", "Python service has a known memory leak in the websocket handler"),
    ("Can you help with the Docker setup?", "Project uses docker-compose for local dev, k3s for production"),
    ("I want to add caching", "Redis is already available at redis.internal:6379"),
    ("The frontend is loading slowly", "CDN is CloudFlare, assets should be cached with 1h TTL"),
    ("Let's refactor the payment module", "Payment uses Stripe API, webhook handler is in payments/webhook.py"),
    ("I need to set up a new server", "Standard setup: Ubuntu 22.04, Docker, Tailscale, monitoring agent"),
    ("The log files are too large", "Logs rotate daily, kept for 30 days, shipped to Loki"),
 ]
 PARAPHRASED_QUERIES = [
    "How's the weather outside?",
    "We should push the new release",
    "The DB performance is terrible",
    "Please look at my code changes",
    "There's a login bug I need to fix",
    "We need better observability",
    "Getting internal server errors from the API",
    "I'm interested in learning a new language like Rust",
    "Need to organize a team meeting",
    "How to set up nginx as a web server?",
    "CI tests keep breaking",
    "The search feature needs to be faster",
    "How do I create a database backup?",
    "The service is using too much RAM",
    "Help me with Docker configuration",
    "I want to implement caching for the API",
    "The website is really slow",
    "The payment system needs restructuring",
    "Setting up a fresh Linux server",
    "Logs are eating up disk space",
 ]
 def load_model():
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("all-MiniLM-L6-v2", device=DEVICE)
    return model
 def embed_texts(model, texts):
    return model.encode(texts, convert_to_tensor=True,
                        normalize_embeddings=True, device=DEVICE)
 def test_multihop(model):
    """Multi-hop with unified projection."""
    print("\n=== Multi-hop (unified projection) ===")
    chains = [
        ["What's the weather?", "I usually check weather before going out",
         "My favorite coffee shop is around the corner", "They have great latte art"],
        ["Let's review the code", "The code review found a memory leak",
         "Memory leaks often cause OOM kills", "We need to add memory limits to k8s pods"],
        ["Deploy to production", "Production uses blue-green deployment",
         "The blue environment is currently active", "Switch DNS to green when ready"],
        ["The server crashed", "Check the error logs first",
         "Logs show out of memory error", "Need to increase pod memory limit"],
    ]
    embed_dim = model.get_sentence_embedding_dimension()
    for chain in chains:
        # Separate memory per chain to avoid cross-chain interference
        mem = UnifiedHebbianMemory(embed_dim, code_dim=8192, k=20)
        chain_embs = [embed_texts(model, [t])[0] for t in chain]
        # Learn consecutive pairs
        for i in range(len(chain) - 1):
            mem.learn(chain_embs[i], chain_embs[i+1], chain[i], chain[i+1])
        print(f"\n  Chain: {' → '.join([c[:20]+'...' for c in chain])}")
        for hops in range(1, len(chain)):
            recalled = mem.recall(chain_embs[0], hops=hops)
            target_code = mem.sep(chain_embs[hops])
            sim = cosine(recalled, target_code)
            status = "✓" if sim > 0.5 else "✗"
            print(f"    {status} {hops} hop(s): → '{chain[hops][:30]}...' sim={sim:.4f}")
    # Test multi-hop with all chains in ONE memory
    print("\n  --- All chains in ONE memory ---")
    mem_all = UnifiedHebbianMemory(embed_dim, code_dim=16384, k=20)
    all_chain_embs = []
    for chain in chains:
        embs = [embed_texts(model, [t])[0] for t in chain]
        all_chain_embs.append(embs)
        for i in range(len(chain) - 1):
            mem_all.learn(embs[i], embs[i+1], chain[i], chain[i+1])
    for ci, chain in enumerate(chains):
        for hops in range(1, len(chain)):
            recalled = mem_all.recall(all_chain_embs[ci][0], hops=hops)
            target_code = mem_all.sep(all_chain_embs[ci][hops])
            sim = cosine(recalled, target_code)
            status = "✓" if sim > 0.5 else "✗"
            print(f"    {status} Chain{ci+1} {hops}hop: → '{chain[hops][:30]}...' sim={sim:.4f}")
 def test_paraphrase_with_configs(model):
    """Test paraphrase recall with different code_dim/k configs."""
    print("\n=== Paraphrase Recall: Config Sweep ===")
    embed_dim = model.get_sentence_embedding_dimension()
    cue_embs = embed_texts(model, [p[0] for p in MEMORY_PAIRS])
    target_embs = embed_texts(model, [p[1] for p in MEMORY_PAIRS])
    para_embs = embed_texts(model, PARAPHRASED_QUERIES)
    configs = [
        (4096, 20), (8192, 20), (16384, 20), (32768, 20),
        (16384, 10), (16384, 50), (16384, 100),
    ]
    for code_dim, k in configs:
        mem = UnifiedHebbianMemory(embed_dim, code_dim, k)
        for i in range(len(MEMORY_PAIRS)):
            mem.learn(cue_embs[i], target_embs[i],
                     MEMORY_PAIRS[i][0], MEMORY_PAIRS[i][1])
        # Direct recall with paraphrased queries
        direct_correct = 0
        coarse_correct = 0
        for i in range(len(PARAPHRASED_QUERIES)):
            # Direct
            recalled = mem.recall(para_embs[i])
            matches = mem.find_nearest_target(recalled, top_n=1)
            if matches[0][0] == i:
                direct_correct += 1
            # Coarse-to-fine
            recalled_cf, _ = mem.recall_coarse_to_fine(para_embs[i])
            matches_cf = mem.find_nearest_target(recalled_cf, top_n=1)
            if matches_cf[0][0] == i:
                coarse_correct += 1
        n = len(PARAPHRASED_QUERIES)
        print(f"  code={code_dim:>5}, k={k:>3}: "
              f"Direct={direct_correct}/{n} ({direct_correct/n:.0%}), "
              f"Coarse={coarse_correct}/{n} ({coarse_correct/n:.0%})")
 def main():
    print("=" * 60)
    print("Experiment 4b: Multi-hop Fix + Config Sweep")
    print("=" * 60)
    model = load_model()
    test_multihop(model)
    test_paraphrase_with_configs(model)
 if __name__ == "__main__":
    main()
--- a/experiments/exp04c_optimal_config.py
+++ b/experiments/exp04c_optimal_config.py
@@ -0,0 +1,228 @@
 """Experiment 4c: Find optimal config for real-world use.
 From exp04b: k=50 gives 95% paraphrase recall (best).
 Need to verify capacity is still sufficient at k=50.
 Also: test with more realistic memory counts (100-1000).
 """
 import sys
 import time
 import json
 from pathlib import Path
 import torch
 import torch.nn as nn
 import numpy as np
 DEVICE = "cuda"
 RESULTS_DIR = Path(__file__).parent.parent / "doc"
 def cosine(a, b):
    if a.norm() == 0 or b.norm() == 0:
        return 0.0
    return nn.functional.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()
 def winner_take_all(x, k):
    _, idx = x.topk(k, dim=-1)
    out = torch.zeros_like(x)
    out.scatter_(-1, idx, 1.0)
    return out
 class UnifiedHebbianMemory:
    def __init__(self, input_dim, code_dim, k):
        self.k = k
        self.code_dim = code_dim
        self.proj = (torch.randn(input_dim, code_dim, device=DEVICE)
                     * (1.0 / input_dim**0.5))
        self.W = torch.zeros(code_dim, code_dim, device=DEVICE)
    def sep(self, x):
        return winner_take_all(x @ self.proj, self.k)
    def learn(self, cue_emb, target_emb):
        self.W += torch.outer(self.sep(target_emb), self.sep(cue_emb))
    def recall(self, query_emb):
        code = self.sep(query_emb)
        raw = self.W @ code
        return winner_take_all(raw, self.k)
 def test_capacity_with_real_embeddings(model, code_dim, k, max_memories=2000):
    """Generate lots of diverse sentence pairs and test recall."""
    from sentence_transformers import SentenceTransformer
    # Generate diverse sentences programmatically
    topics = [
        "deploy", "database", "API", "testing", "monitoring", "security",
        "frontend", "backend", "caching", "logging", "backup", "server",
        "CI/CD", "Docker", "Kubernetes", "microservice", "authentication",
        "performance", "debugging", "refactoring"
    ]
    actions = [
        "is broken", "needs updating", "has a bug", "was configured wrong",
        "needs optimization", "requires migration", "should be refactored",
        "has a memory leak", "is timing out", "needs documentation"
    ]
    facts = [
        "was fixed last week by adding an index",
        "uses the new v3 API endpoint",
        "is scheduled for maintenance on Friday",
        "requires admin access to modify",
        "has a known issue with large payloads",
        "was migrated from AWS to GCP",
        "needs Python 3.12 or higher",
        "uses Redis for session storage",
        "has rate limiting at 1000 req/min",
        "is monitored by PagerDuty"
    ]
    cue_sentences = []
    target_sentences = []
    for i in range(max_memories):
        topic = topics[i % len(topics)]
        action = actions[i % len(actions)]
        fact = facts[i % len(facts)]
        idx = i // (len(topics) * len(actions))
        cue_sentences.append(f"The {topic} system {action} (issue #{i})")
        target_sentences.append(f"{topic} {fact}, ticket #{i}, priority {idx}")
    embed_dim = model.get_sentence_embedding_dimension()
    mem = UnifiedHebbianMemory(embed_dim, code_dim, k)
    # Encode in batches
    batch_size = 256
    checkpoints = [50, 100, 200, 500, 1000, 2000]
    all_cue_embs = []
    all_target_embs = []
    print(f"  Config: code_dim={code_dim}, k={k}")
    for start in range(0, max_memories, batch_size):
        end = min(start + batch_size, max_memories)
        cue_embs = model.encode(cue_sentences[start:end],
                                convert_to_tensor=True,
                                normalize_embeddings=True, device=DEVICE)
        target_embs = model.encode(target_sentences[start:end],
                                   convert_to_tensor=True,
                                   normalize_embeddings=True, device=DEVICE)
        for i in range(cue_embs.shape[0]):
            mem.learn(cue_embs[i], target_embs[i])
            all_cue_embs.append(cue_embs[i])
            all_target_embs.append(target_embs[i])
        total = len(all_cue_embs)
        if total in checkpoints:
            # Test on random sample
            sample_n = min(100, total)
            indices = torch.randperm(total)[:sample_n].tolist()
            correct = 0
            for idx in indices:
                recalled = mem.recall(all_cue_embs[idx])
                target_code = mem.sep(all_target_embs[idx])
                if cosine(recalled, target_code) > 0.5:
                    correct += 1
            w_norm = mem.W.norm().item()
            print(f"    N={total:>5}: Recall={correct}/{sample_n} "
                  f"({correct/sample_n:.0%}), W_norm={w_norm:.0f}")
 def test_paraphrase_at_scale(model, code_dim, k, n_memories):
    """Add many memories, then test paraphrase recall on a subset."""
    embed_dim = model.get_sentence_embedding_dimension()
    mem = UnifiedHebbianMemory(embed_dim, code_dim, k)
    # Add background memories (noise)
    bg_cues = [f"Background task number {i} about topic {i%20}" for i in range(n_memories)]
    bg_targets = [f"Background fact {i} with detail {i%10}" for i in range(n_memories)]
    bg_cue_embs = model.encode(bg_cues, convert_to_tensor=True,
                                normalize_embeddings=True, device=DEVICE,
                                batch_size=256)
    bg_target_embs = model.encode(bg_targets, convert_to_tensor=True,
                                   normalize_embeddings=True, device=DEVICE,
                                   batch_size=256)
    for i in range(n_memories):
        mem.learn(bg_cue_embs[i], bg_target_embs[i])
    # Now add our specific test memories
    test_pairs = [
        ("What's the weather like today?", "User prefers to check weather every morning"),
        ("Let's deploy the new version", "The deployment pipeline uses GitHub Actions with k3s"),
        ("The database is slow again", "Missing index on users table caused slowdown last time"),
        ("I need to fix the auth bug", "Auth service uses JWT tokens with 24h expiry in Redis"),
        ("The API returns 500 errors", "Last 500 was caused by OOM in the Python worker"),
    ]
    paraphrases = [
        "How's the weather outside?",
        "We should push the new release",
        "DB performance is terrible",
        "There's a login bug to fix",
        "Getting internal server errors",
    ]
    test_cue_embs = model.encode([p[0] for p in test_pairs],
                                  convert_to_tensor=True,
                                  normalize_embeddings=True, device=DEVICE)
    test_target_embs = model.encode([p[1] for p in test_pairs],
                                     convert_to_tensor=True,
                                     normalize_embeddings=True, device=DEVICE)
    para_embs = model.encode(paraphrases, convert_to_tensor=True,
                              normalize_embeddings=True, device=DEVICE)
    for i in range(len(test_pairs)):
        mem.learn(test_cue_embs[i], test_target_embs[i])
    # Test exact recall
    exact_correct = 0
    for i in range(len(test_pairs)):
        recalled = mem.recall(test_cue_embs[i])
        tc = mem.sep(test_target_embs[i])
        if cosine(recalled, tc) > 0.5:
            exact_correct += 1
    # Test paraphrase recall
    para_correct = 0
    for i in range(len(paraphrases)):
        recalled = mem.recall(para_embs[i])
        tc = mem.sep(test_target_embs[i])
        if cosine(recalled, tc) > 0.5:
            para_correct += 1
    n = len(test_pairs)
    print(f"  bg={n_memories}, code={code_dim}, k={k}: "
          f"Exact={exact_correct}/{n}, Para={para_correct}/{n}")
    return exact_correct / n, para_correct / n
 def main():
    print("=" * 60)
    print("Experiment 4c: Optimal Config + Scale Testing")
    print("=" * 60)
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("all-MiniLM-L6-v2", device=DEVICE)
    # Test 1: Capacity with real embeddings
    print("\n=== Capacity Test ===")
    for code_dim, k in [(8192, 50), (16384, 50), (16384, 20), (32768, 50)]:
        test_capacity_with_real_embeddings(model, code_dim, k, max_memories=2000)
        print()
    # Test 2: Paraphrase at scale
    print("\n=== Paraphrase Recall at Scale ===")
    for n_bg in [0, 100, 500, 1000]:
        for code_dim, k in [(8192, 50), (16384, 50)]:
            test_paraphrase_at_scale(model, code_dim, k, n_bg)
 if __name__ == "__main__":
    main()
--- a/experiments/exp05_benchmark.py
+++ b/experiments/exp05_benchmark.py
@@ -0,0 +1,211 @@
 """Experiment 5: Performance benchmarks.
 Measure:
 1. Learning throughput (memories/second)
 2. Recall latency (ms per query)
 3. GPU memory usage at different scales
 4. Multi-hop latency vs hops
 5. End-to-end: embed + separate + recall pipeline
 """
 import sys
 import time
 import json
 from pathlib import Path
 import torch
 import torch.nn as nn
 import numpy as np
 DEVICE = "cuda"
 RESULTS_DIR = Path(__file__).parent.parent / "doc"
 def winner_take_all(x, k):
    _, idx = x.topk(k, dim=-1)
    out = torch.zeros_like(x)
    out.scatter_(-1, idx, 1.0)
    return out
 class BenchMemory:
    def __init__(self, input_dim, code_dim, k):
        self.k = k
        self.code_dim = code_dim
        self.proj = (torch.randn(input_dim, code_dim, device=DEVICE)
                     * (1.0 / input_dim**0.5))
        self.W = torch.zeros(code_dim, code_dim, device=DEVICE)
    def sep(self, x):
        return winner_take_all(x @ self.proj, self.k)
    def learn(self, cue, target):
        self.W += torch.outer(self.sep(target), self.sep(cue))
    def recall(self, query, hops=1):
        code = self.sep(query)
        for _ in range(hops):
            code = winner_take_all(self.W @ code, self.k)
        return code
 def benchmark_learn(input_dim, code_dim, k, n_memories):
    """Measure learning throughput."""
    mem = BenchMemory(input_dim, code_dim, k)
    cues = torch.randn(n_memories, input_dim, device=DEVICE)
    targets = torch.randn(n_memories, input_dim, device=DEVICE)
    torch.cuda.synchronize()
    t0 = time.time()
    for i in range(n_memories):
        mem.learn(cues[i], targets[i])
    torch.cuda.synchronize()
    dt = time.time() - t0
    return n_memories / dt, dt
 def benchmark_recall(input_dim, code_dim, k, n_memories, n_queries=1000, hops=1):
    """Measure recall latency."""
    mem = BenchMemory(input_dim, code_dim, k)
    # Pre-fill
    for _ in range(n_memories):
        c = torch.randn(input_dim, device=DEVICE)
        t = torch.randn(input_dim, device=DEVICE)
        mem.learn(c, t)
    queries = torch.randn(n_queries, input_dim, device=DEVICE)
    torch.cuda.synchronize()
    t0 = time.time()
    for i in range(n_queries):
        mem.recall(queries[i], hops=hops)
    torch.cuda.synchronize()
    dt = time.time() - t0
    return dt / n_queries * 1000  # ms per query
 def benchmark_memory_usage(input_dim, code_dims):
    """Measure GPU memory at different code_dim."""
    results = {}
    for cd in code_dims:
        torch.cuda.empty_cache()
        torch.cuda.reset_peak_memory_stats()
        before = torch.cuda.memory_allocated()
        mem = BenchMemory(input_dim, cd, k=50)
        # Learn 1000 memories
        for _ in range(1000):
            c = torch.randn(input_dim, device=DEVICE)
            t = torch.randn(input_dim, device=DEVICE)
            mem.learn(c, t)
        after = torch.cuda.memory_allocated()
        peak = torch.cuda.max_memory_allocated()
        w_size = cd * cd * 4 / 1024**2  # MB
        proj_size = input_dim * cd * 4 / 1024**2  # MB
        total_allocated = (after - before) / 1024**2
        results[cd] = {
            "W_size_MB": w_size,
            "proj_size_MB": proj_size,
            "total_allocated_MB": total_allocated,
            "peak_MB": peak / 1024**2,
        }
        print(f"  code_dim={cd:>6}: W={w_size:.0f}MB, proj={proj_size:.0f}MB, "
              f"total={total_allocated:.0f}MB")
        del mem
    return results
 def main():
    print("=" * 60)
    print("Experiment 5: Performance Benchmarks")
    print("=" * 60)
    input_dim = 384  # MiniLM dimension
    # Test 1: Learning throughput
    print("\n=== Learning Throughput ===")
    for code_dim, k in [(8192, 50), (16384, 50), (32768, 50)]:
        for n in [1000, 5000, 10000]:
            rate, dt = benchmark_learn(input_dim, code_dim, k, n)
            print(f"  code={code_dim}, k={k}, N={n:>5}: "
                  f"{rate:>8.0f} memories/s ({dt:.2f}s)")
        torch.cuda.empty_cache()
    # Test 2: Recall latency
    print("\n=== Recall Latency ===")
    for code_dim, k in [(8192, 50), (16384, 50), (32768, 50)]:
        for n_mem in [100, 1000, 10000]:
            ms = benchmark_recall(input_dim, code_dim, k, n_mem, n_queries=1000)
            print(f"  code={code_dim}, k={k}, N={n_mem:>5}: {ms:.3f} ms/query")
        torch.cuda.empty_cache()
    # Test 3: Multi-hop latency
    print("\n=== Multi-hop Latency ===")
    for hops in [1, 2, 3, 5, 10]:
        ms = benchmark_recall(input_dim, 16384, 50, 1000, n_queries=1000, hops=hops)
        print(f"  hops={hops:>2}: {ms:.3f} ms/query")
    # Test 4: GPU Memory
    print("\n=== GPU Memory Usage ===")
    benchmark_memory_usage(input_dim, [4096, 8192, 16384, 32768, 65536])
    # Test 5: End-to-end with sentence-transformers
    print("\n=== End-to-End Pipeline Latency ===")
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("all-MiniLM-L6-v2", device=DEVICE)
    mem = BenchMemory(384, 16384, 50)
    # Pre-fill 1000 memories
    sentences = [f"This is test sentence number {i}" for i in range(1000)]
    embs = model.encode(sentences, convert_to_tensor=True,
                        normalize_embeddings=True, device=DEVICE)
    for i in range(1000):
        mem.learn(embs[i], embs[min(i+1, 999)])
    # Benchmark single query pipeline
    query = "What is the test sentence?"
    n_runs = 100
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(n_runs):
        q_emb = model.encode([query], convert_to_tensor=True,
                              normalize_embeddings=True, device=DEVICE)[0]
        recalled = mem.recall(q_emb, hops=1)
    torch.cuda.synchronize()
    dt = (time.time() - t0) / n_runs * 1000
    # Breakdown
    t_embed = 0
    t_recall = 0
    for _ in range(n_runs):
        torch.cuda.synchronize()
        t1 = time.time()
        q_emb = model.encode([query], convert_to_tensor=True,
                              normalize_embeddings=True, device=DEVICE)[0]
        torch.cuda.synchronize()
        t2 = time.time()
        recalled = mem.recall(q_emb, hops=1)
        torch.cuda.synchronize()
        t3 = time.time()
        t_embed += t2 - t1
        t_recall += t3 - t2
    t_embed = t_embed / n_runs * 1000
    t_recall = t_recall / n_runs * 1000
    print(f"  Total: {dt:.1f} ms/query")
    print(f"    Embedding: {t_embed:.1f} ms")
    print(f"    Recall: {t_recall:.3f} ms")
    print(f"    Ratio: embedding is {t_embed/t_recall:.0f}x slower than recall")
 if __name__ == "__main__":
    main()
--- a/experiments/exp05b_benchmark_lite.py
+++ b/experiments/exp05b_benchmark_lite.py
@@ -0,0 +1,158 @@
 """Experiment 5b: Lightweight performance benchmarks.
 Skip the 65536 config that OOMs.
 """
 import sys
 import time
 from pathlib import Path
 import torch
 import torch.nn as nn
 DEVICE = "cuda"
 RESULTS_DIR = Path(__file__).parent.parent / "doc"
 def winner_take_all(x, k):
    _, idx = x.topk(k, dim=-1)
    out = torch.zeros_like(x)
    out.scatter_(-1, idx, 1.0)
    return out
 class BenchMemory:
    def __init__(self, input_dim, code_dim, k):
        self.k = k
        self.code_dim = code_dim
        self.proj = (torch.randn(input_dim, code_dim, device=DEVICE)
                     * (1.0 / input_dim**0.5))
        self.W = torch.zeros(code_dim, code_dim, device=DEVICE)
    def sep(self, x):
        return winner_take_all(x @ self.proj, self.k)
    def learn(self, cue, target):
        self.W += torch.outer(self.sep(target), self.sep(cue))
    def recall(self, query, hops=1):
        code = self.sep(query)
        for _ in range(hops):
            code = winner_take_all(self.W @ code, self.k)
        return code
 def main():
    input_dim = 384
    # Learning throughput
    print("=== Learning Throughput ===")
    for code_dim, k in [(8192, 50), (16384, 50), (32768, 50)]:
        mem = BenchMemory(input_dim, code_dim, k)
        n = 5000
        cues = torch.randn(n, input_dim, device=DEVICE)
        targets = torch.randn(n, input_dim, device=DEVICE)
        torch.cuda.synchronize()
        t0 = time.time()
        for i in range(n):
            mem.learn(cues[i], targets[i])
        torch.cuda.synchronize()
        dt = time.time() - t0
        print(f"  code={code_dim}, k={k}: {n/dt:.0f} memories/s ({dt:.2f}s for {n})")
        del mem
        torch.cuda.empty_cache()
    # Recall latency
    print("\n=== Recall Latency ===")
    for code_dim, k in [(8192, 50), (16384, 50), (32768, 50)]:
        mem = BenchMemory(input_dim, code_dim, k)
        for _ in range(1000):
            mem.learn(torch.randn(input_dim, device=DEVICE),
                     torch.randn(input_dim, device=DEVICE))
        queries = torch.randn(1000, input_dim, device=DEVICE)
        torch.cuda.synchronize()
        t0 = time.time()
        for i in range(1000):
            mem.recall(queries[i])
        torch.cuda.synchronize()
        ms = (time.time() - t0) / 1000 * 1000
        print(f"  code={code_dim}, k={k}: {ms:.3f} ms/query")
        del mem
        torch.cuda.empty_cache()
    # Multi-hop latency
    print("\n=== Multi-hop Latency (code=16384, k=50) ===")
    mem = BenchMemory(input_dim, 16384, 50)
    for _ in range(1000):
        mem.learn(torch.randn(input_dim, device=DEVICE),
                 torch.randn(input_dim, device=DEVICE))
    queries = torch.randn(500, input_dim, device=DEVICE)
    for hops in [1, 2, 3, 5, 10]:
        torch.cuda.synchronize()
        t0 = time.time()
        for i in range(500):
            mem.recall(queries[i], hops=hops)
        torch.cuda.synchronize()
        ms = (time.time() - t0) / 500 * 1000
        print(f"  hops={hops:>2}: {ms:.3f} ms/query")
    del mem
    torch.cuda.empty_cache()
    # Memory usage
    print("\n=== GPU Memory Usage ===")
    for cd in [4096, 8192, 16384, 32768]:
        torch.cuda.empty_cache()
        before = torch.cuda.memory_allocated()
        mem = BenchMemory(input_dim, cd, 50)
        for _ in range(1000):
            mem.learn(torch.randn(input_dim, device=DEVICE),
                     torch.randn(input_dim, device=DEVICE))
        after = torch.cuda.memory_allocated()
        mb = (after - before) / 1024**2
        w_mb = cd * cd * 4 / 1024**2
        print(f"  code_dim={cd:>5}: total={mb:.0f} MB (W matrix={w_mb:.0f} MB)")
        del mem
        torch.cuda.empty_cache()
    # E2E with sentence-transformers
    print("\n=== End-to-End Pipeline ===")
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("all-MiniLM-L6-v2", device=DEVICE)
    mem = BenchMemory(384, 16384, 50)
    embs = model.encode([f"Sentence {i}" for i in range(1000)],
                        convert_to_tensor=True, normalize_embeddings=True,
                        device=DEVICE)
    for i in range(999):
        mem.learn(embs[i], embs[i+1])
    query = "What is the test?"
    n_runs = 50
    # Embedding time
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(n_runs):
        q = model.encode([query], convert_to_tensor=True,
                         normalize_embeddings=True, device=DEVICE)[0]
    torch.cuda.synchronize()
    embed_ms = (time.time() - t0) / n_runs * 1000
    # Recall time
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(n_runs):
        mem.recall(q)
    torch.cuda.synchronize()
    recall_ms = (time.time() - t0) / n_runs * 1000
    print(f"  Embedding: {embed_ms:.1f} ms")
    print(f"  Recall:    {recall_ms:.3f} ms")
    print(f"  Total:     {embed_ms + recall_ms:.1f} ms")
    print(f"  Bottleneck: embedding is {embed_ms/recall_ms:.0f}x slower than recall")
 if __name__ == "__main__":
    main()
--- a/experiments/exp06_biohash.py
+++ b/experiments/exp06_biohash.py
@@ -0,0 +1,381 @@
 """Experiment 6: BioHash — Learnable Fly Algorithm.
 Replace random projection with learned projection trained via contrastive loss
 on real sentence embeddings. The key insight from Dasgupta 2017 (Science):
 random projection + WTA already preserves neighborhoods. Learning the projection
 should make it even better.
 Training objective:
 - Positive pairs (similar sentences): maximize Jaccard overlap of sparse codes
 - Negative pairs (different sentences): minimize overlap
 Since WTA is not differentiable, we use a soft relaxation during training
 (Gumbel-softmax or straight-through estimator) and hard WTA at test time.
 """
 import sys
 import time
 import json
 from pathlib import Path
 import torch
 import torch.nn as nn
 import torch.optim as optim
 import numpy as np
 DEVICE = "cuda"
 RESULTS_DIR = Path(__file__).parent.parent / "doc"
 def winner_take_all(x, k):
    _, idx = x.topk(k, dim=-1)
    out = torch.zeros_like(x)
    out.scatter_(-1, idx, 1.0)
    return out
 def jaccard(a, b):
    """Jaccard similarity of two binary vectors."""
    intersection = (a * b).sum(dim=-1)
    union = ((a + b) > 0).float().sum(dim=-1)
    return (intersection / union.clamp(min=1)).mean().item()
 def soft_topk(x, k, temperature=1.0):
    """Differentiable approximation of WTA using softmax."""
    # Straight-through estimator: hard WTA forward, soft backward
    hard = winner_take_all(x, k)
    soft = torch.softmax(x / temperature, dim=-1) * k  # scaled softmax
    return hard + (soft - soft.detach())  # STE trick
 class BioHash(nn.Module):
    """Learnable Fly Hash with WTA sparsification.
    Architecture mirrors fruit fly olfactory circuit:
    - Projection neurons (PN): input → high-dim (learned, replaces random)
    - Kenyon cells (KC): WTA top-k → sparse binary code
    """
    def __init__(self, input_dim=384, code_dim=16384, k=50):
        super().__init__()
        self.k = k
        self.code_dim = code_dim
        # Learnable projection (replaces random matrix)
        self.proj = nn.Linear(input_dim, code_dim, bias=False)
        # Initialize like random fly projection
        nn.init.normal_(self.proj.weight, std=1.0 / input_dim**0.5)
    def forward(self, x, soft=False, temperature=1.0):
        """
        x: [batch, input_dim] normalized embeddings
        Returns: [batch, code_dim] sparse binary codes
        """
        h = self.proj(x)  # [batch, code_dim]
        if soft:
            return soft_topk(h, self.k, temperature)
        return winner_take_all(h, self.k)
    def encode_hard(self, x):
        """Hard WTA encoding (for inference)."""
        with torch.no_grad():
            return winner_take_all(self.proj(x), self.k)
 class RandomFlyHash(nn.Module):
    """Baseline: original random Fly algorithm (not learned)."""
    def __init__(self, input_dim=384, code_dim=16384, k=50):
        super().__init__()
        self.k = k
        proj = torch.randn(input_dim, code_dim) * (1.0 / input_dim**0.5)
        self.register_buffer('proj', proj)
    def encode_hard(self, x):
        with torch.no_grad():
            return winner_take_all(x @ self.proj, self.k)
 def generate_training_data(model, n_pairs=5000, noise_std=0.3):
    """Generate contrastive pairs from sentence embeddings.
    Positive pairs: same sentence with noise (simulating paraphrase)
    Negative pairs: different sentences
    """
    # Diverse training sentences
    templates = [
        "The {} is having {} issues",
        "We need to {} the {} system",
        "The {} team is working on {}",
        "There's a bug in the {} {}",
        "Let's deploy {} to {}",
        "The {} performance is {}",
        "How do I configure {}?",
        "The {} logs show {}",
        "We should monitor the {} {}",
        "The {} needs {} upgrade",
    ]
    subjects = ["database", "API", "server", "frontend", "backend",
                "auth", "cache", "queue", "storage", "network",
                "deployment", "monitoring", "logging", "testing", "CI/CD"]
    modifiers = ["critical", "minor", "performance", "security", "timeout",
                 "memory", "disk", "CPU", "latency", "throughput"]
    sentences = []
    for t in templates:
        for s in subjects:
            for m in modifiers:
                sentences.append(t.format(s, m))
    np.random.shuffle(sentences)
    sentences = sentences[:n_pairs * 2]  # enough for pairs
    # Encode
    embs = model.encode(sentences, convert_to_tensor=True,
                        normalize_embeddings=True, device=DEVICE,
                        batch_size=256)
    return embs
 def train_biohash(model, code_dim=16384, k=50, epochs=100, batch_size=256,
                  lr=1e-3, noise_std=0.3, margin=0.2):
    """Train BioHash with contrastive loss on sentence embeddings."""
    embed_dim = model.get_sentence_embedding_dimension()
    hasher = BioHash(embed_dim, code_dim, k).to(DEVICE)
    optimizer = optim.Adam(hasher.parameters(), lr=lr)
    scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, epochs)
    print(f"Training BioHash: code={code_dim}, k={k}, noise={noise_std}")
    # Generate training embeddings
    embs = generate_training_data(model, n_pairs=5000)
    for epoch in range(epochs):
        # Sample batch
        idx = torch.randperm(embs.shape[0])[:batch_size]
        anchor = embs[idx]
        # Positive: add noise (simulate paraphrase)
        pos = nn.functional.normalize(
            anchor + torch.randn_like(anchor) * noise_std, dim=-1)
        # Negative: random different embeddings
        neg_idx = torch.randperm(embs.shape[0])[:batch_size]
        neg = embs[neg_idx]
        # Forward with STE
        code_anchor = hasher(anchor, soft=True, temperature=0.5)
        code_pos = hasher(pos, soft=True, temperature=0.5)
        code_neg = hasher(neg, soft=True, temperature=0.5)
        # Jaccard-like loss (differentiable via STE)
        # Positive overlap: maximize
        pos_overlap = (code_anchor * code_pos).sum(dim=-1) / k
        # Negative overlap: minimize (with margin)
        neg_overlap = (code_anchor * code_neg).sum(dim=-1) / k
        loss = -pos_overlap.mean() + torch.relu(neg_overlap - margin).mean()
        optimizer.zero_grad()
        loss.backward()
        nn.utils.clip_grad_norm_(hasher.parameters(), 1.0)
        optimizer.step()
        scheduler.step()
        if (epoch + 1) % 20 == 0:
            # Eval with hard WTA
            with torch.no_grad():
                h_anchor = hasher.encode_hard(anchor)
                h_pos = hasher.encode_hard(pos)
                h_neg = hasher.encode_hard(neg)
                j_pos = jaccard(h_anchor, h_pos)
                j_neg = jaccard(h_anchor, h_neg)
            print(f"  Epoch {epoch+1}: loss={loss.item():.4f}, "
                  f"Jaccard_pos={j_pos:.4f}, Jaccard_neg={j_neg:.4f}, "
                  f"gap={j_pos-j_neg:.4f}")
    return hasher
 def evaluate_recall(hasher, model, label=""):
    """Test associative recall with this hasher."""
    # Memory pairs
    pairs = [
        ("What's the weather like today?", "User prefers to check weather every morning"),
        ("Let's deploy the new version", "The deployment pipeline uses GitHub Actions with k3s"),
        ("The database is slow again", "Missing index on users table caused slowdown"),
        ("I need to fix the auth bug", "Auth uses JWT tokens with 24h expiry in Redis"),
        ("The API returns 500 errors", "Last 500 was OOM in the Python worker"),
        ("Let's set up monitoring", "Prometheus + Grafana on OCI cluster"),
        ("The tests are failing", "CI needs postgres service container"),
        ("Memory usage is too high", "Known leak in websocket handler"),
        ("Help with Docker setup", "docker-compose for dev, k3s for prod"),
        ("Log files are too large", "Logs rotate daily, 30 days retention, shipped to Loki"),
    ]
    paraphrases = [
        "How's the weather outside?",
        "We should push the new release",
        "DB performance is terrible",
        "There's a login bug to fix",
        "Getting internal server errors",
        "We need better observability",
        "CI tests keep breaking",
        "The service is using too much RAM",
        "Help me with Docker configuration",
        "Logs are eating up disk space",
    ]
    cue_embs = model.encode([p[0] for p in pairs], convert_to_tensor=True,
                             normalize_embeddings=True, device=DEVICE)
    target_embs = model.encode([p[1] for p in pairs], convert_to_tensor=True,
                                normalize_embeddings=True, device=DEVICE)
    para_embs = model.encode(paraphrases, convert_to_tensor=True,
                              normalize_embeddings=True, device=DEVICE)
    # Build Hebbian memory
    code_dim = hasher.encode_hard(cue_embs[:1]).shape[-1]
    k = int(hasher.encode_hard(cue_embs[:1]).sum().item())
    W = torch.zeros(code_dim, code_dim, device=DEVICE)
    cue_codes = hasher.encode_hard(cue_embs)
    target_codes = hasher.encode_hard(target_embs)
    for i in range(len(pairs)):
        W += torch.outer(target_codes[i], cue_codes[i])
    # Test exact recall
    exact_correct = 0
    for i in range(len(pairs)):
        recalled = winner_take_all(W @ cue_codes[i], k)
        sims = nn.functional.cosine_similarity(
            recalled.unsqueeze(0), target_codes, dim=-1)
        if sims.argmax().item() == i:
            exact_correct += 1
    # Test paraphrase recall
    para_correct = 0
    para_codes = hasher.encode_hard(para_embs)
    for i in range(len(paraphrases)):
        recalled = winner_take_all(W @ para_codes[i], k)
        sims = nn.functional.cosine_similarity(
            recalled.unsqueeze(0), target_codes, dim=-1)
        if sims.argmax().item() == i:
            para_correct += 1
    # Code overlap analysis
    pos_overlaps = []
    neg_overlaps = []
    for i in range(len(pairs)):
        # Positive: cue vs paraphrase
        overlap = (cue_codes[i] * para_codes[i]).sum().item() / k
        pos_overlaps.append(overlap)
        # Negative: cue vs random other paraphrase
        j = (i + 1) % len(pairs)
        overlap_neg = (cue_codes[i] * para_codes[j]).sum().item() / k
        neg_overlaps.append(overlap_neg)
    n = len(pairs)
    print(f"  {label}: Exact={exact_correct}/{n}, Para={para_correct}/{n}, "
          f"CodeOverlap: pos={np.mean(pos_overlaps):.3f}, "
          f"neg={np.mean(neg_overlaps):.3f}, "
          f"gap={np.mean(pos_overlaps)-np.mean(neg_overlaps):.3f}")
    return exact_correct / n, para_correct / n, np.mean(pos_overlaps)
 def evaluate_at_scale(hasher, model, n_background, label=""):
    """Test with background memories (the real challenge)."""
    pairs = [
        ("The database is slow", "Check missing indexes on users table"),
        ("Deploy to production", "Use blue-green via GitHub Actions"),
        ("Server crashed", "Check logs, likely OOM in Python worker"),
        ("Fix the auth bug", "JWT tokens with 24h expiry in Redis"),
        ("API returns 500", "OOM in Python worker process"),
    ]
    paraphrases = [
        "DB performance terrible",
        "Push the new release",
        "Server is down",
        "Login bug needs fixing",
        "Getting 500 errors from API",
    ]
    # Background noise
    bg_sentences = [f"Background task {i} about topic {i%20}" for i in range(n_background)]
    bg_targets = [f"Background detail {i} with info {i%10}" for i in range(n_background)]
    all_cues = [p[0] for p in pairs] + bg_sentences
    all_targets = [p[1] for p in pairs] + bg_targets
    cue_embs = model.encode(all_cues, convert_to_tensor=True,
                             normalize_embeddings=True, device=DEVICE, batch_size=256)
    target_embs = model.encode(all_targets, convert_to_tensor=True,
                                normalize_embeddings=True, device=DEVICE, batch_size=256)
    para_embs = model.encode(paraphrases, convert_to_tensor=True,
                              normalize_embeddings=True, device=DEVICE)
    # Build memory
    cue_codes = hasher.encode_hard(cue_embs)
    target_codes = hasher.encode_hard(target_embs)
    code_dim = cue_codes.shape[-1]
    k = int(cue_codes[0].sum().item())
    W = torch.zeros(code_dim, code_dim, device=DEVICE)
    for i in range(len(all_cues)):
        W += torch.outer(target_codes[i], cue_codes[i])
    # Test paraphrase recall
    para_codes = hasher.encode_hard(para_embs)
    correct = 0
    for i in range(len(paraphrases)):
        recalled = winner_take_all(W @ para_codes[i], k)
        sims = nn.functional.cosine_similarity(
            recalled.unsqueeze(0), target_codes[:len(pairs)], dim=-1)
        if sims.argmax().item() == i:
            correct += 1
    n = len(paraphrases)
    print(f"  {label} (bg={n_background}): Para={correct}/{n} ({correct/n:.0%})")
    return correct / n
 def main():
    print("=" * 60)
    print("Experiment 6: BioHash — Learnable Fly Algorithm")
    print("=" * 60)
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("all-MiniLM-L6-v2", device=DEVICE)
    # Baseline: random projection (current approach)
    print("\n=== Baseline: Random Fly Hash ===")
    random_hasher = RandomFlyHash(384, 16384, 50).to(DEVICE)
    evaluate_recall(random_hasher, model, "Random")
    for n_bg in [0, 100, 500]:
        evaluate_at_scale(random_hasher, model, n_bg, "Random")
    # Train BioHash with different configs
    print("\n=== Training BioHash ===")
    for noise_std in [0.2, 0.5]:
        print(f"\n--- noise_std={noise_std} ---")
        hasher = train_biohash(model, code_dim=16384, k=50,
                               epochs=200, noise_std=noise_std, lr=1e-3)
        evaluate_recall(hasher, model, f"BioHash(noise={noise_std})")
        for n_bg in [0, 100, 500]:
            evaluate_at_scale(hasher, model, n_bg, f"BioHash(noise={noise_std})")
    # Try different k values with BioHash
    print("\n=== BioHash: k sweep ===")
    for k in [20, 50, 100, 200]:
        hasher = train_biohash(model, code_dim=16384, k=k,
                               epochs=200, noise_std=0.3, lr=1e-3)
        evaluate_recall(hasher, model, f"BioHash(k={k})")
        evaluate_at_scale(hasher, model, 500, f"BioHash(k={k})")
 if __name__ == "__main__":
    main()
--- a/experiments/exp07_attractor.py
+++ b/experiments/exp07_attractor.py
@@ -0,0 +1,335 @@
 """Experiment 7: Attractor dynamics for noise-tolerant recall.
 Current architecture: heteroassociative, one-shot (W @ cue → target)
 Problem: noisy cue → noisy recall, no error correction
 Fix: Use attractor dynamics (like real CA3 recurrent network).
 Approach 1: Autoassociative + heteroassociative
  - Store patterns as attractors: W_auto += outer(pattern, pattern)
  - Noisy cue → iterate W_auto until convergence → clean cue
  - Then: W_hetero @ clean_cue → target
 Approach 2: Recurrent settling with inhibition
  - W stores associations
  - Recall: iterate (W @ code → WTA → W @ code → ...) with lateral inhibition
  - Network settles into clean attractor state
 Approach 3: Modern Hopfield (softmax energy)
  - Replace linear W @ x with softmax-based attention over stored patterns
  - Exponential storage capacity, natural noise tolerance
 Approach 4: Hebbian + recurrent cleanup with learned inhibition
  - W for associations + lateral inhibition matrix for competition
 """
 import sys
 import time
 from pathlib import Path
 import torch
 import torch.nn as nn
 import numpy as np
 DEVICE = "cuda"
 def cosine(a, b):
    if a.norm() == 0 or b.norm() == 0:
        return 0.0
    return nn.functional.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()
 def winner_take_all(x, k):
    _, idx = x.topk(k, dim=-1)
    out = torch.zeros_like(x)
    out.scatter_(-1, idx, 1.0)
    return out
 # ===== Approach 1: Autoassociative cleanup + heteroassociative recall =====
 class AttractorMemory:
    """Two-stage recall: first clean the cue, then associate.
    W_auto: autoassociative (cue → cue), stores cue patterns as attractors
    W_hetero: heteroassociative (cue <20><><EFBFBD> target), stores associations
    Recall: noisy_cue → settle in W_auto → clean_cue → W_hetero → target
    """
    def __init__(self, input_dim, code_dim=16384, k=50):
        self.k = k
        self.code_dim = code_dim
        self.proj = (torch.randn(input_dim, code_dim, device=DEVICE)
                     * (1.0 / input_dim**0.5))
        # Autoassociative: cue cleanup network
        self.W_auto = torch.zeros(code_dim, code_dim, device=DEVICE)
        # Heteroassociative: cue → target
        self.W_hetero = torch.zeros(code_dim, code_dim, device=DEVICE)
    def sep(self, x):
        return winner_take_all(x @ self.proj, self.k)
    def learn(self, cue_emb, target_emb):
        cc = self.sep(cue_emb)
        tc = self.sep(target_emb)
        # Auto: store cue as attractor
        self.W_auto += torch.outer(cc, cc)
        # Hetero: cue → target
        self.W_hetero += torch.outer(tc, cc)
    def settle(self, code, W, steps=10):
        """Iterate until convergence (attractor dynamics)."""
        for _ in range(steps):
            raw = W @ code
            new_code = winner_take_all(raw, self.k)
            if (new_code == code).all():
                break  # Converged
            code = new_code
        return code
    def recall(self, query_emb, settle_steps=10):
        """Noisy query → auto-settle → hetero-associate."""
        # Encode
        code = self.sep(query_emb)
        # Phase 1: Settle in autoassociative network (cleanup)
        clean_code = self.settle(code, self.W_auto, steps=settle_steps)
        # Phase 2: Associate
        raw = self.W_hetero @ clean_code
        return winner_take_all(raw, self.k)
    def recall_no_settle(self, query_emb):
        """Direct recall without settling (baseline)."""
        code = self.sep(query_emb)
        raw = self.W_hetero @ code
        return winner_take_all(raw, self.k)
 # ===== Approach 2: Modern Hopfield-inspired attention =====
 class HopfieldMemory:
    """Modern Hopfield network: attention over stored patterns.
    Instead of W @ query (linear), use:
    softmax(beta * query @ stored_patterns^T) @ stored_targets
    This gives exponential capacity and natural noise tolerance.
    Still uses WTA codes for compatibility with Hebbian multi-hop.
    """
    def __init__(self, input_dim, code_dim=16384, k=50, beta=8.0):
        self.k = k
        self.code_dim = code_dim
        self.beta = beta
        self.proj = (torch.randn(input_dim, code_dim, device=DEVICE)
                     * (1.0 / input_dim**0.5))
        self.stored_cue_codes = []
        self.stored_target_codes = []
    def sep(self, x):
        return winner_take_all(x @ self.proj, self.k)
    def learn(self, cue_emb, target_emb):
        self.stored_cue_codes.append(self.sep(cue_emb))
        self.stored_target_codes.append(self.sep(target_emb))
    def recall(self, query_emb, steps=3):
        """Hopfield retrieval: iterative attention over stored patterns."""
        if not self.stored_cue_codes:
            return torch.zeros(self.code_dim, device=DEVICE)
        cue_matrix = torch.stack(self.stored_cue_codes)  # [N, code_dim]
        target_matrix = torch.stack(self.stored_target_codes)
        xi = self.sep(query_emb)  # [code_dim]
        for _ in range(steps):
            # Attention weights
            scores = self.beta * (xi @ cue_matrix.T)  # [N]
            attn = torch.softmax(scores, dim=0)  # [N]
            # Weighted sum of stored cue patterns (settle to nearest)
            xi = attn @ cue_matrix  # [code_dim]
            xi = winner_take_all(xi, self.k)
        # Final: associate to target
        scores = self.beta * (xi @ cue_matrix.T)
        attn = torch.softmax(scores, dim=0)
        recalled = attn @ target_matrix
        return winner_take_all(recalled, self.k)
 # ===== Approach 3: Recurrent Hebbian with lateral inhibition =====
 class RecurrentHebbianMemory:
    """Hebbian W + lateral inhibition for competitive recall.
    During settling, neurons compete: strongly activated patterns
    suppress weakly activated ones via inhibition.
    """
    def __init__(self, input_dim, code_dim=16384, k=50, inhibition=0.1):
        self.k = k
        self.code_dim = code_dim
        self.inhibition = inhibition
        self.proj = (torch.randn(input_dim, code_dim, device=DEVICE)
                     * (1.0 / input_dim**0.5))
        self.W = torch.zeros(code_dim, code_dim, device=DEVICE)
    def sep(self, x):
        return winner_take_all(x @ self.proj, self.k)
    def learn(self, cue_emb, target_emb):
        cc = self.sep(cue_emb)
        tc = self.sep(target_emb)
        self.W += torch.outer(tc, cc)
        # Also store cue as auto-attractor (for settling)
        self.W += torch.outer(cc, cc) * 0.5
    def recall(self, query_emb, steps=5):
        code = self.sep(query_emb)
        for _ in range(steps):
            # Excitation from W
            excitation = self.W @ code
            # Global inhibition: subtract mean activity
            inhibition = excitation.mean() * self.inhibition
            activation = excitation - inhibition
            # WTA: winner suppresses losers
            code = winner_take_all(activation, self.k)
        return code
 # ===== Test harness =====
 def build_and_test(MemClass, model, n_test_pairs=10, n_background=0,
                   label="", **kwargs):
    """Unified test for all memory architectures."""
    from sentence_transformers import SentenceTransformer
    pairs = [
        ("What's the weather like today?", "User checks weather every morning"),
        ("Let's deploy the new version", "Deployment uses GitHub Actions with k3s"),
        ("The database is slow again", "Missing index on users table"),
        ("I need to fix the auth bug", "JWT tokens with 24h expiry in Redis"),
        ("The API returns 500 errors", "OOM in the Python worker"),
        ("Let's set up monitoring", "Prometheus + Grafana on OCI cluster"),
        ("Tests are failing in CI", "CI needs postgres service container"),
        ("Memory usage is too high", "Leak in websocket handler"),
        ("Help with Docker setup", "docker-compose for dev, k3s for prod"),
        ("Log files are too large", "Logs rotate daily, shipped to Loki"),
    ][:n_test_pairs]
    paraphrases = [
        "How's the weather outside?",
        "We should push the new release",
        "DB performance is terrible",
        "There's a login bug to fix",
        "Getting internal server errors",
        "We need better observability",
        "CI tests keep breaking",
        "Service using too much RAM",
        "Docker configuration help",
        "Logs eating up disk space",
    ][:n_test_pairs]
    embed_dim = model.get_sentence_embedding_dimension()
    mem = MemClass(embed_dim, **kwargs)
    # Store test memories
    cue_embs = model.encode([p[0] for p in pairs], convert_to_tensor=True,
                             normalize_embeddings=True, device=DEVICE)
    target_embs = model.encode([p[1] for p in pairs], convert_to_tensor=True,
                                normalize_embeddings=True, device=DEVICE)
    for i in range(len(pairs)):
        mem.learn(cue_embs[i], target_embs[i])
    # Store background noise
    if n_background > 0:
        bg_cues = [f"Background task {i} about topic {i%20}" for i in range(n_background)]
        bg_targets = [f"Background fact {i} detail {i%10}" for i in range(n_background)]
        bg_cue_embs = model.encode(bg_cues, convert_to_tensor=True,
                                    normalize_embeddings=True, device=DEVICE, batch_size=256)
        bg_target_embs = model.encode(bg_targets, convert_to_tensor=True,
                                       normalize_embeddings=True, device=DEVICE, batch_size=256)
        for i in range(n_background):
            mem.learn(bg_cue_embs[i], bg_target_embs[i])
    # Test
    target_codes = torch.stack([mem.sep(t) for t in target_embs])
    para_embs = model.encode(paraphrases, convert_to_tensor=True,
                              normalize_embeddings=True, device=DEVICE)
    exact_correct = 0
    para_correct = 0
    for i in range(len(pairs)):
        # Exact
        recalled = mem.recall(cue_embs[i])
        sims = nn.functional.cosine_similarity(recalled.unsqueeze(0), target_codes, dim=-1)
        if sims.argmax().item() == i:
            exact_correct += 1
        # Paraphrase
        recalled_p = mem.recall(para_embs[i])
        sims_p = nn.functional.cosine_similarity(recalled_p.unsqueeze(0), target_codes, dim=-1)
        if sims_p.argmax().item() == i:
            para_correct += 1
    n = len(pairs)
    print(f"  {label} (bg={n_background}): "
          f"Exact={exact_correct}/{n} ({exact_correct/n:.0%}), "
          f"Para={para_correct}/{n} ({para_correct/n:.0%})")
    return exact_correct / n, para_correct / n
 def main():
    print("=" * 60)
    print("Experiment 7: Attractor Dynamics")
    print("=" * 60)
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("all-MiniLM-L6-v2", device=DEVICE)
    configs = [
        ("Flat Hebbian (baseline)", dict(code_dim=16384, k=50)),
    ]
    # Test each architecture at different scales
    for bg in [0, 100, 500, 1000]:
        print(f"\n=== Background memories: {bg} ===")
        # Baseline: flat Hebbian (no settling)
        class FlatHebbian:
            def __init__(self, input_dim, code_dim=16384, k=50):
                self.k = k
                self.code_dim = code_dim
                self.proj = (torch.randn(input_dim, code_dim, device=DEVICE)
                             * (1.0 / input_dim**0.5))
                self.W = torch.zeros(code_dim, code_dim, device=DEVICE)
            def sep(self, x):
                return winner_take_all(x @ self.proj, self.k)
            def learn(self, c, t):
                self.W += torch.outer(self.sep(t), self.sep(c))
            def recall(self, q):
                code = self.sep(q)
                return winner_take_all(self.W @ code, self.k)
        build_and_test(FlatHebbian, model, n_background=bg,
                       label="Flat Hebbian", code_dim=16384, k=50)
        # Approach 1: Autoassociative cleanup
        build_and_test(AttractorMemory, model, n_background=bg,
                       label="Attractor (auto+hetero)", code_dim=16384, k=50)
        # Approach 2: Modern Hopfield
        for beta in [4.0, 8.0, 16.0]:
            build_and_test(HopfieldMemory, model, n_background=bg,
                           label=f"Hopfield (β={beta})", code_dim=16384, k=50,
                           beta=beta)
        # Approach 3: Recurrent with inhibition
        for inhib in [0.1, 0.5, 1.0]:
            build_and_test(RecurrentHebbianMemory, model, n_background=bg,
                           label=f"Recurrent (inhib={inhib})", code_dim=16384, k=50,
                           inhibition=inhib)
 if __name__ == "__main__":
    main()
--- a/experiments/exp07b_hopfield_deep.py
+++ b/experiments/exp07b_hopfield_deep.py
@@ -0,0 +1,375 @@
 """Experiment 7b: Deep dive into Hopfield memory.
 Hopfield crushed it at 1000 bg (100% para recall). Now stress test:
 1. Scale to 5K, 10K, 20K memories — does softmax attention hold up?
 2. Multi-hop: can we chain through Hopfield? (A→B→C)
 3. Latency: O(N) attention — how slow at 20K?
 4. β optimization: find sweet spot
 5. Memory: storing all patterns explicitly — how much VRAM?
 6. Mixed difficulty: semantically similar distractors (not just random bg)
 """
 import sys
 import time
 from pathlib import Path
 import torch
 import torch.nn as nn
 import numpy as np
 DEVICE = "cuda"
 def cosine(a, b):
    if a.norm() == 0 or b.norm() == 0:
        return 0.0
    return nn.functional.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()
 def winner_take_all(x, k):
    _, idx = x.topk(k, dim=-1)
    out = torch.zeros_like(x)
    out.scatter_(-1, idx, 1.0)
    return out
 class HopfieldMemory:
    def __init__(self, input_dim, code_dim=16384, k=50, beta=16.0):
        self.k = k
        self.code_dim = code_dim
        self.beta = beta
        self.proj = (torch.randn(input_dim, code_dim, device=DEVICE)
                     * (1.0 / input_dim**0.5))
        self.cue_codes = []
        self.target_codes = []
        self.cue_embs = []
        self.target_embs = []
    def sep(self, x):
        return winner_take_all(x @ self.proj, self.k)
    def learn(self, cue_emb, target_emb):
        self.cue_codes.append(self.sep(cue_emb))
        self.target_codes.append(self.sep(target_emb))
        self.cue_embs.append(cue_emb.detach())
        self.target_embs.append(target_emb.detach())
    def _get_matrices(self):
        return torch.stack(self.cue_codes), torch.stack(self.target_codes)
    def recall(self, query_emb, steps=3):
        cue_mat, target_mat = self._get_matrices()
        xi = self.sep(query_emb)
        for _ in range(steps):
            scores = self.beta * (xi @ cue_mat.T)
            attn = torch.softmax(scores, dim=0)
            xi = attn @ cue_mat
            xi = winner_take_all(xi, self.k)
        # Final association
        scores = self.beta * (xi @ cue_mat.T)
        attn = torch.softmax(scores, dim=0)
        recalled = attn @ target_mat
        return winner_take_all(recalled, self.k)
    def recall_multihop(self, query_emb, hops=2, steps_per_hop=3):
        """Multi-hop: settle to cue → get target → use target as next cue."""
        cue_mat, target_mat = self._get_matrices()
        xi = self.sep(query_emb)
        results = []
        for hop in range(hops):
            # Settle to nearest cue attractor
            for _ in range(steps_per_hop):
                scores = self.beta * (xi @ cue_mat.T)
                attn = torch.softmax(scores, dim=0)
                xi = attn @ cue_mat
                xi = winner_take_all(xi, self.k)
            # Associate: cue → target
            scores = self.beta * (xi @ cue_mat.T)
            attn = torch.softmax(scores, dim=0)
            target = attn @ target_mat
            target = winner_take_all(target, self.k)
            results.append(target)
            # Next hop: use target as new query
            xi = target
        return results
    def recall_embedding_space(self, query_emb, steps=3):
        """Hopfield attention in raw embedding space (no WTA codes).
        Might be better for noise tolerance since embeddings are continuous.
        """
        if not self.cue_embs:
            return None
        cue_mat = torch.stack(self.cue_embs)
        target_mat = torch.stack(self.target_embs)
        xi = query_emb
        for _ in range(steps):
            scores = self.beta * (xi @ cue_mat.T)
            attn = torch.softmax(scores, dim=0)
            xi = attn @ cue_mat
        # Final: get target
        scores = self.beta * (xi @ cue_mat.T)
        attn = torch.softmax(scores, dim=0)
        return attn @ target_mat
 def load_model():
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer("all-MiniLM-L6-v2", device=DEVICE)
 def test_scale(model, n_background_list, beta=16.0):
    """Test Hopfield at different scales."""
    print(f"\n=== Scale Test (β={beta}) ===")
    pairs = [
        ("What's the weather like today?", "User checks weather every morning"),
        ("Let's deploy the new version", "Deployment uses GitHub Actions with k3s"),
        ("The database is slow again", "Missing index on users table"),
        ("I need to fix the auth bug", "JWT tokens with 24h expiry in Redis"),
        ("The API returns 500 errors", "OOM in the Python worker"),
    ]
    paraphrases = [
        "How's the weather outside?",
        "We should push the new release",
        "DB performance is terrible",
        "There's a login bug to fix",
        "Getting internal server errors",
    ]
    embed_dim = model.get_sentence_embedding_dimension()
    cue_embs = model.encode([p[0] for p in pairs], convert_to_tensor=True,
                             normalize_embeddings=True, device=DEVICE)
    target_embs = model.encode([p[1] for p in pairs], convert_to_tensor=True,
                                normalize_embeddings=True, device=DEVICE)
    para_embs = model.encode(paraphrases, convert_to_tensor=True,
                              normalize_embeddings=True, device=DEVICE)
    for n_bg in n_background_list:
        mem = HopfieldMemory(embed_dim, code_dim=8192, k=50, beta=beta)
        # Store test pairs
        for i in range(len(pairs)):
            mem.learn(cue_embs[i], target_embs[i])
        # Store background
        if n_bg > 0:
            # More diverse background sentences
            bg_cues = []
            bg_targets = []
            topics = ["server", "database", "API", "frontend", "backend",
                      "cache", "queue", "network", "storage", "auth"]
            for i in range(n_bg):
                t = topics[i % len(topics)]
                bg_cues.append(f"The {t} system has issue number {i}")
                bg_targets.append(f"Issue {i} for {t} requires attention from team {i%5}")
            for start in range(0, n_bg, 256):
                end = min(start + 256, n_bg)
                bc = model.encode(bg_cues[start:end], convert_to_tensor=True,
                                   normalize_embeddings=True, device=DEVICE)
                bt = model.encode(bg_targets[start:end], convert_to_tensor=True,
                                   normalize_embeddings=True, device=DEVICE)
                for j in range(bc.shape[0]):
                    mem.learn(bc[j], bt[j])
        # Test
        target_codes = torch.stack([mem.sep(t) for t in target_embs])
        # Paraphrase recall
        t0 = time.time()
        para_correct = 0
        for i in range(len(paraphrases)):
            recalled = mem.recall(para_embs[i])
            sims = nn.functional.cosine_similarity(recalled.unsqueeze(0), target_codes, dim=-1)
            if sims.argmax().item() == i:
                para_correct += 1
        recall_time = (time.time() - t0) / len(paraphrases) * 1000
        # Also test in embedding space
        para_correct_emb = 0
        for i in range(len(paraphrases)):
            recalled_emb = mem.recall_embedding_space(para_embs[i])
            sims = nn.functional.cosine_similarity(recalled_emb.unsqueeze(0), target_embs, dim=-1)
            if sims.argmax().item() == i:
                para_correct_emb += 1
        n = len(paraphrases)
        total_mem = len(mem.cue_codes)
        vram = total_mem * 8192 * 4 * 2 / 1024**2  # codes + embs approx
        print(f"  N={total_mem:>6}: Code={para_correct}/{n} ({para_correct/n:.0%}), "
              f"Emb={para_correct_emb}/{n} ({para_correct_emb/n:.0%}), "
              f"time={recall_time:.1f}ms, ~VRAM={vram:.0f}MB")
        del mem
        torch.cuda.empty_cache()
 def test_multihop(model):
    """Multi-hop through Hopfield memory."""
    print("\n=== Multi-hop Test ===")
    chains = [
        ["What's the weather?", "I check weather before going out",
         "My coffee shop is around the corner", "They have great latte art"],
        ["Let's review the code", "Code review found a memory leak",
         "Memory leaks cause OOM kills", "Need memory limits in k8s"],
        ["Deploy to production", "Production uses blue-green deploy",
         "Blue environment is active", "Switch DNS to green when ready"],
    ]
    embed_dim = model.get_sentence_embedding_dimension()
    for chain in chains:
        mem = HopfieldMemory(embed_dim, code_dim=8192, k=50, beta=16.0)
        chain_embs = [model.encode([t], convert_to_tensor=True,
                                    normalize_embeddings=True, device=DEVICE)[0]
                      for t in chain]
        # Learn consecutive pairs
        for i in range(len(chain) - 1):
            mem.learn(chain_embs[i], chain_embs[i+1])
        # Multi-hop recall
        target_codes = [mem.sep(e) for e in chain_embs]
        results = mem.recall_multihop(chain_embs[0], hops=len(chain)-1)
        print(f"\n  Chain: {' → '.join([c[:20]+'...' for c in chain])}")
        for hop_idx, recalled in enumerate(results):
            target = target_codes[hop_idx + 1]
            sim = cosine(recalled, target)
            status = "✓" if sim > 0.5 else "✗"
            print(f"    {status} hop {hop_idx+1}: → '{chain[hop_idx+1][:30]}...' sim={sim:.3f}")
    # Multi-hop with background noise
    print("\n  --- Multi-hop with 200 background memories ---")
    mem = HopfieldMemory(embed_dim, code_dim=8192, k=50, beta=16.0)
    # Store all chains
    all_chain_embs = []
    for chain in chains:
        embs = [model.encode([t], convert_to_tensor=True,
                              normalize_embeddings=True, device=DEVICE)[0]
                for t in chain]
        all_chain_embs.append(embs)
        for i in range(len(chain) - 1):
            mem.learn(embs[i], embs[i+1])
    # Add background
    bg = [f"Background sentence number {i}" for i in range(200)]
    bg_embs = model.encode(bg, convert_to_tensor=True,
                            normalize_embeddings=True, device=DEVICE)
    for i in range(199):
        mem.learn(bg_embs[i], bg_embs[i+1])
    for ci, chain in enumerate(chains):
        target_codes = [mem.sep(e) for e in all_chain_embs[ci]]
        results = mem.recall_multihop(all_chain_embs[ci][0], hops=len(chain)-1)
        for hop_idx, recalled in enumerate(results):
            target = target_codes[hop_idx + 1]
            sim = cosine(recalled, target)
            status = "✓" if sim > 0.5 else "✗"
            print(f"    {status} Chain{ci+1} hop{hop_idx+1}: sim={sim:.3f}")
 def test_hard_distractors(model):
    """Test with semantically similar distractors (harder than random bg)."""
    print("\n=== Hard Distractors (semantically similar) ===")
    # Target pair
    pairs = [
        ("The database is slow", "Missing index on users table"),
    ]
    # Distractors: similar to cue but different meaning
    distractors_cue = [
        "The database is fast",
        "The database crashed",
        "The database needs backup",
        "The datastore is slow",
        "The DB latency is high",
        "Database performance degraded",
        "SQL queries are slow",
        "The cache is slow",
        "The search index is slow",
        "MongoDB is slow",
    ]
    distractors_target = [
        f"Distractor target {i}" for i in range(len(distractors_cue))
    ]
    query = "DB performance is terrible"
    embed_dim = model.get_sentence_embedding_dimension()
    for beta in [8.0, 16.0, 32.0, 64.0]:
        mem = HopfieldMemory(embed_dim, code_dim=8192, k=50, beta=beta)
        # Store target
        cue_emb = model.encode([pairs[0][0]], convert_to_tensor=True,
                                normalize_embeddings=True, device=DEVICE)[0]
        target_emb = model.encode([pairs[0][1]], convert_to_tensor=True,
                                   normalize_embeddings=True, device=DEVICE)[0]
        mem.learn(cue_emb, target_emb)
        # Store distractors
        dist_cue_embs = model.encode(distractors_cue, convert_to_tensor=True,
                                      normalize_embeddings=True, device=DEVICE)
        dist_target_embs = model.encode(distractors_target, convert_to_tensor=True,
                                         normalize_embeddings=True, device=DEVICE)
        for i in range(len(distractors_cue)):
            mem.learn(dist_cue_embs[i], dist_target_embs[i])
        # Query
        q_emb = model.encode([query], convert_to_tensor=True,
                              normalize_embeddings=True, device=DEVICE)[0]
        recalled = mem.recall(q_emb)
        target_code = mem.sep(target_emb)
        sim = cosine(recalled, target_code)
        # Also check which cue got highest attention
        cue_mat = torch.stack(mem.cue_codes)
        q_code = mem.sep(q_emb)
        scores = beta * (q_code @ cue_mat.T)
        attn = torch.softmax(scores, dim=0)
        top_idx = attn.argmax().item()
        top_attn = attn[top_idx].item()
        all_cues = [pairs[0][0]] + distractors_cue
        print(f"  β={beta:>4}: sim_to_target={sim:.3f}, "
              f"top_attn={top_attn:.3f} → '{all_cues[top_idx][:30]}...'")
 def main():
    print("=" * 60)
    print("Experiment 7b: Hopfield Deep Dive")
    print("=" * 60)
    model = load_model()
    # Scale test
    test_scale(model, [0, 100, 500, 1000, 2000, 5000, 10000], beta=16.0)
    # β sweep at large scale
    print("\n=== β Sweep at N=5000 ===")
    for beta in [4, 8, 16, 32, 64]:
        test_scale(model, [5000], beta=beta)
    # Multi-hop
    test_multihop(model)
    # Hard distractors
    test_hard_distractors(model)
 if __name__ == "__main__":
    main()
--- a/experiments/exp07c_hopfield_embedding.py
+++ b/experiments/exp07c_hopfield_embedding.py
@@ -0,0 +1,317 @@
 """Experiment 7c: Hopfield in embedding space (no WTA codes for retrieval).
 Key insight: WTA codes distort semantic distance. Hopfield attention works
 better directly on continuous embeddings where cosine similarity is meaningful.
 WTA codes are only needed for Hebbian multi-hop (W matrix).
 For single-hop retrieval, embedding-space Hopfield is strictly better.
 Test:
 1. Embedding-space Hopfield at scale (1K-10K)
 2. Hard semantic distractors
 3. Embedding-space multi-hop (no WTA needed?)
 4. Compare code-space vs embedding-space
 """
 import sys
 import time
 from pathlib import Path
 import torch
 import torch.nn as nn
 import numpy as np
 DEVICE = "cuda"
 class EmbeddingHopfield:
    """Modern Hopfield network operating directly on embeddings.
    No WTA codes, no pattern separation — pure softmax attention
    over stored embedding patterns. This is essentially transformer
    cross-attention with stored memories as K/V.
    """
    def __init__(self, beta=16.0):
        self.beta = beta
        self.cue_embs = []    # Keys
        self.target_embs = [] # Values
        self.metadata = []
    def learn(self, cue_emb, target_emb, meta=None):
        self.cue_embs.append(cue_emb.detach())
        self.target_embs.append(target_emb.detach())
        self.metadata.append(meta or {})
    def recall(self, query_emb, steps=3):
        """Iterative Hopfield retrieval in embedding space.
        Step 1: query settles to nearest cue attractor via softmax attention
        Step 2: settled query → associated target via softmax attention
        """
        cue_mat = torch.stack(self.cue_embs)      # [N, dim]
        target_mat = torch.stack(self.target_embs)  # [N, dim]
        xi = query_emb  # [dim]
        # Settle to nearest cue (iterative attention)
        for _ in range(steps):
            scores = self.beta * (xi @ cue_mat.T)  # [N]
            attn = torch.softmax(scores, dim=0)
            xi = attn @ cue_mat  # [dim] — weighted average of cues
            xi = nn.functional.normalize(xi, dim=0)
        # Associate: settled cue → target
        scores = self.beta * (xi @ cue_mat.T)
        attn = torch.softmax(scores, dim=0)
        target = attn @ target_mat
        return nn.functional.normalize(target, dim=0), attn
    def recall_multihop(self, query_emb, hops=2, steps_per_hop=3):
        """Multi-hop in embedding space.
        Settle to cue → get target → use target as next query.
        """
        cue_mat = torch.stack(self.cue_embs)
        target_mat = torch.stack(self.target_embs)
        xi = query_emb
        results = []
        for hop in range(hops):
            # Settle
            for _ in range(steps_per_hop):
                scores = self.beta * (xi @ cue_mat.T)
                attn = torch.softmax(scores, dim=0)
                xi = attn @ cue_mat
                xi = nn.functional.normalize(xi, dim=0)
            # Associate
            scores = self.beta * (xi @ cue_mat.T)
            attn = torch.softmax(scores, dim=0)
            target = attn @ target_mat
            target = nn.functional.normalize(target, dim=0)
            results.append((target, attn))
            # Next hop
            xi = target
        return results
 def load_model():
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer("all-MiniLM-L6-v2", device=DEVICE)
 def cosine(a, b):
    return nn.functional.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()
 def test_scale(model):
    """Scale test with embedding-space Hopfield."""
    print("\n=== Scale Test: Embedding-Space Hopfield ===")
    pairs = [
        ("What's the weather like today?", "User checks weather every morning"),
        ("Let's deploy the new version", "Deployment uses GitHub Actions with k3s"),
        ("The database is slow again", "Missing index on users table"),
        ("I need to fix the auth bug", "JWT tokens with 24h expiry in Redis"),
        ("The API returns 500 errors", "OOM in the Python worker"),
        ("Let's set up monitoring", "Prometheus + Grafana on OCI"),
        ("Tests failing in CI", "CI needs postgres service container"),
        ("Memory usage too high", "Leak in websocket handler"),
        ("Help with Docker setup", "docker-compose for dev, k3s for prod"),
        ("Log files too large", "Logs rotate daily, shipped to Loki"),
    ]
    paraphrases = [
        "How's the weather outside?",
        "Push the new release",
        "DB performance terrible",
        "Login bug needs fixing",
        "Getting 500 errors",
        "Need better observability",
        "CI tests breaking",
        "Service using too much RAM",
        "Docker config help",
        "Logs eating disk space",
    ]
    cue_embs = model.encode([p[0] for p in pairs], convert_to_tensor=True,
                             normalize_embeddings=True, device=DEVICE)
    target_embs = model.encode([p[1] for p in pairs], convert_to_tensor=True,
                                normalize_embeddings=True, device=DEVICE)
    para_embs = model.encode(paraphrases, convert_to_tensor=True,
                              normalize_embeddings=True, device=DEVICE)
    for n_bg in [0, 100, 500, 1000, 2000, 5000, 10000]:
        for beta in [16, 32, 64]:
            mem = EmbeddingHopfield(beta=beta)
            for i in range(len(pairs)):
                mem.learn(cue_embs[i], target_embs[i])
            if n_bg > 0:
                topics = ["server", "database", "API", "frontend", "backend",
                          "cache", "queue", "network", "storage", "auth",
                          "docker", "kubernetes", "redis", "nginx", "postgres"]
                bg_cues = [f"The {topics[i%len(topics)]} system has issue {i}" for i in range(n_bg)]
                bg_targets = [f"Fix {topics[i%len(topics)]} issue {i} urgently" for i in range(n_bg)]
                for start in range(0, n_bg, 256):
                    end = min(start + 256, n_bg)
                    bc = model.encode(bg_cues[start:end], convert_to_tensor=True,
                                       normalize_embeddings=True, device=DEVICE)
                    bt = model.encode(bg_targets[start:end], convert_to_tensor=True,
                                       normalize_embeddings=True, device=DEVICE)
                    for j in range(bc.shape[0]):
                        mem.learn(bc[j], bt[j])
            # Test paraphrase recall
            t0 = time.time()
            correct = 0
            for i in range(len(paraphrases)):
                with torch.no_grad():
                    recalled, attn = mem.recall(para_embs[i])
                sim = cosine(recalled, target_embs[i])
                # Check if recalled is closest to correct target
                all_sims = [cosine(recalled, target_embs[j]) for j in range(len(pairs))]
                if np.argmax(all_sims) == i:
                    correct += 1
            dt = (time.time() - t0) / len(paraphrases) * 1000
            n = len(paraphrases)
            if beta == 32 or n_bg == 0:  # Only print all β for bg=0
                print(f"  N={n_bg+len(pairs):>6}, β={beta:>2}: "
                      f"Para={correct}/{n} ({correct/n:.0%}), "
                      f"time={dt:.1f}ms")
            del mem
        if n_bg == 0:
            print()  # separator after β sweep
 def test_hard_distractors(model):
    """Semantic distractors in embedding space."""
    print("\n=== Hard Semantic Distractors (Embedding Hopfield) ===")
    target_pair = ("The database is slow", "Missing index on users table")
    distractors = [
        ("The database crashed completely", "Run database recovery procedure"),
        ("Database needs backup now", "Use pg_dump for PostgreSQL backup"),
        ("The datastore is slow", "Check Redis connection pool settings"),
        ("DB latency is high", "Review query execution plans"),
        ("Database performance degraded", "Check for lock contention"),
        ("SQL queries are slow", "Add composite index on frequently joined columns"),
        ("The cache is slow", "Increase Redis maxmemory setting"),
        ("MongoDB is slow", "Check for collection scans without index"),
        ("The search index is slow", "Rebuild Elasticsearch index"),
        ("Database connection timeout", "Increase pool size in connection config"),
    ]
    query = "DB performance is terrible"
    cue_emb = model.encode([target_pair[0]], convert_to_tensor=True,
                            normalize_embeddings=True, device=DEVICE)[0]
    target_emb = model.encode([target_pair[1]], convert_to_tensor=True,
                               normalize_embeddings=True, device=DEVICE)[0]
    q_emb = model.encode([query], convert_to_tensor=True,
                          normalize_embeddings=True, device=DEVICE)[0]
    # Show embedding distances
    print(f"\n  Query: '{query}'")
    print(f"  Target cue: '{target_pair[0]}' (cos={cosine(q_emb, cue_emb):.3f})")
    for dc, dt in distractors[:5]:
        dc_emb = model.encode([dc], convert_to_tensor=True,
                               normalize_embeddings=True, device=DEVICE)[0]
        print(f"  Distractor: '{dc[:40]}...' (cos={cosine(q_emb, dc_emb):.3f})")
    for beta in [8, 16, 32, 64, 128]:
        mem = EmbeddingHopfield(beta=beta)
        mem.learn(cue_emb, target_emb, {"text": target_pair[1]})
        for dc, dt in distractors:
            dc_emb = model.encode([dc], convert_to_tensor=True,
                                   normalize_embeddings=True, device=DEVICE)[0]
            dt_emb = model.encode([dt], convert_to_tensor=True,
                                   normalize_embeddings=True, device=DEVICE)[0]
            mem.learn(dc_emb, dt_emb, {"text": dt})
        recalled, attn = mem.recall(q_emb)
        sim_to_target = cosine(recalled, target_emb)
        top_idx = attn.argmax().item()
        top_attn = attn[top_idx].item()
        all_texts = [target_pair[1]] + [d[1] for d in distractors]
        print(f"  β={beta:>3}: sim={sim_to_target:.3f}, "
              f"top_attn={top_attn:.3f} → '{all_texts[top_idx][:40]}...'")
 def test_multihop_embedding(model):
    """Multi-hop in pure embedding space."""
    print("\n=== Multi-hop (Embedding Space) ===")
    chains = [
        ["What's the weather?", "Check weather before going out",
         "My coffee shop is around the corner", "Great latte art there"],
        ["Review the code", "Found a memory leak in review",
         "Memory leaks cause OOM", "Add memory limits to k8s pods"],
    ]
    for chain in chains:
        mem = EmbeddingHopfield(beta=32)
        chain_embs = [model.encode([t], convert_to_tensor=True,
                                    normalize_embeddings=True, device=DEVICE)[0]
                      for t in chain]
        for i in range(len(chain) - 1):
            mem.learn(chain_embs[i], chain_embs[i+1])
        results = mem.recall_multihop(chain_embs[0], hops=len(chain)-1)
        print(f"\n  Chain: {' → '.join([c[:20]+'...' for c in chain])}")
        for hop_idx, (recalled, attn) in enumerate(results):
            target = chain_embs[hop_idx + 1]
            sim = cosine(recalled, target)
            status = "✓" if sim > 0.7 else "✗"
            print(f"    {status} hop {hop_idx+1}: sim={sim:.3f}")
    # With background
    print("\n  --- With 500 background ---")
    mem = EmbeddingHopfield(beta=32)
    all_embs = []
    for chain in chains:
        embs = [model.encode([t], convert_to_tensor=True,
                              normalize_embeddings=True, device=DEVICE)[0]
                for t in chain]
        all_embs.append(embs)
        for i in range(len(chain) - 1):
            mem.learn(embs[i], embs[i+1])
    bg = [f"Background about {['coding','devops','ml','infra','data'][i%5]} topic {i}" for i in range(500)]
    bg_embs = model.encode(bg, convert_to_tensor=True,
                            normalize_embeddings=True, device=DEVICE, batch_size=256)
    for i in range(499):
        mem.learn(bg_embs[i], bg_embs[i+1])
    for ci, chain in enumerate(chains):
        results = mem.recall_multihop(all_embs[ci][0], hops=len(chain)-1)
        for hop_idx, (recalled, _) in enumerate(results):
            target = all_embs[ci][hop_idx + 1]
            sim = cosine(recalled, target)
            status = "✓" if sim > 0.7 else "✗"
            print(f"    {status} Chain{ci+1} hop{hop_idx+1}: sim={sim:.3f}")
 def main():
    print("=" * 60)
    print("Experiment 7c: Embedding-Space Hopfield")
    print("=" * 60)
    model = load_model()
    test_scale(model)
    test_hard_distractors(model)
    test_multihop_embedding(model)
 if __name__ == "__main__":
    main()
--- a/experiments/exp07d_twostage.py
+++ b/experiments/exp07d_twostage.py
@@ -0,0 +1,339 @@
 """Experiment 7d: Two-stage retrieval for scale.
 Problem: Embedding Hopfield degrades at 10K+ (80%).
 Fix: Pre-filter with approximate NN (top-K), then Hopfield settle on candidates.
 This is O(N) for pre-filter (can be O(log N) with FAISS) + O(K) for Hopfield.
 Also: test adaptive β based on attention entropy (low entropy = confident).
 """
 import sys
 import time
 from pathlib import Path
 import torch
 import torch.nn as nn
 import numpy as np
 DEVICE = "cuda"
 def cosine(a, b):
    return nn.functional.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()
 class TwoStageHopfield:
    """Pre-filter + Hopfield settle.
    Stage 1: cosine NN → top-K candidates (fast, O(N) or O(log N) with index)
    Stage 2: Hopfield attention over K candidates (precise, O(K))
    """
    def __init__(self, beta=16.0, top_k=50):
        self.beta = beta
        self.top_k = top_k
        self.cue_embs = []
        self.target_embs = []
        self._cue_matrix = None  # Cached for batch NN
    def learn(self, cue_emb, target_emb):
        self.cue_embs.append(cue_emb.detach())
        self.target_embs.append(target_emb.detach())
        self._cue_matrix = None  # Invalidate cache
    def _get_cue_matrix(self):
        if self._cue_matrix is None:
            self._cue_matrix = torch.stack(self.cue_embs)
        return self._cue_matrix
    def recall(self, query_emb, steps=3):
        cue_mat = self._get_cue_matrix()
        target_mat = torch.stack(self.target_embs)
        N = cue_mat.shape[0]
        # Stage 1: Fast NN pre-filter
        k = min(self.top_k, N)
        sims = query_emb @ cue_mat.T  # [N]
        top_sims, top_indices = sims.topk(k)
        # Stage 2: Hopfield settle on candidates only
        cand_cues = cue_mat[top_indices]     # [K, dim]
        cand_targets = target_mat[top_indices] # [K, dim]
        xi = query_emb
        for _ in range(steps):
            scores = self.beta * (xi @ cand_cues.T)
            attn = torch.softmax(scores, dim=0)
            xi = attn @ cand_cues
            xi = nn.functional.normalize(xi, dim=0)
        # Final association
        scores = self.beta * (xi @ cand_cues.T)
        attn = torch.softmax(scores, dim=0)
        target = attn @ cand_targets
        # Map back to global index
        best_local = attn.argmax().item()
        best_global = top_indices[best_local].item()
        return nn.functional.normalize(target, dim=0), best_global, attn
    def recall_multihop(self, query_emb, hops=2, steps=3):
        """Multi-hop: each hop does two-stage retrieval."""
        xi = query_emb
        results = []
        for _ in range(hops):
            target, idx, attn = self.recall(xi, steps=steps)
            results.append((target, idx))
            xi = target  # Use target as next query
        return results
 def load_model():
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer("all-MiniLM-L6-v2", device=DEVICE)
 def test_scale(model):
    """Scale test comparing pure Hopfield vs two-stage."""
    print("\n=== Scale Comparison ===")
    pairs = [
        ("What's the weather like today?", "User checks weather every morning"),
        ("Let's deploy the new version", "Deployment uses GitHub Actions with k3s"),
        ("The database is slow again", "Missing index on users table"),
        ("I need to fix the auth bug", "JWT tokens with 24h expiry in Redis"),
        ("The API returns 500 errors", "OOM in the Python worker"),
        ("Let's set up monitoring", "Prometheus + Grafana on OCI"),
        ("Tests failing in CI", "CI needs postgres service container"),
        ("Memory usage too high", "Leak in websocket handler"),
        ("Help with Docker setup", "docker-compose for dev, k3s for prod"),
        ("Log files too large", "Logs rotate daily, shipped to Loki"),
    ]
    paraphrases = [
        "How's the weather outside?",
        "Push the new release",
        "DB performance terrible",
        "Login bug needs fixing",
        "Getting 500 errors",
        "Need better observability",
        "CI tests breaking",
        "Service using too much RAM",
        "Docker config help",
        "Logs eating disk space",
    ]
    cue_embs = model.encode([p[0] for p in pairs], convert_to_tensor=True,
                             normalize_embeddings=True, device=DEVICE)
    target_embs = model.encode([p[1] for p in pairs], convert_to_tensor=True,
                                normalize_embeddings=True, device=DEVICE)
    para_embs = model.encode(paraphrases, convert_to_tensor=True,
                              normalize_embeddings=True, device=DEVICE)
    for n_bg in [0, 100, 500, 1000, 5000, 10000, 20000]:
        # Two-stage with different K
        for top_k in [20, 50, 100]:
            if n_bg < top_k and n_bg > 0:
                continue
            mem = TwoStageHopfield(beta=16.0, top_k=top_k)
            for i in range(len(pairs)):
                mem.learn(cue_embs[i], target_embs[i])
            if n_bg > 0:
                topics = ["server", "database", "API", "frontend", "backend",
                          "cache", "queue", "network", "storage", "auth",
                          "docker", "kubernetes", "redis", "nginx", "postgres"]
                bg_cues = [f"The {topics[i%len(topics)]} system has issue {i}"
                           for i in range(n_bg)]
                bg_targets = [f"Fix {topics[i%len(topics)]} issue {i} urgently"
                              for i in range(n_bg)]
                for start in range(0, n_bg, 256):
                    end = min(start + 256, n_bg)
                    bc = model.encode(bg_cues[start:end], convert_to_tensor=True,
                                       normalize_embeddings=True, device=DEVICE)
                    bt = model.encode(bg_targets[start:end], convert_to_tensor=True,
                                       normalize_embeddings=True, device=DEVICE)
                    for j in range(bc.shape[0]):
                        mem.learn(bc[j], bt[j])
            # Test
            t0 = time.time()
            correct = 0
            for i in range(len(paraphrases)):
                with torch.no_grad():
                    recalled, idx, attn = mem.recall(para_embs[i])
                all_sims = [cosine(recalled, target_embs[j]) for j in range(len(pairs))]
                if np.argmax(all_sims) == i:
                    correct += 1
            dt = (time.time() - t0) / len(paraphrases) * 1000
            n = len(paraphrases)
            total = len(mem.cue_embs)
            print(f"  N={total:>6}, K={top_k:>3}: "
                  f"Para={correct}/{n} ({correct/n:>3.0%}), "
                  f"time={dt:.1f}ms")
            del mem
            torch.cuda.empty_cache()
        if n_bg > 0:
            print()
 def test_multihop_at_scale(model):
    """Multi-hop with two-stage at scale."""
    print("\n=== Multi-hop Two-Stage (500 bg) ===")
    chains = [
        ["What's the weather?", "Check weather before going out",
         "My coffee shop nearby", "Great latte art"],
        ["Review the code", "Found memory leak", "Leaks cause OOM", "Add k8s limits"],
        ["Deploy to prod", "Blue-green deployment", "Blue is active", "Switch to green"],
    ]
    mem = TwoStageHopfield(beta=16.0, top_k=50)
    all_embs = []
    for chain in chains:
        embs = [model.encode([t], convert_to_tensor=True,
                              normalize_embeddings=True, device=DEVICE)[0]
                for t in chain]
        all_embs.append(embs)
        for i in range(len(chain) - 1):
            mem.learn(embs[i], embs[i+1])
    # Background
    bg = [f"Background about {['code','ops','ml','data','infra'][i%5]} number {i}"
          for i in range(500)]
    bg_embs = model.encode(bg, convert_to_tensor=True,
                            normalize_embeddings=True, device=DEVICE, batch_size=256)
    for i in range(499):
        mem.learn(bg_embs[i], bg_embs[i+1])
    for ci, chain in enumerate(chains):
        results = mem.recall_multihop(all_embs[ci][0], hops=len(chain)-1)
        for hop_idx, (recalled, idx) in enumerate(results):
            target = all_embs[ci][hop_idx + 1]
            sim = cosine(recalled, target)
            status = "✓" if sim > 0.7 else "✗"
            print(f"  {status} Chain{ci+1} hop{hop_idx+1}: sim={sim:.3f}")
 def test_diverse_queries(model):
    """Larger test set with more diverse queries."""
    print("\n=== Diverse Query Test (20 pairs, 2000 bg) ===")
    pairs = [
        ("What's the weather like today?", "User checks weather every morning"),
        ("Let's deploy the new version", "Deployment uses GitHub Actions with k3s"),
        ("The database is slow again", "Missing index on users table"),
        ("I need to fix the auth bug", "JWT tokens with 24h expiry in Redis"),
        ("The API returns 500 errors", "OOM in the Python worker"),
        ("Let's set up monitoring", "Prometheus + Grafana on OCI"),
        ("Tests failing in CI", "CI needs postgres service container"),
        ("Memory usage too high", "Leak in websocket handler"),
        ("Help with Docker setup", "docker-compose for dev, k3s for prod"),
        ("Log files too large", "Logs rotate daily, shipped to Loki"),
        ("How to add caching?", "Redis available at redis.internal:6379"),
        ("Frontend loads slowly", "CDN CloudFlare, 1h TTL for assets"),
        ("Refactor payment module", "Stripe API, webhook in payments/webhook.py"),
        ("Set up new server", "Ubuntu 22.04, Docker, Tailscale, monitoring"),
        ("Optimize search", "Elasticsearch v8, recently upgraded"),
        ("Backup the database", "Daily 3am UTC cron to S3"),
        ("Configure reverse proxy", "Traefik, not nginx"),
        ("Team meeting schedule", "Standup 10am London, Mon-Fri"),
        ("Learn a new language", "User has Python+Go, new to systems programming"),
        ("Review my PR", "User prefers small PRs with clear commits"),
    ]
    paraphrases = [
        "How's the weather?",
        "Ship the release",
        "DB is crawling",
        "Fix the login issue",
        "Server errors everywhere",
        "Need observability",
        "CI is broken",
        "Too much RAM usage",
        "Docker help please",
        "Disk full from logs",
        "Want to add a cache layer",
        "Website too slow",
        "Payment code needs rework",
        "Provision a new machine",
        "Search is slow",
        "Need a DB backup",
        "Proxy configuration",
        "When's the standup?",
        "Want to learn Rust",
        "Check my pull request",
    ]
    cue_embs = model.encode([p[0] for p in pairs], convert_to_tensor=True,
                             normalize_embeddings=True, device=DEVICE)
    target_embs = model.encode([p[1] for p in pairs], convert_to_tensor=True,
                                normalize_embeddings=True, device=DEVICE)
    para_embs = model.encode(paraphrases, convert_to_tensor=True,
                              normalize_embeddings=True, device=DEVICE)
    mem = TwoStageHopfield(beta=16.0, top_k=50)
    for i in range(len(pairs)):
        mem.learn(cue_embs[i], target_embs[i])
    # 2000 diverse background
    topics = ["server", "database", "API", "frontend", "backend", "cache",
              "queue", "network", "storage", "auth", "docker", "kubernetes",
              "redis", "nginx", "postgres", "python", "golang", "react",
              "terraform", "ansible"]
    actions = ["crashed", "is slow", "needs update", "has bug", "timed out",
               "needs migration", "needs backup", "has leak", "is down", "needs config"]
    bg_cues = [f"The {topics[i%len(topics)]} {actions[i%len(actions)]} (ticket {i})"
               for i in range(2000)]
    bg_targets = [f"Fix {topics[i%len(topics)]} {actions[i%len(actions)]}: see wiki page {i}"
                  for i in range(2000)]
    for start in range(0, 2000, 256):
        end = min(start + 256, 2000)
        bc = model.encode(bg_cues[start:end], convert_to_tensor=True,
                           normalize_embeddings=True, device=DEVICE)
        bt = model.encode(bg_targets[start:end], convert_to_tensor=True,
                           normalize_embeddings=True, device=DEVICE)
        for j in range(bc.shape[0]):
            mem.learn(bc[j], bt[j])
    # Test
    correct = 0
    failures = []
    for i in range(len(paraphrases)):
        with torch.no_grad():
            recalled, idx, attn = mem.recall(para_embs[i])
        all_sims = [cosine(recalled, target_embs[j]) for j in range(len(pairs))]
        best = np.argmax(all_sims)
        if best == i:
            correct += 1
        else:
            failures.append((i, best, all_sims[i], all_sims[best]))
    n = len(paraphrases)
    print(f"  Result: {correct}/{n} ({correct/n:.0%})")
    if failures:
        print(f"  Failures:")
        for qi, gi, sim_correct, sim_got in failures:
            print(f"    Q: '{paraphrases[qi][:30]}...' → got [{gi}] "
                  f"(sim_correct={sim_correct:.3f}, sim_got={sim_got:.3f})")
 def main():
    print("=" * 60)
    print("Experiment 7d: Two-Stage Hopfield")
    print("=" * 60)
    model = load_model()
    test_scale(model)
    test_multihop_at_scale(model)
    test_diverse_queries(model)
 if __name__ == "__main__":
    main()
--- a/experiments/exp07e_cue_augmentation.py
+++ b/experiments/exp07e_cue_augmentation.py
@@ -0,0 +1,260 @@
 """Experiment 7e: Cue augmentation to overcome embedding model limitations.
 Idea: When storing a memory, also store augmented versions of the cue.
 If the user says "The database is slow", also store:
 - The embedding with added noise (gaussian augmentation)
 - A shifted version toward common paraphrase patterns
 This increases the "catchment basin" of each memory without changing the model.
 Also test: using the LLM itself to generate paraphrases (simulated here).
 """
 import sys
 import time
 from pathlib import Path
 import torch
 import torch.nn as nn
 import numpy as np
 DEVICE = "cuda"
 def cosine(a, b):
    return nn.functional.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()
 class AugmentedHopfield:
    """Hopfield with cue augmentation.
    Each memory stores N augmented cue embeddings, all pointing to the same target.
    During recall, any of the augmented cues can match.
    """
    def __init__(self, beta=16.0, top_k=20, n_augments=5, noise_std=0.15):
        self.beta = beta
        self.top_k = top_k
        self.n_augments = n_augments
        self.noise_std = noise_std
        self.cue_embs = []
        self.target_embs = []
        self.memory_ids = []  # Which original memory each entry belongs to
    def learn(self, cue_emb, target_emb, memory_id=None):
        """Store with augmented cues."""
        mid = memory_id if memory_id is not None else len(set(self.memory_ids))
        # Original
        self.cue_embs.append(cue_emb.detach())
        self.target_embs.append(target_emb.detach())
        self.memory_ids.append(mid)
        # Augmented: add noise and renormalize
        for _ in range(self.n_augments):
            noisy = cue_emb + torch.randn_like(cue_emb) * self.noise_std
            noisy = nn.functional.normalize(noisy, dim=0)
            self.cue_embs.append(noisy)
            self.target_embs.append(target_emb.detach())
            self.memory_ids.append(mid)
    def learn_with_paraphrases(self, cue_embs_list, target_emb, memory_id=None):
        """Store multiple cue embeddings for the same target.
        cue_embs_list: list of embeddings (original + paraphrases)
        """
        mid = memory_id if memory_id is not None else len(set(self.memory_ids))
        for ce in cue_embs_list:
            self.cue_embs.append(ce.detach())
            self.target_embs.append(target_emb.detach())
            self.memory_ids.append(mid)
    def recall(self, query_emb, steps=3):
        cue_mat = torch.stack(self.cue_embs)
        target_mat = torch.stack(self.target_embs)
        N = cue_mat.shape[0]
        # Stage 1: top-K
        k = min(self.top_k, N)
        sims = query_emb @ cue_mat.T
        _, top_idx = sims.topk(k)
        cand_cues = cue_mat[top_idx]
        cand_targets = target_mat[top_idx]
        # Stage 2: Hopfield settle
        xi = query_emb
        for _ in range(steps):
            scores = self.beta * (xi @ cand_cues.T)
            attn = torch.softmax(scores, dim=0)
            xi = attn @ cand_cues
            xi = nn.functional.normalize(xi, dim=0)
        scores = self.beta * (xi @ cand_cues.T)
        attn = torch.softmax(scores, dim=0)
        target = attn @ cand_targets
        return nn.functional.normalize(target, dim=0)
 def load_model():
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer("all-MiniLM-L6-v2", device=DEVICE)
 def test_augmentation(model):
    """Compare: no augmentation vs noise augmentation vs paraphrase augmentation."""
    print("\n=== Augmentation Comparison (20 pairs, 2000 bg) ===")
    pairs = [
        ("What's the weather like today?", "User checks weather every morning"),
        ("Let's deploy the new version", "Deployment uses GitHub Actions with k3s"),
        ("The database is slow again", "Missing index on users table"),
        ("I need to fix the authentication bug", "JWT tokens with 24h expiry in Redis"),
        ("The API returns 500 errors", "OOM in the Python worker"),
        ("Let's set up monitoring", "Prometheus + Grafana on OCI"),
        ("Tests failing in CI", "CI needs postgres service container"),
        ("Memory usage too high", "Leak in websocket handler"),
        ("Help with Docker setup", "docker-compose for dev, k3s for prod"),
        ("Log files too large", "Logs rotate daily, shipped to Loki"),
        ("How to add caching?", "Redis available at redis.internal:6379"),
        ("Frontend loads slowly", "CDN CloudFlare, 1h TTL for assets"),
        ("Refactor payment module", "Stripe API, webhook in payments/webhook.py"),
        ("Set up new server", "Ubuntu 22.04, Docker, Tailscale, monitoring"),
        ("Optimize search", "Elasticsearch v8, recently upgraded"),
        ("Backup the database", "Daily 3am UTC cron to S3"),
        ("Configure reverse proxy", "Traefik, not nginx"),
        ("Team meeting schedule", "Standup 10am London, Mon-Fri"),
        ("Learn a new programming language", "User has Python+Go, new to systems"),
        ("Review my pull request", "User prefers small PRs with clear commits"),
    ]
    paraphrases = [
        "How's the weather?", "Ship the release", "DB performance terrible",
        "Fix the login issue", "Server errors everywhere", "Need observability",
        "CI tests breaking", "Service using too much RAM", "Docker config help",
        "Logs eating disk space", "Want to add a cache layer", "Website too slow",
        "Payment code needs rework", "Provision a new machine", "Search is slow",
        "Need a DB backup", "Proxy configuration", "When's the standup?",
        "Want to learn Rust", "Check my pull request",
    ]
    # Hand-crafted additional paraphrases for hard cases
    extra_paraphrases = {
        1: ["Ship the release", "Push to production", "Release the new build"],
        3: ["Fix the login issue", "Authentication is broken", "Login doesn't work"],
        4: ["Server errors everywhere", "Getting 500s", "Internal server error"],
        5: ["Need observability", "Set up alerts", "Monitor the services"],
        10: ["Add a cache layer", "Implement caching", "Cache the responses"],
        11: ["Website too slow", "Page load time is bad", "Frontend performance"],
        13: ["Provision a new machine", "Need a new server", "Set up a new box"],
        17: ["When's the standup?", "What time is the meeting?", "Daily sync time?"],
        18: ["Want to learn Rust", "Getting into Rust", "Start learning Rust"],
        19: ["Check my pull request", "Look at my code changes", "PR review please"],
    }
    cue_embs = model.encode([p[0] for p in pairs], convert_to_tensor=True,
                             normalize_embeddings=True, device=DEVICE)
    target_embs = model.encode([p[1] for p in pairs], convert_to_tensor=True,
                                normalize_embeddings=True, device=DEVICE)
    para_embs = model.encode(paraphrases, convert_to_tensor=True,
                              normalize_embeddings=True, device=DEVICE)
    # Encode extra paraphrases
    extra_embs = {}
    for idx, texts in extra_paraphrases.items():
        extra_embs[idx] = model.encode(texts, convert_to_tensor=True,
                                        normalize_embeddings=True, device=DEVICE)
    # Background
    topics = ["server", "database", "API", "frontend", "backend", "cache",
              "queue", "network", "storage", "auth", "docker", "kubernetes",
              "redis", "nginx", "postgres", "python", "golang", "react",
              "terraform", "ansible"]
    actions = ["crashed", "is slow", "needs update", "has bug", "timed out",
               "needs migration", "needs backup", "has leak", "is down", "needs config"]
    bg_cues = [f"The {topics[i%len(topics)]} {actions[i%len(actions)]} (ticket {i})"
               for i in range(2000)]
    bg_targets = [f"Fix {topics[i%len(topics)]} {actions[i%len(actions)]}: wiki {i}"
                  for i in range(2000)]
    bg_cue_embs = model.encode(bg_cues, convert_to_tensor=True,
                                normalize_embeddings=True, device=DEVICE, batch_size=256)
    bg_target_embs = model.encode(bg_targets, convert_to_tensor=True,
                                   normalize_embeddings=True, device=DEVICE, batch_size=256)
    def evaluate(mem, label):
        correct = 0
        for i in range(len(paraphrases)):
            with torch.no_grad():
                recalled = mem.recall(para_embs[i])
            all_sims = [cosine(recalled, target_embs[j]) for j in range(len(pairs))]
            if np.argmax(all_sims) == i:
                correct += 1
        n = len(paraphrases)
        print(f"  {label}: {correct}/{n} ({correct/n:.0%})")
        return correct / n
    # Method 1: No augmentation (baseline)
    mem1 = AugmentedHopfield(n_augments=0)
    for i in range(len(pairs)):
        mem1.learn(cue_embs[i], target_embs[i], memory_id=i)
    for i in range(2000):
        mem1.learn(bg_cue_embs[i], bg_target_embs[i], memory_id=100+i)
    evaluate(mem1, "No augmentation")
    # Method 2: Noise augmentation (5 copies)
    for noise in [0.1, 0.15, 0.2, 0.3]:
        mem2 = AugmentedHopfield(n_augments=5, noise_std=noise)
        for i in range(len(pairs)):
            mem2.learn(cue_embs[i], target_embs[i], memory_id=i)
        for i in range(2000):
            # Don't augment background
            mem2.cue_embs.append(bg_cue_embs[i])
            mem2.target_embs.append(bg_target_embs[i])
            mem2.memory_ids.append(100+i)
        evaluate(mem2, f"Noise aug (σ={noise}, n=5)")
    # Method 3: Noise augmentation (20 copies)
    mem3 = AugmentedHopfield(n_augments=20, noise_std=0.15)
    for i in range(len(pairs)):
        mem3.learn(cue_embs[i], target_embs[i], memory_id=i)
    for i in range(2000):
        mem3.cue_embs.append(bg_cue_embs[i])
        mem3.target_embs.append(bg_target_embs[i])
        mem3.memory_ids.append(100+i)
    evaluate(mem3, "Noise aug (σ=0.15, n=20)")
    # Method 4: Paraphrase augmentation (hand-crafted extras)
    mem4 = AugmentedHopfield(n_augments=0)
    for i in range(len(pairs)):
        cue_list = [cue_embs[i]]
        if i in extra_embs:
            cue_list.extend([e for e in extra_embs[i]])
        mem4.learn_with_paraphrases(cue_list, target_embs[i], memory_id=i)
    for i in range(2000):
        mem4.cue_embs.append(bg_cue_embs[i])
        mem4.target_embs.append(bg_target_embs[i])
        mem4.memory_ids.append(100+i)
    evaluate(mem4, "Paraphrase aug (hand-crafted)")
    # Method 5: Noise + Paraphrase combined
    mem5 = AugmentedHopfield(n_augments=5, noise_std=0.15)
    for i in range(len(pairs)):
        cue_list = [cue_embs[i]]
        if i in extra_embs:
            cue_list.extend([e for e in extra_embs[i]])
        mem5.learn_with_paraphrases(cue_list, target_embs[i], memory_id=i)
    for i in range(2000):
        mem5.cue_embs.append(bg_cue_embs[i])
        mem5.target_embs.append(bg_target_embs[i])
        mem5.memory_ids.append(100+i)
    evaluate(mem5, "Noise + Paraphrase combined")
 def main():
    print("=" * 60)
    print("Experiment 7e: Cue Augmentation")
    print("=" * 60)
    model = load_model()
    test_augmentation(model)
 if __name__ == "__main__":
    main()
--- a/experiments/exp08_llm_integration.py
+++ b/experiments/exp08_llm_integration.py
@@ -0,0 +1,213 @@
 """Experiment P0: LLM Integration — end-to-end memory-augmented conversation.
 Tests:
 1. Memory extraction (heuristic fallback since LLM gateway is down)
 2. Paraphrase generation (heuristic fallback)
 3. End-to-end: conversation → extract → store → recall → inject
 4. Multi-turn conversation simulation
 """
 import sys
 import time
 from pathlib import Path
 import torch
 sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
 sys.path.insert(0, str(Path(__file__).parent.parent))
 from nuonuo.hippocampus import HippocampalMemory
 from llm import (LLMClient, extract_memories_heuristic, extract_memories_llm,
                 generate_paraphrases_heuristic, generate_paraphrases_llm,
                 format_recalled_memories)
 DEVICE = "cuda"
 def load_model():
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer("all-MiniLM-L6-v2", device=DEVICE)
 def emb(model, text):
    return model.encode([text], convert_to_tensor=True,
                        normalize_embeddings=True, device=DEVICE)[0]
 def test_heuristic_extraction():
    """Test memory extraction without LLM."""
    print("=== Test 1: Heuristic Memory Extraction ===\n")
    conversations = [
        ("How do I deploy to production?",
         "Use the blue-green deployment pipeline via GitHub Actions. The config is in .github/workflows/deploy.yml"),
        ("The database is really slow today",
         "Check for missing indexes on the users table. Last time this happened it was the created_at column."),
        ("Hi, how are you?",
         "I'm doing well, thanks!"),
        ("What port does Redis run on?",
         "Redis is on port 6379 at redis.internal"),
        ("Fix the auth bug please",
         "The auth service uses JWT tokens with 24h expiry stored in Redis. The bug was in token refresh logic."),
    ]
    for user_msg, assistant_msg in conversations:
        memories = extract_memories_heuristic(user_msg, assistant_msg)
        print(f"  User: {user_msg[:50]}...")
        if memories:
            for m in memories:
                print(f"    → CUE: {m.cue[:40]}... | TARGET: {m.target[:50]}... | IMP: {m.importance}")
        else:
            print(f"    → (nothing extracted)")
        print()
 def test_heuristic_paraphrases():
    """Test paraphrase generation without LLM."""
    print("=== Test 2: Heuristic Paraphrase Generation ===\n")
    texts = [
        "How do I deploy to production?",
        "The database is slow",
        "Can you fix the authentication bug?",
        "I need to configure nginx",
        "Let's set up monitoring for the server",
    ]
    for text in texts:
        paras = generate_paraphrases_heuristic(text, n=3)
        print(f"  Original: {text}")
        for p in paras:
            print(f"    → {p}")
        print()
 def test_end_to_end(model):
    """Full pipeline: conversation → extract → store → recall → inject."""
    print("=== Test 3: End-to-End Pipeline ===\n")
    memory = HippocampalMemory(embed_dim=384)
    llm = LLMClient()  # Will fail gracefully if gateway down
    # Simulate a few conversation turns
    turns = [
        ("How do I deploy to production?",
         "Use blue-green deployment via GitHub Actions. Config in .github/workflows/deploy.yml"),
        ("The database is really slow",
         "Check for missing indexes on users table, especially created_at column"),
        ("What port does Redis run on?",
         "Redis is on port 6379 at redis.internal"),
        ("Fix the auth bug",
         "Auth uses JWT tokens with 24h expiry in Redis. Bug was in token refresh."),
        ("How do I backup the database?",
         "Backups run daily at 3am UTC via cron job to S3. Config in /etc/cron.d/db-backup"),
    ]
    # Phase 1: Learn from conversations
    print("--- Phase 1: Learning from conversations ---")
    for user_msg, assistant_msg in turns:
        # Extract memories
        if llm.available:
            memories = extract_memories_llm(llm, user_msg, assistant_msg)
        else:
            memories = extract_memories_heuristic(user_msg, assistant_msg)
        for mem_item in memories:
            # Generate paraphrases
            if llm.available:
                paras = generate_paraphrases_llm(llm, mem_item.cue, n=3)
            else:
                paras = generate_paraphrases_heuristic(mem_item.cue, n=3)
            # Embed and store
            cue_emb = emb(model, mem_item.cue)
            target_emb = emb(model, mem_item.target)
            para_embs = [emb(model, p) for p in paras] if paras else None
            mid = memory.store(
                cue_emb, target_emb,
                cue_variants=para_embs,
                metadata={"cue": mem_item.cue, "target": mem_item.target,
                          "importance": mem_item.importance},
            )
            print(f"  Stored [{mid}]: {mem_item.cue[:40]}... → {mem_item.target[:40]}...")
            if paras:
                print(f"    + {len(paras)} paraphrases: {[p[:30] for p in paras]}")
    print(f"\n  Total: {memory.stats()}")
    # Phase 2: Recall
    print("\n--- Phase 2: Recall from new queries ---")
    queries = [
        "DB performance is terrible",
        "How to push a new release?",
        "What's the Redis connection info?",
        "The login system has a problem",
        "Need to create a database backup",
        "Where's the deployment config?",
    ]
    for query in queries:
        query_emb = emb(model, query)
        # Single-hop recall
        results = memory.recall(query_emb, top_k=2)
        # Multi-hop
        chain = memory.recall_chain(query_emb, hops=2)
        # Format for context injection
        all_results = results + [r for r in chain if r.memory_id not in {r2.memory_id for r2 in results}]
        context = format_recalled_memories(all_results)
        print(f"\n  Query: \"{query}\"")
        if results:
            print(f"    Top result: {results[0].metadata.get('target', '?')[:60]}...")
            print(f"    Similarity: {results[0].similarity:.3f}")
        if chain and len(chain) > 1:
            print(f"    Chain hop 2: {chain[1].metadata.get('target', '?')[:60]}...")
        if context:
            print(f"    Context injection:\n      {context.replace(chr(10), chr(10) + '      ')}")
 def test_llm_live(model):
    """Test with live LLM if available."""
    print("\n=== Test 4: Live LLM Integration ===\n")
    llm = LLMClient()
    if not llm.available:
        print("  LLM Gateway not available. Skipping live test.")
        print("  To test: ensure https://ste-jarvis.tiktok-row.net/llm/v1 is reachable")
        return
    # Test extraction
    user_msg = "The payment webhook keeps failing with a 502 error"
    assistant_msg = "The webhook endpoint at /api/payments/webhook is behind nginx. Check if the upstream timeout is too short — payment processing can take up to 30 seconds."
    memories = extract_memories_llm(llm, user_msg, assistant_msg)
    print(f"  Extracted {len(memories)} memories from live LLM:")
    for m in memories:
        print(f"    CUE: {m.cue} | TARGET: {m.target[:60]}... | IMP: {m.importance}")
    # Test paraphrase
    if memories:
        paras = generate_paraphrases_llm(llm, memories[0].cue, n=3)
        print(f"\n  Paraphrases for '{memories[0].cue}':")
        for p in paras:
            print(f"    → {p}")
 def main():
    print("=" * 60)
    print("Experiment P0: LLM Integration")
    print("=" * 60)
    model = load_model()
    test_heuristic_extraction()
    test_heuristic_paraphrases()
    test_end_to_end(model)
    test_llm_live(model)
 if __name__ == "__main__":
    main()
--- a/experiments/exp09_embedding_models.py
+++ b/experiments/exp09_embedding_models.py
@@ -0,0 +1,222 @@
 """Experiment P1: Better embedding models.
 MiniLM (22M) has weak paraphrase similarity for many pairs.
 Test: BGE-small (33M), BGE-base (109M), and E5-small (33M).
 Skip large models (330M+) due to VRAM budget with Hebbian W.
 Measure:
 1. Paraphrase pair cosine similarity (gap between same/diff pairs)
 2. Recall accuracy with Hopfield at 2K background
 3. Encoding speed
 """
 import sys
 import time
 from pathlib import Path
 import torch
 import torch.nn as nn
 import numpy as np
 DEVICE = "cuda"
 # Test pairs (same as exp07e)
 PAIRS = [
    ("What's the weather like today?", "User checks weather every morning"),
    ("Let's deploy the new version", "Deployment uses GitHub Actions with k3s"),
    ("The database is slow again", "Missing index on users table"),
    ("I need to fix the authentication bug", "JWT tokens with 24h expiry in Redis"),
    ("The API returns 500 errors", "OOM in the Python worker"),
    ("Let's set up monitoring", "Prometheus + Grafana on OCI"),
    ("Tests failing in CI", "CI needs postgres service container"),
    ("Memory usage too high", "Leak in websocket handler"),
    ("Help with Docker setup", "docker-compose for dev, k3s for prod"),
    ("Log files too large", "Logs rotate daily, shipped to Loki"),
    ("How to add caching?", "Redis available at redis.internal:6379"),
    ("Frontend loads slowly", "CDN CloudFlare, 1h TTL for assets"),
    ("Refactor payment module", "Stripe API, webhook in payments/webhook.py"),
    ("Set up new server", "Ubuntu 22.04, Docker, Tailscale, monitoring"),
    ("Optimize search", "Elasticsearch v8, recently upgraded"),
    ("Backup the database", "Daily 3am UTC cron to S3"),
    ("Configure reverse proxy", "Traefik, not nginx"),
    ("Team meeting schedule", "Standup 10am London, Mon-Fri"),
    ("Learn a new programming language", "User has Python+Go, new to systems"),
    ("Review my pull request", "User prefers small PRs with clear commits"),
 ]
 PARAPHRASES = [
    "How's the weather?", "Ship the release", "DB performance terrible",
    "Fix the login issue", "Server errors everywhere", "Need observability",
    "CI tests breaking", "Service using too much RAM", "Docker config help",
    "Logs eating disk space", "Want to add a cache layer", "Website too slow",
    "Payment code needs rework", "Provision a new machine", "Search is slow",
    "Need a DB backup", "Proxy configuration", "When's the standup?",
    "Want to learn Rust", "Check my pull request",
 ]
 def winner_take_all(x, k):
    _, idx = x.topk(k, dim=-1)
    out = torch.zeros_like(x)
    out.scatter_(-1, idx, 1.0)
    return out
 def cosine(a, b):
    return nn.functional.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()
 class TwoStageHopfield:
    def __init__(self, embed_dim, beta=16.0, top_k=20):
        self.beta = beta
        self.top_k = top_k
        self.cue_embs = []
        self.target_embs = []
    def learn(self, cue_emb, target_emb):
        self.cue_embs.append(cue_emb.detach())
        self.target_embs.append(target_emb.detach())
    def recall(self, query_emb, steps=3):
        cue_mat = torch.stack(self.cue_embs)
        target_mat = torch.stack(self.target_embs)
        K = min(self.top_k, len(self.cue_embs))
        sims = query_emb @ cue_mat.T
        _, top_idx = sims.topk(K)
        cand_cues = cue_mat[top_idx]
        cand_targets = target_mat[top_idx]
        xi = query_emb
        for _ in range(steps):
            scores = self.beta * (xi @ cand_cues.T)
            attn = torch.softmax(scores, dim=0)
            xi = attn @ cand_cues
            xi = nn.functional.normalize(xi, dim=0)
        scores = self.beta * (xi @ cand_cues.T)
        attn = torch.softmax(scores, dim=0)
        return nn.functional.normalize(attn @ cand_targets, dim=0)
 def evaluate_model(model_name):
    """Full evaluation of one embedding model."""
    from sentence_transformers import SentenceTransformer
    print(f"\n--- {model_name} ---")
    t0 = time.time()
    model = SentenceTransformer(model_name, device=DEVICE)
    load_time = time.time() - t0
    embed_dim = model.get_sentence_embedding_dimension()
    print(f"  Dim: {embed_dim}, Load: {load_time:.1f}s")
    # 1. Paraphrase similarity gap
    cue_texts = [p[0] for p in PAIRS]
    cue_embs = model.encode(cue_texts, convert_to_tensor=True,
                             normalize_embeddings=True, device=DEVICE)
    para_embs = model.encode(PARAPHRASES, convert_to_tensor=True,
                              normalize_embeddings=True, device=DEVICE)
    target_embs = model.encode([p[1] for p in PAIRS], convert_to_tensor=True,
                                normalize_embeddings=True, device=DEVICE)
    same_sims = [cosine(cue_embs[i], para_embs[i]) for i in range(len(PAIRS))]
    diff_sims = []
    for i in range(len(PAIRS)):
        for j in range(len(PAIRS)):
            if i != j:
                diff_sims.append(cosine(cue_embs[i], para_embs[j]))
    mean_same = np.mean(same_sims)
    mean_diff = np.mean(diff_sims)
    min_same = np.min(same_sims)
    gap = mean_same - mean_diff
    print(f"  Similarity: same={mean_same:.3f} (min={min_same:.3f}), "
          f"diff={mean_diff:.3f}, gap={gap:.3f}")
    # Show worst pairs
    worst_idx = np.argsort(same_sims)[:3]
    for idx in worst_idx:
        print(f"    Worst: {same_sims[idx]:.3f} '{cue_texts[idx][:30]}...' ↔ '{PARAPHRASES[idx][:30]}...'")
    # 2. Encoding speed
    texts_100 = [f"Test sentence number {i} about various topics" for i in range(100)]
    t0 = time.time()
    model.encode(texts_100, convert_to_tensor=True, device=DEVICE)
    speed = 100 / (time.time() - t0)
    print(f"  Speed: {speed:.0f} sentences/s")
    # 3. Recall with 2K background
    mem = TwoStageHopfield(embed_dim, beta=16.0, top_k=20)
    for i in range(len(PAIRS)):
        mem.learn(cue_embs[i], target_embs[i])
    # Background
    bg_cues = [f"The {['server','db','api','fe','be','cache'][i%6]} has issue {i}"
               for i in range(2000)]
    bg_targets = [f"Fix issue {i}" for i in range(2000)]
    bg_cue_embs = model.encode(bg_cues, convert_to_tensor=True,
                                normalize_embeddings=True, device=DEVICE, batch_size=256)
    bg_target_embs = model.encode(bg_targets, convert_to_tensor=True,
                                   normalize_embeddings=True, device=DEVICE, batch_size=256)
    for i in range(2000):
        mem.learn(bg_cue_embs[i], bg_target_embs[i])
    correct = 0
    for i in range(len(PARAPHRASES)):
        recalled = mem.recall(para_embs[i])
        all_sims = [cosine(recalled, target_embs[j]) for j in range(len(PAIRS))]
        if np.argmax(all_sims) == i:
            correct += 1
    n = len(PARAPHRASES)
    print(f"  Recall (20 pairs + 2K bg): {correct}/{n} ({correct/n:.0%})")
    # VRAM
    vram = torch.cuda.memory_allocated() / 1024**2
    print(f"  VRAM: {vram:.0f} MB")
    del model, mem
    torch.cuda.empty_cache()
    return {
        "model": model_name, "dim": embed_dim,
        "same_sim": mean_same, "diff_sim": mean_diff, "gap": gap,
        "min_same": min_same, "speed": speed,
        "recall": correct / n, "vram_mb": vram,
    }
 def main():
    print("=" * 60)
    print("Experiment P1: Embedding Model Comparison")
    print("=" * 60)
    models = [
        "all-MiniLM-L6-v2",           # Baseline, 22M, dim=384
        "BAAI/bge-small-en-v1.5",     # 33M, dim=384
        "BAAI/bge-base-en-v1.5",      # 109M, dim=768
        "intfloat/e5-small-v2",       # 33M, dim=384
    ]
    results = []
    for model_name in models:
        try:
            r = evaluate_model(model_name)
            results.append(r)
        except Exception as e:
            print(f"  ERROR: {e}")
    # Summary table
    print("\n" + "=" * 80)
    print("SUMMARY")
    print(f"{'Model':<30} {'Dim':>4} {'SameSim':>8} {'Gap':>6} "
          f"{'MinSim':>7} {'Recall':>7} {'Speed':>6} {'VRAM':>6}")
    print("-" * 80)
    for r in results:
        print(f"{r['model']:<30} {r['dim']:>4} {r['same_sim']:>8.3f} "
              f"{r['gap']:>6.3f} {r['min_same']:>7.3f} "
              f"{r['recall']:>6.0%} {r['speed']:>5.0f}/s {r['vram_mb']:>5.0f}MB")
 if __name__ == "__main__":
    main()
--- a/experiments/exp10_auto_paraphrase.py
+++ b/experiments/exp10_auto_paraphrase.py
@@ -0,0 +1,220 @@
 """Experiment P2: Auto Paraphrase Generation.
 LLM gateway down, so test:
 1. Heuristic paraphrase effect on recall (how much does crappy augmentation help?)
 2. "Oracle" paraphrase (hand-crafted) vs heuristic vs none
 3. Design: what makes a good paraphrase for memory augmentation?
 4. Analysis: which failures are fixable by paraphrase vs need better embeddings?
 """
 import sys
 import time
 from pathlib import Path
 import torch
 import torch.nn as nn
 import numpy as np
 sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
 sys.path.insert(0, str(Path(__file__).parent.parent))
 from llm import generate_paraphrases_heuristic
 DEVICE = "cuda"
 PAIRS = [
    ("What's the weather like today?", "User checks weather every morning"),
    ("Let's deploy the new version", "Deployment uses GitHub Actions with k3s"),
    ("The database is slow again", "Missing index on users table"),
    ("I need to fix the authentication bug", "JWT tokens with 24h expiry in Redis"),
    ("The API returns 500 errors", "OOM in the Python worker"),
    ("Let's set up monitoring", "Prometheus + Grafana on OCI"),
    ("Tests failing in CI", "CI needs postgres service container"),
    ("Memory usage too high", "Leak in websocket handler"),
    ("Help with Docker setup", "docker-compose for dev, k3s for prod"),
    ("Log files too large", "Logs rotate daily, shipped to Loki"),
    ("How to add caching?", "Redis available at redis.internal:6379"),
    ("Frontend loads slowly", "CDN CloudFlare, 1h TTL for assets"),
    ("Refactor payment module", "Stripe API, webhook in payments/webhook.py"),
    ("Set up new server", "Ubuntu 22.04, Docker, Tailscale, monitoring"),
    ("Optimize search", "Elasticsearch v8, recently upgraded"),
    ("Backup the database", "Daily 3am UTC cron to S3"),
    ("Configure reverse proxy", "Traefik, not nginx"),
    ("Team meeting schedule", "Standup 10am London, Mon-Fri"),
    ("Learn a new programming language", "User has Python+Go, new to systems"),
    ("Review my pull request", "User prefers small PRs with clear commits"),
 ]
 PARAPHRASES = [
    "How's the weather?", "Ship the release", "DB performance terrible",
    "Fix the login issue", "Server errors everywhere", "Need observability",
    "CI tests breaking", "Service using too much RAM", "Docker config help",
    "Logs eating disk space", "Want to add a cache layer", "Website too slow",
    "Payment code needs rework", "Provision a new machine", "Search is slow",
    "Need a DB backup", "Proxy configuration", "When's the standup?",
    "Want to learn Rust", "Check my pull request",
 ]
 # Oracle paraphrases: hand-crafted to cover the semantic gaps
 ORACLE_PARAPHRASES = {
    1: ["Ship the release", "Push to production", "Release the new build", "Deploy new code"],
    3: ["Fix the login issue", "Authentication broken", "Login doesn't work", "Auth bug"],
    4: ["Server errors everywhere", "Getting 500s", "Internal server error", "API is down"],
    5: ["Need observability", "Set up alerts", "Monitor services", "Add monitoring"],
    10: ["Add a cache layer", "Implement caching", "Cache responses"],
    11: ["Website too slow", "Page loads slowly", "Frontend performance bad"],
    12: ["Payment code needs rework", "Refactor payments", "Payment system restructure"],
    13: ["Provision a new machine", "Need a new server", "Set up new box", "New machine setup"],
    14: ["Search is slow", "Search performance", "Optimize search queries"],
    17: ["When's the standup?", "Meeting time?", "Daily sync schedule", "What time is standup?"],
    18: ["Want to learn Rust", "Learning Rust", "Getting into Rust", "Start with Rust"],
    19: ["Check my pull request", "Look at my code", "PR review please", "Review my code changes"],
 }
 def cosine(a, b):
    return nn.functional.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()
 class TwoStageHopfield:
    def __init__(self, beta=16.0, top_k=20):
        self.beta = beta
        self.top_k = top_k
        self.cue_embs = []
        self.target_embs = []
        self.memory_ids = []
    def learn(self, cue_emb, target_emb, mid):
        self.cue_embs.append(cue_emb.detach())
        self.target_embs.append(target_emb.detach())
        self.memory_ids.append(mid)
    def recall(self, query_emb, steps=3):
        cue_mat = torch.stack(self.cue_embs)
        target_mat = torch.stack(self.target_embs)
        K = min(self.top_k, len(self.cue_embs))
        sims = query_emb @ cue_mat.T
        _, top_idx = sims.topk(K)
        cand_cues = cue_mat[top_idx]
        cand_targets = target_mat[top_idx]
        cand_mids = [self.memory_ids[i] for i in top_idx.tolist()]
        xi = query_emb
        for _ in range(steps):
            scores = self.beta * (xi @ cand_cues.T)
            attn = torch.softmax(scores, dim=0)
            xi = attn @ cand_cues
            xi = nn.functional.normalize(xi, dim=0)
        scores = self.beta * (xi @ cand_cues.T)
        attn = torch.softmax(scores, dim=0)
        # Aggregate by memory_id
        mid_scores = {}
        for i, mid in enumerate(cand_mids):
            mid_scores[mid] = mid_scores.get(mid, 0) + attn[i].item()
        best_mid = max(mid_scores, key=mid_scores.get)
        target = nn.functional.normalize(attn @ cand_targets, dim=0)
        return target, best_mid
 def evaluate(model, augmentation_mode, n_background=2000):
    """Test recall with different augmentation strategies."""
    from sentence_transformers import SentenceTransformer
    cue_embs = model.encode([p[0] for p in PAIRS], convert_to_tensor=True,
                             normalize_embeddings=True, device=DEVICE)
    target_embs = model.encode([p[1] for p in PAIRS], convert_to_tensor=True,
                                normalize_embeddings=True, device=DEVICE)
    para_embs = model.encode(PARAPHRASES, convert_to_tensor=True,
                              normalize_embeddings=True, device=DEVICE)
    mem = TwoStageHopfield(beta=16.0, top_k=20)
    for i in range(len(PAIRS)):
        mem.learn(cue_embs[i], target_embs[i], mid=i)
        if augmentation_mode == "heuristic":
            paras = generate_paraphrases_heuristic(PAIRS[i][0], n=3)
            para_e = model.encode(paras, convert_to_tensor=True,
                                   normalize_embeddings=True, device=DEVICE)
            for j in range(len(paras)):
                mem.learn(para_e[j], target_embs[i], mid=i)
        elif augmentation_mode == "oracle":
            if i in ORACLE_PARAPHRASES:
                paras = ORACLE_PARAPHRASES[i]
                para_e = model.encode(paras, convert_to_tensor=True,
                                       normalize_embeddings=True, device=DEVICE)
                for j in range(len(paras)):
                    mem.learn(para_e[j], target_embs[i], mid=i)
        elif augmentation_mode == "oracle_all":
            # Oracle for all pairs (3 generic paraphrases each)
            if i in ORACLE_PARAPHRASES:
                paras = ORACLE_PARAPHRASES[i]
            else:
                paras = generate_paraphrases_heuristic(PAIRS[i][0], n=3)
            para_e = model.encode(paras, convert_to_tensor=True,
                                   normalize_embeddings=True, device=DEVICE)
            for j in range(len(paras)):
                mem.learn(para_e[j], target_embs[i], mid=i)
    # Background
    if n_background > 0:
        topics = ["server", "db", "api", "fe", "be", "cache"]
        bg_cues = [f"The {topics[i%6]} has issue {i}" for i in range(n_background)]
        bg_targets = [f"Fix issue {i}" for i in range(n_background)]
        bg_c = model.encode(bg_cues, convert_to_tensor=True,
                             normalize_embeddings=True, device=DEVICE, batch_size=256)
        bg_t = model.encode(bg_targets, convert_to_tensor=True,
                             normalize_embeddings=True, device=DEVICE, batch_size=256)
        for i in range(n_background):
            mem.learn(bg_c[i], bg_t[i], mid=100+i)
    correct = 0
    failures = []
    for i in range(len(PARAPHRASES)):
        _, best_mid = mem.recall(para_embs[i])
        if best_mid == i:
            correct += 1
        else:
            failures.append((i, best_mid))
    n = len(PARAPHRASES)
    return correct, n, failures
 def main():
    print("=" * 60)
    print("Experiment P2: Auto Paraphrase Analysis")
    print("=" * 60)
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("all-MiniLM-L6-v2", device=DEVICE)
    for bg in [0, 500, 2000]:
        print(f"\n=== Background: {bg} ===")
        for mode in ["none", "heuristic", "oracle", "oracle_all"]:
            correct, n, failures = evaluate(model, mode, n_background=bg)
            fail_ids = [f[0] for f in failures]
            print(f"  {mode:<15}: {correct}/{n} ({correct/n:.0%})"
                  + (f" | Failures: {fail_ids}" if failures else ""))
    # Analyze: which failures are fixable?
    print("\n=== Failure Analysis (2K bg, no augmentation) ===")
    correct, n, failures = evaluate(model, "none", 2000)
    cue_texts = [p[0] for p in PAIRS]
    for qi, gi in failures:
        cue_emb = model.encode([cue_texts[qi]], convert_to_tensor=True,
                                normalize_embeddings=True, device=DEVICE)[0]
        para_emb = model.encode([PARAPHRASES[qi]], convert_to_tensor=True,
                                 normalize_embeddings=True, device=DEVICE)[0]
        sim = cosine(cue_emb, para_emb)
        fixable = qi in ORACLE_PARAPHRASES
        print(f"  [{qi}] '{PARAPHRASES[qi][:25]}...' → got [{gi}], "
              f"cue_sim={sim:.3f}, oracle_fix={'✓' if fixable else '✗'}")
 if __name__ == "__main__":
    main()
--- a/experiments/exp11_scale_ceiling.py
+++ b/experiments/exp11_scale_ceiling.py
@@ -0,0 +1,237 @@
 """Experiment P3: Breaking the 20K 80% ceiling.
 Hypothesis: NN pre-filter (top-20) misses the correct cue at large scale.
 Tests:
 1. Oracle analysis: is the correct cue in top-K? What K is needed?
 2. Hierarchical memory: cluster memories, route query to relevant cluster
 3. Re-ranking: top-K NN → cross-similarity re-rank → Hopfield on re-ranked
 4. Multiple projections: ensemble of NN lookups with different random projections
 """
 import sys
 import time
 from pathlib import Path
 import torch
 import torch.nn as nn
 import numpy as np
 DEVICE = "cuda"
 PAIRS = [
    ("What's the weather like today?", "User checks weather every morning"),
    ("Let's deploy the new version", "Deployment uses GitHub Actions with k3s"),
    ("The database is slow again", "Missing index on users table"),
    ("I need to fix the authentication bug", "JWT tokens with 24h expiry in Redis"),
    ("The API returns 500 errors", "OOM in the Python worker"),
    ("Let's set up monitoring", "Prometheus + Grafana on OCI"),
    ("Tests failing in CI", "CI needs postgres service container"),
    ("Memory usage too high", "Leak in websocket handler"),
    ("Help with Docker setup", "docker-compose for dev, k3s for prod"),
    ("Log files too large", "Logs rotate daily, shipped to Loki"),
 ]
 PARAPHRASES = [
    "How's the weather?", "Ship the release", "DB performance terrible",
    "Fix the login issue", "Server errors everywhere", "Need observability",
    "CI tests breaking", "Service using too much RAM", "Docker config help",
    "Logs eating disk space",
 ]
 def cosine(a, b):
    return nn.functional.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()
 def load_model():
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer("all-MiniLM-L6-v2", device=DEVICE)
 def build_memory(model, n_bg):
    """Build memory with test pairs + background."""
    cue_embs = model.encode([p[0] for p in PAIRS], convert_to_tensor=True,
                             normalize_embeddings=True, device=DEVICE)
    target_embs = model.encode([p[1] for p in PAIRS], convert_to_tensor=True,
                                normalize_embeddings=True, device=DEVICE)
    para_embs = model.encode(PARAPHRASES, convert_to_tensor=True,
                              normalize_embeddings=True, device=DEVICE)
    all_cues = list(cue_embs)
    all_targets = list(target_embs)
    all_mids = list(range(len(PAIRS)))
    if n_bg > 0:
        topics = ["server", "db", "api", "fe", "be", "cache",
                  "queue", "net", "store", "auth", "docker", "k8s"]
        bg_cues = [f"The {topics[i%len(topics)]} has issue {i}" for i in range(n_bg)]
        bg_targets = [f"Fix {topics[i%len(topics)]} issue {i}" for i in range(n_bg)]
        bg_c = model.encode(bg_cues, convert_to_tensor=True,
                             normalize_embeddings=True, device=DEVICE, batch_size=256)
        bg_t = model.encode(bg_targets, convert_to_tensor=True,
                             normalize_embeddings=True, device=DEVICE, batch_size=256)
        for i in range(n_bg):
            all_cues.append(bg_c[i])
            all_targets.append(bg_t[i])
            all_mids.append(100 + i)
    cue_mat = torch.stack(all_cues)
    target_mat = torch.stack(all_targets)
    return cue_mat, target_mat, all_mids, cue_embs, target_embs, para_embs
 def test_topk_coverage(model, n_bg_list):
    """Is the correct cue in top-K? What K do we need?"""
    print("=== Test 1: Top-K Coverage Analysis ===\n")
    for n_bg in n_bg_list:
        cue_mat, target_mat, mids, cue_embs, target_embs, para_embs = build_memory(model, n_bg)
        for K in [5, 10, 20, 50, 100, 200]:
            in_topk = 0
            for i in range(len(PARAPHRASES)):
                sims = para_embs[i] @ cue_mat.T
                _, top_idx = sims.topk(min(K, len(mids)))
                top_mids = [mids[j] for j in top_idx.tolist()]
                if i in top_mids:
                    in_topk += 1
            n = len(PARAPHRASES)
            print(f"  N={n_bg+len(PAIRS):>6}, K={K:>3}: "
                  f"{in_topk}/{n} ({in_topk/n:.0%}) correct cue in top-K")
        print()
 def test_two_stage_topk(model, n_bg):
    """Vary K in two-stage Hopfield to find optimal."""
    print(f"\n=== Test 2: Two-Stage K Optimization (bg={n_bg}) ===\n")
    cue_mat, target_mat, mids, cue_embs, target_embs, para_embs = build_memory(model, n_bg)
    for K in [5, 10, 20, 50, 100, 200]:
        correct = 0
        for i in range(len(PARAPHRASES)):
            sims = para_embs[i] @ cue_mat.T
            k = min(K, len(mids))
            _, top_idx = sims.topk(k)
            cand_cues = cue_mat[top_idx]
            cand_targets = target_mat[top_idx]
            cand_mids = [mids[j] for j in top_idx.tolist()]
            # Hopfield settle
            xi = para_embs[i]
            for _ in range(3):
                scores = 16.0 * (xi @ cand_cues.T)
                attn = torch.softmax(scores, dim=0)
                xi = attn @ cand_cues
                xi = nn.functional.normalize(xi, dim=0)
            scores = 16.0 * (xi @ cand_cues.T)
            attn = torch.softmax(scores, dim=0)
            mid_scores = {}
            for j, mid in enumerate(cand_mids):
                mid_scores[mid] = mid_scores.get(mid, 0) + attn[j].item()
            best_mid = max(mid_scores, key=mid_scores.get)
            if best_mid == i:
                correct += 1
        n = len(PARAPHRASES)
        print(f"  K={K:>3}: {correct}/{n} ({correct/n:.0%})")
 def test_hierarchical(model, n_bg):
    """Cluster memories by topic, route query to relevant cluster."""
    print(f"\n=== Test 3: Hierarchical Memory (bg={n_bg}) ===\n")
    cue_mat, target_mat, mids, cue_embs, target_embs, para_embs = build_memory(model, n_bg)
    # Simple clustering: k-means on cue embeddings
    from torch import cdist
    n_clusters = max(10, (n_bg + len(PAIRS)) // 100)
    # K-means (simple implementation)
    N = cue_mat.shape[0]
    centroids = cue_mat[torch.randperm(N)[:n_clusters]].clone()
    for _ in range(20):
        dists = 1 - cue_mat @ centroids.T  # cosine distance
        assignments = dists.argmin(dim=1)
        for c in range(n_clusters):
            mask = assignments == c
            if mask.sum() > 0:
                centroids[c] = nn.functional.normalize(cue_mat[mask].mean(dim=0), dim=0)
    # Route query to top-3 clusters, then Hopfield within
    correct = 0
    for i in range(len(PARAPHRASES)):
        # Find relevant clusters
        cluster_sims = para_embs[i] @ centroids.T
        top_clusters = cluster_sims.topk(3).indices
        # Gather candidates from top clusters
        cand_idx = []
        for c in top_clusters:
            cluster_members = (assignments == c).nonzero().squeeze(-1).tolist()
            cand_idx.extend(cluster_members)
        cand_idx = list(set(cand_idx))
        if not cand_idx:
            continue
        # Hopfield on candidates
        cand_cues = cue_mat[cand_idx]
        cand_targets = target_mat[cand_idx]
        cand_mids = [mids[j] for j in cand_idx]
        K = min(20, len(cand_idx))
        sims = para_embs[i] @ cand_cues.T
        _, top_local = sims.topk(K)
        local_cues = cand_cues[top_local]
        local_mids = [cand_mids[j] for j in top_local.tolist()]
        xi = para_embs[i]
        for _ in range(3):
            scores = 16.0 * (xi @ local_cues.T)
            attn = torch.softmax(scores, dim=0)
            xi = attn @ local_cues
            xi = nn.functional.normalize(xi, dim=0)
        scores = 16.0 * (xi @ local_cues.T)
        attn = torch.softmax(scores, dim=0)
        mid_scores = {}
        for j, mid in enumerate(local_mids):
            mid_scores[mid] = mid_scores.get(mid, 0) + attn[j].item()
        best_mid = max(mid_scores, key=mid_scores.get)
        if best_mid == i:
            correct += 1
    n = len(PARAPHRASES)
    print(f"  Hierarchical (clusters={n_clusters}): {correct}/{n} ({correct/n:.0%})")
 def main():
    print("=" * 60)
    print("Experiment P3: Breaking the 20K Ceiling")
    print("=" * 60)
    model = load_model()
    # Test 1: Top-K coverage
    test_topk_coverage(model, [0, 500, 2000, 5000, 10000, 20000])
    # Test 2: K optimization
    for bg in [2000, 10000, 20000]:
        test_two_stage_topk(model, bg)
    # Test 3: Hierarchical
    for bg in [2000, 10000, 20000]:
        test_hierarchical(model, bg)
 if __name__ == "__main__":
    main()
--- a/experiments/exp12_lifecycle.py
+++ b/experiments/exp12_lifecycle.py
@@ -0,0 +1,258 @@
 """Experiment P4: Memory Lifecycle Management.
 Questions:
 1. What's worth storing? (not everything in a conversation is a "memory")
 2. When to forget? (access-based decay, age-based decay, capacity pressure)
 3. Can we merge similar memories? (deduplification / compression)
 4. Importance scoring: how to prioritize during recall and forgetting?
 Strategy: implement and test each mechanism, measure impact on recall quality.
 """
 import sys
 import time
 from pathlib import Path
 from collections import Counter
 import torch
 import torch.nn as nn
 import numpy as np
 sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
 from nuonuo.hippocampus import HippocampalMemory
 DEVICE = "cuda"
 def cosine(a, b):
    return nn.functional.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()
 def load_model():
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer("all-MiniLM-L6-v2", device=DEVICE)
 def emb(model, text):
    return model.encode([text], convert_to_tensor=True,
                        normalize_embeddings=True, device=DEVICE)[0]
 def test_deduplication(model):
    """Test: can we detect and merge duplicate/near-duplicate memories?"""
    print("=== Test 1: Deduplication ===\n")
    mem = HippocampalMemory(embed_dim=384)
    # Store some memories with near-duplicates
    memories = [
        ("The database is slow", "Check missing indexes"),
        ("Database is really slow today", "Check missing indexes on users table"),  # near-dup
        ("DB performance is terrible", "Look at index usage"),  # near-dup
        ("Deploy to production", "Use blue-green deployment"),
        ("Push to prod", "Blue-green deployment via GitHub Actions"),  # near-dup
        ("The API returns 500 errors", "Check for OOM in Python worker"),
        ("Getting 500 errors from API", "Python worker might be OOM"),  # near-dup
        ("Set up monitoring", "Prometheus + Grafana"),
        ("We need better observability", "Set up Prometheus and Grafana"),  # near-dup
    ]
    for cue, target in memories:
        mem.store(emb(model, cue), emb(model, target),
                  metadata={"cue": cue, "target": target})
    print(f"  Before dedup: {len(mem.memories)} memories")
    # Detect near-duplicates by cue similarity
    entries = list(mem.memories.values())
    groups = []
    used = set()
    for i, e1 in enumerate(entries):
        if i in used:
            continue
        group = [i]
        for j, e2 in enumerate(entries):
            if j <= i or j in used:
                continue
            sim = cosine(e1.cue_embedding, e2.cue_embedding)
            if sim > 0.7:  # threshold for "near-duplicate"
                group.append(j)
                used.add(j)
        groups.append(group)
        used.add(i)
    print(f"  Found {len(groups)} groups (from {len(entries)} memories):")
    for group in groups:
        if len(group) > 1:
            cues = [entries[i].metadata.get("cue", "?") for i in group]
            print(f"    Group ({len(group)}): {[c[:30] for c in cues]}")
    # Merge: keep the one with longest target (most info)
    to_remove = []
    for group in groups:
        if len(group) > 1:
            # Keep the one with longest target text
            best = max(group, key=lambda i: len(entries[i].metadata.get("target", "")))
            for i in group:
                if i != best:
                    to_remove.append(entries[i].memory_id)
    for mid in to_remove:
        mem.forget(mid)
    print(f"  After dedup: {len(mem.memories)} memories")
    print(f"  Removed {len(to_remove)} duplicates")
 def test_importance_scoring(model):
    """Test: importance-based memory management."""
    print("\n=== Test 2: Importance Scoring ===\n")
    # Simulate conversation with varying importance
    conversations = [
        # (user, assistant, expected_importance)
        ("Hi there!", "Hello! How can I help?", "low"),
        ("What's the weather?", "It's sunny today.", "low"),
        ("The production database crashed at 3am", "Emergency: restore from latest backup at s3://backups/db-latest.sql", "high"),
        ("What time is it?", "It's 3:45 PM.", "low"),
        ("The auth service JWT secret was compromised", "Rotate secret immediately: kubectl set env deployment/auth JWT_SECRET=new_value", "critical"),
        ("Deploy the hotfix", "Deployed via GitHub Actions, monitor Grafana for 30 min", "high"),
        ("Thanks for your help", "You're welcome!", "low"),
    ]
    def score_importance(user_msg, assistant_msg):
        """Simple heuristic importance scoring."""
        score = 0.3  # base
        # Length suggests complexity
        if len(assistant_msg.split()) > 15:
            score += 0.2
        # Technical keywords
        critical_words = ["crash", "emergency", "compromised", "secret", "password",
                          "production", "outage", "down", "data loss"]
        high_words = ["deploy", "config", "fix", "bug", "error", "migrate",
                      "backup", "restore", "rollback"]
        for w in critical_words:
            if w in (user_msg + assistant_msg).lower():
                score += 0.3
        for w in high_words:
            if w in (user_msg + assistant_msg).lower():
                score += 0.1
        # Questions suggest retrievable info
        if "?" in user_msg:
            score += 0.1
        return min(score, 1.0)
    for user, assistant, expected in conversations:
        score = score_importance(user, assistant)
        status = "✓" if (expected == "low" and score < 0.5) or \
                        (expected == "high" and 0.5 <= score < 0.8) or \
                        (expected == "critical" and score >= 0.8) else "✗"
        should_store = score >= 0.4
        print(f"  {status} [{score:.2f}] {'STORE' if should_store else 'SKIP ':>5} "
              f"({expected:>8}) '{user[:40]}...'")
 def test_forgetting_strategies(model):
    """Test: different forgetting strategies under memory pressure."""
    print("\n=== Test 3: Forgetting Strategies ===\n")
    # Simulate 7 days of memories, each day 10 memories
    days = 7
    per_day = 10
    max_capacity = 30  # Force forgetting after 30 memories
    cue_template = "Day {day} task {i}: {topic}"
    target_template = "Solution for day {day} task {i}"
    topics = ["database", "deploy", "monitoring", "auth", "API",
              "caching", "logging", "testing", "docker", "CI/CD"]
    def run_strategy(strategy_name, forget_fn):
        mem = HippocampalMemory(embed_dim=384)
        day_memories = {}  # day → list of memory_ids
        for day in range(1, days + 1):
            day_memories[day] = []
            for i in range(per_day):
                cue = cue_template.format(day=day, i=i, topic=topics[i])
                target = target_template.format(day=day, i=i)
                mid = mem.store(emb(model, cue), emb(model, target),
                                metadata={"day": day, "task": i},
                                timestamp=float(day))
                day_memories[day].append(mid)
            # Check capacity
            if len(mem.memories) > max_capacity:
                forget_fn(mem, max_capacity)
        # Test recall for each day's memories
        day_recall = {}
        for day in range(1, days + 1):
            correct = 0
            total = 0
            for i in range(per_day):
                mid = day_memories[day][i] if i < len(day_memories[day]) else None
                if mid is None or mid not in mem.memories:
                    continue
                cue = cue_template.format(day=day, i=i, topic=topics[i])
                results = mem.recall(emb(model, cue), top_k=1)
                if results and results[0].memory_id == mid:
                    correct += 1
                total += 1
            day_recall[day] = (correct, total)
        # Print results
        surviving = len(mem.memories)
        print(f"  {strategy_name}: {surviving} memories surviving")
        for day in range(1, days + 1):
            c, t = day_recall[day]
            pct = f"{c}/{t}" if t > 0 else "0/0"
            print(f"    Day {day}: {pct}")
    # Strategy 1: FIFO (oldest first)
    def forget_fifo(mem, cap):
        entries = sorted(mem.memories.values(), key=lambda e: e.timestamp)
        to_remove = len(mem.memories) - cap
        for e in entries[:to_remove]:
            mem.forget(e.memory_id)
    # Strategy 2: LRU (least recently accessed)
    def forget_lru(mem, cap):
        entries = sorted(mem.memories.values(), key=lambda e: e.access_count)
        to_remove = len(mem.memories) - cap
        for e in entries[:to_remove]:
            mem.forget(e.memory_id)
    # Strategy 3: Low importance first (by timestamp recency as proxy)
    def forget_low_importance(mem, cap):
        entries = sorted(mem.memories.values(),
                        key=lambda e: e.timestamp + e.access_count * 0.5)
        to_remove = len(mem.memories) - cap
        for e in entries[:to_remove]:
            mem.forget(e.memory_id)
    print("(max_capacity=30, 7 days × 10 memories = 70 total)")
    run_strategy("FIFO (oldest first)", forget_fifo)
    print()
    run_strategy("LRU (least accessed)", forget_lru)
    print()
    run_strategy("Importance (recency+access)", forget_low_importance)
 def main():
    print("=" * 60)
    print("Experiment P4: Memory Lifecycle")
    print("=" * 60)
    model = load_model()
    test_deduplication(model)
    test_importance_scoring(model)
    test_forgetting_strategies(model)
 if __name__ == "__main__":
    main()
--- a/experiments/exp13_snn_hopfield.py
+++ b/experiments/exp13_snn_hopfield.py
@@ -0,0 +1,306 @@
 """Experiment P5: SNN-native Hopfield (spike-based attention).
 Goal: Implement Hopfield-like attractor dynamics using LIF neurons.
 The connection: Hopfield softmax attention with inverse temperature β
 is equivalent to a Boltzmann distribution at temperature 1/β.
 In SNN terms: β maps to membrane time constant / threshold ratio.
 Approach: Replace softmax(β * q @ K^T) @ V with:
 1. Encode query as spike train
 2. Feed through recurrent LIF network with stored patterns as synaptic weights
 3. Network settles to attractor (nearest stored pattern)
 4. Read out associated target
 This is closer to biological CA3 recurrent dynamics.
 """
 import sys
 import time
 from pathlib import Path
 import torch
 import torch.nn as nn
 import snntorch as snn
 import numpy as np
 DEVICE = "cuda"
 def cosine(a, b):
    return nn.functional.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()
 class SNNHopfield(nn.Module):
    """Spike-based Hopfield network.
    Architecture:
    - Input layer: converts query embedding to current injection
    - Recurrent layer: LIF neurons with Hopfield-like connection weights
    - Readout: spike rates → attention weights → target embedding
    The recurrent weights are set (not trained) based on stored patterns,
    making this a "configured" SNN, not a "trained" one.
    """
    def __init__(self, dim, beta=0.9, threshold=1.0, num_steps=50):
        super().__init__()
        self.dim = dim
        self.num_steps = num_steps
        self.beta_lif = beta  # LIF membrane decay
        self.threshold = threshold
        self.lif = snn.Leaky(beta=beta, threshold=threshold)
        # Stored patterns
        self.cue_patterns = []
        self.target_patterns = []
    def store(self, cue_emb, target_emb):
        self.cue_patterns.append(cue_emb.detach())
        self.target_patterns.append(target_emb.detach())
    def _build_weights(self):
        """Build Hopfield-like recurrent weights from stored patterns.
        W_ij = Σ_μ (pattern_μ_i * pattern_μ_j) / N
        This creates attractor states at each stored pattern.
        """
        if not self.cue_patterns:
            return torch.zeros(self.dim, self.dim, device=DEVICE)
        patterns = torch.stack(self.cue_patterns)  # [N_patterns, dim]
        W = patterns.T @ patterns / len(self.cue_patterns)  # [dim, dim]
        # Remove diagonal (no self-connections, like biological networks)
        W.fill_diagonal_(0)
        return W
    def recall(self, query_emb):
        """Spike-based attractor dynamics.
        1. Inject query as constant current
        2. Let network settle via recurrent dynamics
        3. Read spike rates → find nearest stored pattern → get target
        """
        W = self._build_weights()
        # LIF dynamics
        mem = torch.zeros(self.dim, device=DEVICE)
        spike_counts = torch.zeros(self.dim, device=DEVICE)
        # Constant input current from query (scaled)
        input_current = query_emb * 2.0  # Scale to help reach threshold
        for step in range(self.num_steps):
            # Total current: external input + recurrent
            if step < self.num_steps // 2:
                # First half: external input drives the network
                total_current = input_current + W @ (mem / self.threshold)
            else:
                # Second half: only recurrent (free running, settle to attractor)
                total_current = W @ (mem / self.threshold)
            spk, mem = self.lif(total_current, mem)
            spike_counts += spk
        # Spike rates as representation
        spike_rates = spike_counts / self.num_steps  # [dim]
        # Find nearest stored pattern by spike rate similarity
        if not self.cue_patterns:
            return None, None
        cue_mat = torch.stack(self.cue_patterns)
        sims = nn.functional.cosine_similarity(
            spike_rates.unsqueeze(0), cue_mat, dim=-1)
        # Softmax attention based on similarity (hybrid: spike settle + soft readout)
        attn = torch.softmax(sims * 16.0, dim=0)
        target_mat = torch.stack(self.target_patterns)
        recalled = attn @ target_mat
        recalled = nn.functional.normalize(recalled, dim=0)
        best_idx = sims.argmax().item()
        return recalled, best_idx
    def recall_pure_spike(self, query_emb):
        """Fully spike-based recall (no softmax at readout)."""
        W = self._build_weights()
        mem = torch.zeros(self.dim, device=DEVICE)
        spike_counts = torch.zeros(self.dim, device=DEVICE)
        input_current = query_emb * 2.0
        for step in range(self.num_steps):
            if step < self.num_steps // 2:
                total_current = input_current + W @ (mem / self.threshold)
            else:
                total_current = W @ (mem / self.threshold)
            spk, mem = self.lif(total_current, mem)
            spike_counts += spk
        spike_rates = spike_counts / self.num_steps
        # Pure spike readout: direct cosine similarity (no softmax)
        cue_mat = torch.stack(self.cue_patterns)
        sims = nn.functional.cosine_similarity(
            spike_rates.unsqueeze(0), cue_mat, dim=-1)
        best_idx = sims.argmax().item()
        return self.target_patterns[best_idx], best_idx
 def load_model():
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer("all-MiniLM-L6-v2", device=DEVICE)
 def emb(model, text):
    return model.encode([text], convert_to_tensor=True,
                        normalize_embeddings=True, device=DEVICE)[0]
 def test_basic(model):
    """Basic SNN Hopfield recall."""
    print("=== Test 1: Basic SNN Hopfield ===\n")
    pairs = [
        ("The database is slow", "Check missing indexes"),
        ("Deploy to production", "Use blue-green deployment"),
        ("The API returns 500", "Check for OOM in worker"),
        ("Set up monitoring", "Prometheus and Grafana"),
        ("Tests failing in CI", "Need postgres container"),
    ]
    for num_steps in [20, 50, 100, 200]:
        for beta in [0.8, 0.9, 0.95]:
            net = SNNHopfield(384, beta=beta, num_steps=num_steps).to(DEVICE)
            for cue, target in pairs:
                net.store(emb(model, cue), emb(model, target))
            # Test exact recall
            correct = 0
            for i, (cue, target) in enumerate(pairs):
                recalled, idx = net.recall(emb(model, cue))
                if idx == i:
                    correct += 1
            # Test paraphrase
            paraphrases = ["DB is crawling", "Ship the release",
                           "Getting 500 errors", "Need observability", "CI broken"]
            para_correct = 0
            for i, para in enumerate(paraphrases):
                recalled, idx = net.recall(emb(model, para))
                if idx == i:
                    para_correct += 1
            n = len(pairs)
            print(f"  steps={num_steps:>3}, β={beta}: "
                  f"Exact={correct}/{n}, Para={para_correct}/{n}")
 def test_comparison(model):
    """Compare SNN Hopfield vs standard Hopfield."""
    print("\n=== Test 2: SNN vs Standard Hopfield ===\n")
    pairs = [
        ("The database is slow", "Check missing indexes"),
        ("Deploy to production", "Use blue-green deployment"),
        ("The API returns 500", "Check for OOM in worker"),
        ("Set up monitoring", "Prometheus and Grafana"),
        ("Tests failing in CI", "Need postgres container"),
    ]
    paraphrases = ["DB is crawling", "Ship the release",
                   "Getting 500 errors", "Need observability", "CI broken"]
    # SNN Hopfield
    snn_net = SNNHopfield(384, beta=0.9, num_steps=100).to(DEVICE)
    for cue, target in pairs:
        snn_net.store(emb(model, cue), emb(model, target))
    snn_correct = 0
    t0 = time.time()
    for i, para in enumerate(paraphrases):
        _, idx = snn_net.recall(emb(model, para))
        if idx == i:
            snn_correct += 1
    snn_time = (time.time() - t0) / len(paraphrases) * 1000
    # Standard Hopfield (softmax attention)
    cue_embs = [emb(model, p[0]) for p in pairs]
    target_embs = [emb(model, p[1]) for p in pairs]
    cue_mat = torch.stack(cue_embs)
    target_mat = torch.stack(target_embs)
    std_correct = 0
    t0 = time.time()
    for i, para in enumerate(paraphrases):
        q = emb(model, para)
        xi = q
        for _ in range(3):
            scores = 16.0 * (xi @ cue_mat.T)
            attn = torch.softmax(scores, dim=0)
            xi = attn @ cue_mat
            xi = nn.functional.normalize(xi, dim=0)
        scores = 16.0 * (xi @ cue_mat.T)
        attn = torch.softmax(scores, dim=0)
        best = attn.argmax().item()
        if best == i:
            std_correct += 1
    std_time = (time.time() - t0) / len(paraphrases) * 1000
    n = len(paraphrases)
    print(f"  SNN Hopfield:      {snn_correct}/{n} ({snn_correct/n:.0%}), {snn_time:.1f}ms/query")
    print(f"  Standard Hopfield: {std_correct}/{n} ({std_correct/n:.0%}), {std_time:.1f}ms/query")
 def test_with_background(model):
    """SNN Hopfield with background noise."""
    print("\n=== Test 3: SNN Hopfield with Background ===\n")
    pairs = [
        ("The database is slow", "Check missing indexes"),
        ("Deploy to production", "Use blue-green deployment"),
        ("The API returns 500", "Check for OOM in worker"),
    ]
    paraphrases = ["DB is crawling", "Ship the release", "Getting 500 errors"]
    for n_bg in [0, 10, 50]:
        net = SNNHopfield(384, beta=0.9, num_steps=100).to(DEVICE)
        for cue, target in pairs:
            net.store(emb(model, cue), emb(model, target))
        for i in range(n_bg):
            net.store(
                emb(model, f"Background task {i} about topic {i%5}"),
                emb(model, f"Background detail {i}"),
            )
        correct = 0
        for i, para in enumerate(paraphrases):
            _, idx = net.recall(emb(model, para))
            if idx == i:
                correct += 1
        n = len(paraphrases)
        t0 = time.time()
        net.recall(emb(model, paraphrases[0]))
        dt = (time.time() - t0) * 1000
        print(f"  bg={n_bg:>3}: Para={correct}/{n} ({correct/n:.0%}), "
              f"latency={dt:.1f}ms, "
              f"W_size={net.dim**2*4/1024/1024:.0f}MB")
 def main():
    print("=" * 60)
    print("Experiment P5: SNN-native Hopfield")
    print("=" * 60)
    model = load_model()
    test_basic(model)
    test_comparison(model)
    test_with_background(model)
 if __name__ == "__main__":
    main()
--- a/experiments/exp14_multiturn.py
+++ b/experiments/exp14_multiturn.py
@@ -0,0 +1,176 @@
 """Experiment P6: Multi-turn conversation simulation.
 Simulate a realistic multi-day conversation scenario:
 - Day 1: User discusses database issues
 - Day 2: User works on deployment
 - Day 3: User comes back with a related question → should recall Day 1 context
 - Day 4: User asks about something mentioned in passing on Day 1
 Test: cross-session recall, context accumulation, multi-hop across days.
 """
 import sys
 import time
 from pathlib import Path
 import torch
 import torch.nn as nn
 import numpy as np
 sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
 sys.path.insert(0, str(Path(__file__).parent.parent))
 from nuonuo.hippocampus import HippocampalMemory
 from llm import generate_paraphrases_heuristic
 DEVICE = "cuda"
 def load_model():
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer("all-MiniLM-L6-v2", device=DEVICE)
 def emb(model, text):
    return model.encode([text], convert_to_tensor=True,
                        normalize_embeddings=True, device=DEVICE)[0]
 def store_with_augmentation(mem, model, cue, target, timestamp=0.0):
    """Store a memory with heuristic paraphrases."""
    cue_emb = emb(model, cue)
    target_emb = emb(model, target)
    paras = generate_paraphrases_heuristic(cue, n=3)
    para_embs = [emb(model, p) for p in paras] if paras else None
    return mem.store(cue_emb, target_emb, cue_variants=para_embs,
                     metadata={"cue": cue, "target": target},
                     timestamp=timestamp)
 def test_recall(mem, model, query, expected_target_substr):
    """Test if recall contains expected substring."""
    results = mem.recall(emb(model, query), top_k=3)
    for r in results:
        if expected_target_substr.lower() in r.metadata.get("target", "").lower():
            return True, r.similarity, r.metadata["target"]
    return False, 0.0, results[0].metadata.get("target", "???") if results else "no results"
 def main():
    print("=" * 60)
    print("Experiment P6: Multi-turn Conversation")
    print("=" * 60)
    model = load_model()
    mem = HippocampalMemory(embed_dim=384)
    # ===== Day 1: Database troubleshooting session =====
    print("\n--- Day 1: Database Troubleshooting ---")
    day1_memories = [
        ("The database is really slow", "The users table is missing an index on created_at"),
        ("What's the query that's slow?", "SELECT * FROM users WHERE created_at > ? ORDER BY created_at"),
        ("How many rows in the users table?", "About 2.3 million rows, growing 10K per day"),
        ("Who has access to the database?", "Only the backend team: Alice, Bob, and Charlie"),
        ("What's the database host?", "PostgreSQL on db.internal:5432, running version 15.2"),
    ]
    for cue, target in day1_memories:
        store_with_augmentation(mem, model, cue, target, timestamp=1.0)
    # ===== Day 2: Deployment work =====
    print("--- Day 2: Deployment ---")
    day2_memories = [
        ("How do we deploy?", "Blue-green deployment via GitHub Actions, config in .github/workflows/deploy.yml"),
        ("What's the rollback procedure?", "Switch the load balancer back to the previous blue/green slot"),
        ("Where are the deployment logs?", "GitHub Actions logs, also mirrored to Loki at loki.internal:3100"),
        ("Who approves production deploys?", "Requires approval from Alice or David in the #deploys channel"),
    ]
    for cue, target in day2_memories:
        store_with_augmentation(mem, model, cue, target, timestamp=2.0)
    # ===== Day 3: Monitoring setup =====
    print("--- Day 3: Monitoring ---")
    day3_memories = [
        ("Set up monitoring for the database", "Prometheus scrapes pg_exporter on db.internal:9187, dashboard in Grafana"),
        ("What alerts do we have?", "PagerDuty alerts for: CPU>80%, disk>90%, replication lag>30s"),
        ("Where's the Grafana dashboard?", "grafana.internal/d/postgres-overview, login with SSO"),
    ]
    for cue, target in day3_memories:
        store_with_augmentation(mem, model, cue, target, timestamp=3.0)
    print(f"\nTotal memories: {mem.stats()}")
    # ===== Test: Cross-session recall =====
    print("\n=== Cross-session Recall Tests ===\n")
    tests = [
        # (query, expected_substring, description)
        # Day 1 recall
        ("DB is slow again", "index", "Day 1: DB slow → index"),
        ("How big is the users table?", "million", "Day 1: table size"),
        ("Who can access the database?", "Alice", "Day 1: DB access"),
        ("What Postgres version?", "15.2", "Day 1: PG version"),
        # Day 2 recall
        ("How to deploy the new version?", "blue-green", "Day 2: deploy method"),
        ("How to rollback?", "load balancer", "Day 2: rollback"),
        ("Who approves deploys?", "Alice", "Day 2: deploy approval"),
        # Day 3 recall
        ("Where's the monitoring dashboard?", "grafana", "Day 3: Grafana URL"),
        ("What alerts are configured?", "PagerDuty", "Day 3: alerts"),
        # Cross-day inference
        ("The database is slow, what index is missing?", "created_at", "Cross: DB slow → specific index"),
        ("I need to check deploy logs", "Loki", "Cross: deploy logs → Loki"),
        ("Database monitoring exporter", "pg_exporter", "Cross: DB + monitoring"),
    ]
    correct = 0
    for query, expected, desc in tests:
        found, sim, got = test_recall(mem, model, query, expected)
        status = "✓" if found else "✗"
        if found:
            correct += 1
        print(f"  {status} [{sim:.2f}] {desc}")
        if not found:
            print(f"       Expected '{expected}', got: '{got[:50]}...'")
    n = len(tests)
    print(f"\n  Total: {correct}/{n} ({correct/n:.0%})")
    # ===== Test: Multi-hop across days =====
    print("\n=== Multi-hop Across Days ===\n")
    # Store explicit chains across days
    # Day 1: "DB slow" → "missing index"
    # Day 3: "monitoring DB" → "pg_exporter"
    # Chain: "DB slow" → (hop1) "missing index" → ... can we reach monitoring?
    # Actually, multi-hop needs explicit chain links. Let's store some:
    store_with_augmentation(mem, model,
        "The missing index caused the slow query",
        "Added index and set up monitoring to prevent recurrence",
        timestamp=3.5)
    chain = mem.recall_chain(emb(model, "database is slow"), hops=3)
    print("  Chain from 'database is slow':")
    for r in chain:
        print(f"    hop {r.hop_distance}: {r.metadata.get('target', '?')[:60]}...")
    # ===== Test: Memory conflicts =====
    print("\n=== Memory Update / Conflict ===\n")
    # Store contradicting info
    store_with_augmentation(mem, model,
        "What Postgres version?", "Upgraded to PostgreSQL 16.1 last night",
        timestamp=4.0)
    # Which version does it recall?
    results = mem.recall(emb(model, "What Postgres version are we running?"), top_k=2)
    print("  Query: 'What Postgres version?'")
    for r in results:
        print(f"    [{r.similarity:.2f}] {r.metadata.get('target', '?')}")
    print("  Note: Both old (15.2) and new (16.1) returned — recency sorting needed")
 if __name__ == "__main__":
    main()
--- a/experiments/exp15_longmemeval.py
+++ b/experiments/exp15_longmemeval.py
@@ -0,0 +1,255 @@
 """Experiment: LongMemEval benchmark on HippocampalMemory.
 Protocol:
 1. For each question, load all haystack sessions as conversation history
 2. Extract memories from each session turn (user says X, assistant says Y)
 3. Store in HippocampalMemory with paraphrase augmentation
 4. Query with the question
 5. Check if the recalled memories contain the answer
 This tests our system against a real, published benchmark.
 """
 import sys
 import json
 import time
 from pathlib import Path
 from collections import Counter
 import torch
 import torch.nn as nn
 import numpy as np
 sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
 sys.path.insert(0, str(Path(__file__).parent.parent))
 from nuonuo.hippocampus import HippocampalMemory
 from llm import generate_paraphrases_heuristic
 DEVICE = "cuda"
 def load_model():
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer("all-MiniLM-L6-v2", device=DEVICE)
 def emb(model, text):
    return model.encode([text], convert_to_tensor=True,
                        normalize_embeddings=True, device=DEVICE)[0]
 def emb_batch(model, texts):
    if not texts:
        return []
    embs = model.encode(texts, convert_to_tensor=True,
                        normalize_embeddings=True, device=DEVICE,
                        batch_size=64)
    return [embs[i] for i in range(embs.shape[0])]
 def extract_memories_from_session(session):
    """Extract (cue, target) pairs from a conversation session.
    Strategy: pair consecutive user/assistant turns.
    User message = cue, assistant response = target (truncated to key info).
    """
    memories = []
    for i, turn in enumerate(session):
        if turn["role"] == "user":
            user_text = turn["content"].strip()
            # Find next assistant response
            for j in range(i + 1, len(session)):
                if session[j]["role"] == "assistant":
                    assistant_text = session[j]["content"].strip()
                    # Truncate long responses to first 200 chars
                    if len(assistant_text) > 200:
                        # Try to cut at sentence boundary
                        cut = assistant_text[:200].rfind(". ")
                        if cut > 50:
                            assistant_text = assistant_text[:cut + 1]
                        else:
                            assistant_text = assistant_text[:200]
                    if len(user_text) > 10 and len(assistant_text) > 10:
                        memories.append((user_text, assistant_text))
                    break
        # Also store user's own statements as memories
        # (user reveals personal info that's worth remembering)
        if turn["role"] == "user" and len(turn["content"]) > 20:
            text = turn["content"].strip()
            # First sentence often contains the key info
            first_sent = text.split(". ")[0] if ". " in text else text[:150]
            if len(first_sent) > 20:
                memories.append((first_sent, text[:200]))
    return memories
 def check_answer(recalled_texts, answer, question_type):
    """Check if answer is found in recalled texts.
    For string answers: check substring match (case-insensitive).
    For 'unanswerable' type: check if system correctly returns nothing relevant.
    """
    answer_str = str(answer).lower().strip()
    # Handle unanswerable questions
    if "did not mention" in answer_str or "not mention" in answer_str:
        # System should NOT find a confident match
        return True  # We'll handle this separately
    # Check if answer appears in any recalled text
    for text in recalled_texts:
        text_lower = text.lower()
        if answer_str in text_lower:
            return True
        # Also check key parts of the answer
        answer_words = [w for w in answer_str.split() if len(w) > 3]
        if answer_words:
            matches = sum(1 for w in answer_words if w in text_lower)
            if matches >= len(answer_words) * 0.6:
                return True
    return False
 def run_benchmark(model, oracle, max_questions=None, use_augmentation=True):
    """Run the full benchmark."""
    if max_questions:
        oracle = oracle[:max_questions]
    results_by_type = Counter()
    total_by_type = Counter()
    total_memories = []
    total_time = 0
    for qi, entry in enumerate(oracle):
        qtype = entry["question_type"]
        question = entry["question"]
        answer = entry["answer"]
        sessions = entry["haystack_sessions"]
        total_by_type[qtype] += 1
        # Build memory from sessions
        mem = HippocampalMemory(embed_dim=384)
        all_cue_texts = []
        all_target_texts = []
        for session in sessions:
            pairs = extract_memories_from_session(session)
            for cue, target in pairs:
                all_cue_texts.append(cue)
                all_target_texts.append(target)
        if not all_cue_texts:
            continue
        # Batch embed
        cue_embs = emb_batch(model, all_cue_texts)
        target_embs = emb_batch(model, all_target_texts)
        for i in range(len(all_cue_texts)):
            if use_augmentation:
                paras = generate_paraphrases_heuristic(all_cue_texts[i][:100], n=2)
                para_embs = emb_batch(model, paras) if paras else None
            else:
                para_embs = None
            mem.store(cue_embs[i], target_embs[i],
                      cue_variants=para_embs,
                      metadata={"cue": all_cue_texts[i], "target": all_target_texts[i]})
        total_memories.append(len(mem.memories))
        # Query
        t0 = time.time()
        q_emb = emb(model, question)
        results = mem.recall(q_emb, top_k=5)
        chain = mem.recall_chain(q_emb, hops=2)
        total_time += time.time() - t0
        # Collect recalled texts
        recalled_texts = []
        for r in results:
            recalled_texts.append(r.metadata.get("target", ""))
            recalled_texts.append(r.metadata.get("cue", ""))
        for r in chain:
            recalled_texts.append(r.metadata.get("target", ""))
        # Check
        hit = check_answer(recalled_texts, answer, qtype)
        if hit:
            results_by_type[qtype] += 1
        if qi < 5 or (not hit and qi < 50):
            status = "✓" if hit else "✗"
            print(f"  {status} [{qtype[:12]:>12}] Q: {question[:60]}...")
            print(f"    A: {str(answer)[:60]}...")
            if results:
                print(f"    Got: {results[0].metadata.get('target', '?')[:60]}...")
            if not hit and qi < 50:
                print(f"    (MISS)")
        del mem
        if (qi + 1) % 50 == 0:
            elapsed = total_time
            print(f"  ... {qi+1}/{len(oracle)} done ({elapsed:.1f}s)")
    return results_by_type, total_by_type, total_memories, total_time
 def main():
    print("=" * 60)
    print("LongMemEval Benchmark")
    print("=" * 60)
    model = load_model()
    with open("data/longmemeval_oracle.json") as f:
        oracle = json.load(f)
    print(f"Dataset: {len(oracle)} questions")
    # Quick test on first 50
    print("\n=== Quick Test (first 50 questions) ===\n")
    results, totals, mems, dt = run_benchmark(model, oracle, max_questions=50,
                                               use_augmentation=True)
    print(f"\n--- Results (50 questions) ---")
    overall_correct = sum(results.values())
    overall_total = sum(totals.values())
    print(f"Overall: {overall_correct}/{overall_total} ({overall_correct/overall_total:.0%})")
    for qtype in sorted(totals.keys()):
        c = results.get(qtype, 0)
        t = totals[qtype]
        print(f"  {qtype:<25}: {c}/{t} ({c/t:.0%})")
    print(f"Avg memories per question: {np.mean(mems):.1f}")
    print(f"Total time: {dt:.1f}s ({dt/50*1000:.0f}ms/question)")
    # Full benchmark
    print("\n=== Full Benchmark (500 questions) ===\n")
    results, totals, mems, dt = run_benchmark(model, oracle, use_augmentation=True)
    print(f"\n{'='*60}")
    print("FINAL RESULTS")
    print(f"{'='*60}")
    overall_correct = sum(results.values())
    overall_total = sum(totals.values())
    print(f"Overall: {overall_correct}/{overall_total} ({overall_correct/overall_total:.0%})")
    print()
    for qtype in sorted(totals.keys()):
        c = results.get(qtype, 0)
        t = totals[qtype]
        bar = "█" * int(c/t * 20) + "░" * (20 - int(c/t * 20))
        print(f"  {qtype:<25}: {c:>3}/{t:<3} ({c/t:>5.1%}) {bar}")
    print()
    print(f"Avg memories per question: {np.mean(mems):.1f}")
    print(f"Total time: {dt:.1f}s ({dt/len(oracle)*1000:.0f}ms/question)")
 if __name__ == "__main__":
    main()
--- a/experiments/exp16_longmemeval_gemma.py
+++ b/experiments/exp16_longmemeval_gemma.py
@@ -0,0 +1,321 @@
 """LongMemEval benchmark with Gemma 4 post-retrieval reasoning.
 Previous result: 36% with retrieval-only.
 This version: retrieve top-K memories → Gemma 4 reads them and answers the question.
 Improvements:
 1. No truncation of assistant responses (store full text, chunked)
 2. Store user statements as memories (preference extraction)
 3. Include timestamps in stored memories
 4. Post-retrieval: Gemma 4 synthesizes answer from recalled memories
 """
 import sys
 import json
 import time
 import re
 import requests
 from pathlib import Path
 from collections import Counter
 import torch
 import torch.nn as nn
 import numpy as np
 sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
 from nuonuo.hippocampus import HippocampalMemory
 DEVICE = "cuda"
 OLLAMA_URL = "http://localhost:11434/api/chat"
 def gemma_chat(messages, max_tokens=256, temperature=0):
    """Call Gemma 4 via Ollama."""
    try:
        resp = requests.post(OLLAMA_URL, json={
            "model": "gemma4:31b",
            "messages": messages,
            "stream": False,
            "think": False,
            "options": {"num_predict": max_tokens, "temperature": temperature},
        }, timeout=60)
        return resp.json()["message"]["content"]
    except Exception as e:
        return None
 def load_model():
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer("all-MiniLM-L6-v2", device=DEVICE)
 def emb_batch(model, texts):
    if not texts:
        return []
    embs = model.encode(texts, convert_to_tensor=True,
                        normalize_embeddings=True, device=DEVICE, batch_size=64)
    return [embs[i] for i in range(embs.shape[0])]
 def extract_memories_v2(session, session_date=""):
    """Improved memory extraction.
    Changes from v1:
    - Don't truncate assistant responses; chunk them instead
    - Extract user preferences ("I like/prefer/use/enjoy X")
    - Store timestamps
    """
    memories = []  # list of (cue_text, target_text, extra_metadata)
    for i, turn in enumerate(session):
        content = turn["content"].strip()
        role = turn["role"]
        if role == "user":
            # Find next assistant response
            assistant_text = ""
            for j in range(i + 1, len(session)):
                if session[j]["role"] == "assistant":
                    assistant_text = session[j]["content"].strip()
                    break
            if len(content) > 10 and len(assistant_text) > 10:
                # Chunk long assistant responses (500 char chunks)
                if len(assistant_text) > 500:
                    chunks = []
                    for start in range(0, len(assistant_text), 400):
                        chunk = assistant_text[start:start + 500]
                        if len(chunk) > 50:
                            chunks.append(chunk)
                    for ci, chunk in enumerate(chunks[:5]):  # max 5 chunks
                        memories.append((
                            content,
                            chunk,
                            {"date": session_date, "chunk": ci}
                        ))
                else:
                    memories.append((content, assistant_text, {"date": session_date}))
            # User's own statements as memories
            if len(content) > 20:
                memories.append((content, content, {"date": session_date, "type": "user_statement"}))
            # Preference extraction
            pref_patterns = [
                r"I (?:like|love|prefer|enjoy|use|usually|always|often)\s+(.{5,80})",
                r"my (?:favorite|preferred)\s+(.{5,80})",
                r"I'm (?:a fan of|into|interested in)\s+(.{5,80})",
            ]
            for pat in pref_patterns:
                match = re.search(pat, content, re.IGNORECASE)
                if match:
                    pref_text = match.group(0)
                    memories.append((
                        f"user preference: {pref_text}",
                        content,
                        {"date": session_date, "type": "preference"}
                    ))
    return memories
 def check_answer(answer_text, expected_answer):
    """Check if the generated answer contains the expected answer."""
    if not answer_text:
        return False
    expected = str(expected_answer).lower().strip()
    generated = answer_text.lower().strip()
    # Handle unanswerable
    if "did not mention" in expected or "not mention" in expected:
        neg_phrases = ["don't have", "no information", "not mentioned",
                       "cannot find", "don't recall", "no record", "not available"]
        return any(p in generated for p in neg_phrases)
    # Direct substring match
    if expected in generated:
        return True
    # Key words match (60% of answer words present)
    answer_words = [w for w in expected.split() if len(w) > 3]
    if answer_words:
        matches = sum(1 for w in answer_words if w in generated)
        if matches >= max(1, len(answer_words) * 0.5):
            return True
    # Number matching (for temporal questions)
    expected_nums = re.findall(r'\d+', expected)
    generated_nums = re.findall(r'\d+', generated)
    if expected_nums and any(n in generated_nums for n in expected_nums):
        return True
    return False
 def run_benchmark(model, oracle, max_questions=None):
    """Full benchmark with Gemma 4 reasoning."""
    if max_questions:
        oracle = oracle[:max_questions]
    results_by_type = Counter()
    total_by_type = Counter()
    retrieval_hits = Counter()
    gemma_calls = 0
    gemma_errors = 0
    total_time = 0
    for qi, entry in enumerate(oracle):
        qtype = entry["question_type"]
        question = entry["question"]
        answer = entry["answer"]
        sessions = entry["haystack_sessions"]
        dates = entry.get("haystack_dates", [""] * len(sessions))
        total_by_type[qtype] += 1
        # Build memory
        mem = HippocampalMemory(embed_dim=384)
        all_texts = []  # (cue, target, meta)
        for si, session in enumerate(sessions):
            date = dates[si] if si < len(dates) else ""
            pairs = extract_memories_v2(session, session_date=date)
            all_texts.extend(pairs)
        if not all_texts:
            continue
        cue_texts = [t[0] for t in all_texts]
        target_texts = [t[1] for t in all_texts]
        metas = [t[2] for t in all_texts]
        cue_embs = emb_batch(model, cue_texts)
        target_embs = emb_batch(model, target_texts)
        for i in range(len(cue_texts)):
            mem.store(cue_embs[i], target_embs[i],
                      metadata={"cue": cue_texts[i][:200],
                                "target": target_texts[i][:500],
                                **metas[i]})
        # Recall
        t0 = time.time()
        q_emb = model.encode([question], convert_to_tensor=True,
                              normalize_embeddings=True, device=DEVICE)[0]
        results = mem.recall(q_emb, top_k=10)
        # Check if retrieval alone has the answer
        retrieved_texts = []
        for r in results:
            retrieved_texts.append(r.metadata.get("target", ""))
            retrieved_texts.append(r.metadata.get("cue", ""))
        retrieval_hit = check_answer("\n".join(retrieved_texts), answer)
        if retrieval_hit:
            retrieval_hits[qtype] += 1
        # Build context for Gemma
        context_parts = []
        for r in results[:7]:  # top 7 memories
            date = r.metadata.get("date", "")
            target = r.metadata.get("target", "")
            if date:
                context_parts.append(f"[{date}] {target}")
            else:
                context_parts.append(target)
        context = "\n".join(context_parts)
        # Ask Gemma to answer based on recalled memories
        prompt = f"""You are answering a question based on recalled memories from past conversations with the user. The memories may contain dates and timestamps — use them for temporal reasoning.
 Memories:
 {context}
 Question: {question}
 Instructions:
 - Answer based ONLY on the memories above
 - For "which came first" questions, compare the dates in the memories
 - For "how many days" questions, calculate from the dates mentioned
 - For counting questions, count all relevant items across memories
 - Extract specific facts (names, numbers, places) directly from the memories
 - Be concise (1-2 sentences)
 - If genuinely no relevant information exists, say "Not mentioned in our conversations"
 Answer:"""
        gemma_answer = gemma_chat([{"role": "user", "content": prompt}],
                                   max_tokens=100)
        gemma_calls += 1
        if gemma_answer is None:
            gemma_errors += 1
            gemma_answer = "\n".join(retrieved_texts[:3])  # fallback
        total_time += time.time() - t0
        hit = check_answer(gemma_answer, answer)
        if hit:
            results_by_type[qtype] += 1
        if qi < 3 or (not hit and qi < 20):
            status = "✓" if hit else "✗"
            ret_status = "R✓" if retrieval_hit else "R✗"
            print(f"  {status} {ret_status} [{qtype[:12]:>12}] Q: {question[:55]}...")
            print(f"    Expected: {str(answer)[:60]}...")
            if gemma_answer:
                print(f"    Gemma: {gemma_answer[:80]}...")
        del mem
        if (qi + 1) % 25 == 0:
            print(f"  ... {qi+1}/{len(oracle)} ({total_time:.0f}s, "
                  f"{gemma_errors} Gemma errors)")
    return results_by_type, total_by_type, retrieval_hits, total_time
 def main():
    print("=" * 60)
    print("LongMemEval + Gemma 4 Post-Retrieval Reasoning")
    print("=" * 60)
    # Verify Gemma
    test = gemma_chat([{"role": "user", "content": "Say OK"}], max_tokens=10)
    if test:
        print(f"Gemma 4 connected: '{test.strip()}'")
    else:
        print("ERROR: Gemma 4 not available!")
        return
    model = load_model()
    with open("data/longmemeval_oracle.json") as f:
        oracle = json.load(f)
    # Full benchmark directly
    if True:
        print(f"\n=== Full Benchmark (500 questions) ===\n")
        results, totals, ret_hits, dt = run_benchmark(model, oracle)
        print(f"\n{'='*60}")
        print("FINAL RESULTS (Retrieval + Gemma 4 Reasoning)")
        print(f"{'='*60}")
        overall = sum(results.values())
        total = sum(totals.values())
        ret_overall = sum(ret_hits.values())
        print(f"Overall: {overall}/{total} ({overall/total:.0%}) "
              f"[retrieval-only baseline: {ret_overall}/{total} ({ret_overall/total:.0%})]")
        print()
        for qtype in sorted(totals.keys()):
            c = results.get(qtype, 0)
            rc = ret_hits.get(qtype, 0)
            t = totals[qtype]
            bar = "█" * int(c/t * 20) + "░" * (20 - int(c/t * 20))
            print(f"  {qtype:<25}: {c:>3}/{t:<3} ({c/t:>5.1%}) {bar} [ret: {rc/t:.0%}]")
        print(f"\nTime: {dt:.0f}s ({dt/total:.1f}s/question)")
 if __name__ == "__main__":
    main()
--- a/llm.py
+++ b/llm.py
@@ -0,0 +1,203 @@
 """LLM integration for hippocampal memory.
 Functions:
 1. extract_memories: Extract (cue, target) pairs from conversation turns
 2. generate_paraphrases: Generate cue variants for augmentation
 3. recall_and_inject: Recall memories and format for context injection
 4. format_recalled_memories: Format RecallResults into prompt text
 Supports any OpenAI-compatible API. Falls back to simple heuristics when LLM unavailable.
 """
 import re
 from typing import Optional
 from dataclasses import dataclass
 from openai import OpenAI
@dataclass
 class ExtractedMemory:
    cue: str
    target: str
    importance: float = 0.5  # 0-1, higher = more worth storing
 class LLMClient:
    """Wrapper around OpenAI-compatible API with fallback."""
    def __init__(self, base_url: str = "https://ste-jarvis.tiktok-row.net/llm/v1",
                 api_key: str = "unused",
                 model: str = "gemma4:12b",
                 timeout: float = 5.0):
        self.model = model
        self.available = False
        try:
            self.client = OpenAI(base_url=base_url, api_key=api_key, timeout=timeout)
            # Quick check
            self.client.models.list()
            self.available = True
        except Exception:
            self.client = None
    def chat(self, messages: list[dict], temperature: float = 0.7,
             max_tokens: int = 512) -> Optional[str]:
        if not self.available:
            return None
        try:
            resp = self.client.chat.completions.create(
                model=self.model,
                messages=messages,
                temperature=temperature,
                max_tokens=max_tokens,
            )
            return resp.choices[0].message.content
        except Exception:
            return None
 def extract_memories_llm(client: LLMClient, user_msg: str,
                         assistant_msg: str) -> list[ExtractedMemory]:
    """Use LLM to extract memorable facts from a conversation turn."""
    prompt = f"""From this conversation turn, extract key facts worth remembering for future conversations.
 For each fact, provide a "cue" (what would trigger recalling this) and a "target" (the fact itself).
 Rate importance 0-1 (1 = critical fact, 0 = trivial).
 User: {user_msg}
 Assistant: {assistant_msg}
 Output format (one per line):
 CUE: <trigger phrase> | TARGET: <fact> | IMPORTANCE: <0-1>
 Only extract genuinely useful facts. If nothing worth remembering, output NONE."""
    result = client.chat([{"role": "user", "content": prompt}], temperature=0.3)
    if not result:
        return extract_memories_heuristic(user_msg, assistant_msg)
    memories = []
    for line in result.strip().split("\n"):
        if line.strip() == "NONE":
            break
        match = re.match(r"CUE:\s*(.+?)\s*\|\s*TARGET:\s*(.+?)\s*\|\s*IMPORTANCE:\s*([\d.]+)", line)
        if match:
            memories.append(ExtractedMemory(
                cue=match.group(1).strip(),
                target=match.group(2).strip(),
                importance=float(match.group(3)),
            ))
    return memories
 def extract_memories_heuristic(user_msg: str, assistant_msg: str) -> list[ExtractedMemory]:
    """Fallback: simple heuristic extraction when LLM unavailable.
    Rules:
    - User questions → store the answer
    - Technical statements → store as-is
    - Short messages (< 10 words) → skip
    """
    memories = []
    # User asked a question, assistant answered
    if "?" in user_msg and len(assistant_msg.split()) > 5:
        memories.append(ExtractedMemory(
            cue=user_msg.rstrip("?").strip(),
            target=assistant_msg[:200],
            importance=0.6,
        ))
    # Technical keywords suggest something worth remembering
    tech_keywords = ["deploy", "config", "bug", "fix", "error", "database",
                     "server", "API", "port", "token", "password", "version",
                     "install", "upgrade", "migrate", "backup"]
    combined = (user_msg + " " + assistant_msg).lower()
    if any(kw in combined for kw in tech_keywords):
        if len(user_msg.split()) >= 5:
            memories.append(ExtractedMemory(
                cue=user_msg[:100],
                target=assistant_msg[:200],
                importance=0.5,
            ))
    return memories
 def generate_paraphrases_llm(client: LLMClient, text: str,
                              n: int = 3) -> list[str]:
    """Use LLM to generate paraphrases of a cue text."""
    prompt = f"""Generate {n} different paraphrases of this text. Each should convey the same meaning but use different words/phrasing. One per line, no numbering.
 Text: {text}"""
    result = client.chat([{"role": "user", "content": prompt}],
                         temperature=0.8, max_tokens=256)
    if not result:
        return generate_paraphrases_heuristic(text, n)
    paraphrases = [line.strip() for line in result.strip().split("\n")
                   if line.strip() and len(line.strip()) > 3]
    return paraphrases[:n]
 def generate_paraphrases_heuristic(text: str, n: int = 3) -> list[str]:
    """Fallback: simple text augmentation when LLM unavailable.
    Strategies:
    - Remove/add common prefixes
    - Swap known synonyms
    - Truncate to key phrases
    """
    variants = []
    text_lower = text.lower().strip()
    # Remove common prefixes
    prefixes = ["can you ", "please ", "i need to ", "let's ", "we should ",
                "how do i ", "how to ", "i want to ", "help me "]
    for pfx in prefixes:
        if text_lower.startswith(pfx):
            stripped = text[len(pfx):].strip()
            if stripped and stripped not in variants:
                variants.append(stripped)
    # Simple synonym swaps
    swaps = {
        "slow": "performance issues", "fast": "quick", "fix": "resolve",
        "deploy": "release", "error": "issue", "bug": "problem",
        "database": "DB", "server": "machine", "configure": "set up",
    }
    for old, new in swaps.items():
        if old in text_lower:
            variant = text.replace(old, new).replace(old.capitalize(), new.capitalize())
            if variant != text and variant not in variants:
                variants.append(variant)
    # Add "the X is Y" pattern
    if len(text.split()) <= 8:
        variants.append(f"issue with {text_lower}")
    return variants[:n]
 def format_recalled_memories(results: list, max_memories: int = 5) -> str:
    """Format RecallResults into a prompt-ready string."""
    if not results:
        return ""
    lines = []
    for i, r in enumerate(results[:max_memories]):
        meta = r.metadata
        if "target" in meta:
            text = meta["target"]
        elif "text" in meta:
            text = meta["text"]
        else:
            continue
        hop_info = f" (via {r.hop_distance}-hop association)" if r.hop_distance > 1 else ""
        lines.append(f"- {text}{hop_info}")
    if not lines:
        return ""
    return "Recalled from memory:\n" + "\n".join(lines)
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -0,0 +1,25 @@
 [project]
 name = "nuonuo"
 version = "0.1.0"
 description = "SNN-based hippocampal memory module for LLMs"
 requires-python = ">=3.12"
 dependencies = [
    "torch>=2.10,<2.11",
    "snntorch>=0.9",
    "numpy",
    "matplotlib",
    "sentence-transformers>=3.0",
    "openai>=1.0",
    "requests>=2.33.1",
 ]
 [tool.uv]
 index-url = "https://pypi.org/simple"
 [[tool.uv.index]]
 name = "pytorch-cu128"
 url = "https://download.pytorch.org/whl/cu128"
 explicit = true
 [tool.uv.sources]
 torch = { index = "pytorch-cu128" }
--- a/src/nuonuo/init.py
+++ b/src/nuonuo/init.py
--- a/src/nuonuo/consolidation.py
+++ b/src/nuonuo/consolidation.py
@@ -0,0 +1,125 @@
 """Sleep consolidation module.
 Simulates hippocampal memory consolidation during sleep:
 1. Memory replay: re-present stored memories to refresh associations
 2. Synaptic homeostasis: global weight scaling to prevent saturation
 3. Pruning: remove weak connections
 4. Interference reduction: replay with noise for generalization
 Based on:
 - Synaptic Homeostasis Hypothesis (Tononi & Cirelli)
 - Two-stage memory consolidation (hippocampus → neocortex)
 - Sharp-wave ripple replay during NREM sleep
 """
 import torch
 import torch.nn as nn
 import numpy as np
 from typing import Optional
 def winner_take_all(x, k):
    _, idx = x.topk(k, dim=-1)
    out = torch.zeros_like(x)
    out.scatter_(-1, idx, 1.0)
    return out
 class MemoryConsolidator:
    """Performs nightly consolidation on a Hebbian memory network."""
    def __init__(self, code_dim=16384, k_active=20):
        self.code_dim = code_dim
        self.k_active = k_active
        self.replay_buffer = []  # List of (cue_code, target_code) pairs
    def record(self, cue_code, target_code):
        """Record a memory interaction for tonight's consolidation."""
        self.replay_buffer.append((
            cue_code.detach().cpu().clone(),
            target_code.detach().cpu().clone()
        ))
    def consolidate(self, W, proj, target_proj,
                    num_epochs=5,
                    replay_noise=0.0,
                    homeostasis_factor=0.95,
                    prune_threshold=0.001,
                    replay_fraction=1.0,
                    interleave_old=True):
        """Run one night's consolidation.
        Args:
            W: weight matrix [code_dim, code_dim] (modified in-place)
            proj, target_proj: separation projections (for verification only)
            num_epochs: number of replay passes
            replay_noise: noise added during replay (for generalization)
            homeostasis_factor: global weight scaling per epoch (< 1 for decay)
            prune_threshold: remove weights below this absolute value
            replay_fraction: fraction of buffer to replay (for testing partial replay)
            interleave_old: if True, replay old+new interleaved (vs new-only)
        Returns:
            stats dict with consolidation metrics
        """
        device = W.device
        initial_w_norm = W.norm().item()
        initial_sparsity = (W.abs() < prune_threshold).float().mean().item()
        n_memories = len(self.replay_buffer)
        if n_memories == 0:
            return {"status": "empty_buffer"}
        # Select memories to replay
        n_replay = max(1, int(n_memories * replay_fraction))
        replay_indices = torch.randperm(n_memories)[:n_replay].tolist()
        for epoch in range(num_epochs):
            # Shuffle replay order each epoch
            np.random.shuffle(replay_indices)
            for idx in replay_indices:
                cue_code, target_code = self.replay_buffer[idx]
                cue_code = cue_code.to(device)
                target_code = target_code.to(device)
                if replay_noise > 0:
                    # Add noise and re-apply WTA for robustness
                    cue_code = cue_code + torch.randn_like(cue_code) * replay_noise
                    cue_code = winner_take_all(cue_code, self.k_active)
                # Hebbian refresh
                W.data += torch.outer(target_code, cue_code)
            # Synaptic homeostasis: scale down all weights
            W.data *= homeostasis_factor
        # Pruning
        mask = W.data.abs() >= prune_threshold
        W.data *= mask.float()
        final_w_norm = W.norm().item()
        final_sparsity = (W.abs() < prune_threshold).float().mean().item()
        stats = {
            "n_memories_replayed": n_replay,
            "num_epochs": num_epochs,
            "initial_w_norm": initial_w_norm,
            "final_w_norm": final_w_norm,
            "initial_sparsity": initial_sparsity,
            "final_sparsity": final_sparsity,
            "replay_noise": replay_noise,
            "homeostasis_factor": homeostasis_factor,
        }
        return stats
    def clear_buffer(self):
        """Clear replay buffer after consolidation."""
        self.replay_buffer.clear()
    def selective_clear(self, keep_fraction=0.3):
        """Keep some memories for next consolidation (simulates multi-night replay)."""
        n_keep = max(1, int(len(self.replay_buffer) * keep_fraction))
        # Keep the most recent ones
        self.replay_buffer = self.replay_buffer[-n_keep:]
--- a/src/nuonuo/encoder.py
+++ b/src/nuonuo/encoder.py
@@ -0,0 +1,117 @@
 """Encoder/Decoder bridge: continuous embedding <-> spike train.
 This is the foundation of the whole approach. If embedding -> spike -> embedding
 roundtrip loses too much information, nothing else matters.
 """
 import torch
 import torch.nn as nn
 import snntorch as snn
 class EmbeddingToSpike(nn.Module):
    """Convert continuous embeddings to spike trains using learned temporal coding.
    Instead of naive rate coding, we project the embedding into initial membrane
    potentials and let LIF dynamics produce a spike train. The temporal pattern
    of spikes encodes the semantic information.
    """
    def __init__(self, embed_dim=768, num_neurons=2048, num_steps=64,
                 beta=0.85, threshold=1.0):
        super().__init__()
        self.num_steps = num_steps
        self.num_neurons = num_neurons
        self.embed_dim = embed_dim
        # Project embedding to initial membrane potential
        self.proj = nn.Linear(embed_dim, num_neurons)
        # LIF neuron layer with learnable decay
        self.beta = beta
        self.threshold = threshold
        self.lif = snn.Leaky(
            beta=beta,
            threshold=threshold,
            learn_beta=True,
            learn_threshold=True,
        )
    def forward(self, embedding):
        """
        Args:
            embedding: [batch, embed_dim]
        Returns:
            spike_train: [batch, num_steps, num_neurons]
            mem_trace: [batch, num_steps, num_neurons] (for analysis)
        """
        init_potential = self.proj(embedding)  # [batch, num_neurons]
        spikes = []
        mems = []
        mem = init_potential  # Initialize membrane with projected embedding
        for _ in range(self.num_steps):
            spk, mem = self.lif(torch.zeros_like(init_potential), mem)
            spikes.append(spk)
            mems.append(mem)
        spike_train = torch.stack(spikes, dim=1)  # [batch, steps, neurons]
        mem_trace = torch.stack(mems, dim=1)
        return spike_train, mem_trace
 class SpikeToEmbedding(nn.Module):
    """Decode spike trains back to continuous embeddings.
    Uses multi-scale temporal features (firing rates at different time windows)
    plus a learned readout network.
    """
    def __init__(self, num_neurons=2048, embed_dim=768,
                 time_windows=(8, 16, 32, 64)):
        super().__init__()
        self.time_windows = time_windows
        feat_dim = num_neurons * len(time_windows)
        self.readout = nn.Sequential(
            nn.Linear(feat_dim, embed_dim * 2),
            nn.GELU(),
            nn.LayerNorm(embed_dim * 2),
            nn.Linear(embed_dim * 2, embed_dim),
        )
    def forward(self, spike_train):
        """
        Args:
            spike_train: [batch, num_steps, num_neurons]
        Returns:
            embedding: [batch, embed_dim]
        """
        features = []
        for w in self.time_windows:
            if spike_train.shape[1] >= w:
                windowed = spike_train[:, -w:, :].mean(dim=1)
            else:
                windowed = spike_train.mean(dim=1)
            features.append(windowed)
        feat = torch.cat(features, dim=-1)  # [batch, num_neurons * num_windows]
        return self.readout(feat)
 class SpikeAutoencoder(nn.Module):
    """End-to-end encoder-decoder for roundtrip testing."""
    def __init__(self, embed_dim=768, num_neurons=2048, num_steps=64):
        super().__init__()
        self.encoder = EmbeddingToSpike(embed_dim, num_neurons, num_steps)
        self.decoder = SpikeToEmbedding(
            num_neurons, embed_dim,
            time_windows=tuple(2**i for i in range(3, int(num_steps).bit_length()))
        )
    def forward(self, embedding):
        spike_train, mem_trace = self.encoder(embedding)
        reconstructed = self.decoder(spike_train)
        return reconstructed, spike_train, mem_trace
--- a/src/nuonuo/hippocampus.py
+++ b/src/nuonuo/hippocampus.py
@@ -0,0 +1,344 @@
 """Hippocampal Memory Module v2 — Hopfield + Hebbian Hybrid.
 Architecture based on overnight experiments (2026-04-06):
 1. **Hopfield Layer** (single-hop, noise-tolerant):
   - Stores (cue_embedding, target_embedding) pairs explicitly
   - Retrieval: two-stage (NN pre-filter → Hopfield softmax attention settle)
   - Supports cue augmentation (paraphrases) for better coverage
   - Proven: 95% paraphrase recall with augmentation, 80% at 20K scale
 2. **Hebbian Layer** (multi-hop, associative chains):
   - WTA pattern separation + outer-product weight matrix
   - Retrieval: starts from Hopfield result, chains through W
   - Proven: 6-hop perfect recall, even with 500 bg memories
 Usage:
    memory = HippocampalMemory(embed_dim=384)
    # Store with paraphrase augmentation
    memory.store(cue_emb, target_emb, cue_variants=[para1_emb, para2_emb],
                 metadata={"text": "..."})
    # Single-hop recall (Hopfield, noise-tolerant)
    results = memory.recall(query_emb, top_k=3)
    # Multi-hop recall (Hopfield → Hebbian chain)
    chain = memory.recall_chain(query_emb, hops=3)
 """
 import torch
 import torch.nn as nn
 from dataclasses import dataclass, field
 from typing import Optional
 def winner_take_all(x: torch.Tensor, k: int) -> torch.Tensor:
    _, idx = x.topk(k, dim=-1)
    out = torch.zeros_like(x)
    out.scatter_(-1, idx, 1.0)
    return out
@dataclass
 class MemoryEntry:
    memory_id: int
    cue_embedding: torch.Tensor
    target_embedding: torch.Tensor
    metadata: dict = field(default_factory=dict)
    timestamp: float = 0.0
    access_count: int = 0
@dataclass
 class RecallResult:
    target_embedding: torch.Tensor
    similarity: float
    metadata: dict
    memory_id: int
    hop_distance: int = 1
 class HippocampalMemory(nn.Module):
    def __init__(self, embed_dim: int = 384, code_dim: int = 16384,
                 k: int = 50, beta: float = 16.0, hopfield_top_k: int = 20,
                 hopfield_steps: int = 3, device: str = "cuda"):
        super().__init__()
        self.embed_dim = embed_dim
        self.code_dim = code_dim
        self.k = k
        self.beta = beta
        self.hopfield_top_k = hopfield_top_k
        self.hopfield_steps = hopfield_steps
        self.device = device
        # Hebbian: WTA projection + association matrix
        proj = torch.randn(embed_dim, code_dim, device=device) * (1.0 / embed_dim**0.5)
        self.register_buffer('proj', proj)
        self.W = nn.Parameter(torch.zeros(code_dim, code_dim, device=device),
                              requires_grad=False)
        # Hopfield: explicit pattern store
        # Multiple cue entries can map to the same memory (augmentation)
        self._cue_embs: list[torch.Tensor] = []
        self._target_embs: list[torch.Tensor] = []
        self._memory_ids: list[int] = []
        # Canonical memories (one per memory_id)
        self.memories: dict[int, MemoryEntry] = {}
        self._next_id = 0
        # Cache
        self._cue_matrix: Optional[torch.Tensor] = None
    def _invalidate_cache(self):
        self._cue_matrix = None
    def _get_cue_matrix(self):
        if self._cue_matrix is None and self._cue_embs:
            self._cue_matrix = torch.stack(self._cue_embs)
        return self._cue_matrix
    def separate(self, embedding: torch.Tensor) -> torch.Tensor:
        return winner_take_all(embedding @ self.proj, self.k)
    def store(self, cue_embedding: torch.Tensor, target_embedding: torch.Tensor,
              cue_variants: Optional[list[torch.Tensor]] = None,
              metadata: Optional[dict] = None, timestamp: float = 0.0) -> int:
        """Store a memory with optional cue variants (paraphrases).
        Returns: memory_id
        """
        mid = self._next_id
        self._next_id += 1
        if metadata is None:
            metadata = {}
        # Canonical entry
        self.memories[mid] = MemoryEntry(
            memory_id=mid,
            cue_embedding=cue_embedding.detach().clone(),
            target_embedding=target_embedding.detach().clone(),
            metadata=metadata,
            timestamp=timestamp,
        )
        # Hopfield store: primary cue + variants
        all_cues = [cue_embedding]
        if cue_variants:
            all_cues.extend(cue_variants)
        for ce in all_cues:
            self._cue_embs.append(ce.detach().clone())
            self._target_embs.append(target_embedding.detach().clone())
            self._memory_ids.append(mid)
        # Hebbian: update W with primary cue only
        cue_code = self.separate(cue_embedding)
        target_code = self.separate(target_embedding)
        self.W.data += torch.outer(target_code, cue_code)
        self._invalidate_cache()
        return mid
    def recall(self, query_embedding: torch.Tensor,
               top_k: int = 3) -> list[RecallResult]:
        """Single-hop recall via two-stage Hopfield.
        Stage 1: NN pre-filter to top-K candidates
        Stage 2: Hopfield softmax attention settle
        """
        cue_mat = self._get_cue_matrix()
        if cue_mat is None:
            return []
        target_mat = torch.stack(self._target_embs)
        N = cue_mat.shape[0]
        K = min(self.hopfield_top_k, N)
        # Stage 1: NN pre-filter
        sims = query_embedding @ cue_mat.T
        top_sims, top_indices = sims.topk(K)
        cand_cues = cue_mat[top_indices]
        cand_targets = target_mat[top_indices]
        cand_mids = [self._memory_ids[i] for i in top_indices.tolist()]
        # Stage 2: Hopfield settle
        xi = query_embedding
        for _ in range(self.hopfield_steps):
            scores = self.beta * (xi @ cand_cues.T)
            attn = torch.softmax(scores, dim=0)
            xi = attn @ cand_cues
            xi = nn.functional.normalize(xi, dim=0)
        # Final: get target via attention
        scores = self.beta * (xi @ cand_cues.T)
        attn = torch.softmax(scores, dim=0)
        # Aggregate attention by memory_id (multiple cue variants → same memory)
        mid_scores: dict[int, float] = {}
        for i, mid in enumerate(cand_mids):
            mid_scores[mid] = mid_scores.get(mid, 0) + attn[i].item()
        # Sort by aggregated attention, return top_k
        sorted_mids = sorted(mid_scores, key=mid_scores.get, reverse=True)[:top_k]
        results = []
        for mid in sorted_mids:
            entry = self.memories[mid]
            entry.access_count += 1
            results.append(RecallResult(
                target_embedding=entry.target_embedding,
                similarity=mid_scores[mid],
                metadata=entry.metadata,
                memory_id=mid,
                hop_distance=1,
            ))
        return results
    def recall_chain(self, query_embedding: torch.Tensor,
                     hops: int = 2) -> list[RecallResult]:
        """Multi-hop: Hopfield single-hop → Hebbian chain.
        Hop 0: Hopfield recall (noise-tolerant start)
        Hop 1+: Hebbian W matrix chaining (exact, multi-hop)
        """
        # Hop 0: Hopfield to get clean starting point
        hop0 = self.recall(query_embedding, top_k=1)
        if not hop0:
            return []
        results = list(hop0)
        # Get the cue code for the matched memory
        start_entry = self.memories[hop0[0].memory_id]
        code = self.separate(start_entry.cue_embedding)
        # Hop 1+: Hebbian chaining
        for hop in range(1, hops):
            raw = self.W @ code
            code = winner_take_all(raw, self.k)
            # Find best matching memory
            match = self._match_code(code)
            if match:
                match.hop_distance = hop + 1
                results.append(match)
            else:
                break
        return results
    def _match_code(self, code: torch.Tensor) -> Optional[RecallResult]:
        """Find the memory whose target code best matches."""
        best_sim = -1
        best_mid = None
        for mid, entry in self.memories.items():
            tc = self.separate(entry.target_embedding)
            sim = nn.functional.cosine_similarity(
                code.unsqueeze(0), tc.unsqueeze(0)).item()
            if sim > best_sim:
                best_sim = sim
                best_mid = mid
        if best_mid is None:
            return None
        entry = self.memories[best_mid]
        return RecallResult(
            target_embedding=entry.target_embedding,
            similarity=best_sim,
            metadata=entry.metadata,
            memory_id=best_mid,
        )
    def rebuild_weights(self):
        """Rebuild Hebbian W from canonical memories."""
        self.W.data.zero_()
        for entry in self.memories.values():
            cue_code = self.separate(entry.cue_embedding)
            target_code = self.separate(entry.target_embedding)
            self.W.data += torch.outer(target_code, cue_code)
    def forget(self, memory_id: int):
        """Remove a memory and all its cue variants."""
        if memory_id not in self.memories:
            return
        del self.memories[memory_id]
        # Remove from Hopfield store
        indices_to_remove = [i for i, mid in enumerate(self._memory_ids) if mid == memory_id]
        for i in sorted(indices_to_remove, reverse=True):
            self._cue_embs.pop(i)
            self._target_embs.pop(i)
            self._memory_ids.pop(i)
        self._invalidate_cache()
        self.rebuild_weights()
    def save(self, path: str):
        state = {
            'proj': self.proj,
            'W': self.W.data,
            'config': {
                'embed_dim': self.embed_dim,
                'code_dim': self.code_dim,
                'k': self.k,
                'beta': self.beta,
                'hopfield_top_k': self.hopfield_top_k,
                'hopfield_steps': self.hopfield_steps,
            },
            'hopfield': {
                'cue_embs': self._cue_embs,
                'target_embs': self._target_embs,
                'memory_ids': self._memory_ids,
            },
            'memories': {
                mid: (e.cue_embedding, e.target_embedding, e.metadata, e.timestamp)
                for mid, e in self.memories.items()
            },
            'next_id': self._next_id,
        }
        torch.save(state, path)
    @classmethod
    def load(cls, path: str, device: str = "cuda") -> 'HippocampalMemory':
        state = torch.load(path, map_location=device, weights_only=False)
        cfg = state['config']
        mem = cls(cfg['embed_dim'], cfg['code_dim'], cfg['k'],
                  cfg.get('beta', 16.0), cfg.get('hopfield_top_k', 20),
                  cfg.get('hopfield_steps', 3), device)
        mem.proj.copy_(state['proj'])
        mem.W.data.copy_(state['W'])
        h = state['hopfield']
        mem._cue_embs = [e.to(device) for e in h['cue_embs']]
        mem._target_embs = [e.to(device) for e in h['target_embs']]
        mem._memory_ids = h['memory_ids']
        for mid, (cue, target, metadata, ts) in state['memories'].items():
            mem.memories[mid] = MemoryEntry(
                memory_id=mid,
                cue_embedding=cue.to(device),
                target_embedding=target.to(device),
                metadata=metadata,
                timestamp=ts,
            )
        mem._next_id = state['next_id']
        return mem
    def stats(self) -> dict:
        n_cue_entries = len(self._cue_embs)
        n_memories = len(self.memories)
        return {
            'num_memories': n_memories,
            'num_cue_entries': n_cue_entries,
            'augmentation_ratio': n_cue_entries / max(n_memories, 1),
            'w_norm': self.W.data.norm().item(),
            'embedding_store_mb': n_cue_entries * self.embed_dim * 4 * 2 / 1024**2,
            'w_size_mb': self.code_dim ** 2 * 4 / 1024**2,
        }
--- a/src/nuonuo/memory.py
+++ b/src/nuonuo/memory.py
@@ -0,0 +1,182 @@
 """STDP-based associative memory network, v2.
 Key fix from v1: During learning, directly use cue→target spike pairs for STDP,
 not relying on recurrent dynamics to generate post-spikes (which can't work
 when weights are initially zero — chicken-and-egg problem).
 Architecture:
 - Heteroassociative: cue pattern → recall target pattern
 - STDP directly on (cue[t], target[t]) pairs during learning
 - During recall: cue drives the network, recurrent dynamics + learned weights
  produce output that should resemble the target
 """
 import torch
 import torch.nn as nn
 import snntorch as snn
 class STDPMemoryNetwork(nn.Module):
    """Associative memory using direct STDP on spike pattern pairs.
    v2 changes:
    - Learning uses teacher forcing: cue=pre, target=post, direct STDP
    - W initialized with small random values for recall bootstrapping
    - Added recurrent connections (W_rec) for pattern completion during recall
    - Separate forward weights (cue→memory) and recurrent weights
    """
    def __init__(self, num_neurons=2048, beta=0.85,
                 tau_plus=20.0, tau_minus=20.0,
                 a_plus=0.01, a_minus=0.012,
                 w_max=1.0, w_init_std=0.01):
        super().__init__()
        self.num_neurons = num_neurons
        self.tau_plus = tau_plus
        self.tau_minus = tau_minus
        self.a_plus = a_plus
        self.a_minus = a_minus
        self.w_max = w_max
        # Forward association weights (cue → target)
        self.W = nn.Parameter(
            torch.randn(num_neurons, num_neurons) * w_init_std,
            requires_grad=False
        )
        # LIF neuron for recall dynamics
        self.lif = snn.Leaky(beta=beta, threshold=1.0)
        # STDP traces
        self.register_buffer('pre_trace', torch.zeros(num_neurons))
        self.register_buffer('post_trace', torch.zeros(num_neurons))
    def reset_traces(self):
        self.pre_trace.zero_()
        self.post_trace.zero_()
    def stdp_update(self, pre_spikes, post_spikes):
        """Single-step STDP update using trace-based rule."""
        self.pre_trace = self.pre_trace * (1 - 1/self.tau_plus) + pre_spikes
        self.post_trace = self.post_trace * (1 - 1/self.tau_minus) + post_spikes
        # LTP: post fires → strengthen from recent pre
        dW = self.a_plus * torch.outer(post_spikes, self.pre_trace)
        # LTD: pre fires → weaken to recent post
        dW -= self.a_minus * torch.outer(self.post_trace, pre_spikes)
        self.W.data += dW
        self.W.data.clamp_(-self.w_max, self.w_max)
    def learn_association(self, cue_spikes, target_spikes, num_presentations=1):
        """Teacher-forced STDP learning.
        Directly pair cue (pre) and target (post) spikes for STDP.
        No need for the network to generate its own spikes during learning.
        cue_spikes: [num_steps, num_neurons]
        target_spikes: [num_steps, num_neurons]
        """
        num_steps = cue_spikes.shape[0]
        for _ in range(num_presentations):
            self.reset_traces()
            for t in range(num_steps):
                self.stdp_update(cue_spikes[t], target_spikes[t])
    def recall(self, cue_spikes, num_recall_steps=None):
        """Recall associated pattern given a cue.
        Phase 1: Drive network with cue through learned weights
        Phase 2: Collect output spike pattern
        cue_spikes: [num_steps, num_neurons]
        Returns: [num_steps, num_neurons] recalled spike pattern
        """
        num_steps = cue_spikes.shape[0]
        if num_recall_steps is None:
            num_recall_steps = num_steps
        recalled = []
        mem = torch.zeros(self.num_neurons, device=cue_spikes.device)
        for t in range(min(num_steps, num_recall_steps)):
            # Drive through association weights
            input_current = cue_spikes[t] @ self.W.T
            spk, mem = self.lif(input_current, mem)
            recalled.append(spk)
        # If more steps needed, free-run
        for t in range(max(0, num_recall_steps - num_steps)):
            input_current = spk @ self.W.T
            spk, mem = self.lif(input_current, mem)
            recalled.append(spk)
        return torch.stack(recalled, dim=0)
    def get_weight_stats(self):
        W = self.W.data
        return {
            "mean": W.mean().item(),
            "std": W.std().item(),
            "abs_mean": W.abs().mean().item(),
            "sparsity": (W.abs() < 0.001).float().mean().item(),
            "max": W.max().item(),
            "min": W.min().item(),
        }
 class DirectAssociativeMemory(nn.Module):
    """Simplified approach: direct Hebbian outer-product learning.
    Instead of trace-based STDP, use a simpler Hebbian rule:
    W += lr * (target^T @ cue) — basic heteroassociative memory.
    This is equivalent to a single-layer linear associator but using
    spike patterns. It's the simplest possible test of whether spike-based
    associative memory can work at all.
    """
    def __init__(self, num_neurons=2048, lr=0.01, w_max=1.0):
        super().__init__()
        self.num_neurons = num_neurons
        self.lr = lr
        self.w_max = w_max
        self.W = nn.Parameter(
            torch.zeros(num_neurons, num_neurons),
            requires_grad=False
        )
    def learn(self, cue_spikes, target_spikes):
        """Simple outer-product Hebbian learning.
        cue_spikes: [num_steps, num_neurons]
        target_spikes: [num_steps, num_neurons]
        """
        # Average firing rate patterns
        cue_rate = cue_spikes.mean(dim=0)    # [num_neurons]
        target_rate = target_spikes.mean(dim=0)  # [num_neurons]
        # Outer product update
        self.W.data += self.lr * torch.outer(target_rate, cue_rate)
        self.W.data.clamp_(-self.w_max, self.w_max)
    def recall(self, cue_spikes):
        """Recall by matrix-vector product.
        Returns continuous activation (not spikes) for easier evaluation.
        """
        cue_rate = cue_spikes.mean(dim=0)
        return self.W @ cue_rate  # [num_neurons]
    def get_weight_stats(self):
        W = self.W.data
        return {
            "mean": W.mean().item(),
            "std": W.std().item(),
            "abs_mean": W.abs().mean().item(),
            "sparsity": (W.abs() < 0.001).float().mean().item(),
            "max": W.max().item(),
            "min": W.min().item(),
        }