RL Eviction + Operational Intelligence
Problem
Build caches differ from CDN/web caches:
- Objects form a DAG. Evicting a shared intermediate artifact cascades rebuilds of all downstream targets.
- Miss cost is non-uniform. A header rebuild is cheap. A 45-minute compilation output is expensive.
- Access patterns are structured. Release branches, sprint cadences, CI schedules create semi-predictable patterns a learned policy can exploit.
- S3 has no native LRU. Lifecycle rules expire objects by age, not by access recency, so eviction must be implemented explicitly as deletes.
Standard eviction policies (LRU, LFU, GDSF) ignore the DAG structure.
Architecture
┌─────────────────────────────────┐
│ Slow Brain (LLM)                │
│ Async diagnostics, policy       │
│ tuning, anomaly narration       │
│ Timescale: minutes/hours        │
└──────────┬──────────────────────┘
           │ adjusts reward weights
┌──────────▼──────────────────────┐
│ Fast Brain (RL)                 │
│ Per-object eviction decisions   │
│ DQN model, runs inline          │
│ Timescale: microseconds         │
└──────────┬──────────────────────┘
           │ eviction decisions
┌──────────▼──────────────────────┐
│ S3 Eviction Worker (Go)         │
│ Rate-limited DeleteObjects      │
└─────────────────────────────────┘
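The eviction worker itself is Go, but the batching and pacing idea can be sketched in Python for illustration. S3's DeleteObjects call accepts at most 1000 keys per request, so an eviction plan has to be split into batches and spaced out. The function names and the fixed-interval pacing below are assumptions, not the worker's actual design.

```python
from typing import Iterable, Iterator, List, Tuple

# S3 DeleteObjects accepts at most 1000 keys per request.
S3_DELETE_BATCH_LIMIT = 1000


def batch_keys(keys: Iterable[str], limit: int = S3_DELETE_BATCH_LIMIT) -> Iterator[List[str]]:
    """Yield successive batches of at most `limit` keys."""
    batch: List[str] = []
    for key in keys:
        batch.append(key)
        if len(batch) == limit:
            yield batch
            batch = []
    if batch:
        yield batch


def plan_deletes(keys: Iterable[str], max_requests_per_second: float) -> List[Tuple[float, List[str]]]:
    """Return (send_offset_seconds, batch) pairs spaced to respect the rate limit.

    Hypothetical helper: it only plans the schedule and issues no S3 calls.
    """
    interval = 1.0 / max_requests_per_second
    return [(i * interval, batch) for i, batch in enumerate(batch_keys(keys))]
```

With 2500 keys and a limit of 2 requests per second, this yields three batches scheduled at 0.0 s, 0.5 s, and 1.0 s.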
Phases
6a: Access Logger
Go middleware. Writes one structured JSONL record per cache operation: timestamp, operation, store type, key, size, hit/miss, latency, action hash. The action hash makes it possible to reconstruct the action cache (AC) → content-addressable store (CAS) graph from logs alone.
This is the Go/Python boundary. Go writes logs, Python reads them.
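On the Python side, reading the log could look roughly like the sketch below. The JSON field names (`ts`, `op`, `store`, `key`, `size`, `hit`, `latency_us`, `action_hash`) are assumed for illustration; the actual schema is whatever the Go middleware emits. The sketch also assumes each CAS record carries the hash of the action it belongs to, which is what allows AC→CAS edges to be grouped from logs alone.

```python
import json
from collections import defaultdict

# Hypothetical sample records; field names are assumptions, not the real schema.
SAMPLE_LOG = """\
{"ts": 1700000000.0, "op": "get", "store": "ac", "key": "a1", "size": 142, "hit": true, "latency_us": 310, "action_hash": "a1"}
{"ts": 1700000000.1, "op": "get", "store": "cas", "key": "c1", "size": 2048, "hit": true, "latency_us": 900, "action_hash": "a1"}
{"ts": 1700000000.2, "op": "get", "store": "cas", "key": "c2", "size": 4096, "hit": true, "latency_us": 950, "action_hash": "a1"}
{"ts": 1700000001.0, "op": "get", "store": "cas", "key": "c2", "size": 4096, "hit": false, "latency_us": 1200, "action_hash": "a2"}
"""


def ac_to_cas_edges(lines):
    """Group CAS keys by the action hash they were accessed under."""
    edges = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        rec = json.loads(line)
        if rec["store"] == "cas" and rec.get("action_hash"):
            edges[rec["action_hash"]].add(rec["key"])
    return dict(edges)


edges = ac_to_cas_edges(SAMPLE_LOG.splitlines())
```

In this sample, blob `c2` is referenced by both actions, which is the kind of shared artifact the graph features are meant to capture.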
6b: Cache Trace Simulator
Python tool that replays access logs through configurable cache models (LRU, LFU, GDSF, RL). Outputs per-policy hit rate, rebuild cost, eviction count, cascade depth.
Also serves as the RL training environment.
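A minimal sketch of the replay loop for the LRU baseline is shown below. The trace format (a list of `(key, size, rebuild_cost)` tuples) and the function name are assumptions; the real simulator reads the JSONL access log and also reports cascade depth, which this sketch omits.

```python
from collections import OrderedDict


def replay_lru(trace, capacity_bytes):
    """Replay (key, size, rebuild_cost) accesses through a byte-bounded LRU cache.

    Every miss, including a cold first access, is charged its rebuild cost.
    """
    cache = OrderedDict()  # key -> size, ordered from least to most recently used
    used = 0
    hits = 0
    evictions = 0
    rebuild_cost = 0.0

    for key, size, cost in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
            continue

        rebuild_cost += cost
        if size > capacity_bytes:
            continue  # object can never fit, so it is not admitted

        # Evict least recently used entries until the new object fits.
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
            evictions += 1

        cache[key] = size
        used += size

    total = len(trace)
    return {
        "hit_rate": hits / total if total else 0.0,
        "rebuild_cost": rebuild_cost,
        "evictions": evictions,
    }
```

Other policies would plug in by replacing the victim selection inside the `while` loop, which keeps the reported metrics comparable across policies.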
6c: RL Agent
Dueling DQN trained offline on access log replay.
State features per cache entry:
- Access features — recency, frequency, inter-arrival time
- Object features — size, estimated rebuild cost
- Graph features — downstream dependents, DAG depth, shared ratio
- Lifecycle features — branch type, age, CI vs developer
The graph features are the novel part: to our knowledge, no prior RL eviction work uses build graph structure.
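One way to turn the four feature groups into a fixed-length model input is sketched below. The field names, units, and the log scaling of heavy-tailed quantities are all assumptions made for illustration; the document does not specify an encoding.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class EntryFeatures:
    # Access features
    seconds_since_last_access: float
    access_count: int
    mean_inter_arrival_s: float
    # Object features
    size_bytes: int
    est_rebuild_cost_s: float
    # Graph features
    downstream_dependents: int
    dag_depth: int
    shared_ratio: float  # assumed to lie in [0, 1]
    # Lifecycle features
    is_release_branch: bool
    age_s: float
    from_ci: bool

    def to_vector(self) -> List[float]:
        """Flatten to floats; log1p compresses heavy-tailed counts and durations."""
        return [
            math.log1p(self.seconds_since_last_access),
            math.log1p(self.access_count),
            math.log1p(self.mean_inter_arrival_s),
            math.log1p(self.size_bytes),
            math.log1p(self.est_rebuild_cost_s),
            math.log1p(self.downstream_dependents),
            float(self.dag_depth),
            self.shared_ratio,
            float(self.is_release_branch),
            math.log1p(self.age_s),
            float(self.from_ci),
        ]
```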
Action: Binary keep/evict per candidate under cache pressure.
Reward: Weighted combination of rebuild cost, cascade rebuilds, S3 API cost, and space freed.
Deployment: Start with offline batch (periodic eviction plans). Graduate to ONNX-in-Go for inline decisions.
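The reward described above can be sketched as a weighted sum. The signs (space freed is rewarded; rebuild cost, cascade rebuilds, and S3 API cost are penalized), the units, and the default weights are assumptions. The weights are the values the slow brain would adjust.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Placeholder defaults; tuned weights are not given in this document.
    rebuild_cost: float = 1.0
    cascade: float = 0.5
    s3_api: float = 0.1
    space_freed: float = 0.2


def eviction_reward(
    rebuild_cost_s: float,
    cascade_rebuilds: int,
    s3_api_cost: float,
    space_freed_gb: float,
    w: RewardWeights = RewardWeights(),
) -> float:
    """Reward for one evict decision: benefit of freed space minus weighted costs."""
    return (
        w.space_freed * space_freed_gb
        - w.rebuild_cost * rebuild_cost_s
        - w.cascade * cascade_rebuilds
        - w.s3_api * s3_api_cost
    )
```

Under these assumed signs, evicting an artifact that is later rebuilt at high cost scores lower than evicting one of the same size that is never requested again.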
6d: Graph Features
Three sources of build graph topology:
- AC→CAS references from access logs (no external tooling needed)
- BUILD file parser (Python)
- bazel query (most accurate, used for validation)
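Once the topology is available from any of these sources, the per-node graph features can be derived from a plain dependency map, as in the sketch below. The input format (`deps` maps a node to the nodes it consumes) is an assumption, as is the definition of shared ratio used here: the fraction of top-level targets that transitively depend on the node. The document does not define shared ratio, so this is one plausible reading.

```python
from collections import defaultdict


def graph_features(deps):
    """Compute downstream dependents, DAG depth, and shared ratio per node.

    `deps` maps each node to the nodes it consumes (its inputs).
    Uses recursion, so very deep graphs would need an iterative traversal.
    """
    nodes = set(deps)
    for inputs in deps.values():
        nodes.update(inputs)

    # Invert the map: input -> set of direct consumers.
    consumers = defaultdict(set)
    for node, inputs in deps.items():
        for inp in inputs:
            consumers[inp].add(node)

    depth_cache = {}

    def depth(n):
        # Longest path from any source (a node with no inputs); sources have depth 0.
        if n not in depth_cache:
            depth_cache[n] = 1 + max((depth(i) for i in deps.get(n, ())), default=-1)
        return depth_cache[n]

    down_cache = {}

    def downstream(n):
        # All nodes that transitively consume n.
        if n not in down_cache:
            seen = set()
            for c in consumers.get(n, ()):
                seen.add(c)
                seen |= downstream(c)
            down_cache[n] = seen
        return down_cache[n]

    # Top-level targets: nodes that nothing else consumes.
    sinks = {n for n in nodes if not consumers.get(n)}

    features = {}
    for n in nodes:
        d = downstream(n)
        features[n] = {
            "downstream_dependents": len(d),
            "dag_depth": depth(n),
            "shared_ratio": len(d & sinks) / len(sinks),
        }
    return features
```

For example, with `deps = {"lib.o": ["hdr.h"], "app": ["lib.o"], "test": ["lib.o"], "tool": ["util.o"]}`, evicting `hdr.h` puts three downstream nodes at risk, while `util.o` affects only `tool`.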
7a: MCP Servers
Expose S3 stats, Valkey stats, Prometheus metrics, and eviction history to LLM agents via Model Context Protocol.
7b: LLM Agents
- Diagnostic — correlates metrics across tiers to explain hit rate changes
- Policy advisor — compares RL against baselines, suggests reward tuning
- Anomaly narrator — detects metric anomalies, produces summaries
7c: Fine-Tuning
Accumulate diagnostic tuples. Fine-tune a small model for routine cases. Full model handles novel situations.
Boundaries
| Concern | Language |
|---|---|
| Cache serving, access logging, eviction execution | Go |
| RL training, LLM agents, graph extraction | Python |
Go/Python boundary: the access log file.