# RAMJET
Distributed peer cache for ML training data. Eliminate redundant S3 pulls when your team shares datasets across runs.
## Why RAMJET?
| Problem | Solution |
|---|---|
| Repeated S3 pulls of the same dataset | First run warms the peer cache; subsequent runs serve from local SSD or peer disks |
| Network bottleneck from shared object storage | Consistent-hashed peer cache spreads bytes across nodes |
| No visibility into where bytes come from | Real-time dashboard with per-source byte breakdown (local / peer / origin) |
| DDP rank coordination headaches | Works out of the box with torchrun, DeepSpeed, Accelerate, or custom launchers |
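The "consistent-hashed peer cache" row above uses a standard technique: keys are placed on a hash ring and each peer owns the keys closest to its ring points. Below is a minimal Python sketch of that idea, for illustration only; it is not RAMJET's internal code, and the node names, hash function, and virtual-node count are assumptions.

```python
# Minimal consistent-hashing sketch: maps dataset shard keys to peer nodes so
# each peer owns a stable slice of the keyspace. Illustrative only; not RAMJET
# internals. MD5, 64 virtual nodes, and the node names are assumptions.
import hashlib
from bisect import bisect_right


class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Each physical node gets `vnodes` points on the ring so load evens out.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def owner(self, key: str) -> str:
        # The owner is the first ring point clockwise from the key's hash.
        idx = bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._ring[idx][1]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    for shard in ("train/shard-0000.tar", "train/shard-0001.tar", "val/labels.json"):
        print(shard, "->", ring.owner(shard))
```

The payoff of the ring layout is that adding or removing a peer only remaps the keys adjacent to its points, so a mostly-warm cluster stays mostly warm when nodes come and go.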
## Measured impact
Benchmark on 2× A5000 (odin) with a 5315-sample YOLO dataset, 5 epochs:
| Run | Dataset export | End-to-end | S3 calls |
|---|---|---|---|
| Cold (empty cache) | 161.2 s | 419.6 s | 5315 |
| Cache-warm (data wiped, cache kept) | 24.6 s | 274.7 s | 0 |
That's a 6.5× faster export and zero S3 requests on the second run.
RAMJET accelerates the dataset export stage (S3 → local SSD). Once data is on the local SSD, the training loop reads it directly. Real ROI scales with how often a dataset is reused across runs, not with epoch count inside one run.
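To make the local / peer / origin byte breakdown concrete, here is a hedged sketch of the lookup order such a cache typically follows: local SSD first, then the owning peer's disk, then the S3 origin, warming the local copy on the way back. The `fetch`, `peer_get`, and `origin_get` names are placeholders for illustration, not RAMJET's actual API.

```python
# Hedged sketch of the local -> peer -> origin lookup order, with the
# per-source byte accounting a dashboard could report. The peer_get/origin_get
# callables stand in for the real transport; nothing here is RAMJET's API.
from pathlib import Path
from typing import Callable, Optional


def fetch(key: str, cache_root: Path,
          peer_get: Callable[[str], Optional[bytes]],   # returns None on a peer miss
          origin_get: Callable[[str], bytes],           # e.g. an S3 GET
          stats: dict) -> bytes:
    """Return bytes for `key`, preferring local SSD, then a peer, then origin."""
    local = cache_root / key
    if local.exists():                                   # 1. local SSD hit
        data = local.read_bytes()
        stats["local"] = stats.get("local", 0) + len(data)
        return data

    data = peer_get(key)                                 # 2. owning peer's disk
    if data is None:
        data = origin_get(key)                           # 3. cold everywhere: origin
        stats["origin"] = stats.get("origin", 0) + len(data)
    else:
        stats["peer"] = stats.get("peer", 0) + len(data)

    local.parent.mkdir(parents=True, exist_ok=True)
    local.write_bytes(data)                              # warm local SSD for next run
    return data


if __name__ == "__main__":
    import tempfile

    peers = {"train/shard-0000.tar": b"peer bytes"}      # fake warm peer
    origin = {"train/shard-0001.tar": b"origin bytes"}   # fake S3 bucket
    stats: dict = {}
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        fetch("train/shard-0000.tar", root, peers.get, origin.__getitem__, stats)
        fetch("train/shard-0001.tar", root, peers.get, origin.__getitem__, stats)
        fetch("train/shard-0001.tar", root, peers.get, origin.__getitem__, stats)  # now local
    print(stats)  # {'peer': 10, 'origin': 12, 'local': 12}
```

On a second run over the same dataset, every key resolves at step 1, which is why the warm-cache row in the table above shows zero S3 calls.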
## Get started
- Quickstart — `pip install ramjetio` to first cache hit in 5 minutes.
- Integration guide — embedding RAMJET into an existing PyTorch pipeline.
- API reference — every public function and dataclass.
- Examples — runnable scripts for `torchrun`, DeepSpeed, and Accelerate.
- Troubleshooting — known issues and fixes.
## License
PolyForm Noncommercial 1.0.0. Commercial licensing available — see ramjet.io.