# RAMJET
Distributed peer cache for ML training data. Eliminate redundant S3 pulls when your team shares datasets across runs.
## Why RAMJET?
| Problem | Solution |
|---|---|
| Repeated S3 pulls of the same dataset | First run warms the peer cache; subsequent runs serve from local SSD or peer disks |
| Network bottleneck from shared object storage | Consistent-hashed peer cache spreads bytes across nodes |
| No visibility into where bytes come from | Real-time dashboard with per-source byte breakdown (local / peer / origin) |
| DDP rank coordination headaches | Works out of the box with torchrun, DeepSpeed, Accelerate, or custom launchers |
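The "consistent-hashed peer cache" row above uses a standard technique: keys are placed on a hash ring and each peer owns the keys closest to its ring points. Below is a minimal Python sketch of that idea, for illustration only; it is not RAMJET's internal code, and the node names, hash function, and virtual-node count are assumptions.

```python
# Minimal consistent-hashing sketch: maps dataset shard keys to peer nodes so
# each peer owns a stable slice of the keyspace. Illustrative only; not RAMJET
# internals. MD5, 64 virtual nodes, and the node names are assumptions.
import hashlib
from bisect import bisect_right


class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Each physical node gets `vnodes` points on the ring so load evens out.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def owner(self, key: str) -> str:
        # The owner is the first ring point clockwise from the key's hash.
        idx = bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._ring[idx][1]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    for shard in ("train/shard-0000.tar", "train/shard-0001.tar", "val/labels.json"):
        print(shard, "->", ring.owner(shard))
```

The payoff of the ring layout is that adding or removing a peer only remaps the keys adjacent to its points, so a mostly-warm cluster stays mostly warm when nodes come and go.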
## Measured impact
Benchmark on 2× A5000 (odin) with a 5315-sample YOLO dataset, 5 epochs:
| Run | Dataset export | End-to-end | S3 calls |
|---|---|---|---|
| Cold (empty cache) | 161.2 s | 419.6 s | 5315 |
| Cache-warm (data wiped, cache kept) | 24.6 s | 274.7 s | 0 |
That's a 6.5× faster export and zero S3 requests on the second run.
RAMJET accelerates the dataset export stage (S3 → local SSD). Once data is on the local SSD, the training loop reads it directly. Real ROI scales with how often a dataset is reused across runs, not with epoch count inside one run.
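To make the local / peer / origin byte breakdown concrete, here is a hedged sketch of the lookup order such a cache typically follows: local SSD first, then the owning peer's disk, then the S3 origin, warming the local copy on the way back. The `fetch`, `peer_get`, and `origin_get` names are placeholders for illustration, not RAMJET's actual API.

```python
# Hedged sketch of the local -> peer -> origin lookup order, with the
# per-source byte accounting a dashboard could report. The peer_get/origin_get
# callables stand in for the real transport; nothing here is RAMJET's API.
from pathlib import Path
from typing import Callable, Optional


def fetch(key: str, cache_root: Path,
          peer_get: Callable[[str], Optional[bytes]],   # returns None on a peer miss
          origin_get: Callable[[str], bytes],           # e.g. an S3 GET
          stats: dict) -> bytes:
    """Return bytes for `key`, preferring local SSD, then a peer, then origin."""
    local = cache_root / key
    if local.exists():                                   # 1. local SSD hit
        data = local.read_bytes()
        stats["local"] = stats.get("local", 0) + len(data)
        return data

    data = peer_get(key)                                 # 2. owning peer's disk
    if data is None:
        data = origin_get(key)                           # 3. cold everywhere: origin
        stats["origin"] = stats.get("origin", 0) + len(data)
    else:
        stats["peer"] = stats.get("peer", 0) + len(data)

    local.parent.mkdir(parents=True, exist_ok=True)
    local.write_bytes(data)                              # warm local SSD for next run
    return data


if __name__ == "__main__":
    import tempfile

    peers = {"train/shard-0000.tar": b"peer bytes"}      # fake warm peer
    origin = {"train/shard-0001.tar": b"origin bytes"}   # fake S3 bucket
    stats: dict = {}
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        fetch("train/shard-0000.tar", root, peers.get, origin.__getitem__, stats)
        fetch("train/shard-0001.tar", root, peers.get, origin.__getitem__, stats)
        fetch("train/shard-0001.tar", root, peers.get, origin.__getitem__, stats)  # now local
    print(stats)  # {'peer': 10, 'origin': 12, 'local': 12}
```

On a second run over the same dataset, every key resolves at step 1, which is why the warm-cache row in the table above shows zero S3 calls.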
## Get started
- Quickstart — `pip install ramjetio` to first cache hit in 5 minutes.
- Integration guide — embedding RAMJET into an existing PyTorch pipeline.
- API reference — every public function and dataclass.
- Examples — runnable scripts for `torchrun`, DeepSpeed, and Accelerate.
- Troubleshooting — known issues and fixes.
## License
PolyForm Noncommercial 1.0.0. Commercial licensing available — see ramjet.io.