
RAMJET

Distributed peer cache for ML training data. Eliminate redundant S3 pulls when your team shares datasets across runs.



Why RAMJET?

| Problem | Solution |
| --- | --- |
| Repeated S3 pulls of the same dataset | First run warms the peer cache; subsequent runs serve from local SSD or peer disks |
| Network bottleneck from shared object storage | Consistent-hashed peer cache spreads bytes across nodes (sketched below) |
| No visibility into where bytes come from | Real-time dashboard with per-source byte breakdown (local / peer / origin) |
| DDP rank coordination headaches | Works out of the box with torchrun, DeepSpeed, Accelerate, or custom launchers |
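
The consistent-hash row is the core placement idea: every node can compute which peer owns a given object key, with no central coordinator, and adding or removing a peer only remaps a small fraction of keys. Here is a minimal sketch of that technique, not RAMJET's actual implementation; the `Ring` class, virtual-node count, and peer names are all illustrative:

```python
# Minimal consistent-hash ring sketch (illustrative only, not RAMJET's code).
# Each peer gets many virtual points on a ring; an object key is owned by
# the first peer point clockwise from the key's hash.
import bisect
import hashlib

def _hash(s: str) -> int:
    return int.from_bytes(hashlib.sha1(s.encode()).digest()[:8], "big")

class Ring:
    def __init__(self, peers, vnodes=64):
        # Virtual nodes smooth out load across peers of equal capacity.
        self._points = sorted(
            (_hash(f"{peer}#{i}"), peer) for peer in peers for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._points]

    def node_for(self, key: str) -> str:
        """Peer responsible for caching this object key."""
        i = bisect.bisect(self._keys, _hash(key)) % len(self._points)
        return self._points[i][1]

ring = Ring(["node-a", "node-b", "node-c"])
print(ring.node_for("s3://bucket/dataset/shard-0001.tar"))  # e.g. "node-b"
```

Because placement is a pure function of the key and the peer set, every DDP rank resolves the same owner independently, which is what makes launcher-agnostic coordination possible.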

Measured impact

Benchmark on 2× A5000 (odin) with a 5315-sample YOLO dataset, 5 epochs:

| Run | Dataset export | End-to-end | S3 calls |
| --- | --- | --- | --- |
| Cold (empty cache) | 161.2 s | 419.6 s | 5315 |
| Cache-warm (data wiped, cache kept) | 24.6 s | 274.7 s | 0 |

That's a 6.5× faster export and zero S3 requests on the second run.

RAMJET accelerates the dataset export stage (S3 → local SSD). Once data is on the local SSD, the training loop reads it directly. Real ROI scales with how often a dataset is reused across runs, not with epoch count inside one run.
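The per-source breakdown above (local / peer / origin) implies a tiered read path during export: local SSD first, then the owning peer, then S3. The sketch below shows that tiering under assumed names; `DictStore`, `fetch`, and `bytes_by_source` are hypothetical stand-ins, not RAMJET's API:

```python
# Tiered read path sketch (hypothetical names, not RAMJET's actual API).
# Tier order matches the dashboard's per-source breakdown: local SSD,
# then the peer assigned by the consistent-hash ring, then the S3 origin.
from collections import Counter

class DictStore(dict):
    """Stand-in for a local SSD cache, a peer's disk, or S3."""
    def get_bytes(self, key):
        return dict.get(self, key)

bytes_by_source = Counter()  # what the dashboard would aggregate

def fetch(key, local, peer, origin):
    for source, store in (("local", local), ("peer", peer), ("origin", origin)):
        data = store.get_bytes(key)
        if data is not None:
            bytes_by_source[source] += len(data)
            # Warm the faster tiers so the next reader short-circuits.
            if source != "local":
                local[key] = data
            if source == "origin":
                peer[key] = data
            return data
    raise KeyError(key)

origin = DictStore({"shard-0001.tar": b"x" * 1024})
local, peer = DictStore(), DictStore()
fetch("shard-0001.tar", local, peer, origin)  # cold: served from origin
fetch("shard-0001.tar", local, peer, origin)  # warm: served from local SSD
print(dict(bytes_by_source))                  # {'origin': 1024, 'local': 1024}
```

This also explains the benchmark table: the warm run's zero S3 calls come from every read terminating at the local or peer tier.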


License

PolyForm Noncommercial 1.0.0. Commercial licensing available — see ramjet.io.