Quickstart
Get from pip install to first cache hit in five minutes.
1. Install
Requires Python 3.8+ and PyTorch 1.9+.
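The version requirements above can be checked before installing anything else. The following is a minimal sketch, not part of RAMJET: the helper names (`version_tuple`, `meets_minimum`, `check_requirements`) are hypothetical, and the only facts taken from this page are the minimum versions (Python 3.8, PyTorch 1.9).

```python
import re
import sys


def version_tuple(version):
    """Extract (major, minor) from a version string such as '1.13.1+cu117'."""
    match = re.match(r"(\d+)\.(\d+)", str(version))
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version, minimum):
    """Return True if the version string is at least the (major, minor) minimum."""
    return version_tuple(version) >= minimum


def check_requirements():
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")
    try:
        import torch  # imported lazily so the check itself works without PyTorch
    except ImportError:
        problems.append("PyTorch is not installed")
    else:
        if not meets_minimum(torch.__version__, (1, 9)):
            problems.append(f"PyTorch 1.9+ is required, found {torch.__version__}")
    return problems
```

Calling `check_requirements()` in the target environment returns a list of problems, which is empty when both minimums are met.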
2. Get an API key
Sign up at app.ramjet.io, create a cluster, and copy the API key.
3. Set environment variables
export RAMJET_API_KEY=<your-key>
export RAMJET_BACKEND_URL=https://api.ramjet.io # default
# S3-compatible storage (works with AWS S3, MinIO, R2, ...)
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export S3_ENDPOINT_URL=https://s3.amazonaws.com # or your MinIO host
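A missing variable is easier to diagnose before the job launches than from inside a training run. The following is a minimal preflight sketch, not part of RAMJET: the helper names are hypothetical. It assumes that RAMJET_BACKEND_URL is optional (the export line above marks its value as the default) and that the other four variables are required, which this page does not state explicitly.

```python
import os

# Variable names come from the export lines above. Treating these four as
# required is an assumption of this sketch.
REQUIRED_VARS = (
    "RAMJET_API_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "S3_ENDPOINT_URL",
)

# The quickstart marks this value as the default backend URL.
DEFAULT_BACKEND_URL = "https://api.ramjet.io"


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def backend_url(env=None):
    """Return the configured backend URL, falling back to the documented default."""
    env = os.environ if env is None else env
    return env.get("RAMJET_BACKEND_URL") or DEFAULT_BACKEND_URL
```

Calling `missing_env_vars()` with no arguments inspects the real environment; passing a dict makes the check easy to unit test.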
4. Initialize in your training script
That's it — every read from bucket/train/* now hits the distributed cache first.
5. Run with torchrun
RAMJET auto-detects RANK / LOCAL_RANK / WORLD_SIZE. No code changes needed for DDP.
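For reference, torchrun exports RANK, LOCAL_RANK, and WORLD_SIZE as environment variables to every worker process it spawns. The sketch below shows how a script can read them. It is not RAMJET's detection logic, which this page does not describe; the `DistInfo` type, the `read_dist_info` helper, and the single-process fallback values are assumptions made for illustration.

```python
import os
from typing import Mapping, NamedTuple, Optional


class DistInfo(NamedTuple):
    rank: int        # global index of this process across all nodes
    local_rank: int  # index of this process on its own node
    world_size: int  # total number of processes in the job


def read_dist_info(env: Optional[Mapping[str, str]] = None) -> DistInfo:
    """Read the variables torchrun sets for each worker.

    Falls back to a single-process layout (rank 0 of 1) when the variables
    are absent, e.g. when the script is started with plain `python`.
    """
    env = os.environ if env is None else env
    return DistInfo(
        rank=int(env.get("RANK", "0")),
        local_rank=int(env.get("LOCAL_RANK", "0")),
        world_size=int(env.get("WORLD_SIZE", "1")),
    )
```

Logging the result of `read_dist_info()` at startup is a quick way to confirm that each worker sees the rank layout you expect.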
What to expect
- First run: dataset export pulls from S3 and populates the peer cache, so it runs at normal speed.
- Subsequent runs: export is 6× faster and makes zero S3 calls (assuming the cache survived).
- Dashboard: open the cluster page in app.ramjet.io — the Data Pipeline Rate panel shows where each MB came from (local / peer / origin).
Next
- Integration guide: wiring RAMJET into existing dataloaders, Ultralytics, HuggingFace datasets.
- API reference: full function signatures.
- Troubleshooting: common gotchas (cache eviction, stale heartbeats, drain semantics).