Quickstart

Go from pip install to your first cache hit in five minutes.

1. Install

pip install ramjetio

Requires Python 3.8+ and PyTorch 1.9+.
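
If you want to confirm the minimums before installing, a small check like the one below may help. It is only a sketch and not part of ramjetio: the helper names `meets_minimum` and `check_requirements` are made up for this example, and it assumes PyTorch was installed under its usual distribution name, `torch`.

```python
import re
import sys


def meets_minimum(installed, minimum):
    """Return True if a version string such as '2.1.0+cu118' is >= (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", installed)
    if match is None:
        raise ValueError(f"unrecognized version string: {installed!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


def check_requirements():
    """Return a list of problems; an empty list means both minimums are met."""
    if sys.version_info[:2] < (3, 8):
        return ["Python 3.8+ is required"]

    # importlib.metadata only exists on Python 3.8+, so import it after the check.
    from importlib import metadata

    problems = []
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        if not meets_minimum(torch_version, (1, 9)):
            problems.append(f"PyTorch 1.9+ is required, found {torch_version}")
    return problems
```

Calling `check_requirements()` in the environment you plan to train in returns an empty list when both requirements are satisfied.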

2. Get an API key

Sign up at app.ramjet.io, create a cluster, and copy the API key.

3. Set environment variables

export RAMJET_API_KEY=<your-key>
export RAMJET_BACKEND_URL=https://api.ramjet.io  # default

# S3-compatible storage (works with AWS S3, MinIO, R2, ...)
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export S3_ENDPOINT_URL=https://s3.amazonaws.com  # or your MinIO host
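
A missing variable is easier to catch before a multi-node job starts than after. The sketch below checks the names from the export lines above. Treating all four as required is an assumption (RAMJET_BACKEND_URL is left out because it has a default), and `missing_vars` is a hypothetical helper, not a ramjetio function.

```python
import os

# Names come from the export lines above. Which of them ramjetio strictly
# requires is an assumption made for this sketch.
REQUIRED_VARS = (
    "RAMJET_API_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "S3_ENDPOINT_URL",
)


def missing_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
```

Running this at the top of a launch script fails fast with the list of names that still need to be exported.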

4. Initialize in your training script

import ramjetio

ramjetio.init(
    bucket="my-dataset-bucket",
    prefix="train/",
)

That's it: every read from my-dataset-bucket/train/* now goes to the distributed cache first.

5. Run with `torchrun`

torchrun --nnodes=2 --nproc_per_node=4 train.py

RAMJET auto-detects the RANK, LOCAL_RANK, and WORLD_SIZE environment variables that torchrun sets, so DDP needs no code changes.
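
For reference, these are the variables torchrun exports to every worker process. The sketch below shows one way to read them yourself, for example for logging. It does not show how RAMJET reads them internally; `DistInfo` and `read_dist_info` are illustrative names, and the single-process defaults are a choice made for this example.

```python
import os
from typing import Mapping, NamedTuple, Optional


class DistInfo(NamedTuple):
    rank: int        # global index of this process across all nodes
    local_rank: int  # index of this process on its own node
    world_size: int  # total number of processes in the job


def read_dist_info(env: Optional[Mapping[str, str]] = None) -> DistInfo:
    """Read the torchrun variables, falling back to single-process values."""
    env = os.environ if env is None else env
    return DistInfo(
        rank=int(env.get("RANK", "0")),
        local_rank=int(env.get("LOCAL_RANK", "0")),
        world_size=int(env.get("WORLD_SIZE", "1")),
    )
```

With the command above (2 nodes, 4 processes per node), each worker sees a WORLD_SIZE of 8, a RANK between 0 and 7, and a LOCAL_RANK between 0 and 3.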

What to expect

First run: the dataset export pulls from S3 and populates the peer cache, so it runs at normal speed.
Subsequent runs: the export is 6× faster and makes zero S3 calls, assuming the cache survived.
Dashboard: open the cluster page in app.ramjet.io. The Data Pipeline Rate panel shows where each MB came from (local / peer / origin).
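
To make the local / peer / origin split concrete, here is a toy model of a tiered read path that counts bytes per source. It is not RAMJET's implementation: plain dictionaries stand in for the local cache, the peer cache, and S3, and the class name `TieredReader` is invented for this sketch.

```python
from collections import Counter


class TieredReader:
    """Toy read path: try the local cache, then the peer cache, then the origin."""

    def __init__(self, origin, peer=None):
        self.local = {}                            # per-process cache
        self.peer = {} if peer is None else peer   # shared cache that outlives a run
        self.origin = origin                       # stands in for the S3 bucket
        self.bytes_by_source = Counter()           # bytes served from each tier

    def read(self, key):
        if key in self.local:
            source, data = "local", self.local[key]
        elif key in self.peer:
            source, data = "peer", self.peer[key]
            self.local[key] = data                 # promote to the local tier
        else:
            source, data = "origin", self.origin[key]
            self.peer[key] = data                  # populate both cache tiers
            self.local[key] = data
        self.bytes_by_source[source] += len(data)
        return data


if __name__ == "__main__":
    origin = {"train/a": b"1234", "train/b": b"56"}
    shared_peer = {}

    first_run = TieredReader(origin, shared_peer)
    for key in origin:
        first_run.read(key)

    second_run = TieredReader(origin, shared_peer)  # fresh process, warm peer cache
    for key in origin:
        second_run.read(key)

    print(dict(first_run.bytes_by_source), dict(second_run.bytes_by_source))
```

In this model the first run serves every byte from the origin, while a second run that shares the surviving peer cache never touches the origin, which mirrors the first-run and subsequent-run behavior described above.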

Next

Integration guide — wiring RAMJET into existing dataloaders, Ultralytics, HuggingFace datasets.
API reference — full function signatures.
Troubleshooting — common gotchas (cache eviction, stale heartbeats, drain semantics).