Quickstart

Get from pip install to first cache hit in five minutes.

1. Install

pip install ramjetio

Requires Python 3.8+ and PyTorch 1.9+.

2. Get an API key

Sign up at app.ramjet.io, create a cluster, and copy its API key.

3. Set environment variables

export RAMJET_API_KEY=<your-key>
export RAMJET_BACKEND_URL=https://api.ramjet.io  # default

# S3-compatible storage (works with AWS S3, MinIO, R2, ...)
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export S3_ENDPOINT_URL=https://s3.amazonaws.com  # or your MinIO host
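
A missing variable tends to surface only once training has started, so it can be worth checking the environment up front. The sketch below is not part of ramjetio: `missing_env_vars` and `REQUIRED_VARS` are hypothetical names, and treating exactly these three variables as required is an assumption based on the list above (RAMJET_BACKEND_URL is documented as having a default).

```python
import os

# Assumption: these three must be set; RAMJET_BACKEND_URL falls back to its default.
REQUIRED_VARS = ("RAMJET_API_KEY", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Example with an explicit mapping instead of the real environment:
print(missing_env_vars({"RAMJET_API_KEY": "abc"}))
# → ['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY']
```

Calling `missing_env_vars()` with no arguments checks the real process environment; an empty list means all three variables are present.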

4. Initialize in your training script

import ramjetio

ramjetio.init(
    bucket="my-dataset-bucket",
    prefix="train/",
)

That's it. Every read under the train/ prefix of that bucket now goes to the distributed cache first.

5. Run with torchrun

torchrun --nnodes=2 --nproc_per_node=4 train.py

RAMJET auto-detects RANK / LOCAL_RANK / WORLD_SIZE. No code changes needed for DDP.
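
For reference, torchrun exports these three variables to every worker process it launches. The sketch below shows what they contain for the command above. It does not reproduce RAMJET's internal detection logic, which is not described here; `DistInfo` and `read_dist_info` are hypothetical names used only for illustration.

```python
import os
from typing import Mapping, NamedTuple, Optional


class DistInfo(NamedTuple):
    rank: int        # global index of this process across all nodes
    local_rank: int  # index of this process on its own node
    world_size: int  # total number of processes in the job


def read_dist_info(env: Optional[Mapping[str, str]] = None) -> DistInfo:
    """Read torchrun's variables, falling back to a single-process layout."""
    env = os.environ if env is None else env
    return DistInfo(
        rank=int(env.get("RANK", "0")),
        local_rank=int(env.get("LOCAL_RANK", "0")),
        world_size=int(env.get("WORLD_SIZE", "1")),
    )


# With --nnodes=2 --nproc_per_node=4, torchrun starts 2 * 4 = 8 processes.
# The last worker on the second node would see values like these:
info = read_dist_info({"RANK": "7", "LOCAL_RANK": "3", "WORLD_SIZE": "8"})
print(info)
# → DistInfo(rank=7, local_rank=3, world_size=8)
```

Outside torchrun the variables are normally unset, so the defaults describe a single process (rank 0 of 1), which is convenient for local debugging.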

What to expect

  • First run: dataset export pulls from S3 and populates the peer cache, so it runs at normal speed.
  • Subsequent runs: export is 6× faster and makes zero S3 calls, provided the cache survived between runs.
  • Dashboard: open the cluster page in app.ramjet.io. The Data Pipeline Rate panel shows where each MB came from (local / peer / origin).

Next

  • Integration guide — wiring RAMJET into existing dataloaders, Ultralytics, HuggingFace datasets.
  • API reference — full function signatures.
  • Troubleshooting — common gotchas (cache eviction, stale heartbeats, drain semantics).