
Examples

Runnable scripts in ramjetio/examples/ on GitHub.

simple.py

Minimal end-to-end example: ramjetio.init() + a tiny training loop. Best starting point.
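In outline, the script boils down to the pattern below (the loop is a generic PyTorch placeholder, not the exact code in simple.py; only the ramjetio.init() call is taken from the description above):

import torch
import ramjetio

ramjetio.init()  # start RAMJET once, before any data loading

# tiny stand-in model and training loop
model = torch.nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
for step in range(100):
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()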

View source

torchrun_ddp.py

Multi-node distributed training with torchrun. RAMJET auto-detects ranks; no extra wiring needed.

torchrun --nnodes=2 --nproc_per_node=4 examples/torchrun_ddp.py
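Inside the script there is nothing rank-specific to configure; a hedged outline, assuming ramjetio.init() reads the RANK / WORLD_SIZE / LOCAL_RANK environment variables that torchrun sets (which is what the auto-detection above implies):

import torch.distributed as dist
import ramjetio

dist.init_process_group(backend="nccl")  # torchrun provides RANK, WORLD_SIZE, LOCAL_RANK
ramjetio.init()                          # assumption: picks up the same env vars, no rank arguments needed
print(f"rank {dist.get_rank()} of {dist.get_world_size()} ready")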

View source

accelerate_example.py

Hugging Face Accelerate integration; RAMJET drops in without changes to the rest of the Accelerate training script.

accelerate launch examples/accelerate_example.py
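A sketch of the drop-in pattern: the Accelerate calls are standard, while the CachedDataset import path and wrapping call are assumptions about how the script is laid out.

import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator
import ramjetio
from ramjetio import CachedDataset  # import path is an assumption

ramjetio.init()
accelerator = Accelerator()

base = TensorDataset(torch.randn(512, 10), torch.randn(512, 1))
loader = DataLoader(CachedDataset(base), batch_size=64)  # hypothetical wrapping of an ordinary dataset

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for x, y in loader:
    loss = torch.nn.functional.mse_loss(model(x), y)
    accelerator.backward(loss)
    optimizer.step()
    optimizer.zero_grad()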

View source

deepspeed_example.py

DeepSpeed launcher with RAMJET. Useful for large-model training where memory is the constraint.

deepspeed examples/deepspeed_example.py
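A minimal sketch of how ramjetio.init() might sit next to the DeepSpeed engine; the ZeRO config is illustrative and the placement of the init call is an assumption, not taken from the script:

import torch
import deepspeed
import ramjetio

ramjetio.init()  # assumption: called once per process, before the engine is built

model = torch.nn.Linear(10, 1)
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "zero_optimization": {"stage": 2},
}
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

for step in range(10):
    x = torch.randn(8, 10, device=engine.device)
    y = torch.randn(8, 1, device=engine.device)
    loss = torch.nn.functional.mse_loss(engine(x), y)
    engine.backward(loss)
    engine.step()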

View source

pytorch_imagenet.py

ImageFolder + ResNet18 with CachedDataset wrapping. DDP-aware; falls back to a single process when RANK is unset.

python examples/pytorch_imagenet.py --data-dir /mnt/imagenet --epochs 3
# or multi-GPU:
./examples/ddp_torchrun.sh examples/pytorch_imagenet.py --data-dir /mnt/imagenet
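In outline, and with the CachedDataset signature and import path as assumptions (the transform and batch size are placeholders):

import os
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms
import ramjetio
from ramjetio import CachedDataset  # import path is an assumption

ramjetio.init()

tfm = transforms.Compose([transforms.RandomResizedCrop(224), transforms.ToTensor()])
base = datasets.ImageFolder("/mnt/imagenet/train", transform=tfm)
loader = DataLoader(CachedDataset(base), batch_size=256, num_workers=8)  # hypothetical wrapping call

model = models.resnet18(num_classes=1000)
if "RANK" in os.environ:  # DDP path when launched via torchrun / ddp_torchrun.sh
    torch.distributed.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    model = torch.nn.parallel.DistributedDataParallel(model.cuda(), device_ids=[local_rank])
# ...followed by a standard cross-entropy training loop over `loader`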

View source

huggingface_datasets.py

datasets.load_dataset(...) wrapped in CachedDataset so subsequent epochs / nodes hit the peer ring instead of the HF Hub.

python examples/huggingface_datasets.py --dataset imdb --split train --epochs 2
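Roughly the shape of the script; the load_dataset call is standard, the CachedDataset wrapping call is an assumption:

from datasets import load_dataset
from torch.utils.data import DataLoader
import ramjetio
from ramjetio import CachedDataset  # import path is an assumption

ramjetio.init()

imdb = load_dataset("imdb", split="train")                # first pass pulls from the HF Hub
loader = DataLoader(CachedDataset(imdb), batch_size=32)   # later epochs / other nodes read from the peer ring

for epoch in range(2):
    for batch in loader:
        pass  # tokenize / train here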

View source

yolov8.py

Ultralytics YOLOv8 training fed by UniversalDataset(format='yolo'). This is the workload from our benchmark — 2× A5000, 5,315 samples × 5 epochs, 6.5× faster end-to-end.

python examples/yolov8.py --model yolov8n.pt --epochs 5
# multi-GPU:
./examples/ddp_torchrun.sh examples/yolov8.py --model yolov8n.pt --epochs 5
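The Ultralytics side uses the standard YOLO training API; the RAMJET side is the UniversalDataset call named above. How the two are wired together is specific to the script, so the sketch below only shows the documented pieces (any further UniversalDataset arguments are omitted because they are not known here):

from ultralytics import YOLO
import ramjetio
from ramjetio import UniversalDataset  # import path is an assumption

ramjetio.init()
dataset = UniversalDataset(format="yolo")  # feeds the trainer; the attachment lives in examples/yolov8.py

model = YOLO("yolov8n.pt")
model.train(data="coco128.yaml", epochs=5)  # placeholder data config; see the script for the real source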

View source

ddp_torchrun.sh

One-liner torchrun wrapper that auto-detects per-node GPU count and accepts NNODES / NODE_RANK / MASTER_ADDR overrides for multi-node runs.
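For a two-node run, the overrides look like this (the IP address and the target script are placeholders):

# node 0
NNODES=2 NODE_RANK=0 MASTER_ADDR=10.0.0.1 ./examples/ddp_torchrun.sh examples/pytorch_imagenet.py --data-dir /mnt/imagenet
# node 1
NNODES=2 NODE_RANK=1 MASTER_ADDR=10.0.0.1 ./examples/ddp_torchrun.sh examples/pytorch_imagenet.py --data-dir /mnt/imagenet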

View source

Coming in v0.10

  • video_mp4_streaming.py — sharded video streaming with WAN-tier peering.