API Reference¶
Module: ramjetio¶
ramjetio.init()¶
Initialize RAMJET. This is the recommended way to start using RAMJET.
ramjetio.init(
api_key: str = None, # API key (default: from RAMJET_API_KEY env var)
cache_path: str = None, # Cache directory (default: /tmp/ramjet_cache)
cache_size: str = None, # Max cache size (default: 100GB)
port: int = None, # Cache server port (default: 9000)
auto_start_server: bool = True, # Start cache server automatically
)
What it does:
1. Reads configuration from environment/arguments
2. Connects to RAMJET dashboard
3. Registers this node
4. Starts local cache server (if auto_start_server=True)
5. Starts background heartbeat/metrics thread
Example:
import ramjetio
# Simple (uses env vars)
ramjetio.init()
# With explicit config
ramjetio.init(
api_key="ramjet_abc123",
cache_size="500GB",
port=9001,
)
ramjetio.CachedDataset¶
PyTorch Dataset wrapper with distributed caching.
ramjetio.CachedDataset(
dataset: Dataset, # Original PyTorch dataset
cache: DistributedCache = None, # Cache instance (default: global cache)
key_fn: Callable[[int], str] = None, # Key generation function
cache_on_miss: bool = True, # Cache items on miss
ttl: int = None, # Time-to-live in seconds
transform_before_cache: Callable = None, # Transform to apply before caching
)
Example:
from torch.utils.data import DataLoader
import ramjetio
ramjetio.init()
# Basic usage
dataset = ramjetio.CachedDataset(YourDataset())
# With custom key function
dataset = ramjetio.CachedDataset(
YourDataset(),
key_fn=lambda idx: f"sample_{idx}_v2",
ttl=86400, # 24 hours
)
loader = DataLoader(dataset, batch_size=32)
Methods:
- get_cache_stats() -> dict — Get hit/miss statistics
- reset_stats() — Reset statistics counters
ramjetio.DistributedCache¶
Low-level cache client.
ramjetio.DistributedCache(
nodes: List[str] = None, # Node addresses (default: from dashboard)
timeout: float = 30.0, # Request timeout
retry_attempts: int = 3, # Retry failed requests
)
Methods:
cache = ramjetio.DistributedCache()
# Store data
cache.set(key: str, value: Any, ttl: int = None) -> bool
# Retrieve data
cache.get(key: str) -> Optional[Any]
# Delete data
cache.delete(key: str) -> bool
# Check existence
cache.exists(key: str) -> bool
# Get statistics
cache.stats() -> Dict[str, Dict]
# Clear all data
cache.clear() -> bool
Example:
import torch
from ramjetio import DistributedCache
cache = DistributedCache()
# Store tensor
tensor = torch.randn(100, 100)
cache.set("my_tensor", tensor)
# Retrieve
loaded = cache.get("my_tensor")
assert torch.allclose(tensor, loaded)
# With TTL (expires in 1 hour)
cache.set("temp_data", data, ttl=3600)
ramjetio.get_data_source()¶
Get data source configuration from dashboard.
Returns DataSource with:
ds.type # "s3", "http", "local"
ds.endpoint # S3 endpoint URL
ds.bucket # S3 bucket name
ds.access_key # S3 access key
ds.secret_key # S3 secret key
ds.region # S3 region (optional)
ds.use_ssl # Use SSL for S3
ds.path_style # Use path-style URLs (for MinIO)
ds.url # HTTP URL (if type="http")
ds.auth_header # HTTP auth header
ds.prefix # Path prefix
Example:
import ramjetio
import s3fs
ramjetio.init()
ds = ramjetio.get_data_source()
if ds.type == "s3":
fs = s3fs.S3FileSystem(
endpoint_url=ds.endpoint,
key=ds.access_key,
secret=ds.secret_key,
)
files = fs.ls(f"{ds.bucket}/{ds.prefix}")
ramjet.get_cluster_nodes()¶
Get list of all nodes in the cluster.
Returns:
[
{"hostname": "node0", "port": 9000, "status": "online"},
{"hostname": "node1", "port": 9000, "status": "online"},
]
ramjet.log_metrics()¶
Log training metrics to dashboard.
ramjet.log_metrics(
loss: float = None,
epoch: int = None,
step: int = None,
learning_rate: float = None,
throughput: float = None, # samples/sec
**kwargs, # Any additional metrics
)
Example:
for epoch in range(100):
for step, batch in enumerate(loader):
loss = train_step(batch)
ramjet.log_metrics(
loss=loss.item(),
epoch=epoch,
step=step,
learning_rate=optimizer.param_groups[0]['lr'],
)
Environment Variables¶
| Variable | Description | Default |
|---|---|---|
RAMJET_API_KEY |
API key for authentication | Required |
RAMJET_CACHE_PATH |
Local cache directory | /tmp/ramjet_cache |
RAMJET_CACHE_SIZE |
Maximum cache size | 100GB |
RAMJET_PORT |
Cache server port | 9000 |
RAMJET_NODE_NAME |
Node identifier | hostname |
RAMJET_BACKEND_HOST |
Backend host | api.ramjet.io |
RAMJET_BACKEND_PORT |
Backend port | 443 |
RAMJET_LOG_LEVEL |
Logging level | INFO |
CLI Commands¶
ramjet-server¶
Start cache server daemon.
ramjet-server [OPTIONS]
Options:
--host TEXT Host to bind to [default: 0.0.0.0]
--port INTEGER Port to bind to [default: 9000]
--storage-path Path for cache storage [default: /tmp/ramjet_cache]
--capacity TEXT Maximum cache size [default: 100GB]
--log-level TEXT Logging level [default: INFO]
ramjet-client¶
CLI client for cache operations.