Practical ways to lower your machine.dev bill without slowing down. Every example below uses real per-minute rates.
How machine.dev bills you
- Per-minute, in US dollars. No credits.
- You only pay for runtime. Provisioning and teardown are free.
- Spot rates are typically 70-90% cheaper than on-demand (the A10G in the table below is an exception at about 55%).
- Storage (EBS) is billed separately, prorated to runtime. Defaults are minimal (~$0.006 for a 30-min job).
- The dashboard (app.machine.dev) shows dollar spend by default. Switch to a credit-style view in Settings if you prefer.
1. Use spot when you can
Spot instances offer the biggest single saving. Real numbers from current pricing:
| Runner | Spot $/min | On-demand $/min | Savings |
|---|---|---|---|
| T4G GPU (4 vCPU) | $0.00351 | $0.01400 | 75% |
| T4 GPU (4 vCPU) | $0.00449 | $0.01753 | 74% |
| L4 GPU (4 vCPU) | $0.00575 | $0.02683 | 79% |
| A10G GPU (4 vCPU) | $0.01526 | $0.03353 | 54% |
| L40S GPU (4 vCPU) | $0.01572 | $0.06203 | 75% |
| CPU 16 vCPU X64 | $0.00255 | $0.02380 | 89% |
| CPU 16 vCPU ARM64 | $0.00207 | $0.01927 | 89% |
A 2-hour L40S fine-tune drops from $7.44 on-demand to $1.89 on spot. That’s $5.55 saved per run.
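The arithmetic behind these figures is the per-minute rate multiplied by runtime. A minimal sketch using the L40S rates from the table above; the function names are illustrative and not part of any machine.dev tooling:

```python
def job_cost(rate_per_min: float, minutes: float) -> float:
    """Compute cost of a job in dollars, rounded to cents."""
    return round(rate_per_min * minutes, 2)


def spot_savings_pct(spot_rate: float, on_demand_rate: float) -> int:
    """Percentage saved by choosing spot, rounded to a whole percent."""
    return round((1 - spot_rate / on_demand_rate) * 100)


# L40S rates in $/min, copied from the table above.
L40S_SPOT = 0.01572
L40S_ON_DEMAND = 0.06203

on_demand = job_cost(L40S_ON_DEMAND, 120)  # 7.44
spot = job_cost(L40S_SPOT, 120)            # 1.89
print(f"on-demand ${on_demand:.2f}, spot ${spot:.2f}, "
      f"{spot_savings_pct(L40S_SPOT, L40S_ON_DEMAND)}% cheaper")
```

Swap in the rates for your own runner and expected runtime to estimate a job before you launch it. Storage charges are not included in this sketch.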
Add tenancy=spot to your runs-on value:
```yaml
runs-on: machine/gpu=l4/tenancy=spot
```
Spot interruption rates per runner type are visible on the pricing page.
2. Make spot interruptions safe with checkpointing
Spot instances can be reclaimed by AWS at any time. Checkpoint your work to a durable store so you can resume from where you left off on a fresh instance.
The cleanest pattern uses the Hugging Face Hub as the checkpoint store:
```python
import torch
from huggingface_hub import HfApi, hf_hub_download


def save_checkpoint(model, optimizer, epoch, step, repo_id):
    """Save training state locally, then upload it to the Hub repo."""
    checkpoint = {
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'epoch': epoch,
        'step': step,
    }
    torch.save(checkpoint, 'checkpoint.pt')
    # Uploading to the same path replaces the previous checkpoint in the repo.
    HfApi().upload_file(
        path_or_fileobj='checkpoint.pt',
        path_in_repo='checkpoint.pt',
        repo_id=repo_id,
        repo_type='model',
    )
```
Then on job start, try to resume:
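```python
def load_checkpoint(model, optimizer, repo_id):
    """Restore the latest checkpoint; return (epoch, step), or (0, 0) if none loads."""
    try:
        path = hf_hub_download(repo_id=repo_id, filename='checkpoint.pt', local_dir='.')
        # Load onto CPU first so the file opens on any runner type.
        ckpt = torch.load(path, map_location='cpu')
        model.load_state_dict(ckpt['model_state_dict'])
        optimizer.load_state_dict(ckpt['optimizer_state_dict'])
        return ckpt['epoch'], ckpt['step']
    except Exception as err:
        # Any failure (no checkpoint yet, network or auth error) means a fresh start, so log why.
        print(f'No checkpoint loaded ({err}); starting from scratch.')
        return 0, 0
```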
Combine this with a workflow that retries itself on spot interruption. See LLM Supervised Fine-Tuning and GRPO Fine-Tuning for full working examples.
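To show how the resume point feeds the training loop, here is a minimal sketch that runs without PyTorch or the Hub. A local JSON file stands in for the checkpoint store, and the checkpoint interval, the function names, and the simulated interruption are all assumptions made for illustration:

```python
import json
import os
import tempfile
from typing import Optional

CHECKPOINT_EVERY = 100  # steps between checkpoints (illustrative value)


def load_step(path: str) -> int:
    """Return the last checkpointed step, or 0 when no checkpoint exists."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["step"]


def save_step(path: str, step: int) -> None:
    """Write to a temp file and rename it, so a reclaim mid-write cannot corrupt the checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step}, f)
    os.replace(tmp, path)


def train(path: str, total_steps: int, interrupt_at: Optional[int] = None) -> int:
    """Run from the last checkpoint up to total_steps; optionally simulate a reclaim."""
    step = load_step(path)
    while step < total_steps:
        if interrupt_at is not None and step == interrupt_at:
            return step  # simulated interruption: work since the last checkpoint is lost
        step += 1  # one real training step would run here
        if step % CHECKPOINT_EVERY == 0:
            save_step(path, step)
    return step


ckpt = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
train(ckpt, total_steps=500, interrupt_at=250)  # first attempt is reclaimed at step 250
resumed_from = load_step(ckpt)                  # 200, the last step that was saved
final = train(ckpt, total_steps=500)            # the retry finishes the remaining steps
print(resumed_from, final)
```

The gap between the interruption (step 250) and the resume point (step 200) is the work you repeat after a reclaim. Checkpointing more often shrinks that gap at the cost of more uploads.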
3. Right-size the GPU
Don’t pay for L40S when L4 will do the job. Quick mapping:
| Workload | Recommended GPU | $/min (spot) |
|---|---|---|
| Inference, ≤7B model | T4G | $0.004 |
| Inference, 7-13B | L4 | $0.006 |
| Inference, 30B+ | L40S | $0.016 |
| QLoRA fine-tune ≤13B | T4G | $0.004 |
| QLoRA fine-tune 30B | L4 | $0.006 |
| QLoRA fine-tune 70B | L40S | $0.016 |
| LoRA fine-tune 7B | L4 or A10G | $0.006-$0.011 |
| LoRA fine-tune 13B | L40S | $0.016 |
| Real-time CV | A10G | $0.011-$0.019 |
| Inference at scale | Inferentia2 | $0.003 |
See CPU vs GPU for the full decision tree.
Use metrics to right-size
After a job completes, check the GPU utilization metrics on the machine.dev dashboard. If GPU utilization is consistently under 50%, you’re paying for more GPU than you need. Drop to a cheaper tier. For example, an L4 job that only hits 30% GPU utilization will likely be fine on a T4 at $0.004/min.
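The rule of thumb above can be sketched as a small check over utilization samples. This assumes you have copied GPU utilization percentages from the dashboard into a list; reading "consistently" as 95% of samples is an assumption, and the sample data is made up for illustration:

```python
from typing import Sequence


def should_downsize(samples: Sequence[float], threshold: float = 50.0,
                    min_fraction_below: float = 0.95) -> bool:
    """True when nearly all utilization samples sit below the threshold."""
    if not samples:
        return False  # no data, no recommendation
    below = sum(1 for s in samples if s < threshold)
    return below / len(samples) >= min_fraction_below


steady_low = [28, 31, 30, 33, 29, 27, 32, 30]  # hovers around 30%
bursty = [20, 25, 95, 98, 22, 97, 24, 96]      # idle between heavy bursts

print(should_downsize(steady_low))  # True
print(should_downsize(bursty))      # False
```

Checking the fraction of samples rather than the average keeps a bursty job, which does saturate the GPU at times, from being flagged for a smaller tier.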
4. Right-size CPU and RAM
Each GPU is offered in 3 vCPU/RAM configurations. The 4-vCPU config is cheapest. Only step up if your data preprocessing is the bottleneck.
```yaml
runs-on: machine/gpu=l4         # 4 vCPU, 16 GB RAM (default, cheapest)
runs-on: machine/gpu=l4/cpu=16  # 16 vCPU, 64 GB RAM (more $)
```
The metrics check from section 3 applies to CPU and RAM as well. If CPU utilization is low after a run, drop to the smaller vCPU config for that GPU.
5. Right-size storage
Default storage (100 GB / 6,000 IOPS / 250 MB/s) is included at no extra charge. Don’t request more unless you need it. IOPS above 6,000 and throughput above 250 MB/s incur prorated EBS charges.
```yaml
runs-on: machine/gpu=l4  # Default: 100 GB, 6,000 IOPS, 250 MB/s (sufficient for most workloads)
```
For data-heavy jobs that genuinely need more:
```yaml
runs-on: machine/gpu=l4/disk_size=500/disk_iops=10000/disk_throughput=750
# disk_size=500       → only increase if you need the space
# disk_iops=10000     → only increase for I/O-bound workloads
# disk_throughput=750 → only increase for sequential read/write bound jobs
```
A 60-minute job at 500 GB / 10,000 IOPS / 750 MB/s costs about $0.11 in storage on top of compute. See Pricing for the full breakdown.
6. Open up regions
Spot prices vary by region. Specifying multiple regions lets machine.dev pick the cheapest available. No need to lock in:
```yaml
runs-on: machine/gpu=l4/tenancy=spot/regions=us-east-1,us-east-2,eu-south-2
```
For most CPU and L4 workloads, spot prices in eu-south-2 are lower than US regions. For on-demand, us-east-1 is usually cheapest.
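If you generate workflows from a script, the labels used in this guide can be assembled programmatically. The helper below is hypothetical and not part of machine.dev; it only reproduces the `key=value` segments shown in the examples above, in the order they appear there:

```python
from typing import Optional, Sequence


def build_runs_on(gpu: str, *, cpu: Optional[int] = None,
                  tenancy: Optional[str] = None,
                  regions: Sequence[str] = ()) -> str:
    """Join the chosen options into a runs-on label, omitting anything left at its default."""
    parts = ["machine", f"gpu={gpu}"]
    if cpu is not None:
        parts.append(f"cpu={cpu}")
    if tenancy is not None:
        parts.append(f"tenancy={tenancy}")
    if regions:
        parts.append("regions=" + ",".join(regions))
    return "/".join(parts)


label = build_runs_on("l4", tenancy="spot",
                      regions=["us-east-1", "us-east-2", "eu-south-2"])
print(label)  # machine/gpu=l4/tenancy=spot/regions=us-east-1,us-east-2,eu-south-2
```

Leaving options out of the label keeps the cheapest defaults in place, which matches the advice in sections 4 and 5.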
7. Cache aggressively
Cache pip, npm, Hugging Face downloads, and Docker layers to skip re-downloading on every run:
```yaml
- uses: actions/cache@v4
  with:
    path: |
      ~/.cache/pip
      ~/.cache/huggingface
    key: deps-${{ hashFiles('requirements.txt') }}
```
8. Use workflow filters
Don’t trigger expensive GPU jobs on every commit. Filter by file path:
```yaml
on:
  push:
    paths:
      - 'model/**'
      - 'data/**'
      - 'requirements.txt'
```
Or gate behind a CPU-only check job:
```yaml
jobs:
  changed:
    runs-on: ubuntu-latest
    outputs:
      train: ${{ steps.check.outputs.train }}
    steps:
      - id: check
        # Placeholder: always enables training. Replace with a real changed-files check.
        run: echo "train=true" >> "$GITHUB_OUTPUT"
  train:
    needs: changed
    if: ${{ needs.changed.outputs.train == 'true' }}
    runs-on: machine/gpu=l4/tenancy=spot
```
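The check step in the gate job has to decide whether training-relevant files changed. One possible sketch of that decision as a Python script: the pattern list mirrors the path filter shown earlier, the function names are illustrative, and `fnmatch` matching only approximates GitHub's own path-filter semantics:

```python
import fnmatch
import os
import subprocess

TRAIN_PATTERNS = ["model/**", "data/**", "requirements.txt"]


def should_train(paths, patterns=TRAIN_PATTERNS) -> bool:
    """True if any changed path matches any pattern ('*' in fnmatch also matches '/')."""
    return any(fnmatch.fnmatchcase(p, pat) for p in paths for pat in patterns)


def git_changed_files(base: str = "HEAD~1", head: str = "HEAD"):
    """List files changed between two commits; needs a checkout with enough history."""
    result = subprocess.run(["git", "diff", "--name-only", base, head],
                            check=True, capture_output=True, text=True)
    return [line for line in result.stdout.splitlines() if line]


def write_output(train: bool) -> None:
    """Append train=true or train=false to the file named by $GITHUB_OUTPUT."""
    with open(os.environ["GITHUB_OUTPUT"], "a") as f:
        f.write(f"train={'true' if train else 'false'}\n")


# Only touch git and the output file when running inside GitHub Actions.
if __name__ == "__main__" and "GITHUB_OUTPUT" in os.environ:
    write_output(should_train(git_changed_files()))
```

Because this runs on a CPU-only runner, a commit that touches only documentation never starts the GPU job.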
9. Set timeouts
Set a job timeout as a fail-safe against runaway costs:
```yaml
jobs:
  train:
    runs-on: machine/gpu=l40s/tenancy=spot
    timeout-minutes: 120
```
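A timeout doubles as a spending cap, since the worst-case compute cost is the timeout multiplied by the per-minute rate. A minimal sketch using the L40S spot rate from section 1; the helper names and the $5 budget are illustrative, and storage charges are not included:

```python
import math


def worst_case_cost(timeout_minutes: int, rate_per_min: float) -> float:
    """Upper bound on compute cost if the job runs until the timeout."""
    return round(timeout_minutes * rate_per_min, 2)


def timeout_for_budget(budget_usd: float, rate_per_min: float) -> int:
    """Largest whole number of minutes whose compute cost stays within the budget."""
    return math.floor(budget_usd / rate_per_min)


L40S_SPOT = 0.01572  # $/min, from the spot pricing table

print(worst_case_cost(120, L40S_SPOT))      # 1.89
print(timeout_for_budget(5.00, L40S_SPOT))  # 318
```

Working backwards from a budget gives a defensible value for timeout-minutes instead of a guess.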
Monitoring spend
The machine.dev dashboard shows per-job dollar cost, plus daily and monthly aggregates. Built-in metrics (CPU, memory, disk, network, GPU utilization) appear as sparkline charts on every job page so you can see if you’re under-utilizing the runner you’re paying for.
Use the dashboard to spot which workflows or which repos are eating your budget.
Next steps
- CPU vs GPU: pick the right runner type
- Pricing: full per-minute rates and EBS pricing
- LLM Supervised Fine-Tuning: checkpointing pattern in action
- GRPO Fine-Tuning: spot-resilient training