Practical strategies to lower your machine.dev bill without sacrificing speed. Every example below uses real per-minute rates.
## How machine.dev bills you
- Per-minute, in US dollars. No credits.
- You only pay for runtime. Provisioning and teardown are free.
- Spot rates are 70–90% cheaper than on-demand.
- Storage (EBS) is billed separately, prorated to runtime. Defaults are minimal (~$0.006 for a 30-min job).
- The dashboard shows dollar spend by default (app.machine.dev). Toggle to credit-style view in Settings if you prefer.
## 1. Use spot when you can
Spot instances offer the biggest single saving. Real numbers from current pricing:
| Runner | Spot $/min | On-demand $/min | Savings |
|---|---|---|---|
| T4G GPU (4 vCPU) | $0.00351 | $0.01400 | 75% |
| T4 GPU (4 vCPU) | $0.00449 | $0.01753 | 74% |
| L4 GPU (4 vCPU) | $0.00575 | $0.02683 | 79% |
| A10G GPU (4 vCPU) | $0.01526 | $0.03353 | 55% |
| L40S GPU (4 vCPU) | $0.01572 | $0.06203 | 75% |
| CPU 16 vCPU X64 | $0.00255 | $0.02380 | 89% |
| CPU 16 vCPU ARM64 | $0.00207 | $0.01927 | 89% |
A 2-hour L40S fine-tune drops from $7.44 on-demand to $1.89 on spot — a $5.55 savings per run.
Add `tenancy=spot` to your `runs-on`:

```yaml
runs-on: [machine, gpu=l4, tenancy=spot]
```
Spot interruption rates per runner type are visible on machine.dev/runners.
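To estimate what spot saves on a particular job, multiply the per-minute rate by expected runtime. A quick sketch with rates hardcoded from the table above (`job_cost` and the `RATES` dict are illustrative helpers, not a machine.dev API — check machine.dev/runners for current prices):

```python
# Per-minute rates (USD) copied from the table above; illustrative only
RATES = {
    'l40s': {'spot': 0.01572, 'on-demand': 0.06203},
    'l4':   {'spot': 0.00575, 'on-demand': 0.02683},
}

def job_cost(runner, tenancy, minutes):
    """Rough job cost in dollars, rounded to the cent."""
    return round(RATES[runner][tenancy] * minutes, 2)

on_demand = job_cost('l40s', 'on-demand', 120)   # 7.44
spot = job_cost('l40s', 'spot', 120)             # 1.89
print(f'saves ${on_demand - spot:.2f} per run')  # saves $5.55 per run
```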
## 2. Make spot interruptions safe with checkpointing
Spot instances can be reclaimed by AWS at any time. Checkpoint your work to a durable store so you can resume from where you left off on a fresh instance.
The cleanest pattern uses the Hugging Face Hub as the checkpoint store:
```python
import torch
from huggingface_hub import HfApi, hf_hub_download

def save_checkpoint(model, optimizer, epoch, step, repo_id):
    checkpoint = {
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'epoch': epoch,
        'step': step,
    }
    torch.save(checkpoint, 'checkpoint.pt')
    # Upload to the Hub so the checkpoint survives a spot interruption
    HfApi().upload_file(
        path_or_fileobj='checkpoint.pt',
        path_in_repo='checkpoint.pt',
        repo_id=repo_id,
        repo_type='model',
    )
```
Then on job start, try to resume:
```python
def load_checkpoint(model, optimizer, repo_id):
    try:
        hf_hub_download(repo_id=repo_id, filename='checkpoint.pt', local_dir='.')
        ckpt = torch.load('checkpoint.pt')
        model.load_state_dict(ckpt['model_state_dict'])
        optimizer.load_state_dict(ckpt['optimizer_state_dict'])
        return ckpt['epoch'], ckpt['step']
    except Exception:
        # No checkpoint yet (first run) — start from scratch
        return 0, 0
```
Combine this with a workflow that retries itself on spot interruption — see LLM Supervised Fine-Tuning and GRPO Fine-Tuning for full working examples.
## 3. Right-size the GPU
Don’t pay for an L40S when an L4 will do the job. Quick mapping:
| Workload | Recommended GPU | $/min (spot) |
|---|---|---|
| Inference, ≤7B model | T4G | $0.004 |
| Inference, 7–13B | L4 | $0.006 |
| Inference, 30B+ | L40S | $0.016 |
| QLoRA fine-tune ≤13B | T4G | $0.004 |
| QLoRA fine-tune 30B | L4 | $0.006 |
| QLoRA fine-tune 70B | L40S | $0.016 |
| LoRA fine-tune 7B | L4 or A10G | $0.006–$0.011 |
| LoRA fine-tune 13B | L40S | $0.016 |
| Real-time CV | A10G | $0.011–$0.019 |
| Inference at scale | Inferentia2 | $0.003 |
See CPU vs GPU for the full decision tree.
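If you select runners programmatically, the mapping above can be encoded as a small helper. An illustrative sketch of the table, not a machine.dev API (the boundaries between the table's rows are assumptions):

```python
# Thresholds are in billions of parameters
def recommend_gpu(workload, params_b):
    if workload == 'inference':
        if params_b <= 7:
            return 'T4G'
        return 'L4' if params_b <= 13 else 'L40S'
    if workload == 'qlora':
        if params_b <= 13:
            return 'T4G'
        return 'L4' if params_b <= 30 else 'L40S'
    if workload == 'lora':
        # A10G also works for ~7B LoRA runs
        return 'L4' if params_b <= 7 else 'L40S'
    raise ValueError(f'unknown workload: {workload!r}')
```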
### Use metrics to right-size
After a job completes, check the GPU utilization metrics on the machine.dev dashboard. If your GPU utilization is consistently low (e.g., under 50%), you’re likely paying for more GPU than you need — drop to a cheaper tier. For example, if your L4 job only hits 30% GPU utilization, a T4G at $0.004/min may handle the same workload.
## 4. Right-size CPU and RAM
Each GPU is offered in 3 vCPU/RAM configurations. The 4-vCPU config is cheapest. Only step up if your data preprocessing is the bottleneck.
```yaml
runs-on: [machine, gpu=l4]          # 4 vCPU, 16 GB RAM (default, cheapest)
runs-on: [machine, gpu=l4, cpu=16]  # 16 vCPU, 64 GB RAM (more $)
```
The same applies to CPU and RAM — check the metrics after a run. If CPU utilization is low, you may be able to use the smaller vCPU config for that GPU and save money.
## 5. Right-size storage
Default storage (100 GB / 6,000 IOPS / 250 MB/s) costs only about $0.006 for a typical 30-minute job. Don’t request more unless you need it — larger disks, IOPS above 6,000, and throughput above 250 MB/s all incur prorated EBS charges.
```yaml
runs-on:
  - machine
  - gpu=l4
  # Default: 100 GB, 6,000 IOPS, 250 MB/s — sufficient for most workloads
```
For data-heavy jobs that genuinely need more:
```yaml
runs-on:
  - machine
  - gpu=l4
  - disk_size=500        # Only increase if you need the space
  - disk_iops=10000      # Only increase for I/O-bound workloads
  - disk_throughput=750  # Only increase for sequential read/write bound jobs
```
A 60-minute job at 500 GB / 10,000 IOPS / 750 MB/s costs about $0.11 in storage on top of compute. See Pricing for the full breakdown.
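As a rough mental model, a sketch of how that prorated charge comes together. The rates below are assumptions based on public gp3-style pricing ($0.08 per GB-month, $0.005 per provisioned IOPS-month above the included 6,000, $0.04 per MB/s-month above the included 250, prorated over a 720-hour month); they reproduce the two figures quoted in this guide, but the Pricing page is authoritative:

```python
def storage_cost(size_gb, iops, throughput_mbps, minutes):
    """Estimated prorated EBS cost (USD) for one job. Assumed rates."""
    monthly = (
        size_gb * 0.08                          # volume size
        + max(0, iops - 6000) * 0.005           # IOPS above included 6,000
        + max(0, throughput_mbps - 250) * 0.04  # MB/s above included 250
    )
    return monthly * minutes / (720 * 60)       # prorate over a 720-hour month
```

For example, `storage_cost(500, 10000, 750, 60)` lands near the $0.11 figure above, and the default config for a 30-minute job comes out around $0.006.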
## 6. Open up regions
Spot prices vary by region. Specifying multiple regions lets machine.dev pick the cheapest available — no need to lock in:
```yaml
runs-on:
  - machine
  - gpu=l4
  - tenancy=spot
  - regions=us-east-1,us-east-2,eu-south-2
```
For most CPU and L4 workloads, spot prices in eu-south-2 are lower than US regions. For on-demand, us-east-1 is usually cheapest.
## 7. Cache aggressively
Cache pip, npm, Hugging Face downloads, and Docker layers to skip re-downloading on every run:
```yaml
- uses: actions/cache@v4
  with:
    path: |
      ~/.cache/pip
      ~/.cache/huggingface
    key: deps-${{ hashFiles('requirements.txt') }}
    restore-keys: deps-   # fall back to the newest partial match
```
## 8. Use workflow filters
Don’t trigger expensive GPU jobs on every commit. Filter by file path:
```yaml
on:
  push:
    paths:
      - 'model/**'
      - 'data/**'
      - 'requirements.txt'
```
Or gate behind a CPU-only check job:
```yaml
jobs:
  changed:
    runs-on: ubuntu-latest
    outputs:
      train: ${{ steps.check.outputs.train }}
    steps:
      - id: check
        # Placeholder — replace with real change detection (e.g. a git diff
        # or paths-filter step) that emits train=true only when needed
        run: echo "train=true" >> $GITHUB_OUTPUT
  train:
    needs: changed
    if: ${{ needs.changed.outputs.train == 'true' }}
    runs-on: [machine, gpu=l4, tenancy=spot]
```
## 9. Set timeouts
Fail-safe against runaway costs:
```yaml
jobs:
  train:
    runs-on: [machine, gpu=l40s, tenancy=spot]
    timeout-minutes: 120
```
## Monitoring spend
The machine.dev dashboard shows per-job dollar cost, plus daily and monthly aggregates. Built-in metrics (CPU, memory, disk, network, GPU utilization) appear as sparkline charts on every job page so you can see if you’re under-utilizing the runner you’re paying for.
Use the dashboard to spot which workflows or which repos are eating your budget.
## Next steps
- CPU vs GPU — pick the right runner type
- Pricing — full per-minute rates and EBS pricing
- LLM Supervised Fine-Tuning — checkpointing pattern in action
- GRPO Fine-Tuning — spot-resilient training