04.0 // Documentation v1.8.4 Last updated 2026-06-05

Cost Optimization

Practical ways to lower your machine.dev bill without slowing down. Every example below uses real per-minute rates.

How machine.dev bills you

  • Per-minute, in US dollars. No credits.
  • You only pay for runtime. Provisioning and teardown are free.
  • Spot rates are 74-89% cheaper than on-demand for most runners (55% for A10G).
  • Storage (EBS) is billed separately, prorated to runtime. Defaults are minimal (~$0.006 for a 30-min job).
  • The dashboard shows dollar spend by default (app.machine.dev). Toggle to credit-style view in Settings if you prefer.

1. Use spot when you can

Spot instances offer the biggest single saving. Real numbers from current pricing:

Runner               Spot $/min   On-demand $/min   Savings
T4G GPU (4 vCPU)     $0.00351     $0.01400          75%
T4 GPU (4 vCPU)      $0.00449     $0.01753          74%
L4 GPU (4 vCPU)      $0.00575     $0.02683          79%
A10G GPU (4 vCPU)    $0.01526     $0.03353          55%
L40S GPU (4 vCPU)    $0.01572     $0.06203          75%
CPU 16 vCPU X64      $0.00255     $0.02380          89%
CPU 16 vCPU ARM64    $0.00207     $0.01927          89%

A 2-hour L40S fine-tune drops from $7.44 on-demand to $1.89 on spot. That’s $5.55 saved per run.
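The arithmetic behind that figure can be sketched in a few lines of Python. This is an illustrative calculation only: the function names are hypothetical, the rates are copied from the table above, and rounding each cost to the nearest cent is an assumption rather than documented billing behavior.

```python
def job_cost(rate_per_min: float, minutes: float) -> float:
    """Cost in dollars for a job billed per minute, rounded to cents (assumed)."""
    return round(rate_per_min * minutes, 2)


def spot_savings(spot_rate: float, on_demand_rate: float, minutes: float):
    """Return (on_demand_cost, spot_cost, dollars_saved, percent_saved)."""
    on_demand = job_cost(on_demand_rate, minutes)
    spot = job_cost(spot_rate, minutes)
    saved = round(on_demand - spot, 2)
    percent = 100 * saved / on_demand
    return on_demand, spot, saved, percent


# L40S (4 vCPU) rates from the table above, for a 2-hour (120-minute) job.
on_demand, spot, saved, percent = spot_savings(0.01572, 0.06203, 120)
print(f"on-demand ${on_demand:.2f}, spot ${spot:.2f}, saved ${saved:.2f} ({percent:.0f}%)")
```

Swap in the rates for your own runner type and expected runtime to estimate the saving before switching tenancy.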

Add tenancy=spot to your runs-on label:

runs-on: machine/gpu=l4/tenancy=spot

Spot interruption rates per runner type are visible on the pricing page.

2. Make spot interruptions safe with checkpointing

Spot instances can be reclaimed by AWS at any time. Checkpoint your work to a durable store so you can resume from where you left off on a fresh instance.

The cleanest pattern uses the Hugging Face Hub as the checkpoint store:

import torch
from huggingface_hub import HfApi, hf_hub_download

def save_checkpoint(model, optimizer, epoch, step, repo_id):
    checkpoint = {
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'epoch': epoch,
        'step': step,
    }
    torch.save(checkpoint, 'checkpoint.pt')

    HfApi().upload_file(
        path_or_fileobj='checkpoint.pt',
        path_in_repo='checkpoint.pt',
        repo_id=repo_id,
        repo_type='model',
    )

Then on job start, try to resume:

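from huggingface_hub.utils import EntryNotFoundError, RepositoryNotFoundError

def load_checkpoint(model, optimizer, repo_id):
    try:
        # hf_hub_download returns the local path of the downloaded file
        path = hf_hub_download(repo_id=repo_id, filename='checkpoint.pt', local_dir='.')
    except (EntryNotFoundError, RepositoryNotFoundError):
        # No checkpoint exists yet: start from scratch.
        # Other errors (network, auth) propagate so a failed download
        # does not silently restart training from step 0.
        return 0, 0
    # Load onto CPU first; load_state_dict copies values to the right device
    ckpt = torch.load(path, map_location='cpu')
    model.load_state_dict(ckpt['model_state_dict'])
    optimizer.load_state_dict(ckpt['optimizer_state_dict'])
    return ckpt['epoch'], ckpt['step']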

Combine this with a workflow that retries itself on spot interruption. See LLM Supervised Fine-Tuning and GRPO Fine-Tuning for full working examples.
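How often to checkpoint is a trade-off between upload overhead and the work lost on an interruption. A minimal sketch of a resumable loop follows, assuming a hypothetical run_training helper and a default interval of 500 steps (both are illustrative, not part of machine.dev or the linked examples). The train_step and save_fn callables stand in for your own training step and for a wrapper around save_checkpoint above.

```python
from typing import Callable


def run_training(
    total_steps: int,
    start_step: int,
    train_step: Callable[[int], None],
    save_fn: Callable[[int], None],
    checkpoint_every: int = 500,  # assumed interval; tune to your step time
) -> int:
    """Run steps [start_step, total_steps) and checkpoint periodically.

    save_fn receives the number of completed steps, so a resumed job can
    pass that value straight back in as start_step.
    """
    last_saved = start_step
    for step in range(start_step, total_steps):
        train_step(step)
        completed = step + 1
        # Save on the interval, and always once more at the very end.
        if completed % checkpoint_every == 0 or completed == total_steps:
            save_fn(completed)
            last_saved = completed
    return last_saved


# Simulated run: record which step counts would be checkpointed.
saved_steps = []
run_training(1200, 0, train_step=lambda s: None, save_fn=saved_steps.append)
print(saved_steps)
```

If the instance is reclaimed between checkpoints, at most checkpoint_every steps are repeated on the next attempt, which starts from the step returned by load_checkpoint.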

3. Right-size the GPU

Don’t pay for L40S when L4 will do the job. Quick mapping:

Workload                  Recommended GPU   $/min (spot)
Inference, ≤7B model      T4G               $0.004
Inference, 7-13B          L4                $0.006
Inference, 30B+           L40S              $0.016
QLoRA fine-tune ≤13B      T4G               $0.004
QLoRA fine-tune 30B       L4                $0.006
QLoRA fine-tune 70B       L40S              $0.016
LoRA fine-tune 7B         L4 or A10G        $0.006-$0.011
LoRA fine-tune 13B        L40S              $0.016
Real-time CV              A10G              $0.011-$0.019
Inference at scale        Inferentia2       $0.003

See CPU vs GPU for the full decision tree.

Use metrics to right-size

After a job completes, check the GPU utilization metrics on the machine.dev dashboard. If GPU utilization is consistently under 50%, you’re paying for more GPU than you need. Drop to a cheaper tier. For example, an L4 job that only hits 30% GPU utilization will likely be fine on a T4 at $0.004/min.
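The "consistently under 50%" check can be made mechanical if you have the utilization samples as plain numbers. The sketch below is one possible reading of that rule: the is_oversized helper is hypothetical, and treating "consistently" as "at least 90% of samples" is an assumption you may want to adjust. It does not call any machine.dev API; the sample values are typed in by hand.

```python
def is_oversized(gpu_util_samples, threshold=50.0, min_fraction_below=0.9):
    """Return True if most utilization samples fall below the threshold.

    gpu_util_samples: GPU utilization percentages (0-100) from one job.
    threshold: the 50% cut-off from the guidance above.
    min_fraction_below: share of samples that must be under the threshold
        to count as "consistently" low (assumed value, not a documented rule).
    """
    if not gpu_util_samples:
        raise ValueError("need at least one utilization sample")
    below = sum(1 for u in gpu_util_samples if u < threshold)
    return below / len(gpu_util_samples) >= min_fraction_below


# A job hovering around 30% is a candidate for a cheaper tier.
print(is_oversized([30, 28, 35, 31, 29]))
# A job with a brief warm-up dip but otherwise busy is not.
print(is_oversized([20, 85, 90, 88, 92]))
```

Judging by the share of low samples rather than the average avoids flagging jobs that idle briefly during data loading but saturate the GPU for the rest of the run.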

4. Right-size CPU and RAM

Each GPU is offered in 3 vCPU/RAM configurations. The 4-vCPU config is cheapest. Only step up if your data preprocessing is the bottleneck.

runs-on: machine/gpu=l4              # 4 vCPU, 16 GB RAM (default, cheapest)
runs-on: machine/gpu=l4/cpu=16       # 16 vCPU, 64 GB RAM (more $)

Apply the same metrics check here. After a run, look at CPU and memory utilization on the dashboard. If both stay low, drop to the smaller vCPU config for that GPU.

5. Right-size storage

Default storage (100 GB / 6,000 IOPS / 250 MB/s) is included at no extra charge. Don’t request more unless you need it. IOPS above 6,000 and throughput above 250 MB/s incur prorated EBS charges.

runs-on: machine/gpu=l4   # Default: 100GB, 6000 IOPS, 250 MB/s — sufficient for most workloads

For data-heavy jobs that genuinely need more:

runs-on: machine/gpu=l4/disk_size=500/disk_iops=10000/disk_throughput=750
#  disk_size=500       → only increase if you need the space
#  disk_iops=10000     → only increase for I/O-bound workloads
#  disk_throughput=750 → only increase for sequential read/write bound jobs

A 60-minute job at 500 GB / 10,000 IOPS / 750 MB/s costs about $0.11 in storage on top of compute. See Pricing for the full breakdown.

6. Open up regions

Spot prices vary by region. Listing multiple regions lets machine.dev pick the cheapest one available, so there is no need to pin a single region:

runs-on: machine/gpu=l4/tenancy=spot/regions=us-east-1,us-east-2,eu-south-2

For most CPU and L4 workloads, spot prices in eu-south-2 are lower than US regions. For on-demand, us-east-1 is usually cheapest.

7. Cache aggressively

Cache pip, npm, Hugging Face downloads, and Docker layers to skip re-downloading on every run:

- uses: actions/cache@v4
  with:
    path: |
      ~/.cache/pip
      ~/.cache/huggingface
    key: deps-${{ hashFiles('requirements.txt') }}

8. Use workflow filters

Don’t trigger expensive GPU jobs on every commit. Filter by file path:

on:
  push:
    paths:
      - 'model/**'
      - 'data/**'
      - 'requirements.txt'

Or gate behind a CPU-only check job:

jobs:
  changed:
    runs-on: ubuntu-latest
    outputs:
      train: ${{ steps.check.outputs.train }}
    steps:
      - id: check
        # Placeholder: replace with a real check that decides whether training is needed
        run: echo "train=true" >> "$GITHUB_OUTPUT"

  train:
    needs: changed
    if: ${{ needs.changed.outputs.train == 'true' }}
    runs-on: machine/gpu=l4/tenancy=spot
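The check step above hard-codes train=true. One way to make it a real gate is a small script that inspects the changed files and writes the result to the file named by GITHUB_OUTPUT. The following is a sketch under stated assumptions: the function names, the BASE_SHA environment variable, and the path rules (mirroring the paths filter shown earlier) are all hypothetical. Comparing against HEAD~1 also assumes the checkout fetched at least two commits.

```python
import os
import subprocess

# Paths that should trigger a training run (mirrors the paths filter above).
TRAIN_PREFIXES = ("model/", "data/")
TRAIN_FILES = {"requirements.txt"}


def should_train(changed_paths):
    """True if any changed path is training-relevant."""
    return any(
        p in TRAIN_FILES or p.startswith(TRAIN_PREFIXES) for p in changed_paths
    )


def changed_files(base, head="HEAD"):
    """List files changed between two commits using git diff."""
    result = subprocess.run(
        ["git", "diff", "--name-only", base, head],
        check=True, capture_output=True, text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def write_output(name, value, path=None):
    """Append name=value to the GitHub Actions step output file."""
    path = path or os.environ["GITHUB_OUTPUT"]
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"{name}={value}\n")


# Only emit an output when running inside GitHub Actions.
if __name__ == "__main__" and "GITHUB_OUTPUT" in os.environ:
    base = os.environ.get("BASE_SHA", "HEAD~1")  # BASE_SHA is a hypothetical variable
    decision = "true" if should_train(changed_files(base)) else "false"
    write_output("train", decision)
```

Saved as, say, a script in the repository and invoked from the check step in place of the echo, this sets the train output that the train job's if condition reads.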

9. Set timeouts

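A hung job keeps billing until it is killed, so set timeout-minutes as a fail-safe. When it is unset, GitHub Actions defaults to 360 minutes. The 120-minute limit below caps an L40S spot job at about $1.89: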

jobs:
  train:
    runs-on: machine/gpu=l40s/tenancy=spot
    timeout-minutes: 120

Monitoring spend

The machine.dev dashboard shows per-job dollar cost, plus daily and monthly aggregates. Built-in metrics (CPU, memory, disk, network, GPU utilization) appear as sparkline charts on every job page so you can see if you’re under-utilizing the runner you’re paying for.

Use the dashboard to identify which workflows and repos account for most of your spend.

Next steps