Cost Optimization — machine.dev docs

Practical strategies to lower your machine.dev bill without sacrificing speed. Every example below uses real per-minute rates.

How machine.dev bills you

Per-minute, in US dollars. No credits.
You only pay for runtime. Provisioning and teardown are free.
Spot rates are typically 70–90% cheaper than on-demand (the A10G, at 55%, is the exception in the table below).
Storage (EBS) is billed separately, prorated to runtime. Defaults are minimal (~$0.006 for a 30-min job).
The dashboard shows dollar spend by default (app.machine.dev). Toggle to credit-style view in Settings if you prefer.

1. Use spot when you can

Spot instances offer the biggest single saving. Real numbers from current pricing:

Runner	Spot $/min	On-demand $/min	Savings
T4G GPU (4 vCPU)	$0.00351	$0.01400	75%
T4 GPU (4 vCPU)	$0.00449	$0.01753	74%
L4 GPU (4 vCPU)	$0.00575	$0.02683	79%
A10G GPU (4 vCPU)	$0.01526	$0.03353	55%
L40S GPU (4 vCPU)	$0.01572	$0.06203	75%
CPU 16 vCPU X64	$0.00255	$0.02380	89%
CPU 16 vCPU ARM64	$0.00207	$0.01927	89%

A 2-hour L40S fine-tune drops from $7.44 on-demand to $1.89 on spot — a $5.55 savings per run.
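
The arithmetic behind that figure can be sketched in a few lines of Python. The rates are the 4-vCPU L40S per-minute prices from the table above; the function names are illustrative and not part of any machine.dev tooling.

```python
# Per-minute L40S (4 vCPU) rates copied from the table above.
L40S_SPOT = 0.01572
L40S_ON_DEMAND = 0.06203


def job_cost(minutes: float, rate_per_min: float) -> float:
    """Compute cost in dollars for a job billed per minute of runtime."""
    return minutes * rate_per_min


def spot_savings(minutes: float, spot_rate: float, on_demand_rate: float) -> float:
    """Dollars saved by running the same job on spot instead of on-demand."""
    return job_cost(minutes, on_demand_rate) - job_cost(minutes, spot_rate)


on_demand = job_cost(120, L40S_ON_DEMAND)
spot = job_cost(120, L40S_SPOT)
print(f"on-demand ${on_demand:.2f}, spot ${spot:.2f}")
```

These figures cover compute only; storage is billed separately, as described in the billing summary above.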

Add tenancy=spot to your runs-on:

runs-on: [machine, gpu=l4, tenancy=spot]

Spot interruption rates per runner type are visible on machine.dev/runners.

2. Make spot interruptions safe with checkpointing

Spot instances can be reclaimed by AWS at any time. Checkpoint your work to a durable store so you can resume from where you left off on a fresh instance.

The cleanest pattern uses the Hugging Face Hub as the checkpoint store:

import torch
from huggingface_hub import HfApi, hf_hub_download

def save_checkpoint(model, optimizer, epoch, step, repo_id):
    checkpoint = {
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'epoch': epoch,
        'step': step,
    }
    torch.save(checkpoint, 'checkpoint.pt')

    HfApi().upload_file(
        path_or_fileobj='checkpoint.pt',
        path_in_repo='checkpoint.pt',
        repo_id=repo_id,
        repo_type='model',
    )

Then on job start, try to resume:

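from huggingface_hub.utils import EntryNotFoundError, RepositoryNotFoundError

def load_checkpoint(model, optimizer, repo_id):
    """Restore state from the Hub; return (epoch, step), or (0, 0) on a fresh start."""
    try:
        path = hf_hub_download(repo_id=repo_id, filename='checkpoint.pt', local_dir='.')
    except (EntryNotFoundError, RepositoryNotFoundError):
        # No checkpoint uploaded yet: start from scratch.
        return 0, 0
    # Load onto CPU first so the checkpoint restores on any runner type.
    ckpt = torch.load(path, map_location='cpu')
    model.load_state_dict(ckpt['model_state_dict'])
    optimizer.load_state_dict(ckpt['optimizer_state_dict'])
    return ckpt['epoch'], ckpt['step']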

Combine this with a workflow that retries itself on spot interruption — see LLM Supervised Fine-Tuning and GRPO Fine-Tuning for full working examples.
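
How often you checkpoint determines how much work an interruption costs. The sketch below is a minimal, library-free illustration of the resume loop: a local JSON file stands in for the Hub checkpoint, and `CKPT_EVERY`, `train`, and `interrupt_at` are hypothetical names used only to simulate a reclaim. In a real job, the two file helpers would be the save_checkpoint and load_checkpoint functions above.

```python
import json
import os

CKPT_EVERY = 100  # hypothetical interval: at most 100 steps are lost per interruption


def read_step(ckpt_path):
    """Return the last checkpointed step, or 0 if no checkpoint exists."""
    if not os.path.exists(ckpt_path):
        return 0
    with open(ckpt_path) as f:
        return json.load(f)["step"]


def write_step(ckpt_path, step):
    """Write the checkpoint atomically so a reclaim mid-write cannot corrupt it."""
    tmp_path = ckpt_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"step": step}, f)
    os.replace(tmp_path, ckpt_path)


def train(total_steps, ckpt_path, interrupt_at=None):
    """Resume from the last checkpoint and return the step reached.

    interrupt_at simulates a spot reclaim: the loop stops once that step is done.
    """
    step = read_step(ckpt_path)
    while step < total_steps:
        step += 1  # one optimizer step would run here
        if step % CKPT_EVERY == 0 or step == total_steps:
            write_step(ckpt_path, step)
        if step == interrupt_at:
            break  # simulated reclaim: work since the last checkpoint is lost
    return step
```

Calling `train(1000, path, interrupt_at=250)` and then `train(1000, path)` resumes the second run from step 200, the last checkpoint written before the simulated reclaim, rather than from 0.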

3. Right-size the GPU

Don’t pay for L40S when L4 will do the job. Quick mapping:

Workload	Recommended GPU	$/min (spot)
Inference, ≤7B model	T4G	$0.004
Inference, 7–13B	L4	$0.006
Inference, 30B+	L40S	$0.016
QLoRA fine-tune ≤13B	T4G	$0.004
QLoRA fine-tune 30B	L4	$0.006
QLoRA fine-tune 70B	L40S	$0.016
LoRA fine-tune 7B	L4 or A10G	$0.006–$0.011
LoRA fine-tune 13B	L40S	$0.016
Real-time CV	A10G	$0.011–$0.019
Inference at scale	Inferentia2	$0.003

See CPU vs GPU for the full decision tree.

Use metrics to right-size

After a job completes, check the GPU utilization metrics on the machine.dev dashboard. If your GPU utilization is consistently low (e.g., under 50%), you’re likely paying for more GPU than you need — drop to a cheaper tier. For example, if your L4 job only hits 30% GPU utilization, a T4 at $0.004/min may handle the same workload.
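
That rule of thumb can be written down as a small helper. This is a sketch, not a machine.dev feature: the function name and the reading of "consistently low" as every sample falling under the threshold are assumptions, and the tier order simply sorts the 4-vCPU spot rates from section 1.

```python
# GPU runners ordered by 4-vCPU spot $/min, cheapest first (rates from section 1).
TIERS = [
    ("T4G", 0.00351),
    ("T4", 0.00449),
    ("L4", 0.00575),
    ("A10G", 0.01526),
    ("L40S", 0.01572),
]


def suggest_gpu(current, utilization_pct, threshold=50.0):
    """Suggest the next cheaper tier if every utilization sample is below threshold.

    Returns the current tier when utilization is high enough, when there are
    no samples, or when no cheaper tier exists.
    """
    names = [name for name, _ in TIERS]
    index = names.index(current)  # raises ValueError for an unknown tier
    consistently_low = bool(utilization_pct) and max(utilization_pct) < threshold
    if consistently_low and index > 0:
        return names[index - 1]
    return current
```

For the example above, `suggest_gpu("L4", [30, 28, 35])` returns `"T4"`. A cheaper tier may also have less GPU memory, so treat the result as a candidate to test rather than a guaranteed fit.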

4. Right-size CPU and RAM

Each GPU is offered in 3 vCPU/RAM configurations. The 4-vCPU config is cheapest. Only step up if your data preprocessing is the bottleneck.

runs-on: [machine, gpu=l4]              # 4 vCPU, 16 GB RAM (default, cheapest)
runs-on: [machine, gpu=l4, cpu=16]      # 16 vCPU, 64 GB RAM (more $)

The same metrics check applies to CPU and RAM. After a run, look at CPU and memory utilization: if both are low, you may be able to use a smaller vCPU config for that GPU and save money.

5. Right-size storage

Default storage (100 GB / 6,000 IOPS / 250 MB/s) is included at no extra charge. Don’t request more unless you need it — IOPS above 6,000 and throughput above 250 MB/s incur prorated EBS charges.

runs-on:
  - machine
  - gpu=l4
  # Default: 100GB, 6000 IOPS, 250 MB/s — sufficient for most workloads

For data-heavy jobs that genuinely need more:

runs-on:
  - machine
  - gpu=l4
  - disk_size=500        # Only increase if you need the space
  - disk_iops=10000      # Only increase for I/O-bound workloads
  - disk_throughput=750  # Only increase for sequential read/write bound jobs

A 60-minute job at 500 GB / 10,000 IOPS / 750 MB/s costs about $0.11 in storage on top of compute. See Pricing for the full breakdown.

6. Open up regions

Spot prices vary by region. Specifying multiple regions lets machine.dev pick the cheapest available — no need to lock in:

runs-on:
  - machine
  - gpu=l4
  - tenancy=spot
  - regions=us-east-1,us-east-2,eu-south-2

For most CPU and L4 workloads, spot prices in eu-south-2 are lower than US regions. For on-demand, us-east-1 is usually cheapest.

7. Cache aggressively

Cache pip, npm, Hugging Face downloads, and Docker layers to skip re-downloading on every run. For example, to cache pip and Hugging Face downloads:

- uses: actions/cache@v4
  with:
    path: |
      ~/.cache/pip
      ~/.cache/huggingface
    key: deps-${{ hashFiles('requirements.txt') }}

8. Use workflow filters

Don’t trigger expensive GPU jobs on every commit. Filter by file path:

on:
  push:
    paths:
      - 'model/**'
      - 'data/**'
      - 'requirements.txt'

Or gate behind a CPU-only check job:

jobs:
  changed:
    runs-on: ubuntu-latest
    outputs:
      train: ${{ steps.check.outputs.train }}
    steps:
      - id: check
        # Placeholder: replace with a real check that writes train=true or train=false.
        run: echo "train=true" >> "$GITHUB_OUTPUT"

  train:
    needs: changed
    if: ${{ needs.changed.outputs.train == 'true' }}
    runs-on: [machine, gpu=l4, tenancy=spot]
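
The check step above is where the real decision goes. One way to make it, sketched here as an assumption rather than a prescribed approach, is a small Python script that matches changed file paths against the same patterns used in the path filter. `TRAIN_PATTERNS`, `should_train`, and `output_line` are hypothetical names; how the list of changed files is obtained (for example from `git diff --name-only`) is left to the workflow.

```python
from fnmatch import fnmatchcase

# Patterns mirroring the path filter above. fnmatch's "*" also matches "/",
# so "model/*" covers files at any depth under model/.
TRAIN_PATTERNS = ["model/*", "data/*", "requirements.txt"]


def should_train(changed_files):
    """Return True if any changed file matches a training-relevant pattern."""
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in TRAIN_PATTERNS
    )


def output_line(changed_files):
    """Format the key=value line a step would append to the $GITHUB_OUTPUT file."""
    return f"train={'true' if should_train(changed_files) else 'false'}"
```

A docs-only change such as `["README.md"]` yields `train=false`, so the GPU job is skipped and only the CPU-only check runs.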

9. Set timeouts

Set timeout-minutes as a fail-safe against runaway costs, so a hung job is cancelled instead of continuing to bill:

jobs:
  train:
    runs-on: [machine, gpu=l40s, tenancy=spot]
    timeout-minutes: 120
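
A timeout also puts a ceiling on what a single run can cost, since billing is per minute of runtime. The sketch below works that out for the job above and inverts it to derive a timeout from a budget. The function names and the $2.00 budget are illustrative, and storage charges are not included.

```python
import math

L40S_SPOT = 0.01572  # $/min, 4-vCPU L40S spot rate from section 1


def max_compute_cost(timeout_minutes, rate_per_min):
    """Worst-case compute cost if the job runs until the timeout cancels it."""
    return timeout_minutes * rate_per_min


def timeout_for_budget(budget_dollars, rate_per_min):
    """Largest whole-minute timeout whose worst-case cost stays within budget."""
    return math.floor(budget_dollars / rate_per_min)


print(f"${max_compute_cost(120, L40S_SPOT):.2f}")  # worst case for timeout-minutes: 120
print(timeout_for_budget(2.00, L40S_SPOT))  # minutes that fit in a $2.00 budget
```

With these rates, the 120-minute timeout caps compute at $1.89 per run, and a $2.00 budget allows a timeout of 127 minutes.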

Monitoring spend

The machine.dev dashboard shows per-job dollar cost, plus daily and monthly aggregates. Built-in metrics (CPU, memory, disk, network, GPU utilization) appear as sparkline charts on every job page so you can see if you’re under-utilizing the runner you’re paying for.

Use the dashboard to spot which workflows or which repos are eating your budget.

Next steps

CPU vs GPU — pick the right runner type
Pricing — full per-minute rates and EBS pricing
LLM Supervised Fine-Tuning — checkpointing pattern in action
GRPO Fine-Tuning — spot-resilient training