Documentation v1.3.1 · Last updated 2026-04-26

Cost Optimization

Practical strategies to lower your machine.dev bill without sacrificing speed. Every example below uses real per-minute rates.

How machine.dev bills you

  • Per-minute, in US dollars. No credits.
  • You only pay for runtime. Provisioning and teardown are free (see the cost sketch after this list).
  • Spot rates are 70–90% cheaper than on-demand.
  • Storage (EBS) is billed separately, prorated to runtime. Defaults are minimal (~$0.006 for a 30-min job).
  • The dashboard shows dollar spend by default (app.machine.dev). Toggle to credit-style view in Settings if you prefer.
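
Because billing is per-minute and provisioning is free, estimating a run's cost is a single multiplication. A minimal sketch (estimate_run_cost is an illustrative helper, not a machine.dev API; the rates are the spot and on-demand figures from the table in section 1):

import sys

def estimate_run_cost(minutes, rate_per_min, storage_per_min=0.0):
    # Runtime only: provisioning and teardown are not billed.
    return minutes * (rate_per_min + storage_per_min)

print(estimate_run_cost(120, 0.06203))  # 2-hour L40S on-demand: ~$7.44
print(estimate_run_cost(120, 0.01572))  # same run on spot:      ~$1.89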

1. Use spot when you can

Spot instances offer the biggest single saving. Real numbers from current pricing:

Runner            | Spot $/min | On-demand $/min | Savings
T4G GPU (4 vCPU)  | $0.00351   | $0.01400        | 75%
T4 GPU (4 vCPU)   | $0.00449   | $0.01753        | 74%
L4 GPU (4 vCPU)   | $0.00575   | $0.02683        | 79%
A10G GPU (4 vCPU) | $0.01526   | $0.03353        | 55%
L40S GPU (4 vCPU) | $0.01572   | $0.06203        | 75%
CPU 16 vCPU X64   | $0.00255   | $0.02380        | 89%
CPU 16 vCPU ARM64 | $0.00207   | $0.01927        | 89%

A 2-hour L40S fine-tune drops from $7.44 on-demand to $1.89 on spot — a $5.55 savings per run.

Add tenancy=spot to your runs-on labels:

runs-on: [machine, gpu=l4, tenancy=spot]

Spot interruption rates per runner type are visible on machine.dev/runners.

2. Make spot interruptions safe with checkpointing

Spot instances can be reclaimed by AWS at any time. Checkpoint your work to a durable store so you can resume from where you left off on a fresh instance.

The cleanest pattern uses the Hugging Face Hub as the checkpoint store:

import torch
from huggingface_hub import HfApi, hf_hub_download

def save_checkpoint(model, optimizer, epoch, step, repo_id):
    # Capture everything needed to resume: weights, optimizer state, position.
    checkpoint = {
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'epoch': epoch,
        'step': step,
    }
    torch.save(checkpoint, 'checkpoint.pt')

    # Push to the Hub so the checkpoint survives the instance being reclaimed.
    HfApi().upload_file(
        path_or_fileobj='checkpoint.pt',
        path_in_repo='checkpoint.pt',
        repo_id=repo_id,
        repo_type='model',
    )

Then on job start, try to resume:

def load_checkpoint(model, optimizer, repo_id):
    try:
        hf_hub_download(repo_id=repo_id, filename='checkpoint.pt', local_dir='.')
        ckpt = torch.load('checkpoint.pt')
        model.load_state_dict(ckpt['model_state_dict'])
        optimizer.load_state_dict(ckpt['optimizer_state_dict'])
        return ckpt['epoch'], ckpt['step']
    except Exception:
        # No checkpoint yet (first run) or download failed: start from scratch.
        return 0, 0
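
Wired into a training loop, resume first, then checkpoint periodically. A sketch (model, optimizer, dataloader, train_step, num_epochs, and repo_id are placeholders; the save interval trades recomputed work against upload time):

start_epoch, start_step = load_checkpoint(model, optimizer, repo_id)

for epoch in range(start_epoch, num_epochs):
    for step, batch in enumerate(dataloader):
        if epoch == start_epoch and step < start_step:
            continue  # already covered by the checkpoint
        train_step(model, optimizer, batch)
        if step > 0 and step % 500 == 0:
            save_checkpoint(model, optimizer, epoch, step, repo_id)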

Combine this with a workflow that retries itself on spot interruption — see LLM Supervised Fine-Tuning and GRPO Fine-Tuning for full working examples.

3. Right-size the GPU

Don’t pay for L40S when L4 will do the job. Quick mapping:

Workload             | Recommended GPU | $/min (spot)
Inference, ≤7B model | T4G             | $0.004
Inference, 7–13B     | L4              | $0.006
Inference, 30B+      | L40S            | $0.016
QLoRA fine-tune ≤13B | T4G             | $0.004
QLoRA fine-tune 30B  | L4              | $0.006
QLoRA fine-tune 70B  | L40S            | $0.016
LoRA fine-tune 7B    | L4 or A10G      | $0.006–$0.011
LoRA fine-tune 13B   | L40S            | $0.016
Real-time CV         | A10G            | $0.011–$0.019
Inference at scale   | Inferentia2     | $0.003

See CPU vs GPU for the full decision tree.

Use metrics to right-size

After a job completes, check the GPU utilization metrics on the machine.dev dashboard. If your GPU utilization is consistently low (e.g., under 50%), you’re likely paying for more GPU than you need — drop to a cheaper tier. For example, if your L4 job only hits 30% GPU utilization, a T4 at $0.004/min may handle the same workload.
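
If you want an in-job reading as well, one option is the NVML bindings (a sketch assuming the nvidia-ml-py package; the dashboard records the same data with no code):

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)          # GPU 0
util = pynvml.nvmlDeviceGetUtilizationRates(handle)
print(f"GPU {util.gpu}% / GPU memory {util.memory}%")  # point-in-time sample
pynvml.nvmlShutdown()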

4. Right-size CPU and RAM

Each GPU is offered in 3 vCPU/RAM configurations. The 4-vCPU config is cheapest. Only step up if your data preprocessing is the bottleneck.

runs-on: [machine, gpu=l4]              # 4 vCPU, 16 GB RAM (default, cheapest)
runs-on: [machine, gpu=l4, cpu=16]      # 16 vCPU, 64 GB RAM (more $)

The same applies to CPU and RAM — check the metrics after a run. If CPU utilization is low, you may be able to use the smaller vCPU config for that GPU and save money.
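
For a quick in-job sample of CPU and RAM, a sketch assuming the psutil package:

import psutil

print(f"CPU: {psutil.cpu_percent(interval=1)}%")   # averaged over 1 second
print(f"RAM: {psutil.virtual_memory().percent}%")  # fraction of total in use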

5. Right-size storage

Default storage (100 GB / 6,000 IOPS / 250 MB/s) is included at no extra charge. Don’t request more unless you need it — IOPS above 6,000 and throughput above 250 MB/s incur prorated EBS charges.

runs-on:
  - machine
  - gpu=l4
  # Default: 100GB, 6000 IOPS, 250 MB/s — sufficient for most workloads

For data-heavy jobs that genuinely need more:

runs-on:
  - machine
  - gpu=l4
  - disk_size=500        # Only increase if you need the space
  - disk_iops=10000      # Only increase for I/O-bound workloads
  - disk_throughput=750  # Only increase for sequential read/write bound jobs

A 60-minute job at 500 GB / 10,000 IOPS / 750 MB/s costs about $0.11 in storage on top of compute. See Pricing for the full breakdown.
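
The arithmetic behind a figure like that is a monthly rate prorated to runtime. A rough sketch (the per-unit rates here are assumptions modeled on typical gp3 pricing, not machine.dev's published numbers; the Pricing page is authoritative):

MIN_PER_MONTH = 30 * 24 * 60  # 43,200 minutes

def storage_overage(minutes, gb, iops, mbps):
    # ASSUMED gp3-style monthly rates: $0.08/GB, $0.005/IOPS, $0.04/(MB/s),
    # charged only above the free defaults (100 GB / 6,000 IOPS / 250 MB/s).
    monthly = (max(gb - 100, 0) * 0.08
               + max(iops - 6000, 0) * 0.005
               + max(mbps - 250, 0) * 0.04)
    return monthly * minutes / MIN_PER_MONTH

print(storage_overage(60, 500, 10000, 750))  # ~0.10, in the ballpark of $0.11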

6. Open up regions

Spot prices vary by region. Specifying multiple regions lets machine.dev pick the cheapest available — no need to lock in:

runs-on:
  - machine
  - gpu=l4
  - tenancy=spot
  - regions=us-east-1,us-east-2,eu-south-2

For most CPU and L4 workloads, spot prices in eu-south-2 are lower than in US regions. For on-demand, us-east-1 is usually cheapest.

7. Cache aggressively

Cache pip, npm, Hugging Face downloads, and Docker layers to skip re-downloading on every run:

- uses: actions/cache@v4
  with:
    path: |
      ~/.cache/pip
      ~/.cache/huggingface
    key: deps-${{ hashFiles('requirements.txt') }}
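
The snippet above covers pip and Hugging Face; Docker layers need their own cache. One common pattern, assuming you build images with the third-party docker/build-push-action, is the GitHub Actions cache backend:

- uses: docker/setup-buildx-action@v3
- uses: docker/build-push-action@v6
  with:
    context: .
    push: false
    cache-from: type=gha
    cache-to: type=gha,mode=max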

8. Use workflow filters

Don’t trigger expensive GPU jobs on every commit. Filter by file path:

on:
  push:
    paths:
      - 'model/**'
      - 'data/**'
      - 'requirements.txt'

Or gate behind a CPU-only check job (the echo in this example is a stand-in; a change-detection sketch follows):

jobs:
  changed:
    runs-on: ubuntu-latest
    outputs:
      train: ${{ steps.check.outputs.train }}
    steps:
      - id: check
        run: echo "train=true" >> $GITHUB_OUTPUT  # stand-in: always 'true'

  train:
    needs: changed
    if: ${{ needs.changed.outputs.train == 'true' }}
    runs-on: [machine, gpu=l4, tenancy=spot]
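
For the check step itself, one option is the third-party dorny/paths-filter action (a sketch; the filter name and globs are yours to define):

jobs:
  changed:
    runs-on: ubuntu-latest
    outputs:
      train: ${{ steps.check.outputs.train }}
    steps:
      - uses: actions/checkout@v4
      - id: check
        uses: dorny/paths-filter@v3
        with:
          filters: |
            train:
              - 'model/**'
              - 'data/**'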

9. Set timeouts

Fail-safe against runaway costs. With the 120-minute cap below, even a hung L40S spot job costs at most 120 × $0.01572 ≈ $1.89:

jobs:
  train:
    runs-on: [machine, gpu=l40s, tenancy=spot]
    timeout-minutes: 120

Monitoring spend

The machine.dev dashboard shows per-job dollar cost, plus daily and monthly aggregates. Built-in metrics (CPU, memory, disk, network, GPU utilization) appear as sparkline charts on every job page so you can see if you’re under-utilizing the runner you’re paying for.

Use the dashboard to see which workflows or repos are eating your budget.

Next steps