This guide walks through setting up a machine.dev runner in your GitHub Actions workflow. For the canonical label reference, see Configuration options.
Prerequisites
- A machine.dev account
- A GitHub repository with Actions enabled
- Machine Provisioner installed on your account or organization
- Self-hosted runners enabled for your org
Step 1: Pick a runner
| You need… | Use | Spot $/min |
|---|---|---|
| A GPU (CUDA, torch, nvidia-smi) | gpu=t4g (cheapest) or gpu=l4 (more VRAM) | $0.004 / $0.006 |
| Lots of CPU cores | cpu=16 (X64) or cpu=16, architecture=arm64 (cheaper) | $0.003 / $0.002 |
| Maximum VRAM | gpu=l40s (48 GB) | $0.016 |
| AWS Neuron workloads | gpu=trainium | $0.006 |
See CPU vs GPU for a fuller decision matrix.
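Whichever row you pick, the labels drop straight into runs-on. A minimal sketch using the cheapest GPU option from the table (gpu=t4g on spot; swap the labels to match your row):

```yaml
# Labels from the first table row; replace gpu=t4g with your chosen runner type.
runs-on: [machine, gpu=t4g, tenancy=spot]
```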
Step 2: Create a workflow file
Create .github/workflows/build.yml (or any other name):
```yaml
name: Build

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  build:
    runs-on: [machine, cpu=16, tenancy=spot]
    steps:
      - uses: actions/checkout@v4
      - name: Build
        run: make -j$(nproc)
      - name: Test
        run: ./test.sh
```
Step 3: Push and watch it run
Push the file. The job appears in the Actions tab and machine.dev provisions a runner within ~1 minute.
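If you would rather kick off a test run without pushing a new commit, GitHub's standard workflow_dispatch trigger can sit alongside push; a small sketch of the on: block (the rest of the workflow from Step 2 stays the same):

```yaml
# Optional: allow manual runs from the Actions tab via the standard workflow_dispatch trigger.
on:
  push:
    branches: [main]
  workflow_dispatch: {}
```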
Common patterns
A GPU training job
```yaml
jobs:
  train:
    runs-on: [machine, gpu=l4, tenancy=spot]
    timeout-minutes: 120
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Verify GPU
        run: nvidia-smi
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Train
        run: python train.py
```
Pin a region for data residency
```yaml
runs-on: [machine, gpu=l4, regions=eu-south-2]
```
Combine a CPU build with a GPU train
```yaml
jobs:
  build:
    runs-on: [machine, cpu=8, tenancy=spot]
    steps:
      - uses: actions/checkout@v4
      - run: make
      - uses: actions/upload-artifact@v4
        with:
          name: build
          path: dist/

  train:
    needs: build
    runs-on: [machine, gpu=a10g, tenancy=spot]
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: build
      - run: python train.py
```
Matrix across GPU types
```yaml
jobs:
  bench:
    strategy:
      fail-fast: false
      matrix:
        gpu: [t4g, t4, l4, a10g, l40s]
    runs-on:
      - machine
      - "gpu=${{ matrix.gpu }}"
      - tenancy=spot
    steps:
      - uses: actions/checkout@v4
      - run: ./bench.sh ${{ matrix.gpu }}
```
Custom storage for data-heavy jobs
```yaml
jobs:
  big-train:
    runs-on:
      - machine
      - gpu=l40s
      - tenancy=spot
      - disk_size=500        # 500 GB root volume
      - disk_iops=10000      # Faster checkpoint writes
      - disk_throughput=750  # Sequential read throughput
    timeout-minutes: 240
    steps:
      - uses: actions/checkout@v4
      - run: python train.py
```
Cache dependencies
```yaml
- uses: actions/cache@v4
  with:
    path: |
      ~/.cache/pip
      ~/.cache/huggingface
    key: deps-${{ hashFiles('requirements.txt') }}
```
Mix GitHub-hosted and machine.dev runners
Run cheap setup on GitHub-hosted, then switch to machine.dev for the heavy work:
```yaml
jobs:
  lint:
    runs-on: ubuntu-latest   # GitHub-hosted, free for public repos
    steps:
      - uses: actions/checkout@v4
      - run: ./lint.sh

  train:
    needs: lint
    runs-on: [machine, gpu=a10g, tenancy=spot]
```
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| Job stays queued | Self-hosted runners not enabled for the repo | Enable self-hosted runners |
| "No runner matching the specified labels" | Typo in gpu= value or unsupported region | Check Configuration and Regions |
| Spot interruption mid-job | Normal; spot capacity can be reclaimed | Use tenancy=on_demand or add checkpointing (see the sketch after this table) |
| nvidia-smi: command not found | Used cpu= instead of gpu= | Switch to gpu=t4g or another GPU type |
| Out-of-memory errors | Batch size too large for VRAM | Reduce batch size, enable gradient checkpointing, or use a larger GPU |
| CUDA version mismatch | Library expects a different CUDA version | Install the CUDA version your library needs via pip |
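For the spot-interruption row, the lightest-touch fix is switching the tenancy label. A minimal sketch, reusing the gpu=l4 and train.py names from the earlier examples (the steps are illustrative):

```yaml
jobs:
  train:
    # tenancy=on_demand avoids spot reclamation, at a higher per-minute price.
    runs-on: [machine, gpu=l4, tenancy=on_demand]
    steps:
      - uses: actions/checkout@v4
      - run: python train.py   # keep checkpointing for long runs regardless of tenancy
```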
Next steps
- Cost Optimization — strategies to lower your bill
- CPU vs GPU — decision guide
- Use Cases — real workflows you can fork
- Configuration options — every label