Documentation v1.3.1 · Last updated 2026-04-26

GPU Runners

machine.dev offers 6 NVIDIA GPU types and 2 AWS AI accelerators for GitHub Actions, with per-minute pricing in US dollars and spot rates starting around $0.003/min. Each NVIDIA GPU is offered in multiple vCPU/RAM configurations, so you can pair the right amount of CPU and memory with the GPU. All runners run Ubuntu 22.04 LTS with a configurable gp3 EBS root volume.

NVIDIA GPUs

| GPU | VRAM | Architecture | Best for |
|---|---|---|---|
| T4G | 16 GB | ARM64 (Graviton) | Cheapest GPU. Inference, small fine-tunes, computer vision. |
| T4 | 16 GB | X64 | General-purpose ML, computer vision, X64-only workloads. |
| L4 | 24 GB | X64 | Mid-range training and inference. Best $/VRAM ratio. |
| A10G | 24 GB | X64 | Larger model training, real-time inference, rendering. |
| L40S | 48 GB | X64 | Large-model fine-tunes. Fits 70B QLoRA single-GPU. |
| RTX 6000 | 96 GB | X64 | Largest VRAM. Multi-GPU configurations up to 8× RTX 6000. |

Live prices and current spot interruption rates are at machine.dev/runners. The tables below show the best rates across all regions.

Configurations and pricing

T4G / T4 / L4 / A10G / L40S are available in three vCPU/RAM tiers each. RTX 6000 is available in six sizes including multi-GPU configurations. Pick a higher-vCPU configuration if your workflow does heavy CPU-side preprocessing before handing off to the GPU.
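
If you need more CPU-side headroom than the default shape provides, request a specific tier with the cpu and ram labels (covered under "Use it in a workflow" below). A minimal sketch:

runs-on:
  - machine
  - gpu=t4
  - cpu=16                 # 16 vCPU / 64 GB RAM T4 tier
  - ram=64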

T4G — 16 GB VRAM, ARM64

| vCPU | RAM | VRAM | Spot $/min | On-demand $/min |
|---|---|---|---|---|
| 4 | 8 GB | 16 GB | $0.00351 | $0.01400 |
| 8 | 16 GB | 16 GB | $0.00283 | $0.01853 |
| 16 | 32 GB | 16 GB | $0.00315 | $0.02760 |

T4 — 16 GB VRAM, X64

| vCPU | RAM | VRAM | Spot $/min | On-demand $/min |
|---|---|---|---|---|
| 4 | 16 GB | 16 GB | $0.00449 | $0.01753 |
| 8 | 32 GB | 16 GB | $0.00660 | $0.02507 |
| 16 | 64 GB | 16 GB | $0.01039 | $0.04013 |

L4 — 24 GB VRAM, X64

| vCPU | RAM | VRAM | Spot $/min | On-demand $/min |
|---|---|---|---|---|
| 4 | 16 GB | 24 GB | $0.00575 | $0.02683 |
| 8 | 32 GB | 24 GB | $0.00417 | $0.03259 |
| 16 | 64 GB | 24 GB | $0.00465 | $0.04411 |

A10G — 24 GB VRAM, X64

| vCPU | RAM | VRAM | Spot $/min | On-demand $/min |
|---|---|---|---|---|
| 4 | 16 GB | 24 GB | $0.01526 | $0.03353 |
| 8 | 32 GB | 24 GB | $0.01083 | $0.04040 |
| 16 | 64 GB | 24 GB | $0.01922 | $0.05413 |

L40S — 48 GB VRAM, X64

| vCPU | RAM | VRAM | Spot $/min | On-demand $/min |
|---|---|---|---|---|
| 4 | 32 GB | 48 GB | $0.01572 | $0.06203 |
| 8 | 64 GB | 48 GB | $0.01718 | $0.07474 |
| 16 | 128 GB | 48 GB | $0.01610 | $0.10014 |

The L40S has been used to fine-tune 70B-parameter models with QLoRA on a single GPU. See Crusoe’s writeup for a real-world example.
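
A spot run on the top L40S tier might be requested like this. This is a sketch: the gpu=l40s value follows the gpu=l4 / gpu=a10g label pattern used elsewhere on this page, but the exact slug isn't shown here.

runs-on:
  - machine
  - gpu=l40s               # assumed slug, following the gpu=l4 pattern
  - cpu=16                 # 16 vCPU / 128 GB RAM tier
  - ram=128
  - tenancy=spot           # cheapest rate for a long fine-tune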

RTX 6000 — 96 GB VRAM, X64

| vCPU | RAM | GPUs | VRAM (per GPU) | Spot $/min | On-demand $/min |
|---|---|---|---|---|---|
| 8 | 64 GB | 1 | 96 GB | $0.02467 | $0.11210 |
| 16 | 128 GB | 1 | 96 GB | $0.01965 | $0.13327 |
| 32 | 256 GB | 1 | 96 GB | $0.02099 | $0.17561 |
| 48 | 512 GB | 2 | 96 GB | $0.03370 | $0.27620 |
| 96 | 1,024 GB | 4 | 96 GB | $0.08154 | $0.55241 |
| 192 | 2,048 GB | 8 | 96 GB | $0.17911 | $1.10481 |

RTX 6000 (NVIDIA RTX PRO 6000) gives you 96 GB of VRAM per GPU and scales up to 8 GPUs in a single runner — the largest VRAM tier in the public catalog. Well-suited for multi-GPU fine-tuning and large-context inference.
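
Based on the table above, each multi-GPU configuration has a distinct vCPU/RAM shape, so a hedged sketch for landing on the 2-GPU tier uses only the cpu and ram labels documented below (the gpu=rtx6000 slug is an assumption following the gpu=l4 pattern):

runs-on:
  - machine
  - gpu=rtx6000            # assumed slug for the RTX 6000 tier
  - cpu=48                 # 48 vCPU / 512 GB RAM shape = the 2-GPU row above
  - ram=512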

AWS AI accelerators

| Accelerator | Memory | Architecture | Best for |
|---|---|---|---|
| Trainium | 32 GB | X64 | Distributed training with the AWS Neuron SDK. |
| Inferentia2 | 32 GB | X64 | High-throughput inference. Cheapest GPU-class option. |

| Type | vCPU | RAM | Spot $/min | On-demand $/min |
|---|---|---|---|---|
| Inferentia2 | 4 | 16 GB | $0.00253 | $0.02527 |
| Inferentia2 | 32 | 128 GB | $0.00911 | $0.06560 |
| Trainium | 8 | 32 GB | $0.00573 | $0.04479 |

AI accelerator runners come with the AWS Neuron SDK pre-installed.

AI accelerators are currently available in us-east-1, us-east-2, and us-west-2 only. See Regions for the GPU-by-region matrix.
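
A sketch of pinning an accelerator job to a supported region, assuming accelerators are requested with the same label pattern as GPUs (the gpu=inferentia2 slug is an assumption; this page doesn't show the accelerator label):

runs-on:
  - machine
  - gpu=inferentia2        # assumed slug; see Configuration options for the real label
  - regions=us-east-1      # accelerators run in us-east-1, us-east-2, us-west-2 only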

Storage

Every runner gets a 100 GB gp3 EBS root volume by default with 6,000 IOPS and 250 MB/s throughput. You can scale up to 16 TB with custom IOPS and throughput using the disk_size, disk_iops, and disk_throughput labels:

runs-on:
  - machine
  - gpu=l4
  - disk_size=500          # 500 GB root volume
  - disk_iops=10000        # 10,000 IOPS
  - disk_throughput=750    # 750 MB/s throughput

| Label | Default | Range |
|---|---|---|
| disk_size=<GB> | 100 | 1 – 16,384 |
| disk_iops=<IOPS> | 6,000 | 6,000 – 16,000 |
| disk_throughput=<MB/s> | 250 | 250 – 1,000 |

Defaults are included at no additional charge. Increasing IOPS above 6,000 or throughput above 250 MB/s incurs prorated EBS charges. See Pricing for the EBS rate breakdown.
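
For example, to scale only the volume size while leaving IOPS and throughput at their defaults:

runs-on:
  - machine
  - gpu=l4
  - disk_size=1000         # 1 TB root volume; IOPS and throughput stay at defaults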

Instance metrics

machine.dev collects metrics by default for every job and renders them as sparkline charts on the dashboard. Collected metrics:

  • GPU: utilization, memory usage, temperature, power draw
  • System: CPU utilization, memory usage, disk I/O, network bytes in/out

Control collection per job with the metrics and metrics_interval labels:

runs-on:
  - machine
  - gpu=a10g
  - metrics=true           # Enable (default)
  - metrics_interval=10    # Collect every 10 seconds (default: 60)

To disable metrics for a job, set metrics=false.
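
For example:

runs-on:
  - machine
  - gpu=a10g
  - metrics=false          # No metrics collected for this job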

Pre-installed software

GPU runners come with the NVIDIA driver, CUDA, cuDNN, the NVIDIA Container Toolkit, Docker, and Python pre-installed. AI accelerator runners also include the AWS Neuron SDK.

| Component | Version |
|---|---|
| NVIDIA Driver | 580.126.20 (data_center) |
| CUDA Toolkit | 13.0.0 |
| cuDNN | 9.20.0.48 |

Versions are updated with each runner image build.

You can install any other version, build CUDA from source, or use your own Docker image — runners give you full root access.
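
A quick sanity check at the start of a job can print the toolchain versions with the standard NVIDIA CLIs (a minimal sketch; assumes nvcc is on the PATH, as it normally is when the CUDA Toolkit is installed):

jobs:
  gpu-check:
    runs-on: [machine, gpu=t4]
    steps:
      - run: nvidia-smi        # Driver version, GPU model, VRAM usage
      - run: nvcc --version    # CUDA Toolkit release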

Use it in a workflow

Minimal:

jobs:
  train:
    runs-on: [machine, gpu=l4]   # 4 vCPU, 16 GB RAM, 24 GB VRAM by default

Scale up the CPU/RAM:

runs-on: [machine, gpu=l4, cpu=16, ram=64]

Use spot pricing to cut costs:

runs-on: [machine, gpu=l4, tenancy=spot]

Pin a region:

runs-on: [machine, gpu=l4, regions=eu-south-2]
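
Combining labels from this page into one job (train.py is a placeholder for your own entry point):

jobs:
  train:
    runs-on:
      - machine
      - gpu=l4
      - cpu=16                 # 16 vCPU / 64 GB RAM L4 tier
      - ram=64
      - tenancy=spot           # cheapest rate; can be interrupted
      - disk_size=500          # 500 GB root volume
      - metrics_interval=10    # collect metrics every 10 seconds
    steps:
      - uses: actions/checkout@v4
      - run: python train.py   # placeholder training script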

See Configuration options for the complete label reference.

Next steps