04.0 // Documentation v1.3.1 Last updated 2026-04-26

Frequently Asked Questions

Common questions about machine.dev GPU and CPU runners for GitHub Actions — billing, configuration, technical limits, and troubleshooting.


This page covers the most common questions about machine.dev. If you don’t find your answer, email support@machine.dev.

General

What is machine.dev?

machine.dev provides on-demand GPU and CPU runners for GitHub Actions. Add runs-on: [machine, gpu=l4] (or cpu=16) to a workflow and your job runs on a dedicated AWS instance billed by the minute in US dollars. No infrastructure to manage, no separate accounts.
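A minimal workflow might look like the following (the job name and training script are illustrative):

```yaml
name: train
on: push

jobs:
  train:
    runs-on: [machine, gpu=l4]   # machine.dev provisions a dedicated L4 instance
    steps:
      - uses: actions/checkout@v4
      - run: python train.py     # your training entry point
```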

How is machine.dev different from other GPU cloud providers?

Most GPU clouds make you set up infrastructure, manage billing in their portal, and SSH into instances. machine.dev is just a label you put in runs-on: — your existing GitHub Actions workflows pick up machine.dev runners automatically. There’s no separate dashboard you have to operate, no Kubernetes, no Terraform.

What workloads is machine.dev good for?

  • ML model training and fine-tuning
  • Large-scale inference and batch processing
  • CPU- and GPU-intensive builds and tests
  • Computer vision pipelines
  • Hyperparameter sweeps via matrix strategy
  • Parallel test execution
  • Data processing and ETL
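Hyperparameter sweeps map naturally onto a matrix strategy, with each combination running on its own runner. A sketch (the parameter names and script flags are illustrative):

```yaml
jobs:
  sweep:
    runs-on: [machine, gpu=t4]
    strategy:
      matrix:
        lr: [0.001, 0.0003]
        batch_size: [32, 64]
    steps:
      - uses: actions/checkout@v4
      - run: python train.py --lr ${{ matrix.lr }} --batch-size ${{ matrix.batch_size }}
```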

Can I use machine.dev with private repositories?

Yes, both public and private. See Enable self-hosted runners for the org configuration steps.

Getting started

How do I sign up?

  1. Visit app.machine.dev/signup and sign in with GitHub
  2. Install the Machine Provisioner GitHub App on your account or organization
  3. Pick the repos you want machine.dev to run jobs for
  4. Add runs-on: [machine, gpu=t4g] to a workflow and push

New accounts receive $10 of free compute (~58 hours of T4G GPU time at spot rates).

Do I need to install anything?

No. machine.dev is a cloud service. The only thing you add to your repo is a workflow file with runs-on: [machine, ...].

How quickly are GPU runners provisioned?

Most runners are ready within about 1 minute. Exact time depends on current AWS capacity and the specific GPU type.

Technical

What runners does machine.dev offer?

CPU runners: 2, 4, 8, 16, 32, 48, or 64 vCPUs. RAM scales with vCPU count from 4 GB to 128 GB. Both X64 (Intel/AMD) and ARM64 (Graviton) architectures.

GPU runners (NVIDIA):

  • T4G (16 GB VRAM, ARM64) — entry-level, cheapest GPU
  • T4 (16 GB VRAM, X64) — general-purpose ML
  • L4 (24 GB VRAM, X64) — best $/VRAM ratio
  • A10G (24 GB VRAM, X64) — larger model training, real-time inference
  • L40S (48 GB VRAM, X64) — large-model fine-tunes, fits 70B QLoRA
  • RTX 6000 (96 GB VRAM, X64) — biggest VRAM, multi-GPU configurations up to 8× RTX 6000

AWS AI accelerators:

  • Trainium (32 GB) — distributed training with the AWS Neuron SDK
  • Inferentia2 (32 GB) — high-throughput inference

See GPU Runners and CPU Runners for full configuration matrices.

Can I use a specific CUDA or driver version?

Yes. Runners give you full root access — install any CUDA or driver version, build from source, or use your own Docker image.

Are there environment variables I can read?

machine.dev doesn’t inject environment variables. Your job’s environment is whatever the runner image provides. The labels in runs-on: are how machine.dev knows what to launch — they’re your source of truth. Inside the job, run nvidia-smi, nproc, or uname -m to inspect what you actually got.
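For example, a step like this prints the provisioned hardware (the `|| true` keeps the GPU query from failing the step on CPU runners):

```yaml
steps:
  - name: Inspect runner
    run: |
      uname -m            # architecture: x86_64 or aarch64
      nproc               # vCPU count
      free -h             # memory
      nvidia-smi || true  # GPU details (GPU runners only)
```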

Are there limitations on network access?

machine.dev runners have full outbound internet access for pulling datasets, models, and dependencies. Common outbound ports (80, 443, etc.) are open.

Can I install custom software on the runners?

Yes — full root access, install anything.

How long can my jobs run?

GitHub limits self-hosted runner jobs to 5 days of runtime, and workflow runs (including queue time) to 35 days. Set a tighter timeout-minutes in your job to avoid runaway costs.
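For example, capping a training job at two hours (the value is illustrative):

```yaml
jobs:
  train:
    runs-on: [machine, gpu=l4]
    timeout-minutes: 120   # fail the job after 2 hours instead of the 5-day default
    steps:
      - run: python train.py
```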

For long-running training, implement checkpointing and resume logic — see Cost Optimization.

Can I run multiple GPUs in a single job?

Most runner types are single-GPU. RTX 6000 runners support multi-GPU configurations (up to 8× RTX 6000) — see GPU Runners. For other multi-GPU training needs, contact hello@machine.dev to discuss.

What operating system do runners use?

All runners are Ubuntu 22.04 LTS with a 100 GB gp3 EBS root volume by default (configurable up to 16 TB). GPU runners include the NVIDIA driver, CUDA, cuDNN, NVIDIA Container Toolkit, Docker, and Python pre-installed. AI accelerator runners include the AWS Neuron SDK.

Can I configure storage for my runners?

Yes. By default, every runner gets a 100 GB gp3 EBS volume with 6,000 IOPS and 250 MB/s throughput. Customize using runner labels:

runs-on:
  - machine
  - gpu=a10g
  - disk_size=500        # 500 GB volume
  - disk_iops=10000      # 10,000 IOPS
  - disk_throughput=750  # 750 MB/s throughput

Storage is billed separately from compute, prorated to the job’s runtime. The defaults are included at no extra charge — you only pay for IOPS above 6,000 and throughput above 250 MB/s. See Pricing for the EBS rate breakdown.

What metrics are collected for my jobs?

machine.dev collects metrics by default for every job: CPU utilization, memory usage, disk I/O, network bytes in/out, and (on GPU runners) GPU utilization, memory, temperature, and power draw. After a job completes, these appear as sparkline charts on the machine.dev dashboard.

You can control metrics collection per job:

runs-on:
  - machine
  - gpu=l4
  - metrics=true          # Enable (default) or disable with false
  - metrics_interval=10   # Collection interval in seconds (default: 60)

Pricing and billing

How is machine.dev billed?

Per minute, in US dollars, for compute. Storage (EBS) is billed separately, prorated to runtime. You only pay for the time your job is actually running — provisioning and teardown are free. Spot rates are 70–90% cheaper than on-demand. See Pricing.

What happened to credits?

machine.dev now bills in US dollars instead of credits. If you had a credit balance before the cutover, it was automatically converted to a dollar balance at the same rate ($0.005 per credit). Your existing balance, monthly subscription, and billing all carry over — nothing for you to do.

What plans are available?

  • Pay-as-you-go ($0/mo) — no commitment
  • Developer ($50/mo, $55 of compute) — 10% bonus
  • Growth ($85/mo, $100 of compute) — ~18% bonus
  • Pro ($160/mo, $200 of compute) — 25% bonus
  • Enterprise — volume discounts, contact hello@machine.dev

All compute is billed per-minute regardless of plan. Plans give you a bonus dollar balance to draw down.

Is there a free tier?

Yes — $10 of free compute on signup (~58 hours of T4G GPU at spot rates).

Are there volume discounts?

Yes — contact hello@machine.dev for enterprise pricing.

Can I see my spend in the dashboard?

Yes. The machine.dev dashboard shows per-job cost in dollars by default, plus daily and monthly totals. There’s a toggle in Settings if you prefer the credit-style view.

Do I pay for provisioning time?

No. Only runtime. Provisioning and teardown are free.

What happens if a spot instance is interrupted?

You’re billed up to the moment of interruption, then billing stops. Implement checkpointing to handle interruptions gracefully — see Cost Optimization.

Are storage charges separate from compute?

Yes. Compute is billed per minute based on the runner type. EBS storage is billed prorated based on volume size, IOPS, and throughput. Defaults (100 GB / 6,000 IOPS / 250 MB/s) cost about $0.006 for a 30-minute job. See Pricing for the EBS rate breakdown.

Workflow configuration

How do I specify a GPU type?

jobs:
  train:
    runs-on: [machine, gpu=l4]
    steps:
      - uses: actions/checkout@v4
      - run: python train.py

GPU labels: t4g, t4, l4, a10g, l40s, trainium, inferentia2.

Can I mix GitHub-hosted and machine.dev runners?

Yes, in the same workflow:

jobs:
  lint:
    runs-on: ubuntu-latest      # GitHub-hosted, free for public repos
  train:
    needs: lint
    runs-on: [machine, gpu=a10g]

How do I check if my job is using the GPU efficiently?

Run nvidia-smi in your workflow (note that the -l flag loops until killed, so keep it out of a blocking step):

steps:
  - name: GPU info
    run: nvidia-smi

Or rely on the built-in metrics — sparkline charts appear on the dashboard after every job.

Can I specify regions?

Yes:

runs-on:
  - machine
  - gpu=l4
  - regions=us-east-1,us-east-2

See Regions for the available list.

Troubleshooting

My job is queued and never picks up

Self-hosted runners are blocked by default at the org level. See Enable self-hosted runners for the one-time configuration.

”No runner matching the specified labels”

  • Typo in gpu= value (it’s lowercase: gpu=l4 not gpu=L4)
  • The GPU type isn’t available in your specified region — see Regions
  • Self-hosted runners aren’t enabled for the repo

Out-of-memory errors

Options:

  1. Reduce batch size
  2. Enable gradient checkpointing
  3. Use mixed precision (FP16/BF16)
  4. Move to a GPU with more VRAM (e.g., L4 → L40S)

Spot instance terminated mid-job

Normal — spot instances can be preempted. Implement checkpointing in your code so you can resume from the last checkpoint on a fresh instance. See Cost Optimization for the pattern.
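One way to sketch the resume logic in a workflow step (the checkpoint path and --resume flag are hypothetical; adapt them to your training script):

```yaml
steps:
  - name: Train, resuming if a checkpoint exists
    run: |
      if [ -f checkpoints/latest.pt ]; then
        python train.py --resume checkpoints/latest.pt
      else
        python train.py
      fi
```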

Slow data transfer

  • Use Hugging Face Hub for models, checkpoints, and datasets
  • Cache frequently-used artifacts with actions/cache@v4
  • Use efficient formats (Parquet, not CSV)
  • Compress large uploads
  • Increase disk_throughput for I/O-heavy workloads
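For example, caching the Hugging Face download directory between runs with actions/cache (the cache key and script name are illustrative):

```yaml
steps:
  - uses: actions/checkout@v4
  - uses: actions/cache@v4
    with:
      path: ~/.cache/huggingface
      key: hf-${{ hashFiles('requirements.txt') }}
      restore-keys: hf-
  - run: python run_inference.py
```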

Where to learn more