At a glance
- **Workload:** Train ResNet on CIFAR-10 across a 2×2 hyperparameter grid (learning rates × batch sizes) using the GitHub Actions matrix strategy
- **Runner:** `gpu=t4, cpu=4, ram=16` × 4 parallel jobs ($0.004/min each)
- **Estimated cost:** ~$0.55 per full sweep (~30 min wall-clock thanks to parallelism)
This page shows how to use a GitHub Actions matrix to fan out hyperparameter combinations across parallel machine.dev GPU runners, then aggregate the results in a single comparison job.
Use Case Overview
Why might you want to use parallel hyperparameter tuning?
- Find optimal model configurations more efficiently by testing multiple parameter sets simultaneously
- Reduce the total time needed for hyperparameter search
- Systematically compare model performance across different configurations
- Automate the process of identifying the best-performing models
How It Works
The Parallel Hyperparameter Tuning workflow uses GitHub Actions’ matrix strategy to run multiple training jobs concurrently. Each job trains a ResNet model on the CIFAR-10 dataset with a different combination of hyperparameters. The workflow is defined in GitHub Actions and can be triggered on-demand.
The tuning process:
- Defines a matrix of hyperparameter combinations to explore
- Launches multiple GPU-powered training jobs concurrently, one for each combination
- Saves performance metrics from each training run as artifacts
- Aggregates and compares results across all runs
- Generates a comprehensive comparison report
Workflow Implementation
Parallel hyperparameter tuning is implemented as a GitHub Actions workflow that runs multiple jobs in parallel. Here’s the workflow definition:
```yaml
name: ResNet Hyperparameter Tuning

on:
  workflow_dispatch:

jobs:
  hyperparameter_tuning:
    name: Hyperparameter Tuning
    runs-on:
      - machine
      - gpu=t4
      - cpu=4
      - ram=16
      - architecture=x64
    timeout-minutes: 30
    strategy:
      fail-fast: false
      matrix:
        learning_rate: [0.001, 0.0005]
        batch_size: [32, 64]
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python 3.10
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install uv
        uses: astral-sh/setup-uv@v5

      - name: Install dependencies
        run: |
          uv venv .venv --python=3.10
          source .venv/bin/activate
          uv pip install -r requirements.txt
          deactivate

      - name: Train and Evaluate ResNet
        env:
          LEARNING_RATE: ${{ matrix.learning_rate }}
          BATCH_SIZE: ${{ matrix.batch_size }}
        run: |
          source .venv/bin/activate
          python train.py
          deactivate

      - name: Upload metrics artifact
        uses: actions/upload-artifact@v4
        with:
          name: metrics-${{ matrix.learning_rate }}-${{ matrix.batch_size }}
          path: metrics_*.json

  compare_tuning:
    needs: hyperparameter_tuning
    name: Compare Tuning Performance
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python 3.10
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install uv
        uses: astral-sh/setup-uv@v5

      - name: Install dependencies
        run: |
          uv venv .venv --python=3.10
          source .venv/bin/activate
          uv pip install -r requirements.txt
          deactivate

      - name: Download all metrics
        uses: actions/download-artifact@v4
        with:
          path: metrics

      - name: Compare Metrics
        run: |
          source .venv/bin/activate
          python compare_metrics.py
          deactivate

      - name: Upload comparison results
        uses: actions/upload-artifact@v4
        with:
          name: comparison-results
          path: model_comparison.csv
```
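The training script itself lives in the template repository; the workflow only assumes it reads `LEARNING_RATE` and `BATCH_SIZE` from the environment and writes a file matching the `metrics_*.json` upload glob. A minimal sketch of that contract (the training stub and metric names here are illustrative placeholders, not the repo’s actual code):

```python
# train.py -- minimal sketch of the contract the workflow assumes:
# read hyperparameters from env vars, train, write metrics_*.json.
# The real script trains ResNet on CIFAR-10; the stub and metric
# names below are illustrative placeholders.
import json
import os


def train_and_evaluate(lr: float, batch_size: int) -> dict:
    # Placeholder for the actual ResNet/CIFAR-10 training loop.
    # Return whatever metrics you want to compare across runs.
    return {"val_accuracy": 0.0, "val_loss": 0.0}


def main() -> None:
    lr = float(os.environ["LEARNING_RATE"])
    batch_size = int(os.environ["BATCH_SIZE"])

    metrics = train_and_evaluate(lr, batch_size)
    metrics.update({"learning_rate": lr, "batch_size": batch_size})

    # The file name must match the workflow's upload glob: metrics_*.json
    out = f"metrics_lr{lr}_bs{batch_size}.json"
    with open(out, "w") as f:
        json.dump(metrics, f, indent=2)


if __name__ == "__main__":
    main()
```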
Key Features
The power of this implementation comes from several key features:
- **Matrix Strategy**: The workflow defines a matrix of hyperparameters, automatically creating a separate job for each combination. In this example, we’re exploring two learning rates (0.001, 0.0005) and two batch sizes (32, 64), resulting in 4 concurrent training jobs.
- **Parallel Execution**: Each hyperparameter combination runs as a separate job on its own GPU runner, allowing multiple experiments to run simultaneously rather than sequentially.
- **Metrics Collection**: Each training job produces performance metrics that are saved as artifacts with names that indicate the hyperparameter values used.
- **Automated Comparison**: After all training jobs complete, a separate job downloads all metrics and generates a comparison report, making it easy to identify the best configuration.
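For the comparison side, here is a minimal sketch of what a `compare_metrics.py` could look like. It assumes `download-artifact@v4` with only a `path` unpacks every artifact into its own subdirectory under `metrics/` (its documented behavior when no `name` is given), and that each JSON file carries the fields from the training sketch above; ranking by `val_accuracy` is an assumption:

```python
# compare_metrics.py -- minimal sketch, not the template repo's exact script.
# Assumes actions/download-artifact@v4 unpacked each artifact into
# metrics/<artifact-name>/metrics_*.json, and that each file contains
# the fields written by the train.py sketch (val_accuracy is assumed).
import csv
import json
from pathlib import Path


def main() -> None:
    runs = []
    for path in Path("metrics").rglob("metrics_*.json"):
        with open(path) as f:
            runs.append(json.load(f))

    if not runs:
        raise SystemExit("No metrics files found under metrics/")

    # Rank configurations by validation accuracy, best first.
    runs.sort(key=lambda r: r.get("val_accuracy", 0.0), reverse=True)

    with open("model_comparison.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=runs[0].keys())
        writer.writeheader()
        writer.writerows(runs)

    best = runs[0]
    print(f"Best config: lr={best['learning_rate']}, "
          f"batch_size={best['batch_size']}, "
          f"val_accuracy={best['val_accuracy']}")


if __name__ == "__main__":
    main()
```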
Using machine.dev GPU Runners
This hyperparameter tuning process leverages machine.dev GPU runners to provide the necessary computing power for efficient model training. The workflow is configured to use:
- T4 GPU: An entry-level ML GPU with 16GB VRAM, well-suited for training moderate-sized models
- Configurable resources: CPU, RAM, and architecture specifications optimized for each training job
The parallel nature of this approach means that you can complete a hyperparameter search in a fraction of the time it would take to run sequentially, even when using the same hardware resources per job.
Best Practices
- Choose parameters wisely: Select hyperparameters that have the most impact on model performance
- Start with a broad search: Begin with a wide range of values, then refine with narrower ranges around promising values
- Consider resource allocation: Adjust CPU/RAM requirements based on your specific model and dataset needs
- Set appropriate timeouts: Ensure your workflow timeout is sufficient for all jobs to complete
- Use `fail-fast: false`: this ensures all combinations are evaluated even if some fail, giving you a complete picture
Getting Started
To run the Parallel Hyperparameter Tuning workflow:
1. Use the MachineDotDev/parallel-hyperparameter-tuning repository as a template
2. Navigate to the Actions tab in your repository
3. Select the “ResNet Hyperparameter Tuning” workflow
4. Click “Run workflow” to start the tuning process
5. Wait for all jobs to complete
6. Download the `comparison-results` artifact to identify the best hyperparameter configuration
Customizing the Workflow
You can easily adapt this workflow for your own models and hyperparameters:
- Modify the matrix in the workflow file to include your specific hyperparameters (see the sketch after this list)
- Update the training script (train.py) to work with your model and dataset
- Adjust the metrics collection to capture the performance indicators most relevant to your task
- Customize the comparison script (compare_metrics.py) to generate insights tailored to your needs
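As a sketch of what an extended matrix could look like (the `optimizer` and `weight_decay` keys here are illustrative, and `train.py` would need to read the corresponding environment variables):

```yaml
# Illustrative matrix extension: 2 x 2 x 2 x 2 = 16 parallel jobs.
# train.py would need to read OPTIMIZER and WEIGHT_DECAY env vars too.
strategy:
  fail-fast: false
  matrix:
    learning_rate: [0.001, 0.0005]
    batch_size: [32, 64]
    optimizer: [adam, sgd]
    weight_decay: [0.0, 0.0001]
```

Every key you add multiplies the number of concurrent jobs, so a fully crossed matrix grows quickly; keep that in mind when budgeting runner minutes.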
How to adapt this
- More hyperparameters: add `optimizer`, `weight_decay`, `dropout`, etc. to the matrix — every combination spawns its own runner
- Larger sweep: a matrix with 5×5×5 = 125 combinations is fine; each combination runs on its own runner concurrently
- Larger model: bump to `gpu=l4` or `gpu=a10g` if the T4’s 16 GB VRAM is too tight
- Use spot: add `tenancy=spot` to cut costs by 70–90% (the example currently uses on-demand)
Next steps
- Working repo — fork or use as a template
- Cost Optimization — spot pricing for sweep economics
- LLM Supervised Fine-Tuning — apply the same matrix technique to LLM fine-tunes