Documentation v1.3.1 · Last updated 2026-04-26

Parallel Hyperparameter Tuning

Run parallel hyperparameter tuning on machine.dev GPU runners. Use GitHub Actions matrix strategy to test multiple configurations simultaneously.

At a glance

  • Workload: Train ResNet on CIFAR-10 across a 2×2 hyperparameter grid (learning rates × batch sizes) using GitHub Actions matrix strategy
  • Runner: gpu=t4, cpu=4, ram=16 × 4 parallel jobs ($0.004/min each)
  • Estimated cost: ~$0.55 per full sweep (~30 min wall-clock thanks to parallelism)

This page shows how to use a GitHub Actions matrix to fan out hyperparameter combinations across parallel machine.dev GPU runners, then aggregate the results in a single comparison job.

Use Case Overview

Why might you want to use parallel hyperparameter tuning?

  • Find optimal model configurations more efficiently by testing multiple parameter sets simultaneously
  • Reduce the total time needed for hyperparameter search
  • Systematically compare model performance across different configurations
  • Automate the process of identifying the best-performing models

How It Works

The Parallel Hyperparameter Tuning workflow uses GitHub Actions’ matrix strategy to run multiple training jobs concurrently. Each job trains a ResNet model on the CIFAR-10 dataset with a different combination of hyperparameters. The workflow is defined in GitHub Actions and can be triggered on-demand.

The tuning process:

  1. Defines a matrix of hyperparameter combinations to explore
  2. Launches multiple GPU-powered training jobs concurrently, one for each combination
  3. Saves performance metrics from each training run as artifacts
  4. Aggregates and compares results across all runs
  5. Generates a comprehensive comparison report
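Steps 2 and 3 hinge on the training script reading its hyperparameters from the environment and writing a uniquely named metrics file. A minimal sketch of what `train.py` might look like (the actual script in the template repository may differ; the training itself is stubbed out here):

```python
import json
import os


def run_training_job():
    # The workflow's matrix injects these via the job's env block.
    lr = float(os.environ.get("LEARNING_RATE", "0.001"))
    batch_size = int(os.environ.get("BATCH_SIZE", "32"))

    # ... train ResNet on CIFAR-10 here; omitted in this sketch ...
    metrics = {
        "learning_rate": lr,
        "batch_size": batch_size,
        "val_accuracy": 0.0,  # placeholder; a real run records measured accuracy
    }

    # Encode the configuration in the file name so the metrics_*.json
    # artifacts from parallel jobs never collide.
    out_path = f"metrics_{lr}_{batch_size}.json"
    with open(out_path, "w") as f:
        json.dump(metrics, f)
    return out_path
```

Because each job writes `metrics_<lr>_<batch_size>.json`, the `metrics_*.json` glob in the upload step picks up exactly one file per matrix combination.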

Workflow Implementation

Parallel Hyperparameter Tuning is implemented as a GitHub Actions workflow that runs multiple jobs in parallel. Here’s the workflow definition:

name: ResNet Hyperparameter Tuning

on:
  workflow_dispatch:

jobs:
  hyperparameter_tuning:
    name: Hyperparameter Tuning
    runs-on:
      - machine
      - gpu=t4
      - cpu=4
      - ram=16
      - architecture=x64
    timeout-minutes: 30
    strategy:
      fail-fast: false
      matrix:
        learning_rate: [0.001, 0.0005]
        batch_size: [32, 64]
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python 3.10
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install uv
        uses: astral-sh/setup-uv@v5

      - name: Install dependencies
        run: |
          uv venv .venv --python=3.10
          source .venv/bin/activate
          uv pip install -r requirements.txt
          deactivate

      - name: Train and Evaluate ResNet
        env:
          LEARNING_RATE: ${{ matrix.learning_rate }}
          BATCH_SIZE: ${{ matrix.batch_size }}
        run: |
          source .venv/bin/activate
          python train.py
          deactivate

      - name: Upload metrics artifact
        uses: actions/upload-artifact@v4
        with:
          name: metrics-${{ matrix.learning_rate }}-${{ matrix.batch_size }}
          path: metrics_*.json

  compare_tuning:
    needs: hyperparameter_tuning
    name: Compare Tuning Performance
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python 3.10
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install uv
        uses: astral-sh/setup-uv@v5

      - name: Install dependencies
        run: |
          uv venv .venv --python=3.10
          source .venv/bin/activate
          uv pip install -r requirements.txt
          deactivate

      - name: Download all metrics
        uses: actions/download-artifact@v4
        with:
          path: metrics

      - name: Compare Metrics
        run: |
          source .venv/bin/activate
          python compare_metrics.py
          deactivate

      - name: Upload comparison results
        uses: actions/upload-artifact@v4
        with:
          name: comparison-results
          path: model_comparison.csv

Key Features

The power of this implementation comes from several key features:

  1. Matrix Strategy: The workflow defines a matrix of hyperparameters, automatically creating separate jobs for each combination. In this example, we’re exploring two learning rates (0.001, 0.0005) and two batch sizes (32, 64), resulting in 4 concurrent training jobs.

  2. Parallel Execution: Each hyperparameter combination runs as a separate job on its own GPU runner, allowing multiple experiments to run simultaneously rather than sequentially.

  3. Metrics Collection: Each training job produces performance metrics that are saved as artifacts with names that indicate the hyperparameter values used.

  4. Automated Comparison: After all training jobs complete, a separate job downloads all metrics and generates a comparison report, making it easy to identify the best configuration.

Using machine.dev GPU Runners

This hyperparameter tuning process runs on machine.dev GPU runners, which provide the compute needed for efficient model training. The workflow is configured to use:

  • T4 GPU: An entry-level ML GPU with 16GB VRAM, well-suited for training moderate-sized models
  • Configurable resources: CPU, RAM, and architecture specifications optimized for each training job

The parallel nature of this approach means that you can complete a hyperparameter search in a fraction of the time it would take to run sequentially, even when using the same hardware resources per job.

Best Practices

  • Choose parameters wisely: Select hyperparameters that have the most impact on model performance
  • Start with a broad search: Begin with a wide range of values, then refine with narrower ranges around promising values
  • Consider resource allocation: Adjust CPU/RAM requirements based on your specific model and dataset needs
  • Set appropriate timeouts: Ensure your workflow timeout is sufficient for all jobs to complete
  • Use fail-fast: false: This ensures all combinations are evaluated even if some fail, giving you a complete picture

Getting Started

To run the Parallel Hyperparameter Tuning workflow:

  1. Use the MachineDotDev/parallel-hyperparameter-tuning repository as a template
  2. Navigate to the Actions tab in your repository
  3. Select the “ResNet Hyperparameter Tuning” workflow
  4. Click “Run workflow” to start the tuning process
  5. Wait for all jobs to complete
  6. Download the comparison-results artifact to identify the best hyperparameter configuration

Customizing the Workflow

You can easily adapt this workflow for your own models and hyperparameters:

  1. Modify the matrix in the workflow file to include your specific hyperparameters
  2. Update the training script (train.py) to work with your model and dataset
  3. Adjust the metrics collection to capture the performance indicators most relevant to your task
  4. Customize the comparison script (compare_metrics.py) to generate insights tailored to your needs

How to adapt this

  • More hyperparameters: add optimizer, weight_decay, dropout, etc. to the matrix — every combination spawns its own runner
  • Larger sweep: matrix with 5×5×5 = 125 combinations is fine; each runs on its own runner concurrently
  • Larger model: bump to gpu=l4 or gpu=a10g if T4’s 16 GB VRAM is too tight
  • Use spot: add tenancy=spot to cut costs by 70–90% (the example currently uses on-demand)
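Before committing to a larger sweep, it helps to estimate how many runners it will spawn and what it will cost. The job count is simply the cross product of the matrix values; a small sketch (the $0.004/min T4 rate and 30-minute timeout come from this page’s example, so adjust them for your runner and workload):

```python
from itertools import product


def sweep_size_and_cost(grid, minutes_per_job=30, rate_per_min=0.004):
    """Return (job_count, estimated_cost) for a matrix sweep.

    grid: dict mapping hyperparameter name -> list of values.
    Every combination in the cross product becomes one matrix job.
    """
    combos = list(product(*grid.values()))
    cost = len(combos) * minutes_per_job * rate_per_min
    return len(combos), cost
```

For the 2×2 grid in this page, that is 4 jobs at roughly $0.48 of GPU time; a 5×5×5 grid yields 125 jobs, which GitHub Actions and machine.dev will happily run concurrently.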

Next steps