Documentation v1.8.4, last updated 2026-06-05

LLM Supervised Fine-Tuning

Fine-tune LLMs such as Llama on machine.dev GPU runners, using LoRA training with automatic checkpointing and retries after spot-instance interruptions.

At a glance

  • Workload: Fine-tune Llama 3.2 3B Instruct on the FineTome-100k dataset using LoRA via Unsloth
  • Runner: gpu=t4, cpu=4, ram=16, tenancy=spot ($0.004/min)
  • Estimated cost: ~$0.15 per training run (~30 min)
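
The estimate is per-minute billing multiplied by run time. A minimal sketch of that arithmetic, using the T4 spot rate quoted here and the 180-minute job timeout set in the workflow on this page. The function name is illustrative, and the gap between $0.12 for exactly 30 minutes and the ~$0.15 estimate is assumed to be setup time such as dependency installs:

```python
def estimate_cost(minutes: float, rate_per_min: float) -> float:
    """Return the estimated runner cost in dollars for a run of the given length."""
    return round(minutes * rate_per_min, 4)


T4_SPOT_RATE = 0.004  # dollars per minute, from the runner spec on this page

# 30 minutes of pure training time on a T4 spot runner.
training_only = estimate_cost(30, T4_SPOT_RATE)

# Upper bound for a single attempt: the job is cancelled at timeout-minutes: 180.
timeout_ceiling = estimate_cost(180, T4_SPOT_RATE)

print(f"30 min: ${training_only:.2f}, 180 min ceiling: ${timeout_ceiling:.2f}")
```

With the retry workflow, the ceiling applies per attempt, so the worst case scales with max_attempts.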

This page shows how to fine-tune a language model on machine.dev GPU runners with checkpointing and spot-instance retries.

When to fine-tune

Reasons you might fine-tune an LLM:

  • Adapt a base model to a specific domain or task
  • Improve performance on the conversational style your product needs
  • Pull a model closer to your brand voice or formatting preferences
  • Reduce hallucinations on factual questions specific to your domain

How it works

The pipeline uses Unsloth for fast LoRA fine-tuning. It runs as a GitHub Actions workflow you trigger on demand with input parameters.

The job:

  1. Loads a base model (e.g. Llama 3.2 3B Instruct)
  2. Prepares a conversational dataset (FineTome-100k, OpenAssistant oasst1, or your own)
  3. Applies LoRA for memory-efficient training
  4. Saves checkpoints during training (in the retry-enabled workflow)
  5. Pushes the fine-tuned model to Hugging Face Hub
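
Step 3 is where the memory savings come from: LoRA freezes each original weight matrix and trains two small low-rank factors alongside it. A rough sketch of the parameter arithmetic for one square projection matrix follows. The 3072 layer width is an illustrative assumption, not a value read from the training script; 64 is the workflow's default lora_rank.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two LoRA factors: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the frozen weight matrix that the adapter sits next to."""
    return d_in * d_out


d = 3072   # illustrative layer width (assumption)
rank = 64  # default lora_rank input of the workflow

adapter = lora_trainable_params(d, d, rank)  # trainable parameters
frozen = full_params(d, d)                   # frozen parameters
fraction = adapter / frozen                  # share of the matrix that is trained

print(f"{adapter:,} trainable vs {frozen:,} frozen ({fraction:.1%})")
```

Only the adapter parameters need gradients and optimizer state, which is why a lower rank reduces memory use at the cost of adapter capacity.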

Workflow

The basic version:

name: Supervised Fine-Tuning

on:
  workflow_dispatch:
    inputs:
      source_model:
        type: string
        required: false
        description: 'The base model to fine-tune'
        default: 'unsloth/Llama-3.2-3B-Instruct'
      data_set:
        type: string
        required: false
        description: 'Which dataset to use for fine-tuning'
        default: 'finetome-100k'
      max_seq_length:
        type: string
        required: false
        description: 'The maximum sequence length'
        default: '4096'
      lora_rank:
        type: string
        required: false
        description: 'The LoRA rank'
        default: '64'
      max_steps:
        type: string
        required: false
        description: 'The maximum number of steps'
        default: '250'
      gpu_memory_utilization:
        type: string
        required: false
        description: 'The GPU memory utilization'
        default: '0.90'
      learning_rate:
        type: string
        required: false
        description: 'The learning rate'
        default: '2e-5'
      per_device_train_batch_size:
        type: string
        required: false
        description: 'The per device training batch size'
        default: '2'
      hf_repo:
        type: string
        required: true
        description: 'The Hugging Face repository to upload the model to'

jobs:
  train:
    name: Supervised LoRA Training (unsloth)
    runs-on: machine/gpu=t4/cpu=4/ram=16/architecture=x64
    timeout-minutes: 180
    env:
      SOURCE_MODEL: ${{ inputs.source_model }}
      MAX_SEQ_LENGTH: ${{ inputs.max_seq_length }}
      LORA_RANK: ${{ inputs.lora_rank }}
      DATA_SET: ${{ inputs.data_set }}
      GPU_MEMORY_UTILIZATION: ${{ inputs.gpu_memory_utilization }}
      MAX_STEPS: ${{ inputs.max_steps }}
      LEARNING_RATE: ${{ inputs.learning_rate }}
      PER_DEVICE_TRAIN_BATCH_SIZE: ${{ inputs.per_device_train_batch_size }}
      HF_TOKEN: ${{ secrets.HF_TOKEN }}
      HF_HUB_ENABLE_HF_TRANSFER: 1
      HF_REPO: ${{ inputs.hf_repo }}
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python 3.10
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt

      - name: Run Training
        run: |
          python3 train.py
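
Every workflow input reaches the training script as a string environment variable. train.py itself is not reproduced on this page, so the following is only a sketch of how such a script might parse those variables. TrainConfig and load_config are hypothetical names; the variable names and default values mirror the workflow above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass
class TrainConfig:
    source_model: str
    data_set: str
    max_seq_length: int
    lora_rank: int
    max_steps: int
    gpu_memory_utilization: float
    learning_rate: float
    per_device_train_batch_size: int
    hf_repo: str


def _get(env: Mapping[str, str], name: str, default: str) -> str:
    # Treat an empty string like a missing variable, since an unset input
    # expands to "" in the workflow's env block.
    return env.get(name) or default


def load_config(env: Mapping[str, str] = os.environ) -> TrainConfig:
    """Build a typed config from the env block (hypothetical helper)."""
    hf_repo = env.get("HF_REPO", "")
    if not hf_repo:
        # hf_repo is the only required input, so fail before training starts.
        raise ValueError("HF_REPO must be set")
    return TrainConfig(
        source_model=_get(env, "SOURCE_MODEL", "unsloth/Llama-3.2-3B-Instruct"),
        data_set=_get(env, "DATA_SET", "finetome-100k"),
        max_seq_length=int(_get(env, "MAX_SEQ_LENGTH", "4096")),
        lora_rank=int(_get(env, "LORA_RANK", "64")),
        max_steps=int(_get(env, "MAX_STEPS", "250")),
        gpu_memory_utilization=float(_get(env, "GPU_MEMORY_UTILIZATION", "0.90")),
        learning_rate=float(_get(env, "LEARNING_RATE", "2e-5")),
        per_device_train_batch_size=int(_get(env, "PER_DEVICE_TRAIN_BATCH_SIZE", "2")),
        hf_repo=hf_repo,
    )
```

Converting the strings in one place means a malformed value such as a non-numeric max_steps fails immediately, before any model download or GPU time is spent.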

Retry on spot interruption

For longer runs, the template repository (MachineDotDev/llm-supervised-fine-tuning) ships a second workflow that adds automatic checkpointing and retry:

name: Supervised Fine-Tuning with Retry

on:
  workflow_dispatch:
    inputs:
      attempt:
        type: string
        description: 'The attempt number'
        default: '1'
      max_attempts:
        type: number
        description: 'The maximum number of attempts'
        default: 5
      # Same parameters as in the basic workflow
      # ...

This avoids losing training progress when AWS reclaims a spot instance. The pattern:

  1. Save checkpoints to Hugging Face Hub during training
  2. Detect spot interruptions with a custom GitHub Action
  3. Restart the workflow with an incremented attempt number
  4. Resume from the latest checkpoint
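
Step 4 depends on picking the newest checkpoint. A small sketch of that selection, assuming checkpoints are stored in folders named checkpoint-<step>, which is the convention the Hugging Face Trainer uses. How the repository's own script lists and downloads checkpoints is not shown here, and latest_checkpoint is a hypothetical helper.

```python
import re
from typing import Iterable, Optional

# Matches folder names such as "checkpoint-200" (assumed naming convention).
_CHECKPOINT = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(names: Iterable[str]) -> Optional[str]:
    """Return the checkpoint folder with the highest step, or None if there is none."""
    best_step = -1
    best_name = None
    for name in names:
        match = _CHECKPOINT.match(name)
        if match is None:
            continue  # skip unrelated entries such as README.md
        step = int(match.group(1))
        # Compare steps numerically: as strings, "checkpoint-1000" sorts
        # before "checkpoint-200".
        if step > best_step:
            best_step, best_name = step, name
    return best_name


print(latest_checkpoint(["checkpoint-50", "checkpoint-1000", "checkpoint-200"]))
```

A None result fits the first attempt, where nothing has been saved yet and training starts from the base model.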

The retry walkthrough:

  1. The workflow starts a training job with a given attempt number (default: 1)
  2. Checkpoints get pushed to Hugging Face Hub on a schedule
  3. If the job completes, the workflow ends
  4. If the job fails on a spot interruption:
    • The check-runner-interruption action confirms a preemption was the cause
    • The workflow calculates the next attempt number
    • Within max_attempts, it triggers a new run with the incremented attempt
    • Original parameters carry over to the new attempt
  5. The new attempt downloads the latest checkpoint and resumes from there

Even if a spot instance is reclaimed, training picks up where it left off on the next instance.
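
The decision in step 4 of the walkthrough reduces to a small function. This sketch paraphrases the logic described above and is not the repository's implementation; next_attempt and its parameters are hypothetical names.

```python
from typing import Optional


def next_attempt(
    attempt: int,
    max_attempts: int,
    job_failed: bool,
    was_interrupted: bool,
) -> Optional[int]:
    """Return the attempt number to dispatch next, or None if no retry should happen."""
    if not job_failed:
        return None  # training completed, so the workflow ends
    if not was_interrupted:
        return None  # failure was not a spot preemption, so retrying would not help
    candidate = attempt + 1
    if candidate > max_attempts:
        return None  # retry budget is used up
    return candidate


# The attempt input is declared as a string, so convert it before doing arithmetic.
print(next_attempt(int("1"), 5, job_failed=True, was_interrupted=True))
```

Retrying only on confirmed preemptions keeps a genuine error, such as an out-of-memory crash, from burning through every attempt.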

Runner config

The default runner used here:

  • T4 GPU: 16 GB VRAM, fast enough for Unsloth-optimised training of 3B models
  • Spot tenancy: cheap, paired with the retry pattern above
  • Configurable CPU, RAM, and architecture

For 7B and larger models, step up the GPU:

runs-on: machine/gpu=l4/cpu=4/ram=16/architecture=x64

Getting started

  1. Use MachineDotDev/llm-supervised-fine-tuning as a template
  2. Create a Hugging Face access token with write permissions
  3. Add it as a repository secret named HF_TOKEN
  4. Open the Actions tab in your repository
  5. Pick the “Supervised Fine-Tuning with Retry” workflow
  6. Click “Run workflow” and set parameters:
    • Base model and dataset
    • Sequence length, LoRA rank, training steps
    • GPU memory utilisation and learning rate
    • Hugging Face target repository
  7. Start the run and follow its progress in the workflow logs
  8. The fine-tuned model lands on Hugging Face Hub
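
Steps 4 to 7 can also be done without the web UI, because workflow_dispatch workflows can be started through the GitHub REST API. The sketch below only builds the request. The owner, workflow file name, and branch are placeholders, and the GitHub token it needs is separate from the HF_TOKEN secret.

```python
import json
import urllib.request


def build_dispatch_request(owner, repo, workflow_file, ref, inputs, github_token):
    """Build (but do not send) a 'create workflow dispatch event' request."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    body = json.dumps({"ref": ref, "inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {github_token}",
        },
    )


request = build_dispatch_request(
    owner="your-org",                   # placeholder
    repo="llm-supervised-fine-tuning",  # your copy of the template
    workflow_file="sft-retry.yml",      # hypothetical file name
    ref="main",                         # placeholder branch
    inputs={"hf_repo": "your-user/your-model", "max_steps": "250"},
    github_token="<github-token>",      # placeholder, not HF_TOKEN
)

# Sending is left out on purpose; urllib.request.urlopen(request) would submit it.
print(request.get_method(), request.full_url)
```

Inputs left out of the payload fall back to the defaults declared in the workflow file.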

Tips

  • Pick a dataset that matches the domain you actually care about
  • Lower batch size if you hit OOM
  • Use the retry-enabled workflow for any run longer than ~20 minutes
  • Watch the workflow logs to track loss
  • Evaluate on prompts that look like your real workload, not just the eval set

How to adapt this

  • Larger model (7B+): swap gpu=t4 for gpu=l4 (24 GB VRAM, $0.006/min spot) or gpu=l40s (48 GB, $0.016/min spot). See CPU vs GPU.
  • Different base model: change the source_model input
  • Different dataset: change the data_set input or update the data loading code
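  • Run nightly: add a schedule trigger with cron: '0 2 * * *' (a list entry under on: schedule:) to the workflow. Scheduled runs do not receive workflow_dispatch inputs, so the env values need fallbacks, including one for hf_repo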
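
The GPU options above can be captured in a small lookup. A sketch, using only the VRAM sizes and spot prices quoted on this page. pick_runner is a hypothetical helper, and the VRAM a given model and sequence length need is something you have to estimate yourself.

```python
# VRAM and spot prices as quoted on this page.
GPUS = {
    "t4": {"vram_gb": 16, "spot_per_min": 0.004},
    "l4": {"vram_gb": 24, "spot_per_min": 0.006},
    "l40s": {"vram_gb": 48, "spot_per_min": 0.016},
}


def pick_runner(required_vram_gb: int, cpu: int = 4, ram: int = 16) -> str:
    """Return a runs-on label for the cheapest listed GPU with enough VRAM."""
    candidates = [
        (spec["spot_per_min"], name)
        for name, spec in GPUS.items()
        if spec["vram_gb"] >= required_vram_gb
    ]
    if not candidates:
        raise ValueError(f"no listed GPU has {required_vram_gb} GB of VRAM")
    _, name = min(candidates)  # cheapest option that fits
    return f"machine/gpu={name}/cpu={cpu}/ram={ram}/architecture=x64"


print(pick_runner(20))
```

The returned string follows the runs-on format used in the workflows on this page.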

Next steps