LLM Supervised Fine-Tuning — machine.dev docs


At a glance

Workload: Fine-tune Llama 3.2 3B Instruct on the FineTome-100k dataset using LoRA via Unsloth
Runner: gpu=t4, cpu=4, ram=16, tenancy=spot ($0.004/min)
Estimated cost: ~$0.15 per training run (~30 min)

This page shows how to fine-tune a language model on machine.dev GPU runners with automatic checkpointing and spot-instance retries.

Use Case Overview

Why might you want to fine-tune language models?

Adapt pre-trained models to specific domains or tasks
Improve performance on domain-specific conversational scenarios
Create models that better align with your brand voice or style
Reduce hallucinations and improve factual accuracy in specific domains

How It Works

The LLM Supervised Fine-Tuning workflow uses Unsloth to accelerate fine-tuning. It is defined in GitHub Actions workflow files and can be triggered on demand with configurable parameters.

The fine-tuning process:

Loads a specified base model (e.g., Llama 3.2 3B Instruct)
Prepares a conversational dataset (e.g., FineTome-100k or OpenAssistant’s oasst1)
Applies Low-Rank Adaptation (LoRA) for memory-efficient training
Automatically saves checkpoints during training (in the retry-enabled workflow)
Pushes the fine-tuned model to Hugging Face Hub
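
To make the memory saving from LoRA concrete, here is a minimal sketch of the parameter arithmetic. LoRA freezes a weight matrix of shape d x k and trains two small factors, B (d x r) and A (r x k), so the trainable count for that matrix drops from d * k to r * (d + k). The 3072 x 3072 shape below is an illustrative assumption, not a value read from the workflow; the rank of 64 matches the lora_rank default.

```python
def full_param_count(d: int, k: int) -> int:
    """Parameters in a dense d x k weight matrix."""
    return d * k


def lora_param_count(d: int, k: int, rank: int) -> int:
    """Trainable parameters LoRA adds for one d x k matrix: B (d x r) plus A (r x k)."""
    return rank * (d + k)


def lora_fraction(d: int, k: int, rank: int) -> float:
    """Share of the dense matrix's parameter count that LoRA actually trains."""
    return lora_param_count(d, k, rank) / full_param_count(d, k)


# Illustrative shape only (assumption); rank 64 is the workflow default.
D = K = 3072
RANK = 64
print(f"dense: {full_param_count(D, K):,}")
print(f"lora:  {lora_param_count(D, K, RANK):,}")
print(f"share: {lora_fraction(D, K, RANK):.2%}")
```

Lowering lora_rank shrinks the trainable share linearly, which is one of the levers available when a run does not fit in GPU memory.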

Workflow Implementation

The fine-tuning job is implemented as GitHub Actions workflows that are triggered manually through workflow_dispatch. Here’s the basic workflow definition:

name: Supervised Fine-Tuning

on:
  workflow_dispatch:
    inputs:
      source_model:
        type: string
        required: false
        description: 'The base model to fine-tune'
        default: 'unsloth/Llama-3.2-3B-Instruct'
      data_set:
        type: string
        required: false
        description: 'Which dataset to use for fine-tuning'
        default: 'finetome-100k'
      max_seq_length:
        type: string
        required: false
        description: 'The maximum sequence length'
        default: '4096'
      lora_rank:
        type: string
        required: false
        description: 'The lora rank'
        default: '64'
      max_steps:
        type: string
        required: false
        description: 'The maximum number of steps'
        default: '250'
      gpu_memory_utilization:
        type: string
        required: false
        description: 'The GPU memory utilization'
        default: '0.90'
      learning_rate:
        type: string
        required: false
        description: 'The learning rate'
        default: '2e-5'
      per_device_train_batch_size:
        type: string
        required: false
        description: 'The per device training batch size'
        default: '2'
      hf_repo:
        type: string
        required: true
        description: 'The Hugging Face repository to upload the model to'

jobs:
  train:
    name: Supervised LoRA Training (unsloth)
    runs-on:
      - machine
      - gpu=t4
      - cpu=4
      - ram=16
      - architecture=x64
    timeout-minutes: 180
    env:
      SOURCE_MODEL: ${{ inputs.source_model }}
      MAX_SEQ_LENGTH: ${{ inputs.max_seq_length }}
      LORA_RANK: ${{ inputs.lora_rank }}
      DATA_SET: ${{ inputs.data_set }}
      GPU_MEMORY_UTILIZATION: ${{ inputs.gpu_memory_utilization }}
      MAX_STEPS: ${{ inputs.max_steps }}
      LEARNING_RATE: ${{ inputs.learning_rate }}
      PER_DEVICE_TRAIN_BATCH_SIZE: ${{ inputs.per_device_train_batch_size }}
      HF_TOKEN: ${{ secrets.HF_TOKEN }}
      HF_HUB_ENABLE_HF_TRANSFER: 1
      HF_REPO: ${{ inputs.hf_repo }}
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python 3.10
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt

      - name: Run Training
        run: |
          python3 train.py
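
The workflow passes every input to train.py through environment variables, and all of them arrive as strings. The contents of train.py are not shown on this page, so the following is only a sketch of how a script could read that env block. TrainConfig and load_config are hypothetical names, and the fallback values simply mirror the workflow input defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TrainConfig:
    """Typed view of the workflow's env block (hypothetical helper)."""
    source_model: str
    data_set: str
    max_seq_length: int
    lora_rank: int
    max_steps: int
    gpu_memory_utilization: float
    learning_rate: float
    per_device_train_batch_size: int
    hf_repo: str


def load_config(env: Mapping[str, str] = os.environ) -> TrainConfig:
    """Parse the string-valued environment into typed settings."""

    def get(name: str, default: str) -> str:
        # An unset or empty variable falls back to the workflow default.
        return env.get(name) or default

    hf_repo = env.get("HF_REPO", "")
    if not hf_repo:
        # hf_repo is the only required workflow input.
        raise ValueError("HF_REPO must be set to the target Hugging Face repository")

    return TrainConfig(
        source_model=get("SOURCE_MODEL", "unsloth/Llama-3.2-3B-Instruct"),
        data_set=get("DATA_SET", "finetome-100k"),
        max_seq_length=int(get("MAX_SEQ_LENGTH", "4096")),
        lora_rank=int(get("LORA_RANK", "64")),
        max_steps=int(get("MAX_STEPS", "250")),
        gpu_memory_utilization=float(get("GPU_MEMORY_UTILIZATION", "0.90")),
        learning_rate=float(get("LEARNING_RATE", "2e-5")),
        per_device_train_batch_size=int(get("PER_DEVICE_TRAIN_BATCH_SIZE", "2")),
        hf_repo=hf_repo,
    )
```

Parsing everything in one place means a malformed value such as a non-numeric max_steps fails at startup rather than partway through a paid GPU run.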

Advanced Retry Mechanism

For enhanced reliability, the repository also provides a workflow with automatic checkpointing and retry functionality:

name: Supervised Fine-Tuning with Retry

on:
  workflow_dispatch:
    inputs:
      attempt:
        type: string
        description: 'The attempt number'
        default: '1'
      max_attempts:
        type: number
        description: 'The maximum number of attempts'
        default: 5
      # Same parameters as in the basic workflow
      # ...

This implementation ensures training progress isn’t lost due to spot instance interruptions by:

Automatically saving checkpoints to Hugging Face Hub during training
Detecting spot instance interruptions using a custom GitHub Action
Restarting the workflow with an incremented attempt number
Resuming training from the latest checkpoint
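
Resuming depends on finding the most recent checkpoint among those already saved. The sketch below assumes checkpoints are named checkpoint-<step>, the convention used by the Hugging Face Trainer; how the repository actually lays out checkpoints on the Hub is not shown on this page, and latest_checkpoint is a hypothetical helper.

```python
import re
from typing import Iterable, Optional

# Matches names such as "checkpoint-250" (assumed naming convention).
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(names: Iterable[str]) -> Optional[str]:
    """Return the checkpoint name with the highest step, or None if none match."""
    best_step = -1
    best_name: Optional[str] = None
    for name in names:
        match = _CHECKPOINT_RE.match(name)
        if match is None:
            continue  # Ignore unrelated files such as README.md.
        step = int(match.group(1))
        if step > best_step:
            best_step = step
            best_name = name
    return best_name


print(latest_checkpoint(["checkpoint-50", "checkpoint-200", "README.md"]))
```

Steps are compared as integers rather than strings, so checkpoint-1000 correctly ranks above checkpoint-200. With the Hugging Face Trainer, the selected directory could then be passed to train(resume_from_checkpoint=...).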

The retry mechanism works through the following steps:

The workflow starts a training job with a specified attempt number (default: 1)
During training, checkpoints are periodically saved to Hugging Face Hub
If the job completes successfully, the workflow ends
If the job fails due to a spot instance interruption:
- The check-runner-interruption action detects that the failure was due to a spot instance preemption
- The workflow calculates the next attempt number
- If within the maximum attempts limit, it triggers a new workflow run with an incremented attempt number
- All original parameters are preserved for the new attempt
When a new attempt starts, it downloads the latest checkpoint and resumes training from that point

If a spot instance is reclaimed, only the steps since the last saved checkpoint are lost: the job continues from that checkpoint on a new instance, as long as the attempt limit has not been reached.
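
The retry decision described in the steps above can be sketched as a small pure function. This is not the repository's implementation: plan_retry and RetryPlan are hypothetical names, the interrupted flag stands in for the result of the check-runner-interruption action, and treating "within the maximum attempts limit" as next attempt <= max_attempts is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RetryPlan:
    should_retry: bool
    # Inputs for the next workflow run, or None when no retry is planned.
    next_inputs: Optional[Dict[str, str]]


def plan_retry(inputs: Dict[str, str], interrupted: bool) -> RetryPlan:
    """Decide whether to re-dispatch the workflow after a failed attempt."""
    attempt = int(inputs.get("attempt", "1"))
    max_attempts = int(inputs.get("max_attempts", "5"))
    next_attempt = attempt + 1

    # Only spot interruptions are retried; other failures are left alone.
    if not interrupted or next_attempt > max_attempts:
        return RetryPlan(should_retry=False, next_inputs=None)

    # Preserve every original parameter and bump only the attempt number.
    next_inputs = dict(inputs)
    next_inputs["attempt"] = str(next_attempt)
    return RetryPlan(should_retry=True, next_inputs=next_inputs)
```

Keeping the decision separate from the code that triggers the next run makes the attempt limit easy to test without touching GitHub.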

Using machine.dev GPU Runners

This fine-tuning process runs on machine.dev GPU runners. The workflow is configured to use:

T4 GPU: An entry-level ML training GPU with 16 GB of VRAM, suitable for efficient training with Unsloth optimizations
Spot instance: To optimize for cost while maintaining performance
Configurable resources: CPU, RAM, and architecture specifications

For more demanding models or larger datasets, you can also configure the workflow to use more powerful GPUs:

runs-on:
  - machine
  - gpu=l4
  - cpu=4
  - ram=16
  - architecture=x64

Getting Started

To run the LLM Supervised Fine-Tuning workflow:

Use the MachineDotDev/llm-supervised-fine-tuning repository as a template
Set up a Hugging Face access token with write permissions
Add this token as a repository secret named HF_TOKEN in your GitHub repository settings
Navigate to the Actions tab in your repository
Select the “Supervised Fine-Tuning with Retry” workflow
Click “Run workflow” and configure your parameters:
- Choose your base model and dataset
- Adjust sequence length, LoRA rank, and training steps
- Configure GPU memory utilization and learning rate
- Specify your Hugging Face target repository
Run the workflow and wait for results
Access your fine-tuned model on Hugging Face Hub
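
The same run can also be started from a terminal instead of the Actions tab. As a sketch, the helper below assembles a GitHub CLI invocation, using gh workflow run with one -f key=value pair per input. build_dispatch_command is a hypothetical helper, and the repository name and step count in the example are placeholders.

```python
import shlex
from typing import Dict, List


def build_dispatch_command(workflow: str, inputs: Dict[str, str]) -> List[str]:
    """Build the argv for `gh workflow run`, one -f flag per workflow input."""
    argv = ["gh", "workflow", "run", workflow]
    for key, value in inputs.items():
        argv += ["-f", f"{key}={value}"]
    return argv


command = build_dispatch_command(
    "Supervised Fine-Tuning with Retry",
    {
        "hf_repo": "your-username/your-model",  # placeholder target repository
        "max_steps": "100",  # placeholder override of the default 250
    },
)
# shlex.join quotes the workflow name so the line can be pasted into a shell.
print(shlex.join(command))
```

Inputs that are left out fall back to the defaults declared in the workflow file, so only hf_repo has to be supplied.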

Best Practices

Select appropriate datasets: Choose datasets that match your target application domain
Adjust batch size for your GPU: Lower per_device_train_batch_size (default 2) if you encounter out-of-memory errors
Use checkpointing for longer runs: For extensive training sessions, use the retry-enabled workflow
Monitor training progress: Check workflow logs to observe loss metrics
Test with prompts similar to your use case: Evaluate the model on examples that match your intended application

How to adapt this

Larger model (7B+): swap gpu=t4 for gpu=l4 (24 GB VRAM, $0.006/min spot) or gpu=l40s (48 GB, $0.016/min spot) — see CPU vs GPU
Different base model: change the source_model input
Different dataset: change the data_set input or modify the data loading code
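Run nightly: add a schedule trigger under on: with a cron entry of '0 2 * * *' (daily at 02:00 UTC). Scheduled runs receive no workflow_dispatch inputs, so hf_repo and the other parameters need fallback values in the workflow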
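
When weighing a GPU swap, a rough cost comparison helps. The sketch below uses only the VRAM sizes and spot rates quoted on this page; estimate_cost and cheapest_gpu are hypothetical helpers, the VRAM requirement is something you supply yourself, and the estimate covers runner minutes at the listed rate and nothing else.

```python
from typing import Dict, NamedTuple


class GpuOption(NamedTuple):
    vram_gb: int
    spot_usd_per_min: float


# VRAM and spot prices as quoted on this page.
GPU_OPTIONS: Dict[str, GpuOption] = {
    "t4": GpuOption(vram_gb=16, spot_usd_per_min=0.004),
    "l4": GpuOption(vram_gb=24, spot_usd_per_min=0.006),
    "l40s": GpuOption(vram_gb=48, spot_usd_per_min=0.016),
}


def estimate_cost(gpu: str, minutes: float) -> float:
    """Runner cost in USD for a run of the given length, rounded to 4 decimals."""
    return round(GPU_OPTIONS[gpu].spot_usd_per_min * minutes, 4)


def cheapest_gpu(min_vram_gb: int) -> str:
    """Cheapest listed GPU with at least the requested amount of VRAM."""
    candidates = [
        (option.spot_usd_per_min, name)
        for name, option in GPU_OPTIONS.items()
        if option.vram_gb >= min_vram_gb
    ]
    if not candidates:
        raise ValueError(f"no listed GPU has {min_vram_gb} GB of VRAM")
    return min(candidates)[1]


for name in GPU_OPTIONS:
    print(name, estimate_cost(name, 30))
```

For a 30-minute run this gives $0.12 on a T4, $0.18 on an L4, and $0.48 on an L40S, which puts the T4 figure in the same range as the ~$0.15 estimate at the top of the page.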

Next steps

Working repo — fork or use as a template
Cost Optimization — checkpointing pattern explained
CPU vs GPU — picking the right GPU for your model size