At a glance
Workload: Fine-tune Llama 3.2 3B Instruct on the FineTome-100k dataset using LoRA via Unsloth.
Runner: gpu=t4, cpu=4, ram=16, tenancy=spot ($0.004/min).
Estimated cost: ~$0.15 per training run (~30 min).
This page shows how to fine-tune a language model on machine.dev GPU runners with checkpointing and spot-instance retries.
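The cost estimate above is per-minute billing multiplied by run time. A minimal sketch of that arithmetic, using the spot rates quoted on this page; the `estimate_cost` helper is hypothetical, not part of machine.dev. Thirty minutes of pure training time on a T4 comes to $0.12, so the ~$0.15 figure presumably also covers a few minutes of setup such as installing dependencies and downloading the model (that reading is an assumption).

```python
# Spot rates in USD per minute, as quoted on this page.
SPOT_RATE_PER_MIN = {"t4": 0.004, "l4": 0.006, "l40s": 0.016}


def estimate_cost(gpu: str, minutes: float) -> float:
    """Estimated runner cost in USD for a run of the given length."""
    return round(SPOT_RATE_PER_MIN[gpu] * minutes, 3)


if __name__ == "__main__":
    for gpu in SPOT_RATE_PER_MIN:
        print(f"{gpu}: ${estimate_cost(gpu, 30):.3f} for a 30-minute run")
```

Larger models usually need more wall-clock time as well as a pricier GPU, so treat the 30-minute figure as specific to the 3B default.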
When to fine-tune
Reasons you might fine-tune an LLM:
- Adapt a base model to a specific domain or task
- Improve performance on the conversational style your product needs
- Pull a model closer to your brand voice or formatting preferences
- Reduce hallucinations on factual questions specific to your domain
How it works
The pipeline uses Unsloth for fast LoRA fine-tuning. It runs as a GitHub Actions workflow you trigger on demand with input parameters.
The job:
- Loads a base model (e.g. Llama 3.2 3B Instruct)
- Prepares a conversational dataset (FineTome-100k, OpenAssistant oasst1, or your own)
- Applies LoRA for memory-efficient training
- Saves checkpoints during training (in the retry-enabled workflow)
- Pushes the fine-tuned model to Hugging Face Hub
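To see why LoRA is memory-efficient, compare parameter counts. LoRA freezes a weight matrix of shape d_out x d_in and trains two low-rank factors instead, which adds rank * (d_in + d_out) trainable parameters. The sketch below works that out for a single layer; the 4096 width is an illustrative value, not read from any particular model.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two LoRA factors: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the frozen dense weight matrix the adapter sits beside."""
    return d_in * d_out


if __name__ == "__main__":
    d = 4096   # illustrative layer width (assumption)
    rank = 64  # the workflow's default lora_rank
    lora = lora_trainable_params(d, d, rank)
    full = full_params(d, d)
    print(f"LoRA: {lora:,} trainable vs {full:,} frozen ({lora / full:.1%})")
```

Lowering `lora_rank` shrinks the trainable set linearly, which is one lever to pull when memory is tight.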
Workflow
The basic version:
name: Supervised Fine-Tuning
on:
  workflow_dispatch:
    inputs:
      source_model:
        type: string
        required: false
        description: 'The base model to fine-tune'
        default: 'unsloth/Llama-3.2-3B-Instruct'
      data_set:
        type: string
        required: false
        description: 'Which dataset to use for fine-tuning'
        default: 'finetome-100k'
      max_seq_length:
        type: string
        required: false
        description: 'The maximum sequence length'
        default: '4096'
      lora_rank:
        type: string
        required: false
        description: 'The lora rank'
        default: '64'
      max_steps:
        type: string
        required: false
        description: 'The maximum number of steps'
        default: '250'
      gpu_memory_utilization:
        type: string
        required: false
        description: 'The GPU memory utilization'
        default: '0.90'
      learning_rate:
        type: string
        required: false
        description: 'The learning rate'
        default: '2e-5'
      per_device_train_batch_size:
        type: string
        required: false
        description: 'The per device training batch size'
        default: '2'
      hf_repo:
        type: string
        required: true
        description: 'The Hugging Face repository to upload the model to'
jobs:
  train:
    name: Supervised LoRA Training (unsloth)
    runs-on: machine/gpu=t4/cpu=4/ram=16/architecture=x64
    timeout-minutes: 180
    env:
      SOURCE_MODEL: ${{ inputs.source_model }}
      MAX_SEQ_LENGTH: ${{ inputs.max_seq_length }}
      LORA_RANK: ${{ inputs.lora_rank }}
      DATA_SET: ${{ inputs.data_set }}
      GPU_MEMORY_UTILIZATION: ${{ inputs.gpu_memory_utilization }}
      MAX_STEPS: ${{ inputs.max_steps }}
      LEARNING_RATE: ${{ inputs.learning_rate }}
      PER_DEVICE_TRAIN_BATCH_SIZE: ${{ inputs.per_device_train_batch_size }}
      HF_TOKEN: ${{ secrets.HF_TOKEN }}
      HF_HUB_ENABLE_HF_TRANSFER: 1
      HF_REPO: ${{ inputs.hf_repo }}
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python 3.10
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
      - name: Run Training
        run: |
          python3 train.py
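The workflow hands every input to train.py as an environment variable, and all of them arrive as strings. The sketch below shows one way a script could parse them into typed values; it is not the repository's actual train.py. The `TrainConfig` and `load_config` names are hypothetical, and the fallback defaults simply mirror the workflow inputs.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TrainConfig:
    source_model: str
    data_set: str
    max_seq_length: int
    lora_rank: int
    max_steps: int
    gpu_memory_utilization: float
    learning_rate: float
    per_device_train_batch_size: int
    hf_repo: str


def _get(env: Mapping[str, str], name: str, default: str) -> str:
    """Return the variable's value, treating unset and empty the same way."""
    return env.get(name, "").strip() or default


def load_config(env: Optional[Mapping[str, str]] = None) -> TrainConfig:
    """Parse the workflow's environment variables into typed settings."""
    env = os.environ if env is None else env
    hf_repo = _get(env, "HF_REPO", "")
    if not hf_repo:
        raise ValueError("HF_REPO is required: set the hf_repo workflow input")
    return TrainConfig(
        source_model=_get(env, "SOURCE_MODEL", "unsloth/Llama-3.2-3B-Instruct"),
        data_set=_get(env, "DATA_SET", "finetome-100k"),
        max_seq_length=int(_get(env, "MAX_SEQ_LENGTH", "4096")),
        lora_rank=int(_get(env, "LORA_RANK", "64")),
        max_steps=int(_get(env, "MAX_STEPS", "250")),
        gpu_memory_utilization=float(_get(env, "GPU_MEMORY_UTILIZATION", "0.90")),
        learning_rate=float(_get(env, "LEARNING_RATE", "2e-5")),
        per_device_train_batch_size=int(_get(env, "PER_DEVICE_TRAIN_BATCH_SIZE", "2")),
        hf_repo=hf_repo,
    )


if __name__ == "__main__":
    # Example with an explicit mapping instead of the real environment.
    print(load_config({"HF_REPO": "your-user/your-model", "LORA_RANK": "16"}))
```

Treating empty strings as unset keeps the script usable when a variable is defined but blank.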
Retry on spot interruption
For longer runs, the repository ships a workflow with automatic checkpointing and retry:
name: Supervised Fine-Tuning with Retry
on:
  workflow_dispatch:
    inputs:
      attempt:
        type: string
        description: 'The attempt number'
        default: '1'
      max_attempts:
        type: number
        description: 'The maximum number of attempts'
        default: 5
      # Same parameters as in the basic workflow
      # ...
This avoids losing training progress when AWS reclaims a spot instance. The pattern:
- Save checkpoints to Hugging Face Hub during training
- Detect spot interruptions with a custom GitHub Action
- Restart the workflow with an incremented attempt number
- Resume from the latest checkpoint
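Resuming depends on finding the newest checkpoint once it is on local disk. The Hugging Face Trainer writes checkpoints to directories named `checkpoint-<step>`; whether this repository's train.py keeps that layout is an assumption, and the `latest_checkpoint` helper is hypothetical. The download from the Hub is left out of this sketch.

```python
import re
from pathlib import Path
from typing import Optional, Union

_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir: Union[str, Path]) -> Optional[Path]:
    """Return the checkpoint directory with the highest step number, if any."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for child in root.iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        if child.is_dir() and match:
            step = int(match.group(1))
            # Compare numerically: as strings, "checkpoint-200" sorts after "checkpoint-1000".
            if step > best_step:
                best_step, best_path = step, child
    return best_path


if __name__ == "__main__":
    found = latest_checkpoint("outputs")  # "outputs" is a placeholder directory
    print(found if found else "No checkpoint found; starting from scratch")
```

A `None` result means a first attempt, so the same script can serve both fresh and resumed runs.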
The retry walkthrough:
- The workflow starts a training job with a given attempt number (default: 1)
- Checkpoints get pushed to Hugging Face Hub on a schedule
- If the job completes, the workflow ends
- If the job fails on a spot interruption:
  - The check-runner-interruption action confirms a preemption was the cause
  - The workflow calculates the next attempt number
  - Within max_attempts, it triggers a new run with the incremented attempt
  - Original parameters carry over to the new attempt
- The new attempt downloads the latest checkpoint and resumes from there
Even if a spot instance is reclaimed, training picks up where it left off on the next instance.
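The retry decision in the walkthrough comes down to a few conditions. The sketch below models it as a pure function; the real workflow makes this decision in GitHub Actions steps, and the `next_attempt` name and signature are hypothetical.

```python
from typing import Optional


def next_attempt(attempt: int, max_attempts: int,
                 succeeded: bool, interrupted: bool) -> Optional[int]:
    """Return the attempt number to dispatch next, or None to stop."""
    if succeeded:
        return None  # training finished; nothing to retry
    if not interrupted:
        return None  # a genuine failure, not a preemption: do not loop on it
    if attempt >= max_attempts:
        return None  # retry budget exhausted
    return attempt + 1


if __name__ == "__main__":
    # The attempt input arrives as a string, so convert it before comparing.
    print(next_attempt(int("1"), 5, succeeded=False, interrupted=True))
```

Retrying only on confirmed preemptions keeps a bug in the training script from burning through every attempt.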
Runner config
The default runner used here:
- T4 GPU: 16 GB VRAM, fast enough for Unsloth-optimised training of 3B models
- Spot tenancy: cheap, paired with the retry pattern above
- Configurable CPU, RAM, and architecture
For 7B and larger models, step up the GPU:
runs-on: machine/gpu=l4/cpu=4/ram=16/architecture=x64
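If you generate workflows for several model sizes, the runner label can be assembled from its parts. This is a sketch only: the label format is copied from the examples on this page, the 7B threshold follows the guidance above, and the `pick_gpu` and `runner_label` helpers are hypothetical.

```python
def pick_gpu(model_size_b: float) -> str:
    """Pick a GPU type from the model size in billions of parameters."""
    return "t4" if model_size_b < 7 else "l4"


def runner_label(gpu: str = "t4", cpu: int = 4, ram: int = 16,
                 architecture: str = "x64") -> str:
    """Build a runs-on label in the format used by the workflows on this page."""
    return f"machine/gpu={gpu}/cpu={cpu}/ram={ram}/architecture={architecture}"


if __name__ == "__main__":
    for size in (3, 7):
        print(f"{size}B -> runs-on: {runner_label(pick_gpu(size))}")
```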
Getting started
- Use MachineDotDev/llm-supervised-fine-tuning as a template
- Create a Hugging Face access token with write permissions
- Add it as a repository secret named HF_TOKEN
- Open the Actions tab in your repository
- Pick the “Supervised Fine-Tuning with Retry” workflow
- Click “Run workflow” and set parameters:
  - Base model and dataset
  - Sequence length, LoRA rank, training steps
  - GPU memory utilisation and learning rate
  - Hugging Face target repository
- Run it and wait
- The fine-tuned model lands on Hugging Face Hub
Tips
- Pick a dataset that matches the domain you actually care about
- Lower batch size if you hit OOM
- Use the retry-enabled workflow for any run longer than ~20 minutes
- Watch the workflow logs to track loss
- Evaluate on prompts that look like your real workload, not just the eval set
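Lowering the batch size on OOM also changes the effective batch size unless gradient accumulation rises to compensate. A minimal sketch of that trade-off follows; gradient accumulation is not one of the workflow inputs on this page, so treating it as tunable is an assumption, and the `shrink_batch` helper is hypothetical.

```python
import math
from typing import Tuple


def shrink_batch(per_device_batch_size: int, grad_accum_steps: int) -> Tuple[int, int]:
    """Halve the per-device batch and raise accumulation to compensate.

    The effective batch size (batch * accumulation) never drops below the original.
    """
    if per_device_batch_size <= 1:
        raise ValueError("Batch size is already 1; reduce max_seq_length instead")
    effective = per_device_batch_size * grad_accum_steps
    new_batch = per_device_batch_size // 2
    new_accum = math.ceil(effective / new_batch)
    return new_batch, new_accum


if __name__ == "__main__":
    batch, accum = shrink_batch(2, 4)
    print(f"per-device batch {batch}, accumulation steps {accum}")
```

Peak memory follows the per-device batch, while the optimizer still sees roughly the same number of examples per update.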
How to adapt this
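- Larger model (7B+): swap gpu=t4 for gpu=l4 (24 GB VRAM, $0.006/min spot) or gpu=l40s (48 GB VRAM, $0.016/min spot). See CPU vs GPU.
- Different base model: change the source_model input
- Different dataset: change the data_set input or update the data loading code
- Run nightly: add a schedule trigger with cron: '0 2 * * *' under on: in the workflow. Scheduled runs do not receive workflow_dispatch inputs or their defaults, so set fallback values for the parameters, including hf_repo.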
Next steps
- Working repo: fork or use as a template
- Cost Optimization: checkpointing pattern explained
- CPU vs GPU: picking the right GPU for your model size