At a glance
Workload: Fine-tune Llama 3.2 3B Instruct on the FineTome-100k dataset using LoRA via Unsloth
Runner: gpu=t4, cpu=4, ram=16, tenancy=spot ($0.004/min)
Estimated cost: ~$0.15 per training run (~30 min)
This page shows how to fine-tune a language model on machine.dev GPU runners with automatic checkpointing and spot-instance retries.
Use Case Overview
Why might you want to fine-tune language models?
- Adapt pre-trained models to specific domains or tasks
- Improve performance on domain-specific conversational scenarios
- Create models that better align with your brand voice or style
- Reduce hallucinations and improve factual accuracy in specific domains
How It Works
The LLM Supervised Fine-Tuning workflow uses Unsloth to accelerate fine-tuning. It is defined in GitHub Actions workflow files and can be triggered on demand with configurable parameters.
The fine-tuning process:
- Loads a specified base model (e.g., Llama 3.2 3B Instruct)
- Prepares a conversational dataset (e.g., FineTome-100k or OpenAssistant’s oasst1)
- Applies Low-Rank Adaptation (LoRA) for memory-efficient training
- Automatically saves checkpoints during training (in the retry-enabled workflow)
- Pushes the fine-tuned model to Hugging Face Hub
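To make the "memory-efficient" claim for LoRA concrete, here is a small arithmetic sketch. LoRA trains two low-rank factors of shapes (d_out x r) and (r x d_in) instead of a full d_out x d_in weight update. The 3072 x 3072 matrix size below is an illustrative assumption and is not taken from this page; the rank of 64 matches the workflow's default lora_rank.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    full: parameters updated when the whole d_out x d_in matrix is trained.
    lora: parameters in the two low-rank factors B (d_out x r) and A (r x d_in).
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


# Illustrative (assumed) matrix size; rank 64 is the workflow default.
full, lora = lora_param_counts(d_out=3072, d_in=3072, rank=64)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

Under these assumptions the adapter trains roughly 4% of the parameters of the matrix it wraps, which is why a lower rank reduces memory use at the cost of adapter capacity.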
Workflow Implementation
LLM Supervised Fine-Tuning is implemented as GitHub Actions workflows that you trigger manually. Here’s the basic workflow definition:
name: Supervised Fine-Tuning
on:
  workflow_dispatch:
    inputs:
      source_model:
        type: string
        required: false
        description: 'The base model to fine-tune'
        default: 'unsloth/Llama-3.2-3B-Instruct'
      data_set:
        type: string
        required: false
        description: 'Which dataset to use for fine-tuning'
        default: 'finetome-100k'
      max_seq_length:
        type: string
        required: false
        description: 'The maximum sequence length'
        default: '4096'
      lora_rank:
        type: string
        required: false
        description: 'The lora rank'
        default: '64'
      max_steps:
        type: string
        required: false
        description: 'The maximum number of steps'
        default: '250'
      gpu_memory_utilization:
        type: string
        required: false
        description: 'The GPU memory utilization'
        default: '0.90'
      learning_rate:
        type: string
        required: false
        description: 'The learning rate'
        default: '2e-5'
      per_device_train_batch_size:
        type: string
        required: false
        description: 'The per device training batch size'
        default: '2'
      hf_repo:
        type: string
        required: true
        description: 'The Hugging Face repository to upload the model to'
jobs:
  train:
    name: Supervised LoRA Training (unsloth)
    runs-on:
      - machine
      - gpu=t4
      - cpu=4
      - ram=16
      - architecture=x64
    timeout-minutes: 180
    env:
      SOURCE_MODEL: ${{ inputs.source_model }}
      MAX_SEQ_LENGTH: ${{ inputs.max_seq_length }}
      LORA_RANK: ${{ inputs.lora_rank }}
      DATA_SET: ${{ inputs.data_set }}
      GPU_MEMORY_UTILIZATION: ${{ inputs.gpu_memory_utilization }}
      MAX_STEPS: ${{ inputs.max_steps }}
      LEARNING_RATE: ${{ inputs.learning_rate }}
      PER_DEVICE_TRAIN_BATCH_SIZE: ${{ inputs.per_device_train_batch_size }}
      HF_TOKEN: ${{ secrets.HF_TOKEN }}
      HF_HUB_ENABLE_HF_TRANSFER: 1
      HF_REPO: ${{ inputs.hf_repo }}
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python 3.10
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
      - name: Run Training
        run: |
          python3 train.py
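The workflow passes every input to train.py as a string environment variable, so the script has to parse and validate them itself. The repository's train.py is not shown on this page; the sketch below is one hypothetical way to do that parsing. The TrainConfig class and its from_env helper are assumptions for illustration, while the variable names and default values mirror the workflow above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _get(env: Mapping[str, str], name: str, default: str) -> str:
    """Return env[name], treating a missing or empty value as the default."""
    value = env.get(name, "")
    return value if value else default


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical typed view of the workflow's environment variables."""
    source_model: str
    data_set: str
    max_seq_length: int
    lora_rank: int
    max_steps: int
    gpu_memory_utilization: float
    learning_rate: float
    per_device_train_batch_size: int
    hf_repo: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "TrainConfig":
        hf_repo = env.get("HF_REPO", "")
        if not hf_repo:
            # hf_repo is the only required workflow input.
            raise ValueError("HF_REPO is required")
        return cls(
            source_model=_get(env, "SOURCE_MODEL", "unsloth/Llama-3.2-3B-Instruct"),
            data_set=_get(env, "DATA_SET", "finetome-100k"),
            max_seq_length=int(_get(env, "MAX_SEQ_LENGTH", "4096")),
            lora_rank=int(_get(env, "LORA_RANK", "64")),
            max_steps=int(_get(env, "MAX_STEPS", "250")),
            gpu_memory_utilization=float(_get(env, "GPU_MEMORY_UTILIZATION", "0.90")),
            learning_rate=float(_get(env, "LEARNING_RATE", "2e-5")),
            per_device_train_batch_size=int(_get(env, "PER_DEVICE_TRAIN_BATCH_SIZE", "2")),
            hf_repo=hf_repo,
        )
```

Passing a plain dict instead of os.environ, for example TrainConfig.from_env({"HF_REPO": "user/model"}), makes the parsing easy to check locally before spending GPU time.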
Advanced Retry Mechanism
For enhanced reliability, the repository also provides a workflow with automatic checkpointing and retry functionality:
name: Supervised Fine-Tuning with Retry
on:
  workflow_dispatch:
    inputs:
      attempt:
        type: string
        description: 'The attempt number'
        default: '1'
      max_attempts:
        type: number
        description: 'The maximum number of attempts'
        default: 5
      # Same parameters as in the basic workflow
      # ...
This implementation ensures training progress isn’t lost due to spot instance interruptions by:
- Automatically saving checkpoints to Hugging Face Hub during training
- Detecting spot instance interruptions using a custom GitHub Action
- Restarting the workflow with an incremented attempt number
- Resuming training from the latest checkpoint
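Resuming from "the latest checkpoint" requires choosing it by step number, not by name order. The sketch below assumes checkpoints are stored in folders named checkpoint-<step>, which is the Hugging Face Trainer convention; how the repository actually lays out checkpoints on the Hub is not shown here, and latest_checkpoint is a hypothetical helper.

```python
import re
from typing import Iterable, Optional

# Assumed naming convention: "checkpoint-<global step>".
_CHECKPOINT = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(names: Iterable[str]) -> Optional[str]:
    """Return the checkpoint folder with the highest step, or None if there is none."""
    best_step = -1
    best_name = None
    for name in names:
        match = _CHECKPOINT.match(name)
        if match is None:
            continue  # ignore files such as README.md
        step = int(match.group(1))
        if step > best_step:
            best_step, best_name = step, name
    return best_name
```

Comparing the parsed integer matters: a plain string sort would rank checkpoint-50 above checkpoint-200.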
The retry mechanism works through the following steps:
- The workflow starts a training job with a specified attempt number (default: 1)
- During training, checkpoints are periodically saved to Hugging Face Hub
- If the job completes successfully, the workflow ends
- If the job fails due to a spot instance interruption:
  - The check-runner-interruption action detects that the failure was due to a spot instance preemption
  - The workflow calculates the next attempt number
  - If within the maximum attempts limit, it triggers a new workflow run with an incremented attempt number
  - All original parameters are preserved for the new attempt
- When a new attempt starts, it downloads the latest checkpoint and resumes training from that point
This mechanism ensures that even if a spot instance is reclaimed, your training progress isn’t lost, and the job can continue from the last checkpoint on a new instance.
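The retry decision described above can be sketched in a few lines. This is an illustration of the logic only, not the repository's implementation: next_attempt is a hypothetical helper, and it assumes "within the maximum attempts limit" means the incremented number may equal max_attempts but not exceed it. The attempt value is a string because the workflow declares that input as type string.

```python
from typing import Optional


def next_attempt(attempt: str, max_attempts: int, interrupted: bool) -> Optional[str]:
    """Return the attempt number for a new run, or None if no retry should happen.

    A retry happens only when the failure was a spot interruption and the
    incremented attempt number does not exceed max_attempts (assumed semantics).
    """
    if not interrupted:
        return None  # success or an ordinary failure: do not retry
    candidate = int(attempt) + 1
    if candidate > max_attempts:
        return None  # retry budget exhausted
    return str(candidate)
```

With the default inputs (attempt '1', max_attempts 5), an interrupted run is re-dispatched as attempt '2', and an interruption on attempt '5' ends the chain.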
Using machine.dev GPU Runners
This fine-tuning process leverages machine.dev GPU runners to provide the necessary computing power. The workflow is configured to use:
- T4 GPU: An entry-level ML training GPU with 16 GB of VRAM, suitable for efficient training with Unsloth optimizations
- Spot instance: To optimize for cost while maintaining performance
- Configurable resources: CPU, RAM, and architecture specifications
For more demanding models or larger datasets, you can also configure the workflow to use more powerful GPUs:
runs-on:
  - machine
  - gpu=l4
  - cpu=4
  - ram=16
  - architecture=x64
Getting Started
To run the LLM Supervised Fine-Tuning workflow:
- Use the MachineDotDev/llm-supervised-fine-tuning repository as a template
- Set up a Hugging Face access token with write permissions
- Add this token as a repository secret named HF_TOKEN in your GitHub repository settings
- Navigate to the Actions tab in your repository
- Select the “Supervised Fine-Tuning with Retry” workflow
- Click “Run workflow” and configure your parameters:
  - Choose your base model and dataset
  - Adjust sequence length, LoRA rank, and training steps
  - Configure GPU memory utilization and learning rate
  - Specify your Hugging Face target repository
- Run the workflow and wait for results
- Access your fine-tuned model on Hugging Face Hub
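If you would rather start the workflow from a script than from the Actions tab, GitHub's REST API exposes a workflow dispatch endpoint. The sketch below only builds the request; the owner, repository, token, and the workflow file name supervised-fine-tuning-with-retry.yml are placeholders and assumptions, since the actual file name is not given on this page.

```python
import json
import urllib.request
from typing import Mapping


def build_dispatch_request(
    owner: str,
    repo: str,
    workflow_file: str,
    ref: str,
    inputs: Mapping[str, object],
    token: str,
) -> urllib.request.Request:
    """Build (but do not send) a workflow_dispatch request for the GitHub REST API."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    # Send every input as a string, matching the string-typed inputs above.
    payload = {"ref": ref, "inputs": {key: str(value) for key, value in inputs.items()}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )


# Placeholder values; the workflow file name is an assumption.
request = build_dispatch_request(
    owner="your-org",
    repo="llm-supervised-fine-tuning",
    workflow_file="supervised-fine-tuning-with-retry.yml",
    ref="main",
    inputs={"hf_repo": "your-user/your-model", "max_steps": 250},
    token="YOUR_GITHUB_TOKEN",
)
# To actually trigger the run: urllib.request.urlopen(request)
```

The token here is a GitHub token allowed to run Actions workflows in the repository, which is separate from the HF_TOKEN secret used inside the job.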
Best Practices
- Select appropriate datasets: Choose datasets that match your target application domain
- Adjust batch size for your GPU: Lower batch sizes if you encounter out-of-memory errors
- Use checkpointing for longer runs: For extensive training sessions, use the retry-enabled workflow
- Monitor training progress: Check workflow logs to observe loss metrics
- Test with prompts similar to your use case: Evaluate the model on examples that match your intended application
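When adjusting batch size or step count, it helps to know how much of the dataset a run actually covers. The arithmetic below is a rough sketch: it assumes a single GPU and no gradient accumulation unless stated, and it takes the dataset size of about 100,000 examples from the FineTome-100k name. The grad_accum and num_gpus parameters are illustrative and are not workflow inputs.

```python
def examples_seen(
    max_steps: int,
    per_device_batch_size: int,
    grad_accum: int = 1,
    num_gpus: int = 1,
) -> int:
    """Number of training examples consumed over a run of max_steps optimizer steps."""
    return max_steps * per_device_batch_size * grad_accum * num_gpus


# Workflow defaults: max_steps=250, per_device_train_batch_size=2.
seen = examples_seen(max_steps=250, per_device_batch_size=2)
dataset_size = 100_000  # assumed from the dataset name
print(f"{seen} examples, {seen / dataset_size:.1%} of the dataset")
```

With the defaults, the run consumes 500 examples, so halving the batch size to avoid out-of-memory errors also halves the data seen unless max_steps is raised to compensate.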
How to adapt this
- Larger model (7B+): swap gpu=t4 for gpu=l4 (24 GB VRAM, $0.006/min spot) or gpu=l40s (48 GB, $0.016/min spot); see CPU vs GPU
- Different base model: change the source_model input
- Different dataset: change the data_set input or modify the data loading code
- Run nightly: add a schedule trigger (cron: '0 2 * * *') under on: in the workflow. Scheduled runs receive no workflow_dispatch inputs, so hf_repo and the other parameters need fallback values
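To compare the cost of the GPU options above, multiply the spot rate by the run time. The sketch uses the per-minute spot rates quoted on this page and assumes, for simplicity, that every GPU needs the same 30 minutes; a faster GPU may well finish sooner, so treat the larger figures as upper bounds. The run_cost helper is illustrative.

```python
# Spot rates in USD per minute, as quoted on this page.
SPOT_RATE_PER_MIN = {"t4": 0.004, "l4": 0.006, "l40s": 0.016}


def run_cost(gpu: str, minutes: float) -> float:
    """Estimated GPU cost in USD for a run of the given length."""
    return round(SPOT_RATE_PER_MIN[gpu] * minutes, 4)


for gpu in SPOT_RATE_PER_MIN:
    print(f"{gpu}: ${run_cost(gpu, 30):.2f} for 30 minutes")
```

At these rates a 30-minute run costs about $0.12 on a T4, $0.18 on an L4, and $0.48 on an L40S. Each retry after a spot interruption adds its own billed minutes on top.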
Next steps
- Working repo — fork or use as a template
- Cost Optimization — checkpointing pattern explained
- CPU vs GPU — picking the right GPU for your model size