We shipped a change to how you ask Machine for a runner. Most of you won't have to touch anything, and your old workflows keep working. But the change fixes a class of bug worth explaining, because the way GitHub matches jobs to self-hosted runners has a sharp edge that we walked straight into.
It started with disk labels.
The feature that surfaced the bug
A while back we added per-job disk controls: disk_size, disk_iops, and disk_throughput. Handy if you're checkpointing a big model and want a 500 GB volume with fast writes. Small feature, a few lines on your side.
Not long after, we started seeing something strange in the logs. Jobs were finishing on runners we hadn't provisioned for them. A job that asked for a plain 8-core box would land on a runner we'd spun up for a completely different job, one with a big tuned disk attached. The work still ran. But the accounting was wrong, and every so often a job would sit waiting while its runner ran another of your own jobs.
It took us longer than we'd like to admit to connect it to the disk labels.
How GitHub actually picks a runner
Here's the part that trips people up. When you queue a job for a self-hosted runner, GitHub doesn't look for an exact label match. It looks for any idle runner whose labels are a superset of what the job asked for.
That rule is fine on its own. A runner labelled self-hosted, linux, x64, gpu should be allowed to pick up a job that only asks for self-hosted and linux. The extra labels don't disqualify it.
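The rule comes down to a single set comparison. Here is a minimal Python sketch of it, not GitHub's actual code: the function name runner_can_take is made up for illustration, and the labels are the ones from the example above.

```python
def runner_can_take(runner_labels: set[str], job_labels: set[str]) -> bool:
    """Return True if every label the job asked for is present on the runner."""
    return job_labels <= runner_labels


runner = {"self-hosted", "linux", "x64", "gpu"}

# The job asks for fewer labels than the runner carries: still a match.
print(runner_can_take(runner, {"self-hosted", "linux"}))    # True

# The job asks for a label the runner lacks: no match.
print(runner_can_take(runner, {"self-hosted", "windows"}))  # False
```

Note that nothing in the comparison asks whether the runner was meant for that job. Extra labels on the runner are simply ignored.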
Machine runs one fresh runner per job, inside your team's own tenancy. A runner can never be used by another team, so this was only ever your own jobs colliding, not anyone else's. We register each runner with the labels the job asked for, let it run that job, then throw it away.
The trouble starts when one runner's labels are a superset of another job's request.
Why disk labels made it worse
Before disk controls, most jobs asked for nearly the same handful of labels. Two jobs requesting eight cores looked identical to GitHub, and our provisioner handed each its own runner before either could wander off. Collisions were possible but rare enough to hide.
Disk labels changed the shape of the problem. Now a runner could carry an 8-core label plus three disk settings, while a different job down the queue asked for plain 8 cores. That second request is a strict subset of the first runner's labels, so GitHub considered the big-disk runner a perfectly valid home for the small job. The more optional labels we added, the more supersets existed, and the more often a heavyweight runner would reach down and grab a job meant for something simpler.
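To make the overlap concrete, here is a small Python sketch that applies GitHub's subset test to a toy queue. Everything in it is assumed for illustration: the label spellings (cpu=8, disk_size=500 and so on), the disk values, the job names, and the helper eligible_jobs are placeholders, not the exact labels Machine registered.

```python
# Hypothetical labels on a runner provisioned for a big-disk job.
big_disk_runner = {"cpu=8", "disk_size=500", "disk_iops=6000", "disk_throughput=400"}

# Hypothetical queue: job name -> labels the job asked for.
queued_jobs = {
    "checkpoint-model": {"cpu=8", "disk_size=500", "disk_iops=6000", "disk_throughput=400"},
    "unit-tests": {"cpu=8"},
    "lint": {"cpu=2"},
}


def eligible_jobs(runner_labels: set[str], queued: dict[str, set[str]]) -> list[str]:
    """List every queued job whose requested labels are a subset of the runner's."""
    return sorted(name for name, wanted in queued.items() if wanted <= runner_labels)


print(eligible_jobs(big_disk_runner, queued_jobs))
# ['checkpoint-model', 'unit-tests']
```

The runner was built for checkpoint-model, but unit-tests qualifies too, so whichever job GitHub assigns first wins.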
This is the subset/superset problem, and it isn't a GitHub bug. It's how the matching is designed to work. The mistake was ours. We were describing runners in a way that let them overlap.
The first fix, which we threw away
Our first instinct was to police it. We built a registration gate: a holding state for new runners and a background task that compared label sets and decided, for each runner, whether it was safe to let it accept a job yet. If a runner's labels were a superset of some other queued job, hold it back.
It worked. It was also slow and fragile. We'd added a queue, a reconciliation loop, and a new failure mode to a path whose whole selling point is that a runner shows up in about a minute. Every job now waited on a scheduler reasoning about set theory. We shipped it, lived with it for a bit, didn't like it, and pulled it back out.
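For a sense of what that gate had to decide, here is a rough Python sketch of the check. It is a reconstruction under assumptions, not the code we ran: the function name safe_to_release, the label spellings, and the job names are all hypothetical, and the sketch uses a strict-superset test so that two jobs with identical labels don't hold each other back.

```python
def safe_to_release(runner_labels: set[str], intended_job: str,
                    queued_jobs: dict[str, set[str]]) -> bool:
    """Release a runner only if no other queued job asks for a strict subset of its labels."""
    return not any(
        wanted < runner_labels          # strict subset: the runner could "reach down"
        for name, wanted in queued_jobs.items()
        if name != intended_job
    )


big_disk_runner = {"cpu=8", "disk_size=500"}
queue = {
    "checkpoint-model": {"cpu=8", "disk_size=500"},
    "unit-tests": {"cpu=8"},
}

# A simpler job is still queued, so the big-disk runner is held back.
print(safe_to_release(big_disk_runner, "checkpoint-model", queue))  # False

# Once that job is gone, the runner can be released.
del queue["unit-tests"]
print(safe_to_release(big_disk_runner, "checkpoint-model", queue))  # True
```

Even in this toy form, every release decision scans the whole queue and has to be repeated whenever the queue changes, which is where the reconciliation loop came from.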
The lesson was that we were solving the wrong problem. We didn't need to referee overlapping labels. We needed to stop the labels overlapping.
The actual fix: stop the labels overlapping
The version we shipped is much smaller. Pack everything a job needs into one label, led by machine — the word that routes the job to us:
runs-on: machine/cpu=8/disk_size=500
That single long string is the whole change. To GitHub it's one label, not a set, so it can't be a subset or superset of anything else, and the overlap that caused the stealing has nowhere to form. A runner for machine/cpu=8/disk_size=500 is no longer a stand-in for a job asking for machine/cpu=8: they are different labels, not one nested in the other.
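As a rough illustration of why packing helps, the Python sketch below builds two packed labels and runs the same subset test GitHub uses. The helper pack_label is hypothetical and not a Machine API; in a real workflow you just write the finished string in runs-on.

```python
def pack_label(**settings: int) -> str:
    """Join settings into one label string, led by "machine"."""
    parts = [f"{key}={value}" for key, value in settings.items()]
    return "/".join(["machine", *parts])


# Each job now asks for exactly one label, so each request is a one-element set.
big = {pack_label(cpu=8, disk_size=500)}   # {"machine/cpu=8/disk_size=500"}
small = {pack_label(cpu=8)}                # {"machine/cpu=8"}

# Labels are compared as whole strings, so neither request is contained in the other.
print(small <= big)  # False
print(big <= small)  # False
```

The shared prefix machine/cpu=8 doesn't matter, because a label either matches exactly or not at all.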
A unique tag like id=${{ github.run_id }} is still welcome, and handy for marking matrix legs, but it's a nice-to-have, not the fix. Even two jobs running at once with identical labels are fine: each still gets its own runner.
Your existing workflows are fine
If you're reading this thinking you now have to go and rewrite every pipeline you own, you don't. The older style, where you list each label separately, still works exactly as it did. Nothing you've already shipped will break.
We're just not recommending it anymore. New examples in our docs use the packed label, and if you run a lot of jobs at once, matrix sweeps especially, that's where the old separate-label style could bite. Move to one packed label and the overlap can't happen. Leave your old workflows alone and they'll keep running as they always have.
That's the whole change. One label instead of a set, a bug that can't form anymore, and an optional id to name a run.