
Disk config, tunable and visible. Cost breakdown on every job.

PUBLISHED MAY 13, 2026 // AUTHOR LEONARD O'SULLIVAN

Technical write-up from the team at machine.dev.

Two things shipped this month, and they're built to work together. You can now configure the disk your job runs on, straight from the runs-on label. And every finished job now shows you what it cost, split into compute and disk, so when you tune the disk you can see exactly what the tuning bought you.

Here's both, and how they fit.

The disk was always there. Now you can see the knobs.

Every machine.dev runner has always booted with a gp3 EBS root volume under it. Until now you got one size of it: 100 GB, 6,000 IOPS, 250 MB/s throughput, the same for everyone. For most jobs that's fine. Checkout, build, test, push: the disk is never the thing slowing you down.

But some jobs live and die on disk. A training run writing checkpoints every few minutes. A build that unpacks a giant dependency cache and then writes it back. Anything that touches a big dataset on local disk instead of streaming it. For those jobs the default 250 MB/s is a ceiling you can feel, and you had no way to lift it.

Now you do. Three labels, all optional, all in the packed runs-on string:

YAML
runs-on: machine/cpu=16/disk_size=500/disk_iops=10000/disk_throughput=750
#  disk_size=500       → 500 GB root volume
#  disk_iops=10000     → 10,000 provisioned IOPS
#  disk_throughput=750 → 750 MB/s throughput

The ranges:

  • disk_size: default 100 GB, range 1 to 16,384 GB
  • disk_iops: default 6,000 IOPS, range 6,000 to 16,000 IOPS
  • disk_throughput: default 250 MB/s, range 250 to 1,000 MB/s

Leave them off and you get the defaults, same as before. Set them and you get a runner tuned to the shape of your workload. A checkpoint-heavy fine-tune wants high throughput so the GPU isn't sitting idle waiting on writes. A job that does a lot of small random reads wants IOPS. A job that just needs somewhere to put a 400 GB dataset wants size and nothing else. You pick.
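To make the defaults and ranges concrete, here is a minimal Python sketch that reads the disk labels out of a packed runs-on string and checks them against the limits listed above. The parsing rules and the names parse_disk_config and DISK_LIMITS are assumptions made for illustration, not machine.dev's actual implementation.

```python
# Sketch only: defaults and ranges are copied from the list above.
# The parsing rules are an assumption, not machine.dev's implementation.
DISK_LIMITS = {
    # key: (default, minimum, maximum)
    "disk_size": (100, 1, 16_384),         # GB
    "disk_iops": (6_000, 6_000, 16_000),   # provisioned IOPS
    "disk_throughput": (250, 250, 1_000),  # MB/s
}


def parse_disk_config(label: str) -> dict[str, int]:
    """Return the disk settings implied by a packed runs-on label."""
    # Start from the defaults, so omitted labels behave "same as before".
    config = {key: default for key, (default, _, _) in DISK_LIMITS.items()}
    for segment in label.split("/"):
        key, sep, value = segment.partition("=")
        if not sep or key not in DISK_LIMITS:
            continue  # not a disk label (e.g. "machine", cpu, gpu, tenancy)
        _, low, high = DISK_LIMITS[key]
        number = int(value)
        if not low <= number <= high:
            raise ValueError(f"{key}={number} is outside {low} to {high}")
        config[key] = number
    return config


if __name__ == "__main__":
    print(parse_disk_config(
        "machine/cpu=16/disk_size=500/disk_iops=10000/disk_throughput=750"
    ))
    print(parse_disk_config("machine/cpu=16"))
```

A label with no disk keys falls through to the defaults, and an out-of-range value fails loudly instead of being silently clamped. Whether the real service rejects or clamps is not something the post says, so treat that choice as a guess.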

One thing worth saying plainly: the defaults are free. The 6,000 IOPS and 250 MB/s every runner already had cost you nothing, and they still cost you nothing. You only pay for headroom above the defaults, and only for the minutes your job is actually running. More on that in a second, because that's the other half of this.
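The "pay only for headroom" rule is simple enough to write down. This Python sketch computes how much of a requested IOPS and throughput setting sits above the free allowance. The function name billable_headroom is hypothetical, and the sketch covers only the two allowances this paragraph names.

```python
# Free allowances as stated in the post; everything else here is illustrative.
FREE_IOPS = 6_000       # provisioned IOPS included with every runner
FREE_THROUGHPUT = 250   # MB/s included with every runner


def billable_headroom(
    disk_iops: int = FREE_IOPS,
    disk_throughput: int = FREE_THROUGHPUT,
) -> dict[str, int]:
    """Return the IOPS and throughput requested above the free defaults."""
    return {
        "iops": max(0, disk_iops - FREE_IOPS),
        "throughput_mbps": max(0, disk_throughput - FREE_THROUGHPUT),
    }


if __name__ == "__main__":
    # A runner on defaults has no billable headroom.
    print(billable_headroom())
    # disk_iops=10000/disk_throughput=750 from the earlier label.
    print(billable_headroom(10_000, 750))
```

For the label shown earlier, that comes to 4,000 IOPS and 500 MB/s of billable headroom. On defaults, both are zero.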

Now you can see what the disk cost you

Here's the problem with adding a knob. The moment you can turn IOPS up to 16,000, someone's going to turn it up to 16,000 "just in case," run a thousand jobs, and get a bill they don't understand. A knob without a readout is how you end up overpaying and never knowing which knob did it.

So we put the readout next to the knob. Every finished job now has a cost breakdown on its page, and it splits the bill in two:

TEXT
COST BREAKDOWN
████████████████████████░░░░░░
■ Compute                $0.095
□ EBS                    $0.020
─────────────────────────────
Total                    $0.115

Compute is the runner: the per-minute rate for the CPU or GPU you asked for, times the minutes it ran. EBS is the disk: your volume size, plus any IOPS and throughput above the free defaults, prorated to the same runtime. Two numbers, one total, a bar so you can see the split at a glance without doing the division in your head.

That job up there cost 11.5 cents: 9.5 of compute and 2 of disk. The disk was about a sixth of the bill (roughly 17%), which tells you something real: this was a job with the disk turned up. A job on default storage would show a couple of tenths of a cent of EBS, a sliver of the bar you'd have to squint at. Two cents of disk on a job that cost under twelve means someone asked for more, and now they can see precisely what "more" came to.
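If you'd rather check the split than eyeball it, the arithmetic is short. This Python sketch takes the two figures from the breakdown above and returns the total, the EBS share, and a proportional bar. The function name split_bill, the 30-character width, and the rounding rule are assumptions; the job page may draw its bar differently.

```python
def split_bill(compute: float, ebs: float, width: int = 30) -> tuple[float, float, str]:
    """Return (total, EBS share of total, text bar) for a two-line bill."""
    total = compute + ebs
    if total <= 0:
        raise ValueError("total cost must be positive")
    ebs_share = ebs / total
    # Filled cells represent compute; the remaining cells represent EBS.
    filled = round(width * compute / total)
    bar = "█" * filled + "░" * (width - filled)
    return total, ebs_share, bar


if __name__ == "__main__":
    # Figures taken from the example breakdown in the post.
    total, ebs_share, bar = split_bill(0.095, 0.020)
    print(bar)
    print(f"Total ${total:.3f}, EBS share {ebs_share:.1%}")
```

With $0.095 of compute and $0.020 of EBS, the EBS share works out to about 17%, which is the "about a sixth" in the paragraph above.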

That's the whole point of shipping these together. The disk labels let you spend money on disk. The breakdown shows you the money you spent. You tune, you run, you look at the split, and you decide whether the speed was worth the spend. If the EBS line is creeping up and your job didn't get faster, you turned the wrong knob. Turn it back down. The number will tell you.

A worked example, end to end

Say you've got a fine-tune that checkpoints often, and the GPU keeps stalling on disk writes. You bump the throughput and give it room for the checkpoints:

YAML
runs-on: machine/gpu=l40s/tenancy=spot/disk_size=500/disk_iops=8000/disk_throughput=500

The job runs. The GPU stops waiting on the disk. You open the job page when it's done and the breakdown shows compute as the overwhelming majority of the bill, with EBS as a small, visible slice on top. Now you know two things you didn't before: the throughput bump worked, and it cost you a couple of cents. Cheap, for a GPU that's no longer twiddling its thumbs between writes.

Or the other way. You turned IOPS up to the ceiling, the job ran exactly as fast as before, and the EBS line is bigger than it was for no benefit. The breakdown just told you the IOPS weren't the bottleneck. Drop them back to default, keep the throughput bump that actually helped, run it again. The split is the feedback loop. You're not guessing which knob mattered; you're reading it off the page.
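That feedback loop can be written as a comparison between two runs of the same job. The Python sketch below is one possible way to frame it. The JobRun fields, the knob_verdict function, the 5% speedup threshold, and every number in the example are made up for illustration; none of them come from machine.dev.

```python
from dataclasses import dataclass


@dataclass
class JobRun:
    """Hypothetical record of one finished job, read off its job page."""
    minutes: float
    compute_cost: float
    ebs_cost: float


def knob_verdict(before: JobRun, after: JobRun, min_speedup: float = 0.05) -> str:
    """Return "keep" or "revert" for a disk change made between two runs.

    min_speedup is an arbitrary threshold: the fraction of runtime the
    change must save before extra EBS spend counts as worth it.
    """
    speedup = (before.minutes - after.minutes) / before.minutes
    extra_ebs = after.ebs_cost - before.ebs_cost
    if extra_ebs <= 0:
        return "keep"    # the change did not raise the EBS line
    if speedup < min_speedup:
        return "revert"  # paid more for disk, job did not get faster
    return "keep"


if __name__ == "__main__":
    # Made-up numbers: IOPS raised, runtime unchanged, EBS line grew.
    baseline = JobRun(minutes=40.0, compute_cost=1.00, ebs_cost=0.01)
    more_iops = JobRun(minutes=40.0, compute_cost=1.00, ebs_cost=0.04)
    print(knob_verdict(baseline, more_iops))
```

The sketch changes one knob per comparison on purpose. If you change IOPS and throughput in the same run, the EBS line moves but you can't tell which setting earned it.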

The disk labels are live now on every runner type, CPU and GPU. The cost breakdown is on every job page from today, and it's retroactive on jobs run since the rollout, so go look at a recent one. If you've been running on default storage this whole time, the EBS line will be reassuringly tiny. If you've been turning knobs, now you can finally see what they cost.

The full label reference, with every disk option and the EBS rate breakdown, is in the configuration docs and the pricing page.
