If It Works in Dev and Breaks in Prod, You Don't Understand It.

Best Practices · Aug 14, 2025 · 4 min read

Every developer has said it. 'It worked on my machine.' It's almost a rite of passage — your first prod incident, your first 2am Slack message, your first time staring at logs that make no sense because locally everything runs fine. But here's what nobody tells you in those moments: the bug isn't the problem. Your mental model of the system is.

What the gap actually is

Dev and prod differ in ways that feel minor until they aren't. Environment variables you set once and forgot about. A database with real data volumes and real query patterns instead of your five seeded rows. Network latency that doesn't exist when both the client and server are on the same machine. Concurrent users hitting the same endpoint instead of just you. Race conditions that need real traffic to surface. You're not testing the system in dev — you're testing a toy model of it.
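Here is a minimal sketch of that last point: a race condition that stays invisible with one caller and shows up the moment two callers overlap. Everything in it is hypothetical (`Counter`, `unsafe_increment`, the `pause` hook), and the barrier is only a stand-in for the latency that real traffic would provide, used here so the bad interleaving happens every time instead of once a week.

```python
import threading


class Counter:
    """A shared value, standing in for a row, a cache entry, or a balance."""

    def __init__(self):
        self.value = 0


def unsafe_increment(counter, pause=None):
    """Read-modify-write with no locking. Correct for one caller only."""
    current = counter.value          # read
    if pause is not None:
        pause()                      # stand-in for latency between read and write
    counter.value = current + 1      # write, based on a possibly stale read


def run_sequential():
    """What dev looks like: one caller at a time."""
    counter = Counter()
    unsafe_increment(counter)
    unsafe_increment(counter)
    return counter.value


def run_concurrent():
    """What prod can look like: two callers overlapping on the same value."""
    counter = Counter()
    # Both threads wait at the barrier after reading, so both read 0
    # before either one writes.
    barrier = threading.Barrier(2, timeout=5)
    threads = [
        threading.Thread(target=unsafe_increment, args=(counter, barrier.wait))
        for _ in range(2)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


if __name__ == "__main__":
    print("sequential:", run_sequential())
    print("concurrent:", run_concurrent())
```

The function is identical in both runs. Only the conditions changed, and one of the two increments was silently lost.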

The mental model problem

When something breaks in prod and not dev, it usually means there's a dependency, assumption, or behavior you didn't know you were relying on. You thought you understood the code, but what you understood was the code under your specific local conditions. That's a different thing. And the only way to close the gap is to genuinely understand what your code depends on, not just what it does when everything is perfect.
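One way to surface hidden dependencies is to make the code declare them. The sketch below assumes a service that needs two settings; the names `DATABASE_URL`, `REQUEST_TIMEOUT_SECONDS`, `Settings`, and `load_settings` are made up for illustration. The idea is that a missing value fails loudly at startup instead of falling back to a default that happens to work on your laptop.

```python
import os
from dataclasses import dataclass
from typing import Mapping


class MissingConfigError(RuntimeError):
    """Raised at startup when a required setting is absent or empty."""


@dataclass(frozen=True)
class Settings:
    database_url: str
    request_timeout_seconds: float


# Hypothetical names: the point is that the list of dependencies is written down.
REQUIRED_VARIABLES = ("DATABASE_URL", "REQUEST_TIMEOUT_SECONDS")


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from the environment with no silent defaults."""
    missing = [name for name in REQUIRED_VARIABLES if not env.get(name)]
    if missing:
        raise MissingConfigError(
            "missing required environment variables: " + ", ".join(missing)
        )
    return Settings(
        database_url=env["DATABASE_URL"],
        request_timeout_seconds=float(env["REQUEST_TIMEOUT_SECONDS"]),
    )
```

Passing the environment in as a mapping keeps the function easy to test: call `load_settings({"DATABASE_URL": "..."})` and you get an error naming the variable you forgot, on your machine, before prod has the chance to tell you.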

If you can't explain why it would break, you can't claim you know why it works.
— Zaid Maraqa

How to actually close the gap

Make your dev environment boring. Same OS, same runtime and dependency versions, same way of loading config, same order of magnitude of data as prod. Use environment variables the same way. Run your migrations against a local copy of a prod-like dataset occasionally. Write tests that cover the cases that only emerge under load or concurrency. And when something breaks in prod, don't just fix the symptom. Trace it back to what you didn't know, and update your mental model before you move on.
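A concurrency test does not have to be elaborate. The sketch below is one possible shape, not a prescription: `SafeCounter` and `hammer` are hypothetical names, and the worker and call counts are arbitrary. It calls the same method from several threads at once and checks that no update was lost.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SafeCounter:
    """A counter whose read-modify-write is guarded by a lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def hammer(counter, workers=8, calls_per_worker=1000):
    """Call counter.increment() from several threads and return the final value."""

    def work():
        for _ in range(calls_per_worker):
            counter.increment()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(work) for _ in range(workers)]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return counter.value


def test_increment_survives_concurrency():
    # Every one of the 8 * 1000 increments should be counted.
    assert hammer(SafeCounter()) == 8 * 1000
```

A passing run is not proof that the code is free of races, since thread scheduling varies between runs. What a test like this does is put the concurrency assumption in writing, so that removing the lock later has a chance of failing in CI instead of in prod.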

The real lesson

Production isn't adversarial. It's just honest. It doesn't care about your assumptions. It runs your code exactly as written, under conditions you didn't control for, with users doing things you didn't expect, at volumes you didn't test. The developers who have few prod incidents aren't lucky — they've built the habit of asking 'what am I assuming here?' before shipping, not after.