The pitch for feature flags is seductive. Wrap risky code in a toggle, roll it out gradually, flip it off if something goes wrong. You get canary deployments without the infrastructure, A/B tests without a data team, and hotfixes without a redeploy. For the first six months or so of a young codebase, it works great.
Then it doesn't.
The second codebase
Every engineering team I've worked with that adopted feature flags seriously ended up with the same problem: they built a second codebase on top of their real one. Not on purpose. It happens one toggle at a time.
First there's one flag around a new checkout flow. Then someone wraps a migration behind another flag. Then the mobile team needs their own flag for the new onboarding. Then legal wants a flag for GDPR. Then an SRE adds a kill switch for the rate limiter. Six months later, you open a file and the actual logic is buried four levels deep inside nested if (flags.isEnabled(...)) blocks. The branching factor of your program has doubled, and nobody knows which combinations are actually reachable in production.
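Here's a sketch of what that file ends up looking like. The flag names and the `Flags` client are hypothetical, but the shape will be familiar: four toggles means sixteen possible combinations, and the code only spells out a handful of them.

```typescript
// Hypothetical flag client, for illustration only.
type Flags = { isEnabled: (name: string) => boolean };

function renderCheckout(flags: Flags, cart: string[]): string {
  // Each team added its own toggle; the actual feature is at the bottom.
  if (flags.isEnabled("new-checkout-flow")) {
    if (flags.isEnabled("checkout-migration-v2")) {
      if (flags.isEnabled("mobile-onboarding")) {
        if (flags.isEnabled("gdpr-consent-banner")) {
          return `checkout(${cart.length} items)`; // the real logic
        }
        return "legacy-consent-checkout";
      }
      return "legacy-mobile-checkout";
    }
    return "legacy-migration-checkout";
  }
  return "legacy-checkout";
}
```

Only five of the sixteen flag combinations map to a distinct path here, and nothing in the code or the dashboard tells you which of those five are actually exercised in production.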
The dashboard helps with none of this. It shows you which flags exist. It does not show you which flags are load-bearing, which ones have been stuck at 100% for a year, which ones have a fallback path that hasn't been executed since 2023, or which ones would break if you deleted them right now. That information only exists in the heads of the people who wrote the code, and those people have mostly left the company.
The half-life problem
There's a thing that happens with feature flags that I've started calling the half-life problem. A flag starts at 0% rollout. An engineer ramps it to 100%. The flag works. Ship it. On to the next thing.
Now the flag is "done." Except nobody marks it as done, because there's no reward for cleaning up a flag and there's real risk in touching code that works. So the flag sits at 100% forever. The fallback branch stops being tested. Dependencies drift. Three years later, someone needs to touch the same file for an unrelated reason and discovers that the "dead" fallback branch actually references a deleted API. Would it have worked if the flag flipped off? Nobody knows. The cleanup ticket gets written, gets assigned, gets reassigned, gets closed as "wontfix - too risky."
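In code, the failure mode looks something like this. `legacyPriceFor` is a stand-in for a call to that deleted API; everything here is illustrative, not a real service.

```typescript
// A zombie flag, sketched: the fallback branch has silently rotted.
function legacyPriceFor(_sku: string): number {
  // Stand-in for a call to an API that no longer exists.
  throw new Error("410 Gone: /v1/pricing was removed in 2023");
}

function newPriceFor(sku: string): number {
  return sku.length * 100; // placeholder for the current pricing service
}

function priceFor(sku: string, flagOn: boolean): number {
  // Flag has been at 100% for three years; nobody has run the else-branch.
  if (flagOn) return newPriceFor(sku);
  return legacyPriceFor(sku); // "safe fallback" that is anything but
}
```

The flag still compiles, still evaluates, still looks like a working kill switch on the dashboard. The first time anyone learns the fallback is dead is the moment someone flips it off.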
Talk to any team with a mature feature flag setup and they'll tell you the same thing: more than half of their flags are zombies. Still in the code, still consuming evaluations, still cluttering the UI. Still shipping to prod every build. Still a potential source of a three-a.m. incident if someone trips over the wrong one.
Flag management is a full-time job nobody has
The standard response to all this is process. Set a TTL on every flag. Put cleanup on the sprint board. Run a monthly flag audit. Assign an owner. Require a JIRA ticket for every flag. Build a dashboard that shows stale flags.
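The tooling side of that process is genuinely easy to build, which is part of why teams keep trying it. A minimal stale-flag audit is a few lines; the `FlagRecord` shape and the 90-day TTL here are assumptions, not any particular vendor's API.

```typescript
// A minimal stale-flag audit, assuming a registry that tracks rollout
// percentage and last-change time. Field names are hypothetical.
interface FlagRecord {
  name: string;
  rolloutPct: number; // 100 means fully ramped
  lastChanged: Date;  // last time anyone touched the flag
}

const TTL_DAYS = 90;

function staleFlags(flags: FlagRecord[], now: Date): string[] {
  const ttlMs = TTL_DAYS * 24 * 60 * 60 * 1000;
  return flags
    .filter(f => f.rolloutPct === 100 &&
                 now.getTime() - f.lastChanged.getTime() > ttlMs)
    .map(f => f.name);
}
```

The hard part was never detecting stale flags. It's that the report lands in a queue where cleanup is always the lowest-priority item, which is the failure the next paragraph describes.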
I have watched every version of this fail. Not because engineers are lazy — they aren't — but because flag hygiene is the kind of work that is always less urgent than the next feature, the next incident, or the next OKR. It's the housekeeping of software: important in aggregate, invisible when done well, and the first thing cut when the sprint is behind.
The teams that do stay on top of flags usually have one person whose job is effectively "flag librarian." They nag, they audit, they write the cleanup PRs. When they leave, the rot comes back within two quarters. I've seen it happen three times at three different companies.
What would it take to fix this?
The thing that bothers me most about the current state of feature flags is that almost all of the pain is structural. It's not that the primitives are wrong. Traffic splitting, kill switches, gradual rollout — all of that is great. The pain is that the primitives are exposed directly to humans, and humans are not good long-term maintainers of a system that gets a new entry every week and rewards no one for gardening it.
The systems that survive this kind of entropy are the ones where the boring parts are automated. You don't write your own DNS records by hand. You don't maintain your own TCP retry logic. You didn't hand-roll your own TLS cert rotation — well, maybe you did in 2014, and you remember how that went. The answer, every time, was the same: push the tedium down into the platform and let humans work at a higher level.
Feature flags need the same treatment. The flag itself shouldn't be a thing an engineer creates, maintains, ramps, and cleans up. It should be a capability the platform applies automatically, at the right moments, on behalf of the humans writing the code. The engineer's job is to write the feature. The platform's job is to make it safe to ship.
That's what we're building at DeployRamp, and it's the thing I keep coming back to: the best feature flag is the one you never knew was there.
The TL;DR
Manual feature flagging has three failure modes, and every team hits all three eventually:
- Flag sprawl. The codebase fills up with toggles. Nobody tracks which combinations are reachable.
- Zombie flags. Flags hit 100%, nobody cleans them up, the fallback paths rot.
- No owner. Flag hygiene is important-but-not-urgent work, which means it never gets done.
You can throw more process at this, and it'll work for about as long as the person who cares about it is still on the team. After that, you're back in the same hole. The only real fix is to take the humans out of the boring parts of the loop.
More on how we do that in the next post.