
Exit 0 Is Not Success

1 June 2026 · 6 min read

I run most of my changes through agentic coding tools now — fast, and confident even when they're wrong. A field guide to the gotchas where everything reports done and isn't, and the two practices that actually catch them.

reliability · security · verification · agentic-coding · gitops

I run most of my homelab changes through agentic coding tools now — wrappers around models that read the repo, make the change, run the commands. They're fast, and they're confident: the change applies, the command exits 0, the log says success. The catch is that "the tool reported success" and "the thing actually works" are different claims, and agentic tools are very good at producing the first while quietly missing the second.

So I keep a running list of the gotchas — the spots where everything reports done and isn't. Most I caught on a second pass, or when a review loop stopped on an assumption the first pass had skated over. Here are the ones worth knowing.

The gotchas

A 200 from a health check tells you nothing about auth. I rotated an API key and went to confirm the old default was dead — curled the service, got a 200, would've called it done. But /health has no auth on it; it returns 200 to anyone, including a key that doesn't exist. "The server is up" and "this credential works" are different questions, and the health route only answers the first. To actually test a key, you have to hit something the key is required for.
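A minimal Python sketch of checking a rotated key where it matters. The route (`/api/whoami`) and the bearer-token header are hypothetical stand-ins for whatever the service actually protects. The check runs both directions: the new key must be accepted and the old one refused. The status lookup is passed in as a function so the logic can be tested without a network.

```python
import urllib.error
import urllib.request
from typing import Callable

# Hypothetical route that requires the key. Deliberately not /health,
# which answers 200 to anyone, with or without a valid key.
PROTECTED_PATH = "/api/whoami"


def status_with_key(base_url: str, key: str) -> int:
    """GET the protected route with `key` and return the HTTP status code."""
    req = urllib.request.Request(
        base_url + PROTECTED_PATH,
        headers={"Authorization": f"Bearer {key}"},  # assumed auth scheme
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is the answer we want.
        return err.code


def verify_rotated_key(
    status_for: Callable[[str], int], new_key: str, old_key: str
) -> list[str]:
    """Check both directions: the new key is accepted, the old one refused."""
    problems = []
    new_status = status_for(new_key)
    if new_status != 200:
        problems.append(f"new key got {new_status}, expected 200")
    old_status = status_for(old_key)
    if old_status not in (401, 403):
        problems.append(f"old key got {old_status}, expected 401 or 403")
    return problems
```

Called as `verify_rotated_key(lambda k: status_with_key(base_url, k), new_key, old_key)`, an empty list means both halves passed. A 200 for the old key is exactly the failure a health check can't see.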

A random value can line-wrap and silently break secret encryption. Generate a long base64 value, drop it into a secret, run the encrypt step — exit 0. Except the value had wrapped onto a second line, the newline broke the YAML, and the encrypt tool, handed malformed input, gave up and left the secret in plaintext on disk. Nothing errored. The only tell was the file being a third of the size it should be. Secrets are the one place you can't afford the success signal to be lying, and here it lied for free.
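The encrypt tool isn't named here, so this Python sketch doesn't call one. It covers the two checks around the step instead. Python's `base64.encodebytes` inserts a newline after every 76 output characters, much like GNU `base64` does by default, while `base64.b64encode` never wraps. After the encrypt step exits 0, the file it wrote is checked for the plaintext value. All function names are illustrative.

```python
import base64
import os


def random_secret(nbytes: int = 48) -> str:
    """Random value as one line of base64; b64encode never inserts newlines."""
    return base64.b64encode(os.urandom(nbytes)).decode("ascii")


def wrapped_secret(nbytes: int = 96) -> str:
    """The trap: encodebytes inserts a newline after every 76 output chars."""
    return base64.encodebytes(os.urandom(nbytes)).decode("ascii")


def safe_to_store(value: str) -> bool:
    """Reject an empty value, or one with line breaks that could break the YAML."""
    return value != "" and "\n" not in value and "\r" not in value


def plaintext_left_behind(encrypted_file_text: str, value: str) -> bool:
    """After the encrypt step exits 0, confirm the secret is really gone."""
    return value in encrypted_file_text
```

On the command line, GNU `base64 -w 0` turns wrapping off. The plaintext check is still worth keeping, because it tests the thing the exit code never claimed to.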

"Configured successfully" in the logs can sit directly on top of a 500. An app with SSO came up clean — pod Running, GitOps synced, logs announcing social login configured successfully. The login flow itself returned a 500. It needed five environment variables, and one of them wasn't an identity-provider setting at all — it was an internal kill-switch in the app, off by default. The startup log knew about exactly none of it.
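A minimal Python smoke test that judges the login route by its response rather than by the log. The login URL is hypothetical: it stands for whichever path starts the SSO flow in a given app. This only catches a failure on the route you probe. Following a redirect to an external identity provider proves the app handed off, not that a full login completes. The complete check is logging in with a test account.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Status a real client would end up with; urlopen follows redirects."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def login_works(login_url: str, probe: Callable[[str], int] = http_status) -> bool:
    """Judge the login flow by its response, never by the startup log.

    "Configured successfully" in the log is not evidence; a 5xx here is
    a failed deploy no matter what the pod status or sync state says.
    """
    return 200 <= probe(login_url) < 400
```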

A pipe reports the last command's exit code. dump | encrypt returns encrypt's status, so the dump can fail outright and the whole line still exits 0 — unless you set -o pipefail. A nightly backup did exactly this: the dump couldn't reach the database, encrypt cheerfully wrapped an empty stream into a valid-looking file, and the job reported Completed. Every night. The only reason I know is that I went to restore one.
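In bash the fix is one line, `set -o pipefail`. When the job is driven from Python instead, the equivalent is to keep every stage's exit code rather than only the last. This sketch follows the pipeline pattern from the `subprocess` documentation. The real `dump` and `encrypt` commands aren't named, so two small Python stand-ins play their parts.

```python
import subprocess
import sys

# Stand-ins for `dump` and `encrypt`, so the sketch runs anywhere Python does.
FAILING_DUMP = [sys.executable, "-c", "import sys; sys.exit(3)"]
WORKING_DUMP = [sys.executable, "-c", "print('rows')"]
PASSTHROUGH = [sys.executable, "-c",
               "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]


def run_pipeline(producer: list[str], consumer: list[str]) -> tuple[int, int, bytes]:
    """Run `producer | consumer`; return each stage's exit code and the output.

    A shell without `set -o pipefail` would report only the consumer's code.
    """
    prod = subprocess.Popen(producer, stdout=subprocess.PIPE)
    cons = subprocess.Popen(consumer, stdin=prod.stdout, stdout=subprocess.PIPE)
    prod.stdout.close()  # let the producer see SIGPIPE if the consumer exits early
    out, _ = cons.communicate()
    return prod.wait(), cons.returncode, out


def pipeline_ok(producer: list[str], consumer: list[str]) -> bool:
    """Succeed only if every stage succeeded, the way pipefail does."""
    dump_rc, encrypt_rc, _ = run_pipeline(producer, consumer)
    return dump_rc == 0 and encrypt_rc == 0
```

With the failing dump, the consumer still exits 0 on an empty stream, and only the first code gives it away. Two zeros still only prove the stages ran. Restoring a backup is what proves it's restorable.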

Why they all slip through

Exit 0 means "this process chose to return 0," and nothing more. The process owns the number, and most of them only know about a tiny slice of the job. kubectl get pods knows the process is alive, not that it works. A health check knows the server's up, not that your key does anything. encrypt exiting 0 means it ran, not that it encrypted. Completed on a backup means the job finished, not that anything's restorable.

None of these tools are lying — they're each answering a smaller question than the one I'm actually asking. And an agentic loop running flat-out will take the small answer as the big one every time, because the small answer is right there, green, and it keeps the loop moving.

The one a review pass caught

The gotchas above I caught by hand — a wrong file size, a failed restore, a 500. This one I'd have shipped, because it would have passed the check I'd planned to run.

I went to rotate a Keycloak signing key, the key behind every token and every login. The plan: add the new key to the realm config, remove the old one, let the reconciler apply it. That applies clean, exit 0, and the old key is still in the JWKS, still trusted.

The reconciler runs with no-delete — a guardrail so an import can't accidentally wipe live config. Updating a component is allowed; deleting one is blocked. Which is great until the thing you're doing is a deletion: removing the old key from the config wasn't a delete it would honour, it was one it would silently skip. A rotation that doesn't retire the old key isn't a rotation, it's adding a key — and if you were rotating because the old one might be compromised, it's still live.

It didn't ship because a review pass stopped on the one assumption the whole plan rested on, that "remove from the config" means "remove from Keycloak", and asked whether that's actually true under no-delete. It isn't. And the verification I'd planned would have missed it anyway: I was going to confirm the new key was in the JWKS, which passes whether or not the old one's gone. The check that mattered was the absence (old key out), and it's the one the first pass skips.

The fix, for the record: retire the old provider by updating it to enabled=false, which no-delete allows, then confirm on /certs that the old kid is gone. I rehearsed it on a throwaway Keycloak first, which caught a second gotcha: OpenSSL 3 emits PKCS#8 by default, Keycloak wants PKCS#1, and the mismatched import fails, leaving the old key in place. Exit 0, naturally. The flag is openssl rsa -traditional.
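That failed import also exits 0, so a pre-flight check on the key file is cheap insurance. This Python sketch inspects only the PEM header line. Unencrypted PKCS#8 keys start with `-----BEGIN PRIVATE KEY-----`, and traditional PKCS#1 RSA keys start with `-----BEGIN RSA PRIVATE KEY-----`. It checks format, not that the key is valid. The requirement that Keycloak wants PKCS#1 comes from the rehearsal above; the sketch doesn't verify it.

```python
PKCS1_RSA_HEADER = "-----BEGIN RSA PRIVATE KEY-----"  # `openssl rsa -traditional`
PKCS8_HEADER = "-----BEGIN PRIVATE KEY-----"  # OpenSSL 3 default


def pem_format(pem: str) -> str:
    """Classify an unencrypted PEM private key by its first line."""
    text = pem.strip()
    first_line = text.splitlines()[0] if text else ""
    if first_line == PKCS1_RSA_HEADER:
        return "pkcs1"
    if first_line == PKCS8_HEADER:
        return "pkcs8"
    return "unknown"


def require_pkcs1(pem: str) -> None:
    """Fail loudly before the import, rather than letting the import fail quietly."""
    fmt = pem_format(pem)
    if fmt != "pkcs1":
        raise ValueError(f"key is {fmt}, not PKCS#1; convert with openssl rsa -traditional")
```

The conversion itself stays on the command line, e.g. `openssl rsa -in key.pem -traditional -out key-pkcs1.pem` (file names are placeholders).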

What actually catches them

Two things, and neither is a better dashboard.

One: verify on the surface a real consumer hits, and verify it twice — once expecting success, once expecting failure. The positive check tells you the happy path works; the negative check tells you the thing you actually care about. New key present is easy. Old key gone is the part that proves anything, and it's the part the fast pass drops.
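Applied to the Keycloak key rotation, the two checks are small. This Python sketch reads the realm's JWKS from Keycloak's OIDC certs endpoint, `/realms/{realm}/protocol/openid-connect/certs`. Older deployments may need an `/auth` prefix, which would go in `base_url`. It then asserts the new `kid` is published and the old one isn't. The base URL, realm, and `kid` values are placeholders.

```python
import json
import urllib.request


def fetch_jwks(base_url: str, realm: str) -> dict:
    """Fetch the realm's published signing keys from Keycloak's certs endpoint."""
    url = f"{base_url}/realms/{realm}/protocol/openid-connect/certs"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def check_rotation(jwks: dict, new_kid: str, old_kid: str) -> list[str]:
    """Positive and negative check against the keys token verifiers see."""
    published = {key.get("kid") for key in jwks.get("keys", [])}
    problems = []
    if new_kid not in published:  # positive: the easy half
        problems.append(f"new kid {new_kid} not published")
    if old_kid in published:  # negative: the half that proves the rotation
        problems.append(f"old kid {old_kid} still published")
    return problems
```

Under no-delete, the plan as first written would have passed the positive check and failed only the negative one.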

Two: don't let the pass that makes the change be the only pass that checks it. Whoever — or whatever — wrote the diff is the worst-placed to catch what the diff assumes. So a separate, adversarial pass gets one job: try to break it, and re-ask the premises instead of re-reading the lines. That's what caught the key rotation. Nothing in the first pass would have.

The part that doesn't speed up

The tooling is a genuine force multiplier. It's also a faster way to be confidently wrong, and the exit code won't tell you which one you just did. A machine can run the command; it can't own the decision — that's still me, and a clean exit doesn't transfer it. So I spend some of the speed the tools buy back on the boring half: hit the real surface, prove the absence, run it past a second pass. Clean exit, green dashboard, "Completed" — I've learned exactly how little any of them promise.
