The testing process where I work is pretty thorough. There is little appetite to hotshot a code change all the way into a production public safety system, where a bug can have an impact on, well, public safety. Any code my team produces goes to QA, who test it in their own environment without interference or influence from us. Then, before it's deployed, it goes to the deployment team, who have their own environment and their own means of testing the product.
(This is in pretty stark contrast to how things worked at my very first development job, where source control didn’t exist and I was essentially testing in production.)
The deployment team is unique in that they have two environments: the actual production environment serving live traffic, and the staging environment where production support steps are rehearsed before being executed for real. The staging environment is supposed to be a 1:1 replica of the production environment so they can rule out environment-specific issues.
So imagine my near-total lack of surprise when I got summoned to join a call at 1am last night because… wait for it… the software was doing something completely unexpected in the production deployment. Why was this? Turns out the staging environment was in fact not a 1:1 replica of the production environment, and the deployment team had missed a detail that turned out to be completely unique to production.
I’m probably going somewhere with this story, but being this short on sleep has taken its toll.
But hey, at least the finger-pointing process is always a good time.