I've gone back and forth on this. Yes - sometimes your worry is completely unfounded, but this article lost me here:

>Untangling messy parts of the codebase, identifying and removing dead code, cleaning up build, deploy and monitoring systems - these tasks often feel overwhelming because they can balloon, sucking up days for no real gain.

Those aren't the risks I'm thinking about when I'm worried about problems like this. Difficulty/time is not the concern. Unknown-unknowns are the concern.

Upgrading our NodeJS version takes about 10 seconds with Elastic Beanstalk on AWS, but we're not on the latest version, even though it seems to run without issue and our tests pass. Why? Because maybe it'll expose some race condition that would only appear in production. Worse still, maybe that race condition will affect a lot of people, some of the time, in an extremely hard-to-reproduce way. Maybe it'll break something that isn't apparent for weeks.

We had a bug where some SMS notifications weren't being sent in production, and it hurt our retention metrics for a couple of months before one of our engineers stumbled across it. It was caused by a maintenance upgrade of one of our very few dependencies. In a small company or on a new product, you don't have the kind of bulletproof reporting that tells you when something like that breaks - your metrics move around quite a bit by default.

Code that has been working for ages without issue is code I'm not interested in changing unless I absolutely have to. Leaving alone code that hasn't changed in forever is sometimes the best way to ensure stability.

This all being said, I agree that this worry can be carried too far. Like I said - I've gone back and forth on this. Fundamentally it's a tricky line to walk.



It sounds like the grungy work you'd need to look into is a canary deploy with a production traffic duplicator. Spin up a version N+1 in AWS, mirror all the traffic N is getting onto it, hook N+1 up only to mocks so it can't cause side effects, and observe its behavior.

If your monitoring, alerting, and quotas are set up right, you'll know whether the version update is OK.
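
A minimal sketch of the traffic-duplicating piece, assuming a plain Node HTTP service (TypeScript here since the parent is on NodeJS; the hostnames and ports are placeholders, not anything Elastic Beanstalk gives you): a tiny proxy that forwards each request to version N and fires an identical copy at the N+1 canary, returning only N's response so the canary can never affect real users.

    // shadow-proxy.ts - request-mirroring proxy sketch, not production-ready.
    // Assumes N and N+1 are plain HTTP services; hostnames/ports are made up.
    import http from "node:http";

    const PRIMARY = { host: "app-n.internal", port: 3000 };  // version N, serves real users
    const CANARY = { host: "app-n1.internal", port: 3000 };  // version N+1, wired to mocks

    function forward(
      target: { host: string; port: number },
      req: http.IncomingMessage,
      body: Buffer,
    ): Promise<http.IncomingMessage> {
      return new Promise((resolve, reject) => {
        const out = http.request(
          { ...target, method: req.method, path: req.url, headers: req.headers },
          resolve,
        );
        out.on("error", reject);
        out.end(body);
      });
    }

    http
      .createServer((req, res) => {
        const chunks: Buffer[] = [];
        req.on("data", (c) => chunks.push(c));
        req.on("end", async () => {
          const body = Buffer.concat(chunks);

          // Fire-and-forget copy to the canary; its response (or failure) never reaches the user.
          forward(CANARY, req, body).catch((err) =>
            console.error("canary error:", err.message),
          );

          // Only the primary's response goes back to the client.
          try {
            const upstream = await forward(PRIMARY, req, body);
            res.writeHead(upstream.statusCode ?? 502, upstream.headers);
            upstream.pipe(res);
          } catch {
            res.writeHead(502).end("primary unavailable");
          }
        });
      })
      .listen(8080, () => console.log("shadow proxy listening on :8080"));

If you're already fronting the app with nginx, its mirror directive does the same job without custom code; the sketch is just to make the shape of the thing concrete.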

To me that is very scary work, though. It's difficult and risky in itself, and this article's point applies all over again (at least to me).


A canary works for deploys that break immediately. If the breakage only manifests, or is only caught, days or a week later, then you need a different plan.


You can also collect code coverage statistics from this canary deploy. I don't know if it's possible on Elastic Beanstalk, but I've done it with JVM apps, where you can connect a debugger to the process remotely. Keep running the canary until 100% of code paths are hit; if a code path isn't hit, find out why and repeat.

It's also a great way to empirically find dead code.
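
As a rough sketch of the same idea for a Node app (since the parent is on NodeJS), using the built-in inspector's precise-coverage API - the port and endpoint here are made up, and you'd wire this into the canary's entrypoint:

    // coverage-endpoint.ts - pull V8 precise coverage from a running canary (sketch).
    // Rough Node analogue of attaching to a JVM remotely; port and path are placeholders.
    import http from "node:http";
    import inspector from "node:inspector";

    const session = new inspector.Session();
    session.connect();

    // Start counting which functions/blocks actually execute, with call counts.
    session.post("Profiler.enable");
    session.post("Profiler.startPreciseCoverage", { callCount: true, detailed: true });

    function takeCoverage(): Promise<unknown> {
      return new Promise((resolve, reject) => {
        session.post("Profiler.takePreciseCoverage", (err, coverage) => {
          if (err) reject(err);
          else resolve(coverage); // per-script, per-function hit counts
        });
      });
    }

    // Poll this while the canary soaks up mirrored traffic; functions still at
    // count 0 after enough real traffic are dead-code (or never-exercised-path) candidates.
    http
      .createServer(async (req, res) => {
        if (req.url === "/coverage") {
          try {
            const coverage = await takeCoverage();
            res.writeHead(200, { "content-type": "application/json" });
            res.end(JSON.stringify(coverage));
          } catch (err) {
            res.writeHead(500).end(String(err));
          }
        } else {
          res.writeHead(404).end();
        }
      })
      .listen(9230, () => console.log("coverage endpoint on :9230"));

Mapping the script URLs in the result back to your source files gives you the empirical dead-code list.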


I'm trying to build something like this with Intel PT, and it's great. I used to do it with a patched libgcov, but now it's even better: counters for every code branch, notifications when a new path is taken once N% coverage has been reached, liveness info about periodic tasks and I/O threads, and execution times too.


I think it's probably the best way to be "sure" about your code, and unfortunately it's an area that's under-tooled.



