A few pointers from our own experience: - centralized logs (we tried an ELK stac...

shoo · on June 15, 2018

> Edit: also, decoupling. If your Lambdas are calling each other directly, consider putting a queue or SNS topic in between. Makes it easier to test each unit independently, can manage timeout / retry issues on your behalf, and gives you a convenient observation point for inter-service traffic

i recall someone giving a talk (offline at a functional programming meetup?) about something like this. it wasn't in the context of AWS services, or serverless, and probably wasn't even in the context of a distributed system --- but the crux of the suggestion was that they re-architected some system to jam queues between everything, which gave them great traceability and the ability to do things like capture messages then replay them later or what have you. once they had the queues in place they realised it would be trivial to add a bunch of other features leveraging the queues (such as "undo", clearly this would depend upon what side effects your system has -- this was a functional programming meetup so perhaps their system didnt have many/any). might be good for enabling recording of execution for export and playback in a debugging environment or for construction of automated regression tests.