Great article! I always love hearing Stripe talking about their internals.
I've been using this practice and I agree that it's incredibly useful. I think because people tend to think in terms of "logs", they overlook the much more useful construct of "canonical logs". Many fine-grained log lines are almost always less useful than a few fully-described canonical ones. Other observability tools often call these "events" instead of "logs" for that reason.
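To make the contrast concrete, here's a minimal sketch (with made-up field names, not Stripe's actual schema) of fine-grained logging versus a single canonical log line:

```python
def handle_request_fine_grained(request_id):
    # Fine-grained style: many small fragments, each missing context.
    print(f"[{request_id}] starting request")
    print(f"[{request_id}] ran 3 db queries")
    print(f"[{request_id}] responded 201 in 41.7ms")

def handle_request_canonical(request_id):
    # Canonical style: accumulate fields as the request progresses,
    # then emit one fully-described, structured line at the end.
    event = {
        "request_id": request_id,
        "http_method": "POST",
        "http_path": "/v1/charges",
        "db_query_count": 3,
        "duration_ms": 41.7,
        "status": 201,
    }
    line = "canonical-log-line " + " ".join(
        f"{k}={v}" for k, v in sorted(event.items())
    )
    print(line)
    return line
```

The canonical line is trivially queryable (every dimension is a key=value pair on one record), which is what makes the "events" framing so useful.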
There's a tool called Honeycomb [1] that gives you exactly what this article is talking about, in a really nicely designed package, out of the box. And since it handles all of the ingestion and visualization, you don't have to worry about setting up Kafka, the performance of logplexes, teaching everyone SQL, or how to get nice graphs. I was a little skeptical at first, but after using it for over a year I'm completely converted.
If you record fully-described "events" for each request, and use sub-spans for the smaller segments of a request, you also get a waterfall-style trace visualization, which eliminates the last remaining need for fine-grained logs.
If this article seems interesting to you, I'd highly, highly recommend Honeycomb. (Completely unaffiliated, I just think it's a great product.)
> The most effective way to structure your instrumentation, so you get the maximum bang for your buck, is to emit a single arbitrarily wide event per request per service hop.
> We're talking wiiiide. We usually see 200-500 dimensions in a mature app. But just one write.
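A minimal sketch of the "one arbitrarily wide event per request per service hop" pattern: subsystems widen a shared per-request event, and exactly one write happens when the request finishes. All names here are illustrative, not any vendor's actual API.

```python
class WideEvent:
    """Accumulates fields for one request; written exactly once."""

    def __init__(self, sink):
        self.fields = {}
        self._sink = sink  # e.g. a log pipe or an events API client

    def add(self, **fields):
        # Any code path handling the request can widen the event.
        self.fields.update(fields)

    def send(self):
        # The single write for this service hop.
        self._sink(dict(self.fields))

out = []
event = WideEvent(out.append)
event.add(request_id="req_1", http_path="/v1/charges")
event.add(db_query_count=3, db_duration_ms=12.0)   # from the DB layer
event.add(rate_limited=False, user_id="usr_42")    # from middleware
event.send()
```

In a real app the event would typically live in middleware or request-local state so every layer can add its dimensions without plumbing the object through explicitly.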
Honeybee here. Feel free to just try it: there's a 14-day free trial, and a free community edition for small amounts of data :) Experiment away, and our community Slack is super friendly!
We certainly wouldn't fit into the community edition :)
Our main project is running on Django 1.11. I'm going to wait until we're on Django > 2 for the database tracing integration.
What I'd love to see is a screencast, demo, or series of screenshots that digs into the (out of the box) Django integration. NewRelic gives us a lot of insight into our database performance, including EXPLAIN traces for slow queries. Does Honeycomb provide something similar?
Yes, Honeycomb is great. It's one of those "I wish I had more big projects, just so I could use this more" services. Other APMs and logging systems just aren't really comparable.
I'm wondering how things like OpenTracing-style spans and sub-spans fit into the format Stripe describes. Are they just logged as `subspan1`, `subspan2`, `subspan3` fields in the canonical line?
It seems like that works, but I'm also unclear whether each sub-span would be better off as its own log line, though that carries its own problems.
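A hedged sketch of the two options in the question above, with made-up field names. Option 1 folds sub-span timings into the one canonical event as prefixed fields; option 2 emits each sub-span as its own record linked by a shared `trace_id`, which is the shape waterfall/trace tooling generally expects.

```python
def flattened_event():
    # Option 1: sub-spans become prefixed fields on the single wide event.
    event = {"request_id": "req_1", "duration_ms": 95.0}
    for name, ms in [("db", 12.0), ("cache", 0.4), ("stripe_api", 80.0)]:
        event[f"timing.{name}_ms"] = ms
    return event

def span_events():
    # Option 2: each sub-span is its own event, linked by trace_id/parent_id.
    trace_id = "trace_1"
    root = {"trace_id": trace_id, "span_id": "root", "parent_id": None,
            "name": "request", "duration_ms": 95.0}
    children = [
        {"trace_id": trace_id, "span_id": name, "parent_id": "root",
         "name": name, "duration_ms": ms}
        for name, ms in [("db", 12.0), ("cache", 0.4), ("stripe_api", 80.0)]
    ]
    return [root] + children
```

Option 1 keeps the "one write per request" property but only works for a fixed, shallow set of segments; option 2 costs more writes but preserves hierarchy and repetition (e.g. N database calls).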
[1]: https://www.honeycomb.io/