How are you storing them, and what do you use to read/visualize/analyze them? I'd imagine just putting them up in a UI becomes a needle-in-a-haystack issue. Are you programmatically analyzing them?
Honeycomb. For shorter traces (most of them), a waterfall view is great. For the long ones, we try to split them up if it makes sense, but you can also just run queries scoped to that trace to answer questions about it (how many of the spans are DB queries, how many are this particular query, are they quick, etc.).
Tempo's a backend/sink for traces, but if you click through to the Tempo docs and find out how to generate tracing data[1], you learn that you have two options: OpenTelemetry, which they recommend, and Zipkin, which they do not recommend.
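For a sense of what the OpenTelemetry route looks like, here's a minimal sketch using the OpenTelemetry Python SDK with the OTLP exporter. The service name and the endpoint (localhost:4317) are assumptions for a locally running Tempo, or a Collector in front of it, accepting OTLP over gRPC:

```python
# pip install opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Describe this service and wire up an OTLP exporter pointed at Tempo.
# "localhost:4317" assumes Tempo (or a Collector in front of it) is
# listening for OTLP/gRPC locally -- adjust for your setup.
provider = TracerProvider(resource=Resource.create({"service.name": "demo-service"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

# Emit a parent span with a child span, the way an instrumented request handler might.
with tracer.start_as_current_span("handle-request") as span:
    span.set_attribute("http.method", "GET")
    with tracer.start_as_current_span("db-query"):
        pass  # your actual work here
```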
Tempo is a traces server. Prometheus is a metrics server.
Grafana Labs, the same company that develops and sells Tempo, created a horizontally scalable version of Prometheus called Mimir.
OpenTelemetry is an ecosystem, not just one app. It's protocols, libraries, specs, and a Collector (which acts as a clearinghouse for metrics + traces + logs data). It's bigger than just Tempo. The intention of OTel seems to be to decouple the protocol from the app by having adapters for all of the pieces.
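To make the "clearinghouse" role concrete, here's a rough sketch of a Collector config that accepts OTLP and fans traces out to Tempo and metrics out to Mimir. The hostnames, ports, and the Mimir remote-write path are placeholders for whatever your deployment uses (the prometheusremotewrite exporter ships in the contrib distribution):

```yaml
receivers:
  otlp:                      # apps send OTLP (gRPC/HTTP) to the Collector
    protocols:
      grpc:
      http:

exporters:
  otlp/tempo:                # traces forwarded to Tempo over OTLP
    endpoint: tempo:4317
    tls:
      insecure: true
  prometheusremotewrite:     # metrics forwarded to Mimir via remote write
    endpoint: http://mimir:9009/api/v1/push

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      exporters: [prometheusremotewrite]
```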
If you look at DB-engines.com/ranking and add up the collective interest in all of the databases listed, you will see that the aggregate "score" of all databases combined is 7105.84. Postgres is indeed popular, but it is only ranked 4th on the list, with its own score of 648.96. MySQL is currently still roughly 50% larger in terms of interest, with a score of 998.15.
Which means interest in Postgres specifically is only about 9.13% of overall interest in databases; MySQL is another 14.05%. Combined, about 23.2%.
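If you want to check that math (treating the quoted scores as a snapshot, since the ranking is recalculated monthly):

```python
# Share math from the quoted DB-Engines scores.
total_score = 7105.84   # aggregate score of all ranked systems
postgres    = 648.96
mysql       = 998.15

pg_share    = postgres / total_score * 100   # ~9.13%
mysql_share = mysql / total_score * 100      # ~14.05%
print(f"Postgres {pg_share:.2f}%, MySQL {mysql_share:.2f}%, "
      f"combined {pg_share + mysql_share:.2f}%")   # combined ~23.18%
```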
Is that a significant percentage of interest? Yes. Many others are a fraction of 1% of mindshare in the market.
Yet the reason there are 423 systems ranked in DB-Engines is that no one size fits all data, query patterns, workloads, SLAs, or use cases.
PostgreSQL and MySQL are, at the end of the day, oriented towards OLTP workloads. While you can stretch them to be used for OLAP, these are "unnatural acts." They were both designed long ago, for far smaller datasets than are typical of modern-day petabyte-scale, real-time (streaming) ingestion, cloud-native deployments. And while many engineering teams have cobbled together PostgreSQL and MySQL frankenservers for petabyte-scale workloads, YMMV for your data ingest, and for your p99s and QPS.
The dynamic at play here is that some projects lend themselves to "general services" databases, where MySQL or PostgreSQL or anything else at hand is useful for them. And then there are specialized databases purpose-built for certain types of workloads, data models, query patterns, use cases, and so on.
So long as "chaos" fights against "law" in the universe, you will see this desire to have "one" database standard rule them all, versus a Cambrian explosion of options for users and use cases.
While you’re not wrong re: Postgres and MySQL not necessarily being designed for PB scale, IME many shops with huge datasets are just doing it wrong. Storing numbers as strings, not using lookup tables for low-cardinality data, massive JSON blobs everywhere, etc.
I’m not saying it fixes everything, but knowing how a DB works and applying proper data modeling and normalization could severely reduce the size of many datasets.
It all depends on what kind of queries you're running. I came from the OLTP market, where you're generally doing single-row operations. Basic CRUD. Single table work on denormalized data.
Now go to OLAP, and a single query might be doing multiple table joins. It might be scouring billions of records. It might need to do aggregations. Suddenly "millions of ops" might be reduced to 100 QPS. If you're lucky.
And yes, that's even using fast local NVMe. It's just a different kind of query, with a different kind of result set. YMMV.
I think it's a matter of use case. Doing ad hoc data exploration on an OLTP system generally sucks the wind out of its performance. Even if you have some type of workload prioritization, isolation, and limitation, letting data scientists and business analysts wander freely through your production OLTP database sounds like a Bad Time.
The organization might say "Okay. Maybe you should do your ad hoc exploration on an OLAP system. Preferably our data warehouse where you can let your report run for hours and we won't see a production brownout while it's running."
So ad hoc joins in the warehouse generally end up being more complex.
The devil is in the details, and anyone who is looking to adopt TiDB and cares about data correctness should read through not just this issue but the other currently open correctness-related GitHub issues:
Anyone who has done timer- or countdown-based work has to think the opposite way, whether that's sports, event production, rocket launches, and so on. You're not thinking about the time you've already expended. You're thinking of the time you still have left. "Two minutes left in the game." "We're live in 30 seconds." Etc.
NFL delay-of-game penalties are interesting for this, because when the clock first shows 0 seconds, that means that the team still has a full second to start the play.
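A toy sketch of why that works, assuming the displayed play clock simply truncates (floors) the true time remaining to whole seconds, so a reading of 0 still covers almost a full second:

```python
import math

def displayed_seconds(remaining: float) -> int:
    """What a whole-second countdown clock shows: the floor of the true time left."""
    return math.floor(remaining)

# The display first ticks over to 0 while almost a full second still remains.
for remaining in (1.2, 1.0, 0.99, 0.5, 0.01, 0.0):
    print(f"true time left: {remaining:4.2f}s -> clock shows {displayed_seconds(remaining)}")
```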
https://github.com/odigos-io/odigos