How are you storing them, and what do you use to read/visualize/analyze them? I'd imagine just putting them up in a UI becomes a needle-in-a-haystack issue. Are you programmatically analyzing them?
Honeycomb. For shorter traces (most of them), a waterfall view is great. For the long ones, we try to split them up if it makes sense, but you can also just run queries scoped to that trace to answer questions about it (how many of the spans are DB queries, how many are this particular query, are they quick, etc.).
Tempo's a backend/sink for traces, but if you click through to the Tempo docs and find out how to generate tracing data[1], you learn that you have two options: OpenTelemetry, which they recommend, and Zipkin, which they do not recommend.
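For a sense of what the OpenTelemetry route looks like, here's a minimal sketch using the OpenTelemetry Python SDK with the OTLP exporter. The service name and the endpoint (localhost:4317) are assumptions for a locally running Tempo, or a Collector in front of it, accepting OTLP over gRPC:

```python
# pip install opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Describe this service and wire up an OTLP exporter pointed at Tempo.
# "localhost:4317" assumes Tempo (or a Collector in front of it) is
# listening for OTLP/gRPC locally -- adjust for your setup.
provider = TracerProvider(resource=Resource.create({"service.name": "demo-service"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

# Emit a parent span with a child span, the way an instrumented request handler might.
with tracer.start_as_current_span("handle-request") as span:
    span.set_attribute("http.method", "GET")
    with tracer.start_as_current_span("db-query"):
        pass  # your actual work here
```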
Tempo is a traces server. Prometheus is a metrics server.
Grafana Labs, the same company that develops and sells Tempo, created a horizontally scalable version of Prometheus called Mimir.
OpenTelemetry is an ecosystem, not just one app. It's protocols, libraries, specs, and a Collector (which acts as a clearinghouse for metrics + traces + logs data). It's bigger than just Tempo. The intention of OTel seems to be to decouple the protocol from the app by having adapters for all of the pieces.
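To make the "clearinghouse" role concrete, here's a rough sketch of a Collector config that accepts OTLP and fans traces out to Tempo and metrics out to Mimir. The hostnames, ports, and the Mimir remote-write path are placeholders for whatever your deployment uses (the prometheusremotewrite exporter ships in the contrib distribution):

```yaml
receivers:
  otlp:                      # apps send OTLP (gRPC/HTTP) to the Collector
    protocols:
      grpc:
      http:

exporters:
  otlp/tempo:                # traces forwarded to Tempo over OTLP
    endpoint: tempo:4317
    tls:
      insecure: true
  prometheusremotewrite:     # metrics forwarded to Mimir via remote write
    endpoint: http://mimir:9009/api/v1/push

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      exporters: [prometheusremotewrite]
```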
If you look at DB-engines.com/ranking and add up the collective interest in all of the databases listed, you will see that the aggregate "score" of all databases combined is 7105.84. Postgres is indeed popular, but it is only ranked 4th on the list, with its own score of 648.96. MySQL is currently still roughly 50% larger in terms of interest, with a score of 998.15.
Which means interest in Postgres specifically is only about 9.13% of overall interest in databases; MySQL is another 14.05%. Combined, about 23.2%.
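If you want to check that math (treating the quoted scores as a snapshot, since the ranking is recalculated monthly):

```python
# Share math from the quoted DB-Engines scores.
total_score = 7105.84   # aggregate score of all ranked systems
postgres    = 648.96
mysql       = 998.15

pg_share    = postgres / total_score * 100   # ~9.13%
mysql_share = mysql / total_score * 100      # ~14.05%
print(f"Postgres {pg_share:.2f}%, MySQL {mysql_share:.2f}%, "
      f"combined {pg_share + mysql_share:.2f}%")   # combined ~23.18%
```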
Is that a significant percentage of interest? Yes. Many others are a fraction of 1% of mindshare in the market.
Yet the reason there are 423 systems ranked in DB-Engines is that no one size fits all data, query patterns, workloads, SLAs, or use cases.
PostgreSQL and MySQL are, at the end of the day, oriented towards OLTP workloads. While you can stretch them to be used for OLAP, these are "unnatural acts." They were both designed long ago, for far smaller datasets than are typical of modern-day petabyte-scale, real-time (streaming) ingestion, cloud-native deployments. And while many engineering teams have cobbled together PostgreSQL and MySQL frankenservers for petabyte-scale workloads, YMMV for your data ingest, and for your p99s and QPS.
The dynamic at play here is that some projects lend themselves to "general services" databases, where MySQL or PostgreSQL or anything else at hand is useful for them. And then there are specialized databases purpose-built for certain types of workloads, data models, query patterns, use cases, and so on.
So long as "chaos" fights against "law" in the universe, you will see this desire to have "one" database standard rule them all, versus a Cambrian explosion of options for users and use cases.
While you’re not wrong re: Postgres and MySQL not necessarily being designed for PB scale, IME many shops with huge datasets are just doing it wrong. Storing numbers as strings, not using lookup tables for low-cardinality data, massive JSON blobs everywhere, etc.
I’m not saying it fixes everything, but knowing how a DB works and applying proper data modeling and normalization could severely reduce the size of many datasets.
It all depends on what kind of queries you're running. I came from the OLTP market, where you're generally doing single-row operations. Basic CRUD. Single table work on denormalized data.
Now go to OLAP, and a single query might be doing multiple table joins. It might be scouring billions of records. It might need to do aggregations. Suddenly "millions of ops" might be reduced to 100 QPS. If you're lucky.
And yes, that's even using fast local NVMe. It's just a different kind of query, with a different kind of result set. YMMV.
I think it's a matter of use case. Doing ad hoc data exploration on an OLTP system generally sucks the wind out of its performance. Even if you have some type of workload prioritization, isolation, and limitation, letting data scientists and business analysts wander freely through your production OLTP database sounds like a Bad Time.
The organization might say "Okay. Maybe you should do your ad hoc exploration on an OLAP system. Preferably our data warehouse where you can let your report run for hours and we won't see a production brownout while it's running."
So ad hoc joins in the warehouse generally end up being more complex.
The devil is in the details, and anyone who is looking to adopt TiDB and cares about data correctness should read through not just this issue but the other currently open correctness-related GitHub issues:
Anyone who has done timer- or countdown-based work has to think the opposite way, whether that's sports, event production, rocket launches, and so on. You're not thinking about the time you've already expended. You're thinking of the time you still have left. "Two minutes left in the game." "We're live in 30 seconds." Etc.
NFL delay-of-game penalties are interesting for this, because when the clock first shows 0 seconds, that means that the team still has a full second to start the play.
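A toy sketch of why that works, assuming the displayed play clock simply truncates (floors) the true time remaining to whole seconds, so a reading of 0 still covers almost a full second:

```python
import math

def displayed_seconds(remaining: float) -> int:
    """What a whole-second countdown clock shows: the floor of the true time left."""
    return math.floor(remaining)

# The display first ticks over to 0 while almost a full second still remains.
for remaining in (1.2, 1.0, 0.99, 0.5, 0.01, 0.0):
    print(f"true time left: {remaining:4.2f}s -> clock shows {displayed_seconds(remaining)}")
```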
https://github.com/odigos-io/odigos