
But when you publish a paper on Monarch, aren't you giving away the core idea?

In the case of PageRank, at least it was published after Google had already dominated the search space, i.e., many years after its conception, implementation, and use.

I wonder if Google papers are internally reviewed before publishing so as to make sure only partial information is revealed, and not the secret sauce.



They aren’t saying it is difficult without giving away trade secrets, they are saying it is difficult because the software has dependencies on internal services, which themselves have dependencies on other internal services. Basically, in order to run it you need to also run the entirety of Google’s tech stack. This doesn’t work for an open source project.

They aren’t worried about ‘giving away the idea’, they just don’t have an easy technical way to open source just the one component.


> But when you publish a paper on Monarch, aren't you giving away the core idea?

The thing is, Monarch is not the secret sauce that makes Google money. Every other company in the world could be running Monarch, and Google's revenue and market position would not be affected.

Publicizing this is a win-win situation. The idea gets exposure and other devs can potentially benefit, and Google gets some good PR as a place where devs get to work on cool stuff.


You could definitely build Monarch from this paper. It is very detailed. But keep in mind that Monarch is already ten years old.


[Also a Googler, opinions mine] In addition to what the other people are saying: there are some limitations to Monarch (or really, the data upload path) that are quite annoying, so Monarch isn't even necessarily the "best". It's just very good. There are ways to improve it.

The issue is, even if you give away the secret sauce, that doesn't really help with making the secret sauce scale, nor does anyone that isn't a large cloud provider need a custom solution like Monarch. Prometheus or Datadog work fine for everyone else. This might be interesting reading for those companies, but it also might not be, because their products can't be as centralized as Monarch is (consider if Prometheus had an API and ran a centralized cluster of data-ingestion servers, and you wrote your time series to that global, Prometheus-owned cluster).


> nor does anyone that isn't a large cloud provider need a custom solution like monarch

I'm not actually sure that's true. It's like many other things inside Google: people outside don't necessarily understand the value or know what's actually possible, because they've never experienced anything similar. It's sort of like trying to discuss the finer points of the taste of oysters with someone who has never tasted them.

I would very much like the feature set of Monarch (and streamz), without the maintenance overhead or the insane scale. Very, very few companies out there need to run anything at anywhere near "billion-user" scale, but practically all of them could benefit from the painless and detailed monitoring that Monarch offers.


The one thing I really want (which apparently Monarch has) is histogram retention. I'm often called upon to summarize service latency as global p50 and p95, and given the sheer volume of data we have, we pre-aggregate that metric. Thus I am left calculating an average of p95s, which isn't super useful.

To the best of my knowledge, nothing else in the market does that.
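To make the "average of p95s" problem concrete, here's a toy Python sketch with made-up shard data (not anyone's production numbers): a fast high-traffic shard plus a slow low-traffic one. Averaging per-shard p95s gives a number wildly different from the p95 of the merged data, which is what histogram retention lets you compute.

```python
import random

random.seed(0)
# Two hypothetical shards with very different latency profiles (ms).
shard_a = [random.gauss(100, 10) for _ in range(10_000)]  # fast, high-traffic
shard_b = [random.gauss(400, 50) for _ in range(500)]     # slow, low-traffic

def p95(xs):
    """Naive 95th percentile: value at the 95% position of the sorted data."""
    xs = sorted(xs)
    return xs[int(0.95 * len(xs))]

# What you're stuck with after pre-aggregation: an average of per-shard p95s.
avg_of_p95s = (p95(shard_a) + p95(shard_b)) / 2
# What retained histograms would let you recover: the true global p95.
true_p95 = p95(shard_a + shard_b)

print(round(avg_of_p95s), round(true_p95))  # the two disagree badly
```

Here the average of p95s lands around the slow shard's latency even though that shard serves a tiny fraction of traffic, while the true global p95 is still dominated by the fast shard. Mergeable histograms avoid this because bucket counts add up across shards.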


Stackdriver/Google Cloud monitoring is backed by Monarch, so if you want the flavor of a Monarch distribution-valued metric, see the docs:

https://cloud.google.com/monitoring/api/ref_v3/rest/v3/Typed...

Since the distribution is represented by a CDF of buckets, there's no guarantee that you'll get an accurate representation of the median or any other quantile. On the other hand, you'll get an exact average.
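A minimal sketch of why the average stays exact: if each distribution-valued point carries a count and a mean, merged means need no bucket approximation at all. The record layout here is illustrative, not the actual proto:

```python
def merge_mean(dists):
    """Exact combined mean from per-stream (count, mean) pairs.

    Unlike quantiles, this needs no bucket information: counts and sums
    combine exactly under addition.
    """
    total = sum(count for count, _ in dists)
    return sum(count * mean for count, mean in dists) / total

# e.g. a fast stream and a slow one (hypothetical numbers)
streams = [(1000, 12.0), (50, 480.0)]
print(merge_mean(streams))  # exact, no approximation involved
```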


Monarch has distributions with predetermined bucket boundaries. These are indeed very useful.

Pet peeve: it can calculate and graph something it calls a quantile. But if the value is in the middle of a large bucket, it will just interpolate or something and the result will be terribly misleading. It'd be much better if it gave lower/upper bounds.

I try to use distributions via questions of the form "what fraction of values are less than / greater than N [which I've verified is a bucket boundary]?". This gives you an answer you can trust.
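Both points above can be sketched in a few lines of Python. The bucket layout is hypothetical (not Monarch's actual representation): the fraction below a bucket boundary is exact, while a quantile can only honestly be reported as the bounds of the bucket it falls in.

```python
# Hypothetical latency buckets (ms): bucket i covers [bounds[i], bounds[i+1]).
bounds = [0, 10, 25, 50, 100, 250, 1000]
counts = [120, 340, 280, 150, 80, 30]   # one count per bucket
total = sum(counts)

def fraction_below(boundary):
    """Exact fraction of samples below a bucket boundary (trustworthy)."""
    assert boundary in bounds, "only exact at bucket boundaries"
    i = bounds.index(boundary)
    return sum(counts[:i]) / total

def quantile_bounds(q):
    """Honest answer for quantile q: the edges of the bucket it falls in."""
    target = q * total
    seen = 0
    for i, count in enumerate(counts):
        seen += count
        if seen >= target:
            return bounds[i], bounds[i + 1]

print(fraction_below(50))     # exact fraction of requests faster than 50 ms
print(quantile_bounds(0.95))  # p95 is somewhere in this wide bucket
```

With these numbers the p95 lands in the [100, 250) bucket, so any single interpolated "p95" value in that range is a guess; the bounds at least tell you how much you don't know.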


Circonus (https://www.circonus.com/) supports both recording histogram directly, and merging histograms for analysis. IIUC, it also supports first-class timeseries data similarly to Monarch, where each data point has a high precision timestamp that does not have to align with other timeseries in the data set.


> aren't you giving away the core idea?

The core idea is a distributed time-series DBMS... not much to give away there. It "gives away" some architectural novelty, but it's not the solution to P=NP. These papers typically describe engineering feats more than they do a revolutionary idea.



