
As someone interested in PL theory, I've long found the query language exposed by Monarch surprisingly interesting (briefly discussed in section 5.1 but the description doesn't quite do it justice). It's a functional language, a breath of fresh air compared to "real programming languages" in use at Google like C++, Java, or Go.

The most interesting idea is that its native data types are time series of integers, doubles, distributions of doubles, booleans, and tuples of the above. This means that the data you operate on intrinsically consist of many timestamped data points. It's easy to apply an operation to each point of the data, and it's also easy to apply operations on a rolling window, or on successive points. This gives the language the feel of an array-based language, but even better, because the elements are timestamped and the array can be sparse.
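For a concrete flavor, here's a rough sketch in the publicly documented MQL syntax. The metric is a real one from the GCP docs, but the exact spelling of the windowed aggregation is my best recollection of the reference, not a tested query:

    fetch gce_instance::compute.googleapis.com/instance/cpu/utilization
    | value [val() * 100]                 # per-point operation: rescale every sample
    | group_by sliding(10m), max(val())   # windowed operation: rolling 10-minute max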

Furthermore, the presence of fields in each data point adds more dimensions of aggregation beyond the inherent time dimension. Now the language has the feel of a native multi-dimensional array language. It feels amazing to program in. You can easily write sophisticated queries, like figuring out how many standard deviations each task's RPC latency for a specific call is above or below all tasks' mean, for outlier detection.
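As a hedged sketch of that outlier query (the metric and the "task" field are hypothetical, it pretends latency is a plain double rather than a distribution, and the multi-output group_by spelling is from memory), the shape would be something like:

    { fetch k8s_container::example.com/rpc/latency             # hypothetical metric
      | group_by [task], [lat: mean(val())]                    # each task's latency
    ; fetch k8s_container::example.com/rpc/latency
      | group_by [], [mu: mean(val()), sigma: stddev(val())]   # fleet-wide stats
    }
    | join                        # pairs each task's point with the fleet-wide point
    | value [(lat - mu) / sigma]  # standard deviations from the mean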



The query language is the brainchild of John Banning, one of the authors of the paper, and has a long history behind it. In 2007 or so he started working on a replacement for Borgmon's rule language; the thinking at the time was that the main problem with Borgmon was that its language was surprising and difficult for casual users to grasp. (And with a monitoring language, there are only casual users.)

That work eventually resulted in a language called Optic, which was indeed (IMO) a very nice cleanup of Borgmon. Ultimately though that work got shelved in favor of Monarch, whose focus was less on the language problems of Borgmon and more on the points listed in the introduction of the paper, especially points 1, 3, and 4 (at least in my memory).

The underpinnings of the query data model and execution model got hashed out reasonably well as part of the first implementation of Monarch, which started in earnest in late 2008 or early 2009. But the textual form of the query language suffered for quite a long time after that. I wrote the first crappy version of an operators-joined-by-pipes language sometime in 2010. ("Language" is a generous term; John liked to refer to it in a kindly way as "an impoverished notation.") But it was clear even then that the basics of that syntax were appealing: they lined up nicely with how our users mentally constructed their queries. "You start with the raw data; then apply a rate; then aggregate by these fields; then take the maximum over the last five minutes" etc.
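In today's public MQL that sentence translates almost word for word into a pipeline. A sketch (the metric name is just an example from the GCP docs, and the sliding-window spelling is from memory rather than tested):

    fetch gce_instance::compute.googleapis.com/instance/network/received_bytes_count
    | align rate(1m)                               # then apply a rate
    | group_by [resource.zone], aggregate(val())   # then aggregate by these fields
    | group_by sliding(5m), max(val())             # then the maximum over the last five minutes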

Through a couple of revisions over the subsequent few years, that "impoverished notation" eventually got embedded, through some awful operator overloading, as a kind of DSL inside of Python. But it was clear to everyone that it would be impossible to release that publicly to GCP users; it was much too clunky, and also by then tied inextricably to Python idiosyncrasies. So in about 2015, give or take, we came back to the question of what a better textual notation might look like.

The obvious first choice was to see if we could somehow twist SQL into being useful, possibly with some custom functions or very minor extensions. Around this time there was a large effort going on to standardize several different SQL dialects that were being used by internal systems (BigQuery/Dremel's SQL dialect was not the same as Spanner's dialect, etc.). So it felt like there was a convenient opportunity to somehow fit time series data into the same model.

John did a bunch of due diligence to try to make that idea work, but it just wouldn't fly. I remember a list he had of about fifty of the most common kinds of queries, with a SQL version written next to (an early version of) Monarch's current query language. Nearly everyone he showed it to, across the spectrum of experience and seniority, both SWE and SRE, said "of course I'd rather read and write SQL, let me look at that list"... and then went through it carefully and came out thinking, well, maybe not.

I don't know if there are any interesting conclusions to draw from the history of it, except that language design is really hard. I agree that it's a fun little language, and I'm very happy that John and the team managed to get it out publicly in Stackdriver.


Googlers still have to use the terrible Python DSL (“mash”). Even worse: they have to use it wrapped in a different terrible Python DSL (“gmon”). Sigh.


I was mostly using the new notation when I left in 2018.


Stream values can also be strings, and the language is terrible for dealing with string-valued streams when they come up (but they come up surprisingly often; you just don't normally worry about them too much).

If you ignore the issue of alignment, I actually think that a more conventional array-based language, either something SQL-like or numpy-like, would be more accessible to most people. And things like windowing are incredibly unintuitive for most users.


Haven't read the paper yet, but could you expand on the tuple use? It seems like the odd one out in that list of primitives.


They are used when joining time series.


A join creates a tuple, but that's not the only way to get one. You can also just produce a tuple with an expression in the query, such as (val(), 5, "dog"), if you like. The whole language is documented here:

https://cloud.google.com/monitoring/mql/reference
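To make the join case concrete, a hedged sketch (the metric names are real GCE metrics from those docs, but the query itself is illustrative rather than tested):

    { fetch gce_instance::compute.googleapis.com/instance/cpu/utilization
    ; fetch gce_instance::compute.googleapis.com/instance/cpu/reserved_cores
    }
    | join                      # each joined point carries a tuple of both values
    | value [val(0) * val(1)]   # unpack the tuple elements by position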



