
As someone interested in PL theory, I've long found the query language exposed by Monarch surprisingly interesting (briefly discussed in section 5.1 but the description doesn't quite do it justice). It's a functional language, a breath of fresh air compared to "real programming languages" in use at Google like C++, Java, or Go.

The most interesting idea is that its native data types are time series of integers, doubles, distributions of doubles, booleans, and tuples of the above. This means that the data you operate on intrinsically consist of many timestamped data points. It's easy to apply an operation to each point of the data, and it's also easy to apply operations on a rolling window, or on successive points. This gives the language the feel of an array-based language, but even better, because the elements are timestamped and the array can be sparse.
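For a concrete flavor, here's a rough sketch in the publicly documented MQL syntax. The metric is a real one from the GCP docs, but the exact spelling of the windowed aggregation is my best recollection of the reference, not a tested query:

    fetch gce_instance::compute.googleapis.com/instance/cpu/utilization
    | value [val() * 100]                 # per-point operation: rescale every sample
    | group_by sliding(10m), max(val())   # windowed operation: rolling 10-minute max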

Furthermore, the presence of fields in each data point adds more dimensions of aggregation beyond the inherent time dimension. Now the language has the feel of a native multi-dimensional array language. It feels amazing to program in. You can easily write sophisticated queries, like figuring out how many standard deviations each task's RPC latency for a specific call is above or below all tasks' mean, for outlier detection.
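As a hedged sketch of that outlier query (the metric and the "task" field are hypothetical, it pretends latency is a plain double rather than a distribution, and the multi-output group_by spelling is from memory), the shape would be something like:

    { fetch k8s_container::example.com/rpc/latency             # hypothetical metric
      | group_by [task], [lat: mean(val())]                    # each task's latency
    ; fetch k8s_container::example.com/rpc/latency
      | group_by [], [mu: mean(val()), sigma: stddev(val())]   # fleet-wide stats
    }
    | join                        # pairs each task's point with the fleet-wide point
    | value [(lat - mu) / sigma]  # standard deviations from the mean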



The query language is the brainchild of John Banning, one of the authors of the paper, and has a long history behind it. In 2007 or so he started working on a replacement for Borgmon's rule language; the thinking at the time was that the main problem with Borgmon was that its language was surprising and difficult for casual users to grasp. (And with a monitoring language, there are only casual users.)

That work eventually resulted in a language called Optic, which was indeed (IMO) a very nice cleanup of Borgmon. Ultimately though that work got shelved in favor of Monarch, whose focus was less on the language problems of Borgmon and more on the points listed in the introduction of the paper, especially points 1, 3, and 4 (at least in my memory).

The underpinnings of the query data model and execution model got hashed out reasonably well as part of the first implementation of Monarch, which started in earnest in late 2008 or early 2009. But the textual form of the query language suffered for quite a long time after that. I wrote the first crappy version of an operators-joined-by-pipes language sometime in 2010. ("Language" is a generous term; John liked to refer to it in a kindly way as "an impoverished notation.") But it was clear even then that the basics of that syntax were appealing: they lined up nicely with how our users mentally constructed their queries. "You start with the raw data; then apply a rate; then aggregate by these fields; then take the maximum over the last five minutes" etc.
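In today's public MQL that sentence translates almost word for word into a pipeline. A sketch (the metric name is just an example from the GCP docs, and the sliding-window spelling is from memory rather than tested):

    fetch gce_instance::compute.googleapis.com/instance/network/received_bytes_count
    | align rate(1m)                               # then apply a rate
    | group_by [resource.zone], aggregate(val())   # then aggregate by these fields
    | group_by sliding(5m), max(val())             # then the maximum over the last five minutes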

Through a couple of revisions over the subsequent few years, that "impoverished notation" eventually got embedded, through some awful operator overloading, as a kind of DSL inside of Python. But it was clear to everyone that it would be impossible to release that publicly to GCP users; it was much too clunky, and also by then tied inextricably to Python idiosyncrasies. So in about 2015, give or take, we came back to the question of what a better textual notation might look like.

The obvious first choice was to see if we could somehow twist SQL into being useful, possibly with some custom functions or very minor extensions. Around this time there was a large effort going on to standardize several different SQL dialects that were being used by internal systems (BigQuery/Dremel's SQL dialect was not the same as Spanner's dialect, etc.). So it felt like there was a convenient opportunity to somehow fit time series data into the same model.

John did a bunch of due diligence to try to make that idea work, but it just wouldn't fly. I remember a list he had of about fifty of the most common kinds of queries, with a SQL version written next to (an early version of) Monarch's current query language. Nearly everyone he showed it to, across the spectrum of experience and seniority, both SWE and SRE, said "of course I'd rather read and write SQL, let me look at that list"... and then went through it carefully and came out thinking, well, maybe not.

I don't know if there are any interesting conclusions to draw from the history of it, except that language design is really hard. I agree that it's a fun little language, and I'm very happy that John and the team managed to get it out publicly in Stackdriver.


Googlers still have to use the terrible Python DSL (“mash”). Even worse: they have to use it wrapped in a different terrible Python DSL (“gmon”). Sigh.


I was mostly using the new notation when I left in 2018.


Stream values can also be strings, and the language is terrible for dealing with string-valued streams when they come up (but they come up surprisingly often; you just don't normally worry about them too much).

If you ignore the issue of alignment, I actually think that a more conventional array-based language, either something SQL-like or numpy-like, would be more accessible to most people. And things like windowing are incredibly unintuitive for most users.


Haven't read the paper yet, but could you expand on the tuple use? It seems like the odd one out in that list of primitives.


They are used when joining time series.


A join creates a tuple, but that's not the only way to get one. You can also just produce a tuple with an expression in the query, such as (val(), 5, "dog"), if you like. The whole language is documented here:

https://cloud.google.com/monitoring/mql/reference
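To make the join case concrete, a hedged sketch (the metric names are real GCE metrics from those docs, but the query itself is illustrative rather than tested):

    { fetch gce_instance::compute.googleapis.com/instance/cpu/utilization
    ; fetch gce_instance::compute.googleapis.com/instance/cpu/reserved_cores
    }
    | join                      # each joined point carries a tuple of both values
    | value [val(0) * val(1)]   # unpack the tuple elements by position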



