> return a custom data type that represents the composed set of functions, and we can (for at least part of the pipeline) handle simplification and combination of that into a single set of functions to run over the `timevector`
I'm really having trouble wrapping my head around this idea. I guess it speaks volumes about the potential and power of PostgreSQL in general.
Has PostgreSQL ever been used like this in the past, or is this really thinking outside of the box?
We don't use Timescale and have no intention to in the immediate future (Snowflake provides 99% of what we need atm), but I'm always interested and impressed to hear what they're doing.
Yeah, it's a bit mind-bending I guess. I'll try to explain, but it might just need a longer format; we'll probably do a piece on this in the future.
Essentially, each function returns a special "PipelineElement" datatype that identifies the function, but not what it's acting on. When you string them together, you get a "PipelineElement" that represents the composition of all the functions, with the correct ordering and so on. That can then be applied to the incoming `timevector`.
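A minimal sketch of what that looks like in SQL, using the stage names from the post (whether bare stage constructors can be selected standalone like this, and the table name, are assumptions, not the exact toolkit API):

```sql
-- Hypothetical sketch: each stage constructor returns a PipelineElement
-- describing an operation; -> composes two of them into a bigger one.
-- Nothing touches the data until the result is applied to a timevector.
SELECT sort() -> delta() -> abs() -> sum() AS composed;  -- just a value

SELECT timevector(ts, val) -> (sort() -> delta() -> abs() -> sum())
FROM measurements;  -- applied here, so now it actually runs
```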
As far as I know, no one's done this before. I don't think it's a completely new idea; Stonebraker thought about how you can treat code as data, and thought that bringing data and code closer together was an important goal of the Postgres project. But this specific way of doing it is pretty new and out there, I think!
I think most people would assume that to do this you'd need to modify the parser or do something similarly invasive.
(We do use one planner hook on the function call, the expression-simplification one, which allows us to convert something like `timevector(ts, val) -> sort() -> delta() -> abs() -> sum()` into `timevector(ts, val) -> (sort() -> delta() -> abs() -> sum())` so that all the operations can be done in a single pass. But that's pretty minor. If we need to figure something out around common subexpression elimination, we may need to hook into the planner a little more; we'll see.)
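If you're curious to watch that rewrite, PostgreSQL's expression simplification happens at plan time, so a plain EXPLAIN should expose it (table and column names here are made up):

```sql
-- EXPLAIN VERBOSE prints expressions as the planner simplified them, so
-- the left-associated chain written in the query should come back
-- regrouped into a single composed pipeline applied to the aggregate.
EXPLAIN VERBOSE
SELECT timevector(ts, val) -> sort() -> delta() -> abs() -> sum()
FROM measurements;
```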
Oh yeah, and building on this: it's also the source of the ability to actually store a pipeline element in the database, say in a column, and apply it later, as I mentioned in a comment below. That's one of the cool, mind-bending things this allows.
The example I gave below built on the example in the post, where we had thermometers. Except now imagine you have thermocouples, each with its own calibration curve, and they're different types, so they might even have different equations providing the calibration curve.
You can now store the pipeline that does the correct adjustments and calculates the actual temperature separately for each `device_id` and JOIN to the "calibrations" table to apply the calculation.
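Here's a rough sketch of the shape that takes; the table layout, column names, and especially the stored-pipeline column type are assumptions, not the toolkit's actual API:

```sql
-- Hypothetical schema: one stored pipeline per device. The column type
-- name is a placeholder for whatever the pipeline-element type is called.
CREATE TABLE calibrations (
    device_id   bigint PRIMARY KEY,
    calibration pipelineelement   -- e.g. sort() -> delta() -> ...
);

-- Apply each device's own calibration by joining it in and feeding the
-- aggregated timevector through that device's stored pipeline.
SELECT c.device_id,
       timevector(m.ts, m.raw_val) -> c.calibration AS temperature
FROM measurements m
JOIN calibrations c USING (device_id)
GROUP BY c.device_id;
```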
It's kinda crazy, but really cool and I'm hoping to do a post on that whole thing at some point to encourage people to play around with it and see what sorts of things it allows...
From your description it sounds like you've implemented in PostgreSQL something akin to Haskell's IO, which is also a data structure representing operations, with ordering enforced by its being a monad. That sounds pretty cool.