> return a custom data type that represents the composed set of functions, and we can (for at least part of the pipeline) handle simplification and combination of that into a single set of functions to run over the `timevector`
I'm really having trouble wrapping my head around this idea. I guess it speaks volumes about the potential and power of PostgreSQL in general.
Has PostgreSQL ever been used like this in the past, or is this really thinking outside of the box?
We don't use Timescale and have no intention to in the immediate future (Snowflake provides 99% of what we need atm), but I'm always interested and impressed to hear what they're doing.
Yeah, it's a bit mind-bending I guess. I'll try to explain, but it might just need a longer format; we'll probably do a piece on this in the future.
Essentially, each function returns a special "PipelineElement" datatype that identifies the function, but not what it's acting on. When you string them together, you get a "PipelineElement" that represents the composition of all the functions, with the correct ordering and so on. That can then be applied to the incoming `timevector`.
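A minimal sketch of what that looks like in SQL, using the stage names from the post (whether bare stage constructors can be selected standalone like this, and the table name, are assumptions, not the exact toolkit API):

```sql
-- Hypothetical sketch: each stage constructor returns a PipelineElement
-- describing an operation; -> composes two of them into a bigger one.
-- Nothing touches the data until the result is applied to a timevector.
SELECT sort() -> delta() -> abs() -> sum() AS composed;  -- just a value

SELECT timevector(ts, val) -> (sort() -> delta() -> abs() -> sum())
FROM measurements;  -- applied here, so now it actually runs
```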
As far as I know, no one's done this before. I don't think it's a completely new idea; Stonebraker thought about how you can treat code as data, and thought that bringing data and code closer together was an important goal of the Postgres project. But this specific way of doing it is pretty new and out there, I think!
I think most people would assume that to do this you'd need to modify the parser or do something similarly invasive.
(We do use one planner hook on the function call, the expression-simplification one, which allows us to convert something like `timevector(ts, val) -> sort() -> delta() -> abs() -> sum()` into `timevector(ts, val) -> (sort() -> delta() -> abs() -> sum())` so that all the operations can be done in a single pass. But that's pretty minor. If we need to figure something out around common subexpression elimination, we may need to hook into the planner a little more; we'll see.)
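If you're curious to watch that rewrite, PostgreSQL's expression simplification happens at plan time, so a plain EXPLAIN should expose it (table and column names here are made up):

```sql
-- EXPLAIN VERBOSE prints expressions as the planner simplified them, so
-- the left-associated chain written in the query should come back
-- regrouped into a single composed pipeline applied to the aggregate.
EXPLAIN VERBOSE
SELECT timevector(ts, val) -> sort() -> delta() -> abs() -> sum()
FROM measurements;
```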
Oh yeah, and building on this: it's also the source of the ability to actually store a pipeline element in the database, say in a column, and apply it later, as I mentioned in a comment below. That's one of the cool, mind-bending things this allows.
The example I gave below built on the example in the post, where we had thermometers. Except now imagine you have thermocouples, each with its own calibration curve, and they're different types, so they might even have different equations providing the calibration curve.
You can now store the pipeline that does the correct adjustments and calculates the actual temperature separately for each `device_id` and JOIN to the "calibrations" table to apply the calculation.
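Here's a rough sketch of the shape that takes; the table layout, column names, and especially the stored-pipeline column type are assumptions, not the toolkit's actual API:

```sql
-- Hypothetical schema: one stored pipeline per device. The column type
-- name is a placeholder for whatever the pipeline-element type is called.
CREATE TABLE calibrations (
    device_id   bigint PRIMARY KEY,
    calibration pipelineelement   -- e.g. sort() -> delta() -> ...
);

-- Apply each device's own calibration by joining it in and feeding the
-- aggregated timevector through that device's stored pipeline.
SELECT c.device_id,
       timevector(m.ts, m.raw_val) -> c.calibration AS temperature
FROM measurements m
JOIN calibrations c USING (device_id)
GROUP BY c.device_id;
```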
It's kinda crazy, but really cool and I'm hoping to do a post on that whole thing at some point to encourage people to play around with it and see what sorts of things it allows...
From your description it sounds like you've implemented in PostgreSQL something akin to Haskell's IO, which is also a data structure representing operations, with ordering enforced by its being a monad. That sounds pretty cool.