I spent the past year puzzling over the DB market as well, but I don't feel like I'm much closer to understanding it.
It appears that a lot of attention is now directed at the folks doing 100 MB queries, and the high end has moved past everybody's radar. My idea of an exciting product is Ocient, who have skipped over Cloud and gone for hyperscale on-prem hardware. Yellowbrick is also a contender here.
I have a lot of experience with Vertica, and they seem to have gotten stuck in this niche as well, with sales tilted towards big accounts, but less traction in smaller shops, and a difficult road to get a SaaS or similar easy-start offering.
There's a crossover point where self-managed is cheaper than cloud, but nobody seems to have any idea where it is. Snowflake will gladly tell you that your sub-$1M Vertica cluster should be replaced by $10M of sluggish SaaS, and that you are saving money by doing so. These decisions seem more in the realm of psychology or political science.
DHH's cloud exit was a refreshing take on the expense issue, even if it wasn't strictly in the database space -- the cost per vCPU and so forth that he documented is a good start for estimating savings, and he debunked a lot of the "hidden costs" that cloud maximalists claim.
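To make the crossover idea concrete, here's a back-of-envelope sketch. It assumes both options can be modeled as a fixed yearly cost plus a per-vCPU cost, which is a big simplification. The function name and every number below are made up for illustration; none of them come from DHH's write-up or any vendor's pricing.

```python
def crossover_units(fixed_self, unit_self, fixed_cloud, unit_cloud):
    """Usage level (e.g. vCPUs) above which self-managed is cheaper.

    Linear model (an assumption): total = fixed + unit * usage.
    Returns None when cloud is not more expensive per unit, since this
    simple model then has no crossover in favor of self-managed.
    """
    if unit_cloud <= unit_self:
        return None
    # Solve fixed_self + unit_self * u == fixed_cloud + unit_cloud * u for u.
    u = (fixed_self - fixed_cloud) / (unit_cloud - unit_self)
    # A negative solution means self-managed is cheaper at any usage level.
    return max(0.0, u)


# Hypothetical yearly figures, purely illustrative:
# self-managed: 120k fixed (hardware amortization, ops time), 20 per vCPU
# cloud:        no fixed cost, 500 per vCPU
print(crossover_units(120_000, 20, 0, 500))  # 250.0
```

With no extra fixed cost on the self-managed side, the function returns 0.0, i.e. self-managed wins from the first vCPU. The hard part in practice is agreeing on the inputs, not the arithmetic.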
In the business/financial space the biggest news to me was the correction in Snowflake's stock price, which seemed to indicate that investors were finally noticing metrics like price-performance, but then Snowflake added a little more AI and investors went back into irrationality.
I'm heavily in favor of DuckDB, Hudi, Iceberg, S3 tables, and the like. Mixing high-end and low-end tools seems like the best strategy (although settling on one high-end DWH has also worked IME), and the low end is getting better and cheaper, squeezing out the mid-range SaaS vendors.
In research I found Goetz Graefe's work on offset-value coding exciting -- he's wired it into query operators in a way that saves a lot of CPU on sorting and joins/aggregation. This is a technique that I've applied favorably in string sorting, and it was discovered in the DB community decades ago but largely forgotten. (This work predates 2024, but I'm a slow study.)
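For anyone who hasn't seen offset-value coding, here's a minimal sketch of the basic idea as I understand it, not Graefe's operator integration. Each row gets a code relative to a base row that sorts at or before it: the offset of the first differing column, plus the value at that column. Two rows coded against the same base can often be ordered from the codes alone, without touching the columns. The names `ovc` and `compare_coded` are my own, and the sketch assumes ascending order and fixed-width rows.

```python
def ovc(base, row):
    """Offset-value code of `row` relative to `base` (requires base <= row)."""
    for i, (b, r) in enumerate(zip(base, row)):
        if b != r:
            return (i, r)
    return (len(row), None)  # row is a duplicate of base


def compare_coded(a, code_a, b, code_b):
    """Order a and b, both coded against the same base.

    Returns (sign, column_comparisons): sign is -1, 0 or 1, and the
    count shows how many real column comparisons were still needed.
    """
    off_a, val_a = code_a
    off_b, val_b = code_b
    if off_a != off_b:
        # The row sharing the longer prefix with the base sorts first.
        return (-1 if off_a > off_b else 1), 0
    if off_a == len(a):
        return 0, 0  # both rows equal the base
    if val_a != val_b:
        return (-1 if val_a < val_b else 1), 0
    # Codes are equal: only columns after the offset remain undecided.
    n = 0
    for i in range(off_a + 1, len(a)):
        n += 1
        if a[i] != b[i]:
            return (-1 if a[i] < b[i] else 1), n
    return 0, n


base = ("us", "ny", 3)
a = ("us", "ny", 7)
b = ("us", "tx", 1)
print(compare_coded(a, ovc(base, a), b, ovc(base, b)))  # (-1, 0)
```

In the example, `a` is ordered before `b` with zero column comparisons, because `a` shares a longer prefix with the base. The savings come from merge steps and operators that carry these codes along instead of recomputing comparisons from the first column each time.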
> There's a crossover point where self-managed is cheaper than cloud
Single data point here: before cloud-managed DBs were a thing, our smallish startup was running MySQL on virtual servers, installing it from the Linux package manager. It has always worked great and runs for years at a time without manual attention once set up, so I've never felt the need to change.
So at least in some cases the crossover point is "right from the start".
I was seriously considering applying to Ocient (had an internal referral), but there's no way I could live on their salary ranges ($145K-185K quoted for senior SWE roles), given that I live in an HCOL area.
I don't know much about the financial side of the company, but it seems like a client-led effort by telcos etc. against the dreck that tech VCs keep pushing on them. That can't translate into decent salaries, unfortunately.
Silicon Valley doesn't have a good record in the DB/DWH space; producing a fully-featured DBMS doesn't seem to fit the VC model.