Interesting approach, although I think SQL Server could do this kind of stuff natively (i.e. partition based on something easily indexable). Since Microsoft owns GitHub, they could give it a try and potentially not go broke on licensing fees like the rest of us would.

https://docs.microsoft.com/en-us/sql/relational-databases/pa...
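For anyone unfamiliar with how SQL Server decides which partition a row lands in, here is a minimal Python model of a partition function's RANGE LEFT / RANGE RIGHT semantics. It is a sketch, not SQL Server code: `partition_number` is a hypothetical name, and the only assumption is the documented rule that n sorted boundary values define n + 1 partitions, numbered from 1, with the boundary value itself going right under RANGE RIGHT and left under RANGE LEFT.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def partition_number(key: int, boundaries: Sequence[int], range_right: bool = True) -> int:
    """Return the 1-based partition a key falls into (hypothetical helper).

    n sorted boundary values define n + 1 partitions. Under RANGE RIGHT a key
    equal to a boundary belongs to the partition on its right; under
    RANGE LEFT it belongs to the partition on its left.
    """
    if list(boundaries) != sorted(boundaries):
        raise ValueError("boundary values must be sorted")
    if range_right:
        # bisect_right counts boundaries <= key, so an exact match moves right.
        return bisect_right(boundaries, key) + 1
    # bisect_left counts boundaries < key, so an exact match stays left.
    return bisect_left(boundaries, key) + 1
```

With yearly boundaries `[2019, 2020, 2021]`, a 2020 row goes to partition 3 under RANGE RIGHT and partition 2 under RANGE LEFT. Note that this only splits one table by a key; it does nothing about queries that join across tables.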




>Since Microsoft owns GitHub, they could give it a try and potentially not go broke on licensing fees like the rest of us would.

They are currently contributing to both Vitess and Rails. I'm guessing they are also helping PlanetScale, or are a paying customer of it.

This means all the improvements and edge cases are battle-tested at GitHub and upstreamed. I'd much rather they continue on their current path, so others can enjoy the GitHub stack.

I know HN hates Oracle and MySQL. But generally speaking, I think Oracle has been doing a great job with Java and MySQL development.


I can't speak for HN, but I don't hate Oracle. The engineering is top notch. It's the "screw the customer" attitude of the sales department, which no doubt emanates from the guy at the top, that annoys me no end.


> The engineering is top notch.

Do you still think so after reading this: https://news.ycombinator.com/item?id=18442941


I read it, and sure, it is a challenge to work on an old codebase: you need to jump through a few hoops that you don't need to in pristine projects. Something tells me this guy wasn't able or willing to make the effort. (Could be a language problem, naming a "bug" a "bag" for example, u and a are not even close on the keyboard).


> (Could be a language problem, naming a "bug" a "bag" for example, u and a are not even close on the keyboard).

I think this is reading far too much into nothing. He writes 'bug' correctly about 5 times. I'm not an expert in psycholinguistics, but I suspect there's a phenomenon where you can mangle your internal pronunciation of the word and hit the wrong vowel.


> I think this is reading far too much into nothing.

Could be. I could have left that somewhat tenuous side note out; my argument doesn't rest on it.


Though I agree that old codebases are a particular challenge, picking on word confusion is unnecessary, even if minutiae matter in a legacy codebase. Dyslexia is a real thing. So are linters, thankfully.


> The engineering is top notch

Is it? Can you give me an example of the top notch engineering from "modern" Oracle?

I mean this as a genuine question, I am not that familiar with recent work coming out of Oracle.


Can you give more parameters for what something needs to qualify for your definition of "modern"?

The database harkens back to when computers were new. A ton of money goes into its continued development, and it is vertically integrated, including custom hardware and the software to run it. It's extremely expensive hardware: they still sell SPARC servers running Solaris, if that's what you want (but they also support a custom Linux kernel for their hardware).

It's such a high end niche that there's only a handful of companies that can even run the benchmark competitively, because it costs so much in hardware to play at that level. That makes it very opaque unless you're fluent in a lot of terms, some proprietary, others not. E.g. https://blogs.oracle.com/exadata/post/exadata-uses-persisten... is an absolutely fascinating journey into getting better SQL performance that involves some really high-end shit, and (like Kubernetes) most people out there just don't play in the same league. That isn't a judgement against them and their needs, but it costs a lot of resources to wring a few more microseconds of performance out of a multi-million-dollar machine. An AWS EC2 cloud VM, this ain't.


> It's such a high end niche that there's only a handful of companies that can even run the benchmark competitively

Alternatively, you don’t see benchmarks because Oracle’s licensing bans posting benchmarks - https://www.brentozar.com/archive/2018/05/the-dewitt-clause-...


> Can you give more parameters for what something needs to qualify for your definition of "modern"?

The ability to tolerate machine failure with zero downtime and zero data loss.


JVM G1 GC is pretty amazing.

Sure, Shenandoah (started at Red Hat) is even more brutally mind-blowing, but it has a constant overhead (and, maybe obviously, it builds on the already existing, pretty good GC infrastructure in the JVM).

So at least it seems the HotSpot group is mostly left alone to do their high-quality work.


GraalVM is extremely impressive.


Patching my third trivial bug in the Oracle InstantClient libraries, because Oracle couldn't or wouldn't, was the last straw for me. They have some impressive capabilities, but I don't think that's because their engineering is great; I think it's because enough money can solve any problem.


What GitHub did was not table partitioning but schema separation. Splitting tables across separate disk partitions works well when the storage is local to the engine, and historically it got around the size/IO limits of hard disks, but it isn't necessarily suited to modern scaling problems.

Instead, GitHub moved whole subsystems out of the main DB cluster into separate clusters, so that requests hit completely separate engines. That reduces the number of connections to each engine, and the physical separation reduces the chance of breaking something with a mistaken cross-schema query.

I think you could still do this with SQL Server but why bother if you already use MySQL and the tools exist there.
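To make the schema separation described above concrete, here is a rough Python sketch of application-side routing: each table belongs to a schema domain, each domain lives on one cluster, and a query can only be served if every table it touches sits on the same cluster. All table, domain, and cluster names are hypothetical examples, not GitHub's actual layout.

```python
# Hypothetical example data: which schema domain each table belongs to...
TABLE_DOMAIN = {
    "users": "identity",
    "user_emails": "identity",
    "repositories": "repos",
    "issues": "repos",
    "notifications": "notifications",
}

# ...and which physical cluster currently hosts each domain.
DOMAIN_CLUSTER = {
    "identity": "mysql-main",
    "repos": "mysql-main",
    "notifications": "mysql-notifications",  # a domain already moved out
}


def cluster_for(tables: list[str]) -> str:
    """Return the single cluster that can serve a query touching `tables`.

    Raises ValueError if the tables live on different clusters, because a
    join across physically separate engines can't run as one query.
    """
    clusters = {DOMAIN_CLUSTER[TABLE_DOMAIN[t]] for t in tables}
    if len(clusters) != 1:
        raise ValueError(f"query spans clusters: {sorted(clusters)}")
    return clusters.pop()
```

Moving a domain is then, in this simplified model, a change to `DOMAIN_CLUSTER` once no query joins it with another domain's tables.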


As far as I understand, the partitioning is rooted at the application level, by preventing cross-domain joins:

> Building on top of schema domains, two new SQL linters enforce virtual boundaries between domains. They identify any violating queries and transactions that span schema domains by adding a query annotation and treating them as exemptions. If a domain has no violations, it is virtually partitioned and ready to be physically moved to another database cluster

Table-level partitioning doesn't help with this (AFAIU), as queries access multiple tables anyway (without app-level changes).

The standard DB-level feature closest to what they're doing, if somebody wants to try this at home, is probably tablespaces (or separate DBs, as the next step).

I see some inspiration from microservices (separation of models/storage), except that they're (I suppose) keeping the monolith approach.
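The linter idea in the quote above can be sketched roughly as follows: map each table to a schema domain, flag any query whose tables span more than one domain, and let an annotation mark known violations as exemptions. This is only an illustration under assumptions: the table/domain map is hypothetical, the `cross-domain-exempt` marker is made up (the quote doesn't give GitHub's actual annotation), and the regex table extraction is far cruder than a real SQL parser. Treating exempted queries as still blocking a physical move is also an interpretation, on the grounds that they would break once the domain lives on another cluster.

```python
import re

# Hypothetical table -> schema domain map.
TABLE_DOMAIN = {
    "users": "identity",
    "user_emails": "identity",
    "repositories": "repos",
    "issues": "repos",
    "notifications": "notifications",
}

# Hypothetical annotation marking a known cross-domain query as an exemption.
EXEMPTION = "cross-domain-exempt"

# Very rough: treat the identifier after FROM or JOIN as a table name.
TABLE_REF = re.compile(r"\b(?:from|join)\s+`?(\w+)`?", re.IGNORECASE)


def domains_in(sql: str) -> set[str]:
    """Return the schema domains of the known tables a query references."""
    tables = set(TABLE_REF.findall(sql))
    return {TABLE_DOMAIN[t] for t in tables if t in TABLE_DOMAIN}


def check_query(sql: str) -> str:
    """Classify a query as 'ok', 'exempt', or 'violation'."""
    if len(domains_in(sql)) <= 1:
        return "ok"
    return "exempt" if EXEMPTION in sql else "violation"


def ready_to_move(domain: str, queries: list[str]) -> bool:
    """A domain is virtually partitioned once no query crosses its boundary.

    Exempted queries still count here: they are tracked, not fixed, and
    would fail once the domain lives on a different cluster.
    """
    for sql in queries:
        domains = domains_in(sql)
        if domain in domains and len(domains) > 1:
            return False
    return True
```

For example, a join between `users` and `user_emails` stays inside the `identity` domain and passes, while a join between `users` and `issues` is reported as a violation unless annotated.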


You can kind of do the same thing with SQL Server table partitions, but it would largely rely on the original data model being very clean. Depending on how old that codebase is, and how experienced the developers were, that might not have been an option, I suppose.

I am glad it isn't me trying to split it up :-)


Also, Microsoft owns Citus, which provides scalable sharding for PostgreSQL.
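For contrast with GitHub's per-domain split, Citus-style sharding splits a single table by hashing a distribution column into a hash space divided into contiguous ranges, one per shard, so rows sharing a value (say, a tenant id) land on the same shard. The sketch below models only that idea: `zlib.crc32` is a stand-in (Citus uses PostgreSQL's own hash functions), and `shard_for` and the shard count of 4 are illustrative assumptions, not Citus settings.

```python
import zlib

SHARD_COUNT = 4  # illustrative only, not a Citus default


def shard_for(distribution_value: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a distribution-column value to a shard index in [0, shard_count).

    Hashes into a 32-bit space and splits that space into equal contiguous
    ranges, one per shard. crc32 is a stand-in for the real hash function.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    h = zlib.crc32(distribution_value.encode("utf-8"))  # 0 .. 2**32 - 1
    range_size = 2**32 // shard_count
    # Clamp so any remainder at the top of the space goes to the last shard.
    return min(h // range_size, shard_count - 1)
```

Because the mapping is deterministic, every row for the same tenant goes to the same shard, which is what lets joins filtered on that value stay on one node.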


Yep, they should throw out the window all these years of MySQL experience and expertise and start from scratch :s
