
I’ve been writing some pretty ambitious Hive queries at work lately.

As I learn more about SQL and pull off more complex queries, my respect for it deepens. To have such power and support so many use cases with so few constructs is really an engineering feat. It’s timeless for a reason.

Some of my relatively common access patterns are awkward to express, but they can still be expressed in a few lines + a CTE or two, which is really impressive for a language so small.

This is not to say we can’t do better. But SQL has achieved a deep resonance with its problem space that most tools don’t even come close to. The brightest minds and most effective tooling shops in our field would be lucky merely to do as well.



This shows that humans can learn languages and get motivated by mastering them. It does not tell us anything about the consistency, composability or orthogonality of SQL, the qualities that affect a newcomer's effort to learn it.


I think you mean the relational model is timeless. SQL is simply just not a great language, for an otherwise great idea.


Blindly asserting that SQL is bad without providing any argument or proof does not contradict OP's opinion on SQL.


Almost anything written by Date and/or Darwen during the last 30 years is chock-full of arguments and proofs that SQL is a horrendously poor language. Pls don't come complaining that all of those arguments are not replicated here.

I'll give you just one: in SQL, you can write "WHERE 4 > (SELECT COUNT(*) FROM ... WHERE ... )" but you cannot write "WHERE (SELECT COUNT(*) FROM ... WHERE ... ) < 4" [Darwen, "The Askew Wall"].

Or iow, for certain specific kinds of 'a' and 'b', you can write "a < b" but not "b > a". Do you actually know ANY language that exposes that kind of thing ??? REALLY ???


The article itself gives a pretty decent description of how the language falls down.. and the ggp’s expression of his impression pretty trivially just maps to the relational model/algebra.... so all necessary arguments have been made :-)


really? the article has a handful of strange edge cases that you'll almost never smack into in practice, and that's cause to say the language is crap?

Given the vast usage of SQL and its overwhelming popularity, if there were real problems with the language then there would be a lot more noise about them. But most people seem to be happy with it.

If it's not broken, let's not fix it, eh?


The only edge cases I could find in the article were the discussion of nulls... which aren’t so much an edge case as they are just an ever-present problem that no one really notices until it hurts (it’s not at all difficult to write a broken-on-null query). Is there something else you’re referring to? Afaict, everything else was just trivially observable aspects of the language.
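To make "broken-on-null" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customers/orders tables and their contents are made up for illustration; they are not from the article. The intent is "customers with no orders", and a single NULL in the subquery result silently empties the NOT IN version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    -- One order has no customer recorded (hypothetical data).
    INSERT INTO orders VALUES (10, 1), (11, NULL);
""")

# Intent: customers who have no orders (bob and cy).
# "2 NOT IN (1, NULL)" evaluates to unknown, so no row passes the filter.
broken = conn.execute("""
    SELECT name FROM customers
    WHERE id NOT IN (SELECT customer_id FROM orders)
    ORDER BY name
""").fetchall()

# NOT EXISTS only asks whether a matching row exists, so NULLs do no harm.
fixed = conn.execute("""
    SELECT name FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY name
""").fetchall()

print(broken)  # []
print(fixed)   # [('bob',), ('cy',)]
```

Nothing here raises an error or a warning; the first query just returns no rows, which is why this kind of bug tends to go unnoticed.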

>Given the vast usage of SQL and its overwhelming popularity, if there were real problems with the language then there would be a lot more noise about them. But most people seem to be happy with it.

In what universe does popularity imply quality? Certainly not the one I’m in; Avengers: Endgame is apparently the #2 worldwide gross ever :-)


I think the point he is making is: why hasn't anything better come up to represent the relational model in over 40 years?


There are other models as well eg datalog, but popularity is a factor of many things — a major factor with databases is that historically the engine is far more important than the language; you would expect users to choose an engine, and whatever language came with it. Which implies that the decision is on the producer side, not the consumers.

Which finally implies that usage popularity is not adequately explained by user preference, because you wouldn’t expect user language preference to have significant impact on the decision making process.

But regardless, are you also going to claim that C, Oracle, IBM, Microsoft, etc. were optimal in their respective fields, during their respective eras, because they held total market domination? Incumbency is one hell of a drug.

Bandwagoning and deferring to the status quo is not an argument for quality; there are too many factors involved beyond quality itself. Fixable flaws in the language have been pointed out and have not been addressed in this conversation chain. Alternative languages exist, both academically and practically; they don’t come alone. It’s not a competition in a void; the ecosystem has to move together. This occurs in every aspect of tech (did you ever wonder if there could be an OS other than the Unix and Windows styles?).

Asking why SQL was never beaten out in popularity is a hugely different question from whether there can be a better language than SQL, with largely unrelated answers (Oracle/IBM were hardly fair players in anything they did).


> There are other models as well eg datalog, but popularity is a factor of many things

The lack of popularity is also a reflection of how the string of next best things have failed to actually deliver on their promise, accompanied by the lack of a rational argument to adopt them instead of using a time-tested technology.


What time-tested technology are you referring to? I'm only talking about the SQL language -- not the relational model, not the RDBMS engine, not the drivers.

The SQL language is just an API to the total engine; the majority of its value derives from the relational model; the value of the relational algebra is not being questioned. Only the particular interface to describe the relational algebra.

The power of the JVM is not the power of the Java language. You can have other languages make use of the same power Java has access to by targeting the JVM, and as a programming community we accept that just fine (and we also accept that despite Java having many known flaws as a language, its status as a first-party Oracle interface to the JVM, and its position within the status quo, makes it extremely difficult to upend; but that hardly implies Java is some perfect language, as most HN users will trivially acknowledge).

In the same fashion, the SQL language is (ideally) decoupled from the RDBMS engine; it can be replaced. But in practice, they're not so decoupled, and SQL has consistently been the only first-class interface to the engine, so like Java, it enjoys (or can enjoy) far greater ubiquity and stability from the quality of the underlying tech than the language itself may deserve.

So once again: popularity/stability of the SQL language is not (necessarily) exemplary of the quality of the SQL language. It's much more likely that it represents the value of the RDBMS, and the relational model (which I don't think anyone is arguing against), and the SQL language enjoys a free ride by being the only interface that's even offered.

I mean hell, just read the article. It has trivially observable flaws in its design and semantics (like stuffing a 3VL logic into a 2VL language). EdgeQL may or may not be the optimal solution, and I'm not arguing (or interested in arguing) one way or the other on that, but I don't see how anyone can reasonably argue that SQL's ubiquity shows its perfection, when it's so clearly imperfect (because those flaws are being very directly pointed out).
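For anyone who hasn't run into the 3VL point, a small sketch with Python's sqlite3 module shows it (the `truth` helper is just a name chosen here; Python's None stands in for SQL's NULL/unknown). Comparisons involving NULL yield a third truth value, yet WHERE only has two outcomes, keep or drop, so a predicate and its negation can both drop the same row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def truth(expr):
    """Evaluate a SQL boolean expression; None means SQL NULL (unknown)."""
    return conn.execute(f"SELECT {expr}").fetchone()[0]

null_eq_null = truth("NULL = NULL")        # unknown, not true
not_unknown = truth("NOT (NULL = NULL)")   # negating unknown is still unknown
unknown_or_true = truth("NULL OR 1")       # true wins over unknown
unknown_and_false = truth("NULL AND 0")    # false wins over unknown

# WHERE keeps a row only when the predicate is true, so "unknown" is
# folded into "drop the row" for both p and NOT p.
kept_by_p = conn.execute(
    "SELECT COUNT(*) FROM (SELECT 1) WHERE NULL = NULL").fetchone()[0]
kept_by_not_p = conn.execute(
    "SELECT COUNT(*) FROM (SELECT 1) WHERE NOT (NULL = NULL)").fetchone()[0]

print(null_eq_null, not_unknown, unknown_or_true, unknown_and_false)
print(kept_by_p, kept_by_not_p)
```

This is SQLite's behaviour; as far as I know other engines agree on these particular cases, but treat that as an assumption rather than something checked here.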


You still haven't explained why SQL in particular has ruled over its problem domain for over 40 years. For example, in systems programming over that same timespan we've had assembly, c, c++ and rust.

Why is SQL the exception to the rule? You talk about popularity, but it's really a question about stability. Remember, SQL is not that popular, as the NOSQL movement and comments like yours prove.


> For example, in systems programming over that same timespan we've had assembly, c, c++ and rust.

I don't think you've picked a very good example here of a field where the lingua franca has progressed over time. C still dominates nearly all systems programming, against all odds. It dominates operating system development, driver development, embedded systems development, pretty much anything that demands an extremely high degree of efficiency.

I agree with your overall point regarding SQL and its superior stability, but C is an example of where popularity won out against many better options over time. It's not like Rust was the first mainstream attempt to create safer languages for systems development either. There have been many over the years designed to fill the same niche, with more modern features, tools and goals. For better or worse, C still dominates the landscape for reasons entirely independent of whether or not it was truly the best tool for every task.

Also, I'd contend that your claim regarding the popularity of NoSQL is incorrect. If you only talk to web developers writing JavaScript, you'd get the impression that NoSQL is taking over the world. But the reality is that amongst pretty much every other demographic, SQL is still highly regarded as the ideal technology.


>You still haven't explained why SQL in particular has ruled over its problem domain for over 40 years

Well yes, because I argued that it was irrelevant to the original question: can SQL be improved?

Additionally I actually did address the question of SQL’s stability, at least partially: it’s not SQL that’s so stable, but the relational model. SQL just happened to be IBM’s, and IBM was highly successful pushing its DB around, and Oracle (kind-of) cloned it to push their DB around with less contention, and so it went on. But it’s the RDBMS engine that primarily pushes a DB’s value; the SQL language is a ride-along.

And once more to be clear: its longevity is not a result of SQL’s quality, but the quality of the relational model. Thus its popularity and stability (it’s also not that stable, in that it’s heavily extended by everyone in arbitrary fashion) are irrelevant to the original question.

>SQL is not that popular, as the NOSQL movement

If people are trying to make use of NoSQL because they want to avoid the SQL language (not the relational model), they’ve made a grave mistake in understanding their technologies; I don’t think such a naive opinion should be considered relevant to the equation.

If they’ve chosen NoSQL to avoid the relational model, then their choice says nothing about the SQL language.


I think the historical truth is that Oracle was first to market, and IBM just adopted SQL so it would not risk being waaaaaaaaay too late to the market "party". [Darwen, "Why are there no relational DBMS's"]


> If people are trying to make use of NoSQL because they want to avoid the SQL language (not the relational model), they’ve made a grave mistake in understanding their technologies; I don’t think such a naive opinion should be considered relevant to the equation.

Yes, they are trying to avoid the relational model by avoiding SQL because SQL has basically been the face of the relational model for 40 years. If there were something better, they'd use that. Nothing better has come up, and I don't know why either.


Better things have come up. See the projects list at http://www.thethirdmanifesto.com/ .


But no one is using them, so are they actually better? (A philosophical question that also pertains to a ton of better options... I mourn BeOS.)


The particular meaning of "better" here was "better at representing the relational model".

If you want to believe that popularity is a measure or reliable indicator of quality, I can't be bothered with you.

PS I have a worked-out example on my site of what it takes to enforce a business rule "no one has a salary higher than his manager's" in SQL (> 100 LOC) and in SIRA_PRISE (one single relatively simple formula of the RA to declare). You decide which is "better".
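To be clear about what the rule involves, here is a minimal sketch (Python's sqlite3, with a made-up `emp` table; this is not the worked example from my site). It only detects existing violations with a self-join. Enforcing the rule on every insert and update, which is the part that gets long in SQL, is not attempted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: each employee optionally points at a manager.
    CREATE TABLE emp (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        salary INTEGER NOT NULL,
        manager_id INTEGER REFERENCES emp(id)
    );
    INSERT INTO emp VALUES
        (1, 'root', 200, NULL),
        (2, 'mid',  150, 1),
        (3, 'leaf', 180, 2);
""")

# Pair each employee with their manager and keep the pairs breaking the rule.
violations = conn.execute("""
    SELECT e.name, e.salary, m.name, m.salary
    FROM emp AS e
    JOIN emp AS m ON m.id = e.manager_id
    WHERE e.salary > m.salary
""").fetchall()

print(violations)  # [('leaf', 180, 'mid', 150)]
```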


Many of the comments (and the article) criticize SQL as being non-standard and difficult to learn. These critiques have been around as long as SQL has.

However, there is a more insidious problem with SQL: it's all too easy to write SQL statements that have O(N^2) complexity. A simple JOIN can easily result in O(N^2) complexity, yet there aren't easy tools to identify these performance issues. As a result, as a database grows, things that once were executed quickly take forever.

I'd like to see the end of joins, replaced with something that is more explicit about what is happening under the hood.
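To illustrate the kind of difference I mean, here is a rough sketch in plain Python (no database involved; the function names and the step counting are mine, and real engines are far more sophisticated). The same equi-join is executed two ways: a naive nested loop does len(left) * len(right) comparisons, while a hash join does roughly len(left) + len(right) steps. Nothing in the JOIN text tells you which one you will get.

```python
from collections import defaultdict

def nested_loop_join(left, right):
    """Compare every left row with every right row."""
    steps = 0
    out = []
    for lk, lv in left:
        for rk, rv in right:
            steps += 1
            if lk == rk:
                out.append((lk, lv, rv))
    return out, steps

def hash_join(left, right):
    """Build a hash table on the right side, then probe it once per left row."""
    steps = 0
    index = defaultdict(list)
    for rk, rv in right:
        steps += 1
        index[rk].append(rv)
    out = []
    for lk, lv in left:
        steps += 1
        for rv in index.get(lk, ()):
            out.append((lk, lv, rv))
    return out, steps

n = 1000
left = [(i, f"L{i}") for i in range(n)]
right = [(i, f"R{i}") for i in range(n)]

rows_nl, steps_nl = nested_loop_join(left, right)
rows_h, steps_h = hash_join(left, right)
print(steps_nl, steps_h)  # 1000000 2000
```

Both functions return the same rows; only the amount of work differs, and it differs by a factor that grows with the table size.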


Your concerns are warranted.

However. You are wrong tying the problem to joins. I remember an analyst who launched a SELECT COUNT because he was just curious about the number of rows in the table. No joins involved but users did suffer. Elsewhere in this thread I've seen the problem be tied to table scans, and that's also wrong. A table scan isn't a problem if it's a 5-page table. As Darwen often argued : why are people always only lamenting about those couple of tables with millions of rows ? Why should we deprive users of the power of relational algebra if their databases simply aren't that big ?

It's a matter of determining the cost of the data access strategy (regardless of JOIN/EXCEPT/what_have_you) and (implementing a protocol for) capping it at runtime (or earlier if possible). No need for language changes here.
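As one possible shape for such a runtime cap, here is a sketch using Python's sqlite3 module and its `Connection.set_progress_handler` hook. `run_capped`, `QueryTooExpensive` and the budget numbers are names and values invented for this example. The "cost" is simply the number of SQLite virtual machine instructions executed, which is a crude stand-in for a real cost model.

```python
import sqlite3

class QueryTooExpensive(Exception):
    """Raised when a query exceeds its (hypothetical) instruction budget."""

def run_capped(conn, sql, max_ticks, tick_size=1000):
    """Run sql, aborting after roughly max_ticks * tick_size VM instructions."""
    ticks = 0

    def on_tick():
        nonlocal ticks
        ticks += 1
        # A non-zero return value asks SQLite to interrupt the running query.
        return 1 if ticks > max_ticks else 0

    conn.set_progress_handler(on_tick, tick_size)
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.DatabaseError as exc:
        if ticks > max_ticks:
            raise QueryTooExpensive(sql) from exc
        raise
    finally:
        conn.set_progress_handler(None, 0)  # remove the handler again

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(2000)])

# A cheap query stays well inside the budget.
cheap = run_capped(conn, "SELECT COUNT(*) FROM t", max_ticks=100)

# A 2000 x 2000 cross product blows through it and is cut off.
try:
    run_capped(conn,
               "SELECT COUNT(*) FROM t AS a, t AS b WHERE a.x + b.x < 0",
               max_ticks=100)
    aborted = False
except QueryTooExpensive:
    aborted = True

print(cheap, aborted)
```

Note that the query text itself is untouched; the cap lives entirely in the protocol around it, which is the point about not needing language changes.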


> Your concerns are warranted. However. You are wrong tying the problem to joins. ... It's a matter of determining the cost of the data access strategy

I was using joins as the common example. Sure, there are many other ways of using SQL to shoot yourself in the foot, but most of the issues I've run into seem to come from joins.


There are real problems with COBOL and, yet, it's used in the most critical parts of our societal infrastructure. People just learned to live with the problems and are by now completely oblivious to them.


SQL is too broken to even try fixing.


there's a reason why not everything in SQL is a set and it's that you can do a lot of mathematical work from inside the engines. this was largely overlooked in the article, which leaves undefined what operations between sets of differing cardinality would require, and whether this would lead back to what they call unwieldy runtime errors when doing math over sets of mismatched cardinality.


>there's a reason why not everything in SQL is a set and it's that you can do a lot of mathematical work from inside the engines

I don’t see the relationship between those two things; or rather, what the latter even means. Can you expand?

Also, apparently the “sets” described in the article are actually multisets aka bags [0] — so the same semantics as any other RDBMS/SQL. No idea why they’d confound the two, especially when set vs bag semantics is a well-discussed topic in the literature..

[0] https://edgedb.com/docs/edgeql/overview/#everything-is-a-set


something like this

SELECT StudentID, Name, (SELECT COUNT(x) FROM StudentExam WHERE StudentExam.StudentID = Student.StudentID) / (SELECT COUNT(x) FROM StudentEnrollmentPeriods WHERE StudentEnrollmentPeriods.StudentID = Student.StudentID) AS ExamsTakenPerYear FROM Student ORDER BY ExamsTakenPerYear DESC;

works because the results of aggregates are scalar, and thus math operations on them are always meaningful. this in particular might not be the most brilliant example, but there's definitely a case for running such operations on the server, because you might filter on the results, say, like

"select all students averaging fewer than 2 exams per year"

at which point either you have the distinction between scalar and set in the language itself and you can reject invalid queries at the parsing level, or you have to do the checks at runtime when two sets of mixed cardinality meet in an operation, halt the query and throw an error.

edit: count(x) because I don't know how to escape asterisks
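a runnable sketch of the above, using Python's sqlite3 module. the table contents, the Exam and Year columns, and the assumption of one StudentEnrollmentPeriods row per enrolled year are all made up here; the division is forced to floating point so that 3 exams over 2 years gives 1.5 rather than 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE StudentExam (StudentID INTEGER, Exam TEXT);
    -- Assumption: one row per year the student was enrolled.
    CREATE TABLE StudentEnrollmentPeriods (StudentID INTEGER, Year INTEGER);
    INSERT INTO Student VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO StudentExam VALUES
        (1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'), (1, 'e'), (1, 'f'),
        (2, 'a'), (2, 'b'), (2, 'c');
    INSERT INTO StudentEnrollmentPeriods VALUES
        (1, 2018), (1, 2019), (2, 2018), (2, 2019);
""")

# Each aggregate subquery yields exactly one value per student,
# so dividing one by the other is always well defined.
per_year = """
    SELECT StudentID, Name,
           1.0 * (SELECT COUNT(*) FROM StudentExam e
                  WHERE e.StudentID = s.StudentID)
               / (SELECT COUNT(*) FROM StudentEnrollmentPeriods p
                  WHERE p.StudentID = s.StudentID)
           AS ExamsTakenPerYear
    FROM Student s
"""

all_rows = conn.execute(
    per_year + " ORDER BY ExamsTakenPerYear DESC").fetchall()

# "students averaging fewer than 2 exams per year", filtered on the server
below_two = conn.execute(
    f"SELECT Name FROM ({per_year}) WHERE ExamsTakenPerYear < 2").fetchall()

print(all_rows)   # [(1, 'ann', 3.0), (2, 'bob', 1.5)]
print(below_two)  # [('bob',)]
```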


Set and scalar should be regarded as data types in a more traditional programming language, and treating a 1-element set as a scalar is a type coercion that works as much as other type coercions: only if the data matches expectations.
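A toy sketch of that coercion in Python (the `as_scalar` and `CardinalityError` names are invented for this comment, and this is not how any particular engine implements it): the conversion succeeds only when the collection happens to hold exactly one element, and otherwise fails at the moment it is attempted, i.e. at run time.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")

class CardinalityError(ValueError):
    """The collection could not be used as a scalar."""

def as_scalar(rows: Sequence[T]) -> T:
    """Coerce a 1-element collection to its single element; fail otherwise."""
    if len(rows) != 1:
        raise CardinalityError(f"expected exactly 1 element, got {len(rows)}")
    return rows[0]

# Data matches expectations: both "sets" have one element.
ok = as_scalar([6]) / as_scalar([2])

# Data does not match: the failure only shows up when the code runs.
try:
    as_scalar([6, 7]) / as_scalar([2])
    failed = False
except CardinalityError:
    failed = True

print(ok, failed)  # 3.0 True
```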


you're missing the point: it's not about coercion or data types per se, it's whether the error happens at parsing or runtime; of course the software has ways to figure out intention but is it something we actually want?

it's not coincidence that the article complains about the same thing but in reverse:

> This is legal, but only if the subquery returns not more than one row. Otherwise, an error would be raised at run time.

except the 'fix' reintroduces it in a way that's subtler and way harder to detect because now everything is a set even when the intent is to have a scalar.


Hive queries are written in HiveQL, not SQL. I used to write a lot of Hive and Impala queries, and going back to plain SQL is disappointing.


While based on SQL, HiveQL does not strictly follow the full SQL-92 standard, just like all the other SQL dialects out there. HiveQL is SQL.


That's a fair argument, but from personal experience HiveQL has some pretty major features added on that make it "have such power and support so many use cases" in a way that other common variants, like the one used by MySQL, don't.

A lot of HiveQL's power comes from extending the language, not from the use of a small set of timeless features.


What is your favourite HiveQL extension?


It supports so many use cases primarily because of the small number of fundamental "constructs".

I mean, if I gave you protons, electrons and neutrons, you could build the universe out of them!


That doesn't make sense, if I give you more things (I'm avoiding saying 'atom', but you know what I mean) then it's not the case that you can suddenly do less.

Expressiveness isn't inversely correlated to number of constructs.


Let's say you gave me an atom. In this situation, you can't create other atoms. When you give me protons, electrons and neutrons, I can create any atom I like, in addition to anything else in the universe. Thus, when you give me an atom, you actually reduce the possibilities somewhat.

In reality, it's not just about the small number of constructs, but also about how fundamental they are. In one sense, atoms are less fundamental than protons/neutrons/electrons, and therefore reduce the number of possible creations.


I think your premise might be flawed.

Are we just assuming that protons and neutrons automatically pull in the quarks, neutrinos, other leptons and force carriers as well? Does the electron have a photon dependency? What about dark matter?

If not, then our three particles are internally inconsistent and woefully incapable of building a universe. Otherwise, our set of dependencies is essentially the whole darn universe anyway.

Same goes for atoms. Give me enough and I can build a star or particle accelerator to create whatever elements or fundamental particles I want.


Wow, it was a simplistic passing metaphor. I was not expecting to get into the complexities of physics, but simply that if you operate at a more fundamental level, you can generally address a greater range of problems.


The point is that your metaphor is broken and incorrect. GP's point stands. Abstractions are leaky.


Even leaky abstractions can be useful, if they are taken in context. If you choose not to understand the spirit of my comment, and take its meaning literally to the point of it having no utility for you, then so be it. A significant number of upvotes suggest GP's point may be moot, and, dare I say, pedantic.

Also, the GP's suggestion of building an accelerator is as silly as someone writing an assembly compiler via a mush of SQL. It probably can be done, but it's not in the spirit of the topic!


Fair point. Abstractions, leaky or otherwise, are helpful! It does seem a bit like we've successfully abstracted the conversation into oblivion though. hehe

Anyway, I feel like your original comment and my reply are kind of doing the same thing at the meta-conversation level. We're stretching a metaphor in order to make some counterpoint argument. My intent with the excessive pedantry was to highlight the farce of conflating rough metaphor with substantive insight.

Anyway, internet conversations are hard. Thanks for engaging!


> Let's say you gave me an atom. In this situation, you can't create other atoms. When you give me protons, electrons and neutrons, I can create any atom I like, in addition to anything else in the universe. Thus, when you give me an atom, you actually reduce the possibilities somewhat.

> In reality, it's not just about the small number of constructs, but also about how fundamental they are. In one sense, atoms are less fundamental than protons/neutrons/electrons, and therefore reduce the number of possible creations.

I think what OJFord is getting at is, what if you had more subatomic particles to build with?


The more fundamental you go, the fewer things you need to form a "complete set".

Thus, whatever the complete set is for subatomic particles, it will most certainly have fewer members than the complete set of atoms.

Getting back to my original point, SQL operates at a fundamental enough level that you do not need a great number of "constructs" to create a query that gives you whatever it is that you need.

I'm not saying it's perfect, but was just supplementing the OP's observation that SQL has surprisingly few "constructs".


Depends on whether photons exist or not.

Radiation makes the universe a lot more interesting.


In your analogy, SQL is the periodic table.



