We Can Do Better Than SQL (edgedb.com)
624 points by 1st1 on May 9, 2019 | hide | past | favorite | 424 comments



I don't disagree that SQL can be improved. In fact, that's one of the biggest reasons I use Postgres in the first place: there are so many improvements available on top of SQL.

All that said...SQL is pretty darn effective. As a language, it's the true backbone of the internet today. It's readable, explicit, fairly concise, and naturally translates to how data should be broken down for efficient storage, or, with some trade-offs, for more efficient retrieval.

There are differences between vendor implementations...but that's what different vendors are for: to find the things the other guys are doing wrong and improve on them to build a better product.

I wish the folks luck in their work to improve things, but the language I've been able to rely on consistently over the last 17 years or so has been SQL...and I've worked with a lot of languages. SQL is the one that gets the most done reliably and lets me down the least often.


Afaict, the state of SQL as a grammar with tooling is kind of pathetic.

As a standardized language, it doesn't really exist; everyone implements numerous extensions, and almost no one is fully ANSI compliant.

Almost all formatters attempt to be generic (believing standardization exists), and fail to support the full grammar of any dialect.

Across the board, all parsers have pathetic error messages ("error on line 3", which is actually just the start of the statement).

The schema offers type constraints, but querying tools and IDEs extract no value from them (that is, types are statically specified/constrained, but query editors all pretend it's fully dynamic).

There's a lot of awkward nonsense, like WHERE clauses being parsed before the SELECT in most parsing engines, causing alias usage to fail without wrapping in a subselect/WITH clause.
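A minimal sketch of the alias problem and the portable workaround, run through Python's bundled SQLite with a hypothetical `orders` table (note that SQLite itself is lenient and accepts the bare alias; engines such as PostgreSQL reject it, which is the complaint here):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, qty INTEGER, price REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 2, 5.0), (2, 10, 1.0), (3, 1, 3.0)])

# In standard SQL the WHERE clause is resolved before the SELECT list,
# so `SELECT qty * price AS total FROM orders WHERE total > 5` is an
# error on most engines. The portable workaround wraps the alias in a
# subselect so it exists by the time WHERE sees it:
rows = con.execute("""
    SELECT id, total
      FROM (SELECT id, qty * price AS total FROM orders)
     WHERE total > 5
     ORDER BY id
""").fetchall()
print(rows)  # [(1, 10.0), (2, 10.0)]
```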

The grammars themselves are an inconsistent, ad-hoc mess

The grammar is also unnecessarily context-dependent (e.g. FROM must follow SELECT, and WHERE after that), making programmatic composition unnecessarily difficult.

I don’t know how much of the tooling issue is a result of SQL as a language versus the history itself, but I can at least confirm that trying to parse multiple dialects is absolute hell, which would at least explain the sorry state of affairs for eg formatters.

But the majority of its expressive power derives from the relational algebra and has nothing to do with the SQL grammar, and that's the majority of its value. It seems obvious to me that, at the very least, the compositional issues of SQL and its self-inconsistent grammar should be vulnerable to near-lossless improvement without too much struggle, though I can't say what the alternative would actually look like.

But it seems like it's riddled with a lot of unnecessary flaws.


Actually, standards compliance is really good these days. There are a lot of custom functions that are hard to do without in some reporting applications, but the behavior of SQL clauses is consistent across every engine.


JetBrains tooling does use schema metadata for its autocompletion. I'm not sure, though, if it's anywhere close to full-on Haskell autocompletion in the Atom editor (very fiddly and prone to break on minor version changes, I must say).


Offtopic, but I am genuinely curious now about whether Atom's Haskell autocompletion is significantly better than in other editors like VSCode, or (n)vim with things like YouCompleteMe/NeoComplete/Deocomplete etc. Have you used other editors? What is your opinion?


The SQL spec is actually hidden away behind a paywall: it's a standards document you have to purchase.

I was experimenting with creating an Entity Framework equivalent in TypeScript and really wanted to create a SQL AST for use under the hood (optimizing queries, SQL push-down, etc). I ended up using the PostgreSQL types and a Ruby plugin that binds some PostgreSQL libs to get my POC working. Crazy town.


The big boys' SQL is relatively close to the standard.


I’ve been writing some pretty ambitious Hive queries at work lately.

As I learn more about SQL and pull off more complex queries, my respect for it deepens. To have such power and support so many use cases with so few constructs is really an engineering feat. It’s timeless for a reason.

Some of my relatively common access patterns are awkward to express, but they can still be expressed in a few lines + a CTE or two, which is really impressive for a language so small.

This is not to say we can’t do better. But SQL has achieved a deep resonance with its problem space that most tools don’t even come close to. The brightest minds and most effective tooling shops in our field would be lucky merely to do as well.
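For illustration, a sketch of the "few lines + a CTE" pattern described above, using Python's bundled SQLite and an invented `sales` table ("regions whose total exceeds the overall average" is the kind of access pattern that is awkward without the CTE):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("east", 100), ("east", 50), ("west", 30)])

# The CTE names an intermediate result once, and the main query can
# then refer to it twice (as a table and in a subquery) without
# repeating the aggregation.
rows = con.execute("""
    WITH totals AS (
        SELECT region, SUM(amount) AS total
          FROM sales
         GROUP BY region
    )
    SELECT region
      FROM totals
     WHERE total > (SELECT AVG(total) FROM totals)
""").fetchall()
print(rows)  # [('east',)]
```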


This shows that humans can learn languages and get motivated by mastering them. It does not tell us anything about the consistency, composability, or orthogonality of SQL, the qualities that affect a newcomer's effort to learn it.


I think you mean the relational model is timeless. SQL is simply not a great language for an otherwise great idea.


Blindly asserting that SQL is bad without providing any argument or proof does not contradict OP's opinion on SQL.


Almost anything written by Date and/or Darwen during the last 30 years is chock-full of arguments and proofs that SQL is a horrendously poor language. Please don't come complaining that all of those arguments are not replicated here.

I'll give you just one: in SQL, you can write "WHERE 4 > (SELECT COUNT(*) FROM ... WHERE ...)" but you cannot write "WHERE (SELECT COUNT(*) FROM ... WHERE ...) < 4" [Darwen, "The Askew Wall"].

Or in other words: for certain specific kinds of 'a' and 'b', you can write "a < b" but not "b > a". Do you actually know ANY language that exposes that kind of thing??? REALLY???


The article itself gives a pretty decent description of how the language falls down.. and the ggp’s expression of his impression pretty trivially just maps to the relational model/algebra.... so all necessary arguments have been made :-)


Really? The article has a handful of strange edge cases that you'll almost never smack into in practice, and that's cause to say the language is crap?

Given the vast usage of SQL and its overwhelming popularity, if there were real problems with the language then there would be a lot more noise about them. But most people seem to be happy with it.

If it's not broken, let's not fix it, eh?


The only edge cases I could find in the article were the discussion of nulls... which aren't so much an edge case as an ever-present problem that no one really notices until it hurts (it's not at all difficult to write a broken-on-null query). Is there something else you're referring to? Afaict, everything else was just trivially observable aspects of the language.
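To make "broken-on-null" concrete, a small sketch in Python's bundled SQLite with an invented `users` table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (name TEXT, country TEXT)")
con.executemany("INSERT INTO users VALUES (?, ?)",
                [("ann", "US"), ("bob", None), ("cho", "DE")])

# Intuitively "everyone not in the US" should include bob, but under
# three-valued logic country <> 'US' evaluates to NULL (not TRUE) for
# him, so his row is silently dropped:
broken = con.execute(
    "SELECT name FROM users WHERE country <> 'US' ORDER BY name"
).fetchall()
print(broken)  # [('cho',)] -- bob is missing

# The fix has to mention NULL explicitly:
fixed = con.execute(
    "SELECT name FROM users"
    " WHERE country <> 'US' OR country IS NULL ORDER BY name"
).fetchall()
print(fixed)  # [('bob',), ('cho',)]
```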

>Given the vast usage of SQL and its overwhelming popularity, if there were real problems with the language then there would be a lot more noise about them. But most people seem to be happy with it.

In what universe does popularity imply quality? Certainly not the one I'm in: Avengers: Endgame is apparently the #2 worldwide gross ever :-)


I think the point he is making is: why hasn't anything come up in over 40 years that better represents the relational model?


There are other models as well, e.g. Datalog, but popularity is a function of many things. A major factor with databases is that historically the engine is far more important than the language; you would expect users to choose an engine, and take whatever language came with it. Which implies that the decision is on the producer side, not the consumer's.

Which finally implies that usage popularity is not adequately explained by user preference, because you wouldn’t expect user language preference to have significant impact on the decision making process.

But regardless, are you also going to claim that C, Oracle, IBM, Microsoft, etc. were optimal in their respective fields, during their respective eras, because they held total market domination? Incumbency is one hell of a drug.

Bandwagoning and deferring to the status quo is not an argument for quality; there are too many factors involved beyond quality itself. Fixable flaws in the language have been pointed out and not addressed in this conversation chain. Alternative languages exist, both academically and practically; they just don't come alone. It's not a competition in a void, the ecosystem has to move together. This occurs in every aspect of tech (did you ever wonder if there could be an OS outside the Unix and Windows styles?).

Asking why SQL was never beaten out in popularity is a hugely different question from whether there can be a better language than SQL, with largely unrelated answers (Oracle/IBM were hardly fair players in anything they did).


> There are other models as well eg datalog, but popularity is a factor of many things

The lack of popularity is also a reflection of how the string of next best things have failed to actually deliver on their promise, accompanied by the lack of a rational argument to adopt them instead of using a time-tested technology.


What time-tested technology are you referring to? I'm only talking about the SQL language -- not the relational model, not the RDBMS engine, not the drivers.

The SQL language is just an API to the total engine; the majority of its value derives from the relational model; the value of the relational algebra is not being questioned. Only the particular interface to describe the relational algebra.

The power of the JVM is not the power of the Java language. You can have other languages make use of the same power Java has access to by targeting the JVM, and as a programming community we accept that just fine (we also accept that despite Java having many known flaws as a language, its status as a first-party Oracle interface to the JVM, and its position within the status quo, make it extremely difficult to upend; but that hardly implies Java is some perfect language, as most HN users will trivially acknowledge).

In the same fashion, the SQL language is (ideally) decoupled from the RDBMS engine; it can be replaced. But in practice, they're not so decoupled, and SQL has consistently been the only first-class interface to the engine, so, like Java, it can enjoy far greater ubiquity and stability from the quality of the underlying tech than the language itself may deserve.

So once again: popularity/stability of the SQL language is not (necessarily) evidence of the quality of the SQL language. It's much more likely that it represents the value of the RDBMS and the relational model (which I don't think anyone is arguing against), while the SQL language enjoys a free ride by being the only interface that's even offered.

I mean hell, just read the article. It has trivially observable flaws in its design and semantics (like stuffing a 3VL logic into a 2VL language). EdgeQL may or may not be the optimal solution, and I'm not arguing (or interested in arguing) one way or the other on that, but I don't see how anyone can reasonably argue that SQL's ubiquity shows its perfection, when it's so clearly imperfect (because those flaws are being very directly pointed out).


You still haven't explained why SQL in particular has ruled over its problem domain for over 40 years. For example, in systems programming over that same timespan we've had assembly, c, c++ and rust.

Why is SQL the exception to the rule? You talk about popularity, but it's really a question about stability. Remember, SQL is not that popular, as the NoSQL movement and comments like yours prove.


> For example, in systems programming over that same timespan we've had assembly, c, c++ and rust.

I don't think you've picked a very good example here of a field where the lingua franca has progressed over time. C still dominates nearly all systems programming, against all odds. It dominates operating system development, driver development, embedded systems development, pretty much anything that demands an extremely high degree of efficiency.

I agree with your overall point regarding SQL and its superior stability, but C is an example of where popularity won out against many better options over time. It's not like Rust was the first mainstream attempt to create safer languages for systems development either. There have been many over the years designed to fill the same niche, with more modern features, tools and goals. For better or worse, C still dominates the landscape for reasons entirely independent of whether or not it was truly the best tool for every task.

Also, I'd contend that your claim regarding the popularity of NoSQL is incorrect. If you only talk to web developers writing Javascript, you'd get the impression that NoSQL is taking over the world. But the reality is that amongst pretty much every other demographic, SQL is still highly regarded as the ideal technology.


>You still haven't explained why SQL in particular has ruled over its problem domain for over 40 years

Well yes, because I argued that it was irrelevant to the original question: can SQL be improved?

Additionally, I actually did address the question of SQL's stability, at least partially: it's not SQL that's so stable, but the relational model. SQL just happened to be IBM's, and IBM was highly successful pushing its DB around, and Oracle (kind of) cloned it to push their DB around with less contention, and so it went on. But it's the RDBMS engine that primarily drives a DB's value; the SQL language is a ride-along.

And once more, to be clear: its longevity is not a result of SQL's quality, but of the quality of the relational model. Thus its popularity and stability (it's also not that stable, in that it's heavily extended by everyone in arbitrary fashion) are irrelevant to the original question.

>SQL is not that popular, as the NOSQL movement

If people are trying to make use of NoSQL because they want to avoid the SQL language (not the relational model), they’ve made a grave mistake in understanding their technologies; I don’t think such a naive opinion should be considered relevant to the equation.

If they’ve chosen NoSQL to avoid the relational model, then their choice says nothing about the SQL language.


I think the historical truth is that Oracle was first to market, and IBM just adopted SQL so it would not risk being waaaaaaaaay too late to the market "party". [Darwen, "Why are there no relational DBMS's"]


> If people are trying to make use of NoSQL because they want to avoid the SQL language (not the relational model), they’ve made a grave mistake in understanding their technologies; I don’t think such a naive opinion should be considered relevant to the equation.

Yes, they are trying to avoid the relational model by avoiding SQL, because SQL has basically been the face of the relational model for 40 years. If there was something better, they'd use that. Nothing better has come up, and I don't know why either.


Better things have come up. See the projects list at http://www.thethirdmanifesto.com/ .


but no one is using them, so are they actually better? (a philosophical question, which also pertains to a ton of better options... I mourn BeOS)


The particular meaning of "better" here was "better at representing the relational model".

If you want to believe that popularity is a measure or reliable indicator of quality, I can't be bothered with you.

PS I have a worked-out example on my site of what it takes to enforce a business rule "no one has a salary higher than his manager's" in SQL (> 100 LOC) and in SIRA_PRISE (one single relatively simple formula of the RA to declare). You decide which is "better".
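As a rough illustration of why the pure-SQL version balloons, here is a hedged sketch (SQLite trigger syntax, invented schema, not the author's worked-out example) covering just the INSERT case of that rule:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE emp (id INTEGER PRIMARY KEY, salary INTEGER,
                      manager_id INTEGER REFERENCES emp(id));

    -- One of several triggers a full SQL solution needs; UPDATEs to
    -- either row and manager reassignment each need their own, which
    -- is why the line count balloons.
    CREATE TRIGGER no_overpaid BEFORE INSERT ON emp
    WHEN NEW.salary > (SELECT salary FROM emp WHERE id = NEW.manager_id)
    BEGIN
        SELECT RAISE(ABORT, 'salary exceeds manager''s');
    END;
""")

con.execute("INSERT INTO emp VALUES (1, 100, NULL)")  # the boss
con.execute("INSERT INTO emp VALUES (2, 90, 1)")      # fine
try:
    con.execute("INSERT INTO emp VALUES (3, 110, 1)")  # violates rule
except sqlite3.IntegrityError as e:
    print(e)  # salary exceeds manager's
```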


Much of the comments (and the article) criticize SQL as being non-standard and difficult to learn. These critiques have been around as long as SQL has.

However, there is a more insidious problem with SQL: it's all too easy to write SQL statements that have O(N^2) complexity. A simple JOIN can easily result in O(N^2) complexity, yet there aren't easy tools to identify these performance issues. As a result, as a database grows, things that once were executed quickly take forever.

I'd like to see the end of joins, replaced with something that is more explicit about what is happening under the hood.
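Some of that "what is happening under the hood" visibility does exist today in query-plan tools, though you have to ask for it; a sketch using SQLite's EXPLAIN QUERY PLAN and an invented pair of tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a (id INTEGER)")
con.execute("CREATE TABLE b (a_id INTEGER)")

def plan(sql):
    # Each returned row's last column describes one step of the plan
    # (SCAN = full pass over a table, SEARCH = index lookup).
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

q = "SELECT * FROM a JOIN b ON b.a_id = a.id"
# Without an index the engine falls back to scans (or an ad-hoc
# automatic index), i.e. the O(N*M) worst case the comment warns about.
print(plan(q))

con.execute("CREATE INDEX b_a_id ON b (a_id)")
# Now the inner table is probed via the index instead of rescanned.
print(plan(q))
```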


Your concerns are warranted.

However, you are wrong to tie the problem to joins. I remember an analyst who launched a SELECT COUNT because he was just curious about the number of rows in the table. No joins involved, but users did suffer. Elsewhere in this thread I've seen the problem tied to table scans, and that's also wrong. A table scan isn't a problem if it's a 5-page table. As Darwen often argued: why are people always lamenting only about those couple of tables with millions of rows? Why should we deprive users of the power of relational algebra if their databases simply aren't that big?

It's a matter of determining the cost of the data access strategy (regardless of JOIN/EXCEPT/what_have_you) and (implementing a protocol for) capping it at runtime (or earlier if possible). No need for language changes here.


> Your concerns are warranted. However. You are wrong tying the problem to joins. ... It's a matter of determining the cost of the data access strategy

I was using joins as the common example. Sure, there are many other ways of using SQL to shoot yourself in the foot, but most of the issues I've run into seem to come from joins.


There are real problems with COBOL and, yet, it's used in the most critical parts of our societal infrastructure. People just learned to live with the problems and are by now completely oblivious to them.


SQL is too broken to even try fixing.


There's a reason why not everything in SQL is a set: it lets you do a lot of mathematical work from inside the engines. This was largely overlooked in the article, which leaves undefined what operations between sets of mixed cardinality would require, and whether this would lead back to what they call unwieldy runtime errors when doing math over sets of mismatched cardinality.


>there's a reason why not everything in SQL is a set and it's that you can do a lot of mathematical work from inside the engines

I don’t see the relationship between those two things; or rather, what the latter even means. Can you expand?

Also, apparently the “sets” described in the article are actually multisets aka bags [0] — so the same semantics as any other RDBMS/SQL. No idea why they’d confound the two, especially when set vs bag semantics is a well-discussed topic in the literature..

[0] https://edgedb.com/docs/edgeql/overview/#everything-is-a-set


something like this:

    SELECT StudentID,
           Name,
           (SELECT COUNT(*)
              FROM StudentExam
             WHERE StudentExam.StudentID = Student.StudentID)
           /
           (SELECT COUNT(*)
              FROM StudentEnrollmentPeriods
             WHERE StudentEnrollmentPeriods.StudentID = Student.StudentID)
             AS ExamsTakenPerYear
      FROM Student
     ORDER BY ExamsTakenPerYear DESC;

works because the results from aggregates are scalars, and thus math operations on them are always meaningful. Albeit this in particular might not be the most brilliant example, there's definitely a case for running such operations on the server, because you might filter on the results, say, like

"select all students with less than 2 exams per year on average"

at which point either you have the distinction between scalar and set in the language itself, and you can reject invalid queries at the parsing level, or you have to do the checks at runtime when two sets with mixed cardinality appear in an operation, halt the query, and throw an error.


Set and scalar should be regarded as data types, as in a more traditional programming language, and treating a 1-element set as a scalar is a type coercion that works as well as other type coercions do: only if the data matches expectations.


You're missing the point: it's not about coercion or data types per se, it's about whether the error happens at parse time or at runtime. Of course the software has ways to figure out intention, but is that something we actually want?

it's not coincidence that the article complains about the same thing but in reverse:

> This is legal, but only if the subquery returns not more than one row. Otherwise, an error would be raised at run time.

except the 'fix' reintroduces it in a way that's subtler and much harder to detect, because now everything is a set even when the intent is to have a scalar.


Hive queries are written in HiveQL, not SQL. I used to write a lot of Hive and Impala queries, and going back to plain SQL is disappointing.


While based on SQL, HiveQL does not strictly follow the full SQL-92 standard, just like all the other SQL dialects out there. HiveQL is SQL.


That's a fair argument, but from personal experience HiveQL has some pretty major features added on that make it "have such power and support so many use cases" in a way that other common variants, like MySQL's, don't.

A lot of HiveQL's power comes from extending the language, not from the use of a small set of timeless features.


What is your favourite HiveQL extension?


It supports so many use cases primarily because of the small number of fundamental "constructs".

I mean, if I gave you protons, electrons and neutrons, you could build the universe out of them!


That doesn't make sense: if I give you more things (I'm avoiding saying 'atom', but you know what I mean), then it's not the case that you can suddenly do less.

Expressiveness isn't inversely correlated to number of constructs.


Let's say you gave me an atom. In this situation, you can't create other atoms. When you give me protons, electrons and neutrons, I can create any atom I like, in addition to anything else in the universe. Thus, when you give me an atom, you actually reduce the possibilities somewhat.

In reality, it's not just about the small number of constructs, but also about how fundamental they are. In one sense, atoms are less fundamental than protons/neutrons/electrons, and therefore reduce the number of possible creations.


I think your premise might be flawed.

Are we just assuming that protons and neutrons automatically pull in the quarks, neutrinos, other leptons and force carriers as well? Does the electron have a photon dependency? What about dark matter?

If not, then our three particles are internally inconsistent and woefully incapable of building a universe. Otherwise, our set of dependencies is essentially the whole darn universe anyway.

Same goes for atoms. Give me enough and I can build a star or particle accelerator to create whatever elements or fundamental particles I want.


Wow, it was a simplistic passing metaphor. I was not expecting to get into the complexities of physics; my point was simply that if you operate at a more fundamental level, you can generally address a greater range of problems.


The point is that your metaphor is broken and incorrect. GP's point stands. Abstractions are leaky.


Even leaky abstractions can be useful, if they are taken in context. If you choose not to understand the spirit of my comment, and take its meaning literally to the point of it having no utility for you, then so be it. A significant number of upvotes suggests GP's point may be moot, and, dare I say, pedantic.

Also, the GP's suggestion of building an accelerator is as silly as someone writing an assembler via a mush of SQL. It probably can be done, but it's not in the spirit of the topic!


Fair point. Abstractions, leaky or otherwise, are helpful! It does seem a bit like we've successfully abstracted the conversation into oblivion though. hehe

Anyway, I feel like your original comment and my reply are kind of doing the same thing at the meta-conversation level. We're stretching a metaphor in order to make some counterpoint argument. My intent with the excessive pedantry was to highlight the farce of conflating rough metaphor with substantive insight.

Anyway, internet conversations are hard. Thanks for engaging!


> Let's say you gave me an atom. In this situation, you can't create other atoms. When you give me protons, electrons and neutrons, I can create any atom I like, in addition to anything else in the universe. Thus, when you give me an atom, you actually reduce the possibilities somewhat.

> In reality, it's not just about the small number of constructs, but also about how fundamental they are. In one sense, atoms are less fundamental than protons/neutrons/electrons, and therefore reduce the number of possible creations.

I think what OJFord is getting at is, what if you had more subatomic particles to build with?


The more fundamental you go, the fewer things you need to form a "complete set".

Thus, whatever the complete set is for subatomic particles, it will almost certainly have fewer members than the complete set of atoms.

Getting back to my original point, SQL operates at a fundamental enough level that you do not need a great number of "constructs" to create a query that gives you whatever it is that you need.

I'm not saying it's perfect, but was just supplementing the OPs observation that SQL has surprisingly few "constructs".


Depends on whether photons exist or not.

Radiation makes the universe a lot more interesting.


In your analogy, SQL is the periodic table.


Having used Splunk's query language... I'd hope that, in general, vendors start with an assumption of SQL support and maybe try tweaking it at the edges rather than burning the house down. SQL has survived this long because it's extremely expressive, and any replacement for it is going to need to match that expressiveness.

All that said, I think it was originally structured to be partially a human-readable language, and it fails pretty hard at that - that's a facet I'm sure smart people could revise to make more natural.


To be fair, Splunk excels at dealing with unstructured logs, which is in many ways a harder problem. Obviously in a perfect world, all our data would be structured, which would make querying much simpler.


I used to work for Splunk. Querying Splunk with SQL is completely plausible, and something that Splunk has made a number of attempts at over the years.

The problem isn't SQL. It's that Splunk's query engine is tied up internally with a "grammar" that is a direct port of a shell pipeline into C++ with no intermediate representation or anything a compiler guy would recognize as a grammar. There was no design, no mathematical underpinning to it.

Splunk's unstructured log capabilities are really domain knowledge about making them semistructured as fast as possible: token indexing, a lot of effort on recognizing character encodings and timestamps intelligently, looking for key=value pairs, and letting people write regexes to extract fields themselves. The query language isn't somehow designed for a different data model.

In EWD1123, Dijkstra showed that the relational calculus and the regularity calculus (which governs regexes) are basically the same thing. My takeaway from that is that the relational model can be reinterpreted as a model over anything you want to match and manipulate with regexes by just changing the field selectors.


Typically the earlier part of your pipeline turns the logs into data records via regex captures. So you could have had a pipeline of what are essentially shell commands that produce a table that SQL operates on.


I don't buy that: Splunk has a clear concept of columns and rows and could have chosen to expose support for a SQL-like grammar for assembling them - they instead chose a less structured format that makes data assembly quite difficult.

More legitimately splunk may simply be unable to deliver performant data expression if a user is typing a query complex enough to justify SQL.


> I use Postgres […] SQL is pretty darn effective.

FWIW Postgres used to have its own query language, derived from QUEL[0] rather than SQL.

And findings that SQL is kinda shit are not exactly recent, e.g. C.J. Date's "A Critique of the SQL Language" (1983) lists the following sections

* lack of orthogonality: expressions

* lack of orthogonality: builtin functions

* lack of orthogonality: miscellaneous items

* formal definition

* mismatch with host languages

* missing function

* mistakes

* aspects of the relational model not supported

The conclusion was, obviously, prescient:

> if SQL is adopted on a wide scale in its present form, then we will to some degree have missed the relational boat, or at least failed to capitalize to the fullest possible extent on the potential of the relational model. That would be a pity, because we had an opportunity to do it right, and with a little effort we could have done so. The question is whether it is now too late. I sincerely hope not.

SQL succeeded not because it's "pretty darn effective" but because IBM decided on it (at a time where it drove technology) and Oracle are great at sales and marketing (whereas Ingres definitely wasn't).

[0] https://en.wikipedia.org/wiki/QUEL_query_languages


> I use Postgres in the first place because there are so many improvements available on top of SQL.

Most of Postgres is standard SQL. It's just that most non-Postgres databases do not implement standard SQL very well.


The various extensions/plugins allow for custom data types, indexes, the use of multiple programming languages to write functions, the ability to use a foreign data wrapper to connect to Redis and build a VIEW out of the result or push data out to Redis/Memcached with a database function, varieties of powerful search capabilities, etc.

It's got a lot of stuff going on in there.


> ability to use a foreign data wrapper to connect to Redis and build a VIEW out of the result or push data out to Redis/Memcached with a database function,

This sounds amazing. Does anyone have a link to some docs for this?


That "most" is a pretty loaded word! Last I checked, even CREATE INDEX is not part of any ANSI or ISO SQL specification. (That's why every RDBMS has such different features and syntax for this.) Good luck building a system with just "standard SQL".


> even CREATE INDEX is not part of any ANSI or ISO SQL specification

That's correct, and that's why PostgreSQL has ADD CONSTRAINT.

The constraints describe the actual schema of the data; INDEX is a DBMS-specific implementation detail* for improving performance (including index types, etc.).

* Though since the standard does not yet cover some things, like partial unique constraints, these have to be done as an INDEX in PostgreSQL.
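For instance, a partial unique constraint has no standard spelling; a sketch of the INDEX workaround in SQLite, whose partial-index syntax happens to match PostgreSQL's here (invented `users` table):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (email TEXT, active INTEGER)")

# Standard SQL has UNIQUE constraints, but "unique among active rows
# only" must be expressed as a partial unique INDEX:
con.execute("""
    CREATE UNIQUE INDEX one_active_email
        ON users (email) WHERE active = 1
""")

con.execute("INSERT INTO users VALUES ('a@x.com', 1)")
con.execute("INSERT INTO users VALUES ('a@x.com', 0)")  # ok: inactive
try:
    con.execute("INSERT INTO users VALUES ('a@x.com', 1)")
except sqlite3.IntegrityError:
    print("duplicate active email rejected")
```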


SQL has nothing to say about the layer of physical implementation, and that's where INDEX belongs (as does STOGROUP [DB2] and what have you).

That's not a bug, that's a feature. Deriving from the deliberate intent to make physical independence (which was simply nonexistent at the time the model was conceived) a reality.


Why should it be? Indexing is an implementation detail for each RDBMS, so it doesn’t make sense to add it to the language spec.


It does make sense when the goal of the language spec is interoperability, and it's something that everyone has to do, in practice, no matter the platform.

Call it a "hint", if you will, the meaning of which is implementation defined. But standardize the syntax for pete's sake.


Interoperability is achieved if the same query with the same inputs yields the same result. "In the same amount of time" (which is the part that indexes are aimed at) is manifestly not part of that picture, and that's by design.

And "hints" in the language are not exactly going to improve "interoperability" if their meaning is still allowed to be implementation-defined, are they ?


That sounds backwards to me. It's only lack of standardization that forces it to be an "implementation detail". To any application that needs to be able to query data in reasonable time, it's fundamental and necessary functionality.


Most of Postgres is standard SQL. It's just that most non-Postgres databases do not implement standard SQL very well.

Sure, but the non-standard enhancements like JSON support are part of what sets Postgres apart from the competition IMO.


SQL:2016 includes support for JSON: https://en.wikipedia.org/wiki/SQL:2016


Postgres 12, whose development has reached feature freeze and is in a stability period, has added some support for JSON path as of SQL:2016: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...

The release is planned for roughly next September/October.


Interesting, I wonder which databases actually implement SQL:2016 (and JSON as standardized). JSON support in Postgres predates SQL:2016 in any case.


I think Oracle does. My impression is that PostgreSQL sort-of "invented" JSON support in the db, then Oracle (and probably others) added JSON support with a different syntax, and then Oracle got their version defined into the SQL standard. I'm half-guessing here, but it's based on what I've gleaned from Markus Winand's [0] excellent compatibility tables in his slides [1].

[0] https://modern-sql.com/

[1] https://modern-sql.com/slides/ModernSQL-2019-05-08.pdf, for example

(Edited for formatting.)


Wait, what major relational databases don't support JSON? Oracle and Sql Server definitely do. Does MySQL not support JSON?


SQLite (the most popular RDBMS today) doesn't, unless you count extensions. And of course, none of them use the same syntax or functions or data types.


While it is technically an optional extension in sqlite, it is part of the main source distribution, and very easy to enable.


And why wouldn't you count SQLite extensions? Its JSON1 extension is practically ubiquitous.
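For what it's worth, the JSON1 functions are easy to exercise from Python's bundled sqlite3 module (a quick sketch; it assumes your SQLite build ships the JSON1 functions, which most modern builds do):

```python
import sqlite3

# JSON is stored as ordinary TEXT; the JSON1 functions operate on it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute(
    "INSERT INTO docs (body) VALUES (?)",
    ('{"name": "ada", "tags": ["db", "sql"]}',),
)

# json_extract pulls a value out by path, much like Postgres's ->> operator.
name = conn.execute("SELECT json_extract(body, '$.name') FROM docs").fetchone()[0]
print(name)  # ada
```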


If I recall correctly, Oracle's support for JSON is not at the same level as PostgreSQL's. By that I mean seamlessly storing JSON as a data type like any other. In Oracle, JSON is stored as VARCHAR/CLOB and then you implement checks to validate it, whereas in PostgreSQL and MSSQL it is its own data type.


Similar story with SQL Server. I believe JSON support is a bit better in SQL Server 2019, but it still lags far behind that of Postgres.


Hi, MySQL supports JSON (it has a native JSON datatype), but it also supports NoSQL-style CRUD operations. For more details, check https://www.slideshare.net/lefred.descamps/pyconx-python-mys... or https://www.slideshare.net/lefred.descamps/oracle-code-roma-...


Doesn't every major RDBMS support JSON out of the box these days?


(Last I checked, at least) JSON isn't as well supported in MySQL or MSSQL compared to PG.

In PG, a JSON column is so well integrated that you can do all sorts of crazy stuff (indexes over JSON queries are my favorite). You could build an entire RDBMS on top of PG's JSON column.
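For example (a sketch using standard Postgres syntax; `events` and `payload` are made-up names): a GIN index makes containment queries over a `jsonb` column indexable, and an expression index covers lookups on a single extracted field.

```sql
CREATE TABLE events (id serial PRIMARY KEY, payload jsonb);

-- GIN index supporting the @> containment operator
CREATE INDEX events_payload_gin ON events USING GIN (payload);
SELECT * FROM events WHERE payload @> '{"type": "login"}';

-- Expression index on a single extracted key
CREATE INDEX events_user_idx ON events ((payload->>'user_id'));
SELECT * FROM events WHERE payload->>'user_id' = '42';
```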


MSSQL indexes and provides dot notation/object query from 2016 forward, schemas are supported as well.


I feel like Postgres is so powerful I could damn near build an entire web app backend with JUST Postgres (i'm only sorta kidding here). If that's standard SQL, well then I really like standard SQL :-)


I've done pretty much this a few times, and it's amazing if you're working in the context of enterprise data systems that need to provide extensive capabilities to a "small" userbase (i.e. concurrent users in the thousands). You just need a small shim layer for security, to transform result sets, and to handle browser -> database connectivity.

Postgrest works very well as this shim layer, though I've moved on to writing sql directly in the client and communicating through a web socket shim. In terms of reducing code complexity and improving performance this is absolutely unbeatable, you just need to parse incoming sql to sanitize it and make sure there is no role escalation. Because of postgres's foreign data wrappers this method can provide a consistent surface for basically all your enterprise data. The only gotcha with FDWs is that some of them don't "push down" many query clauses, so you end up doing much slower queries on the remote system and filtering locally, which is terrible for obvious reasons. That being said, the FDWs are pretty much all open source, so you can just implement push down support for those missing clauses yourself.


> you just need to parse incoming sql to sanitize it and make sure there is no role escalation

"Just"? I guess I'm skeptical of a statement that begins "you just have to parse sql".

Is this actually easier than I'm imagining it? I'd be curious to hear more about the security and authorization model of this approach.


You don't have to do full statement parsing, you basically just have to do a limited parse that looks for the various ways that someone could execute a set role statement. As long as you don't let a user execute set role, you use db roles for user accounts, you have a reasonable statement timeout in place to prevent DoS, and your postgres security model is tight (the big gotcha is not to allow access to untrusted code), this approach works fine.

A good tool for this purpose is https://github.com/JavaScriptor/js-sql-parser as it will fail to parse complex statements that are likely to include an attack vector.

In terms of authorization, you can either create per user connection pools if using web sockets and log the user in directly that way (which makes things easy) or if you must use rest, use a single connection pool with a master user then use some form of token to tell the shim who to set role to before executing the query.
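As a toy illustration only (a deny-list regex is not a real security boundary — string literals and comments will trip it up in both directions, which is why a proper parser is the better tool), the role-escalation check might start out like:

```python
import re

# Hypothetical deny-list for statements that change the session's role:
# SET [LOCAL|SESSION] ROLE, SET SESSION AUTHORIZATION, RESET ROLE.
ROLE_ESCALATION = re.compile(
    r"\b(set\s+(local\s+|session\s+)?role"
    r"|set\s+session\s+authorization"
    r"|reset\s+role)\b",
    re.IGNORECASE,
)

def is_safe(sql: str) -> bool:
    """Reject any statement that appears to escalate the database role."""
    return ROLE_ESCALATION.search(sql) is None

print(is_safe("SELECT * FROM orders WHERE id = 1"))   # True
print(is_safe("SET ROLE admin; DELETE FROM orders"))  # False
```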


Thank you for posting this. I have been thinking about using something along these lines for a while now. Next project that fits the bill, I will see about building a proof-of-concept implementation and go from there. Have you had any exposure to any of the other projects similar to PostgREST that you have any thoughts about?


I've played with Graphile, it's not a bad choice if you are already invested in GraphQL. I'm not a GraphQL fan for a number of reasons, but it's a definitely a solid project.



I'm keeping an eye on https://github.com/aquametalabs/aquameta "Web development platform built entirely in PostgreSQL"


Graphile looks quite intriguing! any experience with it? Recommended?


I would also have a look at Hasura GraphQL Engine. Have used it now for various projects and it is extremely nice, especially with the superb subscriptions support and eventing for the odd requirement that can't be handled within Postgres


I'm using Graphile in deployment at the moment, and it works quite nicely. Granted, we have a limited amount of users thus far, so I don't really know how well it handles big loads, but it allowed us to get off the ground with a PoC _really_ quickly.


Same. As a data scientist, I have seen multiple 'analytics workbench' solutions come, go and occasionally stay - Clementine (what is now IBM SPSS), Stata, SAS and its many variants, Statistica and in the recent years, tools built off and designed to make working with data using Python/R.

But, the common workhorse tool that has stayed strong through all these has been the common SQL. Elegant, simple, powerful and thoroughly reliable, it is my primary go-to tool. In an otherwise changing ecosystem, its simplicity and reliability is a boon. Yes, it is primarily because of the nostalgic familiarity but I also believe it continues to be extremely powerful, one that will serve you very well.


I came here to write something similar, but you said it better than I could. Sure, SQL has the problems that the article mentions and their solution looks nice on the surface, but SQL has worked much better for me than any alternatives, especially if it’s slightly extended SQL like in Postgres, as you mention, and any attempts to improve it will always be an uphill battle as SQL is pervasive and well supported. That doesn't mean people shouldn't try, but it does mean that their likelihood of success depends on factors other than whether the solution is an improvement over SQL or not.


I’ll never understand why obvious marketing makes it to the front page, only to get shat upon by 95% of the comments.

Of course we can do better than SQL. We could obviously do better than Javascript. It’s globally understood that we could all do better than English.

We sometimes struggle to express ourselves with language. We can blame the language, try to fix it, or invent a new language and try to get people to speak it.

I would rather spend my time refining my elocution than learn a new language. That said, there are words and phrases that simply work better in other languages. I don’t know a single-word corollary to ‘Simpatico’ in English.

I wish the best to those who would make programming more expressive, and the worst to those who would try to streamline away subtlety.

Performance is quite loosely linked to language, given sufficient abstractions and optimizations. I submit NumPy as an example. We humans have lots of ways to say what we want, and we want lots of different things in myriad ways. As an analyst I often wish for a richer language, some way I could transcend tabular data thinking to find and make associations around real-world state. I’m certainly self-satisfied when my nested subqueries return what I’m expecting, but I may be able to add more value if I had a better way of expressing my questions.

If we can do better than SQL, it ought to bring more people closer to the reality that lies behind the data, and further from the methods used to obtain it. Maybe I don’t have the words for what I’m looking for yet...


The article was still useful. I don't mind an ad at the bottom if there's genuine value. I am not experienced in SQL so learning of its shortcomings (especially the NULL part) will be useful when I write the odd query here and there.


>I’ll never understand why obvious marketing makes it to the front page, only to get shat upon by 95% of the comments.

Posts have an upvote button, but no downvote button, so dissenters can only use comments to express disagreement.


And people often upvote based solely on the post title.


> I don’t know a single-word corollary to ‘Simpatico’ in English.

"amiable"? "congenial"? "affable"?


I wonder what the parent meant by "corollary"? Simpatico is actually an English word now..


Is that USA English? Never heard/read it at all in UK, not even in a crossword. Nor, ever online (here, reddit, etc.) or from other global sources.


It's in Oxford online and Merriam Webster.


> I’ll never understand why obvious marketing makes it to the front page, only to get shat upon by 95% of the comments.

Upvote manipulation maybe? Then once it is on the front-page it gets lots of exposure. So the probability to get upvoted becomes higher.


It's another clever feature of Hacker News, in my book.

Perhaps YCombinator omitted the downvote button because it can be abused. For example, if there is a story that is bad news for a certain company, that company could organize a downvoting campaign to hide it. But the upvote button can also be manipulated, as you say.

However, which is worse? (A) To write a silly story and have it rightfully downvoted into obscurity, or (B) To write a silly story, have it upvoted into the limelight, but be littered with comments that expose its flaws.

I think that as long as I remember that just because a story is on the front page doesn't mean it's right, then Hacker News has a nicely curated set of moderation rules.


>For example, if there is a story that is bad news for a certain company, that company could organize a downvoting campaign to hide it.

That's how story flagging currently works though, and is significantly more effective


Yeah, I'm not sold. One example from this post that struck me was that the author wanted to embed a select in a table expression. I'm not a fan of this at all. I want it to be clear whether a given expression list will explode in values or not.

I like the fact that SQL has a solid foundation in relational algebra. I see no such foundation for the alternative.

I do like what LINQ did here (being SQLish), which was to put the FROM clauses first. Some SQL variants have WITH clauses that are quite convenient but you end up with:

    WITH a AS (...),
    b AS (...)
    SELECT
      a.a1,
      b.b1
    FROM a
    JOIN b
    ON a.a = b.b
Common alternative:

   SELECT
     a.a1,
     b.b1
   FROM (...) a
   JOIN (...) b
   ON a.a = b.b
whereas I'd prefer:

   FROM (...) a
   JOIN (...) b
   ON a.a = b.b
   SELECT
     a.a1,
     b.b1


> Some SQL variants have WITH clauses

“Some variants”, including standard SQL since SQL:1999.


WITH is supported by all major SQL brands in the meantime.

https://modern-sql.com/feature/with#compatibility


Does the spec have any execution requirements re WITH? Pretty sure `WITH` in PostgreSQL acts as an optimization fence and thus can have considerable performance implications vs a nested query (or none at all, depending on the query).


Execution is down to the RDBMS.

Postgres is changing in the next release, BTW: https://www.depesz.com/2019/02/19/waiting-for-postgresql-12-...


The impossibility of WITH clause inlining is indeed one of the major limitations of Postgres compared to Oracle. We still have to wait for optimizer hints


If the language is the problem, why write a new server?

Transpile your language to the equivalent SQL, and rely on decades of research and real world experience in things like replication, optimization, locking strategies, high availability, security etc. the things unrelated to the language current SQL databases are really good at.

Enterprises need enterprisey features.


This is exactly what we do. EdgeDB is based on PostgreSQL.


And do you support other backends as well (i.e. MySQL?). Does EdgeDB still offer a regular SQL interface because people need time to migrate.

Even if I was convinced EQL was the future I wouldn’t throw out the old stuff. If the old stuff continues working but there is a smooth migration path, I would probably give it a try.

Think of TypeScript vs. JavaScript.


> And do you support other backends as well (i.e. MySQL?). Does EdgeDB still offer a regular SQL interface because people need time to migrate.

No, but we have a few ideas on how to connect existing databases to EdgeDB.

> Even if I was convinced EQL was the future I wouldn’t throw out the old stuff. If the old stuff continues working but there is a smooth migration path, I would probably give it a try.

Yes, we'll be working on that.


I know CompanyDB is the new company.com but I feel this would all be a bit more clear if you just called the company EdgeQL.


Which version, and how easy is it for you to track the 'latest and greatest' Postgres version, as they come out?


Latest stable and we'll keep it that way.


That's wonderful! It's good to know that there's a tested DBMS under the hood.


It looks like this is built on top of Postgres, so they're already doing this.


I missed that. It makes sense.


Because the 'servers' in existence speak only one language and it is that problematic one we want and need to get rid of.

Your 'transpile' handwaving won't fly. You are blindly presuming that the 'SQL equivalent' (a) always exists (it doesn't) and (b) can always be generated by an automated transpiler (it can't).


I have no background in databases so this may be naive or wrong, but the single biggest pain point in SQL that comes to my mind is that it can be difficult to tell what a query is doing without also knowing the constraints on the tables involved. Here's a real-life (ish) example from work:

We have some_table which we want to join to other_table, but we need to map an identifier through mapping_table in order to do it. So we end up with a query like:

SELECT (...) FROM some_table INNER JOIN mapping_table ON (...) INNER JOIN other_table ON (...) ...;

I know for sure when writing this query that the middle join to mapping_table will map every some_table row to exactly one row (no more, no fewer) in mapping_table. The problem is that the query doesn't capture this. The mapping table isn't really named something as obvious as "mapping_table" so someone reading the query has a hard time inferring what the intent was. It totally changes how you mentally parse and think about the query if the result set can be accumulating multiple matching rows from the join, or maybe even losing rows if there are no matches. You have to go bring up your database schema to figure this out.

And, as a fan of static typing, I can't help but cringe at the possibility of someone changing the constraints on the table without realizing that there are queries which implicitly depend on the old ones. SQL offers no resilience to this and will happily change the meaning of your query without a peep of complaint if you drop that constraint from mapping_table.

If there's a fancy way to capture this "mapping" relationship in standard SQL that doesn't just use a dumb inner join, I'd love to know about it. If not, I'd love a query language that supports some annotations that help reading and are either stripped out before sending to the database engine, or are actually checked at runtime.


If I understand correctly you are describing a one to one or one to many relationship. The canonical way to express that is 'don't use a mapping table'. Mapping tables are for many to many relationships. So why is there a mapping table in your example, and could you just get rid of it altogether?


In this case it's tying together data from different, independent systems. So the schema may not be textbook ideal but I don't think there's any way around it.


If you really must use a mapping table, I think you could put a unique constraint on both of the foreign key columns to the mapped tables. That would enforce that every row in one table maps to at most one row in the other
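A quick sketch in SQLite (table names invented to match the example): with UNIQUE on both foreign key columns, the mapping table can only express a one-to-one correspondence, and the database rejects anything else.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE some_table  (id INTEGER PRIMARY KEY);
CREATE TABLE other_table (id INTEGER PRIMARY KEY);
CREATE TABLE mapping_table (
    some_id  INTEGER NOT NULL UNIQUE REFERENCES some_table(id),
    other_id INTEGER NOT NULL UNIQUE REFERENCES other_table(id)
);
""")
conn.execute("INSERT INTO some_table VALUES (1), (2)")
conn.execute("INSERT INTO other_table VALUES (10)")
conn.execute("INSERT INTO mapping_table VALUES (1, 10)")

# A second mapping to other_table row 10 violates the UNIQUE constraint.
rejected = False
try:
    conn.execute("INSERT INTO mapping_table VALUES (2, 10)")
except sqlite3.IntegrityError as e:
    rejected = True
    print("rejected:", e)
```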


I don’t think that fundamentally changes the nature of a one to many relationship though. Unless your mapping table has a variety of different columns to join on.


Can you use views or other feature to tie together a table and its mapping table into a single view?


> I know for sure when writing this query that the middle join to mapping_table will map every some_table row to exactly one row (no more, no fewer) in mapping_table.

Your example query joins one some_table row with 1-n rows on mapping table, and another 1-n rows from whatever else is in there to that. If you're expecting a single row in the resulting set per some_table row, it means that you're filtering very hard (which is fine) or that you've a schema problem (which is the actual problem).


In CQL (http://categoricaldata.net), which generalizes relational theory with category theory, you can annotate schemas with equations and have them checked at runtime, or at compile time with an automated theorem prover (e.g., to establish that a query into a schema with a constraint will always materialize an instance that satisfies that constraint). One example is de-normalization: https://www.categoricaldata.net/denorm.php


I get that with some queries, you're not sure how the data is going to be retrieved, and letting the database figure all of it out for you is a good strategy. But with a lot of queries, particularly the ones that I'm doing in the milliseconds of a pageload, I want to be very sure that all my joins are hitting efficient indexes. I hate that someone can change my schema, and that can turn my efficient index lookup into a horrible scan, without breaking tests.

Folks who know more SQL than me: Is there a good way to say "I would rather this query fail than try to scan a table?"


> I hate that someone can change my schema, and that can turn my efficient index lookup into a horrible scan, without breaking tests.

They can't, if you do perf testing as part of your pre-release testing, which you should do.

If you aren't doing perf testing, you are saying perf isn't an acceptance criteria, so why are you upset that breaking perf doesn't break tests?


Where you start to go wrong is where your talk starts to be of "YOUR" schema. The schema isn't yours, it's the company's. And guarding it is the DBA's job.

(I understand full well that that is a problem if the company has kicked out the DBA role and handed it over to the individual programmers, but perhaps that is precisely the problem.)


Fair, though maybe I could describe the same problem from the other end. If I'm a DBA and I want to change the way indexing is done, I might need to audit every query in my company to figure out which ones depend on the old index. One way or another, it seems like knowledge about a query's intent to use an index could've been captured explicitly, in a way that's automatically enforced in the future against any number of accidental breaks. But instead all that knowledge is implicit.


Integrating EXPLAINs in your build/deploy process should, in principle, make it possible for anyone to address any concern in such realms.

The DBA mentioned here, for example, could first run a query over the EXPLAIN results, and he'll have his "company-wide query audit" in a matter of seconds.
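As a minimal sketch of the idea using SQLite (Postgres's `EXPLAIN (FORMAT JSON)` lends itself to the same treatment): plan output can be inspected programmatically, so a build step could flag any query that falls back to a full table scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("CREATE INDEX t_val_idx ON t (val)")

def uses_full_scan(sql: str) -> bool:
    """True if any step of the query plan is a full table scan."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column is a human-readable detail string,
    # e.g. "SEARCH t USING INDEX t_val_idx (val=?)" vs "SCAN t".
    return any(row[-1].startswith("SCAN") for row in plan)

# The indexed predicate uses the index; the expression defeats it.
print(uses_full_scan("SELECT * FROM t WHERE val = 'x'"))
print(uses_full_scan("SELECT * FROM t WHERE lower(val) = 'x'"))
```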


I am quite sure other implementations of SQL have similar tools, but I know this from my support of an iSeries.

The SQL scripting function is a tool run from a desktop; all emulation and such is Java-based, with the feature to ask the system what the query is doing. The feature, called Visual Explain, will explode the query into a graphic representation of how the system optimized it to run. It will recommend indexes as needed. This is very good for understanding when table scans are forced, how files actually get joined, and more.


> And, as a fan of static typing, I can't help but cringe at the possibility of someone changing the constraints on the table without realizing that there are queries which implicitly depend on the old ones

When creating views and sprocs, SQL Server lets you mark them as 'schema bound', creating dependencies on the schema objects it uses.

Not sure if something like this exists in Postgres?


All views in PostgreSQL are schema bound, which is usually good but can be annoying at times.


Ah, I didn't know that!


I don't disagree with the obviously true statement, but this code comment comes to mind:

    // Dear maintainer:
    // 
    // Once you are done trying to 'optimize' this routine,
    // and have realized what a terrible mistake that was,
    // please increment the following counter as a warning
    // to the next guy:
    // 
    // total_hours_wasted_here = 42


Also, don't forget the hours wasted of the people having to learn another query language. :)


You can test that for yourself. Go through the three language-related sections on my site and decide for yourself how many hours you'd need to "waste" before you would "get it".

And as for "waste": if, while learning a better language, they are in addition also learning the relational model and to think relationally (their knowledge of both of which will be VERY poor if all they've ever seen is SQL), then there can't have been much time "wasted", can there?


Or the XKCD with the 14 competing standards and, "let's make one good standard," and now you have 15 competing standards.


I think "ease of use" over SQL is not the hill I would die on if I were trying to displace SQL.

It's far too embedded throughout the entire industry and as a data analyst, learning EdgeQL vs SQL and then being locked into a new startup database that could disappear in a year doesn't seem like a high probability strategy.

I wish the people all the luck but unfortunately SQL is "good enough", pretty standardized (I can use just about any relational database and get useful data by knowing the basics). The inconsistencies may be mathematically "ugly" but it's not hard to wrap your head around and overcome.


A good replacement tech usually gets piggy-backed on the tech it's going to replace.

That is, a reliable tool that translates 99% of normal SQL into readable Edge SQL, and vice versa, would help adoption a lot.

Remember how new JavaScript features became mainstream though transpiling, long before native implementations.


Agreed. I think this is one reason that Looker's LookML has been successful. Not that it's entirely what I'd want out of a "SQL replacement", but it's an enhancement that "compiles" down to SQL rather than looking to replace it. Plus, you can always go direct to SQL, in case you need to take advantage of some specific feature or complexity that their language doesn't address itself.


No it won't, because you add immense friction for no added features.


Composability is quite a feature for me.


I accidentally built an in-memory database that now lives prominently in our production stack. It works great, it's incredibly performant, the codebase is relatively simple (it makes heavy use of code generation), and it will scale very well - but not a day goes by that I don't think: what if I had just taken the time to adapt an existing solution to the problem set?

There are just so many free things you get with SQL and an established RDBMS that deeply impact application features, quality, stability, operations, and much, much more. I've had to write a custom MongoDB-like interface for querying, as well as a fair number of hacky bits to effectively cover the surface area of SQL in an inferior way.

I've learned tremendously, but I just wish people don't follow in my exact footsteps because that's probably wasted time.


SQLite is a wonderful tool for such a scenario.


As someone who ended up doing what you did, I also had days where I felt what you have described here. On the other hand, by going custom you can usually exploit some understanding you have of your problem to great effect.


The middlebrow dismissal strikes again!

Why is it so hard to imagine something displacing SQL? A simpler, more predictable syntax seems perfectly plausible--it could ship alongside SQL. Do we have Stockholm syndrome?

The negativity is surprising and at the same time predictable.


Because of the decades of things not displacing it - when someone suggests "oh we'll just do it simpler" they often are not seeing the forest for the trees.

Simpler languages have been shipped dozens if not hundreds of times, and they generally tend towards expressing the things they missed or not giving enough functionality for the things they missed.

I am not saying its impossible, but you're going to have to do a lot more than hand waving to justify the reverse position.


It's sort of like the mouse trap problem. Mouse traps and SQL are already incredibly simple and incredibly effective; that's why you don't see a reinvented mouse trap at home depot and why SQL remains unseated despite many efforts to replace it.


"that's why you don't see a reinvented mouse trap at home depot"

Interesting for you to say that. The last time I went looking for mouse traps, I found that every place I went had a reinvented type, and traditional style were extremely hard to find. Yet, it is true that the traditional style is simple and effective. I've learned from experience that anything else is likely to be useless.


You really think SQL is incredibly simple? I think TFA makes a pretty convincing argument that it is anything but.


Most common SQL statements read like somewhat stilted english. Many non-programmers find this particularly accessible.

Yes you can make some lovecraftian horrors if you really want to, but SQL is one of those things where just a little bit of knowledge goes a long way. If you can understand the basics you can get a lot of work done.

It's a lot like Excel. You can do some really complex confusing stuff in Excel. But you can also teach the basics to non-programmers quite easily, and command of the basic skills will be very empowering. Basic knowledge of Excel, like SQL, gives the user new ways to leverage computers when creating their own solutions to their own problems.


> Most common SQL statements read like somewhat stilted english. Many non-programmers find this particularly accessible.

The problem is not reading SQL, but writing it.

> It's a lot like Excel. You can do some really complex confusing stuff in Excel […] Basic knowledge of Excel, like SQL, gives the user new ways to leverage computers when creating their own solutions to their own problems.

I can’t speak to your experiences, but I’ve never in my life encountered someone who was not a professional programmer ever even contemplating using SQL for anything let alone creating their own solutions to their own problems. I think it’s safe to assume that the overwhelming majority of people who use SQL, are programmers who most certainly cannot get by with just basic knowledge.


Librarians and secretaries are two examples of "non-programmer" careers where functional knowledge of SQL is pretty common. Less so these days with secretaries, but more so for librarians. Not to mention tons of researchers across countless disciplines have SQL in their toolboxes. I've even met government bureaucrats with professional backgrounds in regional banking that know SQL. Previous programming experience? Using HP-12c calculators...

>I think it’s safe to assume that the overwhelming majority of people who use SQL, are programmers who most certainly cannot get by with just basic knowledge.

Most demonstrably do though, so there's that. If you cast a wide net when polling programmers, I think you'd find that mode level of knowledge was relatively low. You don't need to be a SQL rockstar ninja dude to do what most professional programmers are doing with SQL most of the time. Obviously advanced knowledge is good for any professional programmer to have, but the fact is there are a TON of people out there who only know the basics, and that works for them.


"I’ve never in my life encountered someone who was not a professional programmer ever even contemplating using SQL for anything"

I have. That's only an anecdotal observation, but it seems to me that the existence of Visual Basic and the popularity over the years, combined with the utter disdain for it by "real programmers" is evidence that, more generally, there are a huge number of kinda, sorta, programmers who are outside the IT culture.


In the business world (e-commerce in my case) I know plenty of business analysts who write SQL as their only programming experience. That’s part of what I like about SQL. It bridges the gap.


Our support guys are mostly hired from our customer base. They have great domain knowledge, but do not have any formal tech training.

After some time, most can handle enough SQL to help customers with basic issues that cannot be handled in the application. Some have become quite good at it, and can do quite non-trivial stuff. None of these folks write any code beyond SQL.


> Yes you can make some lovecraftian horrors if you really want to, but SQL is one of those things where just a little bit of knowledge goes a long way.

I agree. At the same time a little bit of knowledge is incredibly dangerous.

That query, which worked so brilliantly on the test system during development, suddenly grinds all of production to a halt.

The basic problem is that indexing and other physical performance boosters were not really required when those queries were tested with 3'000 customers.

Being set-based that's quite different when you suddenly deal with 30'000'000 customers and a number of joins, which may not be supported by indexes, since that was never obvious in development.

That said: I'm not arguing against SQL. It's a great language for its purpose. What I do argue for is to have an SQL domain expert and an expert on how it physically maps to the underlying database engine for more complex projects.

Such a resource can be immensely valuable in assisting application developers to avoid major mistakes when they deal with the underlying database.

Edit: A couple of issues, which actually negated my argument upon reread


Certainly there are complex aspects of it but you could teach someone how to do most of what you need to do in SQL in under an hour.


> The middlebrow dismissal strikes again!

Yes, it was that, but, still, there's a big and growing hurdle here that many similar efforts, with similar objective merits compared to contemporary SQL implementations, have failed to overcome, and not a lot of reason provided to think EdgeDB is better positioned.

> Why is it so hard to imagine something displacing SQL?

Because systems providing just as good solutions to largely the same set of SQL deficiencies have been produced and failed to displace SQL for a couple of decades.

The problem isn't doing better than SQL. It's doing enough better than SQL to overcome the depth of knowledge, experience, support, tooling maturity, and comfort people have with SQL. And that moat gets deeper over time, on top of SQL getting internal mitigations, if not actual solutions, to some of the problems over time.

That said—as I’ve done with several before—I’ll probably download EdgeDB and try to do some stuff with it.


That's like saying nothing will replace COBOL. Or nothing will replace Fortran. Or C++. Or Java.

There's no reason the industry can't move to new technology for new projects. No one is going to rewrite legacy applications in the new language, at least not until the transition is so far along that SQL specialists are aged out and costly. But for new things? It's totally acceptable to pick new languages.

And btw, this has already happened in some areas. Most developers I know don't code in SQL: they use an ORM provided by their language runtime or a support library. That's essentially the same thing.


> And btw, this has already happened in some areas. Most developers I know don't code in SQL: they use an ORM provided by their language runtime or a support library. That's essentially the same thing.

This works well enough until your ORM shits the bed and you're forced to figure out why your SQL database is "slow".

SQL isn't slow; ORMs are just terrible when you hit an edge case or when you pretend there isn't a relational model behind your opaque materialised objects.

I used to be one of those developers you mentioned, but I'm not anymore after years of debugging "SQL performance issues" (hint: it was the ORM), and actually taking the time to learn the language and take advantage of specific RDBMS features. My preference has shifted to just use a lightweight library to materialise objects (like Dapper), and write the queries myself.
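A rough sketch of that lightweight-mapper style (Dapper itself is a .NET library; this uses Python's sqlite3 and a made-up `query` helper as a stand-in): the SQL stays hand-written, and only the row-to-object step is automated.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Customer:
    # A plain object: no ORM metadata, no lazy loading, no query generation.
    id: int
    name: str

def query(conn, cls, sql, params=()):
    """Materialise rows into objects; the SQL itself stays hand-written."""
    cur = conn.execute(sql, params)
    cols = [d[0] for d in cur.description]
    return [cls(**dict(zip(cols, row))) for row in cur.fetchall()]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers (name) VALUES ('Ada'), ('Grace')")

customers = query(conn, Customer, "SELECT id, name FROM customers ORDER BY id")
```

Because the query string is visible and explicit, there is no generated SQL to reverse-engineer when performance goes wrong.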


Oh I couldn’t agree more. I didn’t mean it as an endorsement of ORMs, but it does demonstrate that SQL isn’t entrenched as the sole user interface to RDBMS.


Would you adopt an extra abstraction layer (bugs, maintenance, incompatibilities and surprises included) over your database to get a coherent handling of nulls and a few other optimizations of this level?

I know I'm staying with the nulls. I wish them the best of luck, and if it survives to maturity, I'll happily go get the ~1% (probably less) more productivity they offer. But right now I'm not moving. It's sad, really, but things are stacked against them. Change is costly, so we lose all the small changes that could compose into something huge.


You even don't have to "imagine" it anymore, it already exists. See the projects list at http://www.thethirdmanifesto.com .

The crux is: it takes much, much more to achieve that than what the average C# or Java coder has to offer. So the average C# or Java coder will dismiss those things too.


It could happen. However, the probability of success for any given attempt is low. Consider all the attempts at improving on JavaScript before Typescript.


Totally agree. It’s a bedrock technology that’s near-universal as far as company needs, at least in startup and web development environments.

The JS framework churn makes me super appreciative of the stability that SQL brings.


JS framework churn isn't what it was five years ago


You're right, it's far far worse now.


Citation needed? I've been using React for 5 years and don't foresee not using it any time soon.

Even if it's still true, "churn" translates to progress. It would be hard to argue complex web application UI development isn't better off now than it was 10 years ago.


> Citation needed?

I don't have a citation, but a colleague of mine who's working on front-end projects repeatedly complains that today's JS stacks require half a dozen base JS packages, which in turn download dozens if not hundreds of ancillary packages, not to mention requiring a couple of transpilers and a bunch of tooling.

And for what? Well, just to be able to render some text and a couple of buttons.

Nowadays we have whole server projects that take less than 50MB of source code and dependencies to build, while a miserable SPA with a login screen and a couple of menus and buttons requires nearly 400MB of JS.

That's pretty bleak.


Serious question: why do you care about 400MB vs 50MB of dev dependencies? Is your internet connection slow? Are you running out of hard drive space?

There are projects like create-react-app and Parcel which offer very reasonable zero-configuration toolchains.

If you care a lot about runtime dependency weight, there are lightweight libraries like Preact (3kB).


I know. Things were so much nicer 5 years ago just before React, and the other myriad frameworks we have now, came out.


Out of curiosity, which frameworks/libraries are you referring to, and why do you think they are nicer than the "UI as a function of state" style libraries like React?


SQL is good enough for the existing set of applications, but that's not really saying much. There are lots of other applications for which SQL is not good enough, and those applications either don't exist because an affordable alternative doesn't exist or they implement their own proprietary database (e.g., many popular BI tools). It's safe to say that your use case--using SQL to perform one-off queries--is fine; writing a program that can dynamically build (performant) SQL to access data of arbitrary schema is quite a lot harder even if you can assume a single implementation. And much of this difficulty comes down to lack of composability.

Perhaps SQL is fine if it's your interface for accessing data on a one-off basis, but if you're trying to build a complex tool on top of it (say, an analysis tool for arbitrary data), the inconsistencies and performance concerns mount. People often end up inventing their own proprietary databases to do these analyses (e.g., virtually any business intelligence tool) assuming they can afford to do so. Perhaps EdgeQL isn't the ideal alternative, but as it is SQL is not good enough for many use cases.


SQL is extremely expressive, it's almost impossible to build something that cannot be expressed in an SQL query. In most cases when people feel like SQL cannot do something it is either because the Database does not implement a part of the standard or because they are not familiar with some of the more advanced usage of SQL. Simple SELECT FROM WHERE clauses, even including JOIN, are still fairly simple compared with what you CAN do if you want.

I'd recommend reading the PGSQL manual; they go very in depth about many of the supported features and how they are implemented and can be used.


"SQL is extremely expressive, it's almost impossible to build something that cannot be expressed in an SQL query."

That's kind of orthogonal to what I think is the issue being expressed here.

SQL can do many things; the problems tend to be when a query doesn't perform consistently and predictably. There's always a balance to be struck between communicating what is to be done, and how it is to be done, and SQL leaves so much of the "how" out that the query interpreter/optimizer is incredibly sophisticated and does a fantastic amount of work and yet frequently gets things spectacularly wrong, maybe due to misconfiguration and maybe due to fundamental limitations.

Obviously more information on how to do something is not always better; otherwise we'd be using assembler. But there is a balance.

Experts tend to say "write everything in one query, and if it doesn't work, fix the configuration of your database" which is not helpful given the division of responsibilities in any company. But they will say that because they are devoted to the idea that all that expressiveness is good for something.


> SQL is extremely expressive, it's almost impossible to build something that cannot be expressed in an SQL query.

If that were true, people would build RDBMS’s in SQL.


SQL isn't a general purpose language, it's a data query language.

It fulfils its purpose and does it well.


Not true.

Since the addition of SQL/PSM (1996 IIRC) it has become a computationally complete language (procedural like many of the others) with variables and loops and what have you.


That's only true tautologically--i.e., if you decide to constrain "its purpose" to the set of things SQL does well. If you want to do something perfectly reasonable--like programmatically building queries to access data of arbitrary (read "unknown at compile time") schema, you'll find it's quite hard to do this, at least if you care about performance at all. Largely because SQL doesn't compose well.
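A sketch of what that looks like in practice, using Python's sqlite3 (the `select_all` helper and the table are made up): with the schema unknown until runtime, you end up introspecting catalogs and assembling strings, with engine-specific introspection and no help from the query language itself.

```python
import sqlite3

def select_all(conn, table):
    # Introspect the schema at runtime (PRAGMA table_info is SQLite-specific;
    # every engine spells catalog introspection differently, which is part of
    # the portability problem).
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    # Assemble the query by string concatenation: quoting, validity and
    # performance are entirely the caller's problem.
    col_list = ", ".join('"%s"' % c for c in cols)
    return 'SELECT %s FROM "%s"' % (col_list, table), cols

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'x')")
sql, cols = select_all(conn, "t")
rows = conn.execute(sql).fetchall()
```

Nothing here composes: the fragments are strings, not values the database language itself can check or optimise.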


Optimization requires knowing the schema, and query usage patterns, and data stats - which competent RDBMS engines use to great effect already.

Sounds like you're looking for a magic silver bullet - there's no free lunch in our field though.

Lastly, the comment I was replying to can be paraphrased as "well is SQL is so great why aren't RDBMS' built using SQL, huh?". Which is a ridiculous question since SQL isn't the right tool for that job - its very name tells you that.

If you want to continue arguing against strawmen do it with someone else.


Most of an RDBMS is actually built using SQL. (For example, in PG constraints and foreign keys are done using triggers and SQL functions.)


Same goes for JavaScript. It's so ubiquitous that it's worth putting up with the downsides.

It'll take several unicorns and Fortune 100s hiring thousands of engineers to code in an SQL-alternative to create an ecosystem large enough to eventually overtake SQL.


This situation seems comparable to Typescript in this analogy. It has taken several years, but I feel like it is beginning to take hold.


Agreed on all of that. Further, I wouldn't embrace something intending to displace SQL without it being authored by someone like Anders Hejlsberg (Turbo Pascal, Delphi, C#, TypeScript). That sort of involvement grants confidence that it's as "correct" as it can be for most users. That matters for buy-in. It can happen, Kotlin is a decent example, but is not embraced as widely as TypeScript has been. I'm sure that confidence plays a big role there. The stakes are much higher here with SQL, than in Java or ECMAScript.

There's plenty of brilliant people out there, but when you're talking about replacing the most successful data storage language in the history of mankind, you need everything. SQL has been "killed" many times, everyone wants to sell something. It would probably take involvement from a FAAMG entity. I think the first clue that there's no real room for technical innovation and we're staring at only the opportunity for technical churn, is that no FAAMG players, who definitely operate at-scale, have tried to displace SQL outright already.

TypeScript or Kotlin are really the closest, best and most recent examples of what would need to happen. For me as an end-user, ubiquity and skill-reusability matters. If you don't like SQL, there are ORMs.


In the financial industry, I have seen a couple of places where SQL was the interface for their payment systems. Mind you that these payment systems were not written using any kind of relational paradigm.


Hi - a lot of commenters have valid concerns and critiques, but I joined HN after lurking for years to say that I really like the direction you’ve taken! I’m particularly happy about the convenient syntax for joined inserts and sum types.

I see that you’ve built this as a patched version of Postgres, but I’m curious how much of this syntax you could implement as a client library and shell that would run against an existing Postgres instance.

Right away, you would get adoption from people who have an existing Postgres, or who want to take advantage of SaaS offerings like AWS Aurora.

Longer term, I could imagine the client/shell being extended to support multiple backend DB dialects, even things like Spark or Redshift which you’d have a hard time modifying intrusively.

It could also be cool to explore interoperation with existing schemas written in plain SQL, so people could adopt it incrementally that way.


Thank you for your comment, it aligns pretty well with our current thinking about this.

> It could also be cool to explore interoperation with existing schemas written in plain SQL, so people could adopt it incrementally that way.

Yes, this will happen too.


The article criticized SQL because the following expressions are incompatible

> SELECT * FROM table

and

> SELECT count(*) FROM table

This is actually not true, as both return table values. It then says that, in EdgeQL, every expression results in a 'set'. This is a distinction without a difference.

I don't disagree we can do better, but this is the same.


It's only a slight misstatement; scalar queries are a subset of table queries in SQL, rather than a different thing, but scalar queries are allowed in places where other table queries are not.

The difference in EdgeDB isn't really everything returns a set, but seems instead to be that everything consumes sets and not just scalars.


You are exactly right about everything consuming sets in EdgeDB. Even when a function is defined on scalars, it's really defined on singleton sets. Literals are also singleton sets, so "1" and "{1}" are equivalent and so are "foo(1)" and "foo({1})". Usually we omit the set braces for singleton values to reduce visual noise.


There is a difference. You are right in a sense that scalar expressions are a form of a table expression, but...

This is a valid expression:

  SELECT (SELECT count(*) FROM table) + 1
This is not:

  SELECT (SELECT * FROM table), 1
In EdgeQL:

  SELECT (count(SomeType) + 1)
and

  SELECT (SomeType, 1)
are equally valid.


The brackets make it look like they are both subqueries contained within the EdgeQL examples. As such, you could write the examples in SQL like this:

    SELECT count(*) + 1 FROM table
And

    SELECT *, 1 FROM table
Both of which are valid in SQL. And you'd get the same result as in the EdgeQL examples.

Indeed, a more relevant second counter example would be:

    SELECT (SELECT * FROM table) + 1
Not valid... unless `table` only had one row and one column containing a number.


Yes, and that's actually the point being made in the post. You need to rewrite your queries to make SQL happy.


The EdgeQL query has no obvious meaning, unless (,) calculates the cross product. Saying

SELECT Table1, 1

returns each row of Table1 along with 1, and that 1 returns the set {1}, means that you have arbitrarily assigned each row of Table1 a row from the set {1}. This gives what you want in the case of a scalar, but what about a non-scalar?

For example, suppose Table1 contains (John, Smith), (Alice, Perkins), (Bob, Best). Then, what should

SELECT Table1, {1,2}

return? If you do not say cross product, you have arbitrarily assigned rows to one another, resulting in meaningless data. If you say cross product, then you have rewritten the SQL expression

SELECT * FROM Table1, (VALUES (1), (2))

I don't disagree the SQL syntax is longer and has some unnecessary keywords, but unlike EdgeQL, the query means something particular.


In EdgeQL, expressions are element-wise functions over the cross-product of the input sets:

    edgedb> SELECT {'First', 'Second'} ++ '!';
    {'First!', 'Second!'}
EdgeQL queries are essentially set comprehensions.
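That element-wise-over-cross-product semantics can be modelled in a few lines of Python (a rough illustration of the idea, not EdgeDB's actual implementation):

```python
from itertools import product

def lift(fn, *sets):
    """Apply fn element-wise over the cross-product of its input sets."""
    return [fn(*args) for args in product(*sets)]

# SELECT {'First', 'Second'} ++ '!'  -- the literal '!' is the singleton {'!'}
result = lift(lambda a, b: a + b, ["First", "Second"], ["!"])

# With two multi-element sets, every pairing contributes one element.
sums = lift(lambda a, b: a + b, [1, 2], [10, 20])
```

Singleton sets make scalars a special case rather than a separate type, which is what lets the same rule cover both examples.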


Right, so SELECT Table1 and SELECT 1 compose to SELECT Table1, 1 in the same way that

SELECT * FROM TABLE1

and

SELECT 1

compose to

> SELECT * FROM TABLE1, (SELECT 1)

Aside from the different syntax, it is the same kind of composition. Not sure what EdgeQL has gained here, then.


Erm, isn’t the syntax the whole point of this discussion?


Is this such a common problem that you need a new language to solve it?


We think that people struggling with writing good/advanced SQL queries, and the existence of ORMs and other heavy frameworks to make RDBMSes "friendly", is a serious problem. And inventing a new query language is a necessary step to make relational databases more accessible.


Just to clarify (and I strongly dislike overuse of ORMs)...

Some people use a thing x

A subset of those people use a subset of x called y

This subset of people have built a tool to make y easier to express

Thus x is invalid

I think that's a really bad argument. If everyone were using ORMs then we'd all merrily forget about SQL and just use ORMs, which would become the new "language for structuring queries", but we don't all. EdgeDB in particular thinks that ORMs provide a much degraded expression set compared to SQL, and thus wants to tweak X so that it's as expressive as the tool for expressing Y, without making it just about Y. I think your goal boils down to shifting the trade-off point between X and Y, the extreme of which is writing a very expressive Y that also supports X but in a non-expressive manner.


> SELECT (SomeType, 1)

This cannot be the case if you call each thing a 'set'. Unless `(a, b)` calculates the cross product, there is no meaningful non-arbitrary way to assign each element of a to an element of b. That would depend on an ordering, which makes it not a set to everyone but a marketing dept somewhere.


`(a, b)` is a cross-product, just like "SELECT * FROM a, b"


They're saying that both of those expressions are valid, not equivalent.

SELECT count(*) FROM table

and

SELECT * FROM table

Are both valid, not equivalent.

Furthermore, listing any kind of set depends on some ordering, be it random.


> SELECT (SELECT * FROM table), 1

What would you expect to be returned by this query?


Yes.

I think many here totally miss the point. It's the same mindset that leads people to believe C, bash or unix are good(?), so why bother?

I'm very lucky that I started with FoxPro, so I know what it is to live with a MUCH better thing than SQL.

To the point:

- "SQL is standard, everywhere... why try to change or do something else?"

I hope none here work on new software products.

And also, the point is not to try to replace ALL the OLD with the NEW; it is to try to DO BETTER THAN THE OLD.

I suspect most of us are in that business... right?

- "SQL is a good language!"

Superficially, yes... as good as JavaScript or C. More exactly, for being the "default" option we are VERY lucky SQL is not like C or JS. But "good" it is not. It is adequate.

IF you DON'T KNOW WHY then learn the why first. For those of us who LIVE in the RDBMS world it is clear as water that SQL is not the ideal; it is what we have.

----

It is important to understand that "SQL", RDBMSs and the relational model are badly misunderstood (just look at how the NoSQL movement markets itself!)

- SQL IS NOT THE RELATIONAL MODEL; it is BASED on it.

The relational model is MORE expressive, powerful, flexible and EASIER than SQL.

- SQL IS NOT THE RDBMS

Much better APIs can be built on top of an RDBMS, but SQL is a poor language for that. This is part of why so many extensions exist.

Think, for example, why you can't do

    SELECT key FROM BTreeIndex_CustomerCode
    GROUP BY (whatever -- And I mean GROUP, not SUMMARIZE!)
    let city = SELECT name
    let filter = WHERE id = @id
    let query = city + filter(id=1) 
and many many other things like that

Many things that get mixed up with the "SQL" language are part of the implementation or capabilities of the RDBMS.

- However, SQL IS A FINE EXAMPLE OF HOW TO DO DATA

And in fact, it is not that hard to imagine a much better version; SQL is not that far from a better option!


I worked as a FoxPro programmer for 18 years. I later switched to the Clojure language for 8 years. I used the functional programming language Clojure as SuperFoxPro, which is very easy to use and enjoyable. I have formed a new programming idea ---- `Everything is Foxpro (RMDB)`. ;-)

I advocate: building a relational data model on top of hash-maps to achieve a combination of NoSQL and RMDB advantages. This is actually a reverse implementation of PostgreSQL.

[Clojure is a functional programming language based on relational database theory](https://github.com/linpengcheng/PurefunctionPipelineDataflow...)

[Everything is RMDB](https://github.com/linpengcheng/PurefunctionPipelineDataflow...)

[Implement relational data model and programming based on hash-map (NoSQL)](https://github.com/linpengcheng/PurefunctionPipelineDataflow...)


Excellent take. The functional model is not too far.


I didn’t really understand your post other than a high level point that you believe SQL is insufficient.

I’m not quite understanding the benefit of your example. I don’t personally want to write that style of syntax.


> you believe SQL is insufficient

I don't believe it; it is insufficient, as a fact.

Compare

    let city = SELECT name
    let filter = WHERE id = @id
to

    let city = map name
    let filter = filter(id = id)
    let query = city |> filter 
this is to show that SQL, despite being a programming language, can't compose well.

The syntax is not the point.

You can build full apps in Forth or Lisp (as examples of even more minimal langs than SQL), but now consider the idea of building a full app in SQL, without another language making up for the rest of its limitations.
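The kind of composition the `let city = ...` lines above are asking for is trivial once query fragments are first-class values; a hypothetical sketch in Python (all names and helpers are made up for illustration):

```python
# Query fragments as plain first-class values over rows (dicts).
def where(pred):
    return lambda rows: [r for r in rows if pred(r)]

def select(*cols):
    return lambda rows: [{c: r[c] for c in cols} for r in rows]

def pipe(rows, *stages):
    # Compose any number of fragments, in any order, reusing them freely.
    for stage in stages:
        rows = stage(rows)
    return rows

cities = [{"id": 1, "name": "Miami"}, {"id": 2, "name": "Boston"}]
city = select("name")                              # let city = map name
by_id = lambda i: where(lambda r: r["id"] == i)    # let filter = filter(id = id)
result = pipe(cities, by_id(1), city)              # let query = city |> filter
```

SQL clauses, by contrast, cannot be named, stored, or recombined as values without wrapping everything in views or subselects.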


You’re asking me to consider the idea of building a full app in SQL. I would ask you - why?

SQL wasn’t designed nor ever meant to be able to write a full app. You’re expecting it to do something it wasn’t designed to do. Because it is a programming language doesn’t necessarily mean it ought to be able to solve any kind of problem.

I personally don’t think SQL is all that great, for various reasons (lack of standardization with various dialects and their own gaps or abilities).


> I would ask you - why?

Optimally, you'd want to build your full app in a single language: frontend, data queries, everything. That eliminates every problematic interface where you have to re-encode values, verify types and other invariants, connect different runtimes, abstract foreign behavior, etc. (Somehow the JS developers talking about isomorphic code only cite code reuse... And the Lisp ones focus on metaprogramming.)

We have a variety of good application languages, so it's a safe bet that one can create a not-horrible one; we also have a few standards for IPC and some solid literature on how to integrate them with a language. Foreign interfaces are always iffy, but it's also settled how to create a not-horrible one. The riskiest part is data querying, so it makes sense to start with a proven implementation there.


> You’re asking me to consider the idea of building a full app in SQL. I would ask you - why?

"We Can Do Better Than SQL"

And the fact that some can't imagine building a full app in SQL (not as it is TODAY but as it could BE) is sad.

I lived in that world before. With the dBase/FoxPro family you can totally build a full app in a database language, in fact more easily than with Python.

---

The point here is that obviously SQL is not meant for programming, but for one-off queries here and there. But there is no reason not to aim higher.

BTW, part of the lament of the creators of the relational model is that SQL has, like COBOL, ruined the minds of millions of developers.


>”And the fact that some can’t imagine to build a full app in SQL is sad”

You are outlining some general purpose programming language that allows querying, information retrieval, maybe declaratively or functionally or using objects? I don’t know - you provide no other details.

How expressive would this language be? What abstractions would it provide out of the box? Your examples allow some code reuse via naming/aliasing specific clauses... is that it?

Specifically what apps would it allow someone to build? And what trade offs result from those decisions? Is it better suited for mathematical operations or data science vs building general enterprise applications? There’s a difference. Is it interpreted like SQL? Would it need to be compiled?

You’re doing a lot of handwaving, and when pressed for details, you respond with a thinly veiled insult - the problem is with the reader or me specifically for not being able to read you mind. We should take you at face value without question.

Good luck with that attitude in your career.


> You are outlining some general purpose programming language that allows querying, information retrieval, maybe declaratively or functionally or using objects?

Yes. That is the mindset I see in how to "Do Better Than SQL".

> How expressive would this language be?....

As expressive as any other. By coincidence I'm building a relational lang as a side project, but my post was more about thinking about how SQL's limitations could be lifted.

If it's all just "SQL is only for data retrieval", then there is not much else to add than maybe spicing up the syntax.

> You’re doing a lot of handwaving, and when pressed for details, you respond with a thinly veiled insult

That is not my intention. When I said it is sad that some can't imagine building a full app in SQL, I meant it in general. I think that will sound as bonkers as saying "build an app with xml" or similar. But the point of this post is about how to go beyond what SQL is.

----

A bit more detail, now that you ask for it. My idea is to revive the spirit of the dBase family of languages. There, you don't need ORMs; building a database app is like building any other, and the language was powerful enough that the majority of its users didn't need anything else.

The main abstraction is the "relation", which looks at data not as scalars or simple lists, but as SCHEMA + DATA:

    city = [name:str, state:str; "Miami", "Fl"]
                  schema             data
and ALL values are relations:

    1 //is alike [value:i32; 1]
It is similar to kdb+:

http://www.timestored.com/kdb-guides/kdb-database-intro

and allows working in vectorized form, as in array languages:

    1 + 1 = 2 //internally: [1] + [1] = [2]
    [1, 2, 3] + 1 = [2, 3, 4]
but where array languages have only 1 "column" of data, relations are 2d.

The next power-up is adding the relational operators:

https://en.wikipedia.org/wiki/Relational_algebra

And because all values are relations, everything can be "queried":

    [1, 2, 3] ?where #value == 1 //show [1]
    [1, 2, 3] ?union 4 ?join city.id ?sort #name
    for i in File.open("...") ?where empty(#line) == false do
etc. Is it like LINQ? Yes. Where it differs from "just use LINQ, duh!" is that in almost all languages scalars are first class but collections are not. Instead, the relational model operates on sets/collections, and everything can be generalized to 0, 1 or many values.

All of this is without talking about storage. It is just a regular language with regular stuff. In my case I'm working on an interpreter, immutable-first, with structural types, made in Rust:

http://tablam.org

and my plan is to use it to replace (embed) a lot of code I have around in several languages, and to do data-processing logic.

I wish to make it bigger, but those are my plans for now.

My case is enterprise apps, because that is what I do most. But I don't see any reason it couldn't expand. Implementation <> Language.


Agreed with the sibling. Large chance I have just missed your goal.

It may help to expand on your proposal. For example, what do you mean by group by, but not summarize? The group by clause says nothing about how to combine data not in the group. That has to be in the select. And I have yet to find a solution that beats ROLLUP for giving a great overview of data.

About the only thing that sucks in the current syntax is that it is not possible to get autocomplete in the select without having some from.


> What do you mean with group by/ ROLLUP

ROLLUP is not the same as grouping. With

    [1:a 1:b 2:a]
Group by (as it normally works in other langs):

    [1:[a,b] 2:[a]]
SQL's "group by" instead is a summary/tally of data:

    [1:2 2:1] //assuming count
ROLLUP is for making a nice report!

    [1:[a,b] Count:2;  2:[a] Count:1]
What SQL lacks (in this case), but the relational model does not, is the ability to nest relations as easily as JSON.
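The three shapes being distinguished here can be sketched in Python, with nested results standing in for nested relations (a toy illustration of the distinction, not any particular engine's behavior):

```python
from collections import defaultdict

rows = [(1, "a"), (1, "b"), (2, "a")]      # the [1:a 1:b 2:a] example above

buckets = defaultdict(list)
for key, val in rows:
    buckets[key].append(val)

grouped = dict(buckets)                    # GROUP: keep the nested values
summarized = {k: len(v) for k, v in grouped.items()}   # SQL-style GROUP BY + count
rollup = {k: {"values": v, "count": len(v)}            # report: groups AND tallies
          for k, v in grouped.items()}
```

Only `summarized` is what SQL's GROUP BY gives you directly; the nested `grouped` and `rollup` shapes require extra machinery (aggregation functions, ROLLUP, or JSON features) in SQL.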

----

Part of the deceptive power of SQL is that it is very adequate, and for "just querying" data it is OK for most cases.

But as a "programming language" it lacks some extra power and versatility.


> What SQL lacks (in this case), but the relational model not, is the ability to nest relations as easily as JSON.

SQL has had nested relations since SQL:2003[1] (called multisets). They are not widely supported and frankly speaking I don't like them.

JSON was finally added with SQL:2016[2].

[1] https://sigmodrecord.org/publications/sigmodRecord/0403/E.Ji...

[2] https://modern-sql.com/blog/2017-06/whats-new-in-sql-2016#js...


This feels like a misunderstanding of SQL. It is not a programming language. It is a data selection language.

And group by does work the same as other languages. However, you can't talk of the group by without the select. So, if you use count, or sum, or max, or ..., Yes it will be a summary. You can use string_agg or friends if you want all values. But... Probably safer to get each as a row.


"We Can Do Better Than SQL"

SQL is a programming language. It is just too limited, and that is why many can't imagine using it for a full project.

But those limitations make no sense. The amount of code, time and efficiency it could save is huge (I know, I coded in that kind of lang before), as discussed in:

http://blogs.tedneward.com/post/the-vietnam-of-computer-scie...

In particular, look at points 5/6 of the Summary. The irony is that we already have a "solution" (SQL) to the whole problem, but it is too limited and awkward to be used in practice.


I'm not sure why, but this post was hard for me to follow. The linked article is borderline incoherent to me.

What gets me, is even in languages where I'm not working with a relational database, the API for selecting data almost always drifts to something akin to SQL. To the point that I cringe if someone tries to force a new API.


> I'm very lucky that I start with FoxPro, so I know what is to live with a MUCH better thing than SQL.

For us who didn't use FoxPro, why was it MUCH better than SQL?


Because of the poor formatting and example (making the code look too much like SQL!), some nuance was lost. Consider instead:

    let keys = SELECT key FROM BTreeIndex_CustomerCode;

    let city = SELECT name;

    let groups = GROUP BY (whatever -- And I mean GROUP, not SUMMARIZE!);

    let filter = WHERE id = @id;
    let query = city + filter(id=1);

    for city in query GROUP BY keys
         ....


It's like SQL, but different. About as hard to learn, not fully developed and incompatible with years of tooling in the ecosystem.


This critique reminds me of Dart: a better JavaScript that wasn't better enough to be worth the transition.

I have the same concerns about the proposed language. Yes it's better, but is that enough? What's going to drive adoption?


Dart was better enough, as evidenced by the recent uptick of TypeScript. Had it been released in 2019, I could envision a different story.


You sound skeptical about the daydreams of a bored CS graduate.


SQL is an incredible language. I don't think we can do better, only different. It's not that common to mix two programming languages the way you mix SQL and application code. For example, having your app code in JavaScript and then writing SQL queries to access the data feels unfashionable. Polling also seems like something from the 80's. RethinkDB was a step in the right direction, but I think the mistake was to invent another query language, instead of making database querying seamlessly integrated into programming languages. It should be possible to write observables without having to context switch back and forth between data querying and app logic.

I'm a "full stack" developer/speaker, and for a long time I used English on the front-end, German on the back-end, and Spanish for talking to the database. When I switched to English on the back-end I saw a productivity boost, and I got better at English and started to like it more. But I still talk Spanish with the database. I could probably be more efficient using English to talk to everything. I would probably still use French for design though.


This is the wrong direction.

Let's get the history in place first. The relational calculus set up a set of operators. That's what Codd worked with. It was combined with a natural language project at IBM which gave us the vaguely English like syntax of SQL.

> In EdgeDB, the data schema is formulated in a way that is much closer to the contemporary application data model. ...Unlike SQL, EdgeQL can easily extract arbitrary data trees.

This is a reinvention of the hierarchical database, one of the architectures that was abandoned when the relational database came along. This was all fought out on a large scale in the 1980's.

The fundamental idea that you have a highly structured data tree is what the relational model threw away. "Data Oriented Design"[^1] does a pretty good job of explaining why, but roughly: we're accustomed to programming where we have an object/entity/value that represents a conceptual "thing" in our discourse. We compose things into other things, subtype things, and otherwise deal in things. And this turns out to be a disaster when lots of people need to extend and reuse hunks of the same database. If you give up on things as first class parts of your data model, and only talk about relations among attributes, a lot of those problems go away.

If you want to improve SQL, look at going back to the relational calculus without the natural language veneer. There have been a number of systems that tried that, though none have gotten real world traction. I'm also blanking on their names at the moment, so if someone whose memory is working better can help me, I would appreciate it.

[^1]: http://www.dataorienteddesign.com/dodmain/


> This is a reinvention of the hierarchical database

It's not. Just because data can be fetched as a set of "tree-like" things does not make the model hierarchical in any way. But it makes lots of practical use cases easier.

> The fundamental idea that you have a highly structured data tree is what the relational model threw away.

And then ORMs came and reinstated the status quo. Truth is, many applications actually benefit from a highly structured data graph. That said, you don't have to use the "highly structured" bits of EdgeDB, it's perfectly fine if all your types are simple tuples.

> We compose things into other things, subtype things, and otherwise deal in things. And this turns out to be a disaster when lots of people need to extend and reuse hunks of the same database.

I would be very interested in reading about this. Do you have any references? Most cases of "disaster" in my experience had to do with extremely poorly defined and documented schemas that had barely any relation to the actual business model.


I always wished SQL had a better handle on sum types. If I have a user whose favorite story is an instance of Either[Movie, Book], then it's already a pain to deal with in a nice simple way. And that's as simple as sum types get.


You can do that in EdgeDB:

  type Movie { 
    property director -> str
  };
  type Book {
    property author -> str
  };

  type User {
      multi link favorites -> Movie | Book
  };

  SELECT User {
    favorites: {
        [IS Movie].director,
        [IS Book].author,
    }
  };


Can you tell me whether I'm understanding this correctly?

Would this query result in, e.g. [(director: Null, author: J.K. Rowling), (director: Spielberg, Null), ...] or would it be [author: J.K. Rowling, director: Spielberg, ...] or just plain strings: [J.K. Rowling, Spielberg, ...]? I still don't totally get the model here.


If fetched as JSON: {"favorites": [{"director": null, "author": "J.K. Rowling"}, {"director": "Spielberg", "author": null}, ...]}

You can also query the actual object type by referring to the __type__ link:

  SELECT User {
    favorites: {
        __type__,
        [IS Movie].director,
        [IS Book].author,
    }
  };


Thanks for the clarification! Seems really cool :)


This struck me just yesterday. You're left with using nulls (which is a sin punishable by poor data quality) or multiple tables for each sum type. Annoying.


Tables should be lightweight. It's the cruft of the language and implementation that makes you avoid them.
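For what it's worth, the nullable-columns route can at least be made safe with a CHECK constraint. A sqlite3 sketch with a made-up movie/book schema (all names hypothetical), enforcing that exactly one side of the "sum" is set:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE movie (id INTEGER PRIMARY KEY, director TEXT);
CREATE TABLE book  (id INTEGER PRIMARY KEY, author TEXT);
CREATE TABLE favorite (
    user_id  INTEGER NOT NULL,
    movie_id INTEGER REFERENCES movie(id),
    book_id  INTEGER REFERENCES book(id),
    -- exactly one side of the sum type may be non-NULL
    CHECK ((movie_id IS NULL) <> (book_id IS NULL))
);
INSERT INTO movie VALUES (1, 'Spielberg');
INSERT INTO book  VALUES (1, 'J.K. Rowling');
INSERT INTO favorite VALUES (42, 1, NULL);   -- a Movie favorite
INSERT INTO favorite VALUES (42, NULL, 1);   -- a Book favorite
""")

# Setting both sides at once violates the CHECK and is rejected:
rejected = False
try:
    con.execute("INSERT INTO favorite VALUES (42, 1, 1)")
except sqlite3.IntegrityError:
    rejected = True
```

It's still a workaround rather than a real sum type, but at least the invalid states become unrepresentable.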


That is an acute observation and something I might consider dumping SQL for; or rather, I feel SQL could be upgraded to include them.


It is interesting to see new approaches. But SQL is so entrenched, so available and so good enough... it will be a very uphill journey to compete


Many comments (probably correctly) point out that SQL can't be displaced because it is ubiquitous and "good enough".

I wonder if there is room for a "TypeScript" of SQL that would allow developers to opt-in to whatever new language features or paradigms we feel SQL is missing.

It would then transpile down to regular SQL to be executed.


I was wondering the same thing. Why not rather than build a new database engine, write a library like an ORM that transpiles a language - like EdgeQL - to SQL, so you can attempt to use it, without switching underlying database engine, and still use your old SQL code?


EdgeDB is built on Postgres, it's not from scratch.


My Mac app, Strukt, does a bit of this. The main problem is that the basics of the various databases don't align very well. The functionality that's common to all databases is quite weak. Even such things as case sensitivity are drastically different. And for many of the differences, there's simply no way to emulate the other's functionality, correctly and efficiently.

C compiles to assembly or machine code, and TypeScript compiles to JavaScript, and in both cases you have the full power of the level you're targeting, so it's fine. An RDBMS has very specific fast-paths that you really need to hit, or the whole exercise becomes pointless. It's like trying to write a 3D game in JavaScript without any way to access the GPU or even SIMD ops. We can do 3D games in JS today, but that's really only feasible because the browser vendors went and exposed the basic fast paths.


Yeah then we get the best of both worlds. I might use that if it existed. All else being equal, EdgeQL the language looks great compared to SQL.


Isn't this what an ORM is? Or at least an ORM covers some new features that users can opt-in to without losing the ability to write raw SQL. I guess it's not the perfect analogy to TypeScript, but I think it's a pretty good comparison. The other thing I can think of are things like GraphQL which can be used as an abstraction on top of a SQL DB.


Some ORMs like Hibernate have their own QL, but they typically preserve the SQL data model.

This isn't just replacing SQL, they're trying to implement a new data model that is backed by Postgres.


The demise of SQL has been greatly exaggerated. If somebody wants to actually propose something better, the threshold is high. They should publish:

- A tool which transpiles their new language into (multiple versions of) SQLite, MySQL, PostgreSQL

- This tool should always produce a correct result (Modernizr) or a specific error (e.g. upgrade MySQL to XX+ to use `NEW_LANGUAGE_CODE_SNIPPET` JSON functions)

- A separate tool that up-piles SQLite, MySQL, PostgreSQL to the new language, possibly exploiting new streamlined expressions

- An editor with IntelliSense for the new language

Even then you can probably achieve a lot of value just by inputting table joins before the developer starts editing code.

---

If you're not doing that then you are just keyword spamming by mentioning SQL.


>The questions we often hear are “Why create a new query language?” and “What’s wrong with SQL?”. This post contains answers to both.

Ah, I can already tell I'm not going to like this article.

>lack of proper orthogonality — SQL is hard to compose;

Their reasoning: "The difference in structure is large enough to make any sort of source-level query reuse impractical."

This is blatantly false. The second example in the article using the join covers both the single-row and multi-row result cases. Additionally, SQL can be made very reusable and modular using tools like DBT, or within a DB using views or other constructs.
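To illustrate the view point concretely (a sqlite3 sketch with hypothetical emp/dept tables along the lines of the article's example): the join logic is defined once in a view, and any number of queries reuse it:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dept (no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE emp  (id INTEGER PRIMARY KEY, name TEXT, role TEXT, deptno INTEGER);
INSERT INTO dept VALUES (1, 'Sales'), (2, 'Eng');
INSERT INTO emp  VALUES (1, 'Ann', 'dept head', 1),
                        (2, 'Bob', 'dept head', 2),
                        (3, 'Cat', 'engineer', 2);

-- The reusable piece: departments joined with their heads.
CREATE VIEW dept_heads AS
  SELECT d.no, d.name AS dept, e.name AS head
  FROM dept d JOIN emp e ON e.deptno = d.no AND e.role = 'dept head';
""")

# Two different consumers of the same definition:
heads = con.execute("SELECT dept, head FROM dept_heads ORDER BY no").fetchall()
count = con.execute("SELECT count(*) FROM dept_heads").fetchone()[0]
```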

>lack of compactness — SQL is a large language;

It's certainly smaller in the keyword set than almost any other computer language I have used. This is like saying "It's a very cold day in the Sahara desert" when the temperature is 40C/105F.

>lack of consistency — SQL is inconsistent in syntax and semantics;

Is it? Or do people take liberties with their implementations? There is an SQL Standard and Postgres is compliant to it. Other databases not being fully ANSI SQL compliant is the fault of the DB creators, not the fault of SQL.

>poor system cohesion — SQL does not integrate well enough with application languages and protocols.

It is a relational language designed to be used within a relational database. Do the authors also complain that their screwdrivers are very inefficient at cutting meat?

They're onto something with the trickiness of NULL, but there have already been tons and tons of discussions on Hacker News - anyone with database experience, either analytical or transactional, will tell you that imagining a NULL-free world actually creates more problems than it solves.

I found this to be a highly misinformed article, personally; really it's a marketing ploy for their proprietary language. Just like most other "we have our own query language better than SQL" products, it probably isn't.


Thanks for the DBT tip. I've always wanted composable SQL statements but never known how to achieve it.

https://docs.getdbt.com/docs#section-what-makes-dbt-so-power...


We've started using DBT at my office for ETL into our data warehouse and it's a godsend. Being able to add tests to tables and views, built in documentation support, built in DAGs are all great, and the folks at Fishtown who make dbt are super responsive on their community slack channel. (Also it's open source.)

One pain point is the integration of non-SQL (e.g. Python) jobs into the flow. DBT only manages SQL scripts, so you can lose some of the elegance when you have to glue some external jobs to it. That being said, the development is pretty rapid and as I mentioned the support is great, so I'm excited to see where they go.


Agreed on most points, but not the keywords. SQL has hundreds of keywords.


Ah, you're correct. From the querying side, there's a couple dozen that you need to know. Different personas would only need to know a small subset but it's certainly true the list of keywords itself is large.


No, SQL is not a relational language.

Which is precisely why "better" is both possible and desirable.

And which is also why forward or backward "compatibility", to the fullest 100% extent, is intrinsically and inherently impossible.


This article misses the single biggest failing of SQL, in my mind: that it's not a language. It's a family of incompatible languages, with similar syntax -- like "Lisp".

I can't write a program that works against 'any database', because switching databases means a lot of extra custom work. That means databases can't easily compete against each other. (Oracle's license terms sure don't help.) That's terrible. It's worse than x86/PPC/ARM -- at least with CPUs, a compiler can generate machine code for any of them, coming from the same source code.


Your point seems a little like tilting at windmills to me. Different databases written by different entities who didn't have a huge incentive to strictly conform to each other's syntax varied in their implementations. This is just life.

> It's worse than x86/PPC/ARM

You're... saying that there should only be one CPU architecture too?


> This is just life.

Market forces naturally explain what we have today, therefore ... I'm Don Quixote. Sure, I'll accept the title. Likewise, I could say: if you don't like reading complaints like this, blog posts and internet comments might not be for you. :-)

> You're... saying that there should only be one CPU architecture too?

No. I'm saying that the analogous situation with CPU architectures was observed 50 years ago, and despite causing inefficiencies (especially at first), we've spent the effort to create abstractions to gradually make it easier to describe systems at a higher level. It generally doesn't matter, to either users or developers, that there are multiple CPU architectures. We still have many programming languages, too.

Databases have gone in the opposite direction: while adding features over the years, even when these features are 99.9% functionally identical between vendors, they've picked different syntax and data types. Even though they all speak "SQL", databases are more incompatible today than ever.

I guess what I'm getting at is: it's a good thing that compilers were invented before all software needed an ROI, and it's a shame databases didn't get the same treatment before corporate interests took over.


Exactly. The situation is even worse because many libraries support only "common ground of SQL" and therefore underutilize the real capabilities of the database.

We mentioned that there are "many SQLs" in this blog post, but not in this specific context.


I like SQL a lot, but I wouldn't mind a nicer interface to a relational database. TutorialD seems more elegant and expressive than EdgeQL though. If I were designing a new query language, that's where I would look for inspiration. Anyway I'm glad to see people experimenting!

I am waiting for someone to write a paper titled "SQL NULL is (nearly) a Monad." It is "contagious" just like mapping over None. Sometimes I think it'd be cool to add Maybe<T> columns to Postgres, where operators would do the right thing for Some(x) vs None, but then I think, "Wait, that's how it already works!" At least it is very close. But one way or another I would love to see a re-assessment of SQL NULL from the perspective of category theory. If we could redo NULLs as Maybe types, what would that improve? Incidentally, Leonid Libkin published a really cool paper in 2014 about handling NULLs with category-theory ideas, but it is more about drawing inferences from incomplete premises ("Incomplete data: what went wrong and how to fix it"). Maybe he'll write something about Maybe column types too. :-)

Also if you are interested in improving relational databases, Codd's 1979 paper "Extending the Database Relational Model to Capture More Meaning" is super interesting. The first part is where he adds NULLs (in order to add OUTER JOINs basically), but the second part is almost never mentioned and gives a way to query the schema itself as part of a database query. It looks a lot like doing graph database queries from your relational db. Also a better kind of EAV pattern. Also more support for OO-style inheritance. Somehow 40 years later it is still way more advanced than any existing RDBMS. It would totally break the parse-plan-execute pipeline of today's systems, but it is fascinating to think about. I wish more people would read this paper before trying to go "beyond relational"!


Check out http://categoricaldata.net for a categorical approach to database theory that uses a labeled null semantics from type theory.


Actually quite a lot of people told us they thought EdgeQL was inspired by Tutorial D :) Have you had a chance to actually try to play with EdegQL?


Not sure if there is something wrong in my brain but I have not been able to understand the hate against NULL. To me the three-valued logic feels very natural and just clicks. Lately I have been dipping my toes into Alteryx, which treats empty values in a way that definitely does not correspond to 'normal' NULL logic in databases and it feels very much like a constraint on building my workflows.


If a NULL were just a value in 3-valued logic, it could have been OK. However, that's not how it works out. Consider the following (that should be equivalent if NULL is just like Maybe in True-Maybe-False logic):

  SELECT null AND true;

  SELECT bool_and(column1) FROM (VALUES (null::bool),(true)) AS foo;
In PostgreSQL the first query produces NULL, while the second produces TRUE. Yet, it's also possible to get NULL as a result of aggregating values with bool_and:

  SELECT bool_and(column1) FROM (VALUES (null::bool),(null::bool)) AS foo;
And that's not how a 3-valued logic works.


You are mixing two concepts here: 3VL, and how most aggregates work in SQL: they drop NULL values before doing their work. That's why the first BOOL_AND example only sees one value, thus returning true.
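The split is easy to reproduce. A sqlite3 sketch (bool_and is Postgres-specific, so min over 0/1 values stands in for it here):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Scalar operator: three-valued logic, NULL propagates through AND.
scalar = con.execute("SELECT NULL AND 1").fetchone()[0]  # None

# Aggregate: NULLs are dropped *before* aggregating, so the single
# non-NULL value survives (min over 0/1 plays the role of bool_and).
agg = con.execute(
    "SELECT min(c) FROM (SELECT NULL AS c UNION ALL SELECT 1)"
).fetchone()[0]  # 1

# But aggregating only NULLs still yields NULL.
all_null = con.execute(
    "SELECT min(c) FROM (SELECT NULL AS c UNION ALL SELECT NULL)"
).fetchone()[0]  # None
```

So the "inconsistency" is really two separate rules (3VL for operators, NULL-skipping for aggregates) interacting.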


The one small thing I've often wished for is to place the from clause before the select clause. Having to type out the columns before the table means auto completion never works well.


I said this before and I will say it again: SQL is for data processing what IP is for networks: it's the neck of the hourglass.

There are many things happening below SQL layer (storage engines, implementations, hardware etc) as well as above it (applications, reporting engines, various services and whatnot). There are many things happening below IP layer (hardware, protocols) as well as above it (TCP, applications).

Even though there are better ways to build a neck of an hourglass, my bet is that (1) they are not much better and (2) it would be extremely difficult to get them replaced in all the existing hourglasses without breaking things terribly.

Conclusion: SQL is going to stay forever, like it or not.


Open source (Apache 2.0), and implemented on top of Postgres. This is quite interesting. I love SQL, but there's definitely room for improvement.


The critiques of SQL aren't wrong. I have written 4 parsers and can tell you the lack of orthogonality is a pain. Luckily we have the NIST compatibility guidelines to keep us safe. A replacement for SQL which contains the same compatibility tests would sell me on a new language, but it's really hard


We can do better. I use sql a lot though I’m not great at it. This seems much better.

The problem is getting a replacement on enough databases (storage engines?) so it’s universal. And performant. I’m not sure what’s involved with that but since there are a lot of open source dbs it seems possible.

Nice work


There have been tens of attempts over the years - a Dutch one called Xplain was quite good IIRC. But they all suffer from delivering too little extra (if fully adopted) against the incumbent.

Unfortunately, SQL is good enough for the majority of users and uses.


Saying you can do better than SQL is an extraordinarily bold claim, considering SQL is one of the most successful programming languages in human history and certainly the most successful fourth-generation programming language. Wide swaths of people you might ordinarily think of as non-programmers know how to use SQL in meaningful ways.

Doing better is certainly possible, but SQL is much better than a lot of people give it credit for.


In my opinion the example in the first chapter ("Lack of Orthogonality") is wrong. The subquery (is it called an "inline subquery"?)...

> SELECT name FROM emp WHERE role = 'dept head' AND deptno = dept.no

...should in my opinion definitely return only 1 row for each department - from a logical point of view returning multiple rows would mean that the data is corrupt upstream or that the organization itself is corrupt or that there is a lack of attributes in the DB (if no additional selection criteria like for example "management" or "operations" or "deputy" etc... can be added to the query - meaning that no sub/organization can have more than 1 person responsible for the exact same thing).

I admit that this is a very focused criticism and that I'm very happy with the current behaviour of the generic SQL language and its special cases which are linked to the DB being used (currently using Oracle, MariaDB, Clickhouse - used DB2, Kudu through Cloudera stack, PostgreSQL, maybe something else) and how it stores/processes the data etc.


> In situations where EdgeDB cannot infer the type of an empty set [2]

Why not bottom, or are containers not covariant? I'm worried that we're not clear about the difference between variables and values.

> Strictly speaking, EdgeQL sets are multisets [1]

Like nulls, this is another broken aspect of SQL.

Maybe you can address these concerns, but I'm skeptical that it's relational:

* It conflates types with relation variables

* It's using links rather than foreign key constraints

* It doesn't seem like the user can even specify a candidate key

* Which is why the "id" field is magical and special

* Constraints only seem to apply to scalars

The reason SQL has been successful is that tables are dead simple to understand, and you're throwing that out. This feels more like a graph database.

[1]: https://edgedb.com/docs/edgeql/overview#type-system [2]: https://edgedb.com/docs/edgeql/expressions/overview#set-cons...


>> Why not bottom, or are containers not covariant? I'm worried that we're not clear about the difference between variables and values.

Yes, `{}` is inferred as `anytype`, but we chose to restrict its use in output for interoperability reasons: the output of a query must be of a concrete, known type.

>> Strictly speaking, EdgeQL sets are multisets [1]

> Like nulls, this is another broken aspect of SQL.

From the purity standpoint, yes, but in practice, duplicate values are either expected, or too expensive to eliminate. That said, there is a `DISTINCT` [1] operator.

> * It conflates types with relation variables

Not exactly. We do use the same symbol to denote a type and the relation that it represents, but which is which is always clear from context (i.e. Foo in a cast expression is always a type, and Foo in a value expression is always a relvar).

> * It's using links rather than foreign key constraints

Links are an abstraction over a foreign key. In the vast majority of cases foreign keys are implemented as 'some_id' -> 'id' anyway. If you want a custom foreign key, you can set an expression constraint on a property or a link.

> * It doesn't seem like the user can even specify a candidate key

type Foo { constraint exclusive on ((.name, .last_name)) }

> * Which is why the "id" field is magical and special

See RM proscription 15 in the Third Manifesto [2]

> * Constraints only seem to apply to scalars

They aren't limited to scalars; see the exclusive constraint example above.

[1] https://edgedb.com/docs/edgeql/funcops/set#operator::DISTINC... [2] https://www.dcs.warwick.ac.uk/~hugh/TTM/DTATRM.pdf


Thanks for the clarifications!


>> We Can Do Better Than SQL (edgedb.com)

No you can't. Many companies have tried, including very large and successful companies and have failed miserably. Just try out the JSON query language used by MongoDB, including their aggregation pipelines, and you will absolutely love SQL.


"We Can Do Better Than SQL", sure, but how much better?

Fundamentally, SQL is a declarative language for databases, that makes the easy stuff easy, and the hard stuff doable.

If people want to build new front ends for SQL databases, to make it easier to build apps, I can buy that.

But trying to replace SQL as the standard declarative language for data is a fool's errand. You'd have better luck getting developers to switch away from git. People don't use SQL databases because they're fun for developers to hack on, they use them because they solve a business need.

If you want to innovate in the database space, I think you need to provide something that benefits end users (global availability, auto scaling, speed, client syncing ala Firebase, ...).


Git is severely suboptimal, and I would hope we develop a better alternative in the near future. While the underlying model is sound, the interface is confusing and deeply inconsistent. Much like RA vs SQL.


> makes the easy stuff easy, and the hard stuff doable

I like SQL, but I disagree. It also makes easy stuff hard.

Given many numbers (billions), how would you find the 1st, 3rd and 5th highest values?


Maybe I have done too much SQL, but for me it is trivial and easier to do than in most other languages. PostgreSQL will execute the query below in O(n) if there is no index and O(1) if there is an index on num.

Does any other language implement this in a better way, while also still giving O(n) time complexity and O(1) space? I may have to implement my own top-n heapsort then, or my own top-n insertion sort.

    SELECT * FROM (
      SELECT
        row_number() OVER (ORDER BY num DESC) AS n, num
      FROM t
      ORDER BY num DESC
      LIMIT 5
    ) top5 WHERE n IN (1, 3, 5);
This is assuming we do not care about ties. If we care about ties it gets a bit messier but not that bad.


In other languages, this task does not require a sort. It's just a for loop. The fact that you need a nested table and a sort illustrates my point about SQL making some easy problems hard (maybe harder is more accurate).
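The for-loop version, sketched in Python: heapq.nlargest keeps a bounded heap, so the input is never fully sorted (O(n log k) time, O(k) extra space, here with k = 5):

```python
import heapq
import random

random.seed(0)
nums = [random.randrange(10**9) for _ in range(100_000)]

# One pass with a bounded heap; returns the 5 largest in descending order.
top5 = heapq.nlargest(5, nums)
first, third, fifth = top5[0], top5[2], top5[4]
```

Whether this counts as "easier" than the SQL is taste, but there is no nested query and no explicit sort.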


The SQL to do this is simple and straightforward using analytic functions (specifically nth_value); the performance is likely to suck hard if you don't have an appropriate index, but that's not an SQL problem, but a “sorting billions of numbers is expensive” problem.

  SELECT DISTINCT
    nth_value(num, 1) OVER w,
    nth_value(num, 3) OVER w,
    nth_value(num, 5) OVER w
  FROM t
  WINDOW w AS (ORDER BY num DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)


nth_value can be done in a faster way than sorting and picking. It can be done in O(n) while sorting is O(n log n).


Which is why PostgreSQL uses heap sort with a fixed max heap size for sort with a small limit (called "top-N heapsort" when running explain analyze). Then the complexity for getting the kth value is O(n log k) which is O(n).


Using an index for access path and ranking for selection


Amazing how our history of programming has so many things putting layers over SQL. I think mainly because the bridge between SQL and General Programming is often not too pretty ( string based queries with no validation ). But in my experience, it's often better to just embrace SQL. I won't argue the syntax is nice, but it is conceptually nice. Usually the further away you get from queries on relational databases, the more often you do things very inefficiently from a database point of view, to the point of being orders of magnitude worse.


The {} construct seems awfully close to NULL. Not sure I understand the fundamental difference -- having a third value in boolean contexts still introduces 3VL.

Why not just use an Option/Maybe type?


Empty set differs from NULL in that you get an empty set if you apply an element-wise function over it (which most operators and functions in EdgeQL are).
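A toy model of the difference (simplified to Python sets, ignoring the multiset semantics): element-wise application over an empty set just yields an empty set, with no NULL-style special cases.

```python
# Element-wise application, as a plain set comprehension: functions map
# over every element, and an empty input simply produces an empty output.
def elementwise(f, xs):
    return {f(x) for x in xs}

assert elementwise(lambda x: x + 1, {1, 2}) == {2, 3}
assert elementwise(lambda x: x + 1, set()) == set()  # empty in, empty out
```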

> Why not just use an Option/Maybe type?

We are considering adding algebraic types and a syntax to match them, but it ultimately boils down to taste and use case:

  SELECT value ?? "fallback in case of empty"
is not fundamentally different from (hypothetical):

  SELECT 
    MATCH value
    CASE Some(non_empty_val) THEN non_empty_val
    CASE Empty THEN "fallback in case of empty"
    END;


"Empty set differs from NULL in that you get an empty set if you apply an element-wise function over it"

It still sounds like NULL -- I must be missing something.


Take the CASE WHEN example from the blog. An equivalent EdgeQL expression is

  SELECT 'one' IF value = 1 ELSE 'not one'
If value is an empty set, then the result is _always_ an empty set, unlike SQL that pretends NULL values are actually boolean for the purposes of the condition. EdgeQL expressions are, essentially, set comprehensions, the above is equivalent to this Python expression:

  {'one' if v == 1 else 'not one' for v in value}


I acknowledge the problems with SQL that the author stated, but I still prefer SQL over the alternative presented.

What would be other alternatives to a declarative language like SQL? Prolog?


Datalog indeed is a thing: https://en.wikipedia.org/wiki/Datalog


Any declarative language that does not commit the violations of the relational model that SQL does commit.

Date & Darwen formally pinned down what such a language has to look like in their three subsequent editions of "The Third Manifesto" (first edition around 1995). There are at least two fully operational systems that comply.


Why not Datalog?

It would seem to meet all your constraints, and is well-studied.


With due respect, Datalog is a very difficult language to learn, and it just doesn't have the support and resources SQL does. We were forced to use it in a Data Cleaning course a few years ago because the professor was big on it, and everyone struggled. To this day, fellow students complain about that part of the course. No one ever looked at it again after that class.

Datalog will ALWAYS be a niche language, because it is designed for a certain type of computational mind. The vast majority will want to learn SQL due to its ease despite any shortcomings it may have.


I am not familiar with Datalog but if it is anything like Prolog then it should be extremely easy compared to SQL. Right now if I want to make any query that is more complex than Select x from table in SQL I will have to make multiple google searches and look at multiple resources that explain the unintuitive SQL syntax. Meanwhile if I want to do something more complex in prolog it will be trivial. For example if I want to do an inner join in prolog I can just do something like: table1(X, FirstName, LastName), table2(OrderId, X, OrderDate, OrderCost).
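For comparison, the Prolog conjunction above corresponds to an inner join on the shared variable X. A sqlite3 sketch with assumed table and column names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table1 (x INTEGER, first_name TEXT, last_name TEXT);
CREATE TABLE table2 (order_id INTEGER, x INTEGER,
                     order_date TEXT, order_cost REAL);
INSERT INTO table1 VALUES (1, 'Ada', 'Lovelace');
INSERT INTO table2 VALUES (100, 1, '2019-05-09', 9.99);
""")

# The shared Prolog variable X becomes the explicit join condition:
rows = con.execute("""
  SELECT t1.first_name, t2.order_id
  FROM table1 t1 JOIN table2 t2 ON t2.x = t1.x
""").fetchall()
```

Whether the repeated-variable style or the explicit ON clause is clearer is mostly a matter of taste.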


I have heard nothing but good things about learndatalogtoday.com

Datalog suffers from some of the same problems that SQL does (different dialects). It does seem to me that Datomic and Datascript have the most intuitive syntax.


I think as a user if you have to think about the computational aspects of a query you're doing something wrong.

From a declarative sense datalog is much more concise than SQL. Depending on how you want to think about it, it also subsumes sparql.

The datalog compilers we're working on in our group create very efficient code and the evaluation strategy sometimes is far from what you would think of by looking at the query.

But since the result is defined as the minimal Herbrand model/fixpoint of the T_P operator you don't think about computation because you don't have to.


Is the difficulty you had due to it being a functional language? Or something more intrinsic to the language itself?


Much of it was writing code akin to recursive SQL (which was touted as one of the things it was way better than SQL at). Just never could understand it, and it took hours for most to get simple recursive models like connecting family trees. That language (like many) has a lot of work to do in terms of support and tutorials in order to build reasonable interest for it.


I think the problem here is institutional--both our teaching institutions and normal industrial practices. Objective studies have shown repeatedly that functional programming models (which emphasize recursion, a stumbling block you mention) are easier for students with no prior programming knowledge to pick up and use effectively. They also tend to make better programmers, more quickly. There's a reason MIT taught Lisp/Scheme in its introductory computer science course for so many decades.

However (1) once you've ingrained all the counterproductive habits of thinking and worked past the stumbling blocks that imperative and object-oriented programming models present, the conceptual jump from that to SQL is smaller; and (2) good, in-depth introductory materials don't materialize out of nowhere without demand. Your instructor was probably trying to do something good, both by introducing you to data processing in a language with fewer syntactical hurdles and by incrementally improving, year by year, the quality of introductory material by using it in instruction.

Datalog allows you to express data relationships in a more straightforward, more compact, and easier-to-refactor way, free of most boilerplate. It does, however, require that you think about what you want to accomplish abstractly, rather than as an imperative process, which is difficult merely because of the years of experience the typical student already has in unergonomic languages like C++, Java, etc.


Category theory can do better than relational database theory: the open-source categorical query language CQL extends SQL with generalizations of abstractions such as schema mappings and queries while supporting automated theorem proving and fixing foundational semantic issues such as null handling.

http://categoricaldata.net


Thanks for linking to this. I think I've been trying to build a poor mans version of it in scala without realizing it...


I'm fine with SQL, but I would also like there to be a more explicit language that requires you to invoke indexes explicitly when querying (instead of implicitly, like SQL). If you attempted to use an index that didn't exist, the query would simply fail (instead of just running slowly).


Some dialects allow you to hint to what index to use.


Before diving into ways that SQL could improve I'd like to give some thanks to a real workhorse that has proved useful over decades, which is an incredibly long time in tech.

Could it be better? Sure, but author's proposal doesn't solve where I usually have problems. What are my pain points and what would I like to see instead?

1) One giant statement. Personally I really like Hadley Wickham's dplyr [1,2] (think "data pliers") which has a SQL like notion of joining different tables and selecting values but separates the filter, mutate and summarise verbs as separate steps in a pipeline rather than one huge statement. For transactions dplyr would have to add an update verb as well.

2) Hard to test, especially for more complex ETL. dplyr approach highlights that a lot of SQL these days is being used in ETL applications in addition to the usual retrieval, transactions and reporting. Being able to express as a pipeline of operations is easier for me to understand as execution is conceptually consecutive and I can unit test individual parts as part of a normal programming language environment.

3) My data isn't all tabular. Better support and semantics for non-scalar entries where value is a record itself like in json, BigQuery, Hive, Presto, etc.

4) Not that extensible. Better support for user-defined functions (UDFs). More and more frequently I want to apply some non-trivial operation to data, e.g. run a machine learning model, and it usually makes sense to do that as close as possible to the data. It is possible to do a fair bit in SQL itself with window functions, but it is generally painful. You can point Hive at your jar and run a UDF, but that is also painful to integrate and debug, in my experience.

[1] https://cran.r-project.org/web/packages/dplyr/vignettes/dply... [2] https://datacarpentry.org/R-genomics/04-dplyr.html
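The pipeline idea in point 1 can be sketched in plain Python (illustrative names, not dplyr's actual API): each verb is a small function that can be unit-tested on its own, then chained, instead of being folded into one giant statement:

```python
# dplyr-style verbs as separate, composable, unit-testable steps.
# Names (filter_rows, mutate, summarise) are chosen to mirror dplyr.

def filter_rows(rows, pred):
    return [r for r in rows if pred(r)]

def mutate(rows, **cols):
    # add computed columns to each row
    return [{**r, **{k: f(r) for k, f in cols.items()}} for r in rows]

def summarise(rows, **aggs):
    # collapse the rows into a single record of aggregates
    return {k: f(rows) for k, f in aggs.items()}

payments = [
    {"customer": "a", "amount": 10},
    {"customer": "a", "amount": 25},
    {"customer": "b", "amount": 5},
]

result = summarise(
    mutate(
        filter_rows(payments, lambda r: r["customer"] == "a"),
        with_fee=lambda r: r["amount"] * 1.02,
    ),
    total=lambda rs: sum(r["with_fee"] for r in rs),
)
print(result)  # total for customer "a" with a 2% fee applied
```

Each stage is a plain value, so intermediate results can be inspected and asserted on in tests, which is exactly what a single monolithic SQL statement makes hard.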


We can do so much better! The problem, really, is that even if you have a nicer, more expressive language that cross-compiles to SQL, it's easy to end up with things that are impossible to write in some databases. Even though SQL is based on math, so it should "just work", the syntax is badly designed and tied up with database consistency models, making it easy to write something that should be possible but isn't. We could probably make much better compromises in a different language, but it is unclear how we could do backwards compatibility efficiently given all the existing databases and database clients.


Of all the languages I've learned since high school, SQL is the one that is pretty much still the same. Agreed, it's not perfect, but I think it's one of the most stable things there are. New features get added, which creates inconsistencies between dialects, but if you stick to the basics, it's all still very much the same. In the 2000s XML support was added, then later JSON, and there will be future support for other formats and new data types as well.


The end of that headline is "but it's not really worth the effort and will likely not succeed".

Sure, SQL is not the most elegant language. But it works. It does what we need it to. Millions of developers know it, and countless tools speak it.

So I expect EdgeQL to go over like a lead balloon. Unless you can convince one of the big existing SQL servers to adopt it, almost no one will ever get the chance to use it, and it'll be a niche language for a niche database.


As someone with experience in using PostgreSQL, but who found that it would fall short for a current project due to among other things the problems you refer to as “poor system cohesion”, I am intrigued by EdgeQL.

Something I wonder though; can materialized views be created with EdgeQL? And if so, does updating materialized views still require fully rebuilding them, as PostgreSQL does, or can they be updated incrementally?


Alas, we don't support matviews yet, but it's on the TODO list.


I fundamentally disagree with this article. SQL is arguably the most important language in a developer's toolbox. Attempts to isolate and replace its complexity through ORMs and NoSQL implementations have never resulted in a viable replacement.

Just as mathematical syntax fits mathematical logic, SQL fits data retrieval. Let's stop trying to replace it just because it's old. SQL is amazing.


I'm not sure you read the article? The author lays out a number of ways in which SQL itself does not match the relational data model well, and gets in the way of doing data retrieval well.

His criticisms are nothing new. Relational "gurus" Date, Codd, Darwen, and Fabian Pascal have all made similar criticisms of SQL over the last 30 years.


EdgeQL looks really interesting - I'm glad I read through the article.

I think the issue - for me - with articles like this is the tendency to take the stance of "${EXISTING_TECH} sucks, let's drop it for this new thing"

I would have found this way more compelling if this line was the very first in the article:

> The relational model is still the most generally applicable and effective method of representing data.


I've always had a negative opinion of SQL, but I can bear it by using just a subset. Most of the complex stuff is handled in the application rather than in SQL. I just use it as a simple data store.

I think Datalog is a better alternative. However, SQL and DBMSs form a field with more powerful lock-in than most others; otherwise Oracle wouldn't have lived so long and so well without significant improvement over the years.


Datalog appears to address a lot of these concerns


I’ve been arguing relational databases are an operational anti-pattern for a few years.

Your architecture shouldn’t start with an implementation choice. It should start with business modeling and drive down to tech choices, which may be one of many types of serialization/data storage technology.

Adopters of Domain-Driven Design have already moved past this argument.


I'm disappointed they left QBE (https://en.wikipedia.org/wiki/Query_by_Example) out of their history of SQL and relations. SQL vs QBE is one of the great examples of concise technology beating overwrought UX.


Having worked with databases for a while, SQL seems to be useful because it forces you to think about how your data is structured. To me SQL is a thin wrapper around the relational algebra notation. The biggest problem I run into with SQL is that it is hard to tell how performant a complex query is before actually running it.


Not a great interface, but estimated query plans can be pretty useful for getting a feel for how heavy a query will be.
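For example, SQLite exposes this via EXPLAIN QUERY PLAN, which returns the plan instead of running the query (a sketch; the table and index names here are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INTEGER, b TEXT)")
con.execute("CREATE INDEX idx_t_a ON t (a)")

# Prefixing a query with EXPLAIN QUERY PLAN asks the engine how it
# would execute it; the detail column shows whether idx_t_a is used.
plan = con.execute("EXPLAIN QUERY PLAN SELECT b FROM t WHERE a = 42").fetchall()
for row in plan:
    print(row)
```

PostgreSQL's EXPLAIN (and EXPLAIN ANALYZE) serve the same purpose with cost estimates, which is about as close as SQL gets to telling you how heavy a query will be before you run it.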


I'm confused by this:

> In EdgeQL, sets are flat, i.e. a set (including an empty one) cannot be an element of another set

It seems like in the last example, they're returning a set (of movies) that contains other sets (directors, cast, reviews). Am I misunderstanding? I guess it would be helpful to see what the output looks like for these examples.


There are good examples in the tutorial: https://edgedb.com/docs/tutorial/queries#ref-tutorial-querie...

The last query still returns a flat set of Movie objects. The shape selector describes _related_ data that also needs to be fetched by the query. So the query actually returns multiple sets that can be reconstructed by the client into a graph of objects (or serialized into JSON).


You know what would be amazing to have for DB engines? Auto-optimization. For example, something that automatically determines which queries are slow and which indexes should be created to speed them up. Basically self-tuning. Not sure if this could be something for a machine learning model to do.


I like the groupwise maximum query...I always have to look that one up if I haven't done one in a while.


>> The NoSQL movement was born, in part, out of the frustration with the perceived stagnation and inadequacy of SQL databases.

Maybe. I would suggest tight budgets and lack of talented DBAs are the two primary reasons NoSQL became a thing. "We can make do without a SQL DBA."

No. You can't. IMNSHO


My real issue with SQL is how verbose it is. The ratio of meaning to character count in SQL is often pretty poor (for example, when doing multi-table joins). Having to spend a few minutes deciphering each SQL query is a real pain.


I like the fact that this is built on Postgres. Will be curious to give it a try on the side.


Since Date’s criticism of SQL is prominently referenced, the natural question is “Is EdgeDB a D, and, if not, why not?” (D referring to the class of relational systems for which Date and Darwen provided criteria.)


EdgeDB is not quite D if you read all the proscriptions to the letter, but it's very close :-)


We can do better than saying everything always needs to be "better".


That's a pass for me for now. I think its current state is fairly easy to pick up. Should this somehow become highly adopted and truly show a benefit over the standard, I still might pass :)



I happen to like SQL and found it pretty easy to learn. Tons of resources out there. And now with Postgres and MySQL adding JSON storage, we're getting all the benefits of NoSQL in SQL.


Sidenote: has anyone seen how to actually talk to EdgeDB from other languages without a client? It seems currently only Python has a client.

(edit: I'm specifically wondering about Rust, if it matters)


JavaScript and Go clients are coming soon. Java/Rust will follow.


There's also HTTP/JSON support.


Hah, that should have been in the first answer. With no Rust client, I almost ignored the DB until there was a Rust client. If it has full HTTP/JSON support then I could still use it.

Any links to the HTTP-JSON API documentation? I've not found it thus far. (edit: I hope it's a fully featured API, and not the GraphAPI haha)


Very exciting project. Not a very good name, as it conflates with the existing term, "Edge Computing", making one wonder if they're somehow related.


SQL has been great. Relational data is greater.

I absolutely hate working with graphQL, and document stores are a non-starter for projects with any decent amount of data.


Good luck with that.


How about APL? (0=2|x)/x←⍳20

Visit rosettacode.org and try looking up some solutions in SQL and then reference the APL variant. It's night and day.


How do I test for empty set? If I understand correctly then "SELECT * FROM Movies WHERE description = {};" will not work, right?



Seriously, we're talking about a relational system where expressing a tree is super difficult. Why do we put up with this???


Academically, EdgeDB is very interesting, but practically, it seems to be solving a problem that does not exist. SQL is great for simple set operations and loses its efficiency as you move towards more advanced features. The industry has accepted SQL's limitations and created new ways to interface with data, such as UDFs, Pig, and PySpark. Those widely accepted alternatives can save you from doing something in SQL that you really shouldn't be doing in the first place.


> any element-wise operation on an empty set is, likewise, an empty set

Does this mean that "SELECT 1 + {}" gives "{}"?


Correct.


Assume I have a table "Actors" with a column "age" and for some of the records the age is not set (an empty set). Does this mean that "SELECT SUM(age) FROM Actors;" gives "{}" or do you implement a special logic for empty-set summation when used in connection with aggregation (like SQL does)?


Aggregate functions in EdgeDB have an "initial value", which, for `sum()` is defined as zero. Other aggregates, like `avg()` are not defined for empty sets (you cannot divide by 0), so an error is thrown in this case.


I am still a bit unclear about this. Assume we have three actor records where two records have age=30 and for the remaining one the age is not set. From what I understand, "SELECT sum(age) FROM Actor;" returns "60" while "SELECT 30 + 30 + {}" returns "{}". This appears to be an inconsistent handling of empty sets (I thought it would be the same as in SQL).


The difference is that `+` is defined as a strict function (returns empty on empty input): plus(a, b), whereas sum() is an aggregate that is specifically defined as 0 on empty input.


The sum of an empty set is, in fact, 0 (the identity for addition).

The generalized conjuction (we have a function called "all" for that) of an empty set is True (the identity for conjunction).

The generalized disjuction (we have a function called "any" for that) of an empty set is False (the identity for disjunction).

All of the above "sum", "all", and "any" are basically aggregate functions that operate on sets as a whole.

There is no special logic that you wouldn't get from considering these operations generalized for a set.
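Python's built-in aggregates happen to follow exactly this convention, which makes for a quick sanity check:

```python
import math

# Each aggregate over an empty collection yields the identity element
# of the underlying operation.
print(sum([]))        # 0: identity for +
print(math.prod([]))  # 1: identity for *
print(all([]))        # True: identity for AND (generalized conjunction)
print(any([]))        # False: identity for OR (generalized disjunction)
```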


Thanks, that was helpful. However, I still think that having "sum(1,1,{})" returning "2" and "1+1+{}" returning "{}" can be viewed as somewhat inconsistent.


Let me make a tiny correction to the expression you wrote:

"sum({1, 1, {}})" - the function sum takes only one argument and it's a set. Because we flatten all "nested" sets, the expression "{1, 1, {}}" is equivalent to "{1} UNION {1} UNION {}".

The expression "1 + 1 + {}", albeit grammatically valid, can be equivalently re-written as "{1} + {1} + {}". At this point it should be far more obvious why "sum({1} UNION {1} UNION {})" is not the same as "{1} + {1} + {}".

Literals may be a little confusing because they look like elements, but they are still sets, singleton sets, specifically. There's practical value in simply thinking about "a bunch of things: A, B, C", where each of the A, B and C can themselves be empty, a single thing, or a bunch of things while ignoring nesting. In our case we allow duplication in these bunches (which is not part of the bunch theory: http://www.cs.toronto.edu/~hehner/bunch.pdf). However, because most people are familiar with sets we find it easier to keep using the terms "set" and "multi-set" (and stipulate that they are flattened) in explanations.

In general, the way the operator "+" works is this: A + B = {a + b : for all a in A, for all b in B}. Whereas the expression "{A, B}" is defined to be equivalent to "A UNION B".
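That definition of "+" can be transcribed directly into a Python comprehension (a sketch of the semantics, not of EdgeDB's implementation); note how an empty operand empties the entire result, while duplicates are preserved:

```python
import operator

def lift(op, A, B):
    # A binary operator lifted to multi-sets: pair every element of A
    # with every element of B (lists are used so duplicates survive).
    return [op(a, b) for a in A for b in B]

print(lift(operator.add, [1], [1]))                          # "{1} + {1}" -> [2]
print(lift(operator.add, lift(operator.add, [1], [1]), []))  # "1 + 1 + {}" -> []
```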


Good explanation! (which indicates why the handling of empty sets is sometimes a bit confusing)

One more question: What was the motivation behind defining "sum({})" to be "0" rather than "{}"?


Oh, that's simple: sum(A UNION B) should be the same as sum(A) + sum(B) for any two sets A and B (or else there would be very weird inconsistencies).

sum(A UNION {}) = sum(A) + sum({})

sum(A) = sum(A) + sum({})

0 = sum({})

Typically for any operation generalized for a set the result of op({}) should be equal to the identity for that operation (0 for sum, 1 for product, True for AND, False for OR, etc.). It's always such a value I that for any other value A, A op I = A.


Again, a very nice explanation. However, from a practical point of view, I see the following problem: Assume that "SELECT sum(amount) FROM Payments;" calculates the balance for a customer account, and assume further that for some reason (e.g. a programming error) the amount column for that customer has been filled with "{}". The above query would still return the well-defined result "0", which might indicate that everything is correct (while it is not).

This would not happen if "sum({a, b})" were defined as "{a} + {b}" (which is what a user would intuitively assume). However, this definition is also not very practical, as any single occurrence of {} in the sum would render the whole thing {} (which is not what a user would expect).

I guess the handling of "{}" will always stay a bit tricky.


This type of error is better remedied by making the balance property required (so that it cannot be set to {}; an exception is produced at the time the error is introduced). This way you will know about the error early enough. The point is that if an empty value is NOT valid, then forbidding it at the schema level is the best solution. The required keyword is going to do that for you.

Alternatively, if making the property required is not possible due to some workflow constraints, you could do "SELECT Payment{customer} FILTER NOT EXISTS .balance" to find all payments (and the associated customer), which don't have any balance set. Then once you know what they are you might use "UPDATE" to fix the problem.

Empty sets have fairly well-defined and consistent behavior w.r.t. functions (and operators). You learn it once and it applies in all contexts - specifically that empty sets are just sets like any other.


Hhm, this time your answer doesn't convince me. Setting "required" is not always possible, and using UPDATE to set a value works only as an ex-post solution, i.e. you must already know that there is an error. But this was exactly the point of my hypothetical example: the balance-query returns 0 which is a perfectly legal account balance and there is no reason to suspect an error in the first place.

So, I still think that the definition "sum({}) = 0" has more potential to hide errors than the definition "sum({}) = {}" would have.


A few things come to mind:

1) It's a little unlikely that you have data where {} is an error, yet you neither make the property required nor, knowing that, bother with other validation approaches. The point is that if you're aware that this property is potentially incorrect, you would want to check or restrict it. But yes, if this situation is completely unexpected, then "sum" won't notice any issues.

2) You have to remember that {} can arise from perfectly normal operations, such as filtering. So if you filter by a certain date range and there's no Payments there, then "(SELECT Payment FILTER .timestamp > <datetime>$date).amount" expression becomes {} even if the "amount" property itself is required. So doing a sum over it is simply a question of "What's the total amount in payments since $date?" and if there aren't any payments, the answer is 0, not {}. Plus what you certainly don't want is to get an {} from the "sum" here and do "{} + 100" to add some other charge (perhaps a sum from a different account) and still end up with {}, which now creates an error in a situation where there's nothing wrong with the data.

3) Rather than imbuing {} with special meaning to signal errors, a separate property would be more appropriate. Such as a boolean "valid" flag that gets set after the record checks out or can even be a dynamically computable expression that looks at the record (say, Payments from our example) and does something like "valid := EXISTS .amount". Then you'd filter things by the valid property before feeding them to "sum" like so:

  SELECT sum( (SELECT Payment FILTER .valid).amount );


Thanks for your thoughts which are all good and reasonable.

Over the past years I've seen many programmers struggle with the combination of "data aggregation" and NULL. In the beginning I thought that these programmers should just "RTFM", but as this problem occurs so often it might very well be that the practical implementation of aggregation and NULL is "a bit off".

I like your "set" approach a lot, however, we still have "sum({1, 1, {}})" unequal to "1+1+{}" and "sum({}) = 0" which, while consistent, I think are somewhat counterintuitive and I am pretty sure will lead to misunderstanding.

Having said that, I suspect that any decent and consistent approach to this problem is subject to a very reduced form of John Lydgate's famous quote, that is "You can only please some of the programmers some of the time".

Many thanks for your time and this interesting and insightful conversation.


Unless you can compare every bit of SQL syntax against every possible use case, we can't claim that what we are doing is a lot better.


More vendors just need to support ANSI SQL.


I would be very interested in a language that transcribes to SQL the same way TypeScript transcribes to JavaScript.


That's more or less what ORMs try to be.


Only in a manner of speaking (i.e. if you conflate 'mapping' with 'transcribing').


If the generated SQL was compatible across a variety of databases and the performance of the SQL was also passable you'd be on to a good thing.


This is a problem I've been thinking about for years. It's possible to write pure ANSI SQL and be compatible across a lot of databases, but to be optimized in different databases it would need to have a target platform to compile to.

I'm really interested in what other developers think about this.


A simple thing would be macros for subqueries

    A := select col_1, col_2, col_3 from table_1
    B := select col_4, col_5, col_6 from table_2 where 2*col_5 > col_4 + col_6
    $A left join $B on table_1.col_1 = table_2.col_5
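Standard SQL's WITH clause (common table expressions) gives named subqueries within a single statement; what the macro proposal adds is reuse across statements. A SQLite sketch of the CTE form, with table and column names copied from the example above (the inserted rows are invented just to exercise the filter):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table_1 (col_1, col_2, col_3)")
con.execute("CREATE TABLE table_2 (col_4, col_5, col_6)")
con.execute("INSERT INTO table_1 VALUES (1, 'x', 'y')")
con.execute("INSERT INTO table_2 VALUES (0, 1, 1)")  # 2*1 > 0+1, passes the filter

# a and b play the role of the proposed macros A and B, but only
# within this one statement.
rows = con.execute("""
    WITH a AS (SELECT col_1, col_2, col_3 FROM table_1),
         b AS (SELECT col_4, col_5, col_6 FROM table_2
               WHERE 2 * col_5 > col_4 + col_6)
    SELECT * FROM a LEFT JOIN b ON a.col_1 = b.col_5
""").fetchall()
print(rows)
```

The per-statement scope is the limitation: to reuse a and b in a second query you must repeat their definitions (or create views), which is exactly the gap a macro facility would close.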


"We can do better than SQL."

I heard this not too many years ago, and I am now very skeptical based upon those results.


Any performance metrics for edgedb ?


It's kind of funny to point out issues with SQL's composability, and in the same breath spec out a language in which "a set (including an empty one) cannot be an element of another set".

There are many real life problems with SQL but this doesn't feel like it's resolving any of them.


I wish it had slightly more details, like what does the last query on the page actually return?

Also, I am not sure how it's better yet. SQL is typed, has a ton of historical overhead, and is used in many different database systems.

This kind of just seems like a slightly different implementation of NoSQL syntax.


> [..] SQL is typed and has a ton of historical overhead [..]

EdgeQL is strictly typed, here's more info: https://edgedb.com/docs/edgeql/overview/

> slightly different implementation of NoSQL syntax.

I'm genuinely curious what NoSQL syntaxes you are referring to.


If there was a perfect, seamless pandas to SQL translator, that would be it.


Criticism of SQL does not by itself justify a new database. An ORM with its own query language, a new SQL dialect, or a language that compiles to standard, optimized SQL are all good ideas, though.


I love the relational model, but any language where null = null does not return true is insane.

Yes, I understand the theoretical rationale behind three-value-logic. Over here where I'm getting actual work done, it's nothing but an obstacle.
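The behavior in question, demonstrated with SQLite (any SQL engine behaves the same here):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x)")
con.execute("INSERT INTO t VALUES (NULL)")

# NULL = NULL evaluates to NULL ("unknown"), not TRUE, so the row is
# filtered out by WHERE; only IS NULL matches it.
eq_count = con.execute("SELECT COUNT(*) FROM t WHERE x = NULL").fetchone()[0]
is_count = con.execute("SELECT COUNT(*) FROM t WHERE x IS NULL").fetchone()[0]
print(eq_count, is_count)  # 0 1
```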


Meh. It looks like just another graph database with a (possibly) better query language.

People of HN: if you really want to make money, make an open-source (open-core or whatever) horizontally scalable version of the Versant OODBMS (object-oriented DBMS).


Object relational? No thanks. Just give me relational.


How many of the PostgreSQL features does edgedb support?


Quite a big number, but probably not all yet. For example, in one of the future releases we'll add support for GIS types.


Just wanted to say I see a lot of value in wrangling with the problems inherent to SQL, so kudos to you.

Hopefully a transpiler API is in the works for all of those wonderful standards-deviant implementations?


SQL reminds me of COBOL. Even replacing some of the UPPERCASE words in SQL with C-like symbols would be nice.


It's case insensitive, you can use lowercase if you like. The reason the UPPERCASE persists is frankly because most people like it that way. Or maybe I'm wrong about that and we're one preference cascade away from a major flip in public opinion. But personally, I rebelled against the UPPERCASE when I was first learning it, but I've since come to find the UPPERCASE to be quite comforting. I think it has a pleasing aesthetic quality, and I find it helps make large chunks of SQL less intimidating to read.


I think the main reason these days is that there's still a fair bit of SQL embedded in other languages, usually as string literals with placeholders. And then it provides a way to visually distinguish it from "normal" string literals.


You know the keywords don't have to be in uppercase, right?


That's just personal preference, in our shop we use lowercase for SQL reserved words.


They make this comparison in the post.


> Even replacing some of the UPPERCASE words in SQL with C-like symbols would be nice.

In EdgeQL the keywords are recognized in any case. It is a matter of personal style.


As is SQL


Yes, we are not debating that.


My two main gripes with SQL are:

- Lack of interoperability with other languages.

- General ugliness of server-side SQL (stored procedures and functions).

--------

The first one isn't really solved by ORM tools, AFAICT. You can't simply write the SQL query and transpile it into a nice, statically-typed method (with types derived from the actual database structure!) that you can call directly from your language.

For example, the database structure (I'm using T-SQL types here):

    CREATE TABLE T (
        A int PRIMARY KEY,
        B bigint NOT NULL
    );
And the query (T-SQL style parameter syntax):

    SELECT A, B FROM T WHERE A = @a;
Would produce the following method after transpilation (C#, hopefully self-explanatory):

    IEnumerable<(int A, long B)> Query(int a);
But when the database structure changes, that would automatically be reflected in the client language (after a build). For example, making B NULL-able would produce:

    IEnumerable<(int A, long? B)> Query(int a);
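A rough sketch of where the "types derived from the actual database structure" part could start, using SQLite's PRAGMA table_info for introspection (a real transpiler would emit typed code in the client language; everything here is illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE T (A INTEGER PRIMARY KEY, B BIGINT NOT NULL)")

def column_types(con, table):
    # PRAGMA table_info rows: (cid, name, decl_type, notnull, default, pk)
    return {name: (decl, notnull == 0)  # (declared type, nullable?)
            for _, name, decl, notnull, _, _ in
            con.execute(f"PRAGMA table_info({table})")}

print(column_types(con, "T"))
# Dropping the NOT NULL on B and rebuilding would flip its nullability
# in the generated signature automatically.
```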
--------

The second gripe is not very important if you use SQL just as a client-side query language. But making the database "defend" its data (in presence of complex business logic, or security requirements, not fully expressible through declarative constraints) is still best achieved by "funnelling" all clients through an API of stored procedures/functions/views, IMHO. As a bonus, this approach also tends to lower database round-trips.

There seems to be a general lack of composability/reusability:

- E.g. one stored procedure returning a set of rows cannot just "pipe" them into another procedure or query - it must first copy the rows into a (temporary) table.

- If your dialect allows you to declare a table variable, you cannot just assign it to another, you have to INSERT.

- You can reuse the same SQL fragment multiple times in the same query (through WITH), but not in different queries without encapsulating it in a function (and good luck with performance if your dialect doesn't inline function query plans or doesn't support functions at all).

- You cannot parametrize ORDER BY, GROUP BY, IN...

And myriad of other problems:

- The syntax is stuck in the '80s, not well suited for auto-completion, no type inference.

- Lack of simple struct/tuple types in some dialects (may lead to huge parameter lists).

- Inconsistent exception / error handling behavior (sometimes the transaction is aborted, sometimes it isn't).

- Silent data truncation in some cases.

- NULL sometimes meaning "unknown" and sometimes "empty".

- No boolean expressions (e.g. you can't write A IS NULL = B IS NULL).

- And probably many more that are currently not at the top of my head... <RANT CLOSED>

It strikes me that we can do better on all these fronts (and more), without abandoning the "good" parts of relational databases.


I think that EdgeDB actually addresses a lot of your comments re "General ugliness of server-side SQL", i.e.

[ ] You cannot parametrize ORDER BY, GROUP BY, IN...

[v] type inference

[v] tuples, tuples of tuples, arrays of tuples, any combination, really

[v] Consistent exception / error handling behavior

[v] No silent data truncation in some cases

[v] no NULLs -- empty sets are way more precisely defined in EdgeQL

[v] boolean expressions

Please give EdgeDB a try. Feedback from advanced SQL users is very important to us.


I know, I've used AR.


To do better than SQL, we should first do at least as well as SQL.


The correct title for this article is "SQL can do better"


I wholeheartedly agree with the "complaining" part of the post, but then here comes the "solution" and I'm not quite sold.

I mean, I don't necessarily claim that this is not a solution, it just isn't obvious to me at all. Maybe a more extensive explanation with better examples would make it all clear to me and I'd be super-hyped about it already, but right now I'm more like confused.

First off, it would be helpful to show the table structure in the examples, and then compare the EdgeQL query to the easiest solution in SQL. After all, the readers have supposedly used SQL almost daily for many years (I know I have), but don't know a thing about EdgeQL, so if it can do everything SQL can, but easier, such a comparison should make that pretty obvious.

TBH, my knowledge of relational algebra is quite rusty by now, so maybe that's the problem, but as I remember it, many queries we commonly use in SQL are not really "relational" queries. Relational algebra deals with sets of tuples, so things like count(*), ORDER BY, or GROUP BY are not really part of the relational model; they just exist because they are super-helpful for what we usually try to achieve with SQL.

The problems with NULL are of a similar nature. I don't think we should pretend that NULL not being equal to NULL is not useful (we don't expect one piece of "missing data" to be exactly the same value as another piece of "missing data", do we?), or that SELECT DISTINCT treating NULLs as equal is unintuitive (to me it absolutely is intuitive: when I ask what values occur in a table, a missing entry is a missing entry; I don't want to see NULL 10,000 times).

So, the introduction kind of made me to expect the solution to be more in compliance with relational concepts, but it doesn't seem to be, since all of the above are present in the EdgeQL in one form or another.

I'm not sure how {} is different from NULL in EdgeQL, since {} seems to be a special thing here, just as NULL is in SQL. I mean, it doesn't behave like a true empty set at all! For a true empty set, {value} OR {} should be {value}, not {} (treating OR ≡ ∪). <bool>{} being {} instead of a true boolean value looks even more confusing to me than NULL OR (NOT NULL) = NULL. Same ternary algebra here.

Then, I don't really understand the concept of a flat set here. I do kind of understand what we are trying to achieve: we want to solve the problem of SELECT x, (SELECT y) FROM z throwing an error at runtime if count(SELECT y) != 1. And that would kind of make sense in SQL, but it's explicitly advertised as a feature of EdgeQL that it can return trees (JSON-like structures), and there it doesn't seem to make sense that the output of a query (which is a "flat set", I guess?) cannot have another set as an element. Moreover, it obviously can be ordered, which also isn't a property of how "set" is commonly defined in set theory.

As a first impression, the syntax and overall structure of the queries don't strike me as obvious either. In fact, since

> SQL does not integrate well enough with application languages and protocols

I would ultimately hope for something that can be expressed as a number of function calls and commonly used data structures (a list, a dictionary/record, etc.) in most/any mainstream PLs, not one more DSL in the form of free-form text. (Maybe with a more succinct DSL for use in a console. Maybe.)

And my ultimate source of confusion: SQL is more or less the same thing even in those DBMSs where it isn't exactly The SQL (like ClickHouse). And given that I know the overall structure of the DBMS (is it, for instance, row-based or column-based?), I can make pretty good assumptions about the performance of a given query, even though SQL is still declarative and I don't know exactly what the query optimizer will do. Maybe it's just that I'm not used to it, but I have no feel for how performant the last EdgeQL example in the article would be, or whether it would be better to split it into several queries at some scale. In fact, I don't even understand whether it's something that would be reasonably easy to implement in other major RDBMSs, or is it ultimately an EdgeDB-only feature? If so, it can only be as good as EdgeDB, and is that as good as PostgreSQL, or MariaDB, or SQLite? Unfortunately, in the real world I have to worry more about how performant and robust a thing is under load than about programming convenience.


> {value} OR {} = {value}, not {} (OR ≡ ∪), <bool>{} being {} instead of a true boolean value

This is because we define infix OR as

{a OR b | (a, b) ∈ A × B}

Same goes for the cast function.

You may argue that this is confusing, but if you think about everything as a set comprehension, it is way more consistent. For example the `CASE WHEN` example from the post would always return `{}` for empty input, making it obvious.
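The lifting rule can be sketched as a few lines of Python. This is a toy model of the set-comprehension definition above, not actual EdgeDB code: an infix operator is applied element-wise over the Cartesian product of its operand sets, so an empty operand gives an empty product, hence {}:

```python
from itertools import product

def lifted(op, A, B):
    # Apply op to every pair in the Cartesian product A × B.
    return [op(a, b) for a, b in product(A, B)]

def or_(a, b):
    return a or b

print(lifted(or_, [True], []))       # [] -- {value} OR {} = {}
print(lifted(or_, [True], [False]))  # [True]
```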

> but it's explicitly advertised as a feature of EdgeQL that it can return trees (json-like structures)

The tree-like return is not an aspect of the language itself; it's a matter of output representation. For example:

    SELECT User { favorites: {name} }
still returns a flat set of User objects. The shape selector `{ ... }` is an annotation that defines how an object is serialized in the output.

> I would ultimately hope for something that can be expressed as a number of function calls and commonly used data structures

This is exactly where we are going. Orthogonality in the underlying language makes it much easier to achieve this.

> and is it as good as PostgreSQL, or MariaDB, or SQLite?

EdgeDB is based on Postgres, we actually transpile EdgeQL to SQL. EXPLAIN and query performance analysis are being worked on.

> in the real world I have to worry more about how performant and robust a thing is under load, than I can worry about programming convenience.

We have posted some benchmarks [1], and more are coming.

[1] https://edgedb.com/blog/edgedb-1-0-alpha-1/


kdb is better than sql.


And thus a new SQL dialect was born.


Well then do it.

Let me guess: ya can't.


So if you can handle the "complexity" of SQL and don't use NULL, SQL is fine?


> don't use NULL

I'm not sure this part is realistic to do in a non-trivial setup. At the very least it's not a common thing to do.


You either have actual NULLs in your tables or you have implied NULLs that don't exist in the actual tables but still come up when you do LEFT JOINs etc.

I don't think NULLs are actually a problem in SQL. It's not like we're talking about pointers, it won't cause your server to crash if you use them wrong. NULLs in SQL are fine.
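The implied-NULL point is easy to demonstrate. In this sketch (Python's built-in sqlite3; the behavior is standard SQL), both tables forbid NULL entirely, yet the LEFT JOIN manufactures one for the unmatched row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute("CREATE TABLE orders (user_id INTEGER NOT NULL, total INTEGER NOT NULL)")
cur.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])
cur.execute("INSERT INTO orders VALUES (1, 100)")

# bob has no orders, so o.total comes back as NULL despite NOT NULL
# constraints on every column involved:
rows = cur.execute(
    "SELECT u.name, o.total FROM users u "
    "LEFT JOIN orders o ON o.user_id = u.id ORDER BY u.id"
).fetchall()
print(rows)  # [('ann', 100), ('bob', None)]
```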


They're far worse than null pointers. They can cause subtle bugs that lead to queries that appear to work but result in missing or incorrect data.
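A classic example of such a bug is NOT IN against a subquery that can yield NULL: rows that "should" qualify silently vanish, with no error. A sketch using Python's built-in sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE a (x INTEGER)")
cur.execute("CREATE TABLE b (x INTEGER)")
cur.executemany("INSERT INTO a VALUES (?)", [(1,), (2,)])
cur.executemany("INSERT INTO b VALUES (?)", [(2,), (None,)])

# x NOT IN (2, NULL) is never TRUE: the comparison with NULL is
# unknown, so even x = 1 is filtered out -- the query "works" but
# returns no rows at all.
missing = cur.execute(
    "SELECT x FROM a WHERE x NOT IN (SELECT x FROM b)"
).fetchall()
print(missing)  # []
```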


[flagged]


> Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something.

https://news.ycombinator.com/newsguidelines.html


I use ORMs


What is EdgeDB? It is new to me.


Yeah, it's because it is new. We released the first public version a few weeks ago, here's a blog post with the announcement: https://edgedb.com/blog/edgedb-1-0-alpha-1 Hope you will try it!


I skimmed the post and am not interested enough to look at it further. I'm not saying it isn't interesting at all, but I still need to carve out the time to learn Postgres' new features and check out RethinkDB again. If I had infinite time I would check it out.

The MongoDB example is extremely contrived. If you wanted to do that with MongoDB you would add a new field or a new collection with full_name rather than doing a $map.


The comment in the query mentions that it makes the query only 5% slower. You can drop the "full_name" part, but the query would still be too low-level (even compared to the SQL/ORM examples) and slow.


Why are you combining separate fields in the first place? Document databases are denormalized. If you want a fair comparison, put full_name in a separate field in addition to first_name and last_name in each document. That's the way to use Mongo. The issue here is that a document db takes extra space, not that you have to use $map for the common task of searching a full name. You only have to use $map for querying things that you didn't plan to query.


The point of that blog post is to look at both performance and usability.

MongoDB has a way to fetch data the way we needed and the way other databases fetch it in our benchmarks. We tried just that: the performance is roughly the same anyways. It's up to MongoDB users how exactly they store the data and how exactly they query it, but it's nice to know that their query language is capable enough.

I think that it's pretty obvious that not a lot of users query data like that in MongoDB and instead just store denormalized data. That's not the point of that blog post though.


[flagged]


I wish we "flop" like MongoDB :)


So, to fix SQL, they're taking an SQL server and building a proprietary language on top of it.


We Can Do Better Than English. It is full of strange grammar and inconsistencies. The tooling for checking grammar throws false positives. In fact all it has going for it is that billions of people already speak it...well I suppose that is a pretty big advantage.



