We Can Do Better Than SQL (edgedb.com)
624 points by 1st1 on May 9, 2019 | hide | past | favorite | 424 comments



I don't disagree that SQL can be improved. In fact, that's one of the biggest reasons I use Postgres in the first place: there are so many improvements available on top of SQL.

All that said...SQL is pretty darn effective. As a language, it's the true backbone of the internet today. It's readable, explicit, fairly concise, and naturally translates to how data should be broken down for efficient storage, or, with some trade-offs, for more efficient retrieval.

There are differences between vendor implementations...but that's what different vendors are for: to find the things the other guys are doing wrong and improve on them to build a better product.

I wish the folks luck in their work to improve things, but the language I've been able to rely on consistently over the last 17 years or so has been SQL...and I've worked with a lot of languages. SQL is the one that gets the most done reliably and lets me down the least often.


Afaict, the state of SQL as a grammar with tooling is kind of pathetic.

As a standardized language, it doesn't really exist; everyone implements numerous extensions, and almost no one is fully ANSI compliant.

Almost all formatters attempt to be generic (believing standardization exists), and fail to support the full grammar of any dialect.

Across the board, all parsers have pathetic error messages ("error on line 3", which is actually just the start of the statement).

The schema offers type constraints, but querying tools and IDEs extract no value from them (that is, types are statically specified/constrained, but query editors all pretend it's fully dynamic).

There's a lot of awkward nonsense, like WHERE clauses being parsed before the SELECT in most parsing engines, causing alias usage to fail without wrapping in a subselect/WITH clause.
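A minimal sketch of the alias problem and the portable workaround, run through Python's bundled SQLite with a hypothetical `orders` table (note that SQLite itself is lenient and accepts the bare alias; engines such as PostgreSQL reject it, which is the complaint here):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, qty INTEGER, price REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 2, 5.0), (2, 10, 1.0), (3, 1, 3.0)])

# In standard SQL the WHERE clause is resolved before the SELECT list,
# so `SELECT qty * price AS total FROM orders WHERE total > 5` is an
# error on most engines. The portable workaround wraps the alias in a
# subselect so it exists by the time WHERE sees it:
rows = con.execute("""
    SELECT id, total
      FROM (SELECT id, qty * price AS total FROM orders)
     WHERE total > 5
     ORDER BY id
""").fetchall()
print(rows)  # [(1, 10.0), (2, 10.0)]
```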

The grammars themselves are an inconsistent, ad-hoc mess

The grammar is also unnecessarily context-dependent (e.g. FROM must follow SELECT, and WHERE after that), making programmatic composition unnecessarily difficult.

I don’t know how much of the tooling issue is a result of SQL as a language versus the history itself, but I can at least confirm that trying to parse multiple dialects is absolute hell, which would at least explain the sorry state of affairs for eg formatters.

But the majority of its expressive power derives from the relational algebra and has nothing to do with the SQL grammar, and that's the majority of its value. It seems obvious to me that, at the very least, the compositional issues of SQL and its self-inconsistent grammar should be vulnerable to near-lossless improvement without too much struggle, though I can't say what the alternative would actually look like.

But it seems like it's riddled with a lot of unnecessary flaws.


Actually, standards compliance is really good these days. There are a lot of custom functions that are hard to do without in some reporting applications, but the behavior of SQL clauses is consistent across every engine.


JetBrains tooling does use schema metadata for its autocompletion. I'm not sure, though, if it's anywhere close to full-on Haskell autocompletion in the Atom editor (very fiddly and prone to break on minor version changes, I must say).


Offtopic, but I am genuinely curious now about whether Atom's Haskell autocompletion is significantly better than in other editors like VSCode, or (n)vim with things like YouCompleteMe/NeoComplete/Deocomplete etc. Have you used other editors? What is your opinion?


The SQL spec is actually hidden away behind a paywall: it's a standards document you have to purchase.

I was experimenting with creating an Entity Framework equivalent in TypeScript and really wanted to create a SQL AST for use under the hood (optimizing queries, SQL push-down, etc). I ended up using the PostgreSQL types and a Ruby plugin that binds some PostgreSQL libs to get my POC working. Crazy town.


The big boys' SQL is relatively close to the standard.


I’ve been writing some pretty ambitious Hive queries at work lately.

As I learn more about SQL and pull off more complex queries, my respect for it deepens. To have such power and support so many use cases with so few constructs is really an engineering feat. It’s timeless for a reason.

Some of my relatively common access patterns are awkward to express, but they can still be expressed in a few lines + a CTE or two, which is really impressive for a language so small.

This is not to say we can’t do better. But SQL has achieved a deep resonance with its problem space that most tools don’t even come close to. The brightest minds and most effective tooling shops in our field would be lucky merely to do as well.
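For illustration, a sketch of the "few lines + a CTE" pattern described above, using Python's bundled SQLite and an invented `sales` table ("regions whose total exceeds the overall average" is the kind of access pattern that is awkward without the CTE):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("east", 100), ("east", 50), ("west", 30)])

# The CTE names an intermediate result once, and the main query can
# then refer to it twice (as a table and in a subquery) without
# repeating the aggregation.
rows = con.execute("""
    WITH totals AS (
        SELECT region, SUM(amount) AS total
          FROM sales
         GROUP BY region
    )
    SELECT region
      FROM totals
     WHERE total > (SELECT AVG(total) FROM totals)
""").fetchall()
print(rows)  # [('east',)]
```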


This shows that humans can learn languages and get motivated by mastering them. It does not tell us anything about the consistency, composability, or orthogonality of SQL, the qualities that affect a newcomer's effort to learn it.


I think you mean the relational model is timeless. SQL is simply not a great language for an otherwise great idea.


Blindly asserting that SQL is bad without providing any argument or proof does not contradict OP's opinion on SQL.


Almost anything written by Date and/or Darwen during the last 30 years is chock-full of arguments and proofs that SQL is a horrendously poor language. Please don't come complaining that all of those arguments are not replicated here.

I'll give you just one: in SQL, you can write "WHERE 4 > (SELECT COUNT(*) FROM ... WHERE ...)" but you cannot write "WHERE (SELECT COUNT(*) FROM ... WHERE ...) < 4" [Darwen, "The Askew Wall"].

Or in other words: for certain specific kinds of 'a' and 'b', you can write "a < b" but not "b > a". Do you actually know ANY language that exposes that kind of thing??? REALLY???


The article itself gives a pretty decent description of how the language falls down.. and the ggp’s expression of his impression pretty trivially just maps to the relational model/algebra.... so all necessary arguments have been made :-)


Really? The article has a handful of strange edge cases that you'll almost never smack into in practice, and that's cause to say the language is crap?

Given the vast usage of SQL and its overwhelming popularity, if there were real problems with the language then there would be a lot more noise about them. But most people seem to be happy with it.

If it's not broken, let's not fix it, eh?


The only edge cases I could find in the article were the discussion of nulls... which aren't so much an edge case as an ever-present problem that no one really notices until it hurts (it's not at all difficult to write a broken-on-null query). Is there something else you're referring to? Afaict, everything else was just trivially observable aspects of the language.
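To make "broken-on-null" concrete, a small sketch in Python's bundled SQLite with an invented `users` table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (name TEXT, country TEXT)")
con.executemany("INSERT INTO users VALUES (?, ?)",
                [("ann", "US"), ("bob", None), ("cho", "DE")])

# Intuitively "everyone not in the US" should include bob, but under
# three-valued logic country <> 'US' evaluates to NULL (not TRUE) for
# him, so his row is silently dropped:
broken = con.execute(
    "SELECT name FROM users WHERE country <> 'US' ORDER BY name"
).fetchall()
print(broken)  # [('cho',)] -- bob is missing

# The fix has to mention NULL explicitly:
fixed = con.execute(
    "SELECT name FROM users"
    " WHERE country <> 'US' OR country IS NULL ORDER BY name"
).fetchall()
print(fixed)  # [('bob',), ('cho',)]
```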

>Given the vast usage of SQL and its overwhelming popularity, if there were real problems with the language then there would be a lot more noise about them. But most people seem to be happy with it.

In what universe does popularity imply quality? Certainly not the one I'm in: Avengers: Endgame is apparently the #2 worldwide gross ever :-)


I think the point he is making is: why hasn't anything come up in over 40 years that better represents the relational model?


There are other models as well, e.g. Datalog, but popularity is a function of many things. A major factor with databases is that historically the engine is far more important than the language; you would expect users to choose an engine, and take whatever language came with it. Which implies that the decision is on the producer side, not the consumer's.

Which finally implies that usage popularity is not adequately explained by user preference, because you wouldn’t expect user language preference to have significant impact on the decision making process.

But regardless, are you also going to claim that C, Oracle, IBM, Microsoft, etc. were optimal in their respective fields, during their respective eras, because they held total market domination? Incumbency is one hell of a drug.

Bandwagoning and deferring to the status quo is not an argument for quality; there are too many factors involved beyond quality itself. Fixable flaws in the language have been pointed out and not addressed in this conversation chain. Alternative languages exist, both academically and practically; they just don't come alone. It's not a competition in a void, the ecosystem has to move together. This occurs in every aspect of tech (did you ever wonder if there could be an OS outside the Unix and Windows styles?).

Asking why SQL was never beaten out in popularity is a hugely different question from whether there can be a better language than SQL, with largely unrelated answers (Oracle/IBM were hardly fair players in anything they did).


> There are other models as well eg datalog, but popularity is a factor of many things

The lack of popularity is also a reflection of how the string of next best things have failed to actually deliver on their promise, accompanied by the lack of a rational argument to adopt them instead of using a time-tested technology.


What time-tested technology are you referring to? I'm only talking about the SQL language -- not the relational model, not the RDBMS engine, not the drivers.

The SQL language is just an API to the total engine; the majority of its value derives from the relational model; the value of the relational algebra is not being questioned. Only the particular interface to describe the relational algebra.

The power of the JVM is not the power of the Java language. You can have other languages make use of the same power Java has access to by targeting the JVM, and as a programming community we accept that just fine (we also accept that despite Java having many known flaws as a language, its status as a first-party Oracle interface to the JVM, and its position within the status quo, make it extremely difficult to upend; but that hardly implies Java is some perfect language, as most HN users will trivially acknowledge).

In the same fashion, the SQL language is (ideally) decoupled from the RDBMS engine; it can be replaced. But in practice, they're not so decoupled, and SQL has consistently been the only first-class interface to the engine, so, like Java, it can enjoy far greater ubiquity and stability from the quality of the underlying tech than the language itself may deserve.

So once again: popularity/stability of the SQL language is not (necessarily) evidence of the quality of the SQL language. It's much more likely that it represents the value of the RDBMS and the relational model (which I don't think anyone is arguing against), while the SQL language enjoys a free ride by being the only interface that's even offered.

I mean hell, just read the article. It has trivially observable flaws in its design and semantics (like stuffing a 3VL logic into a 2VL language). EdgeQL may or may not be the optimal solution, and I'm not arguing (or interested in arguing) one way or the other on that, but I don't see how anyone can reasonably argue that SQL's ubiquity shows its perfection, when it's so clearly imperfect (because those flaws are being very directly pointed out).


You still haven't explained why SQL in particular has ruled over its problem domain for over 40 years. For example, in systems programming over that same timespan we've had assembly, c, c++ and rust.

Why is SQL the exception to the rule? You talk about popularity, but it's really a question about stability. Remember, SQL is not that popular, as the NoSQL movement and comments like yours prove.


> For example, in systems programming over that same timespan we've had assembly, c, c++ and rust.

I don't think you've picked a very good example here of a field where the lingua franca has progressed over time. C still dominates nearly all systems programming, against all odds. It dominates operating system development, driver development, embedded systems development, pretty much anything that demands an extremely high degree of efficiency.

I agree with your overall point regarding SQL and its superior stability, but C is an example of where popularity won out against many better options over time. It's not like Rust was the first mainstream attempt to create safer languages for systems development either. There have been many over the years designed to fill the same niche, with more modern features, tools and goals. For better or worse, C still dominates the landscape for reasons entirely independent of whether or not it was truly the best tool for every task.

Also, I'd contend that your claim regarding the popularity of NoSQL is incorrect. If you only talk to web developers writing Javascript, you'd get the impression that NoSQL is taking over the world. But the reality is that amongst pretty much every other demographic, SQL is still highly regarded as the ideal technology.


>You still haven't explained why SQL in particular has ruled over its problem domain for over 40 years

Well yes, because I argued that it was irrelevant to the original question: can SQL be improved?

Additionally, I actually did address the question of SQL's stability, at least partially: it's not SQL that's so stable, but the relational model. SQL just happened to be IBM's, and IBM was highly successful pushing its DB around, and Oracle (kind of) cloned it to push their DB around with less contention, and so it went on. But it's the RDBMS engine that primarily drives a DB's value; the SQL language is a ride-along.

And once more, to be clear: its longevity is not a result of SQL's quality, but of the quality of the relational model. Thus its popularity and stability (it's also not that stable, in that it's heavily extended by everyone in arbitrary fashion) are irrelevant to the original question.

>SQL is not that popular, as the NOSQL movement

If people are trying to make use of NoSQL because they want to avoid the SQL language (not the relational model), they’ve made a grave mistake in understanding their technologies; I don’t think such a naive opinion should be considered relevant to the equation.

If they’ve chosen NoSQL to avoid the relational model, then their choice says nothing about the SQL language.


I think the historical truth is that Oracle was first to market, and IBM just adopted SQL so it would not risk being waaaaaaaaay too late to the market "party". [Darwen, "Why are there no relational DBMS's"]


> If people are trying to make use of NoSQL because they want to avoid the SQL language (not the relational model), they’ve made a grave mistake in understanding their technologies; I don’t think such a naive opinion should be considered relevant to the equation.

Yes, they are trying to avoid the relational model by avoiding SQL, because SQL has basically been the face of the relational model for 40 years. If there was something better, they'd use that. Nothing better has come up, and I don't know why either.


Better things have come up. See the projects list at http://www.thethirdmanifesto.com/ .


but no one is using them, so are they actually better? (a philosophical question, which also pertains to a ton of better options... I mourn BeOS)


The particular meaning of "better" here was "better at representing the relational model".

If you want to believe that popularity is a measure or reliable indicator of quality, I can't be bothered with you.

PS I have a worked-out example on my site of what it takes to enforce a business rule "no one has a salary higher than his manager's" in SQL (> 100 LOC) and in SIRA_PRISE (one single relatively simple formula of the RA to declare). You decide which is "better".
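As a rough illustration of why the pure-SQL version balloons, here is a hedged sketch (SQLite trigger syntax, invented schema, not the author's worked-out example) covering just the INSERT case of that rule:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE emp (id INTEGER PRIMARY KEY, salary INTEGER,
                      manager_id INTEGER REFERENCES emp(id));

    -- One of several triggers a full SQL solution needs; UPDATEs to
    -- either row and manager reassignment each need their own, which
    -- is why the line count balloons.
    CREATE TRIGGER no_overpaid BEFORE INSERT ON emp
    WHEN NEW.salary > (SELECT salary FROM emp WHERE id = NEW.manager_id)
    BEGIN
        SELECT RAISE(ABORT, 'salary exceeds manager''s');
    END;
""")

con.execute("INSERT INTO emp VALUES (1, 100, NULL)")  # the boss
con.execute("INSERT INTO emp VALUES (2, 90, 1)")      # fine
try:
    con.execute("INSERT INTO emp VALUES (3, 110, 1)")  # violates rule
except sqlite3.IntegrityError as e:
    print(e)  # salary exceeds manager's
```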


Much of the comments (and the article) criticize SQL as being non-standard and difficult to learn. These critiques have been around as long as SQL has.

However, there is a more insidious problem with SQL: it's all too easy to write SQL statements that have O(N^2) complexity. A simple JOIN can easily result in O(N^2) complexity, yet there aren't easy tools to identify these performance issues. As a result, as a database grows, things that once were executed quickly take forever.

I'd like to see the end of joins, replaced with something that is more explicit about what is happening under the hood.
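Some of that "what is happening under the hood" visibility does exist today in query-plan tools, though you have to ask for it; a sketch using SQLite's EXPLAIN QUERY PLAN and an invented pair of tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a (id INTEGER)")
con.execute("CREATE TABLE b (a_id INTEGER)")

def plan(sql):
    # Each returned row's last column describes one step of the plan
    # (SCAN = full pass over a table, SEARCH = index lookup).
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

q = "SELECT * FROM a JOIN b ON b.a_id = a.id"
# Without an index the engine falls back to scans (or an ad-hoc
# automatic index), i.e. the O(N*M) worst case the comment warns about.
print(plan(q))

con.execute("CREATE INDEX b_a_id ON b (a_id)")
# Now the inner table is probed via the index instead of rescanned.
print(plan(q))
```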


Your concerns are warranted.

However, you are wrong to tie the problem to joins. I remember an analyst who launched a SELECT COUNT because he was just curious about the number of rows in the table. No joins involved, but users did suffer. Elsewhere in this thread I've seen the problem tied to table scans, and that's also wrong. A table scan isn't a problem if it's a 5-page table. As Darwen often argued: why are people always lamenting only about those couple of tables with millions of rows? Why should we deprive users of the power of relational algebra if their databases simply aren't that big?

It's a matter of determining the cost of the data access strategy (regardless of JOIN/EXCEPT/what_have_you) and (implementing a protocol for) capping it at runtime (or earlier if possible). No need for language changes here.


> Your concerns are warranted. However. You are wrong tying the problem to joins. ... It's a matter of determining the cost of the data access strategy

I was using joins as the common example. Sure, there are many other ways of using SQL to shoot yourself in the foot, but most of the issues I've run into seem to come from joins.


There are real problems with COBOL and, yet, it's used in the most critical parts of our societal infrastructure. People just learned to live with the problems and are by now completely oblivious to them.


SQL is too broken to even try fixing.


There's a reason why not everything in SQL is a set: it lets you do a lot of mathematical work from inside the engines. This was largely overlooked in the article, which leaves undefined what operations between sets of mixed cardinality would require, and whether this would lead back to what they call unwieldy runtime errors when doing math over sets of mismatched cardinality.


>there's a reason why not everything in SQL is a set and it's that you can do a lot of mathematical work from inside the engines

I don’t see the relationship between those two things; or rather, what the latter even means. Can you expand?

Also, apparently the “sets” described in the article are actually multisets aka bags [0] — so the same semantics as any other RDBMS/SQL. No idea why they’d confound the two, especially when set vs bag semantics is a well-discussed topic in the literature..

[0] https://edgedb.com/docs/edgeql/overview/#everything-is-a-set


something like this:

    SELECT StudentID,
           Name,
           (SELECT COUNT(*)
              FROM StudentExam
             WHERE StudentExam.StudentID = Student.StudentID)
           /
           (SELECT COUNT(*)
              FROM StudentEnrollmentPeriods
             WHERE StudentEnrollmentPeriods.StudentID = Student.StudentID)
             AS ExamsTakenPerYear
      FROM Student
     ORDER BY ExamsTakenPerYear DESC;

works because the results from aggregates are scalars, and thus math operations on them are always meaningful. Albeit this in particular might not be the most brilliant example, there's definitely a case for running such operations on the server, because you might filter on the results, say, like

"select all students with less than 2 exams per year on average"

at which point either you have the distinction between scalar and set in the language itself, and you can reject invalid queries at the parsing level, or you have to do the checks at runtime when two sets with mixed cardinality appear in an operation, halt the query, and throw an error.


Set and scalar should be regarded as data types, as in a more traditional programming language, and treating a 1-element set as a scalar is a type coercion that works as well as other type coercions do: only if the data matches expectations.


You're missing the point: it's not about coercion or data types per se, it's about whether the error happens at parse time or at runtime. Of course the software has ways to figure out intention, but is that something we actually want?

it's not coincidence that the article complains about the same thing but in reverse:

> This is legal, but only if the subquery returns not more than one row. Otherwise, an error would be raised at run time.

except the 'fix' reintroduces it in a way that's subtler and much harder to detect, because now everything is a set even when the intent is to have a scalar.


Hive queries are written in HiveQL, not SQL. I used to write a lot of Hive and Impala queries, and going back to plain SQL is disappointing.


While based on SQL, HiveQL does not strictly follow the full SQL-92 standard, just like all the other SQL dialects out there. HiveQL is SQL.


That's a fair argument, but from personal experience HiveQL has some pretty major features added on that make it "have such power and support so many use cases" in a way that other common variants, like MySQL's, don't.

A lot of HiveQL's power comes from extending the language, not from the use of a small set of timeless features.


What is your favourite HiveQL extension?


It supports so many use cases primarily because of the small number of fundamental "constructs".

I mean, if I gave you protons, electrons and neutrons, you could build the universe out of them!


That doesn't make sense: if I give you more things (I'm avoiding saying 'atom', but you know what I mean), then it's not the case that you can suddenly do less.

Expressiveness isn't inversely correlated to number of constructs.


Let's say you gave me an atom. In this situation, you can't create other atoms. When you give me protons, electrons and neutrons, I can create any atom I like, in addition to anything else in the universe. Thus, when you give me an atom, you actually reduce the possibilities somewhat.

In reality, it's not just about the small number of constructs, but also about how fundamental they are. In one sense, atoms are less fundamental than protons/neutrons/electrons, and therefore reduce the number of possible creations.


I think your premise might be flawed.

Are we just assuming that protons and neutrons automatically pull in the quarks, neutrinos, other leptons and force carriers as well? Does the electron have a photon dependency? What about dark matter?

If not, then our three particles are internally inconsistent and woefully incapable of building a universe. Otherwise, our set of dependencies is essentially the whole darn universe anyway.

Same goes for atoms. Give me enough and I can build a star or particle accelerator to create whatever elements or fundamental particles I want.


Wow, it was a simplistic passing metaphor. I was not expecting to get into the complexities of physics; my point was simply that if you operate at a more fundamental level, you can generally address a greater range of problems.


The point is that your metaphor is broken and incorrect. GP's point stands. Abstractions are leaky.


Even leaky abstractions can be useful, if they are taken in context. If you choose not to understand the spirit of my comment, and take its meaning literally to the point of it having no utility for you, then so be it. A significant number of upvotes suggests GP's point may be moot, and, dare I say, pedantic.

Also, the GP's suggestion of building an accelerator is as silly as someone writing an assembler via a mush of SQL. It probably can be done, but it's not in the spirit of the topic!


Fair point. Abstractions, leaky or otherwise, are helpful! It does seem a bit like we've successfully abstracted the conversation into oblivion though. hehe

Anyway, I feel like your original comment and my reply are kind of doing the same thing at the meta-conversation level. We're stretching a metaphor in order to make some counterpoint argument. My intent with the excessive pedantry was to highlight the farce of conflating rough metaphor with substantive insight.

Anyway, internet conversations are hard. Thanks for engaging!


> Let's say you gave me an atom. In this situation, you can't create other atoms. When you give me protons, electrons and neutrons, I can create any atom I like, in addition to anything else in the universe. Thus, when you give me an atom, you actually reduce the possibilities somewhat.

> In reality, it's not just about the small number of constructs, but also about how fundamental they are. In one sense, atoms are less fundamental than protons/neutrons/electrons, and therefore reduce the number of possible creations.

I think what OJFord is getting at is, what if you had more subatomic particles to build with?


The more fundamental you go, the fewer things you need to form a "complete set".

Thus, whatever the complete set is for subatomic particles, it will almost certainly have fewer members than the complete set of atoms.

Getting back to my original point, SQL operates at a fundamental enough level that you do not need a great number of "constructs" to create a query that gives you whatever it is that you need.

I'm not saying it's perfect, but was just supplementing the OPs observation that SQL has surprisingly few "constructs".


Depends on whether photons exist or not.

Radiation makes the universe a lot more interesting.


In your analogy, SQL is the periodic table.


Having used Splunk's query language... I'd hope that, in general, vendors start with an assumption of SQL support and maybe try tweaking it at the edges rather than burning the house down. SQL has survived this long because it's extremely expressive, and any replacement for it is going to need to match that expressiveness.

All that said, I think it was originally structured to be partially a human-readable language, and it fails pretty hard at that - that's a facet I'm sure smart people could revise to make more natural.


To be fair, Splunk excels at dealing with unstructured logs, which is in many ways a harder problem. Obviously in a perfect world, all our data would be structured, which would make querying much simpler.


I used to work for Splunk. Querying Splunk with SQL is completely plausible, and something that Splunk has made a number of attempts at over the years.

The problem isn't SQL. It's that Splunk's query engine is tied up internally with a "grammar" that is a direct port of a shell pipeline into C++ with no intermediate representation or anything a compiler guy would recognize as a grammar. There was no design, no mathematical underpinning to it.

Splunk's unstructured log capabilities are really domain knowledge about making them semistructured as fast as possible: token indexing, a lot of effort on recognizing character encodings and timestamps intelligently, looking for key=value pairs, and letting people write regexes to extract fields themselves. The query language isn't somehow designed for a different data model.

In EWD1123, Dijkstra showed that the relational calculus and the regularity calculus (which governs regexes) are basically the same thing. My takeaway from that is that the relational model can be reinterpreted as a model over anything you want to match and manipulate with regexes by just changing the field selectors.


Typically the earlier part of your pipeline turns the logs into data records via regex captures. So you could have had a pipeline of what are essentially shell commands that produce a table that SQL operates on.


I don't buy that: Splunk has a clear concept of columns and rows and could have chosen to expose support for a SQL-like grammar for assembling them - they instead chose a less structured format that makes data assembly quite difficult.

More legitimately splunk may simply be unable to deliver performant data expression if a user is typing a query complex enough to justify SQL.


> I use Postgres […] SQL is pretty darn effective.

FWIW Postgres used to have its own query language, derived from QUEL[0] rather than SQL.

And findings that SQL is kinda shit are not exactly recent, e.g. C.J. Date's "A Critique of the SQL Language" (1983) lists the following sections

* lack of orthogonality: expressions

* lack of orthogonality: builtin functions

* lack of orthogonality: miscellaneous items

* formal definition

* mismatch with host languages

* missing function

* mistakes

* aspects of the relational model not supported

The conclusion was, obviously, prescient:

> if SQL is adopted on a wide scale in its present form, then we will to some degree have missed the relational boat, or at least failed to capitalize to the fullest possible extent on the potential of the relational model. That would be a pity, because we had an opportunity to do it right, and with a little effort we could have done so. The question is whether it is now too late. I sincerely hope not.

SQL succeeded not because it's "pretty darn effective" but because IBM decided on it (at a time where it drove technology) and Oracle are great at sales and marketing (whereas Ingres definitely wasn't).

[0] https://en.wikipedia.org/wiki/QUEL_query_languages


> I use Postgres in the first place because there are so many improvements available on top of SQL.

Most of Postgres is standard SQL. It's just that most non-Postgres databases do not implement standard SQL very well.


The various extensions/plugins allow for custom data types, indexes, the use of multiple programming languages to write functions, the ability to use a foreign data wrapper to connect to Redis and build a VIEW out of the result or push data out to Redis/Memcached with a database function, varieties of powerful search capabilities, etc.

It's got a lot of stuff going on in there.


> ability to use a foreign data wrapper to connect to Redis and build a VIEW out of the result or push data out to Redis/Memcached with a database function,

This sounds amazing. Does anyone have a link to some docs for this?


That "most" is a pretty loaded word! Last I checked, even CREATE INDEX is not part of any ANSI or ISO SQL specification. (That's why every RDBMS has such different features and syntax for this.) Good luck building a system with just "standard SQL".


> even CREATE INDEX is not part of any ANSI or ISO SQL specification

That's correct, and that's why PostgreSQL has ADD CONSTRAINT.

The constraints describe the actual schema of the data; INDEX is a DBMS-specific implementation detail* for improving performance (including index types, etc.).

* Though since the standard does not yet cover some things, like partial unique constraints, these have to be done as an INDEX in PostgreSQL.
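For instance, a partial unique constraint has no standard spelling; a sketch of the INDEX workaround in SQLite, whose partial-index syntax happens to match PostgreSQL's here (invented `users` table):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (email TEXT, active INTEGER)")

# Standard SQL has UNIQUE constraints, but "unique among active rows
# only" must be expressed as a partial unique INDEX:
con.execute("""
    CREATE UNIQUE INDEX one_active_email
        ON users (email) WHERE active = 1
""")

con.execute("INSERT INTO users VALUES ('a@x.com', 1)")
con.execute("INSERT INTO users VALUES ('a@x.com', 0)")  # ok: inactive
try:
    con.execute("INSERT INTO users VALUES ('a@x.com', 1)")
except sqlite3.IntegrityError:
    print("duplicate active email rejected")
```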


SQL has nothing to say about the layer of physical implementation, and that's where INDEX belongs (as does STOGROUP [DB2] and what have you).

That's not a bug, that's a feature. Deriving from the deliberate intent to make physical independence (which was simply nonexistent at the time the model was conceived) a reality.


Why should it be? Indexing is an implementation detail for each RDBMS, so it doesn’t make sense to add it to the language spec.


It does make sense when the goal of the language spec is interoperability, and it's something that everyone has to do, in practice, no matter the platform.

Call it a "hint", if you will, the meaning of which is implementation defined. But standardize the syntax for pete's sake.


Interoperability is achieved if the same query with the same inputs yields the same result. "In the same amount of time" (which is the part that indexes are aimed at) is manifestly not part of that picture, and that's by design.

And "hints" in the language are not exactly going to improve "interoperability" if their meaning is still allowed to be implementation-defined, are they ?


That sounds backwards to me. It's only lack of standardization that forces it to be an "implementation detail". To any application that needs to be able to query data in reasonable time, it's fundamental and necessary functionality.


Most of Postgres is standard SQL. It's just that most non-Postgres databases do not implement standard SQL very well.

Sure, but the non-standard enhancements like JSON support are part of what sets Postgres apart from the competition IMO.


SQL:2016 includes support for JSON: https://en.wikipedia.org/wiki/SQL:2016


Postgres 12, whose development has reached feature freeze and is in a stability period, has added some support for JSON path as of SQL:2016: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...

The release is planned for roughly next September/October.


Interesting, I wonder which databases actually implement SQL:2016 (and JSON as standardized). JSON support in Postgres predates SQL:2016 in any case.


I think Oracle does. My impression is that PostgreSQL sort-of "invented" JSON support in the db, then Oracle (and probably others) added JSON support with a different syntax, and then Oracle got their version defined into the SQL standard. I'm half-guessing here, but it's based on what I've gleaned from Markus Winand's [0] excellent compatibility tables in his slides [1].

[0] https://modern-sql.com/

[1] https://modern-sql.com/slides/ModernSQL-2019-05-08.pdf, for example

(Edited for formatting.)


Wait, what major relational databases don't support JSON? Oracle and Sql Server definitely do. Does MySQL not support JSON?


SQLite (the most popular RDBMS today) doesn't, unless you count extensions. And of course, none of them use the same syntax or functions or data types.


While it is technically an optional extension in sqlite, it is part of the main source distribution, and very easy to enable.


And why wouldn't you count SQLite extensions? Its JSON1 extension is practically ubiquitous.
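For what it's worth, the JSON1 functions are easy to exercise from Python's bundled sqlite3 module (a quick sketch; it assumes your SQLite build ships the JSON1 functions, which most modern builds do):

```python
import sqlite3

# JSON is stored as ordinary TEXT; the JSON1 functions operate on it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute(
    "INSERT INTO docs (body) VALUES (?)",
    ('{"name": "ada", "tags": ["db", "sql"]}',),
)

# json_extract pulls a value out by path, much like Postgres's ->> operator.
name = conn.execute("SELECT json_extract(body, '$.name') FROM docs").fetchone()[0]
print(name)  # ada
```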


If I recall correctly, Oracle's support for JSON is not at the same level as PostgreSQL's. By that I mean seamlessly storing JSON as a data type like any other. In Oracle, JSON is stored as VARCHAR/CLOB and then you implement checks to validate it, whereas in PostgreSQL and MSSQL it is its own data type.


Similar story with SQL Server. I believe JSON support is a bit better in SQL Server 2019, but it still lags far behind that of Postgres.


Hi, MySQL supports JSON (it has a native JSON datatype), but it also supports NoSQL-style CRUD operations. For more details, check https://www.slideshare.net/lefred.descamps/pyconx-python-mys... or https://www.slideshare.net/lefred.descamps/oracle-code-roma-...


Doesn't every major RDBMS support JSON out of the box these days?


(Last I checked, at least) JSON isn't as well supported in MySQL or MSSQL compared to PG.

In PG, a JSON column is so well integrated that you can do all sorts of crazy stuff (indexes over JSON queries are my favorite). You could build an entire RDBMS on top of PG's JSON column.
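For example (a sketch using standard Postgres syntax; `events` and `payload` are made-up names): a GIN index makes containment queries over a `jsonb` column indexable, and an expression index covers lookups on a single extracted field.

```sql
CREATE TABLE events (id serial PRIMARY KEY, payload jsonb);

-- GIN index supporting the @> containment operator
CREATE INDEX events_payload_gin ON events USING GIN (payload);
SELECT * FROM events WHERE payload @> '{"type": "login"}';

-- Expression index on a single extracted key
CREATE INDEX events_user_idx ON events ((payload->>'user_id'));
SELECT * FROM events WHERE payload->>'user_id' = '42';
```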


MSSQL indexes and provides dot notation/object query from 2016 forward, schemas are supported as well.


I feel like Postgres is so powerful I could damn near build an entire web app backend with JUST Postgres (i'm only sorta kidding here). If that's standard SQL, well then I really like standard SQL :-)


I've done pretty much this a few times, and it's amazing if you're working in the context of enterprise data systems that need to provide extensive capabilities to a "small" userbase (i.e. concurrent users in the thousands). You just need a small shim layer for security, to transform result sets, and to handle browser -> database connectivity.

Postgrest works very well as this shim layer, though I've moved on to writing sql directly in the client and communicating through a web socket shim. In terms of reducing code complexity and improving performance this is absolutely unbeatable, you just need to parse incoming sql to sanitize it and make sure there is no role escalation. Because of postgres's foreign data wrappers this method can provide a consistent surface for basically all your enterprise data. The only gotcha with FDWs is that some of them don't "push down" many query clauses, so you end up doing much slower queries on the remote system and filtering locally, which is terrible for obvious reasons. That being said, the FDWs are pretty much all open source, so you can just implement push down support for those missing clauses yourself.


> you just need to parse incoming sql to sanitize it and make sure there is no role escalation

"Just"? I guess I'm skeptical of a statement that begins "you just have to parse sql".

Is this actually easier than I'm imagining it? I'd be curious to hear more about the security and authorization model of this approach.


You don't have to do full statement parsing, you basically just have to do a limited parse that looks for the various ways that someone could execute a set role statement. As long as you don't let a user execute set role, you use db roles for user accounts, you have a reasonable statement timeout in place to prevent DoS, and your postgres security model is tight (the big gotcha is not to allow access to untrusted code), this approach works fine.

A good tool for this purpose is https://github.com/JavaScriptor/js-sql-parser as it will fail to parse complex statements that are likely to include an attack vector.

In terms of authorization, you can either create per user connection pools if using web sockets and log the user in directly that way (which makes things easy) or if you must use rest, use a single connection pool with a master user then use some form of token to tell the shim who to set role to before executing the query.
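As a toy illustration only (a deny-list regex is not a real security boundary — string literals and comments will trip it up in both directions, which is why a proper parser is the better tool), the role-escalation check might start out like:

```python
import re

# Hypothetical deny-list for statements that change the session's role:
# SET [LOCAL|SESSION] ROLE, SET SESSION AUTHORIZATION, RESET ROLE.
ROLE_ESCALATION = re.compile(
    r"\b(set\s+(local\s+|session\s+)?role"
    r"|set\s+session\s+authorization"
    r"|reset\s+role)\b",
    re.IGNORECASE,
)

def is_safe(sql: str) -> bool:
    """Reject any statement that appears to escalate the database role."""
    return ROLE_ESCALATION.search(sql) is None

print(is_safe("SELECT * FROM orders WHERE id = 1"))   # True
print(is_safe("SET ROLE admin; DELETE FROM orders"))  # False
```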


Thank you for posting this. I have been thinking about using something along these lines for a while now. Next project that fits the bill, I will see about building a proof-of-concept implementation and go from there. Have you had any exposure to any of the other projects similar to PostgREST that you have any thoughts about?


I've played with Graphile, it's not a bad choice if you are already invested in GraphQL. I'm not a GraphQL fan for a number of reasons, but it's a definitely a solid project.



I'm keeping an eye on https://github.com/aquametalabs/aquameta "Web development platform built entirely in PostgreSQL"


Graphile looks quite intriguing! any experience with it? Recommended?


I would also have a look at Hasura GraphQL Engine. Have used it now for various projects and it is extremely nice, especially with the superb subscriptions support and eventing for the odd requirement that can't be handled within Postgres


I'm using Graphile in deployment at the moment, and it works quite nicely. Granted, we have a limited amount of users thus far, so I don't really know how well it handles big loads, but it allowed us to get off the ground with a PoC _really_ quickly.


Same. As a data scientist, I have seen multiple 'analytics workbench' solutions come, go and occasionally stay - Clementine (what is now IBM SPSS), Stata, SAS and its many variants, Statistica and in the recent years, tools built off and designed to make working with data using Python/R.

But, the common workhorse tool that has stayed strong through all these has been the common SQL. Elegant, simple, powerful and thoroughly reliable, it is my primary go-to tool. In an otherwise changing ecosystem, its simplicity and reliability is a boon. Yes, it is primarily because of the nostalgic familiarity but I also believe it continues to be extremely powerful, one that will serve you very well.


I came here to write something similar, but you said it better than I could. Sure, SQL has the problems that the article mentions and their solution looks nice on the surface, but SQL has worked much better for me than any alternatives, especially if it’s slightly extended SQL like in Postgres, as you mention, and any attempts to improve it will always be an uphill battle as SQL is pervasive and well supported. That doesn't mean people shouldn't try, but it does mean that their likelihood of success depends on factors other than whether the solution is an improvement over SQL or not.


I’ll never understand why obvious marketing makes it to the front page, only to get shat upon by 95% of the comments.

Of course we can do better than SQL. We could obviously do better than Javascript. It’s globally understood that we could all do better than English.

We sometimes struggle to express ourselves with language. We can blame the language, try to fix it, or invent a new language and try to get people to speak it.

I would rather spend my time refining my elocution than learn a new language. That said, there are words and phrases that simply work better in other languages. I don’t know a single-word corollary to ‘Simpatico’ in English.

I wish the best to those who would make programming more expressive, and the worst to those who would try to streamline away subtlety.

Performance is quite loosely linked to language, given sufficient abstractions and optimizations. I submit NumPy as an example. We humans have lots of ways to say what we want, and we want lots of different things in myriad ways. As an analyst I often wish for a richer language, some way I could transcend tabular data thinking to find and make associations around real-world state. I’m certainly self-satisfied when my nested subqueries return what I’m expecting, but I may be able to add more value if I had a better way of expressing my questions.

If we can do better than SQL, it ought to bring more people closer to the reality that lies behind the data, and further from the methods used to obtain it. Maybe I don’t have the words for what I’m looking for yet...


The article was still useful. I don't mind an ad at the bottom if there's genuine value. I am not experienced in SQL so learning of its shortcomings (especially the NULL part) will be useful when I write the odd query here and there.


>I’ll never understand why obvious marketing makes it to the front page, only to get shat upon by 95% of the comments.

Posts have an upvote button, but no downvote button, so dissenters can only use comments to express disagreement.


And people often upvote based solely on the post title.


> I don’t know a single-word corollary to ‘Simpatico’ in English.

"amiable"? "congenial"? "affable"?


I wonder what the parent meant by "corollary"? Simpatico is actually an English word now..


Is that USA English? Never heard/read it at all in UK, not even in a crossword. Nor, ever online (here, reddit, etc.) or from other global sources.


It's in Oxford online and Merriam Webster.


> I’ll never understand why obvious marketing makes it to the front page, only to get shat upon by 95% of the comments.

Upvote manipulation maybe? Then once it is on the front-page it gets lots of exposure. So the probability to get upvoted becomes higher.


It's another clever feature of Hacker News, in my book.

Perhaps YCombinator omitted the downvote button because it can be abused. For example, if there is a story that is bad news for a certain company, that company could organize a downvoting campaign to hide it. But the upvote button can also be manipulated, as you say.

However, which is worse? (A) To write a silly story and have it rightfully downvoted into obscurity, or (B) To write a silly story, have it upvoted into the limelight, but be littered with comments that expose its flaws.

I think that as long as I remember that just because a story is on the front page doesn't mean it's right, then Hacker News has a nicely curated set of moderation rules.


>For example, if there is a story that is bad news for a certain company, that company could organize a downvoting campaign to hide it.

That's how story flagging currently works though, and is significantly more effective


Yeah, I'm not sold. One example from this post that struck me was that the author wanted to embed a select in a table expression. I'm not a fan of this at all. I want it to be clear whether a given expression list will explode in values or not.

I like the fact that SQL has a solid foundation in relational algebra. I see no such foundation for the alternative.

I do like what LINQ did here (being SQLish), which was to put the FROM clauses first. Some SQL variants have WITH clauses that are quite convenient but you end up with:

    WITH a AS (...),
    b AS (...)
    SELECT
      a.a1,
      b.b1
    FROM a
    JOIN b
    ON a.a = b.b
Common alternative:

   SELECT
     a.a1,
     b.b1
   FROM (...) a
   JOIN (...) b
   ON a.a = b.b
whereas I'd prefer:

   FROM (...) a
   JOIN (...) b
   ON a.a = b.b
   SELECT
     a.a1,
     b.b1


> Some SQL variants have WITH clauses

“Some variants”, including standard SQL since SQL:1999.


WITH is supported by all major SQL brands in the meantime.

https://modern-sql.com/feature/with#compatibility


Does the spec have any execution requirements re WITH? Pretty sure `WITH` in PostgreSQL acts as an optimization fence and thus can have considerable performance implications vs a nested query (or none at all, depending on the query).


Execution is down to the RDBMS.

Postgres is changing in the next release, BTW: https://www.depesz.com/2019/02/19/waiting-for-postgresql-12-...


The impossibility of WITH clause inlining is indeed one of the major limitations of Postgres compared to Oracle. We still have to wait for optimizer hints


If the language is the problem, why write a new server?

Transpile your language to the equivalent SQL, and rely on decades of research and real world experience in things like replication, optimization, locking strategies, high availability, security etc. the things unrelated to the language current SQL databases are really good at.

Enterprises need enterprisey features.


This is exactly what we do. EdgeDB is based on PostgreSQL.


And do you support other backends as well (i.e. MySQL?). Does EdgeDB still offer a regular SQL interface because people need time to migrate.

Even if I was convinced EQL was the future I wouldn’t throw out the old stuff. If the old stuff continues working but there is a smooth migration path, I would probably give it a try.

Think of TypeScript vs. JavaScript.


> And do you support other backends as well (i.e. MySQL?). Does EdgeDB still offer a regular SQL interface because people need time to migrate.

No, but we have a few ideas on how to connect existing databases to EdgeDB.

> Even if I was convinced EQL was the future I wouldn’t throw out the old stuff. If the old stuff continues working but there is a smooth migration path, I would probably give it a try.

Yes, we'll be working on that.


I know CompanyDB is the new company.com but I feel this would all be a bit more clear if you just called the company EdgeQL.


Which version, and how easy is it for you to track the 'latest and greatest' Postgres version, as they come out?


Latest stable and we'll keep it that way.


That's wonderful! It's good to know that there's a tested DBMS under the hood.


It looks like this is built on top of Postgres, so they're already doing this.


I missed that. It makes sense.


Because the 'servers' in existence speak only one language and it is that problematic one we want and need to get rid of.

Your 'transpile' handwaving won't fly. You are blindly presuming that the 'SQL equivalent' (a) always exists (it doesn't) and (b) can always be generated by an automated transpiler (it can't).


I have no background in databases so this may be naive or wrong, but the single biggest pain point in SQL that comes to my mind is that it can be difficult to tell what a query is doing without also knowing the constraints on the tables involved. Here's a real-life (ish) example from work:

We have some_table which we want to join to other_table, but we need to map an identifier through mapping_table in order to do it. So we end up with a query like:

SELECT (...) FROM some_table INNER JOIN mapping_table ON (...) INNER JOIN other_table ON (...) ...;

I know for sure when writing this query that the middle join to mapping_table will map every some_table row to exactly one row (no more, no fewer) in mapping_table. The problem is that the query doesn't capture this. The mapping table isn't really named something as obvious as "mapping_table" so someone reading the query has a hard time inferring what the intent was. It totally changes how you mentally parse and think about the query if the result set can be accumulating multiple matching rows from the join, or maybe even losing rows if there are no matches. You have to go bring up your database schema to figure this out.

And, as a fan of static typing, I can't help but cringe at the possibility of someone changing the constraints on the table without realizing that there are queries which implicitly depend on the old ones. SQL offers no resilience to this and will happily change the meaning of your query without a peep of complaint if you drop that constraint from mapping_table.

If there's a fancy way to capture this "mapping" relationship in standard SQL that doesn't just use a dumb inner join, I'd love to know about it. If not, I'd love a query language that supports some annotations that help reading and are either stripped out before sending to the database engine, or are actually checked at runtime.


If I understand correctly you are describing a one to one or one to many relationship. The canonical way to express that is 'don't use a mapping table'. Mapping tables are for many to many relationships. So why is there a mapping table in your example, and could you just get rid of it altogether?


In this case it's tying together data from different, independent systems. So the schema may not be textbook ideal but I don't think there's any way around it.


If you really must use a mapping table, I think you could put a unique constraint on both of the foreign key columns to the mapped tables. That would enforce that every row in one table maps to at most one row in the other
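A quick sketch in SQLite (table names invented to match the example): with UNIQUE on both foreign key columns, the mapping table can only express a one-to-one correspondence, and the database rejects anything else.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE some_table  (id INTEGER PRIMARY KEY);
CREATE TABLE other_table (id INTEGER PRIMARY KEY);
CREATE TABLE mapping_table (
    some_id  INTEGER NOT NULL UNIQUE REFERENCES some_table(id),
    other_id INTEGER NOT NULL UNIQUE REFERENCES other_table(id)
);
""")
conn.execute("INSERT INTO some_table VALUES (1), (2)")
conn.execute("INSERT INTO other_table VALUES (10)")
conn.execute("INSERT INTO mapping_table VALUES (1, 10)")

# A second mapping to other_table row 10 violates the UNIQUE constraint.
rejected = False
try:
    conn.execute("INSERT INTO mapping_table VALUES (2, 10)")
except sqlite3.IntegrityError as e:
    rejected = True
    print("rejected:", e)
```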


I don’t think that fundamentally changes the nature of a one to many relationship though. Unless your mapping table has a variety of different columns to join on.


Can you use views or other feature to tie together a table and its mapping table into a single view?


> I know for sure when writing this query that the middle join to mapping_table will map every some_table row to exactly one row (no more, no fewer) in mapping_table.

Your example query joins one some_table row with 1-n rows on mapping table, and another 1-n rows from whatever else is in there to that. If you're expecting a single row in the resulting set per some_table row, it means that you're filtering very hard (which is fine) or that you've a schema problem (which is the actual problem).


In CQL (http://categoricaldata.net), which generalizes relational theory with category theory, you can annotate schemas with equations and have them checked at runtime, or at compile time with an automated theorem prover (e.g., to establish that a query into a schema with a constraint will always materialize an instance that satisfies that constraint). One example is de-normalization: https://www.categoricaldata.net/denorm.php


I get that with some queries, you're not sure how the data is going to be retrieved, and letting the database figure all of it out for you is a good strategy. But with a lot of queries, particularly the ones that I'm doing in the milliseconds of a pageload, I want to be very sure that all my joins are hitting efficient indexes. I hate that someone can change my schema, and that can turn my efficient index lookup into a horrible scan, without breaking tests.

Folks who know more SQL than me: Is there a good way to say "I would rather this query fail than try to scan a table?"


> I hate that someone can change my schema, and that can turn my efficient index lookup into a horrible scan, without breaking tests.

They can't, if you do perf testing as part of your pre-release testing, which you should do.

If you aren't doing perf testing, you are saying perf isn't an acceptance criteria, so why are you upset that breaking perf doesn't break tests?


Where you start to go wrong is where your talk starts to be of "YOUR" schema. The schema isn't yours, it's the company's. And guarding it is the DBA's job.

(I understand full well that that is a problem if the company has kicked out the DBA role and handed it over to the individual programmers, but perhaps that is precisely the problem.)


Fair, though maybe I could describe the same problem from the other end. If I'm a DBA and I want to change the way indexing is done, I might need to audit every query in my company to figure out which ones depend on the old index. One way or another, it seems like knowledge about a query's intent to use an index could've been captured explicitly, in a way that's automatically enforced in the future against any number of accidental breaks. But instead all that knowledge is implicit.


Integrating EXPLAINs in your build/deploy process should, in principle, make it possible for anyone to address any concern in such realms.

The DBA mentioned here, for example, could first run a query over the EXPLAIN results, and he'll have his "company-wide query audit" in a matter of seconds.
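As a minimal sketch of the idea using SQLite (Postgres's `EXPLAIN (FORMAT JSON)` lends itself to the same treatment): plan output can be inspected programmatically, so a build step could flag any query that falls back to a full table scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("CREATE INDEX t_val_idx ON t (val)")

def uses_full_scan(sql: str) -> bool:
    """True if any step of the query plan is a full table scan."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column is a human-readable detail string,
    # e.g. "SEARCH t USING INDEX t_val_idx (val=?)" vs "SCAN t".
    return any(row[-1].startswith("SCAN") for row in plan)

# The indexed predicate uses the index; the expression defeats it.
print(uses_full_scan("SELECT * FROM t WHERE val = 'x'"))
print(uses_full_scan("SELECT * FROM t WHERE lower(val) = 'x'"))
```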


I am quite sure other implementations of SQL have similar tools, but I know this from my support of an iSeries.

The SQL scripting function is a tool run from a desktop; all emulation and such is Java-based, with the feature to ask the system what the query is doing. The feature, called Visual Explain, will explode the query into a graphic representation of how the system optimized it to run. It will recommend indexes as needed. This is very good for understanding when table scans are forced, how files actually get joined, and more.


> And, as a fan of static typing, I can't help but cringe at the possibility of someone changing the constraints on the table without realizing that there are queries which implicitly depend on the old ones

When creating views and sprocs, SQL Server lets you mark them as 'schema bound', creating dependencies on the schema objects it uses.

Not sure if something like this exists in Postgres?


All views in PostgreSQL are schema bound, which is usually good but can be annoying at times.


Ah, I didn't know that!


I don't disagree with the obviously true statement, but this code comment comes to mind:

    // Dear maintainer:
    // 
    // Once you are done trying to 'optimize' this routine,
    // and have realized what a terrible mistake that was,
    // please increment the following counter as a warning
    // to the next guy:
    // 
    // total_hours_wasted_here = 42


Also, don't forget the hours wasted of the people having to learn another query language. :)


You can test that for yourself. Go through the three language-related sections on my site and decide for yourself how many hours you'd need to "waste" before you would "get it".

And as for "waste": if, while learning a better language, they are in addition also learning the relational model and to think relationally (their knowledge of both of which will be VERY poor if all they've ever seen is SQL), then there can't have been much time "wasted", can there?


Or the XKCD with the 14 competing standards and, "let's make one good standard," and now you have 15 competing standards.


I think "ease of use" over SQL is not the hill I would die on if I were trying to displace SQL.

It's far too embedded throughout the entire industry and as a data analyst, learning EdgeQL vs SQL and then being locked into a new startup database that could disappear in a year doesn't seem like a high probability strategy.

I wish the people all the luck but unfortunately SQL is "good enough", pretty standardized (I can use just about any relational database and get useful data by knowing the basics). The inconsistencies may be mathematically "ugly" but it's not hard to wrap your head around and overcome.


A good replacement tech usually gets piggy-backed on the tech it's going to replace.

That is, a reliable tool that translates 99% of normal SQL into readable Edge SQL, and vice versa, would help adoption a lot.

Remember how new JavaScript features became mainstream though transpiling, long before native implementations.


Agreed. I think this is one reason that Looker's LookML has been successful. Not that it's entirely what I'd want out of a "SQL replacement", but it's an enhancement that "compiles" down to SQL rather than looking to replace it. Plus, you can always go direct to SQL, in case you need to take advantage of some specific feature or complexity that their language doesn't address itself.


No it won't, because you add immense friction for no added features.


Composability is quite a feature for me.


I accidentally built an in-memory database that now lives prominently in our production stack. It works great, it's incredibly performant, the codebase is relatively simple (it makes heavy use of code generation), and it will scale very well - but not a day goes by that I don't think: what if I had just taken the time to adapt an existing solution to the problem set?

There are just so many free things you get with SQL and an established RDBMS that deeply impact application features, quality, stability, operations, and much, much more. I've had to write a custom MongoDB-like interface for querying, as well as a fair number of hacky bits to effectively cover the surface area of SQL in an inferior way.

I've learned tremendously, but I just wish people don't follow in my exact footsteps because that's probably wasted time.


SQLite is a wonderful tool for such a scenario.


As someone who ended up doing what you did, I also had days where I felt what you have described here. On the other hand, by going custom you can usually exploit some understanding you have of your problem to great effect.


The middlebrow dismissal strikes again!

Why is it so hard to imagine something displacing SQL? A simpler, more predictable syntax seems perfectly plausible--it could ship alongside SQL. Do we have Stockholm syndrome?

The negativity is surprising and at the same time predictable.


Because of the decades of things not displacing it - when someone suggests "oh we'll just do it simpler" they often are not seeing the forest for the trees.

Simpler languages have been shipped dozens if not hundreds of times, and they generally tend towards expressing the things they missed or not giving enough functionality for the things they missed.

I am not saying its impossible, but you're going to have to do a lot more than hand waving to justify the reverse position.


It's sort of like the mouse trap problem. Mouse traps and SQL are already incredibly simple and incredibly effective; that's why you don't see a reinvented mouse trap at home depot and why SQL remains unseated despite many efforts to replace it.


"that's why you don't see a reinvented mouse trap at home depot"

Interesting for you to say that. The last time I went looking for mouse traps, I found that every place I went had a reinvented type, and traditional style were extremely hard to find. Yet, it is true that the traditional style is simple and effective. I've learned from experience that anything else is likely to be useless.


You really think SQL is incredibly simple? I think TFA makes a pretty convincing argument that it is anything but.


Most common SQL statements read like somewhat stilted english. Many non-programmers find this particularly accessible.

Yes you can make some lovecraftian horrors if you really want to, but SQL is one of those things where just a little bit of knowledge goes a long way. If you can understand the basics you can get a lot of work done.

It's a lot like Excel. You can do some really complex confusing stuff in Excel. But you can also teach the basics to non-programmers quite easily, and command of the basic skills will be very empowering. Basic knowledge of Excel, like SQL, gives the user new ways to leverage computers when creating their own solutions to their own problems.


> Most common SQL statements read like somewhat stilted english. Many non-programmers find this particularly accessible.

The problem is not reading SQL, but writing it.

> It's a lot like Excel. You can do some really complex confusing stuff in Excel […] Basic knowledge of Excel, like SQL, gives the user new ways to leverage computers when creating their own solutions to their own problems.

I can’t speak to your experiences, but I’ve never in my life encountered someone who was not a professional programmer ever even contemplating using SQL for anything let alone creating their own solutions to their own problems. I think it’s safe to assume that the overwhelming majority of people who use SQL, are programmers who most certainly cannot get by with just basic knowledge.


Librarians and secretaries are two examples of "non-programmer" careers where functional knowledge of SQL is pretty common. Less so these days with secretaries, but more so for librarians. Not to mention tons of researchers across countless disciplines have SQL in their toolboxes. I've even met government bureaucrats with professional backgrounds in regional banking that know SQL. Previous programming experience? Using HP-12c calculators...

>I think it’s safe to assume that the overwhelming majority of people who use SQL, are programmers who most certainly cannot get by with just basic knowledge.

Most demonstrably do though, so there's that. If you cast a wide net when polling programmers, I think you'd find that mode level of knowledge was relatively low. You don't need to be a SQL rockstar ninja dude to do what most professional programmers are doing with SQL most of the time. Obviously advanced knowledge is good for any professional programmer to have, but the fact is there are a TON of people out there who only know the basics, and that works for them.


"I’ve never in my life encountered someone who was not a professional programmer ever even contemplating using SQL for anything"

I have. That's only an anecdotal observation, but it seems to me that the existence of Visual Basic and the popularity over the years, combined with the utter disdain for it by "real programmers" is evidence that, more generally, there are a huge number of kinda, sorta, programmers who are outside the IT culture.


In the business world (e-commerce in my case) I know plenty of business analysts who write SQL as their only programming experience. That’s part of what I like about SQL. It bridges the gap.


Our support guys are mostly hired from our customer base. They have great domain knowledge, but do not have any formal tech training.

After some time, most can handle enough SQL to help customers with basic issues that cannot be handled in the application. Some have become quite good at it, and can do quite non-trivial stuff. None of these folks write any code beyond SQL.


> Yes you can make some lovecraftian horrors if you really want to, but SQL is one of those things where just a little bit of knowledge goes a long way.

I agree. At the same time a little bit of knowledge is incredibly dangerous.

That query, which worked so brilliantly on the test system during development, suddenly grinds all of production to a halt.

The basic problem is that indexing and other physical performance boosters were not really required when those queries were tested with 3'000 customers.

Being set-based that's quite different when you suddenly deal with 30'000'000 customers and a number of joins, which may not be supported by indexes, since that was never obvious in development.

That said: I'm not arguing against SQL. It's a great language for its purpose. What I do argue for is to have an SQL domain expert and an expert on how it physically maps to the underlying database engine for more complex projects.

Such a resource can be immensely valuable in assisting application developers to avoid major mistakes when they deal with the underlying database.

Edit: A couple of issues, which actually negated my argument upon reread


Certainly there are complex aspects of it but you could teach someone how to do most of what you need to do in SQL in under an hour.


> The middlebrow dismissal strikes again!

Yes, it was that, but, still, there's a big and growing hurdle here that many similar efforts, with similar objective merits compared to contemporary SQL implementations, have failed to overcome, and not a lot of reason provided to think EdgeDB is better positioned.

> Why is it so hard to imagine something displacing SQL?

Because systems providing just as good solutions to largely the same set of SQL deficiencies have been produced and failed to displace SQL for a couple of decades.

The problem isn't doing better than SQL. It's doing enough better than SQL to overcome the depth of knowledge, experience, support, tooling maturity, and comfort people have with SQL. And that moat gets deeper over time, on top of SQL getting internal mitigations, if not actual solutions, to some of the problems over time.

That said—as I’ve done with several before—I’ll probably download EdgeDB and try to do some stuff with it.


That's like saying nothing will replace COBOL. Or nothing will replace Fortran. Or C++. Or Java.

There's no reason the industry can't move to new technology for new projects. No one is going to rewrite legacy applications in the new language, at least not until the transition is so far along that SQL specialists are aged out and costly. But for new things? It's totally acceptable to pick new languages.

And btw, this has already happened in some areas. Most developers I know don't code in SQL: they use an ORM provided by their language runtime or a support library. That's essentially the same thing.


> And btw, this has already happened in some areas. Most developers I know don't code in SQL: they use an ORM provided by their language runtime or a support library. That's essentially the same thing.

This works well enough until your ORM shits the bed and you're forced to figure out why your SQL database is "slow".

SQL isn't slow; ORMs are just terrible when you hit an edge case or when you pretend there isn't a relational model behind your opaque materialised objects.

I used to be one of those developers you mentioned, but I'm not anymore after years of debugging "SQL performance issues" (hint: it was the ORM), and actually taking the time to learn the language and take advantage of specific RDBMS features. My preference has shifted to just use a lightweight library to materialise objects (like Dapper), and write the queries myself.
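A rough sketch of that lightweight-mapper style (Dapper itself is a .NET library; this uses Python's sqlite3 and a made-up `query` helper as a stand-in): the SQL stays hand-written, and only the row-to-object step is automated.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Customer:
    # A plain object: no ORM metadata, no lazy loading, no query generation.
    id: int
    name: str

def query(conn, cls, sql, params=()):
    """Materialise rows into objects; the SQL itself stays hand-written."""
    cur = conn.execute(sql, params)
    cols = [d[0] for d in cur.description]
    return [cls(**dict(zip(cols, row))) for row in cur.fetchall()]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers (name) VALUES ('Ada'), ('Grace')")

customers = query(conn, Customer, "SELECT id, name FROM customers ORDER BY id")
```

Because the query string is visible and explicit, there is no generated SQL to reverse-engineer when performance goes wrong.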


Oh I couldn’t agree more. I didn’t mean it as an endorsement of ORMs, but it does demonstrate that SQL isn’t entrenched as the sole user interface to RDBMS.


Would you adopt an extra abstraction layer (bugs, maintenance, incompatibilities and surprises included) over your database to get a coherent handling of nulls and a few other optimizations of this level?

I know I'm staying with the nulls. I wish them the best of luck, and if it survives to maturity, I'll happily go get the ~1% (probably less) more productivity they offer. But right now I'm not moving. It's sad, really, but things are stacked against them. Change is costly, so we lose all the small changes that could compose into something huge.


You even don't have to "imagine" it anymore, it already exists. See the projects list at http://www.thethirdmanifesto.com .

The crux is: it takes much, much more to achieve that than what the average C# or Java coder has to offer. So the average C# or Java coder will dismiss those things too.


It could happen. However, the probability of success for any given attempt is low. Consider all the attempts at improving on JavaScript before Typescript.


Totally agree. It’s a bedrock technology that’s near-universal as far as company needs, at least in startup and web development environments.

The JS framework churn makes me super appreciative of the stability that SQL brings.


JS framework churn isn't what it was five years ago


You're right, it's far far worse now.


Citation needed? I've been using React for 5 years and don't foresee not using it any time soon.

Even if it's still true, "churn" translates to progress. It would be hard to argue complex web application UI development isn't better off now than it was 10 years ago.


> Citation needed?

I don't have a citation, but a colleague of mine who's working on front-end projects repeatedly complains that today's JS stacks require half a dozen base JS packages, which in turn download dozens if not hundreds of ancillary packages, not to mention requiring a couple of transpilers and a bunch of tooling.

And for what? Well, just to be able to render some text and a couple of buttons.

Nowadays we have whole server projects that take less than 50MB of source code and dependencies to build, while a miserable SPA with a login screen and a couple of menus and buttons requires nearly 400MB of JS.

That's pretty bleak.


Serious question: why do you care about 400MB vs 50MB of dev dependencies? Is your internet connection slow? Are you running out of hard drive space?

There are projects like create-react-app and Parcel which offer very reasonable zero-configuration toolchains.

If you care a lot about runtime dependency weight, there are lightweight libraries like Preact (3kB).


I know. Things were so much nicer 5 years ago just before React, and the other myriad frameworks we have now, came out.


Out of curiosity, which frameworks/libraries are you referring to, and why do you think they are nicer than the "UI as a function of state" style libraries like React?


SQL is good enough for the existing set of applications, but that's not really saying much. There are lots of other applications for which SQL is not good enough, and those applications either don't exist because an affordable alternative doesn't exist or they implement their own proprietary database (e.g., many popular BI tools). It's safe to say that your use case--using SQL to perform one-off queries--is fine; writing a program that can dynamically build (performant) SQL to access data of arbitrary schema is quite a lot harder even if you can assume a single implementation. And much of this difficulty comes down to lack of composability.

Perhaps SQL is fine if it's your interface for accessing data on a one-off basis, but if you're trying to build a complex tool on top of it (say, an analysis tool for arbitrary data), the inconsistencies and performance concerns mount. People often end up inventing their own proprietary databases to do these analyses (e.g., virtually any business intelligence tool) assuming they can afford to do so. Perhaps EdgeQL isn't the ideal alternative, but as it is SQL is not good enough for many use cases.


SQL is extremely expressive, it's almost impossible to build something that cannot be expressed in an SQL query. In most cases when people feel like SQL cannot do something it is either because the Database does not implement a part of the standard or because they are not familiar with some of the more advanced usage of SQL. Simple SELECT FROM WHERE clauses, even including JOIN, are still fairly simple compared with what you CAN do if you want.

I'd recommend reading the PGSQL manual; they go very in depth about many of the supported features and how they are implemented and can be used.


"SQL is extremely expressive, it's almost impossible to build something that cannot be expressed in an SQL query."

That's kind of orthogonal to what I think is the issue being expressed here.

SQL can do many things; the problems tend to be when a query doesn't perform consistently and predictably. There's always a balance to be struck between communicating what is to be done, and how it is to be done, and SQL leaves so much of the "how" out that the query interpreter/optimizer is incredibly sophisticated and does a fantastic amount of work and yet frequently gets things spectacularly wrong, maybe due to misconfiguration and maybe due to fundamental limitations.

Obviously more information on how to do something is not always better; otherwise we'd be using assembler. But there is a balance.

Experts tend to say "write everything in one query, and if it doesn't work, fix the configuration of your database" which is not helpful given the division of responsibilities in any company. But they will say that because they are devoted to the idea that all that expressiveness is good for something.


> SQL is extremely expressive, it's almost impossible to build something that cannot be expressed in an SQL query.

If that were true, people would build RDBMS’s in SQL.


SQL isn't a general purpose language, it's a data query language.

It fulfils its purpose and does it well.


Not true.

Since the addition of SQL/PSM (1996 IIRC) it has become a computationally complete language (procedural like many of the others) with variables and loops and what have you.


That's only true tautologically--i.e., if you decide to constrain "its purpose" to the set of things SQL does well. If you want to do something perfectly reasonable--like programmatically building queries to access data of arbitrary (read "unknown at compile time") schema, you'll find it's quite hard to do this, at least if you care about performance at all. Largely because SQL doesn't compose well.
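A sketch of what that looks like in practice, using Python's sqlite3 (the `select_all` helper and the table are made up): with the schema unknown until runtime, you end up introspecting catalogs and assembling strings, with engine-specific introspection and no help from the query language itself.

```python
import sqlite3

def select_all(conn, table):
    # Introspect the schema at runtime (PRAGMA table_info is SQLite-specific;
    # every engine spells catalog introspection differently, which is part of
    # the portability problem).
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    # Assemble the query by string concatenation: quoting, validity and
    # performance are entirely the caller's problem.
    col_list = ", ".join('"%s"' % c for c in cols)
    return 'SELECT %s FROM "%s"' % (col_list, table), cols

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'x')")
sql, cols = select_all(conn, "t")
rows = conn.execute(sql).fetchall()
```

Nothing here composes: the fragments are strings, not values the database language itself can check or optimise.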


Optimization requires knowing the schema, and query usage patterns, and data stats - which competent RDBMS engines use to great effect already.

Sounds like you're looking for a magic silver bullet - there's no free lunch in our field though.

Lastly, the comment I was replying to can be paraphrased as "well is SQL is so great why aren't RDBMS' built using SQL, huh?". Which is a ridiculous question since SQL isn't the right tool for that job - its very name tells you that.

If you want to continue arguing against strawmen do it with someone else.


Most of an RDBMS is actually built using SQL. (For example, in PG constraints and foreign keys are done using triggers and SQL functions.)


Same goes for JavaScript. It's so ubiquitous that it's worth putting up with the downsides.

It'll take several unicorns and Fortune 100s hiring thousands of engineers to code in an SQL-alternative to create an ecosystem large enough to eventually overtake SQL.


This situation seems comparable to Typescript in this analogy. It has taken several years, but I feel like it is beginning to take hold.


Agreed on all of that. Further, I wouldn't embrace something intending to displace SQL without it being authored by someone like Anders Hejlsberg (Turbo Pascal, Delphi, C#, TypeScript). That sort of involvement grants confidence that it's as "correct" as it can be for most users. That matters for buy-in. It can happen, Kotlin is a decent example, but is not embraced as widely as TypeScript has been. I'm sure that confidence plays a big role there. The stakes are much higher here with SQL, than in Java or ECMAScript.

There's plenty of brilliant people out there, but when you're talking about replacing the most successful data storage language in the history of mankind, you need everything. SQL has been "killed" many times, everyone wants to sell something. It would probably take involvement from a FAAMG entity. I think the first clue that there's no real room for technical innovation and we're staring at only the opportunity for technical churn, is that no FAAMG players, who definitely operate at-scale, have tried to displace SQL outright already.

TypeScript or Kotlin are really the closest, best and most recent examples of what would need to happen. For me as an end-user, ubiquity and skill-reusability matters. If you don't like SQL, there are ORMs.


In the financial industry, I have seen a couple of places where SQL was the interface for their payment systems. Mind you that these payment systems were not written using any kind of relational paradigm.


Hi - a lot of commenters have valid concerns and critiques, but I joined HN after lurking for years to say that I really like the direction you’ve taken! I’m particularly happy about the convenient syntax for joined inserts and sum types.

I see that you’ve built this as a patched version of Postgres, but I’m curious how much of this syntax you could implement as a client library and shell that would run against an existing Postgres instance.

Right away, you would get adoption from people who have an existing Postgres, or who want to take advantage of SaaS offerings like AWS Aurora.

Longer term, I could imagine the client/shell being extended to support multiple backend DB dialects, even things like Spark or Redshift which you’d have a hard time modifying intrusively.

It could also be cool to explore interoperation with existing schemas written in plain SQL, so people could adopt it incrementally that way.


Thank you for your comment, it aligns pretty well with our current thinking about this.

> It could also be cool to explore interoperation with existing schemas written in plain SQL, so people could adopt it incrementally that way.

Yes, this will happen too.


The article criticized SQL because the following expressions are incompatible

> SELECT * FROM table

and

> SELECT count(*) FROM table

This is actually not true, as both return table values. It then says that, in EdgeQL, every expression results in a 'set'. This is a distinction without a difference.

I don't disagree we can do better, but this is the same.


It's only a slight misstatement; scalar queries are a subset of table queries in SQL, rather than a different thing, but scalar queries are allowed in places where other table queries are not.

The difference in EdgeDB isn't really everything returns a set, but seems instead to be that everything consumes sets and not just scalars.


You are exactly right about everything consuming sets in EdgeDB. Even when a function is defined on scalars, it's really defined on singleton sets. Literals are also singleton sets, so "1" and "{1}" are equivalent and so are "foo(1)" and "foo({1})". Usually we omit the set braces for singleton values to reduce visual noise.


There is a difference. You are right in a sense that scalar expressions are a form of a table expression, but...

This is a valid expression:

  SELECT (SELECT count(*) FROM table) + 1
This is not:

  SELECT (SELECT * FROM table), 1
In EdgeQL:

  SELECT (count(SomeType) + 1)
and

  SELECT (SomeType, 1)
are equally valid.


The brackets make it look like they are both subqueries contained within the EdgeQL examples. As such, you could write the examples in SQL like this:

    SELECT count(*) + 1 FROM table
And

    SELECT *, 1 FROM table
Both of which are valid in SQL. And you'd get the same result as in the EdgeQL examples.

Indeed, a more relevant second counter example would be:

    SELECT (SELECT * FROM table) + 1
Not valid... unless `table` only had one row and one column containing a number.


Yes, and that's actually the point being made in the post. You need to rewrite your queries to make SQL happy.


The EdgeQL query has no obvious meaning, unless (,) calculates the cross product. Saying

SELECT Table1, 1

returns each row of Table1 along with 1, and that 1 returns the set {1}, means that you have arbitrarily assigned each row of Table1 a row from the set {1}. This gives what you want in the case of a scalar, but what about a non-scalar?

For example, suppose Table1 contains (John, Smith), (Alice, Perkins), (Bob, Best). Then, what should

SELECT Table1, {1,2}

return? If you do not say cross product, you have arbitrarily assigned rows to one another, resulting in meaningless data. If you say cross product, then you have rewritten the SQL expression

SELECT * FROM Table1, (VALUES (1), (2))

I don't disagree the SQL syntax is longer and has some unnecessary keywords, but unlike EdgeQL, the query means something particular.


In EdgeQL, expressions are element-wise functions over the cross-product of the input sets:

    edgedb> SELECT {'First', 'Second'} ++ '!';
    {'First!', 'Second!'}
EdgeQL queries are essentially set comprehensions.
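That element-wise-over-cross-product semantics can be modelled in a few lines of Python (a rough illustration of the idea, not EdgeDB's actual implementation):

```python
from itertools import product

def lift(fn, *sets):
    """Apply fn element-wise over the cross-product of its input sets."""
    return [fn(*args) for args in product(*sets)]

# SELECT {'First', 'Second'} ++ '!'  -- the literal '!' is the singleton {'!'}
result = lift(lambda a, b: a + b, ["First", "Second"], ["!"])

# With two multi-element sets, every pairing contributes one element.
sums = lift(lambda a, b: a + b, [1, 2], [10, 20])
```

Singleton sets make scalars a special case rather than a separate type, which is what lets the same rule cover both examples.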


Right, so SELECT Table1 and SELECT 1 compose to SELECT Table1, 1 in the same way that

SELECT * FROM TABLE1

and

SELECT 1

compose to

> SELECT * FROM TABLE1, (SELECT 1)

Aside from the different syntax, it is the same kind of composition. Not sure what EdgeQL has gained here, then.


Erm, isn’t the syntax the whole point of this discussion?


Is this such a common problem that you need a new language to solve it?


We think that people struggling with writing good/advanced SQL queries, and the existence of ORMs and other heavy frameworks to make RDBMSes "friendly", is a serious problem. And inventing a new query language is a necessary step to make relational databases more accessible.


Just to clarify (and I strongly dislike overuse of ORMs)...

Some people use a thing x

A subset of those people use a subset of x called y

This subset of people have built a tool to make y easier to express

Thus x is invalid

I think that's a really bad argument. If everyone were using ORMs then we'd all merrily forget about SQL and just use ORMs, which would become the new "language for structuring queries", but we don't all. EdgeDB in particular thinks that ORMs provide a much degraded expression set compared to SQL, and thus wants to tweak X so that it's as expressive as the tool for expressing Y, without making it just about Y. I think your goal boils down to shifting the trade-off point between X and Y, the extreme of which is writing a very expressive Y that also supports X but in a non-expressive manner.


> SELECT (SomeType, 1)

This cannot be the case if you call each thing a 'set'. Unless `(a, b)` calculates the cross product, there is no meaningful non-arbitrary way to assign each element of a to an element of b. That would depend on an ordering, which makes it not a set to everyone but a marketing dept somewhere.


`(a, b)` is a cross-product, just like "SELECT * FROM a, b"


They're saying that both of those expressions are valid, not equivalent.

SELECT count(*) FROM table

and

SELECT * FROM table

Are both valid, not equivalent.

Furthermore, listing any kind of set depends on some ordering, be it random.


> SELECT (SELECT * FROM table), 1

What would you expect to be returned by this query?


Yes.

I think many here totally miss the point. It's the same mindset that leads people to believe C, bash or unix are good(?), so why bother?

I'm very lucky that I started with FoxPro, so I know what it is to live with a MUCH better thing than SQL.

To the point:

- "SQL is standard, everywhere... why try to change or do something else?"

I hope none here work on new software products.

And also, the point is not to try to replace ALL the OLD with the NEW; it is to try to DO BETTER THAN THE OLD.

I suspect most of us are in that business... right?

- "SQL is a good language!"

Superficially, yes... as good as JavaScript or C. More exactly, for being the "default" option we are VERY lucky SQL is not like C or JS. But "good" it is not. It is adequate.

IF you DON'T KNOW WHY then learn the why first. For those of us who LIVE in the RDBMS world it is clear as water that SQL is not the ideal; it is what we have.

----

It is important to understand that "SQL", RDBMSs and the relational model are badly misunderstood (just look at how the NoSQL movement markets itself!)

- SQL IS NOT THE RELATIONAL MODEL; it is BASED on it.

The relational model is MORE expressive, powerful, flexible and EASIER than SQL.

- SQL IS NOT THE RDBMS

Much better APIs can be built on top of an RDBMS, but SQL is a poor language for that. This is part of why so many extensions exist.

Think, for example, why you can't do

    SELECT key FROM BTreeIndex_CustomerCode
    GROUP BY (whatever -- And I mean GROUP, not SUMMARIZE!)
    let city = SELECT name
    let filter = WHERE id = @id
    let query = city + filter(id=1) 
and many many other things like that

Many things that get mixed up with the "SQL" language are part of the implementation or capabilities of the RDBMS.

- However, SQL IS A FINE EXAMPLE OF HOW TO DO DATA

And in fact, it is not that hard to imagine a much better version; SQL is not that far from a better option!


I worked as a FoxPro programmer for 18 years. I later switched to the Clojure language for 8 years. I used the functional programming language Clojure as SuperFoxPro, which is very easy to use and enjoyable. I have formed a new programming idea ---- `Everything is Foxpro (RMDB)`. ;-)

I advocate: building a relational data model on top of hash-maps to achieve a combination of NoSQL and RMDB advantages. This is actually a reverse implementation of PostgreSQL.

[Clojure is a functional programming language based on relational database theory](https://github.com/linpengcheng/PurefunctionPipelineDataflow...)

[Everything is RMDB](https://github.com/linpengcheng/PurefunctionPipelineDataflow...)

[Implement relational data model and programming based on hash-map (NoSQL)](https://github.com/linpengcheng/PurefunctionPipelineDataflow...)


Excellent take. The functional model is not too far.


I didn’t really understand your post other than a high level point that you believe SQL is insufficient.

I’m not quite understanding the benefit of your example. I don’t personally want to write that style of syntax.


> you believe SQL is insufficient

I don't believe it; it is insufficient, as a fact.

Compare

    let city = SELECT name
    let filter = WHERE id = @id
to

    let city = map name
    let filter = filter(id = id)
    let query = city |> filter 
this is to show that SQL, despite being a programming language, can't compose well.

The syntax is not the point.

You can build full apps in Forth or Lisp (as examples of even more minimal langs than SQL), but now consider the idea of building a full app in SQL, without another language making up for the rest of its limitations.
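The kind of composition the `let city = ...` lines above are asking for is trivial once query fragments are first-class values; a hypothetical sketch in Python (all names and helpers are made up for illustration):

```python
# Query fragments as plain first-class values over rows (dicts).
def where(pred):
    return lambda rows: [r for r in rows if pred(r)]

def select(*cols):
    return lambda rows: [{c: r[c] for c in cols} for r in rows]

def pipe(rows, *stages):
    # Compose any number of fragments, in any order, reusing them freely.
    for stage in stages:
        rows = stage(rows)
    return rows

cities = [{"id": 1, "name": "Miami"}, {"id": 2, "name": "Boston"}]
city = select("name")                              # let city = map name
by_id = lambda i: where(lambda r: r["id"] == i)    # let filter = filter(id = id)
result = pipe(cities, by_id(1), city)              # let query = city |> filter
```

SQL clauses, by contrast, cannot be named, stored, or recombined as values without wrapping everything in views or subselects.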


You’re asking me to consider the idea of building a full app in SQL. I would ask you - why?

SQL wasn’t designed nor ever meant to be able to write a full app. You’re expecting it to do something it wasn’t designed to do. Because it is a programming language doesn’t necessarily mean it ought to be able to solve any kind of problem.

I personally don’t think SQL is all that great, for various reasons (lack of standardization with various dialects and their own gaps or abilities).


> I would ask you - why?

Optimally, you'd want to build your full app in a single language: frontend, data queries, everything. That eliminates every problematic interface where you have to re-encode values, verify types and other invariants, connect different runtimes, abstract foreign behavior, etc. (Somehow the JS developers talking about isomorphic code only cite code reuse... And the Lisp ones focus on metaprogramming.)

We have a variety of good application languages, so it's a safe bet that one can create a not-horrible one; we also have a few standards for IPC and some solid literature on how to integrate them with a language. Foreign interfaces are always iffy, but it's also settled how to create a not-horrible one. The riskiest part is data querying, so it makes sense to start with a proven implementation there.


> You’re asking me to consider the idea of building a full app in SQL. I would ask you - why?

"We Can Do Better Than SQL"

And the fact that some can't imagine building a full app in SQL (not as it is TODAY but as it could BE) is sad.

I lived in that world before. With the dBase/FoxPro family you can totally build a full app in a database language, in fact more easily than with Python.

---

The point here is that obviously SQL is not meant for programming, but for one-off queries here and there. But there is no reason not to aim higher.

BTW, part of the lament of the creators of the relational model is that SQL has, like COBOL, ruined the minds of millions of developers.


>”And the fact that some can’t imagine to build a full app in SQL is sad”

You are outlining some general purpose programming language that allows querying, information retrieval, maybe declaratively or functionally or using objects? I don’t know - you provide no other details.

How expressive would this language be? What abstractions would it provide out of the box? Your examples allow some code reuse via naming/aliasing specific clauses... is that it?

Specifically what apps would it allow someone to build? And what trade offs result from those decisions? Is it better suited for mathematical operations or data science vs building general enterprise applications? There’s a difference. Is it interpreted like SQL? Would it need to be compiled?

You’re doing a lot of handwaving, and when pressed for details, you respond with a thinly veiled insult - the problem is with the reader or me specifically for not being able to read you mind. We should take you at face value without question.

Good luck with that attitude in your career.


> You are outlining some general purpose programming language that allows querying, information retrieval, maybe declaratively or functionally or using objects?

Yes. That is the mindset I see in how to "Do Better Than SQL".

> How expressive would this language be?....

As expressive as any other. By coincidence I'm building a relational lang as a side project, but my post was more about thinking about how SQL's limitations could be lifted.

If it's all just "SQL is only for data retrieval", then there is not much else to add than maybe spicing up the syntax.

> You’re doing a lot of handwaving, and when pressed for details, you respond with a thinly veiled insult

That is not my intention. When I said it is sad that some can't imagine building a full app in SQL, I meant it in general. I think that will sound as bonkers as saying "build an app with xml" or similar. But the point of this post is about how to go beyond what SQL is.

----

A bit more detail, now that you ask for it. My idea is to revive the spirit of the dBase family of languages. There, you don't need ORMs; building a database app is like building any other, and the language was powerful enough that the majority of its users didn't need anything else.

The main abstraction is the "relation", which looks at data not as scalars or simple lists, but as SCHEMA + DATA:

    city = [name:str, state:str; "Miami", "Fl"]
                  schema             data
and ALL values are relations:

    1 //is alike [value:i32; 1]
It is similar to kdb+:

http://www.timestored.com/kdb-guides/kdb-database-intro

and allows working in vectorized form, as in array languages:

    1 + 1 = 2 //internally: [1] + [1] = [2]
    [1, 2, 3] + 1 = [2, 3, 4]
but where array languages have only 1 "column" of data, relations are 2d.

The next power-up is adding the relational operators:

https://en.wikipedia.org/wiki/Relational_algebra

And because all values are relations, everything can be "queried":

    [1, 2, 3] ?where #value == 1 //show [1]
    [1, 2, 3] ?union 4 ?join city.id ?sort #name
    for i in File.open("...") ?where empty(#line) == false do
etc. Is it like LINQ? Yes. Where it differs from "just use LINQ, duh!" is that in almost all languages scalars are first class but collections are not. Instead, the relational model operates on sets/collections, and everything can be generalized to 0, 1 or many values.

All of this is without talking about storage. It is just a regular language with regular stuff. In my case I'm working on an interpreter, immutable-first, with structural types, made in Rust:

http://tablam.org

and my plan is to use it to replace (embed) a lot of code I have around in several languages, and to do data-processing logic.

I wish to make it bigger, but those are my plans for now.

My case is enterprise apps, because that is what I do most. But I don't see any reason it couldn't expand. Implementation <> Language.


Agreed with the sibling. Large chance I have just missed your goal.

It may help to expand on your proposal. For example, what do you mean by group by, but not summarize? The group by clause says nothing about how to combine data not in the group. That has to be in the select. And I have yet to find a solution that beats ROLLUP for giving a great overview of data.

About the only thing that sucks in the current syntax is that it is not possible to get autocomplete in the select without having some from.


> What do you mean with group by/ ROLLUP

ROLLUP is not the same as grouping. With

    [1:a 1:b 2:a]
Group by (as it normally works in other langs):

    [1:[a,b] 2:[a]]
SQL's "group by" instead is a summary/tally of data:

    [1:2 2:1] //assuming count
ROLLUP is for making a nice report!

    [1:[a,b] Count:2;  2:[a] Count:1]
What SQL lacks (in this case), but the relational model does not, is the ability to nest relations as easily as JSON.
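The three shapes being distinguished here can be sketched in Python, with nested results standing in for nested relations (a toy illustration of the distinction, not any particular engine's behavior):

```python
from collections import defaultdict

rows = [(1, "a"), (1, "b"), (2, "a")]      # the [1:a 1:b 2:a] example above

buckets = defaultdict(list)
for key, val in rows:
    buckets[key].append(val)

grouped = dict(buckets)                    # GROUP: keep the nested values
summarized = {k: len(v) for k, v in grouped.items()}   # SQL-style GROUP BY + count
rollup = {k: {"values": v, "count": len(v)}            # report: groups AND tallies
          for k, v in grouped.items()}
```

Only `summarized` is what SQL's GROUP BY gives you directly; the nested `grouped` and `rollup` shapes require extra machinery (aggregation functions, ROLLUP, or JSON features) in SQL.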

----

Part of the deceptive power of SQL is that it is very adequate, and for "just querying" data it is OK for most cases.

But as a "programming language" it lacks some extra power and versatility.


> What SQL lacks (in this case), but the relational model not, is the ability to nest relations as easily as JSON.

SQL has had nested relations since SQL:2003[1] (called multisets). They are not widely supported and frankly speaking I don't like them.

JSON was finally added with SQL:2016[2].

[1] https://sigmodrecord.org/publications/sigmodRecord/0403/E.Ji...

[2] https://modern-sql.com/blog/2017-06/whats-new-in-sql-2016#js...


This feels like a misunderstanding of SQL. It is not a programming language. It is a data selection language.

And group by does work the same as other languages. However, you can't talk of the group by without the select. So, if you use count, or sum, or max, or ..., Yes it will be a summary. You can use string_agg or friends if you want all values. But... Probably safer to get each as a row.


"We Can Do Better Than SQL"

SQL is a programming language. It is just too limited, and that is why many can't imagine using it for a full project.

But those limitations make no sense. The amount of code, time and efficiency it could save is huge (I know, I coded in that kind of lang before), as discussed in:

http://blogs.tedneward.com/post/the-vietnam-of-computer-scie...

In particular, look at points 5/6 of the Summary. The irony is that we already have a "solution" (SQL) to the whole problem, but it is too limited and awkward to be used in practice.


I'm not sure why, but this post was hard for me to follow. The linked article is borderline incoherent to me.

What gets me, is even in languages where I'm not working with a relational database, the API for selecting data almost always drifts to something akin to SQL. To the point that I cringe if someone tries to force a new API.


> I'm very lucky that I start with FoxPro, so I know what is to live with a MUCH better thing than SQL.

For us who didn't use FoxPro, why was it MUCH better than SQL?


Because of the poor formatting and example (making the code look too much like SQL!), some nuance was lost. Consider instead:

    let keys = SELECT key FROM BTreeIndex_CustomerCode;

    let city = SELECT name;

    let groups = GROUP BY (whatever -- And I mean GROUP, not SUMMARIZE!);

    let filter = WHERE id = @id;
    let query = city + filter(id=1);

    for city in query GROUP BY keys
         ....


It's like SQL, but different. About as hard to learn, not fully developed and incompatible with years of tooling in the ecosystem.


This critique reminds me of Dart: a better JavaScript that wasn't better enough to be worth the transition.

I have the same concerns about the proposed language. Yes it's better, but is that enough? What's going to drive adoption?


Dart was better enough, as evidenced by the recent uptick of TypeScript. Had it been released in 2019, I could envision a different story.


You sound skeptical about the daydreams of a bored CS graduate.


SQL is an incredible language. I don't think we can do better, only different. It's not that common to mix two programming languages the way you mix SQL and application code. For example, having your app code in JavaScript and then writing SQL queries to access the data feels unfashionable. Polling also seems like something from the 80's. RethinkDB was a step in the right direction, but I think the mistake was to invent another query language, instead of making database querying seamlessly integrated into programming languages. It should be possible to write observables without having to context switch back and forth between data querying and app logic.

I'm a "full stack" developer/speaker, and for a long time I used English on the front-end, German on the back-end, and Spanish for talking to the database. When I switched to English on the back-end I saw a productivity boost, and I got better at English and started to like it more. But I still talk Spanish with the database. I could probably be more efficient using English to talk to everything. I would probably still use French for design though.


This is the wrong direction.

Let's get the history in place first. The relational calculus set up a set of operators. That's what Codd worked with. It was combined with a natural language project at IBM which gave us the vaguely English like syntax of SQL.

> In EdgeDB, the data schema is formulated in a way that is much closer to the contemporary application data model. ...Unlike SQL, EdgeQL can easily extract arbitrary data trees.

This is a reinvention of the hierarchical database, one of the architectures that was abandoned when the relational database came along. This was all fought out on a large scale in the 1980's.

The fundamental idea that you have a highly structured data tree is what the relational model threw away. "Data Oriented Design"[^1] does a pretty good job of explaining why, but roughly: we're accustomed to programming where we have an object/entity/value that represents a conceptual "thing" in our discourse. We compose things into other things, subtype things, and otherwise deal in things. And this turns out to be a disaster when lots of people need to extend and reuse hunks of the same database. If you give up on things as first class parts of your data model, and only talk about relations among attributes, a lot of those problems go away.

If you want to improve SQL, look at going back to the relational calculus without the natural language veneer. There have been a number of systems that tried that, though none have gotten real world traction. I'm also blanking on their names at the moment, so if someone whose memory is working better can help me, I would appreciate it.

[^1]: http://www.dataorienteddesign.com/dodmain/


> This is a reinvention of the hierarchical database

It's not. Just because data can be fetched as a set of "tree-like" things does not make the model hierarchical in any way. But it makes lots of practical use cases easier.

> The fundamental idea that you have a highly structured data tree is what the relational model threw away.

And then ORMs came and reinstated the status quo. Truth is, many applications actually benefit from a highly structured data graph. That said, you don't have to use the "highly structured" bits of EdgeDB, it's perfectly fine if all your types are simple tuples.

> We compose things into other things, subtype things, and otherwise deal in things. And this turns out to be a disaster when lots of people need to extend and reuse hunks of the same database.

I would be very interested in reading about this. Do you have any references? Most cases of "disaster" in my experience had to do with extremely poorly defined and documented schemas that had barely any relation to the actual business model.


I always wished SQL had a better handle on sum types. If I have a user whose favorite story is an instance of Either[Movie, Book], then it's already a pain to deal with in a nice simple way. And that's as simple as sum types get.


You can do that in EdgeDB:

  type Movie { 
    property director -> str
  };
  type Book {
    property author -> str
  };

  type User {
      multi link favorites -> Movie | Book
  };

  SELECT User {
    favorites: {
        [IS Movie].director,
        [IS Book].author,
    }
  };


Can you tell me whether I'm understanding this correctly?

Would this query result in, e.g. [(director: Null, author: J.K. Rowling), (director: Spielberg, Null), ...] or would it be [author: J.K. Rowling, director: Spielberg, ...] or just plain strings: [J.K. Rowling, Spielberg, ...]? I still don't totally get the model here.


If fetched as JSON: {"favorites": [{"director": null, "author": "J.K. Rowling"}, {"director": "Spielberg", "author": null}, ...]}

You can also query the actual object type by referring to the __type__ link:

  SELECT User {
    favorites: {
        __type__,
        [IS Movie].director,
        [IS Book].author,
    }
  };


Thanks for the clarification! Seems really cool :)


This struck me just yesterday. You're left with using nulls (which is a sin punishable by poor data quality) or multiple tables for each sum type. Annoying.


Tables should be lightweight. It's the cruft of the language and implementation that makes you avoid them.
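For what it's worth, the nullable-columns route can at least be made safe with a CHECK constraint. A sqlite3 sketch with a made-up movie/book schema (all names hypothetical), enforcing that exactly one side of the "sum" is set:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE movie (id INTEGER PRIMARY KEY, director TEXT);
CREATE TABLE book  (id INTEGER PRIMARY KEY, author TEXT);
CREATE TABLE favorite (
    user_id  INTEGER NOT NULL,
    movie_id INTEGER REFERENCES movie(id),
    book_id  INTEGER REFERENCES book(id),
    -- exactly one side of the sum type may be non-NULL
    CHECK ((movie_id IS NULL) <> (book_id IS NULL))
);
INSERT INTO movie VALUES (1, 'Spielberg');
INSERT INTO book  VALUES (1, 'J.K. Rowling');
INSERT INTO favorite VALUES (42, 1, NULL);   -- a Movie favorite
INSERT INTO favorite VALUES (42, NULL, 1);   -- a Book favorite
""")

# Setting both sides at once violates the CHECK and is rejected:
rejected = False
try:
    con.execute("INSERT INTO favorite VALUES (42, 1, 1)")
except sqlite3.IntegrityError:
    rejected = True
```

It's still a workaround rather than a real sum type, but at least the invalid states become unrepresentable.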


That is an acute observation and something I might consider dumping SQL for; or rather, I feel SQL could be upgraded to include them.


It is interesting to see new approaches. But SQL is so entrenched, so available and so good enough... it will be a very uphill journey to compete


Many comments (probably correctly) point out that SQL can't be displaced because it is ubiquitous and "good enough".

I wonder if there is room for a "TypeScript" of SQL that would allow developers to opt-in to whatever new language features or paradigms we feel SQL is missing.

It would then transpile down to regular SQL to be executed.


I was wondering the same thing. Why not rather than build a new database engine, write a library like an ORM that transpiles a language - like EdgeQL - to SQL, so you can attempt to use it, without switching underlying database engine, and still use your old SQL code?


EdgeDB is built on Postgres, it's not from scratch.


My Mac app, Strukt, does a bit of this. The main problem is that the basics of the various databases don't align very well. The functionality that's common to all databases is quite weak. Even such things as case sensitivity are drastically different. And for many of the differences, there's simply no way to emulate the other's functionality, correctly and efficiently.

C compiles to assembly or machine code, and TypeScript compiles to JavaScript, and in both cases you have the full power of the level you're targeting, so it's fine. An RDBMS has very specific fast-paths that you really need to hit, or the whole exercise becomes pointless. It's like trying to write a 3D game in JavaScript without any way to access the GPU or even SIMD ops. We can do 3D games in JS today, but that's really only feasible because the browser vendors went and exposed the basic fast paths.


Yeah then we get the best of both worlds. I might use that if it existed. All else being equal, EdgeQL the language looks great compared to SQL.


Isn't this what an ORM is? Or at least an ORM covers some new features that users can opt-in to without losing the ability to write raw SQL. I guess it's not the perfect analogy to TypeScript, but I think it's a pretty good comparison. The other thing I can think of are things like GraphQL which can be used as an abstraction on top of a SQL DB.


Some ORMs like Hibernate have their own QL, but they typically preserve the SQL data model.

This isn't just replacing SQL, they're trying to implement a new data model that is backed by Postgres.


The demise of SQL has been greatly exaggerated. If somebody wants to actually propose something better, the threshold is high. They should publish:

- A tool which transpiles their new language into (multiple versions of) SQLite, MySQL, PostgreSQL

- This tool should always produce a correct result (Modernizr) or a specific error (e.g. upgrade MySQL to XX+ to use `NEW_LANGUAGE_CODE_SNIPPET` JSON functions)

- A separate tool that up-piles SQLite, MySQL, PostgreSQL to the new language, possibly exploiting new streamlined expressions

- An editor with IntelliSense for the new language

Even then you can probably achieve a lot of value just by inputting table joins before the developer starts editing code.

---

If you're not doing that then you are just keyword spamming by mentioning SQL.


>The questions we often hear are “Why create a new query language?” and “What’s wrong with SQL?”. This post contains answers to both.

Ah, I can already tell I'm not going to like this article.

>lack of proper orthogonality — SQL is hard to compose;

Their reasoning: "The difference in structure is large enough to make any sort of source-level query reuse impractical."

This is blatantly false. The second example in the article using the join covers both the single-row and multi-row result cases. Additionally, SQL can be made very reusable and modular using tools like DBT, or within a DB using views or other constructs.
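To illustrate the view point concretely (a sqlite3 sketch with hypothetical emp/dept tables along the lines of the article's example): the join logic is defined once in a view, and any number of queries reuse it:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dept (no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE emp  (id INTEGER PRIMARY KEY, name TEXT, role TEXT, deptno INTEGER);
INSERT INTO dept VALUES (1, 'Sales'), (2, 'Eng');
INSERT INTO emp  VALUES (1, 'Ann', 'dept head', 1),
                        (2, 'Bob', 'dept head', 2),
                        (3, 'Cat', 'engineer', 2);

-- The reusable piece: departments joined with their heads.
CREATE VIEW dept_heads AS
  SELECT d.no, d.name AS dept, e.name AS head
  FROM dept d JOIN emp e ON e.deptno = d.no AND e.role = 'dept head';
""")

# Two different consumers of the same definition:
heads = con.execute("SELECT dept, head FROM dept_heads ORDER BY no").fetchall()
count = con.execute("SELECT count(*) FROM dept_heads").fetchone()[0]
```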

>lack of compactness — SQL is a large language;

It's certainly smaller in the keyword set than almost any other computer language I have used. This is like saying "It's a very cold day in the Sahara desert" when the temperature is 40C/105F.

>lack of consistency — SQL is inconsistent in syntax and semantics;

Is it? Or do people take liberties with their implementations? There is an SQL Standard and Postgres is compliant to it. Other databases not being fully ANSI SQL compliant is the fault of the DB creators, not the fault of SQL.

>poor system cohesion — SQL does not integrate well enough with application languages and protocols.

It is a relational language designed to be used within a relational database. Do the authors also complain that their screwdrivers are very inefficient at cutting meat?

They're onto something with the trickiness of NULL, but there have already been tons and tons of discussions on Hacker News - anyone with database experience, either analytical or transactional, will tell you that imagining a NULL-free world actually creates more problems than it solves.

I found this to be a highly misinformed article, personally; really it's a marketing ploy for their proprietary language. Just like most other "we have our own query language better than SQL" products, it probably isn't.


Thanks for the DBT tip. I've always wanted composable SQL statements but never known how to achieve it.

https://docs.getdbt.com/docs#section-what-makes-dbt-so-power...


We've started using DBT at my office for ETL into our data warehouse and it's a godsend. Being able to add tests to tables and views, built in documentation support, built in DAGs are all great, and the folks at Fishtown who make dbt are super responsive on their community slack channel. (Also it's open source.)

One pain point is the integration of non-SQL (e.g. Python) jobs into the flow. DBT only manages SQL scripts, so you can lose some of the elegance when you have to glue some external jobs to it. That being said, the development is pretty rapid and as I mentioned the support is great, so I'm excited to see where they go.


Agreed on most points, but not the keywords. SQL has hundreds of keywords.


Ah, you're correct. From the querying side, there's a couple dozen that you need to know. Different personas would only need to know a small subset but it's certainly true the list of keywords itself is large.


No, SQL is not a relational language.

Which is precisely why "better" is both possible and desirable.

And which is also why forward or backward "compatibility", to the fullest 100% extent, is intrinsically and inherently impossible.


This article misses the single biggest failing of SQL, in my mind: that it's not a language. It's a family of incompatible languages, with similar syntax -- like "Lisp".

I can't write a program that works against 'any database', because switching databases means a lot of extra custom work. That means databases can't easily compete against each other. (Oracle's license terms sure don't help.) That's terrible. It's worse than x86/PPC/ARM -- at least with CPUs, a compiler can generate machine code for any of them, coming from the same source code.


Your point seems a little like tilting at windmills to me. Different databases written by different entities who didn't have a huge incentive to strictly conform to each other's syntax varied in their implementations. This is just life.

> It's worse than x86/PPC/ARM

You're... saying that there should only be one CPU architecture too?


> This is just life.

Market forces naturally explain what we have today, therefore ... I'm Don Quixote. Sure, I'll accept the title. Likewise, I could say: if you don't like reading complaints like this, blog posts and internet comments might not be for you. :-)

> You're... saying that there should only be one CPU architecture too?

No. I'm saying that the analogous situation with CPU architectures was observed 50 years ago, and despite causing inefficiencies (especially at first), we've spent the effort to create abstractions to gradually make it easier to describe systems at a higher level. It generally doesn't matter, to either users or developers, that there are multiple CPU architectures. We still have many programming languages, too.

Databases have gone in the opposite direction: while adding features over the years, even when these features are 99.9% functionally identical between vendors, they've picked different syntax and data types. Even though they all speak "SQL", databases are more incompatible today than ever.

I guess what I'm getting at is: it's a good thing that compilers were invented before all software needed an ROI, and it's a shame databases didn't get the same treatment before corporate interests took over.


Exactly. The situation is even worse because many libraries support only "common ground of SQL" and therefore underutilize the real capabilities of the database.

We mentioned that there are "many SQLs" in this blog post, but not in this specific context.


I like SQL a lot, but I wouldn't mind a nicer interface to a relational database. TutorialD seems more elegant and expressive than EdgeQL though. If I were designing a new query language, that's where I would look for inspiration. Anyway I'm glad to see people experimenting!

I am waiting for someone to write a paper titled "SQL NULL is (nearly) a Monad." It is "contagious" just like mapping over None. Sometimes I think it'd be cool to add Maybe<T> columns to Postgres, where operators would do the right thing for Some(x) vs None, but then I think, "Wait, that's how it already works!" At least it is very close. But one way or another I would love to see a re-assessment of SQL NULL from the perspective of category theory. If we could redo NULLs as Maybe types, what would that improve? Incidentally, Leonid Libkin published a really cool paper in 2014 about handling NULLs with category-theory ideas, but it is more about drawing inferences from incomplete premises ("Incomplete data: what went wrong and how to fix it"). Maybe he'll write something about Maybe column types too. :-)

Also if you are interested in improving relational databases, Codd's 1979 paper "Extending the Database Relational Model to Capture More Meaning" is super interesting. The first part is where he adds NULLs (in order to add OUTER JOINs basically), but the second part is almost never mentioned and gives a way to query the schema itself as part of a database query. It looks a lot like doing graph database queries from your relational db. Also a better kind of EAV pattern. Also more support for OO-style inheritance. Somehow 40 years later it is still way more advanced than any existing RDBMS. It would totally break the parse-plan-execute pipeline of today's systems, but it is fascinating to think about. I wish more people would read this paper before trying to go "beyond relational"!


Check out http://categoricaldata.net for a categorical approach to database theory that uses a labeled null semantics from type theory.


Actually quite a lot of people told us they thought EdgeQL was inspired by Tutorial D :) Have you had a chance to actually try to play with EdegQL?


Not sure if there is something wrong in my brain but I have not been able to understand the hate against NULL. To me the three-valued logic feels very natural and just clicks. Lately I have been dipping my toes into Alteryx, which treats empty values in a way that definitely does not correspond to 'normal' NULL logic in databases and it feels very much like a constraint on building my workflows.


If a NULL were just a value in 3-valued logic, it could have been OK. However, that's not how it works out. Consider the following (that should be equivalent if NULL is just like Maybe in True-Maybe-False logic):

  SELECT null AND true;

  SELECT bool_and(column1) FROM (VALUES (null::bool),(true)) AS foo;
In PostgreSQL the first query produces NULL, while the second produces TRUE. Yet, it's also possible to get NULL as a result of aggregating values with bool_and:

  SELECT bool_and(column1) FROM (VALUES (null::bool),(null::bool)) AS foo;
And that's not how a 3-valued logic works.


You are mixing two concepts here: 3VL, and how most aggregates work in SQL: they drop NULL values before doing their work. That's why the first BOOL_AND example only sees one value, thus returning true.
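The split is easy to reproduce. A sqlite3 sketch (bool_and is Postgres-specific, so min over 0/1 values stands in for it here):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Scalar operator: three-valued logic, NULL propagates through AND.
scalar = con.execute("SELECT NULL AND 1").fetchone()[0]  # None

# Aggregate: NULLs are dropped *before* aggregating, so the single
# non-NULL value survives (min over 0/1 plays the role of bool_and).
agg = con.execute(
    "SELECT min(c) FROM (SELECT NULL AS c UNION ALL SELECT 1)"
).fetchone()[0]  # 1

# But aggregating only NULLs still yields NULL.
all_null = con.execute(
    "SELECT min(c) FROM (SELECT NULL AS c UNION ALL SELECT NULL)"
).fetchone()[0]  # None
```

So the "inconsistency" is really two separate rules (3VL for operators, NULL-skipping for aggregates) interacting.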


The one small thing I've often wished for is to place the from clause before the select clause. Having to type out the columns before the table means auto completion never works well.


I said this before and I will say it again: SQL is for data processing what IP is for networks: it's the neck of the hourglass.

There are many things happening below SQL layer (storage engines, implementations, hardware etc) as well as above it (applications, reporting engines, various services and whatnot). There are many things happening below IP layer (hardware, protocols) as well as above it (TCP, applications).

Even though there are better ways to build a neck of an hourglass, my bet is that (1) they are not much better and (2) it would be extremely difficult to get them replaced in all the existing hourglasses without breaking things terribly.

Conclusion: SQL is going to stay forever, like it or not.


Open source (Apache 2.0), and implemented on top of Postgres. This is quite interesting. I love SQL, but there's definitely room for improvement.


The critiques of SQL aren't wrong. I have written 4 parsers and can tell you the lack of orthogonality is a pain. Luckily we have the NIST compatibility guidelines to keep us safe. A replacement for SQL which contains the same compatibility tests would sell me on a new language, but it's really hard


We can do better. I use sql a lot though I’m not great at it. This seems much better.

The problem is getting a replacement on enough databases (storage engines?) so it’s universal. And performant. I’m not sure what’s involved with that but since there are a lot of open source dbs it seems possible.

Nice work


There have been tens of attempts over the years - a Dutch one called Xplain was quite good IIRC. But they all suffer from delivering too little extra (if fully adopted) against the incumbent.

Unfortunately, SQL is good enough for the majority of users and uses.


Saying you can do better than SQL is an extraordinarily bold claim, considering SQL is one of the most successful programming languages in human history and certainly the most successful fourth-generation programming language. Wide swaths of people you might ordinarily think of as non-programmers know how to use SQL in meaningful ways.

Doing better is certainly possible, but SQL is much better than a lot of people give it credit for.


In my opinion the example in the first chapter ("Lack of Orthogonality") is wrong. The subquery (is it called an "inline subquery"?)...

> SELECT name FROM emp WHERE role = 'dept head' AND deptno = dept.no

...should in my opinion definitely return only 1 row for each department - from a logical point of view returning multiple rows would mean that the data is corrupt upstream or that the organization itself is corrupt or that there is a lack of attributes in the DB (if no additional selection criteria like for example "management" or "operations" or "deputy" etc... can be added to the query - meaning that no sub/organization can have more than 1 person responsible for the exact same thing).

I admit that this is a very focused criticism and that I'm very happy with the current behaviour of the generic SQL language and its special cases which are linked to the DB being used (currently using Oracle, MariaDB, Clickhouse - used DB2, Kudu through Cloudera stack, PostgreSQL, maybe something else) and how it stores/processes the data etc.


> In situations where EdgeDB cannot infer the type of an empty set [2]

Why not bottom, or are containers not covariant? I'm worried that we're not clear about the difference between variables and values.

> Strictly speaking, EdgeQL sets are multisets [1]

Like nulls, this is another broken aspect of SQL.

Maybe you can address these concerns, but I'm skeptical that it's relational:

* It conflates types with relation variables

* It's using links rather than foreign key constraints

* It doesn't seem like the user can even specify a candidate key

* Which is why the "id" field is magical and special

* Constraints only seem to apply to scalars

The reason SQL has been successful is that tables are dead simple to understand, and you're throwing that out. This feels more like a graph database.

[1]: https://edgedb.com/docs/edgeql/overview#type-system [2]: https://edgedb.com/docs/edgeql/expressions/overview#set-cons...


>> Why not bottom, or are containers not covariant? I'm worried that we're not clear about the difference between variables and values.

Yes, `{}` is inferred as `anytype`, but we chose to restrict its use in output for interoperability reasons: the output of a query must be of a concrete, known type.

>> Strictly speaking, EdgeQL sets are multisets [1]

> Like nulls, this is another broken aspect of SQL.

From the purity standpoint, yes, but in practice, duplicate values are either expected, or too expensive to eliminate. That said, there is a `DISTINCT` [1] operator.

> * It conflates types with relation variables

Not exactly. We do use the same symbol to denote a type and the relation that it represents, but which is which is always clear from context (i.e. Foo in a cast expression is always a type, and Foo in a value expression is always a relvar).

> * It's using links rather than foreign key constraints

Links are an abstraction over a foreign key. In the vast majority of cases foreign keys are implemented as 'some_id' -> 'id' anyway. If you want a custom foreign key, you can set an expression constraint on a property or a link.

> * It doesn't seem like the user can even specify a candidate key

type Foo { constraint exclusive on ((.name, .last_name)) }

> * Which is why the "id" field is magical and special

See RM proscription 15 in the Third Manifesto [2]

> * Constraints only seem to apply to scalars

They aren't limited to scalars; see the exclusive constraint example above.

[1] https://edgedb.com/docs/edgeql/funcops/set#operator::DISTINC... [2] https://www.dcs.warwick.ac.uk/~hugh/TTM/DTATRM.pdf


Thanks for the clarifications!


>> We Can Do Better Than SQL (edgedb.com)

No you can't. Many companies have tried, including very large and successful companies and have failed miserably. Just try out the JSON query language used by MongoDB, including their aggregation pipelines, and you will absolutely love SQL.


"We Can Do Better Than SQL", sure, but how much better?

Fundamentally, SQL is a declarative language for databases, that makes the easy stuff easy, and the hard stuff doable.

If people want to build new front ends for SQL databases, to make it easier to build apps, I can buy that.

But trying to replace SQL as the standard declarative language for data is a fool's errand. You'd have better luck getting developers to switch away from git. People don't use SQL databases because they're fun for developers to hack on, they use them because they solve a business need.

If you want to innovate in the database space, I think you need to provide something that benefits end users (global availability, auto scaling, speed, client syncing ala Firebase, ...).


Git is severely suboptimal, and I would hope we develop a better alternative in the near future. While the underlying model is sound, the interface is confusing and deeply inconsistent. Much like RA vs SQL.


> makes the easy stuff easy, and the hard stuff doable

I like SQL, but I disagree. It also makes easy stuff hard.

Given many numbers (billions), how would you find the 1st, 3rd and 5th highest values?


Maybe I have done too much SQL, but for me it is trivial and easier to do than in most other languages. PostgreSQL will execute the query below in O(n) if there is no index and O(1) if there is an index on num.

Does any other language implement this in a better way, while also still giving O(n) time complexity and O(1) space? I may have to implement my own top-n heapsort then, or my own top-n insertion sort.

    SELECT * FROM (
      SELECT
        row_number() OVER (ORDER BY num DESC) AS n, num
      FROM t
      ORDER BY num DESC
      LIMIT 5
    ) top5 WHERE n IN (1, 3, 5);
This is assuming we do not care about ties. If we care about ties it gets a bit messier but not that bad.


In other languages, this task does not require a sort. It's just a for loop. The fact that you need a nested table and a sort illustrates my point about SQL making some easy problems hard (maybe harder is more accurate).
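The for-loop version, sketched in Python: heapq.nlargest keeps a bounded heap, so the input is never fully sorted (O(n log k) time, O(k) extra space, here with k = 5):

```python
import heapq
import random

random.seed(0)
nums = [random.randrange(10**9) for _ in range(100_000)]

# One pass with a bounded heap; returns the 5 largest in descending order.
top5 = heapq.nlargest(5, nums)
first, third, fifth = top5[0], top5[2], top5[4]
```

Whether this counts as "easier" than the SQL is taste, but there is no nested query and no explicit sort.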


The SQL to do this is simple and straightforward using analytic functions (specifically nth_value); the performance is likely to suck hard if you don't have an appropriate index, but that's not an SQL problem, but a “sorting billions of numbers is expensive” problem.

  SELECT DISTINCT
    nth_value(num, 1) OVER w,
    nth_value(num, 3) OVER w,
    nth_value(num, 5) OVER w
  FROM t
  WINDOW w AS (ORDER BY num DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)


nth_value can be done in a faster way than sorting and picking. It can be done in O(n) while sorting is O(n log n).


Which is why PostgreSQL uses heap sort with a fixed max heap size for sort with a small limit (called "top-N heapsort" when running explain analyze). Then the complexity for getting the kth value is O(n log k) which is O(n).


Using an index for access path and ranking for selection


Amazing how our history of programming has so many things putting layers over SQL. I think mainly because the bridge between SQL and General Programming is often not too pretty ( string based queries with no validation ). But in my experience, it's often better to just embrace SQL. I won't argue the syntax is nice, but it is conceptually nice. Usually the further away you get from queries on relational databases, the more often you do things very inefficiently from a database point of view, to the point of being orders of magnitude worse.


The {} construct seems awfully close to NULL. Not sure I understand the fundamental difference -- having a third value in boolean contexts still introduces 3VL.

Why not just use an Option/Maybe type?


Empty set differs from NULL in that you get an empty set if you apply an element-wise function over it (which most operators and functions in EdgeQL are).
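A toy model of the difference (simplified to Python sets, ignoring the multiset semantics): element-wise application over an empty set just yields an empty set, with no NULL-style special cases.

```python
# Element-wise application, as a plain set comprehension: functions map
# over every element, and an empty input simply produces an empty output.
def elementwise(f, xs):
    return {f(x) for x in xs}

assert elementwise(lambda x: x + 1, {1, 2}) == {2, 3}
assert elementwise(lambda x: x + 1, set()) == set()  # empty in, empty out
```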

> Why not just use an Option/Maybe type?

We are considering adding algebraic types and a syntax to match them, but it ultimately boils down to taste and use case:

  SELECT value ?? "fallback in case of empty"
is not fundamentally different from (hypothetical):

  SELECT 
    MATCH value
    CASE Some(non_empty_val) THEN non_empty_val
    CASE Empty THEN "fallback in case of empty"
    END;


"Empty set differs from NULL in that you get an empty set if you apply an element-wise function over it"

It still sounds like NULL -- I must be missing something.


Take the CASE WHEN example from the blog. An equivalent EdgeQL expression is

  SELECT 'one' IF value = 1 ELSE 'not one'
If value is an empty set, then the result is _always_ an empty set, unlike SQL that pretends NULL values are actually boolean for the purposes of the condition. EdgeQL expressions are, essentially, set comprehensions, the above is equivalent to this Python expression:

  {'one' if v == 1 else 'not one' for v in value}


I acknowledge the problems with SQL that the author stated, but I still prefer SQL over the alternative presented.

What would be other alternatives to a declarative language like SQL? Prolog?


Datalog indeed is a thing: https://en.wikipedia.org/wiki/Datalog


Any declarative language that does not commit the violations of the relational model that SQL does commit.

Date & Darwen formally pinned down what such a language has to look like in their three subsequent editions of "The Third Manifesto" (first edition around 1995). There are at least two fully operational systems that comply.


Why not Datalog?

It would seem to meet all your constraints, and is well-studied.


With due respect, Datalog is a very difficult language to learn, and it just doesn't have the support and resources SQL does. We were forced to use it in a Data Cleaning course a few years ago because the professor was big on it, and everyone struggled. To this day, fellow students complain about that part of the course. No one ever looked at it again after that class.

Datalog will ALWAYS be a niche language, because it is designed for a certain type of computational mind. The vast majority will want to learn SQL due to its ease despite any shortcomings it may have.


I am not familiar with Datalog but if it is anything like Prolog then it should be extremely easy compared to SQL. Right now if I want to make any query that is more complex than Select x from table in SQL I will have to make multiple google searches and look at multiple resources that explain the unintuitive SQL syntax. Meanwhile if I want to do something more complex in prolog it will be trivial. For example if I want to do an inner join in prolog I can just do something like: table1(X, FirstName, LastName), table2(OrderId, X, OrderDate, OrderCost).
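For comparison, the Prolog conjunction above corresponds to an inner join on the shared variable X. A sqlite3 sketch with assumed table and column names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table1 (x INTEGER, first_name TEXT, last_name TEXT);
CREATE TABLE table2 (order_id INTEGER, x INTEGER,
                     order_date TEXT, order_cost REAL);
INSERT INTO table1 VALUES (1, 'Ada', 'Lovelace');
INSERT INTO table2 VALUES (100, 1, '2019-05-09', 9.99);
""")

# The shared Prolog variable X becomes the explicit join condition:
rows = con.execute("""
  SELECT t1.first_name, t2.order_id
  FROM table1 t1 JOIN table2 t2 ON t2.x = t1.x
""").fetchall()
```

Whether the repeated-variable style or the explicit ON clause is clearer is mostly a matter of taste.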


I have heard nothing but good things about learndatalogtoday.com

Datalog suffers from some of the same problems that SQL does (different dialects). It does seem to me that Datomic and Datascript have the most intuitive syntax.


I think as a user if you have to think about the computational aspects of a query you're doing something wrong.

From a declarative sense datalog is much more concise than SQL. Depending on how you want to think about it, it also subsumes sparql.

The datalog compilers we're working on in our group create very efficient code and the evaluation strategy sometimes is far from what you would think of by looking at the query.

But since the result is defined as the minimal Herbrand model/fixpoint of the T_P operator you don't think about computation because you don't have to.


Is the difficulty you had due to it being a functional language? Or something more intrinsic to the language itself?


Much of it was writing code akin to recursive SQL (which was touted as one of the things it was way better than SQL at). Just never could understand it, and it took hours for most to get simple recursive models like connecting family trees. That language (like many) has a lot of work to do in terms of support and tutorials in order to build reasonable interest for it.


I think the problem here is institutional--both our teaching institutions and normal industrial practices. Objective studies have shown repeatedly that functional programming models (which emphasize recursion, a stumbling block you mention) are easier for students with no prior programming knowledge to pick up and use effectively. They also tend to make better programmers, more quickly. There's a reason MIT taught Lisp/Scheme in its introductory computer science course for so many decades.

However (1) once you've ingrained all the counterproductive habits of thinking and worked past the stumbling blocks that imperative and object-oriented programming models present, the conceptual jump from that to SQL is smaller; and (2) good, in-depth introductory materials don't materialize out of nowhere without demand. Your instructor was probably trying to do something good, both by introducing you to data processing in a language with fewer syntactical hurdles and by incrementally improving, year by year, the quality of introductory material by using it in instruction.

Datalog allows you to express data relationships in a more straightforward, more compact, and easier-to-refactor way, free of most boilerplate. It does, however, require that you think about what you want to accomplish abstractly, rather than as an imperative process, which is difficult merely because of the years of experience the typical student already has in unergonomic languages like C++, Java, etc.


Category theory can do better than relational database theory: the open-source categorical query language CQL extends SQL with generalizations of abstractions such as schema mappings and queries while supporting automated theorem proving and fixing foundational semantic issues such as null handling.

http://categoricaldata.net


Thanks for linking to this. I think I've been trying to build a poor mans version of it in scala without realizing it...


I'm fine with SQL, but I would also like there to be a more explicit language that requires you to invoke indexes explicitly when querying (instead of implicitly, like SQL). If you attempted to use an index that didn't exist, the query would simply fail (instead of just running slowly).


Some dialects allow you to hint to what index to use.


Before diving into ways that SQL could improve I'd like to give some thanks to a real workhorse that has proved useful over decades, which is an incredibly long time in tech.

Could it be better? Sure, but author's proposal doesn't solve where I usually have problems. What are my pain points and what would I like to see instead?

1) One giant statement. Personally I really like Hadley Wickham's dplyr [1,2] (think "data pliers") which has a SQL like notion of joining different tables and selecting values but separates the filter, mutate and summarise verbs as separate steps in a pipeline rather than one huge statement. For transactions dplyr would have to add an update verb as well.

2) Hard to test, especially for more complex ETL. dplyr approach highlights that a lot of SQL these days is being used in ETL applications in addition to the usual retrieval, transactions and reporting. Being able to express as a pipeline of operations is easier for me to understand as execution is conceptually consecutive and I can unit test individual parts as part of a normal programming language environment.

3) My data isn't all tabular. Better support and semantics for non-scalar entries where value is a record itself like in json, BigQuery, Hive, Presto, etc.

4) Not that extensible. Better support for user-defined functions (UDFs). More and more frequently I want to apply some non-trivial operation to data, e.g. run a machine learning model, and it usually makes sense to do that as close as possible to the data. It is possible to do a fair bit in SQL itself with window functions, but it is generally painful. You can point Hive at your jar and run a UDF, but that is also painful to integrate and debug, in my experience.

[1] https://cran.r-project.org/web/packages/dplyr/vignettes/dply... [2] https://datacarpentry.org/R-genomics/04-dplyr.html
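The pipeline idea in point 1 can be sketched in plain Python (illustrative names, not dplyr's actual API): each verb is a small function that can be unit-tested on its own, then chained, instead of being folded into one giant statement:

```python
# dplyr-style verbs as separate, composable, unit-testable steps.
# Names (filter_rows, mutate, summarise) are chosen to mirror dplyr.

def filter_rows(rows, pred):
    return [r for r in rows if pred(r)]

def mutate(rows, **cols):
    # add computed columns to each row
    return [{**r, **{k: f(r) for k, f in cols.items()}} for r in rows]

def summarise(rows, **aggs):
    # collapse the rows into a single record of aggregates
    return {k: f(rows) for k, f in aggs.items()}

payments = [
    {"customer": "a", "amount": 10},
    {"customer": "a", "amount": 25},
    {"customer": "b", "amount": 5},
]

result = summarise(
    mutate(
        filter_rows(payments, lambda r: r["customer"] == "a"),
        with_fee=lambda r: r["amount"] * 1.02,
    ),
    total=lambda rs: sum(r["with_fee"] for r in rs),
)
print(result)  # total for customer "a" with a 2% fee applied
```

Each stage is a plain value, so intermediate results can be inspected and asserted on in tests, which is exactly what a single monolithic SQL statement makes hard.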


We can do so much better! The problem, really, is that even if you have a nicer, more expressive language that cross-compiles to SQL, it's easy to end up with things that are impossible to write in some databases. Even though SQL is based on math, so it should "just work", the syntax is badly designed and tied up with database consistency models, making it easy to write something that should be possible but isn't. We could probably make much better compromises in a different language, but it is unclear how we could do backwards compatibility efficiently given all the existing databases and database clients.


Of all the languages I've learned since high school, SQL is the one that is pretty much still the same. Agreed, it's not perfect, but I think it's one of the most stable things there are. New features get added, which creates inconsistencies between dialects, but if you stick to the basics, it's all still very much the same. In the 2000s XML support was added, then later JSON, and there will be future support for other formats and new data types as well.


The end of that headline is "but it's not really worth the effort and will likely not succeed".

Sure, SQL is not the most elegant language. But it works. It does what we need it to. Millions of developers know it, and countless tools speak it.

So I expect EdgeQL to go over like a lead balloon. Unless you can convince one of the big existing SQL servers to adopt it, almost no one will ever get the chance to use it, and it'll be a niche language for a niche database.


As someone with experience in using PostgreSQL, but who found that it would fall short for a current project due to among other things the problems you refer to as “poor system cohesion”, I am intrigued by EdgeQL.

Something I wonder though; can materialized views be created with EdgeQL? And if so, does updating materialized views still require fully rebuilding them, as PostgreSQL does, or can they be updated incrementally?


Alas, we don't support matviews yet, but it's on the TODO list.


I fundamentally disagree with this article. SQL is arguably the most important language in a developer's toolbox. Attempts to isolate and replace its complexity through ORMs and NoSQL implementations have never resulted in a viable replacement.

Just as mathematical syntax fits mathematical logic, SQL fits data retrieval. Let's stop trying to replace it just because it's old. SQL is amazing.


I'm not sure you read the article? The author lays out a number of ways in which SQL itself does not match the relational data model well, and gets in the way of doing data retrieval well.

His criticisms are nothing new. Relational "gurus" Date, Codd, Darwen, and Fabian Pascal have all made similar criticisms of SQL over the last 30 years.


EdgeQL looks really interesting - I'm glad I read through the article.

I think the issue - for me - with articles like this is the tendency to take the stance of "${EXISTING_TECH} sucks, let's drop it for this new thing"

I would have found this way more compelling if this line was the very first in the article:

> The relational model is still the most generally applicable and effective method of representing data.


I've always had a negative opinion of SQL, but I can bear it by using just a subset. Most of the complex stuff is handled in the application rather than in SQL. I just use it as a simple data store.

I think Datalog is a better alternative. However, SQL and DBMSs form a field with more powerful lock-in than most others; otherwise Oracle wouldn't have lived so long and so well without significant improvement over the years.


Datalog appears to address a lot of these concerns


I’ve been arguing relational databases are an operational anti-pattern for a few years.

Your architecture shouldn’t start with an implementation choice. It should start with business modeling and drive down to tech choices, which may be one of many types of serialization/data storage technology.

Adopters of Domain-Driven Design have already moved past this argument.


I'm disappointed they left QBE (https://en.wikipedia.org/wiki/Query_by_Example) out of their history of SQL and relations. SQL vs QBE is one of the great examples of concise technology beating overwrought UX.


Having worked with databases for a while, SQL seems to be useful because it forces you to think about how your data is structured. To me SQL is a thin wrapper around the relational algebra notation. The biggest problem I run into with SQL is that it is hard to tell how performant a complex query is before actually running it.


Not a great interface, but estimated query plans can be pretty useful for getting a feel for how heavy a query will be.
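For example, SQLite exposes this via EXPLAIN QUERY PLAN, which returns the plan instead of running the query (a sketch; the table and index names here are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INTEGER, b TEXT)")
con.execute("CREATE INDEX idx_t_a ON t (a)")

# Prefixing a query with EXPLAIN QUERY PLAN asks the engine how it
# would execute it; the detail column shows whether idx_t_a is used.
plan = con.execute("EXPLAIN QUERY PLAN SELECT b FROM t WHERE a = 42").fetchall()
for row in plan:
    print(row)
```

PostgreSQL's EXPLAIN (and EXPLAIN ANALYZE) serve the same purpose with cost estimates, which is about as close as SQL gets to telling you how heavy a query will be before you run it.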


I'm confused by this:

> In EdgeQL, sets are flat, i.e. a set (including an empty one) cannot be an element of another set

It seems like in the last example, they're returning a set (of movies) that contains other sets (directors, cast, reviews). Am I misunderstanding? I guess it would be helpful to see what the output looks like for these examples.


There are good examples in the tutorial: https://edgedb.com/docs/tutorial/queries#ref-tutorial-querie...

The last query still returns a flat set of Movie objects. The shape selector describes _related_ data that also needs to be fetched by the query. So the query actually returns multiple sets that can be reconstructed by the client into a graph of objects (or serialized into JSON).


You know what would be amazing to have for DB engines? Auto-optimization. For example, something that automatically determines which queries are slow and which indexes should be created to speed them up. Basically self-tuning. Not sure if this could be something for a machine learning model to do.


I like the groupwise maximum query...I always have to look that one up if I haven't done one in a while.


>> The NoSQL movement was born, in part, out of the frustration with the perceived stagnation and inadequacy of SQL databases.

Maybe. I would suggest tight budgets and lack of talented DBAs are the two primary reasons NoSQL became a thing. "We can make do without a SQL DBA."

No. You can't. IMNSHO


My real issue with SQL is how verbose it is. The ratio of meaning to character count in SQL is often pretty poor (for example, when doing multi-table joins). Having to spend a few minutes deciphering each SQL query is a real pain.


I like the fact that this is built on Postgres. Will be curious to give it a try on the side.


Since Date’s criticism of SQL is prominently referenced, the natural question is “Is EdgeDB a D, and, if not, why not?” (D referring to the class of relational systems for which Date and Darwen provided criteria.)


EdgeDB is not quite D if you read all the proscriptions to the letter, but it's very close :-)


We can do better than saying everything always needs to be "better".


That's a pass for me for now. I think its current state is fairly easy to pick up. Should this somehow become highly adopted and truly show a benefit over the standard, I still might pass :)



I happen to like SQL and found it pretty easy to learn. Tons of resources out there. And now with Postgres and MySQL adding JSON storage, we're getting all the benefits of NoSQL in SQL.


Sidenote: has anyone seen how to actually talk to EdgeDB from other languages without a client? It seems currently only Python has a client.

(edit: I'm specifically wondering about Rust, if it matters)


JavaScript and Go clients are coming soon. Java/Rust will follow.


There's also HTTP/JSON support.


Hah, that should have been in the first answer. With no Rust client, I almost ignored the DB until there was a Rust client. If it has full HTTP/JSON support then I could still use it.

Any links to the HTTP-JSON API documentation? I've not found it thus far. (edit: I hope it's a fully featured API, and not the GraphAPI haha)


Very exciting project. Not a very good name, as it conflates with the existing term, "Edge Computing", making one wonder if they're somehow related.


SQL has been great. Relational data is greater.

I absolutely hate working with graphQL, and document stores are a non-starter for projects with any decent amount of data.


Good luck with that.


How about APL? (0=2|x)/x←⍳20

Visit rosettacode.org and try looking up some solutions in SQL and then reference the APL variant. It's night and day.


How do I test for empty set? If I understand correctly then "SELECT * FROM Movies WHERE description = {};" will not work, right?



Seriously, we're talking about a relational system where expressing a tree is super difficult. Why do we put up with this???


Academically, EdgeDB is very interesting, but practically, it seems to be solving a problem that does not exist. SQL is great for simple set operations and loses its efficiency as you move towards more advanced features. The industry has accepted SQL's limitations and created new ways to interface with data, such as UDFs, Pig, and PySpark. Those widely accepted alternatives can save you from doing something in SQL that you really shouldn't be doing in the first place.


> any element-wise operation on an empty set is, likewise, an empty set

Does this mean that "SELECT 1 + {}" gives "{}"?


Correct.


Assume I have a table "Actors" with a column "age" and for some of the records the age is not set (an empty set). Does this mean that "SELECT SUM(age) FROM Actors;" gives "{}" or do you implement a special logic for empty-set summation when used in connection with aggregation (like SQL does)?


Aggregate functions in EdgeDB have an "initial value", which, for `sum()` is defined as zero. Other aggregates, like `avg()` are not defined for empty sets (you cannot divide by 0), so an error is thrown in this case.


I am still a bit unclear about this. Assume we have three actor records where two records have age=30 and for the remaining one the age is not set. From what I understand, "SELECT sum(age) FROM Actor;" returns "60" while "SELECT 30 + 30 + {}" returns "{}". This appears to be an inconsistent handling of empty sets (I thought it would be the same as in SQL).


The difference is that `+` is defined as a strict function (returns empty on empty input): plus(a, b), whereas sum() is an aggregate that is specifically defined as 0 on empty input.


The sum of an empty set is, in fact, 0 (the identity for addition).

The generalized conjuction (we have a function called "all" for that) of an empty set is True (the identity for conjunction).

The generalized disjuction (we have a function called "any" for that) of an empty set is False (the identity for disjunction).

All of the above "sum", "all", and "any" are basically aggregate functions that operate on sets as a whole.

There is no special logic that you wouldn't get from considering these operations generalized for a set.
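Python's built-in aggregates happen to follow exactly this convention, which makes for a quick sanity check:

```python
import math

# Each aggregate over an empty collection yields the identity element
# of the underlying operation.
print(sum([]))        # 0: identity for +
print(math.prod([]))  # 1: identity for *
print(all([]))        # True: identity for AND (generalized conjunction)
print(any([]))        # False: identity for OR (generalized disjunction)
```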


Thanks, that was helpful. However, I still think that having "sum(1,1,{})" returning "2" and "1+1+{}" returning "{}" can be viewed as somewhat inconsistent.


Let me make a tiny correction to the expression you wrote:

"sum({1, 1, {}})" - the function sum takes only one argument and it's a set. Because we flatten all "nested" sets, the expression "{1, 1, {}}" is equivalent to "{1} UNION {1} UNION {}".

The expression "1 + 1 + {}", albeit grammatically valid, can be equivalently re-written as "{1} + {1} + {}". At this point it should be far more obvious why "sum({1} UNION {1} UNION {})" is not the same as "{1} + {1} + {}".

Literals may be a little confusing because they look like elements, but they are still sets, singleton sets, specifically. There's practical value in simply thinking about "a bunch of things: A, B, C", where each of the A, B and C can themselves be empty, a single thing, or a bunch of things while ignoring nesting. In our case we allow duplication in these bunches (which is not part of the bunch theory: http://www.cs.toronto.edu/~hehner/bunch.pdf). However, because most people are familiar with sets we find it easier to keep using the terms "set" and "multi-set" (and stipulate that they are flattened) in explanations.

In general, the way the operator "+" works is this: A + B = {a + b : for all a in A, for all b in B}. Whereas the expression "{A, B}" is defined to be equivalent to "A UNION B".
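That definition of "+" can be transcribed directly into a Python comprehension (a sketch of the semantics, not of EdgeDB's implementation); note how an empty operand empties the entire result, while duplicates are preserved:

```python
import operator

def lift(op, A, B):
    # A binary operator lifted to multi-sets: pair every element of A
    # with every element of B (lists are used so duplicates survive).
    return [op(a, b) for a in A for b in B]

print(lift(operator.add, [1], [1]))                          # "{1} + {1}" -> [2]
print(lift(operator.add, lift(operator.add, [1], [1]), []))  # "1 + 1 + {}" -> []
```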


Good explanation! (which indicates why the handling of empty sets is sometimes a bit confusing)

One more question: What was the motivation behind defining "sum({})" to be "0" rather than "{}"?


Oh, that's simple: sum(A UNION B) should be the same as sum(A) + sum(B) for any two sets A and B (or else there would be very weird inconsistencies).

sum(A UNION {}) = sum(A) + sum({})

sum(A) = sum(A) + sum({})

0 = sum({})

Typically for any operation generalized for a set the result of op({}) should be equal to the identity for that operation (0 for sum, 1 for product, True for AND, False for OR, etc.). It's always such a value I that for any other value A, A op I = A.


Again, a very nice explanation. However, from a practical point of view, I see the following problem: Assume that "SELECT sum(amount) FROM Payments;" calculates the balance for a customer account, and assume further that for some reason (e.g. a programming error) the amount column for that customer has been filled with "{}". The above query would still return the well-defined result "0", which might indicate that everything is correct (while it is not).

This would not happen if "sum({a, b})" were defined as "{a} + {b}" (which is what a user would intuitively assume). However, this definition is also not very practical, as any single occurrence of {} in the sum would render the whole thing {} (which is not what a user would expect).

I guess the handling of "{}" will always stay a bit tricky.


This type of error is better remedied by making the balance property required (so that it cannot be set to {}; an exception is produced at the time the error is introduced). This way you will know about the error early enough. The point is that if an empty value is NOT valid, then forbidding it at the schema level is the best solution. The required keyword is going to do that for you.

Alternatively, if making the property required is not possible due to some workflow constraints, you could do "SELECT Payment{customer} FILTER NOT EXISTS .balance" to find all payments (and the associated customer), which don't have any balance set. Then once you know what they are you might use "UPDATE" to fix the problem.

Empty sets have fairly well-defined and consistent behavior w.r.t. functions (and operators). You learn it once and it applies in all contexts - specifically that empty sets are just sets like any other.


Hhm, this time your answer doesn't convince me. Setting "required" is not always possible, and using UPDATE to set a value works only as an ex-post solution, i.e. you must already know that there is an error. But this was exactly the point of my hypothetical example: the balance-query returns 0 which is a perfectly legal account balance and there is no reason to suspect an error in the first place.

So, I still think that the definition "sum({}) = 0" has more potential to hide errors than the definition "sum({}) = {}" would have.


A few things come to mind:

1) It's a little unlikely that you have data where {} is an error, yet you neither make the property required nor, knowing that, bother with other validation approaches. The point is that if you're aware that this property is potentially incorrect, you would want to check or restrict it. But yes, if this situation is completely unexpected, then "sum" won't notice any issues.

2) You have to remember that {} can arise from perfectly normal operations, such as filtering. So if you filter by a certain date range and there's no Payments there, then "(SELECT Payment FILTER .timestamp > <datetime>$date).amount" expression becomes {} even if the "amount" property itself is required. So doing a sum over it is simply a question of "What's the total amount in payments since $date?" and if there aren't any payments, the answer is 0, not {}. Plus what you certainly don't want is to get an {} from the "sum" here and do "{} + 100" to add some other charge (perhaps a sum from a different account) and still end up with {}, which now creates an error in a situation where there's nothing wrong with the data.

3) Rather than imbuing {} with special meaning to signal errors, a separate property would be more appropriate. Such as a boolean "valid" flag that gets set after the record checks out or can even be a dynamically computable expression that looks at the record (say, Payments from our example) and does something like "valid := EXISTS .amount". Then you'd filter things by the valid property before feeding them to "sum" like so:

  SELECT sum( (SELECT Payment FILTER .valid).amount );


Thanks for your thoughts which are all good and reasonable.

Over the past years I've seen many programmers struggle with the combination of "data aggregation" and NULL. In the beginning I thought that these programmers should just "RTFM", but as this problem occurs so often it might very well be that the practical implementation of aggregation and NULL is "a bit off".

I like your "set" approach a lot, however, we still have "sum({1, 1, {}})" unequal to "1+1+{}" and "sum({}) = 0" which, while consistent, I think are somewhat counterintuitive and I am pretty sure will lead to misunderstanding.

Having said that, I suspect that any decent and consistent approach to this problem is subject to a very reduced form of John Lydgate's famous quote, that is "You can only please some of the programmers some of the time".

Many thanks for your time and this interesting and insightful conversation.


Unless you can compare every bit of SQL syntax against every possible use case, we can't claim that what we are doing is a lot better.


More vendors just need to support ANSI SQL.


I would be very interested in a language that transcribes to SQL the same way TypeScript transcribes to JavaScript.


That's more or less what ORMs try to be.


Only in a manner of speaking (i.e. if you conflate 'mapping' with 'transcribing').


If the generated SQL was compatible across a variety of databases and the performance of the SQL was also passable you'd be on to a good thing.


This is a problem I've been thinking about for years. It's possible to write pure ANSI SQL and be compatible across a lot of databases, but to be optimized in different databases it would need to have a target platform to compile to.

I'm really interested in what other developers think about this.


A simple thing would be macros for subqueries

    A := select col_1, col_2, col_3 from table_1
    B := select col_4, col_5, col_6 from table_2 where 2*col_5 > col_4 + col_6
    $A left join $B on table_1.col_1 = table_2.col_5
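Standard SQL's WITH clause (common table expressions) gives named subqueries within a single statement; what the macro proposal adds is reuse across statements. A SQLite sketch of the CTE form, with table and column names copied from the example above (the inserted rows are invented just to exercise the filter):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table_1 (col_1, col_2, col_3)")
con.execute("CREATE TABLE table_2 (col_4, col_5, col_6)")
con.execute("INSERT INTO table_1 VALUES (1, 'x', 'y')")
con.execute("INSERT INTO table_2 VALUES (0, 1, 1)")  # 2*1 > 0+1, passes the filter

# a and b play the role of the proposed macros A and B, but only
# within this one statement.
rows = con.execute("""
    WITH a AS (SELECT col_1, col_2, col_3 FROM table_1),
         b AS (SELECT col_4, col_5, col_6 FROM table_2
               WHERE 2 * col_5 > col_4 + col_6)
    SELECT * FROM a LEFT JOIN b ON a.col_1 = b.col_5
""").fetchall()
print(rows)
```

The per-statement scope is the limitation: to reuse a and b in a second query you must repeat their definitions (or create views), which is exactly the gap a macro facility would close.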


"We can do better than SQL."

I heard this not too many years ago, and I am now very skeptical based upon those results.


Any performance metrics for edgedb ?


It's kind of funny to point out issues with SQL's composability, and in the same breath spec out a language in which "a set (including an empty one) cannot be an element of another set".

There are many real life problems with SQL but this doesn't feel like it's resolving any of them.


I wish it had slightly more details, like what does the last query on the page actually return?

Also, I am not sure how it's better yet. SQL is typed, has a ton of historical overhead, and is used in many different database systems.

This kind of just seems like a slightly different implementation of NoSQL syntax.


> [..] SQL is typed and has a ton of historical overhead [..]

EdgeQL is strictly typed, here's more info: https://edgedb.com/docs/edgeql/overview/

> slightly different implementation of NoSQL syntax.

I'm genuinely curious what NoSQL syntaxes you are referring to.


If there was a perfect, seamless pandas to SQL translator, that would be it.


Criticism of SQL does not by itself justify a new database. An ORM with its own query language, a new SQL dialect, or a language that compiles to standard, optimized SQL are all good ideas, though.


I love the relational model, but any language where null = null does not return true is insane.

Yes, I understand the theoretical rationale behind three-value-logic. Over here where I'm getting actual work done, it's nothing but an obstacle.
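The behavior in question, demonstrated with SQLite (any SQL engine behaves the same here):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x)")
con.execute("INSERT INTO t VALUES (NULL)")

# NULL = NULL evaluates to NULL ("unknown"), not TRUE, so the row is
# filtered out by WHERE; only IS NULL matches it.
eq_count = con.execute("SELECT COUNT(*) FROM t WHERE x = NULL").fetchone()[0]
is_count = con.execute("SELECT COUNT(*) FROM t WHERE x IS NULL").fetchone()[0]
print(eq_count, is_count)  # 0 1
```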


Meh. It looks like just another graph database with a (possibly) better query language.

People of HN: if you really want to make money, make an open-source (open-core or whatever) horizontally scalable version of the Versant OODBMS (object-oriented DBMS).


Object relational? No thanks. Just give me relational.


How many of the PostgreSQL features does edgedb support?


Quite a big number, but probably not all yet. For example, in one of the future releases we'll add support for GIS types.


Just wanted to say I see a lot of value in wrangling with the problems inherent to SQL, so kudos to you.

Hopefully a transpiler API is in the works for all of those wonderful standards-deviant implementations?


SQL reminds me of COBOL. Even replacing some of the UPPERCASE words in SQL with C-like symbols would be nice.


It's case insensitive, you can use lowercase if you like. The reason the UPPERCASE persists is frankly because most people like it that way. Or maybe I'm wrong about that and we're one preference cascade away from a major flip in public opinion. But personally, I rebelled against the UPPERCASE when I was first learning it, but I've since come to find the UPPERCASE to be quite comforting. I think it has a pleasing aesthetic quality, and I find it helps make large chunks of SQL less intimidating to read.


I think the main reason these days is that there's still a fair bit of SQL embedded in other languages, usually as string literals with placeholders. And then it provides a way to visually distinguish it from "normal" string literals.


You know the keywords don't have to be in uppercase, right?


That's just personal preference, in our shop we use lowercase for SQL reserved words.


They make this comparison in the post.


> Even replacing some of the UPPERCASE words in SQL with C-like symbols would be nice.

In EdgeQL the keywords are recognized in any case. It is a matter of personal style.


As is SQL


Yes, we are not debating that.


My two main gripes with SQL are:

- Lack of interoperability with other languages.

- General ugliness of server-side SQL (stored procedures and functions).

--------

The first one isn't really solved by ORM tools, AFAICT. You can't simply write the SQL query and transpile it into a nice, statically-typed method (with types derived from the actual database structure!) that you can call directly from your language.

For example, the database structure (I'm using T-SQL types here):

    CREATE TABLE T (
        A int PRIMARY KEY,
        B bigint NOT NULL
    );
And the query (T-SQL style parameter syntax):

    SELECT A, B FROM T WHERE A = @a;
Would produce the following method after transpilation (C#, hopefully self-explanatory):

    IEnumerable<(int A, long B)> Query(int a);
But when the database structure changes, that would automatically be reflected in the client language (after a build). For example, making B NULL-able would produce:

    IEnumerable<(int A, long? B)> Query(int a);
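A rough sketch of where the "types derived from the actual database structure" part could start, using SQLite's PRAGMA table_info for introspection (a real transpiler would emit typed code in the client language; everything here is illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE T (A INTEGER PRIMARY KEY, B BIGINT NOT NULL)")

def column_types(con, table):
    # PRAGMA table_info rows: (cid, name, decl_type, notnull, default, pk)
    return {name: (decl, notnull == 0)  # (declared type, nullable?)
            for _, name, decl, notnull, _, _ in
            con.execute(f"PRAGMA table_info({table})")}

print(column_types(con, "T"))
# Dropping the NOT NULL on B and rebuilding would flip its nullability
# in the generated signature automatically.
```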
--------

The second gripe is not very important if you use SQL just as a client-side query language. But making the database "defend" its data (in presence of complex business logic, or security requirements, not fully expressible through declarative constraints) is still best achieved by "funnelling" all clients through an API of stored procedures/functions/views, IMHO. As a bonus, this approach also tends to lower database round-trips.

There seems to be a general lack of composability/reusability:

- E.g. one stored procedure returning a set of rows cannot just "pipe" them into another procedure or query - it must first copy the rows into a (temporary) table.

- If your dialect allows you to declare a table variable, you cannot just assign it to another, you have to INSERT.

- You can reuse the same SQL fragment multiple times in the same query (through WITH), but not in different queries without encapsulating it in a function (and good luck with performance if your dialect doesn't inline function query plans or doesn't support functions at all).

- You cannot parametrize ORDER BY, GROUP BY, IN...

And myriad of other problems:

- The syntax is stuck in the '80s, not well suited for auto-completion, no type inference.

- Lack of simple struct/tuple types in some dialects (may lead to huge parameter lists).

- Inconsistent exception / error handling behavior (sometimes the transaction is aborted, sometimes it isn't).

- Silent data truncation in some cases.

- NULL sometimes meaning "unknown" and sometimes "empty".

- No boolean expressions (e.g. you can't write A IS NULL = B IS NULL).

- And probably many more that are currently not at the top of my head... <RANT CLOSED>

It strikes me that we can do better on all these fronts (and more), without abandoning the "good" parts of relational databases.


I think that EdgeDB actually addresses a lot of your comments re "General ugliness of server-side SQL", i.e.

[ ] You cannot parametrize ORDER BY, GROUP BY, IN...

[v] type inference

[v] tuples, tuples of tuples, arrays of tuples, any combination, really

[v] Consistent exception / error handling behavior

[v] No silent data truncation in some cases

[v] no NULLs -- empty sets are way more precisely defined in EdgeQL

[v] boolean expressions

Please give EdgeDB a try. Feedback from advanced SQL users is very important to us.


I know, I've used AR.


To do better than SQL, we should first do at least as well as SQL.


The correct title for this article is "SQL can do better"


I wholeheartedly agree with the "complaining" part of the post, but then here comes the "solution" and I'm not quite sold.

I mean, I don't necessarily claim that this is not a solution, it just isn't obvious to me at all. Maybe a more extensive explanation with better examples would make it all clear to me and I'd be super-hyped about it already, but right now I'm more like confused.

First off, it would be helpful to show the table structure in the examples, and then compare the EdgeQL query to the easiest solution in SQL. After all, the readers have supposedly used SQL almost daily for many years (I know I have), but don't know a thing about EdgeQL, so if it can do everything SQL can, but easier, such a comparison should make that pretty obvious.

TBH, my knowledge of relational algebra is quite rusty by now, so maybe that's the problem, but as I remember it, many queries we commonly use in SQL are not really "relational" queries. Relational algebra deals with sets of tuples, so things like count(*), ORDER BY, or GROUP BY are not really part of the relational model; they just exist because they are super-helpful for what we usually try to achieve with SQL.

The problems with NULL are of a similar nature. I don't think we should pretend that NULL not being equal to NULL is not useful (we don't expect one piece of "missing data" to be exactly the same value as another piece of "missing data", do we?), or that SELECT DISTINCT treating NULLs as equal is unintuitive (to me it absolutely is intuitive: when I ask what values occur in a table, a missing entry is a missing entry; I don't want to see NULL 10,000 times).

So, the introduction kind of made me to expect the solution to be more in compliance with relational concepts, but it doesn't seem to be, since all of the above are present in the EdgeQL in one form or another.

I'm not sure how {} is different from NULL in EdgeQL, since {} seems to be a special thing here, just as NULL is in SQL. I mean, it doesn't behave like a true empty set at all! For a true empty set, {value} OR {} should be {value}, not {} (treating OR ≡ ∪). <bool>{} being {} instead of a true boolean value looks even more confusing to me than NULL OR (NOT NULL) = NULL. Same ternary algebra here.

Then, I don't really understand the concept of a flat set here. I do kind of understand what we are trying to achieve: we want to solve the problem of SELECT x, (SELECT y) FROM z throwing an error at runtime if count(SELECT y) != 1. And that would kind of make sense in SQL, but it's explicitly advertised as a feature of EdgeQL that it can return trees (JSON-like structures), and there it doesn't seem to make sense that the output of a query (which is a "flat set", I guess?) cannot have another set as an element. Moreover, it obviously can be ordered, which also isn't a property of how "set" is commonly defined in set theory.

As a first impression, the syntax and overall structure of the queries don't strike me as obvious either. In fact, since

> SQL does not integrate well enough with application languages and protocols

I would ultimately hope for something that can be expressed as a number of function calls and commonly used data structures (a list, a dictionary/record, etc.) in most/any mainstream PLs, not one more DSL in the form of free-form text. (Maybe with a more succinct DSL for use in a console. Maybe.)

And my ultimate source of confusion: SQL is more or less the same thing even in those DBMSs where it isn't exactly The SQL (like ClickHouse). And given that I know the overall structure of the DBMS (is it, for instance, row-based or column-based?), I can make pretty good assumptions about the performance of a given query, even though SQL is still declarative and I don't know exactly what the query optimizer will do. Maybe it's just that I'm not used to it, but I have no feel for how performant the last EdgeQL example in the article would be, or whether it would be better to split it into several queries at some scale. In fact, I don't even understand whether it's something that would be reasonably easy to implement in other major RDBMSs, or is it ultimately an EdgeDB-only feature? If so, it can only be as good as EdgeDB, and is that as good as PostgreSQL, or MariaDB, or SQLite? Unfortunately, in the real world I have to worry more about how performant and robust a thing is under load than about programming convenience.


> {value} OR {} = {value}, not {} (OR ≡ ∪), <bool>{} being {} instead of a true boolean value

This is because we define infix OR as

{a OR b | (a, b) ∈ A × B}

Same goes for the cast function.

You may argue that this is confusing, but if you think about everything as a set comprehension, it is way more consistent. For example the `CASE WHEN` example from the post would always return `{}` for empty input, making it obvious.
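The lifting rule can be sketched as a few lines of Python. This is a toy model of the set-comprehension definition above, not actual EdgeDB code: an infix operator is applied element-wise over the Cartesian product of its operand sets, so an empty operand gives an empty product, hence {}:

```python
from itertools import product

def lifted(op, A, B):
    # Apply op to every pair in the Cartesian product A × B.
    return [op(a, b) for a, b in product(A, B)]

def or_(a, b):
    return a or b

print(lifted(or_, [True], []))       # [] -- {value} OR {} = {}
print(lifted(or_, [True], [False]))  # [True]
```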

> but it's explicitly advertised as a feature of EdgeQL that it can return trees (json-like structures)

The tree-like return is not an aspect of the language itself; it's a matter of output representation. For example:

    SELECT User { favorites: {name} }
still returns a flat set of User objects. The shape selector `{ ... }` is an annotation that defines how an object is serialized in the output.

> I would ultimately hope for something that can be expressed as a number of function calls and commonly used data structures

This is exactly where we are going. Orthogonality in the underlying language makes it much easier to achieve this.

> and is it as good as PostgreSQL, or MariaDB, or SQLite?

EdgeDB is based on Postgres, we actually transpile EdgeQL to SQL. EXPLAIN and query performance analysis are being worked on.

> in the real world I have to worry more about how performant and robust a thing is under load, than I can worry about programming convenience.

We have posted some benchmarks [1], and more are coming.

[1] https://edgedb.com/blog/edgedb-1-0-alpha-1/


kdb is better than sql.


And thus a new SQL dialect was born.


Well then do it.

Let me guess: ya can't.


So if you can handle the "complexity" of SQL and don't use NULL, SQL is fine?


> don't use NULL

I'm not sure this part is realistic to do in a non-trivial setup. At the very least it's not a common thing to do.


You either have actual NULLs in your tables or you have implied NULLs that don't exist in the actual tables but still come up when you do LEFT JOINs etc.

I don't think NULLs are actually a problem in SQL. It's not like we're talking about pointers, it won't cause your server to crash if you use them wrong. NULLs in SQL are fine.
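The implied-NULL point is easy to demonstrate. In this sketch (Python's built-in sqlite3; the behavior is standard SQL), both tables forbid NULL entirely, yet the LEFT JOIN manufactures one for the unmatched row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute("CREATE TABLE orders (user_id INTEGER NOT NULL, total INTEGER NOT NULL)")
cur.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])
cur.execute("INSERT INTO orders VALUES (1, 100)")

# bob has no orders, so o.total comes back as NULL despite NOT NULL
# constraints on every column involved:
rows = cur.execute(
    "SELECT u.name, o.total FROM users u "
    "LEFT JOIN orders o ON o.user_id = u.id ORDER BY u.id"
).fetchall()
print(rows)  # [('ann', 100), ('bob', None)]
```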


They're far worse than null pointers. They can cause subtle bugs that lead to queries that appear to work but result in missing or incorrect data.
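A classic example of such a bug is NOT IN against a subquery that can yield NULL: rows that "should" qualify silently vanish, with no error. A sketch using Python's built-in sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE a (x INTEGER)")
cur.execute("CREATE TABLE b (x INTEGER)")
cur.executemany("INSERT INTO a VALUES (?)", [(1,), (2,)])
cur.executemany("INSERT INTO b VALUES (?)", [(2,), (None,)])

# x NOT IN (2, NULL) is never TRUE: the comparison with NULL is
# unknown, so even x = 1 is filtered out -- the query "works" but
# returns no rows at all.
missing = cur.execute(
    "SELECT x FROM a WHERE x NOT IN (SELECT x FROM b)"
).fetchall()
print(missing)  # []
```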


[flagged]


> Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something.

https://news.ycombinator.com/newsguidelines.html


I use ORMs


What is EdgeDB? It is new to me.


Yeah, it's because it is new. We released the first public version a few weeks ago, here's a blog post with the announcement: https://edgedb.com/blog/edgedb-1-0-alpha-1 Hope you will try it!


I skimmed the post and am not interested enough to look at it further. I'm not saying it isn't interesting at all, but I still need to carve out the time to learn Postgres' new features and check out RethinkDB again. If I had infinite time I would check it out.

The MongoDB example is extremely contrived. If you wanted to do that with MongoDB you would add a new field or a new collection with full_name rather than doing a $map.


The comment in the query mentions that it makes the query only 5% slower. You can drop the "full_name" part, but the query would still be too low-level (even compared to the SQL/ORM examples) and slow.


Why are you combining separate fields in the first place? Document databases are denormalized. If you want a fair comparison, put full_name in a separate field in addition to first_name and last_name in each document. That's the way to use Mongo. The issue here is that a document db takes extra space, not that you have to use $map for the common task of searching a full name. You only have to use $map for querying things that you didn't plan to query.


The point of that blog post is to look at both performance and usability.

MongoDB has a way to fetch data the way we needed and the way other databases fetch it in our benchmarks. We tried just that: the performance is roughly the same anyways. It's up to MongoDB users how exactly they store the data and how exactly they query it, but it's nice to know that their query language is capable enough.

I think that it's pretty obvious that not a lot of users query data like that in MongoDB and instead just store denormalized data. That's not the point of that blog post though.


[flagged]


I wish we "flop" like MongoDB :)


So, to fix SQL, they're taking an SQL server and building a proprietary language on top of it.


We Can Do Better Than English. It is full of strange grammar and inconsistencies. The tooling for checking grammar throws false positives. In fact all it has going for it is that billions of people already speak it...well I suppose that is a pretty big advantage.



