What it might need to do is index commonly searched terms and provide a reverse lookup to the location of the row... oh wait, that's called a search engine. :)
There’s a lot missing here. During the brief and unhappy period of my life where I worked in this area, we had quite a lot of luck just generating semantics based on Wordnet in the domain in question. So here you can’t successfully ask for “French wines” even though we know what a country is and that French is a correct adjectival form. Same with things like “oldest wine”, that’s an easy to derive superlative based on info you already have. We got some mileage out of this old fashioned tree based system at the core, with fuzzier machine learning stuff at the edges.
SELECT singer.Name FROM singer JOIN singer_in_concert ON singer.Singer_ID = singer_in_concert.Singer_ID GROUP BY singer.Singer_ID ORDER BY COUNT(*) DESC LIMIT 1
It is close... sort of? It figured out it needed to join, group, and order, but it only drew the relation to the concert, not the venue. Correctness seems a huge challenge here. Even knowing SQL, I feel I'm double checking my results at times. But I can see how this might be incredibly useful someday for Salesforce if there's confidence in the results.
IMO better would be a database interface that acts as a normal programming language, treating tables as arrays of records. Compare:
how many teachers older than 30
teachers.filter(*.age > 30).len
Same length, but the second one has a degree of precision that the first lacks. (Though they might diverge somewhat as the complexity of queries grows, I suspect programming languages would do better as they have better facilities for symbolic manipulation.) Note also that the second example (and your SQL example) had to specify the table's name, while the natural language one did not—another mark against the natural language solution.
Linq is cool; I haven't really had opportunity to play with it as I'm not in the .net ecosystem. Another integrated language/database is kdb, although it borrows somewhat from sql's syntax.
In apl (incl. apl-alikes—kdb, etc.), you can essentially say this directly. 'age > 30' produces an array with the same shape as 'age' with a 1 for instances that are >30 and 0 else. So for instance:
27 42 44 > 30
0 1 1
If all you want is a count of teachers older than 30, you can simply sum up the resulting array, so:
+/0 1 1 ⍝note: '/' is reduction, so +/ is sum reduction
2
If you want the indices, you can use 'where', which will produce an array of indices corresponding to 1s in its argument, so
⍸0 1 1
1 2
Indicates that '0 1 1' has 1s at indices 1 and 2. (We could also take the length of this array to get the count of teachers older than 30, although that would be a bit silly compared to just summing the original array.)
We can then use these indices to index a different array:
ages←27 42 44
names←'anthony' 'barbara' 'bert'
names[1 2]
barbara bert
names[⍸ ages>30]
barbara bert
However, we don't even need to use 'where' in this case; the reduction operator (/) has a second use which lets us say:
0 1 1 / names
barbara bert
(ages > 30) / names
barbara bert
Good luck with your database, and hope this helps!
The Covid database contains cumulative figures, so if you ask "How many deaths in ...?" you get the naive query with the sum of the Deaths column for that country, which is wrong. Actually I wonder how to explain it. I cheated and asked "how many deaths in ... on July 14?" but got the wrong query, with July 14 as Province_or_State no matter how I rephrased the date.
BTW, they didn't make life easy for themselves by having a field called "number of records". I asked something like "what's the number of records" and "what's the sum of the number of records", but it kept replying with `SELECT COUNT(*) FROM wines;`.
> In my opinion this doesn't make sense.
> Part of writing SQL is also understanding the underling data. This won't address this issue.
I guess the endgoal for this is to make non-technical people also be able to efficiently work with databases. From a purely business/financial context this would save companies hours (i.e. money) onboarding/teaching employees to use their database, and even possibly remove the need to hire expensive data analysts because their lower-tier employers suddenly can interact with their databases as efficiently as they can.
edit: I also believe you're putting the cart before the horse with your reasoning. SQL and Legal English NEED to be exact, which makes them very 'complex' because you need to disallow any edge cases. This doesn't mean we WANT them to be complex. It is way more useful if it is easy and intuitive (like natural language). Matter of fact, this would save in both Legal and SQL cases a lot of time, because in both you'd often start with natural english, like 'I want to write a rent contract that protects me and my renter from legal trouble', or 'I want to know from this database which company had the highest net profit in the last quarter'. It's only then that you put money, effort and time into translating this into Legal English or SQL.
It couldn't translate that into SQL.
"What is the price in dollars of Ginger Beer?"
> SELECT Catalog_Contents.price_in_dollars FROM Catalog_Contents WHERE Catalog_Contents.price_in_dollars = "Ginger Beer"
Nope.
"What is the price in dollars of catalog entry name Ginger Beer?"
> SELECT Catalog_Contents.price_in_dollars FROM Catalog_Contents WHERE Catalog_Contents.catalog_entry_name = "Ginger Beer"
Cool! You have to be more specific than I was hoping, but this is still pretty neat.