Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Photon – a live demo of a natural language interface to databases (naturalsql.com)
48 points by atrudeau on Aug 27, 2020 | hide | past | favorite | 26 comments


"What is the price of Ginger Beer?"

It couldn't translate that into SQL.

"What is the price in dollars of Ginger Beer?"

> SELECT Catalog_Contents.price_in_dollars FROM Catalog_Contents WHERE Catalog_Contents.price_in_dollars = "Ginger Beer"

Nope.

"What is the price in dollars of catalog entry name Ginger Beer?"

> SELECT Catalog_Contents.price_in_dollars FROM Catalog_Contents WHERE Catalog_Contents.catalog_entry_name = "Ginger Beer"

Cool! You have to be more specific than I was hoping, but this is still pretty neat.


"What is the price in dollars of catalog entry name Ginger Beer?"

OK - but that's not "natural language".


What it might need to do is index commonly searched terms and provide a reverse lookup to the location of the row... oh wait, that's called a search engine. :)


That’s worse than pretty much any NLP system from 1975


There’s a lot missing here. During the brief and unhappy period of my life where I worked in this area, we had quite a lot of luck just generating semantics based on Wordnet in the domain in question. So here you can’t successfully ask for “French wines” even though we know what a country is and that French is a correct adjectival form. Same with things like “oldest wine”, that’s an easy to derive superlative based on info you already have. We got some mileage out of this old fashioned tree based system at the core, with fuzzier machine learning stuff at the edges.


> What singer sang in the most stadiums?

    SELECT singer.Name FROM singer JOIN singer_in_concert ON singer.Singer_ID = singer_in_concert.Singer_ID GROUP BY singer.Singer_ID ORDER BY COUNT(*) DESC LIMIT 1
It is close... sort of? It figured out it needed to join, group, and order, but it only drew the relation to the concert, not the venue. Correctness seems a huge challenge here. Even knowing SQL, I feel I'm double checking my results at times. But I can see how this might be incredibly useful someday for Salesforce if there's confidence in the results.


> how many teachers older than thirty?

> SELECT COUNT(*) FROM teacher WHERE teacher.Age > "thirty"

Not a bad idea. A good idea, maybe. Implementation needs some work.


IMO better would be a database interface that acts as a normal programming language, treating tables as arrays of records. Compare:

  how many teachers older than 30
  teachers.filter(*.age > 30).len
Same length, but the second one has a degree of precision that the first lacks. (Though they might diverge somewhat as the complexity of queries grows, I suspect programming languages would do better as they have better facilities for symbolic manipulation.) Note also that the second example (and your SQL example) had to specify the table's name, while the natural language one did not—another mark against the natural language solution.


What you described sounds a lot like .NET's LINQ, which translates to SQL queries as well:

    teachers.Where(t => t.Age > 30).Count();


Linq is cool; I haven't really had opportunity to play with it as I'm not in the .net ecosystem. Another integrated language/database is kdb, although it borrows somewhat from sql's syntax.


I would suggest something more like:

    teachers[age > 30].count
(Because I'm working on something exactly like this right now)


In apl (incl. apl-alikes—kdb, etc.), you can essentially say this directly. 'age > 30' produces an array with the same shape as 'age' with a 1 for instances that are >30 and 0 else. So for instance:

        27 42 44 > 30
   0 1 1
If all you want is a count of teachers older than 30, you can simply sum up the resulting array, so:

         +/0 1 1  ⍝note: '/' is reduction, so +/ is sum reduction
   2

If you want the indices, you can use 'where', which will produce an array of indices corresponding to 1s in its argument, so

         ⍸0 1 1
   1 2
Indicates that '0 1 1' has 1s at indices 1 and 2. (We could also take the length of this array to get the count of teachers older than 30, although that would be a bit silly compared to just summing the original array.)

We can then use these indices to index a different array:

         ages←27 42 44
         names←'anthony' 'barbara' 'bert'
         names[1 2]
    barbara  bert
         names[⍸ ages>30]
    barbara  bert
However, we don't even need to use 'where' in this case; the reduction operator (/) has a second use which lets us say:

         0 1 1 / names
    barbara  bert
         (ages > 30) / names
    barbara  bert
Good luck with your database, and hope this helps!


It's possible to use Clojure/ClojureScript functions inside datalog queries

http://www.learndatalogtoday.org/


Your example reminds me of RethinkDB, which I haven't used in years but as I recall it was quite nice!


Looking at responses to your comment I predict that AI won't replace properly trained programmers too soon...


The Covid database contains cumulative figures, so if you ask "How many deaths in ...?" you get the naive query with the sum of the Deaths column for that country, which is wrong. Actually I wonder how to explain it. I cheated and asked "how many deaths in ... on July 14?" but got the wrong query, with July 14 as Province_or_State no matter how I rephrased the date.


“Select all elements from catalog” did not work...


> All confirmed covid deaths

> SELECT covid_19_july_data.Deaths FROM covid_19_july_data WHERE covid_19_july_data.Confirmed = "covid"


I've seen a talk about this task, it's supposed to be hard to get enough training data for regular DL approaches.


Why do you think this is online?

BTW, they didn't make life easy for themselves by having a field called "number of records". I asked something like "what's the number of records" and "what's the sum of the number of records", but it kept replying with `SELECT COUNT(*) FROM wines;`.


In my opinion this doesn't make sense.

SQL is a tight, unambiguous language, that's why it exists.

This is like a legal document written in spoken English. It's only all fine when it works.

Part of writing SQL is also understanding the underling data. This won't address this issue.

This is also not replicable. Language changes in context and time.


> In my opinion this doesn't make sense. > Part of writing SQL is also understanding the underling data. This won't address this issue.

I guess the endgoal for this is to make non-technical people also be able to efficiently work with databases. From a purely business/financial context this would save companies hours (i.e. money) onboarding/teaching employees to use their database, and even possibly remove the need to hire expensive data analysts because their lower-tier employers suddenly can interact with their databases as efficiently as they can.

edit: I also believe you're putting the cart before the horse with your reasoning. SQL and Legal English NEED to be exact, which makes them very 'complex' because you need to disallow any edge cases. This doesn't mean we WANT them to be complex. It is way more useful if it is easy and intuitive (like natural language). Matter of fact, this would save in both Legal and SQL cases a lot of time, because in both you'd often start with natural english, like 'I want to write a rent contract that protects me and my renter from legal trouble', or 'I want to know from this database which company had the highest net profit in the last quarter'. It's only then that you put money, effort and time into translating this into Legal English or SQL.



Doesn't do anything related to language


list all company

> Please check the results in the table. Did I get it right?

yes

> Great!

list all designation

> Sorry, 'designation' is confusing to me


meh "is there any area column in addresses table" did not work seems straight forward to me ?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: