Can you talk at a high level about the problems of keyword search, or is that part of your secret sauce? Off the top of my head I can think of two, which are intent and encoding.
Before you can even do a keyword search, you obviously need an intent to do so. But that means keyword search is pretty useless when you don't know what you don't know.
Encoding that intent...maybe doesn't matter for common searches, but everyone has heard of the concept of "Google-Fu". English text is a pretty lossy medium compared to the thoughts in people's heads...Shannon estimated about 2.62 bits per English letter, so the space of possibly-relevant sites for almost any keyword is absolutely enormous (e.g. there are only about 330,000 distinguishable 7-letter English keyword searches...distributed across how many trillions of pages, not even counting "deep web" dynamically generated ones?). So we punt on that and use the concept of relevance for sorting results, and in practice no one looks beyond the first 10. I don't know what an alternate encoding might look like, though.
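For what it's worth, the 330,000 figure above checks out as a back-of-the-envelope calculation from Shannon's rate (2^(2.62 × 7)); a quick sketch:

```python
# Back-of-the-envelope check of the numbers in the comment above.
# Assumes Shannon's word-frequency-based estimate of ~2.62 bits of
# information per English letter.
BITS_PER_LETTER = 2.62
QUERY_LENGTH = 7  # letters in the hypothetical keyword search

# Number of distinguishable 7-letter English messages at that rate.
distinct_queries = 2 ** (BITS_PER_LETTER * QUERY_LENGTH)

print(f"{distinct_queries:,.0f}")  # on the order of 3.3 x 10^5
```

Compare that with the raw space of 26^7 ≈ 8 billion letter strings: the redundancy of English collapses the query space by four orders of magnitude, which is why so many distinct intents get funneled through the same few keywords.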
> Before you can even do a keyword search, you obviously need an intent to do so. But that means keyword search is pretty useless when you don't know what you don't know.
Right: The way I have sometimes put it in the past, to oversimplify a little, is that keyword search requires the user to know what content they want, know that it exists, and have keywords/phrases that accurately characterize that content. For some searches, e.g., the famous movie-line example, that is fine; otherwise it asks too much of the user.
For "encoding", my work does not use keywords or any natural language for anything.
My work does get some new data for each search from each user. But privacy is relatively good because, for the results, I use only what the user gives for that search; in particular, two users giving the same inputs at essentially the same time will get the same results. Thus, search results are independent of the user's IP address or browser user-agent string. Moreover, the site makes no use of cookies.
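That determinism property is easy to state in code. This is purely a hypothetical illustration of the contract being claimed, not the actual algorithm (which isn't disclosed): results are a function of the user's inputs alone, so request metadata like IP, user-agent, or cookies cannot influence them.

```python
import hashlib

def search_results(user_inputs: str) -> str:
    # Hypothetical stand-in for the real (undisclosed) result computation.
    # The key property: the output depends ONLY on user_inputs -- no IP
    # address, user-agent string, cookie, or timestamp enters the function.
    digest = hashlib.sha256(user_inputs.encode("utf-8")).hexdigest()
    return digest  # stands in for a deterministic result set

# Two users giving the same inputs get identical results,
# regardless of who or where they are.
assert search_results("same inputs") == search_results("same inputs")
```

The point of the sketch is just that reproducibility across users is what makes the privacy claim checkable from the outside.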
When keywords work well -- and they work well enough to be revolutionary for the world -- my work is, except for some small fraction of cases, not better. So, ballpark, there is the 1/3rd of searches where keywords work well; then there is the guesstimated 2/3rds I'm going for. The role of the advanced pure math is to say that the data I get, the processing I do with that data, and what is in the database should yield good results for that 2/3rds. The role of my original applied math is to make the computations many times faster -- they would be too slow otherwise.
My work is not as easy for the user as picking a great, very accurate result from the top dozen presented by a keyword search engine, e.g., the movie line example. But it is much easier than flipping through 50 pages of search results, and it is intended to give good results that are unreasonable to get from a keyword search -- the kind of search that, without "characterizing" keywords, yields, say, millions of results and would require a user to flip through dozens of pages.
Ads are off on the right side and not embedded in the search results. The SEO (search engine optimization) people will have a tough time influencing the search results!
We will see how well users like it. If people like it, then it will be good to make progress on the huge, usually neglected, content of the 2/3rds.