> Anyone can go build a crawler and scrape the web the way Google scrapes it so ...

heavenlyblue · on June 1, 2018

The site ranking algorithm is a solved problem.

The one reason Google is competitive is that they take advantage of cheap labour to keep track of ranking manipulation.

Luckily most of the search problems have nothing to do with ranking manipulation.

sheeshkebab · on June 2, 2018

Site ranking is not a “solved problem”. Google tries to solve it all the time, and yet finding anything other than trending or popular stuff still takes several attempts (and often doesn’t even turn up the best results).

heavenlyblue · on June 3, 2018

Google has a set of contradictory requirements for the interface on their website.

On one side, it's a natural-language interface along the lines of Alexa or the like; on the other, it's a search interface for people who need access to information in general.

If Google exposed interfaces similar to Elasticsearch's, search would never be an issue anymore, but it would not be easy for users to use.
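
To make the trade-off concrete, here is a rough sketch of what a structured query in the Elasticsearch query DSL style looks like, built as a plain Python dict. The helper `build_query` and the field names `body`, `domain` and `published` are made up for illustration; a real index would define its own mapping, and nothing here reflects an interface Google actually offers.

```python
import json
from typing import Optional, Sequence


def build_query(text: str,
                domain: Optional[str] = None,
                published_after: Optional[str] = None,
                exclude_phrases: Sequence[str] = ()) -> dict:
    """Build an Elasticsearch-style bool query as a plain dict.

    The field names "body", "domain" and "published" are hypothetical.
    """
    bool_query: dict = {
        # Scored full-text match on the page body.
        "must": [{"match": {"body": text}}],
    }

    # Filters narrow the result set without affecting the relevance score.
    filters = []
    if domain is not None:
        filters.append({"term": {"domain": domain}})
    if published_after is not None:
        filters.append({"range": {"published": {"gte": published_after}}})
    if filters:
        bool_query["filter"] = filters

    # Documents containing any of these phrases are dropped entirely.
    if exclude_phrases:
        bool_query["must_not"] = [
            {"match_phrase": {"body": phrase}} for phrase in exclude_phrases
        ]

    return {"query": {"bool": bool_query}}


if __name__ == "__main__":
    query = build_query(
        "site ranking",
        domain="example.org",
        published_after="2018-01-01",
        exclude_phrases=["trending"],
    )
    print(json.dumps(query, indent=2))
```

Precise, yes, but compare that with typing three words into a search box and you can see why it would not fly with most users.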

jamra · on June 1, 2018

That’s precisely why they don’t want you cheating the hard part and just storing the results. It makes sense to me. Work on your own machine learning if you want good results.

toomuchtodo · on June 1, 2018

There is no such thing as cheating, only staying within boundaries that don't land you in jail or get you sued in your own jurisdiction. If you can get an edge by using Google's own data, do so.

lugg · on June 1, 2018

Bing bing bing!

Er, ughm. I mean,

Ding ding ding!

anc84 · on June 1, 2018

You could argue that Google should work on their own knowledge database instead of learning from other people's content and/or presenting other people's content in their own frontends (shopping, etc.)...

heavenlyblue · on June 1, 2018

This is what Common Crawl does: http://commoncrawl.org/. I think more people should know about it.