Hacker News

I submitted http://ianab.com/trillion/0.html to http://www.google.com/addurl/ and http://search.msn.com.sg/docs/submit.aspx

Please log requests from the Google and Microsoft bots and let us know how long it takes the respective bots to figure out that every page is the same :-)
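For anyone who wants to play along, here's a minimal sketch of the kind of log check being asked for: counting hits per crawler by scanning user-agent strings in an access log. The sample lines and the bot substrings are just illustrative; you'd point this at your own server's log.

```python
from collections import Counter

# Hypothetical sample of Apache combined-log lines; real data would
# come from your server's access log file instead.
SAMPLE_LOG = """\
66.249.66.1 - - [12/Mar/2009:10:15:32 +0000] "GET /trillion/0.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
65.55.104.9 - - [12/Mar/2009:11:02:10 +0000] "GET /trillion/0.html HTTP/1.1" 200 5120 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
66.249.66.1 - - [12/Mar/2009:10:15:40 +0000] "GET /trillion/17.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
"""

# User-agent substrings for the two crawlers of interest.
BOT_NEEDLES = ("Googlebot", "msnbot")

def count_bot_hits(log_text):
    """Count requests per crawler, keyed by user-agent substring."""
    hits = Counter()
    for line in log_text.splitlines():
        for needle in BOT_NEEDLES:
            if needle in line:
                hits[needle] += 1
    return hits

print(count_bot_hits(SAMPLE_LOG))  # Counter({'Googlebot': 2, 'msnbot': 1})
```

Tail the live log with the same substring match to watch the crawl happen in real time.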




http://pastebin.com/m54b7d354

Looks like it just hit a bunch of links from the first page.


So it looks like Google grabbed about 40 links before giving up? I wonder what a good "score" is. At first I'd guess fewer is better, but too few risks throwing out potentially good pages, and too many means the bot is just wasting effort. The score of 40 could also vary with parallelism, assuming many bot instances are sharing a task pool. Be sure to post the Microsoft results if/when they crawl you.


Looks like it hit all of the links on the first page (there are 8x5 boxes = 40) and didn't find anything interesting, so it didn't crawl any deeper. If the second-level pages had more interesting/unique content, I bet it would've kept going.
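One way a crawler might notice that "every page is the same" is to fingerprint each fetched page after discarding the parts that vary. This is only a toy sketch, not Googlebot's actual heuristic (which is unknown); real systems use near-duplicate techniques like shingling, but stripping the digits is enough for pages that differ only in the number they display:

```python
import hashlib

def content_fingerprint(html):
    """Hash the page after stripping digits, so pages that differ only
    in the numbers shown collapse to one fingerprint. A toy stand-in
    for real near-duplicate detection such as shingling."""
    stripped = "".join(ch for ch in html if not ch.isdigit())
    return hashlib.sha1(stripped.encode("utf-8")).hexdigest()

# Two hypothetical trillion-site pages differing only in the number:
page_a = "<html><body>Page 4,518,559,054,329</body></html>"
page_b = "<html><body>Page 7,002,113,846,001</body></html>"

# Same fingerprint, so a crawler tracking fingerprints would see no
# new content and could stop expanding this branch of the site.
assert content_fingerprint(page_a) == content_fingerprint(page_b)
print("duplicate detected:", content_fingerprint(page_a)[:12])
```

After the 40 first-level pages all hash to the same value, a budget-conscious crawler has a good reason not to go deeper.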


Thanks a lot for sharing this. Very interesting indeed.





