Obama's New Robots.txt (codeulate.com)
56 points by r00k on Jan 20, 2009 | 15 comments


The vast majority of the entries in Bush's robots.txt filtered out the plain-text versions of pages, which are linked at the bottom of the HTML versions and contain identical content. This prevents duplicates from showing up in search results. It was likely done automatically by whatever software they used to manage the content.

Want proof? Pick any of the entries ending in "/text", for example "/911/911day/text", and search Google with the "/text" removed, like this: "site:whitehouse.gov inurl:/911/911day". You can still see the page in the Google cache (at least until Google's index is updated).
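To make the point concrete, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt excerpt is a one-entry stand-in rather than the real file: only the "/911/911day/text" path comes from the comment above, and the host name, the "ExampleBot" agent, and the is_allowed helper are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical one-entry excerpt; only the "/911/911day/text" path is
# taken from the thread, the rest is illustrative.
ROBOTS_TXT = """\
User-agent: *
Disallow: /911/911day/text
"""


def is_allowed(path, robots_txt=ROBOTS_TXT, agent="ExampleBot"):
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # The host is an assumption; the rules only look at the path.
    return parser.can_fetch(agent, "http://www.whitehouse.gov" + path)


html_allowed = is_allowed("/911/911day")       # HTML version of the page
text_allowed = is_allowed("/911/911day/text")  # plain-text duplicate

print("HTML version crawlable:", html_allowed)
print("Text version crawlable:", text_allowed)
```

Under these assumptions the rule only matches paths that begin with "/911/911day/text", so the HTML page stays crawlable while its plain-text duplicate is skipped.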

If you want to view it as a metaphor, fine, but there's no evidence Bush's administration was trying to hide anything on their website, as this article implies. If they wanted to hide it, why would they put it on there in the first place?


This is a great and semi-metaphorical comparison (woohoo transparency!), but to be fair, the Obama administration hasn't done anything yet, so there isn't even anything to hide at this point.


Having /includes/ under document root - and trying to fix this via a robots.txt entry (??) - wouldn't reflect well on Obama, if they actually had any meaning :)


The includes folder looks like it's just JavaScript and CSS (including jQuery) so it has to be under the document root.


Sorry, I'm taking that back... I was thinking of Obama's campaign site where you could actually access stuff like /includes/footer.php (at least for a while, seems to be fixed now)


You're making assumptions about what that directory is for. Perhaps it is used to store static HTML pages that are included from IFRAMEs on other pages.


Well that's at least tons better than hiding the entire site with hundreds of Disallow entries, when they could have just done /*.
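For comparison, a rough sketch of what a single site-wide rule looks like to a crawler, again with Python's standard urllib.robotparser. It uses the plain "Disallow: /" prefix form rather than "/*", since wildcard handling differs between parsers. The sample paths, the example.com host, the "ExampleBot" agent, and the blocked_paths helper are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A single prefix rule that matches every path on the site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def blocked_paths(robots_txt, paths, agent="ExampleBot"):
    """Return the subset of `paths` that `agent` may not fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        p for p in paths
        if not parser.can_fetch(agent, "http://example.com" + p)
    ]


# Illustrative paths only; none are claimed to exist on a real site.
sample = ["/", "/news/", "/911/911day/text", "/includes/site.css"]
blocked = blocked_paths(BLOCK_ALL, sample)
print(blocked)
```

Every path begins with "/", so one line blocks the whole site. A long list of individual Disallow entries, by contrast, blocks only the listed prefixes and leaves everything else crawlable.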


Why aren't we allowed to crawl their JS and CSS?

What are they trying to hide?


I hope you're joking.


probably all the hacks to make many browsers work /joke/


I'm more interested in what CMS they are using. Any ideas?


/firstlady/newborn/text !?



Humm... maybe there's a little Obama on the way?

Doubtful, but you never know.


That was from the old robots.txt from the Bush admin, not Obama.



