Though if you've made it clear that only x, y and z can crawl your site, and someone spoofs, say, y, then it would be easy to demonstrate that someone has done something they know they shouldn't.
and not only can the bot lie, it can disregard the robots.txt file altogether. just like the terms of service document for humans, you can choose to disregard it & deal w/ the consequences (blocked IPs, lawsuit, etc).
robots.txt is just a version of the TOS that computers can read.
probably not, but that's irrelevant. the point is facebook's robots.txt says you can crawl it but their TOS says you can't. facebook have already shown they are willing to sue anyone who tries to crawl without permission. this new model that facebook is trying to use isn't scalable, it favors the big guys and is bad for the open web.
you can also do this with a regular expression. here's the blog post i wrote on it a year ago. i use one function to grab GET variables from either window.location or from a script tag's src. 5 lines of code.
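A rough sketch of the regex idea, since the blog post's code isn't quoted here. The original was presumably JavaScript; this is a Python version of the same approach, and the function name `get_query_var` and the example URLs are made up for illustration. It assumes you pass in a URL string (what window.location.href or a script tag's src would give you).

```python
import re
from urllib.parse import unquote_plus


def get_query_var(name, url):
    """Return the first value of query parameter `name` in `url`, or None.

    Hypothetical helper: `url` stands in for window.location.href or the
    src attribute of a script tag.
    """
    # Look for ?name=value or &name=value; the value ends at the next & or #.
    pattern = r"[?&]" + re.escape(name) + r"=([^&#]*)"
    match = re.search(pattern, url)
    if match is None:
        return None
    # Decode percent-escapes and '+' the way form-style query strings expect.
    return unquote_plus(match.group(1))


page_url = "http://example.com/page?id=42&ref=hn#top"
script_src = "/widget.js?key=a%20b&v=2"
print(get_query_var("id", page_url))     # value taken from the page URL
print(get_query_var("key", script_src))  # value taken from a script src
```

Requiring `?` or `&` right before the name keeps a lookup for `d` from matching inside `id=42`. It only returns the first occurrence, so repeated parameters would need a `re.findall` variant.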
Some of the non-/b/ areas of 4chan are surprising. Look at /jp/: there are people programming games and animations, people translating visual novels from Japanese. They even spawned a group project to make a visual novel from scratch: http://katawa-shoujo.com/
I think anonymous forums can work as long as they share a strong common interest.
practically speaking, most respectable email services will have an MX record. i tested it on about 30,000 emails & a simple MX lookup caught about 40 typo'd domains like gmial, with only 1 false positive.
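For anyone who wants to try the MX check, here is a minimal sketch, not the code used for the test above. The function names are hypothetical. The DNS query itself is passed in as `lookup_mx` so the logic can be run without a network; `dnspython_lookup` shows one way to do the real query, assuming the third-party dnspython package is installed.

```python
def email_domain(address):
    """Return the lowercased domain part of an address, or None if malformed."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return None
    return domain.lower()


def has_mx_record(address, lookup_mx):
    """True if the address's domain has at least one MX record.

    `lookup_mx` is any callable taking a domain and returning a list of
    mail exchangers (empty list when there are none).
    """
    domain = email_domain(address)
    if domain is None:
        return False
    return len(lookup_mx(domain)) > 0


def dnspython_lookup(domain):
    """Real MX lookup. Assumption: dnspython (third-party) is available."""
    import dns.resolver  # imported lazily so the rest runs without it

    try:
        answers = dns.resolver.resolve(domain, "MX")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return []
    return [str(record.exchange) for record in answers]


# Offline demo with a made-up lookup table instead of real DNS.
fake_dns = {"gmail.com": ["gmail-smtp-in.l.google.com."]}
offline_lookup = lambda domain: fake_dns.get(domain, [])
print(has_mx_record("bob@gmail.com", offline_lookup))
print(has_mx_record("bob@gmial.example", offline_lookup))
```

To check against live DNS, call `has_mx_record(address, dnspython_lookup)`. Timeouts are deliberately not caught here, so a flaky resolver won't silently mark good addresses as bad.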
that reminds me of the girl who takes advantage of the zappos free returns policy and gets 5 new pairs of shoes every single week only to mail them back a few days later. but zappos doesn't cut her off -- she is important evidence that they are serious about their policy & for zappos there are enough honest people out there to more than make up for her.