I find it frustrating that statisticians can't just give us something we can actually use, with confidence that it tells us what we think it tells us...
"Terrible" is a little harsh. I found the benchmark to be interesting because it's similar to many NLP-style analysis tasks I have myself. Of course reproducible ones would be great too.
This is a hard problem -- there's a bunch of research in NLP on it, where it's sometimes called temporal tagging. HeidelTime is a system that does this; there are some examples on their webpage: https://code.google.com/p/heideltime/
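To give a flavor of the task: here's a toy sketch of temporal expression spotting in plain Python. This is purely illustrative (my own regexes, not HeidelTime's rule machinery) -- real systems like HeidelTime use large curated rule sets and also *normalize* each expression to a TIMEX3 value, which this sketch doesn't attempt.

```python
import re

# Toy temporal tagger: finds a few simple date/time expression patterns.
# Real temporal taggers (e.g. HeidelTime) use far richer rules and also
# normalize matches to calendar values; this only locates spans.
TEMPORAL_PATTERN = re.compile(
    r"\b(\d{4}-\d{2}-\d{2}"                       # ISO dates: 2013-05-01
    r"|\d{1,2}/\d{1,2}/\d{2,4}"                   # slash dates: 5/1/2013
    r"|(?:last|next)\s+(?:week|month|year)"       # relative expressions
    r"|(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+\d{1,2})\b",
    re.IGNORECASE,
)

def tag_temporal(text):
    """Return (start, end, matched_text) spans for temporal expressions."""
    return [(m.start(), m.end(), m.group(0))
            for m in TEMPORAL_PATTERN.finditer(text)]

print(tag_temporal("We met last week, on May 5, before the 2013-05-01 deadline."))
```

Even this toy version shows why the problem is hard: relative expressions like "last week" still have to be anchored to the document's date, and ambiguous tokens ("May" the month vs. "may" the modal) need context that regexes alone can't supply.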
Yeah, that's the main PTB-style tagset resource for tweets (that I know of). Our tool can also be run to produce tags in their form (we just retrained our software on their annotated data). See further down the webpage for information on how to get this.
Part of it is genuine differences between online conversational language versus standard written English, like emoticons, Twitter-specific discourse markers, and hard-to-segment compounds or clitic constructions (see the Gimpel and Owoputi papers (2011, 2013) linked on the page, and/or the annotation guidelines document too). Part of it is just that it's easier for humans to annotate the coarse-grained POS tagset, and we didn't have many resources for annotation when we did it.
These things also intersect ... for example, you'd have to figure out how dialectal English verbal auxiliaries like "finna", or the second or so word in "imma", map to PTB tags. It's possible but just takes more work and thinking through the descriptive linguistics and what you want to use it for. Someday I'd like to update the whole thing for a more PTB-like POS tagset, if it can be done well. I feel like Chris Manning's whitepaper on issues in PTB POS data convinced us (well, it convinced me, at least) that it might be a good idea to focus on making high-quality tag annotations. (http://nlp.stanford.edu/pubs/CICLing2011-manning-tagging.pdf )
This is on a slightly different statistical methodology issue, but quoting from Brad Efron (http://statweb.stanford.edu/~ckirby/brad/papers/2005BayesFre...):
The physicists I talked with were really bothered by our 250 year old Bayesian-frequentist argument. Basically there’s only one way of doing physics but there seems to be at least two ways to do statistics, and they don’t always give the same answers.
This says something about the special nature of our field. Most scientists study some aspect of nature, rocks, stars, particles; we study scientists, or at least scientific data. Statistics is an information science, the first and most fully developed information science. Maybe it’s not surprising then that there is more than one way to think about an abstract subject like “information”.