
One major use of public datasets in the academic community is to serve as a common reference when comparing new techniques against the existing standard. A static baseline is desirable for that task.

You could maybe split the difference by having an "original" or "reference" version, and a separate moving target that incorporates crowdsourced improvements.



This sounds like a problem a versioning scheme would help a lot with. Have a quarterly or annual release cycle, or something similar, so that when you want to compare performance across techniques you just train both of them against the same release (and ideally all the papers coming out at roughly the same time would already be using the same revision anyway).

You'd always work with a versioned release when training models, and you'd typically only work with HEAD when you were specifically looking to correct flaws in the data (as the authors in the linked article are).
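As a rough sketch of that workflow with the Hugging Face datasets library (the dataset name and revision tags here are placeholders, assuming the data is hosted on the Hub with tagged releases):

    from datasets import load_dataset

    # Training / benchmarking: pin a tagged release so results stay comparable across papers.
    train_ds = load_dataset("example-org/example-dataset", split="train", revision="v2024.1")

    # Data cleaning: track the default branch (effectively HEAD) to work on the latest crowdsourced fixes.
    fixup_ds = load_dataset("example-org/example-dataset", split="train", revision="main")

Papers would then cite the revision tag they trained against, and the moving branch could be cut into the next tagged release on whatever cadence the maintainers pick.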



