
Isn't training on a subset of the data and then validating against the rest a common practice? It wouldn't detect all of the mislabeling, but it should detect some, indicating that manual inspection is required, assuming the errors aren't very systematic.
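A minimal sketch of that idea in plain Python, assuming a toy k-nearest-neighbour classifier as the model. The function names `knn_predict` and `flag_suspect_labels` and the default fold count are hypothetical, chosen only for illustration. The approach splits the data into folds, trains on all but one fold, predicts the held-out fold, and flags every example whose out-of-fold prediction disagrees with its given label as a candidate for manual inspection.

```python
import random
from collections import Counter


def knn_predict(train, x, k=3):
    """Predict a label for x by majority vote of its k nearest training points.

    train is a list of (features, label) pairs; features are numeric tuples.
    (Toy classifier, assumed here for illustration only.)
    """
    # Sort training points by squared Euclidean distance to x.
    ranked = sorted(train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def flag_suspect_labels(data, n_folds=5, k=3, seed=0):
    """Return sorted indices of examples whose out-of-fold prediction
    disagrees with their given label (hypothetical helper)."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into n_folds folds.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    suspects = []
    for fold in folds:
        held_out = set(fold)
        # The model for this fold never sees the examples it will score.
        train = [data[i] for i in indices if i not in held_out]
        for i in fold:
            features, label = data[i]
            if knn_predict(train, features, k) != label:
                suspects.append(i)
    return sorted(suspects)
```

For example, with two well-separated clusters where one point in the "a" cluster has been labeled "b", only that point gets flagged:

```python
data = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((1, 1), "a"), ((0.5, 0.5), "b"),
        ((10, 10), "b"), ((10, 11), "b"), ((11, 10), "b"), ((11, 11), "b"), ((10.5, 10.5), "b")]
print(flag_suspect_labels(data))  # [4]
```

If the mislabeling is systematic (say, a whole region of the data consistently carries the wrong label), the model simply learns the wrong mapping and its out-of-fold predictions agree with the bad labels, so this check stays silent. That is the caveat in the comment above.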


It is, and some interesting techniques have been published recently to help mitigate things like this. But if you don't have good ground truth, you're at best flying blind and at worst feeding garbage in and getting garbage out; your models will learn what you tell them to learn.



