
This is a good idea, and there are actually 2 objectives when one wants to clean a dataset:

- you might want to optimize your time and correct as many errors as you can, as fast as you can. Using several models will help you in that case, and that's actually what we've been focusing on so far.

- you might want to find the most ambiguous cases, where you really need to improve your models, as those edge cases are the ones causing the problems you have in production.

Those 2 objectives are quite opposite. In the first case, you want to find the "easiest" errors, while in the other one, you want to focus on edge cases, and you then probably need to look at errors with intermediate scores, where nothing is really sure.
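A minimal sketch of the two strategies, assuming a binary classifier's predicted probabilities and (possibly noisy) dataset labels; the thresholds (0.9 / 0.1 for confident disagreement, 0.4–0.6 for ambiguity) are illustrative choices, not fixed values:

```python
import numpy as np

# Hypothetical data: probs[i] is a model's P(label=1) for example i,
# labels[i] is the (possibly noisy) label stored in the dataset.
probs = np.array([0.98, 0.03, 0.55, 0.45, 0.91, 0.10])
labels = np.array([0, 1, 1, 0, 1, 0])

# Objective 1: the "easiest" errors -- the model is very confident
# and disagrees with the stored label. Cheap wins for fast cleaning.
confident_errors = np.where(
    ((probs > 0.9) & (labels == 0)) | ((probs < 0.1) & (labels == 1))
)[0]

# Objective 2: ambiguous edge cases -- intermediate scores where
# nothing is really sure; these are the ones worth a closer look.
ambiguous = np.where((probs > 0.4) & (probs < 0.6))[0]

print(confident_errors.tolist())  # examples 0 and 1: likely label errors
print(ambiguous.tolist())         # examples 2 and 3: model is unsure
```

With several models, the same split can be made on their disagreement: examples where all models confidently contradict the label go to the first bucket, examples where the models split go to the second.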
