
I was expecting this to be more about running inference in production, though the information in the article itself was interesting on its own.

There does seem to be a dearth of writing on the actual topic of deploying models as prediction APIs, however. I work on an open source ML deployment platform ( https://github.com/cortexlabs/cortex ), and the problems we spend the most time on, and that teams struggle with the most, don't seem to be written about very often, at least not in depth (e.g. how do you optimize inference costs? when should you use batch vs. realtime? how do you integrate retraining, validation, and deployment into a CI/CD pipeline for your ML service?).

Not taking away from the article, of course; it is well written and interesting imo.



There seems to be the idea that training an ML model is like compiling code, but every "compile" leaks information into the training pipeline. Repeated testing and choosing (unless it is on a fresh draw) is an optimization step: you are optimizing on the test set.
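
To make that concrete, here is a toy sketch (nothing from the article; all the numbers are made up): fifty candidate "models" that are all pure coin flips, with the winner chosen on the same reused test set. The winner's score looks well above chance, but on a fresh draw it falls back to roughly 0.5.

  # Toy illustration: every "model" is a coin flip (true accuracy 0.5), yet
  # picking the best score on a reused test set makes the winner look skilled.
  import numpy as np

  rng = np.random.default_rng(0)
  n_test, n_candidates = 200, 50

  y_test = rng.integers(0, 2, n_test)    # the test set you keep reusing
  y_fresh = rng.integers(0, 2, n_test)   # a fresh draw, never used for selection

  preds = [rng.integers(0, 2, n_test) for _ in range(n_candidates)]
  scores = [np.mean(p == y_test) for p in preds]

  best = int(np.argmax(scores))          # "choosing" is itself an optimization step
  print("winner on the reused test set:", scores[best])
  print("same winner on a fresh draw:  ", np.mean(preds[best] == y_fresh))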

Using a fresh draw is difficult and expensive, especially since the labels may not be available. A/B testing is expensive; multi-armed bandits are more efficient, but again there is an optimisation element there (waits for the shouting to start).
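
For what it's worth, a minimal epsilon-greedy sketch of the bandit idea (the variant names and reward rates are invented): traffic shifts toward whichever model variant is observed to perform better instead of being split 50/50 for the whole experiment, which is where the efficiency comes from, and also where the optimisation element sneaks in.

  # Hypothetical epsilon-greedy routing between two deployed model variants.
  import random

  class EpsilonGreedyRouter:
      def __init__(self, variants, epsilon=0.1):
          self.variants = list(variants)
          self.epsilon = epsilon
          self.counts = {v: 0 for v in self.variants}
          self.values = {v: 0.0 for v in self.variants}  # running mean reward per variant

      def choose(self):
          if random.random() < self.epsilon:             # explore occasionally
              return random.choice(self.variants)
          return max(self.variants, key=lambda v: self.values[v])  # exploit the best so far

      def record(self, variant, reward):
          self.counts[variant] += 1
          self.values[variant] += (reward - self.values[variant]) / self.counts[variant]

  router = EpsilonGreedyRouter(["model_a", "model_b"])
  for _ in range(1000):
      v = router.choose()
      # pretend model_b converts slightly better; both rates are made up
      reward = 1 if random.random() < (0.12 if v == "model_b" else 0.10) else 0
      router.record(v, reward)
  print(router.counts, router.values)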

Additionally, surely there is a really significant qualitative judgement step for any model that is going to be used to make real-world decisions?


You don’t typically perform optimizations iteratively with feedback from the final test set. Instead you split your training data into training and validation sets and iterate on those, leaving your true hold-out test set completely unexamined all along.

You would do model comparisons, quality checks, ablation studies, goodness of fit tests and so forth only using the training & validation portions.

Finally, you test the chosen models (in their fully optimized states) on the test set. If performance is not sufficient to solve the problem, then you do not deploy that solution. If you want to continue the work, you must now collect enough data to constitute, at minimum, an entirely new test set.
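
A minimal sklearn sketch of that protocol (the dataset and the two candidate models are just placeholders): everything iterative happens on train + validation, and the hold-out test set is scored exactly once at the end.

  # Placeholder sketch: iterate on train/validation, touch the test set once.
  from sklearn.datasets import load_breast_cancer
  from sklearn.ensemble import RandomForestClassifier
  from sklearn.linear_model import LogisticRegression
  from sklearn.model_selection import train_test_split

  X, y = load_breast_cancer(return_X_y=True)

  # 60% train, 20% validation, 20% hold-out test
  X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
  X_train, X_val, y_train, y_val = train_test_split(X_dev, y_dev, test_size=0.25, random_state=0)

  candidates = {
      "logreg": LogisticRegression(max_iter=5000),
      "forest": RandomForestClassifier(random_state=0),
  }

  # Model comparison / selection uses only the train and validation portions.
  val_scores = {name: m.fit(X_train, y_train).score(X_val, y_val)
                for name, m in candidates.items()}
  best = max(val_scores, key=val_scores.get)

  # One final, unrepeated evaluation on the untouched test set.
  print(best, val_scores, candidates[best].score(X_test, y_test))

(In practice you'd often refit the winner on train + validation before that last step, but the key constraint is the same: the test set is consulted once.)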


I agree with the process you describe, but the traps are things like running a beauty contest (Kaggle?) of n models against the final test set...



