My guess, which could be completely wrong, is that Anthropic spent more resources on interpretability and it's paying off.

I remember when I first started using activation maps while building image classification models, and my reaction was: what on earth was I doing before this... just blindly trusting the loss.
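For anyone who hasn't tried it, here's a rough sketch of one common way to get those activation maps (Grad-CAM style). Everything here is an assumption on my part, not something from the original setup: PyTorch/torchvision, a pretrained ResNet-18, and hooking its "layer4" block as the last conv stage.

    # Minimal Grad-CAM-style sketch (assumes PyTorch + torchvision,
    # ResNet-18, and that "layer4" is the last convolutional block).
    import torch
    import torch.nn.functional as F
    from torchvision import models

    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

    activations, gradients = {}, {}

    def fwd_hook(module, inp, out):
        # Save the feature maps produced on the forward pass.
        activations["value"] = out.detach()

    def bwd_hook(module, grad_in, grad_out):
        # Save the gradient of the score w.r.t. those feature maps.
        gradients["value"] = grad_out[0].detach()

    model.layer4.register_forward_hook(fwd_hook)
    model.layer4.register_full_backward_hook(bwd_hook)

    def activation_map(image):
        # image: (1, 3, H, W), already normalized for the model.
        logits = model(image)
        top_class = logits.argmax(dim=1).item()
        model.zero_grad()
        logits[0, top_class].backward()

        acts = activations["value"]                       # (1, C, h, w)
        grads = gradients["value"]                        # (1, C, h, w)
        weights = grads.mean(dim=(2, 3), keepdim=True)    # per-channel importance
        cam = F.relu((weights * acts).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=image.shape[2:],
                            mode="bilinear", align_corners=False)
        # Normalize to [0, 1] so it can be overlaid on the input image.
        return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

Overlaying that heatmap on the input is usually enough to catch the classic failure mode where the model is keying on the background or a watermark instead of the object.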

How do you discover biases and issues with training data without interpretability?



