
That's a very low learning rate, two to three orders of magnitude lower than what I've seen for that number of steps. I'll have to give it a try.


I should have been clearer: I'm using the Prodigy settings on that page, not the Adafactor ones. You set the learning rate to 1 and the scheduler to cosine, but the optimizer works out the real learning rate itself.
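To make the "learning rate of 1" part less mysterious, here is a rough sketch of how I understand it: Prodigy keeps its own running step-size estimate, and the learning rate you pass in (along with whatever the cosine scheduler does to it) only scales that estimate. The function names and the example estimate of 1e-4 are made up for illustration; this is not Prodigy's actual code, just the arithmetic.

```python
import math


def cosine_factor(step: int, total_steps: int) -> float:
    """Cosine decay multiplier: 1.0 at step 0, falling to 0.0 at total_steps."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


def effective_step_size(d_estimate: float, base_lr: float,
                        step: int, total_steps: int) -> float:
    """Hypothetical helper: the step actually applied to the weights.

    d_estimate stands in for the optimizer's internal step-size estimate
    (an assumed value here, not something read from a real run).
    base_lr is the user-facing learning rate, set to 1.0 as described above.
    """
    return d_estimate * base_lr * cosine_factor(step, total_steps)


if __name__ == "__main__":
    total = 1000
    for step in (0, 250, 500, 750, 1000):
        size = effective_step_size(1e-4, 1.0, step, total)
        print(f"step {step:4d}: effective step size {size:.2e}")
```

With the base rate left at 1, the numbers printed are just the (assumed) estimate shaped by the cosine curve, which is why the configured value looks nothing like a normal hand-tuned learning rate.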



