
In general this is of course an active area of research, but yes, you can do something like that, and people have done it successfully[1], by adding extra layers to an existing model and then continuing to train it.
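
For example, here's a minimal PyTorch sketch of the idea (assuming the model exposes its transformer blocks as an nn.ModuleList named `blocks`; that attribute name is hypothetical, and the papers use more careful schemes for choosing which layers to copy):

    import copy
    import torch.nn as nn

    def grow_depth(model: nn.Module, n_new: int) -> nn.Module:
        # Duplicate the top n_new blocks and append them, so the added
        # layers start from pretrained weights rather than random init;
        # then continue training the grown model as usual.
        for block in list(model.blocks)[-n_new:]:
            model.blocks.append(copy.deepcopy(block))
        return model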

You have to be careful about the "same data" part, though: ideally you want to train only once on unique data[2], since excessive duplication can harm the model's performance[3]. That said, if you have limited data, a couple of training epochs can be safe and may actually improve the model[4]. (A minimal dedup sketch follows the references.)

[1] -- https://arxiv.org/abs/2312.15166

[2] -- https://arxiv.org/abs/1906.06669

[3] -- https://arxiv.org/abs/2205.10487

[4] -- https://galactica.org/static/paper.pdf
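
To make the "unique data" point concrete, here is a sketch of exact-match deduplication by hashing; note that serious pipelines also do fuzzy/near-duplicate matching, which this does not cover:

    import hashlib

    def dedupe(docs):
        # Drop exact duplicates, keeping first occurrences; hashing the
        # normalized text avoids holding two copies of each document.
        seen, unique = set(), []
        for doc in docs:
            digest = hashlib.sha256(doc.strip().encode("utf-8")).digest()
            if digest not in seen:
                seen.add(digest)
                unique.append(doc)
        return unique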



In addition to increasing the number of layers, you can also grow the weight matrices themselves, initializing the larger matrices by tiling them with the smaller model's weights: https://neurips.cc/media/neurips-2023/Slides/83968_5GxuY2z.p...
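
Roughly, the tiling initialization looks like this (a sketch of my reading of the slides, not the authors' exact recipe):

    import torch

    def tile_init(w_small: torch.Tensor, d_out: int, d_in: int) -> torch.Tensor:
        # Fill a (d_out, d_in) matrix by repeating the small pretrained
        # matrix along both axes, then crop to the target shape.
        reps_out = -(-d_out // w_small.shape[0])  # ceil division
        reps_in = -(-d_in // w_small.shape[1])
        return w_small.repeat(reps_out, reps_in)[:d_out, :d_in]

    # e.g. grow a 512x512 projection to 1024x1024:
    w_big = tile_init(torch.randn(512, 512), 1024, 1024)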


Thank you for taking the time to provide me with all this reading.


This might be obvious, but just to state it explicitly for everyone: if you want to train the new layers while leaving the existing ones untouched, you can freeze the weights of the existing layers.
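
In PyTorch that's just a matter of flipping requires_grad; a sketch, assuming the added layers live under a hypothetical `model.new_blocks`:

    import torch

    # `model` is the grown network; freeze everything first,
    # then unfreeze only the newly added layers.
    for p in model.parameters():
        p.requires_grad = False
    for p in model.new_blocks.parameters():
        p.requires_grad = True

    # Hand only the trainable parameters to the optimizer.
    optimizer = torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad), lr=1e-4)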



