
This is a fantastic breakthrough for those of us who fine-tune LLMs on limited hardware budgets.

I was curious about the choice of FSDP over DeepSpeed. I've been using Axolotl for fine-tuning, and FSDP has been broken there, whilst DeepSpeed is rock solid. Why did you go with FSDP, jph00?




DeepSpeed has more features than FSDP, but it's much more complex to hack on -- FSDP is written directly in Python using calls to the PyTorch library, whereas DeepSpeed is 20% C++ and 10% CUDA (according to the GitHub stats).

We've found that FSDP works just as well for our needs, and we appreciated the increased "hackability".
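To give a sense of what that hackability looks like in practice, here's a minimal toy sketch of FSDP in plain PyTorch (assumes a torchrun launch with NCCL available; the model and data are placeholders, not our actual training code):

  # launch with: torchrun --nproc_per_node=2 train.py
  import torch
  import torch.distributed as dist
  from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

  dist.init_process_group("nccl")
  torch.cuda.set_device(dist.get_rank())

  model = torch.nn.Sequential(
      torch.nn.Linear(1024, 4096),
      torch.nn.ReLU(),
      torch.nn.Linear(4096, 1024),
  ).cuda()

  # One call shards parameters, gradients, and optimizer state
  # across ranks -- everything else is ordinary PyTorch.
  model = FSDP(model)

  opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
  x = torch.randn(8, 1024, device="cuda")
  loss = model(x).pow(2).mean()
  loss.backward()
  opt.step()

Because the whole thing stays in Python, you can step through it in a debugger or monkey-patch any piece of it, which is exactly what we needed.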

(Axolotl is terrific BTW. I hadn't heard of problems with it with FSDP before -- I'll see if that's something we can help with.)


Good news -- axolotl has just merged support for FSDP/QLoRA training, thanks to a rapid collaboration between the axolotl and Answer.AI teams!
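For anyone curious what the FSDP/QLoRA combination involves under the hood, here's a rough sketch of the 4-bit + LoRA model setup using transformers and peft. This is illustrative only, not axolotl's actual code; the model name and hyperparameters are placeholders:

  import torch
  from transformers import AutoModelForCausalLM, BitsAndBytesConfig
  from peft import LoraConfig, get_peft_model

  bnb_config = BitsAndBytesConfig(
      load_in_4bit=True,
      bnb_4bit_quant_type="nf4",
      bnb_4bit_compute_dtype=torch.bfloat16,
      # Storing the quantized weights in a regular float dtype is the
      # key trick that lets FSDP shard them across GPUs.
      bnb_4bit_quant_storage=torch.bfloat16,
  )

  model = AutoModelForCausalLM.from_pretrained(
      "mistralai/Mistral-7B-v0.1",
      quantization_config=bnb_config,
      torch_dtype=torch.bfloat16,
  )

  # Attach small trainable LoRA adapters; the 4-bit base stays frozen.
  peft_config = LoraConfig(
      r=16, lora_alpha=16, lora_dropout=0.05,
      target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
      task_type="CAUSAL_LM",
  )
  model = get_peft_model(model, peft_config)

The wrapped model can then be handed to FSDP as in any other PyTorch training loop; axolotl drives all of this from its YAML config.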


There's a long GitHub issues thread with Teknium struggling with Mistral 7B and loss spikes. It's easy to find by googling.


Yes, I'm familiar with Teknium's Mistral issues, which were resolved some time ago. IIRC they weren't related to FSDP.



