
DeepSpeed has more features than FSDP, but it's much more complex to hack on -- FSDP is written directly in Python using calls to the PyTorch library, whereas DeepSpeed is 20% C++ and 10% CUDA (according to the GitHub stats).

We've found that FSDP works just as well for our needs, and we appreciated the increased "hackability".

(Axolotl is terrific BTW. I hadn't heard of problems using it with FSDP before -- I'll see if that's something we can help with.)




Good news -- axolotl has just merged support for FSDP/QLoRA training, thanks to a rapid collaboration between the axolotl and Answer.AI teams!


There's a long GitHub issues thread with Teknium struggling with Mistral 7B and loss spikes. It's easy to find by googling.


Yes, I'm familiar with Teknium's Mistral issues, which were resolved some time ago. IIRC they weren't related to FSDP.



