This is a fantastic breakthrough for those of us who fine-tune LLMs on limited hardware budgets.
I was curious about the choice of FSDP over DeepSpeed. I've been using Axolotl for fine-tuning, and FSDP has been broken there, whilst DeepSpeed is rock solid. Why FSDP over DeepSpeed, jph00?
DeepSpeed has more features than FSDP, but it's much more complex to hack on -- FSDP is written directly in Python using calls to the PyTorch library, whereas DeepSpeed is 20% C++ and 10% CUDA (according to the GitHub stats).
We've found that FSDP works just as well for our needs, and we appreciated the increased "hackability".
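For anyone who hasn't tried it, here's a minimal sketch of what wrapping a model in FSDP typically looks like -- a toy model and a standard torchrun launch are assumed, so this isn't the actual training setup, just an illustration of the point that everything stays in plain Python/PyTorch:

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Assumes launch via torchrun, which sets RANK/WORLD_SIZE/LOCAL_RANK env vars.
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# A toy model stands in for a real LLM here.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024)).cuda()

# Wrapping in FSDP shards the parameters across ranks. The wrapper itself lives
# in torch/distributed/fsdp/ as ordinary Python, so it's easy to read and patch.
model = FSDP(model)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```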
(Axolotl is terrific BTW. I hadn't heard of problems with it with FSDP before -- I'll see if that's something we can help with.)