This is a fantastic breakthrough for those of us who fine-tune LLMs on limited hardware budgets.
I was curious about the choice of FSDP over DeepSpeed. I've been using Axolotl for fine-tuning, and FSDP has been broken there, whilst DeepSpeed is rock solid. Why FSDP over DeepSpeed, jph00?
DeepSpeed has more features than FSDP, but it's much more complex to hack on -- FSDP is written directly in Python using calls to the PyTorch library, whereas DeepSpeed is 20% C++ and 10% CUDA (according to the GitHub stats).
We've found that FSDP works just as well for our needs, and we appreciated the increased "hackability".
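For anyone who hasn't tried it, here's a minimal sketch of what wrapping a model in FSDP typically looks like -- a toy model and a standard torchrun launch are assumed, so this isn't the actual training setup, just an illustration of the point that everything stays in plain Python/PyTorch:

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Assumes launch via torchrun, which sets RANK/WORLD_SIZE/LOCAL_RANK env vars.
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# A toy model stands in for a real LLM here.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024)).cuda()

# Wrapping in FSDP shards the parameters across ranks. The wrapper itself lives
# in torch/distributed/fsdp/ as ordinary Python, so it's easy to read and patch.
model = FSDP(model)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```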
(Axolotl is terrific BTW. I hadn't heard of problems with it with FSDP before -- I'll see if that's something we can help with.)