DeepSpeed has more features than FSDP, but it's much more complex to hack on -- FSDP is written directly in python using calls to the PyTorch library, whereas DeepSpeed is 20% C++ and 10% CUDA (according to the GitHub stats).
We've found that FSDP works just as well for our needs, and we appreciated the increased "hackability".
(Axolotl is terrific BTW. I hadn't heard of problems with it with FSDP before -- I'll see if that's something we can help with.)
We've found that FSDP works just as well for our needs, and we appreciated the increased "hackability".
(Axolotl is terrific BTW. I hadn't heard of problems with it with FSDP before -- I'll see if that's something we can help with.)