This might be the most interesting constructive approach in "Open Source" LLMs I've seen. Grounded, reasonable, and inviting to replicate! I wish academia adopted that as a standard.
I doubt it; Jeremy's been walking the walk for quite a while now when it comes to opening up access to AI, especially with his excellent, free fast.ai course. It seems pretty clear that his primary motivation is helping others. (If you're in this thread, Jeremy: thanks for fast.ai, it helped me immensely in getting started with training models.)
For the most part this post was easy to read, and I could feel the collective excitement of the team. I came away feeling like I'd learned something and was ready to try it myself. The only place the post gets a little fuzzy is "...store the quantized parameters in a selectable data type, where that storage data type is the same data type as the 'computation type' of the model". I assume the "selectable data type" is the float size of the quantization?
We've got a technical post with all the juicy details coming next week. But that bit refers to packing the 4-bit weights into a type FSDP is happy to shard (like float16 or float32), matching the other non-quantized parts of the model. That way FSDP will happily wrap and shard all the parameters as if they were just normal floats.
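Roughly the flavour of it, as a toy PyTorch sketch (not our actual implementation — the module name, the simple absmax quantization, and the byte-packing layout here are just made up for illustration): the 4-bit codes get packed into a float16 parameter, so FSDP only ever sees a normal float tensor, and the layer unpacks and dequantizes at forward time.

```python
import torch
import torch.nn as nn

class QuantStorageLinear(nn.Module):
    """Toy sketch: keep 4-bit-quantized weights inside a float16 tensor so
    FSDP can wrap/shard it like any other parameter."""

    def __init__(self, weight: torch.Tensor, storage_dtype=torch.float16):
        super().__init__()
        out_f, in_f = weight.shape
        assert in_f % 4 == 0, "toy packing assumes in_features divisible by 4"
        # Toy absmax 4-bit quantization: one scale, codes 0..15.
        scale = weight.abs().max() / 7.0
        q = (torch.clamp(torch.round(weight / scale), -8, 7) + 8).to(torch.uint8)
        # Two 4-bit codes per byte, then reinterpret the raw bytes as float16.
        packed = (q[:, 0::2] << 4) | q[:, 1::2]
        self.weight = nn.Parameter(packed.view(storage_dtype), requires_grad=False)
        self.register_buffer("scale", scale)
        self.out_features, self.in_features = out_f, in_f

    def forward(self, x):
        # Undo the dtype view and the packing, then dequantize for the matmul.
        raw = self.weight.view(torch.uint8)
        q = torch.stack([raw >> 4, raw & 0x0F], dim=-1)
        q = q.reshape(self.out_features, self.in_features)
        w = (q.to(x.dtype) - 8) * self.scale.to(x.dtype)
        return x @ w.t()

# e.g.: layer = QuantStorageLinear(torch.randn(64, 128)); y = layer(torch.randn(2, 128))
```

The real quantization scheme is more sophisticated than this toy one; the point is just that the storage tensor's dtype matches the compute dtype, so FSDP shards it like everything else.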
Great job!