Last time we only released the quantized GGUFs, so only llama.cpp users could use them (plus Ollama, but without vision support).
Now we've released the unquantized checkpoints, so anyone can quantize the model themselves and use it in their favorite tools, including Ollama with vision, MLX, LM Studio, etc. The MLX folks also found that the model held up decently at 3 bits compared to a naive 3-bit quantization, so releasing the unquantized checkpoints enables further experimentation and research.
TL;DR: the first was a release in a specific format/tool; we followed up with a full release of artifacts that lets the community do much more.
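For anyone who wants to try this on the text weights, here's a minimal sketch of quantizing a full-precision checkpoint to 3 bits via mlx-lm's Python convert API; the repo id is a placeholder and the flag choices are my assumptions, not anything from the release notes:

```python
# Hypothetical sketch: quantize a released full-precision checkpoint to
# 3-bit MLX format, assuming mlx-lm's convert() API (pip install mlx-lm).
# "org/model-name" is a placeholder, not the actual repo id.
from mlx_lm import convert

convert(
    hf_path="org/model-name",   # unquantized checkpoint on the Hub (placeholder)
    mlx_path="model-3bit-mlx",  # where to write the quantized weights
    quantize=True,
    q_bits=3,                   # the 3-bit setting discussed above
    q_group_size=64,            # quantization group size; smaller often = better quality
)
```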
Nat Friedman leads the project. He was GitHub's CEO, among many other things, and he funds many interesting, ambitious projects, such as the Vesuvius Challenge (https://scrollprize.org/).
- Encoder-based models have much faster inference (they are not auto-regressive) and are smaller. They are great for applications where speed and efficiency are key.
- Most embedding models are BERT-based (see the MTEB leaderboard), so they are widely used for retrieval; see the first sketch after this list.
- They are also used to filter data for pre-training decoder models. The Llama 3 authors used a quality classifier (DistilRoBERTa) to generate quality scores for documents. Something similar is done for FineWeb-Edu; see the second sketch below.
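For the retrieval point, a minimal sketch using sentence-transformers with one common BERT-family model from the MTEB leaderboard (the model choice and example texts are mine, not from the thread):

```python
# Minimal retrieval sketch with a BERT-family embedding model.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = ["Encoders are fast.", "Decoders generate text.", "BERT uses masked tokens."]
doc_emb = model.encode(docs, convert_to_tensor=True)

query_emb = model.encode("Which models generate text?", convert_to_tensor=True)
scores = util.cos_sim(query_emb, doc_emb)  # cosine similarity, shape (1, len(docs))
print(docs[int(scores.argmax())])          # best-matching document
```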
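And for the filtering point, a hedged sketch of FineWeb-Edu-style quality scoring. I'm assuming the public HuggingFaceFW/fineweb-edu-classifier checkpoint and its single-logit regression head (a score of roughly 0-5), per my reading of its model card; the threshold is illustrative:

```python
# Hedged sketch of FineWeb-Edu-style quality filtering (not Llama 3's exact setup).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "HuggingFaceFW/fineweb-edu-classifier"  # assumed public checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

text = "Photosynthesis converts light energy into chemical energy in plants."
inputs = tok(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    # Regression head: a single logit interpreted as an educational-quality score.
    score = model(**inputs).logits.squeeze().item()

keep = score >= 3  # example threshold; documents above it go into the pre-training mix
```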
Wait, I thought GPTs were autoregressive and encoder-only models like BERT used masked tokens? You're saying BERT is auto-regressive, or am I misunderstanding?
You're right. Encoder-only models like BERT aren't auto-regressive and are trained with the MLM objective. Decoder-only (GPT) and encoder-decoder (T5) models are auto-regressive and are trained with the CLM and sometimes the PrefixLM objectives.
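A toy way to see the two objectives side by side with stock Hugging Face pipelines (the model choices are just illustrative):

```python
# MLM vs. CLM in two lines each.
from transformers import pipeline

# MLM: the encoder sees the whole sentence and predicts the [MASK] slot.
fill = pipeline("fill-mask", model="bert-base-uncased")
print(fill("Encoder models like BERT are trained with [MASK] language modeling.")[0]["token_str"])

# CLM: the decoder predicts the next token from the left-hand prefix only.
gen = pipeline("text-generation", model="gpt2")
print(gen("Decoder models like GPT are trained to", max_new_tokens=10)[0]["generated_text"])
```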
tl;dr: the old benchmarks saturated, and the methodology was liable to a lot of subtle biases. As she mentions on the pod, they're already working on leaderboard v3.
I like that they say the model was trained for 1.3 hours on 4 nodes of 8 x H100s. By my rough calculation, that should have cost around $80-$100 (1.3 hours × 4 nodes × 8 GPUs × ~$2 per GPU-hour ≈ $83). Not free, but pretty cheap in the scheme of things. At least, once you know what you're doing.
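Spelling out the arithmetic (the $2/GPU-hour rate is my assumption, as above):

```python
# Back-of-envelope check of the training-cost estimate.
hours, nodes, gpus_per_node, usd_per_gpu_hour = 1.3, 4, 8, 2.0
cost = hours * nodes * gpus_per_node * usd_per_gpu_hour
print(f"${cost:.2f}")  # $83.20 at these assumed rates
```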