> GGUF is a new format introduced by the llama.cpp team on August 21st, 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. GGUF offers numerous advantages over GGML, such as better tokenisation and support for special tokens. It also supports metadata, and is designed to be extensible.
is there a more canonical blogpost or link to learn more about the technical decisions here?
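AFAIK the closest thing to a canonical writeup is the GGUF spec in the ggml repo (docs/gguf.md), which documents the layout and the reasoning behind the metadata key/value design. The format is also simple enough to poke at directly. Here's a minimal sketch that reads the fixed header fields, assuming the v2/v3 layout (magic, version, tensor count, KV count); the filename is just a placeholder:

```python
# Minimal sketch: parse the fixed GGUF header (assumes GGUF v2/v3,
# where the two counts are uint64; v1 used uint32 for both).
import struct

def read_gguf_header(path):
    with open(path, "rb") as f:
        magic = f.read(4)
        assert magic == b"GGUF", "not a GGUF file"
        # All header integers are little-endian.
        version, = struct.unpack("<I", f.read(4))
        tensor_count, = struct.unpack("<Q", f.read(8))
        kv_count, = struct.unpack("<Q", f.read(8))
        return version, tensor_count, kv_count

# Hypothetical filename, just for illustration.
version, tensors, kvs = read_gguf_header("llama-2-70b.Q4_K_M.gguf")
print(f"GGUF v{version}: {tensors} tensors, {kvs} metadata key/value pairs")
```

After the header come the metadata key/value pairs themselves (typed, string-keyed), which is what makes the format extensible: new keys can be added without breaking older readers.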
I just specified GGUF because my 3090 can't host a 70B model without offloading, outside of exLlama's very new ~2-bit quantization. And a pre-quantized GGUF is a much smaller download than the raw fp16 weights you'd need for conversion.
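For scale (rough numbers, assuming ~2 bytes/param for fp16 and ~4.85 bits/param for Q4_K_M): a 70B model is about 140 GB in fp16, versus roughly 40 GB as a 4-bit GGUF. Either way it's well over a 3090's 24 GB of VRAM, hence the offloading.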