> GGUF is a new format introduced by the llama.cpp team on August 21st, 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. GGUF offers numerous advantages over GGML, such as better tokenisation and support for special tokens. It also supports metadata, and is designed to be extensible.
is there a more canonical blogpost or link to learn more about the technical decisions here?
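AFAIK the closest thing to a canonical writeup is the GGUF spec in the ggml repo (docs/gguf.md), which documents the layout and the reasoning behind the metadata key/value design. The format is also simple enough to poke at directly. Here's a minimal sketch that reads the fixed header fields, assuming the v2/v3 layout (magic, version, tensor count, KV count); the filename is just a placeholder:

```python
# Minimal sketch: parse the fixed GGUF header (assumes GGUF v2/v3,
# where the two counts are uint64; v1 used uint32 for both).
import struct

def read_gguf_header(path):
    with open(path, "rb") as f:
        magic = f.read(4)
        assert magic == b"GGUF", "not a GGUF file"
        # All header integers are little-endian.
        version, = struct.unpack("<I", f.read(4))
        tensor_count, = struct.unpack("<Q", f.read(8))
        kv_count, = struct.unpack("<Q", f.read(8))
        return version, tensor_count, kv_count

# Hypothetical filename, just for illustration.
version, tensors, kvs = read_gguf_header("llama-2-70b.Q4_K_M.gguf")
print(f"GGUF v{version}: {tensors} tensors, {kvs} metadata key/value pairs")
```

After the header come the metadata key/value pairs themselves (typed, string-keyed), which is what makes the format extensible: new keys can be added without breaking older readers.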
I just specified GGUF because my 3090 can't host a 70B model without offloading, outside of exLlama's very new ~2-bit quantization. And a pre-quantized GGUF is a much smaller download than the raw fp16 weights you'd need for conversion.
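For scale (rough numbers, assuming ~2 bytes/param for fp16 and ~4.85 bits/param for Q4_K_M): a 70B model is about 140 GB in fp16, versus roughly 40 GB as a 4-bit GGUF. Either way it's well over a 3090's 24 GB of VRAM, hence the offloading.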