
The Llama name is pretty confusing at this point.

LLaMA was the model Facebook released under a non-commercial license back in February which was the first really capable openly available model. It drove a huge wave of research, and various projects were named after it (llama.cpp for example).

Llama 2 came out in July and allowed commercial usage.

But... there is an increasing number of models now that aren't actually related to Llama at all. Projects like llama.cpp and Ollama can often be used to run those too.

So "Llama" no longer reliably means "related to Facebook's LLaMA architecture".




- GPTQ: pure GPU inference, used with AutoGPTQ, exllama, exllamav2; offers only 4-bit quantization
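For what "4-bit quantization" means in practice, here is a minimal illustrative sketch: it round-trips a row of weights through naive 4-bit integer quantization with a per-row scale and zero point. GPTQ's real algorithm is more sophisticated (it compensates quantization error column by column), and the function names and values below are invented for illustration only.

```python
# Illustrative sketch, NOT the GPTQ algorithm: shows how floats can be
# stored as 4-bit integers (16 levels) plus a per-row scale and zero point.

def quantize_4bit(weights):
    """Map floats to integers in [0, 15] plus a scale and zero point."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 15 or 1.0          # 4 bits -> 16 representable levels
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo

def dequantize_4bit(q, scale, lo):
    """Recover approximate floats from the stored 4-bit integers."""
    return [i * scale + lo for i in q]

row = [-0.52, 0.13, 0.98, -0.07, 0.33]
q, scale, zero = quantize_4bit(row)
approx = dequantize_4bit(q, scale, zero)

# Every quantized value fits in 4 bits...
assert all(0 <= i <= 15 for i in q)
# ...and the reconstruction error is bounded by half a quantization step.
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(row, approx))
```

The point of formats like GPTQ is that storing each weight in 4 bits instead of 16 cuts VRAM usage roughly 4x, at the cost of the small per-weight error shown above.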

What are AutoGPTQ and exllama, and what does it mean that it only works with AutoGPTQ and exllama? Are those like TensorFlow frameworks?



