
FYI, I work at Vectara and can answer any questions.

For us, we measure hallucination by how accurately a model responds in an "open book" setting, specifically for retrieval augmented generation (RAG) applications. That is, given the set of retrieved information (X), does the LLM-produced summary:

1. Include any "real" information not contained in X? If "yes," it's a hallucination, even if that information is general knowledge. We see this as an important way to classify hallucinations in a RAG+summary context because enterprises have told us they don't want the LLMs "reading between the lines" to infer things. To pick an absurd/extreme case to make the point: say a genetic research firm using CRISPR finds it can create a purple zebra. If the retrieval system in the RAG pipeline returns "zebras can be purple" based on that latest research, we don't want the LLM to override it with its own knowledge that zebras are only ever black/white/brown. We'd treat that override as a hallucination.

2. On the opposite extreme, an easy way to avoid hallucinating would be for the LLM to answer "I don't know" to everything, dodging hallucination by never answering at all. That has other obvious negative effects, so we also evaluate each LLM's answer rate.

We look at the factual consistency, answer rate, summary length, and some other metrics internally to focus prompt engineering, model selection, and model training: https://github.com/vectara/hallucination-leaderboard
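For a concrete picture of how those metrics get computed per example, here's a minimal sketch: score each (source, summary) pair for factual consistency and record whether the model answered at all. Loading the scorer as a sentence-transformers CrossEncoder and the simple refusal check are illustrative assumptions; the leaderboard repo above documents the actual evaluation setup.

    # Sketch: per-example answer rate, summary length, and factual consistency.
    # Assumes the consistency scorer loads as a CrossEncoder; not the exact pipeline.
    from sentence_transformers import CrossEncoder

    scorer = CrossEncoder("vectara/hallucination_evaluation_model")

    def evaluate_summary(source: str, summary: str) -> dict:
        # Crude refusal check; a real pipeline would detect refusals more robustly.
        answered = "i don't know" not in summary.lower()
        result = {"answered": answered, "summary_words": len(summary.split())}
        if answered:
            # Score near 1.0 -> summary is supported by the source;
            # lower scores indicate unsupported ("hallucinated") content.
            result["factual_consistency"] = float(scorer.predict([[source, summary]])[0])
        return result

    print(evaluate_summary(
        source="Zebras can be purple, per the firm's latest CRISPR research.",
        summary="The research shows that zebras can be purple.",
    ))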



Great repo, glad y'all are looking into this. So am I reading correctly that Intel has a 7B model that does remarkably well at not hallucinating?


That's correct. We've got a blog post that talks a bit about it: https://vectara.com/blog/do-smaller-models-hallucinate-more/

Some people are surprised that smaller models can outperform bigger ones, but it's something we've been able to exploit: if you fine-tune a small model for a specific task (e.g., reducing hallucinations on a summarization task), as Intel has done, you can achieve great performance economically.
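As a rough illustration of that recipe (not Intel's or our actual training setup; the base model, dataset, and hyperparameters below are placeholders), a task-specific fine-tune of a small summarizer looks roughly like:

    # Illustrative fine-tune of a small seq2seq model on (document, summary) pairs.
    # Model, dataset, and hyperparameters are placeholders, not the setup Intel used.
    from datasets import load_dataset
    from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                              DataCollatorForSeq2Seq, Seq2SeqTrainer,
                              Seq2SeqTrainingArguments)

    base = "google/flan-t5-small"   # small model: cheap to tune and serve
    tok = AutoTokenizer.from_pretrained(base)
    model = AutoModelForSeq2SeqLM.from_pretrained(base)

    data = load_dataset("cnn_dailymail", "3.0.0", split="train[:1%]")

    def preprocess(batch):
        enc = tok(batch["article"], truncation=True, max_length=1024)
        enc["labels"] = tok(text_target=batch["highlights"], truncation=True,
                            max_length=128)["input_ids"]
        return enc

    train = data.map(preprocess, batched=True, remove_columns=data.column_names)

    trainer = Seq2SeqTrainer(
        model=model,
        args=Seq2SeqTrainingArguments("small-summarizer",
                                      per_device_train_batch_size=8,
                                      num_train_epochs=1,
                                      learning_rate=3e-5),
        train_dataset=train,
        data_collator=DataCollatorForSeq2Seq(tok, model=model),
    )
    trainer.train()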



