Interesting. The current DB dump is not for everyone, but if they also offer an LLM trained on Wikipedia data that answers questions and provides actual valid citations, please do. (Not sure if DuckDuckGo stopped offering that.)
AFAIK a good way to provide better answers and avoid hallucinations would be to compute embeddings for all sections of text in Wikipedia, and then, when a user asks a question, create an embedding from that question.
Use it to find the X closest embeddings to the question being posed, look up their original articles, feed them all into the context of an LLM, and then ask it to answer the question based on that context (alone).
Contexts are becoming quite large, so it's possible to put a lot of stuff in there. LLMs answering questions based on a given text seem to be more reliable than those that are simply trained/fine-tuned on some library of texts.
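A minimal sketch of what that pipeline might look like, assuming the sentence-transformers library for the embeddings; the model name is just an example and ask_llm() is a hypothetical placeholder for whatever LLM API you'd actually call:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# 1) Offline: embed every section of text once, keeping the index -> text mapping
sections = ["...section text...", "...another section..."]  # e.g. Wikipedia sections
section_vecs = model.encode(sections, normalize_embeddings=True)

def answer(question: str, top_k: int = 5) -> str:
    # 2) Embed the question and find the X closest sections
    #    (cosine similarity is just a dot product on normalized vectors)
    q_vec = model.encode([question], normalize_embeddings=True)[0]
    scores = section_vecs @ q_vec
    best = np.argsort(scores)[::-1][:top_k]

    # 3) Feed the retrieved text into the LLM's context and
    #    constrain the answer to that context alone
    context = "\n\n".join(sections[i] for i in best)
    prompt = (
        "Answer the question using ONLY the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return ask_llm(prompt)  # hypothetical LLM call
```

At Wikipedia scale you'd swap the brute-force dot product for a vector index, but the shape of the approach is the same.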
The approach described above is what is commonly referred to as RAG[0]. I am not aware of anyone having used it on Wikipedia, but from experience, while it helps, it does not eliminate all hallucinations.
I attempted it as well, about a year ago (mostly for fun), for our project.
Yes, it can still hallucinate. But I would say it's much much much better in this regard than fine-tuning.
When I did it, the main issue was that our documentation wasn't exhaustive enough. There are plenty of things that are clear to our users (other teams in the company) but not at all clear to the LLM from the few text excerpts it receives. Also, our context back then was quite limited, just a few paragraphs of text.
You can do this with the Copilot chat feature in MS Edge. I just tried asking it to use only Wikipedia and it gave me four references, two of which were wiki. So at least you can get it to spit out references with a bias.