Isn’t it basically traditional search (keyword based, vector based, or a combination of both; embeddings have been around for years), where you take the top N results (usually not even full docs, but chunks, due to context length limitations) and pass them to an LLM to regurgitate a response (hopefully without hallucinations), instead of simply listing the results right away? I think some implementations also ask the LLM to rewrite the user query to “capture the user intent”. What am I missing here? What makes it so useful?
One example is in finance: you have a lot of 45-page PDFs lying around and you're pretty sure one of them has the reg, or the info, you need. You aren't sure which, so you open them one by one and search for a word, then jump through a bunch of those results and decide it's not this PDF. You do that till you find the "one". There is a non-trivial number of executive-level jobs that pretty much do this for half of their work week.
This is true for traditional full-text document search as well.
When most people mention RAG, they’re using a vector store to surface results that are semantically similar to the user’s query (the retrieval part). They then pass these results to an LLM for summary (the generation part).
In practice, the problems with RAG are similar to the traditional problems of search: indices, latency, and correctness.
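For concreteness, here's a minimal sketch of that retrieve-then-generate loop. It assumes sentence-transformers for the embeddings; `call_llm` is a placeholder for whatever completion API you use, and the chunking is left to you:

```python
# Minimal RAG sketch: embed chunks, retrieve by cosine similarity, generate.
import numpy as np
from sentence_transformers import SentenceTransformer  # any embedding model works

model = SentenceTransformer("all-MiniLM-L6-v2")

def build_index(chunks):
    """Embed every chunk once, up front."""
    return np.asarray(model.encode(chunks, normalize_embeddings=True))

def retrieve(query, chunks, vecs, k=5):
    """Top-k chunks by cosine similarity (vectors are already normalized)."""
    q = model.encode([query], normalize_embeddings=True)[0]
    top = np.argsort(-(vecs @ q))[:k]
    return [chunks[i] for i in top]

def answer(query, chunks, vecs, call_llm):
    """The 'generation' part: stuff the retrieved chunks into the prompt."""
    context = "\n\n".join(retrieve(query, chunks, vecs))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return call_llm(prompt)  # placeholder: your LLM call of choice
```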
Doesn't vector search solve a lot of these problems? These AI vector spaces seem like a really easy win here, and they're reasonably lightweight compared to a full LLM.
* Latency
I don't want to call this a solved problem, but it is one that scales horizontally very easily and that a lot of existing tech can take advantage of.
* Correctness
The LLM tooling doesn't necessarily need to make things worse here, although if poorly designed it definitely could. AI can do a first pass at fact checking, even though I suspect we'll need humans in the loop for a long while.
---
I think that vector spaces at least bring some big advantages for indexing here, making it possible to search for more abstract concepts.
> Doesn't vector search solve a lot of these problems? These AI vector spaces seem like a really easy win here, and they're reasonably lightweight compared to a full LLM.
Yes and no. What do you vectorize? The whole document? The whole page? The whole paragraph? How you split your data, and then index into it, is still problem-space dependent.
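To make that concrete, here are two naive splitting strategies as a sketch; which one works better is still entirely dependent on your documents:

```python
# Two naive chunking strategies; the right granularity is problem-space dependent.

def chunk_by_paragraph(text):
    """One chunk per paragraph -- works when paragraphs are self-contained."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def chunk_fixed_window(text, size=800, overlap=200):
    """Fixed-size character windows with overlap -- works when structure is messy."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```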
* Latency
> I don't want to call this a solved problem, but it is one that scales horizontally very easily and that a lot of existing tech can take advantage of.
Any time you add steps, you increase latency. This is similar to traditional search, where you e.g. need to fetch relevant data and then score it based on some user-specific metric. Every lookup adds latency (rough timing sketch after these bullets). The same is true for RAG.
* Correctness
> The LLM tooling doesn't necessarily need to make things worse here, although if poorly designed it definitely could. AI can do a first pass at fact checking, even though I suspect we'll need humans in the loop for a long while.
Again, this comes back to how you index your data and what results are returned, similar to traditional search. This is problem-space dependent. Plus, we haven't solved LLM hallucinations -- there are strategies to mitigate them, but no clear-cut solution.
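On the latency point above, a rough way to see where the time goes is to instrument each stage separately; `embed`, `search`, `build_prompt`, and `call_llm` below are placeholders for whatever your pipeline actually uses:

```python
# Time each stage of the pipeline; every added step makes its own contribution.
import time
from contextlib import contextmanager

@contextmanager
def timed(label, timings):
    """Record wall-clock time per pipeline stage."""
    t0 = time.perf_counter()
    yield
    timings[label] = time.perf_counter() - t0

def answer_with_timings(query, embed, search, build_prompt, call_llm):
    timings = {}
    with timed("embed_query", timings):
        qvec = embed(query)                          # placeholder embedding call
    with timed("vector_search", timings):
        hits = search(qvec, k=5)                     # placeholder index lookup
    with timed("llm_generate", timings):
        reply = call_llm(build_prompt(query, hits))  # placeholder LLM call
    return reply, timings                            # total latency = sum of stages
```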
Any tips on effectively getting financial data out of PDFs into a RAG system (especially data contained in tables)? And locally, not via proprietary cloud PDF parsing thingy. That's the current nut I'm trying to crack.
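One local approach worth trying (a sketch, assuming pdfplumber, which runs entirely offline): pull the tables out page by page and flatten each row into a text chunk you can index alongside the prose. Table detection is the fragile part, so expect to tweak settings per document family.

```python
# Local PDF text + table extraction sketch using pdfplumber (no cloud service).
import pdfplumber

def pdf_to_chunks(path):
    chunks = []
    with pdfplumber.open(path) as pdf:
        for page_no, page in enumerate(pdf.pages, start=1):
            # Plain page text.
            text = page.extract_text() or ""
            if text.strip():
                chunks.append(f"[{path} p.{page_no}] {text}")
            # Tables come back as lists of rows; flatten rows to pipe-separated
            # lines so the column structure survives into the index.
            for table in page.extract_tables():
                rows = [" | ".join(cell or "" for cell in row) for row in table]
                chunks.append(f"[{path} p.{page_no} table]\n" + "\n".join(rows))
    return chunks
```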
RAG is not just traditional search. It's any external data that can be fed to the LLM to augment its generation.
The most useful and verifiable RAG setup I've seen is hooking up an RDBMS and an LLM, and asking questions in English to retrieve table data. You can do it in several steps (a code sketch follows the list).
1. Extract the metadata of the tables, e.g. table names, the columns of each table, relationships between tables, indexed columns, etc. This is your RAG data.
2. Build the RAG context with the metadata, i.e. listing each table, its columns, relationships, etc.
3. Feed the RAG context and the user's question to the LLM. Tell the LLM to generate a SQL query for the question given the RAG context.
4. Run the SQL query on the database.
It's uncannily good. And it can be easily verified given the SQL.
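A minimal sketch of those steps against SQLite (the catalog query is SQLite-specific, `call_llm` is a placeholder, and `finance.db` is just a hypothetical file):

```python
# Schema-as-context ("English question -> SQL") sketch against SQLite.
import sqlite3

def schema_context(conn):
    """Steps 1-2: pull table DDL out of the catalog; this is the RAG context."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    return "\n\n".join(r[0] for r in rows)

def ask(conn, question, call_llm):
    """Steps 3-4: have the LLM write SQL for the question, then run it."""
    prompt = (
        "Given this database schema:\n"
        f"{schema_context(conn)}\n\n"
        f"Write a single SQL query (no explanation) that answers: {question}"
    )
    sql = call_llm(prompt)    # placeholder LLM call
    print(sql)                # the generated SQL is what you verify
    return conn.execute(sql).fetchall()

conn = sqlite3.connect("finance.db")  # hypothetical database
```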
Is that RAG though? Perhaps I’m missing something, but I don’t see where the retrieval step is. Extracting the metadata and passing it to the LLM in the context sounds like a non-RAG LLM application. Or are you saying that the DB schema is so big (and/or the LLM context too small) that not all the metadata can be passed in one go, and there’s some search step to prune the number of tables?
RAG is augmenting the LLM's generation with external data. How the external data is retrieved is irrelevant. A search is not necessary.
Of course, you can search the tables related to the question to narrow down the table list and help the LLM come up with the correct answer.
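If the schema is too big for one prompt, that narrowing step can be a plain similarity ranking over the table metadata; `embed` here stands in for whatever embedding function you already have:

```python
# Keep only the tables whose metadata looks most similar to the question.
import numpy as np

def relevant_tables(question, table_descriptions, embed, k=5):
    """table_descriptions: {table_name: "columns, relationships, ..."}"""
    names = list(table_descriptions)
    vecs = np.asarray([embed(table_descriptions[n]) for n in names])
    q = np.asarray(embed(question))
    scores = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q))
    return [names[i] for i in np.argsort(-scores)[:k]]  # only these go in the prompt
```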
That's exactly what it is, and it's useful because, when it works, it means you can ask a question and get an answer to your question, rather than having to read the documents and then answer that question yourself.
It also lets a language model answer questions while citing a source, something it fundamentally cannot do on its own.
Everyone talks about "reducing hallucinations", but from a system perspective, everything an LLM emits is equally hallucinated.
Putting the relevant data in context gets around this and provides actual provenance of information, something that is absolutely required for real "knowledge" and which we often take for granted in practice.
Of course, the ability to do so is entirely reliant on the retrieval's search quality. Tradeoffs abound. But with enough clever tricks it does seem possible to take advantage of both the LLM's broad but unsubstantiated content and specific, sourced fact claims.
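One common trick for that provenance point: number the retrieved chunks in the prompt and ask the model to cite them, so each claim maps back to a source. A sketch, again with `call_llm` as a placeholder:

```python
def answer_with_citations(question, hits, call_llm):
    """hits: list of (source_id, text) pairs from the retrieval step."""
    numbered = "\n\n".join(
        f"[{i + 1}] ({src}) {text}" for i, (src, text) in enumerate(hits)
    )
    prompt = (
        "Answer using only the sources below. After every claim, cite the source "
        "number in brackets, e.g. [2]. Say 'not found' if the sources don't cover "
        "the question.\n\n"
        f"Sources:\n{numbered}\n\nQuestion: {question}"
    )
    return call_llm(prompt)  # citations map back to (source_id, text) pairs
```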
You just described RAG: augmenting an LLM with external memory. Perhaps the part you are skipping is that the LLM synthesizes the retrieved information with its own knowledge into one coherent whole.
It's abstractive (new) versus extractive (old) summarization.
What makes it useful is that it does the work of synthesizing the information. Imagine you ask a question that involves bits and pieces of numerous articles. In the past you had to read them all and mentally synthesize them.
I've used something like RAG for finding solutions to questions in Slack. I take the question, break it into searchable terms, search Slack, and get a haystack of results. Then I use an LLM to figure out whether the results are relevant. Finally, I take the top 10 results, summarize them, and link back to the Slack discussion.
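Roughly, that pipeline looks like the sketch below; `search_slack` and `call_llm` are placeholders for the Slack search call and the model call:

```python
# Question -> search terms -> Slack haystack -> relevance filter -> summary.

def answer_from_slack(question, search_slack, call_llm, top_n=10):
    # 1. Ask the model for search terms instead of searching the raw question.
    terms = call_llm(f"Give three short Slack search queries for: {question}")
    # 2. Gather a haystack of candidate messages: (permalink, text) pairs.
    hits = []
    for term in terms.splitlines():
        if term.strip():
            hits.extend(search_slack(term.strip()))
    # 3. Cheap relevance pass: keep messages the model says are on-topic.
    relevant = [
        (link, text) for link, text in hits
        if call_llm(f"Is this relevant to '{question}'? Answer yes or no.\n{text}")
           .strip().lower().startswith("yes")
    ]
    # 4. Summarize the top hits and keep the permalinks so you can link back.
    context = "\n\n".join(text for _, text in relevant[:top_n])
    summary = call_llm(f"Summarize what these messages say about: {question}\n\n{context}")
    return summary, [link for link, _ in relevant[:top_n]]
```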
The intent is usually not to simply regurgitate the results, but to augment the prompt with them to enable a better, focussed answer to the user question than either search or an LLM alone would provide.
The buzz is because it is really one of the most widely used new AI techniques, easily applicable to millions of businesses. Everyone has some large store of unstructured data they want to search through and ask questions about: legal docs, candidates, books, articles... At the same time it’s relatively straightforward to implement, so there are already tens or hundreds of startups / products pushing the RAG agenda (all those “it seems easy but it’s not!” pitches). Hopefully soon it will be added as a built-in LLM feature: the ability to upload your own data for the LLM to use. It also made many more developers aware of embeddings and vector search, which is great.
I'm still building my understanding in this space, but so far I've seen its value when using chains and graphs of agents.
The overall system suggests degrees of freedom in search that might not have been available before. This comes from having a knowledge store in a format (vectors) primed for search, then making it accessible, in full or in partitions, to agents working on one or more concurrent flows around a query.
I also see value in having a full circuit of native-format components that can be pieced together to make higher-order constructs. Agents are just the most recent one to emerge, and I can easily see a mixture of fine-tuned experts alongside stores of relevant material.
To me it feels like people are waking up to the fact that, with current access to software and hardware, you can now make your own search engine and answering tool based on the data you own.