Hey HN fam,
We’ve seen developers spend a lot of time implementing advanced RAG techniques from scratch.
While these techniques are essential for improving RAG performance, implementing and testing them takes a lot of effort!
To help with this process, our team (Athina AI) has released Open-Source Advanced RAG Cookbooks.
This is a collection of ready-to-run Google Colab notebooks featuring the most commonly implemented techniques.
Please show us some love by starring the repo if you find this useful!
Is there a tool or technique to achieve this? I’m aware that I can use LLMs to do so, or read all pages and find the text that is identical across them (the header/footer), but I want to keep the page number as part of the metadata to ensure better citation on retrieval.
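For what it's worth, the "read all pages and find identical text" route can keep page numbers if you process the document page by page instead of as one big string. Below is a rough sketch of that idea, not a tested tool: it assumes you already have one text string per page (from whatever PDF extractor you use), and the function name `strip_repeated_lines` and the parameters `edge_lines` and `min_ratio` are made up for illustration. Digits are masked before comparing so that footers like "Page 3" still count as the same line on every page.

```python
import re
from collections import Counter


def _key(line):
    # Mask digits so "Page 1" and "Page 2" compare as the same footer line.
    return re.sub(r"\d+", "#", line)


def strip_repeated_lines(pages, edge_lines=2, min_ratio=0.6):
    """Drop lines that repeat at the top/bottom of most pages.

    pages: list of page texts in reading order (first item is page 1).
    edge_lines, min_ratio: hypothetical tuning knobs, not from any library.
    Returns a list of {"page": n, "text": ...} dicts, one per input page.
    """
    split_pages = [
        [ln.strip() for ln in page.splitlines() if ln.strip()]
        for page in pages
    ]

    # Count, per page, which (digit-masked) lines appear near the page edges.
    counts = Counter()
    for lines in split_pages:
        edge = lines[:edge_lines] + lines[-edge_lines:]
        counts.update({_key(ln) for ln in edge})

    # A line is treated as header/footer if it shows up on enough pages.
    threshold = max(2, int(min_ratio * len(pages)))
    repeated = {k for k, n in counts.items() if n >= threshold}

    chunks = []
    for number, lines in enumerate(split_pages, start=1):
        start, end = 0, len(lines)
        # Trim repeated lines from the top, at most edge_lines of them.
        while start < min(edge_lines, end) and _key(lines[start]) in repeated:
            start += 1
        # Trim repeated lines from the bottom, at most edge_lines of them.
        while end > max(start, len(lines) - edge_lines) and _key(lines[end - 1]) in repeated:
            end -= 1
        # The page number comes from the position in the list, not the footer.
        chunks.append({"page": number, "text": "\n".join(lines[start:end])})
    return chunks
```

Each returned dict can then be passed to your chunker with `page` carried along as metadata, so citations still point at the right page after the header/footer text is gone. Masking digits can over-match on pages whose real content starts with a repeated numbered line, so treat the thresholds as something to tune on your own documents.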