RAGdb: A serverless, single-file, embedded db that runs anywhere

AbkMystery · 2025-11-20T18:43:45 1763664225

I've been frustrated by the complexity of modern RAG stacks. To run a simple document search, you usually need Docker, a vector database like Pinecone or Milvus, an embedding model, and heavy dependencies like LangChain or Torch. I wanted an architecture that was truly portable.

Introducing RAGdb (v1.0.6). It's an embedded, multimodal knowledge graph that lives entirely inside a single SQLite file.

The novelty: instead of heavy embeddings, it uses a hybrid search engine (TF-IDF vectorization + exact substring boosting) written in pure NumPy. This lets it run on edge devices, in CI/CD pipelines, or inside strict corporate environments where you can't spin up servers.

Key features:

- Zero heavy dependencies: the core is <30MB.
- Portable container: the .ragdb file contains the vectors, the metadata, the extracted text, and the search index. You can email the database to a colleague.
- SOTA OCR: optional support for ONNX-based OCR if you need to index images.
- Incremental ingestion: it hashes files and only re-processes changed documents.

Installation: pip install ragdb

Code & architecture: https://github.com/abkmystery/ragdb

I'm looking for feedback on the retrieval architecture. I believe this "single-file" approach is the missing link for local-first AI.
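For readers who want a concrete picture of the hybrid ranking idea (TF-IDF similarity plus a bonus for exact substring matches), here is a minimal sketch. It is not RAGdb's actual code: the function names (`build_index`, `to_vector`, `search`), the smoothed IDF formula, and the `boost=0.5` weight are all assumptions made for illustration, and it uses only the Python standard library rather than NumPy so it runs with nothing installed.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def to_vector(tokens, idf):
    """Build a unit-length sparse TF-IDF vector; terms missing from idf are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_index(docs):
    """Return (idf, vectors) for a list of document strings."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumed variant; other formulas work too).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    return idf, [to_vector(tokens, idf) for tokens in tokenized]


def search(query, docs, idf, vectors, boost=0.5, top_k=3):
    """Rank docs by cosine similarity, adding `boost` when the query appears verbatim."""
    q_vec = to_vector(tokenize(query), idf)
    needle = query.lower().strip()
    results = []
    for i, (doc, d_vec) in enumerate(zip(docs, vectors)):
        # Both vectors are unit length, so the dot product is the cosine.
        cosine = sum(w * d_vec.get(t, 0.0) for t, w in q_vec.items())
        bonus = boost if needle and needle in doc.lower() else 0.0
        score = cosine + bonus
        if score > 0:
            results.append((i, score))
    results.sort(key=lambda r: r[1], reverse=True)
    return results[:top_k]


# Hypothetical sample documents, purely for illustration.
docs = [
    "Invoice INV-2024-001 for cloud hosting services",
    "Hosting services overview and cloud pricing",
    "Meeting notes about the quarterly roadmap",
]
idf, vectors = build_index(docs)
results = search("cloud hosting", docs, idf, vectors)
unboosted = search("cloud hosting", docs, idf, vectors, boost=0.0)
```

In this toy corpus the second document scores higher on TF-IDF alone because it is shorter, but the first document contains the exact phrase "cloud hosting", so the substring bonus moves it to the top. That is the kind of behavior exact-match boosting is meant to provide for identifiers and literal phrases.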
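The incremental ingestion feature (hash each file, skip it if the hash is unchanged) can be sketched along the same lines. Again, this is an illustration under assumptions, not RAGdb's implementation: the `documents` table, its columns, the choice of SHA-256, and the `open_store` and `ingest` helpers are hypothetical, and decoding bytes as UTF-8 stands in for real text extraction.

```python
import hashlib
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a single-file SQLite store with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "name TEXT PRIMARY KEY, sha256 TEXT NOT NULL, body TEXT NOT NULL)"
    )
    return conn


def ingest(conn, name, content):
    """Store `content` (bytes) under `name`.

    Returns True if the document was processed, False if it was skipped
    because its hash matches what is already stored.
    """
    digest = hashlib.sha256(content).hexdigest()
    row = conn.execute(
        "SELECT sha256 FROM documents WHERE name = ?", (name,)
    ).fetchone()
    if row is not None and row[0] == digest:
        return False  # unchanged: skip the expensive extraction and indexing
    # Stand-in for real text extraction (PDF parsing, OCR, and so on).
    body = content.decode("utf-8", errors="replace")
    conn.execute(
        "INSERT OR REPLACE INTO documents (name, sha256, body) VALUES (?, ?, ?)",
        (name, digest, body),
    )
    conn.commit()
    return True


conn = open_store()
first = ingest(conn, "notes.txt", b"hello world")   # new document
second = ingest(conn, "notes.txt", b"hello world")  # same bytes, skipped
third = ingest(conn, "notes.txt", b"hello again")   # changed, re-processed
row_count = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
```

Passing a file path instead of `":memory:"` would keep everything in one portable file, which is the property the post emphasizes.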