Interesting discussion! While RAG is powerful for document retrieval, applying it to code repositories presents unique challenges that go beyond traditional RAG implementations. I've been working on a universal repository knowledge graph system, and found that the real complexity lies in handling cross-language semantic understanding and maintaining relationship context across different repo structures (mono/poly).
Has anyone successfully implemented a language-agnostic approach that can:
1. Capture implicit code relationships without heavy LLM dependency?
2. Scale efficiently for large monorepos while preserving fine-grained semantic links?
3. Handle cross-module dependencies and version evolution?
Current solutions like AST-based analysis + traditional embeddings seem to miss crucial semantic contexts. Curious about others' experiences with hybrid approaches combining static analysis and lightweight ML models.
Has anyone successfully implemented a language-agnostic approach that can: 1. Capture implicit code relationships without heavy LLM dependency? 2. Scale efficiently for large monorepos while preserving fine-grained semantic links? 3. Handle cross-module dependencies and version evolution?
Current solutions like AST-based analysis + traditional embeddings seem to miss crucial semantic contexts. Curious about others' experiences with hybrid approaches combining static analysis and lightweight ML models.