← Back to work
Agentic RAG over an Enterprise Documentation Corpus
An agentic retrieval system over a large enterprise technical-documentation corpus that measurably cut specification-research time — built self-initiated, now adopted by colleagues.
- ↓ research time
- measured impact measured specific internal figure available on request
- graph + vector
- hybrid retrieval measured
- read-only
- least-privilege bridge measured
Context
Engineers spend an enormous amount of time locating the right passage across a sprawling technical-documentation corpus. I built an agentic retrieval system to collapse that search from a manual hunt into a grounded answer with citations.
What I built
- Hybrid retrieval — a Qdrant vector store over BAAI/bge embeddings combined with a homegrown knowledge-graph RAG layer (entity and co-occurrence graph with community detection) so the system reasons over how documents relate, not just which are individually similar.
- CUDA-accelerated OCR ingestion to bring scanned and image-based documents into the corpus cleanly.
- A least-privilege, read-only tool bridge (MCP) as the enabling control — the agent can retrieve and cite, but never write, and access is scoped and fail-closed by default.
Measured result
Specification-research time dropped materially — a self-measured, work-validated result. The specific figure is available on request. The capability moved beyond me: colleagues now use it.
The corpus itself and the systems it lives in are deliberately unnamed here. This entry is capability-focused by design.