Foxmayn AI
How a Full RAG Pipeline Reduced AI Hallucinations and Made Knowledge Retrieval Production-Safe
At a Glance
Challenge
LLM-generated answers untethered from source data, producing hallucinated outputs
Result
Grounded, auditable AI responses via retrieval checkpoints and reranking
Tech Stack
TypeScript · Hono · ORPC · Drizzle ORM · Qdrant · BullMQ · OpenRouter · Better Auth · React 19 · TanStack Router · Jotai
Status
Production
Situation
Organizations wanted to deploy internal AI assistants over their proprietary documents — knowledge bases, SOPs, product docs — but off-the-shelf LLMs hallucinated freely when asked domain-specific questions. Answers sounded authoritative but referenced non-existent policies or fabricated data points. Without retrieval grounding, these AI tools were liabilities rather than productivity gains, especially in enterprise contexts where wrong answers have real consequences.
The Challenge
Build a complete RAG platform that retrieves, ranks, and injects the right source context before any LLM completion — so every generated answer is traceable back to actual documents and auditable in production.
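The retrieve → rerank → inject flow can be sketched as a small pipeline. This is a minimal illustration with injected dependencies, not the platform's actual code; the names `retrieve`, `rerank`, and `complete` are assumptions.

```typescript
// Minimal sketch of the retrieve -> rerank -> inject flow. Sources are
// carried through to the final answer so every response stays auditable.
interface Chunk {
  id: string;
  text: string;
  source: string; // document the chunk came from, kept for traceability
  score: number;
}

interface Deps {
  retrieve: (query: string, topK: number) => Promise<Chunk[]>; // vector search
  rerank: (query: string, chunks: Chunk[]) => Promise<Chunk[]>;
  complete: (prompt: string) => Promise<string>;               // LLM call
}

async function answer(query: string, deps: Deps, topK = 20, keep = 5) {
  const candidates = await deps.retrieve(query, topK);
  const ranked = (await deps.rerank(query, candidates)).slice(0, keep);
  const context = ranked.map(c => `[${c.source}] ${c.text}`).join('\n');
  const text = await deps.complete(
    `Answer using ONLY the context below.\n\n${context}\n\nQ: ${query}`,
  );
  // Return sources alongside the answer: this is the "traceable back to
  // actual documents" property the pipeline is built around.
  return { text, sources: ranked.map(c => c.source) };
}
```

Because the retriever, reranker, and model are injected, each checkpoint can be swapped or audited independently.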
What Was Built
Architected a monorepo with shared packages: @repo/db for Drizzle ORM models (Auth + RAG modules), @repo/llm for OpenRouter SDK chat and batch embeddings, and @repo/qdrant for high-performance vector storage.
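A `@repo/db` RAG-module table might look like the following Drizzle ORM sketch. Table and column names here are illustrative assumptions, not the real schema; note that embeddings live in Qdrant, so Postgres stores only text and metadata.

```typescript
// Hypothetical Drizzle schema sketch for the RAG module; names are
// assumptions, not the actual @repo/db models.
import { pgTable, uuid, text, timestamp } from 'drizzle-orm/pg-core';

export const documents = pgTable('documents', {
  id: uuid('id').primaryKey().defaultRandom(),
  orgId: uuid('org_id').notNull(), // tenant scoping
  title: text('title').notNull(),
  createdAt: timestamp('created_at').defaultNow(),
});

export const chunks = pgTable('chunks', {
  id: uuid('id').primaryKey().defaultRandom(),
  documentId: uuid('document_id').notNull().references(() => documents.id),
  content: text('content').notNull(),
  // The vector itself lives in Qdrant; this column links the row to it.
  qdrantPointId: uuid('qdrant_point_id').notNull(),
});
```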
Built a multi-strategy RAG pipeline with customizable RAG Profiles — each profile defines retrieval parameters, reranking logic, and context window sizing for different use cases.
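A RAG Profile as described above could be modeled roughly like this; the field names and example profiles are assumptions for illustration, not the platform's actual schema.

```typescript
// Sketch of a "RAG Profile": retrieval parameters, reranking logic, and
// context window sizing bundled per use case. Field names are assumed.
interface RagProfile {
  name: string;
  topK: number;             // candidates pulled from vector search
  rerankKeep: number;       // candidates surviving the reranker
  maxContextTokens: number; // budget for injected context
  rerankModel?: string;     // optional reranker model id (hypothetical)
}

// Two hypothetical profiles tuned for different use cases:
const profiles: Record<string, RagProfile> = {
  support: { name: 'support', topK: 30, rerankKeep: 6, maxContextTokens: 4000 },
  legal:   { name: 'legal',   topK: 50, rerankKeep: 10, maxContextTokens: 8000,
             rerankModel: 'example-reranker-v1' },
};

function getProfile(key: string): RagProfile {
  const p = profiles[key];
  if (!p) throw new Error(`unknown RAG profile: ${key}`);
  return p;
}
```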
Implemented BullMQ-powered background workers for document ingestion: parsing, chunking, embedding generation, and vector indexing happen asynchronously without blocking the API.
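The chunking step such a worker runs might look like this sliding-window sketch. The overlap keeps sentences that straddle a boundary retrievable from both chunks; the BullMQ wiring itself (`new Worker('ingest', handler, { connection })`) is left out so the snippet stays self-contained, and the parameters are assumed defaults.

```typescript
// Sliding-window chunker sketch for document ingestion. Overlapping
// windows prevent a sentence split at a boundary from being lost to search.
function chunkText(text: string, size = 500, overlap = 100): string[] {
  if (overlap >= size) throw new Error('overlap must be smaller than size');
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

Each chunk is then embedded in batch and upserted into Qdrant, all off the request path.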
Added multi-tenant authentication via Better Auth with API keys, organization scoping, and admin roles — so different teams can manage isolated knowledge bases.
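The org-scoping idea can be illustrated with a toy API-key resolver. This is not Better Auth's actual API; the `Map` stands in for the auth store, and all names are hypothetical.

```typescript
// Hypothetical illustration of org-scoped API-key resolution: each key
// maps to exactly one organization, and downstream queries filter by it.
interface ApiKeyRecord {
  orgId: string;
  role: 'admin' | 'member';
}

const keys = new Map<string, ApiKeyRecord>(); // stand-in for the auth store

function resolveOrg(apiKey: string): ApiKeyRecord {
  const rec = keys.get(apiKey);
  if (!rec) throw new Error('invalid API key');
  return rec; // every knowledge-base query is then scoped to rec.orgId
}
```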
Built a React 19 frontend with TanStack Router and Jotai for real-time document management, RAG profile configuration, and conversational AI interaction.
Designed the Hono API core to be serverless-adaptable, with ORPC ensuring end-to-end type safety across the entire stack.
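The idea behind end-to-end type safety can be shown with a library-agnostic sketch (this is not ORPC's actual API): one shared contract type drives both the server handler and the client call, so a mismatched input shape fails at compile time rather than in production.

```typescript
// Library-agnostic sketch of end-to-end type safety: a single contract
// type constrains both handlers and the typed client call.
type Contract = {
  'documents.search': {
    input: { query: string; topK: number };
    output: { ids: string[] };
  };
};

type Handlers = {
  [K in keyof Contract]: (input: Contract[K]['input']) => Contract[K]['output'];
};

const handlers: Handlers = {
  // Toy handler: returns at most topK hypothetical document ids.
  'documents.search': ({ query, topK }) =>
    ({ ids: query ? ['doc-1'].slice(0, topK) : [] }),
};

// A typed "client": calling with the wrong route or input is a compile error.
function call<K extends keyof Contract>(
  route: K,
  input: Contract[K]['input'],
): Contract[K]['output'] {
  return handlers[route](input);
}
```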
Results
Hallucination rate: uncontrolled before; grounded via retrieval + reranking after
Document ingestion: asynchronous via BullMQ workers
Multi-tenancy: org-scoped knowledge bases with API keys
Codebase: 88% TypeScript, fully type-safe
The platform enables safe deployment of internal copilots and documentation agents. Every AI response is traceable to source documents, making it suitable for enterprise environments where accuracy and auditability are non-negotiable.
Key Achievement
Reduced hallucinated outputs by adding retrieval checkpoints and reranking logic before final synthesis, increasing answer precision in enterprise use cases.