Foxmayn AI

How a Full RAG Pipeline Reduced AI Hallucinations and Made Knowledge Retrieval Production-Safe

At a Glance

Challenge

LLM-generated answers untethered from source data, producing hallucinated outputs

Result

Grounded, auditable AI responses via retrieval checkpoints and reranking

Tech Stack

PostgreSQL · Docker · RAG · Qdrant · Redis/BullMQ · OpenRouter

Status

Production

Situation

Organizations wanted to deploy internal AI assistants over their proprietary documents — knowledge bases, SOPs, product docs — but off-the-shelf LLMs hallucinated freely when asked domain-specific questions. Answers sounded authoritative but referenced non-existent policies or fabricated data points. Without retrieval grounding, these AI tools were liabilities rather than productivity gains, especially in enterprise contexts where wrong answers have real consequences.

The Challenge

Build a complete RAG platform that retrieves, ranks, and injects the right source context before any LLM completion — so every generated answer is traceable back to actual documents and auditable in production.
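The retrieve → rerank → inject flow described above can be sketched in TypeScript. This is a minimal illustration, not the platform's actual code: an in-memory array stands in for Qdrant, a lexical-overlap score stands in for the real reranker, and the function names (`retrieveTopK`, `rerank`, `buildPrompt`) are hypothetical.

```typescript
type Chunk = { id: string; text: string; vector: number[] };

// Cosine similarity between two dense embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Stage 1: vector similarity search (Qdrant's role in the real pipeline).
function retrieveTopK(query: number[], store: Chunk[], k: number): Chunk[] {
  return [...store]
    .sort((x, y) => cosine(query, y.vector) - cosine(query, x.vector))
    .slice(0, k);
}

// Stage 2: rerank candidates. A toy term-overlap score stands in here for
// a cross-encoder or LLM-based reranker.
function rerank(queryText: string, candidates: Chunk[]): Chunk[] {
  const terms = new Set(queryText.toLowerCase().split(/\s+/));
  const score = (c: Chunk) =>
    c.text.toLowerCase().split(/\s+/).filter((t) => terms.has(t)).length;
  return [...candidates].sort((x, y) => score(y) - score(x));
}

// Stage 3: inject the surviving chunks into the completion prompt, tagged
// with source ids so every answer is traceable back to documents.
function buildPrompt(question: string, context: Chunk[]): string {
  const sources = context.map((c) => `[${c.id}] ${c.text}`).join("\n");
  return `Answer ONLY from the sources below. Cite source ids.\n\n` +
    `${sources}\n\nQuestion: ${question}`;
}
```

Keeping the three stages as separate functions is what makes the retrieval checkpoints auditable: each stage's inputs and outputs can be logged and inspected independently.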

What Was Built

  • Architected a monorepo with shared packages: @repo/db for Drizzle ORM models (Auth + RAG modules), @repo/llm for OpenRouter SDK chat and batch embeddings, and @repo/qdrant for high-performance vector storage.

  • Built a multi-strategy RAG pipeline with customizable RAG Profiles — each profile defines retrieval parameters, reranking logic, and context window sizing for different use cases.

  • Implemented BullMQ-powered background workers for document ingestion: parsing, chunking, embedding generation, and vector indexing happen asynchronously without blocking the API.

  • Added multi-tenant authentication via Better Auth with API keys, organization scoping, and admin roles — so different teams can manage isolated knowledge bases.

  • Built a React 19 frontend with TanStack Router and Jotai for real-time document management, RAG profile configuration, and conversational AI interaction.

  • Designed the Hono API core to be serverless-adaptable, with oRPC ensuring end-to-end type safety across the entire stack.
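The chunking step of the ingestion pipeline can be illustrated with a short sketch. In production this logic would run inside a BullMQ job before embedding generation; the function name, chunk size, and overlap values here are illustrative assumptions, not the platform's actual configuration.

```typescript
type DocChunk = { index: number; text: string };

// Split a document into fixed-size character windows with overlap, so a
// sentence cut at one boundary still appears whole in a neighboring chunk
// and remains retrievable. (Character windows are an assumption; real
// pipelines often chunk by tokens or sentences.)
function chunkDocument(text: string, size = 512, overlap = 64): DocChunk[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks: DocChunk[] = [];
  const step = size - overlap;
  for (let start = 0, i = 0; start < text.length; start += step, i++) {
    chunks.push({ index: i, text: text.slice(start, start + size) });
  }
  return chunks;
}
```

Because each window is independent, this step parallelizes cleanly across worker processes, which is what lets ingestion run asynchronously without blocking the API.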

Results

Hallucination rate

Uncontrolled → Grounded via retrieval + reranking

Document ingestion

Async via BullMQ workers

Multi-tenancy

Org-scoped knowledge bases with API keys

Codebase

88% TypeScript, fully type-safe

The platform enables safe deployment of internal copilots and documentation agents. Every AI response is traceable to source documents, making it suitable for enterprise environments where accuracy and auditability are non-negotiable.

Key Achievement

Reduced hallucinated outputs by adding retrieval checkpoints and reranking logic before final synthesis, increasing answer precision in enterprise use cases.
