Ongoing · AI/ML · Backend

Talos

Graduation project (team): a multi-user RAG platform for chatting with your own uploaded documents, with answers streamed back and cited inline. I owned the ingestion and retrieval core.

FastAPI · RAG · Milvus · MinIO · Redis/ARQ

The Problem

A team's real knowledge lives in its own documents, so a general chatbot is useless for it. People need answers grounded in their own files, with a pointer to where each answer came from.

Architecture & Approach

Talos is a team project; I owned the ingestion and retrieval core. Files upload to MinIO, and an async ARQ worker ingests them through a race-safe processing state machine. Retrieval runs in two stages: a dense plus BM25 hybrid search whose result lists are merged with reciprocal rank fusion, then a cross-encoder reranker over the fused candidates. The answer is streamed back over SSE with inline citations. I also built Google Drive import and a statistical evaluation harness to measure retrieval quality.
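To make the streaming step concrete, here is a minimal sketch of how answer tokens and their citations could be framed as Server-Sent Events. The event names (`token`, `citations`, `done`) and the payload shape are assumptions for illustration, not the actual Talos wire format; only the `event:` / `data:` framing with a blank-line terminator comes from the SSE format itself.

```python
import json
from typing import Iterable, Iterator


def format_sse(event: str, payload: dict) -> str:
    """Frame one Server-Sent Event: an event name, a JSON data line, a blank line."""
    # json.dumps escapes newlines inside strings, so the data stays on one line.
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def stream_answer(tokens: Iterable[str], citations: list[dict]) -> Iterator[str]:
    """Yield the answer token by token, then the sources the inline markers refer to.

    Event names and payload fields are hypothetical.
    """
    for token in tokens:
        yield format_sse("token", {"text": token})
    yield format_sse("citations", {"sources": citations})
    yield format_sse("done", {})


frames = list(
    stream_answer(
        ["Refunds take", " 14 days [1]."],
        [{"id": 1, "document": "policy.pdf", "page": 3}],
    )
)
```

In a FastAPI app, a generator like this could be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`.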

Key Technical Decisions

Milvus for vector search, MinIO for files

A dedicated vector store handled the hybrid dense and sparse retrieval the project needed at scale, while MinIO held the raw uploads separately so storage and search could each be reasoned about on their own.

Async ingestion with a race-safe state machine

Uploads process in the background through an ARQ worker. A processing state machine keeps concurrent uploads and retries from corrupting a document's state, so a half-ingested file can never be queried as if it were ready.
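One common way to get this kind of race safety is a compare-and-set transition: a worker may move a document to a new state only if the document is still in the state the worker expects. The sketch below shows the idea in memory with a lock. The state names, the transition table, and the `DocumentStates` class are hypothetical; in a real deployment the same check would more likely be a conditional update in the database.

```python
import threading
from enum import Enum


class DocState(str, Enum):
    # Hypothetical state names for illustration.
    UPLOADED = "uploaded"
    PROCESSING = "processing"
    READY = "ready"
    FAILED = "failed"


# Which transitions are legal at all; FAILED -> PROCESSING models a retry.
ALLOWED = {
    DocState.UPLOADED: {DocState.PROCESSING},
    DocState.PROCESSING: {DocState.READY, DocState.FAILED},
    DocState.FAILED: {DocState.PROCESSING},
    DocState.READY: set(),
}


class DocumentStates:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._states: dict[str, DocState] = {}

    def register(self, doc_id: str) -> None:
        with self._lock:
            self._states.setdefault(doc_id, DocState.UPLOADED)

    def transition(self, doc_id: str, expected: DocState, new: DocState) -> bool:
        """Compare-and-set: succeed only if the document is still in `expected`."""
        if new not in ALLOWED[expected]:
            raise ValueError(f"illegal transition {expected.value} -> {new.value}")
        with self._lock:
            if self._states.get(doc_id) is not expected:
                return False  # another worker or retry got there first
            self._states[doc_id] = new
            return True

    def is_queryable(self, doc_id: str) -> bool:
        """Only fully ingested documents are visible to retrieval."""
        with self._lock:
            return self._states.get(doc_id) is DocState.READY


store = DocumentStates()
store.register("doc-1")
first = store.transition("doc-1", DocState.UPLOADED, DocState.PROCESSING)
second = store.transition("doc-1", DocState.UPLOADED, DocState.PROCESSING)
```

Here `first` succeeds and `second` is rejected, so a duplicate job for the same upload cannot start ingestion twice, and `is_queryable` stays false until the document reaches the ready state.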

Two-stage retrieval with reranking

Hybrid retrieval with reciprocal rank fusion casts a wide net, then a cross-encoder reranker sharpens the top results before they reach the model. The evaluation harness is what told me the reranker was worth its latency.
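As a rough illustration of the fusion step, reciprocal rank fusion scores each document by summing 1 / (k + rank) over every ranked list it appears in, so it needs only ranks and never has to reconcile dense similarity scores with BM25 scores. The function name and the document IDs below are made up for the example, and `k = 60` is a commonly used default rather than a value confirmed for Talos.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one list, best first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["a", "b", "c"]  # hypothetical dense-vector results
bm25_hits = ["b", "d", "a"]   # hypothetical BM25 results
fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)  # ['b', 'a', 'd', 'c']
```

Documents that both retrievers agree on ("b" and "a") rise to the top, and the fused list is what a cross-encoder reranker would then rescore.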

Results

In progress: teams chat with their own documents and get answers streamed back with inline citations, with a statistical harness measuring retrieval quality.
