On-Prem PoC Lab
Fully local RAG stack running on a single host.
Experimental
Self-hosted
No external cloud
All inference and storage are local.
Data stays on-prem
Privacy-first, controlled networking.
Architecture Overview
Everything runs on one on-prem machine.
Runtime
Ollama for LLM + embeddings
Qdrant for vector search
Postgres for metadata
Networking
Air-gapped friendly
Local-only service calls
No third-party egress
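A minimal sketch of how the "local-only service calls" rule could be checked at startup. The endpoint URLs are assumptions (each service's default port on the loopback interface), and the database name `rag` and the helper names are hypothetical, not taken from the PoC.

```python
import ipaddress
from urllib.parse import urlparse

# Assumed endpoints: default ports for Ollama, Qdrant, and Postgres.
# The database name "rag" is a placeholder for illustration only.
SERVICES = {
    "ollama": "http://127.0.0.1:11434",
    "qdrant": "http://127.0.0.1:6333",
    "postgres": "postgresql://127.0.0.1:5432/rag",
}


def is_loopback_url(url: str) -> bool:
    """Return True only if the URL's host is localhost or a loopback IP."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other DNS name could resolve off-host, so treat it as non-local.
        return False


def non_local_services(services: dict) -> list:
    """List the names of configured services that do not point at this host."""
    return sorted(name for name, url in services.items()
                  if not is_loopback_url(url))


if __name__ == "__main__":
    offenders = non_local_services(SERVICES)
    if offenders:
        raise SystemExit(f"Non-local endpoints configured: {offenders}")
    print("All service endpoints are loopback-only.")
```

This check only validates configuration. It does not replace firewall rules or a physically isolated network for an air-gapped setup.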
Host Machine
Gigabyte G5 KC Laptop
GPU
NVIDIA GeForce RTX 3060
6GB VRAM
CPU
Intel Core i5-10500H
6 cores / 12 threads
RAM
32GB
DDR4
Storage
2TB
SSD
Components
Core services in this PoC.
Qdrant
Vector store
Postgres
Metadata store
Ollama (LLM)
Generative models
Ollama (Embed)
Embedding model
Workflows
Lifecycle of documents and answers.
Index
Split documents into chunks, embed each chunk with the Ollama embedding model, and upsert the vectors into Qdrant.
- Chunk documents
- Generate embeddings
- Upsert to Qdrant
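The three indexing steps above can be sketched as follows, using only the Python standard library against the Ollama and Qdrant HTTP APIs. The URLs, the embedding model name `nomic-embed-text`, the collection name `docs`, and the chunk sizes are all assumptions chosen for illustration. The sketch also assumes the Qdrant collection already exists with a matching vector size.

```python
import json
import urllib.request
import uuid

OLLAMA_URL = "http://127.0.0.1:11434"  # assumed: Ollama default port
QDRANT_URL = "http://127.0.0.1:6333"   # assumed: Qdrant default port
EMBED_MODEL = "nomic-embed-text"       # assumed embedding model
COLLECTION = "docs"                    # hypothetical collection name


def chunk_text(text, size=500, overlap=50):
    """Split text into character windows of `size` overlapping by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size]
            for i in range(0, max(len(text) - overlap, 1), step)]


def post_json(url, body, method="POST"):
    """Send a JSON request and decode the JSON response."""
    req = urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"}, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def embed(text):
    """Embed one chunk via Ollama's /api/embeddings endpoint."""
    body = {"model": EMBED_MODEL, "prompt": text}
    return post_json(f"{OLLAMA_URL}/api/embeddings", body)["embedding"]


def build_points(doc_id, chunks, vectors):
    """Pair chunks with vectors; IDs are deterministic so re-indexing overwrites."""
    return [
        {"id": str(uuid.uuid5(uuid.NAMESPACE_URL, f"{doc_id}#{n}")),
         "vector": vec,
         "payload": {"doc_id": doc_id, "chunk_index": n, "text": chunk}}
        for n, (chunk, vec) in enumerate(zip(chunks, vectors))
    ]


def index_document(doc_id, text):
    """Chunk, embed, and upsert one document; returns the chunk count."""
    chunks = chunk_text(text)
    vectors = [embed(c) for c in chunks]
    post_json(f"{QDRANT_URL}/collections/{COLLECTION}/points?wait=true",
              {"points": build_points(doc_id, chunks, vectors)}, method="PUT")
    return len(chunks)
```

Character-based windows keep the sketch short. A real pipeline might split on sentences or tokens instead.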
Retrieve
Semantic search over the Qdrant index, with results shaped so that a reranking step can be added later.
- Semantic search
- Reranking-ready
- Fast iteration
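To illustrate what "reranking-ready" retrieval could mean, here is an in-memory stand-in for the vector search. In the PoC, Qdrant performs the similarity search; this sketch only shows the shape of the step. It over-fetches `fetch_k` candidates and passes them through an optional `rerank` callable before trimming to `top_k`. All names and the cosine metric are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, points, top_k=3, fetch_k=10, rerank=None):
    """Score all points, keep the best `fetch_k`, optionally rerank, return `top_k`.

    `points` are dicts with a "vector" key plus any payload fields.
    `rerank` is a hypothetical hook: a callable taking and returning a hit list.
    """
    hits = [{**p, "score": cosine(query_vec, p["vector"])} for p in points]
    hits.sort(key=lambda h: h["score"], reverse=True)
    candidates = hits[:fetch_k]
    if rerank is not None:
        candidates = rerank(candidates)
    return candidates[:top_k]


if __name__ == "__main__":
    corpus = [
        {"id": "a", "vector": [1.0, 0.0], "text": "exact match"},
        {"id": "b", "vector": [0.0, 1.0], "text": "unrelated"},
        {"id": "c", "vector": [1.0, 1.0], "text": "partial match"},
    ]
    for hit in retrieve([1.0, 0.0], corpus, top_k=2):
        print(hit["id"], round(hit["score"], 3))
```

Keeping the reranker as an injected callable means the retrieval code does not change when a reranking model is introduced.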
RAG
Answer generation grounded in retrieved context.
- Retrieve context
- Generate answer
- Apply guardrails
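The retrieve, generate, and guardrail steps above might fit together as in the sketch below. The guardrail shown is one simple possibility (refuse when no retrieved chunk clears a score threshold, and instruct the model to answer only from the context); the source does not say which guardrails the PoC applies. The model name `llama3.2`, the threshold, and all function names are assumptions.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # assumed: Ollama default port
LLM_MODEL = "llama3.2"                 # assumed generative model
REFUSAL = "I can't answer that from the indexed documents."


def build_prompt(question, hits, min_score=0.5, max_chars=4000):
    """Build a grounded prompt, or return None if no hit is relevant enough."""
    parts, used = [], 0
    for hit in hits:
        if hit["score"] < min_score:
            continue
        block = f"[{len(parts) + 1}] {hit['text']}"
        if used + len(block) > max_chars:
            break  # keep the context within a rough character budget
        parts.append(block)
        used += len(block)
    if not parts:
        return None
    context = "\n".join(parts)
    return ("Answer using only the numbered context below. "
            "If the context is insufficient, say so.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")


def ollama_generate(prompt):
    """Call Ollama's /api/generate endpoint without streaming."""
    body = {"model": LLM_MODEL, "prompt": prompt, "stream": False}
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate", data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]


def answer(question, hits, generate=ollama_generate):
    """Guardrail first: refuse without calling the model if nothing is relevant."""
    prompt = build_prompt(question, hits)
    if prompt is None:
        return REFUSAL
    return generate(prompt).strip()
```

Passing `generate` as a parameter lets the flow be exercised with a stub, so the guardrail logic can be tested without a running model.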
Try the PoC
Open the live RAG playground.