On-Prem PoC Lab
Fully local RAG stack running on a single host.
ExperimentalSelf-hosted

No external cloud

All inference and storage are local.

Data stays on-prem

Privacy-first, controlled networking.

Architecture Overview
Everything runs on one on-prem machine.

Runtime

Ollama for LLM + embeddings

Qdrant for vector search

Postgres for metadata

Networking

Air-gapped friendly

Local-only service calls

No third-party egress

Host Machine
Gigabyte G5 KC Laptop

GPU

NVIDIA GeForce RTX 3060

6GB VRAM

CPU

Intel Core i5-10500H

6 cores / 12 threads

RAM

32GB

DDR4

Storage

2TB

SSD

Components
Core services in this PoC.

Qdrant

Vector store

Available

Postgres

Metadata store

Available

Ollama (LLM)

Generative models

Available

Ollama (Embed)

Embedding model

Available
Workflows
Lifecycle of documents and answers.

Index

Chunk documents, generate embeddings, upsert to Qdrant.

  • Chunk documents
  • Generate embeddings
  • Upsert to Qdrant

Retrieve

Semantic search with reranking-ready retrieval.

  • Semantic search
  • Reranking-ready
  • Fast iteration

RAG

Answer generation grounded in retrieved context.

  • Retrieve context
  • Generate answer
  • Apply guardrails
Try the PoC
Open the live RAG playground.