How the RAG pipeline works

Technical overview of the consulting chat assistant on this site.

#1 User question: LiveView chat form
#2 Embed query: vector embedding via LLM API
#3 Retrieve chunks: pgvector similarity search
#4 Augment prompt: FAQ + portfolio context
#5 LLM answer: Gemini / OpenAI completion
#6 Rate limit & log: ETS token bucket + audit
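Steps #2 to #4 can be illustrated with a minimal sketch. It is written in Python purely for illustration (the site itself appears to be an Elixir/LiveView app) and makes several assumptions: embed is a toy hashed bag-of-words stand-in for the real embedding API call, the in-memory sort stands in for the pgvector query, and every function name, the prompt wording, and the sample chunks are hypothetical.

```python
import hashlib
import math

DIM = 64  # toy dimensionality; real embedding models return far larger vectors


def embed(text: str) -> list[float]:
    """Toy stand-in for the embedding API: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        token = token.strip(".,?!:;")
        if not token:
            continue
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    # Inputs are unit length, so the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question.

    With pgvector this ranking would typically happen in SQL, for example
    ORDER BY embedding <=> :query (the <=> operator is cosine distance).
    """
    query_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Augment the user question with the retrieved context (hypothetical wording)."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical sample chunks, not the site's real knowledge base.
CHUNKS = [
    "The chat assistant retrieves FAQ chunks with pgvector similarity search.",
    "Rate limits are enforced per client IP with a token bucket.",
    "Prompts are bilingual and selected by locale.",
]

if __name__ == "__main__":
    question = "How does similarity search retrieve chunks?"
    print(build_prompt(question, retrieve(question, CHUNKS)))
```

The resulting prompt string is what would be sent to the completion model in step #5.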

Knowledge sources

  • Static FAQ chunks in Content.knowledge_chunks/0
  • Project case studies and about narrative
  • Re-indexed via mix rag.reindex into pgvector
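A re-index pass of this kind can be sketched as follows. This is not the actual mix rag.reindex task: it is a Python illustration under stated assumptions. The table name knowledge_chunks, its columns, the %s placeholder style (as used by psycopg), and the function names are all hypothetical. The only pgvector detail relied on is its text input format for vectors, such as '[0.1,0.2,0.3]'.

```python
from typing import Callable, Iterable, Iterator, Sequence

# Hypothetical table and column names; the upsert replaces a chunk's text
# and embedding when its id already exists.
UPSERT_SQL = (
    "INSERT INTO knowledge_chunks (id, body, embedding) "
    "VALUES (%s, %s, %s::vector) "
    "ON CONFLICT (id) DO UPDATE "
    "SET body = EXCLUDED.body, embedding = EXCLUDED.embedding"
)


def to_vector_literal(values: Sequence[float]) -> str:
    """Format floats in pgvector's text input form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def reindex_rows(
    chunks: Iterable[tuple[str, str]],
    embed: Callable[[str], Sequence[float]],
) -> Iterator[tuple[str, tuple[str, str, str]]]:
    """Yield one (sql, params) pair per (chunk_id, body) pair.

    The caller supplies embed, which stands in for the embedding API call,
    and executes each pair against the database.
    """
    for chunk_id, body in chunks:
        yield UPSERT_SQL, (chunk_id, body, to_vector_literal(embed(body)))
```

Keeping SQL generation separate from execution lets the pass be checked without a database or an API key: pass a fake embed function and inspect the generated parameters.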

Production concerns

  • Hourly rate limits per client IP
  • Bilingual prompts keyed by locale
  • Telegram webhook for async messaging
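The per-IP hourly limit can be pictured as a token bucket. The sketch below is a Python illustration, not the site's ETS-backed code: a plain dict stands in for the ETS table, and the class name, the default of 20 requests per hour, and the injectable clock are assumptions made for the example.

```python
import time
from dataclasses import dataclass


@dataclass
class Bucket:
    tokens: float      # tokens currently available
    updated_at: float  # clock reading at the last refill


class RateLimiter:
    """Token bucket per client IP; capacity tokens refill evenly over per_seconds."""

    def __init__(self, capacity: int = 20, per_seconds: float = 3600.0,
                 clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_per_second = capacity / per_seconds
        self.clock = clock  # injectable so tests can control time
        self.buckets: dict[str, Bucket] = {}  # stand-in for an ETS table

    def allow(self, client_ip: str) -> bool:
        """Consume one token for client_ip; return False when the bucket is empty."""
        now = self.clock()
        bucket = self.buckets.get(client_ip)
        if bucket is None:
            # New clients start with a full bucket.
            bucket = Bucket(tokens=self.capacity, updated_at=now)
            self.buckets[client_ip] = bucket
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - bucket.updated_at
        bucket.tokens = min(self.capacity,
                            bucket.tokens + elapsed * self.refill_per_second)
        bucket.updated_at = now
        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return True
        return False
```

A request handler would call allow(client_ip) before running the pipeline and reject the request when it returns False. Unlike a fixed hourly counter, the bucket permits short bursts while still capping the long-run rate.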
