How the RAG pipeline works
Technical overview of the consulting chat assistant on this site.
1. User question: LiveView chat form
2. Embed query: vector embedding via LLM API
3. Retrieve chunks: pgvector similarity search
4. Augment prompt: FAQ + portfolio context
5. LLM answer: Gemini / OpenAI completion
6. Rate limit & log: ETS token bucket + audit
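The embed, retrieve, and augment steps can be illustrated with a small, language-neutral sketch. This is not the site's code: `embed`, `retrieve`, `build_prompt`, the toy vocabulary, and the sample chunks are all hypothetical, and a bag-of-words vector stands in for the real embedding API. In production the ranking would presumably be done inside the database with pgvector (for example via its cosine-distance operator `<=>`) rather than in application code.

```python
import math
from collections import Counter

# Hypothetical stand-in for the embedding API call:
# a bag-of-words count vector over a tiny fixed vocabulary.
VOCAB = ["pricing", "rate", "project", "elixir", "contact", "hours"]


def embed(text):
    """Map text to a fixed-length vector of word counts (toy embedding)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in VOCAB]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, embed(chunk)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Prepend the retrieved context to the user's question."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Made-up knowledge chunks for illustration only.
chunks = [
    "Our hourly rate depends on project scope",
    "Most project work is built in Elixir",
    "Use the contact form to reach us",
]
question = "what is your hourly rate"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
```

The resulting `prompt` string is what would be sent to the completion model in step 5.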
Knowledge sources
- Static FAQ chunks in Content.knowledge_chunks/0
- Project case studies and about narrative
- Re-indexed via mix rag.reindex into pgvector
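As a rough picture of what a re-index pass might do, the sketch below rebuilds a store from a set of source texts. Everything here is an assumption for illustration: `reindex`, `chunk_id`, and `fake_embed` are hypothetical names, a plain dict stands in for the pgvector table, and the sample texts are invented. The actual `mix rag.reindex` task may work differently.

```python
import hashlib


def chunk_id(source, text):
    """Derive a stable ID from source name and chunk text."""
    digest = hashlib.sha256(f"{source}:{text}".encode("utf-8")).hexdigest()
    return digest[:16]


def fake_embed(text):
    """Placeholder vector; a real pipeline would call the embedding API here."""
    return [float(len(text)), float(text.count(" "))]


def reindex(sources, store):
    """Clear the store and rewrite every chunk; return the number of rows written."""
    store.clear()
    for source, texts in sources.items():
        for text in texts:
            store[chunk_id(source, text)] = {
                "source": source,
                "text": text,
                "embedding": fake_embed(text),
            }
    return len(store)


# Invented sample content, one entry per knowledge source.
sources = {
    "faq": ["What do you build?", "How do I get a quote?"],
    "case_studies": ["Migrated a billing system"],
}
store = {}
written = reindex(sources, store)
written_again = reindex(sources, store)  # same input gives the same rows
```

Because IDs are derived from content and the store is cleared first, running the pass twice on unchanged sources leaves the store in the same state.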
Production concerns
- Hourly rate limits per client IP
- Bilingual prompts keyed by locale
- Telegram webhook for async messaging
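The per-IP hourly limit can be pictured as a token bucket. The sketch below is a minimal illustration under stated assumptions, not the site's implementation: the site keeps its buckets in ETS, while this version uses an in-memory dict, and the `TokenBucket` class, its parameters, and the example IP addresses are hypothetical. An injectable clock makes the behaviour easy to check.

```python
import time


class TokenBucket:
    """Per-key token bucket that refills continuously up to a fixed capacity."""

    def __init__(self, capacity, refill_period_s, now=time.monotonic):
        self.capacity = capacity                # max requests per refill period
        self.refill_period_s = refill_period_s  # e.g. 3600 for an hourly limit
        self.now = now                          # injectable clock for testing
        self.buckets = {}                       # key -> (tokens, last_seen)

    def allow(self, key):
        """Consume one token for key if available; return whether the request may proceed."""
        t = self.now()
        tokens, last = self.buckets.get(key, (float(self.capacity), t))
        # Refill in proportion to elapsed time, capped at capacity.
        refill = (t - last) * self.capacity / self.refill_period_s
        tokens = min(float(self.capacity), tokens + refill)
        if tokens >= 1.0:
            self.buckets[key] = (tokens - 1.0, t)
            return True
        self.buckets[key] = (tokens, t)
        return False


# Fake clock so the example does not depend on real time.
clock = [0.0]
limiter = TokenBucket(capacity=2, refill_period_s=3600, now=lambda: clock[0])

results = [limiter.allow("203.0.113.7") for _ in range(3)]  # third call is denied
other_ip = limiter.allow("198.51.100.2")                    # separate bucket per IP
clock[0] = 3600.0                                           # one hour later
after_wait = limiter.allow("203.0.113.7")                   # bucket has refilled
```

Keying the bucket by client IP keeps one heavy user from exhausting the budget for everyone else, and the `allow` result is a natural value to record in the audit log.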