Technically transparent.
Emiri is not a "magic box". It's a concrete architecture: crawler → embeddings → RAG → response. We show you how it works so you know exactly what to expect.
Crawler builds the knowledge base
You provide a URL or a list of URLs. Playwright visits every subpage, extracts the HTML content, removes noise, and converts it to Markdown. Large pages are split into chunks and small ones are merged.
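To make "split large pages, merge small ones" concrete, here is a minimal sketch that works on the Markdown output of the crawler. The character limits `MAX_CHARS` and `MIN_CHARS` and both function names are hypothetical; Emiri's real chunk sizes and splitting rules are not described here. Large pages are split on paragraph boundaries (with a hard cut for a single oversized paragraph), and consecutive small pieces are packed together until they are big enough.

```python
# Hypothetical size limits in characters; Emiri's actual thresholds are not documented here.
MAX_CHARS = 1500
MIN_CHARS = 300


def split_large(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split one page's Markdown on paragraph boundaries so no piece exceeds max_chars."""
    pieces: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # A single paragraph that is too long on its own gets hard-cut.
        while len(para) > max_chars:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(para[:max_chars])
            para = para[max_chars:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            pieces.append(current)
            current = para
    if current:
        pieces.append(current)
    return pieces


def chunk_pages(pages: list[str], max_chars: int = MAX_CHARS, min_chars: int = MIN_CHARS) -> list[str]:
    """Split large pages and merge consecutive small pieces until they reach min_chars."""
    chunks: list[str] = []
    buffer = ""
    for page in pages:
        for piece in split_large(page, max_chars):
            if len(piece) >= min_chars:
                # Big enough on its own: flush pending small pieces first, keeping order.
                if buffer:
                    chunks.append(buffer)
                    buffer = ""
                chunks.append(piece)
                continue
            candidate = f"{buffer}\n\n{piece}" if buffer else piece
            if len(candidate) <= max_chars:
                buffer = candidate
            else:
                chunks.append(buffer)
                buffer = piece
            if len(buffer) >= min_chars:
                chunks.append(buffer)
                buffer = ""
    if buffer:
        chunks.append(buffer)
    return chunks
```

A real pipeline would also carry each chunk's source URL alongside the text so retrieved answers can cite their origin; that bookkeeping is left out to keep the sketch short.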
Supported: WordPress, Webflow, Next.js, Wix, static HTML, and even paginated sites. Crawling a typical business site (50–200 subpages) takes 2–5 minutes.
Extraction and embeddings
Claude Opus analyzes each chunk and extracts facts in a structured format: prices, hours, policies, product descriptions. OpenAI's text-embedding-3-small model then converts them into vectors.
Vectors are stored in PostgreSQL with the pgvector extension. Every fact is linked to its source URL and timestamp — Emiri always knows where the information came from.
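As an illustration of "every fact is linked to its source URL and timestamp", the sketch below turns an extraction model's JSON reply into typed records before they are embedded and stored. The JSON shape (`{"facts": [{"type": ..., "text": ...}]}`), the `Fact` record, `parse_facts`, and the allowed categories are all assumptions made for this example, not Emiri's actual schema. Malformed or off-topic entries are dropped rather than stored.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Hypothetical categories, based on the examples in the text (prices, hours, policies, products).
ALLOWED_KINDS = {"price", "hours", "policy", "product"}


@dataclass(frozen=True)
class Fact:
    kind: str
    text: str
    source_url: str         # page the chunk was crawled from
    extracted_at: datetime  # when this fact was extracted


def parse_facts(model_output: str, source_url: str, now: Optional[datetime] = None) -> list[Fact]:
    """Parse the (assumed) JSON reply of the extraction model into Fact records."""
    now = now or datetime.now(timezone.utc)
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError:
        return []  # unparseable reply: store nothing rather than guess
    if not isinstance(data, dict):
        return []
    facts: list[Fact] = []
    for item in data.get("facts", []):
        if not isinstance(item, dict):
            continue
        kind, text = item.get("type"), item.get("text")
        if kind in ALLOWED_KINDS and isinstance(text, str) and text.strip():
            facts.append(Fact(kind, text.strip(), source_url, now))
    return facts
```

Each `Fact.text` would then be passed to the embedding model, and the resulting vector stored in a pgvector column next to `source_url` and `extracted_at`.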
Retrieval Augmented Generation (RAG)
When a customer sends a message, Emiri embeds the question, runs a similarity search over the knowledge base to retrieve the top-K most relevant chunks, and builds a prompt with that context. Claude Haiku then generates the response.
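The retrieval step can be sketched in a few lines. This version ranks chunks by cosine similarity in memory and assembles a context-grounded prompt; it stands in for the pgvector query, where the same ranking would typically be expressed as `ORDER BY embedding <=> :query LIMIT k` (`<=>` is pgvector's cosine distance operator). The `Chunk` type, the tiny 2-dimensional vectors, and the prompt wording are assumptions for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source_url: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunks: list[Chunk], k: int = 3) -> list[tuple[float, Chunk]]:
    """Return the k chunks most similar to the query, best first, with their scores."""
    scored = [(cosine(query_vec, c.embedding), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question: str, hits: list[tuple[float, Chunk]]) -> str:
    """Put the retrieved chunks (tagged with their source URL) in front of the question."""
    context = "\n\n".join(f"[{c.source_url}]\n{c.text}" for _, c in hits)
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

In production the query vector would come from the same embedding model used for the knowledge base, so that questions and chunks live in the same vector space.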
Fallback: if the knowledge base doesn't contain enough information to answer, Emiri apologizes and asks for the customer's contact details; it never makes up facts. Optionally, it can collect the customer's email before forwarding the question.
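One common way to decide that "the knowledge base is insufficient" is a similarity cutoff: if no retrieved chunk scores above a threshold, the model is not called at all and a fixed fallback message is returned. The sketch below assumes that approach; the `MIN_SCORE` value, the message text, and the `generate` callable (a stand-in for the Claude Haiku call) are hypothetical, and Emiri may use a different sufficiency check.

```python
from typing import Callable

# Hypothetical cutoff; a real deployment would tune it on its own data.
MIN_SCORE = 0.75

FALLBACK = (
    "Sorry, I couldn't find that on this website. "
    "Could you leave your email so the team can get back to you?"
)


def answer_or_fallback(
    hits: list[tuple[float, str]],
    generate: Callable[[list[str]], str],
    min_score: float = MIN_SCORE,
) -> str:
    """Call the generator only with chunks above the cutoff; otherwise return the fallback."""
    relevant = [text for score, text in hits if score >= min_score]
    if not relevant:
        # Nothing relevant enough: do not let the model improvise an answer.
        return FALLBACK
    return generate(relevant)
```

Skipping the model call entirely when nothing passes the cutoff is a deliberate choice: the fallback cannot be talked around, and it costs no tokens.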
Bot protection — 4 layers
Protect your customers and yourself. Every conversation passes through domain verification (whitelist), rate limiting (Redis), a content filter (Claude moderation), and a prompt injection trap.
Domain whitelist: the widget only responds on pages you've approved. Rate limiting: 60 messages per minute per IP address. Prompt injection: every input is sanitized before it is sent to the model.
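The first two layers can be sketched as follows. The rate limiter is an in-memory fixed-window counter that mirrors a common Redis pattern (`INCR` on a per-IP, per-minute key plus `EXPIRE`); the limit of 60 comes from the text, while the class, `origin_allowed`, and the example `ALLOWED_DOMAINS` are hypothetical names. The domain check reads the request's `Origin` header, which browsers set reliably but non-browser clients can forge, so it complements rather than replaces rate limiting.

```python
import time
from collections import defaultdict
from typing import Callable, Optional
from urllib.parse import urlsplit

LIMIT_PER_MINUTE = 60  # from the text: 60 messages/minute per IP
ALLOWED_DOMAINS = {"example.com", "www.example.com"}  # hypothetical whitelist


class FixedWindowLimiter:
    """In-memory stand-in for a Redis INCR + EXPIRE counter keyed by IP and minute."""

    def __init__(self, limit: int = LIMIT_PER_MINUTE, clock: Callable[[], float] = time.time):
        self.limit = limit
        self.clock = clock
        self.window = -1
        self.counts: dict[str, int] = defaultdict(int)

    def allow(self, ip: str) -> bool:
        window = int(self.clock() // 60)
        if window != self.window:
            # New minute: old counters are stale (Redis would drop them via EXPIRE).
            self.counts.clear()
            self.window = window
        self.counts[ip] += 1
        return self.counts[ip] <= self.limit


def origin_allowed(origin: Optional[str], allowed: set[str] = ALLOWED_DOMAINS) -> bool:
    """Accept a request only if its Origin header's host is on the whitelist."""
    if not origin:
        return False
    host = urlsplit(origin).hostname  # urllib lowercases the hostname
    return host is not None and host in allowed
```

A fixed window is the simplest scheme and allows short bursts at window boundaries; a sliding-window counter smooths that out at the cost of a little more bookkeeping.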
Privacy and security
Data in EU
Servers on Hetzner CX32 in Germany. No data leaves the European Union.
GDPR ready
Export or delete all customer data from the dashboard. A DPA (Data Processing Agreement) is available on request.
RLS at the database level
PostgreSQL Row Level Security ensures that every tenant sees only their own data, even if something goes wrong in the application.
API keys never leave the server
Anthropic and OpenAI keys are stored server-side. The widget talks to our proxy, never directly to the Anthropic or OpenAI APIs.