A knowledge base that zep really understands.
Training is where everything the bot should later be able to answer comes together. You connect sources, check how they were split up and correct answers directly in the chat when they are off. Re-indexing runs automatically in the background, so you keep working without waiting.

Three steps, one clear workflow.
No plugin, no wizard with required fields. You work at your own pace and jump between the steps; everything syncs back to the dashboard automatically.
- 1. Sources
Notion, PDFs, websites, ZIP: connect it all.
Connect workspaces via OAuth, upload file bundles or enter URLs. zep checks format, language and size and creates an indexing plan, so you never have to configure a chunker.
- 2. Verification
Look through chunks and embeddings.
For each source you see the generated chunks, their token lengths and which passages are used most often in answers. Every chunk has a preview snippet, so the vector store is never a black box.
- 3. Correction
Wrong answer? Set it right in the chat.
When the bot gets it wrong, you correct it directly in the training chat. zep highlights the affected sources, suggests a better source or asks you for additional material, all without writing code.

Four steps from doc to answer.
When you add a new source, the following pipeline kicks off, tenant-isolated throughout and with an audit trail:
- Parsing & cleaning. We remove footers, cookie banners, navigation and boilerplate. For PDFs we detect columns, tables and code blocks separately. Discord and forum exports are grouped into threads.
- Chunking & embeddings. Chunk sizes adapt to the content, since code needs different boundaries than prose. Embeddings come from a current multilingual encoder; we pick the model automatically based on the language and domain of your source.
- Re-ranking. On the answer side, a two-stage retriever runs: first a vector search, then a lightweight re-ranker model. This keeps the bot from returning snippets that are merely similar but don't actually help.
- Source verification. Every answer contains references. When you click an answer in the training chat, you jump to the exact spot in the source. You can blacklist wrong sources, and zep will never use them again.
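The retrieval side of this pipeline can be sketched in a few lines. This is a minimal illustration under stated assumptions, not zep's implementation: chunk vectors are assumed to be precomputed, `overlap_score` is a hypothetical toy stand-in for the re-ranker model (real systems typically use a trained cross-encoder), and the `Chunk` fields `source_id` and `offset` are illustrative names for the data behind each citation. Blacklisted sources are filtered out before any scoring.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source_id: str              # hypothetical: which source the chunk came from
    offset: int                 # hypothetical: position in the source, used for the citation link
    text: str
    vector: tuple[float, ...]   # assumed to be precomputed by the embedding step


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def overlap_score(query, text):
    """Toy stand-in for a re-ranker model: share of query terms that appear in the chunk."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def retrieve(query, query_vec, chunks, blacklist, k=10, n=3):
    # Blacklisted sources are dropped before any scoring, so they can never be cited.
    candidates = [c for c in chunks if c.source_id not in blacklist]
    # Stage 1: vector search, keep the k nearest chunks.
    shortlist = sorted(candidates, key=lambda c: cosine(query_vec, c.vector), reverse=True)[:k]
    # Stage 2: re-rank only the shortlist with the (typically more expensive) re-ranker.
    reranked = sorted(shortlist, key=lambda c: overlap_score(query, c.text), reverse=True)
    # Return references along with the text so each answer can link back to its source.
    return [(c.source_id, c.offset, c.text) for c in reranked[:n]]


chunks = [
    Chunk("blog", 5, "our team loves fast coffee", (1.0, 0.0)),
    Chunk("faq", 0, "shipping takes 3 days", (0.9, 0.1)),
    Chunk("old-faq", 0, "shipping takes 10 days", (1.0, 0.0)),
]
print(retrieve("how long does shipping take", (1.0, 0.0), chunks, blacklist={"old-faq"}, k=2, n=1))
# [('faq', 0, 'shipping takes 3 days')]
```

In this example the coffee snippet is the closest vector match but shares no terms with the question, so the second stage moves the shipping FAQ to the top, while the blacklisted `old-faq` never enters the candidate set.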
Typical use cases we see every day.
Notion workspace as FAQ
Connect via OAuth and select databases; re-indexing then happens automatically on every change. Update the shipping FAQ and the bot knows right away.
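Re-indexing only what changed is what keeps updates like this fast. How zep detects changes is not described here; one common approach, assumed in this sketch, is to store a content hash per page and compare it on each sync. `plan_reindex` and its dictionary inputs are hypothetical names for illustration.

```python
import hashlib


def content_hash(text: str) -> str:
    """Fingerprint of a page's content; any edit changes the hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(stored: dict[str, str], current: dict[str, str]) -> tuple[list[str], list[str]]:
    """Decide which pages need work on this sync.

    stored:  page_id -> hash recorded at the last indexing run (hypothetical store)
    current: page_id -> page text as fetched from the workspace now
    Returns (pages to (re-)index, pages whose chunks should be deleted).
    """
    to_index = sorted(pid for pid, text in current.items() if stored.get(pid) != content_hash(text))
    to_delete = sorted(pid for pid in stored if pid not in current)
    return to_index, to_delete


stored = {"shipping": content_hash("Delivery in 5 days"), "returns": content_hash("30-day returns")}
current = {"shipping": "Delivery in 3 days", "returns": "30-day returns", "payment": "We accept cards"}
print(plan_reindex(stored, current))
# (['payment', 'shipping'], [])
```

Only the edited shipping page and the new payment page are re-indexed; the unchanged returns page is skipped entirely.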
Online course from scripts
Upload PDFs and explain the module order to zep. When learners ask, the bot cites the module number and chapter.
Coding knowledge from your own codebase
Upload a ZIP of your resources and zep indexes files, functions and patterns. The coding bot knows your stack and answers in diff format.
Club wiki from forum + Discord
Combine a forum export with a Discord JSON export: threads are grouped and frequent questions are detected automatically. The result is a bot for new members, finished in a single session.
Multi-language support
Sources can be mixed. zep detects the language of each answer and uses the matching embedding model, so service bots in DE/EN/FR need no separate setups.
Live correction by the team
Several editor roles can work in the training chat in parallel. Corrections are versioned, so you see who changed what and when.
FAQ.
How quickly is a new source ready to use?
Small texts (under 50 pages) are indexed in 5–15 seconds. Medium-sized Notion workspaces with a few hundred pages take 30–90 seconds. Larger PDFs (200+ pages) or entire websites are indexed in the background: the bot keeps answering from the previous version and switches over automatically as soon as the new version is ready.
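Answering from the previous version while a new one is built is essentially a double-buffered index swap. The sketch below is an assumption about how such a switch can work, not zep's code: a plain dict stands in for a real search index and `VersionedIndex` is a hypothetical class. The slow build runs outside the lock and only the final swap is guarded, so queries never wait for the build and never see a half-built index.

```python
import threading


class VersionedIndex:
    """Serves queries from the live index while a replacement is being built.

    A plain dict stands in for a real search index (assumption for illustration).
    """

    def __init__(self, initial: dict):
        self._lock = threading.Lock()
        self._live = initial
        self.version = 1

    def search(self, query: str):
        with self._lock:
            index = self._live          # grab the current version
        return index.get(query)         # query it without holding the lock

    def rebuild(self, build_fn) -> None:
        new_index = build_fn()          # slow indexing runs outside the lock
        with self._lock:
            self._live = new_index      # switch versions in one step
            self.version += 1
```

While `rebuild` is running, `search` keeps returning results from the old dict; once the build function returns, the next query hits the new one.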
What if the bot gives a wrong answer?
You mark the answer as wrong in the training chat. zep shows you the underlying snippets, and you can blacklist the source, re-chunk it or add a better one. Corrections take effect immediately: no re-upload and no re-indexing of the whole knowledge base.
Where are my embeddings stored?
Tenant-isolated in our vector store, hosted in the EU (Hetzner Falkenstein). No cross-tenant indexing, no reuse of your embeddings for other bots. On the Pro plan you can also export embeddings and keep them on your own infrastructure.
Which LLM models do you use for the answer?
By default, we use a performance-optimized all-round model (German-speaking region). On the Pro plan you can choose per bot between reasoning models, speed models and long-context models. On the Business plan you can also bring your own API keys.
Build a knowledge base without writing YAML.
On the Free plan you can try the complete Training experience. Three sources and one bot persona are enough to see whether the workflow fits your material.
