Reranker explained - when the second sorter pays off

A reranker is the second sorting stage in your Zeptix bot. The first stage quickly searches for many potential hits. The reranker takes another close look at these hits and re-sorts them by true relevance. The result: more precise answers without giving up recall. But it costs latency and sometimes money.

TL;DR

Reranker = second sorting round after the search.
BGE Base is free, runs locally on Zeptix servers, and is available from the Pro plan.
Cohere v3 is more precise, costs API fees, available from the Pro plan.
Activating it pays off with larger knowledge bases or tricky questions.
Latency rises by 150 to 300 ms per answer.

What does a reranker actually do?

Imagine you ask your bot: "Which helmet is suitable for long tours?" The search node finds 16 snippets that somehow relate to helmets and tours. These could be:

A snippet about helmet materials.
A snippet about tour planning.
A snippet about head shapes for sports helmets.
A snippet about helmet comfort on long distances.

The order of these hits determines what your language model ends up phrasing. The reranker reads each hit together with the question and assigns each one a relevance score. As a result, "helmet comfort on long distances" moves to the top - and the answer improves noticeably.
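The idea can be sketched in a few lines of Python. This is a toy illustration, not the Zeptix implementation: `toy_relevance` is a hypothetical stand-in that merely counts shared words, whereas a real reranker such as BGE Base or Cohere v3 scores each question-snippet pair with a trained model.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for real language understanding."""
    return set(re.findall(r"[a-z]+", text.lower()))

def toy_relevance(question: str, snippet: str) -> float:
    """Hypothetical scorer: share of question words that also occur in the snippet."""
    q = tokens(question)
    return len(q & tokens(snippet)) / len(q) if q else 0.0

def rerank(question: str, hits: list[str], scorer=toy_relevance) -> list[str]:
    """Score every hit together with the question, then sort best-first."""
    return sorted(hits, key=lambda hit: scorer(question, hit), reverse=True)

question = "Which helmet is suitable for long tours?"
hits = [
    "Helmet materials",
    "Tour planning",
    "Head shapes for sports helmets",
    "Helmet comfort on long distances",
]
reranked = rerank(question, hits)
print(reranked[0])
```

Even with this crude scorer, the snippet about comfort on long distances ends up first, because it shares the most words with the question.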

Why the first sorting alone is not enough

The fast pre-search is based on vector similarity. It is good at finding thematically matching content, but not at judging which snippet actually answers the question best. Example: for the question "What does the Pro plan cost?" the pre-search finds ten snippets that are about pricing. The reranker recognizes that only three of them specifically concern the Pro plan, and sorts those to the top.

Engines compared

Engine	Plan	Latency	Cost	Quality
Off	All	0 ms	0 EUR	Basic
BGE Base	Pro+	+150 ms	0 EUR (local)	Good
Cohere v3	Pro+	+250 ms	API fees	Very good

BGE Base runs on Zeptix servers and costs you nothing extra. Cohere v3 is the more accurate option, but requests go to an external provider. Both are only available from the Pro plan onward - on Free, Starter and PAYG only "Off" is permitted. The Visualizer shows impermissible options grayed out.
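The table above can be read as a small lookup, sketched here in Python. The engine keys, plan identifiers and function names are assumptions made for illustration. Only the latency figures and the plan restrictions come from this article.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Engine:
    added_latency_ms: int   # extra latency per answer, from the table
    external_api: bool      # True if requests leave Zeptix servers

# Hypothetical keys; values copied from the comparison table.
ENGINES = {
    "off": Engine(0, False),
    "bge-base": Engine(150, False),
    "cohere-v3": Engine(250, True),
}

# Plans that the article names as not supporting a reranker.
PLANS_WITHOUT_RERANKER = {"free", "starter", "payg"}

def allowed_engines(plan: str) -> list[str]:
    """Return the engines selectable on a plan (everything else is grayed out)."""
    if plan.lower() in PLANS_WITHOUT_RERANKER:
        return ["off"]
    return list(ENGINES)

def estimated_latency_ms(base_ms: int, engine: str) -> int:
    """Add the engine's latency to the bot's base latency."""
    return base_ms + ENGINES[engine].added_latency_ms

print(allowed_engines("Starter"))
print(estimated_latency_ms(800, "bge-base"), estimated_latency_ms(800, "cohere-v3"))
```

For a bot with a base latency of 800 ms, this gives 950 ms with BGE Base and 1050 ms with Cohere v3.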

When does the reranker pay off?

Turn the reranker on when:

Your knowledge base has over 100 documents or more than 500 snippets.
End users often ask with synonyms or colloquially.
Answers sometimes hit the right topic but the wrong aspect.
You want to raise Top-K (to 12 or 16) without losing precision.

Leaving the reranker off is enough when:

Your knowledge base is small (under 50 documents).
End users phrase their questions much like your content is phrased.
Latency is critical (e.g. voice bots with a real-time requirement).
Your plan does not support it (Free, Starter, PAYG).

How to activate the reranker

Open https://zeptix.dev/visualizer and select your bot.
In the canvas you see the "Sharpen selection" node - if it is gray, the reranker is off.
Click the node; the inspector opens on the right.
Under Reranker engine, choose either "BGE Base (local)" or "Cohere v3".
Leave the rerank pool at 50 - that fits most bots.
Save, then run a live preview with three test questions.

The "Sharpen selection" node turns green - that is the visual confirmation that the stage is active. The status bar at the bottom also shows a higher estimated latency (from 0.8 s to roughly 1.0-1.1 s).

Rerank pool - what does the number mean?

The rerank pool defines how many initial hits the reranker re-sorts. Higher is not automatically better:

Pool	Effect
20	Only the already-good hits get re-sorted. Fast.
50	Default. The reranker gets enough choice without slowing things down too much.
100	Very broad. Useful for diverse knowledge bases, about 50 ms slower.

For 90 percent of all bots, 50 is the right choice.
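Why the pool size matters can be shown with a minimal Python sketch. The function name, the `top_k` default of 8 and the toy data are assumptions for illustration. The point is only that hits beyond the pool are never scored.

```python
from typing import Callable

def rerank_with_pool(
    ranked_hits: list[str],
    scorer: Callable[[str], float],
    pool: int = 50,
    top_k: int = 8,
) -> list[str]:
    """Re-sort only the first `pool` hits, then keep the best `top_k`."""
    candidates = ranked_hits[:pool]  # hits beyond the pool are never scored
    candidates = sorted(candidates, key=scorer, reverse=True)
    return candidates[:top_k]

# Toy data: 100 first-stage hits; the truly best one sits at position 30.
hits = [f"snippet-{i}" for i in range(1, 101)]

def toy_score(hit: str) -> float:
    """Hypothetical scorer that knows snippet-30 is the best answer."""
    return 1.0 if hit == "snippet-30" else 0.0

small = rerank_with_pool(hits, toy_score, pool=20)
default = rerank_with_pool(hits, toy_score, pool=50)
print("snippet-30" in small, default[0])
```

With a pool of 20 the reranker never sees snippet-30. With the default of 50 it moves to the top.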

Common mistakes

Reranker on + Top-K = 4: Gains almost nothing, because the reranker has too little choice. Set Top-K to at least 8.
Reranker on + threshold = 0.70: After the threshold is applied, only three hits may remain - the reranker has nothing to sort.
Reranker on without testing: Sometimes the order is already fine without it. Before activating, compare two to three test questions in the live preview.
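The first two mistakes can be caught with a simple check, sketched here in Python. The function and parameter names are hypothetical, and treating every threshold of 0.70 or higher as too strict is an assumption derived from the example above, not a documented Zeptix rule.

```python
def check_reranker_config(reranker_on: bool, top_k: int, threshold: float) -> list[str]:
    """Return warnings for settings that leave the reranker little to do."""
    warnings: list[str] = []
    if not reranker_on:
        return warnings  # nothing to check while the reranker is off
    if top_k < 8:
        warnings.append("Top-K below 8: the reranker has too little choice.")
    if threshold >= 0.70:  # assumed cut-off, taken from the example above
        warnings.append("Threshold 0.70 or higher: too few hits may remain to re-sort.")
    return warnings

for message in check_reranker_config(reranker_on=True, top_k=4, threshold=0.70):
    print(message)
```

The third mistake cannot be detected automatically. It still needs a manual comparison in the live preview.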

For advanced users

If you combine the reranker with multi-query (several question variants at once), you get the best pipeline quality - but also the highest latency and cost. This is only sensible for high-value B2B bots where answer quality matters more than speed.

Next steps

Understand Precision or Recall before you switch on the reranker.
Optimize the knowledge base itself - see Improve knowledge base quality.
If you want to dive even deeper, take a look at Persona tuning.
