examples · vector · 6 / 7
6. Fast mode
← Vector exampleswhat this does
Add "mode": "fast" to a topk request. The engine searches a narrower part of the index and returns sooner. The top hits are usually the same as "high_recall", but a few correct results near the boundary may be missed - this is the recall/latency trade-off.
Default is "high_recall". If you don't set mode, you get high recall.
when to use it
- RAG behind a re-ranker. If a cross-encoder re-scores the top 50, a missed item at rank 47 doesn't matter.
- Hot dashboards. Suggestion panels, "people also viewed", autocomplete.
- Agent inner loops. Many small calls where each ms compounds.
when not to use it
- First-pass retrieval correctness. If a missed hit at rank 8 means a wrong answer downstream, stay on high recall.
- Citation lookup. "Find the source for this sentence" needs the right hit, not a near one.
- Small tables. Under ~50k vectors high recall is already fast - fast mode buys you nothing.
the request
POST /v1/tenants/:t/vector/:table/topk
curl -X POST "https://$OC_HOST/v1/tenants/$OC_TENANT/vector/shop.products/topk" \
-H "Authorization: Bearer $OC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"query": [0.0124, -0.0883, 0.0451, /* ... 768 floats ... */],
"k": 10,
"dim": 768,
"metric": "cosine",
"mode": "fast"
}'hits = db.vector.topk(
"shop.products",
query=query_768d,
k=10,
metric="cosine",
mode="fast",
)const hits = await db.vectorTopk("shop.products", {
query: query768d,
k: 10,
dim: 768,
metric: "cosine",
mode: "fast",
});hits, err := db.VectorTopK(ctx, "shop.products", originchain.VectorTopKRequest{
Query: query768d,
K: 10,
Dim: 768,
Metric: "cosine",
Mode: originchain.ModeFast,
}) what you get back
{
"hits": [
{ "id": "sku-9281", "score": 0.9421 },
{ "id": "sku-1144", "score": 0.9187 }
/* ... up to k entries ... */
]
} Same response shape as the default. The scores you see are real - the engine never fabricates a score for a hit it didn't actually compare.
the mode field
| Value | Behaviour |
|---|---|
| "high_recall" | Default. Wider index sweep. Use when correctness of the top hits matters more than latency. |
| "fast" | Narrower sweep. Lower latency, slightly lower recall at the tail of the result list. |
common mistakes
- Using fast everywhere. It's the right call for some workloads but recall drops noticeably on long-tail queries. Measure end-to-end quality before flipping the switch globally.
- Treating fast as an SLA. Fast mode reduces latency, but a hot table or a cold cache still moves the number. Watch p95, not a single sample.