
6. Fast mode

what this does

Add "mode": "fast" to a topk request. The engine searches a narrower part of the index and returns sooner. The top hits are usually the same as "high_recall", but a few correct results near the boundary may be missed - this is the recall/latency trade-off.

The default is "high_recall": if you omit mode, the request runs in high-recall mode.
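One way to put a number on the trade-off described above is to run the same query in both modes and compare the returned ids. The sketch below assumes you already have both hit lists as lists of {"id", "score"} objects; recall_at_k is a hypothetical helper written for this example, not part of the SDK, and the sample data is made up.

```python
def recall_at_k(fast_hits, reference_hits, k):
    """Fraction of the top-k reference ids that also appear in the top-k fast ids."""
    reference_ids = {hit["id"] for hit in reference_hits[:k]}
    if not reference_ids:
        return 1.0  # nothing to miss
    fast_ids = {hit["id"] for hit in fast_hits[:k]}
    return len(reference_ids & fast_ids) / len(reference_ids)


# Illustrative data only, not real engine output.
reference = [
    {"id": "a", "score": 0.94},
    {"id": "b", "score": 0.91},
    {"id": "c", "score": 0.88},
    {"id": "d", "score": 0.80},
]
fast = [
    {"id": "a", "score": 0.94},
    {"id": "b", "score": 0.91},
    {"id": "c", "score": 0.88},
    {"id": "e", "score": 0.79},  # the boundary hit "d" was missed
]

print(recall_at_k(fast, reference, k=4))  # 0.75
```

Averaging this over a sample of real queries gives a rough view of what fast mode costs on your own data.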

when to use it

RAG behind a re-ranker. If a cross-encoder re-scores the top 50, a missed item at rank 47 doesn't matter.
Hot dashboards. Suggestion panels, "people also viewed", autocomplete.
Agent inner loops. Many small calls, where every millisecond adds up.
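For the re-ranker case, the reason a miss deep in the candidate list rarely matters is that the final order comes from the second-stage scores, not from the retrieval order. A minimal sketch, assuming the candidates arrive as {"id", "score"} objects; rerank and the toy scoring table are hypothetical stand-ins for a real cross-encoder.

```python
def rerank(candidates, score_fn, keep):
    """Re-score every candidate with score_fn and return the ids of the best `keep`."""
    rescored = [(score_fn(hit["id"]), hit["id"]) for hit in candidates]
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in rescored[:keep]]


# Illustrative candidates, as if returned by a fast-mode topk call.
candidates = [
    {"id": "doc-1", "score": 0.94},
    {"id": "doc-2", "score": 0.91},
    {"id": "doc-3", "score": 0.90},
]

# Stand-in for a cross-encoder: a fixed lookup of second-stage scores.
toy_scores = {"doc-1": 0.2, "doc-2": 0.9, "doc-3": 0.5}

print(rerank(candidates, toy_scores.get, keep=2))  # ['doc-2', 'doc-3']
```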

when not to use it

First-pass retrieval correctness. If a missed hit at rank 8 means a wrong answer downstream, stay on high recall.
Citation lookup. "Find the source for this sentence" needs the right hit, not a near one.
Small tables. Under ~50k vectors high recall is already fast - fast mode buys you nothing.
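The guidance in the two lists above can be folded into one small decision helper. This is only a sketch: choose_mode and its parameters are hypothetical, and the 50,000 cutoff is the rough figure quoted above rather than a hard limit.

```python
SMALL_TABLE_VECTORS = 50_000  # rough threshold from the guidance above, not a hard limit


def choose_mode(num_vectors, needs_exact_top_hits):
    """Pick a mode string for a topk request from table size and workload."""
    if needs_exact_top_hits:
        # Citation lookup, first-pass correctness: a missed hit is a wrong answer.
        return "high_recall"
    if num_vectors < SMALL_TABLE_VECTORS:
        # Small tables are already fast in high recall mode.
        return "high_recall"
    return "fast"


print(choose_mode(2_000_000, needs_exact_top_hits=False))  # fast
print(choose_mode(2_000_000, needs_exact_top_hits=True))   # high_recall
print(choose_mode(10_000, needs_exact_top_hits=False))     # high_recall
```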

the request

POST /v1/tenants/:t/vector/:table/topk

curl -X POST "https://$OC_HOST/v1/tenants/$OC_TENANT/vector/shop.products/topk" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query":  [0.0124, -0.0883, 0.0451, /* ... 768 floats ... */],
    "k":      10,
    "dim":    768,
    "metric": "cosine",
    "mode":   "fast"
  }'

hits = db.vector.topk(
    "shop.products",
    query=query_768d,
    k=10,
    metric="cosine",
    mode="fast",
)

const hits = await db.vectorTopk("shop.products", {
  query:  query768d,
  k:      10,
  dim:    768,
  metric: "cosine",
  mode:   "fast",
});

hits, err := db.VectorTopK(ctx, "shop.products", originchain.VectorTopKRequest{
    Query:  query768d,
    K:      10,
    Dim:    768,
    Metric: "cosine",
    Mode:   originchain.ModeFast,
})

what you get back

{
  "hits": [
    { "id": "sku-9281", "score": 0.9421 },
    { "id": "sku-1144", "score": 0.9187 }
    /* ... up to k entries ... */
  ]
}

Same response shape as the default. The scores you see are real - the engine never fabricates a score for a hit it didn't actually compare.
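Because the shape does not change between modes, the same parsing code works for both. A minimal sketch using only the standard library, assuming the raw response body is available as a string; hit_ids is a hypothetical helper and the body is a trimmed copy of the example above.

```python
import json


def hit_ids(body):
    """Return the ids from a topk response body, in the order the engine ranked them."""
    payload = json.loads(body)
    return [hit["id"] for hit in payload.get("hits", [])]


# Trimmed copy of the example response above.
body = '{"hits": [{"id": "sku-9281", "score": 0.9421}, {"id": "sku-1144", "score": 0.9187}]}'

print(hit_ids(body))  # ['sku-9281', 'sku-1144']
```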

the mode field

Value	Behaviour
"high_recall"	Default. Wider index sweep. Use when correctness of the top hits matters more than latency.
"fast"	Narrower sweep. Lower latency, slightly lower recall at the tail of the result list.

common mistakes

Using fast everywhere. It's the right call for some workloads, but recall drops noticeably on long-tail queries. Measure end-to-end quality before flipping the switch globally.
Treating fast as an SLA. Fast mode reduces latency, but a hot table or a cold cache still moves the number. Watch p95, not a single sample.
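To watch p95 rather than a single sample, collect latencies over many requests and take a percentile. The sketch below uses the nearest-rank method; percentile is a hypothetical helper and the latency values are invented for illustration.

```python
def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, which avoids floating-point rounding surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Invented latencies in milliseconds: mostly quick, with two slow outliers.
latencies_ms = [12, 11, 13, 12, 14, 11, 12, 13, 95, 12,
                11, 13, 12, 14, 12, 11, 13, 12, 12, 40]

print(percentile(latencies_ms, 50))  # 12
print(percentile(latencies_ms, 95))  # 40
```

The median looks healthy here while p95 shows the slow tail, which is the number a hot table or a cold cache will move.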