OriginChain docs
examples · vector · 6 / 7

6. Fast mode

← Vector examples
what this does

Add "mode": "fast" to a topk request. The engine searches a narrower part of the index and returns sooner. The top hits are usually the same as "high_recall", but a few correct results near the boundary may be missed - this is the recall/latency trade-off.

Default is "high_recall". If you don't set mode, you get high recall.

when to use it
  • RAG behind a re-ranker. If a cross-encoder re-scores the top 50, a missed item at rank 47 doesn't matter.
  • Hot dashboards. Suggestion panels, "people also viewed", autocomplete.
  • Agent inner loops. Many small calls where each ms compounds.
when not to use it
  • First-pass retrieval correctness. If a missed hit at rank 8 means a wrong answer downstream, stay on high recall.
  • Citation lookup. "Find the source for this sentence" needs the right hit, not a near one.
  • Small tables. Under ~50k vectors high recall is already fast - fast mode buys you nothing.
the request
POST /v1/tenants/:t/vector/:table/topk
curl -X POST "https://$OC_HOST/v1/tenants/$OC_TENANT/vector/shop.products/topk" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query":  [0.0124, -0.0883, 0.0451, /* ... 768 floats ... */],
    "k":      10,
    "dim":    768,
    "metric": "cosine",
    "mode":   "fast"
  }'
what you get back
{
  "hits": [
    { "id": "sku-9281", "score": 0.9421 },
    { "id": "sku-1144", "score": 0.9187 }
    /* ... up to k entries ... */
  ]
}

Same response shape as the default. The scores you see are real - the engine never fabricates a score for a hit it didn't actually compare.

the mode field
ValueBehaviour
"high_recall"Default. Wider index sweep. Use when correctness of the top hits matters more than latency.
"fast"Narrower sweep. Lower latency, slightly lower recall at the tail of the result list.
common mistakes
  • Using fast everywhere. It's the right call for some workloads but recall drops noticeably on long-tail queries. Measure end-to-end quality before flipping the switch globally.
  • Treating fast as an SLA. Fast mode reduces latency, but a hot table or a cold cache still moves the number. Watch p95, not a single sample.