5. BM25 ranked retrieval
what this does
Returns the top k documents ranked by BM25 relevance to the query. The response shape changes from doc_ids to hits: each entry is { doc_id, score }, and entries are sorted by descending score. A higher score means a more relevant document.
when to use it
- Search bars: users expect the best match at the top.
- Recommendations: rank a candidate pool by textual similarity.
- Any time you want "best k" rather than "all that match".
the request
GET /v1/tenants/:t/fts/:schema/:field?q=...&mode=bm25&k=10
curl -G "https://$OC_HOST/v1/tenants/$OC_TENANT/fts/shop.products/description" \
  -H "Authorization: Bearer $OC_TOKEN" \
  --data-urlencode "q=wireless headphones" \
  --data-urlencode "mode=bm25" \
  --data-urlencode "k=10"

hits = db.fts.search(
    "shop.products",
    "description",
    q="wireless headphones",
    mode="bm25",
    k=10,
)
for hit in hits.hits:
    print(hit["doc_id"], hit["score"])

const result = await db.ftsSearch("shop.products", "description", {
  q: "wireless headphones",
  mode: "bm25",
  k: 10,
});
for (const hit of result.hits) {
  console.log(hit.doc_id, hit.score);
}

result, err := db.FTSSearch(ctx, "shop.products", "description", originchain.FTSSearchRequest{
    Q:    "wireless headphones",
    Mode: "bm25",
    K:    10,
})
if err != nil {
    // Handle the error instead of discarding it.
    log.Fatal(err)
}
for _, hit := range result.Hits {
    fmt.Println(hit.DocID, hit.Score)
}

what you get back
{
  "mode": "bm25",
  "hits": [
    { "doc_id": "p001", "score": 9.42 },
    { "doc_id": "p027", "score": 7.18 },
    { "doc_id": "p014", "score": 4.55 }
  ]
}

how it works
- The query is tokenised, then for each token the engine fetches the posting list along with term frequencies and document lengths.
- Each candidate document gets a BM25 score using the Lucene defaults: k1 = 1.2, b = 0.75. Score grows with how often the rare terms appear and shrinks if the document is much longer than average.
- A top-k heap keeps the highest scorers; everything else is discarded.
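The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the engine's actual code: the tokeniser (lowercase whitespace split), the in-memory posting lists, and the function name `bm25_top_k` are hypothetical, and the scoring uses the textbook BM25 formula with the k1 and b values quoted above. The engine's exact variant may differ in detail.

```python
import heapq
import math
from collections import Counter, defaultdict

K1, B = 1.2, 0.75  # the defaults quoted above


def tokenise(text):
    # Assumption: lowercase whitespace split; the real analyser is not specified.
    return text.lower().split()


def bm25_top_k(query, docs, k):
    """Return the k best (doc_id, score) pairs for `query` over docs {id: text}."""
    tokens = {doc_id: tokenise(text) for doc_id, text in docs.items()}
    n_docs = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / n_docs

    # Posting lists: term -> {doc_id: term frequency}.
    postings = defaultdict(dict)
    for doc_id, toks in tokens.items():
        for term, tf in Counter(toks).items():
            postings[term][doc_id] = tf

    scores = defaultdict(float)
    for term in set(tokenise(query)):  # each distinct query term counted once
        plist = postings.get(term, {})
        if not plist:
            continue
        df = len(plist)
        # Rare terms (small df) get a larger weight.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in plist.items():
            # Documents longer than average are penalised through `norm`.
            norm = 1 - B + B * len(tokens[doc_id]) / avg_len
            scores[doc_id] += idf * tf * (K1 + 1) / (tf + K1 * norm)

    # Bounded min-heap: the weakest of the current top k sits at heap[0].
    heap = []
    for doc_id, score in scores.items():
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))
    return [(doc_id, score) for score, doc_id in sorted(heap, reverse=True)]
```

The heap never holds more than k entries, which is why a small k keeps the ranking step cheap even when many documents match.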
Optional query params: fuzzy=1 for single-character typo tolerance, highlight=true to return matched-snippet text, facets=col,col for grouped counts alongside hits.
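As a rough illustration of how the optional parameters combine with the base request, the sketch below only builds the request URL. The helper name `bm25_search_url`, the host and tenant values, and the facet columns `brand` and `color` are placeholders, not part of the API. The response fields these options add are not shown here because this page does not document them.

```python
from urllib.parse import urlencode


def bm25_search_url(host, tenant, schema, field, q, k=10,
                    fuzzy=False, highlight=False, facets=()):
    """Build a BM25 search URL; optional params are added only when requested."""
    params = {"q": q, "mode": "bm25", "k": k}
    if fuzzy:
        params["fuzzy"] = 1  # single-character typo tolerance
    if highlight:
        params["highlight"] = "true"  # return matched-snippet text
    if facets:
        params["facets"] = ",".join(facets)  # grouped counts alongside hits
    # safe="," keeps the facet list readable instead of encoding commas as %2C.
    query = urlencode(params, safe=",")
    return f"https://{host}/v1/tenants/{tenant}/fts/{schema}/{field}?{query}"


url = bm25_search_url(
    "example.test", "acme", "shop.products", "description",
    "wireless headphones", fuzzy=True, highlight=True,
    facets=("brand", "color"),  # hypothetical column names
)
print(url)
```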
common mistakes
- Comparing scores across queries. A BM25 score of 9.4 means nothing on its own and can't be compared to the 9.4 from a different query. Use scores to rank within one result set only.
- Forgetting k. If you omit it, you get a default cap. Set k to what you actually need; ranking the entire corpus is wasted work.
- Reaching for BM25 when boolean would do. If you only need "does it match", boolean is cheaper and the answer is the same set.
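To see why scores from different queries are not comparable, consider a single-term score under the textbook BM25 formula. This is a sketch with made-up corpus statistics, and `term_score` is a hypothetical helper rather than anything the API exposes. The two calls describe an equally good match (same term frequency, same document length); only the rarity of the query term differs.

```python
import math

K1, B = 1.2, 0.75  # the defaults quoted in "how it works"


def term_score(tf, df, n_docs, doc_len, avg_len):
    """Textbook BM25 contribution of one query term to one document."""
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    norm = 1 - B + B * doc_len / avg_len
    return idf * tf * (K1 + 1) / (tf + K1 * norm)


# Illustrative numbers: a 1000-document corpus, average-length document, tf = 2.
rare = term_score(tf=2, df=5, n_docs=1000, doc_len=100, avg_len=100)
common = term_score(tf=2, df=500, n_docs=1000, doc_len=100, avg_len=100)
print(f"rare-term query: {rare:.2f}  common-term query: {common:.2f}")
```

The rare-term query scores several times higher even though the match quality is identical, so a fixed score threshold applied across queries would be meaningless. Position within one result set is the only signal to rely on.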