5. Top-k - dot product

what this does

Rank by dot product (inner product): the sum of the element-wise products of the query and each stored vector. When both sides are unit-normalised (L2 length 1), the dot product equals cosine similarity, so the engine can skip the per-call normalisation that cosine requires, which makes it slightly cheaper.
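The equivalence between dot product and cosine similarity on unit-length vectors can be checked with a few lines of plain Python. This is only an illustrative sketch: the helper names `dot`, `l2_normalise`, and `cosine` are made up for this example and are not part of the client library.

```python
import math

def dot(a, b):
    """Sum of the element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def l2_normalise(v):
    """Scale v to L2 length 1 (hypothetical helper, not an SDK call)."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity computed the long way, norms included."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

query = [3.0, 4.0]
stored = [1.0, 2.0]

q_unit = l2_normalise(query)
s_unit = l2_normalise(stored)

# On unit vectors the plain dot product matches cosine similarity.
assert math.isclose(dot(q_unit, s_unit), cosine(query, stored))
```

The cosine path pays for two norms on every comparison; the dot path pays for them once, at normalisation time.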

when to use it

You already L2-normalise every vector before storing - common with OpenAI text-embedding-3-small and Cohere v3.
You normalise the query the same way before calling topk.
You want the lowest-overhead similarity metric the engine offers.
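If you are unsure whether a vector really is unit length before it reaches topk, a small pre-flight check can help. The sketch below assumes nothing about the client library: `is_unit_length` and `prepare_query` are hypothetical helpers, and the tolerance of 1e-4 is an arbitrary choice meant to absorb floating-point rounding.

```python
import math

def l2_norm(vec):
    """L2 length of vec, using fsum to limit rounding error."""
    return math.sqrt(math.fsum(x * x for x in vec))

def is_unit_length(vec, tol=1e-4):
    """True if the L2 norm of vec is within tol of 1.0."""
    return abs(l2_norm(vec) - 1.0) <= tol

def prepare_query(vec, tol=1e-4):
    """Return vec unchanged if already unit length, else a normalised copy."""
    norm = l2_norm(vec)
    if norm == 0.0:
        raise ValueError("a zero vector has no direction to rank by")
    if abs(norm - 1.0) <= tol:
        return list(vec)
    return [x / norm for x in vec]

already_unit = [0.6, 0.8]   # length 1
scaled = [6.0, 8.0]         # same direction, length 10

assert is_unit_length(already_unit)
assert not is_unit_length(scaled)
assert is_unit_length(prepare_query(scaled))
```

Running the same check over a sample of stored vectors is a cheap way to confirm the "already normalised" assumption before relying on the dot metric.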

the request

POST /v1/tenants/:t/vector/:table/topk

curl -X POST "https://$OC_HOST/v1/tenants/$OC_TENANT/vector/shop.products/topk" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query":  [0.0124, -0.0883, 0.0451, /* ... 768 unit-normalised floats ... */],
    "k":      10,
    "dim":    768,
    "metric": "dot"
  }'

hits = db.vector.topk(
    "shop.products",
    query=query_768d,   # already L2-normalised to length 1
    k=10,
    dim=768,
    metric="dot",
)

const hits = await db.vectorTopk("shop.products", {
  query:  query768d,   // already L2-normalised to length 1
  k:      10,
  dim:    768,
  metric: "dot",
});

hits, err := db.VectorTopK(ctx, "shop.products", originchain.VectorTopKRequest{
    Query:  query768d,   // already L2-normalised to length 1
    K:      10,
    Dim:    768,
    Metric: "dot",
})

what you get back

{
  "hits": [
    { "id": "sku-9281", "score": 0.9418 },
    { "id": "sku-1144", "score": 0.9180 },
    { "id": "sku-5520", "score": 0.8895 }
    /* ... up to k entries ... */
  ]
}

score is the inner product of the query and the stored vector. Higher means closer. If both sides are unit length, the score falls in [-1, 1]; otherwise it is unbounded.
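Because unit-length inputs bound the score to [-1, 1], a score outside that range is a useful hint that one side was not normalised. The following sketch parses a response shaped like the one above and flags such hits; `out_of_range_hits` and its tolerance are assumptions made for illustration, not an SDK feature.

```python
import json

# Response body copied from the example above, minus the elided entries.
response_body = """
{
  "hits": [
    { "id": "sku-9281", "score": 0.9418 },
    { "id": "sku-1144", "score": 0.9180 },
    { "id": "sku-5520", "score": 0.8895 }
  ]
}
"""

def out_of_range_hits(hits, tol=1e-3):
    """Return hits whose score cannot come from two unit-length vectors."""
    return [h for h in hits if abs(h["score"]) > 1.0 + tol]

hits = json.loads(response_body)["hits"]
suspicious = out_of_range_hits(hits)

# All three sample scores sit inside [-1, 1], so nothing is flagged.
assert suspicious == []
```

A non-empty result does not say which side is at fault, only that at least one of the query or the stored vector was longer than 1.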

request fields

Field	Required	Notes
query	yes	Array of floats. Normalise it to length 1 before sending.
k	yes	Number of hits to return.
dim	yes	Must match the table's locked dim.
metric	yes	Set to `"dot"`. Must match the metric the table was put with.

common mistakes

Dot on non-normalised vectors. Without normalisation, magnitude inflates the score, so a longer vector pointing roughly the right way can outrank a shorter one that is better aligned with the query. Symptom: the same few IDs always rank first.
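Normalising one side and not the other. Normalise both the stored vectors and the query. If only the query is left un-normalised, every score is scaled by the same factor, so the order holds but scores leave [-1, 1]. If the stored vectors are left un-normalised, longer vectors are favoured and the ranking is distorted.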
Switching mid-table. If half the table was put pre-normalised and half wasn't, dot scores are no longer comparable. Re-embed and re-put.
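The magnitude bias behind these mistakes is easy to reproduce locally. The sketch below ranks a tiny in-memory "table" by dot product, first with raw vectors and then with normalised ones; the table, the IDs, and the `rank` helper are invented for the example and do not call the service.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    """Return v scaled to L2 length 1."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def rank(query, table):
    """IDs sorted by dot product with the query, highest score first."""
    return sorted(table, key=lambda item_id: dot(query, table[item_id]), reverse=True)

query = unit([1.0, 0.0])

table = {
    "aligned":       [1.0, 0.0],   # same direction as the query, length 1
    "long-off-axis": [5.0, 5.0],   # 45 degrees off, length about 7.07
}

# Raw vectors: the longer, less relevant vector scores 5.0 against 1.0.
raw_order = rank(query, table)

# Normalised vectors: direction alone decides, about 0.707 against 1.0.
normalised_table = {item_id: unit(v) for item_id, v in table.items()}
normalised_order = rank(query, normalised_table)

assert raw_order == ["long-off-axis", "aligned"]
assert normalised_order == ["aligned", "long-off-axis"]
```

A table that mixes both kinds of rows behaves like the raw case for the un-normalised half, which is why the scores stop being comparable.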