03 · vector

Four metrics. Four index variants. One write.

Every vector lives inside the same atomic write as the row it describes. On top of that come hybrid sparse+dense retrieval, predicate-aware kNN, and `min_score` thresholds that treat an empty result as a valid answer.

distance metrics

Euclidean (L2)

Default. Geometric distance.

Inner Product

For pre-normalized embeddings.

Cosine

Direction only. Magnitude ignored.

Manhattan (L1)

Robust to outliers. Categorical embeddings.
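
For readers who want the four metrics spelled out, here is a minimal pure-Python sketch of the standard textbook definitions. The function names are illustrative and are not OriginChain identifiers; the engine's own implementation is not shown in this page.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """L2 distance: straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by both norms, so magnitude cancels out."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return inner_product(a, b) / (norm_a * norm_b)


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance: sum of absolute per-dimension differences."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

When both vectors already have unit length, the cosine denominator is 1 and the result equals the inner product. That is why inner product is the usual choice for pre-normalized embeddings.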

index variants

Dense HNSW

Layer-0 walker. Graph-based ANN, tunable speed / recall.

Sparse

First-class put / get / top-k for BM25-style features.

Product Quantization

Codebook + storage primitives. Query path lands this sprint.

Adaptive over-fetch

Predicate-aware widening. Recall stays at target when WHERE narrows.
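
The idea behind over-fetching can be illustrated with a short sketch. This is not OriginChain's algorithm, which this page does not describe. It is a generic widen-and-retry loop that assumes a hypothetical `search(n)` callable returning the best `n` candidates in best-first order, and a hypothetical `predicate` standing in for the WHERE clause. A predicate-aware engine might instead estimate selectivity up front; simple doubling is used here only to keep the example small.

```python
from typing import Callable, List, Tuple

Candidate = Tuple[str, float]  # (row id, score); assumed sorted best-first


def filtered_top_k(
    search: Callable[[int], List[Candidate]],
    predicate: Callable[[str], bool],
    k: int,
    max_fetch: int,
    growth: int = 2,
) -> List[Candidate]:
    """Widen the candidate pool until k rows pass the filter or a cap is hit."""
    fetch = k
    while True:
        candidates = search(fetch)
        hits = [c for c in candidates if predicate(c[0])]
        exhausted = len(candidates) < fetch  # the index has nothing more to give
        if len(hits) >= k or fetch >= max_fetch or exhausted:
            return hits[:k]
        # Too few survivors: ask the index for a wider pool and try again.
        fetch = min(fetch * growth, max_fetch)
```

With a filter that only one row in ten passes, a fixed fetch of `k` would return a single hit. The loop keeps widening until it has `k` survivors, which is what keeps recall from collapsing as the predicate narrows.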

hybrid retrieval

Dense + sparse + a SQL filter, in one query.

Most production retrieval is hybrid - semantic similarity for recall, lexical sparse for precision, structural filters for relevance. OriginChain composes them in a single request, against one consistent snapshot.

Add a min_score threshold and an empty result is a real result - not a confidently-wrong nearest neighbour.
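
The effect of the threshold can be sketched in a few lines. This assumes scores where higher means more similar and an inclusive cutoff; both are assumptions for illustration, since this page does not specify the comparison. `apply_min_score` is a hypothetical helper, not an OriginChain function.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # (row id, similarity score); higher is better


def apply_min_score(hits: List[Hit], min_score: float) -> List[Hit]:
    """Keep only hits at or above the threshold; the result may be empty."""
    return [hit for hit in hits if hit[1] >= min_score]
```

Without the threshold, a top-k query always returns k rows, however poor the best match is. With it, a query whose best candidate scores 0.5 against a 0.65 cutoff returns an empty list, and the caller can treat that as "nothing relevant".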

request

POST /v1/tenants/:t/vector/search
{
  "schema": "products",
  "k": 10,
  "dense":   { "vector": [...], "metric": "cosine" },
  "sparse":  { "tokens": { "leather": 0.84, ... } },
  "where":   "in_stock = true AND region = 'EU'",
  "min_score": 0.65
}
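
A minimal sketch of issuing that request from Python's standard library follows. The endpoint path and body fields come from the example above. The base URL, the tenant name, the three-element vector, and the absence of any authentication header are placeholder assumptions; the response shape is not documented on this page, so the sketch stops at building the request.

```python
import json
import urllib.request
from typing import Dict, List


def build_search_request(
    base_url: str,
    tenant: str,
    dense_vector: List[float],
    sparse_tokens: Dict[str, float],
) -> urllib.request.Request:
    """Build (but do not send) the hybrid search request shown above."""
    body = {
        "schema": "products",
        "k": 10,
        "dense": {"vector": dense_vector, "metric": "cosine"},
        "sparse": {"tokens": sparse_tokens},
        "where": "in_stock = true AND region = 'EU'",
        "min_score": 0.65,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/tenants/{tenant}/vector/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host, tenant, and vector values, for illustration only.
req = build_search_request(
    "https://originchain.example", "acme", [0.1, 0.2, 0.3], {"leather": 0.84}
)
# To send it: urllib.request.urlopen(req), then parse the JSON response.
```

Keeping request construction separate from sending makes the payload easy to inspect or log before it goes over the wire.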

measured

4 × 4

metrics × index variants

< 30 ms

p99 kNN @ k=10, n=1M dense

< 50 ms

adaptive over-fetch with predicate

embedding-vs-row consistency lag

deferred

this sprint: PQ query path - codebook + storage primitives shipped; executor wires in this cycle.
under eval: IVF index. HNSW covers most workloads; we'll add IVF if a customer needs it.
post-1.0: Cross-encoder reranking. Multi-vector documents.

Read the vector docs, then try it.
