Four metrics. Four index variants. One write.
Every vector lives inside the same atomic write as the row it describes. Hybrid sparse+dense retrieval, predicate-aware kNN, and `min_score` thresholds that admit empty as a real result.
Default. Geometric distance.
For pre-normalized embeddings.
Direction only. Magnitude ignored.
Robust to outliers. Categorical embeddings.
Layer-0 walker. Graph-based ANN, tunable speed / recall.
First-class put / get / top-k for BM25-style features.
Codebook + storage primitives. Query path lands this sprint.
Predicate-aware widening. Recall stays at target when WHERE narrows.
Dense + sparse + a SQL filter, in one query.
Most production retrieval is hybrid - semantic similarity for recall, lexical sparse for precision, structural filters for relevance. OriginChain composes them in a single request, against one consistent snapshot.
Add a min_score threshold and an empty result is
a real result - not a confidently-wrong nearest neighbour.
POST /v1/tenants/:t/vector/search
{
"schema": "products",
"k": 10,
"dense": { "vector": [...], "metric": "cosine" },
"sparse": { "tokens": { "leather": 0.84, ... } },
"where": "in_stock = true AND region = 'EU'",
"min_score": 0.65
} - this sprintPQ query path - codebook + storage primitives shipped; executor wires in this cycle.
- under evalIVF index. HNSW covers most workloads; we'll add IVF if a customer needs it.
- post-1.0Cross-encoder reranking. Multi-vector documents.