OriginChain
03 · vector

Four metrics. Four index variants. One write.

Every vector lives inside the same atomic write as the row it describes. Hybrid sparse+dense retrieval, predicate-aware kNN, and `min_score` thresholds that admit empty as a real result.

distance metrics
01
Euclidean (L2)

Default. Geometric distance.

02
Inner Product

For pre-normalized embeddings.

03
Cosine

Direction only. Magnitude ignored.

04
Manhattan (L1)

Robust to outliers. Categorical embeddings.

index variants
01
Dense HNSW

Layer-0 walker. Graph-based ANN, tunable speed / recall.

02
Sparse

First-class put / get / top-k for BM25-style features.

03
Product Quantization

Codebook + storage primitives. Query path lands this sprint.

04
Adaptive over-fetch

Predicate-aware widening. Recall stays at target when WHERE narrows.

hybrid retrieval

Dense + sparse + a SQL filter, in one query.

Most production retrieval is hybrid - semantic similarity for recall, lexical sparse for precision, structural filters for relevance. OriginChain composes them in a single request, against one consistent snapshot.

Add a min_score threshold and an empty result is a real result - not a confidently-wrong nearest neighbour.

request
POST /v1/tenants/:t/vector/search
{
  "schema": "products",
  "k": 10,
  "dense":   { "vector": [...], "metric": "cosine" },
  "sparse":  { "tokens": { "leather": 0.84, ... } },
  "where":   "in_stock = true AND region = 'EU'",
  "min_score": 0.65
}
measured
4 × 4
metrics × index variants
< 30 ms
p99 kNN @ k=10, n=1M dense
< 50 ms
adaptive over-fetch with predicate
0
embedding-vs-row consistency lag
deferred

Read the vector docs, then try it.