IVF vector index.
IVF partitions your vectors into cells by their nearest centroid.
At query time, the engine picks the nprobe closest cells to the query vector
and scans only those. This is the path to 10M+ vectors per tenant on the
standard tiers. After the bulk-load gate landed, a 1M-vector IVF build
completes in 80 s (it previously timed out at > 18 min).
The IVF pipeline.
- Train: k-means picks nlist centroids on a training sample. sqrt(N) is a reasonable starting point for nlist.
- Assign: every vector is assigned to its nearest centroid. Bulk-load runs this in parallel; 1M vectors in 80 s after the Gate #4 work.
- Query: the query vector is scored against the centroids; only the nprobe closest cells are scanned.
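The three stages can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the engine's implementation: it uses squared Euclidean distance rather than the cosine metric from the manifest, plain Lloyd's k-means, and an in-memory list of vectors. The function names (train, assign, query) are hypothetical and chosen only to mirror the list above.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, v):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda c: sq_dist(centroids[c], v))

def train(sample, nlist, iters=25, seed=0):
    """Train: plain Lloyd's k-means on a training sample (assumed variant)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(sample, nlist)]
    for _ in range(iters):
        groups = [[] for _ in range(nlist)]
        for v in sample:
            groups[nearest(centroids, v)].append(v)
        for c, members in enumerate(groups):
            if members:  # an empty cell keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def assign(vectors, centroids):
    """Assign: build the inverted lists, cell index -> ids of its vectors."""
    cells = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        cells[nearest(centroids, v)].append(i)
    return cells

def query(q, vectors, centroids, cells, k, nprobe):
    """Query: rank the centroids, then scan only the nprobe closest cells."""
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(centroids[c], q))
    candidates = [i for c in order[:nprobe] for i in cells[c]]
    return sorted(candidates, key=lambda i: sq_dist(vectors[i], q))[:k]

# Toy corpus: 400 random 8-dim vectors, so sqrt(N) gives nlist = 20.
rng = random.Random(42)
vectors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(400)]
nlist = round(math.sqrt(len(vectors)))
centroids = train(vectors, nlist, iters=10)
cells = assign(vectors, centroids)

q = vectors[0]
approx = query(q, vectors, centroids, cells, k=5, nprobe=4)
exact = query(q, vectors, centroids, cells, k=5, nprobe=nlist)  # scans every cell
```

With nprobe equal to nlist the query degenerates into an exhaustive scan, which is why exact can serve as ground truth for approx in this toy setup. Lowering nprobe shrinks the candidate list and, with it, both the scan cost and the chance of finding every true neighbour.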
Declare IVF in the manifest.
# manifest.toml — IVF on 768-dim embeddings
[[vectors]]
name = "embeddings"
dim = 768
metric = "cosine"
index = "ivf"
nlist = 1024 # number of inverted-file cells. sqrt(N) is a good start.
# IVF is independent of quantization. Add quantization = "pq" for IVF-PQ.
Install centroids (admin).
Training is an HTTP admin call. Run it once when the corpus reaches training scale; re-run only if the data distribution shifts materially.
POST /v1/tenants/:t/vector/:table/install-centroids
curl -X POST "https://acme.ap-south-1.db.originchain.ai/v1/tenants/$T/vector/embeddings/install-centroids" \
-H "Authorization: Bearer $OC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"nlist": 1024,
"training_set": "sample",
"iters": 25
}'
The nprobe knob.
nprobe is the recall / latency dial. Higher values give better recall but
scan more cells. The bands below are from a synthetic 1M-vector eval at
D=128, M=16, nlist=1024; your numbers will move with corpus geometry.
nprobe  recall@10  latency band
1       0.71       p99 lowest. Cheap scan.
4       0.93       Default. Production sweet spot.
16      0.98       Recall-critical reads. 4× slower than nprobe=4.
64      0.995      Near-exhaustive. Quality benchmarks only.
POST /v1/tenants/:t/vector/:table/topk
curl -X POST "https://acme.ap-south-1.db.originchain.ai/v1/tenants/$T/vector/embeddings/topk" \
-H "Authorization: Bearer $OC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"query": [/* 768 floats */],
"k": 10,
"nprobe": 4
}'
When to pick IVF over HNSW.
- Working set is above ~5M vectors and HNSW memory is starting to bind.
- You want to combine with PQ (see IVF-PQ) for the 64–768× memory story.
- Workload is bulk-load heavy. The IVF bulk-load path lands 1M vectors in 80 s.
- Recall budget can tolerate the nprobe trade: the default nprobe=4 hits recall@10 ≈ 0.93 on the synthetic 1M-vector eval.
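Since the recall figures come from a synthetic eval, it can be worth measuring recall@k on your own corpus before settling on an nprobe value. The sketch below assumes you already have, for each test query, the ids returned by the index and the ids from an exact brute-force scan done outside the index. The helper names (recall_at_k, mean_recall) and the sample ids are hypothetical.

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the exact top-k ids that the approximate top-k recovered."""
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    return len(truth.intersection(approx_ids[:k])) / len(truth)

def mean_recall(pairs, k=10):
    """Average recall@k over (approx_ids, exact_ids) pairs, one per query."""
    return sum(recall_at_k(a, e, k) for a, e in pairs) / len(pairs)

# Hypothetical results for two queries at k=3.
pairs = [
    ([7, 2, 9], [7, 2, 5]),  # 2 of 3 exact neighbours recovered
    ([1, 4, 8], [8, 1, 4]),  # all 3 recovered; order does not matter
]
score = mean_recall(pairs, k=3)
```

Running the same query set at several nprobe values and comparing the averaged scores gives a corpus-specific version of the recall column in the nprobe table.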