RAG that doesn't drift.
Row, dense vector, and sparse vector are upserted atomically, so retrieval can't surface a vector that points at a row the model didn't write, or at a row it no longer matches.
Two stores. One source of truth, at most.
A customer updates a row in Postgres. Your sync job re-embeds and writes to Pinecone. Between those two writes, retrieval can surface the old vector pointing at the new row: a confidently wrong answer.
This is the most common RAG failure mode in production. It's not your model. It's your stack.
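To make the window concrete, here is a toy simulation of the race described above. It is a sketch only: the two dicts stand in for the row store and the vector store, and `embed` is a hypothetical stand-in for a real embedding model.

```python
# Toy model of the dual-write race: two independent stores, no shared transaction.
rows = {}     # stands in for the relational store (id -> text)
vectors = {}  # stands in for the vector store (id -> embedding)


def embed(text: str) -> tuple[int, int]:
    """Hypothetical stand-in for an embedding model: (length, vowel count)."""
    return (len(text), sum(c in "aeiou" for c in text))


def dual_write_step1(doc_id: str, text: str) -> None:
    rows[doc_id] = text  # write 1: the row changes


def dual_write_step2(doc_id: str) -> None:
    vectors[doc_id] = embed(rows[doc_id])  # write 2: the sync job catches up later


def atomic_upsert(doc_id: str, text: str) -> None:
    # Compute the vector first, then publish row and vector together.
    new_vector = embed(text)
    rows[doc_id], vectors[doc_id] = text, new_vector


def is_consistent(doc_id: str) -> bool:
    """True when the stored vector was computed from the current row."""
    return vectors.get(doc_id) == embed(rows[doc_id])


atomic_upsert("doc1", "plan: premium")
dual_write_step1("doc1", "plan: cancelled")
stale_window = is_consistent("doc1")  # old vector, new row
dual_write_step2("doc1")
after_sync = is_consistent("doc1")    # the sync job has closed the gap
```

Any query that lands between step 1 and step 2 sees `stale_window`: the vector still describes the premium plan while the row says cancelled.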
POST /v1/tenants/:t/vector/search
{
  "k": 10,
  "dense": { "vector": [...], "metric": "cosine" },
  "sparse": { "tokens": { "anchor": 0.74, ... } },
  "where": "tier IN ('premium','enterprise')",
  "min_score": 0.7
}
Row + dense + sparse in one WAL frame. The retrieval can't return a vector pointing at a row that no longer matches.
Combine semantic + BM25-style sparse + a SQL filter in one query. One plan, one consistent snapshot.
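A minimal sketch of building that request from Python, using only the standard library. The host `api.example.com`, the tenant name `acme`, and the example vector values are placeholders, not part of the documented API. Authentication is omitted because the source does not describe it.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host, not from the docs


def build_search_request(tenant, dense_vector, sparse_tokens, where,
                         k=10, min_score=0.7):
    """Build (but do not send) a hybrid search request for one tenant."""
    body = {
        "k": k,
        "dense": {"vector": dense_vector, "metric": "cosine"},
        "sparse": {"tokens": sparse_tokens},
        "where": where,
        "min_score": min_score,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/tenants/{tenant}/vector/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request(
    tenant="acme",                    # placeholder tenant id
    dense_vector=[0.12, -0.03, 0.88],  # placeholder embedding
    sparse_tokens={"anchor": 0.74},
    where="tier IN ('premium','enterprise')",
)
# urllib.request.urlopen(req) would send it once BASE_URL points at a real host.
```

Dense vector, sparse tokens, and the SQL filter all travel in one body, which is what lets the server answer them from a single plan.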
Empty result is a real result. We won't ship you a confidently-wrong nearest neighbour.
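The effect of a score floor plus a filter can be illustrated locally. This is a sketch of the semantics, not the service's implementation: it scores with dense cosine only, represents `where` as a Python predicate, and uses made-up two-dimensional vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, docs, k=10, min_score=0.7, where=lambda row: True):
    """Filter rows, score the survivors, and drop anything under min_score."""
    scored = [(cosine(query, d["vector"]), d["id"]) for d in docs if where(d)]
    kept = sorted((hit for hit in scored if hit[0] >= min_score), reverse=True)
    return kept[:k]


docs = [
    {"id": "a", "tier": "premium", "vector": [1.0, 0.0]},
    {"id": "b", "tier": "free", "vector": [0.0, 1.0]},
]


def is_paid(row):
    return row["tier"] in ("premium", "enterprise")


hits = search([1.0, 0.1], docs, where=is_paid)     # "a" is a close match
no_hits = search([0.0, 1.0], docs, where=is_paid)  # only "b" matches, and it is filtered out
```

In the second query the only paid row scores 0.0, so the caller gets an empty list instead of a weak nearest neighbour.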
/ask orchestrates RAG by default.
Translate the question → hybrid retrieve → re-rank → generate. Every step uses the same consistent snapshot. You can override the retrieval plan if the LLM picks wrong.
- Under design: cross-encoder reranking step.
- Under design: multi-vector documents (one row, multiple embeddings).
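The /ask flow described above (translate, hybrid retrieve, re-rank, generate) can be sketched as a pipeline in which every step reads from one frozen snapshot. Everything here is hypothetical: the `Snapshot` type, the keyword-overlap retrieval, the length-based re-rank, and the template "generation" are toy stand-ins, chosen only to show the shape of the flow and the retrieval-plan override.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical read-only view of the rows at one version."""
    version: int
    rows: MappingProxyType  # id -> text


def translate(question):
    # Toy "translation": the question becomes a set of keywords.
    return set(question.lower().split())


def retrieve(snapshot, terms, k=10):
    # Toy retrieval: rank rows by keyword overlap, drop rows with none.
    scored = [(len(terms & set(text.lower().split())), doc_id)
              for doc_id, text in snapshot.rows.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score > 0][:k]


def rerank(snapshot, ids):
    # Toy re-rank: shorter rows first.
    return sorted(ids, key=lambda doc_id: len(snapshot.rows[doc_id]))


def generate(snapshot, ids):
    if not ids:
        return "No supporting rows found."
    top = ids[0]
    return f"Based on {top} (snapshot v{snapshot.version}): {snapshot.rows[top]}"


def ask(snapshot, question, plan=retrieve):
    """Run every step against the same snapshot; `plan` overrides retrieval."""
    ids = plan(snapshot, translate(question))
    return generate(snapshot, rerank(snapshot, ids))


snap = Snapshot(1, MappingProxyType({
    "r1": "premium tier includes priority support",
    "r2": "free tier includes community support",
}))
answer = ask(snap, "what does premium include")
empty = ask(snap, "zzz")
forced = ask(snap, "anything", plan=lambda s, terms: ["r2"])
```

Because `snap` is passed to every step and cannot be mutated, a write that lands mid-request cannot change what retrieval, re-ranking, and generation each see.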