decision guide

Choosing a query shape

OriginChain answers five kinds of question - SQL, vector, full-text, graph, and natural language - against the same data. Writes commit atomically across the row and its indexes, but the query shapes are not interchangeable: each is built for a specific shape of question.

This page is the decision guide. If your query looks like X, reach for Y. The capability matrix at the bottom is engine-accurate as of today - it shows what actually works, not what we plan to ship.

The shapes, side by side.

SQL POST /v1/tenants/:t/sql

use for

Filtering by exact-match WHERE on indexed columns. JOINs across up to 32 tables. GROUP BY with COUNT / SUM / AVG / MIN / MAX. LIMIT-bounded reads.

avoid for

Semantic similarity (use vector). Approximate string matching (use full-text). Multi-hop relationship questions (use graph). Natural-language questions from non-engineers (use Ask).

recognise it by

A Postgres-shaped query you'd write in DBeaver.
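A minimal sketch of building a request for this endpoint from Python. Only the path comes from this page; the host, the `query` body field, the `orders` table and `customer_id` column, and the exact header layout are assumptions for illustration. The request is built but never sent.

```python
import json
import urllib.request

BASE_URL = "https://api.example.invalid"  # hypothetical host, not a real endpoint


def build_sql_request(tenant: str, sql: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/tenants/:t/sql."""
    # "query" is an assumed field name; check the SQL reference for the real body shape.
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/tenants/{tenant}/sql",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# An exact-match WHERE plus GROUP BY and LIMIT: the shape SQL is meant for.
req = build_sql_request(
    "acme",
    "SELECT customer_id, SUM(amount_cents) FROM orders "
    "WHERE status = 'paid' GROUP BY customer_id LIMIT 100",
    token="example-token",
)
```

Keeping the builder separate from the send step makes the request easy to inspect or log before it goes out.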

Vector search POST /v1/tenants/:t/vector/:table/topk

use for

Semantic similarity. Cross-language retrieval. 'Find rows like this one' even when the keywords don't match. Recommendations. De-duplication by meaning. RAG retrieval before an LLM call.

avoid for

Exact identifiers, SKUs, error codes (use SQL or full-text). Structural traversal (use graph).

recognise it by

An embedding vector plus k.
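To make "an embedding vector plus k" concrete, here is a brute-force sketch of what a top-k similarity query computes. It is not how the engine does it (the capability matrix lists HNSW and IVF indexes); the row IDs and three-dimensional vectors are made up, and cosine is just one of the listed metrics.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def topk(query: list[float], rows: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Score every row against the query and keep the k most similar."""
    scored = [(row_id, cosine(query, vec)) for row_id, vec in rows.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings; real ones have hundreds of dimensions.
rows = {
    "trail-shoe": [0.9, 0.1, 0.0],
    "road-shoe": [0.8, 0.3, 0.1],
    "coffee-mug": [0.0, 0.2, 0.9],
}
nearest = topk([1.0, 0.2, 0.0], rows, k=2)
```

The two shoe rows come back ahead of the mug even though no keywords are involved, which is the property the "use for" list above relies on.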

Full-text (BM25) GET /v1/tenants/:t/fts/:table/:field

use for

Exact phrase matching. Acronyms and product codes. Long-tail queries with unusual terms. Recall on documents containing the literal keyword.

avoid for

Conceptual queries where the user's wording doesn't match the document's wording (vector wins). Structured-field filtering (SQL is cheaper).

recognise it by

Words a human types into a search box.
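For intuition about why literal keywords dominate here, the following is a small sketch of textbook BM25 scoring. The tokenizer (lowercase and split on whitespace), the `k1` and `b` defaults, and the IDF variant are common choices assumed for illustration; the engine's actual parameters, stemming, and tokenization are not described on this page.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: dict[str, str], k1: float = 1.2, b: float = 0.75) -> dict[str, float]:
    """Score each document against the query with a standard BM25 formula."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))

    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue  # a term that is absent contributes nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = tf[term] + k1 * (1.0 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / length_norm
        scores[doc_id] = score
    return scores


docs = {
    "a": "error code E1234 raised by the payment service",
    "b": "payment service latency dashboard",
    "c": "team offsite agenda",
}
scores = bm25_scores("E1234 payment", docs)
```

The rare token `E1234` carries most of the weight for document `a`, while a document sharing no query term scores zero. That is the behaviour you want for product codes and error codes.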

Graph traversal GET/POST /v1/tenants/:t/graph/:schema/:algo

use for

Multi-hop relationship questions ('orders from customers I haven't reviewed yet'). Social-graph walks. Dependency chains. Shortest path. PageRank or centrality analytics. Reachability checks.

avoid for

Single-table lookups (use SQL). Semantic similarity (use vector). Large-result analytics that aren't relationship-shaped (SQL is cheaper).

recognise it by

Multi-hop or path query: 'shortest path through citations'.
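As an illustration of what a depth-capped path query does, here is a plain breadth-first search over an in-memory adjacency map. The `max_depth` name mirrors the cap mentioned in the table further down this page; the graph data and function are hypothetical and say nothing about the engine's internals.

```python
from collections import deque
from typing import Optional


def bfs_path(edges: dict[str, list[str]], start: str, goal: str, max_depth: int) -> Optional[list[str]]:
    """Return a fewest-hops path from start to goal, or None if none exists within max_depth hops."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        if len(path) - 1 >= max_depth:
            continue  # hop budget is spent on this branch
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical citation graph: paper -> papers it cites.
citations = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["Z"]}
found = bfs_path(citations, "A", "Z", max_depth=3)
too_shallow = bfs_path(citations, "A", "Z", max_depth=2)
```

The second call returns nothing because Z is three hops away. Capping depth is what keeps a multi-hop walk from fanning out across the whole graph.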

Hybrid (vector + BM25) Run vector topk + FTS in parallel, fuse client-side

use for

Production retrieval. Catches both semantic match and literal keyword match. Generally outperforms either alone on standard benchmarks.

avoid for

Anything where one mode is structurally enough - don't fuse if vector alone is already perfect.

recognise it by

RAG retrieval before the LLM call.
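A sketch of the "run in parallel, fuse client-side" step using Reciprocal Rank Fusion. The two search functions are stubs standing in for your vector topk and FTS calls, the row IDs are invented, and `k=60` is the constant conventionally used with RRF rather than anything this page prescribes.

```python
from concurrent.futures import ThreadPoolExecutor


def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per ID."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, row_id in enumerate(ranking, start=1):
            scores[row_id] = scores.get(row_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for stable output.
    return sorted(scores, key=lambda row_id: (-scores[row_id], row_id))


def vector_search(query: str) -> list[str]:
    return ["p7", "p2", "p9"]  # stub: would call the vector topk endpoint


def fts_search(query: str) -> list[str]:
    return ["p2", "p4", "p7"]  # stub: would call the full-text endpoint


query = "lightweight running shoes for marathons"
with ThreadPoolExecutor(max_workers=2) as pool:
    vec_future = pool.submit(vector_search, query)
    fts_future = pool.submit(fts_search, query)
    fused = rrf([vec_future.result(), fts_future.result()])
```

RRF only looks at rank positions, so it sidesteps the problem that cosine similarities and BM25 scores live on different scales. Rows that appear in both lists float to the top.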

Natural language (Ask) POST /v1/tenants/:t/ask

use for

Non-technical users asking questions of structured data. Internal dashboards. Customer-support agents. Prototype-grade analytics without writing SQL.

avoid for

Latency-critical hot paths (a cold compile costs an LLM round-trip). Queries that need to be auditable to a single SQL string.

recognise it by

An English sentence.

If your query looks like this...

Pattern-match the left column against what you're trying to do; the middle and right columns are the answer.

Your query	Reach for	Why
WHERE id = 'sku-9281'	SQL	Exact lookup on an indexed primary key.
WHERE status = 'pending' AND amount_cents > 100	SQL	Multi-predicate WHERE with AND on indexed columns.
Per-customer totals over paid orders	SQL	GROUP BY customer + SUM(amount_cents).
Products similar to this one (no shared keywords)	Vector	Semantic similarity via embedding distance.
Products described as 'lightweight running shoes for marathons'	Hybrid (vector + BM25)	Catches the semantic match AND the literal keyword.
Find SKU ABC-1234-XL	Full-text or SQL	Exact-token retrieval. Vector would dilute it semantically.
Path between paper A and paper Z through citations	Graph (BFS / path)	Multi-hop walk. Cap with max_depth.
Shortest commute between two stations	Graph (Dijkstra)	Weighted shortest path. Supply edge weights via the JSON weights map.
Most influential nodes in a network	Graph (PageRank)	Iterative influence over a seed node set.
Customers in segment X (English question)	Ask	Translates the sentence to a Plan against your schemas.
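For the weighted shortest-path case (the station commute example), this is roughly what a Dijkstra query computes. The nested-dict weights layout and station names are assumptions made for the sketch; the actual shape of the JSON weights map is defined on the graph reference page, not here.

```python
import heapq
from typing import Optional


def dijkstra(weights: dict[str, dict[str, float]], start: str, goal: str) -> Optional[tuple[float, list[str]]]:
    """Return (total_cost, path) for the cheapest route, or None if goal is unreachable."""
    best = {start: 0.0}
    prev: dict[str, str] = {}
    heap = [(0.0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == goal:
            path = [goal]
            while path[-1] != start:
                path.append(prev[path[-1]])
            return cost, path[::-1]
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry; a cheaper route was already found
        for nxt, weight in weights.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    return None


# Hypothetical travel times in minutes between stations.
minutes = {
    "Central": {"North": 4.0, "East": 1.0},
    "East": {"North": 2.0},
    "North": {"Airport": 5.0},
}
route = dijkstra(minutes, "Central", "Airport")
```

The direct Central to North edge loses to the detour through East, which is the difference between a weighted shortest path and a plain fewest-hops BFS.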

Capability matrix.

What's supported today. yes = works · partial = limited shape (see the relevant reference page) · — = not the right tool for this shape, or not yet supported.

"Ask" inherits SQL's surface where the compiler can build the right Plan - if SQL doesn't support a construct (like HAVING or window functions), Ask can't either.

Feature	SQL	Vector	FTS	Graph	Ask
Exact-match WHERE on indexed col	yes	—	yes	—	yes
AND-combined WHERE conditions	yes	—	—	—	yes
OR in WHERE	—	—	—	—	partial
IN (literal list)	yes	—	—	—	yes
BETWEEN, IS NULL, LIKE	yes	—	—	—	yes
GROUP BY + COUNT / SUM / AVG / MIN / MAX	yes	—	—	—	yes
HAVING	—	—	—	—	—
ORDER BY	—	yes	yes	—	partial
INNER / LEFT / RIGHT / FULL OUTER JOIN	yes	—	—	—	yes
LIMIT	yes	yes	yes	yes	yes
Uncorrelated IN (SELECT ...)	partial	—	—	—	partial
Correlated subqueries / EXISTS	—	—	—	—	—
CTEs (WITH)	—	—	—	—	—
Window functions	—	—	—	—	—
EXPLAIN	yes	—	—	—	yes
Transactions (BEGIN/COMMIT/ROLLBACK)	yes	—	—	—	—
HNSW · cosine / dot / L2 / Manhattan	—	yes	—	—	—
IVF / IVF-PQ for 10M+ corpora	—	yes	—	—	—
Metadata equality filter on topk	—	yes	—	—	—
fast / high_recall mode selector	—	yes	—	—	—
BM25 ranking	—	—	yes	—	—
Boolean AND	—	—	yes	—	—
Phrase (exact word order)	—	—	yes	—	—
Fuzzy / typo tolerance	—	—	yes	—	—
18-language stemming · 9-language lemmas	—	—	yes	—	—
Neighbours (forward + reverse)	—	—	—	yes	—
BFS, path, all simple paths	—	—	—	yes	—
Dijkstra / k-shortest weighted paths	—	—	—	yes	—
PageRank, betweenness, eigenvector	—	—	—	yes	—
Louvain, label-prop, components	—	—	—	yes	—
Node2Vec / GraphSAGE embeddings	—	—	—	yes	—
Atomic write (row + indexes + edges)	yes	yes	yes	yes	—
Idempotency keys	yes	yes	yes	yes	yes

Combining shapes in one app.

The shapes aren't mutually exclusive. A single bearer token can hit any of them, and the engine keeps row writes + vector + full-text indexes consistent. Three common patterns:

RAG retrieval before an LLM call. Run vector + full-text in parallel, fuse with Reciprocal Rank Fusion in your app, pass the top-k rows into your prompt. The rows you retrieve are from the same store as your application data, so authorization is consistent.
Graph filter + SQL projection. Use a multi-hop traversal to find candidate primary keys, then do a SQL WHERE id IN (...) for the full projection. Graph hop costs ~tens of ms; SQL projection is sub-millisecond.
Ask with show_plan. Let a non-technical user write the question, return the compiled plan, paste the equivalent SQL into your codebase. Ask becomes the prototype; the SQL becomes the production query.
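The second pattern (graph filter, then SQL projection) mostly comes down to turning candidate keys into `WHERE id IN (...)` queries. A possible sketch follows, assuming Postgres-style `$n` placeholders and a chunk size of 500. Both are guesses for illustration, as is the `orders` table; whether the SQL endpoint accepts bound parameters is not stated on this page.

```python
from typing import Iterator


def in_queries(table: str, ids: list[str], chunk_size: int = 500) -> Iterator[tuple[str, list[str]]]:
    """Yield (sql, params) pairs, one per chunk of candidate primary keys."""
    unique_ids = list(dict.fromkeys(ids))  # drop duplicates, keep traversal order
    for start in range(0, len(unique_ids), chunk_size):
        chunk = unique_ids[start:start + chunk_size]
        placeholders = ", ".join(f"${i}" for i in range(1, len(chunk) + 1))
        # The table name is interpolated directly, so it must come from trusted code.
        yield f"SELECT * FROM {table} WHERE id IN ({placeholders})", chunk


# Candidate keys as they might come back from a multi-hop traversal.
candidate_ids = ["o-1", "o-2", "o-2", "o-3"]
batches = list(in_queries("orders", candidate_ids, chunk_size=2))
```

Deduplicating first matters because a traversal can reach the same node along several paths, and chunking keeps each IN list to a bounded size.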