Choosing a query shape

OriginChain answers five kinds of question - SQL, vector, full-text, graph, and natural language - against the same data. Writes commit atomically across rows and their indexes, but the query shapes are not interchangeable: each is built for a specific shape of question.

This page is the decision guide. If your query looks like X, reach for Y. The capability matrix at the bottom is engine-accurate as of today - it shows what actually works, not what we plan to ship.

The shapes, side by side.

SQL POST /v1/tenants/:t/sql
use for

Filtering by exact-match WHERE on indexed columns. JOINs across up to 32 tables. GROUP BY with COUNT / SUM / AVG / MIN / MAX. LIMIT-bounded reads.

avoid for

Semantic similarity (use vector). Approximate string matching (use full-text). Multi-hop relationship questions (use graph). Natural-language questions from non-engineers (use Ask).

recognise it by

A Postgres-shaped query you'd write in DBeaver.
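
As a rough illustration of the SQL shape, the sketch below builds (but does not send) a request to the endpoint above. The host name, the `query` body field, and the table and column names are assumptions made for the example; only the path and the bearer-token auth come from this page.

```python
import json
import urllib.request

BASE_URL = "https://originchain.example"  # hypothetical host, replace with yours


def build_sql_request(tenant: str, token: str, sql: str) -> urllib.request.Request:
    """Build (without sending) a POST to /v1/tenants/:t/sql."""
    # "query" is an assumed field name; check the SQL reference page.
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/tenants/{tenant}/sql",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Exact-match WHERE, GROUP BY with SUM, and a LIMIT: all in the "use for" list.
req = build_sql_request(
    "acme",
    "TOKEN",
    "SELECT customer_id, SUM(amount_cents) AS total_cents "
    "FROM orders WHERE status = 'paid' "
    "GROUP BY customer_id LIMIT 100",
)
```

Once the body shape is confirmed against the reference page, `urllib.request.urlopen(req)` would send it.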

Vector search POST /v1/tenants/:t/vector/:table/topk
use for

Semantic similarity. Cross-language retrieval. 'Find rows like this one' even when the keywords don't match. Recommendations. De-duplication by meaning. RAG retrieval before an LLM call.

avoid for

Exact identifiers, SKUs, error codes (use SQL or full-text). Structural traversal (use graph).

recognise it by

An embedding vector plus k.
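
A top-k call might look like the sketch below. It assumes a JSON body with hypothetical `vector` and `k` fields and a hypothetical host; the page itself only confirms the path and that the inputs are an embedding plus k.

```python
import json
import urllib.request

BASE_URL = "https://originchain.example"  # hypothetical host, replace with yours


def build_topk_request(tenant, table, token, embedding, k=10):
    """Build (without sending) a POST to /v1/tenants/:t/vector/:table/topk."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # "vector" and "k" are assumed field names, not confirmed by this page.
    body = {"vector": [float(x) for x in embedding], "k": k}
    return urllib.request.Request(
        f"{BASE_URL}/v1/tenants/{tenant}/vector/{table}/topk",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# The embedding would normally come from your embedding model.
topk_req = build_topk_request("acme", "products", "TOKEN", [0.12, -0.4, 0.88], k=5)
```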

Full-text (BM25) GET /v1/tenants/:t/fts/:table/:field
use for

Exact phrase matching. Acronyms and product codes. Long-tail queries with unusual terms. Recall on documents containing the literal keyword.

avoid for

Conceptual queries where the user's wording doesn't match the document's wording (vector wins). Structured-field filtering (SQL is cheaper).

recognise it by

Words a human types into a search box.
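
Because the full-text endpoint is a GET, the search text has to be URL-encoded. The sketch below builds such a URL; the `q` and `limit` parameter names and the host are assumptions for illustration, while the path layout is the one shown above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://originchain.example"  # hypothetical host, replace with yours


def fts_url(tenant, table, field, text, limit=10):
    """Build a GET URL for /v1/tenants/:t/fts/:table/:field."""
    # Percent-encode each path segment so unusual names cannot break the path.
    path = "/v1/tenants/{}/fts/{}/{}".format(
        quote(tenant, safe=""), quote(table, safe=""), quote(field, safe="")
    )
    # "q" and "limit" are assumed parameter names, not confirmed by this page.
    return f"{BASE_URL}{path}?{urlencode({'q': text, 'limit': limit})}"


url = fts_url("acme", "products", "description", "ABC-1234-XL", limit=5)
```

Here `urlencode` keeps the hyphens in the product code and encodes spaces as `+`.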

Graph traversal GET/POST /v1/tenants/:t/graph/:schema/:algo
use for

Multi-hop relationship questions ('orders from customers I haven't reviewed yet'). Social-graph walks. Dependency chains. Shortest path. PageRank or centrality analytics. Reachability checks.

avoid for

Single-table lookups (use SQL). Semantic similarity (use vector). Large-result analytics that aren't relationship-shaped (SQL is cheaper).

recognise it by

Multi-hop or path query: 'shortest path through citations'.

Hybrid (vector + BM25) Run vector topk + FTS in parallel, fuse client-side
use for

Production retrieval. Catches both semantic match and literal keyword match. Generally outperforms either alone on standard benchmarks.

avoid for

Anything where one mode is structurally enough - don't fuse if vector alone is already perfect.

recognise it by

RAG retrieval before the LLM call.
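
One common way to do the client-side fusion is Reciprocal Rank Fusion (RRF), which the "Combining shapes" section of this page also names. The sketch below is a minimal version that takes two already-retrieved lists of row ids, best first. The function name, the sample ids, and the constant `k=60` (a conventional default for RRF) are choices made for this example, not something OriginChain prescribes.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=10):
    """Fuse ranked id lists: each list adds 1 / (k + rank) to an id's score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, row_id in enumerate(ranking, start=1):
            scores[row_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda row_id: (-scores[row_id], row_id))[:top_n]


vector_hits = ["p7", "p2", "p9"]  # ids from the vector topk call, best first
fts_hits = ["p2", "p4", "p7"]     # ids from the full-text call, best first

fused = rrf_fuse([vector_hits, fts_hits])
print(fused)  # ['p2', 'p7', 'p4', 'p9']
```

Rows that appear in both lists (`p2`, `p7`) rise above rows that appear in only one. RRF uses ranks only, so the vector distances and BM25 scores never need to be put on a common scale.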

Natural language (Ask) POST /v1/tenants/:t/ask
use for

Non-technical users asking questions of structured data. Internal dashboards. Customer-support agents. Prototype-grade analytics without writing SQL.

avoid for

Latency-critical hot paths (a cold compile costs an LLM round-trip). Queries that need to be auditable to a single SQL string.

recognise it by

An English sentence.
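
A request to Ask could be built as in the sketch below. The `question` field name, the placement of `show_plan` in the JSON body, and the host are all assumptions; this page confirms only the path and that a `show_plan` option exists.

```python
import json
import urllib.request

BASE_URL = "https://originchain.example"  # hypothetical host, replace with yours


def build_ask_request(tenant, token, question, show_plan=False):
    """Build (without sending) a POST to /v1/tenants/:t/ask."""
    # Both field names are assumed; check the Ask reference page.
    body = {"question": question, "show_plan": show_plan}
    return urllib.request.Request(
        f"{BASE_URL}/v1/tenants/{tenant}/ask",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


ask_req = build_ask_request(
    "acme", "TOKEN", "Which customers are in segment X?", show_plan=True
)
```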

If your query looks like this...

Pattern-match the left column against what you're trying to do; the middle and right columns are the answer.

| Your query | Reach for | Why |
| --- | --- | --- |
| WHERE id = 'sku-9281' | SQL | Exact lookup on an indexed primary key. |
| WHERE status = 'pending' AND amount_cents > 100 | SQL | Multi-predicate WHERE with AND on indexed columns. |
| Per-customer totals over paid orders | SQL | GROUP BY customer + SUM(amount_cents). |
| Products similar to this one (no shared keywords) | Vector | Semantic similarity via embedding distance. |
| Products described as 'lightweight running shoes for marathons' | Hybrid (vector + BM25) | Catches the semantic match AND the literal keyword. |
| Find SKU ABC-1234-XL | Full-text or SQL | Exact-token retrieval. Vector would dilute it semantically. |
| Path between paper A and paper Z through citations | Graph (BFS / path) | Multi-hop walk. Cap with max_depth. |
| Shortest commute between two stations | Graph (Dijkstra) | Weighted shortest path. Supply edge weights via the JSON weights map. |
| Most influential nodes in a network | Graph (PageRank) | Iterative influence over a seed node set. |
| Customers in segment X (English question) | Ask | Translates the sentence to a Plan against your schemas. |

Capability matrix.

What's supported today. yes = works · partial = limited shape (see the relevant reference page) · an empty cell = not the right tool for this shape, or not yet supported.

"Ask" inherits SQL's surface where the compiler can build the right Plan - if SQL doesn't support a construct (like HAVING or window functions), Ask can't either.

Feature SQL Vector FTS Graph Ask
Exact-match WHERE on indexed col yes yes yes
AND-combined WHERE conditions yes yes
OR in WHERE partial
IN (literal list) yes yes
BETWEEN, IS NULL, LIKE yes yes
GROUP BY + COUNT / SUM / AVG / MIN / MAX yes yes
HAVING partial
ORDER BY yes yes partial
INNER / LEFT / RIGHT / FULL OUTER JOIN yes yes
LIMIT yes yes yes yes yes
Uncorrelated IN (SELECT ...) partial partial
Correlated subqueries / EXISTS
CTEs (WITH)
Window functions
EXPLAIN yes yes
Transactions (BEGIN/COMMIT/ROLLBACK) yes
HNSW · cosine / dot / L2 / Manhattan yes
IVF / IVF-PQ for 10M+ corpora yes
Metadata equality filter on topk yes
fast / high_recall mode selector yes
BM25 ranking yes
Boolean AND yes
Phrase (exact word order) yes
Fuzzy / typo tolerance yes
18-language stemming · 9-language lemmas yes
Neighbours (forward + reverse) yes
BFS, path, all simple paths yes
Dijkstra / k-shortest weighted paths yes
PageRank, betweenness, eigenvector yes
Louvain, label-prop, components yes
Node2Vec / GraphSAGE embeddings yes
Atomic write (row + indexes + edges) yes yes yes yes
Idempotency keys yes yes yes yes yes

Combining shapes in one app.

The shapes aren't mutually exclusive. A single bearer token can hit any of them, and the engine keeps row writes + vector + full-text indexes consistent. Three common patterns:

  • RAG retrieval before an LLM call. Run vector + full-text in parallel, fuse with Reciprocal Rank Fusion in your app, pass the top-k rows into your prompt. The rows you retrieve are from the same store as your application data, so authorization is consistent.
  • Graph filter + SQL projection. Use a multi-hop traversal to find candidate primary keys, then do a SQL WHERE id IN (...) for the full projection. Graph hop costs ~tens of ms; SQL projection is sub-millisecond.
  • Ask with show_plan. Let a non-technical user write the question, return the compiled plan, paste the equivalent SQL into your codebase. Ask becomes the prototype; the SQL becomes the production query.
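
The graph filter + SQL projection pattern above needs a step that turns the candidate keys from the traversal into `WHERE id IN (...)` queries. The sketch below is one way to do that, assuming keys are ints or strings and that standard SQL quoting (doubling single quotes) is sufficient for this engine. The helper names, the table and column names, and the chunk size are hypothetical.

```python
def sql_literal(value):
    """Render an int or str as a SQL literal, doubling single quotes in strings."""
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise TypeError(f"unsupported key type: {type(value).__name__}")
    if isinstance(value, int):
        return str(value)
    return "'" + value.replace("'", "''") + "'"


def projection_queries(table, columns, ids, chunk_size=500):
    """Build one SELECT ... WHERE id IN (...) per chunk of de-duplicated keys.

    table and columns are interpolated as-is, so they must come from your
    own code, never from user input.
    """
    unique_ids = list(dict.fromkeys(ids))  # drop duplicates, keep first-seen order
    column_list = ", ".join(columns)
    queries = []
    for start in range(0, len(unique_ids), chunk_size):
        chunk = unique_ids[start:start + chunk_size]
        in_list = ", ".join(sql_literal(v) for v in chunk)
        queries.append(f"SELECT {column_list} FROM {table} WHERE id IN ({in_list})")
    return queries


# Keys as they might come back from a multi-hop traversal, duplicates included.
candidate_ids = ["o-1", "o-2", "o-1", "o'3"]
queries = projection_queries("orders", ["id", "amount_cents"], candidate_ids, chunk_size=2)
```

Chunking keeps each IN list bounded when a traversal returns many keys. If the SQL endpoint supports bound parameters, those would be preferable to inlined literals.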