Schema for Full-text.
FTS indexes are NOT declared in the schema TOML. You index a (table, field) pair with one POST and query it with one GET. Synonyms, stopwords, stemming, facets, and highlights are all configured per pair at runtime.
Engine surface: the POST /v1/tenants/:t/fts/:table/:field family covers plain text, JSON-aware indexing, the doc store, synonyms, and stopwords; GET /v1/tenants/:t/fts/:table/:field runs boolean, BM25, or phrase search. In these full paths :t is the tenant; shorter /fts/... paths elsewhere on this page omit the /v1/tenants/:t prefix.
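To make the per-pair model concrete, here is a toy in-memory sketch. It is not the engine's implementation: the class name FtsBucket, the lowercasing step, and the rule that synonyms only affect documents indexed after they are installed are all assumptions. It shows stopwords and synonym classes applied identically at index and query time, boolean AND-of-terms search, and one bucket created lazily per (table, field) key.

```python
from collections import defaultdict


class FtsBucket:
    """Toy FTS bucket for one (table, field) pair. Hypothetical, not the engine's code."""

    def __init__(self) -> None:
        self.stopwords: set[str] = set()
        self.canonical: dict[str, str] = {}  # class member -> class key
        self.postings: dict[str, set[str]] = defaultdict(set)  # term -> doc_ids

    def install_stopwords(self, words: list[str]) -> None:
        # Re-install replaces the list in full.
        self.stopwords = {w.lower() for w in words}

    def install_synonyms(self, classes: dict[str, list[str]]) -> None:
        # Re-install replaces the map in full; the class key is also a member.
        # Assumption: docs indexed earlier are NOT re-analyzed in this toy.
        self.canonical = {}
        for key, members in classes.items():
            for word in [key, *members]:
                self.canonical[word.lower()] = key.lower()

    def analyze(self, text: str) -> list[str]:
        # Whitespace split, lowercase (assumed), drop stopwords, collapse synonyms.
        out = []
        for tok in text.lower().split():
            if tok in self.stopwords:
                continue
            out.append(self.canonical.get(tok, tok))
        return out

    def index(self, doc_id: str, text: str) -> None:
        for term in self.analyze(text):
            self.postings[term].add(doc_id)

    def search_boolean(self, q: str) -> set[str]:
        # AND of terms: intersect the posting sets of every analyzed query term.
        terms = self.analyze(q)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


# One bucket per (table, field), created on first access; no schema lookup.
buckets: dict[tuple[str, str], FtsBucket] = defaultdict(FtsBucket)
b = buckets[("shop_products", "description")]
b.install_stopwords(["the", "a", "with"])
b.install_synonyms({"headphones": ["earbuds", "cans"]})
b.index("p001", "Wireless Bluetooth headphones with active noise cancellation")
print(b.search_boolean("wireless earbuds"))  # {'p001'}
```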
Required schema fields.
Fields this query surface cannot function without. For FTS there are none:
| field | effect |
|---|---|
| (none) | FTS indexes live entirely at runtime. The :table and :field in the URL are logical buckets; they do NOT need to be a registered schema table or column. |
Optional knobs: what each one unlocks.
FTS has no schema fields, so every knob below is a runtime request body or query parameter. Add only the ones whose effect you need; each buys one capability, such as ranking, typo tolerance, highlighting, or facet counts.
| knob | type | default | effect |
|---|---|---|---|
| POST /fts/:table/:field body { doc_id, text } | object | — | Index plain text. The BM25 inverted index is built lazily on the first call. |
| POST /fts/:table/:field/json body { doc_id, json, paths } | object | — | JSON-aware: walks dotted paths and indexes string leaves. Omit paths to index every string leaf. |
| POST /fts/:table/:field/doc body { doc_id, text, facets } | object | — | Store doc text + per-facet values. REQUIRED for the highlight=true and facets= query params. |
| POST /fts/:table/:field/synonyms body { synonyms: {...} } | object | — | Per-(table, field) synonym class map. Index and query both treat every member of a class as equivalent. Re-install replaces the map in full. |
| POST /fts/:table/:field/stopwords body { stopwords: [...] } | object | — | Per-(table, field) drop list, applied at both index and query time. Re-install replaces the list in full. |
| GET ?q= (query string) | string | — | Query string. Whitespace-split; each token runs through the analyzer pipeline. |
| GET ?mode= | string | boolean | One of boolean, bm25, phrase. boolean = AND of terms; bm25 = ranked; phrase = in-order match. |
| GET ?k= | int | 10 | Top-K cap for bm25 mode. Ignored in boolean and phrase modes. |
| GET ?fuzzy=0..3 | int | 0 | Edit-distance budget per term in bm25 mode; catches typos. Capped at MAX_EDIT_DISTANCE = 3. |
| GET ?highlight=true | bool | false | Return per-hit snippet highlights. Requires doc text stored via the /doc endpoint. BM25 mode only. |
| GET ?facets=csv | string | — | Comma-separated facet field names (stored via the /doc endpoint). Returns aggregated counts alongside hits. |
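A hypothetical client-side helper that turns the GET knobs above into a URL and rejects combinations the table rules out. The function name search_url and the choice to raise an error for fuzzy or highlight outside bm25 mode are assumptions; the engine itself may simply ignore such params.

```python
from urllib.parse import quote, urlencode

MAX_EDIT_DISTANCE = 3  # cap stated in the fuzzy row above
MODES = ("boolean", "bm25", "phrase")


def search_url(base, tenant, table, field, q, mode="boolean", k=10,
               fuzzy=0, highlight=False, facets=None):
    """Build a GET search URL (hypothetical helper; the engine may be more lenient)."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    if not 0 <= fuzzy <= MAX_EDIT_DISTANCE:
        raise ValueError(f"fuzzy must be in 0..{MAX_EDIT_DISTANCE}")
    if (fuzzy or highlight) and mode != "bm25":
        raise ValueError("fuzzy and highlight only apply in bm25 mode")

    params = {"q": q}  # urlencode turns spaces into '+'
    if mode != "boolean":  # boolean is the default, so it can be omitted
        params["mode"] = mode
    if mode == "bm25":  # k is ignored outside bm25, so only send it there
        params["k"] = k
        if fuzzy:
            params["fuzzy"] = fuzzy
        if highlight:
            params["highlight"] = "true"
    if facets:
        params["facets"] = ",".join(facets)

    segments = ("v1", "tenants", tenant, "fts", table, field)
    path = "/".join(quote(s, safe="") for s in segments)
    # safe=',' keeps the facet list readable instead of encoding commas as %2C.
    return f"{base}/{path}?{urlencode(params, safe=',')}"


print(search_url("https://api.example", "acme", "shop_products", "description",
                 "wireless headphones"))
```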
What you can call (no schema knob needed).
- POST /fts/:table/:field to index plain text
- POST /fts/:table/:field/json to index JSON, walking dotted paths
- POST /fts/:table/:field/doc to store text + facets (required for highlight and facet aggregation)
- POST /fts/:table/:field/synonyms to install a per-(table, field) synonym map
- POST /fts/:table/:field/stopwords to install a per-(table, field) stopword list
- GET /fts/:table/:field?q=… to run boolean / bm25 / phrase search with optional fuzzy, highlight, and facets
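The JSON-aware endpoint indexes string leaves selected by dotted paths. One plausible reading of that behavior, sketched below: list items share their parent's path (so "tags" selects every tag), non-string leaves are skipped, and a path must match a leaf exactly rather than selecting a whole subtree. Those three rules are assumptions, and string_leaves / extract_text are hypothetical names.

```python
def string_leaves(node, prefix=""):
    """Yield (dotted_path, text) for every string leaf in a JSON-like value."""
    if isinstance(node, str):
        yield prefix, node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from string_leaves(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        # Assumption: list items inherit the list's own path.
        for item in node:
            yield from string_leaves(item, prefix)
    # Numbers, booleans and None are not string leaves: skipped.


def extract_text(doc, paths=None):
    """Text to index: leaves at the listed paths, or every string leaf if paths is None."""
    wanted = None if paths is None else set(paths)
    return [text for path, text in string_leaves(doc)
            if wanted is None or path in wanted]


# The document from the worked example further down this page.
product = {
    "name": "Wireless Headphones",
    "desc": {"short": "BT 5.3 ANC", "long": "Over-ear noise-cancelling..."},
    "tags": ["audio", "premium"],
}
print(extract_text(product, ["name", "desc.short", "tags"]))
# ['Wireless Headphones', 'BT 5.3 ANC', 'audio', 'premium']
```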
Abbreviation legend.
| token | meaning |
|---|---|
| BM25 | Okapi BM25 — the ranking function used in mode=bm25. Same scoring family as Lucene / Elasticsearch |
| doc_id | Unique identifier per indexed document within a (table, field) pair |
| facet | Categorical attribute stored alongside the doc text for aggregate counts |
| highlight | Per-hit snippet of the matching doc text with the query terms wrapped in `<em>` tags |
| fuzzy | Edit-distance tolerance: fuzzy=1 catches single-character typos, fuzzy=2 tolerates up to two character edits, and so on up to the cap of 3 |
| tokenizer | Word splitter: unicode (the default; UAX #29 word segmentation, multilingual) or ascii (fast path for pure-ASCII corpora) |
| stemmer | Snowball stemmer reducing word forms to a root token. 18 languages supported |
| stopwords | Tokens dropped at both index and query time (e.g. 'the', 'a', 'and') |
| synonyms | Class-based equivalence: a query term that belongs to a class matches documents containing any member of that class |
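For readers new to BM25, a toy scorer over already-analyzed terms. It uses the textbook Okapi BM25 formula with the IDF variant ln((N - n + 0.5) / (n + 0.5) + 1) that Lucene uses, and the common defaults k1 = 1.2, b = 0.75. The k1, b, and IDF variant this engine actually uses are not documented here, so treat the scores as illustrative; bm25_rank is a hypothetical name.

```python
import math
from collections import Counter


def bm25_rank(docs, query, k=10, k1=1.2, b=0.75):
    """Rank docs (doc_id -> list of analyzed terms) by Okapi BM25; return top-k (doc_id, score)."""
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avgdl = sum(len(terms) for terms in docs.values()) / n_docs
    df = Counter()  # document frequency: how many docs contain each term
    for terms in docs.values():
        df.update(set(terms))

    scores = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        dl = len(terms)
        score = 0.0
        for q in query:
            if tf[q] == 0:
                continue
            idf = math.log((n_docs - df[q] + 0.5) / (df[q] + 0.5) + 1)
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = tf[q] + k1 * (1 - b + b * dl / avgdl)
            score += idf * tf[q] * (k1 + 1) / norm
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by doc_id so the order is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


docs = {
    "p001": ["wireless", "bluetooth", "headphones"],
    "p002": ["wired", "headphones"],
    "p003": ["wireless", "mouse"],
}
for doc_id, score in bm25_rank(docs, ["wireless", "headphones"]):
    print(doc_id, round(score, 3))
```

p001 ranks first because it contains both query terms; p002 and p003 each match one term with equal frequency and length, so they tie.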
Worked example.
Schema TOML: copy it and register it via POST /v1/tenants/:t/schemas with Content-Type: text/plain.
# ──────────────────────────────────────────────────────────────────────
# Important: FTS indexes are NOT declared in the schema TOML.
# The grammar oc_schema::Manifest accepts does NOT have a [[extractions.fts]]
# block. Tokenizer / analyzer / stem language all live on the runtime
# POST /fts/:table/:field body — set once per (table, field) pair.
#
# What the TOML IS for (FTS workflows): registering the ROW schema so the
# same row is reachable via SQL alongside FTS search. Reuse the row's id as
# the FTS doc_id to keep the two aligned.
# ──────────────────────────────────────────────────────────────────────
namespace = "shop"
table = "products"
primary_key = ["id"]
[[columns]]
name = "id"
ty = "str"
required = true
[[columns]]
name = "name"
ty = "str"
[[columns]]
name = "description"
ty = "str"
[[columns]]
name = "category"
ty = "str"
[[indexes]]
name = "by_category"
columns = ["category"]
Runtime calls.
# ════════════════════════════════════════════════════════════════════
# INDEX — 3 ways to load text
# ════════════════════════════════════════════════════════════════════
# 1) Plain text per doc — simplest
curl -X POST $BASE/v1/tenants/$T/fts/shop_products/description -H "Authorization: Bearer $BEARER" \
-H "Content-Type: application/json" \
-d '{
"doc_id": "p001",
"text": "Wireless Bluetooth headphones with active noise cancellation"
}'
# 2) JSON-aware — walks dotted paths inside a nested doc
curl -X POST $BASE/v1/tenants/$T/fts/shop_products/description/json -H "Authorization: Bearer $BEARER" \
-H "Content-Type: application/json" \
-d '{
"doc_id": "p001",
"json": {
"name": "Wireless Headphones",
"desc": { "short": "BT 5.3 ANC", "long": "Over-ear noise-cancelling..." },
"tags": ["audio", "premium"]
},
"paths": ["name", "desc.short", "desc.long", "tags"]
}'
# Omit "paths" to index every string leaf in the document.
# 3) Store doc text + facets — REQUIRED for highlight=true and facets= queries
curl -X POST $BASE/v1/tenants/$T/fts/shop_products/description/doc -H "Authorization: Bearer $BEARER" \
-H "Content-Type: application/json" \
-d '{
"doc_id": "p001",
"text": "Wireless Bluetooth headphones with active noise cancellation",
"facets": {
"category": ["electronics"],
"brand": ["acme"],
"price_bucket": ["100-200"]
}
}'
# ════════════════════════════════════════════════════════════════════
# CONFIG — synonyms + stopwords (optional, per (table, field) pair)
# ════════════════════════════════════════════════════════════════════
# Install synonyms — each class is treated as equivalent at both index and query time
curl -X POST $BASE/v1/tenants/$T/fts/shop_products/description/synonyms -H "Authorization: Bearer $BEARER" \
-H "Content-Type: application/json" \
-d '{
"synonyms": {
"headphones": ["earbuds", "earphones", "cans"],
"laptop": ["notebook", "computer"],
"tv": ["television"]
}
}'
# Install stopwords — dropped at both index and query time
curl -X POST $BASE/v1/tenants/$T/fts/shop_products/description/stopwords -H "Authorization: Bearer $BEARER" \
-H "Content-Type: application/json" \
-d '{
"stopwords": ["the", "a", "an", "and", "or", "of", "in", "with", "for"]
}'
# ════════════════════════════════════════════════════════════════════
# SEARCH — 3 modes × 6 query params
# ════════════════════════════════════════════════════════════════════
# Boolean mode (DEFAULT) — AND of terms, no scoring, fastest
curl "$BASE/v1/tenants/$T/fts/shop_products/description?q=wireless+headphones" \
-H "Authorization: Bearer $BEARER"
# BM25 mode — ranked, top-k cap via k=
curl "$BASE/v1/tenants/$T/fts/shop_products/description?q=wireless&mode=bm25&k=10" \
-H "Authorization: Bearer $BEARER"
# Phrase mode — exact in-order match
curl "$BASE/v1/tenants/$T/fts/shop_products/description?q=noise+cancellation&mode=phrase&k=10" \
-H "Authorization: Bearer $BEARER"
# Fuzzy — every BM25 term treated as term~N. fuzzy=1 catches single typos.
# Capped at MAX_EDIT_DISTANCE = 3.
curl "$BASE/v1/tenants/$T/fts/shop_products/description?q=wirless&mode=bm25&fuzzy=1&k=10" \
-H "Authorization: Bearer $BEARER"
# Highlight — returns {highlights: {description: ["…<em>wireless</em>…"]}} per hit
# Requires that the doc text was stored via POST /doc above. BM25 mode only.
curl "$BASE/v1/tenants/$T/fts/shop_products/description?q=wireless&mode=bm25&k=5&highlight=true" \
-H "Authorization: Bearer $BEARER"
# Facets — comma-separated facet field names from the stored doc.
# Returns aggregated {category: {electronics: 5, books: 2}} alongside hits.
curl "$BASE/v1/tenants/$T/fts/shop_products/description?q=wireless&mode=bm25&k=5&facets=category,brand" \
-H "Authorization: Bearer $BEARER"
# Kitchen-sink — fuzzy + highlight + facets in one call
curl "$BASE/v1/tenants/$T/fts/shop_products/description?q=wirless&mode=bm25&k=5&fuzzy=1&highlight=true&facets=category,brand,price_bucket" \
-H "Authorization: Bearer $BEARER"
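The fuzzy=1 calls above let the typo "wirless" match "wireless". A minimal sketch of that per-term check, assuming plain Levenshtein distance (insert, delete, and substitute each cost 1, so a transposition costs 2). Whether the engine counts a transposition as a single edit is not stated here, and within_edit_distance / fuzzy_expand are hypothetical names.

```python
MAX_EDIT_DISTANCE = 3  # cap stated for the fuzzy= param


def within_edit_distance(a, b, max_dist):
    """True if the Levenshtein distance between a and b is <= max_dist."""
    if abs(len(a) - len(b)) > max_dist:
        return False  # the length gap alone needs more edits than allowed
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,               # delete ca
                         cur[j - 1] + 1,            # insert cb
                         prev[j - 1] + (ca != cb))  # substitute (free if equal)
        if min(cur) > max_dist:
            return False  # row minimums never decrease, so stop early
        prev = cur
    return prev[-1] <= max_dist


def fuzzy_expand(term, vocabulary, fuzzy):
    """Indexed terms that a query term~fuzzy would match, with fuzzy capped at 3."""
    fuzzy = min(fuzzy, MAX_EDIT_DISTANCE)
    return sorted(v for v in vocabulary if within_edit_distance(term, v, fuzzy))


print(fuzzy_expand("wirless", ["wireless", "wired", "headphones"], 1))  # ['wireless']
```

A production engine typically walks its term dictionary with an automaton instead of scanning every term; the linear scan here only keeps the sketch short.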