OriginChain docs
Full-text examples — copy-paste JSON.

Ten copy-paste JSON examples for managed full-text search. The engine uses the UAX #29 Unicode tokenizer, Lucene-default BM25 parameters (k1 = 1.2, b = 0.75), and 18 Snowball stemmers, and supports boolean and phrase modes. See /docs/fts for the full reference.

Indexing

fts_index — minimal document

Index a document under a (table, field) inverted index; in the request below the table is articles and the field is body. The tokenizer (UAX #29 Unicode) and stemmer come from the manifest.

Request
curl -X POST "$ENGINE/v1/tenants/$T/fts/articles/body" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d @- <<'JSON'
{
  "doc_id": "post-001",
  "text":   "OriginChain runs SQL, vector, full-text, and graph against one database."
}
JSON
Response
{
  "indexed": 1,
  "doc_id": "post-001",
  "tokens": 12,
  "lsn": 41827601,
  "elapsed_ms": 4
}

fts_index — explicit doc_id (re-index)

Re-indexing the same doc_id replaces the previous postings atomically. Old terms vanish, new terms appear.

Request
curl -X POST "$ENGINE/v1/tenants/$T/fts/articles/body" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d @- <<'JSON'
{
  "doc_id": "post-001",
  "text":   "Updated body. Now mentions HNSW, BM25, and Dijkstra."
}
JSON
Response
{
  "indexed": 1,
  "doc_id": "post-001",
  "tokens": 8,
  "replaced": true,
  "lsn": 41827610,
  "elapsed_ms": 4
}
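
The replace-on-re-index behaviour can be sketched with a toy in-memory index. This is only an illustration under stated assumptions: the `ToyIndex` class is hypothetical, the tokenizer is a simple `\w+` split rather than UAX #29, there is no stemming, and atomicity is not modelled.

```python
import re

class ToyIndex:
    """Toy inverted index for one (table, field) pair. Hypothetical, not the engine's code."""

    def __init__(self):
        self.postings = {}   # term -> set of doc_ids containing it
        self.doc_terms = {}  # doc_id -> set of terms currently indexed for it

    def index(self, doc_id, text):
        # Simplified tokenizer (assumption): lowercase, split on non-word characters.
        terms = set(re.findall(r"\w+", text.lower()))
        replaced = doc_id in self.doc_terms
        # Remove the document from every posting list it used to be in.
        for term in self.doc_terms.get(doc_id, set()):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]
        # Add the document to the posting list of each new term.
        for term in terms:
            self.postings.setdefault(term, set()).add(doc_id)
        self.doc_terms[doc_id] = terms
        return {"doc_id": doc_id, "replaced": replaced}


idx = ToyIndex()
idx.index("post-001", "OriginChain runs SQL, vector, full-text, and graph against one database.")
result = idx.index("post-001", "Updated body. Now mentions HNSW, BM25, and Dijkstra.")
print(result["replaced"], "sql" in idx.postings, "hnsw" in idx.postings)
```

After the second call, terms that only appeared in the old body (such as `sql`) no longer have postings for `post-001`, while the new terms do.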

fts_index — explicit language stemmer

Set the language field to override the manifest-default stemmer for this document. Useful when a single field carries multilingual content.

Request
curl -X POST "$ENGINE/v1/tenants/$T/fts/articles/body" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d @- <<'JSON'
{
  "doc_id":   "post-fr-22",
  "text":     "Recherche plein-texte gérée avec ranking BM25.",
  "language": "french"
}
JSON
Response
{
  "indexed": 1,
  "doc_id": "post-fr-22",
  "tokens": 6,
  "language": "french",
  "lsn": 41827622,
  "elapsed_ms": 5
}

Boolean search

fts_search — boolean AND

Posting-list intersection. Every term must appear in the doc.

Request
curl "$ENGINE/v1/tenants/$T/fts/articles/body?mode=boolean&q=hnsw+bm25" \
  -H "Authorization: Bearer $OC_TOKEN"
Response
{
  "hits": [
    { "doc_id": "post-001" },
    { "doc_id": "post-014" }
  ],
  "count": 2,
  "elapsed_ms": 3
}
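
As a rough illustration of posting-list intersection, the sketch below merges two sorted posting lists with two cursors. The function name and the sample posting lists are assumptions made for this example; the engine's internal representation is not described here.

```python
def intersect(a, b):
    """Intersect two sorted posting lists of doc_ids (two-pointer merge)."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])   # doc contains both terms
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1             # advance the list that is behind
        else:
            j += 1
    return out


# Hypothetical posting lists for the two query terms.
hnsw = ["post-001", "post-014", "post-022"]
bm25 = ["post-001", "post-007", "post-014"]
print(intersect(hnsw, bm25))  # ['post-001', 'post-014']
```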

fts_search — boolean OR

Posting-list union. Any term suffices.

Request
curl "$ENGINE/v1/tenants/$T/fts/articles/body?mode=boolean&q=hnsw+OR+dijkstra" \
  -H "Authorization: Bearer $OC_TOKEN"
Response
{
  "hits": [
    { "doc_id": "post-001" },
    { "doc_id": "post-014" },
    { "doc_id": "post-022" }
  ],
  "count": 3,
  "elapsed_ms": 4
}
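
A posting-list union can be sketched as a merge of sorted lists with duplicates dropped. The helper name and sample lists below are assumptions for illustration only.

```python
import heapq

def union(*posting_lists):
    """Union of sorted posting lists, keeping doc_ids sorted and unique."""
    out = []
    for doc_id in heapq.merge(*posting_lists):
        # heapq.merge yields values in sorted order, so duplicates are adjacent.
        if not out or out[-1] != doc_id:
            out.append(doc_id)
    return out


# Hypothetical posting lists for the two query terms.
hnsw = ["post-001", "post-014"]
dijkstra = ["post-001", "post-022"]
print(union(hnsw, dijkstra))  # ['post-001', 'post-014', 'post-022']
```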

fts_search — boolean NOT

Posting-list difference. Documents that match the term after NOT are removed from the postings of the term before it.

Request
curl "$ENGINE/v1/tenants/$T/fts/articles/body?mode=boolean&q=database+NOT+legacy" \
  -H "Authorization: Bearer $OC_TOKEN"
Response
{
  "hits": [
    { "doc_id": "post-001" },
    { "doc_id": "post-007" }
  ],
  "count": 2,
  "elapsed_ms": 4
}
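
The NOT case amounts to a posting-list difference. A minimal sketch, with a hypothetical helper and sample data:

```python
def difference(include, exclude):
    """Doc_ids in `include` that do not appear in `exclude`, order preserved."""
    excluded = set(exclude)
    return [doc_id for doc_id in include if doc_id not in excluded]


# Hypothetical posting lists for the two query terms.
database = ["post-001", "post-003", "post-007"]
legacy = ["post-003"]
print(difference(database, legacy))  # ['post-001', 'post-007']
```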

Phrase + BM25

fts_search — phrase match

Exact word sequence. The terms must appear in the given order at consecutive positions; the positional postings recorded at index time let the phrase verifier check adjacency.

Request
curl "$ENGINE/v1/tenants/$T/fts/articles/body?mode=phrase&q=write+ahead+log" \
  -H "Authorization: Bearer $OC_TOKEN"
Response
{
  "hits": [
    { "doc_id": "post-014" }
  ],
  "count": 1,
  "elapsed_ms": 5
}
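
One way to picture phrase verification over positional postings is shown below. The data layout (term to doc_id to list of token positions), the function name, and the sample positions are all assumptions for this sketch, not the engine's actual structures.

```python
def phrase_match(positions_by_term, phrase_terms):
    """Return doc_ids where phrase_terms occur at consecutive positions.

    positions_by_term maps term -> {doc_id: [token positions]}.
    """
    first, *rest = phrase_terms
    # Candidate docs must contain every term of the phrase.
    candidates = set(positions_by_term.get(first, {}))
    for term in rest:
        candidates &= set(positions_by_term.get(term, {}))

    hits = []
    for doc_id in sorted(candidates):
        for start in positions_by_term[first][doc_id]:
            # The i-th following term must sit exactly i positions after the first.
            if all(start + offset in positions_by_term[term][doc_id]
                   for offset, term in enumerate(rest, start=1)):
                hits.append(doc_id)
                break
    return hits


# Hypothetical positional postings: post-022 has all three words, but not adjacent.
positions = {
    "write": {"post-014": [3], "post-022": [0]},
    "ahead": {"post-014": [4], "post-022": [5]},
    "log":   {"post-014": [5], "post-022": [1]},
}
print(phrase_match(positions, ["write", "ahead", "log"]))  # ['post-014']
```

A document that contains every term but not in sequence passes the boolean AND filter and is then rejected by the position check.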

fts_search — BM25 ranked

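Lucene defaults k1 = 1.2, b = 0.75. Returns the scored top-k. Score = Σ IDF(t) · (tf · (k1+1)) / (tf + k1 · (1 − b + b · dl/avgdl)), summed over the query terms t, where tf is the frequency of t in the document, dl is the document length in tokens, and avgdl is the average document length in the index.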

Request
curl "$ENGINE/v1/tenants/$T/fts/articles/body?mode=bm25&q=managed+vector+database&k=10" \
  -H "Authorization: Bearer $OC_TOKEN"
Response
{
  "hits": [
    { "doc_id": "post-001", "score": 8.4112 },
    { "doc_id": "post-014", "score": 6.2031 },
    { "doc_id": "post-022", "score": 3.9418 }
  ],
  "count": 3,
  "elapsed_ms": 6
}
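
The scoring formula can be written out directly. The sketch below is an assumption-laden illustration: the source does not state which IDF variant is used, so a Lucene-style `ln(1 + (N − n + 0.5) / (n + 0.5))` is assumed here, and the function names and inputs are hypothetical.

```python
import math

K1 = 1.2   # term-frequency saturation
B = 0.75   # length-normalisation strength

def idf(n_docs, doc_freq):
    # Assumption: Lucene-style IDF; the source does not spell out its IDF form.
    return math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))

def bm25(query_terms, doc_tf, doc_len, avg_doc_len, n_docs, doc_freqs):
    """Score one document.

    doc_tf:    term -> frequency of the term in this document
    doc_freqs: term -> number of documents containing the term
    """
    # k1 * (1 - b + b * dl/avgdl) is the same for every term of this document.
    norm = K1 * (1 - B + B * doc_len / avg_doc_len)
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue  # absent terms contribute nothing
        score += idf(n_docs, doc_freqs[term]) * (tf * (K1 + 1)) / (tf + norm)
    return score


# Same term counts, different lengths: the longer document scores lower.
short_doc = bm25(["vector"], {"vector": 2}, 50, 100, 1000, {"vector": 40})
long_doc = bm25(["vector"], {"vector": 2}, 200, 100, 1000, {"vector": 40})
print(short_doc > long_doc)  # True
```

When a document is exactly average length and tf = 1, the fraction reduces to 1 and the term contributes just its IDF, which is a handy sanity check.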

fts_search — k limit

Cap the result set with k. BM25 top-k selection uses a bounded heap of size k, so beyond the cost of walking the posting lists, latency does not grow with corpus size.

Request
curl "$ENGINE/v1/tenants/$T/fts/articles/body?mode=bm25&q=substrate+managed&k=3" \
  -H "Authorization: Bearer $OC_TOKEN"
Response
{
  "hits": [
    { "doc_id": "post-001", "score": 8.4112 },
    { "doc_id": "post-014", "score": 6.2031 },
    { "doc_id": "post-007", "score": 5.7710 }
  ],
  "count": 3,
  "elapsed_ms": 5
}
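
A bounded-heap top-k selection can be sketched with Python's `heapq`. The function name is hypothetical, and the fourth candidate score is made up for the example; only the heap technique is the point.

```python
import heapq

def top_k(scored_docs, k):
    """Keep the k highest-scoring (doc_id, score) pairs using a min-heap of size <= k."""
    heap = []  # entries are (score, doc_id); heap[0] is the weakest kept hit
    for doc_id, score in scored_docs:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # New hit beats the weakest kept hit: swap it in, in O(log k).
            heapq.heapreplace(heap, (score, doc_id))
    # Highest score first.
    return [(doc_id, score) for score, doc_id in sorted(heap, reverse=True)]


# Hypothetical candidate stream; only three survive with k=3.
candidates = [
    ("post-022", 3.9418),
    ("post-001", 8.4112),
    ("post-007", 5.7710),
    ("post-014", 6.2031),
]
print(top_k(candidates, 3))
# [('post-001', 8.4112), ('post-014', 6.2031), ('post-007', 5.771)]
```

Memory stays at k entries no matter how many candidates are scored, and each candidate costs at most O(log k) heap work.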

Languages