Define schemas
A schema is a TOML manifest. One file declares the columns, the primary key, secondary indexes, graph relations, full-text fields, and vector fields for one table. The substrate derives everything from the manifest — registering it is the prerequisite to any row write.
The catalog is itself stored as rows. Adding a column or extracting a new vector is an online migration, not downtime — see ops → migrations. The query-syntax reference for vectors, full text, and graphs lives on the vector, full-text, and graph pages.
A full example.
One manifest. One shop.products table with seven columns, two indexes, a supplier relation, a BM25 description field, and a 768-dimensional embedding extraction. Every section below this one is a piece of the same shape.
```toml
# manifest.toml — full surface in one declaration
id = "shop.products"

[primary_key]
columns = ["id"]

# ── Columns ─────────────────────────────────────────────────────────
[[columns]]
name = "id"
type = "str"

[[columns]]
name = "name"
type = "str"

[[columns]]
name = "supplier_id"
type = "str"

[[columns]]
name = "price"
type = "f64"

[[columns]]
name = "category"
type = "str"

[[columns]]
name = "description"
type = "str"

[[columns]]
name = "created_at"
type = "timestamp"

# ── Secondary indexes ───────────────────────────────────────────────
[[indexes]]
columns = ["category"]

[[indexes]]
columns = ["supplier_id", "created_at"]

# ── Graph relations ─────────────────────────────────────────────────
[[relations]]
name = "supplied_by"
column = "supplier_id"
target = "shop.suppliers"
bidirectional = true # default; reverse edge written atomically

# ── Full-text extractions ───────────────────────────────────────────
[[extractions.fts]]
field = "description"
tokenizer = "unicode" # unicode (UAX #29) | ascii
analyzer = ["lowercase", "stem:english", "stop:english"]

# ── Vector extractions ──────────────────────────────────────────────
[[extractions.vector]]
field = "description"
dim = 768
metric = "cosine" # cosine | dot | l2
```

Columns & primary keys.
Each [[columns]] block declares one field. [primary_key] can list one column (single-column PK) or several (composite PK; lookups go through a Plan-tree query, not the typed /rows/:pk route).
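A composite key is declared the same way, listing the columns in order. A sketch for a hypothetical shop.order_lines table (the table and column names are illustrative):

```toml
[primary_key]
columns = ["order_id", "line_no"]  # composite; lookups go through a Plan-tree query
```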
| type | description |
|---|---|
| str | UTF-8 string. The default for IDs, emails, names. |
| i64 | Signed 64-bit integer. |
| f64 | Double-precision float. |
| bool | Boolean. |
| ulid | ULID (lexicographically sortable). Recommended for time-ordered primary keys. |
| uuid | UUID v4. |
| timestamp | RFC3339 instant. Stored as UTC microseconds. |
| decimal | Arbitrary-precision decimal. Use for money. |
| json | Opaque JSON blob. Not indexable, not extractable. |
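Putting the recommendations above together — a fragment sketching a ulid-keyed table with a decimal money column (names are illustrative):

```toml
[[columns]]
name = "id"
type = "ulid"       # lexicographically sortable; recommended for time-ordered PKs

[[columns]]
name = "unit_price"
type = "decimal"    # arbitrary precision; use for money, never f64

[[columns]]
name = "placed_at"
type = "timestamp"  # RFC3339 in, UTC microseconds on disk

[primary_key]
columns = ["id"]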
Secondary indexes.
Hash-keyed lookups. Single-column or composite (in declared order). The index entry is written and retired in the same WAL frame as the row — never out of sync.
```toml
[[indexes]]
columns = ["status"] # single-column hash index

[[indexes]]
columns = ["customer_id", "placed_at"] # composite, in declared order
```
Index byte layout: `idx|<schema>|<column>|<value_bytes>|<pk_bytes>`. Range scans work via prefix iteration on (schema, column, value).
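A minimal sketch of that layout in Python — a sorted in-memory list stands in for the real storage engine, and the `|` separator follows the layout above (a production encoding would escape separator bytes inside values; this sketch assumes clean ASCII values):

```python
import bisect

def index_key(schema: str, column: str, value: str, pk: str = "") -> bytes:
    # idx|<schema>|<column>|<value_bytes>|<pk_bytes>; pk="" yields a scan prefix
    return f"idx|{schema}|{column}|{value}|{pk}".encode()

# Entries written alongside their rows, kept in sorted order by the engine.
keys = sorted([
    index_key("shop.products", "category", "toys", "p1"),
    index_key("shop.products", "category", "toys", "p2"),
    index_key("shop.products", "category", "tools", "p3"),
])

def prefix_scan(keys: list[bytes], prefix: bytes) -> list[str]:
    # Seek to the first key >= prefix, then walk while the prefix holds.
    i = bisect.bisect_left(keys, prefix)
    pks = []
    while i < len(keys) and keys[i].startswith(prefix):
        pks.append(keys[i].rsplit(b"|", 1)[1].decode())  # trailing pk bytes
        i += 1
    return pks

print(prefix_scan(keys, index_key("shop.products", "category", "toys")))
```

The same seek-then-walk shape gives range scans on (schema, column, value) for free, which is why the value bytes sit before the pk bytes in the key.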
Graph relations.
Declared on the table that holds the foreign-key column. When a row writes that column, the substrate emits a forward edge and (with bidirectional = true, the default) a reverse edge — both atomic with the row. Walk them with neighbors / BFS / Dijkstra.
```toml
[[relations]]
name = "supplied_by" # the verb you'll walk in code
column = "supplier_id" # the FK column on this table
target = "shop.suppliers" # the target table
bidirectional = true # default; reverse edge written atomically

# Self-relations are fine — direction tags in the key resolve the collision:
[[relations]]
name = "follows"
column = "followee_id"
target = "social.users" # same table; self-loop is allowed
bidirectional = true
```

Full-text fields.
Each [[extractions.fts]] block makes one (table, field) BM25-searchable. The optional analyzer pipeline runs after tokenisation in declared order. Index-time and query-time use the same pipeline — never analyse one and not the other.
```toml
[[extractions.fts]]
field = "body"
tokenizer = "unicode" # unicode (UAX #29) | ascii
analyzer = ["lowercase", "stem:english", "stop:english"]
# any subset; order is the pipeline order
```

Tokenizers
- `unicode` — UAX #29 word segmentation + full Unicode lowercase. Handles Latin, Cyrillic, Greek, CJK, Arabic, Devanagari, Hebrew without per-language config.
- `ascii` — Whitespace + punctuation split, ASCII lowercase, ~3× faster on pure-ASCII corpora. Falls back to unicode if it sees a non-ASCII byte.
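The fast-path-with-fallback shape can be sketched in a few lines of Python. This is a sketch only: the regex-based `unicode_tokenize` stands in for a real UAX #29 segmenter, and both function names are illustrative:

```python
import re

def unicode_tokenize(text: str) -> list[str]:
    # Stand-in for UAX #29 word segmentation; \w+ is a rough approximation.
    return re.findall(r"\w+", text.lower(), re.UNICODE)

def ascii_tokenize(text: str) -> list[str]:
    # Fast path: whitespace + punctuation split, ASCII lowercase.
    # Any non-ASCII byte falls back to the unicode path.
    if not text.isascii():
        return unicode_tokenize(text)
    return re.findall(r"[a-z0-9]+", text.lower())

print(ascii_tokenize("Fast ASCII path!"))
print(ascii_tokenize("café au lait"))  # non-ASCII → unicode fallback
```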
Analyzer pipeline

- `lowercase` — Full Unicode case fold (handles Turkish dotted-i, German ß, etc).
- `fold_diacritics` — NFKD + drop combining marks. "café" matches "cafe".
- `stop:<lang>` — Per-locale stop-word elimination. 18 languages.
- `stem:<lang>` — Snowball stemmer. "running" / "ran" / "runs" → one token.
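Because index-time and query-time share the pipeline, the stages compose as plain functions applied in declared order. A sketch, with a toy suffix-stripper standing in for the Snowball stemmer and a three-word stop list standing in for the real one:

```python
def lowercase(tokens: list[str]) -> list[str]:
    return [t.lower() for t in tokens]

def stem_english(tokens: list[str]) -> list[str]:
    # Toy suffix-stripper; a real pipeline would use the Snowball stemmer.
    out = []
    for t in tokens:
        for suffix in ("ning", "ing", "s"):
            if t.endswith(suffix) and len(t) > len(suffix) + 2:
                t = t[: -len(suffix)]
                break
        out.append(t)
    return out

STOP_EN = {"the", "a", "and"}  # tiny stand-in stop list
def stop_english(tokens: list[str]) -> list[str]:
    return [t for t in tokens if t not in STOP_EN]

# Declared order from the manifest: ["lowercase", "stem:english", "stop:english"]
PIPELINE = [lowercase, stem_english, stop_english]

def analyze(tokens: list[str]) -> list[str]:
    for stage in PIPELINE:
        tokens = stage(tokens)
    return tokens

print(analyze(["Running", "the", "Tests"]))
```

Running the same `analyze` over documents at index time and over queries at search time is what keeps the two token streams comparable.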
Vector fields.
Each [[extractions.vector]] block declares one HNSW index. Per-table dimensionality is enforced — a write at the wrong dim is rejected at write time, before it touches the index.
```toml
[[extractions.vector]]
field = "summary_embedding"
dim = 1024 # enforced at write time
metric = "cosine" # cosine | dot | l2
```
Index defaults: M=16, ef_construction=200. Each topk call picks fast or high_recall at request time — the index is built once, the search width tunes per call. See vector reference.
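The write-time dim check amounts to a guard ahead of the index insert. A sketch, assuming (this is not confirmed by the source) that the cosine metric normalises vectors on the way in:

```python
import math

DIM = 1024  # per-table dimensionality, fixed by the manifest

def validate_vector(vec: list[float]) -> list[float]:
    # Wrong-dim writes are rejected before touching the HNSW index.
    if len(vec) != DIM:
        raise ValueError(f"expected dim {DIM}, got {len(vec)}")
    # Assumed behaviour for metric = "cosine": store unit-length vectors
    # so the distance computation reduces to a dot product.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]
```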
Register a manifest.
POST the raw TOML to the schemas route. The body is text/plain, not JSON.
```shell
curl -X POST "https://<tenant>.<region>.db.originchain.ai/v1/tenants/$T/schemas" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: text/plain" \
  --data-binary @manifest.toml
```
Re-registering the same id with a higher version triggers an online migration. The server stays writable through backfill and cuts over atomically.
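As a sketch of what such a re-registration might carry — the `version` and `default` field names here are assumptions for illustration; the authoritative field names live in ops → migrations:

```toml
id = "shop.products"
version = 2            # assumed field name — must increase monotonically

# ...existing sections unchanged...

[[columns]]
name = "discount"      # hypothetical new column
type = "f64"
default = 0.0          # add-column-with-default: one of the four allowed shapes
```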
Schema evolution.
Manifest changes follow a strict contract: a monotonic version integer; one of four allowed shapes per migration (add column with default, drop column, add index, add extraction); a 10% backfill rate so live traffic stays prioritised; a dual-read transform during backfill; atomic cutover; abort only before cutover.
The full failure model and cutover procedure live on ops → migrations. The HTTP migration endpoints are in the API reference.