Core concepts.
OriginChain is a hash-keyed key/value substrate with a Plan tree on top. SQL, vector, full-text, and graph are not separate engines — they are different key shapes and different Plan operators over the same store. Once you understand the substrate, the keys, and the Plan, the rest of the surface follows.
A single hash-keyed k/v store.
The engine is a single B-tree-free, hash-indexed key/value store fronted by a write-ahead log. Every write is appended to the WAL, fsynced, then applied. Reads go through a process-wide page cache. There is no row-store / column-store / vector-engine split — every domain is a different prefix on the same keyspace.
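The write path, sketched in a few lines; Wal, Store, and PageCache here are illustrative stand-ins, not the engine's real types:

use std::collections::HashMap;

// Hypothetical stand-ins for the engine internals; only the ordering
// (WAL append, fsync, apply, cached reads) comes from the text above.
struct Wal { frames: Vec<Vec<u8>> }
struct Store { map: HashMap<Vec<u8>, Vec<u8>> }        // hash-indexed keyspace
struct PageCache { pages: HashMap<Vec<u8>, Vec<u8>> }  // process-wide read cache

impl Wal {
    // Append the frame and force it to disk before the write is applied.
    fn append_and_fsync(&mut self, frame: Vec<u8>) {
        self.frames.push(frame);
        // a real implementation would fsync() the log file here
    }
}

// Every write takes the same path regardless of domain prefix.
fn put(wal: &mut Wal, store: &mut Store, key: Vec<u8>, value: Vec<u8>) {
    let mut frame = key.clone();
    frame.extend_from_slice(&value);
    wal.append_and_fsync(frame);     // durable first
    store.map.insert(key, value);    // then applied to the keyspace
}

// Reads go through the page cache; a miss falls back to the store.
fn get<'a>(cache: &'a mut PageCache, store: &Store, key: &[u8]) -> Option<&'a Vec<u8>> {
    if !cache.pages.contains_key(key) {
        if let Some(v) = store.map.get(key) {
            cache.pages.insert(key.to_vec(), v.clone());
        }
    }
    cache.pages.get(key)
}

fn main() {
    let (mut wal, mut store, mut cache) = (
        Wal { frames: vec![] },
        Store { map: HashMap::new() },
        PageCache { pages: HashMap::new() },
    );
    put(&mut wal, &mut store, b"row|orders|01HZ".to_vec(), b"msgpack-bytes".to_vec());
    assert!(get(&mut cache, &store, b"row|orders|01HZ").is_some());
}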
Each tenant gets a single, region-isolated EC2 instance. No shared compute, no noisy neighbour. Writes go to one primary; an optional sync follower ships WAL frames in lockstep for RPO=0 paid tiers. The follower bootstraps from a Frame::Snapshot transfer and then tails.
Declared in TOML.
A schema manifest declares columns, indexes, relations to other schemas, and extractions (chunked text fields, vector fields, FTS fields). The catalog is itself stored as rows — adding a field is a write, not a downtime migration.
# schemas/orders.toml
name = "orders"
version = 1
[[columns]]
name = "id"
type = "ulid"
pk = true
[[columns]]
name = "customer"
type = "ulid"
[[columns]]
name = "amount"
type = "decimal"
[[columns]]
name = "status"
type = "string"
[[columns]]
name = "placed"
type = "timestamp"
[[indexes]]
columns = ["status"]
[[indexes]]
columns = ["customer", "placed"]
[[relations]]
edge = "customer"
target = "customers" # rel|fwd|orders|customer|<order>|<customer>
[[extractions.fts]]
field = "notes"
analyzer = "english" # snowball stem + diacritics fold + stop-words
[[extractions.vector]]
field = "summary_embedding"
dim = 1024
metric = "cosine" Indexes, relations, and extractions are honoured at write time — no separate "build index" step. Online migrations follow a strict contract: monotonic version int, one of four allowed shapes per migration, eager 10% backfill, dual-read transform, atomic cutover. See ops → migrations.
Ten domain prefixes in production.
Every byte stored on disk lives under one of these prefixes. SQL reads row|* and idx|*. Graph reads rel|*. Full-text reads fts|*, fts_doclen|*, and fts_corpus|*. Vector reads vec|* and vec_idx|*. The plan cache is intent|*. A key-construction sketch follows the table.
| Prefix | Byte layout | Purpose |
|---|---|---|
| row | row|<schema>|<pk_bytes> | The primary user-facing record. PK can be ULID, UUID, string, or composite. Encodes a single row as MessagePack. |
| idx | idx|<schema>|<column>|<value_bytes>|<pk_bytes> | Secondary index entries. Hash-keyed range scans work via prefix iteration on (schema, column, value). |
| rel | rel|fwd|<schema>|<edge>|<src_pk>|<dst_pk> · rel|rev|<schema>|<edge>|<dst_pk>|<src_pk> | Edges between rows. Always written in pairs (forward + reverse) so neighbours and reverse-neighbours are both O(prefix-scan). |
| chunk | chunk|<schema>|<row_pk>|<seq_u32> | Document chunks for FTS / vector — splits long text fields into addressable units while keeping the parent row intact. |
| fts | fts|<schema>|<field>|<token>|<row_pk>|<positions> | Inverted-index posting list. Token is post-tokeniser (UAX #29 + optional Snowball stem). Positions back phrase queries. |
| fts_doclen | fts_doclen|<schema>|<field>|<row_pk> | Per-doc length cache for BM25 scoring. Updated atomically with each fts insert. |
| fts_corpus | fts_corpus|<schema>|<field> | Corpus-wide stats: doc count, total token count, average doc length. One key per (schema, field). |
| vec | vec|<schema>|<field>|<row_pk> | Raw f32 embedding vector — the value the SIMD distance kernel reads. |
| vec_idx | vec_idx|<schema>|<field>|<segment_id> | Serialised HNSW graph segment. Loaded once per process into the graph cache, evicted on schema migration. |
| intent | intent|<question_hash> | Plan cache entry — the compiled Plan tree for a question template. Skips the LLM compile on cache hit. |
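To make the layouts concrete, here is a sketch of the keys one orders row could produce; the pipe-delimited shapes come from the table above, while the literal ULID values (and the use of plain strings instead of raw bytes) are purely illustrative:

// Keys are pipe-delimited byte strings on disk; plain strings are used here
// purely for readability.
fn key(parts: &[&str]) -> String {
    parts.join("|")
}

fn main() {
    let order_pk = "01HZX7Q0";      // assumed ULID, shortened
    let customer_pk = "01HZWM3T";   // assumed ULID, shortened

    // row|orders|<pk_bytes> -> the MessagePack-encoded row
    let row_key = key(&["row", "orders", order_pk]);

    // idx|orders|status|<value_bytes>|<pk_bytes>; status = 'paid' is a prefix
    // scan over (orders, status, paid)
    let idx_key = key(&["idx", "orders", "status", "paid", order_pk]);

    // rel|fwd and rel|rev are written as a pair so neighbours and
    // reverse-neighbours are both one prefix scan
    let rel_fwd = key(&["rel", "fwd", "orders", "customer", order_pk, customer_pk]);
    let rel_rev = key(&["rel", "rev", "orders", "customer", customer_pk, order_pk]);

    for k in [row_key, idx_key, rel_fwd, rel_rev] {
        println!("{k}");
    }
}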
Eleven operators, one tree.
Both /sql and /ask compile to the same Plan tree. The tree is JSON-serialisable, cached by question hash under intent|*, and replayable; a sketch of the cache lookup follows. Every shipped query shape is one of the operators listed below or a composition of them.
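A rough sketch of that cache path; the Plan variants, hash function, and compile hook below are assumptions, not the shipped types:

use std::collections::HashMap;

// Assumed stand-ins; the real Plan operators are listed below.
#[derive(Clone)]
enum Plan {
    Scan { prefix: String },
    Filter { child: Box<Plan>, predicate: String },
}

struct IntentCache { entries: HashMap<u64, Plan> } // backs intent|<question_hash>

fn question_hash(question: &str) -> u64 {
    // Assumed: any stable hash over the question template will do for the sketch.
    use std::hash::{Hash, Hasher};
    let mut h = std::collections::hash_map::DefaultHasher::new();
    question.hash(&mut h);
    h.finish()
}

// Cache hit replays the stored Plan tree; a miss falls through to the compile step.
fn plan_for(cache: &mut IntentCache, question: &str, compile: impl Fn(&str) -> Plan) -> Plan {
    let key = question_hash(question);
    cache
        .entries
        .entry(key)
        .or_insert_with(|| compile(question)) // only runs on a cache miss
        .clone()
}

fn main() {
    let mut cache = IntentCache { entries: HashMap::new() };
    let compile = |q: &str| Plan::Filter {
        child: Box::new(Plan::Scan { prefix: "row|orders".into() }),
        predicate: format!("from question: {q}"),
    };
    let _plan = plan_for(&mut cache, "total paid orders per customer", compile);
    let _again = plan_for(&mut cache, "total paid orders per customer", compile); // cache hit, no compile
}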
Scan · ColumnScan · IndexScan · Filter · Project · Limit · Sort · Aggregate · HashJoin · OuterJoin · RelationHop

-- SELECT c.name, SUM(o.amount) FROM orders o
-- JOIN customers c ON c.id = o.customer
-- WHERE o.status = 'paid' GROUP BY c.name HAVING SUM(o.amount) > 1000;
Aggregate { group: [c.name], having: SUM(o.amount) > 1000 }
└── Project { c.name, o.amount }
    └── HashJoin { o.customer = c.id }
        ├── IndexScan { idx|orders|status = "paid" }
        └── Scan { row|customers }

Active-passive, sync.
One primary, one optional sync follower. WAL frames replicate before the primary returns 200. A follower joining a running cluster bootstraps via Frame::Snapshot — a chunked transfer of every key in the store — then tails the live frame stream from the snapshot's LSN.
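Sketched below is the bootstrap-then-tail sequence; Frame::Snapshot is named above, but the second variant, the field names, and the Store type are assumptions:

use std::collections::HashMap;

// Assumed frame shapes; only Frame::Snapshot appears in this section.
enum Frame {
    Snapshot { chunk: Vec<(Vec<u8>, Vec<u8>)>, last: bool, lsn: u64 }, // chunked key transfer
    Wal { lsn: u64, bytes: Vec<u8> },                                  // live frame stream
}

struct Store { map: HashMap<Vec<u8>, Vec<u8>>, applied_lsn: u64 }

impl Store {
    fn apply_wal(&mut self, lsn: u64, _bytes: &[u8]) { self.applied_lsn = lsn; }
}

// Joining follower: apply snapshot chunks first, then tail WAL frames whose
// LSN is newer than the snapshot's LSN.
fn follow(frames: impl Iterator<Item = Frame>, store: &mut Store) {
    let mut snapshot_lsn = 0;
    for frame in frames {
        match frame {
            Frame::Snapshot { chunk, lsn, .. } => {
                for (k, v) in chunk { store.map.insert(k, v); }
                snapshot_lsn = lsn;
            }
            Frame::Wal { lsn, bytes } if lsn > snapshot_lsn => store.apply_wal(lsn, &bytes),
            Frame::Wal { .. } => {} // already covered by the snapshot
        }
    }
}

fn main() {
    let mut store = Store { map: HashMap::new(), applied_lsn: 0 };
    let frames = vec![
        Frame::Snapshot { chunk: vec![(b"row|orders|01HZ".to_vec(), b"v1".to_vec())], last: true, lsn: 42 },
        Frame::Wal { lsn: 43, bytes: b"frame-43".to_vec() },
    ];
    follow(frames.into_iter(), &mut store);
    assert_eq!(store.applied_lsn, 43);
}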
| Mode | RPO | RTO | Notes |
|---|---|---|---|
| Primary only | ~0.5s (WAL fsync) | ~5–10 min (S3 restore) | Whisper tier default. Restore replays sealed segments + tail. |
| Sync follower | 0 | ~25s (drilled) | Thunder, Storm, and Enterprise tiers. Verified end-to-end with snapshot bootstrap. |
Active-passive sync replication is the production path. Commits durably ack only after the follower has the frame on disk. RPO=0, RTO ~25 s. See ops → failover for the promotion procedure.
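A minimal sketch of the ack ordering; only the sequencing (local fsync, follower on-disk ack, then 200) comes from this section, while the types and the failure status are assumptions:

// Assumed shapes for the commit path.
struct WalFrame { lsn: u64, bytes: Vec<u8> }

trait Follower {
    // Returns once the follower has the frame on its own disk.
    fn ship_and_wait(&mut self, frame: &WalFrame) -> Result<(), String>;
}

fn commit(local_fsync: impl Fn(&WalFrame), follower: &mut dyn Follower, frame: WalFrame) -> u16 {
    local_fsync(&frame);                       // primary durability
    match follower.ship_and_wait(&frame) {     // RPO=0: wait for the follower's disk
        Ok(()) => 200,                         // only now does the client see success
        Err(_) => 503,                         // assumed behaviour on replication failure
    }
}

struct DiskFollower;
impl Follower for DiskFollower {
    fn ship_and_wait(&mut self, _frame: &WalFrame) -> Result<(), String> { Ok(()) }
}

fn main() {
    let status = commit(|_frame: &WalFrame| { /* local WAL fsync */ }, &mut DiskFollower, WalFrame { lsn: 7, bytes: vec![] });
    assert_eq!(status, 200);
}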
Single-row optimistic CAS.
Every row carries an internal _oc_row_version field. The API exposes put_row_cas, get_row_versioned, and delete_row_cas for optimistic concurrency. A CAS that loses the race fails the entire batch with a deterministic error — no partial application. Idempotency keys make retries safe: the same key with the same body returns the original response; the same key with a different body returns 409.
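The method names below come from this section; the signatures, the Order type, and the retry loop are assumptions sketching the client-side optimistic read-modify-write pattern:

// Assumed client-side shapes; the method names are real, the signatures are not.
#[derive(Clone)]
struct Order { status: String }
struct Versioned { row: Order, oc_row_version: u64 } // mirrors _oc_row_version
struct CasConflict;

trait Rows {
    fn get_row_versioned(&self, pk: &str) -> Versioned;
    fn put_row_cas(&self, pk: &str, row: Order, expected_version: u64) -> Result<(), CasConflict>;
}

// Optimistic read-modify-write: re-read and retry when the CAS loses the race.
fn mark_paid(api: &impl Rows, pk: &str) {
    loop {
        let current = api.get_row_versioned(pk);
        let mut row = current.row.clone();
        row.status = "paid".to_string();
        match api.put_row_cas(pk, row, current.oc_row_version) {
            Ok(()) => break,
            Err(CasConflict) => continue, // lost the race; re-read and retry
        }
    }
}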
Sealed segments + continuous tail.
Two streams flow to S3 in parallel: sealed WAL segments shipped on roll, and a continuous tail-shipper that flushes the open segment every few hundred milliseconds. WAL frame v2 carries an embedded timestamp_micros so restore-to-timestamp resolves below the segment boundary.
# restore an instance to a wall-clock timestamp
oc-pitr restore \
--tenant acme \
--target "2026-04-29T18:42:00Z" \
--into acme-restore-001

Sealed-segment PITR (segment-boundary granularity, ~5–10 min restore window) is included on every tier. Intra-segment LSN-precise PITR (sub-second granularity, ~0.5–1.5 s data-loss window) is a paid add-on — see pricing.
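For intuition, a sketch of how a restore target below the segment boundary could be resolved with the embedded timestamp_micros; the frame and segment shapes here are assumptions:

// Assumed frame-v2 and segment shapes; only the embedded timestamp_micros
// field comes from this section.
struct FrameV2 { lsn: u64, timestamp_micros: u64, bytes: Vec<u8> }
struct Segment { frames: Vec<FrameV2> }

// Replay sealed segments plus the shipped tail, stopping at the first frame
// whose embedded timestamp passes the restore target.
fn replay_to(segments: &[Segment], target_micros: u64, mut apply: impl FnMut(&FrameV2)) -> Option<u64> {
    let mut last_lsn = None;
    for frame in segments.iter().flat_map(|s| s.frames.iter()) {
        if frame.timestamp_micros > target_micros {
            break; // resolves the cutoff below the segment boundary
        }
        apply(frame);
        last_lsn = Some(frame.lsn);
    }
    last_lsn
}

fn main() {
    let seg = Segment { frames: vec![
        FrameV2 { lsn: 1, timestamp_micros: 1_000, bytes: vec![] },
        FrameV2 { lsn: 2, timestamp_micros: 2_000, bytes: vec![] },
    ]};
    // the target falls between the two frame timestamps, so replay stops after LSN 1
    let cutoff = replay_to(&[seg], 1_500, |_f: &FrameV2| {});
    assert_eq!(cutoff, Some(1));
}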