OriginChain
Industries · contracts, matters, audit

AI database for legal & compliance. Contracts, clauses, and audit on one bearer token.

The problem

Legal teams keep contracts in a document management system (DMS), search them with one tool, retrieve similar clauses with another, and prove who looked at what with a third. The audit trail rarely lines up across those systems.

The OriginChain answer

OriginChain stores contracts, clause embeddings, matter records, and the access-audit log on one substrate. HNSW finds the five clauses most similar to a draft at recall@10 = 0.96 with p99 109 ms (high_recall, the default), or at p99 37 ms in fast mode (recall 0.69) when a re-ranker takes over. BM25 ranks briefs that mention 'force majeure pandemic' in about 14 ms, and SQL JOINs roll those results into matter-level reports. The append-only audit log lives on the same store. SOC 2 Type 1 is underway; contact us for the audit timeline.

vector recall@10 · 100k: 0.96
BM25 search p99: < 16 ms
audit: append-only, hash-chained
tenancy: single-tenant, region-isolated

what they use OriginChain for

One bearer token. One endpoint. Every query shape.

Each example below is a real call against the public HTTP API. Copy the curl, set $OC_TOKEN, and you'll see the same shape of response your app gets in production. Latency numbers are measured against a Storm-tier instance in ap-south-1.

Schemas you'd register

Register these once via oc schema put or the /v1/schema endpoint, and every example below resolves against them.

schema        | purpose                         | key fields
contracts     | Contract registry + body text   | contract_id · counterparty · effective_from · text
clauses       | Atomic clause records           | clause_id · contract_id · kind · text
clauses_embed | Clause embeddings for retrieval | clause_id · embedding[768]
matters       | Active matters / engagements    | matter_id · client · lead_attorney · status
matter_briefs | Brief / memo full text          | brief_id · matter_id · ts · author · text
audit_access  | Document-access audit           | ts · actor · doc_id · action

SQL for analytics and reconciliation

Standard SQL with JOIN, GROUP BY, HAVING, and window functions against the same store.

sql POST /v1/sql

Contracts auto-renewing in 60 days

request: SELECT contract_id, counterparty, expiration_date FROM contracts WHERE auto_renew = true AND expiration_date BETWEEN now() AND now() + interval '60 days' ORDER BY expiration_date
indexed range scan · ~38 ms · schemas: contracts
curl
curl -X POST https://oc-acme.ap-south-1.originchain.ai/v1/sql \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"sql":"SELECT contract_id, counterparty, expiration_date FROM contracts WHERE auto_renew = true AND expiration_date BETWEEN now() AND now() + interval '\''60 days'\'' ORDER BY expiration_date"}'
response · application/json
{
  "rows": [
    { "contract_id": "MSA-4108", "counterparty": "Acme Corp",         "expiration_date": "2026-06-04" },
    { "contract_id": "SOW-7201", "counterparty": "Borealis Holdings", "expiration_date": "2026-06-18" }
  ],
  "meta": { "latency_ms": 38 }
}
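
If you'd rather call the endpoint from application code than from a shell, the sketch below builds the same POST /v1/sql request in Python with only the standard library. The URL, the headers, and the "sql" and "rows" keys are copied from the curl call and response above. The helper names build_sql_request and run_sql are hypothetical and not part of any OriginChain SDK. Letting json.dumps do the escaping keeps single-quoted SQL literals such as '60 days' intact without shell quoting tricks.

```python
import json
import urllib.request

# Endpoint copied from the curl example; "oc-acme" is the sample instance name.
SQL_URL = "https://oc-acme.ap-south-1.originchain.ai/v1/sql"


def build_sql_request(sql: str, token: str, url: str = SQL_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/sql request.

    Hypothetical helper: json.dumps escapes the SQL string, so literals
    like interval '60 days' reach the server unchanged.
    """
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def run_sql(sql: str, token: str) -> list:
    """Send the query and return the "rows" array from the JSON response."""
    with urllib.request.urlopen(build_sql_request(sql, token)) as resp:
        return json.load(resp)["rows"]
```

Calling run_sql(query, os.environ["OC_TOKEN"]) with the query above should return rows shaped like the sample response, assuming the instance and token are valid.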

Vector search for similarity

HNSW with tunable speed/recall. Default high_recall: recall@10 = 0.96 at 100k, p99 109 ms. Fast: p99 37 ms (recall 0.69). Metadata filters during graph traversal.

vector · hnsw POST /v1/vector/topk

Five clauses most similar to a draft indemnity

request: topk against clauses_embed for a draft indemnity clause
recall@10 = 0.96 · p99 109 ms at 100k clauses (high_recall) · schemas: clauses_embed
curl
curl -X POST https://oc-acme.ap-south-1.originchain.ai/v1/vector/topk \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "schema": "clauses_embed",
    "field":  "embedding",
    "query":  "@draft_indemnity_v3",
    "k":      5,
    "metric": "cosine",
    "filter": { "kind": "indemnity" }
  }'
response · application/json
{
  "rows": [
    { "clause_id": "CL-12081", "contract_id": "MSA-4108", "score": 0.974 },
    { "clause_id": "CL-08942", "contract_id": "MSA-3912", "score": 0.961 },
    { "clause_id": "CL-11104", "contract_id": "SOW-7201", "score": 0.948 }
  ],
  "meta": { "latency_ms": 109, "index_size": 100000, "mode": "high_recall" }
}
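
To make the recall@10 figure concrete, here is a minimal sketch of how recall@k is commonly computed: compare the ids an approximate index returns against an exact, brute-force cosine ranking. This is not OriginChain's HNSW implementation or its benchmark harness. The function names and the toy two-dimensional vectors are assumptions for illustration only.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_topk(query: list, vectors: dict, k: int) -> list:
    """Brute-force top-k ids by cosine similarity: the ground truth an ANN index approximates."""
    ranked = sorted(vectors, key=lambda cid: cosine(query, vectors[cid]), reverse=True)
    return ranked[:k]


def recall_at_k(approx_ids: list, exact_ids: list, k: int) -> float:
    """Fraction of the exact top-k that also appears in the approximate top-k."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k


# Toy data with hypothetical clause ids; real embeddings here are 768-dimensional.
toy_vectors = {
    "CL-1": [1.0, 0.0],
    "CL-2": [0.9, 0.1],
    "CL-3": [0.0, 1.0],
    "CL-4": [-1.0, 0.0],
}
truth = exact_topk([1.0, 0.0], toy_vectors, k=2)      # exact neighbours
score = recall_at_k(["CL-1", "CL-3"], truth, k=2)     # an index that missed CL-2
```

Read this way, recall@10 = 0.96 means that on average 9.6 of the 10 true nearest clauses come back, while fast mode at 0.69 trades some of those for latency and leans on a re-ranker downstream.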

Full-text search with BM25

Unicode tokenizer, stop-words, language stemming. Phrase, OR, and field-scoped queries.

full-text · bm25 POST /v1/fts/search

Briefs mentioning 'force majeure pandemic'

request: BM25 across matter_briefs.text
BM25 + phrase boost · ~14 ms · schemas: matter_briefs
curl
curl -X POST https://oc-acme.ap-south-1.originchain.ai/v1/fts/search \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "schema": "matter_briefs",
    "field":  "text",
    "q":      "\"force majeure\" pandemic",
    "k":      25
  }'
response · application/json
{
  "rows": [
    { "brief_id": "B-44012", "matter_id": "M-2188", "score": 9.12, "snippet": "...force majeure clause invoked during the pandemic period..." },
    { "brief_id": "B-44188", "matter_id": "M-3041", "score": 8.41, "snippet": "...pandemic-related force majeure analysis attached..." }
  ],
  "meta": { "latency_ms": 14 }
}
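
For readers who want intuition for the score column, the sketch below implements textbook Okapi BM25 over a handful of strings. It is an illustration under stated assumptions, not OriginChain's ranker: the tokenizer is a plain lowercase split with no stemming or stop-words, there is no phrase boost, and k1 = 1.2, b = 0.75 are common defaults rather than documented settings. The brief ids and texts are made up.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase and split on non-alphanumerics; no stemming or stop-words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: dict, k1: float = 1.2, b: float = 0.75) -> dict:
    """Score every document against the query with textbook Okapi BM25."""
    if not docs:
        return {}
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for t in tokens.values() for term in set(t))
    scores = {}
    for doc_id, toks in tokens.items():
        tf = Counter(toks)
        total = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalisation: longer-than-average documents are penalised.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            total += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = total
    return scores


# Hypothetical briefs for illustration only.
sample_docs = {
    "B-1": "force majeure clause invoked during the pandemic",
    "B-2": "quarterly billing summary",
    "B-3": "pandemic related force majeure analysis",
}
sample_scores = bm25_scores("force majeure pandemic", sample_docs)
```

In this toy corpus B-3 outranks B-1 because both match all three terms but B-3 is shorter, and B-2 scores zero because it matches none.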

Natural-language questions

Plain English in. JSON out. Compiled plan cached after first touch.

natural language POST /v1/ask

Top open matters by recent activity

request: show me the 10 open matters with the most brief activity in the last 7 days
compiled plan cached after first touch · schemas: matters · matter_briefs
curl
curl -X POST https://oc-acme.ap-south-1.originchain.ai/v1/ask \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"q":"top 10 matters by brief count, last 7 days, status open"}'
response · application/json
{
  "rows": [
    { "matter_id": "M-2188", "client": "Acme Corp",         "briefs_7d": 14 },
    { "matter_id": "M-3041", "client": "Crescent Holdings", "briefs_7d": 11 }
  ],
  "meta": { "latency_ms": 52, "plan": "scan(matter_briefs, 7d) · group_by(matter_id) · top_k(10) · join(matters)" }
}

why one substrate

Cross-shape consistency, by construction.

When SQL, vector, full-text, and graph all read from the same hash-keyed k/v store, a row written at 09:14:02.118 is visible to every shape on the next read. No ETL window, no replication lag, no consistency tax across vendors.

single-tenant

Region-isolated dedicated instance

Your data sits in your region, on a dedicated instance with its own keys and its own resource budget. No noisy-neighbour. No shared control plane.

durable

PITR + cross-AZ replication

Every write goes to a durable WAL, replicated to a hot standby in a second AZ. Restore to any second in your retention window.

observable

OTLP metrics + audit log

Per-key latency histograms, hit rate on the plan cache, and an append-only audit log of every privileged action — exported via OTLP to your observability stack.
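
The audit log is described above as append-only and hash-chained. As a rough illustration of what hash chaining buys you, the sketch below links each entry to the previous one with SHA-256, so editing any earlier record breaks verification of everything after it. The entry layout, the GENESIS placeholder, and the function names are assumptions for this example. The page does not document OriginChain's actual on-disk format or hash function. The record fields mirror the audit_access schema (ts, actor, doc_id, action) with made-up values.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON encoding of the record."""
    payload = prev_hash + json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(log: list, record: dict) -> None:
    """Append a record, chaining it to the hash of the last entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify(log: list) -> bool:
    """Recompute every link; any edited or reordered entry makes this return False."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


# Hypothetical audit_access records.
audit_log = []
append(audit_log, {"ts": "2026-05-01T09:14:02Z", "actor": "a.rao", "doc_id": "MSA-4108", "action": "view"})
append(audit_log, {"ts": "2026-05-01T09:15:40Z", "actor": "a.rao", "doc_id": "MSA-4108", "action": "export"})
chain_ok = verify(audit_log)
```

An auditor who holds only the latest hash can detect tampering with any earlier entry, which is the property that makes a chained log useful as evidence of who looked at what.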

ready when you are

Ninety seconds to an endpoint. No stack to wire up.

Pick a region, pick a tier, and we provision a single-tenant instance on AWS. The first query you send is the first query we'll show you how to write — in English.
