OriginChain docs
examples · ask · 4 / 6

4. JOIN expressed in English

← Ask examples
what this does

"Total order amount per customer email" needs a join: orders has the amount, customers has the email. The compiler sees the declared relation on customer_id and folds it into a hash join under an aggregate.

when to use it
  • The question pulls fields from more than one table and you've declared the relation between them.
  • You want the answer phrased in terms of a human-readable field (email) rather than the underlying foreign key (customer_id).
  • You'd rather not maintain hand-rolled JOIN SQL for evolving reports.

List both schemas in the hint. The compiler can only join across schemas that are in the visible catalog for this call.

the schemas

Register both. The [[relations]] block on shop.orders is what makes the join discoverable.

# Two related schemas. The relation lets the compiler walk customer_id
# from orders to id on customers.

namespace   = "shop"
table       = "customers"
primary_key = ["id"]

[[columns]]
name = "id"
ty   = "str"
required = true

[[columns]]
name = "email"
ty   = "str"

---

namespace   = "shop"
table       = "orders"
primary_key = ["id"]

[[columns]]
name = "id"
ty   = "str"
required = true

[[columns]]
name = "customer_id"
ty   = "str"

[[columns]]
name = "amount"
ty   = "f64"

[[relations]]
name    = "customer"
columns = ["customer_id"]
references = { schema = "shop.customers", columns = ["id"] }
seed data
# Two customers, three orders between them.
curl -X POST "https://$OC_HOST/v1/tenants/$OC_TENANT/rows/shop.customers/_batch" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '[
    { "id": "c_1", "email": "alice@example.com" },
    { "id": "c_2", "email": "bob@example.com" }
  ]'

curl -X POST "https://$OC_HOST/v1/tenants/$OC_TENANT/rows/shop.orders/_batch" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '[
    { "id": "o_1", "customer_id": "c_1", "amount": 19.00 },
    { "id": "o_2", "customer_id": "c_1", "amount": 43.40 },
    { "id": "o_3", "customer_id": "c_2", "amount": 129.00 }
  ]'
the request
POST /v1/tenants/:t/ask
curl -X POST "https://$OC_HOST/v1/tenants/$OC_TENANT/ask" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "nl":      "total order amount per customer email",
    "schemas": ["shop.orders", "shop.customers"]
  }'
what you get back
{
  "rows": [
    { "email": "alice@example.com", "total":  62.40 },
    { "email": "bob@example.com",   "total": 129.00 }
  ],
  "cache": "miss"
}

One row per distinct email, with the summed amount. Order is by the underlying hash-aggregator's bucket order - if you need a specific order, add "sorted by total descending" to the question.

how it works
  • The compiler reads both schemas and discovers the relation on shop.orders.customer_id → shop.customers.id.
  • The phrase "per customer email" pins the grouping column to customers.email, which forces a join.
  • The resulting plan is Aggregate(sum amount) → HashJoin(orders.customer_id = customers.id) → Scan(orders) + Scan(customers).
  • Set show_plan: true to see which side of the join the planner picked as the build vs probe.
common mistakes
  • Ambiguous "by customer". "Per customer" alone leaves the grouping column under-specified - the compiler may pick customer_id. Say "per customer email" or "per customer name" so the result is human-readable.
  • Missing the relation declaration. Without [[relations]] on one side, the compiler has no way to join the two schemas and returns no_plan_compiled.
  • Only listing one schema in the hint. The hint is an upper bound on what the compiler can see. If only shop.orders is in the list, customers is invisible and the join fails to compile.