07 · byok llm

Your key. Your audit. Your bill.

Most managed databases that do natural-language queries mark up LLM tokens 2–5×. We don't bill them at all. For high-volume RAG, this changes the unit economics outright.

why this matters

At scale, the LLM bill dwarfs the database bill.

A RAG product doing 1M /ask calls per month at $0.01 each runs $10,000 on someone's books. On a platform that marks up LLM tokens, that platform pockets a multiple of the actual provider cost. With BYOK, that $10,000 lands on your provider's invoice at your enterprise rate.
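The arithmetic behind that figure can be sketched in a few lines. This reads "2–5×" as the billed price being two to five times the provider cost, which is an assumption; the function name is illustrative, and amounts are kept in integer cents to avoid floating-point rounding.

```python
def monthly_llm_cost_cents(calls: int, cents_per_call: int, markup: int = 1) -> int:
    """Monthly LLM spend in cents for a per-call provider cost and a markup multiple."""
    return calls * cents_per_call * markup


calls = 1_000_000    # /ask calls per month
cents_per_call = 1   # $0.01 provider cost per call

byok = monthly_llm_cost_cents(calls, cents_per_call)             # billed by your provider
low = monthly_llm_cost_cents(calls, cents_per_call, markup=2)    # assumed 2x markup
high = monthly_llm_cost_cents(calls, cents_per_call, markup=5)   # assumed 5x markup

print(f"BYOK:      ${byok / 100:,.0f}")
print(f"Marked up: ${low / 100:,.0f} to ${high / 100:,.0f}")
```

Under those assumptions the same workload costs $10,000 with BYOK and $20,000 to $50,000 on a platform that marks tokens up.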

You also see every prompt, every completion, every token in your provider's dashboard. No intermediary, no batch reporting delay, no "trust us" line item.

configure a key

POST /v1/llm/keys
{
  "provider": "anthropic",
  "key":      "sk-ant-..."
}
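
A minimal Python sketch of the same call, using only the standard library. The base URL `https://api.example.com`, the `PLATFORM_TOKEN` placeholder, and the bearer-token `Authorization` header are assumptions for illustration; only the path and the two body fields come from the request above. The sketch builds the request without sending it.

```python
import json
import urllib.request


def build_key_request(base_url: str, api_token: str, provider: str, key: str) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/llm/keys request."""
    body = json.dumps({"provider": provider, "key": key}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/llm/keys",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth. Check the platform's auth docs.
            "Authorization": f"Bearer {api_token}",
        },
    )


req = build_key_request("https://api.example.com", "PLATFORM_TOKEN", "anthropic", "sk-ant-...")
# To actually send it: urllib.request.urlopen(req, timeout=10)
```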

providers

OpenAI

Chat completions.

Anthropic

Messages.

Gemini

Chat.

Groq

Fast inference for cost-sensitive workloads.

security model

Envelope-encrypted at rest

Your key is never stored in plaintext anywhere on our infrastructure.

Per-tenant isolation

Each tenant's keys are scoped and unreadable from other tenants.

Per-provider audit

Provider, model, prompt tokens, completion tokens, and timestamp, all visible in /usage.

Whitelist validation

For self-hosted providers: hostname checks + private-IP block.
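
One way a hostname check plus private-IP block could look, as a sketch rather than the platform's actual validator. The function name, the allowlist contents, and the choice to pass already-resolved addresses in as a parameter (so the example runs offline) are all assumptions.

```python
import ipaddress
from urllib.parse import urlparse


def is_allowed_endpoint(url: str, allowed_hosts: set[str], resolved_ips: list[str]) -> bool:
    """Accept a self-hosted endpoint only if its hostname is allowlisted
    and every address it resolves to is publicly routable."""
    host = (urlparse(url).hostname or "").lower()
    if host not in allowed_hosts:
        return False
    if not resolved_ips:
        # Nothing resolved, so nothing can be verified.
        return False
    for raw in resolved_ips:
        ip = ipaddress.ip_address(raw)
        # Block private ranges plus loopback, link-local, and other internal addresses.
        if (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_reserved or ip.is_multicast or ip.is_unspecified):
            return False
    return True


allowed = {"llm.example.com"}
ok = is_allowed_endpoint("https://llm.example.com/v1", allowed, ["93.184.216.34"])
blocked = is_allowed_endpoint("https://llm.example.com/v1", allowed, ["10.0.0.5"])
```

In practice the addresses would come from a DNS lookup such as `socket.getaddrinfo`, and the connection should be made to the address that was checked so the hostname cannot resolve differently between validation and use.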

failover

Pin a fallback. Transparent retry on error or timeout.

Configure a fallback provider per tenant. If your primary provider returns an error, rate-limits you, or times out, the request is retried on the fallback. The audit log records which provider actually answered.
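
The failover behaviour can be sketched as a small wrapper. This assumes the failed call itself is retried once on the fallback, and that errors, rate limits, and timeouts all surface as exceptions; `Answer`, `ask_with_fallback`, and the stub providers are hypothetical names, not part of the platform's API.

```python
from dataclasses import dataclass
from typing import Callable

# A provider is a (name, callable) pair; the callable takes a prompt and returns text.
Provider = tuple[str, Callable[[str], str]]


@dataclass
class Answer:
    provider: str  # which provider actually answered, for the audit trail
    text: str


def ask_with_fallback(prompt: str, primary: Provider, fallback: Provider) -> Answer:
    """Try the primary provider; on any failure, retry once on the fallback."""
    name, call = primary
    try:
        return Answer(provider=name, text=call(prompt))
    except Exception:
        # Errors, rate limits, and timeouts are all treated the same in this sketch.
        name, call = fallback
        return Answer(provider=name, text=call(prompt))


def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary timed out")


def healthy_fallback(prompt: str) -> str:
    return f"echo: {prompt}"


answer = ask_with_fallback("hello", ("anthropic", flaky_primary), ("openai", healthy_fallback))
print(answer.provider)
```

Returning the provider name alongside the text is what lets the caller record which provider answered, rather than which one was configured as primary.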

deferred

under design: Per-call rate limits per provider.
under design: Per-tenant LLM cost reporting inside /usage.

See how /ask uses it.
