Your key. Your audit. Your bill.
Most managed databases that do natural-language queries mark up LLM tokens 2–5×. We don't bill them at all. For high-volume RAG, this changes the unit economics outright.
At scale, the LLM bill dwarfs the database bill.
A RAG product doing 1M /ask calls per month at
$0.01 each runs $10,000 on someone's books. On a platform that marks up LLM tokens, that platform
pockets a multiple of the actual provider cost. With BYOK, that $10,000 lands on your provider's
invoice at your enterprise rate.
You also see every prompt, every completion, every token in your provider's dashboard. No intermediary, no batch reporting delay, no "trust us" line item.
POST /v1/llm/keys
{
"provider": "anthropic",
"key": "sk-ant-..."
} Chat completions.
Messages.
Chat.
Fast inference for cost-sensitive workloads.
Your key is never stored in plaintext anywhere on our infrastructure.
Each tenant's keys are scoped and unreadable from other tenants.
Provider, model, prompt tokens, completion tokens, timestamp - visible in /usage.
For self-hosted providers: hostname checks + private-IP block.
Pin a fallback. Transparent retry on error or timeout.
Configure a fallback provider per tenant. If your primary provider returns an error, rate-limits you, or times out, the next request retries on the fallback. Audit captures which provider actually answered.
- under designPer-call rate limits per provider.
- under designPer-tenant LLM cost reporting inside
/usage.