Rate limits
Rate limits protect an instance from runaway requests - a buggy loop, a forgotten while True, an autoscaling event. The default limit is generous enough for most apps; the goal is to bound damage from accidents, not to gate normal usage.
Default limits.
| Window | Limit per token | Notes |
|---|---|---|
| per minute | 1,000 requests | The primary rate limit. Counted per bearer token. |
| burst | 100 requests | Short-window burst allowance. Lets you do a quick batch of writes without hitting the per-minute cap. |
| concurrent | 25 in-flight | Maximum simultaneous open requests per token. |
Limits are per-token, not per-instance. If you create multiple tokens (e.g., one per service), each gets its own budget. If you need higher limits, contact support@originchain.ai - production accounts can have limits raised on request.
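Because each budget is attached to a token, a process that fans requests out across threads can keep itself inside both the per-minute and the concurrency caps before the server has to step in. The sketch below is a hypothetical helper (ClientSideLimiter is not part of any OriginChain SDK) and assumes one limiter instance per token per process. It uses a sliding 60-second window, which never allows more than the cap in any 60-second span of client-side send times, so it should stay under the limit however the server aligns its own window (network latency aside). The burst allowance is not modeled, since its window length isn't specified here.

```python
import threading
import time
from collections import deque


class ClientSideLimiter:
    """Hypothetical per-token guard: <= per_minute starts per window, <= max_in_flight at once.

    Defaults mirror the documented limits (1,000 requests/minute, 25 in-flight).
    """

    def __init__(self, per_minute=1000, max_in_flight=25, window=60.0):
        self._per_minute = per_minute
        self._window = window
        self._starts = deque()  # monotonic start times inside the sliding window
        self._lock = threading.Lock()
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def _reserve_window_slot(self):
        while True:
            with self._lock:
                now = time.monotonic()
                # Forget requests that started a full window ago or earlier.
                while self._starts and now - self._starts[0] >= self._window:
                    self._starts.popleft()
                if len(self._starts) < self._per_minute:
                    self._starts.append(now)
                    return
                # Window is full: wait until the oldest start ages out.
                delay = self._window - (now - self._starts[0])
            time.sleep(delay)  # sleep outside the lock so other threads can check

    def call(self, fn, *args, **kwargs):
        """Run fn(*args, **kwargs) once both budgets allow it."""
        with self._slots:  # blocks while max_in_flight calls are already running
            self._reserve_window_slot()
            return fn(*args, **kwargs)
```

Create one limiter per token and route every request for that token through limiter.call(fn, ...), from any thread. It only coordinates threads inside one process; separate processes sharing a token would each need their own share of the budget.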
Response headers.
Every successful response carries your current usage. Use these to back off proactively before you hit the limit.
```
# Every API response includes these headers:
X-RateLimit-Limit: 1000        # requests per minute for this token
X-RateLimit-Remaining: 847     # what you have left this minute
X-RateLimit-Reset: 1714478400  # epoch seconds when the window resets

# On 429 responses you also get:
Retry-After: 12                # seconds
```
When X-RateLimit-Remaining drops near zero, your code can slow itself down voluntarily instead of waiting for the 429 response.
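One way to act on these headers is to pace the remaining budget across the time left in the window. The sketch below assumes the header names and meanings listed above; proactive_delay and its threshold of 50 are illustrative choices, not SDK behavior. headers can be any mapping keyed by the exact header names (a plain dict works), and missing or non-numeric values are treated as "no information".

```python
import time


def proactive_delay(headers, threshold=50, now=None):
    """Return how many seconds to pause before the next request.

    Hypothetical helper: above `threshold` remaining requests it returns 0;
    at or below it, it spreads the remaining budget evenly until
    X-RateLimit-Reset (epoch seconds), and waits for the full reset once
    the budget is exhausted. Missing or malformed headers mean "no delay".
    """
    try:
        remaining = int(headers["X-RateLimit-Remaining"])
        reset_at = int(headers["X-RateLimit-Reset"])
    except (KeyError, TypeError, ValueError):
        return 0.0
    if remaining > threshold:
        return 0.0
    if now is None:
        now = time.time()
    seconds_left = max(0.0, reset_at - now)
    if remaining <= 0:
        return seconds_left  # budget exhausted: wait for the window to reset
    return seconds_left / remaining  # pace what's left evenly across the window
```

Call it after each response and sleep for the result before sending the next request. Pacing rather than stopping at a hard floor keeps throughput steady and leaves the 429 path for genuine surprises.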
Handling 429.
When you exceed the limit, OriginChain returns 429 rate_limited with a Retry-After header (seconds). The SDKs handle this automatically - they read the header, wait, and retry up to 3 times. If you're calling the API directly, do the same:
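```python
# The Python SDK handles 429 + Retry-After automatically.
# This is what it does under the hood:
import time


def call_with_retry(fn, max_retries=3):
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except OCRateLimitedError as e:  # the SDK's exception for 429 rate_limited
            if attempt == max_retries:
                raise  # out of retries: surface the 429 to the caller
            # Honor Retry-After (seconds); fall back to 1s if it is absent.
            time.sleep(e.retry_after or 1.0)
```

The same logic for raw fetch users: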
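```typescript
async function callWithRetry(
  req: () => Promise<Response>,
  maxRetries = 3,
): Promise<Response> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await req();
    // Return non-429 responses as-is, and the final 429 once retries are used up.
    if (res.status !== 429 || attempt === maxRetries) return res;
    await res.body?.cancel(); // discard the 429 body to free the connection
    // Retry-After is in seconds; fall back to 1s if it is missing or not a number.
    const seconds = Number(res.headers.get("Retry-After") ?? "1");
    const waitMs = (Number.isFinite(seconds) ? seconds : 1) * 1000;
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
  throw new Error("unreachable");
}
```

The same logic with Go's standard library: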
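```go
// Imports: net/http, strconv, time.
// req must have no body (e.g. a GET); rebuild requests with a body before each attempt.
func doWithRetry(req *http.Request, maxRetries int) (*http.Response, error) {
	for attempt := 0; ; attempt++ {
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			return nil, err
		}
		// Return non-429 responses as-is, and the final 429 once retries are used up.
		if resp.StatusCode != http.StatusTooManyRequests || attempt == maxRetries {
			return resp, nil
		}
		resp.Body.Close() // free the connection before retrying
		wait := 1         // fall back to 1s if Retry-After is missing or not an integer
		if n, err := strconv.Atoi(resp.Header.Get("Retry-After")); err == nil {
			wait = n
		}
		time.Sleep(time.Duration(wait) * time.Second)
	}
}
```

Common mistakes.
- Tight retry loops without backoff. Retrying instantly on 429 just consumes the next minute's quota. Always honor Retry-After.
- Sharing one token across many workers. Limits are per token, so 25 workers sharing one token also share its 25 concurrent slots and its 1,000 requests per minute. Issue one token per worker or per service.
- Bulk inserts in single-row mode. Sending 10,000 single-row inserts means 10,000 requests, ten minutes' worth of the default per-minute limit. Use _batch instead - one request, thousands of rows. See Insert → bulk.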
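The arithmetic behind the last point: 10,000 single-row inserts are 10,000 requests, while the same rows in batches of 1,000 take 10. A minimal chunking sketch follows. send_batch is a hypothetical callable standing in for whatever issues one _batch request, and batch_size=1000 is an assumed value; check Insert → bulk for the real per-request maximum.

```python
from itertools import islice
from typing import Callable, Iterable, List


def insert_in_batches(
    rows: Iterable[dict],
    send_batch: Callable[[List[dict]], object],
    batch_size: int = 1000,
) -> int:
    """Send rows in chunks of `batch_size`; return the number of requests made.

    `send_batch` is a hypothetical callable that issues ONE _batch request
    for the rows it is given. `batch_size` is an assumed value, not a
    documented limit.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    requests = 0
    it = iter(rows)
    # islice pulls at most batch_size rows at a time, so `rows` can be a generator.
    while chunk := list(islice(it, batch_size)):
        send_batch(chunk)
        requests += 1
    return requests
```

To combine this with the retry helper, pass something like lambda chunk: call_with_retry(lambda: post_batch(chunk)), where post_batch is your own (hypothetical) function that sends one batch.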