
Rate limits

Rate limits protect an instance from runaway requests: a buggy loop, a forgotten while True, an autoscaling event. The default limits are generous enough for most apps; the goal is to bound the damage from accidents, not to gate normal usage.

Default limits.

Limit	Value per token	Notes
per minute	1,000 requests	The primary rate limit. Counted per bearer token.
burst	100 requests	Short-window burst allowance. Lets you do a quick batch of writes without hitting the per-minute cap.
concurrent	25 in-flight	Maximum simultaneous open requests per token.
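To stay under these limits on the client side, you can cap in-flight requests with a semaphore and pace request starts with a sliding one-minute window. The sketch below is illustrative: ClientLimiter is a hypothetical helper, not part of the OriginChain SDK. The server's own counting algorithm isn't documented here, and the burst row isn't modeled because its window isn't specified.

```python
import asyncio
import collections
import time


class ClientLimiter:
    """Hypothetical client-side limiter (not an SDK class).

    Caps concurrent calls and call starts per sliding window. Defaults
    mirror the table above: 25 in flight, 1,000 per 60 s, per token.
    """

    def __init__(self, per_window=1000, window_s=60.0, max_in_flight=25):
        self.per_window = per_window
        self.window_s = window_s
        self._sem = asyncio.Semaphore(max_in_flight)
        self._starts = collections.deque()  # monotonic start times inside the window
        self._lock = asyncio.Lock()

    async def _wait_for_slot(self):
        async with self._lock:
            while True:
                now = time.monotonic()
                # Forget starts that have aged out of the window.
                while self._starts and now - self._starts[0] >= self.window_s:
                    self._starts.popleft()
                if len(self._starts) < self.per_window:
                    self._starts.append(now)
                    return
                # Window is full: sleep until the oldest start ages out.
                await asyncio.sleep(self.window_s - (now - self._starts[0]))

    async def run(self, coro_fn):
        """Await coro_fn() once both the concurrency and window limits allow it."""
        async with self._sem:
            await self._wait_for_slot()
            return await coro_fn()
```

Create the limiter inside your running event loop (for example in main()) and route every call made with a given token through the same instance, e.g. await limiter.run(lambda: fetch_item(42)), where fetch_item stands in for your own async API call.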

Limits are per token, not per instance. If you create multiple tokens (e.g., one per service), each gets its own budget. If you need higher limits, contact support@originchain.ai; limits on production accounts can be raised on request.
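Since each token has its own budget, spreading work across tokens raises the aggregate ceiling. A back-of-the-envelope helper, assuming the default 1,000 requests per minute per token and ignoring the burst and concurrency limits (min_minutes is a hypothetical name):

```python
import math


def min_minutes(n_requests: int, n_tokens: int = 1, per_minute: int = 1000) -> int:
    """Minimum number of one-minute windows needed to send n_requests.

    Assumes the work is spread evenly over n_tokens, each with its own
    per-minute budget (default 1,000, per the table above).
    """
    if n_tokens < 1:
        raise ValueError("need at least one token")
    return math.ceil(n_requests / (n_tokens * per_minute))


print(min_minutes(10_000))              # 10  (one shared token)
print(min_minutes(10_000, n_tokens=4))  # 3   (one token per service, four services)
```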

Response headers.

Every response carries your current usage. Use these headers to back off proactively before you hit the limit.

# Every API response includes these headers:
X-RateLimit-Limit:     1000        # requests per minute for this token
X-RateLimit-Remaining: 847         # what you have left this minute
X-RateLimit-Reset:     1714478400  # Unix epoch seconds when the window resets

# On 429 responses you also get:
Retry-After: 12                    # seconds to wait before retrying

When X-RateLimit-Remaining drops near zero, your code can slow itself down instead of waiting for a 429.
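One way to do that is to spread the remaining budget evenly over the time left in the window. A minimal sketch, assuming the header names shown above; pacing_delay and its floor threshold of 50 are illustrative choices, not SDK behavior. headers is any mapping with those exact keys (many HTTP clients return a case-insensitive mapping, which also works).

```python
import time


def pacing_delay(headers, floor=50, now=None):
    """Seconds to wait before the next request, based on rate-limit headers.

    Returns 0 while more than `floor` requests remain; below that, spreads
    the remaining requests evenly over the time until X-RateLimit-Reset.
    """
    now = time.time() if now is None else now
    try:
        remaining = int(headers["X-RateLimit-Remaining"])
        reset = float(headers["X-RateLimit-Reset"])  # Unix epoch seconds
    except (KeyError, ValueError):
        return 0.0  # headers missing or malformed: don't throttle
    seconds_left = max(0.0, reset - now)
    if remaining > floor:
        return 0.0
    if remaining <= 0:
        return seconds_left  # budget spent: wait for the window to reset
    return seconds_left / remaining
```

With 10 requests left and 20 seconds until reset, this returns 2.0, so the remaining requests are spaced out across the rest of the window instead of ending in a 429.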

Handling 429.

When you exceed the limit, OriginChain returns 429 rate_limited with a Retry-After header giving the wait in seconds. The SDKs handle this automatically: they read the header, wait, and retry up to 3 times. If you're calling the API directly, do the same:

retry on 429

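# The Python SDK handles 429 + Retry-After automatically.
# This is what it does under the hood:

import time

# OCRateLimitedError is the SDK's exception for 429 responses; its
# retry_after attribute is the Retry-After value in seconds, or None.

def call_with_retry(fn, max_retries=3):
    """Call fn(), retrying up to max_retries times when it is rate limited."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except OCRateLimitedError as e:
            if attempt == max_retries:
                raise  # out of retries: let the caller see the 429
            time.sleep(e.retry_after or 1.0)  # honor Retry-After, else wait 1s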

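// Same logic for raw fetch users. `req` is a factory, so each retry builds a fresh request.
async function callWithRetry(req: () => Promise<Response>, maxRetries = 3): Promise<Response> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await req();
    if (res.status !== 429 || attempt === maxRetries) return res;
    await res.body?.cancel(); // discard the 429 body so the connection can be reused
    // Retry-After is in seconds; fall back to 1s if it's missing or not a number.
    const seconds = Number(res.headers.get("Retry-After") ?? "1");
    const wait = (Number.isFinite(seconds) && seconds >= 0 ? seconds : 1) * 1000;
    await new Promise(r => setTimeout(r, wait));
  }
  throw new Error("unreachable");
}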

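// Same logic with stdlib net/http (imports: fmt, net/http, strconv, time).
// Requests with a body are rewound via req.GetBody, which http.NewRequest
// sets for bytes.Buffer, bytes.Reader and strings.Reader bodies.
func doWithRetry(req *http.Request, maxRetries int) (*http.Response, error) {
    for attempt := 0; attempt <= maxRetries; attempt++ {
        if attempt > 0 && req.GetBody != nil {
            body, err := req.GetBody()
            if err != nil { return nil, err }
            req.Body = body
        }
        resp, err := http.DefaultClient.Do(req)
        if err != nil { return nil, err }
        if resp.StatusCode != http.StatusTooManyRequests { return resp, nil }
        resp.Body.Close() // release the connection before retrying
        if attempt == maxRetries { return nil, fmt.Errorf("rate limited after %d retries", maxRetries) }

        wait := 1 // default to 1s if Retry-After is missing or not an integer
        if n, err := strconv.Atoi(resp.Header.Get("Retry-After")); err == nil && n >= 0 {
            wait = n
        }
        time.Sleep(time.Duration(wait) * time.Second)
    }
    return nil, fmt.Errorf("unreachable")
}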
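The snippets above fall back to a fixed 1-second wait when Retry-After is absent. If you'd rather back off progressively in that case, a common pattern is capped exponential backoff with full jitter. This is a swapped-in technique, not documented SDK behavior; backoff_delay and its base and cap values are illustrative.

```python
import random


def backoff_delay(attempt, retry_after=None, base=1.0, cap=30.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based).

    Retry-After always wins when the server sends it. Otherwise pick a
    random delay in [0, min(cap, base * 2**attempt)) ("full jitter") so
    many clients retrying at once don't stay synchronized.
    """
    if retry_after is not None:
        return float(retry_after)
    return rng() * min(cap, base * (2 ** attempt))
```

To use it in call_with_retry, replace time.sleep(e.retry_after or 1.0) with time.sleep(backoff_delay(attempt, e.retry_after)).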

common mistakes

Tight retry loops without backoff. Retrying instantly on 429 just consumes the next minute's quota. Always honor Retry-After.
Sharing one token across many workers. Limits are per token, so 25 workers sharing one token also share its 25 concurrent slots and its 1,000 requests per minute. Issue one token per worker or per service.
Bulk inserts in single-row mode. Sending 10,000 single-row inserts will hit the rate limit. Use _batch instead: one request, thousands of rows. See Insert → bulk.
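The batching arithmetic is easy to check: grouping rows into fixed-size chunks turns 10,000 single-row requests into a handful. A sketch; the chunk size of 1,000 is a placeholder (use the per-request row cap documented under Insert → bulk), and send_batch is a hypothetical stand-in for your call to _batch.

```python
from itertools import islice


def chunks(rows, size=1000):
    """Yield successive lists of at most `size` rows.

    size=1000 is a placeholder; use the per-request row cap documented
    for _batch.
    """
    if size < 1:
        raise ValueError("size must be positive")
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


rows = [{"id": i} for i in range(10_000)]
batches = list(chunks(rows))
print(len(batches))  # 10 requests instead of 10,000
# for batch in batches:
#     send_batch(batch)  # hypothetical: your call to the _batch endpoint
```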