
Rate Limits

TopRouter implements rate limits to ensure fair usage and service stability. This page explains how rate limits work and how to handle them.

How Rate Limits Work

Rate limits are applied on a per-API-key basis. Limits are measured in:

  • RPM — Requests Per Minute
  • TPM — Tokens Per Minute
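The following sketch shows how the two budgets interact. A client can track both budgets in a sliding 60-second window and hold back any request that would exceed either one. This is a hypothetical client-side helper (`SlidingWindowLimiter` and `try_acquire` are illustrative names), not TopRouter's server-side algorithm, which this page does not describe. The `tokens` argument is your own estimate of what a request will consume, because the exact way tokens are counted toward TPM is not specified here.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side limiter keeping requests and tokens under per-minute budgets.

    Hypothetical helper: approximates RPM/TPM limits with a sliding window;
    the server may use a different algorithm.
    """

    def __init__(self, rpm, tpm, window=60.0, clock=time.monotonic):
        self.rpm = rpm
        self.tpm = tpm
        self.window = window
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) for each request in the window

    def _prune(self, now):
        # Drop requests that have aged out of the window
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()

    def try_acquire(self, tokens):
        """Record a request and return True if it fits both budgets, else False."""
        now = self.clock()
        self._prune(now)
        used_tokens = sum(t for _, t in self.events)
        if len(self.events) + 1 > self.rpm or used_tokens + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        return True
```

Whenever `try_acquire` returns False, a caller would sleep briefly and then try again. The injectable `clock` makes the window easy to test.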

Rate Limit Headers

Each API response includes headers with rate limit information:

```text
x-ratelimit-limit-requests: 60
x-ratelimit-remaining-requests: 55
x-ratelimit-reset-requests: 30s
x-ratelimit-limit-tokens: 100000
x-ratelimit-remaining-tokens: 95000
x-ratelimit-reset-tokens: 15s
```
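The limit and remaining values are plain integers, but the `x-ratelimit-reset-*` values are duration strings such as "30s". The small parser below turns the headers into numbers. Only whole-second values appear in the example above, so the other formats it accepts ("500ms", "1m30s") are assumptions borrowed from Go-style duration strings. For the same reason, the parser rejects anything it does not recognize rather than guessing. `parse_rate_limit_headers` and `RateLimitInfo` are hypothetical names. The function expects a mapping with lowercase keys; an httpx `Headers` object is case-insensitive and also works.

```python
import re
from dataclasses import dataclass

# Assumed duration formats: "30s", "500ms", "1m30s" (Go-style); "ms" is listed
# before "m" and "s" so milliseconds are not misread as minutes.
_DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
_DURATION_FULL = re.compile(r"(?:\d+(?:\.\d+)?(?:ms|h|m|s))+")
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def parse_duration(value):
    """Convert a reset value like "30s" or "1m30s" to seconds."""
    value = value.strip()
    if not _DURATION_FULL.fullmatch(value):
        raise ValueError(f"unrecognized duration: {value!r}")
    return sum(float(n) * _UNIT_SECONDS[u] for n, u in _DURATION_PART.findall(value))


@dataclass
class RateLimitInfo:
    limit_requests: int
    remaining_requests: int
    reset_requests_s: float
    limit_tokens: int
    remaining_tokens: int
    reset_tokens_s: float


def parse_rate_limit_headers(headers):
    """Build a RateLimitInfo from a mapping of lowercase header names to strings."""
    return RateLimitInfo(
        limit_requests=int(headers["x-ratelimit-limit-requests"]),
        remaining_requests=int(headers["x-ratelimit-remaining-requests"]),
        reset_requests_s=parse_duration(headers["x-ratelimit-reset-requests"]),
        limit_tokens=int(headers["x-ratelimit-limit-tokens"]),
        remaining_tokens=int(headers["x-ratelimit-remaining-tokens"]),
        reset_tokens_s=parse_duration(headers["x-ratelimit-reset-tokens"]),
    )
```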

Handling Rate Limits

Check Response Headers

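```python
# Assumes `client` is an OpenAI-compatible SDK client (e.g. `openai.OpenAI`)
# already configured with your TopRouter API key and base URL.
raw = client.chat.completions.with_raw_response.create(
    model="google/gemini-3.5-flash",
    messages=[{"role": "user", "content": "Hello"}],
)
completion = raw.parse()  # the regular ChatCompletion object

# Read rate limit info from the response headers (values are strings)
remaining_requests = int(raw.headers["x-ratelimit-remaining-requests"])
remaining_tokens = int(raw.headers["x-ratelimit-remaining-tokens"])
reset_requests = raw.headers["x-ratelimit-reset-requests"]  # e.g. "30s"
# Use these values to pace subsequent requests
```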
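One simple way to pace requests from these header values is to spread what is left of each budget evenly over the time until it resets, then let the tighter of the two budgets set the pace. The sketch below assumes the reset values have already been converted to seconds. `pacing_delay` is a hypothetical helper, and `est_tokens_per_request` is your own estimate, not a value the API reports. Take the example headers (55 requests left with a 30 s reset, 95,000 tokens left with a 15 s reset) and an estimate of 1,000 tokens per request. In that case the request budget is the tighter one, giving about 0.55 s between requests.

```python
def pacing_delay(remaining_requests, reset_requests_s,
                 remaining_tokens, reset_tokens_s,
                 est_tokens_per_request):
    """Seconds to wait before the next request so neither budget runs out early.

    Spreads the remaining requests and tokens evenly over the time left until
    each budget resets; if a budget is already exhausted, wait for its reset.
    """
    if remaining_requests <= 0:
        request_delay = reset_requests_s
    else:
        request_delay = reset_requests_s / remaining_requests

    if remaining_tokens <= 0 or remaining_tokens < est_tokens_per_request:
        token_delay = reset_tokens_s
    else:
        # Remaining tokens cover remaining_tokens / est requests before reset
        token_delay = reset_tokens_s * est_tokens_per_request / remaining_tokens

    # The tighter of the two budgets determines the pace
    return max(request_delay, token_delay)
```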

Exponential Backoff

When you receive a 429 status code, implement exponential backoff:

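```python
import random
import time


def is_rate_limited(exc):
    # OpenAI-style SDK errors expose `status_code`; otherwise fall back to the message text
    return getattr(exc, "status_code", None) == 429 or "429" in str(exc)


def retry_with_backoff(func, max_retries=5, base_delay=1.0, max_delay=60.0):
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(max_retries):
        try:
            return func()
        except Exception as e:
            # Re-raise non-429 errors immediately, and the 429 itself on the last attempt
            if not is_rate_limited(e) or attempt == max_retries - 1:
                raise
            # Exponential backoff (1s, 2s, 4s, ...) capped at max_delay, plus up to 1s of jitter
            wait = min(base_delay * 2 ** attempt, max_delay) + random.uniform(0, 1)
            time.sleep(wait)
```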
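A variation on the retry loop above moves the delay calculation into a separate helper that prefers the server's own reset hint when one is available. This rests on an assumption: that 429 responses carry the `x-ratelimit-reset-*` headers too. The page states that every response includes them, but does not show a 429 specifically. The hint is also assumed to be already converted to seconds. Without a hint, the helper falls back to capped exponential backoff with "full jitter", a commonly used variant that draws a uniform delay between 0 and the exponential ceiling instead of adding jitter on top. `backoff_delay` and its parameters are hypothetical names, and `rng` is injectable so the delays can be tested.

```python
import random


def backoff_delay(attempt, reset_hint_s=None, base_delay=1.0, max_delay=60.0,
                  rng=random.random):
    """Delay in seconds before retry number `attempt` (0-based) after a 429.

    With a reset hint (e.g. x-ratelimit-reset-requests in seconds), wait that
    long plus a little jitter. Otherwise use capped exponential backoff with
    full jitter: uniform in [0, min(max_delay, base_delay * 2**attempt)).
    """
    if reset_hint_s is not None:
        # Small jitter so clients sharing a key don't all retry at the same instant
        return reset_hint_s + rng() * base_delay
    ceiling = min(max_delay, base_delay * 2 ** attempt)
    return rng() * ceiling
```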

Best Practices

  1. Implement retry logic: Always handle 429 errors gracefully
  2. Add jitter: Randomize retry delays so that many clients don't retry at the same moment (the "thundering herd" problem)
  3. Batch requests: Combine multiple prompts into one request when possible; this saves requests against RPM, though the combined tokens still count toward TPM
  4. Cache responses: Store and reuse responses for identical queries
  5. Monitor usage: Track your request patterns in the Console
  6. Use streaming: Streaming doesn't reduce rate limit consumption, but it improves perceived latency
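Caching identical queries can be as simple as keying a dictionary on the full request. The sketch below is a hypothetical in-memory helper (`ResponseCache` and `get_or_call` are illustrative names). It hashes the model, messages, and options into a stable key and calls the API only on a miss. Treat it as appropriate only for requests whose output you are happy to reuse, such as `temperature=0`; sampled outputs can differ between calls even for the same prompt.

```python
import hashlib
import json


class ResponseCache:
    """In-memory cache keyed on the exact request (model + messages + options)."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    @staticmethod
    def key(model, messages, **options):
        # sort_keys makes the key independent of dict insertion order
        payload = json.dumps(
            {"model": model, "messages": messages, "options": options},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, call, model, messages, **options):
        """Return a cached response, or invoke `call(model, messages, **options)` once."""
        k = self.key(model, messages, **options)
        if k in self._store:
            self.hits += 1
            return self._store[k]
        result = call(model, messages, **options)
        self._store[k] = result
        return result
```

With an OpenAI-compatible client, `call` could be something like `lambda m, msgs, **o: client.chat.completions.create(model=m, messages=msgs, **o)`. A production cache would also need an eviction policy and possibly persistence.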

Tips for High-Volume Usage

  • Use faster, more cost-effective models for high-volume tasks
  • Implement request queuing in your application
  • Distribute requests across multiple API keys if needed
  • Contact support for custom rate limit increases
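Request queuing can be sketched with `asyncio`: jobs wait in a queue, a fixed number of workers drain it, and request starts are spaced at least a minimum interval apart. Setting that interval to 60 / RPM (1 s for the 60 RPM in the header example) keeps starts within the request budget. It does not account for TPM, so combine it with token-aware pacing if tokens are the tighter limit. `run_queued`, `worker_count`, and `min_interval_s` are hypothetical names. Each job is assumed to be a zero-argument callable that returns an awaitable, such as a wrapped call on an async SDK client.

```python
import asyncio


async def run_queued(jobs, worker_count=4, min_interval_s=1.0):
    """Run `jobs` (zero-argument async callables) through a queue.

    At most `worker_count` jobs are in flight at once, and job starts are spaced
    at least `min_interval_s` apart. Results keep the order of `jobs`.
    """
    queue = asyncio.Queue()
    for index, job in enumerate(jobs):
        queue.put_nowait((index, job))
    results = [None] * len(jobs)
    loop = asyncio.get_running_loop()
    next_start = loop.time()

    async def worker():
        nonlocal next_start
        while True:
            try:
                index, job = queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # queue drained, this worker is done
            # Reserve the next start slot; there is no await between reading and
            # updating next_start, so workers on one event loop can't race here
            now = loop.time()
            wait = next_start - now
            next_start = max(next_start, now) + min_interval_s
            if wait > 0:
                await asyncio.sleep(wait)
            results[index] = await job()

    await asyncio.gather(*(worker() for _ in range(worker_count)))
    return results
```

If a job raises, `asyncio.gather` propagates the first exception. Wrapping each job in retry-with-backoff logic keeps a single 429 from aborting the whole batch.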

INFO

Rate limits may vary by model and account tier. Check the Console for your current limits.
