Rate Limits

TopRouter implements rate limits to ensure fair usage and service stability. This page explains how rate limits work and how to handle them.

How Rate Limits Work

Rate limits are applied on a per-API-key basis. Limits are measured in:

RPM — Requests Per Minute
TPM — Tokens Per Minute

Rate Limit Headers

Each API response includes headers with rate limit information:

x-ratelimit-limit-requests: 60
x-ratelimit-remaining-requests: 55
x-ratelimit-reset-requests: 30s
x-ratelimit-limit-tokens: 100000
x-ratelimit-remaining-tokens: 95000
x-ratelimit-reset-tokens: 15s
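As a rough illustration, the headers above could be turned into numbers like this. The helper names `parse_reset` and `parse_rate_limit_headers` are hypothetical, and the handling of compound values such as "1m30s" is an assumption: only the plain-seconds form ("30s") appears in the sample. The sketch also assumes the headers are available as a mapping with lowercase keys.

```python
import re

# Hypothetical helpers; only the header names come from the sample above.
_DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|s|m|h)")
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse_reset(value):
    """Convert a reset value such as "30s" into seconds (float).

    Compound values like "1m30s" are supported as an assumption; the
    sample headers only show the plain-seconds form.
    """
    parts = _DURATION_PART.findall(value.strip())
    if not parts:
        raise ValueError(f"unrecognized reset value: {value!r}")
    return sum(float(number) * _UNIT_SECONDS[unit] for number, unit in parts)


def parse_rate_limit_headers(headers):
    """Collect the six rate limit headers into a nested dict."""
    info = {}
    for kind in ("requests", "tokens"):
        info[kind] = {
            "limit": int(headers[f"x-ratelimit-limit-{kind}"]),
            "remaining": int(headers[f"x-ratelimit-remaining-{kind}"]),
            "reset_seconds": parse_reset(headers[f"x-ratelimit-reset-{kind}"]),
        }
    return info
```

Keeping request and token quotas side by side makes it easy to act on whichever one is closer to running out.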

Handling Rate Limits

Check Response Headers

python

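# Assumes `client` is an OpenAI-compatible Python SDK client, which exposes
# HTTP headers through with_raw_response; adapt this to your own client.
raw = client.chat.completions.with_raw_response.create(
    model="google/gemini-3.5-flash",
    messages=[{"role": "user", "content": "Hello"}]
)
response = raw.parse()  # the parsed completion object

# Access rate limit info from headers
remaining_requests = raw.headers.get("x-ratelimit-remaining-requests")
remaining_tokens = raw.headers.get("x-ratelimit-remaining-tokens")

# Implement pacing based on remaining quota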
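One possible way to pace requests, shown here only as a sketch, is to spread the remaining quota evenly over the time left until the limit resets. The function name `pacing_delay` and its `floor` parameter are hypothetical; the inputs are the remaining count and the reset time in seconds, taken from the rate limit headers.

```python
def pacing_delay(remaining, reset_seconds, floor=0.0):
    """Return how many seconds to wait before sending the next request.

    Spreads the remaining quota evenly across the time left in the window.
    If no quota remains, waits for the whole window to reset.
    `floor` is an optional minimum delay.
    """
    if remaining <= 0:
        return max(reset_seconds, floor)
    return max(reset_seconds / remaining, floor)
```

For example, with 55 requests remaining and 30 seconds until reset, the delay is a little over half a second; calling `time.sleep(pacing_delay(55, 30.0))` between requests would keep the client under the limit for that window.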

Exponential Backoff

When a request fails with a 429 (Too Many Requests) status code, retry it with exponential backoff, waiting longer after each failed attempt:

python

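import time
import random

def retry_with_backoff(func, max_retries=5):
    """Call func(), retrying with exponential backoff and jitter on 429 errors."""
    for attempt in range(max_retries):
        try:
            return func()
        except Exception as e:
            # Prefer a status code if the exception exposes one;
            # otherwise fall back to matching "429" in the message.
            status = getattr(e, "status_code", None)
            rate_limited = status == 429 or (status is None and "429" in str(e))
            if not rate_limited:
                raise
            if attempt == max_retries - 1:
                # Out of retries: fail now instead of sleeping one more time.
                raise RuntimeError("Max retries exceeded") from e
            # Wait 1s, 2s, 4s, ... plus up to 1s of random jitter.
            wait = (2 ** attempt) + random.uniform(0, 1)
            time.sleep(wait)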
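To get a feel for how quickly the waits grow, the delay formula can be evaluated on its own. This is a sketch: `backoff_delays` is a hypothetical helper, and the `cap` parameter is an addition that is not part of the retry function above. It is a common safeguard that stops the delay from doubling indefinitely.

```python
import random


def backoff_delays(retries, base=1.0, cap=30.0, rng=random):
    """Return the wait in seconds before each of `retries` retries.

    Each delay is base * 2**attempt plus up to 1s of jitter,
    limited to at most `cap` seconds.
    """
    delays = []
    for attempt in range(retries):
        delay = base * (2 ** attempt) + rng.uniform(0, 1)
        delays.append(min(delay, cap))
    return delays
```

With the default base of one second, the waits are roughly 1, 2, 4, 8, and 16 seconds, each with up to one extra second of jitter, so five retries can take around half a minute in total.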

Best Practices

Implement retry logic — Always handle 429 errors gracefully
Add jitter — Use random delays to avoid thundering herd
Batch requests — Combine multiple prompts when possible
Cache responses — Store and reuse responses for identical queries
Monitor usage — Track your request patterns in the Console
Use streaming — Streaming doesn't reduce limits but improves perceived latency
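The caching practice above could look something like the following minimal sketch. `ResponseCache` and `get_or_call` are hypothetical names, and the cache is a plain in-memory dict with no expiry, which is an assumption made for brevity. A request only counts as identical when both the model and the messages match exactly.

```python
import hashlib
import json


class ResponseCache:
    """In-memory cache keyed on the exact model and messages of a request."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(model, messages):
        # sort_keys keeps the key stable regardless of dict key order.
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model, messages, call):
        """Return the cached result, or invoke call() once and store it."""
        key = self._key(model, messages)
        if key not in self._store:
            self._store[key] = call()
        return self._store[key]
```

Here `call` would be a zero-argument function that performs the actual API request, so a repeated query is served from memory and does not count against RPM or TPM.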

Tips for High-Volume Usage

Use faster, more cost-effective models for high-volume tasks
Implement request queuing in your application
Distribute requests across multiple API keys if needed
Contact support for custom rate limit increases
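Request queuing can take many forms. One simple option, sketched here under assumptions, is a client-side sliding-window limiter that blocks until a request slot is free. `SlidingWindowLimiter` is a hypothetical name, and it only tracks request counts (RPM), not tokens. The `clock` and `sleep` parameters exist so the behavior can be tested without real waiting.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests calls in any rolling window of `window` seconds."""

    def __init__(self, max_requests, window=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window = window
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of recent requests, oldest first

    def acquire(self):
        """Block until a request slot is free, then record the request."""
        while True:
            now = self._clock()
            # Drop timestamps that have aged out of the window.
            while self._sent and now - self._sent[0] >= self.window:
                self._sent.popleft()
            if len(self._sent) < self.max_requests:
                self._sent.append(now)
                return
            # Wait until the oldest recorded request leaves the window.
            self._sleep(self.window - (now - self._sent[0]))
```

Calling `limiter.acquire()` immediately before each API request keeps the client at or below the configured rate. This sketch is not thread-safe; a multi-threaded application would need to guard `acquire` with a lock.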

INFO

Rate limits may vary by model and account tier. Check the Console for your current limits.
