Chat Completions

The Chat Completions API is the primary endpoint for generating text responses from AI models. It is fully compatible with the OpenAI Chat Completions API format.

Endpoint

POST https://toprouter.cc/chat/completions

Request Format

json
{
  "model": "google/gemini-3.5-flash",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "temperature": 0.7,
  "max_tokens": 1024,
  "stream": false
}
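
For readers not using an SDK, the request body above can be sent as a plain HTTP POST. The sketch below only builds the request object with Python's standard library and does not send it. It assumes the API key travels as a Bearer token in the Authorization header, which is the convention of the OpenAI-compatible format but is not stated on this page; `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Endpoint as documented in the Endpoint section.
ENDPOINT = "https://toprouter.cc/chat/completions"


def build_chat_request(api_key, payload):
    """Build (but do not send) a POST request for the Chat Completions endpoint.

    Assumption: the key is sent as a Bearer token, as in the OpenAI format.
    """
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same body as the JSON example above, written as a Python dict.
payload = {
    "model": "google/gemini-3.5-flash",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "temperature": 0.7,
    "max_tokens": 1024,
    "stream": False,
}
request = build_chat_request("sk-your-toprouter-key", payload)
```

Passing `request` to `urllib.request.urlopen` would perform the call; keeping construction separate from sending makes the headers and body easy to inspect first.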

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID to use (e.g., google/gemini-3.5-flash) |
| messages | array | Yes | Array of message objects |
| temperature | number | No | Sampling temperature (0 to 2); default varies by model |
| max_tokens | integer | No | Maximum number of tokens in the response |
| stream | boolean | No | Enable streaming response; default false |
| top_p | number | No | Nucleus sampling parameter |
| frequency_penalty | number | No | Frequency penalty (-2 to 2) |
| presence_penalty | number | No | Presence penalty (-2 to 2) |
| stop | string or array | No | Stop sequences |
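
The numeric ranges listed above can be checked on the client before a request is sent. The following is a minimal sketch, not the gateway's own validation: `check_ranges` is a hypothetical helper, and the bounds are assumed to be inclusive.

```python
# Ranges copied from the parameter list: (min, max), assumed inclusive.
RANGES = {
    "temperature": (0, 2),
    "frequency_penalty": (-2, 2),
    "presence_penalty": (-2, 2),
}


def check_ranges(params):
    """Return a list of out-of-range problems (empty if none were found)."""
    problems = []
    for name, (low, high) in RANGES.items():
        value = params.get(name)
        # Parameters that are absent are simply skipped.
        if value is not None and not low <= value <= high:
            problems.append(f"{name}={value} is outside [{low}, {high}]")
    return problems


ok = check_ranges({"temperature": 0.7, "max_tokens": 1024})
bad = check_ranges({"temperature": 2.5, "presence_penalty": -3})
```

Here `ok` is an empty list, while `bad` contains one message for each offending parameter.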

Message Roles

| Role | Description |
|---|---|
| system | Sets the behavior and context for the AI |
| user | The user's input message |
| assistant | Previous AI responses (for multi-turn conversations) |

Response Format

json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1717500000,
  "model": "google/gemini-3.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm here to help. What can I do for you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 15,
    "total_tokens": 35
  }
}
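
To show where the useful fields sit in this structure, here is a small sketch that reads the sample response as a plain Python dict. `summarize` is a hypothetical helper; with the `openai` SDK the same fields are exposed as attributes, for example `response.choices[0].message.content`.

```python
# The sample response from above, as a Python dict.
sample = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1717500000,
    "model": "google/gemini-3.5-flash",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! I'm here to help. What can I do for you?",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 20, "completion_tokens": 15, "total_tokens": 35},
}


def summarize(response):
    """Extract the reply text, finish reason, and token usage from a response dict."""
    choice = response["choices"][0]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": response["usage"]["total_tokens"],
    }


summary = summarize(sample)
```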

Streaming

Set stream to true to receive the response incrementally, one chunk at a time:

python
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-toprouter-key",
    base_url="https://toprouter.cc"
)

stream = client.chat.completions.create(
    model="anthropic/claude-4.6-sonnet",
    messages=[{"role": "user", "content": "Write a haiku about coding"}],
    stream=True
)

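for chunk in stream:
    # Skip chunks that carry no choices or no new text (e.g., role-only or final chunks).
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)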
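
If the complete text is needed after streaming finishes, the chunks can be accumulated rather than printed. This is a sketch under the assumption that chunks follow the shape used above (`choices[0].delta.content`); `collect_text` and `fake_chunk` are hypothetical helpers, and the demo uses stand-in objects instead of a live stream.

```python
from types import SimpleNamespace


def collect_text(stream):
    """Join the text pieces from a stream of chat completion chunks."""
    parts = []
    for chunk in stream:
        # Some chunks may carry no choices at all; skip them.
        if not chunk.choices:
            continue
        piece = chunk.choices[0].delta.content
        if piece:
            parts.append(piece)
    return "".join(parts)


def fake_chunk(text):
    """Stand-in for an SDK chunk object, for demonstration only."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo_stream = [
    fake_chunk("Hello"),
    fake_chunk(None),               # chunk with no text
    fake_chunk(", world"),
    SimpleNamespace(choices=[]),    # chunk with no choices
]
full_text = collect_text(demo_stream)
```

In real use, the `stream` object returned by `client.chat.completions.create(..., stream=True)` would be passed to `collect_text` in place of `demo_stream`.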

Multi-Turn Conversations

Maintain conversation context by including previous messages:

python
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "What is a Python decorator?"},
    {"role": "assistant", "content": "A Python decorator is a function that modifies another function..."},
    {"role": "user", "content": "Can you show me an example?"}
]

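# Reuses the client created in the Streaming example.
response = client.chat.completions.create(
    model="anthropic/claude-4.6-sonnet",
    messages=messages
)

# The reply to the latest user message.
print(response.choices[0].message.content)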
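
Because the full history is sent with each request, every reply has to be appended to that history before the next turn. A minimal sketch of this bookkeeping follows; `add_turn` is a hypothetical helper and makes no network call.

```python
def add_turn(history, assistant_reply, next_user_message):
    """Return a new history with the assistant's reply and the next user message appended."""
    return history + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_message},
    ]


history = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "What is a Python decorator?"},
]
history = add_turn(
    history,
    "A Python decorator is a function that modifies another function...",
    "Can you show me an example?",
)
```

The resulting `history` matches the four-message list shown above and can be passed as `messages` in the next request.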

Vision (Multimodal)

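Some models accept image inputs. To send an image, pass the message content as an array that combines text parts and image_url parts: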

python
response = client.chat.completions.create(
    model="openai/gpt-5.5",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/image.jpg"}
                }
            ]
        }
    ]
)

TIP

Not all models support vision. Check the Models page for multimodal capabilities.
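
For images that are not hosted at a public URL, the OpenAI format also allows a base64 data URL in the `url` field. Whether this gateway and a given model accept data URLs is an assumption here, not something this page confirms. `image_part_from_bytes` is a hypothetical helper.

```python
import base64


def image_part_from_bytes(image_bytes, mime_type="image/jpeg"):
    """Build an image_url content part from raw image bytes.

    Assumption: the target model accepts base64 data URLs.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


# Three placeholder bytes stand in for the contents of a real image file.
part = image_part_from_bytes(b"\xff\xd8\xff", "image/jpeg")

message = {
    "role": "user",
    "content": [{"type": "text", "text": "What's in this image?"}, part],
}
```

In practice `image_bytes` would come from reading a file in binary mode, and `message` would be placed in the `messages` list of the request.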
