Chat Completions

The Chat Completions API is the primary endpoint for generating text responses from AI models. It is fully compatible with the OpenAI Chat Completions API format, so existing OpenAI SDK clients work after changing the base URL and API key.

Endpoint

POST https://toprouter.cc/chat/completions

Request Format

json

{
  "model": "google/gemini-3.5-flash",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "temperature": 0.7,
  "max_tokens": 1024,
  "stream": false
}
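For readers who are not using an SDK, the request above can be sketched with the Python standard library. This is a minimal sketch, not the service's official client: the `build_chat_request` helper is hypothetical, and Bearer-token authentication is an assumption based on how the OpenAI SDK sends its `api_key`.

```python
import json
import urllib.request

# Endpoint as documented on this page.
API_URL = "https://toprouter.cc/chat/completions"


def build_chat_request(api_key, payload):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: Bearer-token auth, as the OpenAI SDK sends it.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "model": "google/gemini-3.5-flash",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "temperature": 0.7,
    "max_tokens": 1024,
    "stream": False,
}

request = build_chat_request("sk-your-toprouter-key", payload)

# Sending requires a valid key and network access:
# with urllib.request.urlopen(request) as resp:
#     body = json.load(resp)
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.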

Parameters

Parameter	Type	Required	Description
`model`	string	✅	Model ID to use (e.g., `google/gemini-3.5-flash`)
`messages`	array	✅	Array of message objects
`temperature`	number	❌	Sampling temperature, from 0 to 2; the default varies by model
`max_tokens`	integer	❌	Maximum number of tokens to generate in the response
`stream`	boolean	❌	Stream the response incrementally; default `false`
`top_p`	number	❌	Nucleus sampling: the cumulative probability mass of tokens to sample from
`frequency_penalty`	number	❌	Frequency penalty, from -2 to 2
`presence_penalty`	number	❌	Presence penalty, from -2 to 2
`stop`	string or array	❌	One or more sequences at which generation stops
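The required fields and numeric ranges in the table can be checked on the client before a request is sent. The following is a minimal sketch; `validate_params` is a hypothetical helper, and it only covers the constraints stated in the table above, not whatever validation the server itself performs.

```python
def validate_params(params):
    """Return a list of problems found in a request body; an empty list means OK."""
    problems = []

    # `model` and `messages` are the two required parameters.
    for name in ("model", "messages"):
        if name not in params:
            problems.append(f"missing required parameter: {name}")

    # Numeric ranges as listed in the parameter table.
    ranges = {
        "temperature": (0, 2),
        "frequency_penalty": (-2, 2),
        "presence_penalty": (-2, 2),
    }
    for name, (low, high) in ranges.items():
        value = params.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name} must be between {low} and {high}, got {value}")

    return problems


ok = validate_params({"model": "google/gemini-3.5-flash", "messages": [], "temperature": 0.7})
bad = validate_params({"messages": [], "temperature": 3})
```

Here `ok` is an empty list, while `bad` reports both the missing `model` and the out-of-range `temperature`.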

Message Roles

Role	Description
`system`	Sets the behavior and context for the AI
`user`	The user's input message
`assistant`	Previous AI responses (for multi-turn conversations)

Response Format

json

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1717500000,
  "model": "google/gemini-3.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm here to help. What can I do for you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 15,
    "total_tokens": 35
  }
}
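The fields most callers need are the generated text, the finish reason, and the token usage. A minimal sketch of pulling them out of a raw JSON response follows; `summarize_completion` is a hypothetical helper, and the input is the sample response shown above.

```python
import json

# The sample response from this page, as it would arrive over HTTP.
raw = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1717500000,
  "model": "google/gemini-3.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm here to help. What can I do for you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 20, "completion_tokens": 15, "total_tokens": 35}
}
"""


def summarize_completion(response):
    """Extract the first choice's text, its finish reason, and total token usage."""
    choice = response["choices"][0]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": response["usage"]["total_tokens"],
    }


summary = summarize_completion(json.loads(raw))
```

In the sample, `total_tokens` is the sum of `prompt_tokens` and `completion_tokens` (20 + 15 = 35).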

Streaming

Set `stream` to `true` to receive the response incrementally as a series of chunks, each carrying a `delta` with the next piece of content:

python

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-toprouter-key",
    base_url="https://toprouter.cc"
)

stream = client.chat.completions.create(
    model="anthropic/claude-4.6-sonnet",
    messages=[{"role": "user", "content": "Write a haiku about coding"}],
    stream=True
)

for chunk in stream:
    # Some chunks may carry no choices or an empty delta; skip those.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
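When the full text is needed after streaming finishes (for logging, or to append to a conversation history), the deltas can be collected as they arrive. This is a minimal sketch: `collect_stream` and `fake_chunk` are hypothetical names, and the stand-in chunks only mimic the `choices[0].delta.content` shape used in the loop above so the example runs without a network call.

```python
from types import SimpleNamespace


def collect_stream(stream):
    """Join the content deltas of a chat completion stream into one string."""
    parts = []
    for chunk in stream:
        # Skip chunks that carry no choices at all.
        if not chunk.choices:
            continue
        piece = chunk.choices[0].delta.content
        # Deltas without content (None or empty) contribute nothing.
        if piece:
            parts.append(piece)
    return "".join(parts)


def fake_chunk(text):
    """Stand-in for an SDK chunk object, for demonstration only."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo_stream = [
    fake_chunk("Hello"),
    fake_chunk(None),
    SimpleNamespace(choices=[]),
    fake_chunk(", world"),
]
full_text = collect_stream(demo_stream)
```

With a real client, the object returned by `client.chat.completions.create(..., stream=True)` would be passed to `collect_stream` in place of `demo_stream`.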

Multi-Turn Conversations

Maintain conversation context by including the previous messages, in order, in each request:

python

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "What is a Python decorator?"},
    {"role": "assistant", "content": "A Python decorator is a function that modifies another function..."},
    {"role": "user", "content": "Can you show me an example?"}
]

# Reuses the `client` created in the Streaming example.
response = client.chat.completions.create(
    model="anthropic/claude-4.6-sonnet",
    messages=messages
)
print(response.choices[0].message.content)
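To continue a conversation past one exchange, each assistant reply is appended to the history before the next user message. A minimal sketch, with `add_turn` as a hypothetical helper that returns a new list rather than mutating its input:

```python
def add_turn(messages, assistant_reply, next_user_message):
    """Return a new history with the assistant's reply and the next user message appended."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_message},
    ]


history = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "What is a Python decorator?"},
]

# In real use, the reply text would come from response.choices[0].message.content.
history = add_turn(
    history,
    "A Python decorator is a function that modifies another function...",
    "Can you show me an example?",
)
```

The resulting `history` has the same shape as the `messages` list above and can be passed directly as the `messages` argument of the next request.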

Vision (Multimodal)

Some models accept image inputs. Pass `content` as an array of parts that mixes `text` and `image_url` entries:

python

# Reuses the `client` created in the Streaming example.
response = client.chat.completions.create(
    model="openai/gpt-5.5",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/image.jpg"}
                }
            ]
        }
    ]
)

TIP

Not all models support vision. Check the Models page for multimodal capabilities.
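For images that are not reachable by a public URL, the OpenAI format also allows a base64 data URL in the `image_url` field. Whether this service and a given model accept data URLs is an assumption not confirmed on this page, so treat the following as a sketch; `image_part_from_bytes` and `vision_message` are hypothetical helpers.

```python
import base64


def image_part_from_bytes(data, mime_type="image/jpeg"):
    """Wrap raw image bytes as an `image_url` content part using a base64 data URL.

    Assumption: the endpoint accepts data URLs the way the OpenAI API does.
    """
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def vision_message(prompt, image_part):
    """Build a user message that combines a text prompt with one image part."""
    return {
        "role": "user",
        "content": [{"type": "text", "text": prompt}, image_part],
    }


# Three placeholder bytes stand in for real image data read from a file.
part = image_part_from_bytes(b"\xff\xd8\xff", "image/jpeg")
message = vision_message("What's in this image?", part)
```

The resulting `message` can be placed in the `messages` list exactly like the URL-based example above.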
