
import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem';

# /responses [Beta]

LiteLLM provides a BETA endpoint that follows the spec of OpenAI's `/responses` API.

| Feature | Supported | Notes |
|---------|-----------|-------|
| Cost Tracking | ✅ | Works with all supported models |
| Logging | ✅ | Works across all integrations |
| End-user Tracking | ✅ | |
| Streaming | ✅ | |
| Fallbacks | ✅ | Works between supported models |
| Loadbalancing | ✅ | Works between supported models |
| Supported LiteLLM Versions | 1.63.8+ | |
| Supported LLM providers | `openai` | |

## Usage

### Create a model response

#### Non-streaming

```python
import litellm

# Non-streaming response
response = litellm.responses(
    model="gpt-4o",
    input="Tell me a three sentence bedtime story about a unicorn.",
    max_output_tokens=100
)

print(response)
```

#### Streaming

```python
import litellm

# Streaming response
response = litellm.responses(
    model="gpt-4o",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True
)

for event in response:
    print(event)
```
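The stream yields typed events rather than plain text chunks. As a rough sketch of how you might accumulate the generated text (assuming, per the OpenAI Responses streaming format, that text deltas arrive as events with `type == "response.output_text.delta"` carrying a `delta` attribute; the event objects below are mocked for illustration):

```python
from types import SimpleNamespace

def collect_text(events):
    """Concatenate text deltas from a stream of response events."""
    chunks = []
    for event in events:
        # Assumption: text deltas carry type "response.output_text.delta"
        # and a `delta` attribute, as in the OpenAI Responses stream spec.
        if getattr(event, "type", None) == "response.output_text.delta":
            chunks.append(event.delta)
    return "".join(chunks)

# Mocked event stream standing in for `litellm.responses(..., stream=True)`
fake_events = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="Once upon "),
    SimpleNamespace(type="response.output_text.delta", delta="a time."),
    SimpleNamespace(type="response.completed"),
]
print(collect_text(fake_events))  # -> Once upon a time.
```

Non-delta events (creation, completion, tool calls) can be handled in the same loop by branching on `event.type`.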

### LiteLLM Proxy with OpenAI SDK

First, add this to your LiteLLM proxy `config.yaml`:

```yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
```
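To load balance across deployments (as noted in the feature table), list multiple entries under the same `model_name` and the proxy distributes requests between them. A minimal sketch, where the second key name is a hypothetical placeholder:

```yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: gpt-4o                      # same alias -> load balanced
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY_2  # hypothetical second key
```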

Start your LiteLLM proxy:

```shell
litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000
```

Then use the OpenAI SDK pointed to your proxy:

#### Non-streaming

```python
from openai import OpenAI

# Initialize client with your proxy URL
client = OpenAI(
    base_url="http://localhost:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

# Non-streaming response
response = client.responses.create(
    model="gpt-4o",
    input="Tell me a three sentence bedtime story about a unicorn."
)

print(response)
```

#### Streaming

```python
from openai import OpenAI

# Initialize client with your proxy URL
client = OpenAI(
    base_url="http://localhost:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

# Streaming response
response = client.responses.create(
    model="gpt-4o",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True
)

for event in response:
    print(event)
```