import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Timeouts

The timeout set on the router applies to the entire length of the call, and is passed down to the underlying `completion()` call as well.

### Global Timeouts

<Tabs>
<TabItem value="sdk" label="SDK">

```python
from litellm import Router

model_list = [{...}]

router = Router(
    model_list=model_list,
    timeout=30,  # raise a timeout error if the call takes > 30s
)

response = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
)
print(response)
```

</TabItem>
<TabItem value="proxy" label="PROXY">

```yaml
router_settings:
  timeout: 30 # sets a 30s timeout for the entire call
```

**Start Proxy**

```shell
$ litellm --config /path/to/config.yaml
```

</TabItem>
</Tabs>

### Custom Timeouts, Stream Timeouts - Per Model

For each model you can set `timeout` & `stream_timeout` under `litellm_params`.

<Tabs>
<TabItem value="sdk" label="SDK">

```python
import os
import asyncio
from litellm import Router

model_list = [{
    "model_name": "gpt-3.5-turbo",
    "litellm_params": {
        "model": "azure/chatgpt-v-2",
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE"),
        "timeout": 300,       # sets a 5 minute timeout
        "stream_timeout": 30, # sets a 30s timeout for streaming calls
    }
}]

# init router
router = Router(model_list=model_list, routing_strategy="least-busy")

async def router_acompletion():
    response = await router.acompletion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hey, how's it going?"}],
    )
    print(response)
    return response

asyncio.run(router_acompletion())
```

</TabItem>
<TabItem value="proxy" label="PROXY">

```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-small-eu
      api_base: https://my-endpoint-europe-berri-992.openai.azure.com/
      api_key:
      timeout: 0.1          # timeout (seconds)
      stream_timeout: 0.01  # timeout for stream requests (seconds)
      max_retries: 5
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-small-ca
      api_base: https://my-endpoint-canada-berri992.openai.azure.com/
      api_key:
      timeout: 0.1          # timeout (seconds)
      stream_timeout: 0.01  # timeout for stream requests (seconds)
      max_retries: 5
```

**Start Proxy**

```shell
$ litellm --config /path/to/config.yaml
```

</TabItem>
</Tabs>

### Setting Dynamic Timeouts - Per Request

LiteLLM supports setting a `timeout` per request.

**Example Usage**

<Tabs>
<TabItem value="sdk" label="SDK">

```python
from litellm import Router

model_list = [{...}]

router = Router(model_list=model_list)
response = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "what color is red"}],
    timeout=1,
)
```

</TabItem>
<TabItem value="proxy" label="PROXY">

**Curl Request**

```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "user", "content": "what color is red"}
    ],
    "logit_bias": {"12481": 100},
    "timeout": 1
}'
```

**OpenAI Python SDK**

```python
import openai

client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "what color is red"}
    ],
    logit_bias={12481: 100},
    extra_body={"timeout": 1}  # 👈 KEY CHANGE
)

print(response)
```

</TabItem>
</Tabs>

## Testing timeout handling

To test whether your retry/fallback logic can handle timeouts, set `mock_timeout=True` on the request. This is currently only supported on the `/chat/completions` and `/completions` endpoints. Please [let us know](https://github.com/BerriAI/litellm/issues) if you need this for other endpoints.

```bash
curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
--data-raw '{
    "model": "gemini/gemini-1.5-flash",
    "messages": [
        {"role": "user", "content": "hi my email is ishaan@berri.ai"}
    ],
    "mock_timeout": true
}'
```
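
If you are calling the proxy through the OpenAI client, the same flag can be passed via `extra_body`, just like the dynamic `timeout` example above. This is a minimal sketch, assuming the proxy surfaces the simulated timeout as an API error that the OpenAI client raises:

```python
import openai

client = openai.OpenAI(
    api_key="sk-1234",
    base_url="http://0.0.0.0:4000"
)

try:
    client.chat.completions.create(
        model="gemini/gemini-1.5-flash",
        messages=[{"role": "user", "content": "hi"}],
        extra_body={"mock_timeout": True}  # 👈 same flag as the curl above
    )
except openai.APIError as e:
    # the simulated timeout is expected here - run your fallback logic
    print(f"simulated timeout surfaced as: {e}")
```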
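
On the SDK side, another way to exercise the same retry/fallback code is to force a very small `timeout` and catch the resulting exception. A minimal sketch, assuming the timeout is raised as `litellm.Timeout`:

```python
import litellm
from litellm import Router

router = Router(model_list=[{
    "model_name": "gpt-3.5-turbo",
    "litellm_params": {"model": "gpt-3.5-turbo"},
}])

try:
    router.completion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "hi"}],
        timeout=0.0001,  # force the call to time out
    )
except litellm.Timeout:
    # timeout raised - verify your retry/fallback logic runs here
    print("caught timeout")
```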