docs responses routing

Ishaan Jaff 2025-04-21 23:05:53 -07:00
parent a7db0df043
commit ebfff975d4


@@ -631,10 +631,3 @@ follow_up = client.responses.create(
</TabItem> </TabItem>
</Tabs> </Tabs>
#### How It Works
1. When a user makes an initial request to the Responses API, LiteLLM caches which model deployment returned that response. (The mapping is stored in Redis if you have connected LiteLLM to Redis.)
2. When a subsequent request includes `previous_response_id`, LiteLLM automatically routes it to the same deployment.
3. If the original deployment is unavailable, or if the `previous_response_id` isn't found in the cache, LiteLLM falls back to normal routing.
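
The three steps above can be illustrated with a small, self-contained sketch. This is not LiteLLM's implementation: the `AffinityRouter` class, its method names, and the deployment names are hypothetical, an in-memory dict stands in for the Redis cache, and "normal routing" is approximated by a random choice among healthy deployments.

```python
import random
from typing import Optional


class AffinityRouter:
    """Toy router (hypothetical, not LiteLLM code) that pins follow-up
    requests to the deployment that produced the previous response."""

    def __init__(self, deployments: list[str]) -> None:
        self.deployments = list(deployments)
        self.healthy = set(deployments)
        # Assumption: a plain dict stands in for the Redis-backed cache.
        self.response_to_deployment: dict[str, str] = {}

    def record(self, response_id: str, deployment: str) -> None:
        # Step 1: remember which deployment returned this response.
        self.response_to_deployment[response_id] = deployment

    def pick(self, previous_response_id: Optional[str] = None) -> str:
        # Step 2: reuse the cached deployment when it is known and healthy.
        pinned = None
        if previous_response_id is not None:
            pinned = self.response_to_deployment.get(previous_response_id)
        if pinned in self.healthy:
            return pinned
        # Step 3: cache miss or unavailable deployment, so fall back to
        # "normal routing" (here: a random healthy deployment).
        candidates = [d for d in self.deployments if d in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy deployments available")
        return random.choice(candidates)
```

In this sketch, a caller would invoke `record(response.id, deployment)` after each successful response and `pick(previous_response_id)` before each request. A shared cache such as Redis matters in practice because the mapping has to be visible to every proxy instance that might receive the follow-up request.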