Mirror of https://github.com/BerriAI/litellm.git (synced 2025-04-25 18:54:30 +00:00)
[Fix] Router cooldown logic - use % thresholds instead of allowed fails to cooldown deployments (#5698)
* move cooldown logic to its own helper
* add new track deployment metrics folder
* increment success, fails for deployment in current minute
* fix cooldown logic
* fix test_aaarouter_dynamic_cooldown_message_retry_time
* fix test_single_deployment_no_cooldowns_test_prod_mock_completion_calls
* clean up get from deployment test
* fix _async_get_healthy_deployments
* add mock InternalServerError
* test deployment failing 25% requests
* add test_high_traffic_cooldowns_one_bad_deployment
* fix vertex load test
* add test for rate limit error models in cool down
* change default cooldown time
* fix cooldown message time
* fix cooldown on 429 error
* fix doc string for _should_cooldown_deployment
* fix sync cooldown logic router
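
The commit message describes counting successes and failures per deployment in the current minute and cooling down on a failure percentage rather than a fixed number of allowed fails. The following is only a minimal sketch of that idea, not the router's actual code: the class name `DeploymentMinuteStats`, its methods, and the 50% value of `FAILURE_THRESHOLD_PERCENT` are all hypothetical and do not come from the commit.

```python
import time
from collections import defaultdict
from typing import Dict, Optional, Tuple

# Hypothetical threshold, chosen only for illustration (not from the commit).
FAILURE_THRESHOLD_PERCENT = 50.0


class DeploymentMinuteStats:
    """Hypothetical helper: per-deployment success/fail counts, bucketed by minute."""

    def __init__(self) -> None:
        self._counts: Dict[Tuple[str, int], Dict[str, int]] = defaultdict(
            lambda: {"success": 0, "fail": 0}
        )

    @staticmethod
    def _minute_key(deployment_id: str, now: Optional[float]) -> Tuple[str, int]:
        # Bucket timestamps into whole minutes since the epoch.
        ts = time.time() if now is None else now
        return (deployment_id, int(ts // 60))

    def record(self, deployment_id: str, success: bool, now: Optional[float] = None) -> None:
        # Increment the success or fail counter for the current minute.
        key = self._minute_key(deployment_id, now)
        self._counts[key]["success" if success else "fail"] += 1

    def should_cooldown(self, deployment_id: str, now: Optional[float] = None) -> bool:
        # Cool down only when this minute's failure percentage exceeds the threshold.
        counts = self._counts.get(self._minute_key(deployment_id, now))
        if counts is None:
            return False
        total = counts["success"] + counts["fail"]
        if total == 0:
            return False
        percent_fails = 100.0 * counts["fail"] / total
        return percent_fails > FAILURE_THRESHOLD_PERCENT
```

With a percentage threshold, a deployment that fails 25% of its requests under high traffic stays in rotation, whereas a fixed allowed-fails count would eventually trip on sheer volume.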
parent 7c2ddba6c6
commit c8d15544c8
11 changed files with 836 additions and 175 deletions
@@ -5976,6 +5976,10 @@ def check_valid_key(model: str, api_key: str):

def _should_retry(status_code: int):
    """
    Retries on 408, 409, 429 and 500 errors.

    Any client error in the 400-499 range that isn't explicitly handled (such as 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, etc.) would not trigger a retry.

    Reimplementation of openai's should retry logic, since that one can't be imported.
    https://github.com/openai/openai-python/blob/af67cfab4210d8e497c05390ce14f39105c77519/src/openai/_base_client.py#L639
    """
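
The hunk shows only the docstring, not the function body. A minimal sketch of the behavior the docstring describes might look like the following. The name `should_retry_sketch` is hypothetical, and treating every status of 500 or above as retryable is an assumption, since the docstring only says "500 errors".

```python
def should_retry_sketch(status_code: int) -> bool:
    """Hypothetical illustration of the retry rule described in the docstring."""
    # Request timeout, lock conflict, and rate limit are retried.
    if status_code in (408, 409, 429):
        return True
    # Assumption: all server errors (500 and above) are treated as retryable.
    if status_code >= 500:
        return True
    # Any other status, including unhandled 4xx client errors, is not retried.
    return False
```

For example, `should_retry_sketch(429)` returns `True`, while `should_retry_sketch(404)` returns `False`.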