[Fix] Router cooldown logic - use % thresholds instead of allowed fails to cooldown deployments (#5698)

* move cooldown logic to it's own helper

* add new track deployment metrics folder

* increment success, fails for deployment in current minute

* fix cooldown logic

* fix test_aaarouter_dynamic_cooldown_message_retry_time

* fix test_single_deployment_no_cooldowns_test_prod_mock_completion_calls

* clean up get from deployment test

* fix _async_get_healthy_deployments

* add mock InternalServerError

* test deployment failing 25% requests

* add test_high_traffic_cooldowns_one_bad_deployment

* fix vertex load test

* add test for rate limit error models in cool down

* change default cooldown time

* fix cooldown message time

* fix cooldown on 429 error

* fix doc string for _should_cooldown_deployment

* fix sync cooldown logic router
This commit is contained in:
Ishaan Jaff 2024-09-14 18:01:19 -07:00 committed by GitHub
parent fc0dd3e3c2
commit 8f155327f6
11 changed files with 836 additions and 175 deletions

View file

@ -528,6 +528,15 @@ def mock_completion(
llm_provider=getattr(mock_response, "llm_provider", custom_llm_provider or "openai"), # type: ignore
model=model,
)
elif (
isinstance(mock_response, str)
and mock_response == "litellm.InternalServerError"
):
raise litellm.InternalServerError(
message="this is a mock internal server error",
llm_provider=getattr(mock_response, "llm_provider", custom_llm_provider or "openai"), # type: ignore
model=model,
)
elif isinstance(mock_response, str) and mock_response.startswith(
"Exception: content_filter_policy"
):