Ishaan Jaff
|
4d1b4beb3d
|
(refactor) caching use LLMCachingHandler for async_get_cache and set_cache (#6208)
* use folder for caching
* fix importing caching
* fix clickhouse pyright
* fix linting
* fix correctly pass kwargs and args
* fix test case for embedding
* fix linting
* fix embedding caching logic
* fix refactor handle utils.py
* fix test_embedding_caching_azure_individual_items_reordered
|
2024-10-14 16:34:01 +05:30 |
|
Ishaan Jaff
|
c8d15544c8
|
[Fix] Router cooldown logic - use % thresholds instead of allowed fails to cooldown deployments (#5698)
* move cooldown logic to its own helper
* add new track deployment metrics folder
* increment success, fails for deployment in current minute
* fix cooldown logic
* fix test_aaarouter_dynamic_cooldown_message_retry_time
* fix test_single_deployment_no_cooldowns_test_prod_mock_completion_calls
* clean up get from deployment test
* fix _async_get_healthy_deployments
* add mock InternalServerError
* test deployment failing 25% requests
* add test_high_traffic_cooldowns_one_bad_deployment
* fix vertex load test
* add test for rate limit error models in cool down
* change default cooldown time
* fix cooldown message time
* fix cooldown on 429 error
* fix doc string for _should_cooldown_deployment
* fix sync cooldown logic router
|
2024-09-14 18:01:19 -07:00 |
|
Krrish Dholakia
|
cd7dd2a511
|
fix(cooldown_cache.py): fix linting errors
|
2024-08-27 07:40:28 -07:00 |
|
Krrish Dholakia
|
5572ad7241
|
fix(cooldown_cache.py): fix linting errors
|
2024-08-24 17:11:32 -07:00 |
|
Krrish Dholakia
|
33972cc79c
|
fix(router.py): enable dynamic retry after in exception string
Updates cooldown logic to cooldown individual models
Closes https://github.com/BerriAI/litellm/issues/1339
|
2024-08-24 16:59:30 -07:00 |
|