Ishaan Jaff
6220e17ebf
(feat proxy) v2 - model max budgets (#7302)
* clean up unused code
* add _PROXY_VirtualKeyModelMaxBudgetLimiter
* adjust type imports
* working _PROXY_VirtualKeyModelMaxBudgetLimiter
* fix user_api_key_model_max_budget
* fix user_api_key_model_max_budget
* update naming
* update naming
* fix changes to RouterBudgetLimiting
* test_call_with_key_over_model_budget
* test_call_with_key_over_model_budget
* handle _get_request_model_budget_config
* e2e test for test_call_with_key_over_model_budget
* clean up test
* run ci/cd again
* add validate_model_max_budget
* docs fix
* update doc
* add e2e testing for _PROXY_VirtualKeyModelMaxBudgetLimiter
* test_unit_test_max_model_budget_limiter.py
2024-12-18 19:42:46 -08:00
Krish Dholakia
050499ec8f
Litellm dev readd prompt caching (#7299)
* fix(router.py): re-add saving model id on prompt caching valid successful deployment
* fix(router.py): introduce optional pre_call_checks
  isolate prompt caching logic in a separate file
* fix(prompt_caching_deployment_check.py): fix import
* fix(router.py): new 'async_filter_deployments' event hook
  allows custom logger to filter deployments returned to routing strategy
* feat(prompt_caching_deployment_check.py): initial working commit of prompt caching based routing
* fix(cooldown_callbacks.py): fix linting error
* fix(budget_limiter.py): move budget logger to async_filter_deployment hook
* test: add unit test
* test(test_router_helper_utils.py): add unit testing
* fix(budget_limiter.py): fix linting errors
* docs(config_settings.md): add 'optional_pre_call_checks' to router_settings param docs
2024-12-18 15:13:49 -08:00
Ishaan Jaff
533381d4ad
tag budgets fixes
2024-12-18 10:28:37 -08:00
Ishaan Jaff
2459f9735d
(feat) Add Tag-based budgets on litellm router / proxy (#7236)
* add BudgetConfig
* add _get_tags_from_request_kwargs
* test_tag_budgets_e2e_test_expect_to_fail
* add a check for request tags
* fix _async_get_cache_keys_for_router_budget_limiting
* fix test
* fix _sync_in_memory_spend_with_redis
* _async_get_cache_keys_for_router_budget_limiting
* fix _init_tag_budgets
* fix type casting
* docs show error for tag budget limit hit
* fix _get_tags_from_request_kwargs
* fix undo change
2024-12-14 17:28:36 -08:00
Ishaan Jaff
bc46916bb3
(feat - Router / Proxy) Allow setting budget limits per LLM deployment (#7220)
* fix test_deployment_budget_limits_e2e_test
* refactor async_log_success_event to track spend for provider + deployment
* fix format
* rename class to RouterBudgetLimiting
* rename func
* rename types used for budgets
* add new types for deployment budgets
* add budget limits for deployments
* fix checking budgets set for provider
* update file names
* fix linting error
* _track_provider_remaining_budget_prometheus
* async_filter_deployments
* fix model list passed to router
* update error
* test_deployment_budgets_e2e_test_expect_to_fail
* fix test case
* run deployment budget limits
2024-12-13 19:15:51 -08:00