* fix(proxy_server.py): only update k,v pair if v is not empty/null
Fixes https://github.com/BerriAI/litellm/issues/6787
* test(test_router.py): cleanup duplicate calls
* test: add new test stream options drop params test
* test: update optional params / stream options test to test for vertex ai mistral route specifically
Addresses https://github.com/BerriAI/litellm/issues/7309
* fix(proxy_server.py): fix linting errors
* fix: fix linting errors
* fix(main.py): support passing max retries to azure/openai embedding integrations
Fixes https://github.com/BerriAI/litellm/issues/7003
* feat(team_endpoints.py): allow updating team model aliases
Closes https://github.com/BerriAI/litellm/issues/6956
* feat(router.py): allow specifying model id as fallback - skips any cooldown check
Allows a default model to be checked if all models in cooldown
s/o @micahjsmith
* docs(reliability.md): add fallback to specific model to docs
* fix(utils.py): new 'is_prompt_caching_valid_prompt' helper util
Allows user to identify if messages/tools have prompt caching
Related issue: https://github.com/BerriAI/litellm/issues/6784
* feat(router.py): store model id for prompt caching valid prompt
Allows routing to that model id on subsequent requests
* fix(router.py): only cache if prompt is valid prompt caching prompt
prevents storing unnecessary items in cache
* feat(router.py): support routing prompt caching enabled models to previous deployments
Closes https://github.com/BerriAI/litellm/issues/6784
* test: fix linting errors
* feat(databricks/): convert basemodel to dict and exclude none values
allow passing pydantic message to databricks
* fix(utils.py): ensure all chat completion messages are dict
* (feat) Track `custom_llm_provider` in LiteLLMSpendLogs (#7081)
* add custom_llm_provider to SpendLogsPayload
* add custom_llm_provider to SpendLogs
* add custom llm provider to SpendLogs payload
* test_spend_logs_payload
* Add MLflow to the side bar (#7031)
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
* (bug fix) SpendLogs update DB catch all possible DB errors for retrying (#7082)
* catch DB_CONNECTION_ERROR_TYPES
* fix DB retry mechanism for SpendLog updates
* use DB_CONNECTION_ERROR_TYPES in auth checks
* fix exp back off for writing SpendLogs
* use _raise_failed_update_spend_exception to ensure errors print as NON blocking
* test_update_spend_logs_multiple_batches_with_failure
* (Feat) Add StructuredOutputs support for Fireworks.AI (#7085)
* fix model cost map fireworks ai "supports_response_schema": true,
* fix supports_response_schema
* fix map openai params fireworks ai
* test_map_response_format
* test_map_response_format
* added deepinfra/Meta-Llama-3.1-405B-Instruct (#7084)
* bump: version 1.53.9 → 1.54.0
* fix deepinfra
* litellm db fixes LiteLLM_UserTable (#7089)
* ci/cd queue new release
* fix llama-3.3-70b-versatile
* refactor - use consistent file naming convention `AI21/` -> `ai21` (#7090)
* fix refactor - use consistent file naming convention
* ci/cd run again
* fix naming structure
* fix use consistent naming (#7092)
---------
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Yuki Watanabe <31463517+B-Step62@users.noreply.github.com>
Co-authored-by: ali sayyah <ali.sayyah2@gmail.com>
* fix(key_management_endpoints.py): override metadata field value on update
allow user to override tags
* feat(__init__.py): expose new disable_end_user_cost_tracking_prometheus_only metric
allow disabling end user cost tracking on prometheus - fixes cardinality issue
* fix(litellm_pre_call_utils.py): add key/team level enforced params
Fixes https://github.com/BerriAI/litellm/issues/6652
* fix(key_management_endpoints.py): allow user to pass in `enforced_params` as a top level param on /key/generate and /key/update
* docs(enterprise.md): add docs on enforcing required params for llm requests
* Add support of Galadriel API (#7005)
* fix(router.py): robust retry after handling
set retry after time to 0 if >0 healthy deployments. handle base case = 1 deployment
* test(test_router.py): fix test
* feat(bedrock/): add support for 'nova' models
also adds explicit 'converse/' route for simpler routing
* fix: fix 'supports_pdf_input'
return if model supports pdf input on get_model_info
* feat(converse_transformation.py): support bedrock pdf input
* docs(document_understanding.md): add document understanding to docs
* fix(litellm_pre_call_utils.py): fix linting error
* fix(init.py): fix passing of bedrock converse models
* feat(bedrock/converse): support 'response_format={"type": "json_object"}'
* fix(converse_handler.py): fix linting error
* fix(base_llm_unit_tests.py): fix test
* fix: fix test
* test: fix test
* test: fix test
* test: remove duplicate test
---------
Co-authored-by: h4n0 <4738254+h4n0@users.noreply.github.com>
* docs(config_settings.md): document all router_settings
* ci(config.yml): add router_settings doc test to ci/cd
* test: debug test on ci/cd
* test: debug ci/cd test
* test: fix test
* fix(team_endpoints.py): skip invalid team object. don't fail `/team/list` call
Causes downstream errors if ui just fails to load team list
* test(base_llm_unit_tests.py): add 'response_format={"type": "text"}' test to base_llm_unit_tests
adds complete coverage for all 'response_format' values to ci/cd
* feat(router.py): support wildcard routes in `get_router_model_info()`
Addresses https://github.com/BerriAI/litellm/issues/6914
* build(model_prices_and_context_window.json): add tpm/rpm limits for all gemini models
Allows for ratelimit tracking for gemini models even with wildcard routing enabled
Addresses https://github.com/BerriAI/litellm/issues/6914
* feat(router.py): add tpm/rpm tracking on success/failure to global_router
Addresses https://github.com/BerriAI/litellm/issues/6914
* feat(router.py): support wildcard routes on router.get_model_group_usage()
* fix(router.py): fix linting error
* fix(router.py): implement get_remaining_tokens_and_requests
Addresses https://github.com/BerriAI/litellm/issues/6914
* fix(router.py): fix linting errors
* test: fix test
* test: fix tests
* docs(config_settings.md): add missing dd env vars to docs
* fix(router.py): check if hidden params is dict
* refactor(proxy_server.py): add debug logging around license check event (refactor position in startup_event logic)
* fix(proxy/_types.py): allow admin_allowed_routes to be any str
* fix(router.py): raise 400-status code error for no 'model_name' error on router
Fixes issue with status code when unknown model name passed with pattern matching enabled
* fix(converse_handler.py): add claude 3-5 haiku to bedrock converse models
* test: update testing to replace claude-instant-1.2
* fix(router.py): fix router.moderation calls
* test: update test to remove claude-instant-1
* fix(router.py): support model_list values in router.moderation
* test: fix test
* test: fix test
* perf: move writing key to cache, to background task
* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils
adds 200ms on calls with pgdb connected
* fix(litellm_pre_call_utils.py'): rename call_type to actual call used
* perf(proxy_server.py): remove db logic from _get_config_from_file
was causing db calls to occur on every llm request, if team_id was set on key
* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db
reduces latency/call by ~100ms
* fix(proxy_server.py): minor fix on existing_settings not incl alerting
* fix(exception_mapping_utils.py): map databricks exception string
* fix(auth_checks.py): fix auth check logic
* test: correctly mark flaky test
* fix(utils.py): handle auth token error for tokenizers.from_pretrained
* docs(exception_mapping.md): add missing exception types
Fixes https://github.com/Aider-AI/aider/issues/2120#issuecomment-2438971183
* fix(main.py): register custom model pricing with specific key
Ensure custom model pricing is registered to the specific model+provider key combination
* test: make testing more robust for custom pricing
* fix(redis_cache.py): instrument otel logging for sync redis calls
ensures complete coverage for all redis cache calls
* refactor: pass parent_otel_span for redis caching calls in router
allows for more observability into what calls are causing latency issues
* test: update tests with new params
* refactor: ensure e2e otel tracing for router
* refactor(router.py): add more otel tracing acrosss router
catch all latency issues for router requests
* fix: fix linting error
* fix(router.py): fix linting error
* fix: fix test
* test: fix tests
* fix(dual_cache.py): pass ttl to redis cache
* fix: fix param
* perf(cooldown_cache.py): improve cooldown cache, to store cache results in memory for 5s, prevents redis call from being made on each request
reduces 100ms latency per call with caching enabled on router
* fix: fix test
* fix(cooldown_cache.py): handle if a result is None
* fix(cooldown_cache.py): add debug statements
* refactor(dual_cache.py): move to using an in-memory check for batch get cache, to prevent redis from being hit for every call
* fix(cooldown_cache.py): fix linting erropr
* docs(exception_mapping.md): add missing exception types
Fixes https://github.com/Aider-AI/aider/issues/2120#issuecomment-2438971183
* fix(main.py): register custom model pricing with specific key
Ensure custom model pricing is registered to the specific model+provider key combination
* test: make testing more robust for custom pricing
* fix(redis_cache.py): instrument otel logging for sync redis calls
ensures complete coverage for all redis cache calls
* refactor: pass parent_otel_span for redis caching calls in router
allows for more observability into what calls are causing latency issues
* test: update tests with new params
* refactor: ensure e2e otel tracing for router
* refactor(router.py): add more otel tracing acrosss router
catch all latency issues for router requests
* fix: fix linting error
* fix(router.py): fix linting error
* fix: fix test
* test: fix tests
* fix(dual_cache.py): pass ttl to redis cache
* fix: fix param
* refactor(router.py): move assistants api endpoints to using 1 pass-through factory function
Reduces code, increases testing coverage
* refactor(router.py): reduce _common_check_available_deployment function size
make code more maintainable - reduce possible errors
* test(router_code_coverage.py): include batch_utils + pattern matching in enforced 100% code coverage
Improves reliability
* fix(router.py): fix model id match model dump