* fix(main.py): support 'mock_timeout=true' param
allows mock requests on proxy to have a time delay, for testing
* fix(main.py): ensure mock timeouts raise litellm.Timeout error
triggers retry/fallbacks
* fix: fix fallback + mock timeout testing
* fix(router.py): always return remaining tpm/rpm limits, if limits are known
allows for rate limit headers to be guaranteed
* docs(timeout.md): add docs on mock timeout = true
* fix(main.py): fix linting errors
* test: fix test
* feat(guardrails_endpoint.py): new `/guardrails/list` endpoint
Allow users to view what the available guardrails are
* docs: document new `/guardrails/list` endpoint
* docs(enterprise.md): update docs
* fix(openai/transcription/handler.py): support cost tracking on vtt + srt formats
* fix(openai/transcriptions/handler.py): default to 'verbose_json' response format if 'text' or 'json' response_format received. ensures 'duration' param is received for all audio transcription requests
* fix: fix linting errors
* fix: remove unused import
* feat(router.py): support passing model-specific messages in fallbacks
* docs(routing.md): separate router timeouts into separate doc
allow for 1 fallbacks doc (across proxy/router)
* docs(routing.md): cleanup router docs
* docs(reliability.md): cleanup docs
* docs(reliability.md): cleaned up fallback doc
just have 1 doc across sdk/proxy
simplifies docs
* docs(reliability.md): add setting model-specific fallback prompts
* fix: fix linting errors
* test: skip test causing openai rate limit errros
* test: fix test
* test: run vertex test first to catch error
* fix(health.md): add rerank model health check information
* build(model_prices_and_context_window.json): add gemini 2.0 for google ai studio - pricing + commercial rate limits
* build(model_prices_and_context_window.json): add gemini-2.0 supports audio output = true
* docs(team_model_add.md): clarify allowing teams to add models is an enterprise feature
* fix(o1_transformation.py): add support for 'n', 'response_format' and 'stop' params for o1 and 'stream_options' param for o1-mini
* build(model_prices_and_context_window.json): add 'supports_system_message' to supporting openai models
needed as o1-preview, and o1-mini models don't support 'system message
* fix(o1_transformation.py): translate system message based on if o1 model supports it
* fix(o1_transformation.py): return 'stream' param support if o1-mini/o1-preview
o1 currently doesn't support streaming, but the other model versions do
Fixes https://github.com/BerriAI/litellm/issues/7292
* fix(o1_transformation.py): return tool calling/response_format in supported params if model map says so
Fixes https://github.com/BerriAI/litellm/issues/7292
* fix: fix linting errors
* fix: update '_transform_messages'
* fix(o1_transformation.py): fix provider passed for supported param checks
* test(base_llm_unit_tests.py): skip test if api takes >5s to respond
* fix(utils.py): return false in 'supports_factory' if can't find value
* fix(o1_transformation.py): always return stream + stream_options as supported params + handle stream options being passed in for azure o1
* feat(openai.py): support stream faking natively in openai handler
Allows o1 calls to be faked for just the "o1" model, allows native streaming for o1-mini, o1-preview
Fixes https://github.com/BerriAI/litellm/issues/7292
* fix(openai.py): use inference param instead of original optional param
* fix(hosted_vllm/transformation.py): return fake api key, if none give. Prevents httpx error
Fixes https://github.com/BerriAI/litellm/issues/7291
* test: fix test
* fix(main.py): add hosted_vllm/ support for embeddings endpoint
Closes https://github.com/BerriAI/litellm/issues/7290
* docs(vllm.md): add docs on vllm embeddings usage
* fix(__init__.py): fix sambanova model test
* fix(base_llm_unit_tests.py): skip pydantic obj test if model takes >5s to respond
* docs(input.md): document 'extra_headers' param support
* fix: #7239 to move Nova topK parameter to `additionalModelRequestFields` (#7240)
Co-authored-by: Ryan Hoium <rhoium>
---------
Co-authored-by: ryanh-ai <3118399+ryanh-ai@users.noreply.github.com>
* fix(main.py): fix retries being multiplied when using openai sdk
Closes https://github.com/BerriAI/litellm/pull/7130
* docs(prompt_management.md): add langfuse prompt management doc
* feat(team_endpoints.py): allow teams to add their own models
Enables teams to call their own finetuned models via the proxy
* test: add better enforcement check testing for `/model/new` now that teams can add their own models
* docs(team_model_add.md): tutorial for allowing teams to add their own models
* test: fix test
* fix test_deployment_budget_limits_e2e_test
* refactor async_log_success_event to track spend for provider + deployment
* fix format
* rename class to RouterBudgetLimiting
* rename func
* rename types used for budgets
* add new types for deployment budgets
* add budget limits for deployments
* fix checking budgets set for provider
* update file names
* fix linting error
* _track_provider_remaining_budget_prometheus
* async_filter_deployments
* fix model list passed to router
* update error
* test_deployment_budgets_e2e_test_expect_to_fail
* fix test case
* run deployment budget limits
* fix(litellm_logging.py): pass user metadata to langsmith on sdk calls
* fix(litellm_logging.py): pass nested user metadata to logging integration - e.g. langsmith
* fix(exception_mapping_utils.py): catch and clarify watsonx `/text/chat` endpoint not supported error message.
Closes https://github.com/BerriAI/litellm/issues/7213
* fix(watsonx/common_utils.py): accept new 'WATSONX_IAM_URL' env var
allows user to use local watsonx
Fixes https://github.com/BerriAI/litellm/issues/4991
* fix(litellm_logging.py): cleanup unused function
* test: skip bad ibm test
* add unit test for test_datadog_static_methods
* docs dd vars
* test_datadog_payload_environment_variables
* test_datadog_static_methods
* docs env vars
* fix table
* fix(main.py): support passing max retries to azure/openai embedding integrations
Fixes https://github.com/BerriAI/litellm/issues/7003
* feat(team_endpoints.py): allow updating team model aliases
Closes https://github.com/BerriAI/litellm/issues/6956
* feat(router.py): allow specifying model id as fallback - skips any cooldown check
Allows a default model to be checked if all models in cooldown
s/o @micahjsmith
* docs(reliability.md): add fallback to specific model to docs
* fix(utils.py): new 'is_prompt_caching_valid_prompt' helper util
Allows user to identify if messages/tools have prompt caching
Related issue: https://github.com/BerriAI/litellm/issues/6784
* feat(router.py): store model id for prompt caching valid prompt
Allows routing to that model id on subsequent requests
* fix(router.py): only cache if prompt is valid prompt caching prompt
prevents storing unnecessary items in cache
* feat(router.py): support routing prompt caching enabled models to previous deployments
Closes https://github.com/BerriAI/litellm/issues/6784
* test: fix linting errors
* feat(databricks/): convert basemodel to dict and exclude none values
allow passing pydantic message to databricks
* fix(utils.py): ensure all chat completion messages are dict
* (feat) Track `custom_llm_provider` in LiteLLMSpendLogs (#7081)
* add custom_llm_provider to SpendLogsPayload
* add custom_llm_provider to SpendLogs
* add custom llm provider to SpendLogs payload
* test_spend_logs_payload
* Add MLflow to the side bar (#7031)
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
* (bug fix) SpendLogs update DB catch all possible DB errors for retrying (#7082)
* catch DB_CONNECTION_ERROR_TYPES
* fix DB retry mechanism for SpendLog updates
* use DB_CONNECTION_ERROR_TYPES in auth checks
* fix exp back off for writing SpendLogs
* use _raise_failed_update_spend_exception to ensure errors print as NON blocking
* test_update_spend_logs_multiple_batches_with_failure
* (Feat) Add StructuredOutputs support for Fireworks.AI (#7085)
* fix model cost map fireworks ai "supports_response_schema": true,
* fix supports_response_schema
* fix map openai params fireworks ai
* test_map_response_format
* test_map_response_format
* added deepinfra/Meta-Llama-3.1-405B-Instruct (#7084)
* bump: version 1.53.9 → 1.54.0
* fix deepinfra
* litellm db fixes LiteLLM_UserTable (#7089)
* ci/cd queue new release
* fix llama-3.3-70b-versatile
* refactor - use consistent file naming convention `AI21/` -> `ai21` (#7090)
* fix refactor - use consistent file naming convention
* ci/cd run again
* fix naming structure
* fix use consistent naming (#7092)
---------
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Yuki Watanabe <31463517+B-Step62@users.noreply.github.com>
Co-authored-by: ali sayyah <ali.sayyah2@gmail.com>