* test: add new test image embedding to base llm unit tests
Addresses https://github.com/BerriAI/litellm/issues/6515
* fix(bedrock/embed/multimodal-embeddings): strip data prefix from image urls for bedrock multimodal embeddings
Fix https://github.com/BerriAI/litellm/issues/6515
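A minimal sketch of the prefix handling described above; the helper name and exact regex are illustrative, not the actual litellm implementation:

```python
import re

# Hypothetical helper: Bedrock multimodal embeddings expect raw base64,
# so strip any leading "data:image/...;base64," prefix from data URLs.
DATA_URI_PREFIX = re.compile(r"^data:image/[a-zA-Z+.-]+;base64,")

def strip_data_prefix(image_url: str) -> str:
    return DATA_URI_PREFIX.sub("", image_url)

print(strip_data_prefix("data:image/png;base64,iVBORw0KGgo="))  # -> "iVBORw0KGgo="
```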
* feat: initial commit for fireworks ai audio transcription support
Relevant issue: https://github.com/BerriAI/litellm/issues/7134
* test: initial fireworks ai test
* feat(fireworks_ai/): implement fireworks ai audio transcription config
* fix(utils.py): register fireworks ai audio transcription config in config manager
* fix(utils.py): add fireworks ai param translation to 'get_optional_params_transcription'
* refactor(fireworks_ai/): define text completion route with model name handling
moves model name handling to specific fireworks routes, as required by their api
* refactor(fireworks_ai/chat): define transform_request - allows fixing model if accounts/ is missing
* fix: fix linting errors
* fix: fix linting errors
* fix: fix linting errors
* fix: fix linting errors
* fix(handler.py): fix linting errors
* fix(main.py): fix tgai text completion route
* refactor(together_ai/completion): refactors together ai text completion route to just use provider transform request
* refactor: move test_fine_tuning_api out of local_testing
reduces local testing ci/cd time
* build(model_prices_and_context_window.json): add gemini-1.5-flash context caching
* fix(context_caching/transformation.py): just use last identified cache point
Fixes https://github.com/BerriAI/litellm/issues/6738
* fix(context_caching/transformation.py): pick first contiguous block - handles system message error from google
Fixes https://github.com/BerriAI/litellm/issues/6738
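A rough sketch of the "first contiguous block" idea: split the message list at the end of the first contiguous run of messages marked for caching and treat only that run as the cache point. The function and the `cache_control` field here are illustrative, not the actual transformation code:

```python
def split_first_contiguous_cache_block(messages):
    """Return (cached_block, remaining_messages).

    Picks only the FIRST contiguous run of messages flagged with
    `cache_control`; later flagged messages stay in the normal request.
    Illustrative only.
    """
    start = end = None
    for idx, msg in enumerate(messages):
        flagged = msg.get("cache_control") is not None
        if flagged and start is None:
            start = idx
        if not flagged and start is not None:
            end = idx
            break
    if start is None:            # nothing marked for caching
        return [], messages
    if end is None:              # run extends to the end of the list
        end = len(messages)
    return messages[start:end], messages[:start] + messages[end:]
```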
* fix(vertex_ai/gemini/): track context caching tokens
* refactor(gemini/): place transformation.py inside `chat/` folder
make it easy for user to know we support the equivalent endpoint
* fix: fix import
* refactor(vertex_ai/): move vertex_ai cost calc inside vertex_ai/ folder
make it easier to see cost calculation logic
* fix: fix linting errors
* fix: fix circular import
* feat(gemini/cost_calculator.py): support gemini context caching cost calculation
generalizes anthropic's cost calculation function and reuses it across anthropic + gemini
* build(model_prices_and_context_window.json): add cost tracking for gemini-1.5-flash-002 w/ context caching
Closes https://github.com/BerriAI/litellm/issues/6891
* docs(gemini.md): add gemini context caching architecture diagram
make it easier for user to understand how context caching works
* docs(gemini.md): link to relevant gemini context caching code
* docs(gemini/context_caching): add readme in github, make it easy for dev to know context caching is supported + where to go for code
* fix(llm_cost_calc/utils.py): handle gemini 128k token diff cost calc scenario
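For reference, the tiered pricing works roughly like the sketch below: once a Gemini prompt crosses 128k tokens, input is billed at the higher rate. The key name mirrors the `*_above_128k_tokens` fields in model_prices_and_context_window.json, but the function is a simplified illustration, assuming the whole prompt is billed at the higher rate past the threshold:

```python
def gemini_prompt_cost(prompt_tokens: int, model_info: dict) -> float:
    """Simplified illustration of tiered Gemini prompt pricing."""
    threshold = 128_000
    if prompt_tokens > threshold and "input_cost_per_token_above_128k_tokens" in model_info:
        return prompt_tokens * model_info["input_cost_per_token_above_128k_tokens"]
    return prompt_tokens * model_info["input_cost_per_token"]
```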
* fix(deepseek/cost_calculator.py): support deepseek context caching cost calculation
* test: fix test
* fix(main.py): support 'mock_timeout=true' param
allows mock requests on proxy to have a time delay, for testing
* fix(main.py): ensure mock timeouts raise litellm.Timeout error
triggers retry/fallbacks
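A small usage sketch of the behavior described in the two commits above; the exact kwargs are assumptions (`mock_timeout` as the flag, `timeout` as the simulated delay):

```python
import litellm

# Hedged sketch: with mock_timeout=True the call waits ~`timeout` seconds and then
# raises litellm.Timeout, which is what the router's retry / fallback logic keys off.
try:
    litellm.completion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "hi"}],
        mock_timeout=True,   # simulate a hung request for testing
        timeout=3,
    )
except litellm.Timeout as e:
    print(f"got expected timeout: {e}")
```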
* fix: fix fallback + mock timeout testing
* fix(router.py): always return remaining tpm/rpm limits, if limits are known
allows for rate limit headers to be guaranteed
* docs(timeout.md): add docs on `mock_timeout=true`
* fix(main.py): fix linting errors
* test: fix test
* feat(guardrails_endpoint.py): new `/guardrails/list` endpoint
Allows users to view the available guardrails
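A quick way to exercise the new endpoint against a locally running proxy; the base URL, key, and response shape shown here are assumptions:

```python
import requests

# Assumes a proxy running on localhost:4000 with a master key of "sk-1234".
resp = requests.get(
    "http://localhost:4000/guardrails/list",
    headers={"Authorization": "Bearer sk-1234"},
)
resp.raise_for_status()
print(resp.json())  # expected to list the configured guardrails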
* docs: document new `/guardrails/list` endpoint
* docs(enterprise.md): update docs
* fix(openai/transcription/handler.py): support cost tracking on vtt + srt formats
* fix(openai/transcriptions/handler.py): default to 'verbose_json' response format if 'text' or 'json' response_format is received, ensuring the 'duration' param is returned for all audio transcription requests
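The defaulting described above amounts to something like this (a sketch, not the handler's actual code):

```python
def normalize_transcription_response_format(response_format: str) -> str:
    # 'text' and 'json' responses omit 'duration', which audio cost tracking
    # needs, so upgrade them to 'verbose_json'.
    if response_format in ("text", "json"):
        return "verbose_json"
    return response_format
```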
* fix: fix linting errors
* fix: remove unused import
* feat(router.py): support passing model-specific messages in fallbacks
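A hedged usage sketch of the feature: each fallback entry can carry its own messages, so the fallback model need not reuse the original prompt. The exact shape of the fallback entries below is an assumption, not guaranteed syntax:

```python
import litellm

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this 50-page report..."}],
    fallbacks=[
        {
            "model": "claude-3-5-sonnet-20240620",
            # model-specific prompt, used only if the primary call fails
            "messages": [{"role": "user", "content": "Give a short summary of the report."}],
        }
    ],
)
```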
* docs(routing.md): separate router timeouts into separate doc
allow for 1 fallbacks doc (across proxy/router)
* docs(routing.md): cleanup router docs
* docs(reliability.md): cleanup docs
* docs(reliability.md): cleaned up fallback doc
just have 1 doc across sdk/proxy
simplifies docs
* docs(reliability.md): add setting model-specific fallback prompts
* fix: fix linting errors
* test: skip test causing openai rate limit errors
* test: fix test
* test: run vertex test first to catch error
* fix(health.md): add rerank model health check information
* build(model_prices_and_context_window.json): add gemini 2.0 for google ai studio - pricing + commercial rate limits
* build(model_prices_and_context_window.json): add gemini-2.0 supports audio output = true
* docs(team_model_add.md): clarify allowing teams to add models is an enterprise feature
* fix(o1_transformation.py): add support for 'n', 'response_format' and 'stop' params for o1 and 'stream_options' param for o1-mini
* build(model_prices_and_context_window.json): add 'supports_system_message' to supporting openai models
needed as o1-preview and o1-mini models don't support system messages
* fix(o1_transformation.py): translate system message based on if o1 model supports it
* fix(o1_transformation.py): return 'stream' param support if o1-mini/o1-preview
o1 currently doesn't support streaming, but the other model versions do
Fixes https://github.com/BerriAI/litellm/issues/7292
* fix(o1_transformation.py): return tool calling/response_format in supported params if model map says so
Fixes https://github.com/BerriAI/litellm/issues/7292
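Roughly what the supported-params logic above boils down to (a sketch, not the actual O1 config class): streaming is only advertised for o1-mini / o1-preview, and tool calling / response_format are advertised only when the model map says the model supports them. litellm's public capability helpers are used here as stand-ins for the internal model-map lookup:

```python
import litellm

def o1_supported_params_sketch(model: str) -> list:
    """Illustrative only - not the actual config logic."""
    params = ["max_completion_tokens", "n", "stop"]
    # o1-mini / o1-preview stream natively; the base "o1" model does not.
    if model.startswith(("o1-mini", "o1-preview")):
        params.append("stream")
    # only advertise tool calling / structured output if the model map says so
    if litellm.supports_function_calling(model=model):
        params += ["tools", "tool_choice"]
    if litellm.supports_response_schema(model=model):
        params.append("response_format")
    return params
```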
* fix: fix linting errors
* fix: update '_transform_messages'
* fix(o1_transformation.py): fix provider passed for supported param checks
* test(base_llm_unit_tests.py): skip test if api takes >5s to respond
* fix(utils.py): return false in 'supports_factory' if the value can't be found
* fix(o1_transformation.py): always return stream + stream_options as supported params + handle stream options being passed in for azure o1
* feat(openai.py): support stream faking natively in openai handler
Allows streaming to be faked for just the "o1" model, while keeping native streaming for o1-mini and o1-preview
Fixes https://github.com/BerriAI/litellm/issues/7292
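The "stream faking" above is essentially wrapping a complete (non-streaming) response in a one-shot iterator so callers that asked for a stream still get one; a generic sketch, not litellm's actual wrapper:

```python
from typing import Iterator

def fake_stream(full_response: dict) -> Iterator[dict]:
    """Do a normal request upstream, then yield the whole completion as a
    single chunk so the caller's `for chunk in response:` loop still works."""
    choice = full_response["choices"][0]
    yield {
        "choices": [
            {
                "index": 0,
                "delta": {"role": "assistant", "content": choice["message"]["content"]},
                "finish_reason": choice.get("finish_reason"),
            }
        ]
    }
```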
* fix(openai.py): use inference param instead of original optional param
* fix(hosted_vllm/transformation.py): return fake api key if none is given. Prevents httpx error
Fixes https://github.com/BerriAI/litellm/issues/7291
* test: fix test
* fix(main.py): add hosted_vllm/ support for embeddings endpoint
Closes https://github.com/BerriAI/litellm/issues/7290
* docs(vllm.md): add docs on vllm embeddings usage
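A hedged usage example matching the new vllm embeddings docs; the api_base and model name are placeholders for a locally hosted vLLM server:

```python
import litellm

# Placeholders: point api_base at your own vLLM OpenAI-compatible server.
response = litellm.embedding(
    model="hosted_vllm/BAAI/bge-small-en-v1.5",
    api_base="http://localhost:8000/v1",
    input=["hello from litellm"],
)
print(len(response.data[0]["embedding"]))
```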
* fix(__init__.py): fix sambanova model test
* fix(base_llm_unit_tests.py): skip pydantic obj test if model takes >5s to respond
* docs(input.md): document 'extra_headers' param support
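For context, `extra_headers` forwards arbitrary HTTP headers to the provider; a minimal hedged example (the header shown is just illustrative):

```python
import litellm

response = litellm.completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "hi"}],
    # extra_headers is passed through to the underlying HTTP request
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
)
```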
* fix(#7239): move Nova topK parameter to `additionalModelRequestFields` (#7240)
Co-authored-by: Ryan Hoium <rhoium>
---------
Co-authored-by: ryanh-ai <3118399+ryanh-ai@users.noreply.github.com>
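The Nova fix above boils down to moving `topK` out of the standard inference config and into `additionalModelRequestFields` on the Bedrock Converse request. A hedged sketch of that translation (the dict keys follow the Converse API; the function itself is illustrative):

```python
from typing import Optional

def translate_nova_top_k(request_body: dict, top_k: Optional[int]) -> dict:
    """Illustrative: Nova models reject topK inside inferenceConfig, so it
    rides along in additionalModelRequestFields instead."""
    if top_k is not None:
        request_body.setdefault("additionalModelRequestFields", {})
        request_body["additionalModelRequestFields"]["inferenceConfig"] = {"topK": top_k}
    return request_body
```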
* fix(main.py): fix retries being multiplied when using openai sdk
Closes https://github.com/BerriAI/litellm/pull/7130
* docs(prompt_management.md): add langfuse prompt management doc
* feat(team_endpoints.py): allow teams to add their own models
Enables teams to call their own finetuned models via the proxy
* test: add better enforcement check testing for `/model/new` now that teams can add their own models
* docs(team_model_add.md): tutorial for allowing teams to add their own models
* test: fix test
* fix test_deployment_budget_limits_e2e_test
* refactor async_log_success_event to track spend for provider + deployment
* fix format
* rename class to RouterBudgetLimiting
* rename func
* rename types used for budgets
* add new types for deployment budgets
* add budget limits for deployments
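A hedged sketch of what a per-deployment budget might look like in the router's model list; the field names follow the provider-budget pattern but are assumptions here:

```python
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "gpt-4o",
            "litellm_params": {
                "model": "openai/gpt-4o",
                "max_budget": 0.000000000001,  # USD budget for this deployment
                "budget_duration": "1d",       # window after which spend resets
            },
        }
    ],
)
```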
* fix checking budgets set for provider
* update file names
* fix linting error
* _track_provider_remaining_budget_prometheus
* async_filter_deployments
* fix model list passed to router
* update error
* test_deployment_budgets_e2e_test_expect_to_fail
* fix test case
* run deployment budget limits
* fix(litellm_logging.py): pass user metadata to langsmith on sdk calls
* fix(litellm_logging.py): pass nested user metadata to logging integration - e.g. langsmith
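A hedged example of the metadata path described above: metadata passed on the SDK call (including nested keys) should now show up on the langsmith trace. Keys below are illustrative:

```python
import litellm

litellm.success_callback = ["langsmith"]  # requires LANGSMITH_API_KEY in the env

response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "hi"}],
    # nested user metadata, forwarded to the logging integration per the fix above
    metadata={"user": "user-123", "session": {"team": "search"}},
)
```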
* fix(exception_mapping_utils.py): catch and clarify watsonx `/text/chat` endpoint not supported error message.
Closes https://github.com/BerriAI/litellm/issues/7213
* fix(watsonx/common_utils.py): accept new 'WATSONX_IAM_URL' env var
allows user to use local watsonx
Fixes https://github.com/BerriAI/litellm/issues/4991
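To point at a local watsonx deployment, the new env var can be set before calling; the URL and model name below are placeholders:

```python
import os
import litellm

# Placeholder: point WATSONX_IAM_URL at your local IAM/token endpoint.
os.environ["WATSONX_IAM_URL"] = "https://my-local-watsonx.example.com/identity/token"

response = litellm.completion(
    model="watsonx/ibm/granite-13b-chat-v2",
    messages=[{"role": "user", "content": "hi"}],
)
```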
* fix(litellm_logging.py): cleanup unused function
* test: skip bad ibm test