* fix(utils.py): e2e azure tts cost tracking working
moves tts response obj to include hidden params (allows litellm call id, etc. to be sent in response headers); fixes spend_tracking_utils logging payload to account for the non-base-model use-case
Fixes https://github.com/BerriAI/litellm/issues/7223
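For context, a minimal sketch of the hidden-params idea behind the TTS change above. The attribute name `_hidden_params` and the key names are assumptions for illustration, not necessarily the exact litellm implementation:

```python
def attach_hidden_params(tts_response, litellm_call_id, response_cost):
    # attach values the proxy can later surface as response headers
    # (e.g. x-litellm-call-id) and use for spend tracking
    hidden = getattr(tts_response, "_hidden_params", {}) or {}
    hidden.update(
        {
            "litellm_call_id": litellm_call_id,  # assumed key name
            "response_cost": response_cost,      # assumed key name
        }
    )
    tts_response._hidden_params = hidden
    return tts_response
```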
* fix: fix linting errors
* build(model_prices_and_context_window.json): add bedrock llama 3.3
Closes https://github.com/BerriAI/litellm/issues/7329
* fix(openai.py): fix return type for sync openai httpx response
* test: update test
* fix(spend_tracking_utils.py): fix if check
* fix(spend_tracking_utils.py): fix if check
* test: improve debugging for test
* fix: fix import
* fix(proxy_track_cost_callback.py): log to db if only end user param given
* fix: allow jwt-auth based end user id spend tracking to work
* fix(utils.py): fix 'get_end_user_id_for_cost_tracking' to use 'user_api_key_end_user_id'
more stable - works with jwt-auth based end user tracking as well
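An illustrative sketch only (the metadata lookup path is an assumption; the key `user_api_key_end_user_id` comes from the commit above): prefer the end user id resolved from auth, e.g. decoded from a JWT, and fall back to an explicitly passed `user` param.

```python
def get_end_user_id_for_cost_tracking(litellm_params: dict):
    metadata = litellm_params.get("metadata") or {}
    end_user_id = metadata.get("user_api_key_end_user_id")
    if end_user_id:
        return end_user_id
    # fall back to a `user` param sent on the original request body
    proxy_request = litellm_params.get("proxy_server_request") or {}
    return (proxy_request.get("body") or {}).get("user")
```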
* test(test_jwt.py): add e2e unit test to confirm end user cost tracking works for spend logs
* test: update test to use end_user api key hash param
* fix(langfuse.py): support end user cost tracking via jwt auth + langfuse
logs end user to langfuse if decoded from jwt token
* fix: fix linting errors
* test: fix test
* test: fix test
* fix: fix end user id extraction
* fix: run test earlier
* feat(router.py): support passing model-specific messages in fallbacks
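A hypothetical sketch of what a model-specific fallback prompt could look like; the exact schema (a dict with `model` + `messages` keys inside the fallback list) is an assumption here, see the reliability docs referenced below for actual usage.

```python
from litellm import Router

router = Router(
    model_list=[
        {"model_name": "gpt-4o", "litellm_params": {"model": "gpt-4o"}},
        {"model_name": "claude-3-haiku", "litellm_params": {"model": "claude-3-haiku-20240307"}},
    ],
    fallbacks=[
        {
            "gpt-4o": [
                {
                    "model": "claude-3-haiku",
                    # messages to use when falling back to this model (assumed key)
                    "messages": [{"role": "system", "content": "You are a concise assistant."}],
                }
            ]
        }
    ],
)
```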
* docs(routing.md): separate router timeouts into separate doc
allow for 1 fallbacks doc (across proxy/router)
* docs(routing.md): cleanup router docs
* docs(reliability.md): cleanup docs
* docs(reliability.md): cleaned up fallback doc
just have 1 doc across sdk/proxy
simplifies docs
* docs(reliability.md): add setting model-specific fallback prompts
* fix: fix linting errors
* test: skip test causing openai rate limit errors
* test: fix test
* test: run vertex test first to catch error
* fix(proxy_server.py): only update k,v pair if v is not empty/null
Fixes https://github.com/BerriAI/litellm/issues/6787
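A minimal sketch of the merge rule described above (function and variable names are illustrative, not the proxy_server.py code):

```python
def merge_non_empty(existing: dict, updates: dict) -> dict:
    # only apply keys whose values are non-empty / non-null,
    # so existing settings aren't wiped out by blank fields
    existing.update({k: v for k, v in updates.items() if v not in (None, "", [], {})})
    return existing
```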
* test(test_router.py): cleanup duplicate calls
* test: add new stream options drop params test
* test: update optional params / stream options test to test for vertex ai mistral route specifically
Addresses https://github.com/BerriAI/litellm/issues/7309
* fix(proxy_server.py): fix linting errors
* fix: fix linting errors
* fix(openai.py): fix returning o1 non-streaming requests
fixes issue where fake stream was always true for o1
* build(model_prices_and_context_window.json): add 'supports_vision' for o1 models
* fix: add internal server error exception mapping
* fix(base_llm_unit_tests.py): drop temperature from test
* test: mark prompt caching as a flaky test
* fix(health.md): add rerank model health check information
* build(model_prices_and_context_window.json): add gemini 2.0 for google ai studio - pricing + commercial rate limits
* build(model_prices_and_context_window.json): add gemini-2.0 supports audio output = true
* docs(team_model_add.md): clarify allowing teams to add models is an enterprise feature
* fix(o1_transformation.py): add support for 'n', 'response_format' and 'stop' params for o1 and 'stream_options' param for o1-mini
* build(model_prices_and_context_window.json): add 'supports_system_message' to supporting openai models
needed as o1-preview and o1-mini models don't support 'system' messages
* fix(o1_transformation.py): translate system message based on if o1 model supports it
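A rough sketch of the translation idea (not the exact o1_transformation.py code): when the o1 variant doesn't support system messages, fold them into user messages instead of dropping them.

```python
def translate_system_messages(messages: list, supports_system_message: bool) -> list:
    if supports_system_message:
        return messages
    translated = []
    for m in messages:
        if m.get("role") == "system":
            # re-send the system prompt as a user turn for models that reject it
            translated.append({"role": "user", "content": m["content"]})
        else:
            translated.append(m)
    return translated
```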
* fix(o1_transformation.py): return 'stream' param support if o1-mini/o1-preview
o1 currently doesn't support streaming, but the other model versions do
Fixes https://github.com/BerriAI/litellm/issues/7292
* fix(o1_transformation.py): return tool calling/response_format in supported params if model map says so
Fixes https://github.com/BerriAI/litellm/issues/7292
* fix: fix linting errors
* fix: update '_transform_messages'
* fix(o1_transformation.py): fix provider passed for supported param checks
* test(base_llm_unit_tests.py): skip test if api takes >5s to respond
* fix(utils.py): return false in 'supports_factory' if the value can't be found
* fix(o1_transformation.py): always return stream + stream_options as supported params + handle stream options being passed in for azure o1
* feat(openai.py): support stream faking natively in openai handler
Allows streaming to be faked for just the "o1" model, while allowing native streaming for o1-mini and o1-preview
Fixes https://github.com/BerriAI/litellm/issues/7292
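For illustration, a simplified sketch of "fake streaming": make a normal non-streaming call, then yield the completed response as a single stream-shaped chunk so callers expecting an iterator still work. Names and the dict shape are illustrative, not litellm's handler code.

```python
def fake_stream_chunks(response: dict):
    """Yield a completed (non-streaming) chat response as one chunk."""
    choice = response["choices"][0]
    yield {
        "choices": [
            {
                "index": 0,
                "delta": {"role": "assistant", "content": choice["message"]["content"]},
                "finish_reason": choice["finish_reason"],
            }
        ]
    }
```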
* fix(openai.py): use inference param instead of original optional param
* fix(hosted_vllm/transformation.py): return fake api key, if none given. Prevents httpx error
Fixes https://github.com/BerriAI/litellm/issues/7291
* test: fix test
* fix(main.py): add hosted_vllm/ support for embeddings endpoint
Closes https://github.com/BerriAI/litellm/issues/7290
* docs(vllm.md): add docs on vllm embeddings usage
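Example usage of the new embeddings route (model name, api_base, and port are placeholders for your own vLLM deployment):

```python
import litellm

response = litellm.embedding(
    model="hosted_vllm/mixedbread-ai/mxbai-embed-large-v1",  # placeholder model
    input=["hello world"],
    api_base="http://localhost:8000/v1",  # your vLLM server
)
print(response)
```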
* fix(__init__.py): fix sambanova model test
* fix(base_llm_unit_tests.py): skip pydantic obj test if model takes >5s to respond
* fix(proxy_server.py): pass model access groups to get_key/get_team models
allows end user to see actual models they have access to, instead of default models
* fix(auth_checks.py): fix linting errors
* fix: fix linting errors
* fix(factory.py): skip empty text blocks for bedrock user messages
Fixes https://github.com/BerriAI/litellm/issues/7169
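A sketch of the idea behind the factory.py fix (illustrative only): drop text blocks that are empty or whitespace-only before building the Bedrock user message, since Bedrock rejects empty text content.

```python
def filter_empty_text_blocks(content_blocks: list) -> list:
    filtered = []
    for block in content_blocks:
        if block.get("type") == "text" and not (block.get("text") or "").strip():
            continue  # skip empty text block
        filtered.append(block)
    return filtered
```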
* Add support for Gemini 2.0 GoogleSearch tool (#7257)
* Add support for google_search tool in gemini 2.0
* Add/modify tests
* Fix grounding check
* Remove 2.0 grounding test; exclude experimental model in VERTEX_MODELS_TO_NOT_TEST
* Swap order of tools
* Fix formatting
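A hedged usage example for the googleSearch tool added above; the model name is a placeholder and the tool spelling follows the Google GenAI convention assumed here:

```python
import litellm

response = litellm.completion(
    model="gemini/gemini-2.0-flash-exp",  # placeholder Gemini 2.0 model
    messages=[{"role": "user", "content": "Who won the last World Cup?"}],
    tools=[{"googleSearch": {}}],  # grounding via Google Search
)
```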
* fix(get_api_base.py): return api base in streaming response
Fixes https://github.com/BerriAI/litellm/issues/7249
Closes https://github.com/BerriAI/litellm/pull/7250
* fix(cost_calculator.py): only set base model to model if not none
Fixes https://github.com/BerriAI/litellm/issues/7223
* fix(cost_calculator.py): enforce stricter order when picking model for cost calculation
* fix(cost_calculator.py): fix '_select_model_name_for_cost_calc' to return model name with region name prefix if provided
* fix(utils.py): fix 'get_model_info()' to handle edge case where model name starts with custom llm provider AND custom llm provider is given
* fix(cost_calculator.py): handle `custom_llm_provider-` scenario
* fix(cost_calculator.py): e2e working tts cost tracking
ensures the initial message is passed in to the cost calculator
* fix(factory.py): suppress linting errors
* fix(cost_calculator.py): strip llm provider from model name after selecting cost calc model
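An illustrative sketch of the stripping step (not the cost_calculator.py implementation): once the model for cost calculation is chosen, remove a leading provider prefix so lookups against the pricing map use the bare model name.

```python
def strip_provider_prefix(model: str, custom_llm_provider: str) -> str:
    # handles both "provider/model" and the "provider-model" scenario noted above
    for sep in ("/", "-"):
        prefix = f"{custom_llm_provider}{sep}"
        if model.startswith(prefix):
            return model[len(prefix):]
    return model
```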
* fix(litellm_logging.py): store initial request in 'input' field + accept base_model to be passed in litellm_params directly
* test: handle none env var value in flaky test
* fix(litellm_logging.py): fix linting errors
---------
Co-authored-by: Sam B <samlingx@gmail.com>
* fix(utils.py): fix openai-like api response format parsing
Fixes issue passing structured output to litellm_proxy/ route
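An example of the affected path, structured output through the litellm_proxy/ route; api_base and api_key are placeholders for your proxy deployment:

```python
from pydantic import BaseModel
import litellm

class CalendarEvent(BaseModel):
    name: str
    date: str

response = litellm.completion(
    model="litellm_proxy/gpt-4o",
    messages=[{"role": "user", "content": "Alice and Bob meet on Friday"}],
    response_format=CalendarEvent,  # structured output parsed from the proxy response
    api_base="http://localhost:4000",
    api_key="sk-1234",
)
```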
* fix(cost_calculator.py): fix whisper transcription cost calc to use file duration, not response time
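A sketch of the corrected calculation: transcription cost should scale with the audio file's duration, not with how long the API call took. The per-minute price below is illustrative.

```python
def transcription_cost(audio_duration_seconds: float, price_per_minute: float = 0.006) -> float:
    # bill on file duration, not response latency
    return (audio_duration_seconds / 60.0) * price_per_minute

# e.g. a 90-second file: (90 / 60) * 0.006 = 0.009 USD
```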
* test: skip test if credentials not found
* docs(input.md): document 'extra_headers' param support
* fix: #7239 to move Nova topK parameter to `additionalModelRequestFields` (#7240)
Co-authored-by: Ryan Hoium <rhoium>
---------
Co-authored-by: ryanh-ai <3118399+ryanh-ai@users.noreply.github.com>
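Regarding the Nova topK fix (#7240) above, a hedged sketch of the resulting Bedrock Converse request shape: for Amazon Nova models, topK is not accepted in inferenceConfig and is sent under additionalModelRequestFields instead. Values are illustrative.

```python
# illustrative request body for the Bedrock Converse API with a Nova model
request_body = {
    "messages": [{"role": "user", "content": [{"text": "hello"}]}],
    "inferenceConfig": {"temperature": 0.7, "topP": 0.9},
    "additionalModelRequestFields": {"inferenceConfig": {"topK": 40}},
}
```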