* feat: add new model provider Novita AI
* feat: use deepseek r1 model for examples in Novita AI docs
* fix: fix tests
* fix: fix tests for novita
* fix: fix novita transformation
* fix(o_series_transformation.py): fix optional param check for o-series models
o3-mini and o-1 do not support parallel tool calling
* fix(utils.py): support 'drop_params' for 'thinking' param across models
allows switching to older claude versions (or non-anthropic models) and param to be safely dropped
* fix: fix passing thinking param in optional params
allows dropping thinking_param where not applicable
* test: update old model
* fix(utils.py): fix linting errors
* fix(main.py): add param to acompletion
* fix(main.py): fix key leak error when unknown provider given
don't return passed in args if unknown route on embedding
* fix(main.py): remove instances of {args} being passed in exception
prevent potential key leaks
* test(code_coverage/prevent_key_leaks_in_codebase.py): ban usage of {args} in codebase
* fix: fix linting errors
* fix: remove unused variable
* fix(azure/chat/gpt_transformation.py): add 'prediction' as a support azure param
Closes https://github.com/BerriAI/litellm/issues/8500
* build(model_prices_and_context_window.json): add new 'gemini-2.0-pro-exp-02-05' model
* style: cleanup invalid json trailing commma
* feat(utils.py): support passing 'tokenizer_config' to register_prompt_template
enables passing complete tokenizer config of model to litellm
Allows calling deepseek on bedrock with the correct prompt template
* fix(utils.py): fix register_prompt_template for custom model names
* test(test_prompt_factory.py): fix test
* test(test_completion.py): add e2e test for bedrock invoke deepseek ft model
* feat(base_invoke_transformation.py): support hf_model_name param for bedrock invoke calls
enables proxy admin to set base model for ft bedrock deepseek model
* feat(bedrock/invoke): support deepseek_r1 route for bedrock
makes it easy to apply the right chat template to that call
* feat(constants.py): store deepseek r1 chat template - allow user to get correct response from deepseek r1 without extra work
* test(test_completion.py): add e2e mock test for bedrock deepseek
* docs(bedrock.md): document new deepseek_r1 route for bedrock
allows us to use the right config
* fix(exception_mapping_utils.py): catch read operation timeout
* fix(azure.py): ensure max_retries=0 is respected
Fixes https://github.com/BerriAI/litellm/issues/6129
* fix(test_openai.py): add unit test to ensure openai sdk calls always respect max_retries = 0
* test(test_azure_openai.py): add unit testing for azure_text/ route
* fix(azure.py): fix passing max retries on streaming
* fix(azure.py): fix azure max retries on async completion + streaming
* fix(completion/handler.py): fix azure text async completion + streaming
* test(test_azure_openai.py): ensure azure openai max retries always respected
* test(test_azure_o_series.py): add testing to ensure max retries always respected
* Added gemini providers for 2.0-flash and 2.0-flash lite (#8321)
* Update model_prices_and_context_window.json
added gemini providers for 2.0-flash and 2.0-flash light
* Update model_prices_and_context_window.json
fixed URL
---------
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
* Convert tool use arguments to string before counting tokens (#6989)
In at least some cases the `messages["tool_calls"]["function"]["arguments"]` is a dict, not a string. In order to tokenize it properly it needs to be a string. In the case that it is already a string this is a noop, which is also fine.
* build(model_prices_and_context_window.json): add gemini 2.0 flash lite pricing
* build(model_prices_and_context_window.json): add gemini commercial rate limits
* fix(utils.py): fix linting error
* refactor(utils.py): refactor to maintain function size
---------
Co-authored-by: Bardia Khosravi <bardiakhosravi95@gmail.com>
Co-authored-by: Josh Morrow <josh@jcmorrow.com>
* initial transform for invoke
* invoke transform_response
* working - able to make request
* working get_complete_url
* working - invoke now runs on llm_http_handler
* fix unused imports
* track litellm overhead ms
* working stream request
* sign_request transform
* sign_request update
* use has_async_custom_stream_wrapper property
* use get_async_custom_stream_wrapper in base llm http handler
* fix make_call in invoke handler
* fix invoke with streaming get_async_custom_stream_wrapper
* working bedrock async streaming with invoke
* fix make call handler for bedrock
* test_all_model_configs
* fix test_bedrock_custom_prompt_template
* sync streaming for bedrock invoke
* fix _add_stream_param_to_request_body
* test_async_text_completion_bedrock
* fix transform_request
* fix get_supported_openai_params
* fix test supports tool choice
* fix test_supports_tool_choice
* add unit test coverage for bedrock invoke transform
* fix location of transformation files
* update import loc
* fix bedrock invoke unit tests
* fix import for max completion tokens
* refactor(deepseek/): move deepseek to base llm http handler
Fixes https://github.com/BerriAI/litellm/issues/8128#issuecomment-2635430457
* fix(gpt_transformation.py): support stream parsing for gpt-like calls
* test(test_deepseek_completion.py): add async streaming test
* fix(gpt_transformation.py): fix import
* fix(gpt_transformation.py): return full api base and content type
* fix(vertex_ai/gemini/transformation.py): handle 'http://' image urls
* test: add base test for `http:` url's
* fix(factory.py/get_image_details): follow redirects
allows http calls to work
* fix(codestral/): fix stream chunk parsing on last chunk of stream
* Azure ad token provider (#6917)
* Update azure.py
Added optional parameter azure ad token provider
* Added parameter to main.py
* Found token provider arg location
* Fixed embeddings
* Fixed ad token provider
---------
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
* fix: fix linting errors
* fix(main.py): leave out o1 route for azure ad token provider, for now
get v0 out for sync azure gpt route to begin with
* test: skip http:// test for fireworks ai
model does not support it
* refactor: cleanup dead code
* fix: revert http:// url passthrough for gemini
google ai studio raises errors
* test: fix test
---------
Co-authored-by: bahtman <anton@baht.dk>
* fix(o_series_transformation.py): add 'reasoning_effort' as o series model param
Closes https://github.com/BerriAI/litellm/issues/8182
* fix(main.py): ensure `reasoning_effort` is a mapped openai param
* refactor(azure/): rename o1_[x] files to o_series_[x]
* refactor(base_llm_unit_tests.py): refactor testing for o series reasoning effort
* test(test_azure_o_series.py): have azure o series tests correctly inherit from base o series model tests
* feat(base_utils.py): support translating 'developer' role to 'system' role for non-openai providers
Makes it easy to switch from openai to anthropic
* fix: fix linting errors
* fix(base_llm_unit_tests.py): fix test
* fix(main.py): add missing param
* fix: support azure o3 model family for fake streaming workaround (#8162)
* fix: support azure o3 model family for fake streaming workaround
* refactor: rename helper to is_o_series_model for clarity
* update function calling parameters for o3 models (#8178)
* refactor(o1_transformation.py): refactor o1 config to be o series config, expand o series model check to o3
ensures max_tokens is correctly translated for o3
* feat(openai/): refactor o1 files to be 'o_series' files
expands naming to cover o3
* fix(azure/chat/o1_handler.py): azure openai is an instance of openai - was causing resets
* test(test_azure_o_series.py): assert stream faked for azure o3 mini
Resolves https://github.com/BerriAI/litellm/pull/8162
* fix(o1_transformation.py): fix o1 transformation logic to handle explicit o1_series routing
* docs(azure.md): update doc with `o_series/` model name
---------
Co-authored-by: byrongrogan <47910641+byrongrogan@users.noreply.github.com>
Co-authored-by: Low Jian Sheng <15527690+lowjiansheng@users.noreply.github.com>
* docs: cleanup doc
* feat(bedrock/): initial commit adding bedrock/converse_like/<model> route support
allows routing to a converse like endpoint
Resolves https://github.com/BerriAI/litellm/issues/8085
* feat(bedrock/chat/converse_transformation.py): make converse config base config compatible
enables new 'converse_like' route
* feat(converse_transformation.py): enables using the proxy with converse like api endpoint
Resolves https://github.com/BerriAI/litellm/issues/8085
* docs(reliability.md): add doc on disabling fallbacks per request
* feat(litellm_pre_call_utils.py): support reading request timeout from request headers - new `x-litellm-timeout` param
Allows setting dynamic model timeouts from vercel's AI sdk
* test(test_proxy_server.py): add simple unit test for reading request timeout
* test(test_fallbacks.py): add e2e test to confirm timeout passed in request headers is correctly read
* feat(main.py): support passing metadata to openai in preview
Resolves https://github.com/BerriAI/litellm/issues/6022#issuecomment-2616119371
* fix(main.py): fix passing openai metadata
* docs(request_headers.md): document new request headers
* build: Merge branch 'main' into litellm_dev_01_27_2025_p3
* test: loosen test
* feat(main.py): use asyncio.sleep for mock_Timeout=true on async request
adds unit testing to ensure proxy does not fail if specific Openai requests hang (e.g. recent o1 outage)
* fix(streaming_handler.py): fix deepseek r1 return reasoning content on streaming
Fixes https://github.com/BerriAI/litellm/issues/7942
* Revert "fix(streaming_handler.py): fix deepseek r1 return reasoning content on streaming"
This reverts commit 7a052a64e3.
* fix(deepseek-r-1): return reasoning_content as a top-level param
ensures compatibility with existing tools that use it
* fix: fix linting error
* test(test_completion_cost.py): add sdk test to ensure base model is used for cost tracking
* test(test_completion_cost.py): add sdk test to ensure custom pricing works
* fix(main.py): add base model cost tracking support for embedding calls
Enables base model cost tracking for embedding calls when base model set as a litellm_param
* fix(litellm_logging.py): update logging object with litellm params - including base model, if given
ensures base model param is always tracked
* fix(main.py): fix linting errors
* fix(http_handler.py): support passing ssl verify dynamically and using the correct httpx client based on passed ssl verify param
Fixes https://github.com/BerriAI/litellm/issues/6499
* feat(llm_http_handler.py): support passing `ssl_verify=False` dynamically in call args
Closes https://github.com/BerriAI/litellm/issues/6499
* fix(proxy/utils.py): prevent bad logs from breaking all cost tracking + reset list regardless of success/failure
prevents malformed logs from causing all spend tracking to break since they're constantly retried
* test(test_proxy_utils.py): add test to ensure bad log is dropped
* test(test_proxy_utils.py): ensure in-memory spend logs reset after bad log error
* test(test_user_api_key_auth.py): add unit test to ensure end user id as str works
* fix(auth_utils.py): ensure extracted end user id is always a str
prevents db cost tracking errors
* test(test_auth_utils.py): ensure get end user id from request body always returns a string
* test: update tests
* test: skip bedrock test- behaviour now supported
* test: fix testing
* refactor(spend_tracking_utils.py): reduce size of get_logging_payload
* test: fix test
* bump: version 1.59.4 → 1.59.5
* Revert "bump: version 1.59.4 → 1.59.5"
This reverts commit 1182b46b2e.
* fix(utils.py): fix spend logs retry logic
* fix(spend_tracking_utils.py): fix get tags
* fix(spend_tracking_utils.py): fix end user id spend tracking on pass-through endpoints
* feat(main.py): add new 'provider_specific_header' param
allows passing extra header for specific provider
* fix(litellm_pre_call_utils.py): add unit test for pre call utils
* test(test_bedrock_completion.py): skip test now that bedrock supports this