* fix(litellm_proxy/chat/transformation.py): support 'thinking' param
Fixes https://github.com/BerriAI/litellm/issues/9380
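A minimal sketch of what this enables, assuming a LiteLLM proxy running at http://localhost:4000 with a Claude model configured (the model alias and proxy key below are illustrative):

```python
import litellm

# Sketch: forward an Anthropic-style 'thinking' param through the
# litellm_proxy provider (model alias, api_base and key are illustrative).
response = litellm.completion(
    model="litellm_proxy/claude-3-7-sonnet",
    api_base="http://localhost:4000",
    api_key="sk-1234",  # hypothetical proxy key
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
    thinking={"type": "enabled", "budget_tokens": 1024},
)
print(response.choices[0].message.content)
```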
* feat(azure/gpt_transformation.py): add azure audio model support
Closes https://github.com/BerriAI/litellm/issues/6305
* fix(utils.py): use provider_config in common functions
* fix(utils.py): add missing provider configs to get_chat_provider_config
* test: fix test
* fix: fix path
* feat(utils.py): make bedrock invoke nova config baseconfig compatible
* fix: fix linting errors
* fix(azure_ai/transformation.py): remove buggy optional param filtering for azure ai
Removes incorrect check for supported tool choice when calling azure ai - prevented calling models with response_format unless they were on the litellm model cost map
* fix(amazon_cohere_transformation.py): fix bedrock invoke cohere transformation to inherit from CohereChatConfig
* test: fix azure ai tool choice mapping
* fix: fix model cost map to add 'supports_tool_choice' to cohere models
* fix(get_supported_openai_params.py): check if custom llm provider in llm providers
* fix(get_supported_openai_params.py): fix llm provider in list check
* fix: fix ruff check errors
* fix: support defs when calling bedrock nova
* fix(factory.py): fix test
* test: move test to just checking async
* fix(transformation.py): handle function call with no schema
* fix(utils.py): handle pydantic base model in message tool calls
Fix https://github.com/BerriAI/litellm/issues/9321
* fix(vertex_and_google_ai_studio.py): handle tools=[]
Fixes https://github.com/BerriAI/litellm/issues/9080
* test: remove max token restriction
* test: fix basic test
* fix(get_supported_openai_params.py): fix check
* fix(converse_transformation.py): support fake streaming for meta.llama3-3-70b-instruct-v1:0
* fix: fix test
* fix: parse out empty dictionary on dbrx streaming + tool calls
* fix(handle-'strict'-param-when-calling-fireworks-ai): fireworks ai does not support 'strict' param
* fix: fix ruff check
* fix: handle no strict in function
* fix: revert bedrock change - handle in separate PR
* fix running litellm on windows
* fix importing litellm
* _init_hypercorn_server
* linting fix
* TestProxyInitializationHelpers
* ci/cd run again
* ci/cd run again
* Adding VertexAI Claude 3.7 Sonnet (#8774)
Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
* build(model_prices_and_context_window.json): add anthropic 3-7 models on vertex ai and bedrock
* Support video_url (#8743)
* Support video_url
Support VLMs that work with video.
Example implementation in vLLM: https://github.com/vllm-project/vllm/pull/10020
* llms openai.py: Add ChatCompletionVideoObject
Add data structures to support `video_url` in chat completion
* test test_completion.py: add test for video_url
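A minimal sketch of the new message shape, assuming a vLLM-served video model behind an OpenAI-compatible endpoint (model name, api_base and video URL below are illustrative):

```python
import litellm

# Sketch: pass a video to a VLM served behind an OpenAI-compatible endpoint
# (e.g. vLLM). Model name, api_base and the video URL are illustrative.
response = litellm.completion(
    model="hosted_vllm/llava-hf/LLaVA-NeXT-Video-7B-hf",
    api_base="http://localhost:8000/v1",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what happens in this clip."},
                {"type": "video_url", "video_url": {"url": "https://example.com/clip.mp4"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```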
* Arize Phoenix - ensure correct endpoint/protocol are used; and default to phoenix cloud (#8750)
* minor fixes to default to http and to ensure that the correct endpoint is used
* Update test_arize_phoenix.py
* prioritize http over grpc
---------
Co-authored-by: Emerson Gomes <emerson.gomes@gmail.com>
Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
Co-authored-by: Pang Wu <104795337+pang-wu@users.noreply.github.com>
Co-authored-by: Nate Mar <67926244+nate-mar@users.noreply.github.com>
* fix(team_endpoints.py): cleanup user <-> team association on team delete
Fixes issue where user table still listed team membership post delete
* test(test_team.py): update e2e test - ensure user/team membership is deleted on team delete
* fix(base_invoke_transformation.py): fix deepseek r1 transformation
remove deepseek name from model url
* test(test_completion.py): assert model route not in url
* feat(base_invoke_transformation.py): infer region name from model arn
prevent errors due to a different region name in the env var vs. the model arn; still respect the region if explicitly set in the call
* test: fix test
* test: skip on internal server error
* fix(azure/chat/gpt_transformation.py): add 'prediction' as a supported azure param
Closes https://github.com/BerriAI/litellm/issues/8500
* build(model_prices_and_context_window.json): add new 'gemini-2.0-pro-exp-02-05' model
* style: cleanup invalid json trailing comma
* feat(utils.py): support passing 'tokenizer_config' to register_prompt_template
enables passing complete tokenizer config of model to litellm
Allows calling deepseek on bedrock with the correct prompt template
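A rough sketch of the intended usage, assuming `tokenizer_config` is passed as a dict mirroring the model's Hugging Face tokenizer config (the model name and chat template below are illustrative, not the real DeepSeek template):

```python
import litellm

# Sketch: register a model's tokenizer config (incl. chat template) so the
# correct prompt template is applied, e.g. for a DeepSeek model on Bedrock.
# Model name and chat_template are trimmed illustrations.
litellm.register_prompt_template(
    model="bedrock/deepseek_r1/us.deepseek.r1-v1:0",  # hypothetical model name
    tokenizer_config={
        "bos_token": "<|begin_of_sentence|>",
        "eos_token": "<|end_of_sentence|>",
        "chat_template": "{% for message in messages %}{{ message['content'] }}{% endfor %}",
    },
)
```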
* fix(utils.py): fix register_prompt_template for custom model names
* test(test_prompt_factory.py): fix test
* test(test_completion.py): add e2e test for bedrock invoke deepseek ft model
* feat(base_invoke_transformation.py): support hf_model_name param for bedrock invoke calls
enables proxy admin to set base model for ft bedrock deepseek model
* feat(bedrock/invoke): support deepseek_r1 route for bedrock
makes it easy to apply the right chat template to that call
* feat(constants.py): store deepseek r1 chat template - allow user to get correct response from deepseek r1 without extra work
* test(test_completion.py): add e2e mock test for bedrock deepseek
* docs(bedrock.md): document new deepseek_r1 route for bedrock
allows us to use the right config
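A minimal sketch of the new route, assuming a DeepSeek-R1 model imported into Bedrock (the ARN below is a placeholder):

```python
import litellm

# Sketch: call a DeepSeek-R1 model imported into Bedrock via the new
# deepseek_r1 invoke route so the right chat template is applied.
# The model ARN is a placeholder.
response = litellm.completion(
    model="bedrock/deepseek_r1/arn:aws:bedrock:us-west-2:000000000000:imported-model/placeholder",
    messages=[{"role": "user", "content": "Explain the Pythagorean theorem briefly."}],
)
print(response.choices[0].message.content)
```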
* fix(exception_mapping_utils.py): catch read operation timeout
* fix(router.py): add more deployment timeout debug information for timeout errors
help understand why some calls in high-traffic don't respect their model-specific timeouts
* test(test_convert_dict_to_response.py): unit test ensuring empty str is not converted to None
Addresses https://github.com/BerriAI/litellm/issues/8507
* fix(convert_dict_to_response.py): handle empty message str - don't return back as 'None'
Fixes https://github.com/BerriAI/litellm/issues/8507
* test(test_completion.py): add e2e test
* test(base_llm_unit_tests.py): add test to ensure drop params is respected
* fix(types/prometheus.py): use typing_extensions for python3.8 compatibility
* build: add cherry picked commits
* fix handleSubmit
* update handleAddModelSubmit
* add jest testing for ui
* add step for running ui unit tests
* add validate json step to add model
* ui jest testing fixes
* update package lock
* ci/cd run again
* fix antd import
* run jest tests first
* fix antd install
* fix ui unit tests
* fix unit test ui
* docs(reliability.md): add doc on disabling fallbacks per request
* feat(litellm_pre_call_utils.py): support reading request timeout from request headers - new `x-litellm-timeout` param
Allows setting dynamic model timeouts from Vercel's AI SDK
* test(test_proxy_server.py): add simple unit test for reading request timeout
* test(test_fallbacks.py): add e2e test to confirm timeout passed in request headers is correctly read
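A minimal sketch of the new header, assuming a LiteLLM proxy at http://localhost:4000 (proxy key and model are illustrative):

```python
import requests

# Sketch: set a per-request timeout via the new `x-litellm-timeout` header.
resp = requests.post(
    "http://localhost:4000/v1/chat/completions",
    headers={
        "Authorization": "Bearer sk-1234",   # hypothetical proxy key
        "x-litellm-timeout": "5",            # seconds, applied as the model timeout
    },
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "ping"}],
    },
)
print(resp.json())
```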
* feat(main.py): support passing metadata to openai in preview
Resolves https://github.com/BerriAI/litellm/issues/6022#issuecomment-2616119371
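A rough sketch of passing OpenAI request metadata through litellm (values illustrative; OpenAI associates metadata with stored completions, hence store=True):

```python
import litellm

# Sketch: pass OpenAI's request metadata through litellm (values illustrative).
response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "hello"}],
    store=True,
    metadata={"user_type": "beta-tester", "request_source": "docs-example"},
)
print(response.choices[0].message.content)
```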
* fix(main.py): fix passing openai metadata
* docs(request_headers.md): document new request headers
* build: Merge branch 'main' into litellm_dev_01_27_2025_p3
* test: loosen test
* feat(main.py): use asyncio.sleep for mock_Timeout=true on async request
adds unit testing to ensure proxy does not fail if specific OpenAI requests hang (e.g. recent o1 outage)
* fix(streaming_handler.py): fix deepseek r1 return reasoning content on streaming
Fixes https://github.com/BerriAI/litellm/issues/7942
* Revert "fix(streaming_handler.py): fix deepseek r1 return reasoning content on streaming"
This reverts commit 7a052a64e3.
* fix(deepseek-r-1): return reasoning_content as a top-level param
ensures compatibility with existing tools that use it
* fix: fix linting error
* feat(main.py): add new 'provider_specific_header' param
allows passing extra header for specific provider
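A rough sketch of the new param; the exact shape shown here (provider name plus an extra_headers dict) is an assumption:

```python
import litellm

# Sketch: forward an extra header only when the request is routed to a
# specific provider (the shape of provider_specific_header is an assumption).
response = litellm.completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "hello"}],
    provider_specific_header={
        "custom_llm_provider": "anthropic",
        "extra_headers": {"anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15"},
    },
)
print(response.choices[0].message.content)
```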
* fix(litellm_pre_call_utils.py): add unit test for pre call utils
* test(test_bedrock_completion.py): skip test now that bedrock supports this
* fix(types/utils.py): support returning 'reasoning_content' for deepseek models
Fixes https://github.com/BerriAI/litellm/issues/7877#issuecomment-2603813218
* fix(convert_dict_to_response.py): return deepseek response in provider_specific_field
allows for separating openai vs. non-openai params in model response
* fix(utils.py): support 'provider_specific_field' in delta chunk as well
allows deepseek reasoning content chunk to be returned to user from stream as well
Fixes https://github.com/BerriAI/litellm/issues/7877#issuecomment-2603813218
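A minimal sketch of reading reasoning content from a stream, assuming chunks expose a `reasoning_content` attribute on the delta (guarded below since it may be absent on some chunks):

```python
import litellm

# Sketch: surface DeepSeek reasoning content from streamed chunks.
stream = litellm.completion(
    model="deepseek/deepseek-reasoner",
    messages=[{"role": "user", "content": "How many r's are in 'strawberry'?"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    reasoning = getattr(delta, "reasoning_content", None)  # may be absent
    if reasoning:
        print("[reasoning]", reasoning)
    if delta.content:
        print(delta.content, end="")
```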
* fix(watsonx/chat/handler.py): fix passing space id to watsonx on chat route
* fix(watsonx/): fix watsonx_text/ route with space id
* fix(watsonx/): qa item - also adds better unit testing for watsonx embedding calls
* fix(utils.py): rename to '..fields'
* fix: fix linting errors
* fix(utils.py): fix typing - don't show provider-specific field if none or empty - prevents default response from being non-oai compatible
* fix: cleanup unused imports
* docs(deepseek.md): add docs for deepseek reasoning model
* fix(lm_studio/chat/transformation.py): Fix https://github.com/BerriAI/litellm/issues/7811
* fix(router.py): fix mock timeout check
* fix: drop model name from fallback args since it causes a conflict with the model=model that is provided later on. (#7806)
This error happens if you provide multiple fallback models to the completion function with model name defined in each one.
* fix(router.py): remove mock_timeout before sending to request
prevents reuse in fallbacks
* test: update test
* test: revert test change - wrong pr
---------
Co-authored-by: Dudu Lasry <david1542@users.noreply.github.com>
* test(azure_openai_o1.py): initial commit with testing for azure openai o1 preview model
* fix(base_llm_unit_tests.py): handle azure o1 preview response format tests
skip as o1 on azure doesn't support tool calling yet
* fix: initial commit of azure o1 handler using openai caller
simplifies calling + allows fake streaming logic already implemented for openai to just work
* feat(azure/o1_handler.py): fake o1 streaming for azure o1 models
azure does not currently support streaming for o1
* feat(o1_transformation.py): support overriding 'should_fake_stream' on azure/o1 via 'supports_native_streaming' param on model info
enables user to toggle on when azure allows o1 streaming without needing to bump versions
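A rough sketch of the toggle, assuming it is set via `model_info` on a router deployment (endpoint and key below are placeholders):

```python
from litellm import Router

# Sketch: turn fake streaming off for azure/o1 once Azure supports native
# streaming, via 'supports_native_streaming' in model_info (values illustrative).
router = Router(
    model_list=[
        {
            "model_name": "o1",
            "litellm_params": {
                "model": "azure/o1",
                "api_base": "https://example-resource.openai.azure.com/",
                "api_key": "azure-key-placeholder",
            },
            "model_info": {"supports_native_streaming": True},
        }
    ]
)
```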
* style(router.py): remove 'give feedback/get help' messaging when router is used
Prevents noisy messaging
Closes https://github.com/BerriAI/litellm/issues/5942
* fix(types/utils.py): handle none logprobs
Fixes https://github.com/BerriAI/litellm/issues/328
* fix(exception_mapping_utils.py): fix error str unbound error
* refactor(azure_ai/): move to openai_like chat completion handler
allows for easy swapping of api base urls (e.g. ai.services.com)
Fixes https://github.com/BerriAI/litellm/issues/7275
* refactor(azure_ai/): move to base llm http handler
* fix(azure_ai/): handle differing api endpoints
* fix(azure_ai/): make sure all unit tests are passing
* fix: fix linting errors
* fix: fix linting errors
* fix: fix linting error
* fix: fix linting errors
* fix(azure_ai/transformation.py): handle extra body param
* fix(azure_ai/transformation.py): fix max retries param handling
* fix: fix test
* test(test_azure_o1.py): fix test
* fix(llm_http_handler.py): support handling azure ai unprocessable entity error
* fix(llm_http_handler.py): handle sync invalid param error for azure ai
* fix(azure_ai/): streaming support with base_llm_http_handler
* fix(llm_http_handler.py): working sync stream calls with unprocessable entity handling for azure ai
* fix: fix linting errors
* fix(llm_http_handler.py): fix linting error
* fix(azure_ai/): handle cohere tool call invalid index param error
* test: add new test image embedding to base llm unit tests
Addresses https://github.com/BerriAI/litellm/issues/6515
* fix(bedrock/embed/multimodal-embeddings): strip data prefix from image urls for bedrock multimodal embeddings
Fix https://github.com/BerriAI/litellm/issues/6515
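A minimal sketch of the call this fixes, assuming a Titan multimodal embedding model and a local image file (both illustrative):

```python
import base64
import litellm

# Sketch: pass an image as a data URI to a Bedrock multimodal embedding model;
# litellm strips the "data:image/...;base64," prefix before calling Bedrock.
with open("cat.png", "rb") as f:
    b64_image = base64.b64encode(f.read()).decode("utf-8")

response = litellm.embedding(
    model="bedrock/amazon.titan-embed-image-v1",
    input=[f"data:image/png;base64,{b64_image}"],
)
print(len(response.data[0]["embedding"]))
```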
* feat: initial commit for fireworks ai audio transcription support
Relevant issue: https://github.com/BerriAI/litellm/issues/7134
* test: initial fireworks ai test
* feat(fireworks_ai/): implemented fireworks ai audio transcription config
* fix(utils.py): register fireworks ai audio transcription config, in config manager
* fix(utils.py): add fireworks ai param translation to 'get_optional_params_transcription'
* refactor(fireworks_ai/): define text completion route with model name handling
moves model name handling to specific fireworks routes, as required by their api
* refactor(fireworks_ai/chat): define transform_request - allows fixing the model name if the accounts/ prefix is missing
* fix: fix linting errors
* fix: fix linting errors
* fix: fix linting errors
* fix: fix linting errors
* fix(handler.py): fix linting errors
* fix(main.py): fix tgai text completion route
* refactor(together_ai/completion): refactors together ai text completion route to just use provider transform request
* refactor: move test_fine_tuning_api out of local_testing
reduces local testing ci/cd time
* fix(utils.py): fix openai-like api response format parsing
Fixes issue passing structured output to litellm_proxy/ route
* fix(cost_calculator.py): fix whisper transcription cost calc to use file duration, not response time
* test: skip test if credentials not found
* fix(litellm_logging.py): pass user metadata to langsmith on sdk calls
* fix(litellm_logging.py): pass nested user metadata to logging integration - e.g. langsmith
* fix(exception_mapping_utils.py): catch and clarify watsonx `/text/chat` endpoint not supported error message.
Closes https://github.com/BerriAI/litellm/issues/7213
* fix(watsonx/common_utils.py): accept new 'WATSONX_IAM_URL' env var
allows user to use local watsonx
Fixes https://github.com/BerriAI/litellm/issues/4991
* fix(litellm_logging.py): cleanup unused function
* test: skip bad ibm test
* fix(azure/): support passing headers to azure openai endpoints
Fixes https://github.com/BerriAI/litellm/issues/6217
* fix(utils.py): move default tokenizer to just openai
hf tokenizer makes network calls when trying to get the tokenizer - this slows down execution
* fix(router.py): fix pattern matching router - add generic "*" to it as well
Fixes issue where generic "*" model access group wouldn't show up
* fix(pattern_match_deployments.py): match to more specific pattern
match to more specific pattern
allows setting generic wildcard model access group and excluding specific models more easily
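A rough sketch of the pattern-matching setup this affects: a generic wildcard deployment alongside a more specific one (keys below are placeholders):

```python
from litellm import Router

# Sketch: wildcard deployment next to a specific one; the pattern router
# now prefers the more specific match (api keys are placeholders).
router = Router(
    model_list=[
        {
            "model_name": "openai/*",
            "litellm_params": {"model": "openai/*", "api_key": "sk-placeholder"},
        },
        {
            "model_name": "openai/gpt-4o",
            "litellm_params": {"model": "openai/gpt-4o", "api_key": "sk-placeholder"},
        },
    ]
)
print([m["model_name"] for m in router.model_list])
```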
* fix(proxy_server.py): fix _delete_deployment to handle base case where db_model list is empty
don't delete all router models b/c of empty list
Fixes https://github.com/BerriAI/litellm/issues/7196
* fix(anthropic/): fix handling response_format for anthropic messages with anthropic api
* fix(fireworks_ai/): support passing response_format + tool call in same message
Addresses https://github.com/BerriAI/litellm/issues/7135
* Revert "fix(fireworks_ai/): support passing response_format + tool call in same message"
This reverts commit 6a30dc6929.
* test: fix test
* fix(replicate/): fix replicate default retry/polling logic
* test: add unit testing for router pattern matching
* test: update test to use default oai tokenizer
* test: mark flaky test
* test: skip flaky test
* fix use new format for Cohere config
* fix base llm http handler
* Litellm code qa common config (#7116)
* feat(base_llm): initial commit for common base config class
Addresses code qa critique https://github.com/andrewyng/aisuite/issues/113#issuecomment-2512369132
* feat(base_llm/): add transform request/response abstract methods to base config class
---------
Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
* use base transform helpers
* use base_llm_http_handler for cohere
* working cohere using base llm handler
* add async cohere chat completion support on base handler
* fix completion code
* working sync cohere stream
* add async support cohere_chat
* fix types get_model_response_iterator
* async / sync tests cohere
* feat cohere using base llm class
* fix linting errors
* fix _abc error
* add cohere params to transformation
* remove old cohere file
* fix type error
* fix merge conflicts
* fix cohere merge conflicts
* fix linting error
* fix litellm.llms.custom_httpx.http_handler.HTTPHandler.post
* fix passing cohere specific params
---------
Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
* fix(key_management_endpoints.py): override metadata field value on update
allow user to override tags
* feat(__init__.py): expose new disable_end_user_cost_tracking_prometheus_only metric
allow disabling end user cost tracking on prometheus - fixes cardinality issue
* fix(litellm_pre_call_utils.py): add key/team level enforced params
Fixes https://github.com/BerriAI/litellm/issues/6652
* fix(key_management_endpoints.py): allow user to pass in `enforced_params` as a top level param on /key/generate and /key/update
* docs(enterprise.md): add docs on enforcing required params for llm requests
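A rough sketch of the new key-level param, assuming a proxy at http://localhost:4000 and a master key of sk-1234 (both illustrative):

```python
import requests

# Sketch: require specific params on every request made with the generated key.
resp = requests.post(
    "http://localhost:4000/key/generate",
    headers={"Authorization": "Bearer sk-1234"},  # hypothetical master key
    json={
        "enforced_params": ["user", "metadata.generation_name"],
    },
)
print(resp.json())
```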
* Add support of Galadriel API (#7005)
* fix(router.py): robust retry after handling
set retry-after time to 0 if there are >0 healthy deployments; handle base case of 1 deployment
* test(test_router.py): fix test
* feat(bedrock/): add support for 'nova' models
also adds explicit 'converse/' route for simpler routing
* fix: fix 'supports_pdf_input'
return if model supports pdf input on get_model_info
* feat(converse_transformation.py): support bedrock pdf input
* docs(document_understanding.md): add document understanding to docs
* fix(litellm_pre_call_utils.py): fix linting error
* fix(init.py): fix passing of bedrock converse models
* feat(bedrock/converse): support 'response_format={"type": "json_object"}'
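A minimal sketch of JSON mode on the new explicit converse/ route (model name illustrative):

```python
import litellm

# Sketch: request JSON-mode output from a Bedrock Converse model.
response = litellm.completion(
    model="bedrock/converse/anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": "Return a JSON object with keys 'city' and 'country' for Paris."}],
    response_format={"type": "json_object"},
)
print(response.choices[0].message.content)
```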
* fix(converse_handler.py): fix linting error
* fix(base_llm_unit_tests.py): fix test
* fix: fix test
* test: fix test
* test: fix test
* test: remove duplicate test
---------
Co-authored-by: h4n0 <4738254+h4n0@users.noreply.github.com>
* fix(anthropic/chat/transformation.py): add json schema as values: json_schema
fixes passing pydantic obj to anthropic
Fixes https://github.com/BerriAI/litellm/issues/6766
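A minimal sketch of what this fixes: passing a Pydantic model as response_format to an Anthropic model (model name illustrative):

```python
from typing import List

import litellm
from pydantic import BaseModel

# Sketch: litellm converts the Pydantic model to a JSON schema under the hood.
class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: List[str]

response = litellm.completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Alice and Bob meet for standup on Friday."}],
    response_format=CalendarEvent,
)
print(response.choices[0].message.content)
```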
* (feat): Add timestamp_granularities parameter to transcription API (#6457)
* Add timestamp_granularities parameter to transcription API
* add param to the local test
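A minimal sketch of the new parameter, assuming a local audio file (path illustrative; verbose_json is needed to get timestamps back):

```python
import litellm

# Sketch: request word-level timestamps from the transcription API.
with open("speech.mp3", "rb") as audio_file:
    transcript = litellm.transcription(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["word"],
    )
print(transcript)
```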
* fix(databricks/chat.py): handle max_retries optional param handling for openai-like calls
Fixes issue with calling finetuned vertex ai models via databricks route
* build(ui/): add team admins via proxy ui
* fix: fix linting error
* test: fix test
* docs(vertex.md): refactor docs
* test: handle overloaded anthropic model error
* test: remove duplicate test
* test: fix test
* test: update test to handle model overloaded error
---------
Co-authored-by: Show <35062952+BrunooShow@users.noreply.github.com>