* feat(llm_passthrough_endpoints.py): support mistral passthrough
Closes https://github.com/BerriAI/litellm/issues/9051
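For context, a minimal sketch of what this enables: calling the new passthrough route on a running proxy (base url, key, and model name below are placeholders, not taken from the diff):

```python
import requests

# hedged sketch: the proxy forwards /mistral/* requests to Mistral's API
resp = requests.post(
    "http://localhost:4000/mistral/v1/chat/completions",  # assumed route shape
    headers={"Authorization": "Bearer sk-1234"},  # your litellm proxy key
    json={
        "model": "mistral-small-latest",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
print(resp.json())
```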
* feat(llm_passthrough_endpoints.py): initial commit for adding vllm passthrough route
* feat(vllm/common_utils.py): add new vllm model info route
make it possible to use vllm passthrough route via factory function
* fix(llm_passthrough_endpoints.py): add all methods to vllm passthrough route
* fix: fix linting error
* fix: fix linting error
* fix: fix ruff check
* fix(proxy/_types.py): add new passthrough routes
* docs(config_settings.md): add mistral env vars to docs
* fix(litellm_proxy/chat/transformation.py): support 'thinking' param
Fixes https://github.com/BerriAI/litellm/issues/9380
* feat(azure/gpt_transformation.py): add azure audio model support
Closes https://github.com/BerriAI/litellm/issues/6305
* fix(utils.py): use provider_config in common functions
* fix(utils.py): add missing provider configs to get_chat_provider_config
* test: fix test
* fix: fix path
* feat(utils.py): make bedrock invoke nova config baseconfig compatible
* fix: fix linting errors
* fix(azure_ai/transformation.py): remove buggy optional param filtering for azure ai
Removes incorrect check for tool choice support when calling azure ai - it prevented calling models with response_format unless they were on the litellm model cost map
* fix(amazon_cohere_transformation.py): fix bedrock invoke cohere transformation to inherit from CohereChatConfig
* test: fix azure ai tool choice mapping
* fix: fix model cost map to add 'supports_tool_choice' to cohere models
* fix(get_supported_openai_params.py): check if custom llm provider in llm providers
* fix(get_supported_openai_params.py): fix llm provider in list check
* fix: fix ruff check errors
* fix: support `$defs` in json schema when calling bedrock nova
* fix(factory.py): fix test
* Add date picker to usage tab + Add reasoning_content token tracking across all providers on streaming (#9722)
* feat(new_usage.tsx): add date picker for new usage tab
allow user to look back on their usage data
* feat(anthropic/chat/transformation.py): report reasoning tokens in completion token details
allows tracking how many reasoning tokens are actually being used
* feat(streaming_chunk_builder.py): return reasoning_tokens in anthropic/openai streaming response
allows tracking reasoning_token usage across providers
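A hedged sketch of reading the new token detail after a call (the attribute path follows the openai-style usage shape; model and thinking values are illustrative):

```python
import litellm

resp = litellm.completion(
    model="anthropic/claude-3-7-sonnet-20250219",
    messages=[{"role": "user", "content": "Think step by step: 17 * 23?"}],
    thinking={"type": "enabled", "budget_tokens": 1024},
)
# reasoning tokens are now reported alongside regular completion tokens
print(resp.usage.completion_tokens_details.reasoning_tokens)
```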
* Fix update team metadata + fix bulk adding models on Ui (#9721)
* fix(handle_add_model_submit.tsx): fix bulk adding models
* fix(team_info.tsx): fix team metadata update
Fixes https://github.com/BerriAI/litellm/issues/9689
* (v0) Unified file id - allow calling multiple providers with same file id (#9718)
* feat(files_endpoints.py): initial commit adding 'target_model_names' support
allow developers to specify all the models they want to call with the file
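A hedged sketch of the intended usage, assuming 'target_model_names' rides along via extra_body on the openai client (endpoint, key, model names, and the comma-separated format are assumptions):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000/v1", api_key="sk-1234")
file_obj = client.files.create(
    file=open("batch_input.jsonl", "rb"),
    purpose="batch",
    extra_body={"target_model_names": "gpt-4o, azure-gpt-4o"},  # assumed format
)
print(file_obj.id)  # one unified file id, usable across the listed models
```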
* feat(files_endpoints.py): return unified files endpoint
* test(test_files_endpoints.py): add validation test - if invalid purpose submitted
* feat: more updates
* feat: initial working commit of unified file id translation
* fix: additional fixes
* fix(router.py): remove model replace logic in jsonl on acreate_file
enables file upload to work for chat completion requests as well
* fix(files_endpoints.py): remove whitespace around model name
* fix(azure/handler.py): return acreate_file with correct response type
* fix: fix linting errors
* test: fix mock test to run on github actions
* fix: fix ruff errors
* fix: fix file too large error
* fix(utils.py): remove redundant var
* test: modify test to work on github actions
* test: update tests
* test: more debug logs to understand ci/cd issue
* test: fix test for respx
* test: skip mock respx test
fails on ci/cd - not clear why
* fix: fix ruff check
* fix: fix test
* fix(model_connection_test.tsx): fix linting error
* test: update unit tests
* build(pyproject.toml): add new dev dependencies - for type checking
* build: reformat files to fit black
* ci: reformat to fit black
* ci(test-litellm.yml): make tests run clear
* build(pyproject.toml): add ruff
* fix: fix ruff checks
* build(mypy/): fix mypy linting errors
* fix(hashicorp_secret_manager.py): fix passing cert for tls auth
* build(mypy/): resolve all mypy errors
* test: update test
* fix: fix black formatting
* build(pre-commit-config.yaml): use poetry run black
* fix(proxy_server.py): fix linting error
* fix: fix ruff safe representation error
* fix(anthropic/chat/transformation.py): Don't set tool choice on response_format conversion when thinking is enabled
Not allowed by Anthropic
Fixes https://github.com/BerriAI/litellm/issues/8901
* refactor: move test to base anthropic chat tests
ensures consistent behaviour across vertex/anthropic/bedrock
* fix(anthropic/chat/transformation.py): if a thinking budget is specified and max tokens is not - ensure the max_tokens sent to anthropic is higher than the thinking budget
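A simplified sketch of the guard described above (not the exact implementation; the default value is an assumption):

```python
# anthropic requires max_tokens > thinking budget_tokens, so if the caller
# only sets a thinking budget, pick a max_tokens above it
DEFAULT_MAX_TOKENS = 4096

def ensure_max_tokens(thinking: dict, max_tokens: int | None) -> int:
    budget = thinking.get("budget_tokens", 0)
    if max_tokens is None:
        return max(DEFAULT_MAX_TOKENS, budget + 1)
    return max_tokens
```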
* feat(converse_transformation.py): correctly handle thinking + response format on Bedrock Converse
Fixes https://github.com/BerriAI/litellm/issues/8901
* fix(converse_transformation.py): correctly handle adding max tokens
* test: handle service unavailable error
* fix(invoke_handler.py): fix converse streaming - return signature + ensure consistency with anthropic api response
* build(model_prices_and_context_window.json): fix anthropic api claude-3-7 max output tokens
with beta header this is 128k
Resolves https://github.com/BerriAI/litellm/issues/8964
* feat(handler.py): handle new anthropic 'thinking_delta' block on streaming
Fixes https://github.com/BerriAI/litellm/issues/8825
* Fix missing signature_delta in thinking blocks when streaming from Claude 3.7 (#8797)
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
* test: update test to enforce signature found
* feat(refactor-signature-param-to-be-'signature'-instead-of-'signature_delta'): keeps it in sync with anthropic
* fix: fix linting error
---------
Co-authored-by: Martin Krasser <krasserm@googlemail.com>
* fix(anthropic_claude3_transformation.py): fix amazon anthropic claude 3 tool calling transformation on invoke route
move to using anthropic config as base
* fix(utils.py): expose anthropic config via providerconfigmanager
* fix(llm_http_handler.py): support json mode on async completion calls
* fix(invoke_handler/make_call): support json mode for anthropic called via bedrock invoke
* fix(anthropic/): handle `response_format: {"type": "text"}` + migrate amazon claude 3 invoke config to inherit from anthropic config
Prevents error when passing in `response_format: {"type": "text"}`
* test: fix test
* fix(utils.py): fix base invoke provider check
* fix(anthropic_claude3_transformation.py): don't pass 'stream' param
* fix: fix linting errors
* fix(converse_transformation.py): handle response_format type=text for converse
* feat(bedrock/converse/transformation.py): support claude-3-7-sonnet reasoning_content transformation
Closes https://github.com/BerriAI/litellm/issues/8777
* fix(bedrock/): support returning `reasoning_content` on streaming for claude-3-7
Resolves https://github.com/BerriAI/litellm/issues/8777
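A hedged sketch of consuming the streamed reasoning content (model id and field access follow the commit messages above, not verified against the final diff):

```python
import litellm

stream = litellm.completion(
    model="bedrock/us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    thinking={"type": "enabled", "budget_tokens": 1024},
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if getattr(delta, "reasoning_content", None):
        print("reasoning:", delta.reasoning_content)
    elif delta.content:
        print("answer:", delta.content)
```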
* feat(bedrock/): unify converse reasoning content blocks for consistency across anthropic and bedrock
* fix(anthropic/chat/transformation.py): handle deepseek-style 'reasoning_content' extraction within transformation.py
simpler logic
* feat(bedrock/): fix streaming to return blocks in consistent format
* fix: fix linting error
* test: fix test
* feat(factory.py): fix bedrock thinking block translation on tool calling
allows passing the thinking blocks back to bedrock for tool calling
* fix(types/utils.py): don't exclude provider_specific_fields on model dump
ensures consistent responses
* fix: fix linting errors
* fix(convert_dict_to_response.py): pass reasoning_content on root
* fix: test
* fix(streaming_handler.py): add helper util for setting model id
* fix(streaming_handler.py): fix setting model id on model response stream chunk
* fix(streaming_handler.py): fix linting error
* fix(streaming_handler.py): fix linting error
* fix(types/utils.py): add provider_specific_fields to model stream response
* fix(streaming_handler.py): copy provider specific fields and add them to the root of the streaming response
* fix(streaming_handler.py): fix check
* fix: fix test
* fix(types/utils.py): ensure messages content is always openai compatible
* fix(types/utils.py): fix delta object to always be openai compatible
only introduce new params if variable exists
* test: fix bedrock nova tests
* test: skip flaky test
* test: skip flaky test in ci/cd
* fix(o_series_transformation.py): fix optional param check for o-series models
o3-mini and o1 do not support parallel tool calling
* fix(utils.py): support 'drop_params' for 'thinking' param across models
allows switching to older claude versions (or non-anthropic models) with the param safely dropped
* fix: fix passing thinking param in optional params
allows dropping thinking_param where not applicable
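A sketch of the behaviour this enables: the same call works against a model without thinking support because the param is dropped instead of raising (model name is illustrative):

```python
import litellm

resp = litellm.completion(
    model="claude-3-5-sonnet-20240620",  # older model, no 'thinking' support
    messages=[{"role": "user", "content": "hi"}],
    thinking={"type": "enabled", "budget_tokens": 1024},
    drop_params=True,  # unsupported params are dropped rather than erroring
)
```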
* test: update old model
* fix(utils.py): fix linting errors
* fix(main.py): add param to acompletion
* fix(base_utils.py): support nested json schema passed in for anthropic calls
* refactor(base_utils.py): refactor ref parsing to prevent infinite loop
* test(test_openai_endpoints.py): refactor anthropic test to use bedrock
* fix(langfuse_prompt_management.py): add unit test for sync langfuse calls
Resolves https://github.com/BerriAI/litellm/issues/7938#issuecomment-2613293757
* build: ensure all regional bedrock models have same supported values as base bedrock model
prevents drift
* test(base_llm_unit_tests.py): add testing for nested pydantic objects
* fix(test_utils.py): add test_get_potential_model_names
* fix(anthropic/chat/transformation.py): support nested pydantic objects
Fixes https://github.com/BerriAI/litellm/issues/7755
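A sketch of the nested-pydantic case from the linked issue (field names are illustrative):

```python
from pydantic import BaseModel
import litellm

class Address(BaseModel):
    city: str
    country: str

class Person(BaseModel):
    name: str
    address: Address  # nested object -> nested json schema with refs

resp = litellm.completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Give me a fictional person."}],
    response_format=Person,
)
```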
* feat(pass_through_endpoints.py): fix anthropic end user cost tracking
* fix(anthropic/chat/transformation.py): use returned provider model for anthropic
handles the anthropic `-latest` tag in the request body, which threw cost calculation errors
ensures our model cost tracking stays accurate
* feat(model_prices_and_context_window.json): add gemini-2.0-flash-thinking-exp pricing
* test: update test to use assumption that user_api_key_dict can get anthropic user id
* test: fix test
* fix: fix test
* fix(anthropic_pass_through.py): uncomment previous anthropic end-user cost tracking code block
can't guarantee user api key dict always has end user id - too many code paths
* fix(user_api_key_auth.py): this allows end user id from request body to always be read and set in auth object
* fix(auth_check.py): fix linting error
* test: fix auth check
* fix(auth_utils.py): fix get end user id to handle metadata = None
* build(model_prices_and_context_window.json): add azure o1 pricing
Closes https://github.com/BerriAI/litellm/issues/7712
* refactor: replace regex with string method for whitespace check in stop-sequences handling (#7713)
* Allows overriding keep_alive time in ollama (#7079)
* Allows overriding keep_alive time in ollama
* Also adds to ollama_chat
* Adds some info on the docs about this parameter
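A hedged sketch of the new parameter in use (the value format follows ollama's own keep_alive convention; model name is a placeholder):

```python
import litellm

resp = litellm.completion(
    model="ollama_chat/llama3",
    messages=[{"role": "user", "content": "hi"}],
    keep_alive="10m",  # keep the model loaded in memory for 10 minutes
)
```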
* fix: together ai warning (#7688)
Co-authored-by: Carl Senze <carl.senze@aleph-alpha.com>
* fix(proxy_server.py): handle config containing thread locked objects when using get_config_state
* fix(proxy_server.py): add exception to debug
* build(model_prices_and_context_window.json): update 'supports_vision' for azure o1
---------
Co-authored-by: Wolfram Ravenwolf <52386626+WolframRavenwolf@users.noreply.github.com>
Co-authored-by: Regis David Souza Mesquita <github@rdsm.dev>
Co-authored-by: Carl <45709281+capsenz@users.noreply.github.com>
Co-authored-by: Carl Senze <carl.senze@aleph-alpha.com>
* build(ui/): update ui
* fix: drop unsupported non-whitespace characters for real when calling… (#7484)
* fix: drop unsupported non-whitespace characters for real when calling anthropic with stop sequences
* test: add parameterized test for _map_stop_sequences method in AnthropicConfig
---------
Co-authored-by: Wolfram Ravenwolf <52386626+WolframRavenwolf@users.noreply.github.com>
* test(azure_openai_o1.py): initial commit with testing for azure openai o1 preview model
* fix(base_llm_unit_tests.py): handle azure o1 preview response format tests
skip as o1 on azure doesn't support tool calling yet
* fix: initial commit of azure o1 handler using openai caller
simplifies calling + allows fake streaming logic already implemented for openai to just work
* feat(azure/o1_handler.py): fake o1 streaming for azure o1 models
azure does not currently support streaming for o1
* feat(o1_transformation.py): support overriding 'should_fake_stream' on azure/o1 via 'supports_native_streaming' param on model info
enables users to toggle this on once azure allows o1 streaming, without needing to bump versions
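A hedged sketch of toggling the override on a router deployment (the model_info key comes from the commit above; endpoint values are placeholders):

```python
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "azure-o1",
            "litellm_params": {
                "model": "azure/o1-preview",
                "api_base": "https://my-endpoint.openai.azure.com",
                "api_key": "os.environ/AZURE_API_KEY",
            },
            # flip on once azure supports o1 streaming, to skip fake streaming
            "model_info": {"supports_native_streaming": True},
        }
    ]
)
```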
* style(router.py): remove 'give feedback/get help' messaging when router is used
Prevents noisy messaging
Closes https://github.com/BerriAI/litellm/issues/5942
* fix(types/utils.py): handle none logprobs
Fixes https://github.com/BerriAI/litellm/issues/328
* fix(exception_mapping_utils.py): fix error str unbound error
* refactor(azure_ai/): move to openai_like chat completion handler
allows for easy swapping of api base urls (e.g. ai.services.com)
Fixes https://github.com/BerriAI/litellm/issues/7275
* refactor(azure_ai/): move to base llm http handler
* fix(azure_ai/): handle differing api endpoints
* fix(azure_ai/): make sure all unit tests are passing
* fix: fix linting errors
* fix: fix linting errors
* fix: fix linting error
* fix: fix linting errors
* fix(azure_ai/transformation.py): handle extra body param
* fix(azure_ai/transformation.py): fix max retries param handling
* fix: fix test
* test(test_azure_o1.py): fix test
* fix(llm_http_handler.py): support handling azure ai unprocessable entity error
* fix(llm_http_handler.py): handle sync invalid param error for azure ai
* fix(azure_ai/): streaming support with base_llm_http_handler
* fix(llm_http_handler.py): working sync stream calls with unprocessable entity handling for azure ai
* fix: fix linting errors
* fix(llm_http_handler.py): fix linting error
* fix(azure_ai/): handle cohere tool call invalid index param error
* fix(azure/): support passing headers to azure openai endpoints
Fixes https://github.com/BerriAI/litellm/issues/6217
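A sketch of the fix in use; the header name and value are illustrative only:

```python
import litellm

resp = litellm.completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "hi"}],
    headers={"x-ms-custom-header": "my-value"},  # now forwarded to azure
)
```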
* fix(utils.py): move default tokenizer to just openai
hf tokenizer makes network calls when trying to get the tokenizer - this slows down execution
* fix(router.py): fix pattern matching router - add generic "*" to it as well
Fixes issue where generic "*" model access group wouldn't show up
* fix(pattern_match_deployments.py): match to more specific pattern
allows setting generic wildcard model access group and excluding specific models more easily
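A hedged sketch of the matching behaviour: with both a generic "*" deployment and a more specific wildcard registered, a request for "openai/gpt-4o" should resolve to the "openai/*" pattern first:

```python
from litellm import Router

router = Router(
    model_list=[
        {"model_name": "*", "litellm_params": {"model": "*"}},
        {"model_name": "openai/*", "litellm_params": {"model": "openai/*"}},
    ]
)
```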
* fix(proxy_server.py): fix _delete_deployment to handle base case where db_model list is empty
don't delete all router models b/c of empty list
Fixes https://github.com/BerriAI/litellm/issues/7196
* fix(anthropic/): fix handling response_format for anthropic messages with anthropic api
* fix(fireworks_ai/): support passing response_format + tool call in same message
Addresses https://github.com/BerriAI/litellm/issues/7135
* Revert "fix(fireworks_ai/): support passing response_format + tool call in same message"
This reverts commit 6a30dc6929.
* test: fix test
* fix(replicate/): fix replicate default retry/polling logic
* test: add unit testing for router pattern matching
* test: update test to use default oai tokenizer
* test: mark flaky test
* test: skip flaky test
* fix use new format for Cohere config
* fix base llm http handler
* Litellm code qa common config (#7116)
* feat(base_llm): initial commit for common base config class
Addresses code qa critique https://github.com/andrewyng/aisuite/issues/113#issuecomment-2512369132
* feat(base_llm/): add transform request/response abstract methods to base config class
---------
Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
* use base transform helpers
* use base_llm_http_handler for cohere
* working cohere using base llm handler
* add async cohere chat completion support on base handler
* fix completion code
* working sync cohere stream
* add async support cohere_chat
* fix types get_model_response_iterator
* async / sync tests cohere
* feat cohere using base llm class
* fix linting errors
* fix _abc error
* add cohere params to transformation
* remove old cohere file
* fix type error
* fix merge conflicts
* fix cohere merge conflicts
* fix linting error
* fix litellm.llms.custom_httpx.http_handler.HTTPHandler.post
* fix passing cohere specific params
---------
Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
* feat(base_llm): initial commit for common base config class
Addresses code qa critique https://github.com/andrewyng/aisuite/issues/113#issuecomment-2512369132
* feat(base_llm/): add transform request/response abstract methods to base config class
* feat(cohere-+-clarifai): refactor integrations to use common base config class
* fix: fix linting errors
* refactor(anthropic/): move anthropic + vertex anthropic to use base config
* test: fix xai test
* test: fix tests
* fix: fix linting errors
* test: comment out WIP test
* fix(transformation.py): fix is pdf used check
* fix: fix linting error
* use 1 file for AnthropicPassthroughLoggingHandler
* add support for anthropic streaming usage tracking
* ci/cd run again
* fix - add real streaming for anthropic pass through
* remove unused function stream_response
* working anthropic streaming logging
* fix code quality
* fix use 1 file for vertex success handler
* use helper for _handle_logging_vertex_collected_chunks
* enforce vertex streaming to use sse
* test test_basic_vertex_ai_pass_through_streaming_with_spendlog
* fix type hints
* add comment
* fix linting
* add pass through logging unit testing
* fix(anthropic/chat/transformation.py): add json schema as values: json_schema
fixes passing pydantic obj to anthropic
Fixes https://github.com/BerriAI/litellm/issues/6766
* (feat): Add timestamp_granularities parameter to transcription API (#6457)
* Add timestamp_granularities parameter to transcription API
* add param to the local test
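A sketch of the new parameter (it mirrors openai's api, where word-level timestamps also require response_format="verbose_json"):

```python
import litellm

with open("speech.mp3", "rb") as audio:
    resp = litellm.transcription(
        model="whisper-1",
        file=audio,
        response_format="verbose_json",
        timestamp_granularities=["word"],  # per-word timestamps
    )
```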
* fix(databricks/chat.py): handle max_retries optional param handling for openai-like calls
Fixes issue with calling finetuned vertex ai models via databricks route
* build(ui/): add team admins via proxy ui
* fix: fix linting error
* test: fix test
* docs(vertex.md): refactor docs
* test: handle overloaded anthropic model error
* test: remove duplicate test
* test: fix test
* test: update test to handle model overloaded error
---------
Co-authored-by: Show <35062952+BrunooShow@users.noreply.github.com>
* fix(ollama.py): fix get model info request
Fixes https://github.com/BerriAI/litellm/issues/6703
* feat(anthropic/chat/transformation.py): support passing user id to anthropic via openai 'user' param
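A sketch of the mapping: the openai-style 'user' param now becomes anthropic's end-user identifier (model name is illustrative):

```python
import litellm

resp = litellm.completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "hi"}],
    user="customer-1234",  # forwarded to anthropic's user metadata
)
```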
* docs(anthropic.md): document all supported openai params for anthropic
* test: fix tests
* fix: fix tests
* feat(jina_ai/): add rerank support
Closes https://github.com/BerriAI/litellm/issues/6691
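A sketch of the new rerank support (model id and inputs are illustrative):

```python
import litellm

resp = litellm.rerank(
    model="jina_ai/jina-reranker-v2-base-multilingual",
    query="What is the capital of France?",
    documents=["Paris is the capital of France.", "Berlin is in Germany."],
    top_n=1,
)
```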
* test: handle service unavailable error
* fix(handler.py): refactor together ai rerank call
* test: update test to handle overloaded error
* test: fix test
* Litellm router trace (#6742)
* feat(router.py): add trace_id to parent functions - allows tracking retry/fallbacks
* feat(router.py): log trace id across retry/fallback logic
allows grouping llm logs for the same request
* test: fix tests
* fix: fix test
* fix(transformation.py): only set non-none stop_sequences
* Litellm router disable fallbacks (#6743)
* bump: version 1.52.6 → 1.52.7
* feat(router.py): enable dynamically disabling fallbacks
Allows for enabling/disabling fallbacks per key
* feat(litellm_pre_call_utils.py): support setting 'disable_fallbacks' on litellm key
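A hedged sketch, assuming 'disable_fallbacks' is read from key metadata as the commit above suggests (url and master key are placeholders):

```python
import requests

resp = requests.post(
    "http://localhost:4000/key/generate",
    headers={"Authorization": "Bearer sk-master-key"},
    json={"metadata": {"disable_fallbacks": True}},  # assumed field location
)
print(resp.json()["key"])  # calls with this key skip router fallbacks
```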
* test: fix test
* fix(exception_mapping_utils.py): map 'model is overloaded' to internal server error
* test: handle gemini error
* test: fix test
* fix: new run
* fix(__init__.py): add 'watsonx_text' as mapped llm api route
Fixes https://github.com/BerriAI/litellm/issues/6663
* fix(opentelemetry.py): fix passing parallel tool calls to otel
Fixes https://github.com/BerriAI/litellm/issues/6677
* refactor(test_opentelemetry_unit_tests.py): create a base set of unit tests for all logging integrations - test for parallel tool call handling
reduces bugs in repo
* fix(__init__.py): update provider-model mapping to include all known provider-model mappings
Fixes https://github.com/BerriAI/litellm/issues/6669
* feat(anthropic): support passing document in llm api call
* docs(anthropic.md): add pdf anthropic call to docs + expose new 'supports_pdf_input' function
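A hedged sketch of passing a pdf; the exact content-block shape here (base64 data url under image_url) is an assumption about this version's api, so check anthropic.md for the final form:

```python
import base64
import litellm

with open("report.pdf", "rb") as f:
    encoded = base64.b64encode(f.read()).decode()

resp = litellm.completion(
    model="anthropic/claude-3-5-sonnet-20241022",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Summarize this document."},
            {"type": "image_url", "image_url": f"data:application/pdf;base64,{encoded}"},
        ],
    }],
)
```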
* fix(factory.py): fix linting error