* feat(proxy/_types.py): add new jwt field params
allows users + services to auth into proxy
* feat(handle_jwt.py): allow team role proxy access
allows proxy admin to set allowed team roles
* fix(proxy/_types.py): add 'routes' to role based permissions
allow proxy admin to restrict what routes a team can access easily
* feat(handle_jwt.py): support more flexible role based route access
v2 on role based 'allowed_routes'
* test(test_jwt.py): add unit test for rbac for proxy routes
* feat(handle_jwt.py): ensure cost tracking always works for any jwt request with `enforce_rbac=True`
* docs(token_auth.md): add documentation on controlling model access via OIDC Roles
* test: increase time delay before retrying
* test: handle model overloaded for test
* test(base_llm_unit_tests.py): add test to ensure drop params is respected
* fix(types/prometheus.py): use typing_extensions for python3.8 compatibility
* build: add cherry picked commits
* fix(vertex_ai/gemini/transformation.py): handle 'http://' image urls
* test: add base test for `http:` url's
* fix(factory.py/get_image_details): follow redirects
allows http calls to work
* fix(codestral/): fix stream chunk parsing on last chunk of stream
* Azure ad token provider (#6917)
* Update azure.py
Added optional parameter azure ad token provider
* Added parameter to main.py
* Found token provider arg location
* Fixed embeddings
* Fixed ad token provider
---------
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
* fix: fix linting errors
* fix(main.py): leave out o1 route for azure ad token provider, for now
get v0 out for sync azure gpt route to begin with
* test: skip http:// test for fireworks ai
model does not support it
* refactor: cleanup dead code
* fix: revert http:// url passthrough for gemini
google ai studio raises errors
* test: fix test
---------
Co-authored-by: bahtman <anton@baht.dk>
* fix(o_series_transformation.py): add 'reasoning_effort' as o series model param
Closes https://github.com/BerriAI/litellm/issues/8182
* fix(main.py): ensure `reasoning_effort` is a mapped openai param
* refactor(azure/): rename o1_[x] files to o_series_[x]
* refactor(base_llm_unit_tests.py): refactor testing for o series reasoning effort
* test(test_azure_o_series.py): have azure o series tests correctly inherit from base o series model tests
* feat(base_utils.py): support translating 'developer' role to 'system' role for non-openai providers
Makes it easy to switch from openai to anthropic
* fix: fix linting errors
* fix(base_llm_unit_tests.py): fix test
* fix(main.py): add missing param
* fix(base_utils.py): supported nested json schema passed in for anthropic calls
* refactor(base_utils.py): refactor ref parsing to prevent infinite loop
* test(test_openai_endpoints.py): refactor anthropic test to use bedrock
* fix(langfuse_prompt_management.py): add unit test for sync langfuse calls
Resolves https://github.com/BerriAI/litellm/issues/7938#issuecomment-2613293757
* fix(bedrock/converse_handler.py): fix bedrock region name on async calls
* fix(utils.py): fix split model handling
Fixes bedrock cost calculation when region name is given
* feat(_health_endpoints.py): support health checking datadog integration
Closes https://github.com/BerriAI/litellm/issues/7921
* build: ensure all regional bedrock models have same supported values as base bedrock model
prevents drift
* test(base_llm_unit_tests.py): add testing for nested pydantic objects
* fix(test_utils.py): add test_get_potential_model_names
* fix(anthropic/chat/transformation.py): support nested pydantic objects
Fixes https://github.com/BerriAI/litellm/issues/7755
* test(azure_openai_o1.py): initial commit with testing for azure openai o1 preview model
* fix(base_llm_unit_tests.py): handle azure o1 preview response format tests
skip as o1 on azure doesn't support tool calling yet
* fix: initial commit of azure o1 handler using openai caller
simplifies calling + allows fake streaming logic alr. implemented for openai to just work
* feat(azure/o1_handler.py): fake o1 streaming for azure o1 models
azure does not currently support streaming for o1
* feat(o1_transformation.py): support overriding 'should_fake_stream' on azure/o1 via 'supports_native_streaming' param on model info
enables user to toggle on when azure allows o1 streaming without needing to bump versions
* style(router.py): remove 'give feedback/get help' messaging when router is used
Prevents noisy messaging
Closes https://github.com/BerriAI/litellm/issues/5942
* test: fix azure o1 test
* test: fix tests
* fix: fix test
* build(model_prices_and_context_window.json): add gemini-1.5-flash context caching
* fix(context_caching/transformation.py): just use last identified cache point
Fixes https://github.com/BerriAI/litellm/issues/6738
* fix(context_caching/transformation.py): pick first contiguous block - handles system message error from google
Fixes https://github.com/BerriAI/litellm/issues/6738
* fix(vertex_ai/gemini/): track context caching tokens
* refactor(gemini/): place transformation.py inside `chat/` folder
make it easy for user to know we support the equivalent endpoint
* fix: fix import
* refactor(vertex_ai/): move vertex_ai cost calc inside vertex_ai/ folder
make it easier to see cost calculation logic
* fix: fix linting errors
* fix: fix circular import
* feat(gemini/cost_calculator.py): support gemini context caching cost calculation
generifies anthropic's cost calculation function and uses it across anthropic + gemini
* build(model_prices_and_context_window.json): add cost tracking for gemini-1.5-flash-002 w/ context caching
Closes https://github.com/BerriAI/litellm/issues/6891
* docs(gemini.md): add gemini context caching architecture diagram
make it easier for user to understand how context caching works
* docs(gemini.md): link to relevant gemini context caching code
* docs(gemini/context_caching): add readme in github, make it easy for dev to know context caching is supported + where to go for code
* fix(llm_cost_calc/utils.py): handle gemini 128k token diff cost calc scenario
* fix(deepseek/cost_calculator.py): support deepseek context caching cost calculation
* test: fix test
* fix(openai.py): fix returning o1 non-streaming requests
fixes issue where fake stream always true for o1
* build(model_prices_and_context_window.json): add 'supports_vision' for o1 models
* fix: add internal server error exception mapping
* fix(base_llm_unit_tests.py): drop temperature from test
* test: mark prompt caching as a flaky test
* fix(hosted_vllm/transformation.py): return fake api key, if none give. Prevents httpx error
Fixes https://github.com/BerriAI/litellm/issues/7291
* test: fix test
* fix(main.py): add hosted_vllm/ support for embeddings endpoint
Closes https://github.com/BerriAI/litellm/issues/7290
* docs(vllm.md): add docs on vllm embeddings usage
* fix(__init__.py): fix sambanova model test
* fix(base_llm_unit_tests.py): skip pydantic obj test if model takes >5s to respond
* fix(main.py): support passing max retries to azure/openai embedding integrations
Fixes https://github.com/BerriAI/litellm/issues/7003
* feat(team_endpoints.py): allow updating team model aliases
Closes https://github.com/BerriAI/litellm/issues/6956
* feat(router.py): allow specifying model id as fallback - skips any cooldown check
Allows a default model to be checked if all models in cooldown
s/o @micahjsmith
* docs(reliability.md): add fallback to specific model to docs
* fix(utils.py): new 'is_prompt_caching_valid_prompt' helper util
Allows user to identify if messages/tools have prompt caching
Related issue: https://github.com/BerriAI/litellm/issues/6784
* feat(router.py): store model id for prompt caching valid prompt
Allows routing to that model id on subsequent requests
* fix(router.py): only cache if prompt is valid prompt caching prompt
prevents storing unnecessary items in cache
* feat(router.py): support routing prompt caching enabled models to previous deployments
Closes https://github.com/BerriAI/litellm/issues/6784
* test: fix linting errors
* feat(databricks/): convert basemodel to dict and exclude none values
allow passing pydantic message to databricks
* fix(utils.py): ensure all chat completion messages are dict
* (feat) Track `custom_llm_provider` in LiteLLMSpendLogs (#7081)
* add custom_llm_provider to SpendLogsPayload
* add custom_llm_provider to SpendLogs
* add custom llm provider to SpendLogs payload
* test_spend_logs_payload
* Add MLflow to the side bar (#7031)
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
* (bug fix) SpendLogs update DB catch all possible DB errors for retrying (#7082)
* catch DB_CONNECTION_ERROR_TYPES
* fix DB retry mechanism for SpendLog updates
* use DB_CONNECTION_ERROR_TYPES in auth checks
* fix exp back off for writing SpendLogs
* use _raise_failed_update_spend_exception to ensure errors print as NON blocking
* test_update_spend_logs_multiple_batches_with_failure
* (Feat) Add StructuredOutputs support for Fireworks.AI (#7085)
* fix model cost map fireworks ai "supports_response_schema": true,
* fix supports_response_schema
* fix map openai params fireworks ai
* test_map_response_format
* test_map_response_format
* added deepinfra/Meta-Llama-3.1-405B-Instruct (#7084)
* bump: version 1.53.9 → 1.54.0
* fix deepinfra
* litellm db fixes LiteLLM_UserTable (#7089)
* ci/cd queue new release
* fix llama-3.3-70b-versatile
* refactor - use consistent file naming convention `AI21/` -> `ai21` (#7090)
* fix refactor - use consistent file naming convention
* ci/cd run again
* fix naming structure
* fix use consistent naming (#7092)
---------
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Yuki Watanabe <31463517+B-Step62@users.noreply.github.com>
Co-authored-by: ali sayyah <ali.sayyah2@gmail.com>
* fix(cost_calculator.py): move to using `.get_model_info()` for cost per token calculations
ensures cost tracking is reliable - handles edge cases of parsing model cost map
* build(model_prices_and_context_window.json): add 'supports_response_schema' for select tgai models
Fixes https://github.com/BerriAI/litellm/pull/7037#discussion_r1872157329
* build(model_prices_and_context_window.json): remove 'pdf input' and 'vision' support from nova micro in model map
Bedrock docs indicate no support for micro - https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference-supported-models-features.html
* fix(converse_transformation.py): support amazon nova tool use
* fix(opentelemetry): Add missing LLM request type attribute to spans (#7041)
* feat(opentelemetry): add LLM request type attribute to spans
* lint
* fix: curl usage (#7038)
curl -d, --data <data> is lowercase d
curl -D, --dump-header <filename> is uppercase D
references:
https://curl.se/docs/manpage.html#-dhttps://curl.se/docs/manpage.html#-D
* fix(spend_tracking.py): handle empty 'id' in model response - when creating spend log
Fixes https://github.com/BerriAI/litellm/issues/7023
* fix(streaming_chunk_builder.py): handle initial id being empty string
Fixes https://github.com/BerriAI/litellm/issues/7023
* fix(anthropic_passthrough_logging_handler.py): add end user cost tracking for anthropic pass through endpoint
* docs(pass_through/): refactor docs location + add table on supported features for pass through endpoints
* feat(anthropic_passthrough_logging_handler.py): support end user cost tracking via anthropic sdk
* docs(anthropic_completion.md): add docs on passing end user param for cost tracking on anthropic sdk
* fix(litellm_logging.py): use standard logging payload if present in kwargs
prevent datadog logging error for pass through endpoints
* docs(bedrock.md): add rerank api usage example to docs
* bugfix/change dummy tool name format (#7053)
* fix viewing keys (#7042)
* ui new build
* build(model_prices_and_context_window.json): add bedrock region models to model cost map (#7044)
* bye (#6982)
* (fix) litellm router.aspeech (#6962)
* doc Migrating Databases
* fix aspeech on router
* test_audio_speech_router
* test_audio_speech_router
* docs show supported providers on batches api doc
* change dummy tool name format
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: yujonglee <yujonglee.dev@gmail.com>
* fix: fix linting errors
* test: update test
* fix(litellm_logging.py): fix pass through check
* fix(test_otel_logging.py): fix test
* fix(cost_calculator.py): update handling for cost per second
* fix(cost_calculator.py): fix cost check
* test: fix test
* (fix) adding public routes when using custom header (#7045)
* get_api_key_from_custom_header
* add test_get_api_key_from_custom_header
* fix testing use 1 file for test user api key auth
* fix test user api key auth
* test_custom_api_key_header_name
* build: update ui build
---------
Co-authored-by: Doron Kopit <83537683+doronkopit5@users.noreply.github.com>
Co-authored-by: lloydchang <lloydchang@gmail.com>
Co-authored-by: hgulersen <haymigulersen@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: yujonglee <yujonglee.dev@gmail.com>
* fix(together_ai/chat): only return response_format + tools for supported models
Fixes https://github.com/BerriAI/litellm/issues/6972
* feat(bedrock/rerank): initial working commit for bedrock rerank api support
Closes https://github.com/BerriAI/litellm/issues/7021
* feat(bedrock/rerank): async bedrock rerank api support
Addresses https://github.com/BerriAI/litellm/issues/7021
* build(model_prices_and_context_window.json): add 'supports_prompt_caching' for bedrock models + cleanup cross-region from model list (duplicate information - lead to inconsistencies )
* docs(json_mode.md): clarify model support for json schema
Closes https://github.com/BerriAI/litellm/issues/6998
* fix(_service_logger.py): handle dd callback in list
ensure failed spend tracking is logged to datadog
* feat(converse_transformation.py): translate from anthropic format to bedrock format
Closes https://github.com/BerriAI/litellm/issues/7030
* fix: fix linting errors
* test: fix test
* fix(key_management_endpoints.py): override metadata field value on update
allow user to override tags
* feat(__init__.py): expose new disable_end_user_cost_tracking_prometheus_only metric
allow disabling end user cost tracking on prometheus - fixes cardinality issue
* fix(litellm_pre_call_utils.py): add key/team level enforced params
Fixes https://github.com/BerriAI/litellm/issues/6652
* fix(key_management_endpoints.py): allow user to pass in `enforced_params` as a top level param on /key/generate and /key/update
* docs(enterprise.md): add docs on enforcing required params for llm requests
* Add support of Galadriel API (#7005)
* fix(router.py): robust retry after handling
set retry after time to 0 if >0 healthy deployments. handle base case = 1 deployment
* test(test_router.py): fix test
* feat(bedrock/): add support for 'nova' models
also adds explicit 'converse/' route for simpler routing
* fix: fix 'supports_pdf_input'
return if model supports pdf input on get_model_info
* feat(converse_transformation.py): support bedrock pdf input
* docs(document_understanding.md): add document understanding to docs
* fix(litellm_pre_call_utils.py): fix linting error
* fix(init.py): fix passing of bedrock converse models
* feat(bedrock/converse): support 'response_format={"type": "json_object"}'
* fix(converse_handler.py): fix linting error
* fix(base_llm_unit_tests.py): fix test
* fix: fix test
* test: fix test
* test: fix test
* test: remove duplicate test
---------
Co-authored-by: h4n0 <4738254+h4n0@users.noreply.github.com>
* docs(config_settings.md): document all router_settings
* ci(config.yml): add router_settings doc test to ci/cd
* test: debug test on ci/cd
* test: debug ci/cd test
* test: fix test
* fix(team_endpoints.py): skip invalid team object. don't fail `/team/list` call
Causes downstream errors if ui just fails to load team list
* test(base_llm_unit_tests.py): add 'response_format={"type": "text"}' test to base_llm_unit_tests
adds complete coverage for all 'response_format' values to ci/cd
* feat(router.py): support wildcard routes in `get_router_model_info()`
Addresses https://github.com/BerriAI/litellm/issues/6914
* build(model_prices_and_context_window.json): add tpm/rpm limits for all gemini models
Allows for ratelimit tracking for gemini models even with wildcard routing enabled
Addresses https://github.com/BerriAI/litellm/issues/6914
* feat(router.py): add tpm/rpm tracking on success/failure to global_router
Addresses https://github.com/BerriAI/litellm/issues/6914
* feat(router.py): support wildcard routes on router.get_model_group_usage()
* fix(router.py): fix linting error
* fix(router.py): implement get_remaining_tokens_and_requests
Addresses https://github.com/BerriAI/litellm/issues/6914
* fix(router.py): fix linting errors
* test: fix test
* test: fix tests
* docs(config_settings.md): add missing dd env vars to docs
* fix(router.py): check if hidden params is dict
* fix(key_management_endpoints.py): fix user-membership check when creating team key
* docs: add deprecation notice on original `/v1/messages` endpoint + add better swagger tags on pass-through endpoints
* fix(gemini/): fix image_url handling for gemini
Fixes https://github.com/BerriAI/litellm/issues/6897
* fix(teams.tsx): fix member add when role is 'user'
* fix(team_endpoints.py): /team/member_add
fix adding several new members to team
* test(test_vertex.py): remove redundant test
* test(test_proxy_server.py): fix team member add tests
* Fix Vertex AI function calling invoke: use JSON format instead of protobuf text format. (#6702)
* test: test tool_call conversion when arguments is empty dict
Fixes https://github.com/BerriAI/litellm/issues/6833
* fix(openai_like/handler.py): return more descriptive error message
Fixes https://github.com/BerriAI/litellm/issues/6812
* test: skip overloaded model
* docs(anthropic.md): update anthropic docs to show how to route to any new model
* feat(groq/): fake stream when 'response_format' param is passed
Groq doesn't support streaming when response_format is set
* feat(groq/): add response_format support for groq
Closes https://github.com/BerriAI/litellm/issues/6845
* fix(o1_handler.py): remove fake streaming for o1
Closes https://github.com/BerriAI/litellm/issues/6801
* build(model_prices_and_context_window.json): add groq llama3.2b model pricing
Closes https://github.com/BerriAI/litellm/issues/6807
* fix(utils.py): fix handling ollama response format param
Fixes https://github.com/BerriAI/litellm/issues/6848#issuecomment-2491215485
* docs(sidebars.js): refactor chat endpoint placement
* fix: fix linting errors
* test: fix test
* test: fix test
* fix(openai_like/handler): handle max retries
* fix(streaming_handler.py): fix streaming check for openai-compatible providers
* test: update test
* test: correctly handle model is overloaded error
* test: update test
* test: fix test
* test: mark flaky test
---------
Co-authored-by: Guowang Li <Guowang@users.noreply.github.com>
* fix(anthropic/chat/transformation.py): add json schema as values: json_schema
fixes passing pydantic obj to anthropic
Fixes https://github.com/BerriAI/litellm/issues/6766
* (feat): Add timestamp_granularities parameter to transcription API (#6457)
* Add timestamp_granularities parameter to transcription API
* add param to the local test
* fix(databricks/chat.py): handle max_retries optional param handling for openai-like calls
Fixes issue with calling finetuned vertex ai models via databricks route
* build(ui/): add team admins via proxy ui
* fix: fix linting error
* test: fix test
* docs(vertex.md): refactor docs
* test: handle overloaded anthropic model error
* test: remove duplicate test
* test: fix test
* test: update test to handle model overloaded error
---------
Co-authored-by: Show <35062952+BrunooShow@users.noreply.github.com>
* fix(ollama.py): fix get model info request
Fixes https://github.com/BerriAI/litellm/issues/6703
* feat(anthropic/chat/transformation.py): support passing user id to anthropic via openai 'user' param
* docs(anthropic.md): document all supported openai params for anthropic
* test: fix tests
* fix: fix tests
* feat(jina_ai/): add rerank support
Closes https://github.com/BerriAI/litellm/issues/6691
* test: handle service unavailable error
* fix(handler.py): refactor together ai rerank call
* test: update test to handle overloaded error
* test: fix test
* Litellm router trace (#6742)
* feat(router.py): add trace_id to parent functions - allows tracking retry/fallbacks
* feat(router.py): log trace id across retry/fallback logic
allows grouping llm logs for the same request
* test: fix tests
* fix: fix test
* fix(transformation.py): only set non-none stop_sequences
* Litellm router disable fallbacks (#6743)
* bump: version 1.52.6 → 1.52.7
* feat(router.py): enable dynamically disabling fallbacks
Allows for enabling/disabling fallbacks per key
* feat(litellm_pre_call_utils.py): support setting 'disable_fallbacks' on litellm key
* test: fix test
* fix(exception_mapping_utils.py): map 'model is overloaded' to internal server error
* test: handle gemini error
* test: fix test
* fix: new run
* fix(__init__.py): add 'watsonx_text' as mapped llm api route
Fixes https://github.com/BerriAI/litellm/issues/6663
* fix(opentelemetry.py): fix passing parallel tool calls to otel
Fixes https://github.com/BerriAI/litellm/issues/6677
* refactor(test_opentelemetry_unit_tests.py): create a base set of unit tests for all logging integrations - test for parallel tool call handling
reduces bugs in repo
* fix(__init__.py): update provider-model mapping to include all known provider-model mappings
Fixes https://github.com/BerriAI/litellm/issues/6669
* feat(anthropic): support passing document in llm api call
* docs(anthropic.md): add pdf anthropic call to docs + expose new 'supports_pdf_input' function
* fix(factory.py): fix linting error
* fix(deepseek/chat): convert content list to str
Fixes https://github.com/BerriAI/litellm/issues/6642
* test(test_deepseek_completion.py): implement base llm unit tests
increase robustness across providers
* fix(router.py): support content policy violation fallbacks with default fallbacks
* fix(opentelemetry.py): refactor to move otel imports behing flag
Fixes https://github.com/BerriAI/litellm/issues/6636
* fix(opentelemtry.py): close span on success completion
* fix(user_api_key_auth.py): allow user_role to default to none
* fix: mark flaky test
* fix(opentelemetry.py): move otelconfig.from_env to inside the init
prevent otel errors raised just by importing the litellm class
* fix(user_api_key_auth.py): fix auth error