* fix(model_checks.py): update returning known model from wildcard to filter based on given model prefix
ensures wildcard route - `vertex_ai/gemini-*` just returns known vertex_ai/gemini- models
* test(test_proxy_utils.py): add unit testing for new 'get_known_models_from_wildcard' helper
* test(test_models.py): add e2e testing for `/model_group/info` endpoint
* feat(prometheus.py): support tracking total requests by user_email on prometheus
adds initial support for tracking total requests by user_email
* test(test_prometheus.py): add testing to ensure user email is always tracked
* test: update testing for new prometheus metric
* test(test_prometheus_unit_tests.py): add user email to total proxy metric
* test: update tests
* test: fix spend tests
* test: fix test
* fix(pagerduty.py): fix linting error
* fix(litellm_logging.py): support saving applied guardrails in logging object
allows list of applied guardrails to be logged for proxy admin's knowledge
* feat(spend_tracking_utils.py): log applied guardrails to spend logs
makes it easy for admin to know what guardrails were applied on a request
* ci(config.yml): uninstall posthog from ci/cd
* test: fix tests
* test: update test
* add initial test for assembly ai
* start using PassthroughEndpointRouter
* migrate to lllm passthrough endpoints
* add assembly ai as a known provider
* fix PassthroughEndpointRouter
* fix set_pass_through_credentials
* working EU request to assembly ai pass through endpoint
* add e2e test assembly
* test_assemblyai_routes_with_bad_api_key
* clean up pass through endpoint router
* e2e testing for assembly ai pass through
* test assembly ai e2e testing
* delete assembly ai models
* fix code quality
* ui working assembly ai api base flow
* fix install assembly ai
* update model call details with kwargs for pass through logging
* fix tracking assembly ai model in response
* _handle_assemblyai_passthrough_logging
* fix test_initialize_deployment_for_pass_through_unsupported_provider
* TestPassthroughEndpointRouter
* _get_assembly_transcript
* fix assembly ai pt logging tests
* fix assemblyai_proxy_route
* fix _get_assembly_region_from_url
* test(base_llm_unit_tests.py): add test to ensure drop params is respected
* fix(types/prometheus.py): use typing_extensions for python3.8 compatibility
* build: add cherry picked commits
* fix(o_series_transformation.py): add 'reasoning_effort' as o series model param
Closes https://github.com/BerriAI/litellm/issues/8182
* fix(main.py): ensure `reasoning_effort` is a mapped openai param
* refactor(azure/): rename o1_[x] files to o_series_[x]
* refactor(base_llm_unit_tests.py): refactor testing for o series reasoning effort
* test(test_azure_o_series.py): have azure o series tests correctly inherit from base o series model tests
* feat(base_utils.py): support translating 'developer' role to 'system' role for non-openai providers
Makes it easy to switch from openai to anthropic
* fix: fix linting errors
* fix(base_llm_unit_tests.py): fix test
* fix(main.py): add missing param
* Litellm dev 01 29 2025 p4 (#8107)
* fix(key_management_endpoints.py): always get db team
Fixes https://github.com/BerriAI/litellm/issues/7983
* test(test_key_management.py): add unit test enforcing check_db_only is always true on key generate checks
* test: fix test
* test: skip gemini thinking
* Litellm dev 01 29 2025 p3 (#8106)
* fix(__init__.py): reduces size of __init__.py and reduces scope for errors by using correct param
* refactor(__init__.py): refactor init by cleaning up redundant params
* refactor(__init__.py): move more constants into constants.py
cleanup root
* refactor(__init__.py): more cleanup
* feat(__init__.py): expose new 'disable_hf_tokenizer_download' param
enables hf model usage in offline env
* docs(config_settings.md): document new disable_hf_tokenizer_download param
* fix: fix linting error
* fix: fix unsafe comparison
* test: fix test
* docs(public_teams.md): add doc showing how to expose public teams for users to join
* docs: add beta disclaimer on public teams
* test: update tests
* refactor(factory.py): refactor async bedrock message transformation to use async get request for image url conversion
improve latency of bedrock call
* test(test_bedrock_completion.py): add unit testing to ensure async image url get called for async bedrock call
* refactor(factory.py): refactor bedrock translation to use BedrockImageProcessor
reduces duplicate code
* fix(factory.py): fix bug not allowing pdf's to be processed
* fix(factory.py): fix bedrock converse document understanding with image url
* docs(bedrock.md): clarify all bedrock document types are supported
* refactor: cleanup redundant test + unused imports
* perf: improve perf with reusable clients
* test: fix test
* feat(main.py): use asyncio.sleep for mock_Timeout=true on async request
adds unit testing to ensure proxy does not fail if specific Openai requests hang (e.g. recent o1 outage)
* fix(streaming_handler.py): fix deepseek r1 return reasoning content on streaming
Fixes https://github.com/BerriAI/litellm/issues/7942
* Revert "fix(streaming_handler.py): fix deepseek r1 return reasoning content on streaming"
This reverts commit 7a052a64e3.
* fix(deepseek-r-1): return reasoning_content as a top-level param
ensures compatibility with existing tools that use it
* fix: fix linting error
* fix(bedrock/converse_handler.py): fix bedrock region name on async calls
* fix(utils.py): fix split model handling
Fixes bedrock cost calculation when region name is given
* feat(_health_endpoints.py): support health checking datadog integration
Closes https://github.com/BerriAI/litellm/issues/7921
* feat(router.py): add retry headers to response
makes it easy to add testing to ensure model-specific retries are respected
* fix(add_retry_headers.py): clarify attempted retries vs. max retries
* test(test_fallbacks.py): add test for checking if max retries set for model is respected
* test(test_fallbacks.py): assert values for attempted retries and max retries are as expected
* fix(utils.py): return timeout in litellm proxy response headers
* test(test_fallbacks.py): add test to assert model specific timeout used on timeout error
* test: add bad model with timeout to proxy
* fix: fix linting error
* fix(router.py): fix get model list from model alias
* test: loosen test restriction - account for other events on proxy
* feat(main.py): add new 'provider_specific_header' param
allows passing extra header for specific provider
* fix(litellm_pre_call_utils.py): add unit test for pre call utils
* test(test_bedrock_completion.py): skip test now that bedrock supports this
* fix(types/utils.py): support returning 'reasoning_content' for deepseek models
Fixes https://github.com/BerriAI/litellm/issues/7877#issuecomment-2603813218
* fix(convert_dict_to_response.py): return deepseek response in provider_specific_field
allows for separating openai vs. non-openai params in model response
* fix(utils.py): support 'provider_specific_field' in delta chunk as well
allows deepseek reasoning content chunk to be returned to user from stream as well
Fixes https://github.com/BerriAI/litellm/issues/7877#issuecomment-2603813218
* fix(watsonx/chat/handler.py): fix passing space id to watsonx on chat route
* fix(watsonx/): fix watsonx_text/ route with space id
* fix(watsonx/): qa item - also adds better unit testing for watsonx embedding calls
* fix(utils.py): rename to '..fields'
* fix: fix linting errors
* fix(utils.py): fix typing - don't show provider-specific field if none or empty - prevents default respons
e from being non-oai compatible
* fix: cleanup unused imports
* docs(deepseek.md): add docs for deepseek reasoning model
* fix(utils.py): don't pass 'anthropic-beta' header to vertex - will cause request to fail
* fix(utils.py): add flag to allow user to disable filtering invalid headers
ensure user can control behaviour
* style(utils.py): cleanup message
* test(test_utils.py): add unit test to cover invalid header filtering
* fix(proxy_server.py): fix custom openapi schema generation
* fix(utils.py): pass extra headers if set
* fix(main.py): fix image variation to use 'client' param
* refactor: initial commit for using separate sync vs. async transformation routes for bedrock
ensures no blocking calls e.g. when converting image url to b64
* perf(converse_transformation.py): make bedrock converse transformation async
asyncify's the bedrock message transformation - useful for handling image urls for bedrock
* fix(converse_handler.py): fix logging for async streaming
* style: cleanup unused imports
* feat(main.py): initial commit for `/image/variations` endpoint support
* refactor(base_llm/): introduce new base llm base config for image variation endpoints
* refactor(openai/image_variations/transformation.py): implement openai image variation transformation handler
* fix: test
* feat(openai/): working openai `/image/variation` endpoint calls via sdk
* feat(topaz/): topaz sync image variation call support
Addresses https://github.com/BerriAI/litellm/issues/7593
'
* fix(topaz/transformation.py): fix linting errors
* fix(openai/image_variations/handler.py): fix passing json data
* fix(main.py): image_variation/
support async image variation route - `aimage_variation`
* fix(test_get_model_info.py): fix test
* fix: cleanup unused imports
* feat(openai/): add async `/image/variations` endpoint support
* feat(topaz/): support async `/image/variations` calls
* fix: test
* fix(utils.py): fix get_model_info_helper for no model info w/ provider config
handles situation where model info is not known but provider config exists
* test(test_router_fallbacks.py): mark flaky test
* fix: fix unused imports
* test: bump otel load test perf threshold - accounts for current load tests hitting same server
* feat(langfuse.py): log the used prompt when prompt management used
* test: fix test
* docs(self_serve.md): add doc on restricting personal key creation on ui
* feat(s3.py): support s3 logging with team alias prefixes (if available)
New preview feature
* fix(main.py): remove old if block - simplify to just await if coroutine returned
fixes lm_studio async embedding error
* fix(langfuse.py): handle get prompt check
* fix(vertex_ai/gemini/transformation.py): handle 'http://' in gemini process url
* refactor(router.py): refactor '_prompt_management_factory' to use logging obj get_chat_completion logic
deduplicates code
* fix(litellm_logging.py): update 'get_chat_completion_prompt' to update logging object messages
* docs(prompt_management.md): update prompt management to be in beta
given feedback - this still needs to be revised (e.g. passing in user message, not ignoring)
* refactor(prompt_management_base.py): introduce base class for prompt management
allows consistent behaviour across prompt management integrations
* feat(prompt_management_base.py): support adding client message to template message + refactor langfuse prompt management to use prompt management base
* fix(litellm_logging.py): log prompt id + prompt variables to langfuse if set
allows tracking what prompt was used for what purpose
* feat(litellm_logging.py): log prompt management metadata in standard logging payload + use in langfuse
allows logging prompt id / prompt variables to langfuse
* test: fix test
* fix(router.py): cleanup unused imports
* fix: fix linting error
* fix: fix trace param typing
* fix: fix linting errors
* fix: fix code qa check
* fix(streaming_chunk_builder_utils.py): add test for groq tool calling + streaming + combine chunks
Addresses https://github.com/BerriAI/litellm/issues/7621
* fix(streaming_utils.py): fix modelresponseiterator for openai like chunk parser
ensures chunk parser uses the correct tool call id when translating the chunk
Fixes https://github.com/BerriAI/litellm/issues/7621
* build(model_hub.tsx): display cost pricing on model hub
* build(model_hub.tsx): show cost per token pricing + complete model information
* fix(types/utils.py): fix usage object handling
* fix(types/utils.py): support langfuse + humanloop routes on llm router
* fix(main.py): remove acompletion elif block
just await if coroutine returned
* refactor(prometheus.py): refactor to remove `_tag` metrics and incorporate in regular metrics
* fix(prometheus.py): handle label values not set in enum values
* feat(prometheus.py): working e2e custom metadata labels
* docs(prometheus.md): update docs to clarify how custom metrics would work
* test(test_prometheus_unit_tests.py): fix test
* test: add unit testing
* test(azure_openai_o1.py): initial commit with testing for azure openai o1 preview model
* fix(base_llm_unit_tests.py): handle azure o1 preview response format tests
skip as o1 on azure doesn't support tool calling yet
* fix: initial commit of azure o1 handler using openai caller
simplifies calling + allows fake streaming logic alr. implemented for openai to just work
* feat(azure/o1_handler.py): fake o1 streaming for azure o1 models
azure does not currently support streaming for o1
* feat(o1_transformation.py): support overriding 'should_fake_stream' on azure/o1 via 'supports_native_streaming' param on model info
enables user to toggle on when azure allows o1 streaming without needing to bump versions
* style(router.py): remove 'give feedback/get help' messaging when router is used
Prevents noisy messaging
Closes https://github.com/BerriAI/litellm/issues/5942
* fix(types/utils.py): handle none logprobs
Fixes https://github.com/BerriAI/litellm/issues/328
* fix(exception_mapping_utils.py): fix error str unbound error
* refactor(azure_ai/): move to openai_like chat completion handler
allows for easy swapping of api base url's (e.g. ai.services.com)
Fixes https://github.com/BerriAI/litellm/issues/7275
* refactor(azure_ai/): move to base llm http handler
* fix(azure_ai/): handle differing api endpoints
* fix(azure_ai/): make sure all unit tests are passing
* fix: fix linting errors
* fix: fix linting errors
* fix: fix linting error
* fix: fix linting errors
* fix(azure_ai/transformation.py): handle extra body param
* fix(azure_ai/transformation.py): fix max retries param handling
* fix: fix test
* test(test_azure_o1.py): fix test
* fix(llm_http_handler.py): support handling azure ai unprocessable entity error
* fix(llm_http_handler.py): handle sync invalid param error for azure ai
* fix(azure_ai/): streaming support with base_llm_http_handler
* fix(llm_http_handler.py): working sync stream calls with unprocessable entity handling for azure ai
* fix: fix linting errors
* fix(llm_http_handler.py): fix linting error
* fix(azure_ai/): handle cohere tool call invalid index param error
* fix(prometheus.py): refactor litellm_input_tokens_metric to use label factory
makes adding new metrics easier
* feat(prometheus.py): add 'request_model' to 'litellm_input_tokens_metric'
* refactor(prometheus.py): refactor 'litellm_output_tokens_metric' to use label factory
makes adding new metrics easier
* feat(prometheus.py): emit requested model in 'litellm_output_tokens_metric'
* feat(prometheus.py): support tracking success events with custom metrics
* refactor(prometheus.py): refactor '_set_latency_metrics' to just use the initially created enum values dictionary
reduces scope for missing values
* feat(prometheus.py): refactor all tags to support custom metadata tags
enables metadata tags to be used across for e2e tracking
* fix(prometheus.py): fix requested model on success event enum_values
* test: fix test
* test: fix test
* test: handle filenotfound error
* docs(prometheus.md): add new values to prometheus
* docs(prometheus.md): document adding custom metrics on prometheus
* bump: version 1.56.5 → 1.56.6
* fix(internal_user_endpoints.py): fix team list sort - handle team_alias being set + None
* fix(key_management_endpoints.py): allow team admin to create key for member via admin ui
Fixes https://github.com/BerriAI/litellm/issues/7482
* fix(proxy_server.py): allow querying info on specific model group via `/model_group/info`
allows client-side user to get model info from proxy
* fix(proxy_server.py): add docstring on `/model_group/info` showing how to filter by model name
* test(test_proxy_utils.py): add unit test for returning model group info filtered
* fix(proxy_server.py): fix query param
* fix(test_Get_model_info.py): handle no whitelisted bedrock modells
* fix(langfuse_prompt_management.py): migrate dynamic logging to langfuse custom logger compatible class
* fix(langfuse_prompt_management.py): support failure callback logging to langfuse as well
* feat(proxy_server.py): support setting custom tokenizer on config.yaml
Allows customizing value for `/utils/token_counter`
* fix(proxy_server.py): fix linting errors
* test: skip if file not found
* style: cleanup unused import
* docs(configs.md): add docs on setting custom tokenizer
* test(azure_openai_o1.py): initial commit with testing for azure openai o1 preview model
* fix(base_llm_unit_tests.py): handle azure o1 preview response format tests
skip as o1 on azure doesn't support tool calling yet
* fix: initial commit of azure o1 handler using openai caller
simplifies calling + allows fake streaming logic alr. implemented for openai to just work
* feat(azure/o1_handler.py): fake o1 streaming for azure o1 models
azure does not currently support streaming for o1
* feat(o1_transformation.py): support overriding 'should_fake_stream' on azure/o1 via 'supports_native_streaming' param on model info
enables user to toggle on when azure allows o1 streaming without needing to bump versions
* style(router.py): remove 'give feedback/get help' messaging when router is used
Prevents noisy messaging
Closes https://github.com/BerriAI/litellm/issues/5942
* test: fix azure o1 test
* test: fix tests
* fix: fix test
* docs(sidebar.js): docs for support model access groups for wildcard routes
* feat(key_management_endpoints.py): add check if user is premium_user when adding model access group for wildcard route
* refactor(docs/): make control model access a root-level doc in proxy sidebar
easier to discover how to control model access on litellm
* docs: more cleanup
* feat(fireworks_ai/): add document inlining support
Enables user to call non-vision models with images/pdfs/etc.
* test(test_fireworks_ai_translation.py): add unit testing for fireworks ai transform inline helper util
* docs(docs/): add document inlining details to fireworks ai docs
* feat(fireworks_ai/): allow user to dynamically disable auto add transform inline
allows client-side disabling of this feature for proxy users
* feat(fireworks_ai/): return 'supports_vision' and 'supports_pdf_input' true on all fireworks ai models
now true as fireworks ai supports document inlining
* test: fix tests
* fix(router.py): add unit testing for _is_model_access_group_for_wildcard_route
* feat(deepgram/): initial e2e support for deepgram stt
Uses deepgram's `/listen` endpoint to transcribe speech to text
Closes https://github.com/BerriAI/litellm/issues/4875
* fix: fix linting errors
* test: fix test