* fix(lm_studio/chat/transformation.py): Fix https://github.com/BerriAI/litellm/issues/7811
* fix(router.py): fix mock timeout check
* fix: drop model name from fallback args since it causes a conflict with the model=model that is provided later on. (#7806)
This error happens if you provide multiple fallback models to the completion function with model name defined in each one.
* fix(router.py): remove mock_timeout before sending to request
prevents reuse in fallbacks
* test: update test
* test: revert test change - wrong pr
---------
Co-authored-by: Dudu Lasry <david1542@users.noreply.github.com>
* feat(pass_through_endpoints.py): fix anthropic end user cost tracking
* fix(anthropic/chat/transformation.py): use returned provider model for anthropic
handles anthropic `-latest` tag in request body throwing cost calculation errors
ensures we can be accurate in our model cost tracking
* feat(model_prices_and_context_window.json): add gemini-2.0-flash-thinking-exp pricing
* test: update test to use assumption that user_api_key_dict can get anthropic user id
* test: fix test
* fix: fix test
* fix(anthropic_pass_through.py): uncomment previous anthropic end-user cost tracking code block
can't guarantee user api key dict always has end user id - too many code paths
* fix(user_api_key_auth.py): this allows end user id from request body to always be read and set in auth object
* fix(auth_check.py): fix linting error
* test: fix auth check
* fix(auth_utils.py): fix get end user id to handle metadata = None
* fix(factory.py): fix bedrock document url check
Make check more generic - if starts with 'text' or 'application' assume it's a document and let it go through
Fixes https://github.com/BerriAI/litellm/issues/7746
* feat(key_management_endpoints.py): support writing new key alias to aws secret manager - on key rotation
adds rotation endpoint to aws key management hook - allows for rotated litellm virtual keys with new key alias to be written to it
* feat(key_management_event_hooks.py): support rotating keys and updating secret manager
* refactor(base_secret_manager.py): support rotate secret at the base level
since it's just an abstraction function, it's easy to implement at the base manager level
* style: cleanup unused imports
* use lru cache wrapper
* use lru_cache_wrapper for _cached_get_model_info_helper
* fix _get_traceback_str_for_error
* huggingface/mistralai/Mistral-7B-Instruct-v0.3
* feat(main.py): initial commit for `/image/variations` endpoint support
* refactor(base_llm/): introduce new base llm base config for image variation endpoints
* refactor(openai/image_variations/transformation.py): implement openai image variation transformation handler
* fix: test
* feat(openai/): working openai `/image/variation` endpoint calls via sdk
* feat(topaz/): topaz sync image variation call support
Addresses https://github.com/BerriAI/litellm/issues/7593
'
* fix(topaz/transformation.py): fix linting errors
* fix(openai/image_variations/handler.py): fix passing json data
* fix(main.py): image_variation/
support async image variation route - `aimage_variation`
* fix(test_get_model_info.py): fix test
* fix: cleanup unused imports
* feat(openai/): add async `/image/variations` endpoint support
* feat(topaz/): support async `/image/variations` calls
* fix: test
* fix(utils.py): fix get_model_info_helper for no model info w/ provider config
handles situation where model info is not known but provider config exists
* test(test_router_fallbacks.py): mark flaky test
* fix: fix unused imports
* test: bump otel load test perf threshold - accounts for current load tests hitting same server
* fix(__init__.py): fix init to exclude pricing-only model cost values from real model names
prevents bad health checks on wildcard routes
* fix(get_llm_provider.py): fix to handle calling bedrock_converse models
* feat(langfuse.py): log the used prompt when prompt management used
* test: fix test
* docs(self_serve.md): add doc on restricting personal key creation on ui
* feat(s3.py): support s3 logging with team alias prefixes (if available)
New preview feature
* fix(main.py): remove old if block - simplify to just await if coroutine returned
fixes lm_studio async embedding error
* fix(langfuse.py): handle get prompt check
* test(test_get_model_info.py): add unit test confirming router deployment updates global 'get_model_info'
* fix(get_supported_openai_params.py): fix custom llm provider 'get_supported_openai_params'
Fixes https://github.com/BerriAI/litellm/issues/7668
* docs(azure.md): clarify how azure ad token refresh on proxy works
Closes https://github.com/BerriAI/litellm/issues/7665
* fix(vertex_ai/gemini/transformation.py): handle 'http://' in gemini process url
* refactor(router.py): refactor '_prompt_management_factory' to use logging obj get_chat_completion logic
deduplicates code
* fix(litellm_logging.py): update 'get_chat_completion_prompt' to update logging object messages
* docs(prompt_management.md): update prompt management to be in beta
given feedback - this still needs to be revised (e.g. passing in user message, not ignoring)
* refactor(prompt_management_base.py): introduce base class for prompt management
allows consistent behaviour across prompt management integrations
* feat(prompt_management_base.py): support adding client message to template message + refactor langfuse prompt management to use prompt management base
* fix(litellm_logging.py): log prompt id + prompt variables to langfuse if set
allows tracking what prompt was used for what purpose
* feat(litellm_logging.py): log prompt management metadata in standard logging payload + use in langfuse
allows logging prompt id / prompt variables to langfuse
* test: fix test
* fix(router.py): cleanup unused imports
* fix: fix linting error
* fix: fix trace param typing
* fix: fix linting errors
* fix: fix code qa check
* fix(main.py): fix lm_studio/ embedding routing
adds the mapping + updates docs with example
* docs(self_serve.md): update doc to show how to auto-add sso users to teams
* fix(streaming_handler.py): simplify async iterator check, to just check if streaming response is an async iterable
* fix(streaming_chunk_builder_utils.py): add test for groq tool calling + streaming + combine chunks
Addresses https://github.com/BerriAI/litellm/issues/7621
* fix(streaming_utils.py): fix modelresponseiterator for openai like chunk parser
ensures chunk parser uses the correct tool call id when translating the chunk
Fixes https://github.com/BerriAI/litellm/issues/7621
* build(model_hub.tsx): display cost pricing on model hub
* build(model_hub.tsx): show cost per token pricing + complete model information
* fix(types/utils.py): fix usage object handling
* fix(custom_logger.py): expose new 'async_get_chat_completion_prompt' event hook
* fix(custom_logger.py): langfuse_prompt_management.py
remove 'headers' from custom logger 'async_get_chat_completion_prompt' and 'get_chat_completion_prompt' event hooks
* feat(router.py): expose new function for prompt management based routing
* feat(router.py): partial working router prompt factory logic
allows load balanced model to be used for model name w/ langfuse prompt management call
* feat(router.py): fix prompt management with load balanced model group
* feat(langfuse_prompt_management.py): support reading in openai params from langfuse
enables user to define optional params on langfuse vs. client code
* test(test_Router.py): add unit test for router based langfuse prompt management
* fix: fix linting errors
* docs(friendliai.md): update FriendliAI documentation and model details
* docs(friendliai.md): remove unused imports for cleaner documentation
* feat: add support for parallel function calling, system messages, and response schema in model configuration
* test(test_utils.py): initial test for valid models
Addresses https://github.com/BerriAI/litellm/issues/7525
* fix: test
* feat(fireworks_ai/transformation.py): support retrieving valid models from fireworks ai endpoint
* refactor(fireworks_ai/): support checking model info on `/v1/models` route
* docs(set_keys.md): update docs to clarify check llm provider api usage
* fix(watsonx/common_utils.py): support 'WATSONX_ZENAPIKEY' for iam auth
* fix(watsonx): read in watsonx token from env var
* fix: fix linting errors
* fix(utils.py): fix provider config check
* style: cleanup unused imports
* fix(redact_messages.py): fix redact messages for non-model response input to be dictionary
fixes issue with otel logging when message redaction is enabled
* fix(proxy_server.py): fix langfuse key leak in exception string
* test: fix test
* test: fix test
* test: fix tests
* feat(deepgram/transformation.py): support reading in deepgram api base from env var
* fix(litellm_logging.py): make skipping log message a .info
easier to see
* docs(logging.md): add doc on turn off all tracking/logging for a request
* test(azure_openai_o1.py): initial commit with testing for azure openai o1 preview model
* fix(base_llm_unit_tests.py): handle azure o1 preview response format tests
skip as o1 on azure doesn't support tool calling yet
* fix: initial commit of azure o1 handler using openai caller
simplifies calling + allows fake streaming logic alr. implemented for openai to just work
* feat(azure/o1_handler.py): fake o1 streaming for azure o1 models
azure does not currently support streaming for o1
* feat(o1_transformation.py): support overriding 'should_fake_stream' on azure/o1 via 'supports_native_streaming' param on model info
enables user to toggle on when azure allows o1 streaming without needing to bump versions
* style(router.py): remove 'give feedback/get help' messaging when router is used
Prevents noisy messaging
Closes https://github.com/BerriAI/litellm/issues/5942
* fix(types/utils.py): handle none logprobs
Fixes https://github.com/BerriAI/litellm/issues/328
* fix(exception_mapping_utils.py): fix error str unbound error
* refactor(azure_ai/): move to openai_like chat completion handler
allows for easy swapping of api base url's (e.g. ai.services.com)
Fixes https://github.com/BerriAI/litellm/issues/7275
* refactor(azure_ai/): move to base llm http handler
* fix(azure_ai/): handle differing api endpoints
* fix(azure_ai/): make sure all unit tests are passing
* fix: fix linting errors
* fix: fix linting errors
* fix: fix linting error
* fix: fix linting errors
* fix(azure_ai/transformation.py): handle extra body param
* fix(azure_ai/transformation.py): fix max retries param handling
* fix: fix test
* test(test_azure_o1.py): fix test
* fix(llm_http_handler.py): support handling azure ai unprocessable entity error
* fix(llm_http_handler.py): handle sync invalid param error for azure ai
* fix(azure_ai/): streaming support with base_llm_http_handler
* fix(llm_http_handler.py): working sync stream calls with unprocessable entity handling for azure ai
* fix: fix linting errors
* fix(llm_http_handler.py): fix linting error
* fix(azure_ai/): handle cohere tool call invalid index param error
* fix(langfuse_prompt_management.py): migrate dynamic logging to langfuse custom logger compatible class
* fix(langfuse_prompt_management.py): support failure callback logging to langfuse as well
* feat(proxy_server.py): support setting custom tokenizer on config.yaml
Allows customizing value for `/utils/token_counter`
* fix(proxy_server.py): fix linting errors
* test: skip if file not found
* style: cleanup unused import
* docs(configs.md): add docs on setting custom tokenizer
* feat(deepgram/): initial e2e support for deepgram stt
Uses deepgram's `/listen` endpoint to transcribe speech to text
Closes https://github.com/BerriAI/litellm/issues/4875
* fix: fix linting errors
* test: fix test
* feat(main.py): mock_response() - support 'litellm.ContextWindowExceededError' in mock response
enabled quicker router/fallback/proxy debug on context window errors
* feat(exception_mapping_utils.py): extract special litellm errors from error str if calling `litellm_proxy/` as provider
Closes https://github.com/BerriAI/litellm/issues/7259
* fix(user_api_key_auth.py): specify 'Received Proxy Server Request' is span kind server
Closes https://github.com/BerriAI/litellm/issues/7298
* refactor(prometheus.py): refactor to use a factory method for setting label values
allows for enforcing end user id disabling on prometheus e2e
* fix: fix linting error
* fix(prometheus.py): ensure label factory drops end-user value if disabled by user
* fix(prometheus.py): specify service_type in end user tracking get
* test: fix test
* test: add unit test for prometheus factory
* test: improve test (cover flag not set scenario)
* test(test_prometheus.py): e2e test covering if 'end_user_id' shows up in testing if disabled
scrapes the `/metrics` endpoint and scans text to check if id appears in emitted metrics
* fix(prometheus.py): stringify status code before logging it
* test: add new test image embedding to base llm unit tests
Addresses https://github.com/BerriAI/litellm/issues/6515
* fix(bedrock/embed/multimodal-embeddings): strip data prefix from image urls for bedrock multimodal embeddings
Fix https://github.com/BerriAI/litellm/issues/6515
* feat: initial commit for fireworks ai audio transcription support
Relevant issue: https://github.com/BerriAI/litellm/issues/7134
* test: initial fireworks ai test
* feat(fireworks_ai/): implemented fireworks ai audio transcription config
* fix(utils.py): register fireworks ai audio transcription config, in config manager
* fix(utils.py): add fireworks ai param translation to 'get_optional_params_transcription'
* refactor(fireworks_ai/): define text completion route with model name handling
moves model name handling to specific fireworks routes, as required by their api
* refactor(fireworks_ai/chat): define transform_Request - allows fixing model if accounts/ is missing
* fix: fix linting errors
* fix: fix linting errors
* fix: fix linting errors
* fix: fix linting errors
* fix(handler.py): fix linting errors
* fix(main.py): fix tgai text completion route
* refactor(together_ai/completion): refactors together ai text completion route to just use provider transform request
* refactor: move test_fine_tuning_api out of local_testing
reduces local testing ci/cd time
* fix(utils.py): default custom_llm_provider=None for 'supports_response_schema'
Closes https://github.com/BerriAI/litellm/issues/7397
* refactor(langfuse/): call langfuse logger inside customlogger compatible langfuse class, refactor langfuse logger to use verbose_logger.debug instead of print_verbose
* refactor(litellm_pre_call_utils.py): move config based team callbacks inside dynamic team callback logic
enables simpler unit testing for config-based team callbacks
* fix(proxy/_types.py): handle teamcallbackmetadata - none values
drop none values if present. if all none, use default dict to avoid downstream errors
* test(test_proxy_utils.py): add unit test preventing future issues - asserts team_id in config state not popped off across calls
Fixes https://github.com/BerriAI/litellm/issues/6787
* fix(langfuse_prompt_management.py): add success + failure logging event support
* fix: fix linting error
* test: fix test
* test: fix test
* test: override o1 prompt caching - openai currently not working
* test: fix test
* run azure testing on ci/cd
* update docs on azure batches endpoints
* add input azure.jsonl
* refactor - use separate file for batches endpoints
* fixes for passing custom llm provider to /batch endpoints
* pass custom llm provider to files endpoints
* update azure batches doc
* add info for azure batches api
* update batches endpoints
* use simple helper for raising proxy exception
* update config.yml
* fix imports
* add type hints to get_litellm_params
* update get_litellm_params
* update get_litellm_params
* update get slp
* QOL - stop double logging a create batch operations on custom loggers
* re use slp from og event
* _create_standard_logging_object_for_completed_batch
* fix linting errors
* reduce num changes in PR
* update BATCH_STATUS_POLL_MAX_ATTEMPTS
* fix(prometheus.py): support streaming end user litellm_proxy_total_requests_metric tracking
* fix(prometheus.py): add 'requested_model' and 'end_user_id' to 'litellm_request_total_latency_metric_bucket'
enables latency tracking by end user + requested model
* fix(prometheus.py): add end user, user and requested model metrics to 'litellm_llm_api_latency_metric'
* test: update prometheus unit tests
* test(test_prometheus.py): update tests
* test(test_prometheus.py): fix test
* test: reorder test
* build(model_prices_and_context_window.json): add gemini-1.5-flash context caching
* fix(context_caching/transformation.py): just use last identified cache point
Fixes https://github.com/BerriAI/litellm/issues/6738
* fix(context_caching/transformation.py): pick first contiguous block - handles system message error from google
Fixes https://github.com/BerriAI/litellm/issues/6738
* fix(vertex_ai/gemini/): track context caching tokens
* refactor(gemini/): place transformation.py inside `chat/` folder
make it easy for user to know we support the equivalent endpoint
* fix: fix import
* refactor(vertex_ai/): move vertex_ai cost calc inside vertex_ai/ folder
make it easier to see cost calculation logic
* fix: fix linting errors
* fix: fix circular import
* feat(gemini/cost_calculator.py): support gemini context caching cost calculation
generifies anthropic's cost calculation function and uses it across anthropic + gemini
* build(model_prices_and_context_window.json): add cost tracking for gemini-1.5-flash-002 w/ context caching
Closes https://github.com/BerriAI/litellm/issues/6891
* docs(gemini.md): add gemini context caching architecture diagram
make it easier for user to understand how context caching works
* docs(gemini.md): link to relevant gemini context caching code
* docs(gemini/context_caching): add readme in github, make it easy for dev to know context caching is supported + where to go for code
* fix(llm_cost_calc/utils.py): handle gemini 128k token diff cost calc scenario
* fix(deepseek/cost_calculator.py): support deepseek context caching cost calculation
* test: fix test
* feat(guardrails_endpoint.py): new `/guardrails/list` endpoint
Allow users to view what the available guardrails are
* docs: document new `/guardrails/list` endpoint
* docs(enterprise.md): update docs
* fix(openai/transcription/handler.py): support cost tracking on vtt + srt formats
* fix(openai/transcriptions/handler.py): default to 'verbose_json' response format if 'text' or 'json' response_format received. ensures 'duration' param is received for all audio transcription requests
* fix: fix linting errors
* fix: remove unused import