* fix(azure/chat/gpt_transformation.py): add 'prediction' as a support azure param
Closes https://github.com/BerriAI/litellm/issues/8500
* build(model_prices_and_context_window.json): add new 'gemini-2.0-pro-exp-02-05' model
* style: cleanup invalid json trailing commma
* feat(utils.py): support passing 'tokenizer_config' to register_prompt_template
enables passing complete tokenizer config of model to litellm
Allows calling deepseek on bedrock with the correct prompt template
* fix(utils.py): fix register_prompt_template for custom model names
* test(test_prompt_factory.py): fix test
* test(test_completion.py): add e2e test for bedrock invoke deepseek ft model
* feat(base_invoke_transformation.py): support hf_model_name param for bedrock invoke calls
enables proxy admin to set base model for ft bedrock deepseek model
* feat(bedrock/invoke): support deepseek_r1 route for bedrock
makes it easy to apply the right chat template to that call
* feat(constants.py): store deepseek r1 chat template - allow user to get correct response from deepseek r1 without extra work
* test(test_completion.py): add e2e mock test for bedrock deepseek
* docs(bedrock.md): document new deepseek_r1 route for bedrock
allows us to use the right config
* fix(exception_mapping_utils.py): catch read operation timeout
* initial transform for invoke
* invoke transform_response
* working - able to make request
* working get_complete_url
* working - invoke now runs on llm_http_handler
* fix unused imports
* track litellm overhead ms
* working stream request
* sign_request transform
* sign_request update
* use has_async_custom_stream_wrapper property
* use get_async_custom_stream_wrapper in base llm http handler
* fix make_call in invoke handler
* fix invoke with streaming get_async_custom_stream_wrapper
* working bedrock async streaming with invoke
* fix make call handler for bedrock
* test_all_model_configs
* fix test_bedrock_custom_prompt_template
* sync streaming for bedrock invoke
* fix _add_stream_param_to_request_body
* test_async_text_completion_bedrock
* fix transform_request
* fix get_supported_openai_params
* fix test supports tool choice
* fix test_supports_tool_choice
* add unit test coverage for bedrock invoke transform
* fix location of transformation files
* update import loc
* fix bedrock invoke unit tests
* fix import for max completion tokens
* fix(o_series_transformation.py): add 'reasoning_effort' as o series model param
Closes https://github.com/BerriAI/litellm/issues/8182
* fix(main.py): ensure `reasoning_effort` is a mapped openai param
* refactor(azure/): rename o1_[x] files to o_series_[x]
* refactor(base_llm_unit_tests.py): refactor testing for o series reasoning effort
* test(test_azure_o_series.py): have azure o series tests correctly inherit from base o series model tests
* feat(base_utils.py): support translating 'developer' role to 'system' role for non-openai providers
Makes it easy to switch from openai to anthropic
* fix: fix linting errors
* fix(base_llm_unit_tests.py): fix test
* fix(main.py): add missing param
* fix: support azure o3 model family for fake streaming workaround (#8162)
* fix: support azure o3 model family for fake streaming workaround
* refactor: rename helper to is_o_series_model for clarity
* update function calling parameters for o3 models (#8178)
* refactor(o1_transformation.py): refactor o1 config to be o series config, expand o series model check to o3
ensures max_tokens is correctly translated for o3
* feat(openai/): refactor o1 files to be 'o_series' files
expands naming to cover o3
* fix(azure/chat/o1_handler.py): azure openai is an instance of openai - was causing resets
* test(test_azure_o_series.py): assert stream faked for azure o3 mini
Resolves https://github.com/BerriAI/litellm/pull/8162
* fix(o1_transformation.py): fix o1 transformation logic to handle explicit o1_series routing
* docs(azure.md): update doc with `o_series/` model name
---------
Co-authored-by: byrongrogan <47910641+byrongrogan@users.noreply.github.com>
Co-authored-by: Low Jian Sheng <15527690+lowjiansheng@users.noreply.github.com>
* add support for using llama spec with bedrock
* fix get_bedrock_invoke_provider
* add support for using bedrock provider in mappings
* working request
* test_bedrock_custom_deepseek
* test_bedrock_custom_deepseek
* fix _get_model_id_for_llama_like_model
* test_bedrock_custom_deepseek
* doc DeepSeek-R1-Distill-Llama-70B
* test_bedrock_custom_deepseek
* Litellm dev 01 29 2025 p4 (#8107)
* fix(key_management_endpoints.py): always get db team
Fixes https://github.com/BerriAI/litellm/issues/7983
* test(test_key_management.py): add unit test enforcing check_db_only is always true on key generate checks
* test: fix test
* test: skip gemini thinking
* Litellm dev 01 29 2025 p3 (#8106)
* fix(__init__.py): reduces size of __init__.py and reduces scope for errors by using correct param
* refactor(__init__.py): refactor init by cleaning up redundant params
* refactor(__init__.py): move more constants into constants.py
cleanup root
* refactor(__init__.py): more cleanup
* feat(__init__.py): expose new 'disable_hf_tokenizer_download' param
enables hf model usage in offline env
* docs(config_settings.md): document new disable_hf_tokenizer_download param
* fix: fix linting error
* fix: fix unsafe comparison
* test: fix test
* docs(public_teams.md): add doc showing how to expose public teams for users to join
* docs: add beta disclaimer on public teams
* test: update tests
* feat(lowest_tpm_rpm_v2.py): fix redis cache check to use >= instead of >
makes it consistent
* test(test_custom_guardrails.py): add more unit testing on default on guardrails
ensure it runs if user sent guardrail list is empty
* docs(quick_start.md): clarify default on guardrails run even if user guardrails list contains other guardrails
* refactor(litellm_logging.py): refactor no-log to helper util
allows for more consistent behavior
* feat(litellm_logging.py): add event hook to verbose logs
* fix(litellm_logging.py): add unit testing to ensure `litellm.disable_no_log_param` is respected
* docs(logging.md): document how to disable 'no-log' param
* test: fix test to handle feb
* test: cleanup old bedrock model
* fix: fix router check
* test(test_completion_cost.py): add unit testing to ensure all bedrock models with region name have cost tracked
* feat: initial script to get bedrock pricing from amazon api
ensures bedrock pricing is accurate
* build(model_prices_and_context_window.json): correct bedrock model prices based on api check
ensures accurate bedrock pricing
* ci(config.yml): add bedrock pricing check to ci/cd
ensures litellm always maintains up-to-date pricing for bedrock models
* ci(config.yml): add beautiful soup to ci/cd
* test: bump groq model
* test: fix test
* fix(utils.py): move adding custom logger callback to success event into separate function + don't add success callback to failure event
if user is explicitly choosing 'success' callback, don't log failure as well
* test(test_utils.py): add unit test to ensure custom logger callback only adds callback to specific event
* fix(utils.py): remove string from list of callbacks once corresponding callback class is added
prevents floating values - simplifies testing
* fix(utils.py): fix linting error
* test: cleanup args before test
* test: fix test
* test: update test
* test: fix test
* fix(utils.py): don't pass 'anthropic-beta' header to vertex - will cause request to fail
* fix(utils.py): add flag to allow user to disable filtering invalid headers
ensure user can control behaviour
* style(utils.py): cleanup message
* test(test_utils.py): add unit test to cover invalid header filtering
* fix(proxy_server.py): fix custom openapi schema generation
* fix(utils.py): pass extra headers if set
* fix(main.py): fix image variation to use 'client' param
* refactor: initial commit for using separate sync vs. async transformation routes for bedrock
ensures no blocking calls e.g. when converting image url to b64
* perf(converse_transformation.py): make bedrock converse transformation async
asyncify's the bedrock message transformation - useful for handling image urls for bedrock
* fix(converse_handler.py): fix logging for async streaming
* style: cleanup unused imports
* fix base aws llm
* fix auth with aws role
* test aws base llm
* fix base aws llm init
* run ci/cd again
* fix get_credentials
* ci/cd run again
* _auth_with_aws_role
* feat(main.py): initial commit for `/image/variations` endpoint support
* refactor(base_llm/): introduce new base llm base config for image variation endpoints
* refactor(openai/image_variations/transformation.py): implement openai image variation transformation handler
* fix: test
* feat(openai/): working openai `/image/variation` endpoint calls via sdk
* feat(topaz/): topaz sync image variation call support
Addresses https://github.com/BerriAI/litellm/issues/7593
'
* fix(topaz/transformation.py): fix linting errors
* fix(openai/image_variations/handler.py): fix passing json data
* fix(main.py): image_variation/
support async image variation route - `aimage_variation`
* fix(test_get_model_info.py): fix test
* fix: cleanup unused imports
* feat(openai/): add async `/image/variations` endpoint support
* feat(topaz/): support async `/image/variations` calls
* fix: test
* fix(utils.py): fix get_model_info_helper for no model info w/ provider config
handles situation where model info is not known but provider config exists
* test(test_router_fallbacks.py): mark flaky test
* fix: fix unused imports
* test: bump otel load test perf threshold - accounts for current load tests hitting same server
* fix(__init__.py): fix init to exclude pricing-only model cost values from real model names
prevents bad health checks on wildcard routes
* fix(get_llm_provider.py): fix to handle calling bedrock_converse models
* docs(friendliai.md): update FriendliAI documentation and model details
* docs(friendliai.md): remove unused imports for cleaner documentation
* feat: add support for parallel function calling, system messages, and response schema in model configuration
* test(test_utils.py): initial test for valid models
Addresses https://github.com/BerriAI/litellm/issues/7525
* fix: test
* feat(fireworks_ai/transformation.py): support retrieving valid models from fireworks ai endpoint
* refactor(fireworks_ai/): support checking model info on `/v1/models` route
* docs(set_keys.md): update docs to clarify check llm provider api usage
* fix(watsonx/common_utils.py): support 'WATSONX_ZENAPIKEY' for iam auth
* fix(watsonx): read in watsonx token from env var
* fix: fix linting errors
* fix(utils.py): fix provider config check
* style: cleanup unused imports
* refactor(prometheus.py): refactor to remove `_tag` metrics and incorporate in regular metrics
* fix(prometheus.py): handle label values not set in enum values
* feat(prometheus.py): working e2e custom metadata labels
* docs(prometheus.md): update docs to clarify how custom metrics would work
* test(test_prometheus_unit_tests.py): fix test
* test: add unit testing
* fix(prometheus.py): refactor litellm_input_tokens_metric to use label factory
makes adding new metrics easier
* feat(prometheus.py): add 'request_model' to 'litellm_input_tokens_metric'
* refactor(prometheus.py): refactor 'litellm_output_tokens_metric' to use label factory
makes adding new metrics easier
* feat(prometheus.py): emit requested model in 'litellm_output_tokens_metric'
* feat(prometheus.py): support tracking success events with custom metrics
* refactor(prometheus.py): refactor '_set_latency_metrics' to just use the initially created enum values dictionary
reduces scope for missing values
* feat(prometheus.py): refactor all tags to support custom metadata tags
enables metadata tags to be used across for e2e tracking
* fix(prometheus.py): fix requested model on success event enum_values
* test: fix test
* test: fix test
* test: handle filenotfound error
* docs(prometheus.md): add new values to prometheus
* docs(prometheus.md): document adding custom metrics on prometheus
* bump: version 1.56.5 → 1.56.6
* docs(sidebar.js): docs for support model access groups for wildcard routes
* feat(key_management_endpoints.py): add check if user is premium_user when adding model access group for wildcard route
* refactor(docs/): make control model access a root-level doc in proxy sidebar
easier to discover how to control model access on litellm
* docs: more cleanup
* feat(fireworks_ai/): add document inlining support
Enables user to call non-vision models with images/pdfs/etc.
* test(test_fireworks_ai_translation.py): add unit testing for fireworks ai transform inline helper util
* docs(docs/): add document inlining details to fireworks ai docs
* feat(fireworks_ai/): allow user to dynamically disable auto add transform inline
allows client-side disabling of this feature for proxy users
* feat(fireworks_ai/): return 'supports_vision' and 'supports_pdf_input' true on all fireworks ai models
now true as fireworks ai supports document inlining
* test: fix tests
* fix(router.py): add unit testing for _is_model_access_group_for_wildcard_route
* feat(deepgram/): initial e2e support for deepgram stt
Uses deepgram's `/listen` endpoint to transcribe speech to text
Closes https://github.com/BerriAI/litellm/issues/4875
* fix: fix linting errors
* test: fix test
* test: add new test image embedding to base llm unit tests
Addresses https://github.com/BerriAI/litellm/issues/6515
* fix(bedrock/embed/multimodal-embeddings): strip data prefix from image urls for bedrock multimodal embeddings
Fix https://github.com/BerriAI/litellm/issues/6515
* feat: initial commit for fireworks ai audio transcription support
Relevant issue: https://github.com/BerriAI/litellm/issues/7134
* test: initial fireworks ai test
* feat(fireworks_ai/): implemented fireworks ai audio transcription config
* fix(utils.py): register fireworks ai audio transcription config, in config manager
* fix(utils.py): add fireworks ai param translation to 'get_optional_params_transcription'
* refactor(fireworks_ai/): define text completion route with model name handling
moves model name handling to specific fireworks routes, as required by their api
* refactor(fireworks_ai/chat): define transform_Request - allows fixing model if accounts/ is missing
* fix: fix linting errors
* fix: fix linting errors
* fix: fix linting errors
* fix: fix linting errors
* fix(handler.py): fix linting errors
* fix(main.py): fix tgai text completion route
* refactor(together_ai/completion): refactors together ai text completion route to just use provider transform request
* refactor: move test_fine_tuning_api out of local_testing
reduces local testing ci/cd time
* build(model_prices_and_context_window.json): add gemini-1.5-flash context caching
* fix(context_caching/transformation.py): just use last identified cache point
Fixes https://github.com/BerriAI/litellm/issues/6738
* fix(context_caching/transformation.py): pick first contiguous block - handles system message error from google
Fixes https://github.com/BerriAI/litellm/issues/6738
* fix(vertex_ai/gemini/): track context caching tokens
* refactor(gemini/): place transformation.py inside `chat/` folder
make it easy for user to know we support the equivalent endpoint
* fix: fix import
* refactor(vertex_ai/): move vertex_ai cost calc inside vertex_ai/ folder
make it easier to see cost calculation logic
* fix: fix linting errors
* fix: fix circular import
* feat(gemini/cost_calculator.py): support gemini context caching cost calculation
generifies anthropic's cost calculation function and uses it across anthropic + gemini
* build(model_prices_and_context_window.json): add cost tracking for gemini-1.5-flash-002 w/ context caching
Closes https://github.com/BerriAI/litellm/issues/6891
* docs(gemini.md): add gemini context caching architecture diagram
make it easier for user to understand how context caching works
* docs(gemini.md): link to relevant gemini context caching code
* docs(gemini/context_caching): add readme in github, make it easy for dev to know context caching is supported + where to go for code
* fix(llm_cost_calc/utils.py): handle gemini 128k token diff cost calc scenario
* fix(deepseek/cost_calculator.py): support deepseek context caching cost calculation
* test: fix test
* ui fix - allow searching model list + fix bug on filtering
* qa fix - use correct provider name for azure_text
* ui wrap content onto next line
* ui fix - allow selecting current UI session when logging in
* ui session budgets