* fix base aws llm
* fix auth with aws role
* test aws base llm
* fix base aws llm init
* run ci/cd again
* fix get_credentials
* ci/cd run again
* _auth_with_aws_role
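For context on the `_auth_with_aws_role` / `get_credentials` commits above, a minimal sketch of role-based AWS auth via STS; it is illustrative only, not the actual litellm helper.

```python
# Illustrative sketch of role-based AWS auth via STS (not the litellm implementation).
import boto3


def assume_role_credentials(role_arn: str, session_name: str = "litellm-session") -> dict:
    """Assume an IAM role and return temporary credentials usable by other AWS clients."""
    sts_client = boto3.client("sts")
    response = sts_client.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


# Example: build a Bedrock runtime client from the assumed-role credentials.
# credentials = assume_role_credentials("arn:aws:iam::123456789012:role/example-role")
# bedrock = boto3.client("bedrock-runtime", **credentials)
```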
* feat(main.py): initial commit for `/image/variations` endpoint support
* refactor(base_llm/): introduce new base llm base config for image variation endpoints
* refactor(openai/image_variations/transformation.py): implement openai image variation transformation handler
* fix: test
* feat(openai/): working openai `/image/variation` endpoint calls via sdk
* feat(topaz/): topaz sync image variation call support
Addresses https://github.com/BerriAI/litellm/issues/7593
* fix(topaz/transformation.py): fix linting errors
* fix(openai/image_variations/handler.py): fix passing json data
* fix(main.py): image_variation/
support async image variation route - `aimage_variation`
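The `/image/variations` commits above add sync and async routes (`aimage_variation`); a hedged usage sketch, assuming OpenAI-style parameters:

```python
# Hypothetical usage of the async image-variation route added above. Parameter names
# mirror the OpenAI images.create_variation API and are assumptions, not a confirmed
# litellm signature.
import asyncio
import litellm


async def main():
    with open("input.png", "rb") as image_file:
        response = await litellm.aimage_variation(
            model="openai/dall-e-2",  # provider-prefixed model name (assumed)
            image=image_file,         # source image to generate variations of
            n=1,                      # number of variations requested
            size="1024x1024",
        )
    print(response)


asyncio.run(main())
```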
* fix(test_get_model_info.py): fix test
* fix: cleanup unused imports
* feat(openai/): add async `/image/variations` endpoint support
* feat(topaz/): support async `/image/variations` calls
* fix: test
* fix(utils.py): fix get_model_info_helper for no model info w/ provider config
handles situation where model info is not known but provider config exists
* test(test_router_fallbacks.py): mark flaky test
* fix: fix unused imports
* test: bump otel load test perf threshold - accounts for current load tests hitting same server
* fix(__init__.py): fix init to exclude pricing-only model cost values from real model names
prevents bad health checks on wildcard routes
* fix(get_llm_provider.py): fix to handle calling bedrock_converse models
* docs(friendliai.md): update FriendliAI documentation and model details
* docs(friendliai.md): remove unused imports for cleaner documentation
* feat: add support for parallel function calling, system messages, and response schema in model configuration
* test(test_utils.py): initial test for valid models
Addresses https://github.com/BerriAI/litellm/issues/7525
* fix: test
* feat(fireworks_ai/transformation.py): support retrieving valid models from fireworks ai endpoint
* refactor(fireworks_ai/): support checking model info on `/v1/models` route
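A rough sketch of what retrieving valid models from the Fireworks AI `/v1/models` route could look like; the base URL and response shape are assumptions based on an OpenAI-compatible endpoint, not the exact litellm code.

```python
# Sketch: list available Fireworks AI models via an OpenAI-compatible /v1/models route.
# Base URL and response shape are assumptions.
import os

import httpx


def get_fireworks_models() -> list[str]:
    api_key = os.environ["FIREWORKS_API_KEY"]
    resp = httpx.get(
        "https://api.fireworks.ai/inference/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    resp.raise_for_status()
    return [m["id"] for m in resp.json()["data"]]
```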
* docs(set_keys.md): update docs to clarify check llm provider api usage
* fix(watsonx/common_utils.py): support 'WATSONX_ZENAPIKEY' for iam auth
* fix(watsonx): read in watsonx token from env var
* fix: fix linting errors
* fix(utils.py): fix provider config check
* style: cleanup unused imports
* refactor(prometheus.py): refactor to remove `_tag` metrics and incorporate in regular metrics
* fix(prometheus.py): handle label values not set in enum values
* feat(prometheus.py): working e2e custom metadata labels
* docs(prometheus.md): update docs to clarify how custom metrics would work
* test(test_prometheus_unit_tests.py): fix test
* test: add unit testing
* fix(prometheus.py): refactor litellm_input_tokens_metric to use label factory
makes adding new metrics easier
* feat(prometheus.py): add 'request_model' to 'litellm_input_tokens_metric'
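A minimal sketch of the label-factory pattern the prometheus commits above refer to: one shared label list / enum-values dict reused across metrics, so adding a label such as `requested_model` is a one-line change. Names are illustrative, not the actual litellm callback code.

```python
# Illustrative label-factory sketch (not the actual litellm prometheus callback).
from prometheus_client import Counter

# single source of truth for label names shared across metrics
STANDARD_LABELS = ["end_user", "hashed_api_key", "api_key_alias", "model", "team", "requested_model"]

litellm_input_tokens_metric = Counter(
    "litellm_input_tokens_metric", "Total input tokens", labelnames=STANDARD_LABELS
)
litellm_output_tokens_metric = Counter(
    "litellm_output_tokens_metric", "Total output tokens", labelnames=STANDARD_LABELS
)


def get_label_values(event_kwargs: dict) -> dict:
    """Build the enum-values dict once and reuse it for every metric emitted on this event."""
    return {label: event_kwargs.get(label, "") for label in STANDARD_LABELS}


# usage on a success event:
# values = get_label_values(event_kwargs)
# litellm_input_tokens_metric.labels(**values).inc(prompt_tokens)
# litellm_output_tokens_metric.labels(**values).inc(completion_tokens)
```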
* refactor(prometheus.py): refactor 'litellm_output_tokens_metric' to use label factory
makes adding new metrics easier
* feat(prometheus.py): emit requested model in 'litellm_output_tokens_metric'
* feat(prometheus.py): support tracking success events with custom metrics
* refactor(prometheus.py): refactor '_set_latency_metrics' to just use the initially created enum values dictionary
reduces scope for missing values
* feat(prometheus.py): refactor all tags to support custom metadata tags
enables metadata tags to be used across for e2e tracking
* fix(prometheus.py): fix requested model on success event enum_values
* test: fix test
* test: fix test
* test: handle filenotfound error
* docs(prometheus.md): add new values to prometheus
* docs(prometheus.md): document adding custom metrics on prometheus
* bump: version 1.56.5 → 1.56.6
* docs(sidebar.js): docs for supporting model access groups for wildcard routes

* feat(key_management_endpoints.py): add check if user is premium_user when adding model access group for wildcard route
* refactor(docs/): make control model access a root-level doc in proxy sidebar
easier to discover how to control model access on litellm
* docs: more cleanup
* feat(fireworks_ai/): add document inlining support
Enables user to call non-vision models with images/pdfs/etc.
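My understanding is that document inlining works by appending Fireworks' `#transform=inline` fragment to document/image URLs; a hedged sketch of that auto-add step (the real litellm helper may differ):

```python
# Assumed mechanism: append Fireworks' "#transform=inline" fragment to document/image URLs
# so a non-vision model can consume them. Illustrative only.
def add_transform_inline(messages: list[dict]) -> list[dict]:
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):
            continue
        for part in content:
            if part.get("type") == "image_url":
                url = part["image_url"]["url"]
                if "#transform=inline" not in url:
                    part["image_url"]["url"] = url + "#transform=inline"
    return messages
```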
* test(test_fireworks_ai_translation.py): add unit testing for fireworks ai transform inline helper util
* docs(docs/): add document inlining details to fireworks ai docs
* feat(fireworks_ai/): allow user to dynamically disable auto add transform inline
allows client-side disabling of this feature for proxy users
* feat(fireworks_ai/): return 'supports_vision' and 'supports_pdf_input' true on all fireworks ai models
now true as fireworks ai supports document inlining
* test: fix tests
* fix(router.py): add unit testing for _is_model_access_group_for_wildcard_route
* feat(deepgram/): initial e2e support for deepgram stt
Uses deepgram's `/listen` endpoint to transcribe speech to text
Closes https://github.com/BerriAI/litellm/issues/4875
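A hedged usage sketch for the new deepgram speech-to-text support, via litellm's OpenAI-compatible transcription API; the model name and response attribute are examples:

```python
# Illustrative usage of deepgram speech-to-text via litellm's transcription API.
# Model name is an example; the response attribute access may differ.
import litellm

with open("speech.mp3", "rb") as audio_file:
    transcript = litellm.transcription(
        model="deepgram/nova-2",  # routes to deepgram's /listen endpoint
        file=audio_file,
    )
print(transcript.text)
```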
* fix: fix linting errors
* test: fix test
* test: add new test image embedding to base llm unit tests
Addresses https://github.com/BerriAI/litellm/issues/6515
* fix(bedrock/embed/multimodal-embeddings): strip data prefix from image urls for bedrock multimodal embeddings
Fix https://github.com/BerriAI/litellm/issues/6515
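A sketch of the data-prefix fix above, assuming a hypothetical helper name: Bedrock multimodal embeddings expect raw base64, not a full data URI.

```python
# Sketch: strip the data-URI prefix from a base64 image before sending it to Bedrock
# multimodal embeddings. Helper name is illustrative.
import re

DATA_URI_PREFIX = re.compile(r"^data:image/[a-zA-Z+]+;base64,")


def strip_data_prefix(image_url: str) -> str:
    """'data:image/png;base64,iVBORw0...' -> 'iVBORw0...'"""
    return DATA_URI_PREFIX.sub("", image_url)
```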
* feat: initial commit for fireworks ai audio transcription support
Relevant issue: https://github.com/BerriAI/litellm/issues/7134
* test: initial fireworks ai test
* feat(fireworks_ai/): implemented fireworks ai audio transcription config
* fix(utils.py): register fireworks ai audio transcription config in config manager
* fix(utils.py): add fireworks ai param translation to 'get_optional_params_transcription'
* refactor(fireworks_ai/): define text completion route with model name handling
moves model name handling to specific fireworks routes, as required by their api
* refactor(fireworks_ai/chat): define transform_request - allows fixing the model name if the `accounts/` prefix is missing
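A sketch of the model-name handling described above, with an illustrative helper name:

```python
# Sketch: Fireworks expects fully-qualified model ids ("accounts/<account>/models/<model>"),
# so prepend the default prefix when it is missing. The exact litellm helper may differ.
def fix_fireworks_model_name(model: str) -> str:
    if model.startswith("accounts/"):
        return model
    return f"accounts/fireworks/models/{model}"


# fix_fireworks_model_name("llama-v3p1-8b-instruct")
# -> "accounts/fireworks/models/llama-v3p1-8b-instruct"
```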
* fix: fix linting errors
* fix: fix linting errors
* fix: fix linting errors
* fix: fix linting errors
* fix(handler.py): fix linting errors
* fix(main.py): fix tgai text completion route
* refactor(together_ai/completion): refactors together ai text completion route to just use provider transform request
* refactor: move test_fine_tuning_api out of local_testing
reduces local testing ci/cd time
* build(model_prices_and_context_window.json): add gemini-1.5-flash context caching
* fix(context_caching/transformation.py): just use last identified cache point
Fixes https://github.com/BerriAI/litellm/issues/6738
* fix(context_caching/transformation.py): pick first contiguous block - handles system message error from google
Fixes https://github.com/BerriAI/litellm/issues/6738
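A sketch of the "first contiguous block" logic, assuming anthropic-style `cache_control` markers; the actual litellm transformation may differ:

```python
# Sketch: only the first contiguous run of messages marked for caching goes to Gemini's
# cachedContents API; everything else stays in the regular request. Illustrative helper.
def split_cache_block(messages: list[dict]) -> tuple[list[dict], list[dict]]:
    cached: list[dict] = []
    remaining: list[dict] = []
    in_block = False
    block_done = False
    for msg in messages:
        is_marked = (msg.get("cache_control") or {}).get("type") == "ephemeral"
        if is_marked and not block_done:
            in_block = True
            cached.append(msg)
        else:
            if in_block:
                block_done = True  # the first contiguous block has ended
            remaining.append(msg)
    return cached, remaining
```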
* fix(vertex_ai/gemini/): track context caching tokens
* refactor(gemini/): place transformation.py inside `chat/` folder
make it easy for user to know we support the equivalent endpoint
* fix: fix import
* refactor(vertex_ai/): move vertex_ai cost calc inside vertex_ai/ folder
make it easier to see cost calculation logic
* fix: fix linting errors
* fix: fix circular import
* feat(gemini/cost_calculator.py): support gemini context caching cost calculation
generifies anthropic's cost calculation function and uses it across anthropic + gemini
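A back-of-the-envelope sketch of the generic context-caching cost formula shared across anthropic + gemini; argument names are illustrative:

```python
# Sketch of a generic context-caching cost calculation. Argument names are illustrative.
def cost_with_context_caching(
    prompt_tokens: int,
    cache_read_tokens: int,
    completion_tokens: int,
    input_cost_per_token: float,
    cache_read_cost_per_token: float,
    output_cost_per_token: float,
) -> float:
    # cached prompt tokens are billed at the (cheaper) cache-read rate,
    # the rest of the prompt at the normal input rate
    uncached_prompt_tokens = prompt_tokens - cache_read_tokens
    return (
        uncached_prompt_tokens * input_cost_per_token
        + cache_read_tokens * cache_read_cost_per_token
        + completion_tokens * output_cost_per_token
    )
```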
* build(model_prices_and_context_window.json): add cost tracking for gemini-1.5-flash-002 w/ context caching
Closes https://github.com/BerriAI/litellm/issues/6891
* docs(gemini.md): add gemini context caching architecture diagram
make it easier for user to understand how context caching works
* docs(gemini.md): link to relevant gemini context caching code
* docs(gemini/context_caching): add readme in github, make it easy for dev to know context caching is supported + where to go for code
* fix(llm_cost_calc/utils.py): handle gemini 128k token diff cost calc scenario
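A sketch of the 128k threshold handling, assuming model_prices_and_context_window.json-style rate keys:

```python
# Sketch: Gemini 1.5 pricing jumps for prompts above 128k tokens, so the cost calc has to
# pick the right per-token rate. Rate key names are assumptions.
def gemini_prompt_cost(prompt_tokens: int, model_info: dict) -> float:
    if prompt_tokens > 128_000 and "input_cost_per_token_above_128k_tokens" in model_info:
        rate = model_info["input_cost_per_token_above_128k_tokens"]
    else:
        rate = model_info["input_cost_per_token"]
    return prompt_tokens * rate
```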
* fix(deepseek/cost_calculator.py): support deepseek context caching cost calculation
* test: fix test
* ui fix - allow searching model list + fix bug on filtering
* qa fix - use correct provider name for azure_text
* ui wrap content onto next line
* ui fix - allow selecting current UI session when logging in
* ui session budgets
* fix(hosted_vllm/transformation.py): return fake api key if none given - prevents httpx error
Fixes https://github.com/BerriAI/litellm/issues/7291
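The fix above boils down to falling back to a placeholder key when none is supplied; a minimal sketch with an illustrative helper name:

```python
# Sketch: the OpenAI client requires *some* api_key, so fall back to a placeholder when the
# user's self-hosted vLLM server doesn't need one. Helper name is illustrative.
def resolve_hosted_vllm_api_key(api_key: str | None) -> str:
    return api_key or "fake-api-key"  # placeholder value; avoids httpx/OpenAI client errors
```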
* test: fix test
* fix(main.py): add hosted_vllm/ support for embeddings endpoint
Closes https://github.com/BerriAI/litellm/issues/7290
* docs(vllm.md): add docs on vllm embeddings usage
* fix(__init__.py): fix sambanova model test
* fix(base_llm_unit_tests.py): skip pydantic obj test if model takes >5s to respond
* fix(azure/): support passing headers to azure openai endpoints
Fixes https://github.com/BerriAI/litellm/issues/6217
* fix(utils.py): move default tokenizer to just openai
hf tokenizer makes network calls when trying to get the tokenizer - this slows down execution time calls
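A sketch of why the default tokenizer change helps: tiktoken's encoding is cached locally after the first load, while loading a huggingface tokenizer can trigger network calls on every fresh load.

```python
# Sketch: count tokens with the OpenAI tokenizer (tiktoken, cached locally) instead of
# huggingface AutoTokenizer.from_pretrained(), which may hit the network for tokenizer files.
import tiktoken

_encoding = tiktoken.get_encoding("cl100k_base")  # OpenAI BPE encoding


def count_tokens(text: str) -> int:
    return len(_encoding.encode(text))
```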
* fix(router.py): fix pattern matching router - add generic "*" to it as well
Fixes issue where generic "*" model access group wouldn't show up
* fix(pattern_match_deployments.py): match to more specific pattern
allows setting generic wildcard model access group and excluding specific models more easily
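A sketch of preferring the more specific pattern, using a hypothetical helper; the real router logic is more involved:

```python
# Sketch: when both a generic wildcard ("*") and a narrower pattern ("openai/*") match,
# prefer the more specific one so users can set a catch-all access group while still
# carving out exceptions. Illustrative only.
import fnmatch


def best_match(model: str, patterns: list[str]) -> str | None:
    matches = [p for p in patterns if fnmatch.fnmatch(model, p)]
    if not matches:
        return None
    # patterns with more literal characters are treated as more specific than a bare "*"
    return max(matches, key=lambda p: len(p.replace("*", "")))


# best_match("openai/gpt-4o", ["*", "openai/*"]) -> "openai/*"
```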
* fix(proxy_server.py): fix _delete_deployment to handle base case where db_model list is empty
don't delete all router models b/c of empty list
Fixes https://github.com/BerriAI/litellm/issues/7196
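A minimal sketch of the empty-list guard described above, with illustrative names:

```python
# Sketch: if the DB returns no models, bail out instead of diffing against an empty list
# and deleting every router deployment. Names are illustrative.
def delete_stale_deployments(router_models: list[str], db_models: list[str]) -> list[str]:
    if len(db_models) == 0:
        return []  # nothing to compare against - do NOT treat this as "delete everything"
    return [m for m in router_models if m not in db_models]
```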
* fix(anthropic/): fix handling response_format for anthropic messages with anthropic api
* fix(fireworks_ai/): support passing response_format + tool call in same message
Addresses https://github.com/BerriAI/litellm/issues/7135
* Revert "fix(fireworks_ai/): support passing response_format + tool call in same message"
This reverts commit 6a30dc6929.
* test: fix test
* fix(replicate/): fix replicate default retry/polling logic
* test: add unit testing for router pattern matching
* test: update test to use default oai tokenizer
* test: mark flaky test
* test: skip flaky test
* fix(acompletion): support fallbacks on acompletion
allows health checks for wildcard routes to use fallback models
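A hedged usage sketch for fallbacks on `acompletion`; model names are examples:

```python
# Illustrative usage of fallbacks on acompletion (used by wildcard-route health checks).
import asyncio

import litellm


async def main():
    response = await litellm.acompletion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "ping"}],
        fallbacks=["claude-3-haiku-20240307"],  # tried if the primary model call fails
    )
    print(response.choices[0].message.content)


asyncio.run(main())
```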
* test: update cohere generate api testing
* add max tokens to health check (#7000)
* fix: fix health check test
* test: update testing
---------
Co-authored-by: Cameron <561860+wallies@users.noreply.github.com>