* feat(deepgram/transformation.py): support reading in deepgram api base from env var
* fix(litellm_logging.py): make skipping log message a .info
easier to see
* docs(logging.md): add doc on turn off all tracking/logging for a request
* test(azure_openai_o1.py): initial commit with testing for azure openai o1 preview model
* fix(base_llm_unit_tests.py): handle azure o1 preview response format tests
skip as o1 on azure doesn't support tool calling yet
* fix: initial commit of azure o1 handler using openai caller
simplifies calling + allows fake streaming logic alr. implemented for openai to just work
* feat(azure/o1_handler.py): fake o1 streaming for azure o1 models
azure does not currently support streaming for o1
* feat(o1_transformation.py): support overriding 'should_fake_stream' on azure/o1 via 'supports_native_streaming' param on model info
enables user to toggle on when azure allows o1 streaming without needing to bump versions
* style(router.py): remove 'give feedback/get help' messaging when router is used
Prevents noisy messaging
Closes https://github.com/BerriAI/litellm/issues/5942
* fix(types/utils.py): handle none logprobs
Fixes https://github.com/BerriAI/litellm/issues/328
* fix(exception_mapping_utils.py): fix error str unbound error
* refactor(azure_ai/): move to openai_like chat completion handler
allows for easy swapping of api base url's (e.g. ai.services.com)
Fixes https://github.com/BerriAI/litellm/issues/7275
* refactor(azure_ai/): move to base llm http handler
* fix(azure_ai/): handle differing api endpoints
* fix(azure_ai/): make sure all unit tests are passing
* fix: fix linting errors
* fix: fix linting errors
* fix: fix linting error
* fix: fix linting errors
* fix(azure_ai/transformation.py): handle extra body param
* fix(azure_ai/transformation.py): fix max retries param handling
* fix: fix test
* test(test_azure_o1.py): fix test
* fix(llm_http_handler.py): support handling azure ai unprocessable entity error
* fix(llm_http_handler.py): handle sync invalid param error for azure ai
* fix(azure_ai/): streaming support with base_llm_http_handler
* fix(llm_http_handler.py): working sync stream calls with unprocessable entity handling for azure ai
* fix: fix linting errors
* fix(llm_http_handler.py): fix linting error
* fix(azure_ai/): handle cohere tool call invalid index param error
* test(azure_openai_o1.py): initial commit with testing for azure openai o1 preview model
* fix(base_llm_unit_tests.py): handle azure o1 preview response format tests
skip as o1 on azure doesn't support tool calling yet
* fix: initial commit of azure o1 handler using openai caller
simplifies calling + allows fake streaming logic alr. implemented for openai to just work
* feat(azure/o1_handler.py): fake o1 streaming for azure o1 models
azure does not currently support streaming for o1
* feat(o1_transformation.py): support overriding 'should_fake_stream' on azure/o1 via 'supports_native_streaming' param on model info
enables user to toggle on when azure allows o1 streaming without needing to bump versions
* style(router.py): remove 'give feedback/get help' messaging when router is used
Prevents noisy messaging
Closes https://github.com/BerriAI/litellm/issues/5942
* test: fix azure o1 test
* test: fix tests
* fix: fix test
* refactor(utils.py): migrate amazon titan config to base config
* refactor(utils.py): refactor bedrock meta invoke model translation to use base config
* refactor(utils.py): move bedrock ai21 to base config
* refactor(utils.py): move bedrock cohere to base config
* refactor(utils.py): move bedrock mistral to use base config
* refactor(utils.py): move all provider optional param translations to using a config
* docs(clientside_auth.md): clarify how to pass vertex region to litellm proxy
* fix(utils.py): handle scenario where custom llm provider is none / empty
* fix: fix get config
* test(test_otel_load_tests.py): widen perf margin
* fix(utils.py): fix get provider config check to handle custom llm's
* fix(utils.py): fix check
* docs(sidebar.js): docs for support model access groups for wildcard routes
* feat(key_management_endpoints.py): add check if user is premium_user when adding model access group for wildcard route
* refactor(docs/): make control model access a root-level doc in proxy sidebar
easier to discover how to control model access on litellm
* docs: more cleanup
* feat(fireworks_ai/): add document inlining support
Enables user to call non-vision models with images/pdfs/etc.
* test(test_fireworks_ai_translation.py): add unit testing for fireworks ai transform inline helper util
* docs(docs/): add document inlining details to fireworks ai docs
* feat(fireworks_ai/): allow user to dynamically disable auto add transform inline
allows client-side disabling of this feature for proxy users
* feat(fireworks_ai/): return 'supports_vision' and 'supports_pdf_input' true on all fireworks ai models
now true as fireworks ai supports document inlining
* test: fix tests
* fix(router.py): add unit testing for _is_model_access_group_for_wildcard_route
* feat(deepgram/): initial e2e support for deepgram stt
Uses deepgram's `/listen` endpoint to transcribe speech to text
Closes https://github.com/BerriAI/litellm/issues/4875
* fix: fix linting errors
* test: fix test
* fix(azure_ai/transformation.py): route ai.services.azure calls to the azure provider route
requires token to be passed in as 'api-key'
Closes https://github.com/BerriAI/litellm/issues/7275
* fix(key_management_endpoints.py): enforce user is member of team, if team_id set and team_id exists in team table
* fix(key_management_endpoints.py): handle assigned_user_id = none
* feat(create_key_button.tsx): allow assigning keys to other users
allows proxy admin to easily assign other people keys
* build(create_key_button.tsx): fix error message display
don't swallow the error message for key creation failure
* build(create_key_button.tsx): allow proxy admin to edit team id
* build(create_key_button.tsx): allow proxy admin to assign keys to other users
* build(edit_user.tsx): clarify how 'user budgets' are applied
* test: remove dup test
* fix(key_management_endpoints.py): don't raise error if team not in db
'
* test: fix test
* init commit ft jobs logging
* add ft logging
* add logging for FineTuningJob
* simple FT Job create test
* simplify Azure fine tuning to use all methods in OAI ft
* update doc string
* add aretrieve_fine_tuning_job
* re use from litellm.proxy.utils import handle_exception_on_proxy
* fix naming
* add /fine_tuning/jobs/{fine_tuning_job_id:path}
* remove unused imports
* update func signature
* run ci/cd again
* ci/cd run again
* fix code qulity
* ci/cd run again
* test: add new test image embedding to base llm unit tests
Addresses https://github.com/BerriAI/litellm/issues/6515
* fix(bedrock/embed/multimodal-embeddings): strip data prefix from image urls for bedrock multimodal embeddings
Fix https://github.com/BerriAI/litellm/issues/6515
* feat: initial commit for fireworks ai audio transcription support
Relevant issue: https://github.com/BerriAI/litellm/issues/7134
* test: initial fireworks ai test
* feat(fireworks_ai/): implemented fireworks ai audio transcription config
* fix(utils.py): register fireworks ai audio transcription config, in config manager
* fix(utils.py): add fireworks ai param translation to 'get_optional_params_transcription'
* refactor(fireworks_ai/): define text completion route with model name handling
moves model name handling to specific fireworks routes, as required by their api
* refactor(fireworks_ai/chat): define transform_Request - allows fixing model if accounts/ is missing
* fix: fix linting errors
* fix: fix linting errors
* fix: fix linting errors
* fix: fix linting errors
* fix(handler.py): fix linting errors
* fix(main.py): fix tgai text completion route
* refactor(together_ai/completion): refactors together ai text completion route to just use provider transform request
* refactor: move test_fine_tuning_api out of local_testing
reduces local testing ci/cd time
* fix(invoke_handler.py): fix mock response iterator to handle tool calling
returns tool call if returned by model response
* fix(prometheus.py): add new 'tokens_by_tag' metric on prometheus
allows tracking 'token usage' by task
* feat(prometheus.py): add input + output token tracking by tag
* feat(prometheus.py): add tag based deployment failure tracking
allows admin to track failure by use-case
* use 1 file for azure batches handling
* add cancel_batch endpoint
* add a cancel batch on open ai
* add cancel_batch endpoint
* add cancel batches to test
* remove unused imports
* test_batches_operations
* update test_batches_operations
* build(model_prices_and_context_window.json): add gemini-1.5-flash context caching
* fix(context_caching/transformation.py): just use last identified cache point
Fixes https://github.com/BerriAI/litellm/issues/6738
* fix(context_caching/transformation.py): pick first contiguous block - handles system message error from google
Fixes https://github.com/BerriAI/litellm/issues/6738
* fix(vertex_ai/gemini/): track context caching tokens
* refactor(gemini/): place transformation.py inside `chat/` folder
make it easy for user to know we support the equivalent endpoint
* fix: fix import
* refactor(vertex_ai/): move vertex_ai cost calc inside vertex_ai/ folder
make it easier to see cost calculation logic
* fix: fix linting errors
* fix: fix circular import
* feat(gemini/cost_calculator.py): support gemini context caching cost calculation
generifies anthropic's cost calculation function and uses it across anthropic + gemini
* build(model_prices_and_context_window.json): add cost tracking for gemini-1.5-flash-002 w/ context caching
Closes https://github.com/BerriAI/litellm/issues/6891
* docs(gemini.md): add gemini context caching architecture diagram
make it easier for user to understand how context caching works
* docs(gemini.md): link to relevant gemini context caching code
* docs(gemini/context_caching): add readme in github, make it easy for dev to know context caching is supported + where to go for code
* fix(llm_cost_calc/utils.py): handle gemini 128k token diff cost calc scenario
* fix(deepseek/cost_calculator.py): support deepseek context caching cost calculation
* test: fix test
* feat(guardrails_endpoint.py): new `/guardrails/list` endpoint
Allow users to view what the available guardrails are
* docs: document new `/guardrails/list` endpoint
* docs(enterprise.md): update docs
* fix(openai/transcription/handler.py): support cost tracking on vtt + srt formats
* fix(openai/transcriptions/handler.py): default to 'verbose_json' response format if 'text' or 'json' response_format received. ensures 'duration' param is received for all audio transcription requests
* fix: fix linting errors
* fix: remove unused import
* fix(utils.py): e2e azure tts cost tracking working
moves tts response obj to include hidden params (allows for litellm call id, etc. to be sent in response headers) ; fixes spend_Tracking_utils logging payload to account for non-base model use-case
Fixes https://github.com/BerriAI/litellm/issues/7223
* fix: fix linting errors
* build(model_prices_and_context_window.json): add bedrock llama 3.3
Closes https://github.com/BerriAI/litellm/issues/7329
* fix(openai.py): fix return type for sync openai httpx response
* test: update test
* fix(spend_tracking_utils.py): fix if check
* fix(spend_tracking_utils.py): fix if check
* test: improve debugging for test
* fix: fix import
* fix(openai.py): fix returning o1 non-streaming requests
fixes issue where fake stream always true for o1
* build(model_prices_and_context_window.json): add 'supports_vision' for o1 models
* fix: add internal server error exception mapping
* fix(base_llm_unit_tests.py): drop temperature from test
* test: mark prompt caching as a flaky test
* fix(health.md): add rerank model health check information
* build(model_prices_and_context_window.json): add gemini 2.0 for google ai studio - pricing + commercial rate limits
* build(model_prices_and_context_window.json): add gemini-2.0 supports audio output = true
* docs(team_model_add.md): clarify allowing teams to add models is an enterprise feature
* fix(o1_transformation.py): add support for 'n', 'response_format' and 'stop' params for o1 and 'stream_options' param for o1-mini
* build(model_prices_and_context_window.json): add 'supports_system_message' to supporting openai models
needed as o1-preview, and o1-mini models don't support 'system message
* fix(o1_transformation.py): translate system message based on if o1 model supports it
* fix(o1_transformation.py): return 'stream' param support if o1-mini/o1-preview
o1 currently doesn't support streaming, but the other model versions do
Fixes https://github.com/BerriAI/litellm/issues/7292
* fix(o1_transformation.py): return tool calling/response_format in supported params if model map says so
Fixes https://github.com/BerriAI/litellm/issues/7292
* fix: fix linting errors
* fix: update '_transform_messages'
* fix(o1_transformation.py): fix provider passed for supported param checks
* test(base_llm_unit_tests.py): skip test if api takes >5s to respond
* fix(utils.py): return false in 'supports_factory' if can't find value
* fix(o1_transformation.py): always return stream + stream_options as supported params + handle stream options being passed in for azure o1
* feat(openai.py): support stream faking natively in openai handler
Allows o1 calls to be faked for just the "o1" model, allows native streaming for o1-mini, o1-preview
Fixes https://github.com/BerriAI/litellm/issues/7292
* fix(openai.py): use inference param instead of original optional param
* fix(hosted_vllm/transformation.py): return fake api key, if none give. Prevents httpx error
Fixes https://github.com/BerriAI/litellm/issues/7291
* test: fix test
* fix(main.py): add hosted_vllm/ support for embeddings endpoint
Closes https://github.com/BerriAI/litellm/issues/7290
* docs(vllm.md): add docs on vllm embeddings usage
* fix(__init__.py): fix sambanova model test
* fix(base_llm_unit_tests.py): skip pydantic obj test if model takes >5s to respond
* fix(factory.py): skip empty text blocks for bedrock user messages
Fixes https://github.com/BerriAI/litellm/issues/7169
* Add support for Gemini 2.0 GoogleSearch tool (#7257)
* Add support for google_search tool in gemini 2.0
* Add/modify tests
* Fix grounding check
* Remove 2.0 grounding test; exclude experimental model in VERTEX_MODELS_TO_NOT_TEST
* Swap order of tools
* DFix formatting
* fix(get_api_base.py): return api base in streaming response
Fixes https://github.com/BerriAI/litellm/issues/7249
Closes https://github.com/BerriAI/litellm/pull/7250
* fix(cost_calculator.py): only set base model to model if not none
Fixes https://github.com/BerriAI/litellm/issues/7223
* fix(cost_calculator.py): enforce stricter order when picking model for cost calculation
* fix(cost_calculator.py): fix '_select_model_name_for_cost_calc' to return model name with region name prefix if provided
* fix(utils.py): fix 'get_model_info()' to handle edge case where model name starts with custom llm provider AND custom llm provider is given
* fix(cost_calculator.py): handle `custom_llm_provider-` scenario
* fix(cost_calculator.py): e2e working tts cost tracking
ensures initial message is passed in, to cost calculator
* fix(factory.py): suppress linting errors
* fix(cost_calculator.py): strip llm provider from model name after selecting cost calc model
* fix(litellm_logging.py): store initial request in 'input' field + accept base_model to be passed in litellm_params directly
* test: handle none env var value in flaky test
* fix(litellm_logging.py): fix linting errors
---------
Co-authored-by: Sam B <samlingx@gmail.com>
* fix(utils.py): fix openai-like api response format parsing
Fixes issue passing structured output to litellm_proxy/ route
* fix(cost_calculator.py): fix whisper transcription cost calc to use file duration, not response time
'
* test: skip test if credentials not found
* docs(input.md): document 'extra_headers' param support
* fix: #7239 to move Nova topK parameter to `additionalModelRequestFields` (#7240)
Co-authored-by: Ryan Hoium <rhoium>
---------
Co-authored-by: ryanh-ai <3118399+ryanh-ai@users.noreply.github.com>
* fix(litellm_logging.py): pass user metadata to langsmith on sdk calls
* fix(litellm_logging.py): pass nested user metadata to logging integration - e.g. langsmith
* fix(exception_mapping_utils.py): catch and clarify watsonx `/text/chat` endpoint not supported error message.
Closes https://github.com/BerriAI/litellm/issues/7213
* fix(watsonx/common_utils.py): accept new 'WATSONX_IAM_URL' env var
allows user to use local watsonx
Fixes https://github.com/BerriAI/litellm/issues/4991
* fix(litellm_logging.py): cleanup unused function
* test: skip bad ibm test
* feat(bedrock/): add bedrock converse top k param
Closes https://github.com/BerriAI/litellm/issues/7087
* Fix bedrock empty content error (#7177)
* add resolver
* handle empty content on bedrock with default content
* use existing default message, tests
* Update tests/llm_translation/test_bedrock_completion.py
* fix tests
* Revert "add resolver"
This reverts commit c717e376ee.
* fallback to empty
---------
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
* fix(factory.py): handle empty content blocks in messages
Fixes https://github.com/BerriAI/litellm/issues/7169
* feat(router.py): add stripped model check to model fallback search
if model_name="openai/gpt-3.5-turbo" and fallback=[{"gpt-3.5-turbo"..}] the fallback should just work as expected
* fix: fix linting error
* fix(factory.py): fix linting error
* fix(factory.py): in base case still support skip empty text blocks
---------
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
* fix(azure/): support passing headers to azure openai endpoints
Fixes https://github.com/BerriAI/litellm/issues/6217
* fix(utils.py): move default tokenizer to just openai
hf tokenizer makes network calls when trying to get the tokenizer - this slows down execution time calls
* fix(router.py): fix pattern matching router - add generic "*" to it as well
Fixes issue where generic "*" model access group wouldn't show up
* fix(pattern_match_deployments.py): match to more specific pattern
match to more specific pattern
allows setting generic wildcard model access group and excluding specific models more easily
* fix(proxy_server.py): fix _delete_deployment to handle base case where db_model list is empty
don't delete all router models b/c of empty list
Fixes https://github.com/BerriAI/litellm/issues/7196
* fix(anthropic/): fix handling response_format for anthropic messages with anthropic api
* fix(fireworks_ai/): support passing response_format + tool call in same message
Addresses https://github.com/BerriAI/litellm/issues/7135
* Revert "fix(fireworks_ai/): support passing response_format + tool call in same message"
This reverts commit 6a30dc6929.
* test: fix test
* fix(replicate/): fix replicate default retry/polling logic
* test: add unit testing for router pattern matching
* test: update test to use default oai tokenizer
* test: mark flaky test
* test: skip flaky test
* fix(acompletion): support fallbacks on acompletion
allows health checks for wildcard routes to use fallback models
* test: update cohere generate api testing
* add max tokens to health check (#7000)
* fix: fix health check test
* test: update testing
---------
Co-authored-by: Cameron <561860+wallies@users.noreply.github.com>