* fix(hosted_vllm/transformation.py): return fake api key, if none given. Prevents httpx error
Fixes https://github.com/BerriAI/litellm/issues/7291
* test: fix test
* fix(main.py): add hosted_vllm/ support for embeddings endpoint
Closes https://github.com/BerriAI/litellm/issues/7290
* docs(vllm.md): add docs on vllm embeddings usage
* fix(__init__.py): fix sambanova model test
* fix(base_llm_unit_tests.py): skip pydantic obj test if model takes >5s to respond
* fix(azure/): support passing headers to azure openai endpoints
Fixes https://github.com/BerriAI/litellm/issues/6217
* fix(utils.py): move default tokenizer to just openai
hf tokenizer makes network calls when fetching the tokenizer - this slows down execution
* fix(router.py): fix pattern matching router - add generic "*" to it as well
Fixes issue where generic "*" model access group wouldn't show up
* fix(pattern_match_deployments.py): match to more specific pattern
allows setting generic wildcard model access group and excluding specific models more easily
* fix(proxy_server.py): fix _delete_deployment to handle base case where db_model list is empty
don't delete all router models b/c of empty list
Fixes https://github.com/BerriAI/litellm/issues/7196
* fix(anthropic/): fix handling response_format for anthropic messages with anthropic api
* fix(fireworks_ai/): support passing response_format + tool call in same message
Addresses https://github.com/BerriAI/litellm/issues/7135
* Revert "fix(fireworks_ai/): support passing response_format + tool call in same message"
This reverts commit 6a30dc6929.
* test: fix test
* fix(replicate/): fix replicate default retry/polling logic
* test: add unit testing for router pattern matching
* test: update test to use default oai tokenizer
* test: mark flaky test
* test: skip flaky test
* fix(acompletion): support fallbacks on acompletion
allows health checks for wildcard routes to use fallback models
* test: update cohere generate api testing
* add max tokens to health check (#7000)
* fix: fix health check test
* test: update testing
---------
Co-authored-by: Cameron <561860+wallies@users.noreply.github.com>
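The router pattern-matching fixes listed above (generic "*" group, match to the more specific pattern) can be illustrated with a minimal sketch. This is not the code in pattern_match_deployments.py; `pattern_specificity` and `match_most_specific` are hypothetical names, and the ranking rule (more literal characters wins, then fewer wildcards) is an assumption about what "more specific" means.

```python
import re
from typing import List, Optional, Tuple


def pattern_specificity(pattern: str) -> Tuple[int, int]:
    """Rank a wildcard pattern (assumed rule): more literal chars, then fewer '*'."""
    wildcards = pattern.count("*")
    return (len(pattern) - wildcards, -wildcards)


def match_most_specific(model: str, patterns: List[str]) -> Optional[str]:
    """Return the most specific wildcard pattern that matches `model`, or None."""
    matches = []
    for pattern in patterns:
        # Escape the literal parts; each "*" becomes ".*" in the regex.
        regex = "^" + ".*".join(re.escape(part) for part in pattern.split("*")) + "$"
        if re.match(regex, model):
            matches.append(pattern)
    if not matches:
        return None
    return max(matches, key=pattern_specificity)


if __name__ == "__main__":
    groups = ["*", "openai/*", "openai/gpt-*"]
    print(match_most_specific("openai/gpt-4o", groups))  # openai/gpt-*
    print(match_most_specific("claude-3", groups))       # *
```

With a ranking like this, a generic "*" access group still matches every model, while a narrower pattern takes precedence whenever it also matches.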
* refactor(fireworks_ai/): inherit from openai like base config
refactors fireworks ai to use a common config
* test: fix import in test
* refactor(watsonx/): refactor watsonx to use llm base config
refactors chat + completion routes to base config path
* fix: fix linting error
* refactor: inherit base llm config for oai compatible routes
* test: fix test
* test: fix test
* refactor(fireworks_ai/): inherit from openai like base config
refactors fireworks ai to use a common config
* test: fix import in test
* refactor(watsonx/): refactor watsonx to use llm base config
refactors chat + completion routes to base config path
* fix: fix linting error
* test: fix test
* fix: fix test
* feat(base_llm): initial commit for common base config class
Addresses code qa critique https://github.com/andrewyng/aisuite/issues/113#issuecomment-2512369132
* feat(base_llm/): add transform request/response abstract methods to base config class
* feat(cohere-+-clarifai): refactor integrations to use common base config class
* fix: fix linting errors
* refactor(anthropic/): move anthropic + vertex anthropic to use base config
* test: fix xai test
* test: fix tests
* fix: fix linting errors
* test: comment out WIP test
* fix(transformation.py): fix is pdf used check
* fix: fix linting error
* fix(main.py): support passing max retries to azure/openai embedding integrations
Fixes https://github.com/BerriAI/litellm/issues/7003
* feat(team_endpoints.py): allow updating team model aliases
Closes https://github.com/BerriAI/litellm/issues/6956
* feat(router.py): allow specifying model id as fallback - skips any cooldown check
Allows a default model to be checked if all models are in cooldown
s/o @micahjsmith
* docs(reliability.md): add fallback to specific model to docs
* fix(utils.py): new 'is_prompt_caching_valid_prompt' helper util
Allows user to identify if messages/tools have prompt caching
Related issue: https://github.com/BerriAI/litellm/issues/6784
* feat(router.py): store model id for prompt caching valid prompt
Allows routing to that model id on subsequent requests
* fix(router.py): only cache if the prompt is a valid prompt-caching prompt
prevents storing unnecessary items in cache
* feat(router.py): support routing prompt caching enabled models to previous deployments
Closes https://github.com/BerriAI/litellm/issues/6784
* test: fix linting errors
* feat(databricks/): convert basemodel to dict and exclude none values
allow passing pydantic message to databricks
* fix(utils.py): ensure all chat completion messages are dict
* (feat) Track `custom_llm_provider` in LiteLLMSpendLogs (#7081)
* add custom_llm_provider to SpendLogsPayload
* add custom_llm_provider to SpendLogs
* add custom llm provider to SpendLogs payload
* test_spend_logs_payload
* Add MLflow to the side bar (#7031)
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
* (bug fix) SpendLogs update DB catch all possible DB errors for retrying (#7082)
* catch DB_CONNECTION_ERROR_TYPES
* fix DB retry mechanism for SpendLog updates
* use DB_CONNECTION_ERROR_TYPES in auth checks
* fix exp back off for writing SpendLogs
* use _raise_failed_update_spend_exception to ensure errors print as NON blocking
* test_update_spend_logs_multiple_batches_with_failure
* (Feat) Add StructuredOutputs support for Fireworks.AI (#7085)
* fix model cost map fireworks ai "supports_response_schema": true,
* fix supports_response_schema
* fix map openai params fireworks ai
* test_map_response_format
* test_map_response_format
* added deepinfra/Meta-Llama-3.1-405B-Instruct (#7084)
* bump: version 1.53.9 → 1.54.0
* fix deepinfra
* litellm db fixes LiteLLM_UserTable (#7089)
* ci/cd queue new release
* fix llama-3.3-70b-versatile
* refactor - use consistent file naming convention `AI21/` -> `ai21` (#7090)
* fix refactor - use consistent file naming convention
* ci/cd run again
* fix naming structure
* fix use consistent naming (#7092)
---------
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Yuki Watanabe <31463517+B-Step62@users.noreply.github.com>
Co-authored-by: ali sayyah <ali.sayyah2@gmail.com>
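The prompt-caching routing entries above (store the model id for a valid prompt-caching prompt, then route later requests to that deployment) could look roughly like the sketch below. It is an illustration only, not the router's implementation: `PromptCacheRouter` and its methods are hypothetical, the in-memory dict stands in for whatever cache the router uses, and treating a `cache_control` marker on a content block as the validity check is an assumption.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional

Message = Dict[str, Any]


class PromptCacheRouter:
    """Hypothetical map from a prompt fingerprint to the deployment that served it."""

    def __init__(self) -> None:
        self._model_id_by_prompt: Dict[str, str] = {}

    @staticmethod
    def _fingerprint(messages: List[Message]) -> str:
        # Stable hash of the messages; sort_keys keeps dict ordering from changing it.
        payload = json.dumps(messages, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    @staticmethod
    def _uses_prompt_caching(messages: List[Message]) -> bool:
        # Assumption: a prompt counts as a prompt-caching prompt when any
        # content block carries a "cache_control" marker.
        for message in messages:
            content = message.get("content")
            if isinstance(content, list) and any(
                isinstance(block, dict) and "cache_control" in block
                for block in content
            ):
                return True
        return False

    def record(self, messages: List[Message], model_id: str) -> None:
        # Only store prompts that can benefit from caching, to keep the map small.
        if self._uses_prompt_caching(messages):
            self._model_id_by_prompt[self._fingerprint(messages)] = model_id

    def lookup(self, messages: List[Message]) -> Optional[str]:
        # Returns the deployment id that served this prompt before, if any.
        return self._model_id_by_prompt.get(self._fingerprint(messages))
```

Skipping the write for prompts without caching markers mirrors the "prevents storing unnecessary items in cache" note above.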
* feat(langfuse/): support langfuse prompt management
Initial working commit for langfuse prompt management support
Closes https://github.com/BerriAI/litellm/issues/6269
* test: update test
* fix(litellm_logging.py): suppress linting error
* fix(cost_calculator.py): move to using `.get_model_info()` for cost per token calculations
ensures cost tracking is reliable - handles edge cases of parsing model cost map
* build(model_prices_and_context_window.json): add 'supports_response_schema' for select tgai models
Fixes https://github.com/BerriAI/litellm/pull/7037#discussion_r1872157329
* build(model_prices_and_context_window.json): remove 'pdf input' and 'vision' support from nova micro in model map
Bedrock docs indicate no support for micro - https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference-supported-models-features.html
* fix(converse_transformation.py): support amazon nova tool use
* fix(opentelemetry): Add missing LLM request type attribute to spans (#7041)
* feat(opentelemetry): add LLM request type attribute to spans
* lint
* fix: curl usage (#7038)
curl -d, --data <data> is lowercase d
curl -D, --dump-header <filename> is uppercase D
references:
https://curl.se/docs/manpage.html#-d
https://curl.se/docs/manpage.html#-D
* fix(spend_tracking.py): handle empty 'id' in model response - when creating spend log
Fixes https://github.com/BerriAI/litellm/issues/7023
* fix(streaming_chunk_builder.py): handle initial id being empty string
Fixes https://github.com/BerriAI/litellm/issues/7023
* fix(anthropic_passthrough_logging_handler.py): add end user cost tracking for anthropic pass through endpoint
* docs(pass_through/): refactor docs location + add table on supported features for pass through endpoints
* feat(anthropic_passthrough_logging_handler.py): support end user cost tracking via anthropic sdk
* docs(anthropic_completion.md): add docs on passing end user param for cost tracking on anthropic sdk
* fix(litellm_logging.py): use standard logging payload if present in kwargs
prevent datadog logging error for pass through endpoints
* docs(bedrock.md): add rerank api usage example to docs
* bugfix/change dummy tool name format (#7053)
* fix viewing keys (#7042)
* ui new build
* build(model_prices_and_context_window.json): add bedrock region models to model cost map (#7044)
* bye (#6982)
* (fix) litellm router.aspeech (#6962)
* doc Migrating Databases
* fix aspeech on router
* test_audio_speech_router
* test_audio_speech_router
* docs show supported providers on batches api doc
* change dummy tool name format
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: yujonglee <yujonglee.dev@gmail.com>
* fix: fix linting errors
* test: update test
* fix(litellm_logging.py): fix pass through check
* fix(test_otel_logging.py): fix test
* fix(cost_calculator.py): update handling for cost per second
* fix(cost_calculator.py): fix cost check
* test: fix test
* (fix) adding public routes when using custom header (#7045)
* get_api_key_from_custom_header
* add test_get_api_key_from_custom_header
* fix testing use 1 file for test user api key auth
* fix test user api key auth
* test_custom_api_key_header_name
* build: update ui build
---------
Co-authored-by: Doron Kopit <83537683+doronkopit5@users.noreply.github.com>
Co-authored-by: lloydchang <lloydchang@gmail.com>
Co-authored-by: hgulersen <haymigulersen@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: yujonglee <yujonglee.dev@gmail.com>
* fix(key_management_endpoints.py): override metadata field value on update
allow user to override tags
* feat(__init__.py): expose new disable_end_user_cost_tracking_prometheus_only metric
allow disabling end user cost tracking on prometheus - fixes cardinality issue
* fix(litellm_pre_call_utils.py): add key/team level enforced params
Fixes https://github.com/BerriAI/litellm/issues/6652
* fix(key_management_endpoints.py): allow user to pass in `enforced_params` as a top level param on /key/generate and /key/update
* docs(enterprise.md): add docs on enforcing required params for llm requests
* Add support of Galadriel API (#7005)
* fix(router.py): robust retry after handling
set retry after time to 0 if >0 healthy deployments. handle base case = 1 deployment
* test(test_router.py): fix test
* feat(bedrock/): add support for 'nova' models
also adds explicit 'converse/' route for simpler routing
* fix: fix 'supports_pdf_input'
return if model supports pdf input on get_model_info
* feat(converse_transformation.py): support bedrock pdf input
* docs(document_understanding.md): add document understanding to docs
* fix(litellm_pre_call_utils.py): fix linting error
* fix(init.py): fix passing of bedrock converse models
* feat(bedrock/converse): support 'response_format={"type": "json_object"}'
* fix(converse_handler.py): fix linting error
* fix(base_llm_unit_tests.py): fix test
* fix: fix test
* test: fix test
* test: fix test
* test: remove duplicate test
---------
Co-authored-by: h4n0 <4738254+h4n0@users.noreply.github.com>
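The "robust retry after handling" entry above (retry after set to 0 when there are healthy deployments, with a base case for a single deployment) can be sketched as follows. The function name and parameters are hypothetical, and the reading of the base case (with only one deployment there is nothing else to route to, so the provider's wait is kept) is an assumption rather than a description of router.py.

```python
def effective_retry_after(
    retry_after: float, healthy_deployments: int, total_deployments: int
) -> float:
    """Seconds to wait before retrying (illustrative sketch, not router.py)."""
    # Assumed base case: a single deployment has no alternative to fail over to,
    # so the provider-supplied wait time is respected.
    if total_deployments <= 1:
        return max(retry_after, 0.0)
    # Another healthy deployment can take the request right away.
    if healthy_deployments > 0:
        return 0.0
    # Nothing healthy is left: fall back to the provider-supplied wait time.
    return max(retry_after, 0.0)


if __name__ == "__main__":
    print(effective_retry_after(30.0, healthy_deployments=2, total_deployments=3))  # 0.0
    print(effective_retry_after(30.0, healthy_deployments=1, total_deployments=1))  # 30.0
```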
* add async_log_failure_event for dd
* use standard logging payload for DD logging
* use standard logging payload for DD
* fix use SLP status
* allow opting into _create_v0_logging_payload
* add unit tests for DD logging payload
* fix dd logging tests
* feat(pass_through_endpoints/): support logging anthropic/gemini pass through calls to langfuse/s3/etc.
* fix(utils.py): allow disabling end user cost tracking with new param
Allows proxy admin to disable cost tracking for end user - keeps prometheus metrics small
* docs(configs.md): add disable_end_user_cost_tracking reference to docs
* feat(key_management_endpoints.py): add support for restricting access to `/key/generate` by team/proxy level role
Enables admin to restrict key creation, and assign team admins to handle distributing keys
* test(test_key_management.py): add unit testing for personal / team key restriction checks
* docs: add docs on restricting key creation
* docs(finetuned_models.md): add new guide on calling finetuned models
* docs(input.md): cleanup anthropic supported params
Closes https://github.com/BerriAI/litellm/issues/6856
* test(test_embedding.py): add test for passing extra headers via embedding
* feat(cohere/embed): pass client to async embedding
* feat(rerank.py): add `/v1/rerank` if missing for cohere base url
Closes https://github.com/BerriAI/litellm/issues/6844
* fix(main.py): pass extra_headers param to openai
Fixes https://github.com/BerriAI/litellm/issues/6836
* fix(litellm_logging.py): don't disable global callbacks when dynamic callbacks are set
Fixes issue where global callbacks - e.g. prometheus were overridden when langfuse was set dynamically
* fix(handler.py): fix linting error
* fix: fix typing
* build: add conftest to proxy_admin_ui_tests/
* test: fix test
* fix: fix linting errors
* test: fix test
* fix: fix pass through testing
* add SecretManager to httpxSpecialProvider
* fix importing AWSSecretsManagerV2
* add unit testing for writing keys to AWS secret manager
* use KeyManagementEventHooks for key/generated events
* use event hooks for key management endpoints
* working AWSSecretsManagerV2
* fix write secret to AWS secret manager on /key/generate
* fix KeyManagementSettings
* use tasks for key management hooks
* add async_delete_secret
* add test for async_delete_secret
* use _delete_virtual_keys_from_secret_manager
* fix test secret manager
* test_key_generate_with_secret_manager_call
* fix check for key_management_settings
* sync_read_secret
* test_aws_secret_manager
* fix sync_read_secret
* use helper to check when _should_read_secret_from_secret_manager
* test_get_secret_with_access_mode
* test - handle eol model claude-2, use claude-2.1 instead
* docs AWS secret manager
* fix test_read_nonexistent_secret
* fix test_supports_response_schema
* ci/cd run again
* fix(__init__.py): add 'watsonx_text' as mapped llm api route
Fixes https://github.com/BerriAI/litellm/issues/6663
* fix(opentelemetry.py): fix passing parallel tool calls to otel
Fixes https://github.com/BerriAI/litellm/issues/6677
* refactor(test_opentelemetry_unit_tests.py): create a base set of unit tests for all logging integrations - test for parallel tool call handling
reduces bugs in repo
* fix(__init__.py): update provider-model mapping to include all known provider-model mappings
Fixes https://github.com/BerriAI/litellm/issues/6669
* feat(anthropic): support passing document in llm api call
* docs(anthropic.md): add pdf anthropic call to docs + expose new 'supports_pdf_input' function
* fix(factory.py): fix linting error