* feat(langfuse/): support langfuse prompt management
Initial working commit for langfuse prompt management support
Closes https://github.com/BerriAI/litellm/issues/6269
* test: update test
* fix(litellm_logging.py): suppress linting error
* fix(cost_calculator.py): move to using `.get_model_info()` for cost per token calculations
ensures cost tracking is reliable - handles edge cases of parsing model cost map
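For illustration, a rough sketch of the idea - deriving per-request cost from the model cost map via `get_model_info()` instead of parsing the cost map directly. This is a hedged example, not the actual cost_calculator.py code; only the documented `litellm.get_model_info` helper is assumed.

```python
# Sketch only: compute request cost from get_model_info(), not the actual cost_calculator.py logic.
import litellm

def cost_for_usage(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    info = litellm.get_model_info(model=model)  # raises if the model is unknown to the cost map
    input_cost = info.get("input_cost_per_token") or 0.0
    output_cost = info.get("output_cost_per_token") or 0.0
    return prompt_tokens * input_cost + completion_tokens * output_cost

print(cost_for_usage("gpt-4o-mini", prompt_tokens=1_000, completion_tokens=200))
```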
* build(model_prices_and_context_window.json): add 'supports_response_schema' for select tgai models
Fixes https://github.com/BerriAI/litellm/pull/7037#discussion_r1872157329
* build(model_prices_and_context_window.json): remove 'pdf input' and 'vision' support from nova micro in model map
Bedrock docs indicate no support for micro - https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference-supported-models-features.html
* fix(converse_transformation.py): support amazon nova tool use
* fix(opentelemetry): Add missing LLM request type attribute to spans (#7041)
* feat(opentelemetry): add LLM request type attribute to spans
* lint
* fix: curl usage (#7038)
curl -d, --data <data> is lowercase d
curl -D, --dump-header <filename> is uppercase D
references:
https://curl.se/docs/manpage.html#-d
https://curl.se/docs/manpage.html#-D

* fix(spend_tracking.py): handle empty 'id' in model response - when creating spend log
Fixes https://github.com/BerriAI/litellm/issues/7023
* fix(streaming_chunk_builder.py): handle initial id being empty string
Fixes https://github.com/BerriAI/litellm/issues/7023
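As a hedged illustration of the guard described in the two fixes above (hypothetical helper name, not the actual spend_tracking / streaming_chunk_builder code):

```python
# Fall back to a generated id when the response/chunk id is missing or an empty string.
import uuid
from typing import Optional

def ensure_response_id(response_id: Optional[str]) -> str:
    if response_id:  # covers both None and ""
        return response_id
    return f"chatcmpl-{uuid.uuid4()}"

assert ensure_response_id("") != ""
assert ensure_response_id("chatcmpl-123") == "chatcmpl-123"
```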
* fix(anthropic_passthrough_logging_handler.py): add end user cost tracking for anthropic pass through endpoint
* docs(pass_through/): refactor docs location + add table on supported features for pass through endpoints
* feat(anthropic_passthrough_logging_handler.py): support end user cost tracking via anthropic sdk
* docs(anthropic_completion.md): add docs on passing end user param for cost tracking on anthropic sdk
* fix(litellm_logging.py): use standard logging payload if present in kwargs
prevent datadog logging error for pass through endpoints
* docs(bedrock.md): add rerank api usage example to docs
* bugfix/change dummy tool name format (#7053)
* fix viewing keys (#7042)
* ui new build
* build(model_prices_and_context_window.json): add bedrock region models to model cost map (#7044)
* bye (#6982)
* (fix) litellm router.aspeech (#6962)
* doc Migrating Databases
* fix aspeech on router
* test_audio_speech_router
* test_audio_speech_router
* docs show supported providers on batches api doc
* change dummy tool name format
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: yujonglee <yujonglee.dev@gmail.com>
* fix: fix linting errors
* test: update test
* fix(litellm_logging.py): fix pass through check
* fix(test_otel_logging.py): fix test
* fix(cost_calculator.py): update handling for cost per second
* fix(cost_calculator.py): fix cost check
* test: fix test
* (fix) adding public routes when using custom header (#7045)
* get_api_key_from_custom_header
* add test_get_api_key_from_custom_header
* fix testing use 1 file for test user api key auth
* fix test user api key auth
* test_custom_api_key_header_name
* build: update ui build
---------
Co-authored-by: Doron Kopit <83537683+doronkopit5@users.noreply.github.com>
Co-authored-by: lloydchang <lloydchang@gmail.com>
Co-authored-by: hgulersen <haymigulersen@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: yujonglee <yujonglee.dev@gmail.com>
* fix(together_ai/chat): only return response_format + tools for supported models
Fixes https://github.com/BerriAI/litellm/issues/6972
* feat(bedrock/rerank): initial working commit for bedrock rerank api support
Closes https://github.com/BerriAI/litellm/issues/7021
* feat(bedrock/rerank): async bedrock rerank api support
Addresses https://github.com/BerriAI/litellm/issues/7021
* build(model_prices_and_context_window.json): add 'supports_prompt_caching' for bedrock models + cleanup cross-region from model list (duplicate information - led to inconsistencies)
* docs(json_mode.md): clarify model support for json schema
Closes https://github.com/BerriAI/litellm/issues/6998
* fix(_service_logger.py): handle dd callback in list
ensure failed spend tracking is logged to datadog
* feat(converse_transformation.py): translate from anthropic format to bedrock format
Closes https://github.com/BerriAI/litellm/issues/7030
* fix: fix linting errors
* test: fix test
* add the logprobs param for fireworks ai (#6915)
* add the logprobs param for fireworks ai
* (feat) pass through llm endpoints - add `PATCH` support (vertex context caching requires for update ops) (#6924)
* add PATCH for pass through endpoints
* test_pass_through_routes_support_all_methods
* sonnet supports pdf, haiku does not (#6928)
* (feat) DataDog Logger - Add Failure logging + use Standard Logging payload (#6929)
* add async_log_failure_event for dd
* use standard logging payload for DD logging
* use standard logging payload for DD
* fix use SLP status
* allow opting into _create_v0_logging_payload
* add unit tests for DD logging payload
* fix dd logging tests
* (feat) log proxy auth errors on datadog (#6931)
* add new dd type for auth errors
* add async_log_proxy_authentication_errors
* fix comment
* use async_log_proxy_authentication_errors
* test_datadog_post_call_failure_hook
* test_async_log_proxy_authentication_errors
* (feat) Allow using include to include external YAML files in a config.yaml (#6922)
* add helper to process includes directive on yaml
* add doc on config management
* unit tests for `include` on config.yaml
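A minimal sketch of the `include` idea, assuming PyYAML and a top-level `include` list in config.yaml; hypothetical helper, not the actual proxy implementation:

```python
# Merge files referenced by an `include` list into the parent config (keys in the parent win).
import os
import yaml

def load_config_with_includes(path: str) -> dict:
    with open(path) as f:
        config = yaml.safe_load(f) or {}
    base_dir = os.path.dirname(os.path.abspath(path))
    for include_path in config.pop("include", []) or []:
        with open(os.path.join(base_dir, include_path)) as f:
            included = yaml.safe_load(f) or {}
        for key, value in included.items():
            config.setdefault(key, value)
    return config
```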
* bump: version 1.52.16 → 1.53.0
* (feat) dd logger - set tags according to the values set by those env vars (#6933)
* dd logger, inherit from .envs
* test_datadog_payload_environment_variables
* fix _get_datadog_service
* build(ui/): update ui build
* bump: version 1.53.0 → 1.53.1
* Revert "(feat) Allow using include to include external YAML files in a config.yaml (#6922)"
This reverts commit 68e59824a3.
* LiteLLM Minor Fixes & Improvements (11/26/2024) (#6913)
* docs(config_settings.md): document all router_settings
* ci(config.yml): add router_settings doc test to ci/cd
* test: debug test on ci/cd
* test: debug ci/cd test
* test: fix test
* fix(team_endpoints.py): skip invalid team object. don't fail `/team/list` call
Causes downstream errors if ui just fails to load team list
* test(base_llm_unit_tests.py): add 'response_format={"type": "text"}' test to base_llm_unit_tests
adds complete coverage for all 'response_format' values to ci/cd
* feat(router.py): support wildcard routes in `get_router_model_info()`
Addresses https://github.com/BerriAI/litellm/issues/6914
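For context, a hedged sketch of wildcard-route matching (e.g. a deployment configured as `gemini/*`); hypothetical helper, not `get_router_model_info()` itself:

```python
# Match a requested model against configured deployments, allowing provider wildcards.
import fnmatch
from typing import List, Optional

def match_wildcard_deployment(requested_model: str, configured_models: List[str]) -> Optional[str]:
    for configured in configured_models:
        if configured == requested_model:
            return configured
        if "*" in configured and fnmatch.fnmatch(requested_model, configured):
            return configured
    return None

print(match_wildcard_deployment("gemini/gemini-1.5-flash", ["gpt-4o", "gemini/*"]))  # -> "gemini/*"
```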
* build(model_prices_and_context_window.json): add tpm/rpm limits for all gemini models
Allows for ratelimit tracking for gemini models even with wildcard routing enabled
Addresses https://github.com/BerriAI/litellm/issues/6914
* feat(router.py): add tpm/rpm tracking on success/failure to global_router
Addresses https://github.com/BerriAI/litellm/issues/6914
* feat(router.py): support wildcard routes on router.get_model_group_usage()
* fix(router.py): fix linting error
* fix(router.py): implement get_remaining_tokens_and_requests
Addresses https://github.com/BerriAI/litellm/issues/6914
* fix(router.py): fix linting errors
* test: fix test
* test: fix tests
* docs(config_settings.md): add missing dd env vars to docs
* fix(router.py): check if hidden params is dict
* LiteLLM Minor Fixes & Improvements (11/27/2024) (#6943)
* fix(http_parsing_utils.py): remove `ast.literal_eval()` from http utils
Security fix - https://huntr.com/bounties/96a32812-213c-4819-ba4e-36143d35e95b?token=bf414bbd77f8b346556e
64ab2dd9301ea44339910877ea50401c76f977e36cdd78272f5fb4ca852a88a7e832828aae1192df98680544ee24aa98f3cf6980d8
bab641a66b7ccbc02c0e7d4ddba2db4dbe7318889dc0098d8db2d639f345f574159814627bb084563bad472e2f990f825bff0878a9
e281e72c88b4bc5884d637d186c0d67c9987c57c3f0caf395aff07b89ad2b7220d1dd7d1b427fd2260b5f01090efce5250f8b56ea2
c0ec19916c24b23825d85ce119911275944c840a1340d69e23ca6a462da610
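The safer-parsing idea, sketched with assumed shapes (not the exact http_parsing_utils.py code): parse request bodies with `json.loads` instead of `ast.literal_eval`, and fail closed on bad input.

```python
import json

def parse_request_body(raw_body: bytes) -> dict:
    try:
        parsed = json.loads(raw_body.decode("utf-8") or "{}")
    except (json.JSONDecodeError, UnicodeDecodeError):
        return {}
    return parsed if isinstance(parsed, dict) else {}
```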
* fix(converse/transformation.py): support bedrock apac cross region inference
Fixes https://github.com/BerriAI/litellm/issues/6905
* fix(user_api_key_auth.py): add auth check for websocket endpoint
Fixes https://github.com/BerriAI/litellm/issues/6926
* fix(user_api_key_auth.py): use `model` from query param
* fix: fix linting error
* test: run flaky tests first
* docs: update the docs (#6923)
* (bug fix) /key/update was not storing `budget_duration` in the DB (#6941)
* fix - store budget_duration for keys
* test_generate_and_update_key
* test_update_user_unit_test
* fix user update
* (fix) handle json decode errors for DD exception logging (#6934)
* fix JSONDecodeError
* handle async_log_proxy_authentication_errors
* fix test_async_log_proxy_authentication_errors_get_request
* Revert "Revert "(feat) Allow using include to include external YAML files in a config.yaml (#6922)""
This reverts commit 5d13302e6b.
* (docs + fix) Add docs on Moderations endpoint, Text Completion (#6947)
* fix _pass_through_moderation_endpoint_factory
* fix route_llm_request
* doc moderations api
* docs on /moderations
* add e2e tests for moderations api
* docs moderations api
* test_pass_through_moderation_endpoint_factory
* docs text completion
* (feat) add enforcement for unique key aliases on /key/update and /key/generate (#6944)
* add enforcement for unique key aliases
* fix _enforce_unique_key_alias
* fix _enforce_unique_key_alias
* fix _enforce_unique_key_alias
* test_enforce_unique_key_alias
* (fix) tag merging / aggregation logic (#6932)
* use 1 helper to merge tags + ensure uniqueness
* test_add_litellm_data_to_request_duplicate_tags
* fix _merge_tags
* fix proxy utils test
* fix doc string
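A minimal sketch of order-preserving tag merging with de-duplication (hypothetical version of the `_merge_tags` helper mentioned above):

```python
from typing import List, Optional

def merge_tags(request_tags: Optional[List[str]], key_tags: Optional[List[str]]) -> List[str]:
    merged: List[str] = []
    for tag in (request_tags or []) + (key_tags or []):
        if tag not in merged:  # keep first occurrence, drop duplicates
            merged.append(tag)
    return merged

assert merge_tags(["teamA", "prod"], ["prod", "batch"]) == ["teamA", "prod", "batch"]
```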
* (feat) Allow disabling ErrorLogs written to the DB (#6940)
* fix - allow disabling logging error logs
* docs on disabling error logs
* doc string for _PROXY_failure_handler
* test_disable_error_logs
* rename file
* fix rename file
* increase test coverage for test_enable_error_logs
* fix(key_management_endpoints.py): support 'tags' param on `/key/update` (#6945)
* LiteLLM Minor Fixes & Improvements (11/29/2024) (#6965)
* fix(factory.py): ensure tool call converts image url
Fixes https://github.com/BerriAI/litellm/issues/6953
* fix(transformation.py): support mp4 + pdf url's for vertex ai
Fixes https://github.com/BerriAI/litellm/issues/6936
* fix(http_handler.py): mask gemini api key in error logs
Fixes https://github.com/BerriAI/litellm/issues/6963
* docs(prometheus.md): update prometheus FAQs
* feat(auth_checks.py): ensure specific model access > wildcard model access
if wildcard model is in access group, but specific model is not - deny access
* fix(auth_checks.py): handle auth checks for team based model access groups
handles scenario where model access group used for wildcard models
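One possible reading of the precedence rule above, sketched with hypothetical names (not the actual auth_checks.py logic): when the proxy defines a specific deployment for the requested model, a wildcard grant alone is not enough.

```python
def is_model_allowed(requested: str, allowed: set, proxy_models: set) -> bool:
    if requested in allowed:
        return True  # explicitly granted
    if requested in proxy_models:
        return False  # a specific deployment exists -> wildcard access does not apply
    wildcard = requested.split("/")[0] + "/*"
    return wildcard in allowed

print(is_model_allowed("openai/gpt-4o", {"openai/*"}, {"openai/gpt-4o"}))  # False
print(is_model_allowed("openai/gpt-4o", {"openai/*"}, set()))              # True
```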
* fix(internal_user_endpoints.py): support adding guardrails on `/user/update`
Fixes https://github.com/BerriAI/litellm/issues/6942
* fix(key_management_endpoints.py): fix prepare_metadata_fields helper
* fix: fix tests
* build(requirements.txt): bump openai dep version
fixes proxies argument
* test: fix tests
* fix(http_handler.py): fix error message masking
* fix(bedrock_guardrails.py): pass in prepped data
* test: fix test
* test: fix nvidia nim test
* fix(http_handler.py): return original response headers
* fix: revert maskedhttpstatuserror
* test: update tests
* test: cleanup test
* fix(key_management_endpoints.py): fix metadata field update logic
* fix(key_management_endpoints.py): maintain initial order of guardrails in key update
* fix(key_management_endpoints.py): handle prepare metadata
* fix: fix linting errors
* fix: fix linting errors
* fix: fix linting errors
* fix: fix key management errors
* fix(key_management_endpoints.py): update metadata
* test: update test
* refactor: add more debug statements
* test: skip flaky test
* test: fix test
* fix: fix test
* fix: fix update metadata logic
* fix: fix test
* ci(config.yml): change db url for e2e ui testing
* bump: version 1.53.1 → 1.53.2
* Updated config.yml
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: Sara Han <127759186+sdiazlor@users.noreply.github.com>
* fix(exceptions.py): ensure ratelimit error code == 429, type == "throttling_error"
Fixes https://github.com/BerriAI/litellm/pull/6973
* fix(utils.py): add jina ai dimensions embedding param support
Fixes https://github.com/BerriAI/litellm/issues/6591
* fix(exception_mapping_utils.py): add bedrock 'prompt is too long' exception to context window exceeded error exception mapping
Fixes https://github.com/BerriAI/litellm/issues/6629
Closes https://github.com/BerriAI/litellm/pull/6975
* fix(litellm_logging.py): strip trailing slash for api base
Closes https://github.com/BerriAI/litellm/pull/6859
* test: skip timeout issue
---------
Co-authored-by: ershang-dou <erlie.shang@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
Co-authored-by: Sara Han <127759186+sdiazlor@users.noreply.github.com>
* fix get_standard_logging_object_payload
* fix async_post_call_failure_hook
* fix post_call_failure_hook
* fix change
* fix _is_proxy_only_error
* fix async_post_call_failure_hook
* fix getting request body
* remove redundant code
* use a well named original function name for auth errors
* fix logging auth fails on DD
* fix using request body
* use helper for _handle_logging_proxy_only_error
* feat(pass_through_endpoints/): support logging anthropic/gemini pass through calls to langfuse/s3/etc.
* fix(utils.py): allow disabling end user cost tracking with new param
Allows proxy admin to disable cost tracking for end user - keeps prometheus metrics small
* docs(configs.md): add disable_end_user_cost_tracking reference to docs
* feat(key_management_endpoints.py): add support for restricting access to `/key/generate` by team/proxy level role
Enables admin to restrict key creation, and assign team admins to handle distributing keys
* test(test_key_management.py): add unit testing for personal / team key restriction checks
* docs: add docs on restricting key creation
* docs(finetuned_models.md): add new guide on calling finetuned models
* docs(input.md): cleanup anthropic supported params
Closes https://github.com/BerriAI/litellm/issues/6856
* test(test_embedding.py): add test for passing extra headers via embedding
* feat(cohere/embed): pass client to async embedding
* feat(rerank.py): add `/v1/rerank` if missing for cohere base url
Closes https://github.com/BerriAI/litellm/issues/6844
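A tiny sketch of the url normalization (hypothetical helper): append `/v1/rerank` when a custom Cohere base url does not already end with it.

```python
def ensure_rerank_route(api_base: str) -> str:
    api_base = api_base.rstrip("/")
    if not api_base.endswith("/v1/rerank"):
        api_base = f"{api_base}/v1/rerank"
    return api_base

assert ensure_rerank_route("https://api.cohere.com") == "https://api.cohere.com/v1/rerank"
assert ensure_rerank_route("https://api.cohere.com/v1/rerank") == "https://api.cohere.com/v1/rerank"
```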
* fix(main.py): pass extra_headers param to openai
Fixes https://github.com/BerriAI/litellm/issues/6836
* fix(litellm_logging.py): don't disable global callbacks when dynamic callbacks are set
Fixes issue where global callbacks - e.g. prometheus - were overridden when langfuse was set dynamically
* fix(handler.py): fix linting error
* fix: fix typing
* build: add conftest to proxy_admin_ui_tests/
* test: fix test
* fix: fix linting errors
* test: fix test
* fix: fix pass through testing
* fix(ollama.py): fix get model info request
Fixes https://github.com/BerriAI/litellm/issues/6703
* feat(anthropic/chat/transformation.py): support passing user id to anthropic via openai 'user' param
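A hedged sketch of the param translation (not the actual transformation.py code): the OpenAI-style `user` field maps onto Anthropic's `metadata.user_id` request field.

```python
def map_openai_user_param(openai_params: dict, anthropic_request: dict) -> dict:
    user = openai_params.get("user")
    if user:
        anthropic_request.setdefault("metadata", {})["user_id"] = user
    return anthropic_request

print(map_openai_user_param({"user": "customer-123"}, {"model": "claude-3-5-sonnet-20241022"}))
```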
* docs(anthropic.md): document all supported openai params for anthropic
* test: fix tests
* fix: fix tests
* feat(jina_ai/): add rerank support
Closes https://github.com/BerriAI/litellm/issues/6691
* test: handle service unavailable error
* fix(handler.py): refactor together ai rerank call
* test: update test to handle overloaded error
* test: fix test
* Litellm router trace (#6742)
* feat(router.py): add trace_id to parent functions - allows tracking retry/fallbacks
* feat(router.py): log trace id across retry/fallback logic
allows grouping llm logs for the same request
* test: fix tests
* fix: fix test
* fix(transformation.py): only set non-none stop_sequences
* Litellm router disable fallbacks (#6743)
* bump: version 1.52.6 → 1.52.7
* feat(router.py): enable dynamically disabling fallbacks
Allows for enabling/disabling fallbacks per key
* feat(litellm_pre_call_utils.py): support setting 'disable_fallbacks' on litellm key
* test: fix test
* fix(exception_mapping_utils.py): map 'model is overloaded' to internal server error
* test: handle gemini error
* test: fix test
* fix: new run
* fix(caching): convert arg to equivalent kwargs in llm caching handler
prevent unexpected errors
* fix(caching_handler.py): don't pass args to caching
* fix(caching): remove all *args from caching.py
* fix(caching): consistent function signatures + abc method
* test(caching_unit_tests.py): add unit tests for llm caching
ensures coverage for common caching scenarios across different implementations
* refactor(litellm_logging.py): move to using cache key from hidden params instead of regenerating one
* fix(router.py): drop redis password requirement
* fix(proxy_server.py): fix faulty slack alerting check
* fix(langfuse.py): avoid copying functions/thread lock objects in metadata
fixes metadata copy error when a parent otel span is in metadata
* test: update test
* add langsmith_api_key to StandardCallbackDynamicParams
* create a file for langsmith types
* langsmith add key / team based logging
* add key based logging for langsmith
* fix langsmith key based logging
* fix linting langsmith
* remove NOQA violation
* add unit test coverage for all helpers in test langsmith
* test_langsmith_key_based_logging
* docs langsmith key based logging
* run langsmith tests in logging callback tests
* fix logging testing
* test_langsmith_key_based_logging
* test_add_callback_via_key_litellm_pre_call_utils_langsmith
* add debug statement langsmith key based logging
* test_langsmith_key_based_logging
* fix(__init__.py): add 'watsonx_text' as mapped llm api route
Fixes https://github.com/BerriAI/litellm/issues/6663
* fix(opentelemetry.py): fix passing parallel tool calls to otel
Fixes https://github.com/BerriAI/litellm/issues/6677
* refactor(test_opentelemetry_unit_tests.py): create a base set of unit tests for all logging integrations - test for parallel tool call handling
reduces bugs in repo
* fix(__init__.py): update provider-model mapping to include all known provider-model mappings
Fixes https://github.com/BerriAI/litellm/issues/6669
* feat(anthropic): support passing document in llm api call
* docs(anthropic.md): add pdf anthropic call to docs + expose new 'supports_pdf_input' function
* fix(factory.py): fix linting error
* log error on prometheus service failure hook
* use a more accurate function name for wrapper that handles logging db metrics
* fix log_db_metrics
* test_log_db_metrics_failure_error_types
* fix linting
* fix auth checks
* add unit testing for standard logging payload
* unit testing for static methods in litellm_logging
* add code coverage check for litellm_logging
* litellm_logging_code_coverage
* test_get_final_response_obj
* fix validate_redacted_message_span_attributes
* test validate_redacted_message_span_attributes
* feat: initial commit for watsonx chat endpoint support
Closes https://github.com/BerriAI/litellm/issues/6562
* feat(watsonx/chat/handler.py): support tool calling for watsonx
Closes https://github.com/BerriAI/litellm/issues/6562
* fix(streaming_utils.py): return empty chunk instead of failing if streaming value is invalid dict
ensures streaming works for ibm watsonx
* fix(openai_like/chat/handler.py): ensure asynchttphandler is passed correctly for openai like calls
* fix: ensure exception mapping works well for watsonx calls
* fix(openai_like/chat/handler.py): handle async streaming correctly
* feat(main.py): Make it clear when a user is passing an invalid message
add validation for user content message
Closes https://github.com/BerriAI/litellm/issues/6565
* fix: cleanup
* fix(utils.py): loosen validation check, to just make sure content types are valid
make litellm robust to future content updates
* fix: fix linting error
* fix: fix linting errors
* fix(utils.py): make validation check more flexible
* test: handle langfuse list index out of range error
* Litellm dev 11 02 2024 (#6561)
* fix(dual_cache.py): update in-memory check for redis batch get cache
Fixes latency delay for async_batch_redis_cache
* fix(service_logger.py): fix race condition causing otel service logging to be overwritten if service_callbacks set
* feat(user_api_key_auth.py): add parent otel component for auth
allows us to isolate how much latency is added by auth checks
* perf(parallel_request_limiter.py): move async_set_cache_pipeline (from max parallel request limiter) out of execution path (background task)
reduces latency by 200ms
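Illustrative sketch of the "move the cache write off the hot path" idea, with assumed cache method names (not the actual parallel_request_limiter.py code):

```python
import asyncio

async def write_usage_to_cache(cache, items) -> None:
    await cache.async_set_cache_pipeline(items)  # assumed cache interface

async def handle_request(cache, items):
    # Fire-and-forget: the request no longer waits on the redis round trip.
    # (In production code, keep a reference to the task so it isn't garbage collected.)
    asyncio.create_task(write_usage_to_cache(cache, items))
    return "response returned without awaiting the cache write"
```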
* feat(user_api_key_auth.py): have user api key auth object return user tpm/rpm limits - reduces redis calls in downstream task (parallel_request_limiter)
Reduces latency by 400-800ms
* fix(parallel_request_limiter.py): use batch get cache to reduce user/key/team usage object calls
reduces latency by 50-100ms
* fix: fix linting error
* fix(_service_logger.py): fix import
* fix(user_api_key_auth.py): fix service logging
* fix(dual_cache.py): don't pass 'self'
* fix: fix python3.8 error
* fix: fix init
* bump: version 1.51.4 → 1.51.5
* build(deps): bump cookie and express in /docs/my-website (#6566)
Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.
Updates `cookie` from 0.6.0 to 0.7.1
- [Release notes](https://github.com/jshttp/cookie/releases)
- [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)
Updates `express` from 4.20.0 to 4.21.1
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)
---
updated-dependencies:
- dependency-name: cookie
dependency-type: indirect
- dependency-name: express
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* docs(virtual_keys.md): update Dockerfile reference (#6554)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
* (proxy fix) - call connect on prisma client when running setup (#6534)
* critical fix - call connect on prisma client when running setup
* fix test_proxy_server_prisma_setup
* fix test_proxy_server_prisma_setup
* Add 3.5 haiku (#6588)
* feat: add claude-3-5-haiku-20241022 entries
* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models
* add missing entries, remove vision
* remove image token costs
* Litellm perf improvements 3 (#6573)
* perf: move writing key to cache, to background task
* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils
adds 200ms on calls with pgdb connected
* fix(litellm_pre_call_utils.py): rename call_type to actual call used
* perf(proxy_server.py): remove db logic from _get_config_from_file
was causing db calls to occur on every llm request, if team_id was set on key
* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db
reduces latency/call by ~100ms
* fix(proxy_server.py): minor fix on existing_settings not incl alerting
* fix(exception_mapping_utils.py): map databricks exception string
* fix(auth_checks.py): fix auth check logic
* test: correctly mark flaky test
* fix(utils.py): handle auth token error for tokenizers.from_pretrained
* build: fix map
* build: fix map
* build: fix json for model map
* Litellm dev 11 02 2024 (#6561)
* fix(dual_cache.py): update in-memory check for redis batch get cache
Fixes latency delay for async_batch_redis_cache
* fix(service_logger.py): fix race condition causing otel service logging to be overwritten if service_callbacks set
* feat(user_api_key_auth.py): add parent otel component for auth
allows us to isolate how much latency is added by auth checks
* perf(parallel_request_limiter.py): move async_set_cache_pipeline (from max parallel request limiter) out of execution path (background task)
reduces latency by 200ms
* feat(user_api_key_auth.py): have user api key auth object return user tpm/rpm limits - reduces redis calls in downstream task (parallel_request_limiter)
Reduces latency by 400-800ms
* fix(parallel_request_limiter.py): use batch get cache to reduce user/key/team usage object calls
reduces latency by 50-100ms
* fix: fix linting error
* fix(_service_logger.py): fix import
* fix(user_api_key_auth.py): fix service logging
* fix(dual_cache.py): don't pass 'self'
* fix: fix python3.8 error
* fix: fix init
* Litellm perf improvements 3 (#6573)
* perf: move writing key to cache, to background task
* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils
adds 200ms on calls with pgdb connected
* fix(litellm_pre_call_utils.py): rename call_type to actual call used
* perf(proxy_server.py): remove db logic from _get_config_from_file
was causing db calls to occur on every llm request, if team_id was set on key
* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db
reduces latency/call by ~100ms
* fix(proxy_server.py): minor fix on existing_settings not incl alerting
* fix(exception_mapping_utils.py): map databricks exception string
* fix(auth_checks.py): fix auth check logic
* test: correctly mark flaky test
* fix(utils.py): handle auth token error for tokenizers.from_pretrained
* fix ImageObject conversion (#6584)
* (fix) litellm.text_completion raises a non-blocking error on simple usage (#6546)
* unit test test_huggingface_text_completion_logprobs
* fix return TextCompletionHandler convert_chat_to_text_completion
* fix hf rest api
* fix test_huggingface_text_completion_logprobs
* fix linting errors
* fix importLiteLLMResponseObjectHandler
* fix test for LiteLLMResponseObjectHandler
* fix test text completion
* fix allow using 15 seconds for premium license check
* testing fix bedrock deprecated cohere.command-text-v14
* (feat) add `Predicted Outputs` for OpenAI (#6594)
* bump openai to openai==1.54.0
* add 'prediction' param
* testing fix bedrock deprecated cohere.command-text-v14
* test test_openai_prediction_param.py
* test_openai_prediction_param_with_caching
* doc Predicted Outputs
* doc Predicted Output
* (fix) Vertex Improve Performance when using `image_url` (#6593)
* fix transformation vertex
* test test_process_gemini_image
* test_image_completion_request
* testing fix - bedrock has deprecated cohere.command-text-v14
* fix vertex pdf
* bump: version 1.51.5 → 1.52.0
* fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check (#6577)
* fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check
* fix(lowest_tpm_rpm_v2.py): return headers in correct format
* test: update test
* build(deps): bump cookie and express in /docs/my-website (#6566)
Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.
Updates `cookie` from 0.6.0 to 0.7.1
- [Release notes](https://github.com/jshttp/cookie/releases)
- [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)
Updates `express` from 4.20.0 to 4.21.1
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)
---
updated-dependencies:
- dependency-name: cookie
dependency-type: indirect
- dependency-name: express
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* docs(virtual_keys.md): update Dockerfile reference (#6554)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
* (proxy fix) - call connect on prisma client when running setup (#6534)
* critical fix - call connect on prisma client when running setup
* fix test_proxy_server_prisma_setup
* fix test_proxy_server_prisma_setup
* Add 3.5 haiku (#6588)
* feat: add claude-3-5-haiku-20241022 entries
* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models
* add missing entries, remove vision
* remove image token costs
* Litellm perf improvements 3 (#6573)
* perf: move writing key to cache, to background task
* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils
adds 200ms on calls with pgdb connected
* fix(litellm_pre_call_utils.py): rename call_type to actual call used
* perf(proxy_server.py): remove db logic from _get_config_from_file
was causing db calls to occur on every llm request, if team_id was set on key
* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db
reduces latency/call by ~100ms
* fix(proxy_server.py): minor fix on existing_settings not incl alerting
* fix(exception_mapping_utils.py): map databricks exception string
* fix(auth_checks.py): fix auth check logic
* test: correctly mark flaky test
* fix(utils.py): handle auth token error for tokenizers.from_pretrained
* build: fix map
* build: fix map
* build: fix json for model map
* test: remove eol model
* fix(proxy_server.py): fix db config loading logic
* fix(proxy_server.py): fix order of config / db updates, to ensure fields not overwritten
* test: skip test if required env var is missing
* test: fix test
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
* test: mark flaky test
* test: handle anthropic api instability
* test: update test
* test: bump num retries on langfuse tests - their api is quite bad
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
* fix(core_helpers.py): return None instead of raising a "kwargs is None" error
Closes https://github.com/BerriAI/litellm/issues/6500
* docs(cost_tracking.md): cleanup doc
* fix(vertex_and_google_ai_studio.py): handle function call with no params passed in
Closes https://github.com/BerriAI/litellm/issues/6495
* test(test_router_timeout.py): add test for router timeout + retry logic
* test: update test to use module level values
* (fix) Prometheus - Log Postgres DB latency, status on prometheus (#6484)
* fix logging DB fails on prometheus
* unit testing log to otel wrapper
* unit testing for service logger + prometheus
* use LATENCY buckets for service logging
* fix service logging
* docs clarify vertex vs gemini
* (router_strategy/) ensure all async functions use async cache methods (#6489)
* fix router strat
* use async set / get cache in router_strategy
* add coverage for router strategy
* fix imports
* fix batch_get_cache
* use async methods for least busy
* fix least busy use async methods
* fix test_dual_cache_increment
* test async_get_available_deployment when routing_strategy="least-busy"
* (fix) proxy - fix when `STORE_MODEL_IN_DB` should be set (#6492)
* set store_model_in_db at the top
* correctly use store_model_in_db global
* (fix) `PrometheusServicesLogger` `_get_metric` should return metric in Registry (#6486)
* fix logging DB fails on prometheus
* unit testing log to otel wrapper
* unit testing for service logger + prometheus
* use LATENCY buckets for service logging
* fix service logging
* fix _get_metric in prom services logger
* add clear doc string
* unit testing for prom service logger
* bump: version 1.51.0 → 1.51.1
* Add `azure/gpt-4o-mini-2024-07-18` to model_prices_and_context_window.json (#6477)
* Update utils.py (#6468)
Fixed missing keys
* (perf) Litellm redis router fix - ~100ms improvement (#6483)
* docs(exception_mapping.md): add missing exception types
Fixes https://github.com/Aider-AI/aider/issues/2120#issuecomment-2438971183
* fix(main.py): register custom model pricing with specific key
Ensure custom model pricing is registered to the specific model+provider key combination
* test: make testing more robust for custom pricing
* fix(redis_cache.py): instrument otel logging for sync redis calls
ensures complete coverage for all redis cache calls
* refactor: pass parent_otel_span for redis caching calls in router
allows for more observability into what calls are causing latency issues
* test: update tests with new params
* refactor: ensure e2e otel tracing for router
* refactor(router.py): add more otel tracing across router
catch all latency issues for router requests
* fix: fix linting error
* fix(router.py): fix linting error
* fix: fix test
* test: fix tests
* fix(dual_cache.py): pass ttl to redis cache
* fix: fix param
* perf(cooldown_cache.py): improve cooldown cache, to store cache results in memory for 5s, prevents redis call from being made on each request
reduces 100ms latency per call with caching enabled on router
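Rough sketch of the 5-second in-memory read-through described above (hypothetical class, not the actual cooldown_cache.py code); the router would check this local cache before falling back to redis.

```python
import time
from typing import Any, Dict, Optional, Tuple

class ShortLivedLocalCache:
    def __init__(self, ttl_seconds: float = 5.0):
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            self._store.pop(key, None)  # expired -> caller falls back to redis
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (time.monotonic(), value)
```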
* fix: fix test
* fix(cooldown_cache.py): handle if a result is None
* fix(cooldown_cache.py): add debug statements
* refactor(dual_cache.py): move to using an in-memory check for batch get cache, to prevent redis from being hit for every call
* fix(cooldown_cache.py): fix linting error
* refactor(prometheus.py): move to using standard logging payload for reading the remaining request / tokens
Ensures prometheus token tracking works for anthropic as well
* fix: fix linting error
* fix(redis_cache.py): make sure ttl is always int (handle float values)
Fixes issue where redis_client.ex was not working correctly due to float ttl
* fix: fix linting error
* test: update test
* fix: fix linting error
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
Co-authored-by: vibhanshu-ob <115142120+vibhanshu-ob@users.noreply.github.com>
* fix logging DB fails on prometheus
* unit testing log to otel wrapper
* unit testing for service logger + prometheus
* use LATENCY buckets for service logging
* fix service logging
* add type for dd llm obs request ob
* working dd llm obs
* datadog use well defined type
* clean up
* unit test test_create_llm_obs_payload
* fix linting
* add datadog_llm_observability
* add datadog_llm_observability
* docs DD LLM obs
* run testing again
* document DD_ENV
* test_create_llm_obs_payload
* testing for failure events prometheus
* set set_llm_deployment_failure_metrics
* test_async_post_call_failure_hook
* unit testing for all prometheus functions
* fix linting
* feat(litellm_logging.py): refactor standard_logging_payload function to be <50 LOC
fixes issue where usage information was not following typed values
* fix(litellm_logging.py): fix completion start time handling
* unit testing for prometheus
* unit testing for success metrics
* use 1 helper for _increment_token_metrics
* use helper for _increment_remaining_budget_metrics
* use _increment_remaining_budget_metrics
* use _increment_top_level_request_and_spend_metrics
* use helper for _set_latency_metrics
* remove noqa violation
* fix test prometheus
* test prometheus
* unit testing for all prometheus helper functions
* fix prom unit tests
* fix unit tests prometheus
* fix unit test prom
* (refactor) use _assemble_complete_response_from_streaming_chunks
* add unit test for test_assemble_complete_response_from_streaming_chunks_1
* fix assemble complete_streaming_response
* config add logging_testing
* add logging_coverage in codecov
* test test_assemble_complete_response_from_streaming_chunks_3
* add unit tests for _assemble_complete_response_from_streaming_chunks
* fix remove unused / junk function
* add test for streaming_chunks when error assembling
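For context, a hedged usage sketch of assembling streamed chunks into a complete response with litellm's public `stream_chunk_builder` helper (the internal `_assemble_complete_response_from_streaming_chunks` refactor above is a separate code path); assumes an OpenAI API key is configured.

```python
import litellm

messages = [{"role": "user", "content": "say hi"}]
chunks = []
for chunk in litellm.completion(model="gpt-4o-mini", messages=messages, stream=True):
    chunks.append(chunk)

complete_response = litellm.stream_chunk_builder(chunks, messages=messages)
print(complete_response.choices[0].message.content)
```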