* fix(ollama.py): fix get model info request
Fixes https://github.com/BerriAI/litellm/issues/6703
* feat(anthropic/chat/transformation.py): support passing user id to anthropic via openai 'user' param
* docs(anthropic.md): document all supported openai params for anthropic
* test: fix tests
* fix: fix tests
* feat(jina_ai/): add rerank support
Closes https://github.com/BerriAI/litellm/issues/6691
* test: handle service unavailable error
* fix(handler.py): refactor together ai rerank call
* test: update test to handle overloaded error
* test: fix test
* Litellm router trace (#6742)
* feat(router.py): add trace_id to parent functions - allows tracking retry/fallbacks
* feat(router.py): log trace id across retry/fallback logic
allows grouping llm logs for the same request
* test: fix tests
* fix: fix test
* fix(transformation.py): only set non-none stop_sequences
* Litellm router disable fallbacks (#6743)
* bump: version 1.52.6 → 1.52.7
* feat(router.py): enable dynamically disabling fallbacks
Allows for enabling/disabling fallbacks per key
* feat(litellm_pre_call_utils.py): support setting 'disable_fallbacks' on litellm key
* test: fix test
* fix(exception_mapping_utils.py): map 'model is overloaded' to internal server error
* test: handle gemini error
* test: fix test
* fix: new run
* fix(__init__.py): add 'watsonx_text' as mapped llm api route
Fixes https://github.com/BerriAI/litellm/issues/6663
* fix(opentelemetry.py): fix passing parallel tool calls to otel
Fixes https://github.com/BerriAI/litellm/issues/6677
* refactor(test_opentelemetry_unit_tests.py): create a base set of unit tests for all logging integrations - test for parallel tool call handling
reduces bugs in repo
* fix(__init__.py): update provider-model mapping to include all known provider-model mappings
Fixes https://github.com/BerriAI/litellm/issues/6669
* feat(anthropic): support passing document in llm api call
* docs(anthropic.md): add pdf anthropic call to docs + expose new 'supports_pdf_input' function
* fix(factory.py): fix linting error
* fix(deepseek/chat): convert content list to str
Fixes https://github.com/BerriAI/litellm/issues/6642
* test(test_deepseek_completion.py): implement base llm unit tests
increase robustness across providers
* fix(router.py): support content policy violation fallbacks with default fallbacks
* fix(opentelemetry.py): refactor to move otel imports behing flag
Fixes https://github.com/BerriAI/litellm/issues/6636
* fix(opentelemtry.py): close span on success completion
* fix(user_api_key_auth.py): allow user_role to default to none
* fix: mark flaky test
* fix(opentelemetry.py): move otelconfig.from_env to inside the init
prevent otel errors raised just by importing the litellm class
* fix(user_api_key_auth.py): fix auth error
* fix(pattern_match_deployments.py): default to user input if unable to map based on wildcards
* test: fix test
* test: reset test name
* test: update conftest to reload proxy server module between tests
* ci(config.yml): move langfuse out of local_testing
reduce ci/cd time
* ci(config.yml): cleanup langfuse ci/cd tests
* fix: update test to not use global proxy_server app module
* ci: move caching to a separate test pipeline
speed up ci pipeline
* test: update conftest to check if proxy_server attr exists before reloading
* build(conftest.py): don't block on inability to reload proxy_server
* ci(config.yml): update caching unit test filter to work on 'cache' keyword as well
* fix(encrypt_decrypt_utils.py): use function to get salt key
* test: mark flaky test
* test: handle anthropic overloaded errors
* refactor: create separate ci/cd pipeline for proxy unit tests
make ci/cd faster
* ci(config.yml): add litellm_proxy_unit_testing to build_and_test jobs
* ci(config.yml): generate prisma binaries for proxy unit tests
* test: readd vertex_key.json
* ci(config.yml): remove `-s` from proxy_unit_test cmd
speed up test
* ci: remove any 'debug' logging flag
speed up ci pipeline
* test: fix test
* test(test_braintrust.py): rerun
* test: add delay for braintrust test
* feat: initial commit for watsonx chat endpoint support
Closes https://github.com/BerriAI/litellm/issues/6562
* feat(watsonx/chat/handler.py): support tool calling for watsonx
Closes https://github.com/BerriAI/litellm/issues/6562
* fix(streaming_utils.py): return empty chunk instead of failing if streaming value is invalid dict
ensures streaming works for ibm watsonx
* fix(openai_like/chat/handler.py): ensure asynchttphandler is passed correctly for openai like calls
* fix: ensure exception mapping works well for watsonx calls
* fix(openai_like/chat/handler.py): handle async streaming correctly
* feat(main.py): Make it clear when a user is passing an invalid message
add validation for user content message
Closes https://github.com/BerriAI/litellm/issues/6565
* fix: cleanup
* fix(utils.py): loosen validation check, to just make sure content types are valid
make litellm robust to future content updates
* fix: fix linting erro
* fix: fix linting errors
* fix(utils.py): make validation check more flexible
* test: handle langfuse list index out of range error
* Litellm dev 11 02 2024 (#6561)
* fix(dual_cache.py): update in-memory check for redis batch get cache
Fixes latency delay for async_batch_redis_cache
* fix(service_logger.py): fix race condition causing otel service logging to be overwritten if service_callbacks set
* feat(user_api_key_auth.py): add parent otel component for auth
allows us to isolate how much latency is added by auth checks
* perf(parallel_request_limiter.py): move async_set_cache_pipeline (from max parallel request limiter) out of execution path (background task)
reduces latency by 200ms
* feat(user_api_key_auth.py): have user api key auth object return user tpm/rpm limits - reduces redis calls in downstream task (parallel_request_limiter)
Reduces latency by 400-800ms
* fix(parallel_request_limiter.py): use batch get cache to reduce user/key/team usage object calls
reduces latency by 50-100ms
* fix: fix linting error
* fix(_service_logger.py): fix import
* fix(user_api_key_auth.py): fix service logging
* fix(dual_cache.py): don't pass 'self'
* fix: fix python3.8 error
* fix: fix init]
* bump: version 1.51.4 → 1.51.5
* build(deps): bump cookie and express in /docs/my-website (#6566)
Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.
Updates `cookie` from 0.6.0 to 0.7.1
- [Release notes](https://github.com/jshttp/cookie/releases)
- [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)
Updates `express` from 4.20.0 to 4.21.1
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)
---
updated-dependencies:
- dependency-name: cookie
dependency-type: indirect
- dependency-name: express
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* docs(virtual_keys.md): update Dockerfile reference (#6554)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
* (proxy fix) - call connect on prisma client when running setup (#6534)
* critical fix - call connect on prisma client when running setup
* fix test_proxy_server_prisma_setup
* fix test_proxy_server_prisma_setup
* Add 3.5 haiku (#6588)
* feat: add claude-3-5-haiku-20241022 entries
* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models
* add missing entries, remove vision
* remove image token costs
* Litellm perf improvements 3 (#6573)
* perf: move writing key to cache, to background task
* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils
adds 200ms on calls with pgdb connected
* fix(litellm_pre_call_utils.py'): rename call_type to actual call used
* perf(proxy_server.py): remove db logic from _get_config_from_file
was causing db calls to occur on every llm request, if team_id was set on key
* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db
reduces latency/call by ~100ms
* fix(proxy_server.py): minor fix on existing_settings not incl alerting
* fix(exception_mapping_utils.py): map databricks exception string
* fix(auth_checks.py): fix auth check logic
* test: correctly mark flaky test
* fix(utils.py): handle auth token error for tokenizers.from_pretrained
* build: fix map
* build: fix map
* build: fix json for model map
* Litellm dev 11 02 2024 (#6561)
* fix(dual_cache.py): update in-memory check for redis batch get cache
Fixes latency delay for async_batch_redis_cache
* fix(service_logger.py): fix race condition causing otel service logging to be overwritten if service_callbacks set
* feat(user_api_key_auth.py): add parent otel component for auth
allows us to isolate how much latency is added by auth checks
* perf(parallel_request_limiter.py): move async_set_cache_pipeline (from max parallel request limiter) out of execution path (background task)
reduces latency by 200ms
* feat(user_api_key_auth.py): have user api key auth object return user tpm/rpm limits - reduces redis calls in downstream task (parallel_request_limiter)
Reduces latency by 400-800ms
* fix(parallel_request_limiter.py): use batch get cache to reduce user/key/team usage object calls
reduces latency by 50-100ms
* fix: fix linting error
* fix(_service_logger.py): fix import
* fix(user_api_key_auth.py): fix service logging
* fix(dual_cache.py): don't pass 'self'
* fix: fix python3.8 error
* fix: fix init]
* Litellm perf improvements 3 (#6573)
* perf: move writing key to cache, to background task
* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils
adds 200ms on calls with pgdb connected
* fix(litellm_pre_call_utils.py'): rename call_type to actual call used
* perf(proxy_server.py): remove db logic from _get_config_from_file
was causing db calls to occur on every llm request, if team_id was set on key
* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db
reduces latency/call by ~100ms
* fix(proxy_server.py): minor fix on existing_settings not incl alerting
* fix(exception_mapping_utils.py): map databricks exception string
* fix(auth_checks.py): fix auth check logic
* test: correctly mark flaky test
* fix(utils.py): handle auth token error for tokenizers.from_pretrained
* fix ImageObject conversion (#6584)
* (fix) litellm.text_completion raises a non-blocking error on simple usage (#6546)
* unit test test_huggingface_text_completion_logprobs
* fix return TextCompletionHandler convert_chat_to_text_completion
* fix hf rest api
* fix test_huggingface_text_completion_logprobs
* fix linting errors
* fix importLiteLLMResponseObjectHandler
* fix test for LiteLLMResponseObjectHandler
* fix test text completion
* fix allow using 15 seconds for premium license check
* testing fix bedrock deprecated cohere.command-text-v14
* (feat) add `Predicted Outputs` for OpenAI (#6594)
* bump openai to openai==1.54.0
* add 'prediction' param
* testing fix bedrock deprecated cohere.command-text-v14
* test test_openai_prediction_param.py
* test_openai_prediction_param_with_caching
* doc Predicted Outputs
* doc Predicted Output
* (fix) Vertex Improve Performance when using `image_url` (#6593)
* fix transformation vertex
* test test_process_gemini_image
* test_image_completion_request
* testing fix - bedrock has deprecated cohere.command-text-v14
* fix vertex pdf
* bump: version 1.51.5 → 1.52.0
* fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check (#6577)
* fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check
* fix(lowest_tpm_rpm_v2.py): return headers in correct format
* test: update test
* build(deps): bump cookie and express in /docs/my-website (#6566)
Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.
Updates `cookie` from 0.6.0 to 0.7.1
- [Release notes](https://github.com/jshttp/cookie/releases)
- [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)
Updates `express` from 4.20.0 to 4.21.1
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)
---
updated-dependencies:
- dependency-name: cookie
dependency-type: indirect
- dependency-name: express
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* docs(virtual_keys.md): update Dockerfile reference (#6554)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
* (proxy fix) - call connect on prisma client when running setup (#6534)
* critical fix - call connect on prisma client when running setup
* fix test_proxy_server_prisma_setup
* fix test_proxy_server_prisma_setup
* Add 3.5 haiku (#6588)
* feat: add claude-3-5-haiku-20241022 entries
* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models
* add missing entries, remove vision
* remove image token costs
* Litellm perf improvements 3 (#6573)
* perf: move writing key to cache, to background task
* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils
adds 200ms on calls with pgdb connected
* fix(litellm_pre_call_utils.py'): rename call_type to actual call used
* perf(proxy_server.py): remove db logic from _get_config_from_file
was causing db calls to occur on every llm request, if team_id was set on key
* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db
reduces latency/call by ~100ms
* fix(proxy_server.py): minor fix on existing_settings not incl alerting
* fix(exception_mapping_utils.py): map databricks exception string
* fix(auth_checks.py): fix auth check logic
* test: correctly mark flaky test
* fix(utils.py): handle auth token error for tokenizers.from_pretrained
* build: fix map
* build: fix map
* build: fix json for model map
* test: remove eol model
* fix(proxy_server.py): fix db config loading logic
* fix(proxy_server.py): fix order of config / db updates, to ensure fields not overwritten
* test: skip test if required env var is missing
* test: fix test
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
* test: mark flaky test
* test: handle anthropic api instability
* test: update test
* test: bump num retries on langfuse tests - their api is quite bad
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
* unit test test_huggingface_text_completion_logprobs
* fix return TextCompletionHandler convert_chat_to_text_completion
* fix hf rest api
* fix test_huggingface_text_completion_logprobs
* fix linting errors
* fix importLiteLLMResponseObjectHandler
* fix test for LiteLLMResponseObjectHandler
* fix test text completion
* refactor: move gemini translation logic inside the transformation.py file
easier to isolate the gemini translation logic
* fix(gemini-transformation): support multiple tool calls in message body
Merges https://github.com/BerriAI/litellm/pull/6487/files
* test(test_vertex.py): add remaining tests from https://github.com/BerriAI/litellm/pull/6487
* fix(gemini-transformation): return tool calls for multiple tool calls
* fix: support passing logprobs param for vertex + gemini
* feat(vertex_ai): add logprobs support for gemini calls
* fix(anthropic/chat/transformation.py): fix disable parallel tool use flag
* fix: fix linting error
* fix(_logging.py): log stacktrace information in json logs
Closes https://github.com/BerriAI/litellm/issues/6497
* fix(utils.py): fix mem leak for async stream + completion
Uses a global executor pool instead of creating a new thread on each request
Fixes https://github.com/BerriAI/litellm/issues/6404
* fix(factory.py): handle tool call + content in assistant message for bedrock
* fix: fix import
* fix(factory.py): maintain support for content as a str in assistant response
* fix: fix import
* test: cleanup test
* fix(vertex_and_google_ai_studio/): return none for content if no str value
* test: retry flaky tests
* (UI) Fix viewing members, keys in a team + added testing (#6514)
* fix listing teams on ui
* LiteLLM Minor Fixes & Improvements (10/28/2024) (#6475)
* fix(anthropic/chat/transformation.py): support anthropic disable_parallel_tool_use param
Fixes https://github.com/BerriAI/litellm/issues/6456
* feat(anthropic/chat/transformation.py): support anthropic computer tool use
Closes https://github.com/BerriAI/litellm/issues/6427
* fix(vertex_ai/common_utils.py): parse out '$schema' when calling vertex ai
Fixes issue when trying to call vertex from vercel sdk
* fix(main.py): add 'extra_headers' support for azure on all translation endpoints
Fixes https://github.com/BerriAI/litellm/issues/6465
* fix: fix linting errors
* fix(transformation.py): handle no beta headers for anthropic
* test: cleanup test
* fix: fix linting error
* fix: fix linting errors
* fix: fix linting errors
* fix(transformation.py): handle dummy tool call
* fix(main.py): fix linting error
* fix(azure.py): pass required param
* LiteLLM Minor Fixes & Improvements (10/24/2024) (#6441)
* fix(azure.py): handle /openai/deployment in azure api base
* fix(factory.py): fix faulty anthropic tool result translation check
Fixes https://github.com/BerriAI/litellm/issues/6422
* fix(gpt_transformation.py): add support for parallel_tool_calls to azure
Fixes https://github.com/BerriAI/litellm/issues/6440
* fix(factory.py): support anthropic prompt caching for tool results
* fix(vertex_ai/common_utils): don't pop non-null required field
Fixes https://github.com/BerriAI/litellm/issues/6426
* feat(vertex_ai.py): support code_execution tool call for vertex ai + gemini
Closes https://github.com/BerriAI/litellm/issues/6434
* build(model_prices_and_context_window.json): Add 'supports_assistant_prefill' for bedrock claude-3-5-sonnet v2 models
Closes https://github.com/BerriAI/litellm/issues/6437
* fix(types/utils.py): fix linting
* test: update test to include required fields
* test: fix test
* test: handle flaky test
* test: remove e2e test - hitting gemini rate limits
* Litellm dev 10 26 2024 (#6472)
* docs(exception_mapping.md): add missing exception types
Fixes https://github.com/Aider-AI/aider/issues/2120#issuecomment-2438971183
* fix(main.py): register custom model pricing with specific key
Ensure custom model pricing is registered to the specific model+provider key combination
* test: make testing more robust for custom pricing
* fix(redis_cache.py): instrument otel logging for sync redis calls
ensures complete coverage for all redis cache calls
* (Testing) Add unit testing for DualCache - ensure in memory cache is used when expected (#6471)
* test test_dual_cache_get_set
* unit testing for dual cache
* fix async_set_cache_sadd
* test_dual_cache_local_only
* redis otel tracing + async support for latency routing (#6452)
* docs(exception_mapping.md): add missing exception types
Fixes https://github.com/Aider-AI/aider/issues/2120#issuecomment-2438971183
* fix(main.py): register custom model pricing with specific key
Ensure custom model pricing is registered to the specific model+provider key combination
* test: make testing more robust for custom pricing
* fix(redis_cache.py): instrument otel logging for sync redis calls
ensures complete coverage for all redis cache calls
* refactor: pass parent_otel_span for redis caching calls in router
allows for more observability into what calls are causing latency issues
* test: update tests with new params
* refactor: ensure e2e otel tracing for router
* refactor(router.py): add more otel tracing acrosss router
catch all latency issues for router requests
* fix: fix linting error
* fix(router.py): fix linting error
* fix: fix test
* test: fix tests
* fix(dual_cache.py): pass ttl to redis cache
* fix: fix param
* fix(dual_cache.py): set default value for parent_otel_span
* fix(transformation.py): support 'response_format' for anthropic calls
* fix(transformation.py): check for cache_control inside 'function' block
* fix: fix linting error
* fix: fix linting errors
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
---------
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
* ui new build
* Add retry strat (#6520)
Signed-off-by: dbczumar <corey.zumar@databricks.com>
* (fix) slack alerting - don't spam the failed cost tracking alert for the same model (#6543)
* fix use failing_model as cache key for failed_tracking_alert
* fix use standard logging payload for getting response cost
* fix kwargs.get("response_cost")
* fix getting response cost
* (feat) add XAI ChatCompletion Support (#6373)
* init commit for XAI
* add full logic for xai chat completion
* test_completion_xai
* docs xAI
* add xai/grok-beta
* test_xai_chat_config_get_openai_compatible_provider_info
* test_xai_chat_config_map_openai_params
* add xai streaming test
---------
Signed-off-by: dbczumar <corey.zumar@databricks.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Corey Zumar <39497902+dbczumar@users.noreply.github.com>
* fix listing teams on ui
* LiteLLM Minor Fixes & Improvements (10/28/2024) (#6475)
* fix(anthropic/chat/transformation.py): support anthropic disable_parallel_tool_use param
Fixes https://github.com/BerriAI/litellm/issues/6456
* feat(anthropic/chat/transformation.py): support anthropic computer tool use
Closes https://github.com/BerriAI/litellm/issues/6427
* fix(vertex_ai/common_utils.py): parse out '$schema' when calling vertex ai
Fixes issue when trying to call vertex from vercel sdk
* fix(main.py): add 'extra_headers' support for azure on all translation endpoints
Fixes https://github.com/BerriAI/litellm/issues/6465
* fix: fix linting errors
* fix(transformation.py): handle no beta headers for anthropic
* test: cleanup test
* fix: fix linting error
* fix: fix linting errors
* fix: fix linting errors
* fix(transformation.py): handle dummy tool call
* fix(main.py): fix linting error
* fix(azure.py): pass required param
* LiteLLM Minor Fixes & Improvements (10/24/2024) (#6441)
* fix(azure.py): handle /openai/deployment in azure api base
* fix(factory.py): fix faulty anthropic tool result translation check
Fixes https://github.com/BerriAI/litellm/issues/6422
* fix(gpt_transformation.py): add support for parallel_tool_calls to azure
Fixes https://github.com/BerriAI/litellm/issues/6440
* fix(factory.py): support anthropic prompt caching for tool results
* fix(vertex_ai/common_utils): don't pop non-null required field
Fixes https://github.com/BerriAI/litellm/issues/6426
* feat(vertex_ai.py): support code_execution tool call for vertex ai + gemini
Closes https://github.com/BerriAI/litellm/issues/6434
* build(model_prices_and_context_window.json): Add 'supports_assistant_prefill' for bedrock claude-3-5-sonnet v2 models
Closes https://github.com/BerriAI/litellm/issues/6437
* fix(types/utils.py): fix linting
* test: update test to include required fields
* test: fix test
* test: handle flaky test
* test: remove e2e test - hitting gemini rate limits
* Litellm dev 10 26 2024 (#6472)
* docs(exception_mapping.md): add missing exception types
Fixes https://github.com/Aider-AI/aider/issues/2120#issuecomment-2438971183
* fix(main.py): register custom model pricing with specific key
Ensure custom model pricing is registered to the specific model+provider key combination
* test: make testing more robust for custom pricing
* fix(redis_cache.py): instrument otel logging for sync redis calls
ensures complete coverage for all redis cache calls
* (Testing) Add unit testing for DualCache - ensure in memory cache is used when expected (#6471)
* test test_dual_cache_get_set
* unit testing for dual cache
* fix async_set_cache_sadd
* test_dual_cache_local_only
* redis otel tracing + async support for latency routing (#6452)
* docs(exception_mapping.md): add missing exception types
Fixes https://github.com/Aider-AI/aider/issues/2120#issuecomment-2438971183
* fix(main.py): register custom model pricing with specific key
Ensure custom model pricing is registered to the specific model+provider key combination
* test: make testing more robust for custom pricing
* fix(redis_cache.py): instrument otel logging for sync redis calls
ensures complete coverage for all redis cache calls
* refactor: pass parent_otel_span for redis caching calls in router
allows for more observability into what calls are causing latency issues
* test: update tests with new params
* refactor: ensure e2e otel tracing for router
* refactor(router.py): add more otel tracing acrosss router
catch all latency issues for router requests
* fix: fix linting error
* fix(router.py): fix linting error
* fix: fix test
* test: fix tests
* fix(dual_cache.py): pass ttl to redis cache
* fix: fix param
* fix(dual_cache.py): set default value for parent_otel_span
* fix(transformation.py): support 'response_format' for anthropic calls
* fix(transformation.py): check for cache_control inside 'function' block
* fix: fix linting error
* fix: fix linting errors
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
---------
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
* fix(core_helpers.py): return None, instead of raising kwargs is None error
Closes https://github.com/BerriAI/litellm/issues/6500
* docs(cost_tracking.md): cleanup doc
* fix(vertex_and_google_ai_studio.py): handle function call with no params passed in
Closes https://github.com/BerriAI/litellm/issues/6495
* test(test_router_timeout.py): add test for router timeout + retry logic
* test: update test to use module level values
* (fix) Prometheus - Log Postgres DB latency, status on prometheus (#6484)
* fix logging DB fails on prometheus
* unit testing log to otel wrapper
* unit testing for service logger + prometheus
* use LATENCY buckets for service logging
* fix service logging
* docs clarify vertex vs gemini
* (router_strategy/) ensure all async functions use async cache methods (#6489)
* fix router strat
* use async set / get cache in router_strategy
* add coverage for router strategy
* fix imports
* fix batch_get_cache
* use async methods for least busy
* fix least busy use async methods
* fix test_dual_cache_increment
* test async_get_available_deployment when routing_strategy="least-busy"
* (fix) proxy - fix when `STORE_MODEL_IN_DB` should be set (#6492)
* set store_model_in_db at the top
* correctly use store_model_in_db global
* (fix) `PrometheusServicesLogger` `_get_metric` should return metric in Registry (#6486)
* fix logging DB fails on prometheus
* unit testing log to otel wrapper
* unit testing for service logger + prometheus
* use LATENCY buckets for service logging
* fix service logging
* fix _get_metric in prom services logger
* add clear doc string
* unit testing for prom service logger
* bump: version 1.51.0 → 1.51.1
* Add `azure/gpt-4o-mini-2024-07-18` to model_prices_and_context_window.json (#6477)
* Update utils.py (#6468)
Fixed missing keys
* (perf) Litellm redis router fix - ~100ms improvement (#6483)
* docs(exception_mapping.md): add missing exception types
Fixes https://github.com/Aider-AI/aider/issues/2120#issuecomment-2438971183
* fix(main.py): register custom model pricing with specific key
Ensure custom model pricing is registered to the specific model+provider key combination
* test: make testing more robust for custom pricing
* fix(redis_cache.py): instrument otel logging for sync redis calls
ensures complete coverage for all redis cache calls
* refactor: pass parent_otel_span for redis caching calls in router
allows for more observability into what calls are causing latency issues
* test: update tests with new params
* refactor: ensure e2e otel tracing for router
* refactor(router.py): add more otel tracing acrosss router
catch all latency issues for router requests
* fix: fix linting error
* fix(router.py): fix linting error
* fix: fix test
* test: fix tests
* fix(dual_cache.py): pass ttl to redis cache
* fix: fix param
* perf(cooldown_cache.py): improve cooldown cache, to store cache results in memory for 5s, prevents redis call from being made on each request
reduces 100ms latency per call with caching enabled on router
* fix: fix test
* fix(cooldown_cache.py): handle if a result is None
* fix(cooldown_cache.py): add debug statements
* refactor(dual_cache.py): move to using an in-memory check for batch get cache, to prevent redis from being hit for every call
* fix(cooldown_cache.py): fix linting erropr
* refactor(prometheus.py): move to using standard logging payload for reading the remaining request / tokens
Ensures prometheus token tracking works for anthropic as well
* fix: fix linting error
* fix(redis_cache.py): make sure ttl is always int (handle float values)
Fixes issue where redis_client.ex was not working correctly due to float ttl
* fix: fix linting error
* test: update test
* fix: fix linting error
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
Co-authored-by: vibhanshu-ob <115142120+vibhanshu-ob@users.noreply.github.com>
* fix(anthropic/chat/transformation.py): support anthropic disable_parallel_tool_use param
Fixes https://github.com/BerriAI/litellm/issues/6456
* feat(anthropic/chat/transformation.py): support anthropic computer tool use
Closes https://github.com/BerriAI/litellm/issues/6427
* fix(vertex_ai/common_utils.py): parse out '$schema' when calling vertex ai
Fixes issue when trying to call vertex from vercel sdk
* fix(main.py): add 'extra_headers' support for azure on all translation endpoints
Fixes https://github.com/BerriAI/litellm/issues/6465
* fix: fix linting errors
* fix(transformation.py): handle no beta headers for anthropic
* test: cleanup test
* fix: fix linting error
* fix: fix linting errors
* fix: fix linting errors
* fix(transformation.py): handle dummy tool call
* fix(main.py): fix linting error
* fix(azure.py): pass required param
* LiteLLM Minor Fixes & Improvements (10/24/2024) (#6441)
* fix(azure.py): handle /openai/deployment in azure api base
* fix(factory.py): fix faulty anthropic tool result translation check
Fixes https://github.com/BerriAI/litellm/issues/6422
* fix(gpt_transformation.py): add support for parallel_tool_calls to azure
Fixes https://github.com/BerriAI/litellm/issues/6440
* fix(factory.py): support anthropic prompt caching for tool results
* fix(vertex_ai/common_utils): don't pop non-null required field
Fixes https://github.com/BerriAI/litellm/issues/6426
* feat(vertex_ai.py): support code_execution tool call for vertex ai + gemini
Closes https://github.com/BerriAI/litellm/issues/6434
* build(model_prices_and_context_window.json): Add 'supports_assistant_prefill' for bedrock claude-3-5-sonnet v2 models
Closes https://github.com/BerriAI/litellm/issues/6437
* fix(types/utils.py): fix linting
* test: update test to include required fields
* test: fix test
* test: handle flaky test
* test: remove e2e test - hitting gemini rate limits
* Litellm dev 10 26 2024 (#6472)
* docs(exception_mapping.md): add missing exception types
Fixes https://github.com/Aider-AI/aider/issues/2120#issuecomment-2438971183
* fix(main.py): register custom model pricing with specific key
Ensure custom model pricing is registered to the specific model+provider key combination
* test: make testing more robust for custom pricing
* fix(redis_cache.py): instrument otel logging for sync redis calls
ensures complete coverage for all redis cache calls
* (Testing) Add unit testing for DualCache - ensure in memory cache is used when expected (#6471)
* test test_dual_cache_get_set
* unit testing for dual cache
* fix async_set_cache_sadd
* test_dual_cache_local_only
* redis otel tracing + async support for latency routing (#6452)
* docs(exception_mapping.md): add missing exception types
Fixes https://github.com/Aider-AI/aider/issues/2120#issuecomment-2438971183
* fix(main.py): register custom model pricing with specific key
Ensure custom model pricing is registered to the specific model+provider key combination
* test: make testing more robust for custom pricing
* fix(redis_cache.py): instrument otel logging for sync redis calls
ensures complete coverage for all redis cache calls
* refactor: pass parent_otel_span for redis caching calls in router
allows for more observability into what calls are causing latency issues
* test: update tests with new params
* refactor: ensure e2e otel tracing for router
* refactor(router.py): add more otel tracing acrosss router
catch all latency issues for router requests
* fix: fix linting error
* fix(router.py): fix linting error
* fix: fix test
* test: fix tests
* fix(dual_cache.py): pass ttl to redis cache
* fix: fix param
* fix(dual_cache.py): set default value for parent_otel_span
* fix(transformation.py): support 'response_format' for anthropic calls
* fix(transformation.py): check for cache_control inside 'function' block
* fix: fix linting error
* fix: fix linting errors
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
* fix(azure.py): handle /openai/deployment in azure api base
* fix(factory.py): fix faulty anthropic tool result translation check
Fixes https://github.com/BerriAI/litellm/issues/6422
* fix(gpt_transformation.py): add support for parallel_tool_calls to azure
Fixes https://github.com/BerriAI/litellm/issues/6440
* fix(factory.py): support anthropic prompt caching for tool results
* fix(vertex_ai/common_utils): don't pop non-null required field
Fixes https://github.com/BerriAI/litellm/issues/6426
* feat(vertex_ai.py): support code_execution tool call for vertex ai + gemini
Closes https://github.com/BerriAI/litellm/issues/6434
* build(model_prices_and_context_window.json): Add 'supports_assistant_prefill' for bedrock claude-3-5-sonnet v2 models
Closes https://github.com/BerriAI/litellm/issues/6437
* fix(types/utils.py): fix linting
* test: update test to include required fields
* test: fix test
* test: handle flaky test
* test: remove e2e test - hitting gemini rate limits
* docs(bedrock.md): clarify bedrock auth in litellm docs
* fix(convert_dict_to_response.py): Fixes https://github.com/BerriAI/litellm/issues/6387
* feat(pattern_match_deployments.py): more robust handling for wildcard routes (model_name: custom_route/* -> openai/*)
Enables user to expose custom routes to users with dynamic handling
* test: add more testing
* docs(custom_pricing.md): add debug tutorial for custom pricing
* test: skip codestral test - unreachable backend
* test: fix test
* fix(pattern_matching_deployments.py): fix typing
* test: cleanup codestral tests - backend api unavailable
* (refactor) prometheus async_log_success_event to be under 100 LOC (#6416)
* unit testig for prometheus
* unit testing for success metrics
* use 1 helper for _increment_token_metrics
* use helper for _increment_remaining_budget_metrics
* use _increment_remaining_budget_metrics
* use _increment_top_level_request_and_spend_metrics
* use helper for _set_latency_metrics
* remove noqa violation
* fix test prometheus
* test prometheus
* unit testing for all prometheus helper functions
* fix prom unit tests
* fix unit tests prometheus
* fix unit test prom
* (refactor) router - use static methods for client init utils (#6420)
* use InitalizeOpenAISDKClient
* use InitalizeOpenAISDKClient static method
* fix # noqa: PLR0915
* (code cleanup) remove unused and undocumented logging integrations - litedebugger, berrispend (#6406)
* code cleanup remove unused and undocumented code files
* fix unused logging integrations cleanup
* bump: version 1.50.3 → 1.50.4
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
* fix(utils.py): add 'disallowed_special' for token counting on .encode()
Fixes error when '<
endoftext
>' in string
* Revert "(fix) standard logging metadata + add unit testing (#6366)" (#6381)
This reverts commit 8359cb6fa9.
* add new 35 mode lcard (#6378)
* Add claude 3 5 sonnet 20241022 models for all provides (#6380)
* Add Claude 3.5 v2 on Amazon Bedrock and Vertex AI.
* added anthropic/claude-3-5-sonnet-20241022
* add new 35 mode lcard
---------
Co-authored-by: Paul Gauthier <paul@paulg.com>
Co-authored-by: lowjiansheng <15527690+lowjiansheng@users.noreply.github.com>
* test(skip-flaky-google-context-caching-test): google is not reliable. their sample code is also not working
* Fix metadata being overwritten in speech() (#6295)
* fix: adding missing redis cluster kwargs (#6318)
Co-authored-by: Ali Arian <ali.arian@breadfinancial.com>
* Add support for `max_completion_tokens` in Azure OpenAI (#6376)
Now that Azure supports `max_completion_tokens`, no need for special handling for this param and let it pass thru. More details: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models?tabs=python-secure#api-support
* build(model_prices_and_context_window.json): add voyage-finance-2 pricing
Closes https://github.com/BerriAI/litellm/issues/6371
* build(model_prices_and_context_window.json): fix llama3.1 pricing model name on map
Closes https://github.com/BerriAI/litellm/issues/6310
* feat(realtime_streaming.py): just log specific events
Closes https://github.com/BerriAI/litellm/issues/6267
* fix(utils.py): more robust checking if unmapped vertex anthropic model belongs to that family of models
Fixes https://github.com/BerriAI/litellm/issues/6383
* Fix Ollama stream handling for tool calls with None content (#6155)
* test(test_max_completions): update test now that azure supports 'max_completion_tokens'
* fix(handler.py): fix linting error
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Low Jian Sheng <15527690+lowjiansheng@users.noreply.github.com>
Co-authored-by: David Manouchehri <david.manouchehri@ai.moda>
Co-authored-by: Paul Gauthier <paul@paulg.com>
Co-authored-by: John HU <hszqqq12@gmail.com>
Co-authored-by: Ali Arian <113945203+ali-arian@users.noreply.github.com>
Co-authored-by: Ali Arian <ali.arian@breadfinancial.com>
Co-authored-by: Anand Taralika <46954145+taralika@users.noreply.github.com>
Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com>
* fix get_response_headers
* unit testing for get headers
* unit testing for anthropic / azure openai headers
* increase test coverage for test_completion_response_ratelimit_headers
* fix test rate limit headers
* nvidia nim support embedding config
* add nvidia config in init
* nvidia nim embeddings
* docs nvidia nim embeddings
* docs embeddings on nvidia nim
* fix llm translation test
* feat(together_ai/completion): handle together ai completion calls
* fix: handle list of int / list of list of int for text completion calls
* fix(utils.py): check if base model in bedrock converse model list
Fixes https://github.com/BerriAI/litellm/issues/6003
* test(test_optional_params.py): add unit tests for bedrock optional param mapping
Fixes https://github.com/BerriAI/litellm/issues/6003
* feat(utils.py): enable passing dummy tool call for anthropic/bedrock calls if tool_use blocks exist
Fixes https://github.com/BerriAI/litellm/issues/5388
* fixed an issue with tool use of claude models with anthropic and bedrock (#6013)
* fix(utils.py): handle empty schema for anthropic/bedrock
Fixes https://github.com/BerriAI/litellm/issues/6012
* fix: fix linting errors
* fix: fix linting errors
* fix: fix linting errors
* fix(proxy_cli.py): fix import route for app + health checks path (#6026)
* (testing): Enable testing us.anthropic.claude-3-haiku-20240307-v1:0. (#6018)
* fix(proxy_cli.py): fix import route for app + health checks gettsburg.wav
Fixes https://github.com/BerriAI/litellm/issues/5999
---------
Co-authored-by: David Manouchehri <david.manouchehri@ai.moda>
---------
Co-authored-by: Ved Patwardhan <54766411+vedpatwardhan@users.noreply.github.com>
Co-authored-by: David Manouchehri <david.manouchehri@ai.moda>
* add test for using images with custom openai endpoints
* run all otel tests
* update name of test
* add custom openai model to test config
* add test for setting supports_vision=True for model
* fix test guardrails aporia
* docs supports vison
* fix yaml
* fix yaml
* docs supports vision
* fix bedrock guardrail test
* fix cohere rerank test
* update model_group doc string
* add better prints on test
* add requester_metadata in standard logging payload
* log requester_metadata in metadata
* use StandardLoggingPayload for logging
* docs StandardLoggingPayload
* fix import
* include standard logging object in failure
* add test for requester metadata
* handle completion_tokens_details
* add test for completion_tokens_details
* add max_completion_tokens
* add max_completion_tokens
* add max_completion_tokens support for OpenAI models
* add max_completion_tokens param
* add max_completion_tokens for bedrock converse models
* add test for converse maxTokens
* fix openai o1 param mapping test
* move test optional params
* add max_completion_tokens for anthropic api
* fix conftest
* add max_completion tokens for vertex ai partner models
* add max_completion_tokens for fireworks ai
* add max_completion_tokens for hf rest api
* add test for param mapping
* add param mapping for vertex, gemini + testing
* predibase is the most unstable and unusable llm api in prod, can't handle our ci/cd
* add max_completion_tokens to openai supported params
* fix fireworks ai param mapping