Commit graph

247 commits

Author SHA1 Message Date
Krish Dholakia
61afdab228 refactor(sagemaker/): separate chat + completion routes + make them b… (#7151)
* refactor(sagemaker/): separate chat + completion routes + make them both use base llm config

Addresses https://github.com/andrewyng/aisuite/issues/113#issuecomment-2512369132

* fix(main.py): pass hf model name + custom prompt dict to litellm params
2024-12-10 19:40:05 -08:00
Krish Dholakia
4eeaaeeacd refactor(fireworks_ai/): inherit from openai like base config (#7146)
* refactor(fireworks_ai/): inherit from openai like base config

refactors fireworks ai to use a common config

* test: fix import in test

* refactor(watsonx/): refactor watsonx to use llm base config

refactors chat + completion routes to base config path

* fix: fix linting error

* test: fix test

* fix: fix test
2024-12-10 16:15:19 -08:00
Ishaan Jaff
0df4dc51de (Refactor) Code Quality improvement - Use Common base handler for anthropic_text/ (#7143)
* add anthropic text provider

* add ANTHROPIC_TEXT to LlmProviders

* fix anthropic text implementation

* working anthropic text claude-2

* test_acompletion_claude2_stream

* add param mapping for anthropic text

* fix unused imports

* fix anthropic completion handler.py
2024-12-10 12:23:58 -08:00
Ishaan Jaff
1b377d5229 (Refactor) Code Quality improvement - Use Common base handler for Cohere /generate API (#7122)
* use validate_environment in common utils

* use transform request / response for cohere

* remove unused file

* use cohere base_llm_http_handler

* working cohere generate api on llm http handler

* streaming cohere generate api

* fix get_model_response_iterator

* fix streaming handler

* fix get_model_response_iterator

* test_cohere_generate_api_completion

* fix linting error

* fix testing cohere raising error

* fix get_model_response_iterator type

* add testing cohere generate api
2024-12-10 10:44:42 -08:00
Ishaan Jaff
9c2316b7ec (Refactor) Code Quality improvement - Use Common base handler for cloudflare/ provider (#7127)
* add get_complete_url to base config

* cloudflare - refactor to following existing pattern

* migrate cloudflare chat completions to base llm http handler

* fix unused import

* fix fake stream in cloudflare

* fix cloudflare transformation

* fix naming for BaseModelResponseIterator

* add async cloudflare streaming test

* test cloudflare

* add handler.py

* add handler.py in cohere handler.py
2024-12-10 10:12:22 -08:00
Ishaan Jaff
28ff38e35d (Refactor) Code Quality improvement - Use Common base handler for clarifai/ (#7125)
* use base_llm_http_handler for clarifai

* fix clarifai completion

* handle faking streaming base llm http handler

* add fake streaming for clarifai

* add FakeStreamResponseIterator for base model iterator

* fix get_model_response_iterator

* fix base model iterator

* fix base model iterator

* add support for faking sync streams clarifai

* add fake streaming for clarifai

* remove unused code

* fix import

* fix llm http handler

* test_async_completion_clarifai

* fix clarifai tests

* fix linting
2024-12-09 21:04:48 -08:00
Ishaan Jaff
c5e0407703 (Refactor) Code Quality improvement - use Common base handler for Cohere (#7117)
* fix use new format for Cohere config

* fix base llm http handler

* Litellm code qa common config (#7116)

* feat(base_llm): initial commit for common base config class

Addresses code qa critique https://github.com/andrewyng/aisuite/issues/113#issuecomment-2512369132

* feat(base_llm/): add transform request/response abstract methods to base config class

---------

Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>

* use base transform helpers

* use base_llm_http_handler for cohere

* working cohere using base llm handler

* add async cohere chat completion support on base handler

* fix completion code

* working sync cohere stream

* add async support cohere_chat

* fix types get_model_response_iterator

* async / sync tests cohere

* feat  cohere using base llm class

* fix linting errors

* fix _abc error

* add cohere params to transformation

* remove old cohere file

* fix type error

* fix merge conflicts

* fix cohere merge conflicts

* fix linting error

* fix litellm.llms.custom_httpx.http_handler.HTTPHandler.post

* fix passing cohere specific params

---------

Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
2024-12-09 17:45:29 -08:00
Krish Dholakia
501885d653 Litellm code qa common config (#7113)
* feat(base_llm): initial commit for common base config class

Addresses code qa critique https://github.com/andrewyng/aisuite/issues/113#issuecomment-2512369132

* feat(base_llm/): add transform request/response abstract methods to base config class

* feat(cohere-+-clarifai): refactor integrations to use common base config class

* fix: fix linting errors

* refactor(anthropic/): move anthropic + vertex anthropic to use base config

* test: fix xai test

* test: fix tests

* fix: fix linting errors

* test: comment out WIP test

* fix(transformation.py): fix is pdf used check

* fix: fix linting error
2024-12-09 15:58:25 -08:00
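Note on #7113: the "common base config class" with abstract transform request/response methods can be pictured roughly as below. This is only a sketch of the general pattern, not the project's code: the class names, method signatures, and the made-up "echo" provider are all assumptions chosen for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseProviderConfig(ABC):
    """Hypothetical common base class (name and signatures are assumptions)."""

    @abstractmethod
    def transform_request(
        self, model: str, messages: List[Dict[str, str]]
    ) -> Dict[str, Any]:
        """Turn an OpenAI-style message list into the provider's request body."""

    @abstractmethod
    def transform_response(self, raw: Dict[str, Any]) -> str:
        """Extract the generated text from the provider's raw response."""


class EchoProviderConfig(BaseProviderConfig):
    """Made-up provider that expects a single 'prompt' string."""

    def transform_request(self, model, messages):
        # Flatten the chat messages into one newline-separated prompt.
        prompt = "\n".join(m["content"] for m in messages)
        return {"model": model, "prompt": prompt}

    def transform_response(self, raw):
        # The made-up provider nests its text under output.text.
        return raw["output"]["text"]
```

A shared HTTP handler would then only need to call these two methods, which is presumably what lets each provider drop its own request/response plumbing.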
Krish Dholakia
20e8dc35e1 feat(langfuse/): support langfuse prompt management (#7073)
* feat(langfuse/): support langfuse prompt management

Initial working commit for langfuse prompt management support

Closes https://github.com/BerriAI/litellm/issues/6269

* test: update test

* fix(litellm_logging.py): suppress linting error
2024-12-06 23:10:22 -08:00
Krish Dholakia
df3da2e5d2 Litellm dev 12 06 2024 (#7067)
* fix(edit_budget_modal.tsx): call `/budget/update` endpoint instead of `/budget/new`

allows updating existing budget on ui

* fix(user_api_key_auth.py): support cost tracking for end user via jwt field

* fix(presidio.py): support pii masking on sync logging callbacks

enables masking before logging to langfuse

* feat(utils.py): support retry policy logic inside '.completion()'

Fixes https://github.com/BerriAI/litellm/issues/6623

* fix(utils.py): support retry by retry policy on async logic as well

* fix(handle_jwt.py): set default leeway value

* test: fix test to handle jwt audience claim
2024-12-06 22:44:18 -08:00
Ishaan Jaff
ce1e4b1d5e (feat) Allow enabling logging message / response for specific virtual keys (#7071)
* redact_message_input_output_from_logging

* initialize_standard_callback_dynamic_params

* allow dynamically opting out of redaction

* test_redact_msgs_from_logs_with_dynamic_params

* fix AddTeamCallback

* _get_turn_off_message_logging_from_dynamic_params

* test_global_redaction_with_dynamic_params

* test_dynamic_turn_off_message_logging

* docs Disable/Enable Message redaction

* fix code quality check

* _get_turn_off_message_logging_from_dynamic_params
2024-12-06 21:25:36 -08:00
Krish Dholakia
92a7e8e3e9 LiteLLM Minor Fixes & Improvements (12/05/2024) (#7051)
* fix(cost_calculator.py): move to using `.get_model_info()` for cost per token calculations

ensures cost tracking is reliable - handles edge cases of parsing model cost map

* build(model_prices_and_context_window.json): add 'supports_response_schema' for select tgai models

Fixes https://github.com/BerriAI/litellm/pull/7037#discussion_r1872157329

* build(model_prices_and_context_window.json): remove 'pdf input' and 'vision' support from nova micro in model map

Bedrock docs indicate no support for micro - https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference-supported-models-features.html

* fix(converse_transformation.py): support amazon nova tool use

* fix(opentelemetry): Add missing LLM request type attribute to spans (#7041)

* feat(opentelemetry): add LLM request type attribute to spans

* lint

* fix: curl usage (#7038)

curl -d, --data <data> is lowercase d
curl -D, --dump-header <filename> is uppercase D

references:
https://curl.se/docs/manpage.html#-d
https://curl.se/docs/manpage.html#-D

* fix(spend_tracking.py): handle empty 'id' in model response - when creating spend log

Fixes https://github.com/BerriAI/litellm/issues/7023

* fix(streaming_chunk_builder.py): handle initial id being empty string

Fixes https://github.com/BerriAI/litellm/issues/7023

* fix(anthropic_passthrough_logging_handler.py): add end user cost tracking for anthropic pass through endpoint

* docs(pass_through/): refactor docs location + add table on supported features for pass through endpoints

* feat(anthropic_passthrough_logging_handler.py): support end user cost tracking via anthropic sdk

* docs(anthropic_completion.md): add docs on passing end user param for cost tracking on anthropic sdk

* fix(litellm_logging.py): use standard logging payload if present in kwargs

prevent datadog logging error for pass through endpoints

* docs(bedrock.md): add rerank api usage example to docs

* bugfix/change dummy tool name format (#7053)

* fix viewing keys (#7042)

* ui new build

* build(model_prices_and_context_window.json): add bedrock region models to model cost map (#7044)

* bye (#6982)

* (fix) litellm router.aspeech  (#6962)

* doc Migrating Databases

* fix aspeech on router

* test_audio_speech_router

* test_audio_speech_router

* docs show supported providers on batches api doc

* change dummy tool name format

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: yujonglee <yujonglee.dev@gmail.com>

* fix: fix linting errors

* test: update test

* fix(litellm_logging.py): fix pass through check

* fix(test_otel_logging.py): fix test

* fix(cost_calculator.py): update handling for cost per second

* fix(cost_calculator.py): fix cost check

* test: fix test

* (fix) adding public routes when using custom header  (#7045)

* get_api_key_from_custom_header

* add test_get_api_key_from_custom_header

* fix testing use 1 file for test user api key auth

* fix test user api key auth

* test_custom_api_key_header_name

* build: update ui build

---------

Co-authored-by: Doron Kopit <83537683+doronkopit5@users.noreply.github.com>
Co-authored-by: lloydchang <lloydchang@gmail.com>
Co-authored-by: hgulersen <haymigulersen@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: yujonglee <yujonglee.dev@gmail.com>
2024-12-06 14:29:53 -08:00
Krish Dholakia
a392bd9772 fix(key_management_endpoints.py): override metadata field value on up… (#7008)
* fix(key_management_endpoints.py): override metadata field value on update

allow user to override tags

* feat(__init__.py): expose new disable_end_user_cost_tracking_prometheus_only metric

allow disabling end user cost tracking on prometheus - fixes cardinality issue

* fix(litellm_pre_call_utils.py): add key/team level enforced params

Fixes https://github.com/BerriAI/litellm/issues/6652

* fix(key_management_endpoints.py): allow user to pass in `enforced_params` as a top level param on /key/generate and /key/update

* docs(enterprise.md): add docs on enforcing required params for llm requests

* Add support of Galadriel API (#7005)

* fix(router.py): robust retry after handling

set retry after time to 0 if >0 healthy deployments. handle base case = 1 deployment

* test(test_router.py): fix test

* feat(bedrock/): add support for 'nova' models

also adds explicit 'converse/' route for simpler routing

* fix: fix 'supports_pdf_input'

return if model supports pdf input on get_model_info

* feat(converse_transformation.py): support bedrock pdf input

* docs(document_understanding.md): add document understanding to docs

* fix(litellm_pre_call_utils.py): fix linting error

* fix(init.py): fix passing of bedrock converse models

* feat(bedrock/converse): support 'response_format={"type": "json_object"}'

* fix(converse_handler.py): fix linting error

* fix(base_llm_unit_tests.py): fix test

* fix: fix test

* test: fix test

* test: fix test

* test: remove duplicate test

---------

Co-authored-by: h4n0 <4738254+h4n0@users.noreply.github.com>
2024-12-03 23:03:50 -08:00
Krrish Dholakia
70f7d7e787 feat(databricks/chat): support structured outputs on databricks
Closes https://github.com/BerriAI/litellm/pull/6978

- handles content as list for dbrx
- handles streaming+response_format for dbrx
2024-12-02 23:08:19 -08:00
Ishaan Jaff
5656e06fe6 (fixes) datadog logging - handle 1MB max log size on DD (#6996)
* fix dd truncate_standard_logging_payload_content

* dd truncate_standard_logging_payload_content

* fix test_datadog_payload_content_truncation

* add clear msg on _truncate_text

* test_truncate_standard_logging_payload

* fix linting error

* fix linting errors
2024-12-02 23:01:42 -08:00
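Note on #6996: the idea of truncating payload content before it is sent to a log sink with a size cap can be sketched as follows. The helper names, the per-field character limit, and the wording of the truncation message are assumptions for illustration, not the values used in the commit.

```python
MAX_CHARS = 10_000  # assumed per-field limit, chosen for illustration only


def truncate_text(text: str, max_chars: int = MAX_CHARS) -> str:
    """Cut text to max_chars and append a clear note about what was dropped."""
    if len(text) <= max_chars:
        return text
    dropped = len(text) - max_chars
    # The note makes the result slightly longer than max_chars.
    return f"{text[:max_chars]}... (truncated {dropped} chars)"


def truncate_payload(payload: dict, max_chars: int = MAX_CHARS) -> dict:
    """Return a copy with long string values truncated; other values untouched."""
    return {
        key: truncate_text(value, max_chars) if isinstance(value, str) else value
        for key, value in payload.items()
    }
```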
Ishaan Jaff
204d83b3d1 (fix) logging Auth errors on datadog (#6995)
* fix get_standard_logging_object_payload

* fix async_post_call_failure_hook

* fix post_call_failure_hook

* fix change

* fix _is_proxy_only_error

* fix async_post_call_failure_hook

* fix getting request body

* remove redundant code

* use a well named original function name for auth errors

* fix logging auth fails on DD

* fix using request body

* use helper for _handle_logging_proxy_only_error
2024-12-02 23:01:21 -08:00
Krish Dholakia
4244201e48 Litellm 12 02 2024 (#6994)
* add the logprobs param for fireworks ai (#6915)

* add the logprobs param for fireworks ai

* (feat) pass through llm endpoints - add `PATCH` support (vertex context caching requires for update ops)  (#6924)

* add PATCH for pass through endpoints

* test_pass_through_routes_support_all_methods

* sonnet supports pdf, haiku does not (#6928)

* (feat) DataDog Logger - Add Failure logging + use Standard Logging payload (#6929)

* add async_log_failure_event for dd

* use standard logging payload for DD logging

* use standard logging payload for DD

* fix use SLP status

* allow opting into _create_v0_logging_payload

* add unit tests for DD logging payload

* fix dd logging tests

* (feat) log proxy auth errors on datadog  (#6931)

* add new dd type for auth errors

* add async_log_proxy_authentication_errors

* fix comment

* use async_log_proxy_authentication_errors

* test_datadog_post_call_failure_hook

* test_async_log_proxy_authentication_errors

* (feat) Allow using include to include external YAML files in a config.yaml (#6922)

* add helper to process includes directive on yaml

* add doc on config management

* unit tests for `include` on config.yaml

* bump: version 1.52.16 → 1.53.

* (feat) dd logger - set tags according to the values set by those env vars  (#6933)

* dd logger, inherit from .envs

* test_datadog_payload_environment_variables

* fix _get_datadog_service

* build(ui/): update ui build

* bump: version 1.53.0 → 1.53.1

* Revert "(feat) Allow using include to include external YAML files in a config.yaml (#6922)"

This reverts commit 68e59824a3.

* LiteLLM Minor Fixes & Improvements (11/26/2024)  (#6913)

* docs(config_settings.md): document all router_settings

* ci(config.yml): add router_settings doc test to ci/cd

* test: debug test on ci/cd

* test: debug ci/cd test

* test: fix test

* fix(team_endpoints.py): skip invalid team object. don't fail `/team/list` call

Causes downstream errors if ui just fails to load team list

* test(base_llm_unit_tests.py): add 'response_format={"type": "text"}' test to base_llm_unit_tests

adds complete coverage for all 'response_format' values to ci/cd

* feat(router.py): support wildcard routes in `get_router_model_info()`

Addresses https://github.com/BerriAI/litellm/issues/6914

* build(model_prices_and_context_window.json): add tpm/rpm limits for all gemini models

Allows for ratelimit tracking for gemini models even with wildcard routing enabled

Addresses https://github.com/BerriAI/litellm/issues/6914

* feat(router.py): add tpm/rpm tracking on success/failure to global_router

Addresses https://github.com/BerriAI/litellm/issues/6914

* feat(router.py): support wildcard routes on router.get_model_group_usage()

* fix(router.py): fix linting error

* fix(router.py): implement get_remaining_tokens_and_requests

Addresses https://github.com/BerriAI/litellm/issues/6914

* fix(router.py): fix linting errors

* test: fix test

* test: fix tests

* docs(config_settings.md): add missing dd env vars to docs

* fix(router.py): check if hidden params is dict

* LiteLLM Minor Fixes & Improvements (11/27/2024) (#6943)

* fix(http_parsing_utils.py): remove `ast.literal_eval()` from http utils

Security fix - https://huntr.com/bounties/96a32812-213c-4819-ba4e-36143d35e95b?token=bf414bbd77f8b346556e64ab2dd9301ea44339910877ea50401c76f977e36cdd78272f5fb4ca852a88a7e832828aae1192df98680544ee24aa98f3cf6980d8bab641a66b7ccbc02c0e7d4ddba2db4dbe7318889dc0098d8db2d639f345f574159814627bb084563bad472e2f990f825bff0878a9e281e72c88b4bc5884d637d186c0d67c9987c57c3f0caf395aff07b89ad2b7220d1dd7d1b427fd2260b5f01090efce5250f8b56ea2c0ec19916c24b23825d85ce119911275944c840a1340d69e23ca6a462da610

* fix(converse/transformation.py): support bedrock apac cross region inference

Fixes https://github.com/BerriAI/litellm/issues/6905

* fix(user_api_key_auth.py): add auth check for websocket endpoint

Fixes https://github.com/BerriAI/litellm/issues/6926

* fix(user_api_key_auth.py): use `model` from query param

* fix: fix linting error

* test: run flaky tests first

* docs: update the docs (#6923)

* (bug fix) /key/update was not storing `budget_duration` in the DB  (#6941)

* fix - store budget_duration for keys

* test_generate_and_update_key

* test_update_user_unit_test

* fix user update

* (fix) handle json decode errors for DD exception logging (#6934)

* fix JSONDecodeError

* handle async_log_proxy_authentication_errors

* fix test_async_log_proxy_authentication_errors_get_request

* Revert "Revert "(feat) Allow using include to include external YAML files in a config.yaml (#6922)""

This reverts commit 5d13302e6b.

* (docs + fix) Add docs on Moderations endpoint, Text Completion  (#6947)

* fix _pass_through_moderation_endpoint_factory

* fix route_llm_request

* doc moderations api

* docs on /moderations

* add e2e tests for moderations api

* docs moderations api

* test_pass_through_moderation_endpoint_factory

* docs text completion

* (feat) add enforcement for unique key aliases on /key/update and /key/generate  (#6944)

* add enforcement for unique key aliases

* fix _enforce_unique_key_alias

* fix _enforce_unique_key_alias

* fix _enforce_unique_key_alias

* test_enforce_unique_key_alias

* (fix) tag merging / aggregation logic   (#6932)

* use 1 helper to merge tags + ensure uniqueness

* test_add_litellm_data_to_request_duplicate_tags

* fix _merge_tags

* fix proxy utils test

* fix doc string

* (feat) Allow disabling ErrorLogs written to the DB  (#6940)

* fix - allow disabling logging error logs

* docs on disabling error logs

* doc string for _PROXY_failure_handler

* test_disable_error_logs

* rename file

* fix rename file

* increase test coverage for test_enable_error_logs

* fix(key_management_endpoints.py): support 'tags' param on `/key/update` (#6945)

* LiteLLM Minor Fixes & Improvements (11/29/2024)  (#6965)

* fix(factory.py): ensure tool call converts image url

Fixes https://github.com/BerriAI/litellm/issues/6953

* fix(transformation.py): support mp4 + pdf URLs for vertex ai

Fixes https://github.com/BerriAI/litellm/issues/6936

* fix(http_handler.py): mask gemini api key in error logs

Fixes https://github.com/BerriAI/litellm/issues/6963

* docs(prometheus.md): update prometheus FAQs

* feat(auth_checks.py): ensure specific model access > wildcard model access

if wildcard model is in access group, but specific model is not - deny access

* fix(auth_checks.py): handle auth checks for team based model access groups

handles scenario where model access group used for wildcard models

* fix(internal_user_endpoints.py): support adding guardrails on `/user/update`

Fixes https://github.com/BerriAI/litellm/issues/6942

* fix(key_management_endpoints.py): fix prepare_metadata_fields helper

* fix: fix tests

* build(requirements.txt): bump openai dep version

fixes proxies argument

* test: fix tests

* fix(http_handler.py): fix error message masking

* fix(bedrock_guardrails.py): pass in prepped data

* test: fix test

* test: fix nvidia nim test

* fix(http_handler.py): return original response headers

* fix: revert maskedhttpstatuserror

* test: update tests

* test: cleanup test

* fix(key_management_endpoints.py): fix metadata field update logic

* fix(key_management_endpoints.py): maintain initial order of guardrails in key update

* fix(key_management_endpoints.py): handle prepare metadata

* fix: fix linting errors

* fix: fix linting errors

* fix: fix linting errors

* fix: fix key management errors

* fix(key_management_endpoints.py): update metadata

* test: update test

* refactor: add more debug statements

* test: skip flaky test

* test: fix test

* fix: fix test

* fix: fix update metadata logic

* fix: fix test

* ci(config.yml): change db url for e2e ui testing

* bump: version 1.53.1 → 1.53.2

* Updated config.yml

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: Sara Han <127759186+sdiazlor@users.noreply.github.com>

* fix(exceptions.py): ensure ratelimit error code == 429, type == "throttling_error"

Fixes https://github.com/BerriAI/litellm/pull/6973

* fix(utils.py): add jina ai dimensions embedding param support

Fixes https://github.com/BerriAI/litellm/issues/6591

* fix(exception_mapping_utils.py): add bedrock 'prompt is too long' exception to context window exceeded error exception mapping

Fixes https://github.com/BerriAI/litellm/issues/6629

Closes https://github.com/BerriAI/litellm/pull/6975

* fix(litellm_logging.py): strip trailing slash for api base

Closes https://github.com/BerriAI/litellm/pull/6859

* test: skip timeout issue

---------

Co-authored-by: ershang-dou <erlie.shang@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
Co-authored-by: Sara Han <127759186+sdiazlor@users.noreply.github.com>
2024-12-02 22:00:01 -08:00
Krish Dholakia
1c8010bf57 Litellm dev 11 30 2024 (#6974)
* feat(cohere/chat.py): return citations in model response

Closes https://github.com/BerriAI/litellm/issues/6814

* fix(cohere/chat.py): fix linting errors

* fix(langsmith.py): support 'run_id' for langsmith

Fixes https://github.com/BerriAI/litellm/issues/6862

* fix(langsmith.py): fix langsmith quickstart

Fixes https://github.com/BerriAI/litellm/issues/6861

* fix: suppress linting error

* LiteLLM Minor Fixes & Improvements (11/29/2024)  (#6965)

* fix(factory.py): ensure tool call converts image url

Fixes https://github.com/BerriAI/litellm/issues/6953

* fix(transformation.py): support mp4 + pdf URLs for vertex ai

Fixes https://github.com/BerriAI/litellm/issues/6936

* fix(http_handler.py): mask gemini api key in error logs

Fixes https://github.com/BerriAI/litellm/issues/6963

* docs(prometheus.md): update prometheus FAQs

* feat(auth_checks.py): ensure specific model access > wildcard model access

if wildcard model is in access group, but specific model is not - deny access

* fix(auth_checks.py): handle auth checks for team based model access groups

handles scenario where model access group used for wildcard models

* fix(internal_user_endpoints.py): support adding guardrails on `/user/update`

Fixes https://github.com/BerriAI/litellm/issues/6942

* fix(key_management_endpoints.py): fix prepare_metadata_fields helper

* fix: fix tests

* build(requirements.txt): bump openai dep version

fixes proxies argument

* test: fix tests

* fix(http_handler.py): fix error message masking

* fix(bedrock_guardrails.py): pass in prepped data

* test: fix test

* test: fix nvidia nim test

* fix(http_handler.py): return original response headers

* fix: revert maskedhttpstatuserror

* test: update tests

* test: cleanup test

* fix(key_management_endpoints.py): fix metadata field update logic

* fix(key_management_endpoints.py): maintain initial order of guardrails in key update

* fix(key_management_endpoints.py): handle prepare metadata

* fix: fix linting errors

* fix: fix linting errors

* fix: fix linting errors

* fix: fix key management errors

* fix(key_management_endpoints.py): update metadata

* test: update test

* refactor: add more debug statements

* test: skip flaky test

* test: fix test

* fix: fix test

* fix: fix update metadata logic

* fix: fix test

* ci(config.yml): change db url for e2e ui testing

* test: add more debug logs to langsmith

* fix: test change

* build(config.yml): fix db url
2024-12-02 21:03:33 -08:00
Ishaan Jaff
72afed5b7e (QOL improvement) Provider budget routing - allow using 1s, 1d, 1mo, 2mo etc (#6885)
* use 1 file for duration_in_seconds

* add to readme.md

* re use duration_in_seconds

* fix importing _extract_from_regex, get_last_day_of_month

* fix import

* update provider budget routing

* fix - remove dup test
2024-11-23 16:59:46 -08:00
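Note on #6885: duration strings such as 1s, 1d, 1mo and 2mo can be converted to seconds along the lines below. This is a minimal sketch with assumed names. It treats a month as a fixed 30 days, which is a simplification: the commit's mention of get_last_day_of_month suggests the real helper is calendar-aware.

```python
import re

# Seconds per unit; "mo" is approximated as 30 days (an assumption).
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "mo": 30 * 86400}

# "mo" is listed before "m" so the longer unit is tried first.
_PATTERN = re.compile(r"(\d+)(mo|s|m|h|d)")


def duration_to_seconds(duration: str) -> int:
    """Convert strings like '30s', '1d' or '2mo' to a number of seconds."""
    match = _PATTERN.fullmatch(duration.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]
```

Keeping one such helper in a single file, as the commit describes, avoids each caller parsing budget durations slightly differently.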
Krish Dholakia
26f5f9c211 LiteLLM Minor Fixes & Improvements (11/23/2024) (#6870)
* feat(pass_through_endpoints/): support logging anthropic/gemini pass through calls to langfuse/s3/etc.

* fix(utils.py): allow disabling end user cost tracking with new param

Allows proxy admin to disable cost tracking for end user - keeps prometheus metrics small

* docs(configs.md): add disable_end_user_cost_tracking reference to docs

* feat(key_management_endpoints.py): add support for restricting access to `/key/generate` by team/proxy level role

Enables admin to restrict key creation, and assign team admins to handle distributing keys

* test(test_key_management.py): add unit testing for personal / team key restriction checks

* docs: add docs on restricting key creation

* docs(finetuned_models.md): add new guide on calling finetuned models

* docs(input.md): cleanup anthropic supported params

Closes https://github.com/BerriAI/litellm/issues/6856

* test(test_embedding.py): add test for passing extra headers via embedding

* feat(cohere/embed): pass client to async embedding

* feat(rerank.py): add `/v1/rerank` if missing for cohere base url

Closes https://github.com/BerriAI/litellm/issues/6844

* fix(main.py): pass extra_headers param to openai

Fixes https://github.com/BerriAI/litellm/issues/6836

* fix(litellm_logging.py): don't disable global callbacks when dynamic callbacks are set

Fixes issue where global callbacks - e.g. prometheus - were overridden when langfuse was set dynamically

* fix(handler.py): fix linting error

* fix: fix typing

* build: add conftest to proxy_admin_ui_tests/

* test: fix test

* fix: fix linting errors

* test: fix test

* fix: fix pass through testing
2024-11-23 15:17:40 +05:30
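Note on #6870: appending `/v1/rerank` to a base URL only when it is missing can be sketched as below. The function name and the handling of a base that already ends in `/v1` are assumptions for illustration; the commit itself only states that the path is added if missing.

```python
def ensure_rerank_path(api_base: str) -> str:
    """Return api_base with exactly one '/v1/rerank' suffix."""
    base = api_base.rstrip("/")  # ignore any trailing slash
    if base.endswith("/v1/rerank"):
        return base
    if base.endswith("/v1"):
        # Assumed convenience: avoid producing '/v1/v1/rerank'.
        return f"{base}/rerank"
    return f"{base}/v1/rerank"
```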
Krish Dholakia
4eca6ede4e Litellm dev 11 21 2024 (#6837)
* Fix Vertex AI function calling invoke: use JSON format instead of protobuf text format. (#6702)

* test: test tool_call conversion when arguments is empty dict

Fixes https://github.com/BerriAI/litellm/issues/6833

* fix(openai_like/handler.py): return more descriptive error message

Fixes https://github.com/BerriAI/litellm/issues/6812

* test: skip overloaded model

* docs(anthropic.md): update anthropic docs to show how to route to any new model

* feat(groq/): fake stream when 'response_format' param is passed

Groq doesn't support streaming when response_format is set

* feat(groq/): add response_format support for groq

Closes https://github.com/BerriAI/litellm/issues/6845

* fix(o1_handler.py): remove fake streaming for o1

Closes https://github.com/BerriAI/litellm/issues/6801

* build(model_prices_and_context_window.json): add groq llama3.2b model pricing

Closes https://github.com/BerriAI/litellm/issues/6807

* fix(utils.py): fix handling ollama response format param

Fixes https://github.com/BerriAI/litellm/issues/6848#issuecomment-2491215485

* docs(sidebars.js): refactor chat endpoint placement

* fix: fix linting errors

* test: fix test

* test: fix test

* fix(openai_like/handler): handle max retries

* fix(streaming_handler.py): fix streaming check for openai-compatible providers

* test: update test

* test: correctly handle model is overloaded error

* test: update test

* test: fix test

* test: mark flaky test

---------

Co-authored-by: Guowang Li <Guowang@users.noreply.github.com>
2024-11-22 01:53:52 +05:30
Krish Dholakia
3b3e19643c LiteLLM Minor Fixes & Improvements (11/13/2024) (#6729)
* fix(utils.py): add logprobs support for together ai

Fixes https://github.com/BerriAI/litellm/issues/6724

* feat(pass_through_endpoints/): add anthropic/ pass-through endpoint

adds new `anthropic/` pass-through endpoint + refactors docs

* feat(spend_management_endpoints.py): allow /global/spend/report to query team + customer id

enables seeing spend for a customer in a team

* Add integration with MLflow Tracing (#6147)

* Add MLflow logger

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

* Streaming handling

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

* lint

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

* address comments and fix issues

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

* address comments and fix issues

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

* Move logger construction code

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

* Add docs

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

* async handlers

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

* new picture

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

---------

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

* fix(mlflow.py): fix ruff linting errors

* ci(config.yml): add mlflow to ci testing

* fix: fix test

* test: fix test

* Litellm key update fix (#6710)

* fix(caching): convert arg to equivalent kwargs in llm caching handler

prevent unexpected errors

* fix(caching_handler.py): don't pass args to caching

* fix(caching): remove all *args from caching.py

* fix(caching): consistent function signatures + abc method

* test(caching_unit_tests.py): add unit tests for llm caching

ensures coverage for common caching scenarios across different implementations

* refactor(litellm_logging.py): move to using cache key from hidden params instead of regenerating one

* fix(router.py): drop redis password requirement

* fix(proxy_server.py): fix faulty slack alerting check

* fix(langfuse.py): avoid copying functions/thread lock objects in metadata

fixes metadata copy error when parent otel span in metadata

* test: update test

* fix(key_management_endpoints.py): fix /key/update with metadata update

* fix(key_management_endpoints.py): fix key_prepare_update helper

* fix(key_management_endpoints.py): reset value to none if set in key update

* fix: update test

* Litellm dev 11 11 2024 (#6693)

* fix(__init__.py): add 'watsonx_text' as mapped llm api route

Fixes https://github.com/BerriAI/litellm/issues/6663

* fix(opentelemetry.py): fix passing parallel tool calls to otel

Fixes https://github.com/BerriAI/litellm/issues/6677

* refactor(test_opentelemetry_unit_tests.py): create a base set of unit tests for all logging integrations - test for parallel tool call handling

reduces bugs in repo

* fix(__init__.py): update provider-model mapping to include all known provider-model mappings

Fixes https://github.com/BerriAI/litellm/issues/6669

* feat(anthropic): support passing document in llm api call

* docs(anthropic.md): add pdf anthropic call to docs + expose new 'supports_pdf_input' function

* fix(factory.py): fix linting error

* add clear doc string for GCS bucket logging

* Add docs to export logs to Laminar (#6674)

* Add docs to export logs to Laminar

* minor fix: newline at end of file

* place laminar after http and grpc

* (Feat) Add langsmith key based logging (#6682)

* add langsmith_api_key to StandardCallbackDynamicParams

* create a file for langsmith types

* langsmith add key / team based logging

* add key based logging for langsmith

* fix langsmith key based logging

* fix linting langsmith

* remove NOQA violation

* add unit test coverage for all helpers in test langsmith

* test_langsmith_key_based_logging

* docs langsmith key based logging

* run langsmith tests in logging callback tests

* fix logging testing

* test_langsmith_key_based_logging

* test_add_callback_via_key_litellm_pre_call_utils_langsmith

* add debug statement langsmith key based logging

* test_langsmith_key_based_logging

* (fix) OpenAI's optional messages[].name  does not work with Mistral API  (#6701)

* use helper for _transform_messages mistral

* add test_message_with_name to base LLMChat test

* fix linting

* add xAI on Admin UI (#6680)

* (docs) add benchmarks on 1K RPS  (#6704)

* docs litellm proxy benchmarks

* docs GCS bucket

* doc fix - reduce clutter on logging doc title

* (feat) add cost tracking stable diffusion 3 on Bedrock  (#6676)

* add cost tracking for sd3

* test_image_generation_bedrock

* fix get model info for image cost

* add cost_calculator for stability 1 models

* add unit testing for bedrock image cost calc

* test_cost_calculator_with_no_optional_params

* add test_cost_calculator_basic

* correctly allow size Optional

* fix cost_calculator

* sd3 unit tests cost calc

* fix raise correct error 404 when /key/info is called on non-existent key  (#6653)

* fix raise correct error on /key/info

* add not_found_error error

* fix key not found in DB error

* use 1 helper for checking token hash

* fix error code on key info

* fix test key gen prisma

* test_generate_and_call_key_info

* test fix test_call_with_valid_model_using_all_models

* fix key info tests

* bump: version 1.52.4 → 1.52.5

* add defaults used for GCS logging

* LiteLLM Minor Fixes & Improvements (11/12/2024)  (#6705)

* fix(caching): convert arg to equivalent kwargs in llm caching handler

prevent unexpected errors

* fix(caching_handler.py): don't pass args to caching

* fix(caching): remove all *args from caching.py

* fix(caching): consistent function signatures + abc method

* test(caching_unit_tests.py): add unit tests for llm caching

ensures coverage for common caching scenarios across different implementations

* refactor(litellm_logging.py): move to using cache key from hidden params instead of regenerating one

* fix(router.py): drop redis password requirement

* fix(proxy_server.py): fix faulty slack alerting check

* fix(langfuse.py): avoid copying functions/thread lock objects in metadata

fixes metadata copy error when parent otel span in metadata

* test: update test

* bump: version 1.52.5 → 1.52.6

* (feat) helm hook to sync db schema  (#6715)

* v0 migration job

* fix job

* fix migrations job.yml

* handle standalone DB on helm hook

* fix argo cd annotations

* fix db migration helm hook

* fix migration job

* doc fix Using Http/2 with Hypercorn

* (fix proxy redis) Add redis sentinel support  (#6154)

* add sentinel_password support

* add doc for setting redis sentinel password

* fix redis sentinel - use sentinel password

* Fix: Update gpt-4o costs to that of gpt-4o-2024-08-06 (#6714)

Fixes #6713

* (fix) using Anthropic `response_format={"type": "json_object"}`  (#6721)

* add support for response_format=json anthropic

* add test_json_response_format to baseLLM ChatTest

* fix test_litellm_anthropic_prompt_caching_tools

* fix test_anthropic_function_call_with_no_schema

* test test_create_json_tool_call_for_response_format

* (feat) Add cost tracking for Azure Dall-e-3 Image Generation  + use base class to ensure basic image generation tests pass  (#6716)

* add BaseImageGenTest

* use 1 class for unit testing

* add debugging to BaseImageGenTest

* TestAzureOpenAIDalle3

* fix response_cost_calculator

* test_basic_image_generation

* fix img gen basic test

* fix _select_model_name_for_cost_calc

* fix test_aimage_generation_bedrock_with_optional_params

* fix undo changes cost tracking

* fix response_cost_calculator

* fix test_cost_azure_gpt_35

* fix remove dup test (#6718)

* (build) update db helm hook

* (build) helm db pre sync hook

* (build) helm db sync hook

* test: run test_team_logging first

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Dinmukhamed Mailibay <47117969+dinmukhamedm@users.noreply.github.com>
Co-authored-by: Kilian Lieret <kilian.lieret@posteo.de>

* test: update test

* test: skip anthropic overloaded error

* test: cleanup test

* test: update tests

* test: fix test

* test: handle gemini overloaded model error

* test: handle internal server error

* test: handle anthropic overloaded error

* test: handle claude instability

---------

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Co-authored-by: Yuki Watanabe <31463517+B-Step62@users.noreply.github.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Dinmukhamed Mailibay <47117969+dinmukhamedm@users.noreply.github.com>
Co-authored-by: Kilian Lieret <kilian.lieret@posteo.de>
2024-11-15 11:18:31 +05:30
Ishaan Jaff
30f255ae2b [Feature]: Stop swallowing up AzureOpenAi exception responses in litellm's implementation for a BadRequestError (#6745)
* fix azure exceptions

* test_bad_request_error_contains_httpx_response

* test_bad_request_error_contains_httpx_response

* use safe access to get exception response

* fix get attr
2024-11-14 15:54:28 -08:00
Krish Dholakia
2bf23b0c7d LiteLLM Minor Fixes & Improvement (11/14/2024) (#6730)
* fix(ollama.py): fix get model info request

Fixes https://github.com/BerriAI/litellm/issues/6703

* feat(anthropic/chat/transformation.py): support passing user id to anthropic via openai 'user' param

* docs(anthropic.md): document all supported openai params for anthropic

* test: fix tests

* fix: fix tests

* feat(jina_ai/): add rerank support

Closes https://github.com/BerriAI/litellm/issues/6691

* test: handle service unavailable error

* fix(handler.py): refactor together ai rerank call

* test: update test to handle overloaded error

* test: fix test

* Litellm router trace (#6742)

* feat(router.py): add trace_id to parent functions - allows tracking retry/fallbacks

* feat(router.py): log trace id across retry/fallback logic

allows grouping llm logs for the same request

* test: fix tests

* fix: fix test

* fix(transformation.py): only set non-none stop_sequences

* Litellm router disable fallbacks (#6743)

* bump: version 1.52.6 → 1.52.7

* feat(router.py): enable dynamically disabling fallbacks

Allows for enabling/disabling fallbacks per key

* feat(litellm_pre_call_utils.py): support setting 'disable_fallbacks' on litellm key

* test: fix test

* fix(exception_mapping_utils.py): map 'model is overloaded' to internal server error

* test: handle gemini error

* test: fix test

* fix: new run
2024-11-15 01:02:54 +05:30
Krish Dholakia
302786bd5b LiteLLM Minor Fixes & Improvements (11/12/2024) (#6705)
* fix(caching): convert arg to equivalent kwargs in llm caching handler

prevent unexpected errors

* fix(caching_handler.py): don't pass args to caching

* fix(caching): remove all *args from caching.py

* fix(caching): consistent function signatures + abc method

* test(caching_unit_tests.py): add unit tests for llm caching

ensures coverage for common caching scenarios across different implementations

* refactor(litellm_logging.py): move to using cache key from hidden params instead of regenerating one

* fix(router.py): drop redis password requirement

* fix(proxy_server.py): fix faulty slack alerting check

* fix(langfuse.py): avoid copying functions/thread lock objects in metadata

fixes metadata copy error when parent otel span in metadata

* test: update test
2024-11-12 22:50:51 +05:30
Krish Dholakia
a9038087cb Litellm dev 11 07 2024 (#6649)
* fix(streaming_handler.py): save finish_reasons which might show up mid-stream (store last received one)

Fixes https://github.com/BerriAI/litellm/issues/6104

* refactor: add readme to litellm_core_utils/

make it easier to navigate

* fix(team_endpoints.py): return team id + object for invalid team in `/team/list`

* fix(streaming_handler.py): remove import

* fix(pattern_match_deployments.py): default to user input if unable to map based on wildcards (#6646)

* fix(pattern_match_deployments.py): default to user input if unable to… (#6632)

* fix(pattern_match_deployments.py): default to user input if unable to map based on wildcards

* test: fix test

* test: reset test name

* test: update conftest to reload proxy server module between tests

* ci(config.yml): move langfuse out of local_testing

reduce ci/cd time

* ci(config.yml): cleanup langfuse ci/cd tests

* fix: update test to not use global proxy_server app module

* ci: move caching to a separate test pipeline

speed up ci pipeline

* test: update conftest to check if proxy_server attr exists before reloading

* build(conftest.py): don't block on inability to reload proxy_server

* ci(config.yml): update caching unit test filter to work on 'cache' keyword as well

* fix(encrypt_decrypt_utils.py): use function to get salt key

* test: mark flaky test

* test: handle anthropic overloaded errors

* refactor: create separate ci/cd pipeline for proxy unit tests

make ci/cd faster

* ci(config.yml): add litellm_proxy_unit_testing to build_and_test jobs

* ci(config.yml): generate prisma binaries for proxy unit tests

* test: readd vertex_key.json

* ci(config.yml): remove `-s` from proxy_unit_test cmd

speed up test

* ci: remove any 'debug' logging flag

speed up ci pipeline

* test: fix test

* test(test_braintrust.py): rerun

* test: add delay for braintrust test

* chore: comment for maritalk (#6607)

* Update gpt-4o-2024-08-06, and o1-preview, o1-mini models in model cost map  (#6654)

* Adding supports_response_schema to gpt-4o-2024-08-06 models

* o1 models do not support vision

---------

Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>

* (QOL improvement) add unit testing for all static_methods in litellm_logging.py  (#6640)

* add unit testing for standard logging payload

* unit testing for static methods in litellm_logging

* add code coverage check for litellm_logging

* litellm_logging_code_coverage

* test_get_final_response_obj

* fix validate_redacted_message_span_attributes

* test validate_redacted_message_span_attributes

* (feat) log error class, function_name on prometheus service failure hook + only log DB related failures on DB service hook  (#6650)

* log error on prometheus service failure hook

* use a more accurate function name for wrapper that handles logging db metrics

* fix log_db_metrics

* test_log_db_metrics_failure_error_types

* fix linting

* fix auth checks

* Update several Azure AI models in model cost map (#6655)

* Adding Azure Phi 3/3.5 models to model cost map

* Update gpt-4o-mini models

* Adding missing Azure Mistral models to model cost map

* Adding Azure Llama3.2 models to model cost map

* Fix Gemini-1.5-flash pricing

* Fix Gemini-1.5-flash output pricing

* Fix Gemini-1.5-pro prices

* Fix Gemini-1.5-flash output prices

* Correct gemini-1.5-pro prices

* Correction on Vertex Llama3.2 entry

---------

Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>

* fix(streaming_handler.py): fix linting error

* test: remove duplicate test

causes gemini ratelimit error

---------

Co-authored-by: nobuo kawasaki <nobu007@users.noreply.github.com>
Co-authored-by: Emerson Gomes <emerson.gomes@gmail.com>
Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
2024-11-08 19:34:22 +05:30
Ishaan Jaff
dfab0495c9 (QOL improvement) add unit testing for all static_methods in litellm_logging.py (#6640)
* add unit testing for standard logging payload

* unit testing for static methods in litellm_logging

* add code coverage check for litellm_logging

* litellm_logging_code_coverage

* test_get_final_response_obj

* fix validate_redacted_message_span_attributes

* test validate_redacted_message_span_attributes
2024-11-07 16:26:53 -08:00
nobuo kawasaki
2b6e78e82a chore: comment for maritalk (#6607) 2024-11-07 12:20:12 -08:00
Krish Dholakia
bf198abadc LiteLLM Minor Fixes & Improvements (11/05/2024) (#6590)
* fix(pattern_matching_router.py): update model name using correct function

* fix(langfuse.py): metadata deepcopy can cause unhandled error (#6563)

Co-authored-by: seva <seva@inita.com>

* fix(stream_chunk_builder_utils.py): correctly set prompt tokens + log correct streaming usage

Closes https://github.com/BerriAI/litellm/issues/6488

* build(deps): bump cookie and express in /docs/my-website (#6566)

Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.

Updates `cookie` from 0.6.0 to 0.7.1
- [Release notes](https://github.com/jshttp/cookie/releases)
- [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)

Updates `express` from 4.20.0 to 4.21.1
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)

---
updated-dependencies:
- dependency-name: cookie
  dependency-type: indirect
- dependency-name: express
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* docs(virtual_keys.md): update Dockerfile reference (#6554)

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>

* (proxy fix) - call connect on prisma client when running setup (#6534)

* critical fix - call connect on prisma client when running setup

* fix test_proxy_server_prisma_setup

* fix test_proxy_server_prisma_setup

* Add 3.5 haiku (#6588)

* feat: add claude-3-5-haiku-20241022 entries

* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models

* add missing entries, remove vision

* remove image token costs

* Litellm perf improvements 3 (#6573)

* perf: move writing key to cache, to background task

* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils

adds 200ms on calls with pgdb connected

* fix(litellm_pre_call_utils.py): rename call_type to actual call used

* perf(proxy_server.py): remove db logic from _get_config_from_file

was causing db calls to occur on every llm request, if team_id was set on key

* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db

reduces latency/call by ~100ms

* fix(proxy_server.py): minor fix on existing_settings not incl alerting

* fix(exception_mapping_utils.py): map databricks exception string

* fix(auth_checks.py): fix auth check logic

* test: correctly mark flaky test

* fix(utils.py): handle auth token error for tokenizers.from_pretrained

* build: fix map

* build: fix map

* build: fix json for model map

* fix ImageObject conversion (#6584)

* (fix) litellm.text_completion raises a non-blocking error on simple usage (#6546)

* unit test test_huggingface_text_completion_logprobs

* fix return TextCompletionHandler convert_chat_to_text_completion

* fix hf rest api

* fix test_huggingface_text_completion_logprobs

* fix linting errors

* fix import LiteLLMResponseObjectHandler

* fix test for LiteLLMResponseObjectHandler

* fix test text completion

* fix allow using 15 seconds for premium license check

* testing fix bedrock deprecated cohere.command-text-v14

* (feat) add `Predicted Outputs` for OpenAI  (#6594)

* bump openai to openai==1.54.0

* add 'prediction' param

* testing fix bedrock deprecated cohere.command-text-v14

* test test_openai_prediction_param.py

* test_openai_prediction_param_with_caching

* doc Predicted Outputs

* doc Predicted Output

* (fix) Vertex Improve Performance when using `image_url`  (#6593)

* fix transformation vertex

* test test_process_gemini_image

* test_image_completion_request

* testing fix - bedrock has deprecated cohere.command-text-v14

* fix vertex pdf

* bump: version 1.51.5 → 1.52.0

* fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check (#6577)

* fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check

* fix(lowest_tpm_rpm_v2.py): return headers in correct format

* test: update test

* build(deps): bump cookie and express in /docs/my-website (#6566)

Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.

Updates `cookie` from 0.6.0 to 0.7.1
- [Release notes](https://github.com/jshttp/cookie/releases)
- [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)

Updates `express` from 4.20.0 to 4.21.1
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)

---
updated-dependencies:
- dependency-name: cookie
  dependency-type: indirect
- dependency-name: express
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* docs(virtual_keys.md): update Dockerfile reference (#6554)

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>

* (proxy fix) - call connect on prisma client when running setup (#6534)

* critical fix - call connect on prisma client when running setup

* fix test_proxy_server_prisma_setup

* fix test_proxy_server_prisma_setup

* Add 3.5 haiku (#6588)

* feat: add claude-3-5-haiku-20241022 entries

* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models

* add missing entries, remove vision

* remove image token costs

* Litellm perf improvements 3 (#6573)

* perf: move writing key to cache, to background task

* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils

adds 200ms on calls with pgdb connected

* fix(litellm_pre_call_utils.py): rename call_type to actual call used

* perf(proxy_server.py): remove db logic from _get_config_from_file

was causing db calls to occur on every llm request, if team_id was set on key

* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db

reduces latency/call by ~100ms

* fix(proxy_server.py): minor fix on existing_settings not incl alerting

* fix(exception_mapping_utils.py): map databricks exception string

* fix(auth_checks.py): fix auth check logic

* test: correctly mark flaky test

* fix(utils.py): handle auth token error for tokenizers.from_pretrained

* build: fix map

* build: fix map

* build: fix json for model map

* test: remove eol model

* fix(proxy_server.py): fix db config loading logic

* fix(proxy_server.py): fix order of config / db updates, to ensure fields not overwritten

* test: skip test if required env var is missing

* test: fix test

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>

* test: mark flaky test

* test: handle anthropic api instability

* test(test_proxy_utils.py): add testing for db config update logic

* Update setuptools in docker and fastapi to latest version, in order to upgrade starlette version (#6597)

* build(deps): bump cookie and express in /docs/my-website (#6566)

Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.

Updates `cookie` from 0.6.0 to 0.7.1
- [Release notes](https://github.com/jshttp/cookie/releases)
- [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)

Updates `express` from 4.20.0 to 4.21.1
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)

---
updated-dependencies:
- dependency-name: cookie
  dependency-type: indirect
- dependency-name: express
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* docs(virtual_keys.md): update Dockerfile reference (#6554)

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>

* (proxy fix) - call connect on prisma client when running setup (#6534)

* critical fix - call connect on prisma client when running setup

* fix test_proxy_server_prisma_setup

* fix test_proxy_server_prisma_setup

* Add 3.5 haiku (#6588)

* feat: add claude-3-5-haiku-20241022 entries

* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models

* add missing entries, remove vision

* remove image token costs

* Litellm perf improvements 3 (#6573)

* perf: move writing key to cache, to background task

* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils

adds 200ms on calls with pgdb connected

* fix(litellm_pre_call_utils.py): rename call_type to actual call used

* perf(proxy_server.py): remove db logic from _get_config_from_file

was causing db calls to occur on every llm request, if team_id was set on key

* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db

reduces latency/call by ~100ms

* fix(proxy_server.py): minor fix on existing_settings not incl alerting

* fix(exception_mapping_utils.py): map databricks exception string

* fix(auth_checks.py): fix auth check logic

* test: correctly mark flaky test

* fix(utils.py): handle auth token error for tokenizers.from_pretrained

* build: fix map

* build: fix map

* build: fix json for model map

* fix ImageObject conversion (#6584)

* (fix) litellm.text_completion raises a non-blocking error on simple usage (#6546)

* unit test test_huggingface_text_completion_logprobs

* fix return TextCompletionHandler convert_chat_to_text_completion

* fix hf rest api

* fix test_huggingface_text_completion_logprobs

* fix linting errors

* fix import LiteLLMResponseObjectHandler

* fix test for LiteLLMResponseObjectHandler

* fix test text completion

* fix allow using 15 seconds for premium license check

* testing fix bedrock deprecated cohere.command-text-v14

* (feat) add `Predicted Outputs` for OpenAI  (#6594)

* bump openai to openai==1.54.0

* add 'prediction' param

* testing fix bedrock deprecated cohere.command-text-v14

* test test_openai_prediction_param.py

* test_openai_prediction_param_with_caching

* doc Predicted Outputs

* doc Predicted Output

* (fix) Vertex Improve Performance when using `image_url`  (#6593)

* fix transformation vertex

* test test_process_gemini_image

* test_image_completion_request

* testing fix - bedrock has deprecated cohere.command-text-v14

* fix vertex pdf

* bump: version 1.51.5 → 1.52.0

* Update setuptools in docker and fastapi to latest version, in order to upgrade starlette version

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: Jacob Hagstedt <wcgs@novonordisk.com>

* fix(langfuse.py): fix linting errors

* fix: fix linting errors

* fix: fix casting error

* fix: fix typing error

* fix: add more tests

* fix(utils.py): fix return_processed_chunk_logic

* Revert "Update setuptools in docker and fastapi to latest version, in order t…" (#6615)

This reverts commit 1a7f7bdfb7.

* docs fix clarify team_id on team based logging

* doc fix team based logging with langfuse

* fix flake8 checks

* test: bump sleep time

* refactor: replace claude-instant-1.2 with haiku in testing

* fix(proxy_server.py): move to using sl payload in track_cost_callback

* fix(proxy_server.py): fix linting errors

* fix(proxy_server.py): fallback to kwargs(response_cost) if given

* test: remove claude-instant-1 from tests

* test: fix claude test

* docs fix clarify team_id on team based logging

* doc fix team based logging with langfuse

* build: remove lint.yml

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Vsevolod Karvetskiy <56288164+karvetskiy@users.noreply.github.com>
Co-authored-by: seva <seva@inita.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
Co-authored-by: Jacob Hagstedt P Suorra <Jacobh2@users.noreply.github.com>
Co-authored-by: Jacob Hagstedt <wcgs@novonordisk.com>
2024-11-07 04:17:05 +05:30
Krish Dholakia
741da7e182 LiteLLM Minor Fixes & Improvements (11/04/2024) (#6572)
* feat: initial commit for watsonx chat endpoint support

Closes https://github.com/BerriAI/litellm/issues/6562

* feat(watsonx/chat/handler.py): support tool calling for watsonx

Closes https://github.com/BerriAI/litellm/issues/6562

* fix(streaming_utils.py): return empty chunk instead of failing if streaming value is invalid dict

ensures streaming works for ibm watsonx

* fix(openai_like/chat/handler.py): ensure asynchttphandler is passed correctly for openai like calls

* fix: ensure exception mapping works well for watsonx calls

* fix(openai_like/chat/handler.py): handle async streaming correctly

* feat(main.py): Make it clear when a user is passing an invalid message

add validation for user content message

Closes https://github.com/BerriAI/litellm/issues/6565

* fix: cleanup

* fix(utils.py): loosen validation check, to just make sure content types are valid

make litellm robust to future content updates

* fix: fix linting error

* fix: fix linting errors

* fix(utils.py): make validation check more flexible

* test: handle langfuse list index out of range error

* Litellm dev 11 02 2024 (#6561)

* fix(dual_cache.py): update in-memory check for redis batch get cache

Fixes latency delay for async_batch_redis_cache

* fix(service_logger.py): fix race condition causing otel service logging to be overwritten if service_callbacks set

* feat(user_api_key_auth.py): add parent otel component for auth

allows us to isolate how much latency is added by auth checks

* perf(parallel_request_limiter.py): move async_set_cache_pipeline (from max parallel request limiter) out of execution path (background task)

reduces latency by 200ms

* feat(user_api_key_auth.py): have user api key auth object return user tpm/rpm limits - reduces redis calls in downstream task (parallel_request_limiter)

Reduces latency by 400-800ms

* fix(parallel_request_limiter.py): use batch get cache to reduce user/key/team usage object calls

reduces latency by 50-100ms

* fix: fix linting error

* fix(_service_logger.py): fix import

* fix(user_api_key_auth.py): fix service logging

* fix(dual_cache.py): don't pass 'self'

* fix: fix python3.8 error

* fix: fix init

* bump: version 1.51.4 → 1.51.5

* build(deps): bump cookie and express in /docs/my-website (#6566)

Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.

Updates `cookie` from 0.6.0 to 0.7.1
- [Release notes](https://github.com/jshttp/cookie/releases)
- [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)

Updates `express` from 4.20.0 to 4.21.1
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)

---
updated-dependencies:
- dependency-name: cookie
  dependency-type: indirect
- dependency-name: express
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* docs(virtual_keys.md): update Dockerfile reference (#6554)

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>

* (proxy fix) - call connect on prisma client when running setup (#6534)

* critical fix - call connect on prisma client when running setup

* fix test_proxy_server_prisma_setup

* fix test_proxy_server_prisma_setup

* Add 3.5 haiku (#6588)

* feat: add claude-3-5-haiku-20241022 entries

* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models

* add missing entries, remove vision

* remove image token costs

* Litellm perf improvements 3 (#6573)

* perf: move writing key to cache, to background task

* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils

adds 200ms on calls with pgdb connected

* fix(litellm_pre_call_utils.py): rename call_type to actual call used

* perf(proxy_server.py): remove db logic from _get_config_from_file

was causing db calls to occur on every llm request, if team_id was set on key

* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db

reduces latency/call by ~100ms

* fix(proxy_server.py): minor fix on existing_settings not incl alerting

* fix(exception_mapping_utils.py): map databricks exception string

* fix(auth_checks.py): fix auth check logic

* test: correctly mark flaky test

* fix(utils.py): handle auth token error for tokenizers.from_pretrained

* build: fix map

* build: fix map

* build: fix json for model map

* Litellm dev 11 02 2024 (#6561)

* fix(dual_cache.py): update in-memory check for redis batch get cache

Fixes latency delay for async_batch_redis_cache

* fix(service_logger.py): fix race condition causing otel service logging to be overwritten if service_callbacks set

* feat(user_api_key_auth.py): add parent otel component for auth

allows us to isolate how much latency is added by auth checks

* perf(parallel_request_limiter.py): move async_set_cache_pipeline (from max parallel request limiter) out of execution path (background task)

reduces latency by 200ms

* feat(user_api_key_auth.py): have user api key auth object return user tpm/rpm limits - reduces redis calls in downstream task (parallel_request_limiter)

Reduces latency by 400-800ms

* fix(parallel_request_limiter.py): use batch get cache to reduce user/key/team usage object calls

reduces latency by 50-100ms

* fix: fix linting error

* fix(_service_logger.py): fix import

* fix(user_api_key_auth.py): fix service logging

* fix(dual_cache.py): don't pass 'self'

* fix: fix python3.8 error

* fix: fix init

* Litellm perf improvements 3 (#6573)

* perf: move writing key to cache, to background task

* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils

adds 200ms on calls with pgdb connected

* fix(litellm_pre_call_utils.py): rename call_type to actual call used

* perf(proxy_server.py): remove db logic from _get_config_from_file

was causing db calls to occur on every llm request, if team_id was set on key

* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db

reduces latency/call by ~100ms

* fix(proxy_server.py): minor fix on existing_settings not incl alerting

* fix(exception_mapping_utils.py): map databricks exception string

* fix(auth_checks.py): fix auth check logic

* test: correctly mark flaky test

* fix(utils.py): handle auth token error for tokenizers.from_pretrained

* fix ImageObject conversion (#6584)

* (fix) litellm.text_completion raises a non-blocking error on simple usage (#6546)

* unit test test_huggingface_text_completion_logprobs

* fix return TextCompletionHandler convert_chat_to_text_completion

* fix hf rest api

* fix test_huggingface_text_completion_logprobs

* fix linting errors

* fix import LiteLLMResponseObjectHandler

* fix test for LiteLLMResponseObjectHandler

* fix test text completion

* fix allow using 15 seconds for premium license check

* testing fix bedrock deprecated cohere.command-text-v14

* (feat) add `Predicted Outputs` for OpenAI  (#6594)

* bump openai to openai==1.54.0

* add 'prediction' param

* testing fix bedrock deprecated cohere.command-text-v14

* test test_openai_prediction_param.py

* test_openai_prediction_param_with_caching

* doc Predicted Outputs

* doc Predicted Output

* (fix) Vertex Improve Performance when using `image_url`  (#6593)

* fix transformation vertex

* test test_process_gemini_image

* test_image_completion_request

* testing fix - bedrock has deprecated cohere.command-text-v14

* fix vertex pdf

* bump: version 1.51.5 → 1.52.0

* fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check (#6577)

* fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check

* fix(lowest_tpm_rpm_v2.py): return headers in correct format

* test: update test

* build(deps): bump cookie and express in /docs/my-website (#6566)

Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.

Updates `cookie` from 0.6.0 to 0.7.1
- [Release notes](https://github.com/jshttp/cookie/releases)
- [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)

Updates `express` from 4.20.0 to 4.21.1
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)

---
updated-dependencies:
- dependency-name: cookie
  dependency-type: indirect
- dependency-name: express
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* docs(virtual_keys.md): update Dockerfile reference (#6554)

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>

* (proxy fix) - call connect on prisma client when running setup (#6534)

* critical fix - call connect on prisma client when running setup

* fix test_proxy_server_prisma_setup

* fix test_proxy_server_prisma_setup

* Add 3.5 haiku (#6588)

* feat: add claude-3-5-haiku-20241022 entries

* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models

* add missing entries, remove vision

* remove image token costs

* Litellm perf improvements 3 (#6573)

* perf: move writing key to cache, to background task

* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils

adds 200ms on calls with pgdb connected

* fix(litellm_pre_call_utils.py): rename call_type to actual call used

* perf(proxy_server.py): remove db logic from _get_config_from_file

was causing db calls to occur on every llm request, if team_id was set on key

* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db

reduces latency/call by ~100ms

* fix(proxy_server.py): minor fix on existing_settings not incl alerting

* fix(exception_mapping_utils.py): map databricks exception string

* fix(auth_checks.py): fix auth check logic

* test: correctly mark flaky test

* fix(utils.py): handle auth token error for tokenizers.from_pretrained

* build: fix map

* build: fix map

* build: fix json for model map

* test: remove eol model

* fix(proxy_server.py): fix db config loading logic

* fix(proxy_server.py): fix order of config / db updates, to ensure fields not overwritten

* test: skip test if required env var is missing

* test: fix test

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>

* test: mark flaky test

* test: handle anthropic api instability

* test: update test

* test: bump num retries on langfuse tests - their api is quite bad

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
2024-11-06 17:53:46 +05:30
Ishaan Jaff
0f5817352c (fix) litellm.text_completion raises a non-blocking error on simple usage (#6546)
* unit test test_huggingface_text_completion_logprobs

* fix return TextCompletionHandler convert_chat_to_text_completion

* fix hf rest api

* fix test_huggingface_text_completion_logprobs

* fix linting errors

* fix import LiteLLMResponseObjectHandler

* fix test for LiteLLMResponseObjectHandler

* fix test text completion
2024-11-04 15:47:48 -08:00
Krish Dholakia
e7ce45236a Litellm perf improvements 3 (#6573)
* perf: move writing key to cache, to background task

* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils

adds 200ms on calls with pgdb connected

* fix(litellm_pre_call_utils.py): rename call_type to actual call used

* perf(proxy_server.py): remove db logic from _get_config_from_file

was causing db calls to occur on every llm request, if team_id was set on key

* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db

reduces latency/call by ~100ms

* fix(proxy_server.py): minor fix on existing_settings not incl alerting

* fix(exception_mapping_utils.py): map databricks exception string

* fix(auth_checks.py): fix auth check logic

* test: correctly mark flaky test

* fix(utils.py): handle auth token error for tokenizers.from_pretrained
2024-11-05 03:51:26 +05:30
Krish Dholakia
976926e231 LiteLLM Minor Fixes & Improvements (11/01/2024) (#6551)
* fix: add lm_studio support

* fix(cohere_transformation.py): fix transformation logic for azure cohere embedding model name

Fixes https://github.com/BerriAI/litellm/issues/6540

* fix(utils.py): require base64 str to begin with `data:`

Fixes https://github.com/BerriAI/litellm/issues/6541

* fix: cleanup tests

* docs(guardrails.md): fix typo

* fix(opentelemetry.py): move to `.exception` and update 'response_obj' value to handle 'None' case

Fixes https://github.com/BerriAI/litellm/issues/6510

* fix: fix linting noqa placement
2024-11-02 02:09:31 +05:30
Ishaan Jaff
bd1cbc2f51 (feat) add XAI ChatCompletion Support (#6373)
* init commit for XAI

* add full logic for xai chat completion

* test_completion_xai

* docs xAI

* add xai/grok-beta

* test_xai_chat_config_get_openai_compatible_provider_info

* test_xai_chat_config_map_openai_params

* add xai streaming test
2024-11-01 20:37:09 +05:30
Krish Dholakia
4bde45e7f2 Litellm dev 10 29 2024 (#6502)
* fix(core_helpers.py): return None, instead of raising kwargs is None error

Closes https://github.com/BerriAI/litellm/issues/6500

* docs(cost_tracking.md): cleanup doc

* fix(vertex_and_google_ai_studio.py): handle function call with no params passed in

Closes https://github.com/BerriAI/litellm/issues/6495

* test(test_router_timeout.py): add test for router timeout + retry logic

* test: update test to use module level values

* (fix) Prometheus - Log Postgres DB latency, status on prometheus  (#6484)

* fix logging DB fails on prometheus

* unit testing log to otel wrapper

* unit testing for service logger + prometheus

* use LATENCY buckets for service logging

* fix service logging

* docs clarify vertex vs gemini

* (router_strategy/) ensure all async functions use async cache methods (#6489)

* fix router strat

* use async set / get cache in router_strategy

* add coverage for router strategy

* fix imports

* fix batch_get_cache

* use async methods for least busy

* fix least busy use async methods

* fix test_dual_cache_increment

* test async_get_available_deployment when routing_strategy="least-busy"

* (fix) proxy - fix when `STORE_MODEL_IN_DB` should be set (#6492)

* set store_model_in_db at the top

* correctly use store_model_in_db global

* (fix) `PrometheusServicesLogger` `_get_metric` should return metric in Registry  (#6486)

* fix logging DB fails on prometheus

* unit testing log to otel wrapper

* unit testing for service logger + prometheus

* use LATENCY buckets for service logging

* fix service logging

* fix _get_metric in prom services logger

* add clear doc string

* unit testing for prom service logger

* bump: version 1.51.0 → 1.51.1

* Add `azure/gpt-4o-mini-2024-07-18` to model_prices_and_context_window.json (#6477)

* Update utils.py (#6468)

Fixed missing keys

* (perf) Litellm redis router fix - ~100ms improvement (#6483)

* docs(exception_mapping.md): add missing exception types

Fixes https://github.com/Aider-AI/aider/issues/2120#issuecomment-2438971183

* fix(main.py): register custom model pricing with specific key

Ensure custom model pricing is registered to the specific model+provider key combination

* test: make testing more robust for custom pricing

* fix(redis_cache.py): instrument otel logging for sync redis calls

ensures complete coverage for all redis cache calls

* refactor: pass parent_otel_span for redis caching calls in router

allows for more observability into what calls are causing latency issues

* test: update tests with new params

* refactor: ensure e2e otel tracing for router

* refactor(router.py): add more otel tracing across router

catch all latency issues for router requests

* fix: fix linting error

* fix(router.py): fix linting error

* fix: fix test

* test: fix tests

* fix(dual_cache.py): pass ttl to redis cache

* fix: fix param

* perf(cooldown_cache.py): improve cooldown cache, to store cache results in memory for 5s, prevents redis call from being made on each request

reduces 100ms latency per call with caching enabled on router

* fix: fix test

* fix(cooldown_cache.py): handle if a result is None

* fix(cooldown_cache.py): add debug statements

* refactor(dual_cache.py): move to using an in-memory check for batch get cache, to prevent redis from being hit for every call

* fix(cooldown_cache.py): fix linting error

* refactor(prometheus.py): move to using standard logging payload for reading the remaining request / tokens

Ensures prometheus token tracking works for anthropic as well

* fix: fix linting error

* fix(redis_cache.py): make sure ttl is always int (handle float values)

Fixes issue where redis_client.ex was not working correctly due to float ttl

* fix: fix linting error

* test: update test

* fix: fix linting error

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
Co-authored-by: vibhanshu-ob <115142120+vibhanshu-ob@users.noreply.github.com>
2024-10-29 22:04:16 -07:00
Ishaan Jaff
34d0c7e917 (Feat) New Logging integration - add Datadog LLM Observability support (#6449)
* add type for dd llm obs request ob

* working dd llm obs

* datadog use well defined type

* clean up

* unit test test_create_llm_obs_payload

* fix linting

* add datadog_llm_observability

* add datadog_llm_observability

* docs DD LLM obs

* run testing again

* document DD_ENV

* test_create_llm_obs_payload
2024-10-28 22:01:32 +05:30
Krish Dholakia
197655bf2a LiteLLM Minor Fixes & Improvements (10/24/2024) (#6421)
* fix(utils.py): support passing dynamic api base to validate_environment

Returns True if just api base is required and api base is passed

* fix(litellm_pre_call_utils.py): feature flag sending client headers to llm api

Fixes https://github.com/BerriAI/litellm/issues/6410

* fix(anthropic/chat/transformation.py): return correct error message

* fix(http_handler.py): add error response text in places where we expect it

* fix(factory.py): handle base case of no non-system messages to bedrock

Fixes https://github.com/BerriAI/litellm/issues/6411

* feat(cohere/embed): Support cohere image embeddings

Closes https://github.com/BerriAI/litellm/issues/6413

* fix(__init__.py): fix linting error

* docs(supported_embedding.md): add image embedding example to docs

* feat(cohere/embed): use cohere embedding returned usage for cost calc

* build(model_prices_and_context_window.json): add embed-english-v3.0 details (image cost + 'supports_image_input' flag)

* fix(cohere_transformation.py): fix linting error

* test(test_proxy_server.py): cleanup test

* test: cleanup test

* fix: fix linting errors
2024-10-25 15:55:56 -07:00
Ishaan Jaff
286f560a28 fix linting 2024-10-25 18:43:00 +04:00
Ishaan Jaff
179e2970d3 Merge pull request #6430 from BerriAI/litellm_allow_internal_user_to_regen_tokens
(admin ui / auth fix) Allow internal user to call /key/{token}/regenerate
2024-10-25 18:27:11 +04:00
Ishaan Jaff
79f0a483ef fix type error 2024-10-25 18:23:40 +04:00
Ishaan Jaff
246ac07b78 fix typing on StandardLoggingMetadata 2024-10-25 10:55:54 +04:00
Krish Dholakia
3b768388fd feat(litellm_pre_call_utils.py): support 'add_user_information_to_llm… (#6390)
* feat(litellm_pre_call_utils.py): support 'add_user_information_to_llm_headers' param

enables passing user info to backend llm (user request for custom vllm server)

* fix(litellm_logging.py): fix linting error
2024-10-24 22:03:16 -07:00
Krish Dholakia
b190b6b825 feat(proxy_server.py): check if views exist on proxy server startup +… (#6360)
* feat(proxy_server.py): check if views exist on proxy server startup + refactor startup event logic to <50 LOC

* refactor(redis_cache.py): use a default cache value when writing to r… (#6358)

* refactor(redis_cache.py): use a default cache value when writing to redis

prevent redis from blowing up in high traffic

* refactor(redis_cache.py): refactor all cache writes to use self.get_ttl

ensures default ttl always used when writing to redis

Prevents redis db from blowing up in prod

* feat(proxy_cli.py): add new 'log_config' cli param (#6352)

* feat(proxy_cli.py): add new 'log_config' cli param

Allows passing logging.conf to uvicorn on startup

* docs(cli.md): add logging conf to uvicorn cli docs

* fix(get_llm_provider_logic.py): fix default api base for litellm_proxy

Fixes https://github.com/BerriAI/litellm/issues/6332

* feat(openai_like/embedding): Add support for jina ai embeddings

Closes https://github.com/BerriAI/litellm/issues/6337

* docs(deploy.md): update entrypoint.sh filepath post-refactor

Fixes outdated docs

* feat(prometheus.py): emit time_to_first_token metric on prometheus

Closes https://github.com/BerriAI/litellm/issues/6334

* fix(prometheus.py): only emit time to first token metric if stream is True

enables more accurate ttft usage

* test: handle vertex api instability

* fix(get_llm_provider_logic.py): fix import

* fix(openai.py): fix deepinfra default api base

* fix(anthropic/transformation.py): remove anthropic beta header (#6361)

* docs(sidebars.js): add jina ai embedding to docs

* docs(sidebars.js): add jina ai to left nav

* bump: version 1.50.1 → 1.50.2

* langfuse use helper for get_langfuse_logging_config

* Refactor: apply early return (#6369)

* (refactor) remove berrispendLogger - unused logging integration  (#6363)

* fix remove berrispendLogger

* remove unused clickhouse logger

* fix docs configs.md

* (fix) standard logging metadata + add unit testing  (#6366)

* fix setting StandardLoggingMetadata

* add unit testing for standard logging metadata

* fix otel logging test

* fix linting

* fix typing

* Revert "(fix) standard logging metadata + add unit testing  (#6366)" (#6381)

This reverts commit 8359cb6fa9.

* add new 35 model card (#6378)

* Add claude 3 5 sonnet 20241022 models for all providers (#6380)

* Add Claude 3.5 v2 on Amazon Bedrock and Vertex AI.

* added anthropic/claude-3-5-sonnet-20241022

* add new 35 model card

---------

Co-authored-by: Paul Gauthier <paul@paulg.com>
Co-authored-by: lowjiansheng <15527690+lowjiansheng@users.noreply.github.com>

* test(skip-flaky-google-context-caching-test): google is not reliable. their sample code is also not working

* test(test_alangfuse.py): handle flaky langfuse test better

* (feat) Arize - Allow using Arize HTTP endpoint  (#6364)

* arize use helper for get_arize_opentelemetry_config

* use helper to get Arize OTEL config

* arize add helpers for arize

* docs allow using arize http endpoint

* fix importing OTEL for Arize

* use static methods for ArizeLogger

* fix ArizeLogger tests

* Litellm dev 10 22 2024 (#6384)

* fix(utils.py): add 'disallowed_special' for token counting on .encode()

Fixes error when '<|endoftext|>' in string

* Revert "(fix) standard logging metadata + add unit testing  (#6366)" (#6381)

This reverts commit 8359cb6fa9.

* add new 35 model card (#6378)

* Add claude 3 5 sonnet 20241022 models for all providers (#6380)

* Add Claude 3.5 v2 on Amazon Bedrock and Vertex AI.

* added anthropic/claude-3-5-sonnet-20241022

* add new 35 model card

---------

Co-authored-by: Paul Gauthier <paul@paulg.com>
Co-authored-by: lowjiansheng <15527690+lowjiansheng@users.noreply.github.com>

* test(skip-flaky-google-context-caching-test): google is not reliable. their sample code is also not working

* Fix metadata being overwritten in speech() (#6295)

* fix: adding missing redis cluster kwargs (#6318)

Co-authored-by: Ali Arian <ali.arian@breadfinancial.com>

* Add support for `max_completion_tokens` in Azure OpenAI (#6376)

Now that Azure supports `max_completion_tokens`, no need for special handling for this param and let it pass thru. More details: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models?tabs=python-secure#api-support

* build(model_prices_and_context_window.json): add voyage-finance-2 pricing

Closes https://github.com/BerriAI/litellm/issues/6371

* build(model_prices_and_context_window.json): fix llama3.1 pricing model name on map

Closes https://github.com/BerriAI/litellm/issues/6310

* feat(realtime_streaming.py): just log specific events

Closes https://github.com/BerriAI/litellm/issues/6267

* fix(utils.py): more robust checking if unmapped vertex anthropic model belongs to that family of models

Fixes https://github.com/BerriAI/litellm/issues/6383

* Fix Ollama stream handling for tool calls with None content (#6155)

* test(test_max_completions): update test now that azure supports 'max_completion_tokens'

* fix(handler.py): fix linting error

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Low Jian Sheng <15527690+lowjiansheng@users.noreply.github.com>
Co-authored-by: David Manouchehri <david.manouchehri@ai.moda>
Co-authored-by: Paul Gauthier <paul@paulg.com>
Co-authored-by: John HU <hszqqq12@gmail.com>
Co-authored-by: Ali Arian <113945203+ali-arian@users.noreply.github.com>
Co-authored-by: Ali Arian <ali.arian@breadfinancial.com>
Co-authored-by: Anand Taralika <46954145+taralika@users.noreply.github.com>
Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com>

* bump: version 1.50.2 → 1.50.3

* build(deps): bump http-proxy-middleware in /docs/my-website (#6395)

Bumps [http-proxy-middleware](https://github.com/chimurai/http-proxy-middleware) from 2.0.6 to 2.0.7.
- [Release notes](https://github.com/chimurai/http-proxy-middleware/releases)
- [Changelog](https://github.com/chimurai/http-proxy-middleware/blob/v2.0.7/CHANGELOG.md)
- [Commits](https://github.com/chimurai/http-proxy-middleware/compare/v2.0.6...v2.0.7)

---
updated-dependencies:
- dependency-name: http-proxy-middleware
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* (docs + testing) Correctly document the timeout value used by litellm proxy is 6000 seconds + add to best practices for prod  (#6339)

* fix docs use documented timeout

* document request timeout

* add test for litellm.request_timeout

* add test for checking value of timeout

* (refactor) move convert dict to model response to llm_response_utils/ (#6393)

* refactor move convert dict to model response

* fix imports

* fix import _handle_invalid_parallel_tool_calls

* (refactor) litellm.Router client initialization utils  (#6394)

* refactor InitalizeOpenAISDKClient

* use helper func for _should_create_openai_sdk_client_for_model

* use static methods for set client on litellm router

* reduce LOC in _get_client_initialization_params

* fix _should_create_openai_sdk_client_for_model

* code quality fix

* test test_should_create_openai_sdk_client_for_model

* test test_get_client_initialization_params_openai

* fix mypy linting errors

* fix OpenAISDKClientInitializationParams

* test_get_client_initialization_params_all_env_vars

* test_get_client_initialization_params_azure_ai_studio_mistral

* test_get_client_initialization_params_default_values

* fix _get_client_initialization_params

* (fix) Langfuse key based logging  (#6372)

* langfuse use helper for get_langfuse_logging_config

* fix get_langfuse_logger_for_request

* fix import

* fix get_langfuse_logger_for_request

* test_get_langfuse_logger_for_request_with_dynamic_params

* unit testing for test_get_langfuse_logger_for_request_with_no_dynamic_params

* parameterized langfuse testing

* fix langfuse test

* fix langfuse logging

* fix test_aaalangfuse_logging_metadata

* fix langfuse log metadata test

* fix langfuse logger

* use create_langfuse_logger_from_credentials

* fix test_get_langfuse_logger_for_request_with_no_dynamic_params

* fix correct langfuse/ folder structure

* use static methods for langfuse logger

* add comment on langfuse handler

* fix linting error

* add unit testing for langfuse logging

* fix linting

* fix failure handler langfuse

* Revert "(refactor) litellm.Router client initialization utils  (#6394)" (#6403)

This reverts commit b70147f63b.

* def test_text_completion_with_echo(stream): (#6401)

test

* fix linting - remove # noqa PLR0915 from fixed function

* test: cleanup codestral tests - backend api unavailable

* (refactor) prometheus async_log_success_event to be under 100 LOC  (#6416)

* unit testing for prometheus

* unit testing for success metrics

* use 1 helper for _increment_token_metrics

* use helper for _increment_remaining_budget_metrics

* use _increment_remaining_budget_metrics

* use _increment_top_level_request_and_spend_metrics

* use helper for _set_latency_metrics

* remove noqa violation

* fix test prometheus

* test prometheus

* unit testing for all prometheus helper functions

* fix prom unit tests

* fix unit tests prometheus

* fix unit test prom

* (refactor) router - use static methods for client init utils  (#6420)

* use InitalizeOpenAISDKClient

* use InitalizeOpenAISDKClient static method

* fix  # noqa: PLR0915

* (code cleanup) remove unused and undocumented logging integrations - litedebugger, berrispend  (#6406)

* code cleanup remove unused and undocumented code files

* fix unused logging integrations cleanup

* bump: version 1.50.3 → 1.50.4

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Hakan Taşköprü <Haknt@users.noreply.github.com>
Co-authored-by: Low Jian Sheng <15527690+lowjiansheng@users.noreply.github.com>
Co-authored-by: David Manouchehri <david.manouchehri@ai.moda>
Co-authored-by: Paul Gauthier <paul@paulg.com>
Co-authored-by: John HU <hszqqq12@gmail.com>
Co-authored-by: Ali Arian <113945203+ali-arian@users.noreply.github.com>
Co-authored-by: Ali Arian <ali.arian@breadfinancial.com>
Co-authored-by: Anand Taralika <46954145+taralika@users.noreply.github.com>
Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-10-24 22:02:15 -07:00
Krish Dholakia
6de876e72f LiteLLM Minor Fixes & Improvements (10/23/2024) (#6407)
* docs(bedrock.md): clarify bedrock auth in litellm docs

* fix(convert_dict_to_response.py): Fixes https://github.com/BerriAI/litellm/issues/6387

* feat(pattern_match_deployments.py): more robust handling for wildcard routes (model_name: custom_route/* -> openai/*)

Enables user to expose custom routes to users with dynamic handling

* test: add more testing

* docs(custom_pricing.md): add debug tutorial for custom pricing

* test: skip codestral test - unreachable backend

* test: fix test

* fix(pattern_matching_deployments.py): fix typing

* test: cleanup codestral tests - backend api unavailable

* (refactor) prometheus async_log_success_event to be under 100 LOC  (#6416)

* unit testing for prometheus

* unit testing for success metrics

* use 1 helper for _increment_token_metrics

* use helper for _increment_remaining_budget_metrics

* use _increment_remaining_budget_metrics

* use _increment_top_level_request_and_spend_metrics

* use helper for _set_latency_metrics

* remove noqa violation

* fix test prometheus

* test prometheus

* unit testing for all prometheus helper functions

* fix prom unit tests

* fix unit tests prometheus

* fix unit test prom

* (refactor) router - use static methods for client init utils  (#6420)

* use InitalizeOpenAISDKClient

* use InitalizeOpenAISDKClient static method

* fix  # noqa: PLR0915

* (code cleanup) remove unused and undocumented logging integrations - litedebugger, berrispend  (#6406)

* code cleanup remove unused and undocumented code files

* fix unused logging integrations cleanup

* bump: version 1.50.3 → 1.50.4

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
2024-10-24 19:01:41 -07:00
Krish Dholakia
ee3c5b645e feat(litellm_logging.py): refactor standard_logging_payload function … (#6388)
* feat(litellm_logging.py): refactor standard_logging_payload function to be <50 LOC

fixes issue where usage information was not following typed values

* fix(litellm_logging.py): fix completion start time handling
2024-10-24 18:59:01 -07:00
Ishaan Jaff
a4b2763fa6 (code cleanup) remove unused and undocumented logging integrations - litedebugger, berrispend (#6406)
* code cleanup remove unused and undocumented code files

* fix unused logging integrations cleanup
2024-10-24 19:27:50 +04:00
Ishaan Jaff
a74a07e459 (fix) Langfuse key based logging (#6372)
* langfuse use helper for get_langfuse_logging_config

* fix get_langfuse_logger_for_request

* fix import

* fix get_langfuse_logger_for_request

* test_get_langfuse_logger_for_request_with_dynamic_params

* unit testing for test_get_langfuse_logger_for_request_with_no_dynamic_params

* parameterized langfuse testing

* fix langfuse test

* fix langfuse logging

* fix test_aaalangfuse_logging_metadata

* fix langfuse log metadata test

* fix langfuse logger

* use create_langfuse_logger_from_credentials

* fix test_get_langfuse_logger_for_request_with_no_dynamic_params

* fix correct langfuse/ folder structure

* use static methods for langfuse logger

* add comment on langfuse handler

* fix linting error

* add unit testing for langfuse logging

* fix linting

* fix failure handler langfuse
2024-10-23 18:24:22 +05:30
Ishaan Jaff
f45393ac0a (refactor) move convert dict to model response to llm_response_utils/ (#6393)
* refactor move convert dict to model response

* fix imports

* fix import _handle_invalid_parallel_tool_calls
2024-10-23 17:32:14 +05:30
Krish Dholakia
1113cb7a2a Litellm dev 10 22 2024 (#6384)
* fix(utils.py): add 'disallowed_special' for token counting on .encode()

Fixes error when '<|endoftext|>' in string

* Revert "(fix) standard logging metadata + add unit testing  (#6366)" (#6381)

This reverts commit 8359cb6fa9.

* add new 35 model card (#6378)

* Add claude 3 5 sonnet 20241022 models for all providers (#6380)

* Add Claude 3.5 v2 on Amazon Bedrock and Vertex AI.

* added anthropic/claude-3-5-sonnet-20241022

* add new 35 model card

---------

Co-authored-by: Paul Gauthier <paul@paulg.com>
Co-authored-by: lowjiansheng <15527690+lowjiansheng@users.noreply.github.com>

* test(skip-flaky-google-context-caching-test): google is not reliable. their sample code is also not working

* Fix metadata being overwritten in speech() (#6295)

* fix: adding missing redis cluster kwargs (#6318)

Co-authored-by: Ali Arian <ali.arian@breadfinancial.com>

* Add support for `max_completion_tokens` in Azure OpenAI (#6376)

Now that Azure supports `max_completion_tokens`, no need for special handling for this param and let it pass thru. More details: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models?tabs=python-secure#api-support

* build(model_prices_and_context_window.json): add voyage-finance-2 pricing

Closes https://github.com/BerriAI/litellm/issues/6371

* build(model_prices_and_context_window.json): fix llama3.1 pricing model name on map

Closes https://github.com/BerriAI/litellm/issues/6310

* feat(realtime_streaming.py): just log specific events

Closes https://github.com/BerriAI/litellm/issues/6267

* fix(utils.py): more robust checking if unmapped vertex anthropic model belongs to that family of models

Fixes https://github.com/BerriAI/litellm/issues/6383

* Fix Ollama stream handling for tool calls with None content (#6155)

* test(test_max_completions): update test now that azure supports 'max_completion_tokens'

* fix(handler.py): fix linting error

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Low Jian Sheng <15527690+lowjiansheng@users.noreply.github.com>
Co-authored-by: David Manouchehri <david.manouchehri@ai.moda>
Co-authored-by: Paul Gauthier <paul@paulg.com>
Co-authored-by: John HU <hszqqq12@gmail.com>
Co-authored-by: Ali Arian <113945203+ali-arian@users.noreply.github.com>
Co-authored-by: Ali Arian <ali.arian@breadfinancial.com>
Co-authored-by: Anand Taralika <46954145+taralika@users.noreply.github.com>
Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com>
2024-10-22 21:18:54 -07:00
Ishaan Jaff
300ac43025 (feat) Arize - Allow using Arize HTTP endpoint (#6364)
* arize use helper for get_arize_opentelemetry_config

* use helper to get Arize OTEL config

* arize add helpers for arize

* docs allow using arize http endpoint

* fix importing OTEL for Arize

* use static methods for ArizeLogger

* fix ArizeLogger tests
2024-10-23 09:38:35 +05:30