Commit graph

652 commits

Author SHA1 Message Date
Krish Dholakia
2fc6262675
fix(route_llm_request.py): move to using common router, even for clie… (#8966)
* fix(route_llm_request.py): move to using common router, even for client-side credentials

ensures fallbacks / cooldown logic still works

* test(test_route_llm_request.py): add unit test for route request

* feat(router.py): generate unique model id when clientside credential passed in

Prevents cooldowns for api key 1 from impacting api key 2

* test(test_router.py): update testing to ensure original litellm params not mutated

* fix(router.py): upsert clientside call into llm router model list

enables cooldown logic to work accurately

* fix: fix linting error

* test(test_router_utils.py): add direct test for new util on router
2025-03-03 22:57:08 -08:00
Krrish Dholakia
4418e6dd14 build: merge branch 2025-03-02 08:31:57 -08:00
Ishaan Jaff
300d7825f5
(Observability) - Add more detailed dd tracing on Proxy Auth, Bedrock Auth (#8693)
* add dd tracer

* fix dd tracing

* add @tracer.wrap() on def user_api_key_auth

* add async_function_with_retries

* remove dead code

* add tracer.wrap on base aws llm

* add tracer.wrap on base aws llm

* fix print verbose

* fix dd tracing

* trace base aws llm

* fix test base aws llm

* fix converse transform

* test base aws llm

* BASE_AWS_LLM_PATH

* BASE_AWS_LLM_PATH

* test dd tracing
2025-02-20 18:00:41 -08:00
Ishaan Jaff
b88762b63c
(Polish/Fixes) - Fixes for Adding Team Specific Models (#8645)
* refactor get model info for team models

* allow adding a model to a team when creating team specific model

* ui update selected Team on Team Dropdown

* test_team_model_association

* testing for team specific models

* test_get_team_specific_model

* test: skip on internal server error

* remove model alias card on teams page

* linting fix _get_team_specific_model

* fix DeploymentTypedDict

* fix linting error

* fix code quality

* fix model info checks

---------

Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
2025-02-18 21:11:57 -08:00
Krish Dholakia
2340f1b31f
Pass router tags in request headers - x-litellm-tags (#8609)
* feat(litellm_pre_call_utils.py): support `x-litellm-tags` request header

allow tag based routing + spend tracking via request headers

* docs(request_headers.md): document new `x-litellm-tags` for tag based routing and spend tracking

* docs(tag_routing.md): add to docs

* fix(utils.py): only pass str values for openai metadata param

* fix(utils.py): drop non-str values for metadata param to openai

preview-feature, otel span was being sent in
2025-02-18 08:26:22 -08:00
Ishaan Jaff
8024300825
(UI) Improvements to Add Team Model Flow (#8603)
* ui - use common team dropdown component

* re-use team component

* rename org field on add model

* handle add model submit

* working view model_id and team_id on root models page

* cleaner

* show all fields

* working model info view

* working team info selector

* clean up team id

* new component for model dashboard

* ui show table with dropdown

* make public model names like email

* revert changes to litellm model name

* fix litellm model name

* ui fix public model

* fix mappings

* fix conditional text input

* fix message

* ui fix bulk add models

* _add_team_model_to_db

* move model mgmt helper funcs

* test_add_team_model_to_db

* ui - display model team model name

* fix add model tab

* fix remove redundant info tab on models page

* dont pass model mappings all the way through

* fix jarring model name when adding team models

* fix edit model button

* delete button on model info

* ui fix model dashboard

* fix DeploymentTypedDict

* _is_model_access_group_for_wildcard_route

* test _get_public_model_name

* ui fix viewing public model name

* fix linting error

* fix linting errors

* fix selectedModel logic
2025-02-17 18:37:14 -08:00
Ishaan Jaff
6b3bfa2b42
(Feat) - return x-litellm-attempted-fallbacks in responses from litellm proxy (#8558)
* add_fallback_headers_to_response

* test x-litellm-attempted-fallbacks

* unit test attempted fallbacks

* fix add_fallback_headers_to_response

* docs document response headers

* fix file name
2025-02-15 14:54:23 -08:00
Krish Dholakia
a9276f27f9
fix(main.py): fix key leak error when unknown provider given (#8556)
* fix(main.py): fix key leak error when unknown provider given

don't return passed in args if unknown route on embedding

* fix(main.py): remove instances of {args} being passed in exception

prevent potential key leaks

* test(code_coverage/prevent_key_leaks_in_codebase.py): ban usage of {args} in codebase

* fix: fix linting errors

* fix: remove unused variable
2025-02-15 14:02:55 -08:00
Krish Dholakia
f5841eb84d
fix(router.py): add more deployment timeout debug information for tim… (#8523)
* fix(router.py): add more deployment timeout debug information for timeout errors

help understand why some calls in high-traffic don't respect their model-specific timeouts

* test(test_convert_dict_to_response.py): unit test ensuring empty str is not converted to None

Addresses https://github.com/BerriAI/litellm/issues/8507

* fix(convert_dict_to_response.py): handle empty message str - don't return back as 'None'

Fixes https://github.com/BerriAI/litellm/issues/8507

* test(test_completion.py): add e2e test
2025-02-13 17:10:22 -08:00
Krish Dholakia
e26d7df91b
Litellm dev 02 10 2025 p2 (#8443)
* Fixed issue #8246 (#8250)

* Fixed issue #8246

* Added unit tests for discard() and for remove_callback_from_list_by_object()

* fix(openai.py): support dynamic passing of organization param to openai

handles scenario where client-side org id is passed to openai

---------

Co-authored-by: Erez Hadad <erezh@il.ibm.com>
2025-02-10 17:53:46 -08:00
Krish Dholakia
5d170162d3
fix(nvidia_nim/embed.py): add 'dimensions' support (#8302)
* fix(nvidia_nim/embed.py): add 'dimensions' support

Fixes https://github.com/BerriAI/litellm/issues/8238

* fix(proxy_Server.py): initialize router redis cache if setup on proxy

Fixes https://github.com/BerriAI/litellm/issues/6602

* test: add unit testing for new helper function
2025-02-07 16:19:32 -08:00
Ishaan Jaff
65c91cbbbc
(QA+UI) - e2e flow for adding assembly ai passthrough endpoints (#8337)
* add initial test for assembly ai

* start using PassthroughEndpointRouter

* migrate to lllm passthrough endpoints

* add assembly ai as a known provider

* fix PassthroughEndpointRouter

* fix set_pass_through_credentials

* working EU request to assembly ai pass through endpoint

* add e2e test assembly

* test_assemblyai_routes_with_bad_api_key

* clean up pass through endpoint router

* e2e testing for assembly ai pass through

* test assembly ai e2e testing

* delete assembly ai models

* fix code quality

* ui working assembly ai api base flow

* fix install assembly ai

* update model call details with kwargs for pass through logging

* fix tracking assembly ai model in response

* _handle_assemblyai_passthrough_logging

* fix test_initialize_deployment_for_pass_through_unsupported_provider

* TestPassthroughEndpointRouter

* _get_assembly_transcript

* fix assembly ai pt logging tests

* fix assemblyai_proxy_route

* fix _get_assembly_region_from_url
2025-02-06 18:27:54 -08:00
Ishaan Jaff
b535c9bdc0
(Bug Fix - Langfuse) - fix for when model response has choices=[] (#8339)
* refactor _get_langfuse_input_output_content

* test_langfuse_logging_completion_with_malformed_llm_response

* fix _get_langfuse_input_output_content

* fixes for langfuse linting

* unit testing for get chat/text content for langfuse

* fix _should_raise_content_policy_error
2025-02-06 18:02:26 -08:00
Krish Dholakia
69a6da4727
Litellm dev 01 30 2025 p2 (#8134)
* feat(lowest_tpm_rpm_v2.py): fix redis cache check to use >= instead of >

makes it consistent

* test(test_custom_guardrails.py): add more unit testing on default on guardrails

ensure it runs if user sent guardrail list is empty

* docs(quick_start.md): clarify default on guardrails run even if user guardrails list contains other guardrails

* refactor(litellm_logging.py): refactor no-log to helper util

allows for more consistent behavior

* feat(litellm_logging.py): add event hook to verbose logs

* fix(litellm_logging.py): add unit testing to ensure `litellm.disable_no_log_param` is respected

* docs(logging.md): document how to disable 'no-log' param

* test: fix test to handle feb

* test: cleanup old bedrock model

* fix: fix router check
2025-01-30 22:18:53 -08:00
Ishaan Jaff
8a235e7d38
(Refactor / QA) - Use LoggingCallbackManager to append callbacks and ensure no duplicate callbacks are added (#8112)
* LoggingCallbackManager

* add logging_callback_manager

* use logging_callback_manager

* add add_litellm_failure_callback

* use add_litellm_callback

* use add_litellm_async_success_callback

* add_litellm_async_failure_callback

* linting fix

* fix logging callback manager

* test_duplicate_multiple_loggers_test

* use _reset_all_callbacks

* fix testing with dup callbacks

* test_basic_image_generation

* reset callbacks for tests

* fix check for _add_custom_logger_to_list

* fix test_amazing_sync_embedding

* fix _get_custom_logger_key

* fix batches testing

* fix _reset_all_callbacks

* fix _check_callback_list_size

* add callback_manager_test

* fix test gemini-2.0-flash-thinking-exp-01-21
2025-01-30 19:35:50 -08:00
Ishaan Jaff
b6d61ec22b
(Feat) pass through vertex - allow using credentials defined on litellm router for vertex pass through (#8100)
* test_add_vertex_pass_through_deployment

* VertexPassThroughRouter

* fix use_in_pass_through

* VertexPassThroughRouter

* fix vertex_credentials

* allow using _initialize_deployment_for_pass_through

* test_add_vertex_pass_through_deployment

* _set_default_vertex_config

* fix verbose_proxy_logger

* fix use_in_pass_through

* fix _get_token_and_url

* test_get_vertex_location_from_url

* test_get_vertex_credentials_none

* run pt unit testing again

* fix add_vertex_credentials

* test_adding_deployments.py

* rename file
2025-01-29 17:54:02 -08:00
Krish Dholakia
513b1904ab
Add attempted-retries and timeout values to response headers + more testing (#7926)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 14s
* feat(router.py): add retry headers to response

makes it easy to add testing to ensure model-specific retries are respected

* fix(add_retry_headers.py): clarify attempted retries vs. max retries

* test(test_fallbacks.py): add test for checking if max retries set for model is respected

* test(test_fallbacks.py): assert values for attempted retries and max retries are as expected

* fix(utils.py): return timeout in litellm proxy response headers

* test(test_fallbacks.py): add test to assert model specific timeout used on timeout error

* test: add bad model with timeout to proxy

* fix: fix linting error

* fix(router.py): fix get model list from model alias

* test: loosen test restriction - account for other events on proxy
2025-01-22 22:19:44 -08:00
Krish Dholakia
64e1df1f14
Litellm dev 01 20 2025 p3 (#7890)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 13s
* fix(router.py): pass stream timeout correctly for non openai / azure models

Fixes https://github.com/BerriAI/litellm/issues/7870

* test(test_router_timeout.py): add test for streaming

* test(test_router_timeout.py): add unit testing for new router functions

* docs(ollama.md): link to section on calling ollama within docker container

* test: remove redundant test

* test: fix test to include timeout value

* docs(config_settings.md): document new router settings param
2025-01-20 21:46:36 -08:00
Krish Dholakia
4b23420a20
Litellm dev 01 20 2025 p1 (#7884)
* fix(initial-test-to-return-api-timeout-value-in-openai-timeout-exception): Makes it easier for user to debug why request timed out

* feat(openai.py): return timeout value + time taken on openai timeout errors

helps debug timeout errors

* fix(utils.py): fix num retries extraction logic when num_retries = 0

* fix(config_settings.md): litellm_logging.py

support printing payload to console if 'LITELLM_PRINT_STANDARD_LOGGING_PAYLOAD' is true

 Enables easier debug

* test(test_auth_checks.py'): remove common checks userapikeyauth enforcement check

* fix(litellm_logging.py): fix linting error
2025-01-20 21:45:48 -08:00
Krish Dholakia
1bea338597
LiteLLM Minor Fixes & Improvements (2024/16/01) (#7826)
* fix(lm_studio/chat/transformation.py): Fix https://github.com/BerriAI/litellm/issues/7811

* fix(router.py): fix mock timeout check

* fix: drop model name from fallback args since it causes a conflict with the model=model that is provided later on. (#7806)

This error happens if you provide multiple fallback models to the completion function with model name defined in each one.

* fix(router.py): remove mock_timeout before sending to request

prevents reuse in fallbacks

* test: update test

* test: revert test change - wrong pr

---------

Co-authored-by: Dudu Lasry <david1542@users.noreply.github.com>
2025-01-17 20:59:21 -08:00
Krish Dholakia
80f7af510b
Improve Proxy Resiliency: Cooldown single-deployment model groups if 100% calls failed in high traffic (#7823)
* refactor(_is_cooldown_required): move '_is_cooldown_required' into cooldown_handlers.py

* refactor(cooldown_handlers.py): move cooldown constants into `.constants.py`

* fix(cooldown_handlers.py): remove if single deployment don't cooldown logic

move to traffic based cooldown logic

Addresses https://github.com/BerriAI/litellm/issues/7822

* fix: add unit tests for '_should_cooldown_deployment'

* test: ensure all tests pass

* test: update test

* fix(cooldown_handlers.py): don't cooldown single deployment models for anything besides traffic related errors

* fix(cooldown_handlers.py): fix cooldown handler logic

* fix(cooldown_handlers.py): fix check
2025-01-17 20:17:02 -08:00
Ishaan Jaff
f1335362cf
(core sdk fix) - fix fallbacks stuck in infinite loop (#7751)
* test_acompletion_fallbacks_basic

* use common run_async_function

* fix completion_with_fallbacks

* fix completion with fallbacks

* fix fallback utils

* test_acompletion_fallbacks_basic

* test_completion_fallbacks_sync

* huggingface/mistralai/Mistral-7B-Instruct-v0.3
2025-01-13 19:34:34 -08:00
Ishaan Jaff
15b52039d2
(litellm sdk speedup router) - adds a helper _cached_get_model_group_info to use when trying to get deployment tpm/rpm limits (#7719)
* fix _cached_get_model_group_info

* fixes get_remaining_model_group_usage

* test_cached_get_model_group_info
2025-01-12 15:14:54 -08:00
Krish Dholakia
27892acdfc
Litellm dev 01 10 2025 p3 (#7682)
* feat(langfuse.py): log the used prompt when prompt management used

* test: fix test

* docs(self_serve.md): add doc on restricting personal key creation on ui

* feat(s3.py): support s3 logging with team alias prefixes (if available)

New preview feature

* fix(main.py): remove old if block - simplify to just await if coroutine returned

fixes lm_studio async embedding error

* fix(langfuse.py): handle get prompt check
2025-01-10 21:56:42 -08:00
Krish Dholakia
c10ae8879e
fix(vertex_ai/gemini/transformation.py): handle 'http://' in gemini p… (#7660)
* fix(vertex_ai/gemini/transformation.py): handle 'http://' in gemini process url

* refactor(router.py): refactor '_prompt_management_factory' to use logging obj get_chat_completion logic

deduplicates code

* fix(litellm_logging.py): update 'get_chat_completion_prompt' to update logging object messages

* docs(prompt_management.md): update prompt management to be in beta

given feedback - this still needs to be revised (e.g. passing in user message, not ignoring)

* refactor(prompt_management_base.py): introduce base class for prompt management

allows consistent behaviour across prompt management integrations

* feat(prompt_management_base.py): support adding client message to template message + refactor langfuse prompt management to use prompt management base

* fix(litellm_logging.py): log prompt id + prompt variables to langfuse if set

allows tracking what prompt was used for what purpose

* feat(litellm_logging.py): log prompt management metadata in standard logging payload + use in langfuse

allows logging prompt id / prompt variables to langfuse

* test: fix test

* fix(router.py): cleanup unused imports

* fix: fix linting error

* fix: fix trace param typing

* fix: fix linting errors

* fix: fix code qa check
2025-01-10 07:31:59 -08:00
Krish Dholakia
07c5f136f1
fix(utils.py): fix select tokenizer for custom tokenizer (#7599)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 36s
* fix(utils.py): fix select tokenizer for custom tokenizer

* fix(router.py): fix 'utils/token_counter' endpoint
2025-01-07 22:37:09 -08:00
Krish Dholakia
fef7839e8a
Litellm dev 01 06 2025 p1 (#7594)
* fix(custom_logger.py): expose new 'async_get_chat_completion_prompt' event hook

* fix(custom_logger.py): langfuse_prompt_management.py

remove 'headers' from custom logger 'async_get_chat_completion_prompt' and 'get_chat_completion_prompt' event hooks

* feat(router.py): expose new function for prompt management based routing

* feat(router.py): partial working router prompt factory logic

allows load balanced model to be used for model name w/ langfuse prompt management call

* feat(router.py): fix prompt management with load balanced model group

* feat(langfuse_prompt_management.py): support reading in openai params from langfuse

enables user to define optional params on langfuse vs. client code

* test(test_Router.py): add unit test for router based langfuse prompt management

* fix: fix linting errors
2025-01-06 21:26:21 -08:00
Krish Dholakia
d43d83f9ef
feat(router.py): support request prioritization for text completion c… (#7540)
* feat(router.py): support request prioritization for text completion calls

* fix(internal_user_endpoints.py): fix sql query to return all keys, including null team id keys on `/user/info`

Fixes https://github.com/BerriAI/litellm/issues/7485

* fix: fix linting errors

* fix: fix linting error

* test(test_router_helper_utils.py): add direct test for '_schedule_factory'

Fixes code qa test
2025-01-03 19:35:44 -08:00
Krish Dholakia
347779b813
Litellm dev 12 30 2024 p1 (#7480)
* test(azure_openai_o1.py): initial commit with testing for azure openai o1 preview model

* fix(base_llm_unit_tests.py): handle azure o1 preview response format tests

skip as o1 on azure doesn't support tool calling yet

* fix: initial commit of azure o1 handler using openai caller

simplifies calling + allows fake streaming logic alr. implemented for openai to just work

* feat(azure/o1_handler.py): fake o1 streaming for azure o1 models

azure does not currently support streaming for o1

* feat(o1_transformation.py): support overriding 'should_fake_stream' on azure/o1 via 'supports_native_streaming' param on model info

enables user to toggle on when azure allows o1 streaming without needing to bump versions

* style(router.py): remove 'give feedback/get help' messaging when router is used

Prevents noisy messaging

Closes https://github.com/BerriAI/litellm/issues/5942

* test: fix azure o1 test

* test: fix tests

* fix: fix test
2024-12-30 21:52:52 -08:00
Krish Dholakia
cfb6890b9f
Litellm dev 12 28 2024 p2 (#7458)
* docs(sidebar.js): docs for support model access groups for wildcard routes

* feat(key_management_endpoints.py): add check if user is premium_user when adding model access group for wildcard route

* refactor(docs/): make control model access a root-level doc in proxy sidebar

easier to discover how to control model access on litellm

* docs: more cleanup

* feat(fireworks_ai/): add document inlining support

Enables user to call non-vision models with images/pdfs/etc.

* test(test_fireworks_ai_translation.py): add unit testing for fireworks ai transform inline helper util

* docs(docs/): add document inlining details to fireworks ai docs

* feat(fireworks_ai/): allow user to dynamically disable auto add transform inline

allows client-side disabling of this feature for proxy users

* feat(fireworks_ai/): return 'supports_vision' and 'supports_pdf_input' true on all fireworks ai models

now true as fireworks ai supports document inlining

* test: fix tests

* fix(router.py): add unit testing for _is_model_access_group_for_wildcard_route
2024-12-28 19:38:06 -08:00
Ishaan Jaff
5e8c64f128
(Bug fix) missing model_group field in logs for aspeech call types (#7392)
* fix use _update_kwargs_before_fallbacks

* test assert standard_logging_object includes model_group

* test_datadog_non_serializable_messages

* update test
2024-12-27 17:00:39 -08:00
Krish Dholakia
539f166166
Support budget/rate limit tiers for keys (#7429)
* feat(proxy/utils.py): get associated litellm budget from db in combined_view for key

allows user to create rate limit tiers and associate those to keys

* feat(proxy/_types.py): update the value of key-level tpm/rpm/model max budget metrics with the associated budget table values if set

allows rate limit tiers to be easily applied to keys

* docs(rate_limit_tiers.md): add doc on setting rate limit / budget tiers

make feature discoverable

* feat(key_management_endpoints.py): return litellm_budget_table value in key generate

make it easy for user to know associated budget on key creation

* fix(key_management_endpoints.py): document 'budget_id' param in `/key/generate`

* docs(key_management_endpoints.py): document budget_id usage

* refactor(budget_management_endpoints.py): refactor budget endpoints into separate file - makes it easier to run documentation testing against it

* docs(test_api_docs.py): add budget endpoints to ci/cd doc test + add missing param info to docs

* fix(customer_endpoints.py): use new pydantic obj name

* docs(user_management_heirarchy.md): add simple doc explaining teams/keys/org/users on litellm

* Litellm dev 12 26 2024 p2 (#7432)

* (Feat) Add logging for `POST v1/fine_tuning/jobs`  (#7426)

* init commit ft jobs logging

* add ft logging

* add logging for FineTuningJob

* simple FT Job create test

* (docs) - show all supported Azure OpenAI endpoints in overview  (#7428)

* azure batches

* update doc

* docs azure endpoints

* docs endpoints on azure

* docs azure batches api

* docs azure batches api

* fix(key_management_endpoints.py): fix key update to actually work

* test(test_key_management.py): add e2e test asserting ui key update call works

* fix: proxy/_types - fix linting erros

* test: update test

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>

* fix: test

* fix(parallel_request_limiter.py): enforce tpm/rpm limits on key from tiers

* fix: fix linting errors

* test: fix test

* fix: remove unused import

* test: update test

* docs(customer_endpoints.py): document new model_max_budget param

* test: specify unique key alias

* docs(budget_management_endpoints.py): document new model_max_budget param

* test: fix test

* test: fix tests

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
2024-12-26 19:05:27 -08:00
Krish Dholakia
48316520f4
LiteLLM Minor Fixes & Improvements (12/23/2024) - P2 (#7386)
* fix(main.py): support 'mock_timeout=true' param

allows mock requests on proxy to have a time delay, for testing

* fix(main.py): ensure mock timeouts raise litellm.Timeout error

triggers retry/fallbacks

* fix: fix fallback + mock timeout testing

* fix(router.py): always return remaining tpm/rpm limits, if limits are known

allows for rate limit headers to be guaranteed

* docs(timeout.md): add docs on mock timeout = true

* fix(main.py): fix linting errors

* test: fix test
2024-12-23 17:41:27 -08:00
Krish Dholakia
70a9ea99f2
Controll fallback prompts client-side (#7334)
* feat(router.py): support passing model-specific messages in fallbacks

* docs(routing.md): separate router timeouts into separate doc

allow for 1 fallbacks doc (across proxy/router)

* docs(routing.md): cleanup router docs

* docs(reliability.md): cleanup docs

* docs(reliability.md): cleaned up fallback doc

just have 1 doc across sdk/proxy

simplifies docs

* docs(reliability.md): add setting model-specific fallback prompts

* fix: fix linting errors

* test: skip test causing openai rate limit errros

* test: fix test

* test: run vertex test first to catch error
2024-12-20 19:09:53 -08:00
Ishaan Jaff
c7f14e936a
(code quality) run ruff rule to ban unused imports (#7313)
* remove unused imports

* fix AmazonConverseConfig

* fix test

* fix import

* ruff check fixes

* test fixes

* fix testing

* fix imports
2024-12-19 12:33:42 -08:00
Ishaan Jaff
6261ec3599
(feat proxy) v2 - model max budgets (#7302)
* clean up unused code

* add _PROXY_VirtualKeyModelMaxBudgetLimiter

* adjust type imports

* working _PROXY_VirtualKeyModelMaxBudgetLimiter

* fix user_api_key_model_max_budget

* fix user_api_key_model_max_budget

* update naming

* update naming

* fix changes to RouterBudgetLimiting

* test_call_with_key_over_model_budget

* test_call_with_key_over_model_budget

* handle _get_request_model_budget_config

* e2e test for test_call_with_key_over_model_budget

* clean up test

* run ci/cd again

* add validate_model_max_budget

* docs fix

* update doc

* add e2e testing for _PROXY_VirtualKeyModelMaxBudgetLimiter

* test_unit_test_max_model_budget_limiter.py
2024-12-18 19:42:46 -08:00
Krish Dholakia
2f08341a08
Litellm dev readd prompt caching (#7299)
* fix(router.py): re-add saving model id on prompt caching valid successful deployment

* fix(router.py): introduce optional pre_call_checks

isolate prompt caching logic in a separate file

* fix(prompt_caching_deployment_check.py): fix import

* fix(router.py): new 'async_filter_deployments' event hook

allows custom logger to filter deployments returned to routing strategy

* feat(prompt_caching_deployment_check.py): initial working commit of prompt caching based routing

* fix(cooldown_callbacks.py): fix linting error

* fix(budget_limiter.py): move budget logger to async_filter_deployment hook

* test: add unit test

* test(test_router_helper_utils.py): add unit testing

* fix(budget_limiter.py): fix linting errors

* docs(config_settings.md): add 'optional_pre_call_checks' to router_settings param docs
2024-12-18 15:13:49 -08:00
Krish Dholakia
0fe8bfe87a
fix(proxy_server.py): pass model access groups to get_key/get_team mo… (#7281)
* fix(proxy_server.py): pass model access groups to get_key/get_team models

allows end user to see actual models they have access to, instead of default models

* fix(auth_checks.py): fix linting errors

* fix: fix linting errors
2024-12-18 09:33:33 -08:00
Krish Dholakia
9494733a4b
Litellm dev 12 14 2024 p1 (#7231)
* fix(router.py): fix reading + using deployment-specific num retries on router

Fixes https://github.com/BerriAI/litellm/issues/7001

* fix(router.py): ensure 'timeout' in litellm_params overrides any value in router settings

Refactors all routes to use common '_update_kwargs_with_deployment' which has the timeout handling

* fix(router.py): fix timeout check
2024-12-14 22:22:29 -08:00
Krish Dholakia
516c2a6a70
Litellm remove circular imports (#7232)
* fix(utils.py): initial commit to remove circular imports - moves llmproviders to utils.py

* fix(router.py): fix 'litellm.EmbeddingResponse' import from router.py

'

* refactor: fix litellm.ModelResponse import on pass through endpoints

* refactor(litellm_logging.py): fix circular import for custom callbacks literal

* fix(factory.py): fix circular imports inside prompt factory

* fix(cost_calculator.py): fix circular import for 'litellm.Usage'

* fix(proxy_server.py): fix potential circular import with `litellm.Router'

* fix(proxy/utils.py): fix potential circular import in `litellm.Router`

* fix: remove circular imports in 'auth_checks' and 'guardrails/'

* fix(prompt_injection_detection.py): fix router impor t

* fix(vertex_passthrough_logging_handler.py): fix potential circular imports in vertex pass through

* fix(anthropic_pass_through_logging_handler.py): fix potential circular imports

* fix(slack_alerting.py-+-ollama_chat.py): fix modelresponse import

* fix(base.py): fix potential circular import

* fix(handler.py): fix potential circular ref in codestral + cohere handler's

* fix(azure.py): fix potential circular imports

* fix(gpt_transformation.py): fix modelresponse import

* fix(litellm_logging.py): add logging base class - simplify typing

makes it easy for other files to type check the logging obj without introducing circular imports

* fix(azure_ai/embed): fix potential circular import on handler.py

* fix(databricks/): fix potential circular imports in databricks/

* fix(vertex_ai/): fix potential circular imports on vertex ai embeddings

* fix(vertex_ai/image_gen): fix import

* fix(watsonx-+-bedrock): cleanup imports

* refactor(anthropic-pass-through-+-petals): cleanup imports

* refactor(huggingface/): cleanup imports

* fix(ollama-+-clarifai): cleanup circular imports

* fix(openai_like/): fix impor t

* fix(openai_like/): fix embedding handler

cleanup imports

* refactor(openai.py): cleanup imports

* fix(sagemaker/transformation.py): fix import

* ci(config.yml): add circular import test to ci/cd
2024-12-14 16:28:34 -08:00
Ishaan Jaff
163529b40b
(feat - Router / Proxy ) Allow setting budget limits per LLM deployment (#7220)
* fix test_deployment_budget_limits_e2e_test

* refactor async_log_success_event to track spend for provider + deployment

* fix format

* rename class to RouterBudgetLimiting

* rename func

* rename types used for budgets

* add new types for deployment budgets

* add budget limits for deployments

* fix checking budgets set for provider

* update file names

* fix linting error

* _track_provider_remaining_budget_prometheus

* async_filter_deployments

* fix model list passed to router

* update error

* test_deployment_budgets_e2e_test_expect_to_fail

* fix test case

* run deployment budget limits
2024-12-13 19:15:51 -08:00
Krish Dholakia
7643b3087c
Litellm dev 12 11 2024 v2 (#7215)
* feat(bedrock/): add bedrock converse top k param

Closes https://github.com/BerriAI/litellm/issues/7087

* Fix bedrock empty content error (#7177)

* add resolver

* handle empty content on bedrock with default content

* use existing default message, tests

* Update tests/llm_translation/test_bedrock_completion.py

* fix tests

* Revert "add resolver"

This reverts commit c717e376ee.

* fallback to empty

---------

Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>

* fix(factory.py): handle empty content blocks in messages

Fixes https://github.com/BerriAI/litellm/issues/7169

* feat(router.py): add stripped model check to model fallback search

if model_name="openai/gpt-3.5-turbo" and fallback=[{"gpt-3.5-turbo"..}] the fallback should just work as expected

* fix: fix linting error

* fix(factory.py): fix linting error

* fix(factory.py): in base case still support skip empty text blocks

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2024-12-13 12:49:57 -08:00
Krish Dholakia
e68bb4e051
Litellm dev 12 12 2024 (#7203)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 47s
* fix(azure/): support passing headers to azure openai endpoints

Fixes https://github.com/BerriAI/litellm/issues/6217

* fix(utils.py): move default tokenizer to just openai

hf tokenizer makes network calls when trying to get the tokenizer - this slows down execution time calls

* fix(router.py): fix pattern matching router - add generic "*" to it as well

Fixes issue where generic "*" model access group wouldn't show up

* fix(pattern_match_deployments.py): match to more specific pattern

match to more specific pattern

allows setting generic wildcard model access group and excluding specific models more easily

* fix(proxy_server.py): fix _delete_deployment to handle base case where db_model list is empty

don't delete all router models  b/c of empty list

Fixes https://github.com/BerriAI/litellm/issues/7196

* fix(anthropic/): fix handling response_format for anthropic messages with anthropic api

* fix(fireworks_ai/): support passing response_format + tool call in same message

Addresses https://github.com/BerriAI/litellm/issues/7135

* Revert "fix(fireworks_ai/): support passing response_format + tool call in same message"

This reverts commit 6a30dc6929.

* test: fix test

* fix(replicate/): fix replicate default retry/polling logic

* test: add unit testing for router pattern matching

* test: update test to use default oai tokenizer

* test: mark flaky test

* test: skip flaky test
2024-12-13 08:54:03 -08:00
Ishaan Jaff
7ff9a905d2
(fix) latency fix - revert prompt caching check on litellm router (#7211)
* attempt to fix latency issue

* fix latency issues for router prompt caching
2024-12-12 20:50:16 -08:00
Krish Dholakia
0c0498dd60
Litellm dev 12 07 2024 (#7086)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 11s
* fix(main.py): support passing max retries to azure/openai embedding integrations

Fixes https://github.com/BerriAI/litellm/issues/7003

* feat(team_endpoints.py): allow updating team model aliases

Closes https://github.com/BerriAI/litellm/issues/6956

* feat(router.py): allow specifying model id as fallback - skips any cooldown check

Allows a default model to be checked if all models in cooldown

s/o @micahjsmith

* docs(reliability.md): add fallback to specific model to docs

* fix(utils.py): new 'is_prompt_caching_valid_prompt' helper util

Allows user to identify if messages/tools have prompt caching

Related issue: https://github.com/BerriAI/litellm/issues/6784

* feat(router.py): store model id for prompt caching valid prompt

Allows routing to that model id on subsequent requests

* fix(router.py): only cache if prompt is valid prompt caching prompt

prevents storing unnecessary items in cache

* feat(router.py): support routing prompt caching enabled models to previous deployments

Closes https://github.com/BerriAI/litellm/issues/6784

* test: fix linting errors

* feat(databricks/): convert basemodel to dict and exclude none values

allow passing pydantic message to databricks

* fix(utils.py): ensure all chat completion messages are dict

* (feat) Track `custom_llm_provider` in LiteLLMSpendLogs (#7081)

* add custom_llm_provider to SpendLogsPayload

* add custom_llm_provider to SpendLogs

* add custom llm provider to SpendLogs payload

* test_spend_logs_payload

* Add MLflow to the side bar (#7031)

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

* (bug fix) SpendLogs update DB catch all possible DB errors for retrying  (#7082)

* catch DB_CONNECTION_ERROR_TYPES

* fix DB retry mechanism for SpendLog updates

* use DB_CONNECTION_ERROR_TYPES in auth checks

* fix exp back off for writing SpendLogs

* use _raise_failed_update_spend_exception to ensure errors print as NON blocking

* test_update_spend_logs_multiple_batches_with_failure

* (Feat) Add StructuredOutputs support for Fireworks.AI (#7085)

* fix model cost map fireworks ai "supports_response_schema": true,

* fix supports_response_schema

* fix map openai params fireworks ai

* test_map_response_format

* test_map_response_format

* added deepinfra/Meta-Llama-3.1-405B-Instruct (#7084)

* bump: version 1.53.9 → 1.54.0

* fix deepinfra

* litellm db fixes LiteLLM_UserTable (#7089)

* ci/cd queue new release

* fix llama-3.3-70b-versatile

* refactor - use consistent file naming convention `AI21/` -> `ai21`  (#7090)

* fix refactor - use consistent file naming convention

* ci/cd run again

* fix naming structure

* fix use consistent naming (#7092)

---------

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Yuki Watanabe <31463517+B-Step62@users.noreply.github.com>
Co-authored-by: ali sayyah <ali.sayyah2@gmail.com>
2024-12-08 00:30:33 -08:00
Ishaan Jaff
36e99ebce7
fix use consistent naming (#7092)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 11s
2024-12-07 22:01:00 -08:00
Krish Dholakia
e4493248ae
Litellm dev 12 06 2024 (#7067)
* fix(edit_budget_modal.tsx): call `/budget/update` endpoint instead of `/budget/new`

allows updating existing budget on ui

* fix(user_api_key_auth.py): support cost tracking for end user via jwt field

* fix(presidio.py): support pii masking on sync logging callbacks

enables masking before logging to langfuse

* feat(utils.py): support retry policy logic inside '.completion()'

Fixes https://github.com/BerriAI/litellm/issues/6623

* fix(utils.py): support retry by retry policy on async logic as well

* fix(handle_jwt.py): set leeway default leeway value

* test: fix test to handle jwt audience claim
2024-12-06 22:44:18 -08:00
Ishaan Jaff
aedf81ef1c fix router test_audio_speech_router 2024-12-05 20:41:44 -08:00
Ishaan Jaff
fb1ec4b6be
(fix) litellm router.aspeech (#6962)
* doc Migrating Databases

* fix aspeech on router

* test_audio_speech_router

* test_audio_speech_router
2024-12-05 13:39:50 -08:00
Krish Dholakia
6bb934c0ac
fix(key_management_endpoints.py): override metadata field value on up… (#7008)
* fix(key_management_endpoints.py): override metadata field value on update

allow user to override tags

* feat(__init__.py): expose new disable_end_user_cost_tracking_prometheus_only metric

allow disabling end user cost tracking on prometheus - fixes cardinality issue

* fix(litellm_pre_call_utils.py): add key/team level enforced params

Fixes https://github.com/BerriAI/litellm/issues/6652

* fix(key_management_endpoints.py): allow user to pass in `enforced_params` as a top level param on /key/generate and /key/update

* docs(enterprise.md): add docs on enforcing required params for llm requests

* Add support of Galadriel API (#7005)

* fix(router.py): robust retry after handling

set retry after time to 0 if >0 healthy deployments. handle base case = 1 deployment

* test(test_router.py): fix test

* feat(bedrock/): add support for 'nova' models

also adds explicit 'converse/' route for simpler routing

* fix: fix 'supports_pdf_input'

return if model supports pdf input on get_model_info

* feat(converse_transformation.py): support bedrock pdf input

* docs(document_understanding.md): add document understanding to docs

* fix(litellm_pre_call_utils.py): fix linting error

* fix(init.py): fix passing of bedrock converse models

* feat(bedrock/converse): support 'response_format={"type": "json_object"}'

* fix(converse_handler.py): fix linting error

* fix(base_llm_unit_tests.py): fix test

* fix: fix test

* test: fix test

* test: fix test

* test: remove duplicate test

---------

Co-authored-by: h4n0 <4738254+h4n0@users.noreply.github.com>
2024-12-03 23:03:50 -08:00