Commit graph

355 commits

Author SHA1 Message Date
Ishaan Jaff
00c596a852
(Feat) - Allow viewing Request/Response Logs stored in GCS Bucket (#8449)
* BaseRequestResponseFetchFromCustomLogger

* get_active_base_request_response_fetch_from_custom_logger

* get_request_response_payload

* ui_view_request_response_for_request_id

* fix uiSpendLogDetailsCall

* fix get_request_response_payload

* ui fix RequestViewer

* use 1 class AdditionalLoggingUtils

* ui_view_request_response_for_request_id

* cache the prefetch logs details

* refactor prefetch

* test view request/resp logs

* fix code quality

* fix get_request_response_payload

* uninstall posthog
prevent it from being added in ci/cd

* fix posthog

* fix traceloop test

* fix linting error
2025-02-10 20:38:55 -08:00
Krish Dholakia
e26d7df91b
Litellm dev 02 10 2025 p2 (#8443)
* Fixed issue #8246 (#8250)

* Fixed issue #8246

* Added unit tests for discard() and for remove_callback_from_list_by_object()

* fix(openai.py): support dynamic passing of organization param to openai

handles scenario where client-side org id is passed to openai

---------

Co-authored-by: Erez Hadad <erezh@il.ibm.com>
2025-02-10 17:53:46 -08:00
Krish Dholakia
1dd3713f1a
Anthropic Citations API Support (#8382)
* test(test_anthropic_completion.py): add test ensuring anthropic structured output response is consistent

Resolves https://github.com/BerriAI/litellm/issues/8291

* feat(anthropic.py): support citations api with new user document message format

Resolves https://github.com/BerriAI/litellm/issues/7970

* fix(anthropic/chat/transformation.py): return citations as a provider-specific-field

Resolves https://github.com/BerriAI/litellm/issues/7970

* feat(anthropic/chat/handler.py): add streaming citations support

Resolves https://github.com/BerriAI/litellm/issues/7970

* fix(handler.py): fix code qa error

* fix(handler.py): only set provider specific fields if non-empty dict

* docs(anthropic.md): add citations api to anthropic docs
2025-02-07 22:27:01 -08:00
Krish Dholakia
b5850b6b65
Handle azure deepseek reasoning response (#8288) (#8366)
* Handle azure deepseek reasoning response (#8288)

* Handle deepseek reasoning response

* Add helper method + unit test

* Fix: Follow infinity api url format (#8346)

* Follow infinity api url format

* Update test_infinity.py

* fix(infinity/transformation.py): fix linting error

---------

Co-authored-by: vibhavbhat <vibhavb00@gmail.com>
Co-authored-by: Hao Shan <53949959+haoshan98@users.noreply.github.com>
2025-02-07 17:45:51 -08:00
Krish Dholakia
f651d51f26
Litellm dev 02 07 2025 p2 (#8377)
* fix(caching_routes.py): mask redis password on `/cache/ping` route

* fix(caching_routes.py): fix linting erro

* fix(caching_routes.py): fix linting error on caching routes

* fix: fix test - ignore mask_dict - has a breakpoint

* fix(azure.py): add timeout param + elapsed time in azure timeout error

* fix(http_handler.py): add elapsed time to http timeout request

makes it easier to debug how long request took before failing
2025-02-07 17:30:38 -08:00
Ishaan Jaff
b535c9bdc0
(Bug Fix - Langfuse) - fix for when model response has choices=[] (#8339)
* refactor _get_langfuse_input_output_content

* test_langfuse_logging_completion_with_malformed_llm_response

* fix _get_langfuse_input_output_content

* fixes for langfuse linting

* unit testing for get chat/text content for langfuse

* fix _should_raise_content_policy_error
2025-02-06 18:02:26 -08:00
Krish Dholakia
443ae55904
Azure OpenAI improvements - o3 native streaming, improved tool call + response format handling (#8292)
* fix(convert_dict_to_response.py): only convert if response is the response_format tool call passed in

Fixes https://github.com/BerriAI/litellm/issues/8241

* fix(gpt_transformation.py): makes sure response format / tools conversion doesn't remove previous tool calls

* refactor(gpt_transformation.py): refactor out json schema converstion to base config

keeps logic consistent across providers

* fix(o_series_transformation.py): support o3 mini native streaming

Fixes https://github.com/BerriAI/litellm/issues/8274

* fix(gpt_transformation.py): remove unused variables

* test: update test
2025-02-05 19:38:58 -08:00
Krish Dholakia
8d3a942fbd
Litellm staging (#8270)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 15s
* fix(opik.py): cleanup

* docs(opik_integration.md): cleanup opik integration docs

* fix(redact_messages.py): fix redact messages check header logic

ensures stringified bool value in header is still asserted to true

 allows dynamic message redaction

* feat(redact_messages.py): support `x-litellm-enable-message-redaction` request header

allows dynamic message redaction
2025-02-04 22:35:48 -08:00
Krish Dholakia
3c813b3a87
Fix deepseek calling - refactor to use base_llm_http_handler (#8266)
* refactor(deepseek/): move deepseek to base llm http handler

Fixes https://github.com/BerriAI/litellm/issues/8128#issuecomment-2635430457

* fix(gpt_transformation.py): support stream parsing for gpt-like calls

* test(test_deepseek_completion.py): add async streaming test

* fix(gpt_transformation.py): fix import

* fix(gpt_transformation.py): return full api base and content type
2025-02-04 22:30:00 -08:00
Krrish Dholakia
5b08289d88 fix(factory.py): fix bedrock http:// handling 2025-02-03 18:15:14 -08:00
Krish Dholakia
6834c5ecaf
Easier user onboarding via SSO (#8187)
* fix(ui_sso.py): use common `get_user_object` logic across jwt + ui sso auth

Allows finding users by their email, and attaching the sso user id to the user if found

* Improve Team Management flow on UI  (#8204)

* build(teams.tsx): refactor teams page to make it easier to add members to a team

make a row in table clickable -> allows user to add users to team they intended

* build(teams.tsx): make it clear user should click on team id to view team details

simplifies team management by putting team details on separate page

* build(team_info.tsx): separately show user id and user email

make it easy for user to understand the information they're seeing

* build(team_info.tsx): add back in 'add member' button

* build(team_info.tsx): working team member update on team_info.tsx

* build(team_info.tsx): enable team member delete on ui

allow user to delete accidental adds

* build(internal_user_endpoints.py): expose new endpoint for ui to allow filtering on user table

allows proxy admin to quickly find user they're looking for

* feat(team_endpoints.py): expose new team filter endpoint for ui

allows proxy admin to easily find team they're looking for

* feat(user_search_modal.tsx): allow admin to filter on users when adding new user to teams

* test: mark flaky test

* test: mark flaky test

* fix(exception_mapping_utils.py): fix anthropic text route error

* fix(ui_sso.py): handle situation when user not in db
2025-02-02 23:02:33 -08:00
Krish Dholakia
1105e35538
Complete o3 model support (#8183)
* fix(o_series_transformation.py): add 'reasoning_effort' as o series model param

Closes https://github.com/BerriAI/litellm/issues/8182

* fix(main.py): ensure `reasoning_effort` is a mapped openai param

* refactor(azure/): rename o1_[x] files to o_series_[x]

* refactor(base_llm_unit_tests.py): refactor testing for o series reasoning effort

* test(test_azure_o_series.py): have azure o series tests correctly inherit from base o series model tests

* feat(base_utils.py): support translating 'developer' role to 'system' role for non-openai providers

Makes it easy to switch from openai to anthropic

* fix: fix linting errors

* fix(base_llm_unit_tests.py): fix test

* fix(main.py): add missing param
2025-02-02 22:36:37 -08:00
Krish Dholakia
23f458d2da
Improved O3 + Azure O3 support (#8181)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 13s
* fix: support azure o3 model family for fake streaming workaround (#8162)

* fix: support azure o3 model family for fake streaming workaround

* refactor: rename helper to is_o_series_model for clarity

* update function calling parameters for o3 models (#8178)

* refactor(o1_transformation.py): refactor o1 config to be o series config, expand o series model check to o3

ensures max_tokens is correctly translated for o3

* feat(openai/): refactor o1 files to be 'o_series' files

expands naming to cover o3

* fix(azure/chat/o1_handler.py): azure openai is an instance of openai - was causing resets

* test(test_azure_o_series.py): assert stream faked for azure o3 mini

Resolves https://github.com/BerriAI/litellm/pull/8162

* fix(o1_transformation.py): fix o1 transformation logic to handle explicit o1_series routing

* docs(azure.md): update doc with `o_series/` model name

---------

Co-authored-by: byrongrogan <47910641+byrongrogan@users.noreply.github.com>
Co-authored-by: Low Jian Sheng <15527690+lowjiansheng@users.noreply.github.com>
2025-02-01 09:52:28 -08:00
Krish Dholakia
91ed05df29
Litellm dev contributor prs 01 31 2025 (#8168)
* Add O3-Mini for Azure and Remove Vision Support (#8161)

* Azure Released O3-mini at the same time as OAI, so i've added support here. Confirmed to work with Sweden Central.

* [FIX] replace cgi for python 3.13 with email.Message as suggested in PEP 594 (#8160)

* Update model_prices_and_context_window.json (#8120)

codestral2501 pricing on vertex_ai

* Fix/db view names (#8119)

* Fix to case sensitive DB Views name

* Fix to case sensitive DB View names

* Added quotes to check query as well

* Added quotes to create view query

* test: handle server error  for flaky test

vertex ai has unstable endpoints

---------

Co-authored-by: Wanis Elabbar <70503629+elabbarw@users.noreply.github.com>
Co-authored-by: Honghua Dong <dhh1995@163.com>
Co-authored-by: superpoussin22 <vincent.nadal@orange.fr>
Co-authored-by: Miguel Armenta <37154380+ma-armenta@users.noreply.github.com>
2025-02-01 09:05:20 -08:00
Ishaan Jaff
2cf0daa31c
(Fixes) OpenAI Streaming Token Counting + Fixes usage track when litellm.turn_off_message_logging=True (#8156)
* working streaming usage tracking

* fix test_async_chat_openai_stream_options

* fix await asyncio.sleep(1)

* test_async_chat_azure

* fix s3 logging

* fix get_stream_options

* fix get_stream_options

* fix streaming handler

* test_stream_token_counting_with_redaction

* fix codeql concern
2025-01-31 15:06:37 -08:00
Krish Dholakia
de261e2120
Doc updates + management endpoint fixes (#8138)
* Litellm dev 01 29 2025 p4 (#8107)

* fix(key_management_endpoints.py): always get db team

Fixes https://github.com/BerriAI/litellm/issues/7983

* test(test_key_management.py): add unit test enforcing check_db_only is always true on key generate checks

* test: fix test

* test: skip gemini thinking

* Litellm dev 01 29 2025 p3 (#8106)

* fix(__init__.py): reduces size of __init__.py and reduces scope for errors by using correct param

* refactor(__init__.py): refactor init by cleaning up redundant params

* refactor(__init__.py): move more constants into constants.py

cleanup root

* refactor(__init__.py): more cleanup

* feat(__init__.py): expose new 'disable_hf_tokenizer_download' param

enables hf model usage in offline env

* docs(config_settings.md): document new disable_hf_tokenizer_download param

* fix: fix linting error

* fix: fix unsafe comparison

* test: fix test

* docs(public_teams.md): add doc showing how to expose public teams for users to join

* docs: add beta disclaimer on public teams

* test: update tests
2025-01-30 22:56:41 -08:00
Krish Dholakia
69a6da4727
Litellm dev 01 30 2025 p2 (#8134)
* feat(lowest_tpm_rpm_v2.py): fix redis cache check to use >= instead of >

makes it consistent

* test(test_custom_guardrails.py): add more unit testing on default on guardrails

ensure it runs if user sent guardrail list is empty

* docs(quick_start.md): clarify default on guardrails run even if user guardrails list contains other guardrails

* refactor(litellm_logging.py): refactor no-log to helper util

allows for more consistent behavior

* feat(litellm_logging.py): add event hook to verbose logs

* fix(litellm_logging.py): add unit testing to ensure `litellm.disable_no_log_param` is respected

* docs(logging.md): document how to disable 'no-log' param

* test: fix test to handle feb

* test: cleanup old bedrock model

* fix: fix router check
2025-01-30 22:18:53 -08:00
Ishaan Jaff
8a235e7d38
(Refactor / QA) - Use LoggingCallbackManager to append callbacks and ensure no duplicate callbacks are added (#8112)
* LoggingCallbackManager

* add logging_callback_manager

* use logging_callback_manager

* add add_litellm_failure_callback

* use add_litellm_callback

* use add_litellm_async_success_callback

* add_litellm_async_failure_callback

* linting fix

* fix logging callback manager

* test_duplicate_multiple_loggers_test

* use _reset_all_callbacks

* fix testing with dup callbacks

* test_basic_image_generation

* reset callbacks for tests

* fix check for _add_custom_logger_to_list

* fix test_amazing_sync_embedding

* fix _get_custom_logger_key

* fix batches testing

* fix _reset_all_callbacks

* fix _check_callback_list_size

* add callback_manager_test

* fix test gemini-2.0-flash-thinking-exp-01-21
2025-01-30 19:35:50 -08:00
Krish Dholakia
8eaa5dc797
Bedrock document processing fixes (#8005)
* refactor(factory.py): refactor async bedrock message transformation to use async get request for image url conversion

improve latency of bedrock call

* test(test_bedrock_completion.py): add unit testing to ensure async image url get called for async bedrock call

* refactor(factory.py): refactor bedrock translation to use BedrockImageProcessor

reduces duplicate code

* fix(factory.py): fix bug not allowing pdf's to be processed

* fix(factory.py): fix bedrock converse document understanding with image url

* docs(bedrock.md): clarify all bedrock document types are supported

* refactor: cleanup redundant test + unused imports

* perf: improve perf with reusable clients

* test: fix test
2025-01-28 17:48:32 -08:00
Krish Dholakia
2eaa0079f2
feat(handle_jwt.py): initial commit adding custom RBAC support on jwt… (#8037)
* feat(handle_jwt.py): initial commit adding custom RBAC support on jwt auth

allows admin to define user role field and allowed roles which map to 'internal_user' on litellm

* fix(auth_checks.py): ensure user allowed to access model, when calling via personal keys

Fixes https://github.com/BerriAI/litellm/issues/8029

* feat(handle_jwt.py): support role based access with model permission control on proxy

Allows admin to just grant users roles on IDP (e.g. Azure AD/Keycloak) and user can immediately start calling models

* docs(rbac): add docs on rbac for model access control

make it clear how admin can use roles to control model access on proxy

* fix: fix linting errors

* test(test_user_api_key_auth.py): add unit testing to ensure rbac role is correctly enforced

* test(test_user_api_key_auth.py): add more testing

* test(test_users.py): add unit testing to ensure user model access is always checked for new keys

Resolves https://github.com/BerriAI/litellm/issues/8029

* test: fix unit test

* fix(dot_notation_indexing.py): fix typing to work with python 3.8
2025-01-28 16:27:06 -08:00
Krish Dholakia
6bafdbc546
Litellm dev 01 25 2025 p4 (#8006)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 34s
* feat(main.py): use asyncio.sleep for mock_Timeout=true on async request

adds unit testing to ensure proxy does not fail if specific Openai requests hang (e.g. recent o1 outage)

* fix(streaming_handler.py): fix deepseek r1 return reasoning content on streaming

Fixes https://github.com/BerriAI/litellm/issues/7942

* Revert "fix(streaming_handler.py): fix deepseek r1 return reasoning content on streaming"

This reverts commit 7a052a64e3.

* fix(deepseek-r-1): return reasoning_content as a top-level param

ensures compatibility with existing tools that use it

* fix: fix linting error
2025-01-26 08:01:05 -08:00
Krish Dholakia
03eef5a2a0
Fix custom pricing - separate provider info from model info (#7990)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 34s
* fix(utils.py): initial commit fixing custom cost tracking

refactors out provider specific model info from `get_model_info` - this was causing custom costs to be registered incorrectly

* fix(utils.py): cleanup `_supports_factory` to check provider info, if model info is None

some providers support features like vision across all models

* fix(utils.py): refactor to use _supports_factory

* test: update testing

* fix: fix linting errors

* test: fix testing
2025-01-25 21:49:28 -08:00
Krish Dholakia
8ca3229b26
Ensure base_model cost tracking works across all endpoints (#7989)
* test(test_completion_cost.py): add sdk test to ensure base model is used for cost tracking

* test(test_completion_cost.py): add sdk test to ensure custom pricing works

* fix(main.py): add base model cost tracking support for embedding calls

Enables base model cost tracking for embedding calls when base model set as a litellm_param

* fix(litellm_logging.py): update logging object with litellm params - including base model, if given

ensures base model param is always tracked

* fix(main.py): fix linting errors
2025-01-24 21:05:26 -08:00
Ishaan Jaff
74caef0843
(Feat) - Add GCS Pub/Sub Logging integration for sending DB SpendLogs to BigQuery (#7976)
* add pub_sub

* fix custom batch logger for GCS PUB/SUB

* GCS_PUBSUB_PROJECT_ID

* e2e gcs pub sub

* add gcs pub sub

* fix logging

* add GcsPubSubLogger

* fix pub sub

* add pub sub

* docs gcs pub / sub

* docs on pub sub controls

* test_gcs_pub_sub

* fix publish_message

* test_async_gcs_pub_sub

* test_async_gcs_pub_sub
2025-01-24 20:57:20 -08:00
Krish Dholakia
c6e9240405
Add datadog health check support + fix bedrock converse cost tracking w/ region name specified (#7958)
* fix(bedrock/converse_handler.py): fix bedrock region name on async calls

* fix(utils.py): fix split model handling

Fixes bedrock cost calculation when region name is given

* feat(_health_endpoints.py): support health checking datadog integration

Closes https://github.com/BerriAI/litellm/issues/7921
2025-01-23 22:17:09 -08:00
Krish Dholakia
76795dba39
Deepseek r1 support + watsonx qa improvements (#7907)
* fix(types/utils.py): support returning 'reasoning_content' for deepseek models

Fixes https://github.com/BerriAI/litellm/issues/7877#issuecomment-2603813218

* fix(convert_dict_to_response.py): return deepseek response in provider_specific_field

allows for separating openai vs. non-openai params in model response

* fix(utils.py): support 'provider_specific_field' in delta chunk as well

allows deepseek reasoning content chunk to be returned to user from stream as well

Fixes https://github.com/BerriAI/litellm/issues/7877#issuecomment-2603813218

* fix(watsonx/chat/handler.py): fix passing space id to watsonx on chat route

* fix(watsonx/): fix watsonx_text/ route with space id

* fix(watsonx/): qa item - also adds better unit testing for watsonx embedding calls

* fix(utils.py): rename to '..fields'

* fix: fix linting errors

* fix(utils.py): fix typing - don't show provider-specific field if none or empty - prevents default respons
e from being non-oai compatible

* fix: cleanup unused imports

* docs(deepseek.md): add docs for deepseek reasoning model
2025-01-21 23:13:15 -08:00
Ishaan Jaff
4caf4c0277
(Feat - prometheus) - emit litellm_overhead_latency_metric (#7913)
* add track_llm_api_timing

* add track_llm_api_timing

* test_litellm_overhead

* use ResponseMetadata class for setting hidden params and response overhead

* instrument http handler

* fix track_llm_api_timing

* track_llm_api_timing

* emit response overhead on hidden params

* fix resp metadata

* fix make_sync_openai_embedding_request

* test_aaaaatext_completion_endpoint fixes

* _get_value_from_hidden_params

* set_hidden_params

* test_litellm_overhead

* test_litellm_overhead

* test_litellm_overhead

* fix import

* test_litellm_overhead_stream

* add LiteLLMLoggingObject

* use diff folder for testing

* use diff folder for overhead testing

* test litellm overhead

* use typing

* clear typing

* test_litellm_overhead

* fix async_streaming

* update_response_metadata

* move test file

* emit litellm_overhead_latency_metric on prometheus

* add prometheus callback

* litellm_overhead_latency_metric_bucket

* fix apply hidden params

* fix StandardLoggingHiddenParams
2025-01-21 20:36:30 -08:00
Ishaan Jaff
b6f2e659b9
(Feat) Add x-litellm-overhead-duration-ms and "x-litellm-response-duration-ms" in response from LiteLLM (#7899)
* add track_llm_api_timing

* add track_llm_api_timing

* test_litellm_overhead

* use ResponseMetadata class for setting hidden params and response overhead

* instrument http handler

* fix track_llm_api_timing

* track_llm_api_timing

* emit response overhead on hidden params

* fix resp metadata

* fix make_sync_openai_embedding_request

* test_aaaaatext_completion_endpoint fixes

* _get_value_from_hidden_params

* set_hidden_params

* test_litellm_overhead

* test_litellm_overhead

* test_litellm_overhead

* fix import

* test_litellm_overhead_stream

* add LiteLLMLoggingObject

* use diff folder for testing

* use diff folder for overhead testing

* test litellm overhead

* use typing

* clear typing

* test_litellm_overhead

* fix async_streaming

* update_response_metadata

* move test file

* pply metadata to the response objec
2025-01-21 20:27:55 -08:00
Krish Dholakia
4b23420a20
Litellm dev 01 20 2025 p1 (#7884)
* fix(initial-test-to-return-api-timeout-value-in-openai-timeout-exception): Makes it easier for user to debug why request timed out

* feat(openai.py): return timeout value + time taken on openai timeout errors

helps debug timeout errors

* fix(utils.py): fix num retries extraction logic when num_retries = 0

* fix(config_settings.md): litellm_logging.py

support printing payload to console if 'LITELLM_PRINT_STANDARD_LOGGING_PAYLOAD' is true

 Enables easier debug

* test(test_auth_checks.py'): remove common checks userapikeyauth enforcement check

* fix(litellm_logging.py): fix linting error
2025-01-20 21:45:48 -08:00
Krish Dholakia
1bea338597
LiteLLM Minor Fixes & Improvements (2024/16/01) (#7826)
* fix(lm_studio/chat/transformation.py): Fix https://github.com/BerriAI/litellm/issues/7811

* fix(router.py): fix mock timeout check

* fix: drop model name from fallback args since it causes a conflict with the model=model that is provided later on. (#7806)

This error happens if you provide multiple fallback models to the completion function with model name defined in each one.

* fix(router.py): remove mock_timeout before sending to request

prevents reuse in fallbacks

* test: update test

* test: revert test change - wrong pr

---------

Co-authored-by: Dudu Lasry <david1542@users.noreply.github.com>
2025-01-17 20:59:21 -08:00
Ishaan Jaff
b30e05b54f Revert "test_completion_mistral_api_mistral_large_function_call"
This reverts commit ef9177f0a8.
2025-01-17 07:20:46 -08:00
Ishaan Jaff
ef9177f0a8 test_completion_mistral_api_mistral_large_function_call 2025-01-16 21:50:56 -08:00
Krish Dholakia
80d6bbec29
Litellm dev 01 14 2025 p2 (#7772)
* feat(pass_through_endpoints.py): fix anthropic end user cost tracking

* fix(anthropic/chat/transformation.py): use returned provider model for anthropic

handles anthropic `-latest` tag in request body throwing cost calculation errors

ensures we can be accurate in our model cost tracking

* feat(model_prices_and_context_window.json): add gemini-2.0-flash-thinking-exp pricing

* test: update test to use assumption that user_api_key_dict can get anthropic user id

* test: fix test

* fix: fix test

* fix(anthropic_pass_through.py): uncomment previous anthropic end-user cost tracking code block

can't guarantee user api key dict always has end user id - too many code paths

* fix(user_api_key_auth.py): this allows end user id from request body to always be read and set in auth object

* fix(auth_check.py): fix linting error

* test: fix auth check

* fix(auth_utils.py): fix get end user id to handle metadata = None
2025-01-15 21:34:50 -08:00
Krish Dholakia
35919d9fec
Litellm dev 01 13 2025 p2 (#7758)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 12s
* fix(factory.py): fix bedrock document url check

Make check more generic - if starts with 'text' or 'application' assume it's a document and let it go through

 Fixes https://github.com/BerriAI/litellm/issues/7746

* feat(key_management_endpoints.py): support writing new key alias to aws secret manager - on key rotation

adds rotation endpoint to aws key management hook - allows for rotated litellm virtual keys with new key alias to be written to it

* feat(key_management_event_hooks.py): support rotating keys and updating secret manager

* refactor(base_secret_manager.py): support rotate secret at the base level

since it's just an abstraction function, it's easy to implement at the base manager level

* style: cleanup unused imports
2025-01-14 17:04:01 -08:00
Ishaan Jaff
d88f01d518
(litellm SDK perf improvements) - handle cases when unable to lookup model in model cost map (#7750)
* use lru cache wrapper

* use lru_cache_wrapper for _cached_get_model_info_helper

* fix _get_traceback_str_for_error

* huggingface/mistralai/Mistral-7B-Instruct-v0.3
2025-01-13 19:58:46 -08:00
Ishaan Jaff
f1335362cf
(core sdk fix) - fix fallbacks stuck in infinite loop (#7751)
* test_acompletion_fallbacks_basic

* use common run_async_function

* fix completion_with_fallbacks

* fix completion with fallbacks

* fix fallback utils

* test_acompletion_fallbacks_basic

* test_completion_fallbacks_sync

* huggingface/mistralai/Mistral-7B-Instruct-v0.3
2025-01-13 19:34:34 -08:00
Krish Dholakia
ad2f66b3e3
[BETA] Add OpenAI /images/variations + Topaz API support (#7700)
* feat(main.py): initial commit for `/image/variations` endpoint support

* refactor(base_llm/): introduce new base llm base config for image variation endpoints

* refactor(openai/image_variations/transformation.py): implement openai image variation transformation handler

* fix: test

* feat(openai/): working openai `/image/variation` endpoint calls via sdk

* feat(topaz/): topaz sync image variation call support

Addresses https://github.com/BerriAI/litellm/issues/7593

'

* fix(topaz/transformation.py): fix linting errors

* fix(openai/image_variations/handler.py): fix passing json data

* fix(main.py): image_variation/

support async image variation route - `aimage_variation`

* fix(test_get_model_info.py): fix test

* fix: cleanup unused imports

* feat(openai/): add async `/image/variations` endpoint support

* feat(topaz/): support async `/image/variations` calls

* fix: test

* fix(utils.py): fix get_model_info_helper for no model info w/ provider config

handles situation where model info is not known but provider config exists

* test(test_router_fallbacks.py): mark flaky test

* fix: fix unused imports

* test: bump otel load test perf threshold - accounts for current load tests hitting same server
2025-01-11 23:27:46 -08:00
Ishaan Jaff
a7c803edc5
(perf) - only use response_cost_calculator 1 time per request. (Don't re-use the same helper twice per call ) (#7709)
* fix get_llm_provider for aiohttp openai

* fix _success_handler_helper_fn
2025-01-11 23:23:01 -08:00
Ishaan Jaff
75fb37217a
(sdk perf fix) - only print args passed to litellm when debugging mode is on (#7708)
* use _is_debugging_on

* fix unused imports
2025-01-11 22:56:20 -08:00
Ishaan Jaff
1a6c4905f1 fix get_llm_provider for aiohttp openai 2025-01-11 21:45:06 -08:00
Krish Dholakia
becd4bc748
Litellm dev 01 11 2025 p3 (#7702)
* fix(__init__.py): fix init to exclude pricing-only model cost values from real model names

prevents bad health checks on wildcard routes

* fix(get_llm_provider.py): fix to handle calling bedrock_converse models
2025-01-11 20:06:54 -08:00
Krish Dholakia
27892acdfc
Litellm dev 01 10 2025 p3 (#7682)
* feat(langfuse.py): log the used prompt when prompt management used

* test: fix test

* docs(self_serve.md): add doc on restricting personal key creation on ui

* feat(s3.py): support s3 logging with team alias prefixes (if available)

New preview feature

* fix(main.py): remove old if block - simplify to just await if coroutine returned

fixes lm_studio async embedding error

* fix(langfuse.py): handle get prompt check
2025-01-10 21:56:42 -08:00
Ishaan Jaff
5c870c0c51
(performance improvement - litellm sdk + proxy) - ensure litellm does not create unnecessary threads when running async functions (#7680)
* fix handle_sync_success_callbacks_for_async_calls

* fix handle_sync_success_callbacks_for_async_calls

* fix linting / testing errors

* use handle_sync_success_callbacks_for_async_calls

* add unit testing for logging fixes
2025-01-10 17:57:22 -08:00
Krish Dholakia
a3e65c9bcb
LiteLLM Minor Fixes & Improvements (01/10/2025) - p1 (#7670)
* test(test_get_model_info.py): add unit test confirming router deployment updates global 'get_model_info'

* fix(get_supported_openai_params.py): fix custom llm provider 'get_supported_openai_params'

Fixes https://github.com/BerriAI/litellm/issues/7668

* docs(azure.md): clarify how azure ad token refresh on proxy works

Closes https://github.com/BerriAI/litellm/issues/7665
2025-01-10 17:49:05 -08:00
Ishaan Jaff
9174a6f349
(litellm sdk - perf improvement) - optimize pre_call_check (#7673)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 12s
* latency fix - litellm sdk

* fix linting error

* fix litellm logging
2025-01-10 14:16:39 -08:00
Ishaan Jaff
c999b4efe1
(litellm sdk - perf improvement) - use O(1) set lookups for checking llm providers / models (#7672)
* fix get model info logic to use O(1) lookups

* perf - use O(1) lookup for get llm provider
2025-01-10 14:16:30 -08:00
Ishaan Jaff
b3bd15e35a
speed up use_custom_pricing_for_model (#7674) 2025-01-10 14:00:48 -08:00
Krish Dholakia
c10ae8879e
fix(vertex_ai/gemini/transformation.py): handle 'http://' in gemini p… (#7660)
* fix(vertex_ai/gemini/transformation.py): handle 'http://' in gemini process url

* refactor(router.py): refactor '_prompt_management_factory' to use logging obj get_chat_completion logic

deduplicates code

* fix(litellm_logging.py): update 'get_chat_completion_prompt' to update logging object messages

* docs(prompt_management.md): update prompt management to be in beta

given feedback - this still needs to be revised (e.g. passing in user message, not ignoring)

* refactor(prompt_management_base.py): introduce base class for prompt management

allows consistent behaviour across prompt management integrations

* feat(prompt_management_base.py): support adding client message to template message + refactor langfuse prompt management to use prompt management base

* fix(litellm_logging.py): log prompt id + prompt variables to langfuse if set

allows tracking what prompt was used for what purpose

* feat(litellm_logging.py): log prompt management metadata in standard logging payload + use in langfuse

allows logging prompt id / prompt variables to langfuse

* test: fix test

* fix(router.py): cleanup unused imports

* fix: fix linting error

* fix: fix trace param typing

* fix: fix linting errors

* fix: fix code qa check
2025-01-10 07:31:59 -08:00
Krish Dholakia
865e6d5bda
fix(main.py): fix lm_studio/ embedding routing (#7658)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 36s
* fix(main.py): fix lm_studio/ embedding routing

adds the mapping + updates docs with example

* docs(self_serve.md): update doc to show how to auto-add sso users to teams

* fix(streaming_handler.py): simplify async iterator check, to just check if streaming response is an async iterable
2025-01-09 23:03:24 -08:00
Krish Dholakia
1e3370f3cb
LiteLLM Minor Fixes & Improvements (01/08/2025) - p2 (#7643)
* fix(streaming_chunk_builder_utils.py): add test for groq tool calling + streaming + combine chunks

Addresses https://github.com/BerriAI/litellm/issues/7621

* fix(streaming_utils.py): fix modelresponseiterator for openai like chunk parser

ensures chunk parser uses the correct tool call id when translating the chunk

 Fixes https://github.com/BerriAI/litellm/issues/7621

* build(model_hub.tsx): display cost pricing on model hub

* build(model_hub.tsx): show cost per token pricing + complete model information

* fix(types/utils.py): fix usage object handling
2025-01-08 19:45:19 -08:00