Commit graph

645 commits

Author SHA1 Message Date
Ishaan Jaff
46a150fc1b use helper for image gen tests (#7343) 2024-12-20 21:28:32 -08:00
Ishaan Jaff
451215b106 (fix) LiteLLM Proxy: fix GET /files/{file_id:path}/content endpoint (#7342)
* fix order of get_file_content

* update e2e files tests

* add e2e batches endpoint testing

* update config.yml

* write content to file

* use correct oai_misc_config

* fixes for openai batches endpoint testing

* remove extra out file

* fix input.jsonl
2024-12-20 21:27:45 -08:00
Krish Dholakia
4f3ddebf81 Litellm dev 2024 12 20 p1 (#7335)
* fix(utils.py): e2e azure tts cost tracking working

moves tts response obj to include hidden params (allows for litellm call id, etc. to be sent in response headers); fixes spend_tracking_utils logging payload to account for non-base model use-case

Fixes https://github.com/BerriAI/litellm/issues/7223

* fix: fix linting errors

* build(model_prices_and_context_window.json): add bedrock llama 3.3

Closes https://github.com/BerriAI/litellm/issues/7329

* fix(openai.py): fix return type for sync openai httpx response

* test: update test

* fix(spend_tracking_utils.py): fix if check

* fix(spend_tracking_utils.py): fix if check

* test: improve debugging for test

* fix: fix import
2024-12-20 21:22:31 -08:00
Krish Dholakia
61b4c41c3c Litellm dev 12 20 2024 p3 (#7339)
* fix(proxy_track_cost_callback.py): log to db if only end user param given

* fix: allows for jwt-auth based end user id spend tracking to work

* fix(utils.py): fix 'get_end_user_id_for_cost_tracking' to use 'user_api_key_end_user_id'

more stable - works with jwt-auth based end user tracking as well

* test(test_jwt.py): add e2e unit test to confirm end user cost tracking works for spend logs

* test: update test to use end_user api key hash param

* fix(langfuse.py): support end user cost tracking via jwt auth + langfuse

logs end user to langfuse if decoded from jwt token

* fix: fix linting errors

* test: fix test

* test: fix test

* fix: fix end user id extraction

* fix: run test earlier
2024-12-20 21:13:32 -08:00
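A minimal sketch of the end-user extraction this entry describes, assuming the proxy's auth layer stores the decoded end user id under the `user_api_key_end_user_id` metadata key named in the commit (signature and key placement are assumptions):

```python
from typing import Optional

def get_end_user_id_for_cost_tracking(litellm_params: dict) -> Optional[str]:
    """Prefer the auth-derived end user id (works for JWT auth too)."""
    metadata = litellm_params.get("metadata") or {}
    end_user_id = metadata.get("user_api_key_end_user_id")  # set by proxy auth
    return str(end_user_id) if end_user_id else None
```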
Ishaan Jaff
1b2ed0c344 [Bug fix]: Triton /infer handler incompatible with batch responses (#7337)
* migrate triton to base llm http handler

* clean up triton handler.py

* use transform functions for triton

* add TritonConfig

* get openai params for triton

* use triton embedding config

* test_completion_triton_generate_api

* test_completion_triton_infer_api

* fix TritonConfig doc string

* use TritonResponseIterator

* fix triton embeddings

* docs triton chat usage
2024-12-20 20:59:40 -08:00
Krish Dholakia
e6bdec4eed Control fallback prompts client-side (#7334)
* feat(router.py): support passing model-specific messages in fallbacks

* docs(routing.md): separate router timeouts into separate doc

allow for 1 fallbacks doc (across proxy/router)

* docs(routing.md): cleanup router docs

* docs(reliability.md): cleanup docs

* docs(reliability.md): cleaned up fallback doc

just have 1 doc across sdk/proxy

simplifies docs

* docs(reliability.md): add setting model-specific fallback prompts

* fix: fix linting errors

* test: skip test causing openai rate limit errors

* test: fix test

* test: run vertex test first to catch error
2024-12-20 19:09:53 -08:00
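A hypothetical usage sketch of the client-side, model-specific fallback prompts added above; the exact shape of a `fallbacks` entry is an assumption based on the commit message:

```python
import litellm

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "long gpt-4o-tuned prompt ..."}],
    fallbacks=[
        {
            # the fallback deployment gets its own messages
            # instead of inheriting the original prompt
            "model": "claude-3-haiku-20240307",
            "messages": [{"role": "user", "content": "shorter fallback prompt"}],
        }
    ],
)
```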
Krish Dholakia
b026230b0a Litellm dev 2024 12 19 p3 (#7322)
* fix(utils.py): remove unsupported optional params (if drop_params=True) before passing into map openai params

Fixes https://github.com/BerriAI/litellm/issues/7242

* test: new test for langfuse prompt management hook

Addresses https://github.com/BerriAI/litellm/issues/3893#issuecomment-2549080296

* feat(main.py): add 'get_chat_completion_prompt' customlogger hook

allows for langfuse prompt management

Addresses https://github.com/BerriAI/litellm/issues/3893#issuecomment-2549080296

* feat(langfuse_prompt_management.py): working e2e langfuse prompt management

works with `langfuse/` route

* feat(main.py): initial tracing for dynamic langfuse params

allows admin to specify langfuse keys by model in model_list

* feat(main.py): support passing langfuse credentials dynamically

* fix(langfuse_prompt_management.py): create langfuse client based on dynamic callback params

allows dynamic langfuse params to work

* fix: fix linting errors

* docs(prompt_management.md): refactor docs for sdk + proxy prompt management tutorial

* docs(prompt_management.md): cleanup doc

* docs: cleanup topnav

* docs(prompt_management.md): update docs to be easier to use

* fix: remove unused imports

* docs(prompt_management.md): add architectural overview doc

* fix(litellm_logging.py): fix dynamic param passing

* fix(langfuse_prompt_management.py): fix linting errors

* fix: fix linting errors

* fix: use typing_extensions for typealias to ensure python3.8 compatibility

* test: use stream_options in test to account for tiktoken diff

* fix: improve import error message, and run test earlier
2024-12-20 13:30:16 -08:00
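The `get_chat_completion_prompt` hook named above can be pictured as a `CustomLogger` method that rewrites the prompt before the call is made; everything here except the hook name is an assumption about the exact signature:

```python
from litellm.integrations.custom_logger import CustomLogger

class MyPromptManager(CustomLogger):
    def get_chat_completion_prompt(
        self, model, messages, non_default_params,
        prompt_id, prompt_variables, dynamic_callback_params,
    ):
        # e.g. fetch a managed prompt (Langfuse) and substitute variables
        if prompt_id:
            template = f"[managed prompt {prompt_id}] {{input}}"
            user_input = (prompt_variables or {}).get("input", "")
            messages = [{"role": "user", "content": template.format(input=user_input)}]
        return model, messages, non_default_params
```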
Krish Dholakia
e8d2cb4935 Litellm dev 12 19 2024 p2 (#7315)
* fix(proxy_server.py): only update k,v pair if v is not empty/null

Fixes https://github.com/BerriAI/litellm/issues/6787

* test(test_router.py): cleanup duplicate calls

* test: add new test stream options drop params test

* test: update optional params / stream options test to test for vertex ai mistral route specifically

Addresses https://github.com/BerriAI/litellm/issues/7309

* fix(proxy_server.py): fix linting errors

* fix: fix linting errors
2024-12-19 20:28:16 -08:00
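The "only update k,v pair if v is not empty/null" guard boils down to a filtered merge; a minimal sketch:

```python
def safe_update(existing: dict, updates: dict) -> dict:
    """Merge updates, skipping empty/null values so they can't wipe existing fields."""
    for k, v in updates.items():
        if v is not None and v != "":
            existing[k] = v
    return existing

# an empty team alias no longer clobbers the stored value
assert safe_update({"team_alias": "t1"}, {"team_alias": "", "budget": 5}) == {
    "team_alias": "t1",
    "budget": 5,
}
```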
Ishaan Jaff
6641e75e0c (feat) add infinity rerank models (#7321)
* Support Infinity Reranker (custom reranking models) (#7247)

* Support Infinity Reranker

* Clean code

* Included transformation.py

* Clean code

* Added Infinity reranker test

* Clean code

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>

* transform_rerank_response

* update handler.py

* infinity rerank updates

* ci/cd run again

* add infinity unit tests

* docs add instruction on how to add a new provider for rerank

---------

Co-authored-by: Hao Shan <53949959+haoshan98@users.noreply.github.com>
2024-12-19 18:30:28 -08:00
Ishaan Jaff
b0738fd439 (code refactor) - Add BaseRerankConfig. Use BaseRerankConfig for cohere/rerank and azure_ai/rerank (#7319)
* add base rerank config

* working sync cohere rerank

* update rerank types

* update base rerank config

* remove old rerank

* add new cohere handler.py

* add cohere rerank transform

* add get_provider_rerank_config

* add rerank to base llm http handler

* add rerank utils

* add arerank to llm http handler.py

* add AzureAIRerankConfig

* updates rerank config

* update test rerank

* fix unused imports

* update get_provider_rerank_config

* test_basic_rerank_caching

* fix unused import

* test rerank
2024-12-19 17:03:34 -08:00
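The refactor's shape, sketched: provider-specific configs supply the URL and request/response transforms, and one shared HTTP handler drives the call. Method names besides `BaseRerankConfig` are illustrative assumptions:

```python
from abc import ABC, abstractmethod

class BaseRerankConfig(ABC):
    @abstractmethod
    def get_complete_url(self, api_base: str) -> str: ...

    @abstractmethod
    def transform_rerank_request(self, model: str, query: str, documents: list) -> dict: ...

    @abstractmethod
    def transform_rerank_response(self, raw_response: dict) -> dict: ...

class CohereRerankConfig(BaseRerankConfig):
    def get_complete_url(self, api_base: str) -> str:
        return f"{api_base}/v1/rerank"

    def transform_rerank_request(self, model, query, documents):
        return {"model": model, "query": query, "documents": documents}

    def transform_rerank_response(self, raw_response):
        return raw_response  # cohere's response is already the common shape
```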
Ishaan Jaff
51f3fc65f7 [Bug Fix]: ImportError: cannot import name 'T' from 're' (#7314)
* fix unused imports

* add test for python 3.12

* re introduce error - as a test

* update config for ci/cd

* fix python 3.13 install

* bump pyyaml

* bump numpy

* fix embedding requests

* bump pillow dep

* bump version

* bump pydantic

* bump tiktoken

* fix import

* fix python 3.13 import

* fix unused imports in tests/*
2024-12-19 13:09:30 -08:00
Ishaan Jaff
62a1cdec47 (code quality) run ruff rule to ban unused imports (#7313)
* remove unused imports

* fix AmazonConverseConfig

* fix test

* fix import

* ruff check fixes

* test fixes

* fix testing

* fix imports
2024-12-19 12:33:42 -08:00
Krish Dholakia
3183b7e6a1 o1 - add image param handling (#7312)
* fix(openai.py): fix returning o1 non-streaming requests

fixes issue where fake stream was always true for o1

* build(model_prices_and_context_window.json): add 'supports_vision' for o1 models

* fix: add internal server error exception mapping

* fix(base_llm_unit_tests.py): drop temperature from test

* test: mark prompt caching as a flaky test
2024-12-19 11:22:25 -08:00
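The "fake stream" referenced above: make one non-streaming call, then hand the result back through a generator so callers keep a streaming interface. A simplified sketch (chunk shape abbreviated):

```python
from typing import Iterator

def fake_stream(full_text: str) -> Iterator[dict]:
    """Yield a completed response as if it had streamed."""
    yield {"choices": [{"delta": {"content": full_text}}]}
    yield {"choices": [{"delta": {}, "finish_reason": "stop"}]}

for chunk in fake_stream("o1 answers in one shot"):
    print(chunk)
```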
Ishaan Jaff
087ab41d66 (proxy admin ui) - show Teams sorted by Team Alias (#7296)
* ui - sort teams by team alias

* test assert /user/info returns teams in a sorted order

* fix team_alias check on team
2024-12-18 19:43:19 -08:00
Ishaan Jaff
6220e17ebf (feat proxy) v2 - model max budgets (#7302)
* clean up unused code

* add _PROXY_VirtualKeyModelMaxBudgetLimiter

* adjust type imports

* working _PROXY_VirtualKeyModelMaxBudgetLimiter

* fix user_api_key_model_max_budget

* fix user_api_key_model_max_budget

* update naming

* update naming

* fix changes to RouterBudgetLimiting

* test_call_with_key_over_model_budget

* test_call_with_key_over_model_budget

* handle _get_request_model_budget_config

* e2e test for test_call_with_key_over_model_budget

* clean up test

* run ci/cd again

* add validate_model_max_budget

* docs fix

* update doc

* add e2e testing for _PROXY_VirtualKeyModelMaxBudgetLimiter

* test_unit_test_max_model_budget_limiter.py
2024-12-18 19:42:46 -08:00
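A hypothetical `/key/generate` call showing what v2 per-model budgets on a virtual key might look like; the nested field names follow the commit's naming, but the exact schema is an assumption:

```python
import requests

requests.post(
    "http://localhost:4000/key/generate",
    headers={"Authorization": "Bearer sk-master-key"},  # placeholder master key
    json={
        "model_max_budget": {
            "gpt-4o": {"budget_limit": 0.10, "time_period": "1d"},  # $0.10/day
            "claude-3-5-sonnet": {"budget_limit": 1.00, "time_period": "30d"},
        }
    },
)
```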
Krish Dholakia
1a4910f6c0 fix(health.md): add rerank model health check information (#7295)
* fix(health.md): add rerank model health check information

* build(model_prices_and_context_window.json): add gemini 2.0 for google ai studio - pricing + commercial rate limits

* build(model_prices_and_context_window.json): add gemini-2.0 supports audio output = true

* docs(team_model_add.md): clarify allowing teams to add models is an enterprise feature

* fix(o1_transformation.py): add support for 'n', 'response_format' and 'stop' params for o1 and 'stream_options' param for o1-mini

* build(model_prices_and_context_window.json): add 'supports_system_message' to the openai models that support it

needed as o1-preview and o1-mini models don't support 'system message'

* fix(o1_transformation.py): translate system message based on if o1 model supports it

* fix(o1_transformation.py): return 'stream' param support if o1-mini/o1-preview

o1 currently doesn't support streaming, but the other model versions do

Fixes https://github.com/BerriAI/litellm/issues/7292

* fix(o1_transformation.py): return tool calling/response_format in supported params if model map says so

Fixes https://github.com/BerriAI/litellm/issues/7292

* fix: fix linting errors

* fix: update '_transform_messages'

* fix(o1_transformation.py): fix provider passed for supported param checks

* test(base_llm_unit_tests.py): skip test if api takes >5s to respond

* fix(utils.py): return false in 'supports_factory' if can't find value

* fix(o1_transformation.py): always return stream + stream_options as supported params + handle stream options being passed in for azure o1

* feat(openai.py): support stream faking natively in openai handler

Allows o1 calls to be faked for just the "o1" model, while allowing native streaming for o1-mini and o1-preview

Fixes https://github.com/BerriAI/litellm/issues/7292

* fix(openai.py): use inference param instead of original optional param
2024-12-18 19:18:10 -08:00
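The system-message translation described above, as a minimal sketch: when the model map says system messages are unsupported, downgrade them instead of erroring:

```python
def translate_system_messages(messages: list, supports_system_message: bool) -> list:
    """Downgrade system messages to user messages for models that reject them."""
    if supports_system_message:
        return messages
    return [
        {**m, "role": "user"} if m.get("role") == "system" else m
        for m in messages
    ]
```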
Krish Dholakia
e95820367f fix(hosted_vllm/transformation.py): return fake api key, if none give… (#7301)
* fix(hosted_vllm/transformation.py): return fake api key, if none given. Prevents httpx error

Fixes https://github.com/BerriAI/litellm/issues/7291

* test: fix test

* fix(main.py): add hosted_vllm/ support for embeddings endpoint

Closes https://github.com/BerriAI/litellm/issues/7290

* docs(vllm.md): add docs on vllm embeddings usage

* fix(__init__.py): fix sambanova model test

* fix(base_llm_unit_tests.py): skip pydantic obj test if model takes >5s to respond
2024-12-18 18:41:53 -08:00
Ishaan Jaff
70883bc1b8 (feat - proxy) Add status_code to litellm_proxy_total_requests_metric_total (#7293)
* fix _select_model_name_for_cost_calc docstring

* add STATUS_CODE to prometheus

* test prometheus unit tests

* test_prometheus_unit_tests.py

* update Proxy Level Tracking Metrics docs

* fix test_proxy_failure_metrics

* fix test_proxy_failure_metrics
2024-12-18 15:55:02 -08:00
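Adding a `status_code` label to a counter with `prometheus_client` looks roughly like this (the client appends `_total` to counter names, matching the metric name in the title):

```python
from prometheus_client import Counter

litellm_proxy_total_requests_metric = Counter(
    "litellm_proxy_total_requests_metric",
    "Total requests made to the proxy",
    labelnames=["status_code"],
)

litellm_proxy_total_requests_metric.labels(status_code="200").inc()
litellm_proxy_total_requests_metric.labels(status_code="429").inc()
```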
Krish Dholakia
050499ec8f Litellm dev readd prompt caching (#7299)
* fix(router.py): re-add saving model id on prompt caching valid successful deployment

* fix(router.py): introduce optional pre_call_checks

isolate prompt caching logic in a separate file

* fix(prompt_caching_deployment_check.py): fix import

* fix(router.py): new 'async_filter_deployments' event hook

allows custom logger to filter deployments returned to routing strategy

* feat(prompt_caching_deployment_check.py): initial working commit of prompt caching based routing

* fix(cooldown_callbacks.py): fix linting error

* fix(budget_limiter.py): move budget logger to async_filter_deployment hook

* test: add unit test

* test(test_router_helper_utils.py): add unit testing

* fix(budget_limiter.py): fix linting errors

* docs(config_settings.md): add 'optional_pre_call_checks' to router_settings param docs
2024-12-18 15:13:49 -08:00
Krish Dholakia
e7918f097b fix(proxy_server.py): pass model access groups to get_key/get_team mo… (#7281)
* fix(proxy_server.py): pass model access groups to get_key/get_team models

allows end user to see actual models they have access to, instead of default models

* fix(auth_checks.py): fix linting errors

* fix: fix linting errors
2024-12-18 09:33:33 -08:00
Ishaan Jaff
c7b288ce30 (fix) unable to pass input_type parameter to Voyage AI embedding model (#7276)
* VoyageEmbeddingConfig

* fix voyage logic to get params

* add voyage embedding transformation

* add get_provider_embedding_config

* use BaseEmbeddingConfig

* voyage clean up

* use llm http handler for embedding transformations

* test_voyage_ai_embedding_extra_params

* add voyage async

* test_voyage_ai_embedding_extra_params

* add async for llm http handler

* update BaseLLMEmbeddingTest

* test_voyage_ai_embedding_extra_params

* fix linting

* fix get_provider_embedding_config

* fix anthropic text test

* update location of base/chat/transformation

* fix import path

* fix IBMWatsonXAIConfig
2024-12-17 19:23:49 -08:00
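Usage sketch for the fix: pass `input_type` through `litellm.embedding` as a provider-specific param (model name is illustrative):

```python
import litellm

resp = litellm.embedding(
    model="voyage/voyage-3",
    input=["hello world"],
    input_type="query",  # forwarded to the Voyage AI API
)
```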
Krish Dholakia
f966e279a6 LiteLLM Minor Fixes & Improvements (12/16/2024) - p1 (#7263)
* fix(factory.py): skip empty text blocks for bedrock user messages

Fixes https://github.com/BerriAI/litellm/issues/7169

* Add support for Gemini 2.0 GoogleSearch tool (#7257)

* Add support for google_search tool in gemini 2.0

* Add/modify tests

* Fix grounding check

* Remove 2.0 grounding test; exclude experimental model in VERTEX_MODELS_TO_NOT_TEST

* Swap order of tools

* Fix formatting

* fix(get_api_base.py): return api base in streaming response

Fixes https://github.com/BerriAI/litellm/issues/7249

Closes https://github.com/BerriAI/litellm/pull/7250

* fix(cost_calculator.py): only set base model to model if not none

Fixes https://github.com/BerriAI/litellm/issues/7223

* fix(cost_calculator.py): enforce stricter order when picking model for cost calculation

* fix(cost_calculator.py): fix '_select_model_name_for_cost_calc' to return model name with region name prefix if provided

* fix(utils.py): fix 'get_model_info()' to handle edge case where model name starts with custom llm provider AND custom llm provider is given

* fix(cost_calculator.py): handle `custom_llm_provider-` scenario

* fix(cost_calculator.py): e2e working tts cost tracking

ensures initial message is passed in, to cost calculator

* fix(factory.py): suppress linting errors

* fix(cost_calculator.py): strip llm provider from model name after selecting cost calc model

* fix(litellm_logging.py): store initial request in 'input' field + accept base_model to be passed in litellm_params directly

* test: handle none env var value in flaky test

* fix(litellm_logging.py): fix linting errors

---------

Co-authored-by: Sam B <samlingx@gmail.com>
2024-12-17 15:33:36 -08:00
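Usage sketch for the Gemini 2.0 GoogleSearch tool merged above (model id is illustrative):

```python
import litellm

resp = litellm.completion(
    model="gemini/gemini-2.0-flash-exp",
    messages=[{"role": "user", "content": "What changed in the news today?"}],
    tools=[{"googleSearch": {}}],  # grounding via Google Search
)
```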
Krish Dholakia
57809cfbf4 Litellm dev 12 17 2024 p2 (#7277)
* fix(openai/transcription/handler.py): call 'log_pre_api_call' on async calls

* fix(openai/transcriptions/handler.py): call 'logging.pre_call' on sync whisper calls as well

* fix(proxy_cli.py): remove default proxy_cli timeout param

gets passed in as a dynamic request timeout and overrides config values

* fix(langfuse.py): pass litellm httpx client - contains ssl certs (#7052)

Fixes https://github.com/BerriAI/litellm/issues/7046
2024-12-17 14:05:14 -08:00
Krish Dholakia
03e711e3e4 LITELLM: Remove requests library usage (#7235)
* fix(generic_api_callback.py): remove requests lib usage

* fix(budget_manager.py): remove requests lib usage

* fix(main.py): cleanup requests lib usage

* fix(utils.py): remove requests lib usage

* fix(argilla.py): fix argilla test

* fix(athina.py): replace 'requests' lib usage with litellm module

* fix(greenscale.py): replace 'requests' lib usage with httpx

* fix: remove unused 'requests' lib import + replace usage in some places

* fix(prompt_layer.py): remove 'requests' lib usage from prompt layer

* fix(ollama_chat.py): remove 'requests' lib usage

* fix(baseten.py): replace 'requests' lib usage

* fix(codestral/): replace 'requests' lib usage

* fix(predibase/): replace 'requests' lib usage

* refactor: cleanup unused 'requests' lib imports

* fix(oobabooga.py): cleanup 'requests' lib usage

* fix(invoke_handler.py): remove unused 'requests' lib usage

* refactor: cleanup unused 'requests' lib import

* fix: fix linting errors

* refactor(ollama/): move ollama to use the base llm http handler

removes 'requests' lib dep for ollama integration

* fix(ollama_chat.py): fix linting errors

* fix(ollama/completion/transformation.py): convert non-jpeg/png image to jpeg/png before passing to ollama
2024-12-17 12:50:04 -08:00
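The migration pattern repeated throughout this PR, sketched with a placeholder URL: swap `requests` calls for `httpx` (the async variant uses `httpx.AsyncClient`):

```python
import httpx

with httpx.Client(timeout=600.0) as client:
    response = client.post(
        "https://example.com/api/log",  # placeholder endpoint
        json={"event": "llm_call"},
        headers={"Content-Type": "application/json"},
    )
    response.raise_for_status()
```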
Krish Dholakia
f628290ce7 fix(utils.py): fix openai-like api response format parsing (#7273)
* fix(utils.py): fix openai-like api response format parsing

Fixes issue passing structured output to litellm_proxy/ route

* fix(cost_calculator.py): fix whisper transcription cost calc to use file duration, not response time

* test: skip test if credentials not found
2024-12-17 12:49:09 -08:00
Krish Dholakia
56491b31d7 docs(input.md): document 'extra_headers' param support (#7268)
* docs(input.md): document 'extra_headers' param support

* fix #7239: move Nova topK parameter to `additionalModelRequestFields` (#7240)

Co-authored-by: Ryan Hoium <rhoium>

---------

Co-authored-by: ryanh-ai <3118399+ryanh-ai@users.noreply.github.com>
2024-12-17 07:19:14 -08:00
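Usage sketch for the documented `extra_headers` param (the header shown is just an example):

```python
import litellm

resp = litellm.completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "hi"}],
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
)
```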
Ishaan Jaff
427f2173d2 (feat) Add Bedrock knowledge base pass through endpoints (#7267)
* bugfix: Proxy Routing for Bedrock Knowledgebase URLs are incorrect (#7097)

* Fixing routing bug where bedrock knowledgebase urls were being generated incorrectly

* Preparing for PR

* Preparing for PR

* Preparing for PR

---------

Co-authored-by: Luke Birk <lb0737@att.com>

* fix _is_bedrock_agent_runtime_route

* docs - Query Knowledge Base

* test_is_bedrock_agent_runtime_route

* fix bedrock_proxy_route

---------

Co-authored-by: LBirk <2731718+LBirk@users.noreply.github.com>
Co-authored-by: Luke Birk <lb0737@att.com>
2024-12-16 22:19:34 -08:00
Ishaan Jaff
d891861c8e (feat) Add Azure Blob Storage Logging Integration (#7265)
* add path to http handler

* AzureBlobStorageLogger

* test_azure_blob_storage

* use constants for Azure storage

* use helper get_azure_ad_token_from_entrata_id

* azure blob storage support

* get_azure_ad_token_from_azure_storage

* fix import

* azure logging

* docs azure storage

* add docs on azure blobs

* add premium user check

* add azure_storage as a recognized logging callback

* async_upload_payload_to_azure_blob_storage

* docs azure storage

* callback_class_str_to_classType
2024-12-16 22:18:22 -08:00
Krish Dholakia
194acfa95c Litellm dev 12 14 2024 p1 (#7231)
* fix(router.py): fix reading + using deployment-specific num retries on router

Fixes https://github.com/BerriAI/litellm/issues/7001

* fix(router.py): ensure 'timeout' in litellm_params overrides any value in router settings

Refactors all routes to use a common '_update_kwargs_with_deployment', which handles timeouts

* fix(router.py): fix timeout check
2024-12-14 22:22:29 -08:00
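A sketch of the behavior this commit fixes: deployment-level `num_retries`/`timeout` in `litellm_params` should override router-level settings:

```python
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "gpt-4o",
            "litellm_params": {
                "model": "openai/gpt-4o",
                "num_retries": 3,  # deployment-specific; overrides router default
                "timeout": 10,     # seconds; overrides router settings
            },
        }
    ],
    num_retries=0,  # router default, superseded by the deployment value above
)
```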
Ishaan Jaff
2459f9735d (feat) Add Tag-based budgets on litellm router / proxy (#7236)
* add BudgetConfig

* add _get_tags_from_request_kwargs

* test_tag_budgets_e2e_test_expect_to_fail

* add a check for request tags

* fix _async_get_cache_keys_for_router_budget_limiting

* fix test

* fix _sync_in_memory_spend_with_redis

* _async_get_cache_keys_for_router_budget_limiting

* fix _init_tag_budgets

* fix type casting

* docs show error for tag budget limit hit

* fix _get_tags_from_request_kwargs

* fix undo change
2024-12-14 17:28:36 -08:00
Krish Dholakia
edbf5eeeb3 Litellm remove circular imports (#7232)
* fix(utils.py): initial commit to remove circular imports - moves llmproviders to utils.py

* fix(router.py): fix 'litellm.EmbeddingResponse' import from router.py

* refactor: fix litellm.ModelResponse import on pass through endpoints

* refactor(litellm_logging.py): fix circular import for custom callbacks literal

* fix(factory.py): fix circular imports inside prompt factory

* fix(cost_calculator.py): fix circular import for 'litellm.Usage'

* fix(proxy_server.py): fix potential circular import with `litellm.Router'

* fix(proxy/utils.py): fix potential circular import in `litellm.Router`

* fix: remove circular imports in 'auth_checks' and 'guardrails/'

* fix(prompt_injection_detection.py): fix router import

* fix(vertex_passthrough_logging_handler.py): fix potential circular imports in vertex pass through

* fix(anthropic_pass_through_logging_handler.py): fix potential circular imports

* fix(slack_alerting.py-+-ollama_chat.py): fix modelresponse import

* fix(base.py): fix potential circular import

* fix(handler.py): fix potential circular ref in codestral + cohere handler's

* fix(azure.py): fix potential circular imports

* fix(gpt_transformation.py): fix modelresponse import

* fix(litellm_logging.py): add logging base class - simplify typing

makes it easy for other files to type check the logging obj without introducing circular imports

* fix(azure_ai/embed): fix potential circular import on handler.py

* fix(databricks/): fix potential circular imports in databricks/

* fix(vertex_ai/): fix potential circular imports on vertex ai embeddings

* fix(vertex_ai/image_gen): fix import

* fix(watsonx-+-bedrock): cleanup imports

* refactor(anthropic-pass-through-+-petals): cleanup imports

* refactor(huggingface/): cleanup imports

* fix(ollama-+-clarifai): cleanup circular imports

* fix(openai_like/): fix import

* fix(openai_like/): fix embedding handler

cleanup imports

* refactor(openai.py): cleanup imports

* fix(sagemaker/transformation.py): fix import

* ci(config.yml): add circular import test to ci/cd
2024-12-14 16:28:34 -08:00
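A standard pattern for breaking cycles like the ones cleaned up here: import the offending type only under `typing.TYPE_CHECKING`, so type checkers see it but the runtime never executes the circular import:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # seen by mypy/pyright only; no runtime import, so no import cycle
    from litellm.router import Router

def warm_up(router: "Router") -> None:
    """Annotate with the string form; the class is never imported at runtime."""
    ...
```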
Ishaan Jaff
73dcbf8d4e (proxy) - Auth fix: ensure the safe request body is re-used when checking the model field (#7222)
* litellm fix auth check

* fix _read_request_body

* test_auth_with_form_data_and_model

* fix auth check

* fix _read_request_body

* fix _safe_get_request_headers
2024-12-14 12:01:25 -08:00
Krish Dholakia
d02b9a111a fix(main.py): fix retries being multiplied when using openai sdk (#7221)
* fix(main.py): fix retries being multiplied when using openai sdk

Closes https://github.com/BerriAI/litellm/pull/7130

* docs(prompt_management.md): add langfuse prompt management doc

* feat(team_endpoints.py): allow teams to add their own models

Enables teams to call their own finetuned models via the proxy

* test: add better enforcement check testing for `/model/new` now that teams can add their own models

* docs(team_model_add.md): tutorial for allowing teams to add their own models

* test: fix test
2024-12-14 11:56:55 -08:00
Ishaan Jaff
925d33aa9d Litellm add router to base llm testing (#7202)
* code qa add litellm router to base llm testing

* test_image_url

* fix img url

* fix add router to base llm class

* fix base llm testing

* add test scenario

* fix test_json_response_format

* fixes base llm testing

* fix base llm testing

* fix test image url
2024-12-13 19:16:28 -08:00
Ishaan Jaff
bc46916bb3 (feat - Router / Proxy ) Allow setting budget limits per LLM deployment (#7220)
* fix test_deployment_budget_limits_e2e_test

* refactor async_log_success_event to track spend for provider + deployment

* fix format

* rename class to RouterBudgetLimiting

* rename func

* rename types used for budgets

* add new types for deployment budgets

* add budget limits for deployments

* fix checking budgets set for provider

* update file names

* fix linting error

* _track_provider_remaining_budget_prometheus

* async_filter_deployments

* fix model list passed to router

* update error

* test_deployment_budgets_e2e_test_expect_to_fail

* fix test case

* run deployment budget limits
2024-12-13 19:15:51 -08:00
Krish Dholakia
c3f637012b Litellm dev 12 13 2024 p1 (#7219)
* fix(litellm_logging.py): pass user metadata to langsmith on sdk calls

* fix(litellm_logging.py): pass nested user metadata to logging integration - e.g. langsmith

* fix(exception_mapping_utils.py): catch and clarify watsonx `/text/chat` endpoint not supported error message.

Closes https://github.com/BerriAI/litellm/issues/7213

* fix(watsonx/common_utils.py): accept new 'WATSONX_IAM_URL' env var

allows user to use local watsonx

Fixes https://github.com/BerriAI/litellm/issues/4991

* fix(litellm_logging.py): cleanup unused function

* test: skip bad ibm test
2024-12-13 19:01:28 -08:00
Krish Dholakia
550677e63d Litellm dev 12 11 2024 v2 (#7215)
* feat(bedrock/): add bedrock converse top k param

Closes https://github.com/BerriAI/litellm/issues/7087

* Fix bedrock empty content error (#7177)

* add resolver

* handle empty content on bedrock with default content

* use existing default message, tests

* Update tests/llm_translation/test_bedrock_completion.py

* fix tests

* Revert "add resolver"

This reverts commit c717e376ee.

* fallback to empty

---------

Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>

* fix(factory.py): handle empty content blocks in messages

Fixes https://github.com/BerriAI/litellm/issues/7169

* feat(router.py): add stripped model check to model fallback search

if model_name="openai/gpt-3.5-turbo" and fallback=[{"gpt-3.5-turbo"..}], the fallback should just work as expected

* fix: fix linting error

* fix(factory.py): fix linting error

* fix(factory.py): in base case still support skip empty text blocks

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2024-12-13 12:49:57 -08:00
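Usage sketch for the Converse `top_k` support: `top_k` isn't a native Converse `inferenceConfig` field, so it is carried in `additionalModelRequestFields` (the exact nesting varies by model family and is an assumption here):

```python
import litellm

resp = litellm.completion(
    model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": "hi"}],
    top_k=10,  # mapped to additionalModelRequestFields under the hood
)
```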
Krish Dholakia
a42f008cd0 Litellm dev 12 12 2024 (#7203)
* fix(azure/): support passing headers to azure openai endpoints

Fixes https://github.com/BerriAI/litellm/issues/6217

* fix(utils.py): move default tokenizer to just openai

the hf tokenizer makes network calls when fetching the tokenizer - this slows down execution

* fix(router.py): fix pattern matching router - add generic "*" to it as well

Fixes issue where generic "*" model access group wouldn't show up

* fix(pattern_match_deployments.py): match to more specific pattern

match to more specific pattern

allows setting generic wildcard model access group and excluding specific models more easily

* fix(proxy_server.py): fix _delete_deployment to handle base case where db_model list is empty

don't delete all router models b/c of an empty list

Fixes https://github.com/BerriAI/litellm/issues/7196

* fix(anthropic/): fix handling response_format for anthropic messages with anthropic api

* fix(fireworks_ai/): support passing response_format + tool call in same message

Addresses https://github.com/BerriAI/litellm/issues/7135

* Revert "fix(fireworks_ai/): support passing response_format + tool call in same message"

This reverts commit 6a30dc6929.

* test: fix test

* fix(replicate/): fix replicate default retry/polling logic

* test: add unit testing for router pattern matching

* test: update test to use default oai tokenizer

* test: mark flaky test

* test: skip flaky test
2024-12-13 08:54:03 -08:00
Ishaan Jaff
b56e29db36 (fix) latency fix - revert prompt caching check on litellm router (#7211)
* attempt to fix latency issue

* fix latency issues for router prompt caching
2024-12-12 20:50:16 -08:00
Ishaan Jaff
01b20f0bb8 (minor proxy fix) Clarify that Proxy rate limit errors show the hash of the litellm virtual key (#7210)
* fix clarify rate limit errors are showing litellm virtual key

* fix constants.py

* update test

* fix test parallel limiter
2024-12-12 20:13:14 -08:00
Ishaan Jaff
a0464f2970 fix testing: retry audio test 3 times 2024-12-12 20:09:14 -08:00
Ishaan Jaff
b1c3e2d4ef (feat) UI - Disable Usage Tab once SpendLogs is 1M+ Rows (#7208)
* use utils to set proxy spend logs row count

* store proxy state variables

* fix check for _has_user_setup_sso

* fix proxyStateVariables

* fix dup code

* rename getProxyUISettings

* add fixes

* ui emit num spend logs rows

* test_proxy_server_prisma_setup

* use MAX_SPENDLOG_ROWS_TO_QUERY to constants

* test_get_ui_settings_spend_logs_threshold
2024-12-12 18:43:17 -08:00
Ishaan Jaff
8c7605a164 fix: Support WebP image format and avoid token calculation error (#7182)
* fix get_image_dimensions

* attempt without pillow

* add clear type hints

* fix run_async_function_within_sync_function

* fix calculate_img_tokens

* fix is_prompt_caching_valid_prompt

* fix naming

* fix calculate_img_tokens

* fix unused imports

* fix calculate_img_tokens

* test test_is_prompt_caching_enabled_error_handling

* test_is_prompt_caching_enabled_return_default_image_dimensions

* fix openai_token_counter

* fix get_image_dimensions

* test_token_counter_with_image_url_with_detail_high

* test_img_url_token_counter

* fix test utils

* fix testing

* test_is_prompt_caching_enabled
2024-12-12 14:32:39 -08:00
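The header-sniffing approach behind `get_image_dimensions` ("attempt without pillow"), sketched: read width/height straight from the file's magic bytes. PNG is handled exactly; the WebP branch covers only the extended (VP8X) header and simplifies the real format variants:

```python
import struct
from typing import Optional, Tuple

def get_image_dimensions(data: bytes) -> Optional[Tuple[int, int]]:
    if data[:8] == b"\x89PNG\r\n\x1a\n":
        # IHDR: width and height are big-endian uint32 at bytes 16-23
        return struct.unpack(">II", data[16:24])
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP" and data[12:16] == b"VP8X":
        # VP8X stores (dimension - 1) as 24-bit little-endian integers
        width = int.from_bytes(data[24:27], "little") + 1
        height = int.from_bytes(data[27:30], "little") + 1
        return width, height
    return None  # unknown format; caller falls back to default dimensions
```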
Ishaan Jaff
c6d6bda76c (docs) Document StandardLoggingPayload Spec (#7201)
* add slp spec to docs

* docs slp

* test slp enforcement
2024-12-12 14:00:42 -08:00
Ishaan Jaff
0862a233be (feat) add error_code, error_class, llm_provider to StandardLoggingPayload (#7200)
* add StandardLoggingPayloadErrorInformation to error

* test_get_error_information
2024-12-12 12:18:10 -08:00
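The three fields named in the title, sketched as a TypedDict (the actual definition in litellm may differ):

```python
from typing_extensions import TypedDict

class StandardLoggingPayloadErrorInformation(TypedDict, total=False):
    error_code: str    # e.g. "429"
    error_class: str   # e.g. "RateLimitError"
    llm_provider: str  # e.g. "openai"
```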
Ishaan Jaff
02fc8d8738 (Feat) DataDog Logger - Add HOSTNAME and POD_NAME to DataDog logs (#7189)
* add unit test for test_datadog_static_methods

* docs dd vars

* test_datadog_payload_environment_variables

* test_datadog_static_methods

* docs env vars

* fix table
2024-12-12 12:06:26 -08:00
Ishaan Jaff
2185587b4d (feat) add response_time to StandardLoggingPayload - logged on datadog, gcs_bucket, s3_bucket etc (#7199)
* feat - add response_time to slp

* test_get_response_time

* docs slp

* fix test_datadog_logging_http_request
2024-12-12 12:04:43 -08:00
Krrish Dholakia
04138c2df7 test: update hf test to check if client closed 2024-12-12 11:34:50 -08:00
Krish Dholakia
a9aeb21d0b fix(acompletion): support fallbacks on acompletion (#7184)
* fix(acompletion): support fallbacks on acompletion

allows health checks for wildcard routes to use fallback models

* test: update cohere generate api testing

* add max tokens to health check (#7000)

* fix: fix health check test

* test: update testing

---------

Co-authored-by: Cameron <561860+wallies@users.noreply.github.com>
2024-12-11 19:20:54 -08:00
Krrish Dholakia
982ef7ca04 build: Squashed commit of https://github.com/BerriAI/litellm/pull/7171
Closes https://github.com/BerriAI/litellm/pull/7171
2024-12-11 01:10:12 -08:00