Commit graph

87 commits

Author SHA1 Message Date
Krrish Dholakia
f7d9cce536 refactor(azure.py): refactor acompletion to use base azure sdk client 2025-03-11 13:59:13 -07:00
Krrish Dholakia
b58edb7fa1 test(test_azure_common_utils.py): add unit testing for common azure client params function 2025-03-11 12:24:08 -07:00
Krrish Dholakia
69839b3720 refactor(azure/common_utils.py): refactor azure client param logic
create common util for azure client param logic
2025-03-11 12:14:50 -07:00
Krrish Dholakia
bfbe26b91d feat(azure.py): add azure bad request error support 2025-03-10 15:59:06 -07:00
Krish Dholakia
5e386c28b2
Litellm dev 03 04 2025 p3 (#8997)
* fix(core_helpers.py): handle litellm_metadata instead of 'metadata'

* feat(batches/): ensure batches logs are written to db

makes batches response dict compatible

* fix(cost_calculator.py): handle batch response being a dictionary

* fix(batches/main.py): modify retrieve endpoints to use @client decorator

enables logging to work on retrieve call

* fix(batches/main.py): fix retrieve batch response type to be 'dict' compatible

* fix(spend_tracking_utils.py): send unique uuid for retrieve batch call type

create batch and retrieve batch share the same id

* fix(spend_tracking_utils.py): prevent duplicate retrieve batch calls from being double counted

* refactor(batches/): refactor cost tracking for batches - do it on retrieve, and within the established litellm_logging pipeline

ensures cost is always logged to db

* fix: fix linting errors

* fix: fix linting error
2025-03-04 21:58:03 -08:00
Krish Dholakia
58141df65d
Litellm dev 02 13 2025 p2 (#8525)
* fix(azure/chat/gpt_transformation.py): add 'prediction' as a support azure param

Closes https://github.com/BerriAI/litellm/issues/8500

* build(model_prices_and_context_window.json): add new 'gemini-2.0-pro-exp-02-05' model

* style: cleanup invalid json trailing commma

* feat(utils.py): support passing 'tokenizer_config' to register_prompt_template

enables passing complete tokenizer config of model to litellm

 Allows calling deepseek on bedrock with the correct prompt template

* fix(utils.py): fix register_prompt_template for custom model names

* test(test_prompt_factory.py): fix test

* test(test_completion.py): add e2e test for bedrock invoke deepseek ft model

* feat(base_invoke_transformation.py): support hf_model_name param for bedrock invoke calls

enables proxy admin to set base model for ft bedrock deepseek model

* feat(bedrock/invoke): support deepseek_r1 route for bedrock

makes it easy to apply the right chat template to that call

* feat(constants.py): store deepseek r1 chat template - allow user to get correct response from deepseek r1 without extra work

* test(test_completion.py): add e2e mock test for bedrock deepseek

* docs(bedrock.md): document new deepseek_r1 route for bedrock

allows us to use the right config

* fix(exception_mapping_utils.py): catch read operation timeout
2025-02-13 20:28:42 -08:00
Krish Dholakia
47f46f92c8
Litellm dev 02 10 2025 p1 (#8438)
* fix(azure/chat/gpt_transformation.py): fix str compare to use int - ensure correct api version check is done

Resolves https://github.com/BerriAI/litellm/issues/8241#issuecomment-2647142891

* test(test_azure_openai.py): add better testing
2025-02-10 16:25:04 -08:00
Krish Dholakia
f651d51f26
Litellm dev 02 07 2025 p2 (#8377)
* fix(caching_routes.py): mask redis password on `/cache/ping` route

* fix(caching_routes.py): fix linting erro

* fix(caching_routes.py): fix linting error on caching routes

* fix: fix test - ignore mask_dict - has a breakpoint

* fix(azure.py): add timeout param + elapsed time in azure timeout error

* fix(http_handler.py): add elapsed time to http timeout request

makes it easier to debug how long request took before failing
2025-02-07 17:30:38 -08:00
Krish Dholakia
6b8b49451f
Fix azure max retries error (#8340)
* fix(azure.py): ensure max_retries=0 is respected

Fixes https://github.com/BerriAI/litellm/issues/6129

* fix(test_openai.py): add unit test to ensure openai sdk calls always respect max_retries = 0

* test(test_azure_openai.py): add unit testing for azure_text/ route

* fix(azure.py): fix passing max retries on streaming

* fix(azure.py): fix azure max retries on async completion + streaming

* fix(completion/handler.py): fix azure text async completion + streaming

* test(test_azure_openai.py): ensure azure openai max retries always respected

* test(test_azure_o_series.py): add testing to ensure max retries always respected

* Added gemini providers for 2.0-flash and 2.0-flash lite (#8321)

* Update model_prices_and_context_window.json

added gemini providers for 2.0-flash and 2.0-flash light

* Update model_prices_and_context_window.json

fixed URL

---------

Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>

* Convert tool use arguments to string before counting tokens (#6989)

In at least some cases the `messages["tool_calls"]["function"]["arguments"]` is a dict, not a string. In order to tokenize it properly it needs to be a string. In the case that it is already a string this is a noop, which is also fine.

* build(model_prices_and_context_window.json): add gemini 2.0 flash lite pricing

* build(model_prices_and_context_window.json): add gemini commercial rate limits

* fix(utils.py): fix linting error

* refactor(utils.py): refactor to maintain function size

---------

Co-authored-by: Bardia Khosravi <bardiakhosravi95@gmail.com>
Co-authored-by: Josh Morrow <josh@jcmorrow.com>
2025-02-06 23:20:48 -08:00
Krish Dholakia
443ae55904
Azure OpenAI improvements - o3 native streaming, improved tool call + response format handling (#8292)
* fix(convert_dict_to_response.py): only convert if response is the response_format tool call passed in

Fixes https://github.com/BerriAI/litellm/issues/8241

* fix(gpt_transformation.py): makes sure response format / tools conversion doesn't remove previous tool calls

* refactor(gpt_transformation.py): refactor out json schema converstion to base config

keeps logic consistent across providers

* fix(o_series_transformation.py): support o3 mini native streaming

Fixes https://github.com/BerriAI/litellm/issues/8274

* fix(gpt_transformation.py): remove unused variables

* test: update test
2025-02-05 19:38:58 -08:00
Krish Dholakia
97b8de17ab
LiteLLM Minor Fixes & Improvements (01/16/2025) - p2 (#7828)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 14s
* fix(vertex_ai/gemini/transformation.py): handle 'http://' image urls

* test: add base test for `http:` url's

* fix(factory.py/get_image_details): follow redirects

allows http calls to work

* fix(codestral/): fix stream chunk parsing on last chunk of stream

* Azure ad token provider (#6917)

* Update azure.py

Added optional parameter azure ad token provider

* Added parameter to main.py

* Found token provider arg location

* Fixed embeddings

* Fixed ad token provider

---------

Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>

* fix: fix linting errors

* fix(main.py): leave out o1 route for azure ad token provider, for now

get v0 out for sync azure gpt route to begin with

* test: skip http:// test for fireworks ai

model does not support it

* refactor: cleanup dead code

* fix: revert http:// url passthrough for gemini

google ai studio raises errors

* test: fix test

---------

Co-authored-by: bahtman <anton@baht.dk>
2025-02-02 23:17:50 -08:00
Krish Dholakia
1105e35538
Complete o3 model support (#8183)
* fix(o_series_transformation.py): add 'reasoning_effort' as o series model param

Closes https://github.com/BerriAI/litellm/issues/8182

* fix(main.py): ensure `reasoning_effort` is a mapped openai param

* refactor(azure/): rename o1_[x] files to o_series_[x]

* refactor(base_llm_unit_tests.py): refactor testing for o series reasoning effort

* test(test_azure_o_series.py): have azure o series tests correctly inherit from base o series model tests

* feat(base_utils.py): support translating 'developer' role to 'system' role for non-openai providers

Makes it easy to switch from openai to anthropic

* fix: fix linting errors

* fix(base_llm_unit_tests.py): fix test

* fix(main.py): add missing param
2025-02-02 22:36:37 -08:00
Krish Dholakia
23f458d2da
Improved O3 + Azure O3 support (#8181)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 13s
* fix: support azure o3 model family for fake streaming workaround (#8162)

* fix: support azure o3 model family for fake streaming workaround

* refactor: rename helper to is_o_series_model for clarity

* update function calling parameters for o3 models (#8178)

* refactor(o1_transformation.py): refactor o1 config to be o series config, expand o series model check to o3

ensures max_tokens is correctly translated for o3

* feat(openai/): refactor o1 files to be 'o_series' files

expands naming to cover o3

* fix(azure/chat/o1_handler.py): azure openai is an instance of openai - was causing resets

* test(test_azure_o_series.py): assert stream faked for azure o3 mini

Resolves https://github.com/BerriAI/litellm/pull/8162

* fix(o1_transformation.py): fix o1 transformation logic to handle explicit o1_series routing

* docs(azure.md): update doc with `o_series/` model name

---------

Co-authored-by: byrongrogan <47910641+byrongrogan@users.noreply.github.com>
Co-authored-by: Low Jian Sheng <15527690+lowjiansheng@users.noreply.github.com>
2025-02-01 09:52:28 -08:00
Krish Dholakia
8353caa485
build(pyproject.toml): bump uvicorn depedency requirement (#7773)
* build(pyproject.toml): bump uvicorn depedency requirement

Fixes https://github.com/BerriAI/litellm/issues/7768

* fix(anthropic/chat/transformation.py): fix is_vertex_request check to actually use optional param passed in

Fixes https://github.com/BerriAI/litellm/issues/6898#issuecomment-2590860695

* fix(o1_transformation.py): fix azure o1 'is_o1_model' check to just check for o1 in model string

https://github.com/BerriAI/litellm/issues/7743

* test: load vertex creds
2025-01-14 21:47:11 -08:00
Krish Dholakia
7b27cfb0ae
Support temporary budget increases on keys (#7754)
* fix(gpt_transformation.py): fix response_format translation check for 4o models

Fixes https://github.com/BerriAI/litellm/issues/7616

* feat(key_management_endpoints.py): support 'temp_budget_increase' and 'temp_budget_expiry' fields

Allow proxy admin to grant temporary budget increases to keys

* fix(proxy/_types.py): enforce temp_budget_increase and temp_budget_expiry are always passed together

* feat(user_api_key_auth.py): initial working temp budget increase logic

ensures key budget exceeded error checks for temp budget in key metadata

* feat(proxy_server.py): return the key max budget and key spend in the response headers

Allows clientside user to know their remaining limits

* test: add unit testing for new proxy utils

Ensures new key budget is correctly handled

* docs(temporary_budget_increase.md): add doc on temporary budget increase

* fix(utils.py): remove 3.5 from response_format check for now

not all azure  3.5 models support response_format

* fix(user_api_key_auth.py): return valid user api key auth object on all paths
2025-01-14 17:03:11 -08:00
Krish Dholakia
0120176541
Litellm dev 12 30 2024 p2 (#7495)
* test(azure_openai_o1.py): initial commit with testing for azure openai o1 preview model

* fix(base_llm_unit_tests.py): handle azure o1 preview response format tests

skip as o1 on azure doesn't support tool calling yet

* fix: initial commit of azure o1 handler using openai caller

simplifies calling + allows fake streaming logic alr. implemented for openai to just work

* feat(azure/o1_handler.py): fake o1 streaming for azure o1 models

azure does not currently support streaming for o1

* feat(o1_transformation.py): support overriding 'should_fake_stream' on azure/o1 via 'supports_native_streaming' param on model info

enables user to toggle on when azure allows o1 streaming without needing to bump versions

* style(router.py): remove 'give feedback/get help' messaging when router is used

Prevents noisy messaging

Closes https://github.com/BerriAI/litellm/issues/5942

* fix(types/utils.py): handle none logprobs

Fixes https://github.com/BerriAI/litellm/issues/328

* fix(exception_mapping_utils.py): fix error str unbound error

* refactor(azure_ai/): move to openai_like chat completion handler

allows for easy swapping of api base url's (e.g. ai.services.com)

Fixes https://github.com/BerriAI/litellm/issues/7275

* refactor(azure_ai/): move to base llm http handler

* fix(azure_ai/): handle differing api endpoints

* fix(azure_ai/): make sure all unit tests are passing

* fix: fix linting errors

* fix: fix linting errors

* fix: fix linting error

* fix: fix linting errors

* fix(azure_ai/transformation.py): handle extra body param

* fix(azure_ai/transformation.py): fix max retries param handling

* fix: fix test

* test(test_azure_o1.py): fix test

* fix(llm_http_handler.py): support handling azure ai unprocessable entity error

* fix(llm_http_handler.py): handle sync invalid param error for azure ai

* fix(azure_ai/): streaming support with base_llm_http_handler

* fix(llm_http_handler.py): working sync stream calls with unprocessable entity handling for azure ai

* fix: fix linting errors

* fix(llm_http_handler.py): fix linting error

* fix(azure_ai/): handle cohere tool call invalid index param error
2025-01-01 18:57:29 -08:00
Ishaan Jaff
38bfefa6ef
(Feat) - LiteLLM Use UsernamePasswordCredential for Azure OpenAI (#7496)
* add get_azure_ad_token_from_username_password

* docs azure use username / password for auth

* update doc

* get_azure_ad_token_from_username_password

* test test_get_azure_ad_token_from_username_password
2025-01-01 14:11:27 -08:00
Krish Dholakia
347779b813
Litellm dev 12 30 2024 p1 (#7480)
* test(azure_openai_o1.py): initial commit with testing for azure openai o1 preview model

* fix(base_llm_unit_tests.py): handle azure o1 preview response format tests

skip as o1 on azure doesn't support tool calling yet

* fix: initial commit of azure o1 handler using openai caller

simplifies calling + allows fake streaming logic alr. implemented for openai to just work

* feat(azure/o1_handler.py): fake o1 streaming for azure o1 models

azure does not currently support streaming for o1

* feat(o1_transformation.py): support overriding 'should_fake_stream' on azure/o1 via 'supports_native_streaming' param on model info

enables user to toggle on when azure allows o1 streaming without needing to bump versions

* style(router.py): remove 'give feedback/get help' messaging when router is used

Prevents noisy messaging

Closes https://github.com/BerriAI/litellm/issues/5942

* test: fix azure o1 test

* test: fix tests

* fix: fix test
2024-12-30 21:52:52 -08:00
Ishaan Jaff
1e06ee3162
(Refactor) - Re use litellm.completion/litellm.embedding etc for health checks (#7455)
* add mode: realtime

* add _realtime_health_check

* test_realtime_health_check

* azure _realtime_health_check

* _realtime_health_check

* Realtime Models

* fix code quality

* delete OAI / Azure custom health check code

* simplest version of ahealth check

* update tests

* working health check post refactor

* working aspeech health check

* fix realtime health checks

* test_audio_transcription_health_check

* use get_audio_file_for_health_check

* test_text_completion_health_check

* ahealth_check

* simplify health check code

* update ahealth_check

* fix import

* fix unused imports

* fix ahealth_check

* fix local testing

* test_async_realtime_health_check
2024-12-28 18:38:54 -08:00
Ishaan Jaff
4e65722a00
(Bug Fix) Add health check support for realtime models (#7453)
* add mode: realtime

* add _realtime_health_check

* test_realtime_health_check

* azure _realtime_health_check

* _realtime_health_check

* Realtime Models

* fix code quality
2024-12-28 18:15:00 -08:00
Ishaan Jaff
2ece919f01
(Feat) - new endpoint GET /v1/fine_tuning/jobs/{fine_tuning_job_id:path} (#7427)
* init commit ft jobs logging

* add ft logging

* add logging for FineTuningJob

* simple FT Job create test

* simplify Azure fine tuning to use all methods in OAI ft

* update doc string

* add aretrieve_fine_tuning_job

* re use from litellm.proxy.utils import handle_exception_on_proxy

* fix naming

* add /fine_tuning/jobs/{fine_tuning_job_id:path}

* remove unused imports

* update func signature

* run ci/cd again

* ci/cd run again

* fix code qulity

* ci/cd run again
2024-12-27 17:01:14 -08:00
Krish Dholakia
9237357bcc
Litellm dev 12 25 2024 p1 (#7411)
* test(test_watsonx.py): e2e unit test for watsonx custom header

covers https://github.com/BerriAI/litellm/issues/7408

* fix(common_utils.py): handle auth token already present in headers (watsonx + openai-like base handler)

Fixes https://github.com/BerriAI/litellm/issues/7408

* fix(watsonx/chat): fix chat route

Fixes https://github.com/BerriAI/litellm/issues/7408

* fix(huggingface/chat/handler.py): fix huggingface async completion calls

* Correct handling of max_retries=0 to disable AzureOpenAI retries (#7379)

* test: fix test

---------

Co-authored-by: Minh Duc <phamminhduc0711@gmail.com>
2024-12-25 17:36:30 -08:00
Ishaan Jaff
81be0b4090
(Feat) add `"/v1/batches/{batch_id:path}/cancel" endpoint (#7406)
* use 1 file for azure batches handling

* add cancel_batch endpoint

* add a cancel batch on open ai

* add cancel_batch endpoint

* add cancel batches to test

* remove unused imports

* test_batches_operations

* update test_batches_operations
2024-12-24 20:23:50 -08:00
Krish Dholakia
404bf2974b
Litellm dev 2024 12 20 p1 (#7335)
* fix(utils.py): e2e azure tts cost tracking working

moves tts response obj to include hidden params (allows for litellm call id, etc. to be sent in response headers) ; fixes spend_Tracking_utils logging payload to account for non-base model use-case

Fixes https://github.com/BerriAI/litellm/issues/7223

* fix: fix linting errors

* build(model_prices_and_context_window.json): add bedrock llama 3.3

Closes https://github.com/BerriAI/litellm/issues/7329

* fix(openai.py): fix return type for sync openai httpx response

* test: update test

* fix(spend_tracking_utils.py): fix if check

* fix(spend_tracking_utils.py): fix if check

* test: improve debugging for test

* fix: fix import
2024-12-20 21:22:31 -08:00
Ishaan Jaff
c7f14e936a
(code quality) run ruff rule to ban unused imports (#7313)
* remove unused imports

* fix AmazonConverseConfig

* fix test

* fix import

* ruff check fixes

* test fixes

* fix testing

* fix imports
2024-12-19 12:33:42 -08:00
Krish Dholakia
5253f639cd
fix(health.md): add rerank model health check information (#7295)
* fix(health.md): add rerank model health check information

* build(model_prices_and_context_window.json): add gemini 2.0 for google ai studio - pricing + commercial rate limits

* build(model_prices_and_context_window.json): add gemini-2.0 supports audio output = true

* docs(team_model_add.md): clarify allowing teams to add models is an enterprise feature

* fix(o1_transformation.py): add support for 'n', 'response_format' and 'stop' params for o1 and 'stream_options' param for o1-mini

* build(model_prices_and_context_window.json): add 'supports_system_message' to supporting openai models

needed as o1-preview, and o1-mini models don't support 'system message

* fix(o1_transformation.py): translate system message based on if o1 model supports it

* fix(o1_transformation.py): return 'stream' param support if o1-mini/o1-preview

o1 currently doesn't support streaming, but the other model versions do

Fixes https://github.com/BerriAI/litellm/issues/7292

* fix(o1_transformation.py): return tool calling/response_format in supported params if model map says so

Fixes https://github.com/BerriAI/litellm/issues/7292

* fix: fix linting errors

* fix: update '_transform_messages'

* fix(o1_transformation.py): fix provider passed for supported param checks

* test(base_llm_unit_tests.py): skip test if api takes >5s to respond

* fix(utils.py): return false in 'supports_factory' if can't find value

* fix(o1_transformation.py): always return stream + stream_options as supported params + handle stream options being passed in for azure o1

* feat(openai.py): support stream faking natively in openai handler

Allows o1 calls to be faked for just the "o1" model, allows native streaming for o1-mini, o1-preview

 Fixes https://github.com/BerriAI/litellm/issues/7292

* fix(openai.py): use inference param instead of original optional param
2024-12-18 19:18:10 -08:00
Ishaan Jaff
7a5dd29fe0
(fix) unable to pass input_type parameter to Voyage AI embedding mode (#7276)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 46s
* VoyageEmbeddingConfig

* fix voyage logic to get params

* add voyage embedding transformation

* add get_provider_embedding_config

* use BaseEmbeddingConfig

* voyage clean up

* use llm http handler for embedding transformations

* test_voyage_ai_embedding_extra_params

* add voyage async

* test_voyage_ai_embedding_extra_params

* add async for llm http handler

* update BaseLLMEmbeddingTest

* test_voyage_ai_embedding_extra_params

* fix linting

* fix get_provider_embedding_config

* fix anthropic text test

* update location of base/chat/transformation

* fix import path

* fix IBMWatsonXAIConfig
2024-12-17 19:23:49 -08:00
Krish Dholakia
b82add11ba
LITELLM: Remove requests library usage (#7235)
* fix(generic_api_callback.py): remove requests lib usage

* fix(budget_manager.py): remove requests lib usgae

* fix(main.py): cleanup requests lib usage

* fix(utils.py): remove requests lib usage

* fix(argilla.py): fix argilla test

* fix(athina.py): replace 'requests' lib usage with litellm module

* fix(greenscale.py): replace 'requests' lib usage with httpx

* fix: remove unused 'requests' lib import + replace usage in some places

* fix(prompt_layer.py): remove 'requests' lib usage from prompt layer

* fix(ollama_chat.py): remove 'requests' lib usage

* fix(baseten.py): replace 'requests' lib usage

* fix(codestral/): replace 'requests' lib usage

* fix(predibase/): replace 'requests' lib usage

* refactor: cleanup unused 'requests' lib imports

* fix(oobabooga.py): cleanup 'requests' lib usage

* fix(invoke_handler.py): remove unused 'requests' lib usage

* refactor: cleanup unused 'requests' lib import

* fix: fix linting errors

* refactor(ollama/): move ollama to using base llm http handler

removes 'requests' lib dep for ollama integration

* fix(ollama_chat.py): fix linting errors

* fix(ollama/completion/transformation.py): convert non-jpeg/png image to jpeg/png before passing to ollama
2024-12-17 12:50:04 -08:00
Ishaan Jaff
3c984ed60e
(feat) Add Azure Blob Storage Logging Integration (#7265)
* add path to http handler

* AzureBlobStorageLogger

* test_azure_blob_storage

* use constants for Azure storage

* use helper get_azure_ad_token_from_entrata_id

* azure blob storage support

* get_azure_ad_token_from_azure_storage

* fix import

* azure logging

* docs azure storage

* add docs on azure blobs

* add premium user check

* add azure_storage  as identified logging callback

* async_upload_payload_to_azure_blob_storage

* docs azure storage

* callback_class_str_to_classType
2024-12-16 22:18:22 -08:00
Krish Dholakia
516c2a6a70
Litellm remove circular imports (#7232)
* fix(utils.py): initial commit to remove circular imports - moves llmproviders to utils.py

* fix(router.py): fix 'litellm.EmbeddingResponse' import from router.py

'

* refactor: fix litellm.ModelResponse import on pass through endpoints

* refactor(litellm_logging.py): fix circular import for custom callbacks literal

* fix(factory.py): fix circular imports inside prompt factory

* fix(cost_calculator.py): fix circular import for 'litellm.Usage'

* fix(proxy_server.py): fix potential circular import with `litellm.Router'

* fix(proxy/utils.py): fix potential circular import in `litellm.Router`

* fix: remove circular imports in 'auth_checks' and 'guardrails/'

* fix(prompt_injection_detection.py): fix router impor t

* fix(vertex_passthrough_logging_handler.py): fix potential circular imports in vertex pass through

* fix(anthropic_pass_through_logging_handler.py): fix potential circular imports

* fix(slack_alerting.py-+-ollama_chat.py): fix modelresponse import

* fix(base.py): fix potential circular import

* fix(handler.py): fix potential circular ref in codestral + cohere handler's

* fix(azure.py): fix potential circular imports

* fix(gpt_transformation.py): fix modelresponse import

* fix(litellm_logging.py): add logging base class - simplify typing

makes it easy for other files to type check the logging obj without introducing circular imports

* fix(azure_ai/embed): fix potential circular import on handler.py

* fix(databricks/): fix potential circular imports in databricks/

* fix(vertex_ai/): fix potential circular imports on vertex ai embeddings

* fix(vertex_ai/image_gen): fix import

* fix(watsonx-+-bedrock): cleanup imports

* refactor(anthropic-pass-through-+-petals): cleanup imports

* refactor(huggingface/): cleanup imports

* fix(ollama-+-clarifai): cleanup circular imports

* fix(openai_like/): fix impor t

* fix(openai_like/): fix embedding handler

cleanup imports

* refactor(openai.py): cleanup imports

* fix(sagemaker/transformation.py): fix import

* ci(config.yml): add circular import test to ci/cd
2024-12-14 16:28:34 -08:00
Krish Dholakia
e68bb4e051
Litellm dev 12 12 2024 (#7203)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 47s
* fix(azure/): support passing headers to azure openai endpoints

Fixes https://github.com/BerriAI/litellm/issues/6217

* fix(utils.py): move default tokenizer to just openai

hf tokenizer makes network calls when trying to get the tokenizer - this slows down execution time calls

* fix(router.py): fix pattern matching router - add generic "*" to it as well

Fixes issue where generic "*" model access group wouldn't show up

* fix(pattern_match_deployments.py): match to more specific pattern

match to more specific pattern

allows setting generic wildcard model access group and excluding specific models more easily

* fix(proxy_server.py): fix _delete_deployment to handle base case where db_model list is empty

don't delete all router models  b/c of empty list

Fixes https://github.com/BerriAI/litellm/issues/7196

* fix(anthropic/): fix handling response_format for anthropic messages with anthropic api

* fix(fireworks_ai/): support passing response_format + tool call in same message

Addresses https://github.com/BerriAI/litellm/issues/7135

* Revert "fix(fireworks_ai/): support passing response_format + tool call in same message"

This reverts commit 6a30dc6929.

* test: fix test

* fix(replicate/): fix replicate default retry/polling logic

* test: add unit testing for router pattern matching

* test: update test to use default oai tokenizer

* test: mark flaky test

* test: skip flaky test
2024-12-13 08:54:03 -08:00
Ishaan Jaff
b5d55688e5
(Refactor) Code Quality improvement - remove /prompt_templates/ , base_aws_llm.py from /llms folder (#7164)
* fix move base_aws_llm

* fix import

* update enforce llms folder style

* move prompt_templates

* update prompt_templates location

* fix imports

* fix imports

* fix imports

* fix imports

* fix checks
2024-12-11 00:02:46 -08:00
Krish Dholakia
350cfc36f7
Litellm merge pr (#7161)
* build: merge branch

* test: fix openai naming

* fix(main.py): fix openai renaming

* style: ignore function length for config factory

* fix(sagemaker/): fix routing logic

* fix: fix imports

* fix: fix override
2024-12-10 22:49:26 -08:00
Ishaan Jaff
1d8956a3d4
Code Quality Improvement - remove file_apis, fine_tuning_apis from /llms (#7156)
* remove files_apis from /llms

* fix imports

* move fine tuning api from /llms

* fix importing fine tuning handlers

* fix imports
2024-12-10 21:44:25 -08:00
Ishaan Jaff
bfb6891eb7
rename llms/OpenAI/ -> llms/openai/ (#7154)
* rename OpenAI -> openai

* fix file rename

* fix rename changes

* fix organization of openai/transcription

* fix import OA fine tuning API

* fix openai ft handler

* fix handler import
2024-12-10 20:14:07 -08:00
Krish Dholakia
0c0498dd60
Litellm dev 12 07 2024 (#7086)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 11s
* fix(main.py): support passing max retries to azure/openai embedding integrations

Fixes https://github.com/BerriAI/litellm/issues/7003

* feat(team_endpoints.py): allow updating team model aliases

Closes https://github.com/BerriAI/litellm/issues/6956

* feat(router.py): allow specifying model id as fallback - skips any cooldown check

Allows a default model to be checked if all models in cooldown

s/o @micahjsmith

* docs(reliability.md): add fallback to specific model to docs

* fix(utils.py): new 'is_prompt_caching_valid_prompt' helper util

Allows user to identify if messages/tools have prompt caching

Related issue: https://github.com/BerriAI/litellm/issues/6784

* feat(router.py): store model id for prompt caching valid prompt

Allows routing to that model id on subsequent requests

* fix(router.py): only cache if prompt is valid prompt caching prompt

prevents storing unnecessary items in cache

* feat(router.py): support routing prompt caching enabled models to previous deployments

Closes https://github.com/BerriAI/litellm/issues/6784

* test: fix linting errors

* feat(databricks/): convert basemodel to dict and exclude none values

allow passing pydantic message to databricks

* fix(utils.py): ensure all chat completion messages are dict

* (feat) Track `custom_llm_provider` in LiteLLMSpendLogs (#7081)

* add custom_llm_provider to SpendLogsPayload

* add custom_llm_provider to SpendLogs

* add custom llm provider to SpendLogs payload

* test_spend_logs_payload

* Add MLflow to the side bar (#7031)

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

* (bug fix) SpendLogs update DB catch all possible DB errors for retrying  (#7082)

* catch DB_CONNECTION_ERROR_TYPES

* fix DB retry mechanism for SpendLog updates

* use DB_CONNECTION_ERROR_TYPES in auth checks

* fix exp back off for writing SpendLogs

* use _raise_failed_update_spend_exception to ensure errors print as NON blocking

* test_update_spend_logs_multiple_batches_with_failure

* (Feat) Add StructuredOutputs support for Fireworks.AI (#7085)

* fix model cost map fireworks ai "supports_response_schema": true,

* fix supports_response_schema

* fix map openai params fireworks ai

* test_map_response_format

* test_map_response_format

* added deepinfra/Meta-Llama-3.1-405B-Instruct (#7084)

* bump: version 1.53.9 → 1.54.0

* fix deepinfra

* litellm db fixes LiteLLM_UserTable (#7089)

* ci/cd queue new release

* fix llama-3.3-70b-versatile

* refactor - use consistent file naming convention `AI21/` -> `ai21`  (#7090)

* fix refactor - use consistent file naming convention

* ci/cd run again

* fix naming structure

* fix use consistent naming (#7092)

---------

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Yuki Watanabe <31463517+B-Step62@users.noreply.github.com>
Co-authored-by: ali sayyah <ali.sayyah2@gmail.com>
2024-12-08 00:30:33 -08:00
Ishaan Jaff
36e99ebce7
fix use consistent naming (#7092)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 11s
2024-12-07 22:01:00 -08:00