Commit graph

1886 commits

Author SHA1 Message Date
Krish Dholakia
e8ed40a27b
Litellm dev 01 01 2025 p2 (#7615)
* fix(utils.py): prevent double logging when passing 'fallbacks=' to .completion()

Fixes https://github.com/BerriAI/litellm/issues/7477

* fix(utils.py): fix vertex anthropic check

* fix(utils.py): ensure supported params is always set

Fixes https://github.com/BerriAI/litellm/issues/7470

* test(test_optional_params.py): add unit testing to prevent mistranslation

Fixes https://github.com/BerriAI/litellm/issues/7470

* fix: fix linting error

* test: cleanup
2025-01-07 21:40:33 -08:00
Ishaan Jaff
2ca0977921
aiohttp_openai/ fixes - allow using aiohttp_openai/gpt-4o (#7598)
* fixes for get_complete_url

* update aiohttp tests

* fix event loop for aiohtto

* ci/cd run again

* test_aiohttp_openai
2025-01-06 21:39:11 -08:00
Krish Dholakia
0c3fef24cd
Litellm dev 01 06 2025 p2 (#7597)
* test(test_amazing_vertex_completion.py): fix test

* test: initial working code gecko test

* fix(vertex_ai_non_gemini.py): support vertex ai code gecko fake streaming

Fixes https://github.com/BerriAI/litellm/issues/7360

* test(test_get_model_info.py): add test for getting custom provider model info

Covers https://github.com/BerriAI/litellm/issues/7575

* fix(utils.py): fix get_provider_model_info check

Handle custom llm provider scenario

Fixes https://github.com/
BerriAI/litellm/issues/7575
2025-01-06 21:04:49 -08:00
Ishaan Jaff
61d67cfa43
(perf) - fixes for aiohttp handler to hit 1K RPS (#7590)
* fix getting aiohttp sesson

* fix _get_async_client_session
2025-01-06 15:41:39 -08:00
Krish Dholakia
ce97e7e054
fix(groq/chat/transformation.py): fix groq response_format transformation (#7565)
Fixes https://github.com/BerriAI/litellm/issues/4804
2025-01-04 19:39:04 -08:00
Krish Dholakia
d43d83f9ef
feat(router.py): support request prioritization for text completion c… (#7540)
* feat(router.py): support request prioritization for text completion calls

* fix(internal_user_endpoints.py): fix sql query to return all keys, including null team id keys on `/user/info`

Fixes https://github.com/BerriAI/litellm/issues/7485

* fix: fix linting errors

* fix: fix linting error

* test(test_router_helper_utils.py): add direct test for '_schedule_factory'

Fixes code qa test
2025-01-03 19:35:44 -08:00
Krish Dholakia
f770dd0c95
Support checking provider-specific /models endpoints for available models based on key (#7538)
* test(test_utils.py): initial test for valid models

Addresses https://github.com/BerriAI/litellm/issues/7525

* fix: test

* feat(fireworks_ai/transformation.py): support retrieving valid models from fireworks ai endpoint

* refactor(fireworks_ai/): support checking model info on `/v1/models` route

* docs(set_keys.md): update docs to clarify check llm provider api usage

* fix(watsonx/common_utils.py): support 'WATSONX_ZENAPIKEY' for iam auth

* fix(watsonx): read in watsonx token from env var

* fix: fix linting errors

* fix(utils.py): fix provider config check

* style: cleanup unused imports
2025-01-03 19:29:59 -08:00
Krish Dholakia
6843f3a2bb
Revert "fix: add missing parameters order, limit, before, and after in get_as…" (#7542)
This reverts commit 4b0505dffd.
2025-01-03 16:32:12 -08:00
Ishaan Jaff
c8f9e2afab fix _make_common_async_call 2025-01-03 15:15:38 -08:00
Ishaan Jaff
02875d4ae8
(fix) aiohttp_openai/ route - get to 1K RPS on single instance (#7539)
* ClientSession

* re use client_session

* _init_client_session

* fix aiohttp
2025-01-03 15:12:17 -08:00
Jean Carlo de Souza
4b0505dffd
fix: add missing parameters order, limit, before, and after in get_assistants method for openai (#7537)
- Ensured that `before` and `after` parameters are only passed when provided to avoid AttributeError.
- Implemented safe access using default values for `before` and `after` to prevent missing attribute issues.
- Added consistent handling of `order` and `limit` to improve flexibility and robustness in API calls.
2025-01-03 14:41:54 -08:00
Ishaan Jaff
b8173c2e9f fix unused imports
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 13s
2025-01-02 22:28:22 -08:00
Ishaan Jaff
d861aa8ff3
(perf) use aiohttp for custom_openai (#7514)
* use aiohttp handler

* BaseLLMAIOHTTPHandler

* use CustomOpenAIChatConfig

* CustomOpenAIChatConfig

* CustomOpenAIChatConfig

* fix linting

* AiohttpOpenAIChatConfig

* fix order

* aiohttp_openai
2025-01-02 22:15:17 -08:00
Ishaan Jaff
4d93fe787b
Revert "(fix) GCS bucket logger - apply `truncate_standard_logging_payload_co…" (#7515)
This reverts commit 26a37c50c9.
2025-01-02 22:01:02 -08:00
Krish Dholakia
25e6f46910
Litellm dev 01 02 2025 p2 (#7512)
* feat(deepgram/transformation.py): support reading in deepgram api base from env var

* fix(litellm_logging.py): make skipping log message a .info

easier to see

* docs(logging.md): add doc on turn off all tracking/logging for a request
2025-01-02 21:57:51 -08:00
Ishaan Jaff
26a37c50c9
(fix) GCS bucket logger - apply truncate_standard_logging_payload_content to standard_logging_payload and ensure GCS flushes queue on fails (#7500)
* use truncate_standard_logging_payload_content

* update truncate_standard_logging_payload_content

* update dd logger

* update gcs async_send_batch

* fix code check

* test_datadog_payload_content_truncation

* fix code quality
2025-01-01 20:21:01 -08:00
Krish Dholakia
0120176541
Litellm dev 12 30 2024 p2 (#7495)
* test(azure_openai_o1.py): initial commit with testing for azure openai o1 preview model

* fix(base_llm_unit_tests.py): handle azure o1 preview response format tests

skip as o1 on azure doesn't support tool calling yet

* fix: initial commit of azure o1 handler using openai caller

simplifies calling + allows fake streaming logic alr. implemented for openai to just work

* feat(azure/o1_handler.py): fake o1 streaming for azure o1 models

azure does not currently support streaming for o1

* feat(o1_transformation.py): support overriding 'should_fake_stream' on azure/o1 via 'supports_native_streaming' param on model info

enables user to toggle on when azure allows o1 streaming without needing to bump versions

* style(router.py): remove 'give feedback/get help' messaging when router is used

Prevents noisy messaging

Closes https://github.com/BerriAI/litellm/issues/5942

* fix(types/utils.py): handle none logprobs

Fixes https://github.com/BerriAI/litellm/issues/328

* fix(exception_mapping_utils.py): fix error str unbound error

* refactor(azure_ai/): move to openai_like chat completion handler

allows for easy swapping of api base url's (e.g. ai.services.com)

Fixes https://github.com/BerriAI/litellm/issues/7275

* refactor(azure_ai/): move to base llm http handler

* fix(azure_ai/): handle differing api endpoints

* fix(azure_ai/): make sure all unit tests are passing

* fix: fix linting errors

* fix: fix linting errors

* fix: fix linting error

* fix: fix linting errors

* fix(azure_ai/transformation.py): handle extra body param

* fix(azure_ai/transformation.py): fix max retries param handling

* fix: fix test

* test(test_azure_o1.py): fix test

* fix(llm_http_handler.py): support handling azure ai unprocessable entity error

* fix(llm_http_handler.py): handle sync invalid param error for azure ai

* fix(azure_ai/): streaming support with base_llm_http_handler

* fix(llm_http_handler.py): working sync stream calls with unprocessable entity handling for azure ai

* fix: fix linting errors

* fix(llm_http_handler.py): fix linting error

* fix(azure_ai/): handle cohere tool call invalid index param error
2025-01-01 18:57:29 -08:00
Ishaan Jaff
38bfefa6ef
(Feat) - LiteLLM Use UsernamePasswordCredential for Azure OpenAI (#7496)
* add get_azure_ad_token_from_username_password

* docs azure use username / password for auth

* update doc

* get_azure_ad_token_from_username_password

* test test_get_azure_ad_token_from_username_password
2025-01-01 14:11:27 -08:00
Ishaan Jaff
2979b8301c
(feat) POST /fine_tuning/jobs support passing vertex specific hyper params (#7490)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 12s
* update convert_openai_request_to_vertex

* test_create_vertex_fine_tune_jobs_mocked

* fix order of methods

* update LiteLLMFineTuningJobCreate

* update OpenAIFineTuningHyperparameters

* update vertex hyper params in response

* _transform_openai_hyperparameters_to_vertex_hyperparameters

* supervised_tuning_spec["hyperParameters"] fix

* fix mapping for ft params testing

* docs fine tuning apis

* fix test_convert_basic_openai_request_to_vertex_request

* update hyperparams for create fine tuning

* fix linting

* test_create_vertex_fine_tune_jobs_mocked_with_hyperparameters

* run ci/cd again

* test_convert_basic_openai_request_to_vertex_request
2025-01-01 07:44:48 -08:00
Sven Seeberg
833bf1da22
fix ollama embedding model response #7451 (#7473)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 13s
2024-12-31 23:24:08 -08:00
Ishaan Jaff
859f6e1635
(fix) v1/fine_tuning/jobs with VertexAI (#7487)
* update convert_openai_request_to_vertex

* test_create_vertex_fine_tune_jobs_mocked
2024-12-31 15:09:56 -08:00
Krish Dholakia
347779b813
Litellm dev 12 30 2024 p1 (#7480)
* test(azure_openai_o1.py): initial commit with testing for azure openai o1 preview model

* fix(base_llm_unit_tests.py): handle azure o1 preview response format tests

skip as o1 on azure doesn't support tool calling yet

* fix: initial commit of azure o1 handler using openai caller

simplifies calling + allows fake streaming logic alr. implemented for openai to just work

* feat(azure/o1_handler.py): fake o1 streaming for azure o1 models

azure does not currently support streaming for o1

* feat(o1_transformation.py): support overriding 'should_fake_stream' on azure/o1 via 'supports_native_streaming' param on model info

enables user to toggle on when azure allows o1 streaming without needing to bump versions

* style(router.py): remove 'give feedback/get help' messaging when router is used

Prevents noisy messaging

Closes https://github.com/BerriAI/litellm/issues/5942

* test: fix azure o1 test

* test: fix tests

* fix: fix test
2024-12-30 21:52:52 -08:00
Krish Dholakia
31ace870a2
Litellm dev 12 28 2024 p1 (#7463)
* refactor(utils.py): migrate amazon titan config to base config

* refactor(utils.py): refactor bedrock meta invoke model translation to use base config

* refactor(utils.py): move bedrock ai21 to base config

* refactor(utils.py): move bedrock cohere to base config

* refactor(utils.py): move bedrock mistral to use base config

* refactor(utils.py): move all provider optional param translations to using a config

* docs(clientside_auth.md): clarify how to pass vertex region to litellm proxy

* fix(utils.py): handle scenario where custom llm provider is none / empty

* fix: fix get config

* test(test_otel_load_tests.py): widen perf margin

* fix(utils.py): fix get provider config check to handle custom llm's

* fix(utils.py): fix check
2024-12-28 20:26:00 -08:00
Krish Dholakia
cfb6890b9f
Litellm dev 12 28 2024 p2 (#7458)
* docs(sidebar.js): docs for support model access groups for wildcard routes

* feat(key_management_endpoints.py): add check if user is premium_user when adding model access group for wildcard route

* refactor(docs/): make control model access a root-level doc in proxy sidebar

easier to discover how to control model access on litellm

* docs: more cleanup

* feat(fireworks_ai/): add document inlining support

Enables user to call non-vision models with images/pdfs/etc.

* test(test_fireworks_ai_translation.py): add unit testing for fireworks ai transform inline helper util

* docs(docs/): add document inlining details to fireworks ai docs

* feat(fireworks_ai/): allow user to dynamically disable auto add transform inline

allows client-side disabling of this feature for proxy users

* feat(fireworks_ai/): return 'supports_vision' and 'supports_pdf_input' true on all fireworks ai models

now true as fireworks ai supports document inlining

* test: fix tests

* fix(router.py): add unit testing for _is_model_access_group_for_wildcard_route
2024-12-28 19:38:06 -08:00
Krish Dholakia
5af438ed89
Litellm dev 12 28 2024 p3 (#7464)
* feat(deepgram/): initial e2e support for deepgram stt

Uses deepgram's `/listen` endpoint to transcribe speech to text

 Closes https://github.com/BerriAI/litellm/issues/4875

* fix: fix linting errors

* test: fix test
2024-12-28 19:18:58 -08:00
Ishaan Jaff
1e06ee3162
(Refactor) - Re use litellm.completion/litellm.embedding etc for health checks (#7455)
* add mode: realtime

* add _realtime_health_check

* test_realtime_health_check

* azure _realtime_health_check

* _realtime_health_check

* Realtime Models

* fix code quality

* delete OAI / Azure custom health check code

* simplest version of ahealth check

* update tests

* working health check post refactor

* working aspeech health check

* fix realtime health checks

* test_audio_transcription_health_check

* use get_audio_file_for_health_check

* test_text_completion_health_check

* ahealth_check

* simplify health check code

* update ahealth_check

* fix import

* fix unused imports

* fix ahealth_check

* fix local testing

* test_async_realtime_health_check
2024-12-28 18:38:54 -08:00
Ishaan Jaff
4e65722a00
(Bug Fix) Add health check support for realtime models (#7453)
* add mode: realtime

* add _realtime_health_check

* test_realtime_health_check

* azure _realtime_health_check

* _realtime_health_check

* Realtime Models

* fix code quality
2024-12-28 18:15:00 -08:00
Krish Dholakia
0924df4971
Litellm dev 12 27 2024 p2 1 (#7449)
* fix(azure_ai/transformation.py): route ai.services.azure calls to the azure provider route

requires token to be passed in as 'api-key'

Closes https://github.com/BerriAI/litellm/issues/7275

* fix(key_management_endpoints.py): enforce user is member of team, if team_id set and team_id exists in team table

* fix(key_management_endpoints.py): handle assigned_user_id = none

* feat(create_key_button.tsx): allow assigning keys to other users

allows proxy admin to easily assign other people keys

* build(create_key_button.tsx): fix error message display

don't swallow the error message for key creation failure

* build(create_key_button.tsx): allow proxy admin to edit team id

* build(create_key_button.tsx): allow proxy admin to assign keys to other users

* build(edit_user.tsx): clarify how 'user budgets' are applied

* test: remove dup test

* fix(key_management_endpoints.py): don't raise error if team not in db

'

* test: fix test
2024-12-27 20:02:32 -08:00
Ishaan Jaff
2ece919f01
(Feat) - new endpoint GET /v1/fine_tuning/jobs/{fine_tuning_job_id:path} (#7427)
* init commit ft jobs logging

* add ft logging

* add logging for FineTuningJob

* simple FT Job create test

* simplify Azure fine tuning to use all methods in OAI ft

* update doc string

* add aretrieve_fine_tuning_job

* re use from litellm.proxy.utils import handle_exception_on_proxy

* fix naming

* add /fine_tuning/jobs/{fine_tuning_job_id:path}

* remove unused imports

* update func signature

* run ci/cd again

* ci/cd run again

* fix code qulity

* ci/cd run again
2024-12-27 17:01:14 -08:00
Krish Dholakia
760328b6ad
Litellm dev 12 25 2025 p2 (#7420)
* test: add new test image embedding to base llm unit tests

Addresses https://github.com/BerriAI/litellm/issues/6515

* fix(bedrock/embed/multimodal-embeddings): strip data prefix from image urls for bedrock multimodal embeddings

Fix https://github.com/BerriAI/litellm/issues/6515

* feat: initial commit for fireworks ai audio transcription support

Relevant issue: https://github.com/BerriAI/litellm/issues/7134

* test: initial fireworks ai test

* feat(fireworks_ai/): implemented fireworks ai audio transcription config

* fix(utils.py): register fireworks ai audio transcription config, in config manager

* fix(utils.py): add fireworks ai param translation to 'get_optional_params_transcription'

* refactor(fireworks_ai/): define text completion route with model name handling

moves model name handling to specific fireworks routes, as required by their api

* refactor(fireworks_ai/chat): define transform_Request - allows fixing model if accounts/ is missing

* fix: fix linting errors

* fix: fix linting errors

* fix: fix linting errors

* fix: fix linting errors

* fix(handler.py): fix linting errors

* fix(main.py): fix tgai text completion route

* refactor(together_ai/completion): refactors together ai text completion route to just use provider transform request

* refactor: move test_fine_tuning_api out of local_testing

reduces local testing ci/cd time
2024-12-25 18:35:34 -08:00
Krish Dholakia
9237357bcc
Litellm dev 12 25 2024 p1 (#7411)
* test(test_watsonx.py): e2e unit test for watsonx custom header

covers https://github.com/BerriAI/litellm/issues/7408

* fix(common_utils.py): handle auth token already present in headers (watsonx + openai-like base handler)

Fixes https://github.com/BerriAI/litellm/issues/7408

* fix(watsonx/chat): fix chat route

Fixes https://github.com/BerriAI/litellm/issues/7408

* fix(huggingface/chat/handler.py): fix huggingface async completion calls

* Correct handling of max_retries=0 to disable AzureOpenAI retries (#7379)

* test: fix test

---------

Co-authored-by: Minh Duc <phamminhduc0711@gmail.com>
2024-12-25 17:36:30 -08:00
Krish Dholakia
39dabb2e89
Litellm dev 12 24 2024 p4 (#7407)
* fix(invoke_handler.py): fix mock response iterator to handle tool calling

returns tool call if returned by model response

* fix(prometheus.py): add new 'tokens_by_tag' metric on prometheus

allows tracking 'token usage' by task

* feat(prometheus.py): add input + output token tracking by tag

* feat(prometheus.py): add tag based deployment failure tracking

allows admin to track failure by use-case
2024-12-24 20:24:06 -08:00
Ishaan Jaff
81be0b4090
(Feat) add `"/v1/batches/{batch_id:path}/cancel" endpoint (#7406)
* use 1 file for azure batches handling

* add cancel_batch endpoint

* add a cancel batch on open ai

* add cancel_batch endpoint

* add cancel batches to test

* remove unused imports

* test_batches_operations

* update test_batches_operations
2024-12-24 20:23:50 -08:00
Krish Dholakia
3ac54483a7
Litellm dev 12 24 2024 p3 (#7403)
* fix(model_prices_and_context_window.json): specify meta llama is a bedrock converse model route

Fixes https://github.com/BerriAI/litellm/issues/7385

* test(test_get_model_info.py): enforce all new bedrock chat models added have the bedrock_converse route

Prevents https://github.com/BerriAI/litellm/issues/7385 and https://github.com/BerriAI/litellm/discussions/7325

* fix(get_supported_openai_params.py): use vertex gemini config by default for vertex ai route

Fixes https://github.com/BerriAI/litellm/issues/7378

* refactor(vertex_ai/gemini/): rename vertexaiconfig to vertexaibaseconfig - make it clear vertexaiconfig = vertexgemini config

* build(model_prices_and_context_window.json): add gpt-4o-audio-preview-2024-12-17

Closes https://github.com/BerriAI/litellm/issues/7367

* test: fix test

* test: fix o1 tests

* fix: handle llm api errors

* fix: fix linting errors
2024-12-24 18:07:53 -08:00
Krish Dholakia
c3edfc2c92
LiteLLM Minor Fixes & Improvements (12/23/2024) - p3 (#7394)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 35s
* build(model_prices_and_context_window.json): add gemini-1.5-flash context caching

* fix(context_caching/transformation.py): just use last identified cache point

Fixes https://github.com/BerriAI/litellm/issues/6738

* fix(context_caching/transformation.py): pick first contiguous block - handles system message error from google

Fixes https://github.com/BerriAI/litellm/issues/6738

* fix(vertex_ai/gemini/): track context caching tokens

* refactor(gemini/): place transformation.py inside `chat/` folder

make it easy for user to know we support the equivalent endpoint

* fix: fix import

* refactor(vertex_ai/): move vertex_ai cost calc inside vertex_ai/ folder

make it easier to see cost calculation logic

* fix: fix linting errors

* fix: fix circular import

* feat(gemini/cost_calculator.py): support gemini context caching cost calculation

generifies anthropic's cost calculation function and uses it across anthropic + gemini

* build(model_prices_and_context_window.json): add cost tracking for gemini-1.5-flash-002 w/ context caching

Closes https://github.com/BerriAI/litellm/issues/6891

* docs(gemini.md): add gemini context caching architecture diagram

make it easier for user to understand how context caching works

* docs(gemini.md): link to relevant gemini context caching code

* docs(gemini/context_caching): add readme in github, make it easy for dev to know context caching is supported + where to go for code

* fix(llm_cost_calc/utils.py): handle gemini 128k token diff cost calc scenario

* fix(deepseek/cost_calculator.py): support deepseek context caching cost calculation

* test: fix test
2024-12-23 22:02:52 -08:00
Ishaan Jaff
05b0d2026f
(feat) Add cost tracking for /batches requests OpenAI (#7384)
* add basic logging for create`batch`

* add create_batch as a call type

* add basic dd logging for batches

* basic batch creation logging on DD

* batch endpoints add cost calc

* fix batches_async_logging

* separate folder for batches testing

* new job for batches tests

* test batches logging

* fix validation logic

* add vertex_batch_completions.jsonl

* test test_async_create_batch

* test_async_create_batch

* update tests

* test_completion_with_no_model

* remove dead code

* update load_vertex_ai_credentials

* test_avertex_batch_prediction

* update get async httpx client

* fix get_async_httpx_client

* update test_avertex_batch_prediction

* fix batches testing config.yaml

* add google deps

* fix vertex files handler
2024-12-23 17:47:26 -08:00
Krish Dholakia
db59e08958
Litellm dev 12 23 2024 p1 (#7383)
* feat(guardrails_endpoint.py): new `/guardrails/list` endpoint

Allow users to view what the available guardrails are

* docs: document new `/guardrails/list` endpoint

* docs(enterprise.md): update docs

* fix(openai/transcription/handler.py): support cost tracking on vtt + srt formats

* fix(openai/transcriptions/handler.py): default to 'verbose_json' response format if 'text' or 'json' response_format received. ensures 'duration' param is received for all audio transcription requests

* fix: fix linting errors

* fix: remove unused import
2024-12-23 16:33:31 -08:00
Krish Dholakia
3671829e39
Complete 'requests' library removal (#7350)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 12s
* refactor: initial commit moving watsonx_text to base_llm_http_handler + clarifying new provider directory structure

* refactor(watsonx/completion/handler.py): move to using base llm http handler

removes 'requests' library usage

* fix(watsonx_text/transformation.py): fix result transformation

migrates to transformation.py, for usage with base llm http handler

* fix(streaming_handler.py): migrate watsonx streaming to transformation.py

ensures streaming works with base llm http handler

* fix(streaming_handler.py): fix streaming linting errors and remove watsonx conditional logic

* fix(watsonx/): fix chat route post completion route refactor

* refactor(watsonx/embed): refactor watsonx to use base llm http handler for embedding calls as well

* refactor(base.py): remove requests library usage from litellm

* build(pyproject.toml): remove requests library usage

* fix: fix linting errors

* fix: fix linting errors

* fix(types/utils.py): fix validation errors for modelresponsestream

* fix(replicate/handler.py): fix linting errors

* fix(litellm_logging.py): handle modelresponsestream object

* fix(streaming_handler.py): fix modelresponsestream args

* fix: remove unused imports

* test: fix test

* fix: fix test

* test: fix test

* test: fix tests

* test: fix test

* test: fix patch target

* test: fix test
2024-12-22 07:21:25 -08:00
Krish Dholakia
404bf2974b
Litellm dev 2024 12 20 p1 (#7335)
* fix(utils.py): e2e azure tts cost tracking working

moves tts response obj to include hidden params (allows for litellm call id, etc. to be sent in response headers) ; fixes spend_Tracking_utils logging payload to account for non-base model use-case

Fixes https://github.com/BerriAI/litellm/issues/7223

* fix: fix linting errors

* build(model_prices_and_context_window.json): add bedrock llama 3.3

Closes https://github.com/BerriAI/litellm/issues/7329

* fix(openai.py): fix return type for sync openai httpx response

* test: update test

* fix(spend_tracking_utils.py): fix if check

* fix(spend_tracking_utils.py): fix if check

* test: improve debugging for test

* fix: fix import
2024-12-20 21:22:31 -08:00
Ishaan Jaff
6107f9f3f3
[Bug fix ]: Triton /infer handler incompatible with batch responses (#7337)
* migrate triton to base llm http handler

* clean up triton handler.py

* use transform functions for triton

* add TritonConfig

* get openai params for triton

* use triton embedding config

* test_completion_triton_generate_api

* test_completion_triton_infer_api

* fix TritonConfig doc string

* use TritonResponseIterator

* fix triton embeddings

* docs triton chat usage
2024-12-20 20:59:40 -08:00
Krish Dholakia
27a4d08604
Litellm dev 2024 12 19 p3 (#7322)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 13s
* fix(utils.py): remove unsupported optional params (if drop_params=True) before passing into map openai params

Fixes https://github.com/BerriAI/litellm/issues/7242

* test: new test for langfuse prompt management hook

Addresses https://github.com/BerriAI/litellm/issues/3893#issuecomment-2549080296

* feat(main.py): add 'get_chat_completion_prompt' customlogger hook

allows for langfuse prompt management

Addresses https://github.com/BerriAI/litellm/issues/3893#issuecomment-2549080296

* feat(langfuse_prompt_management.py): working e2e langfuse prompt management

works with `langfuse/` route

* feat(main.py): initial tracing for dynamic langfuse params

allows admin to specify langfuse keys by model in model_list

* feat(main.py): support passing langfuse credentials dynamically

* fix(langfuse_prompt_management.py): create langfuse client based on dynamic callback params

allows dynamic langfuse params to work

* fix: fix linting errors

* docs(prompt_management.md): refactor docs for sdk + proxy prompt management tutorial

* docs(prompt_management.md): cleanup doc

* docs: cleanup topnav

* docs(prompt_management.md): update docs to be easier to use

* fix: remove unused imports

* docs(prompt_management.md): add architectural overview doc

* fix(litellm_logging.py): fix dynamic param passing

* fix(langfuse_prompt_management.py): fix linting errors

* fix: fix linting errors

* fix: use typing_extensions for typealias to ensure python3.8 compatibility

* test: use stream_options in test to account for tiktoken diff

* fix: improve import error message, and check run test earlier
2024-12-20 13:30:16 -08:00
Ishaan Jaff
617ac63d14
(feat) add infinity rerank models (#7321)
* Support Infinity Reranker (custom reranking models) (#7247)

* Support Infinity Reranker

* Clean code

* Included transformation.py

* Clean code

* Added Infinity reranker test

* Clean code

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>

* transform_rerank_response

* update handler.py

* infinity rerank updates

* ci/cd run again

* add infinity unit tests

* docs add instruction on how to add a new provider for rerank

---------

Co-authored-by: Hao Shan <53949959+haoshan98@users.noreply.github.com>
2024-12-19 18:30:28 -08:00
Ishaan Jaff
5f15b0aa20
(code refactor) - Add BaseRerankConfig. Use BaseRerankConfig for cohere/rerank and azure_ai/rerank (#7319)
* add base rerank config

* working sync cohere rerank

* update rerank types

* update base rerank config

* remove old rerank

* add new cohere handler.py

* add cohere rerank transform

* add get_provider_rerank_config

* add rerank to base llm http handler

* add rerank utils

* add arerank to llm http handler.py

* add AzureAIRerankConfig

* updates rerank config

* update test rerank

* fix unused imports

* update get_provider_rerank_config

* test_basic_rerank_caching

* fix unused import

* test rerank
2024-12-19 17:03:34 -08:00
Ishaan Jaff
c7f14e936a
(code quality) run ruff rule to ban unused imports (#7313)
* remove unused imports

* fix AmazonConverseConfig

* fix test

* fix import

* ruff check fixes

* test fixes

* fix testing

* fix imports
2024-12-19 12:33:42 -08:00
Krish Dholakia
62b00cf28d
o1 - add image param handling (#7312)
* fix(openai.py): fix returning o1 non-streaming requests

fixes issue where fake stream always true for o1

* build(model_prices_and_context_window.json): add 'supports_vision' for o1 models

* fix: add internal server error exception mapping

* fix(base_llm_unit_tests.py): drop temperature from test

* test: mark prompt caching as a flaky test
2024-12-19 11:22:25 -08:00
Krish Dholakia
5253f639cd
fix(health.md): add rerank model health check information (#7295)
* fix(health.md): add rerank model health check information

* build(model_prices_and_context_window.json): add gemini 2.0 for google ai studio - pricing + commercial rate limits

* build(model_prices_and_context_window.json): add gemini-2.0 supports audio output = true

* docs(team_model_add.md): clarify allowing teams to add models is an enterprise feature

* fix(o1_transformation.py): add support for 'n', 'response_format' and 'stop' params for o1 and 'stream_options' param for o1-mini

* build(model_prices_and_context_window.json): add 'supports_system_message' to supporting openai models

needed as o1-preview, and o1-mini models don't support 'system message

* fix(o1_transformation.py): translate system message based on if o1 model supports it

* fix(o1_transformation.py): return 'stream' param support if o1-mini/o1-preview

o1 currently doesn't support streaming, but the other model versions do

Fixes https://github.com/BerriAI/litellm/issues/7292

* fix(o1_transformation.py): return tool calling/response_format in supported params if model map says so

Fixes https://github.com/BerriAI/litellm/issues/7292

* fix: fix linting errors

* fix: update '_transform_messages'

* fix(o1_transformation.py): fix provider passed for supported param checks

* test(base_llm_unit_tests.py): skip test if api takes >5s to respond

* fix(utils.py): return false in 'supports_factory' if can't find value

* fix(o1_transformation.py): always return stream + stream_options as supported params + handle stream options being passed in for azure o1

* feat(openai.py): support stream faking natively in openai handler

Allows o1 calls to be faked for just the "o1" model, allows native streaming for o1-mini, o1-preview

 Fixes https://github.com/BerriAI/litellm/issues/7292

* fix(openai.py): use inference param instead of original optional param
2024-12-18 19:18:10 -08:00
Krish Dholakia
6a45ee1ef7
fix(hosted_vllm/transformation.py): return fake api key, if none give… (#7301)
* fix(hosted_vllm/transformation.py): return fake api key, if none give. Prevents httpx error

Fixes https://github.com/BerriAI/litellm/issues/7291

* test: fix test

* fix(main.py): add hosted_vllm/ support for embeddings endpoint

Closes https://github.com/BerriAI/litellm/issues/7290

* docs(vllm.md): add docs on vllm embeddings usage

* fix(__init__.py): fix sambanova model test

* fix(base_llm_unit_tests.py): skip pydantic obj test if model takes >5s to respond
2024-12-18 18:41:53 -08:00
Ishaan Jaff
7a5dd29fe0
(fix) unable to pass input_type parameter to Voyage AI embedding mode (#7276)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 46s
* VoyageEmbeddingConfig

* fix voyage logic to get params

* add voyage embedding transformation

* add get_provider_embedding_config

* use BaseEmbeddingConfig

* voyage clean up

* use llm http handler for embedding transformations

* test_voyage_ai_embedding_extra_params

* add voyage async

* test_voyage_ai_embedding_extra_params

* add async for llm http handler

* update BaseLLMEmbeddingTest

* test_voyage_ai_embedding_extra_params

* fix linting

* fix get_provider_embedding_config

* fix anthropic text test

* update location of base/chat/transformation

* fix import path

* fix IBMWatsonXAIConfig
2024-12-17 19:23:49 -08:00
Krish Dholakia
179d2f56b7
LiteLLM Minor Fixes & Improvements (12/16/2024) - p1 (#7263)
* fix(factory.py): skip empty text blocks for bedrock user messages

Fixes https://github.com/BerriAI/litellm/issues/7169

* Add support for Gemini 2.0 GoogleSearch tool (#7257)

* Add support for google_search tool in gemini 2.0

* Add/modify tests

* Fix grounding check

* Remove 2.0 grounding test; exclude experimental model in VERTEX_MODELS_TO_NOT_TEST

* Swap order of tools

* DFix formatting

* fix(get_api_base.py): return api base in streaming response

Fixes https://github.com/BerriAI/litellm/issues/7249

Closes https://github.com/BerriAI/litellm/pull/7250

* fix(cost_calculator.py): only set base model to model if not none

Fixes https://github.com/BerriAI/litellm/issues/7223

* fix(cost_calculator.py): enforce stricter order when picking model for cost calculation

* fix(cost_calculator.py): fix '_select_model_name_for_cost_calc' to return model name with region name prefix if provided

* fix(utils.py): fix 'get_model_info()' to handle edge case where model name starts with custom llm provider AND custom llm provider is given

* fix(cost_calculator.py): handle `custom_llm_provider-` scenario

* fix(cost_calculator.py): e2e working tts cost tracking

ensures initial message is passed in, to cost calculator

* fix(factory.py): suppress linting errors

* fix(cost_calculator.py): strip llm provider from model name after selecting cost calc model

* fix(litellm_logging.py): store initial request in 'input' field + accept base_model to be passed in litellm_params directly

* test: handle none env var value in flaky test

* fix(litellm_logging.py): fix linting errors

---------

Co-authored-by: Sam B <samlingx@gmail.com>
2024-12-17 15:33:36 -08:00
Krish Dholakia
019ddc32d6
Litellm dev 12 17 2024 p2 (#7277)
* fix(openai/transcription/handler.py): call 'log_pre_api_call' on async calls

* fix(openai/transcriptions/handler.py): call 'logging.pre_call' on sync whisper calls as well

* fix(proxy_cli.py): remove default proxy_cli timeout param

gets passed in as a dynamic request timeout and overrides config values

* fix(langfuse.py): pass litellm httpx client - contains ssl certs (#7052)

Fixes https://github.com/BerriAI/litellm/issues/7046
2024-12-17 14:05:14 -08:00