Commit graph

75 commits

Author SHA1 Message Date
Krish Dholakia
d58fe5a9f9 Add OpenAI gpt-4o-transcribe support (#9517)
* refactor: introduce new transformation config for gpt-4o-transcribe models

* refactor: expose new transformation configs for audio transcription

* ci: fix config yml

* feat(openai/transcriptions): support provider config transformation on openai audio transcriptions

allows gpt-4o and whisper audio transformation to work as expected

* refactor: migrate fireworks ai + deepgram to new transform request pattern

* feat(openai/): working support for gpt-4o-audio-transcribe

* build(model_prices_and_context_window.json): add gpt-4o-transcribe to model cost map

* build(model_prices_and_context_window.json): specify what endpoints are supported for `/audio/transcriptions`

* fix(get_supported_openai_params.py): fix return

* refactor(deepgram/): migrate unit test to deepgram handler

* refactor: cleanup unused imports

* fix(get_supported_openai_params.py): fix linting error

* test: update test
2025-03-26 23:10:25 -07:00
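
A minimal sketch of what this PR enables via the LiteLLM SDK: calling the new model through `litellm.transcription`. The audio file path is a placeholder, and `OPENAI_API_KEY` is assumed to be set in the environment.

```python
import litellm

# "speech.wav" is a placeholder path; OPENAI_API_KEY is read from the env.
with open("speech.wav", "rb") as audio_file:
    response = litellm.transcription(
        model="gpt-4o-transcribe",  # model added to the cost map in this PR
        file=audio_file,
    )
print(response.text)
```
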
Krish Dholakia
9c083e7d2c Support Gemini audio token cost tracking + fix openai audio input token cost tracking (#9535)
* fix(vertex_and_google_ai_studio_gemini.py): log gemini audio tokens in usage object

enables accurate cost tracking

* refactor(vertex_ai/cost_calculator.py): refactor 128k+ token cost calculation to only run if model info has it

Google has moved away from this for gemini-2.0 models

* refactor(vertex_ai/cost_calculator.py): migrate to usage object for more flexible data passthrough

* fix(llm_cost_calc/utils.py): support audio token cost tracking in generic cost per token

enables vertex ai cost tracking to work with audio tokens

* fix(llm_cost_calc/utils.py): default to total prompt tokens if text tokens field not set

* refactor(llm_cost_calc/utils.py): move openai cost tracking to generic cost per token

more consistent behaviour across providers

* test: add unit test for gemini audio token cost calculation

* ci: bump ci config

* test: fix test
2025-03-26 17:26:25 -07:00
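
A hedged sketch of the cost-tracking flow this PR fixes: once audio tokens are logged in the usage object, `litellm.completion_cost` prices them through the generic cost-per-token path.

```python
import litellm

# Any Gemini call whose usage object includes audio tokens now feeds those
# tokens into the generic cost-per-token calculation fixed in this PR.
response = litellm.completion(
    model="gemini/gemini-2.0-flash",
    messages=[{"role": "user", "content": "hi"}],
)
print(litellm.completion_cost(completion_response=response))
```
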
Ishaan Jaff
435a89dd79 transform_responses_api_request 2025-03-20 12:28:55 -07:00
Ishaan Jaff
1567e52185 add should_fake_stream 2025-03-20 09:54:26 -07:00
Krrish Dholakia
b228456b67 feat(azure/gpt_transformation.py): add azure audio model support
Closes https://github.com/BerriAI/litellm/issues/6305
2025-03-19 22:57:49 -07:00
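
A hedged sketch of an Azure audio-model call this commit enables; the deployment name and the `modalities`/`audio` params are illustrative assumptions, not taken from the commit.

```python
import litellm

# Deployment name is an assumption; AZURE_API_KEY / AZURE_API_BASE /
# AZURE_API_VERSION are read from the environment.
response = litellm.completion(
    model="azure/gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[{"role": "user", "content": "Say hello"}],
)
```
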
Ishaan Jaff
08cb68c8fb fix import hashlib 2025-03-19 21:08:19 -07:00
Ishaan Jaff
55e669d7d8 get_openai_client_cache_key 2025-03-18 18:35:50 -07:00
Ishaan Jaff
4bac8f53a5 fix common utils 2025-03-18 17:59:46 -07:00
Ishaan Jaff
9f31177a20 use common caching logic for openai/azure clients 2025-03-18 17:57:03 -07:00
Ishaan Jaff
ef91a0c72b use common logic for re-using openai clients 2025-03-18 17:56:32 -07:00
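
A hedged sketch of the client-reuse idea behind these commits: derive a cache key from every parameter that affects client construction, so calls with identical settings share one OpenAI/Azure client instead of re-initializing it. Names and fields here are illustrative, not the exact helpers.

```python
from typing import Optional

from openai import AsyncOpenAI

_client_cache: dict = {}

def get_openai_client_cache_key(
    api_key: Optional[str],
    api_base: Optional[str],
    timeout: Optional[float],
    organization: Optional[str],
) -> str:
    # Every param that affects client construction must appear in the key.
    return f"{api_key}:{api_base}:{timeout}:{organization}"

def get_cached_openai_client(
    api_key: Optional[str] = None,
    api_base: Optional[str] = None,
    timeout: Optional[float] = None,
    organization: Optional[str] = None,
) -> AsyncOpenAI:
    key = get_openai_client_cache_key(api_key, api_base, timeout, organization)
    if key not in _client_cache:
        _client_cache[key] = AsyncOpenAI(
            api_key=api_key,
            base_url=api_base,
            timeout=timeout,
            organization=organization,
        )
    return _client_cache[key]
```
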
Ishaan Jaff
b3bc8a3231 test_completion_azure_ad_token 2025-03-18 12:25:32 -07:00
Ishaan Jaff
1ede6080ef fix logic for initializing openai clients 2025-03-18 10:23:30 -07:00
Ishaan Jaff
3aa1e78ec3 handle _get_async_http_client for OpenAI 2025-03-18 08:56:08 -07:00
Krish Dholakia
72f92853e0 Merge branch 'main' into litellm_dev_03_12_2025_p1 2025-03-12 22:14:02 -07:00
Krrish Dholakia
f41cc5691d refactor: update method signature 2025-03-12 15:23:38 -07:00
Krish Dholakia
103b3cb574 Merge branch 'main' into litellm_dev_03_10_2025_p3 2025-03-12 14:56:01 -07:00
Ishaan Jaff
f75022ab59 test_validate_environment 2025-03-12 12:57:40 -07:00
Ishaan Jaff
9e47ec53fa Optional[Dict] 2025-03-12 12:29:13 -07:00
Ishaan Jaff
75a1281f77 ResponsesAPIStreamEvents 2025-03-11 23:42:35 -07:00
Ishaan Jaff
64bac26582 add debug logging 2025-03-11 23:13:10 -07:00
Ishaan Jaff
7b1172d9b9 add async streaming support 2025-03-11 20:00:42 -07:00
Ishaan Jaff
5dac3a5d3b working basic openai response api request 2025-03-11 17:37:19 -07:00
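
The commits above build out the LiteLLM Responses API surface. A hedged sketch of the async streaming path they add (`OPENAI_API_KEY` assumed set):

```python
import asyncio

import litellm

async def main():
    # Async streaming against the Responses API, per the commits above.
    stream = await litellm.aresponses(
        model="openai/gpt-4o",
        input="Write a one-sentence summary of LiteLLM.",
        stream=True,
    )
    async for event in stream:  # events typed via ResponsesAPIStreamEvents
        print(type(event).__name__)

asyncio.run(main())
```
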
Krrish Dholakia
2f262ed9b4 refactor(azure.py): refactor to have client init work across all endpoints 2025-03-11 17:27:24 -07:00
Ishaan Jaff
cb2b49f7a3 transform_response_api_response 2025-03-11 17:03:31 -07:00
Ishaan Jaff
e4d177d65d add transform_response_api_response 2025-03-11 16:53:18 -07:00
Ishaan Jaff
03765d334c add transform_request for OpenAI responses API 2025-03-11 16:33:26 -07:00
Ishaan Jaff
73a94058ab add validate_environment, get_complete_url 2025-03-11 16:12:36 -07:00
Ishaan Jaff
8dfd1dc136 working transform 2025-03-11 15:24:42 -07:00
Ishaan Jaff
2fbcf88fda add OpenAIResponsesAPIConfig 2025-03-11 15:10:34 -07:00
Krrish Dholakia
cc0606b38d feat(openai.py): bubble all error information back to client 2025-03-10 15:27:43 -07:00
Krrish Dholakia
bb2fa73609 refactor: instrument body param to bubble up on exception 2025-03-10 15:21:04 -07:00
Krish Dholakia
5bd6e93ac7 Support format param for specifying image type (#9019)
* fix(transformation.py): support a 'format' parameter for images

allows the user to specify the mime type

* fix: pass mimetype via 'format' param

* feat(gemini/chat/transformation.py): support 'format' param for gemini

* fix(factory.py): support 'format' param on sync bedrock converse calls

* feat(bedrock/converse_transformation.py): support 'format' param for bedrock async calls

* refactor(factory.py): move to supporting 'format' param in base helper

ensures consistency in param support

* feat(gpt_transformation.py): filter out 'format' param

don't send invalid param to openai

* fix(gpt_transformation.py): fix translation

* fix: fix translation error
2025-03-05 19:52:53 -08:00
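
A hedged sketch of the `format` param from PR #9019: pass an explicit mime type alongside the image payload. The image URL is a placeholder.

```python
import litellm

response = litellm.completion(
    model="gemini/gemini-2.0-flash",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/cat.png",  # placeholder
                    "format": "image/png",  # explicit mime type, per this PR
                },
            },
        ],
    }],
)
```
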
Krish Dholakia
b43b8dc21c Litellm dev 03 04 2025 p3 (#8997)
* fix(core_helpers.py): handle litellm_metadata instead of 'metadata'

* feat(batches/): ensure batches logs are written to db

makes batches response dict compatible

* fix(cost_calculator.py): handle batch response being a dictionary

* fix(batches/main.py): modify retrieve endpoints to use @client decorator

enables logging to work on retrieve call

* fix(batches/main.py): fix retrieve batch response type to be 'dict' compatible

* fix(spend_tracking_utils.py): send unique uuid for retrieve batch call type

create batch and retrieve batch share the same id

* fix(spend_tracking_utils.py): prevent duplicate retrieve batch calls from being double counted

* refactor(batches/): refactor cost tracking for batches - do it on retrieve, and within the established litellm_logging pipeline

ensures cost is always logged to db

* fix: fix linting errors

* fix: fix linting error
2025-03-04 21:58:03 -08:00
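
A hedged sketch of the batches flow this PR instruments: cost is tracked on the retrieve call, inside the standard litellm_logging pipeline. The input file id is a placeholder.

```python
import litellm

batch = litellm.create_batch(
    completion_window="24h",
    endpoint="/v1/chat/completions",
    input_file_id="file-abc123",  # placeholder file id
    custom_llm_provider="openai",
)
# Per this PR, the retrieve call carries the @client decorator, so cost
# logging to the db happens here.
retrieved = litellm.retrieve_batch(
    batch_id=batch.id,
    custom_llm_provider="openai",
)
```
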
Krrish Dholakia
225f5124db build: merge branch 2025-03-02 08:31:57 -08:00
Ishaan Jaff
2237751838 [Bug]: Deepseek error on proxy after upgrading to 1.61.13-stable (#8860)
* fix deepseek error

* test_deepseek_provider_async_completion

* fix get_complete_url
2025-02-26 21:11:06 -08:00
Krish Dholakia
f230de9c33 fix(o_series_transformation.py): fix optional param check for o-serie… (#8787)
* fix(o_series_transformation.py): fix optional param check for o-series models

o3-mini and o1 do not support parallel tool calling

* fix(utils.py): support 'drop_params' for 'thinking' param across models

allows switching to older claude versions (or non-anthropic models) and param to be safely dropped

* fix: fix passing thinking param in optional params

allows dropping thinking_param where not applicable

* test: update old model

* fix(utils.py): fix linting errors

* fix(main.py): add param to acompletion
2025-02-26 12:26:55 -08:00
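
A hedged sketch of the `drop_params` behaviour for `thinking` described above: the param is dropped for models that don't support it, so switching down to an older Claude version stays safe.

```python
import litellm

response = litellm.completion(
    model="anthropic/claude-3-5-sonnet-20240620",  # older model, no 'thinking'
    messages=[{"role": "user", "content": "hi"}],
    thinking={"type": "enabled", "budget_tokens": 1024},
    drop_params=True,  # silently drops 'thinking' where not applicable
)
```
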
Ishaan Jaff
684d6c8c42 (Bug Fix) Using LiteLLM Python SDK with model=litellm_proxy/ for embedding, image_generation, transcription, speech, rerank (#8815)
* test_litellm_gateway_from_sdk

* fix embedding check for openai

* test litellm proxy provider

* fix image generation openai compatible models

* fix litellm.transcription

* test_litellm_gateway_from_sdk_rerank

* docs litellm python sdk

* docs litellm python sdk with proxy

* test_litellm_gateway_from_sdk_rerank

* ci/cd run again

* test_litellm_gateway_from_sdk_image_generation

* test_litellm_gateway_from_sdk_embedding

* test_litellm_gateway_from_sdk_embedding
2025-02-25 16:22:37 -08:00
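
A minimal sketch of the pattern this fix covers: routing a non-chat SDK call through a running LiteLLM proxy with the `litellm_proxy/` prefix. The `api_base` and `api_key` values are placeholders.

```python
import litellm

response = litellm.embedding(
    model="litellm_proxy/text-embedding-ada-002",
    input=["hello world"],
    api_base="http://localhost:4000",  # placeholder proxy URL
    api_key="sk-1234",                 # placeholder proxy key
)
```
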
Krish Dholakia
d041f30b4f feat(openai/o_series_transformation.py): support native streaming for all openai o-series models (#8552)
o1 now supports streaming
2025-02-14 20:04:19 -08:00
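
A short sketch of the change: o-series models now stream natively instead of being fake-streamed.

```python
import litellm

stream = litellm.completion(
    model="o1",
    messages=[{"role": "user", "content": "hi"}],
    stream=True,  # native streaming, per this commit
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```
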
Krish Dholakia
65aed4d0e1 Litellm dev 02 10 2025 p2 (#8443)
* Fixed issue #8246 (#8250)

* Fixed issue #8246

* Added unit tests for discard() and for remove_callback_from_list_by_object()

* fix(openai.py): support dynamic passing of organization param to openai

handles scenario where client-side org id is passed to openai

---------

Co-authored-by: Erez Hadad <erezh@il.ibm.com>
2025-02-10 17:53:46 -08:00
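
A hedged sketch of the dynamic `organization` param fix: a client-side org id is forwarded to OpenAI on a per-request basis. The org id is a placeholder.

```python
import litellm

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "hi"}],
    organization="org-example123",  # placeholder client-side org id
)
```
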
Krish Dholakia
7228cca663 Litellm dev 02 07 2025 p3 (#8387)
* add back streaming for base o3 (#8361)

* test(base_llm_unit_tests.py): add base test for o-series models - ensure streaming always works

* fix(base_llm_unit_tests.py): fix test for o series models

* refactor: move test

---------

Co-authored-by: Matteo Boschini <12133566+mbosc@users.noreply.github.com>
2025-02-07 22:45:45 -08:00
Ishaan Jaff
4aaf763d0d (Feat) - Add /bedrock/invoke support for all Anthropic models (#8383)
* use anthropic transformation for bedrock/invoke

* use anthropic transforms for bedrock invoke claude

* TestBedrockInvokeClaudeJson

* add AmazonAnthropicClaudeStreamDecoder

* pass bedrock_invoke_provider to make_call

* fix _get_base_bedrock_model

* fix get_bedrock_route

* fix bedrock routing

* fixes for bedrock invoke

* test_all_model_configs

* fix AWSEventStreamDecoder linting

* fix code qa

* test_bedrock_get_base_model

* test_get_model_info_bedrock_models

* test_bedrock_base_model_helper

* test_bedrock_route_detection
2025-02-07 22:41:11 -08:00
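
A hedged sketch of the `bedrock/invoke` route this PR adds for Anthropic models; the model id is an example, and AWS credentials are read from the environment.

```python
import litellm

response = litellm.completion(
    model="bedrock/invoke/anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": "hi"}],
)
```
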
Krish Dholakia
6745100c49 Fix deepseek calling - refactor to use base_llm_http_handler (#8266)
* refactor(deepseek/): move deepseek to base llm http handler

Fixes https://github.com/BerriAI/litellm/issues/8128#issuecomment-2635430457

* fix(gpt_transformation.py): support stream parsing for gpt-like calls

* test(test_deepseek_completion.py): add async streaming test

* fix(gpt_transformation.py): fix import

* fix(gpt_transformation.py): return full api base and content type
2025-02-04 22:30:00 -08:00
Krish Dholakia
20495cb415 test(base_llm_unit_tests.py): add test to ensure drop params is respe… (#8224)
* test(base_llm_unit_tests.py): add test to ensure drop params is respected

* fix(types/prometheus.py): use typing_extensions for python3.8 compatibility

* build: add cherry picked commits
2025-02-03 16:04:44 -08:00
Krish Dholakia
8900b18504 Complete o3 model support (#8183)
* fix(o_series_transformation.py): add 'reasoning_effort' as o series model param

Closes https://github.com/BerriAI/litellm/issues/8182

* fix(main.py): ensure `reasoning_effort` is a mapped openai param

* refactor(azure/): rename o1_[x] files to o_series_[x]

* refactor(base_llm_unit_tests.py): refactor testing for o series reasoning effort

* test(test_azure_o_series.py): have azure o series tests correctly inherit from base o series model tests

* feat(base_utils.py): support translating 'developer' role to 'system' role for non-openai providers

Makes it easy to switch from openai to anthropic

* fix: fix linting errors

* fix(base_llm_unit_tests.py): fix test

* fix(main.py): add missing param
2025-02-02 22:36:37 -08:00
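
A hedged sketch combining two changes from this PR: `reasoning_effort` as a mapped o-series param, and the `developer` role, which is translated to `system` for providers that lack it.

```python
import litellm

response = litellm.completion(
    model="o3-mini",
    reasoning_effort="medium",  # mapped openai param, per this PR
    messages=[
        {"role": "developer", "content": "Be concise."},
        {"role": "user", "content": "hi"},
    ],
)
```
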
Krish Dholakia
73c909a69f Improved O3 + Azure O3 support (#8181)
* fix: support azure o3 model family for fake streaming workaround (#8162)

* fix: support azure o3 model family for fake streaming workaround

* refactor: rename helper to is_o_series_model for clarity

* update function calling parameters for o3 models (#8178)

* refactor(o1_transformation.py): refactor o1 config to be o series config, expand o series model check to o3

ensures max_tokens is correctly translated for o3

* feat(openai/): refactor o1 files to be 'o_series' files

expands naming to cover o3

* fix(azure/chat/o1_handler.py): azure openai is an instance of openai - was causing resets

* test(test_azure_o_series.py): assert stream faked for azure o3 mini

Resolves https://github.com/BerriAI/litellm/pull/8162

* fix(o1_transformation.py): fix o1 transformation logic to handle explicit o1_series routing

* docs(azure.md): update doc with `o_series/` model name

---------

Co-authored-by: byrongrogan <47910641+byrongrogan@users.noreply.github.com>
Co-authored-by: Low Jian Sheng <15527690+lowjiansheng@users.noreply.github.com>
2025-02-01 09:52:28 -08:00
Ishaan Jaff
ef6ab91ac2 (Fixes) OpenAI Streaming Token Counting + Fixes usage track when litellm.turn_off_message_logging=True (#8156)
* working streaming usage tracking

* fix test_async_chat_openai_stream_options

* fix await asyncio.sleep(1)

* test_async_chat_azure

* fix s3 logging

* fix get_stream_options

* fix get_stream_options

* fix streaming handler

* test_stream_token_counting_with_redaction

* fix codeql concern
2025-01-31 15:06:37 -08:00
Krish Dholakia
96488ae118 Fix custom pricing - separate provider info from model info (#7990)
* fix(utils.py): initial commit fixing custom cost tracking

refactors out provider specific model info from `get_model_info` - this was causing custom costs to be registered incorrectly

* fix(utils.py): cleanup `_supports_factory` to check provider info, if model info is None

some providers support features like vision across all models

* fix(utils.py): refactor to use _supports_factory

* test: update testing

* fix: fix linting errors

* test: fix testing
2025-01-25 21:49:28 -08:00
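
A hedged sketch of the custom-pricing flow this fix addresses: registering a model so `get_model_info` picks up its costs. The per-token numbers are placeholders.

```python
import litellm

litellm.register_model({
    "my-custom-model": {
        "litellm_provider": "openai",
        "input_cost_per_token": 0.0000015,   # placeholder
        "output_cost_per_token": 0.0000060,  # placeholder
    }
})
cost = litellm.completion_cost(
    model="my-custom-model",
    prompt="hi",
    completion="hello there",
)
print(cost)
```
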
Ishaan Jaff
359a4ee3a9 (Feat) Add x-litellm-overhead-duration-ms and "x-litellm-response-duration-ms" in response from LiteLLM (#7899)
* add track_llm_api_timing

* add track_llm_api_timing

* test_litellm_overhead

* use ResponseMetadata class for setting hidden params and response overhead

* instrument http handler

* fix track_llm_api_timing

* track_llm_api_timing

* emit response overhead on hidden params

* fix resp metadata

* fix make_sync_openai_embedding_request

* test_aaaaatext_completion_endpoint fixes

* _get_value_from_hidden_params

* set_hidden_params

* test_litellm_overhead

* test_litellm_overhead

* test_litellm_overhead

* fix import

* test_litellm_overhead_stream

* add LiteLLMLoggingObject

* use diff folder for testing

* use diff folder for overhead testing

* test litellm overhead

* use typing

* clear typing

* test_litellm_overhead

* fix async_streaming

* update_response_metadata

* move test file

* apply metadata to the response object
2025-01-21 20:27:55 -08:00
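
A hedged sketch of reading the new duration values on the SDK side, where they surface via hidden params (on the proxy they arrive as the response headers named in the PR title). The exact hidden-param key below is an assumption.

```python
import litellm

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "hi"}],
)
hidden = getattr(response, "_hidden_params", {}) or {}
# Key name is an assumption based on the header name in the PR title.
print(hidden.get("litellm_overhead_time_ms"))
```
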
Krish Dholakia
4c1d4acabc Litellm dev 01 20 2025 p1 (#7884)
* fix(initial-test-to-return-api-timeout-value-in-openai-timeout-exception): Makes it easier for user to debug why request timed out

* feat(openai.py): return timeout value + time taken on openai timeout errors

helps debug timeout errors

* fix(utils.py): fix num retries extraction logic when num_retries = 0

* fix(config_settings.md): litellm_logging.py

support printing payload to console if 'LITELLM_PRINT_STANDARD_LOGGING_PAYLOAD' is true

enables easier debugging

* test(test_auth_checks.py): remove common-checks UserAPIKeyAuth enforcement check

* fix(litellm_logging.py): fix linting error
2025-01-20 21:45:48 -08:00
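
A minimal sketch of the debug flag added here: set the env var and the standard logging payload is printed to the console.

```python
import os

# Enable before the litellm call so the payload prints to the console.
os.environ["LITELLM_PRINT_STANDARD_LOGGING_PAYLOAD"] = "True"

import litellm

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "hi"}],
)
```
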
Krish Dholakia
9bc4001c51 LiteLLM Minor Fixes & Improvements (2024/16/01) (#7826)
* fix(lm_studio/chat/transformation.py): Fix https://github.com/BerriAI/litellm/issues/7811

* fix(router.py): fix mock timeout check

* fix: drop model name from fallback args since it causes a conflict with the model=model that is provided later on. (#7806)

This error happens if you provide multiple fallback models to the completion function with model name defined in each one.

* fix(router.py): remove mock_timeout before sending to request

prevents reuse in fallbacks

* test: update test

* test: revert test change - wrong pr

---------

Co-authored-by: Dudu Lasry <david1542@users.noreply.github.com>
2025-01-17 20:59:21 -08:00
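
A hedged sketch of the per-request fallbacks path fixed in #7806: each fallback entry may define its own model, and the fix drops that name before re-invoking completion so it no longer conflicts.

```python
import litellm

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "hi"}],
    fallbacks=[{"model": "gpt-3.5-turbo"}],  # fallback with its own model name
)
```
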