* refactor: introduce new transformation config for gpt-4o-transcribe models
* refactor: expose new transformation configs for audio transcription
* ci: fix config yml
* feat(openai/transcriptions): support provider config transformation on openai audio transcriptions
allows gpt-4o and whisper audio transformation to work as expected
* refactor: migrate fireworks ai + deepgram to new transform request pattern
* feat(openai/): working support for gpt-4o-audio-transcribe
* build(model_prices_and_context_window.json): add gpt-4o-transcribe to model cost map
* build(model_prices_and_context_window.json): specify what endpoints are supported for `/audio/transcriptions`
* fix(get_supported_openai_params.py): fix return
* refactor(deepgram/): migrate unit test to deepgram handler
* refactor: cleanup unused imports
* fix(get_supported_openai_params.py): fix linting error
* test: update test
* fix(vertex_and_google_ai_studio_gemini.py): log gemini audio tokens in usage object
enables accurate cost tracking
* refactor(vertex_ai/cost_calculator.py): refactor 128k+ token cost calculation to only run if model info has it
Google has moved away from this for gemini-2.0 models
* refactor(vertex_ai/cost_calculator.py): migrate to usage object for more flexible data passthrough
* fix(llm_cost_calc/utils.py): support audio token cost tracking in generic cost per token
enables vertex ai cost tracking to work with audio tokens
* fix(llm_cost_calc/utils.py): default to total prompt tokens if text tokens field not set
* refactor(llm_cost_calc/utils.py): move openai cost tracking to generic cost per token
more consistent behaviour across providers
* test: add unit test for gemini audio token cost calculation
* ci: bump ci config
* test: fix test
* fix(transformation.py): support a 'format' parameter for image's
allow user to specify mime type
* fix: pass mimetype via 'format' param
* feat(gemini/chat/transformation.py): support 'format' param for gemini
* fix(factory.py): support 'format' param on sync bedrock converse calls
* feat(bedrock/converse_transformation.py): support 'format' param for bedrock async calls
* refactor(factory.py): move to supporting 'format' param in base helper
ensures consistency in param support
* feat(gpt_transformation.py): filter out 'format' param
don't send invalid param to openai
* fix(gpt_transformation.py): fix translation
* fix: fix translation error
* fix(core_helpers.py): handle litellm_metadata instead of 'metadata'
* feat(batches/): ensure batches logs are written to db
makes batches response dict compatible
* fix(cost_calculator.py): handle batch response being a dictionary
* fix(batches/main.py): modify retrieve endpoints to use @client decorator
enables logging to work on retrieve call
* fix(batches/main.py): fix retrieve batch response type to be 'dict' compatible
* fix(spend_tracking_utils.py): send unique uuid for retrieve batch call type
create batch and retrieve batch share the same id
* fix(spend_tracking_utils.py): prevent duplicate retrieve batch calls from being double counted
* refactor(batches/): refactor cost tracking for batches - do it on retrieve, and within the established litellm_logging pipeline
ensures cost is always logged to db
* fix: fix linting errors
* fix: fix linting error
* fix(o_series_transformation.py): fix optional param check for o-series models
o3-mini and o-1 do not support parallel tool calling
* fix(utils.py): support 'drop_params' for 'thinking' param across models
allows switching to older claude versions (or non-anthropic models) and param to be safely dropped
* fix: fix passing thinking param in optional params
allows dropping thinking_param where not applicable
* test: update old model
* fix(utils.py): fix linting errors
* fix(main.py): add param to acompletion
* Fixed issue #8246 (#8250)
* Fixed issue #8246
* Added unit tests for discard() and for remove_callback_from_list_by_object()
* fix(openai.py): support dynamic passing of organization param to openai
handles scenario where client-side org id is passed to openai
---------
Co-authored-by: Erez Hadad <erezh@il.ibm.com>
* add back streaming for base o3 (#8361)
* test(base_llm_unit_tests.py): add base test for o-series models - ensure streaming always works
* fix(base_llm_unit_tests.py): fix test for o series models
* refactor: move test
---------
Co-authored-by: Matteo Boschini <12133566+mbosc@users.noreply.github.com>
* refactor(deepseek/): move deepseek to base llm http handler
Fixes https://github.com/BerriAI/litellm/issues/8128#issuecomment-2635430457
* fix(gpt_transformation.py): support stream parsing for gpt-like calls
* test(test_deepseek_completion.py): add async streaming test
* fix(gpt_transformation.py): fix import
* fix(gpt_transformation.py): return full api base and content type
* test(base_llm_unit_tests.py): add test to ensure drop params is respected
* fix(types/prometheus.py): use typing_extensions for python3.8 compatibility
* build: add cherry picked commits
* fix(o_series_transformation.py): add 'reasoning_effort' as o series model param
Closes https://github.com/BerriAI/litellm/issues/8182
* fix(main.py): ensure `reasoning_effort` is a mapped openai param
* refactor(azure/): rename o1_[x] files to o_series_[x]
* refactor(base_llm_unit_tests.py): refactor testing for o series reasoning effort
* test(test_azure_o_series.py): have azure o series tests correctly inherit from base o series model tests
* feat(base_utils.py): support translating 'developer' role to 'system' role for non-openai providers
Makes it easy to switch from openai to anthropic
* fix: fix linting errors
* fix(base_llm_unit_tests.py): fix test
* fix(main.py): add missing param
* fix: support azure o3 model family for fake streaming workaround (#8162)
* fix: support azure o3 model family for fake streaming workaround
* refactor: rename helper to is_o_series_model for clarity
* update function calling parameters for o3 models (#8178)
* refactor(o1_transformation.py): refactor o1 config to be o series config, expand o series model check to o3
ensures max_tokens is correctly translated for o3
* feat(openai/): refactor o1 files to be 'o_series' files
expands naming to cover o3
* fix(azure/chat/o1_handler.py): azure openai is an instance of openai - was causing resets
* test(test_azure_o_series.py): assert stream faked for azure o3 mini
Resolves https://github.com/BerriAI/litellm/pull/8162
* fix(o1_transformation.py): fix o1 transformation logic to handle explicit o1_series routing
* docs(azure.md): update doc with `o_series/` model name
---------
Co-authored-by: byrongrogan <47910641+byrongrogan@users.noreply.github.com>
Co-authored-by: Low Jian Sheng <15527690+lowjiansheng@users.noreply.github.com>
* fix(utils.py): initial commit fixing custom cost tracking
refactors out provider specific model info from `get_model_info` - this was causing custom costs to be registered incorrectly
* fix(utils.py): cleanup `_supports_factory` to check provider info, if model info is None
some providers support features like vision across all models
* fix(utils.py): refactor to use _supports_factory
* test: update testing
* fix: fix linting errors
* test: fix testing
* fix(initial-test-to-return-api-timeout-value-in-openai-timeout-exception): Makes it easier for user to debug why request timed out
* feat(openai.py): return timeout value + time taken on openai timeout errors
helps debug timeout errors
* fix(utils.py): fix num retries extraction logic when num_retries = 0
* fix(config_settings.md): litellm_logging.py
support printing payload to console if 'LITELLM_PRINT_STANDARD_LOGGING_PAYLOAD' is true
Enables easier debug
* test(test_auth_checks.py'): remove common checks userapikeyauth enforcement check
* fix(litellm_logging.py): fix linting error
* fix(lm_studio/chat/transformation.py): Fix https://github.com/BerriAI/litellm/issues/7811
* fix(router.py): fix mock timeout check
* fix: drop model name from fallback args since it causes a conflict with the model=model that is provided later on. (#7806)
This error happens if you provide multiple fallback models to the completion function with model name defined in each one.
* fix(router.py): remove mock_timeout before sending to request
prevents reuse in fallbacks
* test: update test
* test: revert test change - wrong pr
---------
Co-authored-by: Dudu Lasry <david1542@users.noreply.github.com>