litellm-mirror

mirror of https://github.com/BerriAI/litellm.git synced 2025-04-27 03:34:10 +00:00

Author	SHA1	Message	Date
Krish Dholakia	6b8b49451f	Fix azure max retries error (#8340 ) * fix(azure.py): ensure max_retries=0 is respected Fixes https://github.com/BerriAI/litellm/issues/6129 * fix(test_openai.py): add unit test to ensure openai sdk calls always respect max_retries = 0 * test(test_azure_openai.py): add unit testing for azure_text/ route * fix(azure.py): fix passing max retries on streaming * fix(azure.py): fix azure max retries on async completion + streaming * fix(completion/handler.py): fix azure text async completion + streaming * test(test_azure_openai.py): ensure azure openai max retries always respected * test(test_azure_o_series.py): add testing to ensure max retries always respected * Added gemini providers for 2.0-flash and 2.0-flash lite (#8321) * Update model_prices_and_context_window.json added gemini providers for 2.0-flash and 2.0-flash light * Update model_prices_and_context_window.json fixed URL --------- Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> * Convert tool use arguments to string before counting tokens (#6989) In at least some cases the `messages["tool_calls"]["function"]["arguments"]` is a dict, not a string. In order to tokenize it properly it needs to be a string. In the case that it is already a string this is a noop, which is also fine. * build(model_prices_and_context_window.json): add gemini 2.0 flash lite pricing * build(model_prices_and_context_window.json): add gemini commercial rate limits * fix(utils.py): fix linting error * refactor(utils.py): refactor to maintain function size --------- Co-authored-by: Bardia Khosravi <bardiakhosravi95@gmail.com> Co-authored-by: Josh Morrow <josh@jcmorrow.com>	2025-02-06 23:20:48 -08:00
Krish Dholakia	443ae55904	Azure OpenAI improvements - o3 native streaming, improved tool call + response format handling (#8292 ) * fix(convert_dict_to_response.py): only convert if response is the response_format tool call passed in Fixes https://github.com/BerriAI/litellm/issues/8241 * fix(gpt_transformation.py): makes sure response format / tools conversion doesn't remove previous tool calls * refactor(gpt_transformation.py): refactor out json schema converstion to base config keeps logic consistent across providers * fix(o_series_transformation.py): support o3 mini native streaming Fixes https://github.com/BerriAI/litellm/issues/8274 * fix(gpt_transformation.py): remove unused variables * test: update test	2025-02-05 19:38:58 -08:00
Ishaan Jaff	818792228c	(Refactor) - migrate bedrock invoke to `BaseLLMHTTPHandler` class (#8290 ) * initial transform for invoke * invoke transform_response * working - able to make request * working get_complete_url * working - invoke now runs on llm_http_handler * fix unused imports * track litellm overhead ms * working stream request * sign_request transform * sign_request update * use has_async_custom_stream_wrapper property * use get_async_custom_stream_wrapper in base llm http handler * fix make_call in invoke handler * fix invoke with streaming get_async_custom_stream_wrapper * working bedrock async streaming with invoke * fix make call handler for bedrock * test_all_model_configs * fix test_bedrock_custom_prompt_template * sync streaming for bedrock invoke * fix _add_stream_param_to_request_body * test_async_text_completion_bedrock * fix transform_request * fix get_supported_openai_params * fix test supports tool choice * fix test_supports_tool_choice * add unit test coverage for bedrock invoke transform * fix location of transformation files * update import loc * fix bedrock invoke unit tests * fix import for max completion tokens	2025-02-05 18:58:55 -08:00
Krish Dholakia	3c813b3a87	Fix deepseek calling - refactor to use base_llm_http_handler (#8266 ) * refactor(deepseek/): move deepseek to base llm http handler Fixes https://github.com/BerriAI/litellm/issues/8128#issuecomment-2635430457 * fix(gpt_transformation.py): support stream parsing for gpt-like calls * test(test_deepseek_completion.py): add async streaming test * fix(gpt_transformation.py): fix import * fix(gpt_transformation.py): return full api base and content type	2025-02-04 22:30:00 -08:00
Ishaan Jaff	3a6349d871	(Feat) - Add support for structured output on `bedrock/nova` models + add util `litellm.supports_tool_choice` (#8264 ) * fix supports_tool_choice * TestBedrockNovaJson * use supports_tool_choice * fix supports_tool_choice * add supports_tool_choice param * script to add fields to model cost map * test_supports_tool_choice * test_supports_tool_choice * fix supports tool choice check * test_supports_tool_choice_simple_tests * fix supports_tool_choice check * fix supports_tool_choice bedrock * test_supports_tool_choice * test_supports_tool_choice * fix bedrock/eu-west-3/mistral.mistral-large-2402-v1:0 * ci/cd run again * test_supports_tool_choice_simple_tests * TestGoogleAIStudioGemini temp - remove to run ci/cd * test_aaalangfuse_logging_metadata * TestGoogleAIStudioGemini * test_check_provider_match * remove add param to map	2025-02-04 21:47:16 -08:00
Ishaan Jaff	b965d9bd9a	Fix passing top_k parameter for Bedrock Anthropic models (#8131 ) (#8269 ) * Fix Bedrock Anthropic topK bug * Remove extra import * Add unit test + make tests mocked * Fix camel case * Fix tests to remove exception handling Co-authored-by: vibhavbhat <vibhavb00@gmail.com>	2025-02-04 21:16:21 -08:00
Krish Dholakia	c8494abdea	test(base_llm_unit_tests.py): add test to ensure drop params is respe… (#8224 ) * test(base_llm_unit_tests.py): add test to ensure drop params is respected * fix(types/prometheus.py): use typing_extensions for python3.8 compatibility * build: add cherry picked commits	2025-02-03 16:04:44 -08:00
Krish Dholakia	97b8de17ab	LiteLLM Minor Fixes & Improvements (01/16/2025) - p2 (#7828 ) All checks were successful Read Version from pyproject.toml / read-version (push) Successful in 14s Details * fix(vertex_ai/gemini/transformation.py): handle 'http://' image urls * test: add base test for `http:` url's * fix(factory.py/get_image_details): follow redirects allows http calls to work * fix(codestral/): fix stream chunk parsing on last chunk of stream * Azure ad token provider (#6917) * Update azure.py Added optional parameter azure ad token provider * Added parameter to main.py * Found token provider arg location * Fixed embeddings * Fixed ad token provider --------- Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> * fix: fix linting errors * fix(main.py): leave out o1 route for azure ad token provider, for now get v0 out for sync azure gpt route to begin with * test: skip http:// test for fireworks ai model does not support it * refactor: cleanup dead code * fix: revert http:// url passthrough for gemini google ai studio raises errors * test: fix test --------- Co-authored-by: bahtman <anton@baht.dk>	2025-02-02 23:17:50 -08:00
Krish Dholakia	1105e35538	Complete o3 model support (#8183 ) * fix(o_series_transformation.py): add 'reasoning_effort' as o series model param Closes https://github.com/BerriAI/litellm/issues/8182 * fix(main.py): ensure `reasoning_effort` is a mapped openai param * refactor(azure/): rename o1_[x] files to o_series_[x] * refactor(base_llm_unit_tests.py): refactor testing for o series reasoning effort * test(test_azure_o_series.py): have azure o series tests correctly inherit from base o series model tests * feat(base_utils.py): support translating 'developer' role to 'system' role for non-openai providers Makes it easy to switch from openai to anthropic * fix: fix linting errors * fix(base_llm_unit_tests.py): fix test * fix(main.py): add missing param	2025-02-02 22:36:37 -08:00
Krish Dholakia	23f458d2da	Improved O3 + Azure O3 support (#8181 ) All checks were successful Read Version from pyproject.toml / read-version (push) Successful in 13s Details * fix: support azure o3 model family for fake streaming workaround (#8162) * fix: support azure o3 model family for fake streaming workaround * refactor: rename helper to is_o_series_model for clarity * update function calling parameters for o3 models (#8178) * refactor(o1_transformation.py): refactor o1 config to be o series config, expand o series model check to o3 ensures max_tokens is correctly translated for o3 * feat(openai/): refactor o1 files to be 'o_series' files expands naming to cover o3 * fix(azure/chat/o1_handler.py): azure openai is an instance of openai - was causing resets * test(test_azure_o_series.py): assert stream faked for azure o3 mini Resolves https://github.com/BerriAI/litellm/pull/8162 * fix(o1_transformation.py): fix o1 transformation logic to handle explicit o1_series routing * docs(azure.md): update doc with `o_series/` model name --------- Co-authored-by: byrongrogan <47910641+byrongrogan@users.noreply.github.com> Co-authored-by: Low Jian Sheng <15527690+lowjiansheng@users.noreply.github.com>	2025-02-01 09:52:28 -08:00
Ishaan Jaff	795a71ce1e	fix linting error	2025-01-31 21:27:56 -08:00
Ishaan Jaff	9ff27809b2	(Feat) add bedrock/deepseek custom import models (#8132 ) All checks were successful Read Version from pyproject.toml / read-version (push) Successful in 16s Details * add support for using llama spec with bedrock * fix get_bedrock_invoke_provider * add support for using bedrock provider in mappings * working request * test_bedrock_custom_deepseek * test_bedrock_custom_deepseek * fix _get_model_id_for_llama_like_model * test_bedrock_custom_deepseek * doc DeepSeek-R1-Distill-Llama-70B * test_bedrock_custom_deepseek	2025-01-31 18:40:44 -08:00
Ishaan Jaff	2cf0daa31c	(Fixes) OpenAI Streaming Token Counting + Fixes usage track when `litellm.turn_off_message_logging=True` (#8156 ) * working streaming usage tracking * fix test_async_chat_openai_stream_options * fix await asyncio.sleep(1) * test_async_chat_azure * fix s3 logging * fix get_stream_options * fix get_stream_options * fix streaming handler * test_stream_token_counting_with_redaction * fix codeql concern	2025-01-31 15:06:37 -08:00
miraclebakelaser	527e283c51	fix: propagating json_mode to acompletion (#8133 )	2025-01-30 21:17:26 -08:00
Ishaan Jaff	395e22ea72	fix code qa check All checks were successful Read Version from pyproject.toml / read-version (push) Successful in 15s Details	2025-01-30 07:13:18 -08:00
Krish Dholakia	ba8ba9eddb	feat(databricks/chat/transformation.py): add tools and 'tool_choice' param support (#8076 ) All checks were successful Read Version from pyproject.toml / read-version (push) Successful in 38s Details * feat(databricks/chat/transformation.py): add tools and 'tool_choice' param support Closes https://github.com/BerriAI/litellm/issues/7788 * refactor: cleanup redundant file * test: mark flaky test * test: mark all parallel request tests as flaky	2025-01-29 21:09:07 -08:00
Krish Dholakia	dad24f2b52	Litellm dev 01 29 2025 p2 (#8102 ) * docs: cleanup doc * feat(bedrock/): initial commit adding bedrock/converse_like/<model> route support allows routing to a converse like endpoint Resolves https://github.com/BerriAI/litellm/issues/8085 * feat(bedrock/chat/converse_transformation.py): make converse config base config compatible enables new 'converse_like' route * feat(converse_transformation.py): enables using the proxy with converse like api endpoint Resolves https://github.com/BerriAI/litellm/issues/8085	2025-01-29 20:53:37 -08:00
Krish Dholakia	8eaa5dc797	Bedrock document processing fixes (#8005 ) * refactor(factory.py): refactor async bedrock message transformation to use async get request for image url conversion improve latency of bedrock call * test(test_bedrock_completion.py): add unit testing to ensure async image url get called for async bedrock call * refactor(factory.py): refactor bedrock translation to use BedrockImageProcessor reduces duplicate code * fix(factory.py): fix bug not allowing pdf's to be processed * fix(factory.py): fix bedrock converse document understanding with image url * docs(bedrock.md): clarify all bedrock document types are supported * refactor: cleanup redundant test + unused imports * perf: improve perf with reusable clients * test: fix test	2025-01-28 17:48:32 -08:00
Krish Dholakia	03eef5a2a0	Fix custom pricing - separate provider info from model info (#7990 ) All checks were successful Read Version from pyproject.toml / read-version (push) Successful in 34s Details * fix(utils.py): initial commit fixing custom cost tracking refactors out provider specific model info from `get_model_info` - this was causing custom costs to be registered incorrectly * fix(utils.py): cleanup `_supports_factory` to check provider info, if model info is None some providers support features like vision across all models * fix(utils.py): refactor to use _supports_factory * test: update testing * fix: fix linting errors * test: fix testing	2025-01-25 21:49:28 -08:00
Krish Dholakia	08b124aeb6	Litellm dev 01 25 2025 p2 (#8003 ) * fix(base_utils.py): supported nested json schema passed in for anthropic calls * refactor(base_utils.py): refactor ref parsing to prevent infinite loop * test(test_openai_endpoints.py): refactor anthropic test to use bedrock * fix(langfuse_prompt_management.py): add unit test for sync langfuse calls Resolves https://github.com/BerriAI/litellm/issues/7938#issuecomment-2613293757	2025-01-25 16:50:57 -08:00
Krish Dholakia	1e011b66d3	Ollama ssl verify = False + Spend Logs reliability fixes (#7931 ) All checks were successful Read Version from pyproject.toml / read-version (push) Successful in 13s Details * fix(http_handler.py): support passing ssl verify dynamically and using the correct httpx client based on passed ssl verify param Fixes https://github.com/BerriAI/litellm/issues/6499 * feat(llm_http_handler.py): support passing `ssl_verify=False` dynamically in call args Closes https://github.com/BerriAI/litellm/issues/6499 * fix(proxy/utils.py): prevent bad logs from breaking all cost tracking + reset list regardless of success/failure prevents malformed logs from causing all spend tracking to break since they're constantly retried * test(test_proxy_utils.py): add test to ensure bad log is dropped * test(test_proxy_utils.py): ensure in-memory spend logs reset after bad log error * test(test_user_api_key_auth.py): add unit test to ensure end user id as str works * fix(auth_utils.py): ensure extracted end user id is always a str prevents db cost tracking errors * test(test_auth_utils.py): ensure get end user id from request body always returns a string * test: update tests * test: skip bedrock test- behaviour now supported * test: fix testing * refactor(spend_tracking_utils.py): reduce size of get_logging_payload * test: fix test * bump: version 1.59.4 → 1.59.5 * Revert "bump: version 1.59.4 → 1.59.5" This reverts commit `1182b46b2e`. * fix(utils.py): fix spend logs retry logic * fix(spend_tracking_utils.py): fix get tags * fix(spend_tracking_utils.py): fix end user id spend tracking on pass-through endpoints	2025-01-23 23:05:41 -08:00
Krish Dholakia	851b0c4c4d	Retry for replicate completion response of status=processing (#7901 ) (#7965 ) All checks were successful Read Version from pyproject.toml / read-version (push) Successful in 34s Details We use the DEFAULT_REPLICATE_ constants for retry count and initial delay. If the completion response returns status=processing, we loop to retry. Fixes https://github.com/BerriAI/litellm/issues/7900 Signed-off-by: BJ Hargrave <hargrave@us.ibm.com> Co-authored-by: BJ Hargrave <bj@hargrave.dev>	2025-01-23 22:45:43 -08:00
Krish Dholakia	c6e9240405	Add datadog health check support + fix bedrock converse cost tracking w/ region name specified (#7958 ) * fix(bedrock/converse_handler.py): fix bedrock region name on async calls * fix(utils.py): fix split model handling Fixes bedrock cost calculation when region name is given * feat(_health_endpoints.py): support health checking datadog integration Closes https://github.com/BerriAI/litellm/issues/7921	2025-01-23 22:17:09 -08:00
Krish Dholakia	76795dba39	Deepseek r1 support + watsonx qa improvements (#7907 ) * fix(types/utils.py): support returning 'reasoning_content' for deepseek models Fixes https://github.com/BerriAI/litellm/issues/7877#issuecomment-2603813218 * fix(convert_dict_to_response.py): return deepseek response in provider_specific_field allows for separating openai vs. non-openai params in model response * fix(utils.py): support 'provider_specific_field' in delta chunk as well allows deepseek reasoning content chunk to be returned to user from stream as well Fixes https://github.com/BerriAI/litellm/issues/7877#issuecomment-2603813218 * fix(watsonx/chat/handler.py): fix passing space id to watsonx on chat route * fix(watsonx/): fix watsonx_text/ route with space id * fix(watsonx/): qa item - also adds better unit testing for watsonx embedding calls * fix(utils.py): rename to '..fields' * fix: fix linting errors * fix(utils.py): fix typing - don't show provider-specific field if none or empty - prevents default respons e from being non-oai compatible * fix: cleanup unused imports * docs(deepseek.md): add docs for deepseek reasoning model	2025-01-21 23:13:15 -08:00
Ishaan Jaff	b6f2e659b9	(Feat) Add x-litellm-overhead-duration-ms and "x-litellm-response-duration-ms" in response from LiteLLM (#7899 ) * add track_llm_api_timing * add track_llm_api_timing * test_litellm_overhead * use ResponseMetadata class for setting hidden params and response overhead * instrument http handler * fix track_llm_api_timing * track_llm_api_timing * emit response overhead on hidden params * fix resp metadata * fix make_sync_openai_embedding_request * test_aaaaatext_completion_endpoint fixes * _get_value_from_hidden_params * set_hidden_params * test_litellm_overhead * test_litellm_overhead * test_litellm_overhead * fix import * test_litellm_overhead_stream * add LiteLLMLoggingObject * use diff folder for testing * use diff folder for overhead testing * test litellm overhead * use typing * clear typing * test_litellm_overhead * fix async_streaming * update_response_metadata * move test file * pply metadata to the response objec	2025-01-21 20:27:55 -08:00
Krish Dholakia	c8aa876785	fix(proxy_server.py): fix get model info when litellm_model_id is set + move model analytics to free (#7886 ) * fix(proxy_server.py): fix get model info when litellm_model_id is set Fixes https://github.com/BerriAI/litellm/issues/7873 * test(test_models.py): add test to ensure get model info on specific deployment has same value as all model info Fixes https://github.com/BerriAI/litellm/issues/7873 * fix(usage.tsx): make model analytics free Fixes @iqballx's feedback * fix(fix(invoke_handler.py):-fix-bedrock-error-chunk-parsing): return correct bedrock status code and error message if chunk in stream Improves bedrock stream error handling * fix(proxy_server.py): fix linting errors * test(test_auth_checks.py): remove redundant test * fix(proxy_server.py): fix linting errors * test: fix flaky test * test: fix test	2025-01-21 08:19:07 -08:00
Krish Dholakia	64e1df1f14	Litellm dev 01 20 2025 p3 (#7890 ) All checks were successful Read Version from pyproject.toml / read-version (push) Successful in 13s Details * fix(router.py): pass stream timeout correctly for non openai / azure models Fixes https://github.com/BerriAI/litellm/issues/7870 * test(test_router_timeout.py): add test for streaming * test(test_router_timeout.py): add unit testing for new router functions * docs(ollama.md): link to section on calling ollama within docker container * test: remove redundant test * test: fix test to include timeout value * docs(config_settings.md): document new router settings param	2025-01-20 21:46:36 -08:00
Krish Dholakia	4b23420a20	Litellm dev 01 20 2025 p1 (#7884 ) * fix(initial-test-to-return-api-timeout-value-in-openai-timeout-exception): Makes it easier for user to debug why request timed out * feat(openai.py): return timeout value + time taken on openai timeout errors helps debug timeout errors * fix(utils.py): fix num retries extraction logic when num_retries = 0 * fix(config_settings.md): litellm_logging.py support printing payload to console if 'LITELLM_PRINT_STANDARD_LOGGING_PAYLOAD' is true Enables easier debug * test(test_auth_checks.py'): remove common checks userapikeyauth enforcement check * fix(litellm_logging.py): fix linting error	2025-01-20 21:45:48 -08:00
Krish Dholakia	4b88635372	fix(fireworks_ai/): fix global disable flag with transform messages helper (#7847 ) fixes issue where .get() = none was preventing global disable flag from being picked up	2025-01-20 20:16:11 -08:00
Krish Dholakia	e67f18b153	LiteLLM Minor Fixes & Improvements (01/18/2025) - p1 (#7857 ) * OllamaChatConfig supports JSON schema response format in optional parameters (#7832) * fix(types/router.py): handle none values for bool types Fixes https://github.com/BerriAI/litellm/issues/7855#issuecomment-2599781974 * test: handle no hf token in env --------- Co-authored-by: trislaz <35226192+trislaz@users.noreply.github.com>	2025-01-18 19:03:50 -08:00
Krish Dholakia	1bea338597	LiteLLM Minor Fixes & Improvements (2024/16/01) (#7826 ) * fix(lm_studio/chat/transformation.py): Fix https://github.com/BerriAI/litellm/issues/7811 * fix(router.py): fix mock timeout check * fix: drop model name from fallback args since it causes a conflict with the model=model that is provided later on. (#7806) This error happens if you provide multiple fallback models to the completion function with model name defined in each one. * fix(router.py): remove mock_timeout before sending to request prevents reuse in fallbacks * test: update test * test: revert test change - wrong pr --------- Co-authored-by: Dudu Lasry <david1542@users.noreply.github.com>	2025-01-17 20:59:21 -08:00
Krish Dholakia	c4ff0b6487	refactor: make bedrock image transformation requests async (#7840 ) * refactor: initial commit for using separate sync vs. async transformation routes for bedrock ensures no blocking calls e.g. when converting image url to b64 * perf(converse_transformation.py): make bedrock converse transformation async asyncify's the bedrock message transformation - useful for handling image urls for bedrock * fix(converse_handler.py): fix logging for async streaming * style: cleanup unused imports	2025-01-17 20:14:15 -08:00
Krish Dholakia	71c41f8f33	QA: ensure all bedrock regional models have same `supported_` as base + Anthropic nested pydantic object support (#7844 ) All checks were successful Read Version from pyproject.toml / read-version (push) Successful in 13s Details * build: ensure all regional bedrock models have same supported values as base bedrock model prevents drift * test(base_llm_unit_tests.py): add testing for nested pydantic objects * fix(test_utils.py): add test_get_potential_model_names * fix(anthropic/chat/transformation.py): support nested pydantic objects Fixes https://github.com/BerriAI/litellm/issues/7755	2025-01-17 19:49:12 -08:00
Ishaan Jaff	15e0a605e4	_handle_tool_call_message linting All checks were successful Read Version from pyproject.toml / read-version (push) Successful in 13s Details	2025-01-16 22:34:16 -08:00
Ishaan Jaff	7f63e7c15a	test_completion_mistral_api_mistral_large_function_call	2025-01-16 22:27:48 -08:00
Ishaan Jaff	f3268b6b06	sec fix minor (#7810 )	2025-01-16 22:03:28 -08:00
Ishaan Jaff	b492551d3d	(fix) IBM Watsonx using ZenApiKey (#7821 ) * ibm watsonx fix * test ZenAPIKey * fix zenapikey	2025-01-16 22:02:36 -08:00
Krish Dholakia	80d6bbec29	Litellm dev 01 14 2025 p2 (#7772 ) * feat(pass_through_endpoints.py): fix anthropic end user cost tracking * fix(anthropic/chat/transformation.py): use returned provider model for anthropic handles anthropic `-latest` tag in request body throwing cost calculation errors ensures we can be accurate in our model cost tracking * feat(model_prices_and_context_window.json): add gemini-2.0-flash-thinking-exp pricing * test: update test to use assumption that user_api_key_dict can get anthropic user id * test: fix test * fix: fix test * fix(anthropic_pass_through.py): uncomment previous anthropic end-user cost tracking code block can't guarantee user api key dict always has end user id - too many code paths * fix(user_api_key_auth.py): this allows end user id from request body to always be read and set in auth object * fix(auth_check.py): fix linting error * test: fix auth check * fix(auth_utils.py): fix get end user id to handle metadata = None	2025-01-15 21:34:50 -08:00
Krish Dholakia	fe60a38c8e	Litellm dev 01 2025 p4 (#7776 ) All checks were successful Read Version from pyproject.toml / read-version (push) Successful in 13s Details * fix(gemini/): support gemini 'frequency_penalty' and 'presence_penalty' Closes https://github.com/BerriAI/litellm/issues/7748 * feat(proxy_server.py): new env var to disable prisma health check on startup * test: fix test	2025-01-14 21:49:25 -08:00
Krish Dholakia	8353caa485	build(pyproject.toml): bump uvicorn depedency requirement (#7773 ) * build(pyproject.toml): bump uvicorn depedency requirement Fixes https://github.com/BerriAI/litellm/issues/7768 * fix(anthropic/chat/transformation.py): fix is_vertex_request check to actually use optional param passed in Fixes https://github.com/BerriAI/litellm/issues/6898#issuecomment-2590860695 * fix(o1_transformation.py): fix azure o1 'is_o1_model' check to just check for o1 in model string https://github.com/BerriAI/litellm/issues/7743 * test: load vertex creds	2025-01-14 21:47:11 -08:00
Ishaan Jaff	30bb4c4cdd	(fix) `BaseAWSLLM` - cache IAM role credentials when used (#7775 ) * fix base aws llm * fix auth with aws role * test aws base llm * fix base aws llm init * run ci/cd again * fix get_credentials * ci/cd run again * _auth_with_aws_role	2025-01-14 20:16:22 -08:00
Krish Dholakia	7b27cfb0ae	Support temporary budget increases on keys (#7754 ) * fix(gpt_transformation.py): fix response_format translation check for 4o models Fixes https://github.com/BerriAI/litellm/issues/7616 * feat(key_management_endpoints.py): support 'temp_budget_increase' and 'temp_budget_expiry' fields Allow proxy admin to grant temporary budget increases to keys * fix(proxy/_types.py): enforce temp_budget_increase and temp_budget_expiry are always passed together * feat(user_api_key_auth.py): initial working temp budget increase logic ensures key budget exceeded error checks for temp budget in key metadata * feat(proxy_server.py): return the key max budget and key spend in the response headers Allows clientside user to know their remaining limits * test: add unit testing for new proxy utils Ensures new key budget is correctly handled * docs(temporary_budget_increase.md): add doc on temporary budget increase * fix(utils.py): remove 3.5 from response_format check for now not all azure 3.5 models support response_format * fix(user_api_key_auth.py): return valid user api key auth object on all paths	2025-01-14 17:03:11 -08:00
Krish Dholakia	ec5a354eac	add azure o1 pricing (#7715 ) * build(model_prices_and_context_window.json): add azure o1 pricing Closes https://github.com/BerriAI/litellm/issues/7712 * refactor: replace regex with string method for whitespace check in stop-sequences handling (#7713) * Allows overriding keep_alive time in ollama (#7079) * Allows overriding keep_alive time in ollama * Also adds to ollama_chat * Adds some info on the docs about this parameter * fix: together ai warning (#7688) Co-authored-by: Carl Senze <carl.senze@aleph-alpha.com> * fix(proxy_server.py): handle config containing thread locked objects when using get_config_state * fix(proxy_server.py): add exception to debug * build(model_prices_and_context_window.json): update 'supports_vision' for azure o1 --------- Co-authored-by: Wolfram Ravenwolf <52386626+WolframRavenwolf@users.noreply.github.com> Co-authored-by: Regis David Souza Mesquita <github@rdsm.dev> Co-authored-by: Carl <45709281+capsenz@users.noreply.github.com> Co-authored-by: Carl Senze <carl.senze@aleph-alpha.com>	2025-01-12 18:15:35 -08:00
Krish Dholakia	ad2f66b3e3	[BETA] Add OpenAI `/images/variations` + Topaz API support (#7700 ) * feat(main.py): initial commit for `/image/variations` endpoint support * refactor(base_llm/): introduce new base llm base config for image variation endpoints * refactor(openai/image_variations/transformation.py): implement openai image variation transformation handler * fix: test * feat(openai/): working openai `/image/variation` endpoint calls via sdk * feat(topaz/): topaz sync image variation call support Addresses https://github.com/BerriAI/litellm/issues/7593 ' * fix(topaz/transformation.py): fix linting errors * fix(openai/image_variations/handler.py): fix passing json data * fix(main.py): image_variation/ support async image variation route - `aimage_variation` * fix(test_get_model_info.py): fix test * fix: cleanup unused imports * feat(openai/): add async `/image/variations` endpoint support * feat(topaz/): support async `/image/variations` calls * fix: test * fix(utils.py): fix get_model_info_helper for no model info w/ provider config handles situation where model info is not known but provider config exists * test(test_router_fallbacks.py): mark flaky test * fix: fix unused imports * test: bump otel load test perf threshold - accounts for current load tests hitting same server	2025-01-11 23:27:46 -08:00
Krish Dholakia	27892acdfc	Litellm dev 01 10 2025 p3 (#7682 ) * feat(langfuse.py): log the used prompt when prompt management used * test: fix test * docs(self_serve.md): add doc on restricting personal key creation on ui * feat(s3.py): support s3 logging with team alias prefixes (if available) New preview feature * fix(main.py): remove old if block - simplify to just await if coroutine returned fixes lm_studio async embedding error * fix(langfuse.py): handle get prompt check	2025-01-10 21:56:42 -08:00
Krish Dholakia	c10ae8879e	fix(vertex_ai/gemini/transformation.py): handle 'http://' in gemini p… (#7660 ) * fix(vertex_ai/gemini/transformation.py): handle 'http://' in gemini process url * refactor(router.py): refactor '_prompt_management_factory' to use logging obj get_chat_completion logic deduplicates code * fix(litellm_logging.py): update 'get_chat_completion_prompt' to update logging object messages * docs(prompt_management.md): update prompt management to be in beta given feedback - this still needs to be revised (e.g. passing in user message, not ignoring) * refactor(prompt_management_base.py): introduce base class for prompt management allows consistent behaviour across prompt management integrations * feat(prompt_management_base.py): support adding client message to template message + refactor langfuse prompt management to use prompt management base * fix(litellm_logging.py): log prompt id + prompt variables to langfuse if set allows tracking what prompt was used for what purpose * feat(litellm_logging.py): log prompt management metadata in standard logging payload + use in langfuse allows logging prompt id / prompt variables to langfuse * test: fix test * fix(router.py): cleanup unused imports * fix: fix linting error * fix: fix trace param typing * fix: fix linting errors * fix: fix code qa check	2025-01-10 07:31:59 -08:00
Ishaan Jaff	13f364682d	(Feat - Batches API) add support for retrieving vertex api batch jobs (#7661 ) * add _async_retrieve_batch * fix aretrieve_batch * fix _get_batch_id_from_vertex_ai_batch_response * fix batches docs	2025-01-09 18:35:03 -08:00
Ishaan Jaff	a85de46ef7	(proxy - RPS) - Get 2K RPS at 4 instances, minor fix `aiohttp_openai/` (#7659 ) * speed up transform_response * use 2 workers * undo changes to uvicorn * ci/cd run again	2025-01-09 17:24:18 -08:00
Krish Dholakia	1e3370f3cb	LiteLLM Minor Fixes & Improvements (01/08/2025) - p2 (#7643 ) * fix(streaming_chunk_builder_utils.py): add test for groq tool calling + streaming + combine chunks Addresses https://github.com/BerriAI/litellm/issues/7621 * fix(streaming_utils.py): fix modelresponseiterator for openai like chunk parser ensures chunk parser uses the correct tool call id when translating the chunk Fixes https://github.com/BerriAI/litellm/issues/7621 * build(model_hub.tsx): display cost pricing on model hub * build(model_hub.tsx): show cost per token pricing + complete model information * fix(types/utils.py): fix usage object handling	2025-01-08 19:45:19 -08:00
Krish Dholakia	0ffc5379ea	Litellm dev 01 07 2025 p2 (#7622 ) * build(ui/): update ui * fix: drop unsupported non-whitespace characters for real when calling… (#7484) * fix: drop unsupported non-whitespace characters for real when calling anthropic with stop sequences * test: add parameterized test for _map_stop_sequences method in AnthropicConfig --------- Co-authored-by: Wolfram Ravenwolf <52386626+WolframRavenwolf@users.noreply.github.com>	2025-01-08 16:56:39 -08:00

... 5 6 7 8 9 ...

2236 commits