Commit graph

1610 commits

Author SHA1 Message Date
Krish Dholakia
c8aa876785
fix(proxy_server.py): fix get model info when litellm_model_id is set + move model analytics to free (#7886)
* fix(proxy_server.py): fix get model info when litellm_model_id is set

Fixes https://github.com/BerriAI/litellm/issues/7873

* test(test_models.py): add test to ensure get model info on specific deployment has same value as all model info

Fixes https://github.com/BerriAI/litellm/issues/7873

* fix(usage.tsx): make model analytics free

Fixes @iqballx's feedback

* fix(invoke_handler.py): fix bedrock error chunk parsing - return correct bedrock status code and error message if chunk in stream

Improves bedrock stream error handling

* fix(proxy_server.py): fix linting errors

* test(test_auth_checks.py): remove redundant test

* fix(proxy_server.py): fix linting errors

* test: fix flaky test

* test: fix test
2025-01-21 08:19:07 -08:00
Ishaan Jaff
0295f494b6
(e2e testing + minor refactor) - Virtual Key Max budget check (#7888)
* use helper _virtual_key_max_budget_check

* e2e testing for budget exceeded errors

* e2e budget testing

* test_chat_completion_budget_update

* test_chat_completion_high_budget
2025-01-21 06:47:26 -08:00
Krish Dholakia
64e1df1f14
Litellm dev 01 20 2025 p3 (#7890)
* fix(router.py): pass stream timeout correctly for non openai / azure models

Fixes https://github.com/BerriAI/litellm/issues/7870

* test(test_router_timeout.py): add test for streaming

* test(test_router_timeout.py): add unit testing for new router functions

* docs(ollama.md): link to section on calling ollama within docker container

* test: remove redundant test

* test: fix test to include timeout value

* docs(config_settings.md): document new router settings param
2025-01-20 21:46:36 -08:00
Krish Dholakia
4b23420a20
Litellm dev 01 20 2025 p1 (#7884)
* fix(initial-test-to-return-api-timeout-value-in-openai-timeout-exception): Makes it easier for user to debug why request timed out

* feat(openai.py): return timeout value + time taken on openai timeout errors

helps debug timeout errors

* fix(utils.py): fix num retries extraction logic when num_retries = 0

* fix(config_settings.md): litellm_logging.py

support printing payload to console if 'LITELLM_PRINT_STANDARD_LOGGING_PAYLOAD' is true

Enables easier debugging

* test(test_auth_checks.py): remove common checks userapikeyauth enforcement check

* fix(litellm_logging.py): fix linting error
2025-01-20 21:45:48 -08:00
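The `num_retries = 0` fix in 4b23420a20 points at a common Python pitfall: an explicit `0` is falsy, so `value or default` silently replaces it. The sketch below is illustrative only; `get_num_retries` and `DEFAULT_NUM_RETRIES` are hypothetical names, not the actual code in `utils.py`.

```python
from typing import Optional

# Hypothetical default, chosen only for illustration.
DEFAULT_NUM_RETRIES = 2


def get_num_retries(kwargs: dict) -> int:
    """Return the caller's retry count, keeping an explicit 0."""
    value: Optional[int] = kwargs.get("num_retries")
    # `value or DEFAULT_NUM_RETRIES` would discard an explicit 0,
    # so compare against None instead of relying on truthiness.
    return DEFAULT_NUM_RETRIES if value is None else value
```

With this check, passing `{"num_retries": 0}` disables retries instead of falling back to the default.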
Ishaan Jaff
806df5d31c
(Feat) datadog_llm_observability callback - emit request_tags on logs (#7883)
* dd - emit tags on llm obs payload

* dd - show requester tags on traces

* test_get_datadog_tags

* _get_datadog_tags

* fix dd POD_NAME

* test_get_datadog_tags
2025-01-20 20:36:27 -08:00
Krish Dholakia
4b88635372
fix(fireworks_ai/): fix global disable flag with transform messages helper (#7847)
Fixes an issue where `.get()` returning None prevented the global disable flag from being picked up
2025-01-20 20:16:11 -08:00
Krish Dholakia
dca6904937
JWT Auth - enforce_rbac support + UI team view, spend calc fix (#7863)
* fix(user_dashboard.tsx): fix spend calculation when team selected

sum all team keys, not user keys

* docs(admin_ui_sso.md): fix docs tabbing

* feat(user_api_key_auth.py): introduce new 'enforce_rbac' param on jwt auth

allows proxy admin to prevent any unmapped yet authenticated jwt tokens from calling proxy

Fixes https://github.com/BerriAI/litellm/issues/6793

* test: more unit testing + refactoring

* fix: fix returning id when obj not found in db

* fix(user_api_key_auth.py): add end user id tracking from jwt auth

* docs(token_auth.md): add doc on rbac with JWTs

* fix: fix unused params

* test: remove old test
2025-01-19 21:28:55 -08:00
Krish Dholakia
c306c2e0fc
Auth checks on invalid fallback models (#7871)
* fix(user_api_key_auth.py): handle clientside fallback model when item in list is dictionary

* fix(auth_checks.py): help user find invalid model names during dev

Ensure fallbacks work in prod

* fix(user_api_key_auth.py): fix linting check

* fix: cleanup unused variables

* fix: fix import

* fix(auth_checks.py): fix auth check
2025-01-19 21:28:10 -08:00
Krish Dholakia
3a7b13efa2
feat(health_check.py): set upperbound for api when making health check call (#7865)
* feat(health_check.py): set upperbound for api when making health check call

prevents a bad model from hanging the health check and causing pod restarts

* fix(health_check.py): cleanup task once completed

* fix(constants.py): bump default health check timeout to 1min

* docs(health.md): add 'health_check_timeout' to health docs on litellm

* build(proxy_server_config.yaml): add bad model to health check
2025-01-18 19:47:43 -08:00
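The upper bound described in 3a7b13efa2 can be sketched with `asyncio.wait_for`, which cancels the awaited call once the timeout elapses. This is a minimal sketch, not the code in `health_check.py`: `run_health_check` and `HEALTH_CHECK_TIMEOUT_SECONDS` are hypothetical names, and the 60-second value only mirrors the "1min" default mentioned in the commit message.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical constant; 60s mirrors the "1min" default in the commit message.
HEALTH_CHECK_TIMEOUT_SECONDS = 60.0


async def run_health_check(
    check: Callable[[], Awaitable[object]],
    timeout: float = HEALTH_CHECK_TIMEOUT_SECONDS,
) -> dict:
    """Run one health check call, reporting unhealthy if it exceeds `timeout`."""
    try:
        # wait_for cancels the underlying call when the timeout expires,
        # so a hanging model cannot block the health endpoint indefinitely.
        await asyncio.wait_for(check(), timeout=timeout)
        return {"healthy": True}
    except asyncio.TimeoutError:
        return {"healthy": False, "error": f"health check exceeded {timeout}s"}
    except Exception as exc:
        return {"healthy": False, "error": str(exc)}
```

Returning a result instead of raising keeps one bad deployment from failing the whole health report.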
Krish Dholakia
e67f18b153
LiteLLM Minor Fixes & Improvements (01/18/2025) - p1 (#7857)
* OllamaChatConfig supports JSON schema response format in optional parameters (#7832)

* fix(types/router.py): handle none values for bool types

Fixes https://github.com/BerriAI/litellm/issues/7855#issuecomment-2599781974

* test: handle no hf token in env

---------

Co-authored-by: trislaz <35226192+trislaz@users.noreply.github.com>
2025-01-18 19:03:50 -08:00
Ishaan Jaff
2fdbcca9ae
e2e ui testing fixes
2025-01-18 07:46:55 -08:00
Krish Dholakia
1bea338597
LiteLLM Minor Fixes & Improvements (2024/16/01) (#7826)
* fix(lm_studio/chat/transformation.py): Fix https://github.com/BerriAI/litellm/issues/7811

* fix(router.py): fix mock timeout check

* fix: drop model name from fallback args since it causes a conflict with the model=model that is provided later on. (#7806)

This error happens if you provide multiple fallback models to the completion function with model name defined in each one.

* fix(router.py): remove mock_timeout before sending to request

prevents reuse in fallbacks

* test: update test

* test: revert test change - wrong pr

---------

Co-authored-by: Dudu Lasry <david1542@users.noreply.github.com>
2025-01-17 20:59:21 -08:00
Krish Dholakia
80f7af510b
Improve Proxy Resiliency: Cooldown single-deployment model groups if 100% calls failed in high traffic (#7823)
* refactor(_is_cooldown_required): move '_is_cooldown_required' into cooldown_handlers.py

* refactor(cooldown_handlers.py): move cooldown constants into `.constants.py`

* fix(cooldown_handlers.py): remove the "don't cooldown if single deployment" logic

move to traffic based cooldown logic

Addresses https://github.com/BerriAI/litellm/issues/7822

* fix: add unit tests for '_should_cooldown_deployment'

* test: ensure all tests pass

* test: update test

* fix(cooldown_handlers.py): don't cooldown single deployment models for anything besides traffic related errors

* fix(cooldown_handlers.py): fix cooldown handler logic

* fix(cooldown_handlers.py): fix check
2025-01-17 20:17:02 -08:00
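The traffic-based rule in the title of 80f7af510b (cool down a single-deployment model group only when 100% of calls failed under high traffic) can be sketched as follows. The names `DeploymentStats`, `should_cooldown_single_deployment`, and the threshold value are hypothetical and chosen for illustration; the real logic lives in `cooldown_handlers.py` and may differ.

```python
from dataclasses import dataclass

# Hypothetical threshold for "high traffic"; the real constant may differ.
MIN_REQUESTS_FOR_COOLDOWN = 1000


@dataclass
class DeploymentStats:
    successes: int
    failures: int


def should_cooldown_single_deployment(
    stats: DeploymentStats,
    min_requests: int = MIN_REQUESTS_FOR_COOLDOWN,
) -> bool:
    """Cool down only if traffic is high and every call in the window failed."""
    total = stats.successes + stats.failures
    if total < min_requests:
        # Too little traffic to conclude the deployment is down.
        return False
    return stats.failures == total
```

A single success in the window keeps the deployment in rotation, which matters when there is no other deployment to fall back to.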
Krish Dholakia
c4ff0b6487
refactor: make bedrock image transformation requests async (#7840)
* refactor: initial commit for using separate sync vs. async transformation routes for bedrock

ensures no blocking calls e.g. when converting image url to b64

* perf(converse_transformation.py): make bedrock converse transformation async

asyncifies the bedrock message transformation - useful for handling image urls for bedrock

* fix(converse_handler.py): fix logging for async streaming

* style: cleanup unused imports
2025-01-17 20:14:15 -08:00
Krish Dholakia
71c41f8f33
QA: ensure all bedrock regional models have same supported_ as base + Anthropic nested pydantic object support (#7844)
* build: ensure all regional bedrock models have same supported values as base bedrock model

prevents drift

* test(base_llm_unit_tests.py): add testing for nested pydantic objects

* fix(test_utils.py): add test_get_potential_model_names

* fix(anthropic/chat/transformation.py): support nested pydantic objects

Fixes https://github.com/BerriAI/litellm/issues/7755
2025-01-17 19:49:12 -08:00
Ishaan Jaff
2c117264a2
[Hashicorp - secret manager] - use vault namespace for tls auth (#7834)
* hcorp - use x-vault-namespace

* _get_tls_cert_auth_body

* HCP_VAULT_CERT_ROLE

* test_hashicorp_secret_manager_tls_cert_auth

* HCP_VAULT_CERT_ROLE
2025-01-17 19:27:56 -08:00
Ishaan Jaff
d3c2f4331a
(UI - View SpendLogs Table) (#7842)
* litellm log messages / responses

* add messages/response to schema.prisma

* add support for logging messages / responses in DB

* test_spend_logs_payload_with_prompts_enabled

* _get_messages_for_spend_logs_payload

* ui_view_spend_logs endpoint

* add tanstack and moment

* add uiSpendLogsCall

* ui view logs table

* ui view spendLogs table

* ui_view_spend_logs

* fix code quality

* test_spend_logs_payload_with_prompts_enabled

* _get_messages_for_spend_logs_payload

* test_spend_logs_payload_with_prompts_enabled

* test_spend_logs_payload_with_prompts_enabled

* ui view spend logs

* minor ui fix

* ui - update leftnav

* ui - clean up ui

* fix leftnav

* ui fix navbar

* ui fix moving chat ui tab
2025-01-17 18:53:45 -08:00
yujonglee
7584369fbe
add key and team level budget (#7831)
2025-01-17 09:04:12 -08:00
Ishaan Jaff
632ba92af1
Revert "fix: fix test"
This reverts commit 0642a78abb.
2025-01-17 07:21:19 -08:00
Ishaan Jaff
b30e05b54f
Revert "test_completion_mistral_api_mistral_large_function_call"
This reverts commit ef9177f0a8.
2025-01-17 07:20:46 -08:00
Krrish Dholakia
0642a78abb
fix: fix test
2025-01-17 07:16:32 -08:00
Ishaan Jaff
c8febaca2e
test_watsonx_token_in_env_var
2025-01-16 22:28:37 -08:00
Ishaan Jaff
7f63e7c15a
test_completion_mistral_api_mistral_large_function_call
2025-01-16 22:27:48 -08:00
Ishaan Jaff
b492551d3d
(fix) IBM Watsonx using ZenApiKey (#7821)
* ibm watsonx fix

* test ZenAPIKey

* fix zenapikey
2025-01-16 22:02:36 -08:00
Ishaan Jaff
5b36985c00
run ci/cd again
2025-01-16 22:02:03 -08:00
Ishaan Jaff
ef9177f0a8
test_completion_mistral_api_mistral_large_function_call
2025-01-16 21:50:56 -08:00
Ishaan Jaff
117256d264
test_async_vertexai_streaming_response
2025-01-16 21:45:12 -08:00
Ishaan Jaff
5458a2ff33
fireworks ai use llama-v3p1-8b-instruct
2025-01-16 21:28:44 -08:00
Krrish Dholakia
8ab1335ae0
test: fix unit test
2025-01-16 21:11:17 -08:00
Ishaan Jaff
2f38e72026
test commit on main
2025-01-16 20:52:55 -08:00
Krish Dholakia
c57266c9dc
test: initial commit enforcing testing on all anthropic pass through … (#7794)
* test: initial commit enforcing testing on all anthropic pass through functions

prevents future regressions

* test(test_unit_test_anthropic_pass_through.py): add unit test for '_get_user_from_metadata' function

* test(test_unit_test_anthropic_passthrough.py): add unit test for handle_logging_anthropic_collected_chunks

* test(test_unit_test_anthropic_pass_through): add coverage for all anthropic pass through functions
2025-01-15 22:02:35 -08:00
Krish Dholakia
843cd3b7c6
test: initial test to enforce all functions in user_api_key_auth.py h… (#7797)
* test: initial test to enforce all functions in user_api_key_auth.py have direct testing

* test(test_user_api_key_auth.py): add is_allowed_route unit test

* test(test_user_api_key_auth.py): add more tests

* test(test_user_api_key_auth.py): add complete testing coverage for all functions in `user_api_key_auth.py`

* test(test_db_schema_changes.py): add a unit test to ensure all db schema changes are backwards compatible

gives user an easy rollback path

* test: fix schema compatibility test filepath

* test: fix test
2025-01-15 21:52:45 -08:00
Krish Dholakia
80d6bbec29
Litellm dev 01 14 2025 p2 (#7772)
* feat(pass_through_endpoints.py): fix anthropic end user cost tracking

* fix(anthropic/chat/transformation.py): use returned provider model for anthropic

handles anthropic `-latest` tag in request body throwing cost calculation errors

ensures we can be accurate in our model cost tracking

* feat(model_prices_and_context_window.json): add gemini-2.0-flash-thinking-exp pricing

* test: update test to use assumption that user_api_key_dict can get anthropic user id

* test: fix test

* fix: fix test

* fix(anthropic_pass_through.py): uncomment previous anthropic end-user cost tracking code block

can't guarantee user api key dict always has end user id - too many code paths

* fix(user_api_key_auth.py): this allows end user id from request body to always be read and set in auth object

* fix(auth_check.py): fix linting error

* test: fix auth check

* fix(auth_utils.py): fix get end user id to handle metadata = None
2025-01-15 21:34:50 -08:00
Krish Dholakia
fe60a38c8e
Litellm dev 01 2025 p4 (#7776)
* fix(gemini/): support gemini 'frequency_penalty' and 'presence_penalty'

Closes https://github.com/BerriAI/litellm/issues/7748

* feat(proxy_server.py): new env var to disable prisma health check on startup

* test: fix test
2025-01-14 21:49:25 -08:00
Krish Dholakia
8353caa485
build(pyproject.toml): bump uvicorn dependency requirement (#7773)
* build(pyproject.toml): bump uvicorn dependency requirement

Fixes https://github.com/BerriAI/litellm/issues/7768

* fix(anthropic/chat/transformation.py): fix is_vertex_request check to actually use optional param passed in

Fixes https://github.com/BerriAI/litellm/issues/6898#issuecomment-2590860695

* fix(o1_transformation.py): fix azure o1 'is_o1_model' check to just check for o1 in model string

https://github.com/BerriAI/litellm/issues/7743

* test: load vertex creds
2025-01-14 21:47:11 -08:00
Ishaan Jaff
30bb4c4cdd
(fix) BaseAWSLLM - cache IAM role credentials when used (#7775)
* fix base aws llm

* fix auth with aws role

* test aws base llm

* fix base aws llm init

* run ci/cd again

* fix get_credentials

* ci/cd run again

* _auth_with_aws_role
2025-01-14 20:16:22 -08:00
Ishaan Jaff
5fbbf47581
(Feat) prometheus - emit remaining team budget metric on proxy startup (#7777)
* fix get_paginated_teams

* use _initialize_remaining_budget_metrics

* fix prom metric

* run ci/cd again

* fix run async func

* fix _initialize_prometheus_startup_metrics

* fix _initialize_prometheus_startup_metrics

* prom unit tests

* test_get_paginated_teams
2025-01-14 20:08:23 -08:00
Krish Dholakia
35919d9fec
Litellm dev 01 13 2025 p2 (#7758)
* fix(factory.py): fix bedrock document url check

Make check more generic - if starts with 'text' or 'application' assume it's a document and let it go through

Fixes https://github.com/BerriAI/litellm/issues/7746

* feat(key_management_endpoints.py): support writing new key alias to aws secret manager - on key rotation

adds rotation endpoint to aws key management hook - allows for rotated litellm virtual keys with new key alias to be written to it

* feat(key_management_event_hooks.py): support rotating keys and updating secret manager

* refactor(base_secret_manager.py): support rotate secret at the base level

since it's just an abstraction function, it's easy to implement at the base manager level

* style: cleanup unused imports
2025-01-14 17:04:01 -08:00
Krish Dholakia
7b27cfb0ae
Support temporary budget increases on keys (#7754)
* fix(gpt_transformation.py): fix response_format translation check for 4o models

Fixes https://github.com/BerriAI/litellm/issues/7616

* feat(key_management_endpoints.py): support 'temp_budget_increase' and 'temp_budget_expiry' fields

Allow proxy admin to grant temporary budget increases to keys

* fix(proxy/_types.py): enforce temp_budget_increase and temp_budget_expiry are always passed together

* feat(user_api_key_auth.py): initial working temp budget increase logic

ensures key budget exceeded error checks for temp budget in key metadata

* feat(proxy_server.py): return the key max budget and key spend in the response headers

Allows clientside user to know their remaining limits

* test: add unit testing for new proxy utils

Ensures new key budget is correctly handled

* docs(temporary_budget_increase.md): add doc on temporary budget increase

* fix(utils.py): remove 3.5 from response_format check for now

not all azure 3.5 models support response_format

* fix(user_api_key_auth.py): return valid user api key auth object on all paths
2025-01-14 17:03:11 -08:00
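The temporary budget behaviour described in 7b27cfb0ae (both fields must be passed together, and the increase applies only until it expires) can be sketched as below. Only the field names `temp_budget_increase` and `temp_budget_expiry` come from the commit message; the helper names and the choice of timezone-aware datetimes are assumptions for illustration, not the proxy's actual implementation.

```python
from datetime import datetime, timezone
from typing import Optional


def validate_temp_budget(
    temp_budget_increase: Optional[float],
    temp_budget_expiry: Optional[datetime],
) -> None:
    """Reject requests that set only one of the two fields."""
    if (temp_budget_increase is None) != (temp_budget_expiry is None):
        raise ValueError(
            "temp_budget_increase and temp_budget_expiry must be passed together"
        )


def effective_max_budget(
    max_budget: float,
    temp_budget_increase: Optional[float],
    temp_budget_expiry: Optional[datetime],
    now: Optional[datetime] = None,
) -> float:
    """Return the budget to enforce, adding the increase while it is unexpired."""
    validate_temp_budget(temp_budget_increase, temp_budget_expiry)
    current = now if now is not None else datetime.now(timezone.utc)
    if temp_budget_increase is not None and temp_budget_expiry is not None:
        if current < temp_budget_expiry:
            return max_budget + temp_budget_increase
    return max_budget
```

A budget-exceeded check would then compare key spend against `effective_max_budget(...)` rather than the raw `max_budget`.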
Krish Dholakia
29663c2db5
Litellm dev 01 14 2025 p1 (#7771)
* First-class Aim Guardrails support (#7738)

* initial aim support

* add tests

* docs(langsmith_integration.md): cleanup

* style: cleanup unused imports

---------

Co-authored-by: Tomer Bin <117278227+hxtomer@users.noreply.github.com>
2025-01-14 16:18:21 -08:00
Ishaan Jaff
d510f1d517
(fix) health check - allow setting health_check_model (#7752)
* use _update_litellm_params_for_health_check

* fix Wildcard Routes

* test_update_litellm_params_for_health_check

* test_perform_health_check_with_health_check_model

* fix doc string

* huggingface/mistralai/Mistral-7B-Instruct-v0.3
2025-01-13 20:16:44 -08:00
Ishaan Jaff
9daa6fb0b4
(prometheus - minor bug fix) - litellm_llm_api_time_to_first_token_metric not populating for bedrock models (#7740)
* fix prometheus ttft

* fix test_set_latency_metrics

* fix _set_latency_metrics

* fix _set_latency_metrics

* fix test_set_latency_metrics

* test_async_log_success_event

* huggingface/mistralai/Mistral-7B-Instruct-v0.3
2025-01-13 20:16:34 -08:00
Ishaan Jaff
f1335362cf
(core sdk fix) - fix fallbacks stuck in infinite loop (#7751)
* test_acompletion_fallbacks_basic

* use common run_async_function

* fix completion_with_fallbacks

* fix completion with fallbacks

* fix fallback utils

* test_acompletion_fallbacks_basic

* test_completion_fallbacks_sync

* huggingface/mistralai/Mistral-7B-Instruct-v0.3
2025-01-13 19:34:34 -08:00
Ishaan Jaff
970e9c7507
huggingface/mistralai/Mistral-7B-Instruct-v0.3
2025-01-13 18:42:36 -08:00
Ishaan Jaff
3fe1f3b3b2
test_team_access_groups
2025-01-12 22:26:13 -08:00
Krish Dholakia
ec5a354eac
add azure o1 pricing (#7715)
* build(model_prices_and_context_window.json): add azure o1 pricing

Closes https://github.com/BerriAI/litellm/issues/7712

* refactor: replace regex with string method for whitespace check in stop-sequences handling (#7713)

* Allows overriding keep_alive time in ollama (#7079)

* Allows overriding keep_alive time in ollama

* Also adds to ollama_chat

* Adds some info on the docs about this parameter

* fix: together ai warning (#7688)

Co-authored-by: Carl Senze <carl.senze@aleph-alpha.com>

* fix(proxy_server.py): handle config containing thread locked objects when using get_config_state

* fix(proxy_server.py): add exception to debug

* build(model_prices_and_context_window.json): update 'supports_vision' for azure o1

---------

Co-authored-by: Wolfram Ravenwolf <52386626+WolframRavenwolf@users.noreply.github.com>
Co-authored-by: Regis David Souza Mesquita <github@rdsm.dev>
Co-authored-by: Carl <45709281+capsenz@users.noreply.github.com>
Co-authored-by: Carl Senze <carl.senze@aleph-alpha.com>
2025-01-12 18:15:35 -08:00
Ishaan Jaff
15b52039d2
(litellm sdk speedup router) - adds a helper _cached_get_model_group_info to use when trying to get deployment tpm/rpm limits (#7719)
* fix _cached_get_model_group_info

* fixes get_remaining_model_group_usage

* test_cached_get_model_group_info
2025-01-12 15:14:54 -08:00
Krish Dholakia
ad2f66b3e3
[BETA] Add OpenAI /images/variations + Topaz API support (#7700)
* feat(main.py): initial commit for `/image/variations` endpoint support

* refactor(base_llm/): introduce new base llm base config for image variation endpoints

* refactor(openai/image_variations/transformation.py): implement openai image variation transformation handler

* fix: test

* feat(openai/): working openai `/image/variation` endpoint calls via sdk

* feat(topaz/): topaz sync image variation call support

Addresses https://github.com/BerriAI/litellm/issues/7593

* fix(topaz/transformation.py): fix linting errors

* fix(openai/image_variations/handler.py): fix passing json data

* fix(main.py): image_variation/

support async image variation route - `aimage_variation`

* fix(test_get_model_info.py): fix test

* fix: cleanup unused imports

* feat(openai/): add async `/image/variations` endpoint support

* feat(topaz/): support async `/image/variations` calls

* fix: test

* fix(utils.py): fix get_model_info_helper for no model info w/ provider config

handles situation where model info is not known but provider config exists

* test(test_router_fallbacks.py): mark flaky test

* fix: fix unused imports

* test: bump otel load test perf threshold - accounts for current load tests hitting same server
2025-01-11 23:27:46 -08:00
Krish Dholakia
becd4bc748
Litellm dev 01 11 2025 p3 (#7702)
* fix(__init__.py): fix init to exclude pricing-only model cost values from real model names

prevents bad health checks on wildcard routes

* fix(get_llm_provider.py): fix to handle calling bedrock_converse models
2025-01-11 20:06:54 -08:00
Krish Dholakia
27892acdfc
Litellm dev 01 10 2025 p3 (#7682)
* feat(langfuse.py): log the used prompt when prompt management used

* test: fix test

* docs(self_serve.md): add doc on restricting personal key creation on ui

* feat(s3.py): support s3 logging with team alias prefixes (if available)

New preview feature

* fix(main.py): remove old if block - simplify to just await if coroutine returned

fixes lm_studio async embedding error

* fix(langfuse.py): handle get prompt check
2025-01-10 21:56:42 -08:00