Commit graph

1949 commits

Author SHA1 Message Date
Ishaan Jaff
6cef115bb0
(Security fix) - remove code block that inserts master key hash into DB (#8268)
* remove code block upserting master key hash to db

* run test to check if key upserted into db

* run ci/cd again

* litellm_proxy_security_tests

* litellm_proxy_security_tests

* run prisma entrypoint

* ci/cd run again

* fix test master key not in db
2025-02-05 17:25:42 -08:00
Krish Dholakia
4e34fc3bf8
[BETA] Support OIDC role based access to proxy (#8260)
* feat(proxy/_types.py): add new jwt field params

allows users + services to auth into proxy

* feat(handle_jwt.py): allow team role proxy access

allows proxy admin to set allowed team roles

* fix(proxy/_types.py): add 'routes' to role based permissions

allow proxy admin to restrict what routes a team can access easily

* feat(handle_jwt.py): support more flexible role based route access

v2 on role based 'allowed_routes'

* test(test_jwt.py): add unit test for rbac for proxy routes

* feat(handle_jwt.py): ensure cost tracking always works for any jwt request with `enforce_rbac=True`

* docs(token_auth.md): add documentation on controlling model access via OIDC Roles

* test: increase time delay before retrying

* test: handle model overloaded for test
2025-02-04 21:59:39 -08:00
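
A minimal sketch of the role-based route check this adds, with hypothetical names (`ROLE_PERMISSIONS`, `is_route_allowed`) — the real logic lives in `handle_jwt.py` and reads roles and allowed routes from proxy config:

```python
# Hypothetical, simplified version of role -> allowed-routes enforcement.
import fnmatch

ROLE_PERMISSIONS = {
    "proxy_admin": ["*"],                      # admins may call any route
    "team": ["/chat/completions", "/key/*"],   # teams get a restricted set
}

def is_route_allowed(role: str, route: str) -> bool:
    """Return True if `role` may call `route`; patterns support wildcards."""
    return any(fnmatch.fnmatch(route, p) for p in ROLE_PERMISSIONS.get(role, []))

assert is_route_allowed("proxy_admin", "/model/new")
assert not is_route_allowed("team", "/model/new")
```
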
Krrish Dholakia
c743475aba build: Squashed commit of the following:
commit 3e4e2cb20a
Author: Krrish Dholakia <krrishdholakia@gmail.com>
Date:   Tue Feb 4 15:10:34 2025 -0800

    fix(proxy_server.py): fix redirect from `/sso/key/callback` to redirect on custom server path

    Fixes https://github.com/BerriAI/litellm/issues/5997
2025-02-04 21:45:33 -08:00
Ishaan Jaff
7e1b79d446
(Bug fix) - Langfuse / Callback settings stored in DB (#8251)
* fix _decrypt_and_set_db_env_variables

* fix proxy config

* test callbacks in DB

* test langfuse callbacks in db

* test_e2e_langfuse_callbacks_in_db

* proxy_store_model_in_db_tests

* fix proxy_store_model_in_db_tests

* proxy_store_model_in_db_tests

* fix store_model_db_config.yaml

* fix check_langfuse_request

* fix test langfuse base url

* ci/cd run again
2025-02-04 21:09:37 -08:00
Ishaan Jaff
8a235e7d38
(Refactor / QA) - Use LoggingCallbackManager to append callbacks and ensure no duplicate callbacks are added (#8112)
* LoggingCallbackManager

* add logging_callback_manager

* use logging_callback_manager

* add add_litellm_failure_callback

* use add_litellm_callback

* use add_litellm_async_success_callback

* add_litellm_async_failure_callback

* linting fix

* fix logging callback manager

* test_duplicate_multiple_loggers_test

* use _reset_all_callbacks

* fix testing with dup callbacks

* test_basic_image_generation

* reset callbacks for tests

* fix check for _add_custom_logger_to_list

* fix test_amazing_sync_embedding

* fix _get_custom_logger_key

* fix batches testing

* fix _reset_all_callbacks

* fix _check_callback_list_size

* add callback_manager_test

* fix test gemini-2.0-flash-thinking-exp-01-21
2025-01-30 19:35:50 -08:00
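
The deduplication idea, as an illustrative sketch (simplified; the actual `LoggingCallbackManager` handles sync/async success and failure callback lists separately):

```python
from typing import Callable, List, Union

class LoggingCallbackManager:
    """Simplified sketch: append a callback only if no equivalent exists."""

    def __init__(self) -> None:
        self._callbacks: List[Union[str, Callable]] = []

    def _key(self, cb: Union[str, Callable]) -> str:
        # strings (e.g. "langfuse") compare by value; custom loggers by class
        return cb if isinstance(cb, str) else type(cb).__name__

    def add_callback(self, cb: Union[str, Callable]) -> None:
        if self._key(cb) not in {self._key(c) for c in self._callbacks}:
            self._callbacks.append(cb)
```
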
Ishaan Jaff
b812286534
(fix) - proxy reliability, ensure duplicate callbacks are not added to proxy (#8067)
* refactor _add_callbacks_from_db_config

* fix check for _custom_logger_exists_in_litellm_callbacks

* move loc of test utils

* run ci/cd again

* test_add_custom_logger_callback_to_specific_event_with_duplicates_callbacks

* fix _custom_logger_class_exists_in_success_callbacks

* unit testing for test_add_callbacks_from_db_config

* test_custom_logger_exists_in_callbacks_individual_functions

* fix config.yml

* fix test test_stream_chunk_builder_openai_audio_output_usage - use direct dict comparison
2025-01-28 21:01:56 -08:00
Ishaan Jaff
7f2742334c
(UI) - allow assigning wildcard models to a team / key (#8041)
* fix message.error

* fix add return_wildcard_routes

* ui edit modelAvailableCall

* fetchAvailableModelsForTeamOrKey

* ui set all models for a team

* ui define common helpers

* edit create key button

* fix viewing model display names

* fix editing team models

* update gitignore

* add jest testing for ui

* Revert "add jest testing for ui"

This reverts commit 98f9a3ebfd.
2025-01-27 18:06:22 -08:00
Krish Dholakia
1e011b66d3
Ollama ssl verify = False + Spend Logs reliability fixes (#7931)
* fix(http_handler.py): support passing ssl verify dynamically and using the correct httpx client based on passed ssl verify param

Fixes https://github.com/BerriAI/litellm/issues/6499

* feat(llm_http_handler.py): support passing `ssl_verify=False` dynamically in call args

Closes https://github.com/BerriAI/litellm/issues/6499

* fix(proxy/utils.py): prevent bad logs from breaking all cost tracking + reset list regardless of success/failure

prevents malformed logs from causing all spend tracking to break since they're constantly retried

* test(test_proxy_utils.py): add test to ensure bad log is dropped

* test(test_proxy_utils.py): ensure in-memory spend logs reset after bad log error

* test(test_user_api_key_auth.py): add unit test to ensure end user id as str works

* fix(auth_utils.py): ensure extracted end user id is always a str

prevents db cost tracking errors

* test(test_auth_utils.py): ensure get end user id from request body always returns a string

* test: update tests

* test: skip bedrock test- behaviour now supported

* test: fix testing

* refactor(spend_tracking_utils.py): reduce size of get_logging_payload

* test: fix test

* bump: version 1.59.4 → 1.59.5

* Revert "bump: version 1.59.4 → 1.59.5"

This reverts commit 1182b46b2e.

* fix(utils.py): fix spend logs retry logic

* fix(spend_tracking_utils.py): fix get tags

* fix(spend_tracking_utils.py): fix end user id spend tracking on pass-through endpoints
2025-01-23 23:05:41 -08:00
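
Per the commit message, `ssl_verify` can now be passed per call rather than only globally; a hedged usage example (exact plumbing may vary by version):

```python
import litellm

# useful for self-hosted Ollama behind a self-signed certificate
response = litellm.completion(
    model="ollama/llama3",
    messages=[{"role": "user", "content": "hello"}],
    api_base="https://my-ollama-host:11434",
    ssl_verify=False,  # skip TLS verification for this call only
)
print(response.choices[0].message.content)
```
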
Krish Dholakia
513b1904ab
Add attempted-retries and timeout values to response headers + more testing (#7926)
* feat(router.py): add retry headers to response

makes it easy to add testing to ensure model-specific retries are respected

* fix(add_retry_headers.py): clarify attempted retries vs. max retries

* test(test_fallbacks.py): add test for checking if max retries set for model is respected

* test(test_fallbacks.py): assert values for attempted retries and max retries are as expected

* fix(utils.py): return timeout in litellm proxy response headers

* test(test_fallbacks.py): add test to assert model specific timeout used on timeout error

* test: add bad model with timeout to proxy

* fix: fix linting error

* fix(router.py): fix get model list from model alias

* test: loosen test restriction - account for other events on proxy
2025-01-22 22:19:44 -08:00
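
A hedged sketch of reading the new headers off a proxy response — the exact header spellings are assumptions based on the commit description:

```python
import requests

resp = requests.post(
    "http://localhost:4000/chat/completions",
    headers={"Authorization": "Bearer sk-1234"},
    json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]},
)
# attempted retries vs. max retries, plus the timeout applied to the call
print(resp.headers.get("x-litellm-attempted-retries"))
print(resp.headers.get("x-litellm-max-retries"))
print(resp.headers.get("x-litellm-timeout"))
```
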
Krish Dholakia
866fffb50d
Litellm dev 01 21 2025 p1 (#7898)
* fix(utils.py): don't pass 'anthropic-beta' header to vertex - will cause request to fail

* fix(utils.py): add flag to allow user to disable filtering invalid headers

ensure user can control behaviour

* style(utils.py): cleanup message

* test(test_utils.py): add unit test to cover invalid header filtering

* fix(proxy_server.py): fix custom openapi schema generation

* fix(utils.py): pass extra headers if set

* fix(main.py): fix image variation to use 'client' param
2025-01-21 20:36:11 -08:00
Ishaan Jaff
b6f2e659b9
(Feat) Add "x-litellm-overhead-duration-ms" and "x-litellm-response-duration-ms" in response from LiteLLM (#7899)
* add track_llm_api_timing

* add track_llm_api_timing

* test_litellm_overhead

* use ResponseMetadata class for setting hidden params and response overhead

* instrument http handler

* fix track_llm_api_timing

* track_llm_api_timing

* emit response overhead on hidden params

* fix resp metadata

* fix make_sync_openai_embedding_request

* test_aaaaatext_completion_endpoint fixes

* _get_value_from_hidden_params

* set_hidden_params

* test_litellm_overhead

* test_litellm_overhead

* test_litellm_overhead

* fix import

* test_litellm_overhead_stream

* add LiteLLMLoggingObject

* use diff folder for testing

* use diff folder for overhead testing

* test litellm overhead

* use typing

* clear typing

* test_litellm_overhead

* fix async_streaming

* update_response_metadata

* move test file

* apply metadata to the response object
2025-01-21 20:27:55 -08:00
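
On the SDK side the overhead is emitted on hidden params (per the bullets above); a hedged sketch — the key name here is an assumption:

```python
import litellm

resp = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "hi"}],
)
# key name assumed; guard with .get() in case it differs by version
print(resp._hidden_params.get("litellm_overhead_time_ms"))
```
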
Krish Dholakia
c8aa876785
fix(proxy_server.py): fix get model info when litellm_model_id is set + move model analytics to free (#7886)
* fix(proxy_server.py): fix get model info when litellm_model_id is set

Fixes https://github.com/BerriAI/litellm/issues/7873

* test(test_models.py): add test to ensure get model info on specific deployment has same value as all model info

Fixes https://github.com/BerriAI/litellm/issues/7873

* fix(usage.tsx): make model analytics free

Fixes @iqballx's feedback

* fix(invoke_handler.py): fix bedrock error chunk parsing - return correct bedrock status code and error message if chunk in stream

Improves bedrock stream error handling

* fix(proxy_server.py): fix linting errors

* test(test_auth_checks.py): remove redundant test

* fix(proxy_server.py): fix linting errors

* test: fix flaky test

* test: fix test
2025-01-21 08:19:07 -08:00
Krish Dholakia
d00febcdaa
/key/delete - allow team admin to delete team keys (#7846)
* fix(key_management_endpoints.py): fix key delete to allow team admins + other proxy admins to delete keys

Fixes https://github.com/BerriAI/litellm/issues/7760

* fix(key_management_endpoints.py): remove unused variables

* fix(key_management_endpoints.py): fix linting error
2025-01-17 20:16:12 -08:00
Ishaan Jaff
9b944ca60c
(Fix + Testing) - Add dd-trace-run to litellm ci/cd pipeline + fix bug caused by dd-trace patching OpenAI sdk (#7820)
* add dd trace to e2e docker run tests

* update dd trace v

* fix entrypoint

* dd trace fixes

* proxy_build_from_pip_tests

* build python3.13

* use py 3.13

* fix build from pip

* dd trace fix

* proxy_build_from_pip_tests

* bump build from pip
2025-01-16 22:03:09 -08:00
Krish Dholakia
fe60a38c8e
Litellm dev 01 2025 p4 (#7776)
* fix(gemini/): support gemini 'frequency_penalty' and 'presence_penalty'

Closes https://github.com/BerriAI/litellm/issues/7748

* feat(proxy_server.py): new env var to disable prisma health check on startup

* test: fix test
2025-01-14 21:49:25 -08:00
Krish Dholakia
7b27cfb0ae
Support temporary budget increases on keys (#7754)
* fix(gpt_transformation.py): fix response_format translation check for 4o models

Fixes https://github.com/BerriAI/litellm/issues/7616

* feat(key_management_endpoints.py): support 'temp_budget_increase' and 'temp_budget_expiry' fields

Allow proxy admin to grant temporary budget increases to keys

* fix(proxy/_types.py): enforce temp_budget_increase and temp_budget_expiry are always passed together

* feat(user_api_key_auth.py): initial working temp budget increase logic

ensures key budget exceeded error checks for temp budget in key metadata

* feat(proxy_server.py): return the key max budget and key spend in the response headers

Allows clientside user to know their remaining limits

* test: add unit testing for new proxy utils

Ensures new key budget is correctly handled

* docs(temporary_budget_increase.md): add doc on temporary budget increase

* fix(utils.py): remove 3.5 from response_format check for now

not all azure 3.5 models support response_format

* fix(user_api_key_auth.py): return valid user api key auth object on all paths
2025-01-14 17:03:11 -08:00
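
A hedged example of granting a temporary budget bump — the two field names come from the commit, the endpoint and payload shape are assumptions:

```python
import requests

requests.post(
    "http://localhost:4000/key/update",
    headers={"Authorization": "Bearer sk-master-key"},
    json={
        "key": "sk-user-key",
        "temp_budget_increase": 10.0,                  # extra $10 of budget
        "temp_budget_expiry": "2025-01-21T00:00:00Z",  # must be sent together
    },
)
```
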
Krish Dholakia
ec5a354eac
add azure o1 pricing (#7715)
* build(model_prices_and_context_window.json): add azure o1 pricing

Closes https://github.com/BerriAI/litellm/issues/7712

* refactor: replace regex with string method for whitespace check in stop-sequences handling (#7713)

* Allows overriding keep_alive time in ollama (#7079)

* Allows overriding keep_alive time in ollama

* Also adds to ollama_chat

* Adds some info on the docs about this parameter

* fix: together ai warning (#7688)

Co-authored-by: Carl Senze <carl.senze@aleph-alpha.com>

* fix(proxy_server.py): handle config containing thread locked objects when using get_config_state

* fix(proxy_server.py): add exception to debug

* build(model_prices_and_context_window.json): update 'supports_vision' for azure o1

---------

Co-authored-by: Wolfram Ravenwolf <52386626+WolframRavenwolf@users.noreply.github.com>
Co-authored-by: Regis David Souza Mesquita <github@rdsm.dev>
Co-authored-by: Carl <45709281+capsenz@users.noreply.github.com>
Co-authored-by: Carl Senze <carl.senze@aleph-alpha.com>
2025-01-12 18:15:35 -08:00
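
For the Ollama `keep_alive` override folded in above, a hedged usage sketch (parameter forwarding may differ by version):

```python
import litellm

litellm.completion(
    model="ollama/llama3",
    messages=[{"role": "user", "content": "hi"}],
    keep_alive="10m",  # keep the model loaded in memory for 10 minutes
)
```
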
Ishaan Jaff
02f5c44a35
[Bug fix]: Proxy Auth Layer - Allow Azure Realtime routes as llm_api_routes (#7684)
* fix route check azure realtime endpoints

* test_is_llm_api_route

* fix /realtime

* test_routes_on_litellm_proxy
2025-01-10 20:38:06 -08:00
Krish Dholakia
c10ae8879e
fix(vertex_ai/gemini/transformation.py): handle 'http://' in gemini p… (#7660)
* fix(vertex_ai/gemini/transformation.py): handle 'http://' in gemini process url

* refactor(router.py): refactor '_prompt_management_factory' to use logging obj get_chat_completion logic

deduplicates code

* fix(litellm_logging.py): update 'get_chat_completion_prompt' to update logging object messages

* docs(prompt_management.md): update prompt management to be in beta

given feedback - this still needs to be revised (e.g. passing in user message, not ignoring)

* refactor(prompt_management_base.py): introduce base class for prompt management

allows consistent behaviour across prompt management integrations

* feat(prompt_management_base.py): support adding client message to template message + refactor langfuse prompt management to use prompt management base

* fix(litellm_logging.py): log prompt id + prompt variables to langfuse if set

allows tracking what prompt was used for what purpose

* feat(litellm_logging.py): log prompt management metadata in standard logging payload + use in langfuse

allows logging prompt id / prompt variables to langfuse

* test: fix test

* fix(router.py): cleanup unused imports

* fix: fix linting error

* fix: fix trace param typing

* fix: fix linting errors

* fix: fix code qa check
2025-01-10 07:31:59 -08:00
Krish Dholakia
63926f484c
feat(ui_sso.py): Allows users to use test key pane, and have team budget limits be enforced for their use-case (#7666) 2025-01-09 22:12:45 -08:00
Krish Dholakia
07c5f136f1
fix(utils.py): fix select tokenizer for custom tokenizer (#7599)
* fix(utils.py): fix select tokenizer for custom tokenizer

* fix(router.py): fix 'utils/token_counter' endpoint
2025-01-07 22:37:09 -08:00
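
A hedged example of hitting the repaired `/utils/token_counter` endpoint; the request field names are assumptions:

```python
import requests

resp = requests.post(
    "http://localhost:4000/utils/token_counter",
    headers={"Authorization": "Bearer sk-1234"},
    json={"model": "gpt-3.5-turbo", "prompt": "Hello, world!"},
)
print(resp.json())  # e.g. {"total_tokens": ...}
```
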
Ishaan Jaff
6125ba1e2b
(Feat) - allow including dd-trace in litellm base image (#7587)
* introduce USE_DDTRACE=true

* update dd tracer

* update

* bump dd trace

* use og slim image

* DD tracing

* fix _init_dd_tracer
2025-01-06 17:27:09 -08:00
Ishaan Jaff
716efd5fad
(fix proxy perf) use _read_request_body instead of ast.literal_eval to get better performance (#7545)
* fix ast literal eval

* run ci/cd again
2025-01-03 17:48:32 -08:00
Krish Dholakia
6843f3a2bb
Revert "fix: add missing parameters order, limit, before, and after in get_as…" (#7542)
This reverts commit 4b0505dffd.
2025-01-03 16:32:12 -08:00
Jean Carlo de Souza
4b0505dffd
fix: add missing parameters order, limit, before, and after in get_assistants method for openai (#7537)
- Ensured that `before` and `after` parameters are only passed when provided to avoid AttributeError.
- Implemented safe access using default values for `before` and `after` to prevent missing attribute issues.
- Added consistent handling of `order` and `limit` to improve flexibility and robustness in API calls.
2025-01-03 14:41:54 -08:00
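
A hedged sketch of the parameters this commit wires through (note it was reverted by 6843f3a2bb above); `litellm.get_assistants` is the real entry point, the argument values are illustrative:

```python
import litellm

assistants = litellm.get_assistants(
    custom_llm_provider="openai",
    order="desc",         # newest first
    limit=20,
    after="asst_abc123",  # only sent when provided, avoiding AttributeError
)
```
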
Krish Dholakia
33f301ec86
Litellm dev 01 02 2025 p1 (#7516)
* fix(redact_messages.py): fix redact messages for non-model response input to be dictionary

fixes issue with otel logging when message redaction is enabled

* fix(proxy_server.py): fix langfuse key leak in exception string

* test: fix test

* test: fix test

* test: fix tests
2025-01-03 14:40:57 -08:00
Ishaan Jaff
cf60444916
(Feat) Add support for reading secrets from Hashicorp vault (#7497)
* HashicorpSecretManager

* test_hashicorp_secret_managerv

* use 1 helper initialize_secret_manager

* add HASHICORP_VAULT

* working config

* hcorp read_secret

* HashicorpSecretManager

* add secret_manager_testing

* use 1 folder for secret manager testing

* test_hashicorp_secret_manager_get_secret

* HashicorpSecretManager

* docs HCP secrets

* update folder name

* docs hcorp secret manager

* remove unused imports

* add conftest.py

* fix tests

* docs document env vars
2025-01-01 18:35:05 -08:00
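
An illustrative sketch of what a Vault `read_secret` boils down to — a GET against the KV v2 HTTP API. The env var names and secret layout are assumptions, not LiteLLM's exact implementation:

```python
import os
import requests

def read_secret(secret_name: str):
    # KV v2 read: GET /v1/secret/data/<path>, token in the X-Vault-Token header
    url = f"{os.environ['HCP_VAULT_ADDR']}/v1/secret/data/{secret_name}"
    resp = requests.get(url, headers={"X-Vault-Token": os.environ["HCP_VAULT_TOKEN"]})
    resp.raise_for_status()
    return resp.json()["data"]["data"].get(secret_name)
```
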
Ishaan Jaff
03b1db5a7d
(Feat) - Add PagerDuty Alerting Integration (#7478)
* define basic types

* fix verbose_logger.exception statement

* fix basic alerting

* test pager duty alerting

* test_pagerduty_alerting_high_failure_rate

* PagerDutyAlerting

* async_log_failure_event

* use pre_call_hook

* add _request_is_completed helper util

* update AlertingConfig

* rename PagerDutyInternalEvent

* _send_alert_if_thresholds_crossed

* use pagerduty as _custom_logger_compatible_callbacks_literal

* fix slack alerting imports

* fix imports in slack alerting

* PagerDutyAlerting

* fix _load_alerting_settings

* test_pagerduty_hanging_request_alerting

* working pager duty alerting

* fix linting

* doc pager duty alerting

* update hanging_response_handler

* fix import location

* update failure_threshold

* update async_pre_call_hook

* docs pagerduty

* test - callback_class_str_to_classType

* fix linting errors

* fix linting + testing error

* PagerDutyAlerting

* test_pagerduty_hanging_request_alerting

* fix unused imports

* docs pager duty

* @pytest.mark.flaky(retries=6, delay=2)

* test_model_info_bedrock_converse_enforcement
2025-01-01 07:12:51 -08:00
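
The failure path ultimately posts to PagerDuty's public Events API v2; a hedged sketch of that call (the wrapper function is illustrative, the endpoint and payload fields are PagerDuty's documented API):

```python
import requests

def send_pagerduty_alert(routing_key: str, summary: str) -> None:
    requests.post(
        "https://events.pagerduty.com/v2/enqueue",
        json={
            "routing_key": routing_key,  # from your PagerDuty integration
            "event_action": "trigger",
            "payload": {
                "summary": summary,
                "severity": "critical",
                "source": "litellm-proxy",
            },
        },
    )
```
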
Krish Dholakia
39cbd9d878
Litellm dev 12 31 2024 p1 (#7488)
* fix(internal_user_endpoints.py): fix team list sort - handle team_alias being set + None

* fix(key_management_endpoints.py): allow team admin to create key for member via admin ui

Fixes https://github.com/BerriAI/litellm/issues/7482

* fix(proxy_server.py): allow querying info on specific model group via `/model_group/info`

allows client-side user to get model info from proxy

* fix(proxy_server.py): add docstring on `/model_group/info` showing how to filter by model name

* test(test_proxy_utils.py): add unit test for returning model group info filtered

* fix(proxy_server.py): fix query param

* fix(test_Get_model_info.py): handle no whitelisted bedrock models
2024-12-31 23:21:51 -08:00
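
A hedged example of the filtered query described above; the `model_group` query param name follows the docstring fix:

```python
import requests

resp = requests.get(
    "http://localhost:4000/model_group/info",
    params={"model_group": "gpt-4"},  # filter to one model group
    headers={"Authorization": "Bearer sk-1234"},
)
print(resp.json())
```
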
Krish Dholakia
080de89cfb
Fix team-based logging to langfuse + allow custom tokenizer on /token_counter endpoint (#7493)
* fix(langfuse_prompt_management.py): migrate dynamic logging to langfuse custom logger compatible class

* fix(langfuse_prompt_management.py): support failure callback logging to langfuse as well

* feat(proxy_server.py): support setting custom tokenizer on config.yaml

Allows customizing value for `/utils/token_counter`

* fix(proxy_server.py): fix linting errors

* test: skip if file not found

* style: cleanup unused import

* docs(configs.md): add docs on setting custom tokenizer
2024-12-31 23:18:41 -08:00
Ishaan Jaff
3158dcf88b
(Security fix) - Upgrade to fastapi==0.115.5 (#7447)
* fix upgrade fast api

* bump fastapi

* update a proxy startup tests

* remove unused test file

* update tests

* bump fast api
2024-12-28 17:08:19 -08:00
Krish Dholakia
539f166166
Support budget/rate limit tiers for keys (#7429)
* feat(proxy/utils.py): get associated litellm budget from db in combined_view for key

allows user to create rate limit tiers and associate those to keys

* feat(proxy/_types.py): update the value of key-level tpm/rpm/model max budget metrics with the associated budget table values if set

allows rate limit tiers to be easily applied to keys

* docs(rate_limit_tiers.md): add doc on setting rate limit / budget tiers

make feature discoverable

* feat(key_management_endpoints.py): return litellm_budget_table value in key generate

make it easy for user to know associated budget on key creation

* fix(key_management_endpoints.py): document 'budget_id' param in `/key/generate`

* docs(key_management_endpoints.py): document budget_id usage

* refactor(budget_management_endpoints.py): refactor budget endpoints into separate file - makes it easier to run documentation testing against it

* docs(test_api_docs.py): add budget endpoints to ci/cd doc test + add missing param info to docs

* fix(customer_endpoints.py): use new pydantic obj name

* docs(user_management_heirarchy.md): add simple doc explaining teams/keys/org/users on litellm

* Litellm dev 12 26 2024 p2 (#7432)

* (Feat) Add logging for `POST v1/fine_tuning/jobs`  (#7426)

* init commit ft jobs logging

* add ft logging

* add logging for FineTuningJob

* simple FT Job create test

* (docs) - show all supported Azure OpenAI endpoints in overview  (#7428)

* azure batches

* update doc

* docs azure endpoints

* docs endpoints on azure

* docs azure batches api

* docs azure batches api

* fix(key_management_endpoints.py): fix key update to actually work

* test(test_key_management.py): add e2e test asserting ui key update call works

* fix: proxy/_types - fix linting errors

* test: update test

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>

* fix: test

* fix(parallel_request_limiter.py): enforce tpm/rpm limits on key from tiers

* fix: fix linting errors

* test: fix test

* fix: remove unused import

* test: update test

* docs(customer_endpoints.py): document new model_max_budget param

* test: specify unique key alias

* docs(budget_management_endpoints.py): document new model_max_budget param

* test: fix test

* test: fix tests

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
2024-12-26 19:05:27 -08:00
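
A hedged sketch of the tier workflow: create a budget once, then attach it to keys via `budget_id` (field names follow the commit; the limit fields are assumptions):

```python
import requests

base, headers = "http://localhost:4000", {"Authorization": "Bearer sk-master-key"}

# 1. create a reusable rate-limit/budget tier
requests.post(
    f"{base}/budget/new",
    headers=headers,
    json={"budget_id": "high-tier", "tpm_limit": 100_000, "rpm_limit": 1_000},
)

# 2. any key generated with that budget_id inherits the tier's limits
key = requests.post(
    f"{base}/key/generate", headers=headers, json={"budget_id": "high-tier"}
).json()
print(key)
```
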
Krish Dholakia
2e86a4806d
Litellm dev 12 24 2024 p2 (#7400)
* fix(utils.py): default custom_llm_provider=None for 'supports_response_schema'

Closes https://github.com/BerriAI/litellm/issues/7397

* refactor(langfuse/): call langfuse logger inside customlogger compatible langfuse class, refactor langfuse logger to use verbose_logger.debug instead of print_verbose

* refactor(litellm_pre_call_utils.py): move config based team callbacks inside dynamic team callback logic

enables simpler unit testing for config-based team callbacks

* fix(proxy/_types.py): handle teamcallbackmetadata - none values

drop none values if present. if all none, use default dict to avoid downstream errors

* test(test_proxy_utils.py): add unit test preventing future issues - asserts team_id in config state not popped off across calls

Fixes https://github.com/BerriAI/litellm/issues/6787

* fix(langfuse_prompt_management.py): add success + failure logging event support

* fix: fix linting error

* test: fix test

* test: fix test

* test: override o1 prompt caching - openai currently not working

* test: fix test
2024-12-24 20:33:41 -08:00
Ishaan Jaff
47e12802df
(feat) /batches Add support for using /batches endpoints in OAI format (#7402)
* run azure testing on ci/cd

* update docs on azure batches endpoints

* add input azure.jsonl

* refactor - use separate file for batches endpoints

* fixes for passing custom llm provider to /batch endpoints

* pass custom llm provider to files endpoints

* update azure batches doc

* add info for azure batches api

* update batches endpoints

* use simple helper for raising proxy exception

* update config.yml

* fix imports

* update tests

* use existing settings

* update env var used

* update configs

* update config.yml

* update ft testing
2024-12-24 16:58:05 -08:00
Krish Dholakia
db59e08958
Litellm dev 12 23 2024 p1 (#7383)
* feat(guardrails_endpoint.py): new `/guardrails/list` endpoint

Allow users to view what the available guardrails are

* docs: document new `/guardrails/list` endpoint

* docs(enterprise.md): update docs

* fix(openai/transcription/handler.py): support cost tracking on vtt + srt formats

* fix(openai/transcriptions/handler.py): default to 'verbose_json' response format if 'text' or 'json' response_format received. ensures 'duration' param is received for all audio transcription requests

* fix: fix linting errors

* fix: remove unused import
2024-12-23 16:33:31 -08:00
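
A hedged example of calling the new endpoint:

```python
import requests

resp = requests.get(
    "http://localhost:4000/guardrails/list",
    headers={"Authorization": "Bearer sk-1234"},
)
print(resp.json())  # guardrails available on this proxy
```
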
Krish Dholakia
a8ae2f551a
Litellm enforce enterprise features (#7357)
* fix(proxy_server.py): enforce team id based model add only works if enterprise user

* fix(auth_checks.py): enforce common_checks can only be imported by user_api_key_auth.py

* fix(auth_checks.py): insert not premium user error message on failed common checks run
2024-12-21 19:14:13 -08:00
Ishaan Jaff
ce41cd977c
(Admin UI) - Test Key Tab - Allow using UI Session instead of manually creating a virtual key (#7348)
* ui fix - allow searching model list + fix bug on filtering

* qa fix - use correct provider name for azure_text

* ui wrap content onto next line

* ui fix - allow selecting current UI session when logging in

* ui session budgets
2024-12-21 13:14:15 -08:00
Krish Dholakia
4c7a3931b7
Litellm dev 12 19 2024 p2 (#7315)
* fix(proxy_server.py): only update k,v pair if v is not empty/null

Fixes https://github.com/BerriAI/litellm/issues/6787

* test(test_router.py): cleanup duplicate calls

* test: add new test stream options drop params test

* test: update optional params / stream options test to test for vertex ai mistral route specifically

Addresses https://github.com/BerriAI/litellm/issues/7309

* fix(proxy_server.py): fix linting errors

* fix: fix linting errors
2024-12-19 20:28:16 -08:00
Ishaan Jaff
c7f14e936a
(code quality) run ruff rule to ban unused imports (#7313)
* remove unused imports

* fix AmazonConverseConfig

* fix test

* fix import

* ruff check fixes

* test fixes

* fix testing

* fix imports
2024-12-19 12:33:42 -08:00
Ishaan Jaff
4b2958b8b8
fix use 1 file _PROXY_track_cost_callback (#7304) 2024-12-18 19:46:26 -08:00
Ishaan Jaff
6261ec3599
(feat proxy) v2 - model max budgets (#7302)
* clean up unused code

* add _PROXY_VirtualKeyModelMaxBudgetLimiter

* adjust type imports

* working _PROXY_VirtualKeyModelMaxBudgetLimiter

* fix user_api_key_model_max_budget

* fix user_api_key_model_max_budget

* update naming

* update naming

* fix changes to RouterBudgetLimiting

* test_call_with_key_over_model_budget

* test_call_with_key_over_model_budget

* handle _get_request_model_budget_config

* e2e test for test_call_with_key_over_model_budget

* clean up test

* run ci/cd again

* add validate_model_max_budget

* docs fix

* update doc

* add e2e testing for _PROXY_VirtualKeyModelMaxBudgetLimiter

* test_unit_test_max_model_budget_limiter.py
2024-12-18 19:42:46 -08:00
Krish Dholakia
0fe8bfe87a
fix(proxy_server.py): pass model access groups to get_key/get_team mo… (#7281)
* fix(proxy_server.py): pass model access groups to get_key/get_team models

allows end user to see actual models they have access to, instead of default models

* fix(auth_checks.py): fix linting errors

* fix: fix linting errors
2024-12-18 09:33:33 -08:00
Krish Dholakia
516c2a6a70
Litellm remove circular imports (#7232)
* fix(utils.py): initial commit to remove circular imports - moves llmproviders to utils.py

* fix(router.py): fix 'litellm.EmbeddingResponse' import from router.py

* refactor: fix litellm.ModelResponse import on pass through endpoints

* refactor(litellm_logging.py): fix circular import for custom callbacks literal

* fix(factory.py): fix circular imports inside prompt factory

* fix(cost_calculator.py): fix circular import for 'litellm.Usage'

* fix(proxy_server.py): fix potential circular import with `litellm.Router'

* fix(proxy/utils.py): fix potential circular import in `litellm.Router`

* fix: remove circular imports in 'auth_checks' and 'guardrails/'

* fix(prompt_injection_detection.py): fix router import

* fix(vertex_passthrough_logging_handler.py): fix potential circular imports in vertex pass through

* fix(anthropic_pass_through_logging_handler.py): fix potential circular imports

* fix(slack_alerting.py-+-ollama_chat.py): fix modelresponse import

* fix(base.py): fix potential circular import

* fix(handler.py): fix potential circular ref in codestral + cohere handlers

* fix(azure.py): fix potential circular imports

* fix(gpt_transformation.py): fix modelresponse import

* fix(litellm_logging.py): add logging base class - simplify typing

makes it easy for other files to type check the logging obj without introducing circular imports

* fix(azure_ai/embed): fix potential circular import on handler.py

* fix(databricks/): fix potential circular imports in databricks/

* fix(vertex_ai/): fix potential circular imports on vertex ai embeddings

* fix(vertex_ai/image_gen): fix import

* fix(watsonx-+-bedrock): cleanup imports

* refactor(anthropic-pass-through-+-petals): cleanup imports

* refactor(huggingface/): cleanup imports

* fix(ollama-+-clarifai): cleanup circular imports

* fix(openai_like/): fix import

* fix(openai_like/): fix embedding handler

cleanup imports

* refactor(openai.py): cleanup imports

* fix(sagemaker/transformation.py): fix import

* ci(config.yml): add circular import test to ci/cd
2024-12-14 16:28:34 -08:00
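
A circular-import smoke test in the spirit of the one added to ci/cd here (illustrative; assumes `litellm` is installed in the environment):

```python
import subprocess
import sys

# import the package in a fresh interpreter so cached modules can't hide a cycle
result = subprocess.run(
    [sys.executable, "-c", "import litellm"],
    capture_output=True,
    text=True,
)
assert result.returncode == 0, f"import failed (circular import?):\n{result.stderr}"
```
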
Krish Dholakia
ec36353b41
fix(main.py): fix retries being multiplied when using openai sdk (#7221)
* fix(main.py): fix retries being multiplied when using openai sdk

Closes https://github.com/BerriAI/litellm/pull/7130

* docs(prompt_management.md): add langfuse prompt management doc

* feat(team_endpoints.py): allow teams to add their own models

Enables teams to call their own finetuned models via the proxy

* test: add better enforcement check testing for `/model/new` now that teams can add their own models

* docs(team_model_add.md): tutorial for allowing teams to add their own models

* test: fix test
2024-12-14 11:56:55 -08:00
Krish Dholakia
e68bb4e051
Litellm dev 12 12 2024 (#7203)
* fix(azure/): support passing headers to azure openai endpoints

Fixes https://github.com/BerriAI/litellm/issues/6217

* fix(utils.py): move default tokenizer to just openai

hf tokenizer makes network calls when trying to get the tokenizer - this slows down completion calls

* fix(router.py): fix pattern matching router - add generic "*" to it as well

Fixes issue where generic "*" model access group wouldn't show up

* fix(pattern_match_deployments.py): match to more specific pattern

match to more specific pattern

allows setting generic wildcard model access group and excluding specific models more easily

* fix(proxy_server.py): fix _delete_deployment to handle base case where db_model list is empty

don't delete all router models because of an empty list

Fixes https://github.com/BerriAI/litellm/issues/7196

* fix(anthropic/): fix handling response_format for anthropic messages with anthropic api

* fix(fireworks_ai/): support passing response_format + tool call in same message

Addresses https://github.com/BerriAI/litellm/issues/7135

* Revert "fix(fireworks_ai/): support passing response_format + tool call in same message"

This reverts commit 6a30dc6929.

* test: fix test

* fix(replicate/): fix replicate default retry/polling logic

* test: add unit testing for router pattern matching

* test: update test to use default oai tokenizer

* test: mark flaky test

* test: skip flaky test
2024-12-13 08:54:03 -08:00
Ishaan Jaff
b889d7c72f
(feat) UI - Disable Usage Tab once SpendLogs is 1M+ Rows (#7208)
* use utils to set proxy spend logs row count

* store proxy state variables

* fix check for _has_user_setup_sso

* fix proxyStateVariables

* fix dup code

* rename getProxyUISettings

* add fixes

* ui emit num spend logs rows

* test_proxy_server_prisma_setup

* use MAX_SPENDLOG_ROWS_TO_QUERY to constants

* test_get_ui_settings_spend_logs_threshold
2024-12-12 18:43:17 -08:00
Krish Dholakia
e4493248ae
Litellm dev 12 06 2024 (#7067)
* fix(edit_budget_modal.tsx): call `/budget/update` endpoint instead of `/budget/new`

allows updating existing budget on ui

* fix(user_api_key_auth.py): support cost tracking for end user via jwt field

* fix(presidio.py): support pii masking on sync logging callbacks

enables masking before logging to langfuse

* feat(utils.py): support retry policy logic inside '.completion()'

Fixes https://github.com/BerriAI/litellm/issues/6623

* fix(utils.py): support retry by retry policy on async logic as well

* fix(handle_jwt.py): set leeway default leeway value

* test: fix test to handle jwt audience claim
2024-12-06 22:44:18 -08:00
Krish Dholakia
816f0ef8d2
LiteLLM Minor Fixes & Improvements (12/05/2024) (#7051)
* fix(cost_calculator.py): move to using `.get_model_info()` for cost per token calculations

ensures cost tracking is reliable - handles edge cases of parsing model cost map

* build(model_prices_and_context_window.json): add 'supports_response_schema' for select tgai models

Fixes https://github.com/BerriAI/litellm/pull/7037#discussion_r1872157329

* build(model_prices_and_context_window.json): remove 'pdf input' and 'vision' support from nova micro in model map

Bedrock docs indicate no support for micro - https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference-supported-models-features.html

* fix(converse_transformation.py): support amazon nova tool use

* fix(opentelemetry): Add missing LLM request type attribute to spans (#7041)

* feat(opentelemetry): add LLM request type attribute to spans

* lint

* fix: curl usage (#7038)

curl -d, --data <data> is lowercase d
curl -D, --dump-header <filename> is uppercase D

references:
https://curl.se/docs/manpage.html#-d
https://curl.se/docs/manpage.html#-D

* fix(spend_tracking.py): handle empty 'id' in model response - when creating spend log

Fixes https://github.com/BerriAI/litellm/issues/7023

* fix(streaming_chunk_builder.py): handle initial id being empty string

Fixes https://github.com/BerriAI/litellm/issues/7023

* fix(anthropic_passthrough_logging_handler.py): add end user cost tracking for anthropic pass through endpoint

* docs(pass_through/): refactor docs location + add table on supported features for pass through endpoints

* feat(anthropic_passthrough_logging_handler.py): support end user cost tracking via anthropic sdk

* docs(anthropic_completion.md): add docs on passing end user param for cost tracking on anthropic sdk

* fix(litellm_logging.py): use standard logging payload if present in kwargs

prevent datadog logging error for pass through endpoints

* docs(bedrock.md): add rerank api usage example to docs

* bugfix/change dummy tool name format (#7053)

* fix viewing keys (#7042)

* ui new build

* build(model_prices_and_context_window.json): add bedrock region models to model cost map (#7044)

* bye (#6982)

* (fix) litellm router.aspeech  (#6962)

* doc Migrating Databases

* fix aspeech on router

* test_audio_speech_router

* test_audio_speech_router

* docs show supported providers on batches api doc

* change dummy tool name format

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: yujonglee <yujonglee.dev@gmail.com>

* fix: fix linting errors

* test: update test

* fix(litellm_logging.py): fix pass through check

* fix(test_otel_logging.py): fix test

* fix(cost_calculator.py): update handling for cost per second

* fix(cost_calculator.py): fix cost check

* test: fix test

* (fix) adding public routes when using custom header  (#7045)

* get_api_key_from_custom_header

* add test_get_api_key_from_custom_header

* fix testing use 1 file for test user api key auth

* fix test user api key auth

* test_custom_api_key_header_name

* build: update ui build

---------

Co-authored-by: Doron Kopit <83537683+doronkopit5@users.noreply.github.com>
Co-authored-by: lloydchang <lloydchang@gmail.com>
Co-authored-by: hgulersen <haymigulersen@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: yujonglee <yujonglee.dev@gmail.com>
2024-12-06 14:29:53 -08:00
Ishaan Jaff
84db69d4c4
(feat) add Vertex Batches API support in OpenAI format (#7032)
* working request

* working transform

* working request

* transform vertex batch response

* add _async_create_batch

* move gcs functions to base

* fix _get_content_from_openai_file

* transform_openai_file_content_to_vertex_ai_file_content

* fix transform vertex gcs bucket upload to OAI files format

* working e2e test

* _get_gcs_object_name

* fix linting

* add doc string

* fix transform_gcs_bucket_response_to_openai_file_object

* use vertex for batch endpoints

* add batches support for vertex

* test_vertex_batches_endpoint

* test_vertex_batch_prediction

* fix gcs bucket base auth

* docs clean up batches

* docs Batch API

* docs vertex batches api

* test_get_gcs_logging_config_without_service_account

* undo change

* fix vertex md

* test_get_gcs_logging_config_without_service_account

* ci/cd run again
2024-12-04 19:40:28 -08:00
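
A hedged sketch of using the OpenAI-format batch surface against the proxy with Vertex as the provider; passing `custom_llm_provider` via `extra_body` is an assumption based on the commit bullets:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

batch = client.batches.create(
    input_file_id="file-abc123",  # a previously uploaded batch input file
    endpoint="/v1/chat/completions",
    completion_window="24h",
    extra_body={"custom_llm_provider": "vertex_ai"},
)
print(batch.id, batch.status)
```
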
Ishaan Jaff
05f810922c
(feat) Allow disabling ErrorLogs written to the DB (#6940)
* fix - allow disabling logging error logs

* docs on disabling error logs

* doc string for _PROXY_failure_handler

* test_disable_error_logs

* rename file

* fix rename file

* increase test coverage for test_enable_error_logs
2024-11-27 19:34:51 -08:00