Commit graph

112 commits

Author SHA1 Message Date
Ishaan Jaff
62a1cdec47 (code quality) run ruff rule to ban unused imports (#7313)
* remove unused imports

* fix AmazonConverseConfig

* fix test

* fix import

* ruff check fixes

* test fixes

* fix testing

* fix imports
2024-12-19 12:33:42 -08:00
Ishaan Jaff
6220e17ebf (feat proxy) v2 - model max budgets (#7302)
* clean up unused code

* add _PROXY_VirtualKeyModelMaxBudgetLimiter

* adjust type imports

* working _PROXY_VirtualKeyModelMaxBudgetLimiter

* fix user_api_key_model_max_budget

* fix user_api_key_model_max_budget

* update naming

* update naming

* fix changes to RouterBudgetLimiting

* test_call_with_key_over_model_budget

* test_call_with_key_over_model_budget

* handle _get_request_model_budget_config

* e2e test for test_call_with_key_over_model_budget

* clean up test

* run ci/cd again

* add validate_model_max_budget

* docs fix

* update doc

* add e2e testing for _PROXY_VirtualKeyModelMaxBudgetLimiter

* test_unit_test_max_model_budget_limiter.py
2024-12-18 19:42:46 -08:00
Krish Dholakia
050499ec8f Litellm dev readd prompt caching (#7299)
* fix(router.py): re-add saving model id on prompt caching valid successful deployment

* fix(router.py): introduce optional pre_call_checks

isolate prompt caching logic in a separate file

* fix(prompt_caching_deployment_check.py): fix import

* fix(router.py): new 'async_filter_deployments' event hook

allows custom logger to filter deployments returned to routing strategy

* feat(prompt_caching_deployment_check.py): initial working commit of prompt caching based routing

* fix(cooldown_callbacks.py): fix linting error

* fix(budget_limiter.py): move budget logger to async_filter_deployment hook

* test: add unit test

* test(test_router_helper_utils.py): add unit testing

* fix(budget_limiter.py): fix linting errors

* docs(config_settings.md): add 'optional_pre_call_checks' to router_settings param docs
2024-12-18 15:13:49 -08:00
Ishaan Jaff
533381d4ad tag budgets fixes 2024-12-18 10:28:37 -08:00
Krish Dholakia
03e711e3e4 LITELLM: Remove requests library usage (#7235)
* fix(generic_api_callback.py): remove requests lib usage

* fix(budget_manager.py): remove requests lib usgae

* fix(main.py): cleanup requests lib usage

* fix(utils.py): remove requests lib usage

* fix(argilla.py): fix argilla test

* fix(athina.py): replace 'requests' lib usage with litellm module

* fix(greenscale.py): replace 'requests' lib usage with httpx

* fix: remove unused 'requests' lib import + replace usage in some places

* fix(prompt_layer.py): remove 'requests' lib usage from prompt layer

* fix(ollama_chat.py): remove 'requests' lib usage

* fix(baseten.py): replace 'requests' lib usage

* fix(codestral/): replace 'requests' lib usage

* fix(predibase/): replace 'requests' lib usage

* refactor: cleanup unused 'requests' lib imports

* fix(oobabooga.py): cleanup 'requests' lib usage

* fix(invoke_handler.py): remove unused 'requests' lib usage

* refactor: cleanup unused 'requests' lib import

* fix: fix linting errors

* refactor(ollama/): move ollama to using base llm http handler

removes 'requests' lib dep for ollama integration

* fix(ollama_chat.py): fix linting errors

* fix(ollama/completion/transformation.py): convert non-jpeg/png image to jpeg/png before passing to ollama
2024-12-17 12:50:04 -08:00
Ishaan Jaff
2459f9735d (feat) Add Tag-based budgets on litellm router / proxy (#7236)
* add BudgetConfig

* add _get_tags_from_request_kwargs

* test_tag_budgets_e2e_test_expect_to_fail

* add a check for request tags

* fix _async_get_cache_keys_for_router_budget_limiting

* fix test

* fix _sync_in_memory_spend_with_redis

* _async_get_cache_keys_for_router_budget_limiting

* fix _init_tag_budgets

* fix type casting

* docs show error for tag budget limit hit

* fix _get_tags_from_request_kwargs

* fix undo change
2024-12-14 17:28:36 -08:00
Ishaan Jaff
bc46916bb3 (feat - Router / Proxy ) Allow setting budget limits per LLM deployment (#7220)
* fix test_deployment_budget_limits_e2e_test

* refactor async_log_success_event to track spend for provider + deployment

* fix format

* rename class to RouterBudgetLimiting

* rename func

* rename types used for budgets

* add new types for deployment budgets

* add budget limits for deployments

* fix checking budgets set for provider

* update file names

* fix linting error

* _track_provider_remaining_budget_prometheus

* async_filter_deployments

* fix model list passed to router

* update error

* test_deployment_budgets_e2e_test_expect_to_fail

* fix test case

* run deployment budget limits
2024-12-13 19:15:51 -08:00
Ishaan Jaff
6a9225fac2 (Refactor) Code Quality improvement - stop redefining LiteLLMBase (#7147)
* fix stop redefining  LiteLLMBase

* use better name for base pydantic obj
2024-12-10 15:49:01 -08:00
Ishaan Jaff
fc7a9830ab Provider Budget Routing - Get Budget, Spend Details (#7063)
* add async_get_ttl to dual cache

* add ProviderBudgetResponse

* add provider_budgets

* test_redis_get_ttl

* _init_or_get_provider_budget_in_cache

* test_init_or_get_provider_budget_in_cache

* use _init_provider_budget_in_cache

* test_get_current_provider_budget_reset_at

* doc Get Budget, Spend Details

* doc Provider Budget Routing
2024-12-06 21:14:12 -08:00
Ishaan Jaff
e47ebefced (feat) - provider budget improvements - ensure provider budgets work with multiple proxy instances + improve latency to ~90ms (#6886)
* use 1 file for duration_in_seconds

* add to readme.md

* re use duration_in_seconds

* fix importing _extract_from_regex, get_last_day_of_month

* fix import

* update provider budget routing

* fix - remove dup test

* add support for using in multi instance environments

* test_in_memory_redis_sync_e2e

* test_in_memory_redis_sync_e2e

* fix test_in_memory_redis_sync_e2e

* fix code quality check

* fix test provider budgets

* working provider budget tests

* add fixture for provider budget routing

* fix router testing for provider budgets

* add comments on provider budget routing

* use RedisPipelineIncrementOperation

* add redis async_increment_pipeline

* use redis async_increment_pipeline

* use lower value for testing

* use redis async_increment_pipeline

* use consistent key name for increment op

* add handling for budget windows

* fix typing async_increment_pipeline

* fix set attr

* add clear doc strings

* unit testing for provider budgets

* test_redis_increment_pipeline
2024-11-24 16:36:19 -08:00
Ishaan Jaff
72afed5b7e (QOL improvement) Provider budget routing - allow using 1s, 1d, 1mo, 2mo etc (#6885)
* use 1 file for duration_in_seconds

* add to readme.md

* re use duration_in_seconds

* fix importing _extract_from_regex, get_last_day_of_month

* fix import

* update provider budget routing

* fix - remove dup test
2024-11-23 16:59:46 -08:00
Ishaan Jaff
64b46e32cf (feat) provider budget routing improvements (#6827)
* minor fix for provider budget

* fix raise good error message when budget crossed for provider budget

* fix test provider budgets

* test provider budgets

* feat - emit llm provider spend on prometheus

* test_prometheus_metric_tracking

* doc provider budgets
2024-11-19 21:25:08 -08:00
Ishaan Jaff
ce6465c9df (Feat) Add provider specific budget routing (#6817)
* add ProviderBudgetConfig

* working test_provider_budgets_e2e_test

* test_provider_budgets_e2e_test_expect_to_fail

* use 1 cache read for getting provider spend

* test_provider_budgets_e2e_test

* add doc on provider budgets

* clean up provider budgets

* unit testing for provider budget routing

* use as flag, not routing strat

* fix init provider budget routing

* use async_filter_deployments

* fix test provider budgets

* doc provider budget routing

* doc provider budget routing

* fix docs changes

* fix comment
2024-11-19 20:25:27 -08:00
Krish Dholakia
3c591167e0 fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check (#6577)
* fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check

* fix(lowest_tpm_rpm_v2.py): return headers in correct format

* test: update test

* build(deps): bump cookie and express in /docs/my-website (#6566)

Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.

Updates `cookie` from 0.6.0 to 0.7.1
- [Release notes](https://github.com/jshttp/cookie/releases)
- [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)

Updates `express` from 4.20.0 to 4.21.1
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)

---
updated-dependencies:
- dependency-name: cookie
  dependency-type: indirect
- dependency-name: express
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* docs(virtual_keys.md): update Dockerfile reference (#6554)

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>

* (proxy fix) - call connect on prisma client when running setup (#6534)

* critical fix - call connect on prisma client when running setup

* fix test_proxy_server_prisma_setup

* fix test_proxy_server_prisma_setup

* Add 3.5 haiku (#6588)

* feat: add claude-3-5-haiku-20241022 entries

* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models

* add missing entries, remove vision

* remove image token costs

* Litellm perf improvements 3 (#6573)

* perf: move writing key to cache, to background task

* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils

adds 200ms on calls with pgdb connected

* fix(litellm_pre_call_utils.py'): rename call_type to actual call used

* perf(proxy_server.py): remove db logic from _get_config_from_file

was causing db calls to occur on every llm request, if team_id was set on key

* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db

reduces latency/call by ~100ms

* fix(proxy_server.py): minor fix on existing_settings not incl alerting

* fix(exception_mapping_utils.py): map databricks exception string

* fix(auth_checks.py): fix auth check logic

* test: correctly mark flaky test

* fix(utils.py): handle auth token error for tokenizers.from_pretrained

* build: fix map

* build: fix map

* build: fix json for model map

* test: remove eol model

* fix(proxy_server.py): fix db config loading logic

* fix(proxy_server.py): fix order of config / db updates, to ensure fields not overwritten

* test: skip test if required env var is missing

* test: fix test

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
2024-11-05 22:03:44 +05:30
Ishaan Jaff
5986f7457e (router_strategy/) ensure all async functions use async cache methods (#6489)
* fix router strat

* use async set / get cache in router_strategy

* add coverage for router strategy

* fix imports

* fix batch_get_cache

* use async methods for least busy

* fix least busy use async methods

* fix test_dual_cache_increment

* test async_get_available_deployment when routing_strategy="least-busy"
2024-10-29 21:07:17 +05:30
Krish Dholakia
e712a2090b redis otel tracing + async support for latency routing (#6452)
* docs(exception_mapping.md): add missing exception types

Fixes https://github.com/Aider-AI/aider/issues/2120#issuecomment-2438971183

* fix(main.py): register custom model pricing with specific key

Ensure custom model pricing is registered to the specific model+provider key combination

* test: make testing more robust for custom pricing

* fix(redis_cache.py): instrument otel logging for sync redis calls

ensures complete coverage for all redis cache calls

* refactor: pass parent_otel_span for redis caching calls in router

allows for more observability into what calls are causing latency issues

* test: update tests with new params

* refactor: ensure e2e otel tracing for router

* refactor(router.py): add more otel tracing acrosss router

catch all latency issues for router requests

* fix: fix linting error

* fix(router.py): fix linting error

* fix: fix test

* test: fix tests

* fix(dual_cache.py): pass ttl to redis cache

* fix: fix param
2024-10-28 21:52:12 -07:00
Ishaan Jaff
0c5a47c404 (code quality) add ruff check PLR0915 for too-many-statements (#6309)
* ruff add PLR0915

* add noqa for PLR0915

* fix noqa

* add # noqa: PLR0915

* # noqa: PLR0915

* # noqa: PLR0915

* # noqa: PLR0915

* add # noqa: PLR0915

* # noqa: PLR0915

* # noqa: PLR0915

* # noqa: PLR0915

* # noqa: PLR0915
2024-10-18 15:36:49 +05:30
Ishaan Jaff
ece65164fb (refactor router.py ) - PR 3 - Ensure all functions under 100 lines (#6181)
* add flake 8 check

* split up litellm _acompletion

* fix get model client

* refactor use commong func to add metadata to kwargs

* use common func to get timeout

* re-use helper to _get_async_model_client

* use _handle_mock_testing_rate_limit_error

* fix docstring for _handle_mock_testing_rate_limit_error

* fix function_with_retries

* use helper for mock testing fallbacks

* router - use 1 func for simple_shuffle

* add doc string for simple_shuffle

* use 1 function for filtering cooldown deployments

* fix use common helper to _get_fallback_model_group_from_fallbacks
2024-10-14 21:27:54 +05:30
Ishaan Jaff
ba56e37244 (refactor) caching use LLMCachingHandler for async_get_cache and set_cache (#6208)
* use folder for caching

* fix importing caching

* fix clickhouse pyright

* fix linting

* fix correctly pass kwargs and args

* fix test case for embedding

* fix linting

* fix embedding caching logic

* fix refactor handle utils.py

* fix test_embedding_caching_azure_individual_items_reordered
2024-10-14 16:34:01 +05:30
Krish Dholakia
94a05ca5d0 Litellm ruff linting enforcement (#5992)
* ci(config.yml): add a 'check_code_quality' step

Addresses https://github.com/BerriAI/litellm/issues/5991

* ci(config.yml): check why circle ci doesn't pick up this test

* ci(config.yml): fix to run 'check_code_quality' tests

* fix(__init__.py): fix unprotected import

* fix(__init__.py): don't remove unused imports

* build(ruff.toml): update ruff.toml to ignore unused imports

* fix: fix: ruff + pyright - fix linting + type-checking errors

* fix: fix linting errors

* fix(lago.py): fix module init error

* fix: fix linting errors

* ci(config.yml): cd into correct dir for checks

* fix(proxy_server.py): fix linting error

* fix(utils.py): fix bare except

causes ruff linting errors

* fix: ruff - fix remaining linting errors

* fix(clickhouse.py): use standard logging object

* fix(__init__.py): fix unprotected import

* fix: ruff - fix linting errors

* fix: fix linting errors

* ci(config.yml): cleanup code qa step (formatting handled in local_testing)

* fix(_health_endpoints.py): fix ruff linting errors

* ci(config.yml): just use ruff in check_code_quality pipeline for now

* build(custom_guardrail.py): include missing file

* style(embedding_handler.py): fix ruff check
2024-10-01 19:44:20 -04:00
Krish Dholakia
f3fa2160a0 LiteLLM Minor Fixes & Improvements (09/21/2024) (#5819)
* fix(router.py): fix error message

* Litellm disable keys (#5814)

* build(schema.prisma): allow blocking/unblocking keys

Fixes https://github.com/BerriAI/litellm/issues/5328

* fix(key_management_endpoints.py): fix pop

* feat(auth_checks.py): allow admin to enable/disable virtual keys

Closes https://github.com/BerriAI/litellm/issues/5328

* docs(vertex.md): add auth section for vertex ai

Addresses - https://github.com/BerriAI/litellm/issues/5768#issuecomment-2365284223

* build(model_prices_and_context_window.json): show which models support prompt_caching

Closes https://github.com/BerriAI/litellm/issues/5776

* fix(router.py): allow setting default priority for requests

* fix(router.py): add 'retry-after' header for concurrent request limit errors

Fixes https://github.com/BerriAI/litellm/issues/5783

* fix(router.py): correctly raise and use retry-after header from azure+openai

Fixes https://github.com/BerriAI/litellm/issues/5783

* fix(user_api_key_auth.py): fix valid token being none

* fix(auth_checks.py): fix model dump for cache management object

* fix(user_api_key_auth.py): pass prisma_client to obj

* test(test_otel.py): update test for new key check

* test: fix test
2024-09-21 18:51:53 -07:00
Ishaan Jaff
55a4bd217f [Fix] Router/ Proxy - Tag Based routing, raise correct error when no deployments found and tag filtering is on (#5745)
* fix tag routing - raise correct error when no model with tag based routing

* fix error string from tag based routing

* test router tag based routing

* raise 401 error when no tags avialable for deploymen

* linting fix
2024-09-17 20:24:28 -07:00
Krish Dholakia
713d762411 LiteLLM Minor Fixes and Improvements (09/13/2024) (#5689)
* refactor: cleanup unused variables + fix pyright errors

* feat(health_check.py): Closes https://github.com/BerriAI/litellm/issues/5686

* fix(o1_reasoning.py): add stricter check for o-1 reasoning model

* refactor(mistral/): make it easier to see mistral transformation logic

* fix(openai.py): fix openai o-1 model param mapping

Fixes https://github.com/BerriAI/litellm/issues/5685

* feat(main.py): infer finetuned gemini model from base model

Fixes https://github.com/BerriAI/litellm/issues/5678

* docs(vertex.md): update docs to call finetuned gemini models

* feat(proxy_server.py): allow admin to hide proxy model aliases

Closes https://github.com/BerriAI/litellm/issues/5692

* docs(load_balancing.md): add docs on hiding alias models from proxy config

* fix(base.py): don't raise notimplemented error

* fix(user_api_key_auth.py): fix model max budget check

* fix(router.py): fix elif

* fix(user_api_key_auth.py): don't set team_id to empty str

* fix(team_endpoints.py): fix response type

* test(test_completion.py): handle predibase error

* test(test_proxy_server.py): fix test

* fix(o1_transformation.py): fix max_completion_token mapping

* test(test_image_generation.py): mark flaky test
2024-09-14 10:02:55 -07:00
Ishaan Jaff
83f3be6dc3 support default deployments 2024-09-09 14:23:17 -07:00
Ishaan Jaff
764b78349d fix taf based routing debugging 2024-09-09 14:11:54 -07:00
Ishaan Jaff
f49fdab804 fix debug statements 2024-09-09 14:00:17 -07:00
Ishaan Jaff
0bb30b3ee8 fix get_deployments_for_tag 2024-08-29 13:51:36 -07:00
Krrish Dholakia
2874b94fb1 refactor: replace .error() with .exception() logging for better debugging on sentry 2024-08-16 09:22:47 -07:00
Ishaan Jaff
d1a4246d2b control using enable_tag_filtering 2024-07-18 19:39:04 -07:00
Ishaan Jaff
cd40d58544 router - refactor to tag based routing 2024-07-18 19:22:09 -07:00
Ishaan Jaff
d4cad75d34 router - use free paid tier routing 2024-07-18 17:09:42 -07:00
Ishaan Jaff
957b8a1b8d helper to get_deployments_for_tier 2024-07-18 17:06:06 -07:00
Krrish Dholakia
e391e30285 refactor: replace 'traceback.print_exc()' with logging library
allows error logs to be in json format for otel logging
2024-06-06 13:47:43 -07:00
Krrish Dholakia
bcc07afd04 fix(lowest_latency.py): set default none value for time_to_first_token in sync log success event 2024-05-21 18:42:15 -07:00
Krrish Dholakia
f007bf7e21 feat(lowest_latency.py): route by time to first token, for streaming requests (if available)
Closes https://github.com/BerriAI/litellm/issues/3574
2024-05-21 13:08:17 -07:00
Krish Dholakia
c0e43a7296 Merge pull request #3412 from sumanth13131/usage-based-routing-ttl-on-cache
usage-based-routing-ttl-on-cache
2024-05-21 07:58:41 -07:00
Krrish Dholakia
84db63e3dd fix(lowest_latency.py): allow ttl to be a float 2024-05-15 09:59:21 -07:00
sumanth
4bbd9c866c addressed comments 2024-05-14 10:05:19 +05:30
SUMANTH
0db58c2fac Merge branch 'BerriAI:main' into usage-based-routing-ttl-on-cache 2024-05-14 09:08:01 +05:30
Rahul Kataria
be4450106d Remove duplicate code in router_strategy 2024-05-12 18:05:57 +05:30
Krrish Dholakia
926b86af87 feat(bedrock_httpx.py): moves to using httpx client for bedrock cohere calls 2024-05-11 13:43:08 -07:00
Krrish Dholakia
5f93cae3ff feat(proxy_server.py): return litellm version in response headers 2024-05-08 16:00:08 -07:00
Ishaan Jaff
3bc0b998b2 feat - make lowest_cost pure async 2024-05-07 13:51:50 -07:00
Ishaan Jaff
0f82d97202 fix allow user to pass input_cost and output_cost 2024-05-07 13:08:16 -07:00
Ishaan Jaff
a52ef20a40 test - lowest cost router 2024-05-07 13:04:12 -07:00
Ishaan Jaff
864512efd9 fix - default value for cost 2024-05-07 12:51:52 -07:00
Ishaan Jaff
a2304aa78b fix - lowest cost routing 2024-05-07 12:49:20 -07:00
Ishaan Jaff
98778f54e7 feat - add lowst cost router 2024-05-07 12:12:09 -07:00
Krrish Dholakia
cb88ed4df8 fix(lowest_latency.py): fix the size of the latency list to 10 by default (can be modified) 2024-05-03 09:00:32 -07:00
sumanth
dce55bab76 usage-based-routing-ttl-on-cache 2024-05-03 10:50:45 +05:30