Ishaan Jaff
62a1cdec47
(code quality) run ruff rule to ban unused imports (#7313)
* remove unused imports
* fix AmazonConverseConfig
* fix test
* fix import
* ruff check fixes
* test fixes
* fix testing
* fix imports
2024-12-19 12:33:42 -08:00
Ishaan Jaff
6220e17ebf
(feat proxy) v2 - model max budgets (#7302)
* clean up unused code
* add _PROXY_VirtualKeyModelMaxBudgetLimiter
* adjust type imports
* working _PROXY_VirtualKeyModelMaxBudgetLimiter
* fix user_api_key_model_max_budget
* fix user_api_key_model_max_budget
* update naming
* update naming
* fix changes to RouterBudgetLimiting
* test_call_with_key_over_model_budget
* test_call_with_key_over_model_budget
* handle _get_request_model_budget_config
* e2e test for test_call_with_key_over_model_budget
* clean up test
* run ci/cd again
* add validate_model_max_budget
* docs fix
* update doc
* add e2e testing for _PROXY_VirtualKeyModelMaxBudgetLimiter
* test_unit_test_max_model_budget_limiter.py
2024-12-18 19:42:46 -08:00
Krish Dholakia
050499ec8f
Litellm dev readd prompt caching (#7299)
* fix(router.py): re-add saving model id on prompt caching valid successful deployment
* fix(router.py): introduce optional pre_call_checks
isolate prompt caching logic in a separate file
* fix(prompt_caching_deployment_check.py): fix import
* fix(router.py): new 'async_filter_deployments' event hook
allows custom logger to filter deployments returned to routing strategy
* feat(prompt_caching_deployment_check.py): initial working commit of prompt caching based routing
* fix(cooldown_callbacks.py): fix linting error
* fix(budget_limiter.py): move budget logger to async_filter_deployment hook
* test: add unit test
* test(test_router_helper_utils.py): add unit testing
* fix(budget_limiter.py): fix linting errors
* docs(config_settings.md): add 'optional_pre_call_checks' to router_settings param docs
2024-12-18 15:13:49 -08:00
Ishaan Jaff
533381d4ad
tag budgets fixes
2024-12-18 10:28:37 -08:00
Krish Dholakia
03e711e3e4
LITELLM: Remove requests library usage (#7235)
* fix(generic_api_callback.py): remove requests lib usage
* fix(budget_manager.py): remove requests lib usage
* fix(main.py): cleanup requests lib usage
* fix(utils.py): remove requests lib usage
* fix(argilla.py): fix argilla test
* fix(athina.py): replace 'requests' lib usage with litellm module
* fix(greenscale.py): replace 'requests' lib usage with httpx
* fix: remove unused 'requests' lib import + replace usage in some places
* fix(prompt_layer.py): remove 'requests' lib usage from prompt layer
* fix(ollama_chat.py): remove 'requests' lib usage
* fix(baseten.py): replace 'requests' lib usage
* fix(codestral/): replace 'requests' lib usage
* fix(predibase/): replace 'requests' lib usage
* refactor: cleanup unused 'requests' lib imports
* fix(oobabooga.py): cleanup 'requests' lib usage
* fix(invoke_handler.py): remove unused 'requests' lib usage
* refactor: cleanup unused 'requests' lib import
* fix: fix linting errors
* refactor(ollama/): move ollama to using base llm http handler
removes 'requests' lib dep for ollama integration
* fix(ollama_chat.py): fix linting errors
* fix(ollama/completion/transformation.py): convert non-jpeg/png image to jpeg/png before passing to ollama
2024-12-17 12:50:04 -08:00
Ishaan Jaff
2459f9735d
(feat) Add Tag-based budgets on litellm router / proxy (#7236)
* add BudgetConfig
* add _get_tags_from_request_kwargs
* test_tag_budgets_e2e_test_expect_to_fail
* add a check for request tags
* fix _async_get_cache_keys_for_router_budget_limiting
* fix test
* fix _sync_in_memory_spend_with_redis
* _async_get_cache_keys_for_router_budget_limiting
* fix _init_tag_budgets
* fix type casting
* docs show error for tag budget limit hit
* fix _get_tags_from_request_kwargs
* fix undo change
2024-12-14 17:28:36 -08:00
Ishaan Jaff
bc46916bb3
(feat - Router / Proxy) Allow setting budget limits per LLM deployment (#7220)
* fix test_deployment_budget_limits_e2e_test
* refactor async_log_success_event to track spend for provider + deployment
* fix format
* rename class to RouterBudgetLimiting
* rename func
* rename types used for budgets
* add new types for deployment budgets
* add budget limits for deployments
* fix checking budgets set for provider
* update file names
* fix linting error
* _track_provider_remaining_budget_prometheus
* async_filter_deployments
* fix model list passed to router
* update error
* test_deployment_budgets_e2e_test_expect_to_fail
* fix test case
* run deployment budget limits
2024-12-13 19:15:51 -08:00
Ishaan Jaff
6a9225fac2
(Refactor) Code Quality improvement - stop redefining LiteLLMBase (#7147)
* fix stop redefining LiteLLMBase
* use better name for base pydantic obj
2024-12-10 15:49:01 -08:00
Ishaan Jaff
fc7a9830ab
Provider Budget Routing - Get Budget, Spend Details (#7063)
* add async_get_ttl to dual cache
* add ProviderBudgetResponse
* add provider_budgets
* test_redis_get_ttl
* _init_or_get_provider_budget_in_cache
* test_init_or_get_provider_budget_in_cache
* use _init_provider_budget_in_cache
* test_get_current_provider_budget_reset_at
* doc Get Budget, Spend Details
* doc Provider Budget Routing
2024-12-06 21:14:12 -08:00
Ishaan Jaff
e47ebefced
(feat) - provider budget improvements - ensure provider budgets work with multiple proxy instances + improve latency to ~90ms (#6886)
* use 1 file for duration_in_seconds
* add to readme.md
* re use duration_in_seconds
* fix importing _extract_from_regex, get_last_day_of_month
* fix import
* update provider budget routing
* fix - remove dup test
* add support for using in multi instance environments
* test_in_memory_redis_sync_e2e
* test_in_memory_redis_sync_e2e
* fix test_in_memory_redis_sync_e2e
* fix code quality check
* fix test provider budgets
* working provider budget tests
* add fixture for provider budget routing
* fix router testing for provider budgets
* add comments on provider budget routing
* use RedisPipelineIncrementOperation
* add redis async_increment_pipeline
* use redis async_increment_pipeline
* use lower value for testing
* use redis async_increment_pipeline
* use consistent key name for increment op
* add handling for budget windows
* fix typing async_increment_pipeline
* fix set attr
* add clear doc strings
* unit testing for provider budgets
* test_redis_increment_pipeline
2024-11-24 16:36:19 -08:00
Ishaan Jaff
72afed5b7e
(QOL improvement) Provider budget routing - allow using 1s, 1d, 1mo, 2mo etc (#6885)
* use 1 file for duration_in_seconds
* add to readme.md
* re use duration_in_seconds
* fix importing _extract_from_regex, get_last_day_of_month
* fix import
* update provider budget routing
* fix - remove dup test
2024-11-23 16:59:46 -08:00
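The budget-window strings this commit enables (1s, 1d, 1mo, 2mo) can be illustrated with a minimal parsing sketch. The function name `parse_budget_duration` is hypothetical and is not LiteLLM's `duration_in_seconds`. Treating a month as a flat 30 days is a simplifying assumption; the mention of `get_last_day_of_month` above suggests the real helper is calendar-aware.

```python
import re

# Seconds per unit. "mo" is approximated as 30 days (assumption for this sketch);
# a calendar-aware implementation would count the actual days in each month.
_UNIT_SECONDS = {
    "s": 1,
    "m": 60,
    "h": 3600,
    "d": 86400,
    "mo": 30 * 86400,
}

# "mo" is listed before "m" so that "2mo" is read as two months, not two minutes.
_DURATION_RE = re.compile(r"^(\d+)(mo|s|m|h|d)$")


def parse_budget_duration(duration: str) -> int:
    """Convert a budget window such as '1d' or '2mo' into seconds (hypothetical helper)."""
    match = _DURATION_RE.match(duration.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    value, unit = int(match.group(1)), match.group(2)
    return value * _UNIT_SECONDS[unit]
```

A single regex keeps the accepted grammar explicit, so a malformed value such as `"1week"` fails loudly instead of silently producing a wrong budget window.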
Ishaan Jaff
64b46e32cf
(feat) provider budget routing improvements (#6827)
* minor fix for provider budget
* fix raise good error message when budget crossed for provider budget
* fix test provider budgets
* test provider budgets
* feat - emit llm provider spend on prometheus
* test_prometheus_metric_tracking
* doc provider budgets
2024-11-19 21:25:08 -08:00
Ishaan Jaff
ce6465c9df
(Feat) Add provider specific budget routing (#6817)
* add ProviderBudgetConfig
* working test_provider_budgets_e2e_test
* test_provider_budgets_e2e_test_expect_to_fail
* use 1 cache read for getting provider spend
* test_provider_budgets_e2e_test
* add doc on provider budgets
* clean up provider budgets
* unit testing for provider budget routing
* use as flag, not routing strat
* fix init provider budget routing
* use async_filter_deployments
* fix test provider budgets
* doc provider budget routing
* doc provider budget routing
* fix docs changes
* fix comment
2024-11-19 20:25:27 -08:00
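A minimal sketch of the idea behind provider budget routing, assuming a simplified in-memory model: each deployment belongs to a provider, a provider may have a spend limit, and deployments whose provider has reached that limit are dropped before the routing strategy chooses among the rest. The `Deployment` dataclass and `filter_by_provider_budget` function are hypothetical; they are not LiteLLM's `ProviderBudgetConfig` or `async_filter_deployments` API. The commits above indicate that the real feature keeps spend in a cache (with Redis sync for multi-instance setups) and resets it per budget window.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Deployment:
    # Hypothetical, simplified deployment record.
    model_name: str
    provider: str


def filter_by_provider_budget(
    deployments: List[Deployment],
    budgets: Dict[str, float],
    spend: Dict[str, float],
) -> List[Deployment]:
    """Drop deployments whose provider has used up its budget (illustrative only)."""
    healthy = []
    for d in deployments:
        limit = budgets.get(d.provider)
        # Providers without a configured budget are never filtered out.
        if limit is None or spend.get(d.provider, 0.0) < limit:
            healthy.append(d)
    if not healthy:
        # Fail with an explicit message instead of returning an empty list.
        raise RuntimeError("No deployments available: all provider budgets exceeded")
    return healthy
```

Filtering before selection means any routing strategy (shuffle, latency, cost) can run unchanged on the reduced list.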
Krish Dholakia
3c591167e0
fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check (#6577)
* fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check
* fix(lowest_tpm_rpm_v2.py): return headers in correct format
* test: update test
* build(deps): bump cookie and express in /docs/my-website (#6566)
Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.
Updates `cookie` from 0.6.0 to 0.7.1
- [Release notes](https://github.com/jshttp/cookie/releases)
- [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)
Updates `express` from 4.20.0 to 4.21.1
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)
---
updated-dependencies:
- dependency-name: cookie
dependency-type: indirect
- dependency-name: express
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* docs(virtual_keys.md): update Dockerfile reference (#6554)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
* (proxy fix) - call connect on prisma client when running setup (#6534)
* critical fix - call connect on prisma client when running setup
* fix test_proxy_server_prisma_setup
* fix test_proxy_server_prisma_setup
* Add 3.5 haiku (#6588)
* feat: add claude-3-5-haiku-20241022 entries
* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models
* add missing entries, remove vision
* remove image token costs
* Litellm perf improvements 3 (#6573)
* perf: move writing key to cache, to background task
* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils
adds 200ms on calls with pgdb connected
* fix(litellm_pre_call_utils.py): rename call_type to actual call used
* perf(proxy_server.py): remove db logic from _get_config_from_file
was causing db calls to occur on every llm request, if team_id was set on key
* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db
reduces latency/call by ~100ms
* fix(proxy_server.py): minor fix on existing_settings not incl alerting
* fix(exception_mapping_utils.py): map databricks exception string
* fix(auth_checks.py): fix auth check logic
* test: correctly mark flaky test
* fix(utils.py): handle auth token error for tokenizers.from_pretrained
* build: fix map
* build: fix map
* build: fix json for model map
* test: remove eol model
* fix(proxy_server.py): fix db config loading logic
* fix(proxy_server.py): fix order of config / db updates, to ensure fields not overwritten
* test: skip test if required env var is missing
* test: fix test
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
2024-11-05 22:03:44 +05:30
Ishaan Jaff
5986f7457e
(router_strategy/) ensure all async functions use async cache methods (#6489)
* fix router strat
* use async set / get cache in router_strategy
* add coverage for router strategy
* fix imports
* fix batch_get_cache
* use async methods for least busy
* fix least busy use async methods
* fix test_dual_cache_increment
* test async_get_available_deployment when routing_strategy="least-busy"
2024-10-29 21:07:17 +05:30
Krish Dholakia
e712a2090b
redis otel tracing + async support for latency routing (#6452)
* docs(exception_mapping.md): add missing exception types
Fixes https://github.com/Aider-AI/aider/issues/2120#issuecomment-2438971183
* fix(main.py): register custom model pricing with specific key
Ensure custom model pricing is registered to the specific model+provider key combination
* test: make testing more robust for custom pricing
* fix(redis_cache.py): instrument otel logging for sync redis calls
ensures complete coverage for all redis cache calls
* refactor: pass parent_otel_span for redis caching calls in router
allows for more observability into what calls are causing latency issues
* test: update tests with new params
* refactor: ensure e2e otel tracing for router
* refactor(router.py): add more otel tracing across router
catch all latency issues for router requests
* fix: fix linting error
* fix(router.py): fix linting error
* fix: fix test
* test: fix tests
* fix(dual_cache.py): pass ttl to redis cache
* fix: fix param
2024-10-28 21:52:12 -07:00
Ishaan Jaff
0c5a47c404
(code quality) add ruff check PLR0915 for too-many-statements (#6309)
* ruff add PLR0915
* add noqa for PLR0915
* fix noqa
* add # noqa: PLR0915
* # noqa: PLR0915
* # noqa: PLR0915
* # noqa: PLR0915
* add # noqa: PLR0915
* # noqa: PLR0915
* # noqa: PLR0915
* # noqa: PLR0915
* # noqa: PLR0915
2024-10-18 15:36:49 +05:30
Ishaan Jaff
ece65164fb
(refactor router.py) - PR 3 - Ensure all functions under 100 lines (#6181)
* add flake8 check
* split up litellm _acompletion
* fix get model client
* refactor: use common func to add metadata to kwargs
* use common func to get timeout
* re-use helper to _get_async_model_client
* use _handle_mock_testing_rate_limit_error
* fix docstring for _handle_mock_testing_rate_limit_error
* fix function_with_retries
* use helper for mock testing fallbacks
* router - use 1 func for simple_shuffle
* add doc string for simple_shuffle
* use 1 function for filtering cooldown deployments
* fix use common helper to _get_fallback_model_group_from_fallbacks
2024-10-14 21:27:54 +05:30
Ishaan Jaff
ba56e37244
(refactor) caching use LLMCachingHandler for async_get_cache and set_cache (#6208)
* use folder for caching
* fix importing caching
* fix clickhouse pyright
* fix linting
* fix correctly pass kwargs and args
* fix test case for embedding
* fix linting
* fix embedding caching logic
* fix refactor handle utils.py
* fix test_embedding_caching_azure_individual_items_reordered
2024-10-14 16:34:01 +05:30
Krish Dholakia
94a05ca5d0
Litellm ruff linting enforcement (#5992)
* ci(config.yml): add a 'check_code_quality' step
Addresses https://github.com/BerriAI/litellm/issues/5991
* ci(config.yml): check why circle ci doesn't pick up this test
* ci(config.yml): fix to run 'check_code_quality' tests
* fix(__init__.py): fix unprotected import
* fix(__init__.py): don't remove unused imports
* build(ruff.toml): update ruff.toml to ignore unused imports
* fix: ruff + pyright - fix linting + type-checking errors
* fix: fix linting errors
* fix(lago.py): fix module init error
* fix: fix linting errors
* ci(config.yml): cd into correct dir for checks
* fix(proxy_server.py): fix linting error
* fix(utils.py): fix bare except
causes ruff linting errors
* fix: ruff - fix remaining linting errors
* fix(clickhouse.py): use standard logging object
* fix(__init__.py): fix unprotected import
* fix: ruff - fix linting errors
* fix: fix linting errors
* ci(config.yml): cleanup code qa step (formatting handled in local_testing)
* fix(_health_endpoints.py): fix ruff linting errors
* ci(config.yml): just use ruff in check_code_quality pipeline for now
* build(custom_guardrail.py): include missing file
* style(embedding_handler.py): fix ruff check
2024-10-01 19:44:20 -04:00
Krish Dholakia
f3fa2160a0
LiteLLM Minor Fixes & Improvements (09/21/2024) (#5819)
* fix(router.py): fix error message
* Litellm disable keys (#5814)
* build(schema.prisma): allow blocking/unblocking keys
Fixes https://github.com/BerriAI/litellm/issues/5328
* fix(key_management_endpoints.py): fix pop
* feat(auth_checks.py): allow admin to enable/disable virtual keys
Closes https://github.com/BerriAI/litellm/issues/5328
* docs(vertex.md): add auth section for vertex ai
Addresses - https://github.com/BerriAI/litellm/issues/5768#issuecomment-2365284223
* build(model_prices_and_context_window.json): show which models support prompt_caching
Closes https://github.com/BerriAI/litellm/issues/5776
* fix(router.py): allow setting default priority for requests
* fix(router.py): add 'retry-after' header for concurrent request limit errors
Fixes https://github.com/BerriAI/litellm/issues/5783
* fix(router.py): correctly raise and use retry-after header from azure+openai
Fixes https://github.com/BerriAI/litellm/issues/5783
* fix(user_api_key_auth.py): fix valid token being none
* fix(auth_checks.py): fix model dump for cache management object
* fix(user_api_key_auth.py): pass prisma_client to obj
* test(test_otel.py): update test for new key check
* test: fix test
2024-09-21 18:51:53 -07:00
Ishaan Jaff
55a4bd217f
[Fix] Router/Proxy - Tag Based routing, raise correct error when no deployments found and tag filtering is on (#5745)
* fix tag routing - raise correct error when no model with tag based routing
* fix error string from tag based routing
* test router tag based routing
* raise 401 error when no tags available for deployment
* linting fix
2024-09-17 20:24:28 -07:00
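The tag-based routing behavior in this and the related commits can be illustrated with a small filtering sketch. It assumes each deployment is a dict with an optional `tags` list, keeps deployments that share at least one tag with the request, and raises an error when nothing matches. The function name `filter_by_tags` and the handling of a `default` tag for untagged requests are assumptions for illustration; this is not a description of LiteLLM's `get_deployments_for_tag`.

```python
from typing import Dict, List, Optional


def filter_by_tags(
    deployments: List[Dict],
    request_tags: Optional[List[str]],
) -> List[Dict]:
    """Keep deployments sharing at least one tag with the request (hypothetical sketch)."""
    if not request_tags:
        # Assumption: untagged requests go to deployments tagged "default",
        # or to every deployment when none carries that tag.
        defaults = [d for d in deployments if "default" in d.get("tags", [])]
        return defaults or deployments
    wanted = set(request_tags)
    matched = [d for d in deployments if wanted & set(d.get("tags", []))]
    if not matched:
        # Raise a clear error rather than silently routing to an untagged model.
        raise ValueError(f"no deployments match request tags {sorted(wanted)}")
    return matched
```

Set intersection makes the "any overlapping tag" rule explicit; a stricter variant could require every request tag to be present by using `wanted <= set(...)` instead.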
Krish Dholakia
713d762411
LiteLLM Minor Fixes and Improvements (09/13/2024) (#5689)
* refactor: cleanup unused variables + fix pyright errors
* feat(health_check.py): Closes https://github.com/BerriAI/litellm/issues/5686
* fix(o1_reasoning.py): add stricter check for o-1 reasoning model
* refactor(mistral/): make it easier to see mistral transformation logic
* fix(openai.py): fix openai o-1 model param mapping
Fixes https://github.com/BerriAI/litellm/issues/5685
* feat(main.py): infer finetuned gemini model from base model
Fixes https://github.com/BerriAI/litellm/issues/5678
* docs(vertex.md): update docs to call finetuned gemini models
* feat(proxy_server.py): allow admin to hide proxy model aliases
Closes https://github.com/BerriAI/litellm/issues/5692
* docs(load_balancing.md): add docs on hiding alias models from proxy config
* fix(base.py): don't raise notimplemented error
* fix(user_api_key_auth.py): fix model max budget check
* fix(router.py): fix elif
* fix(user_api_key_auth.py): don't set team_id to empty str
* fix(team_endpoints.py): fix response type
* test(test_completion.py): handle predibase error
* test(test_proxy_server.py): fix test
* fix(o1_transformation.py): fix max_completion_token mapping
* test(test_image_generation.py): mark flaky test
2024-09-14 10:02:55 -07:00
Ishaan Jaff
83f3be6dc3
support default deployments
2024-09-09 14:23:17 -07:00
Ishaan Jaff
764b78349d
fix tag based routing debugging
2024-09-09 14:11:54 -07:00
Ishaan Jaff
f49fdab804
fix debug statements
2024-09-09 14:00:17 -07:00
Ishaan Jaff
0bb30b3ee8
fix get_deployments_for_tag
2024-08-29 13:51:36 -07:00
Krrish Dholakia
2874b94fb1
refactor: replace .error() with .exception() logging for better debugging on sentry
2024-08-16 09:22:47 -07:00
Ishaan Jaff
d1a4246d2b
control using enable_tag_filtering
2024-07-18 19:39:04 -07:00
Ishaan Jaff
cd40d58544
router - refactor to tag based routing
2024-07-18 19:22:09 -07:00
Ishaan Jaff
d4cad75d34
router - use free paid tier routing
2024-07-18 17:09:42 -07:00
Ishaan Jaff
957b8a1b8d
helper to get_deployments_for_tier
2024-07-18 17:06:06 -07:00
Krrish Dholakia
e391e30285
refactor: replace 'traceback.print_exc()' with logging library
allows error logs to be in json format for otel logging
2024-06-06 13:47:43 -07:00
Krrish Dholakia
bcc07afd04
fix(lowest_latency.py): set default none value for time_to_first_token in sync log success event
2024-05-21 18:42:15 -07:00
Krrish Dholakia
f007bf7e21
feat(lowest_latency.py): route by time to first token, for streaming requests (if available)
Closes https://github.com/BerriAI/litellm/issues/3574
2024-05-21 13:08:17 -07:00
Krish Dholakia
c0e43a7296
Merge pull request #3412 from sumanth13131/usage-based-routing-ttl-on-cache
usage-based-routing-ttl-on-cache
2024-05-21 07:58:41 -07:00
Krrish Dholakia
84db63e3dd
fix(lowest_latency.py): allow ttl to be a float
2024-05-15 09:59:21 -07:00
sumanth
4bbd9c866c
addressed comments
2024-05-14 10:05:19 +05:30
SUMANTH
0db58c2fac
Merge branch 'BerriAI:main' into usage-based-routing-ttl-on-cache
2024-05-14 09:08:01 +05:30
Rahul Kataria
be4450106d
Remove duplicate code in router_strategy
2024-05-12 18:05:57 +05:30
Krrish Dholakia
926b86af87
feat(bedrock_httpx.py): moves to using httpx client for bedrock cohere calls
2024-05-11 13:43:08 -07:00
Krrish Dholakia
5f93cae3ff
feat(proxy_server.py): return litellm version in response headers
2024-05-08 16:00:08 -07:00
Ishaan Jaff
3bc0b998b2
feat - make lowest_cost pure async
2024-05-07 13:51:50 -07:00
Ishaan Jaff
0f82d97202
fix allow user to pass input_cost and output_cost
2024-05-07 13:08:16 -07:00
Ishaan Jaff
a52ef20a40
test - lowest cost router
2024-05-07 13:04:12 -07:00
Ishaan Jaff
864512efd9
fix - default value for cost
2024-05-07 12:51:52 -07:00
Ishaan Jaff
a2304aa78b
fix - lowest cost routing
2024-05-07 12:49:20 -07:00
Ishaan Jaff
98778f54e7
feat - add lowest cost router
2024-05-07 12:12:09 -07:00
Krrish Dholakia
cb88ed4df8
fix(lowest_latency.py): fix the size of the latency list to 10 by default (can be modified)
2024-05-03 09:00:32 -07:00
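A sketch of the bounded latency window this commit describes, assuming a simple in-process tracker: only the most recent `max_samples` latencies (10 by default, adjustable) are kept per deployment, and the deployment with the lowest average latency is chosen. `LatencyTracker` is a hypothetical class; LiteLLM's lowest-latency strategy appears to keep this data in the router cache rather than in a local object like this one.

```python
from collections import defaultdict, deque
from statistics import mean
from typing import Deque, Dict, List


class LatencyTracker:
    """Keeps the most recent latencies per deployment (illustrative sketch)."""

    def __init__(self, max_samples: int = 10) -> None:
        # deque(maxlen=...) discards the oldest sample once the window is full.
        self._samples: Dict[str, Deque[float]] = defaultdict(
            lambda: deque(maxlen=max_samples)
        )

    def record(self, deployment_id: str, latency_s: float) -> None:
        self._samples[deployment_id].append(latency_s)

    def average(self, deployment_id: str) -> float:
        # .get() avoids creating an empty window for unseen deployments.
        samples = self._samples.get(deployment_id)
        return mean(samples) if samples else 0.0

    def pick_lowest(self, deployment_ids: List[str]) -> str:
        # Deployments with no samples average 0.0, so they are tried first.
        return min(deployment_ids, key=self.average)
```

Capping the window bounds memory per deployment and lets the average recover quickly after a transient slowdown, at the cost of more noise than a longer history.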
sumanth
dce55bab76
usage-based-routing-ttl-on-cache
2024-05-03 10:50:45 +05:30