litellm-mirror

mirror of https://github.com/BerriAI/litellm.git synced 2025-04-27 19:54:13 +00:00

Author	SHA1	Message	Date
Ishaan Jaff	62a1cdec47	(code quality) run ruff rule to ban unused imports (#7313 ) * remove unused imports * fix AmazonConverseConfig * fix test * fix import * ruff check fixes * test fixes * fix testing * fix imports	2024-12-19 12:33:42 -08:00
Ishaan Jaff	6220e17ebf	(feat proxy) v2 - model max budgets (#7302 ) * clean up unused code * add _PROXY_VirtualKeyModelMaxBudgetLimiter * adjust type imports * working _PROXY_VirtualKeyModelMaxBudgetLimiter * fix user_api_key_model_max_budget * fix user_api_key_model_max_budget * update naming * update naming * fix changes to RouterBudgetLimiting * test_call_with_key_over_model_budget * test_call_with_key_over_model_budget * handle _get_request_model_budget_config * e2e test for test_call_with_key_over_model_budget * clean up test * run ci/cd again * add validate_model_max_budget * docs fix * update doc * add e2e testing for _PROXY_VirtualKeyModelMaxBudgetLimiter * test_unit_test_max_model_budget_limiter.py	2024-12-18 19:42:46 -08:00
Krish Dholakia	050499ec8f	Litellm dev readd prompt caching (#7299 ) * fix(router.py): re-add saving model id on prompt caching valid successful deployment * fix(router.py): introduce optional pre_call_checks isolate prompt caching logic in a separate file * fix(prompt_caching_deployment_check.py): fix import * fix(router.py): new 'async_filter_deployments' event hook allows custom logger to filter deployments returned to routing strategy * feat(prompt_caching_deployment_check.py): initial working commit of prompt caching based routing * fix(cooldown_callbacks.py): fix linting error * fix(budget_limiter.py): move budget logger to async_filter_deployment hook * test: add unit test * test(test_router_helper_utils.py): add unit testing * fix(budget_limiter.py): fix linting errors * docs(config_settings.md): add 'optional_pre_call_checks' to router_settings param docs	2024-12-18 15:13:49 -08:00
Ishaan Jaff	533381d4ad	tag budgets fixes	2024-12-18 10:28:37 -08:00
Krish Dholakia	03e711e3e4	LITELLM: Remove `requests` library usage (#7235 ) * fix(generic_api_callback.py): remove requests lib usage * fix(budget_manager.py): remove requests lib usgae * fix(main.py): cleanup requests lib usage * fix(utils.py): remove requests lib usage * fix(argilla.py): fix argilla test * fix(athina.py): replace 'requests' lib usage with litellm module * fix(greenscale.py): replace 'requests' lib usage with httpx * fix: remove unused 'requests' lib import + replace usage in some places * fix(prompt_layer.py): remove 'requests' lib usage from prompt layer * fix(ollama_chat.py): remove 'requests' lib usage * fix(baseten.py): replace 'requests' lib usage * fix(codestral/): replace 'requests' lib usage * fix(predibase/): replace 'requests' lib usage * refactor: cleanup unused 'requests' lib imports * fix(oobabooga.py): cleanup 'requests' lib usage * fix(invoke_handler.py): remove unused 'requests' lib usage * refactor: cleanup unused 'requests' lib import * fix: fix linting errors * refactor(ollama/): move ollama to using base llm http handler removes 'requests' lib dep for ollama integration * fix(ollama_chat.py): fix linting errors * fix(ollama/completion/transformation.py): convert non-jpeg/png image to jpeg/png before passing to ollama	2024-12-17 12:50:04 -08:00
Ishaan Jaff	2459f9735d	(feat) Add Tag-based budgets on litellm router / proxy (#7236 ) * add BudgetConfig * add _get_tags_from_request_kwargs * test_tag_budgets_e2e_test_expect_to_fail * add a check for request tags * fix _async_get_cache_keys_for_router_budget_limiting * fix test * fix _sync_in_memory_spend_with_redis * _async_get_cache_keys_for_router_budget_limiting * fix _init_tag_budgets * fix type casting * docs show error for tag budget limit hit * fix _get_tags_from_request_kwargs * fix undo change	2024-12-14 17:28:36 -08:00
Ishaan Jaff	bc46916bb3	(feat - Router / Proxy ) Allow setting budget limits per LLM deployment (#7220 ) * fix test_deployment_budget_limits_e2e_test * refactor async_log_success_event to track spend for provider + deployment * fix format * rename class to RouterBudgetLimiting * rename func * rename types used for budgets * add new types for deployment budgets * add budget limits for deployments * fix checking budgets set for provider * update file names * fix linting error * _track_provider_remaining_budget_prometheus * async_filter_deployments * fix model list passed to router * update error * test_deployment_budgets_e2e_test_expect_to_fail * fix test case * run deployment budget limits	2024-12-13 19:15:51 -08:00
Ishaan Jaff	6a9225fac2	(Refactor) Code Quality improvement - stop redefining LiteLLMBase (#7147 ) * fix stop redefining LiteLLMBase * use better name for base pydantic obj	2024-12-10 15:49:01 -08:00
Ishaan Jaff	fc7a9830ab	Provider Budget Routing - Get Budget, Spend Details (#7063 ) * add async_get_ttl to dual cache * add ProviderBudgetResponse * add provider_budgets * test_redis_get_ttl * _init_or_get_provider_budget_in_cache * test_init_or_get_provider_budget_in_cache * use _init_provider_budget_in_cache * test_get_current_provider_budget_reset_at * doc Get Budget, Spend Details * doc Provider Budget Routing	2024-12-06 21:14:12 -08:00
Ishaan Jaff	e47ebefced	(feat) - provider budget improvements - ensure provider budgets work with multiple proxy instances + improve latency to ~90ms (#6886 ) * use 1 file for duration_in_seconds * add to readme.md * re use duration_in_seconds * fix importing _extract_from_regex, get_last_day_of_month * fix import * update provider budget routing * fix - remove dup test * add support for using in multi instance environments * test_in_memory_redis_sync_e2e * test_in_memory_redis_sync_e2e * fix test_in_memory_redis_sync_e2e * fix code quality check * fix test provider budgets * working provider budget tests * add fixture for provider budget routing * fix router testing for provider budgets * add comments on provider budget routing * use RedisPipelineIncrementOperation * add redis async_increment_pipeline * use redis async_increment_pipeline * use lower value for testing * use redis async_increment_pipeline * use consistent key name for increment op * add handling for budget windows * fix typing async_increment_pipeline * fix set attr * add clear doc strings * unit testing for provider budgets * test_redis_increment_pipeline	2024-11-24 16:36:19 -08:00
Ishaan Jaff	72afed5b7e	(QOL improvement) Provider budget routing - allow using 1s, 1d, 1mo, 2mo etc (#6885 ) * use 1 file for duration_in_seconds * add to readme.md * re use duration_in_seconds * fix importing _extract_from_regex, get_last_day_of_month * fix import * update provider budget routing * fix - remove dup test	2024-11-23 16:59:46 -08:00
Ishaan Jaff	64b46e32cf	(feat) provider budget routing improvements (#6827 ) * minor fix for provider budget * fix raise good error message when budget crossed for provider budget * fix test provider budgets * test provider budgets * feat - emit llm provider spend on prometheus * test_prometheus_metric_tracking * doc provider budgets	2024-11-19 21:25:08 -08:00
Ishaan Jaff	ce6465c9df	(Feat) Add provider specific budget routing (#6817 ) * add ProviderBudgetConfig * working test_provider_budgets_e2e_test * test_provider_budgets_e2e_test_expect_to_fail * use 1 cache read for getting provider spend * test_provider_budgets_e2e_test * add doc on provider budgets * clean up provider budgets * unit testing for provider budget routing * use as flag, not routing strat * fix init provider budget routing * use async_filter_deployments * fix test provider budgets * doc provider budget routing * doc provider budget routing * fix docs changes * fix comment	2024-11-19 20:25:27 -08:00
Krish Dholakia	3c591167e0	fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check (#6577 ) * fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check * fix(lowest_tpm_rpm_v2.py): return headers in correct format * test: update test * build(deps): bump cookie and express in /docs/my-website (#6566) Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together. Updates `cookie` from 0.6.0 to 0.7.1 - [Release notes](https://github.com/jshttp/cookie/releases) - [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1) Updates `express` from 4.20.0 to 4.21.1 - [Release notes](https://github.com/expressjs/express/releases) - [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md) - [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1) --- updated-dependencies: - dependency-name: cookie dependency-type: indirect - dependency-name: express dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * docs(virtual_keys.md): update Dockerfile reference (#6554) Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com> * (proxy fix) - call connect on prisma client when running setup (#6534) * critical fix - call connect on prisma client when running setup * fix test_proxy_server_prisma_setup * fix test_proxy_server_prisma_setup * Add 3.5 haiku (#6588) * feat: add claude-3-5-haiku-20241022 entries * feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models * add missing entries, remove vision * remove image token costs * Litellm perf improvements 3 (#6573) * perf: move writing key to cache, to background task * perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils adds 200ms on calls with pgdb connected * fix(litellm_pre_call_utils.py'): rename call_type to actual call used * perf(proxy_server.py): remove db logic from _get_config_from_file was causing db calls to occur on every llm request, if team_id was set on key * fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db reduces latency/call by ~100ms * fix(proxy_server.py): minor fix on existing_settings not incl alerting * fix(exception_mapping_utils.py): map databricks exception string * fix(auth_checks.py): fix auth check logic * test: correctly mark flaky test * fix(utils.py): handle auth token error for tokenizers.from_pretrained * build: fix map * build: fix map * build: fix json for model map * test: remove eol model * fix(proxy_server.py): fix db config loading logic * fix(proxy_server.py): fix order of config / db updates, to ensure fields not overwritten * test: skip test if required env var is missing * test: fix test --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>	2024-11-05 22:03:44 +05:30
Ishaan Jaff	5986f7457e	(router_strategy/) ensure all async functions use async cache methods (#6489 ) * fix router strat * use async set / get cache in router_strategy * add coverage for router strategy * fix imports * fix batch_get_cache * use async methods for least busy * fix least busy use async methods * fix test_dual_cache_increment * test async_get_available_deployment when routing_strategy="least-busy"	2024-10-29 21:07:17 +05:30
Krish Dholakia	e712a2090b	redis otel tracing + async support for latency routing (#6452 ) * docs(exception_mapping.md): add missing exception types Fixes https://github.com/Aider-AI/aider/issues/2120#issuecomment-2438971183 * fix(main.py): register custom model pricing with specific key Ensure custom model pricing is registered to the specific model+provider key combination * test: make testing more robust for custom pricing * fix(redis_cache.py): instrument otel logging for sync redis calls ensures complete coverage for all redis cache calls * refactor: pass parent_otel_span for redis caching calls in router allows for more observability into what calls are causing latency issues * test: update tests with new params * refactor: ensure e2e otel tracing for router * refactor(router.py): add more otel tracing acrosss router catch all latency issues for router requests * fix: fix linting error * fix(router.py): fix linting error * fix: fix test * test: fix tests * fix(dual_cache.py): pass ttl to redis cache * fix: fix param	2024-10-28 21:52:12 -07:00
Ishaan Jaff	0c5a47c404	(code quality) add ruff check PLR0915 for `too-many-statements` (#6309 ) * ruff add PLR0915 * add noqa for PLR0915 * fix noqa * add # noqa: PLR0915 * # noqa: PLR0915 * # noqa: PLR0915 * # noqa: PLR0915 * add # noqa: PLR0915 * # noqa: PLR0915 * # noqa: PLR0915 * # noqa: PLR0915 * # noqa: PLR0915	2024-10-18 15:36:49 +05:30
Ishaan Jaff	ece65164fb	(refactor router.py ) - PR 3 - Ensure all functions under 100 lines (#6181 ) * add flake 8 check * split up litellm _acompletion * fix get model client * refactor use commong func to add metadata to kwargs * use common func to get timeout * re-use helper to _get_async_model_client * use _handle_mock_testing_rate_limit_error * fix docstring for _handle_mock_testing_rate_limit_error * fix function_with_retries * use helper for mock testing fallbacks * router - use 1 func for simple_shuffle * add doc string for simple_shuffle * use 1 function for filtering cooldown deployments * fix use common helper to _get_fallback_model_group_from_fallbacks	2024-10-14 21:27:54 +05:30
Ishaan Jaff	ba56e37244	(refactor) caching use LLMCachingHandler for async_get_cache and set_cache (#6208 ) * use folder for caching * fix importing caching * fix clickhouse pyright * fix linting * fix correctly pass kwargs and args * fix test case for embedding * fix linting * fix embedding caching logic * fix refactor handle utils.py * fix test_embedding_caching_azure_individual_items_reordered	2024-10-14 16:34:01 +05:30
Krish Dholakia	94a05ca5d0	Litellm ruff linting enforcement (#5992 ) * ci(config.yml): add a 'check_code_quality' step Addresses https://github.com/BerriAI/litellm/issues/5991 * ci(config.yml): check why circle ci doesn't pick up this test * ci(config.yml): fix to run 'check_code_quality' tests * fix(__init__.py): fix unprotected import * fix(__init__.py): don't remove unused imports * build(ruff.toml): update ruff.toml to ignore unused imports * fix: fix: ruff + pyright - fix linting + type-checking errors * fix: fix linting errors * fix(lago.py): fix module init error * fix: fix linting errors * ci(config.yml): cd into correct dir for checks * fix(proxy_server.py): fix linting error * fix(utils.py): fix bare except causes ruff linting errors * fix: ruff - fix remaining linting errors * fix(clickhouse.py): use standard logging object * fix(__init__.py): fix unprotected import * fix: ruff - fix linting errors * fix: fix linting errors * ci(config.yml): cleanup code qa step (formatting handled in local_testing) * fix(_health_endpoints.py): fix ruff linting errors * ci(config.yml): just use ruff in check_code_quality pipeline for now * build(custom_guardrail.py): include missing file * style(embedding_handler.py): fix ruff check	2024-10-01 19:44:20 -04:00
Krish Dholakia	f3fa2160a0	LiteLLM Minor Fixes & Improvements (09/21/2024) (#5819 ) * fix(router.py): fix error message * Litellm disable keys (#5814) * build(schema.prisma): allow blocking/unblocking keys Fixes https://github.com/BerriAI/litellm/issues/5328 * fix(key_management_endpoints.py): fix pop * feat(auth_checks.py): allow admin to enable/disable virtual keys Closes https://github.com/BerriAI/litellm/issues/5328 * docs(vertex.md): add auth section for vertex ai Addresses - https://github.com/BerriAI/litellm/issues/5768#issuecomment-2365284223 * build(model_prices_and_context_window.json): show which models support prompt_caching Closes https://github.com/BerriAI/litellm/issues/5776 * fix(router.py): allow setting default priority for requests * fix(router.py): add 'retry-after' header for concurrent request limit errors Fixes https://github.com/BerriAI/litellm/issues/5783 * fix(router.py): correctly raise and use retry-after header from azure+openai Fixes https://github.com/BerriAI/litellm/issues/5783 * fix(user_api_key_auth.py): fix valid token being none * fix(auth_checks.py): fix model dump for cache management object * fix(user_api_key_auth.py): pass prisma_client to obj * test(test_otel.py): update test for new key check * test: fix test	2024-09-21 18:51:53 -07:00
Ishaan Jaff	55a4bd217f	[Fix] Router/ Proxy - Tag Based routing, raise correct error when no deployments found and tag filtering is on (#5745 ) * fix tag routing - raise correct error when no model with tag based routing * fix error string from tag based routing * test router tag based routing * raise 401 error when no tags avialable for deploymen * linting fix	2024-09-17 20:24:28 -07:00
Krish Dholakia	713d762411	LiteLLM Minor Fixes and Improvements (09/13/2024) (#5689 ) * refactor: cleanup unused variables + fix pyright errors * feat(health_check.py): Closes https://github.com/BerriAI/litellm/issues/5686 * fix(o1_reasoning.py): add stricter check for o-1 reasoning model * refactor(mistral/): make it easier to see mistral transformation logic * fix(openai.py): fix openai o-1 model param mapping Fixes https://github.com/BerriAI/litellm/issues/5685 * feat(main.py): infer finetuned gemini model from base model Fixes https://github.com/BerriAI/litellm/issues/5678 * docs(vertex.md): update docs to call finetuned gemini models * feat(proxy_server.py): allow admin to hide proxy model aliases Closes https://github.com/BerriAI/litellm/issues/5692 * docs(load_balancing.md): add docs on hiding alias models from proxy config * fix(base.py): don't raise notimplemented error * fix(user_api_key_auth.py): fix model max budget check * fix(router.py): fix elif * fix(user_api_key_auth.py): don't set team_id to empty str * fix(team_endpoints.py): fix response type * test(test_completion.py): handle predibase error * test(test_proxy_server.py): fix test * fix(o1_transformation.py): fix max_completion_token mapping * test(test_image_generation.py): mark flaky test	2024-09-14 10:02:55 -07:00
Ishaan Jaff	83f3be6dc3	support default deployments	2024-09-09 14:23:17 -07:00
Ishaan Jaff	764b78349d	fix taf based routing debugging	2024-09-09 14:11:54 -07:00
Ishaan Jaff	f49fdab804	fix debug statements	2024-09-09 14:00:17 -07:00
Ishaan Jaff	0bb30b3ee8	fix get_deployments_for_tag	2024-08-29 13:51:36 -07:00
Krrish Dholakia	2874b94fb1	refactor: replace .error() with .exception() logging for better debugging on sentry	2024-08-16 09:22:47 -07:00
Ishaan Jaff	d1a4246d2b	control using enable_tag_filtering	2024-07-18 19:39:04 -07:00
Ishaan Jaff	cd40d58544	router - refactor to tag based routing	2024-07-18 19:22:09 -07:00
Ishaan Jaff	d4cad75d34	router - use free paid tier routing	2024-07-18 17:09:42 -07:00
Ishaan Jaff	957b8a1b8d	helper to get_deployments_for_tier	2024-07-18 17:06:06 -07:00
Krrish Dholakia	e391e30285	refactor: replace 'traceback.print_exc()' with logging library allows error logs to be in json format for otel logging	2024-06-06 13:47:43 -07:00
Krrish Dholakia	bcc07afd04	fix(lowest_latency.py): set default none value for time_to_first_token in sync log success event	2024-05-21 18:42:15 -07:00
Krrish Dholakia	f007bf7e21	feat(lowest_latency.py): route by time to first token, for streaming requests (if available) Closes https://github.com/BerriAI/litellm/issues/3574	2024-05-21 13:08:17 -07:00
Krish Dholakia	c0e43a7296	Merge pull request #3412 from sumanth13131/usage-based-routing-ttl-on-cache usage-based-routing-ttl-on-cache	2024-05-21 07:58:41 -07:00
Krrish Dholakia	84db63e3dd	fix(lowest_latency.py): allow ttl to be a float	2024-05-15 09:59:21 -07:00
sumanth	4bbd9c866c	addressed comments	2024-05-14 10:05:19 +05:30
SUMANTH	0db58c2fac	Merge branch 'BerriAI:main' into usage-based-routing-ttl-on-cache	2024-05-14 09:08:01 +05:30
Rahul Kataria	be4450106d	Remove duplicate code in router_strategy	2024-05-12 18:05:57 +05:30
Krrish Dholakia	926b86af87	feat(bedrock_httpx.py): moves to using httpx client for bedrock cohere calls	2024-05-11 13:43:08 -07:00
Krrish Dholakia	5f93cae3ff	feat(proxy_server.py): return litellm version in response headers	2024-05-08 16:00:08 -07:00
Ishaan Jaff	3bc0b998b2	feat - make lowest_cost pure async	2024-05-07 13:51:50 -07:00
Ishaan Jaff	0f82d97202	fix allow user to pass input_cost and output_cost	2024-05-07 13:08:16 -07:00
Ishaan Jaff	a52ef20a40	test - lowest cost router	2024-05-07 13:04:12 -07:00
Ishaan Jaff	864512efd9	fix - default value for cost	2024-05-07 12:51:52 -07:00
Ishaan Jaff	a2304aa78b	fix - lowest cost routing	2024-05-07 12:49:20 -07:00
Ishaan Jaff	98778f54e7	feat - add lowst cost router	2024-05-07 12:12:09 -07:00
Krrish Dholakia	cb88ed4df8	fix(lowest_latency.py): fix the size of the latency list to 10 by default (can be modified)	2024-05-03 09:00:32 -07:00
sumanth	dce55bab76	usage-based-routing-ttl-on-cache	2024-05-03 10:50:45 +05:30

1 2 3

112 commits