litellm-mirror

mirror of https://github.com/BerriAI/litellm.git synced 2025-04-25 10:44:24 +00:00

Author	SHA1	Message	Date
Krish Dholakia	b4e5c0de69	Improve rpm check on keys (#8301 ) * fix(parallel_request_limiter.py): initial commit that solves the rpm limit check on keys Fixes https://github.com/BerriAI/litellm/issues/6938 * fix(parallel_request_limiter.py): simpler approach - just increment RPM in pre call hook instead of on success * fix(parallel_request_limiter.py): pass testing * fix: fix linting error * fix(parallel_request_limiter.py): fix parallel request check for keys	2025-02-05 20:23:08 -08:00
Krish Dholakia	539f166166	Support budget/rate limit tiers for keys (#7429 ) * feat(proxy/utils.py): get associated litellm budget from db in combined_view for key allows user to create rate limit tiers and associate those to keys * feat(proxy/_types.py): update the value of key-level tpm/rpm/model max budget metrics with the associated budget table values if set allows rate limit tiers to be easily applied to keys * docs(rate_limit_tiers.md): add doc on setting rate limit / budget tiers make feature discoverable * feat(key_management_endpoints.py): return litellm_budget_table value in key generate make it easy for user to know associated budget on key creation * fix(key_management_endpoints.py): document 'budget_id' param in `/key/generate` * docs(key_management_endpoints.py): document budget_id usage * refactor(budget_management_endpoints.py): refactor budget endpoints into separate file - makes it easier to run documentation testing against it * docs(test_api_docs.py): add budget endpoints to ci/cd doc test + add missing param info to docs * fix(customer_endpoints.py): use new pydantic obj name * docs(user_management_heirarchy.md): add simple doc explaining teams/keys/org/users on litellm * Litellm dev 12 26 2024 p2 (#7432) * (Feat) Add logging for `POST v1/fine_tuning/jobs` (#7426) * init commit ft jobs logging * add ft logging * add logging for FineTuningJob * simple FT Job create test * (docs) - show all supported Azure OpenAI endpoints in overview (#7428) * azure batches * update doc * docs azure endpoints * docs endpoints on azure * docs azure batches api * docs azure batches api * fix(key_management_endpoints.py): fix key update to actually work * test(test_key_management.py): add e2e test asserting ui key update call works * fix: proxy/_types - fix linting erros * test: update test --------- Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> * fix: test * fix(parallel_request_limiter.py): enforce tpm/rpm limits on key from tiers * fix: fix linting errors * test: fix test * fix: remove unused import * test: update test * docs(customer_endpoints.py): document new model_max_budget param * test: specify unique key alias * docs(budget_management_endpoints.py): document new model_max_budget param * test: fix test * test: fix tests --------- Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>	2024-12-26 19:05:27 -08:00
Ishaan Jaff	c7f14e936a	(code quality) run ruff rule to ban unused imports (#7313 ) * remove unused imports * fix AmazonConverseConfig * fix test * fix import * ruff check fixes * test fixes * fix testing * fix imports	2024-12-19 12:33:42 -08:00
Ishaan Jaff	3de32f4106	(minor fix proxy) Clarify Proxy Rate limit errors are showing hash of litellm virtual key (#7210 ) * fix clarify rate limit errors are showing litellm virtual key * fix constants.py * update test * fix test parallel limiter	2024-12-12 20:13:14 -08:00
Krish Dholakia	d88e8922d4	Litellm dev 11 02 2024 (#6561 ) * fix(dual_cache.py): update in-memory check for redis batch get cache Fixes latency delay for async_batch_redis_cache * fix(service_logger.py): fix race condition causing otel service logging to be overwritten if service_callbacks set * feat(user_api_key_auth.py): add parent otel component for auth allows us to isolate how much latency is added by auth checks * perf(parallel_request_limiter.py): move async_set_cache_pipeline (from max parallel request limiter) out of execution path (background task) reduces latency by 200ms * feat(user_api_key_auth.py): have user api key auth object return user tpm/rpm limits - reduces redis calls in downstream task (parallel_request_limiter) Reduces latency by 400-800ms * fix(parallel_request_limiter.py): use batch get cache to reduce user/key/team usage object calls reduces latency by 50-100ms * fix: fix linting error * fix(_service_logger.py): fix import * fix(user_api_key_auth.py): fix service logging * fix(dual_cache.py): don't pass 'self' * fix: fix python3.8 error * fix: fix init]	2024-11-04 07:48:20 +05:30
Ishaan Jaff	610974b4fc	(code quality) add ruff check PLR0915 for `too-many-statements` (#6309 ) * ruff add PLR0915 * add noqa for PLR0915 * fix noqa * add # noqa: PLR0915 * # noqa: PLR0915 * # noqa: PLR0915 * # noqa: PLR0915 * add # noqa: PLR0915 * # noqa: PLR0915 * # noqa: PLR0915 * # noqa: PLR0915 * # noqa: PLR0915	2024-10-18 15:36:49 +05:30
Krish Dholakia	39486e2003	Litellm dev 10 14 2024 (#6221 ) * fix(__init__.py): expose DualCache, RedisCache, InMemoryCache on root abstract internal file refactors from impacting users * feat(utils.py): handle invalid openai parallel tool calling response Fixes https://community.openai.com/t/model-tries-to-call-unknown-function-multi-tool-use-parallel/490653 * docs(bedrock.md): clarify all bedrock models are supported Closes https://github.com/BerriAI/litellm/issues/6168#issuecomment-2412082236	2024-10-14 22:11:14 -07:00
Ishaan Jaff	4d1b4beb3d	(refactor) caching use LLMCachingHandler for async_get_cache and set_cache (#6208 ) * use folder for caching * fix importing caching * fix clickhouse pyright * fix linting * fix correctly pass kwargs and args * fix test case for embedding * fix linting * fix embedding caching logic * fix refactor handle utils.py * fix test_embedding_caching_azure_individual_items_reordered	2024-10-14 16:34:01 +05:30
Krish Dholakia	d57be47b0f	Litellm ruff linting enforcement (#5992 ) * ci(config.yml): add a 'check_code_quality' step Addresses https://github.com/BerriAI/litellm/issues/5991 * ci(config.yml): check why circle ci doesn't pick up this test * ci(config.yml): fix to run 'check_code_quality' tests * fix(__init__.py): fix unprotected import * fix(__init__.py): don't remove unused imports * build(ruff.toml): update ruff.toml to ignore unused imports * fix: fix: ruff + pyright - fix linting + type-checking errors * fix: fix linting errors * fix(lago.py): fix module init error * fix: fix linting errors * ci(config.yml): cd into correct dir for checks * fix(proxy_server.py): fix linting error * fix(utils.py): fix bare except causes ruff linting errors * fix: ruff - fix remaining linting errors * fix(clickhouse.py): use standard logging object * fix(__init__.py): fix unprotected import * fix: ruff - fix linting errors * fix: fix linting errors * ci(config.yml): cleanup code qa step (formatting handled in local_testing) * fix(_health_endpoints.py): fix ruff linting errors * ci(config.yml): just use ruff in check_code_quality pipeline for now * build(custom_guardrail.py): include missing file * style(embedding_handler.py): fix ruff check	2024-10-01 19:44:20 -04:00
Krrish Dholakia	6c7d1d5c96	fix(parallel_request_limiter.py): only update hidden params, don't set new (can lead to errors for responses where attribute can't be set)	2024-09-28 21:08:15 -07:00
Krrish Dholakia	3f8a5b3ef6	fix(parallel_request_limiter.py): make sure hidden params is dict before dereferencing	2024-09-28 21:08:15 -07:00
Krrish Dholakia	5222fc8e1b	fix(parallel_request_limiter.py): return remaining tpm/rpm in openai-compatible way Fixes https://github.com/BerriAI/litellm/issues/5957	2024-09-28 21:08:15 -07:00
Ishaan Jaff	088d906276	fix use one async async_batch_set_cache (#5956 )	2024-09-28 09:59:38 -07:00
Krish Dholakia	0b30e212da	LiteLLM Minor Fixes & Improvements (09/27/2024) (#5938 ) * fix(langfuse.py): prevent double logging requester metadata Fixes https://github.com/BerriAI/litellm/issues/5935 * build(model_prices_and_context_window.json): add mistral pixtral cost tracking Closes https://github.com/BerriAI/litellm/issues/5837 * handle streaming for azure ai studio error * [Perf Proxy] parallel request limiter - use one cache update call (#5932) * fix parallel request limiter - use one cache update call * ci/cd run again * run ci/cd again * use docker username password * fix config.yml * fix config * fix config * fix config.yml * ci/cd run again * use correct typing for batch set cache * fix async_set_cache_pipeline * fix only check user id tpm / rpm limits when limits set * fix test_openai_azure_embedding_with_oidc_and_cf * fix(groq/chat/transformation.py): Fixes https://github.com/BerriAI/litellm/issues/5839 * feat(anthropic/chat.py): return 'retry-after' headers from anthropic Fixes https://github.com/BerriAI/litellm/issues/4387 * feat: raise validation error if message has tool calls without passing `tools` param for anthropic/bedrock Closes https://github.com/BerriAI/litellm/issues/5747 * [Feature]#5940, add max_workers parameter for the batch_completion (#5947) * handle streaming for azure ai studio error * bump: version 1.48.2 → 1.48.3 * docs(data_security.md): add legal/compliance faq's Make it easier for companies to use litellm * docs: resolve imports * [Feature]#5940, add max_workers parameter for the batch_completion method --------- Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com> Co-authored-by: josearangos <josearangos@Joses-MacBook-Pro.local> * fix(converse_transformation.py): fix default message value * fix(utils.py): fix get_model_info to handle finetuned models Fixes issue for standard logging payloads, where model_map_value was null for finetuned openai models * fix(litellm_pre_call_utils.py): add debug statement for data sent after updating with team/key callbacks * fix: fix linting errors * fix(anthropic/chat/handler.py): fix cache creation input tokens * fix(exception_mapping_utils.py): fix missing imports * fix(anthropic/chat/handler.py): fix usage block translation * test: fix test * test: fix tests * style(types/utils.py): trigger new build * test: fix test --------- Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: Jose Alberto Arango Sanchez <jose.arangos@udea.edu.co> Co-authored-by: josearangos <josearangos@Joses-MacBook-Pro.local>	2024-09-27 22:52:57 -07:00
Ishaan Jaff	f4613a100d	[Perf Proxy] parallel request limiter - use one cache update call (#5932 ) * fix parallel request limiter - use one cache update call * ci/cd run again * run ci/cd again * use docker username password * fix config.yml * fix config * fix config * fix config.yml * ci/cd run again * use correct typing for batch set cache * fix async_set_cache_pipeline * fix only check user id tpm / rpm limits when limits set * fix test_openai_azure_embedding_with_oidc_and_cf	2024-09-27 17:24:46 -07:00
Ishaan Jaff	58171f35ef	[Fix proxy perf] Use correct cache key when reading from redis cache (#5928 ) * fix parallel request limiter use correct user id * async def get_user_object( fix * use safe get_internal_user_object * fix store internal users in redis correctly	2024-09-26 18:13:35 -07:00
Ishaan Jaff	7cbcf538c6	[Feat] Improve OTEL Tracking - Require all Redis Cache reads to be logged on OTEL (#5881 ) * fix use previous internal usage caching logic * fix test_dual_cache_uses_redis * redis track event_metadata in service logging * show otel error on _get_parent_otel_span_from_kwargs * track parent otel span on internal usage cache * update_request_status * fix internal usage cache * fix linting * fix test internal usage cache * fix linting error * show event metadata in redis set * fix test_get_team_redis * fix test_get_team_redis * test_proxy_logging_setup	2024-09-25 10:57:08 -07:00
Krish Dholakia	dad1ad2077	LiteLLM Minor Fixes and Improvements (09/14/2024) (#5697 ) * fix(health_check.py): hide sensitive keys from health check debug information k * fix(route_llm_request.py): fix proxy model not found error message to indicate how to resolve issue * fix(vertex_llm_base.py): fix exception message to not log credentials	2024-09-14 10:32:39 -07:00
Ishaan Jaff	0e1d3804ff	refactor vertex endpoints to pass through all routes	2024-08-21 17:08:42 -07:00
Ishaan Jaff	398295116f	inly write model tpm/rpm tracking when user set it	2024-08-18 09:58:09 -07:00
Ishaan Jaff	fa96610bbc	fix async_pre_call_hook in parallel request limiter	2024-08-17 12:42:28 -07:00
Ishaan Jaff	feb8c3c5b4	Merge pull request #5259 from BerriAI/litellm_return_remaining_tokens_in_header [Feat] return `x-litellm-key-remaining-requests-{model}`: 1, `x-litellm-key-remaining-tokens-{model}: None` in response headers	2024-08-17 12:41:16 -07:00
Ishaan Jaff	ee0f772b5c	feat return rmng tokens for model for api key	2024-08-17 12:35:10 -07:00
Ishaan Jaff	5985c7e933	feat - use commong helper for getting model group	2024-08-17 10:46:04 -07:00
Ishaan Jaff	412d30d362	add litellm-key-remaining-tokens on prometheus	2024-08-17 10:02:20 -07:00
Ishaan Jaff	785482f023	feat add settings for rpm/tpm limits for a model	2024-08-17 09:16:01 -07:00
Ishaan Jaff	1ee33478c9	track rpm/tpm usage per key+model	2024-08-16 18:28:58 -07:00
Krrish Dholakia	61f4b71ef7	refactor: replace .error() with .exception() logging for better debugging on sentry	2024-08-16 09:22:47 -07:00
Krrish Dholakia	5d96ff6694	fix(utils.py): handle scenario where model="azure/*" and custom_llm_provider="azure" Fixes https://github.com/BerriAI/litellm/issues/4912	2024-08-02 17:48:53 -07:00
Ishaan Jaff	c4e4b4675c	fix raise better error when crossing tpm / rpm limits	2024-07-26 17:35:08 -07:00
Krrish Dholakia	07d90f6739	feat(aporio_ai.py): support aporio ai prompt injection for chat completion requests Closes https://github.com/BerriAI/litellm/issues/2950	2024-07-17 16:38:47 -07:00
Krrish Dholakia	fde434be66	feat(proxy_server.py): return 'retry-after' param for rate limited requests Closes https://github.com/BerriAI/litellm/issues/4695	2024-07-13 17:15:20 -07:00
Krrish Dholakia	0cc273d77b	feat(pass_through_endpoint.py): support enforcing key rpm limits on pass through endpoints Closes https://github.com/BerriAI/litellm/issues/4698	2024-07-13 13:29:44 -07:00
Krrish Dholakia	76c9b715f2	fix(parallel_request_limiter.py): use redis cache, if available for rate limiting across instances Fixes https://github.com/BerriAI/litellm/issues/4148	2024-06-12 10:35:48 -07:00
Krrish Dholakia	4408b717f0	fix(parallel_request_limiter.py): fix user+team tpm/rpm limit check Closes https://github.com/BerriAI/litellm/issues/3788	2024-05-27 08:48:23 -07:00
Ishaan Jaff	106910cecf	feat - add end user rate limiting	2024-05-22 14:01:57 -07:00
Krrish Dholakia	594ca947c8	fix(parallel_request_limiter.py): fix max parallel request limiter on retries	2024-05-15 20:16:11 -07:00
Krrish Dholakia	5a117490ec	fix(proxy_server.py): fix tpm/rpm limiting for jwt auth fixes tpm/rpm limiting for jwt auth and implements unit tests for jwt auth	2024-03-28 21:19:34 -07:00
Krrish Dholakia	7876aa2d75	fix(parallel_request_limiter.py): handle metadata being none	2024-03-14 10:02:41 -07:00
Krrish Dholakia	ad55f4dbb5	feat(proxy_server.py): retry if virtual key is rate limited currently for chat completions	2024-03-05 19:00:03 -08:00
Krrish Dholakia	b3574f2b37	fix(parallel_request_limiter.py): handle none scenario	2024-02-26 20:09:06 -08:00
Krrish Dholakia	f86ab19067	fix(parallel_request_limiter.py): fix team rate limit enforcement	2024-02-26 18:06:13 -08:00
Krrish Dholakia	f84ac35000	feat(parallel_request_limiter.py): enforce team based tpm / rpm limits	2024-02-26 16:20:41 -08:00
ishaan-jaff	a13243652f	(fix) failing parallel_Request_limiter test	2024-02-22 19:16:22 -08:00
ishaan-jaff	1fff8f8105	(fix) don't double check curr data and time	2024-02-22 18:50:02 -08:00
ishaan-jaff	b5900099af	(feat) tpm/rpm limit by User	2024-02-22 18:44:03 -08:00
Krrish Dholakia	b9393fb769	fix(test_parallel_request_limiter.py): use mock responses for streaming	2024-02-08 21:45:38 -08:00
ishaan-jaff	13fe72d6d5	(fix) parallel_request_limiter debug	2024-02-06 12:43:28 -08:00
Krrish Dholakia	92058cbcd4	fix(utils.py): override default success callbacks with dynamic callbacks if set	2024-02-02 06:21:43 -08:00
Krrish Dholakia	bbe71c8375	fix(test_parallel_request_limiter): increase time limit for waiting for success logging event to happen	2024-01-30 13:26:17 -08:00

62 commits