litellm-mirror

mirror of https://github.com/BerriAI/litellm.git synced 2025-04-26 03:04:13 +00:00

Author	SHA1	Message	Date
Ishaan Jaff	41aade2cc0	(feat) Use `litellm/` prefix when storing virtual keys in AWS secret manager (#6765 ) * fix - storing AWS keys in secret manager * fix test_key_generate_with_secret_manager_call * allow using prefix_for_stored_virtual_keys * add prefix_for_stored_virtual_keys * test_key_generate_with_secret_manager_call	2024-11-15 18:07:43 -08:00
Ishaan Jaff	f8e700064e	(Feat) Add support for storing virtual keys in AWS SecretManager (#6728 ) * add SecretManager to httpxSpecialProvider * fix importing AWSSecretsManagerV2 * add unit testing for writing keys to AWS secret manager * use KeyManagementEventHooks for key/generated events * us event hooks for key management endpoints * working AWSSecretsManagerV2 * fix write secret to AWS secret manager on /key/generate * fix KeyManagementSettings * use tasks for key management hooks * add async_delete_secret * add test for async_delete_secret * use _delete_virtual_keys_from_secret_manager * fix test secret manager * test_key_generate_with_secret_manager_call * fix check for key_management_settings * sync_read_secret * test_aws_secret_manager * fix sync_read_secret * use helper to check when _should_read_secret_from_secret_manager * test_get_secret_with_access_mode * test - handle eol model claude-2, use claude-2.1 instead * docs AWS secret manager * fix test_read_nonexistent_secret * fix test_supports_response_schema * ci/cd run again	2024-11-14 09:25:07 -08:00
Krish Dholakia	d88e8922d4	Litellm dev 11 02 2024 (#6561 ) * fix(dual_cache.py): update in-memory check for redis batch get cache Fixes latency delay for async_batch_redis_cache * fix(service_logger.py): fix race condition causing otel service logging to be overwritten if service_callbacks set * feat(user_api_key_auth.py): add parent otel component for auth allows us to isolate how much latency is added by auth checks * perf(parallel_request_limiter.py): move async_set_cache_pipeline (from max parallel request limiter) out of execution path (background task) reduces latency by 200ms * feat(user_api_key_auth.py): have user api key auth object return user tpm/rpm limits - reduces redis calls in downstream task (parallel_request_limiter) Reduces latency by 400-800ms * fix(parallel_request_limiter.py): use batch get cache to reduce user/key/team usage object calls reduces latency by 50-100ms * fix: fix linting error * fix(_service_logger.py): fix import * fix(user_api_key_auth.py): fix service logging * fix(dual_cache.py): don't pass 'self' * fix: fix python3.8 error * fix: fix init]	2024-11-04 07:48:20 +05:30
Krish Dholakia	4f8a3fd4cf	redis otel tracing + async support for latency routing (#6452 ) * docs(exception_mapping.md): add missing exception types Fixes https://github.com/Aider-AI/aider/issues/2120#issuecomment-2438971183 * fix(main.py): register custom model pricing with specific key Ensure custom model pricing is registered to the specific model+provider key combination * test: make testing more robust for custom pricing * fix(redis_cache.py): instrument otel logging for sync redis calls ensures complete coverage for all redis cache calls * refactor: pass parent_otel_span for redis caching calls in router allows for more observability into what calls are causing latency issues * test: update tests with new params * refactor: ensure e2e otel tracing for router * refactor(router.py): add more otel tracing acrosss router catch all latency issues for router requests * fix: fix linting error * fix(router.py): fix linting error * fix: fix test * test: fix tests * fix(dual_cache.py): pass ttl to redis cache * fix: fix param	2024-10-28 21:52:12 -07:00
Ishaan Jaff	610974b4fc	(code quality) add ruff check PLR0915 for `too-many-statements` (#6309 ) * ruff add PLR0915 * add noqa for PLR0915 * fix noqa * add # noqa: PLR0915 * # noqa: PLR0915 * # noqa: PLR0915 * # noqa: PLR0915 * add # noqa: PLR0915 * # noqa: PLR0915 * # noqa: PLR0915 * # noqa: PLR0915 * # noqa: PLR0915	2024-10-18 15:36:49 +05:30
Krish Dholakia	39486e2003	Litellm dev 10 14 2024 (#6221 ) * fix(__init__.py): expose DualCache, RedisCache, InMemoryCache on root abstract internal file refactors from impacting users * feat(utils.py): handle invalid openai parallel tool calling response Fixes https://community.openai.com/t/model-tries-to-call-unknown-function-multi-tool-use-parallel/490653 * docs(bedrock.md): clarify all bedrock models are supported Closes https://github.com/BerriAI/litellm/issues/6168#issuecomment-2412082236	2024-10-14 22:11:14 -07:00
Ishaan Jaff	4d1b4beb3d	(refactor) caching use LLMCachingHandler for async_get_cache and set_cache (#6208 ) * use folder for caching * fix importing caching * fix clickhouse pyright * fix linting * fix correctly pass kwargs and args * fix test case for embedding * fix linting * fix embedding caching logic * fix refactor handle utils.py * fix test_embedding_caching_azure_individual_items_reordered	2024-10-14 16:34:01 +05:30
Krish Dholakia	d57be47b0f	Litellm ruff linting enforcement (#5992 ) * ci(config.yml): add a 'check_code_quality' step Addresses https://github.com/BerriAI/litellm/issues/5991 * ci(config.yml): check why circle ci doesn't pick up this test * ci(config.yml): fix to run 'check_code_quality' tests * fix(__init__.py): fix unprotected import * fix(__init__.py): don't remove unused imports * build(ruff.toml): update ruff.toml to ignore unused imports * fix: fix: ruff + pyright - fix linting + type-checking errors * fix: fix linting errors * fix(lago.py): fix module init error * fix: fix linting errors * ci(config.yml): cd into correct dir for checks * fix(proxy_server.py): fix linting error * fix(utils.py): fix bare except causes ruff linting errors * fix: ruff - fix remaining linting errors * fix(clickhouse.py): use standard logging object * fix(__init__.py): fix unprotected import * fix: ruff - fix linting errors * fix: fix linting errors * ci(config.yml): cleanup code qa step (formatting handled in local_testing) * fix(_health_endpoints.py): fix ruff linting errors * ci(config.yml): just use ruff in check_code_quality pipeline for now * build(custom_guardrail.py): include missing file * style(embedding_handler.py): fix ruff check	2024-10-01 19:44:20 -04:00
Krrish Dholakia	6c7d1d5c96	fix(parallel_request_limiter.py): only update hidden params, don't set new (can lead to errors for responses where attribute can't be set)	2024-09-28 21:08:15 -07:00
Krrish Dholakia	3f8a5b3ef6	fix(parallel_request_limiter.py): make sure hidden params is dict before dereferencing	2024-09-28 21:08:15 -07:00
Krrish Dholakia	5222fc8e1b	fix(parallel_request_limiter.py): return remaining tpm/rpm in openai-compatible way Fixes https://github.com/BerriAI/litellm/issues/5957	2024-09-28 21:08:15 -07:00
Krrish Dholakia	efc06d4a03	fix(batch_redis_get.py): handle custom namespace Fix https://github.com/BerriAI/litellm/issues/5917	2024-09-28 21:08:14 -07:00
Ishaan Jaff	088d906276	fix use one async async_batch_set_cache (#5956)	2024-09-28 09:59:38 -07:00
Krish Dholakia	0b30e212da	LiteLLM Minor Fixes & Improvements (09/27/2024) (#5938 ) * fix(langfuse.py): prevent double logging requester metadata Fixes https://github.com/BerriAI/litellm/issues/5935 * build(model_prices_and_context_window.json): add mistral pixtral cost tracking Closes https://github.com/BerriAI/litellm/issues/5837 * handle streaming for azure ai studio error * [Perf Proxy] parallel request limiter - use one cache update call (#5932) * fix parallel request limiter - use one cache update call * ci/cd run again * run ci/cd again * use docker username password * fix config.yml * fix config * fix config * fix config.yml * ci/cd run again * use correct typing for batch set cache * fix async_set_cache_pipeline * fix only check user id tpm / rpm limits when limits set * fix test_openai_azure_embedding_with_oidc_and_cf * fix(groq/chat/transformation.py): Fixes https://github.com/BerriAI/litellm/issues/5839 * feat(anthropic/chat.py): return 'retry-after' headers from anthropic Fixes https://github.com/BerriAI/litellm/issues/4387 * feat: raise validation error if message has tool calls without passing `tools` param for anthropic/bedrock Closes https://github.com/BerriAI/litellm/issues/5747 * [Feature]#5940, add max_workers parameter for the batch_completion (#5947) * handle streaming for azure ai studio error * bump: version 1.48.2 → 1.48.3 * docs(data_security.md): add legal/compliance faq's Make it easier for companies to use litellm * docs: resolve imports * [Feature]#5940, add max_workers parameter for the batch_completion method --------- Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com> Co-authored-by: josearangos <josearangos@Joses-MacBook-Pro.local> * fix(converse_transformation.py): fix default message value * fix(utils.py): fix get_model_info to handle finetuned models Fixes issue for standard logging payloads, where model_map_value was null for finetuned openai models * 
fix(litellm_pre_call_utils.py): add debug statement for data sent after updating with team/key callbacks * fix: fix linting errors * fix(anthropic/chat/handler.py): fix cache creation input tokens * fix(exception_mapping_utils.py): fix missing imports * fix(anthropic/chat/handler.py): fix usage block translation * test: fix test * test: fix tests * style(types/utils.py): trigger new build * test: fix test --------- Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: Jose Alberto Arango Sanchez <jose.arangos@udea.edu.co> Co-authored-by: josearangos <josearangos@Joses-MacBook-Pro.local>	2024-09-27 22:52:57 -07:00
Ishaan Jaff	f4613a100d	[Perf Proxy] parallel request limiter - use one cache update call (#5932 ) * fix parallel request limiter - use one cache update call * ci/cd run again * run ci/cd again * use docker username password * fix config.yml * fix config * fix config * fix config.yml * ci/cd run again * use correct typing for batch set cache * fix async_set_cache_pipeline * fix only check user id tpm / rpm limits when limits set * fix test_openai_azure_embedding_with_oidc_and_cf	2024-09-27 17:24:46 -07:00
Ishaan Jaff	58171f35ef	[Fix proxy perf] Use correct cache key when reading from redis cache (#5928 ) * fix parallel request limiter use correct user id * async def get_user_object( fix * use safe get_internal_user_object * fix store internal users in redis correctly	2024-09-26 18:13:35 -07:00
Ishaan Jaff	7cbcf538c6	[Feat] Improve OTEL Tracking - Require all Redis Cache reads to be logged on OTEL (#5881 ) * fix use previous internal usage caching logic * fix test_dual_cache_uses_redis * redis track event_metadata in service logging * show otel error on _get_parent_otel_span_from_kwargs * track parent otel span on internal usage cache * update_request_status * fix internal usage cache * fix linting * fix test internal usage cache * fix linting error * show event metadata in redis set * fix test_get_team_redis * fix test_get_team_redis * test_proxy_logging_setup	2024-09-25 10:57:08 -07:00
Krish Dholakia	234185ec13	LiteLLM Minor Fixes & Improvements (09/16/2024) (#5723 ) (#5731 ) * LiteLLM Minor Fixes & Improvements (09/16/2024) (#5723) * coverage (#5713) Signed-off-by: dbczumar <corey.zumar@databricks.com> * Move (#5714) Signed-off-by: dbczumar <corey.zumar@databricks.com> * fix(litellm_logging.py): fix logging client re-init (#5710) Fixes https://github.com/BerriAI/litellm/issues/5695 * fix(presidio.py): Fix logging_hook response and add support for additional presidio variables in guardrails config Fixes https://github.com/BerriAI/litellm/issues/5682 * feat(o1_handler.py): fake streaming for openai o1 models Fixes https://github.com/BerriAI/litellm/issues/5694 * docs: deprecated traceloop integration in favor of native otel (#5249) * fix: fix linting errors * fix: fix linting errors * fix(main.py): fix o1 import --------- Signed-off-by: dbczumar <corey.zumar@databricks.com> Co-authored-by: Corey Zumar <39497902+dbczumar@users.noreply.github.com> Co-authored-by: Nir Gazit <nirga@users.noreply.github.com> * feat(spend_management_endpoints.py): expose `/global/spend/refresh` endpoint for updating material view (#5730) * feat(spend_management_endpoints.py): expose `/global/spend/refresh` endpoint for updating material view Supports having `MonthlyGlobalSpend` view be a material view, and exposes an endpoint to refresh it * fix(custom_logger.py): reset calltype * fix: fix linting errors * fix: fix linting error * fix: fix import * test(test_databricks.py): fix databricks tests --------- Signed-off-by: dbczumar <corey.zumar@databricks.com> Co-authored-by: Corey Zumar <39497902+dbczumar@users.noreply.github.com> Co-authored-by: Nir Gazit <nirga@users.noreply.github.com>	2024-09-17 08:05:52 -07:00
Ishaan Jaff	b6ae2204a8	[Feat-Proxy] Slack Alerting - allow using os.environ/ vars for alert to webhook url (#5726 ) * allow using os.environ for slack urls * use env vars for webhook urls * fix types for get_secret * fix linting * fix linting * fix linting * linting fixes * linting fix * docs alerting slack * fix get data	2024-09-16 18:03:37 -07:00
Krish Dholakia	dad1ad2077	LiteLLM Minor Fixes and Improvements (09/14/2024) (#5697 ) * fix(health_check.py): hide sensitive keys from health check debug information k * fix(route_llm_request.py): fix proxy model not found error message to indicate how to resolve issue * fix(vertex_llm_base.py): fix exception message to not log credentials	2024-09-14 10:32:39 -07:00
Ishaan Jaff	fb5be57bb8	v0 add rerank on litellm proxy	2024-08-27 17:28:39 -07:00
Ishaan Jaff	0e1d3804ff	refactor vertex endpoints to pass through all routes	2024-08-21 17:08:42 -07:00
Ishaan Jaff	4685b9909a	feat - allow accessing data post success call	2024-08-19 11:35:33 -07:00
Ishaan Jaff	398295116f	only write model tpm/rpm tracking when user set it	2024-08-18 09:58:09 -07:00
Ishaan Jaff	fa96610bbc	fix async_pre_call_hook in parallel request limiter	2024-08-17 12:42:28 -07:00
Ishaan Jaff	feb8c3c5b4	Merge pull request #5259 from BerriAI/litellm_return_remaining_tokens_in_header [Feat] return `x-litellm-key-remaining-requests-{model}`: 1, `x-litellm-key-remaining-tokens-{model}: None` in response headers	2024-08-17 12:41:16 -07:00
Ishaan Jaff	ee0f772b5c	feat return rmng tokens for model for api key	2024-08-17 12:35:10 -07:00
Ishaan Jaff	5985c7e933	feat - use common helper for getting model group	2024-08-17 10:46:04 -07:00
Ishaan Jaff	412d30d362	add litellm-key-remaining-tokens on prometheus	2024-08-17 10:02:20 -07:00
Ishaan Jaff	785482f023	feat add settings for rpm/tpm limits for a model	2024-08-17 09:16:01 -07:00
Ishaan Jaff	1ee33478c9	track rpm/tpm usage per key+model	2024-08-16 18:28:58 -07:00
Krrish Dholakia	61f4b71ef7	refactor: replace .error() with .exception() logging for better debugging on sentry	2024-08-16 09:22:47 -07:00
Krrish Dholakia	5d96ff6694	fix(utils.py): handle scenario where model="azure/*" and custom_llm_provider="azure" Fixes https://github.com/BerriAI/litellm/issues/4912	2024-08-02 17:48:53 -07:00
Ishaan Jaff	c4e4b4675c	fix raise better error when crossing tpm / rpm limits	2024-07-26 17:35:08 -07:00
Krrish Dholakia	07d90f6739	feat(aporio_ai.py): support aporio ai prompt injection for chat completion requests Closes https://github.com/BerriAI/litellm/issues/2950	2024-07-17 16:38:47 -07:00
Krrish Dholakia	fde434be66	feat(proxy_server.py): return 'retry-after' param for rate limited requests Closes https://github.com/BerriAI/litellm/issues/4695	2024-07-13 17:15:20 -07:00
Krrish Dholakia	7e769f3b89	fix: fix linting errors	2024-07-13 14:39:42 -07:00
Krrish Dholakia	0cc273d77b	feat(pass_through_endpoint.py): support enforcing key rpm limits on pass through endpoints Closes https://github.com/BerriAI/litellm/issues/4698	2024-07-13 13:29:44 -07:00
Krrish Dholakia	9d918d2ac7	fix(presidio_pii_masking.py): support logging_only pii masking	2024-07-11 18:04:12 -07:00
Krrish Dholakia	1193ee8803	fix(presidio_pii_masking.py): fix presidio unset url check + add same check for langfuse	2024-07-06 17:50:55 -07:00
Krrish Dholakia	d57d3df1d6	fix(presidio_pii_masking.py): add support for setting 'http://' if unset by render env for presidio base url	2024-07-06 17:42:10 -07:00
Krrish Dholakia	196b94455e	fix(dynamic_rate_limiter.py): add rpm allocation, priority + quota reservation to docs	2024-07-01 23:35:42 -07:00
Krrish Dholakia	6b529d4e0e	fix(dynamic_rate_limiter.py): support setting priority + reserving tpm/rpm	2024-07-01 23:08:54 -07:00
Krrish Dholakia	0781014706	test(test_dynamic_rate_limit_handler.py): refactor tests for rpm support	2024-07-01 20:16:10 -07:00
Krrish Dholakia	f23b17091d	fix(dynamic_rate_limiter.py): support dynamic rate limiting on rpm	2024-07-01 17:45:10 -07:00
Krrish Dholakia	bae7377128	docs(team_budgets.md): fix script /	2024-06-22 15:42:05 -07:00
Krrish Dholakia	a31a05d45d	feat(dynamic_rate_limiter.py): working e2e	2024-06-22 14:41:22 -07:00
Krrish Dholakia	532f24bfb7	refactor: instrument 'dynamic_rate_limiting' callback on proxy	2024-06-22 00:32:29 -07:00
Krrish Dholakia	068e8dff5b	feat(dynamic_rate_limiter.py): passing base case	2024-06-21 22:46:46 -07:00
Krrish Dholakia	a028600932	feat(dynamic_rate_limiter.py): update cache with active project	2024-06-21 20:25:40 -07:00

118 commits