Ishaan Jaff
58171f35ef
[Fix proxy perf] Use correct cache key when reading from redis cache ( #5928 )
* fix parallel request limiter use correct user id
* async def get_user_object(
fix
* use safe get_internal_user_object
* fix store internal users in redis correctly
2024-09-26 18:13:35 -07:00
Ishaan Jaff
7cbcf538c6
[Feat] Improve OTEL Tracking - Require all Redis Cache reads to be logged on OTEL ( #5881 )
* fix use previous internal usage caching logic
* fix test_dual_cache_uses_redis
* redis track event_metadata in service logging
* show otel error on _get_parent_otel_span_from_kwargs
* track parent otel span on internal usage cache
* update_request_status
* fix internal usage cache
* fix linting
* fix test internal usage cache
* fix linting error
* show event metadata in redis set
* fix test_get_team_redis
* fix test_get_team_redis
* test_proxy_logging_setup
2024-09-25 10:57:08 -07:00
Krish Dholakia
234185ec13
LiteLLM Minor Fixes & Improvements (09/16/2024) ( #5723 ) ( #5731 )
* LiteLLM Minor Fixes & Improvements (09/16/2024) (#5723 )
* coverage (#5713 )
Signed-off-by: dbczumar <corey.zumar@databricks.com>
* Move (#5714 )
Signed-off-by: dbczumar <corey.zumar@databricks.com>
* fix(litellm_logging.py): fix logging client re-init (#5710 )
Fixes https://github.com/BerriAI/litellm/issues/5695
* fix(presidio.py): Fix logging_hook response and add support for additional presidio variables in guardrails config
Fixes https://github.com/BerriAI/litellm/issues/5682
* feat(o1_handler.py): fake streaming for openai o1 models
Fixes https://github.com/BerriAI/litellm/issues/5694
* docs: deprecated traceloop integration in favor of native otel (#5249 )
* fix: fix linting errors
* fix: fix linting errors
* fix(main.py): fix o1 import
---------
Signed-off-by: dbczumar <corey.zumar@databricks.com>
Co-authored-by: Corey Zumar <39497902+dbczumar@users.noreply.github.com>
Co-authored-by: Nir Gazit <nirga@users.noreply.github.com>
* feat(spend_management_endpoints.py): expose `/global/spend/refresh` endpoint for updating materialized view (#5730 )
* feat(spend_management_endpoints.py): expose `/global/spend/refresh` endpoint for updating materialized view
Supports having the `MonthlyGlobalSpend` view be a materialized view, and exposes an endpoint to refresh it
* fix(custom_logger.py): reset calltype
* fix: fix linting errors
* fix: fix linting error
* fix: fix import
* test(test_databricks.py): fix databricks tests
---------
Signed-off-by: dbczumar <corey.zumar@databricks.com>
Co-authored-by: Corey Zumar <39497902+dbczumar@users.noreply.github.com>
Co-authored-by: Nir Gazit <nirga@users.noreply.github.com>
2024-09-17 08:05:52 -07:00
Ishaan Jaff
b6ae2204a8
[Feat-Proxy] Slack Alerting - allow using os.environ/ vars for alert to webhook url ( #5726 )
* allow using os.environ for slack urls
* use env vars for webhook urls
* fix types for get_secret
* fix linting
* fix linting
* fix linting
* linting fixes
* linting fix
* docs alerting slack
* fix get data
2024-09-16 18:03:37 -07:00
Krish Dholakia
dad1ad2077
LiteLLM Minor Fixes and Improvements (09/14/2024) ( #5697 )
* fix(health_check.py): hide sensitive keys from health check debug information
* fix(route_llm_request.py): fix proxy model not found error message to indicate how to resolve issue
* fix(vertex_llm_base.py): fix exception message to not log credentials
2024-09-14 10:32:39 -07:00
Ishaan Jaff
fb5be57bb8
v0 add rerank on litellm proxy
2024-08-27 17:28:39 -07:00
Ishaan Jaff
0e1d3804ff
refactor vertex endpoints to pass through all routes
2024-08-21 17:08:42 -07:00
Ishaan Jaff
4685b9909a
feat - allow accessing data post success call
2024-08-19 11:35:33 -07:00
Ishaan Jaff
398295116f
only write model tpm/rpm tracking when user set it
2024-08-18 09:58:09 -07:00
Ishaan Jaff
fa96610bbc
fix async_pre_call_hook in parallel request limiter
2024-08-17 12:42:28 -07:00
Ishaan Jaff
feb8c3c5b4
Merge pull request #5259 from BerriAI/litellm_return_remaining_tokens_in_header
[Feat] return `x-litellm-key-remaining-requests-{model}`: 1, `x-litellm-key-remaining-tokens-{model}: None` in response headers
2024-08-17 12:41:16 -07:00
Ishaan Jaff
ee0f772b5c
feat return remaining tokens for model for api key
2024-08-17 12:35:10 -07:00
Ishaan Jaff
5985c7e933
feat - use common helper for getting model group
2024-08-17 10:46:04 -07:00
Ishaan Jaff
412d30d362
add litellm-key-remaining-tokens on prometheus
2024-08-17 10:02:20 -07:00
Ishaan Jaff
785482f023
feat add settings for rpm/tpm limits for a model
2024-08-17 09:16:01 -07:00
Ishaan Jaff
1ee33478c9
track rpm/tpm usage per key+model
2024-08-16 18:28:58 -07:00
Krrish Dholakia
61f4b71ef7
refactor: replace .error() with .exception() logging for better debugging on sentry
2024-08-16 09:22:47 -07:00
Krrish Dholakia
5d96ff6694
fix(utils.py): handle scenario where model="azure/*" and custom_llm_provider="azure"
Fixes https://github.com/BerriAI/litellm/issues/4912
2024-08-02 17:48:53 -07:00
Ishaan Jaff
c4e4b4675c
fix raise better error when crossing tpm / rpm limits
2024-07-26 17:35:08 -07:00
Krrish Dholakia
07d90f6739
feat(aporio_ai.py): support aporio ai prompt injection for chat completion requests
Closes https://github.com/BerriAI/litellm/issues/2950
2024-07-17 16:38:47 -07:00
Krrish Dholakia
fde434be66
feat(proxy_server.py): return 'retry-after' param for rate limited requests
Closes https://github.com/BerriAI/litellm/issues/4695
2024-07-13 17:15:20 -07:00
Krrish Dholakia
7e769f3b89
fix: fix linting errors
2024-07-13 14:39:42 -07:00
Krrish Dholakia
0cc273d77b
feat(pass_through_endpoint.py): support enforcing key rpm limits on pass through endpoints
Closes https://github.com/BerriAI/litellm/issues/4698
2024-07-13 13:29:44 -07:00
Krrish Dholakia
9d918d2ac7
fix(presidio_pii_masking.py): support logging_only pii masking
2024-07-11 18:04:12 -07:00
Krrish Dholakia
1193ee8803
fix(presidio_pii_masking.py): fix presidio unset url check + add same check for langfuse
2024-07-06 17:50:55 -07:00
Krrish Dholakia
d57d3df1d6
fix(presidio_pii_masking.py): add support for setting 'http://' if unset by render env for presidio base url
2024-07-06 17:42:10 -07:00
Krrish Dholakia
196b94455e
fix(dynamic_rate_limiter.py): add rpm allocation, priority + quota reservation to docs
2024-07-01 23:35:42 -07:00
Krrish Dholakia
6b529d4e0e
fix(dynamic_rate_limiter.py): support setting priority + reserving tpm/rpm
2024-07-01 23:08:54 -07:00
Krrish Dholakia
0781014706
test(test_dynamic_rate_limit_handler.py): refactor tests for rpm support
2024-07-01 20:16:10 -07:00
Krrish Dholakia
f23b17091d
fix(dynamic_rate_limiter.py): support dynamic rate limiting on rpm
2024-07-01 17:45:10 -07:00
Krrish Dholakia
bae7377128
docs(team_budgets.md): fix script
...
/
2024-06-22 15:42:05 -07:00
Krrish Dholakia
a31a05d45d
feat(dynamic_rate_limiter.py): working e2e
2024-06-22 14:41:22 -07:00
Krrish Dholakia
532f24bfb7
refactor: instrument 'dynamic_rate_limiting' callback on proxy
2024-06-22 00:32:29 -07:00
Krrish Dholakia
068e8dff5b
feat(dynamic_rate_limiter.py): passing base case
2024-06-21 22:46:46 -07:00
Krrish Dholakia
a028600932
feat(dynamic_rate_limiter.py): update cache with active project
2024-06-21 20:25:40 -07:00
Krrish Dholakia
2545da777b
feat(dynamic_rate_limiter.py): initial commit for dynamic rate limiting
Closes https://github.com/BerriAI/litellm/issues/4124
2024-06-21 18:41:31 -07:00
Krish Dholakia
e61cd2e1e2
Merge branch 'main' into litellm_redis_cache_usage
2024-06-13 22:07:21 -07:00
Krrish Dholakia
3b913443fe
feat(vertex_httpx.py): Moving to call vertex ai via httpx (instead of their sdk). Allows us to support all their api updates.
2024-06-12 16:47:00 -07:00
Krrish Dholakia
76c9b715f2
fix(parallel_request_limiter.py): use redis cache, if available for rate limiting across instances
Fixes https://github.com/BerriAI/litellm/issues/4148
2024-06-12 10:35:48 -07:00
Krrish Dholakia
af1ae80277
fix(litellm_pre_call_utils.py): add support for key level caching params
2024-06-07 22:09:14 -07:00
Krrish Dholakia
6cca5612d2
refactor: replace 'traceback.print_exc()' with logging library
allows error logs to be in json format for otel logging
2024-06-06 13:47:43 -07:00
Krrish Dholakia
4408b717f0
fix(parallel_request_limiter.py): fix user+team tpm/rpm limit check
Closes https://github.com/BerriAI/litellm/issues/3788
2024-05-27 08:48:23 -07:00
Ishaan Jaff
106910cecf
feat - add end user rate limiting
2024-05-22 14:01:57 -07:00
Krrish Dholakia
b41f30ca60
fix(proxy_server.py): fixes for making rejected responses work with streaming
2024-05-20 12:32:19 -07:00
Krrish Dholakia
f11f207ae6
feat(proxy_server.py): refactor returning rejected message, to work with error logging
log the rejected request as a failed call to langfuse/slack alerting
2024-05-20 11:14:36 -07:00
Krrish Dholakia
594ca947c8
fix(parallel_request_limiter.py): fix max parallel request limiter on retries
2024-05-15 20:16:11 -07:00
Ishaan Jaff
91a6a0eef4
(Fix) - linting errors
2024-05-11 15:57:06 -07:00
Lunik
1639a51f24
🔊 fix: Correctly use verbose logging
Signed-off-by: Lunik <lunik@tiwabbit.fr>
2024-05-04 11:04:23 +02:00
Lunik
8783fd4895
✨ feat: Use 8 severity levels for azure content safety
Signed-off-by: Lunik <lunik@tiwabbit.fr>
2024-05-04 10:45:39 +02:00
Lunik
cb178723ca
📝 doc: Azure content safety Proxy usage
Signed-off-by: Lunik <lunik@tiwabbit.fr>
2024-05-04 10:39:43 +02:00