Krrish Dholakia
1e4f8744e6
docs(team_budgets.md): fix script
2024-06-22 15:42:05 -07:00
Krrish Dholakia
8843b0dc77
feat(dynamic_rate_limiter.py): working e2e
2024-06-22 14:41:22 -07:00
Krrish Dholakia
8f95381276
refactor: instrument 'dynamic_rate_limiting' callback on proxy
2024-06-22 00:32:29 -07:00
Krrish Dholakia
6a7982fa40
feat(dynamic_rate_limiter.py): passing base case
2024-06-21 22:46:46 -07:00
Krrish Dholakia
0430807178
feat(dynamic_rate_limiter.py): update cache with active project
2024-06-21 20:25:40 -07:00
Krrish Dholakia
89dba82be9
feat(dynamic_rate_limiter.py): initial commit for dynamic rate limiting
...
Closes https://github.com/BerriAI/litellm/issues/4124
2024-06-21 18:41:31 -07:00
Krish Dholakia
c373f104cc
Merge branch 'main' into litellm_redis_cache_usage
2024-06-13 22:07:21 -07:00
Krrish Dholakia
29169b3039
feat(vertex_httpx.py): Moving to call vertex ai via httpx (instead of their sdk). Allows us to support all their api updates.
2024-06-12 16:47:00 -07:00
Krrish Dholakia
77328e4a28
fix(parallel_request_limiter.py): use redis cache, if available for rate limiting across instances
...
Fixes https://github.com/BerriAI/litellm/issues/4148
2024-06-12 10:35:48 -07:00
Krrish Dholakia
22b51c5af4
fix(litellm_pre_call_utils.py): add support for key level caching params
2024-06-07 22:09:14 -07:00
Krrish Dholakia
e391e30285
refactor: replace 'traceback.print_exc()' with logging library
...
allows error logs to be in json format for otel logging
2024-06-06 13:47:43 -07:00
Krrish Dholakia
56fd0c60d1
fix(parallel_request_limiter.py): fix user+team tpm/rpm limit check
...
Closes https://github.com/BerriAI/litellm/issues/3788
2024-05-27 08:48:23 -07:00
Ishaan Jaff
eb58440ebf
feat - add end user rate limiting
2024-05-22 14:01:57 -07:00
Krrish Dholakia
d4d4550bb6
fix(proxy_server.py): fixes for making rejected responses work with streaming
2024-05-20 12:32:19 -07:00
Krrish Dholakia
8fb8d068fb
feat(proxy_server.py): refactor returning rejected message, to work with error logging
...
log the rejected request as a failed call to langfuse/slack alerting
2024-05-20 11:14:36 -07:00
Krrish Dholakia
3f339cb694
fix(parallel_request_limiter.py): fix max parallel request limiter on retries
2024-05-15 20:16:11 -07:00
Ishaan Jaff
9cc30e32b3
(Fix) - linting errors
2024-05-11 15:57:06 -07:00
Lunik
5f43a7b511
🔊 fix: Correctly use verbose logging
...
Signed-off-by: Lunik <lunik@tiwabbit.fr>
2024-05-04 11:04:23 +02:00
Lunik
38d4cbc511
✨ feat: Use 8 severity levels for azure content safety
...
Signed-off-by: Lunik <lunik@tiwabbit.fr>
2024-05-04 10:45:39 +02:00
Lunik
d69a1eeb4f
📝 doc: Azure content safety Proxy usage
...
Signed-off-by: Lunik <lunik@tiwabbit.fr>
2024-05-04 10:39:43 +02:00
Lunik
08593fcaab
⚡️ perf: Remove test violation on each stream chunk
...
Signed-off-by: Lunik <lunik@tiwabbit.fr>
2024-05-03 20:51:40 +02:00
Lunik
7945e28356
✅ ci: Add tests
...
Signed-off-by: Lunik <lunik@tiwabbit.fr>
2024-05-03 20:50:37 +02:00
Lunik
3ca174bc57
✨ feat: Add Azure Content-Safety Proxy hooks
...
Signed-off-by: Lunik <lunik@tiwabbit.fr>
2024-05-02 23:21:08 +02:00
Krrish Dholakia
31e2d4e6d1
feat(lowest_tpm_rpm_v2.py): move to using redis.incr and redis.mget for getting model usage from redis
...
makes routing work across multiple instances
2024-04-10 14:56:23 -07:00
Krrish Dholakia
e06d43dc90
fix(tpm_rpm_limiter.py): fix cache init logic
2024-04-01 18:01:38 -07:00
Krrish Dholakia
b39bc583bd
test(test_max_tpm_rpm_limiter.py): unit tests for key + team based tpm rpm limits on proxy
2024-04-01 08:00:01 -07:00
Krrish Dholakia
555f0af027
fix(tpm_rpm_limiter.py): enable redis caching for tpm/rpm checks on keys/user/teams
...
allows tpm/rpm checks to work across instances
https://github.com/BerriAI/litellm/issues/2730
2024-03-30 20:01:36 -07:00
Krrish Dholakia
737bb3e444
fix(proxy_server.py): fix tpm/rpm limiting for jwt auth
...
fixes tpm/rpm limiting for jwt auth and implements unit tests for jwt auth
2024-03-28 21:19:34 -07:00
Krrish Dholakia
7bc76ddbc3
feat(llm_guard.py): enable key-specific llm guard check
2024-03-26 17:21:51 -07:00
Ishaan Jaff
f0992c2dbd
(fix) stop using f strings with logger
2024-03-25 10:47:18 -07:00
Krrish Dholakia
b872644496
fix(prompt_injection_detection.py): fix type check
2024-03-21 08:56:13 -07:00
Krrish Dholakia
5cfabe9a09
fix: fix linting issue
2024-03-21 08:19:09 -07:00
Krrish Dholakia
e9cc6b4cc9
feat(proxy_server.py): enable llm api based prompt injection checks
...
run user calls through an llm api to check for prompt injection attacks. This happens in parallel to the actual llm call using `async_moderation_hook`
2024-03-20 22:43:42 -07:00
Krrish Dholakia
feb78b7819
fix(proxy_server.py): fix import
2024-03-20 19:15:06 -07:00
Krrish Dholakia
e9ff51aa70
fix(prompt_injection_detection.py): ensure combinations are actual phrases, not just 1-2 words
...
reduces misflagging
https://github.com/BerriAI/litellm/issues/2601
2024-03-20 19:09:38 -07:00
Krrish Dholakia
3680f16cd7
feat(batch_redis_get.py): batch redis GET requests for a given key + call type
...
reduces number of redis requests. 85ms latency improvement over 3 minutes of load (19k requests).
2024-03-15 14:54:16 -07:00
Krrish Dholakia
8d1c60bfdc
feat(batch_redis_get.py): batch redis GET requests for a given key + call type
...
reduces the number of GET requests we're making in high-throughput scenarios
2024-03-15 14:40:11 -07:00
Krrish Dholakia
d2f47ee45b
fix(parallel_request_limiter.py): handle metadata being none
2024-03-14 10:02:41 -07:00
Krrish Dholakia
c963e2761b
feat(proxy_server.py): retry if virtual key is rate limited
...
currently for chat completions
2024-03-05 19:00:03 -08:00
Krrish Dholakia
f72b84f6e0
fix(parallel_request_limiter.py): handle none scenario
2024-02-26 20:09:06 -08:00
Krrish Dholakia
7fff5119de
fix(parallel_request_limiter.py): fix team rate limit enforcement
2024-02-26 18:06:13 -08:00
Krrish Dholakia
5213fd2e1e
feat(parallel_request_limiter.py): enforce team based tpm / rpm limits
2024-02-26 16:20:41 -08:00
ishaan-jaff
5ec69a0ca5
(fix) failing parallel_Request_limiter test
2024-02-22 19:16:22 -08:00
ishaan-jaff
b728ded300
(fix) don't double check curr data and time
2024-02-22 18:50:02 -08:00
ishaan-jaff
74d66d5ac5
(feat) tpm/rpm limit by User
2024-02-22 18:44:03 -08:00
Krrish Dholakia
f68b692147
fix(presidio_pii_masking.py): enable user to pass ad hoc recognizer for pii masking
2024-02-20 16:01:15 -08:00
Krrish Dholakia
aa93b02562
fix(presidio_pii_masking.py): enable user to pass their own ad hoc recognizers to presidio
2024-02-20 15:19:31 -08:00
Krrish Dholakia
966abee67f
test(test_presidio_pii_masking.py): add more unit tests
2024-02-19 16:30:44 -08:00
Krrish Dholakia
93acda267c
feat(presidio_pii_masking.py): allow request level controls for turning on/off pii masking
...
https://github.com/BerriAI/litellm/issues/2003
2024-02-17 11:04:56 -08:00
Krrish Dholakia
7a75682637
docs(enterprise.md): add llama guard tutorial to enterprise docs
2024-02-17 09:25:49 -08:00