Author | Commit | Message | Date
Ishaan Jaff | b4bca8db82 | feat - allow accessing data post success call | 2024-08-19 11:35:33 -07:00
Ishaan Jaff | 94e74b9ede | only write model tpm/rpm tracking when user sets it | 2024-08-18 09:58:09 -07:00
Ishaan Jaff | 8578301116 | fix async_pre_call_hook in parallel request limiter | 2024-08-17 12:42:28 -07:00
Ishaan Jaff | db8f789318 | Merge pull request #5259 from BerriAI/litellm_return_remaining_tokens_in_header | 2024-08-17 12:41:16 -07:00
    [Feat] return `x-litellm-key-remaining-requests-{model}`: 1, `x-litellm-key-remaining-tokens-{model}`: None in response headers
Ishaan Jaff | 9f6630912d | feat return remaining tokens for model for api key | 2024-08-17 12:35:10 -07:00
Ishaan Jaff | a62277a6aa | feat - use common helper for getting model group | 2024-08-17 10:46:04 -07:00
Ishaan Jaff | 03196742d2 | add litellm-key-remaining-tokens on prometheus | 2024-08-17 10:02:20 -07:00
Ishaan Jaff | 8ae626b31f | feat add settings for rpm/tpm limits for a model | 2024-08-17 09:16:01 -07:00
Ishaan Jaff | 824ea32452 | track rpm/tpm usage per key+model | 2024-08-16 18:28:58 -07:00
Krrish Dholakia | 2874b94fb1 | refactor: replace .error() with .exception() logging for better debugging on sentry | 2024-08-16 09:22:47 -07:00
Krrish Dholakia | e6bc7e938a | fix(utils.py): handle scenario where model="azure/*" and custom_llm_provider="azure" | 2024-08-02 17:48:53 -07:00
    Fixes https://github.com/BerriAI/litellm/issues/4912
Ishaan Jaff | bda2ac1af5 | fix raise better error when crossing tpm / rpm limits | 2024-07-26 17:35:08 -07:00
Krrish Dholakia | af2055c2b7 | feat(aporio_ai.py): support aporio ai prompt injection for chat completion requests | 2024-07-17 16:38:47 -07:00
    Closes https://github.com/BerriAI/litellm/issues/2950
Krrish Dholakia | 17635450cd | feat(proxy_server.py): return 'retry-after' param for rate limited requests | 2024-07-13 17:15:20 -07:00
    Closes https://github.com/BerriAI/litellm/issues/4695
Krrish Dholakia | 4ca677638f | fix: fix linting errors | 2024-07-13 14:39:42 -07:00
Krrish Dholakia | 1d6643df22 | feat(pass_through_endpoint.py): support enforcing key rpm limits on pass through endpoints | 2024-07-13 13:29:44 -07:00
    Closes https://github.com/BerriAI/litellm/issues/4698
Krrish Dholakia | 1a57e49e46 | fix(presidio_pii_masking.py): support logging_only pii masking | 2024-07-11 18:04:12 -07:00
Krrish Dholakia | bcd7358daf | fix(presidio_pii_masking.py): fix presidio unset url check + add same check for langfuse | 2024-07-06 17:50:55 -07:00
Krrish Dholakia | e424fea721 | fix(presidio_pii_masking.py): add support for setting 'http://' if unset by render env for presidio base url | 2024-07-06 17:42:10 -07:00
Krrish Dholakia | c1a1529582 | fix(dynamic_rate_limiter.py): add rpm allocation, priority + quota reservation to docs | 2024-07-01 23:35:42 -07:00
Krrish Dholakia | 0bc08063e1 | fix(dynamic_rate_limiter.py): support setting priority + reserving tpm/rpm | 2024-07-01 23:08:54 -07:00
Krrish Dholakia | f74490c69b | test(test_dynamic_rate_limit_handler.py): refactor tests for rpm support | 2024-07-01 20:16:10 -07:00
Krrish Dholakia | d528e263c2 | fix(dynamic_rate_limiter.py): support dynamic rate limiting on rpm | 2024-07-01 17:45:10 -07:00
Krrish Dholakia | 1e4f8744e6 | docs(team_budgets.md): fix script | 2024-06-22 15:42:05 -07:00
Krrish Dholakia | 8843b0dc77 | feat(dynamic_rate_limiter.py): working e2e | 2024-06-22 14:41:22 -07:00
Krrish Dholakia | 8f95381276 | refactor: instrument 'dynamic_rate_limiting' callback on proxy | 2024-06-22 00:32:29 -07:00
Krrish Dholakia | 6a7982fa40 | feat(dynamic_rate_limiter.py): passing base case | 2024-06-21 22:46:46 -07:00
Krrish Dholakia | 0430807178 | feat(dynamic_rate_limiter.py): update cache with active project | 2024-06-21 20:25:40 -07:00
Krrish Dholakia | 89dba82be9 | feat(dynamic_rate_limiter.py): initial commit for dynamic rate limiting | 2024-06-21 18:41:31 -07:00
    Closes https://github.com/BerriAI/litellm/issues/4124
Krish Dholakia | c373f104cc | Merge branch 'main' into litellm_redis_cache_usage | 2024-06-13 22:07:21 -07:00
Krrish Dholakia | 29169b3039 | feat(vertex_httpx.py): Moving to call vertex ai via httpx (instead of their sdk). Allows us to support all their api updates. | 2024-06-12 16:47:00 -07:00
Krrish Dholakia | 77328e4a28 | fix(parallel_request_limiter.py): use redis cache, if available for rate limiting across instances | 2024-06-12 10:35:48 -07:00
    Fixes https://github.com/BerriAI/litellm/issues/4148
Krrish Dholakia | 22b51c5af4 | fix(litellm_pre_call_utils.py): add support for key level caching params | 2024-06-07 22:09:14 -07:00
Krrish Dholakia | e391e30285 | refactor: replace 'traceback.print_exc()' with logging library | 2024-06-06 13:47:43 -07:00
    allows error logs to be in json format for otel logging
Krrish Dholakia | 56fd0c60d1 | fix(parallel_request_limiter.py): fix user+team tpm/rpm limit check | 2024-05-27 08:48:23 -07:00
    Closes https://github.com/BerriAI/litellm/issues/3788
Ishaan Jaff | eb58440ebf | feat - add end user rate limiting | 2024-05-22 14:01:57 -07:00
Krrish Dholakia | d4d4550bb6 | fix(proxy_server.py): fixes for making rejected responses work with streaming | 2024-05-20 12:32:19 -07:00
Krrish Dholakia | 8fb8d068fb | feat(proxy_server.py): refactor returning rejected message, to work with error logging | 2024-05-20 11:14:36 -07:00
    log the rejected request as a failed call to langfuse/slack alerting
Krrish Dholakia | 3f339cb694 | fix(parallel_request_limiter.py): fix max parallel request limiter on retries | 2024-05-15 20:16:11 -07:00
Ishaan Jaff | 9cc30e32b3 | (Fix) - linting errors | 2024-05-11 15:57:06 -07:00
Lunik | 5f43a7b511 | 🔊 fix: Correctly use verbose logging | 2024-05-04 11:04:23 +02:00
    Signed-off-by: Lunik <lunik@tiwabbit.fr>
Lunik | 38d4cbc511 | ✨ feat: Use 8 severity levels for azure content safety | 2024-05-04 10:45:39 +02:00
    Signed-off-by: Lunik <lunik@tiwabbit.fr>
Lunik | d69a1eeb4f | 📝 doc: Azure content safety Proxy usage | 2024-05-04 10:39:43 +02:00
    Signed-off-by: Lunik <lunik@tiwabbit.fr>
Lunik | 08593fcaab | ⚡️ perf: Remove test violation on each stream chunk | 2024-05-03 20:51:40 +02:00
    Signed-off-by: Lunik <lunik@tiwabbit.fr>
Lunik | 7945e28356 | ✅ ci: Add tests | 2024-05-03 20:50:37 +02:00
    Signed-off-by: Lunik <lunik@tiwabbit.fr>
Lunik | 3ca174bc57 | ✨ feat: Add Azure Content-Safety Proxy hooks | 2024-05-02 23:21:08 +02:00
    Signed-off-by: Lunik <lunik@tiwabbit.fr>
Krrish Dholakia | 31e2d4e6d1 | feat(lowest_tpm_rpm_v2.py): move to using redis.incr and redis.mget for getting model usage from redis | 2024-04-10 14:56:23 -07:00
    makes routing work across multiple instances
Krrish Dholakia | e06d43dc90 | fix(tpm_rpm_limiter.py): fix cache init logic | 2024-04-01 18:01:38 -07:00
Krrish Dholakia | b39bc583bd | test(test_max_tpm_rpm_limiter.py): unit tests for key + team based tpm rpm limits on proxy | 2024-04-01 08:00:01 -07:00
Krrish Dholakia | 555f0af027 | fix(tpm_rpm_limiter.py): enable redis caching for tpm/rpm checks on keys/user/teams | 2024-03-30 20:01:36 -07:00
    allows tpm/rpm checks to work across instances
    https://github.com/BerriAI/litellm/issues/2730