Commit graph

80 commits

Author SHA1 Message Date
Krrish Dholakia
e391e30285 refactor: replace 'traceback.print_exc()' with logging library
allows error logs to be in json format for otel logging
2024-06-06 13:47:43 -07:00
Krrish Dholakia
bcc07afd04 fix(lowest_latency.py): set default none value for time_to_first_token in sync log success event 2024-05-21 18:42:15 -07:00
Krrish Dholakia
f007bf7e21 feat(lowest_latency.py): route by time to first token, for streaming requests (if available)
Closes https://github.com/BerriAI/litellm/issues/3574
2024-05-21 13:08:17 -07:00
Krish Dholakia
c0e43a7296 Merge pull request #3412 from sumanth13131/usage-based-routing-ttl-on-cache
usage-based-routing-ttl-on-cache
2024-05-21 07:58:41 -07:00
Krrish Dholakia
84db63e3dd fix(lowest_latency.py): allow ttl to be a float 2024-05-15 09:59:21 -07:00
sumanth
4bbd9c866c addressed comments 2024-05-14 10:05:19 +05:30
SUMANTH
0db58c2fac Merge branch 'BerriAI:main' into usage-based-routing-ttl-on-cache 2024-05-14 09:08:01 +05:30
Rahul Kataria
be4450106d Remove duplicate code in router_strategy 2024-05-12 18:05:57 +05:30
Krrish Dholakia
926b86af87 feat(bedrock_httpx.py): moves to using httpx client for bedrock cohere calls 2024-05-11 13:43:08 -07:00
Krrish Dholakia
5f93cae3ff feat(proxy_server.py): return litellm version in response headers 2024-05-08 16:00:08 -07:00
Ishaan Jaff
3bc0b998b2 feat - make lowest_cost pure async 2024-05-07 13:51:50 -07:00
Ishaan Jaff
0f82d97202 fix allow user to pass input_cost and output_cost 2024-05-07 13:08:16 -07:00
Ishaan Jaff
a52ef20a40 test - lowest cost router 2024-05-07 13:04:12 -07:00
Ishaan Jaff
864512efd9 fix - default value for cost 2024-05-07 12:51:52 -07:00
Ishaan Jaff
a2304aa78b fix - lowest cost routing 2024-05-07 12:49:20 -07:00
Ishaan Jaff
98778f54e7 feat - add lowest cost router 2024-05-07 12:12:09 -07:00
Krrish Dholakia
cb88ed4df8 fix(lowest_latency.py): fix the size of the latency list to 10 by default (can be modified) 2024-05-03 09:00:32 -07:00
sumanth
dce55bab76 usage-based-routing-ttl-on-cache 2024-05-03 10:50:45 +05:30
Krrish Dholakia
b22c604c8c feat(router.py): add 'get_model_info' helper function to get the model info for a specific model, based on its id 2024-05-02 17:53:09 -07:00
Krrish Dholakia
7ae28bfcc9 fix(lowest_latency.py): allow setting a buffer for getting values within a certain latency threshold
if an endpoint is slow - its completion time might not be updated till the call is completed. This prevents us from overloading those endpoints, in a simple way.
2024-04-30 12:00:26 -07:00
Krrish Dholakia
7e71c4f4f7 fix(lowest_tpm_rpm_v2.py): skip if item_tpm is None 2024-04-29 21:34:25 -07:00
Krish Dholakia
23df9eaefb Merge pull request #3358 from sumanth13131/usage-based-routing-RPM-fix
usage based routing RPM count fix
2024-04-29 16:45:25 -07:00
Ishaan Jaff
f4a036618f Merge pull request #3360 from BerriAI/litellm_random_pick_lowest_latency
[Fix] Lowest Latency routing - random pick deployments when all latencies=0
2024-04-29 16:31:32 -07:00
Ishaan Jaff
d4a0530d02 fix - lowest latency routing 2024-04-29 16:02:57 -07:00
Ishaan Jaff
2a49580b5b fix lowest latency - routing 2024-04-29 15:51:52 -07:00
Krrish Dholakia
3afe7ab1a1 fix(lowest_tpm_rpm_v2.py): shuffle deployments with same tpm values 2024-04-29 15:23:47 -07:00
Krrish Dholakia
c39f8f3ef1 fix(lowest_tpm_rpm_v2.py): add more detail to 'No deployments available' error message 2024-04-29 15:04:37 -07:00
sumanth
480a77996f usage based routing RPM count fix 2024-04-30 00:29:38 +05:30
Ishaan Jaff
7306072d33 fix debugging lowest latency router 2024-04-25 19:34:28 -07:00
Ishaan Jaff
3ab5e687f6 fix better debugging for latency 2024-04-25 11:35:08 -07:00
Ishaan Jaff
4931514330 fix 2024-04-25 11:25:03 -07:00
Ishaan Jaff
3b9d6dfc47 temp - show better debug logs for lowest latency 2024-04-25 11:22:52 -07:00
Ishaan Jaff
a26ecbad97 fix - increase default penalty for lowest latency 2024-04-25 07:54:25 -07:00
Ishaan Jaff
5dae1cf303 fix - set latency stats in kwargs 2024-04-24 20:13:45 -07:00
Ishaan Jaff
654c736d29 feat - penalize timeout errors 2024-04-24 16:35:00 -07:00
Krrish Dholakia
1ca2439eb7 fix(lowest_tpm_rpm_v2.py): use a combined tpm+rpm query in async get cache, to reduce redis client calls in high traffic 2024-04-20 16:13:11 -07:00
Krrish Dholakia
5da934099f fix(caching.py): dual cache async_batch_get_cache fix + testing
this fixes a bug in usage-based-routing-v2 which was caused b/c of how the result was being returned from dual cache async_batch_get_cache. it also adds unit testing for that function (and its sync equivalent)
2024-04-19 15:03:25 -07:00
Krrish Dholakia
308a6e11f8 fix(lowest_tpm_rpm_v2.py): ensure backwards compatibility for python 3.8 2024-04-18 21:42:35 -07:00
Krrish Dholakia
376ee4e9d7 fix(test_lowest_tpm_rpm_routing_v2.py): unit testing for usage-based-routing-v2 2024-04-18 21:38:00 -07:00
Krrish Dholakia
72691e05f4 fix(tpm_rpm_routing_v2.py): fix tpm rpm routing 2024-04-18 20:01:22 -07:00
Krrish Dholakia
eb7f260efc fix(lowest_tpm_rpm_v2.py): don't fail calls if redis fails to connect 2024-04-12 19:36:59 -07:00
Krrish Dholakia
c177407f7b test(test_openai_endpoints.py): add concurrency testing for user defined rate limits on proxy 2024-04-12 18:56:13 -07:00
Krrish Dholakia
d9b8f63e86 fix(router.py): support pre_call_rpm_check for lowest_tpm_rpm_v2 routing
have routing strategies expose an ‘update rpm’ function; for checking + updating rpm pre call
2024-04-12 18:25:14 -07:00
Krrish Dholakia
8f06c2d8c4 fix(router.py): fix datetime object 2024-04-10 17:55:24 -07:00
Krrish Dholakia
384245e331 fix(router.py): make get_cooldown_deployment logic async 2024-04-10 16:57:01 -07:00
Krrish Dholakia
f5206d592a fix(router.py): generate consistent model id's
having the same id for a deployment, lets redis usage caching work across multiple instances
2024-04-10 15:23:57 -07:00
Krrish Dholakia
31e2d4e6d1 feat(lowest_tpm_rpm_v2.py): move to using redis.incr and redis.mget for getting model usage from redis
makes routing work across multiple instances
2024-04-10 14:56:23 -07:00
Krish Dholakia
b8d285d120 Merge pull request #2798 from CLARKBENHAM/main
add test for rate limits - Router isn't coroutine safe
2024-04-06 08:47:40 -07:00
Krrish Dholakia
48a5948081 fix(router.py): handle id being passed in as int 2024-04-04 14:23:10 -07:00
CLARKBENHAM
1c93ebf05a undo black formatting 2024-04-02 19:53:48 -07:00