Ishaan Jaff
c4052ee7d7
support default deployments
2024-09-09 14:23:17 -07:00
Ishaan Jaff
f1d0045ae6
fix tag based routing debugging
2024-09-09 14:11:54 -07:00
Ishaan Jaff
a1f0df3cea
fix debug statements
2024-09-09 14:00:17 -07:00
Ishaan Jaff
84bda9cc80
fix get_deployments_for_tag
2024-08-29 13:51:36 -07:00
Krrish Dholakia
61f4b71ef7
refactor: replace .error() with .exception() logging for better debugging on sentry
2024-08-16 09:22:47 -07:00
Ishaan Jaff
08adda7091
control using enable_tag_filtering
2024-07-18 19:39:04 -07:00
Ishaan Jaff
4d0fbfea83
router - refactor to tag based routing
2024-07-18 19:22:09 -07:00
Ishaan Jaff
64e38562d9
router - use free paid tier routing
2024-07-18 17:09:42 -07:00
Ishaan Jaff
88cd641089
helper to get_deployments_for_tier
2024-07-18 17:06:06 -07:00
Krrish Dholakia
6cca5612d2
refactor: replace 'traceback.print_exc()' with logging library
...
allows error logs to be in json format for otel logging
2024-06-06 13:47:43 -07:00
Krrish Dholakia
f19d7327ca
fix(lowest_latency.py): set default none value for time_to_first_token in sync log success event
2024-05-21 18:42:15 -07:00
Krrish Dholakia
2b3da449c8
feat(lowest_latency.py): route by time to first token, for streaming requests (if available)
...
Closes https://github.com/BerriAI/litellm/issues/3574
2024-05-21 13:08:17 -07:00
Krish Dholakia
2cda5a2bc3
Merge pull request #3412 from sumanth13131/usage-based-routing-ttl-on-cache
...
usage-based-routing-ttl-on-cache
2024-05-21 07:58:41 -07:00
Krrish Dholakia
f5d73547c7
fix(lowest_latency.py): allow ttl to be a float
2024-05-15 09:59:21 -07:00
sumanth
71e0294485
addressed comments
2024-05-14 10:05:19 +05:30
SUMANTH
978672a56d
Merge branch 'BerriAI:main' into usage-based-routing-ttl-on-cache
2024-05-14 09:08:01 +05:30
Rahul Kataria
d57ecf3371
Remove duplicate code in router_strategy
2024-05-12 18:05:57 +05:30
Krrish Dholakia
4a3b084961
feat(bedrock_httpx.py): moves to using httpx client for bedrock cohere calls
2024-05-11 13:43:08 -07:00
Krrish Dholakia
6575143460
feat(proxy_server.py): return litellm version in response headers
2024-05-08 16:00:08 -07:00
Ishaan Jaff
6983e7a84f
feat - make lowest_cost pure async
2024-05-07 13:51:50 -07:00
Ishaan Jaff
486cbb990c
fix allow user to pass input_cost and output_cost
2024-05-07 13:08:16 -07:00
Ishaan Jaff
71a92b4fef
test - lowest cost router
2024-05-07 13:04:12 -07:00
Ishaan Jaff
690d7b10a6
fix - default value for cost
2024-05-07 12:51:52 -07:00
Ishaan Jaff
245960708d
fix - lowest cost routing
2024-05-07 12:49:20 -07:00
Ishaan Jaff
31ac43bfdc
feat - add lowest cost router
2024-05-07 12:12:09 -07:00
Krrish Dholakia
0b72904608
fix(lowest_latency.py): fix the size of the latency list to 10 by default (can be modified)
2024-05-03 09:00:32 -07:00
sumanth
3bc6b5d119
usage-based-routing-ttl-on-cache
2024-05-03 10:50:45 +05:30
Krrish Dholakia
91971fa9e0
feat(router.py): add 'get_model_info' helper function to get the model info for a specific model, based on its id
2024-05-02 17:53:09 -07:00
Krrish Dholakia
90cdfef1c1
fix(lowest_latency.py): allow setting a buffer for getting values within a certain latency threshold
...
If an endpoint is slow, its completion time might not be updated until the call completes. The buffer is a simple way to avoid overloading those endpoints.
2024-04-30 12:00:26 -07:00
Krrish Dholakia
020b175ef4
fix(lowest_tpm_rpm_v2.py): skip if item_tpm is None
2024-04-29 21:34:25 -07:00
Krish Dholakia
32534b5e91
Merge pull request #3358 from sumanth13131/usage-based-routing-RPM-fix
...
usage based routing RPM count fix
2024-04-29 16:45:25 -07:00
Ishaan Jaff
d58dd2cbeb
Merge pull request #3360 from BerriAI/litellm_random_pick_lowest_latency
...
[Fix] Lowest Latency routing - random pick deployments when all latencies=0
2024-04-29 16:31:32 -07:00
Ishaan Jaff
4cb4a7f06d
fix - lowest latency routing
2024-04-29 16:02:57 -07:00
Ishaan Jaff
3b0aa05378
fix lowest latency - routing
2024-04-29 15:51:52 -07:00
Krrish Dholakia
a978f2d881
fix(lowest_tpm_rpm_v2.py): shuffle deployments with same tpm values
2024-04-29 15:23:47 -07:00
Krrish Dholakia
f10a066d36
fix(lowest_tpm_rpm_v2.py): add more detail to 'No deployments available' error message
2024-04-29 15:04:37 -07:00
sumanth
89e655c79e
usage based routing RPM count fix
2024-04-30 00:29:38 +05:30
Ishaan Jaff
bf92a0b31c
fix debugging lowest latency router
2024-04-25 19:34:28 -07:00
Ishaan Jaff
737af2b458
fix better debugging for latency
2024-04-25 11:35:08 -07:00
Ishaan Jaff
787735bb5a
fix
2024-04-25 11:25:03 -07:00
Ishaan Jaff
984259d420
temp - show better debug logs for lowest latency
2024-04-25 11:22:52 -07:00
Ishaan Jaff
92f21cba30
fix - increase default penalty for lowest latency
2024-04-25 07:54:25 -07:00
Ishaan Jaff
212369498e
fix - set latency stats in kwargs
2024-04-24 20:13:45 -07:00
Ishaan Jaff
bf6abed808
feat - penalize timeout errors
2024-04-24 16:35:00 -07:00
Krrish Dholakia
9379e3d047
fix(lowest_tpm_rpm_v2.py): use a combined tpm+rpm query in async get cache, to reduce redis client calls in high traffic
2024-04-20 16:13:11 -07:00
Krrish Dholakia
01a1a8f731
fix(caching.py): dual cache async_batch_get_cache fix + testing
...
This fixes a bug in usage-based-routing-v2 caused by how the result was returned from the dual cache async_batch_get_cache. It also adds unit tests for that function (and its sync equivalent).
2024-04-19 15:03:25 -07:00
Krrish Dholakia
3b9e2a58e2
fix(lowest_tpm_rpm_v2.py): ensure backwards compatibility for python 3.8
2024-04-18 21:42:35 -07:00
Krrish Dholakia
81573b2dd9
fix(test_lowest_tpm_rpm_routing_v2.py): unit testing for usage-based-routing-v2
2024-04-18 21:38:00 -07:00
Krrish Dholakia
a05f148c17
fix(tpm_rpm_routing_v2.py): fix tpm rpm routing
2024-04-18 20:01:22 -07:00
Krrish Dholakia
8179596ebc
fix(lowest_tpm_rpm_v2.py): don't fail calls if redis fails to connect
2024-04-12 19:36:59 -07:00