Krrish Dholakia
|
afaee375e6
|
fix(lowest_latency.py): consistent time calc
|
2024-02-14 15:03:35 -08:00 |
|
stephenleo
|
a6f24acb8b
|
fix latency calc (lower better)
|
2024-02-11 17:06:46 +08:00 |
|
ishaan-jaff
|
a8a47b897d
|
(feat) router - usage based routing - consider input_tokens
|
2024-01-19 13:59:49 -08:00 |
|
Krrish Dholakia
|
ae9b8f50e0
|
fix(lowest_latency.py): fix merge issue
|
2024-01-10 21:37:46 +05:30 |
|
Krish Dholakia
|
e635ca2151
|
Merge branch 'main' into litellm_latency_routing_updates
|
2024-01-10 21:33:54 +05:30 |
|
Krrish Dholakia
|
7df19b2f7c
|
fix(router.py): allow user to control the latency routing time window
|
2024-01-10 20:56:52 +05:30 |
|
Krrish Dholakia
|
f288b12411
|
fix(lowest_latency.py): add back tpm/rpm checks, configurable time window
|
2024-01-10 20:52:01 +05:30 |
|
Krrish Dholakia
|
b5ec5eb10b
|
refactor(lowest_latency.py): fix linting error
|
2024-01-09 09:51:43 +05:30 |
|
Krrish Dholakia
|
fb9ebfbedd
|
feat(lowest_latency.py): support expanded time window for latency based routing
uses a 1hr avg. of latency for deployments, to determine which to route to
https://github.com/BerriAI/litellm/issues/1361
|
2024-01-09 09:38:04 +05:30 |
|
Krrish Dholakia
|
de1d51e3de
|
fix(lowest_tpm_rpm.py): handle null case for text/message input
|
2024-01-02 12:24:29 +05:30 |
|
Krrish Dholakia
|
01c042fdc6
|
feat(router.py): add support for retry/fallbacks for async embedding calls
|
2024-01-02 11:54:28 +05:30 |
|
Krrish Dholakia
|
4f988058a1
|
refactor(test_router_caching.py): move tpm/rpm routing tests to separate file
|
2024-01-02 11:10:11 +05:30 |
|
Krrish Dholakia
|
4eae0c9a0d
|
fix(router.py): correctly raise no model available error
https://github.com/BerriAI/litellm/issues/1289
|
2024-01-01 21:22:42 +05:30 |
|
Krrish Dholakia
|
d3dee9b20c
|
test(test_lowest_latency_routing.py): add more tests
|
2023-12-30 17:41:42 +05:30 |
|
Krrish Dholakia
|
25ee96271e
|
fix(router.py): fix latency based routing
|
2023-12-30 17:25:40 +05:30 |
|
Krrish Dholakia
|
402d59d0ff
|
fix(lowest_tpm_rpm_routing.py): broaden scope of get deployment logic
|
2023-12-30 13:27:50 +05:30 |
|
Krrish Dholakia
|
e1925d0e29
|
fix(router.py): support retry and fallbacks for atext_completion
|
2023-12-30 11:19:32 +05:30 |
|
Krrish Dholakia
|
a11940f4eb
|
fix(router.py): handle initial scenario for tpm/rpm routing
|
2023-12-30 07:28:45 +05:30 |
|
Krrish Dholakia
|
1933d44cbd
|
fix(router.py): fix int logic
|
2023-12-29 20:41:56 +05:30 |
|
Krrish Dholakia
|
a30f00276b
|
refactor(lowest_tpm_rpm.py): move tpm/rpm based routing to a separate file for better testing
|
2023-12-29 18:33:43 +05:30 |
|
Krrish Dholakia
|
3fa1bb9f08
|
test(test_least_busy_router.py): add better testing for least busy routing
|
2023-12-29 17:16:00 +05:30 |
|
Krrish Dholakia
|
ffe2350428
|
fix(least_busy.py): support consistent use of model id instead of deployment name
|
2023-12-29 17:05:26 +05:30 |
|
Krrish Dholakia
|
79978c44ba
|
refactor: add black formatting
|
2023-12-25 14:11:20 +05:30 |
|
Krrish Dholakia
|
a65c8919fc
|
fix(router.py): fix least-busy routing
|
2023-12-08 20:29:49 -08:00 |
|