Krrish Dholakia
|
ea1574c160
|
test(test_openai_endpoints.py): add concurrency testing for user defined rate limits on proxy
|
2024-04-12 18:56:13 -07:00 |
|
Krrish Dholakia
|
c03b0bbb24
|
fix(router.py): support pre_call_rpm_check for lowest_tpm_rpm_v2 routing
have routing strategies expose an ‘update rpm’ function for checking and updating rpm before the call
|
2024-04-12 18:25:14 -07:00 |
|
Krrish Dholakia
|
37ac17aebd
|
fix(router.py): fix datetime object
|
2024-04-10 17:55:24 -07:00 |
|
Krrish Dholakia
|
2531701a2a
|
fix(router.py): make get_cooldown_deployment logic async
|
2024-04-10 16:57:01 -07:00 |
|
Krrish Dholakia
|
a47a719caa
|
fix(router.py): generate consistent model ids
having the same id for a deployment lets redis usage caching work across multiple instances
|
2024-04-10 15:23:57 -07:00 |
|
Krrish Dholakia
|
180cf9bd5c
|
feat(lowest_tpm_rpm_v2.py): move to using redis.incr and redis.mget for getting model usage from redis
makes routing work across multiple instances
|
2024-04-10 14:56:23 -07:00 |
|
Krish Dholakia
|
9119858f4a
|
Merge pull request #2798 from CLARKBENHAM/main
add test for rate limits - Router isn't coroutine safe
|
2024-04-06 08:47:40 -07:00 |
|
Krrish Dholakia
|
2236f283fe
|
fix(router.py): handle id being passed in as int
|
2024-04-04 14:23:10 -07:00 |
|
CLARKBENHAM
|
18749e7051
|
undo black formatting
|
2024-04-02 19:53:48 -07:00 |
|
CLARKBENHAM
|
164898a213
|
fix lowest latency tests
|
2024-04-02 19:10:40 -07:00 |
|
Krrish Dholakia
|
47ca223d0b
|
fix(lowest_tpm_rpm_routing.py): fix base case where max tpm/rpm is 0
|
2024-03-28 14:51:31 -07:00 |
|
Ishaan Jaff
|
5d121a9f3c
|
(fix) stop using f strings with logger
|
2024-03-25 10:47:18 -07:00 |
|
Krrish Dholakia
|
2f1899284c
|
fix(router.py): add more debug logs
|
2024-03-11 12:34:35 -07:00 |
|
ishaan-jaff
|
e23c68b15a
|
(fix) failing usage based routing test
|
2024-03-11 12:14:13 -07:00 |
|
Krrish Dholakia
|
0273410310
|
fix(lowest_tpm_rpm.py): handle async scenarios
|
2024-03-06 21:38:30 -08:00 |
|
Krrish Dholakia
|
fccacaf91b
|
fix(lowest_latency.py): consistent time calc
|
2024-02-14 15:03:35 -08:00 |
|
stephenleo
|
37c83e0023
|
fix latency calc (lower better)
|
2024-02-11 17:06:46 +08:00 |
|
ishaan-jaff
|
d0442ae0f2
|
(feat) router - usage based routing - consider input_tokens
|
2024-01-19 13:59:49 -08:00 |
|
Krrish Dholakia
|
31917176ff
|
fix(lowest_latency.py): fix merge issue
|
2024-01-10 21:37:46 +05:30 |
|
Krish Dholakia
|
298e937586
|
Merge branch 'main' into litellm_latency_routing_updates
|
2024-01-10 21:33:54 +05:30 |
|
Krrish Dholakia
|
fe632c08a4
|
fix(router.py): allow user to control the latency routing time window
|
2024-01-10 20:56:52 +05:30 |
|
Krrish Dholakia
|
bb04a340a5
|
fix(lowest_latency.py): add back tpm/rpm checks, configurable time window
|
2024-01-10 20:52:01 +05:30 |
|
Krrish Dholakia
|
a35f4272f4
|
refactor(lowest_latency.py): fix linting error
|
2024-01-09 09:51:43 +05:30 |
|
Krrish Dholakia
|
a5147f9e06
|
feat(lowest_latency.py): support expanded time window for latency based routing
uses a 1hr avg. of latency for deployments, to determine which to route to
https://github.com/BerriAI/litellm/issues/1361
|
2024-01-09 09:38:04 +05:30 |
|
Krrish Dholakia
|
2ab31bcaf8
|
fix(lowest_tpm_rpm.py): handle null case for text/message input
|
2024-01-02 12:24:29 +05:30 |
|
Krrish Dholakia
|
a37a18ca80
|
feat(router.py): add support for retry/fallbacks for async embedding calls
|
2024-01-02 11:54:28 +05:30 |
|
Krrish Dholakia
|
dff4c172d0
|
refactor(test_router_caching.py): move tpm/rpm routing tests to separate file
|
2024-01-02 11:10:11 +05:30 |
|
Krrish Dholakia
|
a83e2e07cf
|
fix(router.py): correctly raise no model available error
https://github.com/BerriAI/litellm/issues/1289
|
2024-01-01 21:22:42 +05:30 |
|
Krrish Dholakia
|
027218c3f0
|
test(test_lowest_latency_routing.py): add more tests
|
2023-12-30 17:41:42 +05:30 |
|
Krrish Dholakia
|
f2d0d5584a
|
fix(router.py): fix latency based routing
|
2023-12-30 17:25:40 +05:30 |
|
Krrish Dholakia
|
b66cf0aa43
|
fix(lowest_tpm_rpm_routing.py): broaden scope of get deployment logic
|
2023-12-30 13:27:50 +05:30 |
|
Krrish Dholakia
|
38f55249e1
|
fix(router.py): support retry and fallbacks for atext_completion
|
2023-12-30 11:19:32 +05:30 |
|
Krrish Dholakia
|
a34de56289
|
fix(router.py): handle initial scenario for tpm/rpm routing
|
2023-12-30 07:28:45 +05:30 |
|
Krrish Dholakia
|
2fc264ca04
|
fix(router.py): fix int logic
|
2023-12-29 20:41:56 +05:30 |
|
Krrish Dholakia
|
cf91e49c87
|
refactor(lowest_tpm_rpm.py): move tpm/rpm based routing to a separate file for better testing
|
2023-12-29 18:33:43 +05:30 |
|
Krrish Dholakia
|
54d7bc2cc3
|
test(test_least_busy_router.py): add better testing for least busy routing
|
2023-12-29 17:16:00 +05:30 |
|
Krrish Dholakia
|
678bbfa9be
|
fix(least_busy.py): support consistent use of model id instead of deployment name
|
2023-12-29 17:05:26 +05:30 |
|
Krrish Dholakia
|
4905929de3
|
refactor: add black formatting
|
2023-12-25 14:11:20 +05:30 |
|
Krrish Dholakia
|
4bf875d3ed
|
fix(router.py): fix least-busy routing
|
2023-12-08 20:29:49 -08:00 |
|