Krrish Dholakia
3560f0ef2c
refactor: move all testing to top-level of repo
Closes https://github.com/BerriAI/litellm/issues/486
2024-09-28 21:08:14 -07:00
Krrish Dholakia
94db4ec830
test: mark flaky tests
2024-08-30 07:53:04 -07:00
Ishaan Jaff
bcf7f3e437
handle flaky pytests
2024-08-27 22:44:49 -07:00
Krrish Dholakia
cd73ea245a
test(test_lowest_latency_routing.py): use mock responses
2024-06-20 21:05:23 -07:00
Krrish Dholakia
6024f9e45e
test(test_lowest_latency_routing.py): add time.sleep to reduce test flakiness
2024-06-07 11:28:33 -07:00
Krrish Dholakia
2b3da449c8
feat(lowest_latency.py): route by time to first token, for streaming requests (if available)
Closes https://github.com/BerriAI/litellm/issues/3574
2024-05-21 13:08:17 -07:00
Ishaan Jaff
fdf7a4d8c8
fix - test_lowest_latency_routing_first_pick
2024-05-15 14:24:13 -07:00
Krrish Dholakia
0b72904608
fix(lowest_latency.py): fix the size of the latency list to 10 by default (can be modified)
2024-05-03 09:00:32 -07:00
Krrish Dholakia
90cdfef1c1
fix(lowest_latency.py): allow setting a buffer for getting values within a certain latency threshold
If an endpoint is slow, its completion time might not be updated until the call is completed. The buffer is a simple way to avoid overloading those endpoints.
2024-04-30 12:00:26 -07:00
Ishaan Jaff
5247d7b6a5
test - lowest latency router
2024-04-29 15:51:01 -07:00
Ishaan Jaff
2e6fc91a75
test - lowest latency logger
2024-04-24 16:35:43 -07:00
Krish Dholakia
9119858f4a
Merge pull request #2798 from CLARKBENHAM/main
add test for rate limits - Router isn't coroutine safe
2024-04-06 08:47:40 -07:00
Krrish Dholakia
2236f283fe
fix(router.py): handle id being passed in as int
2024-04-04 14:23:10 -07:00
CLARKBENHAM
44cb0f352a
formatting
2024-04-02 19:56:07 -07:00
CLARKBENHAM
164898a213
fix lowest latency tests
2024-04-02 19:10:40 -07:00
CLARKBENHAM
29573b0967
param both tests to include failure (also fix prev)
2024-04-02 18:53:42 -07:00
CLARKBENHAM
4f95966475
tests showing error
2024-04-02 18:45:05 -07:00
Krrish Dholakia
6a8d518e44
test(test_lowest_latency_routing.py): use the correct cache key
2024-01-10 22:15:01 +05:30
Krrish Dholakia
9a829ff956
refactor: cleanup duplicates
2024-01-10 21:42:20 +05:30
Krish Dholakia
298e937586
Merge branch 'main' into litellm_latency_routing_updates
2024-01-10 21:33:54 +05:30
Krrish Dholakia
fe632c08a4
fix(router.py): allow user to control the latency routing time window
2024-01-10 20:56:52 +05:30
Krrish Dholakia
bb04a340a5
fix(lowest_latency.py): add back tpm/rpm checks, configurable time window
2024-01-10 20:52:01 +05:30
Krrish Dholakia
a5147f9e06
feat(lowest_latency.py): support expanded time window for latency based routing
uses a 1hr avg. of latency for deployments, to determine which to route to
https://github.com/BerriAI/litellm/issues/1361
2024-01-09 09:38:04 +05:30
Krrish Dholakia
027218c3f0
test(test_lowest_latency_routing.py): add more tests
2023-12-30 17:41:42 +05:30
Krrish Dholakia
f2d0d5584a
fix(router.py): fix latency based routing
2023-12-30 17:25:40 +05:30