Commit graph

33 commits

Author SHA1 Message Date
Krrish Dholakia
61f4b71ef7 refactor: replace .error() with .exception() logging for better debugging on sentry 2024-08-16 09:22:47 -07:00
Krrish Dholakia
6cca5612d2 refactor: replace 'traceback.print_exc()' with logging library
allows error logs to be in json format for otel logging
2024-06-06 13:47:43 -07:00
Krrish Dholakia
f19d7327ca fix(lowest_latency.py): set default none value for time_to_first_token in sync log success event 2024-05-21 18:42:15 -07:00
Krrish Dholakia
2b3da449c8 feat(lowest_latency.py): route by time to first token, for streaming requests (if available)
Closes https://github.com/BerriAI/litellm/issues/3574
2024-05-21 13:08:17 -07:00
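The routing change described in this commit, preferring time-to-first-token over total completion latency for streaming requests when that statistic is available, could be sketched roughly as follows. This is an illustrative sketch, not litellm's actual code; the function name, the `stats` shape, and the `ttft` key are all assumptions.

```python
# Illustrative sketch (not litellm's implementation): when picking a
# deployment for a streaming request, rank by time-to-first-token (TTFT)
# where a measurement exists, falling back to total completion latency.

def pick_deployment(stats: dict, streaming: bool) -> str:
    """stats maps deployment id -> {"latency": float, "ttft": float | None}."""
    def key(item):
        dep_id, s = item
        if streaming and s.get("ttft") is not None:
            return s["ttft"]  # streaming callers care about first token
        return s["latency"]   # non-streaming: total completion time
    return min(stats.items(), key=key)[0]
```

A deployment with a slow total completion but a fast first token would then win for streaming traffic while still losing for non-streaming calls.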
Krrish Dholakia
f5d73547c7 fix(lowest_latency.py): allow ttl to be a float 2024-05-15 09:59:21 -07:00
Rahul Kataria
d57ecf3371 Remove duplicate code in router_strategy 2024-05-12 18:05:57 +05:30
Krrish Dholakia
4a3b084961 feat(bedrock_httpx.py): moves to using httpx client for bedrock cohere calls 2024-05-11 13:43:08 -07:00
Krrish Dholakia
6575143460 feat(proxy_server.py): return litellm version in response headers 2024-05-08 16:00:08 -07:00
Krrish Dholakia
0b72904608 fix(lowest_latency.py): fix the size of the latency list to 10 by default (can be modified) 2024-05-03 09:00:32 -07:00
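Capping the latency list at a configurable size, 10 by default per this commit, is the kind of bound a `collections.deque(maxlen=...)` gives for free. A minimal sketch, assuming a hypothetical `LatencyHistory` wrapper rather than litellm's real data structure:

```python
# Illustrative sketch: keep only the most recent N latency samples per
# deployment with a bounded deque (N=10 matches the default mentioned in
# the commit; the class name is hypothetical).
from collections import deque

class LatencyHistory:
    def __init__(self, max_size: int = 10):
        self.samples = deque(maxlen=max_size)

    def record(self, latency: float) -> None:
        self.samples.append(latency)  # oldest sample is evicted at capacity

    def average(self) -> float:
        return sum(self.samples) / len(self.samples)
```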
Krrish Dholakia
90cdfef1c1 fix(lowest_latency.py): allow setting a buffer for getting values within a certain latency threshold
if an endpoint is slow, its completion time might not be updated until the call completes. This prevents us from overloading those endpoints, in a simple way.
2024-04-30 12:00:26 -07:00
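The buffer described in this commit can be read as: instead of always routing to the single lowest-latency deployment, treat every deployment within some tolerance of the best latency as equally good and pick among them, which spreads load away from an endpoint whose in-flight slow call has not yet been recorded. A minimal sketch, with all names and the fractional-buffer interpretation being assumptions:

```python
# Illustrative sketch: any deployment within `buffer` (a fractional
# tolerance) of the lowest observed latency is a candidate; choose one
# at random to spread load.
import random

def pick_within_buffer(latencies: dict, buffer: float = 0.1) -> str:
    lowest = min(latencies.values())
    candidates = [d for d, lat in latencies.items()
                  if lat <= lowest * (1 + buffer)]
    return random.choice(candidates)
```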
Ishaan Jaff
4cb4a7f06d fix - lowest latency routing 2024-04-29 16:02:57 -07:00
Ishaan Jaff
3b0aa05378 fix lowest latency - routing 2024-04-29 15:51:52 -07:00
Ishaan Jaff
bf92a0b31c fix debugging lowest latency router 2024-04-25 19:34:28 -07:00
Ishaan Jaff
737af2b458 fix better debugging for latency 2024-04-25 11:35:08 -07:00
Ishaan Jaff
787735bb5a fix 2024-04-25 11:25:03 -07:00
Ishaan Jaff
984259d420 temp - show better debug logs for lowest latency 2024-04-25 11:22:52 -07:00
Ishaan Jaff
92f21cba30 fix - increase default penalty for lowest latency 2024-04-25 07:54:25 -07:00
Ishaan Jaff
212369498e fix - set latency stats in kwargs 2024-04-24 20:13:45 -07:00
Ishaan Jaff
bf6abed808 feat - penalize timeout errors 2024-04-24 16:35:00 -07:00
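Penalizing timeout errors, as this commit introduces, generally means recording an inflated latency for a timed-out request instead of dropping the data point, so deployments that time out sink in the ranking. A sketch of that idea; the penalty factor and function name are hypothetical, not litellm's defaults:

```python
# Illustrative sketch: on timeout, record the configured timeout multiplied
# by a penalty factor rather than the (unknown) true completion time, so
# timing-out deployments rank worse than merely slow ones.

def effective_latency(elapsed: float, timed_out: bool,
                      timeout: float = 60.0, penalty: float = 2.0) -> float:
    if timed_out:
        return timeout * penalty
    return elapsed
```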
Krish Dholakia
9119858f4a
Merge pull request #2798 from CLARKBENHAM/main
add test for rate limits - Router isn't coroutine safe
2024-04-06 08:47:40 -07:00
Krrish Dholakia
2236f283fe fix(router.py): handle id being passed in as int 2024-04-04 14:23:10 -07:00
CLARKBENHAM
18749e7051 undo black formating 2024-04-02 19:53:48 -07:00
CLARKBENHAM
164898a213 fix lowest latency tests 2024-04-02 19:10:40 -07:00
Krrish Dholakia
fccacaf91b fix(lowest_latency.py): consistent time calc 2024-02-14 15:03:35 -08:00
stephenleo
37c83e0023 fix latency calc (lower better) 2024-02-11 17:06:46 +08:00
Krrish Dholakia
31917176ff fix(lowest_latency.py): fix merge issue 2024-01-10 21:37:46 +05:30
Krish Dholakia
298e937586
Merge branch 'main' into litellm_latency_routing_updates 2024-01-10 21:33:54 +05:30
Krrish Dholakia
fe632c08a4 fix(router.py): allow user to control the latency routing time window 2024-01-10 20:56:52 +05:30
Krrish Dholakia
bb04a340a5 fix(lowest_latency.py): add back tpm/rpm checks, configurable time window 2024-01-10 20:52:01 +05:30
Krrish Dholakia
a35f4272f4 refactor(lowest_latency.py): fix linting error 2024-01-09 09:51:43 +05:30
Krrish Dholakia
a5147f9e06 feat(lowest_latency.py): support expanded time window for latency based routing
uses a 1-hour average of latency per deployment to determine which to route to

https://github.com/BerriAI/litellm/issues/1361
2024-01-09 09:38:04 +05:30
Krrish Dholakia
027218c3f0 test(test_lowest_latency_routing.py): add more tests 2023-12-30 17:41:42 +05:30
Krrish Dholakia
f2d0d5584a fix(router.py): fix latency based routing 2023-12-30 17:25:40 +05:30