Krrish Dholakia
f19d7327ca
fix(lowest_latency.py): set default None value for time_to_first_token in sync log success event
2024-05-21 18:42:15 -07:00
Krrish Dholakia
2b3da449c8
feat(lowest_latency.py): route by time to first token, for streaming requests (if available)
Closes https://github.com/BerriAI/litellm/issues/3574
2024-05-21 13:08:17 -07:00
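A minimal sketch of what routing by time to first token could look like: for streaming requests, prefer recorded TTFT samples over total completion latency when they are available. All names and data structures below are illustrative, not litellm's actual implementation.

```python
import statistics

def pick_deployment(latency_stats: dict[str, dict[str, list[float]]]) -> str:
    """Pick the deployment with the lowest average latency.

    For streaming requests, prefer time-to-first-token samples (when present)
    over total completion latency. `latency_stats` maps a deployment id to
    {"time_to_first_token": [...], "latency": [...]} -- hypothetical shapes.
    """
    def score(stats: dict[str, list[float]]) -> float:
        samples = stats.get("time_to_first_token") or stats.get("latency") or []
        # No data yet -> treat as fastest so new deployments still get traffic.
        return statistics.mean(samples) if samples else 0.0

    return min(latency_stats, key=lambda dep: score(latency_stats[dep]))


stats = {
    "azure/gpt-4o": {"time_to_first_token": [0.21, 0.25], "latency": [2.4, 2.7]},
    "openai/gpt-4o": {"time_to_first_token": [0.48], "latency": [1.9]},
}
print(pick_deployment(stats))  # azure/gpt-4o: lower average TTFT wins
```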
Krrish Dholakia
f5d73547c7
fix(lowest_latency.py): allow ttl to be a float
2024-05-15 09:59:21 -07:00
Rahul Kataria
d57ecf3371
Remove duplicate code in router_strategy
2024-05-12 18:05:57 +05:30
Krrish Dholakia
4a3b084961
feat(bedrock_httpx.py): moves to using httpx client for bedrock cohere calls
2024-05-11 13:43:08 -07:00
Krrish Dholakia
6575143460
feat(proxy_server.py): return litellm version in response headers
2024-05-08 16:00:08 -07:00
Krrish Dholakia
0b72904608
fix(lowest_latency.py): fix the size of the latency list to 10 by default (can be modified)
2024-05-03 09:00:32 -07:00
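One way to bound the latency history at 10 samples per deployment (the default this commit sets, still configurable) is a fixed-size window; the class and field names below are hypothetical, not litellm's.

```python
from collections import deque

class LatencyHistory:
    """Keep only the most recent latency samples per deployment (sketch)."""

    def __init__(self, max_size: int = 10):  # default of 10, configurable
        self.max_size = max_size
        self._samples: dict[str, deque] = {}

    def record(self, deployment_id: str, latency_s: float) -> None:
        window = self._samples.setdefault(deployment_id, deque(maxlen=self.max_size))
        window.append(latency_s)  # oldest sample drops out automatically once full

    def average(self, deployment_id: str) -> float:
        window = self._samples.get(deployment_id)
        return sum(window) / len(window) if window else 0.0
```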
Krrish Dholakia
90cdfef1c1
fix(lowest_latency.py): allow setting a buffer for getting values within a certain latency threshold
if an endpoint is slow - its completion time might not be updated until the call completes. This prevents us from overloading those endpoints in a simple way.
2024-04-30 12:00:26 -07:00
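The idea behind the buffer: any deployment within some margin of the fastest average latency is treated as equally good and chosen at random, so a slow endpoint whose stats are stale (its in-flight call hasn't been logged yet) isn't repeatedly selected. A rough sketch, assuming an absolute buffer in seconds and made-up names:

```python
import random

def pick_within_buffer(avg_latency: dict[str, float], buffer: float = 0.5) -> str:
    """Randomly pick among deployments within `buffer` seconds of the fastest.

    Hypothetical illustration: spreading load across near-equal deployments
    avoids hammering one endpoint whose latency stats are still stale.
    """
    fastest = min(avg_latency.values())
    candidates = [d for d, lat in avg_latency.items() if lat <= fastest + buffer]
    return random.choice(candidates)
```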
Ishaan Jaff
4cb4a7f06d
fix - lowest latency routing
2024-04-29 16:02:57 -07:00
Ishaan Jaff
3b0aa05378
fix lowest latency - routing
2024-04-29 15:51:52 -07:00
Ishaan Jaff
bf92a0b31c
fix debugging lowest latency router
2024-04-25 19:34:28 -07:00
Ishaan Jaff
737af2b458
fix better debugging for latency
2024-04-25 11:35:08 -07:00
Ishaan Jaff
787735bb5a
fix
2024-04-25 11:25:03 -07:00
Ishaan Jaff
984259d420
temp - show better debug logs for lowest latency
2024-04-25 11:22:52 -07:00
Ishaan Jaff
92f21cba30
fix - increase default penalty for lowest latency
2024-04-25 07:54:25 -07:00
Ishaan Jaff
212369498e
fix - set latency stats in kwargs
2024-04-24 20:13:45 -07:00
Ishaan Jaff
bf6abed808
feat - penalize timeout errors
2024-04-24 16:35:00 -07:00
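A hedged illustration of penalizing timeouts: a timed-out request is recorded with an inflated latency so that deployment ranks worse on subsequent picks. The penalty factor and helper below are assumptions for illustration, not litellm's defaults.

```python
def record_result(history: dict[str, list[float]], deployment_id: str,
                  latency_s: float, timed_out: bool,
                  penalty_factor: float = 2.0) -> None:
    """Record a latency sample; inflate it when the request timed out.

    `penalty_factor` is a hypothetical knob: a timed-out call is logged as
    penalty_factor * latency so the deployment scores worse in future picks.
    """
    if timed_out:
        latency_s *= penalty_factor
    history.setdefault(deployment_id, []).append(latency_s)
```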
Krish Dholakia
9119858f4a
Merge pull request #2798 from CLARKBENHAM/main
add test for rate limits - Router isn't coroutine safe
2024-04-06 08:47:40 -07:00
Krrish Dholakia
2236f283fe
fix(router.py): handle id being passed in as int
2024-04-04 14:23:10 -07:00
CLARKBENHAM
18749e7051
undo black formatting
2024-04-02 19:53:48 -07:00
CLARKBENHAM
164898a213
fix lowest latency tests
2024-04-02 19:10:40 -07:00
Krrish Dholakia
fccacaf91b
fix(lowest_latency.py): consistent time calc
2024-02-14 15:03:35 -08:00
stephenleo
37c83e0023
fix latency calc (lower is better)
2024-02-11 17:06:46 +08:00
Krrish Dholakia
31917176ff
fix(lowest_latency.py): fix merge issue
2024-01-10 21:37:46 +05:30
Krish Dholakia
298e937586
Merge branch 'main' into litellm_latency_routing_updates
2024-01-10 21:33:54 +05:30
Krrish Dholakia
fe632c08a4
fix(router.py): allow user to control the latency routing time window
2024-01-10 20:56:52 +05:30
Krrish Dholakia
bb04a340a5
fix(lowest_latency.py): add back tpm/rpm checks, configurable time window
2024-01-10 20:52:01 +05:30
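A sketch of combining a configurable time window with tpm/rpm checks: only latency samples newer than the window count toward the average, and deployments over their per-minute limits are skipped entirely. The data shapes here are assumptions for illustration only.

```python
import time

def eligible_deployments(samples, limits, usage, window_s: float = 3600.0):
    """Return (deployment_id, avg_latency) pairs for deployments under their
    tpm/rpm limits, using only samples newer than `window_s` seconds.

    `samples` maps deployment id -> list of (timestamp, latency) tuples;
    `limits` and `usage` map deployment id -> {"tpm": ..., "rpm": ...}.
    All names here are illustrative, not litellm's actual structures.
    """
    cutoff = time.time() - window_s
    out = []
    for dep, points in samples.items():
        if usage.get(dep, {}).get("rpm", 0) >= limits.get(dep, {}).get("rpm", float("inf")):
            continue  # over requests-per-minute limit
        if usage.get(dep, {}).get("tpm", 0) >= limits.get(dep, {}).get("tpm", float("inf")):
            continue  # over tokens-per-minute limit
        recent = [lat for ts, lat in points if ts >= cutoff]
        if recent:
            out.append((dep, sum(recent) / len(recent)))
    return out
```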
Krrish Dholakia
a35f4272f4
refactor(lowest_latency.py): fix linting error
2024-01-09 09:51:43 +05:30
Krrish Dholakia
a5147f9e06
feat(lowest_latency.py): support expanded time window for latency based routing
uses a 1hr avg. of latency per deployment to determine which to route to
https://github.com/BerriAI/litellm/issues/1361
2024-01-09 09:38:04 +05:30
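One way to keep a roughly one-hour rolling average without storing every sample is to bucket latencies by minute and average only the last 60 buckets; this is an assumed approach for illustration, not necessarily how lowest_latency.py implements it.

```python
import time
from collections import defaultdict

class RollingLatency:
    """Keep a ~1-hour rolling average of latency using per-minute buckets (sketch)."""

    def __init__(self, window_minutes: int = 60):
        self.window_minutes = window_minutes
        self._buckets: dict[int, list[float]] = defaultdict(list)

    def record(self, latency_s: float) -> None:
        # Group each sample under the minute it arrived in.
        self._buckets[int(time.time() // 60)].append(latency_s)

    def average(self) -> float:
        cutoff = int(time.time() // 60) - self.window_minutes
        recent = [lat for minute, samples in self._buckets.items()
                  if minute > cutoff for lat in samples]
        return sum(recent) / len(recent) if recent else 0.0
```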
Krrish Dholakia
027218c3f0
test(test_lowest_latency_routing.py): add more tests
2023-12-30 17:41:42 +05:30
Krrish Dholakia
f2d0d5584a
fix(router.py): fix latency based routing
2023-12-30 17:25:40 +05:30