litellm-mirror

Mirror of https://github.com/BerriAI/litellm.git, synced 2025-04-27 11:43:54 +00:00

Author	SHA1	Message	Date
Krrish Dholakia	f5d73547c7	fix(lowest_latency.py): allow ttl to be a float	2024-05-15 09:59:21 -07:00
Rahul Kataria	d57ecf3371	Remove duplicate code in router_strategy	2024-05-12 18:05:57 +05:30
Krrish Dholakia	4a3b084961	feat(bedrock_httpx.py): moves to using httpx client for bedrock cohere calls	2024-05-11 13:43:08 -07:00
Krrish Dholakia	6575143460	feat(proxy_server.py): return litellm version in response headers	2024-05-08 16:00:08 -07:00
Ishaan Jaff	6983e7a84f	feat - make lowest_cost pure async	2024-05-07 13:51:50 -07:00
Ishaan Jaff	486cbb990c	fix allow user to pass input_cost and output_cost	2024-05-07 13:08:16 -07:00
Ishaan Jaff	71a92b4fef	test - lowest cost router	2024-05-07 13:04:12 -07:00
Ishaan Jaff	690d7b10a6	fix - default value for cost	2024-05-07 12:51:52 -07:00
Ishaan Jaff	245960708d	fix - lowest cost routing	2024-05-07 12:49:20 -07:00
Ishaan Jaff	31ac43bfdc	feat - add lowst cost router	2024-05-07 12:12:09 -07:00
Krrish Dholakia	0b72904608	fix(lowest_latency.py): fix the size of the latency list to 10 by default (can be modified)	2024-05-03 09:00:32 -07:00
Krrish Dholakia	91971fa9e0	feat(router.py): add 'get_model_info' helper function to get the model info for a specific model, based on it's id	2024-05-02 17:53:09 -07:00
Krrish Dholakia	90cdfef1c1	fix(lowest_latency.py): allow setting a buffer for getting values within a certain latency threshold; if an endpoint is slow - it's completion time might not be updated till the call is completed. This prevents us from overloading those endpoints, in a simple way.	2024-04-30 12:00:26 -07:00
Krrish Dholakia	020b175ef4	fix(lowest_tpm_rpm_v2.py): skip if item_tpm is None	2024-04-29 21:34:25 -07:00
Krish Dholakia	32534b5e91	Merge pull request #3358 from sumanth13131/usage-based-routing-RPM-fix: usage based routing RPM count fix	2024-04-29 16:45:25 -07:00
Ishaan Jaff	d58dd2cbeb	Merge pull request #3360 from BerriAI/litellm_random_pick_lowest_latency: [Fix] Lowest Latency routing - random pick deployments when all latencies=0	2024-04-29 16:31:32 -07:00
Ishaan Jaff	4cb4a7f06d	fix - lowest latency routing	2024-04-29 16:02:57 -07:00
Ishaan Jaff	3b0aa05378	fix lowest latency - routing	2024-04-29 15:51:52 -07:00
Krrish Dholakia	a978f2d881	fix(lowest_tpm_rpm_v2.py): shuffle deployments with same tpm values	2024-04-29 15:23:47 -07:00
Krrish Dholakia	f10a066d36	fix(lowest_tpm_rpm_v2.py): add more detail to 'No deployments available' error message	2024-04-29 15:04:37 -07:00
sumanth	89e655c79e	usage based routing RPM count fix	2024-04-30 00:29:38 +05:30
Ishaan Jaff	bf92a0b31c	fix debugging lowest latency router	2024-04-25 19:34:28 -07:00
Ishaan Jaff	737af2b458	fix better debugging for latency	2024-04-25 11:35:08 -07:00
Ishaan Jaff	787735bb5a	fix	2024-04-25 11:25:03 -07:00
Ishaan Jaff	984259d420	temp - show better debug logs for lowest latency	2024-04-25 11:22:52 -07:00
Ishaan Jaff	92f21cba30	fix - increase default penalty for lowest latency	2024-04-25 07:54:25 -07:00
Ishaan Jaff	212369498e	fix - set latency stats in kwargs	2024-04-24 20:13:45 -07:00
Ishaan Jaff	bf6abed808	feat - penalize timeout errors	2024-04-24 16:35:00 -07:00
Krrish Dholakia	9379e3d047	fix(lowest_tpm_rpm_v2.py): use a combined tpm+rpm query in async get cache, to reduce redis client calls in high traffic	2024-04-20 16:13:11 -07:00
Krrish Dholakia	01a1a8f731	fix(caching.py): dual cache async_batch_get_cache fix + testing; this fixes a bug in usage-based-routing-v2 which was caused b/c of how the result was being returned from dual cache async_batch_get_cache. it also adds unit testing for that function (and it's sync equivalent)	2024-04-19 15:03:25 -07:00
Krrish Dholakia	3b9e2a58e2	fix(lowest_tpm_rpm_v2.py): ensure backwards compatibility for python 3.8	2024-04-18 21:42:35 -07:00
Krrish Dholakia	81573b2dd9	fix(test_lowest_tpm_rpm_routing_v2.py): unit testing for usage-based-routing-v2	2024-04-18 21:38:00 -07:00
Krrish Dholakia	a05f148c17	fix(tpm_rpm_routing_v2.py): fix tpm rpm routing	2024-04-18 20:01:22 -07:00
Krrish Dholakia	8179596ebc	fix(lowest_tpm_rpm_v2.py): don't fail calls if redis fails to connect	2024-04-12 19:36:59 -07:00
Krrish Dholakia	ea1574c160	test(test_openai_endpoints.py): add concurrency testing for user defined rate limits on proxy	2024-04-12 18:56:13 -07:00
Krrish Dholakia	c03b0bbb24	fix(router.py): support pre_call_rpm_check for lowest_tpm_rpm_v2 routing; have routing strategies expose an ‘update rpm’ function; for checking + updating rpm pre call	2024-04-12 18:25:14 -07:00
Krrish Dholakia	37ac17aebd	fix(router.py): fix datetime object	2024-04-10 17:55:24 -07:00
Krrish Dholakia	2531701a2a	fix(router.py): make get_cooldown_deployment logic async	2024-04-10 16:57:01 -07:00
Krrish Dholakia	a47a719caa	fix(router.py): generate consistent model id's; having the same id for a deployment, lets redis usage caching work across multiple instances	2024-04-10 15:23:57 -07:00
Krrish Dholakia	180cf9bd5c	feat(lowest_tpm_rpm_v2.py): move to using redis.incr and redis.mget for getting model usage from redis; makes routing work across multiple instances	2024-04-10 14:56:23 -07:00
Krish Dholakia	9119858f4a	Merge pull request #2798 from CLARKBENHAM/main: add test for rate limits - Router isn't coroutine safe	2024-04-06 08:47:40 -07:00
Krrish Dholakia	2236f283fe	fix(router.py): handle id being passed in as int	2024-04-04 14:23:10 -07:00
CLARKBENHAM	18749e7051	undo black formating	2024-04-02 19:53:48 -07:00
CLARKBENHAM	164898a213	fix lowest latency tests	2024-04-02 19:10:40 -07:00
Krrish Dholakia	47ca223d0b	fix(lowest_tpm_rpm_routing.py): fix base case where max tpm/rpm is 0	2024-03-28 14:51:31 -07:00
Ishaan Jaff	5d121a9f3c	(fix) stop using f strings with logger	2024-03-25 10:47:18 -07:00
Krrish Dholakia	2f1899284c	fix(router.py): add more debug logs	2024-03-11 12:34:35 -07:00
ishaan-jaff	e23c68b15a	(fix) failing usage based routing test	2024-03-11 12:14:13 -07:00
Krrish Dholakia	0273410310	fix(lowest_tpm_rpm.py): handle async scenarios	2024-03-06 21:38:30 -08:00
Krrish Dholakia	fccacaf91b	fix(lowest_latency.py): consistent time calc	2024-02-14 15:03:35 -08:00

73 commits