23 commits

Author SHA1 Message Date
Krrish Dholakia
6cca5612d2 refactor: replace 'traceback.print_exc()' with logging library
allows error logs to be in json format for otel logging
2024-06-06 13:47:43 -07:00
sumanth
71e0294485 addressed comments 2024-05-14 10:05:19 +05:30
SUMANTH
978672a56d Merge branch 'BerriAI:main' into usage-based-routing-ttl-on-cache 2024-05-14 09:08:01 +05:30
Krrish Dholakia
4a3b084961 feat(bedrock_httpx.py): moves to using httpx client for bedrock cohere calls 2024-05-11 13:43:08 -07:00
sumanth
3bc6b5d119 usage-based-routing-ttl-on-cache 2024-05-03 10:50:45 +05:30
Krrish Dholakia
91971fa9e0 feat(router.py): add 'get_model_info' helper function to get the model info for a specific model, based on its id 2024-05-02 17:53:09 -07:00
Krrish Dholakia
020b175ef4 fix(lowest_tpm_rpm_v2.py): skip if item_tpm is None 2024-04-29 21:34:25 -07:00
Krish Dholakia
32534b5e91 Merge pull request #3358 from sumanth13131/usage-based-routing-RPM-fix
usage based routing RPM count fix
2024-04-29 16:45:25 -07:00
Krrish Dholakia
a978f2d881 fix(lowest_tpm_rpm_v2.py): shuffle deployments with same tpm values 2024-04-29 15:23:47 -07:00
Krrish Dholakia
f10a066d36 fix(lowest_tpm_rpm_v2.py): add more detail to 'No deployments available' error message 2024-04-29 15:04:37 -07:00
sumanth
89e655c79e usage based routing RPM count fix 2024-04-30 00:29:38 +05:30
Krrish Dholakia
9379e3d047 fix(lowest_tpm_rpm_v2.py): use a combined tpm+rpm query in async get cache, to reduce redis client calls in high traffic 2024-04-20 16:13:11 -07:00
Krrish Dholakia
01a1a8f731 fix(caching.py): dual cache async_batch_get_cache fix + testing
this fixes a bug in usage-based-routing-v2 caused by how the result was being returned from dual cache async_batch_get_cache. it also adds unit testing for that function (and its sync equivalent)
2024-04-19 15:03:25 -07:00
Krrish Dholakia
3b9e2a58e2 fix(lowest_tpm_rpm_v2.py): ensure backwards compatibility for python 3.8 2024-04-18 21:42:35 -07:00
Krrish Dholakia
81573b2dd9 fix(test_lowest_tpm_rpm_routing_v2.py): unit testing for usage-based-routing-v2 2024-04-18 21:38:00 -07:00
Krrish Dholakia
a05f148c17 fix(tpm_rpm_routing_v2.py): fix tpm rpm routing 2024-04-18 20:01:22 -07:00
Krrish Dholakia
8179596ebc fix(lowest_tpm_rpm_v2.py): don't fail calls if redis fails to connect 2024-04-12 19:36:59 -07:00
Krrish Dholakia
ea1574c160 test(test_openai_endpoints.py): add concurrency testing for user defined rate limits on proxy 2024-04-12 18:56:13 -07:00
Krrish Dholakia
c03b0bbb24 fix(router.py): support pre_call_rpm_check for lowest_tpm_rpm_v2 routing
have routing strategies expose an ‘update rpm’ function; for checking + updating rpm pre call
2024-04-12 18:25:14 -07:00
Krrish Dholakia
37ac17aebd fix(router.py): fix datetime object 2024-04-10 17:55:24 -07:00
Krrish Dholakia
2531701a2a fix(router.py): make get_cooldown_deployment logic async 2024-04-10 16:57:01 -07:00
Krrish Dholakia
a47a719caa fix(router.py): generate consistent model ids
having the same id for a deployment lets redis usage caching work across multiple instances
2024-04-10 15:23:57 -07:00
Krrish Dholakia
180cf9bd5c feat(lowest_tpm_rpm_v2.py): move to using redis.incr and redis.mget for getting model usage from redis
makes routing work across multiple instances
2024-04-10 14:56:23 -07:00