Commit graph

44 commits

Author SHA1 Message Date
Krish Dholakia
2ec1ea9c31 Revert "Fix latency redis" 2025-03-19 18:11:22 -07:00
Emerson Gomes
fc31e20e04 Handle empty valid_deployments in LowestLatencyLoggingHandler 2025-03-19 19:56:57 -05:00
Emerson Gomes
8a3dba52ad Fix TTFT prioritization for streaming in LowestLatencyLoggingHandler 2025-03-18 14:58:55 -05:00
Emerson Gomes
6b1ecf196d fix redis serialization issue with Redis + lowest latency strategy 2025-03-17 19:19:20 -05:00
Ishaan Jaff
62a1cdec47 (code quality) run ruff rule to ban unused imports (#7313)
* remove unused imports

* fix AmazonConverseConfig

* fix test

* fix import

* ruff check fixes

* test fixes

* fix testing

* fix imports
2024-12-19 12:33:42 -08:00
Ishaan Jaff
6a9225fac2 (Refactor) Code Quality improvement - stop redefining LiteLLMBase (#7147)
* fix stop redefining LiteLLMBase

* use better name for base pydantic obj
2024-12-10 15:49:01 -08:00
Ishaan Jaff
5986f7457e (router_strategy/) ensure all async functions use async cache methods (#6489)
* fix router strat

* use async set / get cache in router_strategy

* add coverage for router strategy

* fix imports

* fix batch_get_cache

* use async methods for least busy

* fix least busy use async methods

* fix test_dual_cache_increment

* test async_get_available_deployment when routing_strategy="least-busy"
2024-10-29 21:07:17 +05:30
Krish Dholakia
e712a2090b redis otel tracing + async support for latency routing (#6452)
* docs(exception_mapping.md): add missing exception types

Fixes https://github.com/Aider-AI/aider/issues/2120#issuecomment-2438971183

* fix(main.py): register custom model pricing with specific key

Ensure custom model pricing is registered to the specific model+provider key combination

* test: make testing more robust for custom pricing

* fix(redis_cache.py): instrument otel logging for sync redis calls

ensures complete coverage for all redis cache calls

* refactor: pass parent_otel_span for redis caching calls in router

allows for more observability into what calls are causing latency issues

* test: update tests with new params

* refactor: ensure e2e otel tracing for router

* refactor(router.py): add more otel tracing across router

catch all latency issues for router requests

* fix: fix linting error

* fix(router.py): fix linting error

* fix: fix test

* test: fix tests

* fix(dual_cache.py): pass ttl to redis cache

* fix: fix param
2024-10-28 21:52:12 -07:00
Ishaan Jaff
0c5a47c404 (code quality) add ruff check PLR0915 for too-many-statements (#6309)
* ruff add PLR0915

* add noqa for PLR0915

* fix noqa

* add # noqa: PLR0915

* # noqa: PLR0915

* # noqa: PLR0915

* # noqa: PLR0915

* add # noqa: PLR0915

* # noqa: PLR0915

* # noqa: PLR0915

* # noqa: PLR0915

* # noqa: PLR0915
2024-10-18 15:36:49 +05:30
Ishaan Jaff
ba56e37244 (refactor) caching use LLMCachingHandler for async_get_cache and set_cache (#6208)
* use folder for caching

* fix importing caching

* fix clickhouse pyright

* fix linting

* fix correctly pass kwargs and args

* fix test case for embedding

* fix linting

* fix embedding caching logic

* fix refactor handle utils.py

* fix test_embedding_caching_azure_individual_items_reordered
2024-10-14 16:34:01 +05:30
Krish Dholakia
94a05ca5d0 Litellm ruff linting enforcement (#5992)
* ci(config.yml): add a 'check_code_quality' step

Addresses https://github.com/BerriAI/litellm/issues/5991

* ci(config.yml): check why circle ci doesn't pick up this test

* ci(config.yml): fix to run 'check_code_quality' tests

* fix(__init__.py): fix unprotected import

* fix(__init__.py): don't remove unused imports

* build(ruff.toml): update ruff.toml to ignore unused imports

* fix: fix: ruff + pyright - fix linting + type-checking errors

* fix: fix linting errors

* fix(lago.py): fix module init error

* fix: fix linting errors

* ci(config.yml): cd into correct dir for checks

* fix(proxy_server.py): fix linting error

* fix(utils.py): fix bare except

causes ruff linting errors

* fix: ruff - fix remaining linting errors

* fix(clickhouse.py): use standard logging object

* fix(__init__.py): fix unprotected import

* fix: ruff - fix linting errors

* fix: fix linting errors

* ci(config.yml): cleanup code qa step (formatting handled in local_testing)

* fix(_health_endpoints.py): fix ruff linting errors

* ci(config.yml): just use ruff in check_code_quality pipeline for now

* build(custom_guardrail.py): include missing file

* style(embedding_handler.py): fix ruff check
2024-10-01 19:44:20 -04:00
Krrish Dholakia
2874b94fb1 refactor: replace .error() with .exception() logging for better debugging on sentry 2024-08-16 09:22:47 -07:00
Krrish Dholakia
e391e30285 refactor: replace 'traceback.print_exc()' with logging library
allows error logs to be in json format for otel logging
2024-06-06 13:47:43 -07:00
Krrish Dholakia
bcc07afd04 fix(lowest_latency.py): set default none value for time_to_first_token in sync log success event 2024-05-21 18:42:15 -07:00
Krrish Dholakia
f007bf7e21 feat(lowest_latency.py): route by time to first token, for streaming requests (if available)
Closes https://github.com/BerriAI/litellm/issues/3574
2024-05-21 13:08:17 -07:00
Krrish Dholakia
84db63e3dd fix(lowest_latency.py): allow ttl to be a float 2024-05-15 09:59:21 -07:00
Rahul Kataria
be4450106d Remove duplicate code in router_strategy 2024-05-12 18:05:57 +05:30
Krrish Dholakia
926b86af87 feat(bedrock_httpx.py): moves to using httpx client for bedrock cohere calls 2024-05-11 13:43:08 -07:00
Krrish Dholakia
5f93cae3ff feat(proxy_server.py): return litellm version in response headers 2024-05-08 16:00:08 -07:00
Krrish Dholakia
cb88ed4df8 fix(lowest_latency.py): fix the size of the latency list to 10 by default (can be modified) 2024-05-03 09:00:32 -07:00
Krrish Dholakia
7ae28bfcc9 fix(lowest_latency.py): allow setting a buffer for getting values within a certain latency threshold
If an endpoint is slow, its completion time might not be updated until the call is completed. This prevents us from overloading those endpoints, in a simple way.
2024-04-30 12:00:26 -07:00
Ishaan Jaff
d4a0530d02 fix - lowest latency routing 2024-04-29 16:02:57 -07:00
Ishaan Jaff
2a49580b5b fix lowest latency - routing 2024-04-29 15:51:52 -07:00
Ishaan Jaff
7306072d33 fix debugging lowest latency router 2024-04-25 19:34:28 -07:00
Ishaan Jaff
3ab5e687f6 fix better debugging for latency 2024-04-25 11:35:08 -07:00
Ishaan Jaff
4931514330 fix 2024-04-25 11:25:03 -07:00
Ishaan Jaff
3b9d6dfc47 temp - show better debug logs for lowest latency 2024-04-25 11:22:52 -07:00
Ishaan Jaff
a26ecbad97 fix - increase default penalty for lowest latency 2024-04-25 07:54:25 -07:00
Ishaan Jaff
5dae1cf303 fix - set latency stats in kwargs 2024-04-24 20:13:45 -07:00
Ishaan Jaff
654c736d29 feat - penalize timeout errors 2024-04-24 16:35:00 -07:00
Krish Dholakia
b8d285d120 Merge pull request #2798 from CLARKBENHAM/main
add test for rate limits - Router isn't coroutine safe
2024-04-06 08:47:40 -07:00
Krrish Dholakia
48a5948081 fix(router.py): handle id being passed in as int 2024-04-04 14:23:10 -07:00
CLARKBENHAM
1c93ebf05a undo black formatting 2024-04-02 19:53:48 -07:00
CLARKBENHAM
2dd0c32612 fix lowest latency tests 2024-04-02 19:10:40 -07:00
Krrish Dholakia
afaee375e6 fix(lowest_latency.py): consistent time calc 2024-02-14 15:03:35 -08:00
stephenleo
a6f24acb8b fix latency calc (lower better) 2024-02-11 17:06:46 +08:00
Krrish Dholakia
ae9b8f50e0 fix(lowest_latency.py): fix merge issue 2024-01-10 21:37:46 +05:30
Krish Dholakia
e635ca2151 Merge branch 'main' into litellm_latency_routing_updates 2024-01-10 21:33:54 +05:30
Krrish Dholakia
7df19b2f7c fix(router.py): allow user to control the latency routing time window 2024-01-10 20:56:52 +05:30
Krrish Dholakia
f288b12411 fix(lowest_latency.py): add back tpm/rpm checks, configurable time window 2024-01-10 20:52:01 +05:30
Krrish Dholakia
b5ec5eb10b refactor(lowest_latency.py): fix linting error 2024-01-09 09:51:43 +05:30
Krrish Dholakia
fb9ebfbedd feat(lowest_latency.py): support expanded time window for latency based routing
uses a 1hr avg. of latency for deployments, to determine which to route to

https://github.com/BerriAI/litellm/issues/1361
2024-01-09 09:38:04 +05:30
Krrish Dholakia
d3dee9b20c test(test_lowest_latency_routing.py): add more tests 2023-12-30 17:41:42 +05:30
Krrish Dholakia
25ee96271e fix(router.py): fix latency based routing 2023-12-30 17:25:40 +05:30