Commit graph

207 commits

Author SHA1 Message Date
Ishaan Jaff
4d1b4beb3d
(refactor) caching use LLMCachingHandler for async_get_cache and set_cache (#6208)
* use folder for caching

* fix importing caching

* fix clickhouse pyright

* fix linting

* fix correctly pass kwargs and args

* fix test case for embedding

* fix linting

* fix embedding caching logic

* fix refactor handle utils.py

* fix test_embedding_caching_azure_individual_items_reordered
2024-10-14 16:34:01 +05:30
Krish Dholakia
2acb0c0675
Litellm Minor Fixes & Improvements (10/12/2024) (#6179)
* build(model_prices_and_context_window.json): add bedrock llama3.2 pricing

* build(model_prices_and_context_window.json): add bedrock cross region inference pricing

* Revert "(perf) move s3 logging to Batch logging + async [94% faster perf under 100 RPS on 1 litellm instance] (#6165)"

This reverts commit 2a5624af47.

* add azure/gpt-4o-2024-05-13 (#6174)

* LiteLLM Minor Fixes & Improvements (10/10/2024)  (#6158)

* refactor(vertex_ai_partner_models/anthropic): refactor anthropic to use partner model logic

* fix(vertex_ai/): support passing custom api base to partner models

Fixes https://github.com/BerriAI/litellm/issues/4317

* fix(proxy_server.py): Fix prometheus premium user check logic

* docs(prometheus.md): update quick start docs

* fix(custom_llm.py): support passing dynamic api key + api base

* fix(realtime_api/main.py): Add request/response logging for realtime api endpoints

Closes https://github.com/BerriAI/litellm/issues/6081

* feat(openai/realtime): add openai realtime api logging

Closes https://github.com/BerriAI/litellm/issues/6081

* fix(realtime_streaming.py): fix linting errors

* fix(realtime_streaming.py): fix linting errors

* fix: fix linting errors

* fix pattern match router

* Add literalai in the sidebar observability category (#6163)

* fix: add literalai in the sidebar

* fix: typo

* update (#6160)

* Feat: Add Langtrace integration (#5341)

* Feat: Add Langtrace integration

* add langtrace service name

* fix timestamps for traces

* add tests

* Discard Callback + use existing otel logger

* cleanup

* remove print statments

* remove callback

* add docs

* docs

* add logging docs

* format logging

* remove emoji and add litellm proxy example

* format logging

* format `logging.md`

* add langtrace docs to logging.md

* sync conflict

* docs fix

* (perf) move s3 logging to Batch logging + async [94% faster perf under 100 RPS on 1 litellm instance] (#6165)

* fix move s3 to use customLogger

* add basic s3 logging test

* add s3 to custom logger compatible

* use batch logger for s3

* s3 set flush interval and batch size

* fix s3 logging

* add notes on s3 logging

* fix s3 logging

* add basic s3 logging test

* fix s3 type errors

* add test for sync logging on s3

* fix: fix to debug log

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Willy Douhard <willy.douhard@gmail.com>
Co-authored-by: yujonglee <yujonglee.dev@gmail.com>
Co-authored-by: Ali Waleed <ali@scale3labs.com>

* docs(custom_llm_server.md): update doc on passing custom params

* fix(pass_through_endpoints.py): don't require headers

Fixes https://github.com/BerriAI/litellm/issues/6128

* feat(utils.py): add support for caching rerank endpoints

Closes https://github.com/BerriAI/litellm/issues/6144

* feat(litellm_logging.py'): add response headers for failed requests

Closes https://github.com/BerriAI/litellm/issues/6159

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Willy Douhard <willy.douhard@gmail.com>
Co-authored-by: yujonglee <yujonglee.dev@gmail.com>
Co-authored-by: Ali Waleed <ali@scale3labs.com>
2024-10-12 11:48:34 -07:00
Ishaan Jaff
2c8bba293f
(bug fix) TTL not being set for embedding caching requests (#6095)
* fix ttl for cache pipeline settings

* add test for caching

* add test for setting ttls on redis caching
2024-10-07 15:53:18 +05:30
Krish Dholakia
fac3b2ee42
Add pyright to ci/cd + Fix remaining type-checking errors (#6082)
* fix: fix type-checking errors

* fix: fix additional type-checking errors

* fix: additional type-checking error fixes

* fix: fix additional type-checking errors

* fix: additional type-check fixes

* fix: fix all type-checking errors + add pyright to ci/cd

* fix: fix incorrect import

* ci(config.yml): use mypy on ci/cd

* fix: fix type-checking errors in utils.py

* fix: fix all type-checking errors on main.py

* fix: fix mypy linting errors

* fix(anthropic/cost_calculator.py): fix linting errors

* fix: fix mypy linting errors

* fix: fix linting errors
2024-10-05 17:04:00 -04:00
Krish Dholakia
5c33d1c9af
Litellm Minor Fixes & Improvements (10/03/2024) (#6049)
* fix(proxy_server.py): remove spendlog fixes from proxy startup logic

Moves  https://github.com/BerriAI/litellm/pull/4794 to `/db_scripts` and cleans up some caching-related debug info (easier to trace debug logs)

* fix(langfuse_endpoints.py): Fixes https://github.com/BerriAI/litellm/issues/6041

* fix(azure.py): fix health checks for azure audio transcription models

Fixes https://github.com/BerriAI/litellm/issues/5999

* Feat: Add Literal AI Integration (#5653)

* feat: add Literal AI integration

* update readme

* Update README.md

* fix: address comments

* fix: remove literalai sdk

* fix: use HTTPHandler

* chore: add test

* fix: add asyncio lock

* fix(literal_ai.py): fix linting errors

* fix(literal_ai.py): fix linting errors

* refactor: cleanup

---------

Co-authored-by: Willy Douhard <willy.douhard@gmail.com>
2024-10-03 18:02:28 -04:00
Krish Dholakia
d57be47b0f
Litellm ruff linting enforcement (#5992)
* ci(config.yml): add a 'check_code_quality' step

Addresses https://github.com/BerriAI/litellm/issues/5991

* ci(config.yml): check why circle ci doesn't pick up this test

* ci(config.yml): fix to run 'check_code_quality' tests

* fix(__init__.py): fix unprotected import

* fix(__init__.py): don't remove unused imports

* build(ruff.toml): update ruff.toml to ignore unused imports

* fix: fix: ruff + pyright - fix linting + type-checking errors

* fix: fix linting errors

* fix(lago.py): fix module init error

* fix: fix linting errors

* ci(config.yml): cd into correct dir for checks

* fix(proxy_server.py): fix linting error

* fix(utils.py): fix bare except

causes ruff linting errors

* fix: ruff - fix remaining linting errors

* fix(clickhouse.py): use standard logging object

* fix(__init__.py): fix unprotected import

* fix: ruff - fix linting errors

* fix: fix linting errors

* ci(config.yml): cleanup code qa step (formatting handled in local_testing)

* fix(_health_endpoints.py): fix ruff linting errors

* ci(config.yml): just use ruff in check_code_quality pipeline for now

* build(custom_guardrail.py): include missing file

* style(embedding_handler.py): fix ruff check
2024-10-01 19:44:20 -04:00
Krrish Dholakia
575b7911b2 fix(caching.py): cleanup print_stack() 2024-09-28 21:08:15 -07:00
Krrish Dholakia
81d6c5e5a5 fix(router.py): skip setting model_group response headers for now
current implementation increases redis cache calls by 3x
2024-09-28 21:08:15 -07:00
Krrish Dholakia
efc06d4a03 fix(batch_redis_get.py): handle custom namespace
Fix https://github.com/BerriAI/litellm/issues/5917
2024-09-28 21:08:14 -07:00
Ishaan Jaff
7500855654
fix redis async_set_cache_pipeline when empty list passed to it (#5962) 2024-09-28 13:32:00 -07:00
Ishaan Jaff
f4613a100d [Perf Proxy] parallel request limiter - use one cache update call (#5932)
* fix parallel request limiter - use one cache update call

* ci/cd run again

* run ci/cd again

* use docker username password

* fix config.yml

* fix config

* fix config

* fix config.yml

* ci/cd run again

* use correct typing for batch set cache

* fix async_set_cache_pipeline

* fix only check user id tpm / rpm limits when limits set

* fix test_openai_azure_embedding_with_oidc_and_cf
2024-09-27 17:24:46 -07:00
Ishaan Jaff
7cbcf538c6
[Feat] Improve OTEL Tracking - Require all Redis Cache reads to be logged on OTEL (#5881)
* fix use previous internal usage caching logic

* fix test_dual_cache_uses_redis

* redis track event_metadata in service logging

* show otel error on _get_parent_otel_span_from_kwargs

* track parent otel span on internal usage cache

* update_request_status

* fix internal usage cache

* fix linting

* fix test internal usage cache

* fix linting error

* show event metadata in redis set

* fix test_get_team_redis

* fix test_get_team_redis

* test_proxy_logging_setup
2024-09-25 10:57:08 -07:00
Ishaan Jaff
2000e8cde9
[Perf Fix] Don't always read from Redis by Default (#5877)
* fix use previous internal usage caching logic

* fix test_dual_cache_uses_redis
2024-09-24 21:34:18 -07:00
Krish Dholakia
8039b95aaf
LiteLLM Minor Fixes & Improvements (09/21/2024) (#5819)
* fix(router.py): fix error message

* Litellm disable keys (#5814)

* build(schema.prisma): allow blocking/unblocking keys

Fixes https://github.com/BerriAI/litellm/issues/5328

* fix(key_management_endpoints.py): fix pop

* feat(auth_checks.py): allow admin to enable/disable virtual keys

Closes https://github.com/BerriAI/litellm/issues/5328

* docs(vertex.md): add auth section for vertex ai

Addresses - https://github.com/BerriAI/litellm/issues/5768#issuecomment-2365284223

* build(model_prices_and_context_window.json): show which models support prompt_caching

Closes https://github.com/BerriAI/litellm/issues/5776

* fix(router.py): allow setting default priority for requests

* fix(router.py): add 'retry-after' header for concurrent request limit errors

Fixes https://github.com/BerriAI/litellm/issues/5783

* fix(router.py): correctly raise and use retry-after header from azure+openai

Fixes https://github.com/BerriAI/litellm/issues/5783

* fix(user_api_key_auth.py): fix valid token being none

* fix(auth_checks.py): fix model dump for cache management object

* fix(user_api_key_auth.py): pass prisma_client to obj

* test(test_otel.py): update test for new key check

* test: fix test
2024-09-21 18:51:53 -07:00
Krish Dholakia
98c34a7e27
LiteLLM Minor Fixes and Improvements (11/09/2024) (#5634)
* fix(caching.py): set ttl for async_increment cache

fixes issue where ttl for redis client was not being set on increment_cache

Fixes https://github.com/BerriAI/litellm/issues/5609

* fix(caching.py): fix increment cache w/ ttl for sync increment cache on redis

Fixes https://github.com/BerriAI/litellm/issues/5609

* fix(router.py): support adding retry policy + allowed fails policy via config.yaml

* fix(router.py): don't cooldown single deployments

No point, as there's no other deployment to loadbalance with.

* fix(user_api_key_auth.py): support setting allowed email domains on jwt tokens

Closes https://github.com/BerriAI/litellm/issues/5605

* docs(token_auth.md): add user upsert + allowed email domain to jwt auth docs

* fix(litellm_pre_call_utils.py): fix dynamic key logging when team id is set

Fixes issue where key logging would not be set if team metadata was not none

* fix(secret_managers/main.py): load environment variables correctly

Fixes issue where os.environ/ was not being loaded correctly

* test(test_router.py): fix test

* feat(spend_tracking_utils.py): support logging additional usage params - e.g. prompt caching values for deepseek

* test: fix tests

* test: fix test

* test: fix test

* test: fix test

* test: fix test
2024-09-11 22:36:06 -07:00
Ishaan Jaff
421b857714 pass llm provider when creating async httpx clients 2024-09-10 11:51:42 -07:00
Ishaan Jaff
d4b9a1307d rename get_async_httpx_client 2024-09-10 10:38:01 -07:00
Ishaan Jaff
81ee1653af use correct type hints for audio transcriptions 2024-09-05 09:12:27 -07:00
Krish Dholakia
1e7e538261
LiteLLM Minor fixes + improvements (08/04/2024) (#5505)
* Minor IAM AWS OIDC Improvements (#5246)

* AWS IAM: Temporary tokens are valid across all regions after being issued, so it is wasteful to request one for each region.

* AWS IAM: Include an inline policy, to help reduce misuse of overly permissive IAM roles.

* (test_bedrock_completion.py): Ensure we are testing cross AWS region OIDC flow.

* fix(router.py): log rejected requests

Fixes https://github.com/BerriAI/litellm/issues/5498

* refactor: don't use verbose_logger.exception, if exception is raised

User might already have handling for this. But alerting systems in prod will raise this as an unhandled error.

* fix(datadog.py): support setting datadog source as an env var

Fixes https://github.com/BerriAI/litellm/issues/5508

* docs(logging.md): add dd_source to datadog docs

* fix(proxy_server.py): expose `/customer/list` endpoint for showing all customers

* (bedrock): Fix usage with Cloudflare AI Gateway, and proxies in general. (#5509)

* feat(anthropic.py): support 'cache_control' param for content when it is a string

* Revert "(bedrock): Fix usage with Cloudflare AI Gateway, and proxies in gener…" (#5519)

This reverts commit 3fac0349c2.

* refactor: ci/cd run again

---------

Co-authored-by: David Manouchehri <david.manouchehri@ai.moda>
2024-09-04 22:16:55 -07:00
Ishaan Jaff
ca5a117544 dual cache use always read redis as True by default 2024-09-04 08:01:55 -07:00
Ishaan Jaff
fd122cb759 fix always read redis 2024-09-02 21:08:32 -07:00
Ishaan Jaff
15296b4fb7 fix allow qdrant api key to be optional 2024-08-30 11:13:23 -07:00
Krrish Dholakia
0eea01dae9 feat(vertex_ai_context_caching.py): check gemini cache, if key already exists 2024-08-26 22:19:01 -07:00
Ishaan Jaff
feb354d3bc fix should_use_cache 2024-08-24 09:37:41 -07:00
Ishaan Jaff
3c1da2e823 feat - allow setting cache mode 2024-08-24 09:03:59 -07:00
Krrish Dholakia
e2d7539690 feat(caching.py): redis cluster support
Closes https://github.com/BerriAI/litellm/issues/4358
2024-08-21 15:01:52 -07:00
Ishaan Jaff
e7ecb2fe3a fix qdrant litellm on proxy 2024-08-21 12:52:29 -07:00
Ishaan Jaff
c6dfd2d276 fixes for using qdrant with litellm proxy 2024-08-21 12:36:41 -07:00
Ishaan Jaff
428a74be07 fix drant url 2024-08-21 12:09:09 -07:00
Ishaan Jaff
7d0196191f
Merge pull request #5018 from haadirakhangi/main
Qdrant Semantic Caching
2024-08-21 08:50:43 -07:00
Haadi Rakhangi
7f1c3f5edf implemented RestAPI and added support for cloud and local Qdrant clusters 2024-08-19 20:46:30 +05:30
Krrish Dholakia
61f4b71ef7 refactor: replace .error() with .exception() logging for better debugging on sentry 2024-08-16 09:22:47 -07:00
prd-tuong-nguyen
3445174ebe feat: hash prompt when caching 2024-08-08 16:19:14 +07:00
Ishaan Jaff
467c506e33 caching use file_checksum 2024-08-06 13:03:14 -07:00
Krrish Dholakia
a9fdfb5a99 fix(init.py): rename feature_flag 2024-08-05 11:23:20 -07:00
Krrish Dholakia
3c4c78a71f feat(caching.py): enable caching on provider-specific optional params
Closes https://github.com/BerriAI/litellm/issues/5049
2024-08-05 11:18:59 -07:00
Ishaan Jaff
b6b19dc128 use file name when getting cache key 2024-08-02 14:52:08 -07:00
Haadi Rakhangi
851db5ecea qdrant semantic caching added 2024-08-02 21:07:19 +05:30
Krrish Dholakia
31445ab20a fix(caching.py): support /completion caching by default
updates supported call types in redis cache to cover text_completion caching
2024-07-29 08:19:30 -07:00
Ishaan Jaff
19fb5cc11c use common helpers for writing to otel 2024-07-27 11:40:39 -07:00
Ishaan Jaff
2a89486948 move _get_parent_otel_span_from_kwargs to otel.py 2024-07-27 11:12:13 -07:00
Ishaan Jaff
677db38f8b add doc string to explain what delete cache does 2024-07-13 12:25:31 -07:00
Ishaan Jaff
0099bf7859 de-ref unused cache items 2024-07-12 16:38:36 -07:00
Krrish Dholakia
3f83e8a8d4 fix(caching.py): fix async redis health check 2024-07-06 09:14:29 -07:00
Ishaan Jaff
511dd18e4b remove debug print statement 2024-06-27 20:58:29 -07:00
Ishaan Jaff
e899359427 ci/cd add debugging for cache eviction 2024-06-25 08:14:09 -07:00
Ishaan Jaff
05fe43f495 fix default ttl for InMemoryCache 2024-06-24 21:21:38 -07:00
Ishaan Jaff
fa57d2e823 feat use custom eviction policy 2024-06-24 20:28:03 -07:00
Ishaan Jaff
effc7579ac fix install on python 3.8 2024-06-24 17:27:14 -07:00
Ishaan Jaff
b13a93d9bc cleanup InMemoryCache 2024-06-24 17:24:59 -07:00