Commit graph

3651 commits

Ishaan Jaff
eba76377ca
[Chore-Proxy] enforce jwt auth as enterprise feature (#5770)
* enforce prometheus as enterprise feature

* show correct error on prometheus metric when not an enterprise user

* docs prometheus metrics enforced

* docs enforce JWT auth

* enforce JWT auth as enterprise feature

* fix merge conflicts
2024-09-18 16:28:37 -07:00
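
The enforcement pattern this commit (and the prometheus one below) describes boils down to a license gate in the proxy. A minimal, hypothetical sketch — the function name and message are illustrative, not LiteLLM's actual code:

```python
# Hypothetical sketch of gating a proxy feature behind an enterprise license.
# `is_premium_user` stands in for however the proxy tracks license status.
def enforce_enterprise_feature(feature_name: str, is_premium_user: bool) -> None:
    if not is_premium_user:
        raise ValueError(
            f"{feature_name} is an enterprise-only feature on LiteLLM Proxy. "
            "Set a valid license to enable it."
        )

try:
    enforce_enterprise_feature("JWT auth", is_premium_user=False)
except ValueError as e:
    print(e)  # surfaces the "enterprise-only" error to the caller
```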
Ishaan Jaff
50cc7c0353
[Chore LiteLLM Proxy] enforce prometheus metrics as enterprise feature (#5769)
* enforce prometheus as enterprise feature

* show correct error on prometheus metric when not an enterprise user

* docs prometheus metrics enforced

* fix enforcing
2024-09-18 16:28:12 -07:00
Ishaan Jaff
7e07c37be7
[Feat-Proxy] Add Azure Assistants API - Create Assistant, Delete Assistant Support (#5777)
* update docs to show providers

* azure - move assistants into its own file

* create new azure assistants file

* add azure create assistants

* add test for create / delete assistants

* azure add delete assistants support

* docs add Azure to supported providers for assistants api

* fix linting errors

* fix standard logging merge conflict

* docs azure create assistants

* fix doc
2024-09-18 16:27:33 -07:00
Ishaan Jaff
a109853d21
[Prometheus] track requested model (#5774)
* enforce prometheus as enterprise feature

* show correct error on prometheus metric when not an enterprise user

* docs prometheus metrics enforced

* track requested model on prometheus

* docs prom metrics

* fix prom tracking failures
2024-09-18 12:46:58 -07:00
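
Tracking the requested model amounts to a labeled counter. A minimal sketch with `prometheus_client`; the metric and label names here are assumptions, not the exact names in LiteLLM's integration:

```python
from prometheus_client import Counter

# Illustrative metric: count requests by the model name the client asked for,
# even if the router later maps it to a different deployment.
litellm_requests = Counter(
    "litellm_requests_total",
    "Proxy requests, labeled by the requested model",
    labelnames=["requested_model"],
)

litellm_requests.labels(requested_model="gpt-4").inc()
```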
Ishaan Jaff
aa84bcebaf docs update standard logging object 2024-09-18 10:17:09 -07:00
Krish Dholakia
9c8fdee068
Additional Fixes (09/17/2024) (#5759)
* fix(auth_checks.py): check if key has all model access via wildcard routing

Fixes issue where key with `openai/*` couldn't call gpt models

* fix(slack_alerting.py): expose flag for disabling failed spend tracking alerts
2024-09-17 23:02:12 -07:00
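
The first fix above can be pictured as a simple glob check — a hedged sketch of the idea, not the code in `auth_checks.py`:

```python
import fnmatch

# A key whose allowed models include "openai/*" should match any openai/ model.
def key_allows_model(allowed_models: list[str], requested_model: str) -> bool:
    return any(fnmatch.fnmatch(requested_model, pattern) for pattern in allowed_models)

assert key_allows_model(["openai/*"], "openai/gpt-4o")
assert not key_allows_model(["openai/*"], "anthropic/claude-3-opus")
```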
Krish Dholakia
98c335acd0
LiteLLM Minor Fixes & Improvements (09/17/2024) (#5742)
* fix(proxy_server.py): use default azure credentials to support azure non-client secret kms

* fix(langsmith.py): raise error if credentials missing

* feat(langsmith.py): support error logging for langsmith + standard logging payload

Fixes https://github.com/BerriAI/litellm/issues/5738

* Fix hardcoding of schema in view check (#5749)

* fix - deal with case when check view exists returns None (#5740)

* Revert "fix - deal with case when check view exists returns None (#5740)" (#5741)

This reverts commit 535228159b.

* test(test_router_debug_logs.py): move to mock response

* Fix hardcoding of schema

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>

* fix(proxy_server.py): allow admin to disable the UI via `DISABLE_ADMIN_UI` flag (see the sketch after this entry)

* fix(router.py): fix default model name value

Fixes 55db19a1e4 (r1763712148)

* fix(utils.py): fix unbound variable error

* feat(rerank/main.py): add azure ai rerank endpoints

Closes https://github.com/BerriAI/litellm/issues/5667

* feat(secret_detection.py): Allow configuring secret detection params

Allows admin to control what plugins to run for secret detection. Prevents overzealous secret detection.

* docs(secret_detection.md): add secret detection guardrail docs

* fix: fix linting errors

* fix - deal with case when check view exists returns None (#5740)

* Revert "fix - deal with case when check view exists returns None (#5740)" (#5741)

This reverts commit 535228159b.

* Litellm fix router testing (#5748)

* test: fix testing - azure changed content policy error logic

* test: fix tests to use mock responses

* test(test_image_generation.py): handle api instability

* test(test_image_generation.py): handle azure api instability

* fix(utils.py): fix unbound variable error

* fix(utils.py): fix unbound variable error

* test: refactor test to use mock response

* test: mark flaky azure tests

* Bump next from 14.1.1 to 14.2.10 in /ui/litellm-dashboard (#5753)

Bumps [next](https://github.com/vercel/next.js) from 14.1.1 to 14.2.10.
- [Release notes](https://github.com/vercel/next.js/releases)
- [Changelog](https://github.com/vercel/next.js/blob/canary/release.js)
- [Commits](https://github.com/vercel/next.js/compare/v14.1.1...v14.2.10)

---
updated-dependencies:
- dependency-name: next
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [Fix] o1-mini causes pydantic warnings on `reasoning_tokens` (#5754)

* add requester_metadata in standard logging payload

* log requester_metadata in metadata

* use StandardLoggingPayload for logging

* docs StandardLoggingPayload

* fix import

* include standard logging object in failure

* add test for requester metadata

* handle completion_tokens_details

* add test for completion_tokens_details

* [Feat-Proxy-DataDog] Log Redis, Postgres Failure events on DataDog (#5750)

* dd - start tracking redis status on dd

* add async_service_success_hook / failure hook in custom logger

* add async_service_failure_hook

* log service failures on dd

* fix import error

* add test for redis errors / warning

* [Fix] Router/ Proxy - Tag Based routing, raise correct error when no deployments found and tag filtering is on (#5745)

* fix tag routing - raise correct error when no model matches tag based routing

* fix error string from tag based routing

* test router tag based routing

* raise 401 error when no tags available for deployment

* linting fix

* [Feat] Log Request metadata on gcs bucket logging (#5743)

* add requester_metadata in standard logging payload

* log requester_metadata in metadata

* use StandardLoggingPayload for logging

* docs StandardLoggingPayload

* fix import

* include standard logging object in failure

* add test for requester metadata

* fix(litellm_logging.py): fix logging message

* fix(rerank_api/main.py): fix linting errors

* fix(custom_guardrails.py): maintain backwards compatibility for older guardrails

* fix(rerank_api/main.py): fix cost tracking for rerank endpoints

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: steffen-sbt <148480574+steffen-sbt@users.noreply.github.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-17 23:00:04 -07:00
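
One item above, the `DISABLE_ADMIN_UI` flag, reduces to an env-var gate. A minimal sketch, assuming the flag is parsed as a truthy string (the helper name and parsing details are assumptions):

```python
import os

def admin_ui_disabled() -> bool:
    # Treat DISABLE_ADMIN_UI=true (any casing) as "hide the admin UI".
    return os.getenv("DISABLE_ADMIN_UI", "false").lower() == "true"
```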
Ishaan Jaff
be96c79b3c update datadog docs 2024-09-17 20:42:36 -07:00
Ishaan Jaff
d3406c92aa
[Feat] Log Request metadata on gcs bucket logging (#5743)
* add requester_metadata in standard logging payload

* log requester_metadata in metadata

* use StandardLoggingPayload for logging

* docs StandardLoggingPayload

* fix import

* include standard logging object in failure

* add test for requester metadata
2024-09-17 20:25:39 -07:00
Ishaan Jaff
1bb1f70a47
[Fix] Router/ Proxy - Tag Based routing, raise correct error when no deployments found and tag filtering is on (#5745)
* fix tag routing - raise correct error when no model matches tag based routing

* fix error string from tag based routing

* test router tag based routing

* raise 401 error when no tags available for deployment

* linting fix
2024-09-17 20:24:28 -07:00
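
A sketch of the tag-filtering behavior this PR fixes; the deployment shape is modeled on router `model_list` entries, and the error text is an assumption based on the commit message:

```python
def filter_deployments_by_tags(deployments: list[dict], request_tags: list[str]) -> list[dict]:
    matches = [
        d for d in deployments
        if set(request_tags) & set(d.get("litellm_params", {}).get("tags", []))
    ]
    if not matches:
        # The fix: surface a clear 401-style error naming the missing tags,
        # instead of a generic "no deployments available" message.
        raise PermissionError(f"No deployments available for request tags: {request_tags}")
    return matches
```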
Ishaan Jaff
911230c434
[Feat-Proxy-DataDog] Log Redis, Postgres Failure events on DataDog (#5750)
* dd - start tracking redis status on dd

* add async_service_success_hook / failure hook in custom logger

* add async_service_failure_hook

* log service failures on dd

* fix import error

* add test for redis errors / warning
2024-09-17 20:24:06 -07:00
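
The success/failure hook pair on a custom logger reads roughly like this; the signatures are simplified guesses at the interface the commit describes, not the actual `custom_logger` API:

```python
class ServiceLogger:
    async def async_service_success_hook(self, service: str, duration_s: float) -> None:
        # e.g. emit a "redis OK" event to DataDog
        print(f"service={service} ok in {duration_s:.3f}s")

    async def async_service_failure_hook(self, service: str, error: Exception) -> None:
        # e.g. emit a "postgres failed" event to DataDog
        print(f"service={service} failed: {error}")
```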
Ishaan Jaff
7f4dfe434a
[Fix] o1-mini causes pydantic warnings on reasoning_tokens (#5754)
* add requester_metadata in standard logging payload

* log requester_metadata in metadata

* use StandardLoggingPayload for logging

* docs StandardLoggingPayload

* fix import

* include standard logging object in failure

* add test for requester metadata

* handle completion_tokens_details

* add test for completion_tokens_details
2024-09-17 20:23:14 -07:00
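
The last two items above amount to reading the nested usage field OpenAI returns for o1-series models without tripping pydantic's unknown-field warning. A sketch using the OpenAI-style field names:

```python
# Usage payload shape returned for o1-mini (per the OpenAI API).
usage = {
    "completion_tokens": 100,
    "completion_tokens_details": {"reasoning_tokens": 60},
}

# Read the nested detail defensively instead of failing validation on it.
details = usage.get("completion_tokens_details") or {}
reasoning_tokens = details.get("reasoning_tokens", 0)
print(reasoning_tokens)  # 60
```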
Ishaan Jaff
8de6e3d3ba
Revert "fix - deal with case when check view exists returns None (#5740)" (#5741)
This reverts commit 535228159b.
2024-09-17 09:04:22 -07:00
Ishaan Jaff
535228159b
fix - deal with case when check view exists returns None (#5740) 2024-09-17 08:38:19 -07:00
Krish Dholakia
234185ec13
LiteLLM Minor Fixes & Improvements (09/16/2024) (#5723) (#5731)
* LiteLLM Minor Fixes & Improvements (09/16/2024)  (#5723)

* coverage (#5713)

Signed-off-by: dbczumar <corey.zumar@databricks.com>

* Move (#5714)

Signed-off-by: dbczumar <corey.zumar@databricks.com>

* fix(litellm_logging.py): fix logging client re-init (#5710)

Fixes https://github.com/BerriAI/litellm/issues/5695

* fix(presidio.py): Fix logging_hook response and add support for additional presidio variables in guardrails config

Fixes https://github.com/BerriAI/litellm/issues/5682

* feat(o1_handler.py): fake streaming for openai o1 models

Fixes https://github.com/BerriAI/litellm/issues/5694

* docs: deprecated traceloop integration in favor of native otel (#5249)

* fix: fix linting errors

* fix: fix linting errors

* fix(main.py): fix o1 import

---------

Signed-off-by: dbczumar <corey.zumar@databricks.com>
Co-authored-by: Corey Zumar <39497902+dbczumar@users.noreply.github.com>
Co-authored-by: Nir Gazit <nirga@users.noreply.github.com>

* feat(spend_management_endpoints.py): expose `/global/spend/refresh` endpoint for updating materialized view (#5730)

* feat(spend_management_endpoints.py): expose `/global/spend/refresh` endpoint for updating materialized view

Supports having `MonthlyGlobalSpend` view be a materialized view, and exposes an endpoint to refresh it

* fix(custom_logger.py): reset calltype

* fix: fix linting errors

* fix: fix linting error

* fix: fix import

* test(test_databricks.py): fix databricks tests

---------

Signed-off-by: dbczumar <corey.zumar@databricks.com>
Co-authored-by: Corey Zumar <39497902+dbczumar@users.noreply.github.com>
Co-authored-by: Nir Gazit <nirga@users.noreply.github.com>
2024-09-17 08:05:52 -07:00
Ishaan Jaff
9f5a33015f fix linting 2024-09-16 18:07:48 -07:00
Ishaan Jaff
b6ae2204a8
[Feat-Proxy] Slack Alerting - allow using os.environ/ vars for alert to webhook url (#5726)
* allow using os.environ for slack urls

* use env vars for webhook urls

* fix types for get_secret

* fix linting

* fix linting

* fix linting

* linting fixes

* linting fix

* docs alerting slack

* fix get data
2024-09-16 18:03:37 -07:00
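
The `os.environ/` convention resolves a config value to an environment variable at runtime. A minimal sketch of that indirection (the helper name here is hypothetical):

```python
import os

def resolve_env_reference(value: str) -> str:
    # "os.environ/SLACK_WEBHOOK_URL" -> contents of $SLACK_WEBHOOK_URL
    prefix = "os.environ/"
    if value.startswith(prefix):
        return os.environ[value[len(prefix):]]
    return value
```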
Ishaan Jaff
8103e2b2da
[Fix-Proxy] Azure Key Management - Secret Manager (#5728)
* fix azure key mgmt error

* add test for azure kms

* add test for azure kms
2024-09-16 18:01:40 -07:00
Ishaan Jaff
7b09591ca6
[Fix-Proxy] log exceptions from azure key vault on verbose_logger.exceptions (#5719)
* log exceptions from azure key vault

* fix error from azure key vault
2024-09-16 16:58:37 -07:00
Ishaan Jaff
8fbe2abb89
[Feat-Proxy] Add upperbound key duration param (#5727)
* add upperbound key duration param

* use upper bound values when None set

* docs upperbound params
2024-09-16 16:28:36 -07:00
Krish Dholakia
da77706c26
Litellm stable dev (#5711)
* feat(aws_base_llm.py): prevents recreating boto3 credentials during high traffic (sketched after this entry)

Leads to 100ms perf boost in local testing

* fix(base_aws_llm.py): fix credential caching check to see if token is set

* refactor(bedrock/chat): separate converse api and invoke api + isolate converse api transformation logic

Make it easier to see how requests are transformed for /converse

* fix: fix imports

* fix(bedrock/embed): fix reordering of headers

* fix(base_aws_llm.py): fix get credential logic

* fix(converse_handler.py): fix ai21 streaming response
2024-09-14 23:22:59 -07:00
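
The credential-caching item above can be pictured as a small TTL cache: only hit boto3/STS when the cache is cold or the token is near expiry. Names and the TTL value are assumptions for illustration:

```python
import time

_credential_cache: dict[str, tuple[object, float]] = {}

def get_cached_credentials(cache_key: str, create_credentials, ttl_s: float = 3000.0):
    cached = _credential_cache.get(cache_key)
    if cached is None or time.time() >= cached[1]:
        # Cache miss or near-expiry: recreate credentials once, then reuse.
        creds = create_credentials()
        _credential_cache[cache_key] = (creds, time.time() + ttl_s)
        return creds
    return cached[0]
```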
Ishaan Jaff
c8eff2dc65
[Feat-Prometheus] Track exception status on litellm_deployment_failure_responses (#5706)
* add litellm_deployment_cooled_down

* track num cooldowns on prometheus

* track exception status

* fix linting

* docs prom metrics

* cleanup premium user checks

* prom track deployment failure state

* docs prometheus
2024-09-14 18:44:31 -07:00
Ishaan Jaff
c8d15544c8
[Fix] Router cooldown logic - use % thresholds instead of allowed fails to cooldown deployments (#5698)
* move cooldown logic to its own helper

* add new track deployment metrics folder

* increment success, fails for deployment in current minute

* fix cooldown logic

* fix test_aaarouter_dynamic_cooldown_message_retry_time

* fix test_single_deployment_no_cooldowns_test_prod_mock_completion_calls

* clean up get from deployment test

* fix _async_get_healthy_deployments

* add mock InternalServerError

* test deployment failing 25% requests

* add test_high_traffic_cooldowns_one_bad_deployment

* fix vertex load test

* add test for rate limit error models in cool down

* change default cooldown time

* fix cooldown message time

* fix cooldown on 429 error

* fix doc string for _should_cooldown_deployment

* fix sync cooldown logic router
2024-09-14 18:01:19 -07:00
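
In rough terms, this swaps a fixed allowed-fails count for a failure-rate check over the current minute. A sketch with an assumed 25% threshold (the rate the tests above exercise):

```python
def should_cooldown_deployment(fails: int, successes: int, threshold: float = 0.25) -> bool:
    total = fails + successes
    if total == 0:
        return False
    return (fails / total) >= threshold

assert should_cooldown_deployment(fails=25, successes=75)      # 25% failure rate
assert not should_cooldown_deployment(fails=1, successes=99)   # 1% failure rate
```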
Krish Dholakia
dad1ad2077
LiteLLM Minor Fixes and Improvements (09/14/2024) (#5697)
* fix(health_check.py): hide sensitive keys from health check debug information

* fix(route_llm_request.py): fix proxy model not found error message to indicate how to resolve issue

* fix(vertex_llm_base.py): fix exception message to not log credentials
2024-09-14 10:32:39 -07:00
Krish Dholakia
60709a0753
LiteLLM Minor Fixes and Improvements (09/13/2024) (#5689)
* refactor: cleanup unused variables + fix pyright errors

* feat(health_check.py): Closes https://github.com/BerriAI/litellm/issues/5686

* fix(o1_reasoning.py): add stricter check for o-1 reasoning model

* refactor(mistral/): make it easier to see mistral transformation logic

* fix(openai.py): fix openai o-1 model param mapping

Fixes https://github.com/BerriAI/litellm/issues/5685

* feat(main.py): infer finetuned gemini model from base model

Fixes https://github.com/BerriAI/litellm/issues/5678

* docs(vertex.md): update docs to call finetuned gemini models

* feat(proxy_server.py): allow admin to hide proxy model aliases

Closes https://github.com/BerriAI/litellm/issues/5692

* docs(load_balancing.md): add docs on hiding alias models from proxy config

* fix(base.py): don't raise notimplemented error

* fix(user_api_key_auth.py): fix model max budget check

* fix(router.py): fix elif

* fix(user_api_key_auth.py): don't set team_id to empty str

* fix(team_endpoints.py): fix response type

* test(test_completion.py): handle predibase error

* test(test_proxy_server.py): fix test

* fix(o1_transformation.py): fix max_completion_token mapping

* test(test_image_generation.py): mark flaky test
2024-09-14 10:02:55 -07:00
Ishaan Jaff
741c8e8a45
[Feat - Perf Improvement] DataDog Logger 91% lower latency (#5687)
* fix refactor dd to be an instance of custom logger

* migrate dd logger to be async

* clean up dd logging

* add datadog sync and async code

* use batching for datadog logger

* add doc string for dd logging

* add clear doc string

* fix doc string

* allow debugging intake url

* clean up requirements.txt

* allow setting custom batch size on logger

* fix dd logging to use compression

* fix linting

* add dd load test

* fix dd load test

* fix dd url

* add test_datadog_logging_http_request

* fix test_datadog_logging_http_request
2024-09-13 17:39:17 -07:00
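
The latency win comes from buffering events and flushing them in batches rather than making one HTTP call per log line. A minimal async sketch of the pattern; batch size and the send step are placeholders:

```python
import asyncio

class BatchingLogger:
    def __init__(self, batch_size: int = 512):
        self.batch_size = batch_size
        self.buffer: list[dict] = []

    async def log_event(self, event: dict) -> None:
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            await self.flush()

    async def flush(self) -> None:
        batch, self.buffer = self.buffer, []
        if batch:
            await self._send(batch)

    async def _send(self, batch: list[dict]) -> None:
        # Placeholder for one compressed POST to the DataDog intake URL.
        await asyncio.sleep(0)
```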
Krrish Dholakia
cdd7cd4d69 build: bump from 1.44.28 -> 1.45.0 2024-09-12 23:10:29 -07:00
Krish Dholakia
4657a40ef1
LiteLLM Minor Fixes and Improvements (09/12/2024) (#5658)
* fix(factory.py): handle tool call content as list

Fixes https://github.com/BerriAI/litellm/issues/5652

* fix(factory.py): enforce stronger typing

* fix(router.py): return model alias in /v1/model/info and /v1/model_group/info

* fix(user_api_key_auth.py): move noisy warning message to debug

cleanup logs

* fix(types.py): cleanup pydantic v2 deprecated param

Fixes https://github.com/BerriAI/litellm/issues/5649

* docs(gemini.md): show how to pass inline data to gemini api

Fixes https://github.com/BerriAI/litellm/issues/5674
2024-09-12 23:04:06 -07:00
Krish Dholakia
d94d47424f
fix(proxy/utils.py): auto-update if required view missing from db. raise warning for optional views. (#5675)
Prevents missing optional views from blocking proxy startup.
2024-09-12 22:15:44 -07:00
Ishaan Jaff
19a06d7842
[Fix-Router] Don't cooldown when only 1 deployment exists (#5673)
* fix get model list

* fix test custom callback router

* fix embedding fallback test

* fix router retry policy on AuthErrors

* fix router test

* add test for single deployments no cooldown test prod

* add test test_single_deployment_no_cooldowns_test_prod_mock_completion_calls
2024-09-12 19:14:58 -07:00
Ishaan Jaff
e7c9716841
[Feat-Perf] Use Batching + Squashing (#5645)
* use folder for slack alerting

* clean up slack alerting

* fix test alerting
2024-09-12 18:37:53 -07:00
Krish Dholakia
c76d2c6ade
Refactor 'check_view_exists' logic (#5659)
* fix(proxy/utils.py): comment out auto-upsert logic in check_view_exists

Prevents proxy from failing on startup due to faulty logic

* fix(db/migration_scripts/create_views.py): fix 'DailyTagSpend' quotation on check

* fix(create_views.py): monthly global spend time period should be 30d not 20d

* fix(schema.prisma): index on startTime and endUser for efficient UI querying
2024-09-12 13:39:50 -07:00
Krish Dholakia
98c34a7e27
LiteLLM Minor Fixes and Improvements (11/09/2024) (#5634)
* fix(caching.py): set ttl for async_increment cache (see the sketch after this entry)

fixes issue where ttl for redis client was not being set on increment_cache

Fixes https://github.com/BerriAI/litellm/issues/5609

* fix(caching.py): fix increment cache w/ ttl for sync increment cache on redis

Fixes https://github.com/BerriAI/litellm/issues/5609

* fix(router.py): support adding retry policy + allowed fails policy via config.yaml

* fix(router.py): don't cooldown single deployments

No point, as there's no other deployment to load balance with.

* fix(user_api_key_auth.py): support setting allowed email domains on jwt tokens

Closes https://github.com/BerriAI/litellm/issues/5605

* docs(token_auth.md): add user upsert + allowed email domain to jwt auth docs

* fix(litellm_pre_call_utils.py): fix dynamic key logging when team id is set

Fixes issue where key logging would not be set if team metadata was not none

* fix(secret_managers/main.py): load environment variables correctly

Fixes issue where os.environ/ was not being loaded correctly

* test(test_router.py): fix test

* feat(spend_tracking_utils.py): support logging additional usage params - e.g. prompt caching values for deepseek

* test: fix tests

* test: fix test

* test: fix test

* test: fix test

* test: fix test
2024-09-11 22:36:06 -07:00
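
The ttl-on-increment fix referenced above pairs the counter increment with an expiry so spend counters don't persist forever. A redis-py sketch of the idea, assuming a local Redis:

```python
import redis

r = redis.Redis()

def increment_with_ttl(key: str, amount: float, ttl_s: int) -> float:
    new_value = r.incrbyfloat(key, amount)
    if r.ttl(key) == -1:  # key exists but has no expiry set yet
        r.expire(key, ttl_s)
    return float(new_value)
```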
Ishaan Jaff
5dac4abd16
Merge branch 'main' into litellm_otel_fixes 2024-09-11 18:06:29 -07:00
Ishaan Jaff
f55318de47
Merge pull request #5638 from BerriAI/litellm_langsmith_perf
[Langsmith Perf Improvement] Use /batch for Langsmith Logging
2024-09-11 17:43:26 -07:00
steffen-sbt
de9a39e7c6
Add the option to specify a schema in the postgres DB, also modify docs (#5640) 2024-09-11 14:53:52 -07:00
Ishaan Jaff
e681619381 use vars for batch size and flush interval seconds 2024-09-11 14:40:58 -07:00
Ishaan Jaff
3376f151c6 fix otel use sensible defaults 2024-09-11 14:24:04 -07:00
Krish Dholakia
0295a22561
LiteLLM Minor Fixes and Improvements (09/10/2024) (#5618)
* fix(cost_calculator.py): move to debug for noisy warning message on cost calculation error

Fixes https://github.com/BerriAI/litellm/issues/5610

* fix(databricks/cost_calculator.py): Handles model name issues for databricks models

* fix(main.py): fix stream chunk builder for multiple tool calls

Fixes https://github.com/BerriAI/litellm/issues/5591

* fix: correctly set user_alias when passed in

Fixes https://github.com/BerriAI/litellm/issues/5612

* fix(types/utils.py): allow passing role for message object

https://github.com/BerriAI/litellm/issues/5621

* fix(litellm_logging.py): Fix langfuse logging across multiple projects

Fixes issue where langfuse logger was re-using the old logging object

* feat(proxy/_types.py): support adding key-based tags for tag-based routing

Enable tag based routing at key-level

* fix(proxy/_types.py): fix inheritance

* test(test_key_generate_prisma.py): fix test

* test: fix test

* fix(litellm_logging.py): return used callback object
2024-09-11 11:30:29 -07:00
Ishaan Jaff
2fa9709af0 stash - langsmith use batching for logging 2024-09-11 08:06:56 -07:00
Ishaan Jaff
c1262addbe
Merge pull request #5623 from BerriAI/litellm_vertex_use_async_for_getting_token
[Feat-Vertex Perf] Use async func to get auth credentials
2024-09-10 18:53:48 -07:00
Ishaan Jaff
96fa9d46f5 fix case when gemini is used 2024-09-10 17:06:45 -07:00
Ishaan Jaff
1c6f8b1be2 fix vertex use async func to set auth creds 2024-09-10 16:12:18 -07:00
Ishaan Jaff
3ebff903c3
Merge branch 'main' into litellm_use_helper_to_get_httpx_clients 2024-09-10 15:02:54 -07:00
Ishaan Jaff
f3593aed68
Merge pull request #5619 from BerriAI/litellm_vertex_use_get_httpx_client
[Fix-Perf] Vertex AI cache httpx clients
2024-09-10 13:59:39 -07:00
Ishaan Jaff
08f8f9634f use get async httpx client 2024-09-10 13:08:49 -07:00
Ishaan Jaff
421b857714 pass llm provider when creating async httpx clients 2024-09-10 11:51:42 -07:00
Ishaan Jaff
d4b9a1307d rename get_async_httpx_client 2024-09-10 10:38:01 -07:00
Ishaan Jaff
1e8cf9f2a6 fix vertex ai use _get_async_client 2024-09-10 10:33:19 -07:00
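
The httpx-client commits above share one idea: construct the `AsyncClient` (and its connection pool) once per provider and reuse it. A hedged sketch — the function name comes from the commit messages, the body is an assumption:

```python
import httpx

_async_clients: dict[str, httpx.AsyncClient] = {}

def get_async_httpx_client(llm_provider: str) -> httpx.AsyncClient:
    # Reuse one client per provider instead of creating a fresh client,
    # and fresh TCP/TLS handshakes, on every request.
    if llm_provider not in _async_clients:
        _async_clients[llm_provider] = httpx.AsyncClient(timeout=600.0)
    return _async_clients[llm_provider]
```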
Ishaan Jaff
428762542c fix regen keys when no duration is passed 2024-09-10 08:04:18 -07:00