(feat proxy slack alerting) - allow opting in to getting key / internal user alerts (#5990)

* define all slack alert types

* use correct type hints for alert type

* use correct defaults on slack alerting

* add readme for slack alerting

* fix linting error

* update readme

* docs all alert types

* update slack alerting docs

* fix slack alerting docs

* handle new testing dir structure

* fix config for testing

* fix testing folder related imports

* fix /tests import errors

* fix import stream_chunk_testdata

* docs alert types

* fix test test_langfuse_trace_id

* fix type checks for slack alerting

* fix outage alerting test slack
Ishaan Jaff 2024-10-01 10:49:22 -07:00 committed by GitHub
parent 8225880af0
commit 045ecf3ffb
21 changed files with 283 additions and 210 deletions

View file

@ -16,15 +16,6 @@ jobs:
command: |
echo "Git commit hash: $CIRCLE_SHA1"
- run:
name: Check if litellm dir was updated or if pyproject.toml was modified
command: |
if [ -n "$(git diff --name-only $CIRCLE_SHA1^..$CIRCLE_SHA1 | grep -E 'pyproject\.toml|litellm/')" ]; then
echo "litellm updated"
else
echo "No changes to litellm or pyproject.toml. Skipping tests."
circleci step halt
fi
- restore_cache:
keys:
- v1-dependencies-{{ checksum ".circleci/requirements.txt" }}

View file

@ -112,31 +112,24 @@ response = client.chat.completions.create(
### Opting into specific alert types
Set `alert_types` if you want to opt into only specific alert types. When `alert_types` is not set, all default alert types are enabled.
👉 [**See all alert types here**](#all-possible-alert-types)
```shell
general_settings:
  alerting: ["slack"]
  alert_types: [
    "llm_exceptions",
    "llm_too_slow",
    "llm_requests_hanging",
    "budget_alerts",
    "spend_reports",
    "db_exceptions",
    "daily_reports",
    "fallback_reports",
    "cooldown_deployment",
    "new_model_added",
    "outage_alerts",
  ]
```
### Set specific slack channels per alert type
@ -365,7 +358,7 @@ curl -X GET --location 'http://0.0.0.0:4000/health/services?service=webhook' \
}
```
### API Spec for Webhook Event
- `spend` *float*: The current spend amount for the 'event_group'.
- `max_budget` *float or null*: The maximum allowed budget for the 'event_group'. null if not set.
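For orientation, a hedged sketch of an event payload carrying just these two fields (the real event includes more fields than this excerpt shows; values here are illustrative):

```python
# illustrative payload fragment for a webhook event
webhook_event = {
    "spend": 1.25,        # float: current spend for the event_group
    "max_budget": 100.0,  # float or None: None when no budget is set
}
```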
@ -417,4 +410,50 @@ general_settings:
region_outage_alert_ttl: 60 # time-window in seconds
minor_outage_alert_threshold: 5 # number of errors to trigger a minor alert
major_outage_alert_threshold: 10 # number of errors to trigger a major alert
```
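A sketch of the thresholding these settings describe; the alert levels ("Medium" for minor, "High" for major) match the outage-alert code later in this diff, while the variable names here are illustrative:

```python
# illustrative: errors are counted per deployment/region within the TTL window
errors_in_window = 7
minor_threshold, major_threshold = 5, 10

if errors_in_window >= major_threshold:
    level = "High"    # major outage alert
elif errors_in_window >= minor_threshold:
    level = "Medium"  # minor outage alert
else:
    level = None      # below both thresholds: no alert sent
```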
## **All Possible Alert Types**
👉 [**Here is how you can set specific alert types**](#opting-into-specific-alert-types)
LLM-related Alerts
| Alert Type | Description | Default On |
|------------|-------------|---------|
| `llm_exceptions` | Alerts for LLM API exceptions | ✅ |
| `llm_too_slow` | Notifications for LLM responses slower than the set threshold | ✅ |
| `llm_requests_hanging` | Alerts for LLM requests that are not completing | ✅ |
| `cooldown_deployment` | Alerts when a deployment is put into cooldown | ✅ |
| `new_model_added` | Notifications when a new model is added to the litellm proxy through `/model/new` | ✅ |
| `outage_alerts` | Alerts when a specific LLM deployment is facing an outage | ✅ |
| `region_outage_alerts` | Alerts when a specific LLM region is facing an outage, e.g. `us-east-1` | ✅ |
Budget and Spend Alerts
| Alert Type | Description | Default On|
|------------|-------------|---------|
| `budget_alerts` | Notifications related to budget limits or thresholds | ✅ |
| `spend_reports` | Periodic reports on spending across teams or tags | ✅ |
| `failed_tracking_spend` | Alerts when spend tracking fails | ✅ |
| `daily_reports` | Daily spend reports | ✅ |
| `fallback_reports` | Weekly reports on LLM fallback occurrences | ✅ |
Database Alerts
| Alert Type | Description | Default On |
|------------|-------------|---------|
| `db_exceptions` | Notifications for database-related exceptions | ✅ |
Management Endpoint Alerts - Virtual Key, Team, Internal User
| Alert Type | Description | Default On |
|------------|-------------|---------|
| `new_virtual_key_created` | Notifications when a new virtual key is created | ❌ |
| `virtual_key_updated` | Alerts when a virtual key is modified | ❌ |
| `virtual_key_deleted` | Notifications when a virtual key is removed | ❌ |
| `new_team_created` | Alerts for the creation of a new team | ❌ |
| `team_updated` | Notifications when team details are modified | ❌ |
| `team_deleted` | Alerts when a team is deleted | ❌ |
| `new_internal_user_created` | Notifications for new internal user accounts | ❌ |
| `internal_user_updated` | Alerts when an internal user's details are changed | ❌ |
| `internal_user_deleted` | Notifications when an internal user account is removed | ❌ |
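These management alerts are off by default. To opt in, add them to `alert_types` in your config, or extend the defaults in code; a minimal sketch using the types this PR introduces:

```python
from litellm.integrations.SlackAlerting.types import DEFAULT_ALERT_TYPES, AlertType

# all default alerts, plus the key / internal user events that are off by default
alert_types = DEFAULT_ALERT_TYPES + [
    AlertType.new_virtual_key_created,
    AlertType.new_internal_user_created,
]
```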

View file

@ -0,0 +1,13 @@
# Slack Alerting on LiteLLM Gateway
This folder contains the Slack Alerting integration for LiteLLM Gateway.
## Folder Structure
- `slack_alerting.py`: The main file; it handles sending the different types of alerts (a usage sketch follows below).
- `batching_handler.py`: Handles batching and sending httpx POST requests to Slack. Alerts are sent every 10s, or once the number of queued events exceeds X, to ensure litellm performs well under high traffic.
- `types.py`: Contains the `AlertType` enum, which defines the different types of alerts that can be sent to Slack.
- `utils.py`: Contains common utils used specifically for slack alerting.
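For orientation, a minimal usage sketch; the constructor arguments mirror `slack_alerting.py` in this diff, and using `DualCache()` as the internal usage cache is an assumption borrowed from the unit tests:

```python
from litellm.caching import DualCache
from litellm.integrations.SlackAlerting.slack_alerting import SlackAlerting
from litellm.integrations.SlackAlerting.types import AlertType

slack_alerting = SlackAlerting(
    alerting_threshold=300,  # seconds before an LLM response counts as slow / hanging
    alerting=["slack"],
    alert_types=[AlertType.llm_exceptions, AlertType.llm_too_slow],
    internal_usage_cache=DualCache(),
)
```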
## Further Reading
- [Doc setting up Alerting on LiteLLM Proxy (Gateway)](https://docs.litellm.ai/docs/proxy/alerting)

View file

@ -41,7 +41,7 @@ from litellm.types.router import LiteLLM_Params
from ..email_templates.templates import *
from .batching_handler import send_to_webhook, squash_payloads
from .types import *
from .utils import process_slack_alerting_variables
from .utils import _add_langfuse_trace_id_to_alert, process_slack_alerting_variables
class SlackAlerting(CustomBatchLogger):
@ -57,7 +57,7 @@ class SlackAlerting(CustomBatchLogger):
float
] = None, # threshold for slow / hanging llm responses (in seconds)
alerting: Optional[List] = [],
alert_types: List[AlertType] = list(get_args(AlertType)),
alert_types: List[AlertType] = DEFAULT_ALERT_TYPES,
alert_to_webhook_url: Optional[
Dict[AlertType, Union[List[str], str]]
] = None, # if user wants to separate alerts to diff channels
@ -87,7 +87,7 @@ class SlackAlerting(CustomBatchLogger):
self,
alerting: Optional[List] = None,
alerting_threshold: Optional[float] = None,
alert_types: Optional[List] = None,
alert_types: Optional[List[AlertType]] = None,
alert_to_webhook_url: Optional[Dict[AlertType, Union[List[str], str]]] = None,
alerting_args: Optional[Dict] = None,
llm_router: Optional[litellm.Router] = None,
@ -127,47 +127,8 @@ class SlackAlerting(CustomBatchLogger):
def _all_possible_alert_types(self):
# used by the UI to show all supported alert types
# Note: This is not the alerts the user has configured, instead it's all possible alert types a user can select
return [
"llm_exceptions",
"llm_too_slow",
"llm_requests_hanging",
"budget_alerts",
"db_exceptions",
]
async def _add_langfuse_trace_id_to_alert(
self,
request_data: Optional[dict] = None,
) -> Optional[str]:
"""
Returns langfuse trace url
- check:
-> existing_trace_id
-> trace_id
-> litellm_call_id
"""
# do nothing for now
if (
request_data is not None
and request_data.get("litellm_logging_obj", None) is not None
):
trace_id: Optional[str] = None
litellm_logging_obj: Logging = request_data["litellm_logging_obj"]
for _ in range(3):
trace_id = litellm_logging_obj._get_trace_id(service_name="langfuse")
if trace_id is not None:
break
await asyncio.sleep(3) # wait 3s before retrying for trace id
_langfuse_object = litellm_logging_obj._get_callback_object(
service_name="langfuse"
)
if _langfuse_object is not None:
base_url = _langfuse_object.Langfuse.base_url
return f"{base_url}/trace/{trace_id}"
return None
# return a list of all the values in the AlertType enum
return list(AlertType)
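Since `AlertType` is now a `str` enum (see `types.py` later in this diff), its members compare equal to their raw string values, which is what keeps older string-based call sites working; a quick sketch:

```python
from litellm.integrations.SlackAlerting.types import AlertType

assert AlertType.llm_too_slow == "llm_too_slow"  # str-enum equality
assert "llm_too_slow" in list(AlertType)         # membership checks hold too
```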
def _response_taking_too_long_callback_helper(
self,
@ -275,7 +236,7 @@ class SlackAlerting(CustomBatchLogger):
await self.send_alert(
message=slow_message + request_info,
level="Low",
alert_type="llm_too_slow",
alert_type=AlertType.llm_too_slow,
alerting_metadata=alerting_metadata,
)
@ -467,7 +428,7 @@ class SlackAlerting(CustomBatchLogger):
await self.send_alert(
message=message,
level="Low",
alert_type="daily_reports",
alert_type=AlertType.daily_reports,
alerting_metadata={},
)
@ -563,7 +524,7 @@ class SlackAlerting(CustomBatchLogger):
)
if "langfuse" in litellm.success_callback:
langfuse_url = await self._add_langfuse_trace_id_to_alert(
langfuse_url = await _add_langfuse_trace_id_to_alert(
request_data=request_data,
)
@ -580,7 +541,7 @@ class SlackAlerting(CustomBatchLogger):
await self.send_alert(
message=alerting_message + request_info,
level="Medium",
alert_type="llm_requests_hanging",
alert_type=AlertType.llm_requests_hanging,
alerting_metadata=alerting_metadata,
)
@ -600,7 +561,7 @@ class SlackAlerting(CustomBatchLogger):
await self.send_alert(
message=message,
level="High",
alert_type="failed_tracking_spend",
alert_type=AlertType.failed_tracking_spend,
alerting_metadata={},
)
await _cache.async_set_cache(
@ -701,7 +662,7 @@ class SlackAlerting(CustomBatchLogger):
await self.send_alert(
message=event_message + "\n\n" + user_info_str,
level="High",
alert_type="budget_alerts",
alert_type=AlertType.budget_alerts,
user_info=webhook_event,
alerting_metadata={},
)
@ -902,7 +863,7 @@ class SlackAlerting(CustomBatchLogger):
await self.send_alert(
message=msg,
level="Medium",
alert_type="outage_alerts",
alert_type=AlertType.outage_alerts,
alerting_metadata={},
)
# set to true
@ -928,7 +889,7 @@ class SlackAlerting(CustomBatchLogger):
await self.send_alert(
message=msg,
level="High",
alert_type="outage_alerts",
alert_type=AlertType.outage_alerts,
alerting_metadata={},
)
# set to true
@ -1031,7 +992,7 @@ class SlackAlerting(CustomBatchLogger):
await self.send_alert(
message=msg,
level="Medium",
alert_type="outage_alerts",
alert_type=AlertType.outage_alerts,
alerting_metadata={},
)
# set to true
@ -1053,7 +1014,7 @@ class SlackAlerting(CustomBatchLogger):
await self.send_alert(
message=msg,
level="High",
alert_type="outage_alerts",
alert_type=AlertType.outage_alerts,
alerting_metadata={},
)
# set to true
@ -1118,7 +1079,7 @@ Model Info:
alert_val = self.send_alert(
message=message,
level="Low",
alert_type="new_model_added",
alert_type=AlertType.new_model_added,
alerting_metadata={},
)
@ -1354,7 +1315,7 @@ Model Info:
self,
message: str,
level: Literal["Low", "Medium", "High"],
alert_type: Literal[AlertType],
alert_type: AlertType,
alerting_metadata: dict,
user_info: Optional[WebhookEvent] = None,
**kwargs,
@ -1395,7 +1356,6 @@ Model Info:
if "slack" not in self.alerting:
return
if alert_type not in self.alert_types:
return
@ -1654,7 +1614,7 @@ Model Info:
await self.send_alert(
message=_spend_message,
level="Low",
alert_type="spend_reports",
alert_type=AlertType.spend_reports,
alerting_metadata={},
)
except ValueError as ve:
@ -1713,7 +1673,7 @@ Model Info:
await self.send_alert(
message=_spend_message,
level="Low",
alert_type="spend_reports",
alert_type=AlertType.spend_reports,
alerting_metadata={},
)
except Exception as e:
@ -1742,7 +1702,7 @@ Model Info:
await self.send_alert(
message=fallback_message,
level="Low",
alert_type="fallback_reports",
alert_type=AlertType.fallback_reports,
alerting_metadata={},
)
@ -1754,14 +1714,19 @@ Model Info:
async def send_virtual_key_event_slack(
self,
key_event: VirtualKeyEvent,
alert_type: AlertType,
event_name: str,
):
"""
Helper to send fallback statistics from prometheus server -> to slack
Handles sending Virtual Key related alerts
This runs once per day and sends an overview of all the fallback statistics
Example:
- New Virtual Key Created
- Internal User Updated
- Team Created, Updated, Deleted
"""
try:
message = f"`{event_name}`\n"
key_event_dict = key_event.model_dump()
@ -1783,7 +1748,7 @@ Model Info:
await self.send_alert(
message=message,
level="High",
alert_type="fallback_reports",
alert_type=alert_type,
alerting_metadata={},
)

View file

@ -32,7 +32,7 @@ class LiteLLMBase(BaseModel):
Implements default functions, all pydantic objects should have.
"""
def json(self, **kwargs):
def json(self, **kwargs): # type: ignore
try:
return self.model_dump() # noqa
except:
@ -41,14 +41,14 @@ class LiteLLMBase(BaseModel):
class SlackAlertingArgsEnum(Enum):
daily_report_frequency: int = 12 * 60 * 60
report_check_interval: int = 5 * 60
budget_alert_ttl: int = 24 * 60 * 60
outage_alert_ttl: int = 1 * 60
region_outage_alert_ttl: int = 1 * 60
minor_outage_alert_threshold: int = 1 * 5
major_outage_alert_threshold: int = 1 * 10
max_outage_alert_list_size: int = 1 * 10
daily_report_frequency = 12 * 60 * 60
report_check_interval = 5 * 60
budget_alert_ttl = 24 * 60 * 60
outage_alert_ttl = 1 * 60
region_outage_alert_ttl = 1 * 60
minor_outage_alert_threshold = 1 * 5
major_outage_alert_threshold = 1 * 10
max_outage_alert_list_size = 1 * 10
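The `: int` annotations were dropped, likely because mypy flags annotated enum members (the PR includes a "fix type checks for slack alerting" commit); the values are unchanged plain ints (seconds or counts). A quick sanity sketch, assuming the import path this diff suggests:

```python
from litellm.integrations.SlackAlerting.types import SlackAlertingArgsEnum

assert SlackAlertingArgsEnum.daily_report_frequency.value == 12 * 60 * 60  # 12h, in seconds
assert SlackAlertingArgsEnum.minor_outage_alert_threshold.value == 5       # error count
```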
class SlackAlertingArgs(LiteLLMBase):
@ -56,7 +56,7 @@ class SlackAlertingArgs(LiteLLMBase):
default=int(
os.getenv(
"SLACK_DAILY_REPORT_FREQUENCY",
SlackAlertingArgsEnum.daily_report_frequency.value,
int(SlackAlertingArgsEnum.daily_report_frequency.value),
)
),
description="Frequency of receiving deployment latency/failure reports. Default is 12hours. Value is in seconds.",
@ -119,3 +119,75 @@ class SlackAlertingCacheKeys(Enum):
failed_requests_key = "failed_requests_daily_metrics"
latency_key = "latency_daily_metrics"
report_sent_key = "daily_metrics_report_sent"
class AlertType(str, Enum):
"""
Enum for alert types and management event types
"""
# LLM-related alerts
llm_exceptions = "llm_exceptions"
llm_too_slow = "llm_too_slow"
llm_requests_hanging = "llm_requests_hanging"
# Budget and spend alerts
budget_alerts = "budget_alerts"
spend_reports = "spend_reports"
failed_tracking_spend = "failed_tracking_spend"
# Database alerts
db_exceptions = "db_exceptions"
# Report alerts
daily_reports = "daily_reports"
# Deployment alerts
cooldown_deployment = "cooldown_deployment"
new_model_added = "new_model_added"
# Outage alerts
outage_alerts = "outage_alerts"
region_outage_alerts = "region_outage_alerts"
# Fallback alerts
fallback_reports = "fallback_reports"
# Virtual Key Events
new_virtual_key_created = "new_virtual_key_created"
virtual_key_updated = "virtual_key_updated"
virtual_key_deleted = "virtual_key_deleted"
# Team Events
new_team_created = "new_team_created"
team_updated = "team_updated"
team_deleted = "team_deleted"
# Internal User Events
new_internal_user_created = "new_internal_user_created"
internal_user_updated = "internal_user_updated"
internal_user_deleted = "internal_user_deleted"
DEFAULT_ALERT_TYPES: List[AlertType] = [
# LLM related alerts
AlertType.llm_exceptions,
AlertType.llm_too_slow,
AlertType.llm_requests_hanging,
# Budget and spend alerts
AlertType.budget_alerts,
AlertType.spend_reports,
AlertType.failed_tracking_spend,
# Database alerts
AlertType.db_exceptions,
# Report alerts
AlertType.daily_reports,
# Deployment alerts
AlertType.cooldown_deployment,
AlertType.new_model_added,
# Outage alerts
AlertType.outage_alerts,
AlertType.region_outage_alerts,
# Fallback alerts
AlertType.fallback_reports,
]
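Note what is absent: none of the Virtual Key, Team, or Internal User event types appear in `DEFAULT_ALERT_TYPES`, which is exactly what makes them opt-in; a quick check:

```python
from litellm.integrations.SlackAlerting.types import DEFAULT_ALERT_TYPES, AlertType

assert AlertType.llm_exceptions in DEFAULT_ALERT_TYPES               # on by default
assert AlertType.new_virtual_key_created not in DEFAULT_ALERT_TYPES  # opt-in only
```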

View file

@ -2,9 +2,11 @@
Utils used for slack alerting
"""
import asyncio
from typing import Dict, List, Optional, Union
import litellm
from litellm.litellm_core_utils.litellm_logging import Logging
from litellm.proxy._types import AlertType
from litellm.secret_managers.main import get_secret
@ -49,3 +51,37 @@ def process_slack_alerting_variables(
alert_to_webhook_url[alert_type] = _webhook_value_str
return alert_to_webhook_url
async def _add_langfuse_trace_id_to_alert(
request_data: Optional[dict] = None,
) -> Optional[str]:
"""
Returns langfuse trace url
- check:
-> existing_trace_id
-> trace_id
-> litellm_call_id
"""
# do nothing for now
if (
request_data is not None
and request_data.get("litellm_logging_obj", None) is not None
):
trace_id: Optional[str] = None
litellm_logging_obj: Logging = request_data["litellm_logging_obj"]
for _ in range(3):
trace_id = litellm_logging_obj._get_trace_id(service_name="langfuse")
if trace_id is not None:
break
await asyncio.sleep(3) # wait 3s before retrying for trace id
_langfuse_object = litellm_logging_obj._get_callback_object(
service_name="langfuse"
)
if _langfuse_object is not None:
base_url = _langfuse_object.Langfuse.base_url
return f"{base_url}/trace/{trace_id}"
return None

View file

@ -10,6 +10,7 @@ from typing import TYPE_CHECKING, Any, Dict, List, Literal, Optional, Union
from pydantic import BaseModel, ConfigDict, Extra, Field, Json, model_validator
from typing_extensions import Annotated, TypedDict
from litellm.integrations.SlackAlerting.types import AlertType
from litellm.types.router import RouterErrors, UpdateRouterConfig
from litellm.types.utils import ProviderField
@ -110,23 +111,6 @@ class LitellmTableNames(enum.Enum):
PROXY_MODEL_TABLE_NAME = "LiteLLM_ModelTable"
AlertType = Literal[
"llm_exceptions",
"llm_too_slow",
"llm_requests_hanging",
"budget_alerts",
"db_exceptions",
"daily_reports",
"spend_reports",
"cooldown_deployment",
"new_model_added",
"outage_alerts",
"region_outage_alerts",
"fallback_reports",
"failed_tracking_spend",
]
def hash_token(token: str):
import hashlib

View file

@ -11,6 +11,7 @@ from fastapi import APIRouter, Depends, Header, HTTPException, Request, Response
import litellm
from litellm._logging import verbose_proxy_logger
from litellm.proxy._types import (
AlertType,
CallInfo,
ProxyErrorTypes,
ProxyException,
@ -159,13 +160,6 @@ async def health_services_endpoint(
for (
alert_type
) in proxy_logging_obj.slack_alerting_instance.alert_to_webhook_url:
"""
"llm_exceptions",
"llm_too_slow",
"llm_requests_hanging",
"budget_alerts",
"db_exceptions",
"""
# only test alert if it's in active alert types
if (
proxy_logging_obj.slack_alerting_instance.alert_types
@ -176,19 +170,19 @@ async def health_services_endpoint(
continue
test_message = "default test message"
if alert_type == "llm_exceptions":
if alert_type == AlertType.llm_exceptions:
test_message = f"LLM Exception test alert"
elif alert_type == "llm_too_slow":
elif alert_type == AlertType.llm_too_slow:
test_message = f"LLM Too Slow test alert"
elif alert_type == "llm_requests_hanging":
elif alert_type == AlertType.llm_requests_hanging:
test_message = f"LLM Requests Hanging test alert"
elif alert_type == "budget_alerts":
elif alert_type == AlertType.budget_alerts:
test_message = f"Budget Alert test alert"
elif alert_type == "db_exceptions":
elif alert_type == AlertType.db_exceptions:
test_message = f"DB Exception test alert"
elif alert_type == "outage_alerts":
elif alert_type == AlertType.outage_alerts:
test_message = f"Outage Alert Exception test alert"
elif alert_type == "daily_reports":
elif alert_type == AlertType.daily_reports:
test_message = f"Daily Reports test alert"
else:
test_message = f"Budget Alert test alert"
@ -200,7 +194,7 @@ async def health_services_endpoint(
await proxy_logging_obj.alerting_handler(
message="This is a test slack alert message",
level="Low",
alert_type="budget_alerts",
alert_type=AlertType.budget_alerts,
)
if prisma_client is not None:
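A hedged sketch of exercising this endpoint from Python instead of curl (base URL and key are illustrative, mirroring the curl example earlier in the docs):

```python
import httpx

# illustrative: triggers the test alerts handled by health_services_endpoint
resp = httpx.get(
    "http://0.0.0.0:4000/health/services",
    params={"service": "slack"},
    headers={"Authorization": "Bearer sk-1234"},
)
print(resp.status_code, resp.json())
```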

View file

@ -227,25 +227,27 @@ async def send_management_endpoint_alert(
- An internal user is created, updated, or deleted
- A team is created, updated, or deleted
"""
from litellm.integrations.SlackAlerting.types import AlertType
from litellm.proxy.proxy_server import premium_user, proxy_logging_obj
if premium_user is not True:
return
management_function_to_event_name = {
"generate_key_fn": "New Virtual Key Created",
"update_key_fn": "Virtual Key Updated",
"delete_key_fn": "Virtual Key Deleted",
"generate_key_fn": AlertType.new_virtual_key_created,
"update_key_fn": AlertType.virtual_key_updated,
"delete_key_fn": AlertType.virtual_key_deleted,
# Team events
"new_team": "New Team Created",
"update_team": "Team Updated",
"delete_team": "Team Deleted",
"new_team": AlertType.new_team_created,
"update_team": AlertType.team_updated,
"delete_team": AlertType.team_deleted,
# Internal User events
"new_user": "New Internal User Created",
"user_update": "Internal User Updated",
"delete_user": "Internal User Deleted",
"new_user": AlertType.new_internal_user_created,
"user_update": AlertType.internal_user_updated,
"delete_user": AlertType.internal_user_deleted,
}
# Check if alerting is enabled
if (
proxy_logging_obj is not None
and proxy_logging_obj.slack_alerting_instance is not None
@ -253,6 +255,8 @@ async def send_management_endpoint_alert(
# Virtual Key Events
if function_name in management_function_to_event_name:
_event_name: AlertType = management_function_to_event_name[function_name]
key_event = VirtualKeyEvent(
created_by_user_id=user_api_key_dict.user_id or "Unknown",
created_by_user_role=user_api_key_dict.user_role or "Unknown",
@ -260,9 +264,12 @@ async def send_management_endpoint_alert(
request_kwargs=request_kwargs,
)
event_name = management_function_to_event_name[function_name]
# replace all "_" with " " and capitalize
event_name = _event_name.replace("_", " ").title()
await proxy_logging_obj.slack_alerting_instance.send_virtual_key_event_slack(
key_event=key_event, event_name=event_name
key_event=key_event,
event_name=event_name,
alert_type=_event_name,
)
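Since the event name shown in Slack is now derived from the `AlertType` value rather than a hard-coded string, the transform is just the commented replace-and-title step above; a quick sketch:

```python
from litellm.integrations.SlackAlerting.types import AlertType

# "new_virtual_key_created" -> "New Virtual Key Created"
event_name = AlertType.new_virtual_key_created.replace("_", " ").title()
assert event_name == "New Virtual Key Created"
```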

View file

@ -1,42 +1,11 @@
model_list:
  - model_name: db-openai-endpoint
    litellm_params:
      model: openai/gpt-5
      api_key: fake-key
      api_base: https://exampleopenaiendpoint-production.up.railway.app/
general_settings:
  service_account_settings:
    enforced_params: ["user"]
  alerting: ["slack"]

litellm_settings:
  drop_params: True
  callbacks: ["otel"]
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]

View file

@ -49,6 +49,8 @@ from litellm.exceptions import RejectedRequestError
from litellm.integrations.custom_guardrail import CustomGuardrail
from litellm.integrations.custom_logger import CustomLogger
from litellm.integrations.SlackAlerting.slack_alerting import SlackAlerting
from litellm.integrations.SlackAlerting.types import DEFAULT_ALERT_TYPES
from litellm.integrations.SlackAlerting.utils import _add_langfuse_trace_id_to_alert
from litellm.litellm_core_utils.core_helpers import (
_get_parent_otel_span_from_kwargs,
get_litellm_metadata_from_kwargs,
@ -333,12 +335,11 @@ class ProxyLogging:
self.cache_control_check = _PROXY_CacheControlCheck()
self.alerting: Optional[List] = None
self.alerting_threshold: float = 300 # default to 5 min. threshold
self.alert_types: List[AlertType] = list(get_args(AlertType))
self.alert_types: List[AlertType] = DEFAULT_ALERT_TYPES
self.alert_to_webhook_url: Optional[dict] = None
self.slack_alerting_instance: SlackAlerting = SlackAlerting(
alerting_threshold=self.alerting_threshold,
alerting=self.alerting,
alert_types=self.alert_types,
internal_usage_cache=self.internal_usage_cache.dual_cache,
)
self.premium_user = premium_user
@ -644,9 +645,11 @@ class ProxyLogging:
async def failed_tracking_alert(self, error_message: str):
if self.alerting is None:
return
await self.slack_alerting_instance.failed_tracking_alert(
error_message=error_message
)
if self.slack_alerting_instance:
await self.slack_alerting_instance.failed_tracking_alert(
error_message=error_message
)
async def budget_alerts(
self,
@ -705,10 +708,7 @@ class ProxyLogging:
extra_kwargs = {}
alerting_metadata = {}
if request_data is not None:
_url = await self.slack_alerting_instance._add_langfuse_trace_id_to_alert(
request_data=request_data
)
_url = await _add_langfuse_trace_id_to_alert(request_data=request_data)
if _url is not None:
extra_kwargs["🪢 Langfuse Trace"] = _url
@ -744,7 +744,7 @@ class ProxyLogging:
Currently only logs exceptions to sentry
"""
### ALERTING ###
if "db_exceptions" not in self.alert_types:
if AlertType.db_exceptions not in self.alert_types:
return
if isinstance(original_exception, HTTPException):
if isinstance(original_exception.detail, str):
@ -761,7 +761,7 @@ class ProxyLogging:
self.alerting_handler(
message=f"DB read/write call failed: {error_message}",
level="High",
alert_type="db_exceptions",
alert_type=AlertType.db_exceptions,
request_data={},
)
)
@ -796,7 +796,7 @@ class ProxyLogging:
await self.update_request_status(
litellm_call_id=request_data.get("litellm_call_id", ""), status="fail"
)
if "llm_exceptions" in self.alert_types and not isinstance(
if AlertType.llm_exceptions in self.alert_types and not isinstance(
original_exception, HTTPException
):
"""
@ -813,7 +813,7 @@ class ProxyLogging:
self.alerting_handler(
message=f"LLM API call failed: `{exception_str}`",
level="High",
alert_type="llm_exceptions",
alert_type=AlertType.llm_exceptions,
request_data=request_data,
)
)

View file

@ -14,6 +14,8 @@ from typing import Optional
import httpx
from litellm.integrations.SlackAlerting.types import AlertType
# import logging
# logging.basicConfig(level=logging.DEBUG)
sys.path.insert(0, os.path.abspath("../.."))
@ -99,7 +101,7 @@ async def test_get_api_base():
await _pl.alerting_handler(
message=slow_message + request_info,
level="Low",
alert_type="llm_too_slow",
alert_type=AlertType.llm_too_slow,
)
print("passed test_get_api_base")
@ -117,7 +119,7 @@ def test_init():
slack_alerting = SlackAlerting(
alerting_threshold=32,
alerting=["slack"],
alert_types=["llm_exceptions"],
alert_types=[AlertType.llm_exceptions],
internal_usage_cache=DualCache(),
)
assert slack_alerting.alerting_threshold == 32
@ -710,7 +712,7 @@ async def test_region_outage_alerting_called(
If multiple calls fail, outage alert is sent
"""
slack_alerting = SlackAlerting(
alerting=["webhook"], alert_types=["region_outage_alerts"]
alerting=["webhook"], alert_types=[AlertType.region_outage_alerts]
)
litellm.callbacks = [slack_alerting]
@ -829,6 +831,7 @@ async def test_langfuse_trace_id():
- Unit test for `_add_langfuse_trace_id_to_alert` function in slack_alerting.py
"""
from litellm.litellm_core_utils.litellm_logging import Logging
from litellm.integrations.SlackAlerting.utils import _add_langfuse_trace_id_to_alert
litellm.success_callback = ["langfuse"]
@ -856,11 +859,11 @@ async def test_langfuse_trace_id():
slack_alerting = SlackAlerting(
alerting_threshold=32,
alerting=["slack"],
alert_types=["llm_exceptions"],
alert_types=[AlertType.llm_exceptions],
internal_usage_cache=DualCache(),
)
trace_url = await slack_alerting._add_langfuse_trace_id_to_alert(
trace_url = await _add_langfuse_trace_id_to_alert(
request_data={"litellm_logging_obj": litellm_logging_obj}
)

View file

@ -8,7 +8,7 @@ load_dotenv()
import io
import os
from tests.local_testing.test_streaming import streaming_format_tests
from test_streaming import streaming_format_tests
sys.path.insert(
0, os.path.abspath("../..")
@ -933,7 +933,7 @@ async def test_gemini_pro_function_calling_httpx(model, sync_mode):
pytest.fail("An unexpected exception occurred - {}".format(str(e)))
from tests.local_testing.test_completion import response_format_tests
from test_completion import response_format_tests
@pytest.mark.parametrize(

View file

@ -9,7 +9,7 @@ load_dotenv()
import io
import os
from tests.local_testing.test_streaming import streaming_format_tests
from test_streaming import streaming_format_tests
sys.path.insert(
0, os.path.abspath("../..")

View file

@ -829,7 +829,7 @@ def test_vertex_ai_embedding_completion_cost(caplog):
# """
# Relevant issue - https://github.com/BerriAI/litellm/issues/4630
# """
# from tests.local_testing.test_amazing_vertex_completion import load_vertex_ai_credentials
# from test_amazing_vertex_completion import load_vertex_ai_credentials
# load_vertex_ai_credentials()
# os.environ["LITELLM_LOCAL_MODEL_COST_MAP"] = "True"

View file

@ -235,7 +235,7 @@ async def test_team_disable_guardrails(mock_acompletion, client_no_auth):
assert e.code == str(403)
from tests.local_testing.test_custom_callback_input import CompletionCustomHandler
from test_custom_callback_input import CompletionCustomHandler
@mock_patch_acompletion()
@ -815,7 +815,7 @@ from litellm.proxy._types import (
)
from litellm.proxy.management_endpoints.internal_user_endpoints import new_user
from litellm.proxy.management_endpoints.team_endpoints import team_member_add
from tests.local_testing.test_key_generate_prisma import prisma_client
from test_key_generate_prisma import prisma_client
@pytest.mark.parametrize(

View file

@ -9,7 +9,7 @@ load_dotenv()
import io
import os
from tests.local_testing.test_streaming import streaming_format_tests
from test_streaming import streaming_format_tests
sys.path.insert(
0, os.path.abspath("../..")

View file

@ -16,7 +16,7 @@ import pytest
from openai import OpenAI
import litellm
from tests.local_testing import stream_chunk_testdata
import stream_chunk_testdata
from litellm import completion, stream_chunk_builder
dotenv.load_dotenv()

View file

@ -1145,7 +1145,7 @@ def test_completion_claude_stream_bad_key():
@pytest.mark.parametrize("provider", ["vertex_ai_beta"]) # ""
def test_vertex_ai_stream(provider):
from tests.local_testing.test_amazing_vertex_completion import (
from test_amazing_vertex_completion import (
load_vertex_ai_credentials,
)
@ -3951,7 +3951,7 @@ def test_unit_test_perplexity_citations_chunk():
@pytest.mark.flaky(retries=3, delay=1)
def test_streaming_tool_calls_valid_json_str(model):
if "vertex_ai" in model:
from tests.local_testing.test_amazing_vertex_completion import (
from test_amazing_vertex_completion import (
load_vertex_ai_credentials,
)

View file

@ -4114,7 +4114,7 @@ async def test_async_text_completion_chat_model_stream():
async def test_completion_codestral_fim_api(model):
try:
if model == "vertex_ai/codestral@2405":
from tests.local_testing.test_amazing_vertex_completion import (
from test_amazing_vertex_completion import (
load_vertex_ai_credentials,
)
@ -4158,7 +4158,7 @@ async def test_completion_codestral_fim_api(model):
async def test_completion_codestral_fim_api_stream(model):
try:
if model == "vertex_ai/codestral@2405":
from tests.local_testing.test_amazing_vertex_completion import (
from test_amazing_vertex_completion import (
load_vertex_ai_credentials,
)

View file

@ -21,8 +21,8 @@ from litellm import (
get_modified_max_tokens,
token_counter,
)
from tests.local_testing.large_text import text
from tests.local_testing.messages_with_counts import (
from large_text import text
from messages_with_counts import (
MESSAGES_TEXT,
MESSAGES_WITH_IMAGES,
MESSAGES_WITH_TOOLS,