
import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Alerting / Webhooks

Get alerts for:

| Category | Alert Type |
|----------|------------|
| LLM Performance | Hanging API calls, Slow API calls, Failed API calls, Model outage alerting |
| Budget & Spend | Budget tracking per key/user, Soft budget alerts, Weekly & Monthly spend reports per Team/Tag |
| System Health | Failed database read/writes |
| Daily Reports | Top 5 slowest LLM deployments, Top 5 LLM deployments with most failed requests, Weekly & Monthly spend per Team/Tag |

Works across:

## Quick Start

Set up a Slack alert channel to receive alerts from the proxy.

### Step 1: Add a Slack Webhook URL to env

Get a Slack webhook URL from https://api.slack.com/messaging/webhooks

You can also use Discord Webhooks, see here

Set SLACK_WEBHOOK_URL in your proxy env to enable Slack alerts.

```shell
export SLACK_WEBHOOK_URL="https://hooks.slack.com/services/<>/<>/<>"
```

### Step 2: Setup Proxy

```yaml
general_settings: 
    alerting: ["slack"]
    alerting_threshold: 300 # sends alerts if requests hang for 5min+ and responses take 5min+ 
    spend_report_frequency: "1d" # [Optional] set as 1d, 2d, 30d .... Specify how often you want a Spend Report to be sent
    
    # [OPTIONAL ALERTING ARGS]
    alerting_args:
        daily_report_frequency: 43200  # 12 hours in seconds
        report_check_interval: 3600    # 1 hour in seconds
        budget_alert_ttl: 86400        # 24 hours in seconds
        outage_alert_ttl: 60           # 1 minute in seconds
        region_outage_alert_ttl: 60    # 1 minute in seconds
        minor_outage_alert_threshold: 5 
        major_outage_alert_threshold: 10
        max_outage_alert_list_size: 1000
        log_to_console: false
```

Start proxy

```shell
$ litellm --config /path/to/config.yaml
```

### Step 3: Test it!

```shell
curl -X GET 'http://0.0.0.0:4000/health/services?service=slack' \
-H 'Authorization: Bearer sk-1234'
```

## Advanced

### Redacting Messages from Alerts

By default, alerts show the messages/input passed to the LLM. If you want to redact this from Slack alerting, set the following in your config:

```yaml
general_settings:
  alerting: ["slack"]
  alert_types: ["spend_reports"] 

litellm_settings:
  redact_messages_in_exceptions: True
```

### Soft Budget Alerts for Virtual Keys

Use this to send an alert when a key/team is close to its budget running out.

Step 1. Create a virtual key with a soft budget

Set the `soft_budget` to 0.001

```shell
curl -X 'POST' \
  'http://localhost:4000/key/generate' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
  "key_alias": "prod-app1",
  "team_id": "113c1a22-e347-4506-bfb2-b320230ea414",
  "soft_budget": 0.001
}'
```

Step 2. Send a request to the proxy with the virtual key

```shell
curl http://0.0.0.0:4000/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-Nb5eCf427iewOlbxXIH4Ow" \
-d '{
  "model": "openai/gpt-4",
  "messages": [
    {
      "role": "user",
      "content": "this is a test request, write a short poem"
    }
  ]
}'
```

Step 3. Check Slack for the expected alert

<Image img={require('../../img/soft_budget_alert.png')}/>

### Add Metadata to alerts

Add alerting metadata to proxy calls for debugging.

```python
import openai
client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages = [], 
    extra_body={
        "metadata": {
            "alerting_metadata": {
                "hello": "world"
            }
        }
    }
)
```

Expected Response

<Image img={require('../../img/alerting_metadata.png')}/>

### Select specific alert types

Set `alert_types` if you want to opt into only specific alert types. When `alert_types` is not set, all default alert types are enabled.

👉 See all alert types here

```yaml
general_settings:
  alerting: ["slack"]
  alert_types: [
    "llm_exceptions",
    "llm_too_slow",
    "llm_requests_hanging",
    "budget_alerts",
    "spend_reports",
    "db_exceptions",
    "daily_reports",
    "cooldown_deployment",
    "new_model_added",
  ] 
```

### Map Slack channels to alert type

Use this if you want to set specific channels per alert type.

This allows you to do the following:

- `llm_exceptions` -> go to Slack channel #llm-exceptions
- `spend_reports` -> go to Slack channel #llm-spend-reports

Set `alert_to_webhook_url` on your config.yaml

```yaml
model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/fake
      api_key: fake-key
      api_base: https://exampleopenaiendpoint-production.up.railway.app/

general_settings: 
  master_key: sk-1234
  alerting: ["slack"]
  alerting_threshold: 0.0001 # (Seconds) set an artificially low threshold for testing alerting
  alert_to_webhook_url: {
    "llm_exceptions": "https://hooks.slack.com/services/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",
    "llm_too_slow": "https://hooks.slack.com/services/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",
    "llm_requests_hanging": "https://hooks.slack.com/services/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",
    "budget_alerts": "https://hooks.slack.com/services/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",
    "db_exceptions": "https://hooks.slack.com/services/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",
    "daily_reports": "https://hooks.slack.com/services/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",
    "spend_reports": "https://hooks.slack.com/services/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",
    "cooldown_deployment": "https://hooks.slack.com/services/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",
    "new_model_added": "https://hooks.slack.com/services/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",
    "outage_alerts": "https://hooks.slack.com/services/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",
  }

litellm_settings:
  success_callback: ["langfuse"]
```

### Provide multiple Slack channels for a given alert type

```yaml
model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/fake
      api_key: fake-key
      api_base: https://exampleopenaiendpoint-production.up.railway.app/

general_settings: 
  master_key: sk-1234
  alerting: ["slack"]
  alerting_threshold: 0.0001 # (Seconds) set an artificially low threshold for testing alerting
  alert_to_webhook_url: {
    "llm_exceptions": ["os.environ/SLACK_WEBHOOK_URL", "os.environ/SLACK_WEBHOOK_URL_2"],
    "llm_too_slow": ["https://webhook.site/7843a980-a494-4967-80fb-d502dbc16886", "https://webhook.site/28cfb179-f4fb-4408-8129-729ff55cf213"],
    "llm_requests_hanging": ["os.environ/SLACK_WEBHOOK_URL_5", "os.environ/SLACK_WEBHOOK_URL_6"],
    "budget_alerts": ["os.environ/SLACK_WEBHOOK_URL_7", "os.environ/SLACK_WEBHOOK_URL_8"],
    "db_exceptions": ["os.environ/SLACK_WEBHOOK_URL_9", "os.environ/SLACK_WEBHOOK_URL_10"],
    "daily_reports": ["os.environ/SLACK_WEBHOOK_URL_11", "os.environ/SLACK_WEBHOOK_URL_12"],
    "spend_reports": ["os.environ/SLACK_WEBHOOK_URL_13", "os.environ/SLACK_WEBHOOK_URL_14"],
    "cooldown_deployment": ["os.environ/SLACK_WEBHOOK_URL_15", "os.environ/SLACK_WEBHOOK_URL_16"],
    "new_model_added": ["os.environ/SLACK_WEBHOOK_URL_17", "os.environ/SLACK_WEBHOOK_URL_18"],
    "outage_alerts": ["os.environ/SLACK_WEBHOOK_URL_19", "os.environ/SLACK_WEBHOOK_URL_20"],
  }

litellm_settings:
  success_callback: ["langfuse"]
```

Test it - send a valid LLM request - expect to see an `llm_too_slow` alert in its own Slack channel

```shell
curl -i http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "Hello, Claude gm!"}
    ]
}'
```

## MS Teams Webhooks

MS Teams provides a Slack-compatible webhook URL that you can use for alerting.

### Quick Start

1. Get a webhook URL for your Microsoft Teams channel

2. Add it to your .env

```shell
SLACK_WEBHOOK_URL="https://berriai.webhook.office.com/webhookb2/...6901/IncomingWebhook/b55fa0c2a48647be8e6effedcd540266/e04b1092-4a3e-44a2-ab6b-29a0a4854d1d"
```

3. Add it to your litellm config

```yaml
model_list: 
  - model_name: "azure-model"
    litellm_params:
        model: "azure/gpt-35-turbo"
        api_key: "my-bad-key" # 👈 bad key

general_settings: 
    alerting: ["slack"]
    alerting_threshold: 300 # sends alerts if requests hang for 5min+ and responses take 5min+ 
```

4. Run health check!

Call the proxy `/health/services` endpoint to test if your alerting connection is correctly set up.

```shell
curl --location 'http://0.0.0.0:4000/health/services?service=slack' \
--header 'Authorization: Bearer sk-1234'
```

Expected Response

<Image img={require('../../img/ms_teams_alerting.png')}/>

## Discord Webhooks

Discord provides a Slack-compatible webhook URL that you can use for alerting.

### Quick Start

1. Get a webhook URL for your Discord channel

2. Append `/slack` to your Discord webhook - it should look like

```
"https://discord.com/api/webhooks/1240030362193760286/cTLWt5ATn1gKmcy_982rl5xmYHsrM1IWJdmCL1AyOmU9JdQXazrp8L1_PYgUtgxj8x4f/slack"
```

3. Add it to your litellm config

```yaml
model_list: 
  - model_name: "azure-model"
    litellm_params:
        model: "azure/gpt-35-turbo"
        api_key: "my-bad-key" # 👈 bad key

general_settings: 
    alerting: ["slack"]
    alerting_threshold: 300 # sends alerts if requests hang for 5min+ and responses take 5min+ 

environment_variables:
    SLACK_WEBHOOK_URL: "https://discord.com/api/webhooks/1240030362193760286/cTLWt5ATn1gKmcy_982rl5xmYHsrM1IWJdmCL1AyOmU9JdQXazrp8L1_PYgUtgxj8x4f/slack"
```

## [BETA] Webhooks for Budget Alerts

Note: This is a beta feature, so the spec might change.

Set a webhook to get notified for budget alerts.

1. Setup config.yaml

Add a webhook URL to your environment. For testing, you can use a link from here.

```shell
export WEBHOOK_URL="https://webhook.site/6ab090e8-c55f-4a23-b075-3209f5c57906"
```

Add 'webhook' to config.yaml

```yaml
general_settings: 
  alerting: ["webhook"] # 👈 KEY CHANGE
```

2. Start proxy

```shell
litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000
```

3. Test it!

```shell
curl -X GET --location 'http://0.0.0.0:4000/health/services?service=webhook' \
--header 'Authorization: Bearer sk-1234'
```

Expected Response

```json
{
  "spend": 1, # the spend for the 'event_group'
  "max_budget": 0, # the 'max_budget' set for the 'event_group'
  "token": "88dc28d0f030c55ed4ab77ed8faf098196cb1c05df778539800c9f1243fe6b4b",
  "user_id": "default_user_id",
  "team_id": null,
  "user_email": null,
  "key_alias": null,
  "projected_exceeded_date": null,
  "projected_spend": null,
  "event": "budget_crossed", # Literal["budget_crossed", "threshold_crossed", "projected_limit_exceeded"]
  "event_group": "user",
  "event_message": "User Budget: Budget Crossed"
}
```

## API Spec for Webhook Event

- `spend` *float*: The current spend amount for the 'event_group'.
- `max_budget` *float or null*: The maximum allowed budget for the 'event_group'. null if not set.
- `token` *str*: A hashed value of the key, used for authentication or identification purposes.
- `customer_id` *str or null*: The ID of the customer associated with the event (optional).
- `internal_user_id` *str or null*: The ID of the internal user associated with the event (optional).
- `team_id` *str or null*: The ID of the team associated with the event (optional).
- `user_email` *str or null*: The email of the internal user associated with the event (optional).
- `key_alias` *str or null*: An alias for the key associated with the event (optional).
- `projected_exceeded_date` *str or null*: The date when the budget is projected to be exceeded, returned when 'soft_budget' is set for a key (optional).
- `projected_spend` *float or null*: The projected spend amount, returned when 'soft_budget' is set for a key (optional).
- `event` *Literal["spend_tracked", "budget_crossed", "threshold_crossed", "projected_limit_exceeded"]*: The type of event that triggered the webhook. Possible values are:
  - "spend_tracked": Emitted whenever spend is tracked for a customer id.
  - "budget_crossed": Indicates that the spend has exceeded the max budget.
  - "threshold_crossed": Indicates that spend has crossed a threshold (currently sent when 85% and 95% of budget is reached).
  - "projected_limit_exceeded": For "key" only - Indicates that the projected spend is expected to exceed the soft budget threshold.
- `event_group` *Literal["customer", "internal_user", "key", "team", "proxy"]*: The group associated with the event. Possible values are:
  - "customer": The event is related to a specific customer.
  - "internal_user": The event is related to a specific internal user.
  - "key": The event is related to a specific key.
  - "team": The event is related to a team.
  - "proxy": The event is related to a proxy.
- `event_message` *str*: A human-readable description of the event.
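
If you want to consume these events in your own service, here is a minimal sketch of a receiver, assuming FastAPI and Pydantic (neither is required by LiteLLM). The `/litellm-alerts` route and the handling logic are illustrative placeholders - point your `WEBHOOK_URL` at wherever you actually host it.

```python
# Minimal sketch of a budget-alert webhook receiver (assumes FastAPI + Pydantic;
# the route name and handling logic below are placeholders, not part of LiteLLM).
from typing import Literal, Optional

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class LiteLLMWebhookEvent(BaseModel):
    # Field names follow the "API Spec for Webhook Event" section above.
    spend: float
    max_budget: Optional[float] = None
    token: Optional[str] = None  # hashed key (the spec lists it as str; treated leniently here)
    customer_id: Optional[str] = None
    internal_user_id: Optional[str] = None
    team_id: Optional[str] = None
    user_email: Optional[str] = None
    key_alias: Optional[str] = None
    projected_exceeded_date: Optional[str] = None
    projected_spend: Optional[float] = None
    event: Literal[
        "spend_tracked", "budget_crossed", "threshold_crossed", "projected_limit_exceeded"
    ]
    event_group: Literal["customer", "internal_user", "key", "team", "proxy"]
    event_message: str


@app.post("/litellm-alerts")  # hypothetical path - set WEBHOOK_URL to this endpoint
async def handle_litellm_alert(event: LiteLLMWebhookEvent):
    # Route on the event type, e.g. page the owning team when a budget is crossed.
    if event.event == "budget_crossed":
        print(f"{event.event_group} exceeded budget: {event.event_message}")
    elif event.event == "threshold_crossed":
        print(f"{event.event_group} nearing budget: {event.event_message}")
    return {"status": "ok"}
```

Run it with e.g. `uvicorn receiver:app --port 8000` and set your webhook URL to `http://<your-host>:8000/litellm-alerts`.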

## Region-outage alerting (Enterprise feature)

:::info

Get a free 2-week license

:::

Setup alerts if a provider region is having an outage.

```yaml
general_settings:
    alerting: ["slack"]
    alert_types: ["region_outage_alerts"] 
```

By default this will trigger if multiple models in a region fail 5+ requests in 1 minute. '400' status code errors are not counted (i.e. BadRequestErrors).

Control thresholds with:

```yaml
general_settings:
    alerting: ["slack"]
    alert_types: ["region_outage_alerts"] 
    alerting_args:
        region_outage_alert_ttl: 60 # time-window in seconds
        minor_outage_alert_threshold: 5 # number of errors to trigger a minor alert
        major_outage_alert_threshold: 10 # number of errors to trigger a major alert
```

## All Possible Alert Types

👉 Here is how you can set specific alert types

### LLM-related Alerts

| Alert Type | Description | Default On |
|------------|-------------|------------|
| `llm_exceptions` | Alerts for LLM API exceptions | |
| `llm_too_slow` | Notifications for LLM responses slower than the set threshold | |
| `llm_requests_hanging` | Alerts for LLM requests that are not completing | |
| `cooldown_deployment` | Alerts when a deployment is put into cooldown | |
| `new_model_added` | Notifications when a new model is added to litellm proxy through /model/new | |
| `outage_alerts` | Alerts when a specific LLM deployment is facing an outage | |
| `region_outage_alerts` | Alerts when a specific LLM region is facing an outage. Example us-east-1 | |

### Budget and Spend Alerts

| Alert Type | Description | Default On |
|------------|-------------|------------|
| `budget_alerts` | Notifications related to budget limits or thresholds | |
| `spend_reports` | Periodic reports on spending across teams or tags | |
| `failed_tracking_spend` | Alerts when spend tracking fails | |
| `daily_reports` | Daily spend reports | |
| `fallback_reports` | Weekly reports on LLM fallback occurrences | |

### Database Alerts

| Alert Type | Description | Default On |
|------------|-------------|------------|
| `db_exceptions` | Notifications for database-related exceptions | |

### Management Endpoint Alerts - Virtual Key, Team, Internal User

| Alert Type | Description | Default On |
|------------|-------------|------------|
| `new_virtual_key_created` | Notifications when a new virtual key is created | |
| `virtual_key_updated` | Alerts when a virtual key is modified | |
| `virtual_key_deleted` | Notifications when a virtual key is removed | |
| `new_team_created` | Alerts for the creation of a new team | |
| `team_updated` | Notifications when team details are modified | |
| `team_deleted` | Alerts when a team is deleted | |
| `new_internal_user_created` | Notifications for new internal user accounts | |
| `internal_user_updated` | Alerts when an internal user's details are changed | |
| `internal_user_deleted` | Notifications when an internal user account is removed | |

## alerting_args Specification

| Parameter | Default | Description |
|-----------|---------|-------------|
| `daily_report_frequency` | 43200 (12 hours) | Frequency of receiving deployment latency/failure reports, in seconds |
| `report_check_interval` | 3600 (1 hour) | How often to check if a report should be sent (background process), in seconds |
| `budget_alert_ttl` | 86400 (24 hours) | Cache TTL for budget alerts, to prevent spam when a budget is crossed |
| `outage_alert_ttl` | 60 (1 minute) | Time window for collecting model outage errors, in seconds |
| `region_outage_alert_ttl` | 60 (1 minute) | Time window for collecting region-based outage errors, in seconds |
| `minor_outage_alert_threshold` | 5 | Number of errors that trigger a minor outage alert (400 errors not counted) |
| `major_outage_alert_threshold` | 10 | Number of errors that trigger a major outage alert (400 errors not counted) |
| `max_outage_alert_list_size` | 1000 | Maximum number of errors to store in cache, per model/region |
| `log_to_console` | false | If true, prints the alerting payload to the console as a `.warning` log |