
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import Image from '@theme/IdealImage';

# 📈 Prometheus metrics

:::info

Prometheus metrics are available on LiteLLM Enterprise, starting at $250/mo.

Enterprise Pricing

Contact us here to get a free trial

:::

LiteLLM exposes a `/metrics` endpoint for Prometheus to poll.

## Quick Start

If you're using the LiteLLM CLI with `litellm --config proxy_config.yaml`, then you need to `pip install prometheus_client==0.20.0`. This is already pre-installed on the litellm Docker image.

Add this to your proxy `config.yaml`:

```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
litellm_settings:
  callbacks: ["prometheus"]
```

Start the proxy

```shell
litellm --config config.yaml --debug
```

Test Request

```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
    --header 'Content-Type: application/json' \
    --data '{
    "model": "gpt-3.5-turbo",
    "messages": [
        {
        "role": "user",
        "content": "what llm are you"
        }
    ]
}'
```

View metrics on `/metrics`, visit `http://localhost:4000/metrics`

```shell
http://localhost:4000/metrics

# <proxy_base_url>/metrics
```
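If you run your own Prometheus server, you can point it at this endpoint with a standard scrape config. Below is a minimal sketch; the job name and target (`localhost:4000`) are assumptions, so replace them with wherever your LiteLLM proxy is reachable.

```yaml
# prometheus.yml (sketch) -- job name and target are assumptions,
# adjust them to wherever your LiteLLM proxy is reachable
scrape_configs:
  - job_name: "litellm-proxy"
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:4000"]
```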

## Virtual Keys, Teams, Internal Users Metrics

Use this for tracking spend and token usage per user, key, model, team, and end-user.

| Metric Name | Description |
|-------------|-------------|
| `litellm_spend_metric` | Total spend, per `"user", "key", "model", "team", "end-user"` |
| `litellm_total_tokens` | Input + output tokens per `"user", "key", "model", "team", "end-user"` |
| `litellm_input_tokens` | Input tokens per `"user", "key", "model", "team", "end-user"` |
| `litellm_output_tokens` | Output tokens per `"user", "key", "model", "team", "end-user"` |
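As a hedged example of how these metrics can be queried, the sketch below defines a Prometheus recording rule that aggregates total spend per team. The group and record names are made up for illustration, and the `team` label is assumed to match the label your deployment actually emits.

```yaml
# rules.yml (sketch) -- group and record names are illustrative;
# the `team` label is assumed to be present on litellm_spend_metric
groups:
  - name: litellm-spend
    rules:
      - record: litellm:spend_by_team:total
        expr: sum by (team) (litellm_spend_metric)
```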

## Proxy Level Tracking Metrics

Use this to track overall LiteLLM Proxy usage:

- Track actual traffic rate to the proxy
- Number of client-side requests and failures for requests made to the proxy
| Metric Name | Description |
|-------------|-------------|
| `litellm_proxy_failed_requests_metric` | Total number of failed responses from proxy - the client did not get a success response from litellm proxy. Labels: `"end_user", "hashed_api_key", "api_key_alias", "requested_model", "team", "team_alias", "user", "exception_status", "exception_class"` |
| `litellm_proxy_total_requests_metric` | Total number of requests made to the proxy server - tracks the number of client-side requests. Labels: `"end_user", "hashed_api_key", "api_key_alias", "requested_model", "team", "team_alias", "user", "exception_status", "exception_class"` |
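If you alert on these counters, a Prometheus alerting rule on the failure metric could be sketched as below; the alert name, 5m rate window, and threshold are assumptions, not recommendations.

```yaml
# alerts.yml (sketch) -- alert name, window, and threshold are assumptions
groups:
  - name: litellm-proxy-alerts
    rules:
      - alert: LiteLLMProxyFailedRequests
        expr: sum(rate(litellm_proxy_failed_requests_metric[5m])) > 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Clients are receiving failed responses from the LiteLLM proxy"
```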

## LLM API / Provider Metrics

Use this for LLM API error monitoring and for tracking remaining rate limits and token limits.

### Labels Tracked for LLM API Metrics

| Label | Description |
|-------|-------------|
| `litellm_model_name` | The name of the LLM model used by LiteLLM |
| `requested_model` | The model sent in the request |
| `model_id` | The `model_id` of the deployment. Autogenerated by LiteLLM; each deployment has a unique `model_id` |
| `api_base` | The API base of the deployment |
| `api_provider` | The LLM API provider. Example: `azure`, `openai`, `vertex_ai` |
| `hashed_api_key` | The hashed API key of the request |
| `api_key_alias` | The alias of the API key used |
| `team` | The team of the request |
| `team_alias` | The alias of the team used |
| `exception_status` | The status of the exception, if any |
| `exception_class` | The class of the exception, if any |

### Success and Failure Metrics for LLM API

| Metric Name | Description |
|-------------|-------------|
| `litellm_deployment_success_responses` | Total number of successful LLM API calls for a deployment. Labels: `"requested_model", "litellm_model_name", "model_id", "api_base", "api_provider", "hashed_api_key", "api_key_alias", "team", "team_alias"` |
| `litellm_deployment_failure_responses` | Total number of failed LLM API calls for a specific LLM deployment. Labels: `"requested_model", "litellm_model_name", "model_id", "api_base", "api_provider", "hashed_api_key", "api_key_alias", "team", "team_alias", "exception_status", "exception_class"` |
| `litellm_deployment_total_requests` | Total number of LLM API calls for a deployment - success + failure. Labels: `"requested_model", "litellm_model_name", "model_id", "api_base", "api_provider", "hashed_api_key", "api_key_alias", "team", "team_alias"` |

### Remaining Requests and Tokens Metrics

| Metric Name | Description |
|-------------|-------------|
| `litellm_remaining_requests_metric` | Track `x-ratelimit-remaining-requests` returned from the LLM API deployment. Labels: `"model_group", "api_provider", "api_base", "litellm_model_name", "hashed_api_key", "api_key_alias"` |
| `litellm_remaining_tokens` | Track `x-ratelimit-remaining-tokens` returned from the LLM API deployment. Labels: `"model_group", "api_provider", "api_base", "litellm_model_name", "hashed_api_key", "api_key_alias"` |

### Deployment State Metrics

| Metric Name | Description |
|-------------|-------------|
| `litellm_deployment_state` | The state of the deployment: 0 = healthy, 1 = partial outage, 2 = complete outage. Labels: `"litellm_model_name", "model_id", "api_base", "api_provider"` |
| `litellm_deployment_latency_per_output_token` | Latency per output token for a deployment. Labels: `"litellm_model_name", "model_id", "api_base", "api_provider", "hashed_api_key", "api_key_alias", "team", "team_alias"` |

## Fallback (Failover) Metrics

| Metric Name | Description |
|-------------|-------------|
| `litellm_deployment_cooled_down` | Number of times a deployment has been cooled down by LiteLLM load balancing logic. Labels: `"litellm_model_name", "model_id", "api_base", "api_provider", "exception_status"` |
| `litellm_deployment_successful_fallbacks` | Number of successful fallback requests from primary model -> fallback model. Labels: `"requested_model", "fallback_model", "hashed_api_key", "api_key_alias", "team", "team_alias", "exception_status", "exception_class"` |
| `litellm_deployment_failed_fallbacks` | Number of failed fallback requests from primary model -> fallback model. Labels: `"requested_model", "fallback_model", "hashed_api_key", "api_key_alias", "team", "team_alias", "exception_status", "exception_class"` |

## Request Latency Metrics

| Metric Name | Description |
|-------------|-------------|
| `litellm_request_total_latency_metric` | Total latency (seconds) for a request to the LiteLLM Proxy Server - tracked for labels `model`, `hashed_api_key`, `api_key_alias`, `team`, `team_alias` |
| `litellm_llm_api_latency_metric` | Latency (seconds) for just the LLM API call - tracked for labels `model`, `hashed_api_key`, `api_key_alias`, `team`, `team_alias` |
| `litellm_llm_api_time_to_first_token_metric` | Time to first token for the LLM API call - tracked for labels `model`, `hashed_api_key`, `api_key_alias`, `team`, `team_alias` [Note: only emitted for streaming requests] |
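Assuming these latency metrics are exported as Prometheus histograms (i.e. `*_bucket` series exist on `/metrics`), a p95 latency query could be sketched as a recording rule like the one below; the record name and 5m window are assumptions.

```yaml
# rules.yml (sketch) -- assumes litellm_request_total_latency_metric is a Prometheus
# histogram exposing *_bucket series; record name and window are illustrative
groups:
  - name: litellm-latency
    rules:
      - record: litellm:request_latency_seconds:p95
        expr: >
          histogram_quantile(
            0.95,
            sum by (le, model) (rate(litellm_request_total_latency_metric_bucket[5m]))
          )
```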

## Virtual Key - Budget, Rate Limit Metrics

Metrics used to track LiteLLM Proxy budgeting and rate limiting logic.

| Metric Name | Description |
|-------------|-------------|
| `litellm_remaining_team_budget_metric` | Remaining budget for a team (a team created on LiteLLM). Labels: `"team_id", "team_alias"` |
| `litellm_remaining_api_key_budget_metric` | Remaining budget for an API key (a key created on LiteLLM). Labels: `"hashed_api_key", "api_key_alias"` |
| `litellm_remaining_api_key_requests_for_model` | Remaining requests for a LiteLLM virtual API key, only if a model-specific rate limit (rpm) has been set for that virtual key. Labels: `"hashed_api_key", "api_key_alias", "model"` |
| `litellm_remaining_api_key_tokens_for_model` | Remaining tokens for a LiteLLM virtual API key, only if a model-specific token limit (tpm) has been set for that virtual key. Labels: `"hashed_api_key", "api_key_alias", "model"` |
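As a sketch only, you could alert when a virtual key's remaining budget drops below a threshold; the $10 threshold, alert name, and severity below are assumptions.

```yaml
# alerts.yml (sketch) -- threshold, alert name, and severity are assumptions
groups:
  - name: litellm-budget-alerts
    rules:
      - alert: LiteLLMKeyBudgetLow
        expr: litellm_remaining_api_key_budget_metric < 10
        for: 5m
        labels:
          severity: info
        annotations:
          summary: "A LiteLLM virtual key is close to exhausting its budget"
```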

## Monitor System Health

To monitor the health of litellm-adjacent services (Redis / Postgres), add this to your proxy `config.yaml`:

```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
litellm_settings:
  service_callback: ["prometheus_system"]
```
| Metric Name | Description |
|-------------|-------------|
| `litellm_redis_latency` | Histogram latency for Redis calls |
| `litellm_redis_fails` | Number of failed Redis calls |
| `litellm_self_latency` | Histogram latency for successful litellm API calls |
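If you want to be notified when litellm cannot reach Redis, a hedged alerting-rule sketch is below; it assumes `litellm_redis_fails` is a counter, and the alert name and 5m window are illustrative.

```yaml
# alerts.yml (sketch) -- assumes litellm_redis_fails is a counter;
# alert name and window are assumptions
groups:
  - name: litellm-system-health
    rules:
      - alert: LiteLLMRedisFailures
        expr: increase(litellm_redis_fails[5m]) > 0
        labels:
          severity: warning
        annotations:
          summary: "LiteLLM is recording failed Redis calls"
```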

## 🔥 LiteLLM Maintained Grafana Dashboards

[Link to Grafana Dashboards maintained by LiteLLM](https://github.com/BerriAI/litellm/tree/main/cookbook/litellm_proxy_server/grafana_dashboard)

Here are screenshots of the metrics you can monitor with the LiteLLM Grafana Dashboard:

<Image img={require('../../img/grafana_1.png')} />

<Image img={require('../../img/grafana_2.png')} />

<Image img={require('../../img/grafana_3.png')} />

## Deprecated Metrics

| Metric Name | Description |
|-------------|-------------|
| `litellm_llm_api_failed_requests_metric` | Deprecated, use `litellm_proxy_failed_requests_metric` |
| `litellm_requests_metric` | Deprecated, use `litellm_proxy_total_requests_metric` |