(docs) prometheus metrics document all prometheus metrics (#5989)

* fix doc on prometheus

* (docs) clean up prometheus docs

* docs show what metrics are deprecated

* doc clarify labels used for budget metrics

* add litellm_remaining_api_key_requests_for_model
Ishaan Jaff 2024-09-30 16:38:38 -07:00 committed by GitHub
parent ca9c437021
commit 2a7e1e970d


@ -57,20 +57,18 @@ http://localhost:4000/metrics
# <proxy_base_url>/metrics
```
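If you are configuring Prometheus to scrape the proxy yourself, a minimal scrape config might look like the following. This is a sketch: the `localhost:4000` target is an assumption, so point it at wherever your proxy actually runs.

```yaml
# prometheus.yml (sketch) - scrape the LiteLLM proxy's /metrics endpoint
scrape_configs:
  - job_name: "litellm-proxy"
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:4000"]  # assumed proxy host:port
```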
## Virtual Keys, Teams, Internal Users Metrics
Use this for tracking usage and spend per [user, key, team, etc.](virtual_keys)
| Metric Name | Description |
|----------------------|--------------------------------------|
| `litellm_spend_metric` | Total Spend, per `"user", "key", "model", "team", "end-user"` |
| `litellm_total_tokens` | input + output tokens per `"user", "key", "model", "team", "end-user"` |
| `litellm_input_tokens` | input tokens per `"user", "key", "model", "team", "end-user"` |
| `litellm_output_tokens` | output tokens per `"user", "key", "model", "team", "end-user"` |
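As a quick example, a PromQL query along these lines (assuming the label names documented above) ranks teams by cumulative spend:

```promql
# Top 5 teams by total recorded spend
topk(5, sum by (team) (litellm_spend_metric))
```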
## Proxy Level Tracking Metrics
Use this to track overall LiteLLM Proxy usage.

- Track actual traffic rate to proxy
@ -78,56 +76,75 @@ Use this to track overall LiteLLM Proxy usage.
| Metric Name | Description |
|----------------------|--------------------------------------|
| `litellm_proxy_failed_requests_metric` | Total number of failed responses from proxy - the client did not get a success response from litellm proxy. Labels: `"end_user", "hashed_api_key", "api_key_alias", "requested_model", "team", "team_alias", "user", "exception_status", "exception_class"` |
| `litellm_proxy_total_requests_metric` | Total number of requests made to the proxy server - track number of client side requests. Labels: `"end_user", "hashed_api_key", "api_key_alias", "requested_model", "team", "team_alias", "user", "exception_status", "exception_class"` |
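For example, overall traffic rate and the fraction of requests that fail could be watched with queries along these lines (a sketch, using the metric and label names documented above):

```promql
# Requests per second hitting the proxy, averaged over 5 minutes
sum(rate(litellm_proxy_total_requests_metric[5m]))

# Fraction of proxy requests that failed over the same window
sum(rate(litellm_proxy_failed_requests_metric[5m]))
  / sum(rate(litellm_proxy_total_requests_metric[5m]))
```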
## LLM API / Provider Metrics

Use this for LLM API error monitoring and for tracking remaining rate limits and token limits.

### Labels Tracked for LLM API Metrics
| Label | Description |
|-------|-------------|
| litellm_model_name | The name of the LLM model used by LiteLLM |
| requested_model | The model sent in the request |
| model_id | The model_id of the deployment. Autogenerated by LiteLLM, each deployment has a unique model_id |
| api_base | The API Base of the deployment |
| api_provider | The LLM API provider (e.g. azure, openai, vertex_ai) |
| hashed_api_key | The hashed api key of the request |
| api_key_alias | The alias of the api key used |
| team | The team of the request |
| team_alias | The alias of the team used |
| exception_status | The status of the exception, if any |
| exception_class | The class of the exception, if any |
### Success and Failure Metrics for LLM API
| Metric Name | Description |
|----------------------|--------------------------------------|
| `litellm_deployment_success_responses` | Total number of successful LLM API calls for deployment. Labels: `"requested_model", "litellm_model_name", "model_id", "api_base", "api_provider", "hashed_api_key", "api_key_alias", "team", "team_alias"` |
| `litellm_deployment_failure_responses` | Total number of failed LLM API calls for a specific LLM deployment. Labels: `"requested_model", "litellm_model_name", "model_id", "api_base", "api_provider", "hashed_api_key", "api_key_alias", "team", "team_alias", "exception_status", "exception_class"` |
| `litellm_deployment_total_requests` | Total number of LLM API calls for deployment - success + failure. Labels: `"requested_model", "litellm_model_name", "model_id", "api_base", "api_provider", "hashed_api_key", "api_key_alias", "team", "team_alias"` |
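An illustrative per-deployment error-rate query, using the labels listed above:

```promql
# Failure ratio per deployment API base over the last 10 minutes
sum by (api_base) (rate(litellm_deployment_failure_responses[10m]))
  / sum by (api_base) (rate(litellm_deployment_total_requests[10m]))
```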
### Remaining Requests and Tokens Metrics
| Metric Name | Description |
|----------------------|--------------------------------------|
| `litellm_remaining_requests_metric` | Track `x-ratelimit-remaining-requests` returned from LLM API Deployment. Labels: `"model_group", "api_provider", "api_base", "litellm_model_name", "hashed_api_key", "api_key_alias"` |
| `litellm_remaining_tokens` | Track `x-ratelimit-remaining-tokens` returned from LLM API Deployment. Labels: `"model_group", "api_provider", "api_base", "litellm_model_name", "hashed_api_key", "api_key_alias"` |
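For example, to see which model group is closest to its provider rate limit (a sketch, assuming the `model_group` label above):

```promql
# Lowest remaining request quota reported per model group
min by (model_group) (litellm_remaining_requests_metric)
```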
### Deployment State Metrics
| Metric Name | Description |
|----------------------|--------------------------------------|
| `litellm_deployment_state` | The state of the deployment: 0 = healthy, 1 = partial outage, 2 = complete outage. Labels: `"litellm_model_name", "model_id", "api_base", "api_provider"` |
| `litellm_deployment_latency_per_output_token` | Latency per output token for deployment. Labels: `"litellm_model_name", "model_id", "api_base", "api_provider", "hashed_api_key", "api_key_alias", "team", "team_alias"` |
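Since the state is encoded as a number, outages can be surfaced with a simple comparison (illustrative):

```promql
# Deployments currently reporting a complete outage (state == 2)
litellm_deployment_state == 2
```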
### Fallback (Failover) Metrics
| Metric Name | Description |
|----------------------|--------------------------------------|
| `litellm_deployment_cooled_down` | Number of times a deployment has been cooled down by LiteLLM load balancing logic. Labels: `"litellm_model_name", "model_id", "api_base", "api_provider", "exception_status"` |
| `litellm_deployment_successful_fallbacks` | Number of successful fallback requests from primary model -> fallback model. Labels: `"requested_model", "fallback_model", "hashed_api_key", "api_key_alias", "team", "team_alias", "exception_status", "exception_class"` |
| `litellm_deployment_failed_fallbacks` | Number of failed fallback requests from primary model -> fallback model. Labels: `"requested_model", "fallback_model", "hashed_api_key", "api_key_alias", "team", "team_alias", "exception_status", "exception_class"` |
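One way to track how well fallbacks are working, per requested model (a sketch using the labels above):

```promql
# Share of fallback attempts that succeeded, per requested model
sum by (requested_model) (rate(litellm_deployment_successful_fallbacks[15m]))
  / (sum by (requested_model) (rate(litellm_deployment_successful_fallbacks[15m]))
     + sum by (requested_model) (rate(litellm_deployment_failed_fallbacks[15m])))
```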
## Request Latency Metrics
| Metric Name | Description |
|----------------------|--------------------------------------|
| `litellm_request_total_latency_metric` | Total latency (seconds) for a request to LiteLLM Proxy Server - tracked for labels `litellm_call_id`, `model`, `user_api_key`, `user_api_key_alias`, `user_api_team`, `user_api_team_alias` |
| `litellm_llm_api_latency_metric` | Latency (seconds) for just the LLM API call - tracked for labels `litellm_call_id`, `model`, `user_api_key`, `user_api_key_alias`, `user_api_team`, `user_api_team_alias` |
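If these latency metrics are exported as Prometheus histograms (an assumption - check the exposed metric names for a `_bucket` suffix), a p95 per model could be derived like this:

```promql
# p95 end-to-end request latency per model, assuming a histogram metric
histogram_quantile(0.95,
  sum by (le, model) (rate(litellm_request_total_latency_metric_bucket[5m])))
```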
## Virtual Key - Budget, Rate Limit Metrics
Metrics used to track LiteLLM Proxy budgeting and rate limiting logic.
| Metric Name | Description |
|----------------------|--------------------------------------|
| `litellm_remaining_team_budget_metric` | Remaining budget for a team created on LiteLLM. Labels: `"team_id", "team_alias"` |
| `litellm_remaining_api_key_budget_metric` | Remaining budget for an API key created on LiteLLM. Labels: `"hashed_api_key", "api_key_alias"` |
| `litellm_remaining_api_key_requests_for_model` | Remaining Requests for a LiteLLM virtual API key, only if a model-specific rate limit (rpm) has been set for that virtual key. Labels: `"hashed_api_key", "api_key_alias", "model"`|
| `litellm_remaining_api_key_tokens_for_model` | Remaining Tokens for a LiteLLM virtual API key, only if a model-specific token limit (tpm) has been set for that virtual key. Labels: `"hashed_api_key", "api_key_alias", "model"`|
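These gauges are natural alerting targets. A hedged example (the `< 5` threshold is arbitrary; budgets are in whatever unit your spend is tracked in):

```promql
# Virtual keys whose remaining budget has dropped below 5 units
litellm_remaining_api_key_budget_metric < 5
```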
@ -154,4 +171,11 @@ litellm_settings:
Link to Grafana Dashboards made by LiteLLM community

https://github.com/BerriAI/litellm/tree/main/cookbook/litellm_proxy_server/grafana_dashboard
## Deprecated Metrics
| Metric Name | Description |
|----------------------|--------------------------------------|
| `litellm_llm_api_failed_requests_metric` | **Deprecated** - use `litellm_proxy_failed_requests_metric` instead |
| `litellm_requests_metric` | **Deprecated** - use `litellm_proxy_total_requests_metric` instead |