(docs) prometheus metrics document all prometheus metrics (#5989)

* fix doc on prometheus

* (docs) clean up prometheus docs

* docs show what metrics are deprecated

* doc clarify labels used for budget metrics

* add litellm_remaining_api_key_requests_for_model
Ishaan Jaff 2024-09-30 16:38:38 -07:00 committed by GitHub
parent ca9c437021
commit 2a7e1e970d


@ -57,20 +57,18 @@ http://localhost:4000/metrics
# <proxy_base_url>/metrics
```
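If you are configuring Prometheus to scrape the proxy yourself, a minimal scrape config might look like the following. This is a sketch: the `localhost:4000` target is an assumption, so point it at wherever your proxy actually runs.

```yaml
# prometheus.yml (sketch) - scrape the LiteLLM proxy's /metrics endpoint
scrape_configs:
  - job_name: "litellm-proxy"
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:4000"]  # assumed proxy host:port
```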
## Virtual Keys, Teams, Internal Users Metrics
Use this for tracking usage and spend per [user, key, team, etc.](virtual_keys)
| Metric Name | Description |
|----------------------|--------------------------------------|
| `litellm_spend_metric` | Total Spend, per `"user", "key", "model", "team", "end-user"` |
| `litellm_total_tokens` | input + output tokens per `"user", "key", "model", "team", "end-user"` |
| `litellm_input_tokens` | input tokens per `"user", "key", "model", "team", "end-user"` |
| `litellm_output_tokens` | output tokens per `"user", "key", "model", "team", "end-user"` |
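As a quick example, a PromQL query along these lines (assuming the label names documented above) ranks teams by cumulative spend:

```promql
# Top 5 teams by total recorded spend
topk(5, sum by (team) (litellm_spend_metric))
```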
## Proxy Level Tracking Metrics
Use this to track overall LiteLLM Proxy usage.

- Track actual traffic rate to proxy
@ -78,56 +76,75 @@ Use this to track overall LiteLLM Proxy usage.
| Metric Name | Description |
|----------------------|--------------------------------------|
| `litellm_proxy_failed_requests_metric` | Total number of failed responses from proxy - the client did not get a success response from litellm proxy. Labels: `"end_user", "hashed_api_key", "api_key_alias", "requested_model", "team", "team_alias", "user", "exception_status", "exception_class"` |
| `litellm_proxy_total_requests_metric` | Total number of requests made to the proxy server - track number of client side requests. Labels: `"end_user", "hashed_api_key", "api_key_alias", "requested_model", "team", "team_alias", "user", "exception_status", "exception_class"` |
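For example, overall traffic rate and the fraction of requests that fail could be watched with queries along these lines (a sketch, using the metric and label names documented above):

```promql
# Requests per second hitting the proxy, averaged over 5 minutes
sum(rate(litellm_proxy_total_requests_metric[5m]))

# Fraction of proxy requests that failed over the same window
sum(rate(litellm_proxy_failed_requests_metric[5m]))
  / sum(rate(litellm_proxy_total_requests_metric[5m]))
```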
## LLM API / Provider Metrics

Use this for LLM API error monitoring and for tracking remaining rate limits and token limits.

### Labels Tracked for LLM API Metrics
| Label | Description |
|-------|-------------|
| litellm_model_name | The name of the LLM model used by LiteLLM |
| requested_model | The model sent in the request |
| model_id | The model_id of the deployment. Autogenerated by LiteLLM, each deployment has a unique model_id |
| api_base | The API Base of the deployment |
| api_provider | The LLM API provider (e.g. azure, openai, vertex_ai) |
| hashed_api_key | The hashed api key of the request |
| api_key_alias | The alias of the api key used |
| team | The team of the request |
| team_alias | The alias of the team used |
| exception_status | The status of the exception, if any |
| exception_class | The class of the exception, if any |
### Success and Failure Metrics for LLM API
| Metric Name | Description |
|----------------------|--------------------------------------|
| `litellm_deployment_success_responses` | Total number of successful LLM API calls for deployment. Labels: `"requested_model", "litellm_model_name", "model_id", "api_base", "api_provider", "hashed_api_key", "api_key_alias", "team", "team_alias"` |
| `litellm_deployment_failure_responses` | Total number of failed LLM API calls for a specific LLM deployment. Labels: `"requested_model", "litellm_model_name", "model_id", "api_base", "api_provider", "hashed_api_key", "api_key_alias", "team", "team_alias", "exception_status", "exception_class"` |
| `litellm_deployment_total_requests` | Total number of LLM API calls for deployment - success + failure. Labels: `"requested_model", "litellm_model_name", "model_id", "api_base", "api_provider", "hashed_api_key", "api_key_alias", "team", "team_alias"` |
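An illustrative per-deployment error-rate query, using the labels listed above:

```promql
# Failure ratio per deployment API base over the last 10 minutes
sum by (api_base) (rate(litellm_deployment_failure_responses[10m]))
  / sum by (api_base) (rate(litellm_deployment_total_requests[10m]))
```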
### Remaining Requests and Tokens Metrics
| Metric Name | Description |
|----------------------|--------------------------------------|
| `litellm_remaining_requests_metric` | Track `x-ratelimit-remaining-requests` returned from LLM API Deployment. Labels: `"model_group", "api_provider", "api_base", "litellm_model_name", "hashed_api_key", "api_key_alias"` |
| `litellm_remaining_tokens` | Track `x-ratelimit-remaining-tokens` returned from LLM API Deployment. Labels: `"model_group", "api_provider", "api_base", "litellm_model_name", "hashed_api_key", "api_key_alias"` |
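For example, to see which model group is closest to its provider rate limit (a sketch, assuming the `model_group` label above):

```promql
# Lowest remaining request quota reported per model group
min by (model_group) (litellm_remaining_requests_metric)
```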
### Deployment State Metrics
| Metric Name | Description |
|----------------------|--------------------------------------|
| `litellm_deployment_state` | The state of the deployment: 0 = healthy, 1 = partial outage, 2 = complete outage. Labels: `"litellm_model_name", "model_id", "api_base", "api_provider"` |
| `litellm_deployment_latency_per_output_token` | Latency per output token for deployment. Labels: `"litellm_model_name", "model_id", "api_base", "api_provider", "hashed_api_key", "api_key_alias", "team", "team_alias"` |
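Since the state is encoded as a number, outages can be surfaced with a simple comparison (illustrative):

```promql
# Deployments currently reporting a complete outage (state == 2)
litellm_deployment_state == 2
```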
### Fallback (Failover) Metrics
| Metric Name | Description |
|----------------------|--------------------------------------|
| `litellm_deployment_cooled_down` | Number of times a deployment has been cooled down by LiteLLM load balancing logic. Labels: `"litellm_model_name", "model_id", "api_base", "api_provider", "exception_status"` |
| `litellm_deployment_successful_fallbacks` | Number of successful fallback requests from primary model -> fallback model. Labels: `"requested_model", "fallback_model", "hashed_api_key", "api_key_alias", "team", "team_alias", "exception_status", "exception_class"` |
| `litellm_deployment_failed_fallbacks` | Number of failed fallback requests from primary model -> fallback model. Labels: `"requested_model", "fallback_model", "hashed_api_key", "api_key_alias", "team", "team_alias", "exception_status", "exception_class"` |
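One way to track how well fallbacks are working, per requested model (a sketch using the labels above):

```promql
# Share of fallback attempts that succeeded, per requested model
sum by (requested_model) (rate(litellm_deployment_successful_fallbacks[15m]))
  / (sum by (requested_model) (rate(litellm_deployment_successful_fallbacks[15m]))
     + sum by (requested_model) (rate(litellm_deployment_failed_fallbacks[15m])))
```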
## Request Latency Metrics
| Metric Name | Description |
|----------------------|--------------------------------------|
| `litellm_request_total_latency_metric` | Total latency (seconds) for a request to LiteLLM Proxy Server - tracked for labels `litellm_call_id`, `model`, `user_api_key`, `user_api_key_alias`, `user_api_team`, `user_api_team_alias` |
| `litellm_llm_api_latency_metric` | Latency (seconds) for just the LLM API call - tracked for labels `litellm_call_id`, `model`, `user_api_key`, `user_api_key_alias`, `user_api_team`, `user_api_team_alias` |
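If these latency metrics are exported as Prometheus histograms (an assumption - check the exposed metric names for a `_bucket` suffix), a p95 per model could be derived like this:

```promql
# p95 end-to-end request latency per model, assuming a histogram metric
histogram_quantile(0.95,
  sum by (le, model) (rate(litellm_request_total_latency_metric_bucket[5m])))
```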
## Virtual Key - Budget, Rate Limit Metrics
Metrics used to track LiteLLM Proxy budgeting and rate limiting logic.
| Metric Name | Description |
|----------------------|--------------------------------------|
| `litellm_remaining_team_budget_metric` | Remaining budget for a team created on LiteLLM. Labels: `"team_id", "team_alias"` |
| `litellm_remaining_api_key_budget_metric` | Remaining budget for an API key created on LiteLLM. Labels: `"hashed_api_key", "api_key_alias"` |
| `litellm_remaining_api_key_requests_for_model` | Remaining Requests for a LiteLLM virtual API key, only if a model-specific rate limit (rpm) has been set for that virtual key. Labels: `"hashed_api_key", "api_key_alias", "model"`|
| `litellm_remaining_api_key_tokens_for_model` | Remaining Tokens for a LiteLLM virtual API key, only if a model-specific token limit (tpm) has been set for that virtual key. Labels: `"hashed_api_key", "api_key_alias", "model"`|
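These gauges are natural alerting targets. A hedged example (the `< 5` threshold is arbitrary; budgets are in whatever unit your spend is tracked in):

```promql
# Virtual keys whose remaining budget has dropped below 5 units
litellm_remaining_api_key_budget_metric < 5
```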
@ -154,4 +171,11 @@ litellm_settings:
Link to Grafana Dashboards made by LiteLLM community

https://github.com/BerriAI/litellm/tree/main/cookbook/litellm_proxy_server/grafana_dashboard
## Deprecated Metrics
| Metric Name | Description |
|----------------------|--------------------------------------|
| `litellm_llm_api_failed_requests_metric` | **Deprecated** - use `litellm_proxy_failed_requests_metric` instead |
| `litellm_requests_metric` | **Deprecated** - use `litellm_proxy_total_requests_metric` instead |