DB Transaction Queue Health Metrics

This commit is contained in:
Ishaan Jaff 2025-04-04 21:16:12 -07:00
parent 8c3670e192
commit 8559bcc252
2 changed files with 25 additions and 0 deletions

View file

@ -71,4 +71,16 @@ litellm_settings:
supported_call_types: [] # Optional: Set cache for proxy, but not on the actual llm api call supported_call_types: [] # Optional: Set cache for proxy, but not on the actual llm api call
``` ```
## Monitoring
LiteLLM emits the following prometheus metrics to monitor the health/status of the in memory buffer and redis buffer.
| Metric Name | Description | Storage Type |
|-----------------------------------------------------|-----------------------------------------------------------------------------|--------------|
| `litellm_pod_lock_manager_size` | Indicates which pod has the lock to write updates to the database. | Redis |
| `litellm_in_memory_daily_spend_update_queue_size` | Number of items in the in-memory daily spend update queue. These are the aggregate spend logs for each user. | In-Memory |
| `litellm_redis_daily_spend_update_queue_size` | Number of items in the Redis daily spend update queue. These are the aggregate spend logs for each user. | Redis |
| `litellm_in_memory_spend_update_queue_size` | In-memory aggregate spend values for keys, users, teams, team members, etc.| In-Memory |
| `litellm_redis_spend_update_queue_size` | Redis aggregate spend values for keys, users, teams, etc. | Redis |

View file

@ -242,6 +242,19 @@ litellm_settings:
| `litellm_redis_fails` | Number of failed redis calls | | `litellm_redis_fails` | Number of failed redis calls |
| `litellm_self_latency` | Histogram latency for successful litellm api call | | `litellm_self_latency` | Histogram latency for successful litellm api call |
#### DB Transaction Queue Health Metrics
Use these metrics to monitor the health of the DB Transaction Queue. Eg. Monitoring the size of the in-memory and redis buffers.
| Metric Name | Description | Storage Type |
|-----------------------------------------------------|-----------------------------------------------------------------------------|--------------|
| `litellm_pod_lock_manager_size` | Indicates which pod has the lock to write updates to the database. | Redis |
| `litellm_in_memory_daily_spend_update_queue_size` | Number of items in the in-memory daily spend update queue. These are the aggregate spend logs for each user. | In-Memory |
| `litellm_redis_daily_spend_update_queue_size` | Number of items in the Redis daily spend update queue. These are the aggregate spend logs for each user. | Redis |
| `litellm_in_memory_spend_update_queue_size` | In-memory aggregate spend values for keys, users, teams, team members, etc.| In-Memory |
| `litellm_redis_spend_update_queue_size` | Redis aggregate spend values for keys, users, teams, etc. | Redis |
## **🔥 LiteLLM Maintained Grafana Dashboards ** ## **🔥 LiteLLM Maintained Grafana Dashboards **