From 4ef898220964c4e36cf7f963d452302f98a2bb4b Mon Sep 17 00:00:00 2001
From: Emilio Garcia
Date: Thu, 13 Nov 2025 15:43:29 -0500
Subject: [PATCH] docs(telemetry): update docs to reflect the telemetry re-architecture

---
 docs/docs/building_applications/telemetry.mdx | 201 ++----------------
 1 file changed, 16 insertions(+), 185 deletions(-)

diff --git a/docs/docs/building_applications/telemetry.mdx b/docs/docs/building_applications/telemetry.mdx
index 2f1d80d41..6c10faaf7 100644
--- a/docs/docs/building_applications/telemetry.mdx
+++ b/docs/docs/building_applications/telemetry.mdx
@@ -10,203 +10,34 @@ import TabItem from '@theme/TabItem';
 
 # Telemetry
 
-The Llama Stack uses OpenTelemetry to provide comprehensive tracing, metrics, and logging capabilities.
+The preferred way to instrument Llama Stack is with OpenTelemetry. Llama Stack enriches the data
+collected by OpenTelemetry with additional information about the performance and behavior of your
+application. The following example shows how to forward telemetry from Llama Stack to an OTLP collector.
+```sh
+export OTEL_EXPORTER_OTLP_ENDPOINT="http://127.0.0.1:4318"
+export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
+export OTEL_SERVICE_NAME="llama-stack-server"
 
-## Automatic Metrics Generation
+uv pip install opentelemetry-distro opentelemetry-exporter-otlp
+uv run opentelemetry-bootstrap -a requirements | uv pip install --requirement -
 
-Llama Stack automatically generates metrics during inference operations. These metrics are aggregated at the **inference request level** and provide insights into token usage and model performance.
-
-### Available Metrics
-
-The following metrics are automatically generated for each inference request:
-
-| Metric Name | Type | Unit | Description | Labels |
-|-------------|------|------|-------------|--------|
-| `llama_stack_prompt_tokens_total` | Counter | `tokens` | Number of tokens in the input prompt | `model_id`, `provider_id` |
-| `llama_stack_completion_tokens_total` | Counter | `tokens` | Number of tokens in the generated response | `model_id`, `provider_id` |
-| `llama_stack_tokens_total` | Counter | `tokens` | Total tokens used (prompt + completion) | `model_id`, `provider_id` |
-
-### Metric Generation Flow
-
-1. **Token Counting**: During inference operations (chat completion, completion, etc.), the system counts tokens in both input prompts and generated responses
-2. **Metric Construction**: For each request, `MetricEvent` objects are created with the token counts
-3. **Telemetry Logging**: Metrics are sent to the configured telemetry sinks
-4. **OpenTelemetry Export**: When OpenTelemetry is enabled, metrics are exposed as standard OpenTelemetry counters
-
-### Metric Aggregation Level
-
-All metrics are generated and aggregated at the **inference request level**. This means:
-
-- Each individual inference request generates its own set of metrics
-- Metrics are not pre-aggregated across multiple requests
-- Aggregation (sums, averages, etc.) can be performed by your observability tools (Prometheus, Grafana, etc.)
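+
+# Launch the Llama Stack server with OpenTelemetry auto-instrumentation applied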
-- Each metric includes labels for `model_id` and `provider_id` to enable filtering and grouping
-
-### Example Metric Event
-
-```python
-MetricEvent(
-    trace_id="1234567890abcdef",
-    span_id="abcdef1234567890",
-    metric="total_tokens",
-    value=150,
-    timestamp=1703123456.789,
-    unit="tokens",
-    attributes={
-        "model_id": "meta-llama/Llama-3.2-3B-Instruct",
-        "provider_id": "tgi"
-    },
-)
+uv run opentelemetry-instrument llama stack run run.yaml
 ```
 
-## Telemetry Sinks
-
-Choose from multiple sink types based on your observability needs:
+### Known issues
 
+Some database instrumentation libraries have a known bug that causes spans to be wrapped twice or to
+be disconnected from their parent trace. To avoid this, you can disable the database-specific
+instrumentation and rely on the SQLAlchemy instrumentation instead. For example, if you are using
+sqlite3, disable its instrumentation like this:
 
-Send events to an OpenTelemetry Collector for integration with observability platforms:
-
-**Use Cases:**
-- Visualizing traces in tools like Jaeger
-- Collecting metrics for Prometheus
-- Integration with enterprise observability stacks
-
-**Features:**
-- Standard OpenTelemetry format
-- Compatible with all OpenTelemetry collectors
-- Supports both traces and metrics
-
-Print events to the console for immediate debugging:
-
-**Use Cases:**
-- Development and testing
-- Quick debugging sessions
-- Simple logging without external tools
-
-**Features:**
-- Immediate output visibility
-- No setup required
-- Human-readable format
-
-## Configuration
-
-### Meta-Reference Provider
-
-Currently, only the meta-reference provider is implemented. It can be configured to send events to multiple sink types:
-
-```yaml
-telemetry:
-  - provider_id: meta-reference
-    provider_type: inline::meta-reference
-    config:
-      service_name: "llama-stack-service"
-      sinks: ['console', 'otel_trace', 'otel_metric']
-      otel_exporter_otlp_endpoint: "http://localhost:4318"
+```sh
+export OTEL_PYTHON_DISABLED_INSTRUMENTATIONS="sqlite3"
 ```
 
-### Environment Variables
-
-Configure telemetry behavior using environment variables:
-
-- **`OTEL_EXPORTER_OTLP_ENDPOINT`**: OpenTelemetry Collector endpoint (default: `http://localhost:4318`)
-- **`OTEL_SERVICE_NAME`**: Service name for telemetry (default: empty string)
-- **`TELEMETRY_SINKS`**: Comma-separated list of sinks (default: `[]`)
-
-### Quick Setup: Complete Telemetry Stack
-
-Use the automated setup script to launch the complete telemetry stack (Jaeger, OpenTelemetry Collector, Prometheus, and Grafana):
-
-```bash
-./scripts/telemetry/setup_telemetry.sh
-```
-
-This sets up:
-- **Jaeger UI**: http://localhost:16686 (traces visualization)
-- **Prometheus**: http://localhost:9090 (metrics)
-- **Grafana**: http://localhost:3000 (dashboards with auto-configured data sources)
-- **OTEL Collector**: http://localhost:4318 (OTLP endpoint)
-
-Once running, you can visualize traces by navigating to [Grafana](http://localhost:3000/) and login with login `admin` and password `admin`.
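+
+### Viewing telemetry locally
+
+The OTLP exporter needs a collector listening on `OTEL_EXPORTER_OTLP_ENDPOINT`. If you do not already
+have one, a minimal option for local testing (assuming Docker is available) is Jaeger's `all-in-one`
+image, which accepts OTLP directly and provides a UI for browsing traces; any OTLP-compatible
+collector will work just as well.
+
+```sh
+# Accept OTLP over HTTP on port 4318 and serve the Jaeger UI at http://localhost:16686
+docker run --rm \
+  -e COLLECTOR_OTLP_ENABLED=true \
+  -p 16686:16686 -p 4318:4318 \
+  jaegertracing/all-in-one:latest
+```
+
+Start the collector first, then launch the instrumented server as shown above and open the Jaeger UI
+to inspect the resulting traces. For quick checks without any collector, the standard OpenTelemetry
+variable `OTEL_TRACES_EXPORTER=console` prints spans to stdout instead.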
-
-## Querying Metrics
-
-When using the OpenTelemetry sink, metrics are exposed in standard format and can be queried through various tools:
-
-Example Prometheus queries for analyzing token usage:
-
-```promql
-# Total tokens used across all models
-sum(llama_stack_tokens_total)
-
-# Tokens per model
-sum by (model_id) (llama_stack_tokens_total)
-
-# Average tokens per request over 5 minutes
-rate(llama_stack_tokens_total[5m])
-
-# Token usage by provider
-sum by (provider_id) (llama_stack_tokens_total)
-```
-
-Create dashboards using Prometheus as a data source:
-
-- **Token Usage Over Time**: Line charts showing token consumption trends
-- **Model Performance**: Comparison of different models by token efficiency
-- **Provider Analysis**: Breakdown of usage across different providers
-- **Request Patterns**: Understanding peak usage times and patterns
-
-Forward metrics to other observability systems:
-
-- Export to multiple backends simultaneously
-- Apply transformations and filtering
-- Integrate with existing monitoring infrastructure
-
-## Best Practices
-
-### 🔍 **Monitoring Strategy**
-- Use OpenTelemetry for production environments
-- Set up alerts on key metrics like token usage and error rates
-
-### 📊 **Metrics Analysis**
-- Track token usage trends to optimize costs
-- Monitor response times across different models
-- Analyze usage patterns to improve resource allocation
-
-### 🚨 **Alerting & Debugging**
-- Set up alerts for unusual token consumption spikes
-- Use trace data to debug performance issues
-- Monitor error rates and failure patterns
-
-### 🔧 **Configuration Management**
-- Use environment variables for flexible deployment
-- Ensure proper network access to OpenTelemetry collectors
-
 ## Related Resources
 
-- **[Agents](./agent)** - Monitoring agent execution with telemetry
-- **[Evaluations](./evals)** - Using telemetry data for performance evaluation
-- **[Getting Started Notebook](https://github.com/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb)** - Telemetry examples and queries
 - **[OpenTelemetry Documentation](https://opentelemetry.io/)** - Comprehensive observability framework
 - **[Jaeger Documentation](https://www.jaegertracing.io/)** - Distributed tracing visualization