From 4ef898220964c4e36cf7f963d452302f98a2bb4b Mon Sep 17 00:00:00 2001
From: Emilio Garcia
Date: Thu, 13 Nov 2025 15:43:29 -0500
Subject: [PATCH] docs(telemetry): update docs to reflect the telemetry re-architecture

---
 docs/docs/building_applications/telemetry.mdx | 201 ++----------------
 1 file changed, 16 insertions(+), 185 deletions(-)

diff --git a/docs/docs/building_applications/telemetry.mdx b/docs/docs/building_applications/telemetry.mdx
index 2f1d80d41..6c10faaf7 100644
--- a/docs/docs/building_applications/telemetry.mdx
+++ b/docs/docs/building_applications/telemetry.mdx
@@ -10,203 +10,34 @@ import TabItem from '@theme/TabItem';
 
 # Telemetry
 
-The Llama Stack uses OpenTelemetry to provide comprehensive tracing, metrics, and logging capabilities.
+The preferred way to instrument Llama Stack is with OpenTelemetry. Llama Stack enriches the data
+collected by OpenTelemetry with additional information about the performance and behavior of your
+application. The following example shows how to forward telemetry from Llama Stack to an OTLP collector.
+```sh
+export OTEL_EXPORTER_OTLP_ENDPOINT="http://127.0.0.1:4318"
+export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
+export OTEL_SERVICE_NAME="llama-stack-server"
 
-## Automatic Metrics Generation
+uv pip install opentelemetry-distro opentelemetry-exporter-otlp
+uv run opentelemetry-bootstrap -a requirements | uv pip install --requirement -
 
-Llama Stack automatically generates metrics during inference operations. These metrics are aggregated at the **inference request level** and provide insights into token usage and model performance.
-
-### Available Metrics
-
-The following metrics are automatically generated for each inference request:
-
-| Metric Name | Type | Unit | Description | Labels |
-|-------------|------|------|-------------|--------|
-| `llama_stack_prompt_tokens_total` | Counter | `tokens` | Number of tokens in the input prompt | `model_id`, `provider_id` |
-| `llama_stack_completion_tokens_total` | Counter | `tokens` | Number of tokens in the generated response | `model_id`, `provider_id` |
-| `llama_stack_tokens_total` | Counter | `tokens` | Total tokens used (prompt + completion) | `model_id`, `provider_id` |
-
-### Metric Generation Flow
-
-1. **Token Counting**: During inference operations (chat completion, completion, etc.), the system counts tokens in both input prompts and generated responses
-2. **Metric Construction**: For each request, `MetricEvent` objects are created with the token counts
-3. **Telemetry Logging**: Metrics are sent to the configured telemetry sinks
-4. **OpenTelemetry Export**: When OpenTelemetry is enabled, metrics are exposed as standard OpenTelemetry counters
-
-### Metric Aggregation Level
-
-All metrics are generated and aggregated at the **inference request level**. This means:
-
-- Each individual inference request generates its own set of metrics
-- Metrics are not pre-aggregated across multiple requests
-- Aggregation (sums, averages, etc.) can be performed by your observability tools (Prometheus, Grafana, etc.)
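+
+# Launch the Llama Stack server with OpenTelemetry auto-instrumentation applied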
-- Each metric includes labels for `model_id` and `provider_id` to enable filtering and grouping
-
-### Example Metric Event
-
-```python
-MetricEvent(
-    trace_id="1234567890abcdef",
-    span_id="abcdef1234567890",
-    metric="total_tokens",
-    value=150,
-    timestamp=1703123456.789,
-    unit="tokens",
-    attributes={
-        "model_id": "meta-llama/Llama-3.2-3B-Instruct",
-        "provider_id": "tgi"
-    },
-)
+uv run opentelemetry-instrument llama stack run run.yaml
 ```
 
-## Telemetry Sinks
-
-Choose from multiple sink types based on your observability needs:
+### Known issues
 
+Some database instrumentation libraries have a known bug that causes spans to be wrapped twice or to
+be disconnected from their parent trace. To avoid this, you can disable the database-specific
+instrumentation and rely on the SQLAlchemy instrumentation instead. For example, if you are using
+sqlite3, disable its instrumentation like this:
 
-Send events to an OpenTelemetry Collector for integration with observability platforms:
-
-**Use Cases:**
-- Visualizing traces in tools like Jaeger
-- Collecting metrics for Prometheus
-- Integration with enterprise observability stacks
-
-**Features:**
-- Standard OpenTelemetry format
-- Compatible with all OpenTelemetry collectors
-- Supports both traces and metrics
-
-Print events to the console for immediate debugging:
-
-**Use Cases:**
-- Development and testing
-- Quick debugging sessions
-- Simple logging without external tools
-
-**Features:**
-- Immediate output visibility
-- No setup required
-- Human-readable format
-
-## Configuration
-
-### Meta-Reference Provider
-
-Currently, only the meta-reference provider is implemented. It can be configured to send events to multiple sink types:
-
-```yaml
-telemetry:
-  - provider_id: meta-reference
-    provider_type: inline::meta-reference
-    config:
-      service_name: "llama-stack-service"
-      sinks: ['console', 'otel_trace', 'otel_metric']
-      otel_exporter_otlp_endpoint: "http://localhost:4318"
+```sh
+export OTEL_PYTHON_DISABLED_INSTRUMENTATIONS="sqlite3"
 ```
 
-### Environment Variables
-
-Configure telemetry behavior using environment variables:
-
-- **`OTEL_EXPORTER_OTLP_ENDPOINT`**: OpenTelemetry Collector endpoint (default: `http://localhost:4318`)
-- **`OTEL_SERVICE_NAME`**: Service name for telemetry (default: empty string)
-- **`TELEMETRY_SINKS`**: Comma-separated list of sinks (default: `[]`)
-
-### Quick Setup: Complete Telemetry Stack
-
-Use the automated setup script to launch the complete telemetry stack (Jaeger, OpenTelemetry Collector, Prometheus, and Grafana):
-
-```bash
-./scripts/telemetry/setup_telemetry.sh
-```
-
-This sets up:
-- **Jaeger UI**: http://localhost:16686 (traces visualization)
-- **Prometheus**: http://localhost:9090 (metrics)
-- **Grafana**: http://localhost:3000 (dashboards with auto-configured data sources)
-- **OTEL Collector**: http://localhost:4318 (OTLP endpoint)
-
-Once running, you can visualize traces by navigating to [Grafana](http://localhost:3000/) and login with login `admin` and password `admin`.
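+
+### Viewing telemetry locally
+
+The OTLP exporter needs a collector listening on `OTEL_EXPORTER_OTLP_ENDPOINT`. If you do not already
+have one, a minimal option for local testing (assuming Docker is available) is Jaeger's `all-in-one`
+image, which accepts OTLP directly and provides a UI for browsing traces; any OTLP-compatible
+collector will work just as well.
+
+```sh
+# Accept OTLP over HTTP on port 4318 and serve the Jaeger UI at http://localhost:16686
+docker run --rm \
+  -e COLLECTOR_OTLP_ENABLED=true \
+  -p 16686:16686 -p 4318:4318 \
+  jaegertracing/all-in-one:latest
+```
+
+Start the collector first, then launch the instrumented server as shown above and open the Jaeger UI
+to inspect the resulting traces. For quick checks without any collector, the standard OpenTelemetry
+variable `OTEL_TRACES_EXPORTER=console` prints spans to stdout instead.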
-
-## Querying Metrics
-
-When using the OpenTelemetry sink, metrics are exposed in standard format and can be queried through various tools:
-
-Example Prometheus queries for analyzing token usage:
-
-```promql
-# Total tokens used across all models
-sum(llama_stack_tokens_total)
-
-# Tokens per model
-sum by (model_id) (llama_stack_tokens_total)
-
-# Average tokens per request over 5 minutes
-rate(llama_stack_tokens_total[5m])
-
-# Token usage by provider
-sum by (provider_id) (llama_stack_tokens_total)
-```
-
-Create dashboards using Prometheus as a data source:
-
-- **Token Usage Over Time**: Line charts showing token consumption trends
-- **Model Performance**: Comparison of different models by token efficiency
-- **Provider Analysis**: Breakdown of usage across different providers
-- **Request Patterns**: Understanding peak usage times and patterns
-
-Forward metrics to other observability systems:
-
-- Export to multiple backends simultaneously
-- Apply transformations and filtering
-- Integrate with existing monitoring infrastructure
-
-## Best Practices
-
-### 🔍 **Monitoring Strategy**
-- Use OpenTelemetry for production environments
-- Set up alerts on key metrics like token usage and error rates
-
-### 📊 **Metrics Analysis**
-- Track token usage trends to optimize costs
-- Monitor response times across different models
-- Analyze usage patterns to improve resource allocation
-
-### 🚨 **Alerting & Debugging**
-- Set up alerts for unusual token consumption spikes
-- Use trace data to debug performance issues
-- Monitor error rates and failure patterns
-
-### 🔧 **Configuration Management**
-- Use environment variables for flexible deployment
-- Ensure proper network access to OpenTelemetry collectors
-
 ## Related Resources
 
-- **[Agents](./agent)** - Monitoring agent execution with telemetry
-- **[Evaluations](./evals)** - Using telemetry data for performance evaluation
-- **[Getting Started Notebook](https://github.com/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb)** - Telemetry examples and queries
 - **[OpenTelemetry Documentation](https://opentelemetry.io/)** - Comprehensive observability framework
 - **[Jaeger Documentation](https://www.jaegertracing.io/)** - Distributed tracing visualization