feat: improve telemetry (#2590)

# What does this PR do?

* Use a single environment variable to set up the OTEL endpoint
* Update the telemetry provider doc
* Update the general telemetry doc to describe the metrics we generate
* Add a script that sets up a telemetry stack for testing

Closes: https://github.com/meta-llama/llama-stack/issues/783

Note to reviewer: the `setup_telemetry.sh` script was useful for me and was largely generated by AI. If we don't want it in the repo, I can delete it and would understand.

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han · 2025-07-04 17:29:09 +02:00 · committed by GitHub
commit ea966565f6 (parent 4eae0cbfa4)
11 changed files with 237 additions and 38 deletions


@@ -24,37 +24,106 @@ structured_log_event = SpanStartPayload(name="my_span", parent_span_id="parent_s
- **Spans**: Represent operations with timing and hierarchical relationships
- **Traces**: Collection of related spans forming a complete request flow
### Metrics
Llama Stack automatically generates metrics during inference operations. These metrics are aggregated at the **inference request level** and provide insights into token usage and model performance.
#### Available Metrics
The following metrics are automatically generated for each inference request:
| Metric Name | Type | Unit | Description | Labels |
|-------------|------|------|-------------|--------|
| `llama_stack_prompt_tokens_total` | Counter | `tokens` | Number of tokens in the input prompt | `model_id`, `provider_id` |
| `llama_stack_completion_tokens_total` | Counter | `tokens` | Number of tokens in the generated response | `model_id`, `provider_id` |
| `llama_stack_tokens_total` | Counter | `tokens` | Total tokens used (prompt + completion) | `model_id`, `provider_id` |
#### Metric Generation Flow
1. **Token Counting**: During inference operations (chat completion, completion, etc.), the system counts tokens in both input prompts and generated responses
2. **Metric Construction**: For each request, `MetricEvent` objects are created with the token counts
3. **Telemetry Logging**: Metrics are sent to the configured telemetry sinks
4. **OpenTelemetry Export**: When OpenTelemetry is enabled, metrics are exposed as standard OpenTelemetry counters
#### Metric Aggregation Level
All metrics are generated and aggregated at the **inference request level**. This means:
- Each individual inference request generates its own set of metrics
- Metrics are not pre-aggregated across multiple requests
- Aggregation (sums, averages, etc.) can be performed by your observability tools (Prometheus, Grafana, etc.)
- Each metric includes labels for `model_id` and `provider_id` to enable filtering and grouping
#### Example Metric Event
```python
MetricEvent(
    trace_id="1234567890abcdef",
    span_id="abcdef1234567890",
    metric="total_tokens",
    value=150,
    timestamp=1703123456.789,
    unit="tokens",
    attributes={"model_id": "meta-llama/Llama-3.2-3B-Instruct", "provider_id": "tgi"},
)
```
#### Querying Metrics
When using the OpenTelemetry sink, metrics are exposed in standard OpenTelemetry format and can be queried through:
- **Prometheus**: Scrape metrics from the OpenTelemetry Collector's metrics endpoint
- **Grafana**: Create dashboards using Prometheus as a data source
- **OpenTelemetry Collector**: Forward metrics to other observability systems
Example Prometheus queries:
```promql
# Total tokens used across all models
sum(llama_stack_tokens_total)
# Tokens per model
sum by (model_id) (llama_stack_tokens_total)
# Token throughput (per-second rate, averaged over 5 minutes)
rate(llama_stack_tokens_total[5m])
```
### Sinks
- **OpenTelemetry**: Send events to an OpenTelemetry Collector. This is useful for visualizing traces in a tool like Jaeger and collecting metrics for Prometheus.
- **SQLite**: Store events in a local SQLite database. This is needed if you want to query the events later through the Llama Stack API.
- **Console**: Print events to the console.
### Providers
#### Meta-Reference Provider
Currently, only the meta-reference provider is implemented. It can be configured to send events to multiple sink types:
1) OpenTelemetry Collector (traces and metrics)
2) SQLite (traces only)
3) Console (all events)
#### Configuration
Here's an example that sends telemetry signals to all sink types. Your configuration might use only one or a subset.
```yaml
telemetry:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      service_name: "llama-stack-service"
      sinks: ['console', 'sqlite', 'otel_trace', 'otel_metric']
      otel_exporter_otlp_endpoint: "http://localhost:4318"
      sqlite_db_path: "/path/to/telemetry.db"
```
**Environment Variables:**
- `OTEL_EXPORTER_OTLP_ENDPOINT`: OpenTelemetry Collector endpoint (default: `http://localhost:4318`)
- `OTEL_SERVICE_NAME`: Service name for telemetry (default: empty string)
- `TELEMETRY_SINKS`: Comma-separated list of sinks (default: `console,sqlite`)
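For example, to point a stack at a local collector before starting the server (a minimal sketch; the distribution name passed to `llama stack run` is illustrative):

```bash
# Point the OTLP exporters at a local collector and enable both OTel sinks.
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_SERVICE_NAME=my-llama-app
export TELEMETRY_SINKS=otel_trace,otel_metric,sqlite

# Illustrative target; substitute your own distribution.
llama stack run starter
```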
### Jaeger to visualize traces
The `otel_trace` sink works with any service compatible with the OpenTelemetry collector. Traces and metrics use separate endpoints but can share the same collector. Let's use Jaeger to visualize this data.
Start a Jaeger instance with the OTLP HTTP endpoint at 4318 and the Jaeger UI at 16686 using the following command:
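The exact command lives in the unchanged portion of the doc and is elided from this hunk; a representative invocation (an assumption, mirroring the all-in-one image the setup script below uses) looks like:

```bash
# Assumed invocation; defer to the full doc for the authoritative command.
docker run --rm --name jaeger \
  -p 16686:16686 \
  -p 4318:4318 \
  docker.io/jaegertracing/all-in-one:latest
```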
@@ -68,4 +137,7 @@ Once the Jaeger instance is running, you can visualize traces by navigating to h
### Querying Traces Stored in SQLite
The `sqlite` sink allows you to query traces without an external system. Here are some example queries. Refer to the notebook at [Llama Stack Building AI Applications](https://github.com/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb) for more examples on how to query traces and spans.
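For quick inspection from the shell, something like the following works (a sketch: the database path is the provider default, and the `traces` table and column names are assumptions; check the real layout with `.schema` first):

```bash
# Dump the actual schema before querying; the trace store layout is not guaranteed here.
sqlite3 ~/.llama/runtime/trace_store.db '.schema'

# Hypothetical query against an assumed `traces` table; adjust names to match the schema above.
sqlite3 ~/.llama/runtime/trace_store.db \
  'SELECT trace_id, start_time, end_time FROM traces ORDER BY start_time DESC LIMIT 10;'
```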


@@ -8,10 +8,9 @@ Meta's reference implementation of telemetry and observability using OpenTelemet
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `otel_exporter_otlp_endpoint` | `str \| None` | No | | The OpenTelemetry collector endpoint URL (base URL for traces, metrics, and logs). If not set, the SDK will use the OTEL_EXPORTER_OTLP_ENDPOINT environment variable. |
| `service_name` | `<class 'str'>` | No | | The service name to use for telemetry |
| `sinks` | `list[inline.telemetry.meta_reference.config.TelemetrySink]` | No | [<TelemetrySink.CONSOLE: 'console'>, <TelemetrySink.SQLITE: 'sqlite'>] | List of telemetry sinks to enable (possible values: otel_trace, otel_metric, sqlite, console) |
| `sqlite_db_path` | `<class 'str'>` | No | ~/.llama/runtime/trace_store.db | The path to the SQLite database to use for storing traces |

## Sample Configuration
@@ -20,6 +19,7 @@ Meta's reference implementation of telemetry and observability using OpenTelemet
service_name: "${env.OTEL_SERVICE_NAME:=\u200B}"
sinks: ${env.TELEMETRY_SINKS:=console,sqlite}
sqlite_db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/trace_store.db
otel_exporter_otlp_endpoint: ${env.OTEL_EXPORTER_OTLP_ENDPOINT:=}
```


@@ -20,13 +20,9 @@ class TelemetrySink(StrEnum):
class TelemetryConfig(BaseModel):
    otel_exporter_otlp_endpoint: str | None = Field(
        default=None,
        description="The OpenTelemetry collector endpoint URL (base URL for traces, metrics, and logs). If not set, the SDK will use OTEL_EXPORTER_OTLP_ENDPOINT environment variable.",
    )
    service_name: str = Field(
        # service name is always the same, use zero-width space to avoid clutter
@@ -35,7 +31,7 @@ class TelemetryConfig(BaseModel):
    )
    sinks: list[TelemetrySink] = Field(
        default=[TelemetrySink.CONSOLE, TelemetrySink.SQLITE],
        description="List of telemetry sinks to enable (possible values: otel_trace, otel_metric, sqlite, console)",
    )
    sqlite_db_path: str = Field(
        default_factory=lambda: (RUNTIME_BASE_DIR / "trace_store.db").as_posix(),
@@ -55,4 +51,5 @@ class TelemetryConfig(BaseModel):
            "service_name": "${env.OTEL_SERVICE_NAME:=\u200b}",
            "sinks": "${env.TELEMETRY_SINKS:=console,sqlite}",
            "sqlite_db_path": "${env.SQLITE_STORE_DIR:=" + __distro_dir__ + "}/" + db_name,
            "otel_exporter_otlp_endpoint": "${env.OTEL_EXPORTER_OTLP_ENDPOINT:=}",
        }


@@ -86,24 +86,27 @@ class TelemetryAdapter(TelemetryDatasetMixin, Telemetry):
        provider = TracerProvider(resource=resource)
        trace.set_tracer_provider(provider)
        _TRACER_PROVIDER = provider
        # Use a single OTLP endpoint for all telemetry signals
        if TelemetrySink.OTEL_TRACE in self.config.sinks or TelemetrySink.OTEL_METRIC in self.config.sinks:
            if self.config.otel_exporter_otlp_endpoint is None:
                raise ValueError(
                    "otel_exporter_otlp_endpoint is required when OTEL_TRACE or OTEL_METRIC is enabled"
                )

            # Let the OpenTelemetry SDK handle endpoint construction automatically.
            # The SDK will read OTEL_EXPORTER_OTLP_ENDPOINT and construct the appropriate URLs:
            # https://opentelemetry.io/docs/languages/sdk-configuration/otlp-exporter
            if TelemetrySink.OTEL_TRACE in self.config.sinks:
                span_exporter = OTLPSpanExporter()
                span_processor = BatchSpanProcessor(span_exporter)
                trace.get_tracer_provider().add_span_processor(span_processor)

            if TelemetrySink.OTEL_METRIC in self.config.sinks:
                metric_reader = PeriodicExportingMetricReader(OTLPMetricExporter())
                metric_provider = MeterProvider(resource=resource, metric_readers=[metric_reader])
                metrics.set_meter_provider(metric_provider)
        if TelemetrySink.SQLITE in self.config.sinks:
            trace.get_tracer_provider().add_span_processor(SQLiteSpanProcessor(self.config.sqlite_db_path))
        if TelemetrySink.CONSOLE in self.config.sinks:
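Since the exporters are constructed without explicit endpoints, the standard OTLP/HTTP convention applies: the SDK appends `/v1/traces` and `/v1/metrics` to the base URL. A quick reachability sketch for the collector (the empty JSON body is only a probe, not a valid OTLP payload):

```bash
# Expect an HTTP status code (200 or 400) rather than a connection error.
curl -s -o /dev/null -w '%{http_code}\n' \
  -X POST "${OTEL_EXPORTER_OTLP_ENDPOINT:-http://localhost:4318}/v1/traces" \
  -H 'Content-Type: application/json' -d '{}'
```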


@@ -64,6 +64,7 @@ providers:
      service_name: "${env.OTEL_SERVICE_NAME:=\u200B}"
      sinks: ${env.TELEMETRY_SINKS:=console,sqlite}
      sqlite_db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/meta-reference-gpu}/trace_store.db
      otel_exporter_otlp_endpoint: ${env.OTEL_EXPORTER_OTLP_ENDPOINT:=}
  eval:
  - provider_id: meta-reference
    provider_type: inline::meta-reference


@@ -54,6 +54,7 @@ providers:
      service_name: "${env.OTEL_SERVICE_NAME:=\u200B}"
      sinks: ${env.TELEMETRY_SINKS:=console,sqlite}
      sqlite_db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/meta-reference-gpu}/trace_store.db
      otel_exporter_otlp_endpoint: ${env.OTEL_EXPORTER_OTLP_ENDPOINT:=}
  eval:
  - provider_id: meta-reference
    provider_type: inline::meta-reference


@@ -73,6 +73,7 @@ providers:
      service_name: "${env.OTEL_SERVICE_NAME:=\u200B}"
      sinks: ${env.TELEMETRY_SINKS:=console,sqlite}
      sqlite_db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/open-benchmark}/trace_store.db
      otel_exporter_otlp_endpoint: ${env.OTEL_EXPORTER_OTLP_ENDPOINT:=}
  eval:
  - provider_id: meta-reference
    provider_type: inline::meta-reference


@@ -193,6 +193,7 @@ providers:
      service_name: "${env.OTEL_SERVICE_NAME:=\u200B}"
      sinks: ${env.TELEMETRY_SINKS:=console,sqlite}
      sqlite_db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/starter}/trace_store.db
      otel_exporter_otlp_endpoint: ${env.OTEL_EXPORTER_OTLP_ENDPOINT:=}
  post_training:
  - provider_id: huggingface
    provider_type: inline::huggingface


@@ -53,6 +53,7 @@ providers:
      service_name: "${env.OTEL_SERVICE_NAME:=\u200B}"
      sinks: ${env.TELEMETRY_SINKS:=console,sqlite}
      sqlite_db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/vllm-gpu}/trace_store.db
      otel_exporter_otlp_endpoint: ${env.OTEL_EXPORTER_OTLP_ENDPOINT:=}
  eval:
  - provider_id: meta-reference
    provider_type: inline::meta-reference


@@ -50,6 +50,7 @@ providers:
      service_name: "${env.OTEL_SERVICE_NAME:=\u200B}"
      sinks: ${env.TELEMETRY_SINKS:=console,sqlite}
      sqlite_db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/watsonx}/trace_store.db
      otel_exporter_otlp_endpoint: ${env.OTEL_EXPORTER_OTLP_ENDPOINT:=}
  eval:
  - provider_id: meta-reference
    provider_type: inline::meta-reference

scripts/setup_telemetry.sh (new executable file, 121 lines)

@@ -0,0 +1,121 @@
#!/usr/bin/env bash
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
# Telemetry Setup Script for Llama Stack
# This script sets up Jaeger, an OpenTelemetry Collector, Prometheus, and Grafana
# using Docker or Podman. To test the telemetry stack, run this script, then export:
# export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
# export TELEMETRY_SINKS=otel_trace,otel_metric
# export OTEL_SERVICE_NAME=my-llama-app
# Then run the distro server.
set -Eeuo pipefail
CONTAINER_RUNTIME=${CONTAINER_RUNTIME:-docker}
echo "🚀 Setting up telemetry stack for Llama Stack using Podman..."
if ! command -v "$CONTAINER_RUNTIME" &> /dev/null; then
echo "🚨 $CONTAINER_RUNTIME could not be found"
echo "Docker or Podman is required. Install Docker: https://docs.docker.com/get-docker/ or Podman: https://podman.io/getting-started/installation"
exit 1
fi
# Create a network for the services
echo "📡 Creating $CONTAINER_RUNTIME network..."
$CONTAINER_RUNTIME network create llama-telemetry 2>/dev/null || echo "Network already exists"
# Stop and remove existing containers
echo "🧹 Cleaning up existing containers..."
$CONTAINER_RUNTIME stop jaeger otel-collector prometheus grafana 2>/dev/null || true
$CONTAINER_RUNTIME rm jaeger otel-collector prometheus grafana 2>/dev/null || true
# Start Jaeger
echo "🔍 Starting Jaeger..."
$CONTAINER_RUNTIME run -d --name jaeger \
--network llama-telemetry \
-e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
-p 16686:16686 \
-p 14250:14250 \
-p 9411:9411 \
docker.io/jaegertracing/all-in-one:latest
# Start OpenTelemetry Collector
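# The collector config is bind-mounted from the current directory; make sure otel-collector-config.yaml exists there.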
echo "📊 Starting OpenTelemetry Collector..."
$CONTAINER_RUNTIME run -d --name otel-collector \
--network llama-telemetry \
-p 4318:4318 \
-p 4317:4317 \
-p 9464:9464 \
-p 13133:13133 \
-v "$(pwd)/otel-collector-config.yaml:/etc/otel-collector-config.yaml:Z" \
docker.io/otel/opentelemetry-collector-contrib:latest \
--config /etc/otel-collector-config.yaml
# Start Prometheus
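# The Prometheus config is bind-mounted from the current directory; make sure prometheus.yml exists there.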
echo "📈 Starting Prometheus..."
$CONTAINER_RUNTIME run -d --name prometheus \
--network llama-telemetry \
-p 9090:9090 \
-v "$(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml:Z" \
docker.io/prom/prometheus:latest \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/prometheus \
--web.console.libraries=/etc/prometheus/console_libraries \
--web.console.templates=/etc/prometheus/consoles \
--storage.tsdb.retention.time=200h \
--web.enable-lifecycle
# Start Grafana
echo "📊 Starting Grafana..."
$CONTAINER_RUNTIME run -d --name grafana \
--network llama-telemetry \
-p 3000:3000 \
-e GF_SECURITY_ADMIN_PASSWORD=admin \
-e GF_USERS_ALLOW_SIGN_UP=false \
docker.io/grafana/grafana:latest
# Wait for services to start
echo "⏳ Waiting for services to start..."
sleep 10
# Check if services are running
echo "🔍 Checking service status..."
$CONTAINER_RUNTIME ps --filter "name=jaeger" --filter "name=otel-collector" --filter "name=prometheus" --filter "name=grafana" --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
echo ""
echo "✅ Telemetry stack is ready!"
echo ""
echo "🌐 Service URLs:"
echo " Jaeger UI: http://localhost:16686"
echo " Prometheus: http://localhost:9090"
echo " Grafana: http://localhost:3000 (admin/admin)"
echo " OTEL Collector: http://localhost:4318 (OTLP endpoint)"
echo ""
echo "🔧 Environment variables for Llama Stack:"
echo " export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318"
echo " export TELEMETRY_SINKS=otel_trace,otel_metric"
echo " export OTEL_SERVICE_NAME=my-llama-app"
echo ""
echo "📊 Next steps:"
echo " 1. Set the environment variables above"
echo " 2. Start your Llama Stack application"
echo " 3. Make some inference calls to generate metrics"
echo " 4. Check Jaeger for traces: http://localhost:16686"
echo " 5. Check Prometheus for metrics: http://localhost:9090"
echo " 6. Set up Grafana dashboards: http://localhost:3000"
echo ""
echo "🔍 To test the setup, run:"
echo " curl -X POST http://localhost:5000/v1/inference/chat/completions \\"
echo " -H 'Content-Type: application/json' \\"
echo " -d '{\"model_id\": \"your-model\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}]}'"
echo ""
echo "🧹 To clean up when done:"
echo " $CONTAINER_RUNTIME stop jaeger otel-collector prometheus grafana"
echo " $CONTAINER_RUNTIME rm jaeger otel-collector prometheus grafana"
echo " $CONTAINER_RUNTIME network rm llama-telemetry"