# Telemetry

The Llama Stack telemetry system provides comprehensive tracing, metrics, and logging capabilities. It supports multiple sink types including OpenTelemetry, SQLite, and Console output.

## Events

The telemetry system supports three main types of events:

- **Unstructured Log Events**: Free-form log messages with severity levels
  ```python
  unstructured_log_event = UnstructuredLogEvent(
      message="This is a log message", severity=LogSeverity.INFO
  )
  ```
- **Metric Events**: Numerical measurements with units
  ```python
  metric_event = MetricEvent(metric="my_metric", value=10, unit="count")
  ```
- **Structured Log Events**: System events like span start/end. Extensible to add more structured log types.
  ```python
  structured_log_event = SpanStartPayload(name="my_span", parent_span_id="parent_span_id")
  ```

## Spans and Traces

- **Spans**: Represent operations with timing and hierarchical relationships
- **Traces**: Collection of related spans forming a complete request flow
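
In practice, a child span points at its parent through `parent_span_id`, and every span emitted while serving one request belongs to the same trace. Below is a minimal sketch using the `SpanStartPayload` type shown above; the IDs are placeholders, and it assumes `parent_span_id` may be omitted for a root span and that the type is importable from `llama_stack.apis.telemetry`.

```python
from llama_stack.apis.telemetry import SpanStartPayload  # import path assumed

# Sketch only: placeholder IDs to illustrate the span hierarchy within a trace.
root_span = SpanStartPayload(name="chat_completion")  # top-level operation, no parent
child_span = SpanStartPayload(
    name="tool_execution",
    parent_span_id="<span id of chat_completion>",  # links the child to its parent
)
```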

## Metrics

Llama Stack automatically generates metrics during inference operations. These metrics are aggregated at the inference request level and provide insights into token usage and model performance.

### Available Metrics

The following metrics are automatically generated for each inference request:

| Metric Name | Type | Unit | Description | Labels |
|-------------|------|------|-------------|--------|
| `llama_stack_prompt_tokens_total` | Counter | `tokens` | Number of tokens in the input prompt | `model_id`, `provider_id` |
| `llama_stack_completion_tokens_total` | Counter | `tokens` | Number of tokens in the generated response | `model_id`, `provider_id` |
| `llama_stack_tokens_total` | Counter | `tokens` | Total tokens used (prompt + completion) | `model_id`, `provider_id` |

### Metric Generation Flow

1. **Token Counting**: During inference operations (chat completion, completion, etc.), the system counts tokens in both input prompts and generated responses
2. **Metric Construction**: For each request, `MetricEvent` objects are created with the token counts (see the sketch after this list)
3. **Telemetry Logging**: Metrics are sent to the configured telemetry sinks
4. **OpenTelemetry Export**: When OpenTelemetry is enabled, metrics are exposed as standard OpenTelemetry counters
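
As an illustration of steps 1–3, the sketch below builds the per-request `MetricEvent` objects by hand. The token counts and trace/span IDs are hypothetical, the metric names mirror the example event further down, and the import path is assumed; this is not the provider's actual internal code.

```python
import time

from llama_stack.apis.telemetry import MetricEvent  # import path assumed

# Step 1 (hypothetical counts): tokens counted for one inference request.
prompt_tokens, completion_tokens = 117, 34

# Step 2: one MetricEvent per metric, all tied to the request's trace/span
# and labeled with model_id/provider_id via attributes.
common = dict(
    trace_id="1234567890abcdef",
    span_id="abcdef1234567890",
    timestamp=time.time(),
    unit="tokens",
    attributes={"model_id": "meta-llama/Llama-3.2-3B-Instruct", "provider_id": "tgi"},
)
events = [
    MetricEvent(metric="prompt_tokens", value=prompt_tokens, **common),
    MetricEvent(metric="completion_tokens", value=completion_tokens, **common),
    MetricEvent(metric="total_tokens", value=prompt_tokens + completion_tokens, **common),
]
# Step 3: each event is handed to the telemetry API, which fans it out to the configured sinks.
```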

### Metric Aggregation Level

All metrics are generated and aggregated at the inference request level. This means:

- Each individual inference request generates its own set of metrics
- Metrics are not pre-aggregated across multiple requests
- Aggregation (sums, averages, etc.) can be performed by your observability tools (Prometheus, Grafana, etc.)
- Each metric includes labels for `model_id` and `provider_id` to enable filtering and grouping

### Example Metric Event

```python
MetricEvent(
    trace_id="1234567890abcdef",
    span_id="abcdef1234567890",
    metric="total_tokens",
    value=150,
    timestamp=1703123456.789,
    unit="tokens",
    attributes={"model_id": "meta-llama/Llama-3.2-3B-Instruct", "provider_id": "tgi"},
)
```

### Querying Metrics

When using the OpenTelemetry sink, metrics are exposed in standard OpenTelemetry format and can be queried through:

- **Prometheus**: Scrape metrics from the OpenTelemetry Collector's metrics endpoint
- **Grafana**: Create dashboards using Prometheus as a data source
- **OpenTelemetry Collector**: Forward metrics to other observability systems

Example Prometheus queries:

```promql
# Total tokens used across all models
sum(llama_stack_tokens_total)

# Tokens per model
sum by (model_id) (llama_stack_tokens_total)

# Token usage rate (tokens per second), averaged over the last 5 minutes
rate(llama_stack_tokens_total[5m])
```

## Sinks

- **OpenTelemetry**: Send events to an OpenTelemetry Collector. This is useful for visualizing traces in a tool like Jaeger and collecting metrics for Prometheus.
- **SQLite**: Store events in a local SQLite database. This is needed if you want to query the events later through the Llama Stack API.
- **Console**: Print events to the console.

## Providers

### Meta-Reference Provider

Currently, only the meta-reference provider is implemented. It can be configured to send events to multiple sink types:

1. **OpenTelemetry Collector** (traces and metrics)
2. **SQLite** (traces only)
3. **Console** (all events)

### Configuration

Here's an example that sends telemetry signals to all sink types. Your configuration might use only one or a subset.

```yaml
telemetry:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      service_name: "llama-stack-service"
      sinks: ['console', 'sqlite', 'otel_trace', 'otel_metric']
      otel_exporter_otlp_endpoint: "http://localhost:4318"
      sqlite_db_path: "/path/to/telemetry.db"
```

Environment Variables:

- `OTEL_EXPORTER_OTLP_ENDPOINT`: OpenTelemetry Collector endpoint (default: `http://localhost:4318`)
- `OTEL_SERVICE_NAME`: Service name for telemetry (default: empty string)
- `TELEMETRY_SINKS`: Comma-separated list of sinks (default: `console,sqlite`)
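
As a minimal sketch, the same settings can also be supplied from Python, for example when the stack runs in-process via the library client or is spawned from the same script; the values below are illustrative, not defaults.

```python
import os

# Point the OTLP exporter at the collector and choose the active sinks
# before the telemetry provider is initialized.
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "http://localhost:4318"
os.environ["OTEL_SERVICE_NAME"] = "llama-stack-service"
os.environ["TELEMETRY_SINKS"] = "console,sqlite,otel_trace,otel_metric"
```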

## Jaeger to visualize traces

The `otel_trace` sink works with any service compatible with the OpenTelemetry collector. Traces and metrics use separate endpoints but can share the same collector.

Start a Jaeger instance with the OTLP HTTP endpoint at 4318 and the Jaeger UI at 16686 using the following command:

```bash
$ docker run --pull always --rm --name jaeger \
  -p 16686:16686 -p 4318:4318 \
  jaegertracing/jaeger:2.1.0
```

Once the Jaeger instance is running, you can visualize traces by navigating to http://localhost:16686/.
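
To have something to look at, send any request through the stack. Here is a sketch using the Python client, assuming a stack server on the default port 8321; the model ID is a placeholder and should be a model registered in your distribution.

```python
from llama_stack_client import LlamaStackClient

# Any request routed through the stack produces a trace that Jaeger can display.
client = LlamaStackClient(base_url="http://localhost:8321")
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",  # placeholder model ID
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.completion_message.content)
```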

## Querying Traces Stored in SQLite

The `sqlite` sink allows you to query traces without an external system. Here are some example queries. Refer to the Llama Stack Building AI Applications notebook for more examples on how to query traces and spans.
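
For instance, the database can be explored directly with Python's built-in `sqlite3` module. The path must match `sqlite_db_path` from the configuration above, and the table names are read from the file rather than assumed here.

```python
import sqlite3

# Path must match sqlite_db_path in the telemetry provider config.
conn = sqlite3.connect("/path/to/telemetry.db")

# Discover the tables the sqlite sink created, then peek at a few rows from each.
tables = [row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")]
print("tables:", tables)

for table in tables:
    print(f"--- {table} ---")
    for row in conn.execute(f"SELECT * FROM {table} LIMIT 3"):
        print(row)

conn.close()
```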