feat: Add opt-in OpenTelemetry auto-instrumentation to Docker images (#4281)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 4s
Test Llama Stack Build / generate-matrix (push) Successful in 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Python Package Build Test / build (3.12) (push) Successful in 17s
Python Package Build Test / build (3.13) (push) Successful in 21s
Test Llama Stack Build / build-single-provider (push) Successful in 27s
Test External API and Providers / test-external (venv) (push) Failing after 28s
Vector IO Integration Tests / test-matrix (push) Failing after 37s
Test Llama Stack Build / build (push) Successful in 40s
UI Tests / ui-tests (22) (push) Successful in 1m18s
Unit Tests / unit-tests (3.12) (push) Failing after 1m50s
Unit Tests / unit-tests (3.13) (push) Failing after 2m9s
Test Llama Stack Build / build-custom-container-distribution (push) Successful in 2m41s
Test Llama Stack Build / build-ubi9-container-distribution (push) Successful in 2m51s
Pre-commit / pre-commit (push) Successful in 2m54s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 2m42s

# What does this PR do?

This allows llama-stack users of the Docker image to use OpenTelemetry
like previous versions.

#4127 migrated to automatic instrumentation, but unless we add those
libraries to the image, everyone needs to build a custom image to enable
otel. Also, unless we establish a convention for enabling it, users who
formerly just set config now need to override the entrypoint.

This PR bootstraps OTEL packages, so they are available (only +10MB). It
also prefixes `llama stack run` with `opentelemetry-instrument` when any
`OTEL_*` environment variable is set.

The result is implicit tracing like before, where you don't need a
custom image to use traces or metrics.

## Test Plan

```bash
# Build image
docker build -f containers/Containerfile \
  --build-arg DISTRO_NAME=starter \
  --build-arg INSTALL_MODE=editable \
  --tag llamastack/distribution-starter:otel-test .

# Run with OTEL env to implicitly use `opentelemetry-instrument`. The
# Settings below ensure inbound traces are honored, but no
# "junk traces" like SQL connects are created.
docker run -p 8321:8321 \
  -e OTEL_EXPORTER_OTLP_ENDPOINT=http://host.docker.internal:4318 \
  -e OTEL_SERVICE_NAME=llama-stack \
  -e OTEL_TRACES_SAMPLER=parentbased_traceidratio \
  -e OTEL_TRACES_SAMPLER_ARG=0.0 \
  llamastack/distribution-starter:otel-test
```

Ran a sample flight search agent which is instrumented on the client
side. This and llama-stack target
[otel-tui](https://github.com/ymtdzzz/otel-tui) I verified no root
database spans, yet database spans are attached to incoming traces.


<img width="1608" height="742" alt="screenshot"
src="https://github.com/user-attachments/assets/69f59b74-3054-42cd-947d-a6c0d9472a7c"
/>

Signed-off-by: Adrian Cole <adrian@tetrate.io>
This commit is contained in:
Adrian Cole 2025-12-03 09:03:27 +08:00 committed by GitHub
parent e243892ef0
commit 4237eb4aaa
No known key found for this signature in database
GPG key ID: B5690EEEBB952194

View file

@ -120,6 +120,11 @@ RUN set -eux; \
printf '%s\n' "$deps" | xargs -L1 uv pip install --no-cache; \
fi
# Install OpenTelemetry auto-instrumentation support
RUN set -eux; \
pip install --no-cache opentelemetry-distro opentelemetry-exporter-otlp; \
opentelemetry-bootstrap -a install
# Cleanup
RUN set -eux; \
pip uninstall -y uv; \
@ -135,15 +140,21 @@ RUN cat <<'EOF' >/usr/local/bin/llama-stack-entrypoint.sh
#!/bin/sh
set -e
# Enable OpenTelemetry auto-instrumentation if any OTEL_* variable is set
CMD_PREFIX=""
if env | grep -q '^OTEL_'; then
CMD_PREFIX="opentelemetry-instrument"
fi
if [ -n "$RUN_CONFIG_PATH" ] && [ -f "$RUN_CONFIG_PATH" ]; then
exec llama stack run "$RUN_CONFIG_PATH" "$@"
exec $CMD_PREFIX llama stack run "$RUN_CONFIG_PATH" "$@"
fi
if [ -n "$DISTRO_NAME" ]; then
exec llama stack run "$DISTRO_NAME" "$@"
exec $CMD_PREFIX llama stack run "$DISTRO_NAME" "$@"
fi
exec llama stack run "$@"
exec $CMD_PREFIX llama stack run "$@"
EOF
RUN chmod +x /usr/local/bin/llama-stack-entrypoint.sh