llama-stack-mirror/llama_stack/providers/utils/telemetry
Charlie Doern d52722b0d1 fix: actually propagate inference metrics
Currently, metrics are only propagated in `chat_completion` and `completion`.

Since most providers use the `openai_..` routes as the default for `llama-stack-client inference chat-completion`, metrics are currently not working as expected.

To get them working, the following had to be done:

1. Get the completion as usual.
2. Use new `openai_` versions of the metric-gathering functions, which read `.usage` from the OpenAI.. response types, where the token counts are already populated (a sketch follows this list).
3. Define a `stream_generator` that counts tokens as chunks are yielded and computes the metrics once the stream is exhausted (see the streaming sketch below).
4. Use a NEW span for `log_metrics`, because the request's span ends before this processing completes, so nothing gets logged unless a custom span is used (see the final sketch below).
5. Attach the metrics to the response.
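
A minimal sketch of step 2, assuming an OpenAI-style response whose `.usage` block already carries `prompt_tokens`, `completion_tokens`, and `total_tokens`; the `MetricEvent` shape and the `metrics_from_usage` helper are illustrative names, not llama-stack's actual API:

```python
# Sketch: build token metrics from the already-populated usage block of an
# OpenAI-style (chat) completion response. Names here are hypothetical.
from dataclasses import dataclass


@dataclass
class MetricEvent:
    metric: str
    value: int
    unit: str = "tokens"


def metrics_from_usage(response) -> list[MetricEvent]:
    """Return token metrics read from response.usage, or [] if absent."""
    usage = getattr(response, "usage", None)
    if usage is None:
        return []
    return [
        MetricEvent("prompt_tokens", usage.prompt_tokens),
        MetricEvent("completion_tokens", usage.completion_tokens),
        MetricEvent("total_tokens", usage.total_tokens),
    ]
```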
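
For step 3, a rough sketch of a wrapper generator that forwards chunks and tallies completion tokens so the metrics can only be computed after the stream is exhausted; the chunk shape, the one-token-per-chunk tally, and the `emit_metrics` callback are assumptions for illustration:

```python
# Sketch: wrap a streaming response, count completion tokens as chunks are
# yielded, and hand the finished metrics to a callback once the stream ends.
from typing import AsyncIterator, Awaitable, Callable


async def stream_generator(
    stream: AsyncIterator,
    prompt_tokens: int,
    emit_metrics: Callable[[dict[str, int]], Awaitable[None]],
) -> AsyncIterator:
    completion_tokens = 0
    async for chunk in stream:
        # Assume one token per chunk for illustration; real code would use the
        # tokenizer or the final chunk's usage block.
        completion_tokens += 1
        yield chunk
    await emit_metrics(
        {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        }
    )
```

Doing the counting inside the generator is what lets the totals be known only after the last chunk, which is also why the logging cannot reuse the request's span.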
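
And for step 4, a sketch of logging the metrics inside a fresh span, using the OpenTelemetry API directly as a stand-in for the helpers in `tracing.py`; the span name and attribute keys are made up for the example:

```python
# Sketch: open a NEW span so the metrics are still recorded even though the
# request's own span has already ended by the time the stream finishes.
from opentelemetry import trace

tracer = trace.get_tracer(__name__)


def log_metrics_in_new_span(metrics: dict[str, int]) -> None:
    with tracer.start_as_current_span("inference_token_metrics") as span:
        for name, value in metrics.items():
            span.set_attribute(f"llama_stack.{name}", value)
```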

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-08-06 15:48:41 -04:00
__init__.py kill unnecessarily large imports from telemetry init 2024-12-08 16:57:16 -08:00
dataset_mixin.py chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
sqlite_trace_store.py chore(test): fix flaky telemetry tests (#2815) 2025-07-22 12:30:14 -07:00
trace_protocol.py chore: update pre-commit hook versions (#2708) 2025-07-10 16:47:59 +02:00
tracing.py fix: actually propagate inference metrics 2025-08-06 15:48:41 -04:00