llama-stack-mirror/llama_stack
Charlie Doern 0caef40e0d
fix: telemetry fixes (inference and core telemetry) (#2733)
# What does this PR do?

I found a few issues while adding new metrics for various APIs:

currently metrics are only propagated in `chat_completion` and
`completion`

since most providers use the `openai_..` routes as the default in
`llama-stack-client inference chat-completion`, metrics are currently
not working as expected.

in order to get them working the following had to be done:

1. get the completion as usual
2. use new `openai_` versions of the metric gathering functions which
use `.usage` from the `OpenAI..` response types to gather the metrics
which are already populated.
3. define a `stream_generator` which counts the tokens and computes the
metrics (only for stream=True)
5. add metrics to response


NOTE: I could not add metrics to `openai_completion` where stream=True
because that ONLY returns an `OpenAICompletion` not an AsyncGenerator
that we can manipulate.


acquire the lock, and add event to the span as the other `_log_...`
methods do

some new output:

`llama-stack-client inference chat-completion --message hi`

<img width="2416" height="425" alt="Screenshot 2025-07-16 at 8 28 20 AM"
src="https://github.com/user-attachments/assets/ccdf1643-a184-4ddd-9641-d426c4d51326"
/>


and in the client:

<img width="763" height="319" alt="Screenshot 2025-07-16 at 8 28 32 AM"
src="https://github.com/user-attachments/assets/6bceb811-5201-47e9-9e16-8130f0d60007"
/>

these were not previously being recorded nor were they being printed to
the server due to the improper console sink handling

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-08-06 13:37:40 -07:00
..
apis refactor: introduce common 'ResourceNotFoundError' exception (#3032) 2025-08-06 10:22:55 -07:00
cli chore: rename templates to distributions (#3035) 2025-08-04 11:34:17 -07:00
core fix: telemetry fixes (inference and core telemetry) (#2733) 2025-08-06 13:37:40 -07:00
distributions chore: update postgres_demo with new config (#3045) 2025-08-06 07:48:40 -07:00
models chore(api): add mypy coverage to chat_format (#2654) 2025-07-18 11:56:53 +02:00
providers fix: telemetry fixes (inference and core telemetry) (#2733) 2025-08-06 13:37:40 -07:00
strong_typing chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
testing fix(recording): endpoint resolution (#3013) 2025-08-01 16:23:54 -07:00
ui fix(ci): allow tests to skip llama stack client instantiation (#3052) 2025-08-06 11:15:41 -07:00
__init__.py chore(rename): move llama_stack.distribution to llama_stack.core (#2975) 2025-07-30 23:30:53 -07:00
env.py refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401) 2025-03-04 14:53:47 -08:00
log.py chore(rename): move llama_stack.distribution to llama_stack.core (#2975) 2025-07-30 23:30:53 -07:00
schema_utils.py feat(auth): API access control (#2822) 2025-07-24 15:30:48 -07:00