Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-08-16 06:27:58 +00:00)
fix: telemetry fixes (inference and core telemetry) (#2733)
# What does this PR do?

I found a few issues while adding new metrics for various APIs. Currently, metrics are only propagated in `chat_completion` and `completion`, but most providers use the `openai_..` routes as the default (e.g. via `llama-stack-client inference chat-completion`), so metrics are not working as expected.

To get them working, the following had to be done (a minimal sketch of both paths follows the description below):

1. Get the completion as usual.
2. Use new `openai_` versions of the metric-gathering functions, which read `.usage` from the `OpenAI..` response types, where the token counts are already populated.
3. Define a `stream_generator` that counts the tokens and computes the metrics (only for `stream=True`).
4. Add the metrics to the response.

NOTE: I could not add metrics to `openai_completion` with `stream=True`, because that path only returns an `OpenAICompletion`, not an AsyncGenerator we can manipulate.

Separately, when logging a metric we now acquire the lock and add the event to the span, as the other `_log_...` methods do.

Some new output from `llama-stack-client inference chat-completion --message hi`:

<img width="2416" height="425" alt="Screenshot 2025-07-16 at 8 28 20 AM" src="https://github.com/user-attachments/assets/ccdf1643-a184-4ddd-9641-d426c4d51326" />

and in the client:

<img width="763" height="319" alt="Screenshot 2025-07-16 at 8 28 32 AM" src="https://github.com/user-attachments/assets/6bceb811-5201-47e9-9e16-8130f0d60007" />

These metrics were previously neither recorded nor printed to the server, due to improper console sink handling.

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
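For illustration, here is a minimal sketch of the two metric paths described above. The types and helpers (`ChatChunk`, `Delta`, `metrics_from_usage`) are simplified stand-ins, not the PR's actual classes; only the overall flow mirrors the description: read `.usage` directly when `stream=False`, and wrap the generator to count tokens when `stream=True`.

```python
import asyncio
from collections.abc import AsyncIterator
from dataclasses import dataclass, field


@dataclass
class Delta:
    content: str | None = None


@dataclass
class Choice:
    delta: Delta = field(default_factory=Delta)
    finish_reason: str | None = None


@dataclass
class ChatChunk:
    choices: list[Choice] = field(default_factory=list)
    metrics: list[dict] | None = None  # attached to the final chunk


def metrics_from_usage(usage) -> list[dict]:
    """stream=False path: OpenAI-style responses already carry `.usage`,
    so the metric helpers can read the populated counters directly."""
    return [
        {"metric": "prompt_tokens", "value": usage.prompt_tokens},
        {"metric": "completion_tokens", "value": usage.completion_tokens},
        {"metric": "total_tokens", "value": usage.total_tokens},
    ]


async def stream_generator(stream: AsyncIterator[ChatChunk]) -> AsyncIterator[ChatChunk]:
    """stream=True path: count tokens as chunks arrive and attach the
    computed metrics to the final chunk (the one with a finish_reason)."""
    completion_tokens = 0
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            # Whitespace split stands in for real tokenization here.
            completion_tokens += len(chunk.choices[0].delta.content.split())
        if chunk.choices and chunk.choices[0].finish_reason is not None:
            chunk.metrics = [{"metric": "completion_tokens", "value": completion_tokens}]
        yield chunk


async def _demo() -> None:
    async def fake_stream() -> AsyncIterator[ChatChunk]:
        yield ChatChunk([Choice(Delta("hello "))])
        yield ChatChunk([Choice(Delta("world"), finish_reason="stop")])

    async for chunk in stream_generator(fake_stream()):
        if chunk.metrics:
            print(chunk.metrics)  # [{'metric': 'completion_tokens', 'value': 2}]


asyncio.run(_demo())
```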
parent c252dfa3ef
commit 0caef40e0d
26 changed files with 1595 additions and 246 deletions
```diff
@@ -81,7 +81,7 @@ BACKGROUND_LOGGER = None
 
 
 class BackgroundLogger:
-    def __init__(self, api: Telemetry, capacity: int = 1000):
+    def __init__(self, api: Telemetry, capacity: int = 100000):
         self.api = api
         self.log_queue = queue.Queue(maxsize=capacity)
         self.worker_thread = threading.Thread(target=self._process_logs, daemon=True)
```
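The hunk above raises the capacity of the `BackgroundLogger` queue. As a standalone sketch of that pattern (a bounded queue drained by a daemon thread so telemetry never blocks the request path), assuming a simplified event shape and with the Telemetry API call replaced by a print:

```python
import queue
import threading


class BackgroundLogger:
    """Buffer telemetry events in a bounded queue and ship them from a
    daemon worker thread. A sketch of the pattern in the diff above; the
    real class forwards each event to the Telemetry API instead of printing."""

    def __init__(self, capacity: int = 100000):
        self.log_queue = queue.Queue(maxsize=capacity)
        self.worker_thread = threading.Thread(target=self._process_logs, daemon=True)
        self.worker_thread.start()

    def log_event(self, event: dict) -> None:
        try:
            # Drop rather than block the caller if the queue is full.
            self.log_queue.put_nowait(event)
        except queue.Full:
            print("BackgroundLogger queue full; dropping event")

    def _process_logs(self) -> None:
        while True:
            event = self.log_queue.get()
            print(f"shipping telemetry event: {event}")
            self.log_queue.task_done()


logger = BackgroundLogger(capacity=10)
logger.log_event({"metric": "completion_tokens", "value": 2})
logger.log_queue.join()  # demo only: wait for the worker to drain the queue
```

A larger default capacity trades memory for fewer dropped events under bursty load, which matters once per-token metrics are emitted on every inference call.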