llama-stack-mirror/llama_stack/providers/utils
Charlie Doern d52722b0d1 fix: actually propagate inference metrics
Currently, metrics are only propagated in `chat_completion` and `completion`.

Because most providers use the openai_.. routes, which are the default in `llama-stack-client inference chat-completion`, metrics are not working as expected.

To get them working, the following had to be done:

1. get the completion as usual
2. use new openai_ versions of the metric-gathering functions, which read the already-populated .usage field of the OpenAI.. response types to gather the metrics
3. define a `stream_generator` which counts the tokens and computes the metrics
4. use a NEW span and log_metrics, because the span of the request ends before this processing is complete, so nothing is logged unless a custom span is used
5. add the metrics to the response
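
The steps above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this commit: `Usage`, `Chunk`, `metrics_from_usage`, `new_span`, and `fake_provider_stream` are hypothetical stand-ins, the `log` list stands in for `log_metrics`, and counting one token per streamed chunk is a simplifying assumption.

```python
import asyncio
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import AsyncIterator, Iterator, Optional


@dataclass
class Usage:
    """Hypothetical stand-in for the `.usage` field of an OpenAI-style response."""
    prompt_tokens: int
    completion_tokens: int


@dataclass
class Chunk:
    """Hypothetical stand-in for one streamed chunk (not a llama-stack type)."""
    text: str
    usage: Optional[Usage] = None
    metrics: list = field(default_factory=list)


def metrics_from_usage(usage: Usage) -> list:
    """Step 2: build metrics from token counts that are already populated."""
    total = usage.prompt_tokens + usage.completion_tokens
    return [
        {"metric": "prompt_tokens", "value": usage.prompt_tokens},
        {"metric": "completion_tokens", "value": usage.completion_tokens},
        {"metric": "total_tokens", "value": total},
    ]


@contextmanager
def new_span(name: str, log: list) -> Iterator[None]:
    """Step 4 (hypothetical): a fresh span, independent of the request span."""
    log.append(("span_start", name))
    try:
        yield
    finally:
        log.append(("span_end", name))


async def stream_generator(
    stream: AsyncIterator[Chunk], prompt_tokens: int, log: list
) -> AsyncIterator[Chunk]:
    """Step 3: pass chunks through while counting, then compute metrics."""
    completion_tokens = 0
    async for chunk in stream:
        # Assumption: each chunk counts as one token; a real tokenizer differs.
        completion_tokens += 1
        yield chunk
    usage = Usage(prompt_tokens, completion_tokens)
    metrics = metrics_from_usage(usage)
    # The request span may already be closed here, so log inside a new one.
    with new_span("inference-metrics", log):
        for m in metrics:
            log.append(("metric", m))  # stands in for log_metrics
    # Step 5: attach the metrics to a final, empty chunk of the response.
    yield Chunk(text="", usage=usage, metrics=metrics)


async def fake_provider_stream() -> AsyncIterator[Chunk]:
    """Step 1 (hypothetical): the provider's streamed completion."""
    for piece in ["Hello", ",", " world"]:
        yield Chunk(text=piece)


async def collect(stream: AsyncIterator[Chunk]) -> list:
    """Drain an async stream into a list."""
    return [chunk async for chunk in stream]
```

Emitting the metrics on a trailing chunk is one possible design: chunks reach the caller as soon as they arrive, and the totals are only known once the provider stream is exhausted.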

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-08-06 15:48:41 -04:00
bedrock feat: drop python 3.10 support (#2469) 2025-06-19 12:07:14 +05:30
common chore(rename): move llama_stack.distribution to llama_stack.core (#2975) 2025-07-30 23:30:53 -07:00
datasetio chore(misc): make tests and starter faster (#3042) 2025-08-05 14:55:05 -07:00
inference fix: actually propagate inference metrics 2025-08-06 15:48:41 -04:00
kvstore chore(rename): move llama_stack.distribution to llama_stack.core (#2975) 2025-07-30 23:30:53 -07:00
memory refactor: Remove double filtering based on score threshold (#3019) 2025-08-02 15:57:03 -07:00
responses chore(rename): move llama_stack.distribution to llama_stack.core (#2975) 2025-07-30 23:30:53 -07:00
scoring chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
sqlstore chore(rename): move llama_stack.distribution to llama_stack.core (#2975) 2025-07-30 23:30:53 -07:00
telemetry fix: actually propagate inference metrics 2025-08-06 15:48:41 -04:00
tools chore(rename): move llama_stack.distribution to llama_stack.core (#2975) 2025-07-30 23:30:53 -07:00
vector_io chore: Enabling Integration tests for Weaviate (#2882) 2025-07-31 20:29:50 -04:00
__init__.py API Updates (#73) 2024-09-17 19:51:35 -07:00
pagination.py chore(refact): move paginate_records fn outside of datasetio (#2137) 2025-05-12 10:56:14 -07:00
scheduler.py chore: bump python supported version to 3.12 (#2475) 2025-06-24 09:22:04 +05:30