Charlie Doern d52722b0d1 fix: actually propagate inference metrics
currently metrics are only propagated in `chat_completion` and `completion`

since most providers default to the `openai_`-prefixed routes for `llama-stack-client inference chat-completion`, metrics currently do not work as expected.

in order to get them working, the following had to be done (a sketch follows the list):

1. get the completion as usual
2. use new `openai_`-prefixed versions of the metric-gathering functions, which read `.usage` from the OpenAI-style response types, where the token counts are already populated
3. define a `stream_generator` that counts the tokens and computes the metrics for streaming responses
4. log the metrics with `log_metrics` inside a NEW span, because the span of the request ends before this processing completes; without a custom span, nothing is logged
5. add the metrics to the response
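
as a rough sketch of steps 2-4 (not the actual implementation): `Metric`, `new_span`, and `log_metrics` below are hypothetical stand-ins for llama-stack's telemetry helpers, which are not shown in this message, and the generator reads the provider-populated `.usage` from the final chunk as a simplification of the token counting:

```python
# hypothetical sketch only: Metric, new_span, and log_metrics are stand-ins
# for the real llama-stack telemetry helpers.
import contextlib
from dataclasses import dataclass
from typing import Any, AsyncIterator


@dataclass
class Metric:
    name: str
    value: int


def metrics_from_usage(usage: Any) -> list[Metric]:
    # step 2: OpenAI-style responses already carry populated token counts
    # in `.usage`, so read them directly instead of recomputing them.
    return [
        Metric("prompt_tokens", usage.prompt_tokens),
        Metric("completion_tokens", usage.completion_tokens),
        Metric("total_tokens", usage.total_tokens),
    ]


@contextlib.asynccontextmanager
async def new_span(name: str):
    # placeholder: the real code opens a fresh telemetry span here, since
    # the span that wrapped the request has already ended (step 4).
    yield


async def log_metrics(metrics: list[Metric]) -> None:
    """Placeholder for the telemetry sink."""


async def stream_generator(stream: AsyncIterator[Any]) -> AsyncIterator[Any]:
    # step 3: pass chunks through unchanged while tracking usage; the final
    # chunk of an OpenAI-style stream carries the totals when usage is
    # requested (the real generator counts tokens itself).
    usage = None
    async for chunk in stream:
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
        yield chunk
    if usage is not None:
        # step 4: log under a NEW span so the metrics actually get recorded.
        async with new_span("inference_metrics"):
            await log_metrics(metrics_from_usage(usage))
```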

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-08-06 15:48:41 -04:00
| Name | Last commit message | Date |
|------|---------------------|------|
| access_control | chore(rename): move llama_stack.distribution to llama_stack.core (#2975) | 2025-07-30 23:30:53 -07:00 |
| routers | fix: actually propagate inference metrics | 2025-08-06 15:48:41 -04:00 |
| routing_tables | feat: create unregister shield API endpoint in Llama Stack (#2853) | 2025-08-05 07:33:46 -07:00 |
| server | chore: rename templates to distributions (#3035) | 2025-08-04 11:34:17 -07:00 |
| store | chore(rename): move llama_stack.distribution to llama_stack.core (#2975) | 2025-07-30 23:30:53 -07:00 |
| ui | chore: rename templates to distributions (#3035) | 2025-08-04 11:34:17 -07:00 |
| utils | chore: rename templates to distributions (#3035) | 2025-08-04 11:34:17 -07:00 |
| __init__.py | chore(rename): move llama_stack.distribution to llama_stack.core (#2975) | 2025-07-30 23:30:53 -07:00 |
| build.py | chore: rename templates to distributions (#3035) | 2025-08-04 11:34:17 -07:00 |
| build_conda_env.sh | chore(rename): move llama_stack.distribution to llama_stack.core (#2975) | 2025-07-30 23:30:53 -07:00 |
| build_container.sh | chore: rename templates to distributions (#3035) | 2025-08-04 11:34:17 -07:00 |
| build_venv.sh | refactor: remove Conda support from Llama Stack (#2969) | 2025-08-02 15:52:59 -07:00 |
| client.py | chore(rename): move llama_stack.distribution to llama_stack.core (#2975) | 2025-07-30 23:30:53 -07:00 |
| common.sh | refactor: remove Conda support from Llama Stack (#2969) | 2025-08-02 15:52:59 -07:00 |
| configure.py | chore(rename): move llama_stack.distribution to llama_stack.core (#2975) | 2025-07-30 23:30:53 -07:00 |
| datatypes.py | refactor: remove Conda support from Llama Stack (#2969) | 2025-08-02 15:52:59 -07:00 |
| distribution.py | chore(rename): move llama_stack.distribution to llama_stack.core (#2975) | 2025-07-30 23:30:53 -07:00 |
| external.py | chore(rename): move llama_stack.distribution to llama_stack.core (#2975) | 2025-07-30 23:30:53 -07:00 |
| inspect.py | chore(rename): move llama_stack.distribution to llama_stack.core (#2975) | 2025-07-30 23:30:53 -07:00 |
| library_client.py | chore: rename templates to distributions (#3035) | 2025-08-04 11:34:17 -07:00 |
| providers.py | chore(rename): move llama_stack.distribution to llama_stack.core (#2975) | 2025-07-30 23:30:53 -07:00 |
| request_headers.py | chore(rename): move llama_stack.distribution to llama_stack.core (#2975) | 2025-07-30 23:30:53 -07:00 |
| resolver.py | chore(rename): move llama_stack.distribution to llama_stack.core (#2975) | 2025-07-30 23:30:53 -07:00 |
| stack.py | chore: rename templates to distributions (#3035) | 2025-08-04 11:34:17 -07:00 |
| start_stack.sh | refactor: remove Conda support from Llama Stack (#2969) | 2025-08-02 15:52:59 -07:00 |