llama-stack-mirror



Author

SHA1

Message

Date

Charlie Doern

d52722b0d1

fix: actually propagate inference metrics

Currently, metrics are only propagated in `chat_completion` and `completion`.

Since most providers use the openai_.. routes as the default in llama-stack-client inference chat-completion, metrics are not working as expected.

To get them working, the following had to be done:

1. Get the completion as usual.
2. Use new openai_ versions of the metric-gathering functions, which read the already-populated .usage field from the OpenAI.. response types.
3. Define a `stream_generator` that counts the tokens and computes the metrics.
4. Use a NEW span and log_metrics, because the span of the request ends before this processing is complete; without a custom span, nothing is logged.
5. Add the metrics to the response.

Signed-off-by: Charlie Doern <cdoern@redhat.com>

2025-08-06 15:48:41 -04:00
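
The streaming approach described in commit d52722b0d1 can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `Usage`, `Chunk`, `new_span`, `metrics_from_usage`, `fake_provider_stream`, and the `LOGGED` sink are all hypothetical stand-ins, and counting one token per chunk is a simplifying assumption. Only the name `stream_generator` comes from the commit message.

```python
import asyncio
from contextlib import contextmanager
from dataclasses import dataclass
from typing import AsyncIterator, Dict, Iterator, List


@dataclass
class Usage:
    """Hypothetical mirror of an OpenAI-style `.usage` field."""
    prompt_tokens: int
    completion_tokens: int


@dataclass
class Chunk:
    """Hypothetical streamed chunk; real chunk types carry more fields."""
    text: str


# Stand-in for a telemetry sink; a real system would export spans elsewhere.
LOGGED: List[Dict] = []


@contextmanager
def new_span(name: str) -> Iterator[Dict]:
    """Hypothetical span: collects metrics and records them on exit."""
    record: Dict = {"span": name, "metrics": {}}
    try:
        yield record
    finally:
        LOGGED.append(record)


def metrics_from_usage(usage: Usage) -> Dict[str, int]:
    """Step 2: derive metrics from an already-populated usage object."""
    return {
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.prompt_tokens + usage.completion_tokens,
    }


async def fake_provider_stream() -> AsyncIterator[Chunk]:
    """Hypothetical provider stream yielding three chunks."""
    for word in ["Hello", "world", "!"]:
        await asyncio.sleep(0)
        yield Chunk(text=word)


async def stream_generator(
    stream: AsyncIterator[Chunk], prompt_tokens: int
) -> AsyncIterator[Chunk]:
    """Steps 3 and 4: pass chunks through, count tokens, then log metrics."""
    completion_tokens = 0
    async for chunk in stream:
        completion_tokens += 1  # assumption: one chunk is one token
        yield chunk
    # The request's own span may already be closed by the time the stream
    # is exhausted, so the metrics are logged inside a fresh span.
    usage = Usage(prompt_tokens, completion_tokens)
    with new_span("inference-metrics") as span:
        span["metrics"] = metrics_from_usage(usage)


async def main() -> List[str]:
    wrapped = stream_generator(fake_provider_stream(), prompt_tokens=4)
    return [chunk.text async for chunk in wrapped]


if __name__ == "__main__":
    print(asyncio.run(main()))
    print(LOGGED)
```

Note that in this sketch the metrics are logged only once the consumer has read the stream to the end; a client that abandons the stream early would produce no metrics record.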

Ashwin Bharambe

2665f00102

chore(rename): move llama_stack.distribution to llama_stack.core (#2975)

We would like to rename the term `template` to `distribution`. To
prepare for that, this is a precursor.

cc @leseb

2025-07-30 23:30:53 -07:00

Renamed from llama_stack/distribution/routers/inference.py

2 commits