Currently, metrics are only propagated in `chat_completion` and `completion`. Since most providers use the `openai_...` routes as the default in `llama-stack-client inference chat-completion`, metrics are not working as expected. In order to get them working, the following had to be done:

1. Get the completion as usual.
2. Use new `openai_` versions of the metric-gathering functions, which read `.usage` from the OpenAI response types, where the metrics are already populated.
3. Define a `stream_generator` that counts the tokens and computes the metrics (see the sketch below).
4. Use a NEW span and `log_metrics`, because the span of the request ends before this processing is complete, leading to no logging unless a custom span is used.
5. Add the metrics to the response.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
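As a rough illustration of steps 2-4, the sketch below shows how token usage might be pulled from an OpenAI-style `.usage` field and how a streaming generator could count tokens and log them from a fresh span once the stream is drained. The `telemetry` handle and its `span()` / `log_metrics()` methods, as well as `metrics_from_usage`, are illustrative placeholders and assumptions, not the actual llama-stack telemetry API.

```python
# Illustrative sketch only: the telemetry handle and its span()/log_metrics()
# methods stand in for whatever the telemetry provider actually exposes.
from typing import Any, AsyncIterator


def metrics_from_usage(response: Any, model_id: str) -> dict:
    # Non-streaming case (steps 1-2): OpenAI-style responses already carry
    # populated token counts in .usage, so just read them off.
    usage = response.usage
    return {
        "model_id": model_id,
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
    }


async def stream_generator(
    stream: AsyncIterator[Any],  # chunks from the openai_* completion call
    telemetry: Any,              # hypothetical telemetry handle
    model_id: str,
) -> AsyncIterator[Any]:
    # Streaming case (step 3): count tokens as chunks arrive, preferring the
    # .usage block on the final chunk when the provider sends one.
    completion_tokens = 0
    usage = None
    async for chunk in stream:
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
        else:
            completion_tokens += 1  # coarse per-chunk fallback count
        yield chunk

    # Step 4: the request span has already closed by the time the stream is
    # drained, so open a new span solely to record the metrics.
    with telemetry.span("completion_metrics"):
        telemetry.log_metrics(
            {
                "model_id": model_id,
                "completion_tokens": usage.completion_tokens if usage else completion_tokens,
            }
        )
```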
| Name |
|---|
| bedrock |
| common |
| datasetio |
| inference |
| kvstore |
| memory |
| responses |
| scoring |
| sqlstore |
| telemetry |
| tools |
| vector_io |
| __init__.py |
| pagination.py |
| scheduler.py |