Charlie Doern d52722b0d1 fix: actually propagate inference metrics
Currently, metrics are only propagated in `chat_completion` and `completion`.

Since most providers use the `openai_`-prefixed routes as the default (this is what `llama-stack-client inference chat-completion` goes through), metrics are currently not emitted as expected.

To get metrics working, the following had to be done (a sketch follows the list):

1. Get the completion as usual.
2. Use new `openai_`-prefixed versions of the metric-gathering functions, which read the already-populated `usage` field on the OpenAI-compatible response types.
3. Define a `stream_generator` that counts tokens and computes the metrics for streaming responses.
4. Use a NEW span and `log_metrics`: the request's span ends before this processing completes, so nothing gets logged unless a custom span is used.
5. Attach the metrics to the response.
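
A minimal sketch of steps 2-4, assuming hypothetical helper names: `MetricEvent`, `span_factory`, and `log_metrics` are stand-ins for the stack's real telemetry utilities, not its actual API.

```python
from dataclasses import dataclass


@dataclass
class MetricEvent:
    # Stand-in for the stack's metric event type (hypothetical).
    name: str
    value: int


def metrics_from_usage(usage) -> list[MetricEvent]:
    # Step 2: OpenAI-compatible responses already carry token counts in
    # `usage`, so the metrics just need to be translated, not recounted.
    return [
        MetricEvent("prompt_tokens", usage.prompt_tokens),
        MetricEvent("completion_tokens", usage.completion_tokens),
        MetricEvent("total_tokens", usage.total_tokens),
    ]


async def stream_generator(stream, span_factory, log_metrics):
    # Step 3: wrap the chunk stream and count completion tokens as they
    # pass through (one token per chunk is a simplification here).
    completion_tokens = 0
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content is not None:
            completion_tokens += 1
        yield chunk
    # Step 4: by the time the stream is drained, the request's own span
    # has already ended, so open a NEW span just for the metric logging.
    with span_factory("inference_metrics") as span:
        log_metrics(span, [MetricEvent("completion_tokens", completion_tokens)])
```

For the non-streaming path, step 5 then amounts to something like `response.metrics = metrics_from_usage(response.usage)` before returning the response.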

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-08-06 15:48:41 -04:00
inline chore(misc): make tests and starter faster (#3042) 2025-08-05 14:55:05 -07:00
registry feat: Add openAI compatible APIs to Qdrant (#2465) 2025-08-01 00:41:34 -04:00
remote chore(misc): make tests and starter faster (#3042) 2025-08-05 14:55:05 -07:00
utils fix: actually propagate inference metrics 2025-08-06 15:48:41 -04:00
__init__.py API Updates (#73) 2024-09-17 19:51:35 -07:00
datatypes.py feat: create unregister shield API endpoint in Llama Stack (#2853) 2025-08-05 07:33:46 -07:00