llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-23 06:09:40 +00:00

Charlie Doern d52722b0d1 fix: actually propagate inference metrics

currently metrics are only propagated in `chat_completion` and `completion`

since most providers use the openai_.. routes as the default in llama-stack-client inference chat-completion, metrics are currently not working as expected.

in order to get them working the following had to be done:

1. get the completion as usual
2. use new openai_ versions of the metric gathering functions which use .usage from the OpenAI.. response types to gather the metrics which are already populated.
3. define a `stream_generator` which counts the tokens and computes the metrics
4. use a NEW span and log_metrics because the span of the request ends before this processing is complete, leading to no logging unless a custom span is used
5. add metrics to response

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-08-06 15:48:41 -04:00
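
The streaming steps in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `Chunk`, `count_tokens`, and the `logged` list are hypothetical stand-ins, whitespace splitting replaces a real tokenizer, and appending to `logged` stands in for opening a new span and calling the metric logger. Only the name `stream_generator` comes from the commit message.

```python
import asyncio
from dataclasses import dataclass, field
from typing import AsyncIterator


@dataclass
class Chunk:
    """Hypothetical stand-in for one streamed completion chunk."""
    text: str
    metrics: dict[str, int] = field(default_factory=dict)


def count_tokens(text: str) -> int:
    # Assumption: whitespace splitting stands in for a real tokenizer.
    return len(text.split())


async def stream_generator(
    stream: AsyncIterator[Chunk],
    prompt_tokens: int,
    logged: list[dict[str, int]],
) -> AsyncIterator[Chunk]:
    """Pass chunks through, count tokens, and emit metrics once the stream ends."""
    completion_tokens = 0
    async for chunk in stream:
        completion_tokens += count_tokens(chunk.text)
        yield chunk

    # The totals are only known after the last chunk has been seen.
    metrics = {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }
    # Stand-in for "use a NEW span and log_metrics": the request's own span
    # may already be closed when the caller finishes draining the stream.
    logged.append(metrics)
    # Stand-in for "add metrics to response": a final empty chunk carries them.
    yield Chunk(text="", metrics=metrics)


async def fake_stream() -> AsyncIterator[Chunk]:
    # Hypothetical provider stream used only for this illustration.
    for piece in ["hello world", "from the", "model"]:
        yield Chunk(text=piece)


async def collect(prompt_tokens: int = 4):
    logged: list[dict[str, int]] = []
    chunks = [c async for c in stream_generator(fake_stream(), prompt_tokens, logged)]
    return chunks, logged
```

Calling `asyncio.run(collect())` drains the wrapped stream and returns the chunks along with whatever was logged. The metrics are computed inside the generator because, for a streamed response, the handler returns before any chunk has been consumed, so totals cannot be attached at return time.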
__init__.py	chore: enable pyupgrade fixes (#1806)	2025-05-01 14:23:50 -07:00
embedding_mixin.py	feat(registry): more flexible model lookup (#2859)	2025-07-22 15:22:48 -07:00
inference_store.py	chore(rename): move llama_stack.distribution to llama_stack.core (#2975)	2025-07-30 23:30:53 -07:00
litellm_openai_mixin.py	feat: switch to async completion in LiteLLM OpenAI mixin (#3029)	2025-08-03 12:08:56 -07:00
model_registry.py	feat(starter)!: simplify starter distro; litellm model registry changes (#2916)	2025-07-25 15:02:04 -07:00
openai_compat.py	fix: sambanova inference provider (#2996)	2025-08-01 09:09:14 -07:00
openai_mixin.py	chore: create OpenAIMixin for inference providers with an OpenAI-compat API that need to implement openai_* methods (#2835)	2025-07-23 06:49:40 -04:00
prompt_adapter.py	fix(ollama): Download remote image URLs for Ollama (#2551)	2025-06-30 20:36:11 +05:30