llama-stack-mirror/llama_stack/providers/utils
Charlie Doern 0caef40e0d
fix: telemetry fixes (inference and core telemetry) (#2733)
# What does this PR do?

I found a few issues while adding new metrics for various APIs:

Currently, metrics are only propagated in `chat_completion` and
`completion`.

Since most providers use the `openai_..` routes as the default in
`llama-stack-client inference chat-completion`, metrics are not working
as expected.

In order to get them working, the following had to be done (a rough
sketch of the flow follows the list):

1. get the completion as usual
2. use new `openai_` versions of the metric-gathering functions, which
read `.usage` from the `OpenAI..` response types, where the metrics are
already populated
3. define a `stream_generator` which counts the tokens and computes the
metrics (only for stream=True)
4. add the metrics to the response
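
A minimal sketch of the two paths, with hypothetical names
(`TokenMetrics`, `metrics_from_usage`, `stream_with_metrics`,
`count_tokens_in_chunk`) standing in for the new `openai_`
metric-gathering helpers and the `stream_generator`; this is an
illustration, not the exact code in this PR:

```python
from collections.abc import AsyncIterator, Callable
from dataclasses import dataclass
from typing import Any


@dataclass
class TokenMetrics:
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


def metrics_from_usage(usage: Any) -> TokenMetrics:
    # Non-streaming case: OpenAI-style responses already carry a populated
    # `.usage` object, so the metrics can be read off it directly.
    return TokenMetrics(
        prompt_tokens=usage.prompt_tokens,
        completion_tokens=usage.completion_tokens,
        total_tokens=usage.total_tokens,
    )


async def stream_with_metrics(
    chunks: AsyncIterator[Any],
    count_tokens_in_chunk: Callable[[Any], int],
    prompt_tokens: int,
    on_metrics: Callable[[TokenMetrics], None],
) -> AsyncIterator[Any]:
    # Streaming case (stream=True): wrap the upstream generator, count tokens
    # as chunks are yielded, and report the metrics once the stream ends.
    completion_tokens = 0
    async for chunk in chunks:
        completion_tokens += count_tokens_in_chunk(chunk)
        yield chunk
    on_metrics(
        TokenMetrics(
            prompt_tokens=prompt_tokens,
            completion_tokens=completion_tokens,
            total_tokens=prompt_tokens + completion_tokens,
        )
    )
```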


NOTE: I could not add metrics to `openai_completion` when stream=True,
because that path ONLY returns an `OpenAICompletion`, not an
AsyncGenerator that we can manipulate.


On the core telemetry side, metric logging now acquires the lock and adds
the event to the span, as the other `_log_...` methods do.
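
A rough sketch of that change, assuming the sink uses OpenTelemetry spans
and a shared lock (method and attribute names are illustrative, not the
exact console sink code):

```python
import threading

from opentelemetry import trace

_lock = threading.Lock()


def _log_metric(name: str, value: float, unit: str = "") -> None:
    # Take the same lock the other _log_... methods use so output stays
    # serialized, then attach the metric as an event on the active span.
    with _lock:
        span = trace.get_current_span()
        if span.is_recording():
            span.add_event(name, attributes={"value": value, "unit": unit})
```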

some new output:

`llama-stack-client inference chat-completion --message hi`

<img width="2416" height="425" alt="Screenshot 2025-07-16 at 8 28 20 AM"
src="https://github.com/user-attachments/assets/ccdf1643-a184-4ddd-9641-d426c4d51326"
/>


and in the client:

<img width="763" height="319" alt="Screenshot 2025-07-16 at 8 28 32 AM"
src="https://github.com/user-attachments/assets/6bceb811-5201-47e9-9e16-8130f0d60007"
/>

These metrics were not previously being recorded, nor were they being
printed on the server, due to improper console sink handling.

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-08-06 13:37:40 -07:00
| Name | Last commit | Date |
|---|---|---|
| `bedrock` | feat: drop python 3.10 support (#2469) | 2025-06-19 12:07:14 +05:30 |
| `common` | chore(rename): move llama_stack.distribution to llama_stack.core (#2975) | 2025-07-30 23:30:53 -07:00 |
| `datasetio` | chore(misc): make tests and starter faster (#3042) | 2025-08-05 14:55:05 -07:00 |
| `inference` | fix: telemetry fixes (inference and core telemetry) (#2733) | 2025-08-06 13:37:40 -07:00 |
| `kvstore` | chore(rename): move llama_stack.distribution to llama_stack.core (#2975) | 2025-07-30 23:30:53 -07:00 |
| `memory` | refactor: Remove double filtering based on score threshold (#3019) | 2025-08-02 15:57:03 -07:00 |
| `responses` | chore(rename): move llama_stack.distribution to llama_stack.core (#2975) | 2025-07-30 23:30:53 -07:00 |
| `scoring` | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| `sqlstore` | chore(rename): move llama_stack.distribution to llama_stack.core (#2975) | 2025-07-30 23:30:53 -07:00 |
| `telemetry` | fix: telemetry fixes (inference and core telemetry) (#2733) | 2025-08-06 13:37:40 -07:00 |
| `tools` | chore(rename): move llama_stack.distribution to llama_stack.core (#2975) | 2025-07-30 23:30:53 -07:00 |
| `vector_io` | chore: Enabling Integration tests for Weaviate (#2882) | 2025-07-31 20:29:50 -04:00 |
| `__init__.py` | API Updates (#73) | 2024-09-17 19:51:35 -07:00 |
| `pagination.py` | chore(refact): move paginate_records fn outside of datasetio (#2137) | 2025-05-12 10:56:14 -07:00 |
| `scheduler.py` | chore: bump python supported version to 3.12 (#2475) | 2025-06-24 09:22:04 +05:30 |