llama-stack-mirror/llama_stack
Charlie Doern 49b729b30a feat: api level request metrics via middleware
Add RequestMetricsMiddleware, which tracks key metrics for each request the LLS server receives:

1. llama_stack_requests_total: tracks the total number of requests the server has processed
2. llama_stack_request_duration_seconds: tracks the duration of each request
3. llama_stack_concurrent_requests: tracks concurrently processed requests by the server

Using a middleware allows this to be done at the server level, without adding custom handling to each router as the inference router does today for its API-specific metrics.
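
As a rough illustration of the middleware approach described above, here is a minimal sketch of a pure-ASGI wrapper that records the three metrics in plain Python attributes. It is not the RequestMetricsMiddleware from this commit: the class name `RequestMetricsSketch`, the in-memory counters, the label tuple, and the `/v1/health` path are all assumptions made for illustration, and a real implementation would likely export the values through a telemetry backend instead.

```python
import asyncio
import time
from collections import defaultdict


class RequestMetricsSketch:
    """Hypothetical ASGI middleware; NOT the actual RequestMetricsMiddleware."""

    def __init__(self, app):
        self.app = app
        # In-memory stand-ins for the three metrics named in the commit message.
        self.requests_total = defaultdict(int)  # llama_stack_requests_total
        self.durations = []                     # llama_stack_request_duration_seconds
        self.concurrent = 0                     # llama_stack_concurrent_requests

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Pass non-HTTP traffic (lifespan, websocket) through untouched.
            await self.app(scope, receive, send)
            return

        status = {"code": 500}  # assumed until the app starts a response

        async def send_wrapper(message):
            # Capture the status code as the response starts.
            if message["type"] == "http.response.start":
                status["code"] = message["status"]
            await send(message)

        self.concurrent += 1
        start = time.perf_counter()
        try:
            await self.app(scope, receive, send_wrapper)
        finally:
            # Runs whether the wrapped app returned or raised.
            self.concurrent -= 1
            self.durations.append(time.perf_counter() - start)
            key = (scope["method"], scope["path"], status["code"])
            self.requests_total[key] += 1


async def demo_app(scope, receive, send):
    """Trivial ASGI app standing in for the server's routers."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


async def run_once(middleware, path):
    """Drive one fake GET request through the middleware."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await middleware({"type": "http", "method": "GET", "path": path}, receive, send)
    return sent


metrics = RequestMetricsSketch(demo_app)
asyncio.run(run_once(metrics, "/v1/health"))  # hypothetical path
```

Because the wrapper sits in front of every route, no router needs its own bookkeeping; the `finally` block ensures the concurrency count is decremented and the duration recorded even when the wrapped app raises.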

Also, add some unit tests for this functionality

resolves #2597

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-08-03 13:14:25 -04:00
apis chore: Enabling Integration tests for Weaviate (#2882) 2025-07-31 20:29:50 -04:00
cli refactor: remove Conda support from Llama Stack (#2969) 2025-08-02 15:52:59 -07:00
core feat: api level request metrics via middleware 2025-08-03 13:14:25 -04:00
models chore(api): add mypy coverage to chat_format (#2654) 2025-07-18 11:56:53 +02:00
providers refactor: Remove double filtering based on score threshold (#3019) 2025-08-02 15:57:03 -07:00
strong_typing chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
templates refactor: remove Conda support from Llama Stack (#2969) 2025-08-02 15:52:59 -07:00
testing fix(recording): endpoint resolution (#3013) 2025-08-01 16:23:54 -07:00
ui feat(UI): adding MVP playground UI (#2828) 2025-07-30 19:44:16 -07:00
__init__.py chore(rename): move llama_stack.distribution to llama_stack.core (#2975) 2025-07-30 23:30:53 -07:00
env.py refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401) 2025-03-04 14:53:47 -08:00
log.py chore(rename): move llama_stack.distribution to llama_stack.core (#2975) 2025-07-30 23:30:53 -07:00
schema_utils.py feat(auth): API access control (#2822) 2025-07-24 15:30:48 -07:00