llama-stack-mirror/tests
Charlie Doern 49b729b30a feat: api level request metrics via middleware
add RequestMetricsMiddleware, which tracks key metrics for each request the LLS server receives:

1. llama_stack_requests_total: tracks the total number of requests the server has processed
2. llama_stack_request_duration_seconds: tracks the duration of each request
3. llama_stack_concurrent_requests: tracks the number of requests the server is processing concurrently

Using a middleware allows this to be done at the server level, without adding custom handling to each router as the inference router does today for its API-specific metrics.
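
To illustrate the server-level approach, here is a minimal sketch of a pure-ASGI middleware that records the three metrics in plain Python attributes. It is not the actual RequestMetricsMiddleware: the class name `RequestMetricsSketch`, the `(method, path, status)` label set, the in-memory storage, and the demo app are all assumptions made for illustration, and a real implementation would presumably export these values through a telemetry backend.

```python
import asyncio
import time
from collections import defaultdict


class RequestMetricsSketch:
    """Hypothetical ASGI middleware; names and labels are illustrative only."""

    def __init__(self, app):
        self.app = app
        # Stand-in for llama_stack_requests_total, keyed by assumed labels.
        self.requests_total = defaultdict(int)
        # Stand-in for llama_stack_request_duration_seconds (one entry per request).
        self.request_duration_seconds = []
        # Stand-in for llama_stack_concurrent_requests (requests in flight).
        self.concurrent_requests = 0

    async def __call__(self, scope, receive, send):
        # Only HTTP requests are measured; other scopes pass straight through.
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        status = 500  # assumed status if the app fails before responding

        async def send_wrapper(message):
            # Capture the status code from the response start message.
            nonlocal status
            if message["type"] == "http.response.start":
                status = message["status"]
            await send(message)

        self.concurrent_requests += 1
        start = time.perf_counter()
        try:
            await self.app(scope, receive, send_wrapper)
        finally:
            # Runs on success and on error, so every request is counted.
            self.concurrent_requests -= 1
            self.request_duration_seconds.append(time.perf_counter() - start)
            self.requests_total[(scope["method"], scope["path"], status)] += 1


async def demo_app(scope, receive, send):
    """Trivial ASGI app standing in for the server's routers."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


async def drive(middleware, path):
    """Send one fake GET request through the middleware."""
    scope = {"type": "http", "method": "GET", "path": path}

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        pass  # discard the response; only the metrics matter here

    await middleware(scope, receive, send)


metrics = RequestMetricsSketch(demo_app)
asyncio.run(drive(metrics, "/v1/models"))
asyncio.run(drive(metrics, "/v1/models"))
```

Because the wrapper sits in front of the whole application, `demo_app` needs no metrics code of its own, which is the point the commit message makes about avoiding per-router handling.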

Also add unit tests for this functionality.

resolves #2597

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-08-03 13:14:25 -04:00
| Name | Last commit | Date |
| --- | --- | --- |
| client-sdk/post_training | feat: Add nemo customizer (#1448) | 2025-03-25 11:01:10 -07:00 |
| common | feat(responses): implement full multi-turn support (#2295) | 2025-06-02 15:35:49 -07:00 |
| containers | feat(ci): add support for running vision inference tests (#2972) | 2025-07-31 11:50:42 -07:00 |
| external | fix: adjust provider type used in external provider test (#2921) | 2025-07-28 10:14:16 -07:00 |
| integration | test: Implement vector store search test (#3001) | 2025-08-02 15:57:38 -07:00 |
| unit | feat: api level request metrics via middleware | 2025-08-03 13:14:25 -04:00 |
| verifications | chore(rename): move llama_stack.distribution to llama_stack.core (#2975) | 2025-07-30 23:30:53 -07:00 |
| __init__.py | refactor(test): introduce --stack-config and simplify options (#1404) | 2025-03-05 17:02:02 -08:00 |
| README.md | docs: revamp testing documentation (#2155) | 2025-05-13 11:28:29 -07:00 |

Llama Stack Tests

Llama Stack uses multiple layers of testing to ensure continuous functionality and prevent regressions in the codebase.

| Testing Type | Details |
| --- | --- |
| Unit | unit/README.md |
| Integration | integration/README.md |
| Verification | verifications/README.md |