llama-stack-mirror/docs
Charlie Doern 49b729b30a feat: api level request metrics via middleware
add RequestMetricsMiddleware which tracks key metrics related to each request the LLS server will recieve:

1. llama_stack_requests_total: tracks the total amount of requests the server has processed
2. llama_stack_request_duration_seconds: tracks the duration of each request
3. llama_stack_concurrent_requests: tracks concurrently processed requests by the server

The usage of a middleware allows this to be done on the server level without having to add custom handling to each router like the inference router has today for its API specific metrics.

Also, add some unit tests for this functionality

resolves #2597

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-08-03 13:14:25 -04:00
..
_static fix: remove unused DPO parameters from schema and tests (#2988) 2025-07-31 09:11:08 -07:00
notebooks chore(rename): move llama_stack.distribution to llama_stack.core (#2975) 2025-07-30 23:30:53 -07:00
openapi_generator chore(rename): move llama_stack.distribution to llama_stack.core (#2975) 2025-07-30 23:30:53 -07:00
resources Several documentation fixes and fix link to API reference 2025-02-04 14:00:43 -08:00
source feat: api level request metrics via middleware 2025-08-03 13:14:25 -04:00
zero_to_hero_guide refactor: remove Conda support from Llama Stack (#2969) 2025-08-02 15:52:59 -07:00
conftest.py fix: sleep after notebook test 2025-03-23 14:03:35 -07:00
contbuild.sh Fix broken links with docs 2024-11-22 20:42:17 -08:00
dog.jpg Support for Llama3.2 models and Swift SDK (#98) 2024-09-25 10:29:58 -07:00
getting_started.ipynb chore(rename): move llama_stack.distribution to llama_stack.core (#2975) 2025-07-30 23:30:53 -07:00
getting_started_llama4.ipynb chore(rename): move llama_stack.distribution to llama_stack.core (#2975) 2025-07-30 23:30:53 -07:00
getting_started_llama_api.ipynb chore(rename): move llama_stack.distribution to llama_stack.core (#2975) 2025-07-30 23:30:53 -07:00
license_header.txt Initial commit 2024-07-23 08:32:33 -07:00
make.bat feat(pre-commit): enhance pre-commit hooks with additional checks (#2014) 2025-04-30 11:35:49 -07:00
Makefile first version of readthedocs (#278) 2024-10-22 10:15:58 +05:30
original_rfc.md chore(rename): move llama_stack.distribution to llama_stack.core (#2975) 2025-07-30 23:30:53 -07:00
quick_start.ipynb chore(rename): move llama_stack.distribution to llama_stack.core (#2975) 2025-07-30 23:30:53 -07:00
README.md feat: add auto-generated CI documentation pre-commit hook (#2890) 2025-07-25 17:57:01 +02:00

Llama Stack Documentation

Here's a collection of comprehensive guides, examples, and resources for building AI applications with Llama Stack. For the complete documentation, visit our ReadTheDocs page.

Render locally

From the llama-stack root directory, run the following command to render the docs locally:

uv run --group docs sphinx-autobuild docs/source docs/build/html --write-all

You can open up the docs in your browser at http://localhost:8000

Content

Try out Llama Stack's capabilities through our detailed Jupyter notebooks: