# What does this PR do?

1. Add our own benchmark script instead of locust (which doesn't support measuring streaming latency well)
2. Simplify the k8s deployment
3. Add a simple profile script for a locally running server

## Test Plan

```
❮ ./run-benchmark.sh --target stack --duration 180 --concurrent 10
============================================================
BENCHMARK RESULTS
============================================================
Total time: 180.00s
Concurrent users: 10
Total requests: 1636
Successful requests: 1636
Failed requests: 0
Success rate: 100.0%
Requests per second: 9.09

Response Time Statistics:
  Mean: 1.095s
  Median: 1.721s
  Min: 0.136s
  Max: 3.218s
  Std Dev: 0.762s

Percentiles:
  P50: 1.721s
  P90: 1.751s
  P95: 1.756s
  P99: 1.796s

Time to First Token (TTFT) Statistics:
  Mean: 0.037s
  Median: 0.037s
  Min: 0.023s
  Max: 0.211s
  Std Dev: 0.011s

TTFT Percentiles:
  P50: 0.037s
  P90: 0.040s
  P95: 0.044s
  P99: 0.055s

Streaming Statistics:
  Mean chunks per response: 64.0
  Total chunks received: 104775
```
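For context on what the Test Plan above reports: a streaming benchmark records the time to the first received chunk (TTFT) separately from total request latency, plus the number of chunks per response, and summarizes them with percentiles. Below is a minimal, hedged sketch of how such a client could be written with `httpx` and `asyncio`; it is not the actual `run-benchmark.sh` script, and the endpoint URL, payload shape, and parameter values are placeholders.

```python
# Hedged sketch of a concurrent streaming benchmark (not the actual run-benchmark.sh).
# The endpoint URL and payload shape are placeholders, not the real llama-stack API.
import asyncio
import statistics
import time

import httpx


async def benchmark_request(client: httpx.AsyncClient, url: str, payload: dict) -> dict:
    """Issue one streaming request and record total latency, TTFT, and chunk count."""
    start = time.perf_counter()
    ttft = None
    chunks = 0
    async with client.stream("POST", url, json=payload) as response:
        async for _ in response.aiter_bytes():
            if ttft is None:
                ttft = time.perf_counter() - start  # time to first received chunk
            chunks += 1
    return {"latency": time.perf_counter() - start, "ttft": ttft, "chunks": chunks}


def percentile(sorted_values: list[float], p: int) -> float:
    """Nearest-rank percentile over an already-sorted list."""
    idx = min(len(sorted_values) - 1, int(len(sorted_values) * p / 100))
    return sorted_values[idx]


async def run(url: str, payload: dict, concurrent: int, requests_per_worker: int) -> None:
    async with httpx.AsyncClient(timeout=None) as client:

        async def worker() -> list[dict]:
            return [await benchmark_request(client, url, payload) for _ in range(requests_per_worker)]

        batches = await asyncio.gather(*(worker() for _ in range(concurrent)))

    # Aggregate per-request records only after the run, so concurrent workers
    # never touch shared counters mid-flight.
    results = [r for batch in batches for r in batch]
    latencies = sorted(r["latency"] for r in results)
    ttfts = sorted(r["ttft"] for r in results if r["ttft"] is not None)
    print(f"Mean latency: {statistics.mean(latencies):.3f}s  P95: {percentile(latencies, 95):.3f}s")
    print(f"Mean TTFT:    {statistics.mean(ttfts):.3f}s  P95: {percentile(ttfts, 95):.3f}s")
    print(f"Mean chunks per response: {statistics.mean(r['chunks'] for r in results):.1f}")


if __name__ == "__main__":
    # Placeholder endpoint and payload; point these at your locally running server.
    asyncio.run(
        run(
            "http://localhost:8321/v1/your-streaming-endpoint",
            {"messages": [{"role": "user", "content": "Hello"}], "stream": True},
            concurrent=10,
            requests_per_worker=20,
        )
    )
```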
| Name |
|---|
| _static |
| notebooks |
| openapi_generator |
| resources |
| source |
| zero_to_hero_guide |
| conftest.py |
| contbuild.sh |
| dog.jpg |
| getting_started.ipynb |
| getting_started_llama4.ipynb |
| getting_started_llama_api.ipynb |
| license_header.txt |
| make.bat |
| Makefile |
| original_rfc.md |
| quick_start.ipynb |
| README.md |
# Llama Stack Documentation
Here's a collection of comprehensive guides, examples, and resources for building AI applications with Llama Stack. For the complete documentation, visit our ReadTheDocs page.
## Render locally
From the llama-stack root directory, run the following command to render the docs locally:
```bash
uv run --group docs sphinx-autobuild docs/source docs/build/html --write-all
```
You can then open the docs in your browser at http://localhost:8000.
## Content
Try out Llama Stack's capabilities through our detailed Jupyter notebooks:
- Building AI Applications Notebook - A comprehensive guide to building production-ready AI applications using Llama Stack
- Benchmark Evaluations Notebook - Detailed performance evaluations and benchmarking results
- Zero-to-Hero Guide - Step-by-step guide for getting started with Llama Stack