Unrestricted API usage can lead to runaway costs and fragmented client-side
throttling logic. This commit introduces a built-in quota mechanism at the
server level, enabling operators to centrally enforce per-client and anonymous
rate limits—without needing external proxies or client changes.
This helps contain compute costs, enforces fair usage, and simplifies deployment
and monitoring of Llama Stack services. Quotas are fully opt-in and have no
effect unless explicitly configured.
Currently, SQLite is the only supported KV store. If quotas are
configured but authentication is disabled, authenticated limits will
gracefully fall back to anonymous limits.
Highlights:
- Adds `QuotaMiddleware` to enforce request quotas:
- Uses bearer token as client ID if present; otherwise falls back to IP address
- Tracks requests in KV store with per-key TTL expiration
- Returns HTTP 429 if a client exceeds their quota
- Extends `ServerConfig` with a `quota` section:
- `kvstore`: configuration for the backend (currently only SQLite)
- `anonymous_max_requests`: per-period cap for unauthenticated clients
- `authenticated_max_requests`: per-period cap for authenticated clients
- `period`: duration of the quota window (currently only `day` is supported)
- Adds full test coverage with FastAPI `TestClient` and custom middleware injection
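For orientation, here is a minimal sketch of the enforcement flow the highlights describe, written as plain ASGI middleware. The class name, constructor arguments, and the async `kvstore.get`/`kvstore.set(..., ttl_seconds=...)` interface are illustrative assumptions, not the exact implementation in this commit:

```python
# Hedged sketch only: names and the KV-store interface are assumed,
# not copied from the actual QuotaMiddleware implementation.
from starlette.requests import Request
from starlette.responses import JSONResponse


class QuotaMiddleware:
    def __init__(self, app, kvstore, anonymous_max: int,
                 authenticated_max: int, window_seconds: int = 86400):
        self.app = app
        self.kvstore = kvstore  # assumed async KV store with TTL support
        self.anonymous_max = anonymous_max
        self.authenticated_max = authenticated_max
        self.window_seconds = window_seconds  # "day" period

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            return await self.app(scope, receive, send)

        request = Request(scope)
        auth = request.headers.get("authorization", "")
        if auth.lower().startswith("bearer "):
            # Bearer token identifies the client and gets the higher cap.
            # (If auth is disabled server-wide, no token arrives, so all
            # traffic naturally falls back to the anonymous limit.)
            key, limit = auth[7:], self.authenticated_max
        else:
            # Anonymous callers are keyed by client IP
            host = request.client.host if request.client else "unknown"
            key, limit = host, self.anonymous_max

        # Increment the per-key counter; the TTL expires it with the window
        count = int(await self.kvstore.get(key) or 0) + 1
        await self.kvstore.set(key, str(count), ttl_seconds=self.window_seconds)

        if count > limit:
            response = JSONResponse({"error": "quota exceeded"}, status_code=429)
            return await response(scope, receive, send)

        return await self.app(scope, receive, send)
```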
Behavior changes:
- Quotas are disabled by default unless explicitly configured
- Anonymous users get a conservative default quota; authenticated clients can be given more generous limits
To enable per-client request quotas in `run.yaml`, add:
```yaml
server:
  port: 8321
  auth:
    provider_type: custom
    config:
      endpoint: https://auth.example.com/validate
  quota:
    kvstore:
      type: sqlite
      db_path: ./quotas.db
    anonymous_max_requests: 100
    authenticated_max_requests: 1000
    period: day
```
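For a sense of the runtime behavior, the following sketch exercises the middleware the way the `TestClient`-based tests mentioned above do. `QuotaMiddleware` refers to the sketch earlier; the `/ping` endpoint, `InMemoryKV`, and parameter names are assumptions for illustration, not the project's actual test code:

```python
# Illustrative check of the 429 behavior; everything here is a toy
# stand-in, not the commit's real test suite.
from fastapi import FastAPI
from fastapi.testclient import TestClient


class InMemoryKV:
    """Dict-backed stand-in for the SQLite KV store (TTL ignored)."""

    def __init__(self):
        self.data = {}

    async def get(self, key):
        return self.data.get(key)

    async def set(self, key, value, ttl_seconds=None):
        self.data[key] = value


app = FastAPI()


@app.get("/ping")
def ping():
    return {"ok": True}


# Inject the middleware with a tiny anonymous quota for the demo
app.add_middleware(
    QuotaMiddleware,
    kvstore=InMemoryKV(),
    anonymous_max=3,
    authenticated_max=10,
)

client = TestClient(app)
for _ in range(3):
    assert client.get("/ping").status_code == 200
assert client.get("/ping").status_code == 429  # fourth request is rejected
```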
Signed-off-by: Wen Liang <wenliang@redhat.com>
# Llama Stack Documentation
Here's a collection of comprehensive guides, examples, and resources for building AI applications with Llama Stack. For the complete documentation, visit our ReadTheDocs page.
## Render locally

```bash
pip install -r requirements.txt
cd docs
python -m sphinx_autobuild source _build
```

You can then open the docs in your browser at http://localhost:8000.
## Content
Try out Llama Stack's capabilities through our detailed Jupyter notebooks:
- Building AI Applications Notebook - A comprehensive guide to building production-ready AI applications using Llama Stack
- Benchmark Evaluations Notebook - Detailed performance evaluations and benchmarking results
- Zero-to-Hero Guide - Step-by-step guide for getting started with Llama Stack