mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-27 19:10:26 +00:00

History

Wen Liang dacd522f57 feat(quota): support per‑client and anonymous server‑side request quotas Unrestricted API usage can lead to runaway costs and fragmented client-side throttling logic. This commit introduces a built-in quota mechanism at the server level, enabling operators to centrally enforce per-client and anonymous rate limits—without needing external proxies or client changes. This helps contain compute costs, enforces fair usage, and simplifies deployment and monitoring of Llama Stack services. Quotas are fully opt-in and have no effect unless explicitly configured. Currently, SQLite is the only supported KV store. If quotas are configured but authentication is disabled, authenticated limits will gracefully fall back to anonymous limits. Highlights: - Adds `QuotaMiddleware` to enforce request quotas: - Uses bearer token as client ID if present; otherwise falls back to IP address - Tracks requests in KV store with per-key TTL expiration - Returns HTTP 429 if a client exceeds their quota - Extends `ServerConfig` with a `quota` section: - `kvstore`: configuration for the backend (currently only SQLite) - `anonymous_max_requests`: per-period cap for unauthenticated clients - `authenticated_max_requests`: per-period cap for authenticated clients - `period`: duration of the quota window (currently only `day` is supported) - Adds full test coverage with FastAPI `TestClient` and custom middleware injection Behavior changes: - Quotas are disabled by default unless explicitly configured - Anonymous users get a conservative default quota; authenticated clients can be given more generous limits To enable per-client request quotas in `run.yaml`, add: ```yaml server: port: 8321 auth: provider_type: custom config: endpoint: https://auth.example.com/validate quota: kvstore: type: sqlite db_path: ./quotas.db anonymous_max_requests: 100 authenticated_max_requests: 1000 period: day ``` Signed-off-by: Wen Liang <wenliang@redhat.com>		2025-05-20 09:31:58 -04:00
..
cli	chore: enable pyupgrade fixes (#1806 )	2025-05-01 14:23:50 -07:00
distribution	chore: Add fixtures to conftest.py (#2067 )	2025-05-06 13:57:48 +02:00
models	feat: support '-' in tool names (#1807 )	2025-04-12 14:23:03 -07:00
providers	fix: multiple tool calls in remote-vllm chat_completion (#2161 )	2025-05-15 11:23:29 -07:00
rag	feat: Adding support for customizing chunk context in RAG insertion and querying (#2134 )	2025-05-14 21:56:20 -04:00
registry	chore: Add fixtures to conftest.py (#2067 )	2025-05-06 13:57:48 +02:00
server	feat(quota): support per‑client and anonymous server‑side request quotas	2025-05-20 09:31:58 -04:00
__init__.py	chore: Add fixtures to conftest.py (#2067 )	2025-05-06 13:57:48 +02:00
conftest.py	chore: Add fixtures to conftest.py (#2067 )	2025-05-06 13:57:48 +02:00
fixtures.py	chore: Add fixtures to conftest.py (#2067 )	2025-05-06 13:57:48 +02:00
README.md	docs: revamp testing documentation (#2155 )	2025-05-13 11:28:29 -07:00

README.md

Llama Stack Unit Tests

You can run the unit tests by running:

source .venv/bin/activate
./scripts/unit-tests.sh [PYTEST_ARGS]

Any additional arguments are passed to pytest. For example, you can specify a test directory, a specific test file, or any pytest flags (e.g., -vvv for verbosity). If no test directory is specified, it defaults to "tests/unit", e.g:

./scripts/unit-tests.sh tests/unit/registry/test_registry.py -vvv

If you'd like to run for a non-default version of Python (currently 3.10), pass PYTHON_VERSION variable as follows:

source .venv/bin/activate
PYTHON_VERSION=3.13 ./scripts/unit-tests.sh