llama-stack-mirror/llama_stack/distribution
Wen Liang dacd522f57 feat(quota): support per‑client and anonymous server‑side request quotas
Unrestricted API usage can lead to runaway costs and fragmented client-side
throttling logic. This commit introduces a built-in quota mechanism at the
server level, enabling operators to centrally enforce per-client and anonymous
rate limits—without needing external proxies or client changes.

This helps contain compute costs, enforces fair usage, and simplifies deployment
and monitoring of Llama Stack services. Quotas are fully opt-in and have no
effect unless explicitly configured.

Currently, SQLite is the only supported KV store. If quotas are
configured but authentication is disabled, every client is treated as
anonymous and the anonymous limit applies.

Highlights:
- Adds `QuotaMiddleware` to enforce request quotas:
  - Uses bearer token as client ID if present; otherwise falls back to IP address
  - Tracks requests in KV store with per-key TTL expiration
  - Returns HTTP 429 if a client exceeds their quota

- Extends `ServerConfig` with a `quota` section:
  - `kvstore`: configuration for the backend (currently only SQLite)
  - `anonymous_max_requests`: per-period cap for unauthenticated clients
  - `authenticated_max_requests`: per-period cap for authenticated clients
  - `period`: duration of the quota window (currently only `day` is supported)

- Adds full test coverage with FastAPI `TestClient` and custom middleware injection
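
The enforcement flow described for `QuotaMiddleware` can be sketched as a fixed-window counter. This is a minimal illustration, not the code added by this commit: `InMemoryQuotaStore`, `client_key`, `check_quota`, and the `quota:...` key format are hypothetical names, and an in-memory dict stands in for the SQLite-backed KV store.

```python
import time
from dataclasses import dataclass, field


@dataclass
class InMemoryQuotaStore:
    """Hypothetical stand-in for the KV store: key -> (count, expires_at)."""

    _data: dict = field(default_factory=dict)

    def increment(self, key: str, ttl_seconds: int, now: float) -> int:
        count, expires_at = self._data.get(key, (0, 0.0))
        if now >= expires_at:
            # Key never seen or its TTL has passed: start a fresh window.
            count, expires_at = 0, now + ttl_seconds
        count += 1
        self._data[key] = (count, expires_at)
        return count


def client_key(headers: dict, client_ip: str) -> tuple[str, bool]:
    """Return (key, is_authenticated): bearer token if present, else IP."""
    scheme, _, token = headers.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and token.strip():
        return f"quota:authenticated:{token.strip()}", True
    return f"quota:anonymous:{client_ip}", False


def check_quota(store, headers, client_ip, anonymous_max, authenticated_max,
                period_seconds=86400, now=None) -> int:
    """Count this request and return the HTTP status to send (200 or 429)."""
    now = time.time() if now is None else now
    key, is_authenticated = client_key(headers, client_ip)
    limit = authenticated_max if is_authenticated else anonymous_max
    count = store.increment(key, period_seconds, now)
    return 429 if count > limit else 200
```

In this sketch the window starts at a client's first request and resets once the TTL elapses, so a client with a limit of 2 gets 200, 200, 429 within one period. Whether the real middleware anchors the window the same way is not stated here.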

Behavior changes:
- Quotas remain disabled unless explicitly configured
- Anonymous users get a conservative default quota; authenticated clients can be given more generous limits

To enable per-client request quotas in `run.yaml`, add:
```yaml
server:
  port: 8321
  auth:
    provider_type: custom
    config:
      endpoint: https://auth.example.com/validate
  quota:
    kvstore:
      type: sqlite
      db_path: ./quotas.db
    anonymous_max_requests: 100
    authenticated_max_requests: 1000
    period: day
```
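
For illustration, the `quota` section above could be validated along these lines once `run.yaml` has been loaded into a dict. This is a sketch under assumptions: `QuotaSettings` and `parse_quota` are hypothetical names rather than the classes added to `ServerConfig`, and both request caps are treated as required because the default values are not stated here.

```python
from dataclasses import dataclass
from typing import Optional

# Only `day` is supported as a quota window; value is the window in seconds.
SUPPORTED_PERIODS = {"day": 24 * 60 * 60}


@dataclass(frozen=True)
class QuotaSettings:
    """Hypothetical mirror of the `quota` section, not the project's class."""

    db_path: str
    anonymous_max_requests: int
    authenticated_max_requests: int
    period: str = "day"

    @property
    def period_seconds(self) -> int:
        return SUPPORTED_PERIODS[self.period]


def parse_quota(server: dict) -> Optional[QuotaSettings]:
    """Return validated settings, or None when no quota is configured."""
    quota = server.get("quota")
    if quota is None:
        return None  # quotas are opt-in
    kvstore = quota["kvstore"]
    if kvstore["type"] != "sqlite":
        raise ValueError(f"unsupported kvstore type: {kvstore['type']}")
    period = quota.get("period", "day")
    if period not in SUPPORTED_PERIODS:
        raise ValueError(f"unsupported period: {period}")
    return QuotaSettings(
        db_path=kvstore["db_path"],
        anonymous_max_requests=int(quota["anonymous_max_requests"]),
        authenticated_max_requests=int(quota["authenticated_max_requests"]),
        period=period,
    )
```

Passing the `server` mapping from the example yields a 86400-second window with caps of 100 and 1000. A config without a `quota` key returns `None`, which matches the opt-in behavior.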

Signed-off-by: Wen Liang <wenliang@redhat.com>
2025-05-20 09:31:58 -04:00
routers fix: catch TimeoutError in place of asyncio.TimeoutError (#2131) 2025-05-12 11:49:59 +02:00
server feat(quota): support per‑client and anonymous server‑side request quotas 2025-05-20 09:31:58 -04:00
store feat: implementation for agent/session list and describe (#1606) 2025-05-07 14:49:23 +02:00
ui chore: more mypy fixes (#2029) 2025-05-06 09:52:31 -07:00
utils feat: refactor external providers dir (#2049) 2025-05-15 20:17:03 +02:00
__init__.py API Updates (#73) 2024-09-17 19:51:35 -07:00
access_control.py chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
build.py fix: enforce stricter ASCII rules lint rules in Ruff (#2062) 2025-04-30 18:05:27 +02:00
build_conda_env.sh chore: remove straggler references to llama-models (#1345) 2025-03-01 14:26:03 -08:00
build_container.sh feat: refactor external providers dir (#2049) 2025-05-15 20:17:03 +02:00
build_venv.sh chore: remove straggler references to llama-models (#1345) 2025-03-01 14:26:03 -08:00
client.py chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
common.sh feat(pre-commit): enhance pre-commit hooks with additional checks (#2014) 2025-04-30 11:35:49 -07:00
configure.py feat: refactor external providers dir (#2049) 2025-05-15 20:17:03 +02:00
datatypes.py feat(quota): support per‑client and anonymous server‑side request quotas 2025-05-20 09:31:58 -04:00
distribution.py feat: refactor external providers dir (#2049) 2025-05-15 20:17:03 +02:00
inspect.py feat: add health to all providers through providers endpoint (#1418) 2025-04-14 11:59:36 +02:00
library_client.py fix: Pass external_config_dir to BuildConfig (#2190) 2025-05-19 14:01:28 +02:00
providers.py fix: catch TimeoutError in place of asyncio.TimeoutError (#2131) 2025-05-12 11:49:59 +02:00
request_headers.py chore: more mypy fixes (#2029) 2025-05-06 09:52:31 -07:00
resolver.py feat: introduce APIs for retrieving chat completion requests (#2145) 2025-05-18 21:43:19 -07:00
stack.py chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
start_stack.sh fix: replace all instances of --yaml-config with --config (#2196) 2025-05-16 14:31:12 -07:00