mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-12-06 02:30:58 +00:00
feat(quota): add server‑side per‑client request quotas (requires auth) (#2096)
# What does this PR do?
Unrestricted usage can lead to runaway costs and fragmented client-side
workarounds. This commit introduces a native quota mechanism to the
server, giving operators a unified, centrally managed throttle for
per-client requests—without needing extra proxies or custom client
logic. This helps contain cloud-compute expenses, enables fine-grained
usage control, and simplifies deployment and monitoring of Llama Stack
services. Quotas are fully opt-in and have no effect unless explicitly
configured.
Note that quotas require authentication to be enabled. `sqlite` is the
only supported quota `type` at this time; any other `type` is rejected.
The only supported `period` is `day`.
Highlights:
- Adds `QuotaMiddleware` to enforce per-client request quotas:
  - Uses `Authorization: Bearer <client_id>` (from
    AuthenticationMiddleware)
  - Tracks usage via a SQLite-based KV store
  - Returns 429 when the quota is exceeded
- Extends `ServerConfig` with a `quota` section (type + config)
- Enforces strict coupling: quotas require authentication or the server
will fail to start
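The usage tracking described above can be sketched roughly as follows. This is a minimal illustration and not the PR's implementation: the class name `DailyQuotaCounter`, the `usage` table, and its columns are hypothetical, and it uses the standard-library `sqlite3` module directly rather than the KV store the PR mentions.

```python
import sqlite3
from datetime import date
from typing import Optional


class DailyQuotaCounter:
    """Hypothetical per-client daily request counter backed by SQLite."""

    def __init__(self, db_path: str = ":memory:", max_requests: int = 1000):
        self.max_requests = max_requests
        self.conn = sqlite3.connect(db_path)
        # One row per (client, calendar day); the table layout is an assumption.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS usage ("
            "client_id TEXT, day TEXT, count INTEGER, "
            "PRIMARY KEY (client_id, day))"
        )

    def allow(self, client_id: str, today: Optional[date] = None) -> bool:
        """Record one request and return False once the daily limit is reached."""
        day = (today or date.today()).isoformat()
        row = self.conn.execute(
            "SELECT count FROM usage WHERE client_id = ? AND day = ?",
            (client_id, day),
        ).fetchone()
        used = row[0] if row else 0
        if used >= self.max_requests:
            return False  # the caller would answer with HTTP 429
        self.conn.execute(
            "INSERT OR REPLACE INTO usage (client_id, day, count) VALUES (?, ?, ?)",
            (client_id, day, used + 1),
        )
        self.conn.commit()
        return True
```

Keying each row on the calendar day means a client's count starts from zero again on the next day without any cleanup job, which matches the `day` period described above.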
Behavior changes:
- Quotas are disabled by default unless explicitly configured
- SQLite defaults to `./quotas.db` if no DB path is set
- The server requires authentication when quotas are enabled
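The rules above (opt-in, `sqlite` only, `day` only, authentication required) could be checked at startup along these lines. This is a sketch under assumptions: `QuotaSettings` and `validate_quota` are hypothetical names and do not reflect the actual `ServerConfig` classes in the PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuotaSettings:
    """Hypothetical flattened view of the `quota` section."""

    max_requests: int
    type: str = "sqlite"
    db_path: str = "./quotas.db"  # default path stated in the PR description
    period: str = "day"


def validate_quota(
    quota: Optional[QuotaSettings], auth_enabled: bool
) -> Optional[QuotaSettings]:
    """Return the settings if usable, None if quotas are off, or raise."""
    if quota is None:
        return None  # quotas are opt-in: nothing configured, nothing enforced
    if not auth_enabled:
        raise ValueError("quotas require authentication to be configured")
    if quota.type != "sqlite":
        raise ValueError(f"unsupported quota type: {quota.type!r}")
    if quota.period != "day":
        raise ValueError(f"unsupported quota period: {quota.period!r}")
    return quota
```

Raising at validation time mirrors the described behavior that the server fails to start rather than silently running without limits.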
To enable per-client request quotas in `run.yaml`, add:
```
server:
  port: 8321
  auth:
    provider_type: "custom"
    config:
      endpoint: "https://auth.example.com/validate"
  quota:
    type: sqlite
    config:
      db_path: ./quotas.db
      limit:
        max_requests: 1000
        period: day
```
Closes #2093
Signed-off-by: Wen Liang <wenliang@redhat.com>
Co-authored-by: Wen Liang <wenliang@redhat.com>
parent 5a3d777b20
commit 2890243107
6 changed files with 363 additions and 1 deletions
@@ -113,6 +113,10 @@ class AuthenticationMiddleware:
                 "roles": [token],
             }
 
+        # Store the client ID in the request scope so that downstream middleware (like QuotaMiddleware)
+        # can identify the requester and enforce per-client rate limits.
+        scope["authenticated_client_id"] = token
+
         # Store attributes in request scope
         scope["user_attributes"] = user_attributes
         scope["principal"] = validation_result.principal
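The hunk above stores the token under `scope["authenticated_client_id"]`. A downstream ASGI middleware might consume that key roughly as sketched here. `SimpleQuotaMiddleware` is a hypothetical stand-in, not the PR's `QuotaMiddleware`: it counts in memory instead of SQLite, never resets the count, and passes requests without a client ID through unchanged, all of which are assumptions made to keep the example short.

```python
import json
from collections import defaultdict


class SimpleQuotaMiddleware:
    """Hypothetical ASGI middleware that limits requests per client ID."""

    def __init__(self, app, max_requests: int):
        self.app = app
        self.max_requests = max_requests
        self.counts = defaultdict(int)  # in-memory only; the PR uses SQLite

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        # Set by the authentication middleware earlier in the chain.
        client_id = scope.get("authenticated_client_id")
        if client_id is None:
            # Assumption: unauthenticated requests are not counted here.
            await self.app(scope, receive, send)
            return

        self.counts[client_id] += 1
        if self.counts[client_id] > self.max_requests:
            body = json.dumps({"error": "quota exceeded"}).encode()
            await send(
                {
                    "type": "http.response.start",
                    "status": 429,
                    "headers": [(b"content-type", b"application/json")],
                }
            )
            await send({"type": "http.response.body", "body": body})
            return

        await self.app(scope, receive, send)
```

Because the middleware depends on a key written by the authentication layer, it has to be installed so that authentication runs first, which is consistent with the requirement that quotas cannot be enabled without auth.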