# What does this PR do?
Kubernetes since 1.20 exposes a JWKS endpoint that we can use with our
recent oauth2 recent implementation.
The CI test has been kept intact for validation.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
feat(quota): add server‑side per‑client request quotas (requires auth)
Unrestricted usage can lead to runaway costs and fragmented client-side
workarounds. This commit introduces a native quota mechanism to the
server, giving operators a unified, centrally managed throttle for
per-client requests—without needing extra proxies or custom client
logic. This helps contain cloud-compute expenses, enables fine-grained
usage control, and simplifies deployment and monitoring of Llama Stack
services. Quotas are fully opt-in and have no effect unless explicitly
configured.
Notice that Quotas are fully opt-in and require authentication to be
enabled. The 'sqlite' is the only supported quota `type` at this time,
any other `type` will be rejected. And the only supported `period` is
'day'.
Highlights:
- Adds `QuotaMiddleware` to enforce per-client request quotas:
- Uses `Authorization: Bearer <client_id>` (from
AuthenticationMiddleware)
- Tracks usage via a SQLite-based KV store
- Returns 429 when the quota is exceeded
- Extends `ServerConfig` with a `quota` section (type + config)
- Enforces strict coupling: quotas require authentication or the server
will fail to start
Behavior changes:
- Quotas are disabled by default unless explicitly configured
- SQLite defaults to `./quotas.db` if no DB path is set
- The server requires authentication when quotas are enabled
To enable per-client request quotas in `run.yaml`, add:
```
server:
port: 8321
auth:
provider_type: "custom"
config:
endpoint: "https://auth.example.com/validate"
quota:
type: sqlite
config:
db_path: ./quotas.db
limit:
max_requests: 1000
period: day
[//]: # (If resolving an issue, uncomment and update the line below)
Closes#2093
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
[//]: # (## Documentation)
Signed-off-by: Wen Liang <wenliang@redhat.com>
Co-authored-by: Wen Liang <wenliang@redhat.com>
This PR adds a notion of `principal` (aka some kind of persistent
identity) to the authentication infrastructure of the Stack. Until now
we only used access attributes ("claims" in the more standard OAuth /
OIDC setup) but we need the notion of a User fundamentally as well.
(Thanks @rhuss for bringing this up.)
This value is not yet _used_ anywhere downstream but will be used to
segregate access to resources.
In addition, the PR introduces a built-in JWT token validator so the
Stack does not need to contact an authentication provider to validating
the authorization and merely check the signed token for the represented
claims. Public keys are refreshed via the configured JWKS server. This
Auth Provider should overwhelmingly be considered the default given the
seamless integration it offers with OAuth setups.
# What does this PR do?
This commit adds a new authentication system to the Llama Stack server
with support for Kubernetes and custom authentication providers. Key
changes include:
- Implemented KubernetesAuthProvider for validating Kubernetes service
account tokens
- Implemented CustomAuthProvider for validating tokens against external
endpoints - this is the same code that was already present.
- Added test for Kubernetes
- Updated server configuration to support authentication settings
- Added documentation for authentication configuration and usage
The authentication system supports:
- Bearer token validation
- Kubernetes service account token validation
- Custom authentication endpoints
## Test Plan
Setup a Kube cluster using Kind or Minikube.
Run a server with:
```
server:
port: 8321
auth:
provider_type: kubernetes
config:
api_server_url: http://url
ca_cert_path: path/to/cert (optional)
```
Run:
```
curl -s -L -H "Authorization: Bearer $(kubectl create token my-user)" http://127.0.0.1:8321/v1/providers
```
Or replace "my-user" with your service account.
Signed-off-by: Sébastien Han <seb@redhat.com>
This PR introduces a way to implement Attribute Based Access Control
(ABAC) for the Llama Stack server.
The rough design is:
- https://github.com/meta-llama/llama-stack/pull/1626 added a way for
the Llama Stack server to query an authenticator
- We build upon that and expect "access attributes" as part of the
response. These attributes indicate the scopes available for the
request.
- We use these attributes to perform access control for registered
resources as well as for constructing the default access control
policies for newly created resources.
- By default, if you support authentication but don't return access
attributes, we will add a unique namespace pointing to the API_KEY. That
way, all resources by default will be scoped to API_KEYs.
An important aspect of this design is that Llama Stack stays out of the
business of credential management or the CRUD for attributes. How you
manage your namespaces or projects is entirely up to you. The design
only implements access control checks for the metadata / book-keeping
information that the Stack tracks.
### Limitations
- Currently, read vs. write vs. admin permissions aren't made explicit,
but this can be easily extended by adding appropriate attributes to the
`AccessAttributes` data structure.
- This design does not apply to agent instances since they are not
considered resources the Stack knows about. Agent instances are
completely within the scope of the Agents API provider.
### Test Plan
Added unit tests, existing integration tests
This PR adds support (or is a proposal for) for supporting API KEY
authentication on the Llama Stack server end. `llama-stack-client`
already supports accepting an api_key parameter and passes it down
through every request as an `Authentication: ` header.
Currently, Llama Stack does not propose APIs for handling authentication
or authorization for resources of any kind. Given that, and the fact
that any deployment will typically have _some_ authentication system
present, we simply adopt a delegation mechanism: delegate to an HTTPS
endpoint performing key management / authentication.
It is configured via:
```yaml
server:
auth:
endpoint: <...>
```
in the run.yaml configuration.
## How It Works
When authentication is enabled:
1. Every API request must include an `Authorization: Bearer <token>`
header
2. The server will send a _POST_ validation request to the configured
endpoint with the following payload:
```json
{
"api_key": "<token>",
"request": {
"path": "/api/path",
"headers": { "header1": "value1", ... },
"params": { "param1": "value1", ... }
}
}
```
3. If the authentication endpoint returns a 200 status code, the request
is allowed to proceed
4. If the authentication endpoint returns any other status code, a 401
Unauthorized response is returned
## Test Plan
Unit tests