llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-27 19:12:00 +00:00

Author	SHA1	Message	Date
Wen Liang	dacd522f57	feat(quota): support per‑client and anonymous server‑side request quotas Unrestricted API usage can lead to runaway costs and fragmented client-side throttling logic. This commit introduces a built-in quota mechanism at the server level, enabling operators to centrally enforce per-client and anonymous rate limits—without needing external proxies or client changes. This helps contain compute costs, enforces fair usage, and simplifies deployment and monitoring of Llama Stack services. Quotas are fully opt-in and have no effect unless explicitly configured. Currently, SQLite is the only supported KV store. If quotas are configured but authentication is disabled, authenticated limits will gracefully fall back to anonymous limits. Highlights: - Adds `QuotaMiddleware` to enforce request quotas: - Uses bearer token as client ID if present; otherwise falls back to IP address - Tracks requests in KV store with per-key TTL expiration - Returns HTTP 429 if a client exceeds their quota - Extends `ServerConfig` with a `quota` section: - `kvstore`: configuration for the backend (currently only SQLite) - `anonymous_max_requests`: per-period cap for unauthenticated clients - `authenticated_max_requests`: per-period cap for authenticated clients - `period`: duration of the quota window (currently only `day` is supported) - Adds full test coverage with FastAPI `TestClient` and custom middleware injection Behavior changes: - Quotas are disabled by default unless explicitly configured - Anonymous users get a conservative default quota; authenticated clients can be given more generous limits To enable per-client request quotas in `run.yaml`, add: ```yaml server: port: 8321 auth: provider_type: custom config: endpoint: https://auth.example.com/validate quota: kvstore: type: sqlite db_path: ./quotas.db anonymous_max_requests: 100 authenticated_max_requests: 1000 period: day ``` Signed-off-by: Wen Liang <wenliang@redhat.com>	2025-05-20 09:31:58 -04:00
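The quota behavior described in this commit can be sketched as a fixed-window counter. This is a minimal illustration, not the Stack's `QuotaMiddleware`: the `QuotaCounter` class, its method names, the key prefixes, and the in-memory dict standing in for the SQLite KV store are all assumptions made for the example.

```python
import time


class QuotaCounter:
    """Fixed-window request counter (hypothetical helper, not the Stack's QuotaMiddleware)."""

    def __init__(self, anonymous_max_requests, authenticated_max_requests,
                 period_seconds=86400, clock=time.time):
        self.anonymous_max_requests = anonymous_max_requests
        self.authenticated_max_requests = authenticated_max_requests
        self.period_seconds = period_seconds  # one day, matching `period: day`
        self.clock = clock  # injectable so the window can be tested without sleeping
        self._store = {}  # stand-in for the KV store: key -> (count, expires_at)

    def _key_and_limit(self, bearer_token, client_ip):
        # The bearer token identifies the client if present; otherwise use the IP.
        if bearer_token:
            return f"authenticated:{bearer_token}", self.authenticated_max_requests
        return f"anonymous:{client_ip}", self.anonymous_max_requests

    def check(self, bearer_token, client_ip):
        """Return the HTTP status a middleware would answer with: 200 or 429."""
        key, limit = self._key_and_limit(bearer_token, client_ip)
        now = self.clock()
        count, expires_at = self._store.get(key, (0, now + self.period_seconds))
        if now >= expires_at:
            # The per-key TTL has elapsed, so the window starts over.
            count, expires_at = 0, now + self.period_seconds
        if count >= limit:
            return 429
        self._store[key] = (count + 1, expires_at)
        return 200
```

With `anonymous_max_requests=100`, the 101st request from the same IP inside one period would get 429 in this sketch, while a client presenting a bearer token is counted under its own key against the authenticated limit.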
Ashwin Bharambe	c7015d3d60	feat: introduce OAuth2TokenAuthProvider and notion of "principal" (#2185) This PR adds a notion of `principal` (aka some kind of persistent identity) to the authentication infrastructure of the Stack. Until now we only used access attributes ("claims" in the more standard OAuth / OIDC setup) but we need the notion of a User fundamentally as well. (Thanks @rhuss for bringing this up.) This value is not yet _used_ anywhere downstream but will be used to segregate access to resources. In addition, the PR introduces a built-in JWT token validator so the Stack does not need to contact an authentication provider to validate the authorization and can merely check the signed token for the represented claims. Public keys are refreshed via the configured JWKS server. This Auth Provider should overwhelmingly be considered the default given the seamless integration it offers with OAuth setups.	2025-05-18 17:54:19 -07:00
Sébastien Han	79851d93aa	feat: Add Kubernetes authentication (#1778) # What does this PR do? This commit adds a new authentication system to the Llama Stack server with support for Kubernetes and custom authentication providers. Key changes include: - Implemented KubernetesAuthProvider for validating Kubernetes service account tokens - Implemented CustomAuthProvider for validating tokens against external endpoints - this is the same code that was already present. - Added test for Kubernetes - Updated server configuration to support authentication settings - Added documentation for authentication configuration and usage The authentication system supports: - Bearer token validation - Kubernetes service account token validation - Custom authentication endpoints ## Test Plan Set up a Kube cluster using Kind or Minikube. Run a server with: ``` server: port: 8321 auth: provider_type: kubernetes config: api_server_url: http://url ca_cert_path: path/to/cert (optional) ``` Run: ``` curl -s -L -H "Authorization: Bearer $(kubectl create token my-user)" http://127.0.0.1:8321/v1/providers ``` Or replace "my-user" with your service account. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-04-28 22:24:58 +02:00
Ashwin Bharambe	01a25d9744	feat(server): add attribute based access control for resources (#1703) This PR introduces a way to implement Attribute Based Access Control (ABAC) for the Llama Stack server. The rough design is: - https://github.com/meta-llama/llama-stack/pull/1626 added a way for the Llama Stack server to query an authenticator - We build upon that and expect "access attributes" as part of the response. These attributes indicate the scopes available for the request. - We use these attributes to perform access control for registered resources as well as for constructing the default access control policies for newly created resources. - By default, if you support authentication but don't return access attributes, we will add a unique namespace pointing to the API_KEY. That way, all resources by default will be scoped to API_KEYs. An important aspect of this design is that Llama Stack stays out of the business of credential management or the CRUD for attributes. How you manage your namespaces or projects is entirely up to you. The design only implements access control checks for the metadata / book-keeping information that the Stack tracks. ### Limitations - Currently, read vs. write vs. admin permissions aren't made explicit, but this can be easily extended by adding appropriate attributes to the `AccessAttributes` data structure. - This design does not apply to agent instances since they are not considered resources the Stack knows about. Agent instances are completely within the scope of the Agents API provider. ### Test Plan Added unit tests, existing integration tests	2025-03-19 21:28:52 -07:00
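One way to picture the attribute check from this commit is the sketch below. The commit message does not spell out the matching rule, so everything here is an assumption: attributes are modeled as a dict of category to list of values, access requires at least one shared value in every category the resource restricts, and the names `check_access`, `default_attributes`, and the `namespaces` category are hypothetical.

```python
def default_attributes(api_key):
    """Fallback when the authenticator returns no access attributes.

    Assumed shape: a single namespace derived from the API key, so that
    resources end up scoped to that key by default.
    """
    return {"namespaces": [f"api_key:{api_key}"]}


def check_access(resource_attributes, user_attributes):
    """Return True if the user may access the resource (assumed matching rule)."""
    if not resource_attributes:
        # Assumption: a resource without attributes is unrestricted.
        return True
    if not user_attributes:
        return False
    for category, required_values in resource_attributes.items():
        if not required_values:
            continue  # an empty category places no restriction
        user_values = user_attributes.get(category, [])
        # The user needs at least one value in common for this category.
        if not set(required_values) & set(user_values):
            return False
    return True
```

Under these assumptions, a resource created by a client with key `k1` would carry `default_attributes("k1")` and stay invisible to a client whose only namespace is derived from `k2`.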
Ashwin Bharambe	5b39d5a76a	feat(auth, rfc): Add support for Bearer (api_key) Authentication (#1626) This PR adds support for (or rather, is a proposal for) API key authentication on the Llama Stack server end. `llama-stack-client` already supports accepting an api_key parameter and passes it down through every request as an `Authorization: ` header. Currently, Llama Stack does not propose APIs for handling authentication or authorization for resources of any kind. Given that, and the fact that any deployment will typically have _some_ authentication system present, we simply adopt a delegation mechanism: delegate to an HTTPS endpoint performing key management / authentication. It is configured via: ```yaml server: auth: endpoint: <...> ``` in the run.yaml configuration. ## How It Works When authentication is enabled: 1. Every API request must include an `Authorization: Bearer <token>` header 2. The server will send a _POST_ validation request to the configured endpoint with the following payload: ```json { "api_key": "<token>", "request": { "path": "/api/path", "headers": { "header1": "value1", ... }, "params": { "param1": "value1", ... } } } ``` 3. If the authentication endpoint returns a 200 status code, the request is allowed to proceed 4. If the authentication endpoint returns any other status code, a 401 Unauthorized response is returned ## Test Plan Unit tests	2025-03-18 16:24:18 -07:00
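The four steps in this commit's "How It Works" section can be sketched roughly as follows. The function names and the injected `post_to_auth_endpoint` callable (which takes the payload and returns an HTTP status code) are hypothetical; the sketch also assumes that a missing or malformed header is rejected the same way as a failed validation.

```python
def extract_bearer_token(authorization_header):
    """Return the token from an `Authorization: Bearer <token>` value, or None."""
    if not authorization_header:
        return None
    scheme, _, token = authorization_header.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def build_validation_payload(token, path, headers, params):
    """Build the JSON body shown in the commit message."""
    return {
        "api_key": token,
        "request": {
            "path": path,
            "headers": dict(headers),
            "params": dict(params),
        },
    }


def is_authorized(authorization_header, path, headers, params, post_to_auth_endpoint):
    """Return True if the request may proceed; False maps to a 401 response."""
    token = extract_bearer_token(authorization_header)
    if token is None:
        return False  # assumption: no usable header is treated as unauthorized
    payload = build_validation_payload(token, path, headers, params)
    status = post_to_auth_endpoint(payload)  # hypothetical POST helper
    return status == 200  # any other status code means 401
```

Passing the POST helper in as a parameter keeps the sketch free of any HTTP client dependency and makes the decision logic easy to exercise with a stub.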

5 commits