Commit graph

11 commits

Author SHA1 Message Date
Wen Liang
dacd522f57 feat(quota): support per‑client and anonymous server‑side request quotas
Unrestricted API usage can lead to runaway costs and fragmented client-side
throttling logic. This commit introduces a built-in quota mechanism at the
server level, enabling operators to centrally enforce per-client and anonymous
rate limits—without needing external proxies or client changes.

This helps contain compute costs, enforces fair usage, and simplifies deployment
and monitoring of Llama Stack services. Quotas are fully opt-in and have no
effect unless explicitly configured.

Currently, SQLite is the only supported KV store. If quotas are
configured but authentication is disabled, authenticated limits will
gracefully fall back to anonymous limits.

Highlights:
- Adds `QuotaMiddleware` to enforce request quotas:
  - Uses bearer token as client ID if present; otherwise falls back to IP address
  - Tracks requests in KV store with per-key TTL expiration
  - Returns HTTP 429 if a client exceeds their quota

- Extends `ServerConfig` with a `quota` section:
  - `kvstore`: configuration for the backend (currently only SQLite)
  - `anonymous_max_requests`: per-period cap for unauthenticated clients
  - `authenticated_max_requests`: per-period cap for authenticated clients
  - `period`: duration of the quota window (currently only `day` is supported)

- Adds full test coverage with FastAPI `TestClient` and custom middleware injection

Behavior changes:
- Quotas are disabled by default unless explicitly configured
- Anonymous users get a conservative default quota; authenticated clients can be given more generous limits

To enable per-client request quotas in `run.yaml`, add:
```yaml
server:
  port: 8321
  auth:
    provider_type: custom
    config:
      endpoint: https://auth.example.com/validate
  quota:
    kvstore:
      type: sqlite
      db_path: ./quotas.db
    anonymous_max_requests: 100
    authenticated_max_requests: 1000
    period: day
```

Signed-off-by: Wen Liang <wenliang@redhat.com>
2025-05-20 09:31:58 -04:00
Sébastien Han
79851d93aa
feat: Add Kubernetes authentication (#1778)
# What does this PR do?

This commit adds a new authentication system to the Llama Stack server
with support for Kubernetes and custom authentication providers. Key
changes include:

- Implemented KubernetesAuthProvider for validating Kubernetes service
account tokens
- Implemented CustomAuthProvider for validating tokens against external
endpoints - this is the same code that was already present.
- Added test for Kubernetes
- Updated server configuration to support authentication settings
- Added documentation for authentication configuration and usage

The authentication system supports:
- Bearer token validation
- Kubernetes service account token validation
- Custom authentication endpoints

## Test Plan

Setup a Kube cluster using Kind or Minikube.

Run a server with:

```
server:
  port: 8321
  auth:
    provider_type: kubernetes
    config:
      api_server_url: http://url
      ca_cert_path: path/to/cert (optional)
```

Run:

```
curl -s -L -H "Authorization: Bearer $(kubectl create token my-user)" http://127.0.0.1:8321/v1/providers
```

Or replace "my-user" with your service account.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-04-28 22:24:58 +02:00
Francisco Arceo
49955a06b1
docs: Update quickstart page to structure things a little more for the novices (#1873)
# What does this PR do?
Another doc enhancement for
https://github.com/meta-llama/llama-stack/issues/1818

Summary of changes:
- `docs/source/distributions/configuration.md`
   - Updated dropdown title to include a more user-friendly description.

- `docs/_static/css/my_theme.css`
   - Added styling for `<h3>` elements to set a normal font weight.

- `docs/source/distributions/starting_llama_stack_server.md`
- Changed section headers from bold text to proper markdown headers
(e.g., `##`).
- Improved descriptions for starting Llama Stack server using different
methods (library, container, conda, Kubernetes).
- Enhanced clarity and structure by converting instructions into
markdown headers and improved formatting.

- `docs/source/getting_started/index.md`
   - Major restructuring of the "Quick Start" guide:
- Added new introductory section for Llama Stack and its capabilities.
- Reorganized steps into clearer subsections with proper markdown
headers.
- Replaced dropdowns with tabbed content for OS-specific instructions.
- Added detailed steps for setting up and running the Llama Stack server
and client.
- Introduced new sections for running basic inference and building
agents.
- Enhanced readability and visual structure with emojis, admonitions,
and examples.

- `docs/source/providers/index.md`
   - Updated the list of LLM inference providers to include "Ollama."
   - Expanded the list of vector databases to include "SQLite-Vec."

Let me know if you need further details!

## Test Plan
Renders locally, included screenshot.

# Documentation

For https://github.com/meta-llama/llama-stack/issues/1818

<img width="1332" alt="Screenshot 2025-04-09 at 11 07 12 AM"
src="https://github.com/user-attachments/assets/c106efb9-076c-4059-a4e0-a30fa738585b"
/>

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-04-10 14:09:00 -07:00
Francisco Arceo
d495922949
docs: Updated documentation and Sphinx configuration (#1845)
# What does this PR do?

The goal of this PR is to make the pages easier to navigate by surfacing
the child pages on the navbar, updating some of the copy, moving some of
the files around.

Some changes:
1. Clarifying Titles
2. Restructuring "Distributions" more formally in its own page to be
consistent with Providers and adding some clarity to the child pages to
surface them and make them easier to navigate
3. Updated sphinx config to not collapse navigation by default
4. Updated copyright year to be calculated dynamically 
5. Moved `docs/source/distributions/index.md` ->
`docs/source/distributions/starting_llama_stack_server.md`

Another for https://github.com/meta-llama/llama-stack/issues/1815

## Test Plan
Tested locally and pages build (screen shots for example).

## Documentation
###  Before:
![Screenshot 2025-03-31 at 1 09
21 PM](https://github.com/user-attachments/assets/98e34f76-f0d9-4055-8e2c-441b1e7d8f6a)

### After:
![Screenshot 2025-03-31 at 1 08
52 PM](https://github.com/user-attachments/assets/dfb6b8ad-3a1d-46b6-8f54-0c553664093f)

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-03-31 13:08:05 -07:00
Sixian Yi
531940aea9
script for running client sdk tests (#895)
# What does this PR do?
Create a script for running all client-sdk tests on Async Library
client, with the option to generate report


## Test Plan

```
python llama_stack/scripts/run_client_sdk_tests.py --templates together fireworks --report
```



## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-02-19 22:38:06 -08:00
Hardik Shah
74e933cbfd
More Updates to Read the Docs (#856) 2025-01-23 11:39:33 -08:00
Yuan Tang
7e1d628864
Fix some typos in distributions/providers docs (#603)
Fixed some typos that I spotted while reading the new/updated docs.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2024-12-11 10:10:52 -08:00
Ashwin Bharambe
1274fa4c0d Add documentations for building applications and with some content for agentic loop 2024-12-08 14:56:37 -08:00
Ashwin Bharambe
9ddda91180 Add Safety section for Configuration 2024-11-23 21:36:41 -08:00
Ashwin Bharambe
0758eb9cb1 Try new llama stack image 2024-11-23 00:03:42 -08:00
Ashwin Bharambe
c7bfac5382 Add a section for run.yamls 2024-11-22 22:58:39 -08:00