llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-06-28 02:53:30 +00:00

Author	SHA1	Message	Date
Sébastien Han	1862de4be5	chore: clarify cache_ttl to be key_recheck_period (#2220 ) # What does this PR do? The cache_ttl config value is not in fact tied to the lifetime of any of the keys, it represents the time interval between for our key cache refresher. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-05-21 17:30:23 +02:00
Sébastien Han	c25acedbcd	chore: remove k8s auth in favor of k8s jwks endpoint (#2216 ) # What does this PR do? Kubernetes since 1.20 exposes a JWKS endpoint that we can use with our recent oauth2 recent implementation. The CI test has been kept intact for validation. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-05-21 16:23:54 +02:00
liangwen12year	2890243107	feat(quota): add server‑side per‑client request quotas (requires auth) (#2096 ) # What does this PR do? feat(quota): add server‑side per‑client request quotas (requires auth) Unrestricted usage can lead to runaway costs and fragmented client-side workarounds. This commit introduces a native quota mechanism to the server, giving operators a unified, centrally managed throttle for per-client requests—without needing extra proxies or custom client logic. This helps contain cloud-compute expenses, enables fine-grained usage control, and simplifies deployment and monitoring of Llama Stack services. Quotas are fully opt-in and have no effect unless explicitly configured. Notice that Quotas are fully opt-in and require authentication to be enabled. The 'sqlite' is the only supported quota `type` at this time, any other `type` will be rejected. And the only supported `period` is 'day'. Highlights: - Adds `QuotaMiddleware` to enforce per-client request quotas: - Uses `Authorization: Bearer <client_id>` (from AuthenticationMiddleware) - Tracks usage via a SQLite-based KV store - Returns 429 when the quota is exceeded - Extends `ServerConfig` with a `quota` section (type + config) - Enforces strict coupling: quotas require authentication or the server will fail to start Behavior changes: - Quotas are disabled by default unless explicitly configured - SQLite defaults to `./quotas.db` if no DB path is set - The server requires authentication when quotas are enabled To enable per-client request quotas in `run.yaml`, add: ``` server: port: 8321 auth: provider_type: "custom" config: endpoint: "https://auth.example.com/validate" quota: type: sqlite config: db_path: ./quotas.db limit: max_requests: 1000 period: day [//]: # (If resolving an issue, uncomment and update the line below) Closes #2093 ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] [//]: # (## Documentation) Signed-off-by: Wen Liang <wenliang@redhat.com> Co-authored-by: Wen Liang <wenliang@redhat.com>	2025-05-21 10:58:45 +02:00
Sébastien Han	79851d93aa	feat: Add Kubernetes authentication (#1778 ) # What does this PR do? This commit adds a new authentication system to the Llama Stack server with support for Kubernetes and custom authentication providers. Key changes include: - Implemented KubernetesAuthProvider for validating Kubernetes service account tokens - Implemented CustomAuthProvider for validating tokens against external endpoints - this is the same code that was already present. - Added test for Kubernetes - Updated server configuration to support authentication settings - Added documentation for authentication configuration and usage The authentication system supports: - Bearer token validation - Kubernetes service account token validation - Custom authentication endpoints ## Test Plan Setup a Kube cluster using Kind or Minikube. Run a server with: ``` server: port: 8321 auth: provider_type: kubernetes config: api_server_url: http://url ca_cert_path: path/to/cert (optional) ``` Run: ``` curl -s -L -H "Authorization: Bearer $(kubectl create token my-user)" http://127.0.0.1:8321/v1/providers ``` Or replace "my-user" with your service account. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-04-28 22:24:58 +02:00
Francisco Arceo	49955a06b1	docs: Update quickstart page to structure things a little more for the novices (#1873 ) # What does this PR do? Another doc enhancement for https://github.com/meta-llama/llama-stack/issues/1818 Summary of changes: - `docs/source/distributions/configuration.md` - Updated dropdown title to include a more user-friendly description. - `docs/_static/css/my_theme.css` - Added styling for `<h3>` elements to set a normal font weight. - `docs/source/distributions/starting_llama_stack_server.md` - Changed section headers from bold text to proper markdown headers (e.g., `##`). - Improved descriptions for starting Llama Stack server using different methods (library, container, conda, Kubernetes). - Enhanced clarity and structure by converting instructions into markdown headers and improved formatting. - `docs/source/getting_started/index.md` - Major restructuring of the "Quick Start" guide: - Added new introductory section for Llama Stack and its capabilities. - Reorganized steps into clearer subsections with proper markdown headers. - Replaced dropdowns with tabbed content for OS-specific instructions. - Added detailed steps for setting up and running the Llama Stack server and client. - Introduced new sections for running basic inference and building agents. - Enhanced readability and visual structure with emojis, admonitions, and examples. - `docs/source/providers/index.md` - Updated the list of LLM inference providers to include "Ollama." - Expanded the list of vector databases to include "SQLite-Vec." Let me know if you need further details! ## Test Plan Renders locally, included screenshot. # Documentation For https://github.com/meta-llama/llama-stack/issues/1818 <img width="1332" alt="Screenshot 2025-04-09 at 11 07 12 AM" src="https://github.com/user-attachments/assets/c106efb9-076c-4059-a4e0-a30fa738585b" /> --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-04-10 14:09:00 -07:00
Francisco Arceo	d495922949	docs: Updated documentation and Sphinx configuration (#1845 ) # What does this PR do? The goal of this PR is to make the pages easier to navigate by surfacing the child pages on the navbar, updating some of the copy, moving some of the files around. Some changes: 1. Clarifying Titles 2. Restructuring "Distributions" more formally in its own page to be consistent with Providers and adding some clarity to the child pages to surface them and make them easier to navigate 3. Updated sphinx config to not collapse navigation by default 4. Updated copyright year to be calculated dynamically 5. Moved `docs/source/distributions/index.md` -> `docs/source/distributions/starting_llama_stack_server.md` Another for https://github.com/meta-llama/llama-stack/issues/1815 ## Test Plan Tested locally and pages build (screen shots for example). ## Documentation ### Before: ![Screenshot 2025-03-31 at 1 09 21 PM](https://github.com/user-attachments/assets/98e34f76-f0d9-4055-8e2c-441b1e7d8f6a) ### After: ![Screenshot 2025-03-31 at 1 08 52 PM](https://github.com/user-attachments/assets/dfb6b8ad-3a1d-46b6-8f54-0c553664093f) Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-03-31 13:08:05 -07:00
Sixian Yi	531940aea9	script for running client sdk tests (#895 ) # What does this PR do? Create a script for running all client-sdk tests on Async Library client, with the option to generate report ## Test Plan ``` python llama_stack/scripts/run_client_sdk_tests.py --templates together fireworks --report ``` ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-02-19 22:38:06 -08:00
Hardik Shah	74e933cbfd	More Updates to Read the Docs (#856 )	2025-01-23 11:39:33 -08:00
Yuan Tang	7e1d628864	Fix some typos in distributions/providers docs (#603 ) Fixed some typos that I spotted while reading the new/updated docs. Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2024-12-11 10:10:52 -08:00
Ashwin Bharambe	1274fa4c0d	Add documentations for building applications and with some content for agentic loop	2024-12-08 14:56:37 -08:00
Ashwin Bharambe	9ddda91180	Add Safety section for Configuration	2024-11-23 21:36:41 -08:00
Ashwin Bharambe	0758eb9cb1	Try new llama stack image	2024-11-23 00:03:42 -08:00
Ashwin Bharambe	c7bfac5382	Add a section for run.yamls	2024-11-22 22:58:39 -08:00

13 commits