Add support for deleting individual chunks from vector stores
- Add abstract remove_chunk() method to EmbeddingIndex base class
- Implement chunk deletion for Faiss provider, SQLite Vec, Milvus,
PGVector
- Add placeholder implementations with NotImplementedError for
Chroma/Qdrant/Weaviate
- Integrate chunk deletion into OpenAI vector store file deletion flow
- Remove xfail from
`test_openai_vector_store_delete_file_removes_from_vector_store`
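For reference, a minimal sketch of the new abstract hook, assuming the `EmbeddingIndex` base class lives alongside the provider indexes (the exact signature may differ):

```python
from abc import ABC, abstractmethod


class EmbeddingIndex(ABC):
    @abstractmethod
    async def remove_chunk(self, chunk_id: str) -> None:
        """Delete a single chunk from the index.

        Providers without support (Chroma/Qdrant/Weaviate for now)
        raise NotImplementedError.
        """
        ...
```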
Closes: #2477
---------
Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
# What does this PR do?
Enable Chroma inline unit tests and fix integration tests.
## Test Plan
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Avoid the error message:
```
INFO 2025-07-24 21:51:54,530 __main__:598 server: Received interrupt signal, shutting down gracefully...
ERROR 2025-07-24 21:51:54,692 asyncio:1826 uncategorized: Task was destroyed but it is pending!
task: <Task pending name='Task-15' coro=<refresh_registry() running at
/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/stack.py:356> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=>
```
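A hedged sketch of the shutdown-side fix, assuming the `refresh_registry()` coroutine is kept as a task handle; illustrative, not the exact stack.py code:

```python
import asyncio


async def shutdown_stack(refresh_task: asyncio.Task) -> None:
    # Cancel the background registry-refresh task and wait for it to
    # unwind, so the loop never tears down a still-pending task.
    refresh_task.cancel()
    try:
        await refresh_task
    except asyncio.CancelledError:
        pass
```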
# What does this PR do?
Today, external providers are installed via the `external_providers_dir`
in the config. This necessitates users to understand the `ProviderSpec`
and set up their directories accordingly. This process splits up the
config for the stack across multiple files, directories, and formats.
Most (if not all) external providers today have a
[get_provider_spec](559cb18fbb/src/ramalama_stack/provider.py (L9))
method that sits unused. Utilizing this method rather than the
providers.d route allows for a much easier installation process for
external providers and limits the amount of extra configuration a
regular user has to do to get their stack off the ground.
To accomplish this and wire it throughout the build process, introduce
the concept of a `module` for users to specify for an external provider
at build time. To facilitate this, align the build and run specs to use
the `Provider` class rather than the stringified provider_type that
build currently uses.
For example, say this is in your build config:
```
- provider_id: ramalama
provider_type: remote::ramalama
module: ramalama_stack
```
During build (in the various `build_...` scripts), in addition to
installing any pip dependencies, we will also install this module and
use its `get_provider_spec` method to retrieve the `ProviderSpec` that
is currently specified using `providers.d`.
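For illustration, a hypothetical external provider module exposing `get_provider_spec`; the datatype names follow llama_stack's provider datatypes, but the fields shown are illustrative, not authoritative:

```python
# my_provider/provider.py -- hypothetical external provider package
from llama_stack.providers.datatypes import AdapterSpec, Api, remote_provider_spec


def get_provider_spec():
    # The same information providers.d used to carry as YAML,
    # now discoverable by importing the installed module.
    return remote_provider_spec(
        api=Api.inference,
        adapter=AdapterSpec(
            adapter_type="my_provider",
            pip_packages=["my-provider-sdk"],
            module="my_provider",
            config_class="my_provider.config.MyProviderConfig",
        ),
    )
```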
In production so far, providing instructions for installing external
providers for users has been difficult: they need to install the module
as a pre-req, create the providers.d directory, copy in the provider
spec, and also copy in the necessary build/run yaml files. Accessing an
external provider should be as easy as possible, and pointing to its
installable module aligns more with the rest of our build and dependency
management process.
For now, `external_providers_dir` still exists as an alternate, more
declarative method of using external providers.
## Test Plan
added an integration test installing an external provider from module
and more unit test coverage for `get_provider_registry`
(The warning in yellow is expected: the module is installed inside of
the build env, not where we are running the command.)
<img width="1119" height="400" alt="Screenshot 2025-07-24 at 11 30
48 AM"
src="https://github.com/user-attachments/assets/1efbaf45-b9e8-451a-bd63-264ed664706d"
/>
<img width="1154" height="618" alt="Screenshot 2025-07-24 at 11 31
14 AM"
src="https://github.com/user-attachments/assets/feb2b3ea-c5dd-418e-9662-9a3bd5dd6bdc"
/>
---------
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Bumps [form-data](https://github.com/form-data/form-data) from 4.0.2 to
4.0.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/form-data/form-data/releases">form-data's
releases</a>.</em></p>
<blockquote>
<h2>v4.0.4</h2>
<h2><a
href="https://github.com/form-data/form-data/compare/v4.0.3...v4.0.4">v4.0.4</a>
- 2025-07-16</h2>
<h3>Commits</h3>
<ul>
<li>[meta] add <code>auto-changelog</code> <a
href="811f68282f"><code>811f682</code></a></li>
<li>[Tests] handle predict-v8-randomness failures in node < 17 and
node > 23 <a
href="1d11a76434"><code>1d11a76</code></a></li>
<li>[Fix] Switch to using <code>crypto</code> random for boundary values
<a
href="3d1723080e"><code>3d17230</code></a></li>
<li>[Tests] fix linting errors <a
href="5e340800b5"><code>5e34080</code></a></li>
<li>[meta] actually ensure the readme backup isn’t published <a
href="316c82ba93"><code>316c82b</code></a></li>
<li>[Dev Deps] update <code>@ljharb/eslint-config</code> <a
href="58c25d7640"><code>58c25d7</code></a></li>
<li>[meta] fix readme capitalization <a
href="2300ca1959"><code>2300ca1</code></a></li>
</ul>
<h2>v4.0.3</h2>
<h2><a
href="https://github.com/form-data/form-data/compare/v4.0.2...v4.0.3">v4.0.3</a>
- 2025-06-05</h2>
<h3>Fixed</h3>
<ul>
<li>[Fix] <code>append</code>: avoid a crash on nullish values <a
href="https://redirect.github.com/form-data/form-data/issues/577"><code>#577</code></a></li>
</ul>
<h3>Commits</h3>
<ul>
<li>[eslint] use a shared config <a
href="426ba9ac44"><code>426ba9a</code></a></li>
<li>[eslint] fix some spacing issues <a
href="20941917f0"><code>2094191</code></a></li>
<li>[Refactor] use <code>hasown</code> <a
href="81ab41b46f"><code>81ab41b</code></a></li>
<li>[Fix] validate boundary type in <code>setBoundary()</code> method <a
href="8d8e469309"><code>8d8e469</code></a></li>
<li>[Tests] add tests to check the behavior of <code>getBoundary</code>
with non-strings <a
href="837b8a1f75"><code>837b8a1</code></a></li>
<li>[Dev Deps] remove unused deps <a
href="870e4e6659"><code>870e4e6</code></a></li>
<li>[meta] remove local commit hooks <a
href="e6e83ccb54"><code>e6e83cc</code></a></li>
<li>[Dev Deps] update <code>eslint</code> <a
href="4066fd6f65"><code>4066fd6</code></a></li>
<li>[meta] fix scripts to use prepublishOnly <a
href="c4bbb13c0e"><code>c4bbb13</code></a></li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="41996f5ac7"><code>41996f5</code></a>
v4.0.4</li>
<li><a
href="316c82ba93"><code>316c82b</code></a>
[meta] actually ensure the readme backup isn’t published</li>
<li><a
href="2300ca1959"><code>2300ca1</code></a>
[meta] fix readme capitalization</li>
<li><a
href="811f68282f"><code>811f682</code></a>
[meta] add <code>auto-changelog</code></li>
<li><a
href="5e340800b5"><code>5e34080</code></a>
[Tests] fix linting errors</li>
<li><a
href="1d11a76434"><code>1d11a76</code></a>
[Tests] handle predict-v8-randomness failures in node < 17 and node
> 23</li>
<li><a
href="58c25d7640"><code>58c25d7</code></a>
[Dev Deps] update <code>@ljharb/eslint-config</code></li>
<li><a
href="3d1723080e"><code>3d17230</code></a>
[Fix] Switch to using <code>crypto</code> random for boundary
values</li>
<li><a
href="d8d67dc8ac"><code>d8d67dc</code></a>
v4.0.3</li>
<li><a
href="e6e83ccb54"><code>e6e83cc</code></a>
[meta] remove local commit hooks</li>
<li>Additional commits viewable in <a
href="https://github.com/form-data/form-data/compare/v4.0.2...v4.0.4">compare
view</a></li>
</ul>
</details>
<br />
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/meta-llama/llama-stack/network/alerts).
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
# What does this PR do?
- Added ability to specify `required_scope` when declaring an API. This
is part of the `@webmethod` decorator.
- If auth is enabled, a user can access an API only if
`user.attributes['scope']` includes the `required_scope`
- We add `required_scope='telemetry.read'` to the telemetry read APIs.
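For example, a hedged sketch of an API declaration carrying the new scope; the route and method signature are illustrative, and the `webmethod` import path is assumed:

```python
from llama_stack.schema_utils import webmethod


class TelemetryAPI:
    # Callers need "telemetry.read" in user.attributes["scope"]
    # when auth is enabled; otherwise the server returns 403.
    @webmethod(route="/telemetry/traces", method="POST", required_scope="telemetry.read")
    async def query_traces(self) -> list[dict]:
        ...
```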
## Test Plan
CI with added tests
1. Enable server.auth with github token
2. Observe `client.telemetry.query_traces()` returns 403
# What does this PR do?
This PR adds support for the new Streamable HTTP transport for MCP, as
well as falling back to the SSE protocol if the Streamable HTTP
connection fails.
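A hedged sketch of the fallback logic; `connect_streamable_http` and `connect_sse` are hypothetical stand-ins for the real MCP client transports, not the SDK's actual API:

```python
# Hypothetical helper names; the real MCP SDK transports differ.
async def connect_mcp(url: str):
    try:
        return await connect_streamable_http(url)  # preferred transport
    except Exception:
        # Older MCP servers only speak SSE; fall back transparently.
        return await connect_sse(url)
```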
Closes #2542
## Test Plan
---------
Signed-off-by: Calum Murray <cmurray@redhat.com>
# What does this PR do?
Prototype on a new feature to allow new APIs to be plugged in Llama
Stack. Opened for early feedback on the approach and test appetite on
the functionality.
@ashwinb @raghotham open for early feedback, thanks!
---------
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
# What does this PR do?
currently `print` is being used with custom formatting to achieve
telemetry output in the console_span_processor
This causes telemetry not to show up in log files when using
`LLAMA_STACK_LOG_FILE`. During testing it looks like telemetry is not
being captured when it is
switch to using Rich formatting with the logger and then strip the
formatting off when a log file is being used so the formatting looks
normal
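A hedged sketch of the stripping step, assuming ANSI escape codes are what would otherwise land in the file handler; the regex-based formatter is illustrative:

```python
import logging
import re

# Matches ANSI color/style escape sequences emitted by Rich.
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")


class StripAnsiFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        # Console handlers keep the colors; file handlers use this
        # formatter so log files stay plain text.
        return ANSI_RE.sub("", super().format(record))
```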
## Test Plan
before:
console:
<img width="967" height="127" alt="Screenshot 2025-07-21 at 4 02 15 PM"
src="https://github.com/user-attachments/assets/b09518cc-9d38-4970-9877-70e2c41fcbb5"
/>
log file (no telemetry):
```
2025-07-21 16:01:32,481 llama_stack.providers.remote.inference.ollama.ollama:117 inference: checking connectivity to Ollama at `http://localhost:11434`...
2025-07-21 16:01:34,779 opentelemetry.trace:537 uncategorized: Overriding of current TracerProvider is not allowed
2025-07-21 16:01:35,083 __main__:587 server: Listening on ['::', '0.0.0.0']:8321
2025-07-21 16:01:35,091 uvicorn.error:84 uncategorized: Started server process [68679]
2025-07-21 16:01:35,091 uvicorn.error:48 uncategorized: Waiting for application startup.
2025-07-21 16:01:35,092 __main__:163 server: Starting up
2025-07-21 16:01:35,092 uvicorn.error:62 uncategorized: Application startup complete.
2025-07-21 16:01:35,092 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
2025-07-21 16:01:37,167 uvicorn.access:473 uncategorized: 127.0.0.1:53145 - "POST /v1/openai/v1/chat/completions HTTP/1.1" 200
```
after:
console:
<img width="797" height="165" alt="Screenshot 2025-07-22 at 3 28 44 PM"
src="https://github.com/user-attachments/assets/44d40e3b-6502-439d-9ea5-38058b289962"
/>
log file:
```
2025-07-21 15:59:51,481 llama_stack.providers.remote.inference.ollama.ollama:117 inference: checking connectivity to Ollama at `http://localhost:11434`...
2025-07-21 15:59:53,801 opentelemetry.trace:537 uncategorized: Overriding of current TracerProvider is not allowed
2025-07-21 15:59:54,059 __main__:587 server: Listening on ['::', '0.0.0.0']:8321
2025-07-21 15:59:54,066 uvicorn.error:84 uncategorized: Started server process [68578]
2025-07-21 15:59:54,067 uvicorn.error:48 uncategorized: Waiting for application startup.
2025-07-21 15:59:54,067 __main__:163 server: Starting up
2025-07-21 15:59:54,067 uvicorn.error:62 uncategorized: Application startup complete.
2025-07-21 15:59:54,068 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
2025-07-21 15:59:55,381 [TELEMETRY] 19:59:55.381 /v1/openai/v1/chat/completions
2025-07-21 15:59:55,619 uvicorn.access:473 uncategorized: 127.0.0.1:53102 - "POST /v1/openai/v1/chat/completions HTTP/1.1" 200
2025-07-21 15:59:55,621 [TELEMETRY] 19:59:55.621 /v1/openai/v1/chat/completions [StatusCode.OK] (240.07ms)
2025-07-21 15:59:55,622 [TELEMETRY] 19:59:55.620 127.0.0.1:53102 - "POST /v1/openai/v1/chat/completions HTTP/1.1" 200
```
Signed-off-by: Charlie Doern <cdoern@redhat.com>
This flips #2823 and #2805 by making the Stack periodically query the
providers for models rather than the providers going behind its back
and calling "register" on the registry themselves. This also adds
support for model listing for all other providers via
`ModelRegistryHelper`.
Once this is done, we do not need to manually list or register models
via `run.yaml` and it will remove both noise and annoyance (setting
`INFERENCE_MODEL` environment variables, for example) from the new user
experience.
In addition, it adds a configuration variable `allowed_models` which can
be used to optionally restrict the set of models exposed from a
provider.
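A hedged sketch of the new knob, assuming it lands on the provider's config model; the name comes from this description, but the placement and types are illustrative:

```python
from pydantic import BaseModel


class ProviderConfig(BaseModel):
    # When set, only these model IDs are exposed from the provider;
    # None means "expose everything the provider lists".
    allowed_models: list[str] | None = None
```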
# What does this PR do?
Adds type guards in /distribution/inspect.py and ignores a valid-type
mypy error in library_client.py. This PR is part of issue #2647. I'm
rather unsure whether ignoring the valid-type error is correct in this
case. It appears that `args[0]` is interpreted as `[any]`, but I didn't
find any way to specify the type.
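For context, a minimal `TypeGuard` sketch of the pattern used (illustrative, not the exact code in inspect.py):

```python
from typing import Any, TypeGuard


def is_str_list(values: list[Any]) -> TypeGuard[list[str]]:
    # On the True branch, mypy narrows `values` to list[str].
    return all(isinstance(v, str) for v in values)
```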
# What does this PR do?
openai/models.py has backward-compat entries for litellm model names.
The starter template includes these in the list of registered models,
and their inclusion results in duplicate model registrations. The
backward compat is no longer necessary.
## Test Plan
CI
# What does this PR do?
This PR implements the openai compatible endpoints for chromadb
Closes #2462
## Test Plan
Ran ollama llama stack server and ran the command
`pytest -sv --stack-config=http://localhost:8321
tests/integration/vector_io/test_openai_vector_stores.py
--embedding-model all-MiniLM-L6-v2`
8 failed, 27 passed, 8 skipped, 1 xfailed.
The failures are related to the files API.
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: sarthakdeshpande <sarthak.deshpande@engati.com>
Co-authored-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
- Use printf to escape special characters (e.g. `<`, `>`)
- Apply escaping to pip_dependencies and special_pip_deps
Resolves shell interpretation of `>=` operators as redirections, which
was causing builds to ignore version constraints and create unexpected
files in the /app directory.
Closes: #2866
## Test Plan
Manually tested, will also be tested by existing CI
Signed-off-by: Derek Higgins <derekh@redhat.com>
- rm TEMP_DIR when build_container.sh succeeds
- prevents multiple temp directories with Containerfile being left in
/tmp
Signed-off-by: Derek Higgins <derekh@redhat.com>
# What does this PR do?
I noticed a few issues with my implementation of the search mode
validation for RagQuery.
This PR replaces the check for search mode in RagQuery with a Literal.
There were issues before with
```
TypeError: Object of type RAGSearchMode is not JSON serializable
```
When using
```
query_config = RAGQueryConfig(max_chunks=6, mode="vector").model_dump()
```
It also fixes a bug where, despite user input, "vector" was always the
search mode actually used.
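A hedged sketch of the change, assuming `RAGQueryConfig` is a pydantic model; the field set and defaults are illustrative:

```python
from typing import Literal

from pydantic import BaseModel


class RAGQueryConfig(BaseModel):
    max_chunks: int = 5
    # A Literal serializes as a plain string (unlike the old enum),
    # and pydantic rejects any value outside the allowed set.
    mode: Literal["vector", "keyword"] = "vector"
```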
## Test Plan
Verify that a chosen search mode works when using RAG query, or use the
agent config below:
```
agent = Agent(
client,
model=model_id,
instructions="You are a helpful assistant",
tools=[
{
"name": "builtin::rag/knowledge_search",
"args": {
"vector_db_ids": [vector_db_id],
"query_config": {
"mode": "keyword",
"max_chunks": 6
}
},
}
],
)
```
Running Unit Tests:
```
uv sync --extra dev
uv run pytest tests/unit/rag/test_rag_query.py -v
```
# What does this PR do?
Moving vector store and vector store files helper methods to
`openai_vector_store_mixin.py`
## Test Plan
The tests are already supported in the CI, which tests the inline
providers and the current integration tests.
Note that the `vector_index` fixture will test `milvus_vec_adapter`,
`faiss_vec_adapter`, and `sqlite_vec_adapter` in
`tests/unit/providers/vector_io/test_vector_io_openai_vector_stores.py`.
Additionally, the integration tests in `integration-vector-io-tests.yml`
run `tests/integration/vector_io` tests for the following providers:
```python
vector-io-provider: ["inline::faiss", "inline::sqlite-vec", "inline::milvus", "remote::chromadb", "remote::pgvector"]
```
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
Add an `OpenAIMixin` for use by inference providers whose remote
endpoints support an OpenAI-compatible API.
Use is demonstrated by refactoring:
- OpenAIInferenceAdapter
- NVIDIAInferenceAdapter (adds embedding support)
- LlamaCompatInferenceAdapter
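A hedged sketch of the intended usage pattern; the import path, hook names, and adapter class are illustrative:

```python
# Import path illustrative; OpenAIMixin is the new helper from this PR.
from llama_stack.providers.utils.inference.openai_mixin import OpenAIMixin


class MyInferenceAdapter(OpenAIMixin):
    # The mixin is assumed to construct the OpenAI client from these
    # two hooks and serve the openai_* endpoints through it.
    def get_api_key(self) -> str:
        return "sk-..."

    def get_base_url(self) -> str:
        return "https://api.example.com/v1"
```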
## Test Plan
existing unit and integration tests
# What does this PR do?
https://github.com/meta-llama/llama-stack/pull/2716/ broke commands
like:
```
python -m llama_stack.distribution.server.server --config
llama_stack/templates/starter/run.yaml
```
And will fail with:
```
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py", line 626, in <module>
main()
File "/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py", line 402, in main
config_file = resolve_config_or_template(args.config, Mode.RUN)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/utils/config_resolution.py", line 43, in resolve_config_or_template
config_path = Path(config_or_template)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.12/3.12.8/Frameworks/Python.framework/Versions/3.12/lib/python3.12/pathlib.py", line 1162, in __init__
super().__init__(*args)
File "/opt/homebrew/Cellar/python@3.12/3.12.8/Frameworks/Python.framework/Versions/3.12/lib/python3.12/pathlib.py", line 373, in __init__
raise TypeError(
TypeError: argument should be a str or an os.PathLike object where __fspath__ returns a str, not 'NoneType'
```
The traceback complains that no positional argument is present. We now
honour the deprecation until `--config` and `--template` are removed
completely.
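A hedged sketch of honouring both forms, assuming an argparse setup with an optional positional argument; the flag handling is illustrative:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("config_or_template", nargs="?", default=None)
parser.add_argument("--config", default=None, help="(deprecated) use the positional form")
args = parser.parse_args()

# Honour the deprecated flag rather than handing None to Path().
config = args.config_or_template or args.config
```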
## Test Plan
Both ` python -m llama_stack.distribution.server.server --config
llama_stack/templates/starter/run.yaml` and ` python -m
llama_stack.distribution.server.server
llama_stack/templates/starter/run.yaml` should run the server. Same for
`--template starter`.
Signed-off-by: Sébastien Han <seb@redhat.com>
- Remove --no-cache flags from uv pip install commands to enable caching
- Mount host uv cache directory to container for persistent caching
- Set UV_LINK_MODE=copy to prevent uv using hardlinks
- When building the starter image:
  - Build time reduced from ~4:45 to ~3:05 on subsequent builds (environment specific)
  - Eliminates re-downloading of 3G+ of data on each build
  - Cache size: ~6.2G (when building the starter image)
Fixes excessive data downloads during distro container builds.
Signed-off-by: Derek Higgins <derekh@redhat.com>
This PR updates model registration and lookup behavior to be slightly
more general / flexible. See
https://github.com/meta-llama/llama-stack/issues/2843 for more details.
Note that this change is backwards compatible given the design of the
`lookup_model()` method.
## Test Plan
Added unit tests
# What does this PR do?
This PR fixes flaky telemetry tests
See https://github.com/meta-llama/llama-stack/pull/2814
## Test Plan
Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
# What does this PR do?
Makes `name` optional in `openai_create_vector_store`.
Closes https://github.com/meta-llama/llama-stack/issues/2706
## Test Plan
CI and unit tests
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
Ensures that session turns retrieved from the agent persistence layer
are sorted by their `started_at` timestamp, as the key-value store does
not guarantee order.
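The fix boils down to an explicit sort after fetching the turns, along the lines of this sketch (attribute name taken from the description above):

```python
# Key-value stores return turns in arbitrary order; sort explicitly.
turns.sort(key=lambda turn: turn.started_at)
```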
Closes #2852
## Test Plan
- [ ] Add unit tests
# What does this PR do?
Minor update of the pgvector doc, changing 'faiss' to 'pgvector'.
## Test Plan
# What does this PR do?
Refactors the vector store routing logic by moving OpenAI-compatible
vector store operations from the `VectorIORouter` to the
`VectorDBsRoutingTable`.
Closes https://github.com/meta-llama/llama-stack/issues/2761
## Test Plan
Added unit tests to cover new routing logic and ACL checks.
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
Part of #2696
## Test Plan
Run `llama stack run starter`
Error:
```
myenv ❯ llama stack run starters
WARNING 2025-07-10 12:12:43,052 llama_stack.cli.stack.run:82 server: Conda detected. Using conda environment myenv for the run.
usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--env KEY=VALUE]
[--image-type {conda,venv}] [--enable-ui]
[config | template]
llama stack run: error: Could not resolve config or template 'starters'.
Tried the following locations:
1. As file path: /Users/erichuang/projects/llama-stack-git/starters
2. As template: /Users/erichuang/projects/llama-stack-git/llama_stack/templates/starters/run.yaml
3. As built distribution: (/Users/erichuang/.llama/distributions/llamastack-starters/starters-run.yaml, /Users/erichuang/.llama/distributions/starters/starters-run.yaml)
Available templates: dell, test-env, vllm-gpu, test-template, cerebras, openai-api-verification, sambanova, passthrough, direct-config, together, openai, fireworks, meta-reference-gpu, __pycache__, dev, ollama, watsonx, remote-vllm, llama_api, groq, dummy, oracle, nvidia, ci-tests, postgres-demo, test-stack, bedrock, starter, hf-serverless, hf-endpoint, tgi, open-benchmark, verification
Did you mean one of these templates?
- starter
- together
- postgres-demo
```
# What does this PR do?
After https://github.com/meta-llama/llama-stack/pull/2818, SIGINT will
print a stack trace. This is because uvicorn re-raises SIGINT, and
Python's internal signal handler (which handles SIGINT by default)
converts it into a KeyboardInterrupt exception. We now simply catch the
exception to get a clean exit; this does not change the behavior on
SIGINT.
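A hedged sketch of the catch, assuming `server` is the `uvicorn.Server` driven from our own event loop:

```python
try:
    loop.run_until_complete(server.serve())
except KeyboardInterrupt:
    # uvicorn re-raises SIGINT after its graceful shutdown; swallow it
    # so the process exits cleanly with no stack trace.
    pass
```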
## Test Plan
Run the server, hit Ctrl+C or `kill -2 <server pid>` and expect a clean
exit with no stack trace.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
This PR adds a `provider_id` field to the `VectorDBInput` class.
Fixes https://github.com/meta-llama/llama-stack/issues/2819
Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
Just like #2805 but for vLLM.
We also make the `VLLM_URL` env variable optional (not required) -- if
not specified, the provider silently sits idle and yells eventually if
someone tries to call a completion on it. This is done so as to allow
this provider to be present in the `starter` distribution.
## Test Plan
Set up vLLM, copy the starter template and set `{ refresh_models: true,
refresh_models_interval: 10 }` for the vllm provider and then run:
```
ENABLE_VLLM=vllm VLLM_URL=http://localhost:8000/v1 \
uv run llama stack run --image-type venv /tmp/starter.yaml
```
Verify that `llama-stack-client models list` brings up the model
correctly from vLLM.
Inline _inference_ providers haven't proved to be very useful -- they
are rarely used. And for good reason -- it is almost never a good idea
to include a complex (distributed) inference engine bundled into a
distributed stateful front-end server serving many other things.
Responsibility should be split properly.
See Discord discussion:
1395849853
For self-hosted providers like Ollama (or vLLM), the backing server is
running a set of models. That server should be treated as the source of
truth and the Stack registry should just be a cache for those models. Of
course, in production environments, you may not want this (because you
know what model you are running statically) hence there's a config
boolean to control this behavior.
_This is part of a series of PRs aimed at removing the requirement of
needing to set `INFERENCE_MODEL` env variables for running Llama Stack
server._
## Test Plan
Copy and modify the starter.yaml template / config and enable
`refresh_models: true, refresh_models_interval: 10` for the ollama
provider. Then, run:
```
LLAMA_STACK_LOGGING=all=debug \
ENABLE_OLLAMA=ollama uv run llama stack run --image-type venv /tmp/starter.yaml
```
See a gargantuan amount of logs, but verify that the provider is
periodically refreshing models. Stop and prune a model from ollama
server, restart the server. Verify that the model goes away when I call
`uv run llama-stack-client models list`
# What does this PR do?
This PR fixes the `DPOAlignmentConfig` schema to use the correct Direct
Preference Optimization (DPO) parameters.
The current schema incorrectly uses PPO-inspired parameters
(`reward_scale`, `reward_clip`, `epsilon`, `gamma`) that are not part of
the DPO algorithm. This PR updates it to use the standard DPO
parameters:
- `beta`: The KL divergence coefficient that controls deviation from the
reference model
- `loss_type`: The type of DPO loss function (sigmoid, hinge, ipo,
kto_pair)
These parameters align with standard DPO implementations like
HuggingFace's TRL library.
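A hedged sketch of the updated schema; the types and default are illustrative:

```python
from typing import Literal

from pydantic import BaseModel


class DPOAlignmentConfig(BaseModel):
    # KL-divergence coefficient controlling drift from the reference model.
    beta: float
    # Standard DPO loss variants, mirroring TRL's options.
    loss_type: Literal["sigmoid", "hinge", "ipo", "kto_pair"] = "sigmoid"
```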
---------
Co-authored-by: Ubuntu <ubuntu@ip-172-31-43-83.ec2.internal>
When we call `construct_stack()`, providers are instantiated and
`initialize()` is called. This call can end up doing _anything_ at all
-- specifically, providers are free to create long running background
tasks as part of this. If we wrapped this within a `asyncio.run()` as in
the current code, these tasks get canceled when the stack construction
finishes. This is not correct. The PR addresses the issue by creating a
persistent event loop which is used for both the stack as well as for
running the uvicorn server. In other words, the lifetime of the
providers (and downstream async code) is now the same as the lifetime of
the uvicorn server.
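A hedged sketch of the shape of the fix; the uvicorn API is as documented, while `construct_stack`, `config`, and `app` are assumed from the server code:

```python
import asyncio

import uvicorn

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)

# Providers (and any background tasks they spawn) start on this loop...
impls = loop.run_until_complete(construct_stack(config))

# ...and uvicorn runs on the same loop, so provider tasks live exactly
# as long as the server instead of dying when construction returns.
server = uvicorn.Server(uvicorn.Config(app, host="0.0.0.0", port=8321))
loop.run_until_complete(server.serve())
```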
## Test Plan
This should not affect any current code since we don't have background
tasks created right now. However,
https://github.com/meta-llama/llama-stack/pull/2805 will start using
this functionality.
# What does this PR do?
This PR adds static type coverage to `llama-stack`
Part of https://github.com/meta-llama/llama-stack/issues/2647
## Test Plan
Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
# What does this PR do?
This PR adds static type coverage to `llama-stack`
Part of https://github.com/meta-llama/llama-stack/issues/2647
## Test Plan
Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
If I am running `uv run llama stack run --image-type venv`, it should
not be saying "Conda detected" to me, because I am pretty clearly
telling it I need venv. The root cause is the offending line.
# What does this PR do?
Lets users register models available at
https://integrate.api.nvidia.com/v1/models that aren't already in
llama_stack/providers/remote/inference/nvidia/models.py.
## Test Plan
1. run the nvidia distro
2. register a model from https://integrate.api.nvidia.com/v1/models
that isn't already known; as of this writing,
nvidia/llama-3.1-nemotron-ultra-253b-v1 is a good example
3. perform inference w/ the model
- POST /v1/models accepts an optional provider_model_id
- The ModelsRoutingTable.register_model handler ensures it is non-None,
providing a default
Usage of Model.provider_model_id will no longer need to detect None.
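A hedged sketch of the resulting client call; the client API shape is assumed from llama-stack-client:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# provider_model_id may be omitted; the routing table now defaults it
# to the model_id.
client.models.register(
    model_id="nvidia/llama-3.1-nemotron-ultra-253b-v1",
    provider_id="nvidia",
)
```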
Move sentence-transformers to be the first embedding model in the list
of models. This ensures it will always be the default and is more
consistent than having the default change based on which env variables
are available.
Closes: #2702
## Test Plan
Manually verified
Signed-off-by: Derek Higgins <derekh@redhat.com>