llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-07-30 23:51:00 +00:00

Author	SHA1	Message	Date
Matthew Farrellee	968fc132d3	fix(openai-compat): restrict developer/assistant/system/tool messages to text-only content (#2932 ) What: - Added OpenAIChatCompletionTextOnlyMessageContent type for text-only content validation - Modified OpenAISystemMessageParam, OpenAIAssistantMessageParam, OpenAIDeveloperMessageParam, and OpenAIToolMessageParam to use text-only content type instead of mixed content - OpenAIUserMessageParam unchanged - still accepts both text and images - Updated OpenAPI spec files to reflect text-only content restrictions in schemas closes #2894 Why: - Enforces OpenAI API compatibility by restricting image content to user messages only - Prevents API misuse where images might be sent in message types that don't support them - Aligns with OpenAI's actual API behavior where only user messages can contain multimodal content - Improves type safety and validation at the API boundary Test plan: - Added comprehensive parametrized tests covering all 5 OpenAI message types - Tests verify text string acceptance for all message types - Tests verify text list acceptance for all message types - Tests verify image rejection for system/assistant/developer/tool messages (ValidationError expected) - Tests verify user messages still accept images (backward compatibility maintained)	2025-07-28 10:36:34 -07:00
Matthew Farrellee	60bb5e307e	feat(openai): add configurable base_url support with OPENAI_BASE_URL env var (#2919 ) # What does this PR do? - Add base_url field to OpenAIConfig with default "https://api.openai.com/v1" - Update sample_run_config to support OPENAI_BASE_URL environment variable - Modify get_base_url() to return configured base_url instead of hardcoded value - Add comprehensive test suite covering: - Default base URL behavior - Custom base URL from config - Environment variable override - Config precedence over environment variables - Client initialization with configured URL - Model availability checks using configured URL This enables users to configure custom OpenAI-compatible API endpoints via environment variables or configuration files. Closes #2910 ## Test Plan run unit tests	2025-07-28 10:16:02 -07:00
Charlie Doern	b1c21a25ec	docs: remove provider_id from external docs (#2922 ) # What does this PR do? external provider docs mention setting provider_id in the build yaml. Since we changed that to just be provider_type and module, remove instances of provider_id Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-07-28 10:14:39 -07:00
Ashwin Bharambe	9583f468f8	feat(starter)!: simplify starter distro; litellm model registry changes (#2916 )	2025-07-25 15:02:04 -07:00
Nathan Weinberg	025163d8e6	feat: add auto-generated CI documentation pre-commit hook (#2890 ) # What does this PR do? Our CI is entirely undocumented, this commit adds a README.md file with a table of the current CI and what is does --------- Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-07-25 17:57:01 +02:00
Charlie Doern	de6919ecdd	refactor: install external providers from module (#2637 ) # What does this PR do? Today, external providers are installed via the `external_providers_dir` in the config. This necessitates users to understand the `ProviderSpec` and set up their directories accordingly. This process splits up the config for the stack across multiple files, directories, and formats. Most (if not all) external providers today have a [get_provider_spec](`559cb18fbb/src/ramalama_stack/provider.py (L9)`) method that sits unused. Utilizing this method rather than the providers.d route allows for a much easier installation process for external providers and limits the amount of extra configuration a regular user has to do to get their stack off the ground. To accomplish this and wire it throughout the build process, Introduce the concept of a `module` for users to specify for an external provider upon build time. In order to facilitate this, align the build and run spec to use `Provider` class rather than the stringified provider_type that build currently uses. For example, say this is in your build config: ``` - provider_id: ramalama provider_type: remote::ramalama module: ramalama_stack ``` during build (in the various `build_...` scripts), additionally to installing any pip dependencies we will also install this module and use the `get_provider_spec` method to retrieve the ProviderSpec that is currently specified using `providers.d`. In production so far, providing instructions for installing external providers for users has been difficult: they need to install the module as a pre-req, create the providers.d directory, copy in the provider spec, and also copy in the necessary build/run yaml files. Accessing an external provider should be as easy as possible, and pointing to its installable module aligns more with the rest of our build and dependency management process. For now, `external_providers_dir` still exists as an alternate more declarative method of using external providers. ## Test Plan added an integration test installing an external provider from module and more unit test coverage for `get_provider_registry` ( the warning in yellow is expected, the module is installed inside of the build env, not where we are running the command) <img width="1119" height="400" alt="Screenshot 2025-07-24 at 11 30 48 AM" src="https://github.com/user-attachments/assets/1efbaf45-b9e8-451a-bd63-264ed664706d" /> <img width="1154" height="618" alt="Screenshot 2025-07-24 at 11 31 14 AM" src="https://github.com/user-attachments/assets/feb2b3ea-c5dd-418e-9662-9a3bd5dd6bdc" /> --------- Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-07-25 15:41:26 +02:00
ehhuang	21bae296f2	feat(auth): API access control (#2822 ) # What does this PR do? - Added ability to specify `required_scope` when declaring an API. This is part of the `@webmethod` decorator. - If auth is enabled, a user can access an API only if `user.attributes['scope']` includes the `required_scope` - We add `required_scope='telemetry.read'` to the telemetry read APIs. ## Test Plan CI with added tests 1. Enable server.auth with github token 2. Observe `client.telemetry.query_traces()` returns 403	2025-07-24 15:30:48 -07:00
Sébastien Han	632cf9eb72	feat: Bring Your Own API (BYOA) (#2228 ) Some checks failed Coverage Badge / unit-tests (push) Failing after 1s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Installer CI / lint (push) Failing after 3s Details Integration Tests / discover-tests (push) Successful in 3s Details Installer CI / smoke-test-on-dev (push) Failing after 5s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 5s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 5s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 5s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 6s Details Python Package Build Test / build (3.12) (push) Failing after 3s Details Python Package Build Test / build (3.13) (push) Failing after 2s Details Test Llama Stack Build / generate-matrix (push) Successful in 3s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 10s Details Test Llama Stack Build / build-single-provider (push) Failing after 5s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 5s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 13s Details Unit Tests / unit-tests (3.13) (push) Failing after 6s Details Test External API and Providers / test-external (venv) (push) Failing after 5s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 6s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 5s Details Unit Tests / unit-tests (3.12) (push) Failing after 7s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 6s Details Update ReadTheDocs / update-readthedocs (push) Failing after 8s Details Integration Tests / test-matrix (push) Failing after 8s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 5s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 6s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 6s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 5s Details Test Llama Stack Build / build (push) Failing after 6s Details Pre-commit / pre-commit (push) Successful in 57s Details # What does this PR do? Prototype on a new feature to allow new APIs to be plugged in Llama Stack. Opened for early feedback on the approach and test appetite on the functionality. @ashwinb @raghotham open for early feedback, thanks! --------- Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-07-24 13:41:14 -07:00
Kelly Brown	abade761e0	docs: Update nvidia docs template (#2893 ) Description Fixes generation issue in nvidia code gen file. Closes #2873	2025-07-24 22:11:34 +02:00
Ashwin Bharambe	1463b79218	feat(registry): make the Stack query providers for model listing (#2862 ) This flips #2823 and #2805 by making the Stack periodically query the providers for models rather than the providers going behind the back and calling "register" on to the registry themselves. This also adds support for model listing for all other providers via `ModelRegistryHelper`. Once this is done, we do not need to manually list or register models via `run.yaml` and it will remove both noise and annoyance (setting `INFERENCE_MODEL` environment variables, for example) from the new user experience. In addition, it adds a configuration variable `allowed_models` which can be used to optionally restrict the set of models exposed from a provider.	2025-07-24 10:39:53 -07:00
Charlie Doern	d4f0b430e2	docs: update list of apis (#2697 ) # What does this PR do? apis.md had a few APIs missing and incorrectly described APIs Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-07-24 09:50:14 -07:00
Christian Zaccaria	7f7b990b80	docs: Document use cases for Responses and Agents APIs (#2756 ) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> This pull request adds documentation to clarify the differences between the Agents API and the OpenAI Responses API, including use cases for each. It also updates the index page to reference the new documentation. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Closes #2368	2025-07-24 12:20:04 -04:00
Sarthak Deshpande	cd8715d327	chore: Added openai compatible vector io endpoints for chromadb (#2489 ) Some checks failed Integration Tests / discover-tests (push) Successful in 3s Details Coverage Badge / unit-tests (push) Failing after 6s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 4s Details Test Llama Stack Build / generate-matrix (push) Successful in 3s Details Python Package Build Test / build (3.13) (push) Failing after 2s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 10s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 11s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 16s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 16s Details Python Package Build Test / build (3.12) (push) Failing after 12s Details Test External Providers / test-external-providers (venv) (push) Failing after 12s Details Update ReadTheDocs / update-readthedocs (push) Failing after 10s Details Test Llama Stack Build / build-single-provider (push) Failing after 15s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 23s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 20s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 21s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 20s Details Unit Tests / unit-tests (3.13) (push) Failing after 14s Details Test Llama Stack Build / build (push) Failing after 9s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 18s Details Unit Tests / unit-tests (3.12) (push) Failing after 14s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 19s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 18s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 51s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 49s Details Integration Tests / test-matrix (push) Failing after 53s Details Pre-commit / pre-commit (push) Successful in 1m42s Details # What does this PR do? This PR implements the openai compatible endpoints for chromadb Closes #2462 ## Test Plan Ran ollama llama stack server and ran the command `pytest -sv --stack-config=http://localhost:8321 tests/integration/vector_io/test_openai_vector_stores.py --embedding-model all-MiniLM-L6-v2` 8 failed, 27 passed, 8 skipped, 1 xfailed The failed ones are regarding files api --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> Co-authored-by: sarthakdeshpande <sarthak.deshpande@engati.com> Co-authored-by: Francisco Javier Arceo <farceo@redhat.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>	2025-07-23 13:51:58 -07:00
Matthew Farrellee	e1ed152779	chore: create OpenAIMixin for inference providers with an OpenAI-compat API that need to implement openai_* methods (#2835 ) Some checks failed Coverage Badge / unit-tests (push) Failing after 3s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s Details Python Package Build Test / build (3.12) (push) Failing after 3s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 6s Details Integration Tests / discover-tests (push) Successful in 7s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 6s Details Python Package Build Test / build (3.13) (push) Failing after 2s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 6s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 5s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 8s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 9s Details Unit Tests / unit-tests (3.12) (push) Failing after 8s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 11s Details Test External Providers / test-external-providers (venv) (push) Failing after 8s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 9s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 17s Details Unit Tests / unit-tests (3.13) (push) Failing after 12s Details Update ReadTheDocs / update-readthedocs (push) Failing after 11s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 16s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 18s Details Integration Tests / test-matrix (push) Failing after 18s Details Pre-commit / pre-commit (push) Successful in 1m14s Details # What does this PR do? add an `OpenAIMixin` for use by inference providers who remote endpoints support an OpenAI compatible API. use is demonstrated by refactoring - OpenAIInferenceAdapter - NVIDIAInferenceAdapter (adds embedding support) - LlamaCompatInferenceAdapter ## Test Plan existing unit and integration tests	2025-07-23 06:49:40 -04:00
grs	fc67ad408a	chore: add some documentation for access policy rules (#2785 ) # What does this PR do? Adds some documentation on setting explicit access_policy rules in config.	2025-07-23 10:27:27 +02:00
Francisco Arceo	20c3197952	chore: Making name optional in openai_create_vector_store (#2858 ) # What does this PR do? chore: Making name optional in openai_create_vector_store # Closes https://github.com/meta-llama/llama-stack/issues/2706 ## Test Plan CI and unit tests Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-07-22 13:31:31 -04:00
ehhuang	8e1a2b4703	chore: remove *_openai_compat providers (#2849 ) # What does this PR do? These are no longer needed as llama-stack-evals can run against OAI endpoints directly. ## Test Plan	2025-07-22 10:25:36 -07:00
Jeremy Bonghwan Choi	b5a6ecc331	docs: minor fix of the pgvector provider spec description (#2847 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Integration Tests / discover-tests (push) Successful in 3s Details Coverage Badge / unit-tests (push) Failing after 6s Details Python Package Build Test / build (3.13) (push) Failing after 6s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 11s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 10s Details Python Package Build Test / build (3.12) (push) Failing after 7s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 12s Details Test External Providers / test-external-providers (venv) (push) Failing after 7s Details Unit Tests / unit-tests (3.12) (push) Failing after 10s Details Update ReadTheDocs / update-readthedocs (push) Failing after 9s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 13s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 20s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 14s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 10s Details Integration Tests / test-matrix (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 11s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 21s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 27s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 25s Details Unit Tests / unit-tests (3.13) (push) Failing after 24s Details Pre-commit / pre-commit (push) Successful in 1m17s Details # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> minor update of the pgvector doc, changing 'faiss' to 'pgvector' <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. -->	2025-07-21 22:10:35 -07:00
Francisco Arceo	2bc96613f9	chore: Adding demo script and importing it into the docs (#2848 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Coverage Badge / unit-tests (push) Failing after 6s Details Integration Tests / discover-tests (push) Successful in 7s Details Unit Tests / unit-tests (3.13) (push) Failing after 6s Details Test Llama Stack Build / build-single-provider (push) Failing after 8s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 11s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 15s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 14s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 9s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 11s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 13s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 14s Details Test Llama Stack Build / generate-matrix (push) Successful in 10s Details Test External Providers / test-external-providers (venv) (push) Failing after 9s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 11s Details Unit Tests / unit-tests (3.12) (push) Failing after 10s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 15s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 15s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 15s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 19s Details Python Package Build Test / build (3.13) (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 19s Details Integration Tests / test-matrix (push) Failing after 13s Details Python Package Build Test / build (3.12) (push) Failing after 1m1s Details Update ReadTheDocs / update-readthedocs (push) Failing after 1m0s Details Test Llama Stack Build / build (push) Failing after 52s Details Pre-commit / pre-commit (push) Successful in 2m39s Details # What does this PR do? This PR adds the quickstart as a file to the docs so that it can be more easily maintained and run, as mentioned in https://github.com/meta-llama/llama-stack/pull/2800. ## Test Plan I could add this as a test in the CI but I wasn't sure if we wanted to add additional jobs there. 😅 Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-07-21 22:53:32 -04:00
Ashwin Bharambe	199f859eec	feat(vllm): periodically refresh models (#2823 ) Just like #2805 but for vLLM. We also make VLLM_URL env variable optional (not required) -- if not specified, the provider silently sits idle and yells eventually if someone tries to call a completion on it. This is done so as to allow this provider to be present in the `starter` distribution. ## Test Plan Set up vLLM, copy the starter template and set `{ refresh_models: true, refresh_models_interval: 10 }` for the vllm provider and then run: ``` ENABLE_VLLM=vllm VLLM_URL=http://localhost:8000/v1 \ uv run llama stack run --image-type venv /tmp/starter.yaml ``` Verify that `llama-stack-client models list` brings up the model correctly from vLLM.	2025-07-18 15:53:09 -07:00
Ashwin Bharambe	ade075152e	chore: kill inline::vllm (#2824 ) Inline _inference_ providers haven't proved to be very useful -- they are rarely used. And for good reason -- it is almost never a good idea to include a complex (distributed) inference engine bundled into a distributed stateful front-end server serving many other things. Responsibility should be split properly. See Discord discussion: `1395849853`	2025-07-18 15:52:18 -07:00
Ashwin Bharambe	68a2dfbad7	feat(ollama): periodically refresh models (#2805 ) For self-hosted providers like Ollama (or vLLM), the backing server is running a set of models. That server should be treated as the source of truth and the Stack registry should just be a cache for those models. Of course, in production environments, you may not want this (because you know what model you are running statically) hence there's a config boolean to control this behavior. _This is part of a series of PRs aimed at removing the requirement of needing to set `INFERENCE_MODEL` env variables for running Llama Stack server._ ## Test Plan Copy and modify the starter.yaml template / config and enable `refresh_models: true, refresh_models_interval: 10` for the ollama provider. Then, run: ``` LLAMA_STACK_LOGGING=all=debug \ ENABLE_OLLAMA=ollama uv run llama stack run --image-type venv /tmp/starter.yaml ``` See a gargantuan amount of logs, but verify that the provider is periodically refreshing models. Stop and prune a model from ollama server, restart the server. Verify that the model goes away when I call `uv run llama-stack-client models list`	2025-07-18 12:20:36 -07:00
Nehanth Narendrula	874b1cb00f	fix: DPOAlignmentConfig schema to use correct DPO parameters (#2804 ) Some checks failed Coverage Badge / unit-tests (push) Failing after 1s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s Details Integration Tests / discover-tests (push) Successful in 4s Details Test Llama Stack Build / generate-matrix (push) Successful in 9s Details Test Llama Stack Build / build-single-provider (push) Failing after 8s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 11s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 13s Details Unit Tests / unit-tests (3.12) (push) Failing after 9s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 13s Details Update ReadTheDocs / update-readthedocs (push) Failing after 13s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 22s Details Python Package Build Test / build (3.12) (push) Failing after 15s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 18s Details Test External Providers / test-external-providers (venv) (push) Failing after 14s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 18s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 20s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 17s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 17s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 19s Details Unit Tests / unit-tests (3.13) (push) Failing after 19s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 21s Details Integration Tests / test-matrix (push) Failing after 19s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 17s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 22s Details Test Llama Stack Build / build (push) Failing after 15s Details Python Package Build Test / build (3.13) (push) Failing after 1m50s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 2m5s Details Pre-commit / pre-commit (push) Successful in 3m20s Details # What does this PR do? This PR fixes the `DPOAlignmentConfig` schema to use the correct Direct Preference Optimization (DPO) parameters. The current schema incorrectly uses PPO-inspired parameters (`reward_scale`, `reward_clip`, `epsilon`, `gamma`) that are not part of the DPO algorithm. This PR updates it to use the standard DPO parameters: - `beta`: The KL divergence coefficient that controls deviation from the reference model - `loss_type`: The type of DPO loss function (sigmoid, hinge, ipo, kto_pair) These parameters align with standard DPO implementations like HuggingFace's TRL library. --------- Co-authored-by: Ubuntu <ubuntu@ip-172-31-43-83.ec2.internal>	2025-07-18 11:56:00 -07:00
Nathan Weinberg	2bb9039173	docs: fix steps in the Quick Start Guide (#2800 ) # What does this PR do? 'build' command didn't take into account ENABLE flags for starter distro for some reason, I was having issues with HuggingFace access for the embedding model, so added a tip for that as well Closes #2779 ## Test Plan I ran the described steps manually, but it would be nice if someone else could try it and verify this still works We might consider having some CI job ensure the QSG remains functional - it's not a great experience for new users if they try Llama Stack for the first time and it doesn't work as we describe Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-07-18 09:08:46 -07:00
Nathan Weinberg	1785a6b39c	docs: add virtualenv instructions for running starter distro (#2780 ) # What does this PR do? we had directions for a container and conda but not venv Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-07-18 09:07:43 -07:00
ehhuang	51b179e1c5	chore: update k8s template (#2786 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Integration Tests / discover-tests (push) Successful in 3s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 4s Details Python Package Build Test / build (3.12) (push) Failing after 3s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 8s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 7s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 7s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 10s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 8s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 8s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 14s Details Unit Tests / unit-tests (3.12) (push) Failing after 5s Details Update ReadTheDocs / update-readthedocs (push) Failing after 3s Details Python Package Build Test / build (3.13) (push) Failing after 7s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 14s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 11s Details Test External Providers / test-external-providers (venv) (push) Failing after 50s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 58s Details Unit Tests / unit-tests (3.13) (push) Failing after 54s Details Integration Tests / test-matrix (push) Failing after 53s Details Pre-commit / pre-commit (push) Successful in 1m40s Details # What does this PR do? - enables auth - updates to use distribution-starter docker ## Test Plan bash apply.sh	2025-07-16 15:07:26 -07:00
Nathan Weinberg	919ee3199b	docs: add missing bold title to match others (#2782 ) Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-07-16 18:05:48 +02:00
Kelly Brown	b096794959	docs: Reorganize documentation on the webpage (#2651 ) Some checks failed SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 2s Details Integration Tests / discover-tests (push) Successful in 2s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 17s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 19s Details Python Package Build Test / build (3.12) (push) Failing after 14s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 14s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 15s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 20s Details Unit Tests / unit-tests (3.13) (push) Failing after 15s Details Test Llama Stack Build / generate-matrix (push) Successful in 16s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 20s Details Test External Providers / test-external-providers (venv) (push) Failing after 17s Details Update ReadTheDocs / update-readthedocs (push) Failing after 15s Details Test Llama Stack Build / build-single-provider (push) Failing after 21s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 18s Details Unit Tests / unit-tests (3.12) (push) Failing after 22s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 25s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 23s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 26s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 19s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 28s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 21s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 23s Details Python Package Build Test / build (3.13) (push) Failing after 44s Details Test Llama Stack Build / build (push) Failing after 25s Details Integration Tests / test-matrix (push) Failing after 46s Details Pre-commit / pre-commit (push) Successful in 2m24s Details # What does this PR do? Reorganizes the Llama stack webpage into more concise index pages, introduce more of a workflow, and reduce repetition of content. New nav structure so far based on #2637 Further discussions in https://github.com/meta-llama/llama-stack/discussions/2585 Preview: ![Screenshot 2025-07-09 at 2 31 53 PM](https://github.com/user-attachments/assets/4c1f3845-b328-4f12-9f20-3f09375007af) You can also build a full local preview locally Feedback Looking for feedback on page titles and general feedback on the new structure Follow up documentation I plan on reducing some sections and standardizing some terminology in a follow up PR. More discussions on that in https://github.com/meta-llama/llama-stack/discussions/2585	2025-07-15 14:19:35 -07:00
Mark Campbell	65fcd03461	docs: update outdated llama stack client documentation (#2758 ) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> Adds new documentation that was missing for the Llama Stack Python Client as well as updates old/outdated docs	2025-07-15 11:49:59 -07:00
Francisco Arceo	31b088978a	fix: Fix `/vector-stores/create` API when vector store with duplicate `name` (#2617 ) # What does this PR do? Resolves https://github.com/meta-llama/llama-stack/issues/2735 Currently, if you test against OpenAI's Vector Stores API the `client.vector_stores.search` call fails with an invalid vector_db during routing (see the script referenced in the clickable item under the Test Plan section). This PR ensures that `client.vector_stores.search()` is compatible with OpenAI's Vector Stores API. Two biggest changes: 1. The `name`, which was previously used as the `vector_db_id`, has been changed to be consistent with OpenAI's `vs_{uuid}` format. 2. The vector store ID has to be referenced by the ID, the name is not reliable as every `client.vector_stores.create` results in a new vector store. NOTE: I believe this is a breaking change for end users as they'll need to update their VectorDB identifiers. ## Test Plan Unit tests: ```bash ./scripts/unit-tests.sh tests/unit/providers/vector_io/ -v ``` Integration tests: ```bash ENABLE_MILVUS=milvus llama stack run /Users/farceo/dev/llama-stack/llama_stack/templates/starter/run.yaml --image-type venv LLAMA_STACK_CONFIG=http://localhost:8321 pytest -sv tests/integration/vector_io/test_openai_vector_stores.py --embedding-model=all-MiniLM-L6-v2 -vv ``` Unit tests and test script below 👇 <details> <summary>Click here for script used to test OpenAI and Llama Stack Vector Store implementation</summary> ```python import json import argparse from openai import OpenAI, pagination import logging from colorama import Fore, Style, init import traceback import os # Initialize colorama for color support in terminal init(autoreset=True) # Setup basic logging logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s') DEMO_VECTOR_STORE_NAME = "Support FAQ FJA" global DEMO_VECTOR_STORE_ID global DEMO_VECTOR_STORE_ID2 def colored_print(color, text): """Prints text to the console with the specified color.""" print(f"{color}{text}{Style.RESET_ALL}") def log_and_print(color, message, level=logging.INFO): """Logs a message and prints it to the console with the specified color.""" logging.log(level, message) colored_print(color, message) def run_tests(client, prefix="openai"): """ Runs all tests using the provided OpenAI client and saves the output to JSON files with the given prefix. """ # Create the directory if it doesn't exist os.makedirs('openai_testing', exist_ok=True) # Default values in case tests fail global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 DEMO_VECTOR_STORE_ID = None DEMO_VECTOR_STORE_ID2 = None def test_idempotent_vector_store_creation(): """ Test that creating a vector store with the same name is idempotent. """ log_and_print(Fore.BLUE, "Starting vector store creation test...") try: vector_store = client.vector_stores.create( name=DEMO_VECTOR_STORE_NAME, ) # Attempt to create the same vector store again vector_store2 = client.vector_stores.create( name=DEMO_VECTOR_STORE_NAME, ) # Check instead of assert if vector_store2.id != vector_store.id: log_and_print(Fore.YELLOW, f"FAILED IDEMPOTENCY: the same VectorStore name for {prefix.upper()} does not return the same ID", level=logging.WARNING) else: log_and_print(Fore.GREEN, f"PASSED IDEMPOTENCY: f{vector_store2.id} == {vector_store.id} the same VectorStore name for {prefix.upper()} returns the same ID") vector_store_data = vector_store.to_dict() log_and_print(Fore.WHITE, f"vector_stores.create = {json.dumps(vector_store_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_create.json', 'w') as f: json.dump(vector_store_data, f, indent=2) global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 DEMO_VECTOR_STORE_ID = vector_store.id DEMO_VECTOR_STORE_ID2 = vector_store2.id return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 except Exception as e: log_and_print(Fore.RED, f"Idempotent vector store creation test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) # Create a fallback vector store ID if needed if 'vector_store' in locals() and vector_store: DEMO_VECTOR_STORE_ID = vector_store.id return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 def test_vector_store_list(): """ Test listing vector stores. """ log_and_print(Fore.BLUE, "Starting vector store list test...") try: vector_stores = client.vector_stores.list() # Check instead of assert if not isinstance(vector_stores, pagination.SyncCursorPage): log_and_print(Fore.YELLOW, f"FAILED: Expected a list of vector stores, got {type(vector_stores)}", level=logging.WARNING) else: log_and_print(Fore.GREEN, "Vector store list test passed!") vector_stores_data = vector_stores.to_dict() log_and_print(Fore.WHITE, f"vector_stores.list = {json.dumps(vector_stores_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_list.json', 'w') as f: json.dump(vector_stores_data, f, indent=2) except Exception as e: log_and_print(Fore.RED, f"Vector store list test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) def test_retrieve_vector_store(): """ Test retrieving a specific vector store. """ log_and_print(Fore.BLUE, "Starting retrieve vector store test...") if not DEMO_VECTOR_STORE_ID: log_and_print(Fore.YELLOW, "Skipping retrieve vector store test - no vector store ID available", level=logging.WARNING) return try: vector_store = client.vector_stores.retrieve( vector_store_id=DEMO_VECTOR_STORE_ID, ) # Check instead of assert if vector_store.id != DEMO_VECTOR_STORE_ID: log_and_print(Fore.YELLOW, "FAILED: Retrieved vector store ID does not match", level=logging.WARNING) else: log_and_print(Fore.GREEN, "Retrieve vector store test passed!") vector_store_data = vector_store.to_dict() log_and_print(Fore.WHITE, f"vector_stores.retrieve = {json.dumps(vector_store_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_retrieve.json', 'w') as f: json.dump(vector_store_data, f, indent=2) except Exception as e: log_and_print(Fore.RED, f"Retrieve vector store test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) def test_modify_vector_store(): """ Test modifying a vector store. """ log_and_print(Fore.BLUE, "Starting modify vector store test...") if not DEMO_VECTOR_STORE_ID: log_and_print(Fore.YELLOW, "Skipping modify vector store test - no vector store ID available", level=logging.WARNING) return try: updated_vector_store = client.vector_stores.update( vector_store_id=DEMO_VECTOR_STORE_ID, name="Updated Support FAQ FJA", ) # Check instead of assert if updated_vector_store.name != "Updated Support FAQ FJA": log_and_print(Fore.YELLOW, "FAILED: Vector store name was not updated correctly", level=logging.WARNING) else: log_and_print(Fore.GREEN, "Modify vector store test passed!") updated_vector_store_data = updated_vector_store.to_dict() log_and_print(Fore.WHITE, f"vector_stores.modify = {json.dumps(updated_vector_store_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_modify.json', 'w') as f: json.dump(updated_vector_store_data, f, indent=2) except Exception as e: log_and_print(Fore.RED, f"Modify vector store test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) def test_delete_vector_store(): """ Test deleting a vector store. """ log_and_print(Fore.BLUE, "Starting delete vector store test...") if not DEMO_VECTOR_STORE_ID2: log_and_print(Fore.YELLOW, "Skipping delete vector store test - no second vector store ID available", level=logging.WARNING) return try: response = client.vector_stores.delete( vector_store_id=DEMO_VECTOR_STORE_ID2, ) log_and_print(Fore.GREEN, "Delete vector store test passed!") response_data = response.to_dict() log_and_print(Fore.WHITE, f"Vector store delete response = {json.dumps(response_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_delete.json', 'w') as f: json.dump(response_data, f, indent=2) except Exception as e: log_and_print(Fore.RED, f"Delete vector store test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) def test_create_vector_store_file(): log_and_print(Fore.BLUE, "Starting create vector store file test...") if not DEMO_VECTOR_STORE_ID: log_and_print(Fore.YELLOW, "Skipping create vector store file test - no vector store ID available", level=logging.WARNING) return try: # create jsonl of files as an example with open("mydata.jsonl", "w") as f: f.write('{"text": "What is the return policy?", "metadata": {"category": "support"}}\n') f.write('{"text": "How do I reset my password?", "metadata": {"category": "support"}}\n') f.write('{"text": "Where can I find my order history?", "metadata": {"category": "support"}}\n') f.write('{"text": "What are the shipping options?", "metadata": {"category": "support"}}\n') f.write('{"text": "What is your favorite banana?", "metadata": {"category": "support"}}\n') # Create a simple text file if my_data_small.txt doesn't exist if not os.path.exists("my_data_small.txt"): with open("my_data_small.txt", "w") as f: f.write("This is a test file for vector store testing.\n") created_file = client.files.create( file=open("my_data_small.txt", "rb"), purpose="assistants", ) created_file_data = created_file.to_dict() log_and_print(Fore.WHITE, f"Created file {json.dumps(created_file_data, indent=2)}") with open(f'openai_testing/{prefix}_file_create.json', 'w') as f: json.dump(created_file_data, f, indent=2) retrieved_files = client.files.retrieve(created_file.id) retrieved_files_data = retrieved_files.to_dict() log_and_print(Fore.WHITE, f"Retrieved file {json.dumps(retrieved_files_data, indent=2)}") with open(f'openai_testing/{prefix}_file_retrieve.json', 'w') as f: json.dump(retrieved_files_data, f, indent=2) vector_store_file = client.vector_stores.files.create( vector_store_id=DEMO_VECTOR_STORE_ID, file_id=created_file.id, ) log_and_print(Fore.GREEN, "Create vector store file test passed!") except Exception as e: log_and_print(Fore.RED, f"Create vector store file test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) def test_search_vector_store(): """ Test searching a vector store. """ log_and_print(Fore.BLUE, "Starting search vector store test...") if not DEMO_VECTOR_STORE_ID: log_and_print(Fore.YELLOW, "Skipping search vector store test - no vector store ID available", level=logging.WARNING) return try: query = "What is the banana policy?" search_results = client.vector_stores.search( vector_store_id=DEMO_VECTOR_STORE_ID, query=query, max_num_results=10, ranking_options={ 'ranker': 'default-2024-11-15', 'score_threshold': 0.0, }, rewrite_query=False, ) # Check instead of assert if not isinstance(search_results, pagination.SyncPage): log_and_print(Fore.YELLOW, f"FAILED: Expected a list of search results, got {type(search_results)}", level=logging.WARNING) else: log_and_print(Fore.GREEN, "Search vector store test passed!") search_results_dict = search_results.to_dict() log_and_print(Fore.WHITE, f"Search results = {search_results_dict}") with open(f'openai_testing/{prefix}_vector_store_search.json', 'w') as f: json.dump(search_results_dict, f, indent=2) log_and_print(Fore.WHITE, f"vector_stores.search = {search_results.to_json()}") except Exception as e: log_and_print(Fore.RED, f"Search vector store test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) # Run all tests in sequence, even if some fail test_results = [] try: result = test_idempotent_vector_store_creation() if result and len(result) == 2: DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 = result test_results.append(True) except Exception as e: log_and_print(Fore.RED, f"Vector store creation test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) test_results.append(False) for test_func in [ test_vector_store_list, test_retrieve_vector_store, test_modify_vector_store, test_delete_vector_store, test_create_vector_store_file, test_search_vector_store ]: try: test_func() test_results.append(True) except Exception as e: log_and_print(Fore.RED, f"{test_func.__name__} failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) test_results.append(False) if all(test_results): log_and_print(Fore.GREEN, f"All {prefix} tests completed successfully!") else: failed_count = test_results.count(False) log_and_print(Fore.YELLOW, f"{failed_count} {prefix} test(s) failed, but script completed.") if __name__ == "__main__": parser = argparse.ArgumentParser(description="Run OpenAI and/or LlamaStack tests.") parser.add_argument( "--provider", type=str, default="llama", choices=["openai", "llama", "both"], help="Specify which environment to test: openai, llama, or both. Default is both.", ) args = parser.parse_args() try: if args.provider in ("openai", "both"): openai_client = OpenAI() run_tests(openai_client, prefix="openai") if args.provider in ("llama", "both"): llama_client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none") run_tests(llama_client, prefix="llama") log_and_print(Fore.GREEN, "All tests completed!") except Exception as e: log_and_print(Fore.RED, f"Tests failed to complete: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) ``` </details> --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-07-15 11:24:41 -04:00
Varsha	4ae5656c2f	feat: Implement keyword search in milvus (#2231 ) Some checks failed SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 7s Details Integration Tests / discover-tests (push) Successful in 8s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 10s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 6s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 6s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 11s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 9s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 10s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 8s Details Test Llama Stack Build / generate-matrix (push) Successful in 8s Details Python Package Build Test / build (3.13) (push) Failing after 6s Details Unit Tests / unit-tests (3.12) (push) Failing after 6s Details Unit Tests / unit-tests (3.13) (push) Failing after 6s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 13s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 15s Details Test External Providers / test-external-providers (venv) (push) Failing after 9s Details Test Llama Stack Build / build-single-provider (push) Failing after 11s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 14s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 19s Details Integration Tests / test-matrix (push) Failing after 8s Details Test Llama Stack Build / build (push) Failing after 5s Details Python Package Build Test / build (3.12) (push) Failing after 51s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 55s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 57s Details Update ReadTheDocs / update-readthedocs (push) Failing after 50s Details Pre-commit / pre-commit (push) Successful in 2m9s Details # What does this PR do? This PR adds the keyword search implementation for Milvus. Along with the implementation for remote Milvus, the tests require us to start a Milvus containers locally. In order to verify the implementation, run: ``` pytest tests/unit/providers/vector_io/remote/test_milvus.py -v -s --tb=short --disable-warnings --asyncio-mode=auto ``` You can also test the changes using the below script: ``` #!/usr/bin/env python3 import asyncio import os import uuid from typing import List from llama_stack_client import ( Agent, AgentEventLogger, LlamaStackClient, RAGDocument ) class MilvusRAGDemo: def __init__(self, base_url: str = "http://localhost:8321/"): self.client = LlamaStackClient(base_url=base_url) self.vector_db_id = f"milvus_rag_demo_{uuid.uuid4().hex[:8]}" self.model_id = None self.embedding_model_id = None self.embedding_dimension = None def setup_models(self): """Get available models and select appropriate ones for LLM and embeddings.""" models = self.client.models.list() # Select embedding model embedding_models = [m for m in models if m.model_type == "embedding"] if not embedding_models: raise ValueError("No embedding models found") self.embedding_model_id = embedding_models[0].identifier self.embedding_dimension = embedding_models[0].metadata["embedding_dimension"] def register_vector_db(self): print(f"Registering Milvus vector database: {self.vector_db_id}") response = self.client.vector_dbs.register( vector_db_id=self.vector_db_id, embedding_model=self.embedding_model_id, embedding_dimension=self.embedding_dimension, provider_id="milvus-remote", # Use remote Milvus ) print(f"Vector database registered successfully") return response def insert_documents(self): """Insert sample documents into the vector database.""" print("\nInserting sample documents...") # Sample documents about different topics documents = [ RAGDocument( document_id="ai_ml_basics", content=""" Artificial Intelligence (AI) and Machine Learning (ML) are transforming the world. AI refers to the simulation of human intelligence in machines, while ML is a subset of AI that enables computers to learn and improve from experience without being explicitly programmed. Deep learning, a subset of ML, uses neural networks with multiple layers to process complex patterns in data. Key concepts in AI/ML include: - Supervised Learning: Training with labeled data - Unsupervised Learning: Finding patterns in unlabeled data - Reinforcement Learning: Learning through trial and error - Neural Networks: Computing systems inspired by biological brains """, mime_type="text/plain", metadata={"topic": "technology", "category": "ai_ml"}, ), ] # Insert documents with chunking self.client.tool_runtime.rag_tool.insert( documents=documents, vector_db_id=self.vector_db_id, chunk_size_in_tokens=200, # Smaller chunks for better granularity ) print(f"Inserted {len(documents)} documents with chunking") def test_keyword_search(self): """Test keyword-based search using BM25.""" queries = [ "neural networks", "Python frameworks", "data cleaning", ] for query in queries: response = self.client.vector_io.query( vector_db_id=self.vector_db_id, query=query, params={ "mode": "keyword", # Keyword search "max_chunks": 3, "score_threshold": 0.0, } ) for i, (chunk, score) in enumerate(zip(response.chunks, response.scores)): print(f" {i+1}. Score: {score:.4f}") print(f" Content: {chunk.content[:100]}...") print(f" Metadata: {chunk.metadata}") def run_demo(self): try: self.setup_models() self.register_vector_db() self.insert_documents() self.test_keyword_search() except Exception as e: print(f"Error during demo: {e}") raise def main(): """Main function to run the demo.""" # Check if Llama Stack server is running demo = MilvusRAGDemo() try: demo.run_demo() except Exception as e: print(f"Demo failed: {e}") if __name__ == "__main__": main() ``` [//]: # (## Documentation) --------- Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>	2025-07-14 19:39:55 -04:00
Francisco Arceo	33f0d83ad3	chore: Move vector store `kvstore` implementation into `openai_vector_store_mixin.py` (#2748 )	2025-07-14 18:10:35 -04:00
ehhuang	aa0840c281	docs: fix building distro link (#2750 ) # What does this PR do? ## Test Plan Co-authored-by: raghotham <rsm@meta.com>	2025-07-14 12:06:56 -07:00
Sumanth Kamenani	77d2c8e95d	docs: clarify run.yaml files are starting points for customization (#2746 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 9s Details Integration Tests / discover-tests (push) Successful in 13s Details Python Package Build Test / build (3.13) (push) Failing after 10s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 17s Details Test External Providers / test-external-providers (venv) (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 15s Details Python Package Build Test / build (3.12) (push) Failing after 12s Details Unit Tests / unit-tests (3.12) (push) Failing after 14s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 20s Details Update ReadTheDocs / update-readthedocs (push) Failing after 14s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 17s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 23s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 18s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 23s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 18s Details Integration Tests / test-matrix (push) Failing after 8s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 18s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 16s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 31s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 29s Details Unit Tests / unit-tests (3.13) (push) Failing after 25s Details Pre-commit / pre-commit (push) Successful in 1m12s Details # What does this PR do? This PR improves documentation clarity around run.yaml file usage. It adds comprehensive guidance to help users understand that generated run.yaml files are templates meant to be customized for production use, not used as-is. ## Changes - Add new documentation section on customizing run.yaml files - Clarify that generated run.yaml files are templates, not production configs - Add guidance on customization best practices and common scenarios - Update existing documentation to reference customization guide - Improve clarity around run.yaml file usage for better user experience ## Test Plan - Verified new documentation file exists at correct location - Confirmed documentation is properly integrated into the toctree structure - Checked all internal links use correct paths and reference existing files - Validated references are added to relevant existing documentation files - Documentation build testing will be handled by CI environment	2025-07-14 09:53:13 -07:00
Mark Campbell	618ccea090	feat: add input validation for search mode of rag query config (#2275 ) # What does this PR do? Adds input validation for mode in RagQueryConfig This will prevent users from inputting search modes other than `vector` and `keyword` for the time being with `hybrid` to follow when that functionality is implemented. ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] ``` # Check out this PR and enter the LS directory uv sync --extra dev ``` Run the quickstart [example](https://llama-stack.readthedocs.io/en/latest/getting_started/#step-3-run-the-demo) Alter the Agent to include a query_config ``` agent = Agent( client, model=model_id, instructions="You are a helpful assistant", tools=[ { "name": "builtin::rag/knowledge_search", "args": { "vector_db_ids": [vector_db_id], "query_config": { "mode": "i-am-not-vector", # Test for non valid search mode "max_chunks": 6 } }, } ], ) ``` Ensure you get the following error: ``` 400: {'errors': [{'loc': ['mode'], 'msg': "Value error, mode must be either 'vector' or 'keyword' if supported by the vector_io provider", 'type': 'value_error'}]} ``` ## Running unit tests ``` uv sync --extra dev uv run pytest tests/unit/rag/test_rag_query.py -v ``` [//]: # (## Documentation)	2025-07-14 09:11:34 -04:00
ehhuang	4cf1952c32	chore: update vllm k8s command to support tool calling (#2717 ) # What does this PR do? ## Test Plan	2025-07-10 14:40:17 -07:00
Nathan Weinberg	5fe3027cbf	chore: remove "rfc" directory and move original rfc to "docs" (#2718 ) # What does this PR do? the "rfc" directory has only a single document in it, and its the original RFC for creating Llama Stack simply the project directory structure by moving this into the "docs" directory and renaming it to "original_rfc" to preserve the context of the doc ## Why did you do this? A simplified top-level directory structure helps keep the project simpler and prevents misleading new contributors into thinking we use it (we really don't) --------- Signed-off-by: Nathan Weinberg <nweinber@redhat.com> Co-authored-by: raghotham <raghotham@gmail.com>	2025-07-10 14:06:10 -07:00
Francisco Arceo	6a6b66ae4f	chore: Adding unit tests for OpenAI vector stores and migrating SQLite-vec registry to kvstore (#2665 ) # What does this PR do? This PR refactors and the VectorIO backend logic for `sqlite-vec` and adds unit tests and fixtures to make it easy to test both `sqlite-vec` and `milvus`. Key changes: - `sqlite-vec` migrated to `kvstore` registry - added in-memory cache for sqlite-vec to be consistent with `milvus` - default fixtures moved to `conftest.py` - removed redundant tests from sqlite`-vec` - made `test_vector_io_openai_vector_stores.py` more easily extensible ## Test Plan Unit tests added testing inline providers. --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-07-10 14:22:13 -04:00
Jorge	dafd9ed5c0	docs: Update links to Android Demo App (#2687 ) # What does this PR do? Updates some broken or outdated links pointing to the Android Demo App Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>	2025-07-09 15:41:57 +02:00
Mustafa Elbehery	cd0ad21111	chore(api): add `mypy` coverage to `apis` (#2648 ) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> This PR adds static type coverage to `llama-stack/apis` Part of https://github.com/meta-llama/llama-stack/issues/2647 <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>	2025-07-09 12:55:16 +02:00
pgustafs	d39660afed	fix(remote:milvus): add missing files_api parameter and kvstore configuration (#2630 ) - Fix constructor call missing files_api parameter - Add kvstore field to MilvusVectorIOConfig - Resolves #2626 # What does this PR do? [https://github.com/meta-llama/llama-stack/issues/2626] ## Problem The `MilvusVectorIOAdapter` fails to initialize due to two missing configuration issues: 1. Missing `files_api` parameter in the constructor call 2. Missing `kvstore` field in the `MilvusVectorIOConfig` class ## Root Cause 1. The adapter constructor expects 3 parameters `(config, inference_api, files_api)` but the `get_adapter_impl` function only passes 2 parameters 2. The `MilvusVectorIOConfig` class lacks the `kvstore` field that the adapter's `initialize()` method expects for metadata persistence ## Solution - Added `files_api = deps.get(Api.files, None)` to safely retrieve files API from dependencies - Pass the files_api parameter to MilvusVectorIOAdapter constructor - Added `kvstore: KVStoreConfig \| None = None` field to MilvusVectorIOConfig - Maintains backward compatibility since both files_api and kvstore can be None Closes #2626 ## Test Plan - [x] Tested with Milvus configuration - server starts successfully ```yaml vector_io: - provider_id: milvus provider_type: remote::milvus config: uri: http://localhost:19530 token: root:Milvus kvstore: type: sqlite namespace: null db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/remote-vllm}/milvus_store.db ``` - [x] Vector operations work as expected ```python from llama_stack_client import LlamaStackClient from llama_stack_client.types.shared_params.document import Document as RAGDocument from llama_stack_client.lib.agents.agent import Agent from llama_stack_client.lib.agents.event_logger import EventLogger as AgentEventLogger import os endpoint = os.getenv("LLAMA_STACK_ENDPOINT") model = os.getenv("INFERENCE_MODEL") # Initialize the client client = LlamaStackClient(base_url=endpoint) vector_db_id = "my_documents" response = client.vector_dbs.register( vector_db_id=vector_db_id, embedding_model="all-MiniLM-L6-v2", embedding_dimension=384, provider_id="milvus", ) urls = ["getting_started/Red_Hat_AI_Inference_Server-3.0-Getting_started-en-US.pdf", "vllm_server_arguments/Red_Hat_AI_Inference_Server-3.0-vLLM_server_arguments-en-US.pdf"] documents = [ RAGDocument( document_id=f"num-{i}", content=f"https://docs.redhat.com/en/documentation/red_hat_ai_inference_server/3.0/pdf/{url}", mime_type="application/pdf", metadata={}, ) for i, url in enumerate(urls) ] client.tool_runtime.rag_tool.insert( documents=documents, vector_db_id=vector_db_id, chunk_size_in_tokens=512, ) rag_agent = Agent( client, model=model, # Define instructions for the agent (system prompt) instructions="You are a helpful assistant", enable_session_persistence=False, # Define tools available to the agent tools=[ { "name": "builtin::rag/knowledge_search", "args": { "vector_db_ids": [vector_db_id], }, } ], ) session_id = rag_agent.create_session("test-session") user_prompts = [ "How to start the AI Inference Server container image? use the knowledge_search tool to get information.", ] for prompt in user_prompts: print(f"User> {prompt}") response = rag_agent.create_turn( messages=[{"role": "user", "content": prompt}], session_id=session_id, ) for log in AgentEventLogger().log(response): log.print() ``` server logs: ``` INFO 2025-07-04 22:18:30,385 __main__:577 server: Listening on ['::', '0.0.0.0']:5000 INFO: Started server process [769725] INFO: Waiting for application startup. INFO 2025-07-04 22:18:30,390 __main__:158 server: Starting up INFO: Application startup complete. INFO: Uvicorn running on http://['::', '0.0.0.0']:5000 (Press CTRL+C to quit) INFO 2025-07-04 22:18:52,193 llama_stack.distribution.routing_tables.common:200 core: Setting owner for vector_db 'my_documents' to 20:18:52.194 [START] /v1/vector-dbs INFO: 192.168.1.249:64170 - "POST /v1/vector-dbs HTTP/1.1" 200 OK 20:18:52.216 [END] /v1/vector-dbs [StatusCode.OK] (21.89ms) 20:18:52.222 [START] /v1/tool-runtime/rag-tool/insert INFO 2025-07-04 22:18:56,265 llama_stack.providers.utils.inference.embedding_mixin:102 uncategorized: Loading sentence transformer for all-MiniLM-L6-v2... WARNING 2025-07-04 22:18:59,214 opentelemetry.trace:537 uncategorized: Overriding of current TracerProvider is not allowed INFO 2025-07-04 22:18:59,339 sentence_transformers.SentenceTransformer:219 uncategorized: Use pytorch device_name: cuda:0 INFO 2025-07-04 22:18:59,340 sentence_transformers.SentenceTransformer:227 uncategorized: Load pretrained SentenceTransformer: all-MiniLM-L6-v2 INFO: 192.168.1.249:64170 - "POST /v1/tool-runtime/rag-tool/insert HTTP/1.1" 200 OK INFO: 192.168.1.249:64170 - "POST /v1/agents HTTP/1.1" 200 OK INFO: 192.168.1.249:64170 - "GET /v1/tools?toolgroup_id=builtin%3A%3Arag%2Fknowledge_search HTTP/1.1" 200 OK INFO: 192.168.1.249:64170 - "POST /v1/agents/b1f6f063-1691-4780-8d9e-facd81708b91/session HTTP/1.1" 200 OK 20:19:01.834 [END] /v1/tool-runtime/rag-tool/insert [StatusCode.OK] (9612.06ms) 20:19:01.839 [START] /v1/agents INFO: 192.168.1.249:64170 - "POST /v1/agents/b1f6f063-1691-4780-8d9e-facd81708b91/session/d2706302-bb54-421d-a890-5e25df9cb47f/turn HTTP/1.1" 200 OK 20:19:01.839 [END] /v1/agents [StatusCode.OK] (0.18ms) 20:19:01.844 [START] /v1/tools INFO 2025-07-04 22:19:01,853 llama_stack.providers.remote.inference.vllm.vllm:330 uncategorized: Initializing vLLM client with base_url=http://192.168.1.183:8080/v1 20:19:01.858 [END] /v1/tools [StatusCode.OK] (14.92ms) 20:19:01.868 [START] /v1/agents/{agent_id}/session 20:19:01.868 [END] /v1/agents/{agent_id}/session [StatusCode.OK] (0.37ms) 20:19:01.873 [START] /v1/agents/{agent_id}/session/{session_id}/turn 20:19:01.885 [START] inference 20:19:05.506 [END] inference [StatusCode.OK] (3621.19ms) INFO 2025-07-04 22:19:05,537 llama_stack.providers.inline.agents.meta_reference.agent_instance:890 agents: executing tool call: knowledge_search with args: {'query': 'How to start the AI Inference Server container image'} 20:19:05.538 [START] tool_execution 20:19:05.928 [END] tool_execution [StatusCode.OK] (390.08ms) 20:19:05.538 [INFO] executing tool call: knowledge_search with args: {'query': 'How to start the AI Inference Server container image'} 20:19:05.935 [START] inference 20:19:17.539 [END] inference [StatusCode.OK] (11603.76ms) 20:19:17.560 [END] /v1/agents/{agent_id}/session/{session_id}/turn [StatusCode.OK] (15686.62ms) ``` - [x] No regressions in functionality - [x] Configuration properly accepts kvstore settings --------- Co-authored-by: Peter Gustafsson <peter.gustafsson6@gmail.com> Co-authored-by: raghotham <rsm@meta.com> Co-authored-by: Francisco Arceo <farceo@redhat.com>	2025-07-09 10:08:14 +02:00
ehhuang	84fa83b788	fix: update k8s templates (#2645 ) Some checks failed Integration Tests / test-matrix (server, 3.12, datasets) (push) Failing after 9s Details Integration Tests / test-matrix (server, 3.12, vector_io) (push) Failing after 12s Details Integration Tests / test-matrix (server, 3.12, post_training) (push) Failing after 12s Details Integration Tests / test-matrix (server, 3.13, inspect) (push) Failing after 15s Details Integration Tests / test-matrix (server, 3.12, scoring) (push) Failing after 13s Details Integration Tests / test-matrix (server, 3.13, datasets) (push) Failing after 17s Details Integration Tests / test-matrix (server, 3.13, providers) (push) Failing after 11s Details Integration Tests / test-matrix (server, 3.13, agents) (push) Failing after 12s Details Integration Tests / test-matrix (server, 3.13, inference) (push) Failing after 14s Details Integration Tests / test-matrix (server, 3.13, post_training) (push) Failing after 10s Details Integration Tests / test-matrix (server, 3.13, tool_runtime) (push) Failing after 13s Details Integration Tests / test-matrix (server, 3.13, scoring) (push) Failing after 15s Details Integration Tests / test-matrix (server, 3.13, vector_io) (push) Failing after 11s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 13s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 8s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 9s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 11s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 11s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 11s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 15s Details Python Package Build Test / build (3.12) (push) Failing after 33s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 41s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 40s Details Python Package Build Test / build (3.13) (push) Failing after 33s Details Test External Providers / test-external-providers (venv) (push) Failing after 8s Details Update ReadTheDocs / update-readthedocs (push) Failing after 10s Details Unit Tests / unit-tests (3.12) (push) Failing after 14s Details Unit Tests / unit-tests (3.13) (push) Failing after 12s Details Pre-commit / pre-commit (push) Successful in 1m23s Details # What does this PR do? - fix env variables - use gpu for vllm - add eks/apply.py for aws - add template to set hf secret ## Test Plan bash apply.sh Co-authored-by: Eric Huang <erichuang@fb.com>	2025-07-08 15:57:01 -07:00
ehhuang	c8bac888af	feat(auth): support github tokens (#2509 ) # What does this PR do? This PR adds GitHub OAuth authentication support to Llama Stack, allowing users to authenticate using their GitHub credentials (#2508) . 1. support verifying github acesss tokens 2. support provider-specific auth error messages 3. opportunistic reorganized the auth configs for better ergonomics ## Test Plan Added unit tests. Also tested e2e manually: ``` server: port: 8321 auth: provider_config: type: github_token ``` ``` ~/projects/llama-stack/llama_stack/ui ❯ curl -v http://localhost:8321/v1/models * Host localhost:8321 was resolved. * IPv6: ::1 * IPv4: 127.0.0.1 * Trying [::1]:8321... * Connected to localhost (::1) port 8321 > GET /v1/models HTTP/1.1 > Host: localhost:8321 > User-Agent: curl/8.7.1 > Accept: / > * Request completely sent off < HTTP/1.1 401 Unauthorized < date: Fri, 27 Jun 2025 21:51:25 GMT < server: uvicorn < content-type: application/json < x-trace-id: 5390c6c0654086c55d87c86d7cbf2f6a < Transfer-Encoding: chunked < * Connection #0 to host localhost left intact {"error": {"message": "Authentication required. Please provide a valid GitHub access token (https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens) in the Authorization header (Bearer <token>)"}} ~/projects/llama-stack/llama_stack/ui ❯ ./scripts/unit-tests.sh ~/projects/llama-stack/llama_stack/ui ❯ curl "http://localhost:8321/v1/models" \ -H "Authorization: Bearer <token_obtained_from_github>" \ {"data":[{"identifier":"accounts/fireworks/models/llama-guard-3-11b-vision","provider_resource_id":"accounts/fireworks/models/llama-guard-3-11b-vision","provider_id":"fireworks","type":"model","metadata":{},"model_type":"llm"},{"identifier":"accounts/fireworks/models/llama-guard-3-8b","provider_resource_id":"accounts/fireworks/models/llama-guard-3-8b","provider_id":"fireworks","type":"model","metadata":{},"model_type":"llm"},{"identifier":"accounts/fireworks/models/llama-v3p1-405b-instruct","provider_resource_id":"accounts/f ``` --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-07-08 11:02:36 -07:00
Francisco Arceo	83c89265e0	chore: Adding unit tests for Milvus and OpenAI compatibility (#2640 ) Some checks failed Integration Tests / test-matrix (server, 3.13, agents) (push) Failing after 13s Details Integration Tests / test-matrix (server, 3.13, inference) (push) Failing after 9s Details Integration Tests / test-matrix (server, 3.13, datasets) (push) Failing after 11s Details Integration Tests / test-matrix (server, 3.13, post_training) (push) Failing after 7s Details Integration Tests / test-matrix (server, 3.13, providers) (push) Failing after 5s Details Integration Tests / test-matrix (server, 3.13, scoring) (push) Failing after 5s Details Integration Tests / test-matrix (server, 3.13, tool_runtime) (push) Failing after 4s Details Integration Tests / test-matrix (server, 3.13, vector_io) (push) Failing after 13s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 14s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 10s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 7s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 6s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 5s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 5s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 6s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 5s Details Test Llama Stack Build / generate-matrix (push) Successful in 36s Details Test Llama Stack Build / build-single-provider (push) Failing after 36s Details Python Package Build Test / build (3.13) (push) Failing after 2s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 36s Details Test External Providers / test-external-providers (venv) (push) Failing after 4s Details Test Llama Stack Build / build (push) Failing after 3s Details Update ReadTheDocs / update-readthedocs (push) Failing after 5s Details Unit Tests / unit-tests (3.12) (push) Failing after 8s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 45s Details Python Package Build Test / build (3.12) (push) Failing after 17s Details Unit Tests / unit-tests (3.13) (push) Failing after 18s Details Pre-commit / pre-commit (push) Successful in 1m35s Details # What does this PR do? - Enabling Unit tests for Milvus to start to test OpenAI compatibility and fixing a few bugs. - Also fixed an inconsistency in the Milvus config between remote and inline. - Added pymilvus to extras for testing in CI I'm going to refactor this later to include the other inline providers so that we can catch issues sooner. I have another PR where I've been testing to find other bugs in the implementation (and required changes drafted here: https://github.com/meta-llama/llama-stack/pull/2617). ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-07-08 00:50:16 -07:00
Ben Browning	5bb3817c49	fix: Restore the nvidia distro (#2639 ) # What does this PR do? The `nvidia` distro was previously collapsed into the `starter` distro. However, the `nvidia` distro was setup specifically to use NVIDIA NeMo microservices as providers for all APIs and not just inference, which means it was doing quite a bit more than what the `starter` distro covers today. We should work with our friends at NVIDIA to determine the best place to maintain this distro long-term, but for now this restores the `nvidia` distro and its docs back to where they were so that things continue to work for their users. ## Test Plan I ensure the `nvidia` distro could build, and run at least to the point of complaining that I didn't provide the necessary API keys. ``` uv run llama stack build --template nvidia --image-type venv uv run llama stack run llama_stack/templates/nvidia/run.yaml ``` I also made sure the docs website built and looks reasonable, with the `nvidia` distro docs at the same URL it was previously (because it has incoming links from official NVIDIA NeMo docs, among other places). ``` uv run --group docs sphinx-autobuild docs/source docs/build/html --write-all ``` Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-07-07 15:50:05 -07:00
Wen Zhou	4bca4af3e4	refactor: set proper name for embedding all-minilm:l6-v2 and update to use "starter" in detailed_tutorial (#2627 ) Some checks failed Integration Tests / test-matrix (server, 3.12, scoring) (push) Failing after 15s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 9s Details Integration Tests / test-matrix (server, 3.13, scoring) (push) Failing after 5s Details Integration Tests / test-matrix (server, 3.12, datasets) (push) Failing after 32s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 23s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 10s Details Integration Tests / test-matrix (server, 3.13, tool_runtime) (push) Failing after 7s Details Integration Tests / test-matrix (server, 3.12, inspect) (push) Failing after 19s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 9s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.13, inference) (push) Failing after 22s Details Integration Tests / test-matrix (server, 3.12, agents) (push) Failing after 16s Details Integration Tests / test-matrix (server, 3.13, agents) (push) Failing after 17s Details Integration Tests / test-matrix (library, 3.12, vector_io) (push) Failing after 24s Details Integration Tests / test-matrix (server, 3.12, providers) (push) Failing after 20s Details Integration Tests / test-matrix (server, 3.13, inference) (push) Failing after 18s Details Integration Tests / test-matrix (server, 3.12, vector_io) (push) Failing after 20s Details Integration Tests / test-matrix (server, 3.13, vector_io) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 34s Details Integration Tests / test-matrix (library, 3.13, post_training) (push) Failing after 33s Details Integration Tests / test-matrix (server, 3.12, tool_runtime) (push) Failing after 30s Details Python Package Build Test / build (3.12) (push) Failing after 9s Details Test External Providers / test-external-providers (venv) (push) Failing after 8s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 12s Details Unit Tests / unit-tests (3.13) (push) Failing after 8s Details Python Package Build Test / build (3.13) (push) Failing after 39s Details Update ReadTheDocs / update-readthedocs (push) Failing after 41s Details Unit Tests / unit-tests (3.12) (push) Failing after 46s Details Pre-commit / pre-commit (push) Successful in 1m30s Details # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> - we are using `all-minilm:l6-v2` but the model we download from ollama is `all-minilm:latest` latest: https://ollama.com/library/all-minilm:latest 1b226e2802db l6-v2: https://ollama.com/library/all-minilm:l6-v2 pin 1b226e2802db - even currently they are exactly the same model but if [all-minilm:l12-v2](https://ollama.com/library/all-minilm:l12-v2) is updated, "latest" might not be the same for l6-v2. - the only change in this PR is pin the model id in ollama - also update detailed_tutorial with "starter" to replace deprecated "ollama". <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> ``` >INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" >llama stack build --run --template ollama --image-type venv ... Build Successful! You can find the newly-built template here: /home/wenzhou/zdtsw-forking/lls/llama-stack/llama_stack/templates/ollama/run.yaml .... - metadata: embedding_dimension: 384 model_id: all-MiniLM-L6-v2 model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType - embedding provider_id: ollama provider_model_id: all-minilm:l6-v2 ... ``` test ``` >llama-stack-client inference chat-completion --message "Write me a 2-sentence poem about the moon" INFO:httpx:HTTP Request: GET http://localhost:8321/v1/models "HTTP/1.1 200 OK" INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/chat/completions "HTTP/1.1 200 OK" OpenAIChatCompletion( id='chatcmpl-04f99071-3da2-44ba-a19f-03b5b7fc70b7', choices=[ OpenAIChatCompletionChoice( finish_reason='stop', index=0, message=OpenAIChatCompletionChoiceMessageOpenAIAssistantMessageParam( role='assistant', content="Here is a 2-sentence poem about the moon:\n\nSilver crescent in the midnight sky,\nLuna's gentle face, a beauty to the eye.", name=None, tool_calls=None, refusal=None, annotations=None, audio=None, function_call=None ), logprobs=None ) ], created=1751644429, model='llama3.2:3b-instruct-fp16', object='chat.completion', service_tier=None, system_fingerprint='fp_ollama', usage={'completion_tokens': 33, 'prompt_tokens': 36, 'total_tokens': 69, 'completion_tokens_details': None, 'prompt_tokens_details': None} ) ``` --------- Signed-off-by: Wen Zhou <wenzhou@redhat.com>	2025-07-06 09:07:37 +05:30
Wen Zhou	c025cab3a3	docs: update docs to use "starter" than "ollama" (#2629 )	2025-07-05 08:44:57 +05:30
Francisco Arceo	dc7df60d42	docs: Update starter docs to include milvus inline (#2631 )	2025-07-05 08:43:39 +05:30
Sébastien Han	ea966565f6	feat: improve telemetry (#2590 ) Some checks failed Integration Tests / test-matrix (server, 3.13, providers) (push) Failing after 6s Details Integration Tests / test-matrix (server, 3.13, tool_runtime) (push) Failing after 5s Details Integration Tests / test-matrix (server, 3.13, vector_io) (push) Failing after 5s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 4s Details Integration Tests / test-matrix (server, 3.12, tool_runtime) (push) Failing after 18s Details Integration Tests / test-matrix (server, 3.13, agents) (push) Failing after 19s Details Integration Tests / test-matrix (server, 3.13, post_training) (push) Failing after 16s Details Integration Tests / test-matrix (server, 3.13, inference) (push) Failing after 18s Details Integration Tests / test-matrix (server, 3.13, scoring) (push) Failing after 17s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 14s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 10s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 9s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 7s Details Test Llama Stack Build / generate-matrix (push) Successful in 3s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 15s Details Python Package Build Test / build (3.13) (push) Failing after 0s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s Details Test Llama Stack Build / build-single-provider (push) Failing after 6s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 17s Details Update ReadTheDocs / update-readthedocs (push) Failing after 4s Details Test Llama Stack Build / build (push) Failing after 4s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 7s Details Test External Providers / test-external-providers (venv) (push) Failing after 5s Details Unit Tests / unit-tests (3.13) (push) Failing after 4s Details Unit Tests / unit-tests (3.12) (push) Failing after 7s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 58s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 1m0s Details Python Package Build Test / build (3.12) (push) Failing after 49s Details Pre-commit / pre-commit (push) Successful in 1m40s Details # What does this PR do? * Use a single env variable to setup OTEL endpoint * Update telemetry provider doc * Update general telemetry doc with the metric with generate * Left a script to setup telemetry for testing Closes: https://github.com/meta-llama/llama-stack/issues/783 Note to reviewer: the `setup_telemetry.sh` script was useful for me, it was nicely generated by AI, if we don't want it in the repo, and I can delete it, and I would understand. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-07-04 17:29:09 +02:00
Sébastien Han	c4349f532b	feat: consolidate most distros into "starter" (#2516 ) # What does this PR do? * Removes a bunch of distros * Removed distros were added into the "starter" distribution * Doc for "starter" has been added * Partially reverts https://github.com/meta-llama/llama-stack/pull/2482 since inference providers are disabled by default and can be turned on manually via env variable. * Disables safety in starter distro Closes: https://github.com/meta-llama/llama-stack/issues/2502. ~Needs: https://github.com/meta-llama/llama-stack/pull/2482 for Ollama to work properly in the CI.~ TODO: - [ ] We can only update `install.sh` when we get a new release. - [x] Update providers documentation - [ ] Update notebooks to reference starter instead of ollama Signed-off-by: Sébastien Han <seb@redhat.com>	2025-07-04 15:58:03 +02:00

1 2 3 4 5 ...

708 commits