llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-08-12 13:00:39 +00:00

Author	SHA1	Message	Date
Eran Cohen	a4bad6c0b4	feat: Add Google Vertex AI inference provider support (#2841 ) Some checks failed Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 10s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 12s Details Python Package Build Test / build (3.13) (push) Failing after 4s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 10s Details Test Llama Stack Build / generate-matrix (push) Successful in 8s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 13s Details Test External API and Providers / test-external (venv) (push) Failing after 11s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 17s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 10s Details Test Llama Stack Build / build-single-provider (push) Failing after 16s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 8s Details Unit Tests / unit-tests (3.12) (push) Failing after 10s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 26s Details Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 15s Details Update ReadTheDocs / update-readthedocs (push) Failing after 9s Details Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 7s Details Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 11s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 23s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 16s Details Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 18s Details Test Llama Stack Build / build (push) Failing after 8s Details Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 17s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 16s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 8s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 21s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 47s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 49s Details Unit Tests / unit-tests (3.13) (push) Failing after 39s Details Pre-commit / pre-commit (push) Successful in 1m37s Details # What does this PR do? - Add new Vertex AI remote inference provider with litellm integration - Support for Gemini models through Google Cloud Vertex AI platform - Uses Google Cloud Application Default Credentials (ADC) for authentication - Added VertexAI models: gemini-2.5-flash, gemini-2.5-pro, gemini-2.0-flash. - Updated provider registry to include vertexai provider - Updated starter template to support Vertex AI configuration - Added comprehensive documentation and sample configuration <!-- If resolving an issue, uncomment and update the line below --> relates to https://github.com/meta-llama/llama-stack/issues/2747 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> Signed-off-by: Eran Cohen <eranco@redhat.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>	2025-08-11 08:22:04 -04:00
Varsha	69dc789e15	docs: Add unsupported search mode info about FAISS (#3089 )	2025-08-10 17:34:34 -06:00
Varsha	ce72a28525	docs: Update doc on search modes for Milvus (#3078 ) # What does this PR do? Update Milvus doc on using search modes. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>	2025-08-10 18:48:36 -04:00
Ashwin Bharambe	7f834339ba	chore(misc): make tests and starter faster (#3042 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 9s Details Python Package Build Test / build (3.12) (push) Failing after 4s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 12s Details Test Llama Stack Build / generate-matrix (push) Successful in 11s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 14s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 22s Details Test External API and Providers / test-external (venv) (push) Failing after 14s Details Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 15s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 22s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 14s Details Unit Tests / unit-tests (3.13) (push) Failing after 14s Details Test Llama Stack Build / build-single-provider (push) Failing after 13s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 18s Details Unit Tests / unit-tests (3.12) (push) Failing after 16s Details Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 18s Details Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 10s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 11s Details Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 16s Details Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 18s Details Test Llama Stack Build / build (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 18s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 20s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 16s Details Python Package Build Test / build (3.13) (push) Failing after 53s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 59s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 1m1s Details Update ReadTheDocs / update-readthedocs (push) Failing after 1m6s Details Pre-commit / pre-commit (push) Successful in 1m53s Details A bunch of miscellaneous cleanup focusing on tests, but ended up speeding up starter distro substantially. - Pulled llama stack client init for tests into `pytest_sessionstart` so it does not clobber output - Profiling of that told me where we were doing lots of heavy imports for starter, so lazied them - starter now starts 20seconds+ faster on my Mac - A few other smallish refactors for `compat_client`	2025-08-05 14:55:05 -07:00
Varsha	1f0766308d	feat: Add openAI compatible APIs to Qdrant (#2465 ) Some checks failed Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 15s Details Test Llama Stack Build / generate-matrix (push) Successful in 9s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 15s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 19s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 13s Details Test Llama Stack Build / build-single-provider (push) Failing after 13s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 15s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 22s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 14s Details Integration Tests (Replay) / discover-tests (push) Successful in 24s Details Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 16s Details Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 17s Details Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 15s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 17s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 18s Details Update ReadTheDocs / update-readthedocs (push) Failing after 12s Details Unit Tests / unit-tests (3.12) (push) Failing after 11s Details Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 16s Details Python Package Build Test / build (3.12) (push) Failing after 20s Details Python Package Build Test / build (3.13) (push) Failing after 18s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 18s Details Test External API and Providers / test-external (venv) (push) Failing after 18s Details Unit Tests / unit-tests (3.13) (push) Failing after 19s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 42s Details Integration Tests (Replay) / run-replay-mode-tests (push) Failing after 22s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 1m12s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 1m15s Details Test Llama Stack Build / build (push) Failing after 32s Details Pre-commit / pre-commit (push) Successful in 2m39s Details # What does this PR do? Adds support to Vector store Open AI APIs in Qdrant. <!-- If resolving an issue, uncomment and update the line below --> Closes #2463 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com> Co-authored-by: ehhuang <ehhuang@users.noreply.github.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>	2025-08-01 00:41:34 -04:00
Francisco Arceo	33cca26154	chore: Enabling Integration tests for Weaviate (#2882 ) # What does this PR do? This PR (1) enables the files API for Weaviate and (2) enables integration tests for Weaviate, which adds a docker container to the github action. This PR also handles a couple of edge cases for in creating the collection and ensuring the tests all pass. ## Test Plan CI enabled --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-07-31 20:29:50 -04:00
Kelly Brown	8a6c0fb930	docs: Reformat external provider documentation (#2982 ) Description This PR adjusts the external providers documentation to align with the new providers format. Splits up sections into the existing external providers and how to create them as well. <img width="1049" height="478" alt="Screenshot 2025-07-31 at 9 48 26 AM" src="https://github.com/user-attachments/assets/f13599cb-2fd1-4e57-8ca9-27b067264e33" /> Open to feedback and adjusting titles	2025-07-31 09:21:13 -07:00
Nehanth Narendrula	cf73146132	feat: Enable DPO training with HuggingFace inline provider (#2825 ) Some checks failed Integration Tests / discover-tests (push) Has been skipped Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 7s Details Integration Tests / record-tests (push) Has been skipped Details Integration Tests / run-tests (push) Has been skipped Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 22s Details Python Package Build Test / build (3.13) (push) Failing after 16s Details Test Llama Stack Build / generate-matrix (push) Successful in 19s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 21s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 31s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 32s Details Test External API and Providers / test-external (venv) (push) Failing after 32s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 36s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 39s Details Update ReadTheDocs / update-readthedocs (push) Failing after 31s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 42s Details Test Llama Stack Build / build-single-provider (push) Failing after 37s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Failing after 35s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 37s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 40s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 42s Details Unit Tests / unit-tests (3.12) (push) Failing after 36s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 40s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 45s Details Test Llama Stack Build / build (push) Failing after 6s Details Python Package Build Test / build (3.12) (push) Failing after 1m1s Details Unit Tests / unit-tests (3.13) (push) Failing after 1m0s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 1m6s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 1m8s Details Pre-commit / pre-commit (push) Successful in 1m50s Details What does this PR do? This PR adds support for Direct Preference Optimization (DPO) training via the existing HuggingFace inline provider. It introduces a new DPO training recipe, config schema updates, dataset integration, and end-to-end testing to support preference-based fine-tuning with TRL. Test Plan Added integration test: tests/integration/post_training/test_post_training.py::TestPostTraining::test_preference_optimize Ran tests on both CPU and CUDA environments --------- Co-authored-by: Ubuntu <ubuntu@ip-172-31-43-83.ec2.internal> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-07-30 23:33:36 -07:00
Kelly Brown	026caa5551	docs: part 1 - fix warnings in documentation generation (#2861 ) Description This PR removes some of the warnings when uv builds the docs - Errors appear when generating docs about .md files not appearing in toctree. ~~Adding content to the `providers-gen.py ` file that adds `--- orphan: true ---` to to each file.~~. Added a toctree generator to the `providers-gen.py` file, this gets rid of the errors in the builds. - Deletes the `_openai_compat` files, extension of PR #2849 - Adds the `files` APIs section to the `providers` toctree on the index page - Manually adds the `--- orphan: true ---` to the advanced apis. Ill try to find a way to modify the providers code gen so it automatically adds it, but this fixes the errors. - Adds the `testing.md` to the `contributing` toctree - Adds `starting_llama_stack_server.md` to `distributions` toctree There are some other warnings im still looking at but this PR gets rid of most of the toctree errors Theres also an issue with the actual distribution-codegen that I can investigate in another PR. Opened a bug for it here #2873	2025-07-30 10:50:10 -07:00
Matthew Farrellee	60bb5e307e	feat(openai): add configurable base_url support with OPENAI_BASE_URL env var (#2919 ) # What does this PR do? - Add base_url field to OpenAIConfig with default "https://api.openai.com/v1" - Update sample_run_config to support OPENAI_BASE_URL environment variable - Modify get_base_url() to return configured base_url instead of hardcoded value - Add comprehensive test suite covering: - Default base URL behavior - Custom base URL from config - Environment variable override - Config precedence over environment variables - Client initialization with configured URL - Model availability checks using configured URL This enables users to configure custom OpenAI-compatible API endpoints via environment variables or configuration files. Closes #2910 ## Test Plan run unit tests	2025-07-28 10:16:02 -07:00
Charlie Doern	b1c21a25ec	docs: remove provider_id from external docs (#2922 ) # What does this PR do? external provider docs mention setting provider_id in the build yaml. Since we changed that to just be provider_type and module, remove instances of provider_id Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-07-28 10:14:39 -07:00
Ashwin Bharambe	9583f468f8	feat(starter)!: simplify starter distro; litellm model registry changes (#2916 )	2025-07-25 15:02:04 -07:00
Charlie Doern	de6919ecdd	refactor: install external providers from module (#2637 ) # What does this PR do? Today, external providers are installed via the `external_providers_dir` in the config. This necessitates users to understand the `ProviderSpec` and set up their directories accordingly. This process splits up the config for the stack across multiple files, directories, and formats. Most (if not all) external providers today have a [get_provider_spec](`559cb18fbb/src/ramalama_stack/provider.py (L9)`) method that sits unused. Utilizing this method rather than the providers.d route allows for a much easier installation process for external providers and limits the amount of extra configuration a regular user has to do to get their stack off the ground. To accomplish this and wire it throughout the build process, Introduce the concept of a `module` for users to specify for an external provider upon build time. In order to facilitate this, align the build and run spec to use `Provider` class rather than the stringified provider_type that build currently uses. For example, say this is in your build config: ``` - provider_id: ramalama provider_type: remote::ramalama module: ramalama_stack ``` during build (in the various `build_...` scripts), additionally to installing any pip dependencies we will also install this module and use the `get_provider_spec` method to retrieve the ProviderSpec that is currently specified using `providers.d`. In production so far, providing instructions for installing external providers for users has been difficult: they need to install the module as a pre-req, create the providers.d directory, copy in the provider spec, and also copy in the necessary build/run yaml files. Accessing an external provider should be as easy as possible, and pointing to its installable module aligns more with the rest of our build and dependency management process. For now, `external_providers_dir` still exists as an alternate more declarative method of using external providers. ## Test Plan added an integration test installing an external provider from module and more unit test coverage for `get_provider_registry` ( the warning in yellow is expected, the module is installed inside of the build env, not where we are running the command) <img width="1119" height="400" alt="Screenshot 2025-07-24 at 11 30 48 AM" src="https://github.com/user-attachments/assets/1efbaf45-b9e8-451a-bd63-264ed664706d" /> <img width="1154" height="618" alt="Screenshot 2025-07-24 at 11 31 14 AM" src="https://github.com/user-attachments/assets/feb2b3ea-c5dd-418e-9662-9a3bd5dd6bdc" /> --------- Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-07-25 15:41:26 +02:00
Ashwin Bharambe	1463b79218	feat(registry): make the Stack query providers for model listing (#2862 ) This flips #2823 and #2805 by making the Stack periodically query the providers for models rather than the providers going behind the back and calling "register" on to the registry themselves. This also adds support for model listing for all other providers via `ModelRegistryHelper`. Once this is done, we do not need to manually list or register models via `run.yaml` and it will remove both noise and annoyance (setting `INFERENCE_MODEL` environment variables, for example) from the new user experience. In addition, it adds a configuration variable `allowed_models` which can be used to optionally restrict the set of models exposed from a provider.	2025-07-24 10:39:53 -07:00
Sarthak Deshpande	cd8715d327	chore: Added openai compatible vector io endpoints for chromadb (#2489 ) Some checks failed Integration Tests / discover-tests (push) Successful in 3s Details Coverage Badge / unit-tests (push) Failing after 6s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 4s Details Test Llama Stack Build / generate-matrix (push) Successful in 3s Details Python Package Build Test / build (3.13) (push) Failing after 2s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 10s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 11s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 16s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 16s Details Python Package Build Test / build (3.12) (push) Failing after 12s Details Test External Providers / test-external-providers (venv) (push) Failing after 12s Details Update ReadTheDocs / update-readthedocs (push) Failing after 10s Details Test Llama Stack Build / build-single-provider (push) Failing after 15s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 23s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 20s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 21s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 20s Details Unit Tests / unit-tests (3.13) (push) Failing after 14s Details Test Llama Stack Build / build (push) Failing after 9s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 18s Details Unit Tests / unit-tests (3.12) (push) Failing after 14s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 19s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 18s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 51s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 49s Details Integration Tests / test-matrix (push) Failing after 53s Details Pre-commit / pre-commit (push) Successful in 1m42s Details # What does this PR do? This PR implements the openai compatible endpoints for chromadb Closes #2462 ## Test Plan Ran ollama llama stack server and ran the command `pytest -sv --stack-config=http://localhost:8321 tests/integration/vector_io/test_openai_vector_stores.py --embedding-model all-MiniLM-L6-v2` 8 failed, 27 passed, 8 skipped, 1 xfailed The failed ones are regarding files api --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> Co-authored-by: sarthakdeshpande <sarthak.deshpande@engati.com> Co-authored-by: Francisco Javier Arceo <farceo@redhat.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>	2025-07-23 13:51:58 -07:00
ehhuang	8e1a2b4703	chore: remove *_openai_compat providers (#2849 ) # What does this PR do? These are no longer needed as llama-stack-evals can run against OAI endpoints directly. ## Test Plan	2025-07-22 10:25:36 -07:00
Jeremy Bonghwan Choi	b5a6ecc331	docs: minor fix of the pgvector provider spec description (#2847 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Integration Tests / discover-tests (push) Successful in 3s Details Coverage Badge / unit-tests (push) Failing after 6s Details Python Package Build Test / build (3.13) (push) Failing after 6s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 11s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 10s Details Python Package Build Test / build (3.12) (push) Failing after 7s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 12s Details Test External Providers / test-external-providers (venv) (push) Failing after 7s Details Unit Tests / unit-tests (3.12) (push) Failing after 10s Details Update ReadTheDocs / update-readthedocs (push) Failing after 9s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 13s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 20s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 14s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 10s Details Integration Tests / test-matrix (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 11s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 21s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 27s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 25s Details Unit Tests / unit-tests (3.13) (push) Failing after 24s Details Pre-commit / pre-commit (push) Successful in 1m17s Details # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> minor update of the pgvector doc, changing 'faiss' to 'pgvector' <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. -->	2025-07-21 22:10:35 -07:00
Ashwin Bharambe	199f859eec	feat(vllm): periodically refresh models (#2823 ) Just like #2805 but for vLLM. We also make VLLM_URL env variable optional (not required) -- if not specified, the provider silently sits idle and yells eventually if someone tries to call a completion on it. This is done so as to allow this provider to be present in the `starter` distribution. ## Test Plan Set up vLLM, copy the starter template and set `{ refresh_models: true, refresh_models_interval: 10 }` for the vllm provider and then run: ``` ENABLE_VLLM=vllm VLLM_URL=http://localhost:8000/v1 \ uv run llama stack run --image-type venv /tmp/starter.yaml ``` Verify that `llama-stack-client models list` brings up the model correctly from vLLM.	2025-07-18 15:53:09 -07:00
Ashwin Bharambe	ade075152e	chore: kill inline::vllm (#2824 ) Inline _inference_ providers haven't proved to be very useful -- they are rarely used. And for good reason -- it is almost never a good idea to include a complex (distributed) inference engine bundled into a distributed stateful front-end server serving many other things. Responsibility should be split properly. See Discord discussion: `1395849853`	2025-07-18 15:52:18 -07:00
Ashwin Bharambe	68a2dfbad7	feat(ollama): periodically refresh models (#2805 ) For self-hosted providers like Ollama (or vLLM), the backing server is running a set of models. That server should be treated as the source of truth and the Stack registry should just be a cache for those models. Of course, in production environments, you may not want this (because you know what model you are running statically) hence there's a config boolean to control this behavior. _This is part of a series of PRs aimed at removing the requirement of needing to set `INFERENCE_MODEL` env variables for running Llama Stack server._ ## Test Plan Copy and modify the starter.yaml template / config and enable `refresh_models: true, refresh_models_interval: 10` for the ollama provider. Then, run: ``` LLAMA_STACK_LOGGING=all=debug \ ENABLE_OLLAMA=ollama uv run llama stack run --image-type venv /tmp/starter.yaml ``` See a gargantuan amount of logs, but verify that the provider is periodically refreshing models. Stop and prune a model from ollama server, restart the server. Verify that the model goes away when I call `uv run llama-stack-client models list`	2025-07-18 12:20:36 -07:00
Kelly Brown	b096794959	docs: Reorganize documentation on the webpage (#2651 ) Some checks failed SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 2s Details Integration Tests / discover-tests (push) Successful in 2s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 17s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 19s Details Python Package Build Test / build (3.12) (push) Failing after 14s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 14s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 15s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 20s Details Unit Tests / unit-tests (3.13) (push) Failing after 15s Details Test Llama Stack Build / generate-matrix (push) Successful in 16s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 20s Details Test External Providers / test-external-providers (venv) (push) Failing after 17s Details Update ReadTheDocs / update-readthedocs (push) Failing after 15s Details Test Llama Stack Build / build-single-provider (push) Failing after 21s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 18s Details Unit Tests / unit-tests (3.12) (push) Failing after 22s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 25s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 23s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 26s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 19s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 28s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 21s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 23s Details Python Package Build Test / build (3.13) (push) Failing after 44s Details Test Llama Stack Build / build (push) Failing after 25s Details Integration Tests / test-matrix (push) Failing after 46s Details Pre-commit / pre-commit (push) Successful in 2m24s Details # What does this PR do? Reorganizes the Llama stack webpage into more concise index pages, introduce more of a workflow, and reduce repetition of content. New nav structure so far based on #2637 Further discussions in https://github.com/meta-llama/llama-stack/discussions/2585 Preview: ![Screenshot 2025-07-09 at 2 31 53 PM](https://github.com/user-attachments/assets/4c1f3845-b328-4f12-9f20-3f09375007af) You can also build a full local preview locally Feedback Looking for feedback on page titles and general feedback on the new structure Follow up documentation I plan on reducing some sections and standardizing some terminology in a follow up PR. More discussions on that in https://github.com/meta-llama/llama-stack/discussions/2585	2025-07-15 14:19:35 -07:00
Varsha	4ae5656c2f	feat: Implement keyword search in milvus (#2231 ) Some checks failed SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 7s Details Integration Tests / discover-tests (push) Successful in 8s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 10s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 6s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 6s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 11s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 9s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 10s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 8s Details Test Llama Stack Build / generate-matrix (push) Successful in 8s Details Python Package Build Test / build (3.13) (push) Failing after 6s Details Unit Tests / unit-tests (3.12) (push) Failing after 6s Details Unit Tests / unit-tests (3.13) (push) Failing after 6s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 13s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 15s Details Test External Providers / test-external-providers (venv) (push) Failing after 9s Details Test Llama Stack Build / build-single-provider (push) Failing after 11s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 14s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 19s Details Integration Tests / test-matrix (push) Failing after 8s Details Test Llama Stack Build / build (push) Failing after 5s Details Python Package Build Test / build (3.12) (push) Failing after 51s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 55s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 57s Details Update ReadTheDocs / update-readthedocs (push) Failing after 50s Details Pre-commit / pre-commit (push) Successful in 2m9s Details # What does this PR do? This PR adds the keyword search implementation for Milvus. Along with the implementation for remote Milvus, the tests require us to start a Milvus containers locally. In order to verify the implementation, run: ``` pytest tests/unit/providers/vector_io/remote/test_milvus.py -v -s --tb=short --disable-warnings --asyncio-mode=auto ``` You can also test the changes using the below script: ``` #!/usr/bin/env python3 import asyncio import os import uuid from typing import List from llama_stack_client import ( Agent, AgentEventLogger, LlamaStackClient, RAGDocument ) class MilvusRAGDemo: def __init__(self, base_url: str = "http://localhost:8321/"): self.client = LlamaStackClient(base_url=base_url) self.vector_db_id = f"milvus_rag_demo_{uuid.uuid4().hex[:8]}" self.model_id = None self.embedding_model_id = None self.embedding_dimension = None def setup_models(self): """Get available models and select appropriate ones for LLM and embeddings.""" models = self.client.models.list() # Select embedding model embedding_models = [m for m in models if m.model_type == "embedding"] if not embedding_models: raise ValueError("No embedding models found") self.embedding_model_id = embedding_models[0].identifier self.embedding_dimension = embedding_models[0].metadata["embedding_dimension"] def register_vector_db(self): print(f"Registering Milvus vector database: {self.vector_db_id}") response = self.client.vector_dbs.register( vector_db_id=self.vector_db_id, embedding_model=self.embedding_model_id, embedding_dimension=self.embedding_dimension, provider_id="milvus-remote", # Use remote Milvus ) print(f"Vector database registered successfully") return response def insert_documents(self): """Insert sample documents into the vector database.""" print("\nInserting sample documents...") # Sample documents about different topics documents = [ RAGDocument( document_id="ai_ml_basics", content=""" Artificial Intelligence (AI) and Machine Learning (ML) are transforming the world. AI refers to the simulation of human intelligence in machines, while ML is a subset of AI that enables computers to learn and improve from experience without being explicitly programmed. Deep learning, a subset of ML, uses neural networks with multiple layers to process complex patterns in data. Key concepts in AI/ML include: - Supervised Learning: Training with labeled data - Unsupervised Learning: Finding patterns in unlabeled data - Reinforcement Learning: Learning through trial and error - Neural Networks: Computing systems inspired by biological brains """, mime_type="text/plain", metadata={"topic": "technology", "category": "ai_ml"}, ), ] # Insert documents with chunking self.client.tool_runtime.rag_tool.insert( documents=documents, vector_db_id=self.vector_db_id, chunk_size_in_tokens=200, # Smaller chunks for better granularity ) print(f"Inserted {len(documents)} documents with chunking") def test_keyword_search(self): """Test keyword-based search using BM25.""" queries = [ "neural networks", "Python frameworks", "data cleaning", ] for query in queries: response = self.client.vector_io.query( vector_db_id=self.vector_db_id, query=query, params={ "mode": "keyword", # Keyword search "max_chunks": 3, "score_threshold": 0.0, } ) for i, (chunk, score) in enumerate(zip(response.chunks, response.scores)): print(f" {i+1}. Score: {score:.4f}") print(f" Content: {chunk.content[:100]}...") print(f" Metadata: {chunk.metadata}") def run_demo(self): try: self.setup_models() self.register_vector_db() self.insert_documents() self.test_keyword_search() except Exception as e: print(f"Error during demo: {e}") raise def main(): """Main function to run the demo.""" # Check if Llama Stack server is running demo = MilvusRAGDemo() try: demo.run_demo() except Exception as e: print(f"Demo failed: {e}") if __name__ == "__main__": main() ``` [//]: # (## Documentation) --------- Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>	2025-07-14 19:39:55 -04:00
Francisco Arceo	33f0d83ad3	chore: Move vector store `kvstore` implementation into `openai_vector_store_mixin.py` (#2748 )	2025-07-14 18:10:35 -04:00
Francisco Arceo	6a6b66ae4f	chore: Adding unit tests for OpenAI vector stores and migrating SQLite-vec registry to kvstore (#2665 ) # What does this PR do? This PR refactors and the VectorIO backend logic for `sqlite-vec` and adds unit tests and fixtures to make it easy to test both `sqlite-vec` and `milvus`. Key changes: - `sqlite-vec` migrated to `kvstore` registry - added in-memory cache for sqlite-vec to be consistent with `milvus` - default fixtures moved to `conftest.py` - removed redundant tests from sqlite`-vec` - made `test_vector_io_openai_vector_stores.py` more easily extensible ## Test Plan Unit tests added testing inline providers. --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-07-10 14:22:13 -04:00
pgustafs	d39660afed	fix(remote:milvus): add missing files_api parameter and kvstore configuration (#2630 ) - Fix constructor call missing files_api parameter - Add kvstore field to MilvusVectorIOConfig - Resolves #2626 # What does this PR do? [https://github.com/meta-llama/llama-stack/issues/2626] ## Problem The `MilvusVectorIOAdapter` fails to initialize due to two missing configuration issues: 1. Missing `files_api` parameter in the constructor call 2. Missing `kvstore` field in the `MilvusVectorIOConfig` class ## Root Cause 1. The adapter constructor expects 3 parameters `(config, inference_api, files_api)` but the `get_adapter_impl` function only passes 2 parameters 2. The `MilvusVectorIOConfig` class lacks the `kvstore` field that the adapter's `initialize()` method expects for metadata persistence ## Solution - Added `files_api = deps.get(Api.files, None)` to safely retrieve files API from dependencies - Pass the files_api parameter to MilvusVectorIOAdapter constructor - Added `kvstore: KVStoreConfig \| None = None` field to MilvusVectorIOConfig - Maintains backward compatibility since both files_api and kvstore can be None Closes #2626 ## Test Plan - [x] Tested with Milvus configuration - server starts successfully ```yaml vector_io: - provider_id: milvus provider_type: remote::milvus config: uri: http://localhost:19530 token: root:Milvus kvstore: type: sqlite namespace: null db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/remote-vllm}/milvus_store.db ``` - [x] Vector operations work as expected ```python from llama_stack_client import LlamaStackClient from llama_stack_client.types.shared_params.document import Document as RAGDocument from llama_stack_client.lib.agents.agent import Agent from llama_stack_client.lib.agents.event_logger import EventLogger as AgentEventLogger import os endpoint = os.getenv("LLAMA_STACK_ENDPOINT") model = os.getenv("INFERENCE_MODEL") # Initialize the client client = LlamaStackClient(base_url=endpoint) vector_db_id = "my_documents" response = client.vector_dbs.register( vector_db_id=vector_db_id, embedding_model="all-MiniLM-L6-v2", embedding_dimension=384, provider_id="milvus", ) urls = ["getting_started/Red_Hat_AI_Inference_Server-3.0-Getting_started-en-US.pdf", "vllm_server_arguments/Red_Hat_AI_Inference_Server-3.0-vLLM_server_arguments-en-US.pdf"] documents = [ RAGDocument( document_id=f"num-{i}", content=f"https://docs.redhat.com/en/documentation/red_hat_ai_inference_server/3.0/pdf/{url}", mime_type="application/pdf", metadata={}, ) for i, url in enumerate(urls) ] client.tool_runtime.rag_tool.insert( documents=documents, vector_db_id=vector_db_id, chunk_size_in_tokens=512, ) rag_agent = Agent( client, model=model, # Define instructions for the agent (system prompt) instructions="You are a helpful assistant", enable_session_persistence=False, # Define tools available to the agent tools=[ { "name": "builtin::rag/knowledge_search", "args": { "vector_db_ids": [vector_db_id], }, } ], ) session_id = rag_agent.create_session("test-session") user_prompts = [ "How to start the AI Inference Server container image? use the knowledge_search tool to get information.", ] for prompt in user_prompts: print(f"User> {prompt}") response = rag_agent.create_turn( messages=[{"role": "user", "content": prompt}], session_id=session_id, ) for log in AgentEventLogger().log(response): log.print() ``` server logs: ``` INFO 2025-07-04 22:18:30,385 __main__:577 server: Listening on ['::', '0.0.0.0']:5000 INFO: Started server process [769725] INFO: Waiting for application startup. INFO 2025-07-04 22:18:30,390 __main__:158 server: Starting up INFO: Application startup complete. INFO: Uvicorn running on http://['::', '0.0.0.0']:5000 (Press CTRL+C to quit) INFO 2025-07-04 22:18:52,193 llama_stack.distribution.routing_tables.common:200 core: Setting owner for vector_db 'my_documents' to 20:18:52.194 [START] /v1/vector-dbs INFO: 192.168.1.249:64170 - "POST /v1/vector-dbs HTTP/1.1" 200 OK 20:18:52.216 [END] /v1/vector-dbs [StatusCode.OK] (21.89ms) 20:18:52.222 [START] /v1/tool-runtime/rag-tool/insert INFO 2025-07-04 22:18:56,265 llama_stack.providers.utils.inference.embedding_mixin:102 uncategorized: Loading sentence transformer for all-MiniLM-L6-v2... WARNING 2025-07-04 22:18:59,214 opentelemetry.trace:537 uncategorized: Overriding of current TracerProvider is not allowed INFO 2025-07-04 22:18:59,339 sentence_transformers.SentenceTransformer:219 uncategorized: Use pytorch device_name: cuda:0 INFO 2025-07-04 22:18:59,340 sentence_transformers.SentenceTransformer:227 uncategorized: Load pretrained SentenceTransformer: all-MiniLM-L6-v2 INFO: 192.168.1.249:64170 - "POST /v1/tool-runtime/rag-tool/insert HTTP/1.1" 200 OK INFO: 192.168.1.249:64170 - "POST /v1/agents HTTP/1.1" 200 OK INFO: 192.168.1.249:64170 - "GET /v1/tools?toolgroup_id=builtin%3A%3Arag%2Fknowledge_search HTTP/1.1" 200 OK INFO: 192.168.1.249:64170 - "POST /v1/agents/b1f6f063-1691-4780-8d9e-facd81708b91/session HTTP/1.1" 200 OK 20:19:01.834 [END] /v1/tool-runtime/rag-tool/insert [StatusCode.OK] (9612.06ms) 20:19:01.839 [START] /v1/agents INFO: 192.168.1.249:64170 - "POST /v1/agents/b1f6f063-1691-4780-8d9e-facd81708b91/session/d2706302-bb54-421d-a890-5e25df9cb47f/turn HTTP/1.1" 200 OK 20:19:01.839 [END] /v1/agents [StatusCode.OK] (0.18ms) 20:19:01.844 [START] /v1/tools INFO 2025-07-04 22:19:01,853 llama_stack.providers.remote.inference.vllm.vllm:330 uncategorized: Initializing vLLM client with base_url=http://192.168.1.183:8080/v1 20:19:01.858 [END] /v1/tools [StatusCode.OK] (14.92ms) 20:19:01.868 [START] /v1/agents/{agent_id}/session 20:19:01.868 [END] /v1/agents/{agent_id}/session [StatusCode.OK] (0.37ms) 20:19:01.873 [START] /v1/agents/{agent_id}/session/{session_id}/turn 20:19:01.885 [START] inference 20:19:05.506 [END] inference [StatusCode.OK] (3621.19ms) INFO 2025-07-04 22:19:05,537 llama_stack.providers.inline.agents.meta_reference.agent_instance:890 agents: executing tool call: knowledge_search with args: {'query': 'How to start the AI Inference Server container image'} 20:19:05.538 [START] tool_execution 20:19:05.928 [END] tool_execution [StatusCode.OK] (390.08ms) 20:19:05.538 [INFO] executing tool call: knowledge_search with args: {'query': 'How to start the AI Inference Server container image'} 20:19:05.935 [START] inference 20:19:17.539 [END] inference [StatusCode.OK] (11603.76ms) 20:19:17.560 [END] /v1/agents/{agent_id}/session/{session_id}/turn [StatusCode.OK] (15686.62ms) ``` - [x] No regressions in functionality - [x] Configuration properly accepts kvstore settings --------- Co-authored-by: Peter Gustafsson <peter.gustafsson6@gmail.com> Co-authored-by: raghotham <rsm@meta.com> Co-authored-by: Francisco Arceo <farceo@redhat.com>	2025-07-09 10:08:14 +02:00
Francisco Arceo	83c89265e0	chore: Adding unit tests for Milvus and OpenAI compatibility (#2640 ) Some checks failed Integration Tests / test-matrix (server, 3.13, agents) (push) Failing after 13s Details Integration Tests / test-matrix (server, 3.13, inference) (push) Failing after 9s Details Integration Tests / test-matrix (server, 3.13, datasets) (push) Failing after 11s Details Integration Tests / test-matrix (server, 3.13, post_training) (push) Failing after 7s Details Integration Tests / test-matrix (server, 3.13, providers) (push) Failing after 5s Details Integration Tests / test-matrix (server, 3.13, scoring) (push) Failing after 5s Details Integration Tests / test-matrix (server, 3.13, tool_runtime) (push) Failing after 4s Details Integration Tests / test-matrix (server, 3.13, vector_io) (push) Failing after 13s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 14s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 10s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 7s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 6s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 5s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 5s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 6s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 5s Details Test Llama Stack Build / generate-matrix (push) Successful in 36s Details Test Llama Stack Build / build-single-provider (push) Failing after 36s Details Python Package Build Test / build (3.13) (push) Failing after 2s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 36s Details Test External Providers / test-external-providers (venv) (push) Failing after 4s Details Test Llama Stack Build / build (push) Failing after 3s Details Update ReadTheDocs / update-readthedocs (push) Failing after 5s Details Unit Tests / unit-tests (3.12) (push) Failing after 8s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 45s Details Python Package Build Test / build (3.12) (push) Failing after 17s Details Unit Tests / unit-tests (3.13) (push) Failing after 18s Details Pre-commit / pre-commit (push) Successful in 1m35s Details # What does this PR do? - Enabling Unit tests for Milvus to start to test OpenAI compatibility and fixing a few bugs. - Also fixed an inconsistency in the Milvus config between remote and inline. - Added pymilvus to extras for testing in CI I'm going to refactor this later to include the other inline providers so that we can catch issues sooner. I have another PR where I've been testing to find other bugs in the implementation (and required changes drafted here: https://github.com/meta-llama/llama-stack/pull/2617). ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-07-08 00:50:16 -07:00
Sébastien Han	ea966565f6	feat: improve telemetry (#2590 ) Some checks failed Integration Tests / test-matrix (server, 3.13, providers) (push) Failing after 6s Details Integration Tests / test-matrix (server, 3.13, tool_runtime) (push) Failing after 5s Details Integration Tests / test-matrix (server, 3.13, vector_io) (push) Failing after 5s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 4s Details Integration Tests / test-matrix (server, 3.12, tool_runtime) (push) Failing after 18s Details Integration Tests / test-matrix (server, 3.13, agents) (push) Failing after 19s Details Integration Tests / test-matrix (server, 3.13, post_training) (push) Failing after 16s Details Integration Tests / test-matrix (server, 3.13, inference) (push) Failing after 18s Details Integration Tests / test-matrix (server, 3.13, scoring) (push) Failing after 17s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 14s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 10s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 9s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 7s Details Test Llama Stack Build / generate-matrix (push) Successful in 3s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 15s Details Python Package Build Test / build (3.13) (push) Failing after 0s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s Details Test Llama Stack Build / build-single-provider (push) Failing after 6s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 17s Details Update ReadTheDocs / update-readthedocs (push) Failing after 4s Details Test Llama Stack Build / build (push) Failing after 4s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 7s Details Test External Providers / test-external-providers (venv) (push) Failing after 5s Details Unit Tests / unit-tests (3.13) (push) Failing after 4s Details Unit Tests / unit-tests (3.12) (push) Failing after 7s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 58s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 1m0s Details Python Package Build Test / build (3.12) (push) Failing after 49s Details Pre-commit / pre-commit (push) Successful in 1m40s Details # What does this PR do? * Use a single env variable to setup OTEL endpoint * Update telemetry provider doc * Update general telemetry doc with the metric with generate * Left a script to setup telemetry for testing Closes: https://github.com/meta-llama/llama-stack/issues/783 Note to reviewer: the `setup_telemetry.sh` script was useful for me, it was nicely generated by AI, if we don't want it in the repo, and I can delete it, and I would understand. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-07-04 17:29:09 +02:00
Sébastien Han	c4349f532b	feat: consolidate most distros into "starter" (#2516 ) # What does this PR do? * Removes a bunch of distros * Removed distros were added into the "starter" distribution * Doc for "starter" has been added * Partially reverts https://github.com/meta-llama/llama-stack/pull/2482 since inference providers are disabled by default and can be turned on manually via env variable. * Disables safety in starter distro Closes: https://github.com/meta-llama/llama-stack/issues/2502. ~Needs: https://github.com/meta-llama/llama-stack/pull/2482 for Ollama to work properly in the CI.~ TODO: - [ ] We can only update `install.sh` when we get a new release. - [x] Update providers documentation - [ ] Update notebooks to reference starter instead of ollama Signed-off-by: Sébastien Han <seb@redhat.com>	2025-07-04 15:58:03 +02:00
ehhuang	3c43a2f529	fix: store configs (#2593 ) # What does this PR do? https://github.com/meta-llama/llama-stack/pull/2490 broke postgres_demo, as the config expected a str but the value was converted to int. This PR: 1. Updates the type of port in sqlstore to be int 2. template generation uses `dict` instead of `StackRunConfig` so as to avoid failing pydantic typechecks. 3. Adds `replace_env_vars` to StackRunConfig instantiation in `configure.py` (not sure why this wasn't needed before). ## Test Plan `llama stack build --template postgres_demo --image-type conda --run`	2025-07-03 10:07:23 -07:00
Wen Zhou	040424acf5	docs: update full list of providers with matched APIs and dockerhub images (#2452 ) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> - add model_type in example - change "Memory" to "VectorIO" as column name - update index.md and README.md <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> run pre-commit to catch changes. --------- Signed-off-by: Wen Zhou <wenzhou@redhat.com> Co-authored-by: Sébastien Han <seb@redhat.com>	2025-07-03 10:12:56 +02:00
Sébastien Han	25268854bc	fix: allow default empty vars for conditionals (#2570 ) # What does this PR do? We were not using conditionals correctly, conditionals can only be used when the env variable is set, so `${env.ENVIRONMENT:+}` would return None is ENVIRONMENT is not set. If you want to create a conditional value, you need to do `${env.ENVIRONMENT:=}`, this will pick the value of ENVIRONMENT if set, otherwise will return None. Closes: https://github.com/meta-llama/llama-stack/issues/2564 Signed-off-by: Sébastien Han <seb@redhat.com>	2025-07-01 14:42:05 +02:00
Nathan Weinberg	faaeccc6fd	docs: update external provider guide and navigation (#2567 ) Some checks failed Integration Tests / test-matrix (http, 3.13, vector_io) (push) Failing after 25s Details Integration Tests / test-matrix (http, 3.13, agents) (push) Failing after 33s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 8s Details Integration Tests / test-matrix (http, 3.12, inspect) (push) Failing after 36s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 9s Details Integration Tests / test-matrix (http, 3.13, scoring) (push) Failing after 31s Details Integration Tests / test-matrix (library, 3.13, agents) (push) Failing after 28s Details Integration Tests / test-matrix (library, 3.13, post_training) (push) Failing after 9s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.13, inference) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.13, tool_runtime) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 29s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 7s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.13, providers) (push) Failing after 10s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.13, vector_io) (push) Failing after 11s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.13, scoring) (push) Failing after 14s Details Python Package Build Test / build (3.12) (push) Failing after 9s Details Python Package Build Test / build (3.13) (push) Failing after 7s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 14s Details Test External Providers / test-external-providers (venv) (push) Failing after 7s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 16s Details Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 15s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 14s Details Unit Tests / unit-tests (3.12) (push) Failing after 10s Details Unit Tests / unit-tests (3.13) (push) Failing after 8s Details Update ReadTheDocs / update-readthedocs (push) Failing after 6s Details Pre-commit / pre-commit (push) Successful in 1m23s Details # What does this PR do? The external providers guide can now be accessed directly from the sidebar ## Test Plan Build locally to test the changes Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-07-01 09:42:32 +02:00
Francisco Arceo	5785ccda35	fix: Fixing Milvus sample config and updating documentation (#2568 )	2025-06-30 19:25:23 -07:00
Sébastien Han	c9a49a80e8	docs: auto generated documentation for providers (#2543 ) # What does this PR do? Simple approach to get some provider pages in the docs. Add or update description fields in the provider configuration class using Pydantic’s Field, ensuring these descriptions are clear and complete, as they will be used to auto-generate provider documentation via ./scripts/distro_codegen.py instead of editing the docs manually. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-06-30 15:13:20 +02:00
Varsha	2e8054bede	feat: Implement hybrid search in SQLite-vec (#2312 ) Some checks failed Integration Tests / test-matrix (library, 3.11, agents) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.11, datasets) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.11, inspect) (push) Failing after 15s Details Integration Tests / test-matrix (library, 3.11, inference) (push) Failing after 16s Details Integration Tests / test-matrix (library, 3.11, vector_io) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.11, post_training) (push) Failing after 25s Details Integration Tests / test-matrix (library, 3.11, providers) (push) Failing after 24s Details Integration Tests / test-matrix (library, 3.11, scoring) (push) Failing after 22s Details Integration Tests / test-matrix (library, 3.11, tool_runtime) (push) Failing after 14s Details Integration Tests / test-matrix (library, 3.12, agents) (push) Failing after 6s Details Integration Tests / test-matrix (library, 3.12, datasets) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.12, inference) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.12, post_training) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.12, providers) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.12, scoring) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.12, vector_io) (push) Failing after 41s Details Test Llama Stack Build / generate-matrix (push) Successful in 37s Details Test Llama Stack Build / build-single-provider (push) Failing after 37s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 35s Details Test External Providers / test-external-providers (venv) (push) Failing after 5s Details Update ReadTheDocs / update-readthedocs (push) Failing after 5s Details Unit Tests / unit-tests (3.11) (push) Failing after 6s Details Unit Tests / unit-tests (3.12) (push) Failing after 6s Details Unit Tests / unit-tests (3.13) (push) Failing after 6s Details Test Llama Stack Build / build (push) Failing after 7s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 18s Details Unit Tests / unit-tests (3.10) (push) Failing after 17s Details Pre-commit / pre-commit (push) Successful in 2m0s Details # What does this PR do? Add support for hybrid search mode in SQLite-vec provider, which combines keyword and vector search for better results. The implementation: - Adds hybrid search mode as a new option alongside vector and keyword search - Implements query_hybrid method in SQLiteVecIndex that: - First performs keyword search to get candidate matches - Then applies vector similarity search on those candidates - Updates documentation to reflect the new search mode This change improves search quality by leveraging both semantic similarity and keyword matching, while maintaining backward compatibility with existing vector and keyword search modes. ## Test Plan ``` pytest tests/unit/providers/vector_io/test_sqlite_vec.py -v -s --tb=short /Users/vnarsing/miniconda3/envs/stack-client/lib/python3.10/site-packages/pytest_asyncio/plugin.py:217: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset. The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session" warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET)) =============================================================================================== test session starts =============================================================================================== platform darwin -- Python 3.10.16, pytest-8.3.5, pluggy-1.5.0 -- /Users/vnarsing/miniconda3/envs/stack-client/bin/python cachedir: .pytest_cache metadata: {'Python': '3.10.16', 'Platform': 'macOS-14.7.6-arm64-arm-64bit', 'Packages': {'pytest': '8.3.5', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'anyio': '4.8.0', 'asyncio': '0.26.0', 'nbval': '0.11.0', 'cov': '6.1.1'}} rootdir: /Users/vnarsing/go/src/github/meta-llama/llama-stack configfile: pyproject.toml plugins: html-4.1.1, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, anyio-4.8.0, asyncio-0.26.0, nbval-0.11.0, cov-6.1.1 asyncio: mode=strict, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function collected 10 items tests/unit/providers/vector_io/test_sqlite_vec.py::test_add_chunks PASSED tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_vector PASSED tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_full_text_search PASSED tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid PASSED tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_full_text_search_k_greater_than_results PASSED tests/unit/providers/vector_io/test_sqlite_vec.py::test_chunk_id_conflict PASSED tests/unit/providers/vector_io/test_sqlite_vec.py::test_generate_chunk_id PASSED tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_no_keyword_matches PASSED tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_score_threshold PASSED tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_different_embedding PASSED ``` --------- Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>	2025-06-13 15:54:06 -04:00
Charlie Doern	a7ecc92be1	docs: add post training to providers list (#2280 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s Details Integration Tests / test-matrix (http, inference) (push) Failing after 11s Details Integration Tests / test-matrix (http, datasets) (push) Failing after 11s Details Integration Tests / test-matrix (http, providers) (push) Failing after 10s Details Integration Tests / test-matrix (http, inspect) (push) Failing after 12s Details Integration Tests / test-matrix (http, agents) (push) Failing after 13s Details Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 10s Details Integration Tests / test-matrix (library, agents) (push) Failing after 10s Details Integration Tests / test-matrix (http, scoring) (push) Failing after 11s Details Integration Tests / test-matrix (http, post_training) (push) Failing after 11s Details Integration Tests / test-matrix (library, datasets) (push) Failing after 10s Details Integration Tests / test-matrix (library, inference) (push) Failing after 8s Details Test External Providers / test-external-providers (venv) (push) Failing after 6s Details Integration Tests / test-matrix (library, inspect) (push) Failing after 9s Details Integration Tests / test-matrix (library, post_training) (push) Failing after 10s Details Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 9s Details Integration Tests / test-matrix (library, scoring) (push) Failing after 9s Details Unit Tests / unit-tests (3.10) (push) Failing after 8s Details Integration Tests / test-matrix (library, providers) (push) Failing after 10s Details Unit Tests / unit-tests (3.11) (push) Failing after 8s Details Unit Tests / unit-tests (3.12) (push) Failing after 9s Details Update ReadTheDocs / update-readthedocs (push) Failing after 6s Details Unit Tests / unit-tests (3.13) (push) Failing after 1m18s Details Pre-commit / pre-commit (push) Successful in 3m0s Details # What does this PR do? the providers list is missing post_training. Add that column and `HuggingFace`, `TorchTune`, and `NVIDIA NEMO` as supported providers. also point to these providers in docs/source/providers/index.md, and describe basic functionality There are other missing provider types here as well, but starting with this Signed-off-by: Charlie Doern <cdoern@redhat.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>	2025-05-28 09:32:00 -04:00
Varsha	e92301f2d7	feat(sqlite-vec): enable keyword search for sqlite-vec (#1439 ) # What does this PR do? This PR introduces support for keyword based FTS5 search with BM25 relevance scoring. It makes changes to the existing EmbeddingIndex base class in order to support a search_mode and query_str parameter, that can be used for keyword based search implementations. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan run ``` pytest llama_stack/providers/tests/vector_io/test_sqlite_vec.py -v -s --tb=short --disable-warnings --asyncio-mode=auto ``` Output: ``` pytest llama_stack/providers/tests/vector_io/test_sqlite_vec.py -v -s --tb=short --disable-warnings --asyncio-mode=auto /Users/vnarsing/miniconda3/envs/stack-client/lib/python3.10/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset. The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session" warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET)) ====================================================== test session starts ======================================================= platform darwin -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /Users/vnarsing/miniconda3/envs/stack-client/bin/python cachedir: .pytest_cache metadata: {'Python': '3.10.16', 'Platform': 'macOS-14.7.4-arm64-arm-64bit', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0'}} rootdir: /Users/vnarsing/go/src/github/meta-llama/llama-stack configfile: pyproject.toml plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0 asyncio: mode=auto, asyncio_default_fixture_loop_scope=None collected 7 items llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_add_chunks PASSED llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_query_chunks_vector PASSED llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_query_chunks_fts PASSED llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_chunk_id_conflict PASSED llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_register_vector_db PASSED llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_unregister_vector_db PASSED llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_generate_chunk_id PASSED ``` For reference, with the implementation, the fts table looks like below: ``` Chunk ID: 9fbc39ce-c729-64a2-260f-c5ec9bb2a33e, Content: Sentence 0 from document 0 Chunk ID: 94062914-3e23-44cf-1e50-9e25821ba882, Content: Sentence 1 from document 0 Chunk ID: e6cfd559-4641-33ba-6ce1-7038226495eb, Content: Sentence 2 from document 0 Chunk ID: 1383af9b-f1f0-f417-4de5-65fe9456cc20, Content: Sentence 3 from document 0 Chunk ID: 2db19b1a-de14-353b-f4e1-085e8463361c, Content: Sentence 4 from document 0 Chunk ID: 9faf986a-f028-7714-068a-1c795e8f2598, Content: Sentence 5 from document 0 Chunk ID: ef593ead-5a4a-392f-7ad8-471a50f033e8, Content: Sentence 6 from document 0 Chunk ID: e161950f-021f-7300-4d05-3166738b94cf, Content: Sentence 7 from document 0 Chunk ID: 90610fc4-67c1-e740-f043-709c5978867a, Content: Sentence 8 from document 0 Chunk ID: 97712879-6fff-98ad-0558-e9f42e6b81d3, Content: Sentence 9 from document 0 Chunk ID: aea70411-51df-61ba-d2f0-cb2b5972c210, Content: Sentence 0 from document 1 Chunk ID: b678a463-7b84-92b8-abb2-27e9a1977e3c, Content: Sentence 1 from document 1 Chunk ID: 27bd63da-909c-1606-a109-75bdb9479882, Content: Sentence 2 from document 1 Chunk ID: a2ad49ad-f9be-5372-e0c7-7b0221d0b53e, Content: Sentence 3 from document 1 Chunk ID: cac53bcd-1965-082a-c0f4-ceee7323fc70, Content: Sentence 4 from document 1 ``` Query results: Result 1: Sentence 5 from document 0 Result 2: Sentence 5 from document 1 Result 3: Sentence 5 from document 2 [//]: # (## Documentation) --------- Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>	2025-05-21 15:24:24 -04:00
Charlie Doern	e46de23be6	feat: refactor external providers dir (#2049 ) # What does this PR do? currently the "default" dir for external providers is `/etc/llama-stack/providers.d` This dir is not used anywhere nor created. Switch to a more friendly `~/.llama/providers.d/` This allows external providers to actually create this dir and/or populate it upon installation, `pip` cannot create directories in `etc`. If a user does not specify a dir, default to this one see https://github.com/containers/ramalama-stack/issues/36 Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-05-15 20:17:03 +02:00
Ihar Hrachyshka	1de0dfaab5	docs: Clarify kfp provider is both inline and remote (#2144 ) The provider selling point is using the same provider for both. Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com> Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-05-14 09:37:07 +02:00
Divya	3022f7b642	feat: Adding TLS support for Remote::Milvus vector_io (#2011 ) # What does this PR do? For the Issue :- #[2010](https://github.com/meta-llama/llama-stack/issues/2010) Currently, if we try to connect the Llama stack server to a remote Milvus instance that has TLS enabled, the connection fails because TLS support is not implemented in the Llama stack codebase. As a result, users are unable to use secured Milvus deployments out of the box. After adding this , the user will be able to connect to remote::Milvus which is TLS enabled . if TLS enabled :- ``` vector_io: - provider_id: milvus provider_type: remote::milvus config: uri: "http://<host>:<port>" token: "<user>:<password>" secure: True server_pem_path: "path/to/server.pem" ``` [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan I have already tested it by connecting to a Milvus instance which is TLS enabled and i was able to start llama stack server .	2025-05-06 14:15:34 +02:00
Christina Xu	65cc971877	docs: Add TrustyAI LM-Eval to list of known external providers (#2020 ) # What does this PR do? Adds documentation for the remote [TrustyAI LM-Eval Eval Provider](https://github.com/trustyai-explainability/llama-stack-provider-lmeval). LM-Eval is a service for large language model evaluation based on the open source project [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) and is integrated into the [TrustyAI Kubernetes Operator](https://trustyai-explainability.github.io/trustyai-site/main/trustyai-operator.html).	2025-05-06 14:11:55 +02:00
Sébastien Han	a5d151e912	docs: fix typo mivus.md -> milvus.md (#2102 ) Signed-off-by: Sébastien Han <seb@redhat.com>	2025-05-05 09:48:38 -07:00
Ihar Hrachyshka	16e163da0e	docs: List external kubeflow pipelines provider prototype (#2100 ) # What does this PR do? Lists another external provider example (kfp). Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-05-05 10:24:52 +02:00
Charlie Doern	a673697858	chore: rename ramalama provider (#2008 ) # What does this PR do? the ramalama team has decided to rename their external provider `ramalama-stack` (more catchy!). Update docs accordingly Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-04-24 09:34:15 +02:00
Nathan Weinberg	6a44e7ba20	docs: add API to external providers table (#2006 ) Also does a minor reorg of the columns Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-04-23 15:58:10 +02:00
Nathan Weinberg	d6e88e0bc6	docs: add RamaLama to list of known external providers (#2004 ) The RamaLama project now has an external provider offering for Llama Stack: https://github.com/containers/llama-stack-provider-ramalama See also: https://github.com/meta-llama/llama-stack/pull/1676 Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-04-23 09:44:18 +02:00
Francisco Arceo	49955a06b1	docs: Update quickstart page to structure things a little more for the novices (#1873 ) # What does this PR do? Another doc enhancement for https://github.com/meta-llama/llama-stack/issues/1818 Summary of changes: - `docs/source/distributions/configuration.md` - Updated dropdown title to include a more user-friendly description. - `docs/_static/css/my_theme.css` - Added styling for `<h3>` elements to set a normal font weight. - `docs/source/distributions/starting_llama_stack_server.md` - Changed section headers from bold text to proper markdown headers (e.g., `##`). - Improved descriptions for starting Llama Stack server using different methods (library, container, conda, Kubernetes). - Enhanced clarity and structure by converting instructions into markdown headers and improved formatting. - `docs/source/getting_started/index.md` - Major restructuring of the "Quick Start" guide: - Added new introductory section for Llama Stack and its capabilities. - Reorganized steps into clearer subsections with proper markdown headers. - Replaced dropdowns with tabbed content for OS-specific instructions. - Added detailed steps for setting up and running the Llama Stack server and client. - Introduced new sections for running basic inference and building agents. - Enhanced readability and visual structure with emojis, admonitions, and examples. - `docs/source/providers/index.md` - Updated the list of LLM inference providers to include "Ollama." - Expanded the list of vector databases to include "SQLite-Vec." Let me know if you need further details! ## Test Plan Renders locally, included screenshot. # Documentation For https://github.com/meta-llama/llama-stack/issues/1818 <img width="1332" alt="Screenshot 2025-04-09 at 11 07 12 AM" src="https://github.com/user-attachments/assets/c106efb9-076c-4059-a4e0-a30fa738585b" /> --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-04-10 14:09:00 -07:00
Sébastien Han	389767010b	feat: ability to execute external providers (#1672 ) # What does this PR do? Providers that live outside of the llama-stack codebase are now supported. A new property `external_providers_dir` has been added to the main config and can be configured as follow: ``` external_providers_dir: /etc/llama-stack/providers.d/ ``` Where the expected structure is: ``` providers.d/ inference/ custom_ollama.yaml vllm.yaml vector_io/ qdrant.yaml ``` Where `custom_ollama.yaml` is: ``` adapter: adapter_type: custom_ollama pip_packages: ["ollama", "aiohttp"] config_class: llama_stack_ollama_provider.config.OllamaImplConfig module: llama_stack_ollama_provider api_dependencies: [] optional_api_dependencies: [] ``` Obviously the package must be installed on the system, here is the `llama_stack_ollama_provider` example: ``` $ uv pip show llama-stack-ollama-provider Using Python 3.10.16 environment at: /Users/leseb/Documents/AI/llama-stack/.venv Name: llama-stack-ollama-provider Version: 0.1.0 Location: /Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.10/site-packages Editable project location: /private/var/folders/mq/rnm5w_7s2d3fxmtkx02knvhm0000gn/T/tmp.ZBHU5Ezxg4/ollama/llama-stack-ollama-provider Requires: Required-by: ``` Closes: https://github.com/meta-llama/llama-stack/issues/658 Signed-off-by: Sébastien Han <seb@redhat.com>	2025-04-09 10:30:41 +02:00
Francisco Arceo	37b6da37ba	docs: Document sqlite-vec faiss comparison (#1821 ) # What does this PR do? This PR documents and benchmarks the performance tradeoffs between sqlite-vec and FAISS inline VectorDB providers. # Closes https://github.com/meta-llama/llama-stack/issues/1165 ## Test Plan The test was run using this script: <details> <summary>CLICK TO SHOW SCRIPT 👋 </summary> ```python import cProfile import os import uuid import time import random import string import matplotlib.pyplot as plt import pandas as pd from termcolor import cprint from llama_stack_client.types import Document from llama_stack.distribution.library_client import LlamaStackAsLibraryClient from memory_profiler import profile from line_profiler import LineProfiler os.environ["INFERENCE_MODEL"] = "llama3.2:3b-instruct-fp16" os.environ["LLAMA_STACK_CONFIG"] = "ollama" def generate_random_chars(count=400): return ''.join(random.choices(string.ascii_letters, k=count)) def generate_documents(num_docs: int, num_chars: int): documents = [ Document( document_id=f"doc-{i}", content=f"Document content for document {i} - {generate_random_chars(count=num_chars)}", mime_type="text/plain", metadata={}, ) for i in range(num_docs) ] return documents @profile def benchmark_write(client, vector_db_id, documents, batch_size=100): write_times = [] for i in range(0, len(documents), batch_size): batch = documents[i:i + batch_size] start_time = time.time() client.tool_runtime.rag_tool.insert( documents=batch, vector_db_id=vector_db_id, chunk_size_in_tokens=512, ) end_time = time.time() write_times.append(end_time - start_time) return write_times @profile def benchmark_read(client, provider_id, vector_db_id, user_prompts): response_times = [] for prompt in user_prompts: start_time = time.time() response = client.vector_io.query( vector_db_id=vector_db_id, query=prompt, ) end_time = time.time() response_times.append(end_time - start_time) return response_times def profile_functions(): profiler = LineProfiler() profiler.add_function(benchmark_write) profiler.add_function(benchmark_read) return profiler def plot_results(output, batch_size): # Create a DataFrame for easy manipulation df_sqlite = pd.DataFrame(output['sqlite-vec']) df_faiss = pd.DataFrame(output['faiss']) df_sqlite['write_times'] = 1000 df_faiss['write_times'] = 1000 avg_write_sqlite = df_sqlite['write_times'].mean() avg_write_faiss = df_faiss['write_times'].mean() avg_read_sqlite = df_sqlite['read_times'].mean() avg_read_faiss = df_faiss['read_times'].mean() plt.figure(figsize=(12, 6)) plt.hist(df_sqlite['write_times'], bins=10, alpha=0.5, color='blue', label='sqlite-vec Write Times') plt.hist(df_faiss['write_times'], bins=10, alpha=0.5, color='red', label='faiss Write Times') plt.axvline(avg_write_sqlite, color='blue', linestyle='--', label=f'Average Write Time (sqlite-vec): {avg_write_sqlite:.3f} ms') plt.axvline(avg_write_faiss, color='red', linestyle='--', label=f'Average Write Time (faiss): {avg_write_faiss:.3f} ms') plt.title(f'Histogram of Write Times for sqlite-vec and faiss\nn = {df_faiss.shape[0]} with batch size = {batch_size}') plt.xlabel('Time (milliseconds)') plt.ylabel('Density') plt.legend() plt.savefig('write_time_comparison.png') plt.close() plt.figure(figsize=(12, 6)) plt.hist(df_sqlite['read_times'], bins=10, alpha=0.5, color='blue', label='sqlite-vec Read Times') plt.hist(df_faiss['read_times'], bins=10, alpha=0.5, color='red', label='faiss Read Times') plt.axvline(avg_read_sqlite, color='blue', linestyle='--', label=f'Average Read Time (sqlite-vec): {avg_read_sqlite:.3f} ms') plt.axvline(avg_read_faiss, color='red', linestyle='--', label=f'Average Read Time (faiss): {avg_read_faiss:.3f} ms') plt.title(f'Histogram of Read Times for sqlite-vec and faiss\nn = {df_faiss.shape[0]}') plt.xlabel('Time (milliseconds)') plt.ylabel('Density') plt.legend() plt.savefig('read_time_comparison.png') plt.close() plt.figure(figsize=(12, 6)) plt.hist(df_sqlite['read_times'], bins=10, alpha=0.5, color='blue', label='sqlite-vec Read Times') plt.hist(df_faiss['read_times'], bins=10, alpha=0.5, color='red', label='faiss Read Times') plt.axvline(avg_read_sqlite, color='blue', linestyle='--', label=f'Average Read Time (sqlite-vec): {avg_read_sqlite:.3f} ms') plt.axvline(avg_read_faiss, color='red', linestyle='--', label=f'Average Read Time (faiss): {avg_read_faiss:.3f} ms') plt.title(f'Histogram of Read Times for sqlite-vec and faiss\nn = {df_faiss.shape[0]}') plt.xlabel('Time (milliseconds)') plt.ylabel('Density') plt.legend() plt.savefig('read_time_comparison.png') plt.close() plt.figure(figsize=(12, 6)) plt.plot(df_sqlite.index, df_sqlite['write_times'], marker='o', markersize=4, linestyle='-', color='blue', label='sqlite-vec Write Times') plt.plot(df_faiss.index, df_faiss['write_times'], marker='x', markersize=4, linestyle='-', color='red', label='faiss Write Times') plt.title(f'Write Times by Operation Sequence\n(batch size = {batch_size})') plt.xlabel('Write Operation Sequence') plt.ylabel('Time (milliseconds)') plt.legend() plt.grid(True, linestyle='--', alpha=0.7) plt.tight_layout() plt.savefig('write_time_sequence.png') plt.close() # Print out the summary table print("\nPerformance Summary for sqlite-vec:") print(df_sqlite) # Print out the summary table print("\nPerformance Summary for faiss:") print(df_faiss) def main(): # Initialize the client client = LlamaStackAsLibraryClient("ollama") vector_db_id = f"test-vector-db-{uuid.uuid4().hex}" _ = client.initialize() # Generate a large dataset num_chars = 50 num_docs = 100 num_writes = 100 write_batch_size = 100 num_reads = 100 documents = generate_documents(num_docs * write_batch_size, num_chars) user_prompts = [ f"Tell me about document {i}" for i in range(1, num_reads + 1) ] providers = ["sqlite-vec", "faiss"] output = { provider_id: {"write_times": None, "read_times": None} for provider_id in providers } # Benchmark writes and reads for SQLite and Faiss for provider_id in providers: cprint(f"Benchmarking provider: {provider_id}", "yellow") client.vector_dbs.register( provider_id=provider_id, vector_db_id=vector_db_id, embedding_model="all-MiniLM-L6-v2", embedding_dimension=384, ) write_times = benchmark_write(client, vector_db_id, documents, write_batch_size) average_write_time_ms = sum(write_times) / len(write_times) * 1000. cprint(f"Average write time for {provider_id} is {average_write_time_ms:.2f} milliseconds for {num_writes} runs", "blue") cprint(f"Benchmarking reads for provider: {provider_id}", "yellow") read_times = benchmark_read(client, provider_id, vector_db_id, user_prompts) average_read_time_ms = sum(read_times) / len(read_times) * 1000. cprint(f"Average read time for {provider_id} is {average_read_time_ms:.2f} milliseconds for {num_reads} runs", "blue") client.vector_dbs.unregister(vector_db_id=vector_db_id) output[provider_id]['write_times'] = write_times output[provider_id]['read_times'] = read_times # Generate plots and summary plot_results(output, write_batch_size) if __name__ == "__main__": cProfile.run('main()', 'profile_output.prof') ``` </details> --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-03-28 17:41:33 +01:00
Daniele Martinoli	cca9bd6cc3	feat: Qdrant inline provider (#1273 ) # What does this PR do? Removed local execution option from the remote Qdrant provider and introduced an explicit inline provider for the embedded execution. Updated the ollama template to include this option: this part can be reverted in case we don't want to have two default `vector_io` providers. (Closes #1082) ## Test Plan Build and run an ollama distro: ```bash llama stack build --template ollama --image-type conda llama stack run --image-type conda ollama ``` Run one of the sample ingestionapplicatinos like [rag_with_vector_db.py](https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/rag_with_vector_db.py), but replace this line: ```py selected_vector_provider = vector_providers[0] ``` with the following, to use the `qdrant` provider: ```py selected_vector_provider = vector_providers[1] ``` After running the test code, verify the timestamp of the Qdrant store: ```bash % ls -ltr ~/.llama/distributions/ollama/qdrant.db/collection/test_vector_db_* total 784 -rw-r--r--@ 1 dmartino staff 401408 Feb 26 10:07 storage.sqlite ``` [//]: # (## Documentation) --------- Signed-off-by: Daniele Martinoli <dmartino@redhat.com> Co-authored-by: Francisco Arceo <farceo@redhat.com>	2025-03-18 14:04:21 -07:00

1 2

54 commits