Commit graph

421 commits

Author SHA1 Message Date
Francisco Arceo
f7adf58b1b
docs: Add documentation on how to contribute a Vector DB provider and update testing documentation (#3093)
# What does this PR do?

- Adds documentation on how to contribute a Vector DB provider.
- Updates the testing section to be a little friendlier to navigate.
- Also added new shortcut for search so that `/` and `⌘ K` or `ctrl+K`
trigger search


<img width="1903" height="1346" alt="Screenshot 2025-08-11 at 10 10
12 AM"
src="https://github.com/user-attachments/assets/6995b3b8-a2ab-4200-be72-c5b03a784a29"
/>

<img width="1915" height="1438" alt="Screenshot 2025-08-11 at 10 10
25 AM"
src="https://github.com/user-attachments/assets/1f54d30e-5be1-4f27-b1e9-3c3537dcb8e9"
/>

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-08-11 11:11:09 -07:00
Eran Cohen
a4bad6c0b4
feat: Add Google Vertex AI inference provider support (#2841)
Some checks failed
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 10s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 12s
Python Package Build Test / build (3.13) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 10s
Test Llama Stack Build / generate-matrix (push) Successful in 8s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 13s
Test External API and Providers / test-external (venv) (push) Failing after 11s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 17s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 10s
Test Llama Stack Build / build-single-provider (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 8s
Unit Tests / unit-tests (3.12) (push) Failing after 10s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 26s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 15s
Update ReadTheDocs / update-readthedocs (push) Failing after 9s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 18s
Test Llama Stack Build / build (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 47s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 49s
Unit Tests / unit-tests (3.13) (push) Failing after 39s
Pre-commit / pre-commit (push) Successful in 1m37s
# What does this PR do?
- Add new Vertex AI remote inference provider with litellm integration
- Support for Gemini models through Google Cloud Vertex AI platform
- Uses Google Cloud Application Default Credentials (ADC) for
authentication
- Added VertexAI models: gemini-2.5-flash, gemini-2.5-pro,
gemini-2.0-flash.
- Updated provider registry to include vertexai provider
- Updated starter template to support Vertex AI configuration
- Added comprehensive documentation and sample configuration

<!-- If resolving an issue, uncomment and update the line below -->
relates to https://github.com/meta-llama/llama-stack/issues/2747

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

Signed-off-by: Eran Cohen <eranco@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-08-11 08:22:04 -04:00
Varsha
69dc789e15
docs: Add unsupported search mode info about FAISS (#3089) 2025-08-10 17:34:34 -06:00
Varsha
ce72a28525
docs: Update doc on search modes for Milvus (#3078)
# What does this PR do?
Update Milvus doc on using search modes. 

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
2025-08-10 18:48:36 -04:00
Jiayi Ni
9e78f2da96
docs: fix the docs for NVIDIA Inference Provider (#3055)
Some checks failed
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 20s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 21s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 15s
Test Llama Stack Build / build-single-provider (push) Failing after 11s
Test Llama Stack Build / generate-matrix (push) Successful in 14s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 20s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 26s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 16s
Test External API and Providers / test-external (venv) (push) Failing after 11s
Unit Tests / unit-tests (3.12) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 20s
Python Package Build Test / build (3.12) (push) Failing after 23s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 18s
Unit Tests / unit-tests (3.13) (push) Failing after 9s
Update ReadTheDocs / update-readthedocs (push) Failing after 9s
Python Package Build Test / build (3.13) (push) Failing after 21s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 17s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 51s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 58s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 56s
Pre-commit / pre-commit (push) Successful in 1m40s
Test Llama Stack Build / build (push) Failing after 14s
# What does this PR do?
Fix the NVIDIA inference docs by updating API methods, model IDs, and
embedding example.

## Test Plan
N/A
2025-08-08 11:27:55 +02:00
Dean Wampler
342550c1e2
docs: Added comment about a known limitation of AgentEventLogger (#2930)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 7s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / discover-tests (push) Successful in 7s
Python Package Build Test / build (3.12) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 10s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 9s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 12s
Python Package Build Test / build (3.13) (push) Failing after 8s
Unit Tests / unit-tests (3.13) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 14s
Update ReadTheDocs / update-readthedocs (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 12s
Test External API and Providers / test-external (venv) (push) Failing after 16s
Unit Tests / unit-tests (3.12) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 17s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 30s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 28s
Pre-commit / pre-commit (push) Successful in 1m11s
# What does this PR do?
`AgentEventLogger` only supports streaming responses, so I suggest
adding a comment near the bottom of `demo_script.py` letting the user
know this, e.g., if they change the `stream` value to `False` in the
call to `create_turn`, they need to comment out the logging lines.

See https://github.com/llamastack/llama-stack-client-python/issues/15 

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

---------

Signed-off-by: Dean Wampler <dean.wampler@ibm.com>
2025-08-07 10:09:57 -07:00
IAN MILLER
8ba04205ac
docs: remove pure venv references (#3047)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
Remove pure venv (without uv) references in docs

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
2025-08-06 10:42:34 -07:00
Ashwin Bharambe
dfce05d0c5
fix(docs): update llama stack build CLI doc (#3050) 2025-08-06 09:32:09 -07:00
Ashwin Bharambe
7f834339ba
chore(misc): make tests and starter faster (#3042)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 9s
Python Package Build Test / build (3.12) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 12s
Test Llama Stack Build / generate-matrix (push) Successful in 11s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 14s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 22s
Test External API and Providers / test-external (venv) (push) Failing after 14s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 15s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 22s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 14s
Unit Tests / unit-tests (3.13) (push) Failing after 14s
Test Llama Stack Build / build-single-provider (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 18s
Unit Tests / unit-tests (3.12) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 18s
Test Llama Stack Build / build (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 16s
Python Package Build Test / build (3.13) (push) Failing after 53s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 59s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 1m1s
Update ReadTheDocs / update-readthedocs (push) Failing after 1m6s
Pre-commit / pre-commit (push) Successful in 1m53s
A bunch of miscellaneous cleanup focusing on tests, but ended up
speeding up starter distro substantially.

- Pulled llama stack client init for tests into `pytest_sessionstart` so
it does not clobber output
- Profiling of that told me where we were doing lots of heavy imports
for starter, so lazied them
- starter now starts 20seconds+ faster on my Mac
- A few other smallish refactors for `compat_client`
2025-08-05 14:55:05 -07:00
Ashwin Bharambe
cc87995e2b
chore: rename templates to distributions (#3035)
As the title says. Distributions is in, Templates is out.

`llama stack build --template` --> `llama stack build --distro`. For
backward compatibility, the previous option is kept but results in a
warning.

Updated `server.py` to remove the "config_or_template" backward
compatibility since it has been a couple releases since that change.
2025-08-04 11:34:17 -07:00
IAN MILLER
a749d5f4a4
refactor: remove Conda support from Llama Stack (#2969)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR is responsible for removal of Conda support in Llama Stack

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Closes #2539

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
2025-08-02 15:52:59 -07:00
Varsha
1f0766308d
feat: Add openAI compatible APIs to Qdrant (#2465)
Some checks failed
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 15s
Test Llama Stack Build / generate-matrix (push) Successful in 9s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 19s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 13s
Test Llama Stack Build / build-single-provider (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 15s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 14s
Integration Tests (Replay) / discover-tests (push) Successful in 24s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 18s
Update ReadTheDocs / update-readthedocs (push) Failing after 12s
Unit Tests / unit-tests (3.12) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 16s
Python Package Build Test / build (3.12) (push) Failing after 20s
Python Package Build Test / build (3.13) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 18s
Test External API and Providers / test-external (venv) (push) Failing after 18s
Unit Tests / unit-tests (3.13) (push) Failing after 19s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 42s
Integration Tests (Replay) / run-replay-mode-tests (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 1m12s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 1m15s
Test Llama Stack Build / build (push) Failing after 32s
Pre-commit / pre-commit (push) Successful in 2m39s
# What does this PR do?
Adds support to Vector store Open AI APIs in Qdrant.

<!-- If resolving an issue, uncomment and update the line below -->
 Closes #2463 


## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-08-01 00:41:34 -04:00
Francisco Arceo
33cca26154
chore: Enabling Integration tests for Weaviate (#2882)
# What does this PR do?

This PR (1) enables the files API for Weaviate and (2) enables
integration tests for Weaviate, which adds a docker container to the
github action.

This PR also handles a couple of edge cases for in creating the
collection and ensuring the tests all pass.

## Test Plan
CI enabled

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-31 20:29:50 -04:00
Kelly Brown
8a6c0fb930
docs: Reformat external provider documentation (#2982)
**Description** 
This PR adjusts the external providers documentation to align with the
new providers format. Splits up sections into the existing external
providers and how to create them as well.

<img width="1049" height="478" alt="Screenshot 2025-07-31 at 9 48 26 AM"
src="https://github.com/user-attachments/assets/f13599cb-2fd1-4e57-8ca9-27b067264e33"
/>

Open to feedback and adjusting titles
2025-07-31 09:21:13 -07:00
Nehanth Narendrula
cf73146132
feat: Enable DPO training with HuggingFace inline provider (#2825)
Some checks failed
Integration Tests / discover-tests (push) Has been skipped
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 7s
Integration Tests / record-tests (push) Has been skipped
Integration Tests / run-tests (push) Has been skipped
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 22s
Python Package Build Test / build (3.13) (push) Failing after 16s
Test Llama Stack Build / generate-matrix (push) Successful in 19s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 31s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 32s
Test External API and Providers / test-external (venv) (push) Failing after 32s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 36s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 39s
Update ReadTheDocs / update-readthedocs (push) Failing after 31s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 42s
Test Llama Stack Build / build-single-provider (push) Failing after 37s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Failing after 35s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 37s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 40s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 42s
Unit Tests / unit-tests (3.12) (push) Failing after 36s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 40s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 45s
Test Llama Stack Build / build (push) Failing after 6s
Python Package Build Test / build (3.12) (push) Failing after 1m1s
Unit Tests / unit-tests (3.13) (push) Failing after 1m0s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 1m6s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 1m8s
Pre-commit / pre-commit (push) Successful in 1m50s
What does this PR do?

This PR adds support for Direct Preference Optimization (DPO) training
via the existing HuggingFace inline provider. It introduces a new DPO
training recipe, config schema updates, dataset integration, and
end-to-end testing to support preference-based fine-tuning with TRL.

Test Plan

Added integration test:

tests/integration/post_training/test_post_training.py::TestPostTraining::test_preference_optimize

Ran tests on both CPU and CUDA environments

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-43-83.ec2.internal>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-07-30 23:33:36 -07:00
Ashwin Bharambe
2665f00102
chore(rename): move llama_stack.distribution to llama_stack.core (#2975)
We would like to rename the term `template` to `distribution`. To
prepare for that, this is a precursor.

cc @leseb
2025-07-30 23:30:53 -07:00
Kelly Brown
026caa5551
docs: part 1 - fix warnings in documentation generation (#2861)
**Description**
This PR removes some of the warnings when uv builds the docs
- Errors appear when generating docs about .md files not appearing in
toctree. ~~Adding content to the `providers-gen.py ` file that adds `---
orphan: true ---` to to each file.~~. Added a toctree generator to the
`providers-gen.py` file, this gets rid of the errors in the builds.
- Deletes the `_openai_compat` files, extension of PR #2849
- Adds the `files` APIs section to the `providers` toctree on the index
page
- Manually adds the `--- orphan: true ---` to the advanced apis. Ill try
to find a way to modify the providers code gen so it automatically adds
it, but this fixes the errors.
- Adds the `testing.md` to the `contributing` toctree
- Adds `starting_llama_stack_server.md` to `distributions` toctree

There are some other warnings im still looking at but this PR gets rid
of most of the toctree errors
Theres also an issue with the actual distribution-codegen that I can
investigate in another PR. Opened a bug for it here #2873
2025-07-30 10:50:10 -07:00
ehhuang
38d5c44354
chore: fix k8s config (#2959)
# What does this PR do?


## Test Plan
deployed to EKS
2025-07-30 10:11:59 -07:00
Ashwin Bharambe
fd2aaf4978
fix: use OLLAMA_URL to activate Ollama provider in starter (#2963)
We tried to always keep Ollama enabled. However doing so makes the
provider implementation half-assed -- should it error when it cannot
connect to Ollama or not? What happens during periodic model refresh?
Etc. Instead do the same thing we do for vLLM -- use the `OLLAMA_URL` to
conditionally enable the provider.

## Test Plan

Run `uv run llama stack build --template starter --image-type venv
--run` with and without `OLLAMA_URL` set. Verify using
`llama-stack-client provider list` that ollama is correctly enabled.
2025-07-30 10:11:17 -07:00
Mark Campbell
19c90d9bfc
docs: update using llama stack as library docs (#2931)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 6s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 4s
Integration Tests / discover-tests (push) Successful in 10s
Test Llama Stack Build / generate-matrix (push) Successful in 7s
Coverage Badge / unit-tests (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 12s
Unit Tests / unit-tests (3.12) (push) Failing after 7s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Failing after 9s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 15s
Integration Tests / test-matrix (push) Failing after 6s
Test Llama Stack Build / build (push) Failing after 7s
Python Package Build Test / build (3.12) (push) Failing after 15s
Test Llama Stack Build / build-single-provider (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 21s
Test External API and Providers / test-external (venv) (push) Failing after 16s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 24s
Unit Tests / unit-tests (3.13) (push) Failing after 16s
Python Package Build Test / build (3.13) (push) Failing after 42s
Update ReadTheDocs / update-readthedocs (push) Failing after 40s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 51s
Pre-commit / pre-commit (push) Successful in 1m58s
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
Updates provider template from outdated `ollama` to `starter` 
<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Closes: #2839 
## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
2025-07-28 15:35:26 -07:00
Matthew Farrellee
60bb5e307e
feat(openai): add configurable base_url support with OPENAI_BASE_URL env var (#2919)
# What does this PR do?

- Add base_url field to OpenAIConfig with default
"https://api.openai.com/v1"
- Update sample_run_config to support OPENAI_BASE_URL environment
variable
- Modify get_base_url() to return configured base_url instead of
hardcoded value
- Add comprehensive test suite covering:
  - Default base URL behavior
  - Custom base URL from config
  - Environment variable override
  - Config precedence over environment variables
  - Client initialization with configured URL
  - Model availability checks using configured URL

This enables users to configure custom OpenAI-compatible API endpoints
via environment variables or configuration files.

Closes #2910 

## Test Plan

run unit tests
2025-07-28 10:16:02 -07:00
Charlie Doern
b1c21a25ec
docs: remove provider_id from external docs (#2922)
# What does this PR do?

external provider docs mention setting provider_id in the build yaml.
Since we changed that to just be provider_type and module, remove
instances of provider_id

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-28 10:14:39 -07:00
Ashwin Bharambe
9583f468f8
feat(starter)!: simplify starter distro; litellm model registry changes (#2916) 2025-07-25 15:02:04 -07:00
Charlie Doern
de6919ecdd
refactor: install external providers from module (#2637)
# What does this PR do?

Today, external providers are installed via the `external_providers_dir`
in the config. This necessitates users to understand the `ProviderSpec`
and set up their directories accordingly. This process splits up the
config for the stack across multiple files, directories, and formats.

Most (if not all) external providers today have a
[get_provider_spec](559cb18fbb/src/ramalama_stack/provider.py (L9))
method that sits unused. Utilizing this method rather than the
providers.d route allows for a much easier installation process for
external providers and limits the amount of extra configuration a
regular user has to do to get their stack off the ground.

To accomplish this and wire it throughout the build process, Introduce
the concept of a `module` for users to specify for an external provider
upon build time. In order to facilitate this, align the build and run
spec to use `Provider` class rather than the stringified provider_type
that build currently uses.

For example, say this is in your build config:

```
- provider_id: ramalama
  provider_type: remote::ramalama
  module: ramalama_stack
```

during build (in the various `build_...` scripts), additionally to
installing any pip dependencies we will also install this module and use
the `get_provider_spec` method to retrieve the ProviderSpec that is
currently specified using `providers.d`.

In production so far, providing instructions for installing external
providers for users has been difficult: they need to install the module
as a pre-req, create the providers.d directory, copy in the provider
spec, and also copy in the necessary build/run yaml files. Accessing an
external provider should be as easy as possible, and pointing to its
installable module aligns more with the rest of our build and dependency
management process.

For now, `external_providers_dir` still exists as an alternate more
declarative method of using external providers.

## Test Plan

added an integration test installing an external provider from module
and more unit test coverage for `get_provider_registry`


( the warning in yellow is expected, the module is installed inside of
the build env, not where we are running the command)
<img width="1119" height="400" alt="Screenshot 2025-07-24 at 11 30
48 AM"
src="https://github.com/user-attachments/assets/1efbaf45-b9e8-451a-bd63-264ed664706d"
/>

<img width="1154" height="618" alt="Screenshot 2025-07-24 at 11 31
14 AM"
src="https://github.com/user-attachments/assets/feb2b3ea-c5dd-418e-9662-9a3bd5dd6bdc"
/>

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-25 15:41:26 +02:00
ehhuang
21bae296f2
feat(auth): API access control (#2822)
# What does this PR do?
- Added ability to specify `required_scope` when declaring an API. This
is part of the `@webmethod` decorator.
- If auth is enabled, a user can access an API only if
`user.attributes['scope']` includes the `required_scope`
- We add `required_scope='telemetry.read'` to the telemetry read APIs.

## Test Plan
CI with added tests

1. Enable server.auth with github token
2. Observe `client.telemetry.query_traces()` returns 403
2025-07-24 15:30:48 -07:00
Sébastien Han
632cf9eb72
feat: Bring Your Own API (BYOA) (#2228)
Some checks failed
Coverage Badge / unit-tests (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Installer CI / lint (push) Failing after 3s
Integration Tests / discover-tests (push) Successful in 3s
Installer CI / smoke-test-on-dev (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 6s
Python Package Build Test / build (3.12) (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 2s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 10s
Test Llama Stack Build / build-single-provider (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 13s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 6s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 6s
Update ReadTheDocs / update-readthedocs (push) Failing after 8s
Integration Tests / test-matrix (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 5s
Test Llama Stack Build / build (push) Failing after 6s
Pre-commit / pre-commit (push) Successful in 57s
# What does this PR do?

Prototype on a new feature to allow new APIs to be plugged in Llama
Stack. Opened for early feedback on the approach and test appetite on
the functionality.

@ashwinb @raghotham open for early feedback, thanks!

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-07-24 13:41:14 -07:00
Kelly Brown
abade761e0
docs: Update nvidia docs template (#2893)
**Description**

Fixes generation issue in nvidia code gen file.

Closes #2873
2025-07-24 22:11:34 +02:00
Ashwin Bharambe
1463b79218
feat(registry): make the Stack query providers for model listing (#2862)
This flips #2823 and #2805 by making the Stack periodically query the
providers for models rather than the providers going behind the back and
calling "register" on to the registry themselves. This also adds support
for model listing for all other providers via `ModelRegistryHelper`.
Once this is done, we do not need to manually list or register models
via `run.yaml` and it will remove both noise and annoyance (setting
`INFERENCE_MODEL` environment variables, for example) from the new user
experience.

In addition, it adds a configuration variable `allowed_models` which can
be used to optionally restrict the set of models exposed from a
provider.
2025-07-24 10:39:53 -07:00
Charlie Doern
d4f0b430e2
docs: update list of apis (#2697)
# What does this PR do?

apis.md had a few APIs missing and incorrectly described APIs

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-24 09:50:14 -07:00
Christian Zaccaria
7f7b990b80
docs: Document use cases for Responses and Agents APIs (#2756)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This pull request adds documentation to clarify the differences between
the Agents API and the OpenAI Responses API, including use cases for
each. It also updates the index page to reference the new documentation.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Closes #2368
2025-07-24 12:20:04 -04:00
Sarthak Deshpande
cd8715d327
chore: Added openai compatible vector io endpoints for chromadb (#2489)
Some checks failed
Integration Tests / discover-tests (push) Successful in 3s
Coverage Badge / unit-tests (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 4s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Python Package Build Test / build (3.13) (push) Failing after 2s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 10s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 16s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 16s
Python Package Build Test / build (3.12) (push) Failing after 12s
Test External Providers / test-external-providers (venv) (push) Failing after 12s
Update ReadTheDocs / update-readthedocs (push) Failing after 10s
Test Llama Stack Build / build-single-provider (push) Failing after 15s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 20s
Unit Tests / unit-tests (3.13) (push) Failing after 14s
Test Llama Stack Build / build (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 18s
Unit Tests / unit-tests (3.12) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 18s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 51s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 49s
Integration Tests / test-matrix (push) Failing after 53s
Pre-commit / pre-commit (push) Successful in 1m42s
# What does this PR do?
This PR implements the openai compatible endpoints for chromadb

Closes #2462 

## Test Plan
Ran ollama llama stack server and ran the command
`pytest -sv --stack-config=http://localhost:8321
tests/integration/vector_io/test_openai_vector_stores.py
--embedding-model all-MiniLM-L6-v2`
8 failed, 27 passed, 8 skipped, 1 xfailed
The failed ones are regarding files api

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: sarthakdeshpande <sarthak.deshpande@engati.com>
Co-authored-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-07-23 13:51:58 -07:00
Matthew Farrellee
e1ed152779
chore: create OpenAIMixin for inference providers with an OpenAI-compat API that need to implement openai_* methods (#2835)
Some checks failed
Coverage Badge / unit-tests (push) Failing after 3s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s
Python Package Build Test / build (3.12) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 6s
Integration Tests / discover-tests (push) Successful in 7s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 6s
Python Package Build Test / build (3.13) (push) Failing after 2s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 9s
Unit Tests / unit-tests (3.12) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 11s
Test External Providers / test-external-providers (venv) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 9s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 17s
Unit Tests / unit-tests (3.13) (push) Failing after 12s
Update ReadTheDocs / update-readthedocs (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 16s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 18s
Integration Tests / test-matrix (push) Failing after 18s
Pre-commit / pre-commit (push) Successful in 1m14s
# What does this PR do?

add an `OpenAIMixin` for use by inference providers who remote endpoints
support an OpenAI compatible API.

use is demonstrated by refactoring
- OpenAIInferenceAdapter
- NVIDIAInferenceAdapter (adds embedding support)
- LlamaCompatInferenceAdapter

## Test Plan

existing unit and integration tests
2025-07-23 06:49:40 -04:00
grs
fc67ad408a
chore: add some documentation for access policy rules (#2785)
# What does this PR do?

Adds some documentation on setting explicit access_policy rules in
config.
2025-07-23 10:27:27 +02:00
ehhuang
8e1a2b4703
chore: remove *_openai_compat providers (#2849)
# What does this PR do?
These are no longer needed as llama-stack-evals can run against OAI
endpoints directly.

## Test Plan
2025-07-22 10:25:36 -07:00
Jeremy Bonghwan Choi
b5a6ecc331
docs: minor fix of the pgvector provider spec description (#2847)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests / discover-tests (push) Successful in 3s
Coverage Badge / unit-tests (push) Failing after 6s
Python Package Build Test / build (3.13) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 10s
Python Package Build Test / build (3.12) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 12s
Test External Providers / test-external-providers (venv) (push) Failing after 7s
Unit Tests / unit-tests (3.12) (push) Failing after 10s
Update ReadTheDocs / update-readthedocs (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 13s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 10s
Integration Tests / test-matrix (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 11s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 27s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 25s
Unit Tests / unit-tests (3.13) (push) Failing after 24s
Pre-commit / pre-commit (push) Successful in 1m17s
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->

minor update of the pgvector doc, changing 'faiss' to 'pgvector'

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
2025-07-21 22:10:35 -07:00
Francisco Arceo
2bc96613f9
chore: Adding demo script and importing it into the docs (#2848)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Coverage Badge / unit-tests (push) Failing after 6s
Integration Tests / discover-tests (push) Successful in 7s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
Test Llama Stack Build / build-single-provider (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 11s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 14s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 14s
Test Llama Stack Build / generate-matrix (push) Successful in 10s
Test External Providers / test-external-providers (venv) (push) Failing after 9s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 11s
Unit Tests / unit-tests (3.12) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 15s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 19s
Python Package Build Test / build (3.13) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 19s
Integration Tests / test-matrix (push) Failing after 13s
Python Package Build Test / build (3.12) (push) Failing after 1m1s
Update ReadTheDocs / update-readthedocs (push) Failing after 1m0s
Test Llama Stack Build / build (push) Failing after 52s
Pre-commit / pre-commit (push) Successful in 2m39s
# What does this PR do?
This PR adds the quickstart as a file to the docs so that it can be more
easily maintained and run, as mentioned in
https://github.com/meta-llama/llama-stack/pull/2800.

## Test Plan
I could add this as a test in the CI but I wasn't sure if we wanted to
add additional jobs there. 😅

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-21 22:53:32 -04:00
Ashwin Bharambe
199f859eec
feat(vllm): periodically refresh models (#2823)
Just like #2805 but for vLLM.

We also make VLLM_URL env variable optional (not required) -- if not
specified, the provider silently sits idle and yells eventually if
someone tries to call a completion on it. This is done so as to allow
this provider to be present in the `starter` distribution.

## Test Plan

Set up vLLM, copy the starter template and set `{ refresh_models: true,
refresh_models_interval: 10 }` for the vllm provider and then run:

```
ENABLE_VLLM=vllm VLLM_URL=http://localhost:8000/v1 \
  uv run llama stack run --image-type venv /tmp/starter.yaml
```

Verify that `llama-stack-client models list` brings up the model
correctly from vLLM.
2025-07-18 15:53:09 -07:00
Ashwin Bharambe
ade075152e
chore: kill inline::vllm (#2824)
Inline _inference_ providers haven't proved to be very useful -- they
are rarely used. And for good reason -- it is almost never a good idea
to include a complex (distributed) inference engine bundled into a
distributed stateful front-end server serving many other things.
Responsibility should be split properly.

See Discord discussion:
1395849853
2025-07-18 15:52:18 -07:00
Ashwin Bharambe
68a2dfbad7
feat(ollama): periodically refresh models (#2805)
For self-hosted providers like Ollama (or vLLM), the backing server is
running a set of models. That server should be treated as the source of
truth and the Stack registry should just be a cache for those models. Of
course, in production environments, you may not want this (because you
know what model you are running statically) hence there's a config
boolean to control this behavior.

_This is part of a series of PRs aimed at removing the requirement of
needing to set `INFERENCE_MODEL` env variables for running Llama Stack
server._

## Test Plan

Copy and modify the starter.yaml template / config and enable
`refresh_models: true, refresh_models_interval: 10` for the ollama
provider. Then, run:

```
LLAMA_STACK_LOGGING=all=debug \
  ENABLE_OLLAMA=ollama uv run llama stack run --image-type venv /tmp/starter.yaml
```

See a gargantuan amount of logs, but verify that the provider is
periodically refreshing models. Stop and prune a model from ollama
server, restart the server. Verify that the model goes away when I call
`uv run llama-stack-client models list`
2025-07-18 12:20:36 -07:00
Nathan Weinberg
2bb9039173
docs: fix steps in the Quick Start Guide (#2800)
# What does this PR do?
'build' command didn't take into account ENABLE flags for starter distro

for some reason, I was having issues with HuggingFace access for the
embedding model, so added a tip for that as well

Closes #2779

## Test Plan
I ran the described steps manually, but it would be nice if someone else
could try it and verify this still works

We might consider having some CI job ensure the QSG remains functional -
it's not a great experience for new users if they try Llama Stack for
the first time and it doesn't work as we describe

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-18 09:08:46 -07:00
Nathan Weinberg
1785a6b39c
docs: add virtualenv instructions for running starter distro (#2780)
# What does this PR do?
we had directions for a container and conda but not venv

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-18 09:07:43 -07:00
ehhuang
51b179e1c5
chore: update k8s template (#2786)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests / discover-tests (push) Successful in 3s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 8s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 14s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
Update ReadTheDocs / update-readthedocs (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 11s
Test External Providers / test-external-providers (venv) (push) Failing after 50s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 58s
Unit Tests / unit-tests (3.13) (push) Failing after 54s
Integration Tests / test-matrix (push) Failing after 53s
Pre-commit / pre-commit (push) Successful in 1m40s
# What does this PR do?
- enables auth
- updates to use distribution-starter docker

## Test Plan
bash apply.sh
2025-07-16 15:07:26 -07:00
Nathan Weinberg
919ee3199b
docs: add missing bold title to match others (#2782)
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-16 18:05:48 +02:00
Kelly Brown
b096794959
docs: Reorganize documentation on the webpage (#2651)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 2s
Integration Tests / discover-tests (push) Successful in 2s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 17s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 19s
Python Package Build Test / build (3.12) (push) Failing after 14s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 15s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 20s
Unit Tests / unit-tests (3.13) (push) Failing after 15s
Test Llama Stack Build / generate-matrix (push) Successful in 16s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 20s
Test External Providers / test-external-providers (venv) (push) Failing after 17s
Update ReadTheDocs / update-readthedocs (push) Failing after 15s
Test Llama Stack Build / build-single-provider (push) Failing after 21s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 18s
Unit Tests / unit-tests (3.12) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 26s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 28s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 23s
Python Package Build Test / build (3.13) (push) Failing after 44s
Test Llama Stack Build / build (push) Failing after 25s
Integration Tests / test-matrix (push) Failing after 46s
Pre-commit / pre-commit (push) Successful in 2m24s
# What does this PR do?
Reorganizes the Llama stack webpage into more concise index pages,
introduce more of a workflow, and reduce repetition of content.

New nav structure so far based on #2637 

Further discussions in
https://github.com/meta-llama/llama-stack/discussions/2585

**Preview:**
![Screenshot 2025-07-09 at 2 31
53 PM](https://github.com/user-attachments/assets/4c1f3845-b328-4f12-9f20-3f09375007af)

You can also build a full local preview locally 

 **Feedback**
Looking for feedback on page titles and general feedback on the new
structure

**Follow up documentation**
I plan on reducing some sections and standardizing some terminology in a
follow up PR.
More discussions on that in
https://github.com/meta-llama/llama-stack/discussions/2585
2025-07-15 14:19:35 -07:00
Mark Campbell
65fcd03461
docs: update outdated llama stack client documentation (#2758)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
Adds new documentation that was missing for the Llama Stack Python
Client as well as updates old/outdated docs
2025-07-15 11:49:59 -07:00
Varsha
4ae5656c2f
feat: Implement keyword search in milvus (#2231)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 7s
Integration Tests / discover-tests (push) Successful in 8s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 10s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 6s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 8s
Test Llama Stack Build / generate-matrix (push) Successful in 8s
Python Package Build Test / build (3.13) (push) Failing after 6s
Unit Tests / unit-tests (3.12) (push) Failing after 6s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 15s
Test External Providers / test-external-providers (venv) (push) Failing after 9s
Test Llama Stack Build / build-single-provider (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 14s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 19s
Integration Tests / test-matrix (push) Failing after 8s
Test Llama Stack Build / build (push) Failing after 5s
Python Package Build Test / build (3.12) (push) Failing after 51s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 55s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 57s
Update ReadTheDocs / update-readthedocs (push) Failing after 50s
Pre-commit / pre-commit (push) Successful in 2m9s
# What does this PR do?
This PR adds the keyword search implementation for Milvus. Along with
the implementation for remote Milvus, the tests require us to start a
Milvus containers locally.

In order to verify the implementation, run:
```
pytest tests/unit/providers/vector_io/remote/test_milvus.py -v -s --tb=short --disable-warnings --asyncio-mode=auto
```

You can also test the changes using the below script:
```
#!/usr/bin/env python3
import asyncio
import os
import uuid
from typing import List

from llama_stack_client import (
    Agent, 
    AgentEventLogger, 
    LlamaStackClient, 
    RAGDocument
)


class MilvusRAGDemo:
    def __init__(self, base_url: str = "http://localhost:8321/"):
        self.client = LlamaStackClient(base_url=base_url)
        self.vector_db_id = f"milvus_rag_demo_{uuid.uuid4().hex[:8]}"
        self.model_id = None
        self.embedding_model_id = None
        self.embedding_dimension = None
        
    def setup_models(self):
        """Get available models and select appropriate ones for LLM and embeddings."""
        models = self.client.models.list()
    
        # Select embedding model
        embedding_models = [m for m in models if m.model_type == "embedding"]
        if not embedding_models:
            raise ValueError("No embedding models found")
        self.embedding_model_id = embedding_models[0].identifier
        self.embedding_dimension = embedding_models[0].metadata["embedding_dimension"]
        
    def register_vector_db(self):
        print(f"Registering Milvus vector database: {self.vector_db_id}")
        
        response = self.client.vector_dbs.register(
            vector_db_id=self.vector_db_id,
            embedding_model=self.embedding_model_id,
            embedding_dimension=self.embedding_dimension,
            provider_id="milvus-remote",  # Use remote Milvus
        )
        print(f"Vector database registered successfully")
        return response
        
    def insert_documents(self):
        """Insert sample documents into the vector database."""
        print("\nInserting sample documents...")
        
        # Sample documents about different topics
        documents = [
            RAGDocument(
                document_id="ai_ml_basics",
                content="""
                Artificial Intelligence (AI) and Machine Learning (ML) are transforming the world.
                AI refers to the simulation of human intelligence in machines, while ML is a subset
                of AI that enables computers to learn and improve from experience without being
                explicitly programmed. Deep learning, a subset of ML, uses neural networks with
                multiple layers to process complex patterns in data.
                
                Key concepts in AI/ML include:
                - Supervised Learning: Training with labeled data
                - Unsupervised Learning: Finding patterns in unlabeled data
                - Reinforcement Learning: Learning through trial and error
                - Neural Networks: Computing systems inspired by biological brains
                """,
                mime_type="text/plain",
                metadata={"topic": "technology", "category": "ai_ml"},
            ),
        ]
        
        # Insert documents with chunking
        self.client.tool_runtime.rag_tool.insert(
            documents=documents,
            vector_db_id=self.vector_db_id,
            chunk_size_in_tokens=200,  # Smaller chunks for better granularity
        )
        print(f"Inserted {len(documents)} documents with chunking")
                
    def test_keyword_search(self):
        """Test keyword-based search using BM25."""
        
        queries = [
            "neural networks",
            "Python frameworks",
            "data cleaning",
        ]
        
        for query in queries:
            response = self.client.vector_io.query(
                vector_db_id=self.vector_db_id,
                query=query,
                params={
                    "mode": "keyword",  # Keyword search
                    "max_chunks": 3,
                    "score_threshold": 0.0,
                }
            )
            
            for i, (chunk, score) in enumerate(zip(response.chunks, response.scores)):
                print(f"  {i+1}. Score: {score:.4f}")
                print(f"     Content: {chunk.content[:100]}...")
                print(f"     Metadata: {chunk.metadata}")    

                
    def run_demo(self):       
        try:
            self.setup_models()
            self.register_vector_db()
            self.insert_documents()
            self.test_keyword_search()
        except Exception as e:
            print(f"Error during demo: {e}")
            raise


def main():
    """Main function to run the demo."""
    # Check if Llama Stack server is running
    demo = MilvusRAGDemo()    
    try:
        demo.run_demo()
    except Exception as e:
        print(f"Demo failed: {e}")

if __name__ == "__main__":
    main()
```

[//]: # (## Documentation)

---------

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
2025-07-14 19:39:55 -04:00
Francisco Arceo
33f0d83ad3
chore: Move vector store kvstore implementation into openai_vector_store_mixin.py (#2748) 2025-07-14 18:10:35 -04:00
ehhuang
aa0840c281
docs: fix building distro link (#2750)
# What does this PR do?


## Test Plan

Co-authored-by: raghotham <rsm@meta.com>
2025-07-14 12:06:56 -07:00
Sumanth Kamenani
77d2c8e95d
docs: clarify run.yaml files are starting points for customization (#2746)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 9s
Integration Tests / discover-tests (push) Successful in 13s
Python Package Build Test / build (3.13) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 17s
Test External Providers / test-external-providers (venv) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 15s
Python Package Build Test / build (3.12) (push) Failing after 12s
Unit Tests / unit-tests (3.12) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 20s
Update ReadTheDocs / update-readthedocs (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 17s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 18s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 18s
Integration Tests / test-matrix (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 31s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 29s
Unit Tests / unit-tests (3.13) (push) Failing after 25s
Pre-commit / pre-commit (push) Successful in 1m12s
# What does this PR do?
This PR improves documentation clarity around run.yaml file usage. It
adds comprehensive guidance to help users understand that generated
run.yaml files are templates meant to be customized for production use,
not used as-is.

## Changes
- Add new documentation section on customizing run.yaml files
- Clarify that generated run.yaml files are templates, not production
configs
- Add guidance on customization best practices and common scenarios  
- Update existing documentation to reference customization guide
- Improve clarity around run.yaml file usage for better user experience

## Test Plan
- Verified new documentation file exists at correct location
- Confirmed documentation is properly integrated into the toctree
structure
- Checked all internal links use correct paths and reference existing
files
- Validated references are added to relevant existing documentation
files
- Documentation build testing will be handled by CI environment
2025-07-14 09:53:13 -07:00
ehhuang
4cf1952c32
chore: update vllm k8s command to support tool calling (#2717)
# What does this PR do?


## Test Plan
2025-07-10 14:40:17 -07:00