Commit graph

324 commits

Author SHA1 Message Date
Nathan Weinberg
19123ca957
refactor: standardize InferenceRouter model handling (#2965)
Some checks failed
Integration Tests (Replay) / discover-tests (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 15s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 19s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 21s
Python Package Build Test / build (3.13) (push) Failing after 16s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 23s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 29s
Test External API and Providers / test-external (venv) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 25s
Unit Tests / unit-tests (3.12) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 27s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 21s
Unit Tests / unit-tests (3.13) (push) Failing after 27s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 29s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 24s
Pre-commit / pre-commit (push) Successful in 1m19s
2025-08-12 04:20:39 -06:00
Eran Cohen
a4bad6c0b4
feat: Add Google Vertex AI inference provider support (#2841)
Some checks failed
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 10s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 12s
Python Package Build Test / build (3.13) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 10s
Test Llama Stack Build / generate-matrix (push) Successful in 8s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 13s
Test External API and Providers / test-external (venv) (push) Failing after 11s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 17s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 10s
Test Llama Stack Build / build-single-provider (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 8s
Unit Tests / unit-tests (3.12) (push) Failing after 10s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 26s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 15s
Update ReadTheDocs / update-readthedocs (push) Failing after 9s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 18s
Test Llama Stack Build / build (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 47s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 49s
Unit Tests / unit-tests (3.13) (push) Failing after 39s
Pre-commit / pre-commit (push) Successful in 1m37s
# What does this PR do?
- Add new Vertex AI remote inference provider with litellm integration
- Support for Gemini models through Google Cloud Vertex AI platform
- Uses Google Cloud Application Default Credentials (ADC) for
authentication
- Added VertexAI models: gemini-2.5-flash, gemini-2.5-pro,
gemini-2.0-flash.
- Updated provider registry to include vertexai provider
- Updated starter template to support Vertex AI configuration
- Added comprehensive documentation and sample configuration

<!-- If resolving an issue, uncomment and update the line below -->
relates to https://github.com/meta-llama/llama-stack/issues/2747

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

Signed-off-by: Eran Cohen <eranco@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-08-11 08:22:04 -04:00
Vlastimil Eliáš
1677d6bffd
feat: Flash-Lite 2.0 and 2.5 models added to Gemini inference provider (#3058)
Some checks failed
Integration Tests (Replay) / discover-tests (push) Successful in 4s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 11s
Python Package Build Test / build (3.12) (push) Failing after 8s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 15s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 12s
Python Package Build Test / build (3.13) (push) Failing after 10s
Unit Tests / unit-tests (3.12) (push) Failing after 9s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 13s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 19s
Test External API and Providers / test-external (venv) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 59s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 1m1s
Unit Tests / unit-tests (3.13) (push) Failing after 59s
Pre-commit / pre-commit (push) Successful in 1m41s
PR adds Flash-Lite 2.0 and 2.5 models to the Gemini inference provider

Closes #3046 

## Test Plan
I was not able to locate any existing test for this provider, so I
performed manual testing. But the change is really trivial and
straightforward.
2025-08-08 13:48:15 -07:00
Jiayi Ni
9e78f2da96
docs: fix the docs for NVIDIA Inference Provider (#3055)
Some checks failed
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 20s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 21s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 15s
Test Llama Stack Build / build-single-provider (push) Failing after 11s
Test Llama Stack Build / generate-matrix (push) Successful in 14s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 20s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 26s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 16s
Test External API and Providers / test-external (venv) (push) Failing after 11s
Unit Tests / unit-tests (3.12) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 20s
Python Package Build Test / build (3.12) (push) Failing after 23s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 18s
Unit Tests / unit-tests (3.13) (push) Failing after 9s
Update ReadTheDocs / update-readthedocs (push) Failing after 9s
Python Package Build Test / build (3.13) (push) Failing after 21s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 17s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 51s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 58s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 56s
Pre-commit / pre-commit (push) Successful in 1m40s
Test Llama Stack Build / build (push) Failing after 14s
# What does this PR do?
Fix the NVIDIA inference docs by updating API methods, model IDs, and
embedding example.

## Test Plan
N/A
2025-08-08 11:27:55 +02:00
Varsha
e3928e6a29
feat: Implement hybrid search in Milvus (#2644)
Some checks failed
Integration Tests (Replay) / discover-tests (push) Successful in 5s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 10s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 10s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 16s
Python Package Build Test / build (3.12) (push) Failing after 10s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 15s
Unit Tests / unit-tests (3.13) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 8s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 8s
Unit Tests / unit-tests (3.12) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 11s
Test External API and Providers / test-external (venv) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 19s
Pre-commit / pre-commit (push) Successful in 57s
# What does this PR do?
This PR implements hybrid search for Milvus DB based on the inbuilt
milvus support.
   
    To test:
    ```
pytest tests/unit/providers/vector_io/remote/test_milvus.py -v -s
--tb=long --disable-warnings --asyncio-mode=auto
    ```

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
2025-08-07 09:42:03 +02:00
Ashwin Bharambe
7f834339ba
chore(misc): make tests and starter faster (#3042)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 9s
Python Package Build Test / build (3.12) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 12s
Test Llama Stack Build / generate-matrix (push) Successful in 11s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 14s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 22s
Test External API and Providers / test-external (venv) (push) Failing after 14s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 15s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 22s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 14s
Unit Tests / unit-tests (3.13) (push) Failing after 14s
Test Llama Stack Build / build-single-provider (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 18s
Unit Tests / unit-tests (3.12) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 18s
Test Llama Stack Build / build (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 16s
Python Package Build Test / build (3.13) (push) Failing after 53s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 59s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 1m1s
Update ReadTheDocs / update-readthedocs (push) Failing after 1m6s
Pre-commit / pre-commit (push) Successful in 1m53s
A bunch of miscellaneous cleanup focusing on tests, but ended up
speeding up starter distro substantially.

- Pulled llama stack client init for tests into `pytest_sessionstart` so
it does not clobber output
- Profiling of that told me where we were doing lots of heavy imports
for starter, so lazied them
- starter now starts 20seconds+ faster on my Mac
- A few other smallish refactors for `compat_client`
2025-08-05 14:55:05 -07:00
IAN MILLER
e12524af85
feat: create unregister shield API endpoint in Llama Stack (#2853)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 10s
Integration Tests (Replay) / discover-tests (push) Successful in 13s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 24s
Test External API and Providers / test-external (venv) (push) Failing after 12s
Unit Tests / unit-tests (3.13) (push) Failing after 10s
Update ReadTheDocs / update-readthedocs (push) Failing after 9s
Python Package Build Test / build (3.13) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 27s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 29s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 27s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 21s
Unit Tests / unit-tests (3.12) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 35s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 39s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 35s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 35s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 1m2s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 1m4s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 1m2s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 7s
Pre-commit / pre-commit (push) Successful in 2m21s
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->

Extend the Shields Protocol and implement the capability to unregister
previously registered shields and CLI for shields management.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Closes #2581 

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

First of, test API for shields
1. Install and start Ollama:

`ollama serve`


2. Pull Llama Guard Model in Ollama:

`ollama pull llama-guard3:8b`

3. Configure env variables:

```
export ENABLE_OLLAMA=ollama
export OLLAMA_URL=http://localhost:11434
```

4. Build Llama Stack distro:

`llama stack build --template starter --image-type venv  `

5. Start Llama Stack server:

`llama stack run starter --port 8321`

6. Check if Ollama model is available:

`curl -X GET http://localhost:8321/v1/models | jq '.data[] |
select(.provider_id=="ollama")'`

7. Register a new Shield using Ollama provider:

```
curl -X POST http://localhost:8321/v1/shields \
 -H "Content-Type: application/json" \
 -d '{
   "shield_id": "test-shield",
   "provider_id": "llama-guard",
   "provider_shield_id": "ollama/llama-guard3:8b",
   "params": {}
 }'
```

`{"identifier":"test-shield","provider_resource_id":"ollama/llama-guard3:8b","provider_id":"llama-guard","type":"shield","owner":{"principal":"","attributes":{}},"params":{}}%
`

8. Check if shield was registered:

`curl -X GET http://localhost:8321/v1/shields/test-shield`


`{"identifier":"test-shield","provider_resource_id":"ollama/llama-guard3:8b","provider_id":"llama-guard","type":"shield","owner":{"principal":"","attributes":{}},"params":{}}%
`

9. Run shield:

```
curl -X POST http://localhost:8321/v1/safety/run-shield \
  -H "Content-Type: application/json" \
  -d '{
    "shield_id": "test-shield",
    "messages": [
      {
        "role": "user",
        "content": "How can I hack into someone computer?"
      }
    ],
    "params": {}
  }'
```

`{"violation":{"violation_level":"error","user_message":"I can't answer
that. Can I help with something
else?","metadata":{"violation_type":"S2"}}}% `

10. Unregister shield:

`curl -X DELETE http://localhost:8321/v1/shields/test-shield`

`null% `

11. Verify shield was deleted:

`curl -X GET http://localhost:8321/v1/shields/test-shield`

`{"detail":"Invalid value: Shield 'test-shield' not found"}%`

All tests passed 

```
========================================================================== 430 passed, 194 warnings in 19.54s ==========================================================================
/Users/iamiller/GitHub/llama-stack/.venv/lib/python3.12/site-packages/litellm/llms/custom_httpx/async_client_cleanup.py:78: RuntimeWarning: coroutine 'close_litellm_async_clients' was never awaited
  loop.close()
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Wrote HTML report to htmlcov-3.12/index.html

```
2025-08-05 07:33:46 -07:00
Ashwin Bharambe
cc87995e2b
chore: rename templates to distributions (#3035)
As the title says. Distributions is in, Templates is out.

`llama stack build --template` --> `llama stack build --distro`. For
backward compatibility, the previous option is kept but results in a
warning.

Updated `server.py` to remove the "config_or_template" backward
compatibility since it has been a couple releases since that change.
2025-08-04 11:34:17 -07:00
Varsha
3c2aee610d
refactor: Remove double filtering based on score threshold (#3019)
# What does this PR do?
Remove score_threshold based check from `OpenAIVectorStoreMixin`

Closes: https://github.com/meta-llama/llama-stack/issues/3018

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
2025-08-02 15:57:03 -07:00
IAN MILLER
a749d5f4a4
refactor: remove Conda support from Llama Stack (#2969)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR is responsible for removal of Conda support in Llama Stack

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Closes #2539

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
2025-08-02 15:52:59 -07:00
Matthew Farrellee
140ee7d337
fix: sambanova inference provider (#2996)
Some checks failed
Integration Tests (Replay) / discover-tests (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 10s
Integration Tests (Replay) / run-replay-mode-tests (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 14s
Python Package Build Test / build (3.13) (push) Failing after 8s
Unit Tests / unit-tests (3.12) (push) Failing after 8s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 15s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 12s
Python Package Build Test / build (3.12) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 17s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 10s
Test External API and Providers / test-external (venv) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 10s
Unit Tests / unit-tests (3.13) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 46s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 49s
Pre-commit / pre-commit (push) Successful in 1m29s
# What does this PR do?

closes #2995 

update SambaNovaInferenceAdapter to efficiently use LiteLLMOpenAIMixin

## Test Plan

```
$ uv run pytest -s -v tests/integration/inference --stack-config inference=sambanova --text-model sambanova/Meta-Llama-3.1-8B-Instruct
...
======================== 10 passed, 84 skipped, 3 xfailed, 51 warnings in 8.14s ========================
```
2025-08-01 09:09:14 -07:00
Varsha
1f0766308d
feat: Add openAI compatible APIs to Qdrant (#2465)
Some checks failed
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 15s
Test Llama Stack Build / generate-matrix (push) Successful in 9s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 19s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 13s
Test Llama Stack Build / build-single-provider (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 15s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 14s
Integration Tests (Replay) / discover-tests (push) Successful in 24s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 18s
Update ReadTheDocs / update-readthedocs (push) Failing after 12s
Unit Tests / unit-tests (3.12) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 16s
Python Package Build Test / build (3.12) (push) Failing after 20s
Python Package Build Test / build (3.13) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 18s
Test External API and Providers / test-external (venv) (push) Failing after 18s
Unit Tests / unit-tests (3.13) (push) Failing after 19s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 42s
Integration Tests (Replay) / run-replay-mode-tests (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 1m12s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 1m15s
Test Llama Stack Build / build (push) Failing after 32s
Pre-commit / pre-commit (push) Successful in 2m39s
# What does this PR do?
Adds support to Vector store Open AI APIs in Qdrant.

<!-- If resolving an issue, uncomment and update the line below -->
 Closes #2463 


## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-08-01 00:41:34 -04:00
Francisco Arceo
33cca26154
chore: Enabling Integration tests for Weaviate (#2882)
# What does this PR do?

This PR (1) enables the files API for Weaviate and (2) enables
integration tests for Weaviate, which adds a docker container to the
github action.

This PR also handles a couple of edge cases for in creating the
collection and ensuring the tests all pass.

## Test Plan
CI enabled

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-31 20:29:50 -04:00
Ashwin Bharambe
2665f00102
chore(rename): move llama_stack.distribution to llama_stack.core (#2975)
We would like to rename the term `template` to `distribution`. To
prepare for that, this is a precursor.

cc @leseb
2025-07-30 23:30:53 -07:00
Nathan Weinberg
cd5c6a2fcd
chore: standardize vector store not found error (#2968)
# What does this PR do?
1. Creates a new `VectorStoreNotFoundError` class
2. Implements the new class where appropriate 

Relates to #2379

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-30 15:19:16 -07:00
Matthew Farrellee
60bb5e307e
feat(openai): add configurable base_url support with OPENAI_BASE_URL env var (#2919)
# What does this PR do?

- Add base_url field to OpenAIConfig with default
"https://api.openai.com/v1"
- Update sample_run_config to support OPENAI_BASE_URL environment
variable
- Modify get_base_url() to return configured base_url instead of
hardcoded value
- Add comprehensive test suite covering:
  - Default base URL behavior
  - Custom base URL from config
  - Environment variable override
  - Config precedence over environment variables
  - Client initialization with configured URL
  - Model availability checks using configured URL

This enables users to configure custom OpenAI-compatible API endpoints
via environment variables or configuration files.

Closes #2910 

## Test Plan

run unit tests
2025-07-28 10:16:02 -07:00
Matthew Farrellee
3c40c8e583
fix: litellm_provider_name for llama-api (#2934)
litellm uses "meta_llama" for the provider name, see
https://docs.litellm.ai/docs/providers/meta_llama ad
https://github.com/BerriAI/litellm/blob/main/litellm/__init__.py#L833
2025-07-28 10:02:16 -07:00
Ashwin Bharambe
9583f468f8
feat(starter)!: simplify starter distro; litellm model registry changes (#2916) 2025-07-25 15:02:04 -07:00
Derek Higgins
52201612de
feat: implement chunk deletion for vector stores (#2701)
Add support for deleting individual chunks from vector stores

- Add abstract remove_chunk() method to EmbeddingIndex base class
- Implement chunk deletion for Faiss provider, SQLite Vec, Milvus,
PGVector
- Placeholder implementations with NotImplementedError for
Chroma/Qdrant/Weaviate
- Integrate chunk deletion into OpenAI vector store file deletion flow
- removed xfail from
test_openai_vector_store_delete_file_removes_from_vector_store

Closes: #2477

---------

Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-07-25 10:30:30 -04:00
Francisco Arceo
9e77be1f72
chore: Fix chroma unit tests (#2896)
# What does this PR do?
Enable Chroma inline unit tests and fix integration tests.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-25 10:12:14 -04:00
Ashwin Bharambe
1463b79218
feat(registry): make the Stack query providers for model listing (#2862)
This flips #2823 and #2805 by making the Stack periodically query the
providers for models rather than the providers going behind the back and
calling "register" on to the registry themselves. This also adds support
for model listing for all other providers via `ModelRegistryHelper`.
Once this is done, we do not need to manually list or register models
via `run.yaml` and it will remove both noise and annoyance (setting
`INFERENCE_MODEL` environment variables, for example) from the new user
experience.

In addition, it adds a configuration variable `allowed_models` which can
be used to optionally restrict the set of models exposed from a
provider.
2025-07-24 10:39:53 -07:00
Matthew Farrellee
e33a50480d
fix: starter template and litellm backward compat conflict for openai (#2885)
# What does this PR do?

openai/models.py has backward compat entries for litellm model names.
the starter template includes these in the list of registered models.
the inclusion results in duplicate model registrations.

the backward compat is no longer necessary.

## Test Plan

ci
2025-07-24 17:28:37 +02:00
Sarthak Deshpande
cd8715d327
chore: Added openai compatible vector io endpoints for chromadb (#2489)
Some checks failed
Integration Tests / discover-tests (push) Successful in 3s
Coverage Badge / unit-tests (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 4s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Python Package Build Test / build (3.13) (push) Failing after 2s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 10s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 16s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 16s
Python Package Build Test / build (3.12) (push) Failing after 12s
Test External Providers / test-external-providers (venv) (push) Failing after 12s
Update ReadTheDocs / update-readthedocs (push) Failing after 10s
Test Llama Stack Build / build-single-provider (push) Failing after 15s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 20s
Unit Tests / unit-tests (3.13) (push) Failing after 14s
Test Llama Stack Build / build (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 18s
Unit Tests / unit-tests (3.12) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 18s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 51s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 49s
Integration Tests / test-matrix (push) Failing after 53s
Pre-commit / pre-commit (push) Successful in 1m42s
# What does this PR do?
This PR implements the openai compatible endpoints for chromadb

Closes #2462 

## Test Plan
Ran ollama llama stack server and ran the command
`pytest -sv --stack-config=http://localhost:8321
tests/integration/vector_io/test_openai_vector_stores.py
--embedding-model all-MiniLM-L6-v2`
8 failed, 27 passed, 8 skipped, 1 xfailed
The failed ones are regarding files api

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: sarthakdeshpande <sarthak.deshpande@engati.com>
Co-authored-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-07-23 13:51:58 -07:00
Francisco Arceo
2aba2c1236
chore: Moving vector store and vector store files helper methods to openai_vector_store_mixin (#2863)
# What does this PR do?
Moving vector store and vector store files helper methods to
`openai_vector_store_mixin.py`

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
The tests are already supported in the CI and tests the inline providers
and current integration tests.

Note that the `vector_index` fixture will be test `milvus_vec_adapter`,
`faiss_vec_adapter`, and `sqlite_vec_adapter` in
`tests/unit/providers/vector_io/test_vector_io_openai_vector_stores.py`.

Additionally, the integration tests in `integration-vector-io-tests.yml`
runs `tests/integration/vector_io` tests for the following providers:
```python
vector-io-provider: ["inline::faiss", "inline::sqlite-vec", "inline::milvus", "remote::chromadb", "remote::pgvector"]
```

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-23 13:35:48 -04:00
Matthew Farrellee
e1ed152779
chore: create OpenAIMixin for inference providers with an OpenAI-compat API that need to implement openai_* methods (#2835)
Some checks failed
Coverage Badge / unit-tests (push) Failing after 3s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s
Python Package Build Test / build (3.12) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 6s
Integration Tests / discover-tests (push) Successful in 7s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 6s
Python Package Build Test / build (3.13) (push) Failing after 2s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 9s
Unit Tests / unit-tests (3.12) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 11s
Test External Providers / test-external-providers (venv) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 9s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 17s
Unit Tests / unit-tests (3.13) (push) Failing after 12s
Update ReadTheDocs / update-readthedocs (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 16s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 18s
Integration Tests / test-matrix (push) Failing after 18s
Pre-commit / pre-commit (push) Successful in 1m14s
# What does this PR do?

add an `OpenAIMixin` for use by inference providers who remote endpoints
support an OpenAI compatible API.

use is demonstrated by refactoring
- OpenAIInferenceAdapter
- NVIDIAInferenceAdapter (adds embedding support)
- LlamaCompatInferenceAdapter

## Test Plan

existing unit and integration tests
2025-07-23 06:49:40 -04:00
ehhuang
8e1a2b4703
chore: remove *_openai_compat providers (#2849)
# What does this PR do?
These are no longer needed as llama-stack-evals can run against OAI
endpoints directly.

## Test Plan
2025-07-22 10:25:36 -07:00
Ashwin Bharambe
199f859eec
feat(vllm): periodically refresh models (#2823)
Just like #2805 but for vLLM.

We also make VLLM_URL env variable optional (not required) -- if not
specified, the provider silently sits idle and yells eventually if
someone tries to call a completion on it. This is done so as to allow
this provider to be present in the `starter` distribution.

## Test Plan

Set up vLLM, copy the starter template and set `{ refresh_models: true,
refresh_models_interval: 10 }` for the vllm provider and then run:

```
ENABLE_VLLM=vllm VLLM_URL=http://localhost:8000/v1 \
  uv run llama stack run --image-type venv /tmp/starter.yaml
```

Verify that `llama-stack-client models list` brings up the model
correctly from vLLM.
2025-07-18 15:53:09 -07:00
Ashwin Bharambe
68a2dfbad7
feat(ollama): periodically refresh models (#2805)
For self-hosted providers like Ollama (or vLLM), the backing server is
running a set of models. That server should be treated as the source of
truth and the Stack registry should just be a cache for those models. Of
course, in production environments, you may not want this (because you
know what model you are running statically) hence there's a config
boolean to control this behavior.

_This is part of a series of PRs aimed at removing the requirement of
needing to set `INFERENCE_MODEL` env variables for running Llama Stack
server._

## Test Plan

Copy and modify the starter.yaml template / config and enable
`refresh_models: true, refresh_models_interval: 10` for the ollama
provider. Then, run:

```
LLAMA_STACK_LOGGING=all=debug \
  ENABLE_OLLAMA=ollama uv run llama stack run --image-type venv /tmp/starter.yaml
```

See a gargantuan amount of logs, but verify that the provider is
periodically refreshing models. Stop and prune a model from ollama
server, restart the server. Verify that the model goes away when I call
`uv run llama-stack-client models list`
2025-07-18 12:20:36 -07:00
Matthew Farrellee
477bcd4d09
feat: allow dynamic model registration for nvidia inference provider (#2726)
# What does this PR do?

let's users register models available at
https://integrate.api.nvidia.com/v1/models that isn't already in
llama_stack/providers/remote/inference/nvidia/models.py

## Test Plan

1. run the nvidia distro
2. register a model from https://integrate.api.nvidia.com/v1/models that
isn't already know, as of this writing
nvidia/llama-3.1-nemotron-ultra-253b-v1 is a good example
3. perform inference w/ the model
2025-07-17 12:11:30 -07:00
IAN MILLER
b57db11bed
feat: create dynamic model registration for OpenAI and Llama compat remote inference providers (#2745)
Some checks failed
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 5s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 2s
Test Llama Stack Build / generate-matrix (push) Successful in 6s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 9s
Update ReadTheDocs / update-readthedocs (push) Failing after 3s
Test Llama Stack Build / build-single-provider (push) Failing after 7s
Integration Tests / discover-tests (push) Successful in 13s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 13s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 15s
Integration Tests / test-matrix (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 19s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 17s
Test External Providers / test-external-providers (venv) (push) Failing after 17s
Test Llama Stack Build / build (push) Failing after 14s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 35s
Python Package Build Test / build (3.12) (push) Failing after 51s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 57s
Unit Tests / unit-tests (3.13) (push) Failing after 53s
Pre-commit / pre-commit (push) Successful in 1m42s
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
The purpose of this task is to create a solution that can automatically
detect when new models are added, deprecated, or removed by OpenAI and
Llama API providers, and automatically update the list of supported
models in LLamaStack.

This feature is vitally important in order to avoid missing new models
and editing the entries manually hence I created automation allowing
users to dynamically register:
- any models from OpenAI provider available at 
[https://api.openai.com/v1/models](https://api.openai.com/v1/models)
that are not in
[https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/openai/models.py](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/openai/models.py)

- any models from Llama API provider available at
[https://api.llama.com/v1/models](https://api.llama.com/v1/models) that
are not in
[https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/llama_openai_compat/models.py](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/llama_openai_compat/models.py)

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Closes #2504

this PR is dependant on #2710

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

1. Create venv at root llamastack directory:
`uv venv .venv --python 3.12 --seed`    
2. Activate venv:
`source .venv/bin/activate`   
3. `uv pip install -e .`
4. Create OpenAI distro modifying run.yaml
5. Build distro:
`llama stack build --template starter --image-type venv`
6. Then run LlamaStack, but before navigate to templates/starter folder:
`llama stack run run.yaml --image-type venv OPENAI_API_KEY=<YOUR_KEY>
ENABLE_OPENAI=openai`
7. Then try to register dummy llm that doesn't exist in OpenAI provider:
` llama-stack-client models register ianm/ianllm
--provider-model-id=ianllm --provider-id=openai `
 
You should receive this output - combined list of static config +
fetched available models from OpenAI:
 
<img width="1380" height="474" alt="Screenshot 2025-07-14 at 12 48 50"
src="https://github.com/user-attachments/assets/d26aad18-6b15-49ee-9c49-b01b2d33f883"
/>

8. Then register real llm from OpenAI:
llama-stack-client models register openai/gpt-4-turbo-preview
--provider-model-id=gpt-4-turbo-preview --provider-id=openai

<img width="1253" height="613" alt="Screenshot 2025-07-14 at 13 43 02"
src="https://github.com/user-attachments/assets/60a5c9b1-3468-4eb9-9e92-cd7d21de3ca0"
/>
<img width="1288" height="655" alt="Screenshot 2025-07-14 at 13 43 11"
src="https://github.com/user-attachments/assets/c1e48871-0e24-4bd9-a0b8-8c95552a51ee"
/>

We correctly fetched all available models from OpenAI

As for Llama API, as a non-US person I don't have access to Llama API
Key but I joined wait list. The implementation for Llama is the same as
for OpenAI since Llama is openai compatible. So, the response from GET
endpoint has the same structure as OpenAI
https://llama.developer.meta.com/docs/api/models
2025-07-16 12:49:38 -04:00
Matthew Farrellee
a3e249807b
chore: remove vision model URL workarounds and simplify client creation (#2775)
The vision models are now available at the standard URL, so the
workaround code has been removed. This also simplifies the codebase by
eliminating the need for per-model client caching.

- Remove special URL handling for meta/llama-3.2-11b/90b-vision-instruct
models
- Convert _get_client method to _client property for cleaner API
- Remove unnecessary lru_cache decorator and functools import
- Simplify client creation logic to use single base URL for all models
2025-07-16 07:10:04 -07:00
Francisco Arceo
e1755d1ed2
chore: Adding OpenAI Vector Stores Files API compatibility for PGVector (#2755)
# What does this PR do?
Adding OpenAI Vector Stores Files API compatibility for PGVector

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
Updated CI to include PGVector

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-15 15:46:49 -04:00
Francisco Arceo
31b088978a
fix: Fix /vector-stores/create API when vector store with duplicate name (#2617)
# What does this PR do?

Resolves https://github.com/meta-llama/llama-stack/issues/2735

Currently, if you test against OpenAI's Vector Stores API the
`client.vector_stores.search` call fails with an invalid vector_db
during routing (see the script referenced in the clickable item under
the Test Plan section).

This PR ensures that `client.vector_stores.search()` is compatible with
OpenAI's Vector Stores API.

Two biggest changes:
1. The `name`, which was previously used as the `vector_db_id`, has been
changed to be consistent with OpenAI's `vs_{uuid}` format.
2. The vector store ID has to be referenced by the ID, the name is not
reliable as every `client.vector_stores.create` results in a new vector
store.

NOTE: I believe this is a breaking change for end users as they'll need
to update their VectorDB identifiers.

## Test Plan
Unit tests:
```bash
./scripts/unit-tests.sh tests/unit/providers/vector_io/ -v
```
Integration tests:
```bash
ENABLE_MILVUS=milvus llama stack run /Users/farceo/dev/llama-stack/llama_stack/templates/starter/run.yaml --image-type venv

LLAMA_STACK_CONFIG=http://localhost:8321 pytest -sv tests/integration/vector_io/test_openai_vector_stores.py --embedding-model=all-MiniLM-L6-v2 -vv
```

Unit tests and test script below 👇 

<details> 
<summary>Click here for script used to test OpenAI and Llama Stack
Vector Store implementation</summary>

```python
import json
import argparse
from openai import OpenAI, pagination
import logging
from colorama import Fore, Style, init
import traceback
import os

# Initialize colorama for color support in terminal
init(autoreset=True)

# Setup basic logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

DEMO_VECTOR_STORE_NAME = "Support FAQ FJA"
global DEMO_VECTOR_STORE_ID
global DEMO_VECTOR_STORE_ID2


def colored_print(color, text):
    """Prints text to the console with the specified color."""
    print(f"{color}{text}{Style.RESET_ALL}")


def log_and_print(color, message, level=logging.INFO):
    """Logs a message and prints it to the console with the specified color."""
    logging.log(level, message)
    colored_print(color, message)


def run_tests(client, prefix="openai"):
    """
    Runs all tests using the provided OpenAI client and saves the output
    to JSON files with the given prefix.
    """
    # Create the directory if it doesn't exist
    os.makedirs('openai_testing', exist_ok=True)

    # Default values in case tests fail
    global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
    DEMO_VECTOR_STORE_ID = None
    DEMO_VECTOR_STORE_ID2 = None

    def test_idempotent_vector_store_creation():
        """
        Test that creating a vector store with the same name is idempotent.
        """
        log_and_print(Fore.BLUE, "Starting vector store creation test...")
        try:
            vector_store = client.vector_stores.create(
                name=DEMO_VECTOR_STORE_NAME,
            )

            # Attempt to create the same vector store again
            vector_store2 = client.vector_stores.create(
                name=DEMO_VECTOR_STORE_NAME,
            )

            # Check instead of assert
            if vector_store2.id != vector_store.id:
                log_and_print(Fore.YELLOW, f"FAILED IDEMPOTENCY: the same VectorStore name for {prefix.upper()} does not return the same ID",
                              level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, f"PASSED IDEMPOTENCY: f{vector_store2.id} == {vector_store.id} the same VectorStore name for {prefix.upper()} returns the same ID")

            vector_store_data = vector_store.to_dict()
            log_and_print(Fore.WHITE, f"vector_stores.create = {json.dumps(vector_store_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_create.json', 'w') as f:
                json.dump(vector_store_data, f, indent=2)

            global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
            DEMO_VECTOR_STORE_ID = vector_store.id
            DEMO_VECTOR_STORE_ID2 = vector_store2.id
            return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
        except Exception as e:
            log_and_print(Fore.RED, f"Idempotent vector store creation test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())
            # Create a fallback vector store ID if needed
            if 'vector_store' in locals() and vector_store:
                DEMO_VECTOR_STORE_ID = vector_store.id
            return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2

    def test_vector_store_list():
        """
        Test listing vector stores.
        """
        log_and_print(Fore.BLUE, "Starting vector store list test...")
        try:
            vector_stores = client.vector_stores.list()

            # Check instead of assert
            if not isinstance(vector_stores, pagination.SyncCursorPage):
                log_and_print(Fore.YELLOW, f"FAILED: Expected a list of vector stores, got {type(vector_stores)}",
                              level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Vector store list test passed!")

            vector_stores_data = vector_stores.to_dict()
            log_and_print(Fore.WHITE, f"vector_stores.list = {json.dumps(vector_stores_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_list.json', 'w') as f:
                json.dump(vector_stores_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Vector store list test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_retrieve_vector_store():
        """
        Test retrieving a specific vector store.
        """
        log_and_print(Fore.BLUE, "Starting retrieve vector store test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping retrieve vector store test - no vector store ID available",
                          level=logging.WARNING)
            return

        try:
            vector_store = client.vector_stores.retrieve(
                vector_store_id=DEMO_VECTOR_STORE_ID,
            )

            # Check instead of assert
            if vector_store.id != DEMO_VECTOR_STORE_ID:
                log_and_print(Fore.YELLOW, "FAILED: Retrieved vector store ID does not match", level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Retrieve vector store test passed!")

            vector_store_data = vector_store.to_dict()
            log_and_print(Fore.WHITE, f"vector_stores.retrieve = {json.dumps(vector_store_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_retrieve.json', 'w') as f:
                json.dump(vector_store_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Retrieve vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_modify_vector_store():
        """
        Test modifying a vector store.
        """
        log_and_print(Fore.BLUE, "Starting modify vector store test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping modify vector store test - no vector store ID available",
                          level=logging.WARNING)
            return

        try:
            updated_vector_store = client.vector_stores.update(
                vector_store_id=DEMO_VECTOR_STORE_ID,
                name="Updated Support FAQ FJA",
            )

            # Check instead of assert
            if updated_vector_store.name != "Updated Support FAQ FJA":
                log_and_print(Fore.YELLOW, "FAILED: Vector store name was not updated correctly", level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Modify vector store test passed!")

            updated_vector_store_data = updated_vector_store.to_dict()
            log_and_print(Fore.WHITE, f"vector_stores.modify = {json.dumps(updated_vector_store_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_modify.json', 'w') as f:
                json.dump(updated_vector_store_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Modify vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_delete_vector_store():
        """
        Test deleting a vector store.
        """
        log_and_print(Fore.BLUE, "Starting delete vector store test...")
        if not DEMO_VECTOR_STORE_ID2:
            log_and_print(Fore.YELLOW, "Skipping delete vector store test - no second vector store ID available",
                          level=logging.WARNING)
            return

        try:
            response = client.vector_stores.delete(
                vector_store_id=DEMO_VECTOR_STORE_ID2,
            )

            log_and_print(Fore.GREEN, "Delete vector store test passed!")

            response_data = response.to_dict()
            log_and_print(Fore.WHITE, f"Vector store delete response = {json.dumps(response_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_delete.json', 'w') as f:
                json.dump(response_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Delete vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_create_vector_store_file():
        log_and_print(Fore.BLUE, "Starting create vector store file test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping create vector store file test - no vector store ID available",
                          level=logging.WARNING)
            return

        try:
            # create jsonl of files as an example
            with open("mydata.jsonl", "w") as f:
                f.write('{"text": "What is the return policy?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "How do I reset my password?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "Where can I find my order history?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "What are the shipping options?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "What is your favorite banana?", "metadata": {"category": "support"}}\n')

            # Create a simple text file if my_data_small.txt doesn't exist
            if not os.path.exists("my_data_small.txt"):
                with open("my_data_small.txt", "w") as f:
                    f.write("This is a test file for vector store testing.\n")

            created_file = client.files.create(
                file=open("my_data_small.txt", "rb"),
                purpose="assistants",
            )

            created_file_data = created_file.to_dict()
            log_and_print(Fore.WHITE, f"Created file {json.dumps(created_file_data, indent=2)}")
            with open(f'openai_testing/{prefix}_file_create.json', 'w') as f:
                json.dump(created_file_data, f, indent=2)

            retrieved_files = client.files.retrieve(created_file.id)
            retrieved_files_data = retrieved_files.to_dict()
            log_and_print(Fore.WHITE, f"Retrieved file {json.dumps(retrieved_files_data, indent=2)}")
            with open(f'openai_testing/{prefix}_file_retrieve.json', 'w') as f:
                json.dump(retrieved_files_data, f, indent=2)

            vector_store_file = client.vector_stores.files.create(
                vector_store_id=DEMO_VECTOR_STORE_ID,
                file_id=created_file.id,
            )
            log_and_print(Fore.GREEN, "Create vector store file test passed!")
        except Exception as e:
            log_and_print(Fore.RED, f"Create vector store file test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_search_vector_store():
        """
        Test searching a vector store.
        """
        log_and_print(Fore.BLUE, "Starting search vector store test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping search vector store test - no vector store ID available",
                          level=logging.WARNING)
            return

        try:
            query = "What is the banana policy?"
            search_results = client.vector_stores.search(
                vector_store_id=DEMO_VECTOR_STORE_ID,
                query=query,
                max_num_results=10,
                ranking_options={
                    'ranker': 'default-2024-11-15',
                    'score_threshold': 0.0,
                },
                rewrite_query=False,
            )

            # Check instead of assert
            if not isinstance(search_results, pagination.SyncPage):
                log_and_print(Fore.YELLOW, f"FAILED: Expected a list of search results, got {type(search_results)}",
                              level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Search vector store test passed!")

            search_results_dict = search_results.to_dict()
            log_and_print(Fore.WHITE, f"Search results = {search_results_dict}")
            with open(f'openai_testing/{prefix}_vector_store_search.json', 'w') as f:
                json.dump(search_results_dict, f, indent=2)

            log_and_print(Fore.WHITE, f"vector_stores.search = {search_results.to_json()}")
        except Exception as e:
            log_and_print(Fore.RED, f"Search vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    # Run all tests in sequence, even if some fail
    test_results = []

    try:
        result = test_idempotent_vector_store_creation()
        if result and len(result) == 2:
            DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 = result
        test_results.append(True)
    except Exception as e:
        log_and_print(Fore.RED, f"Vector store creation test failed: {e}", level=logging.ERROR)
        logging.error(traceback.format_exc())
        test_results.append(False)

    for test_func in [
        test_vector_store_list,
        test_retrieve_vector_store,
        test_modify_vector_store,
        test_delete_vector_store,
        test_create_vector_store_file,
        test_search_vector_store
    ]:
        try:
            test_func()
            test_results.append(True)
        except Exception as e:
            log_and_print(Fore.RED, f"{test_func.__name__} failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())
            test_results.append(False)

    if all(test_results):
        log_and_print(Fore.GREEN, f"All {prefix} tests completed successfully!")
    else:
        failed_count = test_results.count(False)
        log_and_print(Fore.YELLOW, f"{failed_count} {prefix} test(s) failed, but script completed.")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Run OpenAI and/or LlamaStack tests.")
    parser.add_argument(
        "--provider",
        type=str,
        default="llama",
        choices=["openai", "llama", "both"],
        help="Specify which environment to test: openai, llama, or both. Default is both.",
    )
    args = parser.parse_args()

    try:
        if args.provider in ("openai", "both"):
            openai_client = OpenAI()
            run_tests(openai_client, prefix="openai")

        if args.provider in ("llama", "both"):
            llama_client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")
            run_tests(llama_client, prefix="llama")

        log_and_print(Fore.GREEN, "All tests completed!")

    except Exception as e:
        log_and_print(Fore.RED, f"Tests failed to complete: {e}", level=logging.ERROR)
        logging.error(traceback.format_exc())
```
</details>

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-15 11:24:41 -04:00
Varsha
4ae5656c2f
feat: Implement keyword search in milvus (#2231)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 7s
Integration Tests / discover-tests (push) Successful in 8s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 10s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 6s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 8s
Test Llama Stack Build / generate-matrix (push) Successful in 8s
Python Package Build Test / build (3.13) (push) Failing after 6s
Unit Tests / unit-tests (3.12) (push) Failing after 6s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 15s
Test External Providers / test-external-providers (venv) (push) Failing after 9s
Test Llama Stack Build / build-single-provider (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 14s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 19s
Integration Tests / test-matrix (push) Failing after 8s
Test Llama Stack Build / build (push) Failing after 5s
Python Package Build Test / build (3.12) (push) Failing after 51s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 55s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 57s
Update ReadTheDocs / update-readthedocs (push) Failing after 50s
Pre-commit / pre-commit (push) Successful in 2m9s
# What does this PR do?
This PR adds the keyword search implementation for Milvus. Along with
the implementation for remote Milvus, the tests require us to start a
Milvus containers locally.

In order to verify the implementation, run:
```
pytest tests/unit/providers/vector_io/remote/test_milvus.py -v -s --tb=short --disable-warnings --asyncio-mode=auto
```

You can also test the changes using the below script:
```
#!/usr/bin/env python3
import asyncio
import os
import uuid
from typing import List

from llama_stack_client import (
    Agent, 
    AgentEventLogger, 
    LlamaStackClient, 
    RAGDocument
)


class MilvusRAGDemo:
    def __init__(self, base_url: str = "http://localhost:8321/"):
        self.client = LlamaStackClient(base_url=base_url)
        self.vector_db_id = f"milvus_rag_demo_{uuid.uuid4().hex[:8]}"
        self.model_id = None
        self.embedding_model_id = None
        self.embedding_dimension = None
        
    def setup_models(self):
        """Get available models and select appropriate ones for LLM and embeddings."""
        models = self.client.models.list()
    
        # Select embedding model
        embedding_models = [m for m in models if m.model_type == "embedding"]
        if not embedding_models:
            raise ValueError("No embedding models found")
        self.embedding_model_id = embedding_models[0].identifier
        self.embedding_dimension = embedding_models[0].metadata["embedding_dimension"]
        
    def register_vector_db(self):
        print(f"Registering Milvus vector database: {self.vector_db_id}")
        
        response = self.client.vector_dbs.register(
            vector_db_id=self.vector_db_id,
            embedding_model=self.embedding_model_id,
            embedding_dimension=self.embedding_dimension,
            provider_id="milvus-remote",  # Use remote Milvus
        )
        print(f"Vector database registered successfully")
        return response
        
    def insert_documents(self):
        """Insert sample documents into the vector database."""
        print("\nInserting sample documents...")
        
        # Sample documents about different topics
        documents = [
            RAGDocument(
                document_id="ai_ml_basics",
                content="""
                Artificial Intelligence (AI) and Machine Learning (ML) are transforming the world.
                AI refers to the simulation of human intelligence in machines, while ML is a subset
                of AI that enables computers to learn and improve from experience without being
                explicitly programmed. Deep learning, a subset of ML, uses neural networks with
                multiple layers to process complex patterns in data.
                
                Key concepts in AI/ML include:
                - Supervised Learning: Training with labeled data
                - Unsupervised Learning: Finding patterns in unlabeled data
                - Reinforcement Learning: Learning through trial and error
                - Neural Networks: Computing systems inspired by biological brains
                """,
                mime_type="text/plain",
                metadata={"topic": "technology", "category": "ai_ml"},
            ),
        ]
        
        # Insert documents with chunking
        self.client.tool_runtime.rag_tool.insert(
            documents=documents,
            vector_db_id=self.vector_db_id,
            chunk_size_in_tokens=200,  # Smaller chunks for better granularity
        )
        print(f"Inserted {len(documents)} documents with chunking")
                
    def test_keyword_search(self):
        """Test keyword-based search using BM25."""
        
        queries = [
            "neural networks",
            "Python frameworks",
            "data cleaning",
        ]
        
        for query in queries:
            response = self.client.vector_io.query(
                vector_db_id=self.vector_db_id,
                query=query,
                params={
                    "mode": "keyword",  # Keyword search
                    "max_chunks": 3,
                    "score_threshold": 0.0,
                }
            )
            
            for i, (chunk, score) in enumerate(zip(response.chunks, response.scores)):
                print(f"  {i+1}. Score: {score:.4f}")
                print(f"     Content: {chunk.content[:100]}...")
                print(f"     Metadata: {chunk.metadata}")    

                
    def run_demo(self):       
        try:
            self.setup_models()
            self.register_vector_db()
            self.insert_documents()
            self.test_keyword_search()
        except Exception as e:
            print(f"Error during demo: {e}")
            raise


def main():
    """Main function to run the demo."""
    # Check if Llama Stack server is running
    demo = MilvusRAGDemo()    
    try:
        demo.run_demo()
    except Exception as e:
        print(f"Demo failed: {e}")

if __name__ == "__main__":
    main()
```

[//]: # (## Documentation)

---------

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
2025-07-14 19:39:55 -04:00
Francisco Arceo
33f0d83ad3
chore: Move vector store kvstore implementation into openai_vector_store_mixin.py (#2748) 2025-07-14 18:10:35 -04:00
Hardik Shah
6b8a8c1be9
fix: Safety in starter (#2731)
- fireworks, together do not support Llama-guard 3 8b model anymore 
- Need to default to ollama 
- current safety shields logic was not correct since the shield_id was
the provider ( which had duplicates )
- Followed similar logic to models 

Note: Seems a bit over-engineered but this can now be extended to other
providers and fits in the overall mechanism of how env_vars are used to
manage starter.

### How to test 
```
ENABLE_OLLAMA=ollama ENABLE_FIREWORKS=fireworks SAFETY_MODEL=llama-guard3:1b pytest -s -v tests/integration/ --stack-config starter -k 'not(supervised_fine_tune or builtin_tool_code or safety_with_image or code_interpreter_for or rag_and_code or truncation or register_and_unregister)' --text-model fireworks/meta-llama/Llama-3.3-70B-Instruct --vision-model fireworks/meta-llama/Llama-4-Scout-17B-16E-Instruct --safety-shield llama-guard3:1b --embedding-model all-MiniLM-L6-v2
```

### Related but not obvious in this PR 
In the llama-stack-ops repo, we run tests before publishing packages and
docker containers.
The actions in that repo were using the fireworks / together distros (
which are non-existent )

So need to update that to run with `starter` and use `ollama`
specifically for safety.
2025-07-14 15:07:40 -07:00
Ben Browning
51d9fd4808
fix: Don't cache clients for passthrough auth providers (#2728)
Some checks failed
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 43s
Unit Tests / unit-tests (3.12) (push) Failing after 45s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 4s
Integration Tests / discover-tests (push) Successful in 6s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 7s
Pre-commit / pre-commit (push) Successful in 2m8s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Test Llama Stack Build / generate-matrix (push) Successful in 5s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 11s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 12s
Test Llama Stack Build / build-single-provider (push) Failing after 7s
Python Package Build Test / build (3.13) (push) Failing after 5s
Python Package Build Test / build (3.12) (push) Failing after 7s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 13s
Test External Providers / test-external-providers (venv) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 12s
Update ReadTheDocs / update-readthedocs (push) Failing after 6s
Integration Tests / test-matrix (push) Failing after 6s
Test Llama Stack Build / build (push) Failing after 4s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 12s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 16s
# What does this PR do?

Some of our inference providers support passthrough authentication via
`x-llamastack-provider-data` header values. This fixes the providers
that support passthrough auth to not cache their clients to the backend
providers (mostly OpenAI client instances) so that the client connecting
to Llama Stack has to provide those auth values on each and every
request.

## Test Plan

I added some unit tests to ensure we're not caching clients across
requests for all the fixed providers in this PR.

```
uv run pytest -sv tests/unit/providers/inference/test_inference_client_caching.py
```


I also ran some of our OpenAI compatible API integration tests for each
of the changed providers, just to ensure they still work. Note that
these providers don't actually pass all these tests (for unrelated
reasons due to quirks of the Groq and Together SaaS services), but
enough of the tests passed to confirm the clients are still working as
intended.

### Together

```
ENABLE_TOGETHER="together" \
uv run llama stack run llama_stack/templates/starter/run.yaml

LLAMA_STACK_CONFIG=http://localhost:8321 \
uv run pytest -sv \
  tests/integration/inference/test_openai_completion.py \
  --text-model "together/meta-llama/Llama-3.1-8B-Instruct"
```

### OpenAI

```
ENABLE_OPENAI="openai" \
uv run llama stack run llama_stack/templates/starter/run.yaml

LLAMA_STACK_CONFIG=http://localhost:8321 \
uv run pytest -sv \
  tests/integration/inference/test_openai_completion.py \
  --text-model "openai/gpt-4o-mini"
```

### Groq

```
ENABLE_GROQ="groq" \
uv run llama stack run llama_stack/templates/starter/run.yaml

LLAMA_STACK_CONFIG=http://localhost:8321 \
uv run pytest -sv \
  tests/integration/inference/test_openai_completion.py \
  --text-model "groq/meta-llama/Llama-3.1-8B-Instruct"
```

---------

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-07-11 13:38:27 -07:00
Jorge Piedrahita Ortiz
aa2595c7c3
fix: sambanova shields and model validation (#2693)
# What does this PR do?
Update the shield register validation of Sambanova not to raise, but
only warn when a model is not available in the base url endpoint used,
also added warnings when model is not available in the base url endpoint
used

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
run starter distro with Sambanova enabled
2025-07-11 16:29:15 -04:00
Francisco Arceo
6a6b66ae4f
chore: Adding unit tests for OpenAI vector stores and migrating SQLite-vec registry to kvstore (#2665)
# What does this PR do?

This PR refactors and the VectorIO backend logic for `sqlite-vec` and
adds unit tests and fixtures to make it easy to test both `sqlite-vec`
and `milvus`.

Key changes:
- `sqlite-vec` migrated to `kvstore` registry
- added in-memory cache for sqlite-vec to be consistent with `milvus`
- default fixtures moved to `conftest.py` 
- removed redundant tests from sqlite`-vec`
- made `test_vector_io_openai_vector_stores.py` more easily extensible 


## Test Plan
Unit tests added testing inline providers.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-10 14:22:13 -04:00
Sébastien Han
9b7eecebcf
ci: test safety with starter (#2628)
Some checks failed
Integration Tests / test-matrix (server, 3.13, inspect) (push) Failing after 7s
Integration Tests / test-matrix (server, 3.13, providers) (push) Failing after 11s
Integration Tests / test-matrix (server, 3.13, vector_io) (push) Failing after 10s
Integration Tests / test-matrix (server, 3.13, scoring) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 7s
Integration Tests / test-matrix (server, 3.13, safety) (push) Failing after 25s
Integration Tests / test-matrix (server, 3.13, post_training) (push) Failing after 27s
Integration Tests / test-matrix (server, 3.13, tool_runtime) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 9s
Test Llama Stack Build / generate-matrix (push) Successful in 14s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 16s
Test Llama Stack Build / build-single-provider (push) Failing after 14s
Integration Tests / test-matrix (server, 3.12, tool_runtime) (push) Failing after 1m7s
Update ReadTheDocs / update-readthedocs (push) Failing after 12s
Unit Tests / unit-tests (3.13) (push) Failing after 14s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 29s
Test External Providers / test-external-providers (venv) (push) Failing after 17s
Test Llama Stack Build / build (push) Failing after 13s
Unit Tests / unit-tests (3.12) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 35s
Python Package Build Test / build (3.12) (push) Failing after 31s
Python Package Build Test / build (3.13) (push) Failing after 29s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 34s
Pre-commit / pre-commit (push) Successful in 1m24s
# What does this PR do?

We are now testing the safety capability with the starter image. This
includes a few changes:

* Enable the safety integration test
* Relax the shield model requirements from llama-guard to make it work
  with llama-guard3:8b coming from Ollama
* Expose a shield for each inference provider in the starter distro. The
  shield will only be registered if the provider is enabled.

Closes: https://github.com/meta-llama/llama-stack/issues/2528

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-09 16:53:50 +02:00
Sébastien Han
297cd8e0db
fix: runpod transition to python 3.12 (#2682)
# What does this PR do?

I'm not sure how this was missed in the pyupgrade PR. This code seems
broken...

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-09 12:27:42 +02:00
pgustafs
d39660afed
fix(remote:milvus): add missing files_api parameter and kvstore configuration (#2630)
- Fix constructor call missing files_api parameter
- Add kvstore field to MilvusVectorIOConfig
- Resolves #2626

# What does this PR do?
[https://github.com/meta-llama/llama-stack/issues/2626]
## Problem
The `MilvusVectorIOAdapter` fails to initialize due to two missing
configuration issues:
1. Missing `files_api` parameter in the constructor call
2. Missing `kvstore` field in the `MilvusVectorIOConfig` class

## Root Cause  
1. The adapter constructor expects 3 parameters `(config, inference_api,
files_api)` but the `get_adapter_impl` function only passes 2 parameters
2. The `MilvusVectorIOConfig` class lacks the `kvstore` field that the
adapter's `initialize()` method expects for metadata persistence

## Solution
- Added `files_api = deps.get(Api.files, None)` to safely retrieve files
API from dependencies
- Pass the files_api parameter to MilvusVectorIOAdapter constructor
- Added `kvstore: KVStoreConfig | None = None` field to
MilvusVectorIOConfig
- Maintains backward compatibility since both files_api and kvstore can
be None

Closes #2626

## Test Plan
- [x] Tested with Milvus configuration - server starts successfully 
```yaml
vector_io:
  - provider_id: milvus
    provider_type: remote::milvus
    config:
      uri: http://localhost:19530
      token: root:Milvus
      kvstore:
        type: sqlite
        namespace: null
        db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/remote-vllm}/milvus_store.db
```
- [x] Vector operations work as expected
```python
from llama_stack_client import LlamaStackClient
from llama_stack_client.types.shared_params.document import Document as RAGDocument
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger as AgentEventLogger
import os


endpoint =  os.getenv("LLAMA_STACK_ENDPOINT")
model =  os.getenv("INFERENCE_MODEL")

# Initialize the client
client = LlamaStackClient(base_url=endpoint)

vector_db_id = "my_documents"

response = client.vector_dbs.register(
    vector_db_id=vector_db_id,
    embedding_model="all-MiniLM-L6-v2",
    embedding_dimension=384,
    provider_id="milvus",
)

urls = ["getting_started/Red_Hat_AI_Inference_Server-3.0-Getting_started-en-US.pdf", "vllm_server_arguments/Red_Hat_AI_Inference_Server-3.0-vLLM_server_arguments-en-US.pdf"]
documents = [
    RAGDocument(
        document_id=f"num-{i}",
        content=f"https://docs.redhat.com/en/documentation/red_hat_ai_inference_server/3.0/pdf/{url}",
        mime_type="application/pdf",
        metadata={},
    )
    for i, url in enumerate(urls)
]

client.tool_runtime.rag_tool.insert(
    documents=documents,
    vector_db_id=vector_db_id,
    chunk_size_in_tokens=512,
)

rag_agent = Agent(
    client,
    model=model,
    # Define instructions for the agent (system prompt)
    instructions="You are a helpful assistant",
    enable_session_persistence=False,
    # Define tools available to the agent
    tools=[
        {
            "name": "builtin::rag/knowledge_search",
            "args": {
                "vector_db_ids": [vector_db_id],
            },
        }
    ],
)

session_id = rag_agent.create_session("test-session")

user_prompts = [
    "How to start the AI Inference Server container image? use the knowledge_search tool to get information.",
]

for prompt in user_prompts:
    print(f"User> {prompt}")
    response = rag_agent.create_turn(
        messages=[{"role": "user", "content": prompt}],
        session_id=session_id,
    )
    for log in AgentEventLogger().log(response):
        log.print()
```    

server logs:
```
INFO     2025-07-04 22:18:30,385 __main__:577 server: Listening on ['::', '0.0.0.0']:5000                                                             
INFO:     Started server process [769725]
INFO:     Waiting for application startup.
INFO     2025-07-04 22:18:30,390 __main__:158 server: Starting up                                                                                     
INFO:     Application startup complete.
INFO:     Uvicorn running on http://['::', '0.0.0.0']:5000 (Press CTRL+C to quit)
INFO     2025-07-04 22:18:52,193 llama_stack.distribution.routing_tables.common:200 core: Setting owner for vector_db 'my_documents' to               
20:18:52.194 [START] /v1/vector-dbs
INFO:     192.168.1.249:64170 - "POST /v1/vector-dbs HTTP/1.1" 200 OK
20:18:52.216 [END] /v1/vector-dbs [StatusCode.OK] (21.89ms)
20:18:52.222 [START] /v1/tool-runtime/rag-tool/insert
INFO     2025-07-04 22:18:56,265 llama_stack.providers.utils.inference.embedding_mixin:102 uncategorized: Loading sentence transformer for            
         all-MiniLM-L6-v2...                                                                                                                          
WARNING  2025-07-04 22:18:59,214 opentelemetry.trace:537 uncategorized: Overriding of current TracerProvider is not allowed                           
INFO     2025-07-04 22:18:59,339 sentence_transformers.SentenceTransformer:219 uncategorized: Use pytorch device_name: cuda:0                         
INFO     2025-07-04 22:18:59,340 sentence_transformers.SentenceTransformer:227 uncategorized: Load pretrained SentenceTransformer: all-MiniLM-L6-v2   
INFO:     192.168.1.249:64170 - "POST /v1/tool-runtime/rag-tool/insert HTTP/1.1" 200 OK
INFO:     192.168.1.249:64170 - "POST /v1/agents HTTP/1.1" 200 OK
INFO:     192.168.1.249:64170 - "GET /v1/tools?toolgroup_id=builtin%3A%3Arag%2Fknowledge_search HTTP/1.1" 200 OK
INFO:     192.168.1.249:64170 - "POST /v1/agents/b1f6f063-1691-4780-8d9e-facd81708b91/session HTTP/1.1" 200 OK
20:19:01.834 [END] /v1/tool-runtime/rag-tool/insert [StatusCode.OK] (9612.06ms)
20:19:01.839 [START] /v1/agents
INFO:     192.168.1.249:64170 - "POST /v1/agents/b1f6f063-1691-4780-8d9e-facd81708b91/session/d2706302-bb54-421d-a890-5e25df9cb47f/turn HTTP/1.1" 200 OK
20:19:01.839 [END] /v1/agents [StatusCode.OK] (0.18ms)
20:19:01.844 [START] /v1/tools
INFO     2025-07-04 22:19:01,853 llama_stack.providers.remote.inference.vllm.vllm:330 uncategorized: Initializing vLLM client with                    
         base_url=http://192.168.1.183:8080/v1                                                                                                        
20:19:01.858 [END] /v1/tools [StatusCode.OK] (14.92ms)
20:19:01.868 [START] /v1/agents/{agent_id}/session
20:19:01.868 [END] /v1/agents/{agent_id}/session [StatusCode.OK] (0.37ms)
20:19:01.873 [START] /v1/agents/{agent_id}/session/{session_id}/turn
20:19:01.885 [START] inference
20:19:05.506 [END] inference [StatusCode.OK] (3621.19ms)
INFO     2025-07-04 22:19:05,537 llama_stack.providers.inline.agents.meta_reference.agent_instance:890 agents: executing tool call: knowledge_search  
         with args: {'query': 'How to start the AI Inference Server container image'}                                                                 
20:19:05.538 [START] tool_execution
20:19:05.928 [END] tool_execution [StatusCode.OK] (390.08ms)
 20:19:05.538 [INFO] executing tool call: knowledge_search with args: {'query': 'How to start the AI Inference Server container image'}
20:19:05.935 [START] inference
20:19:17.539 [END] inference [StatusCode.OK] (11603.76ms)
20:19:17.560 [END] /v1/agents/{agent_id}/session/{session_id}/turn [StatusCode.OK] (15686.62ms)
```
- [x] No regressions in functionality
- [x] Configuration properly accepts kvstore settings

---------

Co-authored-by: Peter Gustafsson <peter.gustafsson6@gmail.com>
Co-authored-by: raghotham <rsm@meta.com>
Co-authored-by: Francisco Arceo <farceo@redhat.com>
2025-07-09 10:08:14 +02:00
Francisco Arceo
83c89265e0
chore: Adding unit tests for Milvus and OpenAI compatibility (#2640)
Some checks failed
Integration Tests / test-matrix (server, 3.13, agents) (push) Failing after 13s
Integration Tests / test-matrix (server, 3.13, inference) (push) Failing after 9s
Integration Tests / test-matrix (server, 3.13, datasets) (push) Failing after 11s
Integration Tests / test-matrix (server, 3.13, post_training) (push) Failing after 7s
Integration Tests / test-matrix (server, 3.13, providers) (push) Failing after 5s
Integration Tests / test-matrix (server, 3.13, scoring) (push) Failing after 5s
Integration Tests / test-matrix (server, 3.13, tool_runtime) (push) Failing after 4s
Integration Tests / test-matrix (server, 3.13, vector_io) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 5s
Test Llama Stack Build / generate-matrix (push) Successful in 36s
Test Llama Stack Build / build-single-provider (push) Failing after 36s
Python Package Build Test / build (3.13) (push) Failing after 2s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 36s
Test External Providers / test-external-providers (venv) (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 3s
Update ReadTheDocs / update-readthedocs (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 8s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 45s
Python Package Build Test / build (3.12) (push) Failing after 17s
Unit Tests / unit-tests (3.13) (push) Failing after 18s
Pre-commit / pre-commit (push) Successful in 1m35s
# What does this PR do?
- Enabling Unit tests for Milvus to start to test OpenAI compatibility
and fixing a few bugs.
- Also fixed an inconsistency in the Milvus config between remote and
inline.
- Added pymilvus to extras for testing in CI

I'm going to refactor this later to include the other inline providers
so that we can catch issues sooner.

I have another PR where I've been testing to find other bugs in the
implementation (and required changes drafted here:
https://github.com/meta-llama/llama-stack/pull/2617).

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-08 00:50:16 -07:00
Wen Zhou
4bca4af3e4
refactor: set proper name for embedding all-minilm:l6-v2 and update to use "starter" in detailed_tutorial (#2627)
Some checks failed
Integration Tests / test-matrix (server, 3.12, scoring) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 9s
Integration Tests / test-matrix (server, 3.13, scoring) (push) Failing after 5s
Integration Tests / test-matrix (server, 3.12, datasets) (push) Failing after 32s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 10s
Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 10s
Integration Tests / test-matrix (server, 3.13, tool_runtime) (push) Failing after 7s
Integration Tests / test-matrix (server, 3.12, inspect) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.13, inference) (push) Failing after 22s
Integration Tests / test-matrix (server, 3.12, agents) (push) Failing after 16s
Integration Tests / test-matrix (server, 3.13, agents) (push) Failing after 17s
Integration Tests / test-matrix (library, 3.12, vector_io) (push) Failing after 24s
Integration Tests / test-matrix (server, 3.12, providers) (push) Failing after 20s
Integration Tests / test-matrix (server, 3.13, inference) (push) Failing after 18s
Integration Tests / test-matrix (server, 3.12, vector_io) (push) Failing after 20s
Integration Tests / test-matrix (server, 3.13, vector_io) (push) Failing after 10s
Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 34s
Integration Tests / test-matrix (library, 3.13, post_training) (push) Failing after 33s
Integration Tests / test-matrix (server, 3.12, tool_runtime) (push) Failing after 30s
Python Package Build Test / build (3.12) (push) Failing after 9s
Test External Providers / test-external-providers (venv) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 12s
Unit Tests / unit-tests (3.13) (push) Failing after 8s
Python Package Build Test / build (3.13) (push) Failing after 39s
Update ReadTheDocs / update-readthedocs (push) Failing after 41s
Unit Tests / unit-tests (3.12) (push) Failing after 46s
Pre-commit / pre-commit (push) Successful in 1m30s
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
- we are using `all-minilm:l6-v2` but the model we download from ollama
is `all-minilm:latest`
  latest: https://ollama.com/library/all-minilm:latest 1b226e2802db
  l6-v2: https://ollama.com/library/all-minilm:l6-v2 pin 1b226e2802db
- even currently they are exactly the same model but if
[all-minilm:l12-v2](https://ollama.com/library/all-minilm:l12-v2) is
updated, "latest" might not be the same for l6-v2.
- the only change in this PR is pin the model id in ollama
- also update detailed_tutorial with "starter" to replace deprecated
"ollama".

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
```
>INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
>llama stack build --run --template ollama --image-type venv
...
Build Successful!
You can find the newly-built template here: /home/wenzhou/zdtsw-forking/lls/llama-stack/llama_stack/templates/ollama/run.yaml
....
 - metadata:                                                                                                                                  
     embedding_dimension: 384                                                                                                                 
   model_id: all-MiniLM-L6-v2                                                                                                                 
   model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType                                                                 
   - embedding                                                                                                                                
   provider_id: ollama                                                                                                                        
   provider_model_id: all-minilm:l6-v2  
   ...
```
test
```
>llama-stack-client inference chat-completion --message "Write me a 2-sentence poem about the moon"
           INFO:httpx:HTTP Request: GET http://localhost:8321/v1/models "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/chat/completions "HTTP/1.1 200 OK"
OpenAIChatCompletion(
    id='chatcmpl-04f99071-3da2-44ba-a19f-03b5b7fc70b7',
    choices=[
        OpenAIChatCompletionChoice(
            finish_reason='stop',
            index=0,
            message=OpenAIChatCompletionChoiceMessageOpenAIAssistantMessageParam(
                role='assistant',
                content="Here is a 2-sentence poem about the moon:\n\nSilver crescent in the midnight sky,\nLuna's gentle face, a beauty to the eye.",
                name=None,
                tool_calls=None,
                refusal=None,
                annotations=None,
                audio=None,
                function_call=None
            ),
            logprobs=None
        )
    ],
    created=1751644429,
    model='llama3.2:3b-instruct-fp16',
    object='chat.completion',
    service_tier=None,
    system_fingerprint='fp_ollama',
    usage={'completion_tokens': 33, 'prompt_tokens': 36, 'total_tokens': 69, 'completion_tokens_details': None, 'prompt_tokens_details': None}
)
```

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
2025-07-06 09:07:37 +05:30
Sébastien Han
c4349f532b
feat: consolidate most distros into "starter" (#2516)
# What does this PR do?

* Removes a bunch of distros
* Removed distros were added into the "starter" distribution
* Doc for "starter" has been added
* Partially reverts https://github.com/meta-llama/llama-stack/pull/2482
  since inference providers are disabled by default and can be turned on
  manually via env variable.
* Disables safety in starter distro

Closes: https://github.com/meta-llama/llama-stack/issues/2502.

~Needs: https://github.com/meta-llama/llama-stack/pull/2482 for Ollama
to work properly in the CI.~

TODO:

- [ ] We can only update `install.sh` when we get a new release.
- [x] Update providers documentation
- [ ] Update notebooks to reference starter instead of ollama

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-04 15:58:03 +02:00
Francisco Arceo
4afd619c56
chore: Add support for vector-stores files api for Milvus (#2582)
Some checks failed
Integration Tests / test-matrix (server, 3.13, inference) (push) Failing after 10s
Integration Tests / test-matrix (server, 3.13, post_training) (push) Failing after 9s
Integration Tests / test-matrix (server, 3.13, datasets) (push) Failing after 12s
Integration Tests / test-matrix (server, 3.13, scoring) (push) Failing after 7s
Integration Tests / test-matrix (server, 3.13, inspect) (push) Failing after 13s
Integration Tests / test-matrix (server, 3.13, providers) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 7s
Integration Tests / test-matrix (server, 3.13, vector_io) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 10s
Integration Tests / test-matrix (server, 3.13, tool_runtime) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 24s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 18s
Test Llama Stack Build / generate-matrix (push) Successful in 20s
Python Package Build Test / build (3.13) (push) Failing after 1s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 28s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Test Llama Stack Build / build (push) Failing after 4s
Test External Providers / test-external-providers (venv) (push) Failing after 6s
Update ReadTheDocs / update-readthedocs (push) Failing after 5s
Unit Tests / unit-tests (3.13) (push) Failing after 9s
Python Package Build Test / build (3.12) (push) Failing after 51s
Test Llama Stack Build / build-single-provider (push) Failing after 55s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 54s
Pre-commit / pre-commit (push) Successful in 1m44s
# What does this PR do?
### Summary

This pull request implements support for the OpenAI Vector Store Files
API for the Milvus vector store provider in `llama_stack`. It enables
storing, loading, updating, and deleting file metadata and file contents
in Milvus collections, allowing OpenAI vector store files to be managed
directly within Milvus.

### Main Changes

- **Milvus Vector Store Files API Implementation**
- Implements all required methods for storing, loading, updating, and
deleting vector store file metadata and contents
(`_save_openai_vector_store_file`, `_load_openai_vector_store_file`,
`_load_openai_vector_store_file_contents`,
`_update_openai_vector_store_file`,
`_delete_openai_vector_store_file_from_storage`).
- Uses two Milvus collections: `openai_vector_store_files` (for
metadata) and `openai_vector_store_files_contents` (for chunked file
contents).
- Collections are created dynamically if they do not exist, with
appropriate schema definitions.
- **Collection Name Sanitization**
- Adds a `sanitize_collection_name` utility to ensure Milvus collection
names only contain valid characters (letters, numbers, underscores).
- **Testing**
- Updates test skip logic to include `"inline::milvus"` for cases where
the OpenAI Vector Store Files API is not supported, improving
integration test accuracy.
- **Other Improvements**
  - Passes `kvstore` to `MilvusIndex` for consistency.
- Removes obsolete NotImplementedErrors and legacy code for file
storage.

## Test Plan
CI and tested via a test script

## Notes
- `VectorDB` currently uses the `name` as the `identifier` in
`openai_create_vector_store`. We need to add `name` as a field to
`VectorDB` and generate the `identifier` upon creation. OpenAI is not
idempotent with respect to the `name` field that they pass (i.e., you
can pass the same name multiple times and OpenAI will generate a new
identifier). I'll add a follow up PR for this.
- The `Files` api needs to use `files-` as a prefix in the identifier. I
have updated the Vector Store to use the OpenAI prefix `vs_*`.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-03 12:15:33 -07:00
Sébastien Han
25268854bc
fix: allow default empty vars for conditionals (#2570)
# What does this PR do?

We were not using conditionals correctly, conditionals can only be used
when the env variable is set, so `${env.ENVIRONMENT:+}` would return
None is ENVIRONMENT is not set.

If you want to create a conditional value, you need to do
`${env.ENVIRONMENT:=}`, this will pick the value of ENVIRONMENT if set,
otherwise will return None.

Closes: https://github.com/meta-llama/llama-stack/issues/2564

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-01 14:42:05 +02:00
Matthew Farrellee
13aa367c8a
fix: default api_key from env must be a SecretStr (#2565)
# What does this PR do?

fixes the api_key type when read from env

## Test Plan

run nvidia template w/o api_key in run.yaml and perform inference

before change the inference will fail w/ -

```
  File ".../llama-stack/llama_stack/providers/remote/inference/nvidia/nvidia.py", line 118, in _get_client_for_base_url
    api_key=(self._config.api_key.get_secret_value() if self._config.api_key else "NO KEY"),
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'str' object has no attribute 'get_secret_value'
```
2025-06-30 18:08:44 -07:00
Ashwin Bharambe
b333a3c03a
fix(ollama): Download remote image URLs for Ollama (#2551)
Some checks failed
Integration Tests / test-matrix (http, 3.13, post_training) (push) Failing after 16s
Integration Tests / test-matrix (http, 3.13, agents) (push) Failing after 19s
Integration Tests / test-matrix (http, 3.13, vector_io) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.12, providers) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 7s
Integration Tests / test-matrix (library, 3.12, vector_io) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 10s
Integration Tests / test-matrix (library, 3.13, inference) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 10s
Integration Tests / test-matrix (library, 3.13, tool_runtime) (push) Failing after 7s
Integration Tests / test-matrix (library, 3.13, agents) (push) Failing after 11s
Integration Tests / test-matrix (library, 3.13, vector_io) (push) Failing after 10s
Integration Tests / test-matrix (library, 3.13, providers) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.13, datasets) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.13, inspect) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 10s
Integration Tests / test-matrix (library, 3.13, scoring) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.13, post_training) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 46s
Python Package Build Test / build (3.12) (push) Failing after 43s
Test External Providers / test-external-providers (venv) (push) Failing after 40s
Python Package Build Test / build (3.13) (push) Failing after 42s
Unit Tests / unit-tests (3.13) (push) Failing after 22s
Unit Tests / unit-tests (3.12) (push) Failing after 25s
Update ReadTheDocs / update-readthedocs (push) Failing after 20s
Pre-commit / pre-commit (push) Successful in 2m13s
## What does this PR do?

Ollama does not support remote images. Only local file paths OR base64
inputs are supported. This PR ensures that the Stack downloads remote
images and passes the base64 down to the inference engine.

## Test Plan

Added a test cases for Responses and ran it for both `fireworks` and
`ollama` providers.
2025-06-30 20:36:11 +05:30
Sébastien Han
c9a49a80e8
docs: auto generated documentation for providers (#2543)
# What does this PR do?

Simple approach to get some provider pages in the docs.

Add or update description fields in the provider configuration class
using Pydantic’s Field, ensuring these descriptions are clear and
complete, as they will be used to auto-generate provider documentation
via ./scripts/distro_codegen.py instead of editing the docs manually.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-30 15:13:20 +02:00