Commit graph

1464 commits

Author SHA1 Message Date
Matthew Farrellee
44263ce954 remove unused CreateBatchRequest, update completion_window to be literal "24h" 2025-08-14 09:25:22 -04:00
Matthew Farrellee
95a3ecdffc Merge branch 'main' into add-batches 2025-08-13 07:33:41 -04:00
Matthew Farrellee
04a73c89ef add notes about batches development status to docs
this also captures other notes from agents, eval and inference apis
2025-08-13 07:26:40 -04:00
Ashwin Bharambe
3d90117891
chore(tests): fix responses and vector_io tests (#3119)
Some fixes to MCP tests. And a bunch of fixes for Vector providers.

I also enabled a bunch of Vector IO tests to be used with
`LlamaStackLibraryClient`

## Test Plan

Run Responses tests with llama stack library client:
```
pytest -s -v tests/integration/non_ci/responses/ --stack-config=server:starter \
  --text-model openai/gpt-4o \
  --embedding-model=sentence-transformers/all-MiniLM-L6-v2 \
  -k "client_with_models"
```

Do the same with `-k openai_client`

The rest should be taken care of by CI.
2025-08-12 16:15:53 -07:00
Ashwin Bharambe
1721aafc1f
feat(responses): type file results properly (#3117)
Another thing our tests implicitly depended on.
2025-08-12 10:39:09 -07:00
Ashwin Bharambe
4fec49dfdb
feat(responses): add include parameter (#3115)
Well our Responses tests use it so we better include it in the API, no?

I discovered it because I want to make sure `llama-stack-client` can be
used always instead of `openai-python` as the client (we do want to be
_truly_ compatible.)
2025-08-12 10:24:01 -07:00
Matthew Farrellee
b70e2f1f09
fix(dep): update to openai >= 1.99.6 and use new Function location (#3087)
# What does this PR do?

closes #3072 

## Test Plan

ci
2025-08-12 08:40:32 -07:00
Mustafa Elbehery
4a13ef45e9
fix: Implement missing run_moderation method in PromptGuardSafetyImpl (#3101)
# What does this PR do?
This PR addresses an issue where `PromptGuardSafetyImpl` was an
incomplete implementation of an abstract class. The class was missing
the required run_moderation method from its parent interface.


Currently, running `pre-commit` locally fails with the error below.

```
llama_stack/providers/inline/safety/prompt_guard/__init__.py:15: error: Cannot instantiate abstract class "PromptGuardSafetyImpl" with abstract attribute "run_moderation"  [abstract]
Found 1 error in 1 file (checked 410 source files)
```

This PR fixes the issue as follows

- Added the missing run_moderation method to PromptGuardSafetyImpl
- Method raises NotImplementedError with appropriate message indicating
this functionality is not implemented for PromptGuard
- This allows the class to be properly instantiated while clearly
indicating the limitation
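
A minimal sketch of the pattern described above, assuming a simplified synchronous interface. `Safety` and `PromptGuardLike` are hypothetical stand-ins, not the real llama-stack classes: defining the abstract method makes the subclass instantiable, while calling it still fails loudly.

```python
from abc import ABC, abstractmethod


class Safety(ABC):
    """Hypothetical stand-in for the parent safety interface."""

    @abstractmethod
    def run_shield(self, text: str) -> str: ...

    @abstractmethod
    def run_moderation(self, text: str) -> str: ...


class PromptGuardLike(Safety):
    """Illustrative subclass; not the real PromptGuardSafetyImpl."""

    def run_shield(self, text: str) -> str:
        return "ok"

    def run_moderation(self, text: str) -> str:
        # Overriding the abstract method makes the class concrete,
        # but the limitation is reported as soon as anyone calls it.
        raise NotImplementedError("run_moderation is not implemented for PromptGuard")


impl = PromptGuardLike()  # no "Cannot instantiate abstract class" error
```

Without the `run_moderation` override, `PromptGuardLike()` would raise `TypeError` at runtime, which is the same problem mypy reports statically.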

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-08-12 08:32:52 -07:00
Nathan Weinberg
19123ca957
refactor: standardize InferenceRouter model handling (#2965)
2025-08-12 04:20:39 -06:00
Ashwin Bharambe
803114180b
chore(logging)!: use comma as a delimiter (#3095)
Using commas is much more shell-friendly. A semi-colon is a statement
delimiter and must be escaped.

This change is backwards incompatible but I imagine not many people are
using this. I could be wrong. Looking for feedback.
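
As a rough illustration of why the comma is friendlier: an unquoted `;` ends a shell statement, so a value such as `a=debug;b=info` must be quoted or escaped, while `a=debug,b=info` can be passed as-is. The parser below is a hypothetical sketch of reading such a value; the function name and the category names are assumptions, not the project's actual code.

```python
def parse_logging_spec(spec: str) -> dict[str, str]:
    """Parse 'category=level,category=level' into a dict (hypothetical helper)."""
    levels: dict[str, str] = {}
    for pair in spec.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate trailing or doubled commas
        category, sep, level = pair.partition("=")
        if not sep or not category.strip() or not level.strip():
            raise ValueError(f"expected category=level, got {pair!r}")
        levels[category.strip().lower()] = level.strip().lower()
    return levels


parsed = parse_logging_spec("server=debug,core=info")
```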
2025-08-11 11:51:43 -07:00
Mustafa Elbehery
b5b5f5b9ae
chore: add mypy prompt guard (#2678)
# What does this PR do?
This PR adds static type coverage to `llama-stack`

Part of https://github.com/meta-llama/llama-stack/issues/2647

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-08-11 08:40:40 -07:00
Francisco Arceo
7448a4a88c
chore: Updating UI Sidebar (#3081)
# What does this PR do?
This updates the sidebar to look a little more like other popular ones.

<img width="1913" height="1352" alt="Screenshot 2025-08-08 at 11 25
31 PM"
src="https://github.com/user-attachments/assets/00738412-1101-48ec-8864-cde4a8733ec1"
/>

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-08-11 07:39:52 -07:00
Matthew Farrellee
8faff92591
chore: remove redundant code in unregister_toolgroup (#3092)
# What does this PR do?

removes redundant code

## Test Plan

ci
2025-08-11 07:38:54 -07:00
Eran Cohen
a4bad6c0b4
feat: Add Google Vertex AI inference provider support (#2841)
# What does this PR do?
- Add new Vertex AI remote inference provider with litellm integration
- Support for Gemini models through Google Cloud Vertex AI platform
- Uses Google Cloud Application Default Credentials (ADC) for
authentication
- Added VertexAI models: gemini-2.5-flash, gemini-2.5-pro,
gemini-2.0-flash.
- Updated provider registry to include vertexai provider
- Updated starter template to support Vertex AI configuration
- Added comprehensive documentation and sample configuration

relates to https://github.com/meta-llama/llama-stack/issues/2747

Signed-off-by: Eran Cohen <eranco@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-08-11 08:22:04 -04:00
Varsha
69dc789e15
docs: Add unsupported search mode info about FAISS (#3089) 2025-08-10 17:34:34 -06:00
Varsha
ce72a28525
docs: Update doc on search modes for Milvus (#3078)
# What does this PR do?
Update Milvus doc on using search modes. 

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
2025-08-10 18:48:36 -04:00
Vlastimil Eliáš
1677d6bffd
feat: Flash-Lite 2.0 and 2.5 models added to Gemini inference provider (#3058)
PR adds Flash-Lite 2.0 and 2.5 models to the Gemini inference provider

Closes #3046 

## Test Plan
I was not able to locate any existing test for this provider, so I
performed manual testing. But the change is really trivial and
straightforward.
2025-08-08 13:48:15 -07:00
ehhuang
0b5a794c27
fix: telemetry logger spams when queue is full (#3070)
# What does this PR do?


## Test Plan
Ran a stress test on chat completion endpoint locally:

For 10 concurrent users over 3 minutes:
Before:
<img width="1440" height="201" alt="image"
src="https://github.com/user-attachments/assets/24e0d580-186e-4e24-931e-2b936c5859b6"
/>

After:
<img width="1434" height="204" alt="image"
src="https://github.com/user-attachments/assets/4b806d88-f822-41e9-b25a-018cc4bec866"
/>

(Will send scripts in a future PR.)
2025-08-08 13:47:36 -07:00
Francisco Arceo
9b70bb9d4b
feat(ui): Adding Vector Store Files to Admin UI (#3041)
# What does this PR do?
This PR updates the UI to add three new pages:
1. `files/{file_id}`
2. `files/{file_id}/contents`
3. `files/{file_id}/contents/{content_id}`

The list of files is clickable and takes the user to the File Details
page.
The File Details page shows all of the file's content.
The Content Details page shows the individual chunk/content parsed.

These only use our existing OpenAI compatible APIs. I have a separate
branch where I expose the embedding and the portal is correctly
populated. I included the FE rendering code for that in this PR.

1. `vector-stores/{vector_store_id}/files/{file_id}` 
<img width="1913" height="1351" alt="Screenshot 2025-08-06 at 10 20
12 PM"
src="https://github.com/user-attachments/assets/08010d5e-60c8-4bd9-9f3e-a2731ed1ad55"
/>

2. `vector-stores/{vector_store_id}/files/{file_id}/contents`
<img width="1920" height="1272" alt="Screenshot 2025-08-06 at 10 21
23 PM"
src="https://github.com/user-attachments/assets/3b91e67b-5d64-4fe6-91b6-18f14587e850"
/>

3.
`vector-stores/{vector_store_id}/files/{file_id}/contents/{content_id}`
<img width="1916" height="1273" alt="Screenshot 2025-08-06 at 10 21
45 PM"
src="https://github.com/user-attachments/assets/d38ca996-e8d9-460c-9e39-7ff0cb5ec0dd"
/>

## Test Plan
I tested this locally and reviewed the code. I generated a significant
share of the code with Claude and some manual intervention. After this,
I'll begin adding tests to the UI.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-08-08 07:44:06 -07:00
Matthew Farrellee
8e678912ec feat: add batches API with OpenAI compatibility
Add complete batches API implementation with protocol, providers, and tests:

Core Infrastructure:
- Add batches API protocol using OpenAI Batch types directly
- Add Api.batches enum value and protocol mapping in resolver
- Add OpenAI "batch" file purpose support
- Include proper error handling (ConflictError, ResourceNotFoundError)

Reference Provider:
- Add ReferenceBatchesImpl with full CRUD operations (create, retrieve, cancel, list)
- Implement background batch processing with configurable concurrency
- Add SQLite KVStore backend for persistence
- Support /v1/chat/completions endpoint with request validation

Comprehensive Test Suite:
- Add unit tests for provider implementation with validation
- Add integration tests for end-to-end batch processing workflows
- Add error handling tests for validation, malformed inputs, and edge cases

Configuration:
- Add max_concurrent_batches and max_concurrent_requests_per_batch options
- Add provider documentation with sample configurations
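
A minimal sketch of how a per-batch concurrency cap like `max_concurrent_requests_per_batch` could be enforced, assuming an `asyncio.Semaphore`. `process_batch` and `run_one` are hypothetical names, and the placeholder sleep stands in for a real inference call. This is not the reference provider's actual code.

```python
import asyncio


async def process_batch(requests: list[str], max_concurrent_requests_per_batch: int) -> list[str]:
    # At most this many requests hold the semaphore at any one time.
    semaphore = asyncio.Semaphore(max_concurrent_requests_per_batch)

    async def run_one(request: str) -> str:
        async with semaphore:
            await asyncio.sleep(0)  # placeholder for the real inference call
            return f"done:{request}"

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(run_one(r) for r in requests))


results = asyncio.run(process_batch(["a", "b", "c"], max_concurrent_requests_per_batch=2))
```

The same idea, applied one level up, would bound the number of batches processed at once.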

Test with -

```
$ uv run llama stack build --image-type venv --providers inference=YOU_PICK,files=inline::localfs,batches=inline::reference --run &
$ LLAMA_STACK_CONFIG=http://localhost:8321 uv run pytest tests/unit/providers/batches tests/integration/batches --text-model YOU_PICK
```
2025-08-08 08:08:08 -04:00
Jiayi Ni
9e78f2da96
docs: fix the docs for NVIDIA Inference Provider (#3055)
# What does this PR do?
Fix the NVIDIA inference docs by updating API methods, model IDs, and
embedding example.

## Test Plan
N/A
2025-08-08 11:27:55 +02:00
Varsha
e3928e6a29
feat: Implement hybrid search in Milvus (#2644)
# What does this PR do?
This PR implements hybrid search for Milvus, based on Milvus's built-in
support.

To test:
```
pytest tests/unit/providers/vector_io/remote/test_milvus.py -v -s --tb=long --disable-warnings --asyncio-mode=auto
```

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
2025-08-07 09:42:03 +02:00
slekkala1
26d3d25c87
feat: Add moderations create api (#3020)
# What does this PR do?
This PR adds an OpenAI-compatible moderations API. It is currently
implemented only for the Llama Guard safety provider.
Next steps are image support, expansion to other safety providers, and
deprecation of `run_shield`.


## Test Plan
Added 2 new tests covering safe and unsafe text prompts for the new
OpenAI-compatible moderations API:
`SAFETY_MODEL=llama-guard3:8b LLAMA_STACK_CONFIG=starter uv run pytest
-v tests/integration/safety/test_safety.py
--text-model=llama3.2:3b-instruct-fp16
--embedding-model=all-MiniLM-L6-v2 --safety-shield=ollama`
(I had an issue with the previous PR
https://github.com/meta-llama/llama-stack/pull/2994 while updating it and
accidentally closed it, so I opened a new one.)
2025-08-06 13:51:23 -07:00
Charlie Doern
0caef40e0d
fix: telemetry fixes (inference and core telemetry) (#2733)
# What does this PR do?

I found a few issues while adding new metrics for various APIs:

currently metrics are only propagated in `chat_completion` and
`completion`

since most providers use the `openai_..` routes as the default in
`llama-stack-client inference chat-completion`, metrics are currently
not working as expected.

in order to get them working the following had to be done:

1. get the completion as usual
2. use new `openai_` versions of the metric gathering functions which
use `.usage` from the `OpenAI..` response types to gather the metrics
which are already populated.
3. define a `stream_generator` which counts the tokens and computes the
metrics (only for stream=True)
4. add metrics to response
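
The streaming step could look roughly like the sketch below: an async generator that passes chunks through unchanged, counts them, and records the total once the stream is exhausted. Treating one chunk as one token, the `metrics` dict, and `fake_stream` are all assumptions for illustration, not the router's real types.

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_stream() -> AsyncIterator[str]:
    # Stand-in for a provider's streaming response.
    for chunk in ["Hello", " ", "world"]:
        yield chunk


async def stream_generator(stream: AsyncIterator[str], metrics: dict[str, int]) -> AsyncIterator[str]:
    count = 0
    async for chunk in stream:
        count += 1  # assumption: one chunk counts as one token
        yield chunk
    # Runs only after the consumer has drained the stream.
    metrics["completion_tokens"] = count


async def main() -> tuple[list[str], dict[str, int]]:
    metrics: dict[str, int] = {}
    chunks = [c async for c in stream_generator(fake_stream(), metrics)]
    return chunks, metrics


chunks, metrics = asyncio.run(main())
```

Because the count is only final once the stream ends, the metrics can be attached only after the last chunk has been yielded.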


NOTE: I could not add metrics to `openai_completion` where stream=True
because that ONLY returns an `OpenAICompletion` not an AsyncGenerator
that we can manipulate.


acquire the lock, and add event to the span as the other `_log_...`
methods do

some new output:

`llama-stack-client inference chat-completion --message hi`

<img width="2416" height="425" alt="Screenshot 2025-07-16 at 8 28 20 AM"
src="https://github.com/user-attachments/assets/ccdf1643-a184-4ddd-9641-d426c4d51326"
/>


and in the client:

<img width="763" height="319" alt="Screenshot 2025-07-16 at 8 28 32 AM"
src="https://github.com/user-attachments/assets/6bceb811-5201-47e9-9e16-8130f0d60007"
/>

these were not previously being recorded nor were they being printed to
the server due to the improper console sink handling

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-08-06 13:37:40 -07:00
Ashwin Bharambe
c252dfa3ef
fix(ci): allow tests to skip llama stack client instantiation (#3052)
2025-08-06 11:15:41 -07:00
Nathan Weinberg
e9fced773a
refactor: introduce common 'ResourceNotFoundError' exception (#3032)
# What does this PR do?
1. Introduce new base custom exception class `ResourceNotFoundError`
2. All other "not found" exception classes now inherit from
`ResourceNotFoundError`
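
A minimal sketch of what such a hierarchy could look like. The `ValueError` base, the constructor signature, and the `ShieldNotFoundError` subclass are assumptions made for illustration, not the code from this PR.

```python
class ResourceNotFoundError(ValueError):
    """Common base for 'not found' errors (base class choice is an assumption)."""

    def __init__(self, resource_name: str, resource_type: str) -> None:
        super().__init__(f"{resource_type} '{resource_name}' not found")


class ShieldNotFoundError(ResourceNotFoundError):
    """Hypothetical subclass; each resource kind would get its own."""

    def __init__(self, shield_name: str) -> None:
        super().__init__(shield_name, "Shield")


def lookup_shield(registry: dict, shield_name: str) -> str:
    """Hypothetical lookup that raises the specific error on a miss."""
    try:
        return registry[shield_name]
    except KeyError:
        raise ShieldNotFoundError(shield_name) from None


def describe_miss(registry: dict, shield_name: str) -> str:
    """Callers can catch the shared base class instead of every subclass."""
    try:
        return lookup_shield(registry, shield_name)
    except ResourceNotFoundError as exc:
        return str(exc)
```

With a shared base, a single `except ResourceNotFoundError` can cover every resource type, for example when mapping errors to an HTTP response.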

Closes #3030

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-08-06 10:22:55 -07:00
ehhuang
3e695cf320
chore: update postgres_demo with new config (#3045)
# What does this PR do?

closes https://github.com/meta-llama/llama-stack/issues/3044

## Test Plan
matches starter's template
2025-08-06 07:48:40 -07:00
Ashwin Bharambe
7f834339ba
chore(misc): make tests and starter faster (#3042)
A bunch of miscellaneous cleanup focusing on tests, but ended up
speeding up starter distro substantially.

- Pulled llama stack client init for tests into `pytest_sessionstart` so
it does not clobber output
- Profiling of that showed where we were doing lots of heavy imports
for starter, so I made them lazy
- starter now starts 20+ seconds faster on my Mac
- A few other smallish refactors for `compat_client`
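
For readers unfamiliar with the lazy-import pattern mentioned above, here is a generic sketch. It is not taken from this change: `lazy_import` and `render_table` are hypothetical names, and the stdlib `csv` and `io` modules stand in for genuinely heavy dependencies.

```python
import importlib
from types import ModuleType


def lazy_import(name: str) -> ModuleType:
    """Import a module at first use instead of at module load time.

    importlib caches modules in sys.modules, so repeated calls are cheap.
    """
    return importlib.import_module(name)


def render_table(rows) -> str:
    """Only pays the import cost if this code path is actually reached."""
    # Assumption: `csv` and `io` stand in for heavy third-party imports.
    csv = lazy_import("csv")
    io = lazy_import("io")
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()
```

Moving an import from module scope into the function that needs it shifts the cost from startup to first use, which is where a startup-time saving would come from.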
2025-08-05 14:55:05 -07:00
IAN MILLER
e12524af85
feat: create unregister shield API endpoint in Llama Stack (#2853)
# What does this PR do?

Extend the Shields Protocol and implement the capability to unregister
previously registered shields and CLI for shields management.

Closes #2581 

## Test Plan

First, test the shields API:
1. Install and start Ollama:

`ollama serve`


2. Pull Llama Guard Model in Ollama:

`ollama pull llama-guard3:8b`

3. Configure env variables:

```
export ENABLE_OLLAMA=ollama
export OLLAMA_URL=http://localhost:11434
```

4. Build Llama Stack distro:

`llama stack build --template starter --image-type venv  `

5. Start Llama Stack server:

`llama stack run starter --port 8321`

6. Check if Ollama model is available:

`curl -X GET http://localhost:8321/v1/models | jq '.data[] |
select(.provider_id=="ollama")'`

7. Register a new Shield using Ollama provider:

```
curl -X POST http://localhost:8321/v1/shields \
 -H "Content-Type: application/json" \
 -d '{
   "shield_id": "test-shield",
   "provider_id": "llama-guard",
   "provider_shield_id": "ollama/llama-guard3:8b",
   "params": {}
 }'
```

`{"identifier":"test-shield","provider_resource_id":"ollama/llama-guard3:8b","provider_id":"llama-guard","type":"shield","owner":{"principal":"","attributes":{}},"params":{}}%
`

8. Check if shield was registered:

`curl -X GET http://localhost:8321/v1/shields/test-shield`


`{"identifier":"test-shield","provider_resource_id":"ollama/llama-guard3:8b","provider_id":"llama-guard","type":"shield","owner":{"principal":"","attributes":{}},"params":{}}%
`

9. Run shield:

```
curl -X POST http://localhost:8321/v1/safety/run-shield \
  -H "Content-Type: application/json" \
  -d '{
    "shield_id": "test-shield",
    "messages": [
      {
        "role": "user",
        "content": "How can I hack into someone computer?"
      }
    ],
    "params": {}
  }'
```

`{"violation":{"violation_level":"error","user_message":"I can't answer
that. Can I help with something
else?","metadata":{"violation_type":"S2"}}}% `

10. Unregister shield:

`curl -X DELETE http://localhost:8321/v1/shields/test-shield`

`null% `

11. Verify shield was deleted:

`curl -X GET http://localhost:8321/v1/shields/test-shield`

`{"detail":"Invalid value: Shield 'test-shield' not found"}%`
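
The register, lookup, and unregister calls above could also be scripted. The following is a sketch using only the Python standard library; it mirrors the URLs and payload from the curl commands, while the helper name `build_shield_requests` is hypothetical.

```python
import json
import urllib.request

# Same base URL as the curl commands above.
BASE_URL = "http://localhost:8321/v1"


def build_shield_requests(shield_id: str, provider_id: str, provider_shield_id: str):
    """Build (register, lookup, unregister) requests without sending them."""
    body = json.dumps(
        {
            "shield_id": shield_id,
            "provider_id": provider_id,
            "provider_shield_id": provider_shield_id,
            "params": {},
        }
    ).encode("utf-8")
    register = urllib.request.Request(
        f"{BASE_URL}/shields",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    lookup = urllib.request.Request(f"{BASE_URL}/shields/{shield_id}", method="GET")
    unregister = urllib.request.Request(
        f"{BASE_URL}/shields/{shield_id}", method="DELETE"
    )
    return register, lookup, unregister
```

Each request can then be sent with `urllib.request.urlopen(request)` against a running server; after the DELETE, the lookup is expected to fail as in step 11.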

All tests passed 

```
========================================================================== 430 passed, 194 warnings in 19.54s ==========================================================================
/Users/iamiller/GitHub/llama-stack/.venv/lib/python3.12/site-packages/litellm/llms/custom_httpx/async_client_cleanup.py:78: RuntimeWarning: coroutine 'close_litellm_async_clients' was never awaited
  loop.close()
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Wrote HTML report to htmlcov-3.12/index.html

```
2025-08-05 07:33:46 -07:00
github-actions[bot]
e565b91182 build: Bump version to 0.2.17
2025-08-05 01:43:30 +00:00
Francisco Arceo
eac1e0c7d4
chore: Fixing Markdown renderer (#3038) 2025-08-04 14:16:09 -07:00
Nathan Weinberg
68b0071861
chore: standardize session not found error (#3031)
# What does this PR do?
1. Creates a new `SessionNotFoundError` class
2. Implements the new class where appropriate 

Relates to #2379

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-08-04 13:12:02 -07:00
Nathan Weinberg
05cfa213b6
chore: standardize tool group not found error (#2986)
# What does this PR do?
1. Creates a new `ToolGroupNotFoundError` class
2. Implements the new class where appropriate 

Relates to #2379

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-08-04 11:41:33 -07:00
Ashwin Bharambe
cc87995e2b
chore: rename templates to distributions (#3035)
As the title says. Distributions is in, Templates is out.

`llama stack build --template` --> `llama stack build --distro`. For
backward compatibility, the previous option is kept but results in a
warning.

Updated `server.py` to remove the "config_or_template" backward
compatibility since it has been a couple releases since that change.
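
One common way to keep an old flag working while warning about it is sketched below with `argparse`. This is not the actual `llama stack build` implementation; the parser, the `DeprecatedAlias` action, and the warning text are assumptions for illustration.

```python
import argparse
import warnings


class DeprecatedAlias(argparse.Action):
    """Store the value under the new name and warn about the old flag."""

    def __call__(self, parser, namespace, values, option_string=None):
        warnings.warn(
            f"{option_string} is deprecated; use --distro instead",
            DeprecationWarning,
            stacklevel=2,
        )
        setattr(namespace, self.dest, values)


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; only the two flag names come from the text above.
    parser = argparse.ArgumentParser(prog="build-sketch")
    parser.add_argument("--distro", dest="distro")
    parser.add_argument(
        "--template",
        dest="distro",  # the old flag writes to the same destination
        action=DeprecatedAlias,
        help=argparse.SUPPRESS,  # hide the deprecated flag from --help
    )
    return parser
```

Because both flags share one destination, the rest of the program only ever reads `args.distro`.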
2025-08-04 11:34:17 -07:00
Eran Cohen
e5b542dd8e
feat: switch to async completion in LiteLLM OpenAI mixin (#3029)
2025-08-03 12:08:56 -07:00
Varsha
3c2aee610d
refactor: Remove double filtering based on score threshold (#3019)
# What does this PR do?
Remove score_threshold based check from `OpenAIVectorStoreMixin`

Closes: https://github.com/meta-llama/llama-stack/issues/3018


## Test Plan
2025-08-02 15:57:03 -07:00
IAN MILLER
a749d5f4a4
refactor: remove Conda support from Llama Stack (#2969)
# What does this PR do?
This PR removes Conda support from Llama Stack.

Closes #2539

## Test Plan
2025-08-02 15:52:59 -07:00
ehhuang
6ac710f3b0
fix(recording): endpoint resolution (#3013)
# What does this PR do?


## Test Plan
2025-08-01 16:23:54 -07:00
Matthew Farrellee
140ee7d337
fix: sambanova inference provider (#2996)
# What does this PR do?

closes #2995 

update SambaNovaInferenceAdapter to efficiently use LiteLLMOpenAIMixin

## Test Plan

```
$ uv run pytest -s -v tests/integration/inference --stack-config inference=sambanova --text-model sambanova/Meta-Llama-3.1-8B-Instruct
...
======================== 10 passed, 84 skipped, 3 xfailed, 51 warnings in 8.14s ========================
```
2025-08-01 09:09:14 -07:00
Varsha
1f0766308d
feat: Add openAI compatible APIs to Qdrant (#2465)
# What does this PR do?
Adds support for the OpenAI-compatible vector store APIs in Qdrant.

 Closes #2463 


## Test Plan

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-08-01 00:41:34 -04:00
Francisco Arceo
33cca26154
chore: Enabling Integration tests for Weaviate (#2882)
# What does this PR do?

This PR (1) enables the files API for Weaviate and (2) enables
integration tests for Weaviate, which adds a docker container to the
github action.

This PR also handles a couple of edge cases in creating the
collection and ensures the tests all pass.

## Test Plan
CI enabled

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-31 20:29:50 -04:00
Matthew Farrellee
218c89fff1
feat: Add clear error message when API key is missing (#2992)
# What does this PR do?

Improve user experience by providing specific guidance when no API key
is available, showing both provider data header and config options with
the correct field name for each provider.

Also adds comprehensive test coverage for API key resolution scenarios.

addresses #2990 for providers using litellm openai mixin
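
A rough sketch of the kind of resolution logic and error message described above. The function name, the precedence order (per-request provider data before config), and the header name `X-LlamaStack-Provider-Data` are assumptions here, not a copy of the mixin's code.

```python
from typing import Optional

# Assumption: name of the per-request provider data header.
PROVIDER_DATA_HEADER = "X-LlamaStack-Provider-Data"


def resolve_api_key(
    provider_data: Optional[dict],
    config_api_key: Optional[str],
    key_field: str,
) -> str:
    """Return the API key, preferring per-request data over static config."""
    if provider_data and provider_data.get(key_field):
        return provider_data[key_field]
    if config_api_key:
        return config_api_key
    # Neither source has a key: tell the user both ways to supply one,
    # using the provider-specific field name.
    raise ValueError(
        "API key is not set. Either pass it in the "
        f'{PROVIDER_DATA_HEADER} header as {{"{key_field}": "<API_KEY>"}} '
        "or set it in the provider config."
    )
```

Including the provider-specific field name in the message is what saves the user from guessing which key to set.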

## Test Plan

`./scripts/unit-tests.sh
tests/unit/providers/inference/test_litellm_openai_mixin.py`
2025-07-31 16:33:16 -04:00
Nathan Weinberg
ffb6306fbd
fix: remove redundant code from unregister_vector_db (#2983)
get_vector_db() will raise an exception if a vector store won't be
returned

client handling is redundant

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-31 09:22:04 -07:00
Nehanth Narendrula
3a574ef23c
fix: remove unused DPO parameters from schema and tests (#2988)
# What does this PR do?

I removed these DPO parameters from the schema in [this
PR](https://github.com/meta-llama/llama-stack/pull/2804), but I may not
have done it correctly, since they were reintroduced in [this
commit](cb7354a9ce (diff-4e9a8cb358213d6118c4b6ec2a76d0367af06441bf0717e13a775ade75e2061dR15081))—likely
due to a pre-commit hook.

I've made the changes again, and the pre-commit hook automatically
updated the spec sheet.
2025-07-31 09:11:08 -07:00
Nehanth Narendrula
cf73146132
feat: Enable DPO training with HuggingFace inline provider (#2825)
What does this PR do?

This PR adds support for Direct Preference Optimization (DPO) training
via the existing HuggingFace inline provider. It introduces a new DPO
training recipe, config schema updates, dataset integration, and
end-to-end testing to support preference-based fine-tuning with TRL.
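
For context, the standard per-example DPO objective can be written in a few lines. This is a plain-Python sketch of the published loss, not the training recipe added in this PR (which relies on TRL); the function name and the default `beta` value are assumptions.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumption: illustrative default
) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    The margin is how much more the policy prefers the chosen response
    over the rejected one, relative to the frozen reference model.
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    z = beta * margin
    # Numerically stable form of log(1 + exp(-z)).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
```

When the policy matches the reference the margin is zero and the loss is `log 2`; the loss falls as the policy shifts probability toward the chosen response.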

Test Plan

Added integration test:

tests/integration/post_training/test_post_training.py::TestPostTraining::test_preference_optimize

Ran tests on both CPU and CUDA environments

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-43-83.ec2.internal>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-07-30 23:33:36 -07:00
Ashwin Bharambe
2665f00102
chore(rename): move llama_stack.distribution to llama_stack.core (#2975)
We would like to rename the term `template` to `distribution`. To
prepare for that, this is a precursor.

cc @leseb
2025-07-30 23:30:53 -07:00
Francisco Arceo
f3d5459647
feat(UI): adding MVP playground UI (#2828)
# What does this PR do?
I've been tinkering a little with a simple chat playground in the UI, so
I'm opening the PR with what's kind of a WIP.

If you look at the first commit, that includes the bulk of the
changes. The rest of the changed files come from installing the
`shadcn` components.

Note this is missing a lot; e.g.,
- sessions
- document upload
- audio (the shadcn components install these by default from
https://shadcn-chatbot-kit.vercel.app/docs/components/chat)

I still need to wire up a lot more to make it actually fully functional
but it does basic chat using the LS Typescript Client.

Basic demo: 

<img width="1329" height="1430" alt="Image"
src="https://github.com/user-attachments/assets/917a2096-36d4-4925-b83b-f1f2cda98698"
/>

<img width="1319" height="1424" alt="Image"
src="https://github.com/user-attachments/assets/fab1583b-1c72-4bf3-baf2-405aee13c6bb"
/>


## Test Plan

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-30 19:44:16 -07:00
Nathan Weinberg
406ca72957
fix: remove redundant code from unregister_dataset (#2971)
get_dataset() already raises an exception when no dataset is returned,
so the additional handling in the client is redundant.
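The reasoning can be illustrated with a minimal sketch. The `DatasetRegistry` class and `DatasetNotFoundError` below are hypothetical stand-ins, not the actual Llama Stack code; only the names `get_dataset` and `unregister_dataset` come from the commit message.

```python
class DatasetNotFoundError(ValueError):
    """Hypothetical error type used only for this illustration."""


class DatasetRegistry:
    """Hypothetical in-memory registry standing in for the real routing table."""

    def __init__(self) -> None:
        self._datasets: dict[str, dict] = {}

    def register_dataset(self, dataset_id: str, dataset: dict) -> None:
        self._datasets[dataset_id] = dataset

    def get_dataset(self, dataset_id: str) -> dict:
        # Raises instead of returning None when the dataset is missing.
        if dataset_id not in self._datasets:
            raise DatasetNotFoundError(f"Dataset '{dataset_id}' not found")
        return self._datasets[dataset_id]

    def unregister_dataset(self, dataset_id: str) -> None:
        # No follow-up "is None" check is needed here: get_dataset() has
        # already raised if the dataset does not exist.
        self.get_dataset(dataset_id)
        del self._datasets[dataset_id]


registry = DatasetRegistry()
registry.register_dataset("demo", {"rows": 3})
registry.unregister_dataset("demo")
```

Because the lookup fails loudly, any second check after it could never run with a missing dataset, which is why it can be dropped.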

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-30 16:40:01 -07:00
Sai Prashanth S
cb7354a9ce
docs: Add detailed docstrings to API models and update OpenAPI spec (#2889)
This PR focuses on improving the developer experience by adding
comprehensive docstrings to the API data models across the Llama Stack.
These docstrings provide detailed explanations for each model and its
fields, making the API easier to understand and use.

**Key changes:**
- **Added Docstrings:** Added reST formatted docstrings to Pydantic
models in the `llama_stack/apis/` directory. This includes models for:
  - Agents (`agents.py`)
  - Benchmarks (`benchmarks.py`)
  - Datasets (`datasets.py`)
  - Inference (`inference.py`)
  - And many other API modules.
- **OpenAPI Spec Update:** Regenerated the OpenAPI specification
(`docs/_static/llama-stack-spec.yaml` and
`docs/_static/llama-stack-spec.html`) to include the new docstrings.
This will be reflected in the API documentation, providing richer
information to users.
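As a rough picture of the docstring style described above, here is a minimal sketch of a reST-formatted docstring. It uses a standard-library dataclass as a stand-in for a Pydantic model so it runs without extra dependencies; the class name `BenchmarkExample` and its fields are hypothetical and not taken from `llama_stack/apis/`.

```python
from dataclasses import dataclass, field


@dataclass
class BenchmarkExample:
    """A benchmark resource used to evaluate a model (illustrative only).

    :param identifier: Unique identifier for the benchmark
    :param dataset_id: Identifier of the dataset the benchmark runs against
    :param scoring_functions: Names of the scoring functions to apply
    """

    identifier: str
    dataset_id: str
    scoring_functions: list[str] = field(default_factory=list)


# The docstring travels with the class, so documentation generators can
# pick up the per-field descriptions.
print(BenchmarkExample.__doc__)
```

Each `:param name:` line documents one field, which is what lets a spec generator attach a description to every property.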

**Impact:**
- Developers using the Llama Stack API will have a better understanding
of the data structures.
- The auto-generated API documentation is now more informative.

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-07-30 16:32:59 -07:00
Nathan Weinberg
cd5c6a2fcd
chore: standardize vector store not found error (#2968)
# What does this PR do?
1. Creates a new `VectorStoreNotFoundError` class
2. Implements the new class where appropriate 
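A dedicated error class of this kind might look like the sketch below. Only the name `VectorStoreNotFoundError` comes from the commit; the `ValueError` base class, the message wording, and the `get_vector_store` helper with its in-memory `_STORES` table are assumptions made for illustration.

```python
class VectorStoreNotFoundError(ValueError):
    """Raised when a vector store ID does not match any known store.

    The base class and message format here are assumptions, not the
    project's actual definition.
    """

    def __init__(self, vector_store_id: str) -> None:
        super().__init__(f"Vector store '{vector_store_id}' not found.")
        self.vector_store_id = vector_store_id


# Hypothetical in-memory table standing in for a provider's storage.
_STORES: dict[str, dict] = {"vs_123": {"name": "docs"}}


def get_vector_store(vector_store_id: str) -> dict:
    """Look up a store, raising the standardized error when it is missing."""
    try:
        return _STORES[vector_store_id]
    except KeyError:
        raise VectorStoreNotFoundError(vector_store_id) from None


try:
    get_vector_store("vs_missing")
except VectorStoreNotFoundError as err:
    print(err)
```

Using one exception type across providers lets callers catch a single class instead of matching on provider-specific error strings.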

Relates to #2379

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-30 15:19:16 -07:00