mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-10-25 17:11:12 +00:00

86 commits

| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|  | 968fc132d3 | fix(openai-compat): restrict developer/assistant/system/tool messages to text-only content (#2932) **What:** - Added OpenAIChatCompletionTextOnlyMessageContent type for text-only content validation - Modified OpenAISystemMessageParam, OpenAIAssistantMessageParam, OpenAIDeveloperMessageParam, and OpenAIToolMessageParam to use text-only content type instead of mixed content - OpenAIUserMessageParam unchanged - still accepts both text and images - Updated OpenAPI spec files to reflect text-only content restrictions in schemas closes #2894 **Why:** - Enforces OpenAI API compatibility by restricting image content to user messages only - Prevents API misuse where images might be sent in message types that don't support them - Aligns with OpenAI's actual API behavior where only user messages can contain multimodal content - Improves type safety and validation at the API boundary **Test plan:** - Added comprehensive parametrized tests covering all 5 OpenAI message types - Tests verify text string acceptance for all message types - Tests verify text list acceptance for all message types - Tests verify image rejection for system/assistant/developer/tool messages (ValidationError expected) - Tests verify user messages still accept images (backward compatibility maintained) | ||
|  | 60bb5e307e | feat(openai): add configurable base_url support with OPENAI_BASE_URL env var (#2919) # What does this PR do? - Add base_url field to OpenAIConfig with default "https://api.openai.com/v1" - Update sample_run_config to support OPENAI_BASE_URL environment variable - Modify get_base_url() to return configured base_url instead of hardcoded value - Add comprehensive test suite covering: - Default base URL behavior - Custom base URL from config - Environment variable override - Config precedence over environment variables - Client initialization with configured URL - Model availability checks using configured URL This enables users to configure custom OpenAI-compatible API endpoints via environment variables or configuration files. Closes #2910 ## Test Plan run unit tests | ||
|  | c48dcafc77 | fix: Fix unit tests CI and failing tests (#2928) # What does this PR do? - Added `set -e` to the beginning of the unit test script to ensure the script exits on failure and correctly fails the CI when tests do not pass. - Fixed all unit tests that were silently failing in the CI. - Fixed Python 3.13 unit test CI failing silently. Closes #2877 ## Test Plan - **Previously:** Unit tests passed in CI even though 11 tests failed -> [CI-run]( | ||
|  | 9e77be1f72 | chore: Fix chroma unit tests (#2896) # What does this PR do? Enable Chroma inline unit tests and fix integration tests. ## Test Plan --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> | ||
|  | 6ab5760a1b | chore(test): migrate unit tests from unittest to pytest nvidia test safety (#2793) This PR replaces unittest with pytest. Part of https://github.com/meta-llama/llama-stack/issues/2680 cc @leseb Signed-off-by: Mustafa Elbehery <melbeher@redhat.com> | ||
|  | cd8715d327 | chore: Added openai compatible vector io endpoints for chromadb (#2489) # What does this PR do? This PR implements the openai compatible endpoints for chromadb Closes #2462 ## Test Plan Ran ollama llama stack server and ran the command `pytest -sv --stack-config=http://localhost:8321 tests/integration/vector_io/test_openai_vector_stores.py --embedding-model all-MiniLM-L6-v2` 8 failed, 27 passed, 8 skipped, 1 xfailed The failed ones are regarding files api --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> Co-authored-by: sarthakdeshpande <sarthak.deshpande@engati.com> Co-authored-by: Francisco Javier Arceo <farceo@redhat.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com> | ||
|  | e1ed152779 | chore: create OpenAIMixin for inference providers with an OpenAI-compat API that need to implement openai_* methods (#2835) # What does this PR do? add an `OpenAIMixin` for use by inference providers whose remote endpoints support an OpenAI compatible API. use is demonstrated by refactoring - OpenAIInferenceAdapter - NVIDIAInferenceAdapter (adds embedding support) - LlamaCompatInferenceAdapter ## Test Plan existing unit and integration tests | ||
|  | 9e6860b9cf | fix: remove @pytest.mark.asyncio from test_get_raw_document_text.py (#2840) # What does this PR do? The pre-commit workflow was failing in the main branch and removing `@pytest.mark.asyncio` from `test_get_raw_document_text.py` fixed that. ## Test Plan | ||
|  | 89c49eb003 | feat: Allow application/yaml as mime_type (#2575) # What does this PR do? Allow application/yaml as mime_type for documents. ## Test Plan Added unit tests. | ||
|  | fe6af7dc8b | chore(test): migrate unit tests from unittest to pytest nvidia test f… (#2794) This PR replaces unittest with pytest. Part of https://github.com/meta-llama/llama-stack/issues/2680 cc @leseb Signed-off-by: Mustafa Elbehery <melbeher@redhat.com> | ||
|  | 3cdf748a8e | chore(test): migrate unit tests from unittest to pytest for nvidia datastore (#2790) This PR replaces unittest with pytest. Part of https://github.com/meta-llama/llama-stack/issues/2680 cc @leseb Signed-off-by: Mustafa Elbehery <melbeher@redhat.com> | ||
|  | 55713abe7d | chore(test): migrate unit tests from unittest to pytest nvidia test p… (#2792) This PR replaces unittest with pytest. Part of https://github.com/meta-llama/llama-stack/issues/2680 cc @leseb Signed-off-by: Mustafa Elbehery <melbeher@redhat.com> | ||
|  | d7cc38e934 | fix: remove async test markers (fix pre-commit) (#2808) # What does this PR do? some async test markers are in the codebase causing pre-commit to fail due to #2744 remove these pytest fixtures ## Test Plan pre-commit passes Signed-off-by: Charlie Doern <cdoern@redhat.com> | ||
|  | 477bcd4d09 | feat: allow dynamic model registration for nvidia inference provider (#2726) # What does this PR do? lets users register models available at https://integrate.api.nvidia.com/v1/models that aren't already in llama_stack/providers/remote/inference/nvidia/models.py ## Test Plan 1. run the nvidia distro 2. register a model from https://integrate.api.nvidia.com/v1/models that isn't already known, as of this writing nvidia/llama-3.1-nemotron-ultra-253b-v1 is a good example 3. perform inference w/ the model | ||
|  | 30be1fd8b7 | fix: SQLiteVecIndex.create(..., bank_id="test_bank.123") - bank_id with a dot - leads to sqlite3.OperationalError (#2770) (#2771) # What does this PR do? Resolves https://github.com/meta-llama/llama-stack/issues/2770. It replaces characters in SQLite table names that are not alphanumeric or underscores with underscores and quotes the table names with square brackets in SQL statements. Closes #[2770] ## Test Plan I added a ".123" suffix to the bank_id on the following line ``` index = await SQLiteVecIndex.create(dimension=embedding_dimension, db_path=db_path, bank_id="test_bank.123") ``` in tests/unit/providers/vector_io/test_sqlite_vec.py, which, without the fix in place, demonstrates the issue. | ||
|  | a3e249807b | chore: remove vision model URL workarounds and simplify client creation (#2775) The vision models are now available at the standard URL, so the workaround code has been removed. This also simplifies the codebase by eliminating the need for per-model client caching. - Remove special URL handling for meta/llama-3.2-11b/90b-vision-instruct models - Convert _get_client method to _client property for cleaner API - Remove unnecessary lru_cache decorator and functools import - Simplify client creation logic to use single base URL for all models | ||
|  | 4ae5656c2f | feat: Implement keyword search in milvus (#2231) # What does this PR do?
This PR adds the keyword search implementation for Milvus. Along with
the implementation for remote Milvus, the tests require us to start a
Milvus container locally.
In order to verify the implementation, run:
```
pytest tests/unit/providers/vector_io/remote/test_milvus.py -v -s --tb=short --disable-warnings --asyncio-mode=auto
```
You can also test the changes using the below script:
```
#!/usr/bin/env python3
import uuid

from llama_stack_client import LlamaStackClient, RAGDocument


class MilvusRAGDemo:
    def __init__(self, base_url: str = "http://localhost:8321/"):
        self.client = LlamaStackClient(base_url=base_url)
        self.vector_db_id = f"milvus_rag_demo_{uuid.uuid4().hex[:8]}"
        self.model_id = None
        self.embedding_model_id = None
        self.embedding_dimension = None
        
    def setup_models(self):
        """Select the first available embedding model and record its dimension."""
        models = self.client.models.list()
    
        # Select embedding model
        embedding_models = [m for m in models if m.model_type == "embedding"]
        if not embedding_models:
            raise ValueError("No embedding models found")
        self.embedding_model_id = embedding_models[0].identifier
        self.embedding_dimension = embedding_models[0].metadata["embedding_dimension"]
        
    def register_vector_db(self):
        print(f"Registering Milvus vector database: {self.vector_db_id}")
        
        response = self.client.vector_dbs.register(
            vector_db_id=self.vector_db_id,
            embedding_model=self.embedding_model_id,
            embedding_dimension=self.embedding_dimension,
            provider_id="milvus-remote",  # Use remote Milvus
        )
        print("Vector database registered successfully")
        return response
        
    def insert_documents(self):
        """Insert sample documents into the vector database."""
        print("\nInserting sample documents...")
        
        # Sample documents about different topics
        documents = [
            RAGDocument(
                document_id="ai_ml_basics",
                content="""
                Artificial Intelligence (AI) and Machine Learning (ML) are transforming the world.
                AI refers to the simulation of human intelligence in machines, while ML is a subset
                of AI that enables computers to learn and improve from experience without being
                explicitly programmed. Deep learning, a subset of ML, uses neural networks with
                multiple layers to process complex patterns in data.
                
                Key concepts in AI/ML include:
                - Supervised Learning: Training with labeled data
                - Unsupervised Learning: Finding patterns in unlabeled data
                - Reinforcement Learning: Learning through trial and error
                - Neural Networks: Computing systems inspired by biological brains
                """,
                mime_type="text/plain",
                metadata={"topic": "technology", "category": "ai_ml"},
            ),
        ]
        
        # Insert documents with chunking
        self.client.tool_runtime.rag_tool.insert(
            documents=documents,
            vector_db_id=self.vector_db_id,
            chunk_size_in_tokens=200,  # Smaller chunks for better granularity
        )
        print(f"Inserted {len(documents)} documents with chunking")
                
    def test_keyword_search(self):
        """Test keyword-based search using BM25."""
        
        queries = [
            "neural networks",
            "Python frameworks",
            "data cleaning",
        ]
        
        for query in queries:
            response = self.client.vector_io.query(
                vector_db_id=self.vector_db_id,
                query=query,
                params={
                    "mode": "keyword",  # Keyword search
                    "max_chunks": 3,
                    "score_threshold": 0.0,
                }
            )
            
            for i, (chunk, score) in enumerate(zip(response.chunks, response.scores)):
                print(f"  {i+1}. Score: {score:.4f}")
                print(f"     Content: {chunk.content[:100]}...")
                print(f"     Metadata: {chunk.metadata}")    
                
    def run_demo(self):       
        try:
            self.setup_models()
            self.register_vector_db()
            self.insert_documents()
            self.test_keyword_search()
        except Exception as e:
            print(f"Error during demo: {e}")
            raise


def main():
    """Main function to run the demo."""
    # Assumes a Llama Stack server is already running at the default base_url
    demo = MilvusRAGDemo()
    try:
        demo.run_demo()
    except Exception as e:
        print(f"Demo failed: {e}")


if __name__ == "__main__":
    main()
```
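The script above asks the server for `"mode": "keyword"` results, and its docstring names BM25 as the ranking function. For readers unfamiliar with BM25, here is a minimal, self-contained sketch of the Okapi BM25 score. It is not the Milvus implementation and not code from this PR: the function name `bm25_scores`, the whitespace tokenizer, and the defaults `k1=1.5`, `b=0.75` are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25.

    Illustrative only: tokens are lowercased whitespace splits, and k1/b are
    commonly used defaults, not values taken from Milvus or this PR.
    """
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Average document length; fall back to 1.0 if every document is empty.
    avg_len = sum(len(toks) for toks in tokenized) / n_docs or 1.0
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue  # terms absent from the document contribute nothing
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            length_norm = 1.0 - b + b * len(toks) / avg_len
            score += idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


sample_docs = [
    "neural networks learn complex patterns",
    "data cleaning removes noisy records",
    "supervised learning uses labeled data",
]
print(bm25_scores("neural networks", sample_docs))
```

Only the first sample document contains the query terms, so it is the only one with a non-zero score. This is the qualitative behaviour to expect from keyword mode as opposed to embedding similarity.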
---------
Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com> | ||
|  | f731f369a2 | feat: add infrastructure to allow inference model discovery (#2710) # What does this PR do?
inference providers each have a static list of supported / known models.
some also have access to a dynamic list of currently available models.
this change gives providers using the ModelRegistryHelper the ability to
combine their static and dynamic lists.
for instance, OpenAIInferenceAdapter can implement
```
   def query_available_models(self) -> list[str]:
      return [entry.model for entry in self.openai_client.models.list()]
```
to augment its static list w/ a current list from openai.
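As a rough illustration of combining a static list with a dynamic one, the sketch below merges the two while preserving order and dropping duplicates. It does not reproduce the actual `ModelRegistryHelper`: the class names `StaticAndDynamicModels` and `FakeRemoteProvider`, the `list_models` method, and the model names are hypothetical, and only the `query_available_models` hook name is taken from the description above.

```python
class StaticAndDynamicModels:
    """Hypothetical stand-in for a provider that merges static and discovered models."""

    def __init__(self, static_models: list[str]):
        self.static_models = list(static_models)

    def query_available_models(self) -> list[str]:
        # Providers override this to ask their backend; by default nothing is added.
        return []

    def list_models(self) -> list[str]:
        # Static entries come first, then newly discovered ones, without duplicates.
        merged: list[str] = []
        seen: set[str] = set()
        for model in [*self.static_models, *self.query_available_models()]:
            if model not in seen:
                seen.add(model)
                merged.append(model)
        return merged


class FakeRemoteProvider(StaticAndDynamicModels):
    def query_available_models(self) -> list[str]:
        # Stand-in for a network call that lists models on a remote endpoint.
        return ["model-b", "model-c"]


provider = FakeRemoteProvider(["model-a", "model-b"])
print(provider.list_models())
```

A provider with no dynamic source keeps the default hook and simply reports its static list.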
## Test Plan
scripts/unit-test.sh | ||
|  | a7ed86181c | fix(faiss): Delete file contents from kvstore (#2686) Remove both the metadata and content from the kvstore when a file is being removed from the vector store. Closes: #2685 Also add faiss provider to openai_vector_stores test suite --------- Signed-off-by: Derek Higgins <derekh@redhat.com> Co-authored-by: raghotham <rsm@meta.com> | ||
|  | 68e7978c88 | chore: block network access from unit tests (#2732) # What does this PR do? this blocks network access for all `tests/unit/` tests. `tests/integration/` are untouched. it also introduces an `allow_network` marker to explicitly allow network access. ## Test Plan `./scripts/unit-tests.sh` | ||
|  | 51d9fd4808 | fix: Don't cache clients for passthrough auth providers (#2728) # What does this PR do? Some of our inference providers support passthrough authentication via `x-llamastack-provider-data` header values. This fixes the providers that support passthrough auth to not cache their clients to the backend providers (mostly OpenAI client instances) so that the client connecting to Llama Stack has to provide those auth values on each and every request. ## Test Plan I added some unit tests to ensure we're not caching clients across requests for all the fixed providers in this PR. ``` uv run pytest -sv tests/unit/providers/inference/test_inference_client_caching.py ``` I also ran some of our OpenAI compatible API integration tests for each of the changed providers, just to ensure they still work. Note that these providers don't actually pass all these tests (for unrelated reasons due to quirks of the Groq and Together SaaS services), but enough of the tests passed to confirm the clients are still working as intended. ### Together ``` ENABLE_TOGETHER="together" \ uv run llama stack run llama_stack/templates/starter/run.yaml LLAMA_STACK_CONFIG=http://localhost:8321 \ uv run pytest -sv \ tests/integration/inference/test_openai_completion.py \ --text-model "together/meta-llama/Llama-3.1-8B-Instruct" ``` ### OpenAI ``` ENABLE_OPENAI="openai" \ uv run llama stack run llama_stack/templates/starter/run.yaml LLAMA_STACK_CONFIG=http://localhost:8321 \ uv run pytest -sv \ tests/integration/inference/test_openai_completion.py \ --text-model "openai/gpt-4o-mini" ``` ### Groq ``` ENABLE_GROQ="groq" \ uv run llama stack run llama_stack/templates/starter/run.yaml LLAMA_STACK_CONFIG=http://localhost:8321 \ uv run pytest -sv \ tests/integration/inference/test_openai_completion.py \ --text-model "groq/meta-llama/Llama-3.1-8B-Instruct" ``` --------- Signed-off-by: Ben Browning <bbrownin@redhat.com> | ||
|  | 30b2e6a495 | chore: default to pytest asyncio-mode=auto (#2730) # What does this PR do? previously, developers who ran `./scripts/unit-tests.sh` would get `asyncio-mode=auto`, which meant `@pytest.mark.asyncio` and `@pytest_asyncio.fixture` were redundant. developers who ran `pytest` directly would get pytest's default (strict mode) and would run into errors leading them to add `@pytest.mark.asyncio` / `@pytest_asyncio.fixture` to their code. with this change - - `asyncio_mode=auto` is included in `pyproject.toml` making behavior consistent for all invocations of pytest - removes all redundant `@pytest_asyncio.fixture` and `@pytest.mark.asyncio` - for good measure, requires `pytest>=8.4` and `pytest-asyncio>=1.0` ## Test Plan - `./scripts/unit-tests.sh` - `uv run pytest tests/unit` | ||
|  | 6a6b66ae4f | chore: Adding unit tests for OpenAI vector stores and migrating SQLite-vec registry to kvstore (#2665) # What does this PR do? This PR refactors the VectorIO backend logic for `sqlite-vec` and adds unit tests and fixtures to make it easy to test both `sqlite-vec` and `milvus`. Key changes: - `sqlite-vec` migrated to `kvstore` registry - added in-memory cache for sqlite-vec to be consistent with `milvus` - default fixtures moved to `conftest.py` - removed redundant tests from `sqlite-vec` - made `test_vector_io_openai_vector_stores.py` more easily extensible ## Test Plan Unit tests added testing inline providers. --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> | ||
|  | 83c89265e0 | chore: Adding unit tests for Milvus and OpenAI compatibility (#2640) # What does this PR do? - Enabling Unit tests for Milvus to start to test OpenAI compatibility and fixing a few bugs. - Also fixed an inconsistency in the Milvus config between remote and inline. - Added pymilvus to extras for testing in CI I'm going to refactor this later to include the other inline providers so that we can catch issues sooner. I have another PR where I've been testing to find other bugs in the implementation (and required changes drafted here: https://github.com/meta-llama/llama-stack/pull/2617). ## Test Plan --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> | ||
|  | f77d4d91f5 | fix: handle encoding errors when adding files to vector store (#2574) 
		
- Add try-catch block around data.decode() to handle UnicodeDecodeError - Implement UTF-8 fallback when detected encoding fails - Return empty string when both encodings fail - add unit tests Fixes #2572: UnicodeDecodeError when uploading files with problematic encodings Signed-off-by: Derek Higgins <derekh@redhat.com> | ||
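The fallback order described in this commit (detected encoding first, then UTF-8, then an empty string) can be sketched roughly as below. This is an illustration only, not the code from the PR: `decode_with_fallback` and its `detected_encoding` parameter are hypothetical names, and the detected encoding is assumed to come from an earlier detection step.

```python
from __future__ import annotations


def decode_with_fallback(data: bytes, detected_encoding: str | None) -> str:
    """Try the detected encoding, then UTF-8; return "" if both fail.

    Hypothetical helper for illustration; the name and signature are assumptions.
    """
    for encoding in (detected_encoding, "utf-8"):
        if not encoding:
            continue
        try:
            return data.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # UnicodeDecodeError: the bytes are not valid in this encoding.
            # LookupError: the encoding name itself is unknown.
            continue
    return ""
```

Catching `LookupError` as well is a design choice of this sketch, so that an unrecognized encoding name falls through to UTF-8 instead of aborting the upload.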
|  | ea80ea63ac | chore: Updating chunk id generation to ensure uniqueness (#2618) # What does this PR do? This handles an edge case in `generate_chunk_id` where the combination of `document_id` and `chunk_text` is not unique. Adding the window location ensures uniqueness. ## Test Plan Added unit test Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> | ||
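To see why the window location matters, here is a small sketch of a deterministic ID derived from the document ID, the chunk text, and an optional window. It is not the project's `generate_chunk_id`: the function name `sketch_chunk_id`, the use of `uuid.uuid5`, and the `"start-end"` window format are all assumptions made for illustration.

```python
from __future__ import annotations

import uuid


def sketch_chunk_id(document_id: str, chunk_text: str, chunk_window: str | None = None) -> str:
    """Derive a stable ID; including the window separates repeated text in one document."""
    key = f"{document_id}:{chunk_text}"
    if chunk_window is not None:
        key += f":{chunk_window}"
    # uuid5 is deterministic: the same key always yields the same UUID.
    return str(uuid.uuid5(uuid.NAMESPACE_OID, key))


# Two chunks with identical text in the same document collide without the window...
same_a = sketch_chunk_id("doc-1", "lorem ipsum")
same_b = sketch_chunk_id("doc-1", "lorem ipsum")
# ...and become distinct once the window location is part of the key.
win_a = sketch_chunk_id("doc-1", "lorem ipsum", "0-50")
win_b = sketch_chunk_id("doc-1", "lorem ipsum", "50-100")
```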
|  | f4950f4ef0 | fix: AccessDeniedError leads to HTTP 500 instead of error 403 (#2595) Resolves access control error visibility issues where 500 errors were
returned instead of proper 403 responses with actionable error messages.
Enhance AccessDeniedError with detailed context and improve exception
handling:
• Enhanced AccessDeniedError class to include user, action, and resource
context
  - Added constructor parameters for action, resource, and user
  - Generate detailed error messages showing user principal, attributes,
    and attempted resource
  - Backward compatible with existing usage (falls back to generic
    message)
• Updated exception handling in server.py
  - Import AccessDeniedError from access_control module
  - Return proper 403 status codes with detailed error messages
  - Separate handling for PermissionError (generic) vs AccessDeniedError
    (detailed)
• Enhanced error context at raise sites
  - Updated routing_tables/common.py to pass action, resource, and user
    context
  - Updated agents persistence to include context in access denied errors
  - Provides better debugging information for access control issues
• Added comprehensive unit tests
  - Created tests/unit/server/test_server.py with 13 test cases
  - Covers AccessDeniedError with and without context
  - Tests all exception types (ValidationError, BadRequestError,
    AuthenticationRequiredError, etc.)
  - Validates proper HTTP status codes and error message formats
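The behaviour listed above can be pictured with a short sketch: an exception that carries optional context and falls back to a generic message, plus a mapping from exception type to HTTP status. This is not the code from the PR. The message wording, the helper name `status_for`, and the use of plain strings for user and resource are assumptions made for illustration.

```python
class AccessDeniedError(RuntimeError):
    """Sketch: optional action/resource/user context with a generic fallback message."""

    def __init__(self, action=None, resource=None, user=None):
        self.action, self.resource, self.user = action, resource, user
        if action and resource and user:
            message = f"User '{user}' cannot perform action '{action}' on resource '{resource}'"
        else:
            # Backward-compatible path: callers that pass no context still work.
            message = "Insufficient permissions"
        super().__init__(message)


def status_for(exc: Exception) -> tuple[int, str]:
    """Hypothetical helper: translate an exception into (HTTP status, detail)."""
    if isinstance(exc, AccessDeniedError):
        return 403, f"Permission denied: {exc}"  # detailed message
    if isinstance(exc, PermissionError):
        return 403, "Permission denied"  # generic message
    return 500, "Internal server error"
```

For example, `status_for(AccessDeniedError("create", "my_demo_vector_db", "alice"))` returns a 403 with a message naming the user, action, and resource, while an unrelated exception still maps to 500.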
## Test Plan
```
server:
  port: 8321
    access_policy:
    - permit:
        principal: admin
        actions: [create, read, delete]
        when: user with admin in groups
    - permit:
        actions: [read]
        when: user with system:authenticated in roles
```
then:
```
curl --request POST --url http://localhost:8321/v1/vector-dbs \
  --header "Authorization: Bearer your-bearer" \
  --data '{
    "vector_db_id": "my_demo_vector_db",
    "embedding_model": "ibm-granite/granite-embedding-125m-english",
    "embedding_dimension": 768,
    "provider_id": "milvus"
  }'
 
```
Depending on whether the user is in the admin group, you should get an
`AccessDeniedError`. Before this PR, this led to an HTTP 500 response
and a `Traceback` in the logs.
After this PR, the logs show a simpler error (unless DEBUG logging is set)
and the server returns 403 Forbidden.
---------
Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com> | ||
|  | ac5fd57387 | chore: remove nested imports (#2515) # What does this PR do? * Given that our API packages use "import *" in `__init__.py`, we don't need to write `from llama_stack.apis.models.models`; `from llama_stack.apis.models` is enough. The decision to use `import *` is debatable and should probably be revisited at some point. * Remove unneeded Ruff F401 rule * Consolidate Ruff F403 rule in the pyproject Signed-off-by: Sébastien Han <seb@redhat.com> | ||
|  | 2d9fd041eb | fix: annotations list and web_search_preview in Responses (#2520) # What does this PR do?
These are a couple of fixes to get an example LangChain app working with
our OpenAI Responses API implementation.
The Responses API spec requires an annotations array in
`output[*].content[*].annotations` and we were not providing one. So,
this adds that as an empty list, even though we don't do anything to
populate it yet. This prevents an error from client libraries like
LangChain that expect this field to always exist, even if it is an empty list.
The other fix is that `web_search_preview` is a valid name for the web
search tool in the Responses API, but we only responded to `web_search`
or `web_search_preview_2025_03_11`.
## Test Plan
The existing Responses unit tests were expanded to test these cases,
via:
```
pytest -sv tests/unit/providers/agents/meta_reference/test_openai_responses.py
```
The existing test_openai_responses.py integration tests still pass with
this change, tested as below with Fireworks:
```
uv run llama stack run llama_stack/templates/starter/run.yaml
LLAMA_STACK_CONFIG=http://localhost:8321 \
uv run pytest -sv tests/integration/agents/test_openai_responses.py \
--text-model accounts/fireworks/models/llama4-scout-instruct-basic
```
Lastly, this example LangChain app now works with Llama Stack (tested
with Ollama in the starter template in this case). This LangChain code
is using the example snippets for using Responses API at
https://python.langchain.com/docs/integrations/chat/openai/#responses-api
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:8321/v1/openai/v1",
    api_key="fake",
    model="ollama/meta-llama/Llama-3.2-3B-Instruct",
)
tool = {"type": "web_search_preview"}
llm_with_tools = llm.bind_tools([tool])
response = llm_with_tools.invoke("What was a positive news story from today?")
print(response.content)
```
Signed-off-by: Ben Browning <bbrownin@redhat.com> | ||
|  | 82f13fe83e | feat: Add ChunkMetadata to Chunk (#2497) # What does this PR do?
Adding `ChunkMetadata` so we can properly delete embeddings later.
More specifically, this PR refactors and extends the chunk metadata
handling in the vector database and introduces a distinction between
metadata used for model context and backend-only metadata required for
chunk management, storage, and retrieval. It also improves chunk ID
generation and propagation throughout the stack, enhances test coverage,
and adds new utility modules.
```python
from pydantic import BaseModel


class ChunkMetadata(BaseModel):
    """
    `ChunkMetadata` is backend metadata for a `Chunk` that is used to store additional information about the chunk that
    will NOT be inserted into the context during inference, but is required for backend functionality.
    Use `metadata` in `Chunk` for metadata that will be used during inference.
    """
    document_id: str | None = None
    chunk_id: str | None = None
    source: str | None = None
    created_timestamp: int | None = None
    updated_timestamp: int | None = None
    chunk_window: str | None = None
    chunk_tokenizer: str | None = None
    chunk_embedding_model: str | None = None
    chunk_embedding_dimension: int | None = None
    content_token_count: int | None = None
    metadata_token_count: int | None = None
```
Eventually we can migrate the document_id out of the `metadata` field.
I've introduced the changes so that `ChunkMetadata` is backwards
compatible with `metadata`.
Closes https://github.com/meta-llama/llama-stack/issues/2501 
## Test Plan
Added unit tests
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> | ||
|  | d3b60507d7 | feat: support auth attributes in inference/responses stores (#2389) # What does this PR do? Inference/Response stores now store user attributes when inserting, and respect them when fetching. ## Test Plan pytest tests/unit/utils/test_sqlstore.py | ||
|  | a2f054607d | fix: cancel scheduler tasks on shutdown (#2130) # What does this PR do?
Scheduler: cancel tasks on shutdown.
Otherwise, running tasks do not exit until they complete on their own,
which means the process can't be shut down cleanly (only SIGKILL stops
it).
Ideally, we would let tasks know that shutdown is imminent and give them
some time to finish; in the absence of such a mechanism, it's better to
cancel than linger forever.
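The cancel-on-shutdown idea can be sketched with plain `asyncio`. This is a minimal illustration under assumptions, not the scheduler code from the PR: `cancel_all` and `demo` are hypothetical names, and the long-running job is simulated with a sleep.

```python
import asyncio


async def cancel_all(tasks: list[asyncio.Task]) -> None:
    """Request cancellation of every task, then wait until each has actually stopped."""
    for task in tasks:
        task.cancel()
    # return_exceptions=True collects each CancelledError as a result
    # instead of re-raising it into the shutdown path.
    await asyncio.gather(*tasks, return_exceptions=True)


async def demo() -> list[bool]:
    async def long_running() -> None:
        await asyncio.sleep(3600)  # stands in for a training job

    tasks = [asyncio.create_task(long_running()) for _ in range(3)]
    await asyncio.sleep(0)  # give the tasks a chance to start
    await cancel_all(tasks)
    return [task.cancelled() for task in tasks]


cancelled_flags = asyncio.run(demo())
```

Without the `cancel()` calls, `asyncio.run` would wait the full hour for each task, which mirrors the hang described above.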
## Test Plan
Start a long-running task (e.g. torchtune or external kfp-provider
training).
Press Ctrl-C in the TTY running the process. Confirm it exits in a reasonable time.
```
^CINFO:     Shutting down
INFO:     Waiting for application shutdown.
13:32:26.187 - INFO - Shutting down
13:32:26.187 - INFO - Shutting down DatasetsRoutingTable
13:32:26.187 - INFO - Shutting down DatasetIORouter
13:32:26.187 - INFO - Shutting down TorchtuneKFPPostTrainingImpl
    Traceback (most recent call last):
      File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run
        return self._loop.run_until_complete(task)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
        return future.result()
               ^^^^^^^^^^^^^^^
    asyncio.exceptions.CancelledError
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
      File "<frozen runpy>", line 198, in _run_module_as_main
      File "<frozen runpy>", line 88, in _run_code
      File "/Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/executor_main.py", line 109, in <module>
        executor_main()
      File "/Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/executor_main.py", line 101, in executor_main
        output_file = executor.execute()
                      ^^^^^^^^^^^^^^^^^^
      File "/Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/executor.py", line 361, in execute
        result = self.func(**func_kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^
      File "/var/folders/45/1q1rx6cn7jbcn2ty852w0g_r0000gn/T/tmp.RKpPrvTWDD/ephemeral_component.py", line 118, in component
        asyncio.run(recipe.setup())
      File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run
        return runner.run(main)
               ^^^^^^^^^^^^^^^^
      File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 123, in run
        raise KeyboardInterrupt()
    KeyboardInterrupt
13:32:31.219 - ERROR - Task 'component' finished with status FAILURE
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
INFO     2025-05-09 13:32:31,221 llama_stack.providers.utils.scheduler:221 scheduler: Job
         test-jobc3c2e1e4-859c-4852-a41d-ef29e55e3efa: Pipeline 'test-jobc3c2e1e4-859c-4852-a41d-ef29e55e3efa'
         finished with status FAILURE. Inner task failed: 'component'.
ERROR    2025-05-09 13:32:31,223 llama_stack_provider_kfp_trainer.scheduler:54 scheduler: Job
         test-jobc3c2e1e4-859c-4852-a41d-ef29e55e3efa failed.
         ╭───────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮
         │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/src/llama_stack_provider_kfp_trainer/scheduler.py:45   │
         │ in do                                                                                                       │
         │                                                                                                             │
         │    42 │   │   │                                                                                             │
         │    43 │   │   │   job.status = JobStatus.running                                                            │
         │    44 │   │   │   try:                                                                                      │
         │ ❱  45 │   │   │   │   artifacts = self._to_artifacts(job.handler().output)                                  │
         │    46 │   │   │   │   for artifact in artifacts:                                                            │
         │    47 │   │   │   │   │   on_artifact_collected_cb(artifact)                                                │
         │    48                                                                                                       │
         │                                                                                                             │
         │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/base_compon │
         │ ent.py:101 in __call__                                                                                      │
         │                                                                                                             │
         │    98 │   │   │   │   f'{self.name}() missing {len(missing_arguments)} required '                           │
         │    99 │   │   │   │   f'{argument_or_arguments}: {arguments}.')                                             │
         │   100 │   │                                                                                                 │
         │ ❱ 101 │   │   return pipeline_task.PipelineTask(                                                            │
         │   102 │   │   │   component_spec=self.component_spec,                                                       │
         │   103 │   │   │   args=task_inputs,                                                                         │
         │   104 │   │   │   execute_locally=pipeline_context.Pipeline.get_default_pipeline() is                       │
         │                                                                                                             │
         │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/pipeline_ta │
         │ sk.py:187 in __init__                                                                                       │
         │                                                                                                             │
         │   184 │   │   ])                                                                                            │
         │   185 │   │                                                                                                 │
         │   186 │   │   if execute_locally:                                                                           │
         │ ❱ 187 │   │   │   self._execute_locally(args=args)                                                          │
         │   188 │                                                                                                     │
         │   189 │   def _execute_locally(self, args: Dict[str, Any]) -> None:                                         │
         │   190 │   │   """Execute the pipeline task locally.                                                         │
         │                                                                                                             │
         │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/pipeline_ta │
         │ sk.py:197 in _execute_locally                                                                               │
         │                                                                                                             │
         │   194 │   │   from kfp.local import task_dispatcher                                                         │
         │   195 │   │                                                                                                 │
         │   196 │   │   if self.pipeline_spec is not None:                                                            │
         │ ❱ 197 │   │   │   self._outputs = pipeline_orchestrator.run_local_pipeline(                                 │
         │   198 │   │   │   │   pipeline_spec=self.pipeline_spec,                                                     │
         │   199 │   │   │   │   arguments=args,                                                                       │
         │   200 │   │   │   )                                                                                         │
         │                                                                                                             │
         │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/local/pipeline_ │
         │ orchestrator.py:43 in run_local_pipeline                                                                    │
         │                                                                                                             │
         │    40 │                                                                                                     │
         │    41 │   # validate and access all global state in this function, not downstream                           │
         │    42 │   config.LocalExecutionConfig.validate()                                                            │
         │ ❱  43 │   return _run_local_pipeline_implementation(                                                        │
         │    44 │   │   pipeline_spec=pipeline_spec,                                                                  │
         │    45 │   │   arguments=arguments,                                                                          │
         │    46 │   │   raise_on_error=config.LocalExecutionConfig.instance.raise_on_error,                           │
         │                                                                                                             │
         │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/local/pipeline_ │
         │ orchestrator.py:108 in _run_local_pipeline_implementation                                                   │
         │                                                                                                             │
         │   105 │   │   │   )                                                                                         │
         │   106 │   │   return outputs                                                                                │
         │   107 │   elif dag_status == status.Status.FAILURE:                                                         │
         │ ❱ 108 │   │   log_and_maybe_raise_for_failure(                                                              │
         │   109 │   │   │   pipeline_name=pipeline_name,                                                              │
         │   110 │   │   │   fail_stack=fail_stack,                                                                    │
         │   111 │   │   │   raise_on_error=raise_on_error,                                                            │
         │                                                                                                             │
         │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/local/pipeline_ │
         │ orchestrator.py:137 in log_and_maybe_raise_for_failure                                                      │
         │                                                                                                             │
         │   134 │   │   logging_utils.format_task_name(task_name) for task_name in fail_stack)                        │
         │   135 │   msg = f'Pipeline {pipeline_name_with_color} finished with status                                  │
         │       {status_with_color}. Inner task failed: {task_chain_with_color}.'                                     │
         │   136 │   if raise_on_error:                                                                                │
         │ ❱ 137 │   │   raise RuntimeError(msg)                                                                       │
         │   138 │   with logging_utils.local_logger_context():                                                        │
         │   139 │   │   logging.error(msg)                                                                            │
         │   140                                                                                                       │
         ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
         RuntimeError: Pipeline 'test-jobc3c2e1e4-859c-4852-a41d-ef29e55e3efa' finished with status
         FAILURE. Inner task failed: 'component'.
INFO     2025-05-09 13:32:31,266 llama_stack.distribution.server.server:136 server: Shutting down
         DistributionInspectImpl
INFO     2025-05-09 13:32:31,266 llama_stack.distribution.server.server:136 server: Shutting down ProviderImpl
INFO:     Application shutdown complete.
INFO:     Finished server process [26648]
```
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com> | ||
|  | db2cd9e8f3 | feat: support filters in file search (#2472) # What does this PR do? Move to use vector_stores.search for file search tool in Responses, which supports filters. closes #2435 ## Test Plan Added e2e test with filters. myenv ❯ llama stack run llama_stack/templates/fireworks/run.yaml pytest -sv tests/verifications/openai_api/test_responses.py \ -k 'file_search and filters' \ --base-url=http://localhost:8321/v1/openai/v1 \ --model=meta-llama/Llama-3.3-70B-Instruct | ||
|  | 90d03552d4 | feat: To add health check for faiss inline vector_io provider (#2319) 
		
# What does this PR do? Adds a health check for the faiss inline vector_io provider. I tried adding `async def health(self) -> HealthResponse:` as in the inference provider, but it didn't work for the `inline->vector_io->faiss` provider. From the debug logs, I found the underlying issue: health responses are stored with the API name as the key, not as a nested dictionary keyed by provider ID. This means that all providers of the same API type (e.g., "vector_io") share the same health response, and only the last one processed is visible in the API response. I've created a patch file that fixes this issue by: - Storing the original get_providers_health method - Creating a patched version that correctly maps health responses to providers - Applying the patch to the `ProviderImpl` class Not an expert, so please let me know if there is another workaround that would let me update the health status directly from `faiss.py`. ## Test Plan Added unit tests to test the provider patch implementation in the PR. Adding a screenshot with the FAISS inline vector_io health status as "OK"  | ||
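The keying problem described in this commit can be shown with plain dictionaries. The sketch below is illustrative only: the report tuples, provider IDs, and function names are made up, and it does not reproduce the actual `get_providers_health` implementation.

```python
from collections import defaultdict

# Hypothetical (api, provider_id, status) reports.
reports = [
    ("vector_io", "faiss", "OK"),
    ("vector_io", "sqlite-vec", "Error"),
    ("inference", "ollama", "OK"),
]


def health_by_api(items):
    """Keyed by API name only: providers of the same API overwrite each other."""
    flat = {}
    for api, _provider_id, status in items:
        flat[api] = status
    return flat


def health_by_api_and_provider(items):
    """Keyed by API name, then provider ID: every provider keeps its own status."""
    nested = defaultdict(dict)
    for api, provider_id, status in items:
        nested[api][provider_id] = status
    return {api: dict(providers) for api, providers in nested.items()}
```

With the flat layout, the "faiss" status is lost because "sqlite-vec" is processed later under the same "vector_io" key; the nested layout keeps both.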
|  | 2e8054bede | feat: Implement hybrid search in SQLite-vec (#2312) 
		
# What does this PR do?
Add support for hybrid search mode in the SQLite-vec provider, which combines
keyword and vector search for better results. The implementation:
- Adds hybrid search mode as a new option alongside vector and keyword search
- Implements a `query_hybrid` method in `SQLiteVecIndex` that:
  - First performs keyword search to get candidate matches
  - Then applies vector similarity search on those candidates
- Updates documentation to reflect the new search mode
This change improves search quality by leveraging both semantic similarity
and keyword matching, while maintaining backward compatibility with the
existing vector and keyword search modes.
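The two-stage flow described above (keyword candidates first, then vector similarity over those candidates) can be sketched roughly as follows. This is an illustrative in-memory version, not the `SQLiteVecIndex.query_hybrid` code: the chunk layout, the term-overlap keyword filter, and the cosine scoring are all assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def query_hybrid(chunks, query_str, query_embedding, k, score_threshold=0.0):
    """Hypothetical hybrid query: keyword filter, then vector ranking."""
    # Stage 1: keyword search. Keep chunks sharing at least one query term.
    terms = set(query_str.lower().split())
    candidates = [c for c in chunks if terms & set(c["content"].lower().split())]
    # Stage 2: vector similarity, computed only over the keyword candidates.
    scored = [(cosine(c["embedding"], query_embedding), c["content"]) for c in candidates]
    scored = [pair for pair in scored if pair[0] >= score_threshold]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy data, invented for the example.
chunks = [
    {"content": "red apple", "embedding": [1.0, 0.0]},
    {"content": "green apple", "embedding": [0.0, 1.0]},
    {"content": "blue sky", "embedding": [1.0, 0.0]},
]
```

Note that "blue sky" is never scored for the query "apple", even though its embedding matches the query vector exactly: in this scheme a chunk with no keyword match cannot be returned.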
## Test Plan
```
pytest tests/unit/providers/vector_io/test_sqlite_vec.py -v -s --tb=short
/Users/vnarsing/miniconda3/envs/stack-client/lib/python3.10/site-packages/pytest_asyncio/plugin.py:217: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"
  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
=============================================================================================== test session starts ===============================================================================================
platform darwin -- Python 3.10.16, pytest-8.3.5, pluggy-1.5.0 -- /Users/vnarsing/miniconda3/envs/stack-client/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.10.16', 'Platform': 'macOS-14.7.6-arm64-arm-64bit', 'Packages': {'pytest': '8.3.5', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'anyio': '4.8.0', 'asyncio': '0.26.0', 'nbval': '0.11.0', 'cov': '6.1.1'}}
rootdir: /Users/vnarsing/go/src/github/meta-llama/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, anyio-4.8.0, asyncio-0.26.0, nbval-0.11.0, cov-6.1.1
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 10 items                                                                                                                                                                                                
tests/unit/providers/vector_io/test_sqlite_vec.py::test_add_chunks PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_vector PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_full_text_search PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_full_text_search_k_greater_than_results PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_chunk_id_conflict PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_generate_chunk_id PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_no_keyword_matches PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_score_threshold PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_different_embedding PASSED
```
---------
Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com> | ||
|  | a34cef925b | fix(faiss): handle case where distance is 0 by setting d to minimum positive… (#2387) # What does this PR do?
Adds try-catch to faiss `query_vector` function for when the distance between the query embedding and an embedding within the vector db is 0 (identical vectors). Catches `ZeroDivisionError` and then appends `(1.0 / sys.float_info.min)` to `scores` to represent maximum similarity.
Closes [#2381]
## Test Plan
Checkout this PR
Execute this code and there will no longer be a `ZeroDivisionError` exception
```
from llama_stack_client import LlamaStackClient
base_url = "http://localhost:8321"
client = LlamaStackClient(base_url=base_url)
models = client.models.list()
embedding_model = (
    em := next(m for m in models if m.model_type == "embedding")
).identifier
embedding_dimension = 384
_ = client.vector_dbs.register(
    vector_db_id="foo_db",
    embedding_model=embedding_model,
    embedding_dimension=embedding_dimension,
    provider_id="faiss",
)
chunk = {
    "content": "foo",
    "mime_type": "text/plain",
    "metadata": {
        "document_id": "foo-id"
    }
}
client.vector_io.insert(vector_db_id="foo_db", chunks=[chunk])
client.vector_io.query(vector_db_id="foo_db", query="foo")
```
### Running unit tests
`uv run pytest tests/unit/rag/test_rag_query.py -v`
---------
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Ben Browning <bbrownin@redhat.com> | ||
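The zero-distance handling described in the faiss fix can be illustrated in isolation. This is a minimal sketch that assumes scores are computed as the reciprocal of the distance; the helper name `distances_to_scores` is hypothetical and is not the provider's actual function.

```python
import sys


def distances_to_scores(distances):
    """Convert distances to similarity scores as 1/d, guarding against d == 0."""
    scores = []
    for d in distances:
        try:
            scores.append(1.0 / float(d))
        except ZeroDivisionError:
            # Identical vectors: use the largest score obtainable from the
            # smallest positive normalized float, as the PR text describes.
            scores.append(1.0 / sys.float_info.min)
    return scores
```

The conversion to a Python `float` matters in this sketch: dividing by a NumPy float zero typically yields `inf` with a warning rather than raising `ZeroDivisionError`.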
|  | 33ecefd284 | feat: To add health status check for remote VLLM (#2303) # What does this PR do?
Adds a health status check for the remote vLLM provider.
## Test Plan
The PR includes a unit test for the added health check implementation. | ||
|  | 3251b44d8a | refactor: unify stream and non-stream impls for responses (#2388) The non-streaming version is just a small layer on top of the streaming version - just pluck off the final `response.completed` event and return that as the response!
This PR also includes a couple other changes which I ended up making while working on it on a flight:
- changes to `ollama` so it does not pull embedding models unconditionally
- a small fix to library client to make the stream and non-stream cases a bit more symmetric | ||
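The "small layer on top of the streaming version" idea can be sketched as below. This is a toy illustration, not the Llama Stack implementation: both function names and the event payloads are assumptions, and only the `response.completed` event type comes from the commit message.

```python
import asyncio
from typing import AsyncIterator


async def create_response_streaming(prompt: str) -> AsyncIterator[dict]:
    """Hypothetical streaming implementation: yields events, ending with response.completed."""
    yield {"type": "response.created"}
    for word in prompt.split():
        yield {"type": "response.output_text.delta", "delta": word}
    yield {"type": "response.completed", "response": {"output_text": prompt}}


async def create_response(prompt: str) -> dict:
    """Non-streaming variant: drain the stream and return the final completed response."""
    final = None
    async for event in create_response_streaming(prompt):
        if event["type"] == "response.completed":
            final = event["response"]
    if final is None:
        raise RuntimeError("stream ended without a response.completed event")
    return final


result = asyncio.run(create_response("hello world"))
```

With this shape there is only one code path that actually produces output, so the two modes cannot drift apart.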
|  | 7c1998db25 | feat: fine grained access control policy (#2264) This allows a set of rules to be defined for determining access to
resources. The rules are (loosely) based on the Cedar policy format.
A rule defines a list of actions either to permit or to forbid. It may
specify a principal or a resource that must match for the rule to take
effect. It may also specify a condition, either a 'when' or an 'unless',
with additional constraints as to where the rule applies.
A list of rules is held for each type to be protected and tried in order
to find a match. If a match is found, the request is permitted or
forbidden depending on the type of rule. If no match is found, the
request is denied. If no rules are specified for a given type, a rule
that allows any action as long as the resource attributes match the user
attributes is added (i.e. the previous behaviour is the default).
Some examples in yaml:
```
    model:
    - permit:
      principal: user-1
      actions: [create, read, delete]
      comment: user-1 has full access to all models
    - permit:
      principal: user-2
      actions: [read]
      resource: model-1
      comment: user-2 has read access to model-1 only
    - permit:
      actions: [read]
      when:
        user_in: resource.namespaces
      comment: any user has read access to models with matching attributes
    vector_db:
    - forbid:
      actions: [create, read, delete]
      unless:
        user_in: role::admin
      comment: only user with admin role can use vector_db resources
```
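The evaluation order described above (rules tried in order, first match decides, no match means deny) can be sketched as follows. This is a simplified model for illustration only: the `Rule` class, the `matches` and `is_allowed` helpers, and the use of Python callables for `when`/`unless` conditions are assumptions, not the PR's data model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Rule:
    effect: str  # "permit" or "forbid"
    actions: List[str]
    principal: Optional[str] = None
    resource: Optional[str] = None
    when: Optional[Callable[[dict, dict], bool]] = None
    unless: Optional[Callable[[dict, dict], bool]] = None


def matches(rule: Rule, user: dict, action: str, resource: dict) -> bool:
    """A rule applies only if the action and every constraint it specifies match."""
    if action not in rule.actions:
        return False
    if rule.principal is not None and rule.principal != user["name"]:
        return False
    if rule.resource is not None and rule.resource != resource["id"]:
        return False
    if rule.when is not None and not rule.when(user, resource):
        return False
    if rule.unless is not None and rule.unless(user, resource):
        return False
    return True


def is_allowed(rules: List[Rule], user: dict, action: str, resource: dict) -> bool:
    """Try rules in order; the first match decides, and no match means deny."""
    for rule in rules:
        if matches(rule, user, action, resource):
            return rule.effect == "permit"
    return False


# Rules loosely mirroring the first two `model` entries in the YAML example.
model_rules = [
    Rule("permit", ["create", "read", "delete"], principal="user-1"),
    Rule("permit", ["read"], principal="user-2", resource="model-1"),
]
```

Because evaluation stops at the first match, rule order is significant: a broad `forbid` placed before a narrower `permit` shadows it.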
---------
Signed-off-by: Gordon Sim <gsim@redhat.com> | ||
|  | 8bee2954be | feat: Structured output for Responses API (#2324) # What does this PR do?
This adds the missing `text` parameter to the Responses API, which is how users control structured outputs. All we do with that parameter is map it to the corresponding chat completion `response_format`.
## Test Plan
The new unit tests exercise the various permutations allowed for this property, while a couple of new verification tests actually use it for real to verify the model outputs are following the format as expected.
Unit tests:
`python -m pytest -s -v tests/unit/providers/agents/meta_reference/test_openai_responses.py`
Verification tests:
```
llama stack run llama_stack/templates/together/run.yaml
pytest -s -vv 'tests/verifications/openai_api/test_responses.py' \
  --base-url=http://localhost:8321/v1/openai/v1 \
  --model meta-llama/Llama-4-Scout-17B-16E-Instruct
```
Note that the verification tests can only be run with a real Llama Stack server (as opposed to using the library client via `--provider=stack:together`) because the Llama Stack python client is not yet updated to accept this text field.
Signed-off-by: Ben Browning <bbrownin@redhat.com> | ||
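The mapping from the Responses API `text` parameter to a chat completion `response_format` might look roughly like this. It is a sketch over plain dictionaries, assuming the publicly documented OpenAI request shapes; the function name `text_to_response_format` is hypothetical and the PR's actual code may differ.

```python
def text_to_response_format(text):
    """Map a Responses-style `text` parameter to a chat-completions `response_format` dict."""
    fmt = (text or {}).get("format") or {"type": "text"}
    kind = fmt.get("type", "text")
    if kind in ("text", "json_object"):
        return {"type": kind}
    if kind == "json_schema":
        # Responses keeps the schema fields at the top level of `format`;
        # chat completions nests them under a `json_schema` key.
        nested = {key: fmt[key] for key in ("name", "description", "schema", "strict") if key in fmt}
        return {"type": "json_schema", "json_schema": nested}
    raise ValueError(f"unsupported text format type: {kind!r}")
```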
|  | dbe4e84aca | feat(responses): implement full multi-turn support (#2295) I think the implementation needs more simplification. Spent way too much time trying to get the tests to pass with models not co-operating :( Finally had to switch to claude-sonnet to get things to pass reliably.
### Test Plan
```
export TAVILY_SEARCH_API_KEY=...
export OPENAI_API_KEY=...
uv run pytest -p no:warnings \
  -s -v tests/verifications/openai_api/test_responses.py \
  --provider=stack:starter \
  --model openai/gpt-4o
```
| ||
|  | 17f4414be9 | fix: remote-vllm event loop blocking unit test on Mac (#2332) # What does this PR do?
The remote-vllm `test_chat_completion_doesnt_block_event_loop` unit test was often failing for me on a Mac with a `httpx.ReadError`.
I traced this back to the swap to the `AsyncOpenAI` client in the remote-vllm provider, and it looks like the async client needs a bit more accurate HTTP request handling from our mock server. So, this fixes that unit test to send proper Content-Type and Content-Length headers, which makes the `AsyncOpenAI` client happier on Macs.
## Test Plan
All the test_remote_vllm.py unit tests consistently pass for me on a Mac now, without any flaking in the event loop one.
`pytest -s -v tests/unit/providers/inference/test_remote_vllm.py`
Signed-off-by: Ben Browning <bbrownin@redhat.com> | ||
|  | f328436831 | feat: Enable ingestion of precomputed embeddings (#2317) | ||
|  | bfdd15d1fa | fix(responses): use input, not original_input when storing the Response (#2300) We must store the full (re-hydrated) input, not just the original input, in the Response object. Of course, this is not very space efficient and we should likely find a better storage scheme so that we can only store unique entries in the database and then re-hydrate them efficiently later.
But that can be done safely later.
Closes https://github.com/meta-llama/llama-stack/issues/2299
## Test Plan
Unit test | ||
|  | 5cdb29758a | feat(responses): add output_text delta events to responses (#2265) This adds initial streaming support to the Responses API. This PR makes sure that the _first_ inference call made to chat completions streams out. There's more to be done:
- tool call output tokens need to stream out when possible
- we need to loop through multiple rounds of inference and they all need to stream out.
## Test Plan
Added a test. Executed as:
```
FIREWORKS_API_KEY=... \
  pytest -s -v 'tests/verifications/openai_api/test_responses.py' \
  --provider=stack:fireworks --model meta-llama/Llama-4-Scout-17B-16E-Instruct
```
Then, started a llama stack fireworks distro and tested against it like this:
```
OPENAI_API_KEY=blah \
  pytest -s -v 'tests/verifications/openai_api/test_responses.py' \
  --base-url http://localhost:8321/v1/openai/v1 \
  --model meta-llama/Llama-4-Scout-17B-16E-Instruct
```
| ||
|  | 15b0a67555 | feat: add responses input items api (#2239) # What does this PR do? TSIA ## Test Plan added integration and unit tests | ||
|  | 5844c2da68 | feat: add list responses API (#2233) # What does this PR do? This is not part of the official OpenAI API, but we'll use this for the logs UI. In order to support more filtering options, I'm adopting the newly introduced sql store in place of the kv store. ## Test Plan Added integration/unit tests. | ||
|  | e92301f2d7 | feat(sqlite-vec): enable keyword search for sqlite-vec (#1439) # What does this PR do?
This PR introduces support for keyword based FTS5 search with BM25
relevance scoring. It makes changes to the existing EmbeddingIndex base
class in order to support a search_mode and query_str parameter, that
can be used for keyword based search implementations.
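As a rough illustration of keyword search with FTS5 and BM25 ranking, the snippet below uses Python's built-in `sqlite3` module. It assumes the local SQLite build includes the FTS5 extension; the table name `fts_chunks`, the chunk IDs, and the `keyword_search` helper are invented for the example and do not reflect the provider's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 virtual table; chunk_id is stored but not indexed for full-text matching.
conn.execute("CREATE VIRTUAL TABLE fts_chunks USING fts5(chunk_id UNINDEXED, content)")
rows = [
    (f"doc{d}-s{s}", f"Sentence {s} from document {d}")
    for d in range(3)
    for s in range(10)
]
conn.executemany("INSERT INTO fts_chunks (chunk_id, content) VALUES (?, ?)", rows)


def keyword_search(query_str, k):
    """Return up to k (content, bm25 score) pairs, best match first."""
    cur = conn.execute(
        "SELECT content, bm25(fts_chunks) FROM fts_chunks "
        "WHERE fts_chunks MATCH ? ORDER BY rank LIMIT ?",
        (query_str, k),
    )
    return cur.fetchall()


# Double quotes inside the query string make this an FTS5 phrase query.
results = keyword_search('"Sentence 5"', 3)
```

In FTS5, `ORDER BY rank` sorts by the default BM25 ranking, where a smaller value indicates a better match.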
## Test Plan
run 
```
pytest llama_stack/providers/tests/vector_io/test_sqlite_vec.py -v -s --tb=short --disable-warnings --asyncio-mode=auto
```
Output:
```
pytest llama_stack/providers/tests/vector_io/test_sqlite_vec.py -v -s --tb=short --disable-warnings --asyncio-mode=auto
/Users/vnarsing/miniconda3/envs/stack-client/lib/python3.10/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"
  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
====================================================== test session starts =======================================================
platform darwin -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /Users/vnarsing/miniconda3/envs/stack-client/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.10.16', 'Platform': 'macOS-14.7.4-arm64-arm-64bit', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0'}}
rootdir: /Users/vnarsing/go/src/github/meta-llama/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0
asyncio: mode=auto, asyncio_default_fixture_loop_scope=None
collected 7 items                                                                                                                
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_add_chunks PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_query_chunks_vector PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_query_chunks_fts PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_chunk_id_conflict PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_register_vector_db PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_unregister_vector_db PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_generate_chunk_id PASSED
```
For reference, with the implementation, the fts table looks like below:
```
Chunk ID: 9fbc39ce-c729-64a2-260f-c5ec9bb2a33e, Content: Sentence 0 from document 0
Chunk ID: 94062914-3e23-44cf-1e50-9e25821ba882, Content: Sentence 1 from document 0
Chunk ID: e6cfd559-4641-33ba-6ce1-7038226495eb, Content: Sentence 2 from document 0
Chunk ID: 1383af9b-f1f0-f417-4de5-65fe9456cc20, Content: Sentence 3 from document 0
Chunk ID: 2db19b1a-de14-353b-f4e1-085e8463361c, Content: Sentence 4 from document 0
Chunk ID: 9faf986a-f028-7714-068a-1c795e8f2598, Content: Sentence 5 from document 0
Chunk ID: ef593ead-5a4a-392f-7ad8-471a50f033e8, Content: Sentence 6 from document 0
Chunk ID: e161950f-021f-7300-4d05-3166738b94cf, Content: Sentence 7 from document 0
Chunk ID: 90610fc4-67c1-e740-f043-709c5978867a, Content: Sentence 8 from document 0
Chunk ID: 97712879-6fff-98ad-0558-e9f42e6b81d3, Content: Sentence 9 from document 0
Chunk ID: aea70411-51df-61ba-d2f0-cb2b5972c210, Content: Sentence 0 from document 1
Chunk ID: b678a463-7b84-92b8-abb2-27e9a1977e3c, Content: Sentence 1 from document 1
Chunk ID: 27bd63da-909c-1606-a109-75bdb9479882, Content: Sentence 2 from document 1
Chunk ID: a2ad49ad-f9be-5372-e0c7-7b0221d0b53e, Content: Sentence 3 from document 1
Chunk ID: cac53bcd-1965-082a-c0f4-ceee7323fc70, Content: Sentence 4 from document 1
```
Query results:
Result 1: Sentence 5 from document 0
Result 2: Sentence 5 from document 1
Result 3: Sentence 5 from document 2
---------
Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com> | ||
|  | 3339844fda | feat: Add "instructions" support to responses API (#2205) # What does this PR do? Add support for "instructions" to the responses API. Instructions provide a way to swap out system (or developer) messages in new responses. ## Test Plan unit tests added Signed-off-by: Derek Higgins <derekh@redhat.com> | ||
|  | 1a770cf8ac | fix: Pass model parameter as config name to NeMo Customizer (#2218) # What does this PR do?
When launching a fine-tuning job, an upcoming version of NeMo Customizer
will expect the `config` name to be formatted as
`namespace/name@version`. Here, `config` is a reference to a model +
additional metadata. There could be multiple `config`s that reference
the same base model.
This PR updates NVIDIA's `supervised_fine_tune` to simply pass the
`model` param as-is to NeMo Customizer. Currently, it expects a
specific, allowlisted llama model (i.e. `meta/Llama3.1-8B-Instruct`) and
converts it to the provider format (`meta/llama-3.1-8b-instruct`).
## Test Plan
From a notebook, I built an image with my changes: 
```
!llama stack build --template nvidia --image-type venv
from llama_stack.distribution.library_client import LlamaStackAsLibraryClient
client = LlamaStackAsLibraryClient("nvidia")
client.initialize()
```
And could successfully launch a job:
```
response = client.post_training.supervised_fine_tune(
    job_uuid="",
    model="meta/llama-3.2-1b-instruct@v1.0.0+A100", # Model passed as-is to Customizer
    ...
)
job_id = response.job_uuid
print(f"Created job with ID: {job_id}")
Output:
Created job with ID: cust-Jm4oGmbwcvoufaLU4XkrRU
```
---------
Co-authored-by: Jash Gulabrai <jgulabrai@nvidia.com> |