mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-10-25 17:11:12 +00:00)

242 commits

| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|  | 47c078fcef | feat: implement dynamic model detection support for inference providers using litellm (#2886) # What does this PR do? This enhancement allows inference providers using LiteLLMOpenAIMixin to validate model availability against LiteLLM's official provider model listings, improving reliability and user experience when working with different AI service providers. - Add litellm_provider_name parameter to LiteLLMOpenAIMixin constructor - Add check_model_availability method to LiteLLMOpenAIMixin using litellm.models_by_provider - Update Gemini, Groq, and SambaNova inference adapters to pass litellm_provider_name ## Test Plan standard CI. | ||
|  | 9583f468f8 | feat(starter)!: simplify starter distro; litellm model registry changes (#2916) | ||
|  | 52201612de | feat: implement chunk deletion for vector stores (#2701) Add support for deleting individual chunks from vector stores - Add abstract remove_chunk() method to EmbeddingIndex base class - Implement chunk deletion for Faiss provider, SQLite Vec, Milvus, PGVector - Placeholder implementations with NotImplementedError for Chroma/Qdrant/Weaviate - Integrate chunk deletion into OpenAI vector store file deletion flow - removed xfail from test_openai_vector_store_delete_file_removes_from_vector_store Closes: #2477 --------- Signed-off-by: Derek Higgins <derekh@redhat.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com> | ||
|  | 7cc4819e90 | feat: add MCP Streamable HTTP support (#2554) # What does this PR do? This PR adds support for the new Streamable HTTP transport for MCP, as well as falling back to the SSE protocol if the Streamable HTTP connection fails. Closes #2542 ## Test Plan --------- Signed-off-by: Calum Murray <cmurray@redhat.com> | ||
|  | 1463b79218 | feat(registry): make the Stack query providers for model listing (#2862) This flips #2823 and #2805 by making the Stack periodically query the providers for models rather than the providers going behind the back and calling "register" on to the registry themselves. This also adds support for model listing for all other providers via `ModelRegistryHelper`. Once this is done, we do not need to manually list or register models via `run.yaml` and it will remove both noise and annoyance (setting `INFERENCE_MODEL` environment variables, for example) from the new user experience. In addition, it adds a configuration variable `allowed_models` which can be used to optionally restrict the set of models exposed from a provider. | ||
|  | 2aba2c1236 | chore: Moving vector store and vector store files helper methods to openai_vector_store_mixin (#2863) # What does this PR do? Moving vector store and vector store files helper methods to `openai_vector_store_mixin.py` ## Test Plan The tests are already supported in the CI and test the inline providers and current integration tests. Note that the `vector_index` fixture will test `milvus_vec_adapter`, `faiss_vec_adapter`, and `sqlite_vec_adapter` in `tests/unit/providers/vector_io/test_vector_io_openai_vector_stores.py`. Additionally, the integration tests in `integration-vector-io-tests.yml` run `tests/integration/vector_io` tests for the following providers: ```python vector-io-provider: ["inline::faiss", "inline::sqlite-vec", "inline::milvus", "remote::chromadb", "remote::pgvector"] ``` Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> | ||
|  | e1ed152779 | chore: create OpenAIMixin for inference providers with an OpenAI-compat API that need to implement openai_* methods (#2835) # What does this PR do? Add an `OpenAIMixin` for use by inference providers whose remote endpoints support an OpenAI compatible API. Use is demonstrated by refactoring - OpenAIInferenceAdapter - NVIDIAInferenceAdapter (adds embedding support) - LlamaCompatInferenceAdapter ## Test Plan existing unit and integration tests | ||
|  | 3b83032555 | feat(registry): more flexible model lookup (#2859) This PR updates model registration and lookup behavior to be slightly more general / flexible. See https://github.com/meta-llama/llama-stack/issues/2843 for more details. Note that this change is backwards compatible given the design of the `lookup_model()` method. ## Test Plan Added unit tests | ||
|  | 9736f096f6 | chore(test): fix flaky telemetry tests (#2815) # What does this PR do? This PR fixes flaky telemetry tests. See https://github.com/meta-llama/llama-stack/pull/2814 ## Test Plan Signed-off-by: Mustafa Elbehery <melbeher@redhat.com> | ||
|  | 20c3197952 | chore: Making name optional in openai_create_vector_store (#2858) # What does this PR do? chore: Making name optional in openai_create_vector_store # Closes https://github.com/meta-llama/llama-stack/issues/2706 ## Test Plan CI and unit tests Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> | ||
|  | b57db11bed | feat: create dynamic model registration for OpenAI and Llama compat remote inference providers (#2745) # What does this PR do? The purpose of this task is to create a solution that can automatically detect when new models are added, deprecated, or removed by OpenAI and Llama API providers, and automatically update the list of supported models in LlamaStack. This feature is vitally important in order to avoid missing new models and editing the entries manually, hence I created automation allowing users to dynamically register: - any models from OpenAI provider available at [https://api.openai.com/v1/models](https://api.openai.com/v1/models) that are not in [https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/openai/models.py](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/openai/models.py) - any models from Llama API provider available at [https://api.llama.com/v1/models](https://api.llama.com/v1/models) that are not in [https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/llama_openai_compat/models.py](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/llama_openai_compat/models.py) Closes #2504 This PR is dependent on #2710 ## Test Plan 1. Create venv at root llamastack directory: `uv venv .venv --python 3.12 --seed` 2. Activate venv: `source .venv/bin/activate` 3. `uv pip install -e .` 4. Create OpenAI distro modifying run.yaml 5. Build distro: `llama stack build --template starter --image-type venv` 6. Then run LlamaStack, but first navigate to the templates/starter folder: `llama stack run run.yaml --image-type venv OPENAI_API_KEY=<YOUR_KEY> ENABLE_OPENAI=openai` 7. Then try to register a dummy llm that doesn't exist in OpenAI provider: `llama-stack-client models register ianm/ianllm --provider-model-id=ianllm --provider-id=openai` You should receive this output - combined list of static config + fetched available models from OpenAI: <img width="1380" height="474" alt="Screenshot 2025-07-14 at 12 48 50" src="https://github.com/user-attachments/assets/d26aad18-6b15-49ee-9c49-b01b2d33f883" /> 8. Then register a real llm from OpenAI: `llama-stack-client models register openai/gpt-4-turbo-preview --provider-model-id=gpt-4-turbo-preview --provider-id=openai` <img width="1253" height="613" alt="Screenshot 2025-07-14 at 13 43 02" src="https://github.com/user-attachments/assets/60a5c9b1-3468-4eb9-9e92-cd7d21de3ca0" /> <img width="1288" height="655" alt="Screenshot 2025-07-14 at 13 43 11" src="https://github.com/user-attachments/assets/c1e48871-0e24-4bd9-a0b8-8c95552a51ee" /> We correctly fetched all available models from OpenAI. As for Llama API, as a non-US person I don't have access to a Llama API Key but I joined the wait list. The implementation for Llama is the same as for OpenAI since Llama is OpenAI compatible. So, the response from the GET endpoint has the same structure as OpenAI https://llama.developer.meta.com/docs/api/models | ||
|  | 31b088978a | fix: Fix /vector-stores/create API when vector store with duplicate name (#2617) # What does this PR do? Resolves https://github.com/meta-llama/llama-stack/issues/2735 Currently, if you test against OpenAI's Vector Stores API the `client.vector_stores.search` call fails with an invalid vector_db during routing (see the script referenced in the clickable item under the Test Plan section). This PR ensures that `client.vector_stores.search()` is compatible with OpenAI's Vector Stores API. Two biggest changes: 1. The `name`, which was previously used as the `vector_db_id`, has been changed to be consistent with OpenAI's `vs_{uuid}` format. 2. The vector store ID has to be referenced by the ID, the name is not reliable as every `client.vector_stores.create` results in a new vector store. NOTE: I believe this is a breaking change for end users as they'll need to update their VectorDB identifiers. ## Test Plan Unit tests: ```bash ./scripts/unit-tests.sh tests/unit/providers/vector_io/ -v ``` Integration tests: ```bash ENABLE_MILVUS=milvus llama stack run /Users/farceo/dev/llama-stack/llama_stack/templates/starter/run.yaml --image-type venv LLAMA_STACK_CONFIG=http://localhost:8321 pytest -sv tests/integration/vector_io/test_openai_vector_stores.py --embedding-model=all-MiniLM-L6-v2 -vv ``` Unit tests and test script below 👇 <details> <summary>Click here for script used to test OpenAI and Llama Stack Vector Store implementation</summary> ```python import json import argparse from openai import OpenAI, pagination import logging from colorama import Fore, Style, init import traceback import os # Initialize colorama for color support in terminal init(autoreset=True) # Setup basic logging logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s') DEMO_VECTOR_STORE_NAME = "Support FAQ FJA" global DEMO_VECTOR_STORE_ID global DEMO_VECTOR_STORE_ID2 def colored_print(color, text): """Prints text to the console with the specified 
color.""" print(f"{color}{text}{Style.RESET_ALL}") def log_and_print(color, message, level=logging.INFO): """Logs a message and prints it to the console with the specified color.""" logging.log(level, message) colored_print(color, message) def run_tests(client, prefix="openai"): """ Runs all tests using the provided OpenAI client and saves the output to JSON files with the given prefix. """ # Create the directory if it doesn't exist os.makedirs('openai_testing', exist_ok=True) # Default values in case tests fail global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 DEMO_VECTOR_STORE_ID = None DEMO_VECTOR_STORE_ID2 = None def test_idempotent_vector_store_creation(): """ Test that creating a vector store with the same name is idempotent. """ log_and_print(Fore.BLUE, "Starting vector store creation test...") try: vector_store = client.vector_stores.create( name=DEMO_VECTOR_STORE_NAME, ) # Attempt to create the same vector store again vector_store2 = client.vector_stores.create( name=DEMO_VECTOR_STORE_NAME, ) # Check instead of assert if vector_store2.id != vector_store.id: log_and_print(Fore.YELLOW, f"FAILED IDEMPOTENCY: the same VectorStore name for {prefix.upper()} does not return the same ID", level=logging.WARNING) else: log_and_print(Fore.GREEN, f"PASSED IDEMPOTENCY: f{vector_store2.id} == {vector_store.id} the same VectorStore name for {prefix.upper()} returns the same ID") vector_store_data = vector_store.to_dict() log_and_print(Fore.WHITE, f"vector_stores.create = {json.dumps(vector_store_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_create.json', 'w') as f: json.dump(vector_store_data, f, indent=2) global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 DEMO_VECTOR_STORE_ID = vector_store.id DEMO_VECTOR_STORE_ID2 = vector_store2.id return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 except Exception as e: log_and_print(Fore.RED, f"Idempotent vector store creation test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) # 
Create a fallback vector store ID if needed if 'vector_store' in locals() and vector_store: DEMO_VECTOR_STORE_ID = vector_store.id return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 def test_vector_store_list(): """ Test listing vector stores. """ log_and_print(Fore.BLUE, "Starting vector store list test...") try: vector_stores = client.vector_stores.list() # Check instead of assert if not isinstance(vector_stores, pagination.SyncCursorPage): log_and_print(Fore.YELLOW, f"FAILED: Expected a list of vector stores, got {type(vector_stores)}", level=logging.WARNING) else: log_and_print(Fore.GREEN, "Vector store list test passed!") vector_stores_data = vector_stores.to_dict() log_and_print(Fore.WHITE, f"vector_stores.list = {json.dumps(vector_stores_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_list.json', 'w') as f: json.dump(vector_stores_data, f, indent=2) except Exception as e: log_and_print(Fore.RED, f"Vector store list test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) def test_retrieve_vector_store(): """ Test retrieving a specific vector store. 
""" log_and_print(Fore.BLUE, "Starting retrieve vector store test...") if not DEMO_VECTOR_STORE_ID: log_and_print(Fore.YELLOW, "Skipping retrieve vector store test - no vector store ID available", level=logging.WARNING) return try: vector_store = client.vector_stores.retrieve( vector_store_id=DEMO_VECTOR_STORE_ID, ) # Check instead of assert if vector_store.id != DEMO_VECTOR_STORE_ID: log_and_print(Fore.YELLOW, "FAILED: Retrieved vector store ID does not match", level=logging.WARNING) else: log_and_print(Fore.GREEN, "Retrieve vector store test passed!") vector_store_data = vector_store.to_dict() log_and_print(Fore.WHITE, f"vector_stores.retrieve = {json.dumps(vector_store_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_retrieve.json', 'w') as f: json.dump(vector_store_data, f, indent=2) except Exception as e: log_and_print(Fore.RED, f"Retrieve vector store test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) def test_modify_vector_store(): """ Test modifying a vector store. 
""" log_and_print(Fore.BLUE, "Starting modify vector store test...") if not DEMO_VECTOR_STORE_ID: log_and_print(Fore.YELLOW, "Skipping modify vector store test - no vector store ID available", level=logging.WARNING) return try: updated_vector_store = client.vector_stores.update( vector_store_id=DEMO_VECTOR_STORE_ID, name="Updated Support FAQ FJA", ) # Check instead of assert if updated_vector_store.name != "Updated Support FAQ FJA": log_and_print(Fore.YELLOW, "FAILED: Vector store name was not updated correctly", level=logging.WARNING) else: log_and_print(Fore.GREEN, "Modify vector store test passed!") updated_vector_store_data = updated_vector_store.to_dict() log_and_print(Fore.WHITE, f"vector_stores.modify = {json.dumps(updated_vector_store_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_modify.json', 'w') as f: json.dump(updated_vector_store_data, f, indent=2) except Exception as e: log_and_print(Fore.RED, f"Modify vector store test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) def test_delete_vector_store(): """ Test deleting a vector store. 
""" log_and_print(Fore.BLUE, "Starting delete vector store test...") if not DEMO_VECTOR_STORE_ID2: log_and_print(Fore.YELLOW, "Skipping delete vector store test - no second vector store ID available", level=logging.WARNING) return try: response = client.vector_stores.delete( vector_store_id=DEMO_VECTOR_STORE_ID2, ) log_and_print(Fore.GREEN, "Delete vector store test passed!") response_data = response.to_dict() log_and_print(Fore.WHITE, f"Vector store delete response = {json.dumps(response_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_delete.json', 'w') as f: json.dump(response_data, f, indent=2) except Exception as e: log_and_print(Fore.RED, f"Delete vector store test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) def test_create_vector_store_file(): log_and_print(Fore.BLUE, "Starting create vector store file test...") if not DEMO_VECTOR_STORE_ID: log_and_print(Fore.YELLOW, "Skipping create vector store file test - no vector store ID available", level=logging.WARNING) return try: # create jsonl of files as an example with open("mydata.jsonl", "w") as f: f.write('{"text": "What is the return policy?", "metadata": {"category": "support"}}\n') f.write('{"text": "How do I reset my password?", "metadata": {"category": "support"}}\n') f.write('{"text": "Where can I find my order history?", "metadata": {"category": "support"}}\n') f.write('{"text": "What are the shipping options?", "metadata": {"category": "support"}}\n') f.write('{"text": "What is your favorite banana?", "metadata": {"category": "support"}}\n') # Create a simple text file if my_data_small.txt doesn't exist if not os.path.exists("my_data_small.txt"): with open("my_data_small.txt", "w") as f: f.write("This is a test file for vector store testing.\n") created_file = client.files.create( file=open("my_data_small.txt", "rb"), purpose="assistants", ) created_file_data = created_file.to_dict() log_and_print(Fore.WHITE, f"Created file {json.dumps(created_file_data, 
indent=2)}") with open(f'openai_testing/{prefix}_file_create.json', 'w') as f: json.dump(created_file_data, f, indent=2) retrieved_files = client.files.retrieve(created_file.id) retrieved_files_data = retrieved_files.to_dict() log_and_print(Fore.WHITE, f"Retrieved file {json.dumps(retrieved_files_data, indent=2)}") with open(f'openai_testing/{prefix}_file_retrieve.json', 'w') as f: json.dump(retrieved_files_data, f, indent=2) vector_store_file = client.vector_stores.files.create( vector_store_id=DEMO_VECTOR_STORE_ID, file_id=created_file.id, ) log_and_print(Fore.GREEN, "Create vector store file test passed!") except Exception as e: log_and_print(Fore.RED, f"Create vector store file test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) def test_search_vector_store(): """ Test searching a vector store. """ log_and_print(Fore.BLUE, "Starting search vector store test...") if not DEMO_VECTOR_STORE_ID: log_and_print(Fore.YELLOW, "Skipping search vector store test - no vector store ID available", level=logging.WARNING) return try: query = "What is the banana policy?" 
search_results = client.vector_stores.search( vector_store_id=DEMO_VECTOR_STORE_ID, query=query, max_num_results=10, ranking_options={ 'ranker': 'default-2024-11-15', 'score_threshold': 0.0, }, rewrite_query=False, ) # Check instead of assert if not isinstance(search_results, pagination.SyncPage): log_and_print(Fore.YELLOW, f"FAILED: Expected a list of search results, got {type(search_results)}", level=logging.WARNING) else: log_and_print(Fore.GREEN, "Search vector store test passed!") search_results_dict = search_results.to_dict() log_and_print(Fore.WHITE, f"Search results = {search_results_dict}") with open(f'openai_testing/{prefix}_vector_store_search.json', 'w') as f: json.dump(search_results_dict, f, indent=2) log_and_print(Fore.WHITE, f"vector_stores.search = {search_results.to_json()}") except Exception as e: log_and_print(Fore.RED, f"Search vector store test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) # Run all tests in sequence, even if some fail test_results = [] try: result = test_idempotent_vector_store_creation() if result and len(result) == 2: DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 = result test_results.append(True) except Exception as e: log_and_print(Fore.RED, f"Vector store creation test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) test_results.append(False) for test_func in [ test_vector_store_list, test_retrieve_vector_store, test_modify_vector_store, test_delete_vector_store, test_create_vector_store_file, test_search_vector_store ]: try: test_func() test_results.append(True) except Exception as e: log_and_print(Fore.RED, f"{test_func.__name__} failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) test_results.append(False) if all(test_results): log_and_print(Fore.GREEN, f"All {prefix} tests completed successfully!") else: failed_count = test_results.count(False) log_and_print(Fore.YELLOW, f"{failed_count} {prefix} test(s) failed, but script completed.") if __name__ 
== "__main__": parser = argparse.ArgumentParser(description="Run OpenAI and/or LlamaStack tests.") parser.add_argument( "--provider", type=str, default="llama", choices=["openai", "llama", "both"], help="Specify which environment to test: openai, llama, or both. Default is llama.", ) args = parser.parse_args() try: if args.provider in ("openai", "both"): openai_client = OpenAI() run_tests(openai_client, prefix="openai") if args.provider in ("llama", "both"): llama_client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none") run_tests(llama_client, prefix="llama") log_and_print(Fore.GREEN, "All tests completed!") except Exception as e: log_and_print(Fore.RED, f"Tests failed to complete: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) ``` </details> --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> | ||
|  | 33f0d83ad3 | chore: Move vector store kvstore implementation into openai_vector_store_mixin.py (#2748) | ||
|  | f731f369a2 | feat: add infrastructure to allow inference model discovery (#2710) # What does this PR do?
inference providers each have a static list of supported / known models.
some also have access to a dynamic list of currently available models.
this change gives providers using the ModelRegistryHelper the ability to
combine their static and dynamic lists.
for instance, OpenAIInferenceAdapter can implement
```
   def query_available_models(self) -> list[str]:
      return [entry.model for entry in self.openai_client.models.list()]
```
to augment its static list w/ a current list from openai.
## Test Plan
scripts/unit-test.sh | ||
|  | d880c2df0e | fix: auth sql store: user is owner policy (#2674) # What does this PR do? The current authorized sql store implementation does not respect user.principal (only checks attributes). This PR addresses that. ## Test Plan Added test cases to integration tests. | ||
|  | bbe0199bb7 | chore: update pre-commit hook versions (#2708) While investigating the `uv.lock` changes made in https://github.com/meta-llama/llama-stack/pull/2695 I noticed several of the pre-commit hook versions were out of date. This PR updates them and fixes some new `ruff` errors. --------- Signed-off-by: Nathan Weinberg <nweinber@redhat.com> | ||
|  | 9b7eecebcf | ci: test safety with starter (#2628) 
		
# What does this PR do? We are now testing the safety capability with the starter image. This includes a few changes: * Enable the safety integration test * Relax the shield model requirements from llama-guard to make it work with llama-guard3:8b coming from Ollama * Expose a shield for each inference provider in the starter distro. The shield will only be registered if the provider is enabled. Closes: https://github.com/meta-llama/llama-stack/issues/2528 Signed-off-by: Sébastien Han <seb@redhat.com> | ||
|  | e9926564bd | fix: authorized sql store with postgres (#2641) 
		
# What does this PR do? postgres has different json extract syntax from sqlite ## Test Plan added integration test | ||
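The dialect difference mentioned in this commit can be illustrated with a small sketch. It is not the code from the PR: `json_extract_text` and the column and key names in the usage line are hypothetical. It relies only on SQLite's `json_extract()` function and PostgreSQL's `->>` operator.

```python
def json_extract_text(column: str, key: str, dialect: str) -> str:
    """Build a SQL fragment that reads one top-level key from a JSON column.

    Hypothetical helper for illustration only. Identifiers are interpolated
    into the SQL text, so they are restricted to plain identifiers here.
    """
    if not (column.isidentifier() and key.isidentifier()):
        raise ValueError("column and key must be plain identifiers")
    if dialect == "sqlite":
        # SQLite uses a function call with a JSON path argument.
        return f"json_extract({column}, '$.{key}')"
    if dialect == "postgresql":
        # PostgreSQL uses the ->> operator, which returns the value as text.
        return f"{column}->>'{key}'"
    raise ValueError(f"unsupported dialect: {dialect}")
```

For example, `json_extract_text("access_attributes", "owner", "postgresql")` yields a fragment suitable for a `WHERE` clause on a PostgreSQL backend, while the same call with `"sqlite"` yields the function-call form.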
|  | f77d4d91f5 | fix: handle encoding errors when adding files to vector store (#2574) 
		
- Add try-catch block around data.decode() to handle UnicodeDecodeError - Implement UTF-8 fallback when detected encoding fails - Return empty string when both encodings fail - add unit tests Fixes #2572: UnicodeDecodeError when uploading files with problematic encodings Signed-off-by: Derek Higgins <derekh@redhat.com> | ||
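The fallback order described in this commit (detected encoding, then UTF-8, then an empty string) can be sketched as below. This is a minimal illustration, not the code from the PR: `decode_with_fallback` and its `detected_encoding` parameter are hypothetical names, and catching `LookupError` for unknown encoding names is an added assumption.

```python
def decode_with_fallback(data: bytes, detected_encoding: str) -> str:
    """Try the detected encoding, then UTF-8; return '' if both fail."""
    for encoding in (detected_encoding, "utf-8"):
        try:
            return data.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # UnicodeDecodeError: the bytes are invalid in this encoding.
            # LookupError: the encoding name itself is unknown (assumption,
            # not mentioned in the commit message).
            continue
    return ""
```

Returning an empty string keeps the upload path from raising, at the cost of silently dropping undecodable content.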
|  | ea80ea63ac | chore: Updating chunk id generation to ensure uniqueness (#2618) # What does this PR do? This handles an edge case for `generate_chunk_id` where the combination of `document_id` and `chunk_text` is not unique. Adding the window location ensures uniqueness. ## Test Plan Added unit test Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> | ||
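A small sketch of why the window location helps: two chunks with identical text from the same document collide unless something positional is mixed into the hash. The function name `sketch_chunk_id`, the `"start-end"` window format, and the use of SHA-256 are all assumptions made for illustration; the real `generate_chunk_id` may differ.

```python
import hashlib


def sketch_chunk_id(document_id: str, chunk_text: str, chunk_window: str = "") -> str:
    """Derive a deterministic ID from document, text, and optional window."""
    hash_input = f"{document_id}:{chunk_text}"
    if chunk_window:
        # The window (e.g. "0-512") distinguishes repeated text within a document.
        hash_input += f":{chunk_window}"
    return hashlib.sha256(hash_input.encode("utf-8")).hexdigest()


# Same document and text, different windows: the IDs no longer collide.
first = sketch_chunk_id("doc-1", "repeated paragraph", "0-512")
second = sketch_chunk_id("doc-1", "repeated paragraph", "512-1024")
```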
|  | 3c43a2f529 | fix: store configs (#2593) # What does this PR do? https://github.com/meta-llama/llama-stack/pull/2490 broke postgres_demo, as the config expected a str but the value was converted to int. This PR: 1. Updates the type of port in sqlstore to be int 2. template generation uses `dict` instead of `StackRunConfig` so as to avoid failing pydantic typechecks. 3. Adds `replace_env_vars` to StackRunConfig instantiation in `configure.py` (not sure why this wasn't needed before). ## Test Plan `llama stack build --template postgres_demo --image-type conda --run` | ||
|  | 4d0d2d685f | fix: Set parameter usedforsecurity=False when calling hashlib.md5 in order to fix rag_tool.insert on FIPS clusters (#2577) 
		
# What does this PR do? Set parameter `usedforsecurity=False` when calling hashlib.md5 in order to fix rag_tool.insert on FIPS clusters Closes #2571 --------- Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com> | ||
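For reference, the call pattern this commit describes looks roughly like the sketch below. `content_fingerprint` is a hypothetical helper, not a function from the repository; `usedforsecurity` is a keyword argument accepted by `hashlib` constructors in Python 3.9 and later.

```python
import hashlib


def content_fingerprint(text: str) -> str:
    """Return an MD5 hex digest used only as a non-cryptographic fingerprint."""
    # usedforsecurity=False declares that the digest is not used in a security
    # context, which is what allows MD5 in FIPS-restricted environments.
    return hashlib.md5(text.encode("utf-8"), usedforsecurity=False).hexdigest()
```

The digest value is unchanged by the flag; only the policy check applied by restricted builds differs.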
|  | b333a3c03a | fix(ollama): Download remote image URLs for Ollama (#2551) 
		
## What does this PR do? Ollama does not support remote images. Only local file paths OR base64 inputs are supported. This PR ensures that the Stack downloads remote images and passes the base64 down to the inference engine. ## Test Plan Added a test case for Responses and ran it for both `fireworks` and `ollama` providers. | ||
|  | 8d8e90d78e | fix: add missing argument and methods (#2550) # What does this PR do?
Resolves:
```
mypy.....................................................................Failed
- hook id: mypy
- exit code: 1
llama_stack/providers/utils/responses/responses_store.py:119: error: Missing positional argument "policy" in call to "fetch_one" of "AuthorizedSqlStore" [call-arg]
llama_stack/providers/utils/responses/responses_store.py:122: error: "AuthorizedSqlStore" has no attribute "delete" [attr-defined]
Found 2 errors in 1 file (checked 403 source files)
```
Signed-off-by: Sébastien Han <seb@redhat.com> | ||
|  | be9bf68246 | feat: Add webmethod for deleting openai responses (#2160) 
		
# What does this PR do? This PR creates a webmethod for deleting OpenAI responses, adds an implementation for it and makes an integration test for the OpenAI delete response method. [//]: # (If resolving an issue, uncomment and update the line below) # (Closes #2077) ## Test Plan Ran the standard tests and the pre-commit hooks and the unit tests. # (## Documentation) For this PR I made the routes and implementation based on the current get and create methods. The unit tests were not able to handle this test due to the mock interface in use, which did not allow for effective CRUD to be tested. I instead created an integration test to match the existing ones in the test_openai_responses. | ||
|  | cc19b56c87 | chore: OpenAI compatibility for Milvus (#2470) # What does this PR do?
Closes https://github.com/meta-llama/llama-stack/issues/2461
## Test Plan
Tested with the `ollama` distribution template and updated the vector_io provider to:
```yaml
vector_io:
  - provider_id: milvus
    provider_type: inline::milvus
    config:
      db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/ollama}/milvus_store.db
      kvstore:
        type: sqlite
        db_name: milvus_registry.db
```
Ran the stack
```bash
llama stack run ./llama_stack/templates/ollama/run.yaml --image-type venv --env OLLAMA_URL="http://0.0.0.0:11434"
```
Ran the tests:
```
pytest -sv --stack-config=http://localhost:8321 tests/integration/vector_io/test_openai_vector_stores.py --embedding-model all-MiniLM-L6-v2
```
Output passed.
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> | ||
|  | 03e61e3fcc | fix: ValueError in faiss vector database serialization (resolves #2519) (#2526) 
		
The error message was misleading as it appeared to be an Ollama connectivity issue, but actually occurred during faiss vector database initialization.
## 🔍 Root Cause Analysis
The issue was in the faiss vector database serialization logic in `llama_stack/providers/inline/vector_io/faiss/faiss.py`:
1. **Saving**: `faiss.serialize_index()` returns binary data (uint8 numpy array)
2. **Bug**: Code incorrectly used `np.savetxt()` which converts binary to text with scientific notation (e.g., `7.300000000000000000e+01`)
3. **Loading**: `np.loadtxt(buffer, dtype=np.uint8)` failed to parse scientific notation back to uint8
4. **Result**: Server crashed during initialization before reaching Ollama connectivity check
## ✅ Solution
Replaced text-based serialization with proper binary serialization:
**After (fixed):**
```python
# Saving - proper binary format
np.save(buffer, np_index, allow_pickle=False)
# Loading - proper binary format
self.index = faiss.deserialize_index(np.load(buffer, allow_pickle=False))
```
## 🧪 Testing
- ✅ Binary serialization/deserialization works correctly
- ✅ Backward compatible with existing functionality
- ✅ No security concerns (allow_pickle=False maintained)
- ✅ Resolves the specific ValueError mentioned in the issue
## 📊 Impact
This fix resolves:
- ValueError during server startup with Ollama templates
## 🔗 Related Issues
- Closes #2519
- Affects all users of Ollama template and faiss vector_io configurations
## 📝 Files Changed
- `llama_stack/providers/inline/vector_io/faiss/faiss.py` - Fixed serialization methods in `initialize()` and `_save_index()`
--------- Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Ben Browning <bbrownin@redhat.com> | ||
|  | 7cb5d3c60f | chore: standardize unsupported model error #2517 (#2518) # What does this PR do?
- llama_stack/exceptions.py: Add UnsupportedModelError class
- remote inference ollama.py and utils/inference/model_registry.py: Changed ValueError in favor of UnsupportedModelError
- utils/inference/litellm_openai_mixin.py: remove `register_model` function implementation from `LiteLLMOpenAIMixin` class. Now uses the parent class `ModelRegistryHelper`'s function implementation
Closes #2517
## Test Plan
1. Create a new `test_run_openai.yaml` and paste the following config in it:
```yaml
version: '2'
image_name: test-image
apis:
- inference
providers:
  inference:
  - provider_id: openai
    provider_type: remote::openai
    config:
      max_tokens: 8192
models:
- metadata: {}
  model_id: "non-existent-model"
  provider_id: openai
  model_type: llm
server:
  port: 8321
```
And run the server with:
```bash
uv run llama stack run test_run_openai.yaml
```
You should now get a `llama_stack.exceptions.UnsupportedModelError` with the supported list of models in the error message.
---
Tested for the following remote inference providers, and they all raise the `UnsupportedModelError`:
- Anthropic
- Cerebras
- Fireworks
- Gemini
- Groq
- Ollama
- OpenAI
- SambaNova
- Together
- Watsonx
--------- Co-authored-by: Rohan Awhad <rawhad@redhat.com> | ||
|  | 68d8f2186f | fix: fix test of root span to match what is being set (#2494) 
		
# What does this PR do? I get errors when trying to query spans. It appears to be a result of traces being inserted where there is no root_span_id which causes a pydantic validation error on trying to load the data for a query response (and in any case having no span referenced undermines the purpose of the trace). The root cause as far as I can see is an invalid test in the code that inserts the trace, where it is testing for the string "true" against an object set to the python value True. Closes #2493 ## Test Plan With this change I can query spans. Signed-off-by: Gordon Sim <gsim@redhat.com> | ||
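The root cause described here is easy to reproduce in isolation: a Python `bool` never compares equal to the string `"true"`. The sketch below uses a hypothetical attribute key `is_root` and hypothetical function names; it is not the code from the repository.

```python
def is_root_span_buggy(attributes: dict) -> bool:
    # Compares against the string "true"; always False when the value is
    # the Python bool True.
    return attributes.get("is_root") == "true"


def is_root_span_fixed(attributes: dict) -> bool:
    # Tests for the Python value that is actually being set.
    return attributes.get("is_root") is True


span_attributes = {"is_root": True}
```

With `span_attributes` as above, the buggy check reports that the span is not a root span, so a trace would be stored without a `root_span_id`, while the fixed check recognizes it.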
|  | 43c1f39bd6 | refactor(env)!: enhanced environment variable substitution (#2490) # What does this PR do?
This commit significantly improves the environment variable substitution
functionality in Llama Stack configuration files:
* The version field in configuration files has been changed from string
to integer type for better type consistency across build and run
configurations.
* The environment variable substitution system for ${env.FOO:} was fixed
and now properly returns an error
* The environment variable substitution system for ${env.FOO+} returns
None instead of an empty string, which better matches the type annotations in
config fields
* The system includes automatic type conversion for boolean, integer,
and float values.
* The error messages have been enhanced to provide clearer guidance when
environment variables are missing, including suggestions for using
default values or conditional syntax.
* Comprehensive documentation has been added to the configuration guide
explaining all supported syntax patterns, best practices, and runtime
override capabilities.
* Multiple provider configurations have been updated to use the new
conditional syntax for optional API keys, making the system more
flexible for different deployment scenarios. The telemetry configuration
has been improved to properly handle optional endpoints with appropriate
validation, ensuring that required endpoints are specified when their
corresponding sinks are enabled.
* There were many instances of ${env.NVIDIA_API_KEY:} that should have
caused the code to fail. However, due to a bug, the distro server was
still being started, and early validation wasn’t triggered. As a result,
failures were likely being handled downstream by the providers. I’ve
maintained similar behavior by using ${env.NVIDIA_API_KEY:+}, though I
believe this is incorrect for many configurations. I’ll leave it to each
provider to correct it as needed.
* Environment variable substitution now uses the same syntax as Bash
parameter expansion.
Signed-off-by: Sébastien Han <seb@redhat.com> | ||
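The substitution rules listed in this commit can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation from the PR: `resolve` and `_convert` are hypothetical names, only a value consisting of a single whole expression is handled, and the environment is passed in as a plain dict.

```python
import re

# Matches ${env.NAME}, ${env.NAME:=default} and ${env.NAME:+value}.
_PATTERN = re.compile(r"^\$\{env\.(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?P<rest>:.*)?\}$")


def _convert(text: str):
    """Convert 'true'/'false', integers and floats; leave other text alone."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def resolve(expr: str, env: dict):
    match = _PATTERN.match(expr)
    if match is None:
        return expr  # not a substitution expression
    name, rest = match["name"], match["rest"]
    current = env.get(name)
    if rest is None:
        # Plain ${env.NAME}: the variable is required.
        if current is None:
            raise ValueError(f"{name} is not set; consider ${{env.{name}:=default}}")
        return _convert(current)
    if rest.startswith(":="):
        # Default: used when the variable is unset or empty.
        return _convert(current if current else rest[2:])
    if rest.startswith(":+"):
        # Conditional: the value when the variable is set, otherwise None.
        return _convert(rest[2:]) if current else None
    # Anything else, such as ${env.NAME:}, is rejected.
    raise ValueError(f"invalid substitution syntax: {expr}")
```

For example, `resolve("${env.PORT:=8321}", {})` yields the integer default, and `resolve("${env.API_KEY:+}", {})` yields `None` rather than an empty string, matching the behaviour described above.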
|  | ac5fd57387 | chore: remove nested imports (#2515) # What does this PR do? * Given that our API packages use "import *" in `__init__.py` we don't need to do `from llama_stack.apis.models.models` but simply `from llama_stack.apis.models`. The decision to use `import *` is debatable and should probably be revisited at one point. * Remove unneeded Ruff F401 rule * Consolidate Ruff F403 rule in the pyproject Signed-off-by: Sébastien Han <seb@redhat.com> | ||
|  | 1d3f27fe5b | fix: resume responses with tool call output (#2524) 
		
# What does this PR do? closes #2522 ## Test Plan added integration test LLAMA_STACK_CONFIG=http://localhost:8321 pytest -v tests/integration/agents/test_openai_responses.py --text-model "accounts/fireworks/models/llama-v3p3-70b-instruct" -vv -k 'function_call' | ||
|  | 82f13fe83e | feat: Add ChunkMetadata to Chunk (#2497) # What does this PR do?
Adding `ChunkMetadata` so we can properly delete embeddings later.
More specifically, this PR refactors and extends the chunk metadata
handling in the vector database and introduces a distinction between
metadata used for model context and backend-only metadata required for
chunk management, storage, and retrieval. It also improves chunk ID
generation and propagation throughout the stack, enhances test coverage,
and adds new utility modules.
```python
from pydantic import BaseModel


class ChunkMetadata(BaseModel):
    """
    `ChunkMetadata` is backend metadata for a `Chunk` that is used to store additional information about the chunk that
        will NOT be inserted into the context during inference, but is required for backend functionality.
        Use `metadata` in `Chunk` for metadata that will be used during inference.
    """
    document_id: str | None = None
    chunk_id: str | None = None
    source: str | None = None
    created_timestamp: int | None = None
    updated_timestamp: int | None = None
    chunk_window: str | None = None
    chunk_tokenizer: str | None = None
    chunk_embedding_model: str | None = None
    chunk_embedding_dimension: int | None = None
    content_token_count: int | None = None
    metadata_token_count: int | None = None
```
Eventually we can migrate the document_id out of the `metadata` field.
I've introduced the changes so that `ChunkMetadata` is backwards
compatible with `metadata`.
Closes https://github.com/meta-llama/llama-stack/issues/2501 
## Test Plan
Added unit tests
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> | ||
|  | cfee63bd0d | feat: Add search_mode support to OpenAI vector store API (#2500) 
		
# What does this PR do? Add search_mode parameter (vector/keyword/hybrid) to openai_search_vector_store method. Fixes OpenAPI code generation by using str instead of Literal type. Closes: #2459 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com> | ||
|  | 73c18feac4 | fix: update the signature of openai_list_files_in_vector_store in all VectorIO impls (#2503) | ||
|  | 9c8be89fb6 | chore: bump python supported version to 3.12 (#2475) 
		
# What does this PR do? The project now supports Python >= 3.12 Signed-off-by: Sébastien Han <seb@redhat.com> | ||
|  | d3b60507d7 | feat: support auth attributes in inference/responses stores (#2389) # What does this PR do? Inference/Response stores now store user attributes when inserting and respect them when fetching. ## Test Plan pytest tests/unit/utils/test_sqlstore.py | ||
|  | f394c7f2d9 | feat: Add missing Vector Store Files API surface (#2468) 
		
# What does this PR do?
This adds the ability to list, retrieve, update, and delete Vector Store Files. It implements these new APIs for the faiss and sqlite-vec providers, since those are the two that also have the rest of the vector store files implementation.
Closes #2445
## Test Plan
### test_openai_vector_stores Integration Tests
There are a number of new integration tests added, which I ran for each provider as outlined below.
faiss (from ollama distro):
```
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
llama stack run llama_stack/templates/ollama/run.yaml
LLAMA_STACK_CONFIG=http://localhost:8321 \
pytest -sv tests/integration/vector_io/test_openai_vector_stores.py \
--embedding-model=all-MiniLM-L6-v2
```
sqlite-vec (from starter distro):
```
llama stack run llama_stack/templates/starter/run.yaml
LLAMA_STACK_CONFIG=http://localhost:8321 \
pytest -sv tests/integration/vector_io/test_openai_vector_stores.py \
--embedding-model=all-MiniLM-L6-v2
```
### file_search verification tests
I also ensured the file_search verification tests continue to work, both for faiss and sqlite-vec.
faiss (ollama distro):
```
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
llama stack run llama_stack/templates/ollama/run.yaml
pytest -sv tests/verifications/openai_api/test_responses.py \
-k'file_search' \
--base-url=http://localhost:8321/v1/openai/v1 \
--model=meta-llama/Llama-3.2-3B-Instruct
```
sqlite-vec (starter distro):
```
llama stack run llama_stack/templates/starter/run.yaml
pytest -sv tests/verifications/openai_api/test_responses.py \
-k'file_search' \
--base-url=http://localhost:8321/v1/openai/v1 \
--model=together/meta-llama/Llama-3.2-3B-Instruct-Turbo
```
---------
Signed-off-by: Ben Browning <bbrownin@redhat.com> | ||
|  | a2f054607d | fix: cancel scheduler tasks on shutdown (#2130) # What does this PR do?
Scheduler: cancel tasks on shutdown.
Otherwise the currently running tasks will never exit (before they
actually complete), which means the process can't be properly shut down
(only with SIGKILL).
Ideally, we would let tasks know that they are about to be shut down and give
them some time to finish; but lacking such a mechanism, it's better to
cancel than to linger forever.
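The cancel-on-shutdown idea can be sketched with plain `asyncio`. This is a minimal illustration under assumptions, not the llama-stack scheduler: `SketchScheduler` and `long_running_job` are hypothetical names, and the real scheduler may manage its jobs differently.

```python
import asyncio


async def long_running_job(name: str) -> None:
    # Stand-in for a training job that would otherwise run for a long time.
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        # Last chance for cleanup before the task exits.
        print(f"{name}: cancelled")
        raise


class SketchScheduler:
    """Hypothetical scheduler used only for this illustration."""

    def __init__(self) -> None:
        self._tasks: set[asyncio.Task] = set()

    def schedule(self, name: str) -> asyncio.Task:
        task = asyncio.create_task(long_running_job(name), name=name)
        self._tasks.add(task)
        # Drop finished tasks so the set only holds running ones.
        task.add_done_callback(self._tasks.discard)
        return task

    async def shutdown(self) -> list[str]:
        # Request cancellation of everything still running...
        pending = list(self._tasks)
        for task in pending:
            task.cancel()
        # ...then wait until each task has actually finished unwinding.
        await asyncio.gather(*pending, return_exceptions=True)
        return [t.get_name() for t in pending if t.cancelled()]


async def main() -> list[str]:
    scheduler = SketchScheduler()
    scheduler.schedule("job-a")
    scheduler.schedule("job-b")
    await asyncio.sleep(0)  # yield once so the jobs start running
    return await scheduler.shutdown()


cancelled_jobs = asyncio.run(main())
```

Awaiting the cancelled tasks with `return_exceptions=True` lets shutdown complete even though each task ends with `CancelledError`.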
## Test Plan
Start a long-running task (e.g. torchtune or external kfp-provider
training).
Press Ctrl-C in the TTY running the process. Confirm it exits in a reasonable time.
```
^CINFO:     Shutting down
INFO:     Waiting for application shutdown.
13:32:26.187 - INFO - Shutting down
13:32:26.187 - INFO - Shutting down DatasetsRoutingTable
13:32:26.187 - INFO - Shutting down DatasetIORouter
13:32:26.187 - INFO - Shutting down TorchtuneKFPPostTrainingImpl
    Traceback (most recent call last):
      File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run
        return self._loop.run_until_complete(task)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
        return future.result()
               ^^^^^^^^^^^^^^^
    asyncio.exceptions.CancelledError
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
      File "<frozen runpy>", line 198, in _run_module_as_main
      File "<frozen runpy>", line 88, in _run_code
      File "/Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/executor_main.py", line 109, in <module>
        executor_main()
      File "/Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/executor_main.py", line 101, in executor_main
        output_file = executor.execute()
                      ^^^^^^^^^^^^^^^^^^
      File "/Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/executor.py", line 361, in execute
        result = self.func(**func_kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^
      File "/var/folders/45/1q1rx6cn7jbcn2ty852w0g_r0000gn/T/tmp.RKpPrvTWDD/ephemeral_component.py", line 118, in component
        asyncio.run(recipe.setup())
      File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run
        return runner.run(main)
               ^^^^^^^^^^^^^^^^
      File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 123, in run
        raise KeyboardInterrupt()
    KeyboardInterrupt
13:32:31.219 - ERROR - Task 'component' finished with status FAILURE
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
INFO     2025-05-09 13:32:31,221 llama_stack.providers.utils.scheduler:221 scheduler: Job
         test-jobc3c2e1e4-859c-4852-a41d-ef29e55e3efa: Pipeline 'test-jobc3c2e1e4-859c-4852-a41d-ef29e55e3efa'
         finished with status FAILURE. Inner task failed: 'component'.
ERROR    2025-05-09 13:32:31,223 llama_stack_provider_kfp_trainer.scheduler:54 scheduler: Job
         test-jobc3c2e1e4-859c-4852-a41d-ef29e55e3efa failed.
         ╭───────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮
         │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/src/llama_stack_provider_kfp_trainer/scheduler.py:45   │
         │ in do                                                                                                       │
         │                                                                                                             │
         │    42 │   │   │                                                                                             │
         │    43 │   │   │   job.status = JobStatus.running                                                            │
         │    44 │   │   │   try:                                                                                      │
         │ ❱  45 │   │   │   │   artifacts = self._to_artifacts(job.handler().output)                                  │
         │    46 │   │   │   │   for artifact in artifacts:                                                            │
         │    47 │   │   │   │   │   on_artifact_collected_cb(artifact)                                                │
         │    48                                                                                                       │
         │                                                                                                             │
         │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/base_compon │
         │ ent.py:101 in __call__                                                                                      │
         │                                                                                                             │
         │    98 │   │   │   │   f'{self.name}() missing {len(missing_arguments)} required '                           │
         │    99 │   │   │   │   f'{argument_or_arguments}: {arguments}.')                                             │
         │   100 │   │                                                                                                 │
         │ ❱ 101 │   │   return pipeline_task.PipelineTask(                                                            │
         │   102 │   │   │   component_spec=self.component_spec,                                                       │
         │   103 │   │   │   args=task_inputs,                                                                         │
         │   104 │   │   │   execute_locally=pipeline_context.Pipeline.get_default_pipeline() is                       │
         │                                                                                                             │
         │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/pipeline_ta │
         │ sk.py:187 in __init__                                                                                       │
         │                                                                                                             │
         │   184 │   │   ])                                                                                            │
         │   185 │   │                                                                                                 │
         │   186 │   │   if execute_locally:                                                                           │
         │ ❱ 187 │   │   │   self._execute_locally(args=args)                                                          │
         │   188 │                                                                                                     │
         │   189 │   def _execute_locally(self, args: Dict[str, Any]) -> None:                                         │
         │   190 │   │   """Execute the pipeline task locally.                                                         │
         │                                                                                                             │
         │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/pipeline_ta │
         │ sk.py:197 in _execute_locally                                                                               │
         │                                                                                                             │
         │   194 │   │   from kfp.local import task_dispatcher                                                         │
         │   195 │   │                                                                                                 │
         │   196 │   │   if self.pipeline_spec is not None:                                                            │
         │ ❱ 197 │   │   │   self._outputs = pipeline_orchestrator.run_local_pipeline(                                 │
         │   198 │   │   │   │   pipeline_spec=self.pipeline_spec,                                                     │
         │   199 │   │   │   │   arguments=args,                                                                       │
         │   200 │   │   │   )                                                                                         │
         │                                                                                                             │
         │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/local/pipeline_ │
         │ orchestrator.py:43 in run_local_pipeline                                                                    │
         │                                                                                                             │
         │    40 │                                                                                                     │
         │    41 │   # validate and access all global state in this function, not downstream                           │
         │    42 │   config.LocalExecutionConfig.validate()                                                            │
         │ ❱  43 │   return _run_local_pipeline_implementation(                                                        │
         │    44 │   │   pipeline_spec=pipeline_spec,                                                                  │
         │    45 │   │   arguments=arguments,                                                                          │
         │    46 │   │   raise_on_error=config.LocalExecutionConfig.instance.raise_on_error,                           │
         │                                                                                                             │
         │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/local/pipeline_ │
         │ orchestrator.py:108 in _run_local_pipeline_implementation                                                   │
         │                                                                                                             │
         │   105 │   │   │   )                                                                                         │
         │   106 │   │   return outputs                                                                                │
         │   107 │   elif dag_status == status.Status.FAILURE:                                                         │
         │ ❱ 108 │   │   log_and_maybe_raise_for_failure(                                                              │
         │   109 │   │   │   pipeline_name=pipeline_name,                                                              │
         │   110 │   │   │   fail_stack=fail_stack,                                                                    │
         │   111 │   │   │   raise_on_error=raise_on_error,                                                            │
         │                                                                                                             │
         │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/local/pipeline_ │
         │ orchestrator.py:137 in log_and_maybe_raise_for_failure                                                      │
         │                                                                                                             │
         │   134 │   │   logging_utils.format_task_name(task_name) for task_name in fail_stack)                        │
         │   135 │   msg = f'Pipeline {pipeline_name_with_color} finished with status                                  │
         │       {status_with_color}. Inner task failed: {task_chain_with_color}.'                                     │
         │   136 │   if raise_on_error:                                                                                │
         │ ❱ 137 │   │   raise RuntimeError(msg)                                                                       │
         │   138 │   with logging_utils.local_logger_context():                                                        │
         │   139 │   │   logging.error(msg)                                                                            │
         │   140                                                                                                       │
         ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
         RuntimeError: Pipeline 'test-jobc3c2e1e4-859c-4852-a41d-ef29e55e3efa' finished with status
         FAILURE. Inner task failed: 'component'.
INFO     2025-05-09 13:32:31,266 llama_stack.distribution.server.server:136 server: Shutting down
         DistributionInspectImpl
INFO     2025-05-09 13:32:31,266 llama_stack.distribution.server.server:136 server: Shutting down ProviderImpl
INFO:     Application shutdown complete.
INFO:     Finished server process [26648]
```
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com> | ||
|  | c20388c424 | ci: add python package build test (#2457) # What does this PR do? We now test a package build on every PR. Closes: https://github.com/meta-llama/llama-stack/issues/2406 Signed-off-by: Sébastien Han <seb@redhat.com> | ||
|  | d12f195f56 | feat: drop python 3.10 support (#2469) # What does this PR do? dropped python3.10, updated pyproject and dependencies, and also removed some blocks of code with special handling for enum.StrEnum Closes #2458 Signed-off-by: Charlie Doern <cdoern@redhat.com> | ||
|  | db2cd9e8f3 | feat: support filters in file search (#2472) # What does this PR do? Move to use vector_stores.search for file search tool in Responses, which supports filters. closes #2435 ## Test Plan Added e2e test with filters. myenv ❯ llama stack run llama_stack/templates/fireworks/run.yaml pytest -sv tests/verifications/openai_api/test_responses.py \ -k 'file_search and filters' \ --base-url=http://localhost:8321/v1/openai/v1 \ --model=meta-llama/Llama-3.3-70B-Instruct | ||
|  | 15f630e5da | feat: support pagination in inference/responses stores (#2397) 
		
# What does this PR do? ## Test Plan added unit tests | ||
|  | 985d0b156c | feat: Add suffix to openai_completions (#2449)
		
Code completion apps need "fill in the middle" capabilities. Added a `suffix` option to `openai_completion` to enable this. Updated the ollama provider to showcase it.
### Test Plan
```
pytest -sv --stack-config="inference=ollama" tests/integration/inference/test_openai_completion.py --text-model qwen2.5-coder:1.5b -k test_openai_completion_non_streaming_suffix
```
### OpenAI Sample script
```
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1")
response = client.completions.create(
    model="qwen2.5-coder:1.5b",
    prompt="The capital of ",
    suffix="is Paris.",
    max_tokens=10,
)
print(response.choices[0].text)
```
### Output
```
France is ____. To answer this question, we
```
| ||
|  | 2e8054bede | feat: Implement hybrid search in SQLite-vec (#2312) 
		
# What does this PR do?
Add support for hybrid search mode in the SQLite-vec provider, which combines keyword and vector search for better results. The implementation:
- Adds hybrid search mode as a new option alongside vector and keyword search
- Implements query_hybrid method in SQLiteVecIndex that:
  - First performs keyword search to get candidate matches
  - Then applies vector similarity search on those candidates
- Updates documentation to reflect the new search mode

This change improves search quality by leveraging both semantic similarity and keyword matching, while maintaining backward compatibility with existing vector and keyword search modes.
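The two-step flow described above (keyword filter first, vector ranking second) can be sketched roughly as follows. This is an in-memory illustration only, not the SQLiteVecIndex code: the function names, the whitespace tokenization, and the cosine scoring are all assumptions made for the example.

```python
import math


def keyword_candidates(chunks, query):
    """Step 1: keep ids of chunks that share at least one term with the query."""
    terms = set(query.lower().split())
    return [cid for cid, (text, _) in chunks.items()
            if terms & set(text.lower().split())]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_query_sketch(chunks, query, query_embedding, k=3):
    """Step 2: rank only the keyword candidates by vector similarity."""
    candidates = keyword_candidates(chunks, query)
    scored = [(cid, cosine(chunks[cid][1], query_embedding)) for cid in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical data: chunk id -> (text, embedding)
chunks = {
    "c1": ("sqlite vector search", [1.0, 0.0]),
    "c2": ("keyword search with sqlite", [0.0, 1.0]),
    "c3": ("unrelated text", [1.0, 0.0]),
}
results = hybrid_query_sketch(chunks, "sqlite", [1.0, 0.1])
# c3 is never scored: it matches the embedding but fails the keyword filter.
print(results)
```

In this sketch a chunk with no keyword overlap is dropped even if its embedding is close to the query, which is the trade-off of filtering before ranking.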
## Test Plan
```
pytest tests/unit/providers/vector_io/test_sqlite_vec.py -v -s --tb=short
/Users/vnarsing/miniconda3/envs/stack-client/lib/python3.10/site-packages/pytest_asyncio/plugin.py:217: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"
  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
=============================================================================================== test session starts ===============================================================================================
platform darwin -- Python 3.10.16, pytest-8.3.5, pluggy-1.5.0 -- /Users/vnarsing/miniconda3/envs/stack-client/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.10.16', 'Platform': 'macOS-14.7.6-arm64-arm-64bit', 'Packages': {'pytest': '8.3.5', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'anyio': '4.8.0', 'asyncio': '0.26.0', 'nbval': '0.11.0', 'cov': '6.1.1'}}
rootdir: /Users/vnarsing/go/src/github/meta-llama/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, anyio-4.8.0, asyncio-0.26.0, nbval-0.11.0, cov-6.1.1
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 10 items                                                                                                                                                                                                
tests/unit/providers/vector_io/test_sqlite_vec.py::test_add_chunks PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_vector PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_full_text_search PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_full_text_search_k_greater_than_results PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_chunk_id_conflict PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_generate_chunk_id PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_no_keyword_matches PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_score_threshold PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_different_embedding PASSED
```
---------
Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com> | ||
|  | 941f505eb0 | feat: File search tool for Responses API (#2426)
# What does this PR do?
This is an initial working prototype of wiring up the `file_search` builtin tool for the Responses API to our existing rag knowledge search tool.

This is me seeing what I could pull together on top of the bits we already have merged. This may not be the ideal way to implement this, and things like how I shuffle the vector store ids from the original response API tool request to the actual tool execution feel a bit hacky (grep for `tool_kwargs["vector_db_ids"]` in `_execute_tool_call` to see what I mean).
## Test Plan
I stubbed in some new tests to exercise this using text and pdf documents.

Note that this is currently under tests/verification only because it sometimes flakes with tool calling of the small Llama-3.2-3B model we run in CI (and that I use as an example below). We'd want to make the test a bit more robust in some way if we moved this over to tests/integration and ran it in CI.
### OpenAI SaaS (to verify test correctness)
```
pytest -sv tests/verifications/openai_api/test_responses.py \
  -k 'file_search' \
  --base-url=https://api.openai.com/v1 \
  --model=gpt-4o
```
### Fireworks with faiss vector store
```
llama stack run llama_stack/templates/fireworks/run.yaml

pytest -sv tests/verifications/openai_api/test_responses.py \
  -k 'file_search' \
  --base-url=http://localhost:8321/v1/openai/v1 \
  --model=meta-llama/Llama-3.3-70B-Instruct
```
### Ollama with faiss vector store
This sometimes flakes on Ollama because the quantized small model doesn't always choose to call the tool to answer the user's question. But, it often works.
```
ollama run llama3.2:3b

INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
  llama stack run ./llama_stack/templates/ollama/run.yaml \
  --image-type venv \
  --env OLLAMA_URL="http://0.0.0.0:11434"

pytest -sv tests/verifications/openai_api/test_responses.py \
  -k'file_search' \
  --base-url=http://localhost:8321/v1/openai/v1 \
  --model=meta-llama/Llama-3.2-3B-Instruct
```
### OpenAI provider with sqlite-vec vector store
```
llama stack run ./llama_stack/templates/starter/run.yaml --image-type venv

pytest -sv tests/verifications/openai_api/test_responses.py \
  -k 'file_search' \
  --base-url=http://localhost:8321/v1/openai/v1 \
  --model=openai/gpt-4o-mini
```
### Ensure existing vector store integration tests still pass
```
ollama run llama3.2:3b

INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
  llama stack run ./llama_stack/templates/ollama/run.yaml \
  --image-type venv \
  --env OLLAMA_URL="http://0.0.0.0:11434"

LLAMA_STACK_CONFIG=http://localhost:8321 \
  pytest -sv tests/integration/vector_io \
  --text-model "meta-llama/Llama-3.2-3B-Instruct" \
  --embedding-model=all-MiniLM-L6-v2
```
---------
Signed-off-by: Ben Browning <bbrownin@redhat.com> | ||
|  | 554ada57b0 | chore: Add OpenAI compatibility for Ollama embeddings (#2440)
# What does this PR do?
This PR adds OpenAI compatibility for Ollama embeddings. Closes https://github.com/meta-llama/llama-stack/issues/2428

Summary of changes:
- `llama_stack/providers/remote/inference/ollama/ollama.py`
  - Implements the OpenAI embeddings endpoint for Ollama, replacing the NotImplementedError with a full function that validates the model, prepares parameters, calls the client, encodes embedding data (optionally in base64), and returns a correctly structured response.
  - Updates import statements to include the new embedding response utilities.
- `llama_stack/providers/utils/inference/litellm_openai_mixin.py`
  - Refactors the embedding data encoding logic to use a new shared utility (`b64_encode_openai_embeddings_response`) instead of inline base64 encoding and packing logic.
  - Cleans up imports accordingly.
- `llama_stack/providers/utils/inference/openai_compat.py`
  - Adds `b64_encode_openai_embeddings_response` to handle encoding OpenAI embedding outputs (including base64 support) in a reusable way.
  - Adds `prepare_openai_embeddings_params` utility for standardizing embedding parameter preparation.
  - Updates imports to include the new embedding data class.
- `tests/integration/inference/test_openai_embeddings.py`
  - Removes `"remote::ollama"` from the list of providers that skip OpenAI embeddings tests, since support is now implemented.
## Note
There was one minor issue, which required me to override the `OpenAIEmbeddingsResponse.model` name with `self._get_model(model).identifier` name, which is very unsatisfying.
## Test Plan
Unit Tests and integration tests
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> | ||
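For readers unfamiliar with the "base64 encoding and packing" step mentioned in #2440, here is a minimal sketch of the general idea. It does not reproduce `b64_encode_openai_embeddings_response`; the helper names below are hypothetical, and the little-endian float32 layout is an assumption about the wire format.

```python
import base64
import struct


def encode_embedding_base64(values):
    """Pack floats as little-endian float32 (assumed layout) and base64-encode them."""
    packed = struct.pack(f"<{len(values)}f", *values)
    return base64.b64encode(packed).decode("ascii")


def decode_embedding_base64(data):
    """Reverse of encode_embedding_base64: base64 string -> list of floats."""
    raw = base64.b64decode(data)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


# Values chosen to be exactly representable in float32, so the round trip is lossless.
embedding = [0.5, -1.25, 2.0]
encoded = encode_embedding_base64(embedding)
decoded = decode_embedding_base64(encoded)
print(encoded, decoded)
```

Arbitrary Python floats are 64-bit, so values that are not exactly representable in float32 would come back slightly rounded after a round trip.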
|  | fef670b024 | feat: update openai tests to work with both clients (#2442) 
		
https://github.com/meta-llama/llama-stack-client-python/pull/238 updated llama-stack-client to also support OpenAI endpoints for embeddings, files, and vector stores. This updates the tests to cover all configs -- openai sdk, llama stack sdk and library-as-client. | ||
|  | 0bc1747ed8 | feat: update search for vector_stores (#2441)
Updated the response returned by `search` to match OpenAI.
## Test Plan
```
pytest -sv --stack-config=http://localhost:8321 tests/integration/vector_io/test_openai_vector_stores.py --embedding-model all-MiniLM-L6-v2
```
| ||
|  | de37a04c3e | fix: set appropriate defaults for params (#2434) 
		
Setting defaults to be `| None`; otherwise they get marked as required params in the OpenAPI spec. |