mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-10-23 08:33:09 +00:00)

27 commits

| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|  | 4161102100 | chore!: add double routes for v1/openai/v1 (#3636) So that users get a warning in 0.3.0 and we remove them in 0.4.0. Signed-off-by: Sébastien Han <seb@redhat.com> | ||
|  | cc64093ae4 | feat(api): Add Vector Store File batches api stub (#3615) # What does this PR do? Adding api stubs for vector store file batches apis https://github.com/llamastack/llama-stack/issues/3533 API Ref: https://platform.openai.com/docs/api-reference/vector-stores-file-batches ## Test Plan CI | ||
|  | 56b625d18a | feat(openai_movement)!: Change URL structures to kill /openai/v1 (part 2) (#3605) | ||
|  | 5e7fed8bbb | feat(openai_movement): Change URL structures to kill /openai/v1 (part 1) (#3587) The `/v1/openai/v1` prefix is annoying and now unnecessary given our clearer focus on how to think about the API surface. Let's kill it for the 0.3.0 update. To make client-side changes feasible, we will do this in two parts. This part adds a new route (sans `/openai/v1`) so the existing client continues to work since the server supports both. The next PR will be client-side (Stainless) changes which I will be making shortly. The final PR will remove the `/openai/v1` routes. Note that all these changes will happen rapidly within this release cycle. The entire set _will be backwards incompatible_. | ||
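
The dual-route phase described in the commit above can be sketched as follows. This is a minimal illustration, not the server's actual routing code: `register_dual_routes`, the route table, and the `/vector_stores` path are hypothetical, and the new prefix is assumed to be `/v1` (the old prefix with `/openai/v1` removed).

```python
import logging

logger = logging.getLogger("routes")

LEGACY_PREFIX = "/v1/openai/v1"  # old prefix named in the commit message
NEW_PREFIX = "/v1"               # assumed: the old prefix sans /openai/v1


def register_dual_routes(table, path, handler):
    """Hypothetical helper: serve `handler` under both the new and the legacy prefix."""

    def legacy_handler(*args, **kwargs):
        # Old clients keep working but are told to move to the new path.
        logger.warning(
            "%s%s is deprecated; use %s%s", LEGACY_PREFIX, path, NEW_PREFIX, path
        )
        return handler(*args, **kwargs)

    table[NEW_PREFIX + path] = handler
    table[LEGACY_PREFIX + path] = legacy_handler
    return table


# Both paths resolve to the same behaviour during the transition.
routes = register_dual_routes({}, "/vector_stores", lambda: {"object": "list", "data": []})
```

Removing the legacy routes later then amounts to dropping the `LEGACY_PREFIX` registration, which is why the full series is backwards incompatible.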
|  | c88c4ff2c6 | feat: introduce API leveling, post_training, eval to v1alpha (#3449) # What does this PR do? Rather than have a single `LLAMA_STACK_VERSION`, we need to have a `_V1`, `_V1ALPHA`, and `_V1BETA` constant. This also necessitated the addition of `level` to the `WebMethod` so that routing can be handled properly. For backwards compat, the `v1` routes are being kept around and marked as `deprecated`. When used, the server will log a deprecation warning. Deprecation log: <img width="1224" height="134" alt="Screenshot 2025-09-25 at 2 43 36 PM" src="https://github.com/user-attachments/assets/0cc7c245-dafc-48f0-be99-269fb9a686f9" /> move: 1. post_training to `v1alpha`, as it is under heavy development and not near its final state 2. eval: job scheduling is not implemented. It relies heavily on the datasetio API, which is under development and is missing implementations of specific routes, indicating the structure of those routes might change. Additionally, eval depends on the `inference` API, which is going to be deprecated, so eval will likely need a major API surface change to conform to using completions properly. Implements leveling in #3317. Note: integration tests will fail until the SDK is regenerated with v1alpha/inference as opposed to v1/inference. ## Test Plan Existing tests should pass with the newly generated schema. Conformance will also pass, as these routes are not the ones we currently test for stability. Signed-off-by: Charlie Doern <cdoern@redhat.com> | ||
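
A rough sketch of how per-route API levels might work, assuming a simplified stand-in for `WebMethod`. The constant names and values, the `full_path` helper, and the `/eval/benchmarks` route are hypothetical; only the `_V1` / `_V1ALPHA` / `_V1BETA` suffixes and the `level` and `deprecated` concepts come from the commit message.

```python
from dataclasses import dataclass

# Hypothetical constant names and values; the commit only gives the suffixes.
API_LEVEL_V1 = "v1"
API_LEVEL_V1ALPHA = "v1alpha"
API_LEVEL_V1BETA = "v1beta"


@dataclass(frozen=True)
class WebMethod:
    """Simplified stand-in for the real `WebMethod`; only a few fields are modelled."""

    route: str
    level: str = API_LEVEL_V1
    deprecated: bool = False


def full_path(method: WebMethod) -> str:
    """Prefix the route with its stability level."""
    return f"/{method.level}/{method.route.lstrip('/')}"


# The same route served at its new level, plus the old v1 path kept as deprecated.
eval_new = WebMethod(route="/eval/benchmarks", level=API_LEVEL_V1ALPHA)
eval_old = WebMethod(route="/eval/benchmarks", level=API_LEVEL_V1, deprecated=True)
```

Keeping the level on the method rather than in one global version string is what lets `post_training` and `eval` move to `v1alpha` while other routes stay at `v1`.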
|  | 33cca26154 | chore: Enabling Integration tests for Weaviate (#2882) # What does this PR do? This PR (1) enables the files API for Weaviate and (2) enables integration tests for Weaviate, which adds a docker container to the github action. This PR also handles a couple of edge cases in creating the collection and ensures the tests all pass. ## Test Plan CI enabled --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> | ||
|  | cb7354a9ce | docs: Add detailed docstrings to API models and update OpenAPI spec (#2889) This PR focuses on improving the developer experience by adding comprehensive docstrings to the API data models across the Llama Stack. These docstrings provide detailed explanations for each model and its fields, making the API easier to understand and use. **Key changes:** - **Added Docstrings:** Added reST formatted docstrings to Pydantic models in the `llama_stack/apis/` directory. This includes models for: - Agents (`agents.py`) - Benchmarks (`benchmarks.py`) - Datasets (`datasets.py`) - Inference (`inference.py`) - And many other API modules. - **OpenAPI Spec Update:** Regenerated the OpenAPI specification (`docs/_static/llama-stack-spec.yaml` and `docs/_static/llama-stack-spec.html`) to include the new docstrings. This will be reflected in the API documentation, providing richer information to users. **Impact:** - Developers using the Llama Stack API will have a better understanding of the data structures. - The auto-generated API documentation is now more informative. --------- Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> | ||
|  | 20c3197952 | chore: Making name optional in openai_create_vector_store (#2858) # What does this PR do? chore: Making name optional in openai_create_vector_store # Closes https://github.com/meta-llama/llama-stack/issues/2706 ## Test Plan CI and unit tests Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> | ||
|  | 31b088978a | fix: Fix /vector-stores/create API when vector store with duplicate name (#2617) # What does this PR do?
Resolves https://github.com/meta-llama/llama-stack/issues/2735
Currently, if you test against OpenAI's Vector Stores API the `client.vector_stores.search` call fails with an invalid vector_db during routing (see the script referenced in the clickable item under the Test Plan section).
This PR ensures that `client.vector_stores.search()` is compatible with OpenAI's Vector Stores API.
Two biggest changes:
1. The `name`, which was previously used as the `vector_db_id`, has been changed to be consistent with OpenAI's `vs_{uuid}` format.
2. A vector store has to be referenced by its ID; the name is not reliable, as every `client.vector_stores.create` call results in a new vector store.
NOTE: I believe this is a breaking change for end users as they'll need to update their VectorDB identifiers.
## Test Plan
Unit tests:
```bash
./scripts/unit-tests.sh tests/unit/providers/vector_io/ -v
```
Integration tests:
```bash
ENABLE_MILVUS=milvus llama stack run /Users/farceo/dev/llama-stack/llama_stack/templates/starter/run.yaml --image-type venv
LLAMA_STACK_CONFIG=http://localhost:8321 pytest -sv tests/integration/vector_io/test_openai_vector_stores.py --embedding-model=all-MiniLM-L6-v2 -vv
```
Unit tests and test script below 👇
<details>
<summary>Click here for script used to test OpenAI and Llama Stack Vector Store implementation</summary>

```python
import json
import argparse
from openai import OpenAI, pagination
import logging
from colorama import Fore, Style, init
import traceback
import os

# Initialize colorama for color support in terminal
init(autoreset=True)

# Setup basic logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

DEMO_VECTOR_STORE_NAME = "Support FAQ FJA"
# Module-level defaults; run_tests() resets and fills these in
DEMO_VECTOR_STORE_ID = None
DEMO_VECTOR_STORE_ID2 = None


def colored_print(color, text):
    """Prints text to the console with the specified color."""
    print(f"{color}{text}{Style.RESET_ALL}")


def log_and_print(color, message, level=logging.INFO):
    """Logs a message and prints it to the console with the specified color."""
    logging.log(level, message)
    colored_print(color, message)


def run_tests(client, prefix="openai"):
    """
    Runs all tests using the provided OpenAI client and saves the output to JSON files with the given prefix.
    """
    # Create the directory if it doesn't exist
    os.makedirs('openai_testing', exist_ok=True)

    # Default values in case tests fail
    global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
    DEMO_VECTOR_STORE_ID = None
    DEMO_VECTOR_STORE_ID2 = None

    def test_idempotent_vector_store_creation():
        """
        Test that creating a vector store with the same name is idempotent.
        """
        global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
        log_and_print(Fore.BLUE, "Starting vector store creation test...")
        try:
            vector_store = client.vector_stores.create(
                name=DEMO_VECTOR_STORE_NAME,
            )
            # Attempt to create the same vector store again
            vector_store2 = client.vector_stores.create(
                name=DEMO_VECTOR_STORE_NAME,
            )

            # Check instead of assert
            if vector_store2.id != vector_store.id:
                log_and_print(Fore.YELLOW, f"FAILED IDEMPOTENCY: the same VectorStore name for {prefix.upper()} does not return the same ID", level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, f"PASSED IDEMPOTENCY: {vector_store2.id} == {vector_store.id} the same VectorStore name for {prefix.upper()} returns the same ID")

            vector_store_data = vector_store.to_dict()
            log_and_print(Fore.WHITE, f"vector_stores.create = {json.dumps(vector_store_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_create.json', 'w') as f:
                json.dump(vector_store_data, f, indent=2)

            DEMO_VECTOR_STORE_ID = vector_store.id
            DEMO_VECTOR_STORE_ID2 = vector_store2.id
            return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
        except Exception as e:
            log_and_print(Fore.RED, f"Idempotent vector store creation test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())
            # Create a fallback vector store ID if needed
            if 'vector_store' in locals() and vector_store:
                DEMO_VECTOR_STORE_ID = vector_store.id
            return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2

    def test_vector_store_list():
        """
        Test listing vector stores.
        """
        log_and_print(Fore.BLUE, "Starting vector store list test...")
        try:
            vector_stores = client.vector_stores.list()

            # Check instead of assert
            if not isinstance(vector_stores, pagination.SyncCursorPage):
                log_and_print(Fore.YELLOW, f"FAILED: Expected a list of vector stores, got {type(vector_stores)}", level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Vector store list test passed!")

            vector_stores_data = vector_stores.to_dict()
            log_and_print(Fore.WHITE, f"vector_stores.list = {json.dumps(vector_stores_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_list.json', 'w') as f:
                json.dump(vector_stores_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Vector store list test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_retrieve_vector_store():
        """
        Test retrieving a specific vector store.
        """
        log_and_print(Fore.BLUE, "Starting retrieve vector store test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping retrieve vector store test - no vector store ID available", level=logging.WARNING)
            return

        try:
            vector_store = client.vector_stores.retrieve(
                vector_store_id=DEMO_VECTOR_STORE_ID,
            )

            # Check instead of assert
            if vector_store.id != DEMO_VECTOR_STORE_ID:
                log_and_print(Fore.YELLOW, "FAILED: Retrieved vector store ID does not match", level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Retrieve vector store test passed!")

            vector_store_data = vector_store.to_dict()
            log_and_print(Fore.WHITE, f"vector_stores.retrieve = {json.dumps(vector_store_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_retrieve.json', 'w') as f:
                json.dump(vector_store_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Retrieve vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_modify_vector_store():
        """
        Test modifying a vector store.
        """
        log_and_print(Fore.BLUE, "Starting modify vector store test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping modify vector store test - no vector store ID available", level=logging.WARNING)
            return

        try:
            updated_vector_store = client.vector_stores.update(
                vector_store_id=DEMO_VECTOR_STORE_ID,
                name="Updated Support FAQ FJA",
            )

            # Check instead of assert
            if updated_vector_store.name != "Updated Support FAQ FJA":
                log_and_print(Fore.YELLOW, "FAILED: Vector store name was not updated correctly", level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Modify vector store test passed!")

            updated_vector_store_data = updated_vector_store.to_dict()
            log_and_print(Fore.WHITE, f"vector_stores.modify = {json.dumps(updated_vector_store_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_modify.json', 'w') as f:
                json.dump(updated_vector_store_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Modify vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_delete_vector_store():
        """
        Test deleting a vector store.
        """
        log_and_print(Fore.BLUE, "Starting delete vector store test...")
        if not DEMO_VECTOR_STORE_ID2:
            log_and_print(Fore.YELLOW, "Skipping delete vector store test - no second vector store ID available", level=logging.WARNING)
            return

        try:
            response = client.vector_stores.delete(
                vector_store_id=DEMO_VECTOR_STORE_ID2,
            )
            log_and_print(Fore.GREEN, "Delete vector store test passed!")

            response_data = response.to_dict()
            log_and_print(Fore.WHITE, f"Vector store delete response = {json.dumps(response_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_delete.json', 'w') as f:
                json.dump(response_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Delete vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_create_vector_store_file():
        log_and_print(Fore.BLUE, "Starting create vector store file test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping create vector store file test - no vector store ID available", level=logging.WARNING)
            return

        try:
            # create jsonl of files as an example
            with open("mydata.jsonl", "w") as f:
                f.write('{"text": "What is the return policy?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "How do I reset my password?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "Where can I find my order history?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "What are the shipping options?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "What is your favorite banana?", "metadata": {"category": "support"}}\n')

            # Create a simple text file if my_data_small.txt doesn't exist
            if not os.path.exists("my_data_small.txt"):
                with open("my_data_small.txt", "w") as f:
                    f.write("This is a test file for vector store testing.\n")

            # Use a context manager so the upload handle is always closed
            with open("my_data_small.txt", "rb") as data_file:
                created_file = client.files.create(
                    file=data_file,
                    purpose="assistants",
                )

            created_file_data = created_file.to_dict()
            log_and_print(Fore.WHITE, f"Created file {json.dumps(created_file_data, indent=2)}")
            with open(f'openai_testing/{prefix}_file_create.json', 'w') as f:
                json.dump(created_file_data, f, indent=2)

            retrieved_files = client.files.retrieve(created_file.id)
            retrieved_files_data = retrieved_files.to_dict()
            log_and_print(Fore.WHITE, f"Retrieved file {json.dumps(retrieved_files_data, indent=2)}")
            with open(f'openai_testing/{prefix}_file_retrieve.json', 'w') as f:
                json.dump(retrieved_files_data, f, indent=2)

            vector_store_file = client.vector_stores.files.create(
                vector_store_id=DEMO_VECTOR_STORE_ID,
                file_id=created_file.id,
            )
            log_and_print(Fore.GREEN, "Create vector store file test passed!")
        except Exception as e:
            log_and_print(Fore.RED, f"Create vector store file test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_search_vector_store():
        """
        Test searching a vector store.
        """
        log_and_print(Fore.BLUE, "Starting search vector store test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping search vector store test - no vector store ID available", level=logging.WARNING)
            return

        try:
            query = "What is the banana policy?"
            search_results = client.vector_stores.search(
                vector_store_id=DEMO_VECTOR_STORE_ID,
                query=query,
                max_num_results=10,
                ranking_options={
                    'ranker': 'default-2024-11-15',
                    'score_threshold': 0.0,
                },
                rewrite_query=False,
            )

            # Check instead of assert
            if not isinstance(search_results, pagination.SyncPage):
                log_and_print(Fore.YELLOW, f"FAILED: Expected a list of search results, got {type(search_results)}", level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Search vector store test passed!")

            search_results_dict = search_results.to_dict()
            log_and_print(Fore.WHITE, f"Search results = {search_results_dict}")
            with open(f'openai_testing/{prefix}_vector_store_search.json', 'w') as f:
                json.dump(search_results_dict, f, indent=2)

            log_and_print(Fore.WHITE, f"vector_stores.search = {search_results.to_json()}")
        except Exception as e:
            log_and_print(Fore.RED, f"Search vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    # Run all tests in sequence, even if some fail
    test_results = []
    try:
        result = test_idempotent_vector_store_creation()
        if result and len(result) == 2:
            DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 = result
        test_results.append(True)
    except Exception as e:
        log_and_print(Fore.RED, f"Vector store creation test failed: {e}", level=logging.ERROR)
        logging.error(traceback.format_exc())
        test_results.append(False)

    for test_func in [
        test_vector_store_list,
        test_retrieve_vector_store,
        test_modify_vector_store,
        test_delete_vector_store,
        test_create_vector_store_file,
        test_search_vector_store
    ]:
        try:
            test_func()
            test_results.append(True)
        except Exception as e:
            log_and_print(Fore.RED, f"{test_func.__name__} failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())
            test_results.append(False)

    if all(test_results):
        log_and_print(Fore.GREEN, f"All {prefix} tests completed successfully!")
    else:
        failed_count = test_results.count(False)
        log_and_print(Fore.YELLOW, f"{failed_count} {prefix} test(s) failed, but script completed.")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Run OpenAI and/or LlamaStack tests.")
    parser.add_argument(
        "--provider",
        type=str,
        default="llama",
        choices=["openai", "llama", "both"],
        help="Specify which environment to test: openai, llama, or both. Default is llama.",
    )
    args = parser.parse_args()

    try:
        if args.provider in ("openai", "both"):
            openai_client = OpenAI()
            run_tests(openai_client, prefix="openai")

        if args.provider in ("llama", "both"):
            llama_client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")
            run_tests(llama_client, prefix="llama")

        log_and_print(Fore.GREEN, "All tests completed!")
    except Exception as e:
        log_and_print(Fore.RED, f"Tests failed to complete: {e}", level=logging.ERROR)
        logging.error(traceback.format_exc())
```

</details>
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> | ||
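
The two changes listed in the commit above (generated `vs_{uuid}` identifiers, and lookup by ID rather than by name) can be illustrated with a small in-memory sketch. `InMemoryVectorStores` and `new_vector_store_id` are hypothetical names, and the use of `uuid4` is an assumption; the commit only states the `vs_{uuid}` shape.

```python
import uuid


def new_vector_store_id() -> str:
    """Generate an ID in the `vs_{uuid}` shape (uuid4 is an assumption)."""
    return f"vs_{uuid.uuid4()}"


class InMemoryVectorStores:
    """Hypothetical registry keyed by generated ID instead of by name."""

    def __init__(self):
        self._stores = {}

    def create(self, name=None):
        # Every create call yields a new store, even when the name repeats.
        store_id = new_vector_store_id()
        self._stores[store_id] = {"id": store_id, "name": name}
        return self._stores[store_id]

    def retrieve(self, store_id):
        return self._stores[store_id]


registry = InMemoryVectorStores()
first = registry.create(name="Support FAQ")
second = registry.create(name="Support FAQ")
```

Because two stores can share a name, callers that previously passed the name as `vector_db_id` need to hold on to the returned `id` instead.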
|  | ac5fd57387 | chore: remove nested imports (#2515) # What does this PR do? * Given that our API packages use "import *" in `__init__.py`, we don't need to do `from llama_stack.apis.models.models` but simply `from llama_stack.apis.models`. The decision to use `import *` is debatable and should probably be revisited at one point. * Remove unneeded Ruff F401 rule * Consolidate Ruff F403 rule in the pyproject Signed-off-by: Sébastien Han <seb@redhat.com> | ||
|  | 82f13fe83e | feat: Add ChunkMetadata to Chunk (#2497) # What does this PR do?
Adding `ChunkMetadata` so we can properly delete embeddings later.
More specifically, this PR refactors and extends the chunk metadata
handling in the vector database and introduces a distinction between
metadata used for model context and backend-only metadata required for
chunk management, storage, and retrieval. It also improves chunk ID
generation and propagation throughout the stack, enhances test coverage,
and adds new utility modules.
```python
class ChunkMetadata(BaseModel):
    """
    `ChunkMetadata` is backend metadata for a `Chunk` that is used to store additional information about the chunk that
        will NOT be inserted into the context during inference, but is required for backend functionality.
        Use `metadata` in `Chunk` for metadata that will be used during inference.
    """
    document_id: str | None = None
    chunk_id: str | None = None
    source: str | None = None
    created_timestamp: int | None = None
    updated_timestamp: int | None = None
    chunk_window: str | None = None
    chunk_tokenizer: str | None = None
    chunk_embedding_model: str | None = None
    chunk_embedding_dimension: int | None = None
    content_token_count: int | None = None
    metadata_token_count: int | None = None
```
Eventually we can migrate the document_id out of the `metadata` field.
I've introduced the changes so that `ChunkMetadata` is backwards
compatible with `metadata`.
<!-- If resolving an issue, uncomment and update the line below -->
Closes https://github.com/meta-llama/llama-stack/issues/2501 
## Test Plan
Added unit tests
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> | ||
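
To make the split between context metadata and backend-only metadata concrete, here is a small sketch using stdlib dataclasses as a stand-in for the Pydantic models. The `Chunk` shape, the `chunk_metadata` field name, and `context_text` are assumptions for illustration; only a few `ChunkMetadata` fields from the commit are reproduced.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ChunkMetadata:
    """Backend-only metadata (subset of the fields listed in the commit)."""

    document_id: str | None = None
    chunk_id: str | None = None
    source: str | None = None


@dataclass
class Chunk:
    """Hypothetical simplified chunk; the field names here are assumptions."""

    content: str
    metadata: dict[str, Any] = field(default_factory=dict)
    chunk_metadata: ChunkMetadata | None = None


def context_text(chunk: Chunk) -> str:
    """Render what would reach the model: content plus `metadata`, never `chunk_metadata`."""
    meta = ", ".join(f"{k}={v}" for k, v in sorted(chunk.metadata.items()))
    return f"{chunk.content}\n[{meta}]" if meta else chunk.content


chunk = Chunk(
    content="Returns are accepted within 30 days.",
    metadata={"category": "support"},
    chunk_metadata=ChunkMetadata(document_id="doc-1", chunk_id="chunk-1"),
)
```

The backend can still use `chunk.chunk_metadata.chunk_id` to find and delete the embedding later, while the inference context only sees the content and `metadata`.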
|  | cfee63bd0d | feat: Add search_mode support to OpenAI vector store API (#2500) # What does this PR do? Add search_mode parameter (vector/keyword/hybrid) to openai_search_vector_store method. Fixes OpenAPI code generation by using str instead of Literal type. Closes: #2459 ## Test Plan Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com> | ||
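
When a parameter is typed as plain `str` instead of `Literal[...]`, the allowed values are no longer enforced by the type, so a runtime check is one way to keep them constrained. The sketch below assumes that approach; `normalize_search_mode`, `VALID_SEARCH_MODES`, and the `"vector"` default are hypothetical and not taken from the PR.

```python
from typing import Optional

# The three modes named in the commit message.
VALID_SEARCH_MODES = ("vector", "keyword", "hybrid")


def normalize_search_mode(search_mode: Optional[str] = "vector") -> str:
    """Hypothetical helper: validate a plain-str search_mode at runtime."""
    mode = "vector" if search_mode is None else search_mode  # assumed default
    if mode not in VALID_SEARCH_MODES:
        raise ValueError(
            f"search_mode must be one of {VALID_SEARCH_MODES}, got {mode!r}"
        )
    return mode
```

For example, `normalize_search_mode("hybrid")` returns `"hybrid"`, while an unknown mode such as `"fuzzy"` raises `ValueError`.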
|  | 6832e8a658 | feat: remove score_threshold constraint (#2479) # What does this PR do?
See inline comment.
fixes test
`test_openai_vector_store_search_with_high_score_filter[llama_stack_client-meta-llama/Llama-3.3-70B-Instruct-meta-llama/Llama-4-Scout-17B-16E-Instruct-all-MiniLM-L6-v2-None-None]`
llama-stack/llama_stack/distribution/library_client.py:98: in convert_to_pydantic
    return TypeAdapter(annotation).validate_python(value)
.venv/lib/python3.10/site-packages/pydantic/type_adapter.py:421: in validate_python
    return self.validator.validate_python(
E pydantic_core._pydantic_core.ValidationError: 1 validation error for nullable[SearchRankingOptions]
E   score_threshold
E     Input should be less than or equal to 1 [type=less_than_equal, input_value=1.3458905661753127, input_type=float]
E     For further information visit https://errors.pydantic.dev/2.11/v/less_than_equal
The above exception was the direct cause of the following exception:
llama-stack/tests/integration/vector_io/test_openai_vector_stores.py:376: in test_openai_vector_store_search_with_high_score_filter
    search_response = compat_client.vector_stores.search(
.venv/lib/python3.10/site-packages/llama_stack_client/resources/vector_stores/vector_stores.py:356: in search
    return self._post(
.venv/lib/python3.10/site-packages/llama_stack_client/_base_client.py:1232: in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
llama-stack/llama_stack/distribution/library_client.py:177: in request
    result = loop.run_until_complete(self.async_client.request(*args, **kwargs))
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/asyncio/base_events.py:649: in run_until_complete
    return future.result()
llama-stack/llama_stack/distribution/library_client.py:292: in request
    response = await self._call_non_streaming(
llama-stack/llama_stack/distribution/library_client.py:313: in _call_non_streaming
    body = self._convert_body(path, options.method, body)
llama-stack/llama_stack/distribution/library_client.py:425: in _convert_body
    converted_body[param_name] = convert_to_pydantic(param.annotation, value)
llama-stack/llama_stack/distribution/library_client.py:112: in convert_to_pydantic
    raise ValueError(f"Failed to convert parameter {value} into {annotation}: {e}") from e
E ValueError: Failed to convert parameter {'score_threshold': 1.3458905661753127} into llama_stack.apis.vector_io.vector_io.SearchRankingOptions | None: 1 validation error for nullable[SearchRankingOptions]
E   score_threshold
E     Input should be less than or equal to 1 [type=less_than_equal, input_value=1.3458905661753127, input_type=float]
E     For further information visit https://errors.pydantic.dev/2.11/v/less_than_equal
## Test Plan | ||
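
The traceback above shows a threshold of about 1.35 being rejected by an upper bound of 1. A minimal sketch of threshold filtering without that bound is given below; `filter_by_score` and the result dictionaries are hypothetical, and the sketch assumes scores are not guaranteed to fall in the range 0 to 1.

```python
def filter_by_score(results, score_threshold=0.0):
    """Hypothetical helper: keep results scoring at or above the threshold.

    No upper bound is enforced on `score_threshold`, since a score
    (and therefore a useful threshold) may exceed 1.
    """
    return [r for r in results if r["score"] >= score_threshold]


hits = [
    {"id": "a", "score": 1.6},
    {"id": "b", "score": 0.4},
]
# The threshold from the traceback is accepted and simply filters out "b".
kept = filter_by_score(hits, score_threshold=1.3458905661753127)
```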
|  | f394c7f2d9 | feat: Add missing Vector Store Files API surface (#2468) 
		
			Some checks failed
		
		
# What does this PR do? This adds the ability to list, retrieve, update, and delete Vector Store Files. It implements these new APIs for the faiss and sqlite-vec providers, since those are the two that also have the rest of the vector store files implementation. Closes #2445 ## Test Plan ### test_openai_vector_stores Integration Tests There are a number of new integration tests added, which I ran for each provider as outlined below. faiss (from ollama distro): ``` INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \ llama stack run llama_stack/templates/ollama/run.yaml LLAMA_STACK_CONFIG=http://localhost:8321 \ pytest -sv tests/integration/vector_io/test_openai_vector_stores.py \ --embedding-model=all-MiniLM-L6-v2 ``` sqlite-vec (from starter distro): ``` llama stack run llama_stack/templates/starter/run.yaml LLAMA_STACK_CONFIG=http://localhost:8321 \ pytest -sv tests/integration/vector_io/test_openai_vector_stores.py \ --embedding-model=all-MiniLM-L6-v2 ``` ### file_search verification tests I also ensured the file_search verification tests continue to work, both for faiss and sqlite-vec. 
faiss (ollama distro): ``` INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \ llama stack run llama_stack/templates/ollama/run.yaml pytest -sv tests/verifications/openai_api/test_responses.py \ -k'file_search' \ --base-url=http://localhost:8321/v1/openai/v1 \ --model=meta-llama/Llama-3.2-3B-Instruct ``` sqlite-vec (starter distro): ``` llama stack run llama_stack/templates/starter/run.yaml pytest -sv tests/verifications/openai_api/test_responses.py \ -k'file_search' \ --base-url=http://localhost:8321/v1/openai/v1 \ --model=together/meta-llama/Llama-3.2-3B-Instruct-Turbo ``` --------- Signed-off-by: Ben Browning <bbrownin@redhat.com> | ||
|  | db2cd9e8f3 | feat: support filters in file search (#2472) # What does this PR do? Move to use vector_stores.search for file search tool in Responses, which supports filters. closes #2435 ## Test Plan Added e2e test with filters. myenv ❯ llama stack run llama_stack/templates/fireworks/run.yaml pytest -sv tests/verifications/openai_api/test_responses.py \ -k 'file_search and filters' \ --base-url=http://localhost:8321/v1/openai/v1 \ --model=meta-llama/Llama-3.3-70B-Instruct | ||
|  | 941f505eb0 | feat: File search tool for Responses API (#2426) # What does this PR do? This is an initial working prototype of wiring up the `file_search` builtin tool for the Responses API to our existing rag knowledge search tool. This is me seeing what I could pull together on top of the bits we already have merged. This may not be the ideal way to implement this, and things like how I shuffle the vector store ids from the original response API tool request to the actual tool execution feel a bit hacky (grep for `tool_kwargs["vector_db_ids"]` in `_execute_tool_call` to see what I mean). ## Test Plan I stubbed in some new tests to exercise this using text and pdf documents. Note that this is currently under tests/verification only because it sometimes flakes with tool calling of the small Llama-3.2-3B model we run in CI (and that I use as an example below). We'd want to make the test a bit more robust in some way if we moved this over to tests/integration and ran it in CI. ### OpenAI SaaS (to verify test correctness) ``` pytest -sv tests/verifications/openai_api/test_responses.py \ -k 'file_search' \ --base-url=https://api.openai.com/v1 \ --model=gpt-4o ``` ### Fireworks with faiss vector store ``` llama stack run llama_stack/templates/fireworks/run.yaml pytest -sv tests/verifications/openai_api/test_responses.py \ -k 'file_search' \ --base-url=http://localhost:8321/v1/openai/v1 \ --model=meta-llama/Llama-3.3-70B-Instruct ``` ### Ollama with faiss vector store This sometimes flakes on Ollama because the quantized small model doesn't always choose to call the tool to answer the user's question. But, it often works. 
``` ollama run llama3.2:3b INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \ llama stack run ./llama_stack/templates/ollama/run.yaml \ --image-type venv \ --env OLLAMA_URL="http://0.0.0.0:11434" pytest -sv tests/verifications/openai_api/test_responses.py \ -k'file_search' \ --base-url=http://localhost:8321/v1/openai/v1 \ --model=meta-llama/Llama-3.2-3B-Instruct ``` ### OpenAI provider with sqlite-vec vector store ``` llama stack run ./llama_stack/templates/starter/run.yaml --image-type venv pytest -sv tests/verifications/openai_api/test_responses.py \ -k 'file_search' \ --base-url=http://localhost:8321/v1/openai/v1 \ --model=openai/gpt-4o-mini ``` ### Ensure existing vector store integration tests still pass ``` ollama run llama3.2:3b INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \ llama stack run ./llama_stack/templates/ollama/run.yaml \ --image-type venv \ --env OLLAMA_URL="http://0.0.0.0:11434" LLAMA_STACK_CONFIG=http://localhost:8321 \ pytest -sv tests/integration/vector_io \ --text-model "meta-llama/Llama-3.2-3B-Instruct" \ --embedding-model=all-MiniLM-L6-v2 ``` --------- Signed-off-by: Ben Browning <bbrownin@redhat.com> | ||
|  | 0bc1747ed8 | feat: update search for vector_stores (#2441) Updated the `search` functionality return response to match openai. ## Test Plan ``` pytest -sv --stack-config=http://localhost:8321 tests/integration/vector_io/test_openai_vector_stores.py --embedding-model all-MiniLM-L6-v2 ``` | ||
|  | de37a04c3e | fix: set appropriate defaults for params (#2434) Setting defaults to be `| None` else they get marked as required params in open-api spec. | ||
|  | d55100d9b7 | feat: OpenAIVectorIOMixin for vector_stores common logic (#2427) Extracts common OpenAI vector-store code into its own mixin so that all providers can share the same core logic. This also makes it easy for Llama Stack to support both vector-stores and Llama Stack APIs in the interim so that both share the same underlying vector-dbs. Each provider contains storage specific logic to `create / edit / delete / list` vector dbs while the plumbing logic is standardized in the common code. Ensured that this works well with both faiss and sqlite-vec. ### Test Plan ``` llama stack run starter pytest -sv --stack-config http://localhost:8321 tests/integration/vector_io/test_openai_vector_stores.py --embedding-model all-MiniLM-L6-v2 ``` | ||
|  | 5ac43268e8 | feat: Add OpenAI compat /v1/vector_store APIs (#2423) Adding OpenAI compat `/v1/vector-store` apis. This PR implements the `faiss` provider with followup PRs coming up for other providers. Added routes to create, update, delete, list vector stores. Also added route to search a vector store Inserting into vector stores is missing and will be a follow up diff. ### Test Plan - Added new integration test for testing the faiss provider ``` pytest -sv --stack-config http://localhost:8321 tests/integration/vector_io/test_openai_vector_stores.py --embedding-model all-MiniLM-L6-v2 ``` | ||
|  | f328436831 | feat: Enable ingestion of precomputed embeddings (#2317) | ||
|  | bb5fca9521 | chore: more API validators (#2165) # What does this PR do? We added: * make sure docstrings are present with 'params' and 'returns' * fail if someone sets 'returns: None' * fix the failing APIs Signed-off-by: Sébastien Han <seb@redhat.com> | ||
|  | 9e6561a1ec | chore: enable pyupgrade fixes (#1806) # What does this PR do? The goal of this PR is code base modernization. Schema reflection code needed a minor adjustment to handle UnionTypes and collections.abc.AsyncIterator. (Both are preferred for latest Python releases.) Note to reviewers: almost all changes here are automatically generated by pyupgrade. Some additional unused imports were cleaned up. The only change worth of note can be found under `docs/openapi_generator` and `llama_stack/strong_typing/schema.py` where reflection code was updated to deal with "newer" types. Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com> | ||
|  | 515c16e352 | chore: mypy violations cleanup for inline::{telemetry,tool_runtime,vector_io} (#1711) # What does this PR do?
Clean up mypy violations for inline::{telemetry,tool_runtime,vector_io}.
This also makes API accept a tool call result without any content (like
RAG tool already may produce).
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com> | ||
|  | 314ee09ae3 | chore: move all Llama Stack types from llama-models to llama-stack (#1098) llama-models should have extremely minimal cruft. Its sole purpose should be didactic -- show the simplest implementation of the llama models and document the prompt formats, etc. This PR is the complement to https://github.com/meta-llama/llama-models/pull/279 ## Test Plan Ensure all `llama` CLI `model` sub-commands work: ```bash llama model list llama model download --model-id ... llama model prompt-format -m ... ``` Ran tests: ```bash cd tests/client-sdk LLAMA_STACK_CONFIG=fireworks pytest -s -v inference/ LLAMA_STACK_CONFIG=fireworks pytest -s -v vector_io/ LLAMA_STACK_CONFIG=fireworks pytest -s -v agents/ ``` Create a fresh venv `uv venv && source .venv/bin/activate` and run `llama stack build --template fireworks --image-type venv` followed by `llama stack run together --image-type venv` <-- the server runs Also checked that the OpenAPI generator can run and there is no change in the generated files as a result. ```bash cd docs/openapi_generator sh run_openapi_generator.sh ``` | ||
|  | a63a43c646 | [memory refactor][6/n] Update naming and routes (#839) Making a few small naming changes as per feedback: - RAGToolRuntime methods are called `insert` and `query` to keep them more general - The tool names are changed to non-namespaced forms `insert_into_memory` and `query_from_memory` - The REST endpoints are more REST-ful | ||
|  | 3ae8585b65 | [memory refactor][1/n] Rename Memory -> VectorIO, MemoryBanks -> VectorDBs (#828) See https://github.com/meta-llama/llama-stack/issues/827 for the broader design. This is the first part: - delete other kinds of memory banks (keyvalue, keyword, graph) for now; we will introduce a keyvalue store API as part of this design but not use it in the RAG tool yet. - renaming of the APIs |