llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-10-04 04:04:14 +00:00

Author	SHA1	Message	Date
Mustafa Elbehery	e7444c1d9b	chore: remove irrelevant comments Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>	2025-09-10 15:40:34 +02:00
Mustafa Elbehery	142bd248e7	feat(client): migrate MilvusClient to AsyncMilvusClient The commit makes the follwing changes. - Import statements updated: MilvusClient → AsyncMilvusClient - Removed asyncio.to_thread() wrappers: All Milvus operations now use native async/await - Test compatibility: Mock objects and fixtures updated to work with AsyncMilvusClient Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>	2025-09-10 15:40:34 +02:00
Matthew Farrellee	0e27016cf2	chore: update the vertexai inference impl to use openai-python for openai-compat functions (#3377 ) # What does this PR do? update VertexAI inference provider to use openai-python for openai-compat functions ## Test Plan ``` $ VERTEX_AI_PROJECT=... uv run llama stack build --image-type venv --providers inference=remote::vertexai --run ... $ LLAMA_STACK_CONFIG=http://localhost:8321 uv run --group test pytest -v -ra --text-model vertexai/vertex_ai/gemini-2.5-flash tests/integration/inference/test_openai_completion.py ... ``` i don't have an account to test this. `get_api_key` may also need to be updated per https://cloud.google.com/vertex-ai/generative-ai/docs/start/openai --------- Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Sébastien Han <seb@redhat.com>	2025-09-10 15:39:29 +02:00
Akram Ben Aissi	c836fa29e3	fix: pre-commit issues: non executable shebang file and removal of @pytest.mark.asyncio decorator (#3397 ) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> Fix pre-commit issues: non executable shebang file, @pytest.mark.asyncio decorator <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. -->	2025-09-10 15:27:35 +02:00
Akram Ben Aissi	1671431310	fix: Add missing files_api parameter to MemoryToolRuntimeImpl test (#3394 ) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> The test_query_adds_vector_db_id_to_chunk_metadata test was failing because MemoryToolRuntimeImpl.__init__() now requires a files_api parameter. Fixes failing unit tests for Python 3.12 and 3.13. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. -->	2025-09-10 06:55:57 -04:00
Cesare Pompeiano	1c23aeb937	feat: Add vector_db_id to chunk metadata (#3304 ) # What does this PR do? When running RAG in a multi vector DB setting, it can be difficult to trace where retrieved chunks originate from. This PR adds the `vector_db_id` into each chunk’s metadata, making it easier to understand which database a given chunk came from. This is helpful for debugging and for analyzing retrieval behavior of multiple DBs. Relevant code: ```python for vector_db_id, result in zip(vector_db_ids, results): for chunk, score in zip(result.chunks, result.scores): if not hasattr(chunk, "metadata") or chunk.metadata is None: chunk.metadata = {} chunk.metadata["vector_db_id"] = vector_db_id chunks.append(chunk) scores.append(score) ``` ## Test Plan * Ran Llama Stack in debug mode. * Verified that `vector_db_id` was added to each chunk’s metadata. * Confirmed that the metadata was printed in the console when using the RAG tool. --------- Co-authored-by: are-ces <cpompeia@redhat.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>	2025-09-10 11:19:21 +02:00
Ashwin Bharambe	a8aa815b6a	feat(tests): migrate to global "setups" system for test configuration (#3390 ) This PR refactors the integration test system to use global "setups" which provides better separation of concerns: suites = what to test, setups = how to configure. NOTE: if you naming suggestions, please provide feedback Changes: - New `tests/integration/setups.py` with global, reusable configurations (ollama, vllm, gpt, claude) - Modified `scripts/integration-tests.sh` options to match with the underlying pytest options - Updated documentation to reflect the new global setup system The main benefit is that setups can be reused across multiple suites (e.g., use "gpt" with any suite) even though sometimes they could specifically tailored for a suite (vision <> ollama-vision). It is now easier to add new configurations without modifying existing suites. Usage examples: - `pytest tests/integration --suite=responses --setup=gpt` - `pytest tests/integration --suite=vision` # auto-selects "ollama-vision" setup - `pytest tests/integration --suite=base --setup=vllm`	2025-09-09 15:50:56 -07:00
Francisco Arceo	ad6ea7fb91	feat: Adding OpenAI Prompts API (#3319 ) # What does this PR do? This PR adds support for OpenAI Prompts API. Note, OpenAI does not explicitly expose the Prompts API but instead makes it available in the Responses API and in the [Prompts Dashboard](https://platform.openai.com/docs/guides/prompting#create-a-prompt). I have added the following APIs: - CREATE - GET - LIST - UPDATE - Set Default Version The Set Default Version API is made available only in the Prompts Dashboard and configures which prompt version is returned in the GET (the latest version is the default). Overall, the expected functionality in Responses will look like this: ```python from openai import OpenAI client = OpenAI() response = client.responses.create( prompt={ "id": "pmpt_68b0c29740048196bd3a6e6ac3c4d0e20ed9a13f0d15bf5e", "version": "2", "variables": { "city": "San Francisco", "age": 30, } } ) ``` ### Resolves https://github.com/llamastack/llama-stack/issues/3276 ## Test Plan Unit tests added. Integration tests can be added after client generation. ## Next Steps 1. Update Responses API to support Prompt API 2. I'll enhance the UI to implement the Prompt Dashboard. 3. Add cache for lower latency --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-09-08 11:05:13 -04:00
Akram Ben Aissi	072dca0609	feat: Add Kubernetes auth provider to use SelfSubjectReview and kubernetes api server (#2559 ) # What does this PR do? Add Kubernetes authentication provider support - Add KubernetesAuthProvider class for token validation using Kubernetes SelfSubjectReview API - Add KubernetesAuthProviderConfig with configurable API server URL, TLS settings, and claims mapping - Implement authentication via POST requests to /apis/authentication.k8s.io/v1/selfsubjectreviews endpoint - Add support for parsing Kubernetes SelfSubjectReview response format to extract user information - Add KUBERNETES provider type to AuthProviderType enum - Update create_auth_provider factory function to handle 'kubernetes' provider type - Add comprehensive unit tests for KubernetesAuthProvider functionality - Add documentation with configuration examples and usage instructions The provider validates tokens by sending SelfSubjectReview requests to the Kubernetes API server and extracts user information from the userInfo structure in the response. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> What This Verifies: Authentication header validation Token validation with Kubernetes SelfSubjectReview and kubernetes server API endpoint Error handling for invalid tokens and HTTP errors Request payload structure and headers ``` python -m pytest tests/unit/server/test_auth.py -k "kubernetes" -v ``` Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com>	2025-09-08 11:25:10 +02:00
Matthew Farrellee	6a35bd7bb6	chore: update the anthropic inference impl to use openai-python for openai-compat functions (#3366 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Python Package Build Test / build (3.12) (push) Failing after 1s Details Python Package Build Test / build (3.13) (push) Failing after 1s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s Details API Conformance Tests / check-schema-compatibility (push) Successful in 6s Details Vector IO Integration Tests / test-matrix (push) Failing after 4s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details Unit Tests / unit-tests (3.12) (push) Failing after 3s Details Unit Tests / unit-tests (3.13) (push) Failing after 4s Details UI Tests / ui-tests (22) (push) Successful in 38s Details Pre-commit / pre-commit (push) Successful in 1m13s Details # What does this PR do? update the Anthropic inference provider to use openai-python for the openai-compat endpoints ## Test Plan ci Co-authored-by: raghotham <rsm@meta.com>	2025-09-07 14:00:42 -07:00
Matthew Farrellee	78cab5331a	chore(groq test): skip completions tests for groq, api is not supported server-side (#3347 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Python Package Build Test / build (3.12) (push) Failing after 1s Details Python Package Build Test / build (3.13) (push) Failing after 3s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Vector IO Integration Tests / test-matrix (push) Failing after 4s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 6s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details Unit Tests / unit-tests (3.12) (push) Failing after 3s Details Unit Tests / unit-tests (3.13) (push) Failing after 4s Details API Conformance Tests / check-schema-compatibility (push) Successful in 12s Details UI Tests / ui-tests (22) (push) Successful in 37s Details Pre-commit / pre-commit (push) Successful in 1m16s Details # What does this PR do? skip /v1/completions tests on groq, endpoint is not supported Co-authored-by: raghotham <rsm@meta.com>	2025-09-06 16:21:55 -07:00
Matthew Farrellee	d23607483f	chore: update the groq inference impl to use openai-python for openai-compat functions (#3348 ) # What does this PR do? update Groq inference provider to use OpenAIMixin for openai-compat endpoints changes on api.groq.com - - json_schema is now supported for specific models, see https://console.groq.com/docs/structured-outputs#supported-models - response_format with streaming is now supported for models that support response_format - groq no longer returns a 400 error if tools are provided and tool_choice is not "required" ## Test Plan ``` $ GROQ_API_KEY=... uv run llama stack build --image-type venv --providers inference=remote::groq --run ... $ LLAMA_STACK_CONFIG=http://localhost:8321 uv run --group test pytest -v -ra --text-model groq/llama-3.3-70b-versatile tests/integration/inference/test_openai_completion.py -k 'not store' ... SKIPPED [3] tests/integration/inference/test_openai_completion.py:44: Model groq/llama-3.3-70b-versatile hosted by remote::groq doesn't support OpenAI completions. SKIPPED [3] tests/integration/inference/test_openai_completion.py:94: Model groq/llama-3.3-70b-versatile hosted by remote::groq doesn't support vllm extra_body parameters. SKIPPED [4] tests/integration/inference/test_openai_completion.py:73: Model groq/llama-3.3-70b-versatile hosted by remote::groq doesn't support n param. SKIPPED [1] tests/integration/inference/test_openai_completion.py💯 Model groq/llama-3.3-70b-versatile hosted by remote::groq doesn't support chat completion calls with base64 encoded files. ======================= 8 passed, 11 skipped, 8 deselected, 2 warnings in 5.13s ======================== ``` --------- Co-authored-by: raghotham <rsm@meta.com>	2025-09-06 15:36:27 -07:00
Matthew Farrellee	9252d9fc01	chore(groq test): skip with_n tests for groq, it is not supported server-side (#3346 ) # What does this PR do? skip the with_n test for groq, because it isn't supported by the provider's service see https://console.groq.com/docs/openai#currently-unsupported-openai-features Co-authored-by: raghotham <rsm@meta.com>	2025-09-06 12:35:30 -07:00
Matthew Farrellee	4c28544c04	chore(gemini, tests): add skips for n and completions, gemini api does not support them (#3350 ) # What does this PR do? the gemini api endpoints do not support the n param or completions ## Test Plan ci	2025-09-06 12:22:44 -07:00
Francisco Arceo	7cd1c2c238	feat: Updating Rag Tool to use Files API and Vector Stores API (#3344 ) Some checks failed SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s Details Python Package Build Test / build (3.12) (push) Failing after 1s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 18s Details Update ReadTheDocs / update-readthedocs (push) Failing after 15s Details Python Package Build Test / build (3.13) (push) Failing after 19s Details Test External API and Providers / test-external (venv) (push) Failing after 17s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 23s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 22s Details Unit Tests / unit-tests (3.12) (push) Failing after 19s Details Unit Tests / unit-tests (3.13) (push) Failing after 19s Details Vector IO Integration Tests / test-matrix (push) Failing after 23s Details UI Tests / ui-tests (22) (push) Successful in 44s Details Pre-commit / pre-commit (push) Successful in 1m32s Details	2025-09-06 07:26:34 -06:00
Ashwin Bharambe	47b640370e	feat(tests): introduce a test "suite" concept to encompass dirs, options (#3339 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Python Package Build Test / build (3.13) (push) Failing after 1s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 4s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s Details Vector IO Integration Tests / test-matrix (push) Failing after 4s Details Python Package Build Test / build (3.12) (push) Failing after 3s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details Unit Tests / unit-tests (3.12) (push) Failing after 4s Details Unit Tests / unit-tests (3.13) (push) Failing after 3s Details UI Tests / ui-tests (22) (push) Successful in 33s Details Pre-commit / pre-commit (push) Successful in 1m15s Details Our integration tests need to be 'grouped' because each group often needs a specific set of models it works with. We separated vision tests due to this, and we have a separate set of tests which test "Responses" API. This PR makes this system a bit more official so it is very easy to target these groups and apply all testing infrastructure towards all the groups (for example, record-replay) uniformly. There are three suites declared: - base - vision - responses Note that our CI currently runs the "base" and "vision" suites. You can use the `--suite` option when running pytest (or any of the testing scripts or workflows.) For example: ``` OLLAMA_URL=http://localhost:11434 \ pytest -s -v tests/integration/ --stack-config starter --suite vision ```	2025-09-05 13:58:49 -07:00
Matthew Farrellee	0c2757a05b	chore(sambanova test): skip with_n tests for sambanova, it is not implemented server-side (#3342 ) # What does this PR do? skip a test that cannot pass for sambanova see https://docs-legacy.sambanova.ai/sambastudio/latest/open-ai-api.html\#_example_requests_using_openai_client ## Test Plan ci	2025-09-05 12:00:09 -07:00
Matthew Farrellee	df1526991f	feat(batches, completions): add /v1/completions support to /v1/batches (#3309 ) # What does this PR do? add support for /v1/completions to the /v1/batches api ## Test Plan ci	2025-09-05 11:59:57 -07:00
Francisco Arceo	e2fe39aee1	feat!: Migrate Vector DB IDs to Vector Store IDs (breaking change) (#3253 ) Some checks failed Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s Details Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 3s Details Vector IO Integration Tests / test-matrix (push) Failing after 4s Details Test Llama Stack Build / generate-matrix (push) Successful in 3s Details Python Package Build Test / build (3.13) (push) Failing after 2s Details Test Llama Stack Build / build-single-provider (push) Failing after 3s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s Details Python Package Build Test / build (3.12) (push) Failing after 2s Details Test External API and Providers / test-external (venv) (push) Failing after 3s Details Unit Tests / unit-tests (3.13) (push) Failing after 3s Details Update ReadTheDocs / update-readthedocs (push) Failing after 3s Details Test Llama Stack Build / build (push) Failing after 3s Details Unit Tests / unit-tests (3.12) (push) Failing after 4s Details UI Tests / ui-tests (22) (push) Successful in 35s Details Pre-commit / pre-commit (push) Successful in 1m15s Details # What does this PR do? This change migrates the VectorDB id generation to Vector Stores. This is a breaking change for _some users_ that may have application code using the `vector_db_id` parameter in the request of the VectorDB protocol instead of the `VectorDB.identifier` in the response. By default we will now create a Vector Store every time we register a VectorDB. The caveat with this approach is that this maps the `vector_db_id` → `vector_store.name`. This is a reasonable tradeoff to transition users towards OpenAI Vector Stores. As an added benefit, registering VectorDBs will result in them appearing in the VectorStores admin UI. ### Why? This PR makes the `POST` API call to `/v1/vector-dbs` swap the `vector_db_id` parameter in the request body into the VectorStore's name field and sets the `vector_db_id` to the generated vector store id (e.g., `vs_038247dd-4bbb-4dbb-a6be-d5ecfd46cfdb`). That means that users would have to do something like follows in their application code: ```python res = client.vector_dbs.register( vector_db_id='my-vector-db-id', embedding_model='ollama/all-minilm:l6-v2', embedding_dimension=384, ) vector_db_id = res.identifier ``` And then the rest of their code would behave, including `VectorIO`'s insert protocol using `vector_db_id` in the request. An alternative implementation would be to just delete the `vector_db_id` parameter in `VectorDB` but the end result would still require users having to write `vector_db_id = res.identifier` since `VectorStores.create()` generates the ID for you. So this approach felt the easiest way to migrate users towards VectorStores (subsequent PRs will be added to trigger `files.create()` and `vector_stores.files.create()`). ## Test Plan Unit tests and integration tests have been added. Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-09-05 15:40:34 +02:00
Sumanth Kamenani	0b00c68d59	fix: use lambda pattern for bedrock config env vars (#3307 ) # What does this PR do? Improved bedrock provider config to read from environment variables like AWS_ACCESS_KEY_ID. Updated all fields to use default_factory with lambda patterns like the nvidia provider does. Now the environment variables work as documented. Closes #3305 ## Test Plan Ran the new bedrock config tests: ```bash python -m pytest tests/unit/providers/inference/bedrock/test_config.py -v Verified existing provider tests still work: python -m pytest tests/unit/providers/test_configs.py -v	2025-09-05 10:45:11 +02:00
ehhuang	3a7ac4227d	chore: unbreak inference store test (#3340 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Python Package Build Test / build (3.12) (push) Failing after 1s Details Python Package Build Test / build (3.13) (push) Failing after 1s Details Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 2s Details Vector IO Integration Tests / test-matrix (push) Failing after 4s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details Unit Tests / unit-tests (3.12) (push) Failing after 4s Details Unit Tests / unit-tests (3.13) (push) Failing after 5s Details UI Tests / ui-tests (22) (push) Successful in 1m21s Details Pre-commit / pre-commit (push) Successful in 2m27s Details # What does this PR do? The inference store writes were moved to asyncio.create_task and not await anymore ## Test Plan ❯ OLLAMA_URL=http://localhost:11434 LLAMA_STACK_CONFIG=server:starter uv run --with pytest-repeat pytest tests/integration/inference --text-model="ollama/llama3.2:3b-instruct-fp16" -vvs -k "test_inference_store_tool_calls and 3b-instruct-fp16-True" --count=10 Uninstalled 2 packages in 102ms Installed 2 packages in 138ms INFO 2025-09-04 14:10:17,775 tests.integration.conftest:66 tests: Setting DISABLE_CODE_SANDBOX=1 for macOS ========================================================================================================== test session starts =========================================================================================================== platform darwin -- Python 3.12.3, pytest-8.4.1, pluggy-1.6.0 -- /Users/erichuang/.cache/uv/builds-v0/.tmpSGMlgt/bin/python cachedir: .pytest_cache metadata: {'Python': '3.12.3', 'Platform': 'macOS-15.6.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.1', 'pluggy': '1.6.0'}, 'Plugins': {'repeat': '0.9.4', 'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0'}} rootdir: /Users/erichuang/projects/llama-stack-git configfile: pyproject.toml plugins: repeat-0.9.4, anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0 asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function collected 970 items / 950 deselected / 20 selected tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=ollama/llama3.2:3b-instruct-fp16-True-1-10] instantiating llama_stack_client Starting llama stack server with config 'starter' on port 8321... Waiting for server at http://localhost:8321... (0.0s elapsed) Waiting for server at http://localhost:8321... (0.5s elapsed) Waiting for server at http://localhost:8321... (5.1s elapsed) Waiting for server at http://localhost:8321... (5.6s elapsed) Waiting for server at http://localhost:8321... (10.1s elapsed) Waiting for server at http://localhost:8321... (10.6s elapsed) Waiting for server at http://localhost:8321... (15.2s elapsed) Waiting for server at http://localhost:8321... (15.7s elapsed) Server is ready at http://localhost:8321 llama_stack_client instantiated in 20.583s PASSED tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=ollama/llama3.2:3b-instruct-fp16-True-2-10] PASSED tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=ollama/llama3.2:3b-instruct-fp16-True-3-10] PASSED tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=ollama/llama3.2:3b-instruct-fp16-True-4-10] PASSED tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=ollama/llama3.2:3b-instruct-fp16-True-5-10] PASSED tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=ollama/llama3.2:3b-instruct-fp16-True-6-10] PASSED tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=ollama/llama3.2:3b-instruct-fp16-True-7-10] PASSED tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=ollama/llama3.2:3b-instruct-fp16-True-8-10] PASSED tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=ollama/llama3.2:3b-instruct-fp16-True-9-10] PASSED tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=ollama/llama3.2:3b-instruct-fp16-True-10-10] PASSED tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=ollama/llama3.2:3b-instruct-fp16-True-1-10] PASSED tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=ollama/llama3.2:3b-instruct-fp16-True-2-10] PASSED tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=ollama/llama3.2:3b-instruct-fp16-True-3-10] PASSED tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=ollama/llama3.2:3b-instruct-fp16-True-4-10] PASSED tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=ollama/llama3.2:3b-instruct-fp16-True-5-10] PASSED tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=ollama/llama3.2:3b-instruct-fp16-True-6-10] PASSED tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=ollama/llama3.2:3b-instruct-fp16-True-7-10] PASSED tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=ollama/llama3.2:3b-instruct-fp16-True-8-10] PASSED tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=ollama/llama3.2:3b-instruct-fp16-True-9-10] PASSED tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=ollama/llama3.2:3b-instruct-fp16-True-10-10] PASSEDTerminating llama stack server process... Terminating process 53307 and its group... Server process and children terminated gracefully	2025-09-04 15:13:31 -07:00
Sumanth Kamenani	55a8c5f439	fix: show descriptive MCP server connection errors instead of generic 500s (#3256 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Python Package Build Test / build (3.12) (push) Failing after 1s Details Python Package Build Test / build (3.13) (push) Failing after 1s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 3s Details Vector IO Integration Tests / test-matrix (push) Failing after 5s Details Unit Tests / unit-tests (3.12) (push) Failing after 3s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details Update ReadTheDocs / update-readthedocs (push) Failing after 3s Details Unit Tests / unit-tests (3.13) (push) Failing after 3s Details UI Tests / ui-tests (22) (push) Successful in 1m20s Details Pre-commit / pre-commit (push) Successful in 2m37s Details What does this PR do? Fixes error handling when MCP server connections fail. Instead of returning generic 500 errors, now provides descriptive error messages with proper HTTP status codes. Closes #3107 Test Plan Before fix: curl -X GET "http://localhost:8321/v1/tool-runtime/list-tools?tool_group_id=bad-mcp-server" Returns: {"detail": "Internal server error: An unexpected error occurred."} (500) After fix: curl -X GET "http://localhost:8321/v1/tool-runtime/list-tools?tool_group_id=bad-mcp-server" Returns: {"error": {"detail": "Failed to connect to MCP server at http://localhost:9999/sse: Connection refused"}} (502) Tests: - Added unit test for ConnectionError → 502 translation - Manually tested with unreachable MCP servers (connection refused)	2025-09-04 13:25:02 -07:00
Ashwin Bharambe	02f6e0f531	fix(tests): set inference mode to be replay by default (#3326 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Python Package Build Test / build (3.12) (push) Failing after 1s Details Python Package Build Test / build (3.13) (push) Failing after 1s Details Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 3s Details Vector IO Integration Tests / test-matrix (push) Failing after 3s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details Unit Tests / unit-tests (3.12) (push) Failing after 3s Details Unit Tests / unit-tests (3.13) (push) Failing after 4s Details UI Tests / ui-tests (22) (push) Successful in 1m19s Details Pre-commit / pre-commit (push) Successful in 2m30s Details `construct_stack()` relies on the environment variable to know when to setup the patching infrastructure. `c3d3a0b833/llama_stack/core/stack.py (L314)`	2025-09-03 15:57:17 -07:00
Ashwin Bharambe	c3d3a0b833	feat(tests): auto-merge all model list responses and unify recordings (#3320 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 3s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 4s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 7s Details Update ReadTheDocs / update-readthedocs (push) Failing after 3s Details Test External API and Providers / test-external (venv) (push) Failing after 5s Details Vector IO Integration Tests / test-matrix (push) Failing after 7s Details Python Package Build Test / build (3.13) (push) Failing after 8s Details Python Package Build Test / build (3.12) (push) Failing after 8s Details Unit Tests / unit-tests (3.13) (push) Failing after 14s Details Unit Tests / unit-tests (3.12) (push) Failing after 14s Details UI Tests / ui-tests (22) (push) Successful in 1m7s Details Pre-commit / pre-commit (push) Successful in 2m34s Details One needed to specify record-replay related environment variables for running integration tests. We could not use defaults because integration tests could be run against Ollama instances which could be running different models. For example, text vs vision tests needed separate instances of Ollama because a single instance typically cannot serve both of these models if you assume the standard CI worker configuration on Github. As a result, `client.list()` as returned by the Ollama client would be different between these runs and we'd end up overwriting responses. This PR "solves" it by adding a small amount of complexity -- we store model list responses specially, keyed by the hashes of the models they return. At replay time, we merge all of them and pretend that we have the union of all models available. ## Test Plan Re-recorded all the tests using `scripts/integration-tests.sh --inference-mode record`, including the vision tests.	2025-09-03 11:33:03 -07:00
Varsha	c59d8c5047	fix: Fix mock vector DB schema in Qdrant tests (#3295 ) # What does this PR do? Fix: https://github.com/llamastack/llama-stack/issues/3293 <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> ``` ===================================================== test session starts ===================================================== platform darwin -- Python 3.12.11, pytest-7.4.4, pluggy-1.5.0 -- /Users/vnarsing/miniconda3/envs/stack-client/bin/python cachedir: .pytest_cache metadata: {'Python': '3.12.11', 'Platform': 'macOS-14.7.7-arm64-arm-64bit', 'Packages': {'pytest': '7.4.4', 'pluggy': '1.5.0'}, 'Plugins': {'asyncio': '0.23.8', 'cov': '6.0.0', 'timeout': '2.2.0', 'socket': '0.7.0', 'xdist': '3.8.0', 'html': '3.1.1', 'langsmith': '0.3.39', 'anyio': '4.8.0', 'metadata': '3.0.0'}} rootdir: /Users/vnarsing/go/src/github/meta-llama/llama-stack configfile: pyproject.toml plugins: asyncio-0.23.8, cov-6.0.0, timeout-2.2.0, socket-0.7.0, xdist-3.8.0, html-3.1.1, langsmith-0.3.39, anyio-4.8.0, metadata-3.0.0 asyncio: mode=Mode.AUTO collected 3 items tests/unit/providers/vector_io/test_qdrant.py::test_qdrant_adapter_returns_expected_chunks[2-2] PASSED [ 33%] tests/unit/providers/vector_io/test_qdrant.py::test_qdrant_adapter_returns_expected_chunks[100-60] PASSED [ 66%] tests/unit/providers/vector_io/test_qdrant.py::test_qdrant_register_and_unregister_vector_db PASSED [100%] ``` Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>	2025-09-03 09:59:16 +02:00
Matthew Farrellee	478b4ff1e6	chore(migrate apis): move VectorDBWithIndex from embeddings to openai_embeddings (#3294 ) # What does this PR do? migrates VectorDBWithIndex to use openai_embeddings part of #2365 ## Test Plan existing unit tests	2025-08-31 14:48:35 -07:00
Matthew Farrellee	3370d8e557	feat(files, s3, expiration): add expires_after support to S3 files provider (#3283 )	2025-08-29 16:17:24 -07:00
IAN MILLER	3130ca0a78	feat: implement keyword, vector and hybrid search inside vector stores for PGVector provider (#3064 ) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> The purpose of this task is to implement `openai/v1/vector_stores/{vector_store_id}/search` for PGVector provider. It involves implementing vector similarity search, keyword search and hybrid search for `PGVectorIndex`. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Closes #3006 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> Run unit tests: ` ./scripts/unit-tests.sh ` Run integration tests for openai vector stores: 1. Export env vars: ``` export ENABLE_PGVECTOR=true export PGVECTOR_HOST=localhost export PGVECTOR_PORT=5432 export PGVECTOR_DB=llamastack export PGVECTOR_USER=llamastack export PGVECTOR_PASSWORD=llamastack ``` 2. Create DB: ``` psql -h localhost -U postgres -c "CREATE ROLE llamastack LOGIN PASSWORD 'llamastack';" psql -h localhost -U postgres -c "CREATE DATABASE llamastack OWNER llamastack;" psql -h localhost -U llamastack -d llamastack -c "CREATE EXTENSION IF NOT EXISTS vector;" ``` 3. Install sentence-transformers: ` uv pip install sentence-transformers ` 4. Run: ``` uv run --group test pytest -s -v --stack-config="inference=inline::sentence-transformers,vector_io=remote::pgvector" --embedding-model sentence-transformers/all-MiniLM-L6-v2 tests/integration/vector_io/test_openai_vector_stores.py ``` Inspect PGVector vector stores (optional): ``` psql llamastack psql (14.18 (Homebrew)) Type "help" for help. llamastack=# \z Access privileges Schema \| Name \| Type \| Access privileges \| Column privileges \| Policies --------+------------------------------------------------------+-------+-------------------+-------------------+---------- public \| llamastack_kvstore \| table \| \| \| public \| metadata_store \| table \| \| \| public \| vector_store_pgvector_main \| table \| \| \| public \| vector_store_vs_1dfbc061_1f4d_4497_9165_ecba2622ba3a \| table \| \| \| public \| vector_store_vs_2085a9fb_1822_4e42_a277_c6a685843fa7 \| table \| \| \| public \| vector_store_vs_2b3dae46_38be_462a_afd6_37ee5fe661b1 \| table \| \| \| public \| vector_store_vs_2f438de6_f606_4561_9d50_ef9160eb9060 \| table \| \| \| public \| vector_store_vs_3eeca564_2580_4c68_bfea_83dc57e31214 \| table \| \| \| public \| vector_store_vs_53942163_05f3_40e0_83c0_0997c64613da \| table \| \| \| public \| vector_store_vs_545bac75_8950_4ff1_b084_e221192d4709 \| table \| \| \| public \| vector_store_vs_688a37d8_35b2_4298_a035_bfedf5b21f86 \| table \| \| \| public \| vector_store_vs_70624d9a_f6ac_4c42_b8ab_0649473c6600 \| table \| \| \| public \| vector_store_vs_73fc1dd2_e942_4972_afb1_1e177b591ac2 \| table \| \| \| public \| vector_store_vs_9d464949_d51f_49db_9f87_e033b8b84ac9 \| table \| \| \| public \| vector_store_vs_a1e4d724_5162_4d6d_a6c0_bdafaf6b76ec \| table \| \| \| public \| vector_store_vs_a328fb1b_1a21_480f_9624_ffaa60fb6672 \| table \| \| \| public \| vector_store_vs_a8981bf0_2e66_4445_a267_a8fff442db53 \| table \| \| \| public \| vector_store_vs_ccd4b6a4_1efd_4984_ad03_e7ff8eadb296 \| table \| \| \| public \| vector_store_vs_cd6420a4_a1fc_4cec_948c_1413a26281c9 \| table \| \| \| public \| vector_store_vs_cd709284_e5cf_4a88_aba5_dc76a35364bd \| table \| \| \| public \| vector_store_vs_d7a4548e_fbc1_44d7_b2ec_b664417f2a46 \| table \| \| \| public \| vector_store_vs_e7f73231_414c_4523_886c_d1174eee836e \| table \| \| \| public \| vector_store_vs_ffd53588_819f_47e8_bb9d_954af6f7833d \| table \| \| \| (23 rows) llamastack=# ``` Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>	2025-08-29 16:30:12 +02:00
Matthew Farrellee	e96e3c4da4	feat(s3 auth): add authorization support for s3 files provider (#3265 ) # What does this PR do? adds support for authorized users to the s3 files provider ## Test Plan existing and new unit tests	2025-08-29 16:14:00 +02:00
Matthew Farrellee	ed418653ec	chore(dev): add inequality support to sqlstore where clause (#3272 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 1s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 2s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s Details Vector IO Integration Tests / test-matrix (push) Failing after 1s Details Pre-commit / pre-commit (push) Failing after 1s Details Python Package Build Test / build (3.12) (push) Failing after 0s Details Python Package Build Test / build (3.13) (push) Failing after 1s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Test External API and Providers / test-external (venv) (push) Failing after 1s Details UI Tests / ui-tests (22) (push) Failing after 0s Details Unit Tests / unit-tests (3.12) (push) Failing after 1s Details Unit Tests / unit-tests (3.13) (push) Failing after 1s Details # What does this PR do? add the ability to use inequalities in the where clause of the sqlstore. this is infrastructure for files expiration. ## Test Plan unit tests	2025-08-28 14:49:36 -07:00
Omer Tuchfeld	52106d95d3	fix(env): env var replacement preserve types (#3270 ) # What does this PR do? During env var replacement, we're implicitly converting all config types to their apparent types (e.g., "true" to True, "123" to 123). This may be arguably useful for when doing an env var substitution, as those are always strings, but we should definitely avoid touching config values that have explicit types and are uninvolved in env var substitution. ## Test Plan Unit	2025-08-28 17:07:18 +02:00
Derek Higgins	7ca8233889	feat(testing): remove SQLite dependency from inference recorder (#3254 ) Recording files use a predictable naming format, making the SQLite index redundant. The binary SQLite file was causing frequent git conflicts. Simplify by calculating file paths directly from request hashes. Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-08-26 09:17:00 -07:00
Matthew Farrellee	cffc4edf47	feat: Add optional idempotency support to batches API (#3171 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 4s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 0s Details Test Llama Stack Build / build-single-provider (push) Failing after 2s Details Pre-commit / pre-commit (push) Failing after 4s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 5s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s Details Test Llama Stack Build / generate-matrix (push) Failing after 5s Details Test Llama Stack Build / build (push) Has been skipped Details Vector IO Integration Tests / test-matrix (push) Failing after 6s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 5s Details Python Package Build Test / build (3.13) (push) Failing after 4s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details Unit Tests / unit-tests (3.12) (push) Failing after 4s Details Update ReadTheDocs / update-readthedocs (push) Failing after 4s Details Python Package Build Test / build (3.12) (push) Failing after 7s Details Unit Tests / unit-tests (3.13) (push) Failing after 5s Details UI Tests / ui-tests (22) (push) Failing after 6s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 14s Details Implements optional idempotency for batch creation using `idem_tok` parameter: * Core idempotency: Same token + parameters returns existing batch * Conflict detection: Same token + different parameters raises HTTP 409 ConflictError * Metadata order independence: Different key ordering doesn't affect idempotency API changes: - Add optional `idem_tok` parameter to `create_batch()` method - Enhanced API documentation with idempotency extensions Implementation: - Reference provider supports idempotent batch creation - ConflictError for proper HTTP 409 status code mapping - Comprehensive parameter validation Testing: - Unit tests: focused tests covering core scenarios with parametrized conflict detection - Integration tests: tests validating real OpenAI client behavior This enables client-side retry safety and prevents duplicate batch creation when using the same idempotency token, following REST API closes #3144	2025-08-22 15:50:40 -07:00
Charlie Doern	3b9278f254	feat: implement query_metrics (#3074 ) # What does this PR do? query_metrics currently has no implementation, meaning once a metric is emitted there is no way in llama stack to query it from the store. implement query_metrics for the meta_reference provider which follows a similar style to `query_traces`, using the trace_store to format an SQL query and execute it in this case the parameters for the query are `metric.METRIC_NAME, start_time, and end_time` and any other matchers if they are provided. this required client side changes since the client had no `query_metrics` or any associated resources, so any tests here will fail but I will provide manual execution logs for the new tests I am adding order the metrics by timestamp. Additionally add `unit` to the `MetricDataPoint` class since this adds much more context to the metric being queried. depends on https://github.com/llamastack/llama-stack-client-python/pull/260 ## Test Plan ``` import time import uuid def create_http_client(): from llama_stack_client import LlamaStackClient return LlamaStackClient(base_url="http://localhost:8321") client = create_http_client() response = client.telemetry.query_metrics(metric_name="total_tokens", start_time=0) print(response) ``` ``` ╰─ python3.12 ~/telemetry.py INFO:httpx:HTTP Request: POST http://localhost:8322/v1/telemetry/metrics/total_tokens "HTTP/1.1 200 OK" [TelemetryQueryMetricsResponse(data=None, metric='total_tokens', labels=[], values=[{'timestamp': 1753999514, 'value': 34.0, 'unit': 'tokens'}, {'timestamp': 1753999816, 'value': 34.0, 'unit': 'tokens'}, {'timestamp': 1753999881, 'value': 34.0, 'unit': 'tokens'}, {'timestamp': 1753999956, 'value': 34.0, 'unit': 'tokens'}, {'timestamp': 1754000200, 'value': 34.0, 'unit': 'tokens'}, {'timestamp': 1754000419, 'value': 36.0, 'unit': 'tokens'}, {'timestamp': 1754000714, 'value': 36.0, 'unit': 'tokens'}, {'timestamp': 1754000876, 'value': 36.0, 'unit': 'tokens'}, {'timestamp': 1754000908, 'value': 34.0, 'unit': 'tokens'}, {'timestamp': 1754001309, 'value': 584.0, 'unit': 'tokens'}, {'timestamp': 1754001311, 'value': 138.0, 'unit': 'tokens'}, {'timestamp': 1754001316, 'value': 349.0, 'unit': 'tokens'}, {'timestamp': 1754001318, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001320, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001341, 'value': 923.0, 'unit': 'tokens'}, {'timestamp': 1754001350, 'value': 354.0, 'unit': 'tokens'}, {'timestamp': 1754001462, 'value': 417.0, 'unit': 'tokens'}, {'timestamp': 1754001464, 'value': 158.0, 'unit': 'tokens'}, {'timestamp': 1754001475, 'value': 697.0, 'unit': 'tokens'}, {'timestamp': 1754001477, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001479, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001489, 'value': 298.0, 'unit': 'tokens'}, {'timestamp': 1754001541, 'value': 615.0, 'unit': 'tokens'}, {'timestamp': 1754001543, 'value': 119.0, 'unit': 'tokens'}, {'timestamp': 1754001548, 'value': 310.0, 'unit': 'tokens'}, {'timestamp': 1754001549, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001551, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001568, 'value': 714.0, 'unit': 'tokens'}, {'timestamp': 1754001800, 'value': 437.0, 'unit': 'tokens'}, {'timestamp': 1754001802, 'value': 200.0, 'unit': 'tokens'}, {'timestamp': 1754001806, 'value': 262.0, 'unit': 'tokens'}, {'timestamp': 1754001808, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001810, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001816, 'value': 82.0, 'unit': 'tokens'}, {'timestamp': 1754001923, 'value': 61.0, 'unit': 'tokens'}, {'timestamp': 1754001929, 'value': 391.0, 'unit': 'tokens'}, {'timestamp': 1754001939, 'value': 598.0, 'unit': 'tokens'}, {'timestamp': 1754001941, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001942, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001952, 'value': 252.0, 'unit': 'tokens'}, {'timestamp': 1754002053, 'value': 251.0, 'unit': 'tokens'}, {'timestamp': 1754002059, 'value': 375.0, 'unit': 'tokens'}, {'timestamp': 1754002062, 'value': 244.0, 'unit': 'tokens'}, {'timestamp': 1754002064, 'value': 111.0, 'unit': 'tokens'}, {'timestamp': 1754002065, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754002083, 'value': 719.0, 'unit': 'tokens'}, {'timestamp': 1754002302, 'value': 279.0, 'unit': 'tokens'}, {'timestamp': 1754002306, 'value': 218.0, 'unit': 'tokens'}, {'timestamp': 1754002308, 'value': 198.0, 'unit': 'tokens'}, {'timestamp': 1754002309, 'value': 69.0, 'unit': 'tokens'}, {'timestamp': 1754002311, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754002324, 'value': 481.0, 'unit': 'tokens'}, {'timestamp': 1754003161, 'value': 579.0, 'unit': 'tokens'}, {'timestamp': 1754003161, 'value': 69.0, 'unit': 'tokens'}, {'timestamp': 1754003169, 'value': 499.0, 'unit': 'tokens'}, {'timestamp': 1754003171, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754003173, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754003185, 'value': 422.0, 'unit': 'tokens'}, {'timestamp': 1754003448, 'value': 579.0, 'unit': 'tokens'}, {'timestamp': 1754003453, 'value': 422.0, 'unit': 'tokens'}, {'timestamp': 1754003589, 'value': 579.0, 'unit': 'tokens'}, {'timestamp': 1754003609, 'value': 279.0, 'unit': 'tokens'}, {'timestamp': 1754003614, 'value': 481.0, 'unit': 'tokens'}, {'timestamp': 1754003706, 'value': 303.0, 'unit': 'tokens'}, {'timestamp': 1754003706, 'value': 51.0, 'unit': 'tokens'}, {'timestamp': 1754003713, 'value': 426.0, 'unit': 'tokens'}, {'timestamp': 1754003714, 'value': 70.0, 'unit': 'tokens'}, {'timestamp': 1754003715, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754003724, 'value': 225.0, 'unit': 'tokens'}, {'timestamp': 1754004226, 'value': 516.0, 'unit': 'tokens'}, {'timestamp': 1754004228, 'value': 127.0, 'unit': 'tokens'}, {'timestamp': 1754004232, 'value': 281.0, 'unit': 'tokens'}, {'timestamp': 1754004234, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754004236, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754004244, 'value': 206.0, 'unit': 'tokens'}, {'timestamp': 1754004683, 'value': 338.0, 'unit': 'tokens'}, {'timestamp': 1754004690, 'value': 481.0, 'unit': 'tokens'}, {'timestamp': 1754004692, 'value': 124.0, 'unit': 'tokens'}, {'timestamp': 1754004692, 'value': 65.0, 'unit': 'tokens'}, {'timestamp': 1754004694, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754004703, 'value': 211.0, 'unit': 'tokens'}, {'timestamp': 1754004743, 'value': 338.0, 'unit': 'tokens'}, {'timestamp': 1754004749, 'value': 211.0, 'unit': 'tokens'}, {'timestamp': 1754005566, 'value': 481.0, 'unit': 'tokens'}, {'timestamp': 1754006101, 'value': 159.0, 'unit': 'tokens'}, {'timestamp': 1754006105, 'value': 272.0, 'unit': 'tokens'}, {'timestamp': 1754006109, 'value': 308.0, 'unit': 'tokens'}, {'timestamp': 1754006110, 'value': 61.0, 'unit': 'tokens'}, {'timestamp': 1754006112, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754006130, 'value': 705.0, 'unit': 'tokens'}, {'timestamp': 1754051825, 'value': 454.0, 'unit': 'tokens'}, {'timestamp': 1754051827, 'value': 152.0, 'unit': 'tokens'}, {'timestamp': 1754051834, 'value': 481.0, 'unit': 'tokens'}, {'timestamp': 1754051835, 'value': 55.0, 'unit': 'tokens'}, {'timestamp': 1754051837, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754051845, 'value': 102.0, 'unit': 'tokens'}, {'timestamp': 1754099929, 'value': 36.0, 'unit': 'tokens'}, {'timestamp': 1754510050, 'value': 598.0, 'unit': 'tokens'}, {'timestamp': 1754510052, 'value': 160.0, 'unit': 'tokens'}, {'timestamp': 1754510064, 'value': 725.0, 'unit': 'tokens'}, {'timestamp': 1754510065, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754510067, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754510083, 'value': 535.0, 'unit': 'tokens'}, {'timestamp': 1754596582, 'value': 36.0, 'unit': 'tokens'}])] ``` adding tests for each currently documented metric in llama stack using this new function. attached is also some manual testing integrations tests passing locally with replay mode and the linked client changes: <img width="1907" height="529" alt="Screenshot 2025-08-08 at 2 49 14 PM" src="https://github.com/user-attachments/assets/d482ab06-dcff-4f0c-a1f1-f870670ee9bc" /> --------- Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-08-22 14:19:24 -07:00
grs	da73f1a180	fix: ensure assistant message is followed by tool call message as expected by openai (#3224 ) Some checks failed Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Vector IO Integration Tests / test-matrix (push) Failing after 4s Details Pre-commit / pre-commit (push) Failing after 4s Details Python Package Build Test / build (3.13) (push) Failing after 3s Details Test Llama Stack Build / build-single-provider (push) Failing after 5s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s Details Python Package Build Test / build (3.12) (push) Failing after 5s Details Unit Tests / unit-tests (3.13) (push) Failing after 4s Details UI Tests / ui-tests (22) (push) Failing after 5s Details Unit Tests / unit-tests (3.12) (push) Failing after 6s Details Test External API and Providers / test-external (venv) (push) Failing after 8s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 12s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 15s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 17s Details Test Llama Stack Build / generate-matrix (push) Failing after 21s Details Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 23s Details Test Llama Stack Build / build (push) Has been skipped Details Update ReadTheDocs / update-readthedocs (push) Failing after 20s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 24s Details # What does this PR do? As described in #3134 a langchain example works against openai's responses impl, but not against llama stack's. This turned out to be due to the order of the inputs. The langchain example has the two function call outputs first, followed by each call result in turn. This seems to be valid as it is accepted by openai's impl. However in llama stack, these inputs are converted to chat completion inputs and the resulting order for that api is not accpeted by openai. This PR fixes the issue by ensuring that the converted chat completions inputs are in the expected order. Closes #3134 ## Test Plan Added unit and integration tests. Verified this fixes original issue as reported. --------- Signed-off-by: Gordon Sim <gsim@redhat.com>	2025-08-22 10:42:03 -07:00
Matthew Farrellee	f520e244d9	feat: Add S3 Files Provider (#3202 ) Implements a complete S3-based file storage provider for Llama Stack with: Core Implementation: - S3FilesImpl class with full OpenAI Files API compatibility - Support for file upload, download, listing, deletion operations - Sqlite-based metadata storage for fast queries and API compliance - Configurable S3 endpoints (AWS, MinIO, LocalStack support) Key Features: - Automatic S3 bucket creation and management - Metadata persistence - Proper error handling for S3 connectivity and permissions Dependencies: - Adds boto3 for AWS S3 integration - Adds moto[s3] for testing infrastructure Testing: Unit: `./scripts/unit-tests.sh tests/unit/files tests/unit/providers/files` Integration: Start MinIO: `podman run --rm -it -p 9000:9000 minio/minio server /data` Start stack w/ S3 provider: `S3_ENDPOINT_URL=http://localhost:9000 AWS_ACCESS_KEY_ID=minioadmin AWS_SECRET_ACCESS_KEY=minioadmin S3_BUCKET_NAME=llama-stack-files uv run llama stack build --image-type venv --providers files=remote::s3 --run` Run integration tests: `./scripts/integration-tests.sh --stack-config http://localhost:8321 --provider ollama --test-subdirs files`	2025-08-22 10:38:59 -04:00
Jiayi Ni	deffaa9e4e	fix: fix the error type in embedding test case (#3197 ) # What does this PR do? Currently the embedding integration test cases fail due to a misalignment in the error type. This PR fixes the embedding integration test by fixing the error type. ## Test Plan ``` pytest -s -v tests/integration/inference/test_embedding.py --stack-config="inference=nvidia" --embedding-model="nvidia/llama-3.2-nv-embedqa-1b-v2" --env NVIDIA_API_KEY={nvidia_api_key} --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com" ```	2025-08-21 16:19:51 -07:00
Mustafa Elbehery	1790fc0f25	feat: Remove initialize() Method from LlamaStackAsLibrary (#2979 ) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> This PR removes `init()` from `LlamaStackAsLibrary` Currently client.initialize() had to be invoked by user. To improve dev experience and to avoid runtime errors, this PR init LlamaStackAsLibrary implicitly upon using the client. It prevents also multiple init of the same client, while maintaining backward ccompatibility. This PR does the following - Automatic Initialization: Constructor calls initialize_impl() automatically. - Client is fully initialized after __init__ completes. - Prevents consecutive initialization after the client has been successfully initialized. - initialize() method still exists but is now a no-op. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> fixes https://github.com/meta-llama/llama-stack/issues/2946 --------- Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>	2025-08-21 15:59:04 -07:00
Sumanth Kamenani	ac25e35124	feat: Add CORS configuration support for server (#3201 ) Adds flexible CORS (Cross-Origin Resource Sharing) configuration support to the FastAPI server with both local development and explicit configuration modes: - Local development mode: `cors: true` enables localhost-only access with regex pattern `https?://localhost:\d+` - Explicit configuration mode: Specific origins configuration with credential support and validation - Prevents insecure combinations (wildcards with credentials) - FastAPI CORSMiddleware integration via `model_dump()` Addresses the need for configurable CORS policies to support web frontends and cross-origin API access while maintaining security. Closes #2119 ## Test Plan 1. Ran Unit Tests. 2. Manual tests: FastAPI middleware integration with actual HTTP requests - Local development mode localhost access validation - Explicit configuration mode origins validation - Preflight OPTIONS request handling Some screenshots of manual tests. <img width="1920" height="927" alt="image" src="https://github.com/user-attachments/assets/79322338-40c7-45c9-a9ea-e3e8d8e2f849" /> <img width="1911" height="1037" alt="image" src="https://github.com/user-attachments/assets/1683524e-b0c9-48c9-a0a5-782e949cde01" /> cc: @leseb @rhuss @franciscojavierarceo	2025-08-21 14:23:27 -07:00
grs	14082b22af	fix: handle mcp tool calls in previous response correctly (#3155 ) # What does this PR do? Handles MCP tool calls in a previous response Closes #3105 ## Test Plan Made call to create response with tool call, then made second call with the first linked through previous_response_id. Did not get error. Also added unit test. Signed-off-by: Gordon Sim <gsim@redhat.com>	2025-08-20 14:12:15 -07:00
Matthew Farrellee	c2c859a6b0	chore(files tests): update files integration tests and fix inline::localfs (#3195 ) - update files=inline::localfs to raise ResourceNotFoundError instead of ValueError - only skip tests when no files provider is available - directly use openai_client and llama_stack_client where appropriate - check for correct behavior of non-existent file - xfail the isolation test, no implementation supports it test plan - ``` $ uv run ./scripts/integration-tests.sh --stack-config server:ci-tests --provider ollama --test-subdirs files ... tests/integration/files/test_files.py::test_openai_client_basic_operations PASSED [ 25%] tests/integration/files/test_files.py::test_files_authentication_isolation XFAIL [ 50%] tests/integration/files/test_files.py::test_files_authentication_shared_attributes PASSED [ 75%] tests/integration/files/test_files.py::test_files_authentication_anonymous_access PASSED [100%] ==================================== 3 passed, 1 xfailed in 1.03s ===================================== ``` previously - ``` $ uv run llama stack build --image-type venv --providers files=inline::localfs --run & ... $ ./scripts/integration-tests.sh --stack-config http://localhost:8321 --provider ollama --test-subdirs files ... tests/integration/files/test_files.py::test_openai_client_basic_operations[openai_client-ollama/llama3.2:3b-instruct-fp16-None-sentence-transformers/all-MiniLM-L6-v2-None-384] PASSED [ 12%] tests/integration/files/test_files.py::test_files_authentication_isolation[openai_client-ollama/llama3.2:3b-instruct-fp16-None-sentence-transformers/all-MiniLM-L6-v2-None-384] SKIPPED [ 25%] tests/integration/files/test_files.py::test_files_authentication_shared_attributes[openai_client-ollama/llama3.2:3b-instruct-fp16-None-sentence-transformers/all-MiniLM-L6-v2-None-384] SKIPPED [ 37%] tests/integration/files/test_files.py::test_files_authentication_anonymous_access[openai_client-ollama/llama3.2:3b-instruct-fp16-None-sentence-transformers/all-MiniLM-L6-v2-None-384] SKIPPED [ 50%] tests/integration/files/test_files.py::test_openai_client_basic_operations[client_with_models-ollama/llama3.2:3b-instruct-fp16-None-sentence-transformers/all-MiniLM-L6-v2-None-384] PASSED [ 62%] tests/integration/files/test_files.py::test_files_authentication_isolation[client_with_models-ollama/llama3.2:3b-instruct-fp16-None-sentence-transformers/all-MiniLM-L6-v2-None-384] SKIPPED [ 75%] tests/integration/files/test_files.py::test_files_authentication_shared_attributes[client_with_models-ollama/llama3.2:3b-instruct-fp16-None-sentence-transformers/all-MiniLM-L6-v2-None-384] SKIPPED [ 87%] tests/integration/files/test_files.py::test_files_authentication_anonymous_access[client_with_models-ollama/llama3.2:3b-instruct-fp16-None-sentence-transformers/all-MiniLM-L6-v2-None-384] SKIPPED [100%] ========================================================= 2 passed, 6 skipped in 1.31s ========================================================== ```	2025-08-20 14:22:40 -04:00
Mustafa Elbehery	3f8df167f3	chore(pre-commit): add pre-commit hook to enforce llama_stack logger usage (#3061 ) # What does this PR do? This PR adds a step in pre-commit to enforce using `llama_stack` logger. Currently, various parts of the code base uses different loggers. As a custom `llama_stack` logger exist and used in the codebase, it is better to standardize its utilization. Signed-off-by: Mustafa Elbehery <melbeher@redhat.com> Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>	2025-08-20 07:15:35 -04:00
Matthew Farrellee	e7a812f5de	chore: Fixup main pre commit (#3204 )	2025-08-19 14:52:38 -04:00
Varsha	8cc4925f7d	chore: Enable keyword search for Milvus inline (#3073 ) # What does this PR do? With https://github.com/milvus-io/milvus-lite/pull/294 - Milvus Lite supports keyword search using BM25. While introducing keyword search we had explicitly disabled it for inline milvus. This PR removes the need for the check, and enables `inline::milvus` for tests. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan Run llama stack with `inline::milvus` enabled: ``` pytest tests/integration/vector_io/test_openai_vector_stores.py::test_openai_vector_store_search_modes --stack-config=http://localhost:8321 --embedding-model=all-MiniLM-L6-v2 -v ``` ``` INFO 2025-08-07 17:06:20,932 tests.integration.conftest:64 tests: Setting DISABLE_CODE_SANDBOX=1 for macOS =========================================================================================== test session starts ============================================================================================ platform darwin -- Python 3.12.11, pytest-7.4.4, pluggy-1.5.0 -- /Users/vnarsing/miniconda3/envs/stack-client/bin/python cachedir: .pytest_cache metadata: {'Python': '3.12.11', 'Platform': 'macOS-14.7.6-arm64-arm-64bit', 'Packages': {'pytest': '7.4.4', 'pluggy': '1.5.0'}, 'Plugins': {'asyncio': '0.23.8', 'cov': '6.0.0', 'timeout': '2.2.0', 'socket': '0.7.0', 'html': '3.1.1', 'langsmith': '0.3.39', 'anyio': '4.8.0', 'metadata': '3.0.0'}} rootdir: /Users/vnarsing/go/src/github/meta-llama/llama-stack configfile: pyproject.toml plugins: asyncio-0.23.8, cov-6.0.0, timeout-2.2.0, socket-0.7.0, html-3.1.1, langsmith-0.3.39, anyio-4.8.0, metadata-3.0.0 asyncio: mode=Mode.AUTO collected 3 items tests/integration/vector_io/test_openai_vector_stores.py::test_openai_vector_store_search_modes[None-None-all-MiniLM-L6-v2-None-384-vector] PASSED [ 33%] tests/integration/vector_io/test_openai_vector_stores.py::test_openai_vector_store_search_modes[None-None-all-MiniLM-L6-v2-None-384-keyword] PASSED [ 66%] tests/integration/vector_io/test_openai_vector_stores.py::test_openai_vector_store_search_modes[None-None-all-MiniLM-L6-v2-None-384-hybrid] PASSED [100%] ============================================================================================ 3 passed in 4.75s ============================================================================================= ``` Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>	2025-08-19 13:01:23 -04:00
Ashwin Bharambe	eb07a0f86a	fix(ci, tests): ensure uv environments in CI are kosher, record tests (#3193 ) Some checks failed Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 21s Details Test Llama Stack Build / build-single-provider (push) Failing after 23s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 28s Details Test Llama Stack Build / generate-matrix (push) Successful in 25s Details Python Package Build Test / build (3.13) (push) Failing after 25s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 34s Details Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 37s Details Test External API and Providers / test-external (venv) (push) Failing after 33s Details Unit Tests / unit-tests (3.13) (push) Failing after 33s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 38s Details Python Package Build Test / build (3.12) (push) Failing after 1m0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1m4s Details Unit Tests / unit-tests (3.12) (push) Failing after 59s Details Test Llama Stack Build / build (push) Failing after 50s Details Vector IO Integration Tests / test-matrix (push) Failing after 1m48s Details UI Tests / ui-tests (22) (push) Successful in 2m12s Details Pre-commit / pre-commit (push) Successful in 2m41s Details I started this PR trying to unbreak a newly broken test `test_agent_name`. This test was broken all along but did not show up because during testing we were pulling the "non-updated" llama stack client. See this comment: https://github.com/llamastack/llama-stack/pull/3119#discussion_r2270988205 While fixing this, I encountered a large amount of badness in our CI workflow definitions. - We weren't passing `LLAMA_STACK_DIR` or `LLAMA_STACK_CLIENT_DIR` overrides to `llama stack build` at all in some cases. - Even when we did, we used `uv run` liberally. The first thing `uv run` does is "syncs" the project environment. This means, it is going to undo any mutations we might have done ourselves. But we make many mutations in our CI runners to these environments. The most important of which is why `llama stack build` where we install distro dependencies. As a result, when you tried to run the integration tests, you would see old, strange versions. ## Test Plan Re-record using: ``` sh scripts/integration-tests.sh --stack-config ci-tests \ --provider ollama --test-pattern test_agent_name --inference-mode record ``` Then re-run with `--inference-mode replay`. But: Eventually, this test turned out to be quite flaky for telemetry reasons. I haven't investigated it for now and just disabled it sadly since we have a release to push out.	2025-08-18 17:02:24 -07:00
slekkala1	7519ab4024	feat: Code scanner Provider impl for moderations api (#3100 ) # What does this PR do? Add CodeScanner implementations ## Test Plan `SAFETY_MODEL=CodeScanner LLAMA_STACK_CONFIG=starter uv run pytest -v tests/integration/safety/test_safety.py --text-model=llama3.2:3b-instruct-fp16 --embedding-model=all-MiniLM-L6-v2 --safety-shield=ollama` This PR need to land after this https://github.com/meta-llama/llama-stack/pull/3098	2025-08-18 14:15:40 -07:00
Ashwin Bharambe	5e7c2250be	test(recording): add a script to schedule recording workflow (#3170 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 3s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s Details Test Llama Stack Build / generate-matrix (push) Successful in 5s Details Python Package Build Test / build (3.13) (push) Failing after 5s Details Python Package Build Test / build (3.12) (push) Failing after 9s Details Test Llama Stack Build / build-single-provider (push) Failing after 10s Details Update ReadTheDocs / update-readthedocs (push) Failing after 10s Details Vector IO Integration Tests / test-matrix (push) Failing after 14s Details Unit Tests / unit-tests (3.13) (push) Failing after 10s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 14s Details Test External API and Providers / test-external (venv) (push) Failing after 13s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 17s Details Test Llama Stack Build / build (push) Failing after 9s Details Unit Tests / unit-tests (3.12) (push) Failing after 14s Details Pre-commit / pre-commit (push) Successful in 1m19s Details See comment here: https://github.com/llamastack/llama-stack/pull/3162#issuecomment-3192859097 -- TL;DR it is quite complex to invoke the recording workflow correctly for an end developer writing tests. This script simplifies the work. No more manual GitHub UI navigation! ## Script Functionality - Auto-detects your current branch and associated PR - Finds the right repository context (works from forks!) - Runs the workflow where it can actually commit back - Validates prerequisites and provides helpful error messages ## How to Use First ensure you are on the branch which introduced a new test and want it recorded. Make sure you have pushed this branch remotely, easiest is to create a PR. ``` # Record tests for current branch ./scripts/github/schedule-record-workflow.sh # Record specific test subdirectories ./scripts/github/schedule-record-workflow.sh --test-subdirs "agents,inference" # Record with vision tests enabled ./scripts/github/schedule-record-workflow.sh --run-vision-tests # Record tests matching a pattern ./scripts/github/schedule-record-workflow.sh --test-pattern "test_streaming" ``` ## Test Plan Ran `./scripts/github/schedule-record-workflow.sh -s inference -k tool_choice` which started `4820409329` which successfully committed recorded outputs.	2025-08-15 16:54:34 -07:00
Matthew Farrellee	914c7be288	feat: add batches API with OpenAI compatibility (with inference replay) (#3162 ) Add complete batches API implementation with protocol, providers, and tests: Core Infrastructure: - Add batches API protocol using OpenAI Batch types directly - Add Api.batches enum value and protocol mapping in resolver - Add OpenAI "batch" file purpose support - Include proper error handling (ConflictError, ResourceNotFoundError) Reference Provider: - Add ReferenceBatchesImpl with full CRUD operations (create, retrieve, cancel, list) - Implement background batch processing with configurable concurrency - Add SQLite KVStore backend for persistence - Support /v1/chat/completions endpoint with request validation Comprehensive Test Suite: - Add unit tests for provider implementation with validation - Add integration tests for end-to-end batch processing workflows - Add error handling tests for validation, malformed inputs, and edge cases Configuration: - Add max_concurrent_batches and max_concurrent_requests_per_batch options - Add provider documentation with sample configurations Test with - ``` $ uv run llama stack build --image-type venv --providers inference=YOU_PICK,files=inline::localfs,batches=inline::reference --run & $ LLAMA_STACK_CONFIG=http://localhost:8321 uv run pytest tests/unit/providers/batches tests/integration/batches --text-model YOU_PICK ``` addresses #3066 --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-08-15 15:34:15 -07:00
Ashwin Bharambe	0e8bb94bf3	feat(ci): make recording workflow simpler, more parameterizable (#3169 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Python Package Build Test / build (3.13) (push) Failing after 4s Details Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 7s Details Python Package Build Test / build (3.12) (push) Failing after 12s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 14s Details Update ReadTheDocs / update-readthedocs (push) Failing after 12s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 17s Details Test External API and Providers / test-external (venv) (push) Failing after 15s Details Vector IO Integration Tests / test-matrix (push) Failing after 28s Details Unit Tests / unit-tests (3.12) (push) Failing after 27s Details Unit Tests / unit-tests (3.13) (push) Failing after 51s Details Pre-commit / pre-commit (push) Successful in 2m6s Details # What does this PR do? Recording tests has become a nightmare. This is the first part of making that process simpler by making it _less_ automatic. I tried to be too clever earlier. It simplifies the record-integration-tests workflow to use workflow dispatch inputs instead of PR labels. No more opaque stuff. Just go to the GitHub UI and run the workflow with inputs. I will soon add a helper script for this also. Other things to aid re-running just the small set of things you need to re-record: - Replaces the `test-types` JSON array parameter with a more intuitive `test-subdirs` comma-separated list. The whole JSON array crap was for matrix. - Adds a new `test-pattern` parameter to allow filtering tests using pytest's `-k` option ## Test Plan Note that this PR is in a fork not the source repository. - Replay tests on this PR are green - Manually [ran](`1699856292`) the replay workflow with a test-subdir and test-pattern filter, worked - Manually [ran](`4819508034`) the record workflow with a simple pattern, it has worked and updated _this_ PR. --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-08-15 14:47:20 -07:00
ashwinb	f66ae3b3b1	docs(tests): Add a bunch of documentation for our testing systems (#3139 ) # What does this PR do? Creates a structured testing documentation section with multiple detailed pages: - Testing overview explaining the record-replay architecture - Integration testing guide with practical usage examples - Record-replay system technical documentation - Guide for writing effective tests - Troubleshooting guide for common testing issues Hopefully this makes things a bit easier.	2025-08-15 17:45:30 +00:00

1 2 3 4 5 ...

535 commits