llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-10-04 12:07:34 +00:00

Author	SHA1	Message	Date
Nathan Weinberg	2f58d87c22	docs: fix typos in RAG docs (#3530 ) Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-09-23 14:30:24 -07:00
Matthew Farrellee	ce7a3b4dff	feat: update Cerebras inference provider to support dynamic model listing (#3481 ) # What does this PR do? - update Cerebras to use OpenAIMixin - enable openai completions tests - enable openai chat completions tests - disable with n > 1 tests - add recording for --setup cerebras --subdirs inference --pattern openai ## Test Plan `./scripts/integration-tests.sh --stack-config server:ci-tests --setup cerebras --subdirs inference --pattern openai` ``` tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming[txt=cerebras/llama-3.3-70b-inference:completion:sanity] instantiating llama_stack_client Port 8321 is already in use, assuming server is already running... llama_stack_client instantiated in 0.053s PASSED [ 2%] tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming_suffix[txt=cerebras/llama-3.3-70b-inference:completion:suffix] SKIPPED (Suffix is not supported for the model: cerebras/llama-3.3-70b.) [ 4%] tests/integration/inference/test_openai_completion.py::test_openai_completion_streaming[txt=cerebras/llama-3.3-70b-inference:completion:sanity] PASSED [ 6%] tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=cerebras/llama-3.3-70b-1] SKIPPED (Model cerebras/llama-3.3-70b hosted by remote::cerebras doesn't support vllm extra_body parameters.) [ 8%] tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=cerebras/llama-3.3-70b] SKIPPED (Model cerebras/llama-3.3-70b hosted by remote::cerebras doesn't support vllm extra_body parameters.) 
[ 10%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=cerebras/llama-3.3-70b-inference:chat_completion:non_streaming_01] PASSED [ 12%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=cerebras/llama-3.3-70b-inference:chat_completion:streaming_01] PASSED [ 14%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=cerebras/llama-3.3-70b-inference:chat_completion:streaming_01] SKIPPED (Model cerebras/llama-3.3-70b hosted by remote::cere...) [ 17%] tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=cerebras/llama-3.3-70b-True] PASSED [ 19%] tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=cerebras/llama-3.3-70b-True] PASSED [ 21%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming_with_file[txt=cerebras/llama-3.3-70b] SKIPPED (Model cerebras/llama-3.3-70b hosted by remote::cerebras doesn't support chat completion calls wit...) 
[ 23%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 25%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_multiple_strings[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 27%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_float[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 29%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_dimensions[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 31%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_user_parameter[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 34%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_empty_list_error[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 36%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_invalid_model_error[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 38%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 40%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_base64[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 42%] 
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_base64_batch_processing[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 44%] tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=cerebras/llama-3.3-70b-0] SKIPPED (Model cerebras/llama-3.3-70b hosted by remote::cerebras doesn't support vllm extra_body parameters.) [ 46%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=cerebras/llama-3.3-70b-inference:chat_completion:non_streaming_02] PASSED [ 48%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=cerebras/llama-3.3-70b-inference:chat_completion:streaming_02] PASSED [ 51%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=cerebras/llama-3.3-70b-inference:chat_completion:streaming_02] SKIPPED (Model cerebras/llama-3.3-70b hosted by remote::cere...) 
[ 53%] tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=cerebras/llama-3.3-70b-False] PASSED [ 55%] tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=cerebras/llama-3.3-70b-False] PASSED [ 57%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 59%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_multiple_strings[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 61%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_float[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 63%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_dimensions[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 65%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_user_parameter[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 68%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_empty_list_error[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 70%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_invalid_model_error[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 72%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED 
(embedding_model_id empty - skipping test) [ 74%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_base64[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 76%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_base64_batch_processing[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 78%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=cerebras/llama-3.3-70b-inference:chat_completion:non_streaming_01] PASSED [ 80%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=cerebras/llama-3.3-70b-inference:chat_completion:streaming_01] PASSED [ 82%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=cerebras/llama-3.3-70b-inference:chat_completion:streaming_01] SKIPPED (Model cerebras/llama-3.3-70b hosted by remote:...) 
[ 85%] tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=cerebras/llama-3.3-70b-True] PASSED [ 87%] tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=cerebras/llama-3.3-70b-True] PASSED [ 89%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=cerebras/llama-3.3-70b-inference:chat_completion:non_streaming_02] PASSED [ 91%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=cerebras/llama-3.3-70b-inference:chat_completion:streaming_02] PASSED [ 93%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=cerebras/llama-3.3-70b-inference:chat_completion:streaming_02] SKIPPED (Model cerebras/llama-3.3-70b hosted by remote:...) [ 95%] tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=cerebras/llama-3.3-70b-False] PASSED [ 97%] tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=cerebras/llama-3.3-70b-False] PASSED [100%] =================================================================================================================== slowest 10 durations ==================================================================================================================== 0.37s call tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=cerebras/llama-3.3-70b-inference:chat_completion:non_streaming_01] 0.34s call tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=cerebras/llama-3.3-70b-False] 0.18s call tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=cerebras/llama-3.3-70b-True] 0.17s setup 
tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming[txt=cerebras/llama-3.3-70b-inference:completion:sanity] 0.15s call tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=cerebras/llama-3.3-70b-True] 0.13s call tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=cerebras/llama-3.3-70b-True] 0.12s call tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=cerebras/llama-3.3-70b-False] 0.12s call tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=cerebras/llama-3.3-70b-True] 0.12s call tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=cerebras/llama-3.3-70b-False] 0.08s call tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=cerebras/llama-3.3-70b-inference:chat_completion:streaming_02] ================================================================================================================== short test summary info ================================================================================================================== SKIPPED [1] tests/integration/inference/test_openai_completion.py:75: Suffix is not supported for the model: cerebras/llama-3.3-70b. SKIPPED [3] tests/integration/inference/test_openai_completion.py:123: Model cerebras/llama-3.3-70b hosted by remote::cerebras doesn't support vllm extra_body parameters. SKIPPED [4] tests/integration/inference/test_openai_completion.py:103: Model cerebras/llama-3.3-70b hosted by remote::cerebras doesn't support n param. SKIPPED [1] tests/integration/inference/test_openai_completion.py:129: Model cerebras/llama-3.3-70b hosted by remote::cerebras doesn't support chat completion calls with base64 encoded files. 
SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:90: embedding_model_id empty - skipping test SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:112: embedding_model_id empty - skipping test SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:136: embedding_model_id empty - skipping test SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:154: embedding_model_id empty - skipping test SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:175: embedding_model_id empty - skipping test SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:195: embedding_model_id empty - skipping test SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:206: embedding_model_id empty - skipping test SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:217: embedding_model_id empty - skipping test SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:244: embedding_model_id empty - skipping test SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:278: embedding_model_id empty - skipping test ================================================================================================= 18 passed, 29 skipped, 50 deselected, 4 warnings in 3.02s ================================================================================================= ```	2025-09-23 16:26:00 -04:00
Matthew Farrellee	d07ebce4d9	feat: (re-)enable Databricks inference adapter (#3500 ) # What does this PR do? add/enable the Databricks inference adapter Databricks inference adapter was broken, closes #3486 - remove deprecated completion / chat_completion endpoints - enable dynamic model listing w/o refresh, listing is not async - use SecretStr instead of str for token - backward incompatible change: for consistency with databricks docs, env DATABRICKS_URL -> DATABRICKS_HOST and DATABRICKS_API_TOKEN -> DATABRICKS_TOKEN - databricks urls are custom per user/org, add special recorder handling for databricks urls - add integration test --setup databricks - enable chat completions tests - enable embeddings tests - disable n > 1 tests - disable embeddings base64 tests - disable embeddings dimensions tests note: reasoning models, e.g. gpt oss, fail because databricks has a custom, incompatible response format ## Test Plan ci and ``` ./scripts/integration-tests.sh --stack-config server:ci-tests --setup databricks --subdirs inference --pattern openai ``` note: databricks needs to be manually added to the ci-tests distro for replay testing	2025-09-23 15:37:23 -04:00
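The environment variable rename in the Databricks commit above is backward incompatible, so existing deployments need to move to the new names. The following is a minimal migration sketch, not the adapter's own code: `read_databricks_env` and `RENAMED_ENV_VARS` are hypothetical names introduced here, and the fallback to the old names is an illustration of a transition aid rather than behavior the adapter provides.

```python
import os
import warnings

# Old -> new names, as listed in the commit message above.
RENAMED_ENV_VARS = {
    "DATABRICKS_URL": "DATABRICKS_HOST",
    "DATABRICKS_API_TOKEN": "DATABRICKS_TOKEN",
}


def read_databricks_env(environ=None):
    """Hypothetical helper: return (host, token), preferring the new names.

    If only an old name is set, its value is used and a warning is emitted
    so the caller knows to rename the variable.
    """
    environ = os.environ if environ is None else environ
    values = {}
    for old, new in RENAMED_ENV_VARS.items():
        value = environ.get(new)
        if value is None and old in environ:
            warnings.warn(f"{old} has been renamed; set {new} instead", stacklevel=2)
            value = environ[old]
        values[new] = value
    return values["DATABRICKS_HOST"], values["DATABRICKS_TOKEN"]
```

Calling `read_databricks_env()` with no argument reads from the process environment; passing a plain dict makes the helper easy to exercise in isolation.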
Sébastien Han	d3600b92d1	fix: force milvus-lite installation for inline::milvus (#3488 ) # What does this PR do? pymilvus recently made `milvus-lite` an optional dependency to their package. If someone wants to use the inline provider we must include the extra dependency. For more details see: https://github.com/milvus-io/pymilvus/pull/2976 Signed-off-by: Sébastien Han <seb@redhat.com>	2025-09-19 16:12:08 -04:00
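Because `milvus-lite` became an optional dependency of pymilvus, an environment can have pymilvus installed while the inline provider still cannot start. A quick way to check an environment is sketched below. It uses only the standard library; the import name `milvus_lite` is an assumption about how the package is exposed, and `has_module` is a hypothetical helper.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "milvus_lite" is the assumed import name of the milvus-lite package.
    for module in ("pymilvus", "milvus_lite"):
        status = "available" if has_module(module) else "missing"
        print(f"{module}: {status}")
```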
adam-d-young	9378bdca43	docs: Fix incorrect vector_db_id usage in RAG tutorial (#3444 ) # What does this PR do? This PR fixes a blocking issue in the detailed RAG tutorial where the code fails with a 400 Bad Request error. The root cause is that recent versions of Llama-Stack ignore the client-generated vector_db_id and assign a new server-side ID. The tutorial was not updated to reflect this, causing the rag_tool.insert call to fail. This change updates the code to capture the authoritative ID from the .identifier attribute of the register() method's response. This ensures the tutorial code runs successfully and reflects the current API behavior. ## Test Plan The fix can be verified by running the Python code snippet from the detailed tutorial page. 
Run the original code (Before this change): Result: The script fails with a 400 Bad Request error on the rag_tool.insert step. Run the updated code (After this change): Result: The script runs successfully to completion. Co-authored-by: Adam Young <adam.young@redhat.com>	2025-09-19 11:41:26 -04:00
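The pattern described in the commit above (use the ID returned by `register()`, not the one the client generated) can be illustrated without a running server. The sketch below uses stand-in classes: `FakeVectorDBs`, `FakeRagTool`, and `RegisterResponse` are hypothetical stubs written for this example, and their signatures are not the real client API. Only the `.identifier` attribute and the `register` / `insert` method names come from the commit message.

```python
import uuid
from dataclasses import dataclass


@dataclass
class RegisterResponse:
    identifier: str  # the server-assigned, authoritative ID


class FakeVectorDBs:
    """Stub server: ignores the client-supplied ID and assigns its own."""

    def __init__(self):
        self.known_ids = set()

    def register(self, vector_db_id: str) -> RegisterResponse:
        server_id = f"vs_{uuid.uuid4().hex}"
        self.known_ids.add(server_id)
        return RegisterResponse(identifier=server_id)


class FakeRagTool:
    """Stub RAG tool: rejects IDs the stub server never issued."""

    def __init__(self, vector_dbs: FakeVectorDBs):
        self._vector_dbs = vector_dbs

    def insert(self, documents, vector_db_id: str) -> int:
        if vector_db_id not in self._vector_dbs.known_ids:
            raise ValueError("400 Bad Request: unknown vector_db_id")
        return len(documents)


vector_dbs = FakeVectorDBs()
rag_tool = FakeRagTool(vector_dbs)

client_side_id = f"my-docs-{uuid.uuid4().hex}"
response = vector_dbs.register(vector_db_id=client_side_id)

# Use the ID from the response for all later calls.
vector_db_id = response.identifier
inserted = rag_tool.insert(documents=["doc one", "doc two"], vector_db_id=vector_db_id)
```

Passing `client_side_id` to `insert` instead would raise in this stub, which mirrors the 400 error the tutorial hit before the fix.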
Jiayi Ni	e66103c09d	fix: add missing files provider to NVIDIA distribution (#3479 ) # What does this PR do? The rag-runtime tool requires files API as a dependency, but the NVIDIA distribution was missing the files provider configuration. Thus, when running: ``` llama stack build --distro nvidia --image-type venv ``` And then: ``` llama stack run {path_to_distribution_config} --image-type venv ``` It would raise an error: ``` RuntimeError: Failed to resolve 'tool_runtime' provider 'rag-runtime' of type 'inline::rag-runtime': required dependency 'files' is not available. Please add a 'files' provider to your configuration or check if the provider is properly configured. ``` This PR fixes the issue by adding missing files provider to NVIDIA distribution. ## Test Plan N/A	2025-09-18 13:49:46 +02:00
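The failure mode in the NVIDIA distribution commit above is a provider whose required API has no provider configured. A rough sketch of that kind of check follows. It is not the resolver Llama Stack uses: `REQUIRED_APIS` and `missing_dependencies` are hypothetical, only the `inline::rag-runtime` to `files` edge comes from the error message, and `example-files-provider` is a placeholder name.

```python
# Hypothetical dependency table; only this one edge is taken from the error above.
REQUIRED_APIS = {"inline::rag-runtime": ["files"]}


def missing_dependencies(providers: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (provider_type, missing_api) pairs for a config mapping
    API name -> list of configured provider types."""
    missing = []
    for provider_types in providers.values():
        for provider_type in provider_types:
            for dep in REQUIRED_APIS.get(provider_type, []):
                if not providers.get(dep):
                    missing.append((provider_type, dep))
    return missing


broken = {"tool_runtime": ["inline::rag-runtime"]}
fixed = {"tool_runtime": ["inline::rag-runtime"], "files": ["example-files-provider"]}
```

Here `missing_dependencies(broken)` reports the `files` gap, while `missing_dependencies(fixed)` returns an empty list.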
Charlie Doern	6b855af96f	feat: introduce api leveling proposal (#3317 ) # What does this PR do? this document outlines different API stability levels, how to enforce them, and next steps ## Next Steps Following the adoption of this document, all existing APIs should follow the enforcement protocol. relates to #3237 Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-09-16 18:18:36 +02:00
Sébastien Han	65d45c7318	chore: various watsonx fixes (#3428 ) # What does this PR do? use a logger * update the distro to add the Files API otherwise it won't start since it is a dependency of vector * clarify project_id and api_key requirements * disable openai compatible calls since the endpoint returns 404 * disable text_inference structured format tests * fixed openai client initialization ## Test Plan Execute text_inference: ``` WATSONX_API_KEY=... WATSONX_PROJECT_ID=... python -m llama_stack.core.server.server llama_stack/distributions/watsonx/run.yaml LLAMA_STACK_CONFIG=http://localhost:8321 uv run --group test pytest -vvvv -ra --text-model watsonx/meta-llama/llama-3-3-70b-instruct tests/integration/inference/test_text_inference.py ============================================= test session starts ============================================== platform darwin -- Python 3.12.8, pytest-8.4.2, pluggy-1.6.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3 cachedir: .pytest_cache metadata: {'Python': '3.12.8', 'Platform': 'macOS-15.6.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.2', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0', 'hydra-core': '1.3.2'}} rootdir: /Users/leseb/Documents/AI/llama-stack configfile: pyproject.toml plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0, hydra-core-1.3.2 asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function collected 20 items tests/integration/inference/test_text_inference.py::test_text_completion_non_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:completion:sanity] PASSED [ 5%] 
tests/integration/inference/test_text_inference.py::test_text_completion_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:completion:sanity] PASSED [ 10%] tests/integration/inference/test_text_inference.py::test_text_completion_stop_sequence[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:completion:stop_sequence] XFAIL [ 15%] tests/integration/inference/test_text_inference.py::test_text_completion_log_probs_non_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:completion:log_probs] XFAIL [ 20%] tests/integration/inference/test_text_inference.py::test_text_completion_log_probs_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:completion:log_probs] XFAIL [ 25%] tests/integration/inference/test_text_inference.py::test_text_completion_structured_output[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:completion:structured_output] SKIPPED structured output) [ 30%] tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:non_streaming_01] PASSED [ 35%] tests/integration/inference/test_text_inference.py::test_text_chat_completion_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_01] PASSED [ 40%] tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_non_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling] PASSED [ 45%] tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling] PASSED [ 50%] tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_tool_choice_required[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling] PASSED [ 55%] 
tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_tool_choice_none[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling] PASSED [ 60%] tests/integration/inference/test_text_inference.py::test_text_chat_completion_structured_output[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:structured_output] SKIPPEDstructured output) [ 65%] tests/integration/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling_tools_absent-True] PASSED [ 70%] tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_multi_turn_tool_calling[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:text_then_tool] XFAIL [ 75%] tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:non_streaming_02] PASSED [ 80%] tests/integration/inference/test_text_inference.py::test_text_chat_completion_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_02] PASSED [ 85%] tests/integration/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling_tools_absent-False] PASSED [ 90%] tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_multi_turn_tool_calling[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_then_answer] XFAIL [ 95%] tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_multi_turn_tool_calling[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:array_parameter] XFAIL [100%] =========================================== short test summary info 
============================================ SKIPPED [2] tests/integration/inference/test_text_inference.py:49: Model watsonx/meta-llama/llama-3-3-70b-instruct hosted by remote::watsonx doesn't support json_schema structured output XFAIL tests/integration/inference/test_text_inference.py::test_text_completion_stop_sequence[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:completion:stop_sequence] - remote::watsonx doesn't support 'stop' parameter yet XFAIL tests/integration/inference/test_text_inference.py::test_text_completion_log_probs_non_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:completion:log_probs] - remote::watsonx doesn't support log probs yet XFAIL tests/integration/inference/test_text_inference.py::test_text_completion_log_probs_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:completion:log_probs] - remote::watsonx doesn't support log probs yet XFAIL tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_multi_turn_tool_calling[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:text_then_tool] - Not tested for non-llama4 models yet XFAIL tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_multi_turn_tool_calling[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_then_answer] - Not tested for non-llama4 models yet XFAIL tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_multi_turn_tool_calling[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:array_parameter] - Not tested for non-llama4 models yet ============================ 12 passed, 2 skipped, 6 xfailed, 14 warnings in 36.88s ============================ ``` --------- Signed-off-by: Sébastien Han <seb@redhat.com>	2025-09-16 13:55:10 +02:00
Francisco Arceo	d15368a302	chore: Updating documentation, adding exception handling for Vector Stores in RAG Tool, more tests on migration, and migrate off of inference_api for context_retriever for RAG (#3367 ) # What does this PR do? - Updating documentation on migration from RAG Tool to Vector Stores and Files APIs - Adding exception handling for Vector Stores in RAG Tool - Add more tests on migration from RAG Tool to Vector Stores - Migrate off of inference_api for context_retriever for RAG ## Test Plan Integration and unit tests added Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-09-11 14:20:11 +02:00
Sébastien Han	f31bcc11bc	feat: add Azure OpenAI inference provider support (#3396 ) # What does this PR do? Llama-stack now supports a new OpenAI compatible endpoint with Azure OpenAI. The starter distro has been updated to add the new remote inference provider. A few tests have been modified and improved. ## Test Plan Deploy a model in the Azure portal then: ``` $ AZURE_API_KEY=... AZURE_API_BASE=... uv run llama stack build --image-type venv --providers inference=remote::azure --run ... $ LLAMA_STACK_CONFIG=http://localhost:8321 uv run --group test pytest -v -ra --text-model azure/gpt-4.1 tests/integration/inference/test_openai_completion.py ... Results: ``` ============================================= test session starts ============================================== platform darwin -- Python 3.12.8, pytest-8.4.1, pluggy-1.6.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3 cachedir: .pytest_cache metadata: {'Python': '3.12.8', 'Platform': 'macOS-15.6.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.1', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0', 'hydra-core': '1.3.2'}} rootdir: /Users/leseb/Documents/AI/llama-stack configfile: pyproject.toml plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0, hydra-core-1.3.2 asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function collected 27 items tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming[txt=azure/gpt-5-mini-inference:completion:sanity] SKIPPED [ 3%] tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming_suffix[txt=azure/gpt-5-mini-inference:completion:suffix] SKIPPED [ 7%] 
tests/integration/inference/test_openai_completion.py::test_openai_completion_streaming[txt=azure/gpt-5-mini-inference:completion:sanity] SKIPPED [ 11%] tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=azure/gpt-5-mini-1] SKIPPED [ 14%] tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=azure/gpt-5-mini] SKIPPED [ 18%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:non_streaming_01] PASSED [ 22%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:streaming_01] PASSED [ 25%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:streaming_01] PASSED [ 29%] tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=azure/gpt-5-mini-True] PASSED [ 33%] tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=azure/gpt-5-mini-True] PASSED [ 37%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming_with_file[txt=azure/gpt-5-mini] SKIPPED 
[ 40%] tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=azure/gpt-5-mini-0] SKIPPED [ 44%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:non_streaming_02] PASSED [ 48%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:streaming_02] PASSED [ 51%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:streaming_02] PASSED [ 55%] tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=azure/gpt-5-mini-False] PASSED [ 59%] tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=azure/gpt-5-mini-False] PASSED [ 62%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:non_streaming_01] PASSED [ 66%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:streaming_01] PASSED [ 70%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:streaming_01] PASSED [ 74%] tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=azure/gpt-5-mini-True] PASSED [ 77%] tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=azure/gpt-5-mini-True] PASSED [ 81%] 
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:non_streaming_02] PASSED [ 85%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:streaming_02] PASSED [ 88%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:streaming_02] PASSED [ 92%] tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=azure/gpt-5-mini-False] PASSED [ 96%] tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=azure/gpt-5-mini-False] PASSED [100%] =========================================== short test summary info ============================================ SKIPPED [3] tests/integration/inference/test_openai_completion.py:63: Model azure/gpt-5-mini hosted by remote::azure doesn't support OpenAI completions. SKIPPED [3] tests/integration/inference/test_openai_completion.py:118: Model azure/gpt-5-mini hosted by remote::azure doesn't support vllm extra_body parameters. SKIPPED [1] tests/integration/inference/test_openai_completion.py:124: Model azure/gpt-5-mini hosted by remote::azure doesn't support chat completion calls with base64 encoded files. ================================== 20 passed, 7 skipped, 2 warnings in 51.77s ================================== ``` Signed-off-by: Sébastien Han <seb@redhat.com>	2025-09-11 13:48:38 +02:00
ehhuang	c04f1c1e8c	chore: move benchmarking related code (#3406 ) # What does this PR do? - moving things and some formatting changes ## Test Plan	2025-09-10 13:19:44 -07:00
Alexey Rybak	7394828c7a	docs: horizontal nav bar (#3407 ) # What does this PR do? * Adds a horizontal nav bar for easy access to the API reference and the Llama Stack Github repo <img width="2696" height="520" alt="image" src="https://github.com/user-attachments/assets/82daffe1-c206-4e20-b95b-1e090011eecc" /> ## Test Plan * Built the docs and ran the local HTML server to verify changes	2025-09-10 12:43:36 -07:00
ehhuang	e980436a2e	chore: introduce write queue for inference_store (#3383 ) # What does this PR do? Adds a write worker queue for writes to inference store. This avoids overwhelming request processing with slow inference writes. ## Test Plan Benchmark: ``` cd /docs/source/distributions/k8s-benchmark # start mock server python openai-mock-server.py --port 8000 # start stack server LLAMA_STACK_LOGGING="all=WARNING" uv run --with llama-stack python -m llama_stack.core.server.server docs/source/distributions/k8s-benchmark/stack_run_config.yaml # run benchmark script uv run python3 benchmark.py --duration 120 --concurrent 50 --base-url=http://localhost:8321/v1/openai/v1 --model=vllm-inference/meta-llama/Llama-3.2-3B-Instruct ``` ## RPS from 21 -> 57	2025-09-10 11:57:42 -07:00
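The write-queue idea in #3383 can be illustrated with a minimal sketch. This is not the code from the PR: `QueuedStore`, its methods, and the simulated slow write are hypothetical names chosen for illustration, and the real inference store may be structured quite differently. The point is only that the request path enqueues a row and returns, while a background worker performs the slow write.

```python
import asyncio


class QueuedStore:
    """Hypothetical store that defers slow writes to a background worker."""

    def __init__(self, max_queue_size: int = 1000) -> None:
        self._queue = asyncio.Queue(maxsize=max_queue_size)
        self._rows = []
        self._worker = None

    async def start(self) -> None:
        # One worker task drains the queue for the lifetime of the store.
        self._worker = asyncio.create_task(self._drain())

    async def _slow_write(self, row: dict) -> None:
        await asyncio.sleep(0.01)  # stand-in for a slow database write
        self._rows.append(row)

    async def _drain(self) -> None:
        while True:
            row = await self._queue.get()
            try:
                await self._slow_write(row)
            finally:
                self._queue.task_done()

    async def store(self, row: dict) -> None:
        # Returns once the row is queued, not once it has been written.
        await self._queue.put(row)

    async def stop(self) -> None:
        # Wait for queued writes to finish, then cancel the worker.
        await self._queue.join()
        if self._worker is not None:
            self._worker.cancel()


async def demo() -> int:
    store = QueuedStore()
    await store.start()
    for i in range(5):
        await store.store({"id": i})
    await store.stop()
    return len(store._rows)
```

Running `asyncio.run(demo())` enqueues five rows and waits for the worker to drain them before shutdown. A bounded queue is used here so that producers slow down rather than grow memory without limit if writes fall far behind; whether the PR makes the same choice is not stated in the commit message.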
Ashwin Bharambe	81ad240faa	fix(k8s): unwedge run.yaml to add files	2025-09-09 23:02:26 -07:00
Mohammad Daoud Farooqi	9618adba89	docs: add MongoDB to external provider list (#3369 ) The MongoDB integration - Vector search, Full-Text search and Hybrid search have now been added as an external provider offering for Llama Stack: https://github.com/mongodb-partners/mongodb-llama-stack	2025-09-08 14:09:13 +02:00
Akram Ben Aissi	072dca0609	feat: Add Kubernetes auth provider to use SelfSubjectReview and kubernetes api server (#2559 ) # What does this PR do? Add Kubernetes authentication provider support - Add KubernetesAuthProvider class for token validation using Kubernetes SelfSubjectReview API - Add KubernetesAuthProviderConfig with configurable API server URL, TLS settings, and claims mapping - Implement authentication via POST requests to /apis/authentication.k8s.io/v1/selfsubjectreviews endpoint - Add support for parsing Kubernetes SelfSubjectReview response format to extract user information - Add KUBERNETES provider type to AuthProviderType enum - Update create_auth_provider factory function to handle 'kubernetes' provider type - Add comprehensive unit tests for KubernetesAuthProvider functionality - Add documentation with configuration examples and usage instructions The provider validates tokens by sending SelfSubjectReview requests to the Kubernetes API server and extracts user information from the userInfo structure in the response. ## Test Plan What This Verifies: Authentication header validation Token validation with Kubernetes SelfSubjectReview and kubernetes server API endpoint Error handling for invalid tokens and HTTP errors Request payload structure and headers ``` python -m pytest tests/unit/server/test_auth.py -k "kubernetes" -v ``` Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com>	2025-09-08 11:25:10 +02:00
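The request and response shapes described in #2559 can be sketched as follows. The endpoint path and the `status.userInfo` structure come from the commit message and the Kubernetes API; the function names `build_review_request` and `parse_user_info`, and the `principal`/`groups` output keys, are hypothetical and not taken from `KubernetesAuthProvider`. No network call is made here, so TLS settings and claims mapping are left out.

```python
import json

SELF_SUBJECT_REVIEW_PATH = "/apis/authentication.k8s.io/v1/selfsubjectreviews"


def build_review_request(api_server_url: str, token: str):
    """Return (url, headers, body) for a SelfSubjectReview POST."""
    url = api_server_url.rstrip("/") + SELF_SUBJECT_REVIEW_PATH
    headers = {
        # The token under validation is the bearer token of the request itself.
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    body = json.dumps(
        {"apiVersion": "authentication.k8s.io/v1", "kind": "SelfSubjectReview"}
    ).encode()
    return url, headers, body


def parse_user_info(response: dict) -> dict:
    """Extract the user from status.userInfo, or raise if it is missing."""
    user_info = response.get("status", {}).get("userInfo")
    if not user_info or "username" not in user_info:
        raise ValueError("response has no status.userInfo.username")
    return {
        "principal": user_info["username"],  # hypothetical output key
        "groups": user_info.get("groups", []),
    }
```

A caller would send the POST with any HTTP client, treat a 401 from the API server as an invalid token, and pass the decoded JSON of a successful response to `parse_user_info`.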
Francisco Arceo	7cd1c2c238	feat: Updating Rag Tool to use Files API and Vector Stores API (#3344 )	2025-09-06 07:26:34 -06:00
Sumanth Kamenani	0b00c68d59	fix: use lambda pattern for bedrock config env vars (#3307 ) # What does this PR do? Improved bedrock provider config to read from environment variables like AWS_ACCESS_KEY_ID. Updated all fields to use default_factory with lambda patterns like the nvidia provider does. Now the environment variables work as documented. Closes #3305 ## Test Plan Ran the new bedrock config tests: ```bash python -m pytest tests/unit/providers/inference/bedrock/test_config.py -v ``` Verified existing provider tests still work: ```bash python -m pytest tests/unit/providers/test_configs.py -v ```	2025-09-05 10:45:11 +02:00
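The `default_factory` with lambda pattern mentioned in #3307 can be shown with a small sketch. It uses standard-library dataclasses rather than the provider's actual config class, so `BedrockLikeConfig` and its field names are hypothetical; only `AWS_ACCESS_KEY_ID` is named in the commit message, and `AWS_DEFAULT_REGION` is included as an assumed second variable. The lambda defers the environment lookup until an instance is created, whereas a plain default would be read once at import time.

```python
import os
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BedrockLikeConfig:
    """Hypothetical config whose defaults are read from the environment lazily."""

    # Evaluated each time an instance is created, not when the module is imported.
    aws_access_key_id: Optional[str] = field(
        default_factory=lambda: os.getenv("AWS_ACCESS_KEY_ID")
    )
    # Assumed second field, shown only to illustrate the repeated pattern.
    region_name: Optional[str] = field(
        default_factory=lambda: os.getenv("AWS_DEFAULT_REGION")
    )


def load_config() -> BedrockLikeConfig:
    # Explicit arguments would still override the environment-derived defaults.
    return BedrockLikeConfig()
```

With this shape, changing the environment between two calls to `load_config()` changes the resulting values, which is the behavior the commit message describes as "the environment variables work as documented".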
ehhuang	bcc7f2c7d0	chore: async inference store write (#3318 ) # What does this PR do? ## Test Plan ``` cd /docs/source/distributions/k8s-benchmark # start mock server python openai-mock-server.py --port 8000 # start stack server uv run --with llama-stack python -m llama_stack.core.server.server docs/source/distributions/k8s-benchmark/stack_run_config.yaml # run benchmark script uv run python3 benchmark.py --duration 30 --concurrent 50 --base-url=http://localhost:8321/v1/openai/v1 --model=vllm-inference/meta-llama/Llama-3.2-3B-Instruct ``` Before: ============================================================ BENCHMARK RESULTS ============================================================ Total time: 30.00s Concurrent users: 50 Total requests: 1267 Successful requests: 1267 Failed requests: 0 Success rate: 100.0% Requests per second: 42.23 After: ============================================================ BENCHMARK RESULTS ============================================================ Total time: 30.00s Concurrent users: 50 Total requests: 1449 Successful requests: 1449 Failed requests: 0 Success rate: 100.0% Requests per second: 48.30	2025-09-04 11:37:46 -07:00
Ashwin Bharambe	c3d3a0b833	feat(tests): auto-merge all model list responses and unify recordings (#3320 ) One needed to specify record-replay related environment variables for running integration tests. We could not use defaults because integration tests could be run against Ollama instances which could be running different models. For example, text vs vision tests needed separate instances of Ollama because a single instance typically cannot serve both of these models if you assume the standard CI worker configuration on Github. As a result, `client.list()` as returned by the Ollama client would be different between these runs and we'd end up overwriting responses. This PR "solves" it by adding a small amount of complexity -- we store model list responses specially, keyed by the hashes of the models they return. 
At replay time, we merge all of them and pretend that we have the union of all models available. ## Test Plan Re-recorded all the tests using `scripts/integration-tests.sh --inference-mode record`, including the vision tests.	2025-09-03 11:33:03 -07:00
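The record-and-merge scheme described in #3320 might look roughly like the sketch below. The function names, the in-memory dictionary standing in for the recording storage, and the choice of SHA-256 over a sorted model list are all assumptions for illustration; the commit message only says that responses are keyed by a hash of the models they return and merged at replay time.

```python
import hashlib
import json


def models_key(model_ids):
    """Hash a model list so that identical lists map to the same recording."""
    canonical = json.dumps(sorted(model_ids)).encode()
    return hashlib.sha256(canonical).hexdigest()[:12]


def record(store, model_ids):
    # Different model sets get different keys, so recording a vision run
    # does not overwrite the response captured during a text run.
    store[models_key(model_ids)] = sorted(model_ids)


def replay(store):
    # At replay time, pretend the union of all recorded models is available.
    merged = set()
    for ids in store.values():
        merged.update(ids)
    return sorted(merged)
```

For example, recording `["llama3.2:3b"]` from a text run and `["llama3.2-vision:11b"]` from a vision run leaves two entries in the store, and `replay` returns both model names.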
Jiayi Ni	b12cd528ef	docs: add VLM NIM example (#3277 )	2025-08-29 16:23:52 -07:00
slekkala1	efdb5558b8	fix: Remove bfcl scoring function as not supported (#3281 ) # What does this PR do? BFCL scoring function is not supported, removing it. Also minor fixes as the llama stack run is broken for open-benchmark for test plan verification 1. Correct the model paths for supported models 2. Fix another issue as there is no `provider_id` for DatasetInput but logger assumes it exists. 
``` File "/Users/swapna942/llama-stack/llama_stack/core/stack.py", line 332, in construct_stack await register_resources(run_config, impls) File "/Users/swapna942/llama-stack/llama_stack/core/stack.py", line 108, in register_resources logger.debug(f"registering {rsrc.capitalize()} {obj} for provider {obj.provider_id}") ^^^^^^^^^^^^^^^ File "/Users/swapna942/llama-stack/.venv/lib/python3.13/site-packages/pydantic/main.py", line 991, in __getattr__ raise AttributeError(f'{type(self).__name__!r} object has no attribute {item!r}') AttributeError: 'DatasetInput' object has no attribute 'provider_id' ``` ## Test Plan ```llama stack build --distro open-benchmark --image-type venv``` and run the server succeeds Issue Link: https://github.com/llamastack/llama-stack/issues/3282	2025-08-29 11:03:52 -07:00
IAN MILLER	3130ca0a78	feat: implement keyword, vector and hybrid search inside vector stores for PGVector provider (#3064 ) # What does this PR do? The purpose of this task is to implement `openai/v1/vector_stores/{vector_store_id}/search` for PGVector provider. It involves implementing vector similarity search, keyword search and hybrid search for `PGVectorIndex`. Closes #3006 ## Test Plan Run unit tests: ` ./scripts/unit-tests.sh ` Run integration tests for openai vector stores: 1. Export env vars: ``` export ENABLE_PGVECTOR=true export PGVECTOR_HOST=localhost export PGVECTOR_PORT=5432 export PGVECTOR_DB=llamastack export PGVECTOR_USER=llamastack export PGVECTOR_PASSWORD=llamastack ``` 2. Create DB: ``` psql -h localhost -U postgres -c "CREATE ROLE llamastack LOGIN PASSWORD 'llamastack';" psql -h localhost -U postgres -c "CREATE DATABASE llamastack OWNER llamastack;" psql -h localhost -U llamastack -d llamastack -c "CREATE EXTENSION IF NOT EXISTS vector;" ``` 3. Install sentence-transformers: ` uv pip install sentence-transformers ` 4. Run: ``` uv run --group test pytest -s -v --stack-config="inference=inline::sentence-transformers,vector_io=remote::pgvector" --embedding-model sentence-transformers/all-MiniLM-L6-v2 tests/integration/vector_io/test_openai_vector_stores.py ``` Inspect PGVector vector stores (optional): ``` psql llamastack psql (14.18 (Homebrew)) Type "help" for help. 
llamastack=# \z Access privileges Schema \| Name \| Type \| Access privileges \| Column privileges \| Policies --------+------------------------------------------------------+-------+-------------------+-------------------+---------- public \| llamastack_kvstore \| table \| \| \| public \| metadata_store \| table \| \| \| public \| vector_store_pgvector_main \| table \| \| \| public \| vector_store_vs_1dfbc061_1f4d_4497_9165_ecba2622ba3a \| table \| \| \| public \| vector_store_vs_2085a9fb_1822_4e42_a277_c6a685843fa7 \| table \| \| \| public \| vector_store_vs_2b3dae46_38be_462a_afd6_37ee5fe661b1 \| table \| \| \| public \| vector_store_vs_2f438de6_f606_4561_9d50_ef9160eb9060 \| table \| \| \| public \| vector_store_vs_3eeca564_2580_4c68_bfea_83dc57e31214 \| table \| \| \| public \| vector_store_vs_53942163_05f3_40e0_83c0_0997c64613da \| table \| \| \| public \| vector_store_vs_545bac75_8950_4ff1_b084_e221192d4709 \| table \| \| \| public \| vector_store_vs_688a37d8_35b2_4298_a035_bfedf5b21f86 \| table \| \| \| public \| vector_store_vs_70624d9a_f6ac_4c42_b8ab_0649473c6600 \| table \| \| \| public \| vector_store_vs_73fc1dd2_e942_4972_afb1_1e177b591ac2 \| table \| \| \| public \| vector_store_vs_9d464949_d51f_49db_9f87_e033b8b84ac9 \| table \| \| \| public \| vector_store_vs_a1e4d724_5162_4d6d_a6c0_bdafaf6b76ec \| table \| \| \| public \| vector_store_vs_a328fb1b_1a21_480f_9624_ffaa60fb6672 \| table \| \| \| public \| vector_store_vs_a8981bf0_2e66_4445_a267_a8fff442db53 \| table \| \| \| public \| vector_store_vs_ccd4b6a4_1efd_4984_ad03_e7ff8eadb296 \| table \| \| \| public \| vector_store_vs_cd6420a4_a1fc_4cec_948c_1413a26281c9 \| table \| \| \| public \| vector_store_vs_cd709284_e5cf_4a88_aba5_dc76a35364bd \| table \| \| \| public \| vector_store_vs_d7a4548e_fbc1_44d7_b2ec_b664417f2a46 \| table \| \| \| public \| vector_store_vs_e7f73231_414c_4523_886c_d1174eee836e \| table \| \| \| public \| vector_store_vs_ffd53588_819f_47e8_bb9d_954af6f7833d \| table \| \| \| 
(23 rows) llamastack=# ``` Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>	2025-08-29 16:30:12 +02:00
Kelly Brown	1a9fa3c0b8	docs: Contributor guidelines for creating Internal or External providers (#3111 ) Description: Adding information and guidelines on when contributors should create an in-tree vs out-of-tree provider. I'm still learning a bit about this subject, so I'm very open to feedback on this PR. Will also add this section to the API Providers section of the docs.	2025-08-28 12:26:47 +02:00
raghotham	d73955a41e	chore: remove absolute paths (#3263 ) # What does this PR do? Finding these issues while moving to github pages. ## Test Plan uv run --group docs sphinx-autobuild docs/source docs/build/html --write-all	2025-08-27 12:04:25 -07:00
Charlie Doern	cec00c5476	docs: fix post_training docs (#3262 ) # What does this PR do? the post training docs are missing references to the more indepth `huggingface.md` and `torchtune.md` which explain how to actually use the providers. These files show up in search though. Add references to these files into the `inline_..md` files currently pointed to by `index.md` Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-08-26 18:21:15 -07:00
Ashwin Bharambe	9fa69b0337	feat(distro): no huggingface provider for starter (#3258 ) The `trl` dependency brings in `accelerate` which brings in nvidia dependencies for torch. We cannot have that in the starter distro. As such, no CPU-only post-training for the huggingface provider.	2025-08-26 14:06:36 -07:00
Matthew Farrellee	cffc4edf47	feat: Add optional idempotency support to batches API (#3171 ) Implements optional idempotency for batch creation using `idem_tok` parameter: * Core idempotency: Same token + parameters returns existing batch * Conflict detection: Same token + different parameters raises HTTP 409 ConflictError * Metadata order independence: Different key ordering doesn't affect idempotency API changes: - Add optional `idem_tok` parameter to `create_batch()` method - 
Enhanced API documentation with idempotency extensions Implementation: - Reference provider supports idempotent batch creation - ConflictError for proper HTTP 409 status code mapping - Comprehensive parameter validation Testing: - Unit tests: focused tests covering core scenarios with parametrized conflict detection - Integration tests: tests validating real OpenAI client behavior This enables client-side retry safety and prevents duplicate batch creation when using the same idempotency token, following REST API closes #3144	2025-08-22 15:50:40 -07:00
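The three idempotency rules listed for #3171 can be sketched with an in-memory store. `idem_tok`, `create_batch()`, and `ConflictError` are names from the commit message; `BatchStore`, the `input_file_id` and `endpoint` parameters, and the batch dictionary shape are assumptions made for this illustration and do not reflect the reference provider's code.

```python
class ConflictError(Exception):
    """Stands in for the error that the server maps to HTTP 409."""


class BatchStore:
    """Hypothetical in-memory batch store with optional idempotency tokens."""

    def __init__(self):
        self._by_token = {}
        self._next_id = 0

    def create_batch(self, input_file_id, endpoint, metadata=None, idem_tok=None):
        # Dict comparison ignores key order, which gives metadata order independence.
        params = {
            "input_file_id": input_file_id,
            "endpoint": endpoint,
            "metadata": dict(metadata or {}),
        }
        if idem_tok is not None and idem_tok in self._by_token:
            seen_params, batch = self._by_token[idem_tok]
            if seen_params != params:
                # Same token, different parameters: refuse instead of guessing.
                raise ConflictError(f"idempotency token {idem_tok!r} reused")
            return batch  # same token, same parameters: return the existing batch
        self._next_id += 1
        batch = {"id": f"batch_{self._next_id}", **params}
        if idem_tok is not None:
            self._by_token[idem_tok] = (params, batch)
        return batch
```

A client that retries `create_batch(..., idem_tok="t1")` after a timeout gets the same batch back rather than a duplicate, while calls without a token always create a new batch.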
Ashwin Bharambe	7519b73fcc	feat(distro): fork off a starter-gpu distribution (#3240 ) The starter distribution added post-training which added torch dependencies which pulls in all the nvidia CUDA libraries. This made our starter container very big. We have worked hard to keep the starter container small so it serves its purpose as a starter. This PR tries to get it back to its size by forking off duplicate "-gpu" providers for post-training. These forked providers are then used for a new `starter-gpu` distribution which can pull in all dependencies.	2025-08-22 15:47:15 -07:00
Matthew Farrellee	f520e244d9	feat: Add S3 Files Provider (#3202 ) Implements a complete S3-based file storage provider for Llama Stack with: Core Implementation: - S3FilesImpl class with full OpenAI Files API compatibility - Support for file upload, download, listing, deletion operations - Sqlite-based metadata storage for fast queries and API compliance - Configurable S3 endpoints (AWS, MinIO, LocalStack support) Key Features: - Automatic S3 bucket creation and management - Metadata persistence - Proper error handling for S3 connectivity and permissions Dependencies: - Adds boto3 for AWS S3 integration - Adds moto[s3] for testing infrastructure Testing: Unit: `./scripts/unit-tests.sh tests/unit/files tests/unit/providers/files` Integration: Start MinIO: `podman run --rm -it -p 9000:9000 minio/minio server /data` Start stack w/ S3 provider: `S3_ENDPOINT_URL=http://localhost:9000 AWS_ACCESS_KEY_ID=minioadmin AWS_SECRET_ACCESS_KEY=minioadmin S3_BUCKET_NAME=llama-stack-files uv run llama stack build --image-type venv --providers files=remote::s3 --run` Run integration tests: `./scripts/integration-tests.sh --stack-config http://localhost:8321 --provider ollama --test-subdirs files`	2025-08-22 10:38:59 -04:00
Mustafa Elbehery	1790fc0f25	feat: Remove initialize() Method from LlamaStackAsLibrary (#2979 ) # What does this PR do? This PR removes the need to call `initialize()` on `LlamaStackAsLibrary`. Previously, the user had to invoke client.initialize() explicitly. To improve the developer experience and avoid runtime errors, this PR initializes LlamaStackAsLibrary implicitly when the client is used. It also prevents repeated initialization of the same client while maintaining backward compatibility. This PR does the following: - Automatic Initialization: Constructor calls initialize_impl() automatically. - Client is fully initialized after __init__ completes. - Prevents consecutive initialization after the client has been successfully initialized. - initialize() method still exists but is now a no-op. fixes https://github.com/meta-llama/llama-stack/issues/2946 --------- Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>	2025-08-21 15:59:04 -07:00
Sumanth Kamenani	ac25e35124	feat: Add CORS configuration support for server (#3201 ) Adds flexible CORS (Cross-Origin Resource Sharing) configuration support to the FastAPI server with both local development and explicit configuration modes: - Local development mode: `cors: true` enables localhost-only access with regex pattern `https?://localhost:\d+` - Explicit configuration mode: Specific origins configuration with credential support and validation - Prevents insecure combinations (wildcards with credentials) - FastAPI CORSMiddleware integration via `model_dump()` Addresses the need for configurable CORS policies to support web frontends and cross-origin API access while maintaining security. Closes #2119 ## Test Plan 1. Ran Unit Tests. 2. Manual tests: FastAPI middleware integration with actual HTTP requests - Local development mode localhost access validation - Explicit configuration mode origins validation - Preflight OPTIONS request handling Some screenshots of manual tests. <img width="1920" height="927" alt="image" src="https://github.com/user-attachments/assets/79322338-40c7-45c9-a9ea-e3e8d8e2f849" /> <img width="1911" height="1037" alt="image" src="https://github.com/user-attachments/assets/1683524e-b0c9-48c9-a0a5-782e949cde01" /> cc: @leseb @rhuss @franciscojavierarceo	2025-08-21 14:23:27 -07:00
Matthew Farrellee	e7a812f5de	chore: Fixup main pre commit (#3204 )	2025-08-19 14:52:38 -04:00
Francisco Arceo	a8091d0c6a	chore: Update benchmarking location in contributing docs (#3180 ) # What does this PR do? Small docs change as requested in https://github.com/llamastack/llama-stack/pull/3160#pullrequestreview-3125038932	2025-08-18 08:04:21 -04:00
Matthew Farrellee	914c7be288	feat: add batches API with OpenAI compatibility (with inference replay) (#3162 ) Add complete batches API implementation with protocol, providers, and tests: Core Infrastructure: - Add batches API protocol using OpenAI Batch types directly - Add Api.batches enum value and protocol mapping in resolver - Add OpenAI "batch" file purpose support - Include proper error handling (ConflictError, ResourceNotFoundError) Reference Provider: - Add ReferenceBatchesImpl with full CRUD operations (create, retrieve, cancel, list) - Implement background batch processing with configurable concurrency - Add SQLite KVStore backend for persistence - Support /v1/chat/completions endpoint with request validation Comprehensive Test Suite: - Add unit tests for provider implementation with validation - Add integration tests for end-to-end batch processing workflows - Add error handling tests for validation, malformed inputs, and edge cases Configuration: - Add max_concurrent_batches and max_concurrent_requests_per_batch options - Add provider documentation with sample configurations Test with - ``` $ uv run llama stack build --image-type venv --providers inference=YOU_PICK,files=inline::localfs,batches=inline::reference --run & $ LLAMA_STACK_CONFIG=http://localhost:8321 uv run pytest tests/unit/providers/batches tests/integration/batches --text-model YOU_PICK ``` addresses #3066 --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-08-15 15:34:15 -07:00
ehhuang	2c06b24c77	test: benchmark scripts (#3160 ) # What does this PR do? 1. Add our own benchmark script instead of locust (doesn't support measuring streaming latency well) 2. Simplify k8s deployment 3. Add a simple profile script for locally running server ## Test Plan ❮ ./run-benchmark.sh --target stack --duration 180 --concurrent 10 ============================================================ BENCHMARK RESULTS ============================================================ Total time: 180.00s Concurrent users: 10 Total requests: 1636 Successful requests: 1636 Failed requests: 0 Success rate: 100.0% Requests per second: 9.09 Response Time Statistics: Mean: 1.095s Median: 1.721s Min: 0.136s Max: 3.218s Std Dev: 0.762s Percentiles: P50: 1.721s P90: 1.751s P95: 1.756s P99: 1.796s Time to First Token (TTFT) Statistics: Mean: 0.037s Median: 0.037s Min: 0.023s Max: 0.211s Std Dev: 0.011s TTFT Percentiles: P50: 0.037s P90: 0.040s P95: 0.044s P99: 0.055s Streaming Statistics: Mean chunks per response: 64.0 Total chunks received: 104775	2025-08-15 11:24:29 -07:00
ashwinb	f66ae3b3b1	docs(tests): Add a bunch of documentation for our testing systems (#3139 ) # What does this PR do? Creates a structured testing documentation section with multiple detailed pages: - Testing overview explaining the record-replay architecture - Integration testing guide with practical usage examples - Record-replay system technical documentation - Guide for writing effective tests - Troubleshooting guide for common testing issues Hopefully this makes things a bit easier.	2025-08-15 17:45:30 +00:00
ashwinb	47d5af703c	chore(responses): Refactor Responses Impl to be civilized (#3138 ) # What does this PR do? Refactors the OpenAI responses implementation by extracting streaming and tool execution logic into separate modules. This improves code organization by: 1. Creating a new `StreamingResponseOrchestrator` class in `streaming.py` to handle the streaming response generation logic 2. Moving tool execution functionality to a dedicated `ToolExecutor` class in `tool_executor.py` ## Test Plan Existing tests	2025-08-15 00:05:35 +00:00
Ashwin Bharambe	ee7631b6cf	Revert "feat: add batches API with OpenAI compatibility" (#3149 ) Reverts llamastack/llama-stack#3088 The PR broke integration tests.	2025-08-14 10:08:54 -07:00
Matthew Farrellee	de692162af	feat: add batches API with OpenAI compatibility (#3088 ) Add complete batches API implementation with protocol, providers, and tests: Core Infrastructure: - Add batches API protocol using OpenAI Batch types directly - Add Api.batches enum value and protocol mapping in resolver - Add OpenAI "batch" file purpose support - Include proper error handling (ConflictError, ResourceNotFoundError) Reference Provider: - Add ReferenceBatchesImpl with full CRUD operations (create, retrieve, cancel, list) - Implement background batch processing with configurable concurrency - Add SQLite KVStore backend for persistence - Support /v1/chat/completions endpoint with request validation Comprehensive Test Suite: - Add unit tests for provider implementation with validation - Add integration tests for end-to-end batch processing workflows - Add error handling tests for validation, malformed inputs, and edge cases Configuration: - Add max_concurrent_batches and max_concurrent_requests_per_batch options - Add provider documentation with sample configurations Test with - ``` $ uv run llama stack build --image-type venv --providers inference=YOU_PICK,files=inline::localfs,batches=inline::reference --run & $ LLAMA_STACK_CONFIG=http://localhost:8321 uv run pytest tests/unit/providers/batches tests/integration/batches --text-model YOU_PICK ``` addresses #3066	2025-08-14 09:42:02 -04:00
ehhuang	d6ae54723d	chore: setup for performance benchmarking (#3096 ) # What does this PR do? 1. Added a simple mock openai-compat server that serves chat/completion 2. Add a benchmark server in EKS that includes mock inference server 3. Add locust (https://locust.io/) file for load testing ## Test Plan bash apply.sh kubectl port-forward service/locust-web-ui 8089:8089 Go to localhost:8089 to start a load test <img width="1392" height="334" alt="image" src="https://github.com/user-attachments/assets/d6aa3deb-583a-42ed-889b-751262b8e91c" /> <img width="1362" height="881" alt="image" src="https://github.com/user-attachments/assets/6a28b9b4-05e6-44e2-b504-07e60c12d35e" />	2025-08-13 10:58:22 -07:00
Kelly Brown	0cbd93c5cc	docs: Update blocks formatting in docs/source files (#3120 ) Description: The standard markdown [!NOTE] format is not supported in Sphinx-generated documentation, so this PR replaces those instances. It also updates other Note, Tip, and Warning blocks throughout the source docs. WIP: Working to update the provider code gen	2025-08-13 08:06:31 -07:00
Kelly Brown	6358d0a478	docs: reorganize contributor guide (#3110 ) Description: Restructures the contribution guide and moves some sections into categories <img width="1399" height="527" alt="Screenshot 2025-08-12 at 9 28 44 AM" src="https://github.com/user-attachments/assets/404e23b4-0001-4174-b662-593e0173ef7d" />	2025-08-12 16:17:03 -07:00
Nathan Weinberg	6812aa1e1e	chore: bump min python version in docs and tests (#3103 ) # What does this PR do? The minimum Python version for the project was bumped to 3.12 a couple of months ago, but there remain some artifacts in the repo suggesting we support >=3.10 Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-08-12 08:52:57 -07:00
Francisco Arceo	f7adf58b1b	docs: Add documentation on how to contribute a Vector DB provider and update testing documentation (#3093 ) # What does this PR do? - Adds documentation on how to contribute a Vector DB provider. - Updates the testing section to be a little friendlier to navigate. - Also added new shortcut for search so that `/` and `⌘ K` or `ctrl+K` trigger search <img width="1903" height="1346" alt="Screenshot 2025-08-11 at 10 10 12 AM" src="https://github.com/user-attachments/assets/6995b3b8-a2ab-4200-be72-c5b03a784a29" /> <img width="1915" height="1438" alt="Screenshot 2025-08-11 at 10 10 25 AM" src="https://github.com/user-attachments/assets/1f54d30e-5be1-4f27-b1e9-3c3537dcb8e9" /> Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-08-11 11:11:09 -07:00
Eran Cohen	a4bad6c0b4	feat: Add Google Vertex AI inference provider support (#2841 ) # What does this PR do? - Add new Vertex AI remote inference provider with litellm integration - Support for Gemini models through Google Cloud Vertex AI platform - Uses Google Cloud Application Default Credentials (ADC) for authentication - Added VertexAI models: gemini-2.5-flash, gemini-2.5-pro, gemini-2.0-flash. - Updated provider registry to include vertexai provider - Updated starter template to support Vertex AI configuration - Added comprehensive documentation and sample configuration relates to https://github.com/meta-llama/llama-stack/issues/2747 Signed-off-by: Eran Cohen <eranco@redhat.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>	2025-08-11 08:22:04 -04:00
Varsha	69dc789e15	docs: Add unsupported search mode info about FAISS (#3089 )	2025-08-10 17:34:34 -06:00
Varsha	ce72a28525	docs: Update doc on search modes for Milvus (#3078 ) # What does this PR do? Update Milvus doc on using search modes. Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>	2025-08-10 18:48:36 -04:00
Jiayi Ni	9e78f2da96	docs: fix the docs for NVIDIA Inference Provider (#3055 ) # What does this PR do? Fix the NVIDIA inference docs by updating API methods, model IDs, and embedding example. ## Test Plan N/A	2025-08-08 11:27:55 +02:00
Dean Wampler	342550c1e2	docs: Added comment about a known limitation of AgentEventLogger (#2930 ) # What does this PR do? `AgentEventLogger` only supports streaming responses, so I suggest adding a comment near the bottom of `demo_script.py` letting the user know this, e.g., if they change the `stream` value to `False` in the call to `create_turn`, they need to comment out the logging lines. See https://github.com/llamastack/llama-stack-client-python/issues/15 --------- Signed-off-by: Dean Wampler <dean.wampler@ibm.com>	2025-08-07 10:09:57 -07:00
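
Note on the CORS entry (#3201) above: its local development mode is described as allowing origins that match the regex `https?://localhost:\d+`. The sketch below only illustrates what that pattern accepts and rejects. The helper name `is_allowed_local_origin` is hypothetical, and full-string matching is an assumption about how the server applies the pattern; neither is taken from the llama-stack source.

```python
import re

# Pattern quoted in the commit message for `cors: true` (local development mode).
LOCALHOST_ORIGIN = re.compile(r"https?://localhost:\d+")


def is_allowed_local_origin(origin: str) -> bool:
    """Return True when the entire Origin value matches the localhost pattern.

    Hypothetical helper: full-string matching is assumed here so that a value
    such as "http://localhost:3000.evil.example" is rejected.
    """
    return LOCALHOST_ORIGIN.fullmatch(origin) is not None


if __name__ == "__main__":
    for candidate in (
        "http://localhost:3000",
        "https://localhost:8321",
        "http://localhost",  # no port, so the pattern does not match
        "http://localhost:3000.evil.example",
        "https://example.com",
    ):
        print(candidate, is_allowed_local_origin(candidate))
```

The pattern requires an explicit port, so a bare `http://localhost` origin would not match under this reading.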

465 commits