# What does this PR do?
Today, `supervised_fine_tune` itself and the `TrainingConfig` class have a
bunch of required fields that a provider implementation might not need.
For example, if a provider wants to handle hyperparameters in its own
configuration, as well as any kind of dataset retrieval, optimizer, or
LoRA config, a user will still need to pass in virtually empty
`DataConfig`, `OptimizerConfig`, and `AlgorithmConfig` objects in some cases.
Many of these fields are knobs intended specifically for llama models
and for customizing the inline provider.
Adding remote post_training providers will require either loosening these
arguments or forcing users to pass in empty objects just to satisfy the
pydantic models.
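As a rough sketch of the direction (not the actual llama-stack models; the field names and defaults below are assumptions), loosening would mean making the nested configs optional so callers no longer have to construct empty objects:
```python
# Hedged sketch only: illustrative pydantic models, not the real TrainingConfig.
# The idea is that nested configs become Optional so a remote provider that
# manages hyperparameters in its own config can simply omit them.
from typing import Optional
from pydantic import BaseModel


class DataConfig(BaseModel):
    dataset_id: Optional[str] = None


class OptimizerConfig(BaseModel):
    lr: Optional[float] = None


class TrainingConfig(BaseModel):
    n_epochs: int = 1
    # Previously required; loosened so users don't pass virtually empty objects.
    data_config: Optional[DataConfig] = None
    optimizer_config: Optional[OptimizerConfig] = None


# A remote provider can now be driven with only the fields it actually uses:
print(TrainingConfig(n_epochs=3).model_dump(exclude_none=True))
```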
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
This stubs in some OpenAI server-side compatibility with three new
endpoints:
/v1/openai/v1/models
/v1/openai/v1/completions
/v1/openai/v1/chat/completions
This gives common inference apps using OpenAI clients the ability to
talk to Llama Stack using an endpoint like
http://localhost:8321/v1/openai/v1 .
The two "v1" instances in there isn't awesome, but the thinking is that
Llama Stack's API is v1 and then our OpenAI compatibility layer is
compatible with OpenAI V1. And, some OpenAI clients implicitly assume
the URL ends with "v1", so this gives maximum compatibility.
The openai models endpoint is implemented in the routing layer, and just
returns all the models Llama Stack knows about.
The following providers should be working with the new OpenAI
completions and chat/completions API:
* remote::anthropic (untested)
* remote::cerebras-openai-compat (untested)
* remote::fireworks (tested)
* remote::fireworks-openai-compat (untested)
* remote::gemini (untested)
* remote::groq-openai-compat (untested)
* remote::nvidia (tested)
* remote::ollama (tested)
* remote::openai (untested)
* remote::passthrough (untested)
* remote::sambanova-openai-compat (untested)
* remote::together (tested)
* remote::together-openai-compat (untested)
* remote::vllm (tested)
The goal is to support this for every inference provider, proxying
directly to the provider's OpenAI endpoint for OpenAI-compatible
providers. For providers that don't have an OpenAI-compatible API, we'll
add a mixin to translate incoming OpenAI requests to Llama Stack
inference requests and translate the Llama Stack inference responses to
OpenAI responses.
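A very rough sketch of what such a mixin could look like; the class name, method names, and message shapes below are assumptions for illustration, not the actual implementation:
```python
# Hypothetical translation mixin: the host provider class is expected to
# implement chat_completion() (the Llama Stack inference method).
class OpenAIChatCompletionMixin:
    async def openai_chat_completion(self, model: str, messages: list[dict], **params):
        # Translate the incoming OpenAI-style request into a Llama Stack call.
        result = await self.chat_completion(
            model_id=model,
            messages=messages,
            sampling_params={"max_tokens": params.get("max_tokens")},
        )
        # Translate the Llama Stack response back into an OpenAI-shaped response.
        return {
            "object": "chat.completion",
            "model": model,
            "choices": [
                {
                    "index": 0,
                    "message": {
                        "role": "assistant",
                        "content": result.completion_message.content,
                    },
                    "finish_reason": "stop",
                }
            ],
        }
```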
This is related to #1817 but is a bit larger in scope than just chat
completions, as I have real use-cases that need the older completions
API as well.
## Test Plan
### vLLM
```
VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" llama stack build --template remote-vllm --image-type venv --run
LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "meta-llama/Llama-3.2-3B-Instruct"
```
### ollama
```
INFERENCE_MODEL="llama3.2:3b-instruct-q8_0" llama stack build --template ollama --image-type venv --run
LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="llama3.2:3b-instruct-q8_0" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "llama3.2:3b-instruct-q8_0"
```
## Documentation
Run a Llama Stack distribution that uses one of the providers mentioned
in the list above. Then, use your favorite OpenAI client to send
completion or chat completion requests with the base_url set to
http://localhost:8321/v1/openai/v1 . Replace "localhost:8321" with the
host and port of your Llama Stack server, if different.
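For example, with the official `openai` Python package (the model id below is a placeholder; use whatever model your distribution serves):
```python
from openai import OpenAI

# Any OpenAI client can be pointed at the Llama Stack compatibility base URL.
client = OpenAI(
    base_url="http://localhost:8321/v1/openai/v1",
    api_key="not-needed",  # placeholder; the server may not require a real key
)

# List the models Llama Stack knows about.
for model in client.models.list():
    print(model.id)

# Chat completion through the compatibility layer.
resp = client.chat.completions.create(
    model="meta-llama/Llama-3.2-3B-Instruct",  # placeholder model id
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```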
---------
Signed-off-by: Ben Browning <bbrownin@redhat.com>
# What does this PR do?
This PR adds unit tests for the NVIDIA Safety provider implementation.
## Test Plan
1. Ran `./scripts/unit-tests.sh
tests/unit/providers/nvidia/test_safety.py` from the root of the
project. Verified tests pass.
```
tests/unit/providers/nvidia/test_safety.py::TestNVIDIASafetyAdapter::test_init_nemo_guardrails Initializing NVIDIASafetyAdapter(http://nemo.test)...
PASSED
tests/unit/providers/nvidia/test_safety.py::TestNVIDIASafetyAdapter::test_init_nemo_guardrails_invalid_temperature Initializing NVIDIASafetyAdapter(http://nemo.test)...
PASSED
tests/unit/providers/nvidia/test_safety.py::TestNVIDIASafetyAdapter::test_register_shield_with_valid_id Initializing NVIDIASafetyAdapter(http://nemo.test)...
PASSED
tests/unit/providers/nvidia/test_safety.py::TestNVIDIASafetyAdapter::test_register_shield_without_id Initializing NVIDIASafetyAdapter(http://nemo.test)...
PASSED
tests/unit/providers/nvidia/test_safety.py::TestNVIDIASafetyAdapter::test_run_shield_allowed Initializing NVIDIASafetyAdapter(http://nemo.test)...
PASSED
tests/unit/providers/nvidia/test_safety.py::TestNVIDIASafetyAdapter::test_run_shield_blocked Initializing NVIDIASafetyAdapter(http://nemo.test)...
PASSED
tests/unit/providers/nvidia/test_safety.py::TestNVIDIASafetyAdapter::test_run_shield_http_error Initializing NVIDIASafetyAdapter(http://nemo.test)...
PASSED
tests/unit/providers/nvidia/test_safety.py::TestNVIDIASafetyAdapter::test_run_shield_not_found Initializing NVIDIASafetyAdapter(http://nemo.test)...
PASSED
```
---------
Co-authored-by: Jash Gulabrai <jgulabrai@nvidia.com>
# What does this PR do?
The `supervised_fine_tune` method in `NvidiaPostTrainingAdapter` had some
extra args that aren't part of the post_training protocol, and these
extra args were causing FastAPI to throw an error when attempting to
stand up an endpoint that used this provider.
(Closes #1938)
## Test Plan
Before this change, bringing up a stack with the `nvidia` template
failed. After this change, it succeeds. I'm testing it like this:
```
INFERENCE_MODEL="meta/llama-3.1-8b-instruct" \
llama stack build --template nvidia --image-type venv --run
```
I also ensured the nvidia/test_supervised_fine_tuning.py tests still
pass via:
```
python -m pytest \
tests/unit/providers/nvidia/test_supervised_fine_tuning.py
```
Signed-off-by: Ben Browning <bbrownin@redhat.com>
# What does this PR do?
While building the "experimental-post-training" distribution, we
encountered a torchao version conflict: inference requires version 0.5.0
while training currently depends on version 0.8.0.
Resolves this error:
```
× No solution found when resolving dependencies:
╰─▶ Because you require torchao==0.5.0 and torchao==0.8.0, we can conclude that your requirements are unsatisfiable.
ERROR 2025-04-10 10:41:22,597 llama_stack.distribution.build:128 uncategorized: Failed to build target test with
return code 1
```
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Closes #1853
## Test Plan
```
uv run llama stack build --image-type conda --image-name ollama --config llama_stack/templates/ollama/build.yaml
ollama pull llama3.2:3b
LLAMA_STACK_CONFIG=http://localhost:8321 uv run pytest tests/integration/inference/test_text_inference.py -v --text-model=llama3.2:3b
```
# What does this PR do?
## Test Plan
export MODEL=accounts/fireworks/models/llama4-scout-instruct-basic;
LLAMA_STACK_CONFIG=verification pytest -s -v tests/integration/inference
--vision-model $MODEL --text-model $MODEL
# What does this PR do?
Move around bits. This makes the copies from llama-models _much_ easier
to maintain and ensures we don't entangle meta-reference specific
tidbits into llama-models code even by accident.
Also, kills the meta-reference-quantized-gpu distro and rolls
quantization deps into meta-reference-gpu.
## Test Plan
```
LLAMA_MODELS_DEBUG=1 \
with-proxy llama stack run meta-reference-gpu \
--env INFERENCE_MODEL=meta-llama/Llama-4-Scout-17B-16E-Instruct \
--env INFERENCE_CHECKPOINT_DIR=<DIR> \
--env MODEL_PARALLEL_SIZE=4 \
--env QUANTIZATION_TYPE=fp8_mixed
```
Start a server with and without quantization. Point integration tests to
it using:
```
pytest -s -v tests/integration/inference/test_text_inference.py \
--stack-config http://localhost:8321 --text-model meta-llama/Llama-4-Scout-17B-16E-Instruct
```
Running full Tool Calling required some updates to work e2e.
- Remove `python_start` and `python_end` tags
- Tool Call messages and Tool Response messages should end with
`<|eom|>`
- System prompt needed updates
```
You are a helpful assistant who can answer general questions or invoke tools when necessary.
In addition to tool calls, you should also augment your responses by using the tool outputs.
```
### Test Plan
- Start server with meta-reference
```
LLAMA_STACK_DISABLE_VERSION_CHECK=1 LLAMA_MODELS_DEBUG=1 INFERENCE_MODEL=meta-llama/$MODEL llama stack run meta-reference-gpu
```
- Added **NEW** tests with 5 test cases for multi-turn tool calls
```
pytest -s -v --stack-config http://localhost:8321 tests/integration/inference/test_text_inference.py --text-model meta-llama/Llama-4-Scout-17B-16E-Instruct
```
- Also verified all vision and agent tests pass
# What does this PR do?
- **chore: mypy for strong_typing**
- **chore: mypy for remote::vllm**
- **chore: mypy for remote::ollama**
- **chore: mypy for providers.datatype**
---------
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
Don't return a bare list for runtime tools. Instead, return a Response
object for pagination and consistency with other APIs.
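Roughly, instead of a bare `list`, callers get a thin wrapper with a `data` field; the type names in this sketch are illustrative rather than the exact API classes:
```python
from pydantic import BaseModel


class ToolDef(BaseModel):
    name: str
    description: str | None = None


class ListToolDefsResponse(BaseModel):
    data: list[ToolDef]


# Before: return tools                          (a bare list)
# After:  return ListToolDefsResponse(data=tools)
tools = [ToolDef(name="web_search")]
print(ListToolDefsResponse(data=tools).model_dump())
```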
---------
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
Move pagination logic from LocalFS and HuggingFace implementations into
a common helper function to ensure consistent pagination behavior across
providers. This reduces code duplication and centralizes pagination
logic in one place.
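A minimal sketch of the kind of shared helper this describes (function name, parameters, and response fields are assumptions):
```python
from typing import Any


def paginate_records(
    records: list[dict[str, Any]],
    start_index: int | None = None,
    limit: int | None = None,
) -> dict[str, Any]:
    """Slice a full record list into one page plus a cursor for the next page."""
    start = start_index or 0
    end = len(records) if limit is None or limit < 0 else start + limit
    page = records[start:end]
    next_index = start + len(page) if start + len(page) < len(records) else None
    return {"data": page, "next_start_index": next_index}


rows = [{"id": i} for i in range(10)]
print(paginate_records(rows, start_index=2, limit=3))  # ids 2, 3, 4; next_start_index=5
```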
## Test Plan
Run this script:
```
from llama_stack_client import LlamaStackClient
# Initialize the client
client = LlamaStackClient(base_url="http://localhost:8321")
# Register a dataset
response = client.datasets.register(
    purpose="eval/messages-answer",  # or "eval/question-answer" or "post-training/messages"
    source={"type": "uri", "uri": "huggingface://datasets/llamastack/simpleqa?split=train"},
    dataset_id="my_dataset",  # optional, will be auto-generated if not provided
    metadata={"description": "My evaluation dataset"},  # optional
)
# Verify the dataset was registered by listing all datasets
datasets = client.datasets.list()
print(f"Registered datasets: {[d.identifier for d in datasets]}")
# You can then access the data using the datasetio API
# rows = client.datasets.iterrows(dataset_id="my_dataset", start_index=1, limit=2)
rows = client.datasets.iterrows(dataset_id="my_dataset")
print(f"Data: {rows.data}")
```
And play with `start_index` and `limit`.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
https://github.com/meta-llama/llama-stack/pull/1828 removed the
`__root_span__` attribute, which is still needed.
## Test Plan
Added a telemetry integration test:
LLAMA_STACK_CONFIG=http://localhost:5001 pytest -s -v
tests/integration/telemetry --safety-shield meta-llama/Llama-Guard-3-8B
--text-model accounts/fireworks/models/llama-v3p3-70b-instruct
# What does this PR do?
This PR converts blocking Milvus Client calls to non-blocking.
Another one for https://github.com/meta-llama/llama-stack/issues/1489.
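The usual shape of this kind of change, assuming the synchronous `pymilvus` `MilvusClient` remains in use, is to push each blocking call onto a worker thread; a sketch:
```python
import asyncio

from pymilvus import MilvusClient


async def search_chunks(client: MilvusClient, collection: str, query_vector: list[float]):
    # MilvusClient.search is a blocking call; run it in a thread so the
    # server's event loop stays responsive.
    return await asyncio.to_thread(
        client.search,
        collection_name=collection,
        data=[query_vector],
        limit=5,
    )
```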
## Test Plan
I ran the integration tests from
https://github.com/meta-llama/llama-stack/pull/1467 with:
```
pytest -s -v tests/integration/vector_io/test_vector_io.py \
--stack-config inference=sentence-transformers,vector_io=inline::milvus \
--embedding-model all-miniLM-L6-V2 --env MILVUS_DB_PATH=/tmp/moo.db
INFO 2025-03-28 21:35:22,726 tests.integration.conftest:41 tests: Setting DISABLE_CODE_SANDBOX=1 for macOS
/Users/farceo/dev/llama-stack/.venv/lib/python3.10/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"
warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
=============================================================================================================================================================================================================================================================== test session starts ===============================================================================================================================================================================================================================================================
platform darwin -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /Users/farceo/dev/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.10.16', 'Platform': 'macOS-15.3.1-arm64-arm-64bit', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'cov': '6.0.0', 'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}}
rootdir: /Users/farceo/dev/llama-stack
configfile: pyproject.toml
plugins: cov-6.0.0, html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, nbval-0.11.0
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None
collected 7 items
tests/integration/vector_io/test_vector_io.py::test_vector_db_retrieve[emb=all-miniLM-L6-V2] PASSED
tests/integration/vector_io/test_vector_io.py::test_vector_db_register[emb=all-miniLM-L6-V2] PASSED
tests/integration/vector_io/test_vector_io.py::test_insert_chunks[emb=all-miniLM-L6-V2-test_case0] PASSED
tests/integration/vector_io/test_vector_io.py::test_insert_chunks[emb=all-miniLM-L6-V2-test_case1] PASSED
tests/integration/vector_io/test_vector_io.py::test_insert_chunks[emb=all-miniLM-L6-V2-test_case2] PASSED
tests/integration/vector_io/test_vector_io.py::test_insert_chunks[emb=all-miniLM-L6-V2-test_case3] PASSED
tests/integration/vector_io/test_vector_io.py::test_insert_chunks[emb=all-miniLM-L6-V2-test_case4] PASSED
========================================================================================================================================================================================================================================================= 7 passed, 2 warnings in 40.33s ==========================================================================================================================================================================================================================================================
```
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
This is to stay consistent with other APIs.
This change registers the Files API even though there are still no
providers for it. It also removes tests that required an existing
provider for a merged API to be enabled in the API layer.
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
This PR adds support for NVIDIA's NeMo Customizer API to the Llama Stack
post-training module. The integration enables users to fine-tune models
using NVIDIA's cloud-based customization service through a consistent
Llama Stack interface.
## Test Plan
Yet to be done
Things pending under this PR:
- [x] Integration of fine-tuned model (new checkpoint) for inference with
the NVIDIA LLM distribution
- [x] Distribution integration of the API
- [x] Add test cases for customizer (In Progress)
- [x] Documentation
```
LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -v tests/client-sdk/post_training/test_supervised_fine_tuning.py
============================================================================================================================================================================ test session starts =============================================================================================================================================================================
platform linux -- Python 3.10.0, pytest-8.3.4, pluggy-1.5.0 -- /home/ubuntu/llama-stack/.venv/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.10.0', 'Platform': 'Linux-6.8.0-1021-gcp-x86_64-with-glibc2.35', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'nbval': '0.11.0', 'metadata': '3.1.1', 'anyio': '4.8.0', 'html': '4.1.1', 'asyncio': '0.25.3'}}
rootdir: /home/ubuntu/llama-stack
configfile: pyproject.toml
plugins: nbval-0.11.0, metadata-3.1.1, anyio-4.8.0, html-4.1.1, asyncio-0.25.3
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None
collected 2 items
tests/client-sdk/post_training/test_supervised_fine_tuning.py::test_post_training_provider_registration[txt=8B] PASSED [ 50%]
tests/client-sdk/post_training/test_supervised_fine_tuning.py::test_list_training_jobs[txt=8B] PASSED [100%]
======================================================================================================================================================================== 2 passed, 1 warning in 0.10s ========================================================================================================================================================================
```
cc: @mattf @dglogo @sumitb
---------
Co-authored-by: Ubuntu <ubuntu@llama-stack-customizer-dev-inst-2tx95fyisatvlic4we8hidx5tfj.us-central1-a.c.brevdevprod.internal>
# What does this PR do?
This PR updates the sqlite-vec database calls to be non-blocking. Note
that each operation creates a new connection, which incurs some
performance overhead but is reasonable given [SQLite's threading and
connection constraints](https://www.sqlite.org/threadsafe.html).
Summary of changes:
- Refactored `SQLiteVecIndex` class to store database path instead of
connection object
- Added `_create_sqlite_connection()` helper function to create
connections on demand
- Ensured proper connection closure in all database operations
- Fixed test fixtures to use a file-based SQLite database for
thread-safety
- Updated the `SQLiteVecVectorIOAdapter` class to handle per-operation
connections
This PR helps chip away at
https://github.com/meta-llama/llama-stack/issues/1489
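A condensed sketch of the per-operation connection pattern described above; the bodies here are illustrative and omit the sqlite-vec specifics:
```python
import asyncio
import sqlite3


def _create_sqlite_connection(db_path: str) -> sqlite3.Connection:
    # One connection per operation; sqlite3 connections are not meant to be
    # shared across threads by default.
    return sqlite3.connect(db_path)


async def query_rows(db_path: str, sql: str, params: tuple = ()):
    def _run():
        conn = _create_sqlite_connection(db_path)
        try:
            return conn.execute(sql, params).fetchall()
        finally:
            conn.close()

    # Run the blocking sqlite work in a worker thread, off the event loop.
    return await asyncio.to_thread(_run)
```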
## Test Plan
sqlite-vec unit tests passed locally as well as a test script using the
client as a library.
## Misc
FYI @varshaprasad96 @kevincogan
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
**What**
Instead of ad hoc creating a vector DB and chunking when documents are sent
as attachments to an agent turn, we directly pass the raw text of the
documents into the messages as user context, and let the model perform
summarization directly.
This removes the magic behaviour and yields better performance than the
existing approach.
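Conceptually the change is as simple as this sketch (the message construction is illustrative, not the exact agent code):
```python
# Sketch: inline raw attachment text into the user message instead of chunking
# it into a vector DB behind the scenes.
def build_user_message(user_query: str, documents: list[str]) -> dict:
    context = "\n\n".join(documents)
    return {
        "role": "user",
        "content": f"Here are the documents for context:\n{context}\n\n{user_query}",
    }
```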
**Improved Performance**
- RAG lifecycle notebook
- Model: 0.3 factuality score
- (+ websearch) Agent: 0.44 factuality score
- (+ vector db) Agent: 0.3 factuality score
- (+ raw context) Agent: 0.6 factuality score
Closes https://github.com/meta-llama/llama-stack/issues/1478
## Test Plan
- [NEW] Added a section in the RAG lifecycle notebook that shows better performance
<img width="840" alt="image"
src="https://github.com/user-attachments/assets/a0c4e816-809a-41c0-9124-89825983e3f5"
/>
# What does this PR do?
1) Uses OTel-compatible id generation for the stack.
2) The stack starts returning trace id info in the response headers.
3) We inject the same trace id that we have into OTel in order to force
it to use our trace ids.
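For reference, OTel trace ids are 16 random bytes and span ids are 8, so a sketch of compatible id generation plus the response header looks like the following (illustrative only; the stack wires this through its telemetry machinery):
```python
import secrets


def generate_trace_id() -> str:
    return secrets.token_hex(16)  # 32 hex chars, the OTel trace id width


def generate_span_id() -> str:
    return secrets.token_hex(8)   # 16 hex chars, the OTel span id width


trace_id = generate_trace_id()
# The same id is injected into OTel and echoed back to the caller.
response_headers = {"x-trace-id": trace_id}
print(response_headers)
```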
## Test Plan
```
curl -i --request POST \
--url http://localhost:8321/v1/inference/chat-completion \
--header 'content-type: application/json' \
--data '{
"model_id": "meta-llama/Llama-3.1-70B-Instruct",
"messages": [
{
"role": "user",
"content": {
"type": "text",
"text": "where do humans live"
}
}
],
"stream": false
}'
HTTP/1.1 200 OK
date: Fri, 21 Mar 2025 21:51:19 GMT
server: uvicorn
content-length: 1712
content-type: application/json
x-trace-id: 595101ede31ece116ebe35b26d67e8cf
{"metrics":[{"metric":"prompt_tokens","value":10,"unit":null},{"metric":"completion_tokens","value":320,"unit":null},{"metric":"total_tokens","value":330,"unit":null}],"completion_message":{"role":"assistant","content":"Humans live on the planet Earth, specifically on its landmasses and in its oceans. Here's a breakdown of where humans live:\n\n1. **Continents:** Humans inhabit all seven continents:\n\t* Africa\n\t* Antarctica ( temporary residents, mostly scientists and researchers)\n\t* Asia\n\t* Australia\n\t* Europe\n\t* North America\n\t* South America\n2. **Countries:** There are 196 countries recognized by the United Nations, and humans live in almost all of them.\n3. **Cities and towns:** Many humans live in urban areas, such as cities and towns, which are often located near coastlines, rivers, or other bodies of water.\n4. **Rural areas:** Some humans live in rural areas, such as villages, farms, and countryside.\n5. **Islands:** Humans inhabit many islands around the world, including tropical islands, island nations, and islands in the Arctic and Antarctic regions.\n6. **Underwater habitats:** A few humans live in underwater habitats, such as research stations and submarines.\n7. **Space:** A small number of humans have lived in space, including astronauts on the International Space Station and those who have visited the Moon.\n\nIn terms of specific environments, humans live in a wide range of ecosystems, including:\n\n* Deserts\n* Forests\n* Grasslands\n* Mountains\n* Oceans\n* Rivers\n* Tundras\n* Wetlands\n\nOverall, humans are incredibly adaptable and can be found living in almost every corner of the globe.","stop_reason":"end_of_turn","tool_calls":[]},"logprobs":null}
```
Same trace id in Jaeger and sqlite:


# What does this PR do?
## Test Plan
LLAMA_STACK_CONFIG=dev pytest -s -v
tests/integration/agents/test_agents.py::test_custom_tool
--safety-shield meta-llama/Llama-Guard-3-8B --text-model
accounts/fireworks/models/llama-v3p1-8b-instruct
and verify trace in jaeger UI
https://llama-stack.readthedocs.io/en/latest/building_applications/telemetry.html#
# What does this PR do?
- We cannot directly return a literal type.
> Note: this is not the final jobs API change.
## Test Plan
<img width="837" alt="image"
src="https://github.com/user-attachments/assets/18a17561-35f9-443d-987d-54afdd6ff40c"
/>
Required to start up a distribution with prompt guard.
Closes: #1723
## Test Plan
distribution starts with patch applied
Signed-off-by: Derek Higgins <derekh@redhat.com>
# What does this PR do?
Since we now record and export metrics, we can no longer use a single
OTEL endpoint to export both traces and metrics. This PR adds two
sinks, OTEL_TRACE and OTEL_METRIC, so the exporters can be enabled
selectively.
## Test Plan
Start the server with OTEL_TRACE as the sink and verify traces show up in Jaeger.

# What does this PR do?
Clean up mypy violations for inline::{telemetry,tool_runtime,vector_io}.
This also makes the API accept a tool call result without any content
(which the RAG tool may already produce).
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
In this PR, we add a new open eval benchmark, IfEval, based on the paper
https://arxiv.org/abs/2311.07911, to measure a model's
instruction-following capability.
## Test Plan
Spin up a Llama Stack server with the open-benchmark template, then run
`llama-stack-client --endpoint xxx eval run-benchmark
"meta-reference-ifeval" --model-id "meta-llama/Llama-3.3-70B-Instruct"
--output-dir "/home/markchen1015/" --num-examples 20` on the client side
and check the aggregate eval results.
# What does this PR do?
DocVQA asks the model to look at a picture and then answer a question,
given in text, with a text answer based on the textual information in
the picture. These questions often require understanding the relative
positions of text within the picture.
The original dataset is defined as "Task 1" of
https://www.docvqa.org/datasets
## Test Plan
Set up a Llama Stack server with
```
llama stack run ./llama_stack/templates/open-benchmark/run.yaml
```
then send traffic:
```
llama-stack-client eval run-benchmark "meta-reference-docvqa" --model-id meta-llama/Llama-3.3-70B-Instruct --output-dir /tmp/gpqa --num-examples 200
```
These block on I/O reads, which in turn block the
server. Move them to their own thread.
Closes: #1697
# What does this PR do?
To avoid blocking the main event loop, this updates datasetio/localfs to
load data in a separate thread.
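The fix follows the same pattern as the other non-blocking changes in this series; a minimal sketch, assuming a pandas-style loader:
```python
import asyncio

import pandas as pd


async def load_rows(path: str) -> pd.DataFrame:
    # pd.read_csv blocks on disk I/O, so run it in a worker thread instead of
    # on the event loop.
    return await asyncio.to_thread(pd.read_csv, path)
```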
Signed-off-by: Derek Higgins <derekh@redhat.com>
### What does this PR do?
Currently, `ToolCall.arguments` is a `Dict[str, RecursiveType]`.
However, on the client SDK side the `RecursiveType` gets deserialized
into a number (both int and float get collapsed), so when params are
`int` they get converted to float, which might break client-side tools
that do type checking.
Closes: https://github.com/meta-llama/llama-stack/issues/1683
### Test Plan
Stainless changes --
https://github.com/meta-llama/llama-stack-client-python/pull/204
```
pytest -s -v --stack-config=fireworks tests/integration/agents/test_agents.py --text-model meta-llama/Llama-3.1-8B-Instruct
```
# What does this PR do?
Fixes a bunch of violations.
Note: this patch touches all files except post_training.py, which will
be significantly changed by #1437 and is hence left out of the picture
for now.
## Test Plan
Testing with https://github.com/meta-llama/llama-stack/pull/1543
Also checked that GPU training works with the change:
```
INFO: ::1:53316 - "POST /v1/post-training/supervised-fine-tune HTTP/1.1" 200 OK
INFO: ::1:53316 - "GET /v1/post-training/job/status?job_uuid=test-jobb5ca2d84-d541-42f8-883b-762828b4c0e7 HTTP/1.1" 200 OK
INFO: ::1:53316 - "GET /v1/post-training/job/artifacts?job_uuid=test-jobb5ca2d84-d541-42f8-883b-762828b4c0e7 HTTP/1.1" 200 OK
21:24:01.161 [END] /v1/post-training/supervised-fine-tune [StatusCode.OK] (32526.75ms)
21:23:28.769 [DEBUG] Setting manual seed to local seed 3918872849. Local seed is seed + rank = 3918872849 + 0
21:23:28.996 [INFO] Identified model_type = Llama3_2. Ignoring output.weight in checkpoint in favor of the tok_embedding.weight tied weights.
21:23:29.933 [INFO] Memory stats after model init:
GPU peak memory allocation: 6.05 GiB
GPU peak memory reserved: 6.10 GiB
GPU peak memory active: 6.05 GiB
21:23:29.934 [INFO] Model is initialized with precision torch.bfloat16.
21:23:30.115 [INFO] Tokenizer is initialized.
21:23:30.118 [INFO] Optimizer is initialized.
21:23:30.119 [INFO] Loss is initialized.
21:23:30.896 [INFO] Dataset and Sampler are initialized.
21:23:30.898 [INFO] Learning rate scheduler is initialized.
21:23:31.618 [INFO] Memory stats after model init:
GPU peak memory allocation: 6.24 GiB
GPU peak memory reserved: 6.30 GiB
GPU peak memory active: 6.24 GiB
21:23:31.620 [INFO] Starting checkpoint save...
21:23:59.428 [INFO] Model checkpoint of size 6.43 GB saved to /home/ec2-user/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/consolidated.00.pth
21:23:59.445 [INFO] Adapter checkpoint of size 0.00 GB saved to /home/ec2-user/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/adapter/adapter.pth
```
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
Made the code interpreter tool call async so that it's non-blocking.
## Test Plan
pytest -s -v tests/integration/agents/test_agents.py
--stack-config=together --text-model=meta-llama/Llama-3.3-70B-Instruct
<img width="1693" alt="image"
src="https://github.com/user-attachments/assets/42520bb6-7acf-42d5-b71f-b35ca149d722"
/>
Co-authored-by: sarthakdeshpande <sarthak.deshpande@engati.com>
# What does this PR do?
Removed local execution option from the remote Qdrant provider and
introduced an explicit inline provider for the embedded execution.
Updated the ollama template to include this option: this part can be
reverted in case we don't want to have two default `vector_io`
providers.
(Closes #1082)
## Test Plan
Build and run an ollama distro:
```bash
llama stack build --template ollama --image-type conda
llama stack run --image-type conda ollama
```
Run one of the sample ingestion applications like
[rag_with_vector_db.py](https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/rag_with_vector_db.py),
but replace this line:
```py
selected_vector_provider = vector_providers[0]
```
with the following, to use the `qdrant` provider:
```py
selected_vector_provider = vector_providers[1]
```
After running the test code, verify the timestamp of the Qdrant store:
```bash
% ls -ltr ~/.llama/distributions/ollama/qdrant.db/collection/test_vector_db_*
total 784
-rw-r--r--@ 1 dmartino staff 401408 Feb 26 10:07 storage.sqlite
```
---------
Signed-off-by: Daniele Martinoli <dmartino@redhat.com>
Co-authored-by: Francisco Arceo <farceo@redhat.com>
# What does this PR do?
Support NVIDIA-hosted Llama 3.2 11B/90B vision models. They are not
hosted on the common https://integrate.api.nvidia.com/v1; they are
hosted on their own individual URLs.
## Test Plan
`LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -s -v
tests/client-sdk/inference/test_vision_inference.py
--inference-model=meta/llama-3.2-11b-vision-instruct -k image`
# What does this PR do?
Add the option to not verify SSL certificates for the remote-vllm
provider. This allows the Llama Stack server to talk to remote LLMs that
have self-signed certificates.
Partially addresses #1545
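Under the hood this amounts to passing `verify=False` to the HTTP client used to reach the vLLM server; a hedged sketch (the config knob name and URL are assumptions):
```python
import httpx

VLLM_URL = "https://my-vllm.example.com/v1"  # placeholder URL
tls_verify = False  # assumed provider config option for self-signed certs

# Skip TLS certificate verification when the remote vLLM server uses a
# self-signed certificate.
client = httpx.Client(base_url=VLLM_URL, verify=tls_verify)
print(client.get("/models").status_code)
```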
# Summary:
Includes fixes to get test_agents working with OpenAI models, e.g. tool
parsing and message conversion.
# Test Plan:
```
LLAMA_STACK_CONFIG=dev pytest -s -v tests/integration/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B --text-model openai/gpt-4o-mini
```