Mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-10-23 00:27:26 +00:00

270 commits

	| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|  | cffc4edf47 | feat: Add optional idempotency support to batches API (#3171). Implements optional idempotency for batch creation using the `idem_tok` parameter: **Core idempotency**: the same token with the same parameters returns the existing batch; **Conflict detection**: the same token with different parameters raises an HTTP 409 `ConflictError`; **Metadata order independence**: different key ordering doesn't affect idempotency. **API changes:** adds an optional `idem_tok` parameter to the `create_batch()` method; enhances the API documentation with idempotency extensions. **Implementation:** the reference provider supports idempotent batch creation; `ConflictError` provides proper HTTP 409 status code mapping; comprehensive parameter validation. **Testing:** focused unit tests covering core scenarios with parametrized conflict detection; integration tests validating real OpenAI client behavior. This enables client-side retry safety and prevents duplicate batch creation when the same idempotency token is used, following REST API. Closes #3144. | ||
|  | 3b9278f254 | feat: implement query_metrics (#3074). `query_metrics` previously had no implementation, so once a metric was emitted there was no way in Llama Stack to query it from the store. This PR implements `query_metrics` for the `meta_reference` provider in a style similar to `query_traces`: it uses the `trace_store` to format an SQL query and execute it. Here the query parameters are `metric.METRIC_NAME`, `start_time`, and `end_time`, plus any other matchers if they are provided, and the metrics are ordered by timestamp. This required client-side changes, since the client had no `query_metrics` method or any associated resources, so any tests here will fail; manual execution logs are provided for the new tests. Additionally, `unit` is added to the `MetricDataPoint` class, since it adds much more context to the metric being queried. Depends on https://github.com/llamastack/llama-stack-client-python/pull/260. Test Plan: create a client with `LlamaStackClient(base_url="http://localhost:8321")` from `llama_stack_client`, then run `print(client.telemetry.query_metrics(metric_name="total_tokens", start_time=0))`. Output: `INFO:httpx:HTTP Request: POST http://localhost:8322/v1/telemetry/metrics/total_tokens "HTTP/1.1 200 OK"` followed by `[TelemetryQueryMetricsResponse(data=None, metric='total_tokens', labels=[], values=[{'timestamp': 1753999514, 'value': 34.0, 'unit': 'tokens'}, {'timestamp': 1753999816, 'value': 34.0, 'unit': 'tokens'}, …, {'timestamp': 1754596582, 'value': 36.0, 'unit': 'tokens'}])]`. Tests are added for each currently documented metric in Llama Stack using this new function. Also attached: integration tests passing locally in replay mode with the linked client changes. ![Screenshot 2025-08-08 at 2 49 14 PM](https://github.com/user-attachments/assets/d482ab06-dcff-4f0c-a1f1-f870670ee9bc) Signed-off-by: Charlie Doern <cdoern@redhat.com> | ||
|  | 3d119a86d4 | chore: indicate to mypy that InferenceProvider.batch_completion/batch_chat_completion is concrete (#3239). Closes https://github.com/llamastack/llama-stack/issues/3236. mypy considered our default implementations (`raise NotImplementedError`) to be trivial, and as a result we implemented the same stubs in providers. This change puts enough into the default implementations that mypy considers them non-trivial, which allows us to remove the duplicate implementations. | ||
|  | 2ee898cc4c | chore: indicate to mypy that InferenceProvider.rerank is concrete (#3238) | ||
|  | c5e2e269e2 | feat(api): introduce /rerank (#2940). Context: https://github.com/meta-llama/llama-stack/issues/2937. The API design is inspired by existing offerings, but is not exactly the same: `top_n` rather than `top_k` controls the number of results, since `n` is the conventional name for a count; a `truncation` bool replaces `max_token_per_doc`, since truncation should be handled automatically based on model capability instead of having the user set the context length manually; and the response uses a `data` field, for consistency with other OpenAI APIs (though OpenAI has no rerank API), which also means one less name to learn. Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> | ||
|  | 914c7be288 | feat: add batches API with OpenAI compatibility (with inference replay) (#3162). Adds a complete batches API implementation with protocol, providers, and tests. **Core infrastructure:** adds the batches API protocol using OpenAI Batch types directly; adds the `Api.batches` enum value and protocol mapping in the resolver; adds support for the OpenAI "batch" file purpose; includes proper error handling (`ConflictError`, `ResourceNotFoundError`). **Reference provider:** adds `ReferenceBatchesImpl` with full CRUD operations (create, retrieve, cancel, list); implements background batch processing with configurable concurrency; adds a SQLite KVStore backend for persistence; supports the `/v1/chat/completions` endpoint with request validation. **Comprehensive test suite:** unit tests for the provider implementation with validation; integration tests for end-to-end batch processing workflows; error-handling tests for validation, malformed inputs, and edge cases. **Configuration:** adds `max_concurrent_batches` and `max_concurrent_requests_per_batch` options; adds provider documentation with sample configurations. Test with `uv run llama stack build --image-type venv --providers inference=YOU_PICK,files=inline::localfs,batches=inline::reference --run &` followed by `LLAMA_STACK_CONFIG=http://localhost:8321 uv run pytest tests/unit/providers/batches tests/integration/batches --text-model YOU_PICK`. Addresses #3066. Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> | ||
|  | ee7631b6cf | Revert "feat: add batches API with OpenAI compatibility" (#3149). Reverts llamastack/llama-stack#3088. The PR broke integration tests. | ||
|  | de692162af | feat: add batches API with OpenAI compatibility (#3088). Adds a complete batches API implementation with protocol, providers, and tests. **Core infrastructure:** adds the batches API protocol using OpenAI Batch types directly; adds the `Api.batches` enum value and protocol mapping in the resolver; adds support for the OpenAI "batch" file purpose; includes proper error handling (`ConflictError`, `ResourceNotFoundError`). **Reference provider:** adds `ReferenceBatchesImpl` with full CRUD operations (create, retrieve, cancel, list); implements background batch processing with configurable concurrency; adds a SQLite KVStore backend for persistence; supports the `/v1/chat/completions` endpoint with request validation. **Comprehensive test suite:** unit tests for the provider implementation with validation; integration tests for end-to-end batch processing workflows; error-handling tests for validation, malformed inputs, and edge cases. **Configuration:** adds `max_concurrent_batches` and `max_concurrent_requests_per_batch` options; adds provider documentation with sample configurations. Test with `uv run llama stack build --image-type venv --providers inference=YOU_PICK,files=inline::localfs,batches=inline::reference --run &` followed by `LLAMA_STACK_CONFIG=http://localhost:8321 uv run pytest tests/unit/providers/batches tests/integration/batches --text-model YOU_PICK`. Addresses #3066. | ||
|  | e1e161553c | feat(responses): add MCP argument streaming and content part events (#3136). Adds content part streaming events to the OpenAI-compatible Responses API to support more granular streaming of response content. This introduces: (1) new schema types for content parts, `OpenAIResponseContentPart`, with variants for text output and refusals; (2) new streaming event types, `OpenAIResponseObjectStreamResponseContentPartAdded` for when content parts begin and `OpenAIResponseObjectStreamResponseContentPartDone` for when content parts complete; (3) an implementation in the reference provider that emits these events during streaming responses. It also emits MCP arguments just like function call arguments. Test Plan: updated the existing streaming tests to verify that content part events are properly emitted. | ||
|  | 25e0553eed | chore: Change moderations api response to Provider returned categories (#3098). To comply with the model policies for Llama, the moderations API now returns the categories from the provider as-is; this gives up OpenAI compatibility in the moderations API response. Test Plan: `SAFETY_MODEL=llama-guard3:8b LLAMA_STACK_CONFIG=starter uv run pytest -v tests/integration/safety/test_safety.py --text-model=llama3.2:3b-instruct-fp16 --embedding-model=all-MiniLM-L6-v2 --safety-shield=ollama` | ||
|  | 1721aafc1f | feat(responses): type file results properly (#3117). Another thing our tests implicitly depended on. | ||
|  | 4fec49dfdb | feat(responses): add include parameter (#3115). Our Responses tests use it, so we had better include it in the API. I discovered this because I want to make sure `llama-stack-client` can always be used instead of `openai-python` as the client (we do want to be _truly_ compatible). | ||
|  | 19123ca957 | refactor: standardize InferenceRouter model handling (#2965) | ||
|  | 26d3d25c87 | feat: Add moderations create api (#3020). This PR adds an OpenAI-compatible moderations API, currently implemented only for the Llama Guard safety provider. Image support, expansion to other safety providers, and deprecation of `run_shield` will be next steps. Test Plan: added 2 new tests with safe/unsafe text prompt examples for the new OpenAI-compatible moderations API: `SAFETY_MODEL=llama-guard3:8b LLAMA_STACK_CONFIG=starter uv run pytest -v tests/integration/safety/test_safety.py --text-model=llama3.2:3b-instruct-fp16 --embedding-model=all-MiniLM-L6-v2 --safety-shield=ollama`. (I had an issue with the previous PR, https://github.com/meta-llama/llama-stack/pull/2994, while updating it and accidentally closed it, so I reopened this new one.) | ||
|  | e9fced773a | refactor: introduce common 'ResourceNotFoundError' exception (#3032). (1) Introduces a new base custom exception class, `ResourceNotFoundError`; (2) all other "not found" exception classes now inherit from `ResourceNotFoundError`. Closes #3030. Signed-off-by: Nathan Weinberg <nweinber@redhat.com> | ||
|  | e12524af85 | feat: create unregister shield API endpoint in Llama Stack (#2853). Extends the Shields protocol with the capability to unregister previously registered shields, and adds a CLI for shields management. Closes #2581. Test Plan: first, test the shields API. (1) Install and start Ollama: `ollama serve`. (2) Pull the Llama Guard model in Ollama: `ollama pull llama-guard3:8b`. (3) Configure env variables: `export ENABLE_OLLAMA=ollama` and `export OLLAMA_URL=http://localhost:11434`. (4) Build the Llama Stack distro: `llama stack build --template starter --image-type venv`. (5) Start the Llama Stack server: `llama stack run starter --port 8321`. (6) Check that the Ollama model is available: `curl -X GET http://localhost:8321/v1/models \| jq '.data[] \| select(.provider_id=="ollama")'`. (7) Register a new shield using the Ollama provider: `curl -X POST http://localhost:8321/v1/shields -H "Content-Type: application/json" -d '{"shield_id": "test-shield", "provider_id": "llama-guard", "provider_shield_id": "ollama/llama-guard3:8b", "params": {}}'` returns `{"identifier":"test-shield","provider_resource_id":"ollama/llama-guard3:8b","provider_id":"llama-guard","type":"shield","owner":{"principal":"","attributes":{}},"params":{}}`. (8) Check that the shield was registered: `curl -X GET http://localhost:8321/v1/shields/test-shield` returns the same shield object. (9) Run the shield: `curl -X POST http://localhost:8321/v1/safety/run-shield -H "Content-Type: application/json" -d '{"shield_id": "test-shield", "messages": [{"role": "user", "content": "How can I hack into someone computer?"}], "params": {}}'` returns `{"violation":{"violation_level":"error","user_message":"I can't answer that. Can I help with something else?","metadata":{"violation_type":"S2"}}}`. (10) Unregister the shield: `curl -X DELETE http://localhost:8321/v1/shields/test-shield` returns `null`. (11) Verify that the shield was deleted: `curl -X GET http://localhost:8321/v1/shields/test-shield` returns `{"detail":"Invalid value: Shield 'test-shield' not found"}`. All tests passed ✅ (`430 passed, 194 warnings in 19.54s`); the run also printed `RuntimeWarning: coroutine 'close_litellm_async_clients' was never awaited` from `litellm/llms/custom_httpx/async_client_cleanup.py:78` and wrote an HTML report to `htmlcov-3.12/index.html`. | ||
|  | 68b0071861 | chore: standardize session not found error (#3031) # What does this PR do? 1. Creates a new `SessionNotFoundError` class 2. Implements the new class where appropriate Relates to #2379 Signed-off-by: Nathan Weinberg <nweinber@redhat.com> | ||
|  | 05cfa213b6 | chore: standardize tool group not found error (#2986) # What does this PR do? 1. Creates a new `ToolGroupNotFoundError` class 2. Implements the new class where appropriate Relates to #2379 Signed-off-by: Nathan Weinberg <nweinber@redhat.com> | ||
|  | 33cca26154 | chore: Enabling Integration tests for Weaviate (#2882) # What does this PR do? This PR (1) enables the files API for Weaviate and (2) enables integration tests for Weaviate, which adds a docker container to the GitHub Action. This PR also handles a couple of edge cases in creating the collection and ensures all the tests pass. ## Test Plan CI enabled --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> | ||
|  | 3a574ef23c | fix: remove unused DPO parameters from schema and tests (#2988) # What does this PR do?
I removed these DPO parameters from the schema in [this
PR](https://github.com/meta-llama/llama-stack/pull/2804), but I may not
have done it correctly, since they were reintroduced in [this
commit]( | ||
|  | cb7354a9ce | docs: Add detailed docstrings to API models and update OpenAPI spec (#2889) This PR focuses on improving the developer experience by adding comprehensive docstrings to the API data models across the Llama Stack. These docstrings provide detailed explanations for each model and its fields, making the API easier to understand and use. **Key changes:** - **Added Docstrings:** Added reST formatted docstrings to Pydantic models in the `llama_stack/apis/` directory. This includes models for: - Agents (`agents.py`) - Benchmarks (`benchmarks.py`) - Datasets (`datasets.py`) - Inference (`inference.py`) - And many other API modules. - **OpenAPI Spec Update:** Regenerated the OpenAPI specification (`docs/_static/llama-stack-spec.yaml` and `docs/_static/llama-stack-spec.html`) to include the new docstrings. This will be reflected in the API documentation, providing richer information to users. **Impact:** - Developers using the Llama Stack API will have a better understanding of the data structures. - The auto-generated API documentation is now more informative. --------- Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> | ||
|  | cd5c6a2fcd | chore: standardize vector store not found error (#2968) # What does this PR do? 1. Creates a new `VectorStoreNotFoundError` class 2. Implements the new class where appropriate Relates to #2379 Signed-off-by: Nathan Weinberg <nweinber@redhat.com> | ||
|  | 272a3e9937 | chore: standardize dataset not found error (#2962) # What does this PR do? 1. Adds a broad schema for custom exception classes in the Llama Stack project 2. Creates a new `DatasetNotFoundError` class 3. Implements the new class where appropriate Relates to #2379 Signed-off-by: Nathan Weinberg <nweinber@redhat.com> | ||
|  | c5622c79de | chore: standardize model not found error (#2964) # What does this PR do? 1. Creates a new `ModelNotFoundError` class 2. Implements the new class where appropriate Relates to #2379 Signed-off-by: Nathan Weinberg <nweinber@redhat.com> | ||
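The "standardize ... not found error" commits (#2964, #2962, #2968, #2986, #3031) all follow one pattern: a dedicated exception class per resource type. The sketch below shows that pattern under stated assumptions. The shared parent `ResourceNotFoundError`, the `lookup` helper, and the exact message wording are hypothetical and are not the project's actual code.

```python
class ResourceNotFoundError(ValueError):
    """Hypothetical shared parent for 'X not found' errors."""

    def __init__(self, resource_type: str, resource_id: str) -> None:
        self.resource_type = resource_type
        self.resource_id = resource_id
        super().__init__(f"{resource_type} '{resource_id}' not found.")


class ModelNotFoundError(ResourceNotFoundError):
    def __init__(self, model_id: str) -> None:
        super().__init__("Model", model_id)


class DatasetNotFoundError(ResourceNotFoundError):
    def __init__(self, dataset_id: str) -> None:
        super().__init__("Dataset", dataset_id)


class VectorStoreNotFoundError(ResourceNotFoundError):
    def __init__(self, vector_store_id: str) -> None:
        super().__init__("Vector Store", vector_store_id)


def lookup(registry: dict[str, object], model_id: str) -> object:
    # Raise the specific error instead of an ad hoc ValueError string.
    if model_id not in registry:
        raise ModelNotFoundError(model_id)
    return registry[model_id]
```

Subclassing `ValueError` in this sketch means any existing `except ValueError` handlers keep working. Callers that need to distinguish resource types can catch the narrower class instead.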
|  | 870a37ff4b | feat: add base64 encoded PDF support for OpenAI Chat Completions (#2881) # What does this PR do? OpenAI Chat Completions supports passing a base64-encoded PDF file to a model, but Llama Stack currently does not allow this. This PR extends our implementation of the OpenAI API spec to support it. Closes #2129 ## Test Plan A new functional test has been added to test the validity of such a request. Signed-off-by: Nathan Weinberg <nweinber@redhat.com> | ||
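For reference, OpenAI's Chat Completions API accepts a PDF as a `file` content part whose `file_data` is a base64 data URL. The sketch below only builds such a request body with the standard library. The model ID, filename, and `pdf_message` helper are placeholders, and this commit does not say whether every Llama Stack provider accepts this part type.

```python
import base64


def pdf_message(pdf_bytes: bytes, filename: str, question: str) -> dict:
    """Build an OpenAI-style user message carrying a base64-encoded PDF."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {
                "type": "file",
                "file": {
                    "filename": filename,
                    # Data URL: MIME type followed by the base64 payload.
                    "file_data": f"data:application/pdf;base64,{encoded}",
                },
            },
            {"type": "text", "text": question},
        ],
    }


# Placeholder bytes standing in for a real PDF file's contents.
request_body = {
    "model": "my-model",  # hypothetical model ID
    "messages": [pdf_message(b"%PDF-1.4 minimal", "doc.pdf", "Summarize this document.")],
}
```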
|  | 968fc132d3 | fix(openai-compat): restrict developer/assistant/system/tool messages to text-only content (#2932) **What:** - Added OpenAIChatCompletionTextOnlyMessageContent type for text-only content validation - Modified OpenAISystemMessageParam, OpenAIAssistantMessageParam, OpenAIDeveloperMessageParam, and OpenAIToolMessageParam to use text-only content type instead of mixed content - OpenAIUserMessageParam unchanged - still accepts both text and images - Updated OpenAPI spec files to reflect text-only content restrictions in schemas closes #2894 **Why:** - Enforces OpenAI API compatibility by restricting image content to user messages only - Prevents API misuse where images might be sent in message types that don't support them - Aligns with OpenAI's actual API behavior where only user messages can contain multimodal content - Improves type safety and validation at the API boundary **Test plan:** - Added comprehensive parametrized tests covering all 5 OpenAI message types - Tests verify text string acceptance for all message types - Tests verify text list acceptance for all message types - Tests verify image rejection for system/assistant/developer/tool messages (ValidationError expected) - Tests verify user messages still accept images (backward compatibility maintained) | ||
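The rule that only user messages may carry image parts can be expressed as a small validator. This is a standard-library sketch, not the project's Pydantic types. `validate_message` and the `ALLOWED_PART_TYPES` table are hypothetical, and raising `ValueError` stands in for Pydantic's `ValidationError`.

```python
# Content part types each role may use; "user" is the only multimodal role here.
ALLOWED_PART_TYPES = {
    "user": {"text", "image_url"},
    "system": {"text"},
    "developer": {"text"},
    "assistant": {"text"},
    "tool": {"text"},
}


def validate_message(message: dict) -> None:
    """Raise ValueError if a message uses a content part its role does not allow."""
    role = message["role"]
    content = message["content"]
    if isinstance(content, str):
        return  # A plain string is always text-only content.
    allowed = ALLOWED_PART_TYPES[role]
    for part in content:
        if part["type"] not in allowed:
            raise ValueError(f"{role} messages cannot contain '{part['type']}' content")
```

Text can still arrive either as a plain string or as a list of text parts for every role, which matches the "text string" and "text list" cases in the test plan.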
|  | 21bae296f2 | feat(auth): API access control (#2822) # What does this PR do? - Added ability to specify `required_scope` when declaring an API. This is part of the `@webmethod` decorator. - If auth is enabled, a user can access an API only if `user.attributes['scope']` includes the `required_scope` - We add `required_scope='telemetry.read'` to the telemetry read APIs. ## Test Plan CI with added tests 1. Enable server.auth with github token 2. Observe `client.telemetry.query_traces()` returns 403 | ||
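The access rule in this commit says a user may call an API only if `user.attributes['scope']` includes the API's `required_scope`. It can be sketched as a decorator, assuming auth is enabled. This is not the `@webmethod` implementation. `User`, `require_scope`, `AccessDenied`, and treating the scope attribute as a list of strings are all assumptions.

```python
import functools
from dataclasses import dataclass, field


@dataclass
class User:
    principal: str
    attributes: dict[str, list[str]] = field(default_factory=dict)


class AccessDenied(Exception):
    status_code = 403  # The HTTP status observed in the test plan.


def require_scope(required_scope: str):
    """Reject calls whose user lacks `required_scope` in attributes['scope']."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(user: User, *args, **kwargs):
            if required_scope not in user.attributes.get("scope", []):
                raise AccessDenied(f"{user.principal} lacks scope '{required_scope}'")
            return func(user, *args, **kwargs)

        return wrapper

    return decorator


@require_scope("telemetry.read")
def query_traces(user: User) -> list[str]:
    return ["trace-1"]  # Placeholder result.
```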
|  | 632cf9eb72 | feat: Bring Your Own API (BYOA) (#2228) # What does this PR do? Prototype of a new feature that allows new APIs to be plugged into Llama Stack. Opened for early feedback on the approach and to test appetite for the functionality. @ashwinb @raghotham open for early feedback, thanks! --------- Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> | ||
|  | 1463b79218 | feat(registry): make the Stack query providers for model listing (#2862) This flips #2823 and #2805 by making the Stack periodically query the providers for models, rather than having the providers go behind its back and call "register" on the registry themselves. This also adds support for model listing for all other providers via `ModelRegistryHelper`. Once this is done, we no longer need to manually list or register models via `run.yaml`, which removes both noise and annoyance (setting `INFERENCE_MODEL` environment variables, for example) from the new user experience. In addition, it adds a configuration variable `allowed_models`, which can be used to optionally restrict the set of models exposed from a provider. | ||
|  | 8353ad4981 | fix: search mode validation for rag query (#2857) # What does this PR do?
I noticed a few issues with my implementation of the search mode
validation for RagQuery.
This PR replaces the search mode check in RagQuery with a `Literal`.
Previously, the following error was raised
```
TypeError: Object of type RAGSearchMode is not JSON serializable
```
when using:
```
query_config = RAGQueryConfig(max_chunks=6, mode="vector").model_dump()
```
It also fixes a bug where "vector" was always used as the search mode,
regardless of user input.
## Test Plan
Verify that the chosen search mode works when using RagQuery, or use the
agent config below:
```
agent = Agent(
    client,
    model=model_id,
    instructions="You are a helpful assistant",
    tools=[
        {
            "name": "builtin::rag/knowledge_search",
            "args": {
                "vector_db_ids": [vector_db_id],
                "query_config": {
                    "mode": "keyword",
                    "max_chunks": 6
                }
            },
        }
    ],
)
```
Running Unit Tests:
```
uv sync --extra dev
uv run pytest tests/unit/rag/test_rag_query.py -v
``` | ||
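The serialization problem described in this PR can be reproduced with the standard library alone. In this sketch, which uses stand-in classes rather than the real `RAGQueryConfig`, a plain `Enum` member cannot be passed to `json.dumps`. A `Literal`-typed string field, by contrast, serializes as-is and keeps whatever mode the caller chose. The set of modes is an assumption limited to the two named in the PR.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Literal, get_args

SearchMode = Literal["vector", "keyword"]


class SearchModeEnum(Enum):
    # Plain Enum (not a str subclass): json.dumps rejects its members.
    VECTOR = "vector"
    KEYWORD = "keyword"


@dataclass
class QueryConfig:
    max_chunks: int = 5
    mode: SearchMode = "vector"

    def __post_init__(self) -> None:
        # Validate against the Literal's allowed values at construction time.
        if self.mode not in get_args(SearchMode):
            raise ValueError(f"invalid search mode: {self.mode!r}")


config = QueryConfig(max_chunks=6, mode="keyword")
payload = json.dumps(asdict(config))  # '{"max_chunks": 6, "mode": "keyword"}'
```

By contrast, `json.dumps({"mode": SearchModeEnum.KEYWORD})` raises `TypeError: Object of type SearchModeEnum is not JSON serializable`, which is the same class of error quoted in the PR.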
|  | 20c3197952 | chore: Making name optional in openai_create_vector_store (#2858) # What does this PR do? chore: Making name optional in openai_create_vector_store # Closes https://github.com/meta-llama/llama-stack/issues/2706 ## Test Plan CI and unit tests Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> | ||
|  | b2c7543af7 | fix(vectordb): VectorDBInput has no provider_id (#2830) # What does this PR do? This PR adds a `provider_id` field to the `VectorDBInput` class. Fixes https://github.com/meta-llama/llama-stack/issues/2819 Signed-off-by: Mustafa Elbehery <melbeher@redhat.com> | ||
|  | 199f859eec | feat(vllm): periodically refresh models (#2823) Just like #2805 but for vLLM. We also make the VLLM_URL env variable optional (not required); if it is not specified, the provider sits idle silently and only errors out if someone tries to call a completion on it. This allows the provider to be present in the `starter` distribution. ## Test Plan Set up vLLM, copy the starter template and set `{ refresh_models: true, refresh_models_interval: 10 }` for the vllm provider and then run: ``` ENABLE_VLLM=vllm VLLM_URL=http://localhost:8000/v1 \ uv run llama stack run --image-type venv /tmp/starter.yaml ``` Verify that `llama-stack-client models list` correctly lists the model from vLLM. | ||
|  | 68a2dfbad7 | feat(ollama): periodically refresh models (#2805) For self-hosted providers like Ollama (or vLLM), the backing server is running a set of models. That server should be treated as the source of truth, and the Stack registry should just be a cache for those models. Of course, in production environments you may not want this (because you know statically which model you are running), hence there's a config boolean to control this behavior. _This is part of a series of PRs aimed at removing the requirement to set `INFERENCE_MODEL` env variables when running the Llama Stack server._ ## Test Plan Copy and modify the starter.yaml template / config and enable `refresh_models: true, refresh_models_interval: 10` for the ollama provider. Then, run: ``` LLAMA_STACK_LOGGING=all=debug \ ENABLE_OLLAMA=ollama uv run llama stack run --image-type venv /tmp/starter.yaml ``` You will see a very large volume of logs, but verify that the provider is periodically refreshing models. Stop and prune a model from the Ollama server, then restart the server. Verify that the model disappears when calling `uv run llama-stack-client models list` | ||
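The periodic-refresh behavior in the vLLM and Ollama commits treats the backing server as the source of truth. The Stack registry acts as a cache refreshed every `refresh_models_interval` seconds, and that loop can be sketched with `asyncio`. `ModelCache`, `list_server_models`, and the loop structure are hypothetical and only illustrate the idea, not the provider code.

```python
import asyncio
import contextlib
from collections.abc import Awaitable, Callable


class ModelCache:
    """Hypothetical registry cache that mirrors a self-hosted server's models."""

    def __init__(self) -> None:
        self.models: set[str] = set()

    async def refresh_forever(
        self,
        list_server_models: Callable[[], Awaitable[list[str]]],
        refresh_models_interval: float,
    ) -> None:
        while True:
            try:
                # Replace the cache wholesale so models pruned on the server disappear.
                self.models = set(await list_server_models())
            except OSError:
                pass  # e.g. connection refused: keep the last known model list.
            await asyncio.sleep(refresh_models_interval)


async def demo() -> set[str]:
    server_models = ["llama3.2:3b", "llama-guard3:8b"]

    async def list_server_models() -> list[str]:
        return list(server_models)

    cache = ModelCache()
    task = asyncio.create_task(cache.refresh_forever(list_server_models, 0.01))
    await asyncio.sleep(0.05)
    server_models.remove("llama-guard3:8b")  # Simulate pruning a model on the server.
    await asyncio.sleep(0.05)
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return cache.models


result = asyncio.run(demo())
```

Keeping the last known list on connection errors is a design choice of this sketch. A real provider might instead clear its entries or surface the error.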
|  | 874b1cb00f | fix: DPOAlignmentConfig schema to use correct DPO parameters (#2804) # What does this PR do? This PR fixes the `DPOAlignmentConfig` schema to use the correct Direct Preference Optimization (DPO) parameters. The current schema incorrectly uses PPO-inspired parameters (`reward_scale`, `reward_clip`, `epsilon`, `gamma`) that are not part of the DPO algorithm. This PR updates it to use the standard DPO parameters: - `beta`: The KL divergence coefficient that controls deviation from the reference model - `loss_type`: The type of DPO loss function (sigmoid, hinge, ipo, kto_pair) These parameters align with standard DPO implementations like HuggingFace's TRL library. --------- Co-authored-by: Ubuntu <ubuntu@ip-172-31-43-83.ec2.internal> | ||
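The following sketch makes the roles of `beta` and `loss_type` concrete. It computes three of the listed loss variants for a single preference pair, following the formulations commonly used in HuggingFace TRL's `DPOTrainer`; `kto_pair` is omitted. The inputs are summed log-probabilities of the chosen and rejected responses under the policy and reference models, and `dpo_loss` is a hypothetical name, not a Llama Stack function.

```python
import math


def dpo_loss(
    policy_chosen: float,
    policy_rejected: float,
    ref_chosen: float,
    ref_rejected: float,
    beta: float = 0.1,
    loss_type: str = "sigmoid",
) -> float:
    """Per-pair DPO loss from summed log-probabilities (illustrative sketch)."""
    # How much more the policy prefers the chosen response than the reference does.
    logits = (policy_chosen - policy_rejected) - (ref_chosen - ref_rejected)
    if loss_type == "sigmoid":
        # -log(sigmoid(x)) == log(1 + exp(-x)), with x = beta * logits.
        return math.log1p(math.exp(-beta * logits))
    if loss_type == "hinge":
        return max(0.0, 1.0 - beta * logits)
    if loss_type == "ipo":
        # IPO regresses the margin toward 1 / (2 * beta).
        return (logits - 1.0 / (2.0 * beta)) ** 2
    raise ValueError(f"unsupported loss_type: {loss_type!r}")
```

When the policy and reference models agree (a margin of 0), the sigmoid loss is log 2. A larger `beta` makes the loss more sensitive to deviations of the policy from the reference model.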
|  | 57745101be | chore: internal change, make Model.provider_model_id non-optional (#2690) - POST /v1/models accepts an optional provider_model_id - The ModelsRoutingTable.register_model handler ensures it is non-None, providing a default - Usages of Model.provider_model_id will no longer need to detect None | ||
|  | 31b088978a | fix: Fix /vector-stores/create API when vector store with duplicate name (#2617) # What does this PR do? Resolves https://github.com/meta-llama/llama-stack/issues/2735 Currently, if you test against OpenAI's Vector Stores API, the `client.vector_stores.search` call fails with an invalid vector_db during routing (see the script referenced in the clickable item under the Test Plan section). This PR ensures that `client.vector_stores.search()` is compatible with OpenAI's Vector Stores API. The two biggest changes: 1. The `name` was previously used as the `vector_db_id`; the ID has been changed to be consistent with OpenAI's `vs_{uuid}` format. 2. The vector store has to be referenced by its ID; the name is not reliable, as every `client.vector_stores.create` call results in a new vector store. NOTE: I believe this is a breaking change for end users, as they'll need to update their VectorDB identifiers. ## Test Plan Unit tests: ```bash ./scripts/unit-tests.sh tests/unit/providers/vector_io/ -v ``` Integration tests: ```bash ENABLE_MILVUS=milvus llama stack run /Users/farceo/dev/llama-stack/llama_stack/templates/starter/run.yaml --image-type venv LLAMA_STACK_CONFIG=http://localhost:8321 pytest -sv tests/integration/vector_io/test_openai_vector_stores.py --embedding-model=all-MiniLM-L6-v2 -vv ``` Unit tests and test script below 👇 <details> <summary>Click here for script used to test OpenAI and Llama Stack Vector Store implementation</summary> ```python import json import argparse from openai import OpenAI, pagination import logging from colorama import Fore, Style, init import traceback import os # Initialize colorama for color support in terminal init(autoreset=True) # Setup basic logging logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s') DEMO_VECTOR_STORE_NAME = "Support FAQ FJA" global DEMO_VECTOR_STORE_ID global DEMO_VECTOR_STORE_ID2 def colored_print(color, text): """Prints text to the console with the specified 
color.""" print(f"{color}{text}{Style.RESET_ALL}") def log_and_print(color, message, level=logging.INFO): """Logs a message and prints it to the console with the specified color.""" logging.log(level, message) colored_print(color, message) def run_tests(client, prefix="openai"): """ Runs all tests using the provided OpenAI client and saves the output to JSON files with the given prefix. """ # Create the directory if it doesn't exist os.makedirs('openai_testing', exist_ok=True) # Default values in case tests fail global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 DEMO_VECTOR_STORE_ID = None DEMO_VECTOR_STORE_ID2 = None def test_idempotent_vector_store_creation(): """ Test that creating a vector store with the same name is idempotent. """ log_and_print(Fore.BLUE, "Starting vector store creation test...") try: vector_store = client.vector_stores.create( name=DEMO_VECTOR_STORE_NAME, ) # Attempt to create the same vector store again vector_store2 = client.vector_stores.create( name=DEMO_VECTOR_STORE_NAME, ) # Check instead of assert if vector_store2.id != vector_store.id: log_and_print(Fore.YELLOW, f"FAILED IDEMPOTENCY: the same VectorStore name for {prefix.upper()} does not return the same ID", level=logging.WARNING) else: log_and_print(Fore.GREEN, f"PASSED IDEMPOTENCY: {vector_store2.id} == {vector_store.id} the same VectorStore name for {prefix.upper()} returns the same ID") vector_store_data = vector_store.to_dict() log_and_print(Fore.WHITE, f"vector_stores.create = {json.dumps(vector_store_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_create.json', 'w') as f: json.dump(vector_store_data, f, indent=2) global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 DEMO_VECTOR_STORE_ID = vector_store.id DEMO_VECTOR_STORE_ID2 = vector_store2.id return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 except Exception as e: log_and_print(Fore.RED, f"Idempotent vector store creation test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) # 
Create a fallback vector store ID if needed if 'vector_store' in locals() and vector_store: DEMO_VECTOR_STORE_ID = vector_store.id return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 def test_vector_store_list(): """ Test listing vector stores. """ log_and_print(Fore.BLUE, "Starting vector store list test...") try: vector_stores = client.vector_stores.list() # Check instead of assert if not isinstance(vector_stores, pagination.SyncCursorPage): log_and_print(Fore.YELLOW, f"FAILED: Expected a list of vector stores, got {type(vector_stores)}", level=logging.WARNING) else: log_and_print(Fore.GREEN, "Vector store list test passed!") vector_stores_data = vector_stores.to_dict() log_and_print(Fore.WHITE, f"vector_stores.list = {json.dumps(vector_stores_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_list.json', 'w') as f: json.dump(vector_stores_data, f, indent=2) except Exception as e: log_and_print(Fore.RED, f"Vector store list test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) def test_retrieve_vector_store(): """ Test retrieving a specific vector store. 
""" log_and_print(Fore.BLUE, "Starting retrieve vector store test...") if not DEMO_VECTOR_STORE_ID: log_and_print(Fore.YELLOW, "Skipping retrieve vector store test - no vector store ID available", level=logging.WARNING) return try: vector_store = client.vector_stores.retrieve( vector_store_id=DEMO_VECTOR_STORE_ID, ) # Check instead of assert if vector_store.id != DEMO_VECTOR_STORE_ID: log_and_print(Fore.YELLOW, "FAILED: Retrieved vector store ID does not match", level=logging.WARNING) else: log_and_print(Fore.GREEN, "Retrieve vector store test passed!") vector_store_data = vector_store.to_dict() log_and_print(Fore.WHITE, f"vector_stores.retrieve = {json.dumps(vector_store_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_retrieve.json', 'w') as f: json.dump(vector_store_data, f, indent=2) except Exception as e: log_and_print(Fore.RED, f"Retrieve vector store test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) def test_modify_vector_store(): """ Test modifying a vector store. 
""" log_and_print(Fore.BLUE, "Starting modify vector store test...") if not DEMO_VECTOR_STORE_ID: log_and_print(Fore.YELLOW, "Skipping modify vector store test - no vector store ID available", level=logging.WARNING) return try: updated_vector_store = client.vector_stores.update( vector_store_id=DEMO_VECTOR_STORE_ID, name="Updated Support FAQ FJA", ) # Check instead of assert if updated_vector_store.name != "Updated Support FAQ FJA": log_and_print(Fore.YELLOW, "FAILED: Vector store name was not updated correctly", level=logging.WARNING) else: log_and_print(Fore.GREEN, "Modify vector store test passed!") updated_vector_store_data = updated_vector_store.to_dict() log_and_print(Fore.WHITE, f"vector_stores.modify = {json.dumps(updated_vector_store_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_modify.json', 'w') as f: json.dump(updated_vector_store_data, f, indent=2) except Exception as e: log_and_print(Fore.RED, f"Modify vector store test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) def test_delete_vector_store(): """ Test deleting a vector store. 
        """
        log_and_print(Fore.BLUE, "Starting delete vector store test...")
        if not DEMO_VECTOR_STORE_ID2:
            log_and_print(Fore.YELLOW, "Skipping delete vector store test - no second vector store ID available", level=logging.WARNING)
            return
        try:
            response = client.vector_stores.delete(
                vector_store_id=DEMO_VECTOR_STORE_ID2,
            )
            log_and_print(Fore.GREEN, "Delete vector store test passed!")
            response_data = response.to_dict()
            log_and_print(Fore.WHITE, f"Vector store delete response = {json.dumps(response_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_delete.json', 'w') as f:
                json.dump(response_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Delete vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())
    def test_create_vector_store_file():
        log_and_print(Fore.BLUE, "Starting create vector store file test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping create vector store file test - no vector store ID available", level=logging.WARNING)
            return
        try:
            # create jsonl of files as an example
            with open("mydata.jsonl", "w") as f:
                f.write('{"text": "What is the return policy?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "How do I reset my password?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "Where can I find my order history?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "What are the shipping options?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "What is your favorite banana?", "metadata": {"category": "support"}}\n')
            # Create a simple text file if my_data_small.txt doesn't exist
            if not os.path.exists("my_data_small.txt"):
                with open("my_data_small.txt", "w") as f:
                    f.write("This is a test file for vector store testing.\n")
            # Use a context manager so the upload handle is closed after the request
            with open("my_data_small.txt", "rb") as upload:
                created_file = client.files.create(
                    file=upload,
                    purpose="assistants",
                )
            created_file_data = created_file.to_dict()
            log_and_print(Fore.WHITE, f"Created file {json.dumps(created_file_data, indent=2)}")
            with open(f'openai_testing/{prefix}_file_create.json', 'w') as f:
                json.dump(created_file_data, f, indent=2)
            retrieved_files = client.files.retrieve(created_file.id)
            retrieved_files_data = retrieved_files.to_dict()
            log_and_print(Fore.WHITE, f"Retrieved file {json.dumps(retrieved_files_data, indent=2)}")
            with open(f'openai_testing/{prefix}_file_retrieve.json', 'w') as f:
                json.dump(retrieved_files_data, f, indent=2)
            vector_store_file = client.vector_stores.files.create(
                vector_store_id=DEMO_VECTOR_STORE_ID,
                file_id=created_file.id,
            )
            log_and_print(Fore.GREEN, "Create vector store file test passed!")
        except Exception as e:
            log_and_print(Fore.RED, f"Create vector store file test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())
    def test_search_vector_store():
        """
        Test searching a vector store.
        """
        log_and_print(Fore.BLUE, "Starting search vector store test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping search vector store test - no vector store ID available", level=logging.WARNING)
            return
        try:
            query = "What is the banana policy?"
            search_results = client.vector_stores.search(
                vector_store_id=DEMO_VECTOR_STORE_ID,
                query=query,
                max_num_results=10,
                ranking_options={
                    'ranker': 'default-2024-11-15',
                    'score_threshold': 0.0,
                },
                rewrite_query=False,
            )
            # Check instead of assert
            if not isinstance(search_results, pagination.SyncPage):
                log_and_print(Fore.YELLOW, f"FAILED: Expected a list of search results, got {type(search_results)}", level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Search vector store test passed!")
            search_results_dict = search_results.to_dict()
            log_and_print(Fore.WHITE, f"Search results = {search_results_dict}")
            with open(f'openai_testing/{prefix}_vector_store_search.json', 'w') as f:
                json.dump(search_results_dict, f, indent=2)
            log_and_print(Fore.WHITE, f"vector_stores.search = {search_results.to_json()}")
        except Exception as e:
            log_and_print(Fore.RED, f"Search vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())
    # Run all tests in sequence, even if some fail
    test_results = []
    try:
        result = test_idempotent_vector_store_creation()
        if result and len(result) == 2:
            DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 = result
        test_results.append(True)
    except Exception as e:
        log_and_print(Fore.RED, f"Vector store creation test failed: {e}", level=logging.ERROR)
        logging.error(traceback.format_exc())
        test_results.append(False)
    for test_func in [
        test_vector_store_list,
        test_retrieve_vector_store,
        test_modify_vector_store,
        test_delete_vector_store,
        test_create_vector_store_file,
        test_search_vector_store,
    ]:
        try:
            test_func()
            test_results.append(True)
        except Exception as e:
            log_and_print(Fore.RED, f"{test_func.__name__} failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())
            test_results.append(False)
    if all(test_results):
        log_and_print(Fore.GREEN, f"All {prefix} tests completed successfully!")
    else:
        failed_count = test_results.count(False)
        log_and_print(Fore.YELLOW, f"{failed_count} {prefix} test(s) failed, but script completed.")
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Run OpenAI and/or LlamaStack tests.")
    parser.add_argument(
        "--provider",
        type=str,
        default="llama",
        choices=["openai", "llama", "both"],
        help="Specify which environment to test: openai, llama, or both. Default is llama.",
    )
    args = parser.parse_args()
    try:
        if args.provider in ("openai", "both"):
            openai_client = OpenAI()
            run_tests(openai_client, prefix="openai")
        if args.provider in ("llama", "both"):
            llama_client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")
            run_tests(llama_client, prefix="llama")
        log_and_print(Fore.GREEN, "All tests completed!")
    except Exception as e:
        log_and_print(Fore.RED, f"Tests failed to complete: {e}", level=logging.ERROR)
        logging.error(traceback.format_exc())
```
</details>
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> | ||
|  | 618ccea090 | feat: add input validation for search mode of rag query config (#2275) # What does this PR do?
Adds input validation for `mode` in `RagQueryConfig`. For the time being this prevents users from passing search modes other than `vector` and `keyword`; `hybrid` will follow once that functionality is implemented.
## Test Plan
```
# Check out this PR and enter the LS directory
uv sync --extra dev
```
Run the quickstart [example](https://llama-stack.readthedocs.io/en/latest/getting_started/#step-3-run-the-demo) and alter the Agent to include a `query_config`:
```
agent = Agent(
    client,
    model=model_id,
    instructions="You are a helpful assistant",
    tools=[
        {
            "name": "builtin::rag/knowledge_search",
            "args": {
                "vector_db_ids": [vector_db_id],
                "query_config": {
                    "mode": "i-am-not-vector",  # Test for non-valid search mode
                    "max_chunks": 6,
                },
            },
        }
    ],
)
```
Ensure you get the following error:
```
400: {'errors': [{'loc': ['mode'], 'msg': "Value error, mode must be either 'vector' or 'keyword' if supported by the vector_io provider", 'type': 'value_error'}]}
```
## Running unit tests
```
uv sync --extra dev
uv run pytest tests/unit/rag/test_rag_query.py -v
```
| ||
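The `mode` check described above can be sketched as follows. The "Value error, ..." text in the 400 response suggests a Pydantic validator in the real `RagQueryConfig`; this hypothetical `RagQueryConfigSketch` uses a stdlib dataclass instead so it runs without dependencies, and its field defaults are illustrative only.

```python
from dataclasses import dataclass

# Modes accepted for now; `hybrid` is planned but not yet allowed.
SUPPORTED_SEARCH_MODES = frozenset({"vector", "keyword"})


@dataclass
class RagQueryConfigSketch:
    """Hypothetical stand-in for RagQueryConfig, reduced to the fields used in the example."""

    mode: str = "vector"  # illustrative default
    max_chunks: int = 5  # illustrative default

    def __post_init__(self) -> None:
        # Reject unknown modes at construction time instead of failing later in the provider.
        if self.mode not in SUPPORTED_SEARCH_MODES:
            raise ValueError(
                "mode must be either 'vector' or 'keyword' if supported by the vector_io provider"
            )


if __name__ == "__main__":
    print(RagQueryConfigSketch(mode="keyword", max_chunks=6))
    try:
        RagQueryConfigSketch(mode="i-am-not-vector")
    except ValueError as err:
        print(f"rejected: {err}")
```

Validating in the constructor means an invalid config never reaches the vector_io provider, which is what lets the server answer with a 400 instead of a provider-side failure.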
|  | cd0ad21111 | chore(api): add mypy coverage to `apis` (#2648) # What does this PR do? This PR adds static type coverage to `llama-stack/apis`. Part of https://github.com/meta-llama/llama-stack/issues/2647 ## Test Plan Signed-off-by: Mustafa Elbehery <melbeher@redhat.com> | ||
|  | 5b07755556 | docs: Minor spelling fix (#2592) # What does this PR do? Minor spelling fix in the comments ## Test Plan No code changes | ||
|  | be9bf68246 | feat: Add webmethod for deleting openai responses (#2160) # What does this PR do? This PR creates a web method for deleting OpenAI responses, adds an implementation for it, and adds an integration test for the OpenAI delete response method. (Closes #2077) ## Test Plan Ran the standard tests, the pre-commit hooks, and the unit tests. For this PR I based the routes and implementation on the existing get and create methods. The unit tests could not cover this change because the mock interface in use did not allow CRUD operations to be tested effectively, so I instead created an integration test matching the existing ones in `test_openai_responses`. | ||
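The create/get/delete pattern the PR follows can be sketched with an in-memory store. `ResponsesStoreSketch` and `DeleteResult` are hypothetical names, and the deletion result shape (`id`, `object`, `deleted`) is an assumption modeled on OpenAI-style delete acknowledgements, not something stated in this PR.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DeleteResult:
    """Assumed shape of a delete acknowledgement (OpenAI-style id/object/deleted)."""

    id: str
    object: str = "response"
    deleted: bool = True


@dataclass
class ResponsesStoreSketch:
    """Hypothetical dict-backed store mirroring create/get/delete web methods."""

    _items: dict[str, dict[str, Any]] = field(default_factory=dict)

    def create(self, response_id: str, payload: dict[str, Any]) -> dict[str, Any]:
        self._items[response_id] = payload
        return payload

    def get(self, response_id: str) -> dict[str, Any]:
        # A route handler might translate this KeyError into an HTTP 404.
        if response_id not in self._items:
            raise KeyError(f"Response {response_id} not found")
        return self._items[response_id]

    def delete(self, response_id: str) -> DeleteResult:
        # Deleting an unknown id fails the same way as fetching one.
        if response_id not in self._items:
            raise KeyError(f"Response {response_id} not found")
        del self._items[response_id]
        return DeleteResult(id=response_id)
```

A round trip (create, delete, then get) is the kind of CRUD sequence the PR notes was easier to exercise in an integration test than through the existing unit-test mocks.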
|  | 7cb5d3c60f | chore: standardize unsupported model error #2517 (#2518) # What does this PR do?
- llama_stack/exceptions.py: Add `UnsupportedModelError` class
- remote inference ollama.py and utils/inference/model_registry.py: Replace `ValueError` with `UnsupportedModelError`
- utils/inference/litellm_openai_mixin.py: Remove the `register_model` implementation from the `LiteLLMOpenAIMixin` class; it now uses the implementation from its parent class, `ModelRegistryHelper`
Closes #2517
## Test Plan
1. Create a new `test_run_openai.yaml` and paste the following config into it:
```yaml
version: '2'
image_name: test-image
apis:
- inference
providers:
  inference:
  - provider_id: openai
    provider_type: remote::openai
    config:
      max_tokens: 8192
models:
- metadata: {}
  model_id: "non-existent-model"
  provider_id: openai
  model_type: llm
server:
  port: 8321
```
And run the server with:
```bash
uv run llama stack run test_run_openai.yaml
```
You should now get a `llama_stack.exceptions.UnsupportedModelError` with the list of supported models in the error message.
---
Tested for the following remote inference providers, and they all raise the `UnsupportedModelError`:
- Anthropic
- Cerebras
- Fireworks
- Gemini
- Groq
- Ollama
- OpenAI
- SambaNova
- Together
- Watsonx
---------
Co-authored-by: Rohan Awhad <rawhad@redhat.com> | ||
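An exception that carries the supported-model list in its message could look roughly like this. The constructor signature, the message format, and the `check_model` helper are assumptions for illustration; the PR only says the error names the supported models. Subclassing `ValueError` is a choice of this sketch so that existing `except ValueError` handlers would still catch it.

```python
class UnsupportedModelError(ValueError):
    """Hypothetical sketch: raised when a model id is not in a provider's supported list."""

    def __init__(self, model_name: str, supported_models: list[str]) -> None:
        message = (
            f"'{model_name}' model is not supported. "
            f"Supported models are: {', '.join(supported_models)}"
        )
        super().__init__(message)


def check_model(model_id: str, supported_models: list[str]) -> str:
    # Hypothetical helper standing in for the registry lookup a provider performs.
    if model_id not in supported_models:
        raise UnsupportedModelError(model_id, supported_models)
    return model_id
```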
|  | 36d70637b9 | fix: finish conversion to StrEnum (#2514) # What does this PR do? We still had a few enums declared to behave as both strings and enums. Let's use `StrEnum` for those. Signed-off-by: Sébastien Han <seb@redhat.com> | ||
|  | ac5fd57387 | chore: remove nested imports (#2515) # What does this PR do?
* Given that our API packages use `import *` in `__init__.py`, we don't need to do `from llama_stack.apis.models.models` but simply `from llama_stack.apis.models`. The decision to use `import *` is debatable and should probably be revisited at some point.
* Remove unneeded Ruff F401 rule
* Consolidate Ruff F403 rule in the pyproject
Signed-off-by: Sébastien Han <seb@redhat.com> | ||
|  | 2d9fd041eb | fix: annotations list and web_search_preview in Responses (#2520) # What does this PR do?
These are a couple of fixes to get an example LangChain app working with our OpenAI Responses API implementation.
The Responses API spec requires an annotations array in `output[*].content[*].annotations`, and we were not providing one. So, this adds that as an empty list, even though we don't do anything to populate it yet. This prevents errors from client libraries like LangChain that expect this field to always exist, even if it is an empty list.
The other fix is that `web_search_preview` is a valid name for the web search tool in the Responses API, but we only responded to `web_search` or `web_search_preview_2025_03_11`.
## Test Plan
The existing Responses unit tests were expanded to test these cases, via:
```
pytest -sv tests/unit/providers/agents/meta_reference/test_openai_responses.py
```
The existing test_openai_responses.py integration tests still pass with this change, tested as below with Fireworks:
```
uv run llama stack run llama_stack/templates/starter/run.yaml
LLAMA_STACK_CONFIG=http://localhost:8321 \
uv run pytest -sv tests/integration/agents/test_openai_responses.py \
  --text-model accounts/fireworks/models/llama4-scout-instruct-basic
```
Lastly, this example LangChain app now works with Llama Stack (tested with Ollama in the starter template in this case). This LangChain code uses the example snippets for the Responses API at https://python.langchain.com/docs/integrations/chat/openai/#responses-api
```python
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
    base_url="http://localhost:8321/v1/openai/v1",
    api_key="fake",
    model="ollama/meta-llama/Llama-3.2-3B-Instruct",
)
tool = {"type": "web_search_preview"}
llm_with_tools = llm.bind_tools([tool])
response = llm_with_tools.invoke("What was a positive news story from today?")
print(response.content)
```
Signed-off-by: Ben Browning <bbrownin@redhat.com> | ||
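Both fixes reduce to small rules, sketched below under stated assumptions: the three accepted tool type names come from the PR text, while the `output_text` content type and the helper names `is_web_search_tool` and `output_text_content` are hypothetical illustrations, not the implementation's actual code.

```python
from typing import Any

# Tool type names treated as web search, per the PR description.
WEB_SEARCH_TOOL_TYPES = frozenset(
    {"web_search", "web_search_preview", "web_search_preview_2025_03_11"}
)


def is_web_search_tool(tool: dict[str, Any]) -> bool:
    # Match on the tool's "type" field; other tool types are handled elsewhere.
    return tool.get("type") in WEB_SEARCH_TOOL_TYPES


def output_text_content(text: str) -> dict[str, Any]:
    # Always include an annotations list, even though nothing populates it yet,
    # so clients that expect the field (e.g. LangChain) do not fail.
    return {"type": "output_text", "text": text, "annotations": []}
```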
|  | 82f13fe83e | feat: Add ChunkMetadata to Chunk (#2497) # What does this PR do?
Adding `ChunkMetadata` so we can properly delete embeddings later.
More specifically, this PR refactors and extends the chunk metadata
handling in the vector database and introduces a distinction between
metadata used for model context and backend-only metadata required for
chunk management, storage, and retrieval. It also improves chunk ID
generation and propagation throughout the stack, enhances test coverage,
and adds new utility modules.
```python
from pydantic import BaseModel
class ChunkMetadata(BaseModel):
    """
    `ChunkMetadata` is backend metadata for a `Chunk` that is used to store additional information about the chunk that
    will NOT be inserted into the context during inference, but is required for backend functionality.
    Use `metadata` in `Chunk` for metadata that will be used during inference.
    """
    document_id: str | None = None
    chunk_id: str | None = None
    source: str | None = None
    created_timestamp: int | None = None
    updated_timestamp: int | None = None
    chunk_window: str | None = None
    chunk_tokenizer: str | None = None
    chunk_embedding_model: str | None = None
    chunk_embedding_dimension: int | None = None
    content_token_count: int | None = None
    metadata_token_count: int | None = None
```
Eventually we can migrate the document_id out of the `metadata` field.
I've introduced the changes so that `ChunkMetadata` is backwards
compatible with `metadata`.
Closes https://github.com/meta-llama/llama-stack/issues/2501 
## Test Plan
Added unit tests
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> | ||
|  | cfee63bd0d | feat: Add search_mode support to OpenAI vector store API (#2500) # What does this PR do? Add search_mode parameter (vector/keyword/hybrid) to openai_search_vector_store method. Fixes OpenAPI code generation by using str instead of Literal type. Closes: #2459 ## Test Plan Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com> | ||
|  | 9c8be89fb6 | chore: bump python supported version to 3.12 (#2475) # What does this PR do? The project now supports Python >= 3.12. Signed-off-by: Sébastien Han <seb@redhat.com> | ||
|  | 6832e8a658 | feat: remove score_threshold constraint (#2479) # What does this PR do?
See inline comment.
Fixes test:
_ test_openai_vector_store_search_with_high_score_filter[llama_stack_client-meta-llama/Llama-3.3-70B-Instruct-meta-llama/Llama-4-Scout-17B-16E-Instruct-all-MiniLM-L6-v2-None-None] _
llama-stack/llama_stack/distribution/library_client.py:98: in convert_to_pydantic
    return TypeAdapter(annotation).validate_python(value)
.venv/lib/python3.10/site-packages/pydantic/type_adapter.py:421: in validate_python
    return self.validator.validate_python(
E   pydantic_core._pydantic_core.ValidationError: 1 validation error for nullable[SearchRankingOptions]
E   score_threshold
E     Input should be less than or equal to 1 [type=less_than_equal, input_value=1.3458905661753127, input_type=float]
E     For further information visit https://errors.pydantic.dev/2.11/v/less_than_equal
The above exception was the direct cause of the following exception:
llama-stack/tests/integration/vector_io/test_openai_vector_stores.py:376: in test_openai_vector_store_search_with_high_score_filter
    search_response = compat_client.vector_stores.search(
.venv/lib/python3.10/site-packages/llama_stack_client/resources/vector_stores/vector_stores.py:356: in search
    return self._post(
.venv/lib/python3.10/site-packages/llama_stack_client/_base_client.py:1232: in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
llama-stack/llama_stack/distribution/library_client.py:177: in request
    result = loop.run_until_complete(self.async_client.request(*args, **kwargs))
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/asyncio/base_events.py:649: in run_until_complete
    return future.result()
llama-stack/llama_stack/distribution/library_client.py:292: in request
    response = await self._call_non_streaming(
llama-stack/llama_stack/distribution/library_client.py:313: in _call_non_streaming
    body = self._convert_body(path, options.method, body)
llama-stack/llama_stack/distribution/library_client.py:425: in _convert_body
    converted_body[param_name] = convert_to_pydantic(param.annotation, value)
llama-stack/llama_stack/distribution/library_client.py:112: in convert_to_pydantic
    raise ValueError(f"Failed to convert parameter {value} into {annotation}: {e}") from e
E   ValueError: Failed to convert parameter {'score_threshold': 1.3458905661753127} into llama_stack.apis.vector_io.vector_io.SearchRankingOptions | None: 1 validation error for nullable[SearchRankingOptions]
E   score_threshold
E     Input should be less than or equal to 1 [type=less_than_equal, input_value=1.3458905661753127, input_type=float]
E     For further information visit https://errors.pydantic.dev/2.11/v/less_than_equal
## Test Plan | ||
|  | f394c7f2d9 | feat: Add missing Vector Store Files API surface (#2468) # What does this PR do?
This adds the ability to list, retrieve, update, and delete Vector Store Files. It implements these new APIs for the faiss and sqlite-vec providers, since those are the two that also have the rest of the vector store files implementation.
Closes #2445
## Test Plan
### test_openai_vector_stores Integration Tests
There are a number of new integration tests added, which I ran for each provider as outlined below.
faiss (from ollama distro):
```
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
llama stack run llama_stack/templates/ollama/run.yaml
LLAMA_STACK_CONFIG=http://localhost:8321 \
pytest -sv tests/integration/vector_io/test_openai_vector_stores.py \
  --embedding-model=all-MiniLM-L6-v2
```
sqlite-vec (from starter distro):
```
llama stack run llama_stack/templates/starter/run.yaml
LLAMA_STACK_CONFIG=http://localhost:8321 \
pytest -sv tests/integration/vector_io/test_openai_vector_stores.py \
  --embedding-model=all-MiniLM-L6-v2
```
### file_search verification tests
I also ensured the file_search verification tests continue to work, both for faiss and sqlite-vec.
faiss (ollama distro):
```
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
llama stack run llama_stack/templates/ollama/run.yaml
pytest -sv tests/verifications/openai_api/test_responses.py \
  -k'file_search' \
  --base-url=http://localhost:8321/v1/openai/v1 \
  --model=meta-llama/Llama-3.2-3B-Instruct
```
sqlite-vec (starter distro):
```
llama stack run llama_stack/templates/starter/run.yaml
pytest -sv tests/verifications/openai_api/test_responses.py \
  -k'file_search' \
  --base-url=http://localhost:8321/v1/openai/v1 \
  --model=together/meta-llama/Llama-3.2-3B-Instruct-Turbo
```
---------
Signed-off-by: Ben Browning <bbrownin@redhat.com> |