mirror of
				https://github.com/meta-llama/llama-stack.git
				synced 2025-10-25 17:11:12 +00:00 
			
		
		
		
	
	
		
			9 commits
		
	
	
	| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|  | 42414a1a1b | fix(logging): disable console telemetry sink by default (#3623) 
		
			Some checks failed
		
		
	 SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 0s Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Vector IO Integration Tests / test-matrix (push) Failing after 3s Test Llama Stack Build / generate-matrix (push) Successful in 3s Python Package Build Test / build (3.12) (push) Failing after 1s Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s Test External API and Providers / test-external (venv) (push) Failing after 4s Unit Tests / unit-tests (3.13) (push) Failing after 3s Test Llama Stack Build / build (push) Failing after 4s Python Package Build Test / build (3.13) (push) Failing after 21s Test Llama Stack Build / build-single-provider (push) Failing after 25s Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 27s Unit Tests / unit-tests (3.12) (push) Failing after 22s API Conformance Tests / check-schema-compatibility (push) Successful in 33s UI Tests / ui-tests (22) (push) Successful in 39s Pre-commit / pre-commit (push) Successful in 1m12s The current span processing dumps so much junk on the console that it makes actual understanding of what is going on in the server impossible. I am killing the console sink as a default. If you want, you are always free to change your run.yaml to add it. Before: <img width="1877" height="1107" alt="image" src="https://github.com/user-attachments/assets/3a7ad261-e2ba-4d40-9820-fcc282c8df37" /> After: <img width="1919" height="470" alt="image" src="https://github.com/user-attachments/assets/bc7cf763-fba9-4e95-a4b5-f65f6d1c5332" /> | ||
|  | f31bcc11bc | feat: add Azure OpenAI inference provider support (#3396) # What does this PR do? Llama-stack now supports a new OpenAI compatible endpoint with Azure OpenAI. The starter distro has been updated to add the new remote inference provider. A few tests have been modified and improved. ## Test Plan Deploy a model in the Aure portal then: ``` $ AZURE_API_KEY=... AZURE_API_BASE=... uv run llama stack build --image-type venv --providers inference=remote::azure --run ... $ LLAMA_STACK_CONFIG=http://localhost:8321 uv run --group test pytest -v -ra --text-model azure/gpt-4.1 tests/integration/inference/test_openai_completion.py ... Results: ``` ============================================= test session starts ============================================== platform darwin -- Python 3.12.8, pytest-8.4.1, pluggy-1.6.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3 cachedir: .pytest_cache metadata: {'Python': '3.12.8', 'Platform': 'macOS-15.6.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.1', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0', 'hydra-core': '1.3.2'}} rootdir: /Users/leseb/Documents/AI/llama-stack configfile: pyproject.toml plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0, hydra-core-1.3.2 asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function collected 27 items tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming[txt=azure/gpt-5-mini-inference:completion:sanity] SKIPPED [ 3%] tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming_suffix[txt=azure/gpt-5-mini-inference:completion:suffix] SKIPPED [ 7%] tests/integration/inference/test_openai_completion.py::test_openai_completion_streaming[txt=azure/gpt-5-mini-inference:completion:sanity] SKIPPED [ 11%] tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=azure/gpt-5-mini-1] SKIPPED [ 14%] tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=azure/gpt-5-mini] SKIPPED [ 18%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:non_streaming_01] PASSED [ 22%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:streaming_01] PASSED [ 25%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:streaming_01] PASSED [ 29%] tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=azure/gpt-5-mini-True] PASSED [ 33%] tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=azure/gpt-5-mini-True] PASSED [ 37%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming_with_file[txt=azure/gpt-5-mini] SKIPPEDed files.) [ 40%] tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=azure/gpt-5-mini-0] SKIPPED [ 44%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:non_streaming_02] PASSED [ 48%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:streaming_02] PASSED [ 51%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:streaming_02] PASSED [ 55%] tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=azure/gpt-5-mini-False] PASSED [ 59%] tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=azure/gpt-5-mini-False] PASSED [ 62%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:non_streaming_01] PASSED [ 66%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:streaming_01] PASSED [ 70%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:streaming_01] PASSED [ 74%] tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=azure/gpt-5-mini-True] PASSED [ 77%] tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=azure/gpt-5-mini-True] PASSED [ 81%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:non_streaming_02] PASSED [ 85%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:streaming_02] PASSED [ 88%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:streaming_02] PASSED [ 92%] tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=azure/gpt-5-mini-False] PASSED [ 96%] tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=azure/gpt-5-mini-False] PASSED [100%] =========================================== short test summary info ============================================ SKIPPED [3] tests/integration/inference/test_openai_completion.py:63: Model azure/gpt-5-mini hosted by remote::azure doesn't support OpenAI completions. SKIPPED [3] tests/integration/inference/test_openai_completion.py:118: Model azure/gpt-5-mini hosted by remote::azure doesn't support vllm extra_body parameters. SKIPPED [1] tests/integration/inference/test_openai_completion.py:124: Model azure/gpt-5-mini hosted by remote::azure doesn't support chat completion calls with base64 encoded files. ================================== 20 passed, 7 skipped, 2 warnings in 51.77s ================================== ``` Signed-off-by: Sébastien Han <seb@redhat.com> | ||
|  | 9fa69b0337 | feat(distro): no huggingface provider for starter (#3258) The `trl` dependency brings in `accelerate` which brings in nvidia dependencies for torch. We cannot have that in the starter distro. As such, no CPU-only post-training for the huggingface provider. | ||
|  | 7519b73fcc | feat(distro): fork off a starter-gpu distribution (#3240) The starter distribution added post-training which added torch dependencies which pulls in all the nvidia CUDA libraries. This made our starter container very big. We have worked hard to keep the starter container small so it serves its purpose as a starter. This PR tries to get it back to its size by forking off duplicate "-gpu" providers for post-training. These forked providers are then used for a new `starter-gpu` distribution which can pull in all dependencies. | ||
|  | 7519ab4024 | feat: Code scanner Provider impl for moderations api (#3100) # What does this PR do? Add CodeScanner implementations ## Test Plan `SAFETY_MODEL=CodeScanner LLAMA_STACK_CONFIG=starter uv run pytest -v tests/integration/safety/test_safety.py --text-model=llama3.2:3b-instruct-fp16 --embedding-model=all-MiniLM-L6-v2 --safety-shield=ollama` This PR need to land after this https://github.com/meta-llama/llama-stack/pull/3098 | ||
|  | 914c7be288 | feat: add batches API with OpenAI compatibility (with inference replay) (#3162) Add complete batches API implementation with protocol, providers, and tests: Core Infrastructure: - Add batches API protocol using OpenAI Batch types directly - Add Api.batches enum value and protocol mapping in resolver - Add OpenAI "batch" file purpose support - Include proper error handling (ConflictError, ResourceNotFoundError) Reference Provider: - Add ReferenceBatchesImpl with full CRUD operations (create, retrieve, cancel, list) - Implement background batch processing with configurable concurrency - Add SQLite KVStore backend for persistence - Support /v1/chat/completions endpoint with request validation Comprehensive Test Suite: - Add unit tests for provider implementation with validation - Add integration tests for end-to-end batch processing workflows - Add error handling tests for validation, malformed inputs, and edge cases Configuration: - Add max_concurrent_batches and max_concurrent_requests_per_batch options - Add provider documentation with sample configurations Test with - ``` $ uv run llama stack build --image-type venv --providers inference=YOU_PICK,files=inline::localfs,batches=inline::reference --run & $ LLAMA_STACK_CONFIG=http://localhost:8321 uv run pytest tests/unit/providers/batches tests/integration/batches --text-model YOU_PICK ``` addresses #3066 --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> | ||
|  | a4bad6c0b4 | feat: Add Google Vertex AI inference provider support (#2841) 
		
			Some checks failed
		
		
	 Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 10s Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 12s Python Package Build Test / build (3.13) (push) Failing after 4s Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 10s Test Llama Stack Build / generate-matrix (push) Successful in 8s Test Llama Stack Build / build-custom-container-distribution (push) Failing after 13s Test External API and Providers / test-external (venv) (push) Failing after 11s SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 17s Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 10s Test Llama Stack Build / build-single-provider (push) Failing after 16s Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 8s Unit Tests / unit-tests (3.12) (push) Failing after 10s SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 26s Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 15s Update ReadTheDocs / update-readthedocs (push) Failing after 9s Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 7s Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 11s Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 23s Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 16s Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 18s Test Llama Stack Build / build (push) Failing after 8s Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 17s Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 16s Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 8s Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 21s Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 47s Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 49s Unit Tests / unit-tests (3.13) (push) Failing after 39s Pre-commit / pre-commit (push) Successful in 1m37s # What does this PR do? - Add new Vertex AI remote inference provider with litellm integration - Support for Gemini models through Google Cloud Vertex AI platform - Uses Google Cloud Application Default Credentials (ADC) for authentication - Added VertexAI models: gemini-2.5-flash, gemini-2.5-pro, gemini-2.0-flash. - Updated provider registry to include vertexai provider - Updated starter template to support Vertex AI configuration - Added comprehensive documentation and sample configuration <!-- If resolving an issue, uncomment and update the line below --> relates to https://github.com/meta-llama/llama-stack/issues/2747 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> Signed-off-by: Eran Cohen <eranco@redhat.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com> | ||
|  | 7f834339ba | chore(misc): make tests and starter faster (#3042) 
		
			Some checks failed
		
		
	 Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 9s Python Package Build Test / build (3.12) (push) Failing after 4s Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 12s Test Llama Stack Build / generate-matrix (push) Successful in 11s Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 12s Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 14s SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 22s Test External API and Providers / test-external (venv) (push) Failing after 14s Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 12s Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 15s SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 22s Test Llama Stack Build / build-custom-container-distribution (push) Failing after 14s Unit Tests / unit-tests (3.13) (push) Failing after 14s Test Llama Stack Build / build-single-provider (push) Failing after 13s Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 18s Unit Tests / unit-tests (3.12) (push) Failing after 16s Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 18s Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 10s Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 11s Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 16s Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 18s Test Llama Stack Build / build (push) Failing after 12s Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 18s Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 20s Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 16s Python Package Build Test / build (3.13) (push) Failing after 53s Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 59s Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 1m1s Update ReadTheDocs / update-readthedocs (push) Failing after 1m6s Pre-commit / pre-commit (push) Successful in 1m53s A bunch of miscellaneous cleanup focusing on tests, but ended up speeding up starter distro substantially. - Pulled llama stack client init for tests into `pytest_sessionstart` so it does not clobber output - Profiling of that told me where we were doing lots of heavy imports for starter, so lazied them - starter now starts 20seconds+ faster on my Mac - A few other smallish refactors for `compat_client` | ||
|  | cc87995e2b | chore: rename templates to distributions (#3035) As the title says. Distributions is in, Templates is out. `llama stack build --template` --> `llama stack build --distro`. For backward compatibility, the previous option is kept but results in a warning. Updated `server.py` to remove the "config_or_template" backward compatibility since it has been a couple releases since that change. | 
				Renamed from llama_stack/templates/starter/run.yaml (Browse further)