Ashwin Bharambe · a385e0d95e
more better

2025-10-05 21:55:44 -07:00

Ashwin Bharambe · ea233c2134
feat!: providers use unified 'persistence' field

BREAKING CHANGE: Provider config field names changed for semantic clarity
- Rename kvstore → persistence for all providers
- Simple providers: flat persistence with backend reference
- Complex providers (agents): nested persistence.agent_state + persistence.responses
- Files provider: metadata_store → persistence
- Provider configs now clearly express 'how do I persist?' not 'what type of store?'
Example:
  # Before
  config:
    kvstore:
      backend: kvstore
      namespace: faiss
  # After
  config:
    persistence:
      backend: kvstore
      namespace: faiss
  # Agents (nested)
  config:
    persistence:
      agent_state:
        backend: kvstore
        namespace: agents
      responses:
        backend: sqlstore
        namespace: responses 
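For the flat case the rename is mechanical, so existing configs could be migrated with a small script. This is a minimal sketch and is not part of Llama Stack. `migrate_provider_config` and `LEGACY_PERSISTENCE_KEYS` are hypothetical names. The sketch assumes the provider `config` section has already been loaded from `run.yaml` into a plain `dict`. It only handles the flat `kvstore`/`metadata_store` to `persistence` rename and does not produce the nested agents layout.

```python
from copy import deepcopy
from typing import Any, Dict

# Hypothetical list of the pre-rename field names covered by this sketch:
# `kvstore` (simple providers) and `metadata_store` (files provider).
LEGACY_PERSISTENCE_KEYS = ("kvstore", "metadata_store")


def migrate_provider_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` with a legacy store field renamed to `persistence`.

    Only the flat case is handled; configs without a legacy field come back unchanged.
    """
    migrated = deepcopy(config)  # never mutate the caller's mapping
    found = [key for key in LEGACY_PERSISTENCE_KEYS if key in migrated]
    if len(found) > 1:
        raise ValueError(f"ambiguous legacy fields: {found}")
    if found and "persistence" in migrated:
        raise ValueError(f"both 'persistence' and legacy field {found[0]!r} are set")
    if found:
        # The value (backend reference + namespace) is carried over unchanged.
        migrated["persistence"] = migrated.pop(found[0])
    return migrated


before = {"kvstore": {"backend": "kvstore", "namespace": "faiss"}}
after = migrate_provider_config(before)
print(after)  # {'persistence': {'backend': 'kvstore', 'namespace': 'faiss'}}
```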

2025-10-05 20:33:03 -07:00

Ashwin Bharambe · 490110eba2
Simplify to single 'default' backend name

- Replace kvstore/sqlstore backend names with unified 'default'
- Context-aware parsing handles KVStore vs SqlStore based on usage
- Provider configs use 'backend: default' everywhere
- Much cleaner: kvstore { backend: default } instead of { backend: kvstore } 

2025-10-05 19:06:47 -07:00

Ashwin Bharambe · 5672e70832
Fix discriminator ambiguity with context-aware backend parsing

- Both SqliteKVStoreConfig and SqliteSqlStoreConfig use type='sqlite'
- Pydantic cannot distinguish them in a union
- Solution: Custom validator parses backends based on which stores reference them
- Metadata store requires KVStore, inference/conversations require SqlStore
- Separate kvstore/sqlstore backends in configs for clarity 
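Both classes carry `type: 'sqlite'`, so the `type` value alone cannot pick a class. The store that references a backend has to decide instead. The sketch below illustrates that idea with standard-library dataclasses instead of Pydantic, so it is not the validator from this commit. The class names come from the commit message. Their fields (`db_path`, `type`), the `STORE_KINDS` table, and `parse_backends` are assumptions for illustration. The sketch mirrors this commit's layout, in which each backend is referenced as only one kind of store.

```python
from dataclasses import dataclass
from typing import Any, Dict, Type, Union


@dataclass
class SqliteKVStoreConfig:
    db_path: str
    type: str = "sqlite"  # same tag as the SQL store, so useless as a discriminator


@dataclass
class SqliteSqlStoreConfig:
    db_path: str
    type: str = "sqlite"


BackendConfig = Union[SqliteKVStoreConfig, SqliteSqlStoreConfig]

# Which config class each store requires (metadata -> KVStore, the rest -> SqlStore).
STORE_KINDS: Dict[str, Type[Any]] = {
    "metadata": SqliteKVStoreConfig,
    "inference": SqliteSqlStoreConfig,
    "conversations": SqliteSqlStoreConfig,
}


def parse_backends(
    raw_backends: Dict[str, Dict[str, Any]],
    stores: Dict[str, Dict[str, Any]],
) -> Dict[str, BackendConfig]:
    """Parse raw backend dicts using the stores that reference them, not `type`."""
    required: Dict[str, Type[Any]] = {}
    for store_name, ref in stores.items():
        cls = STORE_KINDS[store_name]
        previous = required.setdefault(ref["backend"], cls)
        if previous is not cls:
            raise ValueError(f"backend {ref['backend']!r} is used as both KV and SQL store")

    missing = sorted(set(required) - set(raw_backends))
    if missing:
        raise ValueError(f"stores reference undefined backends: {missing}")

    parsed: Dict[str, BackendConfig] = {}
    for name, raw in raw_backends.items():
        if name not in required:
            raise ValueError(f"backend {name!r} is unreferenced, so its kind is ambiguous")
        parsed[name] = required[name](**raw)
    return parsed


backends = parse_backends(
    raw_backends={
        "kvstore": {"type": "sqlite", "db_path": "kvstore.db"},
        "sqlstore": {"type": "sqlite", "db_path": "sqlstore.db"},
    },
    stores={
        "metadata": {"backend": "kvstore"},
        "inference": {"backend": "sqlstore"},
        "conversations": {"backend": "sqlstore"},
    },
)
print({name: type(cfg).__name__ for name, cfg in backends.items()})
# {'kvstore': 'SqliteKVStoreConfig', 'sqlstore': 'SqliteSqlStoreConfig'}
```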

2025-10-05 14:16:54 -07:00

Ashwin Bharambe · b1659369e8
Refactor persistence config to use stores key with unified backends

- Add StoresConfig to group all store references under persistence.stores
- Use single 'default' backend instead of separate metadata_backend/inference_backend
- Update resolver to access persistence.stores.{metadata,inference,conversations}
- All SQLite distributions now use single store.db file with shared backend 
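The resulting shape is roughly one named backend plus a `stores` group whose entries point at that backend by name, so every store resolves to the same `store.db`. The sketch below uses dataclasses and is not the actual resolver code. The names `stores`, `metadata`, `inference`, `conversations`, and the `default` backend come from the commit message. The field names `backends`, `backend`, `namespace`, and `db_path`, and the `resolve_store` helper, are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class StoreReference:
    backend: str                     # name of an entry in PersistenceConfig.backends
    namespace: Optional[str] = None  # hypothetical per-store namespace


@dataclass
class StoresConfig:
    metadata: StoreReference
    inference: StoreReference
    conversations: StoreReference


@dataclass
class PersistenceConfig:
    backends: Dict[str, Dict[str, Any]]
    stores: StoresConfig


def resolve_store(persistence: PersistenceConfig, store_name: str) -> Dict[str, Any]:
    """Look up persistence.stores.<store_name> and merge in its backend settings."""
    ref: StoreReference = getattr(persistence.stores, store_name)
    if ref.backend not in persistence.backends:
        raise KeyError(f"store {store_name!r} references unknown backend {ref.backend!r}")
    return {**persistence.backends[ref.backend], "namespace": ref.namespace}


persistence = PersistenceConfig(
    backends={"default": {"type": "sqlite", "db_path": "store.db"}},
    stores=StoresConfig(
        metadata=StoreReference(backend="default", namespace="metadata"),
        inference=StoreReference(backend="default", namespace="inference"),
        conversations=StoreReference(backend="default", namespace="conversations"),
    ),
)
for name in ("metadata", "inference", "conversations"):
    print(name, resolve_store(persistence, name)["db_path"])  # store.db for every store
```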

2025-10-05 13:20:44 -07:00

Ashwin Bharambe · ac9b4b4350
Migrate starter to use unified persistence config

2025-10-05 13:14:44 -07:00

Ashwin Bharambe · 42414a1a1b
fix(logging): disable console telemetry sink by default (#3623)

The current span processing dumps so much junk on the console that it
makes actual understanding of what is going on in the server impossible.
I am killing the console sink as a default. If you want, you are always
free to change your run.yaml to add it.
Before:
<img width="1877" height="1107" alt="image" src="https://github.com/user-attachments/assets/3a7ad261-e2ba-4d40-9820-fcc282c8df37" />
After:
<img width="1919" height="470" alt="image" src="https://github.com/user-attachments/assets/bc7cf763-fba9-4e95-a4b5-f65f6d1c5332" />

2025-09-30 14:58:05 -07:00

Charlie Doern · 8422bd102a
feat: combine ProviderSpec datatypes (#3378)

# What does this PR do?
Currently `RemoteProviderSpec` has an `AdapterSpec` embedded in it.
Remove `AdapterSpec`, and put its leftover fields into
`RemoteProviderSpec`.
Additionally, many of the fields were duplicated between
`InlineProviderSpec` and `RemoteProviderSpec`. Move these to
`ProviderSpec` so they are shared.
Fix up the distro codegen to use `RemoteProviderSpec` directly rather
than `remote_provider_spec`, which took an `AdapterSpec` and returned a
full provider spec.
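A rough sketch of the resulting class shape follows. It uses plain dataclasses rather than the project's own model classes. Every field name shown (`api`, `provider_type`, `config_class`, `module`, `pip_packages`, `adapter_type`) is an illustrative assumption, not the exact field list. The point is structural: shared fields live on `ProviderSpec`, and a `RemoteProviderSpec` is built directly instead of wrapping an `AdapterSpec`.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class ProviderSpec:
    # Fields shared by inline and remote providers live on the base class.
    api: str
    provider_type: str
    config_class: str
    module: str = ""
    pip_packages: List[str] = field(default_factory=list)


@dataclass
class InlineProviderSpec(ProviderSpec):
    pass  # inline-only fields would go here


@dataclass
class RemoteProviderSpec(ProviderSpec):
    # Leftover fields from the removed AdapterSpec are flattened in here,
    # so callers read spec.adapter_type instead of spec.adapter.adapter_type.
    adapter_type: str = ""


spec = RemoteProviderSpec(
    api="inference",
    provider_type="remote::example",      # hypothetical provider
    config_class="example.ExampleConfig",  # hypothetical config class path
    module="example.remote",               # hypothetical module path
    pip_packages=["example-client"],
    adapter_type="example",
)
print(spec.adapter_type, spec.pip_packages)  # example ['example-client']
```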
## Test Plan
existing distro tests should pass.
Signed-off-by: Charlie Doern <cdoern@redhat.com> 

2025-09-18 16:10:00 +02:00

Sébastien Han · f31bcc11bc
feat: add Azure OpenAI inference provider support (#3396)

# What does this PR do?
Llama Stack now supports Azure OpenAI through its OpenAI-compatible
endpoint. The starter distro has been updated to add the new remote
inference provider.
A few tests have been modified and improved.
## Test Plan
Deploy a model in the Azure portal, then:
```
$ AZURE_API_KEY=... AZURE_API_BASE=... uv run llama stack build --image-type venv --providers inference=remote::azure --run
...
$ LLAMA_STACK_CONFIG=http://localhost:8321  uv run --group test pytest -v -ra --text-model azure/gpt-4.1 tests/integration/inference/test_openai_completion.py
...
```
Results:
```
============================================= test session starts ==============================================
platform darwin -- Python 3.12.8, pytest-8.4.1, pluggy-1.6.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.12.8', 'Platform': 'macOS-15.6.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.1', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0', 'hydra-core': '1.3.2'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0, hydra-core-1.3.2
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 27 items

tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming[txt=azure/gpt-5-mini-inference:completion:sanity] SKIPPED [ 3%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming_suffix[txt=azure/gpt-5-mini-inference:completion:suffix] SKIPPED [ 7%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_streaming[txt=azure/gpt-5-mini-inference:completion:sanity] SKIPPED [ 11%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=azure/gpt-5-mini-1] SKIPPED [ 14%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=azure/gpt-5-mini] SKIPPED [ 18%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:non_streaming_01] PASSED [ 22%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:streaming_01] PASSED [ 25%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:streaming_01] PASSED [ 29%]
tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=azure/gpt-5-mini-True] PASSED [ 33%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=azure/gpt-5-mini-True] PASSED [ 37%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming_with_file[txt=azure/gpt-5-mini] SKIPPED [ 40%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=azure/gpt-5-mini-0] SKIPPED [ 44%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:non_streaming_02] PASSED [ 48%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:streaming_02] PASSED [ 51%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:streaming_02] PASSED [ 55%]
tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=azure/gpt-5-mini-False] PASSED [ 59%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=azure/gpt-5-mini-False] PASSED [ 62%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:non_streaming_01] PASSED [ 66%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:streaming_01] PASSED [ 70%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:streaming_01] PASSED [ 74%]
tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=azure/gpt-5-mini-True] PASSED [ 77%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=azure/gpt-5-mini-True] PASSED [ 81%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:non_streaming_02] PASSED [ 85%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:streaming_02] PASSED [ 88%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:streaming_02] PASSED [ 92%]
tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=azure/gpt-5-mini-False] PASSED [ 96%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=azure/gpt-5-mini-False] PASSED [100%]

=========================================== short test summary info ============================================
SKIPPED [3] tests/integration/inference/test_openai_completion.py:63: Model azure/gpt-5-mini hosted by remote::azure doesn't support OpenAI completions.
SKIPPED [3] tests/integration/inference/test_openai_completion.py:118: Model azure/gpt-5-mini hosted by remote::azure doesn't support vllm extra_body parameters.
SKIPPED [1] tests/integration/inference/test_openai_completion.py:124: Model azure/gpt-5-mini hosted by remote::azure doesn't support chat completion calls with base64 encoded files.
================================== 20 passed, 7 skipped, 2 warnings in 51.77s ==================================
```
Signed-off-by: Sébastien Han <seb@redhat.com> 

2025-09-11 13:48:38 +02:00

Derek Higgins · 64b2977162
fix: Fix locations of distribution runtime directories (#3336)

The defaults were mixed up
Signed-off-by: Derek Higgins <derekh@redhat.com> 

2025-09-05 14:09:36 +02:00

Ashwin Bharambe · 9fa69b0337
feat(distro): no huggingface provider for starter (#3258)

The `trl` dependency brings in `accelerate`, which brings in NVIDIA
dependencies for torch. We cannot have that in the starter distro. As
such, no CPU-only post-training for the huggingface provider. 

2025-08-26 14:06:36 -07:00

Ashwin Bharambe · 7519b73fcc
feat(distro): fork off a starter-gpu distribution (#3240)

The starter distribution added post-training, which added torch
dependencies, which pull in all the NVIDIA CUDA libraries. This made our
starter container very big. We have worked hard to keep the starter
container small so it serves its purpose as a starter. This PR tries to
restore its original size by forking off duplicate "-gpu" providers for
post-training. These forked providers are then used for a new
`starter-gpu` distribution, which can pull in all dependencies.

2025-08-22 15:47:15 -07:00

slekkala1 · 7519ab4024
feat: Code scanner Provider impl for moderations api (#3100)

# What does this PR do?
Add CodeScanner implementations
## Test Plan
`SAFETY_MODEL=CodeScanner LLAMA_STACK_CONFIG=starter uv run pytest -v
tests/integration/safety/test_safety.py
--text-model=llama3.2:3b-instruct-fp16
--embedding-model=all-MiniLM-L6-v2 --safety-shield=ollama`
This PR needs to land after
https://github.com/meta-llama/llama-stack/pull/3098

2025-08-18 14:15:40 -07:00

Matthew Farrellee · 914c7be288
feat: add batches API with OpenAI compatibility (with inference replay) (#3162)

Add complete batches API implementation with protocol, providers, and tests:
Core Infrastructure:
- Add batches API protocol using OpenAI Batch types directly
- Add Api.batches enum value and protocol mapping in resolver
- Add OpenAI "batch" file purpose support
- Include proper error handling (ConflictError, ResourceNotFoundError)
Reference Provider:
- Add ReferenceBatchesImpl with create, retrieve, cancel, and list operations
- Implement background batch processing with configurable concurrency
- Add SQLite KVStore backend for persistence
- Support /v1/chat/completions endpoint with request validation
Comprehensive Test Suite:
- Add unit tests for provider implementation with validation
- Add integration tests for end-to-end batch processing workflows
- Add error handling tests for validation, malformed inputs, and edge cases
Configuration:
- Add max_concurrent_batches and max_concurrent_requests_per_batch options
- Add provider documentation with sample configurations
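The two concurrency options suggest a two-level limit. At most `max_concurrent_batches` batches are in flight, and within each batch at most `max_concurrent_requests_per_batch` requests run at once. The sketch below shows one way to enforce that with `asyncio.Semaphore`. It is not the `ReferenceBatchesImpl` code. `process_batches`, `demo`, and the `fake_request` handler, which stands in for a `/v1/chat/completions` call, are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple


async def process_batches(
    batches: List[List[str]],
    handle_request: Callable[[str], Awaitable[str]],
    max_concurrent_batches: int,
    max_concurrent_requests_per_batch: int,
) -> List[List[str]]:
    """Run every request of every batch while respecting both concurrency limits."""
    batch_slots = asyncio.Semaphore(max_concurrent_batches)

    async def run_batch(requests: List[str]) -> List[str]:
        async with batch_slots:  # limit 1: batches in flight
            request_slots = asyncio.Semaphore(max_concurrent_requests_per_batch)

            async def run_request(body: str) -> str:
                async with request_slots:  # limit 2: requests in flight per batch
                    return await handle_request(body)

            # gather() returns results in input order, not completion order.
            return list(await asyncio.gather(*(run_request(r) for r in requests)))

    return list(await asyncio.gather(*(run_batch(b) for b in batches)))


async def demo(max_batches: int, max_requests: int) -> Tuple[List[List[str]], int]:
    in_flight = 0
    peak = 0

    async def fake_request(body: str) -> str:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # pretend to call an inference provider
        in_flight -= 1
        return body.upper()

    results = await process_batches(
        [["a", "b", "c"], ["d", "e"], ["f"]], fake_request, max_batches, max_requests
    )
    return results, peak


results, peak = asyncio.run(demo(2, 2))
print(results)  # [['A', 'B', 'C'], ['D', 'E'], ['F']]
print(f"peak in-flight requests: {peak}")  # bounded by 2 batches x 2 requests
```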
Test with:
```
$ uv run llama stack build --image-type venv --providers inference=YOU_PICK,files=inline::localfs,batches=inline::reference --run &
$ LLAMA_STACK_CONFIG=http://localhost:8321  uv run pytest tests/unit/providers/batches tests/integration/batches --text-model YOU_PICK
```
addresses #3066 
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> 

2025-08-15 15:34:15 -07:00

Eran Cohen · a4bad6c0b4
feat: Add Google Vertex AI inference provider support (#2841)

# What does this PR do?
- Add new Vertex AI remote inference provider with litellm integration
- Support for Gemini models through Google Cloud Vertex AI platform
- Uses Google Cloud Application Default Credentials (ADC) for authentication
- Added VertexAI models: gemini-2.5-flash, gemini-2.5-pro, gemini-2.0-flash.
- Updated provider registry to include vertexai provider
- Updated starter template to support Vertex AI configuration
- Added comprehensive documentation and sample configuration
relates to https://github.com/meta-llama/llama-stack/issues/2747 
## Test Plan
Signed-off-by: Eran Cohen <eranco@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com> 

2025-08-11 08:22:04 -04:00

Ashwin Bharambe · 7f834339ba
chore(misc): make tests and starter faster (#3042)

A bunch of miscellaneous cleanup focusing on tests, which ended up
speeding up the starter distro substantially.
- Pulled llama stack client init for tests into `pytest_sessionstart` so it does not clobber output
- Profiling that showed where starter was doing lots of heavy imports, so those imports are now lazy
- starter now starts 20+ seconds faster on my Mac
- A few other smallish refactors for `compat_client` 
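The commit does not say how the heavy imports were made lazy. Two common standard-library patterns are moving an `import` into the function that needs it, or wrapping the module in a proxy that imports it on first attribute access. Below is a minimal sketch of the second pattern. `LazyModule` is a hypothetical helper, not code from this change, and `colorsys` merely stands in for a heavy dependency.

```python
import importlib
from types import ModuleType
from typing import Any, Optional


class LazyModule:
    """Proxy that defers `import <name>` until an attribute is first used."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._module: Optional[ModuleType] = None

    def _load(self) -> ModuleType:
        if self._module is None:
            # The real (possibly slow) import happens here, exactly once.
            self._module = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr: str) -> Any:
        # Only called for attributes not found on the proxy itself.
        return getattr(self._load(), attr)


colorsys = LazyModule("colorsys")          # cheap: nothing imported yet
print(colorsys._module is None)            # True
print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))  # triggers the import: (0.0, 1.0, 1.0)
print(colorsys._module is not None)        # True
```

Moving the import into a function body achieves the same deferral with less machinery. A proxy is mainly worth it when many call sites reference the module.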

2025-08-05 14:55:05 -07:00

Ashwin Bharambe · cc87995e2b
chore: rename templates to distributions (#3035)

As the title says. Distributions is in, Templates is out.
`llama stack build --template` --> `llama stack build --distro`. For
backward compatibility, the previous option is kept but results in a
warning.
Updated `server.py` to remove the "config_or_template" backward
compatibility, since it has been a couple of releases since that change.
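Keeping an old flag as a deprecated alias is a common CLI pattern. Below is a minimal `argparse` sketch of the `--template` to `--distro` behavior described above. It is not the actual `llama stack build` implementation, and `build_parser` and `parse_build_args` are hypothetical names.

```python
import argparse
import warnings
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="llama stack build")
    parser.add_argument("--distro", help="Name of the distribution to build")
    # Old spelling: hidden from --help but still accepted.
    parser.add_argument("--template", help=argparse.SUPPRESS)
    return parser


def parse_build_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.template is not None:
        # FutureWarning is shown by default, unlike DeprecationWarning.
        warnings.warn("--template is deprecated; use --distro instead", FutureWarning, stacklevel=2)
        if args.distro is not None and args.distro != args.template:
            parser.error("--template and --distro disagree")  # exits with status 2
        args.distro = args.template
    return args


print(parse_build_args(["--distro", "starter"]).distro)  # starter
```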

2025-08-04 11:34:17 -07:00