Ashwin Bharambe

94faec7bc5

chore(yaml)!: move registered resources to a sub-key (#3861)

**NOTE: this is a backwards incompatible change to the run-configs.**
A small QOL update, but this will prove useful when I do a rename for
"vector_dbs" to "vector_stores" next.
Moves all the `models, shields, ...` keys in run-config under a
`registered_resources` sub-key. 
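Below is a minimal Python sketch of the migration this implies. It assumes the run config has already been parsed into a plain `dict`, for example with a YAML loader. The helper `nest_registered_resources` is hypothetical and not part of Llama Stack. The full list of moved keys is elided above (`models, shields, ...`), so the caller passes the keys to move explicitly.

```python
from typing import Any, Iterable


def nest_registered_resources(
    run_config: dict[str, Any], resource_keys: Iterable[str]
) -> dict[str, Any]:
    """Return a copy of run_config with the given top-level keys moved under `registered_resources`.

    Hypothetical helper: the real set of resource keys is longer than the
    `models`/`shields` examples named in the commit message.
    """
    migrated = dict(run_config)  # shallow copy; the input dict is left untouched
    registered = dict(migrated.get("registered_resources") or {})
    for key in resource_keys:
        if key in migrated:
            registered[key] = migrated.pop(key)
    if registered:
        migrated["registered_resources"] = registered
    return migrated


old_config = {
    "models": [{"model_id": "example-model"}],
    "shields": [],
    "unrelated_key": "left as-is",  # placeholder for any key that is not a resource
}
new_config = nest_registered_resources(old_config, ["models", "shields"])
print(new_config)
```

Keys that are not listed stay at the top level. That is why the sketch takes the key list as an argument instead of guessing the full set.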
							
2025-10-20 14:52:48 -07:00

Francisco Arceo

48581bf651

chore: Updating how default embedding model is set in stack (#3818)

# What does this PR do?
Refactor how the default vector store provider and embedding model are
set so that they come from an optional `vector_stores` config in the
`StackRunConfig`, and clean up the code accordingly (some pieces of
VectorDB had to be added back). Also add remote Qdrant and Weaviate to
the starter distro (following another PR that added inference providers
there for UX).
The new config (the default for the starter distro) is simply:
```yaml
vector_stores:
  default_provider_id: faiss
  default_embedding_model:
    provider_id: sentence-transformers
    model_id: nomic-ai/nomic-embed-text-v1.5
```
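The next sketch shows how the optional block could be consumed. It reads `vector_stores` from an already-parsed run config. The dataclasses `DefaultEmbeddingModel` and `VectorStoresDefaults` and the function `parse_vector_stores` are hypothetical stand-ins, not the actual `StackRunConfig` models. The sketch only assumes the field names shown in the YAML above.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class DefaultEmbeddingModel:
    provider_id: str
    model_id: str


@dataclass(frozen=True)
class VectorStoresDefaults:
    default_provider_id: Optional[str] = None
    default_embedding_model: Optional[DefaultEmbeddingModel] = None


def parse_vector_stores(run_config: dict[str, Any]) -> Optional[VectorStoresDefaults]:
    """Read the optional `vector_stores` block; return None when it is absent."""
    section = run_config.get("vector_stores")
    if section is None:
        return None
    model = section.get("default_embedding_model")
    return VectorStoresDefaults(
        default_provider_id=section.get("default_provider_id"),
        default_embedding_model=DefaultEmbeddingModel(**model) if model else None,
    )


# Mirrors the starter-distro YAML above, already loaded into a dict.
starter = {
    "vector_stores": {
        "default_provider_id": "faiss",
        "default_embedding_model": {
            "provider_id": "sentence-transformers",
            "model_id": "nomic-ai/nomic-embed-text-v1.5",
        },
    }
}
defaults = parse_vector_stores(starter)
print(defaults)
```

Returning `None` for a missing block keeps the config optional, as described above. Callers can then fall back to whatever behavior applies when no default is configured.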
## Test Plan
CI and Unit tests.
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> 
							
2025-10-20 14:22:45 -07:00

Ashwin Bharambe

2c43285e22

feat(stores)!: use backend storage references instead of configs (#3697)

**This PR changes configurations in a backward incompatible way.**
Run configs today repeat full SQLite/Postgres snippets everywhere a
store is needed, which means duplicated credentials, extra connection
pools, and lots of drift between files. This PR introduces named storage
backends so the stack and providers can share a single catalog and
reference those backends by name.
## Key Changes
- Add `storage.backends` to `StackRunConfig`, register each KV/SQL
backend once at startup, and validate that references point to the right
family.
- Move server stores under `storage.stores` with lightweight references
(backend + namespace/table) instead of full configs.
- Update every provider/config/doc to use the new reference style;
docs/codegen now surface the simplified YAML.
## Migration
Before:
```yaml
metadata_store:
  type: sqlite
  db_path: ~/.llama/distributions/foo/registry.db
inference_store:
  type: postgres
  host: ${env.POSTGRES_HOST}
  port: ${env.POSTGRES_PORT}
  db: ${env.POSTGRES_DB}
  user: ${env.POSTGRES_USER}
  password: ${env.POSTGRES_PASSWORD}
conversations_store:
  type: postgres
  host: ${env.POSTGRES_HOST}
  port: ${env.POSTGRES_PORT}
  db: ${env.POSTGRES_DB}
  user: ${env.POSTGRES_USER}
  password: ${env.POSTGRES_PASSWORD}
```
After:
```yaml
storage:
  backends:
    kv_default:
      type: kv_sqlite
      db_path: ~/.llama/distributions/foo/kvstore.db
    sql_default:
      type: sql_postgres
      host: ${env.POSTGRES_HOST}
      port: ${env.POSTGRES_PORT}
      db: ${env.POSTGRES_DB}
      user: ${env.POSTGRES_USER}
      password: ${env.POSTGRES_PASSWORD}
  stores:
    metadata:
      backend: kv_default
      namespace: registry
    inference:
      backend: sql_default
      table_name: inference_store
      max_write_queue_size: 10000
      num_writers: 4
    conversations:
      backend: sql_default
      table_name: openai_conversations
```
Provider configs follow the same pattern—for example, a Chroma vector
adapter switches from:
```yaml
providers:
  vector_io:
  - provider_id: chromadb
    provider_type: remote::chromadb
    config:
      url: ${env.CHROMADB_URL}
      kvstore:
        type: sqlite
        db_path: ~/.llama/distributions/foo/chroma.db
```
to:
```yaml
providers:
  vector_io:
  - provider_id: chromadb
    provider_type: remote::chromadb
    config:
      url: ${env.CHROMADB_URL}
      persistence:
        backend: kv_default
        namespace: vector_io::chroma_remote
```
Once the backends are declared, everything else just points at them, so
rotating credentials or swapping to Postgres happens in one place and
the stack reuses a single connection pool. 
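The reference-resolution idea can be sketched in Python as follows. This illustrates the approach under stated assumptions; it is not the stack's implementation. `resolve_stores`, `backend_family` and `EXPECTED_FAMILY` are hypothetical names. The store-to-family mapping is inferred from the example above: `metadata` uses a KV `namespace`, while `inference` and `conversations` use SQL tables. The backend family is read from the `kv_`/`sql_` prefix of `type`. Connection pooling is not modeled.

```python
from typing import Any

# Assumed store -> backend-family mapping, inferred from the example above.
EXPECTED_FAMILY = {"metadata": "kv", "inference": "sql", "conversations": "sql"}


def backend_family(backend_type: str) -> str:
    """Map a backend type such as 'kv_sqlite' or 'sql_postgres' to 'kv' or 'sql'."""
    family = backend_type.split("_", 1)[0]
    if family not in ("kv", "sql"):
        raise ValueError(f"unknown backend type: {backend_type!r}")
    return family


def resolve_stores(storage: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Resolve each store reference against `storage.backends`, checking the family."""
    backends = storage["backends"]
    resolved: dict[str, dict[str, Any]] = {}
    for store_name, ref in storage["stores"].items():
        backend_name = ref["backend"]
        if backend_name not in backends:
            raise KeyError(f"store {store_name!r} references undeclared backend {backend_name!r}")
        backend = backends[backend_name]
        expected = EXPECTED_FAMILY.get(store_name)
        actual = backend_family(backend["type"])
        if expected is not None and actual != expected:
            raise ValueError(f"store {store_name!r} needs a {expected} backend, got {actual}")
        # Shared backend settings plus the per-store fields (namespace, table_name, ...).
        per_store = {k: v for k, v in ref.items() if k != "backend"}
        resolved[store_name] = {**backend, **per_store}
    return resolved


storage = {
    "backends": {
        "kv_default": {"type": "kv_sqlite", "db_path": "~/.llama/distributions/foo/kvstore.db"},
        # The ${env...} placeholder is left unexpanded in this sketch.
        "sql_default": {"type": "sql_postgres", "host": "${env.POSTGRES_HOST}"},
    },
    "stores": {
        "metadata": {"backend": "kv_default", "namespace": "registry"},
        "conversations": {"backend": "sql_default", "table_name": "openai_conversations"},
    },
}
print(resolve_stores(storage))
```

Backends are declared once in the catalog, so a wrong or missing reference fails at startup instead of silently creating a second store.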
							
2025-10-20 13:20:09 -07:00

ehhuang

07ff15d917

chore: distrogen enables telemetry by default (#3828)

# What does this PR do?
Leftover from #3815.
## Test Plan
CI
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/3828).
* #3830 
* __->__ #3828  
							
2025-10-16 11:29:51 -07:00

Charlie Doern

f22aaef42f

chore!: remove telemetry API usage (#3815)

# What does this PR do?
Remove telemetry as a providable API from the codebase. This includes
removing it from generated distributions, the provider registry, the
router, etc.
Since `setup_logger` is tied fairly strictly to `Api.telemetry` being in
impls, we still need an "instantiated provider" in our implementations.
However, it should not be auto-routed or provided. So in
`validate_and_prepare_providers` (called from `resolve_impls`), if
`run_config.telemetry.enabled` is set, we set up the meta-reference
"provider" internally so that `log_event` works when called.
This is the neatest way I can think of to remove telemetry from the
provider configs without ripping apart the whole "telemetry is a
provider" logic just yet; we can do that internally later without
disrupting users.
As a result, telemetry is removed from the registry: if a user puts
`telemetry:` as an API in their build/run config, it will error out, but
we can still use it internally as we go through this transition.
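A rough Python sketch of this gating follows, assuming a simplified run config. `RunConfig`, `TelemetryConfig`, `PROVIDABLE_APIS` and `prepare_providers` are hypothetical names. The real logic lives in `validate_and_prepare_providers`, whose signature is not shown here. In the sketch, a user-declared `telemetry` API is rejected, and an internal-only entry is added when telemetry is enabled.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical subset of user-providable APIs; `telemetry` is deliberately absent.
PROVIDABLE_APIS = {"inference", "safety", "vector_io", "files", "batches"}


@dataclass
class TelemetryConfig:
    enabled: bool = False


@dataclass
class RunConfig:
    providers: dict[str, list[dict[str, Any]]] = field(default_factory=dict)
    telemetry: TelemetryConfig = field(default_factory=TelemetryConfig)


def prepare_providers(run_config: RunConfig) -> dict[str, list[dict[str, Any]]]:
    """Reject APIs that are not providable (including `telemetry`), then add an internal entry if enabled."""
    for api in run_config.providers:
        if api not in PROVIDABLE_APIS:
            raise ValueError(f"API {api!r} cannot be configured in a run config")
    prepared = dict(run_config.providers)  # leave the user's config untouched
    if run_config.telemetry.enabled:
        # Internal-only entry so logging helpers still find a telemetry impl;
        # nothing routes user requests to it.
        prepared["telemetry"] = [{"provider_id": "meta-reference", "internal": True}]
    return prepared


if __name__ == "__main__":
    cfg = RunConfig(
        providers={"inference": [{"provider_id": "ollama"}]},
        telemetry=TelemetryConfig(enabled=True),
    )
    print(prepare_providers(cfg))
```

Keeping the internal entry out of the user-facing config means the "telemetry is a provider" plumbing can later be replaced without another config change.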
relates to #3806 
Signed-off-by: Charlie Doern <cdoern@redhat.com> 
							
2025-10-16 10:39:32 -07:00

ehhuang

6ba9db3929

chore!: BREAKING CHANGE: remove sqlite from telemetry config (#3808)

# What does this PR do?
- Removed sqlite sink from telemetry config.
- Removed related code
- Updated doc related to telemetry
## Test Plan
CI 
							
2025-10-15 14:24:45 -07:00

Francisco Arceo

e7d21e1ee3

feat: Add support for Conversations in Responses API (#3743)

# What does this PR do?
This PR adds support for Conversations in Responses.
## Test Plan
Unit tests
Integration tests
<Details>
<Summary>Manual testing with this script: (click to expand)</Summary>
```python
from openai import OpenAI
# Point the OpenAI client at the local Llama Stack server (api_key is a placeholder).
client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none")
def test_conversation_create():
    print("Testing conversation create...")
    conversation = client.conversations.create(
        metadata={"topic": "demo"},
        items=[
            {"type": "message", "role": "user", "content": "Hello!"}
        ]
    )
    print(f"Created: {conversation}")
    return conversation
def test_conversation_retrieve(conv_id):
    print(f"Testing conversation retrieve for {conv_id}...")
    retrieved = client.conversations.retrieve(conv_id)
    print(f"Retrieved: {retrieved}")
    return retrieved
def test_conversation_update(conv_id):
    print(f"Testing conversation update for {conv_id}...")
    updated = client.conversations.update(
        conv_id,
        metadata={"topic": "project-x"}
    )
    print(f"Updated: {updated}")
    return updated
def test_conversation_delete(conv_id):
    print(f"Testing conversation delete for {conv_id}...")
    deleted = client.conversations.delete(conv_id)
    print(f"Deleted: {deleted}")
    return deleted
def test_conversation_items_create(conv_id):
    print(f"Testing conversation items create for {conv_id}...")
    items = client.conversations.items.create(
        conv_id,
        items=[
            {
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": "Hello!"}]
            },
            {
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": "How are you?"}]
            }
        ]
    )
    print(f"Items created: {items}")
    return items
def test_conversation_items_list(conv_id):
    print(f"Testing conversation items list for {conv_id}...")
    items = client.conversations.items.list(conv_id, limit=10)
    print(f"Items list: {items}")
    return items
def test_conversation_item_retrieve(conv_id, item_id):
    print(f"Testing conversation item retrieve for {conv_id}/{item_id}...")
    item = client.conversations.items.retrieve(conversation_id=conv_id, item_id=item_id)
    print(f"Item retrieved: {item}")
    return item
def test_conversation_item_delete(conv_id, item_id):
    print(f"Testing conversation item delete for {conv_id}/{item_id}...")
    deleted = client.conversations.items.delete(conversation_id=conv_id, item_id=item_id)
    print(f"Item deleted: {deleted}")
    return deleted
def test_conversation_responses_create():
    print("\nTesting conversation create for a responses example...")
    conversation = client.conversations.create()
    print(f"Created: {conversation}")
    response = client.responses.create(
      model="gpt-4.1",
      input=[{"role": "user", "content": "What are the 5 Ds of dodgeball?"}],
      conversation=conversation.id,
    )
    print(f"Created response: {response} for conversation {conversation.id}")
    return response, conversation
def test_conversations_responses_create_followup(
        conversation,
        content="Repeat what you just said but add 'this is my second time saying this'",
    ):
    print(f"Using: {conversation.id}")
    response = client.responses.create(
      model="gpt-4.1",
      input=[{"role": "user", "content": content}],
      conversation=conversation.id,
    )
    print(f"Created response: {response} for conversation {conversation.id}")
    conv_items = client.conversations.items.list(conversation.id)
    print(f"\nRetrieving list of items for conversation {conversation.id}:")
    print(conv_items.model_dump_json(indent=2))
def test_response_with_fake_conv_id():
    fake_conv_id = "conv_zzzzzzzzz5dc81908289d62779d2ac510a2b0b602ef00a44"
    print(f"Using {fake_conv_id}")
    try:
        response = client.responses.create(
          model="gpt-4.1",
          input=[{"role": "user", "content": "say hello"}],
          conversation=fake_conv_id,
        )
        print(f"Created response: {response} for conversation {fake_conv_id}")
    except Exception as e:
        print(f"failed to create response for conversation {fake_conv_id} with error {e}")
def main():
    print("Testing OpenAI Conversations API...")
    # Create conversation
    conversation = test_conversation_create()
    conv_id = conversation.id
    # Retrieve conversation
    test_conversation_retrieve(conv_id)
    # Update conversation
    test_conversation_update(conv_id)
    # Create items
    items = test_conversation_items_create(conv_id)
    # List items
    items_list = test_conversation_items_list(conv_id)
    # Retrieve specific item
    if items_list.data:
        item_id = items_list.data[0].id
        test_conversation_item_retrieve(conv_id, item_id)
        # Delete item
        test_conversation_item_delete(conv_id, item_id)
    # Delete conversation
    test_conversation_delete(conv_id)
    response, conversation2 = test_conversation_responses_create()
    print('\ntesting conversation retrieval')
    test_conversation_retrieve(conversation2.id)
    print('\ntesting responses follow up')
    test_conversations_responses_create_followup(conversation2)
    print('\ntesting responses follow up x2!')
    test_conversations_responses_create_followup(
        conversation2,
        content="Repeat what you just said but add 'this is my third time saying this'",
    )
    test_response_with_fake_conv_id()
    print("All tests completed!")
if __name__ == "__main__":
    main()
```
</Details>
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> 
							
2025-10-10 11:57:40 -07:00

Ashwin Bharambe

42414a1a1b

fix(logging): disable console telemetry sink by default (#3623)
							
The current span processing dumps so much junk on the console that it
is impossible to understand what is actually going on in the server.
I am disabling the console sink by default. You are always free to
change your run.yaml to add it back.
Before:
<img width="1877" height="1107" alt="image"
src="https://github.com/user-attachments/assets/3a7ad261-e2ba-4d40-9820-fcc282c8df37"
/>
After:
<img width="1919" height="470" alt="image"
src="https://github.com/user-attachments/assets/bc7cf763-fba9-4e95-a4b5-f65f6d1c5332"
/>
							
2025-09-30 14:58:05 -07:00

Sébastien Han

f31bcc11bc

feat: add Azure OpenAI inference provider support (#3396)

# What does this PR do?
Llama Stack now supports Azure OpenAI through a new OpenAI-compatible
remote inference provider. The starter distro has been updated to add
this new remote inference provider.
A few tests have been modified and improved.
## Test Plan
Deploy a model in the Azure portal, then:
```
$ AZURE_API_KEY=... AZURE_API_BASE=... uv run llama stack build --image-type venv --providers inference=remote::azure --run
...
$ LLAMA_STACK_CONFIG=http://localhost:8321  uv run --group test pytest -v -ra --text-model azure/gpt-4.1 tests/integration/inference/test_openai_completion.py
...
```
Results:
```
============================================= test session starts ==============================================
platform darwin -- Python 3.12.8, pytest-8.4.1, pluggy-1.6.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.12.8', 'Platform': 'macOS-15.6.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.1', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0', 'hydra-core': '1.3.2'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0, hydra-core-1.3.2
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 27 items

tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming[txt=azure/gpt-5-mini-inference:completion:sanity] SKIPPED [ 3%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming_suffix[txt=azure/gpt-5-mini-inference:completion:suffix] SKIPPED [ 7%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_streaming[txt=azure/gpt-5-mini-inference:completion:sanity] SKIPPED [ 11%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=azure/gpt-5-mini-1] SKIPPED [ 14%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=azure/gpt-5-mini] SKIPPED [ 18%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:non_streaming_01] PASSED [ 22%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:streaming_01] PASSED [ 25%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:streaming_01] PASSED [ 29%]
tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=azure/gpt-5-mini-True] PASSED [ 33%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=azure/gpt-5-mini-True] PASSED [ 37%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming_with_file[txt=azure/gpt-5-mini] SKIPPEDed files.) [ 40%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=azure/gpt-5-mini-0] SKIPPED [ 44%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:non_streaming_02] PASSED [ 48%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:streaming_02] PASSED [ 51%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:streaming_02] PASSED [ 55%]
tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=azure/gpt-5-mini-False] PASSED [ 59%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=azure/gpt-5-mini-False] PASSED [ 62%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:non_streaming_01] PASSED [ 66%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:streaming_01] PASSED [ 70%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:streaming_01] PASSED [ 74%]
tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=azure/gpt-5-mini-True] PASSED [ 77%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=azure/gpt-5-mini-True] PASSED [ 81%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:non_streaming_02] PASSED [ 85%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:streaming_02] PASSED [ 88%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:streaming_02] PASSED [ 92%]
tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=azure/gpt-5-mini-False] PASSED [ 96%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=azure/gpt-5-mini-False] PASSED [100%]

=========================================== short test summary info ============================================
SKIPPED [3] tests/integration/inference/test_openai_completion.py:63: Model azure/gpt-5-mini hosted by remote::azure doesn't support OpenAI completions.
SKIPPED [3] tests/integration/inference/test_openai_completion.py:118: Model azure/gpt-5-mini hosted by remote::azure doesn't support vllm extra_body parameters.
SKIPPED [1] tests/integration/inference/test_openai_completion.py:124: Model azure/gpt-5-mini hosted by remote::azure doesn't support chat completion calls with base64 encoded files.
================================== 20 passed, 7 skipped, 2 warnings in 51.77s ==================================
```
Signed-off-by: Sébastien Han <seb@redhat.com> 
							
2025-09-11 13:48:38 +02:00

Derek Higgins

64b2977162

fix: Fix locations of distribution runtime directories (#3336)

The defaults were mixed up.
Signed-off-by: Derek Higgins <derekh@redhat.com> 
							
2025-09-05 14:09:36 +02:00

Ashwin Bharambe

9fa69b0337

feat(distro): no huggingface provider for starter (#3258)
							
The `trl` dependency brings in `accelerate`, which brings in NVIDIA
dependencies for torch. We cannot have that in the starter distro. As
such, the starter distro does not offer CPU-only post-training through
the huggingface provider.
							
2025-08-26 14:06:36 -07:00

Ashwin Bharambe

7519b73fcc

feat(distro): fork off a starter-gpu distribution (#3240)
							
The starter distribution added post-training, which added torch
dependencies that pull in all the NVIDIA CUDA libraries. This made our
starter container very large. We have worked hard to keep the starter
container small so it serves its purpose as a starter. This PR tries to
bring it back to its previous size by forking off duplicate "-gpu"
providers for post-training. These forked providers are then used by a
new `starter-gpu` distribution, which can pull in all dependencies.
							
2025-08-22 15:47:15 -07:00

slekkala1

7519ab4024

feat: Code scanner Provider impl for moderations api (#3100)

# What does this PR do?
Add CodeScanner implementations
## Test Plan
`SAFETY_MODEL=CodeScanner LLAMA_STACK_CONFIG=starter uv run pytest -v
tests/integration/safety/test_safety.py
--text-model=llama3.2:3b-instruct-fp16
--embedding-model=all-MiniLM-L6-v2 --safety-shield=ollama`
This PR needs to land after
https://github.com/meta-llama/llama-stack/pull/3098
							
2025-08-18 14:15:40 -07:00

Matthew Farrellee

914c7be288

feat: add batches API with OpenAI compatibility (with inference replay) (#3162)

Add complete batches API implementation with protocol, providers, and
tests:
Core Infrastructure:
- Add batches API protocol using OpenAI Batch types directly
- Add Api.batches enum value and protocol mapping in resolver
- Add OpenAI "batch" file purpose support
- Include proper error handling (ConflictError, ResourceNotFoundError)
Reference Provider:
- Add ReferenceBatchesImpl with full CRUD operations (create, retrieve,
cancel, list)
- Implement background batch processing with configurable concurrency
- Add SQLite KVStore backend for persistence
- Support /v1/chat/completions endpoint with request validation
Comprehensive Test Suite:
- Add unit tests for provider implementation with validation
- Add integration tests for end-to-end batch processing workflows
- Add error handling tests for validation, malformed inputs, and edge
cases
Configuration:
- Add max_concurrent_batches and max_concurrent_requests_per_batch
options
- Add provider documentation with sample configurations
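The two concurrency options can be pictured as nested limits. The sketch below uses `asyncio.Semaphore` for both. One semaphore caps how many batches run at once, and a per-batch semaphore caps how many requests each batch has in flight. `run_batches` and `demo_handler` are hypothetical. The default values are placeholders, not the reference provider's actual defaults.

```python
import asyncio
from typing import Awaitable, Callable


async def run_batches(
    batches: dict[str, list[dict]],
    handle_request: Callable[[dict], Awaitable[dict]],
    max_concurrent_batches: int = 1,  # placeholder default
    max_concurrent_requests_per_batch: int = 10,  # placeholder default
) -> dict[str, list[dict]]:
    """Schedule all batches, subject to two nested concurrency limits."""
    batch_slots = asyncio.Semaphore(max_concurrent_batches)

    async def process_batch(requests: list[dict]) -> list[dict]:
        async with batch_slots:  # at most max_concurrent_batches batches run at once
            request_slots = asyncio.Semaphore(max_concurrent_requests_per_batch)

            async def run_one(request: dict) -> dict:
                async with request_slots:  # per-batch cap on in-flight requests
                    return await handle_request(request)

            # gather() keeps results in the same order as the input requests.
            return list(await asyncio.gather(*(run_one(r) for r in requests)))

    batch_ids = list(batches)
    outputs = await asyncio.gather(*(process_batch(batches[b]) for b in batch_ids))
    return dict(zip(batch_ids, outputs))


async def demo_handler(request: dict) -> dict:
    # Stand-in for a real /v1/chat/completions call.
    await asyncio.sleep(0.01)
    return {"custom_id": request["custom_id"], "status": "completed"}


if __name__ == "__main__":
    demo = {"batch_a": [{"custom_id": f"req-{i}"} for i in range(5)]}
    print(asyncio.run(run_batches(demo, demo_handler, max_concurrent_requests_per_batch=2)))
```

The batch-level semaphore limits total background work across batches. The request-level one keeps a single large batch from flooding the inference provider.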
Test with:
```
$ uv run llama stack build --image-type venv --providers inference=YOU_PICK,files=inline::localfs,batches=inline::reference --run &
$ LLAMA_STACK_CONFIG=http://localhost:8321  uv run pytest tests/unit/providers/batches tests/integration/batches --text-model YOU_PICK
```
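The request-validation bullet above can be illustrated with a minimal Python check of a batch input file. It assumes the OpenAI batch input format: one JSON object per line with `custom_id`, `method`, `url` and `body`. `validate_batch_lines` is a hypothetical helper. Its rules (POST only, a single supported endpoint, unique `custom_id`, and `model` plus `messages` in the body) are illustrative assumptions, not the reference provider's exact checks.

```python
import json

SUPPORTED_ENDPOINT = "/v1/chat/completions"


def validate_batch_lines(jsonl_text: str, endpoint: str = SUPPORTED_ENDPOINT) -> list[str]:
    """Return human-readable errors for an OpenAI-style batch input file (empty list if valid)."""
    errors: list[str] = []
    seen_ids: set[str] = set()
    for line_no, raw in enumerate(jsonl_text.splitlines(), start=1):
        if not raw.strip():
            continue  # tolerate blank lines
        try:
            request = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors.append(f"line {line_no}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(request, dict):
            errors.append(f"line {line_no}: expected a JSON object")
            continue
        custom_id = request.get("custom_id")
        if not isinstance(custom_id, str) or not custom_id:
            errors.append(f"line {line_no}: missing custom_id")
        elif custom_id in seen_ids:
            errors.append(f"line {line_no}: duplicate custom_id {custom_id!r}")
        else:
            seen_ids.add(custom_id)
        if request.get("method") != "POST":
            errors.append(f"line {line_no}: method must be POST")
        if request.get("url") != endpoint:
            errors.append(f"line {line_no}: url must be {endpoint}")
        body = request.get("body")
        if not isinstance(body, dict) or "model" not in body or "messages" not in body:
            errors.append(f"line {line_no}: body needs 'model' and 'messages'")
    return errors


if __name__ == "__main__":
    line = json.dumps({
        "custom_id": "req-1",
        "method": "POST",
        "url": SUPPORTED_ENDPOINT,
        "body": {"model": "YOU_PICK", "messages": [{"role": "user", "content": "hi"}]},
    })
    print(validate_batch_lines(line))
```

Collecting all errors, rather than stopping at the first, lets a client fix a malformed input file in one pass.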
addresses #3066 
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> 
							
2025-08-15 15:34:15 -07:00

Eran Cohen

a4bad6c0b4

feat: Add Google Vertex AI inference provider support (#2841)
							
							# What does this PR do?
- Add a new Vertex AI remote inference provider with litellm integration
- Support Gemini models through the Google Cloud Vertex AI platform
- Use Google Cloud Application Default Credentials (ADC) for
authentication
- Add Vertex AI models: gemini-2.5-flash, gemini-2.5-pro,
gemini-2.0-flash
- Update the provider registry to include the vertexai provider
- Update the starter template to support Vertex AI configuration
- Add documentation and a sample configuration

Relates to https://github.com/meta-llama/llama-stack/issues/2747
## Test Plan
Signed-off-by: Eran Cohen <eranco@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com> 
							
						 
						
							2025-08-11 08:22:04 -04:00 
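The Vertex AI provider above routes requests through litellm and authenticates with ADC. The sketch below shows how one of the listed Gemini models could be addressed through litellm. It relies on several assumptions:

- litellm's `vertex_ai/` model prefix.
- litellm's `vertex_project` and `vertex_location` keyword arguments.
- The hypothetical helper name `litellm_vertex_kwargs`.
- The example default region `us-central1`.

This is not the provider's actual code or configuration schema.

```python
from __future__ import annotations

# Model IDs listed in the PR description.
VERTEX_AI_MODELS = ("gemini-2.5-flash", "gemini-2.5-pro", "gemini-2.0-flash")


def litellm_vertex_kwargs(
    model_id: str, project: str, location: str = "us-central1"
) -> dict[str, str]:
    """Hypothetical helper: build keyword arguments for litellm.completion()
    targeting Vertex AI. No explicit credentials are included, so litellm is
    expected to fall back to Google Cloud Application Default Credentials."""
    if model_id not in VERTEX_AI_MODELS:
        raise ValueError(f"unsupported Vertex AI model: {model_id}")
    return {
        "model": f"vertex_ai/{model_id}",  # assumed litellm prefix for Vertex AI
        "vertex_project": project,         # GCP project ID (assumption: caller supplies it)
        "vertex_location": location,       # region; "us-central1" is only an example default
    }


# Possible usage (requires `pip install litellm` and
# `gcloud auth application-default login`; not executed here):
#   import litellm
#   resp = litellm.completion(
#       messages=[{"role": "user", "content": "Hello"}],
#       **litellm_vertex_kwargs("gemini-2.5-flash", "my-gcp-project"),
#   )
```

The helper only builds arguments, so you can check it without network access or credentials.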
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ashwin Bharambe 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7f834339ba 
								
							 
						 
						
							
							
								
								chore(misc): make tests and starter faster ( #3042 )  
							
A bunch of miscellaneous cleanup focused on tests, which ended up
speeding up the starter distro substantially.
- Moved llama stack client initialization for tests into `pytest_sessionstart` so
it does not clobber test output
- Profiling that showed where starter was doing lots of heavy imports,
so I made those imports lazy
- starter now starts more than 20 seconds faster on my Mac
- A few other small refactors to `compat_client`
							
						 
						
							2025-08-05 14:55:05 -07:00 
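The commit speeds up startup by making heavy imports lazy but does not say which technique it uses. Python's `-X importtime` interpreter flag is one way to find expensive imports. Deferring them is commonly done either with function-local imports or with the standard library's `importlib.util.LazyLoader`. The sketch below follows the `LazyLoader` recipe from the importlib documentation. It is an illustration of the general pattern, not necessarily how the starter distro was changed.

```python
import importlib.util
import sys
import types


def lazy_import(name: str) -> types.ModuleType:
    """Return a module object whose body only executes on first attribute access."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot find module {name!r}")
    # Wrap the real loader so execution of the module body is postponed.
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # Sets up the lazy module; the actual import runs on first attribute access.
    loader.exec_module(module)
    return module


# At startup the heavy module is registered but not yet executed:
#   heavy = lazy_import("some_heavy_dependency")  # hypothetical module name
# Its import cost is only paid when code first touches heavy.<attr>.
```

Function-local imports are simpler and easier to read. `LazyLoader` keeps a module-level name while still deferring the import cost.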
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ashwin Bharambe 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								cc87995e2b 
								
							 
						 
						
							
							
								
								chore: rename templates to distributions ( #3035 )  
							
							
							
							
As the title says: Distributions are in, Templates are out.
`llama stack build --template` --> `llama stack build --distro`. For
backward compatibility, the previous option is kept but now emits a
warning.
Updated `server.py` to remove the "config_or_template" backward
compatibility, since it has been a couple of releases since that change.
							
						 
						
							2025-08-04 11:34:17 -07:00
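The commit keeps `--template` as a deprecated alias for `--distro` that warns when used. The sketch below shows one way such an alias can be wired with `argparse`. The helper names `build_parser` and `resolve_distro` are hypothetical. The choice to let `--distro` win when both flags are given is an assumption. This is not the actual `llama stack build` implementation, which may warn through logging instead of the `warnings` module.

```python
from __future__ import annotations

import argparse
import warnings


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="llama stack build")
    parser.add_argument("--distro", help="Name of the distribution to build")
    # Deprecated alias kept for backward compatibility; hidden from --help.
    parser.add_argument("--template", help=argparse.SUPPRESS)
    return parser


def resolve_distro(argv: list[str]) -> str | None:
    """Hypothetical helper: return the distribution name, honoring the old flag."""
    args = build_parser().parse_args(argv)
    if args.template is not None:
        warnings.warn(
            "--template is deprecated; use --distro instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # Assumption: an explicit --distro takes precedence over --template.
        if args.distro is None:
            args.distro = args.template
    return args.distro
```

Hiding the old flag with `argparse.SUPPRESS` keeps `--help` focused on the new name while existing scripts keep working.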