Sébastien Han 
								4161102100 
chore!: add double routes for v1/openai/v1 (#3636)
							So that users get a warning in 0.3.0 and we remove them in 0.4.0.
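Below is a minimal sketch of the "double route" approach. It assumes a toy in-process router. The `Router` class, `add_double_route`, and both prefix constants are hypothetical and stand in for whatever the server actually uses. Each handler is registered twice: once under its new path and once under the legacy `/v1/openai/v1` path. Only the legacy alias emits a deprecation warning.

```python
import warnings
from typing import Callable, Dict

# Hypothetical prefixes for this sketch: new routes live directly under /v1,
# the legacy alias keeps the old /v1/openai/v1 prefix until it is removed.
NEW_PREFIX = "/v1"
LEGACY_PREFIX = "/v1/openai/v1"

Handler = Callable[[], str]


class Router:
    """Toy exact-match router; not the llama-stack server's routing code."""

    def __init__(self) -> None:
        self.routes: Dict[str, Handler] = {}

    def add_double_route(self, suffix: str, handler: Handler) -> None:
        new_path = NEW_PREFIX + suffix
        legacy_path = LEGACY_PREFIX + suffix

        # Canonical route: served silently.
        self.routes[new_path] = handler

        # Legacy alias: same handler, but every call emits a DeprecationWarning.
        def legacy_handler() -> str:
            warnings.warn(
                f"{legacy_path} is deprecated; use {new_path} instead",
                DeprecationWarning,
                stacklevel=2,
            )
            return handler()

        self.routes[legacy_path] = legacy_handler

    def dispatch(self, path: str) -> str:
        return self.routes[path]()


router = Router()
router.add_double_route("/chat/completions", lambda: "ok")
```

`router.dispatch("/v1/openai/v1/chat/completions")` returns the same result as the new path, and it also emits a `DeprecationWarning`. Dropping the aliases in a later release only means deleting the `legacy_path` registration.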
Signed-off-by: Sébastien Han <seb@redhat.com> 
							2025-10-02 16:11:05 +02:00 
							
									Ashwin Bharambe 
								56b625d18a 
feat(openai_movement)!: Change URL structures to kill /openai/v1 (part 2) (#3605)
							2025-09-29 22:57:37 -07:00 
							
									Ashwin Bharambe 
								5e7fed8bbb 
feat(openai_movement): Change URL structures to kill /openai/v1 (part 1) (#3587)
The `/v1/openai/v1` prefix is annoying and, given our clearer focus on how
to think about the API surface, no longer necessary. Let's kill it for the
0.3.0 update.

To make the client-side changes feasible, we will do this in two parts.
This part adds a new route (without `/openai/v1`), so the existing client
continues to work because the server supports both.

The next PR will contain the client-side (Stainless) changes, which I will
be making shortly. The final PR will remove the `/openai/v1` routes.

Note that all these changes will happen rapidly within this release
cycle. The entire set _will be backwards incompatible_.
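While both URL shapes are live, a server can support them by rewriting legacy paths onto the new ones before dispatch. With that approach, each handler is registered only once. This is a sketch under that assumption. `normalize_path`, `HANDLERS`, and `dispatch` are hypothetical names, not the actual change in this PR.

```python
from typing import Callable, Dict, Tuple

# Hypothetical prefixes for this sketch.
LEGACY_PREFIX = "/v1/openai/v1"
NEW_PREFIX = "/v1"


def normalize_path(path: str) -> Tuple[str, bool]:
    """Rewrite a legacy /v1/openai/v1/... path to its /v1/... equivalent.

    Returns (normalized_path, was_legacy). Paths that merely share the
    prefix text (e.g. /v1/openai/v1x) are left alone.
    """
    if path == LEGACY_PREFIX or path.startswith(LEGACY_PREFIX + "/"):
        return NEW_PREFIX + path[len(LEGACY_PREFIX):], True
    return path, False


# Hypothetical handler table keyed only by the new-style paths.
HANDLERS: Dict[str, Callable[[], list]] = {
    "/v1/models": lambda: ["example-model"],
}


def dispatch(path: str) -> list:
    normalized, _was_legacy = normalize_path(path)
    return HANDLERS[normalized]()
```

Because the rewrite happens in one place, the final cleanup PR can remove the legacy support by deleting `normalize_path` without touching any handler.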
							2025-09-29 16:14:35 -07:00 
							
									Charlie Doern 
								c88c4ff2c6 
feat: introduce API leveling, post_training, eval to v1alpha (#3449)
							# What does this PR do?
Rather than have a single `LLAMA_STACK_VERSION`, we need `_V1`,
`_V1ALPHA`, and `_V1BETA` constants.
This also necessitated adding `level` to `WebMethod` so that
routing can be handled properly.
For backwards compat, the `v1` routes are being kept around and marked
as `deprecated`. When used, the server will log a deprecation warning.
Deprecation log:
<img width="1224" height="134" alt="Screenshot 2025-09-25 at 2 43 36 PM"
src="https://github.com/user-attachments/assets/0cc7c245-dafc-48f0-be99-269fb9a686f9"
/>
This PR moves:
1. post_training to `v1alpha`, as it is under heavy development and not
near its final state.
2. eval to `v1alpha`: job scheduling is not implemented, and eval relies
heavily on the datasetio API, which is still under development and is
missing implementations of specific routes, indicating that the structure
of those routes might change. Additionally, eval depends on the `inference`
API, which is going to be deprecated, so eval will likely need a major API
surface change to use completions properly.
Implements leveling (see #3317).
Note: integration tests will fail until the SDK is regenerated to use
`v1alpha/inference` instead of `v1/inference`.
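Below is a minimal sketch of how a per-route `level` could drive both the URL prefix and the deprecation warning. It assumes route metadata is a small dataclass. `API_V1`/`API_V1ALPHA`/`API_V1BETA`, this `WebMethod` shape, `resolve`, and the example route path are all hypothetical stand-ins, not the actual llama-stack definitions.

```python
import logging
from dataclasses import dataclass
from typing import List

logger = logging.getLogger("leveling_sketch")

# Hypothetical per-level version constants, standing in for the
# `_V1`, `_V1ALPHA`, and `_V1BETA` constants described above.
API_V1 = "v1"
API_V1ALPHA = "v1alpha"
API_V1BETA = "v1beta"


@dataclass(frozen=True)
class WebMethod:
    """Hypothetical route metadata carrying an API level."""

    route: str
    level: str = API_V1
    deprecated: bool = False

    @property
    def path(self) -> str:
        # The level becomes the URL prefix, e.g. /v1alpha/post-training/...
        return f"/{self.level}{self.route}"


# Illustrative route: served at v1alpha, with the old v1 path kept but deprecated.
METHODS: List[WebMethod] = [
    WebMethod(route="/post-training/supervised-fine-tune", level=API_V1ALPHA),
    WebMethod(route="/post-training/supervised-fine-tune", level=API_V1, deprecated=True),
]


def resolve(path: str, methods: List[WebMethod] = METHODS) -> WebMethod:
    """Find the WebMethod for a path, logging a warning for deprecated routes."""
    for method in methods:
        if method.path == path:
            if method.deprecated:
                logger.warning("%s is deprecated and will be removed in a future release", path)
            return method
    raise KeyError(path)
```

Keeping the old `v1` entry as a separate, flagged `WebMethod` means the deprecated path can be deleted later without touching the `v1alpha` registration.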
## Test Plan
Existing tests should pass with the newly generated schema. Conformance
tests will also pass, since these routes are not among those we currently
test for stability.
Signed-off-by: Charlie Doern <cdoern@redhat.com> 
							2025-09-26 16:18:07 +02:00 
							
									Matthew Farrellee 
								cffc4edf47 
feat: Add optional idempotency support to batches API (#3171)
Implements optional idempotency for batch creation using the `idem_tok`
parameter:
* **Core idempotency**: Same token + parameters returns existing batch
* **Conflict detection**: Same token + different parameters raises HTTP
409 ConflictError
* **Metadata order independence**: Different key ordering doesn't affect
idempotency
**API changes:**
- Add optional `idem_tok` parameter to `create_batch()` method
- Enhanced API documentation with idempotency extensions
**Implementation:**
- Reference provider supports idempotent batch creation
- ConflictError for proper HTTP 409 status code mapping
- Comprehensive parameter validation
**Testing:**
- Unit tests: focused tests covering core scenarios with parametrized
conflict detection
- Integration tests: tests validating real OpenAI client behavior
This enables client-side retry safety and prevents duplicate batch
creation when using the same idempotency token, following REST API
closes  #3144  
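Below is a minimal sketch of the three behaviors listed above. It assumes an in-memory store that hashes the request parameters in canonical (key-sorted) JSON form and remembers that hash per token. `InMemoryBatches`, `fingerprint`, and this `ConflictError` class are hypothetical. The reference provider's storage, and its exact choice of fingerprinted fields, may differ.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


class ConflictError(Exception):
    """Stand-in for the error a server would map to HTTP 409 Conflict."""


@dataclass
class Batch:
    id: str
    input_file_id: str
    endpoint: str
    completion_window: str
    metadata: Dict[str, str] = field(default_factory=dict)


def fingerprint(params: dict) -> str:
    # sort_keys=True sorts keys at every nesting level, so metadata
    # {"a": "1", "b": "2"} and {"b": "2", "a": "1"} hash identically.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class InMemoryBatches:
    """Toy in-memory store; not the reference provider's implementation."""

    def __init__(self) -> None:
        self._batches: Dict[str, Batch] = {}
        # idem_tok -> (fingerprint of the original request, batch id)
        self._tokens: Dict[str, Tuple[str, str]] = {}

    def create_batch(
        self,
        input_file_id: str,
        endpoint: str,
        completion_window: str,
        metadata: Optional[Dict[str, str]] = None,
        idem_tok: Optional[str] = None,
    ) -> Batch:
        params = {
            "input_file_id": input_file_id,
            "endpoint": endpoint,
            "completion_window": completion_window,
            "metadata": metadata or {},
        }
        request_fp = fingerprint(params)

        if idem_tok is not None and idem_tok in self._tokens:
            stored_fp, batch_id = self._tokens[idem_tok]
            if stored_fp != request_fp:
                # Same token, different parameters: refuse rather than guess.
                raise ConflictError(f"idem_tok {idem_tok!r} was used with different parameters")
            # Same token, same parameters: return the existing batch (safe retry).
            return self._batches[batch_id]

        batch = Batch(id=f"batch_{uuid.uuid4().hex}", **params)
        self._batches[batch.id] = batch
        if idem_tok is not None:
            self._tokens[idem_tok] = (request_fp, batch.id)
        return batch
```

Hashing a canonical JSON form avoids comparing parameters field by field. It also makes metadata key order irrelevant without any special-casing.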
							2025-08-22 15:50:40 -07:00 
							
									Matthew Farrellee 
								914c7be288 
feat: add batches API with OpenAI compatibility (with inference replay) (#3162)
							Add complete batches API implementation with protocol, providers, and
tests:
Core Infrastructure:
- Add batches API protocol using OpenAI Batch types directly
- Add Api.batches enum value and protocol mapping in resolver
- Add OpenAI "batch" file purpose support
- Include proper error handling (ConflictError, ResourceNotFoundError)
Reference Provider:
- Add ReferenceBatchesImpl with full CRUD operations (create, retrieve,
cancel, list)
- Implement background batch processing with configurable concurrency
- Add SQLite KVStore backend for persistence
- Support /v1/chat/completions endpoint with request validation
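For OpenAI-style batch input files, each JSONL line carries a `custom_id`, `method`, `url`, and `body`. The request validation mentioned above could look roughly like the sketch below. The specific checks (for example, rejecting duplicate `custom_id` values) and the function name `validate_batch_input` are assumptions for illustration, not the reference provider's exact rules.

```python
import json
from typing import List, Set

# Assumption: only the chat completions endpoint is accepted, matching the list above.
SUPPORTED_ENDPOINTS = {"/v1/chat/completions"}


def validate_batch_input(jsonl: str, endpoint: str) -> List[str]:
    """Return human-readable errors for an OpenAI-style batch input file.

    Each non-empty line is expected to look like:
    {"custom_id": "...", "method": "POST", "url": "<endpoint>", "body": {...}}
    """
    if endpoint not in SUPPORTED_ENDPOINTS:
        return [f"unsupported endpoint: {endpoint}"]

    errors: List[str] = []
    seen_ids: Set[str] = set()
    for lineno, raw in enumerate(jsonl.splitlines(), start=1):
        if not raw.strip():
            continue  # tolerate blank lines
        try:
            request = json.loads(raw)
        except json.JSONDecodeError:
            errors.append(f"line {lineno}: invalid JSON")
            continue
        if not isinstance(request, dict):
            errors.append(f"line {lineno}: expected a JSON object")
            continue

        custom_id = request.get("custom_id")
        if not isinstance(custom_id, str):
            errors.append(f"line {lineno}: custom_id must be a string")
        elif custom_id in seen_ids:
            errors.append(f"line {lineno}: duplicate custom_id {custom_id!r}")
        else:
            seen_ids.add(custom_id)

        if request.get("method") != "POST":
            errors.append(f"line {lineno}: method must be POST")
        if request.get("url") != endpoint:
            errors.append(f"line {lineno}: url must match batch endpoint {endpoint}")

        body = request.get("body")
        if not isinstance(body, dict) or "model" not in body or "messages" not in body:
            errors.append(f"line {lineno}: body must be an object with 'model' and 'messages'")
    return errors
```

Collecting every error instead of stopping at the first lets a client fix a malformed input file in one pass.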
Comprehensive Test Suite:
- Add unit tests for provider implementation with validation
- Add integration tests for end-to-end batch processing workflows
- Add error handling tests for validation, malformed inputs, and edge
cases
Configuration:
- Add max_concurrent_batches and max_concurrent_requests_per_batch
options
- Add provider documentation with sample configurations
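Below is a minimal sketch of how the two concurrency options could interact, assuming an asyncio-based worker. One semaphore bounds how many batches run at once, and a fresh per-batch semaphore bounds in-flight requests inside each batch. `BatchConcurrencyConfig`, `BatchRunner`, and `demo` are hypothetical. Only the two option names come from the list above, and the reference provider's scheduler may be structured differently.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BatchConcurrencyConfig:
    # Field names mirror the options listed above; defaults are arbitrary.
    max_concurrent_batches: int = 1
    max_concurrent_requests_per_batch: int = 10


class BatchRunner:
    """Toy scheduler: one semaphore caps running batches, another caps requests per batch."""

    def __init__(self, config: BatchConcurrencyConfig) -> None:
        # Construct this inside a running event loop (see demo below).
        self.config = config
        self._batch_slots = asyncio.Semaphore(config.max_concurrent_batches)
        self._in_flight = 0
        self.peak_in_flight = 0  # highest number of simultaneous requests observed

    async def _call_model(self, request: Dict[str, str]) -> Dict[str, str]:
        # Placeholder for a real inference call.
        self._in_flight += 1
        self.peak_in_flight = max(self.peak_in_flight, self._in_flight)
        try:
            await asyncio.sleep(0.01)
        finally:
            self._in_flight -= 1
        return {"custom_id": request["custom_id"], "status": "completed"}

    async def run_batch(self, requests: List[Dict[str, str]]) -> List[Dict[str, str]]:
        async with self._batch_slots:  # wait for a free batch slot
            request_slots = asyncio.Semaphore(self.config.max_concurrent_requests_per_batch)

            async def limited(request: Dict[str, str]) -> Dict[str, str]:
                async with request_slots:  # cap in-flight requests within this batch
                    return await self._call_model(request)

            return list(await asyncio.gather(*(limited(r) for r in requests)))


async def demo() -> BatchRunner:
    runner = BatchRunner(BatchConcurrencyConfig(max_concurrent_batches=1, max_concurrent_requests_per_batch=3))
    requests = [{"custom_id": f"req-{i}"} for i in range(10)]
    # Two batches submitted at once; with one batch slot they run one after the other.
    await asyncio.gather(runner.run_batch(requests), runner.run_batch(requests))
    return runner
```

Here `asyncio.run(demo()).peak_in_flight` stays at 3, because the second batch cannot start until the first releases its slot. Raising `max_concurrent_batches` to 2 would let both batches overlap.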
Test with:
```
$ uv run llama stack build --image-type venv --providers inference=YOU_PICK,files=inline::localfs,batches=inline::reference --run &
$ LLAMA_STACK_CONFIG=http://localhost:8321  uv run pytest tests/unit/providers/batches tests/integration/batches --text-model YOU_PICK
```
addresses #3066 
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> 
							2025-08-15 15:34:15 -07:00 
							
									Ashwin Bharambe 
								ee7631b6cf 
Revert "feat: add batches API with OpenAI compatibility" (#3149)
							Reverts llamastack/llama-stack#3088 
The PR broke integration tests. 
							2025-08-14 10:08:54 -07:00 
							
									Matthew Farrellee 
								de692162af 
feat: add batches API with OpenAI compatibility (#3088)
							Add complete batches API implementation with protocol, providers, and
tests:
Core Infrastructure:
- Add batches API protocol using OpenAI Batch types directly
- Add Api.batches enum value and protocol mapping in resolver
- Add OpenAI "batch" file purpose support
- Include proper error handling (ConflictError, ResourceNotFoundError)
Reference Provider:
- Add ReferenceBatchesImpl with full CRUD operations (create, retrieve,
cancel, list)
- Implement background batch processing with configurable concurrency
- Add SQLite KVStore backend for persistence
- Support /v1/chat/completions endpoint with request validation
Comprehensive Test Suite:
- Add unit tests for provider implementation with validation
- Add integration tests for end-to-end batch processing workflows
- Add error handling tests for validation, malformed inputs, and edge
cases
Configuration:
- Add max_concurrent_batches and max_concurrent_requests_per_batch
options
- Add provider documentation with sample configurations
Test with:
```
$ uv run llama stack build --image-type venv --providers inference=YOU_PICK,files=inline::localfs,batches=inline::reference --run &
$ LLAMA_STACK_CONFIG=http://localhost:8321  uv run pytest tests/unit/providers/batches tests/integration/batches --text-model YOU_PICK
```
addresses #3066  
							2025-08-14 09:42:02 -04:00