Mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-10-26 09:15:40 +00:00

2 commits

| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|  | e1ed152779 | chore: create OpenAIMixin for inference providers with an OpenAI-compat API that need to implement openai_* methods (#2835) # What does this PR do? Add an `OpenAIMixin` for use by inference providers whose remote endpoints support an OpenAI-compatible API. Its use is demonstrated by refactoring: - OpenAIInferenceAdapter - NVIDIAInferenceAdapter (adds embedding support) - LlamaCompatInferenceAdapter ## Test Plan Existing unit and integration tests. | ||
|  | 51d9fd4808 | fix: Don't cache clients for passthrough auth providers (#2728) # What does this PR do? Some of our inference providers support passthrough authentication via `x-llamastack-provider-data` header values. This change stops the providers that support passthrough auth from caching their clients to the backend providers (mostly OpenAI client instances), so the client connecting to Llama Stack has to provide those auth values on every request. ## Test Plan I added unit tests to ensure that none of the providers fixed in this PR cache clients across requests: `uv run pytest -sv tests/unit/providers/inference/test_inference_client_caching.py` I also ran some of our OpenAI-compatible API integration tests for each of the changed providers to ensure they still work. Note that these providers don't pass all of these tests (for unrelated reasons, due to quirks of the Groq and Together SaaS services), but enough of the tests passed to confirm the clients still work as intended. ### Together Start the server: `ENABLE_TOGETHER="together" uv run llama stack run llama_stack/templates/starter/run.yaml` Run the tests: `LLAMA_STACK_CONFIG=http://localhost:8321 uv run pytest -sv tests/integration/inference/test_openai_completion.py --text-model "together/meta-llama/Llama-3.1-8B-Instruct"` ### OpenAI Start the server: `ENABLE_OPENAI="openai" uv run llama stack run llama_stack/templates/starter/run.yaml` Run the tests: `LLAMA_STACK_CONFIG=http://localhost:8321 uv run pytest -sv tests/integration/inference/test_openai_completion.py --text-model "openai/gpt-4o-mini"` ### Groq Start the server: `ENABLE_GROQ="groq" uv run llama stack run llama_stack/templates/starter/run.yaml` Run the tests: `LLAMA_STACK_CONFIG=http://localhost:8321 uv run pytest -sv tests/integration/inference/test_openai_completion.py --text-model "groq/meta-llama/Llama-3.1-8B-Instruct"` --------- Signed-off-by: Ben Browning <bbrownin@redhat.com> |
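
The two commits above describe a shared mixin for OpenAI-compatible providers and a fix that stops passthrough-auth providers from caching their backend clients. The following is a minimal sketch of how those two ideas can fit together, not the code from either PR. Every name in it (`FakeClient`, `CompatMixinSketch`, `PassthroughAdapter`, `provider_data`, the `api_key` field) is hypothetical and chosen for illustration, and a stand-in dataclass replaces the real OpenAI client so the example runs without network access or third-party packages.

```python
import contextvars
from dataclasses import dataclass

# Hypothetical per-request storage for values parsed from a header such as
# x-llamastack-provider-data. The real project may store this differently.
provider_data = contextvars.ContextVar("provider_data", default=None)


@dataclass(frozen=True)
class FakeClient:
    """Stand-in for an OpenAI-compatible client (not a real SDK class)."""

    base_url: str
    api_key: str


class CompatMixinSketch:
    """Hypothetical mixin: subclasses supply credentials, the mixin builds the client."""

    def get_api_key(self) -> str:
        raise NotImplementedError

    def get_base_url(self) -> str:
        raise NotImplementedError

    @property
    def client(self) -> FakeClient:
        # Built on every access and never stored on self, so credentials
        # supplied for one request cannot be reused by a later request.
        return FakeClient(base_url=self.get_base_url(), api_key=self.get_api_key())


class PassthroughAdapter(CompatMixinSketch):
    """Hypothetical adapter that takes its API key from per-request data."""

    def __init__(self, base_url: str) -> None:
        self._base_url = base_url

    def get_api_key(self) -> str:
        data = provider_data.get()
        if not data or "api_key" not in data:
            raise ValueError("no API key supplied for this request")
        return data["api_key"]

    def get_base_url(self) -> str:
        return self._base_url


adapter = PassthroughAdapter("https://example.invalid/v1")

# Simulate two requests that carry different credentials.
provider_data.set({"api_key": "key-for-alice"})
first = adapter.client
provider_data.set({"api_key": "key-for-bob"})
second = adapter.client
```

Because `client` is a property that constructs a new object on each access, `first` and `second` are distinct objects carrying different keys. The trade-off in this sketch is that a client is constructed per access instead of being reused, which is the price of never serving one caller with another caller's credentials.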