llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-10-07 04:45:44 +00:00

Ben Browning 51d9fd4808 fix: Don't cache clients for passthrough auth providers (#2728)

# What does this PR do?

Some of our inference providers support passthrough authentication via `x-llamastack-provider-data` header values. This change stops the providers that support passthrough auth from caching their clients to the backend providers (mostly OpenAI client instances), so that the client connecting to Llama Stack has to provide those auth values on every request.

## Test Plan

I added some unit tests to ensure we're not caching clients across requests for all the fixed providers in this PR.

```
uv run pytest -sv tests/unit/providers/inference/test_inference_client_caching.py
```

I also ran some of our OpenAI-compatible API integration tests for each of the changed providers, just to ensure they still work. Note that these providers don't actually pass all of these tests (for unrelated reasons, due to quirks of the Groq and Together SaaS services), but enough of the tests passed to confirm the clients are still working as intended.

### Together

```
ENABLE_TOGETHER="together" \
uv run llama stack run llama_stack/templates/starter/run.yaml

LLAMA_STACK_CONFIG=http://localhost:8321 \
uv run pytest -sv \
  tests/integration/inference/test_openai_completion.py \
  --text-model "together/meta-llama/Llama-3.1-8B-Instruct"
```

### OpenAI

```
ENABLE_OPENAI="openai" \
uv run llama stack run llama_stack/templates/starter/run.yaml

LLAMA_STACK_CONFIG=http://localhost:8321 \
uv run pytest -sv \
  tests/integration/inference/test_openai_completion.py \
  --text-model "openai/gpt-4o-mini"
```

### Groq

```
ENABLE_GROQ="groq" \
uv run llama stack run llama_stack/templates/starter/run.yaml

LLAMA_STACK_CONFIG=http://localhost:8321 \
uv run pytest -sv \
  tests/integration/inference/test_openai_completion.py \
  --text-model "groq/meta-llama/Llama-3.1-8B-Instruct"
```

---------

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-07-11 13:38:27 -07:00
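The behavior described in the commit message can be illustrated with a minimal sketch. This is not the code from the PR: `BackendClient`, `CachingAdapter`, `PerRequestAdapter`, and the `api_key` field inside the header payload are hypothetical names chosen for illustration, and only the `x-llamastack-provider-data` header name comes from the commit message. The sketch contrasts an adapter that caches its backend client with one that rebuilds the client from each request's headers.

```python
from __future__ import annotations

import json
from dataclasses import dataclass

# Header name taken from the commit message.
HEADER = "x-llamastack-provider-data"


@dataclass(frozen=True)
class BackendClient:
    """Hypothetical stand-in for a backend SDK client (e.g. an OpenAI client)."""

    api_key: str


def api_key_from_headers(headers: dict) -> str:
    """Read the caller's key from the header; the "api_key" field is an assumption."""
    raw = headers.get(HEADER)
    if raw is None:
        raise ValueError(f"missing {HEADER} header")
    return json.loads(raw)["api_key"]


class CachingAdapter:
    """Problem case: the first caller's client is reused for every later request."""

    def __init__(self) -> None:
        self._client: BackendClient | None = None

    def client_for(self, headers: dict) -> BackendClient:
        if self._client is None:
            self._client = BackendClient(api_key_from_headers(headers))
        return self._client


class PerRequestAdapter:
    """Fixed case: a fresh client is built from each request's own headers."""

    def client_for(self, headers: dict) -> BackendClient:
        return BackendClient(api_key_from_headers(headers))


alice = {HEADER: json.dumps({"api_key": "alice-key"})}
bob = {HEADER: json.dumps({"api_key": "bob-key"})}

cached = CachingAdapter()
cached.client_for(alice)
print(cached.client_for(bob).api_key)  # alice-key: Bob's request reuses Alice's credentials

fresh = PerRequestAdapter()
fresh.client_for(alice)
print(fresh.client_for(bob).api_key)  # bob-key
```

With the per-request variant, a request that omits the header fails instead of silently reusing an earlier caller's credentials, which is the property the unit tests mentioned above are described as checking.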
..
anthropic	ci: test safety with starter (#2628)	2025-07-09 16:53:50 +02:00
bedrock	ci: test safety with starter (#2628)	2025-07-09 16:53:50 +02:00
cerebras	ci: test safety with starter (#2628)	2025-07-09 16:53:50 +02:00
cerebras_openai_compat	feat: introduce APIs for retrieving chat completion requests (#2145)	2025-05-18 21:43:19 -07:00
databricks	ci: test safety with starter (#2628)	2025-07-09 16:53:50 +02:00
fireworks	ci: test safety with starter (#2628)	2025-07-09 16:53:50 +02:00
fireworks_openai_compat	feat: introduce APIs for retrieving chat completion requests (#2145)	2025-05-18 21:43:19 -07:00
gemini	ci: test safety with starter (#2628)	2025-07-09 16:53:50 +02:00
groq	fix: Don't cache clients for passthrough auth providers (#2728)	2025-07-11 13:38:27 -07:00
groq_openai_compat	feat: introduce APIs for retrieving chat completion requests (#2145)	2025-05-18 21:43:19 -07:00
llama_openai_compat	feat: introduce APIs for retrieving chat completion requests (#2145)	2025-05-18 21:43:19 -07:00
nvidia	ci: test safety with starter (#2628)	2025-07-09 16:53:50 +02:00
ollama	refactor: set proper name for embedding all-minilm:l6-v2 and update to use "starter" in detailed_tutorial (#2627)	2025-07-06 09:07:37 +05:30
openai	fix: Don't cache clients for passthrough auth providers (#2728)	2025-07-11 13:38:27 -07:00
passthrough	feat: consolidate most distros into "starter" (#2516)	2025-07-04 15:58:03 +02:00
runpod	ci: test safety with starter (#2628)	2025-07-09 16:53:50 +02:00
sambanova	fix: sambanova shields and model validation (#2693)	2025-07-11 16:29:15 -04:00
sambanova_openai_compat	feat: introduce APIs for retrieving chat completion requests (#2145)	2025-05-18 21:43:19 -07:00
tgi	feat: consolidate most distros into "starter" (#2516)	2025-07-04 15:58:03 +02:00
together	fix: Don't cache clients for passthrough auth providers (#2728)	2025-07-11 13:38:27 -07:00
together_openai_compat	feat: introduce APIs for retrieving chat completion requests (#2145)	2025-05-18 21:43:19 -07:00
vllm	refactor(env)!: enhanced environment variable substitution (#2490)	2025-06-26 08:20:08 +05:30
watsonx	fix: allow default empty vars for conditionals (#2570)	2025-07-01 14:42:05 +02:00
__init__.py	`impls` -> `inline`, `adapters` -> `remote` (#381)	2024-11-06 14:54:05 -08:00