llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-08 11:07:22 +00:00

Author	SHA1	Message	Date
Ashwin Bharambe	1536ae0333	Merge remote-tracking branch 'origin/main' into resp_branching	2025-10-01 21:13:12 -07:00
Ashwin Bharambe	6749c853c0	more substantial cleanup of Tool vs. ToolDef crap	2025-10-01 15:54:14 -07:00
Matthew Farrellee	4dbe0593f9	chore: add provider-data-api-key support to openaimixin (#3639 ) # What does this PR do? the LiteLLMOpenAIMixin provides support for reading key from provider data (headers users send). this adds the same functionality to the OpenAIMixin. this is infrastructure for migrating providers. ## Test Plan ci w/ new tests	2025-10-01 13:44:59 -07:00
Matthew Farrellee	f7c5ef4ec0	chore: remove /v1/inference/completion and implementations (#3622 ) # What does this PR do? the /inference/completion route is gone. this removes the implementations. ## Test Plan ci	2025-10-01 11:36:53 -04:00
Ashwin Bharambe	139320e19f	feat(tools): use { input_schema, output_schema } for ToolDefinition	2025-09-30 20:44:14 -07:00
Michael Dawson	ddf3f1735a	fix: ensure usage is requested if telemetry is enabled (#3571 ) # What does this PR do? Refs: https://github.com/llamastack/llama-stack/issues/3420 When telemetry is enabled the router unconditionally expects the usage attribute to be available and fails if it is not present. Usage is not currently being requested by litellm_openai_mixin.py for streaming requests when using the responses API which means that providers like vertexai fail if telemetry is enabled and streaming is used. This is part of the required fix. Other part is in liteLLM, will plan to submit PR for that soon. ## Test Plan I applied this change along with the change for litellm in a llama stack deployment and validated that I could make streaming requests through the responses API to a gemini model and they would succeed instead of failing due to the missing usage attribute when telemetry is enabled. Signed-off-by: Michael Dawson <midawson@redhat.com>	2025-09-29 14:09:08 -07:00
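A minimal sketch of the idea in the commit above, not the project's actual change. It assumes usage is requested through the OpenAI-style `stream_options={"include_usage": True}` request field; the helper name `with_usage_if_needed` and the `telemetry_enabled` flag are hypothetical.

```python
from typing import Any, Dict


def with_usage_if_needed(params: Dict[str, Any], telemetry_enabled: bool) -> Dict[str, Any]:
    """Return a copy of params that asks for usage on streaming requests.

    Hypothetical helper: only streaming requests are touched, and only
    when telemetry is enabled.
    """
    out = dict(params)
    if telemetry_enabled and out.get("stream"):
        # Preserve any stream options the caller already set.
        stream_options = dict(out.get("stream_options") or {})
        stream_options["include_usage"] = True
        out["stream_options"] = stream_options
    return out


streaming = with_usage_if_needed({"model": "m", "stream": True}, telemetry_enabled=True)
non_streaming = with_usage_if_needed({"model": "m"}, telemetry_enabled=True)
```

The input dictionary is copied rather than mutated, so the caller's request parameters are left untouched.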
Matthew Farrellee	975ead1d6a	chore(api): remove deprecated embeddings impls (#3301 ) # What does this PR do? remove deprecated embeddings implementations	2025-09-29 14:45:09 -04:00
Tami Takamiya	65f7b81e98	feat: Add items and title to ToolParameter/ToolParamDefinition (#3003 ) # What does this PR do? Add items and title to ToolParameter/ToolParamDefinition. Adding items will resolve the issue that occurs with Gemini LLM when an MCP tool has array-type properties. ## Test Plan Unit test cases will be added. --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Kai Wu <kaiwu@meta.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-09-27 11:35:29 -07:00
Matthew Farrellee	53b15725b6	chore(apis): unpublish deprecated /v1/inference apis (#3297 ) # What does this PR do? unpublish (make unavailable to users) the following apis - - `/v1/inference/completion`, replaced by `/v1/openai/v1/completions` - `/v1/inference/chat-completion`, replaced by `/v1/openai/v1/chat/completions` - `/v1/inference/embeddings`, replaced by `/v1/openai/v1/embeddings` - `/v1/inference/batch-completion`, replaced by `/v1/openai/v1/batches` - `/v1/inference/batch-chat-completion`, replaced by `/v1/openai/v1/batches` note: the implementations are still available for internal use, e.g. agents uses chat-completion.	2025-09-27 11:20:06 -07:00
Matthew Farrellee	b48d5cfed7	feat(internal): add image_url download feature to OpenAIMixin (#3516 ) # What does this PR do? simplify Ollama inference adapter by - - moving image_url download code to OpenAIMixin - being a ModelRegistryHelper instead of having one (mypy blocks check_model_availability method assignment) ## Test Plan - add unit tests for new download feature - add integration tests for openai_chat_completion w/ image_url (close test gap)	2025-09-26 17:32:16 -04:00
Matthew Farrellee	926c3ada41	chore: prune mypy exclude list (#3561 ) # What does this PR do? prune the mypy exclude list, build a stronger foundation for quality code ## Test Plan ci	2025-09-26 11:44:43 -04:00
Matthew Farrellee	b67aef2fc4	feat: add static embedding metadata to dynamic model listings for providers using OpenAIMixin (#3547 ) # What does this PR do? - remove auto-download of ollama embedding models - add embedding model metadata to dynamic listing w/ unit test - add support and tests for allowed_models - removed inference provider models.py files where dynamic listing is enabled - store embedding metadata in embedding_model_metadata field on inference providers - make model_entries optional on ModelRegistryHelper and LiteLLMOpenAIMixin - make OpenAIMixin a ModelRegistryHelper - skip base64 embedding test for remote::ollama, always returns floats - only use OpenAI client for ollama model listing - remove unused build_model_entry function - remove unused get_huggingface_repo function ## Test Plan ci w/ new tests	2025-09-25 17:17:00 -04:00
Matthew Farrellee	62e0aef7bc	fix: return llama stack model id from embeddings (#3525 ) # What does this PR do? the openai_embeddings method on OpenAIMixin was returning the provider's model id instead of the llama stack name ## Test Plan before - ``` $ ./scripts/integration-tests.sh --stack-config server:ci-tests --setup gpt --subdirs inference --inference-mode live --pattern test_openai_embeddings_single_string ... FAILED tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[openai_client-emb=openai/text-embedding-3-small] - AssertionError: assert 'text-embedding-3-small' == 'openai/text-...dding-3-small' FAILED tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[llama_stack_client-emb=openai/text-embedding-3-small] - AssertionError: assert 'text-embedding-3-small' == 'openai/text-...dding-3-small' ========================================== 2 failed, 95 deselected, 4 warnings in 3.87s =========================================== ``` after - ``` $ ./scripts/integration-tests.sh --stack-config server:ci-tests --setup gpt --subdirs inference --inference-mode live --pattern test_openai_embeddings_single_string ... ========================================== 2 passed, 95 deselected, 4 warnings in 2.12s =========================================== ```	2025-09-23 12:30:00 -04:00
Kai Wu	e3fd70c321	fix: change ModelRegistryHelper to use ProviderModelEntry instead of hardcoded ModelType.llm (#3451 ) # What does this PR do? change ModelRegistryHelper to use ProviderModelEntry instead of hardcoded ModelType.llm which fixed issue #3330. ## Test Plan 1. open llama-stack server ``` uv sync --python 3.12 source .venv/bin/activate uv run llama stack build --distro starter --image-type venv --run ``` 2. Used the following script to test ``` from llama_stack_client import LlamaStackClient import os def test_openai_embedding_type(): client = LlamaStackClient( base_url=os.environ.get("LLAMA_STACK_ENDPOINT", "http://localhost:8321"), provider_data={ "openai_api_key": os.environ.get("OPENAI_API_KEY", ""), }, ) model = client.models.retrieve("openai/text-embedding-3-small") print(model) assert model.identifier == "openai/text-embedding-3-small" assert model.model_type == "embedding" test_openai_embedding_type() ``` logs: ``` python test_openai.py INFO:httpx:HTTP Request: GET http://localhost:8321/v1/models/openai/text-embedding-3-small "HTTP/1.1 200 OK" Model(identifier='openai/text-embedding-3-small', metadata={'embedding_dimension': 1536.0, 'context_length': 8192.0}, api_model_type='embedding', provider_id='openai', type='model', provider_resource_id='text-embedding-3-small', owner=None, source='listed_from_provider', model_type='embedding') ```	2025-09-22 12:55:32 -04:00
ehhuang	f44eb935c4	chore: simplify authorized sqlstore (#3496 ) # What does this PR do? This PR is generated with AI and reviewed by me. Refactors the AuthorizedSqlStore class to store the access policy as an instance variable rather than passing it as a parameter to each method call. This simplifies the API. # Test Plan existing tests	2025-09-19 16:13:56 -07:00
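The refactor described above can be illustrated roughly as follows. This is a sketch with hypothetical names (`PolicyStore`, `owner_only`), not the real `AuthorizedSqlStore` interface: the policy is supplied once at construction, so individual calls no longer take it as a parameter.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Policy = Callable[[str, Row], bool]


def owner_only(user: str, row: Row) -> bool:
    """Hypothetical policy: a user may only see rows they own."""
    return row.get("owner") == user


class PolicyStore:
    """Hypothetical store that keeps its access policy as instance state."""

    def __init__(self, rows: List[Row], policy: Policy) -> None:
        self._rows = rows
        self._policy = policy  # set once instead of passed to every call

    def fetch_all(self, user: str) -> List[Row]:
        # Every read path consults the same stored policy.
        return [row for row in self._rows if self._policy(user, row)]


store = PolicyStore(
    rows=[{"id": "1", "owner": "alice"}, {"id": "2", "owner": "bob"}],
    policy=owner_only,
)
visible = store.fetch_all("alice")
```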
Matthew Farrellee	521865c388	feat: include all models from provider's /v1/models (#3471 ) # What does this PR do? this replaces the static model listing for any provider using OpenAIMixin currently - - anthropic - azure openai - gemini - groq - llama-api - nvidia - openai - sambanova - tgi - vertexai - vllm - not changed: together has its own impl ## Test Plan - new unit tests - manual for llama-api, openai, groq, gemini ``` for provider in llama-openai-compat openai groq gemini; do uv run llama stack build --image-type venv --providers inference=remote::provider --run & uv run --with llama-stack-client llama-stack-client models list \| grep Total ``` results (17 sep 2025): - llama-api: 4 - openai: 86 - groq: 21 - gemini: 66 closes #3467	2025-09-18 05:17:11 -04:00
Matthew Farrellee	f4ab154ade	feat: add dynamic model registration support to TGI inference (#3417 ) # What does this PR do? adds dynamic model support to TGI add new overwrite_completion_id feature to OpenAIMixin to deal with TGI always returning id="" ## Test Plan tgi: `docker run --gpus all --shm-size 1g -p 8080:80 -v /data:/data ghcr.io/huggingface/text-generation-inference --model-id Qwen/Qwen3-0.6B` stack: `TGI_URL=http://localhost:8080 uv run llama stack build --image-type venv --distro ci-tests --run` test: `./scripts/integration-tests.sh --stack-config http://localhost:8321 --setup tgi --subdirs inference --pattern openai`	2025-09-15 15:52:40 -04:00
Matthew Farrellee	8ef1189be7	chore: update the vLLM inference impl to use OpenAIMixin for openai-compat functions (#3404 ) # What does this PR do? update vLLM inference provider to use OpenAIMixin for openai-compat functions inference recordings from Qwen3-0.6B and vLLM 0.8.3 - ``` docker run --gpus all -v ~/.cache/huggingface:/root/.cache/huggingface -p 8000:8000 --ipc=host \ vllm/vllm-openai:latest \ --model Qwen/Qwen3-0.6B --enable-auto-tool-choice --tool-call-parser hermes ``` ## Test Plan ``` ./scripts/integration-tests.sh --stack-config server:ci-tests --setup vllm --subdirs inference ```	2025-09-11 09:04:38 -04:00
Ashwin Bharambe	0c7f49490c	fix(inference_store): on duplicate chat completion IDs, replace (#3408 ) # What does this PR do? Duplicate chat completion IDs can be generated during tests especially if they are replaying recorded responses across different tests. No need to warn or error under those circumstances. In the wild, this is not likely to happen at all (no evidence) so we aren't really hiding any problem.	2025-09-10 14:34:18 -07:00
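One way to get "replace on duplicate ID" behavior, sketched with the standard-library `sqlite3` module. The table and column names are hypothetical, and the project's store may implement the replacement differently; the point is only that a second write with the same primary key overwrites the first instead of raising.

```python
import sqlite3


def store_completion(conn: sqlite3.Connection, completion_id: str, payload: str) -> None:
    # INSERT OR REPLACE overwrites the existing row when the primary key already exists.
    conn.execute(
        "INSERT OR REPLACE INTO chat_completions (id, payload) VALUES (?, ?)",
        (completion_id, payload),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chat_completions (id TEXT PRIMARY KEY, payload TEXT)")
store_completion(conn, "chatcmpl-1", "first")
store_completion(conn, "chatcmpl-1", "second")  # duplicate ID: replaced, no error
rows = conn.execute("SELECT id, payload FROM chat_completions").fetchall()
```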
ehhuang	e980436a2e	chore: introduce write queue for inference_store (#3383 ) # What does this PR do? Adds a write worker queue for writes to inference store. This avoids overwhelming request processing with slow inference writes. ## Test Plan Benchmark: ``` cd /docs/source/distributions/k8s-benchmark # start mock server python openai-mock-server.py --port 8000 # start stack server LLAMA_STACK_LOGGING="all=WARNING" uv run --with llama-stack python -m llama_stack.core.server.server docs/source/distributions/k8s-benchmark/stack_run_config.yaml # run benchmark script uv run python3 benchmark.py --duration 120 --concurrent 50 --base-url=http://localhost:8321/v1/openai/v1 --model=vllm-inference/meta-llama/Llama-3.2-3B-Instruct ``` ## RPS from 21 -> 57	2025-09-10 11:57:42 -07:00
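The write-queue idea above can be sketched with `asyncio.Queue` and a single background worker. This is an illustration under stated assumptions, not the project's implementation: `QueuedStore`, its methods, and the simulated 10 ms write are all hypothetical.

```python
import asyncio
from typing import Dict, List, Optional


class QueuedStore:
    """Hypothetical store that defers writes to a background worker."""

    def __init__(self) -> None:
        self.rows: List[Dict[str, str]] = []
        self._queue: asyncio.Queue = asyncio.Queue()
        self._worker: Optional[asyncio.Task] = None

    async def start(self) -> None:
        self._worker = asyncio.create_task(self._drain())

    async def _drain(self) -> None:
        while True:
            row = await self._queue.get()
            try:
                await asyncio.sleep(0.01)  # simulate a slow database write
                self.rows.append(row)
            finally:
                self._queue.task_done()

    def enqueue(self, row: Dict[str, str]) -> None:
        # Returns immediately; the request path never waits on the write.
        self._queue.put_nowait(row)

    async def flush_and_stop(self) -> None:
        await self._queue.join()  # wait until every queued write is done
        if self._worker is not None:
            self._worker.cancel()


async def demo() -> List[Dict[str, str]]:
    store = QueuedStore()
    await store.start()
    for i in range(3):
        store.enqueue({"id": f"chatcmpl-{i}"})
    await store.flush_and_stop()
    return store.rows
```

Calling `asyncio.run(demo())` enqueues three rows without waiting, then drains the queue before returning. A single worker also preserves write order.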
Derek Higgins	5bbca56cfc	fix: Make SentenceTransformer embedding operations non-blocking (#3335 ) - Wrap model loading with asyncio.to_thread() to prevent blocking during model download/initialization - Wrap encoding operations with asyncio.to_thread() to run in background thread - Convert _load_sentence_transformer_model() to async method This ensures the async event loop remains responsive during embedding operations. Closes: #3332 Signed-off-by: Derek Higgins <derekh@redhat.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>	2025-09-04 13:58:41 -04:00
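The `asyncio.to_thread()` pattern named in the commit above, as a minimal sketch. `slow_encode` is a hypothetical stand-in for a blocking SentenceTransformer call, and the heartbeat task exists only to show that the event loop keeps running while the encode is in progress.

```python
import asyncio
import time
from typing import List, Tuple


def slow_encode(texts: List[str]) -> List[List[float]]:
    """Hypothetical stand-in for a blocking encode call."""
    time.sleep(0.05)  # simulate work that would otherwise block the event loop
    return [[float(len(t))] for t in texts]


async def embed(texts: List[str]) -> List[List[float]]:
    # Run the blocking function in a worker thread so the loop stays responsive.
    return await asyncio.to_thread(slow_encode, texts)


async def main() -> Tuple[List[List[float]], int]:
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.005)

    hb = asyncio.create_task(heartbeat())
    vectors = await embed(["hi", "hello"])
    hb.cancel()
    return vectors, ticks
```

If `embed` called `slow_encode` directly instead of going through `asyncio.to_thread`, the heartbeat could not advance until the encode finished.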
Matthew Farrellee	3d119a86d4	chore: indicate to mypy that InferenceProvider.batch_completion/batch_chat_completion is concrete (#3239 ) # What does this PR do? closes https://github.com/llamastack/llama-stack/issues/3236 mypy considered our default implementations (raise NotImplementedError) to be trivial. the result was we implemented the same stubs in providers. this change puts enough into the default impls so mypy considers them non-trivial. this allows us to remove the duplicate implementations.	2025-08-22 14:17:30 -07:00
Mustafa Elbehery	c3b2b06974	refactor(logging): rename llama_stack logger categories (#3065 ) # What does this PR do? This PR renames categories of llama_stack loggers. This PR aligns logging categories as per the package name, as well as reviews from initial https://github.com/meta-llama/llama-stack/pull/2868. This is a follow up to #3061. Replaces https://github.com/meta-llama/llama-stack/pull/2868 Part of https://github.com/meta-llama/llama-stack/issues/2865 cc @leseb @rhuss Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>	2025-08-21 17:31:04 -07:00
Mustafa Elbehery	3f8df167f3	chore(pre-commit): add pre-commit hook to enforce llama_stack logger usage (#3061 ) # What does this PR do? This PR adds a step in pre-commit to enforce using the `llama_stack` logger. Currently, various parts of the codebase use different loggers. Since a custom `llama_stack` logger exists and is already used in the codebase, it is better to standardize on it. Signed-off-by: Mustafa Elbehery <melbeher@redhat.com> Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>	2025-08-20 07:15:35 -04:00
Ashwin Bharambe	27d6becfd0	fix(misc): pin openai dependency to < 1.100.0 (#3192 ) This OpenAI client release `0843a11164` ends up breaking litellm `169a17400f/litellm/types/llms/openai.py (L40)` Update the dependency pin. Also make the imports a bit more defensive anyhow if something else during `llama stack build` ends up moving openai to a previous version. ## Test Plan Run pre-release script integration tests.	2025-08-18 12:20:50 -07:00
Derek Higgins	c15cc7ed77	fix: use ChatCompletionMessageFunctionToolCall (#3142 ) The OpenAI compatibility layer was incorrectly importing ChatCompletionMessageToolCallParam instead of the ChatCompletionMessageFunctionToolCall class. This caused "Cannot instantiate typing.Union" errors when processing agent requests with tool calls. Closes: #3141 Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-08-14 10:27:00 -07:00
Matthew Farrellee	b70e2f1f09	fix(dep): update to openai >= 1.99.6 and use new Function location (#3087 ) # What does this PR do? closes #3072 ## Test Plan ci	2025-08-12 08:40:32 -07:00
Charlie Doern	0caef40e0d	fix: telemetry fixes (inference and core telemetry) (#2733 ) # What does this PR do? I found a few issues while adding new metrics for various APIs: currently metrics are only propagated in `chat_completion` and `completion` since most providers use the `openai_..` routes as the default in `llama-stack-client inference chat-completion`, metrics are currently not working as expected. in order to get them working the following had to be done: 1. get the completion as usual 2. use new `openai_` versions of the metric gathering functions which use `.usage` from the `OpenAI..` response types to gather the metrics which are already populated. 3. define a `stream_generator` which counts the tokens and computes the metrics (only for stream=True) 5. add metrics to response NOTE: I could not add metrics to `openai_completion` where stream=True because that ONLY returns an `OpenAICompletion` not an AsyncGenerator that we can manipulate. acquire the lock, and add event to the span as the other `_log_...` methods do some new output: `llama-stack-client inference chat-completion --message hi` [screenshot] and in the client: [screenshot] these were not previously being recorded nor were they being printed to the server due to the improper console sink handling --------- Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-08-06 13:37:40 -07:00
Eran Cohen	e5b542dd8e	feat: switch to async completion in LiteLLM OpenAI mixin (#3029 )	2025-08-03 12:08:56 -07:00
Matthew Farrellee	140ee7d337	fix: sambanova inference provider (#2996 ) # What does this PR do? closes #2995 update SambaNovaInferenceAdapter to efficiently use LiteLLMOpenAIMixin ## Test Plan ``` $ uv run pytest -s -v tests/integration/inference --stack-config inference=sambanova --text-model sambanova/Meta-Llama-3.1-8B-Instruct ... ======================== 10 passed, 84 skipped, 3 xfailed, 51 warnings in 8.14s ======================== ```	2025-08-01 09:09:14 -07:00
Matthew Farrellee	218c89fff1	feat: Add clear error message when API key is missing (#2992 ) # What does this PR do? Improve user experience by providing specific guidance when no API key is available, showing both provider data header and config options with the correct field name for each provider. Also adds comprehensive test coverage for API key resolution scenarios. addresses #2990 for providers using litellm openai mixin ## Test Plan `./scripts/unit-tests.sh tests/unit/providers/inference/test_litellm_openai_mixin.py`	2025-07-31 16:33:16 -04:00
Ashwin Bharambe	2665f00102	chore(rename): move llama_stack.distribution to llama_stack.core (#2975 ) We would like to rename the term `template` to `distribution`. To prepare for that, this is a precursor. cc @leseb	2025-07-30 23:30:53 -07:00
Matthew Farrellee	47c078fcef	feat: implement dynamic model detection support for inference providers using litellm (#2886 ) # What does this PR do? This enhancement allows inference providers using LiteLLMOpenAIMixin to validate model availability against LiteLLM's official provider model listings, improving reliability and user experience when working with different AI service providers. - Add litellm_provider_name parameter to LiteLLMOpenAIMixin constructor - Add check_model_availability method to LiteLLMOpenAIMixin using litellm.models_by_provider - Update Gemini, Groq, and SambaNova inference adapters to pass litellm_provider_name ## Test Plan standard CI.	2025-07-28 10:13:54 -07:00
Ashwin Bharambe	9583f468f8	feat(starter)!: simplify starter distro; litellm model registry changes (#2916 )	2025-07-25 15:02:04 -07:00
Ashwin Bharambe	1463b79218	feat(registry): make the Stack query providers for model listing (#2862 ) This flips #2823 and #2805 by making the Stack periodically query the providers for models rather than the providers going behind the back and calling "register" on to the registry themselves. This also adds support for model listing for all other providers via `ModelRegistryHelper`. Once this is done, we do not need to manually list or register models via `run.yaml` and it will remove both noise and annoyance (setting `INFERENCE_MODEL` environment variables, for example) from the new user experience. In addition, it adds a configuration variable `allowed_models` which can be used to optionally restrict the set of models exposed from a provider.	2025-07-24 10:39:53 -07:00
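A rough sketch of how an `allowed_models` restriction could be applied to a provider's model listing. The function name `filter_models` and the treatment of `None` as "no restriction" are assumptions made for illustration, not the project's code.

```python
from typing import List, Optional


def filter_models(listed: List[str], allowed_models: Optional[List[str]]) -> List[str]:
    """Return the provider's models, restricted to allowed_models when it is set."""
    if allowed_models is None:
        # Assumption: an unset allowed_models exposes everything the provider lists.
        return list(listed)
    allowed = set(allowed_models)
    # Keep the provider's ordering; names not offered by the provider are ignored.
    return [model for model in listed if model in allowed]


provider_models = ["model-a", "model-b", "model-c"]
unrestricted = filter_models(provider_models, None)
restricted = filter_models(provider_models, ["model-b", "model-z"])
```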
Matthew Farrellee	e1ed152779	chore: create OpenAIMixin for inference providers with an OpenAI-compat API that need to implement openai_* methods (#2835 ) # What does this PR do? add an `OpenAIMixin` for use by inference providers whose remote endpoints support an OpenAI compatible API. use is demonstrated by refactoring - OpenAIInferenceAdapter - NVIDIAInferenceAdapter (adds embedding support) - LlamaCompatInferenceAdapter ## Test Plan existing unit and integration tests	2025-07-23 06:49:40 -04:00
Ashwin Bharambe	3b83032555	feat(registry): more flexible model lookup (#2859 ) This PR updates model registration and lookup behavior to be slightly more general / flexible. See https://github.com/meta-llama/llama-stack/issues/2843 for more details. Note that this change is backwards compatible given the design of the `lookup_model()` method. ## Test Plan Added unit tests	2025-07-22 15:22:48 -07:00
IAN MILLER	b57db11bed	feat: create dynamic model registration for OpenAI and Llama compat remote inference providers (#2745 ) # What does this PR do? The purpose of this task is to create a solution that can automatically detect when new models are added, deprecated, or removed by OpenAI and Llama API providers, and automatically update the list of supported models in LlamaStack. This feature is vitally important in order to avoid missing new models and editing the entries manually, hence I created automation allowing users to dynamically register: - any models from OpenAI provider available at [https://api.openai.com/v1/models](https://api.openai.com/v1/models) that are not in [https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/openai/models.py](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/openai/models.py) - any models from Llama API provider available at [https://api.llama.com/v1/models](https://api.llama.com/v1/models) that are not in [https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/llama_openai_compat/models.py](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/llama_openai_compat/models.py) Closes #2504 this PR is dependent on #2710 ## Test Plan 1. Create venv at root llamastack directory: `uv venv .venv --python 3.12 --seed` 2. Activate venv: `source .venv/bin/activate` 3. `uv pip install -e .` 4. Create OpenAI distro modifying run.yaml 5. Build distro: `llama stack build --template starter --image-type venv` 6. Then run LlamaStack, but first navigate to the templates/starter folder: `llama stack run run.yaml --image-type venv OPENAI_API_KEY=<YOUR_KEY> ENABLE_OPENAI=openai` 7. Then try to register a dummy LLM that doesn't exist in the OpenAI provider: ` llama-stack-client models register ianm/ianllm --provider-model-id=ianllm --provider-id=openai ` You should receive this output - combined list of static config + fetched available models from OpenAI: <img width="1380" height="474" alt="Screenshot 2025-07-14 at 12 48 50" src="https://github.com/user-attachments/assets/d26aad18-6b15-49ee-9c49-b01b2d33f883" /> 8. Then register a real LLM from OpenAI: llama-stack-client models register openai/gpt-4-turbo-preview --provider-model-id=gpt-4-turbo-preview --provider-id=openai <img width="1253" height="613" alt="Screenshot 2025-07-14 at 13 43 02" src="https://github.com/user-attachments/assets/60a5c9b1-3468-4eb9-9e92-cd7d21de3ca0" /> <img width="1288" height="655" alt="Screenshot 2025-07-14 at 13 43 11" src="https://github.com/user-attachments/assets/c1e48871-0e24-4bd9-a0b8-8c95552a51ee" /> We correctly fetched all available models from OpenAI. As for Llama API, as a non-US person I don't have access to a Llama API key, but I joined the wait list. The implementation for Llama is the same as for OpenAI since Llama is OpenAI compatible. So, the response from the GET endpoint has the same structure as OpenAI https://llama.developer.meta.com/docs/api/models	2025-07-16 12:49:38 -04:00
Matthew Farrellee	f731f369a2	feat: add infrastructure to allow inference model discovery (#2710 ) # What does this PR do? inference providers each have a static list of supported / known models. some also have access to a dynamic list of currently available models. this change gives providers using the ModelRegistryHelper the ability to combine their static and dynamic lists. for instance, OpenAIInferenceAdapter can implement ``` def query_available_models(self) -> list[str]: return [entry.model for entry in self.openai_client.models.list()] ``` to augment its static list w/ a current list from openai. ## Test Plan scripts/unit-test.sh	2025-07-14 11:38:53 -07:00
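
The static-plus-dynamic combination described in f731f369a2 might look something like the sketch below. Only the method name `query_available_models` comes from the commit message; `StaticAndDynamicModels`, `supported_models`, and `FakeRemoteAdapter` are hypothetical names invented for illustration, and the real `ModelRegistryHelper` may merge the lists differently.

```python
class StaticAndDynamicModels:
    """Hypothetical helper that merges a static model list with a dynamic one."""

    def __init__(self, static_models: list[str]) -> None:
        self._static_models = list(static_models)

    def query_available_models(self) -> list[str]:
        # Default: the provider has no dynamic listing to offer.
        return []

    def supported_models(self) -> list[str]:
        # Static entries come first; dynamic entries are appended once each.
        merged = list(self._static_models)
        seen = set(merged)
        for model in self.query_available_models():
            if model not in seen:
                merged.append(model)
                seen.add(model)
        return merged


class FakeRemoteAdapter(StaticAndDynamicModels):
    """Stand-in for an adapter whose remote endpoint reports extra models."""

    def query_available_models(self) -> list[str]:
        return ["model-b", "model-new"]  # made-up remote listing


adapter = FakeRemoteAdapter(static_models=["model-a", "model-b"])
print(adapter.supported_models())
```

Keeping the static list first means the known entries remain stable even when the remote listing changes between queries.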
Sébastien Han	9b7eecebcf	ci: test safety with starter (#2628 ) # What does this PR do? We are now testing the safety capability with the starter image. This includes a few changes: * Enable the safety integration test * Relax the shield model requirements from llama-guard to make it work with llama-guard3:8b coming from Ollama * Expose a shield for each inference provider in the starter distro. The shield will only be registered if the provider is enabled. Closes: https://github.com/meta-llama/llama-stack/issues/2528 Signed-off-by: Sébastien Han <seb@redhat.com>	2025-07-09 16:53:50 +02:00
Ashwin Bharambe	b333a3c03a	fix(ollama): Download remote image URLs for Ollama (#2551 ) ## What does this PR do? Ollama does not support remote images. Only local file paths OR base64 inputs are supported. This PR ensures that the Stack downloads remote images and passes the base64 down to the inference engine. ## Test Plan Added a test case for Responses and ran it for both `fireworks` and `ollama` providers.	2025-06-30 20:36:11 +05:30
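
The behaviour described in b333a3c03a, downloading a remote image and handing base64 to the engine, can be sketched as below. This is an illustration under stated assumptions, not the Stack's code: `image_to_base64` and `fetch_bytes` are hypothetical names, and any non-URL input is assumed to be base64 already.

```python
import base64
from typing import Callable
from urllib.request import urlopen


def fetch_bytes(url: str) -> bytes:
    """Download the raw bytes behind a URL (hypothetical default fetcher)."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def image_to_base64(image: str, fetch: Callable[[str], bytes] = fetch_bytes) -> str:
    """Return base64 image data suitable for an engine that rejects URLs.

    Assumption: anything that is not an http(s) URL is already base64 and is
    passed through unchanged.
    """
    if image.startswith(("http://", "https://")):
        return base64.b64encode(fetch(image)).decode("ascii")
    return image


# Usage with an injected fetcher, so the sketch runs without network access.
print(image_to_base64("https://example.com/cat.png", fetch=lambda url: b"abc"))
```

Injecting the fetcher keeps the conversion logic testable offline; a real implementation would also need to handle download failures and size limits.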
Rohan Awhad	7cb5d3c60f	chore: standardize unsupported model error #2517 (#2518 ) # What does this PR do? - llama_stack/exceptions.py: Add UnsupportedModelError class - remote inference ollama.py and utils/inference/model_registry.py: Changed ValueError in favor of UnsupportedModelError - utils/inference/litellm_openai_mixin.py: remove `register_model` function implementation from `LiteLLMOpenAIMixin` class. Now uses the parent class `ModelRegistryHelper`'s function implementation Closes #2517 ## Test Plan 1. Create a new `test_run_openai.yaml` and paste the following config in it: ```yaml version: '2' image_name: test-image apis: - inference providers: inference: - provider_id: openai provider_type: remote::openai config: max_tokens: 8192 models: - metadata: {} model_id: "non-existent-model" provider_id: openai model_type: llm server: port: 8321 ``` And run the server with: ```bash uv run llama stack run test_run_openai.yaml ``` You should now get a `llama_stack.exceptions.UnsupportedModelError` with the supported list of models in the error message. --- Tested for the following remote inference providers, and they all raise the `UnsupportedModelError`: - Anthropic - Cerebras - Fireworks - Gemini - Groq - Ollama - OpenAI - SambaNova - Together - Watsonx --------- Co-authored-by: Rohan Awhad <rawhad@redhat.com>	2025-06-27 14:26:58 -04:00
Sébastien Han	ac5fd57387	chore: remove nested imports (#2515 ) # What does this PR do? * Given that our API packages use `import *` in `__init__.py`, we don't need to do `from llama_stack.apis.models.models` but simply `from llama_stack.apis.models`. The decision to use `import *` is debatable and should probably be revisited at some point. * Remove unneeded Ruff F401 rule * Consolidate Ruff F403 rule in the pyproject Signed-off-by: Sébastien Han <seb@redhat.com>	2025-06-26 08:01:05 +05:30
ehhuang	1d3f27fe5b	fix: resume responses with tool call output (#2524 ) # What does this PR do? closes #2522 ## Test Plan added integration test LLAMA_STACK_CONFIG=http://localhost:8321 pytest -v tests/integration/agents/test_openai_responses.py --text-model "accounts/fireworks/models/llama-v3p3-70b-instruct" -vv -k 'function_call'	2025-06-25 14:43:37 -07:00
ehhuang	d3b60507d7	feat: support auth attributes in inference/responses stores (#2389 ) # What does this PR do? Inference/Response stores now store user attributes when inserting, and respect them when fetching. ## Test Plan pytest tests/unit/utils/test_sqlstore.py	2025-06-20 10:24:45 -07:00
Charlie Doern	d12f195f56	feat: drop python 3.10 support (#2469 ) # What does this PR do? dropped python3.10, updated pyproject and dependencies, and also removed some blocks of code with special handling for enum.StrEnum Closes #2458 Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-06-19 12:07:14 +05:30
ehhuang	15f630e5da	feat: support pagination in inference/responses stores (#2397 ) # What does this PR do? ## Test Plan added unit tests	2025-06-16 22:43:35 -07:00
Hardik Shah	985d0b156c	feat: Add `suffix` to openai_completions (#2449 ) Code completion apps need "fill in the middle" capabilities. Added option of `suffix` to `openai_completion` to enable this. Updated ollama provider to showcase the same. ### Test Plan ``` pytest -sv --stack-config="inference=ollama" tests/integration/inference/test_openai_completion.py --text-model qwen2.5-coder:1.5b -k test_openai_completion_non_streaming_suffix ``` ### OpenAI Sample script ``` from openai import OpenAI client = OpenAI(base_url="http://localhost:8321/v1/openai/v1") response = client.completions.create( model="qwen2.5-coder:1.5b", prompt="The capital of ", suffix="is Paris.", max_tokens=10, ) print(response.choices[0].text) ``` ### Output ``` France is ____. To answer this question, we ```	2025-06-13 16:06:06 -07:00
Francisco Arceo	554ada57b0	chore: Add OpenAI compatibility for Ollama embeddings (#2440 ) # What does this PR do? This PR adds OpenAI compatibility for Ollama embeddings. Closes https://github.com/meta-llama/llama-stack/issues/2428 Summary of changes: - `llama_stack/providers/remote/inference/ollama/ollama.py` - Implements the OpenAI embeddings endpoint for Ollama, replacing the NotImplementedError with a full function that validates the model, prepares parameters, calls the client, encodes embedding data (optionally in base64), and returns a correctly structured response. - Updates import statements to include the new embedding response utilities. - `llama_stack/providers/utils/inference/litellm_openai_mixin.py` - Refactors the embedding data encoding logic to use a new shared utility (`b64_encode_openai_embeddings_response`) instead of inline base64 encoding and packing logic. - Cleans up imports accordingly. - `llama_stack/providers/utils/inference/openai_compat.py` - Adds `b64_encode_openai_embeddings_response` to handle encoding OpenAI embedding outputs (including base64 support) in a reusable way. - Adds `prepare_openai_embeddings_params` utility for standardizing embedding parameter preparation. - Updates imports to include the new embedding data class. - `tests/integration/inference/test_openai_embeddings.py` - Removes `"remote::ollama"` from the list of providers that skip OpenAI embeddings tests, since support is now implemented. ## Note There was one minor issue, which required me to override the `OpenAIEmbeddingsResponse.model` name with `self._get_model(model).identifier` name, which is very unsatisfying. ## Test Plan Unit Tests and integration tests --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-06-13 14:28:51 -04:00
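
The base64 handling mentioned in 554ada57b0 can be illustrated with a small round trip. The sketch assumes the common convention of packing the vector as little-endian float32 values before base64-encoding; whether `b64_encode_openai_embeddings_response` does exactly this is not stated in the commit message, and `encode_embedding` / `decode_embedding` are hypothetical names.

```python
import base64
import struct


def encode_embedding(values: list[float]) -> str:
    """Pack floats as little-endian float32 and base64-encode them (assumed layout)."""
    packed = struct.pack(f"<{len(values)}f", *values)
    return base64.b64encode(packed).decode("ascii")


def decode_embedding(data: str) -> list[float]:
    """Inverse of encode_embedding: base64-decode, then unpack float32 values."""
    raw = base64.b64decode(data)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


# Values chosen to be exactly representable in float32, so the round trip is lossless.
vector = [0.5, -1.25, 2.0]
encoded = encode_embedding(vector)
print(encoded)
print(decode_embedding(encoded))
```

Arbitrary Python floats lose precision when narrowed to float32, so a round trip is only exact for values like the ones used here.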
Hardik Shah	b21050935e	feat: New OpenAI compat embeddings API (#2314 ) # What does this PR do? Adds a new endpoint that is compatible with OpenAI for embeddings api. `/openai/v1/embeddings` Added providers for OpenAI, LiteLLM and SentenceTransformer. ## Test Plan ``` LLAMA_STACK_CONFIG=http://localhost:8321 pytest -sv tests/integration/inference/test_openai_embeddings.py --embedding-model all-MiniLM-L6-v2,text-embedding-3-small,gemini/text-embedding-004 ```	2025-05-31 22:11:47 -07:00


163 commits