llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-04 10:10:36 +00:00

Author	SHA1	Message	Date
Sébastien Han	dbdc811d16	chore: isolate bare minimum project dependencies (#2282 ) Some checks failed Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 12s Details Integration Tests / test-matrix (http, 3.12, datasets) (push) Failing after 20s Details Integration Tests / test-matrix (library, 3.13, post_training) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 14s Details Integration Tests / test-matrix (library, 3.13, datasets) (push) Failing after 7s Details Test Llama Stack Build / generate-matrix (push) Successful in 7s Details Integration Tests / test-matrix (http, 3.13, scoring) (push) Failing after 16s Details Integration Tests / test-matrix (library, 3.12, providers) (push) Failing after 16s Details Integration Tests / test-matrix (http, 3.12, tool_runtime) (push) Failing after 18s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 8s Details Python Package Build Test / build (3.12) (push) Failing after 5s Details Integration Tests / test-matrix (library, 3.12, vector_io) (push) Failing after 17s Details Python Package Build Test / build (3.13) (push) Failing after 4s Details Test Llama Stack Build / build-single-provider (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.13, inspect) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.13, inference) (push) Failing after 11s Details Integration Tests / test-matrix (http, 3.12, inference) (push) Failing after 26s Details Integration Tests / test-matrix (http, 3.12, scoring) (push) Failing after 19s Details Integration Tests / test-matrix (http, 3.13, vector_io) (push) Failing after 15s Details Integration Tests / test-matrix (library, 3.13, scoring) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.13, tool_runtime) (push) Failing after 8s Details Test External Providers / test-external-providers (venv) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.13, providers) (push) Failing after 10s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 12s Details Unit Tests / unit-tests (3.12) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.13, vector_io) (push) Failing after 10s Details Unit Tests / unit-tests (3.13) (push) Failing after 6s Details Update ReadTheDocs / update-readthedocs (push) Failing after 4s Details Test Llama Stack Build / build (push) Failing after 7s Details Pre-commit / pre-commit (push) Successful in 48s Details # What does this PR do? The goal is to promote the minimal set of dependencies the project needs to run, this includes: * dependencies needed to work with the CLI * dependencies needed for the server to run with no providers This also: * Relocate redundant dependencies out of the core project and into the individual providers that actually require them. * Include all necessary server dependencies so the project can run standalone, even without any providers. <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan Build and run distro a server. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-06-26 10:14:27 +02:00
Sébastien Han	43c1f39bd6	refactor(env)!: enhanced environment variable substitution (#2490 ) # What does this PR do? This commit significantly improves the environment variable substitution functionality in Llama Stack configuration files: * The version field in configuration files has been changed from string to integer type for better type consistency across build and run configurations. * The environment variable substitution system for ${env.FOO:} was fixed and properly returns an error * The environment variable substitution system for ${env.FOO+} returns None instead of an empty strings, it better matches type annotations in config fields * The system includes automatic type conversion for boolean, integer, and float values. * The error messages have been enhanced to provide clearer guidance when environment variables are missing, including suggestions for using default values or conditional syntax. * Comprehensive documentation has been added to the configuration guide explaining all supported syntax patterns, best practices, and runtime override capabilities. * Multiple provider configurations have been updated to use the new conditional syntax for optional API keys, making the system more flexible for different deployment scenarios. The telemetry configuration has been improved to properly handle optional endpoints with appropriate validation, ensuring that required endpoints are specified when their corresponding sinks are enabled. * There were many instances of ${env.NVIDIA_API_KEY:} that should have caused the code to fail. However, due to a bug, the distro server was still being started, and early validation wasn’t triggered. As a result, failures were likely being handled downstream by the providers. I’ve maintained similar behavior by using ${env.NVIDIA_API_KEY:+}, though I believe this is incorrect for many configurations. I’ll leave it to each provider to correct it as needed. * Environment variable substitution now uses the same syntax as Bash parameter expansion. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-06-26 08:20:08 +05:30
Sébastien Han	36d70637b9	fix: finish conversion to StrEnum (#2514 ) # What does this PR do? We still had a few enum declared to behave like string as well as enum. Let's use StrEnum for those. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-06-26 08:01:26 +05:30
Sébastien Han	ac5fd57387	chore: remove nested imports (#2515 ) # What does this PR do? * Given that our API packages use "import " in `__init.py__` we don't need to do `from llama_stack.apis.models.models` but simply from llama_stack.apis.models. The decision to use `import ` is debatable and should probably be revisited at one point. * Remove unneeded Ruff F401 rule * Consolidate Ruff F403 rule in the pyprojectfrom llama_stack.apis.models.models Signed-off-by: Sébastien Han <seb@redhat.com>	2025-06-26 08:01:05 +05:30
Ben Browning	2d9fd041eb	fix: annotations list and web_search_preview in Responses (#2520 ) # What does this PR do? These are a couple of fixes to get an example LangChain app working with our OpenAI Responses API implementation. The Responses API spec requires an annotations array in `output[].content[].annotations` and we were not providing one. So, this adds that as an empty list, even though we don't do anything to populate it yet. This prevents an error from client libraries like Langchain that expect this field to always exist, even if an empty list. The other fix is `web_search_preview` is a valid name for the web search tool in the Responses API, but we only responded to `web_search` or `web_search_preview_2025_03_11`. ## Test Plan The existing Responses unit tests were expanded to test these cases, via: ``` pytest -sv tests/unit/providers/agents/meta_reference/test_openai_responses.py ``` The existing test_openai_responses.py integration tests still pass with this change, tested as below with Fireworks: ``` uv run llama stack run llama_stack/templates/starter/run.yaml LLAMA_STACK_CONFIG=http://localhost:8321 \ uv run pytest -sv tests/integration/agents/test_openai_responses.py \ --text-model accounts/fireworks/models/llama4-scout-instruct-basic ``` Lastly, this example LangChain app now works with Llama stack (tested with Ollama in the starter template in this case). This LangChain code is using the example snippets for using Responses API at https://python.langchain.com/docs/integrations/chat/openai/#responses-api ```python from langchain_openai import ChatOpenAI llm = ChatOpenAI( base_url="http://localhost:8321/v1/openai/v1", api_key="fake", model="ollama/meta-llama/Llama-3.2-3B-Instruct", ) tool = {"type": "web_search_preview"} llm_with_tools = llm.bind_tools([tool]) response = llm_with_tools.invoke("What was a positive news story from today?") print(response.content) ``` Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-06-26 07:59:33 +05:30
ehhuang	1d3f27fe5b	fix: resume responses with tool call output (#2524 ) Some checks failed Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 8s Details Integration Tests / test-matrix (http, 3.13, vector_io) (push) Failing after 12s Details Integration Tests / test-matrix (library, 3.12, datasets) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.12, inference) (push) Failing after 9s Details Integration Tests / test-matrix (http, 3.13, tool_runtime) (push) Failing after 10s Details Integration Tests / test-matrix (http, 3.12, inference) (push) Failing after 17s Details Integration Tests / test-matrix (http, 3.12, vector_io) (push) Failing after 15s Details Integration Tests / test-matrix (library, 3.12, agents) (push) Failing after 11s Details Integration Tests / test-matrix (http, 3.13, inspect) (push) Failing after 13s Details Integration Tests / test-matrix (library, 3.12, post_training) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.12, providers) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.12, scoring) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 6s Details Integration Tests / test-matrix (library, 3.13, inference) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.13, inspect) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.13, datasets) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.13, agents) (push) Failing after 6s Details Integration Tests / test-matrix (library, 3.12, vector_io) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.13, post_training) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.13, providers) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.13, tool_runtime) (push) Failing after 8s Details Python Package Build Test / build (3.12) (push) Failing after 5s Details Integration Tests / test-matrix (library, 3.13, scoring) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.13, vector_io) (push) Failing after 9s Details Unit Tests / unit-tests (3.12) (push) Failing after 5s Details Update ReadTheDocs / update-readthedocs (push) Failing after 3s Details Python Package Build Test / build (3.13) (push) Failing after 49s Details Test External Providers / test-external-providers (venv) (push) Failing after 49s Details Unit Tests / unit-tests (3.13) (push) Failing after 49s Details Pre-commit / pre-commit (push) Successful in 2m5s Details # What does this PR do? closes #2522 ## Test Plan added integration test LLAMA_STACK_CONFIG=http://localhost:8321 pytest -v tests/integration/agents/test_openai_responses.py --text-model "accounts/fireworks/models/llama-v3p3-70b-instruct" -vv -k 'function_call'	2025-06-25 14:43:37 -07:00
Francisco Arceo	82f13fe83e	feat: Add ChunkMetadata to Chunk (#2497 ) # What does this PR do? Adding `ChunkMetadata` so we can properly delete embeddings later. More specifically, this PR refactors and extends the chunk metadata handling in the vector database and introduces a distinction between metadata used for model context and backend-only metadata required for chunk management, storage, and retrieval. It also improves chunk ID generation and propagation throughout the stack, enhances test coverage, and adds new utility modules. ```python class ChunkMetadata(BaseModel): """ `ChunkMetadata` is backend metadata for a `Chunk` that is used to store additional information about the chunk that will NOT be inserted into the context during inference, but is required for backend functionality. Use `metadata` in `Chunk` for metadata that will be used during inference. """ document_id: str \| None = None chunk_id: str \| None = None source: str \| None = None created_timestamp: int \| None = None updated_timestamp: int \| None = None chunk_window: str \| None = None chunk_tokenizer: str \| None = None chunk_embedding_model: str \| None = None chunk_embedding_dimension: int \| None = None content_token_count: int \| None = None metadata_token_count: int \| None = None ``` Eventually we can migrate the document_id out of the `metadata` field. I've introduced the changes so that `ChunkMetadata` is backwards compatible with `metadata`. <!-- If resolving an issue, uncomment and update the line below --> Closes https://github.com/meta-llama/llama-stack/issues/2501 ## Test Plan Added unit tests --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-06-25 15:55:23 -04:00
Ben Browning	fa0b0c13d4	fix: Ollama should be optional in starter distro (#2482 ) Some checks failed Integration Tests / test-matrix (http, 3.13, vector_io) (push) Failing after 14s Details Integration Tests / test-matrix (http, 3.13, scoring) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.12, scoring) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.12, vector_io) (push) Failing after 9s Details Integration Tests / test-matrix (http, 3.12, tool_runtime) (push) Failing after 18s Details Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 7s Details Integration Tests / test-matrix (http, 3.13, inspect) (push) Failing after 16s Details Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 12s Details Integration Tests / test-matrix (http, 3.13, tool_runtime) (push) Failing after 14s Details Integration Tests / test-matrix (library, 3.13, inspect) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.13, agents) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.13, inference) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.13, datasets) (push) Failing after 10s Details Test Llama Stack Build / generate-matrix (push) Successful in 7s Details Python Package Build Test / build (3.12) (push) Failing after 4s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 5s Details Integration Tests / test-matrix (library, 3.13, post_training) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.13, tool_runtime) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.13, providers) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.13, scoring) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.13, vector_io) (push) Failing after 8s Details Update ReadTheDocs / update-readthedocs (push) Failing after 4s Details Unit Tests / unit-tests (3.12) (push) Failing after 6s Details Unit Tests / unit-tests (3.13) (push) Failing after 5s Details Test Llama Stack Build / build (push) Failing after 6s Details Test Llama Stack Build / build-single-provider (push) Failing after 1m10s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 1m8s Details Python Package Build Test / build (3.13) (push) Failing after 1m6s Details Test External Providers / test-external-providers (venv) (push) Failing after 1m4s Details Pre-commit / pre-commit (push) Successful in 2m33s Details # What does this PR do? Our starter distro required Ollama to be running (and a large list of models available in that Ollama) to successfully start. This adjusts things so that Ollama does not have to be running to use the starter template / distro. To accomplish this, a few changes were needed: * The Ollama provider is now configurable whether it raises an Exception or just logs a warning when it cannot reach the Ollama server on startup. The default is to raise an exception (same as previous behavior), but in the starter template we adjust this to just log a warning so that we can bring the stack up without needing a running Ollama server. * The starter template no longer specifies a default list of models for Ollama, as any models specified there need to actually be pulled and available in Ollama. Instead, it adds a new `OLLAMA_INFERENCE_MODEL` environment variable where users can provide an optional model to register with the Ollama provider on startup. Additional models can also be registered via the typical `models.register(...)` at runtime. * The vLLM template was adjusted to also allow an optional `VLLM_INFERENCE_MODEL` specified on startup, so that the behavior between vLLM and Ollama was consistent here to make it easy to get up and running quickly. * The default vector store was changed from sqlite-vec to faiss. sqlite-vec can enabled via setting the `ENABLE_SQLITE_VEC` environment variable, like we do for chromadb and pgvector. This is due to sqlite-vec not shipping proper arm64 binaries, like we previously fixed in #1530 for the ollama distribution. ## Test Plan With this change, the following scenarios now work with the starter template that did not before: * no Ollama running * Ollama running but not all of the Llama models pulled locally * Ollama running with a custom model registered on startup * vLLM running with a custom model registered on startup * running the starter template on linux/arm64, like when running containers on Mac without rosetta emulation --------- Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-06-25 15:54:00 +02:00
Varsha	cfee63bd0d	feat: Add search_mode support to OpenAI vector store API (#2500 ) Some checks failed Integration Tests / test-matrix (http, 3.13, scoring) (push) Failing after 15s Details Integration Tests / test-matrix (library, 3.12, scoring) (push) Failing after 11s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 7s Details Integration Tests / test-matrix (http, 3.13, post_training) (push) Failing after 17s Details Python Package Build Test / build (3.13) (push) Failing after 5s Details Integration Tests / test-matrix (http, 3.13, providers) (push) Failing after 18s Details Test Llama Stack Build / build-single-provider (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.12, datasets) (push) Failing after 15s Details Integration Tests / test-matrix (library, 3.12, post_training) (push) Failing after 15s Details Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 13s Details Integration Tests / test-matrix (library, 3.13, inspect) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.13, post_training) (push) Failing after 12s Details Integration Tests / test-matrix (library, 3.13, tool_runtime) (push) Failing after 9s Details Integration Tests / test-matrix (http, 3.13, tool_runtime) (push) Failing after 17s Details Unit Tests / unit-tests (3.12) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.13, datasets) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.13, inference) (push) Failing after 13s Details Integration Tests / test-matrix (library, 3.12, inference) (push) Failing after 17s Details Integration Tests / test-matrix (library, 3.13, agents) (push) Failing after 16s Details Integration Tests / test-matrix (library, 3.13, vector_io) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.13, providers) (push) Failing after 9s Details Integration Tests / test-matrix (http, 3.12, vector_io) (push) Failing after 18s Details Integration Tests / test-matrix (library, 3.13, scoring) (push) Failing after 8s Details Unit Tests / unit-tests (3.13) (push) Failing after 8s Details Integration Tests / test-matrix (http, 3.13, datasets) (push) Failing after 19s Details Test Llama Stack Build / build (push) Failing after 5s Details Update ReadTheDocs / update-readthedocs (push) Failing after 44s Details Test External Providers / test-external-providers (venv) (push) Failing after 47s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 50s Details Pre-commit / pre-commit (push) Successful in 2m12s Details # What does this PR do? Add search_mode parameter (vector/keyword/hybrid) to openai_search_vector_store method. Fixes OpenAPI code generation by using str instead of Literal type. Closes: #2459 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>	2025-06-24 20:38:47 -04:00
Ashwin Bharambe	73c18feac4	fix: update the signature of openai_list_files_in_vector_store in all VectorIO impls (#2503 )	2025-06-24 18:55:56 +05:30
Sébastien Han	9c8be89fb6	chore: bump python supported version to 3.12 (#2475 ) Some checks failed Integration Tests / test-matrix (library, 3.12, providers) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 12s Details Integration Tests / test-matrix (library, 3.12, datasets) (push) Failing after 16s Details Test Llama Stack Build / build-single-provider (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.13, inference) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.12, agents) (push) Failing after 7s Details Python Package Build Test / build (3.13) (push) Failing after 5s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 7s Details Integration Tests / test-matrix (http, 3.13, datasets) (push) Failing after 14s Details Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 15s Details Integration Tests / test-matrix (library, 3.13, agents) (push) Failing after 14s Details Integration Tests / test-matrix (library, 3.13, datasets) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.13, vector_io) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.13, scoring) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.13, post_training) (push) Failing after 12s Details Integration Tests / test-matrix (library, 3.12, inference) (push) Failing after 12s Details Integration Tests / test-matrix (http, 3.13, providers) (push) Failing after 13s Details Integration Tests / test-matrix (library, 3.12, vector_io) (push) Failing after 14s Details Integration Tests / test-matrix (library, 3.13, tool_runtime) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.12, post_training) (push) Failing after 11s Details Unit Tests / unit-tests (3.12) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.13, inspect) (push) Failing after 6s Details Update ReadTheDocs / update-readthedocs (push) Failing after 5s Details Unit Tests / unit-tests (3.13) (push) Failing after 8s Details Test Llama Stack Build / build (push) Failing after 6s Details Integration Tests / test-matrix (library, 3.13, providers) (push) Failing after 41s Details Python Package Build Test / build (3.12) (push) Failing after 33s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 36s Details Test External Providers / test-external-providers (venv) (push) Failing after 31s Details Pre-commit / pre-commit (push) Successful in 1m54s Details # What does this PR do? The project now supports Python >= 3.12 Signed-off-by: Sébastien Han <seb@redhat.com>	2025-06-24 09:22:04 +05:30
ehhuang	d3b60507d7	feat: support auth attributes in inference/responses stores (#2389 ) # What does this PR do? Inference/Response stores now store user attributes when inserting, and respects them when fetching. ## Test Plan pytest tests/unit/utils/test_sqlstore.py	2025-06-20 10:24:45 -07:00
Eran Cohen	747e594680	feat: expand set of known gemini models (#2471 ) Some checks failed Test Llama Stack Build / build-single-provider (push) Failing after 39s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 37s Details Python Package Build Test / build (3.12) (push) Failing after 36s Details Test External Providers / test-external-providers (venv) (push) Failing after 45s Details Pre-commit / pre-commit (push) Successful in 1m57s Details Integration Tests / test-matrix (library, 3.12, agents) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.11, vector_io) (push) Failing after 9s Details Integration Tests / test-matrix (http, 3.12, vector_io) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.11, post_training) (push) Failing after 9s Details Integration Tests / test-matrix (http, 3.11, post_training) (push) Failing after 12s Details Integration Tests / test-matrix (library, 3.11, scoring) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.11, tool_runtime) (push) Failing after 6s Details Integration Tests / test-matrix (library, 3.11, inference) (push) Failing after 12s Details Integration Tests / test-matrix (library, 3.12, inference) (push) Failing after 7s Details Integration Tests / test-matrix (http, 3.12, scoring) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.12, datasets) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.12, post_training) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 6s Details Integration Tests / test-matrix (library, 3.12, vector_io) (push) Failing after 6s Details Integration Tests / test-matrix (library, 3.12, scoring) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.12, providers) (push) Failing after 11s Details Test Llama Stack Build / generate-matrix (push) Successful in 9s Details Python Package Build Test / build (3.11) (push) Failing after 7s Details Python Package Build Test / build (3.13) (push) Failing after 6s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 9s Details Unit Tests / unit-tests (3.11) (push) Failing after 5s Details Unit Tests / unit-tests (3.13) (push) Failing after 4s Details Unit Tests / unit-tests (3.12) (push) Failing after 6s Details Test Llama Stack Build / build (push) Failing after 3s Details feat: Add Gemini 2.0 and 2.5 models This commit expands the set of known Gemini models by introducing: - `gemini/gemini-2.0-flash` - `gemini/gemini-2.5-flash` - `gemini/gemini-2.5-pro` These new models are added to `LLM_MODEL_IDS` for broader compatibility and updated in `run.yaml` to allow for their immediate use in starter configurations. Signed-off-by: Eran Cohen <eranco@redhat.com>	2025-06-19 12:19:37 -04:00
Ben Browning	f394c7f2d9	feat: Add missing Vector Store Files API surface (#2468 ) Some checks failed Integration Tests / test-matrix (library, 3.11, inference) (push) Failing after 16s Details Integration Tests / test-matrix (http, 3.11, agents) (push) Failing after 26s Details Integration Tests / test-matrix (http, 3.12, tool_runtime) (push) Failing after 19s Details Python Package Build Test / build (3.11) (push) Failing after 5s Details Integration Tests / test-matrix (library, 3.12, post_training) (push) Failing after 6s Details Python Package Build Test / build (3.12) (push) Failing after 3s Details Integration Tests / test-matrix (http, 3.12, providers) (push) Failing after 18s Details Integration Tests / test-matrix (library, 3.12, providers) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.11, post_training) (push) Failing after 17s Details Integration Tests / test-matrix (library, 3.11, vector_io) (push) Failing after 15s Details Integration Tests / test-matrix (library, 3.11, scoring) (push) Failing after 18s Details Integration Tests / test-matrix (library, 3.12, datasets) (push) Failing after 13s Details Integration Tests / test-matrix (library, 3.12, scoring) (push) Failing after 8s Details Python Package Build Test / build (3.13) (push) Failing after 5s Details Integration Tests / test-matrix (http, 3.11, scoring) (push) Failing after 24s Details Integration Tests / test-matrix (library, 3.11, agents) (push) Failing after 20s Details Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.11, providers) (push) Failing after 15s Details Integration Tests / test-matrix (http, 3.12, datasets) (push) Failing after 21s Details Integration Tests / test-matrix (library, 3.12, inference) (push) Failing after 12s Details Integration Tests / test-matrix (library, 3.12, agents) (push) Failing after 15s Details Integration Tests / test-matrix (http, 3.11, inference) (push) Failing after 22s Details Unit Tests / unit-tests (3.11) (push) Failing after 7s Details Update ReadTheDocs / update-readthedocs (push) Failing after 4s Details Unit Tests / unit-tests (3.12) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.12, vector_io) (push) Failing after 48s Details Test External Providers / test-external-providers (venv) (push) Failing after 43s Details Unit Tests / unit-tests (3.13) (push) Failing after 52s Details Pre-commit / pre-commit (push) Successful in 2m4s Details # What does this PR do? This adds the ability to list, retrieve, update, and delete Vector Store Files. It implements these new APIs for the faiss and sqlite-vec providers, since those are the two that also have the rest of the vector store files implementation. Closes #2445 ## Test Plan ### test_openai_vector_stores Integration Tests There are a number of new integration tests added, which I ran for each provider as outlined below. faiss (from ollama distro): ``` INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \ llama stack run llama_stack/templates/ollama/run.yaml LLAMA_STACK_CONFIG=http://localhost:8321 \ pytest -sv tests/integration/vector_io/test_openai_vector_stores.py \ --embedding-model=all-MiniLM-L6-v2 ``` sqlite-vec (from starter distro): ``` llama stack run llama_stack/templates/starter/run.yaml LLAMA_STACK_CONFIG=http://localhost:8321 \ pytest -sv tests/integration/vector_io/test_openai_vector_stores.py \ --embedding-model=all-MiniLM-L6-v2 ``` ### file_search verification tests I also ensured the file_search verification tests continue to work, both for faiss and sqlite-vec. faiss (ollama distro): ``` INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \ llama stack run llama_stack/templates/ollama/run.yaml pytest -sv tests/verifications/openai_api/test_responses.py \ -k'file_search' \ --base-url=http://localhost:8321/v1/openai/v1 \ --model=meta-llama/Llama-3.2-3B-Instruct ``` sqlite-vec (starter distro): ``` llama stack run llama_stack/templates/starter/run.yaml pytest -sv tests/verifications/openai_api/test_responses.py \ -k'file_search' \ --base-url=http://localhost:8321/v1/openai/v1 \ --model=together/meta-llama/Llama-3.2-3B-Instruct-Turbo ``` --------- Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-06-19 11:08:24 -04:00
Ihar Hrachyshka	a2f054607d	fix: cancel scheduler tasks on shutdown (#2130 ) # What does this PR do? Scheduler: cancel tasks on shutdown. Otherwise the currently running tasks will never exit (before they actually complete), which means the process can't be properly shut down (only with SIGKILL). Ideally, we let tasks know that they are about to shutdown and give them some time to do so; but in the lack of the mechanism, it's better to cancel than linger forever. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan Start a long running task (e.g. torchtune or external kfp-provider training). Ctr-C the process in TTY. Confirm it exits in reasonable time. ``` ^CINFO: Shutting down INFO: Waiting for application shutdown. 13:32:26.187 - INFO - Shutting down 13:32:26.187 - INFO - Shutting down DatasetsRoutingTable 13:32:26.187 - INFO - Shutting down DatasetIORouter 13:32:26.187 - INFO - Shutting down TorchtuneKFPPostTrainingImpl Traceback (most recent call last): File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run return self._loop.run_until_complete(task) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete return future.result() ^^^^^^^^^^^^^^^ asyncio.exceptions.CancelledError During handling of the above exception, another exception occurred: Traceback (most recent call last): File "<frozen runpy>", line 198, in _run_module_as_main File "<frozen runpy>", line 88, in _run_code File "/Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/executor_main.py", line 109, in <module> executor_main() File "/Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/executor_main.py", line 101, in executor_main output_file = executor.execute() ^^^^^^^^^^^^^^^^^^ File "/Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/executor.py", line 361, in execute result = self.func(**func_kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^ File "/var/folders/45/1q1rx6cn7jbcn2ty852w0g_r0000gn/T/tmp.RKpPrvTWDD/ephemeral_component.py", line 118, in component asyncio.run(recipe.setup()) File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run return runner.run(main) ^^^^^^^^^^^^^^^^ File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 123, in run raise KeyboardInterrupt() KeyboardInterrupt 13:32:31.219 - ERROR - Task 'component' finished with status FAILURE ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ INFO 2025-05-09 13:32:31,221 llama_stack.providers.utils.scheduler:221 scheduler: Job test-jobc3c2e1e4-859c-4852-a41d-ef29e55e3efa: Pipeline [1m[95m'test-jobc3c2e1e4-859c-4852-a41d-ef29e55e3efa'[1m[0m finished with status [1m[91mFAILURE[1m[0m. Inner task failed: [1m[96m'component'[1m[0m. ERROR 2025-05-09 13:32:31,223 llama_stack_provider_kfp_trainer.scheduler:54 scheduler: Job test-jobc3c2e1e4-859c-4852-a41d-ef29e55e3efa failed. ╭───────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮ │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/src/llama_stack_provider_kfp_trainer/scheduler.py:45 │ │ in do │ │ │ │ 42 │ │ │ │ │ 43 │ │ │ job.status = JobStatus.running │ │ 44 │ │ │ try: │ │ ❱ 45 │ │ │ │ artifacts = self._to_artifacts(job.handler().output) │ │ 46 │ │ │ │ for artifact in artifacts: │ │ 47 │ │ │ │ │ on_artifact_collected_cb(artifact) │ │ 48 │ │ │ │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/base_compon │ │ ent.py:101 in __call__ │ │ │ │ 98 │ │ │ │ f'{self.name}() missing {len(missing_arguments)} required ' │ │ 99 │ │ │ │ f'{argument_or_arguments}: {arguments}.') │ │ 100 │ │ │ │ ❱ 101 │ │ return pipeline_task.PipelineTask( │ │ 102 │ │ │ component_spec=self.component_spec, │ │ 103 │ │ │ args=task_inputs, │ │ 104 │ │ │ execute_locally=pipeline_context.Pipeline.get_default_pipeline() is │ │ │ │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/pipeline_ta │ │ sk.py:187 in __init__ │ │ │ │ 184 │ │ ]) │ │ 185 │ │ │ │ 186 │ │ if execute_locally: │ │ ❱ 187 │ │ │ self._execute_locally(args=args) │ │ 188 │ │ │ 189 │ def _execute_locally(self, args: Dict[str, Any]) -> None: │ │ 190 │ │ """Execute the pipeline task locally. │ │ │ │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/pipeline_ta │ │ sk.py:197 in _execute_locally │ │ │ │ 194 │ │ from kfp.local import task_dispatcher │ │ 195 │ │ │ │ 196 │ │ if self.pipeline_spec is not None: │ │ ❱ 197 │ │ │ self._outputs = pipeline_orchestrator.run_local_pipeline( │ │ 198 │ │ │ │ pipeline_spec=self.pipeline_spec, │ │ 199 │ │ │ │ arguments=args, │ │ 200 │ │ │ ) │ │ │ │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/local/pipeline_ │ │ orchestrator.py:43 in run_local_pipeline │ │ │ │ 40 │ │ │ 41 │ # validate and access all global state in this function, not downstream │ │ 42 │ config.LocalExecutionConfig.validate() │ │ ❱ 43 │ return _run_local_pipeline_implementation( │ │ 44 │ │ pipeline_spec=pipeline_spec, │ │ 45 │ │ arguments=arguments, │ │ 46 │ │ raise_on_error=config.LocalExecutionConfig.instance.raise_on_error, │ │ │ │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/local/pipeline_ │ │ orchestrator.py:108 in _run_local_pipeline_implementation │ │ │ │ 105 │ │ │ ) │ │ 106 │ │ return outputs │ │ 107 │ elif dag_status == status.Status.FAILURE: │ │ ❱ 108 │ │ log_and_maybe_raise_for_failure( │ │ 109 │ │ │ pipeline_name=pipeline_name, │ │ 110 │ │ │ fail_stack=fail_stack, │ │ 111 │ │ │ raise_on_error=raise_on_error, │ │ │ │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/local/pipeline_ │ │ orchestrator.py:137 in log_and_maybe_raise_for_failure │ │ │ │ 134 │ │ logging_utils.format_task_name(task_name) for task_name in fail_stack) │ │ 135 │ msg = f'Pipeline {pipeline_name_with_color} finished with status │ │ {status_with_color}. Inner task failed: {task_chain_with_color}.' │ │ 136 │ if raise_on_error: │ │ ❱ 137 │ │ raise RuntimeError(msg) │ │ 138 │ with logging_utils.local_logger_context(): │ │ 139 │ │ logging.error(msg) │ │ 140 │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ RuntimeError: Pipeline [1m[95m'test-jobc3c2e1e4-859c-4852-a41d-ef29e55e3efa'[1m[0m finished with status [1m[91mFAILURE[1m[0m. Inner task failed: [1m[96m'component'[1m[0m. INFO 2025-05-09 13:32:31,266 llama_stack.distribution.server.server:136 server: Shutting down DistributionInspectImpl INFO 2025-05-09 13:32:31,266 llama_stack.distribution.server.server:136 server: Shutting down ProviderImpl INFO: Application shutdown complete. INFO: Finished server process [26648] ``` [//]: # (## Documentation) Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-06-19 17:01:33 +02:00
Sébastien Han	c20388c424	ci: add python package build test (#2457 ) # What does this PR do? We now test a package build on every PRs. Closes: https://github.com/meta-llama/llama-stack/issues/2406 Signed-off-by: Sébastien Han <seb@redhat.com>	2025-06-19 18:57:32 +05:30
Charlie Doern	d12f195f56	feat: drop python 3.10 support (#2469 ) # What does this PR do? dropped python3.10, updated pyproject and dependencies, and also removed some blocks of code with special handling for enum.StrEnum Closes #2458 Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-06-19 12:07:14 +05:30
ehhuang	db2cd9e8f3	feat: support filters in file search (#2472 ) # What does this PR do? Move to use vector_stores.search for file search tool in Responses, which supports filters. closes #2435 ## Test Plan Added e2e test with fitlers. myenv ❯ llama stack run llama_stack/templates/fireworks/run.yaml pytest -sv tests/verifications/openai_api/test_responses.py \ -k 'file_search and filters' \ --base-url=http://localhost:8321/v1/openai/v1 \ --model=meta-llama/Llama-3.3-70B-Instruct	2025-06-18 21:50:55 -07:00
Sumit Jaiswal	90d03552d4	feat: To add health check for faiss inline vector_io provider (#2319 ) Some checks failed Integration Tests / test-matrix (library, 3.10, inspect) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.10, providers) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.10, tool_runtime) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.10, scoring) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.10, vector_io) (push) Failing after 13s Details Integration Tests / test-matrix (library, 3.10, inference) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.11, agents) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.11, inference) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.11, inspect) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.11, datasets) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.11, post_training) (push) Failing after 5s Details Integration Tests / test-matrix (library, 3.11, providers) (push) Failing after 5s Details Integration Tests / test-matrix (library, 3.11, scoring) (push) Failing after 5s Details Integration Tests / test-matrix (library, 3.11, tool_runtime) (push) Failing after 4s Details Integration Tests / test-matrix (library, 3.11, vector_io) (push) Failing after 5s Details Integration Tests / test-matrix (library, 3.12, agents) (push) Failing after 4s Details Integration Tests / test-matrix (library, 3.12, datasets) (push) Failing after 6s Details Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 4s Details Integration Tests / test-matrix (library, 3.12, inference) (push) Failing after 6s Details Integration Tests / test-matrix (library, 3.12, post_training) (push) Failing after 4s Details Integration Tests / test-matrix (library, 3.12, providers) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.12, scoring) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.12, vector_io) (push) Failing after 7s Details Test External Providers / test-external-providers (venv) (push) Failing after 1m1s Details Unit Tests / unit-tests (3.11) (push) Failing after 1m11s Details Unit Tests / unit-tests (3.10) (push) Failing after 1m13s Details Unit Tests / unit-tests (3.12) (push) Failing after 1m9s Details Unit Tests / unit-tests (3.13) (push) Failing after 15s Details Pre-commit / pre-commit (push) Successful in 1m52s Details # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> To add health check for faiss inline vector_io provider. I tried adding `async def health(self) -> HealthResponse:` like in inference provider, but it didn't worked for `inline->vector_io->faiss` provider. And via debug logs, I understood the critical issue, that the health responses are being stored with the API name as the key, not as a nested dictionary with provider IDs. This means that all providers of the same API type (e.g., "vector_io") will share the same health response, and only the last one processed will be visible in the API response. I've created a patch file that fixes this issue by: - Storing the original get_providers_health method - Creating a patched version that correctly maps health responses to providers - Applying the patch to the `ProviderImpl` class Not an expert, so please let me know, if there can be any other workaround using which I can get the health status updated directly from `faiss.py`. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> Added unit tests to test the provider patch implementation in the PR. Adding a screenshot with the FAISS inline vector_io health status as "OK" ![faiss_health_check](https://github.com/user-attachments/assets/d769e762-890c-41ea-a596-5e90951f79a4)	2025-06-18 17:56:25 +02:00
ehhuang	15f630e5da	feat: support pagination in inference/responses stores (#2397 ) Some checks failed Integration Tests / test-matrix (http, 3.12, agents) (push) Failing after 23s Details Integration Tests / test-matrix (library, 3.11, datasets) (push) Failing after 5s Details Integration Tests / test-matrix (library, 3.10, vector_io) (push) Failing after 7s Details Integration Tests / test-matrix (http, 3.10, vector_io) (push) Failing after 27s Details Integration Tests / test-matrix (http, 3.12, vector_io) (push) Failing after 19s Details Integration Tests / test-matrix (library, 3.10, post_training) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.10, tool_runtime) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.11, inspect) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.11, inference) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.11, providers) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.11, tool_runtime) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.11, scoring) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.11, agents) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.11, post_training) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.12, agents) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.11, vector_io) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.12, inference) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.12, datasets) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.12, post_training) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.12, scoring) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.12, providers) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.12, vector_io) (push) Failing after 44s Details Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 46s Details Test External Providers / test-external-providers (venv) (push) Failing after 41s Details Unit Tests / unit-tests (3.10) (push) Failing after 52s Details Unit Tests / unit-tests (3.12) (push) Failing after 18s Details Unit Tests / unit-tests (3.11) (push) Failing after 20s Details Unit Tests / unit-tests (3.13) (push) Failing after 16s Details Pre-commit / pre-commit (push) Successful in 2m0s Details # What does this PR do? ## Test Plan added unit tests	2025-06-16 22:43:35 -07:00
Varsha	6f1a935365	chore: Add OpenAI compatiblity for vLLM embeddings (#2448 ) Some checks failed Integration Tests / test-matrix (http, 3.12, vector_io) (push) Failing after 16s Details Integration Tests / test-matrix (library, 3.10, inspect) (push) Failing after 14s Details Integration Tests / test-matrix (library, 3.11, tool_runtime) (push) Failing after 9s Details Integration Tests / test-matrix (http, 3.11, scoring) (push) Failing after 24s Details Integration Tests / test-matrix (library, 3.12, datasets) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.11, scoring) (push) Failing after 6s Details Integration Tests / test-matrix (http, 3.11, datasets) (push) Failing after 26s Details Integration Tests / test-matrix (http, 3.11, inference) (push) Failing after 25s Details Integration Tests / test-matrix (library, 3.11, datasets) (push) Failing after 12s Details Integration Tests / test-matrix (library, 3.11, inference) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.12, agents) (push) Failing after 12s Details Integration Tests / test-matrix (library, 3.12, inference) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.12, post_training) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.12, providers) (push) Failing after 6s Details Integration Tests / test-matrix (library, 3.12, scoring) (push) Failing after 16s Details Test Llama Stack Build / generate-matrix (push) Successful in 13s Details Test External Providers / test-external-providers (venv) (push) Failing after 3s Details Unit Tests / unit-tests (3.11) (push) Failing after 4s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 8s Details Unit Tests / unit-tests (3.13) (push) Failing after 4s Details Test Llama Stack Build / build (push) Failing after 5s Details Update ReadTheDocs / update-readthedocs (push) Failing after 5s Details Unit Tests / unit-tests (3.12) (push) Failing after 8s Details Unit Tests / unit-tests (3.10) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.12, vector_io) (push) Failing after 47s Details Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 49s Details Test Llama Stack Build / build-single-provider (push) Failing after 38s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 43s Details Pre-commit / pre-commit (push) Successful in 1m38s Details # What does this PR do? - Implement OpenAI-compatible embeddings endpoint in vLLM provider - Support both float and base64 encoding formats - Add proper error handling and response formatting <!-- If resolving an issue, uncomment and update the line below --> Closes #2447 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>	2025-06-16 19:06:05 -04:00
Jash Gulabrai	40e2c97915	feat: Add Nvidia e2e beginner notebook and tool calling notebook (#1964 ) # What does this PR do? This PR contains two sets of notebooks that serve as reference material for developers getting started with Llama Stack using the NVIDIA Provider. Developers should be able to execute these notebooks end-to-end, pointing to their NeMo Microservices deployment. 1. `beginner_e2e/`: Notebook that walks through a beginner end-to-end workflow that covers creating datasets, running inference, customizing and evaluating models, and running safety checks. 2. `tool_calling/`: Notebook that is ported over from the [Data Flywheel & Tool Calling notebook](https://github.com/NVIDIA/GenerativeAIExamples/tree/main/nemo/data-flywheel) that is referenced in the NeMo Microservices docs. I updated the notebook to use the Llama Stack client wherever possible, and added relevant instructions. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan - Both notebook folders contain READMEs with pre-requisites. To manually test these notebooks, you'll need to have a deployment of the NeMo Microservices Platform and update the `config.py` file with your deployment's information. - I've run through these notebooks manually end-to-end to verify each step works. [//]: # (## Documentation) --------- Co-authored-by: Jash Gulabrai <jgulabrai@nvidia.com>	2025-06-16 11:29:01 -04:00
Hardik Shah	985d0b156c	feat: Add `suffix` to openai_completions (#2449 ) Some checks failed Integration Tests / test-matrix (library, 3.10, inspect) (push) Failing after 9s Details Integration Tests / test-matrix (http, 3.11, providers) (push) Failing after 5s Details Integration Tests / test-matrix (library, 3.10, providers) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.10, post_training) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.10, scoring) (push) Failing after 13s Details Integration Tests / test-matrix (library, 3.11, agents) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.11, inference) (push) Failing after 6s Details Integration Tests / test-matrix (library, 3.11, datasets) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.11, inspect) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.11, post_training) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.11, providers) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.11, scoring) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.11, vector_io) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.11, tool_runtime) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.12, agents) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.12, datasets) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.12, inference) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.12, providers) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.12, post_training) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.12, scoring) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 9s Details Test External Providers / test-external-providers (venv) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.12, vector_io) (push) Failing after 14s Details Unit Tests / unit-tests (3.10) (push) Failing after 19s Details Unit Tests / unit-tests (3.11) (push) Failing after 20s Details Unit Tests / unit-tests (3.12) (push) Failing after 18s Details Unit Tests / unit-tests (3.13) (push) Failing after 16s Details Update ReadTheDocs / update-readthedocs (push) Failing after 8s Details Pre-commit / pre-commit (push) Successful in 58s Details For code completion apps need "fill in the middle" capabilities. Added option of `suffix` to `openai_completion` to enable this. Updated ollama provider to showcase the same. ### Test Plan ``` pytest -sv --stack-config="inference=ollama" tests/integration/inference/test_openai_completion.py --text-model qwen2.5-coder:1.5b -k test_openai_completion_non_streaming_suffix ``` ### OpenAI Sample script ``` from openai import OpenAI client = OpenAI(base_url="http://localhost:8321/v1/openai/v1") response = client.completions.create( model="qwen2.5-coder:1.5b", prompt="The capital of ", suffix="is Paris.", max_tokens=10, ) print(response.choices[0].text) ``` ### Output ``` France is ____. To answer this question, we ```	2025-06-13 16:06:06 -07:00
Varsha	2e8054bede	feat: Implement hybrid search in SQLite-vec (#2312 ) Some checks failed Integration Tests / test-matrix (library, 3.11, agents) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.11, datasets) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.11, inspect) (push) Failing after 15s Details Integration Tests / test-matrix (library, 3.11, inference) (push) Failing after 16s Details Integration Tests / test-matrix (library, 3.11, vector_io) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.11, post_training) (push) Failing after 25s Details Integration Tests / test-matrix (library, 3.11, providers) (push) Failing after 24s Details Integration Tests / test-matrix (library, 3.11, scoring) (push) Failing after 22s Details Integration Tests / test-matrix (library, 3.11, tool_runtime) (push) Failing after 14s Details Integration Tests / test-matrix (library, 3.12, agents) (push) Failing after 6s Details Integration Tests / test-matrix (library, 3.12, datasets) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.12, inference) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.12, post_training) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.12, providers) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.12, scoring) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.12, vector_io) (push) Failing after 41s Details Test Llama Stack Build / generate-matrix (push) Successful in 37s Details Test Llama Stack Build / build-single-provider (push) Failing after 37s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 35s Details Test External Providers / test-external-providers (venv) (push) Failing after 5s Details Update ReadTheDocs / update-readthedocs (push) Failing after 5s Details Unit Tests / unit-tests (3.11) (push) Failing after 6s Details Unit Tests / unit-tests (3.12) (push) Failing after 6s Details Unit Tests / unit-tests (3.13) (push) Failing after 6s Details Test Llama Stack Build / build (push) Failing after 7s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 18s Details Unit Tests / unit-tests (3.10) (push) Failing after 17s Details Pre-commit / pre-commit (push) Successful in 2m0s Details # What does this PR do? Add support for hybrid search mode in SQLite-vec provider, which combines keyword and vector search for better results. The implementation: - Adds hybrid search mode as a new option alongside vector and keyword search - Implements query_hybrid method in SQLiteVecIndex that: - First performs keyword search to get candidate matches - Then applies vector similarity search on those candidates - Updates documentation to reflect the new search mode This change improves search quality by leveraging both semantic similarity and keyword matching, while maintaining backward compatibility with existing vector and keyword search modes. ## Test Plan ``` pytest tests/unit/providers/vector_io/test_sqlite_vec.py -v -s --tb=short /Users/vnarsing/miniconda3/envs/stack-client/lib/python3.10/site-packages/pytest_asyncio/plugin.py:217: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset. The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session" warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET)) =============================================================================================== test session starts =============================================================================================== platform darwin -- Python 3.10.16, pytest-8.3.5, pluggy-1.5.0 -- /Users/vnarsing/miniconda3/envs/stack-client/bin/python cachedir: .pytest_cache metadata: {'Python': '3.10.16', 'Platform': 'macOS-14.7.6-arm64-arm-64bit', 'Packages': {'pytest': '8.3.5', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'anyio': '4.8.0', 'asyncio': '0.26.0', 'nbval': '0.11.0', 'cov': '6.1.1'}} rootdir: /Users/vnarsing/go/src/github/meta-llama/llama-stack configfile: pyproject.toml plugins: html-4.1.1, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, anyio-4.8.0, asyncio-0.26.0, nbval-0.11.0, cov-6.1.1 asyncio: mode=strict, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function collected 10 items tests/unit/providers/vector_io/test_sqlite_vec.py::test_add_chunks PASSED tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_vector PASSED tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_full_text_search PASSED tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid PASSED tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_full_text_search_k_greater_than_results PASSED tests/unit/providers/vector_io/test_sqlite_vec.py::test_chunk_id_conflict PASSED tests/unit/providers/vector_io/test_sqlite_vec.py::test_generate_chunk_id PASSED tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_no_keyword_matches PASSED tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_score_threshold PASSED tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_different_embedding PASSED ``` --------- Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>	2025-06-13 15:54:06 -04:00
Ben Browning	941f505eb0	feat: File search tool for Responses API (#2426 ) # What does this PR do? This is an initial working prototype of wiring up the `file_search` builtin tool for the Responses API to our existing rag knowledge search tool. This is me seeing what I could pull together on top of the bits we already have merged. This may not be the ideal way to implement this, and things like how I shuffle the vector store ids from the original response API tool request to the actual tool execution feel a bit hacky (grep for `tool_kwargs["vector_db_ids"]` in `_execute_tool_call` to see what I mean). ## Test Plan I stubbed in some new tests to exercise this using text and pdf documents. Note that this is currently under tests/verification only because it sometimes flakes with tool calling of the small Llama-3.2-3B model we run in CI (and that I use as an example below). We'd want to make the test a bit more robust in some way if we moved this over to tests/integration and ran it in CI. ### OpenAI SaaS (to verify test correctness) ``` pytest -sv tests/verifications/openai_api/test_responses.py \ -k 'file_search' \ --base-url=https://api.openai.com/v1 \ --model=gpt-4o ``` ### Fireworks with faiss vector store ``` llama stack run llama_stack/templates/fireworks/run.yaml pytest -sv tests/verifications/openai_api/test_responses.py \ -k 'file_search' \ --base-url=http://localhost:8321/v1/openai/v1 \ --model=meta-llama/Llama-3.3-70B-Instruct ``` ### Ollama with faiss vector store This sometimes flakes on Ollama because the quantized small model doesn't always choose to call the tool to answer the user's question. But, it often works. ``` ollama run llama3.2:3b INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \ llama stack run ./llama_stack/templates/ollama/run.yaml \ --image-type venv \ --env OLLAMA_URL="http://0.0.0.0:11434" pytest -sv tests/verifications/openai_api/test_responses.py \ -k'file_search' \ --base-url=http://localhost:8321/v1/openai/v1 \ --model=meta-llama/Llama-3.2-3B-Instruct ``` ### OpenAI provider with sqlite-vec vector store ``` llama stack run ./llama_stack/templates/starter/run.yaml --image-type venv pytest -sv tests/verifications/openai_api/test_responses.py \ -k 'file_search' \ --base-url=http://localhost:8321/v1/openai/v1 \ --model=openai/gpt-4o-mini ``` ### Ensure existing vector store integration tests still pass ``` ollama run llama3.2:3b INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \ llama stack run ./llama_stack/templates/ollama/run.yaml \ --image-type venv \ --env OLLAMA_URL="http://0.0.0.0:11434" LLAMA_STACK_CONFIG=http://localhost:8321 \ pytest -sv tests/integration/vector_io \ --text-model "meta-llama/Llama-3.2-3B-Instruct" \ --embedding-model=all-MiniLM-L6-v2 ``` --------- Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-06-13 14:32:48 -04:00
Francisco Arceo	554ada57b0	chore: Add OpenAI compatibility for Ollama embeddings (#2440 ) # What does this PR do? This PR adds OpenAI compatibility for Ollama embeddings. Closes https://github.com/meta-llama/llama-stack/issues/2428 Summary of changes: - `llama_stack/providers/remote/inference/ollama/ollama.py` - Implements the OpenAI embeddings endpoint for Ollama, replacing the NotImplementedError with a full function that validates the model, prepares parameters, calls the client, encodes embedding data (optionally in base64), and returns a correctly structured response. - Updates import statements to include the new embedding response utilities. - `llama_stack/providers/utils/inference/litellm_openai_mixin.py` - Refactors the embedding data encoding logic to use a new shared utility (`b64_encode_openai_embeddings_response`) instead of inline base64 encoding and packing logic. - Cleans up imports accordingly. - `llama_stack/providers/utils/inference/openai_compat.py` - Adds `b64_encode_openai_embeddings_response` to handle encoding OpenAI embedding outputs (including base64 support) in a reusable way. - Adds `prepare_openai_embeddings_params` utility for standardizing embedding parameter preparation. - Updates imports to include the new embedding data class. - `tests/integration/inference/test_openai_embeddings.py` - Removes `"remote::ollama"` from the list of providers that skip OpenAI embeddings tests, since support is now implemented. ## Note There was one minor issue, which required me to override the `OpenAIEmbeddingsResponse.model` name with `self._get_model(model).identifier` name, which is very unsatisfying. ## Test Plan Unit Tests and integration tests --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-06-13 14:28:51 -04:00
Hardik Shah	fef670b024	feat: update openai tests to work with both clients (#2442 ) Some checks failed Integration Tests / test-matrix (http, 3.11, post_training) (push) Failing after 18s Details Integration Tests / test-matrix (http, 3.11, inference) (push) Failing after 22s Details Integration Tests / test-matrix (http, 3.11, providers) (push) Failing after 20s Details Integration Tests / test-matrix (library, 3.10, agents) (push) Failing after 15s Details Integration Tests / test-matrix (library, 3.10, tool_runtime) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.11, agents) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.11, scoring) (push) Failing after 6s Details Integration Tests / test-matrix (library, 3.11, inference) (push) Failing after 7s Details Integration Tests / test-matrix (http, 3.12, scoring) (push) Failing after 18s Details Integration Tests / test-matrix (library, 3.11, datasets) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.10, scoring) (push) Failing after 13s Details Integration Tests / test-matrix (library, 3.12, datasets) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.11, post_training) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.11, inspect) (push) Failing after 9s Details Test External Providers / test-external-providers (venv) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.12, providers) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.12, post_training) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.12, agents) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.12, inference) (push) Failing after 6s Details Integration Tests / test-matrix (library, 3.11, tool_runtime) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.11, providers) (push) Failing after 12s Details Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 11s Details Unit Tests / unit-tests (3.11) (push) Failing after 9s Details Unit Tests / unit-tests (3.13) (push) Failing after 6s Details Integration Tests / test-matrix (library, 3.12, scoring) (push) Failing after 1m45s Details Update ReadTheDocs / update-readthedocs (push) Failing after 1m46s Details Unit Tests / unit-tests (3.12) (push) Failing after 2m1s Details Unit Tests / unit-tests (3.10) (push) Failing after 2m3s Details Pre-commit / pre-commit (push) Successful in 3m11s Details https://github.com/meta-llama/llama-stack-client-python/pull/238 updated llama-stack-client to also support Open AI endpoints for embeddings, files, vector-stores. This updates the test to test all configs -- openai sdk, llama stack sdk and library-as-client.	2025-06-12 16:30:23 -07:00
Hardik Shah	0bc1747ed8	feat: update search for vector_stores (#2441 ) Updated the `search` functionality return response to match openai. ## Test Plan ``` pytest -sv --stack-config=http://localhost:8321 tests/integration/vector_io/test_openai_vector_stores.py --embedding-model all-MiniLM-L6-v2 ```	2025-06-12 15:34:22 -07:00
Ibrahim Haroon	35c2817d0a	fix(weaviate): handle case where distance is 0 by setting score to infinity (#2415 ) Some checks failed Integration Tests / test-matrix (library, 3.11, providers) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.11, post_training) (push) Failing after 9s Details Integration Tests / test-matrix (http, 3.11, tool_runtime) (push) Failing after 41s Details Integration Tests / test-matrix (library, 3.11, scoring) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.10, inspect) (push) Failing after 39s Details Integration Tests / test-matrix (http, 3.12, providers) (push) Failing after 41s Details Integration Tests / test-matrix (library, 3.11, tool_runtime) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.11, inspect) (push) Failing after 7s Details Integration Tests / test-matrix (http, 3.12, datasets) (push) Failing after 42s Details Integration Tests / test-matrix (library, 3.10, inference) (push) Failing after 38s Details Integration Tests / test-matrix (http, 3.10, providers) (push) Failing after 46s Details Integration Tests / test-matrix (http, 3.11, inspect) (push) Failing after 44s Details Integration Tests / test-matrix (http, 3.11, agents) (push) Failing after 42s Details Integration Tests / test-matrix (http, 3.11, datasets) (push) Failing after 43s Details Integration Tests / test-matrix (library, 3.12, datasets) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.12, inference) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.12, agents) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.12, post_training) (push) Failing after 12s Details Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 9s Details Integration Tests / test-matrix (http, 3.12, tool_runtime) (push) Failing after 40s Details Integration Tests / test-matrix (http, 3.12, post_training) (push) Failing after 39s Details Integration Tests / test-matrix (library, 3.12, providers) (push) Failing after 15s Details Test External Providers / test-external-providers (venv) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.12, scoring) (push) Failing after 15s Details Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 14s Details Unit Tests / unit-tests (3.12) (push) Failing after 9s Details Unit Tests / unit-tests (3.10) (push) Failing after 1m3s Details Unit Tests / unit-tests (3.11) (push) Failing after 1m12s Details Unit Tests / unit-tests (3.13) (push) Failing after 1m10s Details Pre-commit / pre-commit (push) Successful in 2m23s Details # What does this PR do? Fixes provider weaviate `query_vector` function for when the distance between the query embedding and an embedding within the vector db is 0 (identical vectors). Catches `ZeroDivisionError` and then sets `score` to infinity, which represent maximum similarity. <!-- If resolving an issue, uncomment and update the line below --> Closes [#2381] ## Test Plan Checkout this PR Execute this code and there will no longer be a `ZeroDivisionError` exception ``` from llama_stack_client import LlamaStackClient base_url = "http://localhost:8321" client = LlamaStackClient(base_url=base_url) models = client.models.list() embedding_model = ( em := next(m for m in models if m.model_type == "embedding") ).identifier embedding_dimension = 384 _ = client.vector_dbs.register( vector_db_id="foo_db", embedding_model=embedding_model, embedding_dimension=embedding_dimension, provider_id="weaviate", ) chunk = { "content": "foo", "mime_type": "text/plain", "metadata": { "document_id": "foo-id" } } client.vector_io.insert(vector_db_id="foo_db", chunks=[chunk]) client.vector_io.query(vector_db_id="foo_db", query="foo") ```	2025-06-12 11:23:59 -04:00
Hardik Shah	de37a04c3e	fix: set appropriate defaults for params (#2434 ) Some checks failed Integration Tests / test-matrix (http, 3.11, post_training) (push) Failing after 15s Details Integration Tests / test-matrix (library, 3.10, scoring) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.10, inspect) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.11, datasets) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.10, datasets) (push) Failing after 17s Details Integration Tests / test-matrix (library, 3.11, inspect) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.10, agents) (push) Failing after 12s Details Integration Tests / test-matrix (library, 3.10, tool_runtime) (push) Failing after 14s Details Integration Tests / test-matrix (library, 3.11, inference) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.10, post_training) (push) Failing after 19s Details Integration Tests / test-matrix (library, 3.11, providers) (push) Failing after 12s Details Integration Tests / test-matrix (library, 3.11, agents) (push) Failing after 16s Details Integration Tests / test-matrix (library, 3.11, post_training) (push) Failing after 13s Details Integration Tests / test-matrix (library, 3.12, agents) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.11, tool_runtime) (push) Failing after 17s Details Integration Tests / test-matrix (library, 3.11, scoring) (push) Failing after 19s Details Integration Tests / test-matrix (library, 3.12, datasets) (push) Failing after 15s Details Integration Tests / test-matrix (library, 3.12, inference) (push) Failing after 13s Details Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 13s Details Integration Tests / test-matrix (library, 3.12, post_training) (push) Failing after 14s Details Integration Tests / test-matrix (library, 3.12, providers) (push) Failing after 12s Details Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 13s Details Integration Tests / test-matrix (library, 3.12, scoring) (push) Failing after 15s Details Test External Providers / test-external-providers (venv) (push) Failing after 20s Details Update ReadTheDocs / update-readthedocs (push) Failing after 17s Details Unit Tests / unit-tests (3.12) (push) Failing after 20s Details Unit Tests / unit-tests (3.11) (push) Failing after 1m39s Details Unit Tests / unit-tests (3.13) (push) Failing after 1m37s Details Unit Tests / unit-tests (3.10) (push) Failing after 1m41s Details Pre-commit / pre-commit (push) Failing after 3h4m8s Details Setting defaults to be `\| None` else they get marked as required params in open-api spec.	2025-06-11 17:30:34 -07:00
Hardik Shah	d55100d9b7	feat: OpenAIVectorIOMixin for vector_stores common logic (#2427 ) Extracts common OpenAI vector-store code into its own mixin so that all providers can share the same core logic. This also makes it easy for Llama Stack to support both vector-stores and Llama Stack APIs in the interim so that both share the same underlying vector-dbs. Each provider contains storage specific logic to `create / edit / delete / list` vector dbs while the plumbing logic is standardized in the common code. Ensured that this works well with both faiss and sqllite-vec. ### Test Plan ``` llama stack run starter pytest -sv --stack-config http://localhost:8321 tests/integration/vector_io/test_openai_vector_stores.py --embedding-model all-MiniLM-L6-v2 ```	2025-06-11 15:40:57 -07:00
Hardik Shah	5ac43268e8	feat: Add OpenAI compat /v1/vector_store APIs (#2423 ) Some checks failed Integration Tests / test-matrix (library, 3.10, providers) (push) Failing after 12s Details Integration Tests / test-matrix (library, 3.10, scoring) (push) Failing after 11s Details Integration Tests / test-matrix (http, 3.10, post_training) (push) Failing after 41s Details Integration Tests / test-matrix (library, 3.10, datasets) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.10, post_training) (push) Failing after 13s Details Integration Tests / test-matrix (http, 3.10, tool_runtime) (push) Failing after 46s Details Integration Tests / test-matrix (library, 3.10, tool_runtime) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.11, agents) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.11, inference) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.11, post_training) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.11, datasets) (push) Failing after 14s Details Integration Tests / test-matrix (library, 3.11, inspect) (push) Failing after 12s Details Integration Tests / test-matrix (library, 3.11, providers) (push) Failing after 12s Details Integration Tests / test-matrix (library, 3.11, tool_runtime) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.11, scoring) (push) Failing after 14s Details Integration Tests / test-matrix (library, 3.12, agents) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.12, inference) (push) Failing after 7s Details Integration Tests / test-matrix (library, 3.12, datasets) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.12, scoring) (push) Failing after 5s Details Integration Tests / test-matrix (library, 3.12, post_training) (push) Failing after 16s Details Test External Providers / test-external-providers (venv) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.12, providers) (push) Failing after 15s Details Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 13s Details Update ReadTheDocs / update-readthedocs (push) Failing after 8s Details Unit Tests / unit-tests (3.13) (push) Failing after 11s Details Unit Tests / unit-tests (3.12) (push) Failing after 1m31s Details Unit Tests / unit-tests (3.11) (push) Failing after 1m33s Details Unit Tests / unit-tests (3.10) (push) Failing after 1m35s Details Pre-commit / pre-commit (push) Failing after 3h13m41s Details Adding OpenAI compat `/v1/vector-store` apis. This PR implements the `faiss` provider with followup PRs coming up for other providers. Added routes to create, update, delete, list vector stores. Also added route to search a vector store Inserting into vector stores is missing and will be a follow up diff. ### Test Plan - Added new integration test for testing the faiss provider ``` pytest -sv --stack-config http://localhost:8321 tests/integration/vector_io/test_openai_vector_stores.py --embedding-model all-MiniLM-L6-v2 ```	2025-06-10 13:07:39 -07:00
Ibrahim Haroon	28ca00d0d9	fix(pgvector): handle case where distance is 0 by setting score to infinity (#2416 ) Some checks failed Integration Tests / test-matrix (library, 3.10, datasets) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.10, inference) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.10, inspect) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.10, post_training) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.10, providers) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.10, scoring) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.10, tool_runtime) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.11, agents) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.11, datasets) (push) Failing after 12s Details Integration Tests / test-matrix (library, 3.11, inference) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.11, inspect) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.11, providers) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.11, post_training) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.11, scoring) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.11, tool_runtime) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.12, datasets) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.12, agents) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.12, inference) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.12, post_training) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.12, scoring) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.12, providers) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 10s Details Test External Providers / test-external-providers (venv) (push) Failing after 6s Details Unit Tests / unit-tests (3.11) (push) Failing after 7s Details Unit Tests / unit-tests (3.10) (push) Failing after 9s Details Unit Tests / unit-tests (3.12) (push) Failing after 8s Details Update ReadTheDocs / update-readthedocs (push) Failing after 6s Details Unit Tests / unit-tests (3.13) (push) Failing after 8s Details Pre-commit / pre-commit (push) Successful in 57s Details # What does this PR do? Fixes provider pgvector `query_vector` function for when the distance between the query embedding and an embedding within the vector db is 0 (identical vectors). Catches `ZeroDivisionError` and then sets `score` to infinity, which represent maximum similarity. <!-- If resolving an issue, uncomment and update the line below --> Closes [#2381] ## Test Plan Checkout this PR Execute this code and there will no longer be a `ZeroDivisionError` exception ``` from llama_stack_client import LlamaStackClient base_url = "http://localhost:8321" client = LlamaStackClient(base_url=base_url) models = client.models.list() embedding_model = ( em := next(m for m in models if m.model_type == "embedding") ).identifier embedding_dimension = 384 _ = client.vector_dbs.register( vector_db_id="foo_db", embedding_model=embedding_model, embedding_dimension=embedding_dimension, provider_id="pgvector", ) chunk = { "content": "foo", "mime_type": "text/plain", "metadata": { "document_id": "foo-id" } } client.vector_io.insert(vector_db_id="foo_db", chunks=[chunk]) client.vector_io.query(vector_db_id="foo_db", query="foo") ```	2025-06-07 16:31:30 -04:00
Ibrahim Haroon	a34cef925b	fix(faiss): handle case where distance is 0 by setting d to minimum positive… (#2387 ) # What does this PR do? Adds try-catch to faiss `query_vector` function for when the distance between the query embedding and an embedding within the vector db is 0 (identical vectors). Catches `ZeroDivisionError` and then appends `(1.0 / sys.float_info.min)` to `scores` to represent maximum similarity. <!-- If resolving an issue, uncomment and update the line below --> Closes [#2381] ## Test Plan Checkout this PR Execute this code and there will no longer be a `ZeroDivisionError` exception ``` from llama_stack_client import LlamaStackClient base_url = "http://localhost:8321" client = LlamaStackClient(base_url=base_url) models = client.models.list() embedding_model = ( em := next(m for m in models if m.model_type == "embedding") ).identifier embedding_dimension = 384 _ = client.vector_dbs.register( vector_db_id="foo_db", embedding_model=embedding_model, embedding_dimension=embedding_dimension, provider_id="faiss", ) chunk = { "content": "foo", "mime_type": "text/plain", "metadata": { "document_id": "foo-id" } } client.vector_io.insert(vector_db_id="foo_db", chunks=[chunk]) client.vector_io.query(vector_db_id="foo_db", query="foo") ``` ### Running unit tests `uv run pytest tests/unit/rag/test_rag_query.py -v` --------- Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Ben Browning <bbrownin@redhat.com>	2025-06-07 16:09:46 -04:00
Sumit Jaiswal	33ecefd284	feat: To add health status check for remote VLLM (#2303 ) Some checks failed Integration Tests / test-matrix (library, 3.10, agents) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.10, datasets) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.10, inference) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.10, inspect) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.10, post_training) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.10, providers) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.10, scoring) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.11, agents) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.10, tool_runtime) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.11, datasets) (push) Failing after 11s Details Integration Tests / test-matrix (library, 3.11, inference) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.11, inspect) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.11, post_training) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.11, scoring) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.11, providers) (push) Failing after 15s Details Integration Tests / test-matrix (library, 3.12, agents) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.11, tool_runtime) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.12, datasets) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.12, inference) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.12, providers) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.12, post_training) (push) Failing after 9s Details Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 10s Details Integration Tests / test-matrix (library, 3.12, scoring) (push) Failing after 9s Details Test External Providers / test-external-providers (venv) (push) Failing after 7s Details Unit Tests / unit-tests (3.10) (push) Failing after 8s Details Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 11s Details Unit Tests / unit-tests (3.11) (push) Failing after 9s Details Unit Tests / unit-tests (3.13) (push) Failing after 8s Details Unit Tests / unit-tests (3.12) (push) Failing after 8s Details Pre-commit / pre-commit (push) Successful in 56s Details # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> To add health status check for remote VLLM <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> PR includes the unit test to test the added health check implementation feature.	2025-06-06 15:33:12 -04:00
Hardik Shah	1f48577a02	fix: ChromaDB provider (#2413 ) fixes the remote::chromaDB provider for vector_io by updating the method definition appropriately. Fixed impl to use score_threshold properly. ### Test Plan ``` # Start Chroma Docker docker run --rm \ --name chromadb \ -p 8800:8000 \ -v ~/chroma:/chroma/chroma \ -e IS_PERSISTENT=TRUE \ -e ANONYMIZED_TELEMETRY=FALSE \ chromadb/chroma:latest # run pytest CHROMADB_URL="http://localhost:8800" pytest -sv tests/integration/vector_io/test_vector_io.py --stack-config vector_io=remote::chromadb,inference=fireworks --embedding-model nomic-ai/nomic-embed-text-v1.5 ```	2025-06-06 11:25:58 -07:00
ehhuang	446893f791	feat: add deps dynamically based on metastore config (#2405 ) # What does this PR do? ## Test Plan changed metastore in one of the templates, rerun distro gen, observe change in build.yaml	2025-06-05 14:07:25 -07:00
Ashwin Bharambe	3251b44d8a	refactor: unify stream and non-stream impls for responses (#2388 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 3s Details Integration Tests / test-matrix (http, datasets) (push) Failing after 9s Details Integration Tests / test-matrix (http, agents) (push) Failing after 10s Details Integration Tests / test-matrix (http, inference) (push) Failing after 9s Details Integration Tests / test-matrix (http, inspect) (push) Failing after 8s Details Integration Tests / test-matrix (http, post_training) (push) Failing after 9s Details Integration Tests / test-matrix (http, providers) (push) Failing after 10s Details Integration Tests / test-matrix (http, scoring) (push) Failing after 9s Details Integration Tests / test-matrix (library, agents) (push) Failing after 9s Details Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 10s Details Integration Tests / test-matrix (library, datasets) (push) Failing after 10s Details Integration Tests / test-matrix (library, inspect) (push) Failing after 9s Details Integration Tests / test-matrix (library, inference) (push) Failing after 9s Details Integration Tests / test-matrix (library, post_training) (push) Failing after 10s Details Integration Tests / test-matrix (library, providers) (push) Failing after 9s Details Integration Tests / test-matrix (library, scoring) (push) Failing after 9s Details Test External Providers / test-external-providers (venv) (push) Failing after 7s Details Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 11s Details Unit Tests / unit-tests (3.11) (push) Failing after 8s Details Unit Tests / unit-tests (3.12) (push) Failing after 7s Details Unit Tests / unit-tests (3.13) (push) Failing after 9s Details Unit Tests / unit-tests (3.10) (push) Failing after 30s Details Pre-commit / pre-commit (push) Successful in 1m18s Details The non-streaming version is just a small layer on top of the streaming version - just pluck off the final `response.completed` event and return that as the response! This PR also includes a couple other changes which I ended up making while working on it on a flight: - changes to `ollama` so it does not pull embedding models unconditionally - a small fix to library client to make the stream and non-stream cases a bit more symmetric	2025-06-05 17:48:09 +02:00
Ashwin Bharambe	ed69c1b3cc	feat(responses): add more streaming response types (#2375 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s Details Integration Tests / test-matrix (http, agents) (push) Failing after 9s Details Integration Tests / test-matrix (http, scoring) (push) Failing after 8s Details Integration Tests / test-matrix (http, inspect) (push) Failing after 9s Details Integration Tests / test-matrix (http, post_training) (push) Failing after 10s Details Integration Tests / test-matrix (library, datasets) (push) Failing after 9s Details Integration Tests / test-matrix (http, datasets) (push) Failing after 11s Details Integration Tests / test-matrix (library, agents) (push) Failing after 9s Details Integration Tests / test-matrix (http, inference) (push) Failing after 11s Details Integration Tests / test-matrix (http, providers) (push) Failing after 10s Details Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 9s Details Integration Tests / test-matrix (library, inference) (push) Failing after 7s Details Integration Tests / test-matrix (library, inspect) (push) Failing after 7s Details Test External Providers / test-external-providers (venv) (push) Failing after 7s Details Integration Tests / test-matrix (library, providers) (push) Failing after 7s Details Integration Tests / test-matrix (library, post_training) (push) Failing after 9s Details Unit Tests / unit-tests (3.10) (push) Failing after 7s Details Integration Tests / test-matrix (library, scoring) (push) Failing after 9s Details Unit Tests / unit-tests (3.13) (push) Failing after 7s Details Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 10s Details Update ReadTheDocs / update-readthedocs (push) Failing after 6s Details Unit Tests / unit-tests (3.11) (push) Failing after 9s Details Unit Tests / unit-tests (3.12) (push) Failing after 34s Details Pre-commit / pre-commit (push) Successful in 1m21s Details	2025-06-03 15:48:41 -07:00
grs	7c1998db25	feat: fine grained access control policy (#2264 ) This allows a set of rules to be defined for determining access to resources. The rules are (loosely) based on the cedar policy format. A rule defines a list of action either to permit or to forbid. It may specify a principal or a resource that must match for the rule to take effect. It may also specify a condition, either a 'when' or an 'unless', with additional constraints as to where the rule applies. A list of rules is held for each type to be protected and tried in order to find a match. If a match is found, the request is permitted or forbidden depening on the type of rule. If no match is found, the request is denied. If no rules are specified for a given type, a rule that allows any action as long as the resource attributes match the user attributes is added (i.e. the previous behaviour is the default. Some examples in yaml: ``` model: - permit: principal: user-1 actions: [create, read, delete] comment: user-1 has full access to all models - permit: principal: user-2 actions: [read] resource: model-1 comment: user-2 has read access to model-1 only - permit: actions: [read] when: user_in: resource.namespaces comment: any user has read access to models with matching attributes vector_db: - forbid: actions: [create, read, delete] unless: user_in: role::admin comment: only user with admin role can use vector_db resources ``` --------- Signed-off-by: Gordon Sim <gsim@redhat.com>	2025-06-03 14:51:12 -07:00
Ben Browning	8bee2954be	feat: Structured output for Responses API (#2324 ) # What does this PR do? This adds the missing `text` parameter to the Responses API that is how users control structured outputs. All we do with that parameter is map it to the corresponding chat completion response_format. ## Test Plan The new unit tests exercise the various permutations allowed for this property, while a couple of new verification tests actually use it for real to verify the model outputs are following the format as expected. Unit tests: `python -m pytest -s -v tests/unit/providers/agents/meta_reference/test_openai_responses.py` Verification tests: ``` llama stack run llama_stack/templates/together/run.yaml pytest -s -vv 'tests/verifications/openai_api/test_responses.py' \ --base-url=http://localhost:8321/v1/openai/v1 \ --model meta-llama/Llama-4-Scout-17B-16E-Instruct ``` Note that the verification tests can only be run with a real Llama Stack server (as opposed to using the library client via `--provider=stack:together`) because the Llama Stack python client is not yet updated to accept this text field. Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-06-03 14:43:00 -07:00
Ashwin Bharambe	cba55808ab	feat(distro): add more providers to starter distro, prefix conflicting models (#2362 ) The name changes to the verifications file are unfortunate, but maybe we don't need that @ehhuang ? Edit: deleted the verifications template now	2025-06-03 12:10:46 -07:00
Ashwin Bharambe	b380cb463f	feat: add postgres deps to starter distro (#2360 ) Once we have this, we can use the starter distro for the Kubernetes cluster demos.	2025-06-03 11:04:23 -07:00
ehhuang	3c9a10d2fe	feat: reference implementation for files API (#2330 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s Details Integration Tests / test-matrix (http, post_training) (push) Failing after 9s Details Integration Tests / test-matrix (http, agents) (push) Failing after 10s Details Integration Tests / test-matrix (http, providers) (push) Failing after 8s Details Integration Tests / test-matrix (http, inference) (push) Failing after 11s Details Integration Tests / test-matrix (http, inspect) (push) Failing after 10s Details Integration Tests / test-matrix (http, datasets) (push) Failing after 11s Details Integration Tests / test-matrix (library, datasets) (push) Failing after 8s Details Integration Tests / test-matrix (http, scoring) (push) Failing after 10s Details Integration Tests / test-matrix (library, inference) (push) Failing after 8s Details Integration Tests / test-matrix (library, agents) (push) Failing after 10s Details Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 11s Details Integration Tests / test-matrix (library, inspect) (push) Failing after 8s Details Test External Providers / test-external-providers (venv) (push) Failing after 7s Details Integration Tests / test-matrix (library, post_training) (push) Failing after 9s Details Integration Tests / test-matrix (library, scoring) (push) Failing after 8s Details Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 8s Details Integration Tests / test-matrix (library, providers) (push) Failing after 9s Details Unit Tests / unit-tests (3.11) (push) Failing after 7s Details Unit Tests / unit-tests (3.10) (push) Failing after 7s Details Unit Tests / unit-tests (3.12) (push) Failing after 8s Details Unit Tests / unit-tests (3.13) (push) Failing after 8s Details Update ReadTheDocs / update-readthedocs (push) Failing after 6s Details Pre-commit / pre-commit (push) Successful in 53s Details # What does this PR do? TSIA Added Files provider to the fireworks template. Might want to add to all templates as a follow-up. ## Test Plan llama-stack pytest tests/unit/files/test_files.py llama-stack llama stack build --template fireworks --image-type conda --run LLAMA_STACK_CONFIG=http://localhost:8321 pytest -s -v tests/integration/files/	2025-06-02 21:54:24 -07:00
Ben Browning	e92f571f47	fix: ollama chat completion needs unique ids (#2344 ) # What does this PR do? The chat completion ids generated by Ollama are not unique enough to use with stored chat completions as they rely on only 3 numbers of randomness to give unique values - ie `chatcmpl-373`. This causes frequent collisions in id values of chat completions in Ollama, which creates issues in our SQL storage of chat completions by id where it expects ids to actually be unique. So, this adjusts Ollama responses to use uuids as unique ids. This does mean we're replacing the ids generated natively by Ollama. If we don't wish to do this, we'll either need to relax the unique constraint on our chat completions id field in the inference storage or convince Ollama upstream to use something closer to uuid values here. Closes #2315 ## Test Plan I tested by running the openai completion / chat completion integration tests in a loop. Without this change, I regularly get unique id collisions. With this change, I do not. We sometimes see flakes from these unique id collisions in our CI tests, and this will resolve those. ``` INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \ llama stack run llama_stack/templates/ollama/run.yaml while true; do; \ INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \ pytest -s -v \ tests/integration/inference/test_openai_completion.py \ --stack-config=http://localhost:8321 \ --text-model="meta-llama/Llama-3.2-3B-Instruct"; \ done ``` Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-06-02 17:43:20 -07:00
Ashwin Bharambe	dbe4e84aca	feat(responses): implement full multi-turn support (#2295 ) I think the implementation needs more simplification. Spent way too much time trying to get the tests pass with models not co-operating :( Finally had to switch claude-sonnet to get things to pass reliably. ### Test Plan ``` export TAVILY_SEARCH_API_KEY=... export OPENAI_API_KEY=... uv run pytest -p no:warnings \ -s -v tests/verifications/openai_api/test_responses.py \ --provider=stack:starter \ --model openai/gpt-4o ```	2025-06-02 15:35:49 -07:00
Sébastien Han	6bb174bb05	revert: "chore: Remove zero-width space characters from OTEL service" (#2331 ) # What does this PR do? Revert #2060 and fix PLE2515. --------- Signed-off-by: Sébastien Han <seb@redhat.com>	2025-06-02 14:21:35 -07:00
Hardik Shah	3511af7c33	fix: fireworks provider for openai compat inference endpoint (#2335 ) fixes provider to use stream var correctly Before ``` curl --request POST \ --url http://localhost:8321/v1/openai/v1/chat/completions \ --header 'content-type: application/json' \ --data '{ "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", "messages": [ { "role": "user", "content": "Who are you?" } ] }' {"detail":"Internal server error: An unexpected error occurred."} ``` After ``` llama-stack % curl --request POST \ --url http://localhost:8321/v1/openai/v1/chat/completions \ --header 'content-type: application/json' \ --data '{ "model": "accounts/fireworks/models/llama4-scout-instruct-basic", "messages": [ { "role": "user", "content": "Who are you?" } ] }' {"id":"chatcmpl-97978538-271d-4c73-8d4d-c509bfb6c87e","choices":[{"message":{"role":"assistant","content":"I'm an AI assistant designed by Meta. I'm here to answer your questions, share interesting ideas and maybe even surprise you with a fresh perspective. What's on your mind?","name":null,"tool_calls":null},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","created":1748896403,"model":"accounts/fireworks/models/llama4-scout-instruct-basic"}% ```	2025-06-02 14:11:15 -07:00
Sébastien Han	1c0c6e1e17	chore: remove usage of load_tiktoken_bpe (#2276 )	2025-06-02 07:33:37 -07:00
Hardik Shah	b21050935e	feat: New OpenAI compat embeddings API (#2314 ) Some checks failed Integration Tests / test-matrix (http, agents) (push) Failing after 9s Details Integration Tests / test-matrix (http, scoring) (push) Failing after 9s Details Integration Tests / test-matrix (library, inference) (push) Failing after 9s Details Integration Tests / test-matrix (library, inspect) (push) Failing after 9s Details Integration Tests / test-matrix (library, post_training) (push) Failing after 15s Details Integration Tests / test-matrix (library, providers) (push) Failing after 14s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 43s Details Integration Tests / test-matrix (library, scoring) (push) Failing after 8s Details Integration Tests / test-matrix (http, inference) (push) Failing after 46s Details Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 8s Details Integration Tests / test-matrix (library, agents) (push) Failing after 44s Details Integration Tests / test-matrix (http, inspect) (push) Failing after 47s Details Integration Tests / test-matrix (http, providers) (push) Failing after 45s Details Integration Tests / test-matrix (library, datasets) (push) Failing after 45s Details Integration Tests / test-matrix (http, post_training) (push) Failing after 46s Details Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 47s Details Integration Tests / test-matrix (http, datasets) (push) Failing after 49s Details Test External Providers / test-external-providers (venv) (push) Failing after 6s Details Update ReadTheDocs / update-readthedocs (push) Failing after 6s Details Unit Tests / unit-tests (3.12) (push) Failing after 7s Details Unit Tests / unit-tests (3.10) (push) Failing after 8s Details Unit Tests / unit-tests (3.11) (push) Failing after 8s Details Unit Tests / unit-tests (3.13) (push) Failing after 7s Details Pre-commit / pre-commit (push) Successful in 1m12s Details # What does this PR do? Adds a new endpoint that is compatible with OpenAI for embeddings api. `/openai/v1/embeddings` Added providers for OpenAI, LiteLLM and SentenceTransformer. ## Test Plan ``` LLAMA_STACK_CONFIG=http://localhost:8321 pytest -sv tests/integration/inference/test_openai_embeddings.py --embedding-model all-MiniLM-L6-v2,text-embedding-3-small,gemini/text-embedding-004 ```	2025-05-31 22:11:47 -07:00

1 2 3 4 5 ...

757 commits