llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-28 00:01:59 +00:00

Author	SHA1	Message	Date
Ben Browning	9239b338d5	Add OLLAMA_EMBEDDING_MODEL to starter distro This allows a user to specify the Ollama Embedding Model to use, if any. Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-06-25 09:16:01 -04:00
Ben Browning	404708e99d	fix: Ollama should be optional in starter distro Our starter distro required Ollama to be running (and a large list of models available in that Ollama) to successfully start. This adjusts things so that Ollama does not have to be running to use the starter template / distro. To accomplish this, a few changes were needed: * The Ollama provider is now configurable whether it raises an Exception or just logs a warning when it cannot reach the Ollama server on startup. The default is to raise an exception (same as previous behavior), but in the starter template we adjust this to just log a warning so that we can bring the stack up without needing a running Ollama server. * The starter template no longer specifies a default list of models for Ollama, as any models specified there need to actually be pulled and available in Ollama. Instead, it adds a new `OLLAMA_INFERENCE_MODEL` environment variable where users can provide an optional model to register with the Ollama provider on startup. Additional models can also be registered via the typical `models.register(...)` at runtime. * The vLLM template was adjusted to also allow an optional `VLLM_INFERENCE_MODEL` specified on startup, so that the behavior between vLLM and Ollama was consistent here to make it easy to get up and running quickly. * The default vector store was changed from sqlite-vec to faiss. sqlite-vec can be enabled via setting the `ENABLE_SQLITE_VEC` environment variable, like we do for chromadb and pgvector. This is due to sqlite-vec not shipping proper arm64 binaries, like we previously fixed in #1530 for the ollama distribution. 
With this change, the following scenarios now work with the starter template that did not before: * no Ollama running * Ollama running but not all of the Llama models pulled locally * Ollama running with a custom model registered on startup * vLLM running with a custom model registered on startup * running the starter template on linux/arm64, like when running containers on Mac without rosetta emulation Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-06-25 09:04:45 -04:00
Eran Cohen	747e594680	feat: expand set of known gemini models (#2471) feat: Add Gemini 2.0 and 2.5 models This commit expands the set of known Gemini models by introducing: - `gemini/gemini-2.0-flash` - `gemini/gemini-2.5-flash` - `gemini/gemini-2.5-pro` These new models are added to `LLM_MODEL_IDS` for broader compatibility and updated in `run.yaml` to allow for their immediate use in starter configurations. Signed-off-by: Eran Cohen <eranco@redhat.com>	2025-06-19 12:19:37 -04:00
Ben Browning	941f505eb0	feat: File search tool for Responses API (#2426 ) # What does this PR do? This is an initial working prototype of wiring up the `file_search` builtin tool for the Responses API to our existing rag knowledge search tool. This is me seeing what I could pull together on top of the bits we already have merged. This may not be the ideal way to implement this, and things like how I shuffle the vector store ids from the original response API tool request to the actual tool execution feel a bit hacky (grep for `tool_kwargs["vector_db_ids"]` in `_execute_tool_call` to see what I mean). ## Test Plan I stubbed in some new tests to exercise this using text and pdf documents. Note that this is currently under tests/verification only because it sometimes flakes with tool calling of the small Llama-3.2-3B model we run in CI (and that I use as an example below). We'd want to make the test a bit more robust in some way if we moved this over to tests/integration and ran it in CI. ### OpenAI SaaS (to verify test correctness) ``` pytest -sv tests/verifications/openai_api/test_responses.py \ -k 'file_search' \ --base-url=https://api.openai.com/v1 \ --model=gpt-4o ``` ### Fireworks with faiss vector store ``` llama stack run llama_stack/templates/fireworks/run.yaml pytest -sv tests/verifications/openai_api/test_responses.py \ -k 'file_search' \ --base-url=http://localhost:8321/v1/openai/v1 \ --model=meta-llama/Llama-3.3-70B-Instruct ``` ### Ollama with faiss vector store This sometimes flakes on Ollama because the quantized small model doesn't always choose to call the tool to answer the user's question. But, it often works. 
``` ollama run llama3.2:3b INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \ llama stack run ./llama_stack/templates/ollama/run.yaml \ --image-type venv \ --env OLLAMA_URL="http://0.0.0.0:11434" pytest -sv tests/verifications/openai_api/test_responses.py \ -k'file_search' \ --base-url=http://localhost:8321/v1/openai/v1 \ --model=meta-llama/Llama-3.2-3B-Instruct ``` ### OpenAI provider with sqlite-vec vector store ``` llama stack run ./llama_stack/templates/starter/run.yaml --image-type venv pytest -sv tests/verifications/openai_api/test_responses.py \ -k 'file_search' \ --base-url=http://localhost:8321/v1/openai/v1 \ --model=openai/gpt-4o-mini ``` ### Ensure existing vector store integration tests still pass ``` ollama run llama3.2:3b INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \ llama stack run ./llama_stack/templates/ollama/run.yaml \ --image-type venv \ --env OLLAMA_URL="http://0.0.0.0:11434" LLAMA_STACK_CONFIG=http://localhost:8321 \ pytest -sv tests/integration/vector_io \ --text-model "meta-llama/Llama-3.2-3B-Instruct" \ --embedding-model=all-MiniLM-L6-v2 ``` --------- Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-06-13 14:32:48 -04:00
Sébastien Han	c8c742ba45	fix: vllm starter name (#2392) Signed-off-by: Sébastien Han <seb@redhat.com>	2025-06-04 16:21:36 +02:00
Ashwin Bharambe	cba55808ab	feat(distro): add more providers to starter distro, prefix conflicting models (#2362 ) The name changes to the verifications file are unfortunate, but maybe we don't need that @ehhuang ? Edit: deleted the verifications template now	2025-06-03 12:10:46 -07:00
Ashwin Bharambe	b380cb463f	feat: add postgres deps to starter distro (#2360 ) Once we have this, we can use the starter distro for the Kubernetes cluster demos.	2025-06-03 11:04:23 -07:00
Sébastien Han	6bb174bb05	revert: "chore: Remove zero-width space characters from OTEL service" (#2331 ) # What does this PR do? Revert #2060 and fix PLE2515. --------- Signed-off-by: Sébastien Han <seb@redhat.com>	2025-06-02 14:21:35 -07:00
ehhuang	2603f10f95	feat: support postgresql inference store (#2310) # What does this PR do? * Added support postgresql inference store * Added 'oracle' template that demos how to config postgresql stores (except for telemetry, which is not supported currently) ## Test Plan llama stack build --template oracle --image-type conda --run LLAMA_STACK_CONFIG=http://localhost:8321 pytest -s -v tests/integration/ --text-model accounts/fireworks/models/llama-v3p3-70b-instruct -k 'inference_store'	2025-05-29 14:33:09 -07:00
ehhuang	5844c2da68	feat: add list responses API (#2233) # What does this PR do? This is not part of the official OpenAI API, but we'll use this for the logs UI. In order to support more filtering options, I'm adopting the newly introduced sql store in place of the kv store. ## Test Plan Added integration/unit tests.	2025-05-23 13:16:48 -07:00
ehhuang	549812f51e	feat: implement get chat completions APIs (#2200 ) # What does this PR do? * Provide sqlite implementation of the APIs introduced in https://github.com/meta-llama/llama-stack/pull/2145. * Introduced a SqlStore API: llama_stack/providers/utils/sqlstore/api.py and the first Sqlite implementation * Pagination support will be added in a future PR. ## Test Plan Unit test on sql store: <img width="1005" alt="image" src="https://github.com/user-attachments/assets/9b8b7ec8-632b-4667-8127-5583426b2e29" /> Integration test: ``` INFERENCE_MODEL="llama3.2:3b-instruct-fp16" llama stack build --template ollama --image-type conda --run ``` ``` LLAMA_STACK_CONFIG=http://localhost:5001 INFERENCE_MODEL="llama3.2:3b-instruct-fp16" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "llama3.2:3b-instruct-fp16" -k 'inference_store and openai' ```	2025-05-21 22:21:52 -07:00
grs	b8f7e1504d	feat: allow the interface on which the server will listen to be configured (#2015 ) # What does this PR do? It may not always be desirable to listen on all interfaces, which is the default. As an example, by listening instead only on a loopback interface, the server cannot be reached except from within the host it is run on. This PR makes this configurable, through a CLI option, an env var or an entry on the config file. ## Test Plan I ran a server with and without the added CLI argument to verify that the argument is used if provided, but the default is as it was before if not. Signed-off-by: Gordon Sim <gsim@redhat.com>	2025-05-16 12:59:31 -07:00
Ashwin Bharambe	1a6d4af5e9	refactor: rename dev distro as starter (#2181) We want this to be a "flagship" distribution we can advertise to a segment of users to get started quickly. This distro should package a bunch of remote providers and some cheap inline providers so they get a solid "AI Platform in a box" setup instantly.	2025-05-15 12:52:34 -07:00
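
Commit b8f7e1504d above says the listen interface can be set through a CLI option, an env var, or a config file entry, and that the default remains all interfaces. The log does not state how the three sources are prioritized, so the following is only a rough sketch under assumptions: the precedence order (CLI, then env var, then config), the function name `resolve_listen_host`, the env var name `EXAMPLE_SERVER_HOST`, the `server.host` config key, and the use of `0.0.0.0` as the all-interfaces default are all hypothetical and not taken from the commit.

```python
from typing import Any, Mapping, Optional

# Assumption: the IPv4 wildcard stands in for "listen on all interfaces".
ALL_INTERFACES = "0.0.0.0"


def resolve_listen_host(
    cli_host: Optional[str],
    env: Mapping[str, str],
    config: Mapping[str, Any],
) -> str:
    """Pick the listen host from CLI, env var, or config, in that order.

    The ordering and every name here are illustrative assumptions.
    """
    if cli_host:
        return cli_host
    env_host = env.get("EXAMPLE_SERVER_HOST")  # hypothetical variable name
    if env_host:
        return env_host
    config_host = config.get("server", {}).get("host")  # hypothetical key
    if config_host:
        return config_host
    # Nothing was configured: fall back to the previous default behavior.
    return ALL_INTERFACES


if __name__ == "__main__":
    # Loopback-only: unreachable from outside the host it runs on.
    print(resolve_listen_host("127.0.0.1", {}, {}))
    # No setting anywhere: all interfaces, as before the change.
    print(resolve_listen_host(None, {}, {}))
```

Falling through to the wildcard address when nothing is set mirrors the commit's test plan, which checks that the default is unchanged when the argument is omitted.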

13 commits
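
Commit 404708e99d in the log above describes two behaviors: a provider that either raises or only logs a warning when its server is unreachable at startup, and an optional `OLLAMA_INFERENCE_MODEL` environment variable naming a model to register. A minimal sketch of that pattern follows. It is not the provider's actual code: the function names, the `raise_on_connect_error` parameter, and the injectable `probe` are hypothetical, and only the environment variable name comes from the commit message.

```python
import logging
import os
import urllib.request
from typing import Callable, List, Mapping

logger = logging.getLogger("startup_check")


def http_probe(url: str) -> None:
    """Open the URL once; raises an OSError subclass if that fails."""
    with urllib.request.urlopen(url, timeout=2.0):
        pass


def check_reachable(
    url: str,
    raise_on_connect_error: bool = True,  # hypothetical option name
    probe: Callable[[str], None] = http_probe,
) -> bool:
    """Return True if the probe succeeds; otherwise raise or warn.

    In this sketch any OSError (including HTTP error statuses raised by
    urllib) counts as "cannot reach the server".
    """
    try:
        probe(url)
        return True
    except OSError as exc:
        if raise_on_connect_error:
            raise RuntimeError(f"cannot reach server at {url}") from exc
        logger.warning("server at %s is unreachable, continuing: %s", url, exc)
        return False


def models_to_register(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the optional startup model as a list, empty if unset."""
    model = env.get("OLLAMA_INFERENCE_MODEL", "").strip()
    return [model] if model else []
```

Defaulting `raise_on_connect_error` to `True` matches the commit's statement that raising stays the default, with a template opting into the warning-only mode.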