llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-07-09 23:25:58 +00:00

Author	SHA1	Message	Date
Eran Cohen	747e594680	feat: expand set of known gemini models (#2471)	2025-06-19 12:19:37 -04:00

feat: Add Gemini 2.0 and 2.5 models

This commit expands the set of known Gemini models by introducing:

- `gemini/gemini-2.0-flash`
- `gemini/gemini-2.5-flash`
- `gemini/gemini-2.5-pro`

These new models are added to `LLM_MODEL_IDS` for broader compatibility and updated in `run.yaml` to allow for their immediate use in starter configurations.

Signed-off-by: Eran Cohen <eranco@redhat.com>

Ben Browning	941f505eb0	feat: File search tool for Responses API (#2426)	2025-06-13 14:32:48 -04:00

# What does this PR do?

This is an initial working prototype of wiring up the `file_search` builtin tool for the Responses API to our existing rag knowledge search tool.

This is me seeing what I could pull together on top of the bits we already have merged. This may not be the ideal way to implement this, and things like how I shuffle the vector store ids from the original response API tool request to the actual tool execution feel a bit hacky (grep for `tool_kwargs["vector_db_ids"]` in `_execute_tool_call` to see what I mean).

## Test Plan

I stubbed in some new tests to exercise this using text and pdf documents.

Note that this is currently under tests/verification only because it sometimes flakes with tool calling of the small Llama-3.2-3B model we run in CI (and that I use as an example below). We'd want to make the test a bit more robust in some way if we moved this over to tests/integration and ran it in CI.

### OpenAI SaaS (to verify test correctness)

```
pytest -sv tests/verifications/openai_api/test_responses.py \
  -k 'file_search' \
  --base-url=https://api.openai.com/v1 \
  --model=gpt-4o
```

### Fireworks with faiss vector store

```
llama stack run llama_stack/templates/fireworks/run.yaml

pytest -sv tests/verifications/openai_api/test_responses.py \
  -k 'file_search' \
  --base-url=http://localhost:8321/v1/openai/v1 \
  --model=meta-llama/Llama-3.3-70B-Instruct
```

### Ollama with faiss vector store

This sometimes flakes on Ollama because the quantized small model doesn't always choose to call the tool to answer the user's question. But, it often works.

```
ollama run llama3.2:3b

INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
  llama stack run ./llama_stack/templates/ollama/run.yaml \
  --image-type venv \
  --env OLLAMA_URL="http://0.0.0.0:11434"

pytest -sv tests/verifications/openai_api/test_responses.py \
  -k'file_search' \
  --base-url=http://localhost:8321/v1/openai/v1 \
  --model=meta-llama/Llama-3.2-3B-Instruct
```

### OpenAI provider with sqlite-vec vector store

```
llama stack run ./llama_stack/templates/starter/run.yaml --image-type venv

pytest -sv tests/verifications/openai_api/test_responses.py \
  -k 'file_search' \
  --base-url=http://localhost:8321/v1/openai/v1 \
  --model=openai/gpt-4o-mini
```

### Ensure existing vector store integration tests still pass

```
ollama run llama3.2:3b

INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
  llama stack run ./llama_stack/templates/ollama/run.yaml \
  --image-type venv \
  --env OLLAMA_URL="http://0.0.0.0:11434"

LLAMA_STACK_CONFIG=http://localhost:8321 \
  pytest -sv tests/integration/vector_io \
  --text-model "meta-llama/Llama-3.2-3B-Instruct" \
  --embedding-model=all-MiniLM-L6-v2
```

---------

Signed-off-by: Ben Browning <bbrownin@redhat.com>

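
The "shuffling" of vector store ids mentioned in the commit above can be pictured as copying the ids from the request's `file_search` tool definition into the keyword arguments of the search tool call. The following is only a rough sketch under stated assumptions: `FileSearchTool` and `prepare_tool_kwargs` are hypothetical names invented for illustration, and only the `tool_kwargs["vector_db_ids"]` key comes from the commit message. The real logic lives in `_execute_tool_call` and may look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class FileSearchTool:
    """Hypothetical stand-in for a `file_search` tool definition in a request."""

    vector_store_ids: list[str] = field(default_factory=list)
    type: str = "file_search"


def prepare_tool_kwargs(model_args: dict, tools: list[FileSearchTool]) -> dict:
    """Merge the model-supplied arguments with vector store ids from the request."""
    tool_kwargs = dict(model_args)  # copy so the caller's dict is not mutated
    ids: list[str] = []
    for tool in tools:
        if tool.type == "file_search":
            ids.extend(tool.vector_store_ids)
    # Key name taken from the commit message; everything else is illustrative.
    tool_kwargs["vector_db_ids"] = ids
    return tool_kwargs


kwargs = prepare_tool_kwargs(
    {"query": "What does the PDF say?"},
    [FileSearchTool(vector_store_ids=["vs_123"])],
)
print(kwargs)
```

The model only supplies the query; the store ids never pass through the model, which is why they have to be carried over from the original request.
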
Sébastien Han	c8c742ba45	fix: vllm starter name (#2392) Signed-off-by: Sébastien Han <seb@redhat.com>	2025-06-04 16:21:36 +02:00
Ashwin Bharambe	cba55808ab	feat(distro): add more providers to starter distro, prefix conflicting models (#2362) The name changes to the verifications file are unfortunate, but maybe we don't need that @ehhuang ? Edit: deleted the verifications template now	2025-06-03 12:10:46 -07:00
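
One possible reading of "prefix conflicting models" in the commit above is that a model id offered by more than one provider gets the provider id prepended, so each registered name stays unique. The sketch below illustrates that idea only. The function name, the provider ids, and the model ids are all hypothetical, and the actual change in the starter distro may use a different rule.

```python
from collections import Counter


def prefix_conflicting(models_by_provider: dict[str, list[str]]) -> dict[str, list[str]]:
    """Prefix a model id with its provider id when several providers offer it."""
    # Count how many providers list each model id (set() ignores repeats
    # within a single provider).
    counts = Counter(
        model for models in models_by_provider.values() for model in set(models)
    )
    return {
        provider: [f"{provider}/{m}" if counts[m] > 1 else m for m in models]
        for provider, models in models_by_provider.items()
    }


# Illustrative data only; these are not the ids used by the starter distro.
registry = prefix_conflicting(
    {
        "provider-a": ["shared-model", "only-a-model"],
        "provider-b": ["shared-model"],
    }
)
print(registry)
```
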
Ashwin Bharambe	b380cb463f	feat: add postgres deps to starter distro (#2360) Once we have this, we can use the starter distro for the Kubernetes cluster demos.	2025-06-03 11:04:23 -07:00
Sébastien Han	6bb174bb05	revert: "chore: Remove zero-width space characters from OTEL service" (#2331) # What does this PR do? Revert #2060 and fix PLE2515. --------- Signed-off-by: Sébastien Han <seb@redhat.com>	2025-06-02 14:21:35 -07:00
ehhuang	2603f10f95	feat: support postgresql inference store (#2310)	2025-05-29 14:33:09 -07:00

# What does this PR do?

* Added support for a postgresql inference store
* Added 'oracle' template that demos how to config postgresql stores (except for telemetry, which is not supported currently)

## Test Plan

llama stack build --template oracle --image-type conda --run
LLAMA_STACK_CONFIG=http://localhost:8321 pytest -s -v tests/integration/ --text-model accounts/fireworks/models/llama-v3p3-70b-instruct -k 'inference_store'

ehhuang	5844c2da68	feat: add list responses API (#2233) # What does this PR do? This is not part of the official OpenAI API, but we'll use this for the logs UI. In order to support more filtering options, I'm adopting the newly introduced sql store in place of the kv store. ## Test Plan Added integration/unit tests.	2025-05-23 13:16:48 -07:00
ehhuang	549812f51e	feat: implement get chat completions APIs (#2200)	2025-05-21 22:21:52 -07:00

# What does this PR do?

* Provide sqlite implementation of the APIs introduced in https://github.com/meta-llama/llama-stack/pull/2145.
* Introduced a SqlStore API: llama_stack/providers/utils/sqlstore/api.py and the first Sqlite implementation
* Pagination support will be added in a future PR.

## Test Plan

Unit test on sql store:
<img width="1005" alt="image" src="https://github.com/user-attachments/assets/9b8b7ec8-632b-4667-8127-5583426b2e29" />

Integration test:

```
INFERENCE_MODEL="llama3.2:3b-instruct-fp16" llama stack build --template ollama --image-type conda --run
```

```
LLAMA_STACK_CONFIG=http://localhost:5001 INFERENCE_MODEL="llama3.2:3b-instruct-fp16" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "llama3.2:3b-instruct-fp16" -k 'inference_store and openai'
```

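
To give a feel for what a small SQL store abstraction with a SQLite backend might look like, here is a minimal sketch built on Python's standard `sqlite3` module. It is not the interface defined in `llama_stack/providers/utils/sqlstore/api.py`: the method names `create_table`, `insert`, and `fetch_all`, the synchronous style, and the table layout are assumptions made for illustration.

```python
import sqlite3
from typing import Any, Optional, Protocol


class SqlStore(Protocol):
    """Hypothetical minimal store interface (not the llama-stack API)."""

    def create_table(self, table: str, columns: dict[str, str]) -> None: ...
    def insert(self, table: str, row: dict[str, Any]) -> None: ...
    def fetch_all(
        self, table: str, where: Optional[dict[str, Any]] = None
    ) -> list[dict[str, Any]]: ...


class SqliteStore:
    """SQLite-backed implementation of the sketch interface above."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.row_factory = sqlite3.Row  # rows become name-addressable

    def create_table(self, table: str, columns: dict[str, str]) -> None:
        # Table and column names are interpolated, so they must come from
        # trusted code; only values are passed as bound parameters.
        cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
        self.conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")

    def insert(self, table: str, row: dict[str, Any]) -> None:
        names = ", ".join(row)
        marks = ", ".join("?" for _ in row)
        self.conn.execute(
            f"INSERT INTO {table} ({names}) VALUES ({marks})", tuple(row.values())
        )
        self.conn.commit()

    def fetch_all(
        self, table: str, where: Optional[dict[str, Any]] = None
    ) -> list[dict[str, Any]]:
        sql = f"SELECT * FROM {table}"
        params: tuple = ()
        if where:
            sql += " WHERE " + " AND ".join(f"{key} = ?" for key in where)
            params = tuple(where.values())
        return [dict(r) for r in self.conn.execute(sql, params)]


store = SqliteStore()
store.create_table("chat_completions", {"id": "TEXT PRIMARY KEY", "model": "TEXT"})
store.insert("chat_completions", {"id": "cc-1", "model": "llama3.2:3b-instruct-fp16"})
store.insert("chat_completions", {"id": "cc-2", "model": "some-other-model"})
rows = store.fetch_all("chat_completions", where={"model": "llama3.2:3b-instruct-fp16"})
print(rows)
```

Keeping the interface separate from the SQLite class is what would let another backend be swapped in without touching callers.
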
grs	b8f7e1504d	feat: allow the interface on which the server will listen to be configured (#2015) # What does this PR do? It may not always be desirable to listen on all interfaces, which is the default. As an example, by listening instead only on a loopback interface, the server cannot be reached except from within the host it is run on. This PR makes this configurable through a CLI option, an env var or an entry in the config file. ## Test Plan I ran a server with and without the added CLI argument to verify that the argument is used if provided, but the default is as it was before if not. Signed-off-by: Gordon Sim <gsim@redhat.com>	2025-05-16 12:59:31 -07:00
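
As a rough illustration of how a listen address could be resolved from the three sources named in the commit above, consider the sketch below. The flag `--listen-host`, the variable `EXAMPLE_LISTEN_HOST`, the config key `host`, and the precedence order (CLI, then environment, then config, then default) are all assumptions chosen for the example; the commit message does not state the real option names or their priority.

```python
import argparse
from typing import Mapping, Sequence

ALL_INTERFACES = "0.0.0.0"  # reachable from other hosts (IPv4 wildcard address)
LOOPBACK = "127.0.0.1"      # reachable only from the machine itself


def resolve_host(
    argv: Sequence[str],
    env: Mapping[str, str],
    config: Mapping[str, str],
) -> str:
    """Pick the listen address: CLI first, then env var, then config, then default."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--listen-host", default=None)  # hypothetical flag name
    args, _unknown = parser.parse_known_args(argv)
    return (
        args.listen_host
        or env.get("EXAMPLE_LISTEN_HOST")  # hypothetical variable name
        or config.get("host")              # hypothetical config key
        or ALL_INTERFACES                  # listen on all interfaces if nothing is set
    )


print(resolve_host(["--listen-host", LOOPBACK], {}, {}))
print(resolve_host([], {"EXAMPLE_LISTEN_HOST": LOOPBACK}, {}))
print(resolve_host([], {}, {}))
```

Falling back to the wildcard address when nothing is set mirrors the behaviour the test plan describes, where the default is unchanged unless the option is provided.
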
Ashwin Bharambe	1a6d4af5e9	refactor: rename dev distro as starter (#2181) We want this to be a "flagship" distribution we can advertise to a segment of users to get started quickly. This distro should package a bunch of remote providers and some cheap inline providers so they get a solid "AI Platform in a box" setup instantly.	2025-05-15 12:52:34 -07:00

11 commits