llama-stack

forked from phoenix-oss/llama-stack-mirror

Author	SHA1	Message	Date
Angel Nunez Mencias	b174effe05	fix security update	2025-06-03 20:07:06 +02:00
Angel Nunez Mencias	8943b283e9	fix install	2025-06-02 03:02:25 +02:00
Angel Nunez Mencias	08905fc937	add requirements	2025-06-02 03:01:15 +02:00
Angel Nunez Mencias	8b5b1c937b	update ui command	2025-06-02 02:54:55 +02:00
Angel Nunez Mencias	205fc2cbd1	include all	2025-06-02 02:49:54 +02:00
Angel Nunez Mencias	4a122bbaca	use own code	2025-06-02 02:49:45 +02:00
Angel Nunez Mencias	a77b554bcf	update requiements	2025-06-02 02:34:19 +02:00
Angel Nunez Mencias	51816af52e	use env file	2025-06-02 01:39:17 +02:00
Angel Nunez Mencias	96003b55de	use auth for kvant	2025-06-02 01:23:33 +02:00
Angel Nunez Mencias	3bde47e562	add keycloak auth to playground ui	2025-06-01 22:23:49 +02:00
Angel Nunez Mencias	ed31462499	ci tag	2025-06-01 13:38:22 +02:00
Angel Nunez Mencias	43a7713140	use raw tag	2025-06-01 13:23:17 +02:00
Angel Nunez Mencias	ad9860c312	fix ci	2025-06-01 13:05:50 +02:00
Angel Nunez Mencias	9b70e01c99	add local registry	2025-06-01 13:01:23 +02:00
Angel Nunez Mencias	7bba685dee	add scripts	2025-06-01 12:43:43 +02:00
Angel Nunez Mencias	4603206065	ci	2025-06-01 12:13:57 +02:00
Angel Nunez Mencias	16abfaeb69	build playground	2025-06-01 12:13:57 +02:00
Angel Nunez Mencias	b2ac7f69cc	add responses_store	2025-06-01 12:13:57 +02:00
Angel Nunez Mencias	00fc43ae96	do not push twice	2025-06-01 12:13:57 +02:00
Angel Nunez Mencias	65936f7933	wip	2025-06-01 12:13:57 +02:00
Angel Nunez Mencias	226e443e03	wip	2025-06-01 12:13:57 +02:00
Angel Nunez Mencias	5b057d60ee	wip	2025-06-01 12:13:57 +02:00
Angel Nunez Mencias	95a56b62a0	wip	2025-06-01 12:13:57 +02:00
Angel Nunez Mencias	c642ea2dd5	wip	2025-06-01 12:13:57 +02:00
Angel Nunez Mencias	7e1725f72b	install uvx	2025-06-01 12:13:57 +02:00
Angel Nunez Mencias	b414fe5566	add kvant	2025-06-01 12:13:57 +02:00
Angel Nunez Mencias	cfa38bd13b	add kvant	2025-06-01 12:13:57 +02:00
Hardik Shah	b21050935e	feat: New OpenAI compat embeddings API (#2314 ) # What does this PR do? Adds a new endpoint that is compatible with OpenAI for embeddings api. `/openai/v1/embeddings` Added providers for OpenAI, LiteLLM and SentenceTransformer. ## Test Plan ``` LLAMA_STACK_CONFIG=http://localhost:8321 pytest -sv tests/integration/inference/test_openai_embeddings.py --embedding-model all-MiniLM-L6-v2,text-embedding-3-small,gemini/text-embedding-004 ```	2025-05-31 22:11:47 -07:00
Ben Browning	277f8690ef	fix: Responses streaming tools don't concatenate None and str (#2326 ) # What does this PR do? This adds a check to ensure we don't attempt to concatenate `None + str` or `str + None` when building up our arguments for streaming tool calls in the Responses API. ## Test Plan All existing tests pass with this change. Unit tests: ``` python -m pytest -s -v \ tests/unit/providers/agents/meta_reference/test_openai_responses.py ``` Integration tests: ``` llama stack run llama_stack/templates/together/run.yaml LLAMA_STACK_CONFIG=http://localhost:8321 \ python -m pytest -s -v \ tests/integration/agents/test_openai_responses.py \ --text-model meta-llama/Llama-4-Scout-17B-16E-Instruct ``` Verification tests: ``` llama stack run llama_stack/templates/together/run.yaml pytest -s -v 'tests/verifications/openai_api/test_responses.py' \ --base-url=http://localhost:8321/v1/openai/v1 \ --model meta-llama/Llama-4-Scout-17B-16E-Instruct ``` Additionally, the manual example using Codex CLI from #2325 now succeeds instead of throwing a 500 error. Closes #2325 Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-05-31 18:24:04 -07:00
Francisco Arceo	f328436831	feat: Enable ingestion of precomputed embeddings (#2317 )	2025-05-31 04:03:37 -06:00
Francisco Arceo	31ce208bda	fix: Fix requirements from broken github-actions[bot] (#2323 )	2025-05-30 19:05:47 -07:00
github-actions[bot]	ad15276da1	build: Bump version to 0.2.9	2025-05-30 19:43:09 +00:00
ehhuang	2603f10f95	feat: support postgresql inference store (#2310 ) # What does this PR do? * Added support postgresql inference store * Added 'oracle' template that demos how to config postgresql stores (except for telemetry, which is not supported currently) ## Test Plan llama stack build --template oracle --image-type conda --run LLAMA_STACK_CONFIG=http://localhost:8321 pytest -s -v tests/integration/ --text-model accounts/fireworks/models/llama-v3p3-70b-instruct -k 'inference_store'	2025-05-29 14:33:09 -07:00
Jorge Piedrahita Ortiz	168c7113df	fix(providers): update sambanova json schema mode (#2306 ) # What does this PR do? Updates sambanova inference to use strict as false in json_schema structured output ## Test Plan pytest -s -v tests/integration/inference/test_text_inference.py --stack-config=sambanova --text-model=sambanova/Meta-Llama-3.3-70B-Instruct	2025-05-29 09:54:23 -07:00
Mark Campbell	f0d8ceb242	chore: fix flaky distro_codegen script (#2305 ) # What does this PR do? Adds an import for all of the template modules before the executor to prevent deadlock Closes #2278 ## Test Plan ``` # Run the pre-commit multiple times and verify the deadlock doesn't occur for i in {1..10}; do pre-commit run --all-files; done ```	2025-05-29 09:53:45 -07:00
Ashwin Bharambe	bfdd15d1fa	fix(responses): use input, not original_input when storing the Response (#2300 ) We must store the full (re-hydrated) input not just the original input in the Response object. Of course, this is not very space efficient and we should likely find a better storage scheme so that we can only store unique entries in the database and then re-hydrate them efficiently later. But that can be done safely later. Closes https://github.com/meta-llama/llama-stack/issues/2299 ## Test Plan Unit test	2025-05-28 13:17:48 -07:00
Michael Dawson	a654467552	feat: add cpu/cuda config for prompt guard (#2194 ) # What does this PR do? Previously prompt guard was hard coded to require cuda which prevented it from being used on an instance without a cuda support. This PR allows prompt guard to be configured to use either cpu or cuda. Closes [#2133](https://github.com/meta-llama/llama-stack/issues/2133) ## Test Plan (Edited after incorporating suggestion) 1) started stack configured with prompt guard as follows on a system without a GPU and validated prompt guard could be used through the APIs 2) validated on a system with a gpu (but without llama stack) that the python selecting between cpu and cuda support returned the right value when a cuda device was available. 3) ran the unit tests as per - https://github.com/meta-llama/llama-stack/blob/main/tests/unit/README.md --------- Signed-off-by: Michael Dawson <mdawson@devrus.com>	2025-05-28 12:23:15 -07:00
Sébastien Han	63a9f08c9e	chore: use starlette built-in Route class (#2267 ) # What does this PR do? Use a more common pattern and known terminology from the ecosystem, where Route is more approved than Endpoint. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-05-28 09:53:33 -07:00
ehhuang	56e5ddb39f	feat(ui): add views for Responses (#2293 ) # What does this PR do? * Add responses list and detail views * Refactored components to be shared as much as possible between chat completions and responses ## Test Plan (screenshots) added tests	2025-05-28 09:51:22 -07:00
Sébastien Han	6352078e4b	chore: use groups when running commands (#2298 ) # What does this PR do? Followup of https://github.com/meta-llama/llama-stack/pull/2287. We must use `--group` when running commands with uv. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-05-28 09:13:16 -07:00
Charlie Doern	a7ecc92be1	docs: add post training to providers list (#2280 ) # What does this PR do? the providers list is missing post_training. Add that column and `HuggingFace`, `TorchTune`, and `NVIDIA NEMO` as supported providers. also point to these providers in docs/source/providers/index.md, and describe basic functionality There are other missing provider types here as well, but starting with this Signed-off-by: Charlie Doern <cdoern@redhat.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>	2025-05-28 09:32:00 -04:00
raghotham	9b7f9db05c	fix: build docs without requirements.txt (#2294 ) Following the instructions here https://docs.readthedocs.com/platform/stable/build-customization.html#install-dependencies-with-uv as per https://github.com/meta-llama/llama-stack/pull/2223#issuecomment-2914315408	2025-05-27 16:27:57 -07:00
ehhuang	0b695538af	fix: chat completion with more than one choice (#2288 ) # What does this PR do? Fix a bug in openai_compat where choices are not indexed correctly. ## Test Plan Added a new test. Rerun the failed inference_store tests: llama stack run fireworks --image-type conda pytest -s -v tests/integration/ --stack-config http://localhost:8321 -k 'test_inference_store' --text-model meta-llama/Llama-3.3-70B-Instruct --count 10	2025-05-27 15:39:15 -07:00
ehhuang	1d46f3102e	fix: enable test_responses_store (#2290 ) # What does this PR do? Changed the test to not require tool_call in output, but still keeping the tools params there as a smoke test. ## Test Plan Used llama3.3 from fireworks (same as CI) (screenshot) Run with ollama distro and 3b model.	2025-05-27 15:37:28 -07:00
Sébastien Han	4f3f28f718	chore: use dependency-groups for dev (#2287 ) # What does this PR do? The previous `[project.optional-dependencies]` was misrepresenting what the packages were. They were NOT optional dependencies to the project but development dependencies. Unlike optional dependencies, development dependencies are local-only and will not be included in the project requirements when published to PyPI or other indexes. As such, development dependencies are not included in the [project] table. Additionally, the dev group is synced by default. Source: https://docs.astral.sh/uv/concepts/projects/dependencies/#development-dependencies Signed-off-by: Sébastien Han <seb@redhat.com>	2025-05-27 23:00:17 +02:00
Sébastien Han	484abe3116	chore: bump uv version (#2289 ) # What does this PR do? To match the one used by the release bot. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-05-27 13:44:27 -07:00
github-actions[bot]	7105a25b0f	build: Bump version to 0.2.8	2025-05-27 20:28:29 +00:00
Ashwin Bharambe	5cdb29758a	feat(responses): add output_text delta events to responses (#2265 ) This adds initial streaming support to the Responses API. This PR makes sure that the _first_ inference call made to chat completions streams out. There's more to be done: - tool call output tokens need to stream out when possible - we need to loop through multiple rounds of inference and they all need to stream out. ## Test Plan Added a test. Executed as: ``` FIREWORKS_API_KEY=... \ pytest -s -v 'tests/verifications/openai_api/test_responses.py' \ --provider=stack:fireworks --model meta-llama/Llama-4-Scout-17B-16E-Instruct ``` Then, started a llama stack fireworks distro and tested against it like this: ``` OPENAI_API_KEY=blah \ pytest -s -v 'tests/verifications/openai_api/test_responses.py' \ --base-url http://localhost:8321/v1/openai/v1 \ --model meta-llama/Llama-4-Scout-17B-16E-Instruct ```	2025-05-27 13:07:14 -07:00
Sébastien Han	6ee319ae08	fix: convert boolean string to boolean (#2284 ) # What does this PR do? Handles the case where the vllm config `tls_verify` is set to `false` or `true`. Closes: https://github.com/meta-llama/llama-stack/issues/2283 Signed-off-by: Sébastien Han <seb@redhat.com>	2025-05-27 13:05:38 -07:00
Sébastien Han	a8f75d3897	chore: remove dependencies.json (#2281 ) # What does this PR do? It's not used anywhere in the build process. Ancient artifact from an old attempt of using sub packages to build distros. ## Test Plan N/A Signed-off-by: Sébastien Han <seb@redhat.com>	2025-05-27 10:26:57 -07:00


2039 commits