llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-03 09:53:45 +00:00

Author	SHA1	Message	Date
ehhuang	a58c0639d5	chore: update postgres_demo distro config (#2396 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 5s Details Integration Tests / test-matrix (http, datasets) (push) Failing after 9s Details Integration Tests / test-matrix (http, inference) (push) Failing after 9s Details Integration Tests / test-matrix (http, agents) (push) Failing after 9s Details Integration Tests / test-matrix (http, inspect) (push) Failing after 10s Details Integration Tests / test-matrix (http, post_training) (push) Failing after 9s Details Integration Tests / test-matrix (library, agents) (push) Failing after 8s Details Integration Tests / test-matrix (http, providers) (push) Failing after 10s Details Integration Tests / test-matrix (http, scoring) (push) Failing after 9s Details Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 9s Details Integration Tests / test-matrix (library, datasets) (push) Failing after 9s Details Integration Tests / test-matrix (library, inference) (push) Failing after 9s Details Test Llama Stack Build / build-single-provider (push) Failing after 6s Details Integration Tests / test-matrix (library, post_training) (push) Failing after 8s Details Test Llama Stack Build / generate-matrix (push) Successful in 7s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 6s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 7s Details Integration Tests / test-matrix (library, scoring) (push) Failing after 9s Details Integration Tests / test-matrix (library, providers) (push) Failing after 9s Details Test External Providers / test-external-providers (venv) (push) Failing after 6s Details Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 9s Details Unit Tests / unit-tests (3.10) (push) Failing after 7s Details Test Llama Stack Build / build (push) Failing after 7s Details Unit Tests / unit-tests (3.12) (push) Failing after 8s Details Unit Tests / unit-tests (3.11) (push) Failing after 8s Details Unit Tests / unit-tests (3.13) (push) Failing after 9s Details Integration Tests / test-matrix (library, inspect) (push) Failing after 30s Details Pre-commit / pre-commit (push) Successful in 1m17s Details # What does this PR do? ## Test Plan	2025-06-04 17:41:27 -07:00
Sébastien Han	c8c742ba45	fix: vllm starter name (#2392 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 3s Details Integration Tests / test-matrix (http, datasets) (push) Failing after 9s Details Integration Tests / test-matrix (http, agents) (push) Failing after 10s Details Integration Tests / test-matrix (http, post_training) (push) Failing after 10s Details Integration Tests / test-matrix (http, inspect) (push) Failing after 10s Details Integration Tests / test-matrix (http, inference) (push) Failing after 10s Details Integration Tests / test-matrix (http, scoring) (push) Failing after 9s Details Integration Tests / test-matrix (http, providers) (push) Failing after 10s Details Integration Tests / test-matrix (library, datasets) (push) Failing after 9s Details Integration Tests / test-matrix (library, inference) (push) Failing after 8s Details Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 10s Details Integration Tests / test-matrix (library, agents) (push) Failing after 10s Details Test Llama Stack Build / generate-matrix (push) Successful in 6s Details Integration Tests / test-matrix (library, inspect) (push) Failing after 9s Details Test Llama Stack Build / build-single-provider (push) Failing after 6s Details Integration Tests / test-matrix (library, post_training) (push) Failing after 9s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 7s Details Integration Tests / test-matrix (library, providers) (push) Failing after 9s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 7s Details Integration Tests / test-matrix (library, scoring) (push) Failing after 10s Details Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 9s Details Unit Tests / unit-tests (3.11) (push) Failing after 7s Details Unit Tests / unit-tests (3.12) (push) Failing after 7s Details Test Llama Stack Build / build (push) Failing after 6s Details Unit Tests / unit-tests (3.10) (push) Failing after 8s Details Unit Tests / unit-tests (3.13) (push) Failing after 7s Details Update ReadTheDocs / update-readthedocs (push) Failing after 6s Details Test External Providers / test-external-providers (venv) (push) Failing after 29s Details Pre-commit / pre-commit (push) Successful in 2m3s Details Signed-off-by: Sébastien Han <seb@redhat.com>	2025-06-04 16:21:36 +02:00
grs	0de9536717	fix: remove debug print accidentally merged (#2393 ) I accidentally left a debug print in a PR that was merged. This removes that.	2025-06-04 15:14:14 +02:00
Ben Browning	e9d9f01b8b	docs: Add OpenAI API compatibility page (#2316 ) # What does this PR do? This adds some initial content documenting our OpenAI compatible APIs - Responses, Chat Completions, Completions, and Models - along with instructions on how to use them via OpenAI or Llama Stack clients and some simple examples for each. It's not a lot of content, but it's a start so that users have some idea how to get going as we continue to work on these APIs. ## Test Plan I generated the docs site locally and verified things render properly. I also ran each code example to ensure it works as expected. And, I asked my AI code assistant to do a quick spell-check and review of the docs and it didn't flag any obvious errors. --------- Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Francisco Arceo <farceo@redhat.com>	2025-06-04 06:51:52 -04:00
Ashwin Bharambe	ed69c1b3cc	feat(responses): add more streaming response types (#2375 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s Details Integration Tests / test-matrix (http, agents) (push) Failing after 9s Details Integration Tests / test-matrix (http, scoring) (push) Failing after 8s Details Integration Tests / test-matrix (http, inspect) (push) Failing after 9s Details Integration Tests / test-matrix (http, post_training) (push) Failing after 10s Details Integration Tests / test-matrix (library, datasets) (push) Failing after 9s Details Integration Tests / test-matrix (http, datasets) (push) Failing after 11s Details Integration Tests / test-matrix (library, agents) (push) Failing after 9s Details Integration Tests / test-matrix (http, inference) (push) Failing after 11s Details Integration Tests / test-matrix (http, providers) (push) Failing after 10s Details Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 9s Details Integration Tests / test-matrix (library, inference) (push) Failing after 7s Details Integration Tests / test-matrix (library, inspect) (push) Failing after 7s Details Test External Providers / test-external-providers (venv) (push) Failing after 7s Details Integration Tests / test-matrix (library, providers) (push) Failing after 7s Details Integration Tests / test-matrix (library, post_training) (push) Failing after 9s Details Unit Tests / unit-tests (3.10) (push) Failing after 7s Details Integration Tests / test-matrix (library, scoring) (push) Failing after 9s Details Unit Tests / unit-tests (3.13) (push) Failing after 7s Details Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 10s Details Update ReadTheDocs / update-readthedocs (push) Failing after 6s Details Unit Tests / unit-tests (3.11) (push) Failing after 9s Details Unit Tests / unit-tests (3.12) (push) Failing after 34s Details Pre-commit / pre-commit (push) Successful in 1m21s Details	2025-06-03 15:48:41 -07:00
ehhuang	d96f6ec763	chore(ui): use proxy server for backend API calls; simplified k8s deployment (#2350 ) # What does this PR do? - no more CORS middleware needed ## Test Plan ### Local test llama stack run starter --image-type conda npm run dev verify UI works in browser ### Deploy to k8s temporarily change ui-k8s.yaml.template to load from PR commit <img width="604" alt="image" src="https://github.com/user-attachments/assets/87fa2e52-1e93-4e32-9e0f-5b283b7a37b3" /> sh ./apply.sh $ kubectl get services go to external_ip:8322 and play around with UI <img width="1690" alt="image" src="https://github.com/user-attachments/assets/5b7ec827-4302-4435-a9eb-df423676d873" />	2025-06-03 14:57:10 -07:00
grs	7c1998db25	feat: fine grained access control policy (#2264 ) This allows a set of rules to be defined for determining access to resources. The rules are (loosely) based on the cedar policy format. A rule defines a list of action either to permit or to forbid. It may specify a principal or a resource that must match for the rule to take effect. It may also specify a condition, either a 'when' or an 'unless', with additional constraints as to where the rule applies. A list of rules is held for each type to be protected and tried in order to find a match. If a match is found, the request is permitted or forbidden depening on the type of rule. If no match is found, the request is denied. If no rules are specified for a given type, a rule that allows any action as long as the resource attributes match the user attributes is added (i.e. the previous behaviour is the default. Some examples in yaml: ``` model: - permit: principal: user-1 actions: [create, read, delete] comment: user-1 has full access to all models - permit: principal: user-2 actions: [read] resource: model-1 comment: user-2 has read access to model-1 only - permit: actions: [read] when: user_in: resource.namespaces comment: any user has read access to models with matching attributes vector_db: - forbid: actions: [create, read, delete] unless: user_in: role::admin comment: only user with admin role can use vector_db resources ``` --------- Signed-off-by: Gordon Sim <gsim@redhat.com>	2025-06-03 14:51:12 -07:00
Ben Browning	8bee2954be	feat: Structured output for Responses API (#2324 ) # What does this PR do? This adds the missing `text` parameter to the Responses API that is how users control structured outputs. All we do with that parameter is map it to the corresponding chat completion response_format. ## Test Plan The new unit tests exercise the various permutations allowed for this property, while a couple of new verification tests actually use it for real to verify the model outputs are following the format as expected. Unit tests: `python -m pytest -s -v tests/unit/providers/agents/meta_reference/test_openai_responses.py` Verification tests: ``` llama stack run llama_stack/templates/together/run.yaml pytest -s -vv 'tests/verifications/openai_api/test_responses.py' \ --base-url=http://localhost:8321/v1/openai/v1 \ --model meta-llama/Llama-4-Scout-17B-16E-Instruct ``` Note that the verification tests can only be run with a real Llama Stack server (as opposed to using the library client via `--provider=stack:together`) because the Llama Stack python client is not yet updated to accept this text field. Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-06-03 14:43:00 -07:00
Ignas Baranauskas	c70ca8344f	fix: resolve template name to config path in `llama stack run` (#2361 ) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> This PR fixes a bug where running a known template by name using: `llama stack run ollama` would fail with the following error: `ValueError: Config file ollama does not exist` <!-- If resolving an issue, uncomment and update the line below --> Closes #2291 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> `llama stack run ollama` should work	2025-06-03 14:39:12 -07:00
Ashwin Bharambe	cba55808ab	feat(distro): add more providers to starter distro, prefix conflicting models (#2362 ) The name changes to the verifications file are unfortunate, but maybe we don't need that @ehhuang ? Edit: deleted the verifications template now	2025-06-03 12:10:46 -07:00
Ashwin Bharambe	b380cb463f	feat: add postgres deps to starter distro (#2360 ) Once we have this, we can use the starter distro for the Kubernetes cluster demos.	2025-06-03 11:04:23 -07:00
Jorge	e743257d1d	docs: Add missing dependencies in quickstart demo command (#2347 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 4s Details Integration Tests / test-matrix (http, agents) (push) Failing after 8s Details Integration Tests / test-matrix (http, post_training) (push) Failing after 8s Details Integration Tests / test-matrix (http, datasets) (push) Failing after 10s Details Integration Tests / test-matrix (http, inspect) (push) Failing after 10s Details Integration Tests / test-matrix (http, providers) (push) Failing after 8s Details Integration Tests / test-matrix (http, scoring) (push) Failing after 8s Details Integration Tests / test-matrix (http, inference) (push) Failing after 11s Details Integration Tests / test-matrix (library, datasets) (push) Failing after 8s Details Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 10s Details Integration Tests / test-matrix (library, inference) (push) Failing after 8s Details Integration Tests / test-matrix (library, agents) (push) Failing after 10s Details Integration Tests / test-matrix (library, inspect) (push) Failing after 8s Details Integration Tests / test-matrix (library, post_training) (push) Failing after 8s Details Integration Tests / test-matrix (library, scoring) (push) Failing after 7s Details Test External Providers / test-external-providers (venv) (push) Failing after 6s Details Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 8s Details Integration Tests / test-matrix (library, providers) (push) Failing after 9s Details Unit Tests / unit-tests (3.10) (push) Failing after 8s Details Update ReadTheDocs / update-readthedocs (push) Failing after 5s Details Unit Tests / unit-tests (3.12) (push) Failing after 8s Details Unit Tests / unit-tests (3.13) (push) Failing after 8s Details Unit Tests / unit-tests (3.11) (push) Failing after 31s Details Pre-commit / pre-commit (push) Successful in 1m17s Details Adds missing required dependencies to run the demo command in the Quickstart doc Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>	2025-06-03 18:01:36 +02:00
ehhuang	3c9a10d2fe	feat: reference implementation for files API (#2330 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s Details Integration Tests / test-matrix (http, post_training) (push) Failing after 9s Details Integration Tests / test-matrix (http, agents) (push) Failing after 10s Details Integration Tests / test-matrix (http, providers) (push) Failing after 8s Details Integration Tests / test-matrix (http, inference) (push) Failing after 11s Details Integration Tests / test-matrix (http, inspect) (push) Failing after 10s Details Integration Tests / test-matrix (http, datasets) (push) Failing after 11s Details Integration Tests / test-matrix (library, datasets) (push) Failing after 8s Details Integration Tests / test-matrix (http, scoring) (push) Failing after 10s Details Integration Tests / test-matrix (library, inference) (push) Failing after 8s Details Integration Tests / test-matrix (library, agents) (push) Failing after 10s Details Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 11s Details Integration Tests / test-matrix (library, inspect) (push) Failing after 8s Details Test External Providers / test-external-providers (venv) (push) Failing after 7s Details Integration Tests / test-matrix (library, post_training) (push) Failing after 9s Details Integration Tests / test-matrix (library, scoring) (push) Failing after 8s Details Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 8s Details Integration Tests / test-matrix (library, providers) (push) Failing after 9s Details Unit Tests / unit-tests (3.11) (push) Failing after 7s Details Unit Tests / unit-tests (3.10) (push) Failing after 7s Details Unit Tests / unit-tests (3.12) (push) Failing after 8s Details Unit Tests / unit-tests (3.13) (push) Failing after 8s Details Update ReadTheDocs / update-readthedocs (push) Failing after 6s Details Pre-commit / pre-commit (push) Successful in 53s Details # What does this PR do? TSIA Added Files provider to the fireworks template. Might want to add to all templates as a follow-up. ## Test Plan llama-stack pytest tests/unit/files/test_files.py llama-stack llama stack build --template fireworks --image-type conda --run LLAMA_STACK_CONFIG=http://localhost:8321 pytest -s -v tests/integration/files/	2025-06-02 21:54:24 -07:00
Ashwin Bharambe	ba25c5e7e1	docs(k8s): add UI template (#2343 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 5s Details Integration Tests / test-matrix (http, inference) (push) Failing after 9s Details Integration Tests / test-matrix (http, agents) (push) Failing after 11s Details Integration Tests / test-matrix (http, post_training) (push) Failing after 9s Details Integration Tests / test-matrix (http, inspect) (push) Failing after 10s Details Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 8s Details Integration Tests / test-matrix (http, providers) (push) Failing after 10s Details Integration Tests / test-matrix (library, agents) (push) Failing after 10s Details Integration Tests / test-matrix (http, datasets) (push) Failing after 12s Details Integration Tests / test-matrix (library, datasets) (push) Failing after 9s Details Integration Tests / test-matrix (http, scoring) (push) Failing after 11s Details Integration Tests / test-matrix (library, inference) (push) Failing after 8s Details Integration Tests / test-matrix (library, inspect) (push) Failing after 9s Details Integration Tests / test-matrix (library, scoring) (push) Failing after 8s Details Test External Providers / test-external-providers (venv) (push) Failing after 7s Details Integration Tests / test-matrix (library, providers) (push) Failing after 10s Details Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 9s Details Integration Tests / test-matrix (library, post_training) (push) Failing after 10s Details Unit Tests / unit-tests (3.11) (push) Failing after 7s Details Unit Tests / unit-tests (3.10) (push) Failing after 9s Details Unit Tests / unit-tests (3.12) (push) Failing after 8s Details Unit Tests / unit-tests (3.13) (push) Failing after 9s Details Update ReadTheDocs / update-readthedocs (push) Failing after 6s Details Pre-commit / pre-commit (push) Successful in 55s Details WIP: add a UI template	2025-06-02 17:55:18 -07:00
Ben Browning	e92f571f47	fix: ollama chat completion needs unique ids (#2344 ) # What does this PR do? The chat completion ids generated by Ollama are not unique enough to use with stored chat completions as they rely on only 3 numbers of randomness to give unique values - ie `chatcmpl-373`. This causes frequent collisions in id values of chat completions in Ollama, which creates issues in our SQL storage of chat completions by id where it expects ids to actually be unique. So, this adjusts Ollama responses to use uuids as unique ids. This does mean we're replacing the ids generated natively by Ollama. If we don't wish to do this, we'll either need to relax the unique constraint on our chat completions id field in the inference storage or convince Ollama upstream to use something closer to uuid values here. Closes #2315 ## Test Plan I tested by running the openai completion / chat completion integration tests in a loop. Without this change, I regularly get unique id collisions. With this change, I do not. We sometimes see flakes from these unique id collisions in our CI tests, and this will resolve those. ``` INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \ llama stack run llama_stack/templates/ollama/run.yaml while true; do; \ INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \ pytest -s -v \ tests/integration/inference/test_openai_completion.py \ --stack-config=http://localhost:8321 \ --text-model="meta-llama/Llama-3.2-3B-Instruct"; \ done ``` Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-06-02 17:43:20 -07:00
ehhuang	4540c9b3e5	chore: revert llama-stack-client dep (#2342 ) # What does this PR do? ## Test Plan	2025-06-02 16:05:21 -07:00
Ashwin Bharambe	dbe4e84aca	feat(responses): implement full multi-turn support (#2295 ) I think the implementation needs more simplification. Spent way too much time trying to get the tests pass with models not co-operating :( Finally had to switch claude-sonnet to get things to pass reliably. ### Test Plan ``` export TAVILY_SEARCH_API_KEY=... export OPENAI_API_KEY=... uv run pytest -p no:warnings \ -s -v tests/verifications/openai_api/test_responses.py \ --provider=stack:starter \ --model openai/gpt-4o ```	2025-06-02 15:35:49 -07:00
ehhuang	cac7d404a2	fix: remove openai dep (#2337 ) # What does this PR do? 1. remove openai dep 2. temporarily update llama-stack-client to stainless sync'd branch as the responses/inputitems API wasn't included in the last push. This will automatically be updated to the next version in the release. ## Test Plan npm run dev go to any responses details page	2025-06-02 15:15:12 -07:00
Ashwin Bharambe	76dcf47320	docs(mcp): add a few lines for how to specify Auth headers in MCP tools (#2336 )	2025-06-02 14:28:38 -07:00
Sébastien Han	6bb174bb05	revert: "chore: Remove zero-width space characters from OTEL service" (#2331 ) # What does this PR do? Revert #2060 and fix PLE2515. --------- Signed-off-by: Sébastien Han <seb@redhat.com>	2025-06-02 14:21:35 -07:00
Hardik Shah	3511af7c33	fix: fireworks provider for openai compat inference endpoint (#2335 ) fixes provider to use stream var correctly Before ``` curl --request POST \ --url http://localhost:8321/v1/openai/v1/chat/completions \ --header 'content-type: application/json' \ --data '{ "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", "messages": [ { "role": "user", "content": "Who are you?" } ] }' {"detail":"Internal server error: An unexpected error occurred."} ``` After ``` llama-stack % curl --request POST \ --url http://localhost:8321/v1/openai/v1/chat/completions \ --header 'content-type: application/json' \ --data '{ "model": "accounts/fireworks/models/llama4-scout-instruct-basic", "messages": [ { "role": "user", "content": "Who are you?" } ] }' {"id":"chatcmpl-97978538-271d-4c73-8d4d-c509bfb6c87e","choices":[{"message":{"role":"assistant","content":"I'm an AI assistant designed by Meta. I'm here to answer your questions, share interesting ideas and maybe even surprise you with a fresh perspective. What's on your mind?","name":null,"tool_calls":null},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","created":1748896403,"model":"accounts/fireworks/models/llama4-scout-instruct-basic"}% ```	2025-06-02 14:11:15 -07:00
Ashwin Bharambe	7fb4bdabea	docs(kubernetes): add more fleshed-out example of a Demo Kubernetes cluster (#2329 ) This Kubernetes cluster has: - vLLM for serving an inference model - vLLM for serving a safety model - Postgres DB (for metadata and other state for the Llama Stack distro) - Chroma DB for Vector IO (memory) Perhaps most importantly, this was me trying to learn Kubernetes for the first time. ## Test Plan Run `sh apply.sh` against an EKS cluster, then after `kubectl port-forward service/llama-stack-service 8321:8321` and after many attempts, we have finally: <img width="1589" alt="image" src="https://github.com/user-attachments/assets/c69f242d-6aaa-4def-9f7c-172113b8bfc1" /> <img width="1978" alt="image" src="https://github.com/user-attachments/assets/cf678404-f551-4fa5-9077-bebe3e8e8ae8" />	2025-06-02 13:07:08 -07:00
ehhuang	31a3ae60f4	feat: openai files api (#2321 ) # What does this PR do? * Adds the OpenAI compatible Files API * Modified doc gen script to support multipart parameter ## Test Plan --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/2321). * #2330 * __->__ #2321	2025-06-02 11:45:53 -07:00
Ben Browning	17f4414be9	fix: remote-vllm event loop blocking unit test on Mac (#2332 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s Details Integration Tests / test-matrix (http, datasets) (push) Failing after 9s Details Integration Tests / test-matrix (http, scoring) (push) Failing after 8s Details Integration Tests / test-matrix (http, inspect) (push) Failing after 9s Details Integration Tests / test-matrix (http, post_training) (push) Failing after 10s Details Integration Tests / test-matrix (library, datasets) (push) Failing after 9s Details Integration Tests / test-matrix (library, agents) (push) Failing after 9s Details Integration Tests / test-matrix (http, inference) (push) Failing after 11s Details Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 10s Details Integration Tests / test-matrix (http, agents) (push) Failing after 14s Details Integration Tests / test-matrix (http, providers) (push) Failing after 13s Details Integration Tests / test-matrix (library, inference) (push) Failing after 9s Details Test External Providers / test-external-providers (venv) (push) Failing after 5s Details Integration Tests / test-matrix (library, inspect) (push) Failing after 10s Details Integration Tests / test-matrix (library, scoring) (push) Failing after 8s Details Integration Tests / test-matrix (library, providers) (push) Failing after 10s Details Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 9s Details Unit Tests / unit-tests (3.11) (push) Failing after 8s Details Unit Tests / unit-tests (3.10) (push) Failing after 8s Details Unit Tests / unit-tests (3.12) (push) Failing after 8s Details Unit Tests / unit-tests (3.13) (push) Failing after 7s Details Update ReadTheDocs / update-readthedocs (push) Failing after 7s Details Integration Tests / test-matrix (library, post_training) (push) Failing after 29s Details Pre-commit / pre-commit (push) Successful in 1m11s Details # What does this PR do? The remote-vllm `test_chat_completion_doesnt_block_event_loop` unit test was often failing for me on a Mac with a `httpx.ReadError`. I traced this back to the swap to the `AsyncOpenAI` client in the remote-vllm provider as where this started, and it looks like the async client needs a bit more accurate HTTP request handling from our mock server. So, this fixes that unit test to send proper Content-Type and Content-Length headers which makes the `AsyncOpenAI` client happier on Macs. ## Test Plan All the test_remote_vllm.py unit tests consistently pass for me on a Mac now, without any flaking in the event loop one. `pytest -s -v tests/unit/providers/inference/test_remote_vllm.py` Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-06-02 11:24:12 -04:00
Sébastien Han	1c0c6e1e17	chore: remove usage of load_tiktoken_bpe (#2276 )	2025-06-02 07:33:37 -07:00
Sébastien Han	af65207ebd	chore: help setuptools finding the project path (#2333 )	2025-06-02 07:20:46 -07:00
Mark Campbell	c7be73fb16	refactor: remove container from list of run image types (#2178 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 3s Details Integration Tests / test-matrix (http, providers) (push) Failing after 8s Details Integration Tests / test-matrix (http, agents) (push) Failing after 11s Details Integration Tests / test-matrix (http, datasets) (push) Failing after 12s Details Integration Tests / test-matrix (http, scoring) (push) Failing after 10s Details Integration Tests / test-matrix (library, inference) (push) Failing after 8s Details Integration Tests / test-matrix (http, post_training) (push) Failing after 12s Details Integration Tests / test-matrix (http, inference) (push) Failing after 12s Details Integration Tests / test-matrix (library, agents) (push) Failing after 10s Details Integration Tests / test-matrix (http, inspect) (push) Failing after 12s Details Integration Tests / test-matrix (library, datasets) (push) Failing after 10s Details Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 12s Details Integration Tests / test-matrix (library, inspect) (push) Failing after 7s Details Test Llama Stack Build / generate-matrix (push) Successful in 5s Details Test Llama Stack Build / build-single-provider (push) Failing after 6s Details Integration Tests / test-matrix (library, post_training) (push) Failing after 9s Details Integration Tests / test-matrix (library, scoring) (push) Failing after 8s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 6s Details Integration Tests / test-matrix (library, providers) (push) Failing after 10s Details Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 9s Details Test External Providers / test-external-providers (venv) (push) Failing after 7s Details Unit Tests / unit-tests (3.10) (push) Failing after 9s Details Update ReadTheDocs / update-readthedocs (push) Failing after 7s Details Unit Tests / unit-tests (3.12) (push) Failing after 7s Details Unit Tests / unit-tests (3.13) (push) Failing after 8s Details Test Llama Stack Build / build (push) Failing after 7s Details Unit Tests / unit-tests (3.11) (push) Failing after 8s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 30s Details Pre-commit / pre-commit (push) Successful in 2m1s Details # What does this PR do? [Provide a short summary of what this PR does and why. Link to relevant issues if applicable.] Removes the ability to run llama stack container images through the llama stack CLI Closes #2110 ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] Run: ``` llama stack run /path/to/run.yaml --image-type container ``` Expected outcome: ``` llama stack run: error: argument --image-type: invalid choice: 'container' (choose from 'conda', 'venv') ``` [//]: # (## Documentation)	2025-06-02 09:57:55 +02:00
Hardik Shah	b21050935e	feat: New OpenAI compat embeddings API (#2314 ) Some checks failed Integration Tests / test-matrix (http, agents) (push) Failing after 9s Details Integration Tests / test-matrix (http, scoring) (push) Failing after 9s Details Integration Tests / test-matrix (library, inference) (push) Failing after 9s Details Integration Tests / test-matrix (library, inspect) (push) Failing after 9s Details Integration Tests / test-matrix (library, post_training) (push) Failing after 15s Details Integration Tests / test-matrix (library, providers) (push) Failing after 14s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 43s Details Integration Tests / test-matrix (library, scoring) (push) Failing after 8s Details Integration Tests / test-matrix (http, inference) (push) Failing after 46s Details Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 8s Details Integration Tests / test-matrix (library, agents) (push) Failing after 44s Details Integration Tests / test-matrix (http, inspect) (push) Failing after 47s Details Integration Tests / test-matrix (http, providers) (push) Failing after 45s Details Integration Tests / test-matrix (library, datasets) (push) Failing after 45s Details Integration Tests / test-matrix (http, post_training) (push) Failing after 46s Details Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 47s Details Integration Tests / test-matrix (http, datasets) (push) Failing after 49s Details Test External Providers / test-external-providers (venv) (push) Failing after 6s Details Update ReadTheDocs / update-readthedocs (push) Failing after 6s Details Unit Tests / unit-tests (3.12) (push) Failing after 7s Details Unit Tests / unit-tests (3.10) (push) Failing after 8s Details Unit Tests / unit-tests (3.11) (push) Failing after 8s Details Unit Tests / unit-tests (3.13) (push) Failing after 7s Details Pre-commit / pre-commit (push) Successful in 1m12s Details # What does this PR do? Adds a new endpoint that is compatible with OpenAI for embeddings api. `/openai/v1/embeddings` Added providers for OpenAI, LiteLLM and SentenceTransformer. ## Test Plan ``` LLAMA_STACK_CONFIG=http://localhost:8321 pytest -sv tests/integration/inference/test_openai_embeddings.py --embedding-model all-MiniLM-L6-v2,text-embedding-3-small,gemini/text-embedding-004 ```	2025-05-31 22:11:47 -07:00
Ben Browning	277f8690ef	fix: Responses streaming tools don't concatenate None and str (#2326 ) # What does this PR do? This adds a check to ensure we don't attempt to concatenate `None + str` or `str + None` when building up our arguments for streaming tool calls in the Responses API. ## Test Plan All existing tests pass with this change. Unit tests: ``` python -m pytest -s -v \ tests/unit/providers/agents/meta_reference/test_openai_responses.py ``` Integration tests: ``` llama stack run llama_stack/templates/together/run.yaml LLAMA_STACK_CONFIG=http://localhost:8321 \ python -m pytest -s -v \ tests/integration/agents/test_openai_responses.py \ --text-model meta-llama/Llama-4-Scout-17B-16E-Instruct ``` Verification tests: ``` llama stack run llama_stack/templates/together/run.yaml pytest -s -v 'tests/verifications/openai_api/test_responses.py' \ --base-url=http://localhost:8321/v1/openai/v1 \ --model meta-llama/Llama-4-Scout-17B-16E-Instruct ``` Additionally, the manual example using Codex CLI from #2325 now succeeds instead of throwing a 500 error. Closes #2325 Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-05-31 18:24:04 -07:00
Francisco Arceo	f328436831	feat: Enable ingestion of precomputed embeddings (#2317 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 3s Details Integration Tests / test-matrix (http, inspect) (push) Failing after 9s Details Integration Tests / test-matrix (http, post_training) (push) Failing after 9s Details Integration Tests / test-matrix (http, agents) (push) Failing after 10s Details Integration Tests / test-matrix (http, datasets) (push) Failing after 10s Details Integration Tests / test-matrix (http, inference) (push) Failing after 10s Details Integration Tests / test-matrix (library, agents) (push) Failing after 9s Details Integration Tests / test-matrix (http, scoring) (push) Failing after 9s Details Integration Tests / test-matrix (library, datasets) (push) Failing after 8s Details Integration Tests / test-matrix (http, providers) (push) Failing after 9s Details Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 10s Details Integration Tests / test-matrix (library, inference) (push) Failing after 9s Details Test External Providers / test-external-providers (venv) (push) Failing after 6s Details Integration Tests / test-matrix (library, inspect) (push) Failing after 8s Details Integration Tests / test-matrix (library, providers) (push) Failing after 8s Details Integration Tests / test-matrix (library, scoring) (push) Failing after 8s Details Integration Tests / test-matrix (library, post_training) (push) Failing after 10s Details Unit Tests / unit-tests (3.11) (push) Failing after 7s Details Unit Tests / unit-tests (3.10) (push) Failing after 9s Details Unit Tests / unit-tests (3.13) (push) Failing after 7s Details Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 9s Details Unit Tests / unit-tests (3.12) (push) Failing after 9s Details Update ReadTheDocs / update-readthedocs (push) Failing after 7s Details Pre-commit / pre-commit (push) Successful in 1m15s Details	2025-05-31 04:03:37 -06:00
Francisco Arceo	31ce208bda	fix: Fix requirements from broken github-actions[bot] (#2323 ) Some checks failed Integration Tests / test-matrix (http, agents) (push) Failing after 10s Details Integration Tests / test-matrix (http, providers) (push) Failing after 10s Details Integration Tests / test-matrix (library, inference) (push) Failing after 9s Details Integration Tests / test-matrix (library, inspect) (push) Failing after 10s Details Integration Tests / test-matrix (library, post_training) (push) Failing after 7s Details Integration Tests / test-matrix (library, providers) (push) Failing after 8s Details Integration Tests / test-matrix (library, scoring) (push) Failing after 11s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 40s Details Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 10s Details Integration Tests / test-matrix (http, inference) (push) Failing after 46s Details Test External Providers / test-external-providers (venv) (push) Failing after 6s Details Integration Tests / test-matrix (http, datasets) (push) Failing after 47s Details Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 45s Details Integration Tests / test-matrix (http, scoring) (push) Failing after 45s Details Integration Tests / test-matrix (http, post_training) (push) Failing after 47s Details Integration Tests / test-matrix (library, datasets) (push) Failing after 46s Details Integration Tests / test-matrix (http, inspect) (push) Failing after 49s Details Integration Tests / test-matrix (library, agents) (push) Failing after 48s Details Unit Tests / unit-tests (3.10) (push) Failing after 8s Details Unit Tests / unit-tests (3.12) (push) Failing after 7s Details Unit Tests / unit-tests (3.13) (push) Failing after 7s Details Unit Tests / unit-tests (3.11) (push) Failing after 8s Details Pre-commit / pre-commit (push) Successful in 1m33s Details	2025-05-30 19:05:47 -07:00
github-actions[bot]	ad15276da1	build: Bump version to 0.2.9 Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 4s Details Integration Tests / test-matrix (http, inspect) (push) Failing after 9s Details Integration Tests / test-matrix (http, providers) (push) Failing after 9s Details Integration Tests / test-matrix (http, agents) (push) Failing after 10s Details Integration Tests / test-matrix (library, agents) (push) Failing after 8s Details Integration Tests / test-matrix (http, scoring) (push) Failing after 9s Details Integration Tests / test-matrix (http, datasets) (push) Failing after 10s Details Integration Tests / test-matrix (http, post_training) (push) Failing after 10s Details Integration Tests / test-matrix (http, inference) (push) Failing after 11s Details Integration Tests / test-matrix (library, inference) (push) Failing after 8s Details Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 10s Details Integration Tests / test-matrix (library, datasets) (push) Failing after 10s Details Test External Providers / test-external-providers (venv) (push) Failing after 5s Details Integration Tests / test-matrix (library, inspect) (push) Failing after 7s Details Integration Tests / test-matrix (library, post_training) (push) Failing after 8s Details Unit Tests / unit-tests (3.10) (push) Failing after 7s Details Integration Tests / test-matrix (library, providers) (push) Failing after 9s Details Integration Tests / test-matrix (library, scoring) (push) Failing after 9s Details Unit Tests / unit-tests (3.11) (push) Failing after 8s Details Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 9s Details Unit Tests / unit-tests (3.12) (push) Failing after 8s Details Update ReadTheDocs / update-readthedocs (push) Failing after 6s Details Unit Tests / unit-tests (3.13) (push) Failing after 10s Details Pre-commit / pre-commit (push) Failing after 1m34s Details	2025-05-30 19:43:09 +00:00
ehhuang	2603f10f95	feat: support postgresql inference store (#2310 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 3s Details Integration Tests / test-matrix (http, post_training) (push) Failing after 11s Details Integration Tests / test-matrix (library, inference) (push) Failing after 13s Details Integration Tests / test-matrix (http, providers) (push) Failing after 15s Details Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 16s Details Integration Tests / test-matrix (http, datasets) (push) Failing after 18s Details Integration Tests / test-matrix (http, scoring) (push) Failing after 16s Details Integration Tests / test-matrix (http, agents) (push) Failing after 19s Details Integration Tests / test-matrix (library, datasets) (push) Failing after 16s Details Integration Tests / test-matrix (http, inspect) (push) Failing after 18s Details Integration Tests / test-matrix (library, agents) (push) Failing after 18s Details Integration Tests / test-matrix (http, inference) (push) Failing after 20s Details Integration Tests / test-matrix (library, inspect) (push) Failing after 9s Details Integration Tests / test-matrix (library, post_training) (push) Failing after 10s Details Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 8s Details Test External Providers / test-external-providers (venv) (push) Failing after 8s Details Integration Tests / test-matrix (library, scoring) (push) Failing after 9s Details Integration Tests / test-matrix (library, providers) (push) Failing after 11s Details Unit Tests / unit-tests (3.11) (push) Failing after 8s Details Unit Tests / unit-tests (3.10) (push) Failing after 8s Details Unit Tests / unit-tests (3.12) (push) Failing after 8s Details Unit Tests / unit-tests (3.13) (push) Failing after 8s Details Pre-commit / pre-commit (push) Successful in 57s Details # What does this PR do? * Added support postgresql inference store * Added 'oracle' template that demos how to config postgresql stores (except for telemetry, which is not supported currently) ## Test Plan llama stack build --template oracle --image-type conda --run LLAMA_STACK_CONFIG=http://localhost:8321 pytest -s -v tests/integration/ --text-model accounts/fireworks/models/llama-v3p3-70b-instruct -k 'inference_store'	2025-05-29 14:33:09 -07:00
Jorge Piedrahita Ortiz	168c7113df	fix(providers): update sambanova json schema mode (#2306 ) # What does this PR do? Updates sambanova inference to use strict as false in json_schema structured output ## Test Plan pytest -s -v tests/integration/inference/test_text_inference.py --stack-config=sambanova --text-model=sambanova/Meta-Llama-3.3-70B-Instruct	2025-05-29 09:54:23 -07:00
Mark Campbell	f0d8ceb242	chore: fix flaky distro_codegen script (#2305 ) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> Adds an import for all of the template modules before the executor to prevent deadlock <!-- If resolving an issue, uncomment and update the line below --> Closes #2278 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> ``` # Run the pre-commit multiple times and verify the deadlock doesn't occur for i in {1..10}; do pre-commit run --all-files; done ```	2025-05-29 09:53:45 -07:00
Ashwin Bharambe	bfdd15d1fa	fix(responses): use input, not original_input when storing the Response (#2300 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s Details Integration Tests / test-matrix (http, datasets) (push) Failing after 9s Details Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 7s Details Integration Tests / test-matrix (http, providers) (push) Failing after 7s Details Integration Tests / test-matrix (http, agents) (push) Failing after 9s Details Integration Tests / test-matrix (http, inference) (push) Failing after 10s Details Integration Tests / test-matrix (http, post_training) (push) Failing after 9s Details Integration Tests / test-matrix (http, inspect) (push) Failing after 10s Details Integration Tests / test-matrix (http, scoring) (push) Failing after 9s Details Integration Tests / test-matrix (library, agents) (push) Failing after 10s Details Integration Tests / test-matrix (library, datasets) (push) Failing after 9s Details Integration Tests / test-matrix (library, inference) (push) Failing after 7s Details Test External Providers / test-external-providers (venv) (push) Failing after 6s Details Integration Tests / test-matrix (library, post_training) (push) Failing after 8s Details Integration Tests / test-matrix (library, scoring) (push) Failing after 10s Details Integration Tests / test-matrix (library, providers) (push) Failing after 10s Details Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 9s Details Integration Tests / test-matrix (library, inspect) (push) Failing after 11s Details Unit Tests / unit-tests (3.10) (push) Failing after 8s Details Unit Tests / unit-tests (3.12) (push) Failing after 9s Details Unit Tests / unit-tests (3.11) (push) Failing after 9s Details Unit Tests / unit-tests (3.13) (push) Failing after 7s Details Update ReadTheDocs / update-readthedocs (push) Failing after 5s Details Pre-commit / pre-commit (push) Failing after 53s Details We must store the full (re-hydrated) input not just the original input in the Response object. Of course, this is not very space efficient and we should likely find a better storage scheme so that we can only store unique entries in the database and then re-hydrate them efficiently later. But that can be done safely later. Closes https://github.com/meta-llama/llama-stack/issues/2299 ## Test Plan Unit test	2025-05-28 13:17:48 -07:00
Michael Dawson	a654467552	feat: add cpu/cuda config for prompt guard (#2194 ) # What does this PR do? Previously prompt guard was hard coded to require cuda which prevented it from being used on an instance without a cuda support. This PR allows prompt guard to be configured to use either cpu or cuda. [//]: # (If resolving an issue, uncomment and update the line below) Closes [#2133](https://github.com/meta-llama/llama-stack/issues/2133) ## Test Plan (Edited after incorporating suggestion) 1) started stack configured with prompt guard as follows on a system without a GPU and validated prompt guard could be used through the APIs 2) validated on a system with a gpu (but without llama stack) that the python selecting between cpu and cuda support returned the right value when a cuda device was available. 3) ran the unit tests as per - https://github.com/meta-llama/llama-stack/blob/main/tests/unit/README.md [//]: # (## Documentation) --------- Signed-off-by: Michael Dawson <mdawson@devrus.com>	2025-05-28 12:23:15 -07:00
Sébastien Han	63a9f08c9e	chore: use starlette built-in Route class (#2267 ) # What does this PR do? Use a more common pattern and known terminology from the ecosystem, where Route is more approved than Endpoint. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-05-28 09:53:33 -07:00
ehhuang	56e5ddb39f	feat(ui): add views for Responses (#2293 ) # What does this PR do? * Add responses list and detail views * Refactored components to be shared as much as possible between chat completions and responses ## Test Plan <img width="2014" alt="image" src="https://github.com/user-attachments/assets/6dee12ea-8876-4351-a6eb-2338058466ef" /> <img width="2021" alt="image" src="https://github.com/user-attachments/assets/6c7c71b8-25b7-4199-9c57-6960be5580c8" /> added tests	2025-05-28 09:51:22 -07:00
Sébastien Han	6352078e4b	chore: use groups when running commands (#2298 ) # What does this PR do? Followup of https://github.com/meta-llama/llama-stack/pull/2287. We must use `--group` when running commands with uv. <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Signed-off-by: Sébastien Han <seb@redhat.com>	2025-05-28 09:13:16 -07:00
Charlie Doern	a7ecc92be1	docs: add post training to providers list (#2280 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s Details Integration Tests / test-matrix (http, inference) (push) Failing after 11s Details Integration Tests / test-matrix (http, datasets) (push) Failing after 11s Details Integration Tests / test-matrix (http, providers) (push) Failing after 10s Details Integration Tests / test-matrix (http, inspect) (push) Failing after 12s Details Integration Tests / test-matrix (http, agents) (push) Failing after 13s Details Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 10s Details Integration Tests / test-matrix (library, agents) (push) Failing after 10s Details Integration Tests / test-matrix (http, scoring) (push) Failing after 11s Details Integration Tests / test-matrix (http, post_training) (push) Failing after 11s Details Integration Tests / test-matrix (library, datasets) (push) Failing after 10s Details Integration Tests / test-matrix (library, inference) (push) Failing after 8s Details Test External Providers / test-external-providers (venv) (push) Failing after 6s Details Integration Tests / test-matrix (library, inspect) (push) Failing after 9s Details Integration Tests / test-matrix (library, post_training) (push) Failing after 10s Details Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 9s Details Integration Tests / test-matrix (library, scoring) (push) Failing after 9s Details Unit Tests / unit-tests (3.10) (push) Failing after 8s Details Integration Tests / test-matrix (library, providers) (push) Failing after 10s Details Unit Tests / unit-tests (3.11) (push) Failing after 8s Details Unit Tests / unit-tests (3.12) (push) Failing after 9s Details Update ReadTheDocs / update-readthedocs (push) Failing after 6s Details Unit Tests / unit-tests (3.13) (push) Failing after 1m18s Details Pre-commit / pre-commit (push) Successful in 3m0s Details # What does this PR do? the providers list is missing post_training. Add that column and `HuggingFace`, `TorchTune`, and `NVIDIA NEMO` as supported providers. also point to these providers in docs/source/providers/index.md, and describe basic functionality There are other missing provider types here as well, but starting with this Signed-off-by: Charlie Doern <cdoern@redhat.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>	2025-05-28 09:32:00 -04:00
raghotham	9b7f9db05c	fix: build docs without requirements.txt (#2294 ) Some checks failed Integration Tests / test-matrix (http, providers) (push) Failing after 12s Details Integration Tests / test-matrix (library, post_training) (push) Failing after 10s Details Integration Tests / test-matrix (http, agents) (push) Failing after 44s Details Integration Tests / test-matrix (library, datasets) (push) Failing after 44s Details Integration Tests / test-matrix (library, inference) (push) Failing after 12s Details Integration Tests / test-matrix (library, providers) (push) Failing after 9s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 3s Details Integration Tests / test-matrix (library, inspect) (push) Failing after 10s Details Integration Tests / test-matrix (library, scoring) (push) Failing after 9s Details Integration Tests / test-matrix (http, datasets) (push) Failing after 46s Details Integration Tests / test-matrix (http, scoring) (push) Failing after 44s Details Integration Tests / test-matrix (http, post_training) (push) Failing after 46s Details Integration Tests / test-matrix (http, inspect) (push) Failing after 48s Details Integration Tests / test-matrix (library, agents) (push) Failing after 46s Details Test External Providers / test-external-providers (venv) (push) Failing after 5s Details Integration Tests / test-matrix (http, inference) (push) Failing after 52s Details Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 50s Details Unit Tests / unit-tests (3.10) (push) Failing after 8s Details Unit Tests / unit-tests (3.11) (push) Failing after 9s Details Unit Tests / unit-tests (3.13) (push) Failing after 7s Details Unit Tests / unit-tests (3.12) (push) Failing after 9s Details Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 1m34s Details Pre-commit / pre-commit (push) Successful in 3m21s Details Following the instructions here https://docs.readthedocs.com/platform/stable/build-customization.html#install-dependencies-with-uv as per https://github.com/meta-llama/llama-stack/pull/2223#issuecomment-2914315408	2025-05-27 16:27:57 -07:00
ehhuang	0b695538af	fix: chat completion with more than one choice (#2288 ) Some checks failed Integration Tests / test-matrix (http, inference) (push) Failing after 13s Details Integration Tests / test-matrix (library, datasets) (push) Failing after 12s Details Integration Tests / test-matrix (library, providers) (push) Failing after 9s Details Unit Tests / unit-tests (3.10) (push) Failing after 9s Details Unit Tests / unit-tests (3.12) (push) Failing after 1m33s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 8s Details Integration Tests / test-matrix (library, agents) (push) Failing after 11s Details Integration Tests / test-matrix (http, providers) (push) Failing after 13s Details Integration Tests / test-matrix (library, scoring) (push) Failing after 10s Details Unit Tests / unit-tests (3.13) (push) Failing after 9s Details Integration Tests / test-matrix (http, datasets) (push) Failing after 10s Details Integration Tests / test-matrix (http, post_training) (push) Failing after 13s Details Integration Tests / test-matrix (library, inference) (push) Failing after 11s Details Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 9s Details Update ReadTheDocs / update-readthedocs (push) Failing after 7s Details Integration Tests / test-matrix (http, agents) (push) Failing after 11s Details Integration Tests / test-matrix (http, inspect) (push) Failing after 10s Details Integration Tests / test-matrix (http, scoring) (push) Failing after 10s Details Integration Tests / test-matrix (library, inspect) (push) Failing after 10s Details Integration Tests / test-matrix (library, post_training) (push) Failing after 10s Details Test External Providers / test-external-providers (venv) (push) Failing after 8s Details Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 10s Details Unit Tests / unit-tests (3.11) (push) Failing after 8s Details Pre-commit / pre-commit (push) Successful in 3m18s Details # What does this PR do? Fix a bug in openai_compat where choices are not indexed correctly. ## Test Plan Added a new test. Rerun the failed inference_store tests: llama stack run fireworks --image-type conda pytest -s -v tests/integration/ --stack-config http://localhost:8321 -k 'test_inference_store' --text-model meta-llama/Llama-3.3-70B-Instruct --count 10	2025-05-27 15:39:15 -07:00
ehhuang	1d46f3102e	fix: enable test_responses_store (#2290 ) # What does this PR do? Changed the test to not require tool_call in output, but still keeping the tools params there as a smoke test. ## Test Plan Used llama3.3 from fireworks (same as CI) <img width="1433" alt="image" src="https://github.com/user-attachments/assets/1e5fca98-9b4f-402e-a0bc-d9f910f2c207" /> Run with ollama distro and 3b model.	2025-05-27 15:37:28 -07:00
Sébastien Han	4f3f28f718	chore: use dependency-groups for dev (#2287 ) # What does this PR do? The previous `[project.optional-dependencies]` was misrepresenting what the packages were. They were NOT optional dependencies to the project but development dependencies. Unlike optional dependencies, development dependencies are local-only and will not be included in the project requirements when published to PyPI or other indexes. As such, development dependencies are not included in the [project] table. Additionally, the dev group is synced by default. Source: https://docs.astral.sh/uv/concepts/projects/dependencies/#development-dependencies Signed-off-by: Sébastien Han <seb@redhat.com>	2025-05-27 23:00:17 +02:00
Sébastien Han	484abe3116	chore: bump uv version (#2289 ) # What does this PR do? To match the one used by the release bot. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-05-27 13:44:27 -07:00
github-actions[bot]	7105a25b0f	build: Bump version to 0.2.8	2025-05-27 20:28:29 +00:00
Ashwin Bharambe	5cdb29758a	feat(responses): add output_text delta events to responses (#2265 ) This adds initial streaming support to the Responses API. This PR makes sure that the _first_ inference call made to chat completions streams out. There's more to be done: - tool call output tokens need to stream out when possible - we need to loop through multiple rounds of inference and they all need to stream out. ## Test Plan Added a test. Executed as: ``` FIREWORKS_API_KEY=... \ pytest -s -v 'tests/verifications/openai_api/test_responses.py' \ --provider=stack:fireworks --model meta-llama/Llama-4-Scout-17B-16E-Instruct ``` Then, started a llama stack fireworks distro and tested against it like this: ``` OPENAI_API_KEY=blah \ pytest -s -v 'tests/verifications/openai_api/test_responses.py' \ --base-url http://localhost:8321/v1/openai/v1 \ --model meta-llama/Llama-4-Scout-17B-16E-Instruct ```	2025-05-27 13:07:14 -07:00
Sébastien Han	6ee319ae08	fix: convert boolean string to boolean (#2284 ) # What does this PR do? Handles the case where the vllm config `tls_verify` is set to `false` or `true`. Closes: https://github.com/meta-llama/llama-stack/issues/2283 Signed-off-by: Sébastien Han <seb@redhat.com>	2025-05-27 13:05:38 -07:00
Sébastien Han	a8f75d3897	chore: remove dependencies.json (#2281 ) # What does this PR do? It's not used anywhere in the build process. Ancient artifact from an old attempt of using sub packages to build distros. ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> N/A Signed-off-by: Sébastien Han <seb@redhat.com>	2025-05-27 10:26:57 -07:00

1 2 3 4 5 ...

2039 commits