Commit graph

2117 commits

Author SHA1 Message Date
ehhuang
31a3ae60f4
feat: openai files api (#2321)
# What does this PR do?
* Adds the OpenAI compatible Files API
* Modified doc gen script to support multipart parameter

## Test Plan

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/2321).
* #2330
* __->__ #2321
2025-06-02 11:45:53 -07:00
Ben Browning
17f4414be9
fix: remote-vllm event loop blocking unit test on Mac (#2332)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s
Integration Tests / test-matrix (http, datasets) (push) Failing after 9s
Integration Tests / test-matrix (http, scoring) (push) Failing after 8s
Integration Tests / test-matrix (http, inspect) (push) Failing after 9s
Integration Tests / test-matrix (http, post_training) (push) Failing after 10s
Integration Tests / test-matrix (library, datasets) (push) Failing after 9s
Integration Tests / test-matrix (library, agents) (push) Failing after 9s
Integration Tests / test-matrix (http, inference) (push) Failing after 11s
Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 10s
Integration Tests / test-matrix (http, agents) (push) Failing after 14s
Integration Tests / test-matrix (http, providers) (push) Failing after 13s
Integration Tests / test-matrix (library, inference) (push) Failing after 9s
Test External Providers / test-external-providers (venv) (push) Failing after 5s
Integration Tests / test-matrix (library, inspect) (push) Failing after 10s
Integration Tests / test-matrix (library, scoring) (push) Failing after 8s
Integration Tests / test-matrix (library, providers) (push) Failing after 10s
Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 9s
Unit Tests / unit-tests (3.11) (push) Failing after 8s
Unit Tests / unit-tests (3.10) (push) Failing after 8s
Unit Tests / unit-tests (3.12) (push) Failing after 8s
Unit Tests / unit-tests (3.13) (push) Failing after 7s
Update ReadTheDocs / update-readthedocs (push) Failing after 7s
Integration Tests / test-matrix (library, post_training) (push) Failing after 29s
Pre-commit / pre-commit (push) Successful in 1m11s
# What does this PR do?

The remote-vllm `test_chat_completion_doesnt_block_event_loop` unit test
was often failing for me on a Mac with a `httpx.ReadError`. I traced
this back to the swap to the `AsyncOpenAI` client in the remote-vllm
provider as where this started, and it looks like the async client needs
a bit more accurate HTTP request handling from our mock server.

So, this fixes that unit test to send proper Content-Type and
Content-Length headers which makes the `AsyncOpenAI` client happier on
Macs.

## Test Plan

All the test_remote_vllm.py unit tests consistently pass for me on a Mac
now, without any flaking in the event loop one.

`pytest -s -v tests/unit/providers/inference/test_remote_vllm.py`

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-06-02 11:24:12 -04:00
Sébastien Han
1c0c6e1e17
chore: remove usage of load_tiktoken_bpe (#2276) 2025-06-02 07:33:37 -07:00
Sébastien Han
af65207ebd
chore: help setuptools finding the project path (#2333) 2025-06-02 07:20:46 -07:00
Mark Campbell
c7be73fb16
refactor: remove container from list of run image types (#2178)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 3s
Integration Tests / test-matrix (http, providers) (push) Failing after 8s
Integration Tests / test-matrix (http, agents) (push) Failing after 11s
Integration Tests / test-matrix (http, datasets) (push) Failing after 12s
Integration Tests / test-matrix (http, scoring) (push) Failing after 10s
Integration Tests / test-matrix (library, inference) (push) Failing after 8s
Integration Tests / test-matrix (http, post_training) (push) Failing after 12s
Integration Tests / test-matrix (http, inference) (push) Failing after 12s
Integration Tests / test-matrix (library, agents) (push) Failing after 10s
Integration Tests / test-matrix (http, inspect) (push) Failing after 12s
Integration Tests / test-matrix (library, datasets) (push) Failing after 10s
Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 12s
Integration Tests / test-matrix (library, inspect) (push) Failing after 7s
Test Llama Stack Build / generate-matrix (push) Successful in 5s
Test Llama Stack Build / build-single-provider (push) Failing after 6s
Integration Tests / test-matrix (library, post_training) (push) Failing after 9s
Integration Tests / test-matrix (library, scoring) (push) Failing after 8s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 6s
Integration Tests / test-matrix (library, providers) (push) Failing after 10s
Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 9s
Test External Providers / test-external-providers (venv) (push) Failing after 7s
Unit Tests / unit-tests (3.10) (push) Failing after 9s
Update ReadTheDocs / update-readthedocs (push) Failing after 7s
Unit Tests / unit-tests (3.12) (push) Failing after 7s
Unit Tests / unit-tests (3.13) (push) Failing after 8s
Test Llama Stack Build / build (push) Failing after 7s
Unit Tests / unit-tests (3.11) (push) Failing after 8s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 30s
Pre-commit / pre-commit (push) Successful in 2m1s
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]
Removes the ability to run llama stack container images through the
llama stack CLI
Closes #2110
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
Run:
```
llama stack run /path/to/run.yaml --image-type container
```
Expected outcome:
```
llama stack run: error: argument --image-type: invalid choice: 'container' (choose from 'conda', 'venv')
```

[//]: # (## Documentation)
2025-06-02 09:57:55 +02:00
Hardik Shah
b21050935e
feat: New OpenAI compat embeddings API (#2314)
Some checks failed
Integration Tests / test-matrix (http, agents) (push) Failing after 9s
Integration Tests / test-matrix (http, scoring) (push) Failing after 9s
Integration Tests / test-matrix (library, inference) (push) Failing after 9s
Integration Tests / test-matrix (library, inspect) (push) Failing after 9s
Integration Tests / test-matrix (library, post_training) (push) Failing after 15s
Integration Tests / test-matrix (library, providers) (push) Failing after 14s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 43s
Integration Tests / test-matrix (library, scoring) (push) Failing after 8s
Integration Tests / test-matrix (http, inference) (push) Failing after 46s
Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 8s
Integration Tests / test-matrix (library, agents) (push) Failing after 44s
Integration Tests / test-matrix (http, inspect) (push) Failing after 47s
Integration Tests / test-matrix (http, providers) (push) Failing after 45s
Integration Tests / test-matrix (library, datasets) (push) Failing after 45s
Integration Tests / test-matrix (http, post_training) (push) Failing after 46s
Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 47s
Integration Tests / test-matrix (http, datasets) (push) Failing after 49s
Test External Providers / test-external-providers (venv) (push) Failing after 6s
Update ReadTheDocs / update-readthedocs (push) Failing after 6s
Unit Tests / unit-tests (3.12) (push) Failing after 7s
Unit Tests / unit-tests (3.10) (push) Failing after 8s
Unit Tests / unit-tests (3.11) (push) Failing after 8s
Unit Tests / unit-tests (3.13) (push) Failing after 7s
Pre-commit / pre-commit (push) Successful in 1m12s
# What does this PR do?
Adds a new endpoint that is compatible with OpenAI for embeddings api. 
`/openai/v1/embeddings`
Added providers for OpenAI, LiteLLM and SentenceTransformer. 


## Test Plan
```
LLAMA_STACK_CONFIG=http://localhost:8321 pytest -sv tests/integration/inference/test_openai_embeddings.py --embedding-model all-MiniLM-L6-v2,text-embedding-3-small,gemini/text-embedding-004
```
2025-05-31 22:11:47 -07:00
Ben Browning
277f8690ef
fix: Responses streaming tools don't concatenate None and str (#2326)
# What does this PR do?

This adds a check to ensure we don't attempt to concatenate `None + str`
or `str + None` when building up our arguments for streaming tool calls
in the Responses API.

## Test Plan

All existing tests pass with this change.

Unit tests:

```
python -m pytest -s -v \
  tests/unit/providers/agents/meta_reference/test_openai_responses.py
```

Integration tests:

```
llama stack run llama_stack/templates/together/run.yaml

LLAMA_STACK_CONFIG=http://localhost:8321 \
python -m pytest -s -v \
  tests/integration/agents/test_openai_responses.py \
  --text-model meta-llama/Llama-4-Scout-17B-16E-Instruct
```

Verification tests:

```
llama stack run llama_stack/templates/together/run.yaml

pytest -s -v 'tests/verifications/openai_api/test_responses.py' \
  --base-url=http://localhost:8321/v1/openai/v1 \
  --model meta-llama/Llama-4-Scout-17B-16E-Instruct
```

Additionally, the manual example using Codex CLI from #2325 now succeeds
instead of throwing a 500 error.

Closes #2325

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-05-31 18:24:04 -07:00
Francisco Arceo
f328436831
feat: Enable ingestion of precomputed embeddings (#2317)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 3s
Integration Tests / test-matrix (http, inspect) (push) Failing after 9s
Integration Tests / test-matrix (http, post_training) (push) Failing after 9s
Integration Tests / test-matrix (http, agents) (push) Failing after 10s
Integration Tests / test-matrix (http, datasets) (push) Failing after 10s
Integration Tests / test-matrix (http, inference) (push) Failing after 10s
Integration Tests / test-matrix (library, agents) (push) Failing after 9s
Integration Tests / test-matrix (http, scoring) (push) Failing after 9s
Integration Tests / test-matrix (library, datasets) (push) Failing after 8s
Integration Tests / test-matrix (http, providers) (push) Failing after 9s
Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 10s
Integration Tests / test-matrix (library, inference) (push) Failing after 9s
Test External Providers / test-external-providers (venv) (push) Failing after 6s
Integration Tests / test-matrix (library, inspect) (push) Failing after 8s
Integration Tests / test-matrix (library, providers) (push) Failing after 8s
Integration Tests / test-matrix (library, scoring) (push) Failing after 8s
Integration Tests / test-matrix (library, post_training) (push) Failing after 10s
Unit Tests / unit-tests (3.11) (push) Failing after 7s
Unit Tests / unit-tests (3.10) (push) Failing after 9s
Unit Tests / unit-tests (3.13) (push) Failing after 7s
Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 9s
Unit Tests / unit-tests (3.12) (push) Failing after 9s
Update ReadTheDocs / update-readthedocs (push) Failing after 7s
Pre-commit / pre-commit (push) Successful in 1m15s
2025-05-31 04:03:37 -06:00
Francisco Arceo
31ce208bda
fix: Fix requirements from broken github-actions[bot] (#2323)
Some checks failed
Integration Tests / test-matrix (http, agents) (push) Failing after 10s
Integration Tests / test-matrix (http, providers) (push) Failing after 10s
Integration Tests / test-matrix (library, inference) (push) Failing after 9s
Integration Tests / test-matrix (library, inspect) (push) Failing after 10s
Integration Tests / test-matrix (library, post_training) (push) Failing after 7s
Integration Tests / test-matrix (library, providers) (push) Failing after 8s
Integration Tests / test-matrix (library, scoring) (push) Failing after 11s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 40s
Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 10s
Integration Tests / test-matrix (http, inference) (push) Failing after 46s
Test External Providers / test-external-providers (venv) (push) Failing after 6s
Integration Tests / test-matrix (http, datasets) (push) Failing after 47s
Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 45s
Integration Tests / test-matrix (http, scoring) (push) Failing after 45s
Integration Tests / test-matrix (http, post_training) (push) Failing after 47s
Integration Tests / test-matrix (library, datasets) (push) Failing after 46s
Integration Tests / test-matrix (http, inspect) (push) Failing after 49s
Integration Tests / test-matrix (library, agents) (push) Failing after 48s
Unit Tests / unit-tests (3.10) (push) Failing after 8s
Unit Tests / unit-tests (3.12) (push) Failing after 7s
Unit Tests / unit-tests (3.13) (push) Failing after 7s
Unit Tests / unit-tests (3.11) (push) Failing after 8s
Pre-commit / pre-commit (push) Successful in 1m33s
2025-05-30 19:05:47 -07:00
github-actions[bot]
ad15276da1 build: Bump version to 0.2.9
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 4s
Integration Tests / test-matrix (http, inspect) (push) Failing after 9s
Integration Tests / test-matrix (http, providers) (push) Failing after 9s
Integration Tests / test-matrix (http, agents) (push) Failing after 10s
Integration Tests / test-matrix (library, agents) (push) Failing after 8s
Integration Tests / test-matrix (http, scoring) (push) Failing after 9s
Integration Tests / test-matrix (http, datasets) (push) Failing after 10s
Integration Tests / test-matrix (http, post_training) (push) Failing after 10s
Integration Tests / test-matrix (http, inference) (push) Failing after 11s
Integration Tests / test-matrix (library, inference) (push) Failing after 8s
Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 10s
Integration Tests / test-matrix (library, datasets) (push) Failing after 10s
Test External Providers / test-external-providers (venv) (push) Failing after 5s
Integration Tests / test-matrix (library, inspect) (push) Failing after 7s
Integration Tests / test-matrix (library, post_training) (push) Failing after 8s
Unit Tests / unit-tests (3.10) (push) Failing after 7s
Integration Tests / test-matrix (library, providers) (push) Failing after 9s
Integration Tests / test-matrix (library, scoring) (push) Failing after 9s
Unit Tests / unit-tests (3.11) (push) Failing after 8s
Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 9s
Unit Tests / unit-tests (3.12) (push) Failing after 8s
Update ReadTheDocs / update-readthedocs (push) Failing after 6s
Unit Tests / unit-tests (3.13) (push) Failing after 10s
Pre-commit / pre-commit (push) Failing after 1m34s
2025-05-30 19:43:09 +00:00
ehhuang
2603f10f95
feat: support postgresql inference store (#2310)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 3s
Integration Tests / test-matrix (http, post_training) (push) Failing after 11s
Integration Tests / test-matrix (library, inference) (push) Failing after 13s
Integration Tests / test-matrix (http, providers) (push) Failing after 15s
Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 16s
Integration Tests / test-matrix (http, datasets) (push) Failing after 18s
Integration Tests / test-matrix (http, scoring) (push) Failing after 16s
Integration Tests / test-matrix (http, agents) (push) Failing after 19s
Integration Tests / test-matrix (library, datasets) (push) Failing after 16s
Integration Tests / test-matrix (http, inspect) (push) Failing after 18s
Integration Tests / test-matrix (library, agents) (push) Failing after 18s
Integration Tests / test-matrix (http, inference) (push) Failing after 20s
Integration Tests / test-matrix (library, inspect) (push) Failing after 9s
Integration Tests / test-matrix (library, post_training) (push) Failing after 10s
Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 8s
Test External Providers / test-external-providers (venv) (push) Failing after 8s
Integration Tests / test-matrix (library, scoring) (push) Failing after 9s
Integration Tests / test-matrix (library, providers) (push) Failing after 11s
Unit Tests / unit-tests (3.11) (push) Failing after 8s
Unit Tests / unit-tests (3.10) (push) Failing after 8s
Unit Tests / unit-tests (3.12) (push) Failing after 8s
Unit Tests / unit-tests (3.13) (push) Failing after 8s
Pre-commit / pre-commit (push) Successful in 57s
# What does this PR do?
* Added support postgresql inference store
* Added 'oracle' template that demos how to config postgresql stores
(except for telemetry, which is not supported currently)


## Test Plan

llama stack build --template oracle --image-type conda --run
LLAMA_STACK_CONFIG=http://localhost:8321 pytest -s -v tests/integration/
--text-model accounts/fireworks/models/llama-v3p3-70b-instruct -k
'inference_store'
2025-05-29 14:33:09 -07:00
Jorge Piedrahita Ortiz
168c7113df
fix(providers): update sambanova json schema mode (#2306)
# What does this PR do?
Updates sambanova inference to use strict as false in json_schema
structured output

## Test Plan
pytest -s -v tests/integration/inference/test_text_inference.py
--stack-config=sambanova
--text-model=sambanova/Meta-Llama-3.3-70B-Instruct
2025-05-29 09:54:23 -07:00
Mark Campbell
f0d8ceb242
chore: fix flaky distro_codegen script (#2305)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
Adds an import for all of the template modules before the executor to
prevent deadlock
<!-- If resolving an issue, uncomment and update the line below -->
Closes #2278

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
```
# Run the pre-commit multiple times and verify the deadlock doesn't occur
for i in {1..10}; do pre-commit run --all-files; done
```
2025-05-29 09:53:45 -07:00
Ashwin Bharambe
bfdd15d1fa
fix(responses): use input, not original_input when storing the Response (#2300)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Integration Tests / test-matrix (http, datasets) (push) Failing after 9s
Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 7s
Integration Tests / test-matrix (http, providers) (push) Failing after 7s
Integration Tests / test-matrix (http, agents) (push) Failing after 9s
Integration Tests / test-matrix (http, inference) (push) Failing after 10s
Integration Tests / test-matrix (http, post_training) (push) Failing after 9s
Integration Tests / test-matrix (http, inspect) (push) Failing after 10s
Integration Tests / test-matrix (http, scoring) (push) Failing after 9s
Integration Tests / test-matrix (library, agents) (push) Failing after 10s
Integration Tests / test-matrix (library, datasets) (push) Failing after 9s
Integration Tests / test-matrix (library, inference) (push) Failing after 7s
Test External Providers / test-external-providers (venv) (push) Failing after 6s
Integration Tests / test-matrix (library, post_training) (push) Failing after 8s
Integration Tests / test-matrix (library, scoring) (push) Failing after 10s
Integration Tests / test-matrix (library, providers) (push) Failing after 10s
Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 9s
Integration Tests / test-matrix (library, inspect) (push) Failing after 11s
Unit Tests / unit-tests (3.10) (push) Failing after 8s
Unit Tests / unit-tests (3.12) (push) Failing after 9s
Unit Tests / unit-tests (3.11) (push) Failing after 9s
Unit Tests / unit-tests (3.13) (push) Failing after 7s
Update ReadTheDocs / update-readthedocs (push) Failing after 5s
Pre-commit / pre-commit (push) Failing after 53s
We must store the full (re-hydrated) input not just the original input
in the Response object. Of course, this is not very space efficient and
we should likely find a better storage scheme so that we can only store
unique entries in the database and then re-hydrate them efficiently
later. But that can be done safely later.

Closes https://github.com/meta-llama/llama-stack/issues/2299

## Test Plan

Unit test
2025-05-28 13:17:48 -07:00
Michael Dawson
a654467552
feat: add cpu/cuda config for prompt guard (#2194)
# What does this PR do?
Previously prompt guard was hard coded to require cuda which prevented
it from being used on an instance without a cuda support.

This PR allows prompt guard to be configured to use either cpu or cuda.

[//]: # (If resolving an issue, uncomment and update the line below)
Closes [#2133](https://github.com/meta-llama/llama-stack/issues/2133)

## Test Plan (Edited after incorporating suggestion)
1) started stack configured with prompt guard as follows on a system
without a GPU
and validated prompt guard could be used through the APIs

2) validated on a system with a gpu (but without llama stack) that the
python selecting between cpu and cuda support returned the right value
when a cuda device was available.

3) ran the unit tests as per -
https://github.com/meta-llama/llama-stack/blob/main/tests/unit/README.md

[//]: # (## Documentation)

---------

Signed-off-by: Michael Dawson <mdawson@devrus.com>
2025-05-28 12:23:15 -07:00
Sébastien Han
63a9f08c9e
chore: use starlette built-in Route class (#2267)
# What does this PR do?

Use a more common pattern and known terminology from the ecosystem,
where Route is more approved than Endpoint.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-28 09:53:33 -07:00
ehhuang
56e5ddb39f
feat(ui): add views for Responses (#2293)
# What does this PR do?
* Add responses list and detail views
* Refactored components to be shared as much as possible between chat
completions and responses

## Test Plan
<img width="2014" alt="image"
src="https://github.com/user-attachments/assets/6dee12ea-8876-4351-a6eb-2338058466ef"
/>
<img width="2021" alt="image"
src="https://github.com/user-attachments/assets/6c7c71b8-25b7-4199-9c57-6960be5580c8"
/>

added tests
2025-05-28 09:51:22 -07:00
Sébastien Han
6352078e4b
chore: use groups when running commands (#2298)
# What does this PR do?

Followup of https://github.com/meta-llama/llama-stack/pull/2287. We must
use `--group` when running commands with uv.

<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-28 09:13:16 -07:00
Charlie Doern
a7ecc92be1
docs: add post training to providers list (#2280)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s
Integration Tests / test-matrix (http, inference) (push) Failing after 11s
Integration Tests / test-matrix (http, datasets) (push) Failing after 11s
Integration Tests / test-matrix (http, providers) (push) Failing after 10s
Integration Tests / test-matrix (http, inspect) (push) Failing after 12s
Integration Tests / test-matrix (http, agents) (push) Failing after 13s
Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 10s
Integration Tests / test-matrix (library, agents) (push) Failing after 10s
Integration Tests / test-matrix (http, scoring) (push) Failing after 11s
Integration Tests / test-matrix (http, post_training) (push) Failing after 11s
Integration Tests / test-matrix (library, datasets) (push) Failing after 10s
Integration Tests / test-matrix (library, inference) (push) Failing after 8s
Test External Providers / test-external-providers (venv) (push) Failing after 6s
Integration Tests / test-matrix (library, inspect) (push) Failing after 9s
Integration Tests / test-matrix (library, post_training) (push) Failing after 10s
Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 9s
Integration Tests / test-matrix (library, scoring) (push) Failing after 9s
Unit Tests / unit-tests (3.10) (push) Failing after 8s
Integration Tests / test-matrix (library, providers) (push) Failing after 10s
Unit Tests / unit-tests (3.11) (push) Failing after 8s
Unit Tests / unit-tests (3.12) (push) Failing after 9s
Update ReadTheDocs / update-readthedocs (push) Failing after 6s
Unit Tests / unit-tests (3.13) (push) Failing after 1m18s
Pre-commit / pre-commit (push) Successful in 3m0s
# What does this PR do?

the providers list is missing post_training. Add that column and
`HuggingFace`, `TorchTune`, and `NVIDIA NEMO` as supported providers.

also point to these providers in docs/source/providers/index.md, and
describe basic functionality

There are other missing provider types here as well, but starting with
this

Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-05-28 09:32:00 -04:00
raghotham
9b7f9db05c
fix: build docs without requirements.txt (#2294)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 3s
Integration Tests / test-matrix (http, providers) (push) Failing after 12s
Integration Tests / test-matrix (library, inference) (push) Failing after 12s
Integration Tests / test-matrix (library, inspect) (push) Failing after 10s
Integration Tests / test-matrix (library, post_training) (push) Failing after 10s
Integration Tests / test-matrix (library, providers) (push) Failing after 9s
Integration Tests / test-matrix (library, scoring) (push) Failing after 9s
Integration Tests / test-matrix (http, agents) (push) Failing after 44s
Integration Tests / test-matrix (http, datasets) (push) Failing after 46s
Integration Tests / test-matrix (http, scoring) (push) Failing after 44s
Integration Tests / test-matrix (library, datasets) (push) Failing after 44s
Integration Tests / test-matrix (http, post_training) (push) Failing after 46s
Integration Tests / test-matrix (http, inspect) (push) Failing after 48s
Integration Tests / test-matrix (library, agents) (push) Failing after 46s
Test External Providers / test-external-providers (venv) (push) Failing after 5s
Integration Tests / test-matrix (http, inference) (push) Failing after 52s
Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 50s
Unit Tests / unit-tests (3.10) (push) Failing after 8s
Unit Tests / unit-tests (3.11) (push) Failing after 9s
Unit Tests / unit-tests (3.13) (push) Failing after 7s
Unit Tests / unit-tests (3.12) (push) Failing after 9s
Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 1m34s
Pre-commit / pre-commit (push) Successful in 3m21s
Following the instructions here
https://docs.readthedocs.com/platform/stable/build-customization.html#install-dependencies-with-uv
as per
https://github.com/meta-llama/llama-stack/pull/2223#issuecomment-2914315408
2025-05-27 16:27:57 -07:00
ehhuang
0b695538af
fix: chat completion with more than one choice (#2288)
Some checks failed
Integration Tests / test-matrix (http, inference) (push) Failing after 13s
Integration Tests / test-matrix (library, datasets) (push) Failing after 12s
Integration Tests / test-matrix (library, providers) (push) Failing after 9s
Unit Tests / unit-tests (3.10) (push) Failing after 9s
Unit Tests / unit-tests (3.12) (push) Failing after 1m33s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 8s
Integration Tests / test-matrix (library, agents) (push) Failing after 11s
Integration Tests / test-matrix (http, providers) (push) Failing after 13s
Integration Tests / test-matrix (library, scoring) (push) Failing after 10s
Unit Tests / unit-tests (3.13) (push) Failing after 9s
Integration Tests / test-matrix (http, datasets) (push) Failing after 10s
Integration Tests / test-matrix (http, post_training) (push) Failing after 13s
Integration Tests / test-matrix (library, inference) (push) Failing after 11s
Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 9s
Update ReadTheDocs / update-readthedocs (push) Failing after 7s
Integration Tests / test-matrix (http, agents) (push) Failing after 11s
Integration Tests / test-matrix (http, inspect) (push) Failing after 10s
Integration Tests / test-matrix (http, scoring) (push) Failing after 10s
Integration Tests / test-matrix (library, inspect) (push) Failing after 10s
Integration Tests / test-matrix (library, post_training) (push) Failing after 10s
Test External Providers / test-external-providers (venv) (push) Failing after 8s
Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 10s
Unit Tests / unit-tests (3.11) (push) Failing after 8s
Pre-commit / pre-commit (push) Successful in 3m18s
# What does this PR do?
Fix a bug in openai_compat where choices are not indexed correctly.

## Test Plan
Added a new test.

Rerun the failed inference_store tests:
llama stack run fireworks --image-type conda
pytest -s -v tests/integration/ --stack-config http://localhost:8321 -k
'test_inference_store' --text-model meta-llama/Llama-3.3-70B-Instruct
--count 10
2025-05-27 15:39:15 -07:00
ehhuang
1d46f3102e
fix: enable test_responses_store (#2290)
# What does this PR do?
Changed the test to not require tool_call in output, but still keeping
the tools params there as a smoke test.

## Test Plan
Used llama3.3 from fireworks (same as CI)
<img width="1433" alt="image"
src="https://github.com/user-attachments/assets/1e5fca98-9b4f-402e-a0bc-d9f910f2c207"
/>

Run with ollama distro and 3b model.
2025-05-27 15:37:28 -07:00
Sébastien Han
4f3f28f718
chore: use dependency-groups for dev (#2287)
# What does this PR do?

The previous `[project.optional-dependencies]` was misrepresenting what
the packages were. They were NOT optional dependencies to the project
but development dependencies. Unlike optional dependencies, development
dependencies are local-only and will not be included in the project
requirements when published to PyPI or other indexes. As such,
development dependencies are not included in the [project] table.
Additionally, the dev group is synced by default.

Source:

https://docs.astral.sh/uv/concepts/projects/dependencies/#development-dependencies

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-27 23:00:17 +02:00
Sébastien Han
484abe3116
chore: bump uv version (#2289)
# What does this PR do?

To match the one used by the release bot.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-27 13:44:27 -07:00
github-actions[bot]
7105a25b0f build: Bump version to 0.2.8 2025-05-27 20:28:29 +00:00
Ashwin Bharambe
5cdb29758a
feat(responses): add output_text delta events to responses (#2265)
This adds initial streaming support to the Responses API. 

This PR makes sure that the _first_ inference call made to chat
completions streams out.

There's more to be done:
 - tool call output tokens need to stream out when possible
- we need to loop through multiple rounds of inference and they all need
to stream out.

## Test Plan

Added a test. Executed as:

```
FIREWORKS_API_KEY=... \
  pytest -s -v 'tests/verifications/openai_api/test_responses.py' \
  --provider=stack:fireworks --model meta-llama/Llama-4-Scout-17B-16E-Instruct
```

Then, started a llama stack fireworks distro and tested against it like
this:

```
OPENAI_API_KEY=blah \
   pytest -s -v 'tests/verifications/openai_api/test_responses.py' \
   --base-url http://localhost:8321/v1/openai/v1 \
  --model meta-llama/Llama-4-Scout-17B-16E-Instruct 
```
2025-05-27 13:07:14 -07:00
Sébastien Han
6ee319ae08
fix: convert boolean string to boolean (#2284)
# What does this PR do?

Handles the case where the vllm config `tls_verify` is set to `false` or
`true`.

Closes: https://github.com/meta-llama/llama-stack/issues/2283

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-27 13:05:38 -07:00
Sébastien Han
a8f75d3897
chore: remove dependencies.json (#2281)
# What does this PR do?
It's not used anywhere in the build process. Ancient artifact from an
old attempt of using sub packages to build distros.

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

N/A

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-27 10:26:57 -07:00
Mark Campbell
e7e9ec0379
chore: fix visible comments in pr template (#2279)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 4s
Integration Tests / test-matrix (http, inspect) (push) Failing after 10s
Integration Tests / test-matrix (http, scoring) (push) Failing after 9s
Integration Tests / test-matrix (http, datasets) (push) Failing after 11s
Integration Tests / test-matrix (http, agents) (push) Failing after 14s
Integration Tests / test-matrix (http, providers) (push) Failing after 11s
Integration Tests / test-matrix (http, post_training) (push) Failing after 12s
Integration Tests / test-matrix (http, inference) (push) Failing after 13s
Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 12s
Integration Tests / test-matrix (library, datasets) (push) Failing after 11s
Integration Tests / test-matrix (library, inspect) (push) Failing after 9s
Integration Tests / test-matrix (library, agents) (push) Failing after 14s
Integration Tests / test-matrix (library, inference) (push) Failing after 11s
Integration Tests / test-matrix (library, post_training) (push) Failing after 10s
Integration Tests / test-matrix (library, providers) (push) Failing after 10s
Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 10s
Integration Tests / test-matrix (library, scoring) (push) Failing after 13s
Test Llama Stack Build / generate-matrix (push) Successful in 7s
Test Llama Stack Build / build-single-provider (push) Failing after 7s
Test External Providers / test-external-providers (venv) (push) Failing after 7s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 9s
Unit Tests / unit-tests (3.10) (push) Failing after 10s
Unit Tests / unit-tests (3.11) (push) Failing after 9s
Unit Tests / unit-tests (3.13) (push) Failing after 8s
Unit Tests / unit-tests (3.12) (push) Failing after 11s
Update ReadTheDocs / update-readthedocs (push) Failing after 10s
Test Llama Stack Build / build (push) Failing after 12s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 1m18s
Pre-commit / pre-commit (push) Successful in 3m15s
# What does this PR do?
This PR adds updated comments for the PR template as comments were
showing up in PRs when they were not meant to
2025-05-27 15:42:33 +02:00
Mark Campbell
b2adaa3f60
docs: fix evals notebook preview (#2277)
# What does this PR do?
Fixes the preview of the Evals Benchmark Notebook

## Explanation 
I took the original notebook, opened it in Google Colab and downloaded
it again from Colab.
I then replaced the original with the new fixed version 
cc: @leseb 

Closes #2142 

## Test Plan
You can view the nb preview from my fork
https://github.com/Bobbins228/llama-stack/blob/fix-evals-nb/docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb
2025-05-27 15:18:20 +02:00
Sébastien Han
448f00903d
chore: mark blobpath as optional (#2271)
# What does this PR do?

This is not a core dependency of the distro server. It's only necessary
when using `inline::rag-runtime` or `inline::meta-reference` providers.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-27 10:55:24 +02:00
Ignas Baranauskas
28930cdab6
fix: handle None external_providers_dir in build with run arg (#2269)
# What does this PR do?
Fixes an issue where running `llama stack build --template ollama
--image-type venv --run` fails with a TypeError when validating external
providers directory paths.

The error occurs because `os.path.exists()` is called with `Path(None)`
instead of converting it to a string first. This change ensures
consistent handling of `None` values for `external_providers_dir` across
both build and
[run](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/cli/stack/run.py#L134)
commands by using `str()` conversion before path validation.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
```bash
INFERENCE_MODEL=llama3.2:3b uv run --with llama-stack llama stack build --template ollama --image-type venv --run
```
Command completes successfully without TypeError

[//]: # (## Documentation)
2025-05-27 09:41:12 +02:00
Ashwin Bharambe
7504c2f430
test: disable test_inference_store test urrrggg (#2273)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 45s
Integration Tests / test-matrix (http, agents) (push) Failing after 51s
Integration Tests / test-matrix (http, post_training) (push) Failing after 49s
Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 50s
Integration Tests / test-matrix (http, inspect) (push) Failing after 52s
Integration Tests / test-matrix (library, inference) (push) Failing after 9s
Integration Tests / test-matrix (library, agents) (push) Failing after 54s
Integration Tests / test-matrix (http, datasets) (push) Failing after 57s
Integration Tests / test-matrix (library, datasets) (push) Failing after 52s
Integration Tests / test-matrix (http, inference) (push) Failing after 58s
Integration Tests / test-matrix (http, scoring) (push) Failing after 55s
Integration Tests / test-matrix (http, providers) (push) Failing after 56s
Integration Tests / test-matrix (library, inspect) (push) Failing after 10s
Integration Tests / test-matrix (library, providers) (push) Failing after 10s
Integration Tests / test-matrix (library, scoring) (push) Failing after 9s
Integration Tests / test-matrix (library, post_training) (push) Failing after 13s
Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 11s
Test External Providers / test-external-providers (venv) (push) Failing after 8s
Unit Tests / unit-tests (3.11) (push) Failing after 9s
Unit Tests / unit-tests (3.12) (push) Failing after 8s
Unit Tests / unit-tests (3.13) (push) Failing after 8s
Unit Tests / unit-tests (3.10) (push) Failing after 1m41s
Pre-commit / pre-commit (push) Successful in 3m32s
2025-05-26 22:48:41 -07:00
Ashwin Bharambe
51e6f529f3
fix: index non-MCP toolgroups at registration time (#2272)
Two somewhat annoying fixes: 

- we are going to index tools for non-MCP toolgroups always (like we
used to do). because there are just random assumptions in our tests,
etc. and I don't want to fix them right now
- we need to handle the funny case of toolgroups like
`builtin::rag/knowledge_search` where we added the tool name to use in
the toolgroup itself.
2025-05-26 20:33:36 -07:00
Sébastien Han
39b33a3b01
chore: allow to pass CA cert to remote vllm (#2266)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 14s
Integration Tests / test-matrix (http, inference) (push) Failing after 22s
Integration Tests / test-matrix (http, datasets) (push) Failing after 28s
Integration Tests / test-matrix (http, inspect) (push) Failing after 29s
Integration Tests / test-matrix (http, scoring) (push) Failing after 30s
Integration Tests / test-matrix (library, datasets) (push) Failing after 18s
Integration Tests / test-matrix (library, agents) (push) Failing after 28s
Integration Tests / test-matrix (library, inference) (push) Failing after 9s
Integration Tests / test-matrix (http, post_training) (push) Failing after 35s
Integration Tests / test-matrix (http, agents) (push) Failing after 37s
Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 34s
Integration Tests / test-matrix (http, providers) (push) Failing after 35s
Integration Tests / test-matrix (library, inspect) (push) Failing after 9s
Integration Tests / test-matrix (library, providers) (push) Failing after 10s
Integration Tests / test-matrix (library, scoring) (push) Failing after 8s
Test External Providers / test-external-providers (venv) (push) Failing after 7s
Integration Tests / test-matrix (library, post_training) (push) Failing after 10s
Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 11s
Unit Tests / unit-tests (3.11) (push) Failing after 10s
Unit Tests / unit-tests (3.12) (push) Failing after 9s
Unit Tests / unit-tests (3.13) (push) Failing after 8s
Unit Tests / unit-tests (3.10) (push) Failing after 1m18s
Pre-commit / pre-commit (push) Successful in 3m12s
# What does this PR do?

The `tls_verify` can now receive a path to a certificate file if the
endpoint requires it.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-26 20:59:03 +02:00
Sébastien Han
7710b2f43b
chore: removed unused class (#2268)
Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-26 08:41:37 -07:00
Ashwin Bharambe
9623d5d230
fix: match mcp headers in provider data to Responses API shape (#2263) 2025-05-25 14:33:10 -07:00
Ashwin Bharambe
ce33d02443
fix(tools): do not index tools, only index toolgroups (#2261)
When registering a MCP endpoint, we cannot list tools (like we used to)
since the MCP endpoint may be behind an auth wall. Registration can
happen much sooner (via run.yaml).

Instead, we do listing only when the _user_ actually calls listing.
Furthermore, we cache the list in-memory in the server. Currently, the
cache is not invalidated -- we may want to periodically re-list for MCP
servers. Note that they must call `list_tools` before calling
`invoke_tool` -- we use this critically.

This will enable us to list MCP servers in run.yaml

## Test Plan

Existing tests, updated tests accordingly.
2025-05-25 13:27:52 -07:00
raghotham
5a422e236c
chore: make cprint write to stderr (#2250)
Also do sys.exit(1) in case of errors
2025-05-24 23:39:57 -07:00
raghotham
c25bd0ad58
fix: use pypi browser agent (#2260)
Getting this error from pypi of late

```
'python-requests/2.32.3 User-Agents are currently blocked from accessing JSON release resources. A cluster is apparently crawling all project/release resources resulting in excess cache misses. Please contact admin@pypi.org if you have information regarding what this software may be.'
```
2025-05-24 23:26:30 -07:00
Ashwin Bharambe
298721c238
chore: split routing_tables into individual files (#2259) 2025-05-24 23:15:05 -07:00
Ashwin Bharambe
eedf21f19c
chore: split routers into individual files (inference, tool, vector_io, eval_scoring) (#2258) 2025-05-24 22:59:07 -07:00
Ashwin Bharambe
ae7272d8ff
chore: split routers into individual files (datasets) (#2249) 2025-05-24 22:11:43 -07:00
Ashwin Bharambe
a2160dc0af
chore: split routers into individual files (safety)
Reviewers:
bbrowning, leseb, ehhuang, terrytangyuan, raghotham, yanxi0830, hardikjshah

Reviewed By: raghotham

Pull Request: https://github.com/meta-llama/llama-stack/pull/2248
2025-05-24 22:00:32 -07:00
Ashwin Bharambe
c290999c63
fix(telemetry): get rid of annoying sqlite span export error (#2245) 2025-05-24 20:24:34 -07:00
Ashwin Bharambe
3faf1e4a79
feat: enable MCP execution in Responses impl (#2240)
## Test Plan

```
pytest -s -v 'tests/verifications/openai_api/test_responses.py' \
  --provider=stack:together --model meta-llama/Llama-4-Scout-17B-16E-Instruct
```
2025-05-24 14:20:42 -07:00
Ashwin Bharambe
66f09f24ed
fix: disable test_responses_store (#2244)
The test depends on llama's tool calling ability. In the CI, we run with
a small ollama model.

The fix might be to check for either message or function_call because
the model is flaky and we aren't really testing that behavior?
2025-05-24 08:18:06 -07:00
raghotham
84751f3e55
fix: skip failing tests (#2243)
as title. trying release 0.2.8
2025-05-24 07:31:08 -07:00
Yuan Tang
a411029d7e
docs: Update CHANGELOG.md (#2241)
# What does this PR do?

This PR adds release notes for recent releases.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-05-24 07:06:36 -07:00
ehhuang
15b0a67555
feat: add responses input items api (#2239)
# What does this PR do?
TSIA

## Test Plan
added integration and unit tests
2025-05-24 07:05:53 -07:00