Commit graph

85 commits

Author SHA1 Message Date
IAN MILLER
b57db11bed
feat: create dynamic model registration for OpenAI and Llama compat remote inference providers (#2745)
# What does this PR do?
The purpose of this task is to create a solution that automatically detects when new models are added, deprecated, or removed by the OpenAI and Llama API providers, and automatically updates the list of supported models in Llama Stack.

This feature is vitally important to avoid missing new models and editing the entries manually, so I created automation allowing users to dynamically register the following (a minimal sketch follows the list below):
- any models from the OpenAI provider available at [https://api.openai.com/v1/models](https://api.openai.com/v1/models) that are not in [openai/models.py](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/openai/models.py)

- any models from the Llama API provider available at [https://api.llama.com/v1/models](https://api.llama.com/v1/models) that are not in [llama_openai_compat/models.py](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/llama_openai_compat/models.py)
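
For illustration, a minimal sketch of the approach (the `STATIC_MODELS` set and the merge logic are illustrative, not the actual Llama Stack internals):

```python
from openai import OpenAI

# Models already hard-coded in the provider's models.py (illustrative).
STATIC_MODELS = {"gpt-4o-mini", "gpt-4-turbo-preview"}

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# GET https://api.openai.com/v1/models lists every model the key can use.
available = {model.id for model in client.models.list()}

# Anything the provider serves beyond the static config becomes
# dynamically registrable.
dynamic_only = available - STATIC_MODELS
print(f"{len(dynamic_only)} models available beyond the static config")
```

The same sketch works for the Llama API by pointing the client at `https://api.llama.com/v1`, since that provider is OpenAI-compatible.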

Closes #2504

This PR depends on #2710.

## Test Plan

1. Create a venv at the root llamastack directory:
`uv venv .venv --python 3.12 --seed`
2. Activate the venv:
`source .venv/bin/activate`
3. `uv pip install -e .`
4. Create an OpenAI distro by modifying run.yaml
5. Build the distro:
`llama stack build --template starter --image-type venv`
6. Run Llama Stack (first navigate to the templates/starter folder):
`llama stack run run.yaml --image-type venv OPENAI_API_KEY=<YOUR_KEY> ENABLE_OPENAI=openai`
7. Try to register a dummy LLM that doesn't exist in the OpenAI provider:
`llama-stack-client models register ianm/ianllm --provider-model-id=ianllm --provider-id=openai`
 
You should receive this output, a combined list of the static config plus the available models fetched from OpenAI:

[Screenshot: combined list of static and fetched OpenAI models]

8. Then register a real LLM from OpenAI:
`llama-stack-client models register openai/gpt-4-turbo-preview --provider-model-id=gpt-4-turbo-preview --provider-id=openai`

[Screenshot: model registration output]
[Screenshot: models list output]

We correctly fetched all available models from OpenAI.

As for the Llama API: as a non-US person I don't have access to a Llama API key, but I have joined the waitlist. The implementation for Llama is the same as for OpenAI, since the Llama API is OpenAI-compatible and the response from its GET models endpoint has the same structure as OpenAI's:
https://llama.developer.meta.com/docs/api/models
2025-07-16 12:49:38 -04:00
Ashwin Bharambe
95fdc8ea94 build: Bump version to 0.2.15 2025-07-15 20:29:08 -07:00
Kelly Brown
b096794959
docs: Reorganize documentation on the webpage (#2651)
# What does this PR do?
Reorganizes the Llama Stack webpage into more concise index pages, introduces more of a workflow, and reduces repetition of content.

New nav structure so far based on #2637 

Further discussions in
https://github.com/meta-llama/llama-stack/discussions/2585

**Preview:**
[Screenshot: new documentation navigation]

You can also build a full preview locally.

**Feedback**
Looking for feedback on the page titles and on the new structure in general.

**Follow-up documentation**
I plan on reducing some sections and standardizing some terminology in a follow-up PR.
More discussions on that in
https://github.com/meta-llama/llama-stack/discussions/2585
2025-07-15 14:19:35 -07:00
Matthew Farrellee
68e7978c88
chore: block network access from unit tests (#2732)
# What does this PR do?
This blocks network access for all `tests/unit/` tests. `tests/integration/` tests are untouched.

It also introduces an `allow_network` marker to explicitly allow network access.
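
For example, a test that genuinely needs the network would opt out of the block with the marker (a sketch; the marker registration itself lives in the suite's pytest configuration):

```python
import pytest

@pytest.mark.allow_network
def test_can_reach_local_server():
    # Without this marker, any socket use in tests/unit/ is blocked.
    ...
```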

## Test Plan
`./scripts/unit-tests.sh`
2025-07-12 16:53:54 -07:00
Ben Browning
51d9fd4808
fix: Don't cache clients for passthrough auth providers (#2728)
# What does this PR do?

Some of our inference providers support passthrough authentication via `x-llamastack-provider-data` header values. This fixes the providers that support passthrough auth so that they no longer cache their clients to the backend providers (mostly OpenAI client instances), meaning the client connecting to Llama Stack has to provide those auth values on each and every request.
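
A sketch of the pattern (illustrative names and key, not the actual provider code): instead of building one client at construction time, the provider derives a fresh client from the per-request provider data:

```python
from openai import OpenAI

def client_for_request(provider_data: dict) -> OpenAI:
    # provider_data is parsed from the x-llamastack-provider-data header
    # on each request; no client instance is cached across requests.
    return OpenAI(api_key=provider_data["openai_api_key"])
```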

## Test Plan

I added some unit tests to ensure we're not caching clients across
requests for all the fixed providers in this PR.

```
uv run pytest -sv tests/unit/providers/inference/test_inference_client_caching.py
```


I also ran some of our OpenAI compatible API integration tests for each
of the changed providers, just to ensure they still work. Note that
these providers don't actually pass all these tests (for unrelated
reasons due to quirks of the Groq and Together SaaS services), but
enough of the tests passed to confirm the clients are still working as
intended.

### Together

```
ENABLE_TOGETHER="together" \
uv run llama stack run llama_stack/templates/starter/run.yaml

LLAMA_STACK_CONFIG=http://localhost:8321 \
uv run pytest -sv \
  tests/integration/inference/test_openai_completion.py \
  --text-model "together/meta-llama/Llama-3.1-8B-Instruct"
```

### OpenAI

```
ENABLE_OPENAI="openai" \
uv run llama stack run llama_stack/templates/starter/run.yaml

LLAMA_STACK_CONFIG=http://localhost:8321 \
uv run pytest -sv \
  tests/integration/inference/test_openai_completion.py \
  --text-model "openai/gpt-4o-mini"
```

### Groq

```
ENABLE_GROQ="groq" \
uv run llama stack run llama_stack/templates/starter/run.yaml

LLAMA_STACK_CONFIG=http://localhost:8321 \
uv run pytest -sv \
  tests/integration/inference/test_openai_completion.py \
  --text-model "groq/meta-llama/Llama-3.1-8B-Instruct"
```

---------

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-07-11 13:38:27 -07:00
Matthew Farrellee
30b2e6a495
chore: default to pytest asyncio-mode=auto (#2730)
# What does this PR do?

Previously, developers who ran `./scripts/unit-tests.sh` would get `asyncio-mode=auto`, which made `@pytest.mark.asyncio` and `@pytest_asyncio.fixture` redundant. Developers who ran `pytest` directly would get pytest's default (strict mode) and run into errors, leading them to add `@pytest.mark.asyncio` / `@pytest_asyncio.fixture` to their code.

With this change:
- `asyncio_mode=auto` is included in `pyproject.toml`, making behavior consistent for all invocations of pytest (see the snippet after this list)
- all redundant `@pytest_asyncio.fixture` and `@pytest.mark.asyncio` markers are removed
- for good measure, `pytest>=8.4` and `pytest-asyncio>=1.0` are now required
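
For reference, the relevant `pyproject.toml` setting looks like this (a minimal sketch of the standard pytest-asyncio configuration key; the actual file carries more configuration):

```toml
[tool.pytest.ini_options]
asyncio_mode = "auto"
```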

## Test Plan

- `./scripts/unit-tests.sh`
- `uv run pytest tests/unit`
2025-07-11 13:00:24 -07:00
Sébastien Han
2ebc172f33
fix: pin opentelemtry version (#2722)
# What does this PR do?

Otherwise we can get old versions like 1.11 and experience this error:

```
ModuleNotFoundError: No module named 'opentelemetry.exporter.otlp.proto.http.metric_exporter'
```

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-11 16:25:51 +02:00
Nathan Weinberg
7915551eee
build: replace "python-jose" with "python-jose[cryptography]" (#2695)
# What does this PR do?
`python-jose` recommends using the `cryptography` backend in their
installation docs:
https://github.com/mpdavis/python-jose?tab=readme-ov-file#cryptographic-backends

This PR modifies the LLS dependencies to use this backend instead of the current `native-python` one.

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-09 13:21:57 -07:00
Francisco Arceo
83c89265e0
chore: Adding unit tests for Milvus and OpenAI compatibility (#2640)
# What does this PR do?
- Enables unit tests for Milvus to start testing OpenAI compatibility, and fixes a few bugs.
- Also fixes an inconsistency in the Milvus config between remote and inline.
- Adds pymilvus to extras for testing in CI.

I'm going to refactor this later to include the other inline providers so that we can catch issues sooner.

I have another PR where I've been testing to find other bugs in the
implementation (and required changes drafted here:
https://github.com/meta-llama/llama-stack/pull/2617).

## Test Plan

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-08 00:50:16 -07:00
Ashwin Bharambe
f1c62e0af0 build: Bump version to 0.2.14 2025-07-04 12:12:12 +05:30
Ashwin Bharambe
21669b14e7
fix(docs): add setuptools explicitly (#2547)
Given the shift to Python 3.12, we need to explicitly depend on `setuptools` for the `pkg_resources` import.
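
A minimal reproduction of the dependency (assuming a fresh Python 3.12 virtual environment):

```python
# pkg_resources is shipped by setuptools; Python 3.12 virtual environments
# no longer include setuptools by default, so this import fails unless
# setuptools is declared as an explicit dependency.
import pkg_resources

print(pkg_resources.get_distribution("setuptools").version)
```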

## Test Plan

Run 
```
cd local/llama-stack
UV_PROJECT_ENVIRONMENT=/tmp/docs uv sync --frozen --group docs

cd /tmp/docs
uv run python -m sphinx -T -b html -d _build/doctrees -D language=en \
   ~/local/llama-stack/docs/source/ \
  /tmp/docs/html
```
2025-06-28 08:14:25 +05:30
github-actions[bot]
709eb7da33 build: Bump version to 0.2.13 2025-06-27 23:56:14 +00:00
Yuan Tang
9baa16e498
fix(security): Upgrade protobuf and aiohttp. Fixes CVE-2025-4565 (#2541)
# What does this PR do?

Fixes CVE-2025-4565 and the following warning:

```
warning: `aiohttp==3.11.13` is yanked (reason: "Regression: https://github.com/aio-libs/aiohttp/issues/10617")
```

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-06-27 06:58:38 -07:00
Yuan Tang
40fdce79b3
fix(security): Upgrade urllib3 to v2.5.0. Fixes CVE-2025-50181 and CVE-2025-50182 (#2534)
This fixes CVE-2025-50181 and CVE-2025-50182.

Changes via:
```
uv sync --upgrade-package urllib3
uv export --frozen --no-hashes --no-emit-project --no-default-groups --output-file=requirements.txt
```

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-06-27 10:46:47 +02:00
Sébastien Han
dbdc811d16
chore: isolate bare minimum project dependencies (#2282)
# What does this PR do?

The goal is to promote the minimal set of dependencies the project needs to run. This includes:

* dependencies needed to work with the CLI
* dependencies needed for the server to run with no providers

This also:
* Relocates redundant dependencies out of the core project and into the individual providers that actually require them.
* Includes all necessary server dependencies so the project can run standalone, even without any providers.


## Test Plan

Build and run a distro server.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-26 10:14:27 +02:00
Sébastien Han
9c8be89fb6
chore: bump python supported version to 3.12 (#2475)
# What does this PR do?

The project now supports Python >= 3.12

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-24 09:22:04 +05:30
ehhuang
6fde601765
chore: upgrade hf hub dependency (#2487)
# What does this PR do?
CI tests have been failing with:

```
.venv/lib/python3.12/site-packages/peft/auto.py:21: in <module>
    from transformers import (
.venv/lib/python3.12/site-packages/transformers/__init__.py:27: in <module>
    from . import dependency_versions_check
.venv/lib/python3.12/site-packages/transformers/dependency_versions_check.py:57: in <module>
    require_version_core(deps[pkg])
.venv/lib/python3.12/site-packages/transformers/utils/versions.py:117: in require_version_core
    return require_version(requirement, hint)
.venv/lib/python3.12/site-packages/transformers/utils/versions.py:111: in require_version
    _compare_versions(op, got_ver, want_ver, requirement, pkg, hint)
.venv/lib/python3.12/site-packages/transformers/utils/versions.py:44: in _compare_versions
    raise ImportError(
E   ImportError: huggingface-hub>=0.30.0,<1.0 is required for a normal functioning of this module, but found huggingface-hub==0.29.0.
E   Try: `pip install transformers -U` or `pip install -e '.[dev]'` if you're working with git main
------------------------------ Captured log setup ------------------------------
INFO llama_stack.providers.remote.inference.ollama.ollama:ollama.py:106 checking connectivity to Ollama at `http://0.0.0.0:11434`...
=========================== short test summary info ============================
ERROR tests/integration/providers/test_providers.py::TestProviders::test_providers - ImportError: huggingface-hub>=0.30.0,<1.0 is required for a normal functioning of this module, but found huggingface-hub==0.29.0. Try: `pip install transformers -U` or `pip install -e '.[dev]'` if you're working with git main
=================== 1 skipped, 4 warnings, 1 error in 9.52s ====================
```

## Test Plan
CI
2025-06-20 15:50:54 -07:00
github-actions[bot]
d70573bd47 build: Bump version to 0.2.12 2025-06-20 21:06:17 +00:00
Charlie Doern
d12f195f56
feat: drop python 3.10 support (#2469)
# What does this PR do?

Dropped Python 3.10, updated pyproject and dependencies, and also removed some blocks of code with special handling for `enum.StrEnum`.
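
For context, a sketch of what dropping 3.10 enables (the enum itself is illustrative, not from the codebase): `enum.StrEnum` exists from Python 3.11, so the old `class Foo(str, Enum)` compatibility shim is no longer needed.

```python
from enum import StrEnum  # available since Python 3.11

class Color(StrEnum):  # illustrative enum for demonstration
    RED = "red"
    BLUE = "blue"

# StrEnum members are real strings, so they compare and format cleanly.
assert Color.RED == "red"
assert f"{Color.BLUE}" == "blue"
```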

Closes #2458

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-06-19 12:07:14 +05:30
github-actions[bot]
7d812e3bf0 build: Bump version to 0.2.11
2025-06-17 19:08:17 +00:00
Yuan Tang
f6718b2408
fix(security): Upgrade requests to 2.32.4. Fixes CVE-2024-47081 (#2425)
# What does this PR do?

This addresses https://github.com/advisories/GHSA-9hjg-9r4m-mvj7.

Diff was generated via:
```
uv sync --upgrade-package requests
uv export --frozen --no-hashes --no-emit-project --no-default-groups --output-file=requirements.txt
```

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-06-10 08:33:28 +05:30
Ibrahim Haroon
a34cef925b
fix(faiss): handle case where distance is 0 by setting d to minimum positive… (#2387)
# What does this PR do?
Adds a try/except to the faiss `query_vector` function for when the distance between the query embedding and an embedding within the vector db is 0 (identical vectors). Catches `ZeroDivisionError` and then appends `1.0 / sys.float_info.min` to `scores` to represent maximum similarity.
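
A sketch of the described fix (function name is illustrative; the real change lives in the faiss provider's `query_vector`):

```python
import sys

def distance_to_score(distance: float) -> float:
    # The provider converts a faiss L2 distance to a similarity score as
    # 1/d, which raises ZeroDivisionError when the query and stored
    # embeddings are identical (d == 0).
    try:
        return 1.0 / distance
    except ZeroDivisionError:
        # Substitute the largest representable score for an exact match.
        return 1.0 / sys.float_info.min
```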

Closes [#2381]

## Test Plan
Check out this PR.

Execute this code and there will no longer be a `ZeroDivisionError` exception:
```
from llama_stack_client import LlamaStackClient

base_url = "http://localhost:8321"
client = LlamaStackClient(base_url=base_url)

models = client.models.list()
embedding_model = (
    em := next(m for m in models if m.model_type == "embedding")
).identifier
embedding_dimension = 384

_ = client.vector_dbs.register(
    vector_db_id="foo_db",
    embedding_model=embedding_model,
    embedding_dimension=embedding_dimension,
    provider_id="faiss",
)

chunk = {
    "content": "foo",
    "mime_type": "text/plain",
    "metadata": {
        "document_id": "foo-id"
    }
}

client.vector_io.insert(vector_db_id="foo_db", chunks=[chunk])
client.vector_io.query(vector_db_id="foo_db", query="foo")
```

### Running unit tests
`uv run pytest tests/unit/rag/test_rag_query.py -v`

---------

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Ben Browning <bbrownin@redhat.com>
2025-06-07 16:09:46 -04:00
github-actions[bot]
692709cd45 build: Bump version to 0.2.10
2025-06-05 22:56:39 +00:00
Hardik Shah
102516f33c
fix: Pin fastapi to avoid picking up spurious versions in test pypi (#2409)
as titled
2025-06-05 15:33:30 -07:00
Hardik Shah
04592b9590
fix: update pyproject to include recursive LS deps (#2404)
Trying to run the `llama` CLI after installing the wheel fails with this error:
```
Traceback (most recent call last):
  File "/tmp/tmp.wdZath9U6j/.venv/bin/llama", line 4, in <module>
    from llama_stack.cli.llama import main
  File "/tmp/tmp.wdZath9U6j/.venv/lib/python3.10/site-packages/llama_stack/__init__.py", line 7, in <module>
    from llama_stack.distribution.library_client import (  # noqa: F401
ModuleNotFoundError: No module named 'llama_stack.distribution.library_client'
``` 

This PR fixes it by ensuring that all sub-directories of `llama_stack` are also included.

Also fixes the missing `fastapi` dependency issue.
2025-06-05 11:46:48 -07:00
ehhuang
3c9a10d2fe
feat: reference implementation for files API (#2330)
# What does this PR do?
TSIA
Added Files provider to the fireworks template. Might want to add to all
templates as a follow-up.

## Test Plan
```
llama-stack pytest tests/unit/files/test_files.py

llama-stack llama stack build --template fireworks --image-type conda --run

LLAMA_STACK_CONFIG=http://localhost:8321 pytest -s -v tests/integration/files/
```
2025-06-02 21:54:24 -07:00
github-actions[bot]
ad15276da1 build: Bump version to 0.2.9
2025-05-30 19:43:09 +00:00
Sébastien Han
63a9f08c9e
chore: use starlette built-in Route class (#2267)
# What does this PR do?

Use a more common pattern and known terminology from the ecosystem, where Route is more widely adopted than Endpoint.
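
A small sketch of the pattern (illustrative handler, standard Starlette API):

```python
from starlette.applications import Starlette
from starlette.responses import JSONResponse
from starlette.routing import Route

async def health(request):
    return JSONResponse({"status": "ok"})

# Routes are declared with Starlette's built-in Route class rather than
# a project-specific Endpoint abstraction.
app = Starlette(routes=[Route("/health", health)])
```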

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-28 09:53:33 -07:00
Sébastien Han
4f3f28f718
chore: use dependency-groups for dev (#2287)
# What does this PR do?

The previous `[project.optional-dependencies]` was misrepresenting what
the packages were. They were NOT optional dependencies to the project
but development dependencies. Unlike optional dependencies, development
dependencies are local-only and will not be included in the project
requirements when published to PyPI or other indexes. As such,
development dependencies are not included in the [project] table.
Additionally, the dev group is synced by default.

Source:

https://docs.astral.sh/uv/concepts/projects/dependencies/#development-dependencies

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-27 23:00:17 +02:00
github-actions[bot]
7105a25b0f build: Bump version to 0.2.8 2025-05-27 20:28:29 +00:00
Sébastien Han
448f00903d
chore: mark blobpath as optional (#2271)
# What does this PR do?

This is not a core dependency of the distro server. It's only necessary
when using `inline::rag-runtime` or `inline::meta-reference` providers.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-27 10:55:24 +02:00
Ashwin Bharambe
3faf1e4a79
feat: enable MCP execution in Responses impl (#2240)
## Test Plan

```
pytest -s -v 'tests/verifications/openai_api/test_responses.py' \
  --provider=stack:together --model meta-llama/Llama-4-Scout-17B-16E-Instruct
```
2025-05-24 14:20:42 -07:00
Yuan Tang
055f48b6a2
fix(security): Upgrade setuptools to v80.8.0. Fixes CVE-2025-47273 (#2242)
# What does this PR do?

This fixes a high-severity CVE in `setuptools`:
https://github.com/advisories/GHSA-5rjg-fvgr-3xxf

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-05-24 06:57:24 -07:00
ehhuang
b054023800
chore: add sqlalchemy to test dependencies (#2236)
# What does this PR do?


## Test Plan
2025-05-23 10:33:38 -07:00
ehhuang
549812f51e
feat: implement get chat completions APIs (#2200)
# What does this PR do?
* Provide sqlite implementation of the APIs introduced in
https://github.com/meta-llama/llama-stack/pull/2145.
* Introduced a SqlStore API: llama_stack/providers/utils/sqlstore/api.py
and the first Sqlite implementation
* Pagination support will be added in a future PR.

## Test Plan
Unit test on sql store:
[Screenshot: sql store unit test results]

Integration test:
```
INFERENCE_MODEL="llama3.2:3b-instruct-fp16" llama stack build --template ollama --image-type conda --run
```
```
LLAMA_STACK_CONFIG=http://localhost:5001 INFERENCE_MODEL="llama3.2:3b-instruct-fp16" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "llama3.2:3b-instruct-fp16" -k 'inference_store and openai'
```
2025-05-21 22:21:52 -07:00
Sébastien Han
85b5f3172b
docs: misc cleanup (#2223)
# What does this PR do?

* remove requirements.txt to use pyproject.toml as the source of truth
* update relevant docs

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-21 17:35:27 +02:00
Sébastien Han
c25acedbcd
chore: remove k8s auth in favor of k8s jwks endpoint (#2216)
# What does this PR do?

Kubernetes since 1.20 exposes a JWKS endpoint that we can use with our recent OAuth2 implementation.
The CI test has been kept intact for validation.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-21 16:23:54 +02:00
Ashwin Bharambe
c7015d3d60
feat: introduce OAuth2TokenAuthProvider and notion of "principal" (#2185)
This PR adds a notion of `principal` (aka some kind of persistent
identity) to the authentication infrastructure of the Stack. Until now
we only used access attributes ("claims" in the more standard OAuth /
OIDC setup) but we need the notion of a User fundamentally as well.
(Thanks @rhuss for bringing this up.)

This value is not yet _used_ anywhere downstream but will be used to
segregate access to resources.

In addition, the PR introduces a built-in JWT token validator, so the
Stack does not need to contact an authentication provider to validate
the authorization and can merely check the signed token for the represented
claims. Public keys are refreshed via the configured JWKS server. This
Auth Provider should overwhelmingly be considered the default given the
seamless integration it offers with OAuth setups.
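
As a rough sketch of that validation flow, using PyJWT's JWKS client; the
issuer URL, audience, and algorithm here are assumptions, not the provider's
actual configuration:

```
import jwt  # PyJWT
from jwt import PyJWKClient

# Fetch the signing key matching the token's key ID, then verify the
# signature and read the claims locally -- no per-request round trip
# to the auth provider.
jwks_client = PyJWKClient("https://issuer.example.com/jwks.json")  # assumed URL

def validate(token: str) -> dict:
    signing_key = jwks_client.get_signing_key_from_jwt(token)
    return jwt.decode(
        token,
        signing_key.key,
        algorithms=["RS256"],    # assumed algorithm
        audience="llama-stack",  # assumed audience claim
    )
```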
2025-05-18 17:54:19 -07:00
Charlie Doern
f02f7b28c1
feat: add huggingface post_training impl (#2132)
# What does this PR do?


Adds an inline HF SFTTrainer provider. Alongside torchtune, this is a
very popular option for running training jobs. The config allows a user
to specify key fields such as the model, chat_template, device, etc.

The provider comes with one recipe, `finetune_single_device`, which works
both with and without LoRA.

Any model with a valid HF identifier can be given, and the model will
be pulled.

This has been tested so far with the CPU and MPS device types, but it
should be compatible with CUDA out of the box.

The provider processes the given dataset into the proper format,
establishes the various steps per epoch, steps per save, and steps per
eval, sets a sane SFTConfig, and runs `n_epochs` of training.

If `checkpoint_dir` is `None`, no model is saved. If a checkpoint dir is
given, a model is saved every `save_steps` steps and at the end of training.
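
For a sense of what the recipe does, here is a stripped-down sketch using
TRL's SFTTrainer directly; the exact config fields and defaults used by the
provider may differ:

```
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Stripped-down SFT run; the provider's actual recipe also handles dataset
# formatting, checkpointing cadence, and device selection.
dataset = load_dataset("llamastack/simpleqa", split="train")  # assumed split

config = SFTConfig(
    output_dir="./checkpoints",     # analogous to checkpoint_dir
    per_device_train_batch_size=1,
    num_train_epochs=1,             # analogous to n_epochs
    save_steps=100,
)

trainer = SFTTrainer(
    model="ibm-granite/granite-3.3-2b-instruct",  # any valid HF identifier
    args=config,
    train_dataset=dataset,
)
trainer.train()
```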


## Test Plan

Re-enabled the post_training integration test suite with a single test
that loads the simpleqa dataset:
https://huggingface.co/datasets/llamastack/simpleqa and a tiny granite
model: https://huggingface.co/ibm-granite/granite-3.3-2b-instruct. The
test now uses the llama stack client and the proper post_training API,
and runs one step with a batch_size of 1. It runs on CPU on the Ubuntu
runner, so it needs a small batch and a single step.

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-05-16 14:41:28 -07:00
github-actions[bot]
65cf076f13 build: Bump version to 0.2.7 2025-05-16 20:32:06 +00:00
github-actions[bot]
23d9f3b1fb build: Bump version to 0.2.6 2025-05-12 18:02:05 +00:00
Christian Zaccaria
18d2312690
fix: test_datasets HF scenario in CI (#2090)
# What does this PR do?
**Fixes** #1959 

Hugging Face provides several loading paths that the datasets library
can use. My theory on why the test previously failed intermittently is
that when calling `load_dataset(...)`, it may try several options, such
as the local cache, the Hugging Face Hub, or a dataset script, and one
of these options seems to work inconsistently in the CI.

The HuggingFace datasets library relies on the `transformers` package to
load certain datasets such as `llamastack/simpleqa`, and by adding the
package, we can see the dataset is loaded consistently via the Hugging
Face Hub.
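
For reference, the loading call at issue is simply the following; with
`transformers` installed it resolves consistently via the Hub (the split
name here is an assumption):

```
from datasets import load_dataset

# With `transformers` available, this resolves via the Hugging Face Hub
# rather than an inconsistent fallback path.
ds = load_dataset("llamastack/simpleqa", split="train")  # assumed split
```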

Please see PR in my fork demonstrating over 7 consecutive passes:
https://github.com/ChristianZaccaria/llama-stack/pull/1 

**Some References:**
- https://github.com/huggingface/transformers/issues/8690
- https://huggingface.co/docs/datasets/en/loading 

## Test Plan
2025-05-06 14:09:15 +02:00
github-actions[bot]
6b4c218788 build: Bump version to 0.2.5 2025-05-03 21:31:01 +00:00
Ashwin Bharambe
799286fe52 fix: Bump version to 0.2.4 2025-04-29 10:34:17 -07:00
Sébastien Han
79851d93aa
feat: Add Kubernetes authentication (#1778)
# What does this PR do?

This commit adds a new authentication system to the Llama Stack server
with support for Kubernetes and custom authentication providers. Key
changes include:

- Implemented KubernetesAuthProvider for validating Kubernetes service
account tokens
- Implemented CustomAuthProvider for validating tokens against external
endpoints - this is the same code that was already present.
- Added test for Kubernetes
- Updated server configuration to support authentication settings
- Added documentation for authentication configuration and usage

The authentication system supports:
- Bearer token validation
- Kubernetes service account token validation (see the sketch after this list)
- Custom authentication endpoints
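
As a sketch of the Kubernetes service account token validation mentioned
above, using the official `kubernetes` Python client; this is illustrative
only, not the provider's actual code:

```
from kubernetes import client, config

# Validate a bearer token by submitting a TokenReview to the cluster's
# API server, which is what service-account validation boils down to.
config.load_kube_config()  # or load_incluster_config() inside a pod

def validate_token(token: str) -> bool:
    review = client.AuthenticationV1Api().create_token_review(
        client.V1TokenReview(spec=client.V1TokenReviewSpec(token=token))
    )
    return bool(review.status.authenticated)
```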

## Test Plan

Set up a Kube cluster using Kind or Minikube.

Run a server with:

```
server:
  port: 8321
  auth:
    provider_type: kubernetes
    config:
      api_server_url: http://url
      ca_cert_path: path/to/cert (optional)
```

Run:

```
curl -s -L -H "Authorization: Bearer $(kubectl create token my-user)" http://127.0.0.1:8321/v1/providers
```

Replace "my-user" with your own service account as needed.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-04-28 22:24:58 +02:00
Yuan Tang
28687b0e85
fix: Bump h11 to 0.16.0 to fix cve-2025-43859 (#2041)
This resolves a new critical-severity CVE on h11. See
https://access.redhat.com/security/cve/cve-2025-43859. We should
consider releasing a new patch with this fix.

This was updated via:

```
uv add "h11>=0.16.0"
uv export --frozen --no-hashes --no-emit-project --output-file=requirements.txt
```

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-04-27 11:45:35 -07:00
Ben Browning
dc46725f56
fix: properly handle streaming client disconnects (#2000)
# What does this PR do?

Previously, when a streaming client would disconnect before we were
finished streaming the entire response, an error like the one below would
get raised from the `sse_generator` function in
`llama_stack/distribution/server/server.py`:

```
AttributeError: 'coroutine' object has no attribute 'aclose'. Did you mean: 'close'?
```

This was because we were calling `aclose` on a coroutine instead of the
awaited value from that coroutine. This change fixes that, so that we
save off the awaited value and then can call `aclose` on it if we
encounter an `asyncio.CancelledError`, like we see when a client
disconnects before we're finished streaming.
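
In sketch form, the fix looks roughly like this (simplified from the real
`sse_generator`; names abbreviated):

```
import asyncio

async def sse_generator(event_gen_coroutine):
    event_gen = None
    try:
        # Await the coroutine first so we hold the async generator itself,
        # not the coroutine object (which has no `aclose`).
        event_gen = await event_gen_coroutine
        async for event in event_gen:
            yield f"data: {event}\n\n"
    except asyncio.CancelledError:
        # Client disconnected mid-stream: close the generator we awaited.
        if event_gen is not None:
            await event_gen.aclose()
        raise
```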

The other changes in here are to add a simple set of tests for the happy
path of our SSE streaming and this client disconnect path.

That unfortunately requires adding one more dependency to the unit
test section of pyproject.toml, since `server.py` loads some of the
telemetry code that is needed to test this functionality.

## Test Plan

I wrote the tests in `tests/unit/server/test_sse.py` first, verified the
client disconnected test failed before my change, and that it passed
afterwards.

```
python -m pytest -s -v tests/unit/server/test_sse.py
```

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-04-23 15:44:28 +02:00
ehhuang
8bd6665775
chore(verification): update README and reorganize generate_report.py (#1978)
# What does this PR do?


## Test Plan
```
uv run --with-editable ".[dev]" python tests/verifications/generate_report.py --run-tests
```
2025-04-17 10:41:22 -07:00
Ashwin Bharambe
ff14773fa7 fix: update llama stack client dependency 2025-04-12 18:14:33 -07:00
Ben Browning
2b2db5fbda
feat: OpenAI-Compatible models, completions, chat/completions (#1894)
# What does this PR do?

This stubs in some OpenAI server-side compatibility with three new
endpoints:

/v1/openai/v1/models
/v1/openai/v1/completions
/v1/openai/v1/chat/completions

This gives common inference apps using OpenAI clients the ability to
talk to Llama Stack using an endpoint like
http://localhost:8321/v1/openai/v1 .

The two "v1" instances in there aren't awesome, but the thinking is that
Llama Stack's API is v1 and then our OpenAI compatibility layer is
compatible with OpenAI V1. And, some OpenAI clients implicitly assume
the URL ends with "v1", so this gives maximum compatibility.

The OpenAI models endpoint is implemented in the routing layer and just
returns all the models Llama Stack knows about.

The following providers should be working with the new OpenAI
completions and chat/completions API:
* remote::anthropic (untested)
* remote::cerebras-openai-compat (untested)
* remote::fireworks (tested)
* remote::fireworks-openai-compat (untested)
* remote::gemini (untested)
* remote::groq-openai-compat (untested)
* remote::nvidia (tested)
* remote::ollama (tested)
* remote::openai (untested)
* remote::passthrough (untested)
* remote::sambanova-openai-compat (untested)
* remote::together (tested)
* remote::together-openai-compat (untested)
* remote::vllm (tested)

The goal is to support this for every inference provider - proxying
directly to the provider's OpenAI endpoint for OpenAI-compatible
providers. For providers that don't have an OpenAI-compatible API, we'll
add a mixin to translate incoming OpenAI requests to Llama Stack
inference requests and translate the Llama Stack inference responses to
OpenAI responses.

This is related to #1817 but is a bit larger in scope than just chat
completions, as I have real use-cases that need the older completions
API as well.

## Test Plan

### vLLM

```
VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" llama stack build --template remote-vllm --image-type venv --run

LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "meta-llama/Llama-3.2-3B-Instruct"
```

### ollama
```
INFERENCE_MODEL="llama3.2:3b-instruct-q8_0" llama stack build --template ollama --image-type venv --run

LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="llama3.2:3b-instruct-q8_0" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "llama3.2:3b-instruct-q8_0"
```



## Documentation

Run a Llama Stack distribution that uses one of the providers mentioned
in the list above. Then, use your favorite OpenAI client to send
completion or chat completion requests with the base_url set to
http://localhost:8321/v1/openai/v1 . Replace "localhost:8321" with the
host and port of your Llama Stack server, if different.
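
For example, with the official `openai` Python client (the model name is
just the one used in the vLLM test plan above):

```
from openai import OpenAI

# Point the standard OpenAI client at the Llama Stack compatibility layer.
client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)
```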

---------

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-04-11 13:14:17 -07:00