Commit graph

2108 commits

Author SHA1 Message Date
ehhuang
1d3f27fe5b
fix: resume responses with tool call output (#2524)
# What does this PR do?
closes #2522 

## Test Plan
added integration test
LLAMA_STACK_CONFIG=http://localhost:8321 pytest -v \
  tests/integration/agents/test_openai_responses.py \
  --text-model "accounts/fireworks/models/llama-v3p3-70b-instruct" \
  -vv -k 'function_call'
2025-06-25 14:43:37 -07:00
Francisco Arceo
82f13fe83e
feat: Add ChunkMetadata to Chunk (#2497)
# What does this PR do?
Adding `ChunkMetadata` so we can properly delete embeddings later.

More specifically, this PR refactors and extends the chunk metadata
handling in the vector database and introduces a distinction between
metadata used for model context and backend-only metadata required for
chunk management, storage, and retrieval. It also improves chunk ID
generation and propagation throughout the stack, enhances test coverage,
and adds new utility modules.

```python
class ChunkMetadata(BaseModel):
    """
    `ChunkMetadata` is backend metadata for a `Chunk` that is used to store additional information about the chunk that
        will NOT be inserted into the context during inference, but is required for backend functionality.
        Use `metadata` in `Chunk` for metadata that will be used during inference.
    """
    document_id: str | None = None
    chunk_id: str | None = None
    source: str | None = None
    created_timestamp: int | None = None
    updated_timestamp: int | None = None
    chunk_window: str | None = None
    chunk_tokenizer: str | None = None
    chunk_embedding_model: str | None = None
    chunk_embedding_dimension: int | None = None
    content_token_count: int | None = None
    metadata_token_count: int | None = None
```
Eventually we can migrate the document_id out of the `metadata` field.
I've introduced the changes so that `ChunkMetadata` is backwards
compatible with `metadata`.

Closes https://github.com/meta-llama/llama-stack/issues/2501 

## Test Plan
Added unit tests

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-06-25 15:55:23 -04:00
Ben Browning
fa0b0c13d4
fix: Ollama should be optional in starter distro (#2482)
# What does this PR do?

Our starter distro required Ollama to be running (and a large list of
models available in that Ollama) to successfully start. This adjusts
things so that Ollama does not have to be running to use the starter
template / distro.

To accomplish this, a few changes were needed:

* The Ollama provider can now be configured to either raise an exception
or just log a warning when it cannot reach the Ollama server on
startup. The default is to raise an exception (same as previous
behavior), but in the starter template we adjust this to just log a
warning so that we can bring the stack up without needing a running
Ollama server.

* The starter template no longer specifies a default list of models for
Ollama, as any models specified there need to actually be pulled and
available in Ollama. Instead, it adds a new
`OLLAMA_INFERENCE_MODEL` environment variable where users can provide an
optional model to register with the Ollama provider on startup.
Additional models can also be registered via the typical
`models.register(...)` at runtime.

* The vLLM template was adjusted to also allow an optional
`VLLM_INFERENCE_MODEL` specified on startup, so that vLLM and Ollama
behave consistently here and it is easy to get up and running
quickly.

* The default vector store was changed from sqlite-vec to faiss.
sqlite-vec can be enabled by setting the `ENABLE_SQLITE_VEC` environment
variable, like we do for chromadb and pgvector. This is due to
sqlite-vec not shipping proper arm64 binaries, like we previously fixed
in #1530 for the ollama distribution.

## Test Plan

With this change, the following scenarios now work with the starter
template that did not before:

* no Ollama running
* Ollama running but not all of the Llama models pulled locally
* Ollama running with a custom model registered on startup
* vLLM running with a custom model registered on startup
* running the starter template on linux/arm64, like when running
containers on Mac without rosetta emulation

---------

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-06-25 15:54:00 +02:00
Varsha
cfee63bd0d
feat: Add search_mode support to OpenAI vector store API (#2500)
# What does this PR do?
Add search_mode parameter (vector/keyword/hybrid) to
openai_search_vector_store method. Fixes OpenAPI
code generation by using str instead of Literal type.

Closes: #2459 

## Test Plan

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
2025-06-24 20:38:47 -04:00
ehhuang
114946ae88
chore: fix build script bug (#2507)
# What does this PR do?
Fixes
```
Installing pip dependencies
error: Failed to parse: `scikit-learn pymongo pythainlp datasets torch sentencepiece requests aiohttp psycopg2-binary trl pillow pandas chardet nltk scipy ollama faiss-cpu pypdf tree_sitter langdetect openai matplotlib asyncpg peft redis autoevals mcp opentelemetry-exporter-otlp-proto-http sqlalchemy[asyncio] tqdm opentelemetry-sdk aiosqlite numpy chromadb-client emoji transformers aiosqlite fastapi fire httpx uvicorn opentelemetry-sdk opentelemetry-exporter-otlp-proto-http`
  Caused by: Expected one of `@`, `(`, `<`, `=`, `>`, `~`, `!`, `;`, found `p`
scikit-learn pymongo pythainlp datasets torch sentencepiece requests aiohttp psycopg2-binary trl pillow pandas chardet nltk scipy ollama faiss-cpu pypdf tree_sitter langdetect openai matplotlib asyncpg peft redis autoevals mcp opentelemetry-exporter-otlp-proto-http sqlalchemy[asyncio] tqdm opentelemetry-sdk aiosqlite numpy chromadb-client emoji transformers aiosqlite fastapi fire httpx uvicorn opentelemetry-sdk opentelemetry-exporter-otlp-proto-http
             ^
ERROR    2025-06-24 11:33:33,362 llama_stack.distribution.build:145 uncategorized: Failed to build target myenv with return code 2
Error building stack: Failed to build image myenv
```
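
The error suggests the whole space-separated dependency list reached the installer as a single requirement string. The actual fix in the build script is not shown here. The sketch below only illustrates that failure mode and one way around it, assuming the installer is invoked as `uv pip install`. The helper name `pip_install_argv` is hypothetical.

```python
import shlex


def pip_install_argv(deps: str) -> list[str]:
    # Split the dependency string so each requirement is its own argument.
    # Passing `deps` as one argument reproduces the "Failed to parse" error above.
    return ["uv", "pip", "install", *shlex.split(deps)]


argv = pip_install_argv("scikit-learn sqlalchemy[asyncio] tqdm")
```
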
## Test Plan
2025-06-24 12:05:22 -07:00
Sébastien Han
450ed920d6
chore: do not build on auth ci test (#2505)
# What does this PR do?

Since we are using a very minimal run.yaml, there is no need to build.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-24 21:08:33 +05:30
Ashwin Bharambe
73c18feac4
fix: update the signature of openai_list_files_in_vector_store in all VectorIO impls (#2503)
2025-06-24 18:55:56 +05:30
ehhuang
7fa8f23555
fix(ui): ensure initial data fetch only happens once (#2486)
# What does this PR do?
Bug:
1. go to responses chat logs in UI
2. go to chat completions logs page
3. observe that same data appears in the table twice

This is because `fetchData` is called multiple times when multiple
renders occur.

## Test Plan
manual testing of above bug repro steps
2025-06-24 12:22:55 +02:00
Sébastien Han
9c8be89fb6
chore: bump python supported version to 3.12 (#2475)
# What does this PR do?

The project now supports Python >= 3.12

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-24 09:22:04 +05:30
Rohan Awhad
d797f9aec1
fix: #2495 FileNotFound Err in container image (#2498)
# What does this PR do?

Closes #2495 

Changes:
- Delay the `COPY run.yaml` into docker image step until after external
provider handling
- Split the check for `external_providers_dir` into "non-empty" and
"directory exists"


## Test Plan

0. Create and Activate venv

1. Create a `simple_build.yaml`
    ```yaml
    version: '2'
    distribution_spec:
      providers:
        inference:
          - remote::openai
    image_type: container
    image_name: openai-stack
    ```

2. Run llama stack build:
```bash
llama stack build --config simple_build.yaml
```

3. Run the docker container:
```bash
docker run \
  -p 8321:8321 \
  -e OPENAI_API_KEY=$OPENAI_API_KEY \
  openai_stack:0.2.12
```

This should show that the server is running.
```
INFO     2025-06-23 19:07:57,832 llama_stack.distribution.distribution:151 core: Loading external providers from /.llama/providers.d
INFO     2025-06-23 19:07:59,324 __main__:572 server: Listening on ['::', '0.0.0.0']:8321
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO     2025-06-23 19:07:59,336 __main__:156 server: Starting up
INFO:     Application startup complete.                                                                             
INFO:     Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
```

Notice the first line:
```
Loading external providers from /.llama/providers.d
```
This is expected behaviour.

Co-authored-by: Rohan Awhad <rawhad@redhat.com>
2025-06-24 09:08:08 +05:30
dependabot[bot]
929ac618ce
chore(github-deps): bump astral-sh/setup-uv from 6.0.1 to 6.3.0 (#2488)
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from
6.0.1 to 6.3.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/setup-uv/releases">astral-sh/setup-uv's
releases</a>.</em></p>
<blockquote>
<h2>v6.3.0 🌈 Use latest version from manifest-file</h2>
<h2>Changes</h2>
<p>If a manifest-file is supplied the default value of the version input
(latest) will get the latest version available in the manifest. That
might not be the actual latest version available in the official uv
repo.</p>
<h2>🚀 Enhancements</h2>
<ul>
<li>Use latest version from manifest-file <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/458">#458</a>)</li>
</ul>
<h2>v6.2.0 🌈  New input manifest-file</h2>
<h2>Changes</h2>
<p>This release adds a new input <code>manifest-file</code>.</p>
<p>The <code>manifest-file</code> input allows you to specify a JSON
manifest that lists available uv versions,
architectures, and their download URLs. By default, this action uses the
manifest file contained
in this repository, which is automatically updated with each release of
uv.</p>
<p>The manifest file contains an array of objects, each describing a
version,
architecture, platform, and the corresponding download URL.</p>
<p>You can supply a custom manifest file URL to define additional
versions,
architectures, or different download URLs.
This is useful if you maintain your own uv builds or want to override
the default sources.</p>
<p>For example:</p>
<pre lang="json"><code>[
  {
    &quot;version&quot;: &quot;0.7.12-alpha.1&quot;,
    &quot;artifactName&quot;: &quot;uv-x86_64-unknown-linux-gnu.tar.gz&quot;,
    &quot;arch&quot;: &quot;x86_64&quot;,
    &quot;platform&quot;: &quot;unknown-linux-gnu&quot;,
    &quot;downloadUrl&quot;: &quot;https://release.pyx.dev/0.7.12-alpha.1/uv-x86_64-unknown-linux-gnu.tar.gz&quot;
  },
  ...
]
</code></pre>
<pre lang="yaml"><code>- name: Use a custom manifest file
  uses: astral-sh/setup-uv@v6
  with:
    manifest-file: &quot;https://example.com/my-custom-manifest.json&quot;
</code></pre>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="445689ea25"><code>445689e</code></a>
Use latest version from manifest-file (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/458">#458</a>)</li>
<li><a
href="a02a550bdd"><code>a02a550</code></a>
Look for version-manifest.json relative to action path (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/456">#456</a>)</li>
<li><a
href="60cc2b4585"><code>60cc2b4</code></a>
Add input manifest-file (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/454">#454</a>)</li>
<li><a
href="7bbb36f434"><code>7bbb36f</code></a>
chore: update known versions for 0.7.13 and 0.7.12 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/444">#444</a>)</li>
<li><a
href="60ecb381b4"><code>60ecb38</code></a>
Set expected cache dir drive to C: on windows (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/451">#451</a>)</li>
<li><a
href="252c995424"><code>252c995</code></a>
chore: update known versions for 0.7.11 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/442">#442</a>)</li>
<li><a
href="477a814f2d"><code>477a814</code></a>
chore: update known versions for 0.7.10 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/440">#440</a>)</li>
<li><a
href="9b19f8f4b1"><code>9b19f8f</code></a>
Add warning about shadowed uv binaries to
<code>activate-environment</code> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/439">#439</a>)</li>
<li><a
href="d44461ea9f"><code>d44461e</code></a>
chore: update known versions for 0.7.9 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/437">#437</a>)</li>
<li><a
href="c19c1b1ffd"><code>c19c1b1</code></a>
Check that all jobs are in all-tests-passed.needs (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/432">#432</a>)</li>
<li>Additional commits viewable in <a
href="6b9c6063ab...445689ea25">compare
view</a></li>
</ul>
</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-23 11:21:06 +02:00
ehhuang
6fde601765
chore: upgrade hf hub dependency (#2487)
# What does this PR do?
CI tests have been failing with:
.venv/lib/python3.12/site-packages/peft/auto.py:21: in <module>
    from transformers import (
.venv/lib/python3.12/site-packages/transformers/__init__.py:27: in <module>
    from . import dependency_versions_check
.venv/lib/python3.12/site-packages/transformers/dependency_versions_check.py:57: in <module>
    require_version_core(deps[pkg])
.venv/lib/python3.12/site-packages/transformers/utils/versions.py:117: in require_version_core
    return require_version(requirement, hint)
.venv/lib/python3.12/site-packages/transformers/utils/versions.py:111: in require_version
    _compare_versions(op, got_ver, want_ver, requirement, pkg, hint)
.venv/lib/python3.12/site-packages/transformers/utils/versions.py:44: in _compare_versions
    raise ImportError(
E ImportError: huggingface-hub>=0.30.0,<1.0 is required for a normal functioning of this module, but found huggingface-hub==0.29.0.
E Try: `pip install transformers -U` or `pip install -e '.[dev]'` if you're working with git main
------------------------------ Captured log setup ------------------------------
INFO llama_stack.providers.remote.inference.ollama.ollama:ollama.py:106 checking connectivity to Ollama at `http://0.0.0.0:11434`.../
=========================== short test summary info ============================
ERROR tests/integration/providers/test_providers.py::TestProviders::test_providers - ImportError: huggingface-hub>=0.30.0,<1.0 is required for a normal functioning of this module, but found huggingface-hub==0.29.0.
Try: `pip install transformers -U` or `pip install -e '.[dev]'` if you're working with git main
=================== 1 skipped, 4 warnings, 1 error in 9.52s ====================

## Test Plan
CI
2025-06-20 15:50:54 -07:00
ehhuang
23b7dc7b37
fix: stack build (#2485)
# What does this PR do?

Probably related to the 3.11 upgrade.

^^^^
File
"/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.11/site-packages/termcolor/termcolor.py",
line 147, in colored
    text = fmt_str % (COLORS[color], text)
                      ~~~~~~^^^^^^^
KeyError: 'light_blue'

## Test Plan
2025-06-20 15:15:43 -07:00
github-actions[bot]
d70573bd47 build: Bump version to 0.2.12 2025-06-20 21:06:17 +00:00
ehhuang
d3b60507d7
feat: support auth attributes in inference/responses stores (#2389)
# What does this PR do?
Inference/Response stores now record user attributes when inserting, and
respect them when fetching.
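
A minimal sketch of the idea, not the actual store implementation: each row is saved together with the attributes of the user who inserted it, and fetches only return rows the caller's attributes allow. The `AttributeStore` class, the `owner_attrs` column, and the matching rule (a row is visible when the caller shares at least one value in every attribute category recorded on the row, and rows without attributes are public) are all assumptions made for illustration.

```python
from __future__ import annotations

import json
import sqlite3


def _can_access(owner: dict[str, list[str]], caller: dict[str, list[str]]) -> bool:
    # Assumed rule: for every attribute category stored on the row, the caller
    # must share at least one value. A row with no attributes is public.
    return all(set(values) & set(caller.get(key, [])) for key, values in owner.items())


class AttributeStore:
    """Hypothetical store that records the inserting user's attributes per row."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE responses (id TEXT PRIMARY KEY, body TEXT, owner_attrs TEXT)"
        )

    def insert(self, row_id: str, body: str, user_attrs: dict[str, list[str]] | None) -> None:
        # Persist the caller's attributes next to the payload.
        self.conn.execute(
            "INSERT INTO responses VALUES (?, ?, ?)",
            (row_id, body, json.dumps(user_attrs or {})),
        )

    def fetch_all(self, user_attrs: dict[str, list[str]] | None) -> list[str]:
        # Return only the ids of rows the caller is allowed to see.
        caller = user_attrs or {}
        rows = self.conn.execute("SELECT id, owner_attrs FROM responses ORDER BY id")
        return [row_id for row_id, owner in rows if _can_access(json.loads(owner), caller)]
```

With rows inserted by a user in team `ml`, a user in team `infra`, and an unauthenticated caller, a fetch as the `ml` user returns the first and the public row only.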

## Test Plan
pytest tests/unit/utils/test_sqlstore.py
2025-06-20 10:24:45 -07:00
Costa Shulyupin
7930c524f9
docs: Fix spacing (#2481)
![image](https://github.com/user-attachments/assets/4b8e0e9c-1622-41dd-a0f4-178b6b452029)


Replace misaligned tab with spaces

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>

2025-06-20 13:21:58 +02:00
ehhuang
6832e8a658
feat: remove score_threshold constraint (#2479)
# What does this PR do?
See inline comment.


fixes test

_
test_openai_vector_store_search_with_high_score_filter[llama_stack_client-meta-llama/Llama-3.3-70B-Instruct-meta-llama/Llama-4-Scout-17B-16E-Instruct-all-MiniLM-L6-v2-None-None]
_
llama-stack/llama_stack/distribution/library_client.py:98: in
convert_to_pydantic
    return TypeAdapter(annotation).validate_python(value)
.venv/lib/python3.10/site-packages/pydantic/type_adapter.py:421: in
validate_python
    return self.validator.validate_python(
E pydantic_core._pydantic_core.ValidationError: 1 validation error for
nullable[SearchRankingOptions]
E   score_threshold
E Input should be less than or equal to 1 [type=less_than_equal,
input_value=1.3458905661753127, input_type=float]
E For further information visit
https://errors.pydantic.dev/2.11/v/less_than_equal

The above exception was the direct cause of the following exception:

llama-stack/tests/integration/vector_io/test_openai_vector_stores.py:376:
in test_openai_vector_store_search_with_high_score_filter
    search_response = compat_client.vector_stores.search(

.venv/lib/python3.10/site-packages/llama_stack_client/resources/vector_stores/vector_stores.py:356:
in search
    return self._post(

.venv/lib/python3.10/site-packages/llama_stack_client/_base_client.py:1232:
in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream,
stream_cls=stream_cls))
llama-stack/llama_stack/distribution/library_client.py:177: in request
result = loop.run_until_complete(self.async_client.request(*args,
**kwargs))

/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/asyncio/base_events.py:649:
in run_until_complete
    return future.result()
llama-stack/llama_stack/distribution/library_client.py:292: in request
    response = await self._call_non_streaming(
llama-stack/llama_stack/distribution/library_client.py:313: in
_call_non_streaming
    body = self._convert_body(path, options.method, body)
llama-stack/llama_stack/distribution/library_client.py:425: in
_convert_body
converted_body[param_name] = convert_to_pydantic(param.annotation,
value)
llama-stack/llama_stack/distribution/library_client.py:112: in
convert_to_pydantic
raise ValueError(f"Failed to convert parameter {value} into
{annotation}: {e}") from e
E ValueError: Failed to convert parameter {'score_threshold':
1.3458905661753127} into
llama_stack.apis.vector_io.vector_io.SearchRankingOptions | None: 1
validation error for nullable[SearchRankingOptions]
E   score_threshold
E Input should be less than or equal to 1 [type=less_than_equal,
input_value=1.3458905661753127, input_type=float]
E For further information visit
https://errors.pydantic.dev/2.11/v/less_than_equal
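
The traceback above shows an upper bound of 1 on `score_threshold` rejecting the value 1.3458905661753127 that the test passed in. The following is a rough plain-Python stand-in for that behaviour, not the pydantic model itself: `RankingOptions`, `max_threshold`, and `filter_by_score` are hypothetical names, and `max_threshold=1.0` only mimics the constraint that this change removes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RankingOptions:
    """Hypothetical stand-in for a ranking-options model."""

    score_threshold: float = 0.0
    # None means "no upper bound"; 1.0 mimics the removed constraint.
    max_threshold: float | None = None

    def __post_init__(self) -> None:
        if self.max_threshold is not None and self.score_threshold > self.max_threshold:
            raise ValueError(
                f"score_threshold should be less than or equal to {self.max_threshold}"
            )


def filter_by_score(scores: list[float], options: RankingOptions) -> list[float]:
    # Keep only results whose score reaches the requested threshold.
    return [score for score in scores if score >= options.score_threshold]
```

With the bound in place, constructing the options with the threshold from the failing test raises `ValueError`; without it, the same threshold is accepted and simply filters out lower-scoring results.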

## Test Plan
2025-06-20 09:17:42 +05:30
Eran Cohen
747e594680
feat: expand set of known gemini models (#2471)
feat: Add Gemini 2.0 and 2.5 models

This commit expands the set of known Gemini models by introducing:
- `gemini/gemini-2.0-flash`
- `gemini/gemini-2.5-flash`
- `gemini/gemini-2.5-pro`

These new models are added to `LLM_MODEL_IDS` for broader compatibility
and updated in `run.yaml` to allow for their immediate use in starter
configurations.

Signed-off-by: Eran Cohen <eranco@redhat.com>
2025-06-19 12:19:37 -04:00
Ben Browning
f394c7f2d9
feat: Add missing Vector Store Files API surface (#2468)
# What does this PR do?

This adds the ability to list, retrieve, update, and delete Vector Store
Files. It implements these new APIs for the faiss and sqlite-vec
providers, since those are the two that also have the rest of the vector
store files implementation.

Closes #2445 

## Test Plan

### test_openai_vector_stores Integration Tests

There are a number of new integration tests added, which I ran for each
provider as outlined below.

faiss (from ollama distro):

```
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
llama stack run llama_stack/templates/ollama/run.yaml

LLAMA_STACK_CONFIG=http://localhost:8321 \
pytest -sv tests/integration/vector_io/test_openai_vector_stores.py \
  --embedding-model=all-MiniLM-L6-v2
```

sqlite-vec (from starter distro):

```
llama stack run llama_stack/templates/starter/run.yaml

LLAMA_STACK_CONFIG=http://localhost:8321 \
pytest -sv tests/integration/vector_io/test_openai_vector_stores.py \
  --embedding-model=all-MiniLM-L6-v2
```

### file_search verification tests

I also ensured the file_search verification tests continue to work, both
for faiss and sqlite-vec.

faiss (ollama distro):

```
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
llama stack run llama_stack/templates/ollama/run.yaml

pytest -sv tests/verifications/openai_api/test_responses.py \
  -k'file_search' \
  --base-url=http://localhost:8321/v1/openai/v1 \
  --model=meta-llama/Llama-3.2-3B-Instruct
```


sqlite-vec (starter distro):

```
llama stack run llama_stack/templates/starter/run.yaml

pytest -sv tests/verifications/openai_api/test_responses.py \
  -k'file_search' \
  --base-url=http://localhost:8321/v1/openai/v1 \
  --model=together/meta-llama/Llama-3.2-3B-Instruct-Turbo
```

---------

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-06-19 11:08:24 -04:00
Ihar Hrachyshka
a2f054607d
fix: cancel scheduler tasks on shutdown (#2130)
# What does this PR do?

Scheduler: cancel tasks on shutdown.

Otherwise, running tasks do not exit until they complete on their own,
which means the process can't be shut down cleanly (only with SIGKILL).

Ideally, we would tell tasks that they are about to be shut down and give
them some time to finish; but in the absence of such a mechanism, it's
better to cancel than to linger forever.
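
A minimal sketch of the cancel-on-shutdown approach described above, assuming the scheduler runs its jobs as asyncio tasks. `NaiveScheduler` and its methods are hypothetical names for illustration and are not the scheduler's real interface.

```python
import asyncio


class NaiveScheduler:
    """Hypothetical scheduler that tracks its tasks so it can cancel them."""

    def __init__(self) -> None:
        self._tasks: set[asyncio.Task] = set()

    def schedule(self, coro) -> asyncio.Task:
        task = asyncio.create_task(coro)
        self._tasks.add(task)
        # Forget the task once it finishes, however it finishes.
        task.add_done_callback(self._tasks.discard)
        return task

    async def shutdown(self) -> None:
        # There is no cooperative "please stop" signal here, so cancel outright.
        tasks = list(self._tasks)
        for task in tasks:
            task.cancel()
        # Wait until every cancellation has been delivered; the resulting
        # CancelledError instances are collected instead of being raised.
        await asyncio.gather(*tasks, return_exceptions=True)


async def main() -> bool:
    scheduler = NaiveScheduler()
    task = scheduler.schedule(asyncio.sleep(3600))  # stands in for a long job
    await asyncio.sleep(0)  # give the task a chance to start
    await scheduler.shutdown()
    return task.cancelled()
```

Running `asyncio.run(main())` returns promptly instead of waiting an hour, because `shutdown()` cancels the pending task rather than waiting for it to complete.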


## Test Plan

Start a long-running task (e.g. torchtune or external kfp-provider
training).
Press Ctrl-C in the TTY. Confirm the process exits in a reasonable time.

```
^CINFO:     Shutting down
INFO:     Waiting for application shutdown.
13:32:26.187 - INFO - Shutting down
13:32:26.187 - INFO - Shutting down DatasetsRoutingTable
13:32:26.187 - INFO - Shutting down DatasetIORouter
13:32:26.187 - INFO - Shutting down TorchtuneKFPPostTrainingImpl
    Traceback (most recent call last):
      File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run
        return self._loop.run_until_complete(task)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
        return future.result()
               ^^^^^^^^^^^^^^^
    asyncio.exceptions.CancelledError

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "<frozen runpy>", line 198, in _run_module_as_main
      File "<frozen runpy>", line 88, in _run_code
      File "/Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/executor_main.py", line 109, in <module>
        executor_main()
      File "/Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/executor_main.py", line 101, in executor_main
        output_file = executor.execute()
                      ^^^^^^^^^^^^^^^^^^
      File "/Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/executor.py", line 361, in execute
        result = self.func(**func_kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^
      File "/var/folders/45/1q1rx6cn7jbcn2ty852w0g_r0000gn/T/tmp.RKpPrvTWDD/ephemeral_component.py", line 118, in component
        asyncio.run(recipe.setup())
      File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run
        return runner.run(main)
               ^^^^^^^^^^^^^^^^
      File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 123, in run
        raise KeyboardInterrupt()
    KeyboardInterrupt


13:32:31.219 - ERROR - Task 'component' finished with status FAILURE
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
INFO     2025-05-09 13:32:31,221 llama_stack.providers.utils.scheduler:221 scheduler: Job
         test-jobc3c2e1e4-859c-4852-a41d-ef29e55e3efa: Pipeline 'test-jobc3c2e1e4-859c-4852-a41d-ef29e55e3efa'
         finished with status FAILURE. Inner task failed: 'component'.
ERROR    2025-05-09 13:32:31,223 llama_stack_provider_kfp_trainer.scheduler:54 scheduler: Job
         test-jobc3c2e1e4-859c-4852-a41d-ef29e55e3efa failed.
         ╭───────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮
         │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/src/llama_stack_provider_kfp_trainer/scheduler.py:45   │
         │ in do                                                                                                       │
         │                                                                                                             │
         │    42 │   │   │                                                                                             │
         │    43 │   │   │   job.status = JobStatus.running                                                            │
         │    44 │   │   │   try:                                                                                      │
         │ ❱  45 │   │   │   │   artifacts = self._to_artifacts(job.handler().output)                                  │
         │    46 │   │   │   │   for artifact in artifacts:                                                            │
         │    47 │   │   │   │   │   on_artifact_collected_cb(artifact)                                                │
         │    48                                                                                                       │
         │                                                                                                             │
         │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/base_compon │
         │ ent.py:101 in __call__                                                                                      │
         │                                                                                                             │
         │    98 │   │   │   │   f'{self.name}() missing {len(missing_arguments)} required '                           │
         │    99 │   │   │   │   f'{argument_or_arguments}: {arguments}.')                                             │
         │   100 │   │                                                                                                 │
         │ ❱ 101 │   │   return pipeline_task.PipelineTask(                                                            │
         │   102 │   │   │   component_spec=self.component_spec,                                                       │
         │   103 │   │   │   args=task_inputs,                                                                         │
         │   104 │   │   │   execute_locally=pipeline_context.Pipeline.get_default_pipeline() is                       │
         │                                                                                                             │
         │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/pipeline_ta │
         │ sk.py:187 in __init__                                                                                       │
         │                                                                                                             │
         │   184 │   │   ])                                                                                            │
         │   185 │   │                                                                                                 │
         │   186 │   │   if execute_locally:                                                                           │
         │ ❱ 187 │   │   │   self._execute_locally(args=args)                                                          │
         │   188 │                                                                                                     │
         │   189 │   def _execute_locally(self, args: Dict[str, Any]) -> None:                                         │
         │   190 │   │   """Execute the pipeline task locally.                                                         │
         │                                                                                                             │
         │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/pipeline_ta │
         │ sk.py:197 in _execute_locally                                                                               │
         │                                                                                                             │
         │   194 │   │   from kfp.local import task_dispatcher                                                         │
         │   195 │   │                                                                                                 │
         │   196 │   │   if self.pipeline_spec is not None:                                                            │
         │ ❱ 197 │   │   │   self._outputs = pipeline_orchestrator.run_local_pipeline(                                 │
         │   198 │   │   │   │   pipeline_spec=self.pipeline_spec,                                                     │
         │   199 │   │   │   │   arguments=args,                                                                       │
         │   200 │   │   │   )                                                                                         │
         │                                                                                                             │
         │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/local/pipeline_ │
         │ orchestrator.py:43 in run_local_pipeline                                                                    │
         │                                                                                                             │
         │    40 │                                                                                                     │
         │    41 │   # validate and access all global state in this function, not downstream                           │
         │    42 │   config.LocalExecutionConfig.validate()                                                            │
         │ ❱  43 │   return _run_local_pipeline_implementation(                                                        │
         │    44 │   │   pipeline_spec=pipeline_spec,                                                                  │
         │    45 │   │   arguments=arguments,                                                                          │
         │    46 │   │   raise_on_error=config.LocalExecutionConfig.instance.raise_on_error,                           │
         │                                                                                                             │
         │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/local/pipeline_ │
         │ orchestrator.py:108 in _run_local_pipeline_implementation                                                   │
         │                                                                                                             │
         │   105 │   │   │   )                                                                                         │
         │   106 │   │   return outputs                                                                                │
         │   107 │   elif dag_status == status.Status.FAILURE:                                                         │
         │ ❱ 108 │   │   log_and_maybe_raise_for_failure(                                                              │
         │   109 │   │   │   pipeline_name=pipeline_name,                                                              │
         │   110 │   │   │   fail_stack=fail_stack,                                                                    │
         │   111 │   │   │   raise_on_error=raise_on_error,                                                            │
         │                                                                                                             │
         │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/local/pipeline_ │
         │ orchestrator.py:137 in log_and_maybe_raise_for_failure                                                      │
         │                                                                                                             │
         │   134 │   │   logging_utils.format_task_name(task_name) for task_name in fail_stack)                        │
         │   135 │   msg = f'Pipeline {pipeline_name_with_color} finished with status                                  │
         │       {status_with_color}. Inner task failed: {task_chain_with_color}.'                                     │
         │   136 │   if raise_on_error:                                                                                │
         │ ❱ 137 │   │   raise RuntimeError(msg)                                                                       │
         │   138 │   with logging_utils.local_logger_context():                                                        │
         │   139 │   │   logging.error(msg)                                                                            │
         │   140                                                                                                       │
         ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
         RuntimeError: Pipeline 'test-jobc3c2e1e4-859c-4852-a41d-ef29e55e3efa' finished with status
         FAILURE. Inner task failed: 'component'.
INFO     2025-05-09 13:32:31,266 llama_stack.distribution.server.server:136 server: Shutting down
         DistributionInspectImpl
INFO     2025-05-09 13:32:31,266 llama_stack.distribution.server.server:136 server: Shutting down ProviderImpl
INFO:     Application shutdown complete.
INFO:     Finished server process [26648]
```


Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-06-19 17:01:33 +02:00
Sébastien Han
c20388c424
ci: add python package build test (#2457)
# What does this PR do?

We now test a package build on every PR.

Closes: https://github.com/meta-llama/llama-stack/issues/2406

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-19 18:57:32 +05:30
Sébastien Han
fa1d986f72
fix: remove asyncio.TimeoutError since Python update (#2476)
# What does this PR do?

Since we now support Python starting from 3.11, this is not needed
anymore.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-19 18:52:41 +05:30
Sébastien Han
6039d922c0
fix: allow running vector tests with embedding dimension (#2467)
# What does this PR do?

Do not force 384 for the embedding dimension, use the one provided by
the test run.
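
A minimal sketch of the idea, assuming the dimension arrives as an optional value from the test run (for example via `--embedding-dimension`). The function name and the 384 fallback constant are hypothetical and are not taken from the repository's fixtures.

```python
from typing import Optional

# Assumed fallback, mirroring the value that used to be forced.
DEFAULT_EMBEDDING_DIMENSION = 384


def resolve_embedding_dimension(requested: Optional[int]) -> int:
    """Prefer the dimension supplied by the test run over the default."""
    if requested is None:
        return DEFAULT_EMBEDDING_DIMENSION
    if requested <= 0:
        raise ValueError("embedding dimension must be positive")
    return requested
```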

## Test Plan

```
 pytest -s -vvv tests/integration/vector_io/test_vector_io.py --stack-config=http://localhost:8321 \
    -k "not(builtin_tool or safety_with_image or code_interpreter or test_rag)" \
    --text-model="meta-llama/Llama-3.2-3B-Instruct" \
    --embedding-model=granite-embedding-125m --embedding-dimension=768
Uninstalled 1 package in 16ms
Installed 1 package in 11ms
INFO     2025-06-18 10:52:03,314 tests.integration.conftest:59 tests: Setting DISABLE_CODE_SANDBOX=1 for macOS
/Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.10/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
================================================= test session starts =================================================
platform darwin -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.10.16', 'Platform': 'macOS-15.5-arm64-arm-64bit', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'cov': '6.0.0', 'html': '4.1.1', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: cov-6.0.0, html-4.1.1, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, nbval-0.11.0
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None
collected 8 items

tests/integration/vector_io/test_vector_io.py::test_vector_db_retrieve[emb=granite-embedding-125m:dim=768] PASSED
tests/integration/vector_io/test_vector_io.py::test_vector_db_register[emb=granite-embedding-125m:dim=768] PASSED
tests/integration/vector_io/test_vector_io.py::test_insert_chunks[emb=granite-embedding-125m:dim=768-test_case0] PASSED
tests/integration/vector_io/test_vector_io.py::test_insert_chunks[emb=granite-embedding-125m:dim=768-test_case1] PASSED
tests/integration/vector_io/test_vector_io.py::test_insert_chunks[emb=granite-embedding-125m:dim=768-test_case2] PASSED
tests/integration/vector_io/test_vector_io.py::test_insert_chunks[emb=granite-embedding-125m:dim=768-test_case3] PASSED
tests/integration/vector_io/test_vector_io.py::test_insert_chunks[emb=granite-embedding-125m:dim=768-test_case4] PASSED
tests/integration/vector_io/test_vector_io.py::test_insert_chunks_with_precomputed_embeddings[emb=granite-embedding-125m:dim=768] PASSED

================================================== 8 passed in 5.50s ==================================================
```

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-19 13:29:04 +05:30
Charlie Doern
d12f195f56
feat: drop python 3.10 support (#2469)
# What does this PR do?

Dropped Python 3.10 support, updated pyproject and dependencies, and
removed some blocks of code with special handling for `enum.StrEnum`.

Closes #2458

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-06-19 12:07:14 +05:30
ehhuang
db2cd9e8f3
feat: support filters in file search (#2472)
# What does this PR do?
Move to use vector_stores.search for file search tool in Responses,
which supports filters.

closes #2435 

## Test Plan
Added e2e test with filters.
myenv ❯ llama stack run llama_stack/templates/fireworks/run.yaml

pytest -sv tests/verifications/openai_api/test_responses.py \
  -k 'file_search and filters' \
  --base-url=http://localhost:8321/v1/openai/v1 \
  --model=meta-llama/Llama-3.3-70B-Instruct
2025-06-18 21:50:55 -07:00
Ihar Hrachyshka
fd37a50e6a
chore: Remove @booxter from triagers (#2473)
Sadly, I won't have capacity to continue working for the project.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-06-18 19:30:09 -07:00
ehhuang
e6bfc717cb
feat(ui): add infinite scroll pagination to chat completions/responses logs table (#2466)
## Summary:

This commit adds infinite scroll pagination to the chat completions and
responses tables.


## Test Plan:
  1. Run unit tests: npm run test
  2. Manual testing: Navigate to chat completions/responses pages
  3. Verify infinite scroll triggers when approaching bottom
  4. Added playwright tests: npm run test:e2e
2025-06-18 15:28:39 -07:00
Sumit Jaiswal
90d03552d4
feat: To add health check for faiss inline vector_io provider (#2319)
# What does this PR do?
Adds a health check for the faiss inline vector_io provider.

I tried adding `async def health(self) -> HealthResponse:` as in the
inference provider, but it didn't work for the `inline->vector_io->faiss`
provider. From the debug logs, I found the underlying issue: health
responses are stored with the API name as the key, not as a nested
dictionary keyed by provider ID. This means that all providers of the
same API type (e.g., "vector_io") share the same health response, and
only the last one processed is visible in the API response.
I've created a patch file that fixes this issue by:
- Storing the original get_providers_health method
- Creating a patched version that correctly maps health responses to
providers
- Applying the patch to the `ProviderImpl` class
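
The keying problem can be sketched as follows. This is an illustration under assumed data shapes, not the code from the patch: the `reports` triples and both function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical (api, provider_id, status) triples, for illustration only.
reports = [
    ("vector_io", "faiss", "OK"),
    ("vector_io", "sqlite-vec", "Error"),
    ("inference", "ollama", "OK"),
]


def health_by_api(reports):
    """Flat mapping keyed by API name: later providers overwrite earlier ones."""
    result = {}
    for api, _provider_id, status in reports:
        result[api] = status
    return result


def health_by_provider(reports):
    """Nested mapping: API name -> provider ID -> status."""
    result = defaultdict(dict)
    for api, provider_id, status in reports:
        result[api][provider_id] = status
    return {api: dict(providers) for api, providers in result.items()}
```

With the flat mapping, the `faiss` entry is lost because `sqlite-vec` is processed later under the same `vector_io` key; the nested mapping keeps both.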

I'm not an expert here, so please let me know if there is another way
to get the health status updated directly from `faiss.py`.

## Test Plan
Added unit tests for the provider patch implementation in this PR.
The screenshot below shows the FAISS inline vector_io health status as
"OK".


![faiss_health_check](https://github.com/user-attachments/assets/d769e762-890c-41ea-a596-5e90951f79a4)
2025-06-18 17:56:25 +02:00
github-actions[bot]
7d812e3bf0
build: Bump version to 0.2.11
2025-06-17 19:08:17 +00:00
Hardik Shah
822307e6d5
fix: Do not throw when listing vector stores (#2460)
When trying to `list` vector_stores, if we cannot retrieve one, log an
error and return all the ones that are valid.
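
The log-and-continue behavior can be sketched roughly like this. It assumes a per-store lookup callable; `list_valid`, `store_ids`, and `fetch` are hypothetical names, not the provider's actual API.

```python
import logging

logger = logging.getLogger(__name__)


def list_valid(store_ids, fetch):
    """Return every store that `fetch` can retrieve; log and skip the rest."""
    stores = []
    for store_id in store_ids:
        try:
            stores.append(fetch(store_id))
        except Exception as exc:  # one bad entry should not fail the whole listing
            logger.error("Skipping vector store %s: %s", store_id, exc)
    return stores
```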

### Test Plan 
```
pytest -sv --stack-config=http://localhost:8321 tests/integration/vector_io/test_openai_vector_stores.py  --embedding-model all-MiniLM-L6-v2
```
Also tested for `--stack-config fireworks`
2025-06-17 11:19:43 -07:00
Dalton Flanagan
53ac8532e4
fix: clarify bash requirement in install flow (#2450)
# What does this PR do?
Our "run this line to get started" pipes into `sh`, but the default
shell on Ubuntu (a common setup) is `dash`, which doesn't support
`pipefail`:

```
dalton@ollama-test:~$ ls -l /usr/bin/sh
lrwxrwxrwx 1 root root 4 Mar 31  2024 /usr/bin/sh -> dash
```

```
$ curl -LsSf https://github.com/meta-llama/llama-stack/raw/main/install.sh | sh
sh: 8: set: Illegal option -o pipefail
``` 

Let's be explicit with `bash`? It covers Linux, WSL, macOS and I doubt
anyone's trying to run Llama Stack on embedded systems :)


## Test Plan

```
dalton@ollama-test:~/llama-stack$ cat install.sh | sh
This script must be run with bash
dalton@ollama-test:~/llama-stack$ cat install.sh | bash
 Docker or Podman is required. Install Docker: https://docs.docker.com/get-docker/ or Podman: https://podman.io/getting-started/installation
```
2025-06-17 13:03:28 +05:30
Ben Browning
94fcfb5674
fix: broken links on nvidia distro docs when rendered (#2446)
# What does this PR do?

The Nvidia distribution docs had some broken links when viewing the
rendered docs site, where the deep links they were attempting into our
code on GitHub weren't actually getting users to the intended
destination.

This updates those links to use the `{repopath}` helper we use elsewhere
to generate valid deep links into the Llama Stack repository.

## Test Plan

I generated the site locally after this change and ensured the links now
resolve to their intended destination.

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-06-17 13:02:13 +05:30
ehhuang
15f630e5da
feat: support pagination in inference/responses stores (#2397)
# What does this PR do?


## Test Plan
added unit tests
2025-06-16 22:43:35 -07:00
Varsha
6f1a935365
chore: Add OpenAI compatibility for vLLM embeddings (#2448)
# What does this PR do?
- Implement OpenAI-compatible embeddings endpoint in vLLM provider
- Support both float and base64 encoding formats
- Add proper error handling and response formatting
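
To make the two encoding formats concrete, here is a rough sketch. It assumes the base64 form carries the vector as packed little-endian float32 values, which is an assumption about the wire format rather than something stated in this PR; the function names are hypothetical.

```python
import base64
import struct


def encode_embedding(vector, encoding_format="float"):
    """Return the embedding as a list of floats or as a base64 string."""
    if encoding_format == "float":
        return list(vector)
    if encoding_format == "base64":
        # Assumption: little-endian float32 packing.
        raw = struct.pack(f"<{len(vector)}f", *vector)
        return base64.b64encode(raw).decode("ascii")
    raise ValueError(f"unsupported encoding_format: {encoding_format!r}")


def decode_base64_embedding(data):
    """Inverse of the base64 branch above (4 bytes per float32 value)."""
    raw = base64.b64decode(data)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))
```

Values that are not exactly representable in float32 lose precision when round-tripped through the base64 form.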

Closes #2447 

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
2025-06-16 19:06:05 -04:00
Jash Gulabrai
40e2c97915
feat: Add Nvidia e2e beginner notebook and tool calling notebook (#1964)
# What does this PR do?
This PR contains two sets of notebooks that serve as reference material
for developers getting started with Llama Stack using the NVIDIA
Provider. Developers should be able to execute these notebooks
end-to-end, pointing to their NeMo Microservices deployment.
1. `beginner_e2e/`: Notebook that walks through a beginner end-to-end
workflow that covers creating datasets, running inference, customizing
and evaluating models, and running safety checks.
2. `tool_calling/`: Notebook that is ported over from the [Data Flywheel
& Tool Calling
notebook](https://github.com/NVIDIA/GenerativeAIExamples/tree/main/nemo/data-flywheel)
that is referenced in the NeMo Microservices docs. I updated the
notebook to use the Llama Stack client wherever possible, and added
relevant instructions.

## Test Plan
- Both notebook folders contain READMEs with pre-requisites. To manually
test these notebooks, you'll need to have a deployment of the NeMo
Microservices Platform and update the `config.py` file with your
deployment's information.
- I've run through these notebooks manually end-to-end to verify each
step works.

---------

Co-authored-by: Jash Gulabrai <jgulabrai@nvidia.com>
2025-06-16 11:29:01 -04:00
Rohan Awhad
436c7aa751
feat: Add url field to PaginatedResponse and populate it using route path (#2419)
# What does this PR do?

Closes #1847 

Changes:
- llama_stack/apis/common/responses.py: adds optional `url` field to
PaginatedResponse
- llama_stack/distribution/server/server.py: automatically populate the
URL field with route path
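
A simplified sketch of the two changes, using a plain dataclass as a stand-in for the real model. Apart from the `data`, `has_more`, and `url` field names, which appear in the output shown in the test plan, everything here (including `with_route_url`) is hypothetical.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, List, Optional


@dataclass
class PaginatedResponse:
    # Stand-in for the real response model; `url` is the new optional field.
    data: List[Any] = field(default_factory=list)
    has_more: bool = False
    url: Optional[str] = None


def with_route_url(response: PaginatedResponse, route_path: str) -> PaginatedResponse:
    """Fill in `url` from the route path unless the handler already set it."""
    if response.url is None:
        response.url = route_path
    return response
```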



## Test Plan

- Built and ran llama stack server using the following cmds:  
  ```bash  
  export INFERENCE_MODEL=llama3.1:8b  
  llama stack build --run --template ollama --image-type container
  llama stack run llama_stack/templates/ollama/run.yaml
  ```
- Ran `curl` to test if we are seeing the `url` param in response:
  ```bash
  curl -X 'GET' \
    'http://localhost:8321/v1/agents' \
    -H 'accept: application/json'
  ```
- Expected and Received Output:
`{"data":[],"has_more":false,"url":"/v1/agents"}`

---------

Co-authored-by: Rohan Awhad <rawhad@redhat.com>
2025-06-16 11:19:48 +02:00
Hardik Shah
985d0b156c
feat: Add suffix to openai_completions (#2449)
Code completion apps need "fill in the middle" capabilities.
Added a `suffix` option to `openai_completion` to enable this.
Updated the ollama provider to showcase it.
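
For readers new to fill-in-the-middle, here is a rough sketch of how a prefix and suffix can be combined into a single prompt. The sentinel tokens are an assumption modeled on the Qwen2.5-Coder format, and `build_fim_prompt` is a hypothetical helper; this is not how the ollama provider does it, since the provider takes `suffix` as a request parameter.

```python
def build_fim_prompt(
    prefix: str,
    suffix: str,
    prefix_token: str = "<|fim_prefix|>",   # assumed sentinel, model-specific
    suffix_token: str = "<|fim_suffix|>",   # assumed sentinel, model-specific
    middle_token: str = "<|fim_middle|>",   # assumed sentinel, model-specific
) -> str:
    """Assemble a fill-in-the-middle prompt; the model generates the middle."""
    return f"{prefix_token}{prefix}{suffix_token}{suffix}{middle_token}"


prompt = build_fim_prompt("The capital of ", "is Paris.")
print(prompt)
```

Other model families use different sentinel tokens, so the defaults above should be treated as placeholders.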

### Test Plan 
```
pytest -sv --stack-config="inference=ollama"  tests/integration/inference/test_openai_completion.py --text-model qwen2.5-coder:1.5b -k test_openai_completion_non_streaming_suffix
```

### OpenAI Sample script
```
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1")

response = client.completions.create(
    model="qwen2.5-coder:1.5b",
    prompt="The capital of ",
    suffix="is Paris.",
    max_tokens=10,
)

print(response.choices[0].text)
``` 
### Output
```
France is ____.

To answer this question, we 
```
2025-06-13 16:06:06 -07:00
Varsha
2e8054bede
feat: Implement hybrid search in SQLite-vec (#2312)
# What does this PR do?
Add support for hybrid search mode in the SQLite-vec provider, which combines
keyword and vector search for better results. The implementation:

- Adds hybrid search mode as a new option alongside vector and keyword search
- Implements the `query_hybrid` method in `SQLiteVecIndex`, which:
  - First performs keyword search to get candidate matches
  - Then applies vector similarity search on those candidates
- Updates documentation to reflect the new search mode

This change improves search quality by leveraging both semantic similarity
and keyword matching, while maintaining backward compatibility with existing
vector and keyword search modes.
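
The two-stage flow described above can be sketched in plain Python. This is an illustration only: the in-memory chunk dicts, the substring keyword match, and the standalone `query_hybrid` function are assumptions and do not reflect the actual `SQLiteVecIndex` code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def query_hybrid(chunks, query_text, query_vec, k=2):
    """Keyword-filter first, then rank the candidates by vector similarity."""
    terms = query_text.lower().split()
    # Stage 1: keyword search produces the candidate set.
    candidates = [
        c for c in chunks if any(t in c["text"].lower() for t in terms)
    ]
    # Stage 2: vector similarity orders those candidates.
    ranked = sorted(
        candidates, key=lambda c: cosine(c["vec"], query_vec), reverse=True
    )
    return [c["id"] for c in ranked[:k]]


chunks = [
    {"id": "a", "text": "SQLite vector search", "vec": [1.0, 0.0]},
    {"id": "b", "text": "keyword search in SQLite", "vec": [0.0, 1.0]},
    {"id": "c", "text": "unrelated text", "vec": [1.0, 0.0]},
]
print(query_hybrid(chunks, "sqlite", [0.9, 0.1]))
```

Chunk `c` is as close to the query vector as chunk `a`, but it is never scored because it fails the keyword stage.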

## Test Plan
```
pytest tests/unit/providers/vector_io/test_sqlite_vec.py -v -s --tb=short
/Users/vnarsing/miniconda3/envs/stack-client/lib/python3.10/site-packages/pytest_asyncio/plugin.py:217: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
=============================================================================================== test session starts ===============================================================================================
platform darwin -- Python 3.10.16, pytest-8.3.5, pluggy-1.5.0 -- /Users/vnarsing/miniconda3/envs/stack-client/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.10.16', 'Platform': 'macOS-14.7.6-arm64-arm-64bit', 'Packages': {'pytest': '8.3.5', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'anyio': '4.8.0', 'asyncio': '0.26.0', 'nbval': '0.11.0', 'cov': '6.1.1'}}
rootdir: /Users/vnarsing/go/src/github/meta-llama/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, anyio-4.8.0, asyncio-0.26.0, nbval-0.11.0, cov-6.1.1
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 10 items                                                                                                                                                                                                

tests/unit/providers/vector_io/test_sqlite_vec.py::test_add_chunks PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_vector PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_full_text_search PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_full_text_search_k_greater_than_results PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_chunk_id_conflict PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_generate_chunk_id PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_no_keyword_matches PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_score_threshold PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_different_embedding PASSED
```

---------

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
2025-06-13 15:54:06 -04:00
Ben Browning
941f505eb0
feat: File search tool for Responses API (#2426)
# What does this PR do?

This is an initial working prototype of wiring up the `file_search`
builtin tool for the Responses API to our existing rag knowledge search
tool.

This is me seeing what I could pull together on top of the bits we
already have merged. This may not be the ideal way to implement this,
and things like how I shuffle the vector store ids from the original
response API tool request to the actual tool execution feel a bit hacky
(grep for `tool_kwargs["vector_db_ids"]` in `_execute_tool_call` to see
what I mean).
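
To make the "shuffling" concrete, here is a rough sketch of the idea. The helper name, the dict shapes, and the `vector_store_ids` field on the tool config are assumptions for illustration; only the `tool_kwargs["vector_db_ids"]` key comes from the code referenced above.

```python
def file_search_tool_kwargs(tool_config: dict, tool_kwargs: dict) -> dict:
    """Copy vector store ids from a file_search tool config into the kwargs
    passed to the knowledge search tool (hypothetical helper)."""
    merged = dict(tool_kwargs)  # leave the caller's dict untouched
    if tool_config.get("type") == "file_search":
        merged["vector_db_ids"] = list(tool_config.get("vector_store_ids", []))
    return merged


tool = {"type": "file_search", "vector_store_ids": ["vs_123"]}
kwargs = file_search_tool_kwargs(tool, {"query": "llama"})
print(kwargs)
```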

## Test Plan

I stubbed in some new tests to exercise this using text and pdf
documents.

Note that this is currently under tests/verification only because it
sometimes flakes with tool calling of the small Llama-3.2-3B model we
run in CI (and that I use as an example below). We'd want to make the
test a bit more robust in some way if we moved this over to
tests/integration and ran it in CI.

### OpenAI SaaS (to verify test correctness)

```
pytest -sv tests/verifications/openai_api/test_responses.py \
  -k 'file_search' \
  --base-url=https://api.openai.com/v1 \
  --model=gpt-4o
```

### Fireworks with faiss vector store

```
llama stack run llama_stack/templates/fireworks/run.yaml

pytest -sv tests/verifications/openai_api/test_responses.py \
  -k 'file_search' \
  --base-url=http://localhost:8321/v1/openai/v1 \
  --model=meta-llama/Llama-3.3-70B-Instruct
```

### Ollama with faiss vector store

This sometimes flakes on Ollama because the quantized small model
doesn't always choose to call the tool to answer the user's question.
But, it often works.

```
ollama run llama3.2:3b

INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
llama stack run ./llama_stack/templates/ollama/run.yaml \
  --image-type venv \
  --env OLLAMA_URL="http://0.0.0.0:11434"

pytest -sv tests/verifications/openai_api/test_responses.py \
  -k 'file_search' \
  --base-url=http://localhost:8321/v1/openai/v1 \
  --model=meta-llama/Llama-3.2-3B-Instruct
```

### OpenAI provider with sqlite-vec vector store

```
llama stack run ./llama_stack/templates/starter/run.yaml --image-type venv

pytest -sv tests/verifications/openai_api/test_responses.py \
  -k 'file_search' \
  --base-url=http://localhost:8321/v1/openai/v1 \
  --model=openai/gpt-4o-mini
```

### Ensure existing vector store integration tests still pass

```
ollama run llama3.2:3b

INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
llama stack run ./llama_stack/templates/ollama/run.yaml \
  --image-type venv \
  --env OLLAMA_URL="http://0.0.0.0:11434"

LLAMA_STACK_CONFIG=http://localhost:8321 \
pytest -sv tests/integration/vector_io \
  --text-model "meta-llama/Llama-3.2-3B-Instruct" \
  --embedding-model=all-MiniLM-L6-v2
```

---------

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-06-13 14:32:48 -04:00
Francisco Arceo
554ada57b0
chore: Add OpenAI compatibility for Ollama embeddings (#2440)
# What does this PR do?
This PR adds OpenAI compatibility for Ollama embeddings. Closes
https://github.com/meta-llama/llama-stack/issues/2428

Summary of changes:
- `llama_stack/providers/remote/inference/ollama/ollama.py`
  - Implements the OpenAI embeddings endpoint for Ollama, replacing the
    `NotImplementedError` with a full function that validates the model,
    prepares parameters, calls the client, encodes embedding data
    (optionally in base64), and returns a correctly structured response.
  - Updates import statements to include the new embedding response
    utilities.

- `llama_stack/providers/utils/inference/litellm_openai_mixin.py`
  - Refactors the embedding data encoding logic to use a new shared
    utility (`b64_encode_openai_embeddings_response`) instead of inline
    base64 encoding and packing logic.
  - Cleans up imports accordingly.

- `llama_stack/providers/utils/inference/openai_compat.py`
  - Adds `b64_encode_openai_embeddings_response` to handle encoding OpenAI
    embedding outputs (including base64 support) in a reusable way.
  - Adds `prepare_openai_embeddings_params` utility for standardizing
    embedding parameter preparation.
  - Updates imports to include the new embedding data class.

- `tests/integration/inference/test_openai_embeddings.py`
  - Removes `"remote::ollama"` from the list of providers that skip OpenAI
    embeddings tests, since support is now implemented.
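
As a rough illustration of the base64 path, the sketch below packs an embedding as little-endian float32 values and base64-encodes the bytes. The function names are hypothetical and the float32 layout is an assumption; this is not the body of `b64_encode_openai_embeddings_response`.

```python
import base64
import struct


def encode_embedding_b64(values):
    """Pack floats as little-endian float32 and base64-encode the bytes."""
    packed = struct.pack(f"<{len(values)}f", *values)
    return base64.b64encode(packed).decode("ascii")


def decode_embedding_b64(encoded):
    """Inverse of encode_embedding_b64: base64 text back to a float list."""
    raw = base64.b64decode(encoded)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


# Values chosen to be exactly representable in float32.
vector = [0.5, -1.0, 2.25]
encoded = encode_embedding_b64(vector)
print(encoded, decode_embedding_b64(encoded))
```

Arbitrary Python floats will generally not round-trip exactly, because they are narrowed from 64-bit to 32-bit precision.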

## Note

There was one minor issue, which required me to override the
`OpenAIEmbeddingsResponse.model` name with
`self._get_model(model).identifier` name, which is very unsatisfying.

## Test Plan
Unit Tests and integration tests

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-06-13 14:28:51 -04:00
grs
e2e15ebb6c
feat(auth): allow token to be provided for use against jwks endpoint (#2394)
Though the JWKS endpoint does not usually require authentication, it
does in a Kubernetes cluster. While the cluster can be configured to
allow anonymous access to that endpoint, this change avoids the need to do so.
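
A minimal sketch of the idea, assuming the token is sent as a standard bearer header. The helper name and the example URL are illustrative and are not taken from the Llama Stack auth provider; the request is only constructed here, not sent.

```python
import urllib.request


def build_jwks_request(jwks_url, token=None):
    """Build a GET request for a JWKS document, optionally authenticated."""
    headers = {"Accept": "application/json"}
    if token:
        # Only attach credentials when a token is configured.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(jwks_url, headers=headers)


# Example URL is an assumption; use whatever endpoint the cluster exposes.
req = build_jwks_request(
    "https://kubernetes.default.svc/openid/v1/jwks", token="example-token"
)
print(req.get_header("Authorization"))
```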
2025-06-13 10:13:41 +02:00
Hardik Shah
ddaee42650
test: Update integration-tests.yml (#2443)
Added `vector_io` to the CI integration tests.
2025-06-13 10:04:08 +02:00
Hardik Shah
fef670b024
feat: update openai tests to work with both clients (#2442)
https://github.com/meta-llama/llama-stack-client-python/pull/238 updated
llama-stack-client to also support the OpenAI endpoints for embeddings,
files, and vector stores. This updates the tests to cover all configs:
the OpenAI SDK, the Llama Stack SDK, and library-as-client.
2025-06-12 16:30:23 -07:00
Hardik Shah
0bc1747ed8
feat: update search for vector_stores (#2441)
Updated the `search` response to match OpenAI's.

## Test Plan
```
pytest -sv --stack-config=http://localhost:8321 tests/integration/vector_io/test_openai_vector_stores.py --embedding-model all-MiniLM-L6-v2
```
2025-06-12 15:34:22 -07:00
Ibrahim Haroon
35c2817d0a
fix(weaviate): handle case where distance is 0 by setting score to infinity (#2415)
# What does this PR do?
Fixes the Weaviate provider's `query_vector` function for the case where
the distance between the query embedding and an embedding within the
vector db is 0 (identical vectors). Catches `ZeroDivisionError` and then
sets `score` to infinity, which represents maximum similarity.
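
The handling can be sketched as follows, assuming the score is the reciprocal of the distance (the helper name is hypothetical):

```python
def distance_to_score(distance: float) -> float:
    """Convert a distance to a similarity score; identical vectors score inf."""
    try:
        return 1.0 / distance
    except ZeroDivisionError:
        # A distance of 0 means identical vectors: maximum similarity.
        return float("inf")


print(distance_to_score(0.5), distance_to_score(0.0))
```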

Closes [#2381]

## Test Plan
Check out this PR.

Execute this code; it no longer raises a `ZeroDivisionError`
exception:
```
from llama_stack_client import LlamaStackClient

base_url = "http://localhost:8321"
client = LlamaStackClient(base_url=base_url)

models = client.models.list()
embedding_model = next(
    m for m in models if m.model_type == "embedding"
).identifier
embedding_dimension = 384

_ = client.vector_dbs.register(
    vector_db_id="foo_db",
    embedding_model=embedding_model,
    embedding_dimension=embedding_dimension,
    provider_id="weaviate",
)

chunk = {
    "content": "foo",
    "mime_type": "text/plain",
    "metadata": {
        "document_id": "foo-id"
    }
}

client.vector_io.insert(vector_db_id="foo_db", chunks=[chunk])
client.vector_io.query(vector_db_id="foo_db", query="foo")
```
2025-06-12 11:23:59 -04:00
Sébastien Han
eb04731750
ci: fix external provider test (#2438)
# What does this PR do?

The test wasn't using the correct virtual environment. Also increases the
console width for logs.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-12 16:14:32 +02:00
Hardik Shah
de37a04c3e
fix: set appropriate defaults for params (#2434)
Set the defaults to be `| None`; otherwise the params get marked as
required in the OpenAPI spec.
2025-06-11 17:30:34 -07:00
Hardik Shah
d55100d9b7
feat: OpenAIVectorIOMixin for vector_stores common logic (#2427)
Extracts common OpenAI vector-store code into its own mixin so that all
providers can share the same core logic.
This also makes it easy for Llama Stack to support both vector-stores
and Llama Stack APIs in the interim so that both share the same
underlying vector-dbs.

Each provider contains storage-specific logic to `create / edit / delete
/ list` vector dbs, while the plumbing logic is standardized in the
common code.

Ensured that this works well with both faiss and sqlite-vec.
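
The split between shared plumbing and storage hooks can be pictured with a small sketch. All class and method names here are hypothetical and much simpler than the real `OpenAIVectorIOMixin`.

```python
import uuid


class VectorStoreMixin:
    """Shared plumbing; providers supply the storage hooks below."""

    def create_vector_store(self, name):
        store = {"id": f"vs_{uuid.uuid4().hex}", "name": name}
        self._save_store(store)
        return store

    def list_vector_stores(self):
        return sorted(self._load_stores(), key=lambda s: s["name"])

    def delete_vector_store(self, store_id):
        existed = any(s["id"] == store_id for s in self._load_stores())
        if existed:
            self._delete_store(store_id)
        return {"id": store_id, "deleted": existed}

    # Storage hooks, implemented per provider.
    def _save_store(self, store):
        raise NotImplementedError

    def _load_stores(self):
        raise NotImplementedError

    def _delete_store(self, store_id):
        raise NotImplementedError


class InMemoryProvider(VectorStoreMixin):
    """Toy provider that keeps stores in a dict."""

    def __init__(self):
        self._stores = {}

    def _save_store(self, store):
        self._stores[store["id"]] = store

    def _load_stores(self):
        return list(self._stores.values())

    def _delete_store(self, store_id):
        del self._stores[store_id]


provider = InMemoryProvider()
docs = provider.create_vector_store("docs")
provider.create_vector_store("code")
print([s["name"] for s in provider.list_vector_stores()])
```

A second provider would only need to implement the three hooks to get the same create, list, and delete behavior.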

### Test Plan 
```
llama stack run starter
pytest -sv --stack-config http://localhost:8321 tests/integration/vector_io/test_openai_vector_stores.py --embedding-model all-MiniLM-L6-v2
```
2025-06-11 15:40:57 -07:00
Rohan Awhad
4e37b49cdc
fix: #1867 InferenceRouter has no attribute formatter (#2422)
# What does this PR do?

Closes #1867 

[Steps to reproduce the
bug](https://github.com/meta-llama/llama-stack/issues/1867#issuecomment-2956819381)

The change was designed to minimize code changes. Open to the option of
skipping the `metrics` field entirely when `telemetry` is disabled.


## Test Plan
1. Build llama-stack remote-vllm container
    ```bash
    llama stack build --template remote-vllm --image-type container
    ```
2. Create a small run.yaml
    ```yaml
    version: '2'
    image_name: remote-vllm
    apis:
    - inference
    providers:
      inference:
      - provider_id: vllm-inference
        provider_type: remote::vllm
        config:
          url: ${env.VLLM_URL:http://localhost:8000/v1}
          max_tokens: ${env.VLLM_MAX_TOKENS:4096}
          api_token: ${env.VLLM_API_TOKEN:fake}
          tls_verify: ${env.VLLM_TLS_VERIFY:true}
    metadata_store:
      type: sqlite
      db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/remote-vllm}/registry.db
    inference_store:
      type: sqlite
      db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/remote-vllm}/inference_store.db
    models:
    - metadata: {}
      model_id: ${env.INFERENCE_MODEL}
      provider_id: vllm-inference
      model_type: llm
    shields: []
    vector_dbs: []
    datasets: []
    scoring_fns: []
    benchmarks: []
    server:
      port: 8321
    ```
3. Run the llama-stack server
    ```bash
    export VLLM_URL="http://localhost:8000/v1"
    export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
    llama stack run run.yaml
    ```
4. Then perform a curl
    ```bash
    curl -X 'POST' \
      'http://localhost:8321/v1/inference/completion' \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -d '{
      "model_id": "meta-llama/Llama-3.2-3B-Instruct",
      "content": "string",
      "sampling_params": {
        "strategy": {
          "type": "greedy"
        },
        "max_tokens": 10,
        "repetition_penalty": 1,
        "stop": [
          "string"
        ]
      },
      "stream": false,
      "logprobs": {
        "top_k": 0
      }
    }'
    ```
5. You should receive a 200 response with metric values set to 0,
similar to the one below:
    ```
    {
      "metrics": [
        {
          "metric": "prompt_tokens",
          "value": 0,
          "unit": null
        },
        {
          "metric": "completion_tokens",
          "value": 0,
          "unit": null
        },
        {
          "metric": "total_tokens",
          "value": 0,
          "unit": null
        }
      ],
      [...]
    }
    ```
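
For reference, a minimal sketch of producing metric entries in the shape shown above. The `build_metrics` helper is hypothetical and is not the router's code; the zero defaults stand in for the case where token counts are unavailable.

```python
def build_metrics(prompt_tokens: int = 0, completion_tokens: int = 0) -> list:
    """Return metric entries in the response shape; counts default to 0."""
    counts = {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }
    return [
        {"metric": name, "value": value, "unit": None}
        for name, value in counts.items()
    ]


metrics = build_metrics()
print(metrics)
```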

Co-authored-by: Rohan Awhad <rawhad@redhat.com>
2025-06-11 18:14:41 +02:00
Hardik Shah
5ac43268e8
feat: Add OpenAI compat /v1/vector_store APIs (#2423)
Adds OpenAI-compatible `/v1/vector-store` APIs.
This PR implements the `faiss` provider, with follow-up PRs coming for
other providers.

Added routes to create, update, delete, and list vector stores.
Also added a route to search a vector store.

Inserting into vector stores is missing and will come in a follow-up diff.

### Test Plan 
- Added new integration test for testing the faiss provider 
```
pytest -sv --stack-config http://localhost:8321 tests/integration/vector_io/test_openai_vector_stores.py --embedding-model all-MiniLM-L6-v2
```
2025-06-10 13:07:39 -07:00