Commit graph

1316 commits

Author SHA1 Message Date
Mustafa Elbehery
a5c3362bcd
chore(api): add mypy coverage to meta_reference_config (#2664)
# What does this PR do?
This PR adds mypy static type coverage for `meta_reference_config` in `llama-stack`.

Part of https://github.com/meta-llama/llama-stack/issues/2647

## Test Plan

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-07-09 10:24:30 +02:00
Mustafa Elbehery
28343fea51
chore(api): add mypy coverage to meta_reference_safety (#2661)
# What does this PR do?
This PR adds mypy static type coverage for `meta_reference_safety` in `llama-stack`.

Part of https://github.com/meta-llama/llama-stack/issues/2647

## Test Plan

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-07-09 10:22:34 +02:00
pgustafs
d39660afed
fix(remote:milvus): add missing files_api parameter and kvstore configuration (#2630)
- Fix constructor call missing files_api parameter
- Add kvstore field to MilvusVectorIOConfig
- Resolves #2626

# What does this PR do?
Issue: https://github.com/meta-llama/llama-stack/issues/2626
## Problem
The `MilvusVectorIOAdapter` fails to initialize due to two missing
configuration issues:
1. Missing `files_api` parameter in the constructor call
2. Missing `kvstore` field in the `MilvusVectorIOConfig` class

## Root Cause
1. The adapter constructor expects three parameters, `(config, inference_api,
files_api)`, but the `get_adapter_impl` function passes only two
2. The `MilvusVectorIOConfig` class lacks the `kvstore` field that the
adapter's `initialize()` method expects for metadata persistence

## Solution
- Added `files_api = deps.get(Api.files, None)` to safely retrieve files
API from dependencies
- Passed `files_api` to the `MilvusVectorIOAdapter` constructor
- Added `kvstore: KVStoreConfig | None = None` field to
`MilvusVectorIOConfig`
- Maintained backward compatibility, since both `files_api` and `kvstore`
can be `None`
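A minimal sketch of the two fixes, under stated assumptions: `Api`, `KVStoreConfig`, `MilvusVectorIOConfig` and `MilvusVectorIOAdapter` below are simplified, hypothetical stand-ins rather than the real llama-stack classes, and the real `get_adapter_impl` may be asynchronous and perform more setup. It only illustrates how `deps.get(Api.files, None)` tolerates a missing files provider and how an optional `kvstore` field keeps older configs loadable.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional


class Api(Enum):
    # Hypothetical stand-in for llama-stack's API enum.
    inference = "inference"
    files = "files"


@dataclass
class KVStoreConfig:
    # Simplified stand-in; field names mirror the YAML test config.
    type: str
    db_path: str
    namespace: Optional[str] = None


@dataclass
class MilvusVectorIOConfig:
    uri: str
    token: Optional[str] = None
    # Optional, so existing configs without a `kvstore` block still load.
    kvstore: Optional[KVStoreConfig] = None


class MilvusVectorIOAdapter:
    # Three constructor parameters, as described under "Root Cause".
    def __init__(self, config: MilvusVectorIOConfig, inference_api: Any, files_api: Any) -> None:
        self.config = config
        self.inference_api = inference_api
        self.files_api = files_api


def get_adapter_impl(config: MilvusVectorIOConfig, deps: Dict[Api, Any]) -> MilvusVectorIOAdapter:
    # .get(..., None) yields None instead of raising KeyError
    # when no files provider is configured.
    files_api = deps.get(Api.files, None)
    # Hypothetical: assumes the inference dependency is always present.
    return MilvusVectorIOAdapter(config, deps[Api.inference], files_api)
```

Because both new inputs default to `None`, a deployment that configures neither a files provider nor a `kvstore` still constructs the adapter.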

Closes #2626

## Test Plan
- [x] Tested with the Milvus configuration below; the server starts successfully
```yaml
vector_io:
  - provider_id: milvus
    provider_type: remote::milvus
    config:
      uri: http://localhost:19530
      token: root:Milvus
      kvstore:
        type: sqlite
        namespace: null
        db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/remote-vllm}/milvus_store.db
```
- [x] Vector operations work as expected
```python
from llama_stack_client import LlamaStackClient
from llama_stack_client.types.shared_params.document import Document as RAGDocument
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger as AgentEventLogger
import os


endpoint = os.getenv("LLAMA_STACK_ENDPOINT")
model = os.getenv("INFERENCE_MODEL")

# Initialize the client
client = LlamaStackClient(base_url=endpoint)

vector_db_id = "my_documents"

response = client.vector_dbs.register(
    vector_db_id=vector_db_id,
    embedding_model="all-MiniLM-L6-v2",
    embedding_dimension=384,
    provider_id="milvus",
)

urls = ["getting_started/Red_Hat_AI_Inference_Server-3.0-Getting_started-en-US.pdf", "vllm_server_arguments/Red_Hat_AI_Inference_Server-3.0-vLLM_server_arguments-en-US.pdf"]
documents = [
    RAGDocument(
        document_id=f"num-{i}",
        content=f"https://docs.redhat.com/en/documentation/red_hat_ai_inference_server/3.0/pdf/{url}",
        mime_type="application/pdf",
        metadata={},
    )
    for i, url in enumerate(urls)
]

client.tool_runtime.rag_tool.insert(
    documents=documents,
    vector_db_id=vector_db_id,
    chunk_size_in_tokens=512,
)

rag_agent = Agent(
    client,
    model=model,
    # Define instructions for the agent (system prompt)
    instructions="You are a helpful assistant",
    enable_session_persistence=False,
    # Define tools available to the agent
    tools=[
        {
            "name": "builtin::rag/knowledge_search",
            "args": {
                "vector_db_ids": [vector_db_id],
            },
        }
    ],
)

session_id = rag_agent.create_session("test-session")

user_prompts = [
    "How to start the AI Inference Server container image? use the knowledge_search tool to get information.",
]

for prompt in user_prompts:
    print(f"User> {prompt}")
    response = rag_agent.create_turn(
        messages=[{"role": "user", "content": prompt}],
        session_id=session_id,
    )
    for log in AgentEventLogger().log(response):
        log.print()
```    

server logs:
```
INFO     2025-07-04 22:18:30,385 __main__:577 server: Listening on ['::', '0.0.0.0']:5000                                                             
INFO:     Started server process [769725]
INFO:     Waiting for application startup.
INFO     2025-07-04 22:18:30,390 __main__:158 server: Starting up                                                                                     
INFO:     Application startup complete.
INFO:     Uvicorn running on http://['::', '0.0.0.0']:5000 (Press CTRL+C to quit)
INFO     2025-07-04 22:18:52,193 llama_stack.distribution.routing_tables.common:200 core: Setting owner for vector_db 'my_documents' to               
20:18:52.194 [START] /v1/vector-dbs
INFO:     192.168.1.249:64170 - "POST /v1/vector-dbs HTTP/1.1" 200 OK
20:18:52.216 [END] /v1/vector-dbs [StatusCode.OK] (21.89ms)
20:18:52.222 [START] /v1/tool-runtime/rag-tool/insert
INFO     2025-07-04 22:18:56,265 llama_stack.providers.utils.inference.embedding_mixin:102 uncategorized: Loading sentence transformer for            
         all-MiniLM-L6-v2...                                                                                                                          
WARNING  2025-07-04 22:18:59,214 opentelemetry.trace:537 uncategorized: Overriding of current TracerProvider is not allowed                           
INFO     2025-07-04 22:18:59,339 sentence_transformers.SentenceTransformer:219 uncategorized: Use pytorch device_name: cuda:0                         
INFO     2025-07-04 22:18:59,340 sentence_transformers.SentenceTransformer:227 uncategorized: Load pretrained SentenceTransformer: all-MiniLM-L6-v2   
INFO:     192.168.1.249:64170 - "POST /v1/tool-runtime/rag-tool/insert HTTP/1.1" 200 OK
INFO:     192.168.1.249:64170 - "POST /v1/agents HTTP/1.1" 200 OK
INFO:     192.168.1.249:64170 - "GET /v1/tools?toolgroup_id=builtin%3A%3Arag%2Fknowledge_search HTTP/1.1" 200 OK
INFO:     192.168.1.249:64170 - "POST /v1/agents/b1f6f063-1691-4780-8d9e-facd81708b91/session HTTP/1.1" 200 OK
20:19:01.834 [END] /v1/tool-runtime/rag-tool/insert [StatusCode.OK] (9612.06ms)
20:19:01.839 [START] /v1/agents
INFO:     192.168.1.249:64170 - "POST /v1/agents/b1f6f063-1691-4780-8d9e-facd81708b91/session/d2706302-bb54-421d-a890-5e25df9cb47f/turn HTTP/1.1" 200 OK
20:19:01.839 [END] /v1/agents [StatusCode.OK] (0.18ms)
20:19:01.844 [START] /v1/tools
INFO     2025-07-04 22:19:01,853 llama_stack.providers.remote.inference.vllm.vllm:330 uncategorized: Initializing vLLM client with                    
         base_url=http://192.168.1.183:8080/v1                                                                                                        
20:19:01.858 [END] /v1/tools [StatusCode.OK] (14.92ms)
20:19:01.868 [START] /v1/agents/{agent_id}/session
20:19:01.868 [END] /v1/agents/{agent_id}/session [StatusCode.OK] (0.37ms)
20:19:01.873 [START] /v1/agents/{agent_id}/session/{session_id}/turn
20:19:01.885 [START] inference
20:19:05.506 [END] inference [StatusCode.OK] (3621.19ms)
INFO     2025-07-04 22:19:05,537 llama_stack.providers.inline.agents.meta_reference.agent_instance:890 agents: executing tool call: knowledge_search  
         with args: {'query': 'How to start the AI Inference Server container image'}                                                                 
20:19:05.538 [START] tool_execution
20:19:05.928 [END] tool_execution [StatusCode.OK] (390.08ms)
 20:19:05.538 [INFO] executing tool call: knowledge_search with args: {'query': 'How to start the AI Inference Server container image'}
20:19:05.935 [START] inference
20:19:17.539 [END] inference [StatusCode.OK] (11603.76ms)
20:19:17.560 [END] /v1/agents/{agent_id}/session/{session_id}/turn [StatusCode.OK] (15686.62ms)
```
- [x] No regressions in functionality
- [x] Configuration properly accepts kvstore settings

---------

Co-authored-by: Peter Gustafsson <peter.gustafsson6@gmail.com>
Co-authored-by: raghotham <rsm@meta.com>
Co-authored-by: Francisco Arceo <farceo@redhat.com>
2025-07-09 10:08:14 +02:00
Mustafa Elbehery
2d3d9664a7
chore(api): add mypy coverage to prompts (#2657)
# What does this PR do?
This PR adds mypy static type coverage for `prompts` in `llama-stack`.

Part of https://github.com/meta-llama/llama-stack/issues/2647

## Test Plan

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-07-09 10:07:00 +02:00
ehhuang
daf660c4ea
feat(auth,ui): support github sign-in in the UI (#2545)
# What does this PR do?
Uses NextAuth to add GitHub sign-in support.

## Test Plan
Started the server with auth configured as in
https://github.com/meta-llama/llama-stack/pull/2509


https://github.com/user-attachments/assets/61ff7442-f601-4b39-8686-5d0afb3b45ac
2025-07-08 11:02:57 -07:00
ehhuang
c8bac888af
feat(auth): support github tokens (#2509)
# What does this PR do?

This PR adds GitHub OAuth authentication support to Llama Stack,
allowing users to authenticate using their GitHub credentials (#2508).

1. Support verifying GitHub access tokens
2. Support provider-specific auth error messages
3. Opportunistically reorganize the auth configs for better ergonomics
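A rough sketch of what verifying a GitHub access token can look like, assuming the check is done by calling GitHub's documented `GET https://api.github.com/user` endpoint with the token as a bearer credential. This is not llama-stack's implementation; `extract_bearer_token`, `build_github_user_request` and `verify_github_token` are hypothetical helpers, and any caching or mapping of the user to access attributes is left out.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Dict, Optional

GITHUB_USER_URL = "https://api.github.com/user"


def extract_bearer_token(authorization: Optional[str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' header value, or None."""
    if not authorization:
        return None
    scheme, _, token = authorization.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def build_github_user_request(token: str) -> urllib.request.Request:
    # GET /user returns the authenticated user when the token is valid.
    return urllib.request.Request(
        GITHUB_USER_URL,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


def verify_github_token(token: str, timeout: float = 10.0) -> Optional[Dict[str, Any]]:
    """Return the GitHub user JSON if the token is accepted, else None (makes a network call)."""
    try:
        with urllib.request.urlopen(build_github_user_request(token), timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 401:  # invalid, expired or revoked token
            return None
        raise
```

A server would call `extract_bearer_token` on the incoming `Authorization` header and reject the request with a 401 when it returns `None` or when `verify_github_token` does.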

## Test Plan
Added unit tests.

Also tested e2e manually:
```yaml
server:
  port: 8321
  auth:
    provider_config:
      type: github_token
```
```
~/projects/llama-stack/llama_stack/ui
❯ curl -v http://localhost:8321/v1/models
* Host localhost:8321 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:8321...
* Connected to localhost (::1) port 8321
> GET /v1/models HTTP/1.1
> Host: localhost:8321
> User-Agent: curl/8.7.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 401 Unauthorized
< date: Fri, 27 Jun 2025 21:51:25 GMT
< server: uvicorn
< content-type: application/json
< x-trace-id: 5390c6c0654086c55d87c86d7cbf2f6a
< Transfer-Encoding: chunked
<
* Connection #0 to host localhost left intact
{"error": {"message": "Authentication required. Please provide a valid GitHub access token (https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens) in the Authorization header (Bearer <token>)"}}
~/projects/llama-stack/llama_stack/ui
❯ ./scripts/unit-tests.sh


~/projects/llama-stack/llama_stack/ui
❯ curl "http://localhost:8321/v1/models" \
-H "Authorization: Bearer <token_obtained_from_github>"

{"data":[{"identifier":"accounts/fireworks/models/llama-guard-3-11b-vision","provider_resource_id":"accounts/fireworks/models/llama-guard-3-11b-vision","provider_id":"fireworks","type":"model","metadata":{},"model_type":"llm"},{"identifier":"accounts/fireworks/models/llama-guard-3-8b","provider_resource_id":"accounts/fireworks/models/llama-guard-3-8b","provider_id":"fireworks","type":"model","metadata":{},"model_type":"llm"},{"identifier":"accounts/fireworks/models/llama-v3p1-405b-instruct","provider_resource_id":"accounts/f
```

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-07-08 11:02:36 -07:00
Francisco Arceo
83c89265e0
chore: Adding unit tests for Milvus and OpenAI compatibility (#2640)
# What does this PR do?
- Enables unit tests for Milvus as a first step toward testing OpenAI
compatibility, and fixes a few bugs.
- Fixes an inconsistency in the Milvus config between remote and
inline.
- Adds pymilvus to the extras for testing in CI.

I'm going to refactor this later to include the other inline providers
so that we can catch issues sooner.

I have another PR where I've been testing to find other bugs in the
implementation (and required changes drafted here:
https://github.com/meta-llama/llama-stack/pull/2617).

## Test Plan

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-08 00:50:16 -07:00
Charlie Doern
27b3cd570f
fix: use --template flag for server (#2643)
# What does this PR do?

Currently, when a template is used, we still pass `--config`.

`server.py` has a dedicated `--template` flag and logic, so use that
instead.
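A small sketch of the idea, with loud assumptions: `server_args` and `make_parser` are hypothetical helpers, and modelling `--config` and `--template` as a required mutually exclusive pair is an illustration, not a description of the real `server.py`, which defines more options. It shows the caller choosing `--template` whenever a template name is available and falling back to `--config` otherwise.

```python
import argparse
from typing import List, Optional


def server_args(template: Optional[str] = None, config_path: Optional[str] = None) -> List[str]:
    """Hypothetical helper: prefer the dedicated --template flag when a template is used."""
    if template:
        return ["--template", template]
    if config_path:
        return ["--config", config_path]
    raise ValueError("either a template or a config path is required")


def make_parser() -> argparse.ArgumentParser:
    # Illustrative parser only; exactly one source of configuration is accepted.
    parser = argparse.ArgumentParser(prog="server")
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--config", help="path to a run.yaml file")
    source.add_argument("--template", help="name of a built-in template")
    return parser
```

For example, `make_parser().parse_args(server_args(template="starter"))` yields `template="starter"` with `config=None`, so the server can apply its template-specific logic.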

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-08 00:48:50 -07:00
ehhuang
e9926564bd
fix: authorized sql store with postgres (#2641)
# What does this PR do?
Postgres uses a different JSON extraction syntax than SQLite.

## Test Plan
Added an integration test.
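To illustrate the difference, here is a minimal sketch that emits a dialect-specific expression for reading one key out of a JSON column: `json_extract(col, '$.key')` for SQLite and the `->>` operator for PostgreSQL. The helper name `json_field_sql` and the column name used in the usage note are hypothetical, not the authorized SQL store's actual code; the PostgreSQL branch assumes a `json`/`jsonb` column, and the two forms can differ in edge cases such as the type returned for nested objects.

```python
import re

_IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def json_field_sql(dialect: str, column: str, key: str) -> str:
    """Return a SQL expression that extracts `key` from the JSON stored in `column`."""
    # Only plain identifiers are interpolated, to avoid SQL injection.
    if not (_IDENT_RE.fullmatch(column) and _IDENT_RE.fullmatch(key)):
        raise ValueError("column and key must be simple identifiers")
    if dialect == "sqlite":
        return f"json_extract({column}, '$.{key}')"
    if dialect == "postgresql":
        # Assumes a json/jsonb column; ->> returns the value as text.
        return f"{column}->>'{key}'"
    raise ValueError(f"unsupported dialect: {dialect}")
```

For a hypothetical `access_attributes` column, this gives `json_extract(access_attributes, '$.roles')` on SQLite and `access_attributes->>'roles'` on PostgreSQL, which is why a single hard-coded SQLite expression fails against Postgres.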
2025-07-07 19:36:34 -07:00
Ben Browning
5bb3817c49
fix: Restore the nvidia distro (#2639)
# What does this PR do?

The `nvidia` distro was previously collapsed into the `starter` distro.
However, the `nvidia` distro was set up specifically to use NVIDIA NeMo
microservices as providers for all APIs and not just inference, which
means it was doing quite a bit more than what the `starter` distro
covers today.

We should work with our friends at NVIDIA to determine the best place to
maintain this distro long-term, but for now this restores the `nvidia`
distro and its docs back to where they were so that things continue to
work for their users.

## Test Plan

I ensured the `nvidia` distro could build and run, at least to the point
of complaining that I didn't provide the necessary API keys.

```
uv run llama stack build --template nvidia --image-type venv
uv run llama stack run llama_stack/templates/nvidia/run.yaml
```

I also made sure the docs website built and looks reasonable, with the
`nvidia` distro docs at the same URL as before (because it has
incoming links from official NVIDIA NeMo docs, among other places).

```
uv run --group docs sphinx-autobuild docs/source docs/build/html --write-all
```

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-07-07 15:50:05 -07:00
Charlie Doern
d0ec5c3d3a
fix: print proper template path upon build (#2642)
# What does this PR do?

Rather than pointing to a directory in `llama_stack/templates` (the repo
directory), we should point to `$BUILD_DIR/IMAGE_NAME-run.yaml`
(`~/.llama/distributions/IMAGE_NAME/IMAGE_NAME-run.yaml`).

Currently we print:

```
You can find the newly-built template here: /Users/charliedoern/projects/Documents/llama-stack/llama_stack/templates/starter/run.yaml
You can run the new Llama Stack distro via: llama stack run /Users/charliedoern/projects/Documents/llama-stack/llama_stack/templates/starter/run.yaml --image-type venv
```

Instead, it should print something like:

```
You can find the newly-built template here: /Users/charliedoern/.llama/distributions/starter/starter-run.yaml
You can run the new Llama Stack distro via: llama stack run /Users/charliedoern/.llama/distributions/starter/starter-run.yaml --image-type venv
```
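The path rule above can be sketched as follows, assuming the default build directory is `~/.llama/distributions/IMAGE_NAME` as stated in this commit. `built_run_config_path` and `build_success_message` are hypothetical helpers for illustration, not the functions llama-stack uses; the message text is copied from the expected output above.

```python
from pathlib import Path
from typing import Optional


def built_run_config_path(image_name: str, build_dir: Optional[Path] = None) -> Path:
    """Return $BUILD_DIR/IMAGE_NAME-run.yaml for a built distribution."""
    if build_dir is None:
        # Default layout described above: ~/.llama/distributions/IMAGE_NAME
        build_dir = Path.home() / ".llama" / "distributions" / image_name
    return build_dir / f"{image_name}-run.yaml"


def build_success_message(image_name: str, build_dir: Optional[Path] = None) -> str:
    # Points users at the generated run config, not the repo's template directory.
    path = built_run_config_path(image_name, build_dir)
    return (
        f"You can find the newly-built template here: {path}\n"
        f"You can run the new Llama Stack distro via: llama stack run {path} --image-type venv"
    )
```

Deriving both lines from one path keeps the printed location and the suggested `llama stack run` command from drifting apart.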

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-07 15:39:39 -07:00
Wen Zhou
4bca4af3e4
refactor: set proper name for embedding all-minilm:l6-v2 and update to use "starter" in detailed_tutorial (#2627)
# What does this PR do?
- We are using `all-minilm:l6-v2`, but the model we download from Ollama
is `all-minilm:latest`:
  latest: https://ollama.com/library/all-minilm:latest 1b226e2802db
  l6-v2: https://ollama.com/library/all-minilm:l6-v2 pin 1b226e2802db
- Although they are currently exactly the same model, if
[all-minilm:l12-v2](https://ollama.com/library/all-minilm:l12-v2) is
updated, `latest` might no longer be the same as `l6-v2`.
- The core change in this PR is pinning the model ID in Ollama.
- Also updates `detailed_tutorial` to use `starter` in place of the
deprecated `ollama` template.
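A small sketch of why pinning matters, assuming the usual Ollama convention that a model reference without a tag resolves to the moving `latest` tag. `split_ollama_ref` and `is_pinned` are hypothetical helpers, not part of llama-stack or Ollama. Note that `l6-v2` is itself a tag, so this check only distinguishes an explicit tag from the default `latest`.

```python
from typing import Tuple


def split_ollama_ref(ref: str) -> Tuple[str, str]:
    """Split an Ollama model reference into (name, tag), defaulting the tag to 'latest'."""
    name, sep, tag = ref.rpartition(":")
    # No colon, or a colon that belongs to a registry host:port, means no tag.
    if not sep or "/" in tag:
        return ref, "latest"
    return name, tag


def is_pinned(ref: str) -> bool:
    """True when the reference names an explicit tag other than 'latest'."""
    return split_ollama_ref(ref)[1] != "latest"
```

With this rule, `all-minilm` and `all-minilm:latest` are both floating references, while `all-minilm:l6-v2` names the specific variant the config expects.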

## Test Plan
```
>INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
>llama stack build --run --template ollama --image-type venv
...
Build Successful!
You can find the newly-built template here: /home/wenzhou/zdtsw-forking/lls/llama-stack/llama_stack/templates/ollama/run.yaml
....
 - metadata:                                                                                                                                  
     embedding_dimension: 384                                                                                                                 
   model_id: all-MiniLM-L6-v2                                                                                                                 
   model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType                                                                 
   - embedding                                                                                                                                
   provider_id: ollama                                                                                                                        
   provider_model_id: all-minilm:l6-v2  
   ...
```
Test:
```
>llama-stack-client inference chat-completion --message "Write me a 2-sentence poem about the moon"
INFO:httpx:HTTP Request: GET http://localhost:8321/v1/models "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/chat/completions "HTTP/1.1 200 OK"
OpenAIChatCompletion(
    id='chatcmpl-04f99071-3da2-44ba-a19f-03b5b7fc70b7',
    choices=[
        OpenAIChatCompletionChoice(
            finish_reason='stop',
            index=0,
            message=OpenAIChatCompletionChoiceMessageOpenAIAssistantMessageParam(
                role='assistant',
                content="Here is a 2-sentence poem about the moon:\n\nSilver crescent in the midnight sky,\nLuna's gentle face, a beauty to the eye.",
                name=None,
                tool_calls=None,
                refusal=None,
                annotations=None,
                audio=None,
                function_call=None
            ),
            logprobs=None
        )
    ],
    created=1751644429,
    model='llama3.2:3b-instruct-fp16',
    object='chat.completion',
    service_tier=None,
    system_fingerprint='fp_ollama',
    usage={'completion_tokens': 33, 'prompt_tokens': 36, 'total_tokens': 69, 'completion_tokens_details': None, 'prompt_tokens_details': None}
)
```

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
2025-07-06 09:07:37 +05:30
dependabot[bot]
2faec38724
chore(deps): bump next from 15.3.2 to 15.3.3 in /llama_stack/ui (#2632)
Bumps [next](https://github.com/vercel/next.js) from 15.3.2 to 15.3.3.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/vercel/next.js/releases">next's
releases</a>.</em></p>
<blockquote>
<h2>v15.3.3</h2>
<blockquote>
<p>[!NOTE]<br />
This release is backporting bug fixes. It does <strong>not</strong>
include all pending features/changes on canary.</p>
</blockquote>
<h3>Core Changes</h3>
<ul>
<li>Reinstate <code>vary</code> (<a
href="https://redirect.github.com/vercel/next.js/issues/79939">#79939</a>)</li>
<li>fix(next-swc): Fix interestingness detection for React Compiler (<a
href="https://redirect.github.com/vercel/next.js/issues/79558">#79558</a>)</li>
<li>fix(next-swc): Fix react compiler usefulness detector (<a
href="https://redirect.github.com/vercel/next.js/issues/79480">#79480</a>)</li>
<li>fix(dev-overlay): Better handle edge-case file paths in launchEditor
(<a
href="https://redirect.github.com/vercel/next.js/issues/79526">#79526</a>)</li>
<li>Client router should discard stale prefetch entries for static pages
(<a
href="https://redirect.github.com/vercel/next.js/issues/79362">#79362</a>)</li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/gaojude"><code>@​gaojude</code></a>, <a
href="https://github.com/kdy1"><code>@​kdy1</code></a>, <a
href="https://github.com/bgw"><code>@​bgw</code></a>, and <a
href="https://github.com/unstubbable"><code>@​unstubbable</code></a> for
helping!</p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="3ab8db7383"><code>3ab8db7</code></a>
v15.3.3</li>
<li><a
href="18c8113ebd"><code>18c8113</code></a>
[backport] Reinstate <code>vary</code> (<a
href="https://redirect.github.com/vercel/next.js/issues/79939">#79939</a>)</li>
<li><a
href="e18212f546"><code>e18212f</code></a>
re-enable vary header deploy test (<a
href="https://redirect.github.com/vercel/next.js/issues/79753">#79753</a>)</li>
<li><a
href="ec202eccf0"><code>ec202ec</code></a>
Revert &quot;[next-server] skip setting vary header for basic
routes&quot; (<a
href="https://redirect.github.com/vercel/next.js/issues/79426">#79426</a>)</li>
<li><a
href="e2f264fdce"><code>e2f264f</code></a>
fix(next-swc): Fix interestingness detection for React Compiler (15.3)
(<a
href="https://redirect.github.com/vercel/next.js/issues/79558">#79558</a>)</li>
<li><a
href="562fac78da"><code>562fac7</code></a>
fix(next-swc): Fix react compiler usefulness detector (15.3) (<a
href="https://redirect.github.com/vercel/next.js/issues/79480">#79480</a>)</li>
<li><a
href="06097fd7bb"><code>06097fd</code></a>
fix(dev-overlay): Better handle edge-case file paths in launchEditor (<a
href="https://redirect.github.com/vercel/next.js/issues/79526">#79526</a>)</li>
<li><a
href="bda731fa96"><code>bda731f</code></a>
Client router should discard stale prefetch entries for static pages (<a
href="https://redirect.github.com/vercel/next.js/issues/79362">#79362</a>)</li>
<li>See full diff in <a
href="https://github.com/vercel/next.js/compare/v15.3.2...v15.3.3">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=next&package-manager=npm_and_yarn&previous-version=15.3.2&new-version=15.3.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/meta-llama/llama-stack/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-05 00:13:33 -04:00
Sébastien Han
ea966565f6
feat: improve telemetry (#2590)
# What does this PR do?

* Use a single env variable to set up the OTEL endpoint
* Update the telemetry provider doc
* Update the general telemetry doc with the metrics we generate
* Leave a script to set up telemetry for testing
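To illustrate how one variable can drive several exporters, here is a minimal sketch. It assumes the standard OpenTelemetry `OTEL_EXPORTER_OTLP_ENDPOINT` variable and the OTLP/HTTP convention of appending `v1/traces` and `v1/metrics` to the base URL. The helper `otlp_signal_endpoints` is hypothetical and is not the code from this PR.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def otlp_signal_endpoints(env: Mapping[str, str] | None = None) -> dict[str, str] | None:
    """Derive per-signal OTLP/HTTP URLs from one base endpoint variable.

    Returns None when OTEL_EXPORTER_OTLP_ENDPOINT is unset or empty, which a
    caller could treat as "telemetry export disabled".
    """
    source = os.environ if env is None else env
    base = source.get("OTEL_EXPORTER_OTLP_ENDPOINT", "")
    if not base:
        return None
    base = base.rstrip("/")
    # OTLP/HTTP convention: each signal's path is appended to the base URL.
    return {
        "traces": f"{base}/v1/traces",
        "metrics": f"{base}/v1/metrics",
    }
```

Calling `otlp_signal_endpoints({"OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4318"})` yields the traces and metrics URLs under that single base.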

Closes: https://github.com/meta-llama/llama-stack/issues/783

Note to reviewer: the `setup_telemetry.sh` script was useful to me and was
generated by AI. If we don't want it in the repo, I can delete it; I would
understand.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-04 17:29:09 +02:00
Derek Higgins
4eae0cbfa4
fix(starter): Add missing faiss provider to build.yaml vector_io section (#2625)
The starter template build.yaml was missing the inline::faiss provider
in the vector_io section, while it was properly configured in run.yaml
and starter.py's vector_io_providers list.
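The bug is a mismatch between the providers a template builds and the providers it runs. A minimal sketch of a consistency check follows; it assumes both files have already been flattened into `{api: [provider_type, ...]}` dictionaries, which is a simplification of the real build.yaml/run.yaml structure, and `missing_build_providers` is a hypothetical helper.

```python
from __future__ import annotations


def missing_build_providers(
    build_providers: dict[str, list[str]],
    run_providers: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Return, per API, provider types configured for run but absent from build."""
    missing: dict[str, list[str]] = {}
    for api, run_types in run_providers.items():
        built = set(build_providers.get(api, []))
        gaps = [provider_type for provider_type in run_types if provider_type not in built]
        if gaps:
            missing[api] = gaps
    return missing
```

With `build = {"vector_io": ["inline::sqlite-vec"]}` and `run = {"vector_io": ["inline::faiss", "inline::sqlite-vec"]}`, the check reports `{"vector_io": ["inline::faiss"]}`, which is the gap this commit fixes.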

Fixes: #2624

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-07-04 17:28:57 +02:00
Sébastien Han
df6ce8befa
fix: only load mcp when enabled in tool_group (#2621)
# What does this PR do?

The agent code is currently importing MCP modules even when MCP isn’t
enabled. Do we consider this worth fixing, or are we treating MCP as a
first-class dependency? I believe we should treat it as first-class.

If everyone agrees, let’s go ahead and close this.

Note: The current setup breaks if someone builds a distro without
including MCP in tool_group but still serves the agent API.
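One common way to avoid importing an optional dependency is to defer the import until the feature is actually enabled. The sketch below shows that pattern with the standard library's `importlib`; the helper `load_optional_module` and the idea of passing an "MCP enabled in tool_group" flag are hypothetical, not the agent's actual code.

```python
from __future__ import annotations

import importlib
from types import ModuleType


def load_optional_module(module_name: str, enabled: bool) -> ModuleType | None:
    """Import module_name only when its feature is enabled.

    Deferring the import lets a distro built without the optional
    dependency start normally, as long as the feature stays disabled.
    """
    if not enabled:
        return None
    return importlib.import_module(module_name)
```

The alternative proposed above, treating MCP as a first-class dependency, removes the need for this indirection at the cost of always installing it.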

Also, we should bump the MCP version to support streamable responses, as
SSE is being deprecated.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-04 20:27:05 +05:30
Sébastien Han
c4349f532b
feat: consolidate most distros into "starter" (#2516)
# What does this PR do?

* Removes a number of distros
* Folds the removed distros into the "starter" distribution
* Adds documentation for "starter"
* Partially reverts https://github.com/meta-llama/llama-stack/pull/2482
  since inference providers are disabled by default and can be turned on
  manually via env variables.
* Disables safety in the starter distro

Closes: https://github.com/meta-llama/llama-stack/issues/2502.

~Needs: https://github.com/meta-llama/llama-stack/pull/2482 for Ollama
to work properly in the CI.~

TODO:

- [ ] We can only update `install.sh` when we get a new release.
- [x] Update providers documentation
- [ ] Update notebooks to reference starter instead of ollama

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-04 15:58:03 +02:00
Derek Higgins
f77d4d91f5
fix: handle encoding errors when adding files to vector store (#2574)
- Add a try/except block around data.decode() to handle UnicodeDecodeError
- Implement a UTF-8 fallback when the detected encoding fails
- Return an empty string when both encodings fail
- Add unit tests
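The decoding strategy described above can be sketched as follows. The function name `decode_with_fallback` is hypothetical, and catching `LookupError` (raised for an unknown encoding name from a detector) is an extra safeguard not mentioned in the commit.

```python
from __future__ import annotations


def decode_with_fallback(data: bytes, detected_encoding: str | None) -> str:
    """Decode bytes with the detected encoding, then UTF-8, else return ""."""
    for encoding in (detected_encoding, "utf-8"):
        if not encoding:
            continue
        try:
            return data.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # LookupError covers an encoding name Python does not recognise.
            continue
    # Both attempts failed: return an empty string instead of raising.
    return ""
```

Returning `""` keeps a single bad file from aborting the whole upload, at the cost of silently indexing no text for it.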

Fixes #2572: UnicodeDecodeError when uploading files with problematic
encodings

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-07-04 12:10:18 +02:00
Matthew Farrellee
ef26259209
feat: add llama guard 4 model (#2579)
Add support for the Llama Guard 4 model to the llama_guard safety provider.

Test with:

0. `NVIDIA_API_KEY=... llama stack build --image-type conda --image-name env-nvidia --providers inference=remote::nvidia,safety=inline::llama-guard --run`
1. `llama-stack-client models register meta-llama/Llama-Guard-4-12B --provider-model-id meta/llama-guard-4-12b`
2. `pytest tests/integration/safety/test_llama_guard.py`

Co-authored-by: raghotham <rsm@meta.com>
2025-07-03 22:29:04 -07:00
Derek Higgins
0422b4fc63
fix: CI flakiness in vector IO tests by pinning pymilvus>=2.4.10 (#2610)
The CI failures occurred when marshmallow 4.0.0 was installed (it removed
`__version_info__`).

By pinning pymilvus to >=2.4.10, we ensure marshmallow doesn't get
installed.

Also set the dependency in InlineProviderSpec, as this is the one that
takes effect when using the "inline::milvus" provider.

Fixes https://github.com/meta-llama/llama-stack/issues/2588

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-07-04 10:27:23 +05:30
Francisco Arceo
ea80ea63ac
chore: Updating chunk id generation to ensure uniqueness (#2618)
# What does this PR do?
This handles an edge case in `generate_chunk_id` where the combination of
`document_id` and `chunk_text` is not unique. Adding the window location
ensures uniqueness.
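A minimal sketch of the idea, assuming the ID is a UUID derived from an MD5 fingerprint of the key. The `chunk_window` parameter and its string format (for example `"0-512"`) are hypothetical; the real function's signature may differ.

```python
from __future__ import annotations

import hashlib
import uuid


def generate_chunk_id(document_id: str, chunk_text: str, chunk_window: str | None = None) -> str:
    """Deterministic chunk ID; the optional window disambiguates repeated text."""
    key = f"{document_id}:{chunk_text}"
    if chunk_window:
        key += f":{chunk_window}"
    # MD5 serves only as a non-cryptographic fingerprint here.
    digest = hashlib.md5(key.encode("utf-8"), usedforsecurity=False).hexdigest()
    return str(uuid.UUID(digest))
```

Two identical passages in one document now map to different IDs as long as they sit at different positions, while re-ingesting the same chunk still yields the same ID.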

## Test Plan
Added unit test

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-04 10:26:35 +05:30
Francisco Arceo
4afd619c56
chore: Add support for vector-stores files api for Milvus (#2582)
# What does this PR do?
### Summary

This pull request implements support for the OpenAI Vector Store Files
API for the Milvus vector store provider in `llama_stack`. It enables
storing, loading, updating, and deleting file metadata and file contents
in Milvus collections, allowing OpenAI vector store files to be managed
directly within Milvus.

### Main Changes

- **Milvus Vector Store Files API Implementation**
  - Implements all required methods for storing, loading, updating, and
    deleting vector store file metadata and contents
    (`_save_openai_vector_store_file`, `_load_openai_vector_store_file`,
    `_load_openai_vector_store_file_contents`,
    `_update_openai_vector_store_file`,
    `_delete_openai_vector_store_file_from_storage`).
  - Uses two Milvus collections: `openai_vector_store_files` (for
    metadata) and `openai_vector_store_files_contents` (for chunked file
    contents).
  - Collections are created dynamically if they do not exist, with
    appropriate schema definitions.
- **Collection Name Sanitization**
  - Adds a `sanitize_collection_name` utility to ensure Milvus collection
    names only contain valid characters (letters, numbers, underscores).
- **Testing**
  - Updates test skip logic to include `"inline::milvus"` for cases where
    the OpenAI Vector Store Files API is not supported, improving
    integration test accuracy.
- **Other Improvements**
  - Passes `kvstore` to `MilvusIndex` for consistency.
  - Removes obsolete NotImplementedErrors and legacy code for file
    storage.
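A minimal sketch of a name sanitizer matching the description above (letters, numbers, and underscores only). Whether invalid characters are dropped or replaced is an assumption; this sketch drops them.

```python
import re


def sanitize_collection_name(name: str) -> str:
    """Remove every character that is not an ASCII letter, digit, or underscore."""
    return re.sub(r"[^a-zA-Z0-9_]", "", name)
```

For example, a vector store ID such as `vs_1234-abcd` becomes `vs_1234abcd`. Note that stripping (like replacing with `_`) can map distinct inputs such as `a-b` and `ab` to the same collection name.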

## Test Plan
Tested via CI and a test script.

## Notes
- `VectorDB` currently uses the `name` as the `identifier` in
  `openai_create_vector_store`. We need to add `name` as a field to
  `VectorDB` and generate the `identifier` upon creation. OpenAI is not
  idempotent with respect to the `name` field (i.e., you can pass the same
  name multiple times and OpenAI will generate a new identifier each time).
  I'll add a follow-up PR for this.
- The `Files` API needs to use `files-` as a prefix in the identifier. I
  have updated the Vector Store to use the OpenAI prefix `vs_*`.
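The follow-up described in the first note, keeping `name` as a plain field and generating the `identifier` on creation, could look roughly like this. `VectorStoreRecord` and `generate_vector_store_id` are hypothetical names used only for illustration; the actual follow-up PR may be structured differently.

```python
import uuid
from dataclasses import dataclass, field


def generate_vector_store_id() -> str:
    """Return a fresh identifier with the OpenAI-style "vs_" prefix."""
    return f"vs_{uuid.uuid4()}"


@dataclass
class VectorStoreRecord:
    name: str  # user-supplied; may repeat across stores
    identifier: str = field(default_factory=generate_vector_store_id)
```

Creating two records with the same `name` yields two distinct `vs_...` identifiers, matching OpenAI's non-idempotent behaviour.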

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-03 12:15:33 -07:00
Akram Ben Aissi
f4950f4ef0
fix: AccessDeniedError leads to HTTP 500 instead of error 403 (#2595)
Resolves access control error visibility issues where 500 errors were
returned instead of proper 403 responses with actionable error messages.

Enhances AccessDeniedError with detailed context and improves exception
handling:

• Enhanced the AccessDeniedError class to include user, action, and
  resource context
  - Added constructor parameters for action, resource, and user
  - Generate detailed error messages showing the user principal,
    attributes, and attempted resource
  - Backward compatible with existing usage (falls back to a generic
    message)

• Updated exception handling in server.py
  - Import AccessDeniedError from the access_control module
  - Return proper 403 status codes with detailed error messages
  - Separate handling for PermissionError (generic) vs AccessDeniedError
    (detailed)

• Enhanced error context at raise sites
  - Updated routing_tables/common.py to pass action, resource, and user
    context
  - Updated agents persistence to include context in access denied errors
  - Provides better debugging information for access control issues

• Added comprehensive unit tests
  - Created tests/unit/server/test_server.py with 13 test cases
  - Covers AccessDeniedError with and without context
  - Tests all exception types (ValidationError, BadRequestError,
    AuthenticationRequiredError, etc.)
  - Validates proper HTTP status codes and error message formats
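A simplified sketch of the pattern above: an exception that optionally carries context, plus a mapping to HTTP status codes. The base class, message wording, and the use of a plain string for `user` (the real change reports the user principal and attributes) are assumptions, and `to_http_error` is a hypothetical helper rather than the server's handler.

```python
from __future__ import annotations


class AccessDeniedError(Exception):
    """Access-control failure that may carry the action, resource, and user."""

    def __init__(self, action: str | None = None, resource: str | None = None, user: str | None = None) -> None:
        self.action, self.resource, self.user = action, resource, user
        if action and resource and user:
            message = f"User '{user}' cannot perform action '{action}' on resource '{resource}'"
        else:
            # Backward-compatible generic message when no context is supplied.
            message = "Insufficient permissions"
        super().__init__(message)


def to_http_error(exc: Exception) -> tuple[int, str]:
    """Map an exception to (status code, response detail)."""
    if isinstance(exc, AccessDeniedError):
        return 403, str(exc)  # detailed, context-aware message
    if isinstance(exc, PermissionError):
        return 403, "Permission denied"  # generic message
    return 500, "Internal server error"
```

Checking `AccessDeniedError` before the generic `PermissionError` branch is what lets the detailed message reach the client instead of a 500.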


## Test Plan

```yaml
server:
  port: 8321
  access_policy:
    - permit:
        principal: admin
        actions: [create, read, delete]
        when: user with admin in groups
    - permit:
        actions: [read]
        when: user with system:authenticated in roles
```
Then:

```bash
curl --request POST --url http://localhost:8321/v1/vector-dbs \
  --header "Authorization: Bearer your-bearer" \
  --header "Content-Type: application/json" \
  --data '{
    "vector_db_id": "my_demo_vector_db",
    "embedding_model": "ibm-granite/granite-embedding-125m-english",
    "embedding_dimension": 768,
    "provider_id": "milvus"
  }'
```

Depending on whether the user is in the `admin` group, the request either
succeeds or fails with `AccessDeniedError`. Before this PR, the failure led
to an HTTP 500 error and a `Traceback` in the logs. After this PR, the logs
show a simpler error (unless DEBUG logging is enabled) and a 403 Forbidden
error is returned over HTTP.

---------

Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com>
2025-07-03 10:50:49 -07:00
ehhuang
3c43a2f529
fix: store configs (#2593)
# What does this PR do?
https://github.com/meta-llama/llama-stack/pull/2490 broke postgres_demo,
as the config expected a str but the value was converted to an int.

This PR:
1. Updates the type of `port` in sqlstore to `int`.
2. Uses `dict` instead of `StackRunConfig` in template generation to
avoid failing pydantic type checks.
3. Adds `replace_env_vars` to the StackRunConfig instantiation in
`configure.py` (not sure why this wasn't needed before).

## Test Plan
`llama stack build --template postgres_demo --image-type conda --run`
2025-07-03 10:07:23 -07:00
Sébastien Han
aa273944fd
fix: add mcp dependency to agent provider (#2587)
# What does this PR do?

The agent depends on utils.tools.mcp.

Closes: https://github.com/meta-llama/llama-stack/issues/2576

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-03 14:59:01 +02:00
Nate Harada
5b07755556
docs: Minor spelling fix (#2592)
# What does this PR do?
Minor spelling fix in the comments

## Test Plan
No code changes
2025-07-02 20:26:51 -04:00
Jorge
4d0d2d685f
fix: Set parameter usedforsecurity=False when calling hashlib.md5 in order to fix rag_tool.insert on FIPS clusters (#2577)
# What does this PR do?
Set parameter `usedforsecurity=False` when calling hashlib.md5 in order
to fix rag_tool.insert on FIPS clusters
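A minimal sketch of the fix's pattern. On OpenSSL builds that enforce FIPS, a plain `hashlib.md5(data)` call can be rejected; since Python 3.9, passing `usedforsecurity=False` declares a non-security use, which restricted environments allow. The helper name `content_fingerprint` is hypothetical.

```python
import hashlib


def content_fingerprint(data: bytes) -> str:
    """Hex MD5 digest used only as a fingerprint (deduplication, IDs), not for security."""
    return hashlib.md5(data, usedforsecurity=False).hexdigest()
```

The digest values are unchanged by the flag, so existing IDs derived from MD5 stay stable across FIPS and non-FIPS hosts.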

<!-- If resolving an issue, uncomment and update the line below -->
Closes #2571

---------

Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>
2025-07-02 12:07:05 +02:00
Sébastien Han
25268854bc
fix: allow default empty vars for conditionals (#2570)
# What does this PR do?

We were not using conditionals correctly: conditionals can only be used
when the env variable is set, so `${env.ENVIRONMENT:+}` would return
None if ENVIRONMENT is not set.

If you want an optional value, you need to use `${env.ENVIRONMENT:=}`:
this picks the value of ENVIRONMENT if it is set, and otherwise returns
None.
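The difference between the two operators can be sketched with a tiny resolver for a single placeholder. This is a simplified, hypothetical illustration of the semantics described above, not llama-stack's `replace_env_vars`; treating an empty variable as unset is an assumption borrowed from shell `:` operators.

```python
from __future__ import annotations

import re

_PLACEHOLDER = re.compile(r"^\$\{env\.([A-Za-z_][A-Za-z0-9_]*)(:=|:\+)(.*)\}$")


def resolve_env_placeholder(expr: str, env: dict[str, str]) -> str | None:
    """Resolve one ${env.NAME:=default} or ${env.NAME:+value} placeholder.

    :=  -> NAME's value if set, else the default (an empty default gives None)
    :+  -> the given value if NAME is set, else None
    """
    match = _PLACEHOLDER.match(expr)
    if match is None:
        raise ValueError(f"unsupported placeholder: {expr!r}")
    name, operator, operand = match.groups()
    is_set = bool(env.get(name))  # empty counts as unset (assumption)
    if operator == ":=":
        return env[name] if is_set else (operand or None)
    # ":+" operator: only yields a value when NAME is set.
    return (operand or None) if is_set else None
```

With an empty environment, `${env.ENVIRONMENT:+}` and `${env.ENVIRONMENT:=}` both give None, but once `ENVIRONMENT=prod` is set only `:=` returns `"prod"`; `:+` with an empty operand never yields the variable's value, which is why it was the wrong choice.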

Closes: https://github.com/meta-llama/llama-stack/issues/2564

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-01 14:42:05 +02:00
Francisco Arceo
0066135944
chore: Enabling VectorIO Integration tests for Milvus (#2546)
2025-06-30 19:49:59 -07:00
Francisco Arceo
5785ccda35
fix: Fixing Milvus sample config and updating documentation (#2568)
2025-06-30 19:25:23 -07:00
Matthew Farrellee
13aa367c8a
fix: default api_key from env must be a SecretStr (#2565)
# What does this PR do?

Fixes the api_key type when it is read from the environment.

## Test Plan

Run the nvidia template without `api_key` in run.yaml and perform inference.

Before this change, inference fails with:

```
  File ".../llama-stack/llama_stack/providers/remote/inference/nvidia/nvidia.py", line 118, in _get_client_for_base_url
    api_key=(self._config.api_key.get_secret_value() if self._config.api_key else "NO KEY"),
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'str' object has no attribute 'get_secret_value'
```
2025-06-30 18:08:44 -07:00
Ashwin Bharambe
b333a3c03a
fix(ollama): Download remote image URLs for Ollama (#2551)
## What does this PR do?

Ollama does not support remote images. Only local file paths OR base64
inputs are supported. This PR ensures that the Stack downloads remote
images and passes the base64 down to the inference engine.
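A minimal sketch of the idea using only the standard library: fetch the bytes behind an image URL and base64-encode them so a provider that only accepts base64 input can consume them. The function name `image_url_to_base64` is hypothetical, and the Stack's real implementation may use a different (for example, asynchronous) HTTP client.

```python
import base64
import urllib.request


def image_url_to_base64(url: str, timeout: float = 10.0) -> str:
    """Download the image at `url` and return its contents as a base64 string."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()
    # Ollama accepts base64 image payloads, so encode the raw bytes as ASCII text.
    return base64.b64encode(raw).decode("ascii")
```

A `data:` URL goes through the same `urlopen` code path without network access, which makes this easy to unit test.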

## Test Plan

Added a test case for Responses and ran it against both the `fireworks` and
`ollama` providers.
2025-06-30 20:36:11 +05:30
Sébastien Han
c9a49a80e8
docs: auto generated documentation for providers (#2543)
# What does this PR do?

A simple approach to get some provider pages into the docs.

Add or update description fields in the provider configuration class
using Pydantic’s Field, ensuring these descriptions are clear and
complete, as they will be used to auto-generate provider documentation
via ./scripts/distro_codegen.py instead of editing the docs manually.
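As an illustration of how `Field` descriptions can drive generated docs, here is a minimal sketch assuming pydantic v2 (`model_fields`, `FieldInfo.is_required()`). `ExampleProviderConfig` and `render_config_table` are hypothetical and do not reproduce what `./scripts/distro_codegen.py` actually emits.

```python
from typing import Optional

from pydantic import BaseModel, Field


class ExampleProviderConfig(BaseModel):
    """Hypothetical provider config used only for illustration."""

    url: str = Field(default="http://localhost:8000", description="Base URL of the inference server.")
    api_key: Optional[str] = Field(default=None, description="API key used to authenticate requests.")


def render_config_table(config_cls: type[BaseModel]) -> str:
    """Render one Markdown table row per config field, using its Field description."""
    rows = [
        "| Field | Type | Required | Default | Description |",
        "|-------|------|----------|---------|-------------|",
    ]
    for name, info in config_cls.model_fields.items():
        annotation = info.annotation
        # Plain classes have a readable __name__; typing constructs fall back to str().
        type_name = annotation.__name__ if isinstance(annotation, type) else str(annotation)
        required = "Yes" if info.is_required() else "No"
        default = "" if info.is_required() else repr(info.default)
        rows.append(f"| `{name}` | `{type_name}` | {required} | {default} | {info.description or ''} |")
    return "\n".join(rows)
```

Because the table is derived from the config class, a missing or vague `description` shows up directly in the generated page.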

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-30 15:13:20 +02:00
Sébastien Han
8d8e90d78e
fix: add missing argument and methods (#2550)
# What does this PR do?

Resolves:

```
mypy.....................................................................Failed
- hook id: mypy
- exit code: 1

llama_stack/providers/utils/responses/responses_store.py:119: error: Missing positional argument "policy" in call to "fetch_one" of "AuthorizedSqlStore"  [call-arg]
llama_stack/providers/utils/responses/responses_store.py:122: error: "AuthorizedSqlStore" has no attribute "delete"  [attr-defined]
Found 2 errors in 1 file (checked 403 source files)
```

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-30 14:55:37 +02:00
Krzysztof Malczuk
be9bf68246
feat: Add webmethod for deleting openai responses (#2160)
# What does this PR do?
This PR creates a webmethod for deleting OpenAI responses, adds an
implementation for it, and adds an integration test for the OpenAI
delete response method.

Closes #2077

## Test Plan
Ran the standard tests, the pre-commit hooks, and the unit tests.

For this PR, I modeled the routes and implementation on the existing get
and create methods. The unit tests could not cover this change because the
mock interface in use does not allow effective CRUD testing, so I instead
added an integration test matching the existing ones in
`test_openai_responses`.
2025-06-30 11:28:02 +02:00
github-actions[bot]
709eb7da33 build: Bump version to 0.2.13 2025-06-27 23:56:14 +00:00
Francisco Arceo
cc19b56c87
chore: OpenAI compatibility for Milvus (#2470)
# What does this PR do?
Closes https://github.com/meta-llama/llama-stack/issues/2461

## Test Plan
Tested with the `ollama` distribution template, with the `vector_io`
provider updated to:
```yaml
vector_io:
- provider_id: milvus
  provider_type: inline::milvus
  config:
    db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/ollama}/milvus_store.db
    kvstore:
      type: sqlite
      db_name: milvus_registry.db
```

Ran the stack:
```bash
llama stack run ./llama_stack/templates/ollama/run.yaml --image-type venv --env OLLAMA_URL="http://0.0.0.0:11434"
```

Ran the tests:
```bash
pytest -sv --stack-config=http://localhost:8321 tests/integration/vector_io/test_openai_vector_stores.py  --embedding-model all-MiniLM-L6-v2
```
The tests passed.

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-06-27 16:00:36 -07:00
Charlie Doern
65b4fae51d
fix: proper checkpointing logic for HF trainer (#2429)
# What does this PR do?

Currently, only the last saved model is reported as a checkpoint and
associated with the job UUID. Since the HF trainer handles checkpoint
collection during training, we need to add all of the `checkpoint-*`
folders as Checkpoint objects. This PR also changes the save strategy to
per-epoch, which makes this easier and uses less storage.
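A minimal sketch of collecting every `checkpoint-*` folder the HF trainer writes, rather than only the last one, assuming the conventional `checkpoint-<step>` naming. `CheckpointInfo`, its fields, and the identifier format are hypothetical stand-ins for the stack's `Checkpoint` object.

```python
import re
from dataclasses import dataclass
from pathlib import Path

# Matches HF-style checkpoint folder names such as "checkpoint-500".
_CHECKPOINT_DIR = re.compile(r"checkpoint-(\d+)")


@dataclass
class CheckpointInfo:
    """Hypothetical stand-in for the stack's Checkpoint object."""

    identifier: str
    path: Path
    step: int


def collect_checkpoints(output_dir: Path, job_uuid: str) -> list[CheckpointInfo]:
    """Return one entry per checkpoint-<step> directory under output_dir."""
    checkpoints = []
    for child in output_dir.iterdir():
        match = _CHECKPOINT_DIR.fullmatch(child.name)
        if child.is_dir() and match:
            step = int(match.group(1))
            checkpoints.append(CheckpointInfo(identifier=f"{job_uuid}-{step}", path=child, step=step))
    # Sort numerically so checkpoint-10 comes after checkpoint-9.
    return sorted(checkpoints, key=lambda c: c.step)
```

Sorting by the parsed step number, not the folder name, keeps the list in training order regardless of digit count.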

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-06-27 17:36:25 -04:00
Ramakrishna Reddy Yekulla
03e61e3fcc
fix: ValueError in faiss vector database serialization (resolves #2519) (#2526)
The error message was misleading: it looked like an Ollama connectivity
issue, but the error actually occurred during faiss vector database
initialization.

## 🔍 Root Cause Analysis

The issue was in the faiss vector database serialization logic in
`llama_stack/providers/inline/vector_io/faiss/faiss.py`:

1. **Saving**: `faiss.serialize_index()` returns binary data (uint8
numpy array)
2. **Bug**: Code incorrectly used `np.savetxt()` which converts binary
to text with scientific notation (e.g., `7.300000000000000000e+01`)
3. **Loading**: `np.loadtxt(buffer, dtype=np.uint8)` failed to parse
scientific notation back to uint8
4. **Result**: The server crashed during initialization, before reaching
the Ollama connectivity check

## Solution

Replaced text-based serialization with proper binary serialization.

**After (fixed):**
```python
# Saving - proper binary format
np.save(buffer, np_index, allow_pickle=False)

# Loading - proper binary format
self.index = faiss.deserialize_index(np.load(buffer, allow_pickle=False))
```
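The fix can be checked in isolation with a minimal round-trip sketch that assumes only NumPy: a `uint8` array stands in for the output of `faiss.serialize_index()`, and the helper names are hypothetical. The text path fails because `np.savetxt` defaults to `fmt='%.18e'`, which is why a byte such as 73 was written as `7.300000000000000000e+01`.

```python
import io

import numpy as np


def index_bytes_to_blob(index_bytes: np.ndarray) -> bytes:
    """Serialize a uint8 array (e.g. faiss.serialize_index output) in NumPy's binary .npy format."""
    buffer = io.BytesIO()
    np.save(buffer, index_bytes, allow_pickle=False)
    return buffer.getvalue()


def blob_to_index_bytes(blob: bytes) -> np.ndarray:
    """Read the .npy payload back; a fresh BytesIO starts at offset 0, so no seek() is needed."""
    return np.load(io.BytesIO(blob), allow_pickle=False)
```

With faiss installed, the restored array would then be passed to `faiss.deserialize_index(...)`, as in the fixed code.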

## 🧪 Testing

- Binary serialization/deserialization works correctly
- Backward compatible with existing functionality
- No security concerns (allow_pickle=False maintained)
- Resolves the specific ValueError mentioned in the issue

## 📊 Impact

This fix resolves:
- ValueError during server startup with Ollama templates

## 🔗 Related Issues

- Closes #2519 
- Affects all users of Ollama template and faiss vector_io configurations

## 📝 Files Changed

- `llama_stack/providers/inline/vector_io/faiss/faiss.py` - Fixed serialization methods in `initialize()` and `_save_index()`

---------

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Ben Browning <bbrownin@redhat.com>
2025-06-27 14:34:52 -04:00
Rohan Awhad
7cb5d3c60f
chore: standardize unsupported model error #2517 (#2518)
# What does this PR do?

- `llama_stack/exceptions.py`: added an `UnsupportedModelError` class
- remote inference `ollama.py` and `utils/inference/model_registry.py`:
replaced `ValueError` with `UnsupportedModelError`
- `utils/inference/litellm_openai_mixin.py`: removed the `register_model`
implementation from the `LiteLLMOpenAIMixin` class, which now uses the
parent class `ModelRegistryHelper`'s implementation

Closes #2517


## Test Plan


1. Create a new `test_run_openai.yaml` and paste the following config in
it:

```yaml
version: '2'
image_name: test-image
apis:
- inference
providers:
  inference:
  - provider_id: openai
    provider_type: remote::openai
    config:
      max_tokens: 8192
models:
- metadata: {}
  model_id: "non-existent-model"
  provider_id: openai
  model_type: llm
server:
  port: 8321
```

And run the server with:
```bash
uv run llama stack run test_run_openai.yaml
```

You should now get a `llama_stack.exceptions.UnsupportedModelError` with
the supported list of models in the error message.
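A minimal sketch of such an error type and a registration-time check. The constructor signature, the message format, and subclassing `ValueError` (so existing `except ValueError` handlers keep working) are assumptions; the actual class in `llama_stack/exceptions.py` may differ. `check_model` is a hypothetical helper, and the model names in the usage are illustrative only.

```python
class UnsupportedModelError(ValueError):
    """Sketch: raised when a provider cannot serve the requested model."""

    def __init__(self, model_name: str, supported_models: list[str]) -> None:
        self.model_name = model_name
        self.supported_models = supported_models
        super().__init__(
            f"'{model_name}' model is not supported. Supported models are: {', '.join(supported_models)}"
        )


def check_model(model_name: str, supported_models: list[str]) -> str:
    """Hypothetical helper: return the model name, or raise if the provider does not list it."""
    if model_name not in supported_models:
        raise UnsupportedModelError(model_name, supported_models)
    return model_name
```

Listing the supported models in the message lets the user fix the `model_id` in run.yaml without reading provider code.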

---

Tested for the following remote inference providers, and they all raise
the `UnsupportedModelError`:
- Anthropic
- Cerebras
- Fireworks
- Gemini
- Groq
- Ollama
- OpenAI
- SambaNova
- Together
- Watsonx

---------

Co-authored-by: Rohan Awhad <rawhad@redhat.com>
2025-06-27 14:26:58 -04:00
Juanma
e7eb9f9adc
fix: dataset metadata without provider_id (#2527)
# What does this PR do?
Fixes an error when inferring dataset provider_id with metadata

Closes [#2506](https://github.com/meta-llama/llama-stack/issues/2506)

Signed-off-by: Juanma Barea <juanmabareamartinez@gmail.com>
2025-06-27 08:51:29 -04:00
Wen Zhou
8c3f2762fb
build: update temp. created Containerfile (#2492)
# What does this PR do?
- Create the /.llama/providers.d folder only if external_providers_dir
is set
- Do not create the /.cache folder, which is not used anywhere
- Combine chmod and copy into one command



## Test Plan
updated test:

```
export CONTAINER_BINARY=podman
LLAMA_STACK_DIR=. uv run llama stack build --template remote-vllm --image-type container --image-name  <name>
```
log:
```
Containerfile created successfully in /tmp/tmp.rPMunE39Aw/Containerfile

FROM python:3.11-slim
WORKDIR /app

RUN apt-get update && apt-get install -y        iputils-ping net-tools iproute2 dnsutils telnet        curl wget telnet git       procps psmisc lsof        traceroute        bubblewrap        gcc        && rm -rf /var/lib/apt/lists/*

ENV UV_SYSTEM_PYTHON=1
RUN pip install uv
RUN uv pip install --no-cache sentencepiece pillow pypdf transformers pythainlp faiss-cpu opentelemetry-sdk requests datasets chardet scipy nltk numpy matplotlib psycopg2-binary aiosqlite langdetect autoevals tree_sitter tqdm pandas chromadb-client opentelemetry-exporter-otlp-proto-http redis scikit-learn openai pymongo emoji sqlalchemy[asyncio] mcp aiosqlite fastapi fire httpx uvicorn opentelemetry-sdk opentelemetry-exporter-otlp-proto-http
RUN uv pip install --no-cache sentence-transformers --no-deps
RUN uv pip install --no-cache torch torchvision --index-url https://download.pytorch.org/whl/cpu
# Allows running as non-root user
RUN mkdir -p /.llama/providers.d /.cache
RUN uv pip install --no-cache llama-stack
RUN pip uninstall -y uv
ENTRYPOINT ["python", "-m", "llama_stack.distribution.server.server", "--template", "remote-vllm"]

RUN chmod -R g+rw /app /.llama /.cache

PWD: /tmp/llama-stack
Containerfile: /tmp/tmp.rPMunE39Aw/Containerfile
+ podman build --progress=plain --security-opt label=disable --platform linux/amd64 -t distribution-remote-vllm:0.2.12 -f /tmp/tmp.rPMunE39Aw/Containerfile /tmp/llama-stack
....
Success!
Build Successful!
You can find the newly-built template here: /tmp/llama-stack/llama_stack/templates/remote-vllm/run.yaml
You can run the new Llama Stack distro via: llama stack run /tmp/llama-stack/llama_stack/templates/remote-vllm/run.yaml --image-type container
```

```
podman tag localhost/distribution-remote-vllm:dev quay.io/wenzhou/distribution-remote-vllm:2492_2
podman push quay.io/wenzhou/distribution-remote-vllm:2492_2



docker run --rm -p 8321:8321 -e INFERENCE_MODEL="meta-llama/Llama-2-7b-chat-hf" -e VLLM_URL="http://localhost:8000/v1" quay.io/wenzhou/distribution-remote-vllm:2492_2 --port 8321

INFO     2025-06-26 13:47:31,813 __main__:436 server: Using template remote-vllm config file:                                                         
         /app/llama-stack-source/llama_stack/templates/remote-vllm/run.yaml                                                                           
INFO     2025-06-26 13:47:31,818 __main__:438 server: Run configuration:                                                                              
INFO     2025-06-26 13:47:31,826 __main__:440 server: apis:                                                                                           
         - agents                                                                                                                                     
         - datasetio                                                                                                                                  
         - eval                                                                                                                                       
         - inference                                                                                                                                  
         - safety                                                                                                                                     
         - scoring                                                                                                                                    
         - telemetry                                                                                                                                  
         - tool_runtime                                                                                                                               
         - vector_io                                                                                                                                  
         benchmarks: []                                                                                                                               
         container_image: null                                                                                                                        
....                                                                                                 
```
-----
Previous test: a local run of
`llama stack build --template remote-vllm --image-type container`,
with the image stored in `quay.io/wenzhou/distribution-remote-vllm:2492`.

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
2025-06-27 10:23:12 +02:00
Ben Browning
0883944bc3
fix: Some missed env variable changes from PR 2490 (#2538)
# What does this PR do?

Some templates were still using the old environment variable substitution
syntax instead of the new one and were not getting substituted properly.

Also, some places didn't handle the new `None` values (previously empty
strings, `""`) that come from the conditional environment variable
substitution.

This gets the starter and remote-vllm distributions starting again. I
tested various permutations of the starter: chroma and pgvector needed
some adjustments to their config classes to handle the new possible
`None` values, and I had to tweak our `Provider` class to also handle
`None` values for cases where we disable providers in the starter config
via environment variables.

This may not have caught everything that was missed, but I did grep
around quite a bit to try and find anything lingering.

## Test Plan

The following permutations now all run (or get far enough to complain
that they can't connect to chroma, vllm, etc.), whereas before they
failed immediately on startup because of bad environment variable
substitutions:

```
uv run llama stack run llama_stack/templates/starter/run.yaml
ENABLE_SQLITE_VEC=true uv run llama stack run llama_stack/templates/starter/run.yaml
ENABLE_PGVECTOR=true uv run llama stack run llama_stack/templates/starter/run.yaml
ENABLE_CHROMADB=true uv run llama stack run llama_stack/templates/starter/run.yaml

uv run llama stack run llama_stack/templates/remote-vllm/run.yaml
```

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: raghotham <rsm@meta.com>
2025-06-26 17:59:15 -07:00
Hardik Shah
eb01a3f1c5
ci: vector_io provider integration tests (#2537)
This new workflow runs the `vector_io` integration tests in CI across a
provider matrix: `inline::faiss` and `remote::chroma`.
2025-06-26 17:04:32 -07:00
grs
68d8f2186f
fix: fix test of root span to match what is being set (#2494)
# What does this PR do?

I get errors when trying to query spans. They appear to be caused by
traces being inserted without a `root_span_id`, which causes a pydantic
validation error when loading the data for a query response (and in any
case, a trace that references no span defeats its purpose). As far as I
can tell, the root cause is an invalid check in the code that inserts the
trace: it compares against the string "true" while the attribute holds
the Python value `True`.
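A minimal sketch of the comparison bug: the attribute holds the Python bool `True`, so comparing it with the string `"true"` is always `False` and the root span is never recognized. The attribute key and function names are hypothetical, and the actual fix may compare differently.

```python
from typing import Any, Mapping

ROOT_ATTRIBUTE = "__root__"  # hypothetical attribute key marking a root span


def is_root_span_buggy(attributes: Mapping[str, Any]) -> bool:
    # Compares a Python bool against the string "true": never equal.
    return attributes.get(ROOT_ATTRIBUTE) == "true"


def is_root_span_fixed(attributes: Mapping[str, Any]) -> bool:
    # Compares against the value actually stored on the span.
    return attributes.get(ROOT_ATTRIBUTE) is True
```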

Closes #2493 

## Test Plan
With this change I can query spans.

Signed-off-by: Gordon Sim <gsim@redhat.com>
2025-06-26 11:41:35 -04:00
Sébastien Han
dbdc811d16
chore: isolate bare minimum project dependencies (#2282)
# What does this PR do?

The goal is to promote the minimal set of dependencies the project needs
to run, which includes:

* dependencies needed to work with the CLI
* dependencies needed for the server to run with no providers

This also:
* Relocates redundant dependencies out of the core project and into the
  individual providers that actually require them.
* Includes all necessary server dependencies so the project can run
  standalone, even without any providers.


## Test Plan

Build a distro and run a server.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-26 10:14:27 +02:00
Sébastien Han
43c1f39bd6
refactor(env)!: enhanced environment variable substitution (#2490)
# What does this PR do?

This commit significantly improves the environment variable substitution
functionality in Llama Stack configuration files:
* The version field in configuration files has been changed from string
to integer type for better type consistency across build and run
configurations.

* Substitution of `${env.FOO:}` was fixed and now properly returns an
error.

* Substitution of `${env.FOO+}` now returns `None` instead of an empty
string, which better matches the type annotations of config fields.

* The system includes automatic type conversion for boolean, integer,
and float values.

* The error messages have been enhanced to provide clearer guidance when
environment variables are missing, including suggestions for using
default values or conditional syntax.

* Comprehensive documentation has been added to the configuration guide
explaining all supported syntax patterns, best practices, and runtime
override capabilities.

* Multiple provider configurations have been updated to use the new
conditional syntax for optional API keys, making the system more
flexible for different deployment scenarios. The telemetry configuration
has been improved to properly handle optional endpoints with appropriate
validation, ensuring that required endpoints are specified when their
corresponding sinks are enabled.

* There were many instances of ${env.NVIDIA_API_KEY:} that should have
caused the code to fail. However, due to a bug, the distro server was
still being started, and early validation wasn’t triggered. As a result,
failures were likely being handled downstream by the providers. I’ve
maintained similar behavior by using ${env.NVIDIA_API_KEY:+}, though I
believe this is incorrect for many configurations. I’ll leave it to each
provider to correct it as needed.

* Environment variable substitution now uses the same syntax as Bash
parameter expansion.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-26 08:20:08 +05:30
Sébastien Han
36d70637b9
fix: finish conversion to StrEnum (#2514)
# What does this PR do?

We still had a few enums declared to behave as both strings and enums
(subclassing `str` and `Enum`). Let's use `StrEnum` for those.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-26 08:01:26 +05:30
Sébastien Han
ac5fd57387
chore: remove nested imports (#2515)
# What does this PR do?

* Given that our API packages use `import *` in `__init__.py`, we don't
need to import from `llama_stack.apis.models.models`; importing from
`llama_stack.apis.models` is enough. The decision to use `import *` is
debatable and should probably be revisited at some point.

* Remove the unneeded Ruff F401 rule
* Consolidate the Ruff F403 rule in `pyproject.toml`
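The re-export mechanism behind the first bullet can be demonstrated with a throwaway package. The `demo_apis` package and the `Model` class are hypothetical stand-ins for the real API packages. The demo shows that when `__init__.py` does `from .models import *`, the short and nested import paths resolve to the same object.

```python
import sys
import tempfile
from pathlib import Path

# Build a throwaway package mirroring the layout (names are hypothetical):
#   demo_apis/models/models.py   defines the public names
#   demo_apis/models/__init__.py re-exports them with "import *"
root = Path(tempfile.mkdtemp())
models_pkg = root / "demo_apis" / "models"
models_pkg.mkdir(parents=True)
(root / "demo_apis" / "__init__.py").write_text("")
(models_pkg / "models.py").write_text(
    "__all__ = ['Model']\n\nclass Model:\n    pass\n"
)
# Re-export everything listed in models.__all__ at the package level.
(models_pkg / "__init__.py").write_text("from .models import *  # noqa: F403\n")

sys.path.insert(0, str(root))

from demo_apis.models import Model as ShortPath  # noqa: E402
from demo_apis.models.models import Model as NestedPath  # noqa: E402

# Both import paths resolve to the same class object.
print(ShortPath is NestedPath)  # True
```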

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-26 08:01:05 +05:30
Ben Browning
2d9fd041eb
fix: annotations list and web_search_preview in Responses (#2520)
# What does this PR do?


These are a couple of fixes to get an example LangChain app working with
our OpenAI Responses API implementation.

The Responses API spec requires an annotations array in
`output[*].content[*].annotations`, and we were not providing one. This
adds it as an empty list, even though we don't populate it yet. That
prevents errors in client libraries like LangChain that expect this field
to always exist, even when it is empty.
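This fix amounts to always emitting the key. The sketch below is illustrative. It uses plain dicts and hypothetical helper names (`make_output_text`, `make_message_output`) rather than Llama Stack's actual response models. It builds a message output item whose text content always carries an `annotations` list.

```python
from typing import Any


def make_output_text(text: str) -> dict[str, Any]:
    # "annotations" is always present, even though nothing populates it yet,
    # so clients that read it unconditionally do not fail.
    return {"type": "output_text", "text": text, "annotations": []}


def make_message_output(text: str) -> dict[str, Any]:
    return {
        "type": "message",
        "role": "assistant",
        "content": [make_output_text(text)],
    }


output = [make_message_output("Hello!")]
# Client-style access that would raise KeyError if the field were missing.
for item in output:
    for part in item["content"]:
        print(len(part["annotations"]))  # 0
```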

The other fix: `web_search_preview` is a valid name for the web search
tool in the Responses API, but we only accepted `web_search` or
`web_search_preview_2025_03_11`.
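Accepting all three names can be sketched as a simple membership check. The constant and function names below are hypothetical and do not come from the Llama Stack code. Tools are represented as plain dicts shaped like the `{"type": "web_search_preview"}` tool in the LangChain example further down.

```python
# Tool "type" values treated as the web search tool (hypothetical constant).
WEB_SEARCH_TOOL_TYPES = frozenset(
    {"web_search", "web_search_preview", "web_search_preview_2025_03_11"}
)


def is_web_search_tool(tool: dict) -> bool:
    """Return True if a Responses API tool entry names the web search tool."""
    return tool.get("type") in WEB_SEARCH_TOOL_TYPES
```

Using a single set keeps future aliases to a one-line change.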


## Test Plan


The existing Responses unit tests were expanded to test these cases,
via:

```
pytest -sv tests/unit/providers/agents/meta_reference/test_openai_responses.py
```

The existing test_openai_responses.py integration tests still pass with
this change, tested as below with Fireworks:

```
uv run llama stack run llama_stack/templates/starter/run.yaml

LLAMA_STACK_CONFIG=http://localhost:8321 \
uv run pytest -sv tests/integration/agents/test_openai_responses.py \
  --text-model accounts/fireworks/models/llama4-scout-instruct-basic
```

Lastly, this example LangChain app now works with Llama Stack (tested
with Ollama in the starter template in this case). The code is based on
the example snippets for using the Responses API at
https://python.langchain.com/docs/integrations/chat/openai/#responses-api

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:8321/v1/openai/v1",
    api_key="fake",
    model="ollama/meta-llama/Llama-3.2-3B-Instruct",
)

tool = {"type": "web_search_preview"}
llm_with_tools = llm.bind_tools([tool])

response = llm_with_tools.invoke("What was a positive news story from today?")

print(response.content)
```

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-06-26 07:59:33 +05:30