Commit graph

1278 commits

Author SHA1 Message Date
Hardik Shah
de37a04c3e
fix: set appropriate defaults for params (#2434)
Sets the param defaults to `| None`; otherwise they get marked as required
params in the OpenAPI spec.
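
Why the default matters can be sketched with the standard library alone. The helper and the two endpoint signatures below are hypothetical and only illustrate the general rule that schema generators tend to list parameters without a default as required; they are not the project's actual spec generator.

```python
from __future__ import annotations

import inspect
from typing import Any


def required_params(func: Any) -> list[str]:
    """Return the names of parameters that have no default value."""
    sig = inspect.signature(func)
    return [
        name
        for name, param in sig.parameters.items()
        if param.default is inspect.Parameter.empty
    ]


# Hypothetical signature before the fix: every param looks required.
def search_before(query: str, max_results: int, score_threshold: float) -> None:
    ...


# Hypothetical signature after the fix: optional params default to None.
def search_after(
    query: str,
    max_results: int | None = None,
    score_threshold: float | None = None,
) -> None:
    ...


print(required_params(search_before))
print(required_params(search_after))
```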
2025-06-11 17:30:34 -07:00
Hardik Shah
d55100d9b7
feat: OpenAIVectorIOMixin for vector_stores common logic (#2427)
Extracts common OpenAI vector-store code into its own mixin so that all
providers can share the same core logic.
This also makes it easy for Llama Stack to support both vector-stores
and Llama Stack APIs in the interim so that both share the same
underlying vector-dbs.

Each provider contains storage specific logic to `create / edit / delete
/ list` vector dbs while the plumbing logic is standardized in the
common code.
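
The split between shared plumbing and storage-specific logic could look roughly like the sketch below. All class and method names here are hypothetical and are not taken from `OpenAIVectorIOMixin`; the in-memory provider stands in for a real backend such as faiss.

```python
from __future__ import annotations

import uuid
from abc import ABC, abstractmethod


class VectorStoreMixinSketch(ABC):
    """Provider-agnostic plumbing; subclasses supply only the storage hooks."""

    @abstractmethod
    def _save(self, store_id: str, info: dict) -> None: ...

    @abstractmethod
    def _load_all(self) -> dict[str, dict]: ...

    @abstractmethod
    def _remove(self, store_id: str) -> None: ...

    def create_vector_store(self, name: str) -> dict:
        # Shared logic: build the record, then delegate persistence.
        info = {"id": f"vs_{uuid.uuid4().hex}", "name": name}
        self._save(info["id"], info)
        return info

    def list_vector_stores(self) -> list[dict]:
        return sorted(self._load_all().values(), key=lambda s: s["name"])

    def delete_vector_store(self, store_id: str) -> None:
        if store_id not in self._load_all():
            raise KeyError(store_id)
        self._remove(store_id)


class InMemoryProvider(VectorStoreMixinSketch):
    """Stand-in provider: only the storage hooks are implemented here."""

    def __init__(self) -> None:
        self._stores: dict[str, dict] = {}

    def _save(self, store_id: str, info: dict) -> None:
        self._stores[store_id] = info

    def _load_all(self) -> dict[str, dict]:
        return dict(self._stores)

    def _remove(self, store_id: str) -> None:
        del self._stores[store_id]
```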

Ensured that this works well with both faiss and sqlite-vec.

### Test Plan 
```
llama stack run starter
pytest -sv --stack-config http://localhost:8321 tests/integration/vector_io/test_openai_vector_stores.py --embedding-model all-MiniLM-L6-v2
```
2025-06-11 15:40:57 -07:00
Rohan Awhad
4e37b49cdc
fix: #1867 InferenceRouter has no attribute formatter (#2422)
# What does this PR do?

Closes #1867 

[Steps to reproduce the
bug](https://github.com/meta-llama/llama-stack/issues/1867#issuecomment-2956819381)

The change was designed to minimize code changes. Open to the option of
skipping the `metrics` field entirely when `telemetry` is disabled.


## Test Plan
1. Build llama-stack remote-vllm container
    ```bash
    llama stack build --template remote-vllm --image-type container
    ```
2. Create a small run.yaml
    ```yaml
    version: '2'
    image_name: remote-vllm
    apis:
    - inference
    providers:
      inference:
      - provider_id: vllm-inference
        provider_type: remote::vllm
        config:
          url: ${env.VLLM_URL:http://localhost:8000/v1}
          max_tokens: ${env.VLLM_MAX_TOKENS:4096}
          api_token: ${env.VLLM_API_TOKEN:fake}
          tls_verify: ${env.VLLM_TLS_VERIFY:true}
    metadata_store:
      type: sqlite
      db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/remote-vllm}/registry.db
    inference_store:
      type: sqlite
      db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/remote-vllm}/inference_store.db
    models:
    - metadata: {}
      model_id: ${env.INFERENCE_MODEL}
      provider_id: vllm-inference
      model_type: llm
    shields: []
    vector_dbs: []
    datasets: []
    scoring_fns: []
    benchmarks: []
    server:
      port: 8321
    ```
3. Run the llama-stack server
    ```bash
    export VLLM_URL="http://localhost:8000/v1"
    export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
    llama stack run run.yaml
    ```
4. Then perform a curl
    ```bash
    curl -X 'POST' \
      'http://localhost:8321/v1/inference/completion' \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -d '{
      "model_id": "meta-llama/Llama-3.2-3B-Instruct",
      "content": "string",
      "sampling_params": {
        "strategy": {
          "type": "greedy"
        },
        "max_tokens": 10,
        "repetition_penalty": 1,
        "stop": [
          "string"
        ]
      },
      "stream": false,
      "logprobs": {
        "top_k": 0
      }
    }'
    ```
5. You should receive a 200 response with metric values set to 0,
similar to the one below:
    ```
    {
      "metrics": [
        {
          "metric": "prompt_tokens",
          "value": 0,
          "unit": null
        },
        {
          "metric": "completion_tokens",
          "value": 0,
          "unit": null
        },
        {
          "metric": "total_tokens",
          "value": 0,
          "unit": null
        }
      ],
      [...]
    }
    ```
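
The zeroed metrics in the response above can be pictured with a small sketch. The `usage_metrics` helper and its `telemetry_enabled` flag are hypothetical names chosen for illustration and are not the router's actual code; only the shape of the `metrics` entries follows the sample response.

```python
from __future__ import annotations


def usage_metrics(
    prompt_tokens: int,
    completion_tokens: int,
    telemetry_enabled: bool,
) -> list[dict]:
    """Build the `metrics` list, reporting zeros when telemetry is disabled."""
    if not telemetry_enabled:
        # Assumption: without telemetry there are no token counts to report.
        prompt_tokens = completion_tokens = 0
    values = {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }
    return [
        {"metric": name, "value": value, "unit": None}
        for name, value in values.items()
    ]


print(usage_metrics(12, 10, telemetry_enabled=False))
```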

Co-authored-by: Rohan Awhad <rawhad@redhat.com>
2025-06-11 18:14:41 +02:00
Hardik Shah
5ac43268e8
feat: Add OpenAI compat /v1/vector_store APIs (#2423)
Adding OpenAI compat `/v1/vector-store` APIs.
This PR implements the `faiss` provider, with follow-up PRs coming for
other providers.

Added routes to create, update, delete, and list vector stores.
Also added a route to search a vector store.

Inserting into vector stores is missing and will come in a follow-up diff.

### Test Plan 
- Added new integration test for testing the faiss provider 
```
pytest -sv --stack-config http://localhost:8321 tests/integration/vector_io/test_openai_vector_stores.py --embedding-model all-MiniLM-L6-v2
```
2025-06-10 13:07:39 -07:00
Ibrahim Haroon
28ca00d0d9
fix(pgvector): handle case where distance is 0 by setting score to infinity (#2416)
# What does this PR do?
Fixes the pgvector provider's `query_vector` function for the case where
the distance between the query embedding and an embedding within the
vector db is 0 (identical vectors). Catches `ZeroDivisionError` and then
sets `score` to infinity, which represents maximum similarity.
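
A minimal sketch of the guard, assuming the score is the reciprocal of the distance (the `ZeroDivisionError` suggests a division by the distance, but the exact formula is an assumption). The function name is hypothetical.

```python
import math


def distance_to_score(distance: float) -> float:
    """Convert a distance into a similarity score (assumed to be 1 / distance)."""
    try:
        return 1.0 / distance
    except ZeroDivisionError:
        # Identical vectors: treat as maximum similarity.
        return math.inf


print(distance_to_score(0.5))
print(distance_to_score(0.0))
```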

Closes [#2381]

## Test Plan
Checkout this PR

Execute this code and there will no longer be a `ZeroDivisionError`
exception
```
from llama_stack_client import LlamaStackClient

base_url = "http://localhost:8321"
client = LlamaStackClient(base_url=base_url)

models = client.models.list()
embedding_model = (
    em := next(m for m in models if m.model_type == "embedding")
).identifier
embedding_dimension = 384

_ = client.vector_dbs.register(
    vector_db_id="foo_db",
    embedding_model=embedding_model,
    embedding_dimension=embedding_dimension,
    provider_id="pgvector",
)

chunk = {
    "content": "foo",
    "mime_type": "text/plain",
    "metadata": {
        "document_id": "foo-id"
    }
}

client.vector_io.insert(vector_db_id="foo_db", chunks=[chunk])
client.vector_io.query(vector_db_id="foo_db", query="foo")
```
2025-06-07 16:31:30 -04:00
Ibrahim Haroon
a34cef925b
fix(faiss): handle case where distance is 0 by setting d to minimum positive… (#2387)
# What does this PR do?
Adds a try/except to the faiss `query_vector` function for the case where
the distance between the query embedding and an embedding within the
vector db is 0 (identical vectors). Catches `ZeroDivisionError` and then
appends `(1.0 / sys.float_info.min)` to `scores` to represent maximum
similarity.
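
The faiss variant keeps the score finite instead of using infinity. A sketch under the assumption that each score is `1.0 / distance`; the function name is hypothetical, and the `float(d)` conversion is added here so that a zero always raises `ZeroDivisionError` in plain Python.

```python
from __future__ import annotations

import sys


def distances_to_scores(distances: list[float]) -> list[float]:
    """Map distances to scores, substituting a very large finite score for 0."""
    scores = []
    for d in distances:
        try:
            scores.append(1.0 / float(d))
        except ZeroDivisionError:
            # Largest reciprocal of a positive normalized float; still finite.
            scores.append(1.0 / sys.float_info.min)
    return scores


print(distances_to_scores([0.0, 0.25]))
```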

Closes [#2381]

## Test Plan
Checkout this PR

Execute this code and there will no longer be a `ZeroDivisionError`
exception
```
from llama_stack_client import LlamaStackClient

base_url = "http://localhost:8321"
client = LlamaStackClient(base_url=base_url)

models = client.models.list()
embedding_model = (
    em := next(m for m in models if m.model_type == "embedding")
).identifier
embedding_dimension = 384

_ = client.vector_dbs.register(
    vector_db_id="foo_db",
    embedding_model=embedding_model,
    embedding_dimension=embedding_dimension,
    provider_id="faiss",
)

chunk = {
    "content": "foo",
    "mime_type": "text/plain",
    "metadata": {
        "document_id": "foo-id"
    }
}

client.vector_io.insert(vector_db_id="foo_db", chunks=[chunk])
client.vector_io.query(vector_db_id="foo_db", query="foo")
```

### Running unit tests
`uv run pytest tests/unit/rag/test_rag_query.py -v`

---------

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Ben Browning <bbrownin@redhat.com>
2025-06-07 16:09:46 -04:00
Sumit Jaiswal
33ecefd284
feat: To add health status check for remote VLLM (#2303)
# What does this PR do?
Adds a health status check for remote vLLM.

## Test Plan
The PR includes a unit test for the added health check implementation.
2025-06-06 15:33:12 -04:00
Hardik Shah
1f48577a02
fix: ChromaDB provider (#2413)
Fixes the `remote::chromadb` provider for vector_io by updating the method
definition appropriately.
Also fixes the implementation to use `score_threshold` properly.

### Test Plan 
```
# Start Chroma Docker 
docker run --rm \
  --name chromadb \
  -p 8800:8000 \
  -v ~/chroma:/chroma/chroma \
  -e IS_PERSISTENT=TRUE \
  -e ANONYMIZED_TELEMETRY=FALSE \
  chromadb/chroma:latest

# run pytest 
CHROMADB_URL="http://localhost:8800" pytest -sv tests/integration/vector_io/test_vector_io.py --stack-config vector_io=remote::chromadb,inference=fireworks --embedding-model nomic-ai/nomic-embed-text-v1.5
```
2025-06-06 11:25:58 -07:00
github-actions[bot]
692709cd45
build: Bump version to 0.2.10
2025-06-05 22:56:39 +00:00
ehhuang
446893f791
feat: add deps dynamically based on metastore config (#2405)
## Test Plan
Changed the metastore in one of the templates, reran distro gen, and
observed the change in build.yaml.
2025-06-05 14:07:25 -07:00
Ashwin Bharambe
3251b44d8a
refactor: unify stream and non-stream impls for responses (#2388)
The non-streaming version is just a small layer on top of the streaming
version - just pluck off the final `response.completed` event and return
that as the response!
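
The layering can be sketched as follows. Only the `response.completed` event name comes from the description above; the other event types, the payload shape, and both function names are illustrative assumptions.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator


async def stream_response(prompt: str) -> AsyncIterator[dict]:
    """Hypothetical streaming implementation: yields events, the final one last."""
    text = prompt.upper()
    yield {"type": "response.created"}
    yield {"type": "response.output_text.delta", "delta": text}
    yield {"type": "response.completed", "response": {"output_text": text}}


async def create_response(prompt: str) -> dict:
    """Non-streaming call: drain the stream and keep the completed response."""
    final = None
    async for event in stream_response(prompt):
        if event["type"] == "response.completed":
            final = event["response"]
    if final is None:
        raise RuntimeError("stream ended without a response.completed event")
    return final


print(asyncio.run(create_response("hello")))
```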

This PR also includes a couple other changes which I ended up making
while working on it on a flight:
- changes to `ollama` so it does not pull embedding models
unconditionally
- a small fix to library client to make the stream and non-stream cases
a bit more symmetric
2025-06-05 17:48:09 +02:00
Jose Angel Morena Simon
ef885d2147
fix(server): Add missing OpenTelemetry dependencies to resolve telemetry import errors (#2391)
This PR fixes a runtime import error caused by missing OpenTelemetry
dependencies during `llama stack run`.

Specifically, the following imports fail if `opentelemetry-sdk` and
`opentelemetry-exporter-otlp-proto-http` are not present in the
environment:

```python
from opentelemetry import metrics, trace
from opentelemetry.exporter.otlp.proto.http.metric_exporter import OTLPMetricExporter
```

See
[llama\_stack/providers/inline/telemetry/meta\_reference/telemetry.py#L10-L19](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/inline/telemetry/meta_reference/telemetry.py#L10-L19)

This PR resolves the issue by adding both packages to the
`SERVER_DEPENDENCIES` list:

```python
"opentelemetry-sdk",
"opentelemetry-exporter-otlp-proto-http",
```

### Reproduction Steps

```bash
llama stack build --config llama.yaml --image-type venv --image-name fun-with-lamas
llama stack run ~/.llama/distributions/fun-with-lamas/fun-with-lamas-run.yaml
```

Results in:

```
ModuleNotFoundError: No module named 'opentelemetry'
```

or

```
ModuleNotFoundError: No module named 'opentelemetry.exporter'
```
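
A quick way to confirm whether an environment is affected is to try the imports up front. The helper below is a hypothetical pre-flight check, not part of `llama stack`; the module names are the ones from the failing imports above.

```python
from __future__ import annotations

import importlib


def missing_modules(names: list[str]) -> list[str]:
    """Return the module names that cannot be imported in this environment."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ModuleNotFoundError:
            missing.append(name)
    return missing


REQUIRED = [
    "opentelemetry",
    "opentelemetry.exporter.otlp.proto.http.metric_exporter",
]

if __name__ == "__main__":
    print(missing_modules(REQUIRED))
```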

Signed-off-by: Jose Angel Morena <jmorenas@redhat.com>
Co-authored-by: raghotham <rsm@meta.com>
2025-06-05 09:34:46 +02:00
ehhuang
a58c0639d5
chore: update postgres_demo distro config (#2396)
2025-06-04 17:41:27 -07:00
Sébastien Han
c8c742ba45
fix: vllm starter name (#2392)
Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-04 16:21:36 +02:00
grs
0de9536717
fix: remove debug print accidentally merged (#2393)
I accidentally left a debug print in a PR that was merged. This removes
that.
2025-06-04 15:14:14 +02:00
Ashwin Bharambe
ed69c1b3cc
feat(responses): add more streaming response types (#2375)
2025-06-03 15:48:41 -07:00
ehhuang
d96f6ec763
chore(ui): use proxy server for backend API calls; simplified k8s deployment (#2350)
# What does this PR do?
- no more CORS middleware needed


## Test Plan
### Local test
llama stack run starter --image-type conda
npm run dev
verify UI works in browser

### Deploy to k8s
temporarily change ui-k8s.yaml.template to load from PR commit
<img width="604" alt="image"
src="https://github.com/user-attachments/assets/87fa2e52-1e93-4e32-9e0f-5b283b7a37b3"
/>

sh ./apply.sh
$ kubectl get services
go to external_ip:8322 and play around with UI
<img width="1690" alt="image"
src="https://github.com/user-attachments/assets/5b7ec827-4302-4435-a9eb-df423676d873"
/>
2025-06-03 14:57:10 -07:00
grs
7c1998db25
feat: fine grained access control policy (#2264)
This allows a set of rules to be defined for determining access to
resources. The rules are (loosely) based on the cedar policy format.

A rule defines a list of actions to either permit or forbid. It may
specify a principal or a resource that must match for the rule to take
effect. It may also specify a condition, either a 'when' or an 'unless',
with additional constraints as to where the rule applies.

A list of rules is held for each type to be protected and tried in order
to find a match. If a match is found, the request is permitted or
forbidden depending on the type of rule. If no match is found, the
request is denied. If no rules are specified for a given type, a rule
that allows any action as long as the resource attributes match the user
attributes is added (i.e. the previous behaviour is the default).

Some examples in yaml:

```
    model:
    - permit:
      principal: user-1
      actions: [create, read, delete]
      comment: user-1 has full access to all models
    - permit:
      principal: user-2
      actions: [read]
      resource: model-1
      comment: user-2 has read access to model-1 only
    - permit:
      actions: [read]
      when:
        user_in: resource.namespaces
      comment: any user has read access to models with matching attributes
    vector_db:
    - forbid:
      actions: [create, read, delete]
      unless:
        user_in: role::admin
      comment: only user with admin role can use vector_db resources
```
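
The first-match evaluation described above can be sketched roughly as
follows. This is a minimal illustration, not the code from this PR: the
`Rule` class, `is_allowed`, the dict shapes for users and resources, and
the namespace-overlap reading of `user_in: resource.namespaces` are all
assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    """Hypothetical in-memory form of one permit/forbid rule."""

    effect: str  # "permit" or "forbid"
    actions: list[str]
    principal: Optional[str] = None
    resource: Optional[str] = None
    when: Optional[Callable[[dict, dict], bool]] = None
    unless: Optional[Callable[[dict, dict], bool]] = None

    def matches(self, action: str, user: dict, resource: dict) -> bool:
        # Every constraint that is present must hold for the rule to apply.
        if action not in self.actions:
            return False
        if self.principal is not None and self.principal != user["id"]:
            return False
        if self.resource is not None and self.resource != resource["id"]:
            return False
        if self.when is not None and not self.when(user, resource):
            return False
        if self.unless is not None and self.unless(user, resource):
            return False
        return True


def is_allowed(rules: list[Rule], action: str, user: dict, resource: dict) -> bool:
    # Rules are tried in order; the first match decides the outcome.
    for rule in rules:
        if rule.matches(action, user, resource):
            return rule.effect == "permit"
    # No rule matched: deny by default.
    return False


# Assumed translation of the `model` rules from the yaml example.
model_rules = [
    Rule("permit", ["create", "read", "delete"], principal="user-1"),
    Rule("permit", ["read"], principal="user-2", resource="model-1"),
    Rule(
        "permit",
        ["read"],
        when=lambda u, r: bool(set(u["namespaces"]) & set(r["namespaces"])),
    ),
]
```

With these rules, user-1 may delete any model, while user-2 may read
model-1 but nothing else unless a namespace matches.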

---------

Signed-off-by: Gordon Sim <gsim@redhat.com>
2025-06-03 14:51:12 -07:00
Ben Browning
8bee2954be
feat: Structured output for Responses API (#2324)
# What does this PR do?

This adds the missing `text` parameter to the Responses API, which is how
users control structured outputs. All we do with that parameter is map
it to the corresponding chat completion response_format.
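
As a rough illustration of that mapping (not the code from this PR), the
sketch below assumes the publicly documented OpenAI shapes: a Responses
`text` object with a nested `format`, and a chat completion
`response_format` that nests schema details under `json_schema`. The
function name `text_to_response_format` is hypothetical.

```python
from typing import Any, Optional


def text_to_response_format(text: Optional[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Map a Responses-style `text` parameter to a chat completion `response_format`."""
    if not text or "format" not in text:
        # Nothing requested: let the model produce plain text.
        return None
    fmt = text["format"]
    kind = fmt.get("type", "text")
    if kind == "json_schema":
        # Chat completions nest the schema fields under a "json_schema" key.
        keys = ("name", "schema", "description", "strict")
        return {"type": "json_schema", "json_schema": {k: fmt[k] for k in keys if k in fmt}}
    # "text" and "json_object" carry no extra fields.
    return {"type": kind}
```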

## Test Plan

The new unit tests exercise the various permutations allowed for this
property, while a couple of new verification tests actually use it for
real to verify the model outputs are following the format as expected.

Unit tests:

`python -m pytest -s -v
tests/unit/providers/agents/meta_reference/test_openai_responses.py`

Verification tests:

```
llama stack run llama_stack/templates/together/run.yaml
pytest -s -vv 'tests/verifications/openai_api/test_responses.py' \
  --base-url=http://localhost:8321/v1/openai/v1 \
  --model meta-llama/Llama-4-Scout-17B-16E-Instruct
```

Note that the verification tests can only be run with a real Llama Stack
server (as opposed to using the library client via
`--provider=stack:together`) because the Llama Stack python client is
not yet updated to accept this text field.

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-06-03 14:43:00 -07:00
Ignas Baranauskas
c70ca8344f
fix: resolve template name to config path in llama stack run (#2361)
# What does this PR do?
This PR fixes a bug where running a known template by name using:
`llama stack run ollama`
would fail with the following error:
`ValueError: Config file ollama does not exist`
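
One way such a lookup could work is sketched below. This is only an
assumption about the approach, not the code from this PR: the function
name `resolve_config` and the `templates_dir` parameter are hypothetical,
and the `<template>/run.yaml` layout is inferred from paths such as
`llama_stack/templates/ollama/run.yaml` used elsewhere in this log.

```python
from pathlib import Path


def resolve_config(name_or_path: str, templates_dir: Path) -> Path:
    """Return the run config for an explicit path or a known template name."""
    candidate = Path(name_or_path)
    if candidate.is_file():
        # The argument already points at a config file.
        return candidate
    # Otherwise treat the argument as a template name.
    template_config = templates_dir / name_or_path / "run.yaml"
    if template_config.is_file():
        return template_config
    raise ValueError(f"Config file {name_or_path} does not exist")
```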

Closes #2291 

## Test Plan
`llama stack run ollama` should work
2025-06-03 14:39:12 -07:00
Ashwin Bharambe
cba55808ab
feat(distro): add more providers to starter distro, prefix conflicting models (#2362)
The name changes to the verifications file are unfortunate, but maybe we
don't need that @ehhuang ?

Edit: deleted the verifications template now
2025-06-03 12:10:46 -07:00
Ashwin Bharambe
b380cb463f
feat: add postgres deps to starter distro (#2360)
Once we have this, we can use the starter distro for the Kubernetes
cluster demos.
2025-06-03 11:04:23 -07:00
ehhuang
3c9a10d2fe
feat: reference implementation for files API (#2330)
# What does this PR do?
TSIA
Added Files provider to the fireworks template. Might want to add to all
templates as a follow-up.

## Test Plan
llama-stack pytest tests/unit/files/test_files.py

llama-stack llama stack build --template fireworks --image-type conda
--run
LLAMA_STACK_CONFIG=http://localhost:8321 pytest -s -v
tests/integration/files/
2025-06-02 21:54:24 -07:00
Ben Browning
e92f571f47
fix: ollama chat completion needs unique ids (#2344)
# What does this PR do?

The chat completion ids generated by Ollama are not unique enough to use
with stored chat completions, as they rely on only three digits of
randomness to give unique values, e.g. `chatcmpl-373`. This causes
frequent collisions in id values of chat completions in Ollama, which
creates issues in our SQL storage of chat completions by id, where ids
are expected to actually be unique.

So, this adjusts Ollama responses to use uuids as unique ids. This does
mean we're replacing the ids generated natively by Ollama. If we don't
wish to do this, we'll either need to relax the unique constraint on our
chat completions id field in the inference storage or convince Ollama
upstream to use something closer to uuid values here.
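
A minimal sketch of the idea, assuming the response is available as a
plain dict; the helper name `with_unique_id` and the `chatcmpl-` prefix
are illustrative choices, not taken from this PR:

```python
import uuid


def with_unique_id(response: dict) -> dict:
    """Return a copy of a chat completion response with a UUID-based id."""
    # uuid4 provides 122 random bits, so collisions are practically impossible.
    return {**response, "id": f"chatcmpl-{uuid.uuid4()}"}
```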

Closes #2315

## Test Plan

I tested by running the openai completion / chat completion integration
tests in a loop. Without this change, I regularly get unique id
collisions. With this change, I do not. We sometimes see flakes from
these unique id collisions in our CI tests, and this will resolve those.

```
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
llama stack run llama_stack/templates/ollama/run.yaml

while true; do \
  INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
  pytest -s -v \
    tests/integration/inference/test_openai_completion.py \
    --stack-config=http://localhost:8321 \
    --text-model="meta-llama/Llama-3.2-3B-Instruct"; \
done
```

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-06-02 17:43:20 -07:00
ehhuang
4540c9b3e5
chore: revert llama-stack-client dep (#2342)
# What does this PR do?


## Test Plan
2025-06-02 16:05:21 -07:00
Ashwin Bharambe
dbe4e84aca
feat(responses): implement full multi-turn support (#2295)
I think the implementation needs more simplification. Spent way too much
time trying to get the tests to pass with models not co-operating :(
Finally had to switch to claude-sonnet to get things to pass reliably.

### Test Plan

```
export TAVILY_SEARCH_API_KEY=...
export OPENAI_API_KEY=...

uv run pytest -p no:warnings \
   -s -v tests/verifications/openai_api/test_responses.py \
 --provider=stack:starter \
  --model openai/gpt-4o
```
2025-06-02 15:35:49 -07:00
ehhuang
cac7d404a2
fix: remove openai dep (#2337)
# What does this PR do?
1. remove openai dep
2. temporarily update llama-stack-client to stainless sync'd branch as
the responses/inputitems API wasn't included in the last push. This will
automatically be updated to the next version in the release.


## Test Plan
npm run dev
go to any responses details page
2025-06-02 15:15:12 -07:00
Sébastien Han
6bb174bb05
revert: "chore: Remove zero-width space characters from OTEL service" (#2331)
# What does this PR do?

Revert #2060 and fix PLE2515.

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-02 14:21:35 -07:00
Hardik Shah
3511af7c33
fix: fireworks provider for openai compat inference endpoint (#2335)
fixes provider to use stream var correctly

Before 
```
curl --request POST \
    --url http://localhost:8321/v1/openai/v1/chat/completions \
    --header 'content-type: application/json' \
    --data '{
      "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct",
      "messages": [
        {
          "role": "user",
          "content": "Who are you?"
        }
      ]
    }'
{"detail":"Internal server error: An unexpected error occurred."}
```

After 
```
curl --request POST \
    --url http://localhost:8321/v1/openai/v1/chat/completions \
    --header 'content-type: application/json' \
    --data '{
      "model": "accounts/fireworks/models/llama4-scout-instruct-basic",
      "messages": [
        {
          "role": "user",
          "content": "Who are you?"
        }
      ]
    }'
{"id":"chatcmpl-97978538-271d-4c73-8d4d-c509bfb6c87e","choices":[{"message":{"role":"assistant","content":"I'm an AI assistant designed by Meta. I'm here to answer your questions, share interesting ideas and maybe even surprise you with a fresh perspective. What's on your mind?","name":null,"tool_calls":null},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","created":1748896403,"model":"accounts/fireworks/models/llama4-scout-instruct-basic"}%
```
2025-06-02 14:11:15 -07:00
ehhuang
31a3ae60f4
feat: openai files api (#2321)
# What does this PR do?
* Adds the OpenAI compatible Files API
* Modified doc gen script to support multipart parameter

## Test Plan

2025-06-02 11:45:53 -07:00
Sébastien Han
1c0c6e1e17
chore: remove usage of load_tiktoken_bpe (#2276)
2025-06-02 07:33:37 -07:00
Mark Campbell
c7be73fb16
refactor: remove container from list of run image types (#2178)
# What does this PR do?
Removes the ability to run llama stack container images through the
llama stack CLI
Closes #2110
## Test Plan
Run:
```
llama stack run /path/to/run.yaml --image-type container
```
Expected outcome:
```
llama stack run: error: argument --image-type: invalid choice: 'container' (choose from 'conda', 'venv')
```

2025-06-02 09:57:55 +02:00
Hardik Shah
b21050935e
feat: New OpenAI compat embeddings API (#2314)
# What does this PR do?
Adds a new OpenAI-compatible endpoint for the embeddings API:
`/openai/v1/embeddings`
Added providers for OpenAI, LiteLLM and SentenceTransformer.


## Test Plan
```
LLAMA_STACK_CONFIG=http://localhost:8321 pytest -sv tests/integration/inference/test_openai_embeddings.py --embedding-model all-MiniLM-L6-v2,text-embedding-3-small,gemini/text-embedding-004
```
2025-05-31 22:11:47 -07:00
Ben Browning
277f8690ef
fix: Responses streaming tools don't concatenate None and str (#2326)
# What does this PR do?

This adds a check to ensure we don't attempt to concatenate `None + str`
or `str + None` when building up our arguments for streaming tool calls
in the Responses API.
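
The guard amounts to something like the following sketch; the helper
name `append_arguments` is hypothetical and the actual check in this PR
may be structured differently:

```python
from typing import Optional


def append_arguments(current: Optional[str], delta: Optional[str]) -> Optional[str]:
    """Accumulate streamed tool-call argument fragments, tolerating None."""
    if delta is None:
        # Chunk carried no argument text; keep what we have.
        return current
    if current is None:
        # First fragment seen for this tool call.
        return delta
    return current + delta
```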

## Test Plan

All existing tests pass with this change.

Unit tests:

```
python -m pytest -s -v \
  tests/unit/providers/agents/meta_reference/test_openai_responses.py
```

Integration tests:

```
llama stack run llama_stack/templates/together/run.yaml

LLAMA_STACK_CONFIG=http://localhost:8321 \
python -m pytest -s -v \
  tests/integration/agents/test_openai_responses.py \
  --text-model meta-llama/Llama-4-Scout-17B-16E-Instruct
```

Verification tests:

```
llama stack run llama_stack/templates/together/run.yaml

pytest -s -v 'tests/verifications/openai_api/test_responses.py' \
  --base-url=http://localhost:8321/v1/openai/v1 \
  --model meta-llama/Llama-4-Scout-17B-16E-Instruct
```

Additionally, the manual example using Codex CLI from #2325 now succeeds
instead of throwing a 500 error.

Closes #2325

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-05-31 18:24:04 -07:00
Francisco Arceo
f328436831
feat: Enable ingestion of precomputed embeddings (#2317)
2025-05-31 04:03:37 -06:00
github-actions[bot]
ad15276da1
build: Bump version to 0.2.9
2025-05-30 19:43:09 +00:00
ehhuang
2603f10f95
feat: support postgresql inference store (#2310)
# What does this PR do?
* Added support for a PostgreSQL inference store
* Added an 'oracle' template that demonstrates how to configure
PostgreSQL stores (except for telemetry, which is not currently supported)


## Test Plan

llama stack build --template oracle --image-type conda --run
LLAMA_STACK_CONFIG=http://localhost:8321 pytest -s -v tests/integration/
--text-model accounts/fireworks/models/llama-v3p3-70b-instruct -k
'inference_store'
2025-05-29 14:33:09 -07:00
Jorge Piedrahita Ortiz
168c7113df
fix(providers): update sambanova json schema mode (#2306)
# What does this PR do?
Updates sambanova inference to use strict as false in json_schema
structured output

## Test Plan
pytest -s -v tests/integration/inference/test_text_inference.py
--stack-config=sambanova
--text-model=sambanova/Meta-Llama-3.3-70B-Instruct
2025-05-29 09:54:23 -07:00
Ashwin Bharambe
bfdd15d1fa
fix(responses): use input, not original_input when storing the Response (#2300)
We must store the full (re-hydrated) input, not just the original input,
in the Response object. Of course, this is not very space efficient and
we should likely find a better storage scheme so that we store only
unique entries in the database and then re-hydrate them efficiently
later. But that can be done safely later.

Closes https://github.com/meta-llama/llama-stack/issues/2299

## Test Plan

Unit test
2025-05-28 13:17:48 -07:00
Michael Dawson
a654467552
feat: add cpu/cuda config for prompt guard (#2194)
# What does this PR do?
Previously, prompt guard was hard-coded to require CUDA, which prevented
it from being used on an instance without CUDA support.

This PR allows prompt guard to be configured to use either cpu or cuda.
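
A device-selection helper along these lines might look like the sketch
below. It is an assumption-laden illustration rather than the code from
this PR: the function name `select_device`, the `"auto"` option, and the
`cuda_available` override (added so the logic can be exercised without a
GPU) are all hypothetical. `torch.cuda.is_available()` is the standard
PyTorch check.

```python
from typing import Optional


def select_device(requested: str = "auto", cuda_available: Optional[bool] = None) -> str:
    """Pick "cpu" or "cuda" based on the configured preference."""
    if requested not in ("auto", "cpu", "cuda"):
        raise ValueError(f"unsupported device: {requested}")
    if requested == "cpu":
        return "cpu"
    if cuda_available is None:
        try:
            import torch  # optional dependency; only probed when needed

            cuda_available = torch.cuda.is_available()
        except ImportError:
            cuda_available = False
    if requested == "cuda" and not cuda_available:
        raise RuntimeError("cuda requested but no CUDA device is available")
    # "auto" falls back to cpu when no CUDA device is present.
    return "cuda" if cuda_available else "cpu"
```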

Closes [#2133](https://github.com/meta-llama/llama-stack/issues/2133)

## Test Plan (Edited after incorporating suggestion)
1) started stack configured with prompt guard as follows on a system
without a GPU
and validated prompt guard could be used through the APIs

2) validated on a system with a gpu (but without llama stack) that the
python selecting between cpu and cuda support returned the right value
when a cuda device was available.

3) ran the unit tests as per -
https://github.com/meta-llama/llama-stack/blob/main/tests/unit/README.md


---------

Signed-off-by: Michael Dawson <mdawson@devrus.com>
2025-05-28 12:23:15 -07:00
Sébastien Han
63a9f08c9e
chore: use starlette built-in Route class (#2267)
# What does this PR do?

Use a more common pattern and known terminology from the ecosystem,
where Route is more widely accepted than Endpoint.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-28 09:53:33 -07:00
ehhuang
56e5ddb39f
feat(ui): add views for Responses (#2293)
# What does this PR do?
* Add responses list and detail views
* Refactored components to be shared as much as possible between chat
completions and responses

## Test Plan
<img width="2014" alt="image"
src="https://github.com/user-attachments/assets/6dee12ea-8876-4351-a6eb-2338058466ef"
/>
<img width="2021" alt="image"
src="https://github.com/user-attachments/assets/6c7c71b8-25b7-4199-9c57-6960be5580c8"
/>

added tests
2025-05-28 09:51:22 -07:00
ehhuang
0b695538af
fix: chat completion with more than one choice (#2288)
# What does this PR do?
Fix a bug in openai_compat where choices are not indexed correctly.
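
The PR text does not show the fix itself. As a rough illustration of what correct indexing means when a completion has more than one choice, here is a minimal sketch; `Choice` and `build_choices` are hypothetical names and not the `openai_compat` code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Choice:
    """Hypothetical stand-in for one chat completion choice."""
    index: int
    text: str


def build_choices(texts: List[str]) -> List[Choice]:
    # Each choice gets its own position as index, instead of every
    # choice sharing the same hard-coded value.
    return [Choice(index=i, text=t) for i, t in enumerate(texts)]


if __name__ == "__main__":
    for choice in build_choices(["first answer", "second answer"]):
        print(choice.index, choice.text)
```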

## Test Plan
Added a new test.

Rerun the failed inference_store tests:
llama stack run fireworks --image-type conda
pytest -s -v tests/integration/ --stack-config http://localhost:8321 -k
'test_inference_store' --text-model meta-llama/Llama-3.3-70B-Instruct
--count 10
2025-05-27 15:39:15 -07:00
github-actions[bot]
7105a25b0f build: Bump version to 0.2.8 2025-05-27 20:28:29 +00:00
Ashwin Bharambe
5cdb29758a
feat(responses): add output_text delta events to responses (#2265)
This adds initial streaming support to the Responses API. 

This PR makes sure that the _first_ inference call made to chat
completions streams out.

There's more to be done:
- tool call output tokens need to stream out when possible
- we need to loop through multiple rounds of inference and they all need
to stream out.
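
For illustration only, the shape of such a stream can be sketched as a generator that wraps text chunks from one inference call in delta events. The event type strings are modeled on the OpenAI Responses streaming events; the plain-dict payloads and the function name `output_text_deltas` are simplifying assumptions, not the implementation in this PR.

```python
from typing import Dict, Iterable, Iterator, List


def output_text_deltas(chunks: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one delta event per non-empty chunk, then a final done event."""
    parts: List[str] = []
    for chunk in chunks:
        if not chunk:
            # Skip empty chunks so consumers never see a no-op delta.
            continue
        parts.append(chunk)
        yield {"type": "response.output_text.delta", "delta": chunk}
    # The final event carries the full accumulated text.
    yield {"type": "response.output_text.done", "text": "".join(parts)}


if __name__ == "__main__":
    for event in output_text_deltas(["Hel", "", "lo"]):
        print(event)
```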

## Test Plan

Added a test. Executed as:

```
FIREWORKS_API_KEY=... \
  pytest -s -v 'tests/verifications/openai_api/test_responses.py' \
  --provider=stack:fireworks --model meta-llama/Llama-4-Scout-17B-16E-Instruct
```

Then, started a llama stack fireworks distro and tested against it like
this:

```
OPENAI_API_KEY=blah \
   pytest -s -v 'tests/verifications/openai_api/test_responses.py' \
   --base-url http://localhost:8321/v1/openai/v1 \
  --model meta-llama/Llama-4-Scout-17B-16E-Instruct 
```
2025-05-27 13:07:14 -07:00
Sébastien Han
6ee319ae08
fix: convert boolean string to boolean (#2284)
# What does this PR do?

Handles the case where the vllm config `tls_verify` is set to the string
`false` or `true` by converting it to a boolean.
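
A minimal sketch of this kind of conversion, assuming any other string (for example a certificate path) should pass through unchanged. The helper name `parse_tls_verify` is hypothetical and does not come from the provider code.

```python
from typing import Union


def parse_tls_verify(value: Union[bool, str]) -> Union[bool, str]:
    """Turn the strings "true"/"false" into booleans; leave other values as-is."""
    if isinstance(value, bool):
        return value
    lowered = value.strip().lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    # Assumption: anything else is treated as a path and returned untouched.
    return value


if __name__ == "__main__":
    print(parse_tls_verify("false"))
    print(parse_tls_verify("/etc/ssl/ca.pem"))
```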

Closes: https://github.com/meta-llama/llama-stack/issues/2283

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-27 13:05:38 -07:00
Sébastien Han
a8f75d3897
chore: remove dependencies.json (#2281)
# What does this PR do?
It's not used anywhere in the build process. It is an old artifact from an
earlier attempt at using sub-packages to build distros.

## Test Plan

N/A

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-27 10:26:57 -07:00
Ignas Baranauskas
28930cdab6
fix: handle None external_providers_dir in build with run arg (#2269)
# What does this PR do?
Fixes an issue where running `llama stack build --template ollama
--image-type venv --run` fails with a TypeError when validating external
providers directory paths.

The error occurs because `os.path.exists()` is called with `Path(None)`
instead of converting it to a string first. This change ensures
consistent handling of `None` values for `external_providers_dir` across
both build and
[run](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/cli/stack/run.py#L134)
commands by using `str()` conversion before path validation.
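
The failure mode and a defensive check can be sketched as follows. This is an illustration rather than the change in this PR: it adds an explicit `None` guard next to the `str()` conversion, and the helper name `dir_exists` is hypothetical.

```python
import os
from pathlib import Path
from typing import Optional, Union


def dir_exists(external_providers_dir: Optional[Union[str, Path]]) -> bool:
    """Return False for None instead of raising, else check the path."""
    if external_providers_dir is None:
        # Path(None) would raise TypeError, so handle the unset case first.
        return False
    return os.path.exists(str(external_providers_dir))


if __name__ == "__main__":
    try:
        Path(None)  # reproduces the kind of TypeError described above
    except TypeError as exc:
        print(f"TypeError: {exc}")
    print(dir_exists(None))
```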

## Test Plan
```bash
INFERENCE_MODEL=llama3.2:3b uv run --with llama-stack llama stack build --template ollama --image-type venv --run
```
The command completes successfully without a TypeError.
2025-05-27 09:41:12 +02:00
Ashwin Bharambe
51e6f529f3
fix: index non-MCP toolgroups at registration time (#2272)
Two somewhat annoying fixes:

- We are going to always index tools for non-MCP toolgroups (like we
used to do), because there are random assumptions in our tests,
etc., and I don't want to fix them right now.
- We need to handle the funny case of toolgroups like
`builtin::rag/knowledge_search`, where we added the tool name to use in
the toolgroup itself.
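
One way to picture the second case is a helper that splits such an identifier into the toolgroup and the optional tool name. This is a sketch under the assumption that the first `/` separates the two parts; `split_toolgroup` is a hypothetical name, not a function from the codebase.

```python
from typing import Optional, Tuple


def split_toolgroup(identifier: str) -> Tuple[str, Optional[str]]:
    """Split "group/tool" into (group, tool); tool is None when absent."""
    toolgroup_id, sep, tool_name = identifier.partition("/")
    # Assumption: only the first "/" is a separator, and an empty tool
    # name after it is treated the same as no tool name at all.
    return toolgroup_id, (tool_name if sep and tool_name else None)


if __name__ == "__main__":
    print(split_toolgroup("builtin::rag/knowledge_search"))
    print(split_toolgroup("builtin::websearch"))
```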
2025-05-26 20:33:36 -07:00
Sébastien Han
39b33a3b01
chore: allow to pass CA cert to remote vllm (#2266)
# What does this PR do?

The `tls_verify` option can now take a path to a certificate file if the
endpoint requires it.
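
A bool-or-path setting like this could be consumed roughly as below. The sketch uses only the standard `ssl` module; the function name `build_ssl_context` and the mapping from `tls_verify` to an `SSLContext` are assumptions for illustration, not the provider's actual client setup.

```python
import ssl
from pathlib import Path
from typing import Union


def build_ssl_context(tls_verify: Union[bool, str]) -> ssl.SSLContext:
    """Map a bool-or-path tls_verify value to an SSLContext."""
    if isinstance(tls_verify, str):
        ca_path = Path(tls_verify)
        if not ca_path.is_file():
            raise FileNotFoundError(f"CA certificate not found: {ca_path}")
        # Verify the server against the given CA certificate file.
        return ssl.create_default_context(cafile=str(ca_path))
    context = ssl.create_default_context()
    if not tls_verify:
        # check_hostname must be disabled before verification is turned off.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


if __name__ == "__main__":
    print(build_ssl_context(True).verify_mode)
    print(build_ssl_context(False).verify_mode)
```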

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-26 20:59:03 +02:00