Commit graph

2265 commits

Author SHA1 Message Date
ehhuang
0a6e588f68
feat: enable auth for LocalFS Files Provider (#2773)
Some checks failed
Integration Tests / discover-tests (push) Successful in 4s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 7s
Test Llama Stack Build / generate-matrix (push) Successful in 7s
Coverage Badge / unit-tests (push) Failing after 16s
Test Llama Stack Build / build-single-provider (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 16s
Unit Tests / unit-tests (3.12) (push) Failing after 13s
Test External Providers / test-external-providers (venv) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 16s
Python Package Build Test / build (3.12) (push) Failing after 13s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 17s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 17s
Update ReadTheDocs / update-readthedocs (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 23s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 18s
Unit Tests / unit-tests (3.13) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 23s
Test Llama Stack Build / build (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 25s
Python Package Build Test / build (3.13) (push) Failing after 2m19s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 2m25s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 2m32s
Integration Tests / test-matrix (push) Failing after 2m24s
Pre-commit / pre-commit (push) Successful in 3m57s
# What does this PR do?
Supports authentication for LocalFS Files provider.

closes https://github.com/meta-llama/llama-stack/issues/2760

## Test Plan
CI. added tests.
2025-07-18 19:11:01 -07:00
Ashwin Bharambe
dd303327f3
feat(ci): add a ci-tests distro (#2826) 2025-07-18 17:11:06 -07:00
Ashwin Bharambe
199f859eec
feat(vllm): periodically refresh models (#2823)
Just like #2805 but for vLLM.

We also make VLLM_URL env variable optional (not required) -- if not
specified, the provider silently sits idle and yells eventually if
someone tries to call a completion on it. This is done so as to allow
this provider to be present in the `starter` distribution.

## Test Plan

Set up vLLM, copy the starter template and set `{ refresh_models: true,
refresh_models_interval: 10 }` for the vllm provider and then run:

```
ENABLE_VLLM=vllm VLLM_URL=http://localhost:8000/v1 \
  uv run llama stack run --image-type venv /tmp/starter.yaml
```

Verify that `llama-stack-client models list` brings up the model
correctly from vLLM.
2025-07-18 15:53:09 -07:00
Ashwin Bharambe
ade075152e
chore: kill inline::vllm (#2824)
Inline _inference_ providers haven't proved to be very useful -- they
are rarely used. And for good reason -- it is almost never a good idea
to include a complex (distributed) inference engine bundled into a
distributed stateful front-end server serving many other things.
Responsibility should be split properly.

See Discord discussion:
1395849853
2025-07-18 15:52:18 -07:00
Ashwin Bharambe
68a2dfbad7
feat(ollama): periodically refresh models (#2805)
For self-hosted providers like Ollama (or vLLM), the backing server is
running a set of models. That server should be treated as the source of
truth and the Stack registry should just be a cache for those models. Of
course, in production environments, you may not want this (because you
know what model you are running statically) hence there's a config
boolean to control this behavior.

_This is part of a series of PRs aimed at removing the requirement of
needing to set `INFERENCE_MODEL` env variables for running Llama Stack
server._

## Test Plan

Copy and modify the starter.yaml template / config and enable
`refresh_models: true, refresh_models_interval: 10` for the ollama
provider. Then, run:

```
LLAMA_STACK_LOGGING=all=debug \
  ENABLE_OLLAMA=ollama uv run llama stack run --image-type venv /tmp/starter.yaml
```

See a gargantuan amount of logs, but verify that the provider is
periodically refreshing models. Stop and prune a model from ollama
server, restart the server. Verify that the model goes away when I call
`uv run llama-stack-client models list`
2025-07-18 12:20:36 -07:00
ehhuang
6d55f2f137
feat: enable ls client for files tests (#2769)
# What does this PR do?
titled

## Test Plan
CI
2025-07-18 12:10:30 -07:00
Nehanth Narendrula
874b1cb00f
fix: DPOAlignmentConfig schema to use correct DPO parameters (#2804)
Some checks failed
Coverage Badge / unit-tests (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s
Integration Tests / discover-tests (push) Successful in 4s
Test Llama Stack Build / generate-matrix (push) Successful in 9s
Test Llama Stack Build / build-single-provider (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 13s
Unit Tests / unit-tests (3.12) (push) Failing after 9s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 13s
Update ReadTheDocs / update-readthedocs (push) Failing after 13s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 22s
Python Package Build Test / build (3.12) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 18s
Test External Providers / test-external-providers (venv) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 17s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 19s
Unit Tests / unit-tests (3.13) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 21s
Integration Tests / test-matrix (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 22s
Test Llama Stack Build / build (push) Failing after 15s
Python Package Build Test / build (3.13) (push) Failing after 1m50s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 2m5s
Pre-commit / pre-commit (push) Successful in 3m20s
# What does this PR do?

This PR fixes the `DPOAlignmentConfig` schema to use the correct Direct
Preference Optimization (DPO) parameters.

The current schema incorrectly uses PPO-inspired parameters
(`reward_scale`, `reward_clip`, `epsilon`, `gamma`) that are not part of
the DPO algorithm. This PR updates it to use the standard DPO
parameters:

- `beta`: The KL divergence coefficient that controls deviation from the
reference model
- `loss_type`: The type of DPO loss function (sigmoid, hinge, ipo,
kto_pair)

These parameters align with standard DPO implementations like
HuggingFace's TRL library.

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-43-83.ec2.internal>
2025-07-18 11:56:00 -07:00
Charlie Doern
d994305f0a
fix: remove disabled providers from model dump (#2784)
# What does this PR do?

currently when running `llama stack run --template starter...` the
__disabled__ providers, their models, etc are printed alongside the
enabled ones making the output really confusing

in server.py add a utility `remove_disabled_providers` which
post-processes the model_dump output to remove any dict with
`provider_id: __disabled__`

we also have `debug` logs printing the disabled providers, so I think
its safe to say that is the only indicator we need when using starter.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan

before (output truncated because it was huge):


```
...
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/sambanova/Llama-3.2-11B-Vision-Instruct
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Llama-3.2-11B-Vision-Instruct
         - metadata: {}
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/meta-llama/Llama-3.2-11B-Vision-Instruct
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Llama-3.2-11B-Vision-Instruct
         - metadata: {}
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/sambanova/Llama-3.2-90B-Vision-Instruct
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Llama-3.2-90B-Vision-Instruct
         - metadata: {}
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/meta-llama/Llama-3.2-90B-Vision-Instruct
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Llama-3.2-90B-Vision-Instruct
         - metadata: {}
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/sambanova/Llama-4-Scout-17B-16E-Instruct
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Llama-4-Scout-17B-16E-Instruct
         - metadata: {}
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/meta-llama/Llama-4-Scout-17B-16E-Instruct
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Llama-4-Scout-17B-16E-Instruct
         - metadata: {}
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/sambanova/Llama-4-Maverick-17B-128E-Instruct
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Llama-4-Maverick-17B-128E-Instruct
         - metadata: {}
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/meta-llama/Llama-4-Maverick-17B-128E-Instruct
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Llama-4-Maverick-17B-128E-Instruct
         - metadata: {}
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/sambanova/Meta-Llama-Guard-3-8B
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Meta-Llama-Guard-3-8B
         - metadata: {}
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/meta-llama/Llama-Guard-3-8B
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Meta-Llama-Guard-3-8B
         - metadata:
             embedding_dimension: 384
           model_id: all-MiniLM-L6-v2
           model_type: embedding
           provider_id: sentence-transformers
           provider_model_id: null
         providers:
           agents:
           - config:
               persistence_store:
                 db_path: /Users/charliedoern/.llama/distributions/starter/agents_store.db
                 type: sqlite
               responses_store:
                 db_path: /Users/charliedoern/.llama/distributions/starter/responses_store.db
                 type: sqlite
             provider_id: meta-reference
             provider_type: inline::meta-reference
           datasetio:
           - config:
               kvstore:
                 db_path: /Users/charliedoern/.llama/distributions/starter/huggingface_datasetio.db
                 type: sqlite
             provider_id: huggingface
             provider_type: remote::huggingface
           - config:
               kvstore:
                 db_path: /Users/charliedoern/.llama/distributions/starter/localfs_datasetio.db
                 type: sqlite
             provider_id: localfs
             provider_type: inline::localfs
           eval:
           - config:
               kvstore:
                 db_path: /Users/charliedoern/.llama/distributions/starter/meta_reference_eval.db
                 type: sqlite
             provider_id: meta-reference
             provider_type: inline::meta-reference
           files:
           - config:
               metadata_store:
                 db_path: /Users/charliedoern/.llama/distributions/starter/files_metadata.db
                 type: sqlite
               storage_dir: /Users/charliedoern/.llama/distributions/starter/files
             provider_id: meta-reference-files
             provider_type: inline::localfs
           inference:
           - config:
               api_key: '********'
               base_url: https://api.cerebras.ai
             provider_id: __disabled__
             provider_type: remote::cerebras
           - config:
               url: http://localhost:11434
             provider_id: ollama
             provider_type: remote::ollama
           - config:
               api_token: '********'
               max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
               tls_verify: ${env.VLLM_TLS_VERIFY:=true}
               url: ${env.VLLM_URL}
             provider_id: __disabled__
             provider_type: remote::vllm
           - config:
               url: ${env.TGI_URL}
             provider_id: __disabled__
             provider_type: remote::tgi
           - config:
               api_token: '********'
               huggingface_repo: ${env.INFERENCE_MODEL}
             provider_id: __disabled__
             provider_type: remote::hf::serverless
           - config:
               api_token: '********'
               endpoint_name: ${env.INFERENCE_ENDPOINT_NAME}
             provider_id: __disabled__
             provider_type: remote::hf::endpoint
           - config:
               api_key: '********'
               url: https://api.fireworks.ai/inference/v1
             provider_id: __disabled__
             provider_type: remote::fireworks
           - config:
               api_key: '********'
               url: https://api.together.xyz/v1
             provider_id: __disabled__
             provider_type: remote::together
           - config: {}
             provider_id: __disabled__
             provider_type: remote::bedrock
           - config:
               api_token: '********'
               url: ${env.DATABRICKS_URL}
             provider_id: __disabled__
             provider_type: remote::databricks
           - config:
               api_key: '********'
               append_api_version: ${env.NVIDIA_APPEND_API_VERSION:=True}
               url: ${env.NVIDIA_BASE_URL:=https://integrate.api.nvidia.com}
             provider_id: __disabled__
             provider_type: remote::nvidia
           - config:
               api_token: '********'
               url: ${env.RUNPOD_URL:=}
             provider_id: __disabled__
             provider_type: remote::runpod
           - config:
               api_key: '********'
             provider_id: __disabled__
             provider_type: remote::openai
           - config:
               api_key: '********'
             provider_id: __disabled__
             provider_type: remote::anthropic
           - config:
               api_key: '********'
             provider_id: __disabled__
             provider_type: remote::gemini
           - config:
               api_key: '********'
               url: https://api.groq.com
             provider_id: __disabled__
             provider_type: remote::groq
           - config:
               api_key: '********'
               openai_compat_api_base: https://api.fireworks.ai/inference/v1
             provider_id: __disabled__
             provider_type: remote::fireworks-openai-compat
           - config:
               api_key: '********'
               openai_compat_api_base: https://api.llama.com/compat/v1/
             provider_id: __disabled__
             provider_type: remote::llama-openai-compat
           - config:
               api_key: '********'
               openai_compat_api_base: https://api.together.xyz/v1
             provider_id: __disabled__
             provider_type: remote::together-openai-compat
           - config:
               api_key: '********'
               openai_compat_api_base: https://api.groq.com/openai/v1
             provider_id: __disabled__
             provider_type: remote::groq-openai-compat
           - config:
               api_key: '********'
               openai_compat_api_base: https://api.sambanova.ai/v1
             provider_id: __disabled__
             provider_type: remote::sambanova-openai-compat
           - config:
               api_key: '********'
               openai_compat_api_base: https://api.cerebras.ai/v1
             provider_id: __disabled__
             provider_type: remote::cerebras-openai-compat
           - config:
               api_key: '********'
               url: https://api.sambanova.ai/v1
             provider_id: __disabled__
             provider_type: remote::sambanova
           - config:
               api_key: '********'
               url: ${env.PASSTHROUGH_URL}
             provider_id: __disabled__
             provider_type: remote::passthrough
           - config: {}
             provider_id: sentence-transformers
             provider_type: inline::sentence-transformers
           post_training:
           - config:
               checkpoint_format: huggingface
               device: cpu
               distributed_backend: null
             provider_id: huggingface
             provider_type: inline::huggingface
           safety:
           - config:
               excluded_categories: []
             provider_id: llama-guard
             provider_type: inline::llama-guard
           scoring:
           - config: {}
             provider_id: basic
             provider_type: inline::basic
           - config: {}
             provider_id: llm-as-judge
             provider_type: inline::llm-as-judge
           - config:
               openai_api_key: '********'
             provider_id: braintrust
             provider_type: inline::braintrust
           telemetry:
           - config:
               otel_exporter_otlp_endpoint: null
               service_name: "\u200B"
               sinks: console,sqlite
               sqlite_db_path: /Users/charliedoern/.llama/distributions/starter/trace_store.db
             provider_id: meta-reference
             provider_type: inline::meta-reference
           tool_runtime:
           - config:
               api_key: '********'
               max_results: 3
             provider_id: brave-search
             provider_type: remote::brave-search
           - config:
               api_key: '********'
               max_results: 3
             provider_id: tavily-search
             provider_type: remote::tavily-search
           - config: {}
             provider_id: rag-runtime
             provider_type: inline::rag-runtime
           - config: {}
             provider_id: model-context-protocol
             provider_type: remote::model-context-protocol
           vector_io:
           - config:
               kvstore:
                 db_path: /Users/charliedoern/.llama/distributions/starter/faiss_store.db
                 type: sqlite
             provider_id: faiss
             provider_type: inline::faiss
           - config:
               db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/starter}/sqlite_vec.db
               kvstore:
                 db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/starter}/sqlite_vec_registry.db
                 type: sqlite
             provider_id: __disabled__
             provider_type: inline::sqlite-vec
           - config:
               db_path: ${env.MILVUS_DB_PATH:=~/.llama/distributions/starter}/milvus.db
               kvstore:
                 db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/starter}/milvus_registry.db
                 type: sqlite
             provider_id: __disabled__
             provider_type: inline::milvus
           - config:
               url: ${env.CHROMADB_URL:=}
             provider_id: __disabled__
             provider_type: remote::chromadb
           - config:
               db: ${env.PGVECTOR_DB:=}
               host: ${env.PGVECTOR_HOST:=localhost}
               kvstore:
                 db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/starter}/pgvector_registry.db
                 type: sqlite
               password: '********'
               port: ${env.PGVECTOR_PORT:=5432}
               user: ${env.PGVECTOR_USER:=}
             provider_id: __disabled__
             provider_type: remote::pgvector
         scoring_fns: []
         server:
           auth: null
           host: null
           port: 8321
           quota: null
           tls_cafile: null
           tls_certfile: null
           tls_keyfile: null
         shields:
         - params: null
           provider_id: null
           provider_shield_id: ollama/__disabled__
           shield_id: __disabled__
         tool_groups:
         - args: null
           mcp_endpoint: null
           provider_id: tavily-search
           toolgroup_id: builtin::websearch
         - args: null
           mcp_endpoint: null
           provider_id: rag-runtime
           toolgroup_id: builtin::rag
         vector_dbs: []
         version: 2

```

after:

```
INFO     2025-07-16 13:00:32,604 __main__:448 server: Run configuration:
INFO     2025-07-16 13:00:32,606 __main__:450 server: apis:
         - agents
         - datasetio
         - eval
         - files
         - inference
         - post_training
         - safety
         - scoring
         - telemetry
         - tool_runtime
         - vector_io
         benchmarks: []
         datasets: []
         image_name: starter
         inference_store:
           db_path: /Users/charliedoern/.llama/distributions/starter/inference_store.db
           type: sqlite
         metadata_store:
           db_path: /Users/charliedoern/.llama/distributions/starter/registry.db
           type: sqlite
         models:
         - metadata: {}
           model_id: ollama/llama3.2:3b
           model_type: llm
           provider_id: ollama
           provider_model_id: llama3.2:3b
         - metadata:
             embedding_dimension: 384
           model_id: all-MiniLM-L6-v2
           model_type: embedding
           provider_id: sentence-transformers
         providers:
           agents:
           - config:
               persistence_store:
                 db_path: /Users/charliedoern/.llama/distributions/starter/agents_store.db
                 type: sqlite
               responses_store:
                 db_path: /Users/charliedoern/.llama/distributions/starter/responses_store.db
                 type: sqlite
             provider_id: meta-reference
             provider_type: inline::meta-reference
           datasetio:
           - config:
               kvstore:
                 db_path: /Users/charliedoern/.llama/distributions/starter/huggingface_datasetio.db
                 type: sqlite
             provider_id: huggingface
             provider_type: remote::huggingface
           - config:
               kvstore:
                 db_path: /Users/charliedoern/.llama/distributions/starter/localfs_datasetio.db
                 type: sqlite
             provider_id: localfs
             provider_type: inline::localfs
           eval:
           - config:
               kvstore:
                 db_path: /Users/charliedoern/.llama/distributions/starter/meta_reference_eval.db
                 type: sqlite
             provider_id: meta-reference
             provider_type: inline::meta-reference
           files:
           - config:
               metadata_store:
                 db_path: /Users/charliedoern/.llama/distributions/starter/files_metadata.db
                 type: sqlite
               storage_dir: /Users/charliedoern/.llama/distributions/starter/files
             provider_id: meta-reference-files
             provider_type: inline::localfs
           inference:
           - config:
               url: http://localhost:11434
             provider_id: ollama
             provider_type: remote::ollama
           - config: {}
             provider_id: sentence-transformers
             provider_type: inline::sentence-transformers
           post_training:
           - config:
               checkpoint_format: huggingface
               device: cpu
             provider_id: huggingface
             provider_type: inline::huggingface
           safety:
           - config:
               excluded_categories: []
             provider_id: llama-guard
             provider_type: inline::llama-guard
           scoring:
           - config: {}
             provider_id: basic
             provider_type: inline::basic
           - config: {}
             provider_id: llm-as-judge
             provider_type: inline::llm-as-judge
           - config:
               openai_api_key: '********'
             provider_id: braintrust
             provider_type: inline::braintrust
           telemetry:
           - config:
               service_name: "\u200B"
               sinks: console,sqlite
               sqlite_db_path: /Users/charliedoern/.llama/distributions/starter/trace_store.db
             provider_id: meta-reference
             provider_type: inline::meta-reference
           tool_runtime:
           - config:
               api_key: '********'
               max_results: 3
             provider_id: brave-search
             provider_type: remote::brave-search
           - config:
               api_key: '********'
               max_results: 3
             provider_id: tavily-search
             provider_type: remote::tavily-search
           - config: {}
             provider_id: rag-runtime
             provider_type: inline::rag-runtime
           - config: {}
             provider_id: model-context-protocol
             provider_type: remote::model-context-protocol
           vector_io:
           - config:
               kvstore:
                 db_path: /Users/charliedoern/.llama/distributions/starter/faiss_store.db
                 type: sqlite
             provider_id: faiss
             provider_type: inline::faiss
         scoring_fns: []
         server:
           port: 8321
         shields: []
         tool_groups:
         - provider_id: tavily-search
           toolgroup_id: builtin::websearch
         - provider_id: rag-runtime
           toolgroup_id: builtin::rag
         vector_dbs: []
         version: 2
```

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-18 10:44:35 -07:00
slekkala1
15916852e8
chore: Add slekkala1 to codeowners (#2817)
Getting started on LLAMA Stack
2025-07-18 10:33:30 -07:00
Ashwin Bharambe
9e3ae50306
feat(server): construct the stack in a persistent event loop (#2818)
When we call `construct_stack()`, providers are instantiated and
`initialize()` is called. This call can end up doing _anything_ at all
-- specifically, providers are free to create long running background
tasks as part of this. If we wrapped this within a `asyncio.run()` as in
the current code, these tasks get canceled when the stack construction
finishes. This is not correct. The PR addresses the issue by creating a
persistent event loop which is used for both the stack as well as for
running the uvicorn server. In other words, the lifetime of the
providers (and downstream async code) is now the same as the lifetime of
the uvicorn server.

## Test Plan

This should not affect any current code since we don't have background
tasks created right now. However,
https://github.com/meta-llama/llama-stack/pull/2805 will start using
this functionality.
2025-07-18 10:29:19 -07:00
Nathan Weinberg
2bb9039173
docs: fix steps in the Quick Start Guide (#2800)
# What does this PR do?
'build' command didn't take into account ENABLE flags for starter distro

for some reason, I was having issues with HuggingFace access for the
embedding model, so added a tip for that as well

Closes #2779

## Test Plan
I ran the described steps manually, but it would be nice if someone else
could try it and verify this still works

We might consider having some CI job ensure the QSG remains functional -
it's not a great experience for new users if they try Llama Stack for
the first time and it doesn't work as we describe

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-18 09:08:46 -07:00
Christian Zaccaria
e45543f7f3
test: Measure and track code coverage (#2636)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
- Added coverage badge to README. - [See my
fork](https://github.com/ChristianZaccaria/llama-stack)
- Added a GitHub Actions workflow that runs the tests and updates the
coverage badge. - [See
run](4574811323)
- Documented steps in `testing.md` for running the tests locally, and
viewing the `html` report.
- Excluded non-essential files from coverage reporting to provide a more
accurate measurement.

Automatically created PR to update coverage badge:
https://github.com/ChristianZaccaria/llama-stack/pull/9

# Note for reviewers
1. Currently the coverage report shows a 45% coverage. Wondering if
there are other files or directories that should also be excluded from
the report to increase the percentage. The directories with the least
test coverage are `llama_stack/cli`, `llama_stack/models`, and
`llama_stack/ui`. - Should we exclude these?
2. **[Required]** The `GITHUB_TOKEN` should have write permissions to
open a PR to update the coverage badge.

# GitHub Issue
<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Closes #2355 

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
The `testing.md` file describes how to run the unit tests locally.
2025-07-18 18:08:36 +02:00
Nathan Weinberg
1785a6b39c
docs: add virtualenv instructions for running starter distro (#2780)
# What does this PR do?
we had directions for a container and conda but not venv

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-18 09:07:43 -07:00
Charlie Doern
0eb0583cdf
fix: amend integration test workflow (#2812)
# What does this PR do?

trigger integration tests on ALL changes to `tests/` to catch failures
before they merge into main

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-18 15:23:36 +02:00
Mustafa Elbehery
fe6af7dc8b
chore(test): migrate unit tests from unittest to pytest nvidia test f… (#2794)
Some checks failed
Integration Tests / discover-tests (push) Successful in 3s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 10s
Test Llama Stack Build / generate-matrix (push) Successful in 10s
Python Package Build Test / build (3.13) (push) Failing after 11s
Test Llama Stack Build / build-single-provider (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 18s
Test External Providers / test-external-providers (venv) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 21s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 23s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 20s
Integration Tests / test-matrix (push) Failing after 13s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 16s
Unit Tests / unit-tests (3.13) (push) Failing after 17s
Test Llama Stack Build / build (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 20s
Unit Tests / unit-tests (3.12) (push) Failing after 29s
Python Package Build Test / build (3.12) (push) Failing after 1m46s
Update ReadTheDocs / update-readthedocs (push) Failing after 1m44s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 1m51s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1m53s
Pre-commit / pre-commit (push) Successful in 3m17s
This PR replaces unittest with pytest.

Part of https://github.com/meta-llama/llama-stack/issues/2680

cc @leseb

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-07-18 12:32:19 +02:00
Mustafa Elbehery
b78b8e1486
chore: add mypy inference parallel utils (#2670)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR adds static type coverage to `llama-stack`

Part of https://github.com/meta-llama/llama-stack/issues/2647

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-07-18 12:01:10 +02:00
Mustafa Elbehery
ca7edcd6a4
chore(api): add mypy coverage to chat_format (#2654)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR adds static type coverage to `llama-stack`

Part of https://github.com/meta-llama/llama-stack/issues/2647

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-07-18 11:56:53 +02:00
Mustafa Elbehery
75480b01b8
chore(test): migrate unit tests from unittest to pytest for system prompt (#2789)
This PR replaces unittest with pytest.

Part of https://github.com/meta-llama/llama-stack/issues/2680

cc @leseb

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-07-18 11:54:02 +02:00
Mustafa Elbehery
3cdf748a8e
chore(test): migrate unit tests from unittest to pytest for nvidia datastore (#2790)
This PR replaces unittest with pytest.

Part of https://github.com/meta-llama/llama-stack/issues/2680

cc @leseb

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-07-18 11:52:47 +02:00
Mustafa Elbehery
55713abe7d
chore(test): migrate unit tests from unittest to pytest nvidia test p… (#2792)
This PR replaces unittest with pytest.

Part of https://github.com/meta-llama/llama-stack/issues/2680

cc @leseb

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-07-18 11:49:45 +02:00
Charlie Doern
d7cc38e934
fix: remove async test markers (fix pre-commit) (#2808)
# What does this PR do?

some async test markers are in the codebase causing pre-commit to fail
due to #2744

remove these pytest fixtures

## Test Plan
pre-commit passes

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-17 21:35:28 -07:00
Ashwin Bharambe
d64e096c5f
fix(cli): image name should not default to CONDA_DEFAULT_ENV (#2806)
Some checks failed
Integration Tests / discover-tests (push) Successful in 14s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 14s
Test External Providers / test-external-providers (venv) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 14s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 19s
Python Package Build Test / build (3.12) (push) Failing after 18s
Integration Tests / test-matrix (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 20s
Python Package Build Test / build (3.13) (push) Failing after 19s
Unit Tests / unit-tests (3.12) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 24s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 26s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 28s
Unit Tests / unit-tests (3.13) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 24s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 55s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 53s
Pre-commit / pre-commit (push) Failing after 2m14s
If I am running `uv run llama stack run --image-type venv` it should not
be saying to me "Conda detected" because I am pretty clearly telling it
I need venv. The root cause is the offending line.
2025-07-17 16:40:35 -07:00
Matthew Farrellee
910b017680
chore: block asyncio marks in tests (#2744)
# What does this PR do?

use pre-commit to block addition of new asyncio marks, since we
configure pytest with async-mode=auto, see
https://github.com/meta-llama/llama-stack/pull/2730
2025-07-17 16:33:30 -07:00
Mustafa Elbehery
bd8a3ae3cc
chore(test): migrate unit tests from unittest to pytest for prompt adapter (#2788)
This PR replaces unittest with pytest.

Part of https://github.com/meta-llama/llama-stack/issues/2680

cc @leseb

Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>
2025-07-17 16:31:38 -07:00
ehhuang
3ae4aeb344
test: add some tests for Telemetry API (#2787)
# What does this PR do?

## Test Plan
ENABLE_OLLAMA=ollama LLAMA_STACK_CONFIG=starter uv run pytest
tests/integration/telemetry
--text-model="ollama/llama3.2:3b-instruct-fp16"
2025-07-17 16:20:51 -07:00
Mustafa Elbehery
73868ce9e3
chore(test): migrate unit tests from unittest to pytest for server en… (#2795)
This PR replaces unittest with pytest.

Part of https://github.com/meta-llama/llama-stack/issues/2680

cc @leseb

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-07-17 16:20:12 -07:00
Matthew Farrellee
477bcd4d09
feat: allow dynamic model registration for nvidia inference provider (#2726)
# What does this PR do?

let's users register models available at
https://integrate.api.nvidia.com/v1/models that isn't already in
llama_stack/providers/remote/inference/nvidia/models.py

## Test Plan

1. run the nvidia distro
2. register a model from https://integrate.api.nvidia.com/v1/models that
isn't already know, as of this writing
nvidia/llama-3.1-nemotron-ultra-253b-v1 is a good example
3. perform inference w/ the model
2025-07-17 12:11:30 -07:00
Matthew Farrellee
57745101be
chore: internal change, make Model.provider_model_id non-optional (#2690)
Some checks failed
Integration Tests / discover-tests (push) Successful in 13s
Test Llama Stack Build / generate-matrix (push) Successful in 14s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 21s
Python Package Build Test / build (3.12) (push) Failing after 25s
Test Llama Stack Build / build-single-provider (push) Failing after 30s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 30s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 30s
Unit Tests / unit-tests (3.12) (push) Failing after 32s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 40s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 29s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 32s
Unit Tests / unit-tests (3.13) (push) Failing after 36s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 42s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 36s
Test External Providers / test-external-providers (venv) (push) Failing after 36s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 36s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 42s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 40s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 49s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 47s
Python Package Build Test / build (3.13) (push) Failing after 1m51s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 1m58s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 2m5s
Integration Tests / test-matrix (push) Failing after 36s
Test Llama Stack Build / build (push) Failing after 37s
Pre-commit / pre-commit (push) Successful in 3m40s
- POST /v1/models accepts optional provider_model_id
- ModelsRoutingTable.register_model handler ensures it is non-None,
providing a default

usage of Model.provider_model_id will no longer need to detect None
2025-07-17 08:26:57 -07:00
Derek Higgins
c2b64dce5b
fix: Move sentence-transformers to the top (#2703)
Move sentence-transformers to be the first embedding in the list of
models. This ensures it will always be the default and is more
consistent then having the default change based on what env variables
are available

Closes: #2702

## Test Plan
Manually verified

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-07-17 10:31:30 -04:00
ehhuang
51b179e1c5
chore: update k8s template (#2786)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests / discover-tests (push) Successful in 3s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 8s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 14s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
Update ReadTheDocs / update-readthedocs (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 11s
Test External Providers / test-external-providers (venv) (push) Failing after 50s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 58s
Unit Tests / unit-tests (3.13) (push) Failing after 54s
Integration Tests / test-matrix (push) Failing after 53s
Pre-commit / pre-commit (push) Successful in 1m40s
# What does this PR do?
- enables auth
- updates to use distribution-starter docker

## Test Plan
bash apply.sh
2025-07-16 15:07:26 -07:00
IAN MILLER
b57db11bed
feat: create dynamic model registration for OpenAI and Llama compat remote inference providers (#2745)
Some checks failed
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 5s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 2s
Test Llama Stack Build / generate-matrix (push) Successful in 6s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 9s
Update ReadTheDocs / update-readthedocs (push) Failing after 3s
Test Llama Stack Build / build-single-provider (push) Failing after 7s
Integration Tests / discover-tests (push) Successful in 13s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 13s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 15s
Integration Tests / test-matrix (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 19s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 17s
Test External Providers / test-external-providers (venv) (push) Failing after 17s
Test Llama Stack Build / build (push) Failing after 14s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 35s
Python Package Build Test / build (3.12) (push) Failing after 51s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 57s
Unit Tests / unit-tests (3.13) (push) Failing after 53s
Pre-commit / pre-commit (push) Successful in 1m42s
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
The purpose of this task is to create a solution that can automatically
detect when new models are added, deprecated, or removed by OpenAI and
Llama API providers, and automatically update the list of supported
models in LLamaStack.

This feature is vitally important in order to avoid missing new models
and editing the entries manually hence I created automation allowing
users to dynamically register:
- any models from OpenAI provider available at 
[https://api.openai.com/v1/models](https://api.openai.com/v1/models)
that are not in
[https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/openai/models.py](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/openai/models.py)

- any models from Llama API provider available at
[https://api.llama.com/v1/models](https://api.llama.com/v1/models) that
are not in
[https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/llama_openai_compat/models.py](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/llama_openai_compat/models.py)

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Closes #2504

this PR is dependant on #2710

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

1. Create venv at root llamastack directory:
`uv venv .venv --python 3.12 --seed`    
2. Activate venv:
`source .venv/bin/activate`   
3. `uv pip install -e .`
4. Create OpenAI distro modifying run.yaml
5. Build distro:
`llama stack build --template starter --image-type venv`
6. Then run LlamaStack, but before navigate to templates/starter folder:
`llama stack run run.yaml --image-type venv OPENAI_API_KEY=<YOUR_KEY>
ENABLE_OPENAI=openai`
7. Then try to register dummy llm that doesn't exist in OpenAI provider:
` llama-stack-client models register ianm/ianllm
--provider-model-id=ianllm --provider-id=openai `
 
You should receive this output - combined list of static config +
fetched available models from OpenAI:
 
<img width="1380" height="474" alt="Screenshot 2025-07-14 at 12 48 50"
src="https://github.com/user-attachments/assets/d26aad18-6b15-49ee-9c49-b01b2d33f883"
/>

8. Then register real llm from OpenAI:
llama-stack-client models register openai/gpt-4-turbo-preview
--provider-model-id=gpt-4-turbo-preview --provider-id=openai

<img width="1253" height="613" alt="Screenshot 2025-07-14 at 13 43 02"
src="https://github.com/user-attachments/assets/60a5c9b1-3468-4eb9-9e92-cd7d21de3ca0"
/>
<img width="1288" height="655" alt="Screenshot 2025-07-14 at 13 43 11"
src="https://github.com/user-attachments/assets/c1e48871-0e24-4bd9-a0b8-8c95552a51ee"
/>

We correctly fetched all available models from OpenAI

As for Llama API, as a non-US person I don't have access to Llama API
Key but I joined wait list. The implementation for Llama is the same as
for OpenAI since Llama is openai compatible. So, the response from GET
endpoint has the same structure as OpenAI
https://llama.developer.meta.com/docs/api/models
2025-07-16 12:49:38 -04:00
Charlie Doern
6c516d391b
fix: de-clutter llama stack run logs (#2783)
# What does this PR do?

currently each disabled provider is printed as a warning, switch to
debug. This level of verbosity isn't necessary, especially if we intend
to grow the list of providers over time that can be in a single run yaml


## Test Plan

before:

<img width="1144" height="667" alt="Screenshot 2025-07-16 at 12 37
18 PM"
src="https://github.com/user-attachments/assets/d14dbf76-6e40-4996-8a27-111e6a987d71"
/>

after:
<img width="925" height="141" alt="Screenshot 2025-07-16 at 12 37 42 PM"
src="https://github.com/user-attachments/assets/81efdbe1-923c-4c5f-9731-f89729043920"
/>

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-16 09:44:26 -07:00
Nathan Weinberg
919ee3199b
docs: add missing bold title to match others (#2782)
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-16 18:05:48 +02:00
Sergey Yedrikov
30be1fd8b7
fix: SQLiteVecIndex.create(..., bank_id="test_bank.123") - bank_id with a dot - leads to sqlite3.OperationalError (#2770) (#2771)
# What does this PR do?
Resolves https://github.com/meta-llama/llama-stack/issues/2770. It
replaces characters in SQLite table names that are not alphanumeric or
underscores with underscores and quotes the table names with square
brackets in SQL statements.

Closes #[2770]

## Test Plan
I added a ".123" suffix to the bank_id on the following line
```
    index = await SQLiteVecIndex.create(dimension=embedding_dimension, db_path=db_path, bank_id="test_bank.123")
```
in tests/unit/providers/vector_io/test_sqlite_vec.py, which, without the
fix in place, demonstrates the issue.
2025-07-16 08:25:44 -07:00
Nathan Weinberg
72e606355d
fix: add shutdown function for localfs provider (#2781)
# What does this PR do?
this was causing an unnessessary logger warning

## Test Plan
Run `LLAMA_STACK_DIR=. ENABLE_OLLAMA=ollama
OLLAMA_INFERENCE_MODEL=llama3.2:3b llama stack build --template starter
--image-type venv --run` and then `Crtl-C` to shutdown

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-16 08:24:57 -07:00
Nathan Weinberg
3165197b75
chore: remove 'gha_workflow_llama_stack_tests.yml' (#2767)
This was introduced in
https://github.com/meta-llama/llama-stack/pull/523 but as far as I can
tell has never been used. It's been over six months so it feels fair to
remove it at this point.

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-16 07:12:26 -07:00
Matthew Farrellee
a3e249807b
chore: remove vision model URL workarounds and simplify client creation (#2775)
The vision models are now available at the standard URL, so the
workaround code has been removed. This also simplifies the codebase by
eliminating the need for per-model client caching.

- Remove special URL handling for meta/llama-3.2-11b/90b-vision-instruct
models
- Convert _get_client method to _client property for cleaner API
- Remove unnecessary lru_cache decorator and functools import
- Simplify client creation logic to use single base URL for all models
2025-07-16 07:10:04 -07:00
IAN MILLER
fa1bb9ae00
docs: fix typo and link self loop for index.html#running-tests (#2777)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR fixes typo "here here" and self loop link at
[https://llama-stack.readthedocs.io/en/latest/contributing/index.html#tests/README.md](https://llama-stack.readthedocs.io/en/latest/contributing/index.html#tests/README.md)

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Closes #2762

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
2025-07-16 07:09:44 -07:00
Sébastien Han
ff9d4d8a9d
ci: do not pull model (#2776)
the model is now available in the container image

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-16 04:58:05 -07:00
Sébastien Han
f85189022c
fix: re-hydrate requirement and fix package (#2774)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 2s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 5s
Integration Tests / discover-tests (push) Successful in 6s
Test Llama Stack Build / generate-matrix (push) Successful in 10s
Test Llama Stack Build / build-single-provider (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 12s
Test External Providers / test-external-providers (venv) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 13s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 11s
Unit Tests / unit-tests (3.13) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 15s
Integration Tests / test-matrix (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 16s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 18s
Unit Tests / unit-tests (3.12) (push) Failing after 12s
Python Package Build Test / build (3.12) (push) Failing after 23s
Update ReadTheDocs / update-readthedocs (push) Failing after 21s
Python Package Build Test / build (3.13) (push) Failing after 26s
Test Llama Stack Build / build (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 28s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 30s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 35s
Pre-commit / pre-commit (push) Successful in 1m20s
Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-16 05:46:15 -04:00
Ashwin Bharambe
95fdc8ea94 build: Bump version to 0.2.15 2025-07-15 20:29:08 -07:00
Kelly Brown
b096794959
docs: Reorganize documentation on the webpage (#2651)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 2s
Integration Tests / discover-tests (push) Successful in 2s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 17s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 19s
Python Package Build Test / build (3.12) (push) Failing after 14s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 15s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 20s
Unit Tests / unit-tests (3.13) (push) Failing after 15s
Test Llama Stack Build / generate-matrix (push) Successful in 16s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 20s
Test External Providers / test-external-providers (venv) (push) Failing after 17s
Update ReadTheDocs / update-readthedocs (push) Failing after 15s
Test Llama Stack Build / build-single-provider (push) Failing after 21s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 18s
Unit Tests / unit-tests (3.12) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 26s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 28s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 23s
Python Package Build Test / build (3.13) (push) Failing after 44s
Test Llama Stack Build / build (push) Failing after 25s
Integration Tests / test-matrix (push) Failing after 46s
Pre-commit / pre-commit (push) Successful in 2m24s
# What does this PR do?
Reorganizes the Llama stack webpage into more concise index pages,
introduce more of a workflow, and reduce repetition of content.

New nav structure so far based on #2637 

Further discussions in
https://github.com/meta-llama/llama-stack/discussions/2585

**Preview:**
![Screenshot 2025-07-09 at 2 31
53 PM](https://github.com/user-attachments/assets/4c1f3845-b328-4f12-9f20-3f09375007af)

You can also build a full local preview locally 

 **Feedback**
Looking for feedback on page titles and general feedback on the new
structure

**Follow up documentation**
I plan on reducing some sections and standardizing some terminology in a
follow up PR.
More discussions on that in
https://github.com/meta-llama/llama-stack/discussions/2585
2025-07-15 14:19:35 -07:00
Francisco Arceo
e1755d1ed2
chore: Adding OpenAI Vector Stores Files API compatibility for PGVector (#2755)
# What does this PR do?
Adding OpenAI Vector Stores Files API compatibility for PGVector

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
Updated CI to include PGVector

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-15 15:46:49 -04:00
ehhuang
e64e4fc5a2
test: add tests against published client (#2752)
# What does this PR do?
closes #2751

## Test Plan

---------

Co-authored-by: Nathan Weinberg <31703736+nathan-weinberg@users.noreply.github.com>
2025-07-15 12:25:31 -07:00
Mark Campbell
65fcd03461
docs: update outdated llama stack client documentation (#2758)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
Adds new documentation that was missing for the Llama Stack Python
Client as well as updates old/outdated docs
2025-07-15 11:49:59 -07:00
Nathan Weinberg
b3d86ca926
fix: stop image_name from being cast to an integer (#2759)
Some checks failed
Integration Tests / discover-tests (push) Successful in 3s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 3s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 8s
Integration Tests / test-matrix (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 12s
Unit Tests / unit-tests (3.12) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 13s
Test External Providers / test-external-providers (venv) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 12s
Unit Tests / unit-tests (3.13) (push) Failing after 10s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 18s
Update ReadTheDocs / update-readthedocs (push) Failing after 40s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 51s
Pre-commit / pre-commit (push) Successful in 2m1s
# What does this PR do?

https://github.com/meta-llama/llama-stack/pull/2490 introduced a new
function for type conversion of strings.

However, a side effect of this is that it will cast any string that can
be cast to an integer if possible, which for something like `image_name`
is not desired as we only accept strings for this field in the
`StackRunConfig`

This PR introduces logic to ensure that `image_name` remains a string 

Closes #2749

## Test Plan

You can run the original step to reproduce from the bug to verify this
manually
```bash
OPENAI_API_KEY=bogus llama stack build --image-type venv --image-name 2745 --providers inference=remote::openai --run
```

I have also added an additional unit test to prevent any future
regression here

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-15 09:44:21 -07:00
Francisco Arceo
31b088978a
fix: Fix /vector-stores/create API when vector store with duplicate name (#2617)
# What does this PR do?

Resolves https://github.com/meta-llama/llama-stack/issues/2735

Currently, if you test against OpenAI's Vector Stores API the
`client.vector_stores.search` call fails with an invalid vector_db
during routing (see the script referenced in the clickable item under
the Test Plan section).

This PR ensures that `client.vector_stores.search()` is compatible with
OpenAI's Vector Stores API.

Two biggest changes:
1. The `name`, which was previously used as the `vector_db_id`, has been
changed to be consistent with OpenAI's `vs_{uuid}` format.
2. The vector store ID has to be referenced by the ID, the name is not
reliable as every `client.vector_stores.create` results in a new vector
store.

NOTE: I believe this is a breaking change for end users as they'll need
to update their VectorDB identifiers.

## Test Plan
Unit tests:
```bash
./scripts/unit-tests.sh tests/unit/providers/vector_io/ -v
```
Integration tests:
```bash
ENABLE_MILVUS=milvus llama stack run /Users/farceo/dev/llama-stack/llama_stack/templates/starter/run.yaml --image-type venv

LLAMA_STACK_CONFIG=http://localhost:8321 pytest -sv tests/integration/vector_io/test_openai_vector_stores.py --embedding-model=all-MiniLM-L6-v2 -vv
```

Unit tests and test script below 👇 

<details> 
<summary>Click here for script used to test OpenAI and Llama Stack
Vector Store implementation</summary>

```python
import json
import argparse
from openai import OpenAI, pagination
import logging
from colorama import Fore, Style, init
import traceback
import os

# Initialize colorama for color support in terminal
init(autoreset=True)

# Setup basic logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

DEMO_VECTOR_STORE_NAME = "Support FAQ FJA"
global DEMO_VECTOR_STORE_ID
global DEMO_VECTOR_STORE_ID2


def colored_print(color, text):
    """Prints text to the console with the specified color."""
    print(f"{color}{text}{Style.RESET_ALL}")


def log_and_print(color, message, level=logging.INFO):
    """Logs a message and prints it to the console with the specified color."""
    logging.log(level, message)
    colored_print(color, message)


def run_tests(client, prefix="openai"):
    """
    Runs all tests using the provided OpenAI client and saves the output
    to JSON files with the given prefix.
    """
    # Create the directory if it doesn't exist
    os.makedirs('openai_testing', exist_ok=True)

    # Default values in case tests fail
    global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
    DEMO_VECTOR_STORE_ID = None
    DEMO_VECTOR_STORE_ID2 = None

    def test_idempotent_vector_store_creation():
        """
        Test that creating a vector store with the same name is idempotent.
        """
        log_and_print(Fore.BLUE, "Starting vector store creation test...")
        try:
            vector_store = client.vector_stores.create(
                name=DEMO_VECTOR_STORE_NAME,
            )

            # Attempt to create the same vector store again
            vector_store2 = client.vector_stores.create(
                name=DEMO_VECTOR_STORE_NAME,
            )

            # Check instead of assert
            if vector_store2.id != vector_store.id:
                log_and_print(Fore.YELLOW, f"FAILED IDEMPOTENCY: the same VectorStore name for {prefix.upper()} does not return the same ID",
                              level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, f"PASSED IDEMPOTENCY: f{vector_store2.id} == {vector_store.id} the same VectorStore name for {prefix.upper()} returns the same ID")

            vector_store_data = vector_store.to_dict()
            log_and_print(Fore.WHITE, f"vector_stores.create = {json.dumps(vector_store_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_create.json', 'w') as f:
                json.dump(vector_store_data, f, indent=2)

            global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
            DEMO_VECTOR_STORE_ID = vector_store.id
            DEMO_VECTOR_STORE_ID2 = vector_store2.id
            return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
        except Exception as e:
            log_and_print(Fore.RED, f"Idempotent vector store creation test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())
            # Create a fallback vector store ID if needed
            if 'vector_store' in locals() and vector_store:
                DEMO_VECTOR_STORE_ID = vector_store.id
            return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2

    def test_vector_store_list():
        """
        Test listing vector stores.
        """
        log_and_print(Fore.BLUE, "Starting vector store list test...")
        try:
            vector_stores = client.vector_stores.list()

            # Check instead of assert
            if not isinstance(vector_stores, pagination.SyncCursorPage):
                log_and_print(Fore.YELLOW, f"FAILED: Expected a list of vector stores, got {type(vector_stores)}",
                              level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Vector store list test passed!")

            vector_stores_data = vector_stores.to_dict()
            log_and_print(Fore.WHITE, f"vector_stores.list = {json.dumps(vector_stores_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_list.json', 'w') as f:
                json.dump(vector_stores_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Vector store list test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_retrieve_vector_store():
        """
        Test retrieving a specific vector store.
        """
        log_and_print(Fore.BLUE, "Starting retrieve vector store test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping retrieve vector store test - no vector store ID available",
                          level=logging.WARNING)
            return

        try:
            vector_store = client.vector_stores.retrieve(
                vector_store_id=DEMO_VECTOR_STORE_ID,
            )

            # Check instead of assert
            if vector_store.id != DEMO_VECTOR_STORE_ID:
                log_and_print(Fore.YELLOW, "FAILED: Retrieved vector store ID does not match", level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Retrieve vector store test passed!")

            vector_store_data = vector_store.to_dict()
            log_and_print(Fore.WHITE, f"vector_stores.retrieve = {json.dumps(vector_store_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_retrieve.json', 'w') as f:
                json.dump(vector_store_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Retrieve vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_modify_vector_store():
        """
        Test modifying a vector store.
        """
        log_and_print(Fore.BLUE, "Starting modify vector store test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping modify vector store test - no vector store ID available",
                          level=logging.WARNING)
            return

        try:
            updated_vector_store = client.vector_stores.update(
                vector_store_id=DEMO_VECTOR_STORE_ID,
                name="Updated Support FAQ FJA",
            )

            # Check instead of assert
            if updated_vector_store.name != "Updated Support FAQ FJA":
                log_and_print(Fore.YELLOW, "FAILED: Vector store name was not updated correctly", level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Modify vector store test passed!")

            updated_vector_store_data = updated_vector_store.to_dict()
            log_and_print(Fore.WHITE, f"vector_stores.modify = {json.dumps(updated_vector_store_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_modify.json', 'w') as f:
                json.dump(updated_vector_store_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Modify vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_delete_vector_store():
        """
        Test deleting a vector store.
        """
        log_and_print(Fore.BLUE, "Starting delete vector store test...")
        if not DEMO_VECTOR_STORE_ID2:
            log_and_print(Fore.YELLOW, "Skipping delete vector store test - no second vector store ID available",
                          level=logging.WARNING)
            return

        try:
            response = client.vector_stores.delete(
                vector_store_id=DEMO_VECTOR_STORE_ID2,
            )

            log_and_print(Fore.GREEN, "Delete vector store test passed!")

            response_data = response.to_dict()
            log_and_print(Fore.WHITE, f"Vector store delete response = {json.dumps(response_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_delete.json', 'w') as f:
                json.dump(response_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Delete vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_create_vector_store_file():
        log_and_print(Fore.BLUE, "Starting create vector store file test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping create vector store file test - no vector store ID available",
                          level=logging.WARNING)
            return

        try:
            # create jsonl of files as an example
            with open("mydata.jsonl", "w") as f:
                f.write('{"text": "What is the return policy?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "How do I reset my password?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "Where can I find my order history?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "What are the shipping options?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "What is your favorite banana?", "metadata": {"category": "support"}}\n')

            # Create a simple text file if my_data_small.txt doesn't exist
            if not os.path.exists("my_data_small.txt"):
                with open("my_data_small.txt", "w") as f:
                    f.write("This is a test file for vector store testing.\n")

            created_file = client.files.create(
                file=open("my_data_small.txt", "rb"),
                purpose="assistants",
            )

            created_file_data = created_file.to_dict()
            log_and_print(Fore.WHITE, f"Created file {json.dumps(created_file_data, indent=2)}")
            with open(f'openai_testing/{prefix}_file_create.json', 'w') as f:
                json.dump(created_file_data, f, indent=2)

            retrieved_files = client.files.retrieve(created_file.id)
            retrieved_files_data = retrieved_files.to_dict()
            log_and_print(Fore.WHITE, f"Retrieved file {json.dumps(retrieved_files_data, indent=2)}")
            with open(f'openai_testing/{prefix}_file_retrieve.json', 'w') as f:
                json.dump(retrieved_files_data, f, indent=2)

            vector_store_file = client.vector_stores.files.create(
                vector_store_id=DEMO_VECTOR_STORE_ID,
                file_id=created_file.id,
            )
            log_and_print(Fore.GREEN, "Create vector store file test passed!")
        except Exception as e:
            log_and_print(Fore.RED, f"Create vector store file test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_search_vector_store():
        """
        Test searching a vector store.
        """
        log_and_print(Fore.BLUE, "Starting search vector store test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping search vector store test - no vector store ID available",
                          level=logging.WARNING)
            return

        try:
            query = "What is the banana policy?"
            search_results = client.vector_stores.search(
                vector_store_id=DEMO_VECTOR_STORE_ID,
                query=query,
                max_num_results=10,
                ranking_options={
                    'ranker': 'default-2024-11-15',
                    'score_threshold': 0.0,
                },
                rewrite_query=False,
            )

            # Check instead of assert
            if not isinstance(search_results, pagination.SyncPage):
                log_and_print(Fore.YELLOW, f"FAILED: Expected a list of search results, got {type(search_results)}",
                              level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Search vector store test passed!")

            search_results_dict = search_results.to_dict()
            log_and_print(Fore.WHITE, f"Search results = {search_results_dict}")
            with open(f'openai_testing/{prefix}_vector_store_search.json', 'w') as f:
                json.dump(search_results_dict, f, indent=2)

            log_and_print(Fore.WHITE, f"vector_stores.search = {search_results.to_json()}")
        except Exception as e:
            log_and_print(Fore.RED, f"Search vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    # Run all tests in sequence, even if some fail
    test_results = []

    try:
        result = test_idempotent_vector_store_creation()
        if result and len(result) == 2:
            DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 = result
        test_results.append(True)
    except Exception as e:
        log_and_print(Fore.RED, f"Vector store creation test failed: {e}", level=logging.ERROR)
        logging.error(traceback.format_exc())
        test_results.append(False)

    for test_func in [
        test_vector_store_list,
        test_retrieve_vector_store,
        test_modify_vector_store,
        test_delete_vector_store,
        test_create_vector_store_file,
        test_search_vector_store
    ]:
        try:
            test_func()
            test_results.append(True)
        except Exception as e:
            log_and_print(Fore.RED, f"{test_func.__name__} failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())
            test_results.append(False)

    if all(test_results):
        log_and_print(Fore.GREEN, f"All {prefix} tests completed successfully!")
    else:
        failed_count = test_results.count(False)
        log_and_print(Fore.YELLOW, f"{failed_count} {prefix} test(s) failed, but script completed.")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Run OpenAI and/or LlamaStack tests.")
    parser.add_argument(
        "--provider",
        type=str,
        default="llama",
        choices=["openai", "llama", "both"],
        help="Specify which environment to test: openai, llama, or both. Default is both.",
    )
    args = parser.parse_args()

    try:
        if args.provider in ("openai", "both"):
            openai_client = OpenAI()
            run_tests(openai_client, prefix="openai")

        if args.provider in ("llama", "both"):
            llama_client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")
            run_tests(llama_client, prefix="llama")

        log_and_print(Fore.GREEN, "All tests completed!")

    except Exception as e:
        log_and_print(Fore.RED, f"Tests failed to complete: {e}", level=logging.ERROR)
        logging.error(traceback.format_exc())
```
</details>

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-15 11:24:41 -04:00
ehhuang
5400a2e2b1
chore: remove tests.yaml (#2754)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 7s
Python Package Build Test / build (3.13) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 9s
Unit Tests / unit-tests (3.12) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 10s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 13s
Test External Providers / test-external-providers (venv) (push) Failing after 10s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 19s
Integration Tests / discover-tests (push) Successful in 23s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 26s
Python Package Build Test / build (3.12) (push) Failing after 22s
Integration Tests / test-matrix (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 28s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 30s
Unit Tests / unit-tests (3.13) (push) Failing after 57s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 1m2s
Pre-commit / pre-commit (push) Successful in 1m51s
# What does this PR do?
Don't think this is used anymore

## Test Plan
2025-07-14 22:02:37 -07:00
Varsha
4ae5656c2f
feat: Implement keyword search in milvus (#2231)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 7s
Integration Tests / discover-tests (push) Successful in 8s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 10s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 6s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 8s
Test Llama Stack Build / generate-matrix (push) Successful in 8s
Python Package Build Test / build (3.13) (push) Failing after 6s
Unit Tests / unit-tests (3.12) (push) Failing after 6s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 15s
Test External Providers / test-external-providers (venv) (push) Failing after 9s
Test Llama Stack Build / build-single-provider (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 14s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 19s
Integration Tests / test-matrix (push) Failing after 8s
Test Llama Stack Build / build (push) Failing after 5s
Python Package Build Test / build (3.12) (push) Failing after 51s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 55s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 57s
Update ReadTheDocs / update-readthedocs (push) Failing after 50s
Pre-commit / pre-commit (push) Successful in 2m9s
# What does this PR do?
This PR adds the keyword search implementation for Milvus. Along with
the implementation for remote Milvus, the tests require us to start a
Milvus containers locally.

In order to verify the implementation, run:
```
pytest tests/unit/providers/vector_io/remote/test_milvus.py -v -s --tb=short --disable-warnings --asyncio-mode=auto
```

You can also test the changes using the below script:
```
#!/usr/bin/env python3
import asyncio
import os
import uuid
from typing import List

from llama_stack_client import (
    Agent, 
    AgentEventLogger, 
    LlamaStackClient, 
    RAGDocument
)


class MilvusRAGDemo:
    def __init__(self, base_url: str = "http://localhost:8321/"):
        self.client = LlamaStackClient(base_url=base_url)
        self.vector_db_id = f"milvus_rag_demo_{uuid.uuid4().hex[:8]}"
        self.model_id = None
        self.embedding_model_id = None
        self.embedding_dimension = None
        
    def setup_models(self):
        """Get available models and select appropriate ones for LLM and embeddings."""
        models = self.client.models.list()
    
        # Select embedding model
        embedding_models = [m for m in models if m.model_type == "embedding"]
        if not embedding_models:
            raise ValueError("No embedding models found")
        self.embedding_model_id = embedding_models[0].identifier
        self.embedding_dimension = embedding_models[0].metadata["embedding_dimension"]
        
    def register_vector_db(self):
        print(f"Registering Milvus vector database: {self.vector_db_id}")
        
        response = self.client.vector_dbs.register(
            vector_db_id=self.vector_db_id,
            embedding_model=self.embedding_model_id,
            embedding_dimension=self.embedding_dimension,
            provider_id="milvus-remote",  # Use remote Milvus
        )
        print(f"Vector database registered successfully")
        return response
        
    def insert_documents(self):
        """Insert sample documents into the vector database."""
        print("\nInserting sample documents...")
        
        # Sample documents about different topics
        documents = [
            RAGDocument(
                document_id="ai_ml_basics",
                content="""
                Artificial Intelligence (AI) and Machine Learning (ML) are transforming the world.
                AI refers to the simulation of human intelligence in machines, while ML is a subset
                of AI that enables computers to learn and improve from experience without being
                explicitly programmed. Deep learning, a subset of ML, uses neural networks with
                multiple layers to process complex patterns in data.
                
                Key concepts in AI/ML include:
                - Supervised Learning: Training with labeled data
                - Unsupervised Learning: Finding patterns in unlabeled data
                - Reinforcement Learning: Learning through trial and error
                - Neural Networks: Computing systems inspired by biological brains
                """,
                mime_type="text/plain",
                metadata={"topic": "technology", "category": "ai_ml"},
            ),
        ]
        
        # Insert documents with chunking
        self.client.tool_runtime.rag_tool.insert(
            documents=documents,
            vector_db_id=self.vector_db_id,
            chunk_size_in_tokens=200,  # Smaller chunks for better granularity
        )
        print(f"Inserted {len(documents)} documents with chunking")
                
    def test_keyword_search(self):
        """Test keyword-based search using BM25."""
        
        queries = [
            "neural networks",
            "Python frameworks",
            "data cleaning",
        ]
        
        for query in queries:
            response = self.client.vector_io.query(
                vector_db_id=self.vector_db_id,
                query=query,
                params={
                    "mode": "keyword",  # Keyword search
                    "max_chunks": 3,
                    "score_threshold": 0.0,
                }
            )
            
            for i, (chunk, score) in enumerate(zip(response.chunks, response.scores)):
                print(f"  {i+1}. Score: {score:.4f}")
                print(f"     Content: {chunk.content[:100]}...")
                print(f"     Metadata: {chunk.metadata}")    

                
    def run_demo(self):       
        try:
            self.setup_models()
            self.register_vector_db()
            self.insert_documents()
            self.test_keyword_search()
        except Exception as e:
            print(f"Error during demo: {e}")
            raise


def main():
    """Main function to run the demo."""
    # Check if Llama Stack server is running
    demo = MilvusRAGDemo()    
    try:
        demo.run_demo()
    except Exception as e:
        print(f"Demo failed: {e}")

if __name__ == "__main__":
    main()
```

[//]: # (## Documentation)

---------

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
2025-07-14 19:39:55 -04:00
Francisco Arceo
33f0d83ad3
chore: Move vector store kvstore implementation into openai_vector_store_mixin.py (#2748) 2025-07-14 18:10:35 -04:00