# What does this PR do?
closes https://github.com/llamastack/llama-stack/issues/3236
mypy considered our default implementations (`raise NotImplementedError`)
to be trivial. as a result, we implemented the same stubs in each
provider.
this change puts enough into the default impls that mypy considers them
non-trivial, which allows us to remove the duplicate implementations.
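A minimal sketch of the idea, assuming mypy treats a body that consists only of `raise NotImplementedError` as trivial and stops doing so once the body contains another statement. `ExampleAPI`, `ExampleProvider`, and `rerank` are hypothetical names used for illustration, not the actual llama-stack definitions:

```python
from typing import Protocol


class ExampleAPI(Protocol):
    """Hypothetical API protocol with a shared default implementation."""

    def rerank(self, query: str) -> list[str]:
        # Building the message first gives the body a statement besides the
        # bare `raise`, which (by assumption) mypy no longer counts as trivial.
        message = f"{type(self).__name__} does not implement rerank"
        raise NotImplementedError(message)


class ExampleProvider(ExampleAPI):
    """Inherits the default instead of re-declaring the same stub."""


def call_default() -> str:
    """Call the inherited default and return the error message it raises."""
    try:
        ExampleProvider().rerank("hello")
    except NotImplementedError as exc:
        return str(exc)
    return "no error"
```

With a default like this, a provider only overrides the methods it actually supports.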
# What does this PR do?
Context: https://github.com/meta-llama/llama-stack/issues/2937
The API design is inspired by existing offerings, but not exactly the
same:
* `top_n` as the parameter to control the number of results, instead of
`top_k`, since `n` is the conventional name for a count
* `truncation` bool instead of `max_token_per_doc`, since we should just
handle truncation automatically depending on model capability, instead
of the user setting the context length manually.
* `data` field in the response, to be consistent with other OpenAI APIs
(though they don't have a rerank API). Also, it is one less name to
learn in the API.
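A rough sketch of the request and response shape described above. Only `top_n`, `truncation`, and `data` come from this description; the `index` and `relevance_score` fields, the `max_chars` limit, and the word-overlap scoring are placeholders for illustration, not the real implementation:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RerankData:
    index: int              # position of the item in the request (hypothetical field)
    relevance_score: float  # higher means more relevant (hypothetical field)


@dataclass
class RerankResponse:
    data: list[RerankData]  # `data` mirrors other OpenAI-style responses


def rerank(query: str, items: list[str], top_n: Optional[int] = None,
           truncation: bool = True, max_chars: int = 200) -> RerankResponse:
    """Toy reranker: scores each item by the number of words shared with the query."""
    query_words = set(query.lower().split())
    scored = []
    for index, item in enumerate(items):
        if len(item) > max_chars:
            if not truncation:
                raise ValueError(f"item {index} exceeds the model limit")
            item = item[:max_chars]  # truncate automatically instead of failing
        overlap = len(query_words & set(item.lower().split()))
        scored.append(RerankData(index=index, relevance_score=float(overlap)))
    scored.sort(key=lambda entry: entry.relevance_score, reverse=True)
    return RerankResponse(data=scored[:top_n])  # top_n=None keeps every result


response = rerank(
    "reset password",
    ["shipping options", "how to reset my password", "password rules"],
    top_n=2,
)
```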
## Test Plan
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
# What does this PR do?
This PR renames the categories of the `llama_stack` loggers so that they
match the package name, and addresses the review comments from the initial
https://github.com/meta-llama/llama-stack/pull/2868. This is a follow-up
to #3061.
Replaces https://github.com/meta-llama/llama-stack/pull/2868
Part of https://github.com/meta-llama/llama-stack/issues/2865
cc @leseb @rhuss
Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
# What does this PR do?
Currently the embedding integration test cases fail due to a
misalignment in the error type. This PR fixes the embedding integration
test by fixing the error type.
## Test Plan
```
pytest -s -v tests/integration/inference/test_embedding.py --stack-config="inference=nvidia" --embedding-model="nvidia/llama-3.2-nv-embedqa-1b-v2" --env NVIDIA_API_KEY={nvidia_api_key} --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com"
```
# What does this PR do?
- Documentation update and fix for the NVIDIA Inference provider.
- Update the `run_moderation` method of the safety API with a
`NotImplementedError` placeholder. Otherwise, initializing the NVIDIA
inference client raises an error.
## Test Plan
N/A
# What does this PR do?
NVIDIA asymmetric embedding models (e.g.,
`nvidia/llama-3.2-nv-embedqa-1b-v2`) require an `input_type` parameter
that is not present in the standard OpenAI embeddings API. This PR adds
`input_type="query"` as the default and updates the documentation to
suggest using the `embedding` API for passage embeddings.
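As an illustration of the default, here is a small sketch of how a request payload could carry `input_type`. The helper name `build_embedding_request` is hypothetical, and the accepted values `"query"` and `"passage"` are an assumption about the NVIDIA endpoint rather than something this PR defines:

```python
from typing import Any


def build_embedding_request(model: str, inputs: list[str],
                            input_type: str = "query") -> dict[str, Any]:
    """Build an embeddings payload that always includes `input_type`."""
    # Assumed set of valid values for asymmetric embedding models.
    if input_type not in ("query", "passage"):
        raise ValueError(f"unsupported input_type: {input_type!r}")
    return {"model": model, "input": inputs, "input_type": input_type}


payload = build_embedding_request(
    "nvidia/llama-3.2-nv-embedqa-1b-v2", ["What is the return policy?"]
)
```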
Resolves #2892
## Test Plan
```
pytest -s -v tests/integration/inference/test_openai_embeddings.py --stack-config="inference=nvidia" --embedding-model="nvidia/llama-3.2-nv-embedqa-1b-v2" --env NVIDIA_API_KEY={nvidia_api_key} --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com"
```
# What does this PR do?
This PR adds a pre-commit step that enforces use of the `llama_stack`
logger. Currently, various parts of the codebase use different loggers.
Since a custom `llama_stack` logger already exists and is used in the
codebase, it is better to standardize on it.
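One way such a check could work is sketched below. This is an assumption about the approach, not the actual hook: it parses a source file and reports the lines that import the standard `logging` module, which a pre-commit script could then turn into a failure. The function name is hypothetical:

```python
import ast


def find_stdlib_logging_imports(source: str) -> list[int]:
    """Return the line numbers that import the standard `logging` module."""
    offending = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # Covers `import logging` and `import os, logging`.
            if any(alias.name == "logging" for alias in node.names):
                offending.append(node.lineno)
        elif isinstance(node, ast.ImportFrom) and node.module == "logging":
            # Covers `from logging import getLogger`.
            offending.append(node.lineno)
    return sorted(offending)


example_source = "import os\nimport logging\nfrom logging import getLogger\n"
violations = find_stdlib_logging_imports(example_source)
```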
Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>
Some fixes to MCP tests. And a bunch of fixes for Vector providers.
I also enabled a bunch of Vector IO tests to be used with
`LlamaStackLibraryClient`
## Test Plan
Run Responses tests with llama stack library client:
```
pytest -s -v tests/integration/non_ci/responses/ --stack-config=server:starter \
--text-model openai/gpt-4o \
--embedding-model=sentence-transformers/all-MiniLM-L6-v2 \
-k "client_with_models"
```
Do the same with `-k openai_client`
The rest should be taken care of by CI.
# What does this PR do?
- Add new Vertex AI remote inference provider with litellm integration
- Support for Gemini models through Google Cloud Vertex AI platform
- Uses Google Cloud Application Default Credentials (ADC) for
authentication
- Added VertexAI models: gemini-2.5-flash, gemini-2.5-pro,
gemini-2.0-flash.
- Updated provider registry to include vertexai provider
- Updated starter template to support Vertex AI configuration
- Added comprehensive documentation and sample configuration
relates to https://github.com/meta-llama/llama-stack/issues/2747
## Test Plan
Signed-off-by: Eran Cohen <eranco@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
PR adds Flash-Lite 2.0 and 2.5 models to the Gemini inference provider
Closes #3046
## Test Plan
I was not able to locate any existing test for this provider, so I
performed manual testing. But the change is really trivial and
straightforward.
# What does this PR do?
This PR implements hybrid search for Milvus DB based on Milvus's
built-in support.
To test:
```
pytest tests/unit/providers/vector_io/remote/test_milvus.py -v -s --tb=long --disable-warnings --asyncio-mode=auto
```
Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
A bunch of miscellaneous cleanup focusing on tests, but ended up
speeding up starter distro substantially.
- Pulled llama stack client init for tests into `pytest_sessionstart` so
it does not clobber output
- Profiling of that told me where we were doing lots of heavy imports
for starter, so I made them lazy
- starter now starts 20+ seconds faster on my Mac
- A few other smallish refactors for `compat_client`
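For reference, the lazy-import pattern looks roughly like this. It is a generic sketch, with `json` standing in for a heavy dependency and `parse_payload` as a hypothetical function name:

```python
import importlib


def parse_payload(text: str):
    """Parse JSON text, loading the parser module only when first needed."""
    # Deferred import: the module is loaded on the first call rather than at
    # process startup, so startup does not pay for it.
    json_module = importlib.import_module("json")
    return json_module.loads(text)
```

Later calls are cheap because Python caches imported modules in `sys.modules`.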
# What does this PR do?
Extend the Shields protocol with the ability to unregister previously
registered shields, and add CLI commands for shield management.
Closes #2581
## Test Plan
First, test the shields API:
1. Install and start Ollama:
`ollama serve`
2. Pull Llama Guard Model in Ollama:
`ollama pull llama-guard3:8b`
3. Configure env variables:
```
export ENABLE_OLLAMA=ollama
export OLLAMA_URL=http://localhost:11434
```
4. Build Llama Stack distro:
`llama stack build --template starter --image-type venv`
5. Start Llama Stack server:
`llama stack run starter --port 8321`
6. Check if Ollama model is available:
`curl -X GET http://localhost:8321/v1/models | jq '.data[] |
select(.provider_id=="ollama")'`
7. Register a new Shield using Ollama provider:
```
curl -X POST http://localhost:8321/v1/shields \
-H "Content-Type: application/json" \
-d '{
"shield_id": "test-shield",
"provider_id": "llama-guard",
"provider_shield_id": "ollama/llama-guard3:8b",
"params": {}
}'
```
`{"identifier":"test-shield","provider_resource_id":"ollama/llama-guard3:8b","provider_id":"llama-guard","type":"shield","owner":{"principal":"","attributes":{}},"params":{}}`
8. Check if shield was registered:
`curl -X GET http://localhost:8321/v1/shields/test-shield`
`{"identifier":"test-shield","provider_resource_id":"ollama/llama-guard3:8b","provider_id":"llama-guard","type":"shield","owner":{"principal":"","attributes":{}},"params":{}}`
9. Run shield:
```
curl -X POST http://localhost:8321/v1/safety/run-shield \
-H "Content-Type: application/json" \
-d '{
"shield_id": "test-shield",
"messages": [
{
"role": "user",
"content": "How can I hack into someone computer?"
}
],
"params": {}
}'
```
`{"violation":{"violation_level":"error","user_message":"I can't answer that. Can I help with something else?","metadata":{"violation_type":"S2"}}}`
10. Unregister shield:
`curl -X DELETE http://localhost:8321/v1/shields/test-shield`
`null`
11. Verify shield was deleted:
`curl -X GET http://localhost:8321/v1/shields/test-shield`
`{"detail":"Invalid value: Shield 'test-shield' not found"}`
All tests passed ✅
```
========================================================================== 430 passed, 194 warnings in 19.54s ==========================================================================
/Users/iamiller/GitHub/llama-stack/.venv/lib/python3.12/site-packages/litellm/llms/custom_httpx/async_client_cleanup.py:78: RuntimeWarning: coroutine 'close_litellm_async_clients' was never awaited
loop.close()
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Wrote HTML report to htmlcov-3.12/index.html
```
As the title says. Distributions is in, Templates is out.
`llama stack build --template` --> `llama stack build --distro`. For
backward compatibility, the previous option is kept but results in a
warning.
Updated `server.py` to remove the "config_or_template" backward
compatibility since it has been a couple of releases since that change.
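The backward-compatible flag handling can be sketched with `argparse` as follows. This is an illustration only: the function name `parse_build_args` and the warning text are assumptions, and the real CLI wiring may differ:

```python
import argparse
import warnings
from typing import Optional


def parse_build_args(argv: list[str]) -> Optional[str]:
    """Return the distribution name, accepting the deprecated --template flag."""
    parser = argparse.ArgumentParser(prog="llama stack build")
    parser.add_argument("--distro", default=None)
    parser.add_argument("--template", default=None, help="deprecated, use --distro")
    args = parser.parse_args(argv)
    if args.template is not None:
        # Keep the old option working, but tell the user to migrate.
        warnings.warn(
            "--template is deprecated; use --distro instead",
            DeprecationWarning,
            stacklevel=2,
        )
        if args.distro is None:
            args.distro = args.template
    return args.distro
```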
# What does this PR do?
Remove score_threshold based check from `OpenAIVectorStoreMixin`
Closes: https://github.com/meta-llama/llama-stack/issues/3018
## Test Plan
# What does this PR do?
This PR removes Conda support from Llama Stack.
Closes #2539
## Test Plan
# What does this PR do?
closes #2995
update SambaNovaInferenceAdapter to efficiently use LiteLLMOpenAIMixin
## Test Plan
```
$ uv run pytest -s -v tests/integration/inference --stack-config inference=sambanova --text-model sambanova/Meta-Llama-3.1-8B-Instruct
...
======================== 10 passed, 84 skipped, 3 xfailed, 51 warnings in 8.14s ========================
```
# What does this PR do?
Adds support for the OpenAI Vector Store APIs in Qdrant.
Closes #2463
## Test Plan
Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
# What does this PR do?
This PR (1) enables the files API for Weaviate and (2) enables
integration tests for Weaviate, which adds a Docker container to the
GitHub Action.
This PR also handles a couple of edge cases in creating the collection
and ensures the tests all pass.
## Test Plan
CI enabled
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
1. Creates a new `VectorStoreNotFoundError` class
2. Implements the new class where appropriate
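A possible shape for such an error class is sketched below. The `ValueError` base class, the message format, and the `get_vector_store` helper are assumptions made for illustration, not necessarily what this PR implements:

```python
class VectorStoreNotFoundError(ValueError):
    """Raised when a vector store ID is not registered (base class is an assumption)."""

    def __init__(self, vector_store_id: str) -> None:
        super().__init__(f"Vector Store '{vector_store_id}' not found")
        self.vector_store_id = vector_store_id


def get_vector_store(stores: dict[str, dict], vector_store_id: str) -> dict:
    """Hypothetical lookup helper that raises the dedicated error on a miss."""
    try:
        return stores[vector_store_id]
    except KeyError:
        raise VectorStoreNotFoundError(vector_store_id) from None
```

A dedicated class lets callers catch the missing-store case specifically instead of matching on error strings.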
Relates to #2379
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
- Add base_url field to OpenAIConfig with default
"https://api.openai.com/v1"
- Update sample_run_config to support OPENAI_BASE_URL environment
variable
- Modify get_base_url() to return configured base_url instead of
hardcoded value
- Add comprehensive test suite covering:
- Default base URL behavior
- Custom base URL from config
- Environment variable override
- Config precedence over environment variables
- Client initialization with configured URL
- Model availability checks using configured URL
This enables users to configure custom OpenAI-compatible API endpoints
via environment variables or configuration files.
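The resolution order can be sketched as follows, assuming an explicitly configured value wins over `OPENAI_BASE_URL`, which in turn wins over the default. The dataclass below is a simplified stand-in for the real `OpenAIConfig`, and `load_config` is a hypothetical helper:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_BASE_URL = "https://api.openai.com/v1"


@dataclass
class OpenAIConfig:
    """Simplified stand-in for the provider config."""
    base_url: str = DEFAULT_BASE_URL


def load_config(explicit_base_url: Optional[str] = None,
                env: Mapping[str, str] = os.environ) -> OpenAIConfig:
    """Resolve base_url: explicit config, then environment, then default."""
    base_url = explicit_base_url or env.get("OPENAI_BASE_URL") or DEFAULT_BASE_URL
    return OpenAIConfig(base_url=base_url)
```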
Closes #2910
## Test Plan
run unit tests
Add support for deleting individual chunks from vector stores
- Add abstract remove_chunk() method to EmbeddingIndex base class
- Implement chunk deletion for Faiss provider, SQLite Vec, Milvus,
PGVector
- Placeholder implementations with NotImplementedError for
Chroma/Qdrant/Weaviate
- Integrate chunk deletion into OpenAI vector store file deletion flow
- removed xfail from
test_openai_vector_store_delete_file_removes_from_vector_store
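The split between real and placeholder implementations can be pictured like this. It is a simplified, synchronous sketch: the actual `remove_chunk()` signature may differ, and `InMemoryIndex` and `UnsupportedIndex` are hypothetical classes:

```python
from abc import ABC, abstractmethod


class EmbeddingIndex(ABC):
    """Simplified base class; only the method relevant here is shown."""

    @abstractmethod
    def remove_chunk(self, chunk_id: str) -> None:
        """Delete a single chunk from the index."""


class InMemoryIndex(EmbeddingIndex):
    """Stand-in for a provider that supports chunk deletion."""

    def __init__(self) -> None:
        self.chunks: dict[str, str] = {}

    def remove_chunk(self, chunk_id: str) -> None:
        # Removing a missing chunk is treated as a no-op in this sketch.
        self.chunks.pop(chunk_id, None)


class UnsupportedIndex(EmbeddingIndex):
    """Stand-in for a provider that only has a placeholder."""

    def remove_chunk(self, chunk_id: str) -> None:
        raise NotImplementedError("chunk deletion is not supported yet")


index = InMemoryIndex()
index.chunks = {"c1": "first chunk", "c2": "second chunk"}
index.remove_chunk("c1")
```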
Closes: #2477
---------
Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
# What does this PR do?
Enable Chroma inline unit tests and fix integration tests.
## Test Plan
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
This flips #2823 and #2805 by making the Stack periodically query the
providers for models rather than the providers going behind the Stack's
back and calling "register" on the registry themselves. This also adds
support for model listing for all other providers via `ModelRegistryHelper`.
Once this is done, we do not need to manually list or register models
via `run.yaml` and it will remove both noise and annoyance (setting
`INFERENCE_MODEL` environment variables, for example) from the new user
experience.
In addition, it adds a configuration variable `allowed_models` which can
be used to optionally restrict the set of models exposed from a
provider.
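The effect of `allowed_models` can be illustrated with a small filter. This is a sketch under the assumption that an unset value means "expose everything"; the helper name `filter_models` is hypothetical:

```python
from typing import Optional


def filter_models(available: list[str],
                  allowed_models: Optional[list[str]] = None) -> list[str]:
    """Return the provider's models, restricted to allowed_models when set."""
    if allowed_models is None:
        return list(available)  # no restriction configured
    allowed = set(allowed_models)
    return [model for model in available if model in allowed]


exposed = filter_models(["gpt-4o", "gpt-4o-mini", "o1"], allowed_models=["gpt-4o"])
```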
# What does this PR do?
openai/models.py has backward compat entries for litellm model names.
the starter template includes these in the list of registered models.
the inclusion results in duplicate model registrations.
the backward compat is no longer necessary.
## Test Plan
ci
# What does this PR do?
This PR implements the openai compatible endpoints for chromadb
Closes #2462
## Test Plan
Ran ollama llama stack server and ran the command
`pytest -sv --stack-config=http://localhost:8321
tests/integration/vector_io/test_openai_vector_stores.py
--embedding-model all-MiniLM-L6-v2`
8 failed, 27 passed, 8 skipped, 1 xfailed
The failures are all related to the files API.
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: sarthakdeshpande <sarthak.deshpande@engati.com>
Co-authored-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
# What does this PR do?
Moving vector store and vector store files helper methods to
`openai_vector_store_mixin.py`
## Test Plan
The tests are already supported in CI, which exercises the inline
providers and the current integration tests.
Note that the `vector_index` fixture will test `milvus_vec_adapter`,
`faiss_vec_adapter`, and `sqlite_vec_adapter` in
`tests/unit/providers/vector_io/test_vector_io_openai_vector_stores.py`.
Additionally, the integration tests in `integration-vector-io-tests.yml`
run the `tests/integration/vector_io` tests for the following providers:
```yaml
vector-io-provider: ["inline::faiss", "inline::sqlite-vec", "inline::milvus", "remote::chromadb", "remote::pgvector"]
```
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
add an `OpenAIMixin` for use by inference providers whose remote
endpoints support an OpenAI compatible API.
use is demonstrated by refactoring
- OpenAIInferenceAdapter
- NVIDIAInferenceAdapter (adds embedding support)
- LlamaCompatInferenceAdapter
## Test Plan
existing unit and integration tests
Just like #2805 but for vLLM.
We also make VLLM_URL env variable optional (not required) -- if not
specified, the provider silently sits idle and yells eventually if
someone tries to call a completion on it. This is done so as to allow
this provider to be present in the `starter` distribution.
## Test Plan
Set up vLLM, copy the starter template and set `{ refresh_models: true,
refresh_models_interval: 10 }` for the vllm provider and then run:
```
ENABLE_VLLM=vllm VLLM_URL=http://localhost:8000/v1 \
uv run llama stack run --image-type venv /tmp/starter.yaml
```
Verify that `llama-stack-client models list` brings up the model
correctly from vLLM.
For self-hosted providers like Ollama (or vLLM), the backing server is
running a set of models. That server should be treated as the source of
truth and the Stack registry should just be a cache for those models. Of
course, in production environments, you may not want this (because you
know what model you are running statically) hence there's a config
boolean to control this behavior.
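The "registry as a cache" idea can be sketched as a single refresh step. The function below is hypothetical and synchronous for brevity; it assumes the registry maps model IDs to provider IDs and that `list_models` asks the backing server for its current models:

```python
from typing import Callable


def refresh_registry(registry: dict[str, str], provider_id: str,
                     list_models: Callable[[], list[str]],
                     refresh_models: bool = True) -> dict[str, str]:
    """Replace one provider's cached models with what its server reports."""
    if not refresh_models:
        return registry  # static setups keep whatever was registered
    # Drop this provider's stale entries, keep everyone else's.
    refreshed = {model: owner for model, owner in registry.items() if owner != provider_id}
    for model in list_models():
        refreshed[model] = provider_id
    return refreshed


cached = {"llama3.2:3b": "ollama", "gpt-4o": "openai"}
updated = refresh_registry(cached, "ollama", lambda: ["qwen2.5:7b"])
```

Running a step like this on an interval is what makes a pruned model disappear from the listing.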
_This is part of a series of PRs aimed at removing the requirement of
needing to set `INFERENCE_MODEL` env variables for running Llama Stack
server._
## Test Plan
Copy and modify the starter.yaml template / config and enable
`refresh_models: true, refresh_models_interval: 10` for the ollama
provider. Then, run:
```
LLAMA_STACK_LOGGING=all=debug \
ENABLE_OLLAMA=ollama uv run llama stack run --image-type venv /tmp/starter.yaml
```
See a gargantuan amount of logs, but verify that the provider is
periodically refreshing models. Stop and prune a model from ollama
server, restart the server. Verify that the model goes away when I call
`uv run llama-stack-client models list`
# What does this PR do?
lets users register models available at
https://integrate.api.nvidia.com/v1/models that aren't already in
llama_stack/providers/remote/inference/nvidia/models.py
## Test Plan
1. run the nvidia distro
2. register a model from https://integrate.api.nvidia.com/v1/models that
isn't already known; as of this writing,
nvidia/llama-3.1-nemotron-ultra-253b-v1 is a good example
3. perform inference w/ the model
The vision models are now available at the standard URL, so the
workaround code has been removed. This also simplifies the codebase by
eliminating the need for per-model client caching.
- Remove special URL handling for meta/llama-3.2-11b/90b-vision-instruct
models
- Convert _get_client method to _client property for cleaner API
- Remove unnecessary lru_cache decorator and functools import
- Simplify client creation logic to use single base URL for all models
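The method-to-property change follows a pattern like the one below. This is a generic sketch: `ExampleAdapter` is hypothetical, a plain dict stands in for the real API client, and the `/chat/completions` path is only there to show that every model shares one base URL:

```python
class ExampleAdapter:
    """Hypothetical adapter that uses one base URL for every model."""

    def __init__(self, base_url: str, api_key: str) -> None:
        self._base_url = base_url
        self._api_key = api_key

    @property
    def _client(self) -> dict[str, str]:
        # Stand-in for constructing the real client. With a single base URL
        # there is nothing model-specific to cache, so no lru_cache is needed.
        return {"base_url": self._base_url, "api_key": self._api_key}

    def endpoint_for(self, model: str) -> str:
        # The model no longer influences which URL is used.
        return f"{self._client['base_url']}/chat/completions"


adapter = ExampleAdapter("https://integrate.api.nvidia.com/v1", "example-key")
```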
# What does this PR do?
Adding OpenAI Vector Stores Files API compatibility for PGVector
## Test Plan
Updated CI to include PGVector
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
Resolves https://github.com/meta-llama/llama-stack/issues/2735
Currently, if you test against OpenAI's Vector Stores API the
`client.vector_stores.search` call fails with an invalid vector_db
during routing (see the script referenced in the clickable item under
the Test Plan section).
This PR ensures that `client.vector_stores.search()` is compatible with
OpenAI's Vector Stores API.
The two biggest changes:
1. The `name`, which was previously used as the `vector_db_id`, has been
changed to be consistent with OpenAI's `vs_{uuid}` format.
2. A vector store has to be referenced by its ID; the name is not
reliable because every `client.vector_stores.create` call results in a
new vector store.
NOTE: I believe this is a breaking change for end users as they'll need
to update their VectorDB identifiers.
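To make the ID-versus-name distinction concrete, here is a toy registry. It is a sketch only: `VectorStoreRegistry` is hypothetical, and the use of `uuid.uuid4()` for the `vs_{uuid}` suffix is an assumption about how the ID is generated:

```python
import uuid


class VectorStoreRegistry:
    """Toy registry: every create() call yields a new store with a fresh ID."""

    def __init__(self) -> None:
        self._stores: dict[str, dict[str, str]] = {}

    def create(self, name: str) -> dict[str, str]:
        store = {"id": f"vs_{uuid.uuid4()}", "name": name}
        self._stores[store["id"]] = store
        return store

    def retrieve(self, vector_store_id: str) -> dict[str, str]:
        # Lookups go through the ID; names are not unique.
        return self._stores[vector_store_id]


registry = VectorStoreRegistry()
first_store = registry.create("Support FAQ")
second_store = registry.create("Support FAQ")
```

Both stores share a name but have different IDs, which is why clients need to keep the returned `id`.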
## Test Plan
Unit tests:
```bash
./scripts/unit-tests.sh tests/unit/providers/vector_io/ -v
```
Integration tests:
```bash
ENABLE_MILVUS=milvus llama stack run /Users/farceo/dev/llama-stack/llama_stack/templates/starter/run.yaml --image-type venv
LLAMA_STACK_CONFIG=http://localhost:8321 pytest -sv tests/integration/vector_io/test_openai_vector_stores.py --embedding-model=all-MiniLM-L6-v2 -vv
```
Unit tests and test script below 👇
<details>
<summary>Click here for script used to test OpenAI and Llama Stack
Vector Store implementation</summary>
```python
import json
import argparse
from openai import OpenAI, pagination
import logging
from colorama import Fore, Style, init
import traceback
import os

# Initialize colorama for color support in terminal
init(autoreset=True)

# Setup basic logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

DEMO_VECTOR_STORE_NAME = "Support FAQ FJA"
# Module-level IDs shared between the tests; set once a store is created
DEMO_VECTOR_STORE_ID = None
DEMO_VECTOR_STORE_ID2 = None


def colored_print(color, text):
    """Prints text to the console with the specified color."""
    print(f"{color}{text}{Style.RESET_ALL}")


def log_and_print(color, message, level=logging.INFO):
    """Logs a message and prints it to the console with the specified color."""
    logging.log(level, message)
    colored_print(color, message)


def run_tests(client, prefix="openai"):
    """
    Runs all tests using the provided OpenAI client and saves the output
    to JSON files with the given prefix.
    """
    # Create the directory if it doesn't exist
    os.makedirs('openai_testing', exist_ok=True)

    # Default values in case tests fail
    global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
    DEMO_VECTOR_STORE_ID = None
    DEMO_VECTOR_STORE_ID2 = None

    def test_idempotent_vector_store_creation():
        """
        Test that creating a vector store with the same name is idempotent.
        """
        global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
        log_and_print(Fore.BLUE, "Starting vector store creation test...")
        try:
            vector_store = client.vector_stores.create(
                name=DEMO_VECTOR_STORE_NAME,
            )
            # Attempt to create the same vector store again
            vector_store2 = client.vector_stores.create(
                name=DEMO_VECTOR_STORE_NAME,
            )
            # Check instead of assert
            if vector_store2.id != vector_store.id:
                log_and_print(Fore.YELLOW, f"FAILED IDEMPOTENCY: the same VectorStore name for {prefix.upper()} does not return the same ID",
                              level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, f"PASSED IDEMPOTENCY: {vector_store2.id} == {vector_store.id} the same VectorStore name for {prefix.upper()} returns the same ID")
            vector_store_data = vector_store.to_dict()
            log_and_print(Fore.WHITE, f"vector_stores.create = {json.dumps(vector_store_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_create.json', 'w') as f:
                json.dump(vector_store_data, f, indent=2)
            DEMO_VECTOR_STORE_ID = vector_store.id
            DEMO_VECTOR_STORE_ID2 = vector_store2.id
            return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
        except Exception as e:
            log_and_print(Fore.RED, f"Idempotent vector store creation test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())
            # Create a fallback vector store ID if needed
            if 'vector_store' in locals() and vector_store:
                DEMO_VECTOR_STORE_ID = vector_store.id
            return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2

    def test_vector_store_list():
        """
        Test listing vector stores.
        """
        log_and_print(Fore.BLUE, "Starting vector store list test...")
        try:
            vector_stores = client.vector_stores.list()
            # Check instead of assert
            if not isinstance(vector_stores, pagination.SyncCursorPage):
                log_and_print(Fore.YELLOW, f"FAILED: Expected a list of vector stores, got {type(vector_stores)}",
                              level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Vector store list test passed!")
            vector_stores_data = vector_stores.to_dict()
            log_and_print(Fore.WHITE, f"vector_stores.list = {json.dumps(vector_stores_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_list.json', 'w') as f:
                json.dump(vector_stores_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Vector store list test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_retrieve_vector_store():
        """
        Test retrieving a specific vector store.
        """
        log_and_print(Fore.BLUE, "Starting retrieve vector store test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping retrieve vector store test - no vector store ID available",
                          level=logging.WARNING)
            return
        try:
            vector_store = client.vector_stores.retrieve(
                vector_store_id=DEMO_VECTOR_STORE_ID,
            )
            # Check instead of assert
            if vector_store.id != DEMO_VECTOR_STORE_ID:
                log_and_print(Fore.YELLOW, "FAILED: Retrieved vector store ID does not match", level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Retrieve vector store test passed!")
            vector_store_data = vector_store.to_dict()
            log_and_print(Fore.WHITE, f"vector_stores.retrieve = {json.dumps(vector_store_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_retrieve.json', 'w') as f:
                json.dump(vector_store_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Retrieve vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_modify_vector_store():
        """
        Test modifying a vector store.
        """
        log_and_print(Fore.BLUE, "Starting modify vector store test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping modify vector store test - no vector store ID available",
                          level=logging.WARNING)
            return
        try:
            updated_vector_store = client.vector_stores.update(
                vector_store_id=DEMO_VECTOR_STORE_ID,
                name="Updated Support FAQ FJA",
            )
            # Check instead of assert
            if updated_vector_store.name != "Updated Support FAQ FJA":
                log_and_print(Fore.YELLOW, "FAILED: Vector store name was not updated correctly", level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Modify vector store test passed!")
            updated_vector_store_data = updated_vector_store.to_dict()
            log_and_print(Fore.WHITE, f"vector_stores.modify = {json.dumps(updated_vector_store_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_modify.json', 'w') as f:
                json.dump(updated_vector_store_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Modify vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_delete_vector_store():
        """
        Test deleting a vector store.
        """
        log_and_print(Fore.BLUE, "Starting delete vector store test...")
        if not DEMO_VECTOR_STORE_ID2:
            log_and_print(Fore.YELLOW, "Skipping delete vector store test - no second vector store ID available",
                          level=logging.WARNING)
            return
        try:
            response = client.vector_stores.delete(
                vector_store_id=DEMO_VECTOR_STORE_ID2,
            )
            log_and_print(Fore.GREEN, "Delete vector store test passed!")
            response_data = response.to_dict()
            log_and_print(Fore.WHITE, f"Vector store delete response = {json.dumps(response_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_delete.json', 'w') as f:
                json.dump(response_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Delete vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_create_vector_store_file():
        log_and_print(Fore.BLUE, "Starting create vector store file test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping create vector store file test - no vector store ID available",
                          level=logging.WARNING)
            return
        try:
            # create jsonl of files as an example
            with open("mydata.jsonl", "w") as f:
                f.write('{"text": "What is the return policy?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "How do I reset my password?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "Where can I find my order history?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "What are the shipping options?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "What is your favorite banana?", "metadata": {"category": "support"}}\n')
            # Create a simple text file if my_data_small.txt doesn't exist
            if not os.path.exists("my_data_small.txt"):
                with open("my_data_small.txt", "w") as f:
                    f.write("This is a test file for vector store testing.\n")
            # Close the upload handle once the request completes
            with open("my_data_small.txt", "rb") as data_file:
                created_file = client.files.create(
                    file=data_file,
                    purpose="assistants",
                )
            created_file_data = created_file.to_dict()
            log_and_print(Fore.WHITE, f"Created file {json.dumps(created_file_data, indent=2)}")
            with open(f'openai_testing/{prefix}_file_create.json', 'w') as f:
                json.dump(created_file_data, f, indent=2)
            retrieved_files = client.files.retrieve(created_file.id)
            retrieved_files_data = retrieved_files.to_dict()
            log_and_print(Fore.WHITE, f"Retrieved file {json.dumps(retrieved_files_data, indent=2)}")
            with open(f'openai_testing/{prefix}_file_retrieve.json', 'w') as f:
                json.dump(retrieved_files_data, f, indent=2)
            vector_store_file = client.vector_stores.files.create(
                vector_store_id=DEMO_VECTOR_STORE_ID,
                file_id=created_file.id,
            )
            log_and_print(Fore.GREEN, "Create vector store file test passed!")
        except Exception as e:
            log_and_print(Fore.RED, f"Create vector store file test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_search_vector_store():
        """
        Test searching a vector store.
        """
        log_and_print(Fore.BLUE, "Starting search vector store test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping search vector store test - no vector store ID available",
                          level=logging.WARNING)
            return
        try:
            query = "What is the banana policy?"
            search_results = client.vector_stores.search(
                vector_store_id=DEMO_VECTOR_STORE_ID,
                query=query,
                max_num_results=10,
                ranking_options={
                    'ranker': 'default-2024-11-15',
                    'score_threshold': 0.0,
                },
                rewrite_query=False,
            )
            # Check instead of assert
            if not isinstance(search_results, pagination.SyncPage):
                log_and_print(Fore.YELLOW, f"FAILED: Expected a list of search results, got {type(search_results)}",
                              level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Search vector store test passed!")
            search_results_dict = search_results.to_dict()
            log_and_print(Fore.WHITE, f"Search results = {search_results_dict}")
            with open(f'openai_testing/{prefix}_vector_store_search.json', 'w') as f:
                json.dump(search_results_dict, f, indent=2)
            log_and_print(Fore.WHITE, f"vector_stores.search = {search_results.to_json()}")
        except Exception as e:
            log_and_print(Fore.RED, f"Search vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    # Run all tests in sequence, even if some fail
    test_results = []
    try:
        result = test_idempotent_vector_store_creation()
        if result and len(result) == 2:
            DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 = result
        test_results.append(True)
    except Exception as e:
        log_and_print(Fore.RED, f"Vector store creation test failed: {e}", level=logging.ERROR)
        logging.error(traceback.format_exc())
        test_results.append(False)

    for test_func in [
        test_vector_store_list,
        test_retrieve_vector_store,
        test_modify_vector_store,
        test_delete_vector_store,
        test_create_vector_store_file,
        test_search_vector_store
    ]:
        try:
            test_func()
            test_results.append(True)
        except Exception as e:
            log_and_print(Fore.RED, f"{test_func.__name__} failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())
            test_results.append(False)

    if all(test_results):
        log_and_print(Fore.GREEN, f"All {prefix} tests completed successfully!")
    else:
        failed_count = test_results.count(False)
        log_and_print(Fore.YELLOW, f"{failed_count} {prefix} test(s) failed, but script completed.")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Run OpenAI and/or LlamaStack tests.")
    parser.add_argument(
        "--provider",
        type=str,
        default="llama",
        choices=["openai", "llama", "both"],
        help="Specify which environment to test: openai, llama, or both. Default is llama.",
    )
    args = parser.parse_args()
    try:
        if args.provider in ("openai", "both"):
            openai_client = OpenAI()
            run_tests(openai_client, prefix="openai")
        if args.provider in ("llama", "both"):
            llama_client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")
            run_tests(llama_client, prefix="llama")
        log_and_print(Fore.GREEN, "All tests completed!")
    except Exception as e:
        log_and_print(Fore.RED, f"Tests failed to complete: {e}", level=logging.ERROR)
        logging.error(traceback.format_exc())
```
</details>
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>