For self-hosted providers like Ollama (or vLLM), the backing server is
running a set of models. That server should be treated as the source of
truth and the Stack registry should just be a cache for those models. Of
course, in production environments, you may not want this (because you
know statically what model you are running), hence there's a config
boolean to control this behavior.
_This is part of a series of PRs aimed at removing the requirement to
set `INFERENCE_MODEL` environment variables for running the Llama Stack
server._
## Test Plan
Copy and modify the starter.yaml template / config and enable
`refresh_models: true, refresh_models_interval: 10` for the ollama
provider. Then, run:
```
LLAMA_STACK_LOGGING=all=debug \
ENABLE_OLLAMA=ollama uv run llama stack run --image-type venv /tmp/starter.yaml
```
See a gargantuan amount of logs, but verify that the provider is
periodically refreshing models. Stop and prune a model from the ollama
server, restart the server, and verify that the model goes away when you
call `uv run llama-stack-client models list`.
# What does this PR do?
This PR fixes the `DPOAlignmentConfig` schema to use the correct Direct
Preference Optimization (DPO) parameters.
The current schema incorrectly uses PPO-inspired parameters
(`reward_scale`, `reward_clip`, `epsilon`, `gamma`) that are not part of
the DPO algorithm. This PR updates it to use the standard DPO
parameters:
- `beta`: The KL divergence coefficient that controls deviation from the
reference model
- `loss_type`: The type of DPO loss function (sigmoid, hinge, ipo,
kto_pair)
These parameters align with standard DPO implementations like
HuggingFace's TRL library.
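For illustration, a minimal sketch of what a schema with these fields could look like (assuming a pydantic model; the field names come from this description, while the defaults, constraints, and enum layout are illustrative rather than the actual Llama Stack definition):
```python
from enum import Enum

from pydantic import BaseModel, Field


class DPOLossType(str, Enum):
    sigmoid = "sigmoid"
    hinge = "hinge"
    ipo = "ipo"
    kto_pair = "kto_pair"


class DPOAlignmentConfig(BaseModel):
    # KL-divergence coefficient controlling deviation from the reference model
    beta: float = Field(0.1, gt=0)
    # DPO loss variant to optimize
    loss_type: DPOLossType = DPOLossType.sigmoid
```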
---------
Co-authored-by: Ubuntu <ubuntu@ip-172-31-43-83.ec2.internal>
When we call `construct_stack()`, providers are instantiated and
`initialize()` is called. This call can end up doing _anything_ at all
-- specifically, providers are free to create long running background
tasks as part of this. If we wrap this within an `asyncio.run()` call, as
the current code does, these tasks get canceled when stack construction
finishes. This is not correct. The PR addresses the issue by creating a
persistent event loop which is used for both the stack as well as for
running the uvicorn server. In other words, the lifetime of the
providers (and downstream async code) is now the same as the lifetime of
the uvicorn server.
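A rough sketch of the pattern (not the actual server code; `construct_stack()` is named in this description, while the stub below and the uvicorn wiring are illustrative):
```python
import asyncio

import uvicorn


async def construct_stack():
    """Stand-in for the real construct_stack(); providers may spawn background tasks in initialize()."""
    asyncio.create_task(asyncio.sleep(3600))  # e.g. a provider's long-running refresh loop


def main() -> None:
    # One persistent loop owns both stack construction and the HTTP server, so tasks
    # created during construction are not cancelled when construction finishes --
    # they live exactly as long as the uvicorn server does.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(construct_stack())

    server = uvicorn.Server(uvicorn.Config("myapp:app", host="0.0.0.0", port=8321))  # placeholder ASGI app
    try:
        loop.run_until_complete(server.serve())  # same loop -> provider tasks keep running
    finally:
        loop.close()


if __name__ == "__main__":
    main()
```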
## Test Plan
This should not affect any current code since we don't have background
tasks created right now. However,
https://github.com/meta-llama/llama-stack/pull/2805 will start using
this functionality.
# What does this PR do?
The 'build' command didn't take into account ENABLE flags for the
starter distro. For some reason, I was having issues with HuggingFace
access for the embedding model, so I added a tip for that as well.
Closes #2779
## Test Plan
I ran the described steps manually, but it would be nice if someone else
could try them and verify this still works.
We might consider having a CI job to ensure the QSG remains functional -
it's not a great experience for new users if they try Llama Stack for
the first time and it doesn't work as we describe.
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
- Added coverage badge to README. - [See my
fork](https://github.com/ChristianZaccaria/llama-stack)
- Added a GitHub Actions workflow that runs the tests and updates the
coverage badge. - [See
run](4574811323)
- Documented steps in `testing.md` for running the tests locally, and
viewing the `html` report.
- Excluded non-essential files from coverage reporting to provide a more
accurate measurement.
Automatically created PR to update coverage badge:
https://github.com/ChristianZaccaria/llama-stack/pull/9
# Note for reviewers
1. Currently the coverage report shows 45% coverage. Wondering if
there are other files or directories that should also be excluded from
the report to increase the percentage. The directories with the least
test coverage are `llama_stack/cli`, `llama_stack/models`, and
`llama_stack/ui`. - Should we exclude these?
2. **[Required]** The `GITHUB_TOKEN` should have write permissions to
open a PR to update the coverage badge.
# GitHub Issue
Closes #2355
## Test Plan
The `testing.md` file describes how to run the unit tests locally.
# What does this PR do?
trigger integration tests on ALL changes to `tests/` to catch failures
before they merge into main
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
This PR adds static type coverage to `llama-stack`
Part of https://github.com/meta-llama/llama-stack/issues/2647
## Test Plan
Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
# What does this PR do?
This PR adds static type coverage to `llama-stack`
Part of https://github.com/meta-llama/llama-stack/issues/2647
## Test Plan
Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
# What does this PR do?
Some async test markers in the codebase are causing pre-commit to fail
due to #2744. Remove these pytest fixtures.
## Test Plan
pre-commit passes
Signed-off-by: Charlie Doern <cdoern@redhat.com>
If I am running `uv run llama stack run --image-type venv`, it should not
be telling me "Conda detected" because I am pretty clearly saying I need
venv. The root cause is the offending line.
# What does this PR do?
## Test Plan
```
ENABLE_OLLAMA=ollama LLAMA_STACK_CONFIG=starter uv run pytest tests/integration/telemetry --text-model="ollama/llama3.2:3b-instruct-fp16"
```
# What does this PR do?
Lets users register models available at
https://integrate.api.nvidia.com/v1/models that aren't already in
llama_stack/providers/remote/inference/nvidia/models.py.
## Test Plan
1. run the nvidia distro
2. register a model from https://integrate.api.nvidia.com/v1/models that
isn't already known; as of this writing,
nvidia/llama-3.1-nemotron-ultra-253b-v1 is a good example
3. perform inference w/ the model
- POST /v1/models accepts an optional `provider_model_id`
- The ModelsRoutingTable.register_model handler ensures it is non-None,
providing a default
As a result, usage of `Model.provider_model_id` will no longer need to
detect None.
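To illustrate step 2 of the test plan above, here is a hedged sketch using the Python client (assuming the standard `llama_stack_client` models API; exact keyword arguments may differ):
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Register a model served at integrate.api.nvidia.com that is not in the static list.
client.models.register(
    model_id="nvidia/llama-3.1-nemotron-ultra-253b-v1",
    provider_id="nvidia",
    # provider_model_id is optional; the routing table now fills in a default
)

print([m.identifier for m in client.models.list()])
```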
Move sentence-transformers to be the first embedding model in the list of
models. This ensures it will always be the default and is more
consistent than having the default change based on which env variables
are available.
Closes: #2702
## Test Plan
Manually verified
Signed-off-by: Derek Higgins <derekh@redhat.com>
# What does this PR do?
Currently each disabled provider is printed as a warning; switch to
debug. This level of verbosity isn't necessary, especially if, over time,
we intend to grow the list of providers that can be in a single run yaml.
## Test Plan
before:
<img width="1144" height="667" alt="Screenshot 2025-07-16 at 12 37
18 PM"
src="https://github.com/user-attachments/assets/d14dbf76-6e40-4996-8a27-111e6a987d71"
/>
after:
<img width="925" height="141" alt="Screenshot 2025-07-16 at 12 37 42 PM"
src="https://github.com/user-attachments/assets/81efdbe1-923c-4c5f-9731-f89729043920"
/>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
Resolves https://github.com/meta-llama/llama-stack/issues/2770. It
replaces characters in SQLite table names that are not alphanumeric or
underscores with underscores and quotes the table names with square
brackets in SQL statements.
Closes #2770
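A minimal sketch of the sanitization idea described above (illustrative only, not the actual provider code):
```python
import re


def sanitize_table_name(bank_id: str) -> str:
    """Replace every character that is not alphanumeric or an underscore with an underscore."""
    return re.sub(r"[^a-zA-Z0-9_]", "_", bank_id)


table = sanitize_table_name("test_bank.123")  # -> "test_bank_123"
# Table names are additionally bracket-quoted in the SQL statements:
sql = f"CREATE TABLE IF NOT EXISTS [{table}] (id TEXT PRIMARY KEY, chunk TEXT)"
```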
## Test Plan
I added a ".123" suffix to the bank_id on the following line
```
index = await SQLiteVecIndex.create(dimension=embedding_dimension, db_path=db_path, bank_id="test_bank.123")
```
in tests/unit/providers/vector_io/test_sqlite_vec.py, which, without the
fix in place, demonstrates the issue.
# What does this PR do?
This was causing an unnecessary logger warning.
## Test Plan
Run `LLAMA_STACK_DIR=. ENABLE_OLLAMA=ollama
OLLAMA_INFERENCE_MODEL=llama3.2:3b llama stack build --template starter
--image-type venv --run` and then `Ctrl-C` to shut down.
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
This was introduced in
https://github.com/meta-llama/llama-stack/pull/523 but as far as I can
tell has never been used. It's been over six months so it feels fair to
remove it at this point.
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
The vision models are now available at the standard URL, so the
workaround code has been removed. This also simplifies the codebase by
eliminating the need for per-model client caching.
- Remove special URL handling for meta/llama-3.2-11b/90b-vision-instruct
models
- Convert the `_get_client` method to a `_client` property for a cleaner API
- Remove the unnecessary `lru_cache` decorator and `functools` import
- Simplify client creation logic to use a single base URL for all models
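A rough before/after illustration of the simplification (the `_get_client` / `_client` names come from the bullets above; the client class, URLs, and attributes here are placeholders rather than the adapter's real code):
```python
from functools import lru_cache

DEFAULT_URL = "https://example.com/v1"          # placeholder: the single standard base URL
SPECIAL_VISION_URL = "https://example.com/vlm"  # placeholder: the old per-model workaround URL


class FakeClient:  # placeholder for the real OpenAI-compatible client
    def __init__(self, base_url: str) -> None:
        self.base_url = base_url


class Before:
    @lru_cache  # one cached client per model
    def _get_client(self, model: str) -> FakeClient:
        url = SPECIAL_VISION_URL if "vision" in model else DEFAULT_URL
        return FakeClient(base_url=url)


class After:
    @property
    def _client(self) -> FakeClient:
        # a single base URL now serves every model, so no per-model caching is needed
        return FakeClient(base_url=DEFAULT_URL)
```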
# What does this PR do?
Adding OpenAI Vector Stores Files API compatibility for PGVector
## Test Plan
Updated CI to include PGVector
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
Adds new documentation that was missing for the Llama Stack Python
Client and updates old/outdated docs.
# What does this PR do?
https://github.com/meta-llama/llama-stack/pull/2490 introduced a new
function for type conversion of strings.
However, a side effect of this is that it will cast any string that can
be parsed as an integer, which for something like `image_name` is not
desired since we only accept strings for this field in the
`StackRunConfig`.
This PR introduces logic to ensure that `image_name` remains a string.
Closes #2749
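A hypothetical sketch of that kind of guard (the real `StackRunConfig` validation may be implemented differently; the class and validator below are illustrative):
```python
from pydantic import BaseModel, field_validator


class StackRunConfigSketch(BaseModel):
    image_name: str

    @field_validator("image_name", mode="before")
    @classmethod
    def keep_image_name_as_string(cls, v):
        # Even if upstream env-var substitution turned "2745" into the integer 2745,
        # coerce it back so the field always stays a plain string.
        return str(v)


print(StackRunConfigSketch(image_name=2745).image_name)  # -> "2745"
```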
## Test Plan
You can run the original reproduction step from the bug to verify this
manually:
```bash
OPENAI_API_KEY=bogus llama stack build --image-type venv --image-name 2745 --providers inference=remote::openai --run
```
I have also added an additional unit test to prevent any future
regression here
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
Resolves https://github.com/meta-llama/llama-stack/issues/2735
Currently, if you test against OpenAI's Vector Stores API the
`client.vector_stores.search` call fails with an invalid vector_db
during routing (see the script referenced in the clickable item under
the Test Plan section).
This PR ensures that `client.vector_stores.search()` is compatible with
OpenAI's Vector Stores API.
The two biggest changes:
1. The `name`, which was previously used as the `vector_db_id`, has been
changed to be consistent with OpenAI's `vs_{uuid}` format.
2. The vector store has to be referenced by its ID; the name is not
reliable, as every `client.vector_stores.create` call results in a new
vector store.
NOTE: I believe this is a breaking change for end users as they'll need
to update their VectorDB identifiers.
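For example, a minimal sketch with the OpenAI-compatible client (mirroring the larger script below; the base URL is the local Llama Stack endpoint used elsewhere in this test plan):
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

vs = client.vector_stores.create(name="Support FAQ")
print(vs.id)  # e.g. "vs_..." -- reference the store by this ID, not by its name

results = client.vector_stores.search(
    vector_store_id=vs.id,
    query="What is the return policy?",
    max_num_results=5,
)
```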
## Test Plan
Unit tests:
```bash
./scripts/unit-tests.sh tests/unit/providers/vector_io/ -v
```
Integration tests:
```bash
ENABLE_MILVUS=milvus llama stack run /Users/farceo/dev/llama-stack/llama_stack/templates/starter/run.yaml --image-type venv
LLAMA_STACK_CONFIG=http://localhost:8321 pytest -sv tests/integration/vector_io/test_openai_vector_stores.py --embedding-model=all-MiniLM-L6-v2 -vv
```
Unit tests and test script below 👇
<details>
<summary>Click here for script used to test OpenAI and Llama Stack
Vector Store implementation</summary>
```python
import json
import argparse
from openai import OpenAI, pagination
import logging
from colorama import Fore, Style, init
import traceback
import os
# Initialize colorama for color support in terminal
init(autoreset=True)
# Setup basic logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
DEMO_VECTOR_STORE_NAME = "Support FAQ FJA"
global DEMO_VECTOR_STORE_ID
global DEMO_VECTOR_STORE_ID2
def colored_print(color, text):
"""Prints text to the console with the specified color."""
print(f"{color}{text}{Style.RESET_ALL}")
def log_and_print(color, message, level=logging.INFO):
"""Logs a message and prints it to the console with the specified color."""
logging.log(level, message)
colored_print(color, message)
def run_tests(client, prefix="openai"):
"""
Runs all tests using the provided OpenAI client and saves the output
to JSON files with the given prefix.
"""
# Create the directory if it doesn't exist
os.makedirs('openai_testing', exist_ok=True)
# Default values in case tests fail
global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
DEMO_VECTOR_STORE_ID = None
DEMO_VECTOR_STORE_ID2 = None
def test_idempotent_vector_store_creation():
"""
Test that creating a vector store with the same name is idempotent.
"""
log_and_print(Fore.BLUE, "Starting vector store creation test...")
try:
vector_store = client.vector_stores.create(
name=DEMO_VECTOR_STORE_NAME,
)
# Attempt to create the same vector store again
vector_store2 = client.vector_stores.create(
name=DEMO_VECTOR_STORE_NAME,
)
# Check instead of assert
if vector_store2.id != vector_store.id:
log_and_print(Fore.YELLOW, f"FAILED IDEMPOTENCY: the same VectorStore name for {prefix.upper()} does not return the same ID",
level=logging.WARNING)
else:
log_and_print(Fore.GREEN, f"PASSED IDEMPOTENCY: f{vector_store2.id} == {vector_store.id} the same VectorStore name for {prefix.upper()} returns the same ID")
vector_store_data = vector_store.to_dict()
log_and_print(Fore.WHITE, f"vector_stores.create = {json.dumps(vector_store_data, indent=2)}")
with open(f'openai_testing/{prefix}_vector_store_create.json', 'w') as f:
json.dump(vector_store_data, f, indent=2)
global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
DEMO_VECTOR_STORE_ID = vector_store.id
DEMO_VECTOR_STORE_ID2 = vector_store2.id
return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
except Exception as e:
log_and_print(Fore.RED, f"Idempotent vector store creation test failed: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
# Create a fallback vector store ID if needed
if 'vector_store' in locals() and vector_store:
DEMO_VECTOR_STORE_ID = vector_store.id
return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
def test_vector_store_list():
"""
Test listing vector stores.
"""
log_and_print(Fore.BLUE, "Starting vector store list test...")
try:
vector_stores = client.vector_stores.list()
# Check instead of assert
if not isinstance(vector_stores, pagination.SyncCursorPage):
log_and_print(Fore.YELLOW, f"FAILED: Expected a list of vector stores, got {type(vector_stores)}",
level=logging.WARNING)
else:
log_and_print(Fore.GREEN, "Vector store list test passed!")
vector_stores_data = vector_stores.to_dict()
log_and_print(Fore.WHITE, f"vector_stores.list = {json.dumps(vector_stores_data, indent=2)}")
with open(f'openai_testing/{prefix}_vector_store_list.json', 'w') as f:
json.dump(vector_stores_data, f, indent=2)
except Exception as e:
log_and_print(Fore.RED, f"Vector store list test failed: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
def test_retrieve_vector_store():
"""
Test retrieving a specific vector store.
"""
log_and_print(Fore.BLUE, "Starting retrieve vector store test...")
if not DEMO_VECTOR_STORE_ID:
log_and_print(Fore.YELLOW, "Skipping retrieve vector store test - no vector store ID available",
level=logging.WARNING)
return
try:
vector_store = client.vector_stores.retrieve(
vector_store_id=DEMO_VECTOR_STORE_ID,
)
# Check instead of assert
if vector_store.id != DEMO_VECTOR_STORE_ID:
log_and_print(Fore.YELLOW, "FAILED: Retrieved vector store ID does not match", level=logging.WARNING)
else:
log_and_print(Fore.GREEN, "Retrieve vector store test passed!")
vector_store_data = vector_store.to_dict()
log_and_print(Fore.WHITE, f"vector_stores.retrieve = {json.dumps(vector_store_data, indent=2)}")
with open(f'openai_testing/{prefix}_vector_store_retrieve.json', 'w') as f:
json.dump(vector_store_data, f, indent=2)
except Exception as e:
log_and_print(Fore.RED, f"Retrieve vector store test failed: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
def test_modify_vector_store():
"""
Test modifying a vector store.
"""
log_and_print(Fore.BLUE, "Starting modify vector store test...")
if not DEMO_VECTOR_STORE_ID:
log_and_print(Fore.YELLOW, "Skipping modify vector store test - no vector store ID available",
level=logging.WARNING)
return
try:
updated_vector_store = client.vector_stores.update(
vector_store_id=DEMO_VECTOR_STORE_ID,
name="Updated Support FAQ FJA",
)
# Check instead of assert
if updated_vector_store.name != "Updated Support FAQ FJA":
log_and_print(Fore.YELLOW, "FAILED: Vector store name was not updated correctly", level=logging.WARNING)
else:
log_and_print(Fore.GREEN, "Modify vector store test passed!")
updated_vector_store_data = updated_vector_store.to_dict()
log_and_print(Fore.WHITE, f"vector_stores.modify = {json.dumps(updated_vector_store_data, indent=2)}")
with open(f'openai_testing/{prefix}_vector_store_modify.json', 'w') as f:
json.dump(updated_vector_store_data, f, indent=2)
except Exception as e:
log_and_print(Fore.RED, f"Modify vector store test failed: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
def test_delete_vector_store():
"""
Test deleting a vector store.
"""
log_and_print(Fore.BLUE, "Starting delete vector store test...")
if not DEMO_VECTOR_STORE_ID2:
log_and_print(Fore.YELLOW, "Skipping delete vector store test - no second vector store ID available",
level=logging.WARNING)
return
try:
response = client.vector_stores.delete(
vector_store_id=DEMO_VECTOR_STORE_ID2,
)
log_and_print(Fore.GREEN, "Delete vector store test passed!")
response_data = response.to_dict()
log_and_print(Fore.WHITE, f"Vector store delete response = {json.dumps(response_data, indent=2)}")
with open(f'openai_testing/{prefix}_vector_store_delete.json', 'w') as f:
json.dump(response_data, f, indent=2)
except Exception as e:
log_and_print(Fore.RED, f"Delete vector store test failed: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
def test_create_vector_store_file():
log_and_print(Fore.BLUE, "Starting create vector store file test...")
if not DEMO_VECTOR_STORE_ID:
log_and_print(Fore.YELLOW, "Skipping create vector store file test - no vector store ID available",
level=logging.WARNING)
return
try:
# create jsonl of files as an example
with open("mydata.jsonl", "w") as f:
f.write('{"text": "What is the return policy?", "metadata": {"category": "support"}}\n')
f.write('{"text": "How do I reset my password?", "metadata": {"category": "support"}}\n')
f.write('{"text": "Where can I find my order history?", "metadata": {"category": "support"}}\n')
f.write('{"text": "What are the shipping options?", "metadata": {"category": "support"}}\n')
f.write('{"text": "What is your favorite banana?", "metadata": {"category": "support"}}\n')
# Create a simple text file if my_data_small.txt doesn't exist
if not os.path.exists("my_data_small.txt"):
with open("my_data_small.txt", "w") as f:
f.write("This is a test file for vector store testing.\n")
created_file = client.files.create(
file=open("my_data_small.txt", "rb"),
purpose="assistants",
)
created_file_data = created_file.to_dict()
log_and_print(Fore.WHITE, f"Created file {json.dumps(created_file_data, indent=2)}")
with open(f'openai_testing/{prefix}_file_create.json', 'w') as f:
json.dump(created_file_data, f, indent=2)
retrieved_files = client.files.retrieve(created_file.id)
retrieved_files_data = retrieved_files.to_dict()
log_and_print(Fore.WHITE, f"Retrieved file {json.dumps(retrieved_files_data, indent=2)}")
with open(f'openai_testing/{prefix}_file_retrieve.json', 'w') as f:
json.dump(retrieved_files_data, f, indent=2)
vector_store_file = client.vector_stores.files.create(
vector_store_id=DEMO_VECTOR_STORE_ID,
file_id=created_file.id,
)
log_and_print(Fore.GREEN, "Create vector store file test passed!")
except Exception as e:
log_and_print(Fore.RED, f"Create vector store file test failed: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
def test_search_vector_store():
"""
Test searching a vector store.
"""
log_and_print(Fore.BLUE, "Starting search vector store test...")
if not DEMO_VECTOR_STORE_ID:
log_and_print(Fore.YELLOW, "Skipping search vector store test - no vector store ID available",
level=logging.WARNING)
return
try:
query = "What is the banana policy?"
search_results = client.vector_stores.search(
vector_store_id=DEMO_VECTOR_STORE_ID,
query=query,
max_num_results=10,
ranking_options={
'ranker': 'default-2024-11-15',
'score_threshold': 0.0,
},
rewrite_query=False,
)
# Check instead of assert
if not isinstance(search_results, pagination.SyncPage):
log_and_print(Fore.YELLOW, f"FAILED: Expected a list of search results, got {type(search_results)}",
level=logging.WARNING)
else:
log_and_print(Fore.GREEN, "Search vector store test passed!")
search_results_dict = search_results.to_dict()
log_and_print(Fore.WHITE, f"Search results = {search_results_dict}")
with open(f'openai_testing/{prefix}_vector_store_search.json', 'w') as f:
json.dump(search_results_dict, f, indent=2)
log_and_print(Fore.WHITE, f"vector_stores.search = {search_results.to_json()}")
except Exception as e:
log_and_print(Fore.RED, f"Search vector store test failed: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
# Run all tests in sequence, even if some fail
test_results = []
try:
result = test_idempotent_vector_store_creation()
if result and len(result) == 2:
DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 = result
test_results.append(True)
except Exception as e:
log_and_print(Fore.RED, f"Vector store creation test failed: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
test_results.append(False)
for test_func in [
test_vector_store_list,
test_retrieve_vector_store,
test_modify_vector_store,
test_delete_vector_store,
test_create_vector_store_file,
test_search_vector_store
]:
try:
test_func()
test_results.append(True)
except Exception as e:
log_and_print(Fore.RED, f"{test_func.__name__} failed: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
test_results.append(False)
if all(test_results):
log_and_print(Fore.GREEN, f"All {prefix} tests completed successfully!")
else:
failed_count = test_results.count(False)
log_and_print(Fore.YELLOW, f"{failed_count} {prefix} test(s) failed, but script completed.")
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Run OpenAI and/or LlamaStack tests.")
parser.add_argument(
"--provider",
type=str,
default="llama",
choices=["openai", "llama", "both"],
help="Specify which environment to test: openai, llama, or both. Default is both.",
)
args = parser.parse_args()
try:
if args.provider in ("openai", "both"):
openai_client = OpenAI()
run_tests(openai_client, prefix="openai")
if args.provider in ("llama", "both"):
llama_client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")
run_tests(llama_client, prefix="llama")
log_and_print(Fore.GREEN, "All tests completed!")
except Exception as e:
log_and_print(Fore.RED, f"Tests failed to complete: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
```
</details>
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
This PR adds the keyword search implementation for Milvus. Along with
the implementation for remote Milvus, the tests require us to start a
Milvus container locally.
In order to verify the implementation, run:
```
pytest tests/unit/providers/vector_io/remote/test_milvus.py -v -s --tb=short --disable-warnings --asyncio-mode=auto
```
You can also test the changes using the below script:
```
#!/usr/bin/env python3
import asyncio
import os
import uuid
from typing import List
from llama_stack_client import (
Agent,
AgentEventLogger,
LlamaStackClient,
RAGDocument
)
class MilvusRAGDemo:
def __init__(self, base_url: str = "http://localhost:8321/"):
self.client = LlamaStackClient(base_url=base_url)
self.vector_db_id = f"milvus_rag_demo_{uuid.uuid4().hex[:8]}"
self.model_id = None
self.embedding_model_id = None
self.embedding_dimension = None
def setup_models(self):
"""Get available models and select appropriate ones for LLM and embeddings."""
models = self.client.models.list()
# Select embedding model
embedding_models = [m for m in models if m.model_type == "embedding"]
if not embedding_models:
raise ValueError("No embedding models found")
self.embedding_model_id = embedding_models[0].identifier
self.embedding_dimension = embedding_models[0].metadata["embedding_dimension"]
def register_vector_db(self):
print(f"Registering Milvus vector database: {self.vector_db_id}")
response = self.client.vector_dbs.register(
vector_db_id=self.vector_db_id,
embedding_model=self.embedding_model_id,
embedding_dimension=self.embedding_dimension,
provider_id="milvus-remote", # Use remote Milvus
)
print(f"Vector database registered successfully")
return response
def insert_documents(self):
"""Insert sample documents into the vector database."""
print("\nInserting sample documents...")
# Sample documents about different topics
documents = [
RAGDocument(
document_id="ai_ml_basics",
content="""
Artificial Intelligence (AI) and Machine Learning (ML) are transforming the world.
AI refers to the simulation of human intelligence in machines, while ML is a subset
of AI that enables computers to learn and improve from experience without being
explicitly programmed. Deep learning, a subset of ML, uses neural networks with
multiple layers to process complex patterns in data.
Key concepts in AI/ML include:
- Supervised Learning: Training with labeled data
- Unsupervised Learning: Finding patterns in unlabeled data
- Reinforcement Learning: Learning through trial and error
- Neural Networks: Computing systems inspired by biological brains
""",
mime_type="text/plain",
metadata={"topic": "technology", "category": "ai_ml"},
),
]
# Insert documents with chunking
self.client.tool_runtime.rag_tool.insert(
documents=documents,
vector_db_id=self.vector_db_id,
chunk_size_in_tokens=200, # Smaller chunks for better granularity
)
print(f"Inserted {len(documents)} documents with chunking")
def test_keyword_search(self):
"""Test keyword-based search using BM25."""
queries = [
"neural networks",
"Python frameworks",
"data cleaning",
]
for query in queries:
response = self.client.vector_io.query(
vector_db_id=self.vector_db_id,
query=query,
params={
"mode": "keyword", # Keyword search
"max_chunks": 3,
"score_threshold": 0.0,
}
)
for i, (chunk, score) in enumerate(zip(response.chunks, response.scores)):
print(f" {i+1}. Score: {score:.4f}")
print(f" Content: {chunk.content[:100]}...")
print(f" Metadata: {chunk.metadata}")
def run_demo(self):
try:
self.setup_models()
self.register_vector_db()
self.insert_documents()
self.test_keyword_search()
except Exception as e:
print(f"Error during demo: {e}")
raise
def main():
"""Main function to run the demo."""
# Check if Llama Stack server is running
demo = MilvusRAGDemo()
try:
demo.run_demo()
except Exception as e:
print(f"Demo failed: {e}")
if __name__ == "__main__":
main()
```
---------
Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
- fireworks and together do not support the Llama Guard 3 8B model anymore
- Need to default to ollama
- The current safety shields logic was not correct since the shield_id was
the provider (which had duplicates)
- Followed similar logic to models
Note: this seems a bit over-engineered, but it can now be extended to
other providers and fits into the overall mechanism of how env vars are
used to manage starter.
### How to test
```
ENABLE_OLLAMA=ollama ENABLE_FIREWORKS=fireworks SAFETY_MODEL=llama-guard3:1b pytest -s -v tests/integration/ --stack-config starter -k 'not(supervised_fine_tune or builtin_tool_code or safety_with_image or code_interpreter_for or rag_and_code or truncation or register_and_unregister)' --text-model fireworks/meta-llama/Llama-3.3-70B-Instruct --vision-model fireworks/meta-llama/Llama-4-Scout-17B-16E-Instruct --safety-shield llama-guard3:1b --embedding-model all-MiniLM-L6-v2
```
### Related but not obvious in this PR
In the llama-stack-ops repo, we run tests before publishing packages and
docker containers.
The actions in that repo were using the fireworks / together distros
(which are non-existent), so we need to update them to run with `starter`
and use `ollama` specifically for safety.
# What does this PR do?
Adds a template for technical debt. Currently we don't support blank
issues, so everything filed has to be a bug or a feature.
This would allow maintainers as well as community members to track
things we might want to merge to expose the functionality but should be
addressed later. Such things can also be "good first issues" for new
contributors.
## Example of what we constitute as technical debt
Inelegant code solutions, tests we intend to temporarily disable but
would like to restore, CI hacks around infrastructure or installation,
etc.
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
inference providers each have a static list of supported / known models.
some also have access to a dynamic list of currently available models.
this change gives prodivers using the ModelRegistryHelper the ability to
combine their static and dynamic lists.
for instance, OpenAIInferenceAdapter can implement
```
def query_available_models(self) -> list[str]:
return [entry.model for entry in self.openai_client.models.list()]
```
to augment its static list w/ a current list from openai.
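A hedged sketch of what combining the two lists might look like (the function and parameter names below are hypothetical, not the actual ModelRegistryHelper API):
```python
def combined_model_ids(static_ids: list[str], query_available_models) -> list[str]:
    """Merge a provider's static model list with whatever the backing service reports right now."""
    try:
        dynamic_ids = query_available_models()
    except Exception:
        dynamic_ids = []  # fall back to the static list if the remote call fails
    return sorted(set(static_ids) | set(dynamic_ids))
```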
## Test Plan
scripts/unit-test.sh