mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-10-24 08:47:26 +00:00
# What does this PR do?

This PR replaces Llama Stack's default embedding model with nomic-embed-text-v1.5.

The Llama Stack community decided to switch from all-MiniLM-L6-v2 to nomic-embed-text-v1.5 for these key reasons:

1. The training data for [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2#training-data) includes many datasets with various licensing terms, so it is tricky to know when or whether it is appropriate to use this model for commercial applications.
2. The model is not particularly competitive on major benchmarks. For example, if you look at the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) and click on Miscellaneous/BEIR to see English information retrieval accuracy, you see that the top of the leaderboard is dominated by enormous models, but also that many models of relatively modest size have much higher Retrieval scores. If you want to look closely at the data, I recommend clicking "Download Table" because it is easier to browse that way.

More discussion can be found [here](https://github.com/llamastack/llama-stack/issues/2418).

Closes #2418

## Test Plan

1. Run `./scripts/unit-tests.sh`
2. Integration tests via CI workflow

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Vector Database (VectorDB) and Vector I/O (VectorIO)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Getting Started with VectorDB and VectorIO APIs Tutorial 🚀\n",
    "\n",
    "Welcome! This interactive tutorial guides you through the VectorDB and VectorIO APIs, which store document embeddings and retrieve documents by semantic search. Whether you're new to vector databases or an experienced developer, this notebook will help you understand the basics and get up and running quickly.\n",
    "\n",
    "What you'll learn:\n",
    "\n",
    "- How to set up and configure the VectorDB and VectorIO client\n",
    "- How to create and manage vector databases\n",
    "- Different ways to insert documents into the system\n",
    "- How to run semantic queries against your documents\n",
    "\n",
    "Prerequisites:\n",
    "\n",
    "- Basic Python knowledge\n",
    "- A running instance of the Llama Stack server (we'll use localhost in this tutorial)\n",
    "\n",
    "Before you begin, please ensure Llama Stack is installed and set up by following the [Getting Started Guide](https://llamastack.github.io/latest/getting_started/index.html).\n",
    "\n",
    "Let's start by setting the connection parameters and installing the required packages."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set up your connection parameters:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "HOST = \"localhost\"  # Replace with your host\n",
    "PORT = 8321  # Replace with your port\n",
    "MODEL_NAME = \"meta-llama/Llama-3.2-3B-Instruct\"\n",
    "VECTOR_DB_ID = \"tutorial_db\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Install the client library, requests (used below to download documents),\n",
    "# and a helper package for colored output.\n",
    "# Uncomment the next line to run the installation.\n",
    "#!pip install llama-stack-client requests termcolor\n",
    "\n",
    "# 💡 Note: If you're running this in a new environment, you might need to restart\n",
    "# your kernel after installation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. **Initial Setup**\n",
    "\n",
    "First, we'll import the necessary libraries and set up some helper functions. Let's break down what each import does:\n",
    "\n",
    "- `llama_stack_client`: Our main interface to the VectorDB and VectorIO APIs\n",
    "- `base64`: Helps us encode files for transmission\n",
    "- `mimetypes`: Determines file types automatically\n",
    "- `termcolor`: Makes our output prettier with colors\n",
    "\n",
    "❓ Question: Why do we need to convert files to data URLs?\n",
    "\n",
    "Answer: Data URLs allow us to embed file contents directly in our requests, making it easier to transmit files to the API without needing separate file uploads."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import base64\n",
    "import json\n",
    "import mimetypes\n",
    "import os\n",
    "import requests\n",
    "from pathlib import Path\n",
    "\n",
    "from llama_stack_client import LlamaStackClient\n",
    "from llama_stack_client.types import Document\n",
    "from llama_stack_client.types.vector_io_insert_params import Chunk\n",
    "from termcolor import cprint\n",
    "\n",
    "# Helper function to convert files to data URLs\n",
    "def data_url_from_file(file_path: str) -> str:\n",
    "    \"\"\"Convert a file to a data URL for API transmission\n",
    "\n",
    "    Args:\n",
    "        file_path (str): Path to the file to convert\n",
    "\n",
    "    Returns:\n",
    "        str: Data URL containing the file's contents\n",
    "\n",
    "    Example:\n",
    "        >>> data_url_from_file('example.txt')  # file containing \"Hello\"\n",
    "        'data:text/plain;base64,SGVsbG8='\n",
    "    \"\"\"\n",
    "    if not os.path.exists(file_path):\n",
    "        raise FileNotFoundError(f\"File not found: {file_path}\")\n",
    "\n",
    "    with open(file_path, \"rb\") as file:\n",
    "        file_content = file.read()\n",
    "\n",
    "    base64_content = base64.b64encode(file_content).decode(\"utf-8\")\n",
    "    # guess_type returns None for unknown extensions, so fall back to a generic type\n",
    "    mime_type, _ = mimetypes.guess_type(file_path)\n",
    "    mime_type = mime_type or \"application/octet-stream\"\n",
    "\n",
    "    data_url = f\"data:{mime_type};base64,{base64_content}\"\n",
    "    return data_url\n",
    "\n",
    "# Helper function to download content from URLs\n",
    "def download_from_url(url: str) -> str:\n",
    "    \"\"\"Download content from a URL\n",
    "\n",
    "    Args:\n",
    "        url (str): URL to download content from\n",
    "\n",
    "    Returns:\n",
    "        str: Content of the URL\n",
    "    \"\"\"\n",
    "    # A timeout keeps the notebook from hanging on an unresponsive host\n",
    "    response = requests.get(url, timeout=30)\n",
    "    if response.status_code == 200:\n",
    "        return response.text\n",
    "    else:\n",
    "        raise Exception(f\"Failed to download content from {url}: {response.status_code}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2. **Initialize Client and Create Vector Database**\n",
    "\n",
    "Now we'll set up our connection to the VectorDB API and create our first vector database. A vector database is a specialized database that stores document embeddings for semantic search.\n",
    "\n",
    "❓ Key Concepts:\n",
    "\n",
    "- `embedding_model`: The model used to convert text into vector representations\n",
    "- `embedding_dimension`: The length of the vectors the embedding model produces (768 for nomic-embed-text-v1.5)\n",
    "- `chunk_size`: How large each piece of text should be when splitting documents\n",
    "- `overlap_size`: How much overlap between chunks (helps maintain context)\n",
    "\n",
    "✨ Pro Tip: Choose your chunk size based on your use case. Smaller chunks (256-512 tokens) are better for precise retrieval, while larger chunks (1024+ tokens) maintain more context."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4ad70258",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialize client\n",
    "client = LlamaStackClient(\n",
    "    base_url=f\"http://{HOST}:{PORT}\",\n",
    ")\n",
    "\n",
    "# Let's see what providers are available\n",
    "# Providers determine where and how your data is stored\n",
    "providers = client.providers.list()\n",
    "vector_io_providers = [p for p in providers if p.api == \"vector_io\"]\n",
    "provider_id = vector_io_providers[0].provider_id if vector_io_providers else None\n",
    "print(\"Available providers:\")\n",
    "print(providers)\n",
    "\n",
    "# Create a vector database with optimized settings for general use\n",
    "client.vector_dbs.register(\n",
    "    vector_db_id=VECTOR_DB_ID,\n",
    "    embedding_model=\"nomic-embed-text-v1.5\",\n",
    "    embedding_dimension=768,  # This is the dimension for nomic-embed-text-v1.5\n",
    "    provider_id=provider_id,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "3. **Insert Documents**\n",
    "\n",
    "The VectorIO API supports multiple ways to add documents. We'll demonstrate two common approaches:\n",
    "\n",
    "- Loading documents from URLs\n",
    "- Loading documents from local files\n",
    "\n",
    "❓ Important Concepts:\n",
    "\n",
    "- Each document needs a unique `document_id`\n",
    "- Metadata helps organize and filter documents later\n",
    "- In this notebook we split each document into fixed-size chunks ourselves, then insert those chunks with `client.vector_io.insert`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Example documentation files; each name is appended to the base URL below\n",
    "# 💡 Replace these with your own URLs or use the examples\n",
    "urls = [\n",
    "    \"memory_optimizations.rst\",\n",
    "    \"chat.rst\",\n",
    "    \"llama3.rst\",\n",
    "]\n",
    "\n",
    "# Create documents from URLs\n",
    "# We add metadata to help organize our documents\n",
    "url_documents = []\n",
    "for i, url in enumerate(urls):\n",
    "    full_url = f\"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/{url}\"\n",
    "    try:\n",
    "        # Download content from URL\n",
    "        content = download_from_url(full_url)\n",
    "        # Create document with the downloaded content\n",
    "        document = Document(\n",
    "            document_id=f\"url-doc-{i}\",  # Unique ID for each document\n",
    "            content=content,  # Use the actual content instead of the URL\n",
    "            mime_type=\"text/plain\",\n",
    "            metadata={\"source\": \"url\", \"filename\": url, \"original_url\": full_url},  # Store original URL in metadata\n",
    "        )\n",
    "        url_documents.append(document)\n",
    "        print(f\"Successfully downloaded content from {url}\")\n",
    "    except Exception as e:\n",
    "        print(f\"Failed to download content from {url}: {e}\")\n",
    "\n",
    "# Example with local files\n",
    "# 💡 Replace these with your actual files\n",
    "local_files = [\"example.txt\", \"readme.md\"]\n",
    "file_documents = []\n",
    "for i, path in enumerate(local_files):\n",
    "    if os.path.exists(path):\n",
    "        try:\n",
    "            # Read content from file directly instead of using data URL\n",
    "            with open(path, \"r\", encoding=\"utf-8\") as file:\n",
    "                content = file.read()\n",
    "            document = Document(\n",
    "                document_id=f\"file-doc-{i}\",\n",
    "                content=content,  # Use the actual content directly\n",
    "                mime_type=\"text/plain\",\n",
    "                metadata={\"source\": \"local\", \"filename\": path},\n",
    "            )\n",
    "            file_documents.append(document)\n",
    "            print(f\"Successfully read content from {path}\")\n",
    "        except Exception as e:\n",
    "            print(f\"Failed to read content from {path}: {e}\")\n",
    "\n",
    "# Combine all documents\n",
    "all_documents = url_documents + file_documents\n",
    "\n",
    "# Create chunks from the documents\n",
    "chunks = []\n",
    "for doc in all_documents:\n",
    "    # Split document content into chunks of 512 characters\n",
    "    content = doc.content\n",
    "    chunk_size = 512\n",
    "\n",
    "    # Create chunks of the specified size\n",
    "    for i in range(0, len(content), chunk_size):\n",
    "        chunk_content = content[i:i+chunk_size]\n",
    "        if chunk_content.strip():  # Only add non-empty chunks\n",
    "            chunks.append(Chunk(\n",
    "                content=chunk_content,\n",
    "                metadata={\n",
    "                    \"document_id\": doc.document_id,\n",
    "                    \"chunk_index\": i // chunk_size,\n",
    "                    **doc.metadata\n",
    "                }\n",
    "            ))\n",
    "\n",
    "# Insert chunks into vector database\n",
    "if chunks:  # Only proceed if we have valid chunks\n",
    "    client.vector_io.insert(\n",
    "        vector_db_id=VECTOR_DB_ID,\n",
    "        chunks=chunks,\n",
    "    )\n",
    "    print(f\"Documents inserted successfully! ({len(chunks)} chunks)\")\n",
    "else:\n",
    "    print(\"No valid documents to insert.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "4. **Query the Vector Database**\n",
    "\n",
    "Now for the exciting part: querying our documents! The VectorIO API uses semantic search to find relevant content based on meaning, not just keywords.\n",
    "\n",
    "❓ Understanding Scores:\n",
    "\n",
    "- As a rough guide, scores above 0.7 indicate strong relevance, although the score scale can vary with the provider and embedding model\n",
    "- Consider your use case when deciding on score thresholds"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def print_query_results(query: str):\n",
    "    \"\"\"Helper function to print query results in a readable format\n",
    "\n",
    "    Args:\n",
    "        query (str): The search query to execute\n",
    "    \"\"\"\n",
    "    print(f\"\\nQuery: {query}\")\n",
    "    print(\"-\" * 50)\n",
    "    response = client.vector_io.query(\n",
    "        vector_db_id=VECTOR_DB_ID,\n",
    "        query=query,\n",
    "    )\n",
    "\n",
    "    for i, (chunk, score) in enumerate(zip(response.chunks, response.scores)):\n",
    "        print(f\"\\nResult {i+1} (Score: {score:.3f})\")\n",
    "        print(\"=\" * 40)\n",
    "        print(chunk.content)\n",
    "        print(\"=\" * 40)\n",
    "\n",
    "# Let's try some example queries\n",
    "queries = [\n",
    "    \"How do I use LoRA?\",  # Technical question\n",
    "    \"Tell me about memory optimizations\",  # General topic\n",
    "    \"What are the key features of Llama 3?\"  # Product-specific\n",
    "]\n",
    "\n",
    "\n",
    "for query in queries:\n",
    "    print_query_results(query)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Awesome, now we can embed all our notes with Llama Stack using VectorDB and VectorIO, and ask it about the meaning of life :)\n",
    "\n",
    "Next up, we will learn about the safety features and how to use them: [notebook link](./06_Safety101.ipynb)."
   ]
  }
 ],
 "metadata": {
  "fileHeader": "",
  "fileUid": "73bc3357-0e5e-42ff-95b1-40b916d24c4f",
  "isAdHoc": false,
  "kernelspec": {
   "display_name": "llama-stack",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}