## Memory 

Getting Started with Memory API Tutorial 🚀

Welcome! This interactive tutorial will guide you through using the Memory API, a powerful tool for document storage and retrieval. Whether you're new to vector databases or an experienced developer, this notebook will help you understand the basics and get up and running quickly.

What you'll learn:

- How to set up and configure the Memory API client
- Creating and managing memory banks (vector stores)
- Different ways to insert documents into the system
- How to perform intelligent queries on your documents

Prerequisites:

- Basic Python knowledge
- A running instance of the Memory API server (we'll use localhost in this tutorial)

Before you begin, please ensure Llama Stack is installed and set up by following the [Getting Started Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html).

Set up your connection parameters:

In [1]:
HOST = "localhost"  # Replace with your host
PORT = 5000  # Replace with your port

Next, install the required packages (uncomment the `pip` line if they are not installed yet):

In [2]:
# Install the client library and a helper package for colored output
#!pip install llama-stack-client termcolor

# 💡 Note: If you're running this in a new environment, you might need to restart
# your kernel after installation

1. **Initial Setup**

First, we'll import the necessary libraries and set up some helper functions. Let's break down what each import does:

- `llama_stack_client`: Our main interface to the Memory API
- `base64`: Helps us encode files for transmission
- `mimetypes`: Determines file types automatically
- `termcolor`: Makes our output prettier with colors

❓ Question: Why do we need to convert files to data URLs?

Answer: Data URLs allow us to embed file contents directly in our requests, making it easier to transmit files to the API without needing separate file uploads.

In [1]:
import base64
import json
import mimetypes
import os
from pathlib import Path

from llama_stack_client import LlamaStackClient
from llama_stack_client.types.memory_insert_params import Document
from termcolor import cprint

# Helper function to convert files to data URLs
def data_url_from_file(file_path: str) -> str:
    """Convert a file to a data URL for API transmission

    Args:
        file_path (str): Path to the file to convert

    Returns:
        str: Data URL containing the file's contents

    Example:
        >>> url = data_url_from_file('example.txt')
        >>> print(url[:23])  # Preview the start of the URL
        data:text/plain;base64,
    """
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"File not found: {file_path}")

    with open(file_path, "rb") as file:
        file_content = file.read()

    base64_content = base64.b64encode(file_content).decode("utf-8")
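    mime_type, _ = mimetypes.guess_type(file_path)
    if mime_type is None:
        # guess_type returns None for unknown extensions; use a generic binary type
        mime_type = "application/octet-stream"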

    data_url = f"data:{mime_type};base64,{base64_content}"
    return data_url
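
To see what a data URL actually contains, the following sketch builds one by hand and decodes it again. `parse_data_url` is a hypothetical helper written for this tutorial, not part of `llama_stack_client`, and it assumes the `data:<mime>;base64,<payload>` form that `data_url_from_file` produces.

```python
import base64
from typing import Tuple


def parse_data_url(data_url: str) -> Tuple[str, bytes]:
    """Split a base64 data URL into (mime_type, raw_bytes).

    Hypothetical helper for illustration; only the base64 form is handled.
    """
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("Expected a URL of the form data:<mime>;base64,<payload>")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


# Build a data URL by hand, mirroring the format used by data_url_from_file
payload = base64.b64encode(b"Hello").decode("utf-8")
url = f"data:text/plain;base64,{payload}"

mime_type, content = parse_data_url(url)
print(url)        # data:text/plain;base64,SGVsbG8=
print(mime_type)  # text/plain
print(content)    # b'Hello'
```

The round trip shows that a data URL is just the MIME type plus the base64-encoded bytes, which is why it can travel inside an ordinary request body.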

2. **Initialize Client and Create Memory Bank**

Now we'll set up our connection to the Memory API and create our first memory bank. A memory bank is like a specialized database that stores document embeddings for semantic search.

❓ Key Concepts:

- `embedding_model`: The model used to convert text into vector representations
- `chunk_size_in_tokens`: How large each piece of text should be when splitting documents
- `overlap_size_in_tokens`: How much overlap to keep between consecutive chunks (helps maintain context)

✨ Pro Tip: Choose your chunk size based on your use case. Smaller chunks (256-512 tokens) are better for precise retrieval, while larger chunks (1024+ tokens) maintain more context.
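
To make chunk size and overlap concrete, here is a minimal sliding-window sketch. It is an illustration only: `chunk_tokens` is a hypothetical function, whitespace splitting stands in for a real tokenizer, and the server's actual chunking logic may differ.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int, overlap: int) -> List[List[str]]:
    """Split tokens into windows of chunk_size; consecutive windows share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


# Whitespace splitting is an assumption made to keep the example self-contained
tokens = "the quick brown fox jumps over the lazy dog again".split()
for chunk in chunk_tokens(tokens, chunk_size=4, overlap=1):
    print(chunk)
```

With `chunk_size=4` and `overlap=1`, each chunk starts with the last token of the previous one, so a sentence that straddles a boundary still appears with some of its context in both chunks.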

In [16]:
# Configure connection parameters
HOST = "localhost"  # Replace with your host if using a remote server
PORT = 5000       # Replace with your port if different

# Initialize client
client = LlamaStackClient(
    base_url=f"http://{HOST}:{PORT}",
)

# Let's see what providers are available
# Providers determine where and how your data is stored
providers = client.providers.list()
print("Available providers:")
print(providers)

# Create a memory bank with optimized settings for general use
client.memory_banks.register(
    memory_bank={
        "identifier": "tutorial_bank",  # A unique name for your memory bank
        "embedding_model": "all-MiniLM-L6-v2",  # A lightweight but effective model
        "chunk_size_in_tokens": 512,  # Good balance between precision and context
        "overlap_size_in_tokens": 64,  # Helps maintain context between chunks
        "provider_id": providers["memory"][0].provider_id,  # Use the first available provider
    }
)


Available providers:
{'inference': [ProviderInfo(provider_id='meta-reference', provider_type='meta-reference'), ProviderInfo(provider_id='meta1', provider_type='meta-reference')], 'safety': [ProviderInfo(provider_id='meta-reference', provider_type='meta-reference')], 'agents': [ProviderInfo(provider_id='meta-reference', provider_type='meta-reference')], 'memory': [ProviderInfo(provider_id='meta-reference', provider_type='meta-reference')], 'telemetry': [ProviderInfo(provider_id='meta-reference', provider_type='meta-reference')]}


3. **Insert Documents**
   
The Memory API supports multiple ways to add documents. We'll demonstrate two common approaches:

- Loading documents from URLs
- Loading documents from local files

❓ Important Concepts:

- Each document needs a unique `document_id`
- Metadata helps organize and filter documents later
- The API automatically processes and chunks documents
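
Because every document needs a unique `document_id`, it can be worth checking a batch before inserting it. The sketch below is an assumption-laden illustration: it uses plain dictionaries rather than the client's `Document` type, and `find_duplicate_ids` is a hypothetical helper, not part of the Memory API.

```python
from collections import Counter
from typing import Dict, List


def find_duplicate_ids(documents: List[Dict]) -> List[str]:
    """Return every document_id that appears more than once, in first-seen order."""
    counts = Counter(doc["document_id"] for doc in documents)
    return [doc_id for doc_id, n in counts.items() if n > 1]


# Plain dicts stand in for Document objects in this sketch
docs = [
    {"document_id": "url-doc-0", "metadata": {"source": "url"}},
    {"document_id": "file-doc-0", "metadata": {"source": "local"}},
    {"document_id": "url-doc-0", "metadata": {"source": "url"}},  # accidental repeat
]

duplicates = find_duplicate_ids(docs)
if duplicates:
    print(f"Duplicate document IDs: {duplicates}")
```

Prefixing IDs by source, as this tutorial does with `url-doc-` and `file-doc-`, is one simple way to avoid collisions when combining documents from several places.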

In [17]:
# Example documentation file names (appended to the torchtune docs base URL below)
# 💡 Replace these with your own URLs or use the examples
urls = [
    "memory_optimizations.rst",
    "chat.rst",
    "llama3.rst",
]

# Create documents from URLs
# We add metadata to help organize our documents
url_documents = [
    Document(
        document_id=f"url-doc-{i}",  # Unique ID for each document
        content=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/{url}",
        mime_type="text/plain",
        metadata={"source": "url", "filename": url},  # Metadata helps with organization
    )
    for i, url in enumerate(urls)
]

# Example with local files
# 💡 Replace these with your actual files
local_files = ["example.txt", "readme.md"]
file_documents = [
    Document(
        document_id=f"file-doc-{i}",
        content=data_url_from_file(path),
        metadata={"source": "local", "filename": path},
    )
    for i, path in enumerate(local_files)
    if os.path.exists(path)
]

# Combine all documents
all_documents = url_documents + file_documents

# Insert documents into memory bank
response = client.memory.insert(
    bank_id="tutorial_bank",
    documents=all_documents,
)

print("Documents inserted successfully!")

Documents inserted successfully!


4. **Query the Memory Bank**
   
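Now for the exciting part: querying our documents! The Memory API uses semantic search to find relevant content based on meaning, not just keywords.

❓ Understanding Scores:

- Generally, scores above 0.7 indicate strong relevance, but the scale depends on the provider and its scoring method; scores are not necessarily capped at 1 (the first result in the sample output below scores 1.322)
- Consider your use case when deciding on score thresholds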
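
One way to apply a score threshold is to filter the results on the client side. This is a minimal sketch with made-up chunks and scores; `filter_by_score` is a hypothetical helper, and it assumes that higher scores mean more relevant results.

```python
from typing import List, Tuple


def filter_by_score(
    chunks: List[str], scores: List[float], min_score: float
) -> List[Tuple[str, float]]:
    """Keep (chunk, score) pairs at or above min_score, highest score first."""
    pairs = [(chunk, score) for chunk, score in zip(chunks, scores) if score >= min_score]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Made-up data for illustration only; real scores come from the query response
chunks = ["LoRA overview", "Installation notes", "Memory optimizations"]
scores = [0.62, 0.31, 0.85]

for text, score in filter_by_score(chunks, scores, min_score=0.6):
    print(f"{score:.2f}  {text}")
```

The right value for `min_score` depends on the score scale your provider returns, so it is worth inspecting a few real queries before fixing a threshold.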

In [18]:
def print_query_results(query: str):
    """Helper function to print query results in a readable format

    Args:
        query (str): The search query to execute
    """
    print(f"\nQuery: {query}")
    print("-" * 50)
    response = client.memory.query(
        bank_id="tutorial_bank",
        query=[query],  # The query is passed as a list of content items
    )

    for i, (chunk, score) in enumerate(zip(response.chunks, response.scores)):
        print(f"\nResult {i+1} (Score: {score:.3f})")
        print("=" * 40)
        print(chunk)
        print("=" * 40)

# Let's try some example queries
queries = [
    "How do I use LoRA?",  # Technical question
    "Tell me about memory optimizations",  # General topic
    "What are the key features of Llama 3?"  # Product-specific
]


for query in queries:
    print_query_results(query)


Query: How do I use LoRA?
--------------------------------------------------

Result 1 (Score: 1.322)
Chunk(content="_peft:\n\nParameter Efficient Fine-Tuning (PEFT)\n--------------------------------------\n\n.. _glossary_lora:\n\nLow Rank Adaptation (LoRA)\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n\n*What's going on here?*\n\nYou can read our tutorial on :ref:`finetuning Llama2 with LoRA<lora_finetune_label>` to understand how LoRA works, and how to use it.\nSimply stated, LoRA greatly reduces the number of trainable parameters, thus saving significant gradient and optimizer\nmemory during training.\n\n*Sounds great! How do I use it?*\n\nYou can finetune using any of our recipes with the ``lora_`` prefix, e.g. :ref:`lora_finetune_single_device<lora_finetune_recipe_label>`. These recipes utilize\nLoRA-enabled model builders, which we support for all our models, and also use the ``lora_`` prefix, e.g.\nthe :func:`torchtune.models.llama3.llama3` model has a corresponding :func:`torchtune.models.ll

Awesome, now we can embed all our notes with Llama Stack and ask it about the meaning of life :)

Next up, we will learn about the safety features and how to use them: [notebook link](./05_Safety101.ipynb)