llama-stack-mirror/docs/notebooks/crewai/Llama_Stack_CrewAI.ipynb
Eric Huang 756aa0ade8 chore: update doc
2025-10-20 10:20:14 -07:00

{
"cells": [
{
"cell_type": "markdown",
"id": "2ktr5ls2cas",
"metadata": {
"id": "2ktr5ls2cas"
},
"source": [
"## LlamaStack + CrewAI Integration Tutorial\n",
"\n",
"This notebook guides you through integrating **LlamaStack** with **CrewAI** to build a complete Retrieval-Augmented Generation (RAG) system.\n",
"\n",
"### Overview\n",
"\n",
"- **LlamaStack**: Provides the infrastructure for running LLMs and vector store.\n",
"- **CrewAI**: Offers a framework for orchestrating agents and tasks.\n",
"- **Integration**: Leverages LlamaStack's OpenAI-compatible API with CrewAI.\n",
"\n",
"### What You Will Learn\n",
"\n",
"1. How to set up and start the LlamaStack server using the Together AI provider.\n",
"2. How to create and manage vector stores within LlamaStack.\n",
"3. How to build RAG tool with CrewAI by utilizing the LlamaStack server.\n",
"4. How to query the RAG tool for effective information retrieval and generation.\n",
"\n",
"### Prerequisites\n",
"\n",
"A Together AI API key is required to run the examples in this notebook.\n",
"\n",
"---\n",
"\n",
"### 1. Installation and Setup\n",
"#### Install Required Dependencies\n",
"\n",
"Begin by installing all necessary packages for CrewAI integration. Ensure your `TOGETHER_API_KEY` is set as an environment variable."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "5b6a6a17-b931-4bea-8273-0d6e5563637a",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "5b6a6a17-b931-4bea-8273-0d6e5563637a",
"outputId": "a6427234-b75d-40ea-a471-8c7e9acb7d88",
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: uv in /Users/kaiwu/miniconda3/lib/python3.12/site-packages (0.8.11)\n",
"`\u001b[36mcrewai\u001b[39m` is already installed\n",
"Not in Google Colab environment\n"
]
},
{
"name": "stdin",
"output_type": "stream",
"text": [
"TOGETHER_API_KEY environment variable is not set. Please enter your API key: ········\n"
]
}
],
"source": [
"!pip install uv\n",
"!uv tool install crewai\n",
"import os\n",
"import getpass\n",
"\n",
"try:\n",
" from google.colab import userdata\n",
" os.environ['TOGETHER_API_KEY'] = userdata.get('TOGETHER_API_KEY')\n",
"except ImportError:\n",
" print(\"Not in Google Colab environment\")\n",
"\n",
"for key in ['TOGETHER_API_KEY']:\n",
" try:\n",
" api_key = os.environ[key]\n",
" if not api_key:\n",
" raise ValueError(f\"{key} environment variable is empty\")\n",
" except KeyError:\n",
" api_key = getpass.getpass(f\"{key} environment variable is not set. Please enter your API key: \")\n",
" os.environ[key] = api_key"
]
},
{
"cell_type": "markdown",
"id": "wmt9jvqzh7n",
"metadata": {
"id": "wmt9jvqzh7n"
},
"source": [
"### 2. LlamaStack Server Setup\n",
"\n",
"#### Build and Start LlamaStack Server\n",
"\n",
"This section sets up the LlamaStack server with:\n",
"- **Together AI** as the inference provider\n",
"- **FAISS** as the vector database\n",
"- **Sentence Transformers** for embeddings\n",
"\n",
"The server runs on `localhost:8321` and provides OpenAI-compatible endpoints."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "dd2dacf3-ec8b-4cc7-8ff4-b5b6ea4a6e9e",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 773
},
"id": "dd2dacf3-ec8b-4cc7-8ff4-b5b6ea4a6e9e",
"outputId": "aa53f96a-6826-4bfb-d1aa-2c0ec2dd4893",
"scrolled": true
},
"outputs": [],
"source": [
"import os\n",
"import subprocess\n",
"import time\n",
"\n",
"# Remove UV_SYSTEM_PYTHON to ensure uv creates a proper virtual environment\n",
"# instead of trying to use system Python globally, which could cause permission issues\n",
"# and package conflicts with the system's Python installation\n",
"if \"UV_SYSTEM_PYTHON\" in os.environ:\n",
" del os.environ[\"UV_SYSTEM_PYTHON\"]\n",
"\n",
"def run_llama_stack_server_background():\n",
" \"\"\"Build and run LlamaStack server in one step using --run flag\"\"\"\n",
" log_file = open(\"llama_stack_server.log\", \"w\")\n",
" process = subprocess.Popen(\n",
" \"uv run --with llama-stack llama stack list-deps starter | xargs -L1 uv pip install\",\n",
" \"uv run --with llama-stack llama stack run starter\",\n",
" shell=True,\n",
" stdout=log_file,\n",
" stderr=log_file,\n",
" text=True,\n",
" )\n",
"\n",
" print(f\"Building and starting Llama Stack server with PID: {process.pid}\")\n",
" return process\n",
"\n",
"\n",
"def wait_for_server_to_start():\n",
" import requests\n",
" from requests.exceptions import ConnectionError\n",
"\n",
" url = \"http://0.0.0.0:8321/v1/health\"\n",
" max_retries = 30\n",
" retry_interval = 2\n",
"\n",
" print(\"Waiting for server to start\", end=\"\")\n",
" for _ in range(max_retries):\n",
" try:\n",
" response = requests.get(url)\n",
" if response.status_code == 200:\n",
" print(\"\\nServer is ready!\")\n",
" return True\n",
" except ConnectionError:\n",
" print(\".\", end=\"\", flush=True)\n",
" time.sleep(retry_interval)\n",
"\n",
" print(\"\\nServer failed to start after\", max_retries * retry_interval, \"seconds\")\n",
" return False\n",
"\n",
"\n",
"def kill_llama_stack_server():\n",
" # Kill any existing llama stack server processes using pkill command\n",
" os.system(\"pkill -f llama_stack.core.server.server\")\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "7f1494b7-938c-4338-9ae0-c463d2bc2eea",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Building and starting Llama Stack server with PID: 52433\n",
"Waiting for server to start........\n",
"Server is ready!\n"
]
}
],
"source": [
"server_process = run_llama_stack_server_background()\n",
"assert wait_for_server_to_start()"
]
},
{
"cell_type": "markdown",
"id": "0j5hag7l9x89",
"metadata": {
"id": "0j5hag7l9x89"
},
"source": [
"### 3. Initialize LlamaStack Client\n",
"\n",
"Create a client connection to the LlamaStack server with API key for Together provider.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "ab4eff97-4565-4c73-b1b3-0020a4c7e2a5",
"metadata": {
"id": "ab4eff97-4565-4c73-b1b3-0020a4c7e2a5"
},
"outputs": [],
"source": [
"from llama_stack_client import LlamaStackClient\n",
"\n",
"client = LlamaStackClient(\n",
" base_url=\"http://0.0.0.0:8321\",\n",
" provider_data={\"together_api_key\": os.environ[\"TOGETHER_API_KEY\"]},\n",
")"
]
},
{
"cell_type": "markdown",
"id": "vwhexjy1e8o",
"metadata": {
"id": "vwhexjy1e8o"
},
"source": [
"#### Explore Available Models \n",
"\n",
"Check what models are available through your LlamaStack instance."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "880443ef-ac3c-48b1-a80a-7dab5b25ac61",
"metadata": {
"id": "880443ef-ac3c-48b1-a80a-7dab5b25ac61",
"outputId": "0604e931-e280-44db-bce5-38373c0cbea8",
"scrolled": true
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: GET http://0.0.0.0:8321/v1/models \"HTTP/1.1 200 OK\"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Available models:\n",
"- bedrock/meta.llama3-1-8b-instruct-v1:0\n",
"- bedrock/meta.llama3-1-70b-instruct-v1:0\n",
"- bedrock/meta.llama3-1-405b-instruct-v1:0\n",
"- sentence-transformers/all-MiniLM-L6-v2\n",
"- together/Alibaba-NLP/gte-modernbert-base\n",
"- together/arcee-ai/AFM-4.5B\n",
"- together/arcee-ai/coder-large\n",
"- together/arcee-ai/maestro-reasoning\n",
"- together/arcee-ai/virtuoso-large\n",
"- together/arcee_ai/arcee-spotlight\n",
"- together/arize-ai/qwen-2-1.5b-instruct\n",
"- together/BAAI/bge-base-en-v1.5\n",
"- together/BAAI/bge-large-en-v1.5\n",
"- together/black-forest-labs/FLUX.1-dev\n",
"- together/black-forest-labs/FLUX.1-dev-lora\n",
"- together/black-forest-labs/FLUX.1-kontext-dev\n",
"- together/black-forest-labs/FLUX.1-kontext-max\n",
"- together/black-forest-labs/FLUX.1-kontext-pro\n",
"- together/black-forest-labs/FLUX.1-krea-dev\n",
"- together/black-forest-labs/FLUX.1-pro\n",
"- together/black-forest-labs/FLUX.1-schnell\n",
"- together/black-forest-labs/FLUX.1-schnell-Free\n",
"- together/black-forest-labs/FLUX.1.1-pro\n",
"- together/cartesia/sonic\n",
"- together/cartesia/sonic-2\n",
"- together/deepcogito/cogito-v2-preview-deepseek-671b\n",
"- together/deepcogito/cogito-v2-preview-llama-109B-MoE\n",
"- together/deepcogito/cogito-v2-preview-llama-405B\n",
"- together/deepcogito/cogito-v2-preview-llama-70B\n",
"- together/deepseek-ai/DeepSeek-R1\n",
"- together/deepseek-ai/DeepSeek-R1-0528-tput\n",
"- together/deepseek-ai/DeepSeek-R1-Distill-Llama-70B\n",
"- together/deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free\n",
"- together/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B\n",
"- together/deepseek-ai/DeepSeek-V3\n",
"- together/deepseek-ai/DeepSeek-V3.1\n",
"- together/google/gemma-3n-E4B-it\n",
"- together/intfloat/multilingual-e5-large-instruct\n",
"- together/lgai/exaone-3-5-32b-instruct\n",
"- together/lgai/exaone-deep-32b\n",
"- together/marin-community/marin-8b-instruct\n",
"- together/meta-llama/Llama-2-70b-hf\n",
"- together/meta-llama/Llama-3-70b-chat-hf\n",
"- together/meta-llama/Llama-3-70b-hf\n",
"- together/meta-llama/Llama-3.1-405B-Instruct\n",
"- together/meta-llama/Llama-3.2-1B-Instruct\n",
"- together/meta-llama/Llama-3.2-3B-Instruct-Turbo\n",
"- together/meta-llama/Llama-3.3-70B-Instruct-Turbo\n",
"- together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free\n",
"- together/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8\n",
"- together/meta-llama/Llama-4-Scout-17B-16E-Instruct\n",
"- together/meta-llama/Llama-Guard-3-11B-Vision-Turbo\n",
"- together/meta-llama/Llama-Guard-4-12B\n",
"- together/meta-llama/LlamaGuard-2-8b\n",
"- together/meta-llama/Meta-Llama-3-70B-Instruct-Turbo\n",
"- together/meta-llama/Meta-Llama-3-8B-Instruct\n",
"- together/meta-llama/Meta-Llama-3-8B-Instruct-Lite\n",
"- together/meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo\n",
"- together/meta-llama/Meta-Llama-3.1-70B-Instruct-Reference\n",
"- together/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo\n",
"- together/meta-llama/Meta-Llama-3.1-8B-Instruct-Reference\n",
"- together/meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo\n",
"- together/meta-llama/Meta-Llama-Guard-3-8B\n",
"- together/mistralai/Mistral-7B-Instruct-v0.1\n",
"- together/mistralai/Mistral-7B-Instruct-v0.2\n",
"- together/mistralai/Mistral-7B-Instruct-v0.3\n",
"- together/mistralai/Mistral-Small-24B-Instruct-2501\n",
"- together/mistralai/Mixtral-8x7B-Instruct-v0.1\n",
"- together/mixedbread-ai/Mxbai-Rerank-Large-V2\n",
"- together/moonshotai/Kimi-K2-Instruct\n",
"- together/moonshotai/Kimi-K2-Instruct-0905\n",
"- together/openai/gpt-oss-120b\n",
"- together/openai/gpt-oss-20b\n",
"- together/openai/whisper-large-v3\n",
"- together/Qwen/Qwen2.5-72B-Instruct\n",
"- together/Qwen/Qwen2.5-72B-Instruct-Turbo\n",
"- together/Qwen/Qwen2.5-7B-Instruct-Turbo\n",
"- together/Qwen/Qwen2.5-Coder-32B-Instruct\n",
"- together/Qwen/Qwen2.5-VL-72B-Instruct\n",
"- together/Qwen/Qwen3-235B-A22B-fp8-tput\n",
"- together/Qwen/Qwen3-235B-A22B-Instruct-2507-tput\n",
"- together/Qwen/Qwen3-235B-A22B-Thinking-2507\n",
"- together/Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8\n",
"- together/Qwen/Qwen3-Next-80B-A3B-Instruct\n",
"- together/Qwen/Qwen3-Next-80B-A3B-Thinking\n",
"- together/Qwen/QwQ-32B\n",
"- together/Salesforce/Llama-Rank-V1\n",
"- together/scb10x/scb10x-typhoon-2-1-gemma3-12b\n",
"- together/togethercomputer/m2-bert-80M-32k-retrieval\n",
"- together/togethercomputer/MoA-1\n",
"- together/togethercomputer/MoA-1-Turbo\n",
"- together/togethercomputer/Refuel-Llm-V2\n",
"- together/togethercomputer/Refuel-Llm-V2-Small\n",
"- together/Virtue-AI/VirtueGuard-Text-Lite\n",
"- together/zai-org/GLM-4.5-Air-FP8\n",
"----\n"
]
}
],
"source": [
"print(\"Available models:\")\n",
"for m in client.models.list():\n",
" print(f\"- {m.identifier}\")\n",
"\n",
"print(\"----\")"
]
},
{
"cell_type": "markdown",
"id": "b0f28603-3207-4157-b731-638d93cd82b5",
"metadata": {
"id": "b0f28603-3207-4157-b731-638d93cd82b5"
},
"source": [
"### 4. Vector Store Setup\n",
"\n",
"#### Create a Vector Store with File Upload\n",
"\n",
"Create a vector store using the OpenAI-compatible vector stores API:\n",
"\n",
"- **Vector Store**: OpenAI-compatible vector store for document storage\n",
"- **File Upload**: Automatic chunking and embedding of uploaded files\n",
"- **Embedding Model**: Sentence Transformers model for text embeddings\n",
"- **Dimensions**: 384-dimensional embeddings"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "0f241d81-19a7-451f-ac4e-2869a29300d1",
"metadata": {
"id": "0f241d81-19a7-451f-ac4e-2869a29300d1",
"outputId": "b2512715-a9e1-431e-88d4-378165a8ff8b"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/files \"HTTP/1.1 200 OK\"\n",
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/files \"HTTP/1.1 200 OK\"\n",
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/files \"HTTP/1.1 200 OK\"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"File(id='file-489db9aae0424745960e3408ff0f477f', bytes=41, created_at=1757540912, expires_at=1789076912, filename='shipping_policy.txt', object='file', purpose='assistants')\n",
"File(id='file-b2f38b0e164347f5a2b6bbe211e33ff3', bytes=48, created_at=1757540912, expires_at=1789076912, filename='returns_policy.txt', object='file', purpose='assistants')\n",
"File(id='file-6f6f157d165a4078b4abef66a095ccd6', bytes=45, created_at=1757540912, expires_at=1789076912, filename='support.txt', object='file', purpose='assistants')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/vector_stores \"HTTP/1.1 200 OK\"\n"
]
}
],
"source": [
"from io import BytesIO\n",
"\n",
"docs = [\n",
" (\"Acme ships globally in 3-5 business days.\", {\"title\": \"Shipping Policy\"}),\n",
" (\"Returns are accepted within 30 days of purchase.\", {\"title\": \"Returns Policy\"}),\n",
" (\"Support is available 24/7 via chat and email.\", {\"title\": \"Support\"}),\n",
"]\n",
"\n",
"file_ids = []\n",
"for content, metadata in docs:\n",
" with BytesIO(content.encode()) as file_buffer:\n",
" file_buffer.name = f\"{metadata['title'].replace(' ', '_').lower()}.txt\"\n",
" create_file_response = client.files.create(file=file_buffer, purpose=\"assistants\")\n",
" print(create_file_response)\n",
" file_ids.append(create_file_response.id)\n",
"\n",
"# Create vector store with files\n",
"vector_store = client.vector_stores.create(\n",
" name=\"acme_docs\",\n",
" file_ids=file_ids,\n",
" embedding_model=\"sentence-transformers/all-MiniLM-L6-v2\",\n",
" embedding_dimension=384,\n",
" provider_id=\"faiss\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "9061tmi1zpq",
"metadata": {
"id": "9061tmi1zpq"
},
"source": [
"#### Test Vector Search\n",
"\n",
"Query the vector store to verify it's working correctly. This performs semantic search to find relevant documents based on the query."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "4a5e010c-eeeb-4020-a957-74d6d1cba342",
"metadata": {
"id": "4a5e010c-eeeb-4020-a957-74d6d1cba342",
"outputId": "14e1fde5-38ae-4532-b53b-4a2970c09352"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/vector_stores/vs_dab05212-db05-402c-91ef-57e41797406b/search \"HTTP/1.1 200 OK\"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Acme ships globally in 3-5 business days.\n",
"Returns are accepted within 30 days of purchase.\n"
]
}
],
"source": [
"search_response = client.vector_stores.search(\n",
" vector_store_id=vector_store.id,\n",
" query=\"How long does shipping take?\",\n",
" max_num_results=2\n",
")\n",
"for result in search_response.data:\n",
" content = result.content[0].text\n",
" print(content)"
]
},
{
"cell_type": "markdown",
"id": "usne6mbspms",
"metadata": {
"id": "usne6mbspms"
},
"source": [
"### 5. CrewAI Integration\n",
"\n",
"#### Configure CrewAI with LlamaStack\n",
"\n",
"Set up CrewAI to use LlamaStack's OpenAI-compatible API:\n",
"\n",
"- **Base URL**: Points to LlamaStack's OpenAI endpoint\n",
"- **Headers**: Include Together AI API key for model access\n",
"- **Model**: Use Meta Llama 3.3 70B model via Together AI"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "c378bd10-09c2-417c-bdfc-1e0a2dd19084",
"metadata": {
"id": "c378bd10-09c2-417c-bdfc-1e0a2dd19084",
"outputId": "f7db1a39-097e-46db-ddef-e309930a4564"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: GET https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json \"HTTP/1.1 200 OK\"\n"
]
}
],
"source": [
"import os\n",
"from crewai.llm import LLM\n",
"\n",
"# Point LLM class to Llamastack Server\n",
"\n",
"llamastack_llm = LLM(\n",
" model=\"openai/together/meta-llama/Llama-3.3-70B-Instruct-Turbo\", # it's an openai-api compatible model\n",
" base_url=\"http://localhost:8321/v1/openai/v1\",\n",
" api_key = os.getenv(\"OPENAI_API_KEY\", \"dummy\"),\n",
")"
]
},
{
"cell_type": "markdown",
"id": "5a4ddpcuk3l",
"metadata": {
"id": "5a4ddpcuk3l"
},
"source": [
"#### Test LLM Connection\n",
"\n",
"Verify that CrewAI LLM can successfully communicate with the LlamaStack server."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "f88ffb5a-657b-4916-9375-c6ddc156c25e",
"metadata": {
"id": "f88ffb5a-657b-4916-9375-c6ddc156c25e",
"outputId": "f48443dc-19d2-440e-a24a-4a8fb8ab4725"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[92m14:49:56 - LiteLLM:INFO\u001b[0m: utils.py:3258 - \n",
"LiteLLM completion() model= together/meta-llama/Llama-3.3-70B-Instruct-Turbo; provider = openai\n",
"INFO:LiteLLM:\n",
"LiteLLM completion() model= together/meta-llama/Llama-3.3-70B-Instruct-Turbo; provider = openai\n",
"INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
"\u001b[92m14:50:01 - LiteLLM:INFO\u001b[0m: utils.py:1260 - Wrapper: Completed Call, calling success_handler\n",
"INFO:LiteLLM:Wrapper: Completed Call, calling success_handler\n"
]
},
{
"data": {
"text/plain": [
"\"In the Andes' gentle breeze, a llama's soft eyes gaze with peaceful ease, its fur a warm and fuzzy tease. With steps both gentle and serene, the llama roams, a symbol of calm, its beauty pure and supreme.\""
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Test llm with simple message\n",
"messages = [\n",
" {\"role\": \"system\", \"content\": \"You are a friendly assistant.\"},\n",
" {\"role\": \"user\", \"content\": \"Write a two-sentence poem about llama.\"},\n",
"]\n",
"llamastack_llm.call(messages)"
]
},
{
"cell_type": "markdown",
"id": "5f478686-aa7b-4631-a737-c2ea3c65a7c8",
"metadata": {
"id": "5f478686-aa7b-4631-a737-c2ea3c65a7c8"
},
"source": [
"#### Create CrewAI Custom Tool\n",
"\n",
"Define a custom CrewAI tool, `LlamaStackRAGTool`, to encapsulate the logic for querying the LlamaStack vector store. This tool will be used by the CrewAI agent to perform retrieval during the RAG process.\n",
"\n",
"- **Input Schema**: Defines the expected input parameters for the tool, such as the user query, the vector store ID, and optional parameters like `top_k`.\n",
"- **Tool Logic**: Implements the `_run` method, which takes the user query and vector store ID, calls the LlamaStack client's `vector_stores.search` method, and formats the retrieved documents into a human-readable string for the LLM to use as context."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "08de540f-ed47-405a-a9c5-16505f4c88c8",
"metadata": {
"id": "08de540f-ed47-405a-a9c5-16505f4c88c8"
},
"outputs": [],
"source": [
"from crewai.tools import BaseTool\n",
"from typing import Any, List, Optional, Type\n",
"from pydantic import BaseModel, Field\n",
"\n",
"# ---------- 1. Input schema ----------\n",
"class VectorStoreRAGToolInput(BaseModel):\n",
" \"\"\"Input schema for LlamaStackVectorStoreRAGTool.\"\"\"\n",
" query: str = Field(..., description=\"The user query for RAG search\")\n",
" vector_store_id: str = Field(...,\n",
" description=\"ID of the vector store to search inside the Llama-Stack server\",\n",
" )\n",
" top_k: Optional[int] = Field(\n",
" default=5,\n",
" description=\"How many documents to return\",\n",
" )\n",
" score_threshold: Optional[float] = Field(\n",
" default=None,\n",
" description=\"Optional similarity score cut-off (0-1).\",\n",
" )\n",
"\n",
"# ---------- 2. The tool ----------\n",
"class LlamaStackVectorStoreRAGTool(BaseTool):\n",
" name: str = \"Llama Stack Vector Store RAG tool\"\n",
" description: str = (\n",
" \"This tool calls a Llama-Stack endpoint for retrieval-augmented generation using a vector store. \"\n",
" \"It takes a natural-language query and returns the most relevant documents.\"\n",
" )\n",
" args_schema: Type[BaseModel] = VectorStoreRAGToolInput\n",
" client: Any\n",
" vector_store_id: str = \"\"\n",
" top_k: int = 5\n",
"\n",
" def _run(self, **kwargs: Any) -> str:\n",
" # 1. Resolve parameters (use instance defaults when not supplied)\n",
" query: str = kwargs.get(\"query\") # Required schema enforces presence\n",
" vector_store_id: str = kwargs.get(\"vector_store_id\", self.vector_store_id)\n",
" top_k: int = kwargs.get(\"top_k\", self.top_k)\n",
" if vector_store_id == \"\":\n",
" print('vector_store_id is empty, please specify which vector_store to search')\n",
" return \"No documents found.\"\n",
" # 2. Issue request to Llama-Stack\n",
" response = self.client.vector_stores.search(\n",
" vector_store_id=vector_store_id,\n",
" query=query,\n",
" max_num_results=top_k,\n",
" )\n",
"\n",
" # 3. Massage results into a single human-readable string\n",
" if not response or not response.data:\n",
" return \"No documents found.\"\n",
"\n",
" docs: List[str] = []\n",
" for result in response.data:\n",
" content = result.content[0].text if result.content else \"No content\"\n",
" filename = result.filename if result.filename else {}\n",
" docs.append(f\"filename: {filename}, content: {content}\")\n",
" return \"\\n\".join(docs)\n"
]
},
{
"cell_type": "markdown",
"id": "0xh0jg6a0l4a",
"metadata": {
"id": "0xh0jg6a0l4a"
},
"source": [
"### 6. Building the RAG tool\n",
"\n",
"#### Create a Complete RAG Pipeline\n",
"\n",
"Construct a CrewAI pipeline that orchestrates the RAG process. This pipeline includes:\n",
"\n",
"1. **Agent Definition**: Defining a CrewAI agent with a specific role (`RAG assistant`), goal, backstory, and the LlamaStack LLM and the custom RAG tool.\n",
"2. **Task Definition**: Defining a CrewAI task for the agent to perform. The task description includes placeholders for the user query and vector store ID, which will be provided during execution. The task's expected output is an answer to the question based on the retrieved context.\n",
"3. **Crew Definition**: Creating a CrewAI `Crew` object with the defined task and agent. This crew represents the complete RAG pipeline.\n",
"\n",
"**CrewAI workflow**:\n",
"`User Query → CrewAI Task → Agent invokes LlamaStackRAGTool → LlamaStack Vector Search → Retrieved Context → Agent uses Context + Question → LLM Generation → Final Response`"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "9684427d-dcc7-4544-9af5-8b110d014c42",
"metadata": {
"id": "9684427d-dcc7-4544-9af5-8b110d014c42"
},
"outputs": [],
"source": [
"from crewai import Agent, Crew, Task, Process\n",
"\n",
"# ---- 3. Define the agent -----------------------------------------\n",
"agent = Agent(\n",
" role=\"RAG assistant\",\n",
" goal=\"Answer user's question with provided context\",\n",
" backstory=\"You are an experienced search assistant specializing in finding relevant information from documentation and vector_db to answer user questions accurately.\",\n",
" allow_delegation=False,\n",
" llm=llamastack_llm,\n",
" tools=[LlamaStackVectorStoreRAGTool(client=client)])\n",
"# ---- 4. Wrap everything in a Crew task ---------------------------\n",
"task = Task(\n",
" description=\"Answer the following questions: {query}, using the RAG_tool to search the provided vector_store_id {vector_store_id} if needed\",\n",
" expected_output=\"An answer to the question with provided context\",\n",
" agent=agent,\n",
")\n",
"crew = Crew(tasks=[task], verbose=True)\n"
]
},
{
"cell_type": "markdown",
"id": "0onu6rhphlra",
"metadata": {
"id": "0onu6rhphlra"
},
"source": [
"### 7. Testing the RAG System\n",
"\n",
"#### Example 1: Shipping Query"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "03322188-9509-446a-a4a8-ce3bb83ec87c",
"metadata": {
"colab": {
"referenced_widgets": [
"39eb50b3c96244cf9c82043c0a359d8a"
]
},
"id": "03322188-9509-446a-a4a8-ce3bb83ec87c",
"outputId": "ddc3a70d-c0f3-484f-8469-9362e44d8831"
},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008080; text-decoration-color: #008080\">╭──────────────────────────────────────────── Crew Execution Started ─────────────────────────────────────────────╮</span>\n",
"<span style=\"color: #008080; text-decoration-color: #008080\">│</span> <span style=\"color: #008080; text-decoration-color: #008080\">│</span>\n",
"<span style=\"color: #008080; text-decoration-color: #008080\">│</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">Crew Execution Started</span> <span style=\"color: #008080; text-decoration-color: #008080\">│</span>\n",
"<span style=\"color: #008080; text-decoration-color: #008080\">│</span> <span style=\"color: #c0c0c0; text-decoration-color: #c0c0c0\">Name: </span><span style=\"color: #008080; text-decoration-color: #008080\">crew</span> <span style=\"color: #008080; text-decoration-color: #008080\">│</span>\n",
"<span style=\"color: #008080; text-decoration-color: #008080\">│</span> <span style=\"color: #c0c0c0; text-decoration-color: #c0c0c0\">ID: </span><span style=\"color: #008080; text-decoration-color: #008080\">091cf919-5c4b-4168-ac49-65fe5e8faa9e</span> <span style=\"color: #008080; text-decoration-color: #008080\">│</span>\n",
"<span style=\"color: #008080; text-decoration-color: #008080\">│</span> <span style=\"color: #c0c0c0; text-decoration-color: #c0c0c0\">Tool Args: </span> <span style=\"color: #008080; text-decoration-color: #008080\">│</span>\n",
"<span style=\"color: #008080; text-decoration-color: #008080\">│</span> <span style=\"color: #008080; text-decoration-color: #008080\">│</span>\n",
"<span style=\"color: #008080; text-decoration-color: #008080\">│</span> <span style=\"color: #008080; text-decoration-color: #008080\">│</span>\n",
"<span style=\"color: #008080; text-decoration-color: #008080\">╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯</span>\n",
"</pre>\n"
],
"text/plain": [
"\u001b[36m╭─\u001b[0m\u001b[36m───────────────────────────────────────────\u001b[0m\u001b[36m Crew Execution Started \u001b[0m\u001b[36m────────────────────────────────────────────\u001b[0m\u001b[36m─╮\u001b[0m\n",
"\u001b[36m│\u001b[0m \u001b[36m│\u001b[0m\n",
"\u001b[36m│\u001b[0m \u001b[1;36mCrew Execution Started\u001b[0m \u001b[36m│\u001b[0m\n",
"\u001b[36m│\u001b[0m \u001b[37mName: \u001b[0m\u001b[36mcrew\u001b[0m \u001b[36m│\u001b[0m\n",
"\u001b[36m│\u001b[0m \u001b[37mID: \u001b[0m\u001b[36m091cf919-5c4b-4168-ac49-65fe5e8faa9e\u001b[0m \u001b[36m│\u001b[0m\n",
"\u001b[36m│\u001b[0m \u001b[37mTool Args: \u001b[0m \u001b[36m│\u001b[0m\n",
"\u001b[36m│\u001b[0m \u001b[36m│\u001b[0m\n",
"\u001b[36m│\u001b[0m \u001b[36m│\u001b[0m\n",
"\u001b[36m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
"</pre>\n"
],
"text/plain": [
"\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "cb8f60c158fb4a0496e78e4d596ac4c8",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Output()"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[92m14:55:09 - LiteLLM:INFO\u001b[0m: utils.py:3258 - \n",
"LiteLLM completion() model= together/meta-llama/Llama-3.3-70B-Instruct-Turbo; provider = openai\n",
"INFO:LiteLLM:\n",
"LiteLLM completion() model= together/meta-llama/Llama-3.3-70B-Instruct-Turbo; provider = openai\n",
"INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
"\u001b[92m14:55:11 - LiteLLM:INFO\u001b[0m: utils.py:1260 - Wrapper: Completed Call, calling success_handler\n",
"INFO:LiteLLM:Wrapper: Completed Call, calling success_handler\n"
]
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">{'query': 'How long does shipping take?', 'vector_store_id': 'vs_dab05212-db05-402c-91ef-57e41797406b', 'top_k': 1,\n",
"'score_threshold': 0.0}\n",
"</pre>\n"
],
"text/plain": [
"{'query': 'How long does shipping take?', 'vector_store_id': 'vs_dab05212-db05-402c-91ef-57e41797406b', 'top_k': 1,\n",
"'score_threshold': 0.0}\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/vector_stores/vs_dab05212-db05-402c-91ef-57e41797406b/search \"HTTP/1.1 200 OK\"\n",
"\u001b[92m14:55:11 - LiteLLM:INFO\u001b[0m: utils.py:3258 - \n",
"LiteLLM completion() model= together/meta-llama/Llama-3.3-70B-Instruct-Turbo; provider = openai\n",
"INFO:LiteLLM:\n",
"LiteLLM completion() model= together/meta-llama/Llama-3.3-70B-Instruct-Turbo; provider = openai\n",
"INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
"\u001b[92m14:55:12 - LiteLLM:INFO\u001b[0m: utils.py:1260 - Wrapper: Completed Call, calling success_handler\n",
"INFO:LiteLLM:Wrapper: Completed Call, calling success_handler\n"
]
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
],
"text/plain": []
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008000; text-decoration-color: #008000\">╭──────────────────────────────────────────────── Task Completion ────────────────────────────────────────────────╮</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Task Completed</span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #c0c0c0; text-decoration-color: #c0c0c0\">Name: </span><span style=\"color: #008000; text-decoration-color: #008000\">cf3f4f08-744c-4aee-9387-e9eb70624fc1</span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #c0c0c0; text-decoration-color: #c0c0c0\">Agent: </span><span style=\"color: #008000; text-decoration-color: #008000\">RAG assistant</span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #c0c0c0; text-decoration-color: #c0c0c0\">Tool Args: </span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯</span>\n",
"</pre>\n"
],
"text/plain": [
"\u001b[32m╭─\u001b[0m\u001b[32m───────────────────────────────────────────────\u001b[0m\u001b[32m Task Completion \u001b[0m\u001b[32m───────────────────────────────────────────────\u001b[0m\u001b[32m─╮\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[1;32mTask Completed\u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[37mName: \u001b[0m\u001b[32mcf3f4f08-744c-4aee-9387-e9eb70624fc1\u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[37mAgent: \u001b[0m\u001b[32mRAG assistant\u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[37mTool Args: \u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
"</pre>\n"
],
"text/plain": [
"\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008000; text-decoration-color: #008000\">╭──────────────────────────────────────────────── Crew Completion ────────────────────────────────────────────────╮</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Crew Execution Completed</span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #c0c0c0; text-decoration-color: #c0c0c0\">Name: </span><span style=\"color: #008000; text-decoration-color: #008000\">crew</span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #c0c0c0; text-decoration-color: #c0c0c0\">ID: </span><span style=\"color: #008000; text-decoration-color: #008000\">091cf919-5c4b-4168-ac49-65fe5e8faa9e</span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #c0c0c0; text-decoration-color: #c0c0c0\">Tool Args: </span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #c0c0c0; text-decoration-color: #c0c0c0\">Final Output: Acme ships globally in 3-5 business days.</span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯</span>\n",
"</pre>\n"
],
"text/plain": [
"\u001b[32m╭─\u001b[0m\u001b[32m───────────────────────────────────────────────\u001b[0m\u001b[32m Crew Completion \u001b[0m\u001b[32m───────────────────────────────────────────────\u001b[0m\u001b[32m─╮\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[1;32mCrew Execution Completed\u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[37mName: \u001b[0m\u001b[32mcrew\u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[37mID: \u001b[0m\u001b[32m091cf919-5c4b-4168-ac49-65fe5e8faa9e\u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[37mTool Args: \u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[37mFinal Output: Acme ships globally in 3-5 business days.\u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
"</pre>\n"
],
"text/plain": [
"\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"❓ How long does shipping take?\n",
"💡 Acme ships globally in 3-5 business days.\n"
]
}
],
"source": [
"query = \"How long does shipping take?\"\n",
"response = crew.kickoff(inputs={\"query\": query,\"vector_store_id\": vector_store.id})\n",
"print(\"❓\", query)\n",
"print(\"💡\", response)"
]
},
{
"cell_type": "markdown",
"id": "b7krhqj88ku",
"metadata": {
"id": "b7krhqj88ku"
},
"source": [
"#### Example 2: Returns Policy Query"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "61995550-bb0b-46a8-a5d0-023207475d60",
"metadata": {
"colab": {
"referenced_widgets": [
"1d575307e41d46f7943746d4380d08bb"
]
},
"id": "61995550-bb0b-46a8-a5d0-023207475d60",
"outputId": "a039ab06-a541-48f9-a66d-6cef17911814"
},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008080; text-decoration-color: #008080\">╭──────────────────────────────────────────── Crew Execution Started ─────────────────────────────────────────────╮</span>\n",
"<span style=\"color: #008080; text-decoration-color: #008080\">│</span> <span style=\"color: #008080; text-decoration-color: #008080\">│</span>\n",
"<span style=\"color: #008080; text-decoration-color: #008080\">│</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">Crew Execution Started</span> <span style=\"color: #008080; text-decoration-color: #008080\">│</span>\n",
"<span style=\"color: #008080; text-decoration-color: #008080\">│</span> <span style=\"color: #c0c0c0; text-decoration-color: #c0c0c0\">Name: </span><span style=\"color: #008080; text-decoration-color: #008080\">crew</span> <span style=\"color: #008080; text-decoration-color: #008080\">│</span>\n",
"<span style=\"color: #008080; text-decoration-color: #008080\">│</span> <span style=\"color: #c0c0c0; text-decoration-color: #c0c0c0\">ID: </span><span style=\"color: #008080; text-decoration-color: #008080\">091cf919-5c4b-4168-ac49-65fe5e8faa9e</span> <span style=\"color: #008080; text-decoration-color: #008080\">│</span>\n",
"<span style=\"color: #008080; text-decoration-color: #008080\">│</span> <span style=\"color: #c0c0c0; text-decoration-color: #c0c0c0\">Tool Args: </span> <span style=\"color: #008080; text-decoration-color: #008080\">│</span>\n",
"<span style=\"color: #008080; text-decoration-color: #008080\">│</span> <span style=\"color: #008080; text-decoration-color: #008080\">│</span>\n",
"<span style=\"color: #008080; text-decoration-color: #008080\">│</span> <span style=\"color: #008080; text-decoration-color: #008080\">│</span>\n",
"<span style=\"color: #008080; text-decoration-color: #008080\">╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯</span>\n",
"</pre>\n"
],
"text/plain": [
"\u001b[36m╭─\u001b[0m\u001b[36m───────────────────────────────────────────\u001b[0m\u001b[36m Crew Execution Started \u001b[0m\u001b[36m────────────────────────────────────────────\u001b[0m\u001b[36m─╮\u001b[0m\n",
"\u001b[36m│\u001b[0m \u001b[36m│\u001b[0m\n",
"\u001b[36m│\u001b[0m \u001b[1;36mCrew Execution Started\u001b[0m \u001b[36m│\u001b[0m\n",
"\u001b[36m│\u001b[0m \u001b[37mName: \u001b[0m\u001b[36mcrew\u001b[0m \u001b[36m│\u001b[0m\n",
"\u001b[36m│\u001b[0m \u001b[37mID: \u001b[0m\u001b[36m091cf919-5c4b-4168-ac49-65fe5e8faa9e\u001b[0m \u001b[36m│\u001b[0m\n",
"\u001b[36m│\u001b[0m \u001b[37mTool Args: \u001b[0m \u001b[36m│\u001b[0m\n",
"\u001b[36m│\u001b[0m \u001b[36m│\u001b[0m\n",
"\u001b[36m│\u001b[0m \u001b[36m│\u001b[0m\n",
"\u001b[36m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
"</pre>\n"
],
"text/plain": [
"\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "60b83042bfc14a75b555537d13147372",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Output()"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[92m14:55:19 - LiteLLM:INFO\u001b[0m: utils.py:3258 - \n",
"LiteLLM completion() model= together/meta-llama/Llama-3.3-70B-Instruct-Turbo; provider = openai\n",
"INFO:LiteLLM:\n",
"LiteLLM completion() model= together/meta-llama/Llama-3.3-70B-Instruct-Turbo; provider = openai\n",
"INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
"\u001b[92m14:55:21 - LiteLLM:INFO\u001b[0m: utils.py:1260 - Wrapper: Completed Call, calling success_handler\n",
"INFO:LiteLLM:Wrapper: Completed Call, calling success_handler\n"
]
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">{'query': 'return policy after 40 days', 'vector_store_id': 'vs_dab05212-db05-402c-91ef-57e41797406b', 'top_k': 1, \n",
"'score_threshold': 0.5}\n",
"</pre>\n"
],
"text/plain": [
"{'query': 'return policy after 40 days', 'vector_store_id': 'vs_dab05212-db05-402c-91ef-57e41797406b', 'top_k': 1, \n",
"'score_threshold': 0.5}\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/vector_stores/vs_dab05212-db05-402c-91ef-57e41797406b/search \"HTTP/1.1 200 OK\"\n",
"\u001b[92m14:55:22 - LiteLLM:INFO\u001b[0m: utils.py:3258 - \n",
"LiteLLM completion() model= together/meta-llama/Llama-3.3-70B-Instruct-Turbo; provider = openai\n",
"INFO:LiteLLM:\n",
"LiteLLM completion() model= together/meta-llama/Llama-3.3-70B-Instruct-Turbo; provider = openai\n",
"INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
"\u001b[92m14:55:22 - LiteLLM:INFO\u001b[0m: utils.py:1260 - Wrapper: Completed Call, calling success_handler\n",
"INFO:LiteLLM:Wrapper: Completed Call, calling success_handler\n"
]
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
],
"text/plain": []
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008000; text-decoration-color: #008000\">╭──────────────────────────────────────────────── Task Completion ────────────────────────────────────────────────╮</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Task Completed</span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #c0c0c0; text-decoration-color: #c0c0c0\">Name: </span><span style=\"color: #008000; text-decoration-color: #008000\">cf3f4f08-744c-4aee-9387-e9eb70624fc1</span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #c0c0c0; text-decoration-color: #c0c0c0\">Agent: </span><span style=\"color: #008000; text-decoration-color: #008000\">RAG assistant</span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #c0c0c0; text-decoration-color: #c0c0c0\">Tool Args: </span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯</span>\n",
"</pre>\n"
],
"text/plain": [
"\u001b[32m╭─\u001b[0m\u001b[32m───────────────────────────────────────────────\u001b[0m\u001b[32m Task Completion \u001b[0m\u001b[32m───────────────────────────────────────────────\u001b[0m\u001b[32m─╮\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[1;32mTask Completed\u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[37mName: \u001b[0m\u001b[32mcf3f4f08-744c-4aee-9387-e9eb70624fc1\u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[37mAgent: \u001b[0m\u001b[32mRAG assistant\u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[37mTool Args: \u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
"</pre>\n"
],
"text/plain": [
"\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008000; text-decoration-color: #008000\">╭──────────────────────────────────────────────── Crew Completion ────────────────────────────────────────────────╮</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Crew Execution Completed</span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #c0c0c0; text-decoration-color: #c0c0c0\">Name: </span><span style=\"color: #008000; text-decoration-color: #008000\">crew</span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #c0c0c0; text-decoration-color: #c0c0c0\">ID: </span><span style=\"color: #008000; text-decoration-color: #008000\">091cf919-5c4b-4168-ac49-65fe5e8faa9e</span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #c0c0c0; text-decoration-color: #c0c0c0\">Tool Args: </span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #c0c0c0; text-decoration-color: #c0c0c0\">Final Output: Returns are accepted within 30 days of purchase.</span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">│</span> <span style=\"color: #008000; text-decoration-color: #008000\">│</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000\">╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯</span>\n",
"</pre>\n"
],
"text/plain": [
"\u001b[32m╭─\u001b[0m\u001b[32m───────────────────────────────────────────────\u001b[0m\u001b[32m Crew Completion \u001b[0m\u001b[32m───────────────────────────────────────────────\u001b[0m\u001b[32m─╮\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[1;32mCrew Execution Completed\u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[37mName: \u001b[0m\u001b[32mcrew\u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[37mID: \u001b[0m\u001b[32m091cf919-5c4b-4168-ac49-65fe5e8faa9e\u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[37mTool Args: \u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[37mFinal Output: Returns are accepted within 30 days of purchase.\u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m│\u001b[0m \u001b[32m│\u001b[0m\n",
"\u001b[32m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
"</pre>\n"
],
"text/plain": [
"\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"❓ Can I return a product after 40 days?\n",
"💡 Returns are accepted within 30 days of purchase.\n"
]
}
],
"source": [
"query = \"Can I return a product after 40 days?\"\n",
"response = crew.kickoff(inputs={\"query\": query,\"vector_store_id\": vector_store.id})\n",
"print(\"❓\", query)\n",
"print(\"💡\", response)"
]
},
{
"cell_type": "markdown",
"id": "h4w24fadvjs",
"metadata": {
"id": "h4w24fadvjs"
},
"source": [
"---\n",
"\n",
"We have successfully built a RAG system that combines:\n",
"\n",
"- **LlamaStack** for infrastructure (LLM serving + vector store)\n",
"- **CrewAI** for orchestration (agents, tasks, and tools)\n",
"- **Together AI** for high-quality language models\n",
"\n",
"### Key Benefits\n",
"\n",
"1. **Unified Infrastructure**: A single server for LLMs and vector stores simplifies deployment and management.\n",
"2. **OpenAI Compatibility**: Enables easy integration with existing libraries and frameworks that support the OpenAI API standard, such as CrewAI.\n",
"3. **Multi-Provider Support**: Offers the flexibility to switch between different LLM and embedding providers without altering the core application logic.\n",
"4. **Production Ready**: LlamaStack includes features designed for production environments, such as built-in safety shields and monitoring capabilities.\n",
"\n",
"\n",
"##### 🔧 Cleanup\n",
"\n",
"Remember to stop the LlamaStack server process when you are finished to free up resources. You can use the `kill_llama_stack_server()` helper function defined earlier in the notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a21270b4-b0a7-4481-96a5-044f908de363",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}