Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-10-22 00:13:08 +00:00)

Merge e0dda3bb06 into sapling-pr-archive-ehhuang (commit 043b9d93cd)

27 changed files with 7930 additions and 7743 deletions
@@ -92,7 +92,7 @@ As more providers start supporting Llama 4, you can use them in Llama Stack as well.

To try Llama Stack locally, run:

```bash
-curl -LsSf https://github.com/meta-llama/llama-stack/raw/main/scripts/install.sh | bash
+curl -LsSf https://github.com/llamastack/llama-stack/raw/main/scripts/install.sh | bash
```

### Overview
@@ -51,8 +51,9 @@ device: cpu

You can access the HuggingFace trainer via the `starter` distribution:

```bash
-llama stack build --distro starter --image-type venv
-llama stack run ~/.llama/distributions/starter/starter-run.yaml
+uv pip install llama-stack
+llama stack list-deps starter | xargs -L1 uv pip install
+llama stack run starter
```

### Usage Example
@@ -175,8 +175,8 @@ llama-stack-client benchmarks register \

**1. Start the Llama Stack API Server**

```bash
# Build and run a distribution (example: together)
-llama stack build --distro together --image-type venv
+uv pip install llama-stack
+llama stack list-deps together | xargs -L1 uv pip install
llama stack run together
```
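Once a distribution is running, it can be sanity-checked before registering benchmarks; a minimal probe, assuming the default port 8321 and the `/v1/health` route that appears later in this diff:

```bash
# Poll the health endpoint; a 200 response means the stack is ready
curl -sf http://localhost:8321/v1/health && echo "server is ready"
```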
@@ -209,7 +209,8 @@ The playground works with any Llama Stack distribution. Popular options include:

<TabItem value="together" label="Together AI">

```bash
-llama stack build --distro together --image-type venv
+uv pip install llama-stack
+llama stack list-deps together | xargs -L1 uv pip install
llama stack run together
```
@@ -222,7 +223,8 @@ llama stack run together

<TabItem value="ollama" label="Ollama (Local)">

```bash
-llama stack build --distro ollama --image-type venv
+uv pip install llama-stack
+llama stack list-deps ollama | xargs -L1 uv pip install
llama stack run ollama
```
@@ -235,7 +237,8 @@ llama stack run ollama

<TabItem value="meta-reference" label="Meta Reference">

```bash
-llama stack build --distro meta-reference --image-type venv
+uv pip install llama-stack
+llama stack list-deps meta-reference | xargs -L1 uv pip install
llama stack run meta-reference
```
@@ -20,7 +20,9 @@ RAG enables your applications to reference and recall information from external

In one terminal, start the Llama Stack server:

```bash
-uv run llama stack build --distro starter --image-type venv --run
+uv pip install llama-stack
+llama stack list-deps starter | xargs -L1 uv pip install
+llama stack run starter
```

### 2. Connect with OpenAI Client
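The section that follows connects through the OpenAI-compatible surface; a minimal sketch, assuming the server's OpenAI-compatible routes live under `/v1/openai/v1` (an assumption, not shown in this hunk):

```python
from openai import OpenAI

# Point the stock OpenAI client at the local Llama Stack server;
# the route prefix below is an assumption, adjust for your version.
client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

for model in client.models.list():
    print(model.id)
```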
@@ -67,7 +67,7 @@ def get_base_url(self) -> str:

## Testing the Provider

-Before running tests, you must have required dependencies installed. This depends on the providers or distributions you are testing. For example, if you are testing the `together` distribution, you should install dependencies via `llama stack build --distro together`.
+Before running tests, you must have the required dependencies installed. This depends on the providers or distributions you are testing. For example, if you are testing the `together` distribution, install its dependencies with `llama stack list-deps together | xargs -L1 uv pip install`.

### 1. Integration Testing
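After the dependencies are in place, the integration suite can be pointed at the distribution; a sketch, assuming the repository's pytest-based integration tests and a `--stack-config` option (both assumptions here):

```bash
# Install the distribution's dependencies, then run one integration suite against it
llama stack list-deps together | xargs -L1 uv pip install
pytest -sv tests/integration/inference --stack-config=together
```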
@@ -12,7 +12,7 @@ This avoids the overhead of setting up a server.

```bash
# setup
uv pip install llama-stack
-llama stack build --distro starter --image-type venv
+llama stack list-deps starter | xargs -L1 uv pip install
```

```python
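The truncated Python block above constructs the in-process (library) client; a minimal sketch, assuming a `LlamaStackAsLibraryClient` entry point under `llama_stack.core` (the package name is inferred from the `pkill` target elsewhere in this diff, so treat the import path as an assumption):

```python
# In-process client: runs the stack inside the current interpreter, no server needed
from llama_stack.core.library_client import LlamaStackAsLibraryClient  # assumed path

client = LlamaStackAsLibraryClient("starter")
client.initialize()
print([m.identifier for m in client.models.list()])
```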
@@ -59,7 +59,7 @@ Start a Llama Stack server on localhost. Here is an example of how you can do this:
uv venv starter --python 3.12
source starter/bin/activate  # On Windows: starter\Scripts\activate
pip install --no-cache llama-stack==0.2.2
-llama stack build --distro starter --image-type venv
+llama stack list-deps starter | xargs -L1 uv pip install
export FIREWORKS_API_KEY=<SOME_KEY>
llama stack run starter --port 5050
```
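With the server listening on port 5050, a quick connectivity check from Python, using the `LlamaStackClient` calls that appear in the notebooks later in this diff:

```python
from llama_stack_client import LlamaStackClient

# Match the --port passed to `llama stack run` above
client = LlamaStackClient(base_url="http://localhost:5050")
print([m.identifier for m in client.models.list()])
```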
@@ -166,10 +166,11 @@ docker run \

### Via venv

-Make sure you have done `pip install llama-stack` and have the Llama Stack CLI available.
+Install the package and distribution dependencies before launching:

```bash
-llama stack build --distro dell --image-type venv
+uv pip install llama-stack
+llama stack list-deps dell | xargs -L1 uv pip install
INFERENCE_MODEL=$INFERENCE_MODEL \
DEH_URL=$DEH_URL \
CHROMA_URL=$CHROMA_URL \
@@ -81,10 +81,11 @@ docker run \

### Via venv

-Make sure you have done `uv pip install llama-stack` and have the Llama Stack CLI available.
+Install the package and this distribution's dependencies into your active virtualenv:

```bash
-llama stack build --distro meta-reference-gpu --image-type venv
+uv pip install llama-stack
+llama stack list-deps meta-reference-gpu | xargs -L1 uv pip install
INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
llama stack run distributions/meta-reference-gpu/run.yaml \
  --port 8321
@@ -136,11 +136,12 @@ docker run \

### Via venv

-If you've set up your local development environment, you can also build the image using your local virtual environment.
+If you've set up your local development environment, you can install this distribution into your virtualenv:

```bash
+uv pip install llama-stack
+llama stack list-deps nvidia | xargs -L1 uv pip install
INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
-llama stack build --distro nvidia --image-type venv
NVIDIA_API_KEY=$NVIDIA_API_KEY \
INFERENCE_MODEL=$INFERENCE_MODEL \
llama stack run ./run.yaml \
@@ -240,6 +240,6 @@ additional_pip_packages:
- sqlalchemy[asyncio]
```

-No other steps are required other than `llama stack build` and `llama stack run`. The build process will use `module` to install all of the provider dependencies, retrieve the spec, etc.
+No other steps are required beyond installing dependencies with `llama stack list-deps <distro> | xargs -L1 uv pip install` and then running `llama stack run`. The CLI will use `module` to install the provider dependencies, retrieve the spec, etc.

The provider will now be available in Llama Stack with the type `remote::ramalama`.
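For orientation, an external provider of this kind is wired into a distribution config through its Python module; a hypothetical excerpt (only `module` and the `remote::ramalama` type come from the text above, the other field names are illustrative):

```yaml
# run.yaml (illustrative excerpt)
providers:
  inference:
  - provider_id: ramalama
    provider_type: remote::ramalama
    module: ramalama_stack   # pip-installable module the CLI resolves the provider from
    config: {}
```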
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
@@ -2864,7 +2864,8 @@
    }
   ],
   "source": [
-    "!llama stack build --distro experimental-post-training --image-type venv --image-name __system__"
+    "!uv pip install llama-stack\n",
+    "llama stack list-deps experimental-post-training | xargs -L1 uv pip install --image-name __system__\n"
   ]
  },
  {
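Note that `--image-name __system__` is a flag of the removed `llama stack build` command that was carried into the new `uv pip install` line as rendered; the presumable intent of the replacement cell is simply:

```bash
# Presumed intent: install the package, then the distribution's dependencies
uv pip install llama-stack
llama stack list-deps experimental-post-training | xargs -L1 uv pip install
```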
@@ -6365,4 +6366,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 0
-}
\ No newline at end of file
+}
@@ -38,7 +38,8 @@
   "source": [
    "# NBVAL_SKIP\n",
-    "!pip install -U llama-stack\n",
-    "!UV_SYSTEM_PYTHON=1 llama stack build --distro fireworks --image-type venv"
+    "!UV_SYSTEM_PYTHON=1 uv pip install llama-stack\n",
+    "llama stack list-deps fireworks | xargs -L1 uv pip install\n"
   ]
  },
  {
@@ -3531,4 +3532,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 2
-}
\ No newline at end of file
+}
File diff suppressed because it is too large
@@ -136,7 +136,9 @@
    "    \"\"\"Build and run LlamaStack server in one step using --run flag\"\"\"\n",
    "    log_file = open(\"llama_stack_server.log\", \"w\")\n",
    "    process = subprocess.Popen(\n",
-    "        \"uv run --with llama-stack llama stack build --distro starter --image-type venv --run\",\n",
+    "        \"uv pip install llama-stack\n",
+    "llama stack list-deps starter | xargs -L1 uv pip install\n",
+    "llama stack run starter --image-type venv --run\",\n",
    "        shell=True,\n",
    "        stdout=log_file,\n",
    "        stderr=log_file,\n",
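As rendered, the three added lines splice one Python string literal across physical newlines, which the Python parser would reject; a cleaned-up sketch of the same idea (dependencies installed up front, the server then launched via `subprocess`), reusing names from the surrounding cell:

```python
import subprocess

def run_llama_stack_server_background():
    """Launch the Llama Stack server in the background, logging to a file."""
    # Assumes dependencies were installed beforehand:
    #   uv pip install llama-stack
    #   llama stack list-deps starter | xargs -L1 uv pip install
    log_file = open("llama_stack_server.log", "w")
    process = subprocess.Popen(
        "llama stack run starter",
        shell=True,
        stdout=log_file,
        stderr=log_file,
        text=True,
    )
    print(f"Starting Llama Stack server with PID: {process.pid}")
    return process
```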
@@ -172,7 +174,7 @@
    "\n",
    "def kill_llama_stack_server():\n",
    "    # Kill any existing llama stack server processes using pkill command\n",
-    "    os.system(\"pkill -f llama_stack.core.server.server\")"
+    "    os.system(\"pkill -f llama_stack.core.server.server\")\n"
   ]
  },
  {
@@ -1261,4 +1263,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 5
-}
\ No newline at end of file
+}
@@ -105,7 +105,9 @@
    "    \"\"\"Build and run LlamaStack server in one step using --run flag\"\"\"\n",
    "    log_file = open(\"llama_stack_server.log\", \"w\")\n",
    "    process = subprocess.Popen(\n",
-    "        \"uv run --with llama-stack llama stack build --distro starter --image-type venv --run\",\n",
+    "        \"uv pip install llama-stack\n",
+    "llama stack list-deps starter | xargs -L1 uv pip install\n",
+    "llama stack run starter --image-type venv --run\",\n",
    "        shell=True,\n",
    "        stdout=log_file,\n",
    "        stderr=log_file,\n",
@@ -141,7 +143,7 @@
    "\n",
    "def kill_llama_stack_server():\n",
    "    # Kill any existing llama stack server processes using pkill command\n",
-    "    os.system(\"pkill -f llama_stack.core.server.server\")"
+    "    os.system(\"pkill -f llama_stack.core.server.server\")\n"
   ]
  },
  {
@@ -698,4 +700,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 5
-}
\ No newline at end of file
+}
@@ -91,9 +91,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "```bash\n",
-    "LLAMA_STACK_DIR=$(pwd) llama stack build --distro nvidia --image-type venv\n",
-    "```"
+    "```bash\nuv pip install llama-stack\nllama stack list-deps nvidia | xargs -L1 uv pip install\n```\n"
   ]
  },
  {
@@ -1682,4 +1680,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 2
-}
\ No newline at end of file
+}
@@ -80,9 +80,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "```bash\n",
-    "LLAMA_STACK_DIR=$(pwd) llama stack build --distro nvidia --image-type venv\n",
-    "```"
+    "```bash\nuv pip install llama-stack\nllama stack list-deps nvidia | xargs -L1 uv pip install\n```\n"
   ]
  },
  {
@@ -592,4 +590,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 2
-}
\ No newline at end of file
+}
@@ -1,366 +1,368 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "c1e7571c",
   "metadata": {
    "id": "c1e7571c"
   },
   "source": [
    "[](https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb)\n",
    "\n",
    "# Llama Stack - Building AI Applications\n",
    "\n",
    "<img src=\"https://llamastack.github.io/latest/_images/llama-stack.png\" alt=\"drawing\" width=\"500\"/>\n",
    "\n",
    "Get started with Llama Stack in minutes!\n",
    "\n",
    "[Llama Stack](https://github.com/meta-llama/llama-stack) is a stateful service with REST APIs to support the seamless transition of AI applications across different environments. You can build and test using a local server first and deploy to a hosted endpoint for production.\n",
    "\n",
    "In this guide, we'll walk through how to build a RAG application locally using Llama Stack with [Ollama](https://ollama.com/)\n",
    "as the inference [provider](docs/source/providers/index.md#inference) for a Llama Model.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4CV1Q19BDMVw",
   "metadata": {
    "id": "4CV1Q19BDMVw"
   },
   "source": [
    "## Step 1: Install and setup"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "K4AvfUAJZOeS",
   "metadata": {
    "id": "K4AvfUAJZOeS"
   },
   "source": [
    "### 1.1. Install uv and test inference with Ollama\n",
    "\n",
    "We'll install [uv](https://docs.astral.sh/uv/) to setup the Python virtual environment, along with [colab-xterm](https://github.com/InfuseAI/colab-xterm) for running command-line tools, and [Ollama](https://ollama.com/download) as the inference provider."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7a2d7b85",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install uv llama_stack llama-stack-client\n",
    "\n",
    "## If running on Collab:\n",
    "# !pip install colab-xterm\n",
    "# %load_ext colabxterm\n",
    "\n",
    "!curl https://ollama.ai/install.sh | sh"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39fa584b",
   "metadata": {},
   "source": [
    "### 1.2. Test inference with Ollama"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3bf81522",
   "metadata": {},
   "source": [
    "We’ll now launch a terminal and run inference on a Llama model with Ollama to verify that the model is working correctly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a7e8e0f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "## If running on Colab:\n",
    "# %xterm\n",
    "\n",
    "## To be ran in the terminal:\n",
    "# ollama serve &\n",
    "# ollama run llama3.2:3b --keepalive 60m"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f3c5f243",
   "metadata": {},
   "source": [
    "If successful, you should see the model respond to a prompt.\n",
    "\n",
    "...\n",
    "```\n",
    ">>> hi\n",
    "Hello! How can I assist you today?\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "oDUB7M_qe-Gs",
   "metadata": {
    "id": "oDUB7M_qe-Gs"
   },
   "source": [
    "## Step 2: Run the Llama Stack server\n",
    "\n",
    "In this showcase, we will start a Llama Stack server that is running locally."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "732eadc6",
   "metadata": {},
   "source": [
    "### 2.1. Setup the Llama Stack Server"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "J2kGed0R5PSf",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "collapsed": true,
    "id": "J2kGed0R5PSf",
    "outputId": "2478ea60-8d35-48a1-b011-f233831740c5"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import subprocess\n",
    "\n",
    "if \"UV_SYSTEM_PYTHON\" in os.environ:\n",
    "    del os.environ[\"UV_SYSTEM_PYTHON\"]\n",
    "\n",
    "# this command installs all the dependencies needed for the llama stack server with the ollama inference provider\n",
-    "!uv run --with llama-stack llama stack build --distro starter\n",
+    "!uv pip install llama-stack\n",
+    "llama stack list-deps starter | xargs -L1 uv pip install\n",
+    "llama stack run starter\n",
    "\n",
    "def run_llama_stack_server_background():\n",
    "    log_file = open(\"llama_stack_server.log\", \"w\")\n",
    "    process = subprocess.Popen(\n",
    "        f\"OLLAMA_URL=http://localhost:11434 uv run --with llama-stack llama stack run starter\n",
    "        shell=True,\n",
    "        stdout=log_file,\n",
    "        stderr=log_file,\n",
    "        text=True\n",
    "    )\n",
    "\n",
    "    print(f\"Starting Llama Stack server with PID: {process.pid}\")\n",
    "    return process\n",
    "\n",
    "def wait_for_server_to_start():\n",
    "    import requests\n",
    "    from requests.exceptions import ConnectionError\n",
    "    import time\n",
    "\n",
    "    url = \"http://0.0.0.0:8321/v1/health\"\n",
    "    max_retries = 30\n",
    "    retry_interval = 1\n",
    "\n",
    "    print(\"Waiting for server to start\", end=\"\")\n",
    "    for _ in range(max_retries):\n",
    "        try:\n",
    "            response = requests.get(url)\n",
    "            if response.status_code == 200:\n",
    "                print(\"\\nServer is ready!\")\n",
    "                return True\n",
    "        except ConnectionError:\n",
    "            print(\".\", end=\"\", flush=True)\n",
    "            time.sleep(retry_interval)\n",
    "\n",
    "    print(\"\\nServer failed to start after\", max_retries * retry_interval, \"seconds\")\n",
    "    return False\n",
    "\n",
    "\n",
    "# use this helper if needed to kill the server\n",
    "def kill_llama_stack_server():\n",
    "    # Kill any existing llama stack server processes\n",
    "    os.system(\"ps aux | grep -v grep | grep llama_stack.core.server.server | awk '{print $2}' | xargs kill -9\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c40e9efd",
   "metadata": {},
   "source": [
    "### 2.2. Start the Llama Stack Server"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "f779283d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Starting Llama Stack server with PID: 787100\n",
      "Waiting for server to start\n",
      "Server is ready!\n"
     ]
    }
   ],
   "source": [
    "server_process = run_llama_stack_server_background()\n",
    "assert wait_for_server_to_start()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "28477c03",
   "metadata": {},
   "source": [
    "## Step 3: Run the demo"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "7da71011",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "rag_tool> Ingesting document: https://www.paulgraham.com/greatwork.html\n",
      "prompt> How do you do great work?\n",
      "\u001b[33minference> \u001b[0m\u001b[33m[k\u001b[0m\u001b[33mnowledge\u001b[0m\u001b[33m_search\u001b[0m\u001b[33m(query\u001b[0m\u001b[33m=\"\u001b[0m\u001b[33mWhat\u001b[0m\u001b[33m is\u001b[0m\u001b[33m the\u001b[0m\u001b[33m key\u001b[0m\u001b[33m to\u001b[0m\u001b[33m doing\u001b[0m\u001b[33m great\u001b[0m\u001b[33m work\u001b[0m\u001b[33m\")]\u001b[0m\u001b[97m\u001b[0m\n",
      "\u001b[32mtool_execution> Tool:knowledge_search Args:{'query': 'What is the key to doing great work'}\u001b[0m\n",
      "\u001b[32mtool_execution> Tool:knowledge_search Response:[TextContentItem(text='knowledge_search tool found 5 chunks:\\nBEGIN of knowledge_search tool results.\\n', type='text'), TextContentItem(text=\"Result 1:\\nDocument_id:docum\\nContent: work. Doing great work means doing something important\\nso well that you expand people's ideas of what's possible. But\\nthere's no threshold for importance. It's a matter of degree, and\\noften hard to judge at the time anyway.\\n\", type='text'), TextContentItem(text=\"Result 2:\\nDocument_id:docum\\nContent: work. Doing great work means doing something important\\nso well that you expand people's ideas of what's possible. But\\nthere's no threshold for importance. It's a matter of degree, and\\noften hard to judge at the time anyway.\\n\", type='text'), TextContentItem(text=\"Result 3:\\nDocument_id:docum\\nContent: work. Doing great work means doing something important\\nso well that you expand people's ideas of what's possible. But\\nthere's no threshold for importance. It's a matter of degree, and\\noften hard to judge at the time anyway.\\n\", type='text'), TextContentItem(text=\"Result 4:\\nDocument_id:docum\\nContent: work. Doing great work means doing something important\\nso well that you expand people's ideas of what's possible. But\\nthere's no threshold for importance. It's a matter of degree, and\\noften hard to judge at the time anyway.\\n\", type='text'), TextContentItem(text=\"Result 5:\\nDocument_id:docum\\nContent: work. Doing great work means doing something important\\nso well that you expand people's ideas of what's possible. But\\nthere's no threshold for importance. It's a matter of degree, and\\noften hard to judge at the time anyway.\\n\", type='text'), TextContentItem(text='END of knowledge_search tool results.\\n', type='text'), TextContentItem(text='The above results were retrieved to help answer the user\\'s query: \"What is the key to doing great work\". Use them as supporting information only in answering this query.\\n', type='text')]\u001b[0m\n",
      "\u001b[33minference> \u001b[0m\u001b[33mDoing\u001b[0m\u001b[33m great\u001b[0m\u001b[33m work\u001b[0m\u001b[33m means\u001b[0m\u001b[33m doing\u001b[0m\u001b[33m something\u001b[0m\u001b[33m important\u001b[0m\u001b[33m so\u001b[0m\u001b[33m well\u001b[0m\u001b[33m that\u001b[0m\u001b[33m you\u001b[0m\u001b[33m expand\u001b[0m\u001b[33m people\u001b[0m\u001b[33m's\u001b[0m\u001b[33m ideas\u001b[0m\u001b[33m of\u001b[0m\u001b[33m what\u001b[0m\u001b[33m's\u001b[0m\u001b[33m possible\u001b[0m\u001b[33m.\u001b[0m\u001b[33m However\u001b[0m\u001b[33m,\u001b[0m\u001b[33m there\u001b[0m\u001b[33m's\u001b[0m\u001b[33m no\u001b[0m\u001b[33m threshold\u001b[0m\u001b[33m for\u001b[0m\u001b[33m importance\u001b[0m\u001b[33m,\u001b[0m\u001b[33m and\u001b[0m\u001b[33m it\u001b[0m\u001b[33m's\u001b[0m\u001b[33m often\u001b[0m\u001b[33m hard\u001b[0m\u001b[33m to\u001b[0m\u001b[33m judge\u001b[0m\u001b[33m at\u001b[0m\u001b[33m the\u001b[0m\u001b[33m time\u001b[0m\u001b[33m anyway\u001b[0m\u001b[33m.\u001b[0m\u001b[33m Great\u001b[0m\u001b[33m work\u001b[0m\u001b[33m is\u001b[0m\u001b[33m a\u001b[0m\u001b[33m matter\u001b[0m\u001b[33m of\u001b[0m\u001b[33m degree\u001b[0m\u001b[33m,\u001b[0m\u001b[33m and\u001b[0m\u001b[33m it\u001b[0m\u001b[33m can\u001b[0m\u001b[33m be\u001b[0m\u001b[33m difficult\u001b[0m\u001b[33m to\u001b[0m\u001b[33m determine\u001b[0m\u001b[33m whether\u001b[0m\u001b[33m someone\u001b[0m\u001b[33m has\u001b[0m\u001b[33m done\u001b[0m\u001b[33m great\u001b[0m\u001b[33m work\u001b[0m\u001b[33m until\u001b[0m\u001b[33m after\u001b[0m\u001b[33m the\u001b[0m\u001b[33m fact\u001b[0m\u001b[33m.\u001b[0m\u001b[97m\u001b[0m\n",
      "\u001b[30m\u001b[0m"
     ]
    }
   ],
   "source": [
    "from llama_stack_client import Agent, AgentEventLogger, RAGDocument, LlamaStackClient\n",
    "\n",
    "vector_db_id = \"my_demo_vector_db\"\n",
    "client = LlamaStackClient(base_url=\"http://0.0.0.0:8321\")\n",
    "\n",
    "models = client.models.list()\n",
    "\n",
    "# Select the first ollama and first ollama's embedding model\n",
    "model_id = next(m for m in models if m.model_type == \"llm\" and m.provider_id == \"ollama\").identifier\n",
    "embedding_model = next(m for m in models if m.model_type == \"embedding\" and m.provider_id == \"ollama\")\n",
    "embedding_model_id = embedding_model.identifier\n",
    "embedding_dimension = embedding_model.metadata[\"embedding_dimension\"]\n",
    "\n",
    "_ = client.vector_dbs.register(\n",
    "    vector_db_id=vector_db_id,\n",
    "    embedding_model=embedding_model_id,\n",
    "    embedding_dimension=embedding_dimension,\n",
    "    provider_id=\"faiss\",\n",
    ")\n",
    "source = \"https://www.paulgraham.com/greatwork.html\"\n",
    "print(\"rag_tool> Ingesting document:\", source)\n",
    "document = RAGDocument(\n",
    "    document_id=\"document_1\",\n",
    "    content=source,\n",
    "    mime_type=\"text/html\",\n",
    "    metadata={},\n",
    ")\n",
    "client.tool_runtime.rag_tool.insert(\n",
    "    documents=[document],\n",
    "    vector_db_id=vector_db_id,\n",
    "    chunk_size_in_tokens=50,\n",
    ")\n",
    "agent = Agent(\n",
    "    client,\n",
    "    model=model_id,\n",
    "    instructions=\"You are a helpful assistant\",\n",
    "    tools=[\n",
    "        {\n",
    "            \"name\": \"builtin::rag/knowledge_search\",\n",
    "            \"args\": {\"vector_db_ids\": [vector_db_id]},\n",
    "        }\n",
    "    ],\n",
    ")\n",
    "\n",
    "prompt = \"How do you do great work?\"\n",
    "print(\"prompt>\", prompt)\n",
    "\n",
    "response = agent.create_turn(\n",
    "    messages=[{\"role\": \"user\", \"content\": prompt}],\n",
    "    session_id=agent.create_session(\"rag_session\"),\n",
    "    stream=True,\n",
    ")\n",
    "\n",
    "for log in AgentEventLogger().log(response):\n",
    "    log.print()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "341aaadf",
   "metadata": {},
   "source": [
    "Congratulations! You've successfully built your first RAG application using Llama Stack! 🎉🥳"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e88e1185",
   "metadata": {},
   "source": [
    "## Next Steps"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bcb73600",
   "metadata": {},
   "source": [
    "Now you're ready to dive deeper into Llama Stack!\n",
    "- Explore the [Detailed Tutorial](./detailed_tutorial.md).\n",
    "- Try the [Getting Started Notebook](https://github.com/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb).\n",
    "- Browse more [Notebooks on GitHub](https://github.com/meta-llama/llama-stack/tree/main/docs/notebooks).\n",
    "- Learn about Llama Stack [Concepts](../concepts/index.md).\n",
    "- Discover how to [Build Llama Stacks](../distributions/index.md).\n",
    "- Refer to our [References](../references/index.md) for details on the Llama CLI and Python SDK.\n",
    "- Check out the [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repository for example applications and tutorials."
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "gpuType": "T4",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
@@ -47,11 +47,12 @@ function QuickStart() {
<pre><code>{`# Install uv and start Ollama
ollama run llama3.2:3b --keepalive 60m

+# Install server dependencies
+uv pip install llama-stack
+llama stack list-deps starter | xargs -L1 uv pip install
+
# Run Llama Stack server
-OLLAMA_URL=http://localhost:11434 \\
-uv run --with llama-stack \\
-llama stack build --distro starter \\
-  --image-type venv --run
+OLLAMA_URL=http://localhost:11434 llama stack run starter

# Try the Python SDK
from llama_stack_client import LlamaStackClient
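The snippet stops at the SDK import; a short continuation using the Agent API exactly as it appears in the getting-started notebook earlier in this diff (default port 8321 assumed):

```python
from llama_stack_client import Agent, AgentEventLogger, LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Pick any LLM the server has registered
model_id = next(m for m in client.models.list() if m.model_type == "llm").identifier

agent = Agent(client, model=model_id, instructions="You are a helpful assistant")
response = agent.create_turn(
    messages=[{"role": "user", "content": "Hello!"}],
    session_id=agent.create_session("quickstart"),
    stream=True,
)
for log in AgentEventLogger().log(response):
    log.print()
```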
@@ -78,17 +78,15 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next

## Build, Configure, and Run Llama Stack

-1. **Build the Llama Stack**:
-   Build the Llama Stack using the `starter` template:
+1. **Install Llama Stack and dependencies**:
   ```bash
-   uv run --with llama-stack llama stack build --distro starter --image-type venv
+   uv pip install llama-stack
+   llama stack list-deps starter | xargs -L1 uv pip install
   ```
-   **Expected Output:**
+
+2. **Start the distribution**:
   ```bash
-   ...
-   Build Successful!
-   You can find the newly-built template here: ~/.llama/distributions/starter/starter-run.yaml
-   You can run the new Llama Stack Distro via: uv run --with llama-stack llama stack run starter
+   llama stack run starter
   ```

3. **Set the ENV variables by exporting them to the terminal**:
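Step 3 exports provider settings; a small sketch using values that appear elsewhere in this diff (the Fireworks key placeholder and the local Ollama URL):

```bash
export OLLAMA_URL=http://localhost:11434
export FIREWORKS_API_KEY=<SOME_KEY>
```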
llama_stack/ui/package-lock.json (generated, 2649 changes): file diff suppressed because it is too large
@@ -43,16 +43,16 @@
"@testing-library/dom": "^10.4.1",
"@testing-library/jest-dom": "^6.8.0",
"@testing-library/react": "^16.3.0",
-"@types/jest": "^29.5.14",
+"@types/jest": "^30.0.0",
"@types/node": "^24",
"@types/react": "^19",
"@types/react-dom": "^19",
"eslint": "^9",
-"eslint-config-next": "15.5.2",
+"eslint-config-next": "15.5.6",
"eslint-config-prettier": "^10.1.8",
"eslint-plugin-prettier": "^5.5.4",
-"jest": "^29.7.0",
-"jest-environment-jsdom": "^30.1.2",
+"jest": "^30.2.0",
+"jest-environment-jsdom": "^30.2.0",
"prettier": "3.6.2",
"tailwindcss": "^4",
"ts-node": "^10.9.2",
@@ -5,10 +5,10 @@
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.

-[ -z "$BASH_VERSION" ] && {
-  echo "This script must be run with bash" >&2
-  exit 1
-}
+[ -z "${BASH_VERSION:-}" ] && exec /usr/bin/env bash "$0" "$@"
+if set -o | grep -Eq 'posix[[:space:]]+on'; then
+  exec /usr/bin/env bash "$0" "$@"
+fi

set -Eeuo pipefail
@@ -18,12 +18,110 @@ MODEL_ALIAS="llama3.2:3b"
SERVER_IMAGE="docker.io/llamastack/distribution-starter:latest"
WAIT_TIMEOUT=30
TEMP_LOG=""
+WITH_TELEMETRY=true
+TELEMETRY_SERVICE_NAME="llama-stack"
+TELEMETRY_SINKS="otel_trace,otel_metric"
+OTEL_EXPORTER_OTLP_ENDPOINT="http://otel-collector:4318"
+TEMP_TELEMETRY_DIR=""
+
+materialize_telemetry_configs() {
+  local dest="$1"
+  mkdir -p "$dest"
+  local otel_cfg="${dest}/otel-collector-config.yaml"
+  local prom_cfg="${dest}/prometheus.yml"
+  local graf_cfg="${dest}/grafana-datasources.yaml"
+
+  for asset in "$otel_cfg" "$prom_cfg" "$graf_cfg"; do
+    if [ -e "$asset" ]; then
+      die "Telemetry asset ${asset} already exists; refusing to overwrite"
+    fi
+  done
+
+  cat <<'EOF' > "$otel_cfg"
+receivers:
+  otlp:
+    protocols:
+      grpc:
+        endpoint: 0.0.0.0:4317
+      http:
+        endpoint: 0.0.0.0:4318
+
+processors:
+  batch:
+    timeout: 1s
+    send_batch_size: 1024
+
+exporters:
+  # Export traces to Jaeger
+  otlp/jaeger:
+    endpoint: jaeger:4317
+    tls:
+      insecure: true
+
+  # Export metrics to Prometheus
+  prometheus:
+    endpoint: 0.0.0.0:9464
+    namespace: llama_stack
+
+  # Debug exporter for troubleshooting
+  debug:
+    verbosity: detailed
+
+service:
+  pipelines:
+    traces:
+      receivers: [otlp]
+      processors: [batch]
+      exporters: [otlp/jaeger, debug]
+
+    metrics:
+      receivers: [otlp]
+      processors: [batch]
+      exporters: [prometheus, debug]
+EOF
+
+  cat <<'EOF' > "$prom_cfg"
+global:
+  scrape_interval: 15s
+  evaluation_interval: 15s
+
+scrape_configs:
+  - job_name: 'prometheus'
+    static_configs:
+      - targets: ['localhost:9090']
+
+  - job_name: 'otel-collector'
+    static_configs:
+      - targets: ['otel-collector:9464']
+EOF
+
+  cat <<'EOF' > "$graf_cfg"
+apiVersion: 1
+
+datasources:
+  - name: Prometheus
+    type: prometheus
+    access: proxy
+    url: http://prometheus:9090
+    isDefault: true
+    editable: true
+
+  - name: Jaeger
+    type: jaeger
+    access: proxy
+    url: http://jaeger:16686
+    editable: true
+EOF
+}

# Cleanup function to remove temporary files
cleanup() {
  if [ -n "$TEMP_LOG" ] && [ -f "$TEMP_LOG" ]; then
    rm -f "$TEMP_LOG"
  fi
+  if [ -n "$TEMP_TELEMETRY_DIR" ] && [ -d "$TEMP_TELEMETRY_DIR" ]; then
+    rm -rf "$TEMP_TELEMETRY_DIR"
+  fi
}

# Set up trap to clean up on exit, error, or interrupt
@@ -32,7 +130,7 @@ trap cleanup EXIT ERR INT TERM
log(){ printf "\e[1;32m%s\e[0m\n" "$*"; }
die(){
  printf "\e[1;31m❌ %s\e[0m\n" "$*" >&2
-  printf "\e[1;31m🐛 Report an issue @ https://github.com/meta-llama/llama-stack/issues if you think it's a bug\e[0m\n" >&2
+  printf "\e[1;31m🐛 Report an issue @ https://github.com/llamastack/llama-stack/issues if you think it's a bug\e[0m\n" >&2
  exit 1
}
@@ -89,6 +187,12 @@ Options:
  -m, --model MODEL         Model alias to use (default: ${MODEL_ALIAS})
  -i, --image IMAGE         Server image (default: ${SERVER_IMAGE})
  -t, --timeout SECONDS     Service wait timeout in seconds (default: ${WAIT_TIMEOUT})
+  --with-telemetry          Provision Jaeger, OTEL Collector, Prometheus, and Grafana (default: enabled)
+  --no-telemetry, --without-telemetry
+                            Skip provisioning the telemetry stack
+  --telemetry-service NAME  Service name reported to telemetry (default: ${TELEMETRY_SERVICE_NAME})
+  --telemetry-sinks SINKS   Comma-separated telemetry sinks (default: ${TELEMETRY_SINKS})
+  --otel-endpoint URL       OTLP endpoint provided to Llama Stack (default: ${OTEL_EXPORTER_OTLP_ENDPOINT})
  -h, --help                Show this help message

For more information:
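Taken together, a hypothetical invocation of the installer exercising the new flags (the `scripts/install.sh` path matches the one-liner at the top of this diff):

```bash
# Quickstart with the telemetry stack and a custom service name
./scripts/install.sh --with-telemetry \
  --telemetry-service my-llama-stack \
  --telemetry-sinks otel_trace,otel_metric \
  --otel-endpoint http://otel-collector:4318

# Or skip telemetry entirely
./scripts/install.sh --no-telemetry
```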
@@ -127,6 +231,26 @@ while [[ $# -gt 0 ]]; do
      WAIT_TIMEOUT="$2"
      shift 2
      ;;
+    --with-telemetry)
+      WITH_TELEMETRY=true
+      shift
+      ;;
+    --no-telemetry|--without-telemetry)
+      WITH_TELEMETRY=false
+      shift
+      ;;
+    --telemetry-service)
+      TELEMETRY_SERVICE_NAME="$2"
+      shift 2
+      ;;
+    --telemetry-sinks)
+      TELEMETRY_SINKS="$2"
+      shift 2
+      ;;
+    --otel-endpoint)
+      OTEL_EXPORTER_OTLP_ENDPOINT="$2"
+      shift 2
+      ;;
    *)
      die "Unknown option: $1"
      ;;
@@ -171,7 +295,11 @@ if [ "$ENGINE" = "podman" ] && [ "$(uname -s)" = "Darwin" ]; then
fi

# Clean up any leftovers from earlier runs
-for name in ollama-server llama-stack; do
+containers=(ollama-server llama-stack)
+if [ "$WITH_TELEMETRY" = true ]; then
+  containers+=(jaeger otel-collector prometheus grafana)
+fi
+for name in "${containers[@]}"; do
  ids=$($ENGINE ps -aq --filter "name=^${name}$")
  if [ -n "$ids" ]; then
    log "⚠️  Found existing container(s) for '${name}', removing..."
@@ -191,6 +319,64 @@ if ! $ENGINE network inspect llama-net >/dev/null 2>&1; then
  fi
fi

+###############################################################################
+# Telemetry Stack
+###############################################################################
+if [ "$WITH_TELEMETRY" = true ]; then
+  TEMP_TELEMETRY_DIR="$(mktemp -d)"
+  TELEMETRY_ASSETS_DIR="$TEMP_TELEMETRY_DIR"
+  log "🧰 Materializing telemetry configs..."
+  materialize_telemetry_configs "$TELEMETRY_ASSETS_DIR"
+
+  log "📡 Starting telemetry stack..."
+
+  if ! execute_with_log $ENGINE run -d "${PLATFORM_OPTS[@]}" --name jaeger \
+    --network llama-net \
+    -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
+    -p 16686:16686 \
+    -p 14250:14250 \
+    -p 9411:9411 \
+    docker.io/jaegertracing/all-in-one:latest > /dev/null 2>&1; then
+    die "Jaeger startup failed"
+  fi
+
+  if ! execute_with_log $ENGINE run -d "${PLATFORM_OPTS[@]}" --name otel-collector \
+    --network llama-net \
+    -p 4318:4318 \
+    -p 4317:4317 \
+    -p 9464:9464 \
+    -p 13133:13133 \
+    -v "${TELEMETRY_ASSETS_DIR}/otel-collector-config.yaml:/etc/otel-collector-config.yaml:Z" \
+    docker.io/otel/opentelemetry-collector-contrib:latest \
+    --config /etc/otel-collector-config.yaml > /dev/null 2>&1; then
+    die "OpenTelemetry Collector startup failed"
+  fi
+
+  if ! execute_with_log $ENGINE run -d "${PLATFORM_OPTS[@]}" --name prometheus \
+    --network llama-net \
+    -p 9090:9090 \
+    -v "${TELEMETRY_ASSETS_DIR}/prometheus.yml:/etc/prometheus/prometheus.yml:Z" \
+    docker.io/prom/prometheus:latest \
+    --config.file=/etc/prometheus/prometheus.yml \
+    --storage.tsdb.path=/prometheus \
+    --web.console.libraries=/etc/prometheus/console_libraries \
+    --web.console.templates=/etc/prometheus/consoles \
+    --storage.tsdb.retention.time=200h \
+    --web.enable-lifecycle > /dev/null 2>&1; then
+    die "Prometheus startup failed"
+  fi
+
+  if ! execute_with_log $ENGINE run -d "${PLATFORM_OPTS[@]}" --name grafana \
+    --network llama-net \
+    -p 3000:3000 \
+    -e GF_SECURITY_ADMIN_PASSWORD=admin \
+    -e GF_USERS_ALLOW_SIGN_UP=false \
+    -v "${TELEMETRY_ASSETS_DIR}/grafana-datasources.yaml:/etc/grafana/provisioning/datasources/datasources.yaml:Z" \
+    docker.io/grafana/grafana:11.0.0 > /dev/null 2>&1; then
+    die "Grafana startup failed"
+  fi
+fi

###############################################################################
# 1. Ollama
###############################################################################
@@ -218,9 +404,19 @@ fi
###############################################################################
# 2. Llama‑Stack
###############################################################################
+server_env_opts=()
+if [ "$WITH_TELEMETRY" = true ]; then
+  server_env_opts+=(
+    -e TELEMETRY_SINKS="${TELEMETRY_SINKS}"
+    -e OTEL_EXPORTER_OTLP_ENDPOINT="${OTEL_EXPORTER_OTLP_ENDPOINT}"
+    -e OTEL_SERVICE_NAME="${TELEMETRY_SERVICE_NAME}"
+  )
+fi
+
cmd=( run -d "${PLATFORM_OPTS[@]}" --name llama-stack \
      --network llama-net \
      -p "${PORT}:${PORT}" \
+      "${server_env_opts[@]}" \
      -e OLLAMA_URL="http://ollama-server:${OLLAMA_PORT}" \
      "${SERVER_IMAGE}" --port "${PORT}")
@@ -244,5 +440,12 @@ log "👉 API endpoint: http://localhost:${PORT}"
log "📖 Documentation: https://llamastack.github.io/latest/references/api_reference/index.html"
log "💻 To access the llama stack CLI, exec into the container:"
log "   $ENGINE exec -ti llama-stack bash"
+if [ "$WITH_TELEMETRY" = true ]; then
+  log "📡 Telemetry dashboards:"
+  log "   Jaeger UI:      http://localhost:16686"
+  log "   Prometheus UI:  http://localhost:9090"
+  log "   Grafana UI:     http://localhost:3000 (admin/admin)"
+  log "   OTEL Collector: http://localhost:4318"
+fi
log "🐛 Report an issue @ https://github.com/llamastack/llama-stack/issues if you think it's a bug"
log ""