From 09619879623d720120cd94068773e51f15f5124e Mon Sep 17 00:00:00 2001
From: Francisco Javier Arceo
Date: Wed, 9 Apr 2025 17:01:52 -0400
Subject: [PATCH] adding server example back in and restructuring steps

Signed-off-by: Francisco Javier Arceo
---
 docs/source/getting_started/index.md | 28 ++++++++++++++++------------
 docs/source/index.md                 |  3 ++-
 2 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/docs/source/getting_started/index.md b/docs/source/getting_started/index.md
index 2f57725d7..63fa5ae6e 100644
--- a/docs/source/getting_started/index.md
+++ b/docs/source/getting_started/index.md
@@ -2,27 +2,32 @@
 
 Get started with Llama Stack in minutes!
 
+Llama Stack is a stateful service with REST APIs to support the seamless transition of AI applications across different
+environments. You can build and test against a local server first, then deploy to a hosted endpoint for production.
+
 In this guide, we'll walk through how to build a RAG application locally using Llama Stack with [Ollama](https://ollama.com/)
 as the inference [provider](../providers/index.md#inference) for a Llama Model.
 
-## Step 1. Install [uv](https://docs.astral.sh/uv/) and setup your virtual environment
+## Step 1. Install and Setup
+Install [uv](https://docs.astral.sh/uv/), set up your virtual environment, and run inference on a Llama model with
+[Ollama](https://ollama.com/download).
 ```bash
-uv pip install llama-stack aiosqlite faiss-cpu ollama \
-openai datasets opentelemetry-exporter-otlp-proto-http mcp autoevals
+uv pip install llama-stack aiosqlite faiss-cpu ollama openai datasets opentelemetry-exporter-otlp-proto-http mcp autoevals
 source .venv/bin/activate
 export INFERENCE_MODEL="llama3.2:3b"
-```
-## Step 2: Run inference locally with Ollama
-```bash
-# make sure to run this in a separate terminal
 ollama run llama3.2:3b --keepalive 60m
 ```
+## Step 2: Run the Llama Stack Server
+```bash
+INFERENCE_MODEL=llama3.2:3b llama stack build --template ollama --image-type venv --run
+```
 ## Step 3: Run the Demo
-You can run this with `uv run demo_script.py` or in an interactive shell.
+Now open a new terminal that uses the same virtual environment. You can run this demo as a script with `uv run demo_script.py` or in an interactive shell.
 ```python
 from termcolor import cprint
-from llama_stack.distribution.library_client import LlamaStackAsLibraryClient
 from llama_stack_client.types import Document
+from llama_stack_client import LlamaStackClient
+
 
 vector_db = "faiss"
 vector_db_id = "test-vector-db"
@@ -37,8 +42,7 @@ documents = [
     )
 ]
 
-client = LlamaStackAsLibraryClient("ollama")
-_ = client.initialize()
+client = LlamaStackClient(base_url="http://localhost:8321")
 client.vector_dbs.register(
     provider_id=vector_db,
     vector_db_id=vector_db_id,
@@ -65,7 +69,7 @@ for chunk in response.content:
 cprint("" + "-" * 50, "yellow")
 ```
 And you should see output like below.
-```bash
+```
 --------------------------------------------------
 Query> Can you give me the arxiv link for Lora Fine Tuning in Pytorch?
 --------------------------------------------------
diff --git a/docs/source/index.md b/docs/source/index.md
index a0ac95957..99b0e1a3e 100644
--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -1,3 +1,5 @@
+# Llama Stack
+Welcome to Llama Stack, the open-source framework for building generative AI applications.
 ```{admonition} Llama 4 is here!
 :class: tip
 
@@ -9,7 +11,6 @@ Check out [Getting Started with Llama 4](https://colab.research.google.com/githu
 Llama Stack {{ llama_stack_version }} is now available!
 See the {{ llama_stack_version_link }} for more details.
 ```
-# Llama Stack
 
 ## What is Llama Stack?
 
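For reviewers who want to sanity-check the new Step 2 server instructions before running the full RAG demo, the sketch below is a minimal client-side smoke test and is not part of the patch. It assumes the server started by `llama stack build --template ollama --image-type venv --run` is listening on the default `http://localhost:8321`, and it uses only generic `llama-stack-client` calls (`models.list()` and `inference.chat_completion()`); the model identifier fallback and the prompt text are illustrative assumptions, so adjust them to whatever `models.list()` actually prints.

```python
import os

from llama_stack_client import LlamaStackClient

# Point the client at the locally running Llama Stack server from Step 2.
client = LlamaStackClient(base_url="http://localhost:8321")

# List the models the Ollama provider registered; this confirms the server is reachable.
for model in client.models.list():
    print(model.identifier)

# One round-trip chat completion against the model exported as INFERENCE_MODEL.
# The "llama3.2:3b" fallback mirrors Step 1 of the guide; if the server registered
# the model under a different identifier, use the value printed above instead.
response = client.inference.chat_completion(
    model_id=os.environ.get("INFERENCE_MODEL", "llama3.2:3b"),
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(response.completion_message.content)
```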