[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb)

# Llama Stack - Building AI Applications


Get started with Llama Stack in minutes!

[Llama Stack](https://github.com/meta-llama/llama-stack) is a stateful service with REST APIs to support the seamless transition of AI applications across different environments. You can build and test using a local server first and deploy to a hosted endpoint for production.

In this guide, we'll walk through building a RAG application locally using Llama Stack, with [Ollama](https://ollama.com/)
as the inference [provider](docs/source/providers/index.md#inference) for a Llama model.


## Step 1: Install and set up

### 1.1. Install uv and Ollama

We'll install [uv](https://docs.astral.sh/uv/) to set up the Python virtual environment, along with [colab-xterm](https://github.com/InfuseAI/colab-xterm) for running command-line tools, and [Ollama](https://ollama.com/download) as the inference provider.

In [None]:
%pip install uv llama_stack llama-stack-client

## If running on Colab:
# !pip install colab-xterm
# %load_ext colabxterm

!curl -fsSL https://ollama.com/install.sh | sh
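Before moving on, it can help to confirm that the tools installed above are actually reachable from the notebook's environment. The following is a minimal sketch, not part of Llama Stack: `missing_tools` is a hypothetical helper name, and it assumes that both `uv` and `ollama` end up on `PATH` after installation.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


# The tool names below are the ones this guide installs (assumed to be on PATH).
missing = missing_tools(["uv", "ollama"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("uv and ollama are both available.")
```

If a tool is reported as missing, restarting the kernel or re-running the install cell is usually the first thing to try.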

### 1.2. Test inference with Ollama

We’ll now launch a terminal and run inference on a Llama model with Ollama to verify that the model is working correctly.

In [None]:
## If running on Colab:
# %xterm

## To be run in the terminal:
# ollama serve &
# ollama run llama3.2:3b --keepalive 60m

If successful, you should see the model respond to a prompt.

...
```
>>> hi
Hello! How can I assist you today?
```
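If you would rather check Ollama from Python than from the terminal, a sketch along the following lines may work. It assumes Ollama is listening on its default address, `http://localhost:11434`, and queries the `/api/tags` endpoint, which lists locally available models. The function name `list_ollama_models` is hypothetical.

```python
import json
import urllib.error
import urllib.request


def list_ollama_models(base_url="http://localhost:11434", timeout=2.0):
    """Return the model names reported by Ollama, or None if it is unreachable.

    Assumes the default Ollama address; adjust `base_url` if yours differs.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a response that is not valid JSON.
        return None
    return [model.get("name") for model in payload.get("models", [])]
```

Calling `list_ollama_models()` after `ollama run llama3.2:3b` should return a list that includes `llama3.2:3b`; a return value of `None` suggests `ollama serve` is not running.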

## Step 2: Run the Llama Stack server

In this step, we will start a Llama Stack server that runs locally.

### 2.1. Set up the Llama Stack Server

In [None]:
import os 
import subprocess

if "UV_SYSTEM_PYTHON" in os.environ:
    del os.environ["UV_SYSTEM_PYTHON"]

# this command installs all the dependencies needed for the llama stack server with the ollama inference provider
!uv run --with llama-stack llama stack build --template ollama --image-type venv --image-name myvenv

def run_llama_stack_server_background():
    # Send the server's stdout and stderr to a log file for later inspection
    log_file = open("llama_stack_server.log", "w")
    process = subprocess.Popen(
        "uv run --with llama-stack llama stack run ollama --image-type venv --image-name myvenv --env INFERENCE_MODEL=llama3.2:3b",
        shell=True,
        stdout=log_file,
        stderr=log_file,
        text=True,
    )

    print(f"Starting Llama Stack server with PID: {process.pid}")
    return process

def wait_for_server_to_start():
    import requests
    from requests.exceptions import ConnectionError
    import time

    url = "http://0.0.0.0:8321/v1/health"
    max_retries = 30
    retry_interval = 1

    print("Waiting for server to start", end="")
    for _ in range(max_retries):
        try:
            response = requests.get(url)
            if response.status_code == 200:
                print("\nServer is ready!")
                return True
        except ConnectionError:
            # The server is not accepting connections yet; wait and retry
            print(".", end="", flush=True)
            time.sleep(retry_interval)

    print("\nServer failed to start after", max_retries * retry_interval, "seconds")
    return False


# use this helper if needed to kill the server
def kill_llama_stack_server():
    # Kill any existing llama stack server processes
    os.system("ps aux | grep -v grep | grep llama_stack.distribution.server.server | awk '{print $2}' | xargs kill -9")
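The waiting helper above follows a common poll-until-ready pattern. As a rough, generic sketch of that pattern (not the code this guide uses), the readiness check can be passed in as a callable so the retry logic can be exercised without a running server. The name `wait_until` and its parameters are hypothetical.

```python
import time


def wait_until(probe, max_retries=30, retry_interval=1.0, sleep=time.sleep):
    """Call `probe` until it returns a truthy value or the retry budget is spent.

    Any exception raised by `probe` is treated as "not ready yet".
    `sleep` is injectable so the loop can be tested without real delays.
    """
    for attempt in range(max_retries):
        try:
            if probe():
                return True
        except Exception:
            pass  # treat errors (e.g. connection refused) as "not ready"
        if attempt < max_retries - 1:
            sleep(retry_interval)
    return False
```

One design difference from the helper above: this version also pauses after a non-ready response, not only after a connection error, so a server that answers with an error status is not polled in a tight loop.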


### 2.2. Start the Llama Stack Server

In [7]:
server_process = run_llama_stack_server_background()
assert wait_for_server_to_start()

Starting Llama Stack server with PID: 787100
Waiting for server to start
Server is ready!
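If the server does not become ready, the log file written by `run_llama_stack_server_background` is the first place to look. Here is a small sketch for printing its last lines; `tail_log` is a hypothetical helper, and the default path assumes the `llama_stack_server.log` file name used above.

```python
from collections import deque
from pathlib import Path


def tail_log(path="llama_stack_server.log", n=20):
    """Return the last `n` lines of the log file, or [] if it does not exist."""
    log_path = Path(path)
    if not log_path.exists():
        return []
    with log_path.open("r", errors="replace") as handle:
        # deque with maxlen keeps only the final `n` lines while reading
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]
```

For example, `print("\n".join(tail_log()))` shows the most recent 20 lines of server output.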


## Step 3: Run the demo

In [8]:
from llama_stack_client import Agent, AgentEventLogger, RAGDocument, LlamaStackClient

vector_db_id = "my_demo_vector_db"
client = LlamaStackClient(base_url="http://0.0.0.0:8321")

models = client.models.list()

# Select the first LLM and the first embedding model
model_id = next(m for m in models if m.model_type == "llm").identifier
embedding_model_id = (
    em := next(m for m in models if m.model_type == "embedding")
).identifier
embedding_dimension = em.metadata["embedding_dimension"]

# Register a vector database that uses the faiss provider
_ = client.vector_dbs.register(
    vector_db_id=vector_db_id,
    embedding_model=embedding_model_id,
    embedding_dimension=embedding_dimension,
    provider_id="faiss",
)

# Ingest the source document into the vector database in 50-token chunks
source = "https://www.paulgraham.com/greatwork.html"
print("rag_tool> Ingesting document:", source)
document = RAGDocument(
    document_id="document_1",
    content=source,
    mime_type="text/html",
    metadata={},
)
client.tool_runtime.rag_tool.insert(
    documents=[document],
    vector_db_id=vector_db_id,
    chunk_size_in_tokens=50,
)

# Create an agent that can search the vector database through the RAG tool
agent = Agent(
    client,
    model=model_id,
    instructions="You are a helpful assistant",
    tools=[
        {
            "name": "builtin::rag/knowledge_search",
            "args": {"vector_db_ids": [vector_db_id]},
        }
    ],
)

prompt = "How do you do great work?"
print("prompt>", prompt)

# Run a single turn in a new session and stream the agent's events
response = agent.create_turn(
    messages=[{"role": "user", "content": prompt}],
    session_id=agent.create_session("rag_session"),
    stream=True,
)

for log in AgentEventLogger().log(response):
    log.print()

rag_tool> Ingesting document: https://www.paulgraham.com/greatwork.html
prompt> How do you do great work?
inference> [knowledge_search(query="What is the key to doing great work")]
tool_execution> Tool:knowledge_search Args:{'query': 'What is the key to doing great work'}
tool_execution> Tool:knowledge_search Response:[TextContentItem(text='knowledge_search tool found 5 chunks:\nBEGIN of knowledge_search tool results.\n', type='text'), TextContentItem(text="Result 1:\nDocument_id:docum\nContent: work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text="Result 2:\nDocument_id:docum\nContent: work. Doing great work means doing 
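To build some intuition for what the demo does when it ingests a document and then calls `knowledge_search`, here is a deliberately simplified, pure-Python sketch of the chunk-then-retrieve idea. It is not how Llama Stack implements RAG: the whitespace "tokens", the word-count "embedding", and all function names (`chunk_text`, `bag_of_words`, `cosine`, `search`) are assumptions made for illustration, standing in for a real tokenizer, embedding model, and vector database.

```python
import math
from collections import Counter


def chunk_text(text, chunk_size_in_tokens=50):
    """Split text into chunks of at most `chunk_size_in_tokens` whitespace tokens."""
    tokens = text.split()
    return [
        " ".join(tokens[i : i + chunk_size_in_tokens])
        for i in range(0, len(tokens), chunk_size_in_tokens)
    ]


def bag_of_words(text):
    """Toy 'embedding': lowercase word counts (a stand-in for a real model)."""
    return Counter(word.strip(".,!?\"'").lower() for word in text.split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(chunks, query, top_k=2):
    """Return the `top_k` chunks most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(chunks, key=lambda c: cosine(bag_of_words(c), q), reverse=True)
    return ranked[:top_k]
```

In the real demo, the retrieved chunks are handed back to the model as tool results, which is what the `tool_execution>` lines above show.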

Congratulations! You've successfully built your first RAG application using Llama Stack! 🎉🥳

## Next Steps

Now you're ready to dive deeper into Llama Stack!
- Explore the [Detailed Tutorial](./detailed_tutorial.md).
- Try the [Getting Started Notebook](https://github.com/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb).
- Browse more [Notebooks on GitHub](https://github.com/meta-llama/llama-stack/tree/main/docs/notebooks).
- Learn about Llama Stack [Concepts](../concepts/index.md).
- Discover how to [Build Llama Stacks](../distributions/index.md).
- Refer to our [References](../references/index.md) for details on the Llama CLI and Python SDK.
- Check out the [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repository for example applications and tutorials.