From a23e5046eea0cc203186eeeb9a6b13e7b42b3ad5 Mon Sep 17 00:00:00 2001
From: Francisco Javier Arceo
Date: Mon, 7 Apr 2025 23:40:40 -0400
Subject: [PATCH] adjusted based on latest feedback

Signed-off-by: Francisco Javier Arceo
---
 docs/source/getting_started/index.md | 103 ++++++++++++++++++++++++---
 1 file changed, 94 insertions(+), 9 deletions(-)

diff --git a/docs/source/getting_started/index.md b/docs/source/getting_started/index.md
index a4e2ab516..fb20bb987 100644
--- a/docs/source/getting_started/index.md
+++ b/docs/source/getting_started/index.md
@@ -10,9 +10,12 @@ Llama Stack is a stateful service with REST APIs to support seamless transition
 In this guide, we'll walk through how to build a RAG agent locally using Llama Stack with [Ollama](https://ollama.com/) as the inference [provider](../providers/index.md#inference) for a Llama Model.
 
+```{admonition} Note
+:class: tip
+These instructions outlined are for a
+```
 ## Step 1: Installation and Setup
-
 ### i. Install and Start Ollama for Inference
 Install Ollama by following the instructions on the [Ollama website](https://ollama.com/download).
@@ -60,11 +63,14 @@ uv pip install llama-stack
 ```
 Note the Llama Stack Server includes the client SDK as well.
 
-## Step 3: Build and Run Llama Stack
+## Step 3: Build and Run the Llama Stack Server
 
 Llama Stack uses a [configuration file](../distributions/configuration.md) to define the stack. The config file is a YAML file that specifies the providers and their configurations.
 
 ### i. Build and Run the Llama Stack Config for Ollama
+::::{tab-set}
+
+:::{tab-item} Using Python
 ```bash
 INFERENCE_MODEL=llama3.2:3b llama stack build --template ollama --image-type venv --run
 ```
@@ -73,18 +79,94 @@ You will see output like below:
 ```
 INFO: Application startup complete.
 INFO: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
 ```
+:::
+
+:::{tab-item} Using a Container
+To get started quickly, we provide various container images for the server component that work with different inference
+providers out of the box. For this guide, we will use `llamastack/distribution-ollama` as the container image. If you'd
+like to build your own image or customize the configurations, please check out [this guide](../references/index.md).
+
+Let's set up some environment variables and create a local directory to mount into the container's file system.
+```bash
+export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
+export LLAMA_STACK_PORT=8321
+mkdir -p ~/.llama
+```
+Then start the server using the container tool of your choice. For example, if you are running Docker you can use the
+following command:
+```bash
+docker run -it \
+  --pull always \
+  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
+  -v ~/.llama:/root/.llama \
+  llamastack/distribution-ollama \
+  --port $LLAMA_STACK_PORT \
+  --env INFERENCE_MODEL=$INFERENCE_MODEL \
+  --env OLLAMA_URL=http://host.docker.internal:11434
+```
+Note that to start the container with Podman, you can run the same command with `docker` replaced by `podman` (an
+example is shown below). If you are using `podman` older than `4.7.0`, please also replace `host.docker.internal` in
+the `OLLAMA_URL` with `host.containers.internal`.
+
+Configuration for this is available at `distributions/ollama/run.yaml`.
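+For reference, a Podman invocation might look like the following sketch (assuming Podman `4.7.0` or newer; this simply
+mirrors the Docker command above with the substitution described):
+```bash
+podman run -it \
+  --pull always \
+  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
+  -v ~/.llama:/root/.llama \
+  llamastack/distribution-ollama \
+  --port $LLAMA_STACK_PORT \
+  --env INFERENCE_MODEL=$INFERENCE_MODEL \
+  --env OLLAMA_URL=http://host.docker.internal:11434  # use host.containers.internal with podman < 4.7.0
+```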
+
+````{admonition} Note
+:class: note
+
+Docker containers run in their own isolated network namespaces on Linux. To allow the container to communicate with services running on the host via `localhost`, you need `--network=host`. This makes the container use the host's network directly so it can connect to Ollama running on `localhost:11434`.
+
+Linux users having issues running the above command should instead try the following:
+```bash
+docker run -it \
+  --pull always \
+  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
+  -v ~/.llama:/root/.llama \
+  --network=host \
+  llamastack/distribution-ollama \
+  --port $LLAMA_STACK_PORT \
+  --env INFERENCE_MODEL=$INFERENCE_MODEL \
+  --env OLLAMA_URL=http://localhost:11434
+```
+````
+:::
+::::
+
+
 ### ii. Using the Llama Stack Client
-Now you can use the llama stack client to run inference and build agents! You can reuse the server setup or use the
-[Llama Stack Client](https://github.com/meta-llama/llama-stack-client-python/). Note that the client package is already
-included in the `llama-stack` package.
+Now you can use the Llama Stack client to run inference and build agents!
+You can reuse the server setup or use the [Llama Stack Client](https://github.com/meta-llama/llama-stack-client-python/).
+Note that the client package is already included in the `llama-stack` package.
 
-Open a new terminal and navigate to the same directory you started the server from. Then set up or activate your
-virtual environment.
+Open a new terminal and navigate to the same directory you started the server from. Then set up a new virtual
+environment for the client or activate the server's existing one.
+::::{tab-set}
+
+:::{tab-item} Reuse the Server Setup
 ```bash
 source .venv/bin/activate
 ```
+:::
+
+:::{tab-item} Install the Llama Stack Client (venv)
+```bash
+uv venv client --python 3.10
+source client/bin/activate
+pip install llama-stack-client
+```
+:::
+
+:::{tab-item} Install the Llama Stack Client (conda)
+```bash
+yes | conda create -n stack-client python=3.10
+conda activate stack-client
+pip install llama-stack-client
+```
+:::
+::::
+
 Now let's use the `llama-stack-client` CLI to check the connectivity to the server.
 ```bash
@@ -95,7 +177,7 @@ You will see the below:
 Done! You can now use the Llama Stack Client CLI with endpoint http://localhost:8321
 ```
 
-#### iii. List available models
+#### iii. List Available Models
 List the models
 ```
 llama-stack-client models list
@@ -163,6 +245,7 @@ response = client.inference.chat_completion(
 )
 print(response.completion_message.content)
 ```
+
 ### ii. Run the Script
 Let's run the script using `uv`
 ```bash
@@ -432,7 +515,8 @@ Let's run the script using `uv`
 ```bash
 uv run python rag_agent.py
 ```
-Which will output:
+:::{dropdown} `👋 Click here to see the sample output`
+
 ```
 user> what is torchtune
 inference> [knowledge_search(query='TorchTune')]
@@ -446,6 +530,7 @@ PyTorch Tune provides a recipe for LoRA (Low-Rank Adaptation) finetuning, which
 ...
 Overall, DORA is a powerful reinforcement learning algorithm that can learn complex tasks from human demonstrations. However, it requires careful consideration of the challenges and limitations to achieve optimal results.
 ```
+:::
 
 Congrats! 🥳 Now you're ready to build your own Llama Stack applications! 🚀
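For readers following the updated guide, a minimal Python equivalent of the client steps above might look like the sketch below. It assumes the server from Step 3 is reachable at `http://localhost:8321` and that the model identifier reported by `llama-stack-client models list` is `llama3.2:3b`; adjust both if your setup differs.

```python
from llama_stack_client import LlamaStackClient

# Connect to the locally running Llama Stack server (default port 8321).
client = LlamaStackClient(base_url="http://localhost:8321")

# Mirror `llama-stack-client models list` to confirm connectivity and see identifiers.
for model in client.models.list():
    print(model.identifier)

# Run a single chat completion against the Ollama-served Llama model.
# The model_id below is an assumption; use the identifier printed above.
response = client.inference.chat_completion(
    model_id="llama3.2:3b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about coding."},
    ],
)
print(response.completion_message.content)
```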