Merge branch 'main' into playground-ui

2025-12-17 07:32:36 +00:00 · 2024-12-02 12:44:50 -08:00 · 2024-12-02 12:44:50 -08:00 · 9bceb1912e
commit 9bceb1912e
parent 9bb6c1346b 6bcd1bd9f1
16 changed files with 145 additions and 128 deletions
--- a/README.md
+++ b/README.md
@ -93,12 +93,12 @@ Additionally, we have designed every element of the Stack such that APIs as well

 | **Distribution** 	|           **Llama Stack Docker**           	| Start This Distribution 	|
 |:----------------:	|:------------------------------------------:	|:-----------------------:	|
-|  Meta Reference  	| [llamastack/distribution-meta-reference-gpu](https://hub.docker.com/repository/docker/llamastack/distribution-meta-reference-gpu/general) 	|       [Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/self_hosted_distro/meta-reference-gpu.html)       	|
-|  Meta Reference Quantized  	| [llamastack/distribution-meta-reference-quantized-gpu](https://hub.docker.com/repository/docker/llamastack/distribution-meta-reference-quantized-gpu/general) 	|       [Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/self_hosted_distro/meta-reference-quantized-gpu.html)       	|
-|      Ollama      	|       [llamastack/distribution-ollama](https://hub.docker.com/repository/docker/llamastack/distribution-ollama/general)       	|       [Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/self_hosted_distro/ollama.html)       	|
-|        TGI       	|         [llamastack/distribution-tgi](https://hub.docker.com/repository/docker/llamastack/distribution-tgi/general)        	|       [Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/self_hosted_distro/tgi.html)       	|
-|        Together       	|         [llamastack/distribution-together](https://hub.docker.com/repository/docker/llamastack/distribution-together/general)        	|       [Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/remote_hosted_distro/together.html)       	|
-|        Fireworks       	|         [llamastack/distribution-fireworks](https://hub.docker.com/repository/docker/llamastack/distribution-fireworks/general)        	|       [Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/remote_hosted_distro/fireworks.html)       	|
+|  Meta Reference  	| [llamastack/distribution-meta-reference-gpu](https://hub.docker.com/repository/docker/llamastack/distribution-meta-reference-gpu/general) 	|       [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/meta-reference-gpu.html)       	|
+|  Meta Reference Quantized  	| [llamastack/distribution-meta-reference-quantized-gpu](https://hub.docker.com/repository/docker/llamastack/distribution-meta-reference-quantized-gpu/general) 	|       [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/meta-reference-quantized-gpu.html)       	|
+|      Ollama      	|       [llamastack/distribution-ollama](https://hub.docker.com/repository/docker/llamastack/distribution-ollama/general)       	|       [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/ollama.html)       	|
+|        TGI       	|         [llamastack/distribution-tgi](https://hub.docker.com/repository/docker/llamastack/distribution-tgi/general)        	|       [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/tgi.html)       	|
+|        Together       	|         [llamastack/distribution-together](https://hub.docker.com/repository/docker/llamastack/distribution-together/general)        	|       [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/together.html)       	|
+|        Fireworks       	|         [llamastack/distribution-fireworks](https://hub.docker.com/repository/docker/llamastack/distribution-fireworks/general)        	|       [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/fireworks.html)       	|

 ## Installation

@ -128,7 +128,7 @@ You have two ways to install this repository:

 Please checkout our [Documentation](https://llama-stack.readthedocs.io/en/latest/index.html) page for more details.

-* [CLI reference](https://llama-stack.readthedocs.io/en/latest/cli_reference/index.html)
+* [CLI reference](https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/index.html)
    * Guide using `llama` CLI to work with Llama models (download, study prompts), and building/starting a Llama Stack distribution.
 * [Getting Started](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html)
    * Quick guide to start a Llama Stack server.
@ -136,7 +136,7 @@ Please checkout our [Documentation](https://llama-stack.readthedocs.io/en/latest
    * The complete Llama Stack lesson [Colab notebook](https://colab.research.google.com/drive/1dtVmxotBsI4cGZQNsJRYPrLiDeT0Wnwt) of the new [Llama 3.2 course on Deeplearning.ai](https://learn.deeplearning.ai/courses/introducing-multimodal-llama-3-2/lesson/8/llama-stack).
    * A [Zero-to-Hero Guide](https://github.com/meta-llama/llama-stack/tree/main/docs/zero_to_hero_guide) that guide you through all the key components of llama stack with code samples.
 * [Contributing](CONTRIBUTING.md)
-    * [Adding a new API Provider](https://llama-stack.readthedocs.io/en/latest/api_providers/new_api_provider.html) to walk-through how to add a new API provider.
+    * [Adding a new API Provider](https://llama-stack.readthedocs.io/en/latest/contributing/new_api_provider.html) to walk-through how to add a new API provider.

 ## Llama Stack Client SDKs

--- a/docs/source/contributing/new_api_provider.md
+++ b/docs/source/contributing/new_api_provider.md
@ -8,7 +8,7 @@ This guide contains references to walk you through adding a new API provider.
    - {repopath}`Remote Providers::llama_stack/providers/remote`
    - {repopath}`Inline Providers::llama_stack/providers/inline`

-3. [Build a Llama Stack distribution](https://llama-stack.readthedocs.io/en/latest/distribution_dev/building_distro.html) with your API provider.
+3. [Build a Llama Stack distribution](https://llama-stack.readthedocs.io/en/latest/distributions/building_distro.html) with your API provider.
 4. Test your code!

 ## Testing your newly added API providers
--- a/docs/source/distributions/self_hosted_distro/meta-reference-gpu.md
+++ b/docs/source/distributions/self_hosted_distro/meta-reference-gpu.md
@ -36,7 +36,7 @@ The following environment variables can be configured:

 ## Prerequisite: Downloading Models

-Please make sure you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](https://llama-stack.readthedocs.io/en/latest/cli_reference/download_models.html) here to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints.
+Please make sure you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/download_models.html) here to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints.

 ```
 $ ls ~/.llama/checkpoints
--- a/docs/source/distributions/self_hosted_distro/meta-reference-quantized-gpu.md
+++ b/docs/source/distributions/self_hosted_distro/meta-reference-quantized-gpu.md
@ -36,7 +36,7 @@ The following environment variables can be configured:

 ## Prerequisite: Downloading Models

-Please make sure you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](https://llama-stack.readthedocs.io/en/latest/cli_reference/download_models.html) here to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints.
+Please make sure you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/download_models.html) here to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints.

 ```
 $ ls ~/.llama/checkpoints
--- a/docs/source/distributions/self_hosted_distro/ollama.md
+++ b/docs/source/distributions/self_hosted_distro/ollama.md
@ -118,9 +118,9 @@ llama stack run ./run-with-safety.yaml \

 ### (Optional) Update Model Serving Configuration

-> [!NOTE]
-> Please check the [OLLAMA_SUPPORTED_MODELS](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers.remote/inference/ollama/ollama.py) for the supported Ollama models.
-
+```{note}
+Please check the [model_aliases](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/ollama/ollama.py#L45) variable for supported Ollama models.
+```

 To serve a new model with `ollama`
 ```bash
--- a/docs/to_situate/developer_cookbook.md
+++ b/docs/to_situate/developer_cookbook.md
@ -13,13 +13,13 @@ Based on your developer needs, below are references to guides to help you get st
 * Developer Need: I want to start a local Llama Stack server with my GPU using meta-reference implementations.
 * Effort: 5min
 * Guide:
-  - Please see our [meta-reference-gpu](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/meta-reference-gpu.html) on starting up a meta-reference Llama Stack server.
+  - Please see our [meta-reference-gpu](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/meta-reference-gpu.html) on starting up a meta-reference Llama Stack server.

 ### Llama Stack Server with Remote Providers
 * Developer need: I want a Llama Stack distribution with a remote provider.
 * Effort: 10min
 * Guide
-  - Please see our [Distributions Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/index.html) on starting up distributions with remote providers.
+  - Please see our [Distributions Guide](https://llama-stack.readthedocs.io/en/latest/concepts/index.html#distributions) on starting up distributions with remote providers.


 ### On-Device (iOS) Llama Stack
@ -38,4 +38,4 @@ Based on your developer needs, below are references to guides to help you get st
 * Developer Need: I want to add a new API provider to Llama Stack.
 * Effort: 3hr
 * Guide
-  - Please see our [Adding a New API Provider](https://llama-stack.readthedocs.io/en/latest/api_providers/new_api_provider.html) guide for adding a new API provider.
+  - Please see our [Adding a New API Provider](https://llama-stack.readthedocs.io/en/latest/contributing/new_api_provider.html) guide for adding a new API provider.
--- a/docs/zero_to_hero_guide/01_Local_Cloud_Inference101.ipynb
+++ b/docs/zero_to_hero_guide/01_Local_Cloud_Inference101.ipynb
@ -231,7 +231,7 @@
   "source": [
    "Thanks for checking out this notebook! \n",
    "\n",
-    "The next one will be a guide on [Prompt Engineering](./01_Prompt_Engineering101.ipynb), please continue learning!"
+    "The next one will be a guide on [Prompt Engineering](./02_Prompt_Engineering101.ipynb), please continue learning!"
   ]
  }
 ],
--- a/docs/zero_to_hero_guide/02_Prompt_Engineering101.ipynb
+++ b/docs/zero_to_hero_guide/02_Prompt_Engineering101.ipynb
@ -276,7 +276,7 @@
   "source": [
    "Thanks for checking out this notebook! \n",
    "\n",
-    "The next one will be a guide on how to chat with images, continue to the notebook [here](./02_Image_Chat101.ipynb). Happy learning!"
+    "The next one will be a guide on how to chat with images, continue to the notebook [here](./03_Image_Chat101.ipynb). Happy learning!"
   ]
  }
 ],
--- a/docs/zero_to_hero_guide/03_Image_Chat101.ipynb
+++ b/docs/zero_to_hero_guide/03_Image_Chat101.ipynb
@ -175,7 +175,7 @@
   "source": [
    "Thanks for checking out this notebook! \n",
    "\n",
-    "The next one in the series will teach you one of the favorite applications of Large Language Models: [Tool Calling](./03_Tool_Calling101.ipynb). Enjoy!"
+    "The next one in the series will teach you one of the favorite applications of Large Language Models: [Tool Calling](./04_Tool_Calling101.ipynb). Enjoy!"
   ]
  }
 ],
--- a/docs/zero_to_hero_guide/05_Memory101.ipynb
+++ b/docs/zero_to_hero_guide/05_Memory101.ipynb
@ -373,7 +373,7 @@
   "source": [
    "Awesome, now we can embed all our notes with Llama-stack and ask it about the meaning of life :)\n",
    "\n",
-    "Next up, we will learn about the safety features and how to use them: [notebook link](./05_Safety101.ipynb)"
+    "Next up, we will learn about the safety features and how to use them: [notebook link](./06_Safety101.ipynb)."
   ]
  }
 ],
--- a/docs/zero_to_hero_guide/06_Safety101.ipynb
+++ b/docs/zero_to_hero_guide/06_Safety101.ipynb
@ -107,7 +107,7 @@
   "source": [
    "Thanks for leaning about the Safety API of Llama-Stack. \n",
    "\n",
-    "Finally, we learn about the Agents API, [here](./06_Agents101.ipynb)"
+    "Finally, we learn about the Agents API, [here](./07_Agents101.ipynb)."
   ]
  }
 ],
--- a/docs/zero_to_hero_guide/README.md
+++ b/docs/zero_to_hero_guide/README.md
@ -1,37 +1,21 @@
 # Llama Stack: from Zero to Hero

-Llama-Stack allows you to configure your distribution from various providers, allowing you to focus on going from zero to production super fast.
+Llama Stack defines and standardizes the set of core building blocks needed to bring generative AI applications to market. These building blocks are presented in the form of interoperable APIs with a broad set of Providers providing their implementations. These building blocks are assembled into Distributions which are easy for developers to get from zero to production.

-This guide will walk you through how to build a local distribution, using Ollama as an inference provider.
+This guide will walk you through an end-to-end workflow with Llama Stack with Ollama as the inference provider and ChromaDB as the memory provider. Please note the steps for configuring your provider and distribution will vary a little depending on the services you use. However, the user experience will remain universal - this is the power of Llama-Stack.

-We also have a set of notebooks walking you through how to use Llama-Stack APIs:
+If you're looking for more specific topics, we have a [Zero to Hero Guide](#next-steps) that covers everything from Tool Calling to Agents in detail. Feel free to skip to the end to explore the advanced topics you're interested in.

- Inference
- Prompt Engineering
- Chatting with Images
- Tool Calling
- Memory API for RAG
- Safety API
- Agentic API
-
-Below, we will learn how to get started with Ollama as an inference provider, please note the steps for configuring your provider will vary a little depending on the service. However, the user experience will remain universal-this is the power of Llama-Stack.
-
-Prototype locally using Ollama, deploy to the cloud with your favorite provider or own deployment. Use any API from any provider while focussing on development.
-
-# Ollama Quickstart Guide
-
-This guide will walk you through setting up an end-to-end workflow with Llama Stack with ollama, enabling you to perform text generation using the `Llama3.2-3B-Instruct` model. Follow these steps to get started quickly.
-
-If you're looking for more specific topics like tool calling or agent setup, we have a [Zero to Hero Guide](#next-steps) that covers everything from Tool Calling to Agents in detail. Feel free to skip to the end to explore the advanced topics you're interested in.
-
-> If you'd prefer not to set up a local server, explore our notebook on [tool calling with the Together API](Tool_Calling101_Using_Together's_Llama_Stack_Server.ipynb). This guide will show you how to leverage Together.ai's Llama Stack Server API, allowing you to get started with Llama Stack without the need for a locally built and running server.
+> If you'd prefer not to set up a local server, explore our notebook on [tool calling with the Together API](Tool_Calling101_Using_Together's_Llama_Stack_Server.ipynb). This notebook will show you how to leverage together.ai's Llama Stack Server API, allowing you to get started with Llama Stack without the need for a locally built and running server.

 ## Table of Contents
-1. [Setup ollama](#setup-ollama)
+1. [Setup and run ollama](#setup-ollama)
 2. [Install Dependencies and Set Up Environment](#install-dependencies-and-set-up-environment)
 3. [Build, Configure, and Run Llama Stack](#build-configure-and-run-llama-stack)
-4. [Run Ollama Model](#run-ollama-model)
-5. [Next Steps](#next-steps)
+4. [Test with llama-stack-client CLI](#test-with-llama-stack-client-cli)
+5. [Test with curl](#test-with-curl)
+6. [Test with Python](#test-with-python)
+7. [Next Steps](#next-steps)

 ---

@ -39,107 +23,137 @@ If you're looking for more specific topics like tool calling or agent setup, we

 1. **Download Ollama App**:
   - Go to [https://ollama.com/download](https://ollama.com/download).
-   - Download and unzip `Ollama-darwin.zip`.
+   - Follow instructions based on the OS you are on. For example, if you are on a Mac, download and unzip `Ollama-darwin.zip`.
   - Run the `Ollama` application.

 1. **Download the Ollama CLI**:
-   - Ensure you have the `ollama` command line tool by downloading and installing it from the same website.
+   Ensure you have the `ollama` command line tool by downloading and installing it from the same website.

 1. **Start ollama server**:
-   - Open the terminal and run:
-      ```
-      ollama serve
-      ```
-
+   Open the terminal and run:
+   ```
+   ollama serve
+   ```
 1. **Run the model**:
-   - Open the terminal and run:
-     ```bash
-     ollama run llama3.2:3b-instruct-fp16
-     ```
-     **Note**: The supported models for llama stack for now is listed in [here](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/ollama/ollama.py#L43)
-
+   Open the terminal and run:
+   ```bash
+   ollama run llama3.2:3b-instruct-fp16 --keepalive -1m
+   ```
+   **Note**:
+     - The supported models for llama stack for now is listed in [here](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/ollama/ollama.py#L43)
+     - `keepalive -1m` is used so that ollama continues to keep the model in memory indefinitely. Otherwise, ollama frees up memory and you would have to run `ollama run` again.

 ---

 ## Install Dependencies and Set Up Environment

 1. **Create a Conda Environment**:
-   - Create a new Conda environment with Python 3.10:
-     ```bash
-     conda create -n ollama python=3.10
-     ```
-   - Activate the environment:
-     ```bash
-     conda activate ollama
-     ```
+   Create a new Conda environment with Python 3.10:
+   ```bash
+   conda create -n ollama python=3.10
+   ```
+   Activate the environment:
+   ```bash
+   conda activate ollama
+   ```

 2. **Install ChromaDB**:
-   - Install `chromadb` using `pip`:
-     ```bash
-     pip install chromadb
-     ```
+   Install `chromadb` using `pip`:
+   ```bash
+   pip install chromadb
+   ```

 3. **Run ChromaDB**:
-   - Start the ChromaDB server:
-     ```bash
-     chroma run --host localhost --port 8000 --path ./my_chroma_data
-     ```
+   Start the ChromaDB server:
+   ```bash
+   chroma run --host localhost --port 8000 --path ./my_chroma_data
+   ```

 4. **Install Llama Stack**:
-   - Open a new terminal and install `llama-stack`:
-     ```bash
-     conda activate hack
-     pip install llama-stack==0.0.53
-     ```
+   Open a new terminal and install `llama-stack`:
+   ```bash
+   conda activate ollama
+   pip install llama-stack==0.0.55
+   ```

 ---

 ## Build, Configure, and Run Llama Stack

 1. **Build the Llama Stack**:
-   - Build the Llama Stack using the `ollama` template:
-     ```bash
-     llama stack build --template ollama --image-type conda
-     ```
-
-After this step, you will see the console output:
-
-```
-Build Successful! Next steps:
+   Build the Llama Stack using the `ollama` template:
+   ```bash
+   llama stack build --template ollama --image-type conda
+   ```
+   **Expected Output:**
+   ```
+   ...
+   Build Successful! Next steps:
   1. Set the environment variables: LLAMASTACK_PORT, OLLAMA_URL, INFERENCE_MODEL, SAFETY_MODEL
-   2. `llama stack run /Users/username/.llama/distributions/llamastack-ollama/ollama-run.yaml`
-```
+   2. `llama stack run /Users/<username>/.llama/distributions/llamastack-ollama/ollama-run.yaml
+   ```

-2. **Set the ENV variables by exporting them to the terminal**:
-```bash
-export OLLAMA_URL="http://localhost:11434"
-export LLAMA_STACK_PORT=5001
-export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
-export SAFETY_MODEL="meta-llama/Llama-Guard-3-1B"
-```
+3. **Set the ENV variables by exporting them to the terminal**:
+   ```bash
+   export OLLAMA_URL="http://localhost:11434"
+   export LLAMA_STACK_PORT=5051
+   export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
+   export SAFETY_MODEL="meta-llama/Llama-Guard-3-1B"
+   ```

 3. **Run the Llama Stack**:
-   - Run the stack with command shared by the API from earlier:
-     ```bash
-     llama stack run ollama  \
-    --port $LLAMA_STACK_PORT \
-    --env INFERENCE_MODEL=$INFERENCE_MODEL \
-    --env SAFETY_MODEL=$SAFETY_MODEL \
-    --env OLLAMA_URL=http://localhost:11434
-     ```
-
-Note: Everytime you run a new model with `ollama run`, you will need to restart the llama stack. Otherwise it won't see the new model
+   Run the stack with command shared by the API from earlier:
+   ```bash
+   llama stack run ollama  \
+      --port $LLAMA_STACK_PORT \
+      --env INFERENCE_MODEL=$INFERENCE_MODEL \
+      --env SAFETY_MODEL=$SAFETY_MODEL \
+      --env OLLAMA_URL=$OLLAMA_URL
+   ```
+   Note: Everytime you run a new model with `ollama run`, you will need to restart the llama stack. Otherwise it won't see the new model.

 The server will start and listen on `http://localhost:5051`.

 ---
+## Test with `llama-stack-client` CLI
+After setting up the server, open a new terminal window and install the llama-stack-client package.

-## Testing with `curl`
+1. Install the llama-stack-client package
+   ```bash
+   conda activate ollama
+   pip install llama-stack-client
+   ```
+2. Configure the CLI to point to the llama-stack server.
+   ```bash
+   llama-stack-client configure --endpoint http://localhost:5051
+   ```
+   **Expected Output:**
+   ```bash
+   Done! You can now use the Llama Stack Client CLI with endpoint http://localhost:5051
+   ```
+3. Test the CLI by running inference:
+   ```bash
+   llama-stack-client inference chat-completion --message "Write me a 2-sentence poem about the moon"
+   ```
+   **Expected Output:**
+   ```bash
+   ChatCompletionResponse(
+       completion_message=CompletionMessage(
+           content='Here is a 2-sentence poem about the moon:\n\nSilver crescent shining bright in the night,\nA beacon of wonder, full of gentle light.',
+           role='assistant',
+           stop_reason='end_of_turn',
+           tool_calls=[]
+       ),
+       logprobs=None
+   )
+   ```
+
+## Test with `curl`

 After setting up the server, open a new terminal window and verify it's working by sending a `POST` request using `curl`:

 ```bash
-curl http://localhost:5051/inference/chat_completion \
+curl http://localhost:$LLAMA_STACK_PORT/inference/chat_completion \
 -H "Content-Type: application/json" \
 -d '{
    "model": "Llama3.2-3B-Instruct",
@ -168,15 +182,16 @@ You can check the available models with the command `llama-stack-client models l

 ---

-## Testing with Python
+## Test with Python

 You can also interact with the Llama Stack server using a simple Python script. Below is an example:

-### 1. Active Conda Environment and Install Required Python Packages
+### 1. Activate Conda Environment and Install Required Python Packages
 The `llama-stack-client` library offers a robust and efficient python methods for interacting with the Llama Stack server.

 ```bash
-conda activate your-llama-stack-conda-env
+conda activate ollama
+pip install llama-stack-client
 ```

 Note, the client library gets installed by default if you install the server library
@ -188,6 +203,8 @@ touch test_llama_stack.py

 ### 3. Create a Chat Completion Request in Python

+In `test_llama_stack.py`, write the following code:
+
 ```python
 from llama_stack_client import LlamaStackClient

@ -227,15 +244,15 @@ This command initializes the model to interact with your local Llama Stack insta
 ## Next Steps

 **Explore Other Guides**: Dive deeper into specific topics by following these guides:
- [Understanding Distribution](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html#decide-your-inference-provider)
+- [Understanding Distribution](https://llama-stack.readthedocs.io/en/latest/concepts/index.html#distributions)
 - [Inference 101](00_Inference101.ipynb)
- [Local and Cloud Model Toggling 101](00_Local_Cloud_Inference101.ipynb)
- [Prompt Engineering](01_Prompt_Engineering101.ipynb)
- [Chat with Image - LlamaStack Vision API](02_Image_Chat101.ipynb)
- [Tool Calling: How to and Details](03_Tool_Calling101.ipynb)
- [Memory API: Show Simple In-Memory Retrieval](04_Memory101.ipynb)
- [Using Safety API in Conversation](05_Safety101.ipynb)
- [Agents API: Explain Components](06_Agents101.ipynb)
+- [Local and Cloud Model Toggling 101](01_Local_Cloud_Inference101.ipynb)
+- [Prompt Engineering](02_Prompt_Engineering101.ipynb)
+- [Chat with Image - LlamaStack Vision API](03_Image_Chat101.ipynb)
+- [Tool Calling: How to and Details](04_Tool_Calling101.ipynb)
+- [Memory API: Show Simple In-Memory Retrieval](05_Memory101.ipynb)
+- [Using Safety API in Conversation](06_Safety101.ipynb)
+- [Agents API: Explain Components](07_Agents101.ipynb)


 **Explore Client SDKs**: Utilize our client SDKs for various languages to integrate Llama Stack into your applications:
@ -244,7 +261,7 @@ This command initializes the model to interact with your local Llama Stack insta
  - [Swift SDK](https://github.com/meta-llama/llama-stack-client-swift)
  - [Kotlin SDK](https://github.com/meta-llama/llama-stack-client-kotlin)

-**Advanced Configuration**: Learn how to customize your Llama Stack distribution by referring to the [Building a Llama Stack Distribution](https://llama-stack.readthedocs.io/en/latest/distributions/index.html#building-your-own-distribution) guide.
+**Advanced Configuration**: Learn how to customize your Llama Stack distribution by referring to the [Building a Llama Stack Distribution](https://llama-stack.readthedocs.io/en/latest/distributions/building_distro.html) guide.

 **Explore Example Apps**: Check out [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) for example applications built using Llama Stack.

--- a/llama_stack/templates/meta-reference-gpu/doc_template.md
+++ b/llama_stack/templates/meta-reference-gpu/doc_template.md
@ -29,7 +29,7 @@ The following environment variables can be configured:

 ## Prerequisite: Downloading Models

-Please make sure you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](https://llama-stack.readthedocs.io/en/latest/cli_reference/download_models.html) here to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints.
+Please make sure you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/download_models.html) here to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints.

 ```
 $ ls ~/.llama/checkpoints
--- a/llama_stack/templates/meta-reference-quantized-gpu/doc_template.md
+++ b/llama_stack/templates/meta-reference-quantized-gpu/doc_template.md
@ -31,7 +31,7 @@ The following environment variables can be configured:

 ## Prerequisite: Downloading Models

-Please make sure you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](https://llama-stack.readthedocs.io/en/latest/cli_reference/download_models.html) here to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints.
+Please make sure you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/download_models.html) here to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints.

 ```
 $ ls ~/.llama/checkpoints
--- a/requirements.txt
+++ b/requirements.txt
@ -2,8 +2,8 @@ blobfile
 fire
 httpx
 huggingface-hub
-llama-models>=0.0.55
-llama-stack-client>=0.0.55
+llama-models>=0.0.56
+llama-stack-client>=0.0.56
 prompt-toolkit
 python-dotenv
 pydantic>=2
--- a/setup.py
+++ b/setup.py
@ -16,7 +16,7 @@ def read_requirements():

 setup(
    name="llama_stack",
-    version="0.0.55",
+    version="0.0.56",
    author="Meta Llama",
    author_email="llama-oss@meta.com",
    description="Llama Stack",