refactor: remove Conda support from Llama Stack (#2969)

# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR removes Conda support from Llama Stack.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Closes #2539

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
Commit a749d5f4a4 (parent f2eee4e417), authored by IAN MILLER on 2025-08-02 23:52:59 +01:00, committed by GitHub
44 changed files with 159 additions and 311 deletions

@@ -47,20 +47,20 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next
## Install Dependencies and Set Up Environment
- 1. **Create a Conda Environment**:
- Create a new Conda environment with Python 3.12:
+ 1. **Install uv**:
+ Install [uv](https://docs.astral.sh/uv/) for managing dependencies:
```bash
- conda create -n ollama python=3.12
- ```
- Activate the environment:
- ```bash
- conda activate ollama
+ # macOS and Linux
+ curl -LsSf https://astral.sh/uv/install.sh | sh
+ # Windows
+ powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```
2. **Install ChromaDB**:
- Install `chromadb` using `pip`:
+ Install `chromadb` using `uv`:
```bash
- pip install chromadb
+ uv pip install chromadb
```
3. **Run ChromaDB**:
@@ -69,28 +69,21 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next
chroma run --host localhost --port 8000 --path ./my_chroma_data
```
4. **Install Llama Stack**:
- Open a new terminal and install `llama-stack`:
- ```bash
- conda activate ollama
- pip install -U llama-stack
- ```
---
## Build, Configure, and Run Llama Stack
1. **Build the Llama Stack**:
- Build the Llama Stack using the `ollama` template:
+ Build the Llama Stack using the `starter` template:
```bash
- llama stack build --template starter --image-type conda
+ uv run --with llama-stack llama stack build --template starter --image-type venv
```
**Expected Output:**
```bash
...
Build Successful!
- You can find the newly-built template here: ~/.llama/distributions/ollama/ollama-run.yaml
- You can run the new Llama Stack Distro via: llama stack run ~/.llama/distributions/ollama/ollama-run.yaml --image-type conda
+ You can find the newly-built template here: ~/.llama/distributions/starter/starter-run.yaml
+ You can run the new Llama Stack Distro via: uv run --with llama-stack llama stack run starter --image-type venv
```
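For reference, a quick way to confirm the build landed where the expected output above says it should is to check for the run config on disk; a minimal sketch, assuming the default `~/.llama` location printed by the build step:

```python
from pathlib import Path

# The build output above reports the run config at this default location.
run_config = Path.home() / ".llama" / "distributions" / "starter" / "starter-run.yaml"

if run_config.exists():
    print(f"Found run config: {run_config}")
else:
    print("Run config not found; re-run the `llama stack build` step above.")
```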
3. **Set the ENV variables by exporting them to the terminal**:
@@ -102,12 +95,13 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next
```
3. **Run the Llama Stack**:
- Run the stack with command shared by the API from earlier:
+ Run the stack using uv:
```bash
- llama stack run ollama
- --port $LLAMA_STACK_PORT
- --env INFERENCE_MODEL=$INFERENCE_MODEL
- --env SAFETY_MODEL=$SAFETY_MODEL
+ uv run --with llama-stack llama stack run starter \
+ --image-type venv \
+ --port $LLAMA_STACK_PORT \
+ --env INFERENCE_MODEL=$INFERENCE_MODEL \
+ --env SAFETY_MODEL=$SAFETY_MODEL \
--env OLLAMA_URL=$OLLAMA_URL
```
Note: Every time you run a new model with `ollama run`, you will need to restart the llama stack. Otherwise it won't see the new model.
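For reference, once the server is running you can confirm it is reachable from Python as well; a minimal sketch, assuming the default port 8321 (substitute your `$LLAMA_STACK_PORT`) and that `llama-stack-client` is available, for example under `uv run --with llama-stack-client`:

```python
from llama_stack_client import LlamaStackClient

# Assumes the server started above is listening on the default port 8321.
client = LlamaStackClient(base_url="http://localhost:8321")

# Listing models is a cheap round trip that confirms the stack is up
# and that the Ollama-backed models are registered.
for model in client.models.list():
    print(model.identifier)
```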
@@ -120,7 +114,7 @@ After setting up the server, open a new terminal window and configure the llama-
1. Configure the CLI to point to the llama-stack server.
```bash
- llama-stack-client configure --endpoint http://localhost:8321
+ uv run --with llama-stack-client llama-stack-client configure --endpoint http://localhost:8321
```
**Expected Output:**
```bash
@@ -128,7 +122,7 @@ After setting up the server, open a new terminal window and configure the llama-
```
2. Test the CLI by running inference:
```bash
- llama-stack-client inference chat-completion --message "Write me a 2-sentence poem about the moon"
+ uv run --with llama-stack-client llama-stack-client inference chat-completion --message "Write me a 2-sentence poem about the moon"
```
**Expected Output:**
```bash
@@ -170,7 +164,7 @@ curl http://localhost:$LLAMA_STACK_PORT/alpha/inference/chat-completion
EOF
```
- You can check the available models with the command `llama-stack-client models list`.
+ You can check the available models with the command `uv run --with llama-stack-client llama-stack-client models list`.
**Expected Output:**
```json
@@ -191,18 +185,12 @@ You can check the available models with the command `llama-stack-client models l
You can also interact with the Llama Stack server using a simple Python script. Below is an example:
- ### 1. Activate Conda Environment
- ```bash
- conda activate ollama
- ```
- ### 2. Create Python Script (`test_llama_stack.py`)
+ ### 1. Create Python Script (`test_llama_stack.py`)
```bash
touch test_llama_stack.py
```
- ### 3. Create a Chat Completion Request in Python
+ ### 2. Create a Chat Completion Request in Python
In `test_llama_stack.py`, write the following code:
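The body of the script falls outside the hunks shown here, so only its closing lines appear in the next hunk; for orientation, a minimal sketch of such a script, assuming the default port 8321 and an illustrative model id (use whichever model you loaded with `ollama run`):

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

response = client.inference.chat_completion(
    # Illustrative model id; match it to the model served by Ollama.
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    messages=[
        {"role": "system", "content": "You are a friendly assistant."},
        {"role": "user", "content": "Write a two-sentence poem about the moon."},
    ],
)

print(response.completion_message.content)
```

The script is then run with `uv run --with llama-stack-client python test_llama_stack.py`, as the diff below shows.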
@@ -233,10 +221,10 @@ response = client.inference.chat_completion(
print(response.completion_message.content)
```
- ### 4. Run the Python Script
+ ### 3. Run the Python Script
```bash
- python test_llama_stack.py
+ uv run --with llama-stack-client python test_llama_stack.py
```
**Expected Output:**