Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-08-12)
refactor: remove Conda support from Llama Stack (#2969)
# What does this PR do?

This PR removes Conda support from Llama Stack: the `conda` image type is dropped from `llama stack build` and `llama stack run`, and the documentation now describes `venv` (or container) workflows instead.

Closes #2539

## Test Plan
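A minimal way to exercise the change locally (a sketch based on commands from the updated docs; `starter` is just an example template, and exact output will vary by environment):

```bash
# Build and run a distribution with the venv image type (per the updated getting-started docs)
uv run --with llama-stack llama stack build --template starter --image-type venv --run

# The conda option should no longer appear in the CLI help
llama stack build -h   # --image-type should now offer only {container,venv}
llama stack run -h     # --image-type should now offer only {venv}
```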
commit a749d5f4a4 (parent f2eee4e417)
44 changed files with 159 additions and 311 deletions
@@ -97,7 +97,7 @@ To start the Llama Stack Playground, run the following commands:
 1. Start up the Llama Stack API server

 ```bash
-llama stack build --template together --image-type conda
+llama stack build --template together --image-type venv
 llama stack run together
 ```

@@ -47,13 +47,13 @@ pip install -e .
 ```
 Use the CLI to build your distribution.
 The main points to consider are:
-1. **Image Type** - Do you want a Conda / venv environment or a Container (eg. Docker)
+1. **Image Type** - Do you want a venv environment or a Container (eg. Docker)
 2. **Template** - Do you want to use a template to build your distribution? or start from scratch ?
 3. **Config** - Do you want to use a pre-existing config file to build your distribution?

 ```
 llama stack build -h
-usage: llama stack build [-h] [--config CONFIG] [--template TEMPLATE] [--list-templates] [--image-type {conda,container,venv}] [--image-name IMAGE_NAME] [--print-deps-only] [--run]
+usage: llama stack build [-h] [--config CONFIG] [--template TEMPLATE] [--list-templates] [--image-type {container,venv}] [--image-name IMAGE_NAME] [--print-deps-only] [--run]

 Build a Llama stack container

@@ -63,10 +63,10 @@ options:
         be prompted to enter information interactively (default: None)
   --template TEMPLATE   Name of the example template config to use for build. You may use `llama stack build --list-templates` to check out the available templates (default: None)
   --list-templates      Show the available templates for building a Llama Stack distribution (default: False)
-  --image-type {conda,container,venv}
+  --image-type {container,venv}
         Image Type to use for the build. If not specified, will use the image type from the template config. (default: None)
   --image-name IMAGE_NAME
-        [for image-type=conda|container|venv] Name of the conda or virtual environment to use for the build. If not specified, currently active environment will be used if
+        [for image-type=container|venv] Name of the virtual environment to use for the build. If not specified, currently active environment will be used if
         found. (default: None)
   --print-deps-only     Print the dependencies for the stack only, without building the stack (default: False)
   --run                 Run the stack after building using the same image type, name, and other applicable arguments (default: False)

@@ -159,7 +159,7 @@ It would be best to start with a template and understand the structure of the co
 llama stack build

 > Enter a name for your Llama Stack (e.g. my-local-stack): my-stack
-> Enter the image type you want your Llama Stack to be built as (container or conda or venv): conda
+> Enter the image type you want your Llama Stack to be built as (container or venv): venv

 Llama Stack is composed of several APIs working together. Let's select
 the provider types (implementations) you want to use for these APIs.

@@ -312,7 +312,7 @@ Now, let's start the Llama Stack Distribution Server. You will need the YAML con
 ```
 llama stack run -h
 usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--env KEY=VALUE]
-                       [--image-type {conda,venv}] [--enable-ui]
+                       [--image-type {venv}] [--enable-ui]
                        [config | template]

 Start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.

@@ -326,8 +326,8 @@ options:
   --image-name IMAGE_NAME
         Name of the image to run. Defaults to the current environment (default: None)
   --env KEY=VALUE       Environment variables to pass to the server in KEY=VALUE format. Can be specified multiple times. (default: None)
-  --image-type {conda,venv}
-        Image Type used during the build. This can be either conda or venv. (default: None)
+  --image-type {venv}
+        Image Type used during the build. This should be venv. (default: None)
   --enable-ui           Start the UI server (default: False)
 ```

@@ -342,9 +342,6 @@ llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-

 # Start using a venv
 llama stack run --image-type venv ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml
-
-# Start using a conda environment
-llama stack run --image-type conda ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml
 ```

 ```

@@ -10,7 +10,6 @@ The default `run.yaml` files generated by templates are starting points for your

 ```yaml
 version: 2
-conda_env: ollama
 apis:
 - agents
 - inference

@@ -56,10 +56,10 @@ Breaking down the demo app, this section will show the core pieces that are used
 ### Setup Remote Inferencing
 Start a Llama Stack server on localhost. Here is an example of how you can do this using the firework.ai distribution:
 ```
-conda create -n stack-fireworks python=3.10
-conda activate stack-fireworks
+python -m venv stack-fireworks
+source stack-fireworks/bin/activate  # On Windows: stack-fireworks\Scripts\activate
 pip install --no-cache llama-stack==0.2.2
-llama stack build --template fireworks --image-type conda
+llama stack build --template fireworks --image-type venv
 export FIREWORKS_API_KEY=<SOME_KEY>
 llama stack run fireworks --port 5050
 ```

@@ -57,7 +57,7 @@ Make sure you have access to a watsonx API Key. You can get one by referring [wa

 ## Running Llama Stack with watsonx

-You can do this via Conda (build code), venv or Docker which has a pre-built image.
+You can do this via venv or Docker which has a pre-built image.

 ### Via Docker

@@ -76,13 +76,3 @@ docker run \
   --env WATSONX_PROJECT_ID=$WATSONX_PROJECT_ID \
   --env WATSONX_BASE_URL=$WATSONX_BASE_URL
 ```
-
-### Via Conda
-
-```bash
-llama stack build --template watsonx --image-type conda
-llama stack run ./run.yaml \
-  --port $LLAMA_STACK_PORT \
-  --env WATSONX_API_KEY=$WATSONX_API_KEY \
-  --env WATSONX_PROJECT_ID=$WATSONX_PROJECT_ID
-```

@@ -114,7 +114,7 @@ podman run --rm -it \

 ## Running Llama Stack

-Now you are ready to run Llama Stack with TGI as the inference provider. You can do this via Conda (build code) or Docker which has a pre-built image.
+Now you are ready to run Llama Stack with TGI as the inference provider. You can do this via venv or Docker which has a pre-built image.

 ### Via Docker

@@ -164,12 +164,12 @@ docker run \
   --env CHROMA_URL=$CHROMA_URL
 ```

-### Via Conda
+### Via venv

 Make sure you have done `pip install llama-stack` and have the Llama Stack CLI available.

 ```bash
-llama stack build --template dell --image-type conda
+llama stack build --template dell --image-type venv
 llama stack run dell
   --port $LLAMA_STACK_PORT \
   --env INFERENCE_MODEL=$INFERENCE_MODEL \

@@ -70,7 +70,7 @@ $ llama model list --downloaded

 ## Running the Distribution

-You can do this via Conda (build code) or Docker which has a pre-built image.
+You can do this via venv or Docker which has a pre-built image.

 ### Via Docker

@@ -104,12 +104,12 @@ docker run \
   --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
 ```

-### Via Conda
+### Via venv

 Make sure you have done `uv pip install llama-stack` and have the Llama Stack CLI available.

 ```bash
-llama stack build --template meta-reference-gpu --image-type conda
+llama stack build --template meta-reference-gpu --image-type venv
 llama stack run distributions/meta-reference-gpu/run.yaml \
   --port 8321 \
   --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct

@@ -133,7 +133,7 @@ curl -X DELETE "$NEMO_URL/v1/deployment/model-deployments/meta/llama-3.1-8b-inst

 ## Running Llama Stack with NVIDIA

-You can do this via Conda or venv (build code), or Docker which has a pre-built image.
+You can do this via venv (build code), or Docker which has a pre-built image.

 ### Via Docker

@@ -152,17 +152,6 @@ docker run \
   --env NVIDIA_API_KEY=$NVIDIA_API_KEY
 ```

-### Via Conda
-
-```bash
-INFERENCE_MODEL=meta-llama/Llama-3.1-8b-Instruct
-llama stack build --template nvidia --image-type conda
-llama stack run ./run.yaml \
-  --port 8321 \
-  --env NVIDIA_API_KEY=$NVIDIA_API_KEY \
-  --env INFERENCE_MODEL=$INFERENCE_MODEL
-```
-
 ### Via venv

 If you've set up your local development environment, you can also build the image using your local virtual environment.

@@ -145,7 +145,7 @@ This distribution comes with a default "llama-guard" shield that can be enabled

 ## Running the Distribution

-You can run the starter distribution via Docker, Conda, or venv.
+You can run the starter distribution via Docker or venv.

 ### Via Docker

@@ -164,12 +164,12 @@ docker run \
   --port $LLAMA_STACK_PORT
 ```

-### Via Conda or venv
+### Via venv

 Ensure you have configured the starter distribution using the environment variables explained above.

 ```bash
-uv run --with llama-stack llama stack build --template starter --image-type <conda|venv> --run
+uv run --with llama-stack llama stack build --template starter --image-type venv --run
 ```

 ## Example Usage

@@ -11,12 +11,6 @@ This is the simplest way to get started. Using Llama Stack as a library means yo

 Another simple way to start interacting with Llama Stack is to just spin up a container (via Docker or Podman) which is pre-built with all the providers you need. We provide a number of pre-built images so you can start a Llama Stack server instantly. You can also build your own custom container. Which distribution to choose depends on the hardware you have. See [Selection of a Distribution](selection) for more details.

-
-## Conda:
-
-If you have a custom or an advanced setup or you are developing on Llama Stack you can also build a custom Llama Stack server. Using `llama stack build` and `llama stack run` you can build/run a custom Llama Stack server containing the exact combination of providers you wish. We have also provided various templates to make getting started easier. See [Building a Custom Distribution](building_distro) for more details.
-
-
 ## Kubernetes:

 If you have built a container image and want to deploy it in a Kubernetes cluster instead of starting the Llama Stack server locally. See [Kubernetes Deployment Guide](kubernetes_deployment) for more details.

@@ -62,7 +62,7 @@ We use `starter` as template. By default all providers are disabled, this requir
 llama stack build --template starter --image-type venv --run
 ```
 :::
-:::{tab-item} Using `conda`
+:::{tab-item} Using `venv`
 You can use Python to build and run the Llama Stack server, which is useful for testing and development.

 Llama Stack uses a [YAML configuration file](../distributions/configuration.md) to specify the stack setup,

@@ -70,7 +70,7 @@ which defines the providers and their settings.
 Now let's build and run the Llama Stack config for Ollama.

 ```bash
-llama stack build --template starter --image-type conda --run
+llama stack build --template starter --image-type venv --run
 ```
 :::
 :::{tab-item} Using a Container

@@ -150,10 +150,10 @@ pip install llama-stack-client
 ```
 :::

-:::{tab-item} Install with `conda`
+:::{tab-item} Install with `venv`
 ```bash
-yes | conda create -n stack-client python=3.12
-conda activate stack-client
+python -m venv stack-client
+source stack-client/bin/activate  # On Windows: stack-client\Scripts\activate
 pip install llama-stack-client
 ```
 :::

@@ -19,11 +19,11 @@ You have two ways to install Llama Stack:
 cd ~/local
 git clone git@github.com:meta-llama/llama-stack.git

-conda create -n myenv python=3.10
-conda activate myenv
+python -m venv myenv
+source myenv/bin/activate  # On Windows: myenv\Scripts\activate

 cd llama-stack
-$CONDA_PREFIX/bin/pip install -e .
+pip install -e .

 ## Downloading models via CLI

@@ -19,11 +19,11 @@ You have two ways to install Llama Stack:
 cd ~/local
 git clone git@github.com:meta-llama/llama-stack.git

-conda create -n myenv python=3.10
-conda activate myenv
+python -m venv myenv
+source myenv/bin/activate  # On Windows: myenv\Scripts\activate

 cd llama-stack
-$CONDA_PREFIX/bin/pip install -e .
+pip install -e .


 ## `llama` subcommands

@@ -47,20 +47,20 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next

 ## Install Dependencies and Set Up Environment

-1. **Create a Conda Environment**:
-   Create a new Conda environment with Python 3.12:
+1. **Install uv**:
+   Install [uv](https://docs.astral.sh/uv/) for managing dependencies:
    ```bash
-   conda create -n ollama python=3.12
-   ```
-   Activate the environment:
-   ```bash
-   conda activate ollama
+   # macOS and Linux
+   curl -LsSf https://astral.sh/uv/install.sh | sh
+
+   # Windows
+   powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
    ```

 2. **Install ChromaDB**:
-   Install `chromadb` using `pip`:
+   Install `chromadb` using `uv`:
    ```bash
-   pip install chromadb
+   uv pip install chromadb
    ```

 3. **Run ChromaDB**:

@@ -69,28 +69,21 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next
    chroma run --host localhost --port 8000 --path ./my_chroma_data
    ```

-4. **Install Llama Stack**:
-   Open a new terminal and install `llama-stack`:
-   ```bash
-   conda activate ollama
-   pip install -U llama-stack
-   ```
-
 ---

 ## Build, Configure, and Run Llama Stack

 1. **Build the Llama Stack**:
-   Build the Llama Stack using the `ollama` template:
+   Build the Llama Stack using the `starter` template:
    ```bash
-   llama stack build --template starter --image-type conda
+   uv run --with llama-stack llama stack build --template starter --image-type venv
    ```
    **Expected Output:**
    ```bash
    ...
    Build Successful!
-   You can find the newly-built template here: ~/.llama/distributions/ollama/ollama-run.yaml
-   You can run the new Llama Stack Distro via: llama stack run ~/.llama/distributions/ollama/ollama-run.yaml --image-type conda
+   You can find the newly-built template here: ~/.llama/distributions/starter/starter-run.yaml
+   You can run the new Llama Stack Distro via: uv run --with llama-stack llama stack run starter --image-type venv
    ```

 3. **Set the ENV variables by exporting them to the terminal**:

@@ -102,12 +95,13 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next
    ```

 3. **Run the Llama Stack**:
-   Run the stack with command shared by the API from earlier:
+   Run the stack using uv:
    ```bash
-   llama stack run ollama
-   --port $LLAMA_STACK_PORT
-   --env INFERENCE_MODEL=$INFERENCE_MODEL
-   --env SAFETY_MODEL=$SAFETY_MODEL
+   uv run --with llama-stack llama stack run starter \
+   --image-type venv \
+   --port $LLAMA_STACK_PORT \
+   --env INFERENCE_MODEL=$INFERENCE_MODEL \
+   --env SAFETY_MODEL=$SAFETY_MODEL \
+   --env OLLAMA_URL=$OLLAMA_URL
    ```
    Note: Every time you run a new model with `ollama run`, you will need to restart the llama stack. Otherwise it won't see the new model.

@@ -120,7 +114,7 @@ After setting up the server, open a new terminal window and configure the llama-

 1. Configure the CLI to point to the llama-stack server.
    ```bash
-   llama-stack-client configure --endpoint http://localhost:8321
+   uv run --with llama-stack-client llama-stack-client configure --endpoint http://localhost:8321
    ```
    **Expected Output:**
    ```bash

@@ -128,7 +122,7 @@ After setting up the server, open a new terminal window and configure the llama-
    ```
 2. Test the CLI by running inference:
    ```bash
-   llama-stack-client inference chat-completion --message "Write me a 2-sentence poem about the moon"
+   uv run --with llama-stack-client llama-stack-client inference chat-completion --message "Write me a 2-sentence poem about the moon"
    ```
    **Expected Output:**
    ```bash

@@ -170,7 +164,7 @@ curl http://localhost:$LLAMA_STACK_PORT/alpha/inference/chat-completion
 EOF
 ```

-You can check the available models with the command `llama-stack-client models list`.
+You can check the available models with the command `uv run --with llama-stack-client llama-stack-client models list`.

 **Expected Output:**
 ```json

@@ -191,18 +185,12 @@ You can check the available models with the command `llama-stack-client models l

 You can also interact with the Llama Stack server using a simple Python script. Below is an example:

-### 1. Activate Conda Environment
-
-```bash
-conda activate ollama
-```
-
-### 2. Create Python Script (`test_llama_stack.py`)
+### 1. Create Python Script (`test_llama_stack.py`)
 ```bash
 touch test_llama_stack.py
 ```

-### 3. Create a Chat Completion Request in Python
+### 2. Create a Chat Completion Request in Python

 In `test_llama_stack.py`, write the following code:

@@ -233,10 +221,10 @@ response = client.inference.chat_completion(
 print(response.completion_message.content)
 ```

-### 4. Run the Python Script
+### 3. Run the Python Script

 ```bash
-python test_llama_stack.py
+uv run --with llama-stack-client python test_llama_stack.py
 ```

 **Expected Output:**
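For readers migrating existing setups, the documentation changes above all apply the same substitution: each conda environment recipe becomes a standard virtual environment plus the `venv` image type. A representative sketch (the environment and template names are illustrative, drawn from the diffs above):

```bash
# Previously: conda create -n myenv python=3.10 && conda activate myenv
python -m venv myenv
source myenv/bin/activate   # On Windows: myenv\Scripts\activate
pip install llama-stack

# Build and run with the venv image type instead of conda
llama stack build --template starter --image-type venv
llama stack run starter
```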