refactor: remove Conda support from Llama Stack (#2969)

# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR is responsible for removal of Conda support in Llama Stack

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Closes #2539

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
This commit is contained in:
IAN MILLER 2025-08-02 23:52:59 +01:00 committed by GitHub
parent f2eee4e417
commit a749d5f4a4
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
44 changed files with 159 additions and 311 deletions

View file

@ -97,7 +97,7 @@ To start the Llama Stack Playground, run the following commands:
1. Start up the Llama Stack API server
```bash
llama stack build --template together --image-type conda
llama stack build --template together --image-type venv
llama stack run together
```

View file

@ -47,13 +47,13 @@ pip install -e .
```
Use the CLI to build your distribution.
The main points to consider are:
1. **Image Type** - Do you want a Conda / venv environment or a Container (eg. Docker)
1. **Image Type** - Do you want a venv environment or a Container (eg. Docker)
2. **Template** - Do you want to use a template to build your distribution? or start from scratch ?
3. **Config** - Do you want to use a pre-existing config file to build your distribution?
```
llama stack build -h
usage: llama stack build [-h] [--config CONFIG] [--template TEMPLATE] [--list-templates] [--image-type {conda,container,venv}] [--image-name IMAGE_NAME] [--print-deps-only] [--run]
usage: llama stack build [-h] [--config CONFIG] [--template TEMPLATE] [--list-templates] [--image-type {container,venv}] [--image-name IMAGE_NAME] [--print-deps-only] [--run]
Build a Llama stack container
@ -63,10 +63,10 @@ options:
be prompted to enter information interactively (default: None)
--template TEMPLATE Name of the example template config to use for build. You may use `llama stack build --list-templates` to check out the available templates (default: None)
--list-templates Show the available templates for building a Llama Stack distribution (default: False)
--image-type {conda,container,venv}
--image-type {container,venv}
Image Type to use for the build. If not specified, will use the image type from the template config. (default: None)
--image-name IMAGE_NAME
[for image-type=conda|container|venv] Name of the conda or virtual environment to use for the build. If not specified, currently active environment will be used if
[for image-type=container|venv] Name of the virtual environment to use for the build. If not specified, currently active environment will be used if
found. (default: None)
--print-deps-only Print the dependencies for the stack only, without building the stack (default: False)
--run Run the stack after building using the same image type, name, and other applicable arguments (default: False)
@ -159,7 +159,7 @@ It would be best to start with a template and understand the structure of the co
llama stack build
> Enter a name for your Llama Stack (e.g. my-local-stack): my-stack
> Enter the image type you want your Llama Stack to be built as (container or conda or venv): conda
> Enter the image type you want your Llama Stack to be built as (container or venv): venv
Llama Stack is composed of several APIs working together. Let's select
the provider types (implementations) you want to use for these APIs.
@ -312,7 +312,7 @@ Now, let's start the Llama Stack Distribution Server. You will need the YAML con
```
llama stack run -h
usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--env KEY=VALUE]
[--image-type {conda,venv}] [--enable-ui]
[--image-type {venv}] [--enable-ui]
[config | template]
Start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.
@ -326,8 +326,8 @@ options:
--image-name IMAGE_NAME
Name of the image to run. Defaults to the current environment (default: None)
--env KEY=VALUE Environment variables to pass to the server in KEY=VALUE format. Can be specified multiple times. (default: None)
--image-type {conda,venv}
Image Type used during the build. This can be either conda or venv. (default: None)
--image-type {venv}
Image Type used during the build. This should be venv. (default: None)
--enable-ui Start the UI server (default: False)
```
@ -342,9 +342,6 @@ llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-
# Start using a venv
llama stack run --image-type venv ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml
# Start using a conda environment
llama stack run --image-type conda ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml
```
```

View file

@ -10,7 +10,6 @@ The default `run.yaml` files generated by templates are starting points for your
```yaml
version: 2
conda_env: ollama
apis:
- agents
- inference

View file

@ -56,10 +56,10 @@ Breaking down the demo app, this section will show the core pieces that are used
### Setup Remote Inferencing
Start a Llama Stack server on localhost. Here is an example of how you can do this using the firework.ai distribution:
```
conda create -n stack-fireworks python=3.10
conda activate stack-fireworks
python -m venv stack-fireworks
source stack-fireworks/bin/activate # On Windows: stack-fireworks\Scripts\activate
pip install --no-cache llama-stack==0.2.2
llama stack build --template fireworks --image-type conda
llama stack build --template fireworks --image-type venv
export FIREWORKS_API_KEY=<SOME_KEY>
llama stack run fireworks --port 5050
```

View file

@ -57,7 +57,7 @@ Make sure you have access to a watsonx API Key. You can get one by referring [wa
## Running Llama Stack with watsonx
You can do this via Conda (build code), venv or Docker which has a pre-built image.
You can do this via venv or Docker which has a pre-built image.
### Via Docker
@ -76,13 +76,3 @@ docker run \
--env WATSONX_PROJECT_ID=$WATSONX_PROJECT_ID \
--env WATSONX_BASE_URL=$WATSONX_BASE_URL
```
### Via Conda
```bash
llama stack build --template watsonx --image-type conda
llama stack run ./run.yaml \
--port $LLAMA_STACK_PORT \
--env WATSONX_API_KEY=$WATSONX_API_KEY \
--env WATSONX_PROJECT_ID=$WATSONX_PROJECT_ID
```

View file

@ -114,7 +114,7 @@ podman run --rm -it \
## Running Llama Stack
Now you are ready to run Llama Stack with TGI as the inference provider. You can do this via Conda (build code) or Docker which has a pre-built image.
Now you are ready to run Llama Stack with TGI as the inference provider. You can do this via venv or Docker which has a pre-built image.
### Via Docker
@ -164,12 +164,12 @@ docker run \
--env CHROMA_URL=$CHROMA_URL
```
### Via Conda
### Via venv
Make sure you have done `pip install llama-stack` and have the Llama Stack CLI available.
```bash
llama stack build --template dell --image-type conda
llama stack build --template dell --image-type venv
llama stack run dell
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=$INFERENCE_MODEL \

View file

@ -70,7 +70,7 @@ $ llama model list --downloaded
## Running the Distribution
You can do this via Conda (build code) or Docker which has a pre-built image.
You can do this via venv or Docker which has a pre-built image.
### Via Docker
@ -104,12 +104,12 @@ docker run \
--env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
```
### Via Conda
### Via venv
Make sure you have done `uv pip install llama-stack` and have the Llama Stack CLI available.
```bash
llama stack build --template meta-reference-gpu --image-type conda
llama stack build --template meta-reference-gpu --image-type venv
llama stack run distributions/meta-reference-gpu/run.yaml \
--port 8321 \
--env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct

View file

@ -133,7 +133,7 @@ curl -X DELETE "$NEMO_URL/v1/deployment/model-deployments/meta/llama-3.1-8b-inst
## Running Llama Stack with NVIDIA
You can do this via Conda or venv (build code), or Docker which has a pre-built image.
You can do this via venv (build code), or Docker which has a pre-built image.
### Via Docker
@ -152,17 +152,6 @@ docker run \
--env NVIDIA_API_KEY=$NVIDIA_API_KEY
```
### Via Conda
```bash
INFERENCE_MODEL=meta-llama/Llama-3.1-8b-Instruct
llama stack build --template nvidia --image-type conda
llama stack run ./run.yaml \
--port 8321 \
--env NVIDIA_API_KEY=$NVIDIA_API_KEY \
--env INFERENCE_MODEL=$INFERENCE_MODEL
```
### Via venv
If you've set up your local development environment, you can also build the image using your local virtual environment.

View file

@ -145,7 +145,7 @@ This distribution comes with a default "llama-guard" shield that can be enabled
## Running the Distribution
You can run the starter distribution via Docker, Conda, or venv.
You can run the starter distribution via Docker or venv.
### Via Docker
@ -164,12 +164,12 @@ docker run \
--port $LLAMA_STACK_PORT
```
### Via Conda or venv
### Via venv
Ensure you have configured the starter distribution using the environment variables explained above.
```bash
uv run --with llama-stack llama stack build --template starter --image-type <conda|venv> --run
uv run --with llama-stack llama stack build --template starter --image-type venv --run
```
## Example Usage

View file

@ -11,12 +11,6 @@ This is the simplest way to get started. Using Llama Stack as a library means yo
Another simple way to start interacting with Llama Stack is to just spin up a container (via Docker or Podman) which is pre-built with all the providers you need. We provide a number of pre-built images so you can start a Llama Stack server instantly. You can also build your own custom container. Which distribution to choose depends on the hardware you have. See [Selection of a Distribution](selection) for more details.
## Conda:
If you have a custom or an advanced setup or you are developing on Llama Stack you can also build a custom Llama Stack server. Using `llama stack build` and `llama stack run` you can build/run a custom Llama Stack server containing the exact combination of providers you wish. We have also provided various templates to make getting started easier. See [Building a Custom Distribution](building_distro) for more details.
## Kubernetes:
If you have built a container image and want to deploy it in a Kubernetes cluster instead of starting the Llama Stack server locally. See [Kubernetes Deployment Guide](kubernetes_deployment) for more details.

View file

@ -62,7 +62,7 @@ We use `starter` as template. By default all providers are disabled, this requir
llama stack build --template starter --image-type venv --run
```
:::
:::{tab-item} Using `conda`
:::{tab-item} Using `venv`
You can use Python to build and run the Llama Stack server, which is useful for testing and development.
Llama Stack uses a [YAML configuration file](../distributions/configuration.md) to specify the stack setup,
@ -70,7 +70,7 @@ which defines the providers and their settings.
Now let's build and run the Llama Stack config for Ollama.
```bash
llama stack build --template starter --image-type conda --run
llama stack build --template starter --image-type venv --run
```
:::
:::{tab-item} Using a Container
@ -150,10 +150,10 @@ pip install llama-stack-client
```
:::
:::{tab-item} Install with `conda`
:::{tab-item} Install with `venv`
```bash
yes | conda create -n stack-client python=3.12
conda activate stack-client
python -m venv stack-client
source stack-client/bin/activate # On Windows: stack-client\Scripts\activate
pip install llama-stack-client
```
:::

View file

@ -19,11 +19,11 @@ You have two ways to install Llama Stack:
cd ~/local
git clone git@github.com:meta-llama/llama-stack.git
conda create -n myenv python=3.10
conda activate myenv
python -m venv myenv
source myenv/bin/activate # On Windows: myenv\Scripts\activate
cd llama-stack
$CONDA_PREFIX/bin/pip install -e .
pip install -e .
## Downloading models via CLI

View file

@ -19,11 +19,11 @@ You have two ways to install Llama Stack:
cd ~/local
git clone git@github.com:meta-llama/llama-stack.git
conda create -n myenv python=3.10
conda activate myenv
python -m venv myenv
source myenv/bin/activate # On Windows: myenv\Scripts\activate
cd llama-stack
$CONDA_PREFIX/bin/pip install -e .
pip install -e .
## `llama` subcommands

View file

@ -47,20 +47,20 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next
## Install Dependencies and Set Up Environment
1. **Create a Conda Environment**:
Create a new Conda environment with Python 3.12:
1. **Install uv**:
Install [uv](https://docs.astral.sh/uv/) for managing dependencies:
```bash
conda create -n ollama python=3.12
```
Activate the environment:
```bash
conda activate ollama
# macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```
2. **Install ChromaDB**:
Install `chromadb` using `pip`:
Install `chromadb` using `uv`:
```bash
pip install chromadb
uv pip install chromadb
```
3. **Run ChromaDB**:
@ -69,28 +69,21 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next
chroma run --host localhost --port 8000 --path ./my_chroma_data
```
4. **Install Llama Stack**:
Open a new terminal and install `llama-stack`:
```bash
conda activate ollama
pip install -U llama-stack
```
---
## Build, Configure, and Run Llama Stack
1. **Build the Llama Stack**:
Build the Llama Stack using the `ollama` template:
Build the Llama Stack using the `starter` template:
```bash
llama stack build --template starter --image-type conda
uv run --with llama-stack llama stack build --template starter --image-type venv
```
**Expected Output:**
```bash
...
Build Successful!
You can find the newly-built template here: ~/.llama/distributions/ollama/ollama-run.yaml
You can run the new Llama Stack Distro via: llama stack run ~/.llama/distributions/ollama/ollama-run.yaml --image-type conda
You can find the newly-built template here: ~/.llama/distributions/starter/starter-run.yaml
You can run the new Llama Stack Distro via: uv run --with llama-stack llama stack run starter --image-type venv
```
3. **Set the ENV variables by exporting them to the terminal**:
@ -102,12 +95,13 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next
```
3. **Run the Llama Stack**:
Run the stack with command shared by the API from earlier:
Run the stack using uv:
```bash
llama stack run ollama
--port $LLAMA_STACK_PORT
--env INFERENCE_MODEL=$INFERENCE_MODEL
--env SAFETY_MODEL=$SAFETY_MODEL
uv run --with llama-stack llama stack run starter \
--image-type venv \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=$INFERENCE_MODEL \
--env SAFETY_MODEL=$SAFETY_MODEL \
--env OLLAMA_URL=$OLLAMA_URL
```
Note: Every time you run a new model with `ollama run`, you will need to restart the llama stack. Otherwise it won't see the new model.
@ -120,7 +114,7 @@ After setting up the server, open a new terminal window and configure the llama-
1. Configure the CLI to point to the llama-stack server.
```bash
llama-stack-client configure --endpoint http://localhost:8321
uv run --with llama-stack-client llama-stack-client configure --endpoint http://localhost:8321
```
**Expected Output:**
```bash
@ -128,7 +122,7 @@ After setting up the server, open a new terminal window and configure the llama-
```
2. Test the CLI by running inference:
```bash
llama-stack-client inference chat-completion --message "Write me a 2-sentence poem about the moon"
uv run --with llama-stack-client llama-stack-client inference chat-completion --message "Write me a 2-sentence poem about the moon"
```
**Expected Output:**
```bash
@ -170,7 +164,7 @@ curl http://localhost:$LLAMA_STACK_PORT/alpha/inference/chat-completion
EOF
```
You can check the available models with the command `llama-stack-client models list`.
You can check the available models with the command `uv run --with llama-stack-client llama-stack-client models list`.
**Expected Output:**
```json
@ -191,18 +185,12 @@ You can check the available models with the command `llama-stack-client models l
You can also interact with the Llama Stack server using a simple Python script. Below is an example:
### 1. Activate Conda Environment
```bash
conda activate ollama
```
### 2. Create Python Script (`test_llama_stack.py`)
### 1. Create Python Script (`test_llama_stack.py`)
```bash
touch test_llama_stack.py
```
### 3. Create a Chat Completion Request in Python
### 2. Create a Chat Completion Request in Python
In `test_llama_stack.py`, write the following code:
@ -233,10 +221,10 @@ response = client.inference.chat_completion(
print(response.completion_message.content)
```
### 4. Run the Python Script
### 3. Run the Python Script
```bash
python test_llama_stack.py
uv run --with llama-stack-client python test_llama_stack.py
```
**Expected Output:**