Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-08-12)
refactor: remove Conda support from Llama Stack (#2969)
# What does this PR do?

This PR removes Conda support from Llama Stack: the `conda` image type is dropped from `llama stack build` and `llama stack run`, and the documentation now describes `venv` (or container) workflows instead.

Closes #2539

## Test Plan
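A minimal way to exercise the change locally (a sketch based on commands from the updated docs; `starter` is just an example template, and exact output will vary by environment):

```bash
# Build and run a distribution with the venv image type (per the updated getting-started docs)
uv run --with llama-stack llama stack build --template starter --image-type venv --run

# The conda option should no longer appear in the CLI help
llama stack build -h   # --image-type should now offer only {container,venv}
llama stack run -h     # --image-type should now offer only {venv}
```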
commit a749d5f4a4 (parent f2eee4e417)
44 changed files with 159 additions and 311 deletions
@@ -97,7 +97,7 @@ To start the Llama Stack Playground, run the following commands:
 1. Start up the Llama Stack API server

 ```bash
-llama stack build --template together --image-type conda
+llama stack build --template together --image-type venv
 llama stack run together
 ```

@@ -47,13 +47,13 @@ pip install -e .
 ```
 Use the CLI to build your distribution.
 The main points to consider are:
-1. **Image Type** - Do you want a Conda / venv environment or a Container (eg. Docker)
+1. **Image Type** - Do you want a venv environment or a Container (eg. Docker)
 2. **Template** - Do you want to use a template to build your distribution? or start from scratch ?
 3. **Config** - Do you want to use a pre-existing config file to build your distribution?

 ```
 llama stack build -h
-usage: llama stack build [-h] [--config CONFIG] [--template TEMPLATE] [--list-templates] [--image-type {conda,container,venv}] [--image-name IMAGE_NAME] [--print-deps-only] [--run]
+usage: llama stack build [-h] [--config CONFIG] [--template TEMPLATE] [--list-templates] [--image-type {container,venv}] [--image-name IMAGE_NAME] [--print-deps-only] [--run]

 Build a Llama stack container

@@ -63,10 +63,10 @@ options:
         be prompted to enter information interactively (default: None)
   --template TEMPLATE   Name of the example template config to use for build. You may use `llama stack build --list-templates` to check out the available templates (default: None)
   --list-templates      Show the available templates for building a Llama Stack distribution (default: False)
-  --image-type {conda,container,venv}
+  --image-type {container,venv}
         Image Type to use for the build. If not specified, will use the image type from the template config. (default: None)
   --image-name IMAGE_NAME
-        [for image-type=conda|container|venv] Name of the conda or virtual environment to use for the build. If not specified, currently active environment will be used if
+        [for image-type=container|venv] Name of the virtual environment to use for the build. If not specified, currently active environment will be used if
         found. (default: None)
   --print-deps-only     Print the dependencies for the stack only, without building the stack (default: False)
   --run                 Run the stack after building using the same image type, name, and other applicable arguments (default: False)

@@ -159,7 +159,7 @@ It would be best to start with a template and understand the structure of the co
 llama stack build

 > Enter a name for your Llama Stack (e.g. my-local-stack): my-stack
-> Enter the image type you want your Llama Stack to be built as (container or conda or venv): conda
+> Enter the image type you want your Llama Stack to be built as (container or venv): venv

 Llama Stack is composed of several APIs working together. Let's select
 the provider types (implementations) you want to use for these APIs.

@@ -312,7 +312,7 @@ Now, let's start the Llama Stack Distribution Server. You will need the YAML con
 ```
 llama stack run -h
 usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--env KEY=VALUE]
-                       [--image-type {conda,venv}] [--enable-ui]
+                       [--image-type {venv}] [--enable-ui]
                        [config | template]

 Start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.

@@ -326,8 +326,8 @@ options:
   --image-name IMAGE_NAME
         Name of the image to run. Defaults to the current environment (default: None)
   --env KEY=VALUE       Environment variables to pass to the server in KEY=VALUE format. Can be specified multiple times. (default: None)
-  --image-type {conda,venv}
-        Image Type used during the build. This can be either conda or venv. (default: None)
+  --image-type {venv}
+        Image Type used during the build. This should be venv. (default: None)
   --enable-ui           Start the UI server (default: False)
 ```

@@ -342,9 +342,6 @@ llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-

 # Start using a venv
 llama stack run --image-type venv ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml
-
-# Start using a conda environment
-llama stack run --image-type conda ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml
 ```

 ```

@@ -10,7 +10,6 @@ The default `run.yaml` files generated by templates are starting points for your

 ```yaml
 version: 2
-conda_env: ollama
 apis:
 - agents
 - inference

@@ -56,10 +56,10 @@ Breaking down the demo app, this section will show the core pieces that are used
 ### Setup Remote Inferencing
 Start a Llama Stack server on localhost. Here is an example of how you can do this using the firework.ai distribution:
 ```
-conda create -n stack-fireworks python=3.10
-conda activate stack-fireworks
+python -m venv stack-fireworks
+source stack-fireworks/bin/activate  # On Windows: stack-fireworks\Scripts\activate
 pip install --no-cache llama-stack==0.2.2
-llama stack build --template fireworks --image-type conda
+llama stack build --template fireworks --image-type venv
 export FIREWORKS_API_KEY=<SOME_KEY>
 llama stack run fireworks --port 5050
 ```

@@ -57,7 +57,7 @@ Make sure you have access to a watsonx API Key. You can get one by referring [wa

 ## Running Llama Stack with watsonx

-You can do this via Conda (build code), venv or Docker which has a pre-built image.
+You can do this via venv or Docker which has a pre-built image.

 ### Via Docker

@@ -76,13 +76,3 @@ docker run \
   --env WATSONX_PROJECT_ID=$WATSONX_PROJECT_ID \
   --env WATSONX_BASE_URL=$WATSONX_BASE_URL
 ```
-
-### Via Conda
-
-```bash
-llama stack build --template watsonx --image-type conda
-llama stack run ./run.yaml \
-  --port $LLAMA_STACK_PORT \
-  --env WATSONX_API_KEY=$WATSONX_API_KEY \
-  --env WATSONX_PROJECT_ID=$WATSONX_PROJECT_ID
-```

@@ -114,7 +114,7 @@ podman run --rm -it \

 ## Running Llama Stack

-Now you are ready to run Llama Stack with TGI as the inference provider. You can do this via Conda (build code) or Docker which has a pre-built image.
+Now you are ready to run Llama Stack with TGI as the inference provider. You can do this via venv or Docker which has a pre-built image.

 ### Via Docker

@@ -164,12 +164,12 @@ docker run \
   --env CHROMA_URL=$CHROMA_URL
 ```

-### Via Conda
+### Via venv

 Make sure you have done `pip install llama-stack` and have the Llama Stack CLI available.

 ```bash
-llama stack build --template dell --image-type conda
+llama stack build --template dell --image-type venv
 llama stack run dell
   --port $LLAMA_STACK_PORT \
   --env INFERENCE_MODEL=$INFERENCE_MODEL \

@@ -70,7 +70,7 @@ $ llama model list --downloaded

 ## Running the Distribution

-You can do this via Conda (build code) or Docker which has a pre-built image.
+You can do this via venv or Docker which has a pre-built image.

 ### Via Docker

@@ -104,12 +104,12 @@ docker run \
   --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
 ```

-### Via Conda
+### Via venv

 Make sure you have done `uv pip install llama-stack` and have the Llama Stack CLI available.

 ```bash
-llama stack build --template meta-reference-gpu --image-type conda
+llama stack build --template meta-reference-gpu --image-type venv
 llama stack run distributions/meta-reference-gpu/run.yaml \
   --port 8321 \
   --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct

@@ -133,7 +133,7 @@ curl -X DELETE "$NEMO_URL/v1/deployment/model-deployments/meta/llama-3.1-8b-inst

 ## Running Llama Stack with NVIDIA

-You can do this via Conda or venv (build code), or Docker which has a pre-built image.
+You can do this via venv (build code), or Docker which has a pre-built image.

 ### Via Docker

@@ -152,17 +152,6 @@ docker run \
   --env NVIDIA_API_KEY=$NVIDIA_API_KEY
 ```

-### Via Conda
-
-```bash
-INFERENCE_MODEL=meta-llama/Llama-3.1-8b-Instruct
-llama stack build --template nvidia --image-type conda
-llama stack run ./run.yaml \
-  --port 8321 \
-  --env NVIDIA_API_KEY=$NVIDIA_API_KEY \
-  --env INFERENCE_MODEL=$INFERENCE_MODEL
-```
-
 ### Via venv

 If you've set up your local development environment, you can also build the image using your local virtual environment.

@@ -145,7 +145,7 @@ This distribution comes with a default "llama-guard" shield that can be enabled

 ## Running the Distribution

-You can run the starter distribution via Docker, Conda, or venv.
+You can run the starter distribution via Docker or venv.

 ### Via Docker

@@ -164,12 +164,12 @@ docker run \
   --port $LLAMA_STACK_PORT
 ```

-### Via Conda or venv
+### Via venv

 Ensure you have configured the starter distribution using the environment variables explained above.

 ```bash
-uv run --with llama-stack llama stack build --template starter --image-type <conda|venv> --run
+uv run --with llama-stack llama stack build --template starter --image-type venv --run
 ```

 ## Example Usage

@@ -11,12 +11,6 @@ This is the simplest way to get started. Using Llama Stack as a library means yo

 Another simple way to start interacting with Llama Stack is to just spin up a container (via Docker or Podman) which is pre-built with all the providers you need. We provide a number of pre-built images so you can start a Llama Stack server instantly. You can also build your own custom container. Which distribution to choose depends on the hardware you have. See [Selection of a Distribution](selection) for more details.

-
-## Conda:
-
-If you have a custom or an advanced setup or you are developing on Llama Stack you can also build a custom Llama Stack server. Using `llama stack build` and `llama stack run` you can build/run a custom Llama Stack server containing the exact combination of providers you wish. We have also provided various templates to make getting started easier. See [Building a Custom Distribution](building_distro) for more details.
-
-
 ## Kubernetes:

 If you have built a container image and want to deploy it in a Kubernetes cluster instead of starting the Llama Stack server locally. See [Kubernetes Deployment Guide](kubernetes_deployment) for more details.

@@ -62,7 +62,7 @@ We use `starter` as template. By default all providers are disabled, this requir
 llama stack build --template starter --image-type venv --run
 ```
 :::
-:::{tab-item} Using `conda`
+:::{tab-item} Using `venv`
 You can use Python to build and run the Llama Stack server, which is useful for testing and development.

 Llama Stack uses a [YAML configuration file](../distributions/configuration.md) to specify the stack setup,

@@ -70,7 +70,7 @@ which defines the providers and their settings.
 Now let's build and run the Llama Stack config for Ollama.

 ```bash
-llama stack build --template starter --image-type conda --run
+llama stack build --template starter --image-type venv --run
 ```
 :::
 :::{tab-item} Using a Container

@@ -150,10 +150,10 @@ pip install llama-stack-client
 ```
 :::

-:::{tab-item} Install with `conda`
+:::{tab-item} Install with `venv`
 ```bash
-yes | conda create -n stack-client python=3.12
-conda activate stack-client
+python -m venv stack-client
+source stack-client/bin/activate  # On Windows: stack-client\Scripts\activate
 pip install llama-stack-client
 ```
 :::

@@ -19,11 +19,11 @@ You have two ways to install Llama Stack:
 cd ~/local
 git clone git@github.com:meta-llama/llama-stack.git

-conda create -n myenv python=3.10
-conda activate myenv
+python -m venv myenv
+source myenv/bin/activate  # On Windows: myenv\Scripts\activate

 cd llama-stack
-$CONDA_PREFIX/bin/pip install -e .
+pip install -e .

 ## Downloading models via CLI

@@ -19,11 +19,11 @@ You have two ways to install Llama Stack:
 cd ~/local
 git clone git@github.com:meta-llama/llama-stack.git

-conda create -n myenv python=3.10
-conda activate myenv
+python -m venv myenv
+source myenv/bin/activate  # On Windows: myenv\Scripts\activate

 cd llama-stack
-$CONDA_PREFIX/bin/pip install -e .
+pip install -e .


 ## `llama` subcommands

@@ -47,20 +47,20 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next

 ## Install Dependencies and Set Up Environment

-1. **Create a Conda Environment**:
-   Create a new Conda environment with Python 3.12:
+1. **Install uv**:
+   Install [uv](https://docs.astral.sh/uv/) for managing dependencies:
    ```bash
-   conda create -n ollama python=3.12
-   ```
-   Activate the environment:
-   ```bash
-   conda activate ollama
+   # macOS and Linux
+   curl -LsSf https://astral.sh/uv/install.sh | sh
+
+   # Windows
+   powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
    ```

 2. **Install ChromaDB**:
-   Install `chromadb` using `pip`:
+   Install `chromadb` using `uv`:
    ```bash
-   pip install chromadb
+   uv pip install chromadb
    ```

 3. **Run ChromaDB**:

@@ -69,28 +69,21 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next
    chroma run --host localhost --port 8000 --path ./my_chroma_data
    ```

-4. **Install Llama Stack**:
-   Open a new terminal and install `llama-stack`:
-   ```bash
-   conda activate ollama
-   pip install -U llama-stack
-   ```
-
 ---

 ## Build, Configure, and Run Llama Stack

 1. **Build the Llama Stack**:
-   Build the Llama Stack using the `ollama` template:
+   Build the Llama Stack using the `starter` template:
    ```bash
-   llama stack build --template starter --image-type conda
+   uv run --with llama-stack llama stack build --template starter --image-type venv
    ```
    **Expected Output:**
    ```bash
    ...
    Build Successful!
-   You can find the newly-built template here: ~/.llama/distributions/ollama/ollama-run.yaml
-   You can run the new Llama Stack Distro via: llama stack run ~/.llama/distributions/ollama/ollama-run.yaml --image-type conda
+   You can find the newly-built template here: ~/.llama/distributions/starter/starter-run.yaml
+   You can run the new Llama Stack Distro via: uv run --with llama-stack llama stack run starter --image-type venv
    ```

 3. **Set the ENV variables by exporting them to the terminal**:

@@ -102,12 +95,13 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next
    ```

 3. **Run the Llama Stack**:
-   Run the stack with command shared by the API from earlier:
+   Run the stack using uv:
    ```bash
-   llama stack run ollama
-   --port $LLAMA_STACK_PORT
-   --env INFERENCE_MODEL=$INFERENCE_MODEL
-   --env SAFETY_MODEL=$SAFETY_MODEL
+   uv run --with llama-stack llama stack run starter \
+   --image-type venv \
+   --port $LLAMA_STACK_PORT \
+   --env INFERENCE_MODEL=$INFERENCE_MODEL \
+   --env SAFETY_MODEL=$SAFETY_MODEL \
+   --env OLLAMA_URL=$OLLAMA_URL
    ```
    Note: Every time you run a new model with `ollama run`, you will need to restart the llama stack. Otherwise it won't see the new model.

@@ -120,7 +114,7 @@ After setting up the server, open a new terminal window and configure the llama-

 1. Configure the CLI to point to the llama-stack server.
    ```bash
-   llama-stack-client configure --endpoint http://localhost:8321
+   uv run --with llama-stack-client llama-stack-client configure --endpoint http://localhost:8321
    ```
    **Expected Output:**
    ```bash

@@ -128,7 +122,7 @@ After setting up the server, open a new terminal window and configure the llama-
    ```
 2. Test the CLI by running inference:
    ```bash
-   llama-stack-client inference chat-completion --message "Write me a 2-sentence poem about the moon"
+   uv run --with llama-stack-client llama-stack-client inference chat-completion --message "Write me a 2-sentence poem about the moon"
    ```
    **Expected Output:**
    ```bash

@@ -170,7 +164,7 @@ curl http://localhost:$LLAMA_STACK_PORT/alpha/inference/chat-completion
 EOF
 ```

-You can check the available models with the command `llama-stack-client models list`.
+You can check the available models with the command `uv run --with llama-stack-client llama-stack-client models list`.

 **Expected Output:**
 ```json

@@ -191,18 +185,12 @@ You can check the available models with the command `llama-stack-client models l

 You can also interact with the Llama Stack server using a simple Python script. Below is an example:

-### 1. Activate Conda Environment
-
-```bash
-conda activate ollama
-```
-
-### 2. Create Python Script (`test_llama_stack.py`)
+### 1. Create Python Script (`test_llama_stack.py`)
 ```bash
 touch test_llama_stack.py
 ```

-### 3. Create a Chat Completion Request in Python
+### 2. Create a Chat Completion Request in Python

 In `test_llama_stack.py`, write the following code:

@@ -233,10 +221,10 @@ response = client.inference.chat_completion(
 print(response.completion_message.content)
 ```

-### 4. Run the Python Script
+### 3. Run the Python Script

 ```bash
-python test_llama_stack.py
+uv run --with llama-stack-client python test_llama_stack.py
 ```

 **Expected Output:**
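For readers migrating existing setups, the documentation changes above all apply the same substitution: each conda environment recipe becomes a standard virtual environment plus the `venv` image type. A representative sketch (the environment and template names are illustrative, drawn from the diffs above):

```bash
# Previously: conda create -n myenv python=3.10 && conda activate myenv
python -m venv myenv
source myenv/bin/activate   # On Windows: myenv\Scripts\activate
pip install llama-stack

# Build and run with the venv image type instead of conda
llama stack build --template starter --image-type venv
llama stack run starter
```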