CLI Update: build -> configure -> run (#69)

* remove configure from build

* remove config from build

* configure to regenerate file

* update memory providers

* remove comments

* update build script

* add readme

* update doc

* rename getting started

* update build cli

* update docker build script

* configure update

* clean up configure

* [tmp fix] hardware requirement tmp fix

* clean up build

* fix configure

* add example build files for conda & docker

* remove resolve_distribution_spec

* remove available_distribution_specs

* example build files

* update example build files

* more clean up on build

* add name args to override name

* move distribution to yaml files

* generate distribution specs

* getting started guide

* getting started

* add build yaml to Dockerfile

* cleanup distribution_dependencies

* configure from docker image name

* build relative paths

* minor comment

* getting started

* Update getting_started.md

* Update getting_started.md

* address comments, configure within docker file

* remove distribution types!

* update getting started

* update documentation

* remove listing distribution

* minor heading

* address nits, remove docker_image=null

* gitignore
Xi Yan 2024-09-16 11:02:26 -07:00 committed by GitHub
parent 73b71d9689
commit d9147f3184
27 changed files with 759 additions and 512 deletions


@ -236,151 +236,156 @@ These commands can help understand the model interface and how prompts / message
**NOTE**: Outputs in terminal are color printed to show special tokens. **NOTE**: Outputs in terminal are color printed to show special tokens.
## Step 3: Listing, Building, and Configuring Llama Stack Distributions ## Step 3: Building, and Configuring Llama Stack Distributions
- Please see our [Getting Started](getting_started.md) guide for details.
### Step 3.1: List available distributions ### Step 3.1. Build
In the following steps, imagine we'll be working with a `Meta-Llama3.1-8B-Instruct` model. We will name our build `8b-instruct` to help us remember the config. We will start by building our distribution (in the form of a Conda environment or a Docker image). In this step, we will specify:
Lets start with listing available distributions: - `name`: the name for our distribution (e.g. `8b-instruct`)
- `image_type`: our build image type (`conda | docker`)
- `distribution_spec`: our distribution specs for specifying API providers
- `description`: a short description of the configurations for the distribution
- `providers`: specifies the underlying implementation for serving each API endpoint
- `image_type`: `conda` | `docker` to specify whether to build the distribution in the form of Docker image or Conda environment.
#### Build a local distribution with conda
The following command and specifications allows you to get started with building.
``` ```
llama stack list-distributions llama stack build <path/to/config>
``` ```
- You will be required to pass in a file path to the build.config file (e.g. `./llama_toolchain/configs/distributions/conda/local-conda-example-build.yaml`). We provide some example build config files for configuring different types of distributions in the `./llama_toolchain/configs/distributions/` folder.
<pre style="font-family: monospace;"> The file will be of the contents
i+-------------------------------+---------------------------------------+----------------------------------------------------------------------+
| Distribution Type | Providers | Description |
+--------------------------------+---------------------------------------+----------------------------------------------------------------------+
| local | { | Use code from `llama_toolchain` itself to serve all llama stack APIs |
| | "inference": "meta-reference", | |
| | "memory": "meta-reference-faiss", | |
| | "safety": "meta-reference", | |
| | "agentic_system": "meta-reference" | |
| | } | |
+--------------------------------+---------------------------------------+----------------------------------------------------------------------+
| remote | { | Point to remote services for all llama stack APIs |
| | "inference": "remote", | |
| | "safety": "remote", | |
| | "agentic_system": "remote", | |
| | "memory": "remote" | |
| | } | |
+--------------------------------+---------------------------------------+----------------------------------------------------------------------+
| local-ollama | { | Like local, but use ollama for running LLM inference |
| | "inference": "remote::ollama", | |
| | "safety": "meta-reference", | |
| | "agentic_system": "meta-reference", | |
| | "memory": "meta-reference-faiss" | |
| | } | |
+--------------------------------+---------------------------------------+----------------------------------------------------------------------+
| local-plus-fireworks-inference | { | Use Fireworks.ai for running LLM inference |
| | "inference": "remote::fireworks", | |
| | "safety": "meta-reference", | |
| | "agentic_system": "meta-reference", | |
| | "memory": "meta-reference-faiss" | |
| | } | |
+--------------------------------+---------------------------------------+----------------------------------------------------------------------+
| local-plus-together-inference | { | Use Together.ai for running LLM inference |
| | "inference": "remote::together", | |
| | "safety": "meta-reference", | |
| | "agentic_system": "meta-reference", | |
| | "memory": "meta-reference-faiss" | |
| | } | |
+--------------------------------+---------------------------------------+----------------------------------------------------------------------+
| local-plus-tgi-inference | { | Use TGI (local or with [Hugging Face Inference Endpoints](https:// |
| | "inference": "remote::tgi", | huggingface.co/inference-endpoints/dedicated)) for running LLM |
| | "safety": "meta-reference", | inference. When using HF Inference Endpoints, you must provide the |
| | "agentic_system": "meta-reference", | name of the endpoint. |
| | "memory": "meta-reference-faiss" | |
| | } | |
+--------------------------------+---------------------------------------+----------------------------------------------------------------------+
</pre>
As you can see above, each “distribution” details the “providers” it is composed of. For example, `local` uses the “meta-reference” provider for inference while local-ollama relies on a different provider (Ollama) for inference. Similarly, you can use Fireworks or Together.AI for running inference as well.
### Step 3.2: Build a distribution
Let's imagine you are working with a 8B-Instruct model. The following command will build a package (in the form of a Conda environment) _and_ configure it. As part of the configuration, you will be asked for some inputs (model_id, max_seq_len, etc.) Since we are working with a 8B model, we will name our build `8b-instruct` to help us remember the config.
```
llama stack build
``` ```
$ cat ./llama_toolchain/configs/distributions/conda/local-conda-example-build.yaml
Once it runs, you will be prompted to enter build name and optional arguments, and should see some outputs in the form:
```
$ llama stack build
Enter value for name (required): 8b-instruct
Enter value for distribution (default: local) (required): local
Enter value for api_providers (optional):
Enter value for image_type (default: conda) (required):
....
....
Successfully installed cfgv-3.4.0 distlib-0.3.8 identify-2.6.0 libcst-1.4.0 llama_toolchain-0.0.2 moreorless-0.4.0 nodeenv-1.9.1 pre-commit-3.8.0 stdlibs-2024.5.15 toml-0.10.2 tomlkit-0.13.0 trailrunner-1.4.0 ufmt-2.7.0 usort-1.0.8 virtualenv-20.26.3
Successfully setup conda environment. Configuring build...
...
...
YAML configuration has been written to ~/.llama/builds/local/conda/8b-instruct.yaml
Target `8b-test` built with configuration at /home/xiyan/.llama/builds/local/conda/8b-test.yaml
Build spec configuration saved at /home/xiyan/.llama/distributions/local/conda/8b-test-build.yaml
```
You can re-build package based on build config
```
$ cat ~/.llama/distributions/local/conda/8b-instruct-build.yaml
name: 8b-instruct name: 8b-instruct
distribution: local distribution_spec:
api_providers: null distribution_type: local
description: Use code from `llama_toolchain` itself to serve all llama stack APIs
docker_image: null
providers:
inference: meta-reference
memory: meta-reference-faiss
safety: meta-reference
agentic_system: meta-reference
telemetry: console
image_type: conda image_type: conda
```
$ llama stack build --config ~/.llama/distributions/local/conda/8b-instruct-build.yaml You may run the `llama stack build` command to generate your distribution with `--name` to override the name for your distribution.
```
Successfully setup conda environment. Configuring build... $ llama stack build ~/.llama/distributions/conda/8b-instruct-build.yaml --name 8b-instruct
... ...
... ...
Build spec configuration saved at ~/.llama/distributions/conda/8b-instruct-build.yaml
YAML configuration has been written to ~/.llama/builds/local/conda/8b-instruct.yaml
Target `8b-instruct` built with configuration at ~/.llama/builds/local/conda/8b-instruct.yaml
Build spec configuration saved at ~/.llama/distributions/local/conda/8b-instruct-build.yaml
``` ```
### Step 3.3: Configure a distribution After this step is complete, a file named `8b-instruct-build.yaml` will be generated and saved at `~/.llama/distributions/conda/8b-instruct-build.yaml`.
You can re-configure this distribution by running:
```
llama stack configure ~/.llama/builds/local/conda/8b-instruct.yaml
```
Here is an example run of how the CLI will guide you to fill the configuration #### How to build distribution with different API providers using configs
To specify a different API provider, we can change the `distribution_spec` in our `<name>-build.yaml` config. For example, the following build spec allows you to build a distribution using TGI as the inference API provider.
``` ```
$ llama stack configure ~/.llama/builds/local/conda/8b-instruct.yaml $ cat ./llama_toolchain/configs/distributions/conda/local-tgi-conda-example-build.yaml
name: local-tgi-conda-example
distribution_spec:
description: Use TGI (local or with Hugging Face Inference Endpoints) for running LLM inference. When using HF Inference Endpoints, you must provide the name of the endpoint.
docker_image: null
providers:
inference: remote::tgi
memory: meta-reference-faiss
safety: meta-reference
agentic_system: meta-reference
telemetry: console
image_type: conda
```
The following command allows you to build a distribution with TGI as the inference API provider, with the name `tgi`.
```
llama stack build --config ./llama_toolchain/configs/distributions/conda/local-tgi-conda-example-build.yaml --name tgi
```
We provide some example build configs to help you get started with building with different API providers.
#### How to build distribution with Docker image
To build a docker image, simply change the `image_type` to `docker` in our `<name>-build.yaml` file, and run `llama stack build --config <name>-build.yaml`.
```
$ cat ./llama_toolchain/configs/distributions/docker/local-docker-example-build.yaml
name: local-docker-example
distribution_spec:
description: Use code from `llama_toolchain` itself to serve all llama stack APIs
docker_image: null
providers:
inference: meta-reference
memory: meta-reference-faiss
safety: meta-reference
agentic_system: meta-reference
telemetry: console
image_type: docker
```
The following command allows you to build a Docker image with the name `docker-local`
```
llama stack build --config ./llama_toolchain/configs/distributions/docker/local-docker-example-build.yaml --name docker-local
Dockerfile created successfully in /tmp/tmp.I0ifS2c46A/Dockerfile
FROM python:3.10-slim
WORKDIR /app
...
...
You can run it with: podman run -p 8000:8000 llamastack-docker-local
Build spec configuration saved at /home/xiyan/.llama/distributions/docker/docker-local-build.yaml
```
### Step 3.2. Configure
After our distribution is built (either as a Docker image or a Conda environment), we run the following command to configure it:
```
llama stack configure [<path/to/name.build.yaml> | <docker-image-name>]
```
- For `conda` environments: <path/to/name.build.yaml> would be the generated build spec saved from Step 1.
- For `docker` images downloaded from Dockerhub, you could also use <docker-image-name> as the argument.
- Run `docker images` to check list of available images on your machine.
```
$ llama stack configure ~/.llama/distributions/conda/8b-instruct-build.yaml
Configuring API: inference (meta-reference) Configuring API: inference (meta-reference)
Enter value for model (required): Meta-Llama3.1-8B-Instruct Enter value for model (existing: Meta-Llama3.1-8B-Instruct) (required):
Enter value for quantization (optional): Enter value for quantization (optional):
Enter value for torch_seed (optional): Enter value for torch_seed (optional):
Enter value for max_seq_len (required): 4096 Enter value for max_seq_len (existing: 4096) (required):
Enter value for max_batch_size (default: 1): 1 Enter value for max_batch_size (existing: 1) (required):
Configuring API: memory (meta-reference-faiss)
Configuring API: safety (meta-reference) Configuring API: safety (meta-reference)
Do you want to configure llama_guard_shield? (y/n): y Do you want to configure llama_guard_shield? (y/n): y
Entering sub-configuration for llama_guard_shield: Entering sub-configuration for llama_guard_shield:
Enter value for model (required): Llama-Guard-3-8B Enter value for model (default: Llama-Guard-3-8B) (required):
Enter value for excluded_categories (required): [] Enter value for excluded_categories (default: []) (required):
Enter value for disable_input_check (default: False): Enter value for disable_input_check (default: False) (required):
Enter value for disable_output_check (default: False): Enter value for disable_output_check (default: False) (required):
Do you want to configure prompt_guard_shield? (y/n): y Do you want to configure prompt_guard_shield? (y/n): y
Entering sub-configuration for prompt_guard_shield: Entering sub-configuration for prompt_guard_shield:
Enter value for model (required): Prompt-Guard-86M Enter value for model (default: Prompt-Guard-86M) (required):
...
... Configuring API: agentic_system (meta-reference)
YAML configuration has been written to ~/.llama/builds/local/conda/8b-instruct.yaml Enter value for brave_search_api_key (optional):
Enter value for bing_search_api_key (optional):
Enter value for wolfram_api_key (optional):
Configuring API: telemetry (console)
YAML configuration has been written to ~/.llama/builds/conda/8b-instruct-run.yaml
``` ```
After this step is successful, you should be able to find a run configuration spec in `~/.llama/builds/conda/8b-instruct-run.yaml`. You may edit this file to change the settings.
As you can see, we did basic configuration above and configured: As you can see, we did basic configuration above and configured:
- inference to run on model `Meta-Llama3.1-8B-Instruct` (obtained from `llama model list`) - inference to run on model `Meta-Llama3.1-8B-Instruct` (obtained from `llama model list`)
- Llama Guard safety shield with model `Llama-Guard-3-8B` - Llama Guard safety shield with model `Llama-Guard-3-8B`
@ -390,21 +395,18 @@ For how these configurations are stored as yaml, checkout the file printed at th
Note that all configurations as well as models are stored in `~/.llama` Note that all configurations as well as models are stored in `~/.llama`
## Step 4: Starting a Llama Stack Distribution and Testing it
### Step 4.1: Starting a distribution ### Step 3.3. Run
Now, let's start the Llama Stack Distribution Server. You will need the YAML configuration file which was written out at the end by the `llama stack configure` step.
Now lets start Llama Stack Distribution Server.
You need the YAML configuration file which was written out at the end by the `llama stack build` step.
``` ```
llama stack run ~/.llama/builds/local/conda/8b-instruct.yaml --port 5000 llama stack run ~/.llama/builds/conda/8b-instruct-run.yaml
``` ```
You should see the Stack server start and print the APIs that it is supporting,
You should see the Llama Stack server start and print the APIs that it is supporting
``` ```
$ llama stack run ~/.llama/builds/local/conda/8b-instruct.yaml --port 5000 $ llama stack run ~/.llama/builds/local/conda/8b-instruct.yaml
> initializing model parallel with size 1 > initializing model parallel with size 1
> initializing ddp with size 1 > initializing ddp with size 1
@ -434,7 +436,6 @@ INFO: Application startup complete.
INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit) INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
``` ```
> [!NOTE] > [!NOTE]
> Configuration is in `~/.llama/builds/local/conda/8b-instruct.yaml`. Feel free to increase `max_seq_len`. > Configuration is in `~/.llama/builds/local/conda/8b-instruct.yaml`. Feel free to increase `max_seq_len`.
@ -443,9 +444,8 @@ INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
This server is running a Llama model locally. This server is running a Llama model locally.
### Step 4.2: Test the distribution ### Step 3.4 Test with Client
Once the server is set up, we can test it with a client to see example outputs.
Lets test with a client.
``` ```
cd /path/to/llama-stack cd /path/to/llama-stack
conda activate <env> # any environment containing the llama-toolchain pip package will work conda activate <env> # any environment containing the llama-toolchain pip package will work
@ -456,17 +456,19 @@ python -m llama_toolchain.inference.client localhost 5000
This will run the chat completion client and query the distributions /inference/chat_completion API. This will run the chat completion client and query the distributions /inference/chat_completion API.
Here is an example output: Here is an example output:
<pre style="font-family: monospace;"> ```
Initializing client for http://localhost:5000 Initializing client for http://localhost:5000
User>hello world, troll me in two-paragraphs about 42 User>hello world, troll me in two-paragraphs about 42
Assistant> You think you're so smart, don't you? You think you can just waltz in here and ask about 42, like it's some kind of trivial matter. Well, let me tell you, 42 is not just a number, it's a way of life. It's the answer to the ultimate question of life, the universe, and everything, according to Douglas Adams' magnum opus, "The Hitchhiker's Guide to the Galaxy". But do you know what's even more interesting about 42? It's that it's not actually the answer to anything, it's just a number that some guy made up to sound profound. Assistant> You think you're so smart, don't you? You think you can just waltz in here and ask about 42, like it's some kind of trivial matter. Well, let me tell you, 42 is not just a number, it's a way of life. It's the answer to the ultimate question of life, the universe, and everything, according to Douglas Adams' magnum opus, "The Hitchhiker's Guide to the Galaxy". But do you know what's even more interesting about 42? It's that it's not actually the answer to anything, it's just a number that some guy made up to sound profound.
You know what's even more hilarious? People like you who think they can just Google "42" and suddenly become experts on the subject. Newsflash: you're not a supercomputer, you're just a human being with a fragile ego and a penchant for thinking you're smarter than you actually are. 42 is just a number, a meaningless collection of digits that holds no significance whatsoever. So go ahead, keep thinking you're so clever, but deep down, you're just a pawn in the grand game of life, and 42 is just a silly little number that's been used to make you feel like you're part of something bigger than yourself. Ha! You know what's even more hilarious? People like you who think they can just Google "42" and suddenly become experts on the subject. Newsflash: you're not a supercomputer, you're just a human being with a fragile ego and a penchant for thinking you're smarter than you actually are. 42 is just a number, a meaningless collection of digits that holds no significance whatsoever. So go ahead, keep thinking you're so clever, but deep down, you're just a pawn in the grand game of life, and 42 is just a silly little number that's been used to make you feel like you're part of something bigger than yourself. Ha!
</pre> ```
Similarly you can test safety (if you configured llama-guard and/or prompt-guard shields) by: Similarly you can test safety (if you configured llama-guard and/or prompt-guard shields) by:
``` ```
python -m llama_toolchain.safety.client localhost 5000 python -m llama_toolchain.safety.client localhost 5000
``` ```
You can find more example scripts with client SDKs to talk with the Llama Stack server in our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/sdk_examples) repo.

docs/getting_started.md (new file)

@ -0,0 +1,317 @@
# Getting Started
The `llama` CLI tool helps you set up and use the Llama toolchain & agentic systems. It should be available on your path after installing the `llama-toolchain` package.
This guide lets you quickly get started with building and running a Llama Stack server in under 5 minutes!
## Quick Cheatsheet
- A quick three-command sequence to build and start a Llama Stack server using our Meta Reference implementation for all API endpoints, with `conda` as the build type.
**`llama stack build`**
```
llama stack build --config ./llama_toolchain/configs/distributions/conda/local-conda-example-build.yaml --name my-local-llama-stack
...
...
Build spec configuration saved at ~/.llama/distributions/conda/my-local-llama-stack-build.yaml
```
**`llama stack configure`**
```
llama stack configure ~/.llama/distributions/conda/my-local-llama-stack-build.yaml
Configuring API: inference (meta-reference)
Enter value for model (default: Meta-Llama3.1-8B-Instruct) (required):
Enter value for quantization (optional):
Enter value for torch_seed (optional):
Enter value for max_seq_len (required): 4096
Enter value for max_batch_size (default: 1) (required):
Configuring API: memory (meta-reference-faiss)
Configuring API: safety (meta-reference)
Do you want to configure llama_guard_shield? (y/n): n
Do you want to configure prompt_guard_shield? (y/n): n
Configuring API: agentic_system (meta-reference)
Enter value for brave_search_api_key (optional):
Enter value for bing_search_api_key (optional):
Enter value for wolfram_api_key (optional):
Configuring API: telemetry (console)
YAML configuration has been written to ~/.llama/builds/conda/my-local-llama-stack-run.yaml
```
**`llama stack run`**
```
llama stack run ~/.llama/builds/conda/my-local-llama-stack-run.yaml
...
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
...
Finished model load YES READY
Serving POST /inference/chat_completion
Serving POST /inference/completion
Serving POST /inference/embeddings
Serving POST /memory_banks/create
Serving DELETE /memory_bank/documents/delete
Serving DELETE /memory_banks/drop
Serving GET /memory_bank/documents/get
Serving GET /memory_banks/get
Serving POST /memory_bank/insert
Serving GET /memory_banks/list
Serving POST /memory_bank/query
Serving POST /memory_bank/update
Serving POST /safety/run_shields
Serving POST /agentic_system/create
Serving POST /agentic_system/session/create
Serving POST /agentic_system/turn/create
Serving POST /agentic_system/delete
Serving POST /agentic_system/session/delete
Serving POST /agentic_system/session/get
Serving POST /agentic_system/step/get
Serving POST /agentic_system/turn/get
Serving GET /telemetry/get_trace
Serving POST /telemetry/log_event
Listening on :::5000
INFO: Started server process [587053]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
```
## Step 1. Build
In the following steps, imagine we'll be working with a `Meta-Llama3.1-8B-Instruct` model. We will name our build `8b-instruct` to help us remember the config. We will start by building our distribution (in the form of a Conda environment or a Docker image). In this step, we will specify:
- `name`: the name of our distribution (e.g. `8b-instruct`)
- `image_type`: `conda` | `docker`, i.e. whether to build the distribution as a Conda environment or a Docker image
- `distribution_spec`: the distribution spec that selects our API providers
  - `description`: a short description of the distribution's configuration
  - `providers`: the underlying implementation used to serve each API endpoint
#### Build a local distribution with conda
The following command and build specification let you get started with building.
```
llama stack build <path/to/config>
```
- You will be required to pass in a path to a build config file (e.g. `./llama_toolchain/configs/distributions/conda/local-conda-example-build.yaml`). We provide example build config files for different types of distributions in the `./llama_toolchain/configs/distributions/` folder.
The file looks like this:
```
$ cat ./llama_toolchain/configs/distributions/conda/local-conda-example-build.yaml
name: 8b-instruct
distribution_spec:
  distribution_type: local
  description: Use code from `llama_toolchain` itself to serve all llama stack APIs
  docker_image: null
  providers:
    inference: meta-reference
    memory: meta-reference-faiss
    safety: meta-reference
    agentic_system: meta-reference
    telemetry: console
image_type: conda
```
You may run the `llama stack build` command to generate your distribution, passing `--name` to override the name defined in the config file.
```
$ llama stack build ~/.llama/distributions/conda/8b-instruct-build.yaml --name 8b-instruct
...
...
Build spec configuration saved at ~/.llama/distributions/conda/8b-instruct-build.yaml
```
After this step is complete, a file named `8b-instruct-build.yaml` will be generated and saved at `~/.llama/distributions/conda/8b-instruct-build.yaml`.
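If you want to inspect a generated build spec programmatically (for example, to check which provider serves each API before building), a minimal sketch like the one below works. It assumes only the fields shown in the example above (`name`, `image_type`, and `distribution_spec.providers`) and uses PyYAML; the file path is simply the location printed by the build step.
```python
# inspect_build_spec.py -- minimal sketch; assumes only the fields shown in the example above
from pathlib import Path
import yaml

BUILD_SPEC = Path.home() / ".llama/distributions/conda/8b-instruct-build.yaml"

spec = yaml.safe_load(BUILD_SPEC.read_text())

print(f"build name : {spec['name']}")
print(f"image type : {spec['image_type']}")
print("providers:")
for api, provider in spec["distribution_spec"]["providers"].items():
    print(f"  {api:<15} -> {provider}")
```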
#### How to build a distribution with different API providers using configs
To use a different API provider, change the `distribution_spec` in your `<name>-build.yaml` config. For example, the following build spec lets you build a distribution that uses TGI as the inference API provider.
```
$ cat ./llama_toolchain/configs/distributions/conda/local-tgi-conda-example-build.yaml
name: local-tgi-conda-example
distribution_spec:
  description: Use TGI (local or with Hugging Face Inference Endpoints) for running LLM inference. When using HF Inference Endpoints, you must provide the name of the endpoint.
  docker_image: null
  providers:
    inference: remote::tgi
    memory: meta-reference-faiss
    safety: meta-reference
    agentic_system: meta-reference
    telemetry: console
image_type: conda
```
The following command allows you to build a distribution with TGI as the inference API provider, with the name `tgi`.
```
llama stack build --config ./llama_toolchain/configs/distributions/conda/local-tgi-conda-example-build.yaml --name tgi
```
We provide example build configs to help you get started building with different API providers.
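If you would rather derive a new build config from one of the bundled examples than edit it by hand, a small sketch like the following is enough. The source path mirrors the example above; the output path and the `my-tgi-stack` name are placeholders, and only the fields shown in the example configs are assumed.
```python
# derive_build_config.py -- hypothetical helper; output path and name are placeholders
import yaml

SRC = "./llama_toolchain/configs/distributions/conda/local-conda-example-build.yaml"
DST = "./my-tgi-stack-build.yaml"

with open(SRC) as f:
    cfg = yaml.safe_load(f)

# Swap the inference provider and give the build a new name.
cfg["name"] = "my-tgi-stack"
cfg["distribution_spec"]["providers"]["inference"] = "remote::tgi"

with open(DST, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)

print(f"Wrote {DST}; now run: llama stack build --config {DST} --name my-tgi-stack")
```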
#### How to build a distribution with a Docker image
To build a Docker image, simply change `image_type` to `docker` in your `<name>-build.yaml` file, and run `llama stack build --config <name>-build.yaml`.
```
$ cat ./llama_toolchain/configs/distributions/docker/local-docker-example-build.yaml
name: local-docker-example
distribution_spec:
  description: Use code from `llama_toolchain` itself to serve all llama stack APIs
  docker_image: null
  providers:
    inference: meta-reference
    memory: meta-reference-faiss
    safety: meta-reference
    agentic_system: meta-reference
    telemetry: console
image_type: docker
```
The following command allows you to build a Docker image with the name `docker-local`:
```
llama stack build --config ./llama_toolchain/configs/distributions/docker/local-docker-example-build.yaml --name docker-local
Dockerfile created successfully in /tmp/tmp.I0ifS2c46A/Dockerfile
FROM python:3.10-slim
WORKDIR /app
...
...
You can run it with: podman run -p 8000:8000 llamastack-docker-local
Build spec configuration saved at /home/xiyan/.llama/distributions/docker/docker-local-build.yaml
```
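If you prefer to drive the Docker build from a script rather than typing the two commands by hand, a thin `subprocess` wrapper such as the sketch below is all that's needed; it simply replays the `llama stack build` and `podman run` commands shown above, so the config path and image name are taken directly from that example.
```python
# build_and_run_docker.py -- thin wrapper around the CLI commands shown above
import subprocess

CONFIG = "./llama_toolchain/configs/distributions/docker/local-docker-example-build.yaml"

# Build the image, exactly as in the example above.
subprocess.run(
    ["llama", "stack", "build", "--config", CONFIG, "--name", "docker-local"],
    check=True,
)

# Start the container the way the build output suggests.
subprocess.run(
    ["podman", "run", "-p", "8000:8000", "llamastack-docker-local"],
    check=True,
)
```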
## Step 2. Configure
After our distribution is built (either as a Docker image or a Conda environment), we run the following command to configure it:
```
llama stack configure [<path/to/name-build.yaml> | <docker-image-name>]
```
- For `conda` environments: `<path/to/name-build.yaml>` is the generated build spec saved in Step 1.
- For `docker` images downloaded from Docker Hub, you can also pass `<docker-image-name>` as the argument.
- Run `docker images` to see the list of available images on your machine.
```
$ llama stack configure ~/.llama/distributions/conda/8b-instruct-build.yaml
Configuring API: inference (meta-reference)
Enter value for model (existing: Meta-Llama3.1-8B-Instruct) (required):
Enter value for quantization (optional):
Enter value for torch_seed (optional):
Enter value for max_seq_len (existing: 4096) (required):
Enter value for max_batch_size (existing: 1) (required):
Configuring API: memory (meta-reference-faiss)
Configuring API: safety (meta-reference)
Do you want to configure llama_guard_shield? (y/n): y
Entering sub-configuration for llama_guard_shield:
Enter value for model (default: Llama-Guard-3-8B) (required):
Enter value for excluded_categories (default: []) (required):
Enter value for disable_input_check (default: False) (required):
Enter value for disable_output_check (default: False) (required):
Do you want to configure prompt_guard_shield? (y/n): y
Entering sub-configuration for prompt_guard_shield:
Enter value for model (default: Prompt-Guard-86M) (required):
Configuring API: agentic_system (meta-reference)
Enter value for brave_search_api_key (optional):
Enter value for bing_search_api_key (optional):
Enter value for wolfram_api_key (optional):
Configuring API: telemetry (console)
YAML configuration has been written to ~/.llama/builds/conda/8b-instruct-run.yaml
```
After this step is successful, you should be able to find a run configuration spec in `~/.llama/builds/conda/8b-instruct-run.yaml`. You may edit this file to change the settings.
As you can see, we did basic configuration above and configured:
- inference to run on model `Meta-Llama3.1-8B-Instruct` (obtained from `llama model list`)
- Llama Guard safety shield with model `Llama-Guard-3-8B`
- Prompt Guard safety shield with model `Prompt-Guard-86M`
To see how these configurations are stored as YAML, check out the file printed at the end of the configuration.
Note that all configurations as well as models are stored in `~/.llama`.
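If you later want to change a single setting, such as `max_seq_len`, without re-running the interactive configure flow, you can edit the run YAML directly or script the change. The sketch below deliberately searches for the key instead of assuming a particular nesting, since the exact layout of the generated file is not spelled out here; check your own `*-run.yaml` first.
```python
# tweak_run_config.py -- rough sketch; walks the YAML rather than assuming its exact layout
from pathlib import Path
import yaml

RUN_YAML = Path.home() / ".llama/builds/conda/8b-instruct-run.yaml"

def set_key(node, key, value):
    """Recursively overwrite every occurrence of `key` in nested dicts/lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                node[k] = value
            else:
                set_key(v, key, value)
    elif isinstance(node, list):
        for item in node:
            set_key(item, key, value)

config = yaml.safe_load(RUN_YAML.read_text())
set_key(config, "max_seq_len", 8192)  # bump the context length
RUN_YAML.write_text(yaml.safe_dump(config, sort_keys=False))
print(f"Updated {RUN_YAML}")
```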
## Step 3. Run
Now, let's start the Llama Stack Distribution Server. You will need the YAML configuration file which was written out at the end of the `llama stack configure` step.
```
llama stack run ~/.llama/builds/conda/8b-instruct-run.yaml
```
You should see the Llama Stack server start and print the APIs that it is supporting:
```
$ llama stack run ~/.llama/builds/conda/8b-instruct-run.yaml
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 19.28 seconds
NCCL version 2.20.5+cuda12.4
Finished model load YES READY
Serving POST /inference/batch_chat_completion
Serving POST /inference/batch_completion
Serving POST /inference/chat_completion
Serving POST /inference/completion
Serving POST /safety/run_shields
Serving POST /agentic_system/memory_bank/attach
Serving POST /agentic_system/create
Serving POST /agentic_system/session/create
Serving POST /agentic_system/turn/create
Serving POST /agentic_system/delete
Serving POST /agentic_system/session/delete
Serving POST /agentic_system/memory_bank/detach
Serving POST /agentic_system/session/get
Serving POST /agentic_system/step/get
Serving POST /agentic_system/turn/get
Listening on :::5000
INFO: Started server process [453333]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
```
> [!NOTE]
> Configuration is in `~/.llama/builds/conda/8b-instruct-run.yaml`. Feel free to increase `max_seq_len`.
> [!IMPORTANT]
> The "local" distribution inference server currently only supports CUDA. It will not work on Apple Silicon machines.
This server is running a Llama model locally.
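If you are scripting against the server, it helps to wait until the port accepts connections before sending requests, since model loading can take a while. The sketch below assumes only that the server listens on port 5000 as shown in the log above; no health endpoint is documented here, so it just probes the TCP port.
```python
# wait_for_server.py -- minimal readiness probe; assumes only the port shown in the logs above
import socket
import time

def wait_for_port(host: str = "localhost", port: int = 5000, timeout: float = 300.0) -> bool:
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            time.sleep(1)  # server is still loading the model
    return False

if __name__ == "__main__":
    print("server is up" if wait_for_port() else "timed out waiting for the server")
```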
## Step 4. Test with Client
Once the server is set up, we can test it with a client to see example outputs.
```
cd /path/to/llama-stack
conda activate <env> # any environment containing the llama-toolchain pip package will work
python -m llama_toolchain.inference.client localhost 5000
```
This will run the chat completion client and query the distribution's `/inference/chat_completion` API.
Here is an example output:
```
Initializing client for http://localhost:5000
User>hello world, troll me in two-paragraphs about 42
Assistant> You think you're so smart, don't you? You think you can just waltz in here and ask about 42, like it's some kind of trivial matter. Well, let me tell you, 42 is not just a number, it's a way of life. It's the answer to the ultimate question of life, the universe, and everything, according to Douglas Adams' magnum opus, "The Hitchhiker's Guide to the Galaxy". But do you know what's even more interesting about 42? It's that it's not actually the answer to anything, it's just a number that some guy made up to sound profound.
You know what's even more hilarious? People like you who think they can just Google "42" and suddenly become experts on the subject. Newsflash: you're not a supercomputer, you're just a human being with a fragile ego and a penchant for thinking you're smarter than you actually are. 42 is just a number, a meaningless collection of digits that holds no significance whatsoever. So go ahead, keep thinking you're so clever, but deep down, you're just a pawn in the grand game of life, and 42 is just a silly little number that's been used to make you feel like you're part of something bigger than yourself. Ha!
```
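If you want to call the endpoint without the bundled client, a bare-bones HTTP sketch looks roughly like the one below. The request schema is not documented here, so the payload (a `model` plus a list of `messages`) is an assumption modelled on typical chat-completion APIs; check `llama_toolchain.inference.client` for the real request and response types.
```python
# raw_chat_completion.py -- illustrative only; the payload schema is assumed, not documented here
import requests

URL = "http://localhost:5000/inference/chat_completion"

payload = {
    "model": "Meta-Llama3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "hello world, troll me in two-paragraphs about 42"}],
    "stream": False,
}

resp = requests.post(URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())
```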
Similarly you can test safety (if you configured llama-guard and/or prompt-guard shields) by:
```
python -m llama_toolchain.safety.client localhost 5000
```
You can find more example scripts with client SDKs to talk with the Llama Stack server in our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/sdk_examples) repo.


@ -8,33 +8,11 @@ import argparse
from llama_toolchain.cli.subcommand import Subcommand from llama_toolchain.cli.subcommand import Subcommand
from llama_toolchain.core.datatypes import * # noqa: F403 from llama_toolchain.core.datatypes import * # noqa: F403
from pathlib import Path
import yaml import yaml
def parse_api_provider_tuples(
tuples: str, parser: argparse.ArgumentParser
) -> Dict[str, ProviderSpec]:
from llama_toolchain.core.distribution import api_providers
all_providers = api_providers()
deps = {}
for dep in tuples.split(","):
dep = dep.strip()
if not dep:
continue
api_str, provider = dep.split("=")
api = Api(api_str)
provider = provider.strip()
if provider not in all_providers[api]:
parser.error(f"Provider `{provider}` is not available for API `{api}`")
return
deps[api] = all_providers[api][provider]
return deps
class StackBuild(Subcommand): class StackBuild(Subcommand):
def __init__(self, subparsers: argparse._SubParsersAction): def __init__(self, subparsers: argparse._SubParsersAction):
super().__init__() super().__init__()
@ -48,16 +26,16 @@ class StackBuild(Subcommand):
self.parser.set_defaults(func=self._run_stack_build_command) self.parser.set_defaults(func=self._run_stack_build_command)
def _add_arguments(self): def _add_arguments(self):
from llama_toolchain.core.distribution_registry import (
available_distribution_specs,
)
from llama_toolchain.core.package import ImageType
allowed_ids = [d.distribution_type for d in available_distribution_specs()]
self.parser.add_argument( self.parser.add_argument(
"--config", "config",
type=str, type=str,
help="Path to a config file to use for the build", help="Path to a config file to use for the build. You may find example configs in llama_toolchain/configs/distributions",
)
self.parser.add_argument(
"--name",
type=str,
help="Name of the llama stack build to override from template config",
) )
def _run_stack_build_command_from_build_config( def _run_stack_build_command_from_build_config(
@ -68,69 +46,19 @@ class StackBuild(Subcommand):
from llama_toolchain.common.config_dirs import DISTRIBS_BASE_DIR from llama_toolchain.common.config_dirs import DISTRIBS_BASE_DIR
from llama_toolchain.common.serialize import EnumEncoder from llama_toolchain.common.serialize import EnumEncoder
from llama_toolchain.core.distribution_registry import resolve_distribution_spec
from llama_toolchain.core.package import ApiInput, build_package, ImageType from llama_toolchain.core.package import ApiInput, build_package, ImageType
from termcolor import cprint from termcolor import cprint
api_inputs = []
if build_config.distribution == "adhoc":
if not build_config.api_providers:
self.parser.error(
"You must specify API providers with (api=provider,...) for building an adhoc distribution"
)
return
parsed = parse_api_provider_tuples(build_config.api_providers, self.parser)
for api, provider_spec in parsed.items():
for dep in provider_spec.api_dependencies:
if dep not in parsed:
self.parser.error(
f"API {api} needs dependency {dep} provided also"
)
return
api_inputs.append(
ApiInput(
api=api,
provider=provider_spec.provider_type,
)
)
docker_image = None
else:
if build_config.api_providers:
self.parser.error(
"You cannot specify API providers for pre-registered distributions"
)
return
dist = resolve_distribution_spec(build_config.distribution)
if dist is None:
self.parser.error(
f"Could not find distribution {build_config.distribution}"
)
return
for api, provider_type in dist.providers.items():
api_inputs.append(
ApiInput(
api=api,
provider=provider_type,
)
)
docker_image = dist.docker_image
build_package(
api_inputs,
image_type=ImageType(build_config.image_type),
name=build_config.name,
distribution_type=build_config.distribution,
docker_image=docker_image,
)
# save build.yaml spec for building same distribution again # save build.yaml spec for building same distribution again
build_dir = ( if build_config.image_type == ImageType.docker.value:
DISTRIBS_BASE_DIR / build_config.distribution / build_config.image_type # docker needs build file to be in the llama-stack repo dir to be able to copy over to the image
) llama_toolchain_path = Path(os.path.relpath(__file__)).parent.parent.parent
build_dir = (
llama_toolchain_path / "configs/distributions" / build_config.image_type
)
else:
build_dir = DISTRIBS_BASE_DIR / build_config.image_type
os.makedirs(build_dir, exist_ok=True) os.makedirs(build_dir, exist_ok=True)
build_file_path = build_dir / f"{build_config.name}-build.yaml" build_file_path = build_dir / f"{build_config.name}-build.yaml"
@ -138,6 +66,8 @@ class StackBuild(Subcommand):
to_write = json.loads(json.dumps(build_config.dict(), cls=EnumEncoder)) to_write = json.loads(json.dumps(build_config.dict(), cls=EnumEncoder))
f.write(yaml.dump(to_write, sort_keys=False)) f.write(yaml.dump(to_write, sort_keys=False))
build_package(build_config, build_file_path)
cprint( cprint(
f"Build spec configuration saved at {str(build_file_path)}", f"Build spec configuration saved at {str(build_file_path)}",
color="green", color="green",
@ -147,15 +77,18 @@ class StackBuild(Subcommand):
from llama_toolchain.common.prompt_for_config import prompt_for_config from llama_toolchain.common.prompt_for_config import prompt_for_config
from llama_toolchain.core.dynamic import instantiate_class_type from llama_toolchain.core.dynamic import instantiate_class_type
if args.config: if not args.config:
with open(args.config, "r") as f: self.parser.error(
try: "No config file specified. Please use `llama stack build /path/to/*-build.yaml`. Example config files can be found in llama_toolchain/configs/distributions"
build_config = BuildConfig(**yaml.safe_load(f)) )
except Exception as e:
self.parser.error(f"Could not parse config file {args.config}: {e}")
return
self._run_stack_build_command_from_build_config(build_config)
return return
build_config = prompt_for_config(BuildConfig, None) with open(args.config, "r") as f:
self._run_stack_build_command_from_build_config(build_config) try:
build_config = BuildConfig(**yaml.safe_load(f))
except Exception as e:
self.parser.error(f"Could not parse config file {args.config}: {e}")
return
if args.name:
build_config.name = args.name
self._run_stack_build_command_from_build_config(build_config)


@ -8,12 +8,18 @@ import argparse
import json import json
from pathlib import Path from pathlib import Path
import yaml import pkg_resources
import yaml
from llama_toolchain.cli.subcommand import Subcommand from llama_toolchain.cli.subcommand import Subcommand
from llama_toolchain.common.config_dirs import BUILDS_BASE_DIR from llama_toolchain.common.config_dirs import BUILDS_BASE_DIR
from llama_toolchain.common.exec import run_with_pty
from termcolor import cprint from termcolor import cprint
from llama_toolchain.core.datatypes import * # noqa: F403 from llama_toolchain.core.datatypes import * # noqa: F403
import os
from termcolor import cprint
class StackConfigure(Subcommand): class StackConfigure(Subcommand):
@ -31,49 +37,107 @@ class StackConfigure(Subcommand):
self.parser.set_defaults(func=self._run_stack_configure_cmd) self.parser.set_defaults(func=self._run_stack_configure_cmd)
def _add_arguments(self): def _add_arguments(self):
from llama_toolchain.core.distribution_registry import (
available_distribution_specs,
)
from llama_toolchain.core.package import ImageType
allowed_ids = [d.distribution_type for d in available_distribution_specs()]
self.parser.add_argument( self.parser.add_argument(
"config", "config",
type=str, type=str,
help="Path to the package config file (e.g. ~/.llama/builds/<distribution>/<image_type>/<name>.yaml)", help="Path to the build config file (e.g. ~/.llama/builds/<image_type>/<name>-build.yaml). For docker, this could also be the name of the docker image. ",
)
self.parser.add_argument(
"--output-dir",
type=str,
help="Path to the output directory to store generated run.yaml config file. If not specified, will use ~/.llama/build/<image_type>/<name>-run.yaml",
) )
def _run_stack_configure_cmd(self, args: argparse.Namespace) -> None: def _run_stack_configure_cmd(self, args: argparse.Namespace) -> None:
from llama_toolchain.core.package import ImageType from llama_toolchain.core.package import ImageType
config_file = Path(args.config) docker_image = None
if not config_file.exists(): build_config_file = Path(args.config)
self.parser.error( if not build_config_file.exists():
f"Could not find {config_file}. Please run `llama stack build` first" cprint(
f"Could not find {build_config_file}. Trying docker image name instead...",
color="green",
)
docker_image = args.config
builds_dir = BUILDS_BASE_DIR / ImageType.docker.value
if args.output_dir:
builds_dir = Path(output_dir)
os.makedirs(builds_dir, exist_ok=True)
script = pkg_resources.resource_filename(
"llama_toolchain", "core/configure_container.sh"
)
script_args = [script, docker_image, str(builds_dir)]
return_code = run_with_pty(script_args)
# we have regenerated the build config file with script, now check if it exists
if return_code != 0:
self.parser.error(
f"Can not find {build_config_file}. Please run llama stack build first or check if docker image exists"
)
build_name = docker_image.removeprefix("llamastack-")
cprint(
f"YAML configuration has been written to {builds_dir / f'{build_name}-run.yaml'}",
color="green",
) )
return return
configure_llama_distribution(config_file) with open(build_config_file, "r") as f:
build_config = BuildConfig(**yaml.safe_load(f))
self._configure_llama_distribution(build_config, args.output_dir)
def configure_llama_distribution(config_file: Path) -> None: def _configure_llama_distribution(
from llama_toolchain.common.serialize import EnumEncoder self,
from llama_toolchain.core.configure import configure_api_providers build_config: BuildConfig,
output_dir: Optional[str] = None,
):
from llama_toolchain.common.serialize import EnumEncoder
from llama_toolchain.core.configure import configure_api_providers
with open(config_file, "r") as f: builds_dir = BUILDS_BASE_DIR / build_config.image_type
config = PackageConfig(**yaml.safe_load(f)) if output_dir:
builds_dir = Path(output_dir)
os.makedirs(builds_dir, exist_ok=True)
package_name = build_config.name.replace("::", "-")
package_file = builds_dir / f"{package_name}-run.yaml"
if config.providers: api2providers = build_config.distribution_spec.providers
cprint(
f"Configuration already exists for {config.distribution_type}. Will overwrite...", stub_config = {
"yellow", api_str: {"provider_type": provider}
attrs=["bold"], for api_str, provider in api2providers.items()
}
if package_file.exists():
cprint(
f"Configuration already exists for {build_config.name}. Will overwrite...",
"yellow",
attrs=["bold"],
)
config = PackageConfig(**yaml.safe_load(package_file.read_text()))
else:
config = PackageConfig(
built_at=datetime.now(),
package_name=package_name,
providers=stub_config,
)
config.providers = configure_api_providers(config.providers)
config.docker_image = (
package_name if build_config.image_type == "docker" else None
) )
config.conda_env = package_name if build_config.image_type == "conda" else None
config.providers = configure_api_providers(config.providers) with open(package_file, "w") as f:
to_write = json.loads(json.dumps(config.dict(), cls=EnumEncoder))
f.write(yaml.dump(to_write, sort_keys=False))
with open(config_file, "w") as fp: cprint(
to_write = json.loads(json.dumps(config.dict(), cls=EnumEncoder)) f"> YAML configuration has been written to {package_file}",
fp.write(yaml.dump(to_write, sort_keys=False)) color="blue",
)
print(f"YAML configuration has been written to {config_file}")


@ -1,55 +0,0 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
import argparse
import json
from llama_toolchain.cli.subcommand import Subcommand
class StackListDistributions(Subcommand):
def __init__(self, subparsers: argparse._SubParsersAction):
super().__init__()
self.parser = subparsers.add_parser(
"list-distributions",
prog="llama stack list-distributions",
description="Show available Llama Stack Distributions",
formatter_class=argparse.RawTextHelpFormatter,
)
self._add_arguments()
self.parser.set_defaults(func=self._run_distribution_list_cmd)
def _add_arguments(self):
pass
def _run_distribution_list_cmd(self, args: argparse.Namespace) -> None:
from llama_toolchain.cli.table import print_table
from llama_toolchain.core.distribution_registry import (
available_distribution_specs,
)
# eventually, this should query a registry at llama.meta.com/llamastack/distributions
headers = [
"Distribution Type",
"Providers",
"Description",
]
rows = []
for spec in available_distribution_specs():
providers = {k.value: v for k, v in spec.providers.items()}
rows.append(
[
spec.distribution_type,
json.dumps(providers, indent=2),
spec.description,
]
)
print_table(
rows,
headers,
separate_rows=True,
)


@ -69,9 +69,6 @@ class StackRun(Subcommand):
with open(config_file, "r") as f: with open(config_file, "r") as f:
config = PackageConfig(**yaml.safe_load(f)) config = PackageConfig(**yaml.safe_load(f))
if not config.distribution_type:
raise ValueError("Build config appears to be corrupt.")
if config.docker_image: if config.docker_image:
script = pkg_resources.resource_filename( script = pkg_resources.resource_filename(
"llama_toolchain", "llama_toolchain",


@ -11,7 +11,6 @@ from llama_toolchain.cli.subcommand import Subcommand
from .build import StackBuild from .build import StackBuild
from .configure import StackConfigure from .configure import StackConfigure
from .list_apis import StackListApis from .list_apis import StackListApis
from .list_distributions import StackListDistributions
from .list_providers import StackListProviders from .list_providers import StackListProviders
from .run import StackRun from .run import StackRun
@ -31,6 +30,5 @@ class StackParser(Subcommand):
StackBuild.create(subparsers) StackBuild.create(subparsers)
StackConfigure.create(subparsers) StackConfigure.create(subparsers)
StackListApis.create(subparsers) StackListApis.create(subparsers)
StackListDistributions.create(subparsers)
StackListProviders.create(subparsers) StackListProviders.create(subparsers)
StackRun.create(subparsers) StackRun.create(subparsers)


@ -0,0 +1,10 @@
name: local-conda-example
distribution_spec:
  description: Use code from `llama_toolchain` itself to serve all llama stack APIs
  providers:
    inference: meta-reference
    memory: meta-reference-faiss
    safety: meta-reference
    agentic_system: meta-reference
    telemetry: console
image_type: conda


@ -0,0 +1,10 @@
name: local-fireworks-conda-example
distribution_spec:
  description: Use Fireworks.ai for running LLM inference
  providers:
    inference: remote::fireworks
    memory: meta-reference-faiss
    safety: meta-reference
    agentic_system: meta-reference
    telemetry: console
image_type: conda


@ -0,0 +1,10 @@
name: local-ollama-conda-example
distribution_spec:
  description: Like local, but use ollama for running LLM inference
  providers:
    inference: remote::ollama
    memory: meta-reference-faiss
    safety: meta-reference
    agentic_system: meta-reference
    telemetry: console
image_type: conda


@ -0,0 +1,10 @@
name: local-tgi-conda-example
distribution_spec:
  description: Use TGI (local or with Hugging Face Inference Endpoints) for running LLM inference. When using HF Inference Endpoints, you must provide the name of the endpoint.
  providers:
    inference: remote::tgi
    memory: meta-reference-faiss
    safety: meta-reference
    agentic_system: meta-reference
    telemetry: console
image_type: conda


@ -0,0 +1,10 @@
name: local-together-conda-example
distribution_spec:
  description: Use Together.ai for running LLM inference
  providers:
    inference: remote::together
    memory: meta-reference-faiss
    safety: meta-reference
    agentic_system: meta-reference
    telemetry: console
image_type: conda


@ -0,0 +1,7 @@
description: Like local, but use ollama for running LLM inference
providers:
  inference: remote::ollama
  safety: meta-reference
  agentic_system: meta-reference
  memory: meta-reference-faiss
  telemetry: console


@ -0,0 +1,7 @@
description: Use Fireworks.ai for running LLM inference
providers:
  inference: remote::fireworks
  safety: meta-reference
  agentic_system: meta-reference
  memory: meta-reference-faiss
  telemetry: console


@ -0,0 +1,6 @@
description: Use TGI (local or with Hugging Face Inference Endpoints) for running LLM inference. When using HF Inference Endpoints, you must provide the name of the endpoint.
providers:
  inference: remote::tgi
  safety: meta-reference
  agentic_system: meta-reference
  memory: meta-reference-faiss


@ -0,0 +1,7 @@
description: Use Together.ai for running LLM inference
providers:
  inference: remote::together
  safety: meta-reference
  agentic_system: meta-reference
  memory: meta-reference-faiss
  telemetry: console


@ -0,0 +1,7 @@
description: Use code from `llama_toolchain` itself to serve all llama stack APIs
providers:
  inference: meta-reference
  memory: meta-reference-faiss
  safety: meta-reference
  agentic_system: meta-reference
  telemetry: console


@ -0,0 +1,10 @@
name: local-docker-example
distribution_spec:
  description: Use code from `llama_toolchain` itself to serve all llama stack APIs
  providers:
    inference: meta-reference
    memory: meta-reference-faiss
    safety: meta-reference
    agentic_system: meta-reference
    telemetry: console
image_type: docker


@ -19,17 +19,15 @@ fi
set -euo pipefail set -euo pipefail
if [ "$#" -ne 4 ]; then if [ "$#" -ne 2 ]; then
echo "Usage: $0 <distribution_type> <build_name> <pip_dependencies>" >&2 echo "Usage: $0 <distribution_type> <build_name> <pip_dependencies>" >&2
echo "Example: $0 <distribution_type> mybuild 'numpy pandas scipy'" >&2 echo "Example: $0 <distribution_type> mybuild 'numpy pandas scipy'" >&2
exit 1 exit 1
fi fi
distribution_type="$1" build_name="$1"
build_name="$2"
env_name="llamastack-$build_name" env_name="llamastack-$build_name"
config_file="$3" pip_dependencies="$2"
pip_dependencies="$4"
# Define color codes # Define color codes
RED='\033[0;31m' RED='\033[0;31m'
@ -115,7 +113,3 @@ ensure_conda_env_python310() {
} }
ensure_conda_env_python310 "$env_name" "$pip_dependencies" ensure_conda_env_python310 "$env_name" "$pip_dependencies"
printf "${GREEN}Successfully setup conda environment. Configuring build...${NC}\n"
$CONDA_PREFIX/bin/python3 -m llama_toolchain.cli.llama stack configure $config_file


@ -4,18 +4,17 @@ LLAMA_MODELS_DIR=${LLAMA_MODELS_DIR:-}
LLAMA_TOOLCHAIN_DIR=${LLAMA_TOOLCHAIN_DIR:-} LLAMA_TOOLCHAIN_DIR=${LLAMA_TOOLCHAIN_DIR:-}
TEST_PYPI_VERSION=${TEST_PYPI_VERSION:-} TEST_PYPI_VERSION=${TEST_PYPI_VERSION:-}
if [ "$#" -ne 5 ]; then if [ "$#" -ne 4 ]; then
echo "Usage: $0 <distribution_type> <build_name> <docker_base> <pip_dependencies> echo "Usage: $0 <build_name> <docker_base> <pip_dependencies>
echo "Example: $0 distribution_type my-fastapi-app python:3.9-slim 'fastapi uvicorn' echo "Example: $0 my-fastapi-app python:3.9-slim 'fastapi uvicorn'
exit 1 exit 1
fi fi
distribution_type=$1 build_name="$1"
build_name="$2"
image_name="llamastack-$build_name" image_name="llamastack-$build_name"
docker_base=$3 docker_base=$2
config_file=$4 build_file_path=$3
pip_dependencies=$5 pip_dependencies=$4
# Define color codes # Define color codes
RED='\033[0;31m' RED='\033[0;31m'
@ -26,6 +25,8 @@ set -euo pipefail
SCRIPT_DIR=$(dirname "$(readlink -f "$0")") SCRIPT_DIR=$(dirname "$(readlink -f "$0")")
REPO_DIR=$(dirname $(dirname "$SCRIPT_DIR")) REPO_DIR=$(dirname $(dirname "$SCRIPT_DIR"))
DOCKER_BINARY=${DOCKER_BINARY:-docker}
DOCKER_OPTS=${DOCKER_OPTS:-}
TEMP_DIR=$(mktemp -d) TEMP_DIR=$(mktemp -d)
@ -93,6 +94,8 @@ add_to_docker <<EOF
EOF EOF
add_to_docker "ADD $build_file_path ./llamastack-build.yaml"
printf "Dockerfile created successfully in $TEMP_DIR/Dockerfile" printf "Dockerfile created successfully in $TEMP_DIR/Dockerfile"
cat $TEMP_DIR/Dockerfile cat $TEMP_DIR/Dockerfile
printf "\n" printf "\n"
@ -105,10 +108,10 @@ if [ -n "$LLAMA_MODELS_DIR" ]; then
mounts="$mounts -v $(readlink -f $LLAMA_MODELS_DIR):$models_mount" mounts="$mounts -v $(readlink -f $LLAMA_MODELS_DIR):$models_mount"
fi fi
set -x set -x
podman build -t $image_name -f "$TEMP_DIR/Dockerfile" "$REPO_DIR" $mounts $DOCKER_BINARY build $DOCKER_OPTS -t $image_name -f "$TEMP_DIR/Dockerfile" "$REPO_DIR" $mounts
set +x set +x
printf "${GREEN}Succesfully setup Podman image. Configuring build...${NC}"
echo "You can run it with: podman run -p 8000:8000 $image_name" echo "You can run it with: podman run -p 8000:8000 $image_name"
$CONDA_PREFIX/bin/python3 -m llama_toolchain.cli.llama stack configure $config_file echo "Checking image builds..."
podman run -it $image_name cat llamastack-build.yaml


@ -0,0 +1,31 @@
#!/bin/bash
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
set -euo pipefail
error_handler() {
echo "Error occurred in script at line: ${1}" >&2
exit 1
}
trap 'error_handler ${LINENO}' ERR
if [ $# -lt 2 ]; then
echo "Usage: $0 <container name> <build file path>"
exit 1
fi
docker_image="$1"
host_build_dir="$2"
container_build_dir="/app/builds"
set -x
podman run -it \
-v $host_build_dir:$container_build_dir \
$docker_image \
llama stack configure ./llamastack-build.yaml --output-dir $container_build_dir


@@ -151,11 +151,12 @@ def remote_provider_spec(
 @json_schema_type
 class DistributionSpec(BaseModel):
-    distribution_type: str
-    description: str
+    description: Optional[str] = Field(
+        default="",
+        description="Description of the distribution",
+    )
     docker_image: Optional[str] = None
-    providers: Dict[Api, str] = Field(
+    providers: Dict[str, str] = Field(
         default_factory=dict,
         description="Provider Types for each of the APIs provided by this distribution",
     )
@@ -172,8 +173,6 @@ Reference to the distribution this package refers to. For unregistered (adhoc) p
 this could be just a hash
 """,
     )
-    distribution_type: Optional[str] = None
     docker_image: Optional[str] = Field(
         default=None,
         description="Reference to the docker image if this package refers to a container",
@@ -194,12 +193,8 @@ the dependencies of these providers as well.
 @json_schema_type
 class BuildConfig(BaseModel):
     name: str
-    distribution: str = Field(
-        default="local", description="Type of distribution to build (adhoc | {})"
-    )
-    api_providers: Optional[str] = Field(
-        default_factory=list,
-        description="List of API provider names to build",
-    )
+    distribution_spec: DistributionSpec = Field(
+        description="The distribution spec to build including API providers. "
+    )
     image_type: str = Field(
         default="conda",


@@ -31,16 +31,6 @@ SERVER_DEPENDENCIES = [
 ]
 
-def distribution_dependencies(distribution: DistributionSpec) -> List[str]:
-    # only consider InlineProviderSpecs when calculating dependencies
-    return [
-        dep
-        for provider_spec in distribution.provider_specs.values()
-        if isinstance(provider_spec, InlineProviderSpec)
-        for dep in provider_spec.pip_packages
-    ] + SERVER_DEPENDENCIES
-
 def stack_apis() -> List[Api]:
     return [v for v in Api]


@@ -5,84 +5,19 @@
 # the root directory of this source tree.
 
 from functools import lru_cache
+from pathlib import Path
 from typing import List, Optional
 
 from .datatypes import *  # noqa: F403
+
+import yaml
 
 
 @lru_cache()
 def available_distribution_specs() -> List[DistributionSpec]:
-    return [
-        DistributionSpec(
-            distribution_type="local",
-            description="Use code from `llama_toolchain` itself to serve all llama stack APIs",
-            providers={
-                Api.inference: "meta-reference",
-                Api.memory: "meta-reference-faiss",
-                Api.safety: "meta-reference",
-                Api.agentic_system: "meta-reference",
-                Api.telemetry: "console",
-            },
-        ),
-        DistributionSpec(
-            distribution_type="remote",
-            description="Point to remote services for all llama stack APIs",
-            providers={
-                **{x: "remote" for x in Api},
-                Api.telemetry: "console",
-            },
-        ),
-        DistributionSpec(
-            distribution_type="local-ollama",
-            description="Like local, but use ollama for running LLM inference",
-            providers={
-                Api.inference: remote_provider_type("ollama"),
-                Api.safety: "meta-reference",
-                Api.agentic_system: "meta-reference",
-                Api.memory: "meta-reference-faiss",
-                Api.telemetry: "console",
-            },
-        ),
-        DistributionSpec(
-            distribution_type="local-plus-fireworks-inference",
-            description="Use Fireworks.ai for running LLM inference",
-            providers={
-                Api.inference: remote_provider_type("fireworks"),
-                Api.safety: "meta-reference",
-                Api.agentic_system: "meta-reference",
-                Api.memory: "meta-reference-faiss",
-                Api.telemetry: "console",
-            },
-        ),
-        DistributionSpec(
-            distribution_type="local-plus-together-inference",
-            description="Use Together.ai for running LLM inference",
-            providers={
-                Api.inference: remote_provider_type("together"),
-                Api.safety: "meta-reference",
-                Api.agentic_system: "meta-reference",
-                Api.memory: "meta-reference-faiss",
-                Api.telemetry: "console",
-            },
-        ),
-        DistributionSpec(
-            distribution_type="local-plus-tgi-inference",
-            description="Use TGI for running LLM inference",
-            providers={
-                Api.inference: remote_provider_type("tgi"),
-                Api.safety: "meta-reference",
-                Api.agentic_system: "meta-reference",
-                Api.memory: "meta-reference-faiss",
-            },
-        ),
-    ]
+    distribution_specs = []
+    for p in Path("llama_toolchain/configs/distributions/distribution_registry").rglob(
+        "*.yaml"
+    ):
+        with open(p, "r") as f:
+            distribution_specs.append(DistributionSpec(**yaml.safe_load(f)))
+    return distribution_specs
 
 
-@lru_cache()
-def resolve_distribution_spec(
-    distribution_type: str,
-) -> Optional[DistributionSpec]:
-    for spec in available_distribution_specs():
-        if spec.distribution_type == distribution_type:
-            return spec
-    return None
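The registry is now file-driven: `available_distribution_specs()` globs `*.yaml` under `llama_toolchain/configs/distributions/distribution_registry` and loads each file into a `DistributionSpec`. A minimal registry entry compatible with that model might look like the following sketch; the file name and provider selection are assumptions.

```bash
# Sketch of a registry entry; the file name and providers are illustrative.
mkdir -p llama_toolchain/configs/distributions/distribution_registry
cat > llama_toolchain/configs/distributions/distribution_registry/local.yaml <<'EOF'
description: Use code from `llama_toolchain` itself to serve all llama stack APIs
providers:
  inference: meta-reference
  memory: meta-reference-faiss
  safety: meta-reference
  agentic_system: meta-reference
  telemetry: console
EOF
```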


@@ -21,6 +21,8 @@ from pydantic import BaseModel
 from termcolor import cprint
 
 from llama_toolchain.core.datatypes import *  # noqa: F403
+from pathlib import Path
+
 from llama_toolchain.core.distribution import api_providers, SERVER_DEPENDENCIES
@@ -39,87 +41,35 @@ class ApiInput(BaseModel):
     provider: str
 
-def build_package(
-    api_inputs: List[ApiInput],
-    image_type: ImageType,
-    name: str,
-    distribution_type: Optional[str] = None,
-    docker_image: Optional[str] = None,
-):
-    if not distribution_type:
-        distribution_type = "adhoc"
-
-    build_dir = BUILDS_BASE_DIR / distribution_type / image_type.value
-    os.makedirs(build_dir, exist_ok=True)
-
-    package_name = name.replace("::", "-")
-    package_file = build_dir / f"{package_name}.yaml"
-
-    all_providers = api_providers()
-
+def build_package(build_config: BuildConfig, build_file_path: Path):
     package_deps = Dependencies(
-        docker_image=docker_image or "python:3.10-slim",
+        docker_image=build_config.distribution_spec.docker_image or "python:3.10-slim",
         pip_packages=SERVER_DEPENDENCIES,
     )
 
-    stub_config = {}
-    for api_input in api_inputs:
-        api = api_input.api
-        providers_for_api = all_providers[api]
-        if api_input.provider not in providers_for_api:
+    # extend package dependencies based on providers spec
+    all_providers = api_providers()
+    for api_str, provider in build_config.distribution_spec.providers.items():
+        providers_for_api = all_providers[Api(api_str)]
+        if provider not in providers_for_api:
             raise ValueError(
-                f"Provider `{api_input.provider}` is not available for API `{api}`"
+                f"Provider `{provider}` is not available for API `{api_str}`"
             )
 
-        provider = providers_for_api[api_input.provider]
-        package_deps.pip_packages.extend(provider.pip_packages)
-        if provider.docker_image:
+        provider_spec = providers_for_api[provider]
+        package_deps.pip_packages.extend(provider_spec.pip_packages)
+        if provider_spec.docker_image:
             raise ValueError("A stack's dependencies cannot have a docker image")
 
-        stub_config[api.value] = {"provider_type": api_input.provider}
-
-    if package_file.exists():
-        cprint(
-            f"Build `{package_name}` exists; will reconfigure",
-            color="yellow",
-        )
-        c = PackageConfig(**yaml.safe_load(package_file.read_text()))
-        for api_str, new_config in stub_config.items():
-            if api_str not in c.providers:
-                c.providers[api_str] = new_config
-            else:
-                existing_config = c.providers[api_str]
-                if existing_config["provider_type"] != new_config["provider_type"]:
-                    cprint(
-                        f"Provider `{api_str}` has changed from `{existing_config}` to `{new_config}`",
-                        color="yellow",
-                    )
-                c.providers[api_str] = new_config
-    else:
-        c = PackageConfig(
-            built_at=datetime.now(),
-            package_name=package_name,
-            providers=stub_config,
-        )
-
-    c.distribution_type = distribution_type
-    c.docker_image = package_name if image_type == ImageType.docker else None
-    c.conda_env = package_name if image_type == ImageType.conda else None
-
-    with open(package_file, "w") as f:
-        to_write = json.loads(json.dumps(c.dict(), cls=EnumEncoder))
-        f.write(yaml.dump(to_write, sort_keys=False))
-
-    if image_type == ImageType.docker:
+    if build_config.image_type == ImageType.docker.value:
         script = pkg_resources.resource_filename(
             "llama_toolchain", "core/build_container.sh"
         )
         args = [
             script,
-            distribution_type,
-            package_name,
+            build_config.name,
             package_deps.docker_image,
-            str(package_file),
+            str(build_file_path),
             " ".join(package_deps.pip_packages),
         ]
     else:
@@ -128,21 +78,14 @@ def build_package(
         )
         args = [
             script,
-            distribution_type,
-            package_name,
-            str(package_file),
+            build_config.name,
             " ".join(package_deps.pip_packages),
         ]
 
     return_code = run_with_pty(args)
     if return_code != 0:
         cprint(
-            f"Failed to build target {package_name} with return code {return_code}",
+            f"Failed to build target {build_config.name} with return code {return_code}",
             color="red",
         )
         return
-
-    cprint(
-        f"Target `{package_name}` built with configuration at {str(package_file)}",
-        color="green",
-    )
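`build_package` now derives everything from the `BuildConfig` and forwards the build file path to the container build script, which bakes it into the image as `llamastack-build.yaml`. An end-to-end sketch for a docker-type build file, reusing the hypothetical build file from the earlier sketch; the resulting image name assumes `name: 8b-instruct` and the `llamastack-<name>` convention from the build script.

```bash
# Illustrative flow: build from a build file, then verify the spec baked into the image.
llama stack build ./8b-instruct-build.yaml
podman run -it llamastack-8b-instruct cat llamastack-build.yaml
```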


@@ -28,10 +28,10 @@ from llama_models.llama3.api.datatypes import Message, ToolPromptFormat
 from llama_models.llama3.api.tokenizer import Tokenizer
 from llama_models.llama3.reference_impl.model import Transformer
 from llama_models.sku_list import resolve_model
-from termcolor import cprint
 
 from llama_toolchain.common.model_utils import model_local_dir
 from llama_toolchain.inference.api import QuantizationType
+from termcolor import cprint
 
 from .config import MetaReferenceImplConfig
@@ -80,6 +80,7 @@ class Llama:
         torch.distributed.init_process_group("nccl")
 
         model_parallel_size = config.model_parallel_size
+
         if not model_parallel_is_initialized():
             initialize_model_parallel(model_parallel_size)


@@ -0,0 +1,5 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the terms described in the LICENSE file in
+# the root directory of this source tree.