Kill llama stack configure (#371)

* remove configure

* build msg

* wip

* build->run

* delete prints

* docs

* fix docs, kill configure

* precommit

* update fireworks build

* docs

* clean up build

* comments

* fix

* test

* remove baking build.yaml into docker

* fix msg, urls

* configure msg
Xi Yan 2024-11-06 13:32:10 -08:00 committed by GitHub
parent d289afdbde
commit 748606195b
11 changed files with 248 additions and 401 deletions


@ -61,49 +61,7 @@
"```\n", "```\n",
"For GPU inference, you need to set these environment variables for specifying local directory containing your model checkpoints, and enable GPU inference to start running docker container.\n", "For GPU inference, you need to set these environment variables for specifying local directory containing your model checkpoints, and enable GPU inference to start running docker container.\n",
"$ export LLAMA_CHECKPOINT_DIR=~/.llama\n", "$ export LLAMA_CHECKPOINT_DIR=~/.llama\n",
"$ llama stack configure llamastack-meta-reference-gpu\n",
"```\n", "```\n",
"Follow the prompts as part of configure.\n",
"Here is a sample output \n",
"```\n",
"$ llama stack configure llamastack-meta-reference-gpu\n",
"\n",
"Could not find ~/.conda/envs/llamastack-llamastack-meta-reference-gpu/llamastack-meta-reference-gpu-build.yaml. Trying docker image name instead...\n",
"+ podman run --network host -it -v ~/.llama/builds/docker:/app/builds llamastack-meta-reference-gpu llama stack configure ./llamastack-build.yaml --output-dir /app/builds\n",
"\n",
"Configuring API `inference`...\n",
"=== Configuring provider `meta-reference` for API inference...\n",
"Enter value for model (default: Llama3.1-8B-Instruct) (required): Llama3.2-11B-Vision-Instruct\n",
"Do you want to configure quantization? (y/n): n\n",
"Enter value for torch_seed (optional): \n",
"Enter value for max_seq_len (default: 4096) (required): \n",
"Enter value for max_batch_size (default: 1) (required): \n",
"\n",
"Configuring API `safety`...\n",
"=== Configuring provider `meta-reference` for API safety...\n",
"Do you want to configure llama_guard_shield? (y/n): n\n",
"Do you want to configure prompt_guard_shield? (y/n): n\n",
"\n",
"Configuring API `agents`...\n",
"=== Configuring provider `meta-reference` for API agents...\n",
"Enter `type` for persistence_store (options: redis, sqlite, postgres) (default: sqlite): \n",
"\n",
"Configuring SqliteKVStoreConfig:\n",
"Enter value for namespace (optional): \n",
"Enter value for db_path (default: /root/.llama/runtime/kvstore.db) (required): \n",
"\n",
"Configuring API `memory`...\n",
"=== Configuring provider `meta-reference` for API memory...\n",
"> Please enter the supported memory bank type your provider has for memory: vector\n",
"\n",
"Configuring API `telemetry`...\n",
"=== Configuring provider `meta-reference` for API telemetry...\n",
"\n",
"> YAML configuration has been written to /app/builds/local-gpu-run.yaml.\n",
"You can now run `llama stack run local-gpu --port PORT`\n",
"YAML configuration has been written to /home/hjshah/.llama/builds/docker/local-gpu-run.yaml. You can now run `llama stack run /home/hjshah/.llama/builds/docker/local-gpu-run.yaml`\n",
"```\n",
"NOTE: For this example, we use all local meta-reference implementations and have not setup safety. \n",
"\n", "\n",
"5. Run the Stack Server\n", "5. Run the Stack Server\n",
"```\n", "```\n",


@ -1,53 +1,56 @@
# Developer Guide: Assemble a Llama Stack Distribution
> NOTE: This doc may be out-of-date.
This guide will walk you through the steps to get started with building a Llama Stack distribution from scratch with your choice of API providers. Please see the [Getting Started Guide](./getting_started.md) if you just want the basic steps to start a Llama Stack distribution. This guide will walk you through the steps to get started with building a Llama Stack distribution from scratch with your choice of API providers. Please see the [Getting Started Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html) if you just want the basic steps to start a Llama Stack distribution.
## Step 1. Build
In the following steps, imagine we'll be working with a `Meta-Llama3.1-8B-Instruct` model. We will name our build `8b-instruct` to help us remember the config. We will start building our distribution (in the form of a Conda environment, or Docker image). In this step, we will specify:
- `name`: the name for our distribution (e.g. `8b-instruct`) ### Llama Stack Build Options
```
llama stack build -h
```
We will start building our distribution (in the form of a Conda environment, or Docker image). In this step, we will specify:
- `name`: the name for our distribution (e.g. `my-stack`)
- `image_type`: our build image type (`conda | docker`)
- `distribution_spec`: our distribution specs for specifying API providers
  - `description`: a short description of the configurations for the distribution
  - `providers`: specifies the underlying implementation for serving each API endpoint
- `image_type`: `conda` | `docker` to specify whether to build the distribution in the form of Docker image or Conda environment.
After this step is complete, a file named `<name>-build.yaml` and a template file `<name>-run.yaml` will be generated and saved at the output file path specified at the end of the command.
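To make these fields concrete, below is a small Python sketch that parses a hypothetical `<name>-build.yaml` (the stack name and provider choices are illustrative only, not a recommended template) and prints the pieces described above:
```
import yaml

# Hypothetical build config, shaped like the <name>-build.yaml files described above.
EXAMPLE_BUILD_YAML = """
name: my-stack
distribution_spec:
  description: Example stack served entirely by meta-reference providers
  providers:
    inference: meta-reference
    memory: meta-reference
    safety: meta-reference
    agents: meta-reference
    telemetry: meta-reference
image_type: conda
"""

build = yaml.safe_load(EXAMPLE_BUILD_YAML)
print(f"name={build['name']} image_type={build['image_type']}")
for api, provider in build["distribution_spec"]["providers"].items():
    print(f"  {api} -> {provider}")
```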
At the end of build command, we will generate `<name>-build.yaml` file storing the build configurations. ::::{tab-set}
:::{tab-item} Building from Scratch
After this step is complete, a file named `<name>-build.yaml` will be generated and saved at the output file path specified at the end of the command.
#### Building from scratch
- For a new user, we could start off with running `llama stack build`, which will take you through an interactive wizard where you will be prompted to enter build configurations.
```
llama stack build
> Enter a name for your Llama Stack (e.g. my-local-stack): my-stack
> Enter the image type you want your Llama Stack to be built as (docker or conda): conda
Llama Stack is composed of several APIs working together. Let's select
the provider types (implementations) you want to use for these APIs.
Tip: use <TAB> to see options for the providers.
> Enter provider for API inference: meta-reference
> Enter provider for API safety: meta-reference
> Enter provider for API agents: meta-reference
> Enter provider for API memory: meta-reference
> Enter provider for API datasetio: meta-reference
> Enter provider for API scoring: meta-reference
> Enter provider for API eval: meta-reference
> Enter provider for API telemetry: meta-reference
> (Optional) Enter a short description for your Llama Stack:
You can now edit ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml and run `llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml`
```
:::
Running the command above will allow you to fill in the configuration to build your Llama Stack distribution, you will see the following outputs. :::{tab-item} Building from a template
```
> Enter an unique name for identifying your Llama Stack build distribution (e.g. my-local-stack): 8b-instruct
> Enter the image type you want your distribution to be built with (docker or conda): conda
Llama Stack is composed of several APIs working together. Let's configure the providers (implementations) you want to use for these APIs.
> Enter the API provider for the inference API: (default=meta-reference): meta-reference
> Enter the API provider for the safety API: (default=meta-reference): meta-reference
> Enter the API provider for the agents API: (default=meta-reference): meta-reference
> Enter the API provider for the memory API: (default=meta-reference): meta-reference
> Enter the API provider for the telemetry API: (default=meta-reference): meta-reference
> (Optional) Enter a short description for your Llama Stack distribution:
Build spec configuration saved at ~/.conda/envs/llamastack-my-local-llama-stack/8b-instruct-build.yaml
```
**Ollama (optional)**
If you plan to use Ollama for inference, you'll need to install the server [via these instructions](https://ollama.com/download).
#### Building from templates
- To build from alternative API providers, we provide distribution templates for users to get started building a distribution backed by different providers.
The following command will allow you to see the available templates and their corresponding providers.
@ -59,18 +62,21 @@ llama stack build --list-templates
+------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+ +------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+
| Template Name | Providers | Description | | Template Name | Providers | Description |
+------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+ +------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+
| bedrock | { | Use Amazon Bedrock APIs. | | hf-serverless | { | Like local, but use Hugging Face Inference API (serverless) for running LLM |
| | "inference": "remote::bedrock", | | | | "inference": "remote::hf::serverless", | inference. |
| | "memory": "meta-reference", | | | | "memory": "meta-reference", | See https://hf.co/docs/api-inference. |
| | "safety": "meta-reference", | | | | "safety": "meta-reference", | |
| | "agents": "meta-reference", | | | | "agents": "meta-reference", | |
| | "telemetry": "meta-reference" | | | | "telemetry": "meta-reference" | |
| | } | | | | } | |
+------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+ +------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+
| databricks | { | Use Databricks for running LLM inference | | together | { | Use Together.ai for running LLM inference |
| | "inference": "remote::databricks", | | | | "inference": "remote::together", | |
| | "memory": "meta-reference", | | | | "memory": [ | |
| | "safety": "meta-reference", | | | | "meta-reference", | |
| | "remote::weaviate" | |
| | ], | |
| | "safety": "remote::together", | |
| | "agents": "meta-reference", | | | | "agents": "meta-reference", | |
| | "telemetry": "meta-reference" | | | | "telemetry": "meta-reference" | |
| | } | | | | } | |
@ -88,17 +94,37 @@ llama stack build --list-templates
| | "telemetry": "meta-reference" | | | | "telemetry": "meta-reference" | |
| | } | | | | } | |
+------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+ +------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+
| hf-endpoint | { | Like local, but use Hugging Face Inference Endpoints for running LLM inference. | | databricks | { | Use Databricks for running LLM inference |
| | "inference": "remote::hf::endpoint", | See https://hf.co/docs/api-endpoints. | | | "inference": "remote::databricks", | |
| | "memory": "meta-reference", | | | | "memory": "meta-reference", | |
| | "safety": "meta-reference", | | | | "safety": "meta-reference", | |
| | "agents": "meta-reference", | | | | "agents": "meta-reference", | |
| | "telemetry": "meta-reference" | | | | "telemetry": "meta-reference" | |
| | } | | | | } | |
+------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+ +------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+
| hf-serverless | { | Like local, but use Hugging Face Inference API (serverless) for running LLM | | vllm | { | Like local, but use vLLM for running LLM inference |
| | "inference": "remote::hf::serverless", | inference. | | | "inference": "vllm", | |
| | "memory": "meta-reference", | See https://hf.co/docs/api-inference. | | | "memory": "meta-reference", | |
| | "safety": "meta-reference", | |
| | "agents": "meta-reference", | |
| | "telemetry": "meta-reference" | |
| | } | |
+------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+
| tgi | { | Use TGI for running LLM inference |
| | "inference": "remote::tgi", | |
| | "memory": [ | |
| | "meta-reference", | |
| | "remote::chromadb", | |
| | "remote::pgvector" | |
| | ], | |
| | "safety": "meta-reference", | |
| | "agents": "meta-reference", | |
| | "telemetry": "meta-reference" | |
| | } | |
+------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+
| bedrock | { | Use Amazon Bedrock APIs. |
| | "inference": "remote::bedrock", | |
| | "memory": "meta-reference", | |
| | "safety": "meta-reference", | | | | "safety": "meta-reference", | |
| | "agents": "meta-reference", | | | | "agents": "meta-reference", | |
| | "telemetry": "meta-reference" | | | | "telemetry": "meta-reference" | |
@ -140,31 +166,8 @@ llama stack build --list-templates
| | "telemetry": "meta-reference" | | | | "telemetry": "meta-reference" | |
| | } | | | | } | |
+------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+ +------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+
| tgi | { | Use TGI for running LLM inference | | hf-endpoint | { | Like local, but use Hugging Face Inference Endpoints for running LLM inference. |
| | "inference": "remote::tgi", | | | | "inference": "remote::hf::endpoint", | See https://hf.co/docs/api-endpoints. |
| | "memory": [ | |
| | "meta-reference", | |
| | "remote::chromadb", | |
| | "remote::pgvector" | |
| | ], | |
| | "safety": "meta-reference", | |
| | "agents": "meta-reference", | |
| | "telemetry": "meta-reference" | |
| | } | |
+------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+
| together | { | Use Together.ai for running LLM inference |
| | "inference": "remote::together", | |
| | "memory": [ | |
| | "meta-reference", | |
| | "remote::weaviate" | |
| | ], | |
| | "safety": "remote::together", | |
| | "agents": "meta-reference", | |
| | "telemetry": "meta-reference" | |
| | } | |
+------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+
| vllm | { | Like local, but use vLLM for running LLM inference |
| | "inference": "vllm", | |
| | "memory": "meta-reference", | | | | "memory": "meta-reference", | |
| | "safety": "meta-reference", | | | | "safety": "meta-reference", | |
| | "agents": "meta-reference", | | | | "agents": "meta-reference", | |
@ -175,6 +178,7 @@ llama stack build --list-templates
You may then pick a template to build your distribution with providers fitted to your liking.
For example, to build a distribution with TGI as the inference provider, you can run:
```
llama stack build --template tgi
```
@ -182,15 +186,14 @@ llama stack build --template tgi
```
$ llama stack build --template tgi
...
... You can now edit ~/.llama/distributions/llamastack-tgi/tgi-run.yaml and run `llama stack run ~/.llama/distributions/llamastack-tgi/tgi-run.yaml`
Build spec configuration saved at ~/.conda/envs/llamastack-tgi/tgi-build.yaml
You may now run `llama stack configure tgi` or `llama stack configure ~/.conda/envs/llamastack-tgi/tgi-build.yaml`
```
:::
#### Building from config file :::{tab-item} Building from a pre-existing build config file
- In addition to templates, you may customize the build to your liking through editing config files and build from config files with the following command.
- The config file will be of contents like the ones in `llama_stack/distributions/templates/`. - The config file will be of contents like the ones in `llama_stack/templates/*build.yaml`.
```
$ cat llama_stack/templates/ollama/build.yaml
@ -210,148 +213,111 @@ image_type: conda
```
llama stack build --config llama_stack/templates/ollama/build.yaml
```
:::
#### How to build distribution with Docker image :::{tab-item} Building Docker
> [!TIP]
> Podman is supported as an alternative to Docker. Set `DOCKER_BINARY` to `podman` in your environment to use Podman.
To build a docker image, you may start off from a template and use the `--image-type docker` flag to specify `docker` as the build image type.
```
llama stack build --template local --image-type docker llama stack build --template ollama --image-type docker
```
Alternatively, you may use a config file and set `image_type` to `docker` in our `<name>-build.yaml` file, and run `llama stack build <name>-build.yaml`. The `<name>-build.yaml` will be of contents like:
```
name: local-docker-example $ llama stack build --template ollama --image-type docker
distribution_spec:
description: Use code from `llama_stack` itself to serve all llama stack APIs
docker_image: null
providers:
inference: meta-reference
memory: meta-reference-faiss
safety: meta-reference
agentic_system: meta-reference
telemetry: console
image_type: docker
```
The following command allows you to build a Docker image with the name `<name>`
```
llama stack build --config <name>-build.yaml
Dockerfile created successfully in /tmp/tmp.I0ifS2c46A/DockerfileFROM python:3.10-slim
WORKDIR /app
...
Dockerfile created successfully in /tmp/tmp.viA3a3Rdsg/DockerfileFROM python:3.10-slim
...
You can run it with: podman run -p 8000:8000 llamastack-docker-local
Build spec configuration saved at ~/.llama/distributions/docker/docker-local-build.yaml You can now edit ~/meta-llama/llama-stack/tmp/configs/ollama-run.yaml and run `llama stack run ~/meta-llama/llama-stack/tmp/configs/ollama-run.yaml`
```
After this step is successful, you should be able to find the built docker image and test it with `llama stack run <path/to/run.yaml>`.
:::
## Step 2. Configure ::::
After our distribution is built (either in form of docker or conda environment), we will run the following command to
```
llama stack configure [ <docker-image-name> | <path/to/name.build.yaml>] ## Step 2. Run
``` Now, let's start the Llama Stack Distribution Server. You will need the YAML configuration file which was written out at the end by the `llama stack build` step.
- For `conda` environments: <path/to/name.build.yaml> would be the generated build spec saved from Step 1.
- For `docker` images downloaded from Dockerhub, you could also use <docker-image-name> as the argument.
- Run `docker images` to check list of available images on your machine.
```
$ llama stack configure tgi llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml
Configuring API: inference (meta-reference)
Enter value for model (existing: Meta-Llama3.1-8B-Instruct) (required):
Enter value for quantization (optional):
Enter value for torch_seed (optional):
Enter value for max_seq_len (existing: 4096) (required):
Enter value for max_batch_size (existing: 1) (required):
Configuring API: memory (meta-reference-faiss)
Configuring API: safety (meta-reference)
Do you want to configure llama_guard_shield? (y/n): y
Entering sub-configuration for llama_guard_shield:
Enter value for model (default: Llama-Guard-3-1B) (required):
Enter value for excluded_categories (default: []) (required):
Enter value for disable_input_check (default: False) (required):
Enter value for disable_output_check (default: False) (required):
Do you want to configure prompt_guard_shield? (y/n): y
Entering sub-configuration for prompt_guard_shield:
Enter value for model (default: Prompt-Guard-86M) (required):
Configuring API: agentic_system (meta-reference)
Enter value for brave_search_api_key (optional):
Enter value for bing_search_api_key (optional):
Enter value for wolfram_api_key (optional):
Configuring API: telemetry (console)
YAML configuration has been written to ~/.llama/builds/conda/tgi-run.yaml
```
After this step is successful, you should be able to find a run configuration spec in `~/.llama/builds/conda/tgi-run.yaml` with the following contents. You may edit this file to change the settings.
As you can see, we did basic configuration above and configured:
- inference to run on model `Meta-Llama3.1-8B-Instruct` (obtained from `llama model list`)
- Llama Guard safety shield with model `Llama-Guard-3-1B`
- Prompt Guard safety shield with model `Prompt-Guard-86M`
For how these configurations are stored as yaml, checkout the file printed at the end of the configuration.
Note that all configurations as well as models are stored in `~/.llama`
## Step 3. Run
Now, let's start the Llama Stack Distribution Server. You will need the YAML configuration file which was written out at the end by the `llama stack configure` step.
```
llama stack run 8b-instruct
```
$ llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml
You should see the Llama Stack server start and print the APIs that it is supporting Loaded model...
Serving API datasets
GET /datasets/get
GET /datasets/list
POST /datasets/register
Serving API inspect
GET /health
GET /providers/list
GET /routes/list
Serving API inference
POST /inference/chat_completion
POST /inference/completion
POST /inference/embeddings
Serving API scoring_functions
GET /scoring_functions/get
GET /scoring_functions/list
POST /scoring_functions/register
Serving API scoring
POST /scoring/score
POST /scoring/score_batch
Serving API memory_banks
GET /memory_banks/get
GET /memory_banks/list
POST /memory_banks/register
Serving API memory
POST /memory/insert
POST /memory/query
Serving API safety
POST /safety/run_shield
Serving API eval
POST /eval/evaluate
POST /eval/evaluate_batch
POST /eval/job/cancel
GET /eval/job/result
GET /eval/job/status
Serving API shields
GET /shields/get
GET /shields/list
POST /shields/register
Serving API datasetio
GET /datasetio/get_rows_paginated
Serving API telemetry
GET /telemetry/get_trace
POST /telemetry/log_event
Serving API models
GET /models/get
GET /models/list
POST /models/register
Serving API agents
POST /agents/create
POST /agents/session/create
POST /agents/turn/create
POST /agents/delete
POST /agents/session/delete
POST /agents/session/get
POST /agents/step/get
POST /agents/turn/get
``` Listening on ['::', '0.0.0.0']:5000
$ llama stack run 8b-instruct INFO: Started server process [2935911]
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 19.28 seconds
NCCL version 2.20.5+cuda12.4
Finished model load YES READY
Serving POST /inference/batch_chat_completion
Serving POST /inference/batch_completion
Serving POST /inference/chat_completion
Serving POST /inference/completion
Serving POST /safety/run_shield
Serving POST /agentic_system/memory_bank/attach
Serving POST /agentic_system/create
Serving POST /agentic_system/session/create
Serving POST /agentic_system/turn/create
Serving POST /agentic_system/delete
Serving POST /agentic_system/session/delete
Serving POST /agentic_system/memory_bank/detach
Serving POST /agentic_system/session/get
Serving POST /agentic_system/step/get
Serving POST /agentic_system/turn/get
Listening on :::5000
INFO: Started server process [453333]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit) INFO: Uvicorn running on http://['::', '0.0.0.0']:5000 (Press CTRL+C to quit)
INFO: 2401:db00:35c:2d2b:face:0:c9:0:54678 - "GET /models/list HTTP/1.1" 200 OK
```
> [!NOTE]
> Configuration is in `~/.llama/builds/local/conda/tgi-run.yaml`. Feel free to increase `max_seq_len`.
> [!IMPORTANT]
> The "local" distribution inference server currently only supports CUDA. It will not work on Apple Silicon machines.
> [!TIP]
> You might need to use the flag `--disable-ipv6` to disable IPv6 support
This server is running a Llama model locally.
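Once the server is up, a quick way to sanity-check it is to hit one of the read-only routes listed above, such as `GET /health` or `GET /models/list`. Here is a minimal Python sketch; it assumes the server is listening on localhost port 5000 and that the `requests` package is installed:
```
import requests

BASE_URL = "http://localhost:5000"  # adjust if you passed a different --port

# Health check (served by the inspect API listed above)
print("health:", requests.get(f"{BASE_URL}/health", timeout=10).status_code)

# List the models the stack knows about
print("models:", requests.get(f"{BASE_URL}/models/list", timeout=10).json())
```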


@ -12,6 +12,10 @@ import os
from functools import lru_cache
from pathlib import Path
from llama_stack.distribution.distribution import get_provider_registry
from llama_stack.distribution.utils.dynamic import instantiate_class_type
TEMPLATES_PATH = Path(os.path.relpath(__file__)).parent.parent.parent / "templates"
@ -176,6 +180,66 @@ class StackBuild(Subcommand):
return
self._run_stack_build_command_from_build_config(build_config)
def _generate_run_config(self, build_config: BuildConfig, build_dir: Path) -> None:
    """
    Generate a run.yaml template file for user to edit from a build.yaml file
    """
    import json
    import yaml
    from termcolor import cprint
    from llama_stack.distribution.build import ImageType

    apis = list(build_config.distribution_spec.providers.keys())
    run_config = StackRunConfig(
        built_at=datetime.now(),
        docker_image=(
            build_config.name
            if build_config.image_type == ImageType.docker.value
            else None
        ),
        image_name=build_config.name,
        conda_env=(
            build_config.name
            if build_config.image_type == ImageType.conda.value
            else None
        ),
        apis=apis,
        providers={},
    )
    # build providers dict
    provider_registry = get_provider_registry()
    for api in apis:
        run_config.providers[api] = []
        provider_types = build_config.distribution_spec.providers[api]
        if isinstance(provider_types, str):
            provider_types = [provider_types]
        for i, provider_type in enumerate(provider_types):
            p_spec = Provider(
                provider_id=f"{provider_type}-{i}",
                provider_type=provider_type,
                config={},
            )
            config_type = instantiate_class_type(
                provider_registry[Api(api)][provider_type].config_class
            )
            p_spec.config = config_type()
            run_config.providers[api].append(p_spec)
    os.makedirs(build_dir, exist_ok=True)
    run_config_file = build_dir / f"{build_config.name}-run.yaml"
    with open(run_config_file, "w") as f:
        to_write = json.loads(run_config.model_dump_json())
        f.write(yaml.dump(to_write, sort_keys=False))
    cprint(
        f"You can now edit {run_config_file} and run `llama stack run {run_config_file}`",
        color="green",
    )
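# (Editor's illustration, not part of this diff.) The run config written above is
# plain YAML; for a single API with one provider it looks roughly like the dict
# below. Values are illustrative; each provider's `config` block is filled in from
# that provider's config-class defaults via instantiate_class_type above.
import yaml

example_run_config = {
    "built_at": "2024-11-06T13:32:10",
    "image_name": "my-stack",
    "docker_image": None,
    "conda_env": "my-stack",
    "apis": ["inference"],
    "providers": {
        "inference": [
            {
                "provider_id": "meta-reference-0",
                "provider_type": "meta-reference",
                "config": {},
            }
        ]
    },
}
print(yaml.dump(example_run_config, sort_keys=False))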
def _run_stack_build_command_from_build_config( def _run_stack_build_command_from_build_config(
self, build_config: BuildConfig self, build_config: BuildConfig
) -> None: ) -> None:
@ -183,48 +247,24 @@ class StackBuild(Subcommand):
import os
import yaml
from termcolor import cprint
from llama_stack.distribution.build import build_image, ImageType from llama_stack.distribution.build import build_image
from llama_stack.distribution.utils.config_dirs import DISTRIBS_BASE_DIR
from llama_stack.distribution.utils.serialize import EnumEncoder
# save build.yaml spec for building same distribution again
if build_config.image_type == ImageType.docker.value: build_dir = DISTRIBS_BASE_DIR / f"llamastack-{build_config.name}"
# docker needs build file to be in the llama-stack repo dir to be able to copy over to the image
llama_stack_path = Path(
os.path.abspath(__file__)
).parent.parent.parent.parent
build_dir = llama_stack_path / "tmp/configs/"
else:
build_dir = DISTRIBS_BASE_DIR / f"llamastack-{build_config.name}"
os.makedirs(build_dir, exist_ok=True)
build_file_path = build_dir / f"{build_config.name}-build.yaml"
with open(build_file_path, "w") as f:
to_write = json.loads(json.dumps(build_config.dict(), cls=EnumEncoder)) to_write = json.loads(build_config.model_dump_json())
f.write(yaml.dump(to_write, sort_keys=False))
return_code = build_image(build_config, build_file_path)
if return_code != 0:
return
configure_name = ( self._generate_run_config(build_config, build_dir)
build_config.name
if build_config.image_type == "conda"
else (f"llamastack-{build_config.name}")
)
if build_config.image_type == "conda":
cprint(
f"You can now run `llama stack configure {configure_name}`",
color="green",
)
else:
cprint(
f"You can now edit your run.yaml file and run `docker run -it -p 5000:5000 {build_config.name}`. See full command in llama-stack/distributions/",
color="green",
)
def _run_template_list_cmd(self, args: argparse.Namespace) -> None:
import json
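# (Editor's illustration, not part of this diff.) With the docker special-case gone,
# both the build spec and the generated run template land under
# DISTRIBS_BASE_DIR / f"llamastack-{name}". Assuming DISTRIBS_BASE_DIR resolves to
# ~/.llama/distributions (consistent with the paths printed in the docs above):
from pathlib import Path

DISTRIBS_BASE_DIR = Path.home() / ".llama" / "distributions"  # assumption for illustration
name = "my-stack"
build_dir = DISTRIBS_BASE_DIR / f"llamastack-{name}"
print(build_dir / f"{name}-build.yaml")  # saved build spec
print(build_dir / f"{name}-run.yaml")    # editable run template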


@ -7,8 +7,6 @@
import argparse
from llama_stack.cli.subcommand import Subcommand
from llama_stack.distribution.utils.config_dirs import BUILDS_BASE_DIR
from llama_stack.distribution.datatypes import * # noqa: F403
class StackConfigure(Subcommand):
@ -39,123 +37,10 @@ class StackConfigure(Subcommand):
)
def _run_stack_configure_cmd(self, args: argparse.Namespace) -> None:
import json self.parser.error(
import os """
import subprocess DEPRECATED! llama stack configure has been deprecated.
from pathlib import Path Please use llama stack run --config <path/to/run.yaml> instead.
Please see example run.yaml in /distributions folder.
import pkg_resources """
import yaml
from termcolor import cprint
from llama_stack.distribution.build import ImageType
from llama_stack.distribution.utils.exec import run_with_pty
docker_image = None
build_config_file = Path(args.config)
if build_config_file.exists():
with open(build_config_file, "r") as f:
build_config = BuildConfig(**yaml.safe_load(f))
self._configure_llama_distribution(build_config, args.output_dir)
return
conda_dir = (
Path(os.path.expanduser("~/.conda/envs")) / f"llamastack-{args.config}"
)
output = subprocess.check_output(["bash", "-c", "conda info --json"])
conda_envs = json.loads(output.decode("utf-8"))["envs"]
for x in conda_envs:
if x.endswith(f"/llamastack-{args.config}"):
conda_dir = Path(x)
break
build_config_file = Path(conda_dir) / f"{args.config}-build.yaml"
if build_config_file.exists():
with open(build_config_file, "r") as f:
build_config = BuildConfig(**yaml.safe_load(f))
cprint(f"Using {build_config_file}...", "green")
self._configure_llama_distribution(build_config, args.output_dir)
return
docker_image = args.config
builds_dir = BUILDS_BASE_DIR / ImageType.docker.value
if args.output_dir:
builds_dir = Path(output_dir)
os.makedirs(builds_dir, exist_ok=True)
script = pkg_resources.resource_filename(
"llama_stack", "distribution/configure_container.sh"
)
script_args = [script, docker_image, str(builds_dir)]
return_code = run_with_pty(script_args)
if return_code != 0:
self.parser.error(
f"Failed to configure container {docker_image} with return code {return_code}. Please run `llama stack build` first. "
)
def _configure_llama_distribution(
self,
build_config: BuildConfig,
output_dir: Optional[str] = None,
):
import json
import os
from pathlib import Path
import yaml
from termcolor import cprint
from llama_stack.distribution.configure import (
configure_api_providers,
parse_and_maybe_upgrade_config,
)
from llama_stack.distribution.utils.serialize import EnumEncoder
builds_dir = BUILDS_BASE_DIR / build_config.image_type
if output_dir:
builds_dir = Path(output_dir)
os.makedirs(builds_dir, exist_ok=True)
image_name = build_config.name.replace("::", "-")
run_config_file = builds_dir / f"{image_name}-run.yaml"
if run_config_file.exists():
cprint(
f"Configuration already exists at `{str(run_config_file)}`. Will overwrite...",
"yellow",
attrs=["bold"],
)
config_dict = yaml.safe_load(run_config_file.read_text())
config = parse_and_maybe_upgrade_config(config_dict)
else:
config = StackRunConfig(
built_at=datetime.now(),
image_name=image_name,
apis=list(build_config.distribution_spec.providers.keys()),
providers={},
)
config = configure_api_providers(config, build_config.distribution_spec)
config.docker_image = (
image_name if build_config.image_type == "docker" else None
)
config.conda_env = image_name if build_config.image_type == "conda" else None
with open(run_config_file, "w") as f:
to_write = json.loads(json.dumps(config.dict(), cls=EnumEncoder))
f.write(yaml.dump(to_write, sort_keys=False))
cprint(
f"> YAML configuration has been written to `{run_config_file}`.",
color="blue",
)
cprint(
f"You can now run `llama stack run {image_name} --port PORT`",
color="green",
)


@ -45,7 +45,6 @@ class StackRun(Subcommand):
import pkg_resources
import yaml
from termcolor import cprint
from llama_stack.distribution.build import ImageType
from llama_stack.distribution.configure import parse_and_maybe_upgrade_config
@ -71,14 +70,12 @@ class StackRun(Subcommand):
if not config_file.exists():
self.parser.error(
f"File {str(config_file)} does not exist. Please run `llama stack build` and `llama stack configure <name>` to generate a run.yaml file" f"File {str(config_file)} does not exist. Please run `llama stack build` to generate (and optionally edit) a run.yaml file"
)
return
cprint(f"Using config `{config_file}`", "green") config_dict = yaml.safe_load(config_file.read_text())
with open(config_file, "r") as f: config = parse_and_maybe_upgrade_config(config_dict)
config_dict = yaml.safe_load(config_file.read_text())
config = parse_and_maybe_upgrade_config(config_dict)
if config.docker_image:
script = pkg_resources.resource_filename(


@ -36,7 +36,6 @@ SCRIPT_DIR=$(dirname "$(readlink -f "$0")")
REPO_DIR=$(dirname $(dirname "$SCRIPT_DIR"))
DOCKER_BINARY=${DOCKER_BINARY:-docker}
DOCKER_OPTS=${DOCKER_OPTS:-}
REPO_CONFIGS_DIR="$REPO_DIR/tmp/configs"
TEMP_DIR=$(mktemp -d)
@ -115,8 +114,6 @@ ENTRYPOINT ["python", "-m", "llama_stack.distribution.server.server"]
EOF
add_to_docker "ADD tmp/configs/$(basename "$build_file_path") ./llamastack-build.yaml"
printf "Dockerfile created successfully in $TEMP_DIR/Dockerfile" printf "Dockerfile created successfully in $TEMP_DIR/Dockerfile"
cat $TEMP_DIR/Dockerfile cat $TEMP_DIR/Dockerfile
printf "\n" printf "\n"
@ -138,7 +135,6 @@ set -x
$DOCKER_BINARY build $DOCKER_OPTS -t $image_name -f "$TEMP_DIR/Dockerfile" "$REPO_DIR" $mounts
# clean up tmp/configs
rm -rf $REPO_CONFIGS_DIR
set +x
echo "Success!"


@ -12,9 +12,14 @@ from pydantic import BaseModel, Field
@json_schema_type
class TGIImplConfig(BaseModel):
url: str = Field( host: str = "localhost"
description="The URL for the TGI endpoint (e.g. 'http://localhost:8080')", port: int = 8080
) protocol: str = "http"
@property
def url(self) -> str:
return f"{self.protocol}://{self.host}:{self.port}"
api_token: Optional[str] = Field(
default=None,
description="A bearer token if your TGI endpoint is protected.",


@ -12,6 +12,6 @@ from pydantic import BaseModel, Field
class PGVectorConfig(BaseModel):
host: str = Field(default="localhost")
port: int = Field(default=5432)
db: str db: str = Field(default="postgres")
user: str user: str = Field(default="postgres")
password: str password: str = Field(default="mysecretpassword")
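# (Editor's illustration, not part of this diff.) With defaults on every field, an
# empty `config: {}` block in run.yaml now resolves to a usable local Postgres
# setup. The DSN below is just one common way such settings are consumed; it is an
# assumption, not taken from this diff.
from pydantic import BaseModel, Field


class ExamplePGVectorConfig(BaseModel):
    host: str = Field(default="localhost")
    port: int = Field(default=5432)
    db: str = Field(default="postgres")
    user: str = Field(default="postgres")
    password: str = Field(default="mysecretpassword")


cfg = ExamplePGVectorConfig()  # no arguments needed anymore
print(f"postgresql://{cfg.user}:{cfg.password}@{cfg.host}:{cfg.port}/{cfg.db}")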


@ -145,11 +145,12 @@ Fully-qualified name of the module to import. The module is expected to have:
class RemoteProviderConfig(BaseModel):
host: str = "localhost"
port: int port: int = 0
protocol: str = "http"
@property
def url(self) -> str:
return f"http://{self.host}:{self.port}" return f"{self.protocol}://{self.host}:{self.port}"
@json_schema_type


@ -4,10 +4,11 @@
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
from pydantic import BaseModel from pydantic import BaseModel, Field
from llama_stack.providers.utils.kvstore import KVStoreConfig
from llama_stack.providers.utils.kvstore.config import SqliteKVStoreConfig
class MetaReferenceAgentsImplConfig(BaseModel):
persistence_store: KVStoreConfig persistence_store: KVStoreConfig = Field(default=SqliteKVStoreConfig())


@ -6,8 +6,6 @@ distribution_spec:
memory:
- meta-reference
- remote::weaviate
- remote::chromadb
- remote::pgvector
safety: meta-reference
agents: meta-reference
telemetry: meta-reference