From e0548fe5b9178ad3292f6add5110700fad50257a Mon Sep 17 00:00:00 2001
From: Xi Yan
Date: Tue, 5 Nov 2024 11:54:44 -0800
Subject: [PATCH] fix docs, kill configure

---
 .../distribution_dev/building_distro.md | 303 ++++++++++--------
 1 file changed, 162 insertions(+), 141 deletions(-)

diff --git a/docs/source/distribution_dev/building_distro.md b/docs/source/distribution_dev/building_distro.md
index f3a9bd7bb..976474ebe 100644
--- a/docs/source/distribution_dev/building_distro.md
+++ b/docs/source/distribution_dev/building_distro.md
@@ -1,53 +1,72 @@
# Developer Guide: Assemble a Llama Stack Distribution

-> NOTE: This doc may be out-of-date.
-This guide will walk you through the steps to get started with building a Llama Stack distributiom from scratch with your choice of API providers. Please see the [Getting Started Guide](./getting_started.md) if you just want the basic steps to start a Llama Stack distribution.
+This guide will walk you through the steps to build a Llama Stack distribution from scratch with your choice of API providers. Please see the [Getting Started Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html) if you just want the basic steps to start a Llama Stack distribution.

## Step 1. Build
-In the following steps, imagine we'll be working with a `Meta-Llama3.1-8B-Instruct` model. We will name our build `8b-instruct` to help us remember the config. We will start build our distribution (in the form of a Conda environment, or Docker image). In this step, we will specify:
-- `name`: the name for our distribution (e.g. `8b-instruct`)
+
+```
+llama stack build -h
+
+usage: llama stack build [-h] [--config CONFIG] [--template TEMPLATE] [--list-templates | --no-list-templates] [--image-type {conda,docker}]
+
+Build a Llama stack container
+
+options:
+  -h, --help            show this help message and exit
+  --config CONFIG       Path to a config file to use for the build. You can find example configs in llama_stack/distribution/example_configs. If this argument is not provided, you will be prompted to enter information interactively
+  --template TEMPLATE   Name of the example template config to use for build. You may use `llama stack build --list-templates` to check out the available templates
+  --list-templates, --no-list-templates
+                        Show the available templates for building a Llama Stack distribution
+  --image-type {conda,docker}
+                        Image Type to use for the build. This can be either conda or docker. If not specified, will use the image type from the template config.
+
+```
+We will start building our distribution (in the form of a Conda environment or Docker image). In this step, we will specify:
+- `name`: the name for our distribution (e.g. `my-stack`)
- `image_type`: our build image type (`conda | docker`)
- `distribution_spec`: our distribution specs for specifying API providers
  - `description`: a short description of the configurations for the distribution
  - `providers`: specifies the underlying implementation for serving each API endpoint
  - `image_type`: `conda` | `docker` to specify whether to build the distribution in the form of Docker image or Conda environment.
+After this step is complete, a file named `<name>-build.yaml` and template file `<name>-run.yaml` will be generated and saved at the output file path specified at the end of the command.

-At the end of build command, we will generate `<name>-build.yaml` file storing the build configurations.
-After this step is complete, a file named `<name>-build.yaml` will be generated and saved at the output file path specified at the end of the command.
+You have 3 options for building your distribution:
+1.1 Building from scratch
+1.2 Building from a template
+1.3 Building from a pre-existing build config file

-#### Building from scratch
+
+### 1.1 Building from scratch
- For a new user, we could start off with running `llama stack build`, which will launch an interactive wizard where you will be prompted to enter build configurations.

```
llama stack build
+
+> Enter a name for your Llama Stack (e.g. my-local-stack): my-stack
+> Enter the image type you want your Llama Stack to be built as (docker or conda): conda
+
+Llama Stack is composed of several APIs working together. Let's select
+the provider types (implementations) you want to use for these APIs.
+
+Tip: use <TAB> to see options for the providers.
+
+> Enter provider for API inference: meta-reference
+> Enter provider for API safety: meta-reference
+> Enter provider for API agents: meta-reference
+> Enter provider for API memory: meta-reference
+> Enter provider for API datasetio: meta-reference
+> Enter provider for API scoring: meta-reference
+> Enter provider for API eval: meta-reference
+> Enter provider for API telemetry: meta-reference
+
+ > (Optional) Enter a short description for your Llama Stack:
+
+You can now edit ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml and run `llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml`
```
-Running the command above will allow you to fill in the configuration to build your Llama Stack distribution, you will see the following outputs.
-
-```
-> Enter an unique name for identifying your Llama Stack build distribution (e.g. my-local-stack): 8b-instruct
-> Enter the image type you want your distribution to be built with (docker or conda): conda
-
- Llama Stack is composed of several APIs working together. Let's configure the providers (implementations) you want to use for these APIs.
-> Enter the API provider for the inference API: (default=meta-reference): meta-reference
-> Enter the API provider for the safety API: (default=meta-reference): meta-reference
-> Enter the API provider for the agents API: (default=meta-reference): meta-reference
-> Enter the API provider for the memory API: (default=meta-reference): meta-reference
-> Enter the API provider for the telemetry API: (default=meta-reference): meta-reference
-
- > (Optional) Enter a short description for your Llama Stack distribution:
-
-Build spec configuration saved at ~/.conda/envs/llamastack-my-local-llama-stack/8b-instruct-build.yaml
-```
-
-**Ollama (optional)**
-
-If you plan to use Ollama for inference, you'll need to install the server [via these instructions](https://ollama.com/download).
-
-
-#### Building from templates
+### 1.2 Building from a template

- To build from alternative API providers, we provide distribution templates for users to get started building a distribution backed by different providers. The following command will allow you to see the available templates and their corresponding providers.
@@ -59,18 +78,21 @@ llama stack build --list-templates +------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+ | Template Name | Providers | Description | +------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+ -| bedrock | { | Use Amazon Bedrock APIs. | -| | "inference": "remote::bedrock", | | -| | "memory": "meta-reference", | | +| hf-serverless | { | Like local, but use Hugging Face Inference API (serverless) for running LLM | +| | "inference": "remote::hf::serverless", | inference. | +| | "memory": "meta-reference", | See https://hf.co/docs/api-inference. | | | "safety": "meta-reference", | | | | "agents": "meta-reference", | | | | "telemetry": "meta-reference" | | | | } | | +------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+ -| databricks | { | Use Databricks for running LLM inference | -| | "inference": "remote::databricks", | | -| | "memory": "meta-reference", | | -| | "safety": "meta-reference", | | +| together | { | Use Together.ai for running LLM inference | +| | "inference": "remote::together", | | +| | "memory": [ | | +| | "meta-reference", | | +| | "remote::weaviate" | | +| | ], | | +| | "safety": "remote::together", | | | | "agents": "meta-reference", | | | | "telemetry": "meta-reference" | | | | } | | @@ -88,17 +110,37 @@ llama stack build --list-templates | | "telemetry": "meta-reference" | | | | } | | +------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+ -| hf-endpoint | { | Like local, but use Hugging Face Inference Endpoints for running LLM inference. | -| | "inference": "remote::hf::endpoint", | See https://hf.co/docs/api-endpoints. | +| databricks | { | Use Databricks for running LLM inference | +| | "inference": "remote::databricks", | | | | "memory": "meta-reference", | | | | "safety": "meta-reference", | | | | "agents": "meta-reference", | | | | "telemetry": "meta-reference" | | | | } | | +------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+ -| hf-serverless | { | Like local, but use Hugging Face Inference API (serverless) for running LLM | -| | "inference": "remote::hf::serverless", | inference. | -| | "memory": "meta-reference", | See https://hf.co/docs/api-inference. 
| +| vllm | { | Like local, but use vLLM for running LLM inference | +| | "inference": "vllm", | | +| | "memory": "meta-reference", | | +| | "safety": "meta-reference", | | +| | "agents": "meta-reference", | | +| | "telemetry": "meta-reference" | | +| | } | | ++------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+ +| tgi | { | Use TGI for running LLM inference | +| | "inference": "remote::tgi", | | +| | "memory": [ | | +| | "meta-reference", | | +| | "remote::chromadb", | | +| | "remote::pgvector" | | +| | ], | | +| | "safety": "meta-reference", | | +| | "agents": "meta-reference", | | +| | "telemetry": "meta-reference" | | +| | } | | ++------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+ +| bedrock | { | Use Amazon Bedrock APIs. | +| | "inference": "remote::bedrock", | | +| | "memory": "meta-reference", | | | | "safety": "meta-reference", | | | | "agents": "meta-reference", | | | | "telemetry": "meta-reference" | | @@ -140,31 +182,8 @@ llama stack build --list-templates | | "telemetry": "meta-reference" | | | | } | | +------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+ -| tgi | { | Use TGI for running LLM inference | -| | "inference": "remote::tgi", | | -| | "memory": [ | | -| | "meta-reference", | | -| | "remote::chromadb", | | -| | "remote::pgvector" | | -| | ], | | -| | "safety": "meta-reference", | | -| | "agents": "meta-reference", | | -| | "telemetry": "meta-reference" | | -| | } | | -+------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+ -| together | { | Use Together.ai for running LLM inference | -| | "inference": "remote::together", | | -| | "memory": [ | | -| | "meta-reference", | | -| | "remote::weaviate" | | -| | ], | | -| | "safety": "remote::together", | | -| | "agents": "meta-reference", | | -| | "telemetry": "meta-reference" | | -| | } | | -+------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+ -| vllm | { | Like local, but use vLLM for running LLM inference | -| | "inference": "vllm", | | +| hf-endpoint | { | Like local, but use Hugging Face Inference Endpoints for running LLM inference. | +| | "inference": "remote::hf::endpoint", | See https://hf.co/docs/api-endpoints. | | | "memory": "meta-reference", | | | | "safety": "meta-reference", | | | | "agents": "meta-reference", | | @@ -175,6 +194,7 @@ llama stack build --list-templates You may then pick a template to build your distribution with providers fitted to your liking. +For example, to build a distribution with TGI as the inference provider, you can run: ``` llama stack build --template tgi ``` @@ -182,14 +202,13 @@ llama stack build --template tgi ``` $ llama stack build --template tgi ... -... 
-You can now edit ~/meta-llama/llama-stack/tmp/configs/tgi-run.yaml and run `llama stack run ~/meta-llama/llama-stack/tmp/configs/tgi-run.yaml`
+You can now edit ~/.llama/distributions/llamastack-tgi/tgi-run.yaml and run `llama stack run ~/.llama/distributions/llamastack-tgi/tgi-run.yaml`
```

-#### Building from config file
+### 1.3 Building from a pre-existing build config file

- In addition to templates, you may customize the build to your liking by editing a config file and building from it with the following command.

-- The config file will be of contents like the ones in `llama_stack/distributions/templates/`.
+- The config file has contents like the ones in `llama_stack/templates/*build.yaml`.

```
$ cat llama_stack/templates/ollama/build.yaml
@@ -210,104 +229,106 @@ image_type: conda
llama stack build --config llama_stack/templates/ollama/build.yaml
```

-#### How to build distribution with Docker image
-
+### How to build a distribution with a Docker image
> [!TIP]
> Podman is supported as an alternative to Docker. Set `DOCKER_BINARY` to `podman` in your environment to use Podman.

To build a docker image, you may start off from a template and use the `--image-type docker` flag to specify `docker` as the build image type.

```
-llama stack build --template local --image-type docker
+llama stack build --template ollama --image-type docker
```

-Alternatively, you may use a config file and set `image_type` to `docker` in our `<name>-build.yaml` file, and run `llama stack build <name>-build.yaml`. The `<name>-build.yaml` will be of contents like:
-
```
-name: local-docker-example
-distribution_spec:
-  description: Use code from `llama_stack` itself to serve all llama stack APIs
-  docker_image: null
-  providers:
-    inference: meta-reference
-    memory: meta-reference-faiss
-    safety: meta-reference
-    agentic_system: meta-reference
-    telemetry: console
-image_type: docker
-```
-
-The following command allows you to build a Docker image with the name `<name>`
-```
-llama stack build --config <name>-build.yaml
-
-Dockerfile created successfully in /tmp/tmp.I0ifS2c46A/DockerfileFROM python:3.10-slim
-WORKDIR /app
+$ llama stack build --template ollama --image-type docker
...
+Dockerfile created successfully in /tmp/tmp.viA3a3Rdsg/Dockerfile
+FROM python:3.10-slim
...
-You can run it with: podman run -p 8000:8000 llamastack-docker-local
-Build spec configuration saved at ~/.llama/distributions/docker/docker-local-build.yaml
+
+You can now edit ~/meta-llama/llama-stack/tmp/configs/ollama-run.yaml and run `llama stack run ~/meta-llama/llama-stack/tmp/configs/ollama-run.yaml`
```

-After this step is successful, you should be able to find a run configuration spec in `~/.llama/builds/conda/tgi-run.yaml` with the following contents. You may edit this file to change the settings.
-
-As you can see, we did basic configuration above and configured:
-- inference to run on model `Meta-Llama3.1-8B-Instruct` (obtained from `llama model list`)
-- Llama Guard safety shield with model `Llama-Guard-3-1B`
-- Prompt Guard safety shield with model `Prompt-Guard-86M`
-
-For how these configurations are stored as yaml, checkout the file printed at the end of the configuration.
-
-Note that all configurations as well as models are stored in `~/.llama`
+After this step is successful, you should be able to find the built docker image and test it with `llama stack run <path/to/run.yaml>`.

## Step 2. Run

Now, let's start the Llama Stack Distribution Server.
You will need the YAML configuration file which was written out at the end of the `llama stack build` step.

```
-llama stack run 8b-instruct
+llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml
```

-You should see the Llama Stack server start and print the APIs that it is supporting
-
```
-$ llama stack run 8b-instruct
+$ llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml

-> initializing model parallel with size 1
-> initializing ddp with size 1
-> initializing pipeline with size 1
-Loaded in 19.28 seconds
-NCCL version 2.20.5+cuda12.4
-Finished model load YES READY
-Serving POST /inference/batch_chat_completion
-Serving POST /inference/batch_completion
-Serving POST /inference/chat_completion
-Serving POST /inference/completion
-Serving POST /safety/run_shield
-Serving POST /agentic_system/memory_bank/attach
-Serving POST /agentic_system/create
-Serving POST /agentic_system/session/create
-Serving POST /agentic_system/turn/create
-Serving POST /agentic_system/delete
-Serving POST /agentic_system/session/delete
-Serving POST /agentic_system/memory_bank/detach
-Serving POST /agentic_system/session/get
-Serving POST /agentic_system/step/get
-Serving POST /agentic_system/turn/get
-Listening on :::5000
-INFO: Started server process [453333]
+Loaded model...
+Serving API datasets
+ GET /datasets/get
+ GET /datasets/list
+ POST /datasets/register
+Serving API inspect
+ GET /health
+ GET /providers/list
+ GET /routes/list
+Serving API inference
+ POST /inference/chat_completion
+ POST /inference/completion
+ POST /inference/embeddings
+Serving API scoring_functions
+ GET /scoring_functions/get
+ GET /scoring_functions/list
+ POST /scoring_functions/register
+Serving API scoring
+ POST /scoring/score
+ POST /scoring/score_batch
+Serving API memory_banks
+ GET /memory_banks/get
+ GET /memory_banks/list
+ POST /memory_banks/register
+Serving API memory
+ POST /memory/insert
+ POST /memory/query
+Serving API safety
+ POST /safety/run_shield
+Serving API eval
+ POST /eval/evaluate
+ POST /eval/evaluate_batch
+ POST /eval/job/cancel
+ GET /eval/job/result
+ GET /eval/job/status
+Serving API shields
+ GET /shields/get
+ GET /shields/list
+ POST /shields/register
+Serving API datasetio
+ GET /datasetio/get_rows_paginated
+Serving API telemetry
+ GET /telemetry/get_trace
+ POST /telemetry/log_event
+Serving API models
+ GET /models/get
+ GET /models/list
+ POST /models/register
+Serving API agents
+ POST /agents/create
+ POST /agents/session/create
+ POST /agents/turn/create
+ POST /agents/delete
+ POST /agents/session/delete
+ POST /agents/session/get
+ POST /agents/step/get
+ POST /agents/turn/get
+
+Listening on ['::', '0.0.0.0']:5000
+INFO: Started server process [2935911]
INFO: Waiting for application startup.
INFO: Application startup complete.
-INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
+INFO: Uvicorn running on http://['::', '0.0.0.0']:5000 (Press CTRL+C to quit)
+INFO: 2401:db00:35c:2d2b:face:0:c9:0:54678 - "GET /models/list HTTP/1.1" 200 OK
```

-> [!NOTE]
-> Configuration is in `~/.llama/builds/local/conda/tgi-run.yaml`. Feel free to increase `max_seq_len`.
-
> [!IMPORTANT]
> The "local" distribution inference server currently only supports CUDA. It will not work on Apple Silicon machines.

> [!TIP]
> You might need to use the flag `--disable-ipv6` to disable IPv6 support.
-
-This server is running a Llama model locally.
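+
+Once the server is running, you can sanity-check it from another terminal with `curl`. This is a minimal sketch, assuming the server is reachable on localhost at the default port 5000 shown in the output above; the `/health` and `/models/list` routes are taken from the list of served APIs:
+
+```
+# Smoke test against a locally running stack (localhost and port 5000 assumed)
+curl http://localhost:5000/health
+curl http://localhost:5000/models/list
+```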