From 302fa5c4bbdc44defb42046032a8253c23de347e Mon Sep 17 00:00:00 2001
From: Xi Yan
Date: Mon, 21 Oct 2024 09:01:22 -0700
Subject: [PATCH] build/developer cookbook/new api provider

---
 docs/building_distro.md    | 270 +++++++++++++++++++++++++++++++++++++
 docs/developer_cookbook.md |  41 ++++++
 docs/getting_started.md    | 246 +--------------------------------
 docs/new_api_provider.md   |  20 +++
 4 files changed, 337 insertions(+), 240 deletions(-)
 create mode 100644 docs/building_distro.md
 create mode 100644 docs/developer_cookbook.md
 create mode 100644 docs/new_api_provider.md

diff --git a/docs/building_distro.md b/docs/building_distro.md
new file mode 100644
index 000000000..05e5c09bb
--- /dev/null
+++ b/docs/building_distro.md
@@ -0,0 +1,270 @@
# Building a Llama Stack Distribution

This guide will walk you through the steps to get started building a Llama Stack distribution from scratch with your choice of API providers. Please see the [Getting Started Guide](./getting_started.md) if you just want the basic steps to start a Llama Stack distribution.

## Step 1. Build
In the following steps, imagine we'll be working with a `Meta-Llama3.1-8B-Instruct` model. We will name our build `8b-instruct` to help us remember the config. We will start building our distribution (in the form of a Conda environment or a Docker image). In this step, we will specify:
- `name`: the name for our distribution (e.g. `8b-instruct`)
- `image_type`: our build image type (`conda | docker`)
- `distribution_spec`: our distribution specs for specifying API providers
  - `description`: a short description of the configurations for the distribution
  - `providers`: specifies the underlying implementation for serving each API endpoint
  - `image_type`: `conda` | `docker` to specify whether to build the distribution as a Docker image or a Conda environment.

At the end of the build command, a `<name>-build.yaml` file storing the build configurations will be generated and saved at the output file path printed at the end of the command.

#### Building from scratch
- For a new user, you could start off by running `llama stack build`, which will take you through an interactive wizard where you will be prompted to enter the build configurations.
```
llama stack build
```

Running the command above will allow you to fill in the configuration to build your Llama Stack distribution; you will see the following output.

```
> Enter an unique name for identifying your Llama Stack build distribution (e.g. my-local-stack): 8b-instruct
> Enter the image type you want your distribution to be built with (docker or conda): conda

 Llama Stack is composed of several APIs working together. Let's configure the providers (implementations) you want to use for these APIs.
> Enter the API provider for the inference API: (default=meta-reference): meta-reference
> Enter the API provider for the safety API: (default=meta-reference): meta-reference
> Enter the API provider for the agents API: (default=meta-reference): meta-reference
> Enter the API provider for the memory API: (default=meta-reference): meta-reference
> Enter the API provider for the telemetry API: (default=meta-reference): meta-reference

 > (Optional) Enter a short description for your Llama Stack distribution:

Build spec configuration saved at ~/.conda/envs/llamastack-my-local-llama-stack/8b-instruct-build.yaml
```

**Ollama (optional)**

If you plan to use Ollama for inference, you'll need to install the server [via these instructions](https://ollama.com/download).


#### Building from templates
- To build from alternative API providers, we provide distribution templates to get you started building a distribution backed by different providers.

The following command will show the available templates and their corresponding providers.
```
llama stack build --list-templates
```

![alt text](resources/list-templates.png)

You may then pick a template to build your distribution with providers suited to your liking.

```
llama stack build --template local-tgi --name my-tgi-stack
```

```
$ llama stack build --template local-tgi --name my-tgi-stack
...
...
Build spec configuration saved at ~/.conda/envs/llamastack-my-tgi-stack/my-tgi-stack-build.yaml
You may now run `llama stack configure my-tgi-stack` or `llama stack configure ~/.conda/envs/llamastack-my-tgi-stack/my-tgi-stack-build.yaml`
```

#### Building from config file
- In addition to templates, you may customize the build to your liking by editing a config file and building from it with the following command.

- The config file will look like the ones in `llama_stack/distribution/templates/`.

```
$ cat llama_stack/distribution/templates/local-ollama-build.yaml

name: local-ollama
distribution_spec:
  description: Like local, but use ollama for running LLM inference
  providers:
    inference: remote::ollama
    memory: meta-reference
    safety: meta-reference
    agents: meta-reference
    telemetry: meta-reference
image_type: conda
```

```
llama stack build --config llama_stack/distribution/templates/local-ollama-build.yaml
```

#### How to build distribution with Docker image

> [!TIP]
> Podman is supported as an alternative to Docker. Set `DOCKER_BINARY` to `podman` in your environment to use Podman.

To build a Docker image, you may start off from a template and use the `--image-type docker` flag to specify `docker` as the build image type.

```
llama stack build --template local --image-type docker --name docker-0
```

Alternatively, you may use a config file and set `image_type` to `docker` in our `<name>-build.yaml` file, and run `llama stack build --config <name>-build.yaml`.
The `<name>-build.yaml` will look like this:

```
name: local-docker-example
distribution_spec:
  description: Use code from `llama_stack` itself to serve all llama stack APIs
  docker_image: null
  providers:
    inference: meta-reference
    memory: meta-reference-faiss
    safety: meta-reference
    agentic_system: meta-reference
    telemetry: console
image_type: docker
```

The following command allows you to build a Docker image with the name `<name>`:
```
llama stack build --config <name>-build.yaml

Dockerfile created successfully in /tmp/tmp.I0ifS2c46A/Dockerfile
FROM python:3.10-slim
WORKDIR /app
...
...
You can run it with: podman run -p 8000:8000 llamastack-docker-local
Build spec configuration saved at ~/.llama/distributions/docker/docker-local-build.yaml
```


## Step 2. Configure
After our distribution is built (either as a Docker image or a Conda environment), we will run the following command to configure it:
```
llama stack configure [<name> | <docker image name> | <path/to/name.build.yaml>]
```
- For `conda` environments: `<name>` would be the name of the build spec generated in Step 1.
- For `docker` images downloaded from Dockerhub, you could also use `<docker image name>` as the argument.
   - Run `docker images` to check the list of available images on your machine.

```
$ llama stack configure 8b-instruct

Configuring API: inference (meta-reference)
Enter value for model (existing: Meta-Llama3.1-8B-Instruct) (required):
Enter value for quantization (optional):
Enter value for torch_seed (optional):
Enter value for max_seq_len (existing: 4096) (required):
Enter value for max_batch_size (existing: 1) (required):

Configuring API: memory (meta-reference-faiss)

Configuring API: safety (meta-reference)
Do you want to configure llama_guard_shield? (y/n): y
Entering sub-configuration for llama_guard_shield:
Enter value for model (default: Llama-Guard-3-1B) (required):
Enter value for excluded_categories (default: []) (required):
Enter value for disable_input_check (default: False) (required):
Enter value for disable_output_check (default: False) (required):
Do you want to configure prompt_guard_shield? (y/n): y
Entering sub-configuration for prompt_guard_shield:
Enter value for model (default: Prompt-Guard-86M) (required):

Configuring API: agentic_system (meta-reference)
Enter value for brave_search_api_key (optional):
Enter value for bing_search_api_key (optional):
Enter value for wolfram_api_key (optional):

Configuring API: telemetry (console)

YAML configuration has been written to ~/.llama/builds/conda/8b-instruct-run.yaml
```

After this step succeeds, you should be able to find the run configuration spec in `~/.llama/builds/conda/8b-instruct-run.yaml`. You may edit this file to change the settings.

As you can see, with the basic configuration above we configured:
- inference to run on model `Meta-Llama3.1-8B-Instruct` (obtained from `llama model list`)
- Llama Guard safety shield with model `Llama-Guard-3-1B`
- Prompt Guard safety shield with model `Prompt-Guard-86M`

To see how these configurations are stored as YAML, check out the file printed at the end of the configuration.

Note that all configurations as well as models are stored in `~/.llama`.


## Step 3. Run
Now, let's start the Llama Stack Distribution Server. You will need the YAML configuration file which was written out at the end of the `llama stack configure` step.
+ +``` +llama stack run 8b-instruct +``` + +You should see the Llama Stack server start and print the APIs that it is supporting + +``` +$ llama stack run 8b-instruct + +> initializing model parallel with size 1 +> initializing ddp with size 1 +> initializing pipeline with size 1 +Loaded in 19.28 seconds +NCCL version 2.20.5+cuda12.4 +Finished model load YES READY +Serving POST /inference/batch_chat_completion +Serving POST /inference/batch_completion +Serving POST /inference/chat_completion +Serving POST /inference/completion +Serving POST /safety/run_shield +Serving POST /agentic_system/memory_bank/attach +Serving POST /agentic_system/create +Serving POST /agentic_system/session/create +Serving POST /agentic_system/turn/create +Serving POST /agentic_system/delete +Serving POST /agentic_system/session/delete +Serving POST /agentic_system/memory_bank/detach +Serving POST /agentic_system/session/get +Serving POST /agentic_system/step/get +Serving POST /agentic_system/turn/get +Listening on :::5000 +INFO: Started server process [453333] +INFO: Waiting for application startup. +INFO: Application startup complete. +INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit) +``` + +> [!NOTE] +> Configuration is in `~/.llama/builds/local/conda/8b-instruct-run.yaml`. Feel free to increase `max_seq_len`. + +> [!IMPORTANT] +> The "local" distribution inference server currently only supports CUDA. It will not work on Apple Silicon machines. + +> [!TIP] +> You might need to use the flag `--disable-ipv6` to Disable IPv6 support + +This server is running a Llama model locally. + +## Step 4. Test with Client +Once the server is setup, we can test it with a client to see the example outputs. +``` +cd /path/to/llama-stack +conda activate # any environment containing the llama-stack pip package will work + +python -m llama_stack.apis.inference.client localhost 5000 +``` + +This will run the chat completion client and query the distribution’s /inference/chat_completion API. + +Here is an example output: +``` +User>hello world, write me a 2 sentence poem about the moon +Assistant> Here's a 2-sentence poem about the moon: + +The moon glows softly in the midnight sky, +A beacon of wonder, as it passes by. +``` + +Similarly you can test safety (if you configured llama-guard and/or prompt-guard shields) by: + +``` +python -m llama_stack.apis.safety.client localhost 5000 +``` + + +Check out our client SDKs for connecting to Llama Stack server in your preferred language, you can choose from [python](https://github.com/meta-llama/llama-stack-client-python), [node](https://github.com/meta-llama/llama-stack-client-node), [swift](https://github.com/meta-llama/llama-stack-client-swift), and [kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) programming languages to quickly build your applications. + +You can find more example scripts with client SDKs to talk with the Llama Stack server in our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repo. diff --git a/docs/developer_cookbook.md b/docs/developer_cookbook.md new file mode 100644 index 000000000..206a05f28 --- /dev/null +++ b/docs/developer_cookbook.md @@ -0,0 +1,41 @@ +# Llama Stack Developer Cookbook + +Based on your developer needs, below are references to guides to help you get started. + +### Hosted Llama Stack Endpoint +* Developer Need: I want to connect to a Llama Stack endpoint to build my applications. 
* Effort: 1min
* Guide:
  - Check out our [DeepLearning course](https://www.deeplearning.ai/short-courses/introducing-multimodal-llama-3-2) on building Llama Stack apps on a pre-hosted Llama Stack endpoint.


### Local meta-reference Llama Stack Server
* Developer Need: I want to start a local Llama Stack server with my GPU using meta-reference implementations.
* Effort: 5min
* Guide:
  - Please see our [Getting Started Guide](./getting_started.md) on starting up a meta-reference Llama Stack server.

### Llama Stack Server with Remote Providers
* Developer Need: I want a Llama Stack distribution with a remote provider.
* Effort: 10min
* Guide:
  - Please see our [Distributions Guide](../distributions/) on starting up distributions with remote providers.


### On-Device (iOS) Llama Stack
* Developer Need: I want to use Llama Stack on-device.
* Effort: 1.5hr
* Guide:
  - Please see our [iOS Llama Stack SDK](../llama_stack/providers/impls/ios/inference) implementations.

### Assemble your own Llama Stack Distribution
* Developer Need: I want to assemble my own distribution with API providers to my liking.
* Effort: 30min
* Guide:
  - Please see our [Building Distribution](./building_distro.md) guide for assembling your own Llama Stack distribution with your choice of API providers.

### Adding a New API Provider
* Developer Need: I want to add a new API provider to Llama Stack.
* Effort: 3hr
* Guide:
  - Please see our [Adding a New API Provider](./new_api_provider.md) guide.
diff --git a/docs/getting_started.md b/docs/getting_started.md
index 3eebf8bbc..599331e8b 100644
--- a/docs/getting_started.md
+++ b/docs/getting_started.md
@@ -23,8 +23,7 @@ $CONDA_PREFIX/bin/pip install -e .
 
 For what you can do with the Llama CLI, please refer to [CLI Reference](./cli_reference.md).
 
-## Quick Starting Llama Stack Server
-
+## Starting Up Llama Stack Server
 #### Starting up server via docker
 
 We provide 2 pre-built Docker image of Llama Stack distribution, which can be found in the following links.
@@ -160,245 +159,8 @@ INFO: Application startup complete.
 INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
 ```
 
-## Building a Distribution
-## Step 1. Build
-In the following steps, imagine we'll be working with a `Meta-Llama3.1-8B-Instruct` model. We will name our build `8b-instruct` to help us remember the config. We will start build our distribution (in the form of a Conda environment, or Docker image). In this step, we will specify:
-- `name`: the name for our distribution (e.g. `8b-instruct`)
-- `image_type`: our build image type (`conda | docker`)
-- `distribution_spec`: our distribution specs for specifying API providers
-  - `description`: a short description of the configurations for the distribution
-  - `providers`: specifies the underlying implementation for serving each API endpoint
-  - `image_type`: `conda` | `docker` to specify whether to build the distribution in the form of Docker image or Conda environment.
-
-
-At the end of build command, we will generate `-build.yaml` file storing the build configurations.
-
-After this step is complete, a file named `-build.yaml` will be generated and saved at the output file path specified at the end of the command.
-
-#### Building from scratch
-- For a new user, we could start off with running `llama stack build` which will allow you to a interactively enter wizard where you will be prompted to enter build configurations.
-``` -llama stack build -``` - -Running the command above will allow you to fill in the configuration to build your Llama Stack distribution, you will see the following outputs. - -``` -> Enter an unique name for identifying your Llama Stack build distribution (e.g. my-local-stack): 8b-instruct -> Enter the image type you want your distribution to be built with (docker or conda): conda - - Llama Stack is composed of several APIs working together. Let's configure the providers (implementations) you want to use for these APIs. -> Enter the API provider for the inference API: (default=meta-reference): meta-reference -> Enter the API provider for the safety API: (default=meta-reference): meta-reference -> Enter the API provider for the agents API: (default=meta-reference): meta-reference -> Enter the API provider for the memory API: (default=meta-reference): meta-reference -> Enter the API provider for the telemetry API: (default=meta-reference): meta-reference - - > (Optional) Enter a short description for your Llama Stack distribution: - -Build spec configuration saved at ~/.conda/envs/llamastack-my-local-llama-stack/8b-instruct-build.yaml -``` - -**Ollama (optional)** - -If you plan to use Ollama for inference, you'll need to install the server [via these instructions](https://ollama.com/download). - - -#### Building from templates -- To build from alternative API providers, we provide distribution templates for users to get started building a distribution backed by different providers. - -The following command will allow you to see the available templates and their corresponding providers. -``` -llama stack build --list-templates -``` - -![alt text](resources/list-templates.png) - -You may then pick a template to build your distribution with providers fitted to your liking. - -``` -llama stack build --template local-tgi --name my-tgi-stack -``` - -``` -$ llama stack build --template local-tgi --name my-tgi-stack -... -... -Build spec configuration saved at ~/.conda/envs/llamastack-my-tgi-stack/my-tgi-stack-build.yaml -You may now run `llama stack configure my-tgi-stack` or `llama stack configure ~/.conda/envs/llamastack-my-tgi-stack/my-tgi-stack-build.yaml` -``` - -#### Building from config file -- In addition to templates, you may customize the build to your liking through editing config files and build from config files with the following command. - -- The config file will be of contents like the ones in `llama_stack/distributions/templates/`. - -``` -$ cat llama_stack/distribution/templates/local-ollama-build.yaml - -name: local-ollama -distribution_spec: - description: Like local, but use ollama for running LLM inference - providers: - inference: remote::ollama - memory: meta-reference - safety: meta-reference - agents: meta-reference - telemetry: meta-reference -image_type: conda -``` - -``` -llama stack build --config llama_stack/distribution/templates/local-ollama-build.yaml -``` - -#### How to build distribution with Docker image - -> [!TIP] -> Podman is supported as an alternative to Docker. Set `DOCKER_BINARY` to `podman` in your environment to use Podman. - -To build a docker image, you may start off from a template and use the `--image-type docker` flag to specify `docker` as the build image type. - -``` -llama stack build --template local --image-type docker --name docker-0 -``` - -Alternatively, you may use a config file and set `image_type` to `docker` in our `-build.yaml` file, and run `llama stack build -build.yaml`. 
The `-build.yaml` will be of contents like: - -``` -name: local-docker-example -distribution_spec: - description: Use code from `llama_stack` itself to serve all llama stack APIs - docker_image: null - providers: - inference: meta-reference - memory: meta-reference-faiss - safety: meta-reference - agentic_system: meta-reference - telemetry: console -image_type: docker -``` - -The following command allows you to build a Docker image with the name `` -``` -llama stack build --config -build.yaml - -Dockerfile created successfully in /tmp/tmp.I0ifS2c46A/DockerfileFROM python:3.10-slim -WORKDIR /app -... -... -You can run it with: podman run -p 8000:8000 llamastack-docker-local -Build spec configuration saved at ~/.llama/distributions/docker/docker-local-build.yaml -``` - - -## Step 2. Configure -After our distribution is built (either in form of docker or conda environment), we will run the following command to -``` -llama stack configure [ | | ] -``` -- For `conda` environments: would be the generated build spec saved from Step 1. -- For `docker` images downloaded from Dockerhub, you could also use as the argument. - - Run `docker images` to check list of available images on your machine. - -``` -$ llama stack configure 8b-instruct - -Configuring API: inference (meta-reference) -Enter value for model (existing: Meta-Llama3.1-8B-Instruct) (required): -Enter value for quantization (optional): -Enter value for torch_seed (optional): -Enter value for max_seq_len (existing: 4096) (required): -Enter value for max_batch_size (existing: 1) (required): - -Configuring API: memory (meta-reference-faiss) - -Configuring API: safety (meta-reference) -Do you want to configure llama_guard_shield? (y/n): y -Entering sub-configuration for llama_guard_shield: -Enter value for model (default: Llama-Guard-3-1B) (required): -Enter value for excluded_categories (default: []) (required): -Enter value for disable_input_check (default: False) (required): -Enter value for disable_output_check (default: False) (required): -Do you want to configure prompt_guard_shield? (y/n): y -Entering sub-configuration for prompt_guard_shield: -Enter value for model (default: Prompt-Guard-86M) (required): - -Configuring API: agentic_system (meta-reference) -Enter value for brave_search_api_key (optional): -Enter value for bing_search_api_key (optional): -Enter value for wolfram_api_key (optional): - -Configuring API: telemetry (console) - -YAML configuration has been written to ~/.llama/builds/conda/8b-instruct-run.yaml -``` - -After this step is successful, you should be able to find a run configuration spec in `~/.llama/builds/conda/8b-instruct-run.yaml` with the following contents. You may edit this file to change the settings. - -As you can see, we did basic configuration above and configured: -- inference to run on model `Meta-Llama3.1-8B-Instruct` (obtained from `llama model list`) -- Llama Guard safety shield with model `Llama-Guard-3-1B` -- Prompt Guard safety shield with model `Prompt-Guard-86M` - -For how these configurations are stored as yaml, checkout the file printed at the end of the configuration. - -Note that all configurations as well as models are stored in `~/.llama` - - -## Step 3. Run -Now, let's start the Llama Stack Distribution Server. You will need the YAML configuration file which was written out at the end by the `llama stack configure` step. 
-
-```
-llama stack run 8b-instruct
-```
-
-You should see the Llama Stack server start and print the APIs that it is supporting
-
-```
-$ llama stack run 8b-instruct
-
-> initializing model parallel with size 1
-> initializing ddp with size 1
-> initializing pipeline with size 1
-Loaded in 19.28 seconds
-NCCL version 2.20.5+cuda12.4
-Finished model load YES READY
-Serving POST /inference/batch_chat_completion
-Serving POST /inference/batch_completion
-Serving POST /inference/chat_completion
-Serving POST /inference/completion
-Serving POST /safety/run_shield
-Serving POST /agentic_system/memory_bank/attach
-Serving POST /agentic_system/create
-Serving POST /agentic_system/session/create
-Serving POST /agentic_system/turn/create
-Serving POST /agentic_system/delete
-Serving POST /agentic_system/session/delete
-Serving POST /agentic_system/memory_bank/detach
-Serving POST /agentic_system/session/get
-Serving POST /agentic_system/step/get
-Serving POST /agentic_system/turn/get
-Listening on :::5000
-INFO: Started server process [453333]
-INFO: Waiting for application startup.
-INFO: Application startup complete.
-INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
-```
-
-> [!NOTE]
-> Configuration is in `~/.llama/builds/local/conda/8b-instruct-run.yaml`. Feel free to increase `max_seq_len`.
-
-> [!IMPORTANT]
-> The "local" distribution inference server currently only supports CUDA. It will not work on Apple Silicon machines.
-
-> [!TIP]
-> You might need to use the flag `--disable-ipv6` to Disable IPv6 support
-
-This server is running a Llama model locally.
-
-## Step 4. Test with Client
+## Testing with Client
 Once the server is setup, we can test it with a client to see the example outputs.
 ```
 cd /path/to/llama-stack
 conda activate  # any environment containing the llama-stack pip package will work
 
 python -m llama_stack.apis.inference.client localhost 5000
 ```
 
 This will run the chat completion client and query the distribution’s /inference/chat_completion API.
 
 Here is an example output:
 ```
 User>hello world, write me a 2 sentence poem about the moon
 Assistant> Here's a 2-sentence poem about the moon:
 
 The moon glows softly in the midnight sky,
 A beacon of wonder, as it passes by.
 ```
 
 Similarly you can test safety (if you configured llama-guard and/or prompt-guard shields) by:
 
 ```
 python -m llama_stack.apis.safety.client localhost 5000
 ```
 
 
 Check out our client SDKs for connecting to Llama Stack server in your preferred language, you can choose from [python](https://github.com/meta-llama/llama-stack-client-python), [node](https://github.com/meta-llama/llama-stack-client-node), [swift](https://github.com/meta-llama/llama-stack-client-swift), and [kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) programming languages to quickly build your applications.
 
 You can find more example scripts with client SDKs to talk with the Llama Stack server in our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repo.
+
+
+## Advanced Guides
+Please see our [Building a Llama Stack Distribution](./building_distro.md) guide for more details on how to assemble your own Llama Stack Distribution.
diff --git a/docs/new_api_provider.md b/docs/new_api_provider.md
new file mode 100644
index 000000000..bfef3a6b3
--- /dev/null
+++ b/docs/new_api_provider.md
@@ -0,0 +1,20 @@
# Developer Guide: Adding a New API Provider

This guide contains references to walk you through adding a new API provider.

### Adding a new API provider
1. First, decide which API your provider falls into (e.g. Inference, Safety, Agents, Memory).
2. Decide whether your provider is a remote provider or an inline implementation. A remote provider is a provider that makes a remote request to a service. An inline provider is a provider whose implementation is executed locally. Check out the examples, and follow the structure to add your own API provider. See the following code pointers:
   - [Inference Remote Adapter](../llama_stack/providers/adapters/inference/)
   - [Inference Inline Provider](../llama_stack/providers/impls/)
3. [Build a Llama Stack distribution](./building_distro.md) with your API provider.
4. Test your code!

### Testing your newly added API providers
1. Start the Llama Stack server with your new API provider.
2. Test by sending a client request to the server (see the example sketch at the end of this guide).
3. Add tests for your newly added provider. See [tests/](../tests/) for example unit tests.
4. Test the supported functionalities for your provider using our provider test infrastructure. See [llama_stack/providers/tests/inference/test_inference.py](../llama_stack/providers/tests/inference/test_inference.py) for an example.

### Submit your PR
After you have fully tested your newly added API provider, submit a PR with the attached test plan, and we will help you verify the necessary requirements.
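
### Example: sending a test request
The sketch below illustrates step 2 of the testing checklist above. It is not the official client; it assumes the server started by `llama stack run` is listening on `localhost:5000` (as in the [Getting Started Guide](./getting_started.md)) and that `/inference/chat_completion` accepts a JSON payload with hypothetical `model`, `messages`, and `stream` fields, so adjust the request body to match the actual schema your provider serves.

```python
# Minimal smoke test for a newly added inference provider (sketch, not the official client).
# Assumption: POST /inference/chat_completion on localhost:5000 accepts a JSON body like the one below.
import requests


def chat_completion_smoke_test(host: str = "localhost", port: int = 5000) -> None:
    url = f"http://{host}:{port}/inference/chat_completion"
    payload = {
        "model": "Meta-Llama3.1-8B-Instruct",  # the model configured during `llama stack configure`
        "messages": [
            {"role": "user", "content": "hello world, write me a 2 sentence poem about the moon"}
        ],
        "stream": False,  # assumed flag; remove or adjust if your deployment streams responses
    }
    response = requests.post(url, json=payload, timeout=60)
    response.raise_for_status()  # fail loudly if the provider returns an error
    print(response.json())


if __name__ == "__main__":
    chat_completion_smoke_test()
```

For end-to-end coverage you can also use the documented CLI client (`python -m llama_stack.apis.inference.client localhost 5000`), which exercises the same endpoint.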