diff --git a/README.md b/README.md
index 238475840..fef556a73 100644
--- a/README.md
+++ b/README.md
@@ -90,10 +90,10 @@ The `llama` CLI makes it easy to work with the Llama Stack set of tools. Please
 * [CLI reference](docs/cli_reference.md)
   * Guide using `llama` CLI to work with Llama models (download, study prompts), and building/starting a Llama Stack distribution.
 * [Getting Started](docs/getting_started.md)
-  * Guide to build and run a Llama Stack server.
+  * Guide to start a Llama Stack server.
+  * [Jupyter notebook](./docs/getting_started.ipynb) walking through how to use the llama_stack_client APIs for simple text and vision inference
 * [Contributing](CONTRIBUTING.md)
-
 
 ## Llama Stack Client SDK
 
 | **Language** | **Client SDK** | **Package** |
@@ -104,3 +104,5 @@ The `llama` CLI makes it easy to work with the Llama Stack set of tools. Please
 | Kotlin | [llama-stack-client-kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) |
 
 Check out our client SDKs for connecting to Llama Stack server in your preferred language, you can choose from [python](https://github.com/meta-llama/llama-stack-client-python), [node](https://github.com/meta-llama/llama-stack-client-node), [swift](https://github.com/meta-llama/llama-stack-client-swift), and [kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) programming languages to quickly build your applications.
+
+You can find more example scripts with client SDKs to talk with the Llama Stack server in our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repo.
diff --git a/docs/getting_started.md b/docs/getting_started.md
index 6c8c902c0..3eebf8bbc 100644
--- a/docs/getting_started.md
+++ b/docs/getting_started.md
@@ -1,45 +1,9 @@
-# llama-stack
-
-[![PyPI - Downloads](https://img.shields.io/pypi/dm/llama-stack)](https://pypi.org/project/llama-stack/)
-[![Discord](https://img.shields.io/discord/1257833999603335178)](https://discord.gg/llama-stack)
-
-This repository contains the specifications and implementations of the APIs which are part of the Llama Stack.
-
-The Llama Stack defines and standardizes the building blocks needed to bring generative AI applications to market. These blocks span the entire development lifecycle: from model training and fine-tuning, through product evaluation, to invoking AI agents in production. Beyond definition, we're developing open-source versions and partnering with cloud providers, ensuring developers can assemble AI solutions using consistent, interlocking pieces across platforms. The ultimate goal is to accelerate innovation in the AI space.
-
-The Stack APIs are rapidly improving, but still very much work in progress and we invite feedback as well as direct contributions.
-
-
-## APIs
-
-The Llama Stack consists of the following set of APIs:
-
-- Inference
-- Safety
-- Memory
-- Agentic System
-- Evaluation
-- Post Training
-- Synthetic Data Generation
-- Reward Scoring
-
-Each of the APIs themselves is a collection of REST endpoints.
-
-## API Providers
-
-A Provider is what makes the API real -- they provide the actual implementation backing the API.
-
-As an example, for Inference, we could have the implementation be backed by open source libraries like `[ torch | vLLM | TensorRT ]` as possible options.
-
-A provider can also be just a pointer to a remote REST service -- for example, cloud providers or dedicated inference providers could serve these APIs.
-
-
-## Llama Stack Distribution
-
-A Distribution is where APIs and Providers are assembled together to provide a consistent whole to the end application developer. You can mix-and-match providers -- some could be backed by local code and some could be remote. As a hobbyist, you can serve a small model locally, but can choose a cloud provider for a large model. Regardless, the higher level APIs your app needs to work with don't need to change at all. You can even imagine moving across the server / mobile-device boundary as well always using the same uniform set of APIs for developing Generative AI applications.
+# Getting Started with Llama Stack
+This guide will walk you through the steps to get started with an end-to-end flow for Llama Stack. It mainly focuses on building a Llama Stack distribution and starting up a Llama Stack server. Please see our [documentation](../README.md) for what you can do with Llama Stack, and [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main) for example apps built with Llama Stack.
 
 ## Installation
+The `llama` CLI tool helps you set up and use the Llama toolchain & agentic systems. It should be available on your path after installing the `llama-stack` package.
 
 You can install this repository as a [package](https://pypi.org/project/llama-stack/) with `pip install llama-stack`
@@ -57,26 +21,40 @@ cd llama-stack
 $CONDA_PREFIX/bin/pip install -e .
 ```
 
-# Getting Started
+For what you can do with the `llama` CLI, please refer to the [CLI Reference](./cli_reference.md).
 
-The `llama` CLI tool helps you setup and use the Llama toolchain & agentic systems. It should be available on your path after installing the `llama-stack` package.
+## Quickly Starting a Llama Stack Server
 
-This guides allows you to quickly get started with building and running a Llama Stack server in < 5 minutes!
+#### Starting up the server via Docker
 
-You may also checkout this [notebook](https://github.com/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb) for trying out out demo scripts.
+We provide two pre-built Docker images of the Llama Stack distribution, which can be found at the following links.
+- [llamastack-local-gpu](https://hub.docker.com/repository/docker/llamastack/llamastack-local-gpu/general)
+  - This is a packaged version with our local meta-reference implementations, where you will be running inference locally with downloaded Llama model checkpoints.
+- [llamastack-local-cpu](https://hub.docker.com/repository/docker/llamastack/llamastack-local-cpu/general)
+  - This is a lite version with remote inference where you can hook up to your favorite remote inference framework (e.g. ollama, fireworks, together, tgi) to run inference without a GPU.
 
-## Quick Cheatsheet
-
-#### Via docker
+> [!NOTE]
+> For GPU inference, you need to set the following environment variable to specify the local directory containing your model checkpoints, and enable GPU access when starting the Docker container.
 
 ```
-docker run -it -p 5000:5000 -v ~/.llama:/root/.llama --gpus=all llamastack-local-gpu
+export LLAMA_CHECKPOINT_DIR=~/.llama
 ```
 
 > [!NOTE]
 > `~/.llama` should be the path containing downloaded weights of Llama models.
 
-#### Via conda
+To download and start running a pre-built Docker container, you may use the following command:
+
+```
+docker run -it -p 5000:5000 -v ~/.llama:/root/.llama --gpus=all llamastack/llamastack-local-gpu
+```
+
+> [!TIP]
+> Pro Tip: You can use `docker compose up` to start up a distribution with remote providers (e.g. TGI) using [llamastack-local-cpu](https://hub.docker.com/repository/docker/llamastack/llamastack-local-cpu/general). Check out [these scripts](../llama_stack/distribution/docker/README.md) to help you get started.
+
+#### Build->Configure->Run Llama Stack server via conda
+You may also build a Llama Stack distribution from scratch, configure it, and start running it. This is useful for developing on Llama Stack.
+
 **`llama stack build`**
 - You'll be prompted to enter build information interactively.
 ```
@@ -182,6 +160,7 @@ INFO:     Application startup complete.
 INFO:     Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
 ```
 
+## Building a Distribution
 ## Step 1. Build
 In the following steps, imagine we'll be working with a `Meta-Llama3.1-8B-Instruct` model. We will name our build `8b-instruct` to help us remember the config. We will start build our distribution (in the form of a Conda environment, or Docker image). In this step, we will specify:
@@ -445,4 +424,7 @@ Similarly you can test safety (if you configured llama-guard and/or prompt-guard
 python -m llama_stack.apis.safety.client localhost 5000
 ```
 
+
+Check out our client SDKs for connecting to Llama Stack server in your preferred language, you can choose from [python](https://github.com/meta-llama/llama-stack-client-python), [node](https://github.com/meta-llama/llama-stack-client-node), [swift](https://github.com/meta-llama/llama-stack-client-swift), and [kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) programming languages to quickly build your applications.
+
 You can find more example scripts with client SDKs to talk with the Llama Stack server in our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repo.
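As a concrete companion to the SDK pointers added in the doc change above, here is a minimal sketch of calling a running Llama Stack server from Python. It assumes a server started as described above is listening on `localhost:5000` and that the `llama-stack-client` package is installed; the exact import paths, message types, and response fields follow the Python client SDK and may differ between versions, so treat them as assumptions rather than a definitive reference.

```python
# Minimal sketch: a chat completion against a local Llama Stack server.
# Assumes `pip install llama-stack-client` and a server on port 5000; the
# class/method/field names below follow llama-stack-client-python and may
# vary by SDK version.
from llama_stack_client import LlamaStackClient
from llama_stack_client.types import UserMessage

client = LlamaStackClient(base_url="http://localhost:5000")

response = client.inference.chat_completion(
    model="Llama3.1-8B-Instruct",  # must match a model served by your distribution
    messages=[UserMessage(role="user", content="Write a haiku about Llama Stack.")],
)
print(response.completion_message.content)
```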
None and "llamameta.net" in meta_url - _meta_download(model, meta_url, info) + assert "llamameta.net" in meta_url + _meta_download(model, meta_url, info) class ModelEntry(BaseModel): diff --git a/llama_stack/distribution/templates/local-bedrock-conda-example-build.yaml b/llama_stack/distribution/templates/build_configs/local-bedrock-conda-example-build.yaml similarity index 100% rename from llama_stack/distribution/templates/local-bedrock-conda-example-build.yaml rename to llama_stack/distribution/templates/build_configs/local-bedrock-conda-example-build.yaml diff --git a/llama_stack/distribution/templates/docker/llamastack-local-cpu/build.yaml b/llama_stack/distribution/templates/build_configs/local-cpu-docker-build.yaml similarity index 100% rename from llama_stack/distribution/templates/docker/llamastack-local-cpu/build.yaml rename to llama_stack/distribution/templates/build_configs/local-cpu-docker-build.yaml diff --git a/llama_stack/distribution/templates/local-databricks-build.yaml b/llama_stack/distribution/templates/build_configs/local-databricks-build.yaml similarity index 100% rename from llama_stack/distribution/templates/local-databricks-build.yaml rename to llama_stack/distribution/templates/build_configs/local-databricks-build.yaml diff --git a/llama_stack/distribution/templates/local-fireworks-build.yaml b/llama_stack/distribution/templates/build_configs/local-fireworks-build.yaml similarity index 100% rename from llama_stack/distribution/templates/local-fireworks-build.yaml rename to llama_stack/distribution/templates/build_configs/local-fireworks-build.yaml diff --git a/llama_stack/distribution/templates/local-build.yaml b/llama_stack/distribution/templates/build_configs/local-gpu-docker-build.yaml similarity index 87% rename from llama_stack/distribution/templates/local-build.yaml rename to llama_stack/distribution/templates/build_configs/local-gpu-docker-build.yaml index f10461256..01af1021e 100644 --- a/llama_stack/distribution/templates/local-build.yaml +++ b/llama_stack/distribution/templates/build_configs/local-gpu-docker-build.yaml @@ -1,4 +1,4 @@ -name: local +name: local-gpu distribution_spec: description: Use code from `llama_stack` itself to serve all llama stack APIs providers: @@ -7,4 +7,4 @@ distribution_spec: safety: meta-reference agents: meta-reference telemetry: meta-reference -image_type: conda +image_type: docker diff --git a/llama_stack/distribution/templates/local-hf-endpoint-build.yaml b/llama_stack/distribution/templates/build_configs/local-hf-endpoint-build.yaml similarity index 100% rename from llama_stack/distribution/templates/local-hf-endpoint-build.yaml rename to llama_stack/distribution/templates/build_configs/local-hf-endpoint-build.yaml diff --git a/llama_stack/distribution/templates/local-hf-serverless-build.yaml b/llama_stack/distribution/templates/build_configs/local-hf-serverless-build.yaml similarity index 100% rename from llama_stack/distribution/templates/local-hf-serverless-build.yaml rename to llama_stack/distribution/templates/build_configs/local-hf-serverless-build.yaml diff --git a/llama_stack/distribution/templates/local-ollama-build.yaml b/llama_stack/distribution/templates/build_configs/local-ollama-build.yaml similarity index 100% rename from llama_stack/distribution/templates/local-ollama-build.yaml rename to llama_stack/distribution/templates/build_configs/local-ollama-build.yaml diff --git a/llama_stack/distribution/templates/local-tgi-build.yaml b/llama_stack/distribution/templates/build_configs/local-tgi-build.yaml 
similarity index 100%
rename from llama_stack/distribution/templates/local-tgi-build.yaml
rename to llama_stack/distribution/templates/build_configs/local-tgi-build.yaml
diff --git a/llama_stack/distribution/templates/docker/llamastack-local-gpu/build.yaml b/llama_stack/distribution/templates/build_configs/local-tgi-chroma-docker-build.yaml
similarity index 53%
rename from llama_stack/distribution/templates/docker/llamastack-local-gpu/build.yaml
rename to llama_stack/distribution/templates/build_configs/local-tgi-chroma-docker-build.yaml
index 11d1ac01c..30715c551 100644
--- a/llama_stack/distribution/templates/docker/llamastack-local-gpu/build.yaml
+++ b/llama_stack/distribution/templates/build_configs/local-tgi-chroma-docker-build.yaml
@@ -1,11 +1,11 @@
-name: local-gpu
+name: local-tgi-chroma
 distribution_spec:
-  description: local meta reference
+  description: remote tgi inference + chromadb memory
   docker_image: null
   providers:
-    inference: meta-reference
+    inference: remote::tgi
     safety: meta-reference
     agents: meta-reference
-    memory: meta-reference
+    memory: remote::chromadb
     telemetry: meta-reference
 image_type: docker
diff --git a/llama_stack/distribution/templates/local-together-build.yaml b/llama_stack/distribution/templates/build_configs/local-together-build.yaml
similarity index 100%
rename from llama_stack/distribution/templates/local-together-build.yaml
rename to llama_stack/distribution/templates/build_configs/local-together-build.yaml
diff --git a/llama_stack/distribution/templates/local-vllm-build.yaml b/llama_stack/distribution/templates/build_configs/local-vllm-build.yaml
similarity index 100%
rename from llama_stack/distribution/templates/local-vllm-build.yaml
rename to llama_stack/distribution/templates/build_configs/local-vllm-build.yaml
diff --git a/llama_stack/distribution/templates/docker/llamastack-local-gpu/run.yaml b/llama_stack/distribution/templates/run_configs/local-run.yaml
similarity index 71%
rename from llama_stack/distribution/templates/docker/llamastack-local-gpu/run.yaml
rename to llama_stack/distribution/templates/run_configs/local-run.yaml
index 8fb02711b..7abf2b4dc 100644
--- a/llama_stack/distribution/templates/docker/llamastack-local-gpu/run.yaml
+++ b/llama_stack/distribution/templates/run_configs/local-run.yaml
@@ -1,16 +1,16 @@
 version: '2'
-built_at: '2024-10-08T17:42:33.690666'
-image_name: local-gpu
-docker_image: local-gpu
-conda_env: null
+built_at: '2024-10-08T17:40:45.325529'
+image_name: local
+docker_image: null
+conda_env: local
 apis:
-- memory
-- inference
-- agents
 - shields
-- safety
+- agents
 - models
+- memory
 - memory_banks
+- inference
+- safety
 providers:
   inference:
   - provider_id: meta-reference
@@ -25,8 +25,13 @@ providers:
   - provider_id: meta-reference
     provider_type: meta-reference
     config:
-      llama_guard_shield: null
-      prompt_guard_shield: null
+      llama_guard_shield:
+        model: Llama-Guard-3-1B
+        excluded_categories: []
+        disable_input_check: false
+        disable_output_check: false
+      prompt_guard_shield:
+        model: Prompt-Guard-86M
   memory:
   - provider_id: meta-reference
     provider_type: meta-reference
diff --git a/llama_stack/distribution/templates/docker/llamastack-local-cpu/run.yaml b/llama_stack/distribution/templates/run_configs/local-tgi-run.yaml
similarity index 60%
rename from llama_stack/distribution/templates/docker/llamastack-local-cpu/run.yaml
rename to llama_stack/distribution/templates/run_configs/local-tgi-run.yaml
index 6b107d972..ec3af742c 100644
--- a/llama_stack/distribution/templates/docker/llamastack-local-cpu/run.yaml
+++ b/llama_stack/distribution/templates/run_configs/local-tgi-run.yaml
@@ -1,29 +1,33 @@
 version: '2'
-built_at: '2024-10-08T17:42:07.505267'
-image_name: local-cpu
-docker_image: local-cpu
-conda_env: null
+built_at: '2024-10-08T17:40:45.325529'
+image_name: local
+docker_image: null
+conda_env: local
 apis:
+- shields
 - agents
-- inference
 - models
 - memory
-- safety
-- shields
 - memory_banks
+- inference
+- safety
 providers:
   inference:
-  - provider_id: remote::ollama
-    provider_type: remote::ollama
+  - provider_id: tgi0
+    provider_type: remote::tgi
     config:
-      host: localhost
-      port: 6000
+      url: http://127.0.0.1:5009
   safety:
   - provider_id: meta-reference
     provider_type: meta-reference
     config:
-      llama_guard_shield: null
-      prompt_guard_shield: null
+      llama_guard_shield:
+        model: Llama-Guard-3-1B
+        excluded_categories: []
+        disable_input_check: false
+        disable_output_check: false
+      prompt_guard_shield:
+        model: Prompt-Guard-86M
   memory:
   - provider_id: meta-reference
     provider_type: meta-reference
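The `llama_stack/cli/download.py` change earlier in this patch makes `llama download` accept a comma-separated list of model IDs and loop over them. The sketch below isolates just that parsing step so the behavior is easy to see; `parse_model_ids` is a hypothetical helper name used for illustration, the split/strip expression mirrors the diff, and the empty-entry filter is an extra guard added here.

```python
# Sketch of the comma-separated --model-id handling introduced in
# llama_stack/cli/download.py. `parse_model_ids` is a hypothetical helper;
# the split/strip expression mirrors the diff, while the `if` filter that
# drops empty entries is an extra guard added for this example.
def parse_model_ids(raw_model_id: str) -> list[str]:
    """Turn 'A,B, C' into ['A', 'B', 'C'], skipping empty entries."""
    return [model_id.strip() for model_id in raw_model_id.split(",") if model_id.strip()]


if __name__ == "__main__":
    # e.g. `llama download --source meta --model-id Llama3.1-8B-Instruct,Prompt-Guard-86M`
    for model_id in parse_model_ids("Llama3.1-8B-Instruct, Prompt-Guard-86M"):
        print(f"would download: {model_id}")
```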