diff --git a/docs/source/distribution_dev/index.md b/docs/source/distribution_dev/index.md
deleted file mode 100644
index 8a46b70fb..000000000
--- a/docs/source/distribution_dev/index.md
+++ /dev/null
@@ -1,20 +0,0 @@
-# Developer Guide
-
-```{toctree}
-:hidden:
-:maxdepth: 1
-
-building_distro
-```
-
-## Key Concepts
-
-### API Provider
-A Provider is what makes the API real -- they provide the actual implementation backing the API.
-
-As an example, for Inference, we could have the implementation be backed by open source libraries like `[ torch | vLLM | TensorRT ]` as possible options.
-
-A provider can also be just a pointer to a remote REST service -- for example, cloud providers or dedicated inference providers could serve these APIs.
-
-### Distribution
-A Distribution is where APIs and Providers are assembled together to provide a consistent whole to the end application developer. You can mix-and-match providers -- some could be backed by local code and some could be remote. As a hobbyist, you can serve a small model locally, but can choose a cloud provider for a large model. Regardless, the higher level APIs your app needs to work with don't need to change at all. You can even imagine moving across the server / mobile-device boundary as well always using the same uniform set of APIs for developing Generative AI applications.
diff --git a/docs/source/distribution_dev/building_distro.md b/docs/source/distributions/building_distro.md
similarity index 94%
rename from docs/source/distribution_dev/building_distro.md
rename to docs/source/distributions/building_distro.md
index b5738d998..dbc2e7ed9 100644
--- a/docs/source/distribution_dev/building_distro.md
+++ b/docs/source/distributions/building_distro.md
@@ -1,15 +1,22 @@
-# Developer Guide: Assemble a Llama Stack Distribution
+# Build your own Distribution
 
-This guide will walk you through the steps to get started with building a Llama Stack distributiom from scratch with your choice of API providers. Please see the [Getting Started Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html) if you just want the basic steps to start a Llama Stack distribution.
+This guide will walk you through the steps to get started with building a Llama Stack distribution from scratch with your choice of API providers.
 
-## Step 1. Build
-### Llama Stack Build Options
+## Llama Stack Build
+
+In order to build your own distribution, we recommend you clone the `llama-stack` repository.
+
 ```
+git clone git@github.com:meta-llama/llama-stack.git
+cd llama-stack
+pip install -e .
+
 llama stack build -h
 ```
+
 We will start build our distribution (in the form of a Conda environment, or Docker image). In this step, we will specify:
 - `name`: the name for our distribution (e.g. `my-stack`)
 - `image_type`: our build image type (`conda | docker`)
@@ -240,7 +247,7 @@ After this step is successful, you should be able to find the built docker image
 
 ::::
 
-## Step 2. Run
+## Running your Stack server
 Now, let's start the Llama Stack Distribution Server. You will need the YAML configuration file which was written out at the end by the `llama stack build` step.
 
 ```
@@ -250,11 +257,6 @@ llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-
 ```
 
 $ llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml
-Loaded model...
-Serving API datasets
- GET /datasets/get
- GET /datasets/list
- POST /datasets/register
 Serving API inspect
  GET /health
  GET /providers/list
@@ -263,41 +265,7 @@ Serving API inference
  POST /inference/chat_completion
  POST /inference/completion
  POST /inference/embeddings
-Serving API scoring_functions
- GET /scoring_functions/get
- GET /scoring_functions/list
- POST /scoring_functions/register
-Serving API scoring
- POST /scoring/score
- POST /scoring/score_batch
-Serving API memory_banks
- GET /memory_banks/get
- GET /memory_banks/list
- POST /memory_banks/register
-Serving API memory
- POST /memory/insert
- POST /memory/query
-Serving API safety
- POST /safety/run_shield
-Serving API eval
- POST /eval/evaluate
- POST /eval/evaluate_batch
- POST /eval/job/cancel
- GET /eval/job/result
- GET /eval/job/status
-Serving API shields
- GET /shields/get
- GET /shields/list
- POST /shields/register
-Serving API datasetio
- GET /datasetio/get_rows_paginated
-Serving API telemetry
- GET /telemetry/get_trace
- POST /telemetry/log_event
-Serving API models
- GET /models/get
- GET /models/list
- POST /models/register
+...
 Serving API agents
  POST /agents/create
  POST /agents/session/create
@@ -316,8 +284,6 @@ INFO: Uvicorn running on http://['::', '0.0.0.0']:5000 (Press CTRL+C to quit
 INFO: 2401:db00:35c:2d2b:face:0:c9:0:54678 - "GET /models/list HTTP/1.1" 200 OK
 ```
 
-> [!IMPORTANT]
-> The "local" distribution inference server currently only supports CUDA. It will not work on Apple Silicon machines.
+### Troubleshooting
 
-> [!TIP]
-> You might need to use the flag `--disable-ipv6` to Disable IPv6 support
+If you encounter any issues, search through our [GitHub Issues](https://github.com/meta-llama/llama-stack/issues), or file a new issue.
diff --git a/docs/source/distributions/index.md b/docs/source/distributions/index.md
index 3d4089b19..c80353f00 100644
--- a/docs/source/distributions/index.md
+++ b/docs/source/distributions/index.md
@@ -1,4 +1,13 @@
 # Starting a Llama Stack
+```{toctree}
+:maxdepth: 3
+:hidden:
+
+self_hosted_distro/index
+remote_hosted_distro/index
+building_distro
+ondevice_distro/index
+```
 
 As mentioned in the [Concepts](../concepts/index), Llama Stack Distributions are specific pre-packaged versions of the Llama Stack. These templates make it easy to get started quickly.
@@ -19,56 +28,9 @@ If so, we suggest:
   - [distribution-ollama](self_hosted_distro/ollama)
 - **Do you have an API key for a remote inference provider like Fireworks, Together, etc.?**
   If so, we suggest:
-  - [distribution-together](#remote-hosted-distributions)
-  - [distribution-fireworks](#remote-hosted-distributions)
+  - [distribution-together](remote_hosted_distro/index)
+  - [distribution-fireworks](remote_hosted_distro/index)
 - **Do you want to run Llama Stack inference on your iOS / Android device**
   If so, we suggest:
   - [iOS](ondevice_distro/ios_sdk)
-  - [Android](ondevice_distro/android_sdk) (coming soon)
-
-
-## Remote-Hosted Distributions
-
-Remote-Hosted distributions are available endpoints serving Llama Stack API that you can directly connect to.
-
-| Distribution | Endpoint | Inference | Agents | Memory | Safety | Telemetry |
-|-------------|----------|-----------|---------|---------|---------|------------|
-| Together | [https://llama-stack.together.ai](https://llama-stack.together.ai) | remote::together | meta-reference | remote::weaviate | meta-reference | meta-reference |
-| Fireworks | [https://llamastack-preview.fireworks.ai](https://llamastack-preview.fireworks.ai) | remote::fireworks | meta-reference | remote::weaviate | meta-reference | meta-reference |
-
-You can use `llama-stack-client` to interact with these endpoints. For example, to list the available models served by the Fireworks endpoint:
-
-```bash
-$ pip install llama-stack-client
-$ llama-stack-client configure --endpoint https://llamastack-preview.fireworks.ai
-$ llama-stack-client models list
-```
-
-## On-Device Distributions
-
-On-device distributions are Llama Stack distributions that run locally on your iOS / Android device.
-
-
-## Building Your Own Distribution
-
- talk about llama stack build --image-type conda, etc.
-
-### Prerequisites
-
-```bash
-$ git clone git@github.com:meta-llama/llama-stack.git
-```
-
-
-### Troubleshooting
-
-- If you encounter any issues, search through our [GitHub Issues](https://github.com/meta-llama/llama-stack/issues), or file an new issue.
-- Use `--port ` flag to use a different port number. For docker run, update the `-p :` flag.
-
-
-```{toctree}
-:maxdepth: 3
-
-remote_hosted_distro/index
-ondevice_distro/index
-```
+  - Android (coming soon)
diff --git a/docs/source/distributions/ondevice_distro/index.md b/docs/source/distributions/ondevice_distro/index.md
index de1850dbd..cb2fe1959 100644
--- a/docs/source/distributions/ondevice_distro/index.md
+++ b/docs/source/distributions/ondevice_distro/index.md
@@ -1,6 +1,12 @@
+# On-Device Distributions
 ```{toctree}
 :maxdepth: 1
+:hidden:
 
 ios_sdk
 ```
+
+On-device distributions are Llama Stack distributions that run locally on your iOS / Android device.
+
+Currently, we only support the [iOS SDK](ios_sdk); support for Android is coming soon.
diff --git a/docs/source/index.md b/docs/source/index.md
index cf58537bc..9cabc375c 100644
--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -96,5 +96,4 @@ getting_started/index
 concepts/index
 distributions/index
 contributing/index
-distribution_dev/index
 ```
diff --git a/docs/source/references/llama_cli_reference/index.md b/docs/source/references/llama_cli_reference/index.md
index aa2ecebf7..c751a4987 100644
--- a/docs/source/references/llama_cli_reference/index.md
+++ b/docs/source/references/llama_cli_reference/index.md
@@ -29,7 +29,7 @@ You have two ways to install Llama Stack:
 ## `llama` subcommands
 1. `download`: `llama` cli tools supports downloading the model from Meta or Hugging Face.
 2. `model`: Lists available models and their properties.
-3. `stack`: Allows you to build and run a Llama Stack server. You can read more about this [here](../distribution_dev/building_distro.md).
+3. `stack`: Allows you to build and run a Llama Stack server. You can read more about this [here](../distributions/building_distro).
 
 ### Sample Usage
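
For a quick sanity check of the build-and-run flow covered by the renamed `building_distro.md` above, here is a minimal sketch. It only uses commands that appear in the docs being changed; the YAML path, build name, and port 5000 are taken from the sample output in this diff and will likely differ for your own build:

```bash
# Minimal sketch of the workflow described above. Paths, build name, and
# port are illustrative (copied from the sample output) and may differ.

# Clone the repository and install the CLI in editable mode
git clone git@github.com:meta-llama/llama-stack.git
cd llama-stack
pip install -e .

# Build a distribution (you will be prompted for a name, image type, and providers)
llama stack build

# Start the server with the run config written out by the build step
llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml

# From another terminal, verify the server is up via endpoints shown in the
# sample server log (which listens on port 5000 there)
curl http://localhost:5000/health
curl http://localhost:5000/models/list
```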