More doc cleanup
This commit is contained in: parent 900b0556e7, commit c2c53d0272
6 changed files with 34 additions and 121 deletions
@@ -1,20 +0,0 @@
-# Developer Guide
-
-```{toctree}
-:hidden:
-:maxdepth: 1
-
-building_distro
-```
-
-## Key Concepts
-
-### API Provider
-A Provider is what makes the API real -- they provide the actual implementation backing the API.
-
-As an example, for Inference, we could have the implementation be backed by open source libraries like `[ torch | vLLM | TensorRT ]` as possible options.
-
-A provider can also be just a pointer to a remote REST service -- for example, cloud providers or dedicated inference providers could serve these APIs.
-
-### Distribution
-A Distribution is where APIs and Providers are assembled together to provide a consistent whole to the end application developer. You can mix-and-match providers -- some could be backed by local code and some could be remote. As a hobbyist, you can serve a small model locally, but can choose a cloud provider for a large model. Regardless, the higher level APIs your app needs to work with don't need to change at all. You can even imagine moving across the server / mobile-device boundary as well, always using the same uniform set of APIs for developing Generative AI applications.
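To see which APIs and provider implementations your installed `llama-stack` version actually knows about, the CLI exposes listing subcommands. The subcommand names below are assumptions on my part; if they are not present in your version, fall back to `llama stack --help`.

```bash
# Assumed subcommands -- verify with `llama stack --help` on your version.
llama stack list-apis                 # APIs a distribution can expose
llama stack list-providers inference  # providers that can back the Inference API
```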
@@ -1,15 +1,22 @@
-# Developer Guide: Assemble a Llama Stack Distribution
+# Build your own Distribution
 
-This guide will walk you through the steps to get started with building a Llama Stack distribution from scratch with your choice of API providers. Please see the [Getting Started Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html) if you just want the basic steps to start a Llama Stack distribution.
+This guide will walk you through the steps to get started with building a Llama Stack distribution from scratch with your choice of API providers.
 
-## Step 1. Build
-
-### Llama Stack Build Options
+## Llama Stack Build
 
+In order to build your own distribution, we recommend you clone the `llama-stack` repository.
+
 ```
+git clone git@github.com:meta-llama/llama-stack.git
+cd llama-stack
+pip install -e .
+
 llama stack build -h
 ```
 
 We will start building our distribution (in the form of a Conda environment, or Docker image). In this step, we will specify:
 - `name`: the name for our distribution (e.g. `my-stack`)
 - `image_type`: our build image type (`conda | docker`)
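As a sketch of what this step looks like in practice: the transcript below is illustrative only, since the interactive prompts and available flags vary between `llama-stack` versions, and the name `my-stack` is just a placeholder.

```bash
# Interactive build: the CLI asks for the distribution name and image type.
llama stack build
# > Enter a name for your Llama Stack distribution (e.g. my-local-stack): my-stack
# > Enter the image type (docker or conda): conda

# List the supported flags for non-interactive builds:
llama stack build -h
```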
@@ -240,7 +247,7 @@ After this step is successful, you should be able to find the built docker image
 ::::
 
-## Step 2. Run
+## Running your Stack server
 Now, let's start the Llama Stack Distribution Server. You will need the YAML configuration file which was written out at the end by the `llama stack build` step.
 
 ```
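If you built a Docker image rather than a Conda environment, a run sketch might look like the following. The image tag and volume mount here are assumptions; substitute the image name that `llama stack build` prints at the end, and adjust the port mapping to the port you serve on.

```bash
# Illustrative only: use the image tag printed by `llama stack build`.
docker run -it \
  -p 5000:5000 \
  -v ~/.llama:/root/.llama \
  llamastack-my-stack
```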
@@ -250,11 +257,6 @@ llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-
 ```
 $ llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml
 
-Loaded model...
-Serving API datasets
-GET /datasets/get
-GET /datasets/list
-POST /datasets/register
 Serving API inspect
 GET /health
 GET /providers/list
@@ -263,41 +265,7 @@ Serving API inference
 POST /inference/chat_completion
 POST /inference/completion
 POST /inference/embeddings
-Serving API scoring_functions
-GET /scoring_functions/get
-GET /scoring_functions/list
-POST /scoring_functions/register
-Serving API scoring
-POST /scoring/score
-POST /scoring/score_batch
-Serving API memory_banks
-GET /memory_banks/get
-GET /memory_banks/list
-POST /memory_banks/register
-Serving API memory
-POST /memory/insert
-POST /memory/query
-Serving API safety
-POST /safety/run_shield
-Serving API eval
-POST /eval/evaluate
-POST /eval/evaluate_batch
-POST /eval/job/cancel
-GET /eval/job/result
-GET /eval/job/status
-Serving API shields
-GET /shields/get
-GET /shields/list
-POST /shields/register
-Serving API datasetio
-GET /datasetio/get_rows_paginated
-Serving API telemetry
-GET /telemetry/get_trace
-POST /telemetry/log_event
-Serving API models
-GET /models/get
-GET /models/list
-POST /models/register
+...
 Serving API agents
 POST /agents/create
 POST /agents/session/create
@@ -316,8 +284,6 @@ INFO: Uvicorn running on http://['::', '0.0.0.0']:5000 (Press CTRL+C to quit
 INFO: 2401:db00:35c:2d2b:face:0:c9:0:54678 - "GET /models/list HTTP/1.1" 200 OK
 ```
 
-> [!IMPORTANT]
-> The "local" distribution inference server currently only supports CUDA. It will not work on Apple Silicon machines.
-
-> [!TIP]
-> You might need to use the flag `--disable-ipv6` to disable IPv6 support
+### Troubleshooting
+
+If you encounter any issues, search through our [GitHub Issues](https://github.com/meta-llama/llama-stack/issues), or file a new issue.
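Once the server logs that Uvicorn is listening, you can sanity-check it from another terminal. The port below assumes the default `5000` shown in the log output above; adjust it if you started the server on a different port.

```bash
# Hit two of the routes listed above to confirm the stack is serving requests.
curl http://localhost:5000/health        # inspect API health check
curl http://localhost:5000/models/list   # should return the registered models
```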
@@ -1,4 +1,13 @@
 # Starting a Llama Stack
 
+```{toctree}
+:maxdepth: 3
+:hidden:
+
+self_hosted_distro/index
+remote_hosted_distro/index
+building_distro
+ondevice_distro/index
+```
+
 As mentioned in the [Concepts](../concepts/index), Llama Stack Distributions are specific pre-packaged versions of the Llama Stack. These templates make it easy to get started quickly.
@@ -19,56 +28,9 @@ If so, we suggest:
 - [distribution-ollama](self_hosted_distro/ollama)
 
 - **Do you have an API key for a remote inference provider like Fireworks, Together, etc.?** If so, we suggest:
-  - [distribution-together](#remote-hosted-distributions)
-  - [distribution-fireworks](#remote-hosted-distributions)
+  - [distribution-together](remote_hosted_distro/index)
+  - [distribution-fireworks](remote_hosted_distro/index)
 
 - **Do you want to run Llama Stack inference on your iOS / Android device?** If so, we suggest:
   - [iOS](ondevice_distro/ios_sdk)
-  - [Android](ondevice_distro/android_sdk) (coming soon)
+  - Android (coming soon)
 
-
-## Remote-Hosted Distributions
-
-Remote-Hosted distributions are available endpoints serving the Llama Stack API that you can directly connect to.
-
-| Distribution | Endpoint | Inference | Agents | Memory | Safety | Telemetry |
-|--------------|----------|-----------|--------|--------|--------|-----------|
-| Together | [https://llama-stack.together.ai](https://llama-stack.together.ai) | remote::together | meta-reference | remote::weaviate | meta-reference | meta-reference |
-| Fireworks | [https://llamastack-preview.fireworks.ai](https://llamastack-preview.fireworks.ai) | remote::fireworks | meta-reference | remote::weaviate | meta-reference | meta-reference |
-
-You can use `llama-stack-client` to interact with these endpoints. For example, to list the available models served by the Fireworks endpoint:
-
-```bash
-$ pip install llama-stack-client
-$ llama-stack-client configure --endpoint https://llamastack-preview.fireworks.ai
-$ llama-stack-client models list
-```
-
-## On-Device Distributions
-
-On-device distributions are Llama Stack distributions that run locally on your iOS / Android device.
-
-
-## Building Your Own Distribution
-
-<TODO> talk about llama stack build --image-type conda, etc.
-
-### Prerequisites
-
-```bash
-$ git clone git@github.com:meta-llama/llama-stack.git
-```
-
-
-### Troubleshooting
-
-- If you encounter any issues, search through our [GitHub Issues](https://github.com/meta-llama/llama-stack/issues), or file a new issue.
-- Use the `--port <PORT>` flag to use a different port number. For docker run, update the `-p <PORT>:<PORT>` flag.
-
-
-```{toctree}
-:maxdepth: 3
-
-remote_hosted_distro/index
-ondevice_distro/index
-```
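The same `llama-stack-client` workflow shown above for the hosted endpoints also works against a stack you run yourself. The URL below assumes a local server on the default port `5000`; change it to wherever your distribution is listening.

```bash
# Point the client at a locally running distribution instead of a hosted one.
pip install llama-stack-client
llama-stack-client configure --endpoint http://localhost:5000
llama-stack-client models list
```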
@@ -1,6 +1,12 @@
+# On-Device Distributions
+
 ```{toctree}
 :maxdepth: 1
+:hidden:
 
 ios_sdk
 ```
 
+On-device distributions are Llama Stack distributions that run locally on your iOS / Android device.
+
+Currently, we only support the [iOS SDK](ios_sdk); support for Android is coming soon.
@@ -96,5 +96,4 @@ getting_started/index
 concepts/index
 distributions/index
 contributing/index
-distribution_dev/index
 ```
@@ -29,7 +29,7 @@ You have two ways to install Llama Stack:
 ## `llama` subcommands
 1. `download`: the `llama` CLI supports downloading models from Meta or Hugging Face.
 2. `model`: Lists available models and their properties.
-3. `stack`: Allows you to build and run a Llama Stack server. You can read more about this [here](../distribution_dev/building_distro.md).
+3. `stack`: Allows you to build and run a Llama Stack server. You can read more about this [here](../distributions/building_distro).
 
 ### Sample Usage
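As a small illustration of the first two subcommands (a hedged sketch, not necessarily the exact sample the guide shows; check each subcommand's `--help` for the flags your version accepts):

```bash
# List the models the CLI knows about, along with their properties.
llama model list

# Inspect the download options rather than guessing provider-specific flags.
llama download --help
```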