Merge branch 'main' into openapi_docs

This commit is contained in:
Xi Yan 2024-11-22 17:26:30 -08:00 committed by GitHub
commit 30d7ae8f19
29 changed files with 1467 additions and 2438 deletions


@ -1,8 +1,7 @@
# Developer Guide: Adding a New API Provider
# Adding a New API Provider
This guide contains references to walk you through adding a new API provider.
### Adding a new API provider
1. First, decide which API your provider falls into (e.g. Inference, Safety, Agents, Memory).
2. Decide whether your provider is a remote provider, or an inline implementation. A remote provider is a provider that makes a remote request to a service. An inline provider is a provider whose implementation is executed locally. Check out the examples, and follow the structure to add your own API provider. Please find the following code pointers:
@ -12,7 +11,7 @@ This guide contains references to walk you through adding a new API provider.
3. [Build a Llama Stack distribution](https://llama-stack.readthedocs.io/en/latest/distribution_dev/building_distro.html) with your API provider.
4. Test your code!
### Testing your newly added API providers
## Testing your newly added API providers
1. Start with an _integration test_ for your provider. That means we will instantiate the real provider, pass it real configuration, and if it is a remote service, we will actually hit the remote service. We **strongly** discourage mocking for these tests at the provider level. Llama Stack is first and foremost about integration, so we need to make sure stuff works end-to-end. See [llama_stack/providers/tests/inference/test_inference.py](../llama_stack/providers/tests/inference/test_inference.py) for an example.
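As a sketch, an integration test run for an inference provider might look like the following; the keyword filter and any credential environment variables are assumptions that depend on your provider, so check the test file above for the exact fixtures and options it expects:
```bash
# Sketch only: the -k keyword and any required credentials depend on your provider.
# Remote providers typically need their API keys exported first, e.g.
#   export MY_PROVIDER_API_KEY=...   # hypothetical variable name
pytest -s -v llama_stack/providers/tests/inference/test_inference.py -k "my_provider"
```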
@ -22,5 +21,6 @@ This guide contains references to walk you through adding a new API provider.
You can find more complex client scripts in the [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main) repo. Note down which scripts work and which do not work with your distribution.
### Submit your PR
## Submit your PR
After you have fully tested your newly added API provider, submit a PR with the attached test plan. You must have a Test Plan in the summary section of your PR.


@ -1,20 +0,0 @@
# Developer Guide
```{toctree}
:hidden:
:maxdepth: 1
building_distro
```
## Key Concepts
### API Provider
A Provider is what makes the API real -- they provide the actual implementation backing the API.
As an example, for Inference, we could have the implementation be backed by open source libraries like `[ torch | vLLM | TensorRT ]` as possible options.
A provider can also be just a pointer to a remote REST service -- for example, cloud providers or dedicated inference providers could serve these APIs.
### Distribution
A Distribution is where APIs and Providers are assembled together to provide a consistent whole to the end application developer. You can mix-and-match providers -- some could be backed by local code and some could be remote. As a hobbyist, you can serve a small model locally, but choose a cloud provider for a large model. Regardless, the higher level APIs your app needs to work with don't need to change at all. You can even imagine moving across the server / mobile-device boundary as well, always using the same uniform set of APIs for developing Generative AI applications.


@ -1,15 +1,22 @@
# Developer Guide: Assemble a Llama Stack Distribution
# Build your own Distribution
This guide will walk you through the steps to get started with building a Llama Stack distribution from scratch with your choice of API providers. Please see the [Getting Started Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html) if you just want the basic steps to start a Llama Stack distribution.
This guide will walk you through the steps to get started with building a Llama Stack distribution from scratch with your choice of API providers.
## Step 1. Build
### Llama Stack Build Options
## Llama Stack Build
In order to build your own distribution, we recommend you clone the `llama-stack` repository.
```
git clone git@github.com:meta-llama/llama-stack.git
cd llama-stack
pip install -e .
llama stack build -h
```
We will start building our distribution (in the form of a Conda environment or Docker image). In this step, we will specify:
- `name`: the name for our distribution (e.g. `my-stack`)
- `image_type`: our build image type (`conda | docker`)
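As an illustration, a build invocation might look like the sketch below; the flag names here are assumptions, so confirm the exact options with `llama stack build -h` on your version:
```bash
# Hypothetical flags shown for illustration; run `llama stack build -h` for the exact options.
llama stack build --name my-stack --image-type conda
```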
@ -240,7 +247,7 @@ After this step is successful, you should be able to find the built docker image
::::
## Step 2. Run
## Running your Stack server
Now, let's start the Llama Stack Distribution Server. You will need the YAML configuration file which was written out at the end of the `llama stack build` step.
```
@ -250,11 +257,6 @@ llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-
```
$ llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml
Loaded model...
Serving API datasets
GET /datasets/get
GET /datasets/list
POST /datasets/register
Serving API inspect
GET /health
GET /providers/list
@ -263,41 +265,7 @@ Serving API inference
POST /inference/chat_completion
POST /inference/completion
POST /inference/embeddings
Serving API scoring_functions
GET /scoring_functions/get
GET /scoring_functions/list
POST /scoring_functions/register
Serving API scoring
POST /scoring/score
POST /scoring/score_batch
Serving API memory_banks
GET /memory_banks/get
GET /memory_banks/list
POST /memory_banks/register
Serving API memory
POST /memory/insert
POST /memory/query
Serving API safety
POST /safety/run_shield
Serving API eval
POST /eval/evaluate
POST /eval/evaluate_batch
POST /eval/job/cancel
GET /eval/job/result
GET /eval/job/status
Serving API shields
GET /shields/get
GET /shields/list
POST /shields/register
Serving API datasetio
GET /datasetio/get_rows_paginated
Serving API telemetry
GET /telemetry/get_trace
POST /telemetry/log_event
Serving API models
GET /models/get
GET /models/list
POST /models/register
...
Serving API agents
POST /agents/create
POST /agents/session/create
@ -316,8 +284,6 @@ INFO: Uvicorn running on http://['::', '0.0.0.0']:5000 (Press CTRL+C to quit
INFO: 2401:db00:35c:2d2b:face:0:c9:0:54678 - "GET /models/list HTTP/1.1" 200 OK
```
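Once the server is up, you can sanity-check it from another terminal; this assumes it is listening on localhost port 5000, as in the log above:
```bash
# Quick checks against endpoints listed in the server output above
curl http://localhost:5000/health
curl http://localhost:5000/models/list
```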
> [!IMPORTANT]
> The "local" distribution inference server currently only supports CUDA. It will not work on Apple Silicon machines.
### Troubleshooting
> [!TIP]
> You might need to use the flag `--disable-ipv6` to disable IPv6 support.
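For example, a sketch of such a run (substitute your own run configuration path; the flag placement is an assumption, so verify with `llama stack run -h`):
```bash
# Assumes --disable-ipv6 is accepted by `llama stack run`; verify with `llama stack run -h`
llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml --disable-ipv6
```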
If you encounter any issues, search through our [GitHub Issues](https://github.com/meta-llama/llama-stack/issues), or file a new issue.


@ -1,4 +1,13 @@
# Starting a Llama Stack
```{toctree}
:maxdepth: 3
:hidden:
self_hosted_distro/index
remote_hosted_distro/index
building_distro
ondevice_distro/index
```
As mentioned in the [Concepts](../concepts/index) section, Llama Stack Distributions are specific pre-packaged versions of the Llama Stack. These templates make it easy to get started quickly.
@ -19,56 +28,9 @@ If so, we suggest:
- [distribution-ollama](self_hosted_distro/ollama)
- **Do you have an API key for a remote inference provider like Fireworks, Together, etc.?** If so, we suggest:
- [distribution-together](#remote-hosted-distributions)
- [distribution-fireworks](#remote-hosted-distributions)
- [distribution-together](remote_hosted_distro/index)
- [distribution-fireworks](remote_hosted_distro/index)
- **Do you want to run Llama Stack inference on your iOS / Android device?** If so, we suggest:
- [iOS](ondevice_distro/ios_sdk)
- [Android](ondevice_distro/android_sdk) (coming soon)
## Remote-Hosted Distributions
Remote-hosted distributions are readily available endpoints serving the Llama Stack API that you can connect to directly.
| Distribution | Endpoint | Inference | Agents | Memory | Safety | Telemetry |
|-------------|----------|-----------|---------|---------|---------|------------|
| Together | [https://llama-stack.together.ai](https://llama-stack.together.ai) | remote::together | meta-reference | remote::weaviate | meta-reference | meta-reference |
| Fireworks | [https://llamastack-preview.fireworks.ai](https://llamastack-preview.fireworks.ai) | remote::fireworks | meta-reference | remote::weaviate | meta-reference | meta-reference |
You can use `llama-stack-client` to interact with these endpoints. For example, to list the available models served by the Fireworks endpoint:
```bash
$ pip install llama-stack-client
$ llama-stack-client configure --endpoint https://llamastack-preview.fireworks.ai
$ llama-stack-client models list
```
## On-Device Distributions
On-device distributions are Llama Stack distributions that run locally on your iOS / Android device.
## Building Your Own Distribution
<TODO> talk about llama stack build --image-type conda, etc.
### Prerequisites
```bash
$ git clone git@github.com:meta-llama/llama-stack.git
```
### Troubleshooting
- If you encounter any issues, search through our [GitHub Issues](https://github.com/meta-llama/llama-stack/issues), or file a new issue.
- Use the `--port <PORT>` flag to use a different port number. For docker run, update the `-p <PORT>:<PORT>` flag accordingly.
```{toctree}
:maxdepth: 3
remote_hosted_distro/index
ondevice_distro/index
```
- Android (coming soon)


@ -1,6 +1,12 @@
# On-Device Distributions
```{toctree}
:maxdepth: 1
:hidden:
ios_sdk
```
On-device distributions are Llama Stack distributions that run locally on your iOS / Android device.
Currently, we only support the [iOS SDK](ios_sdk); support for Android is coming soon.


@ -72,11 +72,9 @@ Llama Stack already has a number of "adapters" available for some popular Infere
- Look at [Quick Start](getting_started/index) section to get started with Llama Stack.
- Learn more about [Llama Stack Concepts](concepts/index) to understand how different components fit together.
- Check out the [Zero to Hero](zero_to_hero_guide) guide to learn in detail how to build your first agent.
- Check out the [Zero to Hero](https://github.com/meta-llama/llama-stack/tree/main/docs/zero_to_hero_guide) guide to learn in detail how to build your first agent.
- See how you can use [Llama Stack Distributions](distributions/index) to get started with popular inference and other service providers.
We also provide a number of client-side SDKs to make it easier to connect to a Llama Stack server in your preferred language.
| **Language** | **Client SDK** | **Package** |
@ -97,5 +95,6 @@ concepts/index
distributions/index
contributing/index
distribution_dev/index
references/index
api_reference/index
```


@ -1,8 +1,11 @@
# References
- [Llama CLI](llama_cli_reference/index) for building and running your Llama Stack server
- [Llama Stack Client CLI](llama_stack_client_cli_reference/index) for interacting with your Llama Stack server
```{toctree}
:maxdepth: 2
:hidden:
```
# llama_cli_reference/index
# llama_cli_reference/download_models
# llama_stack_client_cli_reference/index
llama_cli_reference/index
llama_stack_client_cli_reference/index
llama_cli_reference/download_models


@ -1,4 +1,4 @@
# llama CLI Reference
# llama (server-side) CLI Reference
The `llama` CLI tool helps you set up and use the Llama Stack. It should be available on your path after installing the `llama-stack` package.
@ -29,7 +29,7 @@ You have two ways to install Llama Stack:
## `llama` subcommands
1. `download`: The `llama` CLI supports downloading models from Meta or Hugging Face.
2. `model`: Lists available models and their properties.
3. `stack`: Allows you to build and run a Llama Stack server. You can read more about this [here](../distribution_dev/building_distro.md).
3. `stack`: Allows you to build and run a Llama Stack server. You can read more about this [here](../distributions/building_distro).
### Sample Usage


@ -1,6 +1,6 @@
# llama-stack-client CLI Reference
# llama (client-side) CLI Reference
You may use the `llama-stack-client` to query information about the distribution.
The `llama-stack-client` CLI allows you to query information about the distribution.
## Basic Commands
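For example, a minimal session might look like the following; it assumes a Llama Stack server reachable at the endpoint you pass to `configure` (the commands mirror those shown in the distributions guide above):
```bash
# Point the client at your server (local or remote), then list the models it serves
llama-stack-client configure --endpoint http://localhost:5000
llama-stack-client models list
```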