Auto-generate distro yamls + docs (#468)
# What does this PR do?

Automatically generates:

- build.yaml
- run.yaml
- run-with-safety.yaml
- parts of the markdown docs

for the distributions.

## Test Plan

At this point, this only updates the YAMLs and the docs. Some testing (especially with ollama and vllm) has been performed, but much more testing is still needed.
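
For orientation, the generated artifacts land next to each distribution's template. A minimal sketch of what to expect (the directory path, and any files beyond the three YAMLs named above, are assumptions):

```bash
# Hypothetical layout check; adjust the path to wherever the together template
# lives in your checkout.
ls llama_stack/templates/together/
# expected to include: build.yaml  run.yaml  run-with-safety.yaml
```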
This commit is contained in:
parent 0784284ab5
commit 2a31163178
88 changed files with 3008 additions and 852 deletions
@@ -1,62 +1,67 @@
# Together Distribution
The `llamastack/distribution-together` distribution consists of the following provider configurations.
| API | Provider(s) |
|-----|-------------|
| agents | `inline::meta-reference` |
| inference | `remote::together` |
| memory | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |
| safety | `inline::llama-guard` |
| telemetry | `inline::meta-reference` |
### Environment Variables
The following environment variables can be configured:
- `LLAMASTACK_PORT`: Port for the Llama Stack distribution server (default: `5001`)
- `TOGETHER_API_KEY`: Together.AI API Key (default: ``)
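
For example, assuming you set these in the shell that launches the server (the key value below is a placeholder):

```bash
# Placeholder values; substitute your real Together API key.
export TOGETHER_API_KEY="your-together-api-key"
export LLAMASTACK_PORT=5001   # matches the default above
```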
### Models
The following models are available by default:

- `meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo`
- `meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo`
- `meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo`
- `meta-llama/Llama-3.2-3B-Instruct-Turbo`
- `meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo`
- `meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo`
- `meta-llama/Meta-Llama-Guard-3-8B`
- `meta-llama/Llama-Guard-3-11B-Vision-Turbo`
### Prerequisite: API Keys
Make sure you have access to a Together API Key. You can get one by visiting [together.xyz](https://together.xyz/).
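
Once you have a key, a quick way to confirm it works is to hit Together's OpenAI-compatible API directly (this assumes the `https://api.together.xyz/v1/models` endpoint and bearer-token auth; it is independent of Llama Stack):

```bash
# Sanity-check the key against Together's API before wiring it into Llama Stack.
curl -s https://api.together.xyz/v1/models \
  -H "Authorization: Bearer $TOGETHER_API_KEY" | head
```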
## Running Llama Stack with Together
You can do this via Conda (build the code yourself) or Docker, which has a pre-built image.
### Via Docker
This method allows you to get started quickly without having to build the distribution code.
```bash
LLAMA_STACK_PORT=5001
docker run \
  -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ./run.yaml:/root/my-run.yaml \
  llamastack/distribution-together \
  /root/my-run.yaml \
  --port $LLAMA_STACK_PORT \
  --env TOGETHER_API_KEY=$TOGETHER_API_KEY
```
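
The command above mounts a local `run.yaml`. One way to obtain it is to copy the generated template out of your llama-stack checkout (the exact path is an assumption; use wherever the together template's `run.yaml` lives in your tree):

```bash
# Copy the generated run.yaml next to where you invoke `docker run`.
cp llama_stack/templates/together/run.yaml ./run.yaml
```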

Make sure that in your `run.yaml` file, the inference provider points to the correct Together server endpoint, e.g.:
```yaml
inference:
  - provider_id: together
    provider_type: remote::together
    config:
      url: https://api.together.xyz/v1
      api_key: <optional api key>
```
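
Once the server is up, you can point the CLI client at it and list the served models, as in the optional section below (a sketch; client flags can differ across `llama-stack-client` versions):

```bash
# Point the client at the local server, then list what it serves.
llama-stack-client configure --endpoint http://localhost:$LLAMA_STACK_PORT
llama-stack-client models list
```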
### Via Conda
```bash
llama stack build --template together --image-type conda
# (optional) edit run.yaml if you need to point at a different Together endpoint
llama stack run ./run.yaml \
  --port 5001 \
  --env TOGETHER_API_KEY=$TOGETHER_API_KEY
```
### (Optional) Update Model Serving Configuration

Use `llama-stack-client models list` to check the available models served by Together.
```
$ llama-stack-client models list
+------------------------------+------------------------------+---------------+------------+
| identifier                   | llama_model                  | provider_id   | metadata   |
+==============================+==============================+===============+============+
| Llama3.1-8B-Instruct         | Llama3.1-8B-Instruct         | together0     | {}         |
+------------------------------+------------------------------+---------------+------------+
| Llama3.1-70B-Instruct        | Llama3.1-70B-Instruct        | together0     | {}         |
+------------------------------+------------------------------+---------------+------------+
| Llama3.1-405B-Instruct       | Llama3.1-405B-Instruct       | together0     | {}         |
+------------------------------+------------------------------+---------------+------------+
| Llama3.2-3B-Instruct         | Llama3.2-3B-Instruct         | together0     | {}         |
+------------------------------+------------------------------+---------------+------------+
| Llama3.2-11B-Vision-Instruct | Llama3.2-11B-Vision-Instruct | together0     | {}         |
+------------------------------+------------------------------+---------------+------------+
| Llama3.2-90B-Vision-Instruct | Llama3.2-90B-Vision-Instruct | together0     | {}         |
+------------------------------+------------------------------+---------------+------------+
```