mirror of https://github.com/meta-llama/llama-stack.git

commit 18d175e703 (parent a8dc87b00b): docs

3 changed files with 38 additions and 105 deletions
@@ -1,15 +1,18 @@
 # Fireworks Distribution
 
-The `llamastack/distribution-` distribution consists of the following provider configurations.
+The `llamastack/distribution-fireworks` distribution consists of the following provider configurations.
 
 
 | **API** | **Inference** | **Agents** | **Memory** | **Safety** | **Telemetry** |
 |----------------- |--------------- |---------------- |-------------------------------------------------- |---------------- |---------------- |
 | **Provider(s)** | remote::fireworks | meta-reference | meta-reference | meta-reference | meta-reference |
 
+### Step 0. Prerequisite
+- Make sure you have access to a Fireworks API key. You can get one by visiting [fireworks.ai](https://fireworks.ai/).
 
-### Docker: Start the Distribution (Single Node CPU)
+### Step 1. Start the Distribution (Single Node CPU)
 
+#### (Option 1) Start Distribution Via Docker
 > [!NOTE]
 > This assumes you have a hosted endpoint at Fireworks with an API key.
 
@@ -26,13 +29,11 @@ inference:
   - provider_id: fireworks
     provider_type: remote::fireworks
     config:
-      url: https://api.fireworks.ai/inferenc
+      url: https://api.fireworks.ai/inference
       api_key: <optional api key>
 ```
 
-### Conda: llama stack run (Single Node CPU)
+#### (Option 2) Start Distribution Via Conda
 
-**Via Conda**
-
 ```bash
 llama stack build --template fireworks --image-type conda
@@ -41,7 +42,7 @@ llama stack run ./run.yaml
 ```
 
 
-### Model Serving
+### (Optional) Model Serving
 
 Use `llama-stack-client models list` to check the available models served by Fireworks.
 ```
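Taken together, the Conda path documented in the hunks above amounts to roughly the flow below. This is an illustrative sketch assembled only from the commands and config shown in this file's diff; the API key value is a placeholder and the `run.yaml` location depends on where the build places it.

```bash
# Illustrative sketch of the Fireworks Conda flow shown above (not an authoritative recipe).
llama stack build --template fireworks --image-type conda

# Edit run.yaml so the remote::fireworks provider points at the hosted endpoint:
#   url: https://api.fireworks.ai/inference
#   api_key: <your fireworks api key>   # placeholder; get one at https://fireworks.ai/
llama stack run ./run.yaml

# With the server up, list the models served by Fireworks.
llama-stack-client models list
```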
@@ -8,8 +8,8 @@ The `llamastack/distribution-meta-reference-gpu` distribution consists of the following provider configurations.
 | **Provider(s)** | meta-reference | meta-reference | meta-reference, remote::pgvector, remote::chroma | meta-reference | meta-reference |
 
 
-### Prerequisite
-Please make sure you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide]() here to download the models.
+### Step 0. Prerequisite - Downloading Models
+Please make sure you have Llama model checkpoints downloaded in `~/.llama` before proceeding. See the [installation guide](https://llama-stack.readthedocs.io/en/latest/cli_reference/download_models.html) to download the models.
 
 ```
 $ ls ~/.llama/checkpoints
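Since the prerequisite above expects checkpoints under `~/.llama`, the download step might look roughly like the sketch below. Treat it as a hedged illustration: the model ID is just one of the checkpoints listed in this doc, and the exact `llama download` flags should be verified against the CLI reference linked above.

```bash
# Hedged sketch: fetch a checkpoint into ~/.llama (verify flags against the CLI reference).
llama download --source meta --model-id Llama3.1-8B-Instruct --meta-url <SIGNED_META_URL>

# Confirm the distribution can find it.
ls ~/.llama/checkpoints
```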
@@ -17,8 +17,9 @@ Llama3.1-8B Llama3.2-11B-Vision-Instruct Llama3.2-1B-Instruct Llama3.2-90B-Vision-Instruct Llama-Guard-3-8B
 Llama3.1-8B-Instruct Llama3.2-1B Llama3.2-3B-Instruct Llama-Guard-3-1B Prompt-Guard-86M
 ```
 
-### Docker: Start the Distribution
+### Step 1. Start the Distribution
 
+#### (Option 1) Start with Docker
 ```
 $ cd distributions/meta-reference-gpu && docker compose up
 ```
@@ -37,9 +38,9 @@ This will download and start running a pre-built docker container. Alternatively
 docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v ./run.yaml:/root/my-run.yaml --gpus=all distribution-meta-reference-gpu --yaml_config /root/my-run.yaml
 ```
 
-### Conda: Start the Distribution
+#### (Option 2) Start with Conda
 
-1. Install the `llama` CLI. See [CLI Reference]()
+1. Install the `llama` CLI. See the [CLI Reference](https://llama-stack.readthedocs.io/en/latest/cli_reference/index.html).
 
 2. Build the `meta-reference-gpu` distribution
 
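Concretely, the Conda option above reduces to the commands below; they are taken directly from this distribution's docs and from the quick start section later in this diff.

```bash
# Build the meta-reference-gpu distribution into a Conda environment, then start the server.
llama stack build --template meta-reference-gpu --image-type conda
cd distributions/meta-reference-gpu && llama stack run ./run.yaml
```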
@@ -53,7 +54,7 @@ $ cd distributions/meta-reference-gpu
 $ llama stack run ./run.yaml
 ```
 
-### Serving a new model
+### (Optional) Serving a new model
 You may change the `config.model` in `run.yaml` to update the model currently being served by the distribution. Make sure you have the model checkpoint downloaded in your `~/.llama`.
 ```
 inference:
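As a small worked example of the swap described above (the model name here is just one of the checkpoints listed earlier; any checkpoint under `~/.llama` works):

```bash
# Check that the checkpoint you want to serve is already downloaded.
ls ~/.llama/checkpoints | grep Llama3.2-3B-Instruct

# Set config.model in run.yaml to that checkpoint, then restart the server.
llama stack run ./run.yaml
```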
@@ -18,7 +18,7 @@ At the end of the guide, you will have learnt how to:
 
 To see more example apps built using Llama Stack, see [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main).
 
-## Starting Up Llama Stack Server
+## Step 1. Starting Up Llama Stack Server
 
 ### Decide Your Build Type
 There are two ways to start a Llama Stack:
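For orientation, the two build types referred to above (a pre-built Docker image versus a local Conda build, per the distribution docs earlier in this diff) map onto two command flows. `<distribution>` is a placeholder for a template name such as `fireworks` or `meta-reference-gpu`.

```bash
# Build type 1: Docker - bring up a pre-built distribution image.
cd distributions/<distribution> && docker compose up

# Build type 2: Conda - build the distribution locally, then run it.
llama stack build --template <distribution> --image-type conda
cd distributions/<distribution> && llama stack run ./run.yaml
```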
@@ -30,116 +30,47 @@ Both of these provide options to run model inference using our reference implementations
 
 ### Decide Your Inference Provider
 
-Running inference of the underlying Llama model is one of the most critical requirements. Depending on what hardware you have available, you have various options:
+Running inference of the underlying Llama model is one of the most critical requirements. Depending on what hardware you have available, you have various options. Note that each option has different prerequisites.
 
 - **Do you have access to a machine with powerful GPUs?**
 If so, we suggest:
-  - `distribution-meta-reference-gpu`:
-    - [Docker](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/meta-reference-gpu.html#docker-start-the-distribution)
-    - [Conda](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/meta-reference-gpu.html#docker-start-the-distribution)
-  - `distribution-tgi`:
-    - [Docker](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/tgi.html#docker-start-the-distribution-single-node-gpu)
-    - [Conda](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/tgi.html#conda-tgi-server-llama-stack-run)
+  - [`distribution-meta-reference-gpu`](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/meta-reference-gpu.html)
+  - [`distribution-tgi`](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/tgi.html)
 
 - **Are you running on a "regular" desktop machine?**
 If so, we suggest:
-  - `distribution-ollama`:
-    - [Docker](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/ollama.html#docker-start-a-distribution-single-node-gpu)
-    - [Conda](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/ollama.html#conda-ollama-run-llama-stack-run)
+  - [`distribution-ollama`](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/ollama.html)
 
 - **Do you have access to a remote inference provider like Fireworks, Together, etc.?** If so, we suggest:
-  - `distribution-together`:
-    - [Docker](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/together.html#docker-start-the-distribution-single-node-cpu)
-    - [Conda](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/together.html#conda-llama-stack-run-single-node-cpu)
-  - `distribution-fireworks`:
-    - [Docker](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/fireworks.html)
-    - [Conda](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/fireworks.html#conda-llama-stack-run-single-node-cpu)
+  - [`distribution-together`](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/together.html)
+  - [`distribution-fireworks`](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/fireworks.html)
 
 
 ### Quick Start Commands
 
-#### Single-Node GPU
-
-**Docker**
-````{tab-set-code}
-
-```{code-block} meta-reference-gpu
-$ cd distributions/meta-reference-gpu && docker compose up
-```
-
-```{code-block} tgi
-$ cd distributions/tgi && docker compose up
-```
-
-````
-
-**Conda**
-
-````{tab-set-code}
-
-```{code-block} meta-reference-gpu
-$ llama stack build --template meta-reference-gpu --image-type conda
-$ cd distributions/meta-reference-gpu && llama stack run ./run.yaml
-```
-
-```{code-block} tgi
-$ llama stack build --template tgi --image-type conda
-$ cd distributions/tgi && llama stack run ./run.yaml
-```
-
-````
-
-#### Single-Node CPU
-
-**Docker**
-
-````{tab-set-code}
-
-```{code-block} ollama
-$ cd distributions/ollama/cpu && docker compose up
-```
-
-````
-
-**Conda**
-
-````{tab-set-code}
-
-```{code-block} ollama
-$ llama stack build --template ollama --image-type conda
-$ cd distributions/ollama && llama stack run ./run.yaml
-```
-
-````
-
-#### Single-Node CPU + Hosted Endpoint
-
-**Docker**
-
-````{tab-set-code}
-
-```{code-block} together
-$ cd distributions/together && docker compose up
-```
-
-```{code-block} fireworks
-$ cd distributions/fireworks && docker compose up
-```
-
-````
-
-**Conda**
-
-````{tab-set-code}
-
-```{code-block} together
-$ llama stack build --template together --image-type conda
-$ cd distributions/together && llama stack run ./run.yaml
-```
-
-```{code-block} fireworks
-$ llama stack build --template fireworks --image-type conda
-$ cd distributions/fireworks && llama stack run ./run.yaml
-```
-
-````
-
-## Build Your Llama Stack App
+The following are quick start commands; please visit each distribution page for detailed setup.
+
+##### 0. Prerequisite
+::::{tab-set}
+
+:::{tab-item} meta-reference-gpu
+**Downloading Models**
+Please make sure you have Llama model checkpoints downloaded in `~/.llama` before proceeding. See the [installation guide](https://llama-stack.readthedocs.io/en/latest/cli_reference/download_models.html) to download the models.
+
+```
+$ ls ~/.llama/checkpoints
+Llama3.1-8B Llama3.2-11B-Vision-Instruct Llama3.2-1B-Instruct Llama3.2-90B-Vision-Instruct Llama-Guard-3-8B
+Llama3.1-8B-Instruct Llama3.2-1B Llama3.2-3B-Instruct Llama-Guard-3-1B Prompt-Guard-86M
+```
+:::
+
+:::{tab-item} tgi
+Single-Node GPU
+:::
+
+::::
+
+## Step 2. Build Your Llama Stack App
 
 ### chat_completion sanity test
 Once the server is set up, we can test it with a client to see example outputs. This will run the chat completion client and query the distribution's `/inference/chat_completion` API. Send a POST request to the server:
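For the sanity test described above, a minimal request sketch is shown below. It assumes the server is listening on port 5000 (the port mapped in the docker run command earlier in this diff); the JSON field names and the model ID are illustrative assumptions, so consult the client SDK for the authoritative request schema.

```bash
# Hedged sketch: POST to the distribution's chat completion endpoint.
# Port 5000 matches the docker port mapping shown above; payload fields are assumptions.
curl -s http://localhost:5000/inference/chat_completion \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Llama3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": false
      }'
```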