# What does this PR do?

Automatically generates:
- build.yaml
- run.yaml
- run-with-safety.yaml
- parts of the markdown docs for the distributions.

## Test Plan

At this point, this only updates the YAMLs and the docs. Some testing (especially with ollama and vllm) has been performed, but much more thorough testing is still needed.
# Meta Reference Distribution

The `llamastack/distribution-{{ name }}` distribution consists of the following provider configurations:

{{ providers_table }}

Note that you need access to NVIDIA GPUs to run this distribution. It is not compatible with CPU-only machines or machines with AMD GPUs.

{% if run_config_env_vars %}
### Environment Variables

The following environment variables can be configured:

{% for var, (default_value, description) in run_config_env_vars.items() %}
- `{{ var }}`: {{ description }} (default: `{{ default_value }}`)
{% endfor %}
{% endif %}

## Prerequisite: Downloading Models

Please make sure you have llama model checkpoints downloaded in `~/.llama` before proceeding. See the [installation guide](https://llama-stack.readthedocs.io/en/latest/cli_reference/download_models.html) to download the models. Run `llama model list` to see the available models, and `llama model download` to download the checkpoints.

```
$ ls ~/.llama/checkpoints
Llama3.1-8B  Llama3.2-11B-Vision-Instruct  Llama3.2-1B-Instruct  Llama3.2-90B-Vision-Instruct  Llama-Guard-3-8B
Llama3.1-8B-Instruct  Llama3.2-1B  Llama3.2-3B-Instruct  Llama-Guard-3-1B  Prompt-Guard-86M
```
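For example, a typical download might look like the following sketch (the `--source meta` flow prompts for a signed download URL from Meta, and the model IDs shown are illustrative):

```bash
# Fetch an inference model and a safety model; the IDs here are illustrative,
# so run `llama model list` first to confirm the exact identifiers.
llama model download --source meta --model-id Llama3.2-3B-Instruct
llama model download --source meta --model-id Llama-Guard-3-1B
```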
## Running the Distribution

You can run the distribution via Conda (which builds the code) or Docker (which uses a pre-built image).

### Via Docker

This method allows you to get started quickly without having to build the distribution code.

```bash
LLAMA_STACK_PORT=5001

# Start the server from the pre-built image, mounting your run configuration
docker run \
  -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ./run.yaml:/root/my-run.yaml \
  llamastack/distribution-{{ name }} \
  /root/my-run.yaml \
  --port $LLAMA_STACK_PORT \
  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
```
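Once the container is up, you can sanity-check the server from another terminal. A minimal sketch, assuming you have installed the separate `llama-stack-client` CLI (`pip install llama-stack-client`):

```bash
# Point the client at the running server, then list the models it serves
llama-stack-client configure --endpoint http://localhost:$LLAMA_STACK_PORT
llama-stack-client models list
```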
If you are using Llama Stack Safety / Shield APIs, use:

```bash
docker run \
  -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ./run-with-safety.yaml:/root/my-run.yaml \
  llamastack/distribution-{{ name }} \
  /root/my-run.yaml \
  --port $LLAMA_STACK_PORT \
  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
  --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
```
### Via Conda

Make sure you have installed the Llama Stack CLI (`pip install llama-stack`) before proceeding.

```bash
llama stack build --template meta-reference-gpu --image-type conda
llama stack run ./run.yaml \
  --port 5001 \
  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
```
If you are using Llama Stack Safety / Shield APIs, use:

```bash
llama stack run ./run-with-safety.yaml \
  --port 5001 \
  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
  --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
```
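Whichever way you started the server, you can then send a test message through the Inference API. A small smoke test, assuming the `llama-stack-client` CLI is configured against your running server as shown in the Docker section above:

```bash
# Ask the served model a question via the chat-completion endpoint
llama-stack-client inference chat-completion \
  --message "hello, what model are you?"
```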