Mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-06-28 02:53:30 +00:00

Merge 23484d6159 into 40fdce79b3

This commit is contained in commit 4f6ce6f81c.

1 changed file with 62 additions and 28 deletions
@@ -64,10 +64,9 @@ options:
   --template TEMPLATE   Name of the example template config to use for build. You may use `llama stack build --list-templates` to check out the available templates (default: None)
   --list-templates      Show the available templates for building a Llama Stack distribution (default: False)
   --image-type {conda,container,venv}
-                        Image Type to use for the build. This can be either conda or container or venv. If not specified, will use the image type from the template config. (default:
-                        conda)
+                        Image Type to use for the build. If not specified, will use the image type from the template config. (default: None)
   --image-name IMAGE_NAME
-                        [for image-type=conda|container|venv] Name of the conda or virtual environment to use for the build. If not specified, currently active Conda environment will be used if
+                        [for image-type=conda|container|venv] Name of the conda or virtual environment to use for the build. If not specified, currently active environment will be used if
                         found. (default: None)
   --print-deps-only     Print the dependencies for the stack only, without building the stack (default: False)
   --run                 Run the stack after building using the same image type, name, and other applicable arguments (default: False)
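Taken together, the help text above maps onto invocations like the following sketch; the flags come from the help output, while the template name `ollama` (it appears in the template list below) and the environment name `my-llama-env` are illustrative placeholders:

```
# Sketch: build the ollama template into a named venv
llama stack build --template ollama --image-type venv --image-name my-llama-env

# Only print what the build would install, without building
llama stack build --template ollama --print-deps-only

# Build, then immediately start the stack with the same settings
llama stack build --template ollama --image-type venv --run
```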
@@ -89,32 +88,53 @@ llama stack build --list-templates
 +------------------------------+-----------------------------------------------------------------------------+
 | Template Name                | Description                                                                 |
 +------------------------------+-----------------------------------------------------------------------------+
-| hf-serverless                | Use (an external) Hugging Face Inference Endpoint for running LLM inference |
-+------------------------------+-----------------------------------------------------------------------------+
-| together                     | Use Together.AI for running LLM inference                                   |
+| watsonx                      | Use watsonx for running LLM inference                                       |
 +------------------------------+-----------------------------------------------------------------------------+
 | vllm-gpu                     | Use a built-in vLLM engine for running LLM inference                        |
 +------------------------------+-----------------------------------------------------------------------------+
-| experimental-post-training   | Experimental template for post training                                     |
-+------------------------------+-----------------------------------------------------------------------------+
-| remote-vllm                  | Use (an external) vLLM server for running LLM inference                     |
-+------------------------------+-----------------------------------------------------------------------------+
-| fireworks                    | Use Fireworks.AI for running LLM inference                                  |
+| together                     | Use Together.AI for running LLM inference                                   |
 +------------------------------+-----------------------------------------------------------------------------+
 | tgi                          | Use (an external) TGI server for running LLM inference                      |
 +------------------------------+-----------------------------------------------------------------------------+
-| bedrock                      | Use AWS Bedrock for running LLM inference and safety                        |
+| starter                      | Quick start template for running Llama Stack with several popular providers |
 +------------------------------+-----------------------------------------------------------------------------+
-| meta-reference-gpu           | Use Meta Reference for running LLM inference                                |
+| sambanova                    | Use SambaNova for running LLM inference and safety                          |
 +------------------------------+-----------------------------------------------------------------------------+
-| nvidia                       | Use NVIDIA NIM for running LLM inference                                    |
+| remote-vllm                  | Use (an external) vLLM server for running LLM inference                     |
 +------------------------------+-----------------------------------------------------------------------------+
-| cerebras                     | Use Cerebras for running LLM inference                                      |
+| postgres-demo                | Quick start template for running Llama Stack with several popular providers |
++------------------------------+-----------------------------------------------------------------------------+
+| passthrough                  | Use Passthrough hosted llama-stack endpoint for LLM inference               |
++------------------------------+-----------------------------------------------------------------------------+
+| open-benchmark               | Distribution for running open benchmarks                                    |
 +------------------------------+-----------------------------------------------------------------------------+
 | ollama                       | Use (an external) Ollama server for running LLM inference                   |
 +------------------------------+-----------------------------------------------------------------------------+
+| nvidia                       | Use NVIDIA NIM for running LLM inference, evaluation and safety             |
++------------------------------+-----------------------------------------------------------------------------+
+| meta-reference-gpu           | Use Meta Reference for running LLM inference                                |
++------------------------------+-----------------------------------------------------------------------------+
+| llama_api                    | Distribution for running e2e tests in CI                                    |
++------------------------------+-----------------------------------------------------------------------------+
+| hf-serverless                | Use (an external) Hugging Face Inference Endpoint for running LLM inference |
++------------------------------+-----------------------------------------------------------------------------+
 | hf-endpoint                  | Use (an external) Hugging Face Inference Endpoint for running LLM inference |
 +------------------------------+-----------------------------------------------------------------------------+
+| groq                         | Use Groq for running LLM inference                                          |
++------------------------------+-----------------------------------------------------------------------------+
+| fireworks                    | Use Fireworks.AI for running LLM inference                                  |
++------------------------------+-----------------------------------------------------------------------------+
+| experimental-post-training   | Experimental template for post training                                     |
++------------------------------+-----------------------------------------------------------------------------+
+| dell                         | Dell's distribution of Llama Stack. TGI inference via Dell's custom         |
+|                              | container                                                                   |
++------------------------------+-----------------------------------------------------------------------------+
+| ci-tests                     | Distribution for running e2e tests in CI                                    |
++------------------------------+-----------------------------------------------------------------------------+
+| cerebras                     | Use Cerebras for running LLM inference                                      |
++------------------------------+-----------------------------------------------------------------------------+
+| bedrock                      | Use AWS Bedrock for running LLM inference and safety                        |
++------------------------------+-----------------------------------------------------------------------------+
 ```
 
 You may then pick a template to build your distribution with providers fitted to your liking.
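Any name in the left column can be passed back to the build command. For example, building the `starter` template added in this change might look like the following sketch (the image type is your choice):

```
# Build the starter template as a venv-based distribution
llama stack build --template starter --image-type venv
```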
@@ -256,6 +276,7 @@ $ llama stack build --template ollama --image-type container
 ...
 Containerfile created successfully in /tmp/tmp.viA3a3Rdsg/ContainerfileFROM python:3.10-slim
 ...
+```
 
 You can now edit ~/meta-llama/llama-stack/tmp/configs/ollama-run.yaml and run `llama stack run ~/meta-llama/llama-stack/tmp/configs/ollama-run.yaml`
 ```
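As the note added in the next hunk spells out, a container image built this way is started with Docker or Podman rather than `llama stack run`. A rough sketch, assuming the build tagged the image `distribution-ollama` and keeping the default port 8321; the image tag and the `~/.llama` mount are assumptions, not values from this diff:

```
# Hypothetical: start the built image directly with Docker.
# Tag, port mapping, and volume mount are placeholders.
docker run -it \
  -p 8321:8321 \
  -v ~/.llama:/root/.llama \
  distribution-ollama \
  --port 8321
```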
@@ -305,30 +326,28 @@ Now, let's start the Llama Stack Distribution Server. You will need the YAML con
 ```
 llama stack run -h
-usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--env KEY=VALUE] [--tls-keyfile TLS_KEYFILE] [--tls-certfile TLS_CERTFILE]
-                       [--image-type {conda,container,venv}]
-                       config
+usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--env KEY=VALUE]
+                       [--image-type {conda,venv}] [--enable-ui]
+                       [config | template]
 
 Start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.
 
 positional arguments:
-  config                Path to config file to use for the run
+  config | template     Path to config file to use for the run or name of known template (`llama stack list` for a list). (default: None)
 
 options:
   -h, --help            show this help message and exit
   --port PORT           Port to run the server on. It can also be passed via the env var LLAMA_STACK_PORT. (default: 8321)
   --image-name IMAGE_NAME
                         Name of the image to run. Defaults to the current environment (default: None)
-  --env KEY=VALUE       Environment variables to pass to the server in KEY=VALUE format. Can be specified multiple times. (default: [])
-  --tls-keyfile TLS_KEYFILE
-                        Path to TLS key file for HTTPS (default: None)
-  --tls-certfile TLS_CERTFILE
-                        Path to TLS certificate file for HTTPS (default: None)
-  --image-type {conda,container,venv}
-                        Image Type used during the build. This can be either conda or container or venv. (default: conda)
+  --env KEY=VALUE       Environment variables to pass to the server in KEY=VALUE format. Can be specified multiple times. (default: None)
+  --image-type {conda,venv}
+                        Image Type used during the build. This can be either conda or venv. (default: None)
+  --enable-ui           Start the UI server (default: False)
 
 ```
 
+**Note:** Container images built with `llama stack build --image-type container` cannot be run using `llama stack run`. Instead, they must be run directly using Docker or Podman commands as shown in the container building section above.
 
 ```
 # Start using template name
 llama stack run tgi
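Putting the revised interface together, a few illustrative invocations; only the flags come from the help text above, while the config path, environment name, and environment variable are placeholders:

```
# Start from a known template name
llama stack run tgi

# Start from an explicit run config, forwarding an environment variable
llama stack run ~/.llama/distributions/tgi/tgi-run.yaml \
  --port 8321 \
  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct

# Start inside a named venv and bring up the UI server too
llama stack run tgi --image-type venv --image-name my-llama-env --enable-ui
```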
@@ -372,6 +391,7 @@ INFO: Application startup complete.
 INFO:     Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
 INFO:     2401:db00:35c:2d2b:face:0:c9:0:54678 - "GET /models/list HTTP/1.1" 200 OK
 ```
 
 ### Listing Distributions
 Using the list command, you can view all existing Llama Stack distributions, including stacks built from templates, from scratch, or using custom configuration files.
+
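Once Uvicorn reports that it is running, you can sanity-check the server over HTTP. A minimal sketch, using the port and the `/models/list` route visible in the log above; newer builds may expose versioned API paths instead:

```
# Liveness check against the local server
curl http://localhost:8321/models/list
```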
@@ -391,6 +411,20 @@ Example Usage
 llama stack list
 ```
 
+```
++------------------------------+-----------------------------------------------------------------------------+--------------+------------+
+| Stack Name                   | Path                                                                        | Build Config | Run Config |
++------------------------------+-----------------------------------------------------------------------------+--------------+------------+
+| together                     | /home/wenzhou/.llama/distributions/together                                 | Yes          | No         |
++------------------------------+-----------------------------------------------------------------------------+--------------+------------+
+| bedrock                      | /home/wenzhou/.llama/distributions/bedrock                                  | Yes          | No         |
++------------------------------+-----------------------------------------------------------------------------+--------------+------------+
+| starter                      | /home/wenzhou/.llama/distributions/starter                                  | No           | No         |
++------------------------------+-----------------------------------------------------------------------------+--------------+------------+
+| remote-vllm                  | /home/wenzhou/.llama/distributions/remote-vllm                              | Yes          | Yes        |
++------------------------------+-----------------------------------------------------------------------------+--------------+------------+
+```
+
 ### Removing a Distribution
 Use the remove command to delete a distribution you've previously built.
 
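Reading the table, `remote-vllm` is the only entry with both a build config and a run config, so it is the one stack here that should start directly by name, as described in the run command's help above. A sketch:

```
# remote-vllm shows "Run Config: Yes" above, so it can be started by name
llama stack run remote-vllm
```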
@@ -413,7 +447,7 @@ Example
 llama stack rm llamastack-test
 ```
 
-To keep your environment organized and avoid clutter, consider using `llama stack list` to review old or unused distributions and `llama stack rm <name>` to delete them when they’re no longer needed.
+To keep your environment organized and avoid clutter, consider using `llama stack list` to review old or unused distributions and `llama stack rm <name>` to delete them when they're no longer needed.
 
 ### Troubleshooting
 
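That housekeeping advice is just the two commands back to back; `llamastack-test` is the name from the example above, so substitute your own:

```
# Review existing distributions, then remove one that is no longer needed
llama stack list
llama stack rm llamastack-test
```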