docs: Redirect instructions for additional hardware accelerators for remote vLLM

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
commit 098a09cfa3
parent 712c6758c6
Author: Yuan Tang
Date: 2025-04-09 21:18:28 -04:00
2 changed files with 10 additions and 4 deletions

@@ -25,7 +25,7 @@ The `llamastack/distribution-remote-vllm` distribution consists of the following
 | vector_io | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |
-You can use this distribution if you have GPUs and want to run an independent vLLM server container for running inference.
+You can use this distribution if you want to run an independent vLLM server for inference.
 ### Environment Variables
@@ -41,7 +41,10 @@ The following environment variables can be configured:
 ## Setting up vLLM server
-Both AMD and NVIDIA GPUs can serve as accelerators for the vLLM server, which acts as both the LLM inference provider and the safety provider.
+In the following sections, we'll use either AMD or NVIDIA GPUs to serve as hardware accelerators for the vLLM
+server, which acts as both the LLM inference provider and the safety provider. Note that vLLM also
+[supports many other hardware accelerators](https://docs.vllm.ai/en/latest/getting_started/installation.html) and
+that we only use GPUs here for demonstration purposes.
 ### Setting up vLLM server on AMD GPU
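
Whichever accelerator the sections below use, the end state is the same: a vLLM server exposing an OpenAI-compatible API that the distribution can point at. As a minimal sketch of the NVIDIA path, assuming the official `vllm/vllm-openai` container image and an illustrative model and port (none of which are taken from this commit):

```bash
# Minimal sketch: launch vLLM's OpenAI-compatible server on an NVIDIA GPU.
# Image tag, model name, and port are illustrative assumptions.
docker run --gpus all \
    --ipc=host \
    -p 8000:8000 \
    vllm/vllm-openai:latest \
    --model meta-llama/Llama-3.1-8B-Instruct
```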

@@ -13,7 +13,7 @@ The `llamastack/distribution-{{ name }}` distribution consists of the following
 {{ providers_table }}
-You can use this distribution if you have GPUs and want to run an independent vLLM server container for running inference.
+You can use this distribution if you want to run an independent vLLM server for inference.
 {% if run_config_env_vars %}
 ### Environment Variables
@@ -28,7 +28,10 @@ The following environment variables can be configured:
 ## Setting up vLLM server
-Both AMD and NVIDIA GPUs can serve as accelerators for the vLLM server, which acts as both the LLM inference provider and the safety provider.
+In the following sections, we'll use either AMD or NVIDIA GPUs to serve as hardware accelerators for the vLLM
+server, which acts as both the LLM inference provider and the safety provider. Note that vLLM also
+[supports many other hardware accelerators](https://docs.vllm.ai/en/latest/getting_started/installation.html) and
+that we only use GPUs here for demonstration purposes.
 ### Setting up vLLM server on AMD GPU
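
Once a vLLM server is running on whichever accelerator you chose, a quick smoke test against its OpenAI-compatible endpoint confirms it is serving; the host and port below are assumptions carried over from the sketch above:

```bash
# Verify the vLLM server is up and listing the served model.
# localhost:8000 is an assumption, not from this commit.
curl http://localhost:8000/v1/models
```

The distribution is then pointed at this server through its environment variables; in remote-vLLM setups that is typically a vLLM URL variable such as `VLLM_URL=http://localhost:8000/v1`.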