docs: Redirect instructions for additional hardware accelerators for remote vLLM provider (#1923)

# What does this PR do?

The vLLM website just added a [new index page for installation on different
hardware
accelerators](https://docs.vllm.ai/en/latest/getting_started/installation.html).
This PR adds a link to that page, along with additional edits to make sure
readers are aware that the GPUs used on this page are for demonstration
purposes only.

This closes https://github.com/meta-llama/llama-stack/issues/1813.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

@@ -13,7 +13,7 @@ The `llamastack/distribution-{{ name }}` distribution consists of the following
 {{ providers_table }}
-You can use this distribution if you have GPUs and want to run an independent vLLM server container for running inference.
+You can use this distribution if you want to run an independent vLLM server for inference.
 {% if run_config_env_vars %}
 ### Environment Variables
@@ -28,7 +28,10 @@ The following environment variables can be configured:
 ## Setting up vLLM server
-Both AMD and NVIDIA GPUs can serve as accelerators for the vLLM server, which acts as both the LLM inference provider and the safety provider.
+In the following sections, we'll use either AMD or NVIDIA GPUs to serve as hardware accelerators for the vLLM
+server, which acts as both the LLM inference provider and the safety provider. Note that vLLM also
+[supports many other hardware accelerators](https://docs.vllm.ai/en/latest/getting_started/installation.html) and
+that we only use GPUs here for demonstration purposes.
 ### Setting up vLLM server on AMD GPU
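
Beyond this diff, the updated page goes on to show how to actually launch the vLLM server on each accelerator. As a rough, non-authoritative sketch of the NVIDIA case, assuming the `vllm/vllm-openai` container image from vLLM's documented Docker usage, with a placeholder model name and port:

```bash
# Sketch only: launch an OpenAI-compatible vLLM server on an NVIDIA GPU.
# The image and flags follow vLLM's documented Docker usage; the model name,
# port, and HF_TOKEN variable are placeholders for illustration.
export INFERENCE_PORT=8000
export INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct  # example model only

docker run \
    --runtime nvidia \
    --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
    -p $INFERENCE_PORT:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model $INFERENCE_MODEL
```

The remote vLLM provider in the distribution is then pointed at whatever endpoint this server exposes (for example, a URL such as `http://localhost:8000/v1` supplied through the distribution's environment variables), which is why the choice of GPU here is only for demonstration and any accelerator supported by vLLM can back the same setup.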