docs: Redirect instructions for additional hardware accelerators for remote vLLM provider (#1923)
# What does this PR do?

The vLLM website just added a [new index page for installing vLLM on different hardware accelerators](https://docs.vllm.ai/en/latest/getting_started/installation.html). This PR adds a link to that page, with additional edits to make sure readers are aware that the use of GPUs on this page is for demonstration purposes only.

This closes https://github.com/meta-llama/llama-stack/issues/1813.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Parent: 712c6758c6
Commit: 1be66d754e
2 changed files with 10 additions and 4 deletions
@@ -25,7 +25,7 @@ The `llamastack/distribution-remote-vllm` distribution consists of the following

 | vector_io | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |

-You can use this distribution if you have GPUs and want to run an independent vLLM server container for running inference.
+You can use this distribution if you want to run an independent vLLM server for inference.

 ### Environment Variables

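As context for the new wording, here is a minimal sketch of verifying that an independent vLLM server is reachable via its OpenAI-compatible API before wiring it into the distribution; the base URL, port, and placeholder API key are assumptions about a default local deployment rather than anything specified in this change.

```python
# Minimal sketch: confirm an independent vLLM server is reachable before
# configuring the remote-vllm distribution against it. The base URL, port,
# and dummy API key are assumptions about a default local deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint (assumed address)
    api_key="not-needed",                 # vLLM ignores the key unless one was configured
)

# The model IDs returned here are what the inference provider should be pointed at.
for model in client.models.list():
    print(model.id)
```

If the model list comes back, the same base URL can be handed to the remote vLLM inference provider.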
@@ -41,7 +41,10 @@ The following environment variables can be configured:

 ## Setting up vLLM server

-Both AMD and NVIDIA GPUs can serve as accelerators for the vLLM server, which acts as both the LLM inference provider and the safety provider.
+In the following sections, we'll use either AMD or NVIDIA GPUs to serve as hardware accelerators for the vLLM
+server, which acts as both the LLM inference provider and the safety provider. Note that vLLM also
+[supports many other hardware accelerators](https://docs.vllm.ai/en/latest/getting_started/installation.html) and
+that we only use GPUs here for demonstration purposes.

 ### Setting up vLLM server on AMD GPU

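For the GPU setups referenced above, a minimal sketch of launching vLLM's OpenAI-compatible server from Python; the model name, port, and device pinning are illustrative assumptions, and a second instance on another port could fill the safety-provider role in the same way.

```python
import os
import subprocess

# Pin the server to a single GPU. CUDA_VISIBLE_DEVICES applies to NVIDIA;
# ROCm builds typically honor HIP_VISIBLE_DEVICES instead.
env = dict(os.environ, CUDA_VISIBLE_DEVICES="0")

# Launch the OpenAI-compatible server; model and port are illustrative choices.
server = subprocess.Popen(
    ["vllm", "serve", "meta-llama/Llama-3.1-8B-Instruct", "--port", "8000"],
    env=env,
)
print(f"vLLM server starting (pid {server.pid}); wait for model loading to finish before sending requests")
```

Running the server inside a container, as the surrounding sections describe, accomplishes the same thing; either way there is one long-running server process per role.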
@@ -13,7 +13,7 @@ The `llamastack/distribution-{{ name }}` distribution consists of the following

 {{ providers_table }}

-You can use this distribution if you have GPUs and want to run an independent vLLM server container for running inference.
+You can use this distribution if you want to run an independent vLLM server for inference.

 {% if run_config_env_vars %}
 ### Environment Variables

@@ -28,7 +28,10 @@ The following environment variables can be configured:

 ## Setting up vLLM server

-Both AMD and NVIDIA GPUs can serve as accelerators for the vLLM server, which acts as both the LLM inference provider and the safety provider.
+In the following sections, we'll use either AMD or NVIDIA GPUs to serve as hardware accelerators for the vLLM
+server, which acts as both the LLM inference provider and the safety provider. Note that vLLM also
+[supports many other hardware accelerators](https://docs.vllm.ai/en/latest/getting_started/installation.html) and
+that we only use GPUs here for demonstration purposes.

 ### Setting up vLLM server on AMD GPU
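Since the server doubles as the LLM inference provider, a quick smoke test against its chat endpoint confirms the inference path end to end before Llama Stack is layered on top. The sketch below reuses the assumed local address and illustrative model name from the launch example.

```python
from openai import OpenAI

# Reuses the assumed local deployment: OpenAI-compatible endpoint on port 8000,
# no API key configured, and the illustrative model name from the launch sketch.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # must match the model the server was started with
    messages=[{"role": "user", "content": "Reply with a one-sentence greeting."}],
    max_tokens=32,
)
print(response.choices[0].message.content)
```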