docs: Redirect instructions for additional hardware accelerators for remote vLLM provider (#1923)

# What does this PR do?

The vLLM website just added a [new index page for installation on different
hardware
accelerators](https://docs.vllm.ai/en/latest/getting_started/installation.html).
This PR adds a link to that page, along with additional edits to make sure
readers are aware that the GPUs used on this page are for demonstration
purposes only.

This closes https://github.com/meta-llama/llama-stack/issues/1813.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

@@ -13,7 +13,7 @@ The `llamastack/distribution-{{ name }}` distribution consists of the following
 {{ providers_table }}
-You can use this distribution if you have GPUs and want to run an independent vLLM server container for running inference.
+You can use this distribution if you want to run an independent vLLM server for inference.
 {% if run_config_env_vars %}
 ### Environment Variables
@@ -28,7 +28,10 @@ The following environment variables can be configured:
 ## Setting up vLLM server
-Both AMD and NVIDIA GPUs can serve as accelerators for the vLLM server, which acts as both the LLM inference provider and the safety provider.
+In the following sections, we'll use either AMD or NVIDIA GPUs to serve as hardware accelerators for the vLLM
+server, which acts as both the LLM inference provider and the safety provider. Note that vLLM also
+[supports many other hardware accelerators](https://docs.vllm.ai/en/latest/getting_started/installation.html) and
+that we only use GPUs here for demonstration purposes.
 ### Setting up vLLM server on AMD GPU
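
Beyond this diff, the updated page goes on to show how to actually launch the vLLM server on each accelerator. As a rough, non-authoritative sketch of the NVIDIA case, assuming the `vllm/vllm-openai` container image from vLLM's documented Docker usage, with a placeholder model name and port:

```bash
# Sketch only: launch an OpenAI-compatible vLLM server on an NVIDIA GPU.
# The image and flags follow vLLM's documented Docker usage; the model name,
# port, and HF_TOKEN variable are placeholders for illustration.
export INFERENCE_PORT=8000
export INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct  # example model only

docker run \
    --runtime nvidia \
    --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
    -p $INFERENCE_PORT:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model $INFERENCE_MODEL
```

The remote vLLM provider in the distribution is then pointed at whatever endpoint this server exposes (for example, a URL such as `http://localhost:8000/v1` supplied through the distribution's environment variables), which is why the choice of GPU here is only for demonstration and any accelerator supported by vLLM can back the same setup.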