From 1be66d754e7fb4f8dcf35d388afbb8ddc85e7449 Mon Sep 17 00:00:00 2001
From: Yuan Tang
Date: Thu, 10 Apr 2025 04:04:17 -0400
Subject: [PATCH] docs: Redirect instructions for additional hardware accelerators for remote vLLM provider (#1923)

# What does this PR do?

The vLLM website just added a [new index page for installing vLLM on different hardware accelerators](https://docs.vllm.ai/en/latest/getting_started/installation.html). This PR adds a link to that page, with additional edits to make sure readers are aware that the use of GPUs on this page is for demonstration purposes only.

This closes https://github.com/meta-llama/llama-stack/issues/1813.

Signed-off-by: Yuan Tang
---
 .../source/distributions/self_hosted_distro/remote-vllm.md | 7 +++++--
 llama_stack/templates/remote-vllm/doc_template.md          | 7 +++++--
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/docs/source/distributions/self_hosted_distro/remote-vllm.md b/docs/source/distributions/self_hosted_distro/remote-vllm.md
index 457d703b3..e18b5bf40 100644
--- a/docs/source/distributions/self_hosted_distro/remote-vllm.md
+++ b/docs/source/distributions/self_hosted_distro/remote-vllm.md
@@ -25,7 +25,7 @@ The `llamastack/distribution-remote-vllm` distribution consists of the following
 | vector_io | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |
 
 
-You can use this distribution if you have GPUs and want to run an independent vLLM server container for running inference.
+You can use this distribution if you want to run an independent vLLM server for inference.
 
 ### Environment Variables
 
@@ -41,7 +41,10 @@ The following environment variables can be configured:
 
 ## Setting up vLLM server
 
-Both AMD and NVIDIA GPUs can serve as accelerators for the vLLM server, which acts as both the LLM inference provider and the safety provider.
+In the following sections, we'll use either AMD or NVIDIA GPUs as hardware accelerators for the vLLM
+server, which acts as both the LLM inference provider and the safety provider. Note that vLLM also
+[supports many other hardware accelerators](https://docs.vllm.ai/en/latest/getting_started/installation.html) and
+that we only use GPUs here for demonstration purposes.
 
 ### Setting up vLLM server on AMD GPU
 
diff --git a/llama_stack/templates/remote-vllm/doc_template.md b/llama_stack/templates/remote-vllm/doc_template.md
index 7543e8239..efcdb62c6 100644
--- a/llama_stack/templates/remote-vllm/doc_template.md
+++ b/llama_stack/templates/remote-vllm/doc_template.md
@@ -13,7 +13,7 @@ The `llamastack/distribution-{{ name }}` distribution consists of the following
 {{ providers_table }}
 
 
-You can use this distribution if you have GPUs and want to run an independent vLLM server container for running inference.
+You can use this distribution if you want to run an independent vLLM server for inference.
 
 {% if run_config_env_vars %}
 ### Environment Variables
@@ -28,7 +28,10 @@ The following environment variables can be configured:
 
 ## Setting up vLLM server
 
-Both AMD and NVIDIA GPUs can serve as accelerators for the vLLM server, which acts as both the LLM inference provider and the safety provider.
+In the following sections, we'll use either AMD or NVIDIA GPUs as hardware accelerators for the vLLM
+server, which acts as both the LLM inference provider and the safety provider. Note that vLLM also
+[supports many other hardware accelerators](https://docs.vllm.ai/en/latest/getting_started/installation.html) and
+that we only use GPUs here for demonstration purposes.
 
 ### Setting up vLLM server on AMD GPU
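For readers following the patched docs, below is a minimal sketch of the NVIDIA-GPU setup the section goes on to describe, using vLLM's standard Docker invocation. The model name, port, and token handling are illustrative placeholders rather than values taken from the docs being patched; the AMD path is analogous but uses a ROCm-enabled container image instead.

```bash
# Minimal sketch: launch vLLM's OpenAI-compatible server on an NVIDIA GPU.
# INFERENCE_MODEL, INFERENCE_PORT, and HF_TOKEN are illustrative placeholders.
export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
export INFERENCE_PORT=8000

docker run \
    --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
    -p $INFERENCE_PORT:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model $INFERENCE_MODEL
```

Once the server is up, the distribution's remote vLLM provider can point at its OpenAI-compatible endpoint (e.g. `http://localhost:8000/v1`), and a second vLLM instance can be launched the same way to back the safety provider mentioned above.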