diff --git a/docs/source/distributions/self_hosted_distro/remote-vllm.md b/docs/source/distributions/self_hosted_distro/remote-vllm.md
index 457d703b3..e18b5bf40 100644
--- a/docs/source/distributions/self_hosted_distro/remote-vllm.md
+++ b/docs/source/distributions/self_hosted_distro/remote-vllm.md
@@ -25,7 +25,7 @@ The `llamastack/distribution-remote-vllm` distribution consists of the following
 | vector_io | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |
 
 
-You can use this distribution if you have GPUs and want to run an independent vLLM server container for running inference.
+You can use this distribution if you want to run an independent vLLM server for inference.
 
 ### Environment Variables
 
@@ -41,7 +41,10 @@ The following environment variables can be configured:
 
 ## Setting up vLLM server
 
-Both AMD and NVIDIA GPUs can serve as accelerators for the vLLM server, which acts as both the LLM inference provider and the safety provider.
+In the following sections, we'll use either AMD or NVIDIA GPUs as hardware accelerators for the vLLM
+server, which acts as both the LLM inference provider and the safety provider. Note that vLLM also
+[supports many other hardware accelerators](https://docs.vllm.ai/en/latest/getting_started/installation.html) and
+that we use GPUs here only for demonstration purposes.
 
 ### Setting up vLLM server on AMD GPU
 
diff --git a/llama_stack/templates/remote-vllm/doc_template.md b/llama_stack/templates/remote-vllm/doc_template.md
index 7543e8239..efcdb62c6 100644
--- a/llama_stack/templates/remote-vllm/doc_template.md
+++ b/llama_stack/templates/remote-vllm/doc_template.md
@@ -13,7 +13,7 @@ The `llamastack/distribution-{{ name }}` distribution consists of the following
 
 {{ providers_table }}
 
-You can use this distribution if you have GPUs and want to run an independent vLLM server container for running inference.
+You can use this distribution if you want to run an independent vLLM server for inference.
 
 {% if run_config_env_vars %}
 ### Environment Variables
 
@@ -28,7 +28,10 @@ The following environment variables can be configured:
 
 ## Setting up vLLM server
 
-Both AMD and NVIDIA GPUs can serve as accelerators for the vLLM server, which acts as both the LLM inference provider and the safety provider.
+In the following sections, we'll use either AMD or NVIDIA GPUs as hardware accelerators for the vLLM
+server, which acts as both the LLM inference provider and the safety provider. Note that vLLM also
+[supports many other hardware accelerators](https://docs.vllm.ai/en/latest/getting_started/installation.html) and
+that we use GPUs here only for demonstration purposes.
 
 ### Setting up vLLM server on AMD GPU
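
For reference, a minimal sketch of what launching such an independent vLLM server can look like on an NVIDIA GPU, matching the setup the revised text describes. The model name, port, and cache path are illustrative placeholders, not values taken from the distribution docs, and the exact flags depend on your hardware and vLLM version:

```bash
# Minimal sketch: start an OpenAI-compatible vLLM server on an NVIDIA GPU.
# The model name, port, and cache path below are placeholders; adjust them
# for your own environment.
export INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct

# Mount the Hugging Face cache so model weights are not re-downloaded on
# every container start, and expose the OpenAI-compatible API on port 8000.
docker run --rm --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model "$INFERENCE_MODEL"
```

A Llama Stack distribution using the `remote::vllm` provider would then be pointed at this server's base URL (for example `http://localhost:8000/v1`) via the environment variables listed in the docs above.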