More specific guidance

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-12-28 00:01:59 +00:00 · 2025-04-18 08:30:41 -04:00 · a2b7075fe9
commit a2b7075fe9
parent 2b1620f8d8
2 changed files with 2 additions and 2 deletions
--- a/docs/source/distributions/self_hosted_distro/remote-vllm.md
+++ b/docs/source/distributions/self_hosted_distro/remote-vllm.md
@@ -44,7 +44,7 @@ The following environment variables can be configured:
 In the following sections, we'll use AMD, NVIDIA or Intel GPUs to serve as hardware accelerators for the vLLM
 server, which acts as both the LLM inference provider and the safety provider. Note that vLLM also
 [supports many other hardware accelerators](https://docs.vllm.ai/en/latest/getting_started/installation.html) and
-that we only use GPUs here for demonstration purposes. Note that if you are running into issues, there's a new environment variable `VLLM_DEBUG_LOG_API_SERVER_RESPONSE` (available in vLLM v0.8.3 and above) to enable log response from API server for debugging.
+that we only use GPUs here for demonstration purposes. Note that if you run into issues, you can include the environment variable `--env VLLM_DEBUG_LOG_API_SERVER_RESPONSE=true` (available in vLLM v0.8.3 and above) in the `docker run` command to enable log response from API server for debugging.

 ### Setting up vLLM server on AMD GPU

--- a/llama_stack/templates/remote-vllm/doc_template.md
+++ b/llama_stack/templates/remote-vllm/doc_template.md
@@ -31,7 +31,7 @@ The following environment variables can be configured:
 In the following sections, we'll use AMD, NVIDIA or Intel GPUs to serve as hardware accelerators for the vLLM
 server, which acts as both the LLM inference provider and the safety provider. Note that vLLM also
 [supports many other hardware accelerators](https://docs.vllm.ai/en/latest/getting_started/installation.html) and
-that we only use GPUs here for demonstration purposes. Note that if you are running into issues, there's a new environment variable `VLLM_DEBUG_LOG_API_SERVER_RESPONSE` (available in vLLM v0.8.3 and above) to enable log response from API server for debugging.
+that we only use GPUs here for demonstration purposes. Note that if you run into issues, you can include the environment variable `--env VLLM_DEBUG_LOG_API_SERVER_RESPONSE=true` (available in vLLM v0.8.3 and above) in the `docker run` command to enable log response from API server for debugging.

 ### Setting up vLLM server on AMD GPU