Mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-12-31 15:10:02 +00:00.
docs: add AMD ROCm support for the remote-vllm distro
Move the contents from remote-vllm.md to doc_template.md.

Signed-off-by: Alex He <alehe@amd.com>
This commit is contained in:
parent 39030d5c14
commit ba76111db2
2 changed files with 75 additions and 2 deletions
@@ -41,7 +41,7 @@ The following environment variables can be configured:

## Setting up vLLM server

-Both AMD and NVIDIA GPUs can serve as accelerators for the vLLM server, which acts as both the LLM inference provider and safety provider.
+Both AMD and NVIDIA GPUs can serve as accelerators for the vLLM server, which acts as both the LLM inference provider and the safety provider.

### Setting up vLLM server on AMD GPU
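For readers skimming the diff, the new AMD section boils down to launching vLLM's OpenAI-compatible server inside a ROCm container and pointing the remote-vllm distribution at it. The sketch below illustrates that idea only; the `rocm/vllm-dev:main` image tag, the `$INFERENCE_PORT` and `$INFERENCE_MODEL` variables, and the device flags are assumptions based on common ROCm Docker usage, not lines quoted from this commit.

```bash
# Hypothetical ROCm launch of a vLLM OpenAI-compatible server.
# Image tag and variable names are illustrative assumptions, not quoted from the diff.
export INFERENCE_PORT=8000
export INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct

docker run \
    --pull always \
    --ipc=host \
    --shm-size 16g \
    --device=/dev/kfd \
    --device=/dev/dri \
    --group-add video \
    --security-opt seccomp=unconfined \
    --env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
    -p $INFERENCE_PORT:$INFERENCE_PORT \
    rocm/vllm-dev:main \
    python -m vllm.entrypoints.openai.api_server \
    --model $INFERENCE_MODEL \
    --port $INFERENCE_PORT
```

A second container started the same way on a different port can serve the safety model, which is consistent with the `$SAFETY_PORT` variable visible later in the diff.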
@@ -113,7 +113,6 @@ docker run \
    --port $SAFETY_PORT
```

### Setting up vLLM server on NVIDIA GPU

Please check the [vLLM Documentation](https://docs.vllm.ai/en/v0.5.5/serving/deploying_with_docker.html) to get a vLLM endpoint. Here is a sample script to start a vLLM server locally via Docker:
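The sample script referenced above is cut off in this view. Following the linked vLLM deployment guide, a typical NVIDIA launch looks roughly like the sketch below; the image tag and the `$INFERENCE_PORT` / `$INFERENCE_MODEL` variables are illustrative assumptions, not a verbatim copy of the docs added in this commit.

```bash
# Illustrative NVIDIA launch of vLLM's OpenAI-compatible server via Docker.
# Image tag and variable names are assumptions for this sketch.
export INFERENCE_PORT=8000
export INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct

docker run \
    --pull always \
    --runtime nvidia \
    --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
    -p $INFERENCE_PORT:$INFERENCE_PORT \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model $INFERENCE_MODEL \
    --port $INFERENCE_PORT
```

The `--port $SAFETY_PORT` line in the hunk above suggests the docs repeat this pattern in a second container that serves the safety model on its own port.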