docs: Redirect instructions for additional hardware accelerators for remote vLLM provider (#1923)
# What does this PR do?

The vLLM website just added a [new index page for installing vLLM on different hardware accelerators](https://docs.vllm.ai/en/latest/getting_started/installation.html). This PR adds a link to that page, with additional edits to make sure readers are aware that the use of GPUs on this page is for demonstration purposes only.

This closes https://github.com/meta-llama/llama-stack/issues/1813.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Parent: 712c6758c6
Commit: 1be66d754e
2 changed files with 10 additions and 4 deletions
@@ -25,7 +25,7 @@ The `llamastack/distribution-remote-vllm` distribution consists of the following

 | vector_io | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |

-You can use this distribution if you have GPUs and want to run an independent vLLM server container for running inference.
+You can use this distribution if you want to run an independent vLLM server for inference.

 ### Environment Variables

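As context for the new wording, here is a minimal sketch of verifying that an independent vLLM server is reachable via its OpenAI-compatible API before wiring it into the distribution; the base URL, port, and placeholder API key are assumptions about a default local deployment rather than anything specified in this change.

```python
# Minimal sketch: confirm an independent vLLM server is reachable before
# configuring the remote-vllm distribution against it. The base URL, port,
# and dummy API key are assumptions about a default local deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint (assumed address)
    api_key="not-needed",                 # vLLM ignores the key unless one was configured
)

# The model IDs returned here are what the inference provider should be pointed at.
for model in client.models.list():
    print(model.id)
```

If the model list comes back, the same base URL can be handed to the remote vLLM inference provider.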
@@ -41,7 +41,10 @@ The following environment variables can be configured:

 ## Setting up vLLM server

-Both AMD and NVIDIA GPUs can serve as accelerators for the vLLM server, which acts as both the LLM inference provider and the safety provider.
+In the following sections, we'll use either AMD or NVIDIA GPUs to serve as hardware accelerators for the vLLM
+server, which acts as both the LLM inference provider and the safety provider. Note that vLLM also
+[supports many other hardware accelerators](https://docs.vllm.ai/en/latest/getting_started/installation.html) and
+that we only use GPUs here for demonstration purposes.

 ### Setting up vLLM server on AMD GPU

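For the GPU setups referenced above, a minimal sketch of launching vLLM's OpenAI-compatible server from Python; the model name, port, and device pinning are illustrative assumptions, and a second instance on another port could fill the safety-provider role in the same way.

```python
import os
import subprocess

# Pin the server to a single GPU. CUDA_VISIBLE_DEVICES applies to NVIDIA;
# ROCm builds typically honor HIP_VISIBLE_DEVICES instead.
env = dict(os.environ, CUDA_VISIBLE_DEVICES="0")

# Launch the OpenAI-compatible server; model and port are illustrative choices.
server = subprocess.Popen(
    ["vllm", "serve", "meta-llama/Llama-3.1-8B-Instruct", "--port", "8000"],
    env=env,
)
print(f"vLLM server starting (pid {server.pid}); wait for model loading to finish before sending requests")
```

Running the server inside a container, as the surrounding sections describe, accomplishes the same thing; either way there is one long-running server process per role.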
@@ -13,7 +13,7 @@ The `llamastack/distribution-{{ name }}` distribution consists of the following

 {{ providers_table }}

-You can use this distribution if you have GPUs and want to run an independent vLLM server container for running inference.
+You can use this distribution if you want to run an independent vLLM server for inference.

 {% if run_config_env_vars %}
 ### Environment Variables

@@ -28,7 +28,10 @@ The following environment variables can be configured:

 ## Setting up vLLM server

-Both AMD and NVIDIA GPUs can serve as accelerators for the vLLM server, which acts as both the LLM inference provider and the safety provider.
+In the following sections, we'll use either AMD or NVIDIA GPUs to serve as hardware accelerators for the vLLM
+server, which acts as both the LLM inference provider and the safety provider. Note that vLLM also
+[supports many other hardware accelerators](https://docs.vllm.ai/en/latest/getting_started/installation.html) and
+that we only use GPUs here for demonstration purposes.

 ### Setting up vLLM server on AMD GPU
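Since the server doubles as the LLM inference provider, a quick smoke test against its chat endpoint confirms the inference path end to end before Llama Stack is layered on top. The sketch below reuses the assumed local address and illustrative model name from the launch example.

```python
from openai import OpenAI

# Reuses the assumed local deployment: OpenAI-compatible endpoint on port 8000,
# no API key configured, and the illustrative model name from the launch sketch.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # must match the model the server was started with
    messages=[{"role": "user", "content": "Reply with a one-sentence greeting."}],
    max_tokens=32,
)
print(response.choices[0].message.content)
```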