Write some docs

Ashwin Bharambe 2024-11-08 18:02:43 -08:00
parent 38cdbdec5a
commit 211a7f8f28
3 changed files with 126 additions and 18 deletions

@@ -80,6 +80,11 @@ Llama3.1-8B-Instruct Llama3.2-1B Llama3.2-3B-Instruct Llama-
:::
:::{tab-item} vLLM
##### System Requirements
Access to a single-node GPU to start a vLLM server.
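You can sanity-check GPU visibility before starting the server (assuming NVIDIA drivers and the `nvidia-smi` utility are installed):
```
$ nvidia-smi
```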
:::
:::{tab-item} TGI
##### System Requirements
Access to a single-node GPU to start a TGI server.
@@ -119,6 +124,22 @@ docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v ./run.yaml:/root/my-run.
```
:::
:::{tab-item} vLLM
```
$ cd llama-stack/distributions/remote-vllm && docker compose up
```
The script will first start a vLLM server on port 8000, then start the Llama Stack distribution server, which connects to it for inference. You should see output like the following:
```
<TO BE FILLED>
```
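Once both servers are running, a quick way to confirm the vLLM server is serving requests is to query its OpenAI-compatible models endpoint (a minimal sketch, assuming the default port 8000 from above):
```
$ curl http://localhost:8000/v1/models
```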
To stop the servers:
```
docker compose down
```
:::
:::{tab-item} TGI
```
$ cd llama-stack/distributions/tgi && docker compose up