distro readmes with model serving instructions (#339)

* readme updates

* quantized compose

* dell tgi

* config update

* readme

* update model serving readmes

* update

* update

* config
Xi Yan 2024-10-28 17:47:14 -07:00 committed by GitHub
parent a70a4706fc
commit ae671eaf7a
8 changed files with 136 additions and 4 deletions


@@ -89,3 +89,28 @@ inference:
llama stack build --template ollama --image-type conda
llama stack run ./gpu/run.yaml
```
### Model Serving
To serve a new model with `ollama`:
```
ollama run <model_name>
```
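For example, to serve the fp16 Llama 3.1 8B Instruct model shown in the sample output below (the `--keepalive` value is illustrative; it keeps the model loaded between requests instead of letting ollama unload it after the default idle timeout):
```
# Pull (if needed) and load the fp16 Llama 3.1 8B Instruct model.
# --keepalive keeps the model resident between requests; 60m is illustrative.
ollama run llama3.1:8b-instruct-fp16 --keepalive 60m
```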
To confirm that the model is being served correctly, run `ollama ps` to list the models ollama currently has loaded.
```
$ ollama ps
NAME ID SIZE PROCESSOR UNTIL
llama3.1:8b-instruct-fp16 4aacac419454 17 GB 100% GPU 4 minutes from now
```
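You can also query ollama's HTTP API directly to list the models available to the instance. A minimal check, assuming ollama is listening on its default port 11434:
```
# List models known to this ollama instance (default port 11434).
curl http://localhost:11434/api/tags
```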
To verify that the model served by ollama is correctly connected to the Llama Stack server:
```
$ llama-stack-client models list
+----------------------+----------------------+---------------+-----------------------------------------------+
| identifier | llama_model | provider_id | metadata |
+======================+======================+===============+===============================================+
| Llama3.1-8B-Instruct | Llama3.1-8B-Instruct | ollama0 | {'ollama_model': 'llama3.1:8b-instruct-fp16'} |
+----------------------+----------------------+---------------+-----------------------------------------------+
```
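As an end-to-end check, you can send a test inference request through the Llama Stack server itself. A minimal sketch, assuming the server listens on the default port 5000 and exposes an `/inference/chat_completion` route; adjust the host, port, route, and payload to match your `run.yaml`:
```
# Send a single-turn chat completion through the Llama Stack server.
# Port 5000 and the /inference/chat_completion route are assumptions;
# adjust to match your deployment.
curl -X POST http://localhost:5000/inference/chat_completion \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Llama3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": false
  }'
```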