update model serving readmes

This commit is contained in:
Xi Yan 2024-10-28 17:39:55 -07:00
parent e6ee4c10b4
commit 8e8056e8da
6 changed files with 112 additions and 0 deletions


@@ -89,3 +89,28 @@ inference:
llama stack build --template ollama --image-type conda
llama stack run ./gpu/run.yaml
```
### Model Serving
To serve a new model with `ollama`, run:
```
ollama run <model_name>
```
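For example, to serve the model used in this template (the same tag that appears in the `ollama ps` output below):
```
ollama run llama3.1:8b-instruct-fp16
```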
To make sure that the model is being served correctly, run `ollama ps` to list the models ollama is currently serving:
```
$ ollama ps
NAME ID SIZE PROCESSOR UNTIL
llama3.1:8b-instruct-fp16 4aacac419454 17 GB 100% GPU 4 minutes from now
```
To verify that the model served by ollama is correctly connected to the Llama Stack server, run:
```
$ llama-stack-client models list
+----------------------+----------------------+---------------+-----------------------------------------------+
| identifier | llama_model | provider_id | metadata |
+======================+======================+===============+===============================================+
| Llama3.1-8B-Instruct | Llama3.1-8B-Instruct | ollama0 | {'ollama_model': 'llama3.1:8b-instruct-fp16'} |
+----------------------+----------------------+---------------+-----------------------------------------------+
```
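The same check can also be done programmatically. Below is a minimal sketch using the `llama_stack_client` Python package; it assumes the package is installed and that the Llama Stack server listens on `localhost:5000` (adjust the URL to match the port in your `run.yaml`), and the printed field names are assumed to match the CLI table columns above.
```
# Minimal sketch: list the models registered with the Llama Stack server.
# Assumes llama-stack-client is installed and the server is on localhost:5000.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# The ollama-backed model should show up with its provider id and
# ollama_model metadata, mirroring the CLI table above.
for model in client.models.list():
    print(model.identifier, model.provider_id, model.metadata)
```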