update model serving readmes

2025-10-15 22:47:59 +00:00 · 2024-10-28 17:39:55 -07:00 · 2024-10-28 17:39:55 -07:00 · 8e8056e8da
commit 8e8056e8da
parent e6ee4c10b4
6 changed files with 112 additions and 0 deletions
--- a/distributions/meta-reference-gpu/README.md
+++ b/distributions/meta-reference-gpu/README.md
@@ -84,3 +84,19 @@ memory:
 ```

 3. Run `docker compose up` with the updated `run.yaml` file.
+
+### Serving a new model
+To change the model served by the distribution, update `config.model` in `run.yaml`. Make sure the checkpoint for that model has already been downloaded to `~/.llama`.
+```
+inference:
+  - provider_id: meta0
+    provider_type: meta-reference
+    config:
+      model: Llama3.2-11B-Vision-Instruct
+      quantization: null
+      torch_seed: null
+      max_seq_len: 4096
+      max_batch_size: 1
+```
+
+Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints.
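
Review note: the README asks the reader to confirm that the checkpoint is present in `~/.llama` before changing `config.model`. A minimal sketch of that check follows. It is not part of this commit. The helper names `checkpoint_dir` and `has_checkpoint` are hypothetical, and the `checkpoints/<model>` layout under `~/.llama` is an assumption that may differ on your machine.

```python
from pathlib import Path
from typing import Optional


def checkpoint_dir(model: str, root: Optional[Path] = None) -> Path:
    """Return the directory where the checkpoint for `model` is assumed to live."""
    # Assumption: checkpoints sit under <root>/checkpoints/<model>.
    # Adjust this if your download location is laid out differently.
    base = root if root is not None else Path.home() / ".llama"
    return base / "checkpoints" / model


def has_checkpoint(model: str, root: Optional[Path] = None) -> bool:
    """Return True if the assumed checkpoint directory exists and is not empty."""
    path = checkpoint_dir(model, root)
    return path.is_dir() and any(path.iterdir())


if __name__ == "__main__":
    model = "Llama3.2-11B-Vision-Instruct"  # the value of config.model in run.yaml
    if has_checkpoint(model):
        print(f"Found checkpoint files for {model} in {checkpoint_dir(model)}")
    else:
        print(f"No checkpoint found for {model}; run `llama model download` first")
```

Running the check before `docker compose up` can surface a missing checkpoint early, though it only tests that the directory is non-empty and does not validate the files themselves.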