From c94fae5ab1a93a2721d8387c5ede4ec7dc7d85e9 Mon Sep 17 00:00:00 2001
From: Xi Yan
Date: Wed, 30 Oct 2024 11:13:01 -0700
Subject: [PATCH] tabs

---
 .../distributions/fireworks.md                |   9 +-
 .../getting_started/distributions/together.md |   5 +-
 docs/source/getting_started/index.md          | 230 +++++++++++++++++-
 3 files changed, 224 insertions(+), 20 deletions(-)

diff --git a/docs/source/getting_started/distributions/fireworks.md b/docs/source/getting_started/distributions/fireworks.md
index 100a10794..ee46cd18d 100644
--- a/docs/source/getting_started/distributions/fireworks.md
+++ b/docs/source/getting_started/distributions/fireworks.md
@@ -12,15 +12,12 @@ The `llamastack/distribution-fireworks` distribution consists of the following provider configurations.
 
 ### Step 1. Start the Distribution (Single Node CPU)
 
-#### (Option 1) Start Distribution Via Conda
+#### (Option 1) Start Distribution Via Docker
 
 > [!NOTE]
 > This assumes you have a hosted endpoint at Fireworks with an API key.
 
 ```
-$ cd distributions/fireworks
-$ ls
-compose.yaml  run.yaml
-$ docker compose up
+$ cd distributions/fireworks && docker compose up
 ```
 
 Make sure the inference provider in your `run.yaml` file points to the correct Fireworks server endpoint. E.g.
@@ -44,7 +41,7 @@ llama stack run ./run.yaml
 
 ### (Optional) Model Serving
 
-Use `llama-stack-client models list` to chekc the available models served by Fireworks.
+Use `llama-stack-client models list` to check the available models served by Fireworks.
 ```
 $ llama-stack-client models list
 +------------------------------+------------------------------+---------------+------------+
diff --git a/docs/source/getting_started/distributions/together.md b/docs/source/getting_started/distributions/together.md
index 5f9c90071..6a4142361 100644
--- a/docs/source/getting_started/distributions/together.md
+++ b/docs/source/getting_started/distributions/together.md
@@ -17,10 +17,7 @@ The `llamastack/distribution-together` distribution consists of the following provider configurations.
 
 > This assumes you have a hosted endpoint at Together with an API key.
 
 ```
-$ cd distributions/together
-$ ls
-compose.yaml  run.yaml
-$ docker compose up
+$ cd distributions/together && docker compose up
 ```
 
 Make sure the inference provider in your `run.yaml` file points to the correct Together server endpoint. E.g.
diff --git a/docs/source/getting_started/index.md b/docs/source/getting_started/index.md
index 6d6e953e8..1aa974e11 100644
--- a/docs/source/getting_started/index.md
+++ b/docs/source/getting_started/index.md
@@ -62,10 +62,24 @@ $ ls ~/.llama/checkpoints
 Llama3.1-8B           Llama3.2-11B-Vision-Instruct  Llama3.2-1B-Instruct  Llama3.2-90B-Vision-Instruct  Llama-Guard-3-8B
 Llama3.1-8B-Instruct  Llama3.2-1B                   Llama3.2-3B-Instruct  Llama-Guard-3-1B              Prompt-Guard-86M
 ```
+
+> This assumes you have access to a GPU to run the server locally.
 :::
 
 :::{tab-item} tgi
-This assumes you have access to GPU to start a TGI server with access to your GPU.
+Access to a single-node GPU to start a TGI server.
 :::
+
+:::{tab-item} ollama
+Access to a single-node CPU machine able to run ollama.
+:::
+
+:::{tab-item} together
+Access to a single-node CPU machine, plus a Together hosted endpoint with an API key from [together.ai](https://api.together.xyz/signin).
+:::
+
+:::{tab-item} fireworks
+Access to a single-node CPU machine, plus a Fireworks hosted endpoint with an API key from [fireworks.ai](https://fireworks.ai/).
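+
+If you'd like to sanity-check the key before wiring it into Llama Stack, one option is a direct call to Fireworks' OpenAI-compatible endpoint. This is just a sketch — the model id is illustrative, and the key is assumed to be exported as `FIREWORKS_API_KEY`:
+```
+# expects your key in the FIREWORKS_API_KEY environment variable
+curl https://api.fireworks.ai/inference/v1/chat/completions \
+  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{"model": "accounts/fireworks/models/llama-v3p1-8b-instruct",
+       "messages": [{"role": "user", "content": "Hello"}]}'
+```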
+:::
 
 ::::
 
@@ -80,14 +94,6 @@ $ cd distributions/meta-reference-gpu && docker compose up
 ```
 
-> [!NOTE]
-> This assumes you have access to GPU to start a local server with access to your GPU.
-
-
-> [!NOTE]
-> `~/.llama` should be the path containing downloaded weights of Llama models.
-
-
 This will download and start running a pre-built docker container. Alternatively, you may use the following commands:
 
 ```
@@ -117,6 +123,65 @@ docker compose down
 ```
 :::
+
+:::{tab-item} ollama
+```
+$ cd distributions/ollama/cpu && docker compose up
+```
+
+You will see output similar to the following:
+```
+[ollama]     | [GIN] 2024/10/18 - 21:19:41 | 200 |  226.841µs |  ::1 | GET  "/api/ps"
+[ollama]     | [GIN] 2024/10/18 - 21:19:42 | 200 |   60.908µs |  ::1 | GET  "/api/ps"
+INFO:     Started server process [1]
+INFO:     Waiting for application startup.
+INFO:     Application startup complete.
+INFO:     Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
+[llamastack] | Resolved 12 providers
+[llamastack] |  inner-inference => ollama0
+[llamastack] |  models => __routing_table__
+[llamastack] |  inference => __autorouted__
+```
+
+To kill the server:
+```
+docker compose down
+```
+:::
+
+:::{tab-item} fireworks
+```
+$ cd distributions/fireworks && docker compose up
+```
+
+Make sure the inference provider in your `run.yaml` file points to the correct Fireworks server endpoint. E.g.
+```
+inference:
+  - provider_id: fireworks
+    provider_type: remote::fireworks
+    config:
+      url: https://api.fireworks.ai/inference
+      api_key: <your api key>
+```
+:::
+
+:::{tab-item} together
+```
+$ cd distributions/together && docker compose up
+```
+
+Make sure the inference provider in your `run.yaml` file points to the correct Together server endpoint. E.g.
+```
+inference:
+  - provider_id: together
+    provider_type: remote::together
+    config:
+      url: https://api.together.xyz/v1
+      api_key: <your api key>
+```
+:::
+
 ::::
 
 **Via Conda**
 
@@ -147,6 +212,78 @@ llama stack run ./gpu/run.yaml
 ```
 :::
 
+:::{tab-item} ollama
+
+If you wish to separately spin up an Ollama server and connect it to Llama Stack, you may use the following commands.
+
+#### Start the Ollama server
+- Please check the [Ollama Documentation](https://github.com/ollama/ollama) for more details.
+
+**Via Docker**
+```
+docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
+```
+
+**Via CLI**
+```
+ollama run <model_id>
+```
+
+#### Start the Llama Stack server pointing to the Ollama server
+
+Make sure the inference provider in your `run.yaml` file points to the correct Ollama endpoint. E.g.
+```
+inference:
+  - provider_id: ollama0
+    provider_type: remote::ollama
+    config:
+      url: http://127.0.0.1:11434
+```
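+
+Before starting Llama Stack, it's worth confirming the Ollama server is actually reachable at that URL. A minimal check, assuming the default port from the Docker command above (`/api/ps` is the same route that appears in the server logs earlier on this page):
+```
+# should return JSON describing the currently loaded models
+curl http://127.0.0.1:11434/api/ps
+```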
+
+```
+llama stack build --template ollama --image-type conda
+llama stack run ./gpu/run.yaml
+```
+
+:::
+
+:::{tab-item} fireworks
+
+```bash
+llama stack build --template fireworks --image-type conda
+# -- modify run.yaml to a valid Fireworks server endpoint
+llama stack run ./run.yaml
+```
+
+Make sure the inference provider in your `run.yaml` file points to the correct Fireworks server endpoint. E.g.
+```
+inference:
+  - provider_id: fireworks
+    provider_type: remote::fireworks
+    config:
+      url: https://api.fireworks.ai/inference
+      api_key: <your api key>
+```
+:::
+
+:::{tab-item} together
+
+```bash
+llama stack build --template together --image-type conda
+# -- modify run.yaml to a valid Together server endpoint
+llama stack run ./run.yaml
+```
+
+Make sure the inference provider in your `run.yaml` file points to the correct Together server endpoint. E.g.
+```
+inference:
+  - provider_id: together
+    provider_type: remote::together
+    config:
+      url: https://api.together.xyz/v1
+      api_key: <your api key>
+```
+:::
+
 ::::
 
@@ -170,6 +307,33 @@ inference:
 
 Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints.
 :::
 
+:::{tab-item} tgi
+To serve a new model with `tgi`, change the docker command flag `--model-id <model_id>`.
+
+This can be done by editing the `command` args in `compose.yaml`. E.g., replace "Llama-3.2-1B-Instruct" with the model you want to serve.
+
+```
+command: ["--dtype", "bfloat16", "--usage-stats", "on", "--sharded", "false", "--model-id", "meta-llama/Llama-3.2-1B-Instruct", "--port", "5009", "--cuda-memory-fraction", "0.3"]
+```
+
+or by changing the docker run command's `--model-id` flag:
+```
+docker run --rm -it -v $HOME/.cache/huggingface:/data -p 5009:5009 --gpus all ghcr.io/huggingface/text-generation-inference:latest --dtype bfloat16 --usage-stats on --sharded false --model-id meta-llama/Llama-3.2-1B-Instruct --port 5009
+```
+
+In `run.yaml`, make sure the inference provider points to the TGI server endpoint serving your model.
+```
+inference:
+  - provider_id: tgi0
+    provider_type: remote::tgi
+    config:
+      url: http://127.0.0.1:5009
+```
+
+Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints.
+:::
+
 :::{tab-item} ollama
 You can use ollama for managing model downloads.
 
 ```
 ollama pull llama3.1:8b-instruct-fp16
 ollama pull llama3.1:70b-instruct-fp16
 ```
 
-> [!NOTE]
 > Please check the [OLLAMA_SUPPORTED_MODELS](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/adapters/inference/ollama/ollama.py) for the supported Ollama models.
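+
+Once the pulls complete, a quick optional check confirms the models are available locally before you query them through Llama Stack:
+```
+# lists locally downloaded models with their tags and sizes
+ollama list
+```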
 
@@ -206,6 +369,53 @@ $ llama-stack-client models list
 ```
 :::
 
+:::{tab-item} together
+Use `llama-stack-client models list` to check the available models served by Together.
+
+```
+$ llama-stack-client models list
++------------------------------+------------------------------+---------------+------------+
+| identifier                   | llama_model                  | provider_id   | metadata   |
++==============================+==============================+===============+============+
+| Llama3.1-8B-Instruct         | Llama3.1-8B-Instruct         | together0     | {}         |
++------------------------------+------------------------------+---------------+------------+
+| Llama3.1-70B-Instruct        | Llama3.1-70B-Instruct        | together0     | {}         |
++------------------------------+------------------------------+---------------+------------+
+| Llama3.1-405B-Instruct       | Llama3.1-405B-Instruct       | together0     | {}         |
++------------------------------+------------------------------+---------------+------------+
+| Llama3.2-3B-Instruct         | Llama3.2-3B-Instruct         | together0     | {}         |
++------------------------------+------------------------------+---------------+------------+
+| Llama3.2-11B-Vision-Instruct | Llama3.2-11B-Vision-Instruct | together0     | {}         |
++------------------------------+------------------------------+---------------+------------+
+| Llama3.2-90B-Vision-Instruct | Llama3.2-90B-Vision-Instruct | together0     | {}         |
++------------------------------+------------------------------+---------------+------------+
+```
+:::
+
+:::{tab-item} fireworks
+Use `llama-stack-client models list` to check the available models served by Fireworks.
+```
+$ llama-stack-client models list
++------------------------------+------------------------------+---------------+------------+
+| identifier                   | llama_model                  | provider_id   | metadata   |
++==============================+==============================+===============+============+
+| Llama3.1-8B-Instruct         | Llama3.1-8B-Instruct         | fireworks0    | {}         |
++------------------------------+------------------------------+---------------+------------+
+| Llama3.1-70B-Instruct        | Llama3.1-70B-Instruct        | fireworks0    | {}         |
++------------------------------+------------------------------+---------------+------------+
+| Llama3.1-405B-Instruct       | Llama3.1-405B-Instruct       | fireworks0    | {}         |
++------------------------------+------------------------------+---------------+------------+
+| Llama3.2-1B-Instruct         | Llama3.2-1B-Instruct         | fireworks0    | {}         |
++------------------------------+------------------------------+---------------+------------+
+| Llama3.2-3B-Instruct         | Llama3.2-3B-Instruct         | fireworks0    | {}         |
++------------------------------+------------------------------+---------------+------------+
+| Llama3.2-11B-Vision-Instruct | Llama3.2-11B-Vision-Instruct | fireworks0    | {}         |
++------------------------------+------------------------------+---------------+------------+
+| Llama3.2-90B-Vision-Instruct | Llama3.2-90B-Vision-Instruct | fireworks0    | {}         |
++------------------------------+------------------------------+---------------+------------+
+```
+:::
+
 ::::
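+
+With models listed, you can smoke-test inference end to end against the local Llama Stack server. This is a sketch, not part of the distribution setup: it assumes the server is listening on port 5000 (as in the logs above) and that your build exposes the `/inference/chat_completion` route — adjust the route, model name, and payload to your version:
+```
+# send a single non-streaming chat completion request to the local server
+curl http://localhost:5000/inference/chat_completion \
+  -H "Content-Type: application/json" \
+  -d '{"model": "Llama3.1-8B-Instruct",
+       "messages": [{"role": "user", "content": "Hello"}],
+       "stream": false}'
+```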