Convert ollama to the new model

parent 028530546f
commit a061f3f8c1

14 changed files with 379 additions and 113 deletions
@@ -2,33 +2,40 @@
The `llamastack/distribution-ollama` distribution consists of the following provider configurations.
| **API** | **Inference** | **Agents** | **Memory** | **Safety** | **Telemetry** |
|----------------- |---------------- |---------------- |------------------------------------ |---------------- |---------------- |
| **Provider(s)** | remote::ollama | meta-reference | remote::pgvector, remote::chromadb | meta-reference | meta-reference |
Provider Configuration
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ API ┃ Provider(s) ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ agents │ `inline::meta-reference` │
│ inference │ `remote::ollama` │
│ memory │ `inline::faiss`, `remote::chromadb`, `remote::pgvector` │
│ safety │ `inline::llama-guard` │
│ telemetry │ `inline::meta-reference` │
└───────────┴─────────────────────────────────────────────────────────┘

You should use this distribution if you have a regular desktop machine without very powerful GPUs. If you do have powerful GPUs, you can still use this distribution, since Ollama supports GPU acceleration.

### Environment Variables
The following environment variables can be configured:
- `LLAMASTACK_PORT`: Port for the Llama Stack distribution server (default: `5001`)
- `INFERENCE_MODEL`: Inference model loaded into the Ollama server (default: `meta-llama/Llama-3.2-3B-Instruct`)
- `OLLAMA_PORT`: Port of the Ollama server (default: `14343`)
- `SAFETY_MODEL`: Name of the safety (Llama-Guard) model to use (default: `meta-llama/Llama-Guard-3-1B`)
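
As a minimal sketch, the defaults listed above can be overridden by exporting the variables before starting the stack; the values shown here are simply the documented defaults, so adjust them for your setup.

```bash
# Override the documented defaults before starting the stack (adjust as needed).
export LLAMASTACK_PORT=5001
export INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
export OLLAMA_PORT=14343
export SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
```
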
### Models
The following models are configured by default:
- `${env.INFERENCE_MODEL}`
- `${env.SAFETY_MODEL}`
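
The corresponding models must already be served by Ollama before the stack starts (the `ollama run <model_id>` step shown later in this guide). A rough sketch follows; the Ollama tags `llama3.2:3b-instruct-fp16` and `llama-guard3:1b` are assumptions for illustration, so use whatever tags `ollama list` reports on your machine.

```bash
# Keep the inference and safety models loaded in Ollama while the stack runs.
# The tags below are illustrative assumptions, not guaranteed to match your install.
ollama run llama3.2:3b-instruct-fp16 --keepalive 60m
ollama run llama-guard3:1b --keepalive 60m
```
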
## Using Docker Compose

You can use `docker compose` to start an Ollama server and connect it with a Llama Stack server in a single command.

### Docker: Start the Distribution (Single Node, regular desktop machine)
> [!NOTE]
> This starts an Ollama server with CPU only. Please see the [Ollama documentation](https://github.com/ollama/ollama) for details on serving models with CPU only.
```bash
$ cd distributions/ollama; docker compose up
```
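
To confirm the containers came up, here is a quick sketch using standard `docker compose` commands; the service name `ollama` is an assumption based on the log output shown further below.

```bash
# List the services started by the compose file, then follow the Ollama logs.
docker compose ps
docker compose logs -f ollama
```
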

### Docker: Start a Distribution (Single Node with NVIDIA GPUs)
> [!NOTE]
> This assumes you have access to a machine with an NVIDIA GPU, so the Ollama server can use the GPU.
```bash
$ cd distributions/ollama-gpu; docker compose up
```

You will see output similar to the following:
```bash
[ollama] | [GIN] 2024/10/18 - 21:19:41 | 200 | 226.841µs | ::1 | GET "/api/ps"
```
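
Once the server is up, a simple way to verify that Ollama is responding is a plain HTTP call to the same `/api/ps` endpoint that appears in the log line above; the port assumes the documented `OLLAMA_PORT` default.

```bash
# Returns the models currently loaded by Ollama (an empty list if none are running).
curl http://localhost:14343/api/ps
```
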
@@ -71,7 +78,7 @@ ollama run <model_id>
```bash
llama stack build --template ollama --image-type conda
llama stack run ./gpu/run.yaml
llama stack run run.yaml
```
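
Because `run.yaml` refers to `${env.INFERENCE_MODEL}` and `${env.SAFETY_MODEL}` (see the Models section above), those variables must be set in the environment for a conda-based run. A minimal sketch using the documented defaults:

```bash
# run.yaml resolves these from the environment at startup.
export INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
export SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
llama stack run run.yaml
```
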
**Via Docker**