distributions

Xi Yan 2024-10-29 15:00:58 -07:00
parent 3fb9a8e82e
commit 39872ca4b4
7 changed files with 336 additions and 48 deletions


@@ -7,7 +7,7 @@ The `llamastack/distribution-ollama` distribution consists of the following prov
| **Provider(s)** | remote::ollama | meta-reference | remote::pgvector, remote::chroma | remote::ollama | meta-reference |
-### Start a Distribution (Single Node GPU)
+### Docker: Start a Distribution (Single Node GPU)
> [!NOTE]
> This assumes you have access to a GPU, since the Ollama server will be started with access to your GPU.
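If your checkout keeps the compose files under a `gpu/` directory (an assumption based on the `./gpu/run.yaml` path used later on this page), starting the distribution is a minimal sketch like:
```
# Hypothetical directory layout -- adjust to wherever compose.yaml and run.yaml live in your checkout.
cd distributions/ollama/gpu
docker compose up
```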
@@ -38,7 +38,7 @@ To kill the server
docker compose down
```
-### Start the Distribution (Single Node CPU)
+### Docker: Start the Distribution (Single Node CPU)
> [!NOTE]
> This will start an Ollama server with CPU only; please see the [Ollama documentation](https://github.com/ollama/ollama) for serving models on CPU.
@@ -50,7 +50,7 @@ compose.yaml run.yaml
$ docker compose up
```
-### (Alternative) ollama run + llama stack run
+### Conda: ollama run + llama stack run
If you wish to separately spin up an Ollama server and connect it to Llama Stack, you may use the following commands.
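As a rough sketch of the standalone Ollama side (the model tag below is only an illustrative assumption; substitute the model you intend to serve):
```
# Start the Ollama server; by default it listens on port 11434.
ollama serve
# In a second terminal, pull and load the model to serve (example tag only).
ollama run llama3.1:8b-instruct-fp16
```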
@@ -69,6 +69,13 @@ ollama run <model_id>
#### Start Llama Stack server pointing to Ollama server
+**Via Conda**
+```
+llama stack build --template ollama --image-type conda
+llama stack run ./gpu/run.yaml
+```
**Via Docker**
```
docker run --network host -it -p 5000:5000 -v ~/.llama:/root/.llama -v ./gpu/run.yaml:/root/llamastack-run-ollama.yaml --gpus=all llamastack/distribution-ollama --yaml_config /root/llamastack-run-ollama.yaml
```
@@ -83,13 +90,6 @@ inference:
url: http://127.0.0.1:14343
```
-**Via Conda**
-```
-llama stack build --template ollama --image-type conda
-llama stack run ./gpu/run.yaml
-```
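A quick sanity check that the `url` in `run.yaml` actually reaches your Ollama server is to query its model list (the host and port below assume the mapping shown above; adjust if yours differs):
```
# Should return a JSON object listing the models available on the Ollama server.
curl http://127.0.0.1:14343/api/tags
```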
### Model Serving
#### Downloading model via Ollama
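A minimal sketch of this step, assuming you want one of the Llama 3.1 instruct builds (the exact tag is an assumption; substitute the model you need):
```
# Download a model into the local Ollama store (example tag only).
ollama pull llama3.1:8b-instruct-fp16
# Verify it is available locally.
ollama list
```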