Convert ollama to the new model

Ashwin Bharambe 2024-11-17 15:19:55 -08:00
parent 028530546f
commit a061f3f8c1
14 changed files with 379 additions and 113 deletions


@ -2,33 +2,40 @@
The `llamastack/distribution-ollama` distribution consists of the following provider configurations.
| **API** | **Inference** | **Agents** | **Memory** | **Safety** | **Telemetry** |
|----------------- |---------------- |---------------- |------------------------------------ |---------------- |---------------- |
| **Provider(s)** | remote::ollama | meta-reference | remote::pgvector, remote::chromadb | meta-reference | meta-reference |
Provider Configuration
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ API       ┃ Provider(s)                                             ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ agents    │ `inline::meta-reference`                                │
│ inference │ `remote::ollama`                                        │
│ memory    │ `inline::faiss`, `remote::chromadb`, `remote::pgvector` │
│ safety    │ `inline::llama-guard`                                   │
│ telemetry │ `inline::meta-reference`                                │
└───────────┴─────────────────────────────────────────────────────────┘
You should use this distribution if you have a regular desktop machine without very powerful GPUs. Of course, if you do have powerful GPUs, you can still use this distribution since Ollama supports GPU acceleration.
### Environment Variables
The following environment variables can be configured (an example of overriding them follows this list):
- `LLAMASTACK_PORT`: Port for the Llama Stack distribution server (default: `5001`)
- `INFERENCE_MODEL`: Inference model loaded into the Ollama server (default: `meta-llama/Llama-3.2-3B-Instruct`)
- `OLLAMA_PORT`: Port of the Ollama server (default: `14343`)
- `SAFETY_MODEL`: Name of the safety (Llama-Guard) model to use (default: `meta-llama/Llama-Guard-3-1B`)
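For example, to override these defaults when starting the stack with `docker compose`, you can export them in your shell first. This is a minimal sketch; it assumes the compose file interpolates these variables, and the values shown are just the defaults listed above:
```bash
# Override the defaults before bringing the stack up (values shown are the defaults).
export LLAMASTACK_PORT=5001
export OLLAMA_PORT=14343
export INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
export SAFETY_MODEL=meta-llama/Llama-Guard-3-1B

cd distributions/ollama
docker compose up
```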
### Models
The following models are configured by default:
- `${env.INFERENCE_MODEL}`
- `${env.SAFETY_MODEL}`
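These are placeholders resolved from the environment variables above; the corresponding models also need to be available in Ollama itself. The Ollama tags in the sketch below are assumptions and depend on which build or quantization you want to serve:
```bash
# Pull models into Ollama first (tags are illustrative, not prescriptive).
ollama pull llama3.2:3b-instruct-fp16   # for meta-llama/Llama-3.2-3B-Instruct
ollama pull llama-guard3:1b             # for meta-llama/Llama-Guard-3-1B
```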
## Using Docker Compose
You can use `docker compose` to start an Ollama server and a Llama Stack server connected to it with a single command.
### Docker: Start the Distribution (Single Node, regular desktop machine)
> [!NOTE]
> This starts an Ollama server in CPU-only mode. Please see the [Ollama documentation](https://github.com/ollama/ollama) for details on serving models on CPU.
```bash
$ cd distributions/ollama; docker compose up
```
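Once the containers are up, you can sanity-check that the Ollama server is reachable. This assumes the default `OLLAMA_PORT` of `14343` from above and that the port is published on localhost:
```bash
# List locally available models and the models currently loaded by Ollama.
curl http://localhost:14343/api/tags
curl http://localhost:14343/api/ps
```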
### Docker: Start the Distribution (Single Node with NVIDIA GPUs)
> [!NOTE]
> This assumes you have a GPU available and that the Ollama server can access it.
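If you want to confirm that Docker can see the GPU before starting the distribution, a quick check along these lines can help (requires the NVIDIA Container Toolkit; the CUDA image tag here is only an example):
```bash
# Should print your GPU(s) if the NVIDIA Container Toolkit is configured correctly.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```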
```bash
$ cd distributions/ollama-gpu; docker compose up
```
You will see output similar to the following:
```bash
[ollama] | [GIN] 2024/10/18 - 21:19:41 | 200 | 226.841µs | ::1 | GET "/api/ps"
@ -71,7 +78,7 @@ ollama run <model_id>
```bash
llama stack build --template ollama --image-type conda
llama stack run ./gpu/run.yaml
llama stack run run.yaml
```
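Once the server is up, you can point a client at it to confirm the models are registered. For example, with the `llama-stack-client` CLI (a sketch that assumes the server is listening on the default port `5001`):
```bash
# Configure the client against the local server, then list registered models.
llama-stack-client configure --endpoint http://localhost:5001
llama-stack-client models list
```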
**Via Docker**