Convert ollama to the new model

parent 028530546f
commit a061f3f8c1

14 changed files with 379 additions and 113 deletions
@@ -2,33 +2,40 @@
The `llamastack/distribution-ollama` distribution consists of the following provider configurations.
| **API** | **Inference** | **Agents** | **Memory** | **Safety** | **Telemetry** |
|----------------- |---------------- |---------------- |------------------------------------ |---------------- |---------------- |
| **Provider(s)** | remote::ollama | meta-reference | remote::pgvector, remote::chromadb | meta-reference | meta-reference |
Provider Configuration
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ API ┃ Provider(s) ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ agents │ `inline::meta-reference` │
│ inference │ `remote::ollama` │
│ memory │ `inline::faiss`, `remote::chromadb`, `remote::pgvector` │
│ safety │ `inline::llama-guard` │
│ telemetry │ `inline::meta-reference` │
└───────────┴─────────────────────────────────────────────────────────┘

You should use this distribution if you have a regular desktop machine without very powerful GPUs. If you do have powerful GPUs, you can still use this distribution, since Ollama supports GPU acceleration.

### Environment Variables
The following environment variables can be configured:
- `LLAMASTACK_PORT`: Port for the Llama Stack distribution server (default: `5001`)
- `INFERENCE_MODEL`: Inference model loaded into the Ollama server (default: `meta-llama/Llama-3.2-3B-Instruct`)
- `OLLAMA_PORT`: Port of the Ollama server (default: `14343`)
- `SAFETY_MODEL`: Name of the safety (Llama-Guard) model to use (default: `meta-llama/Llama-Guard-3-1B`)
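
As a minimal sketch, the defaults listed above can be overridden by exporting the variables before starting the stack; the values shown here are simply the documented defaults, so adjust them for your setup.

```bash
# Override the documented defaults before starting the stack (adjust as needed).
export LLAMASTACK_PORT=5001
export INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
export OLLAMA_PORT=14343
export SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
```
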
### Models
The following models are configured by default:
- `${env.INFERENCE_MODEL}`
- `${env.SAFETY_MODEL}`
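
The corresponding models must already be served by Ollama before the stack starts (the `ollama run <model_id>` step shown later in this guide). A rough sketch follows; the Ollama tags `llama3.2:3b-instruct-fp16` and `llama-guard3:1b` are assumptions for illustration, so use whatever tags `ollama list` reports on your machine.

```bash
# Keep the inference and safety models loaded in Ollama while the stack runs.
# The tags below are illustrative assumptions, not guaranteed to match your install.
ollama run llama3.2:3b-instruct-fp16 --keepalive 60m
ollama run llama-guard3:1b --keepalive 60m
```
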
## Using Docker Compose

You can use `docker compose` to start an Ollama server and connect it with a Llama Stack server in a single command.

### Docker: Start the Distribution (Single Node, regular desktop machine)
> [!NOTE]
> This starts an Ollama server with CPU only. Please see the [Ollama documentation](https://github.com/ollama/ollama) for details on serving models with CPU only.
```bash
$ cd distributions/ollama; docker compose up
```
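
To confirm the containers came up, here is a quick sketch using standard `docker compose` commands; the service name `ollama` is an assumption based on the log output shown further below.

```bash
# List the services started by the compose file, then follow the Ollama logs.
docker compose ps
docker compose logs -f ollama
```
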

### Docker: Start a Distribution (Single Node with NVIDIA GPUs)
> [!NOTE]
> This assumes you have access to a machine with an NVIDIA GPU, so the Ollama server can use the GPU.
```bash
$ cd distributions/ollama-gpu; docker compose up
```

You will see output similar to the following:
```bash
[ollama] | [GIN] 2024/10/18 - 21:19:41 | 200 | 226.841µs | ::1 | GET "/api/ps"
```
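
Once the server is up, a simple way to verify that Ollama is responding is a plain HTTP call to the same `/api/ps` endpoint that appears in the log line above; the port assumes the documented `OLLAMA_PORT` default.

```bash
# Returns the models currently loaded by Ollama (an empty list if none are running).
curl http://localhost:14343/api/ps
```
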
@@ -71,7 +78,7 @@ ollama run <model_id>
```bash
llama stack build --template ollama --image-type conda
llama stack run ./gpu/run.yaml
llama stack run run.yaml
```
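
Because `run.yaml` refers to `${env.INFERENCE_MODEL}` and `${env.SAFETY_MODEL}` (see the Models section above), those variables must be set in the environment for a conda-based run. A minimal sketch using the documented defaults:

```bash
# run.yaml resolves these from the environment at startup.
export INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
export SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
llama stack run run.yaml
```
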
**Via Docker**