Mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-08-12 13:00:39 +00:00
fix: use OLLAMA_URL to activate Ollama provider in starter (#2963)
We tried to always keep Ollama enabled. However, doing so makes the provider implementation half-assed -- should it error when it cannot connect to Ollama or not? What happens during periodic model refresh? Etc. Instead, do the same thing we do for vLLM -- use the `OLLAMA_URL` environment variable to conditionally enable the provider.

## Test Plan

Run `uv run llama stack build --template starter --image-type venv --run` with and without `OLLAMA_URL` set. Verify using `llama-stack-client provider list` that ollama is correctly enabled.
parent b69bafba30
commit fd2aaf4978
6 changed files with 23 additions and 41 deletions
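A minimal sketch of the test plan above, assuming Ollama is running locally on its default port (11434):

```bash
# With OLLAMA_URL set, the ollama provider should show up as enabled.
OLLAMA_URL=http://localhost:11434 \
  uv run llama stack build --template starter --image-type venv --run

# From another shell, list the active providers (command as given in the test plan).
llama-stack-client provider list

# Without OLLAMA_URL, the same command should start the server with ollama disabled.
uv run llama stack build --template starter --image-type venv --run
llama-stack-client provider list
```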
@@ -150,7 +150,7 @@
     "def run_llama_stack_server_background():\n",
     "    log_file = open(\"llama_stack_server.log\", \"w\")\n",
     "    process = subprocess.Popen(\n",
-    "        f\"uv run --with llama-stack llama stack run starter --image-type venv --env INFERENCE_MODEL=llama3.2:3b\",\n",
+    "        f\"OLLAMA_URL=http://localhost:11434 uv run --with llama-stack llama stack run starter --image-type venv\",\n",
     "        shell=True,\n",
     "        stdout=log_file,\n",
     "        stderr=log_file,\n",
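Outside the notebook, the changed cell now amounts to roughly the following shell invocation; the `--env INFERENCE_MODEL=...` flag is gone, and the `OLLAMA_URL` prefix is what activates the Ollama provider:

```bash
# Launch the starter distribution in the background, logging to the same file the notebook uses.
OLLAMA_URL=http://localhost:11434 \
  uv run --with llama-stack llama stack run starter --image-type venv \
  > llama_stack_server.log 2>&1 &
```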
@@ -100,10 +100,6 @@ The following environment variables can be configured:
 ### Model Configuration
 - `INFERENCE_MODEL`: HuggingFace model for serverless inference
 - `INFERENCE_ENDPOINT_NAME`: HuggingFace endpoint name
-- `OLLAMA_INFERENCE_MODEL`: Ollama model name
-- `OLLAMA_EMBEDDING_MODEL`: Ollama embedding model name
-- `OLLAMA_EMBEDDING_DIMENSION`: Ollama embedding dimension (default: `384`)
-- `VLLM_INFERENCE_MODEL`: vLLM model name
 
 ### Vector Database Configuration
 - `SQLITE_STORE_DIR`: SQLite store directory (default: `~/.llama/distributions/starter`)
@@ -127,43 +123,25 @@ The following environment variables can be configured:
 
 ## Enabling Providers
 
-You can enable specific providers by setting their provider ID to a valid value using environment variables. This is useful when you want to use certain providers or don't have the required API keys.
+You can enable specific providers by setting appropriate environment variables. For example,
 
-### Examples of Enabling Providers
-
-#### Enable FAISS Vector Provider
 ```bash
-export ENABLE_FAISS=faiss
+# self-hosted
+export OLLAMA_URL=http://localhost:11434 # enables the Ollama inference provider
+export VLLM_URL=http://localhost:8000/v1 # enables the vLLM inference provider
+export TGI_URL=http://localhost:8000/v1 # enables the TGI inference provider
+
+# cloud-hosted requiring API key configuration on the server
+export CEREBRAS_API_KEY=your_cerebras_api_key # enables the Cerebras inference provider
+export NVIDIA_API_KEY=your_nvidia_api_key # enables the NVIDIA inference provider
+
+# vector providers
+export MILVUS_URL=http://localhost:19530 # enables the Milvus vector provider
+export CHROMADB_URL=http://localhost:8000/v1 # enables the ChromaDB vector provider
+export PGVECTOR_DB=llama_stack_db # enables the PGVector vector provider
 ```
 
-#### Enable Ollama Models
-```bash
-export ENABLE_OLLAMA=ollama
-```
-
-#### Disable vLLM Models
-```bash
-export VLLM_INFERENCE_MODEL=__disabled__
-```
-
-#### Disable Optional Vector Providers
-```bash
-export ENABLE_SQLITE_VEC=__disabled__
-export ENABLE_CHROMADB=__disabled__
-export ENABLE_PGVECTOR=__disabled__
-```
-
-### Provider ID Patterns
-
-The starter distribution uses several patterns for provider IDs:
-
-1. **Direct provider IDs**: `faiss`, `ollama`, `vllm`
-2. **Environment-based provider IDs**: `${env.ENABLE_SQLITE_VEC:+sqlite-vec}`
-3. **Model-based provider IDs**: `${env.OLLAMA_INFERENCE_MODEL:__disabled__}`
-
-When using the `+` pattern (like `${env.ENABLE_SQLITE_VEC+sqlite-vec}`), the provider is enabled by default and can be disabled by setting the environment variable to `__disabled__`.
-
-When using the `:` pattern (like `${env.OLLAMA_INFERENCE_MODEL:__disabled__}`), the provider is disabled by default and can be enabled by setting the environment variable to a valid value.
+This distribution comes with a default "llama-guard" shield that can be enabled by setting the `SAFETY_MODEL` environment variable to point to an appropriate Llama Guard model id. Use `llama-stack-client models list` to see the list of available models.
 
 ## Running the Distribution
 
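The run configurations touched later in this commit gate each provider on an environment variable using `${env.VAR:+provider_id}`. That notation behaves like standard shell `${VAR:+word}` expansion, which is easy to sanity-check in bash (a toy illustration of the notation, not how llama-stack itself resolves the config):

```bash
# ${VAR:+word} expands to "word" only when VAR is set and non-empty.
unset OLLAMA_URL
echo "provider_id: '${OLLAMA_URL:+ollama}'"   # provider_id: ''       -> provider stays disabled

export OLLAMA_URL=http://localhost:11434
echo "provider_id: '${OLLAMA_URL:+ollama}'"   # provider_id: 'ollama' -> provider enabled
```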
@@ -16,9 +16,12 @@ as the inference [provider](../providers/inference/index) for a Llama Model.
 ```bash
 ollama run llama3.2:3b --keepalive 60m
 ```
+
 #### Step 2: Run the Llama Stack server
+
 We will use `uv` to run the Llama Stack server.
 ```bash
+OLLAMA_URL=http://localhost:11434 \
 uv run --with llama-stack llama stack build --template starter --image-type venv --run
 ```
 #### Step 3: Run the demo
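If the server from Step 2 starts but the Ollama provider does not come up, a quick sanity check is to confirm that the URL you exported is actually reachable (assuming the default port and Ollama's standard `/api/tags` model-listing endpoint):

```bash
# Should return a JSON list of locally available models, e.g. the llama3.2:3b pulled in Step 1.
curl -s http://localhost:11434/api/tags
```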
@@ -19,7 +19,7 @@ providers:
     config:
       base_url: https://api.cerebras.ai
       api_key: ${env.CEREBRAS_API_KEY:=}
-  - provider_id: ollama
+  - provider_id: ${env.OLLAMA_URL:+ollama}
     provider_type: remote::ollama
     config:
       url: ${env.OLLAMA_URL:=http://localhost:11434}
@@ -19,7 +19,7 @@ providers:
     config:
       base_url: https://api.cerebras.ai
       api_key: ${env.CEREBRAS_API_KEY:=}
-  - provider_id: ollama
+  - provider_id: ${env.OLLAMA_URL:+ollama}
     provider_type: remote::ollama
     config:
       url: ${env.OLLAMA_URL:=http://localhost:11434}
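Note the two different substitutions in the provider entry above: `${env.OLLAMA_URL:+ollama}` decides whether the provider is enabled at all, while `${env.OLLAMA_URL:=http://localhost:11434}` only supplies a default URL. The second form mirrors shell `${VAR:=default}` expansion (again just a bash illustration of the notation):

```bash
# ${VAR:=default} substitutes the default (and assigns it) when VAR is unset or empty.
unset OLLAMA_URL
echo "url: ${OLLAMA_URL:=http://localhost:11434}"   # url: http://localhost:11434
echo "OLLAMA_URL is now: $OLLAMA_URL"               # the default was also assigned to the variable
```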
@@ -66,6 +66,7 @@ ENABLED_INFERENCE_PROVIDERS = [
 ]
 
 INFERENCE_PROVIDER_IDS = {
+    "ollama": "${env.OLLAMA_URL:+ollama}",
     "vllm": "${env.VLLM_URL:+vllm}",
     "tgi": "${env.TGI_URL:+tgi}",
     "cerebras": "${env.CEREBRAS_API_KEY:+cerebras}",