Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-08-12 13:00:39 +00:00)
fix: use OLLAMA_URL to activate Ollama provider in starter (#2963)
We tried to always keep Ollama enabled. However, doing so makes the provider implementation half-baked: should it error when it cannot connect to Ollama or not? What happens during periodic model refresh? Instead, do the same thing we do for vLLM -- use the `OLLAMA_URL` environment variable to conditionally enable the provider.

## Test Plan

Run `uv run llama stack build --template starter --image-type venv --run` with and without `OLLAMA_URL` set. Verify using `llama-stack-client provider list` that ollama is correctly enabled.
Commit: fd2aaf4978 · Parent: b69bafba30
6 changed files with 23 additions and 41 deletions
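
The test plan above amounts to running the same build command twice, once without and once with `OLLAMA_URL`, and then checking the provider list. A minimal sketch, assuming a local Ollama server on its default port (the same URL used in the examples below):

```bash
# Without OLLAMA_URL: the ollama inference provider should stay disabled.
uv run llama stack build --template starter --image-type venv --run

# With OLLAMA_URL pointing at a local Ollama server: the provider should be enabled.
OLLAMA_URL=http://localhost:11434 \
  uv run llama stack build --template starter --image-type venv --run

# In a separate terminal, verify which providers are active (command as given in the test plan).
llama-stack-client provider list
```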
@@ -100,10 +100,6 @@ The following environment variables can be configured:
### Model Configuration
- `INFERENCE_MODEL`: HuggingFace model for serverless inference
- `INFERENCE_ENDPOINT_NAME`: HuggingFace endpoint name
- `OLLAMA_INFERENCE_MODEL`: Ollama model name
- `OLLAMA_EMBEDDING_MODEL`: Ollama embedding model name
- `OLLAMA_EMBEDDING_DIMENSION`: Ollama embedding dimension (default: `384`)
- `VLLM_INFERENCE_MODEL`: vLLM model name
### Vector Database Configuration
- `SQLITE_STORE_DIR`: SQLite store directory (default: `~/.llama/distributions/starter`)
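
For reference, a minimal sketch of exporting a few of these variables before starting the distribution; the model ids below are illustrative placeholders, and the defaults match the values documented above:

```bash
# Illustrative values -- substitute the models and paths you actually use.
export INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct   # example HuggingFace model id
export OLLAMA_INFERENCE_MODEL=llama3.2:3b                 # example Ollama model id
export OLLAMA_EMBEDDING_DIMENSION=384                     # documented default
export SQLITE_STORE_DIR=~/.llama/distributions/starter    # documented default
```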
@@ -127,43 +123,25 @@ The following environment variables can be configured:
## Enabling Providers
You can enable specific providers by setting their provider ID to a valid value using environment variables. This is useful when you want to use certain providers or don't have the required API keys.
You can enable specific providers by setting appropriate environment variables. For example,
### Examples of Enabling Providers
#### Enable FAISS Vector Provider
```bash
export ENABLE_FAISS=faiss
# self-hosted
export OLLAMA_URL=http://localhost:11434 # enables the Ollama inference provider
export VLLM_URL=http://localhost:8000/v1 # enables the vLLM inference provider
export TGI_URL=http://localhost:8000/v1 # enables the TGI inference provider
# cloud-hosted requiring API key configuration on the server
export CEREBRAS_API_KEY=your_cerebras_api_key # enables the Cerebras inference provider
export NVIDIA_API_KEY=your_nvidia_api_key # enables the NVIDIA inference provider
# vector providers
export MILVUS_URL=http://localhost:19530 # enables the Milvus vector provider
export CHROMADB_URL=http://localhost:8000/v1 # enables the ChromaDB vector provider
export PGVECTOR_DB=llama_stack_db # enables the PGVector vector provider
```
#### Enable Ollama Models
```bash
export ENABLE_OLLAMA=ollama
```
#### Disable vLLM Models
```bash
export VLLM_INFERENCE_MODEL=__disabled__
```
#### Disable Optional Vector Providers
```bash
export ENABLE_SQLITE_VEC=__disabled__
export ENABLE_CHROMADB=__disabled__
export ENABLE_PGVECTOR=__disabled__
```
### Provider ID Patterns
The starter distribution uses several patterns for provider IDs:
1. **Direct provider IDs**: `faiss`, `ollama`, `vllm`
2. **Environment-based provider IDs**: `${env.ENABLE_SQLITE_VEC:+sqlite-vec}`
3. **Model-based provider IDs**: `${env.OLLAMA_INFERENCE_MODEL:__disabled__}`
When using the `+` pattern (like `${env.ENABLE_SQLITE_VEC:+sqlite-vec}`), the provider is enabled by default and can be disabled by setting the environment variable to `__disabled__`.
When using the `:` pattern (like `${env.OLLAMA_INFERENCE_MODEL:__disabled__}`), the provider is disabled by default and can be enabled by setting the environment variable to a valid value.
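
To make the two patterns concrete, a short sketch using variables that already appear on this page (the Ollama model id is only an example):

```bash
# `+` pattern: sqlite-vec is enabled by default; set the variable to __disabled__ to turn it off.
export ENABLE_SQLITE_VEC=__disabled__

# `:` pattern: the Ollama model entry is disabled by default; give the variable a value to turn it on.
export OLLAMA_INFERENCE_MODEL=llama3.2:3b   # example model id
```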
This distribution comes with a default "llama-guard" shield that can be enabled by setting the `SAFETY_MODEL` environment variable to point to an appropriate Llama Guard model id. Use `llama-stack-client models list` to see the list of available models.
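
A sketch of wiring up the shield, assuming one of the listed models is a Llama Guard variant (the model id below is only an example; pick one from the actual list):

```bash
# Inspect the models registered with the stack.
llama-stack-client models list

# Point SAFETY_MODEL at a Llama Guard model id from that list (example id shown).
export SAFETY_MODEL=meta-llama/Llama-Guard-3-8B
```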
## Running the Distribution
@@ -16,10 +16,13 @@ as the inference [provider](../providers/inference/index) for a Llama Model.
```bash
ollama run llama3.2:3b --keepalive 60m
```
#### Step 2: Run the Llama Stack server
We will use `uv` to run the Llama Stack server.
```bash
uv run --with llama-stack llama stack build --template starter --image-type venv --run
OLLAMA_URL=http://localhost:11434 \
uv run --with llama-stack llama stack build --template starter --image-type venv --run
```
#### Step 3: Run the demo
Now open up a new terminal and copy the following script into a file named `demo_script.py`.