Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han 2025-06-27 10:13:55 +02:00
parent bedfea38c3
commit 6b616cc780
No known key found for this signature in database
3 changed files with 34 additions and 140 deletions


@@ -15,7 +15,7 @@ The `llamastack/distribution-starter` distribution is a comprehensive, multi-pro
## Provider Composition

The starter distribution consists of the following provider configurations:

| API | Provider(s) |
|-----|-------------|
@@ -23,8 +23,9 @@ The starter distribution consists of the following provider configurations:
| datasetio | `remote::huggingface`, `inline::localfs` |
| eval | `inline::meta-reference` |
| files | `inline::localfs` |
| inference | `remote::openai`, `remote::fireworks`, `remote::together`, `remote::ollama`, `remote::anthropic`, `remote::gemini`, `remote::groq`, `remote::sambanova`, `remote::vllm`, `remote::tgi`, `remote::cerebras`, `remote::llama-openai-compat`, `remote::nvidia`, `remote::hf::serverless`, `remote::hf::endpoint`, `inline::sentence-transformers`, `remote::passthrough` |
| safety | `inline::llama-guard` |
| post_training | `inline::huggingface` |
| scoring | `inline::basic`, `inline::llm-as-judge`, `inline::braintrust` |
| telemetry | `inline::meta-reference` |
| tool_runtime | `remote::brave-search`, `remote::tavily-search`, `inline::rag-runtime`, `remote::model-context-protocol` |
@@ -34,8 +35,8 @@ The starter distribution consists of the following provider configurations:
The starter distribution includes a comprehensive set of inference providers:

### Hosted Providers
- **OpenAI**: GPT-4, GPT-3.5, O1, O3, O4 models and text embeddings - see the relevant provider configuration documentation for more details
- **Fireworks**: Llama 3.1, 3.2, 3.3, 4 Scout, 4 Maverick models and embeddings
- **Together**: Llama 3.1, 3.2, 3.3, 4 Scout, 4 Maverick models and embeddings
- **Anthropic**: Claude 3.5 Sonnet, Claude 3.7 Sonnet, Claude 3.5 Haiku, and Voyage embeddings
@@ -46,114 +47,40 @@ The starter distribution includes a comprehensive set of inference providers:
- **NVIDIA**: NVIDIA NIM models
- **HuggingFace**: Serverless and endpoint models
- **Bedrock**: AWS Bedrock models
- **Passthrough**: Passthrough provider - use this to connect to any other inference provider that is not supported by Llama Stack
### Local/Remote Providers
- **Ollama**: Local Ollama models
- **vLLM**: Remote vLLM server
- **TGI**: Text Generation Inference server, including Dell Enterprise Hub's custom TGI container (use `DEH_URL`)
- **Sentence Transformers**: Local embedding models
All providers are **disabled** by default, so you need to enable them by setting the appropriate environment variables. See [Enabling Providers](#enabling-providers) for details.
## Vector Providers

The starter distribution includes a comprehensive set of vector providers:

- **FAISS**: Local FAISS vector store - enabled by default
- **SQLite**: Local SQLite vector store - disabled by default
- **ChromaDB**: Remote ChromaDB server - disabled by default
- **PGVector**: Remote PGVector server - disabled by default
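For example, to switch on the remote ChromaDB provider you can follow the same pattern as `ENABLE_OLLAMA=ollama` shown later. This is only a sketch; the URL is a placeholder for your own ChromaDB deployment, and both variables are described under Environment Variables below.

```bash
# Enable the remote ChromaDB vector provider (disabled by default)
export ENABLE_CHROMADB=chromadb
# Placeholder URL - point this at your ChromaDB server
export CHROMADB_URL=http://localhost:8000
```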
## Environment Variables

The following environment variables can be configured:

### Server Configuration
- `LLAMA_STACK_PORT`: Port for the Llama Stack distribution server (default: `8321`)

### API Keys for Hosted Providers
- `OPENAI_API_KEY`: OpenAI API key
- `FIREWORKS_API_KEY`: Fireworks API key
- `TOGETHER_API_KEY`: Together API key
- `ANTHROPIC_API_KEY`: Anthropic API key
- `GEMINI_API_KEY`: Google Gemini API key
- `GROQ_API_KEY`: Groq API key
- `SAMBANOVA_API_KEY`: SambaNova API key
- `CEREBRAS_API_KEY`: Cerebras API key
- `LLAMA_API_KEY`: Llama API key
- `NVIDIA_API_KEY`: NVIDIA API key
- `HF_API_TOKEN`: HuggingFace API token
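For example, to use the hosted OpenAI and Fireworks providers you would export their keys before starting the server (placeholder values shown); the corresponding providers still need to be enabled as described in [Enabling Providers](#enabling-providers).

```bash
# Placeholder values - substitute your real keys
export OPENAI_API_KEY=your_openai_key
export FIREWORKS_API_KEY=your_fireworks_key
```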
### Local Provider Configuration
- `OLLAMA_URL`: Ollama server URL (default: `http://localhost:11434`)
- `VLLM_URL`: vLLM server URL (default: `http://localhost:8000/v1`)
- `VLLM_MAX_TOKENS`: vLLM max tokens (default: `4096`)
- `VLLM_API_TOKEN`: vLLM API token (default: `fake`)
- `VLLM_TLS_VERIFY`: vLLM TLS verification (default: `true`)
- `TGI_URL`: TGI server URL
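For example, here is a minimal sketch for pointing the stack at a remote vLLM server; the URL, token, and model name are placeholders for your own deployment.

```bash
# Placeholder endpoint - adjust to your vLLM server
export VLLM_URL=https://my-vllm-host:8000/v1
export VLLM_API_TOKEN=your_vllm_token
export VLLM_TLS_VERIFY=true
# Model served by that vLLM instance (placeholder name)
export VLLM_INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
```

As with the other providers, the vLLM provider itself must also be enabled (see [Enabling Providers](#enabling-providers)).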
### Model Configuration
- `INFERENCE_MODEL`: HuggingFace model for serverless inference
- `INFERENCE_ENDPOINT_NAME`: HuggingFace endpoint name
- `OLLAMA_INFERENCE_MODEL`: Ollama model name
- `OLLAMA_EMBEDDING_MODEL`: Ollama embedding model name
- `OLLAMA_EMBEDDING_DIMENSION`: Ollama embedding dimension (default: `384`)
- `VLLM_INFERENCE_MODEL`: vLLM model name
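For example, a sketch of an Ollama setup; the model names are illustrative and can be any models you have pulled locally with `ollama pull`.

```bash
export ENABLE_OLLAMA=ollama
export OLLAMA_URL=http://localhost:11434
export OLLAMA_INFERENCE_MODEL=llama3.2:3b
# Embedding model name is illustrative - use any embedding model served by your Ollama instance
export OLLAMA_EMBEDDING_MODEL=all-minilm:latest
export OLLAMA_EMBEDDING_DIMENSION=384
```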
### Vector Database Configuration
- `SQLITE_STORE_DIR`: SQLite store directory (default: `~/.llama/distributions/starter`)
- `ENABLE_SQLITE_VEC`: Enable SQLite vector provider
- `ENABLE_CHROMADB`: Enable ChromaDB provider
- `ENABLE_PGVECTOR`: Enable PGVector provider
- `CHROMADB_URL`: ChromaDB server URL
- `PGVECTOR_HOST`: PGVector host (default: `localhost`)
- `PGVECTOR_PORT`: PGVector port (default: `5432`)
- `PGVECTOR_DB`: PGVector database name
- `PGVECTOR_USER`: PGVector username
- `PGVECTOR_PASSWORD`: PGVector password
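For example, a sketch for enabling PGVector against a local PostgreSQL instance; the connection details below are placeholders.

```bash
# Follows the same ENABLE_* pattern as the other providers
export ENABLE_PGVECTOR=pgvector
export PGVECTOR_HOST=localhost
export PGVECTOR_PORT=5432
# Placeholder database credentials - adjust to your PostgreSQL setup
export PGVECTOR_DB=llamastack
export PGVECTOR_USER=llamastack
export PGVECTOR_PASSWORD=your_pgvector_password
```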
### Tool Configuration
- `BRAVE_SEARCH_API_KEY`: Brave Search API key
- `TAVILY_SEARCH_API_KEY`: Tavily Search API key
### Telemetry Configuration
- `OTEL_SERVICE_NAME`: OpenTelemetry service name
- `TELEMETRY_SINKS`: Telemetry sinks (default: `console,sqlite`)
## Enabling Providers

You can enable specific providers by setting their provider ID to a valid value using environment variables. This is useful when you only want to use certain providers or don't have API keys for all of them. For instance, to enable the Ollama provider, set the `ENABLE_OLLAMA` environment variable to `ollama`; to disable a provider, set its environment variable to `__disabled__` (for example, `ENABLE_OLLAMA=__disabled__`).

### Examples of Enabling Providers
#### Enable FAISS Vector Provider
```bash
export ENABLE_FAISS=faiss
```
#### Enable Ollama Models
```bash
export ENABLE_OLLAMA=ollama
```
#### Disable vLLM Models
```bash
export VLLM_INFERENCE_MODEL=__disabled__
```
#### Disable Optional Vector Providers
```bash
export ENABLE_SQLITE_VEC=__disabled__
export ENABLE_CHROMADB=__disabled__
export ENABLE_PGVECTOR=__disabled__
```
### Provider ID Patterns
The starter distribution uses several patterns for provider IDs:
1. **Direct provider IDs**: `faiss`, `ollama`, `vllm`
2. **Environment-based provider IDs**: `${env.ENABLE_SQLITE_VEC+sqlite-vec}`
3. **Model-based provider IDs**: `${env.OLLAMA_INFERENCE_MODEL:__disabled__}`
When using the `+` pattern (like `${env.ENABLE_SQLITE_VEC+sqlite-vec}`), the provider is enabled by default and can be disabled by setting the environment variable to `__disabled__`.
When using the `:` pattern (like `${env.OLLAMA_INFERENCE_MODEL:__disabled__}`), the provider is disabled by default and can be enabled by setting the environment variable to a valid value.
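As a concrete sketch of these two patterns (variable names come from the configuration lists above; the model value is illustrative):

```bash
# `+` pattern: the provider is enabled by default, so opt out explicitly
export ENABLE_SQLITE_VEC=__disabled__

# `:` pattern: the provider is disabled by default, so opt in with a concrete value
export OLLAMA_INFERENCE_MODEL=llama3.2:3b
```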
## Running the Distribution

You can run the starter distribution via Docker or directly using the Llama Stack CLI.

### Via Docker
@@ -165,57 +92,19 @@ docker run \
  -it \
  --pull always \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -e ENABLE_OLLAMA=ollama \
  -e OLLAMA_INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
  llamastack/distribution-starter \
  --port $LLAMA_STACK_PORT
```
You can also use the `llama stack run` command to run the distribution. Make sure you have run `uv pip install llama-stack` so that the Llama Stack CLI is available.

```bash
llama stack run distributions/starter/run.yaml \
  --port 8321 \
  --env ENABLE_OLLAMA=ollama \
  --env OLLAMA_INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
```
## Example Usage
Once the distribution is running, you can use any of the available models. Here are some examples:
### Using OpenAI Models
```bash
llama-stack-client --endpoint http://localhost:8321 \
inference chat-completion \
--model-id openai/gpt-4o \
--message "Hello, how are you?"
```
### Using Fireworks Models
```bash
llama-stack-client --endpoint http://localhost:8321 \
inference chat-completion \
--model-id fireworks/meta-llama/Llama-3.2-3B-Instruct \
--message "Write a short story about a robot."
```
### Using Local Ollama Models
```bash
# First, make sure Ollama is running and you have a model
ollama run llama3.2:3b
# Then use it through Llama Stack
export OLLAMA_INFERENCE_MODEL=llama3.2:3b
llama-stack-client --endpoint http://localhost:8321 \
inference chat-completion \
--model-id ollama/llama3.2:3b \
--message "Explain quantum computing in simple terms."
```
## Storage


@@ -106,11 +106,7 @@ providers:
type: sqlite
namespace: null
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/starter}/faiss_store.db
- provider_id: ${env.ENABLE_SQLITE_VEC:=__disabled__}
provider_type: inline::sqlite-vec
config:
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/starter}/sqlite_vec.db
@@ -597,6 +593,11 @@ models:
provider_id: ${env.ENABLE_OLLAMA:=__disabled__}
provider_model_id: ${env.OLLAMA_EMBEDDING_MODEL:=__disabled__}
model_type: embedding
- metadata: {}
model_id: ${env.ENABLE_OLLAMA:=__disabled__}/${env.OLLAMA_SAFETY_MODEL:=__disabled__}
provider_id: ${env.ENABLE_OLLAMA:=__disabled__}
provider_model_id: ${env.OLLAMA_SAFETY_MODEL:=__disabled__}
model_type: llm
- metadata: {}
model_id: ${env.ENABLE_ANTHROPIC:=__disabled__}/anthropic/claude-3-5-sonnet-latest
provider_id: ${env.ENABLE_ANTHROPIC:=__disabled__}


@@ -110,6 +110,10 @@ def get_inference_providers() -> tuple[list[Provider], dict[str, list[ProviderMo
"embedding_dimension": "${env.OLLAMA_EMBEDDING_DIMENSION:=384}", "embedding_dimension": "${env.OLLAMA_EMBEDDING_DIMENSION:=384}",
}, },
), ),
ProviderModelEntry(
provider_model_id="${env.OLLAMA_SAFETY_MODEL:=__disabled__}",
model_type=ModelType.llm,
),
],
OllamaImplConfig.sample_run_config(
url="${env.OLLAMA_URL:=http://localhost:11434}", raise_on_connect_error=False