---
description: |
  [MLflow](https://mlflow.org/) is a remote provider for centralized prompt management and versioning
  using MLflow's Prompt Registry (available in MLflow 3.4+). It allows you to store, version, and manage
  prompts in a centralized MLflow server, enabling team collaboration and prompt lifecycle management.
  See [MLflow's documentation](https://mlflow.org/docs/latest/prompts.html) for more details about MLflow Prompt Registry.
sidebar_label: Remote - MLflow
title: remote::mlflow
---
# remote::mlflow
## Description
[MLflow](https://mlflow.org/) is a remote provider for centralized prompt management and versioning
using MLflow's Prompt Registry (available in MLflow 3.4+). It allows you to store, version, and manage
prompts in a centralized MLflow server, enabling team collaboration and prompt lifecycle management.
## Features
The MLflow Prompts Provider supports:
- Create and store prompts with automatic versioning
- Retrieve prompts by ID and version
- Update prompts (creates new immutable versions)
- List all prompts or all versions of a specific prompt
- Set default version for a prompt
- Automatic variable extraction from templates
- Metadata storage and retrieval
- Centralized prompt management across teams
## Key Capabilities
- **Version Control**: Immutable versioning ensures prompt history is preserved
- **Default Version Management**: Easily switch between prompt versions
- **Variable Auto-Extraction**: Automatically detects `{{ variable }}` placeholders
- **Metadata Tags**: Stores Llama Stack metadata for seamless integration
- **Team Collaboration**: Centralized MLflow server enables multi-user access
## Usage
To use the MLflow Prompts Provider in your Llama Stack project:
1. Install MLflow 3.4 or later
2. Start an MLflow server (local or remote)
3. Configure your Llama Stack project to use the MLflow provider
4. Start creating and managing prompts
## Installation
Install MLflow using pip or uv:
```bash
pip install 'mlflow>=3.4.0'
# or
uv pip install 'mlflow>=3.4.0'
```
## Quick Start
### 1. Start MLflow Server
**Local server** (for development):
```bash
mlflow server --host 127.0.0.1 --port 5555
```
**Remote server** (for production):
```bash
mlflow server --host 0.0.0.0 --port 5000 --backend-store-uri postgresql://user:pass@host/db
```
### 2. Configure Llama Stack
Add to your Llama Stack configuration:
```yaml
prompts:
  - provider_id: mlflow-prompts
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: http://localhost:5555
      experiment_name: llama-stack-prompts
```
### 3. Use the Prompts API
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# Create a prompt
prompt = client.prompts.create(
    prompt="Summarize the following text in {{ num_sentences }} sentences:\n\n{{ text }}",
    variables=["num_sentences", "text"],
)
print(f"Created prompt: {prompt.prompt_id} (v{prompt.version})")

# Retrieve prompt
retrieved = client.prompts.get(prompt_id=prompt.prompt_id)
print(f"Retrieved: {retrieved.prompt}")

# Update prompt (creates version 2)
updated = client.prompts.update(
    prompt_id=prompt.prompt_id,
    prompt="Summarize in exactly {{ num_sentences }} sentences:\n\n{{ text }}",
    version=1,
    set_as_default=True,
)
print(f"Updated to version: {updated.version}")

# List all prompts
prompts = client.prompts.list()
print(f"Found {len(prompts.data)} prompts")
```
## Configuration Examples
### Local Development
For local development with filesystem storage:
```yaml
prompts:
  - provider_id: mlflow-local
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: http://localhost:5555
      experiment_name: dev-prompts
      timeout_seconds: 30
```
### Remote MLflow Server
For production with a remote MLflow server:
```yaml
prompts:
  - provider_id: mlflow-production
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: ${env.MLFLOW_TRACKING_URI}
      experiment_name: production-prompts
      timeout_seconds: 60
```
### Advanced Configuration
With custom settings:
```yaml
prompts:
  - provider_id: mlflow-custom
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: https://mlflow.example.com
      experiment_name: team-prompts
      timeout_seconds: 45
```
## Authentication
The MLflow provider supports three authentication methods with the following precedence (highest to lowest):
1. **Per-Request Provider Data** (via headers)
2. **Configuration Auth Credential** (in config file)
3. **Environment Variables** (MLflow defaults)
### Method 1: Per-Request Provider Data (Recommended for Multi-Tenant)
For multi-tenant deployments where each user has their own credentials:
**Configuration**:
```yaml
prompts:
  - provider_id: mlflow-prompts
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: http://mlflow.company.com
      experiment_name: production-prompts
      # No auth_credential - use per-request tokens
```
**Client Usage**:
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# User 1 with their own token
prompts_user1 = client.prompts.list(
    extra_headers={
        "x-llamastack-provider-data": '{"mlflow_api_token": "user1-token"}'
    }
)

# User 2 with their own token
prompts_user2 = client.prompts.list(
    extra_headers={
        "x-llamastack-provider-data": '{"mlflow_api_token": "user2-token"}'
    }
)
```
**Benefits**:
- Per-user authentication and authorization
- No shared credentials
- Ideal for SaaS deployments
- Supports user-specific MLflow experiments
### Method 2: Configuration Auth Credential (Server-Level)
For server-level authentication where all requests use the same credentials:
**Using Environment Variable** (recommended):
```yaml
prompts:
  - provider_id: mlflow-prompts
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: http://mlflow.company.com
      experiment_name: production-prompts
      auth_credential: ${env.MLFLOW_TRACKING_TOKEN}
```
**Using Direct Value** (not recommended for production):
```yaml
prompts:
  - provider_id: mlflow-prompts
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: http://mlflow.company.com
      experiment_name: production-prompts
      auth_credential: "mlflow-server-token"
```
**Client Usage**:
```python
# No extra headers needed - server handles authentication
client = LlamaStackClient(base_url="http://localhost:5000")
prompts = client.prompts.list()
```
**Benefits**:
- Simple configuration
- Single point of credential management
- Good for single-tenant deployments
### Method 3: Environment Variables (MLflow Default)
MLflow reads standard environment variables automatically:
**Set before starting Llama Stack**:
```bash
export MLFLOW_TRACKING_TOKEN="your-token"
export MLFLOW_TRACKING_USERNAME="user" # Optional: Basic auth
export MLFLOW_TRACKING_PASSWORD="pass" # Optional: Basic auth
llama stack run my-config.yaml
```
**Configuration** (no auth_credential needed):
```yaml
prompts:
  - provider_id: mlflow-prompts
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: http://mlflow.company.com
      experiment_name: production-prompts
```
**Benefits**:
- Standard MLflow behavior
- No configuration changes needed
- Good for containerized deployments
### Databricks Authentication
For Databricks-managed MLflow:
**Configuration**:
```yaml
prompts:
  - provider_id: databricks-prompts
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: databricks
      # Or with workspace URL:
      # mlflow_tracking_uri: databricks://profile-name
      experiment_name: /Shared/llama-stack-prompts
      auth_credential: ${env.DATABRICKS_TOKEN}
```
**Environment Setup**:
```bash
export DATABRICKS_TOKEN="dapi..."
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
```
**Client Usage**:
```python
from llama_stack_client import LlamaStackClient
client = LlamaStackClient(base_url="http://localhost:5000")
# Create prompt in Databricks MLflow
prompt = client.prompts.create(
    prompt="Analyze {{ topic }} with focus on {{ aspect }}",
    variables=["topic", "aspect"]
)
# View in Databricks UI:
# https://workspace.cloud.databricks.com/#mlflow/experiments/<experiment-id>
```
### Enterprise MLflow with Authentication
Example for enterprise MLflow server with API key authentication:
**Configuration**:
```yaml
prompts:
  - provider_id: enterprise-mlflow
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: https://mlflow.enterprise.com
      experiment_name: production-prompts
      auth_credential: ${env.MLFLOW_API_KEY}
      timeout_seconds: 60
```
**Client Usage**:
```python
from llama_stack_client import LlamaStackClient
# Option A: Use server's configured credential
client = LlamaStackClient(base_url="http://localhost:5000")
prompt = client.prompts.create(
    prompt="Classify sentiment: {{ text }}",
    variables=["text"]
)

# Option B: Override with per-request credential
prompt = client.prompts.create(
    prompt="Classify sentiment: {{ text }}",
    variables=["text"],
    extra_headers={
        "x-llamastack-provider-data": '{"mlflow_api_token": "user-specific-key"}'
    }
)
```
### Authentication Precedence
When multiple authentication methods are configured, the provider uses this precedence:
1. **Per-request provider data** (from `x-llamastack-provider-data` header)
   - Highest priority
   - Overrides all other methods
   - Used for multi-tenant scenarios
2. **Configuration auth_credential** (from config file)
   - Medium priority
   - Fallback if no provider data header
   - Good for server-level auth
3. **Environment variables** (MLflow standard)
   - Lowest priority
   - Used if no other credentials provided
   - Standard MLflow behavior
**Example showing precedence**:
```yaml
# Config file
prompts:
  - provider_id: mlflow
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: http://mlflow.company.com
      auth_credential: ${env.MLFLOW_TRACKING_TOKEN}  # Fallback
```
```bash
# Environment variable
export MLFLOW_TRACKING_TOKEN="server-token" # Lowest priority
```
```python
# Client code
client.prompts.create(
    prompt="Test",
    extra_headers={
        # This takes precedence over config and env vars
        "x-llamastack-provider-data": '{"mlflow_api_token": "user-token"}'
    }
)
```
### Security Best Practices
1. **Never hardcode tokens** in configuration files:
   ```yaml
   # Bad - hardcoded credential
   auth_credential: "my-secret-token"
   # Good - use environment variable
   auth_credential: ${env.MLFLOW_TRACKING_TOKEN}
   ```
2. **Use per-request credentials** for multi-tenant deployments:
   ```python
   # Good - each user provides their own token
   headers = {
       "x-llamastack-provider-data": f'{{"mlflow_api_token": "{user_token}"}}'
   }
   client.prompts.list(extra_headers=headers)
   ```
3. **Rotate credentials regularly** in production environments
4. **Use HTTPS** for MLflow tracking URI in production:
   ```yaml
   mlflow_tracking_uri: https://mlflow.company.com  # Good
   # Not: http://mlflow.company.com  # Bad for production
   ```
5. **Store secrets in secure vaults** (AWS Secrets Manager, HashiCorp Vault, etc.)
## API Reference
### Create Prompt
Creates a new prompt, which is registered in MLflow as version 1:
```python
prompt = client.prompts.create(
    prompt="You are a {{ role }} assistant. {{ instruction }}",
    variables=["role", "instruction"]  # Optional - auto-extracted if omitted
)
```
**Auto-extraction**: If `variables` is not provided, the provider automatically extracts variables from `{{ variable }}` placeholders.
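As a rough illustration only (not the provider's actual code), this kind of extraction can be done with a simple regular expression over the template:
```python
import re

def extract_variables(template: str) -> list[str]:
    """Collect unique {{ variable }} names, preserving first-seen order."""
    names = re.findall(r"\{\{\s*(\w+)\s*\}\}", template)
    return list(dict.fromkeys(names))

print(extract_variables("Summarize {{ text }} in {{ num_sentences }} sentences about {{ text }}"))
# ['text', 'num_sentences']
```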
### Retrieve Prompt
Get a prompt by ID (retrieves default version):
```python
prompt = client.prompts.get(prompt_id="pmpt_abc123...")
```
Get a specific version:
```python
prompt = client.prompts.get(prompt_id="pmpt_abc123...", version=2)
```
### Update Prompt
Creates a new version of an existing prompt:
```python
updated = client.prompts.update(
    prompt_id="pmpt_abc123...",
    prompt="Updated template with {{ variable }}",
    version=1,  # Must be the latest version
    set_as_default=True  # Make this the new default
)
```
**Important**: You must provide the current latest version number. The update creates a new version (e.g., version 2).
### List Prompts
List all prompts (returns default versions only):
```python
response = client.prompts.list()
for prompt in response.data:
    print(f"{prompt.prompt_id}: v{prompt.version} (default)")
```
### List Prompt Versions
List all versions of a specific prompt:
```python
response = client.prompts.list_versions(prompt_id="pmpt_abc123...")
for prompt in response.data:
    default = " (default)" if prompt.is_default else ""
    print(f"Version {prompt.version}{default}")
```
### Set Default Version
Change which version is the default:
```python
client.prompts.set_default_version(
    prompt_id="pmpt_abc123...",
    version=2
)
```
## ID Mapping
The MLflow provider uses deterministic bidirectional ID mapping:
- **Llama Stack format**: `pmpt_<48-hex-chars>`
- **MLflow format**: `llama_prompt_<48-hex-chars>`
Example:
- Llama Stack ID: `pmpt_8c2bf57972a215cd0413e399d03b901cce93815448173c1c`
- MLflow name: `llama_prompt_8c2bf57972a215cd0413e399d03b901cce93815448173c1c`
This ensures prompts created through Llama Stack are easily identifiable in MLflow.
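A minimal sketch of the prefix swap (the helper names are illustrative, not the provider's internal API):
```python
LLAMA_PREFIX = "pmpt_"
MLFLOW_PREFIX = "llama_prompt_"

def to_mlflow_name(prompt_id: str) -> str:
    """Convert a Llama Stack prompt ID into the corresponding MLflow prompt name."""
    if not prompt_id.startswith(LLAMA_PREFIX):
        raise ValueError(f"Unexpected prompt ID: {prompt_id}")
    return MLFLOW_PREFIX + prompt_id.removeprefix(LLAMA_PREFIX)

def to_llama_id(mlflow_name: str) -> str:
    """Convert an MLflow prompt name back into the Llama Stack prompt ID."""
    if not mlflow_name.startswith(MLFLOW_PREFIX):
        raise ValueError(f"Unexpected MLflow name: {mlflow_name}")
    return LLAMA_PREFIX + mlflow_name.removeprefix(MLFLOW_PREFIX)

hex_part = "8c2bf57972a215cd0413e399d03b901cce93815448173c1c"
assert to_mlflow_name(f"pmpt_{hex_part}") == f"llama_prompt_{hex_part}"
assert to_llama_id(f"llama_prompt_{hex_part}") == f"pmpt_{hex_part}"
```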
## Version Management
The MLflow Prompts Provider implements immutable versioning:
1. **Create**: Creates version 1
2. **Update**: Creates a new version (2, 3, 4, ...)
3. **Default**: The "default" alias points to the current default version
4. **History**: All versions are preserved and retrievable
```
pmpt_abc123
├── Version 1 (Original)
├── Version 2 (Updated)
└── Version 3 (Latest, Default) ← Default alias points here
```
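Putting the lifecycle together with the client calls documented above (the prompt text is illustrative):
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# Version 1
p = client.prompts.create(prompt="Translate {{ text }} to {{ language }}")

# Version 2, made the new default
client.prompts.update(
    prompt_id=p.prompt_id,
    prompt="Translate {{ text }} to {{ language }}, preserving the original tone.",
    version=p.version,
    set_as_default=True,
)

# Roll the default back to version 1; both versions remain retrievable
client.prompts.set_default_version(prompt_id=p.prompt_id, version=1)
for v in client.prompts.list_versions(prompt_id=p.prompt_id).data:
    print(v.version, "(default)" if v.is_default else "")
```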
## Troubleshooting
### MLflow Server Not Available
**Error**: `Failed to connect to MLflow server`
**Solutions**:
1. Verify MLflow server is running: `curl http://localhost:5555/health`
2. Check `mlflow_tracking_uri` in configuration
3. Ensure network connectivity to remote server
4. Check firewall settings
### Version Mismatch Error
**Error**: `Version X is not the latest version. Use latest version Y to update.`
**Cause**: Attempting to update an outdated version
**Solution**: Always use the latest version number when updating:
```python
# Get latest version
versions = client.prompts.list_versions(prompt_id)
latest_version = max(v.version for v in versions.data)
# Use latest version for update
client.prompts.update(prompt_id=prompt_id, version=latest_version, ...)
```
### Variable Validation Error
**Error**: `Template contains undeclared variables: ['var2']`
**Cause**: Template has `{{ var2 }}` but `variables` list doesn't include it
**Solution**: Either add missing variable or let the provider auto-extract:
```python
# Option 1: Add missing variable
client.prompts.create(
    prompt="Template with {{ var1 }} and {{ var2 }}",
    variables=["var1", "var2"]
)

# Option 2: Let provider auto-extract (recommended)
client.prompts.create(
    prompt="Template with {{ var1 }} and {{ var2 }}"
)
```
### Timeout Errors
**Error**: Connection timeout when communicating with MLflow
**Solutions**:
1. Increase `timeout_seconds` in configuration:
   ```yaml
   config:
     timeout_seconds: 60  # Default: 30
   ```
2. Check network latency to MLflow server
3. Verify MLflow server is responsive
### Prompt Not Found
**Error**: `Prompt pmpt_abc123... not found`
**Possible causes**:
1. Prompt ID is incorrect
2. Prompt was created in a different MLflow server/experiment
3. Experiment name mismatch in configuration
**Solution**: Verify prompt exists in MLflow UI at `http://localhost:5555`
## Limitations
### No Deletion Support
**MLflow does not support deleting prompts or versions**. The `delete_prompt()` method raises `NotImplementedError`.
**Workaround**: Mark prompts as deprecated using naming conventions or set a different version as default.
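For example, a hedged sketch of the deprecation route using only the calls documented above (`prompt_id` is a placeholder):
```python
# Publish a new version whose template flags the prompt as deprecated,
# then make it the default so consumers see the notice instead of the old template.
latest = max(v.version for v in client.prompts.list_versions(prompt_id=prompt_id).data)
client.prompts.update(
    prompt_id=prompt_id,
    prompt="[DEPRECATED] This prompt is retired. {{ text }}",
    version=latest,
    set_as_default=True,
)
```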
### Experiment Required
All prompts are stored within an MLflow experiment. The experiment is created automatically if it doesn't exist.
### ID Format Constraints
- Prompt IDs must follow the format: `pmpt_<48-hex-chars>`
- MLflow names use the prefix: `llama_prompt_`
- Manual creation in MLflow with different names won't be recognized
### Version Numbering
- Versions are sequential integers (1, 2, 3, ...)
- You cannot skip version numbers
- You cannot manually set version numbers
## Best Practices
### 1. Use Environment Variables
Store MLflow URIs in environment variables:
```yaml
config:
  mlflow_tracking_uri: ${env.MLFLOW_TRACKING_URI:=http://localhost:5555}
```
### 2. Auto-Extract Variables
Let the provider auto-extract variables to avoid validation errors:
```python
# Recommended
prompt = client.prompts.create(
prompt="Summarize {{ text }} in {{ format }}"
)
```
### 3. Organize by Experiment
Use different experiments for different environments:
- `dev-prompts` for development
- `staging-prompts` for staging
- `production-prompts` for production
### 4. Version Management
- Always retrieve latest version before updating
- Use `set_as_default=True` when updating to make new version active
- Keep version history for audit trail
### 5. Use Meaningful Templates
Include context in your templates:
```python
# Good
prompt = """You are a {{ role }} assistant specialized in {{ domain }}.
Task: {{ task }}
Output format: {{ format }}"""
# Less clear
prompt = "Do {{ task }} as {{ role }}"
```
### 6. Monitor MLflow Server
- Use MLflow UI to visualize prompts: `http://your-server:5555`
- Monitor experiment metrics and prompt versions
- Set up alerts for MLflow server health
## Production Deployment
### Database Backend
For production, use a database backend instead of filesystem:
```bash
mlflow server \
--host 0.0.0.0 \
--port 5000 \
--backend-store-uri postgresql://user:pass@host:5432/mlflow \
--default-artifact-root s3://my-bucket/mlflow-artifacts
```
### High Availability
- Deploy multiple MLflow server instances behind a load balancer
- Use managed database (RDS, Cloud SQL, etc.)
- Store artifacts in object storage (S3, GCS, Azure Blob)
### Security
- Enable authentication on MLflow server
- Use HTTPS for MLflow tracking URI
- Restrict network access with firewall rules
- Use IAM roles for cloud deployments
### Monitoring
Set up monitoring for the following (a minimal availability-probe sketch follows this list):
- MLflow server availability
- Database connection pool
- API response times
- Prompt creation/retrieval rates
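As a starting point, a minimal availability probe against the `/health` endpoint used in the Troubleshooting section (adapt the URI and alerting to your environment):
```python
import urllib.request

def mlflow_is_healthy(tracking_uri: str = "http://localhost:5555", timeout: float = 5.0) -> bool:
    """Return True if the MLflow server answers its /health endpoint."""
    try:
        with urllib.request.urlopen(f"{tracking_uri}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

print("MLflow up:", mlflow_is_healthy())
```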
## Documentation
See [MLflow's documentation](https://mlflow.org/docs/latest/prompts.html) for more details about MLflow Prompt Registry.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `mlflow_tracking_uri` | `str` | No | http://localhost:5000 | MLflow tracking server URI |
| `mlflow_registry_uri` | `str \| None` | No | None | MLflow model registry URI (defaults to tracking URI if not set) |
| `experiment_name` | `str` | No | llama-stack-prompts | MLflow experiment name for storing prompts |
| `auth_credential` | `SecretStr \| None` | No | None | MLflow API token for authentication. Can be overridden via provider data header. |
| `timeout_seconds` | `int` | No | 30 | Timeout for MLflow API calls (1-300 seconds) |
## Sample Configuration
**Without authentication** (local development):
```yaml
mlflow_tracking_uri: http://localhost:5555
experiment_name: llama-stack-prompts
timeout_seconds: 30
```
**With authentication** (production):
```yaml
mlflow_tracking_uri: ${env.MLFLOW_TRACKING_URI:=http://localhost:5000}
experiment_name: llama-stack-prompts
auth_credential: ${env.MLFLOW_TRACKING_TOKEN:=}
timeout_seconds: 30
```