mirror of https://github.com/meta-llama/llama-stack.git
synced 2025-12-03 09:53:45 +00:00
Merge 5c4da04f29 into 4237eb4aaa (commit d5836c3b5a)
24 changed files with 3594 additions and 225 deletions

docs/docs/providers/prompts/index.mdx (new file, 92 lines)
@@ -0,0 +1,92 @@
---
sidebar_label: Prompts
title: Prompts
---

# Prompts

## Overview

This section contains documentation for all available providers for the **prompts** API.

The Prompts API enables centralized management of prompt templates with versioning, variable handling, and team collaboration capabilities.

## Available Providers

### Inline Providers

Inline providers run in the same process as the Llama Stack server and require no external dependencies:

- **[inline::reference](inline_reference.mdx)** - Reference implementation using a KVStore backend (SQLite, PostgreSQL, etc.)
  - Zero external dependencies
  - Supports local SQLite or PostgreSQL storage
  - Full CRUD operations, including deletion
  - Ideal for local development and single-server deployments

### Remote Providers

Remote providers connect to external services for centralized prompt management:

- **[remote::mlflow](remote_mlflow.mdx)** - MLflow Prompt Registry integration (requires MLflow 3.4+)
  - Centralized prompt management across teams
  - Built-in versioning and audit trail
  - Supports authentication (per-request, config, or environment variables)
  - Integrates with Databricks and enterprise MLflow deployments
  - Ideal for team collaboration and production environments

## Choosing a Provider

### Use `inline::reference` when:

- Developing locally or deploying to a single server
- You want zero external dependencies
- SQLite or PostgreSQL storage is sufficient
- You need full CRUD operations (including deletion)
- You prefer simple configuration

### Use `remote::mlflow` when:

- Working in a team environment with multiple users
- You need centralized prompt management
- You want integration with existing MLflow infrastructure
- You need authentication and multi-tenant support
- Advanced versioning and audit-trail capabilities are required

## Quick Start Examples

### Using inline::reference

```yaml
prompts:
  - provider_id: local-prompts
    provider_type: inline::reference
    config:
      run_config:
        storage:
          stores:
            prompts:
              type: sqlite
              db_path: ./prompts.db
```

### Using remote::mlflow

```yaml
prompts:
  - provider_id: mlflow-prompts
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: http://localhost:5555
      experiment_name: llama-stack-prompts
      auth_credential: ${env.MLFLOW_TRACKING_TOKEN}
```

## Common Features

All prompt providers support:

- Create and store prompts with version control
- Retrieve prompts by ID and version
- Update prompts (creates new versions)
- List all prompts or versions of a specific prompt
- Set the default version for a prompt
- Automatic variable extraction from `{{ variable }}` templates (see the sketch below)
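
As a minimal sketch of the last point (assuming a Llama Stack server at `http://localhost:5000` and the client API shown on the provider pages below), auto-extraction means you can omit `variables` entirely:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# No `variables` argument: the provider extracts them from the template
prompt = client.prompts.create(
    prompt="Translate {{ text }} into {{ language }}."
)
print(prompt.variables)  # expected: ["text", "language"]
```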

For detailed documentation on each provider, see the individual provider pages linked above.

docs/docs/providers/prompts/inline_reference.mdx (new file, 496 lines)
@@ -0,0 +1,496 @@
---
description: |
  Reference implementation of the Prompts API using KVStore backend (SQLite, PostgreSQL, etc.)
  for centralized prompt management with versioning support. This is the default provider for
  prompts that works without external dependencies.
sidebar_label: Inline - Reference
title: inline::reference
---

# inline::reference

## Description

Reference implementation of the Prompts API using a KVStore backend (SQLite, PostgreSQL, etc.) for centralized prompt management with versioning support. This is the default provider for prompts and works without external dependencies.

## Features

The Reference Prompts Provider supports:

- Create and store prompts with automatic versioning
- Retrieve prompts by ID and version
- Update prompts (creates new immutable versions)
- Delete prompts and their versions
- List all prompts or all versions of a specific prompt
- Set the default version for a prompt
- Automatic variable extraction from templates
- Storage in SQLite, PostgreSQL, or other KVStore backends

## Key Capabilities

- **Zero Dependencies**: No external services required; runs in-process
- **Flexible Storage**: Supports SQLite (default), PostgreSQL, and other KVStore backends
- **Version Control**: Immutable versioning ensures prompt history is preserved
- **Default Version Management**: Easily switch between prompt versions
- **Variable Auto-Extraction**: Automatically detects `{{ variable }}` placeholders
- **Full CRUD Support**: Unlike remote providers, supports deletion of prompts

## Usage

To use the Reference Prompts Provider in your Llama Stack project:

1. Configure your Llama Stack project with the `inline::reference` provider
2. Optionally configure the storage backend (defaults to SQLite)
3. Start creating and managing prompts

## Quick Start

### 1. Configure Llama Stack

**Basic configuration with SQLite** (default):

```yaml
prompts:
  - provider_id: reference-prompts
    provider_type: inline::reference
    config:
      run_config:
        storage:
          stores:
            prompts:
              type: sqlite
              db_path: ./prompts.db
```

**With PostgreSQL**:

```yaml
prompts:
  - provider_id: postgres-prompts
    provider_type: inline::reference
    config:
      run_config:
        storage:
          stores:
            prompts:
              type: postgres
              url: postgresql://user:pass@localhost/llama_stack
```

### 2. Use the Prompts API

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# Create a prompt
prompt = client.prompts.create(
    prompt="Summarize the following text in {{ num_sentences }} sentences:\n\n{{ text }}",
    variables=["num_sentences", "text"],
)
print(f"Created prompt: {prompt.prompt_id} (v{prompt.version})")

# Retrieve prompt
retrieved = client.prompts.get(prompt_id=prompt.prompt_id)
print(f"Retrieved: {retrieved.prompt}")

# Update prompt (creates version 2)
updated = client.prompts.update(
    prompt_id=prompt.prompt_id,
    prompt="Summarize in exactly {{ num_sentences }} sentences:\n\n{{ text }}",
    version=1,
    set_as_default=True,
)
print(f"Updated to version: {updated.version}")

# List all prompts
prompts = client.prompts.list()
print(f"Found {len(prompts.data)} prompts")

# Delete prompt
client.prompts.delete(prompt_id=prompt.prompt_id)
```

## Configuration Examples

### SQLite (Local Development)

For local development with filesystem storage:

```yaml
prompts:
  - provider_id: local-prompts
    provider_type: inline::reference
    config:
      run_config:
        storage:
          stores:
            prompts:
              type: sqlite
              db_path: ./prompts.db
```

### PostgreSQL (Production)

For production with PostgreSQL:

```yaml
prompts:
  - provider_id: prod-prompts
    provider_type: inline::reference
    config:
      run_config:
        storage:
          stores:
            prompts:
              type: postgres
              url: ${env.DATABASE_URL}
```

### With Explicit Backend Configuration

```yaml
prompts:
  - provider_id: reference-prompts
    provider_type: inline::reference
    config:
      run_config:
        storage:
          backends:
            kv_default:
              type: sqlite
              db_path: ./data/prompts.db
          stores:
            prompts:
              backend: kv_default
              namespace: prompts
```

## API Reference

### Create Prompt

Creates a new prompt (version 1):

```python
prompt = client.prompts.create(
    prompt="You are a {{ role }} assistant. {{ instruction }}",
    variables=["role", "instruction"],  # Optional - auto-extracted if omitted
)
```

**Auto-extraction**: If `variables` is not provided, the provider automatically extracts variables from `{{ variable }}` placeholders.

### Retrieve Prompt

Get a prompt by ID (retrieves the default version):

```python
prompt = client.prompts.get(prompt_id="pmpt_abc123...")
```

Get a specific version:

```python
prompt = client.prompts.get(prompt_id="pmpt_abc123...", version=2)
```

### Update Prompt

Creates a new version of an existing prompt:

```python
updated = client.prompts.update(
    prompt_id="pmpt_abc123...",
    prompt="Updated template with {{ variable }}",
    version=1,  # Must be the latest version
    set_as_default=True,  # Make this the new default
)
```

**Important**: You must provide the current latest version number. The update creates a new version (e.g., version 2).

### Delete Prompt

Delete a prompt and all its versions:

```python
client.prompts.delete(prompt_id="pmpt_abc123...")
```

**Note**: This operation is permanent and deletes all versions of the prompt.

### List Prompts

List all prompts (returns default versions only):

```python
response = client.prompts.list()
for prompt in response.data:
    print(f"{prompt.prompt_id}: v{prompt.version} (default)")
```

### List Prompt Versions

List all versions of a specific prompt:

```python
response = client.prompts.list_versions(prompt_id="pmpt_abc123...")
for prompt in response.data:
    default = " (default)" if prompt.is_default else ""
    print(f"Version {prompt.version}{default}")
```

### Set Default Version

Change which version is the default:

```python
client.prompts.set_default_version(
    prompt_id="pmpt_abc123...",
    version=2,
)
```

## Version Management

The Reference Prompts Provider implements immutable versioning:

1. **Create**: Creates version 1
2. **Update**: Creates a new version (2, 3, 4, ...)
3. **Default**: One version is marked as the default
4. **History**: All versions are preserved and retrievable
5. **Delete**: Deletes all versions at once

```
pmpt_abc123
├── Version 1 (Original)
├── Version 2 (Updated)
└── Version 3 (Latest, Default) <- Current default version
```
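
The whole lifecycle, as a minimal sketch (assuming the `client` from the Quick Start above):

```python
# 1. Create -> version 1
prompt = client.prompts.create(prompt="Draft an email about {{ topic }}")

# 2. Update -> version 2 (pass the current latest version)
updated = client.prompts.update(
    prompt_id=prompt.prompt_id,
    prompt="Draft a short email about {{ topic }}",
    version=prompt.version,
)

# 3./4. Every version is preserved and retrievable
for v in client.prompts.list_versions(prompt_id=prompt.prompt_id).data:
    print(v.version, "(default)" if v.is_default else "")

# Switch the default, then 5. delete all versions at once
client.prompts.set_default_version(prompt_id=prompt.prompt_id, version=updated.version)
client.prompts.delete(prompt_id=prompt.prompt_id)
```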

## Storage Backends

The reference provider uses Llama Stack's KVStore abstraction, which supports multiple backends.

### SQLite (Default)

Best for:
- Local development
- Single-server deployments
- Embedded applications
- Testing

Limitations:
- Not suitable for high-concurrency scenarios
- No built-in replication

### PostgreSQL

Best for:
- Production deployments
- Multi-server setups
- High-availability requirements
- Team collaboration

Advantages:
- Supports concurrent access
- Built-in replication and backups
- Scalable and robust

## Best Practices

### 1. Choose Appropriate Storage

**Development**:

```yaml
# Use SQLite for local development
storage:
  stores:
    prompts:
      type: sqlite
      db_path: ./dev-prompts.db
```

**Production**:

```yaml
# Use PostgreSQL for production
storage:
  stores:
    prompts:
      type: postgres
      url: ${env.DATABASE_URL}
```

### 2. Backup Your Data

For SQLite:

```bash
# Back up the SQLite database
cp prompts.db prompts.db.backup
```

For PostgreSQL:

```bash
# Back up the PostgreSQL database
pg_dump llama_stack > backup.sql
```

### 3. Version Management

- Always retrieve the latest version before updating
- Use `set_as_default=True` when updating to make the new version active
- Keep version history for an audit trail
- Use deletion sparingly (consider archiving instead)

### 4. Auto-Extract Variables

Let the provider auto-extract variables to avoid validation errors:

```python
# Recommended
prompt = client.prompts.create(
    prompt="Summarize {{ text }} in {{ format }}"
)
```

### 5. Use Meaningful Templates

Include context in your templates:

```python
# Good
prompt = """You are a {{ role }} assistant specialized in {{ domain }}.

Task: {{ task }}

Output format: {{ format }}"""

# Less clear
prompt = "Do {{ task }} as {{ role }}"
```

## Troubleshooting

### Database Connection Errors

**Error**: Failed to connect to database

**Solutions**:
1. Verify the database URL is correct
2. Ensure the database server is running (for PostgreSQL)
3. Check file permissions (for SQLite)
4. Verify network connectivity (for remote databases)

### Version Mismatch Error

**Error**: `Version X is not the latest version. Use latest version Y to update.`

**Cause**: Attempting to update from an outdated version number

**Solution**: Always use the latest version number when updating:

```python
# Get the latest version
versions = client.prompts.list_versions(prompt_id)
latest_version = max(v.version for v in versions.data)

# Use the latest version for the update
client.prompts.update(prompt_id=prompt_id, version=latest_version, ...)
```

### Variable Validation Error

**Error**: `Template contains undeclared variables: ['var2']`

**Cause**: The template contains `{{ var2 }}` but the `variables` list doesn't include it

**Solution**: Either add the missing variable or let the provider auto-extract:

```python
# Option 1: Add the missing variable
client.prompts.create(
    prompt="Template with {{ var1 }} and {{ var2 }}",
    variables=["var1", "var2"],
)

# Option 2: Let the provider auto-extract (recommended)
client.prompts.create(
    prompt="Template with {{ var1 }} and {{ var2 }}"
)
```

### Prompt Not Found

**Error**: `Prompt pmpt_abc123... not found`

**Possible causes**:
1. The prompt ID is incorrect
2. The prompt was deleted
3. Wrong database or storage backend

**Solution**: Verify the prompt exists using the `list()` method, as sketched below.
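
A quick check along those lines (assuming the `client` from the Quick Start above):

```python
# Confirm the ID you are using actually exists in this backend
response = client.prompts.list()
known_ids = {p.prompt_id for p in response.data}
print("pmpt_abc123..." in known_ids)  # False -> wrong ID, deleted, or wrong store
```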

## Migration Guide

### Migrating from the Core Implementation

If you're upgrading from an older Llama Stack version where prompts lived in `core/prompts`:

**Old code** (still works):

```python
from llama_stack.core.prompts import PromptServiceConfig, PromptServiceImpl
```

**New code** (recommended):

```python
from llama_stack.providers.inline.prompts.reference import ReferencePromptsConfig, PromptServiceImpl
```

**Note**: Backward compatibility is maintained; old imports still work.

### Data Migration

No data migration is needed when upgrading:
- The same KVStore backend is used
- Existing prompts remain accessible
- The configuration structure is compatible

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `run_config` | `StackRunConfig` | Yes | | Stack run configuration containing the storage configuration for the KVStore |

## Sample Configuration

```yaml
run_config:
  storage:
    backends:
      kv_default:
        type: sqlite
        db_path: ./prompts.db
    stores:
      prompts:
        backend: kv_default
        namespace: prompts
```

docs/docs/providers/prompts/remote_mlflow.mdx (new file, 751 lines)
@@ -0,0 +1,751 @@
---
description: |
  [MLflow](https://mlflow.org/) is a remote provider for centralized prompt management and versioning
  using MLflow's Prompt Registry (available in MLflow 3.4+). It allows you to store, version, and manage
  prompts in a centralized MLflow server, enabling team collaboration and prompt lifecycle management.

  See [MLflow's documentation](https://mlflow.org/docs/latest/prompts.html) for more details about the MLflow Prompt Registry.
sidebar_label: Remote - MLflow
title: remote::mlflow
---

# remote::mlflow

## Description

[MLflow](https://mlflow.org/) is a remote provider for centralized prompt management and versioning using MLflow's Prompt Registry (available in MLflow 3.4+). It allows you to store, version, and manage prompts in a centralized MLflow server, enabling team collaboration and prompt lifecycle management.

## Features

The MLflow Prompts Provider supports:

- Create and store prompts with automatic versioning
- Retrieve prompts by ID and version
- Update prompts (creates new immutable versions)
- List all prompts or all versions of a specific prompt
- Set the default version for a prompt
- Automatic variable extraction from templates
- Metadata storage and retrieval
- Centralized prompt management across teams

## Key Capabilities

- **Version Control**: Immutable versioning ensures prompt history is preserved
- **Default Version Management**: Easily switch between prompt versions
- **Variable Auto-Extraction**: Automatically detects `{{ variable }}` placeholders
- **Metadata Tags**: Stores Llama Stack metadata for seamless integration
- **Team Collaboration**: A centralized MLflow server enables multi-user access

## Usage

To use the MLflow Prompts Provider in your Llama Stack project:

1. Install MLflow 3.4 or later
2. Start an MLflow server (local or remote)
3. Configure your Llama Stack project to use the MLflow provider
4. Start creating and managing prompts

## Installation

Install MLflow using pip or uv:

```bash
pip install 'mlflow>=3.4.0'
# or
uv pip install 'mlflow>=3.4.0'
```

## Quick Start

### 1. Start MLflow Server

**Local server** (for development):

```bash
mlflow server --host 127.0.0.1 --port 5555
```

**Remote server** (for production):

```bash
mlflow server --host 0.0.0.0 --port 5000 --backend-store-uri postgresql://user:pass@host/db
```

### 2. Configure Llama Stack

Add to your Llama Stack configuration:

```yaml
prompts:
  - provider_id: mlflow-prompts
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: http://localhost:5555
      experiment_name: llama-stack-prompts
```

### 3. Use the Prompts API

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# Create a prompt
prompt = client.prompts.create(
    prompt="Summarize the following text in {{ num_sentences }} sentences:\n\n{{ text }}",
    variables=["num_sentences", "text"],
)
print(f"Created prompt: {prompt.prompt_id} (v{prompt.version})")

# Retrieve prompt
retrieved = client.prompts.get(prompt_id=prompt.prompt_id)
print(f"Retrieved: {retrieved.prompt}")

# Update prompt (creates version 2)
updated = client.prompts.update(
    prompt_id=prompt.prompt_id,
    prompt="Summarize in exactly {{ num_sentences }} sentences:\n\n{{ text }}",
    version=1,
    set_as_default=True,
)
print(f"Updated to version: {updated.version}")

# List all prompts
prompts = client.prompts.list()
print(f"Found {len(prompts.data)} prompts")
```

## Configuration Examples

### Local Development

For local development with filesystem storage:

```yaml
prompts:
  - provider_id: mlflow-local
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: http://localhost:5555
      experiment_name: dev-prompts
      timeout_seconds: 30
```

### Remote MLflow Server

For production with a remote MLflow server:

```yaml
prompts:
  - provider_id: mlflow-production
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: ${env.MLFLOW_TRACKING_URI}
      experiment_name: production-prompts
      timeout_seconds: 60
```

### Advanced Configuration

With custom settings:

```yaml
prompts:
  - provider_id: mlflow-custom
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: https://mlflow.example.com
      experiment_name: team-prompts
      timeout_seconds: 45
```

## Authentication

The MLflow provider supports three authentication methods, with the following precedence (highest to lowest):

1. **Per-request provider data** (via headers)
2. **Configuration auth credential** (in the config file)
3. **Environment variables** (MLflow defaults)

### Method 1: Per-Request Provider Data (Recommended for Multi-Tenant)

For multi-tenant deployments where each user has their own credentials:

**Configuration**:

```yaml
prompts:
  - provider_id: mlflow-prompts
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: http://mlflow.company.com
      experiment_name: production-prompts
      # No auth_credential - use per-request tokens
```

**Client Usage**:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# User 1 with their own token
prompts_user1 = client.prompts.list(
    extra_headers={
        "x-llamastack-provider-data": '{"mlflow_api_token": "user1-token"}'
    }
)

# User 2 with their own token
prompts_user2 = client.prompts.list(
    extra_headers={
        "x-llamastack-provider-data": '{"mlflow_api_token": "user2-token"}'
    }
)
```

**Benefits**:
- Per-user authentication and authorization
- No shared credentials
- Ideal for SaaS deployments
- Supports user-specific MLflow experiments

### Method 2: Configuration Auth Credential (Server-Level)

For server-level authentication where all requests use the same credentials:

**Using an environment variable** (recommended):

```yaml
prompts:
  - provider_id: mlflow-prompts
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: http://mlflow.company.com
      experiment_name: production-prompts
      auth_credential: ${env.MLFLOW_TRACKING_TOKEN}
```

**Using a direct value** (not recommended for production):

```yaml
prompts:
  - provider_id: mlflow-prompts
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: http://mlflow.company.com
      experiment_name: production-prompts
      auth_credential: "mlflow-server-token"
```

**Client Usage**:

```python
# No extra headers needed - the server handles authentication
client = LlamaStackClient(base_url="http://localhost:5000")
prompts = client.prompts.list()
```

**Benefits**:
- Simple configuration
- Single point of credential management
- Good for single-tenant deployments

### Method 3: Environment Variables (MLflow Default)

MLflow reads its standard environment variables automatically.

**Set them before starting Llama Stack**:

```bash
export MLFLOW_TRACKING_TOKEN="your-token"
export MLFLOW_TRACKING_USERNAME="user"  # Optional: Basic auth
export MLFLOW_TRACKING_PASSWORD="pass"  # Optional: Basic auth
llama stack run my-config.yaml
```

**Configuration** (no `auth_credential` needed):

```yaml
prompts:
  - provider_id: mlflow-prompts
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: http://mlflow.company.com
      experiment_name: production-prompts
```

**Benefits**:
- Standard MLflow behavior
- No configuration changes needed
- Good for containerized deployments

### Databricks Authentication

For Databricks-managed MLflow:

**Configuration**:

```yaml
prompts:
  - provider_id: databricks-prompts
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: databricks
      # Or with a workspace profile:
      # mlflow_tracking_uri: databricks://profile-name
      experiment_name: /Shared/llama-stack-prompts
      auth_credential: ${env.DATABRICKS_TOKEN}
```

**Environment Setup**:

```bash
export DATABRICKS_TOKEN="dapi..."
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
```

**Client Usage**:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# Create a prompt in Databricks MLflow
prompt = client.prompts.create(
    prompt="Analyze {{ topic }} with focus on {{ aspect }}",
    variables=["topic", "aspect"],
)

# View in the Databricks UI:
# https://workspace.cloud.databricks.com/#mlflow/experiments/<experiment-id>
```

### Enterprise MLflow with Authentication

Example for an enterprise MLflow server with API-key authentication:

**Configuration**:

```yaml
prompts:
  - provider_id: enterprise-mlflow
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: https://mlflow.enterprise.com
      experiment_name: production-prompts
      auth_credential: ${env.MLFLOW_API_KEY}
      timeout_seconds: 60
```

**Client Usage**:

```python
from llama_stack_client import LlamaStackClient

# Option A: Use the server's configured credential
client = LlamaStackClient(base_url="http://localhost:5000")
prompt = client.prompts.create(
    prompt="Classify sentiment: {{ text }}",
    variables=["text"],
)

# Option B: Override with a per-request credential
prompt = client.prompts.create(
    prompt="Classify sentiment: {{ text }}",
    variables=["text"],
    extra_headers={
        "x-llamastack-provider-data": '{"mlflow_api_token": "user-specific-key"}'
    },
)
```

### Authentication Precedence

When multiple authentication methods are configured, the provider uses this precedence:

1. **Per-request provider data** (from the `x-llamastack-provider-data` header)
   - Highest priority
   - Overrides all other methods
   - Used for multi-tenant scenarios

2. **Configuration `auth_credential`** (from the config file)
   - Medium priority
   - Fallback if no provider-data header is present
   - Good for server-level auth

3. **Environment variables** (MLflow standard)
   - Lowest priority
   - Used if no other credentials are provided
   - Standard MLflow behavior

**Example showing precedence**:

```yaml
# Config file
prompts:
  - provider_id: mlflow
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: http://mlflow.company.com
      auth_credential: ${env.MLFLOW_TRACKING_TOKEN}  # Fallback
```

```bash
# Environment variable
export MLFLOW_TRACKING_TOKEN="server-token"  # Lowest priority
```

```python
# Client code
client.prompts.create(
    prompt="Test",
    extra_headers={
        # This takes precedence over the config and environment variables
        "x-llamastack-provider-data": '{"mlflow_api_token": "user-token"}'
    },
)
```

### Security Best Practices

1. **Never hardcode tokens** in configuration files:

   ```yaml
   # Bad - hardcoded credential
   auth_credential: "my-secret-token"

   # Good - use an environment variable
   auth_credential: ${env.MLFLOW_TRACKING_TOKEN}
   ```

2. **Use per-request credentials** for multi-tenant deployments:

   ```python
   # Good - each user provides their own token
   headers = {
       "x-llamastack-provider-data": f'{{"mlflow_api_token": "{user_token}"}}'
   }
   client.prompts.list(extra_headers=headers)
   ```

3. **Rotate credentials regularly** in production environments.

4. **Use HTTPS** for the MLflow tracking URI in production:

   ```yaml
   mlflow_tracking_uri: https://mlflow.company.com  # Good
   # Not: http://mlflow.company.com  # Bad for production
   ```

5. **Store secrets in secure vaults** (AWS Secrets Manager, HashiCorp Vault, etc.)

## API Reference

### Create Prompt

Creates a new prompt (version 1), registering it in MLflow:

```python
prompt = client.prompts.create(
    prompt="You are a {{ role }} assistant. {{ instruction }}",
    variables=["role", "instruction"],  # Optional - auto-extracted if omitted
)
```

**Auto-extraction**: If `variables` is not provided, the provider automatically extracts variables from `{{ variable }}` placeholders.

### Retrieve Prompt

Get a prompt by ID (retrieves the default version):

```python
prompt = client.prompts.get(prompt_id="pmpt_abc123...")
```

Get a specific version:

```python
prompt = client.prompts.get(prompt_id="pmpt_abc123...", version=2)
```

### Update Prompt

Creates a new version of an existing prompt:

```python
updated = client.prompts.update(
    prompt_id="pmpt_abc123...",
    prompt="Updated template with {{ variable }}",
    version=1,  # Must be the latest version
    set_as_default=True,  # Make this the new default
)
```

**Important**: You must provide the current latest version number. The update creates a new version (e.g., version 2).

### List Prompts

List all prompts (returns default versions only):

```python
response = client.prompts.list()
for prompt in response.data:
    print(f"{prompt.prompt_id}: v{prompt.version} (default)")
```

### List Prompt Versions

List all versions of a specific prompt:

```python
response = client.prompts.list_versions(prompt_id="pmpt_abc123...")
for prompt in response.data:
    default = " (default)" if prompt.is_default else ""
    print(f"Version {prompt.version}{default}")
```

### Set Default Version

Change which version is the default:

```python
client.prompts.set_default_version(
    prompt_id="pmpt_abc123...",
    version=2,
)
```

## ID Mapping

The MLflow provider uses a deterministic, bidirectional ID mapping:

- **Llama Stack format**: `pmpt_<48-hex-chars>`
- **MLflow format**: `llama_prompt_<48-hex-chars>`

Example:
- Llama Stack ID: `pmpt_8c2bf57972a215cd0413e399d03b901cce93815448173c1c`
- MLflow name: `llama_prompt_8c2bf57972a215cd0413e399d03b901cce93815448173c1c`

This ensures prompts created through Llama Stack are easily identifiable in MLflow.
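
Since the mapping is a pure prefix swap, it can be illustrated with two hypothetical helpers (the provider performs this translation internally; these functions are not part of the client API):

```python
# Illustrative only - not part of the client API
def to_mlflow_name(prompt_id: str) -> str:
    """pmpt_<48-hex> -> llama_prompt_<48-hex>"""
    return "llama_prompt_" + prompt_id.removeprefix("pmpt_")

def to_llama_id(mlflow_name: str) -> str:
    """llama_prompt_<48-hex> -> pmpt_<48-hex>"""
    return "pmpt_" + mlflow_name.removeprefix("llama_prompt_")

print(to_mlflow_name("pmpt_8c2bf57972a215cd0413e399d03b901cce93815448173c1c"))
# llama_prompt_8c2bf57972a215cd0413e399d03b901cce93815448173c1c
```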

## Version Management

The MLflow Prompts Provider implements immutable versioning:

1. **Create**: Creates version 1
2. **Update**: Creates a new version (2, 3, 4, ...)
3. **Default**: The "default" alias points to the current default version
4. **History**: All versions are preserved and retrievable

```
pmpt_abc123
├── Version 1 (Original)
├── Version 2 (Updated)
└── Version 3 (Latest, Default) ← Default alias points here
```

## Troubleshooting

### MLflow Server Not Available

**Error**: `Failed to connect to MLflow server`

**Solutions**:
1. Verify the MLflow server is running: `curl http://localhost:5555/health`
2. Check `mlflow_tracking_uri` in the configuration
3. Ensure network connectivity to the remote server
4. Check firewall settings

### Version Mismatch Error

**Error**: `Version X is not the latest version. Use latest version Y to update.`

**Cause**: Attempting to update from an outdated version number

**Solution**: Always use the latest version number when updating:

```python
# Get the latest version
versions = client.prompts.list_versions(prompt_id)
latest_version = max(v.version for v in versions.data)

# Use the latest version for the update
client.prompts.update(prompt_id=prompt_id, version=latest_version, ...)
```

### Variable Validation Error

**Error**: `Template contains undeclared variables: ['var2']`

**Cause**: The template contains `{{ var2 }}` but the `variables` list doesn't include it

**Solution**: Either add the missing variable or let the provider auto-extract:

```python
# Option 1: Add the missing variable
client.prompts.create(
    prompt="Template with {{ var1 }} and {{ var2 }}",
    variables=["var1", "var2"],
)

# Option 2: Let the provider auto-extract (recommended)
client.prompts.create(
    prompt="Template with {{ var1 }} and {{ var2 }}"
)
```

### Timeout Errors

**Error**: Connection timeout when communicating with MLflow

**Solutions**:
1. Increase `timeout_seconds` in the configuration:

   ```yaml
   config:
     timeout_seconds: 60  # Default: 30
   ```

2. Check network latency to the MLflow server
3. Verify the MLflow server is responsive

### Prompt Not Found

**Error**: `Prompt pmpt_abc123... not found`

**Possible causes**:
1. The prompt ID is incorrect
2. The prompt was created in a different MLflow server or experiment
3. Experiment name mismatch in the configuration

**Solution**: Verify the prompt exists in the MLflow UI at `http://localhost:5555`

## Limitations

### No Deletion Support

**MLflow does not support deleting prompts or versions**. The `delete_prompt()` method raises `NotImplementedError`.

**Workaround**: Mark prompts as deprecated using naming conventions, or set a different version as the default (see the sketch below).
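
A sketch of the demotion workaround. One assumption here: how the provider's `NotImplementedError` surfaces on the client side depends on your deployment (it may arrive as an HTTP error), so this catches a generic exception:

```python
try:
    client.prompts.delete(prompt_id="pmpt_abc123...")
except Exception:
    # Deletion is unsupported; point the default back at a maintained version instead
    client.prompts.set_default_version(prompt_id="pmpt_abc123...", version=1)
```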

### Experiment Required

All prompts are stored within an MLflow experiment. The experiment is created automatically if it doesn't exist.

### ID Format Constraints

- Prompt IDs must follow the format `pmpt_<48-hex-chars>`
- MLflow names use the prefix `llama_prompt_`
- Prompts created manually in MLflow under other names won't be recognized

### Version Numbering

- Versions are sequential integers (1, 2, 3, ...)
- You cannot skip version numbers
- You cannot manually set version numbers

## Best Practices

### 1. Use Environment Variables

Store MLflow URIs in environment variables:

```yaml
config:
  mlflow_tracking_uri: ${env.MLFLOW_TRACKING_URI:=http://localhost:5555}
```

### 2. Auto-Extract Variables

Let the provider auto-extract variables to avoid validation errors:

```python
# Recommended
prompt = client.prompts.create(
    prompt="Summarize {{ text }} in {{ format }}"
)
```

### 3. Organize by Experiment

Use different experiments for different environments:

- `dev-prompts` for development
- `staging-prompts` for staging
- `production-prompts` for production

### 4. Version Management

- Always retrieve the latest version before updating
- Use `set_as_default=True` when updating to make the new version active
- Keep version history for an audit trail

### 5. Use Meaningful Templates

Include context in your templates:

```python
# Good
prompt = """You are a {{ role }} assistant specialized in {{ domain }}.

Task: {{ task }}

Output format: {{ format }}"""

# Less clear
prompt = "Do {{ task }} as {{ role }}"
```

### 6. Monitor MLflow Server

- Use the MLflow UI to visualize prompts: `http://your-server:5555`
- Monitor experiment metrics and prompt versions
- Set up alerts for MLflow server health (a scripted probe is sketched below)
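
A minimal liveness probe, assuming the `/health` endpoint used in Troubleshooting above (adjust the host and port to your deployment):

```python
import urllib.request

try:
    with urllib.request.urlopen("http://localhost:5555/health", timeout=5) as resp:
        print("MLflow healthy:", resp.status == 200)
except OSError as err:
    print("MLflow unreachable:", err)
```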

## Production Deployment

### Database Backend

For production, use a database backend instead of the filesystem:

```bash
mlflow server \
  --host 0.0.0.0 \
  --port 5000 \
  --backend-store-uri postgresql://user:pass@host:5432/mlflow \
  --default-artifact-root s3://my-bucket/mlflow-artifacts
```

### High Availability

- Deploy multiple MLflow server instances behind a load balancer
- Use a managed database (RDS, Cloud SQL, etc.)
- Store artifacts in object storage (S3, GCS, Azure Blob)

### Security

- Enable authentication on the MLflow server
- Use HTTPS for the MLflow tracking URI
- Restrict network access with firewall rules
- Use IAM roles for cloud deployments

### Monitoring

Set up monitoring for:
- MLflow server availability
- Database connection pool
- API response times
- Prompt creation/retrieval rates

## Documentation

See [MLflow's documentation](https://mlflow.org/docs/latest/prompts.html) for more details about the MLflow Prompt Registry.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `mlflow_tracking_uri` | `str` | No | `http://localhost:5000` | MLflow tracking server URI |
| `mlflow_registry_uri` | `str \| None` | No | `None` | MLflow model registry URI (defaults to the tracking URI if not set) |
| `experiment_name` | `str` | No | `llama-stack-prompts` | MLflow experiment name for storing prompts |
| `auth_credential` | `SecretStr \| None` | No | `None` | MLflow API token for authentication; can be overridden via the provider-data header |
| `timeout_seconds` | `int` | No | `30` | Timeout for MLflow API calls (1-300 seconds) |

## Sample Configuration

**Without authentication** (local development):

```yaml
mlflow_tracking_uri: http://localhost:5555
experiment_name: llama-stack-prompts
timeout_seconds: 30
```

**With authentication** (production):

```yaml
mlflow_tracking_uri: ${env.MLFLOW_TRACKING_URI:=http://localhost:5000}
experiment_name: llama-stack-prompts
auth_credential: ${env.MLFLOW_TRACKING_TOKEN:=}
timeout_seconds: 30
```