mirror of https://github.com/meta-llama/llama-stack.git
synced 2025-12-03 09:53:45 +00:00
Merge 5c4da04f29 into 4237eb4aaa (commit d5836c3b5a)
24 changed files with 3594 additions and 225 deletions

docs/docs/providers/prompts/index.mdx (new file, 92 lines)
@@ -0,0 +1,92 @@
---
sidebar_label: Prompts
title: Prompts
---

# Prompts

## Overview

This section contains documentation for all available providers for the **prompts** API.

The Prompts API enables centralized management of prompt templates with versioning, variable handling, and team collaboration capabilities.

## Available Providers

### Inline Providers

Inline providers run in the same process as the Llama Stack server and require no external dependencies:

- **[inline::reference](inline_reference.mdx)** - Reference implementation using a KVStore backend (SQLite, PostgreSQL, etc.)
  - Zero external dependencies
  - Supports local SQLite or PostgreSQL storage
  - Full CRUD operations, including deletion
  - Ideal for local development and single-server deployments

### Remote Providers

Remote providers connect to external services for centralized prompt management:

- **[remote::mlflow](remote_mlflow.mdx)** - MLflow Prompt Registry integration (requires MLflow 3.4+)
  - Centralized prompt management across teams
  - Built-in versioning and audit trail
  - Supports authentication (per-request, config, or environment variables)
  - Integrates with Databricks and enterprise MLflow deployments
  - Ideal for team collaboration and production environments

## Choosing a Provider

### Use `inline::reference` when:

- Developing locally or deploying to a single server
- You want zero external dependencies
- SQLite or PostgreSQL storage is sufficient
- You need full CRUD operations (including deletion)
- You prefer simple configuration

### Use `remote::mlflow` when:

- Working in a team environment with multiple users
- You need centralized prompt management
- You want integration with existing MLflow infrastructure
- You need authentication and multi-tenant support
- Advanced versioning and audit-trail capabilities are required

## Quick Start Examples

### Using inline::reference

```yaml
prompts:
  - provider_id: local-prompts
    provider_type: inline::reference
    config:
      run_config:
        storage:
          stores:
            prompts:
              type: sqlite
              db_path: ./prompts.db
```

### Using remote::mlflow

```yaml
prompts:
  - provider_id: mlflow-prompts
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: http://localhost:5555
      experiment_name: llama-stack-prompts
      auth_credential: ${env.MLFLOW_TRACKING_TOKEN}
```

## Common Features

All prompt providers support:

- Create and store prompts with version control
- Retrieve prompts by ID and version
- Update prompts (creates new versions)
- List all prompts or versions of a specific prompt
- Set the default version for a prompt
- Automatic variable extraction from `{{ variable }}` templates (see the sketch below)
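
As a minimal sketch of the last point (assuming a Llama Stack server at `http://localhost:5000` and the client API shown on the provider pages below), auto-extraction means you can omit `variables` entirely:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# No `variables` argument: the provider extracts them from the template
prompt = client.prompts.create(
    prompt="Translate {{ text }} into {{ language }}."
)
print(prompt.variables)  # expected: ["text", "language"]
```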

For detailed documentation on each provider, see the individual provider pages linked above.

docs/docs/providers/prompts/inline_reference.mdx (new file, 496 lines)
@@ -0,0 +1,496 @@
---
description: |
  Reference implementation of the Prompts API using KVStore backend (SQLite, PostgreSQL, etc.)
  for centralized prompt management with versioning support. This is the default provider for
  prompts that works without external dependencies.
sidebar_label: Inline - Reference
title: inline::reference
---

# inline::reference

## Description

Reference implementation of the Prompts API using a KVStore backend (SQLite, PostgreSQL, etc.) for centralized prompt management with versioning support. This is the default provider for prompts and works without external dependencies.

## Features

The Reference Prompts Provider supports:

- Create and store prompts with automatic versioning
- Retrieve prompts by ID and version
- Update prompts (creates new immutable versions)
- Delete prompts and their versions
- List all prompts or all versions of a specific prompt
- Set the default version for a prompt
- Automatic variable extraction from templates
- Storage in SQLite, PostgreSQL, or other KVStore backends

## Key Capabilities

- **Zero Dependencies**: No external services required; runs in-process
- **Flexible Storage**: Supports SQLite (default), PostgreSQL, and other KVStore backends
- **Version Control**: Immutable versioning ensures prompt history is preserved
- **Default Version Management**: Easily switch between prompt versions
- **Variable Auto-Extraction**: Automatically detects `{{ variable }}` placeholders
- **Full CRUD Support**: Unlike remote providers, supports deletion of prompts

## Usage

To use the Reference Prompts Provider in your Llama Stack project:

1. Configure your Llama Stack project with the `inline::reference` provider
2. Optionally configure the storage backend (defaults to SQLite)
3. Start creating and managing prompts

## Quick Start

### 1. Configure Llama Stack

**Basic configuration with SQLite** (default):

```yaml
prompts:
  - provider_id: reference-prompts
    provider_type: inline::reference
    config:
      run_config:
        storage:
          stores:
            prompts:
              type: sqlite
              db_path: ./prompts.db
```

**With PostgreSQL**:

```yaml
prompts:
  - provider_id: postgres-prompts
    provider_type: inline::reference
    config:
      run_config:
        storage:
          stores:
            prompts:
              type: postgres
              url: postgresql://user:pass@localhost/llama_stack
```

### 2. Use the Prompts API

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# Create a prompt
prompt = client.prompts.create(
    prompt="Summarize the following text in {{ num_sentences }} sentences:\n\n{{ text }}",
    variables=["num_sentences", "text"],
)
print(f"Created prompt: {prompt.prompt_id} (v{prompt.version})")

# Retrieve prompt
retrieved = client.prompts.get(prompt_id=prompt.prompt_id)
print(f"Retrieved: {retrieved.prompt}")

# Update prompt (creates version 2)
updated = client.prompts.update(
    prompt_id=prompt.prompt_id,
    prompt="Summarize in exactly {{ num_sentences }} sentences:\n\n{{ text }}",
    version=1,
    set_as_default=True,
)
print(f"Updated to version: {updated.version}")

# List all prompts
prompts = client.prompts.list()
print(f"Found {len(prompts.data)} prompts")

# Delete prompt
client.prompts.delete(prompt_id=prompt.prompt_id)
```

## Configuration Examples

### SQLite (Local Development)

For local development with filesystem storage:

```yaml
prompts:
  - provider_id: local-prompts
    provider_type: inline::reference
    config:
      run_config:
        storage:
          stores:
            prompts:
              type: sqlite
              db_path: ./prompts.db
```

### PostgreSQL (Production)

For production with PostgreSQL:

```yaml
prompts:
  - provider_id: prod-prompts
    provider_type: inline::reference
    config:
      run_config:
        storage:
          stores:
            prompts:
              type: postgres
              url: ${env.DATABASE_URL}
```

### With Explicit Backend Configuration

```yaml
prompts:
  - provider_id: reference-prompts
    provider_type: inline::reference
    config:
      run_config:
        storage:
          backends:
            kv_default:
              type: sqlite
              db_path: ./data/prompts.db
          stores:
            prompts:
              backend: kv_default
              namespace: prompts
```

## API Reference

### Create Prompt

Creates a new prompt (version 1):

```python
prompt = client.prompts.create(
    prompt="You are a {{ role }} assistant. {{ instruction }}",
    variables=["role", "instruction"],  # Optional - auto-extracted if omitted
)
```

**Auto-extraction**: If `variables` is not provided, the provider automatically extracts variables from `{{ variable }}` placeholders.

### Retrieve Prompt

Get a prompt by ID (retrieves the default version):

```python
prompt = client.prompts.get(prompt_id="pmpt_abc123...")
```

Get a specific version:

```python
prompt = client.prompts.get(prompt_id="pmpt_abc123...", version=2)
```

### Update Prompt

Creates a new version of an existing prompt:

```python
updated = client.prompts.update(
    prompt_id="pmpt_abc123...",
    prompt="Updated template with {{ variable }}",
    version=1,  # Must be the latest version
    set_as_default=True,  # Make this the new default
)
```

**Important**: You must provide the current latest version number. The update creates a new version (e.g., version 2).

### Delete Prompt

Delete a prompt and all its versions:

```python
client.prompts.delete(prompt_id="pmpt_abc123...")
```

**Note**: This operation is permanent and deletes all versions of the prompt.

### List Prompts

List all prompts (returns default versions only):

```python
response = client.prompts.list()
for prompt in response.data:
    print(f"{prompt.prompt_id}: v{prompt.version} (default)")
```

### List Prompt Versions

List all versions of a specific prompt:

```python
response = client.prompts.list_versions(prompt_id="pmpt_abc123...")
for prompt in response.data:
    default = " (default)" if prompt.is_default else ""
    print(f"Version {prompt.version}{default}")
```

### Set Default Version

Change which version is the default:

```python
client.prompts.set_default_version(
    prompt_id="pmpt_abc123...",
    version=2,
)
```

## Version Management

The Reference Prompts Provider implements immutable versioning:

1. **Create**: Creates version 1
2. **Update**: Creates a new version (2, 3, 4, ...)
3. **Default**: One version is marked as the default
4. **History**: All versions are preserved and retrievable
5. **Delete**: Deletes all versions at once

```
pmpt_abc123
├── Version 1 (Original)
├── Version 2 (Updated)
└── Version 3 (Latest, Default) <- Current default version
```
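
The whole lifecycle, as a minimal sketch (assuming the `client` from the Quick Start above):

```python
# 1. Create -> version 1
prompt = client.prompts.create(prompt="Draft an email about {{ topic }}")

# 2. Update -> version 2 (pass the current latest version)
updated = client.prompts.update(
    prompt_id=prompt.prompt_id,
    prompt="Draft a short email about {{ topic }}",
    version=prompt.version,
)

# 3./4. Every version is preserved and retrievable
for v in client.prompts.list_versions(prompt_id=prompt.prompt_id).data:
    print(v.version, "(default)" if v.is_default else "")

# Switch the default, then 5. delete all versions at once
client.prompts.set_default_version(prompt_id=prompt.prompt_id, version=updated.version)
client.prompts.delete(prompt_id=prompt.prompt_id)
```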

## Storage Backends

The reference provider uses Llama Stack's KVStore abstraction, which supports multiple backends.

### SQLite (Default)

Best for:
- Local development
- Single-server deployments
- Embedded applications
- Testing

Limitations:
- Not suitable for high-concurrency scenarios
- No built-in replication

### PostgreSQL

Best for:
- Production deployments
- Multi-server setups
- High-availability requirements
- Team collaboration

Advantages:
- Supports concurrent access
- Built-in replication and backups
- Scalable and robust

## Best Practices

### 1. Choose Appropriate Storage

**Development**:

```yaml
# Use SQLite for local development
storage:
  stores:
    prompts:
      type: sqlite
      db_path: ./dev-prompts.db
```

**Production**:

```yaml
# Use PostgreSQL for production
storage:
  stores:
    prompts:
      type: postgres
      url: ${env.DATABASE_URL}
```

### 2. Backup Your Data

For SQLite:

```bash
# Back up the SQLite database
cp prompts.db prompts.db.backup
```

For PostgreSQL:

```bash
# Back up the PostgreSQL database
pg_dump llama_stack > backup.sql
```

### 3. Version Management

- Always retrieve the latest version before updating
- Use `set_as_default=True` when updating to make the new version active
- Keep version history for an audit trail
- Use deletion sparingly (consider archiving instead)

### 4. Auto-Extract Variables

Let the provider auto-extract variables to avoid validation errors:

```python
# Recommended
prompt = client.prompts.create(
    prompt="Summarize {{ text }} in {{ format }}"
)
```

### 5. Use Meaningful Templates

Include context in your templates:

```python
# Good
prompt = """You are a {{ role }} assistant specialized in {{ domain }}.

Task: {{ task }}

Output format: {{ format }}"""

# Less clear
prompt = "Do {{ task }} as {{ role }}"
```

## Troubleshooting

### Database Connection Errors

**Error**: Failed to connect to database

**Solutions**:
1. Verify the database URL is correct
2. Ensure the database server is running (for PostgreSQL)
3. Check file permissions (for SQLite)
4. Verify network connectivity (for remote databases)

### Version Mismatch Error

**Error**: `Version X is not the latest version. Use latest version Y to update.`

**Cause**: Attempting to update from an outdated version number

**Solution**: Always use the latest version number when updating:

```python
# Get the latest version
versions = client.prompts.list_versions(prompt_id)
latest_version = max(v.version for v in versions.data)

# Use the latest version for the update
client.prompts.update(prompt_id=prompt_id, version=latest_version, ...)
```

### Variable Validation Error

**Error**: `Template contains undeclared variables: ['var2']`

**Cause**: The template contains `{{ var2 }}` but the `variables` list doesn't include it

**Solution**: Either add the missing variable or let the provider auto-extract:

```python
# Option 1: Add the missing variable
client.prompts.create(
    prompt="Template with {{ var1 }} and {{ var2 }}",
    variables=["var1", "var2"],
)

# Option 2: Let the provider auto-extract (recommended)
client.prompts.create(
    prompt="Template with {{ var1 }} and {{ var2 }}"
)
```

### Prompt Not Found

**Error**: `Prompt pmpt_abc123... not found`

**Possible causes**:
1. The prompt ID is incorrect
2. The prompt was deleted
3. Wrong database or storage backend

**Solution**: Verify the prompt exists using the `list()` method, as sketched below.
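
A quick check along those lines (assuming the `client` from the Quick Start above):

```python
# Confirm the ID you are using actually exists in this backend
response = client.prompts.list()
known_ids = {p.prompt_id for p in response.data}
print("pmpt_abc123..." in known_ids)  # False -> wrong ID, deleted, or wrong store
```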

## Migration Guide

### Migrating from the Core Implementation

If you're upgrading from an older Llama Stack version where prompts lived in `core/prompts`:

**Old code** (still works):

```python
from llama_stack.core.prompts import PromptServiceConfig, PromptServiceImpl
```

**New code** (recommended):

```python
from llama_stack.providers.inline.prompts.reference import ReferencePromptsConfig, PromptServiceImpl
```

**Note**: Backward compatibility is maintained; old imports still work.

### Data Migration

No data migration is needed when upgrading:
- The same KVStore backend is used
- Existing prompts remain accessible
- The configuration structure is compatible

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `run_config` | `StackRunConfig` | Yes | | Stack run configuration containing the storage configuration for the KVStore |

## Sample Configuration

```yaml
run_config:
  storage:
    backends:
      kv_default:
        type: sqlite
        db_path: ./prompts.db
    stores:
      prompts:
        backend: kv_default
        namespace: prompts
```

docs/docs/providers/prompts/remote_mlflow.mdx (new file, 751 lines)
@@ -0,0 +1,751 @@
---
description: |
  [MLflow](https://mlflow.org/) is a remote provider for centralized prompt management and versioning
  using MLflow's Prompt Registry (available in MLflow 3.4+). It allows you to store, version, and manage
  prompts in a centralized MLflow server, enabling team collaboration and prompt lifecycle management.

  See [MLflow's documentation](https://mlflow.org/docs/latest/prompts.html) for more details about the MLflow Prompt Registry.
sidebar_label: Remote - MLflow
title: remote::mlflow
---

# remote::mlflow

## Description

[MLflow](https://mlflow.org/) is a remote provider for centralized prompt management and versioning using MLflow's Prompt Registry (available in MLflow 3.4+). It allows you to store, version, and manage prompts in a centralized MLflow server, enabling team collaboration and prompt lifecycle management.

## Features

The MLflow Prompts Provider supports:

- Create and store prompts with automatic versioning
- Retrieve prompts by ID and version
- Update prompts (creates new immutable versions)
- List all prompts or all versions of a specific prompt
- Set the default version for a prompt
- Automatic variable extraction from templates
- Metadata storage and retrieval
- Centralized prompt management across teams

## Key Capabilities

- **Version Control**: Immutable versioning ensures prompt history is preserved
- **Default Version Management**: Easily switch between prompt versions
- **Variable Auto-Extraction**: Automatically detects `{{ variable }}` placeholders
- **Metadata Tags**: Stores Llama Stack metadata for seamless integration
- **Team Collaboration**: A centralized MLflow server enables multi-user access

## Usage

To use the MLflow Prompts Provider in your Llama Stack project:

1. Install MLflow 3.4 or later
2. Start an MLflow server (local or remote)
3. Configure your Llama Stack project to use the MLflow provider
4. Start creating and managing prompts

## Installation

Install MLflow using pip or uv:

```bash
pip install 'mlflow>=3.4.0'
# or
uv pip install 'mlflow>=3.4.0'
```

## Quick Start

### 1. Start MLflow Server

**Local server** (for development):

```bash
mlflow server --host 127.0.0.1 --port 5555
```

**Remote server** (for production):

```bash
mlflow server --host 0.0.0.0 --port 5000 --backend-store-uri postgresql://user:pass@host/db
```

### 2. Configure Llama Stack

Add to your Llama Stack configuration:

```yaml
prompts:
  - provider_id: mlflow-prompts
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: http://localhost:5555
      experiment_name: llama-stack-prompts
```

### 3. Use the Prompts API

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# Create a prompt
prompt = client.prompts.create(
    prompt="Summarize the following text in {{ num_sentences }} sentences:\n\n{{ text }}",
    variables=["num_sentences", "text"],
)
print(f"Created prompt: {prompt.prompt_id} (v{prompt.version})")

# Retrieve prompt
retrieved = client.prompts.get(prompt_id=prompt.prompt_id)
print(f"Retrieved: {retrieved.prompt}")

# Update prompt (creates version 2)
updated = client.prompts.update(
    prompt_id=prompt.prompt_id,
    prompt="Summarize in exactly {{ num_sentences }} sentences:\n\n{{ text }}",
    version=1,
    set_as_default=True,
)
print(f"Updated to version: {updated.version}")

# List all prompts
prompts = client.prompts.list()
print(f"Found {len(prompts.data)} prompts")
```

## Configuration Examples

### Local Development

For local development with filesystem storage:

```yaml
prompts:
  - provider_id: mlflow-local
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: http://localhost:5555
      experiment_name: dev-prompts
      timeout_seconds: 30
```

### Remote MLflow Server

For production with a remote MLflow server:

```yaml
prompts:
  - provider_id: mlflow-production
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: ${env.MLFLOW_TRACKING_URI}
      experiment_name: production-prompts
      timeout_seconds: 60
```

### Advanced Configuration

With custom settings:

```yaml
prompts:
  - provider_id: mlflow-custom
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: https://mlflow.example.com
      experiment_name: team-prompts
      timeout_seconds: 45
```

## Authentication

The MLflow provider supports three authentication methods, with the following precedence (highest to lowest):

1. **Per-request provider data** (via headers)
2. **Configuration auth credential** (in the config file)
3. **Environment variables** (MLflow defaults)

### Method 1: Per-Request Provider Data (Recommended for Multi-Tenant)

For multi-tenant deployments where each user has their own credentials:

**Configuration**:

```yaml
prompts:
  - provider_id: mlflow-prompts
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: http://mlflow.company.com
      experiment_name: production-prompts
      # No auth_credential - use per-request tokens
```

**Client Usage**:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# User 1 with their own token
prompts_user1 = client.prompts.list(
    extra_headers={
        "x-llamastack-provider-data": '{"mlflow_api_token": "user1-token"}'
    }
)

# User 2 with their own token
prompts_user2 = client.prompts.list(
    extra_headers={
        "x-llamastack-provider-data": '{"mlflow_api_token": "user2-token"}'
    }
)
```

**Benefits**:
- Per-user authentication and authorization
- No shared credentials
- Ideal for SaaS deployments
- Supports user-specific MLflow experiments

### Method 2: Configuration Auth Credential (Server-Level)

For server-level authentication where all requests use the same credentials:

**Using an environment variable** (recommended):

```yaml
prompts:
  - provider_id: mlflow-prompts
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: http://mlflow.company.com
      experiment_name: production-prompts
      auth_credential: ${env.MLFLOW_TRACKING_TOKEN}
```

**Using a direct value** (not recommended for production):

```yaml
prompts:
  - provider_id: mlflow-prompts
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: http://mlflow.company.com
      experiment_name: production-prompts
      auth_credential: "mlflow-server-token"
```

**Client Usage**:

```python
# No extra headers needed - the server handles authentication
client = LlamaStackClient(base_url="http://localhost:5000")
prompts = client.prompts.list()
```

**Benefits**:
- Simple configuration
- Single point of credential management
- Good for single-tenant deployments

### Method 3: Environment Variables (MLflow Default)

MLflow reads its standard environment variables automatically.

**Set them before starting Llama Stack**:

```bash
export MLFLOW_TRACKING_TOKEN="your-token"
export MLFLOW_TRACKING_USERNAME="user"  # Optional: Basic auth
export MLFLOW_TRACKING_PASSWORD="pass"  # Optional: Basic auth
llama stack run my-config.yaml
```

**Configuration** (no `auth_credential` needed):

```yaml
prompts:
  - provider_id: mlflow-prompts
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: http://mlflow.company.com
      experiment_name: production-prompts
```

**Benefits**:
- Standard MLflow behavior
- No configuration changes needed
- Good for containerized deployments

### Databricks Authentication

For Databricks-managed MLflow:

**Configuration**:

```yaml
prompts:
  - provider_id: databricks-prompts
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: databricks
      # Or with a workspace profile:
      # mlflow_tracking_uri: databricks://profile-name
      experiment_name: /Shared/llama-stack-prompts
      auth_credential: ${env.DATABRICKS_TOKEN}
```

**Environment Setup**:

```bash
export DATABRICKS_TOKEN="dapi..."
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
```

**Client Usage**:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# Create a prompt in Databricks MLflow
prompt = client.prompts.create(
    prompt="Analyze {{ topic }} with focus on {{ aspect }}",
    variables=["topic", "aspect"],
)

# View in the Databricks UI:
# https://workspace.cloud.databricks.com/#mlflow/experiments/<experiment-id>
```

### Enterprise MLflow with Authentication

Example for an enterprise MLflow server with API-key authentication:

**Configuration**:

```yaml
prompts:
  - provider_id: enterprise-mlflow
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: https://mlflow.enterprise.com
      experiment_name: production-prompts
      auth_credential: ${env.MLFLOW_API_KEY}
      timeout_seconds: 60
```

**Client Usage**:

```python
from llama_stack_client import LlamaStackClient

# Option A: Use the server's configured credential
client = LlamaStackClient(base_url="http://localhost:5000")
prompt = client.prompts.create(
    prompt="Classify sentiment: {{ text }}",
    variables=["text"],
)

# Option B: Override with a per-request credential
prompt = client.prompts.create(
    prompt="Classify sentiment: {{ text }}",
    variables=["text"],
    extra_headers={
        "x-llamastack-provider-data": '{"mlflow_api_token": "user-specific-key"}'
    },
)
```

### Authentication Precedence

When multiple authentication methods are configured, the provider uses this precedence:

1. **Per-request provider data** (from the `x-llamastack-provider-data` header)
   - Highest priority
   - Overrides all other methods
   - Used for multi-tenant scenarios

2. **Configuration `auth_credential`** (from the config file)
   - Medium priority
   - Fallback if no provider-data header is present
   - Good for server-level auth

3. **Environment variables** (MLflow standard)
   - Lowest priority
   - Used if no other credentials are provided
   - Standard MLflow behavior

**Example showing precedence**:

```yaml
# Config file
prompts:
  - provider_id: mlflow
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: http://mlflow.company.com
      auth_credential: ${env.MLFLOW_TRACKING_TOKEN}  # Fallback
```

```bash
# Environment variable
export MLFLOW_TRACKING_TOKEN="server-token"  # Lowest priority
```

```python
# Client code
client.prompts.create(
    prompt="Test",
    extra_headers={
        # This takes precedence over the config and environment variables
        "x-llamastack-provider-data": '{"mlflow_api_token": "user-token"}'
    },
)
```

### Security Best Practices

1. **Never hardcode tokens** in configuration files:

   ```yaml
   # Bad - hardcoded credential
   auth_credential: "my-secret-token"

   # Good - use an environment variable
   auth_credential: ${env.MLFLOW_TRACKING_TOKEN}
   ```

2. **Use per-request credentials** for multi-tenant deployments:

   ```python
   # Good - each user provides their own token
   headers = {
       "x-llamastack-provider-data": f'{{"mlflow_api_token": "{user_token}"}}'
   }
   client.prompts.list(extra_headers=headers)
   ```

3. **Rotate credentials regularly** in production environments.

4. **Use HTTPS** for the MLflow tracking URI in production:

   ```yaml
   mlflow_tracking_uri: https://mlflow.company.com  # Good
   # Not: http://mlflow.company.com  # Bad for production
   ```

5. **Store secrets in secure vaults** (AWS Secrets Manager, HashiCorp Vault, etc.)

## API Reference

### Create Prompt

Creates a new prompt (version 1), registering it in MLflow:

```python
prompt = client.prompts.create(
    prompt="You are a {{ role }} assistant. {{ instruction }}",
    variables=["role", "instruction"],  # Optional - auto-extracted if omitted
)
```

**Auto-extraction**: If `variables` is not provided, the provider automatically extracts variables from `{{ variable }}` placeholders.

### Retrieve Prompt

Get a prompt by ID (retrieves the default version):

```python
prompt = client.prompts.get(prompt_id="pmpt_abc123...")
```

Get a specific version:

```python
prompt = client.prompts.get(prompt_id="pmpt_abc123...", version=2)
```

### Update Prompt

Creates a new version of an existing prompt:

```python
updated = client.prompts.update(
    prompt_id="pmpt_abc123...",
    prompt="Updated template with {{ variable }}",
    version=1,  # Must be the latest version
    set_as_default=True,  # Make this the new default
)
```

**Important**: You must provide the current latest version number. The update creates a new version (e.g., version 2).

### List Prompts

List all prompts (returns default versions only):

```python
response = client.prompts.list()
for prompt in response.data:
    print(f"{prompt.prompt_id}: v{prompt.version} (default)")
```

### List Prompt Versions

List all versions of a specific prompt:

```python
response = client.prompts.list_versions(prompt_id="pmpt_abc123...")
for prompt in response.data:
    default = " (default)" if prompt.is_default else ""
    print(f"Version {prompt.version}{default}")
```

### Set Default Version

Change which version is the default:

```python
client.prompts.set_default_version(
    prompt_id="pmpt_abc123...",
    version=2,
)
```

## ID Mapping

The MLflow provider uses a deterministic, bidirectional ID mapping:

- **Llama Stack format**: `pmpt_<48-hex-chars>`
- **MLflow format**: `llama_prompt_<48-hex-chars>`

Example:
- Llama Stack ID: `pmpt_8c2bf57972a215cd0413e399d03b901cce93815448173c1c`
- MLflow name: `llama_prompt_8c2bf57972a215cd0413e399d03b901cce93815448173c1c`

This ensures prompts created through Llama Stack are easily identifiable in MLflow.
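
Since the mapping is a pure prefix swap, it can be illustrated with two hypothetical helpers (the provider performs this translation internally; these functions are not part of the client API):

```python
# Illustrative only - not part of the client API
def to_mlflow_name(prompt_id: str) -> str:
    """pmpt_<48-hex> -> llama_prompt_<48-hex>"""
    return "llama_prompt_" + prompt_id.removeprefix("pmpt_")

def to_llama_id(mlflow_name: str) -> str:
    """llama_prompt_<48-hex> -> pmpt_<48-hex>"""
    return "pmpt_" + mlflow_name.removeprefix("llama_prompt_")

print(to_mlflow_name("pmpt_8c2bf57972a215cd0413e399d03b901cce93815448173c1c"))
# llama_prompt_8c2bf57972a215cd0413e399d03b901cce93815448173c1c
```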

## Version Management

The MLflow Prompts Provider implements immutable versioning:

1. **Create**: Creates version 1
2. **Update**: Creates a new version (2, 3, 4, ...)
3. **Default**: The "default" alias points to the current default version
4. **History**: All versions are preserved and retrievable

```
pmpt_abc123
├── Version 1 (Original)
├── Version 2 (Updated)
└── Version 3 (Latest, Default) ← Default alias points here
```

## Troubleshooting

### MLflow Server Not Available

**Error**: `Failed to connect to MLflow server`

**Solutions**:
1. Verify the MLflow server is running: `curl http://localhost:5555/health`
2. Check `mlflow_tracking_uri` in the configuration
3. Ensure network connectivity to the remote server
4. Check firewall settings

### Version Mismatch Error

**Error**: `Version X is not the latest version. Use latest version Y to update.`

**Cause**: Attempting to update from an outdated version number

**Solution**: Always use the latest version number when updating:

```python
# Get the latest version
versions = client.prompts.list_versions(prompt_id)
latest_version = max(v.version for v in versions.data)

# Use the latest version for the update
client.prompts.update(prompt_id=prompt_id, version=latest_version, ...)
```

### Variable Validation Error

**Error**: `Template contains undeclared variables: ['var2']`

**Cause**: The template contains `{{ var2 }}` but the `variables` list doesn't include it

**Solution**: Either add the missing variable or let the provider auto-extract:

```python
# Option 1: Add the missing variable
client.prompts.create(
    prompt="Template with {{ var1 }} and {{ var2 }}",
    variables=["var1", "var2"],
)

# Option 2: Let the provider auto-extract (recommended)
client.prompts.create(
    prompt="Template with {{ var1 }} and {{ var2 }}"
)
```

### Timeout Errors

**Error**: Connection timeout when communicating with MLflow

**Solutions**:
1. Increase `timeout_seconds` in the configuration:

   ```yaml
   config:
     timeout_seconds: 60  # Default: 30
   ```

2. Check network latency to the MLflow server
3. Verify the MLflow server is responsive

### Prompt Not Found

**Error**: `Prompt pmpt_abc123... not found`

**Possible causes**:
1. The prompt ID is incorrect
2. The prompt was created in a different MLflow server or experiment
3. Experiment name mismatch in the configuration

**Solution**: Verify the prompt exists in the MLflow UI at `http://localhost:5555`

## Limitations

### No Deletion Support

**MLflow does not support deleting prompts or versions**. The `delete_prompt()` method raises `NotImplementedError`.

**Workaround**: Mark prompts as deprecated using naming conventions, or set a different version as the default (see the sketch below).
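
A sketch of the demotion workaround. One assumption here: how the provider's `NotImplementedError` surfaces on the client side depends on your deployment (it may arrive as an HTTP error), so this catches a generic exception:

```python
try:
    client.prompts.delete(prompt_id="pmpt_abc123...")
except Exception:
    # Deletion is unsupported; point the default back at a maintained version instead
    client.prompts.set_default_version(prompt_id="pmpt_abc123...", version=1)
```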

### Experiment Required

All prompts are stored within an MLflow experiment. The experiment is created automatically if it doesn't exist.

### ID Format Constraints

- Prompt IDs must follow the format `pmpt_<48-hex-chars>`
- MLflow names use the prefix `llama_prompt_`
- Prompts created manually in MLflow under other names won't be recognized

### Version Numbering

- Versions are sequential integers (1, 2, 3, ...)
- You cannot skip version numbers
- You cannot manually set version numbers

## Best Practices

### 1. Use Environment Variables

Store MLflow URIs in environment variables:

```yaml
config:
  mlflow_tracking_uri: ${env.MLFLOW_TRACKING_URI:=http://localhost:5555}
```

### 2. Auto-Extract Variables

Let the provider auto-extract variables to avoid validation errors:

```python
# Recommended
prompt = client.prompts.create(
    prompt="Summarize {{ text }} in {{ format }}"
)
```

### 3. Organize by Experiment

Use different experiments for different environments:

- `dev-prompts` for development
- `staging-prompts` for staging
- `production-prompts` for production

### 4. Version Management

- Always retrieve the latest version before updating
- Use `set_as_default=True` when updating to make the new version active
- Keep version history for an audit trail

### 5. Use Meaningful Templates

Include context in your templates:

```python
# Good
prompt = """You are a {{ role }} assistant specialized in {{ domain }}.

Task: {{ task }}

Output format: {{ format }}"""

# Less clear
prompt = "Do {{ task }} as {{ role }}"
```

### 6. Monitor MLflow Server

- Use the MLflow UI to visualize prompts: `http://your-server:5555`
- Monitor experiment metrics and prompt versions
- Set up alerts for MLflow server health (a scripted probe is sketched below)
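
A minimal liveness probe, assuming the `/health` endpoint used in Troubleshooting above (adjust the host and port to your deployment):

```python
import urllib.request

try:
    with urllib.request.urlopen("http://localhost:5555/health", timeout=5) as resp:
        print("MLflow healthy:", resp.status == 200)
except OSError as err:
    print("MLflow unreachable:", err)
```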

## Production Deployment

### Database Backend

For production, use a database backend instead of the filesystem:

```bash
mlflow server \
  --host 0.0.0.0 \
  --port 5000 \
  --backend-store-uri postgresql://user:pass@host:5432/mlflow \
  --default-artifact-root s3://my-bucket/mlflow-artifacts
```

### High Availability

- Deploy multiple MLflow server instances behind a load balancer
- Use a managed database (RDS, Cloud SQL, etc.)
- Store artifacts in object storage (S3, GCS, Azure Blob)

### Security

- Enable authentication on the MLflow server
- Use HTTPS for the MLflow tracking URI
- Restrict network access with firewall rules
- Use IAM roles for cloud deployments

### Monitoring

Set up monitoring for:
- MLflow server availability
- Database connection pool
- API response times
- Prompt creation/retrieval rates

## Documentation

See [MLflow's documentation](https://mlflow.org/docs/latest/prompts.html) for more details about the MLflow Prompt Registry.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `mlflow_tracking_uri` | `str` | No | `http://localhost:5000` | MLflow tracking server URI |
| `mlflow_registry_uri` | `str \| None` | No | `None` | MLflow model registry URI (defaults to the tracking URI if not set) |
| `experiment_name` | `str` | No | `llama-stack-prompts` | MLflow experiment name for storing prompts |
| `auth_credential` | `SecretStr \| None` | No | `None` | MLflow API token for authentication; can be overridden via the provider-data header |
| `timeout_seconds` | `int` | No | `30` | Timeout for MLflow API calls (1-300 seconds) |

## Sample Configuration

**Without authentication** (local development):

```yaml
mlflow_tracking_uri: http://localhost:5555
experiment_name: llama-stack-prompts
timeout_seconds: 30
```

**With authentication** (production):

```yaml
mlflow_tracking_uri: ${env.MLFLOW_TRACKING_URI:=http://localhost:5000}
experiment_name: llama-stack-prompts
auth_credential: ${env.MLFLOW_TRACKING_TOKEN:=}
timeout_seconds: 30
```