Removed error handling for Chunk and added error handling for maybe_await

kimbwook 2025-12-02 17:33:05 +09:00
commit 1de6d49064
No known key found for this signature in database
GPG key ID: 13B032C99CBD373A
3053 changed files with 324713 additions and 738731 deletions


@ -13,6 +13,42 @@ npm run serve
```
You can open up the docs in your browser at http://localhost:3000
## File Import System
This documentation uses `remark-code-import` to import files directly from the repository, eliminating copy-paste maintenance. Files are automatically embedded during build time.
### Importing Code Files
To import Python code (or any code files) with syntax highlighting, use this syntax in `.mdx` files:
````markdown
```python file=./demo_script.py title="demo_script.py"
```
````
This automatically imports the file content and displays it as a formatted code block with Python syntax highlighting.
**Note:** Paths are relative to the current `.mdx` file location, not the repository root.
### Importing Markdown Files as Content
For importing and rendering markdown files (like CONTRIBUTING.md), use the raw-loader approach:
```jsx
import Contributing from '!!raw-loader!../../../CONTRIBUTING.md';
import ReactMarkdown from 'react-markdown';
<ReactMarkdown>{Contributing}</ReactMarkdown>
```
**Requirements:**
- Install dependencies: `npm install --save-dev raw-loader react-markdown`
**Path Resolution:**
- For `remark-code-import`: Paths are relative to the current `.mdx` file location
- For `raw-loader`: Paths are relative to the current `.mdx` file location
- Use `../` to navigate up directories as needed
## Content
Try out Llama Stack's capabilities through our detailed Jupyter notebooks:


@ -0,0 +1,62 @@
---
title: Deprecated APIs
description: Legacy APIs that are being phased out
sidebar_label: Deprecated
sidebar_position: 1
---
# Deprecated APIs
This section contains APIs that are being phased out in favor of newer, more standardized implementations. These APIs are maintained for backward compatibility but are not recommended for new projects.
:::warning Deprecation Notice
These APIs are deprecated and will be removed in future versions. Please migrate to the recommended alternatives listed below.
:::
## Migration Guide
When using deprecated APIs, please refer to the migration guides provided for each API to understand how to transition to the supported alternatives.
## Deprecated API List
### Legacy Inference APIs
Some older inference endpoints that have been superseded by the standardized Inference API.
**Migration Path:** Use the [Inference API](../api/) instead.
### Legacy Vector Operations
Older vector database operations that have been replaced by the Vector IO API.
**Migration Path:** Use the [Vector IO API](../api/) instead.
### Legacy File Operations
Older file management endpoints that have been replaced by the Files API.
**Migration Path:** Use the [Files API](../api/) instead.
## Support Timeline
Deprecated APIs will be supported according to the following timeline:
- **Current Version**: Full support with deprecation warnings
- **Next Major Version**: Limited support with migration notices
- **Following Major Version**: Removal of deprecated APIs
## Getting Help
If you need assistance migrating from deprecated APIs:
1. Check the specific migration guides for each API
2. Review the [API Reference](../api/) for current alternatives
3. Consult the [Community Forums](https://github.com/llamastack/llama-stack/discussions) for migration support
4. Open an issue on GitHub for specific migration questions
## Contributing
If you find issues with deprecated APIs or have suggestions for improving the migration process, please contribute by:
1. Opening an issue describing the problem
2. Submitting a pull request with improvements
3. Updating migration documentation
For more information on contributing, see our [Contributing Guide](../contributing/).


@ -0,0 +1,128 @@
---
title: Experimental APIs
description: APIs in development with limited support
sidebar_label: Experimental
sidebar_position: 1
---
# Experimental APIs
This section contains APIs that are currently in development and may have limited support or stability. These APIs are available for testing and feedback but should not be used in production environments.
:::warning Experimental Notice
These APIs are experimental and may change without notice. Use with caution and provide feedback to help improve them.
:::
## Current Experimental APIs
### Batch Inference API
Run inference on a dataset of inputs in batch mode for improved efficiency.
**Status:** In Development
**Provider Support:** Limited
**Use Case:** Large-scale inference operations
**Features:**
- Batch processing of multiple inputs
- Optimized resource utilization
- Progress tracking and monitoring
### Batch Agents API
Run agentic workflows on a dataset of inputs in batch mode.
**Status:** In Development
**Provider Support:** Limited
**Use Case:** Large-scale agent operations
**Features:**
- Batch agent execution
- Parallel processing capabilities
- Result aggregation and analysis
### Synthetic Data Generation API
Generate synthetic data for model development and testing.
**Status:** Early Development
**Provider Support:** Very Limited
**Use Case:** Training data augmentation
**Features:**
- Automated data generation
- Quality control mechanisms
- Customizable generation parameters
### Batches API (OpenAI-compatible)
OpenAI-compatible batch management for inference operations.
**Status:** In Development
**Provider Support:** Limited
**Use Case:** OpenAI batch processing compatibility
**Features:**
- OpenAI batch API compatibility
- Job scheduling and management
- Status tracking and monitoring
## Getting Started with Experimental APIs
### Prerequisites
- Llama Stack server running with experimental features enabled
- Appropriate provider configurations
- Understanding of API limitations
### Configuration
Experimental APIs may require special configuration flags or provider settings. Check the specific API documentation for setup requirements.
### Usage Guidelines
1. **Testing Only**: Use experimental APIs for testing and development only
2. **Monitor Changes**: Watch for updates and breaking changes
3. **Provide Feedback**: Report issues and suggest improvements
4. **Backup Data**: Always backup important data when using experimental features
## Feedback and Contribution
We encourage feedback on experimental APIs to help improve them:
### Reporting Issues
- Use GitHub issues with the "experimental" label
- Include detailed error messages and reproduction steps
- Specify the API version and provider being used
### Feature Requests
- Submit feature requests through GitHub discussions
- Provide use cases and expected behavior
- Consider contributing implementations
### Testing
- Test experimental APIs in your environment
- Report performance issues and optimization opportunities
- Share success stories and use cases
## Migration to Stable APIs
As experimental APIs mature, they will be moved to the stable API section. When this happens:
1. **Announcement**: We'll announce the promotion in release notes
2. **Migration Guide**: Detailed migration instructions will be provided
3. **Deprecation Timeline**: Experimental versions will be deprecated with notice
4. **Support**: Full support will be available for stable versions
## Provider Support
Experimental APIs may have limited provider support. Check the specific API documentation for:
- Supported providers
- Configuration requirements
- Known limitations
- Performance characteristics
## Roadmap
Experimental APIs are part of our ongoing development roadmap:
- **Q1 2024**: Batch Inference API stabilization
- **Q2 2024**: Batch Agents API improvements
- **Q3 2024**: Synthetic Data Generation API expansion
- **Q4 2024**: Batches API full OpenAI compatibility
For the latest updates, follow our [GitHub releases](https://github.com/llamastack/llama-stack/releases) and [roadmap discussions](https://github.com/llamastack/llama-stack/discussions).


@ -0,0 +1,287 @@
---
title: OpenAI API Compatibility
description: OpenAI-compatible APIs and features in Llama Stack
sidebar_label: OpenAI Compatibility
sidebar_position: 1
---
# OpenAI API Compatibility
Llama Stack provides comprehensive OpenAI API compatibility, allowing you to use existing OpenAI API clients and tools with Llama Stack providers. This compatibility layer ensures seamless migration and interoperability.
## Overview
OpenAI API compatibility in Llama Stack includes:
- **OpenAI-compatible endpoints** for all major APIs
- **Request/response format compatibility** with OpenAI standards
- **Authentication and authorization** using OpenAI-style API keys
- **Error handling** with OpenAI-compatible error codes and messages
- **Rate limiting** and usage tracking compatible with OpenAI patterns
## Supported OpenAI APIs
### Chat Completions API
OpenAI-compatible chat completions for conversational AI applications.
**Endpoint:** `/v1/chat/completions`
**Compatibility:** Full OpenAI API compatibility
**Providers:** All inference providers
**Features:**
- Message-based conversations
- System prompts and user messages
- Function calling support
- Streaming responses
- Temperature and other parameter controls
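As a rough sketch of how streaming works through the compatibility layer, the standard OpenAI Python client can be pointed at a Llama Stack server; the API key, base URL, and model name below are placeholders for your own deployment:
```python
import openai

# Placeholder credentials, server address, and model ID; adjust for your deployment
client = openai.OpenAI(api_key="your-llama-stack-key", base_url="http://localhost:8000/v1")

stream = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[{"role": "user", "content": "Write a haiku about observability."}],
    stream=True,  # receive incremental deltas instead of one final message
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```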
### Completions API
OpenAI-compatible text completions for general text generation.
**Endpoint:** `/v1/completions`
**Compatibility:** Full OpenAI API compatibility
**Providers:** All inference providers
**Features:**
- Text completion generation
- Prompt engineering support
- Customizable parameters
- Batch processing capabilities
### Embeddings API
OpenAI-compatible embeddings for vector operations.
**Endpoint:** `/v1/embeddings`
**Compatibility:** Full OpenAI API compatibility
**Providers:** All embedding providers
**Features:**
- Text embedding generation
- Multiple embedding models
- Batch embedding processing
- Vector similarity operations
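A minimal sketch of generating embeddings through the same compatibility layer; the model name is a placeholder and should match an embedding model registered with your providers:
```python
import openai

client = openai.OpenAI(api_key="your-llama-stack-key", base_url="http://localhost:8000/v1")

# "all-MiniLM-L6-v2" is a placeholder; use an embedding model your server exposes
response = client.embeddings.create(
    model="all-MiniLM-L6-v2",
    input=["Llama Stack speaks the OpenAI embeddings format.", "Vectors power semantic search."],
)
for item in response.data:
    print(len(item.embedding))  # dimensionality of each returned vector
```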
### Files API
OpenAI-compatible file management for document processing.
**Endpoint:** `/v1/files`
**Compatibility:** Full OpenAI API compatibility
**Providers:** Local Filesystem, S3
**Features:**
- File upload and management
- Document processing
- File metadata tracking
- Secure file access
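A short sketch of uploading a document through the OpenAI client against this endpoint; the file name and credentials are illustrative:
```python
import openai

client = openai.OpenAI(api_key="your-llama-stack-key", base_url="http://localhost:8000/v1")

# Upload a local document; "assistants" is the usual purpose for files destined for vector stores
with open("document.pdf", "rb") as f:
    uploaded = client.files.create(file=f, purpose="assistants")

print(uploaded.id, uploaded.filename)
```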
### Vector Store Files API
OpenAI-compatible vector store file operations for RAG applications.
**Endpoint:** `/v1/vector_stores/{vector_store_id}/files`
**Compatibility:** Full OpenAI API compatibility
**Providers:** FAISS, SQLite-vec, Milvus, ChromaDB, Qdrant, Weaviate, Postgres (PGVector)
**Features:**
- Automatic document processing
- Vector store integration
- File chunking and indexing
- Search and retrieval operations
### Batches API
OpenAI-compatible batch processing for large-scale operations.
**Endpoint:** `/v1/batches`
**Compatibility:** OpenAI API compatibility (experimental)
**Providers:** Limited support
**Features:**
- Batch job creation and management
- Progress tracking
- Result retrieval
- Error handling
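Since this API is experimental, provider support may vary, but a batch submission would roughly follow the standard OpenAI pattern; the JSONL file name below is a placeholder:
```python
import openai

client = openai.OpenAI(api_key="your-llama-stack-key", base_url="http://localhost:8000/v1")

# The batch input is a JSONL file of request objects; "batch_input.jsonl" is a placeholder
batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")

batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)  # poll client.batches.retrieve(batch.id) until it completes
```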
## Migration from OpenAI
### Step 1: Update API Endpoint
Change your API endpoint from OpenAI to your Llama Stack server:
```python
# Before (OpenAI)
import openai

client = openai.OpenAI(api_key="your-openai-key")

# After (Llama Stack)
import openai

client = openai.OpenAI(
    api_key="your-llama-stack-key",
    base_url="http://localhost:8000/v1",  # Your Llama Stack server
)
```
### Step 2: Configure Providers
Set up your preferred providers in the Llama Stack configuration:
```yaml
# stack-config.yaml
inference:
  providers:
    - name: "meta-reference"
      type: "inline"
      model: "llama-3.1-8b"
```
### Step 3: Test Compatibility
Verify that your existing code works with Llama Stack:
```python
# Test chat completions
response = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[
        {"role": "user", "content": "Hello, world!"}
    ],
)
print(response.choices[0].message.content)
```
## Provider-Specific Features
### Meta Reference Provider
- Full OpenAI API compatibility
- Local model execution
- Custom model support
### Remote Providers
- OpenAI API compatibility
- Cloud-based execution
- Scalable infrastructure
### Vector Store Providers
- OpenAI vector store API compatibility
- Automatic document processing
- Advanced search capabilities
## Authentication
Llama Stack supports OpenAI-style authentication:
### API Key Authentication
```python
client = openai.OpenAI(
    api_key="your-api-key",
    base_url="http://localhost:8000/v1",
)
```
### Environment Variables
```bash
export OPENAI_API_KEY="your-api-key"
export OPENAI_BASE_URL="http://localhost:8000/v1"
```
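With recent versions of the OpenAI Python SDK, these variables are picked up automatically, so the client can be constructed without arguments (a sketch, assuming the variables above are exported):
```python
import openai

# Reads OPENAI_API_KEY and OPENAI_BASE_URL from the environment
client = openai.OpenAI()
```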
## Error Handling
Llama Stack provides OpenAI-compatible error responses:
```python
try:
    response = client.chat.completions.create(...)
# Catch the specific errors before the APIError base class so they are not shadowed
except openai.RateLimitError as e:
    print(f"Rate Limit Error: {e}")
except openai.APIConnectionError as e:
    print(f"Connection Error: {e}")
except openai.APIError as e:
    print(f"API Error: {e}")
```
## Rate Limiting
OpenAI-compatible rate limiting is supported:
- **Requests per minute** limits
- **Tokens per minute** limits
- **Concurrent request** limits
- **Usage tracking** and monitoring
## Monitoring and Observability
Track your API usage with OpenAI-compatible monitoring:
- **Request/response logging**
- **Usage metrics** and analytics
- **Performance monitoring**
- **Error tracking** and alerting
## Best Practices
### 1. Provider Selection
Choose providers based on your requirements:
- **Local development**: Meta Reference, Ollama
- **Production**: Cloud providers (Fireworks, Together, NVIDIA)
- **Specialized use cases**: Custom providers
### 2. Model Configuration
Configure models for optimal performance:
- **Model selection** based on task requirements
- **Parameter tuning** for specific use cases
- **Resource allocation** for performance
### 3. Error Handling
Implement robust error handling:
- **Retry logic** for transient failures (see the sketch below)
- **Fallback providers** for high availability
- **Monitoring** and alerting for issues
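As an illustration of retry logic, a minimal wrapper with exponential backoff might look like the sketch below; the model name and backoff values are placeholders, and the SDK's built-in `max_retries` client option can serve the same purpose:
```python
import time

import openai

client = openai.OpenAI(api_key="your-llama-stack-key", base_url="http://localhost:8000/v1")


def chat_with_retry(messages, model="llama-3.1-8b", max_attempts=3):
    """Retry transient failures with simple exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except (openai.APIConnectionError, openai.RateLimitError) as err:
            if attempt == max_attempts:
                raise
            print(f"Transient error ({err}); retrying...")
            time.sleep(2**attempt)  # back off before the next attempt


print(chat_with_retry([{"role": "user", "content": "Hello!"}]).choices[0].message.content)
```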
### 4. Security
Follow security best practices:
- **API key management** and rotation
- **Access control** and authorization
- **Data privacy** and compliance
## Implementation Examples
For detailed code examples and implementation guides, see our [OpenAI Implementation Guide](../providers/openai.mdx).
## Known Limitations
### Responses API Limitations
The Responses API is still in active development. For detailed information about current limitations and implementation status, see our [OpenAI Responses API Limitations](../providers/openai_responses_limitations.mdx).
## Troubleshooting
### Common Issues
**Connection Errors**
- Verify server is running
- Check network connectivity
- Validate API endpoint URL
**Authentication Errors**
- Verify API key is correct
- Check key permissions
- Ensure proper authentication headers
**Model Errors**
- Verify model is available
- Check provider configuration
- Validate model parameters
### Getting Help
For OpenAI compatibility issues:
1. **Check Documentation**: Review provider-specific documentation
2. **Community Support**: Ask questions in GitHub discussions
3. **Issue Reporting**: Open GitHub issues for bugs
4. **Professional Support**: Contact support for enterprise issues
## Roadmap
Upcoming OpenAI compatibility features:
- **Enhanced batch processing** support
- **Advanced function calling** capabilities
- **Improved error handling** and diagnostics
- **Performance optimizations** for large-scale deployments
For the latest updates, follow our [GitHub releases](https://github.com/llamastack/llama-stack/releases) and [roadmap discussions](https://github.com/llamastack/llama-stack/discussions).

144
docs/docs/api/index.mdx Normal file

@ -0,0 +1,144 @@
---
title: API Reference
description: Complete reference for Llama Stack APIs
sidebar_label: Overview
sidebar_position: 1
---
# API Reference
Llama Stack provides a comprehensive set of APIs for building generative AI applications. All APIs follow OpenAI-compatible standards and can be used interchangeably across different providers.
## Core APIs
### Inference API
Run inference with Large Language Models (LLMs) and embedding models.
**Supported Providers:**
- Meta Reference (Single Node)
- Ollama (Single Node)
- Fireworks (Hosted)
- Together (Hosted)
- NVIDIA NIM (Hosted and Single Node)
- vLLM (Hosted and Single Node)
- TGI (Hosted and Single Node)
- AWS Bedrock (Hosted)
- Cerebras (Hosted)
- Groq (Hosted)
- SambaNova (Hosted)
- PyTorch ExecuTorch (On-device iOS, Android)
- OpenAI (Hosted)
- Anthropic (Hosted)
- Gemini (Hosted)
- WatsonX (Hosted)
### Agents API
Run multi-step agentic workflows with LLMs, including tool usage, memory (RAG), and complex reasoning.
**Supported Providers:**
- Meta Reference (Single Node)
- Fireworks (Hosted)
- Together (Hosted)
- PyTorch ExecuTorch (On-device iOS)
### Vector IO API
Perform operations on vector stores, including adding documents, searching, and deleting documents.
**Supported Providers:**
- FAISS (Single Node)
- SQLite-Vec (Single Node)
- Chroma (Hosted and Single Node)
- Milvus (Hosted and Single Node)
- Postgres (PGVector) (Hosted and Single Node)
- Weaviate (Hosted)
- Qdrant (Hosted and Single Node)
### Files API (OpenAI-compatible)
Manage file uploads, storage, and retrieval with OpenAI-compatible endpoints.
**Supported Providers:**
- Local Filesystem (Single Node)
- S3 (Hosted)
### Vector Store Files API (OpenAI-compatible)
Integrate file operations with vector stores for automatic document processing and search.
**Supported Providers:**
- FAISS (Single Node)
- SQLite-vec (Single Node)
- Milvus (Single Node)
- ChromaDB (Hosted and Single Node)
- Qdrant (Hosted and Single Node)
- Weaviate (Hosted)
- Postgres (PGVector) (Hosted and Single Node)
### Safety API
Apply safety policies to outputs at a systems level, not just model level.
**Supported Providers:**
- Llama Guard (Depends on Inference Provider)
- Prompt Guard (Single Node)
- Code Scanner (Single Node)
- AWS Bedrock (Hosted)
### Post Training API
Fine-tune models for specific use cases and domains.
**Supported Providers:**
- Meta Reference (Single Node)
- HuggingFace (Single Node)
- TorchTune (Single Node)
- NVIDIA NEMO (Hosted)
### Eval API
Generate outputs and perform scoring to evaluate system performance.
**Supported Providers:**
- Meta Reference (Single Node)
- NVIDIA NEMO (Hosted)
### Telemetry API
Collect telemetry data from the system for monitoring and observability.
**Supported Providers:**
- Meta Reference (Single Node)
### Tool Runtime API
Interact with various tools and protocols to extend LLM capabilities.
**Supported Providers:**
- Brave Search (Hosted)
- RAG Runtime (Single Node)
## API Compatibility
All Llama Stack APIs are designed to be OpenAI-compatible, allowing you to:
- Use existing OpenAI API clients and tools
- Migrate from OpenAI to other providers seamlessly
- Maintain consistent API contracts across different environments
## Getting Started
To get started with Llama Stack APIs:
1. **Choose a Distribution**: Select a pre-configured distribution that matches your environment
2. **Configure Providers**: Set up the providers you want to use for each API
3. **Start the Server**: Launch the Llama Stack server with your configuration
4. **Use the APIs**: Make requests to the API endpoints using your preferred client, as sketched below
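Because the endpoints follow OpenAI conventions, a first request can use the standard OpenAI Python client; the base URL, API key, and model ID below are placeholders for your own deployment:
```python
import openai

# Placeholder values; point these at your running Llama Stack server and a registered model
client = openai.OpenAI(api_key="your-api-key", base_url="http://localhost:8321/v1")

response = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[{"role": "user", "content": "What APIs does Llama Stack expose?"}],
)
print(response.choices[0].message.content)
```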
For detailed setup instructions, see our [Getting Started Guide](../getting_started/quickstart).
## Provider Details
For complete provider compatibility and setup instructions, see our [Providers Documentation](../providers/).
## API Stability
Llama Stack APIs are organized by stability level:
- **[Stable APIs](./index.mdx)** - Production-ready APIs with full support
- **[Experimental APIs](../api-experimental/)** - APIs in development with limited support
- **[Deprecated APIs](../api-deprecated/)** - Legacy APIs being phased out
## OpenAI Integration
For specific OpenAI API compatibility features, see our [OpenAI Compatibility Guide](../api-openai/).


@ -35,9 +35,6 @@ Here are the key topics that will help you build effective AI applications:
- **[Telemetry](./telemetry.mdx)** - Monitor and analyze your agents' performance and behavior
- **[Safety](./safety.mdx)** - Implement guardrails and safety measures to ensure responsible AI behavior
### 🎮 **Interactive Development**
- **[Playground](./playground.mdx)** - Interactive environment for testing and developing applications
## Application Patterns
### 🤖 **Conversational Agents**


@ -1,298 +1,87 @@
---
title: Llama Stack Playground
description: Interactive interface to explore and experiment with Llama Stack capabilities
title: Admin UI & Chat Playground
description: Web-based admin interface and chat playground for Llama Stack
sidebar_label: Playground
sidebar_position: 10
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Admin UI & Chat Playground
# Llama Stack Playground
The Llama Stack UI provides a comprehensive web-based admin interface for managing your Llama Stack server, with an integrated chat playground for interactive testing. This admin interface is the primary way to monitor, manage, and debug your Llama Stack applications.
:::note[Experimental Feature]
The Llama Stack Playground is currently experimental and subject to change. We welcome feedback and contributions to help improve it.
:::
## Quick Start
The Llama Stack Playground is a simple interface that aims to:
- **Showcase capabilities and concepts** of Llama Stack in an interactive environment
- **Demo end-to-end application code** to help users get started building their own applications
- **Provide a UI** to help users inspect and understand Llama Stack API providers and resources
## Key Features
### Interactive Playground Pages
The playground provides interactive pages for users to explore Llama Stack API capabilities:
#### Chatbot Interface
<video
controls
autoPlay
playsInline
muted
loop
style={{width: '100%'}}
>
<source src="https://github.com/user-attachments/assets/8d2ef802-5812-4a28-96e1-316038c84cbf" type="video/mp4" />
Your browser does not support the video tag.
</video>
<Tabs>
<TabItem value="chat" label="Chat">
**Simple Chat Interface**
- Chat directly with Llama models through an intuitive interface
- Uses the `/chat/completions` streaming API under the hood
- Real-time message streaming for responsive interactions
- Perfect for testing model capabilities and prompt engineering
</TabItem>
<TabItem value="rag" label="RAG Chat">
**Document-Aware Conversations**
- Upload documents to create memory banks
- Chat with a RAG-enabled agent that can query your documents
- Uses Llama Stack's `/agents` API to create and manage RAG sessions
- Ideal for exploring knowledge-enhanced AI applications
</TabItem>
</Tabs>
#### Evaluation Interface
<video
controls
autoPlay
playsInline
muted
loop
style={{width: '100%'}}
>
<source src="https://github.com/user-attachments/assets/6cc1659f-eba4-49ca-a0a5-7c243557b4f5" type="video/mp4" />
Your browser does not support the video tag.
</video>
<Tabs>
<TabItem value="scoring" label="Scoring Evaluations">
**Custom Dataset Evaluation**
- Upload your own evaluation datasets
- Run evaluations using available scoring functions
- Uses Llama Stack's `/scoring` API for flexible evaluation workflows
- Great for testing application performance on custom metrics
</TabItem>
<TabItem value="benchmarks" label="Benchmark Evaluations">
<video
controls
autoPlay
playsInline
muted
loop
style={{width: '100%', marginBottom: '1rem'}}
>
<source src="https://github.com/user-attachments/assets/345845c7-2a2b-4095-960a-9ae40f6a93cf" type="video/mp4" />
Your browser does not support the video tag.
</video>
**Pre-registered Evaluation Tasks**
- Evaluate models or agents on pre-defined tasks
- Uses Llama Stack's `/eval` API for comprehensive evaluation
- Combines datasets and scoring functions for standardized testing
**Setup Requirements:**
Register evaluation datasets and benchmarks first:
Launch the admin UI with:
```bash
# Register evaluation dataset
llama-stack-client datasets register \
--dataset-id "mmlu" \
--provider-id "huggingface" \
--url "https://huggingface.co/datasets/llamastack/evals" \
--metadata '{"path": "llamastack/evals", "name": "evals__mmlu__details", "split": "train"}' \
--schema '{"input_query": {"type": "string"}, "expected_answer": {"type": "string"}, "chat_completion_input": {"type": "string"}}'
# Register benchmark task
llama-stack-client benchmarks register \
--eval-task-id meta-reference-mmlu \
--provider-id meta-reference \
--dataset-id mmlu \
--scoring-functions basic::regex_parser_multiple_choice_answer
npx llama-stack-ui
```
</TabItem>
</Tabs>
Then visit `http://localhost:8322` to access the interface.
#### Inspection Interface
## Admin Interface Features
<video
controls
autoPlay
playsInline
muted
loop
style={{width: '100%'}}
>
<source src="https://github.com/user-attachments/assets/01d52b2d-92af-4e3a-b623-a9b8ba22ba99" type="video/mp4" />
Your browser does not support the video tag.
</video>
The Llama Stack UI is organized into three main sections:
<Tabs>
<TabItem value="providers" label="API Providers">
### 🎯 Create
**Chat Playground** - Interactive testing environment
- Real-time chat interface for testing agents and models
- Multi-turn conversations with tool calling support
- Agent SDK integration (will be migrated to Responses API)
- Custom system prompts and model parameter adjustment
**Provider Management**
- Inspect available Llama Stack API providers
- View provider configurations and capabilities
- Uses the `/providers` API for real-time provider information
- Essential for understanding your deployment's capabilities
### 📊 Manage
**Logs & Resource Management** - Monitor and manage your stack
- **Responses Logs**: View and analyze agent responses and interactions
- **Chat Completions Logs**: Monitor chat completion requests and responses
- **Vector Stores**: Create, manage, and monitor vector databases for RAG workflows
- **Prompts**: Full CRUD operations for prompt templates and management
- **Files**: Forthcoming file management capabilities
</TabItem>
<TabItem value="resources" label="API Resources">
## Key Capabilities for Application Development
**Resource Exploration**
- Inspect Llama Stack API resources including:
- **Models**: Available language models
- **Datasets**: Registered evaluation datasets
- **Memory Banks**: Vector databases and knowledge stores
- **Benchmarks**: Evaluation tasks and scoring functions
- **Shields**: Safety and content moderation tools
- Uses `/<resources>/list` APIs for comprehensive resource visibility
- For detailed information about resources, see [Core Concepts](/docs/concepts)
### Real-time Monitoring
- **Response Tracking**: Monitor all agent responses and tool calls
- **Completion Analysis**: View chat completion performance and patterns
- **Vector Store Activity**: Track RAG operations and document processing
- **Prompt Usage**: Analyze prompt template performance
</TabItem>
</Tabs>
### Resource Management
- **Vector Store CRUD**: Create, update, and delete vector databases
- **Prompt Library**: Organize and version control your prompts
- **File Operations**: Manage documents and assets (forthcoming)
### Interactive Testing
- **Chat Playground**: Test conversational flows before production deployment
- **Agent Prototyping**: Validate agent behaviors and tool integrations
## Development Workflow Integration
The admin UI supports your development lifecycle:
1. **Development**: Use chat playground to prototype and test features
2. **Monitoring**: Track system performance through logs and metrics
3. **Management**: Organize prompts, vector stores, and other resources
4. **Debugging**: Analyze logs to identify and resolve issues
## Architecture Notes
- **Current**: Chat playground uses Agents SDK
- **Future**: Migration to Responses API for improved performance and consistency
- **Admin Focus**: Primary emphasis on monitoring, logging, and resource management
## Getting Started
### Quick Start Guide
1. **Launch the UI**: Run `npx llama-stack-ui`
2. **Explore Logs**: Start with Responses and Chat Completions logs to understand your system activity
3. **Test in Playground**: Use the chat interface to validate your agent configurations
4. **Manage Resources**: Create vector stores and organize prompts through the UI
<Tabs>
<TabItem value="setup" label="Setup">
For detailed setup and configuration, see the [Llama Stack UI documentation](/docs/distributions/llama_stack_ui).
**1. Start the Llama Stack API Server**
## Next Steps
```bash
llama stack list-deps together | xargs -L1 uv pip install
llama stack run together
```
**2. Start the Streamlit UI**
```bash
# Launch the playground interface
uv run --with ".[ui]" streamlit run llama_stack.core/ui/app.py
```
</TabItem>
<TabItem value="usage" label="Usage Tips">
**Making the Most of the Playground:**
- **Start with Chat**: Test basic model interactions and prompt engineering
- **Explore RAG**: Upload sample documents to see knowledge-enhanced responses
- **Try Evaluations**: Use the scoring interface to understand evaluation metrics
- **Inspect Resources**: Check what providers and resources are available
- **Experiment with Settings**: Adjust parameters to see how they affect results
</TabItem>
</Tabs>
### Available Distributions
The playground works with any Llama Stack distribution. Popular options include:
<Tabs>
<TabItem value="together" label="Together AI">
```bash
llama stack list-deps together | xargs -L1 uv pip install
llama stack run together
```
**Features:**
- Cloud-hosted models
- Fast inference
- Multiple model options
</TabItem>
<TabItem value="ollama" label="Ollama (Local)">
```bash
llama stack list-deps ollama | xargs -L1 uv pip install
llama stack run ollama
```
**Features:**
- Local model execution
- Privacy-focused
- No internet required
</TabItem>
<TabItem value="meta-reference" label="Meta Reference">
```bash
llama stack list-deps meta-reference | xargs -L1 uv pip install
llama stack run meta-reference
```
**Features:**
- Reference implementation
- All API features available
- Best for development
</TabItem>
</Tabs>
## Use Cases & Examples
### Educational Use Cases
- **Learning Llama Stack**: Hands-on exploration of API capabilities
- **Prompt Engineering**: Interactive testing of different prompting strategies
- **RAG Experimentation**: Understanding how document retrieval affects responses
- **Evaluation Understanding**: See how different metrics evaluate model performance
### Development Use Cases
- **Prototype Testing**: Quick validation of application concepts
- **API Exploration**: Understanding available endpoints and parameters
- **Integration Planning**: Seeing how different components work together
- **Demo Creation**: Showcasing Llama Stack capabilities to stakeholders
### Research Use Cases
- **Model Comparison**: Side-by-side testing of different models
- **Evaluation Design**: Understanding how scoring functions work
- **Safety Testing**: Exploring shield effectiveness with different inputs
- **Performance Analysis**: Measuring model behavior across different scenarios
## Best Practices
### 🚀 **Getting Started**
- Begin with simple chat interactions to understand basic functionality
- Gradually explore more advanced features like RAG and evaluations
- Use the inspection tools to understand your deployment's capabilities
### 🔧 **Development Workflow**
- Use the playground to prototype before writing application code
- Test different parameter settings interactively
- Validate evaluation approaches before implementing them programmatically
### 📊 **Evaluation & Testing**
- Start with simple scoring functions before trying complex evaluations
- Use the playground to understand evaluation results before automation
- Test safety features with various input types
### 🎯 **Production Preparation**
- Use playground insights to inform your production API usage
- Test edge cases and error conditions interactively
- Validate resource configurations before deployment
## Related Resources
- **[Getting Started Guide](../getting_started/quickstart)** - Complete setup and introduction
- **[Core Concepts](/docs/concepts)** - Understanding Llama Stack fundamentals
- **[Agents](./agent)** - Building intelligent agents
- **[RAG (Retrieval Augmented Generation)](./rag)** - Knowledge-enhanced applications
- **[Evaluations](./evals)** - Comprehensive evaluation framework
- **[API Reference](/docs/api/llama-stack-specification)** - Complete API documentation
- Set up your [first agent](/docs/building_applications/agent)
- Implement [RAG functionality](/docs/building_applications/rag)
- Add [evaluation metrics](/docs/building_applications/evals)
- Configure [safety measures](/docs/building_applications/safety)


@ -391,5 +391,4 @@ client.shields.register(
- **[Agents](./agent)** - Integrating safety shields with intelligent agents
- **[Agent Execution Loop](./agent_execution_loop)** - Understanding safety in the execution flow
- **[Evaluations](./evals)** - Evaluating safety shield effectiveness
- **[Telemetry](./telemetry)** - Monitoring safety violations and metrics
- **[Llama Guard Documentation](https://github.com/meta-llama/PurpleLlama/tree/main/Llama-Guard3)** - Advanced safety model details


@ -10,203 +10,34 @@ import TabItem from '@theme/TabItem';
# Telemetry
The Llama Stack uses OpenTelemetry to provide comprehensive tracing, metrics, and logging capabilities.
The preferred way to instrument Llama Stack is with OpenTelemetry. Llama Stack enriches the data
collected by OpenTelemetry to capture helpful information about the performance and behavior of your
application. Here is an example of how to forward your telemetry to an OTLP collector from Llama Stack:
```sh
export OTEL_EXPORTER_OTLP_ENDPOINT="http://127.0.0.1:4318"
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export OTEL_SERVICE_NAME="llama-stack-server"
## Automatic Metrics Generation
uv pip install opentelemetry-distro opentelemetry-exporter-otlp
uv run opentelemetry-bootstrap -a requirements | uv pip install --requirement -
Llama Stack automatically generates metrics during inference operations. These metrics are aggregated at the **inference request level** and provide insights into token usage and model performance.
### Available Metrics
The following metrics are automatically generated for each inference request:
| Metric Name | Type | Unit | Description | Labels |
|-------------|------|------|-------------|--------|
| `llama_stack_prompt_tokens_total` | Counter | `tokens` | Number of tokens in the input prompt | `model_id`, `provider_id` |
| `llama_stack_completion_tokens_total` | Counter | `tokens` | Number of tokens in the generated response | `model_id`, `provider_id` |
| `llama_stack_tokens_total` | Counter | `tokens` | Total tokens used (prompt + completion) | `model_id`, `provider_id` |
### Metric Generation Flow
1. **Token Counting**: During inference operations (chat completion, completion, etc.), the system counts tokens in both input prompts and generated responses
2. **Metric Construction**: For each request, `MetricEvent` objects are created with the token counts
3. **Telemetry Logging**: Metrics are sent to the configured telemetry sinks
4. **OpenTelemetry Export**: When OpenTelemetry is enabled, metrics are exposed as standard OpenTelemetry counters
### Metric Aggregation Level
All metrics are generated and aggregated at the **inference request level**. This means:
- Each individual inference request generates its own set of metrics
- Metrics are not pre-aggregated across multiple requests
- Aggregation (sums, averages, etc.) can be performed by your observability tools (Prometheus, Grafana, etc.)
- Each metric includes labels for `model_id` and `provider_id` to enable filtering and grouping
### Example Metric Event
```python
MetricEvent(
    trace_id="1234567890abcdef",
    span_id="abcdef1234567890",
    metric="total_tokens",
    value=150,
    timestamp=1703123456.789,
    unit="tokens",
    attributes={
        "model_id": "meta-llama/Llama-3.2-3B-Instruct",
        "provider_id": "tgi",
    },
)
uv run opentelemetry-instrument llama stack run run.yaml
```
## Telemetry Sinks
Choose from multiple sink types based on your observability needs:
### Known issues
<Tabs>
<TabItem value="opentelemetry" label="OpenTelemetry">
Some database instrumentation libraries have a known bug where spans get wrapped twice, or do not get connected to a trace.
To prevent this, you can disable database specific tracing, and rely just on the SQLAlchemy tracing. If you are using
`sqlite3` as your database, for example, you can disable the additional tracing like this:
Send events to an OpenTelemetry Collector for integration with observability platforms:
**Use Cases:**
- Visualizing traces in tools like Jaeger
- Collecting metrics for Prometheus
- Integration with enterprise observability stacks
**Features:**
- Standard OpenTelemetry format
- Compatible with all OpenTelemetry collectors
- Supports both traces and metrics
</TabItem>
<TabItem value="console" label="Console">
Print events to the console for immediate debugging:
**Use Cases:**
- Development and testing
- Quick debugging sessions
- Simple logging without external tools
**Features:**
- Immediate output visibility
- No setup required
- Human-readable format
</TabItem>
</Tabs>
## Configuration
### Meta-Reference Provider
Currently, only the meta-reference provider is implemented. It can be configured to send events to multiple sink types:
```yaml
telemetry:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      service_name: "llama-stack-service"
      sinks: ['console', 'otel_trace', 'otel_metric']
      otel_exporter_otlp_endpoint: "http://localhost:4318"
```
```sh
export OTEL_PYTHON_DISABLED_INSTRUMENTATIONS="sqlite3"
```
### Environment Variables
Configure telemetry behavior using environment variables:
- **`OTEL_EXPORTER_OTLP_ENDPOINT`**: OpenTelemetry Collector endpoint (default: `http://localhost:4318`)
- **`OTEL_SERVICE_NAME`**: Service name for telemetry (default: empty string)
- **`TELEMETRY_SINKS`**: Comma-separated list of sinks (default: `[]`)
### Quick Setup: Complete Telemetry Stack
Use the automated setup script to launch the complete telemetry stack (Jaeger, OpenTelemetry Collector, Prometheus, and Grafana):
```bash
./scripts/telemetry/setup_telemetry.sh
```
This sets up:
- **Jaeger UI**: http://localhost:16686 (traces visualization)
- **Prometheus**: http://localhost:9090 (metrics)
- **Grafana**: http://localhost:3000 (dashboards with auto-configured data sources)
- **OTEL Collector**: http://localhost:4318 (OTLP endpoint)
Once running, you can visualize traces by navigating to [Grafana](http://localhost:3000/) and logging in with username `admin` and password `admin`.
## Querying Metrics
When using the OpenTelemetry sink, metrics are exposed in standard format and can be queried through various tools:
<Tabs>
<TabItem value="prometheus" label="Prometheus Queries">
Example Prometheus queries for analyzing token usage:
```promql
# Total tokens used across all models
sum(llama_stack_tokens_total)
# Tokens per model
sum by (model_id) (llama_stack_tokens_total)
# Token usage rate (tokens/sec) averaged over the last 5 minutes
rate(llama_stack_tokens_total[5m])
# Token usage by provider
sum by (provider_id) (llama_stack_tokens_total)
```
</TabItem>
<TabItem value="grafana" label="Grafana Dashboards">
Create dashboards using Prometheus as a data source:
- **Token Usage Over Time**: Line charts showing token consumption trends
- **Model Performance**: Comparison of different models by token efficiency
- **Provider Analysis**: Breakdown of usage across different providers
- **Request Patterns**: Understanding peak usage times and patterns
</TabItem>
<TabItem value="otlp" label="OpenTelemetry Collector">
Forward metrics to other observability systems:
- Export to multiple backends simultaneously
- Apply transformations and filtering
- Integrate with existing monitoring infrastructure
</TabItem>
</Tabs>
## Best Practices
### 🔍 **Monitoring Strategy**
- Use OpenTelemetry for production environments
- Set up alerts on key metrics like token usage and error rates
### 📊 **Metrics Analysis**
- Track token usage trends to optimize costs
- Monitor response times across different models
- Analyze usage patterns to improve resource allocation
### 🚨 **Alerting & Debugging**
- Set up alerts for unusual token consumption spikes
- Use trace data to debug performance issues
- Monitor error rates and failure patterns
### 🔧 **Configuration Management**
- Use environment variables for flexible deployment
- Ensure proper network access to OpenTelemetry collectors
## Related Resources
- **[Agents](./agent)** - Monitoring agent execution with telemetry
- **[Evaluations](./evals)** - Using telemetry data for performance evaluation
- **[Getting Started Notebook](https://github.com/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb)** - Telemetry examples and queries
- **[OpenTelemetry Documentation](https://opentelemetry.io/)** - Comprehensive observability framework
- **[Jaeger Documentation](https://www.jaegertracing.io/)** - Distributed tracing visualization


@ -104,23 +104,19 @@ client.toolgroups.register(
)
```
Note that most of the more useful MCP servers need you to authenticate with them. Many of them use OAuth2.0 for authentication. You can provide authorization headers to send to the MCP server using the "Provider Data" abstraction provided by Llama Stack. When making an agent call,
Note that most of the more useful MCP servers need you to authenticate with them. Many of them use OAuth2.0 for authentication. You can provide the authorization token when creating the Agent:
```python
agent = Agent(
...,
tools=["mcp::deepwiki"],
extra_headers={
"X-LlamaStack-Provider-Data": json.dumps(
{
"mcp_headers": {
"http://mcp.deepwiki.com/sse": {
"Authorization": "Bearer <your_access_token>",
},
},
}
),
},
tools=[
{
"type": "mcp",
"server_url": "https://mcp.deepwiki.com/sse",
"server_label": "mcp::deepwiki",
"authorization": "<your_access_token>", # OAuth token (without "Bearer " prefix)
}
],
)
agent.create_turn(...)
```


@ -58,7 +58,7 @@ External APIs must expose a `available_providers()` function in their module tha
```python
# llama_stack_api_weather/api.py
from llama_stack.providers.datatypes import Api, InlineProviderSpec, ProviderSpec
from llama_stack_api import Api, InlineProviderSpec, ProviderSpec
def available_providers() -> list[ProviderSpec]:
@ -79,7 +79,7 @@ A Protocol class like so:
# llama_stack_api_weather/api.py
from typing import Protocol
from llama_stack.schema_utils import webmethod
from llama_stack_api import webmethod
class WeatherAPI(Protocol):
@ -151,13 +151,12 @@ __all__ = ["WeatherAPI", "available_providers"]
# llama-stack-api-weather/src/llama_stack_api_weather/weather.py
from typing import Protocol
from llama_stack.providers.datatypes import (
from llama_stack_api import (
Api,
ProviderSpec,
RemoteProviderSpec,
webmethod,
)
from llama_stack.schema_utils import webmethod
def available_providers() -> list[ProviderSpec]:
return [


@ -7,7 +7,7 @@ sidebar_position: 1
# APIs
A Llama Stack API is described as a collection of REST endpoints. We currently support the following APIs:
A Llama Stack API is described as a collection of REST endpoints following OpenAI API standards. We currently support the following APIs:
- **Inference**: run inference with a LLM
- **Safety**: apply safety policies to the output at a Systems (not only model) level
@ -16,13 +16,25 @@ A Llama Stack API is described as a collection of REST endpoints. We currently s
- **Scoring**: evaluate outputs of the system
- **Eval**: generate outputs (via Inference or Agents) and perform scoring
- **VectorIO**: perform operations on vector stores, such as adding documents, searching, and deleting documents
- **Telemetry**: collect telemetry data from the system
- **Files**: manage file uploads, storage, and retrieval
- **Post Training**: fine-tune a model
- **Tool Runtime**: interact with various tools and protocols
- **Responses**: generate responses from an LLM using this OpenAI compatible API.
- **Responses**: generate responses from an LLM
We are working on adding a few more APIs to complete the application lifecycle. These will include:
- **Batch Inference**: run inference on a dataset of inputs
- **Batch Agents**: run agents on a dataset of inputs
- **Synthetic Data Generation**: generate synthetic data for model development
- **Batches**: OpenAI-compatible batch management for inference
## OpenAI API Compatibility
We are working on adding OpenAI API compatibility to Llama Stack. This will allow you to use Llama Stack with OpenAI API clients and tools.
### File Operations and Vector Store Integration
The Files API and Vector Store APIs work together through file operations, enabling automatic document processing and search. This integration implements the [OpenAI Vector Store Files API specification](https://platform.openai.com/docs/api-reference/vector-stores-files) and allows you to:
- Upload documents through the Files API
- Automatically process and chunk documents into searchable vectors
- Store processed content in vector databases based on the availability of [our providers](../../providers/index.mdx)
- Search through documents using natural language queries
For detailed information about this integration, see [File Operations and Vector Store Integration](../file_operations_vector_stores.md).


@ -0,0 +1,420 @@
# File Operations and Vector Store Integration
## Overview
Llama Stack provides seamless integration between the Files API and Vector Store APIs, enabling you to upload documents and automatically process them into searchable vector embeddings. This integration implements file operations following the [OpenAI Vector Store Files API specification](https://platform.openai.com/docs/api-reference/vector-stores-files).
## Enhanced Capabilities Beyond OpenAI
While Llama Stack maintains full compatibility with OpenAI's Vector Store API, it provides several additional capabilities that enhance functionality and flexibility:
### **Embedding Model Specification**
Unlike OpenAI's vector stores which use a fixed embedding model, Llama Stack allows you to specify which embedding model to use when creating a vector store:
```python
# Create vector store with specific embedding model
vector_store = client.vector_stores.create(
    name="my_documents",
    embedding_model="all-MiniLM-L6-v2",  # Specify your preferred model
    embedding_dimension=384,
)
```
### **Advanced Search Modes**
Llama Stack supports multiple search modes beyond basic vector similarity:
- **Vector Search**: Pure semantic similarity search using embeddings
- **Keyword Search**: Traditional keyword-based search for exact matches
- **Hybrid Search**: Combines both vector and keyword search for optimal results
```python
# Different search modes
results = await client.vector_stores.search(
    vector_store_id=vector_store.id,
    query="machine learning algorithms",
    search_mode="hybrid",  # or "vector", "keyword"
    max_num_results=5,
)
```
### **Flexible Ranking Options**
For hybrid search, Llama Stack offers configurable ranking strategies:
- **RRF (Reciprocal Rank Fusion)**: Combines rankings with configurable impact factor
- **Weighted Ranker**: Linear combination of vector and keyword scores with adjustable weights
```python
# Custom ranking configuration
results = await client.vector_stores.search(
    vector_store_id=vector_store.id,
    query="neural networks",
    search_mode="hybrid",
    ranking_options={
        "ranker": {"type": "weighted", "alpha": 0.7}  # 70% vector, 30% keyword
    },
)
```
### **Provider Selection**
Choose from multiple vector store providers based on your specific needs:
- **Inline Providers**: FAISS (fast in-memory), SQLite-vec (disk-based), Milvus (high-performance)
- **Remote Providers**: ChromaDB, Qdrant, Weaviate, Postgres (PGVector), Milvus
```python
# Specify provider when creating vector store
vector_store = client.vector_stores.create(
    name="my_documents",
    provider_id="sqlite-vec",  # Choose your preferred provider
)
```
## How It Works
The file operations work through several key components:
1. **File Upload**: Documents are uploaded through the Files API
2. **Automatic Processing**: Files are automatically chunked and converted to embeddings
3. **Vector Storage**: Chunks are stored in vector databases with metadata
4. **Search & Retrieval**: Users can search through processed documents using natural language
## Supported Vector Store Providers
The following vector store providers support file operations:
### Inline Providers (Single Node)
- **FAISS**: Fast in-memory vector similarity search
- **SQLite-vec**: Disk-based storage with hybrid search capabilities
### Remote Providers (Hosted)
- **ChromaDB**: Vector database with metadata filtering
- **Weaviate**: Vector database with GraphQL interface
- **Postgres (PGVector)**: Vector extensions for PostgreSQL
### Both Inline & Remote Providers
- **Milvus**: High-performance vector database with advanced indexing
- **Qdrant**: Vector similarity search with payload filtering
## File Processing Pipeline
### 1. File Upload
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8000")

# Upload a document
with open("document.pdf", "rb") as f:
    file_info = await client.files.create(file=f, purpose="assistants")
```
### 2. Attach to Vector Store
```python
# Create a vector store
vector_store = client.vector_stores.create(name="my_documents")
# Attach the file to the vector store
file_attach_response = await client.vector_stores.files.create(
    vector_store_id=vector_store.id, file_id=file_info.id
)
```
### 3. Automatic Processing
The system automatically:
- Detects the file type and extracts text content
- Splits content into chunks (default: 800 tokens with 400 token overlap)
- Generates embeddings for each chunk
- Stores chunks with metadata in the vector store
- Updates file status to "completed"
### 4. Search and Retrieval
```python
# Search through processed documents
search_results = await client.vector_stores.search(
    vector_store_id=vector_store.id,
    query="What is the main topic discussed?",
    max_num_results=5,
)

# Process results
for result in search_results.data:
    print(f"Score: {result.score}")
    for content in result.content:
        print(f"Content: {content.text}")
```
## Supported File Types
The FileResponse system supports various document formats:
- **Text Files**: `.txt`, `.md`, `.rst`
- **Documents**: `.pdf`, `.docx`, `.doc`
- **Code**: `.py`, `.js`, `.java`, `.cpp`, etc.
- **Data**: `.json`, `.csv`, `.xml`
- **Web Content**: HTML files
## Chunking Strategies
### Default Strategy
The default chunking strategy uses:
- **Max Chunk Size**: 800 tokens
- **Overlap**: 400 tokens
- **Method**: Semantic boundary detection
### Custom Chunking
You can customize chunking when attaching files:
```python
# Typed strategies are also available via llama_stack.apis.vector_io.VectorStoreChunkingStrategy;
# the dict below follows the OpenAI "static" chunking shape (example values)
chunking_strategy = {
    "type": "static",
    "static": {"max_chunk_size_tokens": 600, "chunk_overlap_tokens": 200},
}

# Attach file with custom chunking
file_attach_response = await client.vector_stores.files.create(
    vector_store_id=vector_store.id,
    file_id=file_info.id,
    chunking_strategy=chunking_strategy,
)
```
**Note**: While Llama Stack is OpenAI-compatible, it also supports additional options beyond the standard OpenAI API. When creating vector stores, you can specify custom embedding models and embedding dimensions that will be used when processing chunks from attached files.
## File Management
### List Files in Vector Store
```python
# List all files in a vector store
files = await client.vector_stores.files.list(vector_store_id=vector_store.id)
for file in files:
    print(f"File: {file.filename}, Status: {file.status}")
```
### File Status Tracking
Files go through several statuses:
- **in_progress**: File is being processed
- **completed**: File successfully processed and searchable
- **failed**: Processing failed (check `last_error` for details)
- **cancelled**: Processing was cancelled
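Because processing is asynchronous, it is common to poll until a file leaves the `in_progress` state. The helper below is a sketch; it assumes the client exposes a `vector_stores.files.retrieve` method mirroring the OpenAI `GET /v1/vector_stores/{vector_store_id}/files/{file_id}` endpoint:
```python
import asyncio


async def wait_for_file(client, vector_store_id: str, file_id: str, interval: float = 2.0):
    """Poll a vector store file until processing finishes (sketch)."""
    while True:
        file = await client.vector_stores.files.retrieve(
            vector_store_id=vector_store_id, file_id=file_id
        )
        if file.status in ("completed", "failed", "cancelled"):
            return file
        await asyncio.sleep(interval)  # still in_progress; check again shortly


# Usage inside an async context:
# processed = await wait_for_file(client, vector_store.id, file_info.id)
# print(processed.status, getattr(processed, "last_error", None))
```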
### Retrieve File Content
```python
# Get chunked content from vector store
content_response = await client.vector_stores.files.retrieve_content(
    vector_store_id=vector_store.id, file_id=file_info.id
)
for chunk in content_response.content:
    print(f"Chunk {chunk.metadata.get('chunk_index', 0)}: {chunk.text}")
```
## Vector Store Management
### List Vector Stores
Retrieve a paginated list of all vector stores:
```python
# List all vector stores with default pagination
vector_stores = await client.vector_stores.list()
# Custom pagination and ordering
vector_stores = await client.vector_stores.list(
    limit=10,
    order="asc",  # or "desc"
    after="vs_12345678",  # cursor-based pagination
)
for store in vector_stores.data:
    print(f"Store: {store.name}, Files: {store.file_counts.total}")
    print(f"Created: {store.created_at}, Status: {store.status}")
```
### Retrieve Vector Store Details
Get detailed information about a specific vector store:
```python
# Get vector store details
store_details = await client.vector_stores.retrieve(vector_store_id="vs_12345678")
print(f"Name: {store_details.name}")
print(f"Status: {store_details.status}")
print(f"File Counts: {store_details.file_counts}")
print(f"Usage: {store_details.usage_bytes} bytes")
print(f"Created: {store_details.created_at}")
print(f"Metadata: {store_details.metadata}")
```
### Update Vector Store
Modify vector store properties such as name, metadata, or expiration settings:
```python
# Update vector store name and metadata
updated_store = await client.vector_stores.update(
    vector_store_id="vs_12345678",
    name="Updated Document Collection",
    metadata={
        "description": "Updated collection for research",
        "category": "research",
        "version": "2.0",
    },
)

# Set expiration policy
expired_store = await client.vector_stores.update(
    vector_store_id="vs_12345678",
    expires_after={"anchor": "last_active_at", "days": 30},
)
print(f"Updated store: {updated_store.name}")
print(f"Last active: {updated_store.last_active_at}")
```
### Delete Vector Store
Remove a vector store and all its associated data:
```python
# Delete a vector store
delete_response = await client.vector_stores.delete(vector_store_id="vs_12345678")
if delete_response.deleted:
    print(f"Vector store {delete_response.id} successfully deleted")
else:
    print("Failed to delete vector store")
```
**Important Notes:**
- Deleting a vector store removes all files, chunks, and embeddings
- This operation cannot be undone
- The underlying vector database is also cleaned up
- Consider backing up important data before deletion
## Search Capabilities
### Vector Search
Pure similarity search using embeddings:
```python
results = await client.vector_stores.search(
    vector_store_id=vector_store.id,
    query="machine learning algorithms",
    max_num_results=10,
)
```
### Filtered Search
Combine vector search with metadata filtering:
```python
results = await client.vector_stores.search(
    vector_store_id=vector_store.id,
    query="machine learning algorithms",
    filters={"file_type": "pdf", "upload_date": "2024-01-01"},
    max_num_results=10,
)
```
### Hybrid Search
[SQLite-vec](../providers/vector_io/inline_sqlite-vec.mdx), [pgvector](../providers/vector_io/remote_pgvector.mdx), and [Milvus](../providers/vector_io/inline_milvus.mdx) support combining vector and keyword search.
## Performance Considerations
> **Note**: For detailed performance optimization strategies, see [Performance Considerations](../providers/files/openai_file_operations_support.md#performance-considerations) in the provider documentation.
**Key Points:**
- **Chunk Size**: 400-600 tokens for precision, 800-1200 for context
- **Storage**: Choose provider based on your performance needs
- **Search**: Optimize for your specific use case
## Error Handling
> **Note**: For comprehensive troubleshooting and error handling, see [Troubleshooting](../providers/files/openai_file_operations_support.md#troubleshooting) in the provider documentation.
**Common Issues:**
- File processing failures (format, size limits)
- Search performance optimization
- Storage and memory issues
## Best Practices
> **Note**: For detailed best practices and recommendations, see [Best Practices](../providers/files/openai_file_operations_support.md#best-practices) in the provider documentation.
**Key Recommendations:**
- File organization and naming conventions
- Chunking strategy optimization
- Metadata and monitoring practices
- Regular cleanup and maintenance
## Integration Examples
### RAG Application
```python
# Build a RAG system with file uploads
async def build_rag_system():
    # Create vector store
    vector_store = client.vector_stores.create(name="knowledge_base")

    # Upload and process documents
    documents = ["doc1.pdf", "doc2.pdf", "doc3.pdf"]
    for doc in documents:
        with open(doc, "rb") as f:
            file_info = await client.files.create(file=f, purpose="assistants")
        await client.vector_stores.files.create(
            vector_store_id=vector_store.id, file_id=file_info.id
        )
    return vector_store


# Query the RAG system
async def query_rag(vector_store_id, question):
    results = await client.vector_stores.search(
        vector_store_id=vector_store_id, query=question, max_num_results=5
    )
    return results
```
### Document Analysis
```python
# Analyze document content through vector search
async def analyze_document(vector_store_id, file_id):
# Get document content
content = await client.vector_stores.files.retrieve_content(
vector_store_id=vector_store_id, file_id=file_id
)
# Search for specific topics
topics = ["introduction", "methodology", "conclusion"]
analysis = {}
for topic in topics:
results = await client.vector_stores.search(
vector_store_id=vector_store_id, query=topic, max_num_results=3
)
analysis[topic] = results.data
return analysis
```
## Next Steps
- Explore the [Files API documentation](../../providers/files/files.mdx) for detailed API reference
- Check [Vector Store Providers](../providers/vector_io/index.mdx) for specific implementation details
- Review [Getting Started](../getting_started/quickstart.mdx) for quick setup instructions

View file

@ -1,232 +1,13 @@
# Contributing to Llama Stack
We want to make contributing to this project as easy and transparent as
possible.
---
title: Contributing
description: Contributing to Llama Stack
sidebar_label: Contributing to Llama Stack
sidebar_position: 3
hide_title: true
---
## Set up your development environment
import Contributing from '!!raw-loader!../../../CONTRIBUTING.md';
import ReactMarkdown from 'react-markdown';
We use [uv](https://github.com/astral-sh/uv) to manage python dependencies and virtual environments.
You can install `uv` by following this [guide](https://docs.astral.sh/uv/getting-started/installation/).
You can install the dependencies by running:
```bash
cd llama-stack
uv sync --group dev
uv pip install -e .
source .venv/bin/activate
```
```{note}
You can use a specific version of Python with `uv` by adding the `--python <version>` flag (e.g. `--python 3.12`).
Otherwise, `uv` will automatically select a Python version according to the `requires-python` section of the `pyproject.toml`.
For more info, see the [uv docs around Python versions](https://docs.astral.sh/uv/concepts/python-versions/).
```
Note that you can create a dotenv file `.env` that includes necessary environment variables:
```
LLAMA_STACK_BASE_URL=http://localhost:8321
LLAMA_STACK_CLIENT_LOG=debug
LLAMA_STACK_PORT=8321
LLAMA_STACK_CONFIG=<provider-name>
TAVILY_SEARCH_API_KEY=
BRAVE_SEARCH_API_KEY=
```
And then use this dotenv file when running client SDK tests via the following:
```bash
uv run --env-file .env -- pytest -v tests/integration/inference/test_text_inference.py --text-model=meta-llama/Llama-3.1-8B-Instruct
```
### Pre-commit Hooks
We use [pre-commit](https://pre-commit.com/) to run linting and formatting checks on your code. You can install the pre-commit hooks by running:
```bash
uv run pre-commit install
```
After that, pre-commit hooks will run automatically before each commit.
Alternatively, if you don't want to install the pre-commit hooks, you can run the checks manually by running:
```bash
uv run pre-commit run --all-files
```
```{caution}
Before pushing your changes, make sure that the pre-commit hooks have passed successfully.
```
## Discussions -> Issues -> Pull Requests
We actively welcome your pull requests. However, please read the following. This is heavily inspired by [Ghostty](https://github.com/ghostty-org/ghostty/blob/main/CONTRIBUTING.md).
If in doubt, please open a [discussion](https://github.com/meta-llama/llama-stack/discussions); we can always convert that to an issue later.
### Issues
We use GitHub issues to track public bugs. Please ensure your description is
clear and has sufficient instructions to be able to reproduce the issue.
Meta has a [bounty program](http://facebook.com/whitehat/info) for the safe
disclosure of security bugs. In those cases, please go through the process
outlined on that page and do not file a public issue.
### Contributor License Agreement ("CLA")
In order to accept your pull request, we need you to submit a CLA. You only need
to do this once to work on any of Meta's open source projects.
Complete your CLA here: [https://code.facebook.com/cla](https://code.facebook.com/cla)
**I'd like to contribute!**
If you are new to the project, start by looking at the issues tagged with "good first issue". If you're interested
leave a comment on the issue and a triager will assign it to you.
Please avoid picking up too many issues at once. This helps you stay focused and ensures that others in the community also have opportunities to contribute.
- Try to work on only 1-2 issues at a time, especially if you're still getting familiar with the codebase.
- Before taking an issue, check if it's already assigned or being actively discussed.
- If you're blocked or can't continue with an issue, feel free to unassign yourself or leave a comment so others can step in.
**I have a bug!**
1. Search the issue tracker and discussions for similar issues.
2. If you don't have steps to reproduce, open a discussion.
3. If you have steps to reproduce, open an issue.
**I have an idea for a feature!**
1. Open a discussion.
**I've implemented a feature!**
1. If there is an issue for the feature, open a pull request.
2. If there is no issue, open a discussion and link to your branch.
**I have a question!**
1. Open a discussion or use [Discord](https://discord.gg/llama-stack).
**Opening a Pull Request**
1. Fork the repo and create your branch from `main`.
2. If you've changed APIs, update the documentation.
3. Ensure the test suite passes.
4. Make sure your code lints using `pre-commit`.
5. If you haven't already, complete the Contributor License Agreement ("CLA").
6. Ensure your pull request follows the [conventional commits format](https://www.conventionalcommits.org/en/v1.0.0/).
7. Ensure your pull request follows the [coding style](#coding-style).
Please keep pull requests (PRs) small and focused. If you have a large set of changes, consider splitting them into logically grouped, smaller PRs to facilitate review and testing.
```{tip}
As a general guideline:
- Experienced contributors should try to keep no more than 5 open PRs at a time.
- New contributors are encouraged to have only one open PR at a time until they're familiar with the codebase and process.
```
## Repository guidelines
### Coding Style
* Comments should provide meaningful insights into the code. Avoid filler comments that simply
  describe the next step, as they create unnecessary clutter; the same goes for docstrings.
* Prefer comments to clarify surprising behavior and/or relationships between parts of the code
rather than explain what the next line of code does.
* When catching exceptions, prefer a specific exception type rather than a broad catch-all like
  `Exception`.
* Error messages should be prefixed with "Failed to ..."
* Use 4 spaces for indentation rather than tabs
* When using `# noqa` to suppress a style or linter warning, include a comment explaining the
justification for bypassing the check.
* When using `# type: ignore` to suppress a mypy warning, include a comment explaining the
justification for bypassing the check.
* Don't use unicode characters in the codebase. ASCII-only is preferred for compatibility or
readability reasons.
* Provider configuration classes should be Pydantic models, and each field should have a
  `description` that documents the setting. These descriptions will be used to generate the
  provider documentation.
* When possible, use keyword arguments only when calling functions.
* Llama Stack utilizes custom Exception classes for certain Resources that should be used where applicable.
### License
By contributing to Llama, you agree that your contributions will be licensed
under the LICENSE file in the root directory of this source tree.
## Common Tasks
Some tips about common tasks you work on while contributing to Llama Stack:
### Setup for development
```bash
git clone https://github.com/meta-llama/llama-stack.git
cd llama-stack
uv run llama stack list-deps <distro-name> | xargs -L1 uv pip install
# (Optional) If you are developing the llama-stack-client-python package, you can add it as an editable package.
git clone https://github.com/meta-llama/llama-stack-client-python.git
uv add --editable ../llama-stack-client-python
```
### Updating distribution configurations
If you have made changes to a provider's configuration in any form (introducing a new config key, or
changing models, etc.), you should run `./scripts/distro_codegen.py` to re-generate various YAML
files as well as the documentation. You should not change `docs/source/.../distributions/` files
manually as they are auto-generated.
### Updating the provider documentation
If you have made changes to a provider's configuration, you should run `./scripts/provider_codegen.py`
to re-generate the documentation. You should not change `docs/source/.../providers/` files manually
as they are auto-generated.
Note that the provider "description" field will be used to generate the provider documentation.
### Building the Documentation
If you are making changes to the documentation at [https://llamastack.github.io/](https://llamastack.github.io/), you can use the following command to build the documentation and preview your changes.
```bash
# This rebuilds the documentation pages and the OpenAPI spec.
npm install
npm run gen-api-docs all
npm run build
# This will start a local server (usually at http://127.0.0.1:3000).
npm run serve
```
### Update API Documentation
If you modify or add new API endpoints, update the API documentation accordingly. You can do this by running the following command:
```bash
uv run ./docs/openapi_generator/run_openapi_generator.sh
```
The generated API schema will be available in `docs/static/`. Make sure to review the changes before committing.
## Adding a New Provider
See:
- [Adding a New API Provider Page](./new_api_provider.mdx) which describes how to add new API providers to the Stack.
- [Vector Database Page](./new_vector_database.mdx) which describes how to add a new vector databases with Llama Stack.
- [External Provider Page](/docs/providers/external/) which describes how to add external providers to the Stack.
## Testing
See the [Testing README](https://github.com/meta-llama/llama-stack/blob/main/tests/README.md) for detailed testing information.
## Advanced Topics
For developers who need deeper understanding of the testing system internals:
- [Record-Replay Testing](./testing/record-replay.mdx)
### Benchmarking
See the [Benchmarking README](https://github.com/meta-llama/llama-stack/blob/main/benchmarking/k8s-benchmark/README.md) for benchmarking information.
<ReactMarkdown>{Contributing}</ReactMarkdown>

View file

@ -10,7 +10,7 @@ import TabItem from '@theme/TabItem';
# Kubernetes Deployment Guide
Deploy Llama Stack and vLLM servers in a Kubernetes cluster instead of running them locally. This guide covers both local development with Kind and production deployment on AWS EKS.
Deploy Llama Stack and vLLM servers in a Kubernetes cluster instead of running them locally. This guide covers deployment using the Kubernetes operator to manage the Llama Stack server with Kind. The vLLM inference server is deployed manually.
## Prerequisites
@ -110,115 +110,176 @@ spec:
EOF
```
### Step 3: Configure Llama Stack
### Step 3: Install Kubernetes Operator
Update your run configuration:
```yaml
providers:
inference:
- provider_id: vllm
provider_type: remote::vllm
config:
url: http://vllm-server.default.svc.cluster.local:8000/v1
max_tokens: 4096
api_token: fake
```
Build container image:
Install the Llama Stack Kubernetes operator to manage Llama Stack deployments:
```bash
tmp_dir=$(mktemp -d) && cat >$tmp_dir/Containerfile.llama-stack-run-k8s <<EOF
FROM distribution-myenv:dev
RUN apt-get update && apt-get install -y git
RUN git clone https://github.com/meta-llama/llama-stack.git /app/llama-stack-source
ADD ./vllm-llama-stack-run-k8s.yaml /app/config.yaml
EOF
podman build -f $tmp_dir/Containerfile.llama-stack-run-k8s -t llama-stack-run-k8s $tmp_dir
# Install from the latest main branch
kubectl apply -f https://raw.githubusercontent.com/llamastack/llama-stack-k8s-operator/main/release/operator.yaml
# Or install a specific version (e.g., v0.4.0)
# kubectl apply -f https://raw.githubusercontent.com/llamastack/llama-stack-k8s-operator/v0.4.0/release/operator.yaml
```
### Step 4: Deploy Llama Stack Server
Verify the operator is running:
```bash
kubectl get pods -n llama-stack-operator-system
```
For more information about the operator, see the [llama-stack-k8s-operator repository](https://github.com/llamastack/llama-stack-k8s-operator).
### Step 4: Deploy Llama Stack Server using Operator
Create a `LlamaStackDistribution` custom resource to deploy the Llama Stack server. The operator will automatically create the necessary Deployment, Service, and other resources.
You can optionally override the default `run.yaml` using `spec.server.userConfig` with a ConfigMap (see [userConfig spec](https://github.com/llamastack/llama-stack-k8s-operator/blob/main/docs/api-overview.md#userconfigspec)).
```yaml
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
name: llama-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: llama-stack-server
name: llamastack-vllm
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/name: llama-stack
template:
metadata:
labels:
app.kubernetes.io/name: llama-stack
spec:
containers:
- name: llama-stack
image: localhost/llama-stack-run-k8s:latest
imagePullPolicy: IfNotPresent
command: ["llama", "stack", "run", "/app/config.yaml"]
ports:
- containerPort: 5000
volumeMounts:
- name: llama-storage
mountPath: /root/.llama
volumes:
- name: llama-storage
persistentVolumeClaim:
claimName: llama-pvc
---
apiVersion: v1
kind: Service
metadata:
name: llama-stack-service
spec:
selector:
app.kubernetes.io/name: llama-stack
ports:
- protocol: TCP
port: 5000
targetPort: 5000
type: ClusterIP
server:
distribution:
name: starter
containerSpec:
port: 8321
env:
- name: VLLM_URL
value: "http://vllm-server.default.svc.cluster.local:8000/v1"
- name: VLLM_MAX_TOKENS
value: "4096"
- name: VLLM_API_TOKEN
value: "fake"
# Optional: override run.yaml from a ConfigMap using userConfig
userConfig:
configMap:
name: llama-stack-config
storage:
size: "20Gi"
mountPath: "/home/lls/.lls"
EOF
```
**Configuration Options:**
- `replicas`: Number of Llama Stack server instances to run
- `server.distribution.name`: The distribution to use (e.g., `starter` for the starter distribution). See the [list of supported distributions](https://github.com/llamastack/llama-stack-k8s-operator/blob/main/distributions.json) in the operator repository.
- `server.distribution.image`: (Optional) Custom container image for non-supported distributions. Use this field when deploying a distribution that is not in the supported list. If specified, this takes precedence over `name`.
- `server.containerSpec.port`: Port on which the Llama Stack server listens (default: 8321)
- `server.containerSpec.env`: Environment variables to configure providers (for example, the `VLLM_*` variables shown above)
- `server.userConfig`: (Optional) Override the default `run.yaml` using a ConfigMap. See [userConfig spec](https://github.com/llamastack/llama-stack-k8s-operator/blob/main/docs/api-overview.md#userconfigspec).
- `server.storage.size`: Size of the persistent volume for model and data storage
- `server.storage.mountPath`: Where to mount the storage in the container
**Note:** For a complete list of supported distributions, see [distributions.json](https://github.com/llamastack/llama-stack-k8s-operator/blob/main/distributions.json) in the operator repository. To use a custom or non-supported distribution, set the `server.distribution.image` field with your container image instead of `server.distribution.name`.
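For illustration, a `LlamaStackDistribution` that points at a custom image rather than a named distribution might look like the sketch below; the image reference is a placeholder.

```yaml
# Hypothetical custom-image deployment; replace the image reference with your own.
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: llamastack-custom
spec:
  replicas: 1
  server:
    distribution:
      image: quay.io/example/my-distribution:latest  # takes precedence over `name`
    containerSpec:
      port: 8321
```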
The operator automatically creates:
- A Deployment for the Llama Stack server
- A Service to access the server
- A PersistentVolumeClaim for storage
- All necessary RBAC resources
Check the status of your deployment:
```bash
kubectl get llamastackdistribution
kubectl describe llamastackdistribution llamastack-vllm
```
### Step 5: Test Deployment
Wait for the Llama Stack server pod to be ready:
```bash
# Port forward and test
kubectl port-forward service/llama-stack-service 5000:5000
llama-stack-client --endpoint http://localhost:5000 inference chat-completion --message "hello, what model are you?"
# Check the status of the LlamaStackDistribution
kubectl get llamastackdistribution llamastack-vllm
# Check the pods created by the operator
kubectl get pods -l app.kubernetes.io/name=llama-stack
# Wait for the pod to be ready
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=llama-stack --timeout=300s
```
Get the service name created by the operator (it typically follows the pattern `<llamastackdistribution-name>-service`):
```bash
# List services to find the service name
kubectl get services | grep llamastack
# Port forward and test (replace SERVICE_NAME with the actual service name)
kubectl port-forward service/llamastack-vllm-service 8321:8321
```
In another terminal, test the deployment:
```bash
llama-stack-client --endpoint http://localhost:8321 inference chat-completion --message "hello, what model are you?"
```
## Troubleshooting
**Check pod status:**
### vLLM Server Issues
**Check vLLM pod status:**
```bash
kubectl get pods -l app.kubernetes.io/name=vllm
kubectl logs -l app.kubernetes.io/name=vllm
```
**Test service connectivity:**
**Test vLLM service connectivity:**
```bash
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- curl http://vllm-server:8000/v1/models
```
### Llama Stack Server Issues
**Check LlamaStackDistribution status:**
```bash
# Get detailed status
kubectl describe llamastackdistribution llamastack-vllm
# Check for events
kubectl get events --sort-by='.lastTimestamp' | grep llamastack-vllm
```
**Check operator-managed pods:**
```bash
# List all pods managed by the operator
kubectl get pods -l app.kubernetes.io/name=llama-stack
# Check pod logs (replace POD_NAME with actual pod name)
kubectl logs -l app.kubernetes.io/name=llama-stack
```
**Check operator status:**
```bash
# Verify the operator is running
kubectl get pods -n llama-stack-operator-system
# Check operator logs if issues persist
kubectl logs -n llama-stack-operator-system -l control-plane=controller-manager
```
**Verify service connectivity:**
```bash
# Get the service endpoint
kubectl get svc llamastack-vllm-service
# Test connectivity from within the cluster
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- curl http://llamastack-vllm-service:8321/health
```
## Related Resources
- **[Deployment Overview](/docs/deploying/)** - Overview of deployment options
- **[Distributions](/docs/distributions)** - Understanding Llama Stack distributions
- **[Configuration](/docs/distributions/configuration)** - Detailed configuration options
- **[LlamaStack Operator](https://github.com/llamastack/llama-stack-k8s-operator)** - Overview of llama-stack kubernetes operator
- **[LlamaStackDistribution](https://github.com/llamastack/llama-stack-k8s-operator/blob/main/docs/api-overview.md)** - API Spec of the llama-stack operator Custom Resource.

View file

@ -65,7 +65,7 @@ external_providers_dir: /workspace/providers.d
Inside `providers.d/custom_ollama/provider.py`, define `get_provider_spec()` so the CLI can discover dependencies:
```python
from llama_stack.providers.datatypes import ProviderSpec
from llama_stack_api.providers.datatypes import ProviderSpec
def get_provider_spec() -> ProviderSpec:

View file

@ -21,7 +21,6 @@ apis:
- inference
- vector_io
- safety
- telemetry
providers:
inference:
- provider_id: ollama
@ -51,10 +50,6 @@ providers:
responses:
backend: sql_default
table_name: responses
telemetry:
- provider_id: meta-reference
provider_type: inline::meta-reference
config: {}
storage:
backends:
kv_default:
@ -63,13 +58,21 @@ storage:
sql_default:
type: sql_sqlite
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/ollama}/sqlstore.db
references:
stores:
metadata:
backend: kv_default
namespace: registry
inference:
backend: sql_default
table_name: inference_store
max_write_queue_size: 10000
num_writers: 4
conversations:
backend: sql_default
table_name: openai_conversations
prompts:
backend: kv_default
namespace: prompts
models:
- metadata: {}
model_id: ${env.INFERENCE_MODEL}
@ -92,7 +95,6 @@ apis:
- inference
- vector_io
- safety
- telemetry
```
## Providers
@ -219,7 +221,15 @@ models:
```
A Model is an instance of a "Resource" (see [Concepts](../concepts/)) and is associated with a specific inference provider (in this case, the provider with identifier `ollama`). This is an instance of a "pre-registered" model. While we always encourage clients to register models before using them, some Stack servers may come up with a list of "already known and available" models.
What's with the `provider_model_id` field? This is an identifier for the model inside the provider's model catalog. Contrast it with `model_id` which is the identifier for the same model for Llama Stack's purposes. For example, you may want to name "llama3.2:vision-11b" as "image_captioning_model" when you use it in your Stack interactions. When omitted, the server will set `provider_model_id` to be the same as `model_id`.
What's with the `provider_model_id` field? This is an identifier for the model inside the provider's model catalog. The `model_id` field is provided for configuration purposes but is not used as part of the model identifier.
**Important:** Models are identified as `provider_id/provider_model_id` in the system and when making API calls. When `provider_model_id` is omitted, the server will set it to be the same as `model_id`.
Examples:
- Config: `model_id: llama3.2`, `provider_id: ollama`, `provider_model_id: null`
→ Access as: `ollama/llama3.2`
- Config: `model_id: my-llama`, `provider_id: vllm-inference`, `provider_model_id: llama-3-2-3b`
→ Access as: `vllm-inference/llama-3-2-3b` (the `model_id` is not used in the identifier)
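For illustration, the second example above corresponds to a run config entry like this sketch:

```yaml
# Illustrative entry only; values mirror the example above.
models:
- metadata: {}
  model_id: my-llama                  # config-side alias, not part of the identifier
  provider_id: vllm-inference
  provider_model_id: llama-3-2-3b     # identifier in the provider's catalog
# API calls then address the model as: vllm-inference/llama-3-2-3b
```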
If you need to conditionally register a model in the configuration, such as only when specific environment variable(s) are set, this can be accomplished by utilizing a special `__disabled__` string as the default value of an environment variable substitution, as shown below:
@ -589,24 +599,13 @@ created by users sharing a team with them:
In addition to resource-based access control, Llama Stack supports endpoint-level authorization using OAuth 2.0 style scopes. When authentication is enabled, specific API endpoints require users to have particular scopes in their authentication token.
**Scope-Gated APIs:**
The following APIs are currently gated by scopes:
- **Telemetry API** (scope: `telemetry.read`):
- `POST /telemetry/traces` - Query traces
- `GET /telemetry/traces/{trace_id}` - Get trace by ID
- `GET /telemetry/traces/{trace_id}/spans/{span_id}` - Get span by ID
- `POST /telemetry/spans/{span_id}/tree` - Get span tree
- `POST /telemetry/spans` - Query spans
- `POST /telemetry/metrics/{metric_name}` - Query metrics
**Authentication Configuration:**
For **JWT/OAuth2 providers**, scopes should be included in the JWT's claims:
```json
{
"sub": "user123",
"scope": "telemetry.read",
"scope": "<scope>",
"aud": "llama-stack"
}
```
@ -616,7 +615,7 @@ For **custom authentication providers**, the endpoint must return user attribute
{
"principal": "user123",
"attributes": {
"scopes": ["telemetry.read"]
"scopes": ["<scope>"]
}
}
```

View file

@ -11,7 +11,7 @@ If you are planning to use an external service for Inference (even Ollama or TGI
This avoids the overhead of setting up a server.
```bash
# setup
uv pip install llama-stack
uv pip install llama-stack llama-stack-client
llama stack list-deps starter | xargs -L1 uv pip install
```

View file

@ -19,3 +19,4 @@ This section provides an overview of the distributions available in Llama Stack.
- **[Starting Llama Stack Server](./starting_llama_stack_server.mdx)** - How to run distributions
- **[Importing as Library](./importing_as_library.mdx)** - Use distributions in your code
- **[Configuration Reference](./configuration.mdx)** - Configuration file format details
- **[Llama Stack UI](./llama_stack_ui.mdx)** - Web-based user interface for interacting with Llama Stack servers

View file

@ -8,7 +8,6 @@ data:
- inference
- files
- safety
- telemetry
- tool_runtime
- vector_io
providers:
@ -73,12 +72,6 @@ data:
db: ${env.POSTGRES_DB:=llamastack}
user: ${env.POSTGRES_USER:=llamastack}
password: ${env.POSTGRES_PASSWORD:=llamastack}
telemetry:
- provider_id: meta-reference
provider_type: inline::meta-reference
config:
service_name: "${env.OTEL_SERVICE_NAME:=\u200B}"
sinks: ${env.TELEMETRY_SINKS:=console}
tool_runtime:
- provider_id: brave-search
provider_type: remote::brave-search
@ -113,13 +106,21 @@ data:
db: ${env.POSTGRES_DB:=llamastack}
user: ${env.POSTGRES_USER:=llamastack}
password: ${env.POSTGRES_PASSWORD:=llamastack}
references:
stores:
metadata:
backend: kv_default
namespace: registry
inference:
backend: sql_default
table_name: inference_store
max_write_queue_size: 10000
num_writers: 4
conversations:
backend: sql_default
table_name: openai_conversations
prompts:
backend: kv_default
namespace: prompts
models:
- metadata:
embedding_dimension: 768

View file

@ -5,7 +5,6 @@ apis:
- inference
- files
- safety
- telemetry
- tool_runtime
- vector_io
providers:
@ -32,21 +31,17 @@ providers:
provider_type: remote::chromadb
config:
url: ${env.CHROMADB_URL:=}
kvstore:
type: postgres
host: ${env.POSTGRES_HOST:=localhost}
port: ${env.POSTGRES_PORT:=5432}
db: ${env.POSTGRES_DB:=llamastack}
user: ${env.POSTGRES_USER:=llamastack}
password: ${env.POSTGRES_PASSWORD:=llamastack}
persistence:
namespace: vector_io::chroma_remote
backend: kv_default
files:
- provider_id: meta-reference-files
provider_type: inline::localfs
config:
storage_dir: ${env.FILES_STORAGE_DIR:=~/.llama/distributions/starter/files}
metadata_store:
type: sqlite
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/starter}/files_metadata.db
table_name: files_metadata
backend: sql_default
safety:
- provider_id: llama-guard
provider_type: inline::llama-guard
@ -56,26 +51,15 @@ providers:
- provider_id: meta-reference
provider_type: inline::meta-reference
config:
persistence_store:
type: postgres
host: ${env.POSTGRES_HOST:=localhost}
port: ${env.POSTGRES_PORT:=5432}
db: ${env.POSTGRES_DB:=llamastack}
user: ${env.POSTGRES_USER:=llamastack}
password: ${env.POSTGRES_PASSWORD:=llamastack}
responses_store:
type: postgres
host: ${env.POSTGRES_HOST:=localhost}
port: ${env.POSTGRES_PORT:=5432}
db: ${env.POSTGRES_DB:=llamastack}
user: ${env.POSTGRES_USER:=llamastack}
password: ${env.POSTGRES_PASSWORD:=llamastack}
telemetry:
- provider_id: meta-reference
provider_type: inline::meta-reference
config:
service_name: "${env.OTEL_SERVICE_NAME:=\u200B}"
sinks: ${env.TELEMETRY_SINKS:=console}
persistence:
agent_state:
namespace: agents
backend: kv_default
responses:
table_name: responses
backend: sql_default
max_write_queue_size: 10000
num_writers: 4
tool_runtime:
- provider_id: brave-search
provider_type: remote::brave-search
@ -110,40 +94,54 @@ storage:
db: ${env.POSTGRES_DB:=llamastack}
user: ${env.POSTGRES_USER:=llamastack}
password: ${env.POSTGRES_PASSWORD:=llamastack}
references:
stores:
metadata:
backend: kv_default
namespace: registry
backend: kv_default
inference:
backend: sql_default
table_name: inference_store
models:
- metadata:
embedding_dimension: 768
model_id: nomic-embed-text-v1.5
provider_id: sentence-transformers
model_type: embedding
- metadata: {}
model_id: ${env.INFERENCE_MODEL}
provider_id: vllm-inference
model_type: llm
- metadata: {}
model_id: ${env.SAFETY_MODEL:=meta-llama/Llama-Guard-3-1B}
provider_id: vllm-safety
model_type: llm
shields:
- shield_id: ${env.SAFETY_MODEL:=meta-llama/Llama-Guard-3-1B}
vector_dbs: []
datasets: []
scoring_fns: []
benchmarks: []
tool_groups:
- toolgroup_id: builtin::websearch
provider_id: tavily-search
- toolgroup_id: builtin::rag
provider_id: rag-runtime
backend: sql_default
max_write_queue_size: 10000
num_writers: 4
conversations:
table_name: openai_conversations
backend: sql_default
prompts:
namespace: prompts
backend: kv_default
registered_resources:
models:
- metadata:
embedding_dimension: 768
model_id: nomic-embed-text-v1.5
provider_id: sentence-transformers
model_type: embedding
- metadata: {}
model_id: ${env.INFERENCE_MODEL}
provider_id: vllm-inference
model_type: llm
- metadata: {}
model_id: ${env.SAFETY_MODEL:=meta-llama/Llama-Guard-3-1B}
provider_id: vllm-safety
model_type: llm
shields:
- shield_id: ${env.SAFETY_MODEL:=meta-llama/Llama-Guard-3-1B}
vector_dbs: []
datasets: []
scoring_fns: []
benchmarks: []
tool_groups:
- toolgroup_id: builtin::websearch
provider_id: tavily-search
- toolgroup_id: builtin::rag
provider_id: rag-runtime
server:
port: 8321
auth:
provider_config:
type: github_token
vector_stores:
default_provider_id: chromadb
default_embedding_model:
provider_id: sentence-transformers
model_id: nomic-ai/nomic-embed-text-v1.5

View file

@ -44,7 +44,7 @@ spec:
# Navigate to the UI directory
echo "Navigating to UI directory..."
cd /app/llama_stack/ui
cd /app/llama_stack_ui
# Check if package.json exists
if [ ! -f "package.json" ]; then

View file

@ -28,7 +28,7 @@ Llama Stack provides several pre-configured distributions to help you get starte
- Run locally with Ollama for development
```bash
docker pull llama-stack/distribution-starter
docker pull llamastack/distribution-starter
```
**Guides:** [Starter Distribution Guide](self_hosted_distro/starter)
@ -41,7 +41,7 @@ docker pull llama-stack/distribution-starter
- Need to run inference locally
```bash
docker pull llama-stack/distribution-meta-reference-gpu
docker pull llamastack/distribution-meta-reference-gpu
```
**Guides:** [Meta Reference GPU Guide](self_hosted_distro/meta-reference-gpu)

View file

@ -0,0 +1,109 @@
---
title: Llama Stack UI
description: Web-based user interface for interacting with Llama Stack servers
sidebar_label: Llama Stack UI
sidebar_position: 8
---
# Llama Stack UI
The Llama Stack UI is a web-based interface for interacting with Llama Stack servers. Built with Next.js and React, it provides a visual way to work with agents, manage resources, and view logs.
## Features
- **Logs & Monitoring**: View chat completions, agent responses, and vector store activity
- **Vector Stores**: Create and manage vector databases for RAG (Retrieval-Augmented Generation) workflows
- **Prompt Management**: Create and manage reusable prompts
## Prerequisites
You need a running Llama Stack server. The UI is a client that connects to the Llama Stack backend.
If you don't have a Llama Stack server running yet, see the [Starting Llama Stack Server](../getting_started/starting_llama_stack_server.mdx) guide.
## Running the UI
### Option 1: Using npx (Recommended for Quick Start)
The fastest way to get started is using `npx`:
```bash
npx llama-stack-ui
```
This will start the UI server on `http://localhost:8322` (default port).
### Option 2: Using Docker
Run the UI in a container:
```bash
docker run -p 8322:8322 llamastack/ui
```
Access the UI at `http://localhost:8322`.
## Environment Variables
The UI can be configured using the following environment variables:
| Variable | Description | Default |
|----------|-------------|---------|
| `LLAMA_STACK_BACKEND_URL` | URL of your Llama Stack server | `http://localhost:8321` |
| `LLAMA_STACK_UI_PORT` | Port for the UI server | `8322` |
If the Llama Stack server is running with authentication enabled, you can configure the UI to use it by setting the following environment variables:
| Variable | Description | Default |
|----------|-------------|---------|
| `NEXTAUTH_URL` | NextAuth URL for authentication | `http://localhost:8322` |
| `GITHUB_CLIENT_ID` | GitHub OAuth client ID (optional, for authentication) | - |
| `GITHUB_CLIENT_SECRET` | GitHub OAuth client secret (optional, for authentication) | - |
### Setting Environment Variables
#### For npx:
```bash
LLAMA_STACK_BACKEND_URL=http://localhost:8321 \
LLAMA_STACK_UI_PORT=8080 \
npx llama-stack-ui
```
#### For Docker:
```bash
docker run -p 8080:8080 \
-e LLAMA_STACK_BACKEND_URL=http://localhost:8321 \
-e LLAMA_STACK_UI_PORT=8080 \
llamastack/ui
```
## Using the UI
### Managing Resources
- **Vector Stores**: Create vector databases for RAG workflows, view stored documents and embeddings
- **Prompts**: Create and manage reusable prompt templates
- **Chat Completions**: View history of chat interactions
- **Responses**: Browse detailed agent responses and tool calls
## Development
If you want to run the UI from source for development:
```bash
# From the project root
cd src/llama_stack_ui
# Install dependencies
npm install
# Set environment variables
export LLAMA_STACK_BACKEND_URL=http://localhost:8321
# Start the development server
npm run dev
```
The development server will start on `http://localhost:8322` with hot reloading enabled.

View file

@ -2,10 +2,10 @@
Remote-Hosted distributions are available endpoints serving Llama Stack API that you can directly connect to.
| Distribution | Endpoint | Inference | Agents | Memory | Safety | Telemetry |
| Distribution | Endpoint | Inference | Agents | Memory | Safety |
|-------------|----------|-----------|---------|---------|---------|
| Together | [https://llama-stack.together.ai](https://llama-stack.together.ai) | remote::together | meta-reference | remote::weaviate | meta-reference | meta-reference |
| Fireworks | [https://llamastack-preview.fireworks.ai](https://llamastack-preview.fireworks.ai) | remote::fireworks | meta-reference | remote::weaviate | meta-reference | meta-reference |
| Together | [https://llama-stack.together.ai](https://llama-stack.together.ai) | remote::together | meta-reference | remote::weaviate | meta-reference |
| Fireworks | [https://llamastack-preview.fireworks.ai](https://llamastack-preview.fireworks.ai) | remote::fireworks | meta-reference | remote::weaviate | meta-reference |
## Connecting to Remote-Hosted Distributions

View file

@ -0,0 +1,143 @@
---
orphan: true
---
<!-- This file was auto-generated by distro_codegen.py, please edit source -->
# OCI Distribution
The `llamastack/distribution-oci` distribution consists of the following provider configurations.
| API | Provider(s) |
|-----|-------------|
| agents | `inline::meta-reference` |
| datasetio | `remote::huggingface`, `inline::localfs` |
| eval | `inline::meta-reference` |
| files | `inline::localfs` |
| inference | `remote::oci` |
| safety | `inline::llama-guard` |
| scoring | `inline::basic`, `inline::llm-as-judge`, `inline::braintrust` |
| tool_runtime | `remote::brave-search`, `remote::tavily-search`, `inline::rag-runtime`, `remote::model-context-protocol` |
| vector_io | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |
### Environment Variables
The following environment variables can be configured:
- `OCI_AUTH_TYPE`: OCI authentication type (instance_principal or config_file) (default: `instance_principal`)
- `OCI_REGION`: OCI region (e.g., us-ashburn-1, us-chicago-1, us-phoenix-1, eu-frankfurt-1) (default: ``)
- `OCI_COMPARTMENT_OCID`: OCI compartment ID for the Generative AI service (default: ``)
- `OCI_CONFIG_FILE_PATH`: OCI config file path (required if OCI_AUTH_TYPE is config_file) (default: `~/.oci/config`)
- `OCI_CLI_PROFILE`: OCI CLI profile name to use from config file (default: `DEFAULT`)
## Prerequisites
### Oracle Cloud Infrastructure Setup
Before using the OCI Generative AI distribution, ensure you have:
1. **Oracle Cloud Infrastructure Account**: Sign up at [Oracle Cloud Infrastructure](https://cloud.oracle.com/)
2. **Generative AI Service Access**: Enable the Generative AI service in your OCI tenancy
3. **Compartment**: Create or identify a compartment where you'll deploy Generative AI models
4. **Authentication**: Configure authentication using either:
- **Instance Principal** (recommended for cloud-hosted deployments)
- **API Key** (for on-premises or development environments)
### Authentication Methods
#### Instance Principal Authentication (Recommended)
Instance Principal authentication allows OCI resources to authenticate using the identity of the compute instance they're running on. This is the most secure method for production deployments.
Requirements:
- Instance must be running in an Oracle Cloud Infrastructure compartment
- Instance must have appropriate IAM policies to access Generative AI services
#### API Key Authentication
For development or on-premises deployments, follow [this doc](https://docs.oracle.com/en-us/iaas/Content/API/Concepts/apisigningkey.htm) to learn how to create your API signing key for your config file.
### Required IAM Policies
Ensure your OCI user or instance has the following policy statements:
```
Allow group <group_name> to use generative-ai-inference-endpoints in compartment <compartment_name>
Allow group <group_name> to manage generative-ai-inference-endpoints in compartment <compartment_name>
```
## Supported Services
### Inference: OCI Generative AI
Oracle Cloud Infrastructure Generative AI provides access to high-performance AI models through OCI's Platform-as-a-Service offering. The service supports:
- **Chat Completions**: Conversational AI with context awareness
- **Text Generation**: Complete prompts and generate text content
#### Available Models
Common OCI Generative AI models include access to Meta, Cohere, OpenAI, Grok, and more models.
### Safety: Llama Guard
For content safety and moderation, this distribution uses Meta's LlamaGuard model through the OCI Generative AI service to provide:
- Content filtering and moderation
- Policy compliance checking
- Harmful content detection
### Vector Storage: Multiple Options
The distribution supports several vector storage providers:
- **FAISS**: Local in-memory vector search
- **ChromaDB**: Distributed vector database
- **PGVector**: PostgreSQL with vector extensions
### Additional Services
- **Dataset I/O**: Local filesystem and Hugging Face integration
- **Tool Runtime**: Web search (Brave, Tavily) and RAG capabilities
- **Evaluation**: Meta reference evaluation framework
## Running Llama Stack with OCI
You can run the OCI distribution via Docker or local virtual environment.
### Via venv
If you've set up your local development environment, you can also run the distribution using your local virtual environment.
```bash
OCI_AUTH_TYPE=$OCI_AUTH_TYPE OCI_REGION=$OCI_REGION OCI_COMPARTMENT_OCID=$OCI_COMPARTMENT_OCID llama stack run --port 8321 oci
```
### Configuration Examples
#### Using Instance Principal (Recommended for Production)
```bash
export OCI_AUTH_TYPE=instance_principal
export OCI_REGION=us-chicago-1
export OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..<your-compartment-id>
```
#### Using API Key Authentication (Development)
```bash
export OCI_AUTH_TYPE=config_file
export OCI_CONFIG_FILE_PATH=~/.oci/config
export OCI_CLI_PROFILE=DEFAULT
export OCI_REGION=us-chicago-1
export OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..your-compartment-id
```
## Regional Endpoints
OCI Generative AI is available in multiple regions. The service automatically routes to the appropriate regional endpoint based on your configuration. For a full list of regional model availability, visit:
https://docs.oracle.com/en-us/iaas/Content/generative-ai/overview.htm#regions
## Troubleshooting
### Common Issues
1. **Authentication Errors**: Verify your OCI credentials and IAM policies
2. **Model Not Found**: Ensure the model OCID is correct and the model is available in your region
3. **Permission Denied**: Check compartment permissions and Generative AI service access
4. **Region Unavailable**: Verify the specified region supports Generative AI services
### Getting Help
For additional support:
- [OCI Generative AI Documentation](https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm)
- [Llama Stack Issues](https://github.com/meta-llama/llama-stack/issues)

View file

@ -21,7 +21,6 @@ The `llamastack/distribution-watsonx` distribution consists of the following pro
| inference | `remote::watsonx`, `inline::sentence-transformers` |
| safety | `inline::llama-guard` |
| scoring | `inline::basic`, `inline::llm-as-judge`, `inline::braintrust` |
| telemetry | `inline::meta-reference` |
| tool_runtime | `remote::brave-search`, `remote::tavily-search`, `inline::rag-runtime`, `remote::model-context-protocol` |
| vector_io | `inline::faiss` |

View file

@ -13,9 +13,9 @@ self
The `llamastack/distribution-tgi` distribution consists of the following provider configurations.
| **API** | **Inference** | **Agents** | **Memory** | **Safety** | **Telemetry** |
|----------------- |--------------- |---------------- |-------------------------------------------------- |---------------- |---------------- |
| **Provider(s)** | remote::tgi | meta-reference | meta-reference, remote::pgvector, remote::chroma | meta-reference | meta-reference |
| **API** | **Inference** | **Agents** | **Memory** | **Safety** |
|----------------- |--------------- |---------------- |-------------------------------------------------- |---------------- |
| **Provider(s)** | remote::tgi | meta-reference | meta-reference, remote::pgvector, remote::chroma | meta-reference |
The only difference vs. the `tgi` distribution is that it runs the Dell-TGI server for inference.

View file

@ -22,7 +22,6 @@ The `llamastack/distribution-dell` distribution consists of the following provid
| inference | `remote::tgi`, `inline::sentence-transformers` |
| safety | `inline::llama-guard` |
| scoring | `inline::basic`, `inline::llm-as-judge`, `inline::braintrust` |
| telemetry | `inline::meta-reference` |
| tool_runtime | `remote::brave-search`, `remote::tavily-search`, `inline::rag-runtime` |
| vector_io | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |

View file

@ -79,6 +79,33 @@ docker run \
--port $LLAMA_STACK_PORT
```
### Via Docker with Custom Run Configuration
You can also run the Docker container with a custom run configuration file by mounting it into the container:
```bash
# Set the path to your custom run.yaml file
CUSTOM_RUN_CONFIG=/path/to/your/custom-run.yaml
LLAMA_STACK_PORT=8321
docker run \
-it \
--pull always \
  --gpus all \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
-v $CUSTOM_RUN_CONFIG:/app/custom-run.yaml \
-e RUN_CONFIG_PATH=/app/custom-run.yaml \
llamastack/distribution-meta-reference-gpu \
--port $LLAMA_STACK_PORT
```
**Note**: The run configuration must be mounted into the container before it can be used. The `-v` flag mounts your local file into the container, and the `RUN_CONFIG_PATH` environment variable tells the entrypoint script which configuration to use.
Available run configurations for this distribution:
- `run.yaml`
- `run-with-safety.yaml`
### Via venv
Make sure you have the Llama Stack CLI available.

View file

@ -127,13 +127,39 @@ docker run \
-it \
--pull always \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ./run.yaml:/root/my-run.yaml \
-v ~/.llama:/root/.llama \
-e NVIDIA_API_KEY=$NVIDIA_API_KEY \
llamastack/distribution-nvidia \
--config /root/my-run.yaml \
--port $LLAMA_STACK_PORT
```
### Via Docker with Custom Run Configuration
You can also run the Docker container with a custom run configuration file by mounting it into the container:
```bash
# Set the path to your custom run.yaml file
CUSTOM_RUN_CONFIG=/path/to/your/custom-run.yaml
LLAMA_STACK_PORT=8321
docker run \
-it \
--pull always \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
-v $CUSTOM_RUN_CONFIG:/app/custom-run.yaml \
-e RUN_CONFIG_PATH=/app/custom-run.yaml \
-e NVIDIA_API_KEY=$NVIDIA_API_KEY \
llamastack/distribution-nvidia \
--port $LLAMA_STACK_PORT
```
**Note**: The run configuration must be mounted into the container before it can be used. The `-v` flag mounts your local file into the container, and the `RUN_CONFIG_PATH` environment variable tells the entrypoint script which configuration to use.
Available run configurations for this distribution:
- `run.yaml`
- `run-with-safety.yaml`
### Via venv
If you've set up your local development environment, you can also install the distribution dependencies using your local virtual environment.

View file

@ -21,7 +21,6 @@ The `llamastack/distribution-passthrough` distribution consists of the following
| inference | `remote::passthrough`, `inline::sentence-transformers` |
| safety | `inline::llama-guard` |
| scoring | `inline::basic`, `inline::llm-as-judge`, `inline::braintrust` |
| telemetry | `inline::meta-reference` |
| tool_runtime | `remote::brave-search`, `remote::tavily-search`, `remote::wolfram-alpha`, `inline::rag-runtime`, `remote::model-context-protocol` |
| vector_io | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |

View file

@ -26,7 +26,6 @@ The starter distribution consists of the following provider configurations:
| inference | `remote::openai`, `remote::fireworks`, `remote::together`, `remote::ollama`, `remote::anthropic`, `remote::gemini`, `remote::groq`, `remote::sambanova`, `remote::vllm`, `remote::tgi`, `remote::cerebras`, `remote::llama-openai-compat`, `remote::nvidia`, `remote::hf::serverless`, `remote::hf::endpoint`, `inline::sentence-transformers` |
| safety | `inline::llama-guard` |
| scoring | `inline::basic`, `inline::llm-as-judge`, `inline::braintrust` |
| telemetry | `inline::meta-reference` |
| tool_runtime | `remote::brave-search`, `remote::tavily-search`, `inline::rag-runtime`, `remote::model-context-protocol` |
| vector_io | `inline::faiss`, `inline::sqlite-vec`, `inline::milvus`, `remote::chromadb`, `remote::pgvector` |
@ -117,10 +116,6 @@ The following environment variables can be configured:
- `BRAVE_SEARCH_API_KEY`: Brave Search API key
- `TAVILY_SEARCH_API_KEY`: Tavily Search API key
### Telemetry Configuration
- `OTEL_SERVICE_NAME`: OpenTelemetry service name
- `TELEMETRY_SINKS`: Telemetry sinks (default: `[]`)
## Enabling Providers
You can enable specific providers by setting appropriate environment variables. For example,
@ -164,7 +159,41 @@ docker run \
--port $LLAMA_STACK_PORT
```
### Via venv
The container will run the distribution with a SQLite store by default. This store is used for the following components:
- Metadata store: store metadata about the models, providers, etc.
- Inference store: a collection of responses from the inference provider
- Agents store: store agent configurations (sessions, turns, etc.)
- Agents Responses store: store responses from the agents
However, you can use PostgreSQL instead by running the `starter::run-with-postgres-store.yaml` configuration:
```bash
docker run \
-it \
--pull always \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-e OPENAI_API_KEY=your_openai_key \
-e FIREWORKS_API_KEY=your_fireworks_key \
-e TOGETHER_API_KEY=your_together_key \
-e POSTGRES_HOST=your_postgres_host \
-e POSTGRES_PORT=your_postgres_port \
-e POSTGRES_DB=your_postgres_db \
-e POSTGRES_USER=your_postgres_user \
-e POSTGRES_PASSWORD=your_postgres_password \
llamastack/distribution-starter \
starter::run-with-postgres-store.yaml
```
Postgres environment variables:
- `POSTGRES_HOST`: Postgres host (default: `localhost`)
- `POSTGRES_PORT`: Postgres port (default: `5432`)
- `POSTGRES_DB`: Postgres database name (default: `llamastack`)
- `POSTGRES_USER`: Postgres username (default: `llamastack`)
- `POSTGRES_PASSWORD`: Postgres password (default: `llamastack`)
### Via Conda or venv
Ensure you have configured the starter distribution using the environment variables explained above.
@ -172,8 +201,11 @@ Ensure you have configured the starter distribution using the environment variab
# Install dependencies for the starter distribution
uv run --with llama-stack llama stack list-deps starter | xargs -L1 uv pip install
# Run the server
# Run the server (with SQLite - default)
uv run --with llama-stack llama stack run starter
# Or run with PostgreSQL
uv run --with llama-stack llama stack run starter::run-with-postgres-store.yaml
```
## Example Usage
@ -229,7 +261,7 @@ The starter distribution uses SQLite for local storage of various components:
2. **Flexible Configuration**: Easy to enable/disable providers based on your needs
3. **No Local GPU Required**: Most providers are cloud-based, making it accessible to developers without high-end hardware
4. **Easy Migration**: Start with hosted providers and gradually move to local ones as needed
5. **Production Ready**: Includes safety, evaluation, and telemetry components
5. **Production Ready**: Includes safety and evaluation components
6. **Tool Integration**: Comes with web search, RAG, and model context protocol tools
The starter distribution is ideal for developers who want to experiment with different AI providers, build prototypes quickly, or create applications that can work with multiple AI backends.

View file

@ -27,7 +27,7 @@ If you have built a container image and want to deploy it in a Kubernetes cluste
Control log output via environment variables before starting the server; an example follows the list below.
- `LLAMA_STACK_LOGGING` sets per-component levels, e.g. `LLAMA_STACK_LOGGING=server=debug;core=info`.
- `LLAMA_STACK_LOGGING` sets per-component levels, e.g. `LLAMA_STACK_LOGGING=server=debug,core=info`.
- Supported categories: `all`, `core`, `server`, `router`, `inference`, `agents`, `safety`, `eval`, `tools`, `client`.
- Levels: `debug`, `info`, `warning`, `error`, `critical` (default is `info`). Use `all=<level>` to apply globally.
- `LLAMA_STACK_LOG_FILE=/path/to/log` mirrors logs to a file while still printing to stdout.
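For example, a minimal setup that turns on server debug logging and mirrors output to a file, using only the variables described above, could look like:

```bash
# Example values only; adjust categories, levels, and the log path to your setup.
export LLAMA_STACK_LOGGING="server=debug,core=info"
export LLAMA_STACK_LOG_FILE=/var/log/llama-stack.log
llama stack run starter  # or however you normally start the server
```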

View file

@ -144,7 +144,7 @@ source .venv/bin/activate
```bash
uv venv client --python 3.12
source client/bin/activate
pip install llama-stack-client
uv pip install llama-stack-client
```
</TabItem>
</Tabs>
@ -239,8 +239,13 @@ client = LlamaStackClient(base_url="http://localhost:8321")
models = client.models.list()
# Select the first LLM
llm = next(m for m in models if m.model_type == "llm" and m.provider_id == "ollama")
model_id = llm.identifier
llm = next(
m for m in models
if m.custom_metadata
and m.custom_metadata.get("model_type") == "llm"
and m.custom_metadata.get("provider_id") == "ollama"
)
model_id = llm.id
print("Model:", model_id)
@ -279,8 +284,13 @@ import uuid
client = LlamaStackClient(base_url=f"http://localhost:8321")
models = client.models.list()
llm = next(m for m in models if m.model_type == "llm" and m.provider_id == "ollama")
model_id = llm.identifier
llm = next(
m for m in models
if m.custom_metadata
and m.custom_metadata.get("model_type") == "llm"
and m.custom_metadata.get("provider_id") == "ollama"
)
model_id = llm.id
agent = Agent(client, model=model_id, instructions="You are a helpful assistant.")
@ -450,8 +460,11 @@ import uuid
client = LlamaStackClient(base_url="http://localhost:8321")
# Create a vector database instance
embed_lm = next(m for m in client.models.list() if m.model_type == "embedding")
embedding_model = embed_lm.identifier
embed_lm = next(
m for m in client.models.list()
if m.custom_metadata and m.custom_metadata.get("model_type") == "embedding"
)
embedding_model = embed_lm.id
vector_db_id = f"v{uuid.uuid4().hex}"
# The VectorDB API is deprecated; the server now returns its own authoritative ID.
# We capture the correct ID from the response's .identifier attribute.
@ -489,9 +502,11 @@ client.tool_runtime.rag_tool.insert(
llm = next(
m
for m in client.models.list()
if m.model_type == "llm" and m.provider_id == "ollama"
if m.custom_metadata
and m.custom_metadata.get("model_type") == "llm"
and m.custom_metadata.get("provider_id") == "ollama"
)
model = llm.identifier
model = llm.id
# Create the RAG agent
rag_agent = Agent(

View file

@ -24,6 +24,9 @@ ollama run llama3.2:3b --keepalive 60m
#### Step 2: Run the Llama Stack server
```python file=./demo_script.py title="demo_script.py"
```
We will use `uv` to install dependencies and run the Llama Stack server.
```bash
# Install dependencies for the starter distribution
@ -35,27 +38,6 @@ OLLAMA_URL=http://localhost:11434 uv run --with llama-stack llama stack run star
#### Step 3: Run the demo
Now open up a new terminal and copy the following script into a file named `demo_script.py`.
```python
import io, requests
from openai import OpenAI
url="https://www.paulgraham.com/greatwork.html"
client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none")
vs = client.vector_stores.create()
response = requests.get(url)
pseudo_file = io.BytesIO(str(response.content).encode('utf-8'))
uploaded_file = client.files.create(file=(url, pseudo_file, "text/html"), purpose="assistants")
client.vector_stores.files.create(vector_store_id=vs.id, file_id=uploaded_file.id)
resp = client.responses.create(
model="openai/gpt-4o",
input="How do you do great work? Use the existing knowledge_search tool.",
tools=[{"type": "file_search", "vector_store_ids": [vs.id]}],
include=["file_search_call.results"],
)
We will use `uv` to run the script
```
uv run --with llama-stack-client,fire,requests demo_script.py

View file

@ -29,7 +29,7 @@ Llama Stack is now available! See the [release notes](https://github.com/llamast
Llama Stack defines and standardizes the core building blocks needed to bring generative AI applications to market. It provides a unified set of APIs with implementations from leading service providers, enabling seamless transitions between development and production environments. More specifically, it provides:
- **Unified API layer** for Inference, RAG, Agents, Tools, Safety, Evals, and Telemetry.
- **Unified API layer** for Inference, RAG, Agents, Tools, Safety, Evals.
- **Plugin architecture** to support the rich ecosystem of implementations of the different APIs in different environments like local development, on-premises, cloud, and mobile.
- **Prepackaged verified distributions** which offer a one-stop solution for developers to get started quickly and reliably in any environment
- **Multiple developer interfaces** like CLI and SDKs for Python, Node, iOS, and Android

View file

@ -1,7 +1,8 @@
---
description: "Agents
description: |
Agents
APIs for creating and interacting with agentic systems."
APIs for creating and interacting with agentic systems.
sidebar_label: Agents
title: Agents
---
@ -12,6 +13,6 @@ title: Agents
Agents
APIs for creating and interacting with agentic systems.
APIs for creating and interacting with agentic systems.
This section contains documentation for all available providers for the **agents** API.

View file

@ -14,7 +14,7 @@ Meta's reference implementation of an agent system that can use tools, access ve
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `persistence` | `<class 'inline.agents.meta_reference.config.AgentPersistenceConfig'>` | No | | |
| `persistence` | `AgentPersistenceConfig` | No | | |
## Sample Configuration

View file

@ -1,14 +1,15 @@
---
description: "The Batches API enables efficient processing of multiple requests in a single operation,
particularly useful for processing large datasets, batch evaluation workflows, and
cost-effective inference at scale.
description: |
The Batches API enables efficient processing of multiple requests in a single operation,
particularly useful for processing large datasets, batch evaluation workflows, and
cost-effective inference at scale.
The API is designed to allow use of openai client libraries for seamless integration.
The API is designed to allow use of openai client libraries for seamless integration.
This API provides the following extensions:
- idempotent batch creation
This API provides the following extensions:
- idempotent batch creation
Note: This API is currently under active development and may undergo changes."
Note: This API is currently under active development and may undergo changes.
sidebar_label: Batches
title: Batches
---
@ -18,14 +19,14 @@ title: Batches
## Overview
The Batches API enables efficient processing of multiple requests in a single operation,
particularly useful for processing large datasets, batch evaluation workflows, and
cost-effective inference at scale.
particularly useful for processing large datasets, batch evaluation workflows, and
cost-effective inference at scale.
The API is designed to allow use of openai client libraries for seamless integration.
The API is designed to allow use of openai client libraries for seamless integration.
This API provides the following extensions:
- idempotent batch creation
This API provides the following extensions:
- idempotent batch creation
Note: This API is currently under active development and may undergo changes.
Note: This API is currently under active development and may undergo changes.
This section contains documentation for all available providers for the **batches** API.

View file

@ -14,9 +14,9 @@ Reference implementation of batches API with KVStore persistence.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `kvstore` | `<class 'llama_stack.core.storage.datatypes.KVStoreReference'>` | No | | Configuration for the key-value store backend. |
| `max_concurrent_batches` | `<class 'int'>` | No | 1 | Maximum number of concurrent batches to process simultaneously. |
| `max_concurrent_requests_per_batch` | `<class 'int'>` | No | 10 | Maximum number of concurrent requests to process per batch. |
| `kvstore` | `KVStoreReference` | No | | Configuration for the key-value store backend. |
| `max_concurrent_batches` | `int` | No | 1 | Maximum number of concurrent batches to process simultaneously. |
| `max_concurrent_requests_per_batch` | `int` | No | 10 | Maximum number of concurrent requests to process per batch. |
## Sample Configuration

View file

@ -14,7 +14,7 @@ Local filesystem-based dataset I/O provider for reading and writing datasets to
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `kvstore` | `<class 'llama_stack.core.storage.datatypes.KVStoreReference'>` | No | | |
| `kvstore` | `KVStoreReference` | No | | |
## Sample Configuration

View file

@ -14,7 +14,7 @@ HuggingFace datasets provider for accessing and managing datasets from the Huggi
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `kvstore` | `<class 'llama_stack.core.storage.datatypes.KVStoreReference'>` | No | | |
| `kvstore` | `KVStoreReference` | No | | |
## Sample Configuration

View file

@ -17,7 +17,7 @@ NVIDIA's dataset I/O provider for accessing datasets from NVIDIA's data platform
| `api_key` | `str \| None` | No | | The NVIDIA API key. |
| `dataset_namespace` | `str \| None` | No | default | The NVIDIA dataset namespace. |
| `project_id` | `str \| None` | No | test-project | The NVIDIA project ID. |
| `datasets_url` | `<class 'str'>` | No | http://nemo.test | Base URL for the NeMo Dataset API |
| `datasets_url` | `str` | No | http://nemo.test | Base URL for the NeMo Dataset API |
## Sample Configuration

View file

@ -1,7 +1,8 @@
---
description: "Evaluations
description: |
Evaluations
Llama Stack Evaluation API for running evaluations on model and agent candidates."
Llama Stack Evaluation API for running evaluations on model and agent candidates.
sidebar_label: Eval
title: Eval
---
@ -12,6 +13,6 @@ title: Eval
Evaluations
Llama Stack Evaluation API for running evaluations on model and agent candidates.
Llama Stack Evaluation API for running evaluations on model and agent candidates.
This section contains documentation for all available providers for the **eval** API.

View file

@ -14,7 +14,7 @@ Meta's reference implementation of evaluation tasks with support for multiple la
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `kvstore` | `<class 'llama_stack.core.storage.datatypes.KVStoreReference'>` | No | | |
| `kvstore` | `KVStoreReference` | No | | |
## Sample Configuration

View file

@ -14,7 +14,7 @@ NVIDIA's evaluation provider for running evaluation tasks on NVIDIA's platform.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `evaluator_url` | `<class 'str'>` | No | http://0.0.0.0:7331 | The url for accessing the evaluator service |
| `evaluator_url` | `str` | No | http://0.0.0.0:7331 | The url for accessing the evaluator service |
## Sample Configuration

View file

@ -80,7 +80,7 @@ container_image: custom-vector-store:latest # optional
All providers must contain a `get_provider_spec` function in their `provider` module. This is a standardized structure that Llama Stack expects and is necessary for getting things such as the config class. The `get_provider_spec` method returns a structure identical to the `adapter`. An example function may look like:
```python
from llama_stack.providers.datatypes import (
from llama_stack_api.providers.datatypes import (
ProviderSpec,
Api,
RemoteProviderSpec,

View file

@ -0,0 +1,290 @@
---
sidebar_label: Files
title: Files
---
## Overview
The Files API provides file management capabilities for Llama Stack. It allows you to upload, store, retrieve, and manage files that can be used across various endpoints in your application.
## Features
- **File Upload**: Upload files with metadata and purpose classification
- **File Management**: List, retrieve, and delete files
- **Content Retrieval**: Access raw file content for processing
- **API Compatibility**: Full compatibility with OpenAI Files API endpoints
- **Flexible Storage**: Support for local filesystem and cloud storage backends
## API Endpoints
### Upload File
**POST** `/v1/openai/v1/files`
Upload a file that can be used across various endpoints.
**Request Body:**
- `file`: The file object to be uploaded (multipart form data)
- `purpose`: The intended purpose of the uploaded file
**Supported Purposes:**
- `batch`: Files for batch operations
**Response:**
```json
{
"id": "file-abc123",
"object": "file",
"bytes": 140,
"created_at": 1613779121,
"filename": "mydata.jsonl",
"purpose": "batch"
}
```
**Example:**
```python
import requests
with open("data.jsonl", "rb") as f:
files = {"file": f}
data = {"purpose": "batch"}
response = requests.post(
"http://localhost:8000/v1/openai/v1/files", files=files, data=data
)
file_info = response.json()
```
### List Files
**GET** `/v1/openai/v1/files`
Returns a list of files that belong to the user's organization.
**Query Parameters:**
- `after` (optional): A cursor for pagination
- `limit` (optional): Limit on number of objects (1-10,000, default: 10,000)
- `order` (optional): Sort order by created_at timestamp (`asc` or `desc`, default: `desc`)
- `purpose` (optional): Filter files by purpose
**Response:**
```json
{
"object": "list",
"data": [
{
"id": "file-abc123",
"object": "file",
"bytes": 140,
"created_at": 1613779121,
"filename": "mydata.jsonl",
"purpose": "fine-tune"
}
],
"has_more": false
}
```
**Example:**
```python
import requests
# List all files
response = requests.get("http://localhost:8000/v1/openai/v1/files")
files = response.json()
# List files with pagination
response = requests.get(
    "http://localhost:8000/v1/openai/v1/files",
    params={"limit": 10, "after": "file-abc123"},
)
files = response.json()
# Filter by purpose
response = requests.get(
    "http://localhost:8000/v1/openai/v1/files", params={"purpose": "fine-tune"}
)
files = response.json()
```
### Retrieve File
**GET** `/v1/openai/v1/files/{file_id}`
Returns information about a specific file.
**Path Parameters:**
- `file_id`: The ID of the file to retrieve
**Response:**
```json
{
"id": "file-abc123",
"object": "file",
"bytes": 140,
"created_at": 1613779121,
"filename": "mydata.jsonl",
"purpose": "fine-tune"
}
```
**Example:**
```python
import requests
file_id = "file-abc123"
response = requests.get(f"http://localhost:8000/v1/openAi/v1/files/{file_id}")
file_info = response.json()
```
### Delete File
**DELETE** `/v1/openai/v1/files/{file_id}`
Delete a file.
**Path Parameters:**
- `file_id`: The ID of the file to delete
**Response:**
```json
{
"id": "file-abc123",
"object": "file",
"deleted": true
}
```
**Example:**
```python
import requests
file_id = "file-abc123"
response = requests.delete(f"http://localhost:8000/v1/openai/v1/files/{file_id}")
result = response.json()
```
### Retrieve File Content
**GET** `/v1/openai/v1/files/{file_id}/content`
Returns the raw file content as a binary response.
**Path Parameters:**
- `file_id`: The ID of the file to retrieve content from
**Response:**
Binary file content with appropriate headers:
- `Content-Type`: `application/octet-stream`
- `Content-Disposition`: `attachment; filename="filename"`
**Example:**
```python
import requests
file_id = "file-abc123"
response = requests.get(f"http://localhost:8000/v1/openai/v1/files/{file_id}/content")
# Save content to file
with open("downloaded_file.jsonl", "wb") as f:
f.write(response.content)
# Or process content directly
content = response.content
```
## Vector Store Integration
The Files API integrates with Vector Stores to enable document processing and search. For detailed information about this integration, see [File Operations and Vector Store Integration](../concepts/file_operations_vector_stores.md).
### Vector Store File Operations
**List Vector Store Files:**
- **GET** `/v1/openai/v1/vector_stores/{vector_store_id}/files`
**Retrieve Vector Store File Content:**
- **GET** `/v1/openai/v1/vector_stores/{vector_store_id}/files/{file_id}/content`
**Attach File to Vector Store:**
- **POST** `/v1/openai/v1/vector_stores/{vector_store_id}/files`
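For a quick illustration, the sketch below exercises these endpoints with `requests`, matching the style of the earlier examples. The store ID `vs_abc123` and file ID are placeholders, and the `file_id` request body is an assumption based on the OpenAI Vector Store Files convention this integration follows.
```python
import requests

BASE_URL = "http://localhost:8000/v1/openai/v1"
vector_store_id = "vs_abc123"  # placeholder: ID of an existing vector store
file_id = "file-abc123"        # placeholder: ID of an uploaded file

# Attach an uploaded file to the vector store
# (body shape assumed to follow the OpenAI Vector Store Files convention)
attach = requests.post(
    f"{BASE_URL}/vector_stores/{vector_store_id}/files",
    json={"file_id": file_id},
)
print(attach.json())

# List files attached to the vector store
listing = requests.get(f"{BASE_URL}/vector_stores/{vector_store_id}/files")
print(listing.json())

# Retrieve the processed content of an attached file
content = requests.get(
    f"{BASE_URL}/vector_stores/{vector_store_id}/files/{file_id}/content"
)
print(content.json())
```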
## Error Handling
The Files API returns standard HTTP status codes and error responses:
- `400 Bad Request`: Invalid request parameters
- `404 Not Found`: File not found
- `429 Too Many Requests`: Rate limit exceeded
- `500 Internal Server Error`: Server error
**Error Response Format:**
```json
{
"error": {
"message": "Error description",
"type": "invalid_request_error",
"code": "file_not_found"
}
}
```
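A minimal sketch of handling these responses in client code, using the same `requests` style as the examples above (the file ID is a placeholder):
```python
import requests

response = requests.get("http://localhost:8000/v1/openai/v1/files/file-abc123")

if response.ok:
    file_info = response.json()
elif response.status_code == 404:
    print("File not found")
else:
    # Fall back to the structured error body shown above
    error = response.json().get("error", {})
    print(f"{response.status_code}: {error.get('type')} - {error.get('message')}")
```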
## Rate Limits
The Files API implements rate limiting to ensure fair usage:
- File uploads: 100 files per minute
- File retrievals: 1000 requests per minute
- File deletions: 100 requests per minute
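When a limit is exceeded the server responds with `429 Too Many Requests`, so a simple retry with exponential backoff is usually sufficient; a minimal sketch (the backoff values are illustrative, not prescribed by the API):
```python
import time

import requests


def get_with_backoff(url, max_retries=5):
    """Retry a GET request on 429 responses, doubling the delay each time."""
    delay = 1.0
    response = requests.get(url)
    for _ in range(max_retries):
        if response.status_code != 429:
            break
        time.sleep(delay)
        delay *= 2  # 1s, 2s, 4s, ...
        response = requests.get(url)
    return response


files = get_with_backoff("http://localhost:8000/v1/openai/v1/files").json()
```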
## Best Practices
1. **File Organization**: Use descriptive filenames and appropriate purpose classifications
2. **Batch Operations**: For multiple files, consider using batch endpoints when available
3. **Error Handling**: Always check response status codes and handle errors gracefully
4. **Content Types**: Ensure files are uploaded with appropriate content types
5. **Cleanup**: Regularly delete unused files to manage storage costs
## Integration Examples
### With Python Client
```python
from llama_stack import LlamaStackClient
client = LlamaStackClient("http://localhost:8000")

# Note: the `await` calls below assume an async-capable client and an async
# context (for example, code running inside an `async def`).
# Upload a file
with open("data.jsonl", "rb") as f:
file_info = await client.files.upload(file=f, purpose="fine-tune")
# List files
files = await client.files.list(purpose="fine-tune")
# Retrieve file content
content = await client.files.retrieve_content(file_info.id)
```
### With cURL
```bash
# Upload file
curl -X POST http://localhost:8000/v1/openai/v1/files \
  -F "file=@data.jsonl" \
  -F "purpose=fine-tune"
# List files
curl http://localhost:8000/v1/openai/v1/files
# Download file content
curl http://localhost:8000/v1/openai/v1/files/file-abc123/content \
-o downloaded_file.jsonl
```
## Provider Support
The Files API supports multiple storage backends:
- **Local Filesystem**: Store files on local disk (inline provider)
- **S3**: Store files in AWS S3 or S3-compatible services (remote provider)
- **Custom Backends**: Extensible architecture for custom storage providers
See the [Files Providers](index.md) documentation for detailed configuration options.

View file

@ -1,7 +1,8 @@
---
description: "Files
description: |
Files
This API is used to upload documents that can be used with other Llama Stack APIs."
This API is used to upload documents that can be used with other Llama Stack APIs.
sidebar_label: Files
title: Files
---
@ -12,6 +13,6 @@ title: Files
Files
This API is used to upload documents that can be used with other Llama Stack APIs.
This API is used to upload documents that can be used with other Llama Stack APIs.
This section contains documentation for all available providers for the **files** API.

View file

@ -14,9 +14,9 @@ Local filesystem-based file storage provider for managing files and documents lo
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `storage_dir` | `<class 'str'>` | No | | Directory to store uploaded files |
| `metadata_store` | `<class 'llama_stack.core.storage.datatypes.SqlStoreReference'>` | No | | SQL store configuration for file metadata |
| `ttl_secs` | `<class 'int'>` | No | 31536000 | |
| `storage_dir` | `str` | No | | Directory to store uploaded files |
| `metadata_store` | `SqlStoreReference` | No | | SQL store configuration for file metadata |
| `ttl_secs` | `int` | No | 31536000 | |
## Sample Configuration

View file

@ -0,0 +1,80 @@
# File Operations Quick Reference
## Overview
As of release 0.2.14, Llama Stack provides comprehensive file operations and Vector Store API integration, following the [OpenAI Vector Store Files API specification](https://platform.openai.com/docs/api-reference/vector-stores-files).
> **Note**: For detailed overview and implementation details, see [Overview](../openai_file_operations_support.md#overview) in the full documentation.
## Supported Providers
> **Note**: For complete provider details and features, see [Supported Providers](../openai_file_operations_support.md#supported-providers) in the full documentation.
**Inline Providers**: FAISS, SQLite-vec, Milvus
**Remote Providers**: ChromaDB, Qdrant, Weaviate, PGVector
## Quick Start
### 1. Upload File
```python
file_info = await client.files.upload(
file=open("document.pdf", "rb"), purpose="assistants"
)
```
### 2. Create Vector Store
```python
vector_store = client.vector_stores.create(name="my_docs")
```
### 3. Attach File
```python
await client.vector_stores.files.create(
vector_store_id=vector_store.id, file_id=file_info.id
)
```
### 4. Search
```python
results = await client.vector_stores.search(
vector_store_id=vector_store.id, query="What is the main topic?", max_num_results=5
)
```
## File Processing & Search
**Processing**: 800 tokens default chunk size, 400 token overlap
**Formats**: PDF, DOCX, TXT, Code files, etc.
**Search**: Vector similarity, Hybrid (SQLite-vec), Filtered with metadata
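Building on the Quick Start above, a search can also be narrowed by file metadata. The `filters` argument in the sketch below is an assumption based on the OpenAI Vector Store search conventions this integration mirrors; check your provider for the exact supported shape.
```python
results = await client.vector_stores.search(
    vector_store_id=vector_store.id,
    query="What is the main topic?",
    max_num_results=5,
    # Assumed OpenAI-style comparison filter; supported keys depend on the provider
    filters={"type": "eq", "key": "document_type", "value": "report"},
)
```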
## Configuration
> **Note**: For detailed configuration examples and options, see [Configuration Examples](../openai_file_operations_support.md#configuration-examples) in the full documentation.
**Basic Setup**: Configure vector_io and files providers in your run.yaml
## Common Use Cases
- **RAG Systems**: Document Q&A with file uploads
- **Knowledge Bases**: Searchable document collections
- **Content Analysis**: Document similarity and clustering
- **Research Tools**: Literature review and analysis
## Performance Tips
> **Note**: For detailed performance optimization strategies, see [Performance Considerations](../openai_file_operations_support.md#performance-considerations) in the full documentation.
**Quick Tips**: Choose a provider based on your needs (speed vs. storage vs. scalability)
## Troubleshooting
> **Note**: For comprehensive troubleshooting, see [Troubleshooting](../openai_file_operations_support.md#troubleshooting) in the full documentation.
**Quick Fixes**: Check file format compatibility, optimize chunk sizes, monitor storage
## Resources
- [Full Documentation](openai_file_operations_support.md)
- [Integration Guide](../concepts/file_operations_vector_stores.md)
- [Files API](files_api.md)
- [Provider Details](../vector_io/index.md)

View file

@ -0,0 +1,291 @@
# File Operations Support in Vector Store Providers
## Overview
This document provides a comprehensive overview of file operations and Vector Store API support across all available vector store providers in Llama Stack. As of release 0.2.24, the following providers support full file operations integration.
## Supported Providers
### ✅ Full File Operations Support
The following providers support complete file operations integration, including file upload, automatic processing, and search:
#### Inline Providers (Single Node)
| Provider | File Operations | Key Features |
|----------|----------------|--------------|
| **FAISS** | ✅ Full Support | Fast in-memory search, GPU acceleration |
| **SQLite-vec** | ✅ Full Support | Hybrid search, disk-based storage |
| **Milvus** | ✅ Full Support | High-performance, scalable indexing |
#### Remote Providers (Hosted)
| Provider | File Operations | Key Features |
|----------|----------------|--------------|
| **ChromaDB** | ✅ Full Support | Metadata filtering, persistent storage |
| **Qdrant** | ✅ Full Support | Payload filtering, advanced search |
| **Weaviate** | ✅ Full Support | GraphQL interface, schema management |
| **Postgres (PGVector)** | ✅ Full Support | SQL integration, ACID compliance |
### 🔄 Partial Support
Some providers may support basic vector operations but lack full file operations integration:
| Provider | Status | Notes |
|----------|--------|-------|
| **Meta Reference** | 🔄 Basic | Core vector operations only |
## File Operations Features
All supported providers offer the following file operations capabilities:
### Core Functionality
- **File Upload & Processing**: Automatic document ingestion and chunking
- **Vector Storage**: Embedding generation and storage
- **Search & Retrieval**: Semantic search with metadata filtering
- **File Management**: List, retrieve, and manage files in vector stores
### Advanced Features
- **Automatic Chunking**: Configurable chunk sizes and overlap
- **Metadata Preservation**: File attributes and chunk metadata
- **Status Tracking**: Monitor file processing progress
- **Error Handling**: Comprehensive error reporting and recovery
## Implementation Details
### File Processing Pipeline
1. **Upload**: File uploaded via Files API
2. **Extraction**: Text content extracted from various formats
3. **Chunking**: Content split into optimal chunks (default: 800 tokens)
4. **Embedding**: Chunks converted to vector embeddings
5. **Storage**: Vectors stored with metadata in vector database
6. **Indexing**: Search index updated for fast retrieval
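Steps 2-6 run asynchronously after the upload, so callers typically poll the attached file's status until processing finishes. A minimal sketch, reusing the `client`, `vector_store`, and `file_info` objects from the Python usage example later in this document (the `completed` status value is assumed from the OpenAI convention; `failed` and `last_error` appear in the monitoring example below):
```python
import asyncio

# Poll until the file has moved through extraction, chunking, embedding, and indexing
while True:
    vs_file = await client.vector_stores.files.retrieve(
        vector_store_id=vector_store.id, file_id=file_info.id
    )
    if vs_file.status in ("completed", "failed"):
        break
    await asyncio.sleep(1)  # processing happens in the background; check again shortly

if vs_file.status == "failed":
    print(f"Processing failed: {vs_file.last_error.message}")
```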
### Supported File Formats
- **Documents**: PDF, DOCX, DOC
- **Text**: TXT, MD, RST
- **Code**: Python, JavaScript, Java, C++, etc.
- **Data**: JSON, CSV, XML
- **Web**: HTML files
### Chunking Strategies
- **Default**: 800 tokens with 400 token overlap
- **Custom**: Configurable chunk sizes and overlap
- **Static**: Fixed-size chunks with overlap
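As an illustration of the custom option, a chunking strategy can be supplied when attaching a file to a vector store. The `chunking_strategy` shape below is an assumption based on the OpenAI Vector Store Files specification these providers follow, reusing the objects from the usage example later in this document.
```python
await client.vector_stores.files.create(
    vector_store_id=vector_store.id,
    file_id=file_info.id,
    # Assumed OpenAI-style static chunking strategy; field names may vary by release
    chunking_strategy={
        "type": "static",
        "static": {
            "max_chunk_size_tokens": 600,  # smaller chunks favor precision
            "chunk_overlap_tokens": 300,   # ~50% overlap preserves context
        },
    },
)
```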
## Provider-Specific Features
### FAISS
- **Storage**: In-memory with optional persistence
- **Performance**: Optimized for speed and GPU acceleration
- **Use Case**: High-performance, memory-constrained environments
### SQLite-vec
- **Storage**: Disk-based with SQLite backend
- **Search**: Hybrid vector + keyword search
- **Use Case**: Large document collections, frequent updates
### Milvus
- **Storage**: Scalable distributed storage
- **Indexing**: Multiple index types (IVF, HNSW)
- **Use Case**: Production deployments, large-scale applications
### ChromaDB
- **Storage**: Persistent storage with metadata
- **Filtering**: Advanced metadata filtering
- **Use Case**: Applications requiring rich metadata
### Qdrant
- **Storage**: High-performance vector database
- **Filtering**: Payload-based filtering
- **Use Case**: Real-time applications, complex queries
### Weaviate
- **Storage**: GraphQL-native vector database
- **Schema**: Flexible schema management
- **Use Case**: Applications requiring complex data relationships
### Postgres (PGVector)
- **Storage**: SQL database with vector extensions
- **Integration**: ACID compliance, existing SQL workflows
- **Use Case**: Applications requiring transactional guarantees
## Configuration Examples
### Basic Configuration
```yaml
vector_io:
- provider_id: faiss
provider_type: inline::faiss
config:
kvstore:
type: sqlite
db_path: ~/.llama/faiss_store.db
```
### With FileResponse Support
```yaml
vector_io:
- provider_id: faiss
provider_type: inline::faiss
config:
kvstore:
type: sqlite
db_path: ~/.llama/faiss_store.db
files:
- provider_id: local-files
provider_type: inline::localfs
config:
storage_dir: ~/.llama/files
metadata_store:
type: sqlite
db_path: ~/.llama/files_metadata.db
```
## Usage Examples
### Python Client
```python
from llama_stack import LlamaStackClient
client = LlamaStackClient("http://localhost:8000")

# Note: the `await` calls below assume an async-capable client and an async
# context (for example, code running inside an `async def`).
# Create vector store
vector_store = client.vector_stores.create(name="documents")
# Upload and process file
with open("document.pdf", "rb") as f:
file_info = await client.files.upload(file=f, purpose="assistants")
# Attach to vector store
await client.vector_stores.files.create(
vector_store_id=vector_store.id, file_id=file_info.id
)
# Search
results = await client.vector_stores.search(
vector_store_id=vector_store.id, query="What is the main topic?", max_num_results=5
)
```
### cURL Commands
```bash
# Upload file
curl -X POST http://localhost:8000/v1/openai/v1/files \
-F "file=@document.pdf" \
-F "purpose=assistants"
# Create vector store
curl -X POST http://localhost:8000/v1/openai/v1/vector_stores \
-H "Content-Type: application/json" \
-d '{"name": "documents"}'
# Attach file to vector store
curl -X POST http://localhost:8000/v1/openai/v1/vector_stores/{store_id}/files \
-H "Content-Type: application/json" \
-d '{"file_id": "file-abc123"}'
# Search vector store
curl -X POST http://localhost:8000/v1/openai/v1/vector_stores/{store_id}/search \
-H "Content-Type: application/json" \
-d '{"query": "What is the main topic?", "max_num_results": 5}'
```
## Performance Considerations
### Chunk Size Optimization
- **Small chunks (400-600 tokens)**: Better precision, more results
- **Large chunks (800-1200 tokens)**: Better context, fewer results
- **Overlap (50%)**: Maintains context between chunks
### Storage Efficiency
- **FAISS**: Fastest, but memory-limited
- **SQLite-vec**: Good balance of performance and storage
- **Milvus**: Scalable, production-ready
- **Remote providers**: Managed, but network-dependent
### Search Performance
- **Vector search**: Fastest for semantic queries
- **Hybrid search**: Best accuracy (SQLite-vec only)
- **Filtered search**: Fast with metadata constraints
## Troubleshooting
### Common Issues
1. **File Processing Failures**
- Check file format compatibility
- Verify file size limits
- Review error messages in file status
2. **Search Performance**
- Optimize chunk sizes for your use case
- Use filters to narrow search scope
- Monitor vector store metrics
3. **Storage Issues**
- Check available disk space
- Verify database permissions
- Monitor memory usage (for in-memory providers)
### Monitoring
```python
# Check file processing status
file_status = await client.vector_stores.files.retrieve(
vector_store_id=vector_store.id, file_id=file_info.id
)
if file_status.status == "failed":
print(f"Error: {file_status.last_error.message}")
# Monitor vector store health
health = await client.vector_stores.health(vector_store_id=vector_store.id)
print(f"Status: {health.status}")
```
## Best Practices
1. **File Organization**: Use descriptive names and organize by purpose
2. **Chunking Strategy**: Test different sizes for your specific use case
3. **Metadata**: Add relevant attributes for better filtering
4. **Monitoring**: Track processing status and search performance
5. **Cleanup**: Regularly remove unused files to manage storage
## Future Enhancements
Planned improvements for file operations support:
- **Batch Processing**: Process multiple files simultaneously
- **Advanced Chunking**: More sophisticated chunking algorithms
- **Custom Embeddings**: Support for custom embedding models
- **Real-time Updates**: Live file processing and indexing
- **Multi-format Support**: Enhanced file format support
## Support and Resources
- **Documentation**: [File Operations and Vector Store Integration](../../concepts/file_operations_vector_stores.mdx)
- **API Reference**: [Files API](files_api.md)
- **Provider Docs**: [Vector Store Providers](../vector_io/index.md)
- **Examples**: [Getting Started](../getting_started/index.md)
- **Community**: [GitHub Discussions](https://github.com/meta-llama/llama-stack/discussions)

View file

@ -0,0 +1,27 @@
---
description: "OpenAI Files API provider for managing files through OpenAI's native file storage service."
sidebar_label: Remote - Openai
title: remote::openai
---
# remote::openai
## Description
OpenAI Files API provider for managing files through OpenAI's native file storage service.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str` | No | | OpenAI API key for authentication |
| `metadata_store` | `SqlStoreReference` | No | | SQL store configuration for file metadata |
## Sample Configuration
```yaml
api_key: ${env.OPENAI_API_KEY}
metadata_store:
table_name: openai_files_metadata
backend: sql_default
```

View file

@ -14,13 +14,13 @@ AWS S3-based file storage provider for scalable cloud file management with metad
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `bucket_name` | `<class 'str'>` | No | | S3 bucket name to store files |
| `region` | `<class 'str'>` | No | us-east-1 | AWS region where the bucket is located |
| `bucket_name` | `str` | No | | S3 bucket name to store files |
| `region` | `str` | No | us-east-1 | AWS region where the bucket is located |
| `aws_access_key_id` | `str \| None` | No | | AWS access key ID (optional if using IAM roles) |
| `aws_secret_access_key` | `str \| None` | No | | AWS secret access key (optional if using IAM roles) |
| `endpoint_url` | `str \| None` | No | | Custom S3 endpoint URL (for MinIO, LocalStack, etc.) |
| `auto_create_bucket` | `<class 'bool'>` | No | False | Automatically create the S3 bucket if it doesn't exist |
| `metadata_store` | `<class 'llama_stack.core.storage.datatypes.SqlStoreReference'>` | No | | SQL store configuration for file metadata |
| `auto_create_bucket` | `bool` | No | False | Automatically create the S3 bucket if it doesn't exist |
| `metadata_store` | `SqlStoreReference` | No | | SQL store configuration for file metadata |
## Sample Configuration

View file

@ -22,15 +22,25 @@ Importantly, Llama Stack always strives to provide at least one fully inline pro
## Provider Categories
- **[External Providers](external/index.mdx)** - Guide for building and using external providers
- **[OpenAI Compatibility](../api-openai/index.mdx)** - OpenAI API compatibility layer
- **[Inference](inference/index.mdx)** - LLM and embedding model providers
- **[Agents](agents/index.mdx)** - Agentic system providers
- **[DatasetIO](datasetio/index.mdx)** - Dataset and data loader providers
- **[Safety](safety/index.mdx)** - Content moderation and safety providers
- **[Telemetry](telemetry/index.mdx)** - Monitoring and observability providers
- **[Vector IO](vector_io/index.mdx)** - Vector database providers
- **[Tool Runtime](tool_runtime/index.mdx)** - Tool and protocol providers
- **[Files](files/index.mdx)** - File system and storage providers
## Other information about Providers
- **[OpenAI Compatibility](./openai.mdx)** - OpenAI API compatibility layer
## API Documentation
For comprehensive API documentation and reference:
- **[API Reference](../api/index.mdx)** - Complete API documentation
- **[Experimental APIs](../api-experimental/index.mdx)** - APIs in development
- **[Deprecated APIs](../api-deprecated/index.mdx)** - Legacy APIs being phased out
- **[OpenAI Compatibility](../api-openai/index.mdx)** - OpenAI API compatibility guide
## Additional Provider Information
- **[OpenAI Implementation Guide](./openai.mdx)** - Code examples and implementation details for OpenAI APIs
- **[OpenAI-Compatible Responses Limitations](./openai_responses_limitations.mdx)** - Known limitations of the Responses API in Llama Stack

View file

@ -1,11 +1,13 @@
---
description: "Inference
description: |
Inference
Llama Stack Inference API for generating completions, chat completions, and embeddings.
Llama Stack Inference API for generating completions, chat completions, and embeddings.
This API provides the raw interface to the underlying models. Two kinds of models are supported:
- LLM models: these models generate \"raw\" and \"chat\" (conversational) completions.
- Embedding models: these models generate embeddings to be used for semantic search."
This API provides the raw interface to the underlying models. Three kinds of models are supported:
- LLM models: these models generate "raw" and "chat" (conversational) completions.
- Embedding models: these models generate embeddings to be used for semantic search.
- Rerank models: these models reorder the documents based on their relevance to a query.
sidebar_label: Inference
title: Inference
---
@ -16,10 +18,11 @@ title: Inference
Inference
Llama Stack Inference API for generating completions, chat completions, and embeddings.
Llama Stack Inference API for generating completions, chat completions, and embeddings.
This API provides the raw interface to the underlying models. Two kinds of models are supported:
- LLM models: these models generate "raw" and "chat" (conversational) completions.
- Embedding models: these models generate embeddings to be used for semantic search.
This API provides the raw interface to the underlying models. Three kinds of models are supported:
- LLM models: these models generate "raw" and "chat" (conversational) completions.
- Embedding models: these models generate embeddings to be used for semantic search.
- Rerank models: these models reorder the documents based on their relevance to a query.
This section contains documentation for all available providers for the **inference** API.

View file

@ -16,12 +16,12 @@ Meta's reference implementation of inference with support for various model form
|-------|------|----------|---------|-------------|
| `model` | `str \| None` | No | | |
| `torch_seed` | `int \| None` | No | | |
| `max_seq_len` | `<class 'int'>` | No | 4096 | |
| `max_batch_size` | `<class 'int'>` | No | 1 | |
| `max_seq_len` | `int` | No | 4096 | |
| `max_batch_size` | `int` | No | 1 | |
| `model_parallel_size` | `int \| None` | No | | |
| `create_distributed_process_group` | `<class 'bool'>` | No | True | |
| `create_distributed_process_group` | `bool` | No | True | |
| `checkpoint_dir` | `str \| None` | No | | |
| `quantization` | `Bf16QuantizationConfig \| Fp8QuantizationConfig \| Int4QuantizationConfig, annotation=NoneType, required=True, discriminator='type'` | No | | |
| `quantization` | `Bf16QuantizationConfig \| Fp8QuantizationConfig \| Int4QuantizationConfig \| None` | No | | |
## Sample Configuration

View file

@ -14,9 +14,9 @@ Anthropic inference provider for accessing Claude models and Anthropic's AI serv
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
## Sample Configuration

View file

@ -21,10 +21,10 @@ https://learn.microsoft.com/en-us/azure/ai-foundry/openai/overview
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
| `api_base` | `<class 'pydantic.networks.HttpUrl'>` | No | | Azure API base for Azure (e.g., https://your-resource-name.openai.azure.com) |
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
| `base_url` | `HttpUrl \| None` | No | | Azure API base for Azure (e.g., https://your-resource-name.openai.azure.com/openai/v1) |
| `api_version` | `str \| None` | No | | Azure API version for Azure (e.g., 2024-12-01-preview) |
| `api_type` | `str \| None` | No | azure | Azure API type for Azure (e.g., azure) |
@ -32,7 +32,7 @@ https://learn.microsoft.com/en-us/azure/ai-foundry/openai/overview
```yaml
api_key: ${env.AZURE_API_KEY:=}
api_base: ${env.AZURE_API_BASE:=}
base_url: ${env.AZURE_API_BASE:=}
api_version: ${env.AZURE_API_VERSION:=}
api_type: ${env.AZURE_API_TYPE:=}
```

View file

@ -1,5 +1,5 @@
---
description: "AWS Bedrock inference provider for accessing various AI models through AWS's managed service."
description: "AWS Bedrock inference provider using OpenAI compatible endpoint."
sidebar_label: Remote - Bedrock
title: remote::bedrock
---
@ -8,27 +8,20 @@ title: remote::bedrock
## Description
AWS Bedrock inference provider for accessing various AI models through AWS's managed service.
AWS Bedrock inference provider using OpenAI compatible endpoint.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `aws_access_key_id` | `str \| None` | No | | The AWS access key to use. Default use environment variable: AWS_ACCESS_KEY_ID |
| `aws_secret_access_key` | `str \| None` | No | | The AWS secret access key to use. Default use environment variable: AWS_SECRET_ACCESS_KEY |
| `aws_session_token` | `str \| None` | No | | The AWS session token to use. Default use environment variable: AWS_SESSION_TOKEN |
| `region_name` | `str \| None` | No | | The default AWS Region to use, for example, us-west-1 or us-west-2.Default use environment variable: AWS_DEFAULT_REGION |
| `profile_name` | `str \| None` | No | | The profile name that contains credentials to use.Default use environment variable: AWS_PROFILE |
| `total_max_attempts` | `int \| None` | No | | An integer representing the maximum number of attempts that will be made for a single request, including the initial attempt. Default use environment variable: AWS_MAX_ATTEMPTS |
| `retry_mode` | `str \| None` | No | | A string representing the type of retries Boto3 will perform.Default use environment variable: AWS_RETRY_MODE |
| `connect_timeout` | `float \| None` | No | 60.0 | The time in seconds till a timeout exception is thrown when attempting to make a connection. The default is 60 seconds. |
| `read_timeout` | `float \| None` | No | 60.0 | The time in seconds till a timeout exception is thrown when attempting to read from a connection.The default is 60 seconds. |
| `session_ttl` | `int \| None` | No | 3600 | The time in seconds till a session expires. The default is 3600 seconds (1 hour). |
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
| `region_name` | `str` | No | us-east-2 | AWS Region for the Bedrock Runtime endpoint |
## Sample Configuration
```yaml
{}
api_key: ${env.AWS_BEARER_TOKEN_BEDROCK:=}
region_name: ${env.AWS_DEFAULT_REGION:=us-east-2}
```

View file

@ -14,14 +14,14 @@ Cerebras inference provider for running models on Cerebras Cloud platform.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
| `base_url` | `<class 'str'>` | No | https://api.cerebras.ai | Base URL for the Cerebras API |
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
| `base_url` | `HttpUrl \| None` | No | https://api.cerebras.ai/v1 | Base URL for the Cerebras API |
## Sample Configuration
```yaml
base_url: https://api.cerebras.ai
base_url: https://api.cerebras.ai/v1
api_key: ${env.CEREBRAS_API_KEY:=}
```

View file

@ -14,14 +14,14 @@ Databricks inference provider for running models on Databricks' unified analytic
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_token` | `pydantic.types.SecretStr \| None` | No | | The Databricks API token |
| `url` | `str \| None` | No | | The URL for the Databricks model serving endpoint |
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_token` | `SecretStr \| None` | No | | The Databricks API token |
| `base_url` | `HttpUrl \| None` | No | | The URL for the Databricks model serving endpoint (should include /serving-endpoints path) |
## Sample Configuration
```yaml
url: ${env.DATABRICKS_HOST:=}
base_url: ${env.DATABRICKS_HOST:=}
api_token: ${env.DATABRICKS_TOKEN:=}
```

View file

@ -14,14 +14,14 @@ Fireworks AI inference provider for Llama models and other AI models on the Fire
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
| `url` | `<class 'str'>` | No | https://api.fireworks.ai/inference/v1 | The URL for the Fireworks server |
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
| `base_url` | `HttpUrl \| None` | No | https://api.fireworks.ai/inference/v1 | The URL for the Fireworks server |
## Sample Configuration
```yaml
url: https://api.fireworks.ai/inference/v1
base_url: https://api.fireworks.ai/inference/v1
api_key: ${env.FIREWORKS_API_KEY:=}
```

View file

@ -14,9 +14,9 @@ Google Gemini inference provider for accessing Gemini models and Google's AI ser
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
## Sample Configuration

View file

@ -14,14 +14,14 @@ Groq inference provider for ultra-fast inference using Groq's LPU technology.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
| `url` | `<class 'str'>` | No | https://api.groq.com | The URL for the Groq AI server |
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
| `base_url` | `HttpUrl \| None` | No | https://api.groq.com/openai/v1 | The URL for the Groq AI server |
## Sample Configuration
```yaml
url: https://api.groq.com
base_url: https://api.groq.com/openai/v1
api_key: ${env.GROQ_API_KEY:=}
```

View file

@ -14,8 +14,8 @@ HuggingFace Inference Endpoints provider for dedicated model serving.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `endpoint_name` | `<class 'str'>` | No | | The name of the Hugging Face Inference Endpoint in the format of '&#123;namespace&#125;/&#123;endpoint_name&#125;' (e.g. 'my-cool-org/meta-llama-3-1-8b-instruct-rce'). Namespace is optional and will default to the user account if not provided. |
| `api_token` | `pydantic.types.SecretStr \| None` | No | | Your Hugging Face user access token (will default to locally saved token if not provided) |
| `endpoint_name` | `str` | No | | The name of the Hugging Face Inference Endpoint in the format of '&#123;namespace&#125;/&#123;endpoint_name&#125;' (e.g. 'my-cool-org/meta-llama-3-1-8b-instruct-rce'). Namespace is optional and will default to the user account if not provided. |
| `api_token` | `SecretStr \| None` | No | | Your Hugging Face user access token (will default to locally saved token if not provided) |
## Sample Configuration

View file

@ -14,8 +14,8 @@ HuggingFace Inference API serverless provider for on-demand model inference.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `huggingface_repo` | `<class 'str'>` | No | | The model ID of the model on the Hugging Face Hub (e.g. 'meta-llama/Meta-Llama-3.1-70B-Instruct') |
| `api_token` | `pydantic.types.SecretStr \| None` | No | | Your Hugging Face user access token (will default to locally saved token if not provided) |
| `huggingface_repo` | `str` | No | | The model ID of the model on the Hugging Face Hub (e.g. 'meta-llama/Meta-Llama-3.1-70B-Instruct') |
| `api_token` | `SecretStr \| None` | No | | Your Hugging Face user access token (will default to locally saved token if not provided) |
## Sample Configuration

View file

@ -14,14 +14,14 @@ Llama OpenAI-compatible provider for using Llama models with OpenAI API format.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
| `openai_compat_api_base` | `<class 'str'>` | No | https://api.llama.com/compat/v1/ | The URL for the Llama API server |
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
| `base_url` | `HttpUrl \| None` | No | https://api.llama.com/compat/v1/ | The URL for the Llama API server |
## Sample Configuration
```yaml
openai_compat_api_base: https://api.llama.com/compat/v1/
base_url: https://api.llama.com/compat/v1/
api_key: ${env.LLAMA_API_KEY}
```

View file

@ -14,17 +14,16 @@ NVIDIA inference provider for accessing NVIDIA NIM models and AI services.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
| `url` | `<class 'str'>` | No | https://integrate.api.nvidia.com | A base url for accessing the NVIDIA NIM |
| `timeout` | `<class 'int'>` | No | 60 | Timeout for the HTTP requests |
| `append_api_version` | `<class 'bool'>` | No | True | When set to false, the API version will not be appended to the base_url. By default, it is true. |
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
| `base_url` | `HttpUrl \| None` | No | https://integrate.api.nvidia.com/v1 | A base url for accessing the NVIDIA NIM |
| `timeout` | `int` | No | 60 | Timeout for the HTTP requests |
| `rerank_model_to_url` | `dict[str, str]` | No | `{'nv-rerank-qa-mistral-4b:1': 'https://ai.api.nvidia.com/v1/retrieval/nvidia/reranking', 'nvidia/nv-rerankqa-mistral-4b-v3': 'https://ai.api.nvidia.com/v1/retrieval/nvidia/nv-rerankqa-mistral-4b-v3/reranking', 'nvidia/llama-3.2-nv-rerankqa-1b-v2': 'https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking'}` | Mapping of rerank model identifiers to their API endpoints. |
## Sample Configuration
```yaml
url: ${env.NVIDIA_BASE_URL:=https://integrate.api.nvidia.com}
base_url: ${env.NVIDIA_BASE_URL:=https://integrate.api.nvidia.com/v1}
api_key: ${env.NVIDIA_API_KEY:=}
append_api_version: ${env.NVIDIA_APPEND_API_VERSION:=True}
```

View file

@ -0,0 +1,41 @@
---
description: |
Oracle Cloud Infrastructure (OCI) Generative AI inference provider for accessing OCI's Generative AI Platform-as-a-Service models.
Provider documentation
https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm
sidebar_label: Remote - Oci
title: remote::oci
---
# remote::oci
## Description
Oracle Cloud Infrastructure (OCI) Generative AI inference provider for accessing OCI's Generative AI Platform-as-a-Service models.
Provider documentation
https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
| `oci_auth_type` | `str` | No | instance_principal | OCI authentication type (must be one of: instance_principal, config_file) |
| `oci_region` | `str` | No | us-ashburn-1 | OCI region (e.g., us-ashburn-1) |
| `oci_compartment_id` | `str` | No | | OCI compartment ID for the Generative AI service |
| `oci_config_file_path` | `str` | No | ~/.oci/config | OCI config file path (required if oci_auth_type is config_file) |
| `oci_config_profile` | `str` | No | DEFAULT | OCI config profile (required if oci_auth_type is config_file) |
## Sample Configuration
```yaml
oci_auth_type: ${env.OCI_AUTH_TYPE:=instance_principal}
oci_config_file_path: ${env.OCI_CONFIG_FILE_PATH:=~/.oci/config}
oci_config_profile: ${env.OCI_CLI_PROFILE:=DEFAULT}
oci_region: ${env.OCI_REGION:=us-ashburn-1}
oci_compartment_id: ${env.OCI_COMPARTMENT_OCID:=}
```

View file

@ -14,12 +14,12 @@ Ollama inference provider for running local models through the Ollama runtime.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `url` | `<class 'str'>` | No | http://localhost:11434 | |
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `base_url` | `HttpUrl \| None` | No | http://localhost:11434/v1 | |
## Sample Configuration
```yaml
url: ${env.OLLAMA_URL:=http://localhost:11434}
base_url: ${env.OLLAMA_URL:=http://localhost:11434/v1}
```

View file

@ -14,10 +14,10 @@ OpenAI inference provider for accessing GPT models and other OpenAI services.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
| `base_url` | `<class 'str'>` | No | https://api.openai.com/v1 | Base URL for OpenAI API |
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
| `base_url` | `HttpUrl \| None` | No | https://api.openai.com/v1 | Base URL for OpenAI API |
## Sample Configuration

View file

@ -14,14 +14,14 @@ Passthrough inference provider for connecting to any external inference service
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | API Key for the passthrouth endpoint |
| `url` | `<class 'str'>` | No | | The URL for the passthrough endpoint |
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
| `base_url` | `HttpUrl \| None` | No | | The URL for the passthrough endpoint |
## Sample Configuration
```yaml
url: ${env.PASSTHROUGH_URL}
base_url: ${env.PASSTHROUGH_URL}
api_key: ${env.PASSTHROUGH_API_KEY}
```

View file

@ -14,14 +14,14 @@ RunPod inference provider for running models on RunPod's cloud GPU platform.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_token` | `pydantic.types.SecretStr \| None` | No | | The API token |
| `url` | `str \| None` | No | | The URL for the Runpod model serving endpoint |
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_token` | `SecretStr \| None` | No | | The API token |
| `base_url` | `HttpUrl \| None` | No | | The URL for the Runpod model serving endpoint |
## Sample Configuration
```yaml
url: ${env.RUNPOD_URL:=}
base_url: ${env.RUNPOD_URL:=}
api_token: ${env.RUNPOD_API_TOKEN}
```

View file

@ -14,14 +14,14 @@ SambaNova inference provider for running models on SambaNova's dataflow architec
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
| `url` | `<class 'str'>` | No | https://api.sambanova.ai/v1 | The URL for the SambaNova AI server |
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
| `base_url` | `HttpUrl \| None` | No | https://api.sambanova.ai/v1 | The URL for the SambaNova AI server |
## Sample Configuration
```yaml
url: https://api.sambanova.ai/v1
base_url: https://api.sambanova.ai/v1
api_key: ${env.SAMBANOVA_API_KEY:=}
```

View file

@ -14,12 +14,12 @@ Text Generation Inference (TGI) provider for HuggingFace model serving.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `url` | `<class 'str'>` | No | | The URL for the TGI serving endpoint |
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `base_url` | `HttpUrl \| None` | No | | The URL for the TGI serving endpoint (should include /v1 path) |
## Sample Configuration
```yaml
url: ${env.TGI_URL:=}
base_url: ${env.TGI_URL:=}
```

View file

@ -14,14 +14,14 @@ Together AI inference provider for open-source models and collaborative AI devel
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
| `url` | `<class 'str'>` | No | https://api.together.xyz/v1 | The URL for the Together AI server |
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
| `base_url` | `HttpUrl \| None` | No | https://api.together.xyz/v1 | The URL for the Together AI server |
## Sample Configuration
```yaml
url: https://api.together.xyz/v1
base_url: https://api.together.xyz/v1
api_key: ${env.TOGETHER_API_KEY:=}
```

View file

@ -53,10 +53,10 @@ Available Models:
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `project` | `<class 'str'>` | No | | Google Cloud project ID for Vertex AI |
| `location` | `<class 'str'>` | No | us-central1 | Google Cloud location for Vertex AI |
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `project` | `str` | No | | Google Cloud project ID for Vertex AI |
| `location` | `str` | No | us-central1 | Google Cloud location for Vertex AI |
## Sample Configuration

View file

@ -14,17 +14,17 @@ Remote vLLM inference provider for connecting to vLLM servers.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_token` | `pydantic.types.SecretStr \| None` | No | | The API token |
| `url` | `str \| None` | No | | The URL for the vLLM model serving endpoint |
| `max_tokens` | `<class 'int'>` | No | 4096 | Maximum number of tokens to generate. |
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_token` | `SecretStr \| None` | No | | The API token |
| `base_url` | `HttpUrl \| None` | No | | The URL for the vLLM model serving endpoint |
| `max_tokens` | `int` | No | 4096 | Maximum number of tokens to generate. |
| `tls_verify` | `bool \| str` | No | True | Whether to verify TLS certificates. Can be a boolean or a path to a CA certificate file. |
## Sample Configuration
```yaml
url: ${env.VLLM_URL:=}
base_url: ${env.VLLM_URL:=}
max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
api_token: ${env.VLLM_API_TOKEN:=fake}
tls_verify: ${env.VLLM_TLS_VERIFY:=true}
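The `tls_verify` field accepts either a boolean or a path to a CA certificate file, which matches the `verify` argument of common HTTP clients. As a hedged sketch (the provider's internal wiring may differ), the value could be passed straight through to an `httpx` client:

```python
import httpx

def make_client(base_url: str, tls_verify: bool | str = True) -> httpx.Client:
    # httpx accepts verify=True/False or a path to a CA bundle,
    # matching the bool | str type of the tls_verify field above.
    return httpx.Client(base_url=base_url, verify=tls_verify)

# Placeholder URL and CA path.
client = make_client("https://my-vllm.example.com/v1", tls_verify="/etc/ssl/certs/internal-ca.pem")
```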

View file

@ -14,17 +14,17 @@ IBM WatsonX inference provider for accessing AI models on IBM's WatsonX platform
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
| `url` | `<class 'str'>` | No | https://us-south.ml.cloud.ibm.com | A base url for accessing the watsonx.ai |
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
| `base_url` | `HttpUrl \| None` | No | https://us-south.ml.cloud.ibm.com | A base url for accessing the watsonx.ai |
| `project_id` | `str \| None` | No | | The watsonx.ai project ID |
| `timeout` | `<class 'int'>` | No | 60 | Timeout for the HTTP requests |
| `timeout` | `int` | No | 60 | Timeout for the HTTP requests |
## Sample Configuration
```yaml
url: ${env.WATSONX_BASE_URL:=https://us-south.ml.cloud.ibm.com}
base_url: ${env.WATSONX_BASE_URL:=https://us-south.ml.cloud.ibm.com}
api_key: ${env.WATSONX_API_KEY:=}
project_id: ${env.WATSONX_PROJECT_ID:=}
```

View file

@ -1,9 +1,14 @@
---
title: OpenAI Compatibility
description: OpenAI API Compatibility
sidebar_label: OpenAI Compatibility
sidebar_position: 1
title: OpenAI Implementation Guide
description: Code examples and implementation details for OpenAI API compatibility
sidebar_label: OpenAI Implementation
sidebar_position: 2
---
# OpenAI Implementation Guide
This guide provides detailed code examples and implementation details for using OpenAI-compatible APIs with Llama Stack. For a comprehensive overview of OpenAI compatibility features, see our [OpenAI API Compatibility Guide](../api-openai/index.mdx).
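As a quick orientation before the sections below, a common pattern is to point the standard `openai` Python client at a running Llama Stack server and use it unchanged. The base URL path and model identifier here are placeholders — check your deployment's actual OpenAI-compatible endpoint:

```python
from openai import OpenAI

# Placeholder base URL; adjust host, port, and path to match your server.
client = OpenAI(
    base_url="http://localhost:8321/v1/openai/v1",
    api_key="not-needed-for-local",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-3B-Instruct",  # placeholder model id
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)
```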
## OpenAI API Compatibility
### Server path
@ -195,3 +200,9 @@ Lines of code unfurl
Logic whispers in the dark
Art in hidden form
```
## Additional Resources
- **[OpenAI API Compatibility Guide](../api-openai/index.mdx)** - Comprehensive overview of OpenAI compatibility features
- **[OpenAI Responses API Limitations](./openai_responses_limitations.mdx)** - Detailed limitations and known issues
- **[Provider Documentation](../index.mdx)** - Complete provider ecosystem overview

View file

@ -48,11 +48,9 @@ Both OpenAI and Llama Stack support a web-search built-in tool. The [OpenAI doc
> The type of the web search tool. One of `web_search` or `web_search_2025_08_26`.
In contrast, the [Llama Stack documentation](https://llamastack.github.io/docs/api/create-a-new-open-ai-response) says that the allowed values for `type` for web search are `MOD1`, `MOD2` and `MOD3`.
Is that correct? If so, what are the meanings of each of them? It might make sense for the allowed values for OpenAI map to some values for Llama Stack so that code written to the OpenAI specification
also work with Llama Stack.
Llama Stack now supports both `web_search` and `web_search_2025_08_26` types, matching OpenAI's API. For backward compatibility, Llama Stack also supports `web_search_preview` and `web_search_preview_2025_03_11` types.
The OpenAI web search tool also has fields for `filters` and `user_location` which are not documented as options for Llama Stack. If feasible, it would be good to support these too.
The OpenAI web search tool also has fields for `filters` and `user_location` which are not yet implemented in Llama Stack. If feasible, it would be good to support these too.
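For illustration, a Responses API request using the now-supported `web_search` tool type might look like the sketch below; the base URL and model id are placeholders, and it assumes a search provider is configured on the server:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")  # placeholder URL

response = client.responses.create(
    model="meta-llama/Llama-3.2-3B-Instruct",  # placeholder model id
    input="What changed in the latest Llama Stack release?",
    tools=[{"type": "web_search"}],  # "web_search_2025_08_26" is also accepted
)
print(response.output_text)
```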
---

View file

@ -14,23 +14,23 @@ HuggingFace-based post-training provider for fine-tuning models using the Huggin
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `device` | `<class 'str'>` | No | cuda | |
| `distributed_backend` | `Literal['fsdp', 'deepspeed'` | No | | |
| `checkpoint_format` | `Literal['full_state', 'huggingface'` | No | huggingface | |
| `chat_template` | `<class 'str'>` | No | `&lt;|user|&gt;`<br/>`{input}`<br/>`&lt;|assistant|&gt;`<br/>`{output}` | |
| `model_specific_config` | `<class 'dict'>` | No | `{'trust_remote_code': True, 'attn_implementation': 'sdpa'}` | |
| `max_seq_length` | `<class 'int'>` | No | 2048 | |
| `gradient_checkpointing` | `<class 'bool'>` | No | False | |
| `save_total_limit` | `<class 'int'>` | No | 3 | |
| `logging_steps` | `<class 'int'>` | No | 10 | |
| `warmup_ratio` | `<class 'float'>` | No | 0.1 | |
| `weight_decay` | `<class 'float'>` | No | 0.01 | |
| `dataloader_num_workers` | `<class 'int'>` | No | 4 | |
| `dataloader_pin_memory` | `<class 'bool'>` | No | True | |
| `dpo_beta` | `<class 'float'>` | No | 0.1 | |
| `use_reference_model` | `<class 'bool'>` | No | True | |
| `dpo_loss_type` | `Literal['sigmoid', 'hinge', 'ipo', 'kto_pair'` | No | sigmoid | |
| `dpo_output_dir` | `<class 'str'>` | No | | |
| `device` | `str` | No | cuda | |
| `distributed_backend` | `Literal[fsdp, deepspeed] \| None` | No | | |
| `checkpoint_format` | `Literal[full_state, huggingface] \| None` | No | huggingface | |
| `chat_template` | `str` | No | `&lt;|user|&gt;`<br/>`{input}`<br/>`&lt;|assistant|&gt;`<br/>`{output}` | |
| `model_specific_config` | `dict` | No | `{'trust_remote_code': True, 'attn_implementation': 'sdpa'}` | |
| `max_seq_length` | `int` | No | 2048 | |
| `gradient_checkpointing` | `bool` | No | False | |
| `save_total_limit` | `int` | No | 3 | |
| `logging_steps` | `int` | No | 10 | |
| `warmup_ratio` | `float` | No | 0.1 | |
| `weight_decay` | `float` | No | 0.01 | |
| `dataloader_num_workers` | `int` | No | 4 | |
| `dataloader_pin_memory` | `bool` | No | True | |
| `dpo_beta` | `float` | No | 0.1 | |
| `use_reference_model` | `bool` | No | True | |
| `dpo_loss_type` | `Literal[sigmoid, hinge, ipo, kto_pair]` | No | sigmoid | |
| `dpo_output_dir` | `str` | No | | |
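The `chat_template` default above is an HTML-escaped rendering of a plain prompt template with `{input}` and `{output}` placeholders. A small, purely illustrative sketch of filling it in (the trainer's actual formatting code is not shown here):

```python
# Unescaped form of the default chat_template shown in the table.
CHAT_TEMPLATE = "<|user|>\n{input}\n<|assistant|>\n{output}"

example = CHAT_TEMPLATE.format(
    input="Summarize Llama Stack in one sentence.",
    output="Llama Stack is a unified API layer for building generative AI applications.",
)
print(example)
```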
## Sample Configuration

View file

@ -15,7 +15,7 @@ TorchTune-based post-training provider for fine-tuning and optimizing models usi
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `torch_seed` | `int \| None` | No | | |
| `checkpoint_format` | `Literal['meta', 'huggingface'` | No | meta | |
| `checkpoint_format` | `Literal[meta, huggingface] \| None` | No | meta | |
## Sample Configuration

View file

@ -15,7 +15,7 @@ TorchTune-based post-training provider for fine-tuning and optimizing models usi
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `torch_seed` | `int \| None` | No | | |
| `checkpoint_format` | `Literal['meta', 'huggingface'` | No | meta | |
| `checkpoint_format` | `Literal[meta, huggingface] \| None` | No | meta | |
## Sample Configuration

View file

@ -18,9 +18,9 @@ NVIDIA's post-training provider for fine-tuning models on NVIDIA's platform.
| `dataset_namespace` | `str \| None` | No | default | The NVIDIA dataset namespace. |
| `project_id` | `str \| None` | No | test-example-model@v1 | The NVIDIA project ID. |
| `customizer_url` | `str \| None` | No | | Base URL for the NeMo Customizer API |
| `timeout` | `<class 'int'>` | No | 300 | Timeout for the NVIDIA Post Training API |
| `max_retries` | `<class 'int'>` | No | 3 | Maximum number of retries for the NVIDIA Post Training API |
| `output_model_dir` | `<class 'str'>` | No | test-example-model@v1 | Directory to save the output model |
| `timeout` | `int` | No | 300 | Timeout for the NVIDIA Post Training API |
| `max_retries` | `int` | No | 3 | Maximum number of retries for the NVIDIA Post Training API |
| `output_model_dir` | `str` | No | test-example-model@v1 | Directory to save the output model |
## Sample Configuration

View file

@ -1,7 +1,8 @@
---
description: "Safety
description: |
Safety
OpenAI-compatible Moderations API."
OpenAI-compatible Moderations API.
sidebar_label: Safety
title: Safety
---
@ -12,6 +13,6 @@ title: Safety
Safety
OpenAI-compatible Moderations API.
OpenAI-compatible Moderations API.
This section contains documentation for all available providers for the **safety** API.

View file

@ -14,7 +14,7 @@ Llama Guard safety provider for content moderation and safety filtering using Me
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `excluded_categories` | `list[str` | No | [] | |
| `excluded_categories` | `list[str]` | No | [] | |
## Sample Configuration

View file

@ -14,7 +14,7 @@ Prompt Guard safety provider for detecting and filtering unsafe prompts and cont
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `guard_type` | `<class 'str'>` | No | injection | |
| `guard_type` | `str` | No | injection | |
## Sample Configuration

View file

@ -14,8 +14,8 @@ AWS Bedrock safety provider for content moderation using AWS's safety services.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `aws_access_key_id` | `str \| None` | No | | The AWS access key to use. Default use environment variable: AWS_ACCESS_KEY_ID |
| `aws_secret_access_key` | `str \| None` | No | | The AWS secret access key to use. Default use environment variable: AWS_SECRET_ACCESS_KEY |
| `aws_session_token` | `str \| None` | No | | The AWS session token to use. Default use environment variable: AWS_SESSION_TOKEN |

View file

@ -14,7 +14,7 @@ NVIDIA's safety provider for content moderation and safety filtering.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `guardrails_service_url` | `<class 'str'>` | No | http://0.0.0.0:7331 | The url for accessing the Guardrails service |
| `guardrails_service_url` | `str` | No | http://0.0.0.0:7331 | The url for accessing the Guardrails service |
| `config_id` | `str \| None` | No | self-check | Guardrails configuration ID to use from the Guardrails configuration store |
## Sample Configuration

View file

@ -14,8 +14,8 @@ SambaNova's safety provider for content moderation and safety filtering.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | `<class 'str'>` | No | https://api.sambanova.ai/v1 | The URL for the SambaNova AI server |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | The SambaNova cloud API Key |
| `url` | `str` | No | https://api.sambanova.ai/v1 | The URL for the SambaNova AI server |
| `api_key` | `SecretStr \| None` | No | | The SambaNova cloud API Key |
## Sample Configuration

View file

@ -1,10 +0,0 @@
---
sidebar_label: Telemetry
title: Telemetry
---
# Telemetry
## Overview
This section contains documentation for all available providers for the **telemetry** API.

View file

@ -1,27 +0,0 @@
---
description: "Meta's reference implementation of telemetry and observability using OpenTelemetry."
sidebar_label: Meta-Reference
title: inline::meta-reference
---
# inline::meta-reference
## Description
Meta's reference implementation of telemetry and observability using OpenTelemetry.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `otel_exporter_otlp_endpoint` | `str \| None` | No | | The OpenTelemetry collector endpoint URL (base URL for traces, metrics, and logs). If not set, the SDK will use OTEL_EXPORTER_OTLP_ENDPOINT environment variable. |
| `service_name` | `<class 'str'>` | No | | The service name to use for telemetry |
| `sinks` | `list[inline.telemetry.meta_reference.config.TelemetrySink` | No | [] | List of telemetry sinks to enable (possible values: otel_trace, otel_metric, console) |
## Sample Configuration
```yaml
service_name: "${env.OTEL_SERVICE_NAME:=\u200B}"
sinks: ${env.TELEMETRY_SINKS:=}
otel_exporter_otlp_endpoint: ${env.OTEL_EXPORTER_OTLP_ENDPOINT:=}
```

View file

@ -15,7 +15,7 @@ Bing Search tool for web search capabilities using Microsoft's search engine.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | |
| `top_k` | `<class 'int'>` | No | 3 | |
| `top_k` | `int` | No | 3 | |
## Sample Configuration

View file

@ -15,7 +15,7 @@ Brave Search tool for web search capabilities with privacy-focused results.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | The Brave Search API Key |
| `max_results` | `<class 'int'>` | No | 3 | The maximum number of results to return |
| `max_results` | `int` | No | 3 | The maximum number of results to return |
## Sample Configuration

View file

@ -15,7 +15,7 @@ Tavily Search tool for AI-optimized web search with structured results.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | The Tavily Search API Key |
| `max_results` | `<class 'int'>` | No | 3 | The maximum number of results to return |
| `max_results` | `int` | No | 3 | The maximum number of results to return |
## Sample Configuration

View file

@ -78,8 +78,8 @@ See [Chroma's documentation](https://docs.trychroma.com/docs/overview/introducti
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `db_path` | `<class 'str'>` | No | | |
| `persistence` | `<class 'llama_stack.core.storage.datatypes.KVStoreReference'>` | No | | Config for KV store backend |
| `db_path` | `str` | No | | |
| `persistence` | `KVStoreReference` | No | | Config for KV store backend |
## Sample Configuration

View file

@ -95,7 +95,7 @@ more details about Faiss in general.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `persistence` | `<class 'llama_stack.core.storage.datatypes.KVStoreReference'>` | No | | |
| `persistence` | `KVStoreReference` | No | | |
## Sample Configuration

Some files were not shown because too many files have changed in this diff.