Merge 8c06256da2 into d266c59c2a

2025-10-03 19:57:35 +00:00 · 2025-10-03 05:03:25 -07:00 · 2025-10-03 05:03:25 -07:00 · 3dbc715b0d
commit 3dbc715b0d
parent d266c59c2a 8c06256da2
10 changed files with 1723 additions and 3 deletions
--- a/docs/docs/api-deprecated/index.mdx
+++ b/docs/docs/api-deprecated/index.mdx
@ -0,0 +1,62 @@
 ---
 title: Deprecated APIs
 description: Legacy APIs that are being phased out
 sidebar_label: Deprecated
 sidebar_position: 1
 ---
 # Deprecated APIs
 This section contains APIs that are being phased out in favor of newer, more standardized implementations. These APIs are maintained for backward compatibility but are not recommended for new projects.
 :::warning Deprecation Notice
 These APIs are deprecated and will be removed in future versions. Please migrate to the recommended alternatives listed below.
 :::
 ## Migration Guide
 When using deprecated APIs, please refer to the migration guides provided for each API to understand how to transition to the supported alternatives.
 ## Deprecated API List
 ### Legacy Inference APIs
 Some older inference endpoints that have been superseded by the standardized Inference API.
 **Migration Path:** Use the [Inference API](../api/) instead.
 ### Legacy Vector Operations
 Older vector database operations that have been replaced by the Vector IO API.
 **Migration Path:** Use the [Vector IO API](../api/) instead.
 ### Legacy File Operations
 Older file management endpoints that have been replaced by the Files API.
 **Migration Path:** Use the [Files API](../api/) instead.
 ## Support Timeline
 Deprecated APIs will be supported according to the following timeline:
 - **Current Version**: Full support with deprecation warnings
 - **Next Major Version**: Limited support with migration notices
 - **Following Major Version**: Removal of deprecated APIs
 ## Getting Help
 If you need assistance migrating from deprecated APIs:
 1. Check the specific migration guides for each API
 2. Review the [API Reference](../api/) for current alternatives
 3. Ask in [GitHub Discussions](https://github.com/llamastack/llama-stack/discussions) for migration support
 4. Open an issue on GitHub for specific migration questions
 ## Contributing
 If you find issues with deprecated APIs or have suggestions for improving the migration process, please contribute by:
 1. Opening an issue describing the problem
 2. Submitting a pull request with improvements
 3. Updating migration documentation
 For more information on contributing, see our [Contributing Guide](../contributing/).
--- a/docs/docs/api-experimental/index.mdx
+++ b/docs/docs/api-experimental/index.mdx
@ -0,0 +1,128 @@
 ---
 title: Experimental APIs
 description: APIs in development with limited support
 sidebar_label: Experimental
 sidebar_position: 1
 ---
 # Experimental APIs
 This section contains APIs that are currently in development and may have limited support or stability. These APIs are available for testing and feedback but should not be used in production environments.
 :::warning Experimental Notice
 These APIs are experimental and may change without notice. Use with caution and provide feedback to help improve them.
 :::
 ## Current Experimental APIs
 ### Batch Inference API
 Run inference on a dataset of inputs in batch mode for improved efficiency.
 **Status:** In Development
 **Provider Support:** Limited
 **Use Case:** Large-scale inference operations
 **Features:**
 - Batch processing of multiple inputs
 - Optimized resource utilization
 - Progress tracking and monitoring
 ### Batch Agents API
 Run agentic workflows on a dataset of inputs in batch mode.
 **Status:** In Development
 **Provider Support:** Limited
 **Use Case:** Large-scale agent operations
 **Features:**
 - Batch agent execution
 - Parallel processing capabilities
 - Result aggregation and analysis
 ### Synthetic Data Generation API
 Generate synthetic data for model development and testing.
 **Status:** Early Development
 **Provider Support:** Very Limited
 **Use Case:** Training data augmentation
 **Features:**
 - Automated data generation
 - Quality control mechanisms
 - Customizable generation parameters
 ### Batches API (OpenAI-compatible)
 OpenAI-compatible batch management for inference operations.
 **Status:** In Development
 **Provider Support:** Limited
 **Use Case:** OpenAI batch processing compatibility
 **Features:**
 - OpenAI batch API compatibility
 - Job scheduling and management
 - Status tracking and monitoring
 ## Getting Started with Experimental APIs
 ### Prerequisites
 - Llama Stack server running with experimental features enabled
 - Appropriate provider configurations
 - Understanding of API limitations
 ### Configuration
 Experimental APIs may require special configuration flags or provider settings. Check the specific API documentation for setup requirements.
 ### Usage Guidelines
 1. **Testing Only**: Use experimental APIs for testing and development only
 2. **Monitor Changes**: Watch for updates and breaking changes
 3. **Provide Feedback**: Report issues and suggest improvements
 4. **Backup Data**: Always back up important data when using experimental features
 ## Feedback and Contribution
 We encourage feedback on experimental APIs to help improve them:
 ### Reporting Issues
 - Use GitHub issues with the "experimental" label
 - Include detailed error messages and reproduction steps
 - Specify the API version and provider being used
 ### Feature Requests
 - Submit feature requests through GitHub discussions
 - Provide use cases and expected behavior
 - Consider contributing implementations
 ### Testing
 - Test experimental APIs in your environment
 - Report performance issues and optimization opportunities
 - Share success stories and use cases
 ## Migration to Stable APIs
 As experimental APIs mature, they will be moved to the stable API section. When this happens:
 1. **Announcement**: We'll announce the promotion in release notes
 2. **Migration Guide**: Detailed migration instructions will be provided
 3. **Deprecation Timeline**: Experimental versions will be deprecated with notice
 4. **Support**: Full support will be available for stable versions
 ## Provider Support
 Experimental APIs may have limited provider support. Check the specific API documentation for:
 - Supported providers
 - Configuration requirements
 - Known limitations
 - Performance characteristics
 ## Roadmap
 Experimental APIs are part of our ongoing development roadmap:
 - **Q1 2024**: Batch Inference API stabilization
 - **Q2 2024**: Batch Agents API improvements
 - **Q3 2024**: Synthetic Data Generation API expansion
 - **Q4 2024**: Batches API full OpenAI compatibility
 For the latest updates, follow our [GitHub releases](https://github.com/llamastack/llama-stack/releases) and [roadmap discussions](https://github.com/llamastack/llama-stack/discussions).
--- a/docs/docs/api-openai/index.mdx
+++ b/docs/docs/api-openai/index.mdx
@ -0,0 +1,278 @@
 ---
 title: OpenAI API Compatibility
 description: OpenAI-compatible APIs and features in Llama Stack
 sidebar_label: OpenAI Compatibility
 sidebar_position: 1
 ---
 # OpenAI API Compatibility
 Llama Stack provides comprehensive OpenAI API compatibility, allowing you to use existing OpenAI API clients and tools with Llama Stack providers. This compatibility layer ensures seamless migration and interoperability.
 ## Overview
 OpenAI API compatibility in Llama Stack includes:
 - **OpenAI-compatible endpoints** for all major APIs
 - **Request/response format compatibility** with OpenAI standards
 - **Authentication and authorization** using OpenAI-style API keys
 - **Error handling** with OpenAI-compatible error codes and messages
 - **Rate limiting** and usage tracking compatible with OpenAI patterns
 ## Supported OpenAI APIs
 ### Chat Completions API
 OpenAI-compatible chat completions for conversational AI applications.
 **Endpoint:** `/v1/chat/completions`
 **Compatibility:** Full OpenAI API compatibility
 **Providers:** All inference providers
 **Features:**
 - Message-based conversations
 - System prompts and user messages
 - Function calling support
 - Streaming responses
 - Temperature and other parameter controls
 ### Completions API
 OpenAI-compatible text completions for general text generation.
 **Endpoint:** `/v1/completions`
 **Compatibility:** Full OpenAI API compatibility
 **Providers:** All inference providers
 **Features:**
 - Text completion generation
 - Prompt engineering support
 - Customizable parameters
 - Batch processing capabilities
 ### Embeddings API
 OpenAI-compatible embeddings for vector operations.
 **Endpoint:** `/v1/embeddings`
 **Compatibility:** Full OpenAI API compatibility
 **Providers:** All embedding providers
 **Features:**
 - Text embedding generation
 - Multiple embedding models
 - Batch embedding processing
 - Vector similarity operations
 ### Files API
 OpenAI-compatible file management for document processing.
 **Endpoint:** `/v1/files`
 **Compatibility:** Full OpenAI API compatibility
 **Providers:** Local Filesystem, S3
 **Features:**
 - File upload and management
 - Document processing
 - File metadata tracking
 - Secure file access
 ### Vector Store Files API
 OpenAI-compatible vector store file operations for RAG applications.
 **Endpoint:** `/v1/vector_stores/{vector_store_id}/files`
 **Compatibility:** Full OpenAI API compatibility
 **Providers:** FAISS, SQLite-vec, Milvus, ChromaDB, Qdrant, Weaviate, Postgres (PGVector)
 **Features:**
 - Automatic document processing
 - Vector store integration
 - File chunking and indexing
 - Search and retrieval operations
 ### Batches API
 OpenAI-compatible batch processing for large-scale operations.
 **Endpoint:** `/v1/batches`
 **Compatibility:** OpenAI API compatibility (experimental)
 **Providers:** Limited support
 **Features:**
 - Batch job creation and management
 - Progress tracking
 - Result retrieval
 - Error handling
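In the OpenAI Batches API, the input is a JSONL file. Each line is one request object with a `custom_id`, `method`, `url` and `body`. The sketch below builds such a file for chat completions. It assumes that Llama Stack's experimental endpoint accepts the same request-line format as OpenAI, which is not confirmed here. The model name and prompts are placeholders.

```python
import json


def build_batch_jsonl(model: str, prompts: list[str]) -> str:
    """Build OpenAI-style batch input: one JSON request object per line."""
    lines = []
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"request-{i}",  # used to match each output line back to its input
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(request))
    return "\n".join(lines) + "\n"


batch_input = build_batch_jsonl("llama-3.1-8b", ["Hello!", "Summarize RAG in one line."])
print(batch_input)
```

In the OpenAI API, you upload this file through the Files API with `purpose="batch"`. You then reference its ID as `input_file_id` when you create the batch.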
 ## Migration from OpenAI
 ### Step 1: Update API Endpoint
 Change your API endpoint from OpenAI to your Llama Stack server:
 ```python
 # Before (OpenAI)
 import openai
 client = openai.OpenAI(api_key="your-openai-key")
 # After (Llama Stack)
 import openai
 client = openai.OpenAI(
    api_key="your-llama-stack-key",
    base_url="http://localhost:8000/v1"  # Your Llama Stack server
 )
 ```
 ### Step 2: Configure Providers
 Set up your preferred providers in the Llama Stack configuration:
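 ```yaml
 # run.yaml (excerpt; the keys under `config` vary by provider)
 providers:
   inference:
     - provider_id: meta-reference
       provider_type: inline::meta-reference
       config:
         model: meta-llama/Llama-3.1-8B-Instruct
 models:
   - model_id: llama-3.1-8b
     provider_id: meta-reference
     provider_model_id: meta-llama/Llama-3.1-8B-Instruct
 ```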
 ### Step 3: Test Compatibility
 Verify that your existing code works with Llama Stack:
 ```python
 # Test chat completions
 response = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[
        {"role": "user", "content": "Hello, world!"}
    ]
 )
 print(response.choices[0].message.content)
 ```
 ## Provider-Specific Features
 ### Meta Reference Provider
 - Full OpenAI API compatibility
 - Local model execution
 - Custom model support
 ### Remote Providers
 - OpenAI API compatibility
 - Cloud-based execution
 - Scalable infrastructure
 ### Vector Store Providers
 - OpenAI vector store API compatibility
 - Automatic document processing
 - Advanced search capabilities
 ## Authentication
 Llama Stack supports OpenAI-style authentication:
 ### API Key Authentication
 ```python
 client = openai.OpenAI(
    api_key="your-api-key",
    base_url="http://localhost:8000/v1"
 )
 ```
 ### Environment Variables
 ```bash
 export OPENAI_API_KEY="your-api-key"
 export OPENAI_BASE_URL="http://localhost:8000/v1"
 ```
 ## Error Handling
 Llama Stack provides OpenAI-compatible error responses:
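 ```python
 import openai
 # openai.RateLimitError and openai.APIConnectionError both subclass openai.APIError,
 # so catch them first; an APIError handler listed first would swallow them.
 try:
    response = client.chat.completions.create(
        model="llama-3.1-8b",
        messages=[{"role": "user", "content": "Hello, world!"}],
    )
 except openai.RateLimitError as e:
    print(f"Rate Limit Error: {e}")
 except openai.APIConnectionError as e:
    print(f"Connection Error: {e}")
 except openai.APIError as e:
    print(f"API Error: {e}")
 ```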
 ## Rate Limiting
 OpenAI-compatible rate limiting is supported:
 - **Requests per minute** limits
 - **Tokens per minute** limits
 - **Concurrent request** limits
 - **Usage tracking** and monitoring
 ## Monitoring and Observability
 Track your API usage with OpenAI-compatible monitoring:
 - **Request/response logging**
 - **Usage metrics** and analytics
 - **Performance monitoring**
 - **Error tracking** and alerting
 ## Best Practices
 ### 1. Provider Selection
 Choose providers based on your requirements:
 - **Local development**: Meta Reference, Ollama
 - **Production**: Cloud providers (Fireworks, Together, NVIDIA)
 - **Specialized use cases**: Custom providers
 ### 2. Model Configuration
 Configure models for optimal performance:
 - **Model selection** based on task requirements
 - **Parameter tuning** for specific use cases
 - **Resource allocation** for performance
 ### 3. Error Handling
 Implement robust error handling:
 - **Retry logic** for transient failures
 - **Fallback providers** for high availability
 - **Monitoring** and alerting for issues
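Retry logic for transient failures usually means exponential backoff around the request. Below is a minimal, library-agnostic sketch.

- `with_retries` is a hypothetical helper, not part of any Llama Stack or OpenAI package.
- With the OpenAI Python client, you would pass `retry_on=(openai.RateLimitError, openai.APIConnectionError)`.
- That client also has its own `max_retries` constructor setting, which may already cover simple cases.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on the given exceptions with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return call()
        except retry_on:
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error
            # Delays of 0.5s, 1s, 2s, ... each stretched by up to 10% random jitter
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay * (1 + 0.1 * random.random()))
            attempt += 1
```

For example, you could wrap a request as `with_retries(lambda: client.chat.completions.create(...), retry_on=(openai.RateLimitError,))`. The `sleep` parameter is injectable so the backoff can be tested without real delays.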
 ### 4. Security
 Follow security best practices:
 - **API key management** and rotation
 - **Access control** and authorization
 - **Data privacy** and compliance
 ## Troubleshooting
 ### Common Issues
 **Connection Errors**
 - Verify server is running
 - Check network connectivity
 - Validate API endpoint URL
 **Authentication Errors**
 - Verify API key is correct
 - Check key permissions
 - Ensure proper authentication headers
 **Model Errors**
 - Verify model is available
 - Check provider configuration
 - Validate model parameters
 ### Getting Help
 For OpenAI compatibility issues:
 1. **Check Documentation**: Review provider-specific documentation
 2. **Community Support**: Ask questions in GitHub discussions
 3. **Issue Reporting**: Open GitHub issues for bugs
 4. **Professional Support**: Contact support for enterprise issues
 ## Roadmap
 Upcoming OpenAI compatibility features:
 - **Enhanced batch processing** support
 - **Advanced function calling** capabilities
 - **Improved error handling** and diagnostics
 - **Performance optimizations** for large-scale deployments
 For the latest updates, follow our [GitHub releases](https://github.com/llamastack/llama-stack/releases) and [roadmap discussions](https://github.com/llamastack/llama-stack/discussions).
--- a/docs/docs/api/index.mdx
+++ b/docs/docs/api/index.mdx
@ -0,0 +1,144 @@
 ---
 title: API Reference
 description: Complete reference for Llama Stack APIs
 sidebar_label: Overview
 sidebar_position: 1
 ---
 # API Reference
 Llama Stack provides a comprehensive set of APIs for building generative AI applications. All APIs follow OpenAI-compatible standards and can be used interchangeably across different providers.
 ## Core APIs
 ### Inference API
 Run inference with Large Language Models (LLMs) and embedding models.
 **Supported Providers:**
 - Meta Reference (Single Node)
 - Ollama (Single Node)
 - Fireworks (Hosted)
 - Together (Hosted)
 - NVIDIA NIM (Hosted and Single Node)
 - vLLM (Hosted and Single Node)
 - TGI (Hosted and Single Node)
 - AWS Bedrock (Hosted)
 - Cerebras (Hosted)
 - Groq (Hosted)
 - SambaNova (Hosted)
 - PyTorch ExecuTorch (On-device iOS, Android)
 - OpenAI (Hosted)
 - Anthropic (Hosted)
 - Gemini (Hosted)
 - WatsonX (Hosted)
 ### Agents API
 Run multi-step agentic workflows with LLMs, including tool usage, memory (RAG), and complex reasoning.
 **Supported Providers:**
 - Meta Reference (Single Node)
 - Fireworks (Hosted)
 - Together (Hosted)
 - PyTorch ExecuTorch (On-device iOS)
 ### Vector IO API
 Perform operations on vector stores, including adding documents, searching, and deleting documents.
 **Supported Providers:**
 - FAISS (Single Node)
 - SQLite-Vec (Single Node)
 - Chroma (Hosted and Single Node)
 - Milvus (Hosted and Single Node)
 - Postgres (PGVector) (Hosted and Single Node)
 - Weaviate (Hosted)
 - Qdrant (Hosted and Single Node)
 ### Files API (OpenAI-compatible)
 Manage file uploads, storage, and retrieval with OpenAI-compatible endpoints.
 **Supported Providers:**
 - Local Filesystem (Single Node)
 - S3 (Hosted)
 ### Vector Store Files API (OpenAI-compatible)
 Integrate file operations with vector stores for automatic document processing and search.
 **Supported Providers:**
 - FAISS (Single Node)
 - SQLite-vec (Single Node)
 - Milvus (Hosted and Single Node)
 - ChromaDB (Hosted and Single Node)
 - Qdrant (Hosted and Single Node)
 - Weaviate (Hosted)
 - Postgres (PGVector) (Hosted and Single Node)
 ### Safety API
 Apply safety policies to outputs at the system level, not just the model level.
 **Supported Providers:**
 - Llama Guard (Depends on Inference Provider)
 - Prompt Guard (Single Node)
 - Code Scanner (Single Node)
 - AWS Bedrock (Hosted)
 ### Post Training API
 Fine-tune models for specific use cases and domains.
 **Supported Providers:**
 - Meta Reference (Single Node)
 - HuggingFace (Single Node)
 - TorchTune (Single Node)
 - NVIDIA NeMo (Hosted)
 ### Eval API
 Generate outputs and perform scoring to evaluate system performance.
 **Supported Providers:**
 - Meta Reference (Single Node)
 - NVIDIA NeMo (Hosted)
 ### Telemetry API
 Collect telemetry data from the system for monitoring and observability.
 **Supported Providers:**
 - Meta Reference (Single Node)
 ### Tool Runtime API
 Interact with various tools and protocols to extend LLM capabilities.
 **Supported Providers:**
 - Brave Search (Hosted)
 - RAG Runtime (Single Node)
 ## API Compatibility
 All Llama Stack APIs are designed to be OpenAI-compatible, allowing you to:
 - Use existing OpenAI API clients and tools
 - Migrate from OpenAI to other providers seamlessly
 - Maintain consistent API contracts across different environments
 ## Getting Started
 To get started with Llama Stack APIs:
 1. **Choose a Distribution**: Select a pre-configured distribution that matches your environment
 2. **Configure Providers**: Set up the providers you want to use for each API
 3. **Start the Server**: Launch the Llama Stack server with your configuration
 4. **Use the APIs**: Make requests to the API endpoints using your preferred client
 For detailed setup instructions, see our [Getting Started Guide](../getting_started/quickstart).
 ## Provider Details
 For complete provider compatibility and setup instructions, see our [Providers Documentation](../providers/).
 ## API Stability
 Llama Stack APIs are organized by stability level:
 - **[Stable APIs](./index.mdx)** - Production-ready APIs with full support
 - **[Experimental APIs](../api-experimental/)** - APIs in development with limited support
 - **[Deprecated APIs](../api-deprecated/)** - Legacy APIs being phased out
 ## OpenAI Integration
 For specific OpenAI API compatibility features, see our [OpenAI Compatibility Guide](../api-openai/).
--- a/docs/docs/concepts/apis/index.mdx
+++ b/docs/docs/concepts/apis/index.mdx
@ -7,7 +7,7 @@ sidebar_position: 1
 # APIs
-A Llama Stack API is described as a collection of REST endpoints. We currently support the following APIs:
+A Llama Stack API is described as a collection of REST endpoints following OpenAI API standards. We currently support the following APIs:
 - **Inference**: run inference with an LLM
 - **Safety**: apply safety policies to the output at the system level (not only the model level)
@ -16,13 +16,27 @@ A Llama Stack API is described as a collection of REST endpoints. We currently s
 - **Scoring**: evaluate outputs of the system
 - **Eval**: generate outputs (via Inference or Agents) and perform scoring
 - **VectorIO**: perform operations on vector stores, such as adding documents, searching, and deleting documents
 - **Files**: manage file uploads, storage, and retrieval
 - **Telemetry**: collect telemetry data from the system
 - **Post Training**: fine-tune a model
 - **Tool Runtime**: interact with various tools and protocols
- **Responses**: generate responses from an LLM using this OpenAI compatible API.
+- **Responses**: generate responses from an LLM
 We are working on adding a few more APIs to complete the application lifecycle. These will include:
 - **Batch Inference**: run inference on a dataset of inputs
 - **Batch Agents**: run agents on a dataset of inputs
 - **Synthetic Data Generation**: generate synthetic data for model development
 - **Batches**: OpenAI-compatible batch management for inference
 ## OpenAI API Compatibility
 We are working on adding OpenAI API compatibility to Llama Stack. This will allow you to use Llama Stack with OpenAI API clients and tools.
 ### File Operations and Vector Store Integration
 The Files API and Vector Store APIs work together through file operations, enabling automatic document processing and search. This integration implements the [OpenAI Vector Store Files API specification](https://platform.openai.com/docs/api-reference/vector-stores-files) and allows you to:
 - Upload documents through the Files API
 - Automatically process and chunk documents into searchable vectors
 - Store processed content in any vector database supported by [our providers](../../providers/index.mdx)
 - Search through documents using natural language queries
 For detailed information about this integration, see [File Operations and Vector Store Integration](../file_operations_vector_stores.md).
--- a/docs/docs/concepts/file_operations_vector_stores.md
+++ b/docs/docs/concepts/file_operations_vector_stores.md
@ -0,0 +1,423 @@
 # File Operations and Vector Store Integration
 ## Overview
 Llama Stack provides seamless integration between the Files API and Vector Store APIs, enabling you to upload documents and automatically process them into searchable vector embeddings. This integration implements file operations following the [OpenAI Vector Store Files API specification](https://platform.openai.com/docs/api-reference/vector-stores-files).
 ## Enhanced Capabilities Beyond OpenAI
 While Llama Stack maintains full compatibility with OpenAI's Vector Store API, it provides several additional capabilities that enhance functionality and flexibility:
 ### **Embedding Model Specification**
 Unlike OpenAI's vector stores, which use a fixed embedding model, Llama Stack lets you specify which embedding model to use when creating a vector store:
 ```python
 # Create vector store with specific embedding model
 vector_store = await client.vector_stores.create(
    name="my_documents",
    embedding_model="all-MiniLM-L6-v2",  # Specify your preferred model
    embedding_dimension=384,
 )
 ```
 ### **Advanced Search Modes**
 Llama Stack supports multiple search modes beyond basic vector similarity:
 - **Vector Search**: Pure semantic similarity search using embeddings
 - **Keyword Search**: Traditional keyword-based search for exact matches
 - **Hybrid Search**: Combines both vector and keyword search for optimal results
 ```python
 # Different search modes
 results = await client.vector_stores.search(
    vector_store_id=vector_store.id,
    query="machine learning algorithms",
    search_mode="hybrid",  # or "vector", "keyword"
    max_num_results=5,
 )
 ```
 ### **Flexible Ranking Options**
 For hybrid search, Llama Stack offers configurable ranking strategies:
 - **RRF (Reciprocal Rank Fusion)**: Combines rankings with configurable impact factor
 - **Weighted Ranker**: Linear combination of vector and keyword scores with adjustable weights
 ```python
 # Custom ranking configuration
 results = await client.vector_stores.search(
    vector_store_id=vector_store.id,
    query="neural networks",
    search_mode="hybrid",
    ranking_options={
        "ranker": {"type": "weighted", "alpha": 0.7}  # 70% vector, 30% keyword
    },
 )
 ```
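The two ranking strategies can be illustrated on plain Python lists. This sketch shows the standard formulas under these assumptions:

- **RRF:** each document scores the sum of `1 / (k + rank)` over the result lists it appears in. Here `k` plays the role of the impact factor; 60 is a common default.
- **Weighted ranker:** each document scores `alpha * vector + (1 - alpha) * keyword`, using scores already normalized to [0, 1].

Llama Stack's own implementation may differ in normalization and tie-breaking.

```python
def rrf_fuse(rankings: list[list[str]], k: float = 60.0) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: add 1 / (k + rank) for every list a doc appears in (ranks start at 1)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def weighted_fuse(
    vector_scores: dict[str, float],
    keyword_scores: dict[str, float],
    alpha: float = 0.7,
) -> list[tuple[str, float]]:
    """Linear combination of normalized scores; a doc missing from one side scores 0 there."""
    doc_ids = set(vector_scores) | set(keyword_scores)
    combined = {
        doc_id: alpha * vector_scores.get(doc_id, 0.0)
        + (1 - alpha) * keyword_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


vector_ranking = ["doc_a", "doc_b", "doc_c"]
keyword_ranking = ["doc_b", "doc_d", "doc_a"]
print(rrf_fuse([vector_ranking, keyword_ranking]))
```

RRF looks only at rank positions, so it needs no score normalization. The weighted ranker uses score magnitudes, so both score sets must be on comparable scales.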
 ### **Provider Selection**
 Choose from multiple vector store providers based on your specific needs:
 - **Inline Providers**: FAISS (fast in-memory), SQLite-vec (disk-based), Milvus (high-performance)
 - **Remote Providers**: ChromaDB, Qdrant, Weaviate, Postgres (PGVector), Milvus
 ```python
 # Specify provider when creating vector store
 vector_store = await client.vector_stores.create(
    name="my_documents", provider_id="sqlite-vec"  # Choose your preferred provider
 )
 ```
 ## How It Works
 The file operations work through several key components:
 1. **File Upload**: Documents are uploaded through the Files API
 2. **Automatic Processing**: Files are automatically chunked and converted to embeddings
 3. **Vector Storage**: Chunks are stored in vector databases with metadata
 4. **Search & Retrieval**: Users can search through processed documents using natural language
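To make the four steps concrete, here is a self-contained toy version that runs in memory. It is only an illustration. The following are all hypothetical stand-ins, not what Llama Stack or any provider does internally:

- the word-count "embedding",
- the fixed word-window chunking,
- the `ToyVectorStore` class.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (real stores use a neural embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    def __init__(self) -> None:
        self.chunks: list[tuple[str, str, Counter]] = []  # (file_id, chunk_text, vector)

    def add_file(self, file_id: str, text: str, words_per_chunk: int = 5) -> None:
        # Step 2: split into chunks and embed each one; Step 3: store them with the file_id
        words = text.split()
        for start in range(0, len(words), words_per_chunk):
            chunk = " ".join(words[start : start + words_per_chunk])
            self.chunks.append((file_id, chunk, embed(chunk)))

    def search(self, query: str, max_num_results: int = 3) -> list[tuple[float, str, str]]:
        # Step 4: rank stored chunks by cosine similarity to the query vector
        q = embed(query)
        scored = [(cosine(q, vec), file_id, chunk) for file_id, chunk, vec in self.chunks]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:max_num_results]


store = ToyVectorStore()
# Step 1 stand-in: "uploading" is just passing raw text in
store.add_file("file-1", "Llama Stack exposes OpenAI compatible vector store APIs for retrieval")
store.add_file("file-2", "Bananas are yellow and grow in tropical climates around the world")
for score, file_id, chunk in store.search("vector store retrieval", max_num_results=2):
    print(f"{score:.2f} {file_id}: {chunk}")
```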
 ## Supported Vector Store Providers
 The following vector store providers support file operations:
 ### Inline Providers (Single Node)
 - **FAISS**: Fast in-memory vector similarity search
 - **SQLite-vec**: Disk-based storage with hybrid search capabilities
 - **Milvus**: High-performance vector database with advanced indexing
 ### Remote Providers (Hosted)
 - **ChromaDB**: Vector database with metadata filtering
 - **Qdrant**: Vector similarity search with payload filtering
 - **Weaviate**: Vector database with GraphQL interface
 - **Postgres (PGVector)**: Vector extensions for PostgreSQL
 ## File Processing Pipeline
 ### 1. File Upload
 ```python
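 from llama_stack_client import AsyncLlamaStackClient
 # The examples on this page await client calls, so they use the async client;
 # run them inside an async function (for example via asyncio.run).
 client = AsyncLlamaStackClient(base_url="http://localhost:8000")
 # Upload a document
 with open("document.pdf", "rb") as f:
    file_info = await client.files.create(file=f, purpose="assistants")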
 ```
 ### 2. Attach to Vector Store
 ```python
 # Create a vector store
 vector_store = await client.vector_stores.create(name="my_documents")
 # Attach the file to the vector store
 file_attach_response = await client.vector_stores.files.create(
    vector_store_id=vector_store.id, file_id=file_info.id
 )
 ```
 ### 3. Automatic Processing
 The system automatically:
 - Detects the file type and extracts text content
 - Splits content into chunks (default: 800 tokens with 400 token overlap)
 - Generates embeddings for each chunk
 - Stores chunks with metadata in the vector store
 - Updates file status to "completed"
 ### 4. Search and Retrieval
 ```python
 # Search through processed documents
 search_results = await client.vector_stores.search(
    vector_store_id=vector_store.id,
    query="What is the main topic discussed?",
    max_num_results=5,
 )
 # Process results
 for result in search_results.data:
    print(f"Score: {result.score}")
    for content in result.content:
        print(f"Content: {content.text}")
 ```
 ## Supported File Types
 The file processing pipeline supports various document formats:
 - **Text Files**: `.txt`, `.md`, `.rst`
 - **Documents**: `.pdf`, `.docx`, `.doc`
 - **Code**: `.py`, `.js`, `.java`, `.cpp`, etc.
 - **Data**: `.json`, `.csv`, `.xml`
 - **Web Content**: HTML files
 ## Chunking Strategies
 ### Default Strategy
 The default chunking strategy uses:
 - **Max Chunk Size**: 800 tokens
 - **Overlap**: 400 tokens
 - **Method**: Semantic boundary detection
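The size and overlap settings translate into a sliding window. Each chunk starts `max_chunk_size_tokens - chunk_overlap_tokens` tokens after the previous one. With the defaults, every chunk therefore shares 400 tokens with its neighbor.

The sketch below shows only that arithmetic over a list of token IDs. It assumes tokenization has already happened, and it does not attempt the semantic boundary detection described above.

```python
def chunk_windows(
    tokens: list[int], max_chunk_size_tokens: int = 800, chunk_overlap_tokens: int = 400
) -> list[list[int]]:
    """Split tokens into fixed-size windows where consecutive windows overlap."""
    if not 0 <= chunk_overlap_tokens < max_chunk_size_tokens:
        raise ValueError("overlap must be non-negative and smaller than the chunk size")
    stride = max_chunk_size_tokens - chunk_overlap_tokens  # 400 with the defaults
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start : start + max_chunk_size_tokens])
        if start + max_chunk_size_tokens >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


chunks = chunk_windows(list(range(2000)))
print([(c[0], c[-1]) for c in chunks])
```

A 2,000-token document thus yields four 800-token chunks starting at tokens 0, 400, 800 and 1200. Halving the overlap would lower that count but reduce shared context at chunk edges.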
 ### Custom Chunking
 You can customize chunking when attaching files:
 ```python
 # Custom chunking strategy in the OpenAI "static" format
 chunking_strategy = {
    "type": "static",
    "static": {"max_chunk_size_tokens": 1000, "chunk_overlap_tokens": 200},
 }
 # Attach file with custom chunking
 file_attach_response = await client.vector_stores.files.create(
    vector_store_id=vector_store.id,
    file_id=file_info.id,
    chunking_strategy=chunking_strategy,
 )
 ```
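OpenAI's specification puts two limits on the static strategy:

- `max_chunk_size_tokens` must be between 100 and 4096.
- `chunk_overlap_tokens` must not exceed half of `max_chunk_size_tokens`.

Whether Llama Stack enforces exactly the same limits is an assumption here. The hypothetical helper below builds a payload in the OpenAI `static` shape and checks those limits client-side before any request is sent.

```python
def static_chunking_strategy(max_chunk_size_tokens: int, chunk_overlap_tokens: int) -> dict:
    """Build an OpenAI-style static chunking strategy, checking the documented limits."""
    if not 100 <= max_chunk_size_tokens <= 4096:
        raise ValueError("max_chunk_size_tokens must be between 100 and 4096")
    # Overlap may be at most half the chunk size (compared without integer rounding)
    if chunk_overlap_tokens < 0 or 2 * chunk_overlap_tokens > max_chunk_size_tokens:
        raise ValueError("chunk_overlap_tokens must not exceed half of max_chunk_size_tokens")
    return {
        "type": "static",
        "static": {
            "max_chunk_size_tokens": max_chunk_size_tokens,
            "chunk_overlap_tokens": chunk_overlap_tokens,
        },
    }


print(static_chunking_strategy(1000, 200))
```

The returned dict can be passed as `chunking_strategy=` when attaching a file.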
 **Note**: While Llama Stack is OpenAI-compatible, it also supports additional options beyond the standard OpenAI API. When creating vector stores, you can specify custom embedding models and embedding dimensions that will be used when processing chunks from attached files.
 ## File Management
 ### List Files in Vector Store
 ```python
 # List all files in a vector store
 files = await client.vector_stores.files.list(vector_store_id=vector_store.id)
 for file in files.data:
    print(f"File: {file.id}, Status: {file.status}")
 ```
 ### File Status Tracking
 Files go through several statuses:
 - **in_progress**: File is being processed
 - **completed**: File successfully processed and searchable
 - **failed**: Processing failed (check `last_error` for details)
 - **cancelled**: Processing was cancelled
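Because processing runs asynchronously, a client typically polls until the file leaves `in_progress`. The following is a minimal sketch under these assumptions:

- `get_status` is a hypothetical callable. You would implement it with the client's vector store file retrieve call.
- The default interval and timeout values are arbitrary.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_file(
    get_status: Callable[[], str],
    poll_interval: float = 1.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the file reaches a terminal status and return that status."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"file still {status!r} after {timeout}s")
        sleep(poll_interval)
```

If it returns `failed`, inspect the file's `last_error` field for the cause.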
 ### Retrieve File Content
 ```python
 # Get chunked content from vector store
 content_response = await client.vector_stores.files.retrieve_content(
    vector_store_id=vector_store.id, file_id=file_info.id
 )
 for chunk in content_response.content:
    print(f"Chunk {chunk.metadata.get('chunk_index', 0)}: {chunk.text}")
 ```
 ## Vector Store Management
 ### List Vector Stores
 Retrieve a paginated list of all vector stores:
 ```python
 # List all vector stores with default pagination
 vector_stores = await client.vector_stores.list()
 # Custom pagination and ordering
 vector_stores = await client.vector_stores.list(
    limit=10,
    order="asc",  # or "desc"
    after="vs_12345678",  # cursor-based pagination
 )
 for store in vector_stores.data:
    print(f"Store: {store.name}, Files: {store.file_counts.total}")
    print(f"Created: {store.created_at}, Status: {store.status}")
 ```
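The `after` cursor follows the OpenAI list convention. Each page reports `has_more` and `last_id`, and the next request passes that `last_id` as `after`.

The sketch below drains all pages. Its `fetch_page` argument is a hypothetical stand-in, and the fake backend returns plain dicts with those fields, so it runs without a server.

```python
from typing import Any, Callable, Iterator, Optional


def iter_all(fetch_page: Callable[..., dict[str, Any]], limit: int = 10) -> Iterator[dict[str, Any]]:
    """Yield every item across pages using cursor-based (`after`) pagination."""
    after = None
    while True:
        page = fetch_page(limit=limit, after=after)
        yield from page["data"]
        if not page["has_more"] or not page["data"]:
            break
        after = page["last_id"]  # cursor: continue after the last item already seen


# Fake backend with 25 stores, for illustration only
STORES = [{"id": f"vs_{i:03d}"} for i in range(25)]


def fake_fetch_page(limit: int, after: Optional[str]) -> dict[str, Any]:
    start = 0 if after is None else next(i for i, s in enumerate(STORES) if s["id"] == after) + 1
    data = STORES[start : start + limit]
    return {
        "data": data,
        "has_more": start + limit < len(STORES),
        "first_id": data[0]["id"] if data else None,
        "last_id": data[-1]["id"] if data else None,
    }


print(len(list(iter_all(fake_fetch_page))))
```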
 ### Retrieve Vector Store Details
 Get detailed information about a specific vector store:
 ```python
 # Get vector store details
 store_details = await client.vector_stores.retrieve(vector_store_id="vs_12345678")
 print(f"Name: {store_details.name}")
 print(f"Status: {store_details.status}")
 print(f"File Counts: {store_details.file_counts}")
 print(f"Usage: {store_details.usage_bytes} bytes")
 print(f"Created: {store_details.created_at}")
 print(f"Metadata: {store_details.metadata}")
 ```
 ### Update Vector Store
 Modify vector store properties such as name, metadata, or expiration settings:
 ```python
 # Update vector store name and metadata
 updated_store = await client.vector_stores.update(
    vector_store_id="vs_12345678",
    name="Updated Document Collection",
    metadata={
        "description": "Updated collection for research",
        "category": "research",
        "version": "2.0",
    },
 )
 # Set expiration policy
 expiring_store = await client.vector_stores.update(
    vector_store_id="vs_12345678",
    expires_after={"anchor": "last_active_at", "days": 30},
 )
 print(f"Updated store: {updated_store.name}")
 print(f"Last active: {updated_store.last_active_at}")
 ```
 ### Delete Vector Store
 Remove a vector store and all its associated data:
 ```python
 # Delete a vector store
 delete_response = await client.vector_stores.delete(vector_store_id="vs_12345678")
 if delete_response.deleted:
    print(f"Vector store {delete_response.id} successfully deleted")
 else:
    print("Failed to delete vector store")
 ```
 **Important Notes:**
 - Deleting a vector store removes all files, chunks, and embeddings
 - This operation cannot be undone
 - The underlying vector database is also cleaned up
 - Consider backing up important data before deletion
 ## Search Capabilities
 ### Vector Search
 Pure similarity search using embeddings:
 ```python
 results = await client.vector_stores.search(
    vector_store_id=vector_store.id,
    query="machine learning algorithms",
    max_num_results=10,
 )
 ```
 ### Filtered Search
 Combine vector search with metadata filtering:
 ```python
 results = await client.vector_stores.search(
    vector_store_id=vector_store.id,
    query="machine learning algorithms",
    filters={
        "type": "and",
        "filters": [
            {"type": "eq", "key": "file_type", "value": "pdf"},
            {"type": "eq", "key": "upload_date", "value": "2024-01-01"},
        ],
    },
    max_num_results=10,
 )
 ```
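In OpenAI's Vector Store API, a filter is matched against each file's attributes and takes one of two forms:

- **Comparison:** `{"type": "eq", "key": ..., "value": ...}`. The other comparison types are `ne`, `gt`, `gte`, `lt` and `lte`.
- **Compound:** `{"type": "and" | "or", "filters": [...]}`.

The small evaluator below captures those semantics, which can help when reasoning about what a filter will match. It is not Llama Stack's implementation. How a provider treats a missing key is also an assumption; here, a missing key never matches.

```python
import operator
from typing import Any

COMPARISONS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def matches(attributes: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Return True if a file's attributes satisfy an OpenAI-style filter."""
    kind = flt["type"]
    if kind == "and":
        return all(matches(attributes, sub) for sub in flt["filters"])
    if kind == "or":
        return any(matches(attributes, sub) for sub in flt["filters"])
    if kind in COMPARISONS:
        if flt["key"] not in attributes:
            return False  # assumption: a missing attribute never matches
        return COMPARISONS[kind](attributes[flt["key"]], flt["value"])
    raise ValueError(f"unsupported filter type: {kind!r}")


pdf_from_2024 = {
    "type": "and",
    "filters": [
        {"type": "eq", "key": "file_type", "value": "pdf"},
        {"type": "gte", "key": "year", "value": 2024},
    ],
}
print(matches({"file_type": "pdf", "year": 2024}, pdf_from_2024))  # True
```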
 ### Hybrid Search
 [SQLite-vec](../providers/vector_io/inline_sqlite-vec.md), [pgvector](../providers/vector_io/remote_pgvector.md), and [Milvus](../providers/vector_io/inline_milvus.md) support combining vector and keyword search.
 ## Performance Considerations
 > **Note**: For detailed performance optimization strategies, see [Performance Considerations](../providers/files/openai_file_operations_support.md#performance-considerations) in the provider documentation.
 **Key Points:**
 - **Chunk Size**: smaller chunks (400-600 tokens) favor precise matches; larger chunks (800-1200 tokens) preserve more surrounding context
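 - **Storage**: FAISS is fastest but memory-limited; SQLite-vec balances performance and disk storage; Milvus is scalable; remote providers are managed but network-dependent
 - **Search**: Use metadata filters to narrow the search scope, and hybrid search where the provider supports it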
 ## Error Handling
 > **Note**: For comprehensive troubleshooting and error handling, see [Troubleshooting](../providers/files/openai_file_operations_support.md#troubleshooting) in the provider documentation.
 **Common Issues:**
 - File processing failures (format, size limits)
 - Search performance optimization
 - Storage and memory issues
 ## Best Practices
 > **Note**: For detailed best practices and recommendations, see [Best Practices](../providers/files/openai_file_operations_support.md#best-practices) in the provider documentation.
 **Key Recommendations:**
 - File organization and naming conventions
 - Chunking strategy optimization
 - Metadata and monitoring practices
 - Regular cleanup and maintenance
 ## Integration Examples
 ### RAG Application
 ```python
 # Build a RAG system with file uploads
 async def build_rag_system():
    # Create vector store
    vector_store = await client.vector_stores.create(name="knowledge_base")
    # Upload and process documents
    documents = ["doc1.pdf", "doc2.pdf", "doc3.pdf"]
    for doc in documents:
        with open(doc, "rb") as f:
            file_info = await client.files.create(file=f, purpose="assistants")
            await client.vector_stores.files.create(
                vector_store_id=vector_store.id, file_id=file_info.id
            )
    return vector_store
 # Query the RAG system
 async def query_rag(vector_store_id, question):
    results = await client.vector_stores.search(
        vector_store_id=vector_store_id, query=question, max_num_results=5
    )
    return results
 ```
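Before calling a model, the retrieved chunks are usually packed into a prompt. The sketch below assumes the search response has been decoded into a plain dict following the OpenAI vector store search shape (a `data` list whose items carry `file_id`, `filename`, `score`, and `content` entries of `{"type": "text", "text": ...}`); `build_context` and `build_prompt` are hypothetical helpers, and the character budget is an illustrative choice.

```python
def build_context(results: dict, max_chars: int = 2000) -> str:
    """Join retrieved chunk texts, best-scoring first, within a character budget.

    The budget counts snippet characters only, not the separators between them.
    """
    hits = sorted(results.get("data", []), key=lambda hit: hit.get("score", 0.0), reverse=True)
    parts: list[str] = []
    used = 0
    for hit in hits:
        text = " ".join(
            item["text"] for item in hit.get("content", []) if item.get("type") == "text"
        )
        source = hit.get("filename") or hit.get("file_id", "unknown")
        snippet = f"[{source}] {text}"
        if used + len(snippet) > max_chars:
            break
        parts.append(snippet)
        used += len(snippet)
    return "\n\n".join(parts)


def build_prompt(question: str, results: dict) -> str:
    """Wrap the retrieved context and the user's question into one prompt."""
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{build_context(results)}\n\n"
        f"Question: {question}"
    )
```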
 ### Document Analysis
 ```python
 # Analyze document content through vector search
 async def analyze_document(vector_store_id, file_id):
    # Get document content
    content = await client.vector_stores.files.retrieve_content(
        vector_store_id=vector_store_id, file_id=file_id
    )
    # Search for specific topics
    topics = ["introduction", "methodology", "conclusion"]
    analysis = {}
    for topic in topics:
        results = await client.vector_stores.search(
            vector_store_id=vector_store_id, query=topic, max_num_results=3
        )
        analysis[topic] = results.data
    return analysis
 ```
 ## Next Steps
 - Explore the [Files API documentation](../apis/files.md) for detailed API reference
 - Check [Vector Store Providers](../providers/vector_io/index.md) for specific implementation details
 - Review [Getting Started](../getting_started/index.md) for quick setup instructions
--- a/docs/docs/providers/files/files.mdx
+++ b/docs/docs/providers/files/files.mdx
@ -0,0 +1,290 @@
 ---
 sidebar_label: Files
 title: Files
 ---
 ## Overview
 The Files API provides file management capabilities for Llama Stack. It allows you to upload, store, retrieve, and manage files that can be used across various endpoints in your application.
 ## Features
 - **File Upload**: Upload files with metadata and purpose classification
 - **File Management**: List, retrieve, and delete files
 - **Content Retrieval**: Access raw file content for processing
 - **API Compatibility**: Full compatibility with OpenAI Files API endpoints
 - **Flexible Storage**: Support for local filesystem and cloud storage backends
 ## API Endpoints
 ### Upload File
 **POST** `/v1/openai/v1/files`
 Upload a file that can be used across various endpoints.
 **Request Body:**
 - `file`: The file object to be uploaded (multipart form data)
 - `purpose`: The intended purpose of the uploaded file
 **Supported Purposes:**
 - `assistants`: Files to attach to vector stores (used in the vector store examples)
 - `batch`: Files for batch operations
 **Response:**
 ```json
 {
  "id": "file-abc123",
  "object": "file",
  "bytes": 140,
  "created_at": 1613779121,
  "filename": "mydata.jsonl",
  "purpose": "batch"
 }
 ```
 **Example:**
 ```python
 import requests
 with open("data.jsonl", "rb") as f:
    files = {"file": f}
    data = {"purpose": "batch"}
    response = requests.post(
        "http://localhost:8000/v1/openai/v1/files", files=files, data=data
    )
    file_info = response.json()
 ```
 ### List Files
 **GET** `/v1/openai/v1/files`
 Returns a list of files that belong to the user's organization.
 **Query Parameters:**
 - `after` (optional): A cursor for pagination
 - `limit` (optional): Limit on number of objects (1-10,000, default: 10,000)
 - `order` (optional): Sort order by created_at timestamp (`asc` or `desc`, default: `desc`)
 - `purpose` (optional): Filter files by purpose
 **Response:**
 ```json
 {
  "object": "list",
  "data": [
    {
      "id": "file-abc123",
      "object": "file",
      "bytes": 140,
      "created_at": 1613779121,
      "filename": "mydata.jsonl",
      "purpose": "batch"
    }
  ],
  "has_more": false
 }
 ```
 **Example:**
 ```python
 import requests
 # List all files
 response = requests.get("http://localhost:8000/v1/openai/v1/files")
 files = response.json()
 # List files with pagination
 response = requests.get(
    "http://localhost:8000/v1/openai/v1/files",
    params={"limit": 10, "after": "file-abc123"},
 )
 files = response.json()
 # Filter by purpose
 response = requests.get(
    "http://localhost:8000/v1/openai/v1/files", params={"purpose": "batch"}
 )
 files = response.json()
 ```
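To walk every page, pass the ID of the last file on the current page as `after` until `has_more` is false (assuming the OpenAI list cursor convention shown in the response above). `iter_all_files` is a hypothetical helper that takes the page fetcher as a function, so it can be backed by `requests` against a running server or, as in the demo, by an in-memory stand-in.

```python
from typing import Callable, Iterator, Optional


def iter_all_files(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every file object, following the `after` cursor across pages.

    `fetch_page(after)` must return a list response shaped like the JSON above:
    {"object": "list", "data": [...], "has_more": bool}.
    """
    after: Optional[str] = None
    while True:
        page = fetch_page(after)
        data = page.get("data", [])
        yield from data
        if not page.get("has_more") or not data:
            return
        # The next page starts after the last file on this page
        after = data[-1]["id"]


# Against a running server, fetch_page could wrap the requests call above, e.g.:
# def fetch_page(after):
#     params = {"limit": 100, **({"after": after} if after else {})}
#     response = requests.get("http://localhost:8000/v1/openai/v1/files", params=params)
#     response.raise_for_status()
#     return response.json()

# In-memory demo pages keyed by the `after` cursor
demo_pages = {
    None: {"object": "list", "data": [{"id": "file-1"}, {"id": "file-2"}], "has_more": True},
    "file-2": {"object": "list", "data": [{"id": "file-3"}], "has_more": False},
}
print([f["id"] for f in iter_all_files(lambda after: demo_pages[after])])
```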
 ### Retrieve File
 **GET** `/v1/openai/v1/files/{file_id}`
 Returns information about a specific file.
 **Path Parameters:**
 - `file_id`: The ID of the file to retrieve
 **Response:**
 ```json
 {
  "id": "file-abc123",
  "object": "file",
  "bytes": 140,
  "created_at": 1613779121,
  "filename": "mydata.jsonl",
  "purpose": "batch"
 }
 ```
 **Example:**
 ```python
 import requests
 file_id = "file-abc123"
 response = requests.get(f"http://localhost:8000/v1/openai/v1/files/{file_id}")
 file_info = response.json()
 ```
 ### Delete File
 **DELETE** `/v1/openai/v1/files/{file_id}`
 Delete a file.
 **Path Parameters:**
 - `file_id`: The ID of the file to delete
 **Response:**
 ```json
 {
  "id": "file-abc123",
  "object": "file",
  "deleted": true
 }
 ```
 **Example:**
 ```python
 import requests
 file_id = "file-abc123"
 response = requests.delete(f"http://localhost:8000/v1/openai/v1/files/{file_id}")
 result = response.json()
 ```
 ### Retrieve File Content
 **GET** `/v1/openai/v1/files/{file_id}/content`
 Returns the raw file content as a binary response.
 **Path Parameters:**
 - `file_id`: The ID of the file to retrieve content from
 **Response:**
 Binary file content with appropriate headers:
 - `Content-Type`: `application/octet-stream`
 - `Content-Disposition`: `attachment; filename="filename"`
 **Example:**
 ```python
 import requests
 file_id = "file-abc123"
 response = requests.get(f"http://localhost:8000/v1/openai/v1/files/{file_id}/content")
 # Save content to file
 with open("downloaded_file.jsonl", "wb") as f:
    f.write(response.content)
 # Or process content directly
 content = response.content
 ```
 ## Vector Store Integration
 The Files API integrates with Vector Stores to enable document processing and search. For detailed information about this integration, see [File Operations and Vector Store Integration](../concepts/file_operations_vector_stores.md).
 ### Vector Store File Operations
 **List Vector Store Files:**
 - **GET** `/v1/openai/v1/vector_stores/{vector_store_id}/files`
 **Retrieve Vector Store File Content:**
 - **GET** `/v1/openai/v1/vector_stores/{vector_store_id}/files/{file_id}/content`
 **Attach File to Vector Store:**
 - **POST** `/v1/openai/v1/vector_stores/{vector_store_id}/files`
 ## Error Handling
 The Files API returns standard HTTP status codes and error responses:
 - `400 Bad Request`: Invalid request parameters
 - `404 Not Found`: File not found
 - `429 Too Many Requests`: Rate limit exceeded
 - `500 Internal Server Error`: Server error
 **Error Response Format:**
 ```json
 {
  "error": {
    "message": "Error description",
    "type": "invalid_request_error",
    "code": "file_not_found"
  }
 }
 ```
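A client can turn that error body into a typed exception. The sketch below is a hypothetical helper built only on the standard library; it assumes the error format shown above and falls back to the raw body when the response is not JSON or lacks an `error` object.

```python
import json
from typing import Optional


class FilesAPIError(Exception):
    """Raised when the Files API returns a non-2xx response."""

    def __init__(self, status: int, message: str, error_type: Optional[str], code: Optional[str]):
        super().__init__(f"{status} {error_type or 'error'}: {message}")
        self.status = status
        self.message = message
        self.error_type = error_type
        self.code = code


def raise_for_files_error(status: int, body: str) -> None:
    """Do nothing for 2xx responses; otherwise raise FilesAPIError from the error body."""
    if 200 <= status < 300:
        return
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = {}
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        error = {}
    raise FilesAPIError(status, error.get("message", body), error.get("type"), error.get("code"))
```

With `requests`, this would be called as `raise_for_files_error(response.status_code, response.text)`.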
 ## Rate Limits
 The Files API implements rate limiting to ensure fair usage:
 - File uploads: 100 files per minute
 - File retrievals: 1000 requests per minute
 - File deletions: 100 requests per minute
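When a limit is exceeded the API returns `429 Too Many Requests`, so bulk jobs usually retry with exponential backoff. The sketch below is a generic client-side pattern, not part of Llama Stack; `send` is any zero-argument callable returning `(status, body)`, and the sleep function is injectable for testing. It does not inspect a `Retry-After` header, since whether the server sends one is not stated here.

```python
import time
from typing import Callable


def call_with_backoff(
    send: Callable[[], tuple[int, object]],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, object]:
    """Call `send()`, retrying with exponential backoff while it returns HTTP 429.

    Returns the first non-429 response, or the last 429 once retries are exhausted.
    """
    attempt = 0
    while True:
        status, body = send()
        if status != 429 or attempt >= max_retries:
            return status, body
        # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
        sleep(base_delay * (2 ** attempt))
        attempt += 1
```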
 ## Best Practices
 1. **File Organization**: Use descriptive filenames and appropriate purpose classifications
 2. **Batch Operations**: For multiple files, consider using batch endpoints when available
 3. **Error Handling**: Always check response status codes and handle errors gracefully
 4. **Content Types**: Ensure files are uploaded with appropriate content types
 5. **Cleanup**: Regularly delete unused files to manage storage costs
 ## Integration Examples
 ### With Python Client
 ```python
 import asyncio
 from llama_stack_client import AsyncLlamaStackClient
 client = AsyncLlamaStackClient(base_url="http://localhost:8000")
 async def main():
    # Upload a file
    with open("data.jsonl", "rb") as f:
        file_info = await client.files.create(file=f, purpose="batch")
    # List files
    files = await client.files.list(purpose="batch")
    # Retrieve file content
    content = await client.files.content(file_info.id)
 asyncio.run(main())
 ```
 ### With cURL
 ```bash
 # Upload file
 curl -X POST http://localhost:8000/v1/openai/v1/files \
  -F "file=@data.jsonl" \
  -F "purpose=batch"
 # List files
 curl http://localhost:8000/v1/openai/v1/files
 # Download file content
 curl http://localhost:8000/v1/openai/v1/files/file-abc123/content \
  -o downloaded_file.jsonl
 ```
 ## Provider Support
 The Files API supports multiple storage backends:
 - **Local Filesystem**: Store files on local disk (inline provider)
 - **S3**: Store files in AWS S3 or S3-compatible services (remote provider)
 - **Custom Backends**: Extensible architecture for custom storage providers
 See the [Files Providers](index.md) documentation for detailed configuration options.
--- a/docs/docs/providers/files/openai_file_operations_quick_reference.md
+++ b/docs/docs/providers/files/openai_file_operations_quick_reference.md
@ -0,0 +1,80 @@
 # File Operations Quick Reference
 ## Overview
 As of release 0.2.14, Llama Stack provides comprehensive file operations and Vector Store API integration, following the [OpenAI Vector Store Files API specification](https://platform.openai.com/docs/api-reference/vector-stores-files).
 > **Note**: For detailed overview and implementation details, see [Overview](openai_file_operations_support.md#overview) in the full documentation.
 ## Supported Providers
 > **Note**: For complete provider details and features, see [Supported Providers](openai_file_operations_support.md#supported-providers) in the full documentation.
 **Inline Providers**: FAISS, SQLite-vec, Milvus
 **Remote Providers**: ChromaDB, Qdrant, Weaviate, PGVector
 ## Quick Start
 ### 1. Upload File
 ```python
 with open("document.pdf", "rb") as f:
    file_info = await client.files.create(file=f, purpose="assistants")
 ```
 ### 2. Create Vector Store
 ```python
 vector_store = await client.vector_stores.create(name="my_docs")
 ```
 ### 3. Attach File
 ```python
 await client.vector_stores.files.create(
    vector_store_id=vector_store.id, file_id=file_info.id
 )
 ```
 ### 4. Search
 ```python
 results = await client.vector_stores.search(
    vector_store_id=vector_store.id, query="What is the main topic?", max_num_results=5
 )
 ```
 ## File Processing & Search
 **Processing**: 800 tokens default chunk size, 400 token overlap
 **Formats**: PDF, DOCX, TXT, Code files, etc.
 **Search**: Vector similarity, Hybrid (SQLite-vec, pgvector, Milvus), Filtered with metadata
 ## Configuration
 > **Note**: For detailed configuration examples and options, see [Configuration Examples](openai_file_operations_support.md#configuration-examples) in the full documentation.
 **Basic Setup**: Configure vector_io and files providers in your run.yaml
 ## Common Use Cases
 - **RAG Systems**: Document Q&A with file uploads
 - **Knowledge Bases**: Searchable document collections
 - **Content Analysis**: Document similarity and clustering
 - **Research Tools**: Literature review and analysis
 ## Performance Tips
 > **Note**: For detailed performance optimization strategies, see [Performance Considerations](openai_file_operations_support.md#performance-considerations) in the full documentation.
 **Quick Tips**: Choose provider based on your needs (speed vs. storage vs. scalability)
 ## Troubleshooting
 > **Note**: For comprehensive troubleshooting, see [Troubleshooting](openai_file_operations_support.md#troubleshooting) in the full documentation.
 **Quick Fixes**: Check file format compatibility, optimize chunk sizes, monitor storage
 ## Resources
 - [Full Documentation](openai_file_operations_support.md)
 - [Integration Guide](../concepts/file_operations_vector_stores.md)
 - [Files API](files_api.md)
 - [Provider Details](../vector_io/index.md)
--- a/docs/docs/providers/files/openai_file_operations_support.md
+++ b/docs/docs/providers/files/openai_file_operations_support.md
@ -0,0 +1,292 @@
 # File Operations Support in Vector Store Providers
 ## Overview
 This document provides a comprehensive overview of file operations and Vector Store API support across all available vector store providers in Llama Stack. As of release 0.2.14, the following providers support full file operations integration.
 ## Supported Providers
 ### ✅ Full File Operations Support
 The following providers support complete file operations integration, including file upload, automatic processing, and search:
 #### Inline Providers (Single Node)
 | Provider | File Operations | Key Features |
 |----------|----------------|--------------|
 | **FAISS** | ✅ Full Support | Fast in-memory search, GPU acceleration |
 | **SQLite-vec** | ✅ Full Support | Hybrid search, disk-based storage |
 | **Milvus** | ✅ Full Support | High-performance, scalable indexing |
 #### Remote Providers (Hosted)
 | Provider | File Operations | Key Features |
 |----------|----------------|--------------|
 | **ChromaDB** | ✅ Full Support | Metadata filtering, persistent storage |
 | **Qdrant** | ✅ Full Support | Payload filtering, advanced search |
 | **Weaviate** | ✅ Full Support | GraphQL interface, schema management |
 | **Postgres (PGVector)** | ✅ Full Support | SQL integration, ACID compliance |
 ### 🔄 Partial Support
 Some providers may support basic vector operations but lack full file operations integration:
 | Provider | Status | Notes |
 |----------|--------|-------|
 | **Meta Reference** | 🔄 Basic | Core vector operations only |
 ## File Operations Features
 All supported providers offer the following file operations capabilities:
 ### Core Functionality
 - **File Upload & Processing**: Automatic document ingestion and chunking
 - **Vector Storage**: Embedding generation and storage
 - **Search & Retrieval**: Semantic search with metadata filtering
 - **File Management**: List, retrieve, and manage files in vector stores
 ### Advanced Features
 - **Automatic Chunking**: Configurable chunk sizes and overlap
 - **Metadata Preservation**: File attributes and chunk metadata
 - **Status Tracking**: Monitor file processing progress
 - **Error Handling**: Comprehensive error reporting and recovery
 ## Implementation Details
 ### File Processing Pipeline
 1. **Upload**: File uploaded via Files API
 2. **Extraction**: Text content extracted from various formats
 3. **Chunking**: Content split into optimal chunks (default: 800 tokens)
 4. **Embedding**: Chunks converted to vector embeddings
 5. **Storage**: Vectors stored with metadata in vector database
 6. **Indexing**: Search index updated for fast retrieval
 ### Supported File Formats
 - **Documents**: PDF, DOCX, DOC
 - **Text**: TXT, MD, RST
 - **Code**: Python, JavaScript, Java, C++, etc.
 - **Data**: JSON, CSV, XML
 - **Web**: HTML files
 ### Chunking Strategies
 - **Default**: 800 tokens with 400 token overlap
 - **Custom**: Configurable chunk sizes and overlap
 - **Semantic**: Intelligent boundary detection
 - **Static**: Fixed-size chunks with overlap
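Static chunking slides a fixed-size window over the token sequence, advancing by `max_chunk_size_tokens - chunk_overlap_tokens` each step. The sketch below illustrates that windowing with the default 800/400 values; the parameter names mirror the OpenAI `static` chunking strategy, and whitespace-split words stand in for real tokenizer output (an assumption; Llama Stack's actual tokenizer and implementation may differ).

```python
def static_chunks(
    tokens: list[str],
    max_chunk_size_tokens: int = 800,
    chunk_overlap_tokens: int = 400,
) -> list[list[str]]:
    """Split `tokens` into fixed-size windows that overlap by `chunk_overlap_tokens`."""
    if max_chunk_size_tokens <= 0:
        raise ValueError("max_chunk_size_tokens must be positive")
    if not 0 <= chunk_overlap_tokens < max_chunk_size_tokens:
        raise ValueError("chunk_overlap_tokens must be in [0, max_chunk_size_tokens)")
    step = max_chunk_size_tokens - chunk_overlap_tokens
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_chunk_size_tokens])
        # Stop once a window reaches the end, so no chunk is a pure suffix of the previous one
        if start + max_chunk_size_tokens >= len(tokens):
            break
    return chunks


words = "a b c d e f g h i j".split()  # stand-in "tokens"
for chunk in static_chunks(words, max_chunk_size_tokens=4, chunk_overlap_tokens=2):
    print(chunk)
```

Each window shares its last two "tokens" with the start of the next, which is what keeps a sentence that straddles a boundary retrievable from either chunk.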
 ## Provider-Specific Features
 ### FAISS
 - **Storage**: In-memory with optional persistence
 - **Performance**: Optimized for speed and GPU acceleration
 - **Use Case**: High-performance search when the index fits in memory
 ### SQLite-vec
 - **Storage**: Disk-based with SQLite backend
 - **Search**: Hybrid vector + keyword search
 - **Use Case**: Large document collections, frequent updates
 ### Milvus
 - **Storage**: Scalable distributed storage
 - **Indexing**: Multiple index types (IVF, HNSW)
 - **Use Case**: Production deployments, large-scale applications
 ### ChromaDB
 - **Storage**: Persistent storage with metadata
 - **Filtering**: Advanced metadata filtering
 - **Use Case**: Applications requiring rich metadata
 ### Qdrant
 - **Storage**: High-performance vector database
 - **Filtering**: Payload-based filtering
 - **Use Case**: Real-time applications, complex queries
 ### Weaviate
 - **Storage**: GraphQL-native vector database
 - **Schema**: Flexible schema management
 - **Use Case**: Applications requiring complex data relationships
 ### Postgres (PGVector)
 - **Storage**: SQL database with vector extensions
 - **Integration**: ACID compliance, existing SQL workflows
 - **Use Case**: Applications requiring transactional guarantees
 ## Configuration Examples
 ### Basic Configuration
 ```yaml
 vector_io:
  - provider_id: faiss
    provider_type: inline::faiss
    config:
      kvstore:
        type: sqlite
        db_path: ~/.llama/faiss_store.db
 ```
 ### With Files Provider Support
 ```yaml
 vector_io:
  - provider_id: faiss
    provider_type: inline::faiss
    config:
      kvstore:
        type: sqlite
        db_path: ~/.llama/faiss_store.db
 files:
  - provider_id: local-files
    provider_type: inline::localfs
    config:
      storage_dir: ~/.llama/files
      metadata_store:
        type: sqlite
        db_path: ~/.llama/files_metadata.db
 ```
 ## Usage Examples
 ### Python Client
 ```python
 import asyncio
 from llama_stack_client import AsyncLlamaStackClient
 client = AsyncLlamaStackClient(base_url="http://localhost:8000")
 async def main():
    # Create vector store
    vector_store = await client.vector_stores.create(name="documents")
    # Upload file
    with open("document.pdf", "rb") as f:
        file_info = await client.files.create(file=f, purpose="assistants")
    # Attach to vector store
    await client.vector_stores.files.create(
        vector_store_id=vector_store.id, file_id=file_info.id
    )
    # Search
    results = await client.vector_stores.search(
        vector_store_id=vector_store.id, query="What is the main topic?", max_num_results=5
    )
 asyncio.run(main())
 ```
 ### cURL Commands
 ```bash
 # Upload file
 curl -X POST http://localhost:8000/v1/openai/v1/files \
  -F "file=@document.pdf" \
  -F "purpose=assistants"
 # Create vector store
 curl -X POST http://localhost:8000/v1/openai/v1/vector_stores \
  -H "Content-Type: application/json" \
  -d '{"name": "documents"}'
 # Attach file to vector store
 curl -X POST http://localhost:8000/v1/openai/v1/vector_stores/{store_id}/files \
  -H "Content-Type: application/json" \
  -d '{"file_id": "file-abc123"}'
 # Search vector store
 curl -X POST http://localhost:8000/v1/openai/v1/vector_stores/{store_id}/search \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the main topic?", "max_num_results": 5}'
 ```
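The same search call can be made from Python's standard library when the client SDK is not installed. This sketch builds the request from the cURL example with `urllib.request`; `BASE_URL` and the store ID are placeholders, and `search` only works against a running server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1/openai/v1"  # placeholder server address


def build_search_request(store_id: str, query: str, max_num_results: int = 5) -> urllib.request.Request:
    """Build (but do not send) the vector store search request from the cURL example."""
    payload = json.dumps({"query": query, "max_num_results": max_num_results}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/vector_stores/{store_id}/search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(store_id: str, query: str, max_num_results: int = 5) -> dict:
    """Send the search request to a running server and decode the JSON response."""
    with urllib.request.urlopen(build_search_request(store_id, query, max_num_results)) as resp:
        return json.load(resp)
```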
 ## Performance Considerations
 ### Chunk Size Optimization
 - **Small chunks (400-600 tokens)**: Better precision, more results
 - **Large chunks (800-1200 tokens)**: Better context, fewer results
 - **Overlap (50%)**: Maintains context between chunks
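The trade-off is easy to quantify. With fixed-size chunks of `S` tokens and overlap `O`, the first chunk covers `S` tokens and each further chunk adds `S - O` new ones, so a document of `N > 0` tokens needs `1 + ceil(max(0, N - S) / (S - O))` chunks. The helper below is a local arithmetic sketch of that formula, comparing chunk sizes at the same 50% overlap.

```python
import math


def chunk_count(num_tokens: int, max_chunk_size_tokens: int = 800,
                chunk_overlap_tokens: int = 400) -> int:
    """Number of fixed-size, overlapping chunks needed to cover `num_tokens` tokens."""
    if num_tokens <= 0:
        return 0
    step = max_chunk_size_tokens - chunk_overlap_tokens
    if step <= 0:
        raise ValueError("overlap must be smaller than the chunk size")
    # First chunk covers max_chunk_size_tokens; each later chunk adds `step` new tokens
    return 1 + math.ceil(max(0, num_tokens - max_chunk_size_tokens) / step)


for size, overlap in [(500, 250), (800, 400), (1000, 500)]:
    print(size, overlap, chunk_count(10_000, size, overlap))
# 500 250 39
# 800 400 24
# 1000 500 19
```

At 50% overlap most tokens land in two chunks, so a store embeds roughly twice the document's token count; smaller chunks also mean more, shorter candidates per search.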
 ### Storage Efficiency
 - **FAISS**: Fastest, but memory-limited
 - **SQLite-vec**: Good balance of performance and storage
 - **Milvus**: Scalable, production-ready
 - **Remote providers**: Managed, but network-dependent
 ### Search Performance
 - **Vector search**: Fastest for semantic queries
 - **Hybrid search**: Often the best accuracy (SQLite-vec, pgvector, and Milvus)
 - **Filtered search**: Fast with metadata constraints
 ## Troubleshooting
 ### Common Issues
 1. **File Processing Failures**
   - Check file format compatibility
   - Verify file size limits
   - Review error messages in file status
 2. **Search Performance**
   - Optimize chunk sizes for your use case
   - Use filters to narrow search scope
   - Monitor vector store metrics
 3. **Storage Issues**
   - Check available disk space
   - Verify database permissions
   - Monitor memory usage (for in-memory providers)
 ### Monitoring
 ```python
 # Check file processing status
 file_status = await client.vector_stores.files.retrieve(
    vector_store_id=vector_store.id, file_id=file_info.id
 )
 if file_status.status == "failed":
    print(f"Error: {file_status.last_error.message}")
 # Monitor vector store status and file counts
 store = await client.vector_stores.retrieve(vector_store_id=vector_store.id)
 print(f"Status: {store.status}")
 print(f"Files: {store.file_counts}")
 ```
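Since processing happens after a file is attached, callers typically poll until the file reaches a terminal state. The sketch below assumes the OpenAI vector store file statuses (`in_progress`, `completed`, `failed`, `cancelled`); `wait_for_file` is a hypothetical helper that takes a zero-argument status callable, with injectable sleep and clock for testing. It is written synchronously; with the async client the same loop applies using `await` and `asyncio.sleep`.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_file(
    retrieve_status: Callable[[], str],
    poll_interval: float = 1.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll `retrieve_status()` until it returns a terminal status or `timeout` elapses."""
    deadline = clock() + timeout
    while True:
        status = retrieve_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"file still {status!r} after {timeout} seconds")
        sleep(poll_interval)
```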
 ## Best Practices
 1. **File Organization**: Use descriptive names and organize by purpose
 2. **Chunking Strategy**: Test different sizes for your specific use case
 3. **Metadata**: Add relevant attributes for better filtering
 4. **Monitoring**: Track processing status and search performance
 5. **Cleanup**: Regularly remove unused files to manage storage
 ## Future Enhancements
 Planned improvements for file operations support:
 - **Batch Processing**: Process multiple files simultaneously
 - **Advanced Chunking**: More sophisticated chunking algorithms
 - **Custom Embeddings**: Support for custom embedding models
 - **Real-time Updates**: Live file processing and indexing
 - **Multi-format Support**: Enhanced file format support
 ## Support and Resources
 - **Documentation**: [File Operations and Vector Store Integration](../concepts/file_operations_vector_stores.md)
 - **API Reference**: [Files API](files_api.md)
 - **Provider Docs**: [Vector Store Providers](../vector_io/index.md)
 - **Examples**: [Getting Started](../getting_started/index.md)
 - **Community**: [GitHub Discussions](https://github.com/meta-llama/llama-stack/discussions)
--- a/docs/docs/providers/index.mdx
+++ b/docs/docs/providers/index.mdx
@ -22,7 +22,7 @@ Importantly, Llama Stack always strives to provide at least one fully inline pro
 ## Provider Categories
 - **[External Providers](external/index.mdx)** - Guide for building and using external providers
- **[OpenAI Compatibility](./openai.mdx)** - OpenAI API compatibility layer
+- **[OpenAI Compatibility](../api-openai/index.mdx)** - OpenAI API compatibility layer
 - **[Inference](inference/index.mdx)** - LLM and embedding model providers
 - **[Agents](agents/index.mdx)** - Agentic system providers
 - **[DatasetIO](datasetio/index.mdx)** - Dataset and data loader providers
@ -31,3 +31,12 @@ Importantly, Llama Stack always strives to provide at least one fully inline pro
 - **[Vector IO](vector_io/index.mdx)** - Vector database providers
 - **[Tool Runtime](tool_runtime/index.mdx)** - Tool and protocol providers
 - **[Files](files/index.mdx)** - File system and storage providers
 ## API Documentation
 For comprehensive API documentation and reference:
 - **[API Reference](../api/index.mdx)** - Complete API documentation
 - **[Experimental APIs](../api-experimental/index.mdx)** - APIs in development
 - **[Deprecated APIs](../api-deprecated/index.mdx)** - Legacy APIs being phased out
 - **[OpenAI Compatibility](../api-openai/index.mdx)** - OpenAI API compatibility guide