diff --git a/docs/docs/api-deprecated/index.mdx b/docs/docs/api-deprecated/index.mdx new file mode 100644 index 000000000..0da357e30 --- /dev/null +++ b/docs/docs/api-deprecated/index.mdx @@ -0,0 +1,62 @@ +--- +title: Deprecated APIs +description: Legacy APIs that are being phased out +sidebar_label: Deprecated +sidebar_position: 1 +--- + +# Deprecated APIs + +This section contains APIs that are being phased out in favor of newer, more standardized implementations. These APIs are maintained for backward compatibility but are not recommended for new projects. + +:::warning Deprecation Notice +These APIs are deprecated and will be removed in future versions. Please migrate to the recommended alternatives listed below. +::: + +## Migration Guide + +When using deprecated APIs, please refer to the migration guides provided for each API to understand how to transition to the supported alternatives. + +## Deprecated API List + +### Legacy Inference APIs +Some older inference endpoints that have been superseded by the standardized Inference API. + +**Migration Path:** Use the [Inference API](../api/) instead. + +### Legacy Vector Operations +Older vector database operations that have been replaced by the Vector IO API. + +**Migration Path:** Use the [Vector IO API](../api/) instead. + +### Legacy File Operations +Older file management endpoints that have been replaced by the Files API. + +**Migration Path:** Use the [Files API](../api/) instead. + +## Support Timeline + +Deprecated APIs will be supported according to the following timeline: + +- **Current Version**: Full support with deprecation warnings +- **Next Major Version**: Limited support with migration notices +- **Following Major Version**: Removal of deprecated APIs + +## Getting Help + +If you need assistance migrating from deprecated APIs: + +1. Check the specific migration guides for each API +2. Review the [API Reference](../api/) for current alternatives +3. 
Consult the [Community Forums](https://github.com/llamastack/llama-stack/discussions) for migration support +4. Open an issue on GitHub for specific migration questions + +## Contributing + +If you find issues with deprecated APIs or have suggestions for improving the migration process, please contribute by: + +1. Opening an issue describing the problem +2. Submitting a pull request with improvements +3. Updating migration documentation + +For more information on contributing, see our [Contributing Guide](../contributing/). diff --git a/docs/docs/api-experimental/index.mdx b/docs/docs/api-experimental/index.mdx new file mode 100644 index 000000000..adbd64582 --- /dev/null +++ b/docs/docs/api-experimental/index.mdx @@ -0,0 +1,128 @@ +--- +title: Experimental APIs +description: APIs in development with limited support +sidebar_label: Experimental +sidebar_position: 1 +--- + +# Experimental APIs + +This section contains APIs that are currently in development and may have limited support or stability. These APIs are available for testing and feedback but should not be used in production environments. + +:::warning Experimental Notice +These APIs are experimental and may change without notice. Use with caution and provide feedback to help improve them. +::: + +## Current Experimental APIs + +### Batch Inference API +Run inference on a dataset of inputs in batch mode for improved efficiency. + +**Status:** In Development +**Provider Support:** Limited +**Use Case:** Large-scale inference operations + +**Features:** +- Batch processing of multiple inputs +- Optimized resource utilization +- Progress tracking and monitoring + +### Batch Agents API +Run agentic workflows on a dataset of inputs in batch mode. 
+ +**Status:** In Development +**Provider Support:** Limited +**Use Case:** Large-scale agent operations + +**Features:** +- Batch agent execution +- Parallel processing capabilities +- Result aggregation and analysis + +### Synthetic Data Generation API +Generate synthetic data for model development and testing. + +**Status:** Early Development +**Provider Support:** Very Limited +**Use Case:** Training data augmentation + +**Features:** +- Automated data generation +- Quality control mechanisms +- Customizable generation parameters + +### Batches API (OpenAI-compatible) +OpenAI-compatible batch management for inference operations. + +**Status:** In Development +**Provider Support:** Limited +**Use Case:** OpenAI batch processing compatibility + +**Features:** +- OpenAI batch API compatibility +- Job scheduling and management +- Status tracking and monitoring + +## Getting Started with Experimental APIs + +### Prerequisites +- Llama Stack server running with experimental features enabled +- Appropriate provider configurations +- Understanding of API limitations + +### Configuration +Experimental APIs may require special configuration flags or provider settings. Check the specific API documentation for setup requirements. + +### Usage Guidelines +1. **Testing Only**: Use experimental APIs for testing and development only +2. **Monitor Changes**: Watch for updates and breaking changes +3. **Provide Feedback**: Report issues and suggest improvements +4. 
**Backup Data**: Always backup important data when using experimental features + +## Feedback and Contribution + +We encourage feedback on experimental APIs to help improve them: + +### Reporting Issues +- Use GitHub issues with the "experimental" label +- Include detailed error messages and reproduction steps +- Specify the API version and provider being used + +### Feature Requests +- Submit feature requests through GitHub discussions +- Provide use cases and expected behavior +- Consider contributing implementations + +### Testing +- Test experimental APIs in your environment +- Report performance issues and optimization opportunities +- Share success stories and use cases + +## Migration to Stable APIs + +As experimental APIs mature, they will be moved to the stable API section. When this happens: + +1. **Announcement**: We'll announce the promotion in release notes +2. **Migration Guide**: Detailed migration instructions will be provided +3. **Deprecation Timeline**: Experimental versions will be deprecated with notice +4. **Support**: Full support will be available for stable versions + +## Provider Support + +Experimental APIs may have limited provider support. Check the specific API documentation for: + +- Supported providers +- Configuration requirements +- Known limitations +- Performance characteristics + +## Roadmap + +Experimental APIs are part of our ongoing development roadmap: + +- **Q1 2024**: Batch Inference API stabilization +- **Q2 2024**: Batch Agents API improvements +- **Q3 2024**: Synthetic Data Generation API expansion +- **Q4 2024**: Batches API full OpenAI compatibility + +For the latest updates, follow our [GitHub releases](https://github.com/llamastack/llama-stack/releases) and [roadmap discussions](https://github.com/llamastack/llama-stack/discussions). 
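As a closing illustration of the batch workflow the experimental Batches API targets, the sketch below prepares a batch input file in the OpenAI batch file format. The model name, prompts, and field layout follow OpenAI's conventions and are assumptions here, not a confirmed Llama Stack contract; verify against your server version before relying on them.

```python
import json
import tempfile

# Illustrative prompts and model name -- substitute your own.
prompts = ["What is RAG?", "Summarize Llama Stack in one sentence."]

# Each line of a batch input file is one self-contained request
# (custom_id, method, url, body), following OpenAI's batch file format.
batch_requests = [
    {
        "custom_id": f"request-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "llama-3.1-8b",
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    for i, prompt in enumerate(prompts)
]

# Write one JSON object per line (JSONL), ready to upload via the Files API.
batch_file = tempfile.NamedTemporaryFile(mode="w", suffix=".jsonl", delete=False)
with batch_file as f:
    for request in batch_requests:
        f.write(json.dumps(request) + "\n")

print(f"Wrote {len(batch_requests)} requests to {batch_file.name}")
```

Once uploaded with `purpose="batch"`, the resulting file ID would be passed to the batch-creation endpoint; because this surface is experimental, check the Batches API documentation for the exact call shape.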
diff --git a/docs/docs/api-openai/index.mdx b/docs/docs/api-openai/index.mdx new file mode 100644 index 000000000..21a377fca --- /dev/null +++ b/docs/docs/api-openai/index.mdx @@ -0,0 +1,278 @@ +--- +title: OpenAI API Compatibility +description: OpenAI-compatible APIs and features in Llama Stack +sidebar_label: OpenAI Compatibility +sidebar_position: 1 +--- + +# OpenAI API Compatibility + +Llama Stack provides comprehensive OpenAI API compatibility, allowing you to use existing OpenAI API clients and tools with Llama Stack providers. This compatibility layer ensures seamless migration and interoperability. + +## Overview + +OpenAI API compatibility in Llama Stack includes: + +- **OpenAI-compatible endpoints** for all major APIs +- **Request/response format compatibility** with OpenAI standards +- **Authentication and authorization** using OpenAI-style API keys +- **Error handling** with OpenAI-compatible error codes and messages +- **Rate limiting** and usage tracking compatible with OpenAI patterns + +## Supported OpenAI APIs + +### Chat Completions API +OpenAI-compatible chat completions for conversational AI applications. + +**Endpoint:** `/v1/chat/completions` +**Compatibility:** Full OpenAI API compatibility +**Providers:** All inference providers + +**Features:** +- Message-based conversations +- System prompts and user messages +- Function calling support +- Streaming responses +- Temperature and other parameter controls + +### Completions API +OpenAI-compatible text completions for general text generation. + +**Endpoint:** `/v1/completions` +**Compatibility:** Full OpenAI API compatibility +**Providers:** All inference providers + +**Features:** +- Text completion generation +- Prompt engineering support +- Customizable parameters +- Batch processing capabilities + +### Embeddings API +OpenAI-compatible embeddings for vector operations. 
+ +**Endpoint:** `/v1/embeddings` +**Compatibility:** Full OpenAI API compatibility +**Providers:** All embedding providers + +**Features:** +- Text embedding generation +- Multiple embedding models +- Batch embedding processing +- Vector similarity operations + +### Files API +OpenAI-compatible file management for document processing. + +**Endpoint:** `/v1/files` +**Compatibility:** Full OpenAI API compatibility +**Providers:** Local Filesystem, S3 + +**Features:** +- File upload and management +- Document processing +- File metadata tracking +- Secure file access + +### Vector Store Files API +OpenAI-compatible vector store file operations for RAG applications. + +**Endpoint:** `/v1/vector_stores/{vector_store_id}/files` +**Compatibility:** Full OpenAI API compatibility +**Providers:** FAISS, SQLite-vec, Milvus, ChromaDB, Qdrant, Weaviate, Postgres (PGVector) + +**Features:** +- Automatic document processing +- Vector store integration +- File chunking and indexing +- Search and retrieval operations + +### Batches API +OpenAI-compatible batch processing for large-scale operations. 
+ +**Endpoint:** `/v1/batches` +**Compatibility:** OpenAI API compatibility (experimental) +**Providers:** Limited support + +**Features:** +- Batch job creation and management +- Progress tracking +- Result retrieval +- Error handling + +## Migration from OpenAI + +### Step 1: Update API Endpoint +Change your API endpoint from OpenAI to your Llama Stack server: + +```python +# Before (OpenAI) +import openai +client = openai.OpenAI(api_key="your-openai-key") + +# After (Llama Stack) +import openai +client = openai.OpenAI( + api_key="your-llama-stack-key", + base_url="http://localhost:8000/v1" # Your Llama Stack server +) +``` + +### Step 2: Configure Providers +Set up your preferred providers in the Llama Stack configuration: + +```yaml +# stack-config.yaml +inference: + providers: + - name: "meta-reference" + type: "inline" + model: "llama-3.1-8b" +``` + +### Step 3: Test Compatibility +Verify that your existing code works with Llama Stack: + +```python +# Test chat completions +response = client.chat.completions.create( + model="llama-3.1-8b", + messages=[ + {"role": "user", "content": "Hello, world!"} + ] +) +print(response.choices[0].message.content) +``` + +## Provider-Specific Features + +### Meta Reference Provider +- Full OpenAI API compatibility +- Local model execution +- Custom model support + +### Remote Providers +- OpenAI API compatibility +- Cloud-based execution +- Scalable infrastructure + +### Vector Store Providers +- OpenAI vector store API compatibility +- Automatic document processing +- Advanced search capabilities + +## Authentication + +Llama Stack supports OpenAI-style authentication: + +### API Key Authentication +```python +client = openai.OpenAI( + api_key="your-api-key", + base_url="http://localhost:8000/v1" +) +``` + +### Environment Variables +```bash +export OPENAI_API_KEY="your-api-key" +export OPENAI_BASE_URL="http://localhost:8000/v1" +``` + +## Error Handling + +Llama Stack provides OpenAI-compatible error responses: + 
+```python +try: + response = client.chat.completions.create(...) +except openai.APIError as e: + print(f"API Error: {e}") +except openai.RateLimitError as e: + print(f"Rate Limit Error: {e}") +except openai.APIConnectionError as e: + print(f"Connection Error: {e}") +``` + +## Rate Limiting + +OpenAI-compatible rate limiting is supported: + +- **Requests per minute** limits +- **Tokens per minute** limits +- **Concurrent request** limits +- **Usage tracking** and monitoring + +## Monitoring and Observability + +Track your API usage with OpenAI-compatible monitoring: + +- **Request/response logging** +- **Usage metrics** and analytics +- **Performance monitoring** +- **Error tracking** and alerting + +## Best Practices + +### 1. Provider Selection +Choose providers based on your requirements: +- **Local development**: Meta Reference, Ollama +- **Production**: Cloud providers (Fireworks, Together, NVIDIA) +- **Specialized use cases**: Custom providers + +### 2. Model Configuration +Configure models for optimal performance: +- **Model selection** based on task requirements +- **Parameter tuning** for specific use cases +- **Resource allocation** for performance + +### 3. Error Handling +Implement robust error handling: +- **Retry logic** for transient failures +- **Fallback providers** for high availability +- **Monitoring** and alerting for issues + +### 4. Security +Follow security best practices: +- **API key management** and rotation +- **Access control** and authorization +- **Data privacy** and compliance + +## Troubleshooting + +### Common Issues + +**Connection Errors** +- Verify server is running +- Check network connectivity +- Validate API endpoint URL + +**Authentication Errors** +- Verify API key is correct +- Check key permissions +- Ensure proper authentication headers + +**Model Errors** +- Verify model is available +- Check provider configuration +- Validate model parameters + +### Getting Help + +For OpenAI compatibility issues: + +1. 
**Check Documentation**: Review provider-specific documentation +2. **Community Support**: Ask questions in GitHub discussions +3. **Issue Reporting**: Open GitHub issues for bugs +4. **Professional Support**: Contact support for enterprise issues + +## Roadmap + +Upcoming OpenAI compatibility features: + +- **Enhanced batch processing** support +- **Advanced function calling** capabilities +- **Improved error handling** and diagnostics +- **Performance optimizations** for large-scale deployments + +For the latest updates, follow our [GitHub releases](https://github.com/llamastack/llama-stack/releases) and [roadmap discussions](https://github.com/llamastack/llama-stack/discussions). diff --git a/docs/docs/api/index.mdx b/docs/docs/api/index.mdx new file mode 100644 index 000000000..7088c6c2b --- /dev/null +++ b/docs/docs/api/index.mdx @@ -0,0 +1,144 @@ +--- +title: API Reference +description: Complete reference for Llama Stack APIs +sidebar_label: Overview +sidebar_position: 1 +--- + +# API Reference + +Llama Stack provides a comprehensive set of APIs for building generative AI applications. All APIs follow OpenAI-compatible standards and can be used interchangeably across different providers. + +## Core APIs + +### Inference API +Run inference with Large Language Models (LLMs) and embedding models. + +**Supported Providers:** +- Meta Reference (Single Node) +- Ollama (Single Node) +- Fireworks (Hosted) +- Together (Hosted) +- NVIDIA NIM (Hosted and Single Node) +- vLLM (Hosted and Single Node) +- TGI (Hosted and Single Node) +- AWS Bedrock (Hosted) +- Cerebras (Hosted) +- Groq (Hosted) +- SambaNova (Hosted) +- PyTorch ExecuTorch (On-device iOS, Android) +- OpenAI (Hosted) +- Anthropic (Hosted) +- Gemini (Hosted) +- WatsonX (Hosted) + +### Agents API +Run multi-step agentic workflows with LLMs, including tool usage, memory (RAG), and complex reasoning. 
+ +**Supported Providers:** +- Meta Reference (Single Node) +- Fireworks (Hosted) +- Together (Hosted) +- PyTorch ExecuTorch (On-device iOS) + +### Vector IO API +Perform operations on vector stores, including adding documents, searching, and deleting documents. + +**Supported Providers:** +- FAISS (Single Node) +- SQLite-Vec (Single Node) +- Chroma (Hosted and Single Node) +- Milvus (Hosted and Single Node) +- Postgres (PGVector) (Hosted and Single Node) +- Weaviate (Hosted) +- Qdrant (Hosted and Single Node) + +### Files API (OpenAI-compatible) +Manage file uploads, storage, and retrieval with OpenAI-compatible endpoints. + +**Supported Providers:** +- Local Filesystem (Single Node) +- S3 (Hosted) + +### Vector Store Files API (OpenAI-compatible) +Integrate file operations with vector stores for automatic document processing and search. + +**Supported Providers:** +- FAISS (Single Node) +- SQLite-vec (Single Node) +- Milvus (Single Node) +- ChromaDB (Hosted and Single Node) +- Qdrant (Hosted and Single Node) +- Weaviate (Hosted) +- Postgres (PGVector) (Hosted and Single Node) + +### Safety API +Apply safety policies to outputs at a systems level, not just model level. + +**Supported Providers:** +- Llama Guard (Depends on Inference Provider) +- Prompt Guard (Single Node) +- Code Scanner (Single Node) +- AWS Bedrock (Hosted) + +### Post Training API +Fine-tune models for specific use cases and domains. + +**Supported Providers:** +- Meta Reference (Single Node) +- HuggingFace (Single Node) +- TorchTune (Single Node) +- NVIDIA NEMO (Hosted) + +### Eval API +Generate outputs and perform scoring to evaluate system performance. + +**Supported Providers:** +- Meta Reference (Single Node) +- NVIDIA NEMO (Hosted) + +### Telemetry API +Collect telemetry data from the system for monitoring and observability. + +**Supported Providers:** +- Meta Reference (Single Node) + +### Tool Runtime API +Interact with various tools and protocols to extend LLM capabilities. 
+ +**Supported Providers:** +- Brave Search (Hosted) +- RAG Runtime (Single Node) + +## API Compatibility + +All Llama Stack APIs are designed to be OpenAI-compatible, allowing you to: +- Use existing OpenAI API clients and tools +- Migrate from OpenAI to other providers seamlessly +- Maintain consistent API contracts across different environments + +## Getting Started + +To get started with Llama Stack APIs: + +1. **Choose a Distribution**: Select a pre-configured distribution that matches your environment +2. **Configure Providers**: Set up the providers you want to use for each API +3. **Start the Server**: Launch the Llama Stack server with your configuration +4. **Use the APIs**: Make requests to the API endpoints using your preferred client + +For detailed setup instructions, see our [Getting Started Guide](../getting_started/quickstart). + +## Provider Details + +For complete provider compatibility and setup instructions, see our [Providers Documentation](../providers/). + +## API Stability + +Llama Stack APIs are organized by stability level: +- **[Stable APIs](./index.mdx)** - Production-ready APIs with full support +- **[Experimental APIs](../api-experimental/)** - APIs in development with limited support +- **[Deprecated APIs](../api-deprecated/)** - Legacy APIs being phased out + +## OpenAI Integration + +For specific OpenAI API compatibility features, see our [OpenAI Compatibility Guide](../api-openai/). 
diff --git a/docs/docs/providers/files/files_api.md b/docs/docs/providers/files/files.mdx similarity index 87% rename from docs/docs/providers/files/files_api.md rename to docs/docs/providers/files/files.mdx index b2d515c66..095642be3 100644 --- a/docs/docs/providers/files/files_api.md +++ b/docs/docs/providers/files/files.mdx @@ -1,4 +1,7 @@ -# Files API +--- +sidebar_label: Files +title: Files +--- ## Overview @@ -48,7 +51,7 @@ with open("data.jsonl", "rb") as f: data = {"purpose": "batch"} response = requests.post( "http://localhost:8000/v1/openai/v1/files", files=files, data=data - ) + ) file_info = response.json() ``` @@ -92,21 +95,21 @@ files = response.json() # List files with pagination response = requests.get( "http://localhost:8000/v1/openai/v1/files", params={"limit": 10, "after": "file-abc123"}, ) files = response.json() # Filter by purpose response = requests.get( "http://localhost:8000/v1/openai/v1/files", params={"purpose": "fine-tune"} ) files = response.json() ``` ### Retrieve File **GET** `/v1/openai/v1/files/{file_id}` Returns information about a specific file. @@ -130,13 +133,13 @@ Returns information about a specific file. import requests file_id = "file-abc123" response = requests.get(f"http://localhost:8000/v1/openai/v1/files/{file_id}") file_info = response.json() ``` ### Delete File **DELETE** `/v1/openai/v1/files/{file_id}` Delete a file. @@ -157,13 +160,13 @@ Delete a file.
import requests file_id = "file-abc123" response = requests.delete(f"http://localhost:8000/v1/openai/v1/files/{file_id}") result = response.json() ``` ### Retrieve File Content **GET** `/v1/openai/v1/files/{file_id}/content` Returns the raw file content as a binary response. @@ -180,7 +183,7 @@ Binary file content with appropriate headers: import requests file_id = "file-abc123" response = requests.get(f"http://localhost:8000/v1/openai/v1/files/{file_id}/content") # Save content to file with open("downloaded_file.jsonl", "wb") as f: @@ -197,13 +200,13 @@ The Files API integrates with Vector Stores to enable document processing and search ### Vector Store File Operations **List Vector Store Files:** - **GET** `/v1/openai/v1/vector_stores/{vector_store_id}/files` **Retrieve Vector Store File Content:** - **GET** `/v1/openai/v1/vector_stores/{vector_store_id}/files/{file_id}/content` **Attach File to Vector Store:** - **POST** `/v1/openai/v1/vector_stores/{vector_store_id}/files` ## Error Handling @@ -264,15 +267,15 @@ content = await client.files.retrieve_content(file_info.id) ```bash # Upload file curl -X POST http://localhost:8000/v1/openai/v1/files \ -F "file=@data.jsonl" \ -F "purpose=fine-tune" # List files curl http://localhost:8000/v1/openai/v1/files # Download file content curl http://localhost:8000/v1/openai/v1/files/file-abc123/content \ -o 
downloaded_file.jsonl ``` diff --git a/docs/docs/providers/index.mdx b/docs/docs/providers/index.mdx index 9c560fe32..0893f5658 100644 --- a/docs/docs/providers/index.mdx +++ b/docs/docs/providers/index.mdx @@ -22,7 +22,7 @@ Importantly, Llama Stack always strives to provide at least one fully inline pro ## Provider Categories - **[External Providers](external/index.mdx)** - Guide for building and using external providers -- **[OpenAI Compatibility](./openai.mdx)** - OpenAI API compatibility layer +- **[OpenAI Compatibility](../api-openai/index.mdx)** - OpenAI API compatibility layer - **[Inference](inference/index.mdx)** - LLM and embedding model providers - **[Agents](agents/index.mdx)** - Agentic system providers - **[DatasetIO](datasetio/index.mdx)** - Dataset and data loader providers @@ -31,3 +31,12 @@ Importantly, Llama Stack always strives to provide at least one fully inline pro - **[Vector IO](vector_io/index.mdx)** - Vector database providers - **[Tool Runtime](tool_runtime/index.mdx)** - Tool and protocol providers - **[Files](files/index.mdx)** - File system and storage providers + +## API Documentation + +For comprehensive API documentation and reference: + +- **[API Reference](../api/index.mdx)** - Complete API documentation +- **[Experimental APIs](../api-experimental/index.mdx)** - APIs in development +- **[Deprecated APIs](../api-deprecated/index.mdx)** - Legacy APIs being phased out +- **[OpenAI Compatibility](../api-openai/index.mdx)** - OpenAI API compatibility guide diff --git a/docs/source/index.md b/docs/source/index.md deleted file mode 100644 index 5446ec2d3..000000000 --- a/docs/source/index.md +++ /dev/null @@ -1,150 +0,0 @@ -# Llama Stack -Welcome to Llama Stack, the open-source framework for building generative AI applications. -```{admonition} Llama 4 is here! 
-:class: tip - -Check out [Getting Started with Llama 4](https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/getting_started_llama4.ipynb) -``` -```{admonition} News -:class: tip - -Llama Stack {{ llama_stack_version }} is now available! See the {{ llama_stack_version_link }} for more details. -``` - - -## What is Llama Stack? - -Llama Stack defines and standardizes the core building blocks needed to bring generative AI applications to market. It provides a unified set of OpenAI-compatible APIs with implementations from leading service providers, enabling seamless transitions between development and production environments. More specifically, it provides - -- **OpenAI-compatible API layer** for Inference, RAG, Agents, Tools, Safety, Evals, and Telemetry -- **Plugin architecture** to support the rich ecosystem of implementations of the different APIs in different environments like local development, on-premises, cloud, and mobile -- **Prepackaged verified distributions** which offer a one-stop solution for developers to get started quickly and reliably in any environment -- **Multiple developer interfaces** like CLI and SDKs for Python, Node, iOS, and Android -- **Standalone applications** as examples for how to build production-grade AI applications with Llama Stack - -```{image} ../_static/llama-stack.png -:alt: Llama Stack -:width: 400px -``` - -Our goal is to provide pre-packaged implementations (aka "distributions") which can be run in a variety of deployment environments. LlamaStack can assist you in your entire app development lifecycle - start iterating on local, mobile or desktop and seamlessly transition to on-prem or public cloud deployments. At every point in this transition, the same set of APIs and the same developer experience is available. - -## How does Llama Stack work? 
-Llama Stack consists of a [server](./distributions/index.md) (with multiple pluggable API [providers](./providers/index.md)) and Client SDKs (see below) meant to -be used in your applications. The server can be run in a variety of environments, including local (inline) -development, on-premises, and cloud. The client SDKs are available for Python, Swift, Node, and -Kotlin. - -## Quick Links - -- Ready to build? Check out the [Quick Start](getting_started/index) to get started. -- Want to contribute? See the [Contributing](contributing/index) guide. - -## Supported Llama Stack Implementations - -A number of "adapters" are available for some popular Inference and Vector Store providers. For other APIs (particularly Safety and Agents), we provide *reference implementations* you can use to get started. We expect this list to grow over time. We are slowly onboarding more providers to the ecosystem as we get more confidence in the APIs. - -**Inference API** -| **Provider** | **Environments** | -| :----: | :----: | -| Meta Reference | Single Node | -| Ollama | Single Node | -| Fireworks | Hosted | -| Together | Hosted | -| NVIDIA NIM | Hosted and Single Node | -| vLLM | Hosted and Single Node | -| TGI | Hosted and Single Node | -| AWS Bedrock | Hosted | -| Cerebras | Hosted | -| Groq | Hosted | -| SambaNova | Hosted | -| PyTorch ExecuTorch | On-device iOS, Android | -| OpenAI | Hosted | -| Anthropic | Hosted | -| Gemini | Hosted | -| WatsonX | Hosted | - -**Agents API** -| **Provider** | **Environments** | -| :----: | :----: | -| Meta Reference | Single Node | -| Fireworks | Hosted | -| Together | Hosted | -| PyTorch ExecuTorch | On-device iOS | - -**Vector IO API** -| **Provider** | **Environments** | -| :----: | :----: | -| FAISS | Single Node | -| SQLite-Vec | Single Node | -| Chroma | Hosted and Single Node | -| Milvus | Hosted and Single Node | -| Postgres (PGVector) | Hosted and Single Node | -| Weaviate | Hosted | -| Qdrant | Hosted and Single Node | - -**Files 
API (OpenAI-compatible)** -| **Provider** | **Environments** | -| :----: | :----: | -| Local Filesystem | Single Node | -| S3 | Hosted | - -**Vector Store Files API (OpenAI-compatible)** -| **Provider** | **Environments** | -| :----: | :----: | -| FAISS | Single Node | -| SQLite-vec | Single Node | -| Milvus | Single Node | -| ChromaDB | Hosted and Single Node | -| Qdrant | Hosted and Single Node | -| Weaviate | Hosted | -| Postgres (PGVector) | Hosted and Single Node | - -**Safety API** -| **Provider** | **Environments** | -| :----: | :----: | -| Llama Guard | Depends on Inference Provider | -| Prompt Guard | Single Node | -| Code Scanner | Single Node | -| AWS Bedrock | Hosted | - -**Post Training API** -| **Provider** | **Environments** | -| :----: | :----: | -| Meta Reference | Single Node | -| HuggingFace | Single Node | -| TorchTune | Single Node | -| NVIDIA NEMO | Hosted | - -**Eval API** -| **Provider** | **Environments** | -| :----: | :----: | -| Meta Reference | Single Node | -| NVIDIA NEMO | Hosted | - -**Telemetry API** -| **Provider** | **Environments** | -| :----: | :----: | -| Meta Reference | Single Node | - -**Tool Runtime API** -| **Provider** | **Environments** | -| :----: | :----: | -| Brave Search | Hosted | -| RAG Runtime | Single Node | - -```{toctree} -:hidden: -:maxdepth: 3 - -self -getting_started/index -concepts/index -providers/index -distributions/index -advanced_apis/index -building_applications/index -deploying/index -contributing/index -references/index -```