mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-12-03 09:53:45 +00:00
docs: Add comprehensive Files API and Vector Store integration documentation - Add Files API documentation with OpenAI-compatible endpoints - Create comprehensive guide for OpenAI-compatible file operations - Reorganize documentation structure: move file operations to files/ directory - Add vector store provider documentation for Milvus, SQLite-vec, FAISS - Clean up redundant files and improve navigation - Update cross-references and eliminate documentation duplication - Support for release 0.2.14 FileResponse and Vector Store API features # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* -->
144 lines
4.2 KiB
Text
144 lines
4.2 KiB
Text
---
|
|
title: API Reference
|
|
description: Complete reference for Llama Stack APIs
|
|
sidebar_label: Overview
|
|
sidebar_position: 1
|
|
---
|
|
|
|
# API Reference
|
|
|
|
Llama Stack provides a comprehensive set of APIs for building generative AI applications. All APIs follow OpenAI-compatible standards and can be used interchangeably across different providers.
|
|
|
|
## Core APIs
|
|
|
|
### Inference API
|
|
Run inference with Large Language Models (LLMs) and embedding models.
|
|
|
|
**Supported Providers:**
|
|
- Meta Reference (Single Node)
|
|
- Ollama (Single Node)
|
|
- Fireworks (Hosted)
|
|
- Together (Hosted)
|
|
- NVIDIA NIM (Hosted and Single Node)
|
|
- vLLM (Hosted and Single Node)
|
|
- TGI (Hosted and Single Node)
|
|
- AWS Bedrock (Hosted)
|
|
- Cerebras (Hosted)
|
|
- Groq (Hosted)
|
|
- SambaNova (Hosted)
|
|
- PyTorch ExecuTorch (On-device iOS, Android)
|
|
- OpenAI (Hosted)
|
|
- Anthropic (Hosted)
|
|
- Gemini (Hosted)
|
|
- WatsonX (Hosted)
|
|
|
|
### Agents API
|
|
Run multi-step agentic workflows with LLMs, including tool usage, memory (RAG), and complex reasoning.
|
|
|
|
**Supported Providers:**
|
|
- Meta Reference (Single Node)
|
|
- Fireworks (Hosted)
|
|
- Together (Hosted)
|
|
- PyTorch ExecuTorch (On-device iOS)
|
|
|
|
### Vector IO API
|
|
Perform operations on vector stores, including adding documents, searching, and deleting documents.
|
|
|
|
**Supported Providers:**
|
|
- FAISS (Single Node)
|
|
- SQLite-Vec (Single Node)
|
|
- Chroma (Hosted and Single Node)
|
|
- Milvus (Hosted and Single Node)
|
|
- Postgres (PGVector) (Hosted and Single Node)
|
|
- Weaviate (Hosted)
|
|
- Qdrant (Hosted and Single Node)
|
|
|
|
### Files API (OpenAI-compatible)
|
|
Manage file uploads, storage, and retrieval with OpenAI-compatible endpoints.
|
|
|
|
**Supported Providers:**
|
|
- Local Filesystem (Single Node)
|
|
- S3 (Hosted)
|
|
|
|
### Vector Store Files API (OpenAI-compatible)
|
|
Integrate file operations with vector stores for automatic document processing and search.
|
|
|
|
**Supported Providers:**
|
|
- FAISS (Single Node)
|
|
- SQLite-vec (Single Node)
|
|
- Milvus (Single Node)
|
|
- ChromaDB (Hosted and Single Node)
|
|
- Qdrant (Hosted and Single Node)
|
|
- Weaviate (Hosted)
|
|
- Postgres (PGVector) (Hosted and Single Node)
|
|
|
|
### Safety API
|
|
Apply safety policies to outputs at a systems level, not just model level.
|
|
|
|
**Supported Providers:**
|
|
- Llama Guard (Depends on Inference Provider)
|
|
- Prompt Guard (Single Node)
|
|
- Code Scanner (Single Node)
|
|
- AWS Bedrock (Hosted)
|
|
|
|
### Post Training API
|
|
Fine-tune models for specific use cases and domains.
|
|
|
|
**Supported Providers:**
|
|
- Meta Reference (Single Node)
|
|
- HuggingFace (Single Node)
|
|
- TorchTune (Single Node)
|
|
- NVIDIA NEMO (Hosted)
|
|
|
|
### Eval API
|
|
Generate outputs and perform scoring to evaluate system performance.
|
|
|
|
**Supported Providers:**
|
|
- Meta Reference (Single Node)
|
|
- NVIDIA NEMO (Hosted)
|
|
|
|
### Telemetry API
|
|
Collect telemetry data from the system for monitoring and observability.
|
|
|
|
**Supported Providers:**
|
|
- Meta Reference (Single Node)
|
|
|
|
### Tool Runtime API
|
|
Interact with various tools and protocols to extend LLM capabilities.
|
|
|
|
**Supported Providers:**
|
|
- Brave Search (Hosted)
|
|
- RAG Runtime (Single Node)
|
|
|
|
## API Compatibility
|
|
|
|
All Llama Stack APIs are designed to be OpenAI-compatible, allowing you to:
|
|
- Use existing OpenAI API clients and tools
|
|
- Migrate from OpenAI to other providers seamlessly
|
|
- Maintain consistent API contracts across different environments
|
|
|
|
## Getting Started
|
|
|
|
To get started with Llama Stack APIs:
|
|
|
|
1. **Choose a Distribution**: Select a pre-configured distribution that matches your environment
|
|
2. **Configure Providers**: Set up the providers you want to use for each API
|
|
3. **Start the Server**: Launch the Llama Stack server with your configuration
|
|
4. **Use the APIs**: Make requests to the API endpoints using your preferred client
|
|
|
|
For detailed setup instructions, see our [Getting Started Guide](../getting_started/quickstart).
|
|
|
|
## Provider Details
|
|
|
|
For complete provider compatibility and setup instructions, see our [Providers Documentation](../providers/).
|
|
|
|
## API Stability
|
|
|
|
Llama Stack APIs are organized by stability level:
|
|
- **[Stable APIs](./index.mdx)** - Production-ready APIs with full support
|
|
- **[Experimental APIs](../api-experimental/)** - APIs in development with limited support
|
|
- **[Deprecated APIs](../api-deprecated/)** - Legacy APIs being phased out
|
|
|
|
## OpenAI Integration
|
|
|
|
For specific OpenAI API compatibility features, see our [OpenAI Compatibility Guide](../api-openai/).
|