
# Llama Stack

Welcome to Llama Stack, the open-source framework for building generative AI applications.

Check out [Getting Started with Llama 4](https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/getting_started_llama4.ipynb).

Llama Stack {{ llama_stack_version }} is now available! See the {{ llama_stack_version_link }} for more details.

## What is Llama Stack?

Llama Stack defines and standardizes the core building blocks needed to bring generative AI applications to market. It provides a unified set of APIs with implementations from leading service providers, enabling seamless transitions between development and production environments. More specifically, it provides:

- Unified API layer for Inference, RAG, Agents, Tools, Safety, Evals, and Telemetry (see the sketch after this list).
- Plugin architecture to support the rich ecosystem of implementations of the different APIs in different environments such as local development, on-premises, cloud, and mobile.
- Prepackaged verified distributions which offer a one-stop solution for developers to get started quickly and reliably in any environment.
- Multiple developer interfaces like CLI and SDKs for Python, Node, iOS, and Android.
- Standalone applications as examples for how to build production-grade AI applications with Llama Stack.
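For example, the unified API layer means one small piece of client code covers inference regardless of which provider serves the model behind the scenes. The sketch below is illustrative only: it assumes a Llama Stack server is already running locally on the default port (8321) and that the model ID shown has been registered with that server.

```python
# Minimal sketch of the unified Inference API via the Python client SDK.
# Assumptions: a Llama Stack server is running at localhost:8321 and the
# model ID below is registered with it; adjust both for your setup.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",  # illustrative model ID
    messages=[{"role": "user", "content": "What is Llama Stack?"}],
)
print(response.completion_message.content)
```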

Our goal is to provide pre-packaged implementations (aka "distributions") which can be run in a variety of deployment environments. Llama Stack can assist you throughout your entire app development lifecycle: start iterating locally, on mobile, or on desktop, and seamlessly transition to on-prem or public cloud deployments. At every point in this transition, the same set of APIs and the same developer experience are available.
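Concretely, moving along that lifecycle usually changes only the endpoint the client talks to; the application code and API calls stay the same. The hostname below is a placeholder for illustration, not a real service.

```python
from llama_stack_client import LlamaStackClient

# Local development: server running on the developer's machine.
dev_client = LlamaStackClient(base_url="http://localhost:8321")

# On-prem or public cloud: same client, same API calls, different endpoint.
# The URL below is a hypothetical placeholder.
prod_client = LlamaStackClient(base_url="https://llama-stack.example.com")
```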

## How does Llama Stack work?

Llama Stack consists of a server (with multiple pluggable API providers) and Client SDKs (see below) meant to be used in your applications. The server can be run in a variety of environments, including local (inline) development, on-premises, and cloud. The client SDKs are available for Python, Swift, Node, and Kotlin.
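For the local (inline) case in particular, the stack can also be embedded directly in a Python process through a library-style client, so no separate server needs to be running. This is a sketch under assumptions: the import path and the "ollama" distribution name may differ across Llama Stack versions.

```python
# Sketch of inline ("library") usage; the import path and the "ollama"
# distribution name are assumptions and may vary between versions.
from llama_stack.distribution.library_client import LlamaStackAsLibraryClient

client = LlamaStackAsLibraryClient("ollama")
client.initialize()  # load the configured providers in-process

# The same client surface as the remote SDK is then available, e.g.:
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",  # illustrative model ID
    messages=[{"role": "user", "content": "Hello!"}],
)
```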

## Client SDKs

We have a number of client-side SDKs available for different languages.

| Language | Client SDK | Package |
|----------|------------|---------|
| Python | llama-stack-client-python | PyPI |
| Swift | llama-stack-client-swift | Swift Package Index |
| Node | llama-stack-client-node | npm |
| Kotlin | llama-stack-client-kotlin | Maven |

## Supported Llama Stack Implementations

A number of "adapters" are available for some popular Inference and Vector Store providers. For other APIs (particularly Safety and Agents), we provide reference implementations you can use to get started. We expect this list to grow over time. We are slowly onboarding more providers to the ecosystem as we get more confidence in the APIs.

### Inference API

| Provider | Environments |
|----------|--------------|
| Meta Reference | Single Node |
| Ollama | Single Node |
| Fireworks | Hosted |
| Together | Hosted |
| NVIDIA NIM | Hosted and Single Node |
| vLLM | Hosted and Single Node |
| TGI | Hosted and Single Node |
| AWS Bedrock | Hosted |
| Cerebras | Hosted |
| Groq | Hosted |
| SambaNova | Hosted |
| PyTorch ExecuTorch | On-device iOS, Android |
| OpenAI | Hosted |
| Anthropic | Hosted |
| Gemini | Hosted |

### Vector IO API

| Provider | Environments |
|----------|--------------|
| FAISS | Single Node |
| SQLite-Vec | Single Node |
| Chroma | Hosted and Single Node |
| Milvus | Hosted and Single Node |
| Postgres (PGVector) | Hosted and Single Node |
| Weaviate | Hosted |

### Safety API

| Provider | Environments |
|----------|--------------|
| Llama Guard | Depends on Inference Provider |
| Prompt Guard | Single Node |
| Code Scanner | Single Node |
| AWS Bedrock | Hosted |
The sections below are pulled into the sidebar navigation through a hidden `toctree` (maxdepth 3):

- `self`
- `getting_started/index`
- `getting_started/detailed_tutorial`
- `introduction/index`
- `concepts/index`
- `providers/index`
- `distributions/index`
- `building_applications/index`
- `playground/index`
- `contributing/index`
- `references/index`