llama-stack/docs/source/index.md

```{admonition} News
:class: tip

Llama Stack {{ llama_stack_version }} is now available! See the {{ llama_stack_version_link }} for more details.
```

# Llama Stack

## What is Llama Stack?

Llama Stack defines and standardizes the core building blocks needed to bring generative AI applications to market. It provides a unified set of APIs with implementations from leading service providers, enabling seamless transitions between development and production environments. More specifically, it provides:

- **Unified API layer** for Inference, RAG, Agents, Tools, Safety, Evals, and Telemetry.
- **Plugin architecture** to support the rich ecosystem of implementations of these APIs across environments: local development, on-premises, cloud, and mobile.
- **Prepackaged verified distributions** that offer a one-stop solution for developers to get started quickly and reliably in any environment.
- **Multiple developer interfaces** like a CLI and SDKs for Python, Node, iOS, and Android.
- **Standalone applications** as examples of how to build production-grade AI applications with Llama Stack.
```{image} ../_static/llama-stack.png
:alt: Llama Stack
:width: 400px
```

Our goal is to provide pre-packaged implementations (aka "distributions") which can be run in a variety of deployment environments. Llama Stack can assist you through your entire app development lifecycle: start iterating locally or on mobile and desktop, then seamlessly transition to on-prem or public cloud deployments. At every point in this transition, the same set of APIs and the same developer experience are available.
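Because the same APIs are exposed at every stage, switching deployment targets is a configuration change rather than a code change. As a minimal sketch with the Python client (the `LLAMA_STACK_URL` environment variable here is just an illustration, not a Llama Stack convention):

```python
import os

from llama_stack_client import LlamaStackClient

# Point the same application code at a local, on-prem, or cloud server
# simply by changing the endpoint; the API surface is identical.
base_url = os.environ.get("LLAMA_STACK_URL", "http://localhost:8321")
client = LlamaStackClient(base_url=base_url)
```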

## How does Llama Stack work?

Llama Stack consists of a server (with multiple pluggable API providers) and client SDKs meant to be used in your applications. The server can be run in a variety of environments, including local (inline) development, on-premises, and cloud. The client SDKs are available for Python, Swift, Node, and Kotlin.
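For example, here is a minimal sketch of talking to the server through the Python client. It assumes a Llama Stack server is already running on `localhost:8321` (the default port) and that the model shown is registered with that server:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Discover which models the server has registered.
for model in client.models.list():
    print(model.identifier)

# Run a chat completion through the unified Inference API.
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",  # use a model your server serves
    messages=[{"role": "user", "content": "Write a haiku about coding."}],
)
print(response.completion_message.content)
```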

- **New to Llama Stack?** Start with the [Introduction](introduction/index) to understand our motivation and vision.
- **Ready to build?** Check out the [Quick Start](getting_started/index) to get started.
- **Need specific providers?** Browse [Distributions](distributions/index) to see all the options available.
- **Want to contribute?** See the [Contributing](contributing/index) guide.

## Available SDKs

We have a number of client-side SDKs available for different languages.

| Language | Client SDK | Package |
|----------|------------|---------|
| Python | [llama-stack-client-python](https://github.com/meta-llama/llama-stack-client-python) | PyPI version |
| Swift | [llama-stack-client-swift](https://github.com/meta-llama/llama-stack-client-swift) | Swift Package Index |
| Node | [llama-stack-client-node](https://github.com/meta-llama/llama-stack-client-node) | NPM version |
| Kotlin | [llama-stack-client-kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) | Maven version |

## Supported Llama Stack Implementations

A number of "adapters" are available for some popular Inference and Vector Store providers. For other APIs (particularly Safety and Agents), we provide reference implementations you can use to get started. We expect this list to grow over time as we gain more confidence in the APIs and onboard more providers to the ecosystem.
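To see which adapters and reference implementations a running stack is actually using, you can ask the server itself. A minimal sketch with the Python client, assuming a server on `localhost:8321`:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# List the providers configured on this server, grouped by the API they implement.
for provider in client.providers.list():
    print(provider.api, provider.provider_id, provider.provider_type)
```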

### Inference API

| Provider | Environments |
|----------|--------------|
| Meta Reference | Single Node |
| Ollama | Single Node |
| Fireworks | Hosted |
| Together | Hosted |
| NVIDIA NIM | Hosted and Single Node |
| vLLM | Hosted and Single Node |
| TGI | Hosted and Single Node |
| AWS Bedrock | Hosted |
| Cerebras | Hosted |
| Groq | Hosted |
| SambaNova | Hosted |
| PyTorch ExecuTorch | On-device iOS, Android |
| OpenAI | Hosted |
| Anthropic | Hosted |
| Gemini | Hosted |

### Vector IO API

| Provider | Environments |
|----------|--------------|
| FAISS | Single Node |
| SQLite-Vec | Single Node |
| Chroma | Hosted and Single Node |
| Milvus | Hosted and Single Node |
| Postgres (PGVector) | Hosted and Single Node |
| Weaviate | Hosted |

### Safety API

| Provider | Environments |
|----------|--------------|
| Llama Guard | Depends on Inference Provider |
| Prompt Guard | Single Node |
| Code Scanner | Single Node |
| AWS Bedrock | Hosted |
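Shields backed by these providers are all invoked through the same Safety API. A minimal sketch with the Python client, assuming a server on `localhost:8321` with a shield registered under the (illustrative) ID `llama_guard`:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Screen a user message with a registered shield before it reaches a model.
result = client.safety.run_shield(
    shield_id="llama_guard",  # illustrative ID; use a shield your server registers
    messages=[{"role": "user", "content": "How do I make a cake?"}],
    params={},
)
if result.violation:
    print("Blocked:", result.violation.user_message)
else:
    print("Message passed the safety check.")
```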
```{toctree}
:hidden:
:maxdepth: 3

self
introduction/index
getting_started/index
concepts/index
providers/index
distributions/index
building_applications/index
playground/index
contributing/index
references/index
```