llama-stack-mirror/docs/source/concepts/architecture.md
Kelly Brown b096794959
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 2s
Integration Tests / discover-tests (push) Successful in 2s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 17s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 19s
Python Package Build Test / build (3.12) (push) Failing after 14s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 15s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 20s
Unit Tests / unit-tests (3.13) (push) Failing after 15s
Test Llama Stack Build / generate-matrix (push) Successful in 16s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 20s
Test External Providers / test-external-providers (venv) (push) Failing after 17s
Update ReadTheDocs / update-readthedocs (push) Failing after 15s
Test Llama Stack Build / build-single-provider (push) Failing after 21s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 18s
Unit Tests / unit-tests (3.12) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 26s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 28s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 23s
Python Package Build Test / build (3.13) (push) Failing after 44s
Test Llama Stack Build / build (push) Failing after 25s
Integration Tests / test-matrix (push) Failing after 46s
Pre-commit / pre-commit (push) Successful in 2m24s
docs: Reorganize documentation on the webpage (#2651)
# What does this PR do?
Reorganizes the Llama stack webpage into more concise index pages,
introduce more of a workflow, and reduce repetition of content.

New nav structure so far based on #2637 

Further discussions in
https://github.com/meta-llama/llama-stack/discussions/2585

**Preview:**
![Screenshot 2025-07-09 at 2 31
53 PM](https://github.com/user-attachments/assets/4c1f3845-b328-4f12-9f20-3f09375007af)

You can also build a full local preview locally 

 **Feedback**
Looking for feedback on page titles and general feedback on the new
structure

**Follow up documentation**
I plan on reducing some sections and standardizing some terminology in a
follow up PR.
More discussions on that in
https://github.com/meta-llama/llama-stack/discussions/2585
2025-07-15 14:19:35 -07:00

70 lines
No EOL
2.7 KiB
Markdown

## Llama Stack architecture
Llama Stack allows you to build different layers of distributions for your AI workloads using various SDKs and API providers.
```{image} ../../_static/llama-stack.png
:alt: Llama Stack
:width: 400px
```
### Benefits of Llama stack
#### Current challenges in custom AI applications
Building production AI applications today requires solving multiple challenges:
Infrastructure Complexity
- Running large language models efficiently requires specialized infrastructure.
- Different deployment scenarios (local development, cloud, edge) need different solutions.
- Moving from development to production often requires significant rework.
**Essential Capabilities**
- Safety guardrails and content filtering are necessary in an enterprise setting.
- Just model inference is not enough - Knowledge retrieval and RAG capabilities are required.
- Nearly any application needs composable multi-step workflows.
- Without monitoring, observability and evaluation, you end up operating in the dark.
**Lack of Flexibility and Choice**
- Directly integrating with multiple providers creates tight coupling.
- Different providers have different APIs and abstractions.
- Changing providers requires significant code changes.
#### Our Solution: A Universal Stack
Llama Stack addresses these challenges through a service-oriented, API-first approach:
**Develop Anywhere, Deploy Everywhere**
- Start locally with CPU-only setups
- Move to GPU acceleration when needed
- Deploy to cloud or edge without code changes
- Same APIs and developer experience everywhere
**Production-Ready Building Blocks**
- Pre-built safety guardrails and content filtering
- Built-in RAG and agent capabilities
- Comprehensive evaluation toolkit
- Full observability and monitoring
**True Provider Independence**
- Swap providers without application changes
- Mix and match best-in-class implementations
- Federation and fallback support
- No vendor lock-in
**Robust Ecosystem**
- Llama Stack is already integrated with distribution partners (cloud providers, hardware vendors, and AI-focused companies).
- Ecosystem offers tailored infrastructure, software, and services for deploying a variety of models.
### Our Philosophy
- **Service-Oriented**: REST APIs enforce clean interfaces and enable seamless transitions across different environments.
- **Composability**: Every component is independent but works together seamlessly
- **Production Ready**: Built for real-world applications, not just demos
- **Turnkey Solutions**: Easy to deploy built in solutions for popular deployment scenarios
With Llama Stack, you can focus on building your application while we handle the infrastructure complexity, essential capabilities, and provider integrations.