# What does this PR do?

Reorganizes the Llama Stack webpage into more concise index pages, introduces more of a workflow, and reduces repetition of content. The new nav structure so far is based on #2637. Further discussion is in https://github.com/meta-llama/llama-stack/discussions/2585

**Preview:** You can also build a full preview locally.

**Feedback**

Looking for feedback on page titles and on the new structure in general.

**Follow-up documentation**

I plan on reducing some sections and standardizing some terminology in a follow-up PR. More discussion on that is in https://github.com/meta-llama/llama-stack/discussions/2585

## Llama Stack architecture
Llama Stack allows you to build different layers of distributions for your AI workloads using various SDKs and API providers.

```{image} ../../_static/llama-stack.png
:alt: Llama Stack
:width: 400px
```

### Benefits of Llama Stack

#### Current challenges in custom AI applications

Building production AI applications today requires solving multiple challenges:

**Infrastructure Complexity**

- Running large language models efficiently requires specialized infrastructure.
- Different deployment scenarios (local development, cloud, edge) need different solutions.
- Moving from development to production often requires significant rework.

**Essential Capabilities**

- Safety guardrails and content filtering are necessary in an enterprise setting.
- Model inference alone is not enough: knowledge retrieval and RAG capabilities are also required.
- Nearly any application needs composable multi-step workflows.
- Without monitoring, observability, and evaluation, you end up operating in the dark.

**Lack of Flexibility and Choice**

- Directly integrating with multiple providers creates tight coupling.
- Different providers have different APIs and abstractions.
- Changing providers requires significant code changes.

#### Our Solution: A Universal Stack

Llama Stack addresses these challenges through a service-oriented, API-first approach:

**Develop Anywhere, Deploy Everywhere**

- Start locally with CPU-only setups
- Move to GPU acceleration when needed
- Deploy to cloud or edge without code changes
- Same APIs and developer experience everywhere (see the sketch below)
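
To make that concrete, here is a minimal sketch using the `llama_stack_client` Python SDK. It assumes a Llama Stack server is already running on the default local port, and the model ID is a placeholder for whatever model your distribution serves; only `base_url` changes when you move between local, cloud, or edge deployments.

```python
from llama_stack_client import LlamaStackClient

# The same client works against any Llama Stack server: a local
# CPU-only setup, a GPU-backed deployment, or a hosted endpoint.
# Only the base URL changes.
client = LlamaStackClient(base_url="http://localhost:8321")

# Placeholder model ID; use a model served by your distribution.
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Hello, Llama Stack!"}],
)
print(response.completion_message.content)
```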

**Production-Ready Building Blocks**

- Pre-built safety guardrails and content filtering (see the sketch below)
- Built-in RAG and agent capabilities
- Comprehensive evaluation toolkit
- Full observability and monitoring
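
As one hedged illustration of the safety building block, the sketch below checks user input against a shield through the client API. The shield ID is an assumption here: it must match a shield actually registered with your distribution.

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# "llama_guard" is a placeholder shield_id; it must match a shield
# registered with your distribution.
result = client.safety.run_shield(
    shield_id="llama_guard",
    messages=[{"role": "user", "content": "How do I reset my password?"}],
    params={},
)
if result.violation:
    print("Blocked:", result.violation.user_message)
else:
    print("Input passed the safety check.")
```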

**True Provider Independence**

- Swap providers without application changes (see the configuration sketch below)
- Mix and match best-in-class implementations
- Federation and fallback support
- No vendor lock-in
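
Because every provider sits behind the same API, swapping one for another is a configuration change rather than a code change. The fragment below is an illustrative sketch of a distribution's run configuration; the provider IDs and URL are assumptions, not a complete file.

```yaml
# Illustrative run-configuration fragment. Switching the inference
# backend or the vector store is an edit here, not a code change.
providers:
  inference:
    - provider_id: ollama
      provider_type: remote::ollama
      config:
        url: http://localhost:11434   # placeholder endpoint
  vector_io:
    - provider_id: faiss
      provider_type: inline::faiss    # e.g. swap for remote::pgvector
      config: {}
```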

**Robust Ecosystem**

- Llama Stack is already integrated with distribution partners, including cloud providers, hardware vendors, and AI-focused companies.
- The ecosystem offers tailored infrastructure, software, and services for deploying a variety of models.

### Our Philosophy

- **Service-Oriented**: REST APIs enforce clean interfaces and enable seamless transitions across different environments.
- **Composability**: Every component is independent but works together seamlessly.
- **Production Ready**: Built for real-world applications, not just demos.
- **Turnkey Solutions**: Easy-to-deploy, built-in solutions for popular deployment scenarios.

With Llama Stack, you can focus on building your application while we handle the infrastructure complexity, essential capabilities, and provider integrations.