mirror of https://github.com/meta-llama/llama-stack.git
synced 2025-06-28 02:53:30 +00:00

Update Documentation (#838)

# What does this PR do?

Update README and other documentation.

## Before submitting

- [X] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

This commit is contained in:
parent 6c205e1d5a
commit 65f07c3d63

5 changed files with 146 additions and 183 deletions
Building production AI applications today requires solving multiple challenges:

- Changing providers requires significant code changes.
### Our Solution: A Universal Stack
```{image} ../../_static/llama-stack.png
:alt: Llama Stack
:width: 400px
```
Llama Stack defines and standardizes the core building blocks needed to bring generative AI applications to market. These building blocks are presented as interoperable APIs, with a broad set of Service Providers providing their implementations.

Llama Stack addresses these challenges through a service-oriented, API-first approach:
#### Service-oriented Design

Unlike other frameworks, Llama Stack is built with a service-oriented, REST-API-first approach. This design not only allows seamless transitions from local to remote deployments, but also forces the design to be more declarative, which results in a simpler, more robust developer experience. The same code works across different environments:
**Develop Anywhere, Deploy Everywhere**

- Start locally with CPU-only setups
- Move to GPU acceleration when needed
- Deploy to cloud or edge without code changes
- Same APIs and developer experience everywhere

Supported environments include:

- Local development with CPU-only setups
- Self-hosted with GPU acceleration
- Cloud-hosted on providers like AWS, Fireworks, Together
- On-device for iOS and Android
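The "same code, different environments" claim above boils down to treating the server location as configuration. A minimal sketch in Python, assuming an environment-variable convention; the endpoint path, payload shape, and `STACK_BASE_URL` name are illustrative, not the actual Llama Stack REST schema:

```python
import os

def make_chat_request(prompt: str) -> dict:
    """Build a request against whichever server the environment points at.

    The application code is identical for local, GPU-accelerated, and cloud
    deployments; only the base URL changes. (Endpoint path and payload shape
    are hypothetical, for illustration only.)
    """
    base_url = os.environ.get("STACK_BASE_URL", "http://localhost:8321")
    return {
        "url": f"{base_url}/inference/chat-completion",
        "json": {"messages": [{"role": "user", "content": prompt}]},
    }

local = make_chat_request("hello")   # default: local development server
os.environ["STACK_BASE_URL"] = "https://stack.example.com"
remote = make_chat_request("hello")  # same code, remote deployment
```

Only the URL differs between the two requests; the message payload, and hence the application logic, is untouched.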
#### Composability

The APIs we design are composable. An Agent abstractly depends on the { Inference, Memory, Safety } APIs but does not care about the actual implementation details. Safety itself may require model inference and hence can depend on the Inference API.
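The dependency structure described here can be sketched with abstract interfaces. A hedged Python sketch, where all class and method names are illustrative rather than the actual Llama Stack APIs:

```python
from typing import Protocol

class Inference(Protocol):
    def complete(self, prompt: str) -> str: ...

class Safety(Protocol):
    def check(self, text: str) -> bool: ...

class EchoInference:
    """Toy provider; a real one might wrap vLLM or Ollama."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

class ShieldSafety:
    """Safety can itself depend on the Inference interface, as noted above."""
    def __init__(self, inference: Inference) -> None:
        self.inference = inference
    def check(self, text: str) -> bool:
        # A real shield would run a classifier model via the Inference API.
        return "unsafe" not in text

class Agent:
    """Depends on the Inference and Safety interfaces, never on providers."""
    def __init__(self, inference: Inference, safety: Safety) -> None:
        self.inference = inference
        self.safety = safety
    def run(self, prompt: str) -> str:
        if not self.safety.check(prompt):
            return "refused"
        return self.inference.complete(prompt)

inference = EchoInference()
agent = Agent(inference, ShieldSafety(inference))
result = agent.run("hello")  # "echo: hello"
```

Because `Agent` only sees the protocols, any conforming implementation can be substituted without touching the agent code.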
#### Turnkey Solutions

We provide turnkey solutions for popular deployment scenarios. It should be easy to deploy a Llama Stack server on AWS or in a private data center, and either option should let a developer get started with powerful agentic apps, model evaluations, or fine-tuning services in minutes.
**Production-Ready Building Blocks**

We have built-in support for critical needs:

- Pre-built safety guardrails and content filtering
- Built-in RAG and agent capabilities
- Comprehensive evaluation toolkit
- Full observability and monitoring
- Provider federation and fallback
#### Focus on Llama Models

As a Meta-initiated project, we explicitly focus on Meta's Llama series of models. Supporting the broad set of open models is no easy task, and we want to start with the models we understand best.
#### Supporting the Ecosystem

There is a vibrant ecosystem of Providers offering efficient inference, scalable vector stores, and powerful observability solutions. We want to make it easy for developers to pick and choose the best implementations for their use cases, and equally easy for new Providers to onboard and participate in the ecosystem.

Additionally, we have designed every element of the Stack so that APIs as well as Resources (like Models) can be federated.
#### Rich Provider Ecosystem

```{list-table}
:header-rows: 1

* - API
  - Local
  - Self-hosted
  - Cloud
* - Inference
  - Ollama
  - vLLM, TGI
  - Fireworks, Together, AWS
* - Memory
  - FAISS
  - Chroma, pgvector
  - Weaviate
* - Safety
  - Llama Guard
  - -
  - AWS Bedrock
```
**True Provider Independence**

- Swap providers without application changes
- Mix and match best-in-class implementations
- Federation and fallback support
- No vendor lock-in
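Provider independence with fallback can be sketched as a registry that resolves an API to whichever configured provider responds first. This is an illustrative sketch, not the actual Llama Stack federation mechanism; all names here are hypothetical:

```python
# Each API maps to an ordered list of candidate providers. Application code
# never names a provider directly, so swapping or reordering providers is
# purely a configuration change.
def flaky_provider(prompt: str) -> str:
    raise ConnectionError("provider unavailable")

def backup_provider(prompt: str) -> str:
    return f"backup: {prompt}"

registry = {"inference": [flaky_provider, backup_provider]}

def call(api: str, prompt: str) -> str:
    """Try each configured provider in order, falling back on failure."""
    errors = []
    for provider in registry[api]:
        try:
            return provider(prompt)
        except ConnectionError as exc:
            errors.append(exc)
    raise RuntimeError(f"all providers failed: {errors}")

result = call("inference", "hello")  # first provider fails, fallback answers
```

Swapping vLLM for Fireworks, in this picture, means editing the registry entry, not the call sites.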
### Our Philosophy

- **Service-Oriented**: REST APIs enforce clean interfaces and enable seamless transitions across different environments.
- **Composability**: Every component is independent but works together seamlessly.
- **Production Ready**: Built for real-world applications, not just demos.
- **Turnkey Solutions**: Easy-to-deploy, built-in solutions for popular deployment scenarios.
- **Llama First**: Explicit focus on Meta's Llama models and the partnering ecosystem.

### Unified API Layer

Llama Stack provides a consistent interface for:

- **Inference**: Run LLM models efficiently
- **Safety**: Apply content filtering and safety policies
- **Memory**: Store and retrieve knowledge for RAG
- **Agents**: Build multi-step workflows
- **Evaluation**: Test and improve application quality

With Llama Stack, you can focus on building your application while we handle the infrastructure complexity, essential capabilities, and provider integrations.