## Why Llama Stack?
Building production AI applications today requires solving multiple challenges:
### Infrastructure Complexity
- Running large language models efficiently requires specialized infrastructure.
- Different deployment scenarios (local development, cloud, edge) need different solutions.
- Moving from development to production often requires significant rework.
### Essential Capabilities
- Safety guardrails and content filtering are necessary in an enterprise setting.
- Model inference alone is not enough; knowledge retrieval and RAG capabilities are also required.
- Nearly any application needs composable multi-step workflows.
- Without monitoring, observability, and evaluation, you are operating in the dark.
### Lack of Flexibility and Choice
- Directly integrating with multiple providers creates tight coupling.
- Different providers have different APIs and abstractions.
- Changing providers requires significant code changes.
## Our Solution: A Universal Stack
Llama Stack addresses these challenges through a service-oriented, API-first approach:
### Develop Anywhere, Deploy Everywhere
- Start locally with CPU-only setups
- Move to GPU acceleration when needed
- Deploy to cloud or edge without code changes
- Same APIs and developer experience everywhere, as illustrated in the sketch below
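To make "same APIs everywhere" concrete, here is a minimal sketch using the `llama-stack-client` Python package against a locally running distribution. The port (8321 is the common default) and the model ID are illustrative and depend on your setup; pointing `base_url` at a remote distribution requires no other change to the code.

```python
from llama_stack_client import LlamaStackClient

# Point the client at any Llama Stack distribution: a laptop, a GPU box,
# or a hosted endpoint. The application code below stays the same.
client = LlamaStackClient(base_url="http://localhost:8321")

# Model ID is illustrative; use one registered with your distribution.
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "What is Llama Stack?"}],
)
print(response.completion_message.content)
```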
### Production-Ready Building Blocks
- Pre-built safety guardrails and content filtering (see the sketch after this list)
- Built-in RAG and agent capabilities
- Comprehensive evaluation toolkit
- Full observability and monitoring
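For example, the safety API exposes pre-configured shields through a single call. The following is a minimal sketch, assuming the same `llama-stack-client` setup as above; the shield ID is illustrative and depends on what your distribution registers.

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Run a configured safety shield over a message before it reaches the model.
result = client.safety.run_shield(
    shield_id="meta-llama/Llama-Guard-3-8B",  # illustrative; set per distribution
    messages=[{"role": "user", "content": "User input to screen goes here."}],
    params={},
)
if result.violation:
    print("Blocked:", result.violation.user_message)
else:
    print("Input passed the shield.")
```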
### True Provider Independence
- Swap providers without application changes, as sketched below
- Mix and match best-in-class implementations
- Federation and fallback support
- No vendor lock-in
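A minimal sketch of what provider independence looks like in application code, assuming two distributions are reachable at the (illustrative) URLs below. The providers behind each endpoint are server-side configuration, so the calling code never changes.

```python
from llama_stack_client import LlamaStackClient

def summarize(client: LlamaStackClient, model_id: str, text: str) -> str:
    """The application logic: identical regardless of which provider serves it."""
    response = client.inference.chat_completion(
        model_id=model_id,
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return response.completion_message.content

# A local distribution and a hosted one; both URLs are illustrative.
local = LlamaStackClient(base_url="http://localhost:8321")
hosted = LlamaStackClient(base_url="https://my-hosted-stack.example.com")

for client in (local, hosted):
    print(summarize(client, "meta-llama/Llama-3.1-8B-Instruct",
                    "Llama Stack decouples applications from providers."))
```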
## Our Philosophy
- **Service-Oriented**: REST APIs enforce clean interfaces and enable seamless transitions across different environments.
- **Composability**: Every component is independent but works together seamlessly.
- **Production Ready**: Built for real-world applications, not just demos.
- **Turnkey Solutions**: Easy-to-deploy, built-in solutions for popular deployment scenarios.
- **Llama First**: Explicit focus on Meta's Llama models and the partner ecosystem.
With Llama Stack, you can focus on building your application while we handle the infrastructure complexity, essential capabilities, and provider integrations.