
## Why Llama Stack?

Building production AI applications today requires solving multiple challenges:

### Infrastructure Complexity

- Running large language models efficiently requires specialized infrastructure.
- Different deployment scenarios (local development, cloud, edge) need different solutions.
- Moving from development to production often requires significant rework.

### Essential Capabilities

- Safety guardrails and content filtering are necessary in an enterprise setting.
- Model inference alone is not enough: knowledge retrieval and RAG capabilities are required.
- Nearly every application needs composable multi-step workflows.
- Without monitoring, observability, and evaluation, you are operating in the dark.

### Lack of Flexibility and Choice

- Directly integrating with multiple providers creates tight coupling.
- Different providers have different APIs and abstractions.
- Changing providers requires significant code changes.

## Our Solution: A Universal Stack

*Figure: Llama Stack*

Llama Stack addresses these challenges through a service-oriented, API-first approach:

### Develop Anywhere, Deploy Everywhere

- Start locally with CPU-only setups
- Move to GPU acceleration when needed
- Deploy to cloud or edge without code changes
- Same APIs and developer experience everywhere
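
To make this concrete, here is a minimal sketch using the `llama_stack_client` Python SDK. The port, model id, and exact method names are illustrative and may vary across versions; the point is that only `base_url` changes between environments, while the application code stays the same.

```python
# Minimal sketch, assuming the llama_stack_client Python SDK
# (pip install llama-stack-client). The model id, port, and method
# names are illustrative and may differ across versions.
from llama_stack_client import LlamaStackClient

# Whether the server runs on a local CPU, a GPU box, or in the cloud,
# only base_url changes; the application code stays the same.
client = LlamaStackClient(base_url="http://localhost:8321")

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",  # any model the server hosts
    messages=[{"role": "user", "content": "What is Llama Stack?"}],
)
print(response.completion_message.content)
```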

### Production-Ready Building Blocks

- Pre-built safety guardrails and content filtering
- Built-in RAG and agent capabilities
- Comprehensive evaluation toolkit
- Full observability and monitoring

### True Provider Independence

- Swap providers without application changes
- Mix and match best-in-class implementations
- Federation and fallback support
- No vendor lock-in
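
As an illustration, swapping providers is a server-side configuration edit rather than an application change. The fragment below sketches what the inference section of a run configuration might look like; the provider names and config fields are examples, not a definitive schema.

```yaml
# Illustrative run-configuration fragment: the inference provider is
# selected here, so swapping it never touches application code.
# Provider names and config fields are examples only.
providers:
  inference:
    - provider_id: local-ollama
      provider_type: remote::ollama
      config:
        url: http://localhost:11434
    # Swap in a hosted provider by replacing the entry, e.g.:
    # - provider_id: fireworks
    #   provider_type: remote::fireworks
    #   config:
    #     api_key: ${env.FIREWORKS_API_KEY}
```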

### Robust Ecosystem

- Llama Stack is already integrated with distribution partners (cloud providers, hardware vendors, and AI-focused companies).
- The ecosystem offers tailored infrastructure, software, and services for deploying Llama models.

## Our Philosophy

- **Service-Oriented**: REST APIs enforce clean interfaces and enable seamless transitions across different environments.
- **Composability**: Every component is independent but works together seamlessly.
- **Production Ready**: Built for real-world applications, not just demos.
- **Turnkey Solutions**: Easy-to-deploy, built-in solutions for popular deployment scenarios.
- **Llama First**: Explicit focus on Meta's Llama models and the partner ecosystem.
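
Because every capability sits behind a REST endpoint, any HTTP client works and no SDK is required. A hedged sketch follows; the route, default port, and payload shape are assumptions that vary across versions.

```python
# Plain-HTTP sketch of the service-oriented design: any HTTP client works.
# The route, default port, and payload fields are assumptions that may
# differ across Llama Stack versions.
import requests

resp = requests.post(
    "http://localhost:8321/v1/inference/chat-completion",  # assumed route
    json={
        "model_id": "meta-llama/Llama-3.2-3B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
resp.raise_for_status()
print(resp.json())
```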

With Llama Stack, you can focus on building your application while we handle the infrastructure complexity, essential capabilities, and provider integrations.