# Llama Stack - Quick Reference Guide
## Key Concepts at a Glance

### The Three Pillars

- **APIs** (`llama_stack/apis/`) - Abstract interfaces (27 total)
- **Providers** (`llama_stack/providers/`) - Implementations (50+ total)
- **Distributions** (`llama_stack/distributions/`) - Pre-configured bundles
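A minimal mental model of these three pillars, with illustrative names only (the real protocol definitions in `llama_stack/apis/` are far richer): an API is a Python protocol, a provider is a class implementing it, and a distribution bundles concrete provider choices.

```python
# Illustrative sketch of the three pillars; class and field names here are
# hypothetical stand-ins, not the actual Llama Stack definitions.
from typing import Protocol


class Inference(Protocol):  # API: an abstract interface
    async def chat_completion(self, model: str, messages: list[dict]) -> dict: ...


class OllamaInference:  # Provider: one concrete implementation
    async def chat_completion(self, model: str, messages: list[dict]) -> dict:
        return {"role": "assistant", "content": "stub"}


# Distribution (rough idea): a pre-configured bundle mapping APIs to providers
starter = {"inference": OllamaInference()}
```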
### Directory Map for Quick Navigation

| Component | Location | Purpose |
|---|---|---|
| Inference API | `apis/inference/inference.py` | LLM chat, completion, embeddings |
| Agents API | `apis/agents/agents.py` | Multi-turn agent orchestration |
| Safety API | `apis/safety/safety.py` | Content filtering |
| Vector IO API | `apis/vector_io/vector_io.py` | Vector database operations |
| Core Stack | `core/stack.py` | Main orchestrator (implements all APIs) |
| Provider Resolver | `core/resolver.py` | Dependency injection & instantiation |
| Inline Inference | `providers/inline/inference/` | Local model execution |
| Remote Inference | `providers/remote/inference/` | API providers (OpenAI, Ollama, etc.) |
| CLI Entry Point | `cli/llama.py` | Command-line interface |
| Starter Distribution | `distributions/starter/` | Basic multi-provider setup |
## Common Tasks
### Understanding an API

- Read the API definition: `llama_stack/apis/[api_name]/[api_name].py`
- Check common types: `llama_stack/apis/common/`
- Look at providers: `llama_stack/providers/registry/[api_name].py`
- Examine an implementation: `llama_stack/providers/inline/[api_name]/[provider]/`
### Adding a Provider

- Create module: `llama_stack/providers/remote/[api]/[provider_name]/`
- Implement a class extending the API protocol
- Register in: `llama_stack/providers/registry/[api].py` (see the sketch after this list)
- Add to distribution: `llama_stack/distributions/[distro]/[distro].py`
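For orientation, a registry entry roughly follows the shape below. This is a hedged sketch: the helper and field names (`remote_provider_spec`, `AdapterSpec`, and the import path) follow one historical version of the codebase and have shifted across releases, so mirror a neighboring entry in the real registry file rather than copying this verbatim.

```python
# Hedged sketch of a registry entry in llama_stack/providers/registry/[api].py.
# "my_provider" is a hypothetical provider; check an existing entry for the
# exact helpers your version of the codebase uses.
from llama_stack.providers.datatypes import AdapterSpec, Api, remote_provider_spec

my_provider_spec = remote_provider_spec(
    api=Api.inference,
    adapter=AdapterSpec(
        adapter_type="my_provider",
        pip_packages=["httpx"],  # extra packages the provider needs
        module="llama_stack.providers.remote.inference.my_provider",
        config_class="llama_stack.providers.remote.inference.my_provider.MyProviderConfig",
    ),
)
```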
### Debugging a Request

- Check routing: `llama_stack/core/routers/` or `routing_tables/`
- Find provider: `llama_stack/providers/registry/[api].py`
- Read implementation: `llama_stack/providers/[inline|remote]/[api]/[provider]/`
- Check config: look for the `Config` class in the provider module (example below)
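As an illustration, provider config classes are pydantic models along these lines; the class and field names here are hypothetical, so check the provider's actual `config.py`.

```python
# Hypothetical provider config.py; the pydantic-model pattern matches the
# codebase convention, but these fields are illustrative only.
from pydantic import BaseModel


class MyProviderConfig(BaseModel):
    url: str = "http://localhost:11434"  # endpoint of the external service
    api_key: str | None = None           # optional credential
    timeout_seconds: int = 60
```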
### Running Tests

```bash
# Unit tests (fast)
uv run --group unit pytest tests/unit/

# Integration tests (with replay)
uv run --group test pytest tests/integration/ --stack-config=starter

# Re-record tests
LLAMA_STACK_TEST_INFERENCE_MODE=record uv run --group test pytest tests/integration/
```
## Core Classes to Know

### ProviderSpec Hierarchy

```
ProviderSpec (base)
├── InlineProviderSpec (in-process)
└── RemoteProviderSpec (external services)
```
### Key Runtime Classes

- `LlamaStack` (`core/stack.py`) - Main class implementing all APIs
- `StackRunConfig` (`core/datatypes.py`) - Configuration for a stack
- `ProviderRegistry` (`core/resolver.py`) - Maps APIs to providers
### Key Data Classes

- `Provider` - Concrete provider instance with config
- `Model` - Registered model (from a provider)
- `OpenAIChatCompletion` - Response format (from the Inference API)
## Configuration Files

### run.yaml Structure

```yaml
version: 2
providers:
  [api_name]:
    - provider_id: unique_name
      provider_type: inline::name  # or remote::name
      config: {}  # Provider-specific config
default_models:
  - identifier: model_id
    provider_id: inference_provider_id
vector_stores_config:
  default_provider_id: faiss_or_other
```
### Environment Variables

Override any config value:

```bash
INFERENCE_MODEL=llama-2-7b llama stack run starter
```
## Common File Patterns

### Inline Provider Structure

```
llama_stack/providers/inline/[api]/[provider]/
├── __init__.py     # Exports adapter class
├── config.py       # ConfigClass
├── [provider].py   # AdapterImpl(ProtocolClass)
└── [utils].py      # Helper modules
```
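By convention the `__init__.py` exposes an async constructor that the resolver calls. A hedged sketch follows; `MyProviderConfig` and `MyProviderImpl` are placeholders, and the exact entry-point signature may differ by version, so compare against a real provider in the tree.

```python
# Hedged sketch of an inline provider __init__.py.
from .config import MyProviderConfig


async def get_provider_impl(config: MyProviderConfig, deps):
    # `deps` maps APIs to already-resolved provider instances, which is how
    # dependency injection (Pattern 3 below) gets wired up.
    from .my_provider import MyProviderImpl

    impl = MyProviderImpl(config, deps)
    await impl.initialize()
    return impl
```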
### Remote Provider Structure

```
llama_stack/providers/remote/[api]/[provider]/
├── __init__.py     # Exports adapter class
├── config.py       # ConfigClass
└── [provider].py   # AdapterImpl with HTTP calls
```
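The adapter itself mostly translates protocol methods into HTTP calls against the external service. A hypothetical sketch; the class name, route, and payload shape are illustrative, not taken from a real adapter:

```python
# Hypothetical remote adapter; endpoint and payload shapes are illustrative.
import httpx


class MyRemoteAdapter:
    def __init__(self, config):
        self.config = config

    async def chat_completion(self, model: str, messages: list[dict]) -> dict:
        async with httpx.AsyncClient(base_url=self.config.url) as client:
            resp = await client.post(
                "/v1/chat/completions",
                json={"model": model, "messages": messages},
            )
            resp.raise_for_status()
            return resp.json()
```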
### API Structure

```
llama_stack/apis/[api]/
├── __init__.py       # Exports main protocol
├── [api].py          # Main protocol definition
└── [supporting].py   # Types and supporting classes
```
## Key Design Patterns

### Pattern 1: Auto-Routed APIs

Provider selected automatically based on resource ID:

```python
# Router finds which provider has this model
await inference.post_chat_completion(model="llama-2-7b")
```
### Pattern 2: Routing Tables

Registry APIs that list/register resources:

```python
# Returns merged list from all providers
await models.list_models()

# Router selects provider internally
await models.register_model(model)
```
### Pattern 3: Dependency Injection

Providers depend on other APIs:

```python
class AgentProvider:
    def __init__(self, inference: InferenceProvider, ...):
        self.inference = inference
```
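Concretely, the injected dependency is just another resolved provider instance (the real wiring happens in `core/resolver.py`), which also makes providers easy to test. A self-contained illustration with stand-in classes; `FakeInference` is a test double, not a Llama Stack class:

```python
import asyncio


class FakeInference:
    async def chat_completion(self, model: str, messages: list[dict]) -> dict:
        return {"role": "assistant", "content": "stubbed reply"}


class AgentProvider:
    def __init__(self, inference):
        self.inference = inference  # dependency handed over by the resolver

    async def run_turn(self, prompt: str) -> dict:
        return await self.inference.chat_completion(
            model="llama-2-7b", messages=[{"role": "user", "content": prompt}]
        )


print(asyncio.run(AgentProvider(FakeInference()).run_turn("hi")))
```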
## Important Numbers
- 27 APIs total in Llama Stack
- 30+ Inference Providers (OpenAI, Anthropic, Groq, local, etc.)
- 10+ Vector IO Providers (FAISS, Qdrant, ChromaDB, etc.)
- 5+ Safety Providers (Llama Guard, Bedrock, etc.)
- 7 Built-in Distributions (starter, starter-gpu, meta-reference-gpu, etc.)
## Quick Commands

```bash
# List all APIs
llama stack list-apis

# List all providers
llama stack list-providers [api_name]

# List distributions
llama stack list

# Show dependencies for a distribution
llama stack list-deps starter

# Start a distribution on custom port
llama stack run starter --port 8322

# Interact with running server
curl http://localhost:8321/health
```
## File Size Reference (to judge complexity)
| File | Size | Complexity |
|---|---|---|
| `inference.py` (API) | 46KB | High (30+ parameters) |
| `stack.py` (core) | 21KB | High (orchestration) |
| `resolver.py` (core) | 19KB | High (dependency resolution) |
| `library_client.py` (core) | 20KB | Medium (client implementation) |
| `template.py` (distributions) | 18KB | Medium (config generation) |
## Testing Quick Reference

### Record-Replay Testing

- Record: `LLAMA_STACK_TEST_INFERENCE_MODE=record pytest ...`
- Replay: `pytest ...` (default, no network calls)
- Location: `tests/integration/[api]/cassettes/`
- Format: YAML files with request/response pairs
### Test Structure

- Unit tests: No external dependencies
- Integration tests: Use actual providers (record-replay)
- Common fixtures: `tests/unit/conftest.py`, `tests/integration/conftest.py`
## Common Debugging Tips

- Provider not loading? → Check `llama_stack/providers/registry/[api].py`
- Config validation error? → Check the provider's `Config` class
- Import error? → Verify `pip_packages` in the ProviderSpec
- Routing not working? → Check `llama_stack/core/routers/` or `routing_tables/`
- Test failing? → Check cassettes in `tests/integration/[api]/cassettes/`
## Most Important Files for Beginners

- `pyproject.toml` - Project metadata & entry points
- `llama_stack/core/stack.py` - Understand the main class
- `llama_stack/core/resolver.py` - Understand how providers are loaded
- `llama_stack/apis/inference/inference.py` - Understand an API
- `llama_stack/providers/registry/inference.py` - See all inference providers
- `llama_stack/distributions/starter/starter.py` - See how distributions work