- Fixed Agent Instructions overflow by adding vertical scrolling - Fixed duplicate chat content by skipping turn_complete events - Added comprehensive architecture documentation (4 files) - Added UI bug fixes documentation - Added Notion API upgrade analysis - Created documentation registry for Notion pages 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED
8.1 KiB
Llama Stack Architecture - Documentation Index
This directory contains comprehensive architecture documentation for the Llama Stack codebase. These documents were created through thorough exploration of the entire codebase and are designed to help developers understand the "big picture" without needing to read code snippets from dozens of files.
Documentation Files
1. ARCHITECTURE_SUMMARY.md (30 KB)
Comprehensive technical reference covering all major components
Start here for a complete overview. Covers:
- Core architecture philosophy and design patterns
- Complete directory structure with descriptions
- All 27 APIs with their purposes and providers
- Provider system (inline vs remote)
- Core runtime and resolution process
- Request routing mechanisms
- Distribution system
- CLI architecture
- Testing architecture with record-replay
- Storage and telemetry systems
- Key files to understand
Best for: Getting the complete picture, reference material
2. ARCHITECTURE_INSIGHTS.md (16 KB)
Strategic insights into why design decisions were made
Explains the reasoning and elegance of the architecture. Covers:
- Why this architecture works (problems it solves)
- The genius of the plugin system
- Request routing intelligence
- Configuration as a weapon
- Distributions strategy
- Testing genius (record-replay)
- Core runtime elegance
- Dependency injection
- Client duality (library vs server)
- Extension points
- Performance implications
- Security considerations
- Maturity indicators
- Key architectural decisions
- Learning path for contributors
Best for: Understanding design philosophy, decision-making context
3. QUICK_REFERENCE.md (7.2 KB)
Cheat sheet and quick lookup guide
Fast reference for developers working on the codebase. Covers:
- Key concepts at a glance
- Directory map for navigation
- Common task procedures
- Core classes to know
- Configuration file structures
- Common file patterns
- Key design patterns
- Important numbers
- Quick commands
- File size reference
- Testing quick reference
- Common debugging tips
- Most important files for beginners
Best for: Quick lookup, developers working on code
How to Use These Documents
For New Team Members
- Start with ARCHITECTURE_SUMMARY.md (20 min read)
- Read ARCHITECTURE_INSIGHTS.md (15 min read)
- Bookmark QUICK_REFERENCE.md for later
- Start exploring code using provided file paths
For Understanding a Specific Component
- Search QUICK_REFERENCE.md for the component name
- Get the file path from ARCHITECTURE_SUMMARY.md
- Understand the context from ARCHITECTURE_INSIGHTS.md
- Read the source code
For Adding a New Feature
- Identify which layer(s) you're modifying (API, Provider, Distribution)
- Check ARCHITECTURE_SUMMARY.md for similar components
- Look at existing examples in the codebase
- Use QUICK_REFERENCE.md for implementation patterns
- Follow the extension points in ARCHITECTURE_INSIGHTS.md
For Debugging Issues
- Use QUICK_REFERENCE.md's debugging tips section
- Find the routing mechanism in ARCHITECTURE_SUMMARY.md
- Trace through provider registration in ARCHITECTURE_SUMMARY.md
- Check the request flow diagram
Key Takeaways
The Three Pillars
- APIs (
llama_stack/apis/) - Abstract interfaces (27 total) - Providers (
llama_stack/providers/) - Implementations (50+ total) - Distributions (
llama_stack/distributions/) - Pre-configured bundles
The Architecture Philosophy
- Separation of Concerns - Clear boundaries between APIs, Providers, and Distributions
- Plugin System - Dynamically load providers based on configuration
- Configuration-Driven - YAML-based configuration enables flexibility
- Smart Routing - Automatic request routing to appropriate providers
- Two Client Modes - Library (in-process) or Server (HTTP)
The Testing Revolution
- Record-Replay Pattern - Record real API calls once, replay thousands of times
- Cost Effective - Save money on API calls during development
- Fast - Cached responses = instant test execution
- Provider Agnostic - Same test works with multiple providers
The Extension Strategy
Add custom providers by:
- Creating a module in
providers/[inline|remote]/[api]/[provider]/ - Registering in
providers/registry/[api].py - Using in distribution YAML
No framework customization needed!
Important Statistics
- 27 APIs covering all major AI operations
- 50+ Providers across inline and remote implementations
- 7 Built-in Distributions for different scenarios
- Python 3.12+ required
- 100% Async - Built on asyncio throughout
- Pydantic - For type validation and configuration
- FastAPI - For HTTP server implementation
- OpenTelemetry - For observability
Most Important Files
These files are the foundation - understanding them gives 80% of the architecture knowledge:
/home/asallas/workarea/projects/personal/llama-stack/llama_stack/core/stack.py- Main orchestrator/home/asallas/workarea/projects/personal/llama-stack/llama_stack/core/resolver.py- Dependency resolution/home/asallas/workarea/projects/personal/llama-stack/llama_stack/apis/inference/inference.py- Example API/home/asallas/workarea/projects/personal/llama-stack/llama_stack/providers/datatypes.py- Provider specs/home/asallas/workarea/projects/personal/llama-stack/llama_stack/distributions/template.py- Distribution base
Quick Navigation by Use Case
I want to understand how requests are routed
- ARCHITECTURE_SUMMARY.md → Section 6 "Request Routing"
- ARCHITECTURE_INSIGHTS.md → Section "The Request Routing Intelligence"
- Check:
llama_stack/core/routers/andllama_stack/core/routing_tables/
I want to add a new provider
- QUICK_REFERENCE.md → "Adding a Provider"
- ARCHITECTURE_SUMMARY.md → Section 4 "Provider System"
- Look at existing providers in
llama_stack/providers/[inline|remote]/
I want to understand the testing strategy
- ARCHITECTURE_SUMMARY.md → Section 9 "Testing Architecture"
- ARCHITECTURE_INSIGHTS.md → Section "The Testing Genius"
- Check:
tests/README.mdfor detailed testing guide
I want to understand distributions
- ARCHITECTURE_SUMMARY.md → Section 7 "Distributions"
- ARCHITECTURE_INSIGHTS.md → Section "The Distributions Strategy"
- Look at:
llama_stack/distributions/starter/starter.py
I want to understand the CLI
- ARCHITECTURE_SUMMARY.md → Section 8 "CLI Architecture"
- QUICK_REFERENCE.md → "Quick Commands"
- Look at:
llama_stack/cli/stack/run.py
I want to understand configuration
- ARCHITECTURE_SUMMARY.md → Section 11 "Configuration Management"
- QUICK_REFERENCE.md → "Configuration Files"
- Look at:
llama_stack/core/utils/config_resolution.py
Documentation Creation
These documents were created through:
- Directory exploration - Understanding the codebase structure
- File analysis - Reading key files across all components
- Pattern identification - Recognizing common architectural patterns
- Relationship mapping - Understanding how components interact
- Testing analysis - Understanding test architecture and patterns
All information comes directly from the codebase, with specific file paths provided for verification and deeper exploration.
Staying Current
These documents reflect the codebase as of October 27, 2025. When the codebase changes:
- Check if changes are in the identified key files
- If in existing components, documents are still largely accurate
- If entirely new components, documents should be updated
- The architecture philosophy should remain constant
Questions?
When exploring the codebase with these documents:
- Start with the QUICK_REFERENCE.md for fast lookup
- Use ARCHITECTURE_SUMMARY.md for detailed information
- Consult ARCHITECTURE_INSIGHTS.md for design rationale
- Always verify with actual source code files
The documentation is comprehensive but code is the source of truth.
Created: October 27, 2025
Codebase Analyzed: /home/asallas/workarea/projects/personal/llama-stack/
Focus: Comprehensive architecture overview for developer understanding