mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-11 19:56:03 +00:00

Antony Sallas 5ef6ccf90e fix: UI bug fixes and comprehensive architecture documentation

- Fixed Agent Instructions overflow by adding vertical scrolling
- Fixed duplicate chat content by skipping turn_complete events
- Added comprehensive architecture documentation (4 files)
- Added UI bug fixes documentation
- Added Notion API upgrade analysis
- Created documentation registry for Notion pages

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED

2025-10-27 13:34:02 +08:00

8.1 KiB

Raw Blame History

Llama Stack Architecture - Documentation Index

This directory contains comprehensive architecture documentation for the Llama Stack codebase. These documents were created through thorough exploration of the entire codebase and are designed to help developers understand the "big picture" without needing to read code snippets from dozens of files.

Documentation Files

1. ARCHITECTURE_SUMMARY.md (30 KB)

Comprehensive technical reference covering all major components

Start here for a complete overview. Covers:

Core architecture philosophy and design patterns
Complete directory structure with descriptions
All 27 APIs with their purposes and providers
Provider system (inline vs remote)
Core runtime and resolution process
Request routing mechanisms
Distribution system
CLI architecture
Testing architecture with record-replay
Storage and telemetry systems
Key files to understand

Best for: Getting the complete picture, reference material

2. ARCHITECTURE_INSIGHTS.md (16 KB)

Strategic insights into why design decisions were made

Explains the reasoning and elegance of the architecture. Covers:

Why this architecture works (problems it solves)
The genius of the plugin system
Request routing intelligence
Configuration as a weapon
Distributions strategy
Testing genius (record-replay)
Core runtime elegance
Dependency injection
Client duality (library vs server)
Extension points
Performance implications
Security considerations
Maturity indicators
Key architectural decisions
Learning path for contributors

Best for: Understanding design philosophy, decision-making context

3. QUICK_REFERENCE.md (7.2 KB)

Cheat sheet and quick lookup guide

Fast reference for developers working on the codebase. Covers:

Key concepts at a glance
Directory map for navigation
Common task procedures
Core classes to know
Configuration file structures
Common file patterns
Key design patterns
Important numbers
Quick commands
File size reference
Testing quick reference
Common debugging tips
Most important files for beginners

Best for: Quick lookup, developers working on code

How to Use These Documents

For New Team Members

Start with ARCHITECTURE_SUMMARY.md (20 min read)
Read ARCHITECTURE_INSIGHTS.md (15 min read)
Bookmark QUICK_REFERENCE.md for later
Start exploring code using provided file paths

For Understanding a Specific Component

Search QUICK_REFERENCE.md for the component name
Get the file path from ARCHITECTURE_SUMMARY.md
Understand the context from ARCHITECTURE_INSIGHTS.md
Read the source code

For Adding a New Feature

Identify which layer(s) you're modifying (API, Provider, Distribution)
Check ARCHITECTURE_SUMMARY.md for similar components
Look at existing examples in the codebase
Use QUICK_REFERENCE.md for implementation patterns
Follow the extension points in ARCHITECTURE_INSIGHTS.md

For Debugging Issues

Use QUICK_REFERENCE.md's debugging tips section
Find the routing mechanism in ARCHITECTURE_SUMMARY.md
Trace through provider registration in ARCHITECTURE_SUMMARY.md
Check the request flow diagram

Key Takeaways

The Three Pillars

APIs (llama_stack/apis/) - Abstract interfaces (27 total)
Providers (llama_stack/providers/) - Implementations (50+ total)
Distributions (llama_stack/distributions/) - Pre-configured bundles

The Architecture Philosophy

Separation of Concerns - Clear boundaries between APIs, Providers, and Distributions
Plugin System - Dynamically load providers based on configuration
Configuration-Driven - YAML-based configuration enables flexibility
Smart Routing - Automatic request routing to appropriate providers
Two Client Modes - Library (in-process) or Server (HTTP)

The Testing Revolution

Record-Replay Pattern - Record real API calls once, replay thousands of times
Cost Effective - Save money on API calls during development
Fast - Cached responses = instant test execution
Provider Agnostic - Same test works with multiple providers

The Extension Strategy

Add custom providers by:

Creating a module in providers/[inline|remote]/[api]/[provider]/
Registering in providers/registry/[api].py
Using in distribution YAML

No framework customization needed!

Important Statistics

27 APIs covering all major AI operations
50+ Providers across inline and remote implementations
7 Built-in Distributions for different scenarios
Python 3.12+ required
100% Async - Built on asyncio throughout
Pydantic - For type validation and configuration
FastAPI - For HTTP server implementation
OpenTelemetry - For observability

Most Important Files

These files are the foundation - understanding them gives 80% of the architecture knowledge:

/home/asallas/workarea/projects/personal/llama-stack/llama_stack/core/stack.py - Main orchestrator
/home/asallas/workarea/projects/personal/llama-stack/llama_stack/core/resolver.py - Dependency resolution
/home/asallas/workarea/projects/personal/llama-stack/llama_stack/apis/inference/inference.py - Example API
/home/asallas/workarea/projects/personal/llama-stack/llama_stack/providers/datatypes.py - Provider specs
/home/asallas/workarea/projects/personal/llama-stack/llama_stack/distributions/template.py - Distribution base

I want to understand how requests are routed

ARCHITECTURE_SUMMARY.md → Section 6 "Request Routing"
ARCHITECTURE_INSIGHTS.md → Section "The Request Routing Intelligence"
Check: llama_stack/core/routers/ and llama_stack/core/routing_tables/

I want to add a new provider

QUICK_REFERENCE.md → "Adding a Provider"
ARCHITECTURE_SUMMARY.md → Section 4 "Provider System"
Look at existing providers in llama_stack/providers/[inline|remote]/

I want to understand the testing strategy

ARCHITECTURE_SUMMARY.md → Section 9 "Testing Architecture"
ARCHITECTURE_INSIGHTS.md → Section "The Testing Genius"
Check: tests/README.md for detailed testing guide

I want to understand distributions

ARCHITECTURE_SUMMARY.md → Section 7 "Distributions"
ARCHITECTURE_INSIGHTS.md → Section "The Distributions Strategy"
Look at: llama_stack/distributions/starter/starter.py

I want to understand the CLI

ARCHITECTURE_SUMMARY.md → Section 8 "CLI Architecture"
QUICK_REFERENCE.md → "Quick Commands"
Look at: llama_stack/cli/stack/run.py

I want to understand configuration

ARCHITECTURE_SUMMARY.md → Section 11 "Configuration Management"
QUICK_REFERENCE.md → "Configuration Files"
Look at: llama_stack/core/utils/config_resolution.py

Documentation Creation

These documents were created through:

Directory exploration - Understanding the codebase structure
File analysis - Reading key files across all components
Pattern identification - Recognizing common architectural patterns
Relationship mapping - Understanding how components interact
Testing analysis - Understanding test architecture and patterns

All information comes directly from the codebase, with specific file paths provided for verification and deeper exploration.

Staying Current

These documents reflect the codebase as of October 27, 2025. When the codebase changes:

Check if changes are in the identified key files
If in existing components, documents are still largely accurate
If entirely new components, documents should be updated
The architecture philosophy should remain constant

Questions?

When exploring the codebase with these documents:

Start with the QUICK_REFERENCE.md for fast lookup
Use ARCHITECTURE_SUMMARY.md for detailed information
Consult ARCHITECTURE_INSIGHTS.md for design rationale
Always verify with actual source code files

The documentation is comprehensive but code is the source of truth.

Created: October 27, 2025
Codebase Analyzed: /home/asallas/workarea/projects/personal/llama-stack/
Focus: Comprehensive architecture overview for developer understanding

8.1 KiB Raw Blame History