Llama Stack - Quick Reference Guide

Key Concepts at a Glance

The Three Pillars

  1. APIs (llama_stack/apis/) - Abstract interfaces (27 total)
  2. Providers (llama_stack/providers/) - Implementations (50+ total)
  3. Distributions (llama_stack/distributions/) - Pre-configured bundles
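
To see how the pillars relate, here is a toy sketch in plain Python; the names are illustrative, not the real llama-stack classes. An API is an abstract protocol, a provider is a concrete implementation of it, and a distribution wires chosen providers to APIs.

from typing import Protocol

# "API": an abstract interface, like those under llama_stack/apis/
class Inference(Protocol):
    async def chat(self, model: str, prompt: str) -> str: ...

# "Provider": one concrete implementation, like those under llama_stack/providers/
class EchoInference:
    async def chat(self, model: str, prompt: str) -> str:
        return f"[{model}] {prompt}"

# "Distribution": a pre-configured bundle mapping APIs to providers
distribution = {"inference": EchoInference()}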

Directory Map for Quick Navigation

| Component | Location | Purpose |
|---|---|---|
| Inference API | apis/inference/inference.py | LLM chat, completion, embeddings |
| Agents API | apis/agents/agents.py | Multi-turn agent orchestration |
| Safety API | apis/safety/safety.py | Content filtering |
| Vector IO API | apis/vector_io/vector_io.py | Vector database operations |
| Core Stack | core/stack.py | Main orchestrator (implements all APIs) |
| Provider Resolver | core/resolver.py | Dependency injection & instantiation |
| Inline Inference | providers/inline/inference/ | Local model execution |
| Remote Inference | providers/remote/inference/ | API providers (OpenAI, Ollama, etc.) |
| CLI Entry Point | cli/llama.py | Command-line interface |
| Starter Distribution | distributions/starter/ | Basic multi-provider setup |

Common Tasks

Understanding an API

  1. Read the API definition: llama_stack/apis/[api_name]/[api_name].py
  2. Check common types: llama_stack/apis/common/
  3. Look at providers: llama_stack/providers/registry/[api_name].py
  4. Examine an implementation: llama_stack/providers/inline/[api_name]/[provider]/
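
If the package is installed, a quick way to see what an API module actually exports (a generic introspection snippet, nothing llama-stack-specific):

import llama_stack.apis.inference as inference_api

# Lists the protocols and request/response types the module exposes
print([name for name in dir(inference_api) if not name.startswith("_")])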

Adding a Provider

  1. Create module: llama_stack/providers/remote/[api]/[provider_name]/
  2. Implement a class extending the API protocol (see the sketch below)
  3. Register in: llama_stack/providers/registry/[api].py
  4. Add to distribution: llama_stack/distributions/[distro]/[distro].py
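
A minimal sketch of steps 1 and 2. Every name here is hypothetical, and the method signatures must match whatever the API protocol in llama_stack/apis/[api]/ actually declares:

# config.py: provider-specific settings (llama-stack configs are pydantic models)
from pydantic import BaseModel

class MyProviderConfig(BaseModel):
    api_key: str
    base_url: str = "https://api.example.com"

# my_provider.py: implement the API protocol's methods
class MyProviderAdapter:
    def __init__(self, config: MyProviderConfig):
        self.config = config

    async def chat_completion(self, *args, **kwargs):
        ...  # forward the request to the remote service

Step 3 then points a registry entry at this module, and step 4 makes the new provider_type available to a distribution.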

Debugging a Request

  1. Check routing: llama_stack/core/routers/ or routing_tables/
  2. Find provider: llama_stack/providers/registry/[api].py
  3. Read implementation: llama_stack/providers/[inline|remote]/[api]/[provider]/
  4. Check config: Look for Config class in provider module

Running Tests

# Unit tests (fast)
uv run --group unit pytest tests/unit/

# Integration tests (with replay)
uv run --group test pytest tests/integration/ --stack-config=starter

# Re-record tests
LLAMA_STACK_TEST_INFERENCE_MODE=record uv run --group test pytest tests/integration/

Core Classes to Know

ProviderSpec Hierarchy

ProviderSpec (base)
├── InlineProviderSpec (in-process)
└── RemoteProviderSpec (external services)
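
As a sketch, a registry entry ties a provider type to its module and config class. The import path and field names below are from memory and may differ between versions; verify against llama_stack/providers/registry/:

# Import path and fields are assumptions; check the registry files for the real shape
from llama_stack.providers.datatypes import Api, InlineProviderSpec

spec = InlineProviderSpec(
    api=Api.inference,
    provider_type="inline::my-inference",  # hypothetical provider
    pip_packages=["torch"],                # declared dependencies
    module="llama_stack.providers.inline.inference.my_inference",
    config_class="llama_stack.providers.inline.inference.my_inference.config.MyInferenceConfig",
)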

Key Runtime Classes

  • LlamaStack (core/stack.py) - Main class implementing all APIs
  • StackRunConfig (core/datatypes.py) - Configuration for a stack
  • ProviderRegistry (core/resolver.py) - Maps APIs to providers

Key Data Classes

  • Provider - Concrete provider instance with config
  • Model - Registered model (from a provider)
  • OpenAIChatCompletion - Response format (from Inference API)

Configuration Files

run.yaml Structure

version: 2
providers:
  [api_name]:
    - provider_id: unique_name
      provider_type: inline::name  # or remote::name
      config: {}  # Provider-specific config
default_models:
  - identifier: model_id
    provider_id: inference_provider_id
vector_stores_config:
  default_provider_id: faiss_or_other
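
Since StackRunConfig (core/datatypes.py) models this file, a sketch of loading one, assuming the import path implied by the directory map above:

import yaml

from llama_stack.core.datatypes import StackRunConfig  # path per the directory map

with open("run.yaml") as f:
    config = StackRunConfig(**yaml.safe_load(f))  # pydantic validates the structure

print(config.version)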

Environment Variables

run.yaml values can reference environment variables; set them at launch to override the defaults:

INFERENCE_MODEL=llama-2-7b llama stack run starter

Common File Patterns

Inline Provider Structure

llama_stack/providers/inline/[api]/[provider]/
├── __init__.py          # Exports adapter class
├── config.py            # ConfigClass
├── [provider].py        # AdapterImpl(ProtocolClass)
└── [utils].py           # Helper modules

Remote Provider Structure

llama_stack/providers/remote/[api]/[provider]/
├── __init__.py          # Exports adapter class
├── config.py            # ConfigClass
└── [provider].py        # AdapterImpl with HTTP calls
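
In both layouts, __init__.py conventionally exposes an async factory that the resolver calls to build the implementation. The factory names (get_provider_impl for inline, get_adapter_impl for remote) and the initialize() call are from memory, so confirm against any existing provider:

# __init__.py: sketch of the factory the resolver invokes
from .config import MyProviderConfig  # hypothetical names, as in the earlier sketch

async def get_adapter_impl(config: MyProviderConfig, deps):
    from .my_provider import MyProviderAdapter

    impl = MyProviderAdapter(config)
    await impl.initialize()  # assumption: adapters do async setup here
    return impl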

API Structure

llama_stack/apis/[api]/
├── __init__.py          # Exports main protocol
├── [api].py             # Main protocol definition
└── [supporting].py      # Types and supporting classes

Key Design Patterns

Pattern 1: Auto-Routed APIs

The provider is selected automatically based on the resource ID:

# Router finds which provider has this model
await inference.post_chat_completion(model="llama-2-7b")

Pattern 2: Routing Tables

Registry APIs that list and register resources:

# Returns merged list from all providers
await models.list_models()

# Router selects provider internally
await models.register_model(model)
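
A toy illustration of both routing patterns (the real logic lives in core/routers/ and routing_tables/): registration records which provider owns a resource, listing merges across providers, and calls dispatch by resource ID.

class ToyRoutingTable:
    def __init__(self, providers: dict):
        self.providers = providers  # provider_id -> provider implementation
        self.owner = {}             # model_id -> provider_id

    def register_model(self, model_id: str, provider_id: str):
        self.owner[model_id] = provider_id

    def list_models(self):
        # Merged view across all providers
        return sorted(self.owner)

    def route(self, model_id: str):
        # Auto-routing: return the provider that registered this model
        return self.providers[self.owner[model_id]]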

Pattern 3: Dependency Injection

Providers can depend on other APIs, injected at construction:

class AgentProvider:
    def __init__(self, inference: InferenceProvider, ...):
        self.inference = inference

Important Numbers

  • 27 APIs total in Llama Stack
  • 30+ Inference Providers (OpenAI, Anthropic, Groq, local, etc.)
  • 10+ Vector IO Providers (FAISS, Qdrant, ChromaDB, etc.)
  • 5+ Safety Providers (Llama Guard, Bedrock, etc.)
  • 7 Built-in Distributions (starter, starter-gpu, meta-reference-gpu, etc.)

Quick Commands

# List all APIs
llama stack list-apis

# List all providers
llama stack list-providers [api_name]

# List distributions
llama stack list

# Show dependencies for a distribution
llama stack list-deps starter

# Start a distribution on custom port
llama stack run starter --port 8322

# Interact with running server
curl http://localhost:8321/health

File Size Reference (to judge complexity)

| File | Size | Complexity |
|---|---|---|
| inference.py (API) | 46 KB | High (30+ parameters) |
| stack.py (core) | 21 KB | High (orchestration) |
| resolver.py (core) | 19 KB | High (dependency resolution) |
| library_client.py (core) | 20 KB | Medium (client implementation) |
| template.py (distributions) | 18 KB | Medium (config generation) |

Testing Quick Reference

Record-Replay Testing

  1. Record: LLAMA_STACK_TEST_INFERENCE_MODE=record pytest ...
  2. Replay: pytest ... (default, no network calls)
  3. Location: tests/integration/[api]/cassettes/
  4. Format: YAML files with request/response pairs
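
The mechanism is easy to picture with a toy wrapper (illustrative only; the real harness lives in the test fixtures). In record mode the real call is made and its response saved under a hash of the request; in replay mode the saved response is returned without touching the network.

import hashlib
import json
import os
import pathlib

import yaml

class ToyRecorder:
    def __init__(self, path="cassette.yaml"):
        self.path = pathlib.Path(path)
        self.mode = os.environ.get("LLAMA_STACK_TEST_INFERENCE_MODE", "replay")
        self.cassette = yaml.safe_load(self.path.read_text()) if self.path.exists() else {}

    def call(self, request: dict, real_call):
        key = hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()
        if self.mode == "record":
            self.cassette[key] = real_call(request)  # hit the real provider and save
            self.path.write_text(yaml.safe_dump(self.cassette))
        return self.cassette[key]  # replay: lookup only, no network call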

Test Structure

  • Unit tests: No external dependencies
  • Integration tests: Use actual providers (record-replay)
  • Common fixtures: tests/unit/conftest.py, tests/integration/conftest.py

Common Debugging Tips

  1. Provider not loading? → Check llama_stack/providers/registry/[api].py
  2. Config validation error? → Check provider's Config class
  3. Import error? → Verify pip_packages in ProviderSpec
  4. Routing not working? → Check llama_stack/core/routers/ or routing_tables/
  5. Test failing? → Check cassettes in tests/integration/[api]/cassettes/

Most Important Files for Beginners

  1. pyproject.toml - Project metadata & entry points
  2. llama_stack/core/stack.py - Understand the main class
  3. llama_stack/core/resolver.py - Understand how providers are loaded
  4. llama_stack/apis/inference/inference.py - Understand an API
  5. llama_stack/providers/registry/inference.py - See all inference providers
  6. llama_stack/distributions/starter/starter.py - See how distributions work