# Llama Stack - Quick Reference Guide
## Key Concepts at a Glance
### The Three Pillars
1. **APIs** (`llama_stack/apis/`) - Abstract interfaces (27 total)
2. **Providers** (`llama_stack/providers/`) - Implementations (50+ total)
3. **Distributions** (`llama_stack/distributions/`) - Pre-configured bundles
### Directory Map for Quick Navigation
| Component | Location | Purpose |
|-----------|----------|---------|
| Inference API | `apis/inference/inference.py` | LLM chat, completion, embeddings |
| Agents API | `apis/agents/agents.py` | Multi-turn agent orchestration |
| Safety API | `apis/safety/safety.py` | Content filtering |
| Vector IO API | `apis/vector_io/vector_io.py` | Vector database operations |
| Core Stack | `core/stack.py` | Main orchestrator (implements all APIs) |
| Provider Resolver | `core/resolver.py` | Dependency injection & instantiation |
| Inline Inference | `providers/inline/inference/` | Local model execution |
| Remote Inference | `providers/remote/inference/` | API providers (OpenAI, Ollama, etc.) |
| CLI Entry Point | `cli/llama.py` | Command-line interface |
| Starter Distribution | `distributions/starter/` | Basic multi-provider setup |
## Common Tasks
### Understanding an API
1. Read the API definition: `llama_stack/apis/[api_name]/[api_name].py` (a sketch of the protocol shape follows below)
2. Check common types: `llama_stack/apis/common/`
3. Look at providers: `llama_stack/providers/registry/[api_name].py`
4. Examine an implementation: `llama_stack/providers/inline/[api_name]/[provider]/`
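An API definition is, at its core, a protocol of async methods that providers implement. A minimal sketch of that shape (the class and method names here are illustrative, not the actual Inference API):
```python
# Illustrative sketch only: real files in llama_stack/apis/ define the project's
# actual protocols and types; ExampleAPI and create_thing are hypothetical names.
from typing import Protocol, runtime_checkable

@runtime_checkable
class ExampleAPI(Protocol):
    async def create_thing(self, thing_id: str, payload: dict) -> dict:
        """Each API method is async; providers supply the concrete implementation."""
        ...
```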
### Adding a Provider
1. Create module: `llama_stack/providers/remote/[api]/[provider_name]/`
2. Implement a class extending the API protocol (see the sketch below)
3. Register in: `llama_stack/providers/registry/[api].py`
4. Add to distribution: `llama_stack/distributions/[distro]/[distro].py`
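A hedged sketch of step 2, using the hypothetical `ExampleAPI` protocol from the previous section (real providers also implement lifecycle hooks, and the base classes differ per API):
```python
# Hypothetical remote provider; ExampleRemoteConfig and ExampleRemoteImpl are
# illustrative names, not classes from the codebase.
import httpx  # a typical HTTP client dependency for a remote provider

class ExampleRemoteConfig:
    def __init__(self, url: str = "http://localhost:8000"):
        self.url = url

class ExampleRemoteImpl:
    """Satisfies the ExampleAPI protocol by delegating to a remote service."""

    def __init__(self, config: ExampleRemoteConfig):
        self.config = config

    async def create_thing(self, thing_id: str, payload: dict) -> dict:
        async with httpx.AsyncClient(base_url=self.config.url) as client:
            resp = await client.post(f"/things/{thing_id}", json=payload)
            resp.raise_for_status()
            return resp.json()
```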
### Debugging a Request
1. Check routing: `llama_stack/core/routers/` or `routing_tables/`
2. Find provider: `llama_stack/providers/registry/[api].py`
3. Read implementation: `llama_stack/providers/[inline|remote]/[api]/[provider]/`
4. Check config: look for the `Config` class in the provider module (sketched below)
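Provider configs are typically pydantic models, so step 4 usually means reading a class shaped roughly like this (the fields below are illustrative, not from a real provider):
```python
# Hedged sketch of a provider Config class; url and api_key are hypothetical
# fields. Pydantic validation errors on classes like this are a common failure.
from pydantic import BaseModel, Field

class MyProviderConfig(BaseModel):
    url: str = Field(default="http://localhost:11434", description="Service endpoint")
    api_key: str | None = None
```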
### Running Tests
```bash
# Unit tests (fast)
uv run --group unit pytest tests/unit/
# Integration tests (with replay)
uv run --group test pytest tests/integration/ --stack-config=starter
# Re-record tests
LLAMA_STACK_TEST_INFERENCE_MODE=record uv run --group test pytest tests/integration/
```
## Core Classes to Know
### ProviderSpec Hierarchy
```
ProviderSpec (base)
├── InlineProviderSpec (in-process)
└── RemoteProviderSpec (external services)
```
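A registry entry pairs a provider type with its module, config class, and pip dependencies. A sketch of the shape (the import path and exact field names are assumptions; check `providers/registry/` and `providers/datatypes.py` for the real constructor):
```python
# Illustrative only: field names follow the RemoteProviderSpec pattern but may
# not match the current constructor exactly.
from llama_stack.providers.datatypes import Api, RemoteProviderSpec  # assumed path

spec = RemoteProviderSpec(
    api=Api.inference,
    provider_type="remote::example",
    pip_packages=["httpx"],  # installed when the provider is enabled
    module="llama_stack.providers.remote.inference.example",
    config_class="llama_stack.providers.remote.inference.example.ExampleRemoteConfig",
)
```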
### Key Runtime Classes
- **LlamaStack** (`core/stack.py`) - Main class implementing all APIs
- **StackRunConfig** (`core/datatypes.py`) - Configuration for a stack
- **ProviderRegistry** (`core/resolver.py`) - Maps APIs to providers
### Key Data Classes
- **Provider** - Concrete provider instance with config
- **Model** - Registered model (from a provider)
- **OpenAIChatCompletion** - Response format (from Inference API)
## Configuration Files
### run.yaml Structure
```yaml
version: 2
providers:
  [api_name]:
    - provider_id: unique_name
      provider_type: inline::name  # or remote::name
      config: {}  # Provider-specific config
default_models:
  - identifier: model_id
    provider_id: inference_provider_id
vector_stores_config:
  default_provider_id: faiss_or_other
```
### Environment Variables
Many configuration values can be overridden via environment variables:
```bash
INFERENCE_MODEL=llama-2-7b llama stack run starter
```
## Common File Patterns
### Inline Provider Structure
```
llama_stack/providers/inline/[api]/[provider]/
├── __init__.py # Exports adapter class
├── config.py # ConfigClass
├── [provider].py # AdapterImpl(ProtocolClass)
└── [utils].py # Helper modules
```
### Remote Provider Structure
```
llama_stack/providers/remote/[api]/[provider]/
├── __init__.py # Exports adapter class
├── config.py # ConfigClass
└── [provider].py # AdapterImpl with HTTP calls
```
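By convention the `__init__.py` exposes an async factory that the resolver calls to build the adapter. A sketch of that pattern with assumed names (`get_adapter_impl` follows the repo's remote-provider convention; verify in the source):
```python
# Sketch of a remote provider's __init__.py; MyProviderConfig and
# MyProviderAdapter are illustrative names following the structure shown above.
from .config import MyProviderConfig

async def get_adapter_impl(config: MyProviderConfig, _deps):
    # Import deferred so the provider's pip_packages load only when enabled.
    from .my_provider import MyProviderAdapter

    impl = MyProviderAdapter(config)
    await impl.initialize()
    return impl
```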
### API Structure
```
llama_stack/apis/[api]/
├── __init__.py # Exports main protocol
├── [api].py # Main protocol definition
└── [supporting].py # Types and supporting classes
```
## Key Design Patterns
### Pattern 1: Auto-Routed APIs
The provider is selected automatically based on the resource ID:
```python
# Router finds which provider has this model
await inference.post_chat_completion(model="llama-2-7b")
```
### Pattern 2: Routing Tables
Registry APIs list and register resources across all providers:
```python
# Returns merged list from all providers
await models.list_models()
# Router selects provider internally
await models.register_model(model)
```
### Pattern 3: Dependency Injection
Providers can declare dependencies on other APIs, which are injected at construction:
```python
class AgentProvider:
    def __init__(self, inference: InferenceProvider, ...):
        self.inference = inference
```
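The resolver (`core/resolver.py`) builds this dependency graph and instantiates providers in the right order at stack startup.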
## Important Numbers
- **27 APIs** total in Llama Stack
- **30+ Inference Providers** (OpenAI, Anthropic, Groq, local, etc.)
- **10+ Vector IO Providers** (FAISS, Qdrant, ChromaDB, etc.)
- **5+ Safety Providers** (Llama Guard, Bedrock, etc.)
- **7 Built-in Distributions** (starter, starter-gpu, meta-reference-gpu, etc.)
## Quick Commands
```bash
# List all APIs
llama stack list-apis
# List all providers
llama stack list-providers [api_name]
# List distributions
llama stack list
# Show dependencies for a distribution
llama stack list-deps starter
# Start a distribution on custom port
llama stack run starter --port 8322
# Interact with running server
curl http://localhost:8321/health
```
## File Size Reference (to judge complexity)
| File | Size | Complexity |
|------|------|-----------|
| inference.py (API) | 46KB | High (30+ parameters) |
| stack.py (core) | 21KB | High (orchestration) |
| resolver.py (core) | 19KB | High (dependency resolution) |
| library_client.py (core) | 20KB | Medium (client implementation) |
| template.py (distributions) | 18KB | Medium (config generation) |
## Testing Quick Reference
### Record-Replay Testing
1. **Record**: `LLAMA_STACK_TEST_INFERENCE_MODE=record pytest ...`
2. **Replay**: `pytest ...` (default, no network calls)
3. **Location**: `tests/integration/[api]/cassettes/`
4. **Format**: YAML files with request/response pairs
### Test Structure
- Unit tests: No external dependencies
- Integration tests: Use actual providers (record-replay)
- Common fixtures: `tests/unit/conftest.py`, `tests/integration/conftest.py`
## Common Debugging Tips
1. **Provider not loading?** → Check `llama_stack/providers/registry/[api].py`
2. **Config validation error?** → Check provider's `Config` class
3. **Import error?** → Verify `pip_packages` in ProviderSpec
4. **Routing not working?** → Check `llama_stack/core/routers/` or `routing_tables/`
5. **Test failing?** → Check cassettes in `tests/integration/[api]/cassettes/`
## Most Important Files for Beginners
1. `pyproject.toml` - Project metadata & entry points
2. `llama_stack/core/stack.py` - Understand the main class
3. `llama_stack/core/resolver.py` - Understand how providers are loaded
4. `llama_stack/apis/inference/inference.py` - Understand an API
5. `llama_stack/providers/registry/inference.py` - See all inference providers
6. `llama_stack/distributions/starter/starter.py` - See how distributions work