# Llama Stack - Quick Reference Guide
## Key Concepts at a Glance
### The Three Pillars
1. **APIs** (`llama_stack/apis/`) - Abstract interfaces (27 total)
2. **Providers** (`llama_stack/providers/`) - Implementations (50+ total)
3. **Distributions** (`llama_stack/distributions/`) - Pre-configured bundles
### Directory Map for Quick Navigation
| Component | Location | Purpose |
|-----------|----------|---------|
| Inference API | `apis/inference/inference.py` | LLM chat, completion, embeddings |
| Agents API | `apis/agents/agents.py` | Multi-turn agent orchestration |
| Safety API | `apis/safety/safety.py` | Content filtering |
| Vector IO API | `apis/vector_io/vector_io.py` | Vector database operations |
| Core Stack | `core/stack.py` | Main orchestrator (implements all APIs) |
| Provider Resolver | `core/resolver.py` | Dependency injection & instantiation |
| Inline Inference | `providers/inline/inference/` | Local model execution |
| Remote Inference | `providers/remote/inference/` | API providers (OpenAI, Ollama, etc.) |
| CLI Entry Point | `cli/llama.py` | Command-line interface |
| Starter Distribution | `distributions/starter/` | Basic multi-provider setup |
## Common Tasks
### Understanding an API
1. Read the API definition: `llama_stack/apis/[api_name]/[api_name].py` (a sketch of the protocol shape follows below)
2. Check common types: `llama_stack/apis/common/`
3. Look at providers: `llama_stack/providers/registry/[api_name].py`
4. Examine an implementation: `llama_stack/providers/inline/[api_name]/[provider]/`
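An API definition is, at its core, a protocol of async methods that providers implement. A minimal sketch of that shape (the class and method names here are illustrative, not the actual Inference API):
```python
# Illustrative sketch only: real files in llama_stack/apis/ define the project's
# actual protocols and types; ExampleAPI and create_thing are hypothetical names.
from typing import Protocol, runtime_checkable

@runtime_checkable
class ExampleAPI(Protocol):
    async def create_thing(self, thing_id: str, payload: dict) -> dict:
        """Each API method is async; providers supply the concrete implementation."""
        ...
```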
### Adding a Provider
1. Create module: `llama_stack/providers/remote/[api]/[provider_name]/`
2. Implement a class extending the API protocol (see the sketch below)
3. Register in: `llama_stack/providers/registry/[api].py`
4. Add to distribution: `llama_stack/distributions/[distro]/[distro].py`
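A hedged sketch of step 2, using the hypothetical `ExampleAPI` protocol from the previous section (real providers also implement lifecycle hooks, and the base classes differ per API):
```python
# Hypothetical remote provider; ExampleRemoteConfig and ExampleRemoteImpl are
# illustrative names, not classes from the codebase.
import httpx  # a typical HTTP client dependency for a remote provider

class ExampleRemoteConfig:
    def __init__(self, url: str = "http://localhost:8000"):
        self.url = url

class ExampleRemoteImpl:
    """Satisfies the ExampleAPI protocol by delegating to a remote service."""

    def __init__(self, config: ExampleRemoteConfig):
        self.config = config

    async def create_thing(self, thing_id: str, payload: dict) -> dict:
        async with httpx.AsyncClient(base_url=self.config.url) as client:
            resp = await client.post(f"/things/{thing_id}", json=payload)
            resp.raise_for_status()
            return resp.json()
```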
### Debugging a Request
1. Check routing: `llama_stack/core/routers/` or `routing_tables/`
2. Find provider: `llama_stack/providers/registry/[api].py`
3. Read implementation: `llama_stack/providers/[inline|remote]/[api]/[provider]/`
4. Check config: look for the `Config` class in the provider module (sketched below)
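Provider configs are typically pydantic models, so step 4 usually means reading a class shaped roughly like this (the fields below are illustrative, not from a real provider):
```python
# Hedged sketch of a provider Config class; url and api_key are hypothetical
# fields. Pydantic validation errors on classes like this are a common failure.
from pydantic import BaseModel, Field

class MyProviderConfig(BaseModel):
    url: str = Field(default="http://localhost:11434", description="Service endpoint")
    api_key: str | None = None
```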
### Running Tests
```bash
# Unit tests (fast)
uv run --group unit pytest tests/unit/
# Integration tests (with replay)
uv run --group test pytest tests/integration/ --stack-config=starter
# Re-record tests
LLAMA_STACK_TEST_INFERENCE_MODE=record uv run --group test pytest tests/integration/
```
## Core Classes to Know
### ProviderSpec Hierarchy
```
ProviderSpec (base)
├── InlineProviderSpec (in-process)
└── RemoteProviderSpec (external services)
```
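A registry entry pairs a provider type with its module, config class, and pip dependencies. A sketch of the shape (the import path and exact field names are assumptions; check `providers/registry/` and `providers/datatypes.py` for the real constructor):
```python
# Illustrative only: field names follow the RemoteProviderSpec pattern but may
# not match the current constructor exactly.
from llama_stack.providers.datatypes import Api, RemoteProviderSpec  # assumed path

spec = RemoteProviderSpec(
    api=Api.inference,
    provider_type="remote::example",
    pip_packages=["httpx"],  # installed when the provider is enabled
    module="llama_stack.providers.remote.inference.example",
    config_class="llama_stack.providers.remote.inference.example.ExampleRemoteConfig",
)
```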
### Key Runtime Classes
- **LlamaStack** (`core/stack.py`) - Main class implementing all APIs
- **StackRunConfig** (`core/datatypes.py`) - Configuration for a stack
- **ProviderRegistry** (`core/resolver.py`) - Maps APIs to providers
### Key Data Classes
- **Provider** - Concrete provider instance with config
- **Model** - Registered model (from a provider)
- **OpenAIChatCompletion** - Response format (from Inference API)
## Configuration Files
### run.yaml Structure
```yaml
version: 2
providers:
  [api_name]:
    - provider_id: unique_name
      provider_type: inline::name  # or remote::name
      config: {}  # Provider-specific config
default_models:
  - identifier: model_id
    provider_id: inference_provider_id
vector_stores_config:
  default_provider_id: faiss_or_other
```
### Environment Variables
Many configuration values can be overridden via environment variables:
```bash
INFERENCE_MODEL=llama-2-7b llama stack run starter
```
## Common File Patterns
### Inline Provider Structure
```
llama_stack/providers/inline/[api]/[provider]/
├── __init__.py # Exports adapter class
├── config.py # ConfigClass
├── [provider].py # AdapterImpl(ProtocolClass)
└── [utils].py # Helper modules
```
### Remote Provider Structure
```
llama_stack/providers/remote/[api]/[provider]/
├── __init__.py # Exports adapter class
├── config.py # ConfigClass
└── [provider].py # AdapterImpl with HTTP calls
```
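By convention the `__init__.py` exposes an async factory that the resolver calls to build the adapter. A sketch of that pattern with assumed names (`get_adapter_impl` follows the repo's remote-provider convention; verify in the source):
```python
# Sketch of a remote provider's __init__.py; MyProviderConfig and
# MyProviderAdapter are illustrative names following the structure shown above.
from .config import MyProviderConfig

async def get_adapter_impl(config: MyProviderConfig, _deps):
    # Import deferred so the provider's pip_packages load only when enabled.
    from .my_provider import MyProviderAdapter

    impl = MyProviderAdapter(config)
    await impl.initialize()
    return impl
```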
### API Structure
```
llama_stack/apis/[api]/
├── __init__.py # Exports main protocol
├── [api].py # Main protocol definition
└── [supporting].py # Types and supporting classes
```
## Key Design Patterns
### Pattern 1: Auto-Routed APIs
The provider is selected automatically based on the resource ID:
```python
# Router finds which provider has this model
await inference.post_chat_completion(model="llama-2-7b")
```
### Pattern 2: Routing Tables
Registry APIs list and register resources across all providers:
```python
# Returns merged list from all providers
await models.list_models()
# Router selects provider internally
await models.register_model(model)
```
### Pattern 3: Dependency Injection
Providers can declare dependencies on other APIs, which are injected at construction:
```python
class AgentProvider:
    def __init__(self, inference: InferenceProvider, ...):
        self.inference = inference
```
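The resolver (`core/resolver.py`) builds this dependency graph and instantiates providers in the right order at stack startup.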
## Important Numbers
- **27 APIs** total in Llama Stack
- **30+ Inference Providers** (OpenAI, Anthropic, Groq, local, etc.)
- **10+ Vector IO Providers** (FAISS, Qdrant, ChromaDB, etc.)
- **5+ Safety Providers** (Llama Guard, Bedrock, etc.)
- **7 Built-in Distributions** (starter, starter-gpu, meta-reference-gpu, etc.)
## Quick Commands
```bash
# List all APIs
llama stack list-apis
# List all providers
llama stack list-providers [api_name]
# List distributions
llama stack list
# Show dependencies for a distribution
llama stack list-deps starter
# Start a distribution on custom port
llama stack run starter --port 8322
# Interact with running server
curl http://localhost:8321/health
```
## File Size Reference (to judge complexity)
| File | Size | Complexity |
|------|------|-----------|
| inference.py (API) | 46KB | High (30+ parameters) |
| stack.py (core) | 21KB | High (orchestration) |
| resolver.py (core) | 19KB | High (dependency resolution) |
| library_client.py (core) | 20KB | Medium (client implementation) |
| template.py (distributions) | 18KB | Medium (config generation) |
## Testing Quick Reference
### Record-Replay Testing
1. **Record**: `LLAMA_STACK_TEST_INFERENCE_MODE=record pytest ...`
2. **Replay**: `pytest ...` (default, no network calls)
3. **Location**: `tests/integration/[api]/cassettes/`
4. **Format**: YAML files with request/response pairs
### Test Structure
- Unit tests: No external dependencies
- Integration tests: Use actual providers (record-replay)
- Common fixtures: `tests/unit/conftest.py`, `tests/integration/conftest.py`
## Common Debugging Tips
1. **Provider not loading?** → Check `llama_stack/providers/registry/[api].py`
2. **Config validation error?** → Check provider's `Config` class
3. **Import error?** → Verify `pip_packages` in ProviderSpec
4. **Routing not working?** → Check `llama_stack/core/routers/` or `routing_tables/`
5. **Test failing?** → Check cassettes in `tests/integration/[api]/cassettes/`
## Most Important Files for Beginners
1. `pyproject.toml` - Project metadata & entry points
2. `llama_stack/core/stack.py` - Understand the main class
3. `llama_stack/core/resolver.py` - Understand how providers are loaded
4. `llama_stack/apis/inference/inference.py` - Understand an API
5. `llama_stack/providers/registry/inference.py` - See all inference providers
6. `llama_stack/distributions/starter/starter.py` - See how distributions work