# Llama Stack - Quick Reference Guide

## Key Concepts at a Glance

### The Three Pillars

1. **APIs** (`llama_stack/apis/`) - Abstract interfaces (27 total)
2. **Providers** (`llama_stack/providers/`) - Implementations (50+ total)
3. **Distributions** (`llama_stack/distributions/`) - Pre-configured bundles

### Directory Map for Quick Navigation

| Component | Location | Purpose |
|-----------|----------|---------|
| Inference API | `apis/inference/inference.py` | LLM chat, completion, embeddings |
| Agents API | `apis/agents/agents.py` | Multi-turn agent orchestration |
| Safety API | `apis/safety/safety.py` | Content filtering |
| Vector IO API | `apis/vector_io/vector_io.py` | Vector database operations |
| Core Stack | `core/stack.py` | Main orchestrator (implements all APIs) |
| Provider Resolver | `core/resolver.py` | Dependency injection & instantiation |
| Inline Inference | `providers/inline/inference/` | Local model execution |
| Remote Inference | `providers/remote/inference/` | API providers (OpenAI, Ollama, etc.) |
| CLI Entry Point | `cli/llama.py` | Command-line interface |
| Starter Distribution | `distributions/starter/` | Basic multi-provider setup |

## Common Tasks

### Understanding an API

1. Read the API definition: `llama_stack/apis/[api_name]/[api_name].py` (general shape sketched below)
2. Check common types: `llama_stack/apis/common/`
3. Look at providers: `llama_stack/providers/registry/[api_name].py`
4. Examine an implementation: `llama_stack/providers/inline/[api_name]/[provider]/`

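API definitions are Python protocol classes that providers implement. A minimal sketch of the general shape, with a made-up method name and simplified types (not copied from the source):

```python
# Illustrative only: real APIs (e.g. Inference) define many more methods and
# use rich request/response types from llama_stack/apis/common/.
from typing import Protocol, runtime_checkable


@runtime_checkable
class ExampleApi(Protocol):
    async def do_work(self, resource_id: str, payload: dict) -> dict:
        """Providers supply the concrete implementation."""
        ...
```
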
### Adding a Provider

1. Create a module: `llama_stack/providers/remote/[api]/[provider_name]/`
2. Implement a class that extends the API protocol
3. Register it in `llama_stack/providers/registry/[api].py` (sketch below)
4. Add it to a distribution: `llama_stack/distributions/[distro]/[distro].py`

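Registration roughly means adding a `ProviderSpec` entry to the registry module's provider list. A hedged sketch, where `my_provider` and its pip package are hypothetical names, and the exact `ProviderSpec` fields and import path vary across versions:

```python
# Shape of a registry entry; check llama_stack/providers/registry/[api].py
# for the real field names in your version.
from llama_stack.providers.datatypes import Api, RemoteProviderSpec


def available_providers() -> list[RemoteProviderSpec]:
    return [
        RemoteProviderSpec(
            api=Api.inference,
            adapter_type="my_provider",  # hypothetical provider name
            module="llama_stack.providers.remote.inference.my_provider",
            config_class="llama_stack.providers.remote.inference.my_provider.config.MyProviderConfig",
            pip_packages=["my-provider-sdk"],  # hypothetical dependency
        ),
    ]
```
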
### Debugging a Request

1. Check routing: `llama_stack/core/routers/` or `routing_tables/`
2. Find the provider: `llama_stack/providers/registry/[api].py`
3. Read the implementation: `llama_stack/providers/[inline|remote]/[api]/[provider]/`
4. Check the config: look for the `Config` class in the provider module (example below)

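Provider `Config` classes are typically pydantic models, so validation errors name the offending field. A hypothetical example (these fields are invented for illustration, not taken from any real provider):

```python
from pydantic import BaseModel


class MyProviderConfig(BaseModel):
    # Hypothetical fields: real providers define whatever their backend needs.
    url: str = "http://localhost:8000"  # endpoint of the backing service
    api_key: str | None = None          # optional credential
    timeout_seconds: int = 30
```
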
### Running Tests

```bash
# Unit tests (fast)
uv run --group unit pytest tests/unit/

# Integration tests (with replay)
uv run --group test pytest tests/integration/ --stack-config=starter

# Re-record tests
LLAMA_STACK_TEST_INFERENCE_MODE=record uv run --group test pytest tests/integration/
```

## Core Classes to Know

### ProviderSpec Hierarchy

```
ProviderSpec (base)
├── InlineProviderSpec (in-process)
└── RemoteProviderSpec (external services)
```

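In code terms, the split looks roughly like this. A simplified sketch (the real classes live in `llama_stack/providers/datatypes.py` and carry more fields than shown):

```python
# Simplified: field sets are illustrative, not the exact source definitions.
from dataclasses import dataclass, field


@dataclass
class ProviderSpec:
    api: str                # which API this provider implements
    provider_type: str      # e.g. "inline::meta-reference" or "remote::ollama"
    pip_packages: list[str] = field(default_factory=list)


@dataclass
class InlineProviderSpec(ProviderSpec):
    module: str = ""        # code loaded and run in-process with the stack


@dataclass
class RemoteProviderSpec(ProviderSpec):
    adapter_type: str = ""  # adapter that forwards calls to an external service
```
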
### Key Runtime Classes

- **LlamaStack** (`core/stack.py`) - Main class implementing all APIs
- **StackRunConfig** (`core/datatypes.py`) - Configuration for a stack
- **ProviderRegistry** (`core/resolver.py`) - Maps APIs to providers

### Key Data Classes

- **Provider** - Concrete provider instance with config
- **Model** - Registered model (from a provider)
- **OpenAIChatCompletion** - Response format (from the Inference API)

## Configuration Files

### run.yaml Structure

```yaml
version: 2
providers:
  [api_name]:
    - provider_id: unique_name
      provider_type: inline::name # or remote::name
      config: {} # Provider-specific config
default_models:
  - identifier: model_id
    provider_id: inference_provider_id
vector_stores_config:
  default_provider_id: faiss_or_other
```

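As a concrete illustration, a minimal config wiring one remote inference provider might look like this (values are examples; the `config` keys each provider accepts come from its `Config` class):

```yaml
version: 2
providers:
  inference:
    - provider_id: ollama
      provider_type: remote::ollama
      config:
        url: http://localhost:11434 # example endpoint
```
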
### Environment Variables

Override any config value:

```bash
INFERENCE_MODEL=llama-2-7b llama stack run starter
```

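This works because distribution configs can reference environment variables with `${env.*}` placeholders that are substituted at startup. An illustrative snippet (exact placeholder syntax can vary by version):

```yaml
default_models:
  - identifier: ${env.INFERENCE_MODEL} # filled from the environment at startup
    provider_id: ollama
```
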
## Common File Patterns

### Inline Provider Structure

```
llama_stack/providers/inline/[api]/[provider]/
├── __init__.py     # Exports adapter class
├── config.py       # ConfigClass
├── [provider].py   # AdapterImpl(ProtocolClass)
└── [utils].py      # Helper modules
```

### Remote Provider Structure

```
llama_stack/providers/remote/[api]/[provider]/
├── __init__.py     # Exports adapter class
├── config.py       # ConfigClass
└── [provider].py   # AdapterImpl with HTTP calls
```

### API Structure

```
llama_stack/apis/[api]/
├── __init__.py     # Exports main protocol
├── [api].py        # Main protocol definition
└── [supporting].py # Types and supporting classes
```

## Key Design Patterns

### Pattern 1: Auto-Routed APIs

The provider is selected automatically based on the resource ID:

```python
# Router finds which provider serves this model
await inference.post_chat_completion(model="llama-2-7b")
```

### Pattern 2: Routing Tables

Registry APIs list and register resources across providers:

```python
# Returns a merged list from all providers
await models.list_models()

# Router selects the provider internally
await models.register_model(model)
```

### Pattern 3: Dependency Injection

Providers can depend on other APIs, which the resolver injects at construction time:

```python
class AgentProvider:
    # Other injected dependencies elided for brevity
    def __init__(self, inference: InferenceProvider):
        self.inference = inference
```

## Important Numbers

- **27 APIs** total in Llama Stack
- **30+ Inference Providers** (OpenAI, Anthropic, Groq, local, etc.)
- **10+ Vector IO Providers** (FAISS, Qdrant, ChromaDB, etc.)
- **5+ Safety Providers** (Llama Guard, Bedrock, etc.)
- **7 Built-in Distributions** (starter, starter-gpu, meta-reference-gpu, etc.)

## Quick Commands

```bash
# List all APIs
llama stack list-apis

# List all providers
llama stack list-providers [api_name]

# List distributions
llama stack list

# Show dependencies for a distribution
llama stack list-deps starter

# Start a distribution on a custom port
llama stack run starter --port 8322

# Interact with the running server
curl http://localhost:8321/health
```

## File Size Reference (to judge complexity)

| File | Size | Complexity |
|------|------|------------|
| inference.py (API) | 46 KB | High (30+ parameters) |
| stack.py (core) | 21 KB | High (orchestration) |
| resolver.py (core) | 19 KB | High (dependency resolution) |
| library_client.py (core) | 20 KB | Medium (client implementation) |
| template.py (distributions) | 18 KB | Medium (config generation) |

## Testing Quick Reference

### Record-Replay Testing

1. **Record**: `LLAMA_STACK_TEST_INFERENCE_MODE=record pytest ...`
2. **Replay**: `pytest ...` (default, no network calls)
3. **Location**: `tests/integration/[api]/cassettes/`
4. **Format**: YAML files with request/response pairs (sketch below)

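Roughly, each cassette pairs the recorded request with the response to replay. An illustrative (not verbatim) entry:

```yaml
# Illustrative cassette shape; the real files record whatever the provider
# actually sent and returned during the recording run.
request:
  method: POST
  url: http://localhost:11434/v1/chat/completions
  body:
    model: llama-2-7b
    messages:
      - role: user
        content: Hello
response:
  status: 200
  body:
    choices:
      - message:
          role: assistant
          content: Hi there!
```
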
### Test Structure

- Unit tests: No external dependencies
- Integration tests: Use actual providers (record-replay)
- Common fixtures: `tests/unit/conftest.py`, `tests/integration/conftest.py`

## Common Debugging Tips

1. **Provider not loading?** → Check `llama_stack/providers/registry/[api].py`
2. **Config validation error?** → Check the provider's `Config` class
3. **Import error?** → Verify `pip_packages` in the ProviderSpec
4. **Routing not working?** → Check `llama_stack/core/routers/` or `routing_tables/`
5. **Test failing?** → Check the cassettes in `tests/integration/[api]/cassettes/`

## Most Important Files for Beginners

1. `pyproject.toml` - Project metadata & entry points
2. `llama_stack/core/stack.py` - Understand the main class
3. `llama_stack/core/resolver.py` - Understand how providers are loaded
4. `llama_stack/apis/inference/inference.py` - Understand an API
5. `llama_stack/providers/registry/inference.py` - See all inference providers
6. `llama_stack/distributions/starter/starter.py` - See how distributions work
|