fix: UI bug fixes and comprehensive architecture documentation

- Fixed Agent Instructions overflow by adding vertical scrolling
- Fixed duplicate chat content by skipping turn_complete events
- Added comprehensive architecture documentation (4 files)
- Added UI bug fixes documentation
- Added Notion API upgrade analysis
- Created documentation registry for Notion pages

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED

Parent: 3059423cd7
Commit: 5ef6ccf90e
8 changed files with 2603 additions and 1 deletion
QUICK_REFERENCE.md (new file, 222 lines)

@@ -0,0 +1,222 @@
# Llama Stack - Quick Reference Guide

## Key Concepts at a Glance

### The Three Pillars
1. **APIs** (`llama_stack/apis/`) - Abstract interfaces (27 total)
2. **Providers** (`llama_stack/providers/`) - Implementations (50+ total)
3. **Distributions** (`llama_stack/distributions/`) - Pre-configured bundles

### Directory Map for Quick Navigation

| Component | Location | Purpose |
|-----------|----------|---------|
| Inference API | `apis/inference/inference.py` | LLM chat, completion, embeddings |
| Agents API | `apis/agents/agents.py` | Multi-turn agent orchestration |
| Safety API | `apis/safety/safety.py` | Content filtering |
| Vector IO API | `apis/vector_io/vector_io.py` | Vector database operations |
| Core Stack | `core/stack.py` | Main orchestrator (implements all APIs) |
| Provider Resolver | `core/resolver.py` | Dependency injection & instantiation |
| Inline Inference | `providers/inline/inference/` | Local model execution |
| Remote Inference | `providers/remote/inference/` | API providers (OpenAI, Ollama, etc.) |
| CLI Entry Point | `cli/llama.py` | Command-line interface |
| Starter Distribution | `distributions/starter/` | Basic multi-provider setup |

## Common Tasks

### Understanding an API
1. Read the API definition: `llama_stack/apis/[api_name]/[api_name].py`
2. Check common types: `llama_stack/apis/common/`
3. Look at providers: `llama_stack/providers/registry/[api_name].py`
4. Examine an implementation: `llama_stack/providers/inline/[api_name]/[provider]/`

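After tracing the code, you can exercise an API against a running stack. A minimal sketch, assuming the separate `llama-stack-client` package is installed and a server is up on the default port; the model-listing call is illustrative:

```python
# Minimal sketch, assuming `pip install llama-stack-client` and a stack
# server running on the default port 8321.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# List models registered across all inference providers
for model in client.models.list():
    print(model.identifier)
```
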
### Adding a Provider
1. Create module: `llama_stack/providers/remote/[api]/[provider_name]/`
2. Implement a class extending the API protocol (see the sketch below)
3. Register in: `llama_stack/providers/registry/[api].py`
4. Add to distribution: `llama_stack/distributions/[distro]/[distro].py`

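A skeletal sketch of step 2; the class and module names here are hypothetical, and the actual protocol methods to implement live under `llama_stack/apis/[api]/`:

```python
# Hypothetical skeleton for a new remote inference provider; the real
# protocol is defined in llama_stack/apis/inference/inference.py.
class MyRemoteInferenceAdapter:
    def __init__(self, config) -> None:
        self.config = config  # validated config from this module's config.py

    async def initialize(self) -> None:
        # Open HTTP sessions, verify credentials, etc.
        ...
```
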
### Debugging a Request
1. Check routing: `llama_stack/core/routers/` or `routing_tables/`
2. Find provider: `llama_stack/providers/registry/[api].py`
3. Read implementation: `llama_stack/providers/[inline|remote]/[api]/[provider]/`
4. Check config: Look for the `Config` class in the provider module

### Running Tests
```bash
# Unit tests (fast)
uv run --group unit pytest tests/unit/

# Integration tests (with replay)
uv run --group test pytest tests/integration/ --stack-config=starter

# Re-record tests
LLAMA_STACK_TEST_INFERENCE_MODE=record uv run --group test pytest tests/integration/
```

## Core Classes to Know

### ProviderSpec Hierarchy
```
ProviderSpec (base)
├── InlineProviderSpec (in-process)
└── RemoteProviderSpec (external services)
```

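As a rough illustration of how these specs appear in a registry file, an indicative entry for an inline provider; the field names and import path are assumptions, so check `llama_stack/providers/datatypes.py` for the real definitions:

```python
# Indicative sketch of a registry entry; field names and import path
# are assumptions -- verify against llama_stack/providers/datatypes.py.
from llama_stack.providers.datatypes import Api, InlineProviderSpec

spec = InlineProviderSpec(
    api=Api.inference,
    provider_type="inline::meta-reference",   # inline::name convention
    pip_packages=["torch"],                   # pip_packages noted above
    module="llama_stack.providers.inline.inference.meta_reference",
)
```
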
### Key Runtime Classes
- **LlamaStack** (`core/stack.py`) - Main class implementing all APIs
- **StackRunConfig** (`core/datatypes.py`) - Configuration for a stack
- **ProviderRegistry** (`core/resolver.py`) - Maps APIs to providers

### Key Data Classes
- **Provider** - Concrete provider instance with config
- **Model** - Registered model (from a provider)
- **OpenAIChatCompletion** - Response format (from Inference API)

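The stack can also be driven in-process through the library client (see `library_client.py` in the file-size table below). A sketch, with the import path inferred from that table and the `starter` distribution assumed installed:

```python
# Sketch: embedding the stack in-process instead of talking to a server.
# Import path inferred from core/library_client.py.
from llama_stack.core.library_client import LlamaStackAsLibraryClient

client = LlamaStackAsLibraryClient("starter")
client.initialize()  # resolves providers from the distribution's config
print([m.identifier for m in client.models.list()])
```
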
## Configuration Files

### run.yaml Structure
```yaml
version: 2
providers:
  [api_name]:
    - provider_id: unique_name
      provider_type: inline::name or remote::name
      config: {}  # Provider-specific config
default_models:
  - identifier: model_id
    provider_id: inference_provider_id
vector_stores_config:
  default_provider_id: faiss_or_other
```

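A run.yaml maps onto the `StackRunConfig` class noted above. A loading sketch; constructing the model directly from parsed YAML is an assumption for illustration (the stack normally parses and validates this file itself):

```python
# Sketch: parsing run.yaml into StackRunConfig (core/datatypes.py).
# Direct construction like this is an assumption for illustration only.
import yaml

from llama_stack.core.datatypes import StackRunConfig

with open("run.yaml") as f:
    run_config = StackRunConfig(**yaml.safe_load(f))

print(run_config.version)
```
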
### Environment Variables
Override any config value:
```bash
INFERENCE_MODEL=llama-2-7b llama stack run starter
```

## Common File Patterns

### Inline Provider Structure
```
llama_stack/providers/inline/[api]/[provider]/
├── __init__.py     # Exports adapter class
├── config.py       # ConfigClass
├── [provider].py   # AdapterImpl(ProtocolClass)
└── [utils].py      # Helper modules
```

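A typical inline `__init__.py` exposes an async factory. A hedged sketch: `get_provider_impl` is a convention seen across inline providers, but verify the exact signature against a real module, and the class names below are hypothetical:

```python
# Sketch of an inline provider's __init__.py; get_provider_impl is a
# convention for inline providers, and the names below are hypothetical.
from .config import MyProviderConfig


async def get_provider_impl(config: MyProviderConfig, deps):
    from .my_provider import MyProviderImpl

    impl = MyProviderImpl(config, deps)
    await impl.initialize()
    return impl
```
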
### Remote Provider Structure
```
llama_stack/providers/remote/[api]/[provider]/
├── __init__.py     # Exports adapter class
├── config.py       # ConfigClass
└── [provider].py   # AdapterImpl with HTTP calls
```

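Remote providers pair the adapter with a small config class. A sketch of a `config.py`: provider configs in the tree are pydantic models, but these particular field names are illustrative assumptions:

```python
# Illustrative remote-provider config.py; the pydantic base is the norm
# in-tree, but these specific field names are assumptions.
from pydantic import BaseModel


class MyRemoteConfig(BaseModel):
    url: str = "http://localhost:11434"  # remote service endpoint
    api_key: str | None = None           # credential, if the service needs one
```
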
### API Structure
```
llama_stack/apis/[api]/
├── __init__.py       # Exports main protocol
├── [api].py          # Main protocol definition
└── [supporting].py   # Types and supporting classes
```

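The main protocol file defines an abstract interface that providers implement. A toy illustration of the shape only; the real files use llama-stack's own decorators and much richer types:

```python
# Toy illustration of an API protocol definition; NOT llama-stack's
# actual decorators or signatures.
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Inference(Protocol):
    async def chat_completion(self, model_id: str, messages: list[Any]) -> Any:
        """Implemented by providers; routers dispatch by model ID."""
        ...
```
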
## Key Design Patterns

### Pattern 1: Auto-Routed APIs
Provider selected automatically based on resource ID:
```python
# Router finds which provider has this model
await inference.post_chat_completion(model="llama-2-7b")
```

### Pattern 2: Routing Tables
Registry APIs that list/register resources:
```python
# Returns merged list from all providers
await models.list_models()

# Router selects provider internally
await models.register_model(model)
```

### Pattern 3: Dependency Injection
Providers depend on other APIs:
```python
class AgentProvider:
    def __init__(self, inference: InferenceProvider, ...):
        self.inference = inference
```

## Important Numbers

- **27 APIs** total in Llama Stack
- **30+ Inference Providers** (OpenAI, Anthropic, Groq, local, etc.)
- **10+ Vector IO Providers** (FAISS, Qdrant, ChromaDB, etc.)
- **5+ Safety Providers** (Llama Guard, Bedrock, etc.)
- **7 Built-in Distributions** (starter, starter-gpu, meta-reference-gpu, etc.)

## Quick Commands

```bash
# List all APIs
llama stack list-apis

# List all providers
llama stack list-providers [api_name]

# List distributions
llama stack list

# Show dependencies for a distribution
llama stack list-deps starter

# Start a distribution on a custom port
llama stack run starter --port 8322

# Interact with a running server (default port 8321)
curl http://localhost:8321/health
```

## File Size Reference (to judge complexity)

| File | Size | Complexity |
|------|------|------------|
| inference.py (API) | 46KB | High (30+ parameters) |
| stack.py (core) | 21KB | High (orchestration) |
| resolver.py (core) | 19KB | High (dependency resolution) |
| library_client.py (core) | 20KB | Medium (client implementation) |
| template.py (distributions) | 18KB | Medium (config generation) |

## Testing Quick Reference

### Record-Replay Testing
1. **Record**: `LLAMA_STACK_TEST_INFERENCE_MODE=record pytest ...`
2. **Replay**: `pytest ...` (default, no network calls)
3. **Location**: `tests/integration/[api]/cassettes/`
4. **Format**: YAML files with request/response pairs

### Test Structure
- Unit tests: No external dependencies
- Integration tests: Use actual providers (record-replay)
- Common fixtures: `tests/unit/conftest.py`, `tests/integration/conftest.py`

## Common Debugging Tips

1. **Provider not loading?** → Check `llama_stack/providers/registry/[api].py`
2. **Config validation error?** → Check the provider's `Config` class
3. **Import error?** → Verify `pip_packages` in the ProviderSpec
4. **Routing not working?** → Check `llama_stack/core/routers/` or `routing_tables/`
5. **Test failing?** → Check cassettes in `tests/integration/[api]/cassettes/`

## Most Important Files for Beginners

1. `pyproject.toml` - Project metadata & entry points
2. `llama_stack/core/stack.py` - Understand the main class
3. `llama_stack/core/resolver.py` - Understand how providers are loaded
4. `llama_stack/apis/inference/inference.py` - Understand an API
5. `llama_stack/providers/registry/inference.py` - See all inference providers
6. `llama_stack/distributions/starter/starter.py` - See how distributions work