docs(tests): Add a bunch of documentation for our testing systems

Ashwin Bharambe 2025-08-13 17:37:02 -07:00
parent e1e161553c
commit f4281ce66a
7 changed files with 1006 additions and 1 deletions


@ -19,7 +19,8 @@ new_vector_database
## Testing
See the [Test Page](testing.md) which describes how to test your changes.
Llama Stack uses a record-replay testing system for reliable, cost-effective testing. See the [Testing Documentation](testing.md) for comprehensive guides on writing and running tests.
```{toctree}
:maxdepth: 1
:hidden:


@ -1,3 +1,35 @@
# Testing
Llama Stack uses a record-replay system for reliable, fast, and cost-effective testing of AI applications.
## Testing Documentation
```{toctree}
:maxdepth: 1
testing/index
testing/integration-testing
testing/record-replay
testing/writing-tests
testing/troubleshooting
```
## Quick Start
```bash
# Run tests with existing recordings
uv run pytest tests/integration/
# Test against live APIs
FIREWORKS_API_KEY=... pytest tests/integration/ --stack-config=server:fireworks
```
For detailed information, see the [Testing Overview](testing/index.md).
---
## Original Documentation
```{include} ../../../tests/README.md
```


@ -0,0 +1,103 @@
# Testing in Llama Stack
Llama Stack uses a record-replay testing system to handle AI API costs, non-deterministic responses, and multiple provider integrations.
## Core Problems
Testing AI applications creates three challenges:
- **API costs** accumulate quickly during development and CI
- **Non-deterministic responses** make tests unreliable
- **Multiple providers** require testing the same logic across different APIs
## Solution
Record real API responses once, replay them for fast, deterministic tests.
## Architecture Overview
### Test Types
- **Unit tests** (`tests/unit/`) - Test components in isolation with mocks
- **Integration tests** (`tests/integration/`) - Test complete workflows with record-replay
### Core Components
#### Record-Replay System
Captures API calls and replays them deterministically:
```python
# Record real API responses
with inference_recording(mode=InferenceMode.RECORD, storage_dir="recordings"):
    response = await client.chat.completions.create(...)

# Replay cached responses
with inference_recording(mode=InferenceMode.REPLAY, storage_dir="recordings"):
    response = await client.chat.completions.create(...)  # No API call made
```
#### Provider Testing
Write tests once, run against any provider:
```bash
# Same test, different providers
pytest tests/integration/inference/ --stack-config=openai --text-model=gpt-4
pytest tests/integration/inference/ --stack-config=starter --text-model=llama3.2:3b
```
#### Test Parametrization
Generate test combinations from CLI arguments:
```bash
# Creates test for each model/provider combination
pytest tests/integration/ \
--stack-config=inference=fireworks \
--text-model=llama-3.1-8b,llama-3.1-70b
```
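Under the hood, this relies on pytest's standard parametrization hooks: the CLI options are read in `conftest.py` and expanded into one test invocation per value. A minimal sketch of the pattern (illustrative only; the option and fixture names mirror the ones used in these docs, but the real conftest logic may differ):
```python
# conftest.py (illustrative sketch, not the actual Llama Stack conftest)
def pytest_addoption(parser):
    parser.addoption("--text-model", default=None, help="Comma-separated model IDs")


def pytest_generate_tests(metafunc):
    # Expand --text-model=a,b into one test invocation per model ID
    if "text_model_id" in metafunc.fixturenames:
        models = metafunc.config.getoption("--text-model")
        if models:
            metafunc.parametrize("text_model_id", models.split(","))
```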
## How It Works
### Recording Storage
Recordings use SQLite for lookup and JSON for storage:
```
recordings/
├── index.sqlite # Fast lookup by request hash
└── responses/
    ├── abc123def456.json  # Individual response files
    └── def789ghi012.json
```
### Why Record-Replay?
Mocking AI APIs is brittle. Real API responses:
- Include edge cases and realistic data structures
- Preserve streaming behavior
- Can be inspected and debugged
### Why Test All Providers?
One test verifies behavior across all providers, catching integration bugs early.
## Workflow
1. **Develop tests** in `LIVE` mode against real APIs
2. **Record responses** with `RECORD` mode
3. **Commit recordings** for deterministic CI
4. **Tests replay** cached responses in CI
## Quick Start
```bash
# Run tests with existing recordings
uv run pytest tests/integration/
# Test against live APIs
FIREWORKS_API_KEY=... pytest tests/integration/ --stack-config=server:fireworks
```
See [Integration Testing](integration-testing.md) for usage details and [Record-Replay](record-replay.md) for system internals.


@ -0,0 +1,136 @@
# Integration Testing Guide
Practical usage of Llama Stack's integration testing system.
## Basic Usage
```bash
# Run all integration tests
uv run pytest tests/integration/
# Run specific test suites
uv run pytest tests/integration/inference/
uv run pytest tests/integration/agents/
```
## Live API Testing
```bash
# Auto-start server
export FIREWORKS_API_KEY=your_key
pytest tests/integration/inference/ \
--stack-config=server:fireworks \
--text-model=meta-llama/Llama-3.1-8B-Instruct
# Library client
export TOGETHER_API_KEY=your_key
pytest tests/integration/inference/ \
--stack-config=starter \
--text-model=meta-llama/Llama-3.1-8B-Instruct
```
## Configuration
### Stack Config
```bash
--stack-config=server:fireworks # Auto-start server
--stack-config=server:together:8322 # Custom port
--stack-config=starter # Template
--stack-config=/path/to/run.yaml # Config file
--stack-config=inference=fireworks # Adhoc providers
--stack-config=http://localhost:5001 # Existing server
```
### Models
```bash
--text-model=meta-llama/Llama-3.1-8B-Instruct
--vision-model=meta-llama/Llama-3.2-11B-Vision-Instruct
--embedding-model=sentence-transformers/all-MiniLM-L6-v2
```
### Environment
```bash
--env FIREWORKS_API_KEY=your_key
--env OPENAI_BASE_URL=http://localhost:11434/v1
```
## Test Scenarios
### New Provider Testing
```bash
# Test new provider
pytest tests/integration/inference/ \
--stack-config=inference=your-new-provider \
--text-model=your-model-id
```
### Multiple Models
```bash
# Test multiple models
pytest tests/integration/inference/ \
--text-model=llama-3.1-8b,llama-3.1-70b
```
### Local Development
```bash
# Test with local Ollama
pytest tests/integration/inference/ \
--stack-config=starter \
--text-model=llama3.2:3b
```
## Recording Modes
```bash
# Live API calls (default)
LLAMA_STACK_TEST_INFERENCE_MODE=live pytest tests/integration/
# Record new responses
LLAMA_STACK_TEST_INFERENCE_MODE=record \
LLAMA_STACK_TEST_RECORDING_DIR=./recordings \
pytest tests/integration/inference/test_new.py
# Replay cached responses
LLAMA_STACK_TEST_INFERENCE_MODE=replay \
LLAMA_STACK_TEST_RECORDING_DIR=./recordings \
pytest tests/integration/
```
## Recording Management
```bash
# View recordings
sqlite3 recordings/index.sqlite "SELECT * FROM recordings;"
cat recordings/responses/abc123.json
# Re-record tests
rm -rf recordings/
LLAMA_STACK_TEST_INFERENCE_MODE=record pytest tests/integration/test_specific.py
```
## Debugging
```bash
# Verbose output
pytest -vvs tests/integration/inference/
# Debug logging
LLAMA_STACK_LOG_LEVEL=DEBUG pytest tests/integration/test_failing.py
# Custom port for conflicts
pytest tests/integration/ --stack-config=server:fireworks:8322
```
## Best Practices
- Use existing recordings for development
- Record new interactions only when needed
- Test across multiple providers
- Use descriptive test names
- Commit recordings to version control


@ -0,0 +1,80 @@
# Record-Replay System
The record-replay system captures real API interactions and replays them deterministically for fast, reliable testing.
## How It Works
### Request Hashing
API requests are hashed to enable consistent lookup:
```python
import hashlib
import json
from urllib.parse import urlparse


def normalize_request(method: str, url: str, headers: dict, body: dict) -> str:
    normalized = {
        "method": method.upper(),
        "endpoint": urlparse(url).path,
        "body": body,
    }
    return hashlib.sha256(json.dumps(normalized, sort_keys=True).encode()).hexdigest()
```
Hashing is exact: even a small difference in the request body, such as whitespace or float precision, produces a different hash, so a replayed request must match its recording precisely.
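For example, with the function above, even a tiny change in a sampling parameter yields a different hash and therefore misses the existing recording:
```python
h1 = normalize_request("POST", "http://localhost:11434/v1/chat/completions", {},
                       {"model": "llama3.2:3b", "temperature": 0.7})
h2 = normalize_request("POST", "http://localhost:11434/v1/chat/completions", {},
                       {"model": "llama3.2:3b", "temperature": 0.70000001})
assert h1 != h2  # the second request would not find a recording made with the first
```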
### Client Interception
The system patches OpenAI and Ollama client methods to intercept API calls before they leave the client.
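Conceptually, the patch wraps a client method so the recording store is consulted before any network I/O happens. The sketch below illustrates that idea only; it is not the actual implementation, and `lookup_recording` is a hypothetical helper:
```python
import functools


def patch_create(resource_cls, lookup_recording):
    """Sketch: wrap `resource_cls.create` so replayed calls never reach the network."""
    original = resource_cls.create

    @functools.wraps(original)
    async def wrapper(self, **kwargs):
        recorded = lookup_recording(kwargs)    # consult the recording store first
        if recorded is not None:
            return recorded                    # REPLAY: return the cached response
        return await original(self, **kwargs)  # LIVE/RECORD: call the real API

    resource_cls.create = wrapper
```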
## Storage
Recordings use SQLite for indexing and JSON for storage:
```
recordings/
├── index.sqlite # Fast lookup by request hash
└── responses/
    ├── abc123def456.json  # Individual response files
    └── def789ghi012.json
```
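For illustration, a replay lookup against this layout could look roughly like the sketch below. The `recordings` table and `request_hash` column are referenced elsewhere in these docs; everything else here is an assumption, not the actual implementation:
```python
import json
import sqlite3
from pathlib import Path


def load_recorded_response(storage_dir: str, request_hash: str) -> dict | None:
    """Sketch: check the SQLite index, then read the matching JSON response file."""
    conn = sqlite3.connect(Path(storage_dir) / "index.sqlite")
    try:
        row = conn.execute(
            "SELECT request_hash FROM recordings WHERE request_hash = ?",
            (request_hash,),
        ).fetchone()
    finally:
        conn.close()
    if row is None:
        return None  # the caller would raise "No recorded response found"
    return json.loads((Path(storage_dir) / "responses" / f"{row[0]}.json").read_text())
```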
## Recording Modes
### LIVE Mode
Direct API calls, no recording/replay:
```python
with inference_recording(mode=InferenceMode.LIVE):
    response = await client.chat.completions.create(...)
```
### RECORD Mode
Captures API interactions:
```python
with inference_recording(mode=InferenceMode.RECORD, storage_dir="./recordings"):
    response = await client.chat.completions.create(...)
    # Response captured AND returned
```
### REPLAY Mode
Uses stored recordings:
```python
with inference_recording(mode=InferenceMode.REPLAY, storage_dir="./recordings"):
    response = await client.chat.completions.create(...)
    # Returns cached response, no API call
```
## Streaming Support
Streaming responses are captured completely before any chunks are yielded, then replayed as an async generator that matches the original API behavior.
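A minimal sketch of the replay side, assuming the recording stores the captured chunks as a list (the `recording["chunks"]` field name is an assumption):
```python
from typing import Any, AsyncIterator


async def replay_stream(recorded_chunks: list[Any]) -> AsyncIterator[Any]:
    """Yield previously captured chunks in order, mimicking the live streaming API."""
    for chunk in recorded_chunks:
        yield chunk


# Usage sketch: async for chunk in replay_stream(recording["chunks"]): ...
```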
## Environment Variables
```bash
export LLAMA_STACK_TEST_INFERENCE_MODE=replay
export LLAMA_STACK_TEST_RECORDING_DIR=/path/to/recordings
pytest tests/integration/
```
## Common Issues
- **"No recorded response found"** - Re-record with `RECORD` mode
- **Serialization errors** - Response types changed, re-record
- **Hash mismatches** - Request parameters changed slightly; revert the change or re-record


@ -0,0 +1,528 @@
# Testing Troubleshooting Guide
This guide covers common issues encountered when working with Llama Stack's testing infrastructure and how to resolve them.
## Quick Diagnosis
### Test Status Quick Check
```bash
# Check if tests can run at all
uv run pytest tests/integration/inference/test_embedding.py::test_basic_embeddings -v
# Check available models and providers
uv run llama stack list-providers
uv run llama stack list-models
# Verify server connectivity
curl http://localhost:5001/v1/health
```
## Recording and Replay Issues
### "No recorded response found for request hash"
**Symptom:**
```
RuntimeError: No recorded response found for request hash: abc123def456
Endpoint: /v1/chat/completions
Model: meta-llama/Llama-3.1-8B-Instruct
```
**Causes and Solutions:**
1. **Missing recording** - Most common cause
```bash
# Record the missing interaction
LLAMA_STACK_TEST_INFERENCE_MODE=record \
LLAMA_STACK_TEST_RECORDING_DIR=./test_recordings \
pytest tests/integration/inference/test_failing.py -v
```
2. **Request parameters changed**
```bash
# Check what changed by comparing requests
sqlite3 test_recordings/index.sqlite \
"SELECT request_hash, endpoint, model, timestamp FROM recordings WHERE endpoint='/v1/chat/completions';"
# View specific request details
cat test_recordings/responses/abc123def456.json | jq '.request'
```
3. **Different environment/provider**
```bash
# Ensure consistent test environment
pytest tests/integration/ --stack-config=starter --text-model=llama3.2:3b
```
### Recording Failures
**Symptom:**
```
sqlite3.OperationalError: database is locked
```
**Solutions:**
1. **Concurrent access** - Multiple test processes
```bash
# Run tests sequentially
pytest tests/integration/ -n 1
# Or use separate recording directories
LLAMA_STACK_TEST_RECORDING_DIR=./recordings_$(date +%s) pytest ...
```
2. **Incomplete recording cleanup**
```bash
# Clear and restart recording
rm -rf test_recordings/
LLAMA_STACK_TEST_INFERENCE_MODE=record pytest tests/integration/inference/test_specific.py
```
### Serialization/Deserialization Errors
**Symptom:**
```
Failed to deserialize object of type llama_stack.apis.inference.OpenAIChatCompletion
```
**Causes and Solutions:**
1. **API response format changed**
```bash
# Re-record with updated format
rm test_recordings/responses/abc123*.json
LLAMA_STACK_TEST_INFERENCE_MODE=record pytest tests/integration/inference/test_failing.py
```
2. **Missing dependencies for deserialization**
```bash
# Ensure all required packages installed
uv sync --group dev
```
3. **Version mismatch between record and replay**
```bash
# Check Python environment consistency
uv run python -c "import llama_stack; print(llama_stack.__version__)"
```
## Server Connection Issues
### "Connection refused" Errors
**Symptom:**
```
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=5001)
```
**Diagnosis and Solutions:**
1. **Server not running**
```bash
# Check if server is running
curl http://localhost:5001/v1/health
# Start server manually for debugging
llama stack run --template starter --port 5001
```
2. **Port conflicts**
```bash
# Check what's using the port
lsof -i :5001
# Use different port
pytest tests/integration/ --stack-config=server:starter:8322
```
3. **Server startup timeout**
```bash
# Increase startup timeout or check server logs
tail -f server.log
# Manual server management
llama stack run --template starter &
sleep 30 # Wait for startup
pytest tests/integration/
```
### Auto-Server Startup Issues
**Symptom:**
```
Server failed to respond within 30 seconds
```
**Solutions:**
1. **Check server logs**
```bash
# Server logs are written to server.log
tail -f server.log
# Look for startup errors
grep -i error server.log
```
2. **Dependencies missing**
```bash
# Ensure all dependencies installed
uv sync --group dev
# Check specific provider requirements
pip list | grep -i fireworks
```
3. **Resource constraints**
```bash
# Check system resources
htop
df -h
# Use lighter config for testing
pytest tests/integration/ --stack-config=starter
```
## Provider and Model Issues
### "Model not found" Errors
**Symptom:**
```
Model 'meta-llama/Llama-3.1-8B-Instruct' not found
```
**Solutions:**
1. **Check available models**
```bash
# List models for current provider
uv run llama stack list-models
# Use available model
pytest tests/integration/ --text-model=llama3.2:3b
```
2. **Model not downloaded for local providers**
```bash
# Download missing model
ollama pull llama3.2:3b
# Verify model available
ollama list
```
3. **Provider configuration issues**
```bash
# Check provider setup
uv run llama stack list-providers
# Verify API keys set
echo $FIREWORKS_API_KEY
```
### Provider Authentication Failures
**Symptom:**
```
HTTP 401: Invalid API key
```
**Solutions:**
1. **Missing API keys**
```bash
# Set required API key
export FIREWORKS_API_KEY=your_key_here
export OPENAI_API_KEY=your_key_here
# Verify key is set
echo $FIREWORKS_API_KEY
```
2. **Invalid API keys**
```bash
# Test API key directly
curl -H "Authorization: Bearer $FIREWORKS_API_KEY" \
https://api.fireworks.ai/inference/v1/models
```
3. **API key environment issues**
```bash
# Pass environment explicitly
pytest tests/integration/ --env FIREWORKS_API_KEY=your_key
```
## Parametrization Issues
### "No tests ran matching the given pattern"
**Symptom:**
```
collected 0 items
```
**Causes and Solutions:**
1. **No models specified**
```bash
# Specify required models
pytest tests/integration/inference/ --text-model=llama3.2:3b
```
2. **Model/provider mismatch**
```bash
# Use compatible model for provider
pytest tests/integration/ \
--stack-config=starter \
--text-model=llama3.2:3b # Available in Ollama
```
3. **Missing fixtures**
```bash
# Check test requirements
pytest tests/integration/inference/test_embedding.py --collect-only
```
### Excessive Test Combinations
**Symptom:**
Tests run for too many parameter combinations, taking too long.
**Solutions:**
1. **Limit model combinations**
```bash
# Test single model instead of list
pytest tests/integration/ --text-model=llama3.2:3b
```
2. **Use specific test selection**
```bash
# Run specific test pattern
pytest tests/integration/ -k "basic and not vision"
```
3. **Separate test runs**
```bash
# Split by functionality
pytest tests/integration/inference/ --text-model=model1
pytest tests/integration/agents/ --text-model=model2
```
## Performance Issues
### Slow Test Execution
**Symptom:**
Tests take much longer than expected.
**Diagnosis and Solutions:**
1. **Using LIVE mode instead of REPLAY**
```bash
# Verify recording mode
echo $LLAMA_STACK_TEST_INFERENCE_MODE
# Force replay mode
LLAMA_STACK_TEST_INFERENCE_MODE=replay pytest tests/integration/
```
2. **Network latency to providers**
```bash
# Use local providers for development
pytest tests/integration/ --stack-config=starter
```
3. **Large recording files**
```bash
# Check recording directory size
du -sh test_recordings/
# Clean up old recordings
find test_recordings/ -name "*.json" -mtime +30 -delete
```
### Memory Usage Issues
**Symptom:**
```
MemoryError: Unable to allocate memory
```
**Solutions:**
1. **Large recordings in memory**
```bash
# Run tests in smaller batches
pytest tests/integration/inference/ -k "not batch"
```
2. **Model memory requirements**
```bash
# Use smaller models for testing
pytest tests/integration/ --text-model=llama3.2:3b # Instead of 70B
```
## Environment Issues
### Python Environment Problems
**Symptom:**
```
ModuleNotFoundError: No module named 'llama_stack'
```
**Solutions:**
1. **Wrong Python environment**
```bash
# Verify uv environment
uv run python -c "import llama_stack; print('OK')"
# Reinstall if needed
uv sync --group dev
```
2. **Development installation issues**
```bash
# Reinstall in development mode
pip install -e .
# Verify installation
python -c "import llama_stack; print(llama_stack.__file__)"
```
### Path and Import Issues
**Symptom:**
```
ImportError: cannot import name 'LlamaStackClient'
```
**Solutions:**
1. **PYTHONPATH issues**
```bash
# Run from project root
cd /path/to/llama-stack
uv run pytest tests/integration/
```
2. **Relative import issues**
```python
# Use absolute imports in tests
from llama_stack_client import LlamaStackClient # Not relative
```
## Debugging Techniques
### Verbose Logging
Enable detailed logging to understand what's happening:
```bash
# Enable debug logging
LLAMA_STACK_LOG_LEVEL=DEBUG pytest tests/integration/inference/test_failing.py -v -s
# Enable request/response logging
LLAMA_STACK_TEST_INFERENCE_MODE=live \
LLAMA_STACK_LOG_LEVEL=DEBUG \
pytest tests/integration/inference/test_failing.py -v -s
```
### Interactive Debugging
Drop into debugger when tests fail:
```bash
# Run with pdb on failure
pytest tests/integration/inference/test_failing.py --pdb
# Or add breakpoint in test code
def test_something(llama_stack_client):
    import pdb; pdb.set_trace()
    # ... test code
```
### Isolation Testing
Run tests in isolation to identify interactions:
```bash
# Run single test
pytest tests/integration/inference/test_embedding.py::test_basic_embeddings
# Run without recordings
rm -rf test_recordings/
LLAMA_STACK_TEST_INFERENCE_MODE=live pytest tests/integration/inference/test_failing.py
```
### Recording Inspection
Examine recordings to understand what's stored:
```bash
# Check recording database
sqlite3 test_recordings/index.sqlite ".tables"
sqlite3 test_recordings/index.sqlite ".schema recordings"
sqlite3 test_recordings/index.sqlite "SELECT * FROM recordings LIMIT 5;"
# Examine specific recording
find test_recordings/responses/ -name "*.json" | head -1 | xargs cat | jq '.'
# Compare request hashes
python -c "
from llama_stack.testing.inference_recorder import normalize_request
print(normalize_request('POST', 'http://localhost:11434/v1/chat/completions', {}, {'model': 'llama3.2:3b', 'messages': [{'role': 'user', 'content': 'Hello'}]}))
"
```
## Getting Help
### Information to Gather
When reporting issues, include:
1. **Environment details:**
```bash
uv run python --version
uv run python -c "import llama_stack; print(llama_stack.__version__)"
uv pip list
```
2. **Test command and output:**
```bash
# Full command that failed
pytest tests/integration/inference/test_failing.py -v
# Error message and stack trace
```
3. **Configuration details:**
```bash
# Stack configuration used
echo $LLAMA_STACK_TEST_INFERENCE_MODE
ls -la test_recordings/
```
4. **Provider status:**
```bash
uv run llama stack list-providers
uv run llama stack list-models
```
### Common Solutions Summary
| Issue | Quick Fix |
|-------|-----------|
| Missing recordings | `LLAMA_STACK_TEST_INFERENCE_MODE=record pytest ...` |
| Connection refused | Check server: `curl http://localhost:5001/v1/health` |
| No tests collected | Add model: `--text-model=llama3.2:3b` |
| Authentication error | Set API key: `export PROVIDER_API_KEY=...` |
| Serialization error | Re-record: `rm -rf recordings/` then rerun with `LLAMA_STACK_TEST_INFERENCE_MODE=record` |
| Slow tests | Use replay: `LLAMA_STACK_TEST_INFERENCE_MODE=replay` |
Most testing issues stem from configuration mismatches or missing recordings. The record-replay system is designed to be forgiving, but it depends on a consistent environment (same provider, model, and request parameters) so that request hashes match the stored recordings.


@ -0,0 +1,125 @@
# Writing Tests
How to write effective tests for Llama Stack.
## Basic Test Pattern
```python
def test_basic_completion(llama_stack_client, text_model_id):
"""Test basic text completion functionality."""
response = llama_stack_client.inference.completion(
model_id=text_model_id,
content=CompletionMessage(role="user", content="Hello"),
)
# Test structure, not AI output quality
assert response.completion_message is not None
assert isinstance(response.completion_message.content, str)
assert len(response.completion_message.content) > 0
```
## Parameterized Tests
```python
@pytest.mark.parametrize("temperature", [0.0, 0.5, 1.0])
def test_completion_temperature(llama_stack_client, text_model_id, temperature):
    response = llama_stack_client.inference.completion(
        model_id=text_model_id,
        content=CompletionMessage(role="user", content="Hello"),
        sampling_params={"temperature": temperature},
    )
    assert response.completion_message is not None
```
## Provider-Specific Tests
```python
def test_asymmetric_embeddings(llama_stack_client, embedding_model_id):
    if embedding_model_id not in MODELS_SUPPORTING_TASK_TYPE:
        pytest.skip(f"Model {embedding_model_id} doesn't support task types")

    query_response = llama_stack_client.inference.embeddings(
        model_id=embedding_model_id,
        contents=["What is machine learning?"],
        task_type="query",
    )
    passage_response = llama_stack_client.inference.embeddings(
        model_id=embedding_model_id,
        contents=["Machine learning is a subset of AI..."],
        task_type="passage",
    )

    assert query_response.embeddings != passage_response.embeddings
```
## Fixtures
```python
@pytest.fixture(scope="session")
def agent_config(llama_stack_client, text_model_id):
"""Reusable agent configuration."""
return {
"model": text_model_id,
"instructions": "You are a helpful assistant",
"tools": [],
"enable_session_persistence": False,
}
@pytest.fixture(scope="function")
def fresh_session(llama_stack_client):
"""Each test gets fresh state."""
session = llama_stack_client.create_session()
yield session
session.delete()
```
## Common Test Patterns
### Streaming Tests
```python
def test_streaming_completion(llama_stack_client, text_model_id):
    stream = llama_stack_client.inference.completion(
        model_id=text_model_id,
        content=CompletionMessage(role="user", content="Count to 5"),
        stream=True,
    )

    chunks = list(stream)
    assert len(chunks) > 1
    assert all(hasattr(chunk, 'delta') for chunk in chunks)
```
### Error Testing
```python
def test_invalid_model_error(llama_stack_client):
    with pytest.raises(Exception) as exc_info:
        llama_stack_client.inference.completion(
            model_id="nonexistent-model",
            content=CompletionMessage(role="user", content="Hello"),
        )
    assert "model" in str(exc_info.value).lower()
```
## What NOT to Test
```python
# BAD: Testing AI output quality
def test_completion_quality(llama_stack_client, text_model_id):
    response = llama_stack_client.inference.completion(...)
    assert "correct answer" in response.content  # Fragile!


# GOOD: Testing response structure
def test_completion_structure(llama_stack_client, text_model_id):
    response = llama_stack_client.inference.completion(...)
    assert isinstance(response.completion_message.content, str)
    assert len(response.completion_message.content) > 0
```
## Best Practices
- Test API contracts, not AI output quality
- Use descriptive test names
- Keep tests simple and focused
- Record new interactions only when needed
- Use appropriate fixture scopes (session vs function)