From f4281ce66ad9d649fc74e22c0f8febb34c499ad4 Mon Sep 17 00:00:00 2001 From: Ashwin Bharambe Date: Wed, 13 Aug 2025 17:37:02 -0700 Subject: [PATCH] docs(tests): Add a bunch of documentation for our testing systems --- docs/source/contributing/index.md | 3 +- docs/source/contributing/testing.md | 32 ++ docs/source/contributing/testing/index.md | 103 ++++ .../testing/integration-testing.md | 136 +++++ .../contributing/testing/record-replay.md | 80 +++ .../contributing/testing/troubleshooting.md | 528 ++++++++++++++++++ .../contributing/testing/writing-tests.md | 125 +++++ 7 files changed, 1006 insertions(+), 1 deletion(-) create mode 100644 docs/source/contributing/testing/index.md create mode 100644 docs/source/contributing/testing/integration-testing.md create mode 100644 docs/source/contributing/testing/record-replay.md create mode 100644 docs/source/contributing/testing/troubleshooting.md create mode 100644 docs/source/contributing/testing/writing-tests.md diff --git a/docs/source/contributing/index.md b/docs/source/contributing/index.md index 7a3a1c2e2..9f3fd8ea4 100644 --- a/docs/source/contributing/index.md +++ b/docs/source/contributing/index.md @@ -19,7 +19,8 @@ new_vector_database ## Testing -See the [Test Page](testing.md) which describes how to test your changes. +Llama Stack uses a record-replay testing system for reliable, cost-effective testing. See the [Testing Documentation](testing.md) for comprehensive guides on writing and running tests. + ```{toctree} :maxdepth: 1 :hidden: diff --git a/docs/source/contributing/testing.md b/docs/source/contributing/testing.md index 454ded266..32318c3b9 100644 --- a/docs/source/contributing/testing.md +++ b/docs/source/contributing/testing.md @@ -1,3 +1,35 @@ +# Testing + +Llama Stack uses a record-replay system for reliable, fast, and cost-effective testing of AI applications. + +## Testing Documentation + +```{toctree} +:maxdepth: 1 + +testing/index +testing/integration-testing +testing/record-replay +testing/writing-tests +testing/troubleshooting +``` + +## Quick Start + +```bash +# Run tests with existing recordings +uv run pytest tests/integration/ + +# Test against live APIs +FIREWORKS_API_KEY=... pytest tests/integration/ --stack-config=server:fireworks +``` + +For detailed information, see the [Testing Overview](testing/index.md). + +--- + +## Original Documentation + ```{include} ../../../tests/README.md ``` diff --git a/docs/source/contributing/testing/index.md b/docs/source/contributing/testing/index.md new file mode 100644 index 000000000..e8cb0f02c --- /dev/null +++ b/docs/source/contributing/testing/index.md @@ -0,0 +1,103 @@ +# Testing in Llama Stack + +Llama Stack uses a record-replay testing system to handle AI API costs, non-deterministic responses, and multiple provider integrations. + +## Core Problems + +Testing AI applications creates three challenges: + +- **API costs** accumulate quickly during development and CI +- **Non-deterministic responses** make tests unreliable +- **Multiple providers** require testing the same logic across different APIs + +## Solution + +Record real API responses once, replay them for fast, deterministic tests. 
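+
+A minimal sketch of that loop, using the recording-mode environment variables described in the integration testing guide:
+
+```bash
+# One-time: capture real responses from a live provider (real API keys required)
+LLAMA_STACK_TEST_INFERENCE_MODE=record \
+LLAMA_STACK_TEST_RECORDING_DIR=./recordings \
+pytest tests/integration/inference/
+
+# Every run after that: replay the captured responses, deterministic and with no API calls
+LLAMA_STACK_TEST_INFERENCE_MODE=replay \
+LLAMA_STACK_TEST_RECORDING_DIR=./recordings \
+pytest tests/integration/inference/
+```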
+ +## Architecture Overview + +### Test Types + +- **Unit tests** (`tests/unit/`) - Test components in isolation with mocks +- **Integration tests** (`tests/integration/`) - Test complete workflows with record-replay + +### Core Components + +#### Record-Replay System + +Captures API calls and replays them deterministically: + +```python +# Record real API responses +with inference_recording(mode=InferenceMode.RECORD, storage_dir="recordings"): + response = await client.chat.completions.create(...) + +# Replay cached responses +with inference_recording(mode=InferenceMode.REPLAY, storage_dir="recordings"): + response = await client.chat.completions.create(...) # No API call made +``` + +#### Provider Testing + +Write tests once, run against any provider: + +```bash +# Same test, different providers +pytest tests/integration/inference/ --stack-config=openai --text-model=gpt-4 +pytest tests/integration/inference/ --stack-config=starter --text-model=llama3.2:3b +``` + +#### Test Parametrization + +Generate test combinations from CLI arguments: + +```bash +# Creates test for each model/provider combination +pytest tests/integration/ \ + --stack-config=inference=fireworks \ + --text-model=llama-3.1-8b,llama-3.1-70b +``` + +## How It Works + +### Recording Storage + +Recordings use SQLite for lookup and JSON for storage: + +``` +recordings/ +├── index.sqlite # Fast lookup by request hash +└── responses/ + ├── abc123def456.json # Individual response files + └── def789ghi012.json +``` + +### Why Record-Replay? + +Mocking AI APIs is brittle. Real API responses: +- Include edge cases and realistic data structures +- Preserve streaming behavior +- Can be inspected and debugged + +### Why Test All Providers? + +One test verifies behavior across all providers, catching integration bugs early. + +## Workflow + +1. **Develop tests** in `LIVE` mode against real APIs +2. **Record responses** with `RECORD` mode +3. **Commit recordings** for deterministic CI +4. **Tests replay** cached responses in CI + +## Quick Start + +```bash +# Run tests with existing recordings +uv run pytest tests/integration/ + +# Test against live APIs +FIREWORKS_API_KEY=... pytest tests/integration/ --stack-config=server:fireworks +``` + +See [Integration Testing](integration-testing.md) for usage details and [Record-Replay](record-replay.md) for system internals. \ No newline at end of file diff --git a/docs/source/contributing/testing/integration-testing.md b/docs/source/contributing/testing/integration-testing.md new file mode 100644 index 000000000..17f869d9b --- /dev/null +++ b/docs/source/contributing/testing/integration-testing.md @@ -0,0 +1,136 @@ +# Integration Testing Guide + +Practical usage of Llama Stack's integration testing system. 
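+
+Before running the suites below, it can help to confirm the environment is healthy; these are the same quick-diagnosis commands the troubleshooting guide uses:
+
+```bash
+# Check which providers and models are available
+uv run llama stack list-providers
+uv run llama stack list-models
+
+# When pointing tests at an already-running server, verify it responds
+curl http://localhost:5001/v1/health
+```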
+ +## Basic Usage + +```bash +# Run all integration tests +uv run pytest tests/integration/ + +# Run specific test suites +uv run pytest tests/integration/inference/ +uv run pytest tests/integration/agents/ +``` + +## Live API Testing + +```bash +# Auto-start server +export FIREWORKS_API_KEY=your_key +pytest tests/integration/inference/ \ + --stack-config=server:fireworks \ + --text-model=meta-llama/Llama-3.1-8B-Instruct + +# Library client +export TOGETHER_API_KEY=your_key +pytest tests/integration/inference/ \ + --stack-config=starter \ + --text-model=meta-llama/Llama-3.1-8B-Instruct +``` + +## Configuration + +### Stack Config + +```bash +--stack-config=server:fireworks # Auto-start server +--stack-config=server:together:8322 # Custom port +--stack-config=starter # Template +--stack-config=/path/to/run.yaml # Config file +--stack-config=inference=fireworks # Adhoc providers +--stack-config=http://localhost:5001 # Existing server +``` + +### Models + +```bash +--text-model=meta-llama/Llama-3.1-8B-Instruct +--vision-model=meta-llama/Llama-3.2-11B-Vision-Instruct +--embedding-model=sentence-transformers/all-MiniLM-L6-v2 +``` + +### Environment + +```bash +--env FIREWORKS_API_KEY=your_key +--env OPENAI_BASE_URL=http://localhost:11434/v1 +``` + +## Test Scenarios + +### New Provider Testing + +```bash +# Test new provider +pytest tests/integration/inference/ \ + --stack-config=inference=your-new-provider \ + --text-model=your-model-id +``` + +### Multiple Models + +```bash +# Test multiple models +pytest tests/integration/inference/ \ + --text-model=llama-3.1-8b,llama-3.1-70b +``` + +### Local Development + +```bash +# Test with local Ollama +pytest tests/integration/inference/ \ + --stack-config=starter \ + --text-model=llama3.2:3b +``` + +## Recording Modes + +```bash +# Live API calls (default) +LLAMA_STACK_TEST_INFERENCE_MODE=live pytest tests/integration/ + +# Record new responses +LLAMA_STACK_TEST_INFERENCE_MODE=record \ +LLAMA_STACK_TEST_RECORDING_DIR=./recordings \ +pytest tests/integration/inference/test_new.py + +# Replay cached responses +LLAMA_STACK_TEST_INFERENCE_MODE=replay \ +LLAMA_STACK_TEST_RECORDING_DIR=./recordings \ +pytest tests/integration/ +``` + +## Recording Management + +```bash +# View recordings +sqlite3 recordings/index.sqlite "SELECT * FROM recordings;" +cat recordings/responses/abc123.json + +# Re-record tests +rm -rf recordings/ +LLAMA_STACK_TEST_INFERENCE_MODE=record pytest tests/integration/test_specific.py +``` + +## Debugging + +```bash +# Verbose output +pytest -vvs tests/integration/inference/ + +# Debug logging +LLAMA_STACK_LOG_LEVEL=DEBUG pytest tests/integration/test_failing.py + +# Custom port for conflicts +pytest tests/integration/ --stack-config=server:fireworks:8322 +``` + +## Best Practices + +- Use existing recordings for development +- Record new interactions only when needed +- Test across multiple providers +- Use descriptive test names +- Commit recordings to version control \ No newline at end of file diff --git a/docs/source/contributing/testing/record-replay.md b/docs/source/contributing/testing/record-replay.md new file mode 100644 index 000000000..90ee4c7a1 --- /dev/null +++ b/docs/source/contributing/testing/record-replay.md @@ -0,0 +1,80 @@ +# Record-Replay System + +The record-replay system captures real API interactions and replays them deterministically for fast, reliable testing. 
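+
+An end-to-end sketch of the pattern (assuming `inference_recording` and `InferenceMode` can be imported from `llama_stack.testing.inference_recorder`, the module the troubleshooting guide imports `normalize_request` from, and that `client` is an already-constructed OpenAI-compatible client):
+
+```python
+from llama_stack.testing.inference_recorder import InferenceMode, inference_recording
+
+
+async def record_then_replay(client):
+    # First run: forward the call to the real provider and persist the interaction
+    with inference_recording(mode=InferenceMode.RECORD, storage_dir="./recordings"):
+        recorded = await client.chat.completions.create(
+            model="llama3.2:3b",
+            messages=[{"role": "user", "content": "Hello"}],
+        )
+
+    # Later runs: the identical request is served from disk, no network call made
+    with inference_recording(mode=InferenceMode.REPLAY, storage_dir="./recordings"):
+        replayed = await client.chat.completions.create(
+            model="llama3.2:3b",
+            messages=[{"role": "user", "content": "Hello"}],
+        )
+
+    return recorded, replayed
+```
+
+The sections below describe the request hashing, storage layout, and per-mode behavior in more detail.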
+ +## How It Works + +### Request Hashing + +API requests are hashed to enable consistent lookup: + +```python +def normalize_request(method: str, url: str, headers: dict, body: dict) -> str: + normalized = { + "method": method.upper(), + "endpoint": urlparse(url).path, + "body": body + } + return hashlib.sha256(json.dumps(normalized, sort_keys=True).encode()).hexdigest() +``` + +Hashing is precise - different whitespace or float precision produces different hashes. + +### Client Interception + +The system patches OpenAI and Ollama client methods to intercept API calls before they leave the client. + +## Storage + +Recordings use SQLite for indexing and JSON for storage: + +``` +recordings/ +├── index.sqlite # Fast lookup by request hash +└── responses/ + ├── abc123def456.json # Individual response files + └── def789ghi012.json +``` + +## Recording Modes + +### LIVE Mode +Direct API calls, no recording/replay: +```python +with inference_recording(mode=InferenceMode.LIVE): + response = await client.chat.completions.create(...) +``` + +### RECORD Mode +Captures API interactions: +```python +with inference_recording(mode=InferenceMode.RECORD, storage_dir="./recordings"): + response = await client.chat.completions.create(...) + # Response captured AND returned +``` + +### REPLAY Mode +Uses stored recordings: +```python +with inference_recording(mode=InferenceMode.REPLAY, storage_dir="./recordings"): + response = await client.chat.completions.create(...) + # Returns cached response, no API call +``` + +## Streaming Support + +Streaming responses are captured completely before any chunks are yielded, then replayed as an async generator that matches the original API behavior. + +## Environment Variables + +```bash +export LLAMA_STACK_TEST_INFERENCE_MODE=replay +export LLAMA_STACK_TEST_RECORDING_DIR=/path/to/recordings +pytest tests/integration/ +``` + +## Common Issues + +- **"No recorded response found"** - Re-record with `RECORD` mode +- **Serialization errors** - Response types changed, re-record +- **Hash mismatches** - Request parameters changed slightly \ No newline at end of file diff --git a/docs/source/contributing/testing/troubleshooting.md b/docs/source/contributing/testing/troubleshooting.md new file mode 100644 index 000000000..e894d6e4f --- /dev/null +++ b/docs/source/contributing/testing/troubleshooting.md @@ -0,0 +1,528 @@ +# Testing Troubleshooting Guide + +This guide covers common issues encountered when working with Llama Stack's testing infrastructure and how to resolve them. + +## Quick Diagnosis + +### Test Status Quick Check + +```bash +# Check if tests can run at all +uv run pytest tests/integration/inference/test_embedding.py::test_basic_embeddings -v + +# Check available models and providers +uv run llama stack list-providers +uv run llama stack list-models + +# Verify server connectivity +curl http://localhost:5001/v1/health +``` + +## Recording and Replay Issues + +### "No recorded response found for request hash" + +**Symptom:** +``` +RuntimeError: No recorded response found for request hash: abc123def456 +Endpoint: /v1/chat/completions +Model: meta-llama/Llama-3.1-8B-Instruct +``` + +**Causes and Solutions:** + +1. **Missing recording** - Most common cause + ```bash + # Record the missing interaction + LLAMA_STACK_TEST_INFERENCE_MODE=record \ + LLAMA_STACK_TEST_RECORDING_DIR=./test_recordings \ + pytest tests/integration/inference/test_failing.py -v + ``` + +2. 
**Request parameters changed** + ```bash + # Check what changed by comparing requests + sqlite3 test_recordings/index.sqlite \ + "SELECT request_hash, endpoint, model, timestamp FROM recordings WHERE endpoint='/v1/chat/completions';" + + # View specific request details + cat test_recordings/responses/abc123def456.json | jq '.request' + ``` + +3. **Different environment/provider** + ```bash + # Ensure consistent test environment + pytest tests/integration/ --stack-config=starter --text-model=llama3.2:3b + ``` + +### Recording Failures + +**Symptom:** +``` +sqlite3.OperationalError: database is locked +``` + +**Solutions:** + +1. **Concurrent access** - Multiple test processes + ```bash + # Run tests sequentially + pytest tests/integration/ -n 1 + + # Or use separate recording directories + LLAMA_STACK_TEST_RECORDING_DIR=./recordings_$(date +%s) pytest ... + ``` + +2. **Incomplete recording cleanup** + ```bash + # Clear and restart recording + rm -rf test_recordings/ + LLAMA_STACK_TEST_INFERENCE_MODE=record pytest tests/integration/inference/test_specific.py + ``` + +### Serialization/Deserialization Errors + +**Symptom:** +``` +Failed to deserialize object of type llama_stack.apis.inference.OpenAIChatCompletion +``` + +**Causes and Solutions:** + +1. **API response format changed** + ```bash + # Re-record with updated format + rm test_recordings/responses/abc123*.json + LLAMA_STACK_TEST_INFERENCE_MODE=record pytest tests/integration/inference/test_failing.py + ``` + +2. **Missing dependencies for deserialization** + ```bash + # Ensure all required packages installed + uv install --group dev + ``` + +3. **Version mismatch between record and replay** + ```bash + # Check Python environment consistency + uv run python -c "import llama_stack; print(llama_stack.__version__)" + ``` + +## Server Connection Issues + +### "Connection refused" Errors + +**Symptom:** +``` +requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=5001) +``` + +**Diagnosis and Solutions:** + +1. **Server not running** + ```bash + # Check if server is running + curl http://localhost:5001/v1/health + + # Start server manually for debugging + llama stack run --template starter --port 5001 + ``` + +2. **Port conflicts** + ```bash + # Check what's using the port + lsof -i :5001 + + # Use different port + pytest tests/integration/ --stack-config=server:starter:8322 + ``` + +3. **Server startup timeout** + ```bash + # Increase startup timeout or check server logs + tail -f server.log + + # Manual server management + llama stack run --template starter & + sleep 30 # Wait for startup + pytest tests/integration/ + ``` + +### Auto-Server Startup Issues + +**Symptom:** +``` +Server failed to respond within 30 seconds +``` + +**Solutions:** + +1. **Check server logs** + ```bash + # Server logs are written to server.log + tail -f server.log + + # Look for startup errors + grep -i error server.log + ``` + +2. **Dependencies missing** + ```bash + # Ensure all dependencies installed + uv install --group dev + + # Check specific provider requirements + pip list | grep -i fireworks + ``` + +3. **Resource constraints** + ```bash + # Check system resources + htop + df -h + + # Use lighter config for testing + pytest tests/integration/ --stack-config=starter + ``` + +## Provider and Model Issues + +### "Model not found" Errors + +**Symptom:** +``` +Model 'meta-llama/Llama-3.1-8B-Instruct' not found +``` + +**Solutions:** + +1. 
**Check available models** + ```bash + # List models for current provider + uv run llama stack list-models + + # Use available model + pytest tests/integration/ --text-model=llama3.2:3b + ``` + +2. **Model not downloaded for local providers** + ```bash + # Download missing model + ollama pull llama3.2:3b + + # Verify model available + ollama list + ``` + +3. **Provider configuration issues** + ```bash + # Check provider setup + uv run llama stack list-providers + + # Verify API keys set + echo $FIREWORKS_API_KEY + ``` + +### Provider Authentication Failures + +**Symptom:** +``` +HTTP 401: Invalid API key +``` + +**Solutions:** + +1. **Missing API keys** + ```bash + # Set required API key + export FIREWORKS_API_KEY=your_key_here + export OPENAI_API_KEY=your_key_here + + # Verify key is set + echo $FIREWORKS_API_KEY + ``` + +2. **Invalid API keys** + ```bash + # Test API key directly + curl -H "Authorization: Bearer $FIREWORKS_API_KEY" \ + https://api.fireworks.ai/inference/v1/models + ``` + +3. **API key environment issues** + ```bash + # Pass environment explicitly + pytest tests/integration/ --env FIREWORKS_API_KEY=your_key + ``` + +## Parametrization Issues + +### "No tests ran matching the given pattern" + +**Symptom:** +``` +collected 0 items +``` + +**Causes and Solutions:** + +1. **No models specified** + ```bash + # Specify required models + pytest tests/integration/inference/ --text-model=llama3.2:3b + ``` + +2. **Model/provider mismatch** + ```bash + # Use compatible model for provider + pytest tests/integration/ \ + --stack-config=starter \ + --text-model=llama3.2:3b # Available in Ollama + ``` + +3. **Missing fixtures** + ```bash + # Check test requirements + pytest tests/integration/inference/test_embedding.py --collect-only + ``` + +### Excessive Test Combinations + +**Symptom:** +Tests run for too many parameter combinations, taking too long. + +**Solutions:** + +1. **Limit model combinations** + ```bash + # Test single model instead of list + pytest tests/integration/ --text-model=llama3.2:3b + ``` + +2. **Use specific test selection** + ```bash + # Run specific test pattern + pytest tests/integration/ -k "basic and not vision" + ``` + +3. **Separate test runs** + ```bash + # Split by functionality + pytest tests/integration/inference/ --text-model=model1 + pytest tests/integration/agents/ --text-model=model2 + ``` + +## Performance Issues + +### Slow Test Execution + +**Symptom:** +Tests take much longer than expected. + +**Diagnosis and Solutions:** + +1. **Using LIVE mode instead of REPLAY** + ```bash + # Verify recording mode + echo $LLAMA_STACK_TEST_INFERENCE_MODE + + # Force replay mode + LLAMA_STACK_TEST_INFERENCE_MODE=replay pytest tests/integration/ + ``` + +2. **Network latency to providers** + ```bash + # Use local providers for development + pytest tests/integration/ --stack-config=starter + ``` + +3. **Large recording files** + ```bash + # Check recording directory size + du -sh test_recordings/ + + # Clean up old recordings + find test_recordings/ -name "*.json" -mtime +30 -delete + ``` + +### Memory Usage Issues + +**Symptom:** +``` +MemoryError: Unable to allocate memory +``` + +**Solutions:** + +1. **Large recordings in memory** + ```bash + # Run tests in smaller batches + pytest tests/integration/inference/ -k "not batch" + ``` + +2. 
**Model memory requirements** + ```bash + # Use smaller models for testing + pytest tests/integration/ --text-model=llama3.2:3b # Instead of 70B + ``` + +## Environment Issues + +### Python Environment Problems + +**Symptom:** +``` +ModuleNotFoundError: No module named 'llama_stack' +``` + +**Solutions:** + +1. **Wrong Python environment** + ```bash + # Verify uv environment + uv run python -c "import llama_stack; print('OK')" + + # Reinstall if needed + uv install --group dev + ``` + +2. **Development installation issues** + ```bash + # Reinstall in development mode + pip install -e . + + # Verify installation + python -c "import llama_stack; print(llama_stack.__file__)" + ``` + +### Path and Import Issues + +**Symptom:** +``` +ImportError: cannot import name 'LlamaStackClient' +``` + +**Solutions:** + +1. **PYTHONPATH issues** + ```bash + # Run from project root + cd /path/to/llama-stack + uv run pytest tests/integration/ + ``` + +2. **Relative import issues** + ```bash + # Use absolute imports in tests + from llama_stack_client import LlamaStackClient # Not relative + ``` + +## Debugging Techniques + +### Verbose Logging + +Enable detailed logging to understand what's happening: + +```bash +# Enable debug logging +LLAMA_STACK_LOG_LEVEL=DEBUG pytest tests/integration/inference/test_failing.py -v -s + +# Enable request/response logging +LLAMA_STACK_TEST_INFERENCE_MODE=live \ +LLAMA_STACK_LOG_LEVEL=DEBUG \ +pytest tests/integration/inference/test_failing.py -v -s +``` + +### Interactive Debugging + +Drop into debugger when tests fail: + +```bash +# Run with pdb on failure +pytest tests/integration/inference/test_failing.py --pdb + +# Or add breakpoint in test code +def test_something(llama_stack_client): + import pdb; pdb.set_trace() + # ... test code +``` + +### Isolation Testing + +Run tests in isolation to identify interactions: + +```bash +# Run single test +pytest tests/integration/inference/test_embedding.py::test_basic_embeddings + +# Run without recordings +rm -rf test_recordings/ +LLAMA_STACK_TEST_INFERENCE_MODE=live pytest tests/integration/inference/test_failing.py +``` + +### Recording Inspection + +Examine recordings to understand what's stored: + +```bash +# Check recording database +sqlite3 test_recordings/index.sqlite ".tables" +sqlite3 test_recordings/index.sqlite ".schema recordings" +sqlite3 test_recordings/index.sqlite "SELECT * FROM recordings LIMIT 5;" + +# Examine specific recording +find test_recordings/responses/ -name "*.json" | head -1 | xargs cat | jq '.' + +# Compare request hashes +python -c " +from llama_stack.testing.inference_recorder import normalize_request +print(normalize_request('POST', 'http://localhost:11434/v1/chat/completions', {}, {'model': 'llama3.2:3b', 'messages': [{'role': 'user', 'content': 'Hello'}]})) +" +``` + +## Getting Help + +### Information to Gather + +When reporting issues, include: + +1. **Environment details:** + ```bash + uv run python --version + uv run python -c "import llama_stack; print(llama_stack.__version__)" + uv list + ``` + +2. **Test command and output:** + ```bash + # Full command that failed + pytest tests/integration/inference/test_failing.py -v + + # Error message and stack trace + ``` + +3. **Configuration details:** + ```bash + # Stack configuration used + echo $LLAMA_STACK_TEST_INFERENCE_MODE + ls -la test_recordings/ + ``` + +4. 
**Provider status:** + ```bash + uv run llama stack list-providers + uv run llama stack list-models + ``` + +### Common Solutions Summary + +| Issue | Quick Fix | +|-------|-----------| +| Missing recordings | `LLAMA_STACK_TEST_INFERENCE_MODE=record pytest ...` | +| Connection refused | Check server: `curl http://localhost:5001/v1/health` | +| No tests collected | Add model: `--text-model=llama3.2:3b` | +| Authentication error | Set API key: `export PROVIDER_API_KEY=...` | +| Serialization error | Re-record: `rm recordings/*.json && record mode` | +| Slow tests | Use replay: `LLAMA_STACK_TEST_INFERENCE_MODE=replay` | + +Most testing issues stem from configuration mismatches or missing recordings. The record-replay system is designed to be forgiving, but requires consistent environment setup for optimal performance. \ No newline at end of file diff --git a/docs/source/contributing/testing/writing-tests.md b/docs/source/contributing/testing/writing-tests.md new file mode 100644 index 000000000..0f1c937c0 --- /dev/null +++ b/docs/source/contributing/testing/writing-tests.md @@ -0,0 +1,125 @@ +# Writing Tests + +How to write effective tests for Llama Stack. + +## Basic Test Pattern + +```python +def test_basic_completion(llama_stack_client, text_model_id): + """Test basic text completion functionality.""" + response = llama_stack_client.inference.completion( + model_id=text_model_id, + content=CompletionMessage(role="user", content="Hello"), + ) + + # Test structure, not AI output quality + assert response.completion_message is not None + assert isinstance(response.completion_message.content, str) + assert len(response.completion_message.content) > 0 +``` + +## Parameterized Tests + +```python +@pytest.mark.parametrize("temperature", [0.0, 0.5, 1.0]) +def test_completion_temperature(llama_stack_client, text_model_id, temperature): + response = llama_stack_client.inference.completion( + model_id=text_model_id, + content=CompletionMessage(role="user", content="Hello"), + sampling_params={"temperature": temperature} + ) + assert response.completion_message is not None +``` + +## Provider-Specific Tests + +```python +def test_asymmetric_embeddings(llama_stack_client, embedding_model_id): + if embedding_model_id not in MODELS_SUPPORTING_TASK_TYPE: + pytest.skip(f"Model {embedding_model_id} doesn't support task types") + + query_response = llama_stack_client.inference.embeddings( + model_id=embedding_model_id, + contents=["What is machine learning?"], + task_type="query" + ) + + passage_response = llama_stack_client.inference.embeddings( + model_id=embedding_model_id, + contents=["Machine learning is a subset of AI..."], + task_type="passage" + ) + + assert query_response.embeddings != passage_response.embeddings +``` + +## Fixtures + +```python +@pytest.fixture(scope="session") +def agent_config(llama_stack_client, text_model_id): + """Reusable agent configuration.""" + return { + "model": text_model_id, + "instructions": "You are a helpful assistant", + "tools": [], + "enable_session_persistence": False, + } + +@pytest.fixture(scope="function") +def fresh_session(llama_stack_client): + """Each test gets fresh state.""" + session = llama_stack_client.create_session() + yield session + session.delete() +``` + +## Common Test Patterns + +### Streaming Tests +```python +def test_streaming_completion(llama_stack_client, text_model_id): + stream = llama_stack_client.inference.completion( + model_id=text_model_id, + content=CompletionMessage(role="user", content="Count to 5"), + stream=True + ) + + 
chunks = list(stream) + assert len(chunks) > 1 + assert all(hasattr(chunk, 'delta') for chunk in chunks) +``` + +### Error Testing +```python +def test_invalid_model_error(llama_stack_client): + with pytest.raises(Exception) as exc_info: + llama_stack_client.inference.completion( + model_id="nonexistent-model", + content=CompletionMessage(role="user", content="Hello") + ) + assert "model" in str(exc_info.value).lower() +``` + +## What NOT to Test + +```python +# BAD: Testing AI output quality +def test_completion_quality(llama_stack_client, text_model_id): + response = llama_stack_client.inference.completion(...) + assert "correct answer" in response.content # Fragile! + +# GOOD: Testing response structure +def test_completion_structure(llama_stack_client, text_model_id): + response = llama_stack_client.inference.completion(...) + assert isinstance(response.completion_message.content, str) + assert len(response.completion_message.content) > 0 +``` + +## Best Practices + +- Test API contracts, not AI output quality +- Use descriptive test names +- Keep tests simple and focused +- Record new interactions only when needed +- Use appropriate fixture scopes (session vs function) \ No newline at end of file
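+
+For example, "record new interactions only when needed" in practice means capturing them once when a new test is added and committing them with it; a sketch of that flow (test file name and recording directory are illustrative):
+
+```bash
+# Capture the interactions the new test needs
+LLAMA_STACK_TEST_INFERENCE_MODE=record \
+LLAMA_STACK_TEST_RECORDING_DIR=./recordings \
+pytest tests/integration/inference/test_new_feature.py
+
+# Sanity-check what was stored
+sqlite3 recordings/index.sqlite "SELECT endpoint, model FROM recordings;"
+
+# Commit the recordings alongside the test so CI can replay them
+git add recordings/ tests/integration/inference/test_new_feature.py
+git commit -m "Add new feature test with recordings"
+```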