feat: Add one-step integration testing with server auto-start

Add support for the server:<config> format in the --stack-config option to
enable seamless one-step integration testing. This eliminates the need to
manually start servers in separate terminals before running tests.

Features:
- Auto-start a Llama Stack server if the target port is available
- Reuse an existing server if the port is already in use
- Health check polling with 2-minute timeout
- Custom port support via server:<config>:<port>
- Clean test output with background server execution
- Backward compatibility with all existing formats

Examples:
  pytest tests/integration/inference/ --stack-config=server:fireworks
  pytest tests/integration/safety/ --stack-config=server:together:8322

Test Plan:
- Verified server auto-start with available ports
- Verified server reuse with occupied ports
- Verified health check polling via /v1/health endpoint
- Tested custom port configuration
- Confirmed backward compatibility with existing config formats
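The port check and health polling described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual implementation; the helper names and the `/v1/health` polling interval are hypothetical.

```python
import socket
import time
import urllib.request


def is_port_available(port: int, host: str = "localhost") -> bool:
    """Return True if nothing is currently listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1)
        # connect_ex returns 0 when a listener accepts the connection
        return s.connect_ex((host, port)) != 0


def wait_for_server(url: str, timeout: float = 120.0, interval: float = 0.5) -> bool:
    """Poll a health endpoint until it returns 200 or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Server not up yet (connection refused / timeout); keep polling
            pass
        time.sleep(interval)
    return False
```

With these pieces, the flow is: if `is_port_available(port)` the test harness launches the server in the background and calls `wait_for_server(f"http://localhost:{port}/v1/health")`; otherwise it assumes an existing server and reuses it.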
ehhuang 2025-07-01 12:03:47 -07:00
parent 958600a5c1
commit 6060353016
2 changed files with 110 additions and 5 deletions


@@ -9,7 +9,9 @@ pytest --help
```
Here are the most important options:
- `--stack-config`: specify the stack config to use. You have three ways to point to a stack:
- `--stack-config`: specify the stack config to use. You have four ways to point to a stack:
- **`server:<config>`** - automatically start a server with the given config (e.g., `server:fireworks`). This provides one-step testing by auto-starting the server if the port is available, or reusing an existing server if already running.
- **`server:<config>:<port>`** - same as above but with a custom port (e.g., `server:together:8322`)
- a URL which points to a Llama Stack distribution server
- a template (e.g., `fireworks`, `together`) or a path to a `run.yaml` file
- a comma-separated list of api=provider pairs, e.g. `inference=fireworks,safety=llama-guard,agents=meta-reference`. This is most useful for testing a single API surface.
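A rough sketch of how a `--stack-config` value of the `server:` form might be interpreted. The function name, return shape, and the default port are assumptions for illustration, not the project's actual code.

```python
DEFAULT_PORT = 8321  # assumed default; check your distribution's config


def parse_stack_config(value: str) -> dict:
    """Interpret a --stack-config value (hypothetical helper)."""
    if value.startswith("server:"):
        # server:<config> or server:<config>:<port>
        _, config, *rest = value.split(":")
        port = int(rest[0]) if rest else DEFAULT_PORT
        return {"mode": "server", "config": config, "port": port}
    # URLs, template names, run.yaml paths, and api=provider lists
    # would be handled by the other branches described above
    return {"mode": "direct", "value": value}
```

For example, `server:together:8322` yields the `together` config on port 8322, while `server:fireworks` falls back to the default port.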
@@ -26,12 +28,39 @@ Model parameters can be influenced by the following options:
Each of these are comma-separated lists and can be used to generate multiple parameter combinations. Note that tests will be skipped
if no model is specified.
Experimental options, still under development:
- `--record-responses`: record new API responses instead of using cached ones
## Examples
### Testing against a Server
Run all text inference tests by auto-starting a server with the `fireworks` config:
```bash
pytest -s -v tests/integration/inference/test_text_inference.py \
--stack-config=server:fireworks \
--text-model=meta-llama/Llama-3.1-8B-Instruct
```
Run tests with auto-server startup on a custom port:
```bash
pytest -s -v tests/integration/inference/ \
--stack-config=server:together:8322 \
--text-model=meta-llama/Llama-3.1-8B-Instruct
```
Run multiple test suites with auto-server (eliminates manual server management):
```bash
# Auto-start server and run all integration tests
export FIREWORKS_API_KEY=<your_key>
pytest -s -v tests/integration/inference/ tests/integration/safety/ tests/integration/agents/ \
--stack-config=server:fireworks \
--text-model=meta-llama/Llama-3.1-8B-Instruct
```
### Testing with Library Client
Run all text inference tests with the `together` distribution:
```bash