docs: concepts and building_applications migration (#3534)
# What does this PR do?
- Migrates the remaining documentation sections to the new documentation format

## Test Plan
- Partial migration
Commit c71ce8df61 (parent 05ff4c4420)
82 changed files with 2535 additions and 1237 deletions

docs/docs/contributing/index.mdx (new file, +244 lines)
@@ -0,0 +1,244 @@
# Contributing to Llama Stack

We want to make contributing to this project as easy and transparent as possible.

## Set up your development environment

We use [uv](https://github.com/astral-sh/uv) to manage python dependencies and virtual environments.
You can install `uv` by following this [guide](https://docs.astral.sh/uv/getting-started/installation/).

You can install the dependencies by running:

```bash
cd llama-stack
uv sync --group dev
uv pip install -e .
source .venv/bin/activate
```

```{note}
You can use a specific version of Python with `uv` by adding the `--python <version>` flag (e.g. `--python 3.12`).
Otherwise, `uv` will automatically select a Python version according to the `requires-python` section of the `pyproject.toml`.
For more info, see the [uv docs around Python versions](https://docs.astral.sh/uv/concepts/python-versions/).
```

Note that you can create a dotenv file `.env` that includes necessary environment variables:
```
LLAMA_STACK_BASE_URL=http://localhost:8321
LLAMA_STACK_CLIENT_LOG=debug
LLAMA_STACK_PORT=8321
LLAMA_STACK_CONFIG=<provider-name>
TAVILY_SEARCH_API_KEY=
BRAVE_SEARCH_API_KEY=
```

And then use this dotenv file when running client SDK tests via the following:
```bash
uv run --env-file .env -- pytest -v tests/integration/inference/test_text_inference.py --text-model=meta-llama/Llama-3.1-8B-Instruct
```

### Pre-commit Hooks

We use [pre-commit](https://pre-commit.com/) to run linting and formatting checks on your code. You can install the pre-commit hooks by running:

```bash
uv run pre-commit install
```

After that, pre-commit hooks will run automatically before each commit.

Alternatively, if you don't want to install the pre-commit hooks, you can run the checks manually by running:

```bash
uv run pre-commit run --all-files
```

```{caution}
Before pushing your changes, make sure that the pre-commit hooks have passed successfully.
```

## Discussions -> Issues -> Pull Requests

We actively welcome your pull requests. However, please read the following. This is heavily inspired by [Ghostty](https://github.com/ghostty-org/ghostty/blob/main/CONTRIBUTING.md).

If in doubt, please open a [discussion](https://github.com/meta-llama/llama-stack/discussions); we can always convert that to an issue later.

### Issues

We use GitHub issues to track public bugs. Please ensure your description is clear and has sufficient instructions to be able to reproduce the issue.

Meta has a [bounty program](http://facebook.com/whitehat/info) for the safe disclosure of security bugs. In those cases, please go through the process outlined on that page and do not file a public issue.

### Contributor License Agreement ("CLA")

In order to accept your pull request, we need you to submit a CLA. You only need to do this once to work on any of Meta's open source projects.

Complete your CLA here: <https://code.facebook.com/cla>

**I'd like to contribute!**

If you are new to the project, start by looking at the issues tagged with "good first issue". If you're interested, leave a comment on the issue and a triager will assign it to you.

Please avoid picking up too many issues at once. This helps you stay focused and ensures that others in the community also have opportunities to contribute.
- Try to work on only 1–2 issues at a time, especially if you’re still getting familiar with the codebase.
- Before taking an issue, check if it’s already assigned or being actively discussed.
- If you’re blocked or can’t continue with an issue, feel free to unassign yourself or leave a comment so others can step in.

**I have a bug!**

1. Search the issue tracker and discussions for similar issues.
2. If you don't have steps to reproduce, open a discussion.
3. If you have steps to reproduce, open an issue.

**I have an idea for a feature!**

1. Open a discussion.

**I've implemented a feature!**

1. If there is an issue for the feature, open a pull request.
2. If there is no issue, open a discussion and link to your branch.

**I have a question!**

1. Open a discussion or use [Discord](https://discord.gg/llama-stack).

**Opening a Pull Request**

1. Fork the repo and create your branch from `main`.
2. If you've changed APIs, update the documentation.
3. Ensure the test suite passes.
4. Make sure your code lints using `pre-commit`.
5. If you haven't already, complete the Contributor License Agreement ("CLA").
6. Ensure your pull request follows the [conventional commits format](https://www.conventionalcommits.org/en/v1.0.0/).
7. Ensure your pull request follows the [coding style](#coding-style).
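
For example, a PR title in the conventional commits format might look like the following (the type, scope, and description here are purely illustrative):

```
feat(inference): add batch completion support to the remote provider
```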

Please keep pull requests (PRs) small and focused. If you have a large set of changes, consider splitting them into logically grouped, smaller PRs to facilitate review and testing.

```{tip}
As a general guideline:
- Experienced contributors should try to keep no more than 5 open PRs at a time.
- New contributors are encouraged to have only one open PR at a time until they’re familiar with the codebase and process.
```

## Repository guidelines

### Coding Style

* Comments should provide meaningful insights into the code. Avoid filler comments that simply
  describe the next step, as they create unnecessary clutter; the same goes for docstrings.
* Prefer comments that clarify surprising behavior and/or relationships between parts of the code
  rather than explain what the next line of code does.
* When catching exceptions, prefer a specific exception type rather than a broad catch-all like
  `Exception`.
* Error messages should be prefixed with "Failed to ..."
* Use 4 spaces for indentation rather than tabs.
* When using `# noqa` to suppress a style or linter warning, include a comment explaining the
  justification for bypassing the check.
* When using `# type: ignore` to suppress a mypy warning, include a comment explaining the
  justification for bypassing the check.
* Don't use unicode characters in the codebase; ASCII-only is preferred for compatibility and
  readability reasons.
* Provider configuration classes should use Pydantic `Field` definitions, each with a `description`
  that describes the configuration option. These descriptions will be used to generate the provider
  documentation.
* When possible, use keyword arguments (rather than positional arguments) when calling functions.
* Llama Stack utilizes [custom Exception classes](llama_stack/apis/common/errors.py) for certain Resources that should be used where applicable.
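
As a rough illustration of several of these guidelines together (the provider name, config fields, and helper function below are hypothetical, not taken from the codebase):

```python
from pydantic import BaseModel, Field


class ExampleProviderConfig(BaseModel):
    # Each field carries a description; provider documentation is generated from these.
    url: str = Field(description="Base URL of the example service.")
    timeout: int = Field(default=30, description="Request timeout in seconds.")


def load_provider_config(path: str) -> ExampleProviderConfig:
    try:
        with open(path) as f:
            raw = f.read()
    except OSError as e:  # catch a specific exception, not a broad `Exception`
        raise RuntimeError(f"Failed to load provider config from {path}") from e
    return ExampleProviderConfig.model_validate_json(raw)


if __name__ == "__main__":
    # Prefer keyword arguments at call sites.
    config = ExampleProviderConfig(url="http://localhost:8321", timeout=10)
    print(config)
```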

### License
By contributing to Llama, you agree that your contributions will be licensed
under the LICENSE file in the root directory of this source tree.

## Common Tasks

Some tips for common tasks you will work on while contributing to Llama Stack:

### Using `llama stack build`

Building a stack image will use the production version of the `llama-stack` and `llama-stack-client` packages. If you are developing with a llama-stack repository checked out and need your code to be reflected in the stack image, set `LLAMA_STACK_DIR` and `LLAMA_STACK_CLIENT_DIR` to the appropriate checked out directories when running any of the `llama` CLI commands.

Example:
```bash
cd work/
git clone https://github.com/meta-llama/llama-stack.git
git clone https://github.com/meta-llama/llama-stack-client-python.git
cd llama-stack
LLAMA_STACK_DIR=$(pwd) LLAMA_STACK_CLIENT_DIR=../llama-stack-client-python llama stack build --distro <...>
```

### Updating distribution configurations

If you have made changes to a provider's configuration in any form (introducing a new config key, or
changing models, etc.), you should run `./scripts/distro_codegen.py` to re-generate various YAML
files as well as the documentation. You should not change `docs/source/.../distributions/` files
manually as they are auto-generated.
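
For example, after introducing a new config key you would re-run the script from the repository root and include the regenerated files in your change:

```bash
./scripts/distro_codegen.py
```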

### Updating the provider documentation

If you have made changes to a provider's configuration, you should run `./scripts/provider_codegen.py`
to re-generate the documentation. You should not change `docs/source/.../providers/` files manually
as they are auto-generated.

Note that the provider "description" field will be used to generate the provider documentation.

### Building the Documentation

If you are making changes to the documentation at [https://llamastack.github.io/latest/](https://llamastack.github.io/latest/), you can use the following command to build the documentation and preview your changes. You will need [Sphinx](https://www.sphinx-doc.org/en/master/) and the readthedocs theme.

```bash
# This rebuilds the documentation pages.
uv run --group docs make -C docs/ html

# This will start a local server (usually at http://127.0.0.1:8000) that automatically rebuilds and refreshes when you make changes to the documentation.
uv run --group docs sphinx-autobuild docs/source docs/build/html --write-all
```

### Update API Documentation

If you modify or add new API endpoints, update the API documentation accordingly. You can do this by running the following command:

```bash
uv run ./docs/openapi_generator/run_openapi_generator.sh
```

The generated API documentation will be available in `docs/_static/`. Make sure to review the changes before committing.

## Adding a New Provider

See:
- [Adding a New API Provider Page](new_api_provider.md) which describes how to add new API providers to the Stack.
- [Vector Database Page](new_vector_database.md) which describes how to add a new vector database to Llama Stack.
- [External Provider Page](../providers/external/index.md) which describes how to add external providers to the Stack.

```{toctree}
:maxdepth: 1
:hidden:

new_api_provider
new_vector_database
```

## Testing

```{include} ../../../tests/README.md
```

## Advanced Topics

For developers who need deeper understanding of the testing system internals:

```{toctree}
:maxdepth: 1

testing/record-replay
```

### Benchmarking

```{include} ../../../benchmarking/k8s-benchmark/README.md
```

docs/docs/contributing/new_api_provider.mdx (new file, +98 lines)
@@ -0,0 +1,98 @@
---
title: Adding a New API Provider
description: Guide for adding new API providers to Llama Stack
sidebar_label: New API Provider
sidebar_position: 2
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

This guide will walk you through the process of adding a new API provider to Llama Stack.

- Begin by reviewing the [core concepts](../concepts/index.md) of Llama Stack and choose the API your provider belongs to (Inference, Safety, VectorIO, etc.).
- Determine the provider type ({repopath}`Remote::llama_stack/providers/remote` or {repopath}`Inline::llama_stack/providers/inline`). Remote providers make requests to external services, while inline providers execute the implementation locally.
- Add your provider to the appropriate {repopath}`Registry::llama_stack/providers/registry/`. Specify the necessary pip dependencies.
- Update any distribution {repopath}`Templates::llama_stack/distributions/` `build.yaml` and `run.yaml` files if they should include your provider by default. Run {repopath}`./scripts/distro_codegen.py` if necessary. Note that `distro_codegen.py` will fail if the new provider causes any distribution template to attempt to import provider-specific dependencies. This usually means the distribution's `get_distribution_template()` code path should only import any necessary Config or model alias definitions from each provider and not the provider's actual implementation.

Here are some example PRs to help you get started:
- [Grok Inference Implementation](https://github.com/meta-llama/llama-stack/pull/609)
- [Nvidia Inference Implementation](https://github.com/meta-llama/llama-stack/pull/355)
- [Model context protocol Tool Runtime](https://github.com/meta-llama/llama-stack/pull/665)

## Guidelines for creating Internal or External Providers

|**Type** |Internal (In-tree) |External (out-of-tree)|
|---------|-------------------|---------------------|
|**Description** |A provider that is directly in the Llama Stack code|A provider that is outside of the Llama Stack core codebase but is still accessible and usable by Llama Stack.|
|**Benefits** |Ability to interact with the provider with minimal additional configurations or installations|Contributors do not have to add directly to the code to create providers accessible on Llama Stack. Keeps provider-specific code separate from the core Llama Stack code.|

## Inference Provider Patterns

When implementing Inference providers for OpenAI-compatible APIs, Llama Stack provides several mixin classes to simplify development and ensure consistent behavior across providers.

### OpenAIMixin

The `OpenAIMixin` class provides direct OpenAI API functionality for providers that work with OpenAI-compatible endpoints. It includes:

#### Direct API Methods

- **`openai_completion()`**: Legacy text completion API with full parameter support
- **`openai_chat_completion()`**: Chat completion API supporting streaming, tools, and function calling
- **`openai_embeddings()`**: Text embeddings generation with customizable encoding and dimensions

#### Model Management

- **`check_model_availability()`**: Queries the API endpoint to verify if a model exists and is accessible

#### Client Management

- **`client` property**: Automatically creates and configures AsyncOpenAI client instances using your provider's credentials

#### Required Implementation

To use `OpenAIMixin`, your provider must implement these abstract methods:

```python
@abstractmethod
def get_api_key(self) -> str:
    """Return the API key for authentication"""
    pass


@abstractmethod
def get_base_url(self) -> str:
    """Return the OpenAI-compatible API base URL"""
    pass
```
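
As a rough sketch, a remote adapter built on the mixin might implement these methods by reading from its config. The class name, config fields, and import path below are illustrative assumptions, not the exact ones used in the codebase:

```python
# Illustrative only; verify the actual import path and base classes in the repository.
from llama_stack.providers.utils.inference.openai_mixin import OpenAIMixin


class ExampleOpenAICompatAdapter(OpenAIMixin):
    """Hypothetical adapter for an OpenAI-compatible inference endpoint."""

    def __init__(self, config):
        self.config = config  # assumed to expose `api_key` and `url`

    def get_api_key(self) -> str:
        # Credentials typically come from the provider config or environment variables.
        return self.config.api_key

    def get_base_url(self) -> str:
        # The `client` property uses this to point AsyncOpenAI at your endpoint.
        return self.config.url
```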

## Testing the Provider

Before running tests, you must have required dependencies installed. This depends on the providers or distributions you are testing. For example, if you are testing the `together` distribution, you should install dependencies via `llama stack build --distro together`.

### 1. Integration Testing

Integration tests are located in {repopath}`tests/integration`. These tests use the python client-SDK APIs (from the `llama_stack_client` package) to test functionality. Since these tests use client APIs, they can be run either by pointing to an instance of the Llama Stack server or "inline" by using `LlamaStackAsLibraryClient`.

Consult {repopath}`tests/integration/README.md` for more details on how to run the tests.

Note that each provider's `sample_run_config()` method (in the configuration class for that provider)
typically references some environment variables for specifying API keys and the like. You can set these in the environment or pass these via the `--env` flag to the test command.

### 2. Unit Testing

Unit tests are located in {repopath}`tests/unit`. Provider-specific unit tests are located in {repopath}`tests/unit/providers`. These tests are all run automatically as part of the CI process.

Consult {repopath}`tests/unit/README.md` for more details on how to run the tests manually.
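
For instance, a local run of the provider unit tests will typically look something like this (a sketch; use the exact command documented in `tests/unit/README.md`):

```bash
# Run provider-specific unit tests with verbose output
uv run pytest tests/unit/providers -v
```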

### 3. Additional end-to-end testing

1. Start a Llama Stack server with your new provider
2. Verify compatibility with existing client scripts in the [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main) repository
3. Document which scripts are compatible with your provider

## Submitting Your PR

1. Ensure all tests pass
2. Include a comprehensive test plan in your PR summary
3. Document any known limitations or considerations

docs/docs/contributing/new_vector_database.mdx (new file, +83 lines)
@@ -0,0 +1,83 @@
---
title: Adding a New Vector Database
description: Guide for adding new vector database providers to Llama Stack
sidebar_label: New Vector Database
sidebar_position: 3
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

This guide will walk you through the process of adding a new vector database to Llama Stack.

> **_NOTE:_** Here's an example Pull Request of the [Milvus Vector Database Provider](https://github.com/meta-llama/llama-stack/pull/1467).

Vector Database providers are used to store and retrieve vector embeddings. Vector databases are not limited to vector
search but can support keyword and hybrid search. Additionally, vector databases can also support operations like
filtering, sorting, and aggregating vectors.

## Steps to Add a New Vector Database Provider

1. **Choose the Database Type**: Determine if your vector database is a remote service, inline, or both.
   - Remote databases make requests to external services, while inline databases execute locally. Some providers support both.
2. **Implement the Provider**: Create a new provider class that inherits from `VectorDatabaseProvider` and implements the required methods (a skeleton sketch follows this list).
   - Implement methods for vector storage, retrieval, search, and any additional features your database supports.
   - You will need to implement the following methods for `YourVectorIndex`:
     - `YourVectorIndex.create()`
     - `YourVectorIndex.initialize()`
     - `YourVectorIndex.add_chunks()`
     - `YourVectorIndex.delete_chunk()`
     - `YourVectorIndex.query_vector()`
     - `YourVectorIndex.query_keyword()`
     - `YourVectorIndex.query_hybrid()`
   - You will need to implement the following methods for `YourVectorIOAdapter`:
     - `YourVectorIOAdapter.initialize()`
     - `YourVectorIOAdapter.shutdown()`
     - `YourVectorIOAdapter.list_vector_dbs()`
     - `YourVectorIOAdapter.register_vector_db()`
     - `YourVectorIOAdapter.unregister_vector_db()`
     - `YourVectorIOAdapter.insert_chunks()`
     - `YourVectorIOAdapter.query_chunks()`
     - `YourVectorIOAdapter.delete_chunks()`
3. **Add to Registry**: Register your provider in the appropriate registry file.
   - Update {repopath}`llama_stack/providers/registry/vector_io.py` to include your new provider.
   ```python
   from llama_stack.providers.registry.specs import InlineProviderSpec
   from llama_stack.providers.registry.api import Api

   InlineProviderSpec(
       api=Api.vector_io,
       provider_type="inline::milvus",
       pip_packages=["pymilvus>=2.4.10"],
       module="llama_stack.providers.inline.vector_io.milvus",
       config_class="llama_stack.providers.inline.vector_io.milvus.MilvusVectorIOConfig",
       api_dependencies=[Api.inference],
       optional_api_dependencies=[Api.files],
       description="",
   ),
   ```
4. **Add Tests**: Create unit tests and integration tests for your provider in the `tests/` directory.
   - Unit Tests
     - By following the structure of the class methods, you will be able to easily run unit and integration tests for your database.
       1. You have to configure the tests for your provider in `/tests/unit/providers/vector_io/conftest.py`.
       2. Update the `vector_provider` fixture to include your provider if it is an inline provider.
       3. Create a `your_vectorprovider_index` fixture that initializes your vector index.
       4. Create a `your_vectorprovider_adapter` fixture that initializes your vector adapter.
       5. Add your provider to the `vector_io_providers` fixture dictionary.
     - Please follow the naming convention of `your_vectorprovider_index` and `your_vectorprovider_adapter` as the tests require this to execute properly.
   - Integration Tests
     - Integration tests are located in {repopath}`tests/integration`. These tests use the python client-SDK APIs (from the `llama_stack_client` package) to test functionality.
     - The two sets of integration tests are:
       - `tests/integration/vector_io/test_vector_io.py`: This file tests registration, insertion, and retrieval.
       - `tests/integration/vector_io/test_openai_vector_stores.py`: These tests are for OpenAI-compatible vector stores and test the OpenAI API compatibility.
     - You will need to update `skip_if_provider_doesnt_support_openai_vector_stores` to include your provider, as well as `skip_if_provider_doesnt_support_openai_vector_stores_search` to test the appropriate search functionality.
   - Running the tests in the GitHub CI
     - You will need to update the `.github/workflows/integration-vector-io-tests.yml` file to include your provider.
     - If your provider is a remote provider, you will also have to add a container to spin up and run it in the action.
   - Updating the pyproject.toml
     - If you are adding tests for the `inline` provider you will have to update the `unit` group.
       - `uv add new_pip_package --group unit`
     - If you are adding tests for the `remote` provider you will have to update the `test` group, which is used in the GitHub CI for integration tests.
       - `uv add new_pip_package --group test`
5. **Update Documentation**: Please update the documentation for end users
   - Generate the provider documentation by running {repopath}`./scripts/provider_codegen.py`.
   - Update the autogenerated content in the `registry/vector_io.py` file with information about your provider. Please see other providers for examples.
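
A minimal skeleton of the two classes referenced in step 2 might look like the sketch below. The method signatures are illustrative only; match the actual interfaces used by the existing vector_io providers in the repository.

```python
class YourVectorIndex:
    """Illustrative skeleton; a real implementation wraps your database's index."""

    @classmethod
    async def create(cls, *args, **kwargs) -> "YourVectorIndex": ...

    async def initialize(self) -> None: ...
    async def add_chunks(self, chunks, embeddings) -> None: ...
    async def delete_chunk(self, chunk_id: str) -> None: ...
    async def query_vector(self, embedding, k: int, score_threshold: float): ...
    async def query_keyword(self, query_string: str, k: int, score_threshold: float): ...
    async def query_hybrid(self, embedding, query_string: str, k: int, score_threshold: float): ...


class YourVectorIOAdapter:
    """Illustrative skeleton; exposes the VectorIO API backed by YourVectorIndex."""

    async def initialize(self) -> None: ...
    async def shutdown(self) -> None: ...
    async def list_vector_dbs(self): ...
    async def register_vector_db(self, vector_db) -> None: ...
    async def unregister_vector_db(self, vector_db_id: str) -> None: ...
    async def insert_chunks(self, vector_db_id: str, chunks, ttl_seconds=None) -> None: ...
    async def query_chunks(self, vector_db_id: str, query, params=None): ...
    async def delete_chunks(self, vector_db_id: str, chunk_ids) -> None: ...
```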

docs/docs/contributing/testing/record-replay.mdx (new file, +241 lines)
@@ -0,0 +1,241 @@
---
title: Record-Replay Testing System
description: Understanding how Llama Stack captures and replays API interactions for testing
sidebar_label: Record-Replay System
sidebar_position: 4
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Record-Replay System

Understanding how Llama Stack captures and replays API interactions for testing.

## Overview

The record-replay system solves a fundamental challenge in AI testing: how do you test against expensive, non-deterministic APIs without breaking the bank or dealing with flaky tests?

The solution: intercept API calls, store real responses, and replay them later. This gives you real API behavior without the cost or variability.

## How It Works

### Request Hashing

Every API request gets converted to a deterministic hash for lookup:

```python
import hashlib
import json
from urllib.parse import urlparse


def normalize_request(method: str, url: str, headers: dict, body: dict) -> str:
    normalized = {
        "method": method.upper(),
        "endpoint": urlparse(url).path,  # Just the path, not full URL
        "body": body,  # Request parameters
    }
    return hashlib.sha256(json.dumps(normalized, sort_keys=True).encode()).hexdigest()
```

**Key insight:** The hashing is intentionally precise. Different whitespace, float precision, or parameter order produces different hashes. This prevents subtle bugs from false cache hits.

```python
# These produce DIFFERENT hashes:
{"content": "Hello world"}
{"content": "Hello world\n"}
{"temperature": 0.7}
{"temperature": 0.7000001}
```

### Client Interception

The system patches OpenAI and Ollama client methods to intercept calls before they leave your application. This happens transparently - your test code doesn't change.

### Storage Architecture

Recordings are stored as JSON files in the recording directory. They are looked up by their request hash.

```
recordings/
└── responses/
    ├── abc123def456.json   # Individual response files
    └── def789ghi012.json
```

**JSON files** store complete request/response pairs in human-readable format for debugging.
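
For orientation, a recording file pairs the normalized request with the captured response body. The field names below are inferred from the streaming example and debugging commands elsewhere on this page, so treat this as a sketch rather than the exact schema:

```json
{
  "request": {
    "method": "POST",
    "endpoint": "/v1/chat/completions",
    "body": {"model": "llama3.2:3b", "messages": [{"role": "user", "content": "Hello"}]}
  },
  "response": {
    "body": {"choices": [{"message": {"role": "assistant", "content": "Hi there!"}}]},
    "is_streaming": false
  }
}
```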

## Recording Modes

### LIVE Mode

Direct API calls with no recording or replay:

```python
with inference_recording(mode=InferenceMode.LIVE):
    response = await client.chat.completions.create(...)
```

Use for initial development and debugging against real APIs.

### RECORD Mode

Captures API interactions while passing through real responses:

```python
with inference_recording(mode=InferenceMode.RECORD, storage_dir="./recordings"):
    response = await client.chat.completions.create(...)
    # Real API call made, response captured AND returned
```

The recording process:
1. Request intercepted and hashed
2. Real API call executed
3. Response captured and serialized
4. Recording stored to disk
5. Original response returned to caller

### REPLAY Mode

Returns stored responses instead of making API calls:

```python
with inference_recording(mode=InferenceMode.REPLAY, storage_dir="./recordings"):
    response = await client.chat.completions.create(...)
    # No API call made, cached response returned instantly
```

The replay process:
1. Request intercepted and hashed
2. Hash looked up in SQLite index
3. Response loaded from JSON file
4. Response deserialized and returned
5. Error if no recording found

## Streaming Support

Streaming APIs present a unique challenge: how do you capture an async generator?

### The Problem

```python
# How do you record this?
async for chunk in client.chat.completions.create(stream=True):
    process(chunk)
```

### The Solution

The system captures all chunks immediately before yielding any:

```python
async def handle_streaming_record(response):
    # Capture complete stream first
    chunks = []
    async for chunk in response:
        chunks.append(chunk)

    # Store complete recording
    storage.store_recording(
        request_hash, request_data, {"body": chunks, "is_streaming": True}
    )

    # Return generator that replays captured chunks
    async def replay_stream():
        for chunk in chunks:
            yield chunk

    return replay_stream()
```

This ensures:
- **Complete capture** - The entire stream is saved atomically
- **Interface preservation** - The returned object behaves like the original API
- **Deterministic replay** - Same chunks in the same order every time

## Serialization

API responses contain complex Pydantic objects that need careful serialization:

```python
def _serialize_response(response):
    if hasattr(response, "model_dump"):
        # Preserve type information for proper deserialization
        return {
            "__type__": f"{response.__class__.__module__}.{response.__class__.__qualname__}",
            "__data__": response.model_dump(mode="json"),
        }
    return response
```

This preserves type safety - when replayed, you get the same Pydantic objects with all their validation and methods.
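
The corresponding deserialization step is not shown here; conceptually it reverses the transformation above by importing the recorded class and re-validating the stored data (a sketch, not the actual implementation):

```python
import importlib


def _deserialize_response(data):
    # Values without a type tag pass through unchanged.
    if not isinstance(data, dict) or "__type__" not in data:
        return data

    module_path, _, class_name = data["__type__"].rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    # Rebuild the original Pydantic object, restoring its validation and methods.
    return cls.model_validate(data["__data__"])
```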

## Environment Integration

### Environment Variables

Control recording behavior globally:

```bash
export LLAMA_STACK_TEST_INFERENCE_MODE=replay               # this is the default
export LLAMA_STACK_TEST_RECORDING_DIR=/path/to/recordings   # default is tests/integration/recordings
pytest tests/integration/
```

### Pytest Integration

The system integrates automatically based on environment variables, requiring no changes to test code.

## Debugging Recordings

### Inspecting Storage

```bash
# See what's recorded
sqlite3 recordings/index.sqlite "SELECT endpoint, model, timestamp FROM recordings LIMIT 10;"

# View specific response
cat recordings/responses/abc123def456.json | jq '.response.body'

# Find recordings by endpoint
sqlite3 recordings/index.sqlite "SELECT * FROM recordings WHERE endpoint='/v1/chat/completions';"
```

### Common Issues

**Hash mismatches:** Request parameters changed slightly between record and replay
```bash
# Compare request details
cat recordings/responses/abc123.json | jq '.request'
```

**Serialization errors:** Response types changed between versions
```bash
# Re-record with updated types
rm recordings/responses/failing_hash.json
LLAMA_STACK_TEST_INFERENCE_MODE=record pytest test_failing.py
```

**Missing recordings:** New test or changed parameters
```bash
# Record the missing interaction
LLAMA_STACK_TEST_INFERENCE_MODE=record pytest test_new.py
```

## Design Decisions

### Why Not Mocks?

Traditional mocking breaks down with AI APIs because:
- Response structures are complex and evolve frequently
- Streaming behavior is hard to mock correctly
- Edge cases in real APIs get missed
- Mocks become brittle maintenance burdens

### Why Precise Hashing?

Loose hashing (normalizing whitespace, rounding floats) seems convenient but hides bugs. If a test changes slightly, you want to know about it rather than accidentally getting the wrong cached response.

### Why JSON + SQLite?

- **JSON** - Human readable, diff-friendly, easy to inspect and modify
- **SQLite** - Fast indexed lookups without loading response bodies
- **Hybrid** - Best of both worlds for different use cases

This system provides reliable, fast testing against real AI APIs while maintaining the ability to debug issues when they arise.