# Adding a New API Provider

This guide will walk you through the process of adding a new API provider to Llama Stack.

- Begin by reviewing the core concepts of Llama Stack and choose the API your provider belongs to (Inference, Safety, VectorIO, etc.).
- Determine the provider type ({repopath}`Remote::llama_stack/providers/remote` or {repopath}`Inline::llama_stack/providers/inline`). Remote providers make requests to external services, while inline providers execute their implementation locally.
- Add your provider to the appropriate {repopath}`Registry::llama_stack/providers/registry/` and specify the pip dependencies it needs; a registry entry is sketched after this list.
- Update any distribution {repopath}`Templates::llama_stack/templates/` `build.yaml` and `run.yaml` files if they should include your provider by default. Run {repopath}`./scripts/distro_codegen.py` if necessary. Note that `distro_codegen.py` will fail if the new provider causes any distribution template to attempt to import provider-specific dependencies. This usually means the distribution's `get_distribution_template()` code path should only import the necessary `Config` or model alias definitions from each provider, not the provider's actual implementation.
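For example, a remote inference provider's registry entry might look like the sketch below. This is only an illustration: `remote_provider_spec` and `AdapterSpec` follow the pattern of existing registry entries but may differ across versions, and `my_provider` is a hypothetical adapter name.

```python
# Sketch of a registry entry in llama_stack/providers/registry/inference.py;
# exact field names may differ in your version of the codebase.
from llama_stack.providers.datatypes import AdapterSpec, Api, remote_provider_spec

remote_provider_spec(
    api=Api.inference,
    adapter=AdapterSpec(
        adapter_type="my_provider",  # hypothetical adapter name
        pip_packages=["openai"],  # pip dependencies your provider needs
        module="llama_stack.providers.remote.inference.my_provider",
        config_class="llama_stack.providers.remote.inference.my_provider.MyProviderConfig",
    ),
)
```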

Here are some example PRs to help you get started:

## Inference Provider Patterns

When implementing Inference providers for OpenAI-compatible APIs, Llama Stack provides several mixin classes to simplify development and ensure consistent behavior across providers.

### OpenAIMixin

The `OpenAIMixin` class provides direct OpenAI API functionality for providers that work with OpenAI-compatible endpoints. It includes:

#### Direct API Methods

- `openai_completion()`: Legacy text completion API with full parameter support
- `openai_chat_completion()`: Chat completion API supporting streaming, tools, and function calling
- `openai_embeddings()`: Text embeddings generation with customizable encoding and dimensions

#### Model Management

- `check_model_availability()`: Queries the API endpoint to verify whether a model exists and is accessible

#### Client Management

- `client` property: Automatically creates and configures `AsyncOpenAI` client instances using your provider's credentials

#### Required Implementation

To use `OpenAIMixin`, your provider must implement these abstract methods:

```python
from abc import abstractmethod


@abstractmethod
def get_api_key(self) -> str:
    """Return the API key for authentication"""
    pass


@abstractmethod
def get_base_url(self) -> str:
    """Return the OpenAI-compatible API base URL"""
    pass
```
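A minimal adapter sketch follows. The import path for `OpenAIMixin` and the shape of the `config` object are assumptions; check where the mixin actually lives in your checkout and what your config class provides.

```python
# Sketch of an adapter built on OpenAIMixin; the module path below and the
# config fields are assumptions, not a definitive implementation.
from llama_stack.providers.utils.inference.openai_mixin import OpenAIMixin


class MyInferenceAdapter(OpenAIMixin):
    def __init__(self, config) -> None:
        self.config = config  # hypothetical config object holding credentials

    def get_api_key(self) -> str:
        return self.config.api_key  # e.g. read from your provider's config

    def get_base_url(self) -> str:
        return "https://api.example.com/v1"  # your OpenAI-compatible endpoint
```

With these two methods implemented, the mixin supplies the `client` property and the `openai_completion()`, `openai_chat_completion()`, and `openai_embeddings()` methods described above.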

## Testing the Provider

Before running tests, you must have the required dependencies installed. These depend on the providers or distributions you are testing. For example, if you are testing the `together` distribution, install its dependencies via `llama stack build --template together`.

### 1. Integration Testing

Integration tests are located in {repopath}`tests/integration`. These tests use the Python client-SDK APIs (from the `llama_stack_client` package) to test functionality. Since these tests use client APIs, they can be run either by pointing to an instance of the Llama Stack server or "inline" by using `LlamaStackAsLibraryClient`.
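An inline run might look like the following sketch, assuming `LlamaStackAsLibraryClient` is importable from the top-level `llama_stack` package and using `together` as an example distribution name:

```python
# Sketch: exercising a stack "inline", without a running server.
from llama_stack import LlamaStackAsLibraryClient

client = LlamaStackAsLibraryClient("together")  # distribution name is an example
client.initialize()

# The library client exposes the same APIs as the HTTP client.
print([m.identifier for m in client.models.list()])
```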

Consult {repopath}`tests/integration/README.md` for more details on how to run the tests.

Note that each provider's `sample_run_config()` method (in that provider's configuration class) typically references environment variables for specifying API keys and the like. You can set these in the environment or pass them via the `--env` flag to the test command.
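For instance, a config class's `sample_run_config()` might defer the API key to an environment variable, as in this sketch. `MyProviderConfig` and `MY_PROVIDER_API_KEY` are hypothetical names; the `${env.VAR}` substitution syntax follows the pattern used by existing provider configs.

```python
# Sketch of a provider config whose sample_run_config() reads the API key
# from an environment variable; names here are hypothetical.
from typing import Any

from pydantic import BaseModel


class MyProviderConfig(BaseModel):
    url: str = "https://api.example.com/v1"
    api_key: str | None = None

    @classmethod
    def sample_run_config(cls, **kwargs) -> dict[str, Any]:
        return {
            "url": "https://api.example.com/v1",
            "api_key": "${env.MY_PROVIDER_API_KEY}",  # resolved at stack startup
        }
```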

### 2. Unit Testing

Unit tests are located in {repopath}`tests/unit`. Provider-specific unit tests are located in {repopath}`tests/unit/providers`. These tests are all run automatically as part of the CI process.

Consult {repopath}`tests/unit/README.md` for more details on how to run the tests manually.
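A provider unit test can be as small as checking that your adapter reads credentials from its config correctly. This sketch tests the hypothetical classes from the earlier examples:

```python
# Sketch of a test under tests/unit/providers/; MyInferenceAdapter and
# MyProviderConfig are the hypothetical classes sketched above, imported
# from wherever your provider actually lives.
from llama_stack.providers.remote.inference.my_provider import (  # hypothetical
    MyInferenceAdapter,
    MyProviderConfig,
)


def test_adapter_reads_credentials_from_config():
    config = MyProviderConfig(api_key="test-key")
    adapter = MyInferenceAdapter(config)

    assert adapter.get_api_key() == "test-key"
    assert adapter.get_base_url() == "https://api.example.com/v1"
```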

### 3. Additional end-to-end testing

1. Start a Llama Stack server with your new provider
2. Verify compatibility with existing client scripts in the `llama-stack-apps` repository
3. Document which scripts are compatible with your provider

## Submitting Your PR

1. Ensure all tests pass
2. Include a comprehensive test plan in your PR summary
3. Document any known limitations or considerations