Add rerank models to the dynamic model list; Fix integration tests

Author: Jiayi
Date: 2025-09-28 14:45:16 -07:00
parent 3538477070
commit 816b68fdc7
8 changed files with 247 additions and 25 deletions


@@ -18,14 +18,14 @@ title: Batches
## Overview
The Batches API enables efficient processing of multiple requests in a single operation,
particularly useful for processing large datasets, batch evaluation workflows, and
cost-effective inference at scale.

The API is designed to allow use of OpenAI client libraries for seamless integration.

This API provides the following extensions:
- idempotent batch creation

Note: This API is currently under active development and may undergo changes.
This section contains documentation for all available providers for the **batches** API.
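
Since the overview highlights OpenAI client compatibility and idempotent batch creation, a minimal sketch of that usage pattern follows. The base URL, the placeholder API key, and the `idempotency_key` field passed via `extra_body` are assumptions for illustration, not confirmed Llama Stack parameter names.

```python
# Sketch: creating a batch through an OpenAI-compatible Batches endpoint.
from openai import OpenAI

# Assumed local Llama Stack endpoint exposing the OpenAI-compatible API.
client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

# Upload a JSONL file containing the batched requests.
batch_file = client.files.create(
    file=open("requests.jsonl", "rb"),
    purpose="batch",
)

# Create the batch; extra_body carries the (assumed) idempotency extension,
# so retrying the same request should not create a second batch.
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    extra_body={"idempotency_key": "my-batch-001"},
)
print(batch.id, batch.status)
```

Because the request goes through the standard `openai` client, existing batch tooling can be pointed at a Llama Stack server by changing only the base URL.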


@@ -5,6 +5,7 @@ description: "Llama Stack Inference API for generating completions, chat complet
- LLM models: these models generate \"raw\" and \"chat\" (conversational) completions.
- Embedding models: these models generate embeddings to be used for semantic search.
- Rerank models: these models rerank the documents by relevance."
sidebar_label: Inference
title: Inference
---
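
Given that this commit adds rerank models to the dynamic model list, here is a short sketch of how a client might discover them. The `"rerank"` value for `model_type`, the base URL, and the printed attributes are assumptions for illustration rather than confirmed API details.

```python
# Sketch: listing registered models and filtering for rerank models.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# The dynamic model list now includes rerank models alongside LLM and
# embedding models; filter on the (assumed) "rerank" model_type value.
rerank_models = [
    m for m in client.models.list()
    if getattr(m, "model_type", None) == "rerank"
]

for m in rerank_models:
    print(m.identifier, m.provider_id)
```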