This commit is contained in:
Anastas Stoyanovsky 2025-12-02 09:58:57 +01:00 committed by GitHub
commit e484c29d45
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
7 changed files with 31 additions and 30 deletions

View file

@ -2,7 +2,7 @@
description: | description: |
Agents Agents
APIs for creating and interacting with agentic systems. APIs for creating and interacting with agentic systems.
sidebar_label: Agents sidebar_label: Agents
title: Agents title: Agents
--- ---
@ -13,6 +13,6 @@ title: Agents
Agents Agents
APIs for creating and interacting with agentic systems. APIs for creating and interacting with agentic systems.
This section contains documentation for all available providers for the **agents** API. This section contains documentation for all available providers for the **agents** API.

View file

@ -1,15 +1,15 @@
--- ---
description: | description: |
The Batches API enables efficient processing of multiple requests in a single operation, The Batches API enables efficient processing of multiple requests in a single operation,
particularly useful for processing large datasets, batch evaluation workflows, and particularly useful for processing large datasets, batch evaluation workflows, and
cost-effective inference at scale. cost-effective inference at scale.
The API is designed to allow use of openai client libraries for seamless integration. The API is designed to allow use of openai client libraries for seamless integration.
This API provides the following extensions: This API provides the following extensions:
- idempotent batch creation - idempotent batch creation
Note: This API is currently under active development and may undergo changes. Note: This API is currently under active development and may undergo changes.
sidebar_label: Batches sidebar_label: Batches
title: Batches title: Batches
--- ---
@ -19,14 +19,14 @@ title: Batches
## Overview ## Overview
The Batches API enables efficient processing of multiple requests in a single operation, The Batches API enables efficient processing of multiple requests in a single operation,
particularly useful for processing large datasets, batch evaluation workflows, and particularly useful for processing large datasets, batch evaluation workflows, and
cost-effective inference at scale. cost-effective inference at scale.
The API is designed to allow use of openai client libraries for seamless integration. The API is designed to allow use of openai client libraries for seamless integration.
This API provides the following extensions: This API provides the following extensions:
- idempotent batch creation - idempotent batch creation
Note: This API is currently under active development and may undergo changes. Note: This API is currently under active development and may undergo changes.
This section contains documentation for all available providers for the **batches** API. This section contains documentation for all available providers for the **batches** API.

View file

@ -2,7 +2,7 @@
description: | description: |
Evaluations Evaluations
Llama Stack Evaluation API for running evaluations on model and agent candidates. Llama Stack Evaluation API for running evaluations on model and agent candidates.
sidebar_label: Eval sidebar_label: Eval
title: Eval title: Eval
--- ---
@ -13,6 +13,6 @@ title: Eval
Evaluations Evaluations
Llama Stack Evaluation API for running evaluations on model and agent candidates. Llama Stack Evaluation API for running evaluations on model and agent candidates.
This section contains documentation for all available providers for the **eval** API. This section contains documentation for all available providers for the **eval** API.

View file

@ -2,7 +2,7 @@
description: | description: |
Files Files
This API is used to upload documents that can be used with other Llama Stack APIs. This API is used to upload documents that can be used with other Llama Stack APIs.
sidebar_label: Files sidebar_label: Files
title: Files title: Files
--- ---
@ -13,6 +13,6 @@ title: Files
Files Files
This API is used to upload documents that can be used with other Llama Stack APIs. This API is used to upload documents that can be used with other Llama Stack APIs.
This section contains documentation for all available providers for the **files** API. This section contains documentation for all available providers for the **files** API.

View file

@ -2,12 +2,12 @@
description: | description: |
Inference Inference
Llama Stack Inference API for generating completions, chat completions, and embeddings. Llama Stack Inference API for generating completions, chat completions, and embeddings.
This API provides the raw interface to the underlying models. Three kinds of models are supported: This API provides the raw interface to the underlying models. Three kinds of models are supported:
- LLM models: these models generate "raw" and "chat" (conversational) completions. - LLM models: these models generate "raw" and "chat" (conversational) completions.
- Embedding models: these models generate embeddings to be used for semantic search. - Embedding models: these models generate embeddings to be used for semantic search.
- Rerank models: these models reorder the documents based on their relevance to a query. - Rerank models: these models reorder the documents based on their relevance to a query.
sidebar_label: Inference sidebar_label: Inference
title: Inference title: Inference
--- ---
@ -18,11 +18,11 @@ title: Inference
Inference Inference
Llama Stack Inference API for generating completions, chat completions, and embeddings. Llama Stack Inference API for generating completions, chat completions, and embeddings.
This API provides the raw interface to the underlying models. Three kinds of models are supported: This API provides the raw interface to the underlying models. Three kinds of models are supported:
- LLM models: these models generate "raw" and "chat" (conversational) completions. - LLM models: these models generate "raw" and "chat" (conversational) completions.
- Embedding models: these models generate embeddings to be used for semantic search. - Embedding models: these models generate embeddings to be used for semantic search.
- Rerank models: these models reorder the documents based on their relevance to a query. - Rerank models: these models reorder the documents based on their relevance to a query.
This section contains documentation for all available providers for the **inference** API. This section contains documentation for all available providers for the **inference** API.

View file

@ -2,7 +2,7 @@
description: | description: |
Safety Safety
OpenAI-compatible Moderations API. OpenAI-compatible Moderations API.
sidebar_label: Safety sidebar_label: Safety
title: Safety title: Safety
--- ---
@ -13,6 +13,6 @@ title: Safety
Safety Safety
OpenAI-compatible Moderations API. OpenAI-compatible Moderations API.
This section contains documentation for all available providers for the **safety** API. This section contains documentation for all available providers for the **safety** API.

View file

@ -250,6 +250,7 @@ class StreamingResponseOrchestrator:
messages=messages, messages=messages,
# Pydantic models are dict-compatible but mypy treats them as distinct types # Pydantic models are dict-compatible but mypy treats them as distinct types
tools=self.ctx.chat_tools, # type: ignore[arg-type] tools=self.ctx.chat_tools, # type: ignore[arg-type]
parallel_tool_calls=self.parallel_tool_calls,
stream=True, stream=True,
temperature=self.ctx.temperature, temperature=self.ctx.temperature,
response_format=response_format, response_format=response_format,