mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-12-03 01:48:05 +00:00
Merge 0c2b82b30a into ee107aadd6
This commit is contained in:
commit
e484c29d45
7 changed files with 31 additions and 30 deletions
|
|
@ -2,7 +2,7 @@
|
||||||
description: |
|
description: |
|
||||||
Agents
|
Agents
|
||||||
|
|
||||||
APIs for creating and interacting with agentic systems.
|
APIs for creating and interacting with agentic systems.
|
||||||
sidebar_label: Agents
|
sidebar_label: Agents
|
||||||
title: Agents
|
title: Agents
|
||||||
---
|
---
|
||||||
|
|
@ -13,6 +13,6 @@ title: Agents
|
||||||
|
|
||||||
Agents
|
Agents
|
||||||
|
|
||||||
APIs for creating and interacting with agentic systems.
|
APIs for creating and interacting with agentic systems.
|
||||||
|
|
||||||
This section contains documentation for all available providers for the **agents** API.
|
This section contains documentation for all available providers for the **agents** API.
|
||||||
|
|
|
||||||
|
|
@ -1,15 +1,15 @@
|
||||||
---
|
---
|
||||||
description: |
|
description: |
|
||||||
The Batches API enables efficient processing of multiple requests in a single operation,
|
The Batches API enables efficient processing of multiple requests in a single operation,
|
||||||
particularly useful for processing large datasets, batch evaluation workflows, and
|
particularly useful for processing large datasets, batch evaluation workflows, and
|
||||||
cost-effective inference at scale.
|
cost-effective inference at scale.
|
||||||
|
|
||||||
The API is designed to allow use of openai client libraries for seamless integration.
|
The API is designed to allow use of openai client libraries for seamless integration.
|
||||||
|
|
||||||
This API provides the following extensions:
|
This API provides the following extensions:
|
||||||
- idempotent batch creation
|
- idempotent batch creation
|
||||||
|
|
||||||
Note: This API is currently under active development and may undergo changes.
|
Note: This API is currently under active development and may undergo changes.
|
||||||
sidebar_label: Batches
|
sidebar_label: Batches
|
||||||
title: Batches
|
title: Batches
|
||||||
---
|
---
|
||||||
|
|
@ -19,14 +19,14 @@ title: Batches
|
||||||
## Overview
|
## Overview
|
||||||
|
|
||||||
The Batches API enables efficient processing of multiple requests in a single operation,
|
The Batches API enables efficient processing of multiple requests in a single operation,
|
||||||
particularly useful for processing large datasets, batch evaluation workflows, and
|
particularly useful for processing large datasets, batch evaluation workflows, and
|
||||||
cost-effective inference at scale.
|
cost-effective inference at scale.
|
||||||
|
|
||||||
The API is designed to allow use of openai client libraries for seamless integration.
|
The API is designed to allow use of openai client libraries for seamless integration.
|
||||||
|
|
||||||
This API provides the following extensions:
|
This API provides the following extensions:
|
||||||
- idempotent batch creation
|
- idempotent batch creation
|
||||||
|
|
||||||
Note: This API is currently under active development and may undergo changes.
|
Note: This API is currently under active development and may undergo changes.
|
||||||
|
|
||||||
This section contains documentation for all available providers for the **batches** API.
|
This section contains documentation for all available providers for the **batches** API.
|
||||||
|
|
|
||||||
|
|
@ -2,7 +2,7 @@
|
||||||
description: |
|
description: |
|
||||||
Evaluations
|
Evaluations
|
||||||
|
|
||||||
Llama Stack Evaluation API for running evaluations on model and agent candidates.
|
Llama Stack Evaluation API for running evaluations on model and agent candidates.
|
||||||
sidebar_label: Eval
|
sidebar_label: Eval
|
||||||
title: Eval
|
title: Eval
|
||||||
---
|
---
|
||||||
|
|
@ -13,6 +13,6 @@ title: Eval
|
||||||
|
|
||||||
Evaluations
|
Evaluations
|
||||||
|
|
||||||
Llama Stack Evaluation API for running evaluations on model and agent candidates.
|
Llama Stack Evaluation API for running evaluations on model and agent candidates.
|
||||||
|
|
||||||
This section contains documentation for all available providers for the **eval** API.
|
This section contains documentation for all available providers for the **eval** API.
|
||||||
|
|
|
||||||
|
|
@ -2,7 +2,7 @@
|
||||||
description: |
|
description: |
|
||||||
Files
|
Files
|
||||||
|
|
||||||
This API is used to upload documents that can be used with other Llama Stack APIs.
|
This API is used to upload documents that can be used with other Llama Stack APIs.
|
||||||
sidebar_label: Files
|
sidebar_label: Files
|
||||||
title: Files
|
title: Files
|
||||||
---
|
---
|
||||||
|
|
@ -13,6 +13,6 @@ title: Files
|
||||||
|
|
||||||
Files
|
Files
|
||||||
|
|
||||||
This API is used to upload documents that can be used with other Llama Stack APIs.
|
This API is used to upload documents that can be used with other Llama Stack APIs.
|
||||||
|
|
||||||
This section contains documentation for all available providers for the **files** API.
|
This section contains documentation for all available providers for the **files** API.
|
||||||
|
|
|
||||||
|
|
@ -2,12 +2,12 @@
|
||||||
description: |
|
description: |
|
||||||
Inference
|
Inference
|
||||||
|
|
||||||
Llama Stack Inference API for generating completions, chat completions, and embeddings.
|
Llama Stack Inference API for generating completions, chat completions, and embeddings.
|
||||||
|
|
||||||
This API provides the raw interface to the underlying models. Three kinds of models are supported:
|
This API provides the raw interface to the underlying models. Three kinds of models are supported:
|
||||||
- LLM models: these models generate "raw" and "chat" (conversational) completions.
|
- LLM models: these models generate "raw" and "chat" (conversational) completions.
|
||||||
- Embedding models: these models generate embeddings to be used for semantic search.
|
- Embedding models: these models generate embeddings to be used for semantic search.
|
||||||
- Rerank models: these models reorder the documents based on their relevance to a query.
|
- Rerank models: these models reorder the documents based on their relevance to a query.
|
||||||
sidebar_label: Inference
|
sidebar_label: Inference
|
||||||
title: Inference
|
title: Inference
|
||||||
---
|
---
|
||||||
|
|
@ -18,11 +18,11 @@ title: Inference
|
||||||
|
|
||||||
Inference
|
Inference
|
||||||
|
|
||||||
Llama Stack Inference API for generating completions, chat completions, and embeddings.
|
Llama Stack Inference API for generating completions, chat completions, and embeddings.
|
||||||
|
|
||||||
This API provides the raw interface to the underlying models. Three kinds of models are supported:
|
This API provides the raw interface to the underlying models. Three kinds of models are supported:
|
||||||
- LLM models: these models generate "raw" and "chat" (conversational) completions.
|
- LLM models: these models generate "raw" and "chat" (conversational) completions.
|
||||||
- Embedding models: these models generate embeddings to be used for semantic search.
|
- Embedding models: these models generate embeddings to be used for semantic search.
|
||||||
- Rerank models: these models reorder the documents based on their relevance to a query.
|
- Rerank models: these models reorder the documents based on their relevance to a query.
|
||||||
|
|
||||||
This section contains documentation for all available providers for the **inference** API.
|
This section contains documentation for all available providers for the **inference** API.
|
||||||
|
|
|
||||||
|
|
@ -2,7 +2,7 @@
|
||||||
description: |
|
description: |
|
||||||
Safety
|
Safety
|
||||||
|
|
||||||
OpenAI-compatible Moderations API.
|
OpenAI-compatible Moderations API.
|
||||||
sidebar_label: Safety
|
sidebar_label: Safety
|
||||||
title: Safety
|
title: Safety
|
||||||
---
|
---
|
||||||
|
|
@ -13,6 +13,6 @@ title: Safety
|
||||||
|
|
||||||
Safety
|
Safety
|
||||||
|
|
||||||
OpenAI-compatible Moderations API.
|
OpenAI-compatible Moderations API.
|
||||||
|
|
||||||
This section contains documentation for all available providers for the **safety** API.
|
This section contains documentation for all available providers for the **safety** API.
|
||||||
|
|
|
||||||
|
|
@ -250,6 +250,7 @@ class StreamingResponseOrchestrator:
|
||||||
messages=messages,
|
messages=messages,
|
||||||
# Pydantic models are dict-compatible but mypy treats them as distinct types
|
# Pydantic models are dict-compatible but mypy treats them as distinct types
|
||||||
tools=self.ctx.chat_tools, # type: ignore[arg-type]
|
tools=self.ctx.chat_tools, # type: ignore[arg-type]
|
||||||
|
parallel_tool_calls=self.parallel_tool_calls,
|
||||||
stream=True,
|
stream=True,
|
||||||
temperature=self.ctx.temperature,
|
temperature=self.ctx.temperature,
|
||||||
response_format=response_format,
|
response_format=response_format,
|
||||||
|
|
|
||||||
Loading…
Add table
Add a link
Reference in a new issue