Jiayi Ni 2025-10-03 12:27:16 -07:00 committed by GitHub
commit 1e04f105f2
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
20 changed files with 840 additions and 34 deletions

@@ -1,9 +1,10 @@
 ---
 description: "Llama Stack Inference API for generating completions, chat completions, and embeddings.
-This API provides the raw interface to the underlying models. Two kinds of models are supported:
+This API provides the raw interface to the underlying models. Three kinds of models are supported:
 - LLM models: these models generate \"raw\" and \"chat\" (conversational) completions.
-- Embedding models: these models generate embeddings to be used for semantic search."
+- Embedding models: these models generate embeddings to be used for semantic search.
+- Rerank models: these models reorder the documents based on their relevance to a query."
 sidebar_label: Inference
 title: Inference
 ---
@@ -14,8 +15,9 @@ title: Inference
 Llama Stack Inference API for generating completions, chat completions, and embeddings.
-This API provides the raw interface to the underlying models. Two kinds of models are supported:
+This API provides the raw interface to the underlying models. Three kinds of models are supported:
 - LLM models: these models generate "raw" and "chat" (conversational) completions.
 - Embedding models: these models generate embeddings to be used for semantic search.
+- Rerank models: these models reorder the documents based on their relevance to a query.
 This section contains documentation for all available providers for the **inference** API.
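The new bullet describes rerank models as reordering documents by their relevance to a query. As a rough illustration of that contract only (this is not Llama Stack code), the hypothetical `toy_rerank` function below scores each document against the query and returns document indices sorted by descending score. Jaccard token overlap is used as a stand-in for the relevance score a real rerank model would compute.

```python
def toy_rerank(query: str, documents: list[str]) -> list[tuple[int, float]]:
    """Return (original_index, score) pairs, most relevant first.

    Hypothetical illustration: Jaccard token overlap stands in for the
    relevance score a real rerank model would produce.
    """
    query_tokens = set(query.lower().split())
    scored = []
    for index, doc in enumerate(documents):
        doc_tokens = set(doc.lower().split())
        union = query_tokens | doc_tokens
        # Fraction of shared tokens; 0.0 when both sides are empty.
        score = len(query_tokens & doc_tokens) / len(union) if union else 0.0
        scored.append((index, score))
    # sorted() is stable even with reverse=True, so ties keep input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = ["the cat sat", "rerank models reorder documents", "llama stack"]
print(toy_rerank("rerank documents", docs))  # [(1, 0.5), (0, 0.0), (2, 0.0)]
```

Returning the original indices rather than the reordered strings keeps the caller free to map scores back onto richer document objects.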

@@ -13335,7 +13335,7 @@
 },
 {
 "name": "Inference",
-"description": "This API provides the raw interface to the underlying models. Two kinds of models are supported:\n- LLM models: these models generate \"raw\" and \"chat\" (conversational) completions.\n- Embedding models: these models generate embeddings to be used for semantic search.",
+"description": "This API provides the raw interface to the underlying models. Three kinds of models are supported:\n- LLM models: these models generate \"raw\" and \"chat\" (conversational) completions.\n- Embedding models: these models generate embeddings to be used for semantic search.\n- Rerank models: these models reorder the documents based on their relevance to a query.",
 "x-displayName": "Llama Stack Inference API for generating completions, chat completions, and embeddings."
 },
 {

@@ -9990,13 +9990,16 @@ tags:
 description: ''
 - name: Inference
 description: >-
-This API provides the raw interface to the underlying models. Two kinds of models
-are supported:
+This API provides the raw interface to the underlying models. Three kinds of
+models are supported:
 - LLM models: these models generate "raw" and "chat" (conversational) completions.
 - Embedding models: these models generate embeddings to be used for semantic
 search.
+- Rerank models: these models reorder the documents based on their relevance
+to a query.
 x-displayName: >-
 Llama Stack Inference API for generating completions, chat completions, and
 embeddings.

@@ -4992,7 +4992,7 @@
 "properties": {
 "model": {
 "type": "string",
-"description": "The identifier of the reranking model to use."
+"description": "The identifier of the reranking model to use. The model must be a reranking model registered with Llama Stack and available via the /models endpoint."
 },
 "query": {
 "oneOf": [
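The updated `model` description requires a reranking model that is registered with Llama Stack and listed by the /models endpoint. A client could check this before sending a rerank request, as in the sketch below. It assumes (not confirmed by this diff) that the /models listing has already been fetched and parsed into a list of dicts carrying `identifier` and `model_type` keys, with `model_type` using the `ModelType` values from this commit; `require_rerank_model` and the model identifiers are hypothetical.

```python
from typing import Any


def require_rerank_model(model_id: str, models: list[dict[str, Any]]) -> dict[str, Any]:
    """Return the /models entry for model_id, raising ValueError if unusable.

    Assumption: each entry carries "identifier" and "model_type" keys.
    """
    for entry in models:
        if entry.get("identifier") != model_id:
            continue
        model_type = entry.get("model_type")
        if model_type != "rerank":
            # Registered, but not a reranking model.
            raise ValueError(f"{model_id!r} has model_type {model_type!r}, expected 'rerank'")
        return entry
    # Not listed by /models, so not registered with Llama Stack.
    raise ValueError(f"{model_id!r} is not listed by /models")


# Hypothetical /models listing, already parsed from JSON.
listing = [
    {"identifier": "example-llm", "model_type": "llm"},
    {"identifier": "example-reranker", "model_type": "rerank"},
]
print(require_rerank_model("example-reranker", listing)["identifier"])
```

Distinguishing "not registered" from "registered with the wrong type" mirrors the two conditions in the new description and gives a clearer error than a failed server call.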

@@ -3657,7 +3657,8 @@ components:
 model:
 type: string
 description: >-
-The identifier of the reranking model to use.
+The identifier of the reranking model to use. The model must be a reranking
+model registered with Llama Stack and available via the /models endpoint.
 query:
 oneOf:
 - type: string

@@ -6829,7 +6829,8 @@
 "type": "string",
 "enum": [
 "llm",
-"embedding"
+"embedding",
+"rerank"
 ],
 "title": "ModelType",
 "description": "Enumeration of supported model types in Llama Stack."
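With this change the `ModelType` enum accepts `rerank` alongside `llm` and `embedding`. A client that validates `model_type` strings against its own copy of the enum needs the same addition, or it will reject rerank entries. The Python mirror below is a hypothetical sketch for illustration, not Llama Stack's own definition; `parse_model_type` is likewise a hypothetical helper.

```python
from enum import Enum
from typing import Optional


class ModelType(str, Enum):
    """Hypothetical client-side mirror of the spec's ModelType enum after this commit."""

    llm = "llm"
    embedding = "embedding"
    rerank = "rerank"  # value added by this commit


def parse_model_type(value: str) -> Optional[ModelType]:
    """Map a raw model_type string to ModelType, or None if it is unknown."""
    try:
        return ModelType(value)
    except ValueError:
        return None


print([member.value for member in ModelType])
print(parse_model_type("rerank") is ModelType.rerank, parse_model_type("vision"))
```

Mixing in `str` lets members compare equal to the raw JSON strings, so existing code that checks `model_type == "rerank"` keeps working.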
@@ -12883,7 +12884,7 @@
 },
 {
 "name": "Inference",
-"description": "This API provides the raw interface to the underlying models. Two kinds of models are supported:\n- LLM models: these models generate \"raw\" and \"chat\" (conversational) completions.\n- Embedding models: these models generate embeddings to be used for semantic search.",
+"description": "This API provides the raw interface to the underlying models. Three kinds of models are supported:\n- LLM models: these models generate \"raw\" and \"chat\" (conversational) completions.\n- Embedding models: these models generate embeddings to be used for semantic search.\n- Rerank models: these models reorder the documents based on their relevance to a query.",
 "x-displayName": "Llama Stack Inference API for generating completions, chat completions, and embeddings."
 },
 {

@@ -5158,6 +5158,7 @@ components:
 enum:
 - llm
 - embedding
+- rerank
 title: ModelType
 description: >-
 Enumeration of supported model types in Llama Stack.
@@ -9728,13 +9729,16 @@ tags:
 description: ''
 - name: Inference
 description: >-
-This API provides the raw interface to the underlying models. Two kinds of models
-are supported:
+This API provides the raw interface to the underlying models. Three kinds of
+models are supported:
 - LLM models: these models generate "raw" and "chat" (conversational) completions.
 - Embedding models: these models generate embeddings to be used for semantic
 search.
+- Rerank models: these models reorder the documents based on their relevance
+to a query.
 x-displayName: >-
 Llama Stack Inference API for generating completions, chat completions, and
 embeddings.

@@ -8838,7 +8838,8 @@
 "type": "string",
 "enum": [
 "llm",
-"embedding"
+"embedding",
+"rerank"
 ],
 "title": "ModelType",
 "description": "Enumeration of supported model types in Llama Stack."
@@ -17033,7 +17034,7 @@
 "properties": {
 "model": {
 "type": "string",
-"description": "The identifier of the reranking model to use."
+"description": "The identifier of the reranking model to use. The model must be a reranking model registered with Llama Stack and available via the /models endpoint."
 },
 "query": {
 "oneOf": [
@@ -18456,7 +18457,7 @@
 },
 {
 "name": "Inference",
-"description": "This API provides the raw interface to the underlying models. Two kinds of models are supported:\n- LLM models: these models generate \"raw\" and \"chat\" (conversational) completions.\n- Embedding models: these models generate embeddings to be used for semantic search.",
+"description": "This API provides the raw interface to the underlying models. Three kinds of models are supported:\n- LLM models: these models generate \"raw\" and \"chat\" (conversational) completions.\n- Embedding models: these models generate embeddings to be used for semantic search.\n- Rerank models: these models reorder the documents based on their relevance to a query.",
 "x-displayName": "Llama Stack Inference API for generating completions, chat completions, and embeddings."
 },
 {

@@ -6603,6 +6603,7 @@ components:
 enum:
 - llm
 - embedding
+- rerank
 title: ModelType
 description: >-
 Enumeration of supported model types in Llama Stack.
@@ -12693,7 +12694,8 @@ components:
 model:
 type: string
 description: >-
-The identifier of the reranking model to use.
+The identifier of the reranking model to use. The model must be a reranking
+model registered with Llama Stack and available via the /models endpoint.
 query:
 oneOf:
 - type: string
@@ -13774,13 +13776,16 @@ tags:
 description: ''
 - name: Inference
 description: >-
-This API provides the raw interface to the underlying models. Two kinds of models
-are supported:
+This API provides the raw interface to the underlying models. Three kinds of
+models are supported:
 - LLM models: these models generate "raw" and "chat" (conversational) completions.
 - Embedding models: these models generate embeddings to be used for semantic
 search.
+- Rerank models: these models reorder the documents based on their relevance
+to a query.
 x-displayName: >-
 Llama Stack Inference API for generating completions, chat completions, and
 embeddings.