mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-12-17 06:52:36 +00:00
Merge branch 'main' into nvidia-e2e-notebook
This commit is contained in:
commit
bd64bc99ea
69 changed files with 7913 additions and 2495 deletions
1269
docs/_static/llama-stack-spec.html
vendored
File diff suppressed because it is too large
835
docs/_static/llama-stack-spec.yaml
vendored
|
|
@ -2263,6 +2263,43 @@ paths:
|
|||
schema:
|
||||
$ref: '#/components/schemas/LogEventRequest'
|
||||
required: true
|
||||
/v1/openai/v1/vector_stores/{vector_store_id}/files:
|
||||
post:
|
||||
responses:
|
||||
'200':
|
||||
description: >-
|
||||
A VectorStoreFileObject representing the attached file.
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
$ref: '#/components/schemas/VectorStoreFileObject'
|
||||
'400':
|
||||
$ref: '#/components/responses/BadRequest400'
|
||||
'429':
|
||||
$ref: >-
|
||||
#/components/responses/TooManyRequests429
|
||||
'500':
|
||||
$ref: >-
|
||||
#/components/responses/InternalServerError500
|
||||
default:
|
||||
$ref: '#/components/responses/DefaultError'
|
||||
tags:
|
||||
- VectorIO
|
||||
description: Attach a file to a vector store.
|
||||
parameters:
|
||||
- name: vector_store_id
|
||||
in: path
|
||||
description: >-
|
||||
The ID of the vector store to attach the file to.
|
||||
required: true
|
||||
schema:
|
||||
type: string
|
||||
requestBody:
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
$ref: '#/components/schemas/OpenaiAttachFileToVectorStoreRequest'
|
||||
required: true
|
||||
/v1/openai/v1/completions:
|
||||
post:
|
||||
responses:
|
||||
|
|
@ -2294,6 +2331,91 @@ paths:
|
|||
schema:
|
||||
$ref: '#/components/schemas/OpenaiCompletionRequest'
|
||||
required: true
|
||||
/v1/openai/v1/vector_stores:
|
||||
get:
|
||||
responses:
|
||||
'200':
|
||||
description: >-
|
||||
A VectorStoreListResponse containing the list of vector stores.
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
$ref: '#/components/schemas/VectorStoreListResponse'
|
||||
'400':
|
||||
$ref: '#/components/responses/BadRequest400'
|
||||
'429':
|
||||
$ref: >-
|
||||
#/components/responses/TooManyRequests429
|
||||
'500':
|
||||
$ref: >-
|
||||
#/components/responses/InternalServerError500
|
||||
default:
|
||||
$ref: '#/components/responses/DefaultError'
|
||||
tags:
|
||||
- VectorIO
|
||||
description: Returns a list of vector stores.
|
||||
parameters:
|
||||
- name: limit
|
||||
in: query
|
||||
description: >-
|
||||
A limit on the number of objects to be returned. Limit can range between
|
||||
1 and 100, and the default is 20.
|
||||
required: false
|
||||
schema:
|
||||
type: integer
|
||||
- name: order
|
||||
in: query
|
||||
description: >-
|
||||
Sort order by the `created_at` timestamp of the objects. `asc` for ascending
|
||||
order and `desc` for descending order.
|
||||
required: false
|
||||
schema:
|
||||
type: string
|
||||
- name: after
|
||||
in: query
|
||||
description: >-
|
||||
A cursor for use in pagination. `after` is an object ID that defines your
|
||||
place in the list.
|
||||
required: false
|
||||
schema:
|
||||
type: string
|
||||
- name: before
|
||||
in: query
|
||||
description: >-
|
||||
A cursor for use in pagination. `before` is an object ID that defines
|
||||
your place in the list.
|
||||
required: false
|
||||
schema:
|
||||
type: string
|
||||
post:
|
||||
responses:
|
||||
'200':
|
||||
description: >-
|
||||
A VectorStoreObject representing the created vector store.
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
$ref: '#/components/schemas/VectorStoreObject'
|
||||
'400':
|
||||
$ref: '#/components/responses/BadRequest400'
|
||||
'429':
|
||||
$ref: >-
|
||||
#/components/responses/TooManyRequests429
|
||||
'500':
|
||||
$ref: >-
|
||||
#/components/responses/InternalServerError500
|
||||
default:
|
||||
$ref: '#/components/responses/DefaultError'
|
||||
tags:
|
||||
- VectorIO
|
||||
description: Creates a vector store.
|
||||
parameters: []
|
||||
requestBody:
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
$ref: '#/components/schemas/OpenaiCreateVectorStoreRequest'
|
||||
required: true
|
||||
/v1/openai/v1/files/{file_id}:
|
||||
get:
|
||||
responses:
|
||||
|
|
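The `limit`, `order`, `after`, and `before` query parameters in the hunk above describe cursor-style pagination over vector stores. A minimal in-memory sketch of how such parameters are commonly interpreted follows. The `paginate` function and the sample records are hypothetical illustrations, not the server's implementation; only the parameter names, the 1 to 100 range, the default of 20, and the response field names come from the spec.

```python
from typing import Any, Dict, List, Optional


def paginate(
    items: List[Dict[str, Any]],
    limit: int = 20,
    order: str = "desc",
    after: Optional[str] = None,
    before: Optional[str] = None,
) -> Dict[str, Any]:
    """Return one page of items, shaped like a VectorStoreListResponse."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")

    # Sort by the created_at timestamp in the requested direction.
    ordered = sorted(items, key=lambda it: it["created_at"], reverse=(order == "desc"))
    ids = [it["id"] for it in ordered]

    # `after` and `before` are object IDs that bound the window (both exclusive).
    start = ids.index(after) + 1 if after is not None else 0
    end = ids.index(before) if before is not None else len(ordered)
    window = ordered[start:end]
    page = window[:limit]

    return {
        "object": "list",
        "data": page,
        "first_id": page[0]["id"] if page else None,
        "last_id": page[-1]["id"] if page else None,
        "has_more": len(window) > limit,
    }


# Hypothetical records: five stores created one second apart.
stores = [{"id": f"vs_{i}", "created_at": 1000 + i} for i in range(5)]
first = paginate(stores, limit=2, order="asc")
second = paginate(stores, limit=2, order="asc", after=first["last_id"])
```

A client would typically keep passing the previous page's `last_id` as `after` until `has_more` is false.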
@ -2356,6 +2478,100 @@ paths:
|
|||
required: true
|
||||
schema:
|
||||
type: string
|
||||
/v1/openai/v1/vector_stores/{vector_store_id}:
|
||||
get:
|
||||
responses:
|
||||
'200':
|
||||
description: >-
|
||||
A VectorStoreObject representing the vector store.
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
$ref: '#/components/schemas/VectorStoreObject'
|
||||
'400':
|
||||
$ref: '#/components/responses/BadRequest400'
|
||||
'429':
|
||||
$ref: >-
|
||||
#/components/responses/TooManyRequests429
|
||||
'500':
|
||||
$ref: >-
|
||||
#/components/responses/InternalServerError500
|
||||
default:
|
||||
$ref: '#/components/responses/DefaultError'
|
||||
tags:
|
||||
- VectorIO
|
||||
description: Retrieves a vector store.
|
||||
parameters:
|
||||
- name: vector_store_id
|
||||
in: path
|
||||
description: The ID of the vector store to retrieve.
|
||||
required: true
|
||||
schema:
|
||||
type: string
|
||||
post:
|
||||
responses:
|
||||
'200':
|
||||
description: >-
|
||||
A VectorStoreObject representing the updated vector store.
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
$ref: '#/components/schemas/VectorStoreObject'
|
||||
'400':
|
||||
$ref: '#/components/responses/BadRequest400'
|
||||
'429':
|
||||
$ref: >-
|
||||
#/components/responses/TooManyRequests429
|
||||
'500':
|
||||
$ref: >-
|
||||
#/components/responses/InternalServerError500
|
||||
default:
|
||||
$ref: '#/components/responses/DefaultError'
|
||||
tags:
|
||||
- VectorIO
|
||||
description: Updates a vector store.
|
||||
parameters:
|
||||
- name: vector_store_id
|
||||
in: path
|
||||
description: The ID of the vector store to update.
|
||||
required: true
|
||||
schema:
|
||||
type: string
|
||||
requestBody:
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
$ref: '#/components/schemas/OpenaiUpdateVectorStoreRequest'
|
||||
required: true
|
||||
delete:
|
||||
responses:
|
||||
'200':
|
||||
description: >-
|
||||
A VectorStoreDeleteResponse indicating the deletion status.
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
$ref: '#/components/schemas/VectorStoreDeleteResponse'
|
||||
'400':
|
||||
$ref: '#/components/responses/BadRequest400'
|
||||
'429':
|
||||
$ref: >-
|
||||
#/components/responses/TooManyRequests429
|
||||
'500':
|
||||
$ref: >-
|
||||
#/components/responses/InternalServerError500
|
||||
default:
|
||||
$ref: '#/components/responses/DefaultError'
|
||||
tags:
|
||||
- VectorIO
|
||||
description: Delete a vector store.
|
||||
parameters:
|
||||
- name: vector_store_id
|
||||
in: path
|
||||
description: The ID of the vector store to delete.
|
||||
required: true
|
||||
schema:
|
||||
type: string
|
||||
/v1/openai/v1/embeddings:
|
||||
post:
|
||||
responses:
|
||||
|
|
@ -2546,6 +2762,46 @@ paths:
|
|||
required: true
|
||||
schema:
|
||||
type: string
|
||||
/v1/openai/v1/vector_stores/{vector_store_id}/search:
|
||||
post:
|
||||
responses:
|
||||
'200':
|
||||
description: >-
|
||||
A VectorStoreSearchResponse containing the search results.
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
$ref: '#/components/schemas/VectorStoreSearchResponsePage'
|
||||
'400':
|
||||
$ref: '#/components/responses/BadRequest400'
|
||||
'429':
|
||||
$ref: >-
|
||||
#/components/responses/TooManyRequests429
|
||||
'500':
|
||||
$ref: >-
|
||||
#/components/responses/InternalServerError500
|
||||
default:
|
||||
$ref: '#/components/responses/DefaultError'
|
||||
tags:
|
||||
- VectorIO
|
||||
description: >-
|
||||
Search for chunks in a vector store.
|
||||
|
||||
Searches a vector store for relevant chunks based on a query and optional
|
||||
file attribute filters.
|
||||
parameters:
|
||||
- name: vector_store_id
|
||||
in: path
|
||||
description: The ID of the vector store to search.
|
||||
required: true
|
||||
schema:
|
||||
type: string
|
||||
requestBody:
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
$ref: '#/components/schemas/OpenaiSearchVectorStoreRequest'
|
||||
required: true
|
||||
/v1/post-training/preference-optimize:
|
||||
post:
|
||||
responses:
|
||||
|
|
@ -4802,6 +5058,7 @@ components:
|
|||
OpenAIResponseInput:
|
||||
oneOf:
|
||||
- $ref: '#/components/schemas/OpenAIResponseOutputMessageWebSearchToolCall'
|
||||
- $ref: '#/components/schemas/OpenAIResponseOutputMessageFileSearchToolCall'
|
||||
- $ref: '#/components/schemas/OpenAIResponseOutputMessageFunctionToolCall'
|
||||
- $ref: '#/components/schemas/OpenAIResponseInputFunctionToolCallOutput'
|
||||
- $ref: '#/components/schemas/OpenAIResponseMessage'
|
||||
|
|
@ -4896,10 +5153,23 @@ components:
|
|||
type: string
|
||||
const: file_search
|
||||
default: file_search
|
||||
vector_store_id:
|
||||
vector_store_ids:
|
||||
type: array
|
||||
items:
|
||||
type: string
|
||||
filters:
|
||||
type: object
|
||||
additionalProperties:
|
||||
oneOf:
|
||||
- type: 'null'
|
||||
- type: boolean
|
||||
- type: number
|
||||
- type: string
|
||||
- type: array
|
||||
- type: object
|
||||
max_num_results:
|
||||
type: integer
|
||||
default: 10
|
||||
ranking_options:
|
||||
type: object
|
||||
properties:
|
||||
|
|
@ -4913,7 +5183,7 @@ components:
|
|||
additionalProperties: false
|
||||
required:
|
||||
- type
|
||||
- vector_store_id
|
||||
- vector_store_ids
|
||||
title: OpenAIResponseInputToolFileSearch
|
||||
OpenAIResponseInputToolFunction:
|
||||
type: object
|
||||
|
|
@ -5075,6 +5345,41 @@ components:
|
|||
- type
|
||||
title: >-
|
||||
OpenAIResponseOutputMessageContentOutputText
|
||||
"OpenAIResponseOutputMessageFileSearchToolCall":
|
||||
type: object
|
||||
properties:
|
||||
id:
|
||||
type: string
|
||||
queries:
|
||||
type: array
|
||||
items:
|
||||
type: string
|
||||
status:
|
||||
type: string
|
||||
type:
|
||||
type: string
|
||||
const: file_search_call
|
||||
default: file_search_call
|
||||
results:
|
||||
type: array
|
||||
items:
|
||||
type: object
|
||||
additionalProperties:
|
||||
oneOf:
|
||||
- type: 'null'
|
||||
- type: boolean
|
||||
- type: number
|
||||
- type: string
|
||||
- type: array
|
||||
- type: object
|
||||
additionalProperties: false
|
||||
required:
|
||||
- id
|
||||
- queries
|
||||
- status
|
||||
- type
|
||||
title: >-
|
||||
OpenAIResponseOutputMessageFileSearchToolCall
|
||||
"OpenAIResponseOutputMessageFunctionToolCall":
|
||||
type: object
|
||||
properties:
|
||||
|
|
@ -5272,6 +5577,7 @@ components:
|
|||
oneOf:
|
||||
- $ref: '#/components/schemas/OpenAIResponseMessage'
|
||||
- $ref: '#/components/schemas/OpenAIResponseOutputMessageWebSearchToolCall'
|
||||
- $ref: '#/components/schemas/OpenAIResponseOutputMessageFileSearchToolCall'
|
||||
- $ref: '#/components/schemas/OpenAIResponseOutputMessageFunctionToolCall'
|
||||
- $ref: '#/components/schemas/OpenAIResponseOutputMessageMCPCall'
|
||||
- $ref: '#/components/schemas/OpenAIResponseOutputMessageMCPListTools'
|
||||
|
|
@ -5280,6 +5586,7 @@ components:
|
|||
mapping:
|
||||
message: '#/components/schemas/OpenAIResponseMessage'
|
||||
web_search_call: '#/components/schemas/OpenAIResponseOutputMessageWebSearchToolCall'
|
||||
file_search_call: '#/components/schemas/OpenAIResponseOutputMessageFileSearchToolCall'
|
||||
function_call: '#/components/schemas/OpenAIResponseOutputMessageFunctionToolCall'
|
||||
mcp_call: '#/components/schemas/OpenAIResponseOutputMessageMCPCall'
|
||||
mcp_list_tools: '#/components/schemas/OpenAIResponseOutputMessageMCPListTools'
|
||||
|
|
@ -7511,6 +7818,9 @@ components:
|
|||
type: boolean
|
||||
description: >-
|
||||
Whether there are more items available after this set
|
||||
url:
|
||||
type: string
|
||||
description: The URL for accessing this list
|
||||
additionalProperties: false
|
||||
required:
|
||||
- data
|
||||
|
|
@ -8032,6 +8342,148 @@ components:
|
|||
- event
|
||||
- ttl_seconds
|
||||
title: LogEventRequest
|
||||
VectorStoreChunkingStrategy:
|
||||
oneOf:
|
||||
- $ref: '#/components/schemas/VectorStoreChunkingStrategyAuto'
|
||||
- $ref: '#/components/schemas/VectorStoreChunkingStrategyStatic'
|
||||
discriminator:
|
||||
propertyName: type
|
||||
mapping:
|
||||
auto: '#/components/schemas/VectorStoreChunkingStrategyAuto'
|
||||
static: '#/components/schemas/VectorStoreChunkingStrategyStatic'
|
||||
VectorStoreChunkingStrategyAuto:
|
||||
type: object
|
||||
properties:
|
||||
type:
|
||||
type: string
|
||||
const: auto
|
||||
default: auto
|
||||
additionalProperties: false
|
||||
required:
|
||||
- type
|
||||
title: VectorStoreChunkingStrategyAuto
|
||||
VectorStoreChunkingStrategyStatic:
|
||||
type: object
|
||||
properties:
|
||||
type:
|
||||
type: string
|
||||
const: static
|
||||
default: static
|
||||
static:
|
||||
$ref: '#/components/schemas/VectorStoreChunkingStrategyStaticConfig'
|
||||
additionalProperties: false
|
||||
required:
|
||||
- type
|
||||
- static
|
||||
title: VectorStoreChunkingStrategyStatic
|
||||
VectorStoreChunkingStrategyStaticConfig:
|
||||
type: object
|
||||
properties:
|
||||
chunk_overlap_tokens:
|
||||
type: integer
|
||||
default: 400
|
||||
max_chunk_size_tokens:
|
||||
type: integer
|
||||
default: 800
|
||||
additionalProperties: false
|
||||
required:
|
||||
- chunk_overlap_tokens
|
||||
- max_chunk_size_tokens
|
||||
title: VectorStoreChunkingStrategyStaticConfig
|
||||
OpenaiAttachFileToVectorStoreRequest:
|
||||
type: object
|
||||
properties:
|
||||
file_id:
|
||||
type: string
|
||||
description: >-
|
||||
The ID of the file to attach to the vector store.
|
||||
attributes:
|
||||
type: object
|
||||
additionalProperties:
|
||||
oneOf:
|
||||
- type: 'null'
|
||||
- type: boolean
|
||||
- type: number
|
||||
- type: string
|
||||
- type: array
|
||||
- type: object
|
||||
description: >-
|
||||
The key-value attributes stored with the file, which can be used for filtering.
|
||||
chunking_strategy:
|
||||
$ref: '#/components/schemas/VectorStoreChunkingStrategy'
|
||||
description: >-
|
||||
The chunking strategy to use for the file.
|
||||
additionalProperties: false
|
||||
required:
|
||||
- file_id
|
||||
title: OpenaiAttachFileToVectorStoreRequest
|
||||
VectorStoreFileLastError:
|
||||
type: object
|
||||
properties:
|
||||
code:
|
||||
oneOf:
|
||||
- type: string
|
||||
const: server_error
|
||||
- type: string
|
||||
const: rate_limit_exceeded
|
||||
message:
|
||||
type: string
|
||||
additionalProperties: false
|
||||
required:
|
||||
- code
|
||||
- message
|
||||
title: VectorStoreFileLastError
|
||||
VectorStoreFileObject:
|
||||
type: object
|
||||
properties:
|
||||
id:
|
||||
type: string
|
||||
object:
|
||||
type: string
|
||||
default: vector_store.file
|
||||
attributes:
|
||||
type: object
|
||||
additionalProperties:
|
||||
oneOf:
|
||||
- type: 'null'
|
||||
- type: boolean
|
||||
- type: number
|
||||
- type: string
|
||||
- type: array
|
||||
- type: object
|
||||
chunking_strategy:
|
||||
$ref: '#/components/schemas/VectorStoreChunkingStrategy'
|
||||
created_at:
|
||||
type: integer
|
||||
last_error:
|
||||
$ref: '#/components/schemas/VectorStoreFileLastError'
|
||||
status:
|
||||
oneOf:
|
||||
- type: string
|
||||
const: completed
|
||||
- type: string
|
||||
const: in_progress
|
||||
- type: string
|
||||
const: cancelled
|
||||
- type: string
|
||||
const: failed
|
||||
usage_bytes:
|
||||
type: integer
|
||||
default: 0
|
||||
vector_store_id:
|
||||
type: string
|
||||
additionalProperties: false
|
||||
required:
|
||||
- id
|
||||
- object
|
||||
- attributes
|
||||
- chunking_strategy
|
||||
- created_at
|
||||
- status
|
||||
- usage_bytes
|
||||
- vector_store_id
|
||||
title: VectorStoreFileObject
|
||||
description: OpenAI Vector Store File object.
|
||||
OpenAIJSONSchema:
|
||||
type: object
|
||||
properties:
|
||||
|
|
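The `VectorStoreChunkingStrategyStaticConfig` schema in the hunk above defaults to `max_chunk_size_tokens: 800` and `chunk_overlap_tokens: 400`. The sketch below shows one common way to read those two numbers: a sliding window whose step is the chunk size minus the overlap. The `chunk_tokens` helper is hypothetical, and it takes a pre-tokenized list rather than a real tokenizer; the provider's actual chunking may differ.

```python
from typing import List


def chunk_tokens(
    tokens: List[str],
    max_chunk_size_tokens: int = 800,
    chunk_overlap_tokens: int = 400,
) -> List[List[str]]:
    """Split tokens into fixed-size windows that overlap by a fixed amount."""
    if max_chunk_size_tokens <= 0:
        raise ValueError("max_chunk_size_tokens must be positive")
    if not 0 <= chunk_overlap_tokens < max_chunk_size_tokens:
        raise ValueError("chunk_overlap_tokens must be in [0, max_chunk_size_tokens)")

    # Each new chunk starts this many tokens after the previous one.
    step = max_chunk_size_tokens - chunk_overlap_tokens
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_chunk_size_tokens])
        # Stop once a chunk has reached the end of the input.
        if start + max_chunk_size_tokens >= len(tokens):
            break
    return chunks


# Small numbers keep the example readable: size 4, overlap 2.
tokens = [f"t{i}" for i in range(10)]
chunks = chunk_tokens(tokens, max_chunk_size_tokens=4, chunk_overlap_tokens=2)
```

With the schema defaults, consecutive chunks share half of their tokens, so every token except those at the edges appears in two chunks.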
@ -8454,6 +8906,10 @@ components:
|
|||
type: string
|
||||
prompt_logprobs:
|
||||
type: integer
|
||||
suffix:
|
||||
type: string
|
||||
description: >-
|
||||
(Optional) The suffix that should be appended to the completion.
|
||||
additionalProperties: false
|
||||
required:
|
||||
- model
|
||||
|
|
@ -8505,6 +8961,133 @@ components:
|
|||
title: OpenAICompletionChoice
|
||||
description: >-
|
||||
A choice from an OpenAI-compatible completion response.
|
||||
OpenaiCreateVectorStoreRequest:
|
||||
type: object
|
||||
properties:
|
||||
name:
|
||||
type: string
|
||||
description: A name for the vector store.
|
||||
file_ids:
|
||||
type: array
|
||||
items:
|
||||
type: string
|
||||
description: >-
|
||||
A list of File IDs that the vector store should use. Useful for tools
|
||||
like `file_search` that can access files.
|
||||
expires_after:
|
||||
type: object
|
||||
additionalProperties:
|
||||
oneOf:
|
||||
- type: 'null'
|
||||
- type: boolean
|
||||
- type: number
|
||||
- type: string
|
||||
- type: array
|
||||
- type: object
|
||||
description: >-
|
||||
The expiration policy for a vector store.
|
||||
chunking_strategy:
|
||||
type: object
|
||||
additionalProperties:
|
||||
oneOf:
|
||||
- type: 'null'
|
||||
- type: boolean
|
||||
- type: number
|
||||
- type: string
|
||||
- type: array
|
||||
- type: object
|
||||
description: >-
|
||||
The chunking strategy used to chunk the file(s). If not set, will use
|
||||
the `auto` strategy.
|
||||
metadata:
|
||||
type: object
|
||||
additionalProperties:
|
||||
oneOf:
|
||||
- type: 'null'
|
||||
- type: boolean
|
||||
- type: number
|
||||
- type: string
|
||||
- type: array
|
||||
- type: object
|
||||
description: >-
|
||||
Set of up to 16 key-value pairs that can be attached to an object.
|
||||
embedding_model:
|
||||
type: string
|
||||
description: >-
|
||||
The embedding model to use for this vector store.
|
||||
embedding_dimension:
|
||||
type: integer
|
||||
description: >-
|
||||
The dimension of the embedding vectors (default: 384).
|
||||
provider_id:
|
||||
type: string
|
||||
description: >-
|
||||
The ID of the provider to use for this vector store.
|
||||
provider_vector_db_id:
|
||||
type: string
|
||||
description: >-
|
||||
The provider-specific vector database ID.
|
||||
additionalProperties: false
|
||||
required:
|
||||
- name
|
||||
title: OpenaiCreateVectorStoreRequest
|
||||
VectorStoreObject:
|
||||
type: object
|
||||
properties:
|
||||
id:
|
||||
type: string
|
||||
object:
|
||||
type: string
|
||||
default: vector_store
|
||||
created_at:
|
||||
type: integer
|
||||
name:
|
||||
type: string
|
||||
usage_bytes:
|
||||
type: integer
|
||||
default: 0
|
||||
file_counts:
|
||||
type: object
|
||||
additionalProperties:
|
||||
type: integer
|
||||
status:
|
||||
type: string
|
||||
default: completed
|
||||
expires_after:
|
||||
type: object
|
||||
additionalProperties:
|
||||
oneOf:
|
||||
- type: 'null'
|
||||
- type: boolean
|
||||
- type: number
|
||||
- type: string
|
||||
- type: array
|
||||
- type: object
|
||||
expires_at:
|
||||
type: integer
|
||||
last_active_at:
|
||||
type: integer
|
||||
metadata:
|
||||
type: object
|
||||
additionalProperties:
|
||||
oneOf:
|
||||
- type: 'null'
|
||||
- type: boolean
|
||||
- type: number
|
||||
- type: string
|
||||
- type: array
|
||||
- type: object
|
||||
additionalProperties: false
|
||||
required:
|
||||
- id
|
||||
- object
|
||||
- created_at
|
||||
- usage_bytes
|
||||
- file_counts
|
||||
- status
|
||||
- metadata
|
||||
title: VectorStoreObject
|
||||
description: OpenAI Vector Store object.
|
||||
OpenAIFileDeleteResponse:
|
||||
type: object
|
||||
properties:
|
||||
|
|
@ -8528,6 +9111,24 @@ components:
|
|||
title: OpenAIFileDeleteResponse
|
||||
description: >-
|
||||
Response for deleting a file in OpenAI Files API.
|
||||
VectorStoreDeleteResponse:
|
||||
type: object
|
||||
properties:
|
||||
id:
|
||||
type: string
|
||||
object:
|
||||
type: string
|
||||
default: vector_store.deleted
|
||||
deleted:
|
||||
type: boolean
|
||||
default: true
|
||||
additionalProperties: false
|
||||
required:
|
||||
- id
|
||||
- object
|
||||
- deleted
|
||||
title: VectorStoreDeleteResponse
|
||||
description: Response from deleting a vector store.
|
||||
OpenaiEmbeddingsRequest:
|
||||
type: object
|
||||
properties:
|
||||
|
|
@ -8751,9 +9352,179 @@ components:
|
|||
required:
|
||||
- data
|
||||
title: OpenAIListModelsResponse
|
||||
VectorStoreListResponse:
|
||||
type: object
|
||||
properties:
|
||||
object:
|
||||
type: string
|
||||
default: list
|
||||
data:
|
||||
type: array
|
||||
items:
|
||||
$ref: '#/components/schemas/VectorStoreObject'
|
||||
first_id:
|
||||
type: string
|
||||
last_id:
|
||||
type: string
|
||||
has_more:
|
||||
type: boolean
|
||||
default: false
|
||||
additionalProperties: false
|
||||
required:
|
||||
- object
|
||||
- data
|
||||
- has_more
|
||||
title: VectorStoreListResponse
|
||||
description: Response from listing vector stores.
|
||||
Response:
|
||||
type: object
|
||||
title: Response
|
||||
OpenaiSearchVectorStoreRequest:
|
||||
type: object
|
||||
properties:
|
||||
query:
|
||||
oneOf:
|
||||
- type: string
|
||||
- type: array
|
||||
items:
|
||||
type: string
|
||||
description: >-
|
||||
The query string or array for performing the search.
|
||||
filters:
|
||||
type: object
|
||||
additionalProperties:
|
||||
oneOf:
|
||||
- type: 'null'
|
||||
- type: boolean
|
||||
- type: number
|
||||
- type: string
|
||||
- type: array
|
||||
- type: object
|
||||
description: >-
|
||||
Filters based on file attributes to narrow the search results.
|
||||
max_num_results:
|
||||
type: integer
|
||||
description: >-
|
||||
Maximum number of results to return (1 to 50 inclusive, default 10).
|
||||
ranking_options:
|
||||
type: object
|
||||
additionalProperties:
|
||||
oneOf:
|
||||
- type: 'null'
|
||||
- type: boolean
|
||||
- type: number
|
||||
- type: string
|
||||
- type: array
|
||||
- type: object
|
||||
description: >-
|
||||
Ranking options for fine-tuning the search results.
|
||||
rewrite_query:
|
||||
type: boolean
|
||||
description: >-
|
||||
Whether to rewrite the natural language query for vector search (default
|
||||
false)
|
||||
additionalProperties: false
|
||||
required:
|
||||
- query
|
||||
title: OpenaiSearchVectorStoreRequest
|
||||
VectorStoreContent:
|
||||
type: object
|
||||
properties:
|
||||
type:
|
||||
type: string
|
||||
const: text
|
||||
text:
|
||||
type: string
|
||||
additionalProperties: false
|
||||
required:
|
||||
- type
|
||||
- text
|
||||
title: VectorStoreContent
|
||||
VectorStoreSearchResponse:
|
||||
type: object
|
||||
properties:
|
||||
file_id:
|
||||
type: string
|
||||
filename:
|
||||
type: string
|
||||
score:
|
||||
type: number
|
||||
attributes:
|
||||
type: object
|
||||
additionalProperties:
|
||||
oneOf:
|
||||
- type: string
|
||||
- type: number
|
||||
- type: boolean
|
||||
content:
|
||||
type: array
|
||||
items:
|
||||
$ref: '#/components/schemas/VectorStoreContent'
|
||||
additionalProperties: false
|
||||
required:
|
||||
- file_id
|
||||
- filename
|
||||
- score
|
||||
- content
|
||||
title: VectorStoreSearchResponse
|
||||
description: Response from searching a vector store.
|
||||
VectorStoreSearchResponsePage:
|
||||
type: object
|
||||
properties:
|
||||
object:
|
||||
type: string
|
||||
default: vector_store.search_results.page
|
||||
search_query:
|
||||
type: string
|
||||
data:
|
||||
type: array
|
||||
items:
|
||||
$ref: '#/components/schemas/VectorStoreSearchResponse'
|
||||
has_more:
|
||||
type: boolean
|
||||
default: false
|
||||
next_page:
|
||||
type: string
|
||||
additionalProperties: false
|
||||
required:
|
||||
- object
|
||||
- search_query
|
||||
- data
|
||||
- has_more
|
||||
title: VectorStoreSearchResponsePage
|
||||
description: Response from searching a vector store.
|
||||
OpenaiUpdateVectorStoreRequest:
|
||||
type: object
|
||||
properties:
|
||||
name:
|
||||
type: string
|
||||
description: The name of the vector store.
|
||||
expires_after:
|
||||
type: object
|
||||
additionalProperties:
|
||||
oneOf:
|
||||
- type: 'null'
|
||||
- type: boolean
|
||||
- type: number
|
||||
- type: string
|
||||
- type: array
|
||||
- type: object
|
||||
description: >-
|
||||
The expiration policy for a vector store.
|
||||
metadata:
|
||||
type: object
|
||||
additionalProperties:
|
||||
oneOf:
|
||||
- type: 'null'
|
||||
- type: boolean
|
||||
- type: number
|
||||
- type: string
|
||||
- type: array
|
||||
- type: object
|
||||
description: >-
|
||||
Set of up to 16 key-value pairs that can be attached to an object.
|
||||
additionalProperties: false
|
||||
title: OpenaiUpdateVectorStoreRequest
|
||||
DPOAlignmentConfig:
|
||||
type: object
|
||||
properties:
|
||||
|
|
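The `OpenaiSearchVectorStoreRequest` schema in the hunk above requires only `query` and documents `max_num_results` as 1 to 50 with a default of 10. A small sketch of building and checking such a request body on the client side follows. The `build_search_request` helper is hypothetical; the field names and limits are taken from the schema, and the server may enforce them differently.

```python
from typing import Any, Dict, List, Optional, Union


def build_search_request(
    query: Union[str, List[str]],
    filters: Optional[Dict[str, Any]] = None,
    max_num_results: int = 10,
    rewrite_query: bool = False,
) -> Dict[str, Any]:
    """Build a JSON-serializable body for a vector store search request."""
    # `query` is either a single string or a non-empty list of strings.
    if isinstance(query, list):
        if not query or not all(isinstance(q, str) for q in query):
            raise ValueError("query list must contain at least one string")
    elif not isinstance(query, str):
        raise ValueError("query must be a string or a list of strings")

    if not 1 <= max_num_results <= 50:
        raise ValueError("max_num_results must be between 1 and 50")

    body: Dict[str, Any] = {
        "query": query,
        "max_num_results": max_num_results,
        "rewrite_query": rewrite_query,
    }
    # Optional fields are omitted rather than sent as null.
    if filters is not None:
        body["filters"] = filters
    return body


body = build_search_request("what is torchtune", filters={"topic": "ml"}, max_num_results=5)
```

The resulting dictionary would be sent as the JSON body of a `POST` to `/v1/openai/v1/vector_stores/{vector_store_id}/search`.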
@ -8992,7 +9763,13 @@ components:
|
|||
mode:
|
||||
type: string
|
||||
description: >-
|
||||
Search mode for retrieval—either "vector" or "keyword". Default "vector".
|
||||
Search mode for retrieval—either "vector", "keyword", or "hybrid". Default
|
||||
"vector".
|
||||
ranker:
|
||||
$ref: '#/components/schemas/Ranker'
|
||||
description: >-
|
||||
Configuration for the ranker to use in hybrid search. Defaults to RRF
|
||||
ranker.
|
||||
additionalProperties: false
|
||||
required:
|
||||
- query_generator_config
|
||||
|
|
@ -9011,6 +9788,58 @@ components:
|
|||
mapping:
|
||||
default: '#/components/schemas/DefaultRAGQueryGeneratorConfig'
|
||||
llm: '#/components/schemas/LLMRAGQueryGeneratorConfig'
|
||||
RRFRanker:
|
||||
type: object
|
||||
properties:
|
||||
type:
|
||||
type: string
|
||||
const: rrf
|
||||
default: rrf
|
||||
description: The type of ranker, always "rrf"
|
||||
impact_factor:
|
||||
type: number
|
||||
default: 60.0
|
||||
description: >-
|
||||
The impact factor for RRF scoring. Higher values reduce the score gap between higher- and lower-ranked
|
||||
results. Must be greater than 0. Default of 60 is from the original RRF
|
||||
paper (Cormack et al., 2009).
|
||||
additionalProperties: false
|
||||
required:
|
||||
- type
|
||||
- impact_factor
|
||||
title: RRFRanker
|
||||
description: >-
|
||||
Reciprocal Rank Fusion (RRF) ranker configuration.
|
||||
Ranker:
|
||||
oneOf:
|
||||
- $ref: '#/components/schemas/RRFRanker'
|
||||
- $ref: '#/components/schemas/WeightedRanker'
|
||||
discriminator:
|
||||
propertyName: type
|
||||
mapping:
|
||||
rrf: '#/components/schemas/RRFRanker'
|
||||
weighted: '#/components/schemas/WeightedRanker'
|
||||
WeightedRanker:
|
||||
type: object
|
||||
properties:
|
||||
type:
|
||||
type: string
|
||||
const: weighted
|
||||
default: weighted
|
||||
description: The type of ranker, always "weighted"
|
||||
alpha:
|
||||
type: number
|
||||
default: 0.5
|
||||
description: >-
|
||||
Weight factor between 0 and 1. 0 means only use keyword scores, 1 means
|
||||
only use vector scores, values in between blend both scores.
|
||||
additionalProperties: false
|
||||
required:
|
||||
- type
|
||||
- alpha
|
||||
title: WeightedRanker
|
||||
description: >-
|
||||
Weighted ranker configuration that combines vector and keyword scores.
|
||||
QueryRequest:
|
||||
type: object
|
||||
properties:
|
||||
|
|
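The `RRFRanker` and `WeightedRanker` schemas in the hunk above configure how hybrid search merges vector and keyword results. The sketch below illustrates both ideas under stated assumptions: `rrf_fuse` uses the standard Reciprocal Rank Fusion formula, summing `1 / (impact_factor + rank)` over rankings, and `weighted_fuse` blends two score maps with `alpha`, assuming the scores are already on a comparable scale. Both function names are hypothetical, and this is not necessarily how the provider computes its scores.

```python
from typing import Dict, List


def rrf_fuse(rankings: List[List[str]], impact_factor: float = 60.0) -> Dict[str, float]:
    """Combine ranked ID lists with Reciprocal Rank Fusion."""
    if impact_factor <= 0:
        raise ValueError("impact_factor must be greater than 0")
    scores: Dict[str, float] = {}
    for ranking in rankings:
        # Ranks are 1-based: the best result in each list has rank 1.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (impact_factor + rank)
    return scores


def weighted_fuse(
    vector_scores: Dict[str, float],
    keyword_scores: Dict[str, float],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Blend scores: alpha=1 uses only vector scores, alpha=0 only keyword scores."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    doc_ids = set(vector_scores) | set(keyword_scores)
    return {
        doc_id: alpha * vector_scores.get(doc_id, 0.0)
        + (1.0 - alpha) * keyword_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }


vector_ranking = ["a", "b", "c"]
keyword_ranking = ["b", "c", "a"]
fused = rrf_fuse([vector_ranking, keyword_ranking])
best = max(fused, key=fused.get)
```

In this formulation, `b` wins because it ranks near the top of both lists. A larger `impact_factor` makes the per-rank contributions more uniform, so position matters less.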
|
|||
|
|
@ -56,10 +56,10 @@ shields: []
|
|||
server:
|
||||
port: 8321
|
||||
auth:
|
||||
provider_type: "kubernetes"
|
||||
provider_type: "oauth2_token"
|
||||
config:
|
||||
api_server_url: "https://kubernetes.default.svc"
|
||||
ca_cert_path: "/path/to/ca.crt"
|
||||
jwks:
|
||||
uri: "https://my-token-issuing-svc.com/jwks"
|
||||
```
|
||||
|
||||
Let's break this down into the different sections. The first section specifies the set of APIs that the stack server will serve:
|
||||
|
|
@ -132,16 +132,52 @@ The server supports multiple authentication providers:
|
|||
|
||||
#### OAuth 2.0/OpenID Connect Provider with Kubernetes
|
||||
|
||||
The Kubernetes cluster must be configured to use a service account for authentication.
|
||||
The server can be configured to use service account tokens for authorization, validating these against the Kubernetes API server, e.g.:
|
||||
```yaml
|
||||
server:
|
||||
auth:
|
||||
provider_type: "oauth2_token"
|
||||
config:
|
||||
jwks:
|
||||
uri: "https://kubernetes.default.svc:8443/openid/v1/jwks"
|
||||
token: "${env.TOKEN:}"
|
||||
key_recheck_period: 3600
|
||||
tls_cafile: "/path/to/ca.crt"
|
||||
issuer: "https://kubernetes.default.svc"
|
||||
audience: "https://kubernetes.default.svc"
|
||||
```
|
||||
|
||||
To find your cluster's JWKS URI (the endpoint that serves the public keys used to verify token signatures), run:
|
||||
```bash
|
||||
kubectl get --raw /.well-known/openid-configuration | jq -r .jwks_uri
|
||||
```
|
||||
|
||||
For `tls_cafile`, you can use the CA certificate of the OIDC provider:
|
||||
```bash
|
||||
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.certificate-authority}'
|
||||
```
|
||||
|
||||
For the issuer, you can use the OIDC provider's URL:
|
||||
```bash
|
||||
kubectl get --raw /.well-known/openid-configuration | jq .issuer
|
||||
```
|
||||
|
||||
The audience can be obtained from a token, e.g. run:
|
||||
```bash
|
||||
kubectl create token default --duration=1h | cut -d. -f2 | base64 -d | jq .aud
|
||||
```
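JWT segments are base64url-encoded without padding, so a plain `base64 -d` can fail on some tokens. As an alternative, the hypothetical helper below restores the padding and decodes the payload so the `aud` claim can be read. It is only a sketch for inspection: it does not verify the signature, and the sample token is built locally for illustration.

```python
import base64
import json
from typing import Any, Dict


def decode_jwt_payload(token: str) -> Dict[str, Any]:
    """Decode the payload segment of a JWT without verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload = parts[1]
    # base64url omits '=' padding; restore it to a multiple of four characters.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def _b64url(data: Dict[str, Any]) -> str:
    """Encode a dict the way JWT segments are encoded (illustration only)."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# A locally built, unsigned sample token; real tokens come from `kubectl create token`.
sample_token = ".".join(
    [
        _b64url({"alg": "none"}),
        _b64url({"aud": ["https://kubernetes.default.svc"], "sub": "system:serviceaccount:default:default"}),
        "signature",
    ]
)
claims = decode_jwt_payload(sample_token)
audience = claims["aud"]
```

The value read from `aud` is what would go into the `audience` setting of the `oauth2_token` provider configuration.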
|
||||
|
||||
The `token` set under `jwks` is used to authorize access to the JWKS endpoint. You can obtain a token by running:
|
||||
|
||||
```bash
|
||||
kubectl create namespace llama-stack
|
||||
kubectl create serviceaccount llama-stack-auth -n llama-stack
|
||||
kubectl create rolebinding llama-stack-auth-rolebinding --clusterrole=admin --serviceaccount=llama-stack:llama-stack-auth -n llama-stack
|
||||
kubectl create token llama-stack-auth -n llama-stack > llama-stack-auth-token
|
||||
export TOKEN=$(cat llama-stack-auth-token)
|
||||
```
|
||||
|
||||
Make sure the `kube-apiserver` runs with `--anonymous-auth=true` to allow unauthenticated requests
|
||||
Alternatively, you can configure the jwks endpoint to allow anonymous access. To do this, make sure
|
||||
the `kube-apiserver` runs with `--anonymous-auth=true` to allow unauthenticated requests
|
||||
and that the correct RoleBinding is created to allow the service account to access the necessary
|
||||
resources. If that is not the case, you can create a RoleBinding for the service account to access
|
||||
the necessary resources:
|
||||
|
|
@ -175,35 +211,6 @@ And then apply the configuration:
|
|||
kubectl apply -f allow-anonymous-openid.yaml
|
||||
```
|
||||
|
||||
Validates tokens against the Kubernetes API server through the OIDC provider:
|
||||
```yaml
|
||||
server:
|
||||
auth:
|
||||
provider_type: "oauth2_token"
|
||||
config:
|
||||
jwks:
|
||||
uri: "https://kubernetes.default.svc"
|
||||
key_recheck_period: 3600
|
||||
tls_cafile: "/path/to/ca.crt"
|
||||
issuer: "https://kubernetes.default.svc"
|
||||
audience: "https://kubernetes.default.svc"
|
||||
```
|
||||
|
||||
To find your cluster's audience, run:
|
||||
```bash
|
||||
kubectl create token default --duration=1h | cut -d. -f2 | base64 -d | jq .aud
|
||||
```
|
||||
|
||||
For the issuer, you can use the OIDC provider's URL:
|
||||
```bash
|
||||
kubectl get --raw /.well-known/openid-configuration| jq .issuer
|
||||
```
|
||||
|
||||
For the tls_cafile, you can use the CA certificate of the OIDC provider:
|
||||
```bash
|
||||
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.certificate-authority}'
|
||||
```
|
||||
|
||||
The provider extracts user information from the JWT token:
|
||||
- Username from the `sub` claim becomes a role
|
||||
- Kubernetes groups become teams
|
||||
|
|
|
|||
|
|
@ -18,6 +18,7 @@ The `llamastack/distribution-ollama` distribution consists of the following prov
|
|||
| agents | `inline::meta-reference` |
|
||||
| datasetio | `remote::huggingface`, `inline::localfs` |
|
||||
| eval | `inline::meta-reference` |
|
||||
| files | `inline::localfs` |
|
||||
| inference | `remote::ollama` |
|
||||
| post_training | `inline::huggingface` |
|
||||
| safety | `inline::llama-guard` |
|
||||
|
|
|
|||
|
|
@ -66,25 +66,126 @@ To use sqlite-vec in your Llama Stack project, follow these steps:
|
|||
2. Configure your Llama Stack project to use SQLite-Vec.
|
||||
3. Start storing and querying vectors.
|
||||
|
||||
## Supported Search Modes
|
||||
The SQLite-vec provider supports three search modes:
|
||||
|
||||
The sqlite-vec provider supports both vector-based and keyword-based (full-text) search modes.
|
||||
|
||||
When using the RAGTool interface, you can specify the desired search behavior via the `mode` parameter in
|
||||
`RAGQueryConfig`. For example:
|
||||
1. **Vector Search** (`mode="vector"`): Performs pure vector similarity search using the embeddings.
|
||||
2. **Keyword Search** (`mode="keyword"`): Performs full-text search using SQLite's FTS5.
|
||||
3. **Hybrid Search** (`mode="hybrid"`): Combines both vector and keyword search for better results. First performs keyword search to get candidate matches, then applies vector similarity search on those candidates.
|
||||
|
||||
Example with hybrid search:
|
||||
```python
|
||||
from llama_stack.apis.tool_runtime.rag import RAGQueryConfig
|
||||
response = await vector_io.query_chunks(
|
||||
vector_db_id="my_db",
|
||||
query="your query here",
|
||||
params={"mode": "hybrid", "max_chunks": 3, "score_threshold": 0.7},
|
||||
)
|
||||
|
||||
query_config = RAGQueryConfig(max_chunks=6, mode="vector")
|
||||
# Using RRF ranker
|
||||
response = await vector_io.query_chunks(
|
||||
vector_db_id="my_db",
|
||||
query="your query here",
|
||||
params={
|
||||
"mode": "hybrid",
|
||||
"max_chunks": 3,
|
||||
"score_threshold": 0.7,
|
||||
"ranker": {"type": "rrf", "impact_factor": 60.0},
|
||||
},
|
||||
)
|
||||
|
||||
results = client.tool_runtime.rag_tool.query(
|
||||
vector_db_ids=[vector_db_id],
|
||||
content="what is torchtune",
|
||||
query_config=query_config,
|
||||
# Using weighted ranker
|
||||
response = await vector_io.query_chunks(
|
||||
vector_db_id="my_db",
|
||||
query="your query here",
|
||||
params={
|
||||
"mode": "hybrid",
|
||||
"max_chunks": 3,
|
||||
"score_threshold": 0.7,
|
||||
"ranker": {"type": "weighted", "alpha": 0.7}, # 70% vector, 30% keyword
|
||||
},
|
||||
)
|
||||
```
|
||||
|
||||
Example with explicit vector search:
|
||||
```python
|
||||
response = await vector_io.query_chunks(
|
||||
vector_db_id="my_db",
|
||||
query="your query here",
|
||||
params={"mode": "vector", "max_chunks": 3, "score_threshold": 0.7},
|
||||
)
|
||||
```
|
||||
|
||||
Example with keyword search:
|
||||
```python
|
||||
response = await vector_io.query_chunks(
|
||||
vector_db_id="my_db",
|
||||
query="your query here",
|
||||
params={"mode": "keyword", "max_chunks": 3, "score_threshold": 0.7},
|
||||
)
|
||||
```
|
||||
|
||||
## Supported Search Modes
|
||||
|
||||
The SQLite vector store supports three search modes:
|
||||
|
||||
1. **Vector Search** (`mode="vector"`): Uses vector similarity to find relevant chunks
|
||||
2. **Keyword Search** (`mode="keyword"`): Uses keyword matching to find relevant chunks
|
||||
3. **Hybrid Search** (`mode="hybrid"`): Combines both vector and keyword scores using a ranker
|
||||
|
||||
### Hybrid Search
|
||||
|
||||
Hybrid search combines the strengths of both vector and keyword search by:
|
||||
- Computing vector similarity scores
|
||||
- Computing keyword match scores
|
||||
- Using a ranker to combine these scores
|
||||
|
||||
Two ranker types are supported:
|
||||
|
||||
1. **RRF (Reciprocal Rank Fusion)**:
|
||||
- Combines ranks from both vector and keyword results
|
||||
- Uses an impact factor (default: 60.0) to control the weight of higher-ranked results
|
||||
- Good for balancing between vector and keyword results
|
||||
- The default impact factor of 60.0 comes from the original RRF paper by Cormack et al. (2009) [^1], which reported that this value performed well across various retrieval tasks
|
||||
|
||||
2. **Weighted**:
|
||||
- Linearly combines normalized vector and keyword scores
|
||||
- Uses an alpha parameter (0-1) to control the blend:
|
||||
- alpha=0: Only use keyword scores
|
||||
- alpha=1: Only use vector scores
|
||||
- alpha=0.5: Equal weight to both (default)
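
The two rankers described above can be sketched in a few lines of Python. This is an illustrative sketch, not the provider's actual implementation: the function names `rrf_scores` and `weighted_scores`, the dict-based inputs, and the min-max normalization are assumptions made for the example.

```python
from typing import Dict


def rrf_scores(
    vector_scores: Dict[str, float],
    keyword_scores: Dict[str, float],
    impact_factor: float = 60.0,
) -> Dict[str, float]:
    """Reciprocal Rank Fusion: sum 1 / (impact_factor + rank) over both result lists."""

    def ranks(scores: Dict[str, float]) -> Dict[str, int]:
        # Rank 1 is the best-scoring chunk in each list.
        ordered = sorted(scores, key=scores.get, reverse=True)
        return {chunk_id: rank for rank, chunk_id in enumerate(ordered, start=1)}

    v_ranks, k_ranks = ranks(vector_scores), ranks(keyword_scores)
    fused: Dict[str, float] = {}
    for chunk_id in set(v_ranks) | set(k_ranks):
        score = 0.0
        if chunk_id in v_ranks:
            score += 1.0 / (impact_factor + v_ranks[chunk_id])
        if chunk_id in k_ranks:
            score += 1.0 / (impact_factor + k_ranks[chunk_id])
        fused[chunk_id] = score
    return fused


def weighted_scores(
    vector_scores: Dict[str, float],
    keyword_scores: Dict[str, float],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Linear blend: alpha * vector + (1 - alpha) * keyword, after normalization."""

    def normalize(scores: Dict[str, float]) -> Dict[str, float]:
        # Min-max normalization to [0, 1]; an assumption for this sketch.
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        if hi == lo:
            return {chunk_id: 1.0 for chunk_id in scores}
        return {chunk_id: (s - lo) / (hi - lo) for chunk_id, s in scores.items()}

    v, k = normalize(vector_scores), normalize(keyword_scores)
    # A chunk missing from one list contributes 0 for that list.
    return {
        chunk_id: alpha * v.get(chunk_id, 0.0) + (1 - alpha) * k.get(chunk_id, 0.0)
        for chunk_id in set(v) | set(k)
    }


# Hypothetical scores for four chunks.
vector = {"a": 0.9, "b": 0.5, "c": 0.1}
keyword = {"b": 3.0, "c": 2.0, "d": 1.0}

rrf = rrf_scores(vector, keyword)
weighted = weighted_scores(vector, keyword, alpha=0.7)
print(sorted(rrf, key=rrf.get, reverse=True))
print(sorted(weighted, key=weighted.get, reverse=True))
```

RRF depends only on rank positions, so it needs no score normalization. The weighted blend works on raw scores, which is why the sketch normalizes each list before combining them.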
|
||||
|
||||
Example using RAGQueryConfig with different search modes:
|
||||
|
||||
```python
|
||||
from llama_stack.apis.tools import RAGQueryConfig, RRFRanker, WeightedRanker
|
||||
|
||||
# Vector search
|
||||
config = RAGQueryConfig(mode="vector", max_chunks=5)
|
||||
|
||||
# Keyword search
|
||||
config = RAGQueryConfig(mode="keyword", max_chunks=5)
|
||||
|
||||
# Hybrid search with custom RRF ranker
|
||||
config = RAGQueryConfig(
|
||||
mode="hybrid",
|
||||
max_chunks=5,
|
||||
ranker=RRFRanker(impact_factor=50.0), # Custom impact factor
|
||||
)
|
||||
|
||||
# Hybrid search with weighted ranker
|
||||
config = RAGQueryConfig(
|
||||
mode="hybrid",
|
||||
max_chunks=5,
|
||||
ranker=WeightedRanker(alpha=0.7), # 70% vector, 30% keyword
|
||||
)
|
||||
|
||||
# Hybrid search with default RRF ranker
|
||||
config = RAGQueryConfig(
|
||||
mode="hybrid", max_chunks=5
|
||||
) # Will use RRF with impact_factor=60.0
|
||||
```
|
||||
|
||||
Note: The ranker configuration is only used in hybrid mode. For vector or keyword modes, the ranker parameter is ignored.
|
||||
|
||||
## Installation
|
||||
|
||||
You can install SQLite-Vec using pip:
|
||||
|
|
@ -96,3 +197,5 @@ pip install sqlite-vec
|
|||
## Documentation
|
||||
|
||||
See [sqlite-vec's GitHub repo](https://github.com/asg017/sqlite-vec/tree/main) for more details about sqlite-vec in general.
|
||||
|
||||
[^1]: Cormack, G. V., Clarke, C. L., & Buettcher, S. (2009). [Reciprocal rank fusion outperforms Condorcet and individual rank learning methods](https://dl.acm.org/doi/10.1145/1571941.1572114). In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 758-759).
|
||||
|
|
|
|||