refactor: extract pagination logic into shared helper function (#1770)

# What does this PR do?

Move the pagination logic from the LocalFS and HuggingFace implementations
into a common helper function so that pagination behaves consistently
across providers. This removes duplicated code and keeps the logic in
one place.
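
For illustration, the offset-based semantics this change documents for the endpoint (`start_index`, `limit`, `has_more`) could be sketched as below. This is a minimal sketch, not the helper added in this PR: the name `paginate_records`, the dict return value, and the error handling are assumptions.

```python
from typing import Any, Dict, List, Optional


def paginate_records(
    records: List[Any],
    start_index: Optional[int] = None,
    limit: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical offset-based pagination over an in-memory list."""
    # start_index is 0-based; None means "start from the beginning".
    start = 0 if start_index is None else start_index
    if start < 0:
        # Assumption: negative offsets are rejected rather than wrapped.
        raise ValueError("start_index must be non-negative")

    # limit of None or -1 means "return everything from start onward".
    if limit is None or limit == -1:
        return {"data": records[start:], "has_more": False}
    if limit < 0:
        # Assumption: any other negative limit is treated as an error.
        raise ValueError("limit must be -1 or non-negative")

    end = start + limit
    # has_more is True only if items remain after this page.
    return {"data": records[start:end], "has_more": end < len(records)}
```

With five records, `start_index=1, limit=2` yields the second and third records and `has_more=True`.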


## Test Plan

Run this script:

```python
from llama_stack_client import LlamaStackClient

# Initialize the client
client = LlamaStackClient(base_url="http://localhost:8321")

# Register a dataset
response = client.datasets.register(
    purpose="eval/messages-answer",  # or "eval/question-answer" or "post-training/messages"
    source={"type": "uri", "uri": "huggingface://datasets/llamastack/simpleqa?split=train"},
    dataset_id="my_dataset",  # optional, will be auto-generated if not provided
    metadata={"description": "My evaluation dataset"},  # optional
)

# Verify the dataset was registered by listing all datasets
datasets = client.datasets.list()
print(f"Registered datasets: {[d.identifier for d in datasets]}")

# You can then access the data using the datasetio API
# rows = client.datasets.iterrows(dataset_id="my_dataset", start_index=1, limit=2)
rows = client.datasets.iterrows(dataset_id="my_dataset")
print(f"Data: {rows.data}")
```

Then vary `start_index` and `limit` to exercise pagination.
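
To walk a whole dataset page by page, a loop along these lines may help. It is a sketch under assumptions: `iter_all_rows` and `fake_fetch` are hypothetical names, and the stand-in page object only mimics the `data` and `has_more` fields of the response described in this PR, so the example runs without a server.

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterator


def iter_all_rows(
    fetch_page: Callable[[int, int], Any], page_size: int = 2
) -> Iterator[Any]:
    """Yield every row, advancing start_index until has_more is False."""
    start_index = 0
    while True:
        page = fetch_page(start_index, page_size)
        yield from page.data
        # Stop when the server reports no more rows, or on an empty page
        # (guards against looping forever if has_more is wrong).
        if not page.has_more or not page.data:
            break
        start_index += len(page.data)


# Offline stand-in for the server (illustrative data only).
_ROWS = ["a", "b", "c", "d", "e"]


def fake_fetch(start_index: int, limit: int) -> SimpleNamespace:
    data = _ROWS[start_index:start_index + limit]
    return SimpleNamespace(data=data, has_more=start_index + limit < len(_ROWS))


all_rows = list(iter_all_rows(fake_fetch, page_size=2))
```

Against a running server, `fetch_page` could wrap the call from the script above, for example `lambda start, limit: client.datasets.iterrows(dataset_id="my_dataset", start_index=start, limit=limit)`, assuming the installed client exposes `has_more` on the response.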

[//]: # (## Documentation)

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han 2025-03-31 22:08:29 +02:00 committed by GitHub
parent d495922949
commit 2ffa2b77ed
GPG key ID: B5690EEEBB952194
9 changed files with 130 additions and 73 deletions

@@ -2115,7 +2115,7 @@
 "content": {
 "application/json": {
 "schema": {
-"$ref": "#/components/schemas/IterrowsResponse"
+"$ref": "#/components/schemas/PaginatedResponse"
 }
 }
 }
@@ -2136,7 +2136,7 @@
 "tags": [
 "DatasetIO"
 ],
-"description": "Get a paginated list of rows from a dataset. Uses cursor-based pagination.",
+"description": "Get a paginated list of rows from a dataset.\nUses offset-based pagination where:\n- start_index: The starting index (0-based). If None, starts from beginning.\n- limit: Number of items to return. If None or -1, returns all items.\n\nThe response includes:\n- data: List of items for the current page\n- has_more: Whether there are more items available after this set",
 "parameters": [
 {
 "name": "dataset_id",
@@ -8073,7 +8073,7 @@
 "additionalProperties": false,
 "title": "ToolInvocationResult"
 },
-"IterrowsResponse": {
+"PaginatedResponse": {
 "type": "object",
 "properties": {
 "data": {
@@ -8103,19 +8103,20 @@
 ]
 }
 },
-"description": "The rows in the current page."
+"description": "The list of items for the current page"
 },
-"next_start_index": {
-"type": "integer",
-"description": "Index into dataset for the first row in the next page. None if there are no more rows."
+"has_more": {
+"type": "boolean",
+"description": "Whether there are more items available after this set"
 }
 },
 "additionalProperties": false,
 "required": [
-"data"
+"data",
+"has_more"
 ],
-"title": "IterrowsResponse",
-"description": "A paginated list of rows from a dataset."
+"title": "PaginatedResponse",
+"description": "A generic paginated response that follows a simple format."
 },
 "Job": {
 "type": "object",

@@ -1443,7 +1443,7 @@ paths:
 content:
 application/json:
 schema:
-$ref: '#/components/schemas/IterrowsResponse'
+$ref: '#/components/schemas/PaginatedResponse'
 '400':
 $ref: '#/components/responses/BadRequest400'
 '429':
@@ -1457,7 +1457,20 @@ paths:
 tags:
 - DatasetIO
 description: >-
-Get a paginated list of rows from a dataset. Uses cursor-based pagination.
+Get a paginated list of rows from a dataset.
+Uses offset-based pagination where:
+- start_index: The starting index (0-based). If None, starts from beginning.
+- limit: Number of items to return. If None or -1, returns all items.
+The response includes:
+- data: List of items for the current page
+- has_more: Whether there are more items available after this set
 parameters:
 - name: dataset_id
 in: path
@@ -5542,7 +5555,7 @@ components:
 - type: object
 additionalProperties: false
 title: ToolInvocationResult
-IterrowsResponse:
+PaginatedResponse:
 type: object
 properties:
 data:
@@ -5557,17 +5570,18 @@ components:
 - type: string
 - type: array
 - type: object
-description: The rows in the current page.
-next_start_index:
-type: integer
+description: The list of items for the current page
+has_more:
+type: boolean
 description: >-
-Index into dataset for the first row in the next page. None if there are
-no more rows.
+Whether there are more items available after this set
 additionalProperties: false
 required:
 - data
-title: IterrowsResponse
-description: A paginated list of rows from a dataset.
+- has_more
+title: PaginatedResponse
+description: >-
+A generic paginated response that follows a simple format.
 Job:
 type: object
 properties: