The term "artifacts" better represents the purpose of this API, which
handles outputs generated by API executions, eventually stored objects
that can be of served by any storage interface (file, objects).
This aligns better with the industry convention of 'artifacts' (build
outputs, process results) rather than generic 'files'. 'files' would
be appropriate if the goal was to store and retrieve files purely.
Additionally, in our context, artifact is a better term since it will
handle:
* Data produced by SDG (Synthetic Data Generation) - as input
* Output of a trained model - as output
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
The goal of this PR is code base modernization.
Schema reflection code needed a minor adjustment to handle
`types.UnionType` and `collections.abc.AsyncIterator`, both of which are
preferred in recent Python releases.
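For context on why that adjustment is needed: PEP 604 unions (`int | None`)
and `collections.abc.AsyncIterator` report different origins than their
`typing` counterparts, so reflection has to recognize both spellings. A
minimal sketch, using a hypothetical helper name:
```python
# Minimal sketch; unwrap_annotation is a hypothetical helper, not the
# actual llama_stack reflection API.
import collections.abc
import types
import typing


def unwrap_annotation(annotation: object) -> tuple:
    """Return the type arguments hidden behind unions and async iterators."""
    origin = typing.get_origin(annotation)
    # PEP 604 unions (`int | None`) have origin types.UnionType, while the
    # older `Union[int, None]` spelling has origin typing.Union.
    if origin in (types.UnionType, typing.Union):
        return typing.get_args(annotation)
    # typing.AsyncIterator is deprecated in favor of the abc class, whose
    # parameterized form reports collections.abc.AsyncIterator as origin.
    if origin is collections.abc.AsyncIterator:
        return typing.get_args(annotation)
    return (annotation,)


assert unwrap_annotation(int | None) == (int, type(None))
assert unwrap_annotation(collections.abc.AsyncIterator[str]) == (str,)
```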
Note to reviewers: almost all changes here were generated automatically
by pyupgrade, and some unused imports were cleaned up along the way. The
only changes worth noting are under `docs/openapi_generator` and
`llama_stack/strong_typing/schema.py`, where reflection code was updated
to deal with the "newer" types.
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
Move the pagination logic from the LocalFS and HuggingFace
implementations into a common helper function so that all providers
paginate consistently. This removes code duplication and centralizes the
pagination logic in one place.
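As a sketch of the shape such a helper could take (the name, signature,
and response fields here are assumptions for illustration, not the
actual code):
```python
# Hypothetical shared helper; paginate_records and the response shape are
# assumptions for illustration, not the actual llama_stack implementation.
from typing import Any


def paginate_records(
    records: list[dict[str, Any]],
    start_index: int | None = None,
    limit: int | None = None,
) -> dict[str, Any]:
    """Slice an in-memory list of rows identically for every provider."""
    start = start_index or 0
    end = len(records) if limit is None or limit < 0 else start + limit
    page = records[start:end]
    # next_start_index is None on the final page so callers know to stop.
    next_index = end if end < len(records) else None
    return {"data": page, "next_start_index": next_index}
```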
## Test Plan
Run this script:
```python
from llama_stack_client import LlamaStackClient

# Initialize the client
client = LlamaStackClient(base_url="http://localhost:8321")

# Register a dataset
response = client.datasets.register(
    purpose="eval/messages-answer",  # or "eval/question-answer" or "post-training/messages"
    source={"type": "uri", "uri": "huggingface://datasets/llamastack/simpleqa?split=train"},
    dataset_id="my_dataset",  # optional, will be auto-generated if not provided
    metadata={"description": "My evaluation dataset"},  # optional
)

# Verify the dataset was registered by listing all datasets
datasets = client.datasets.list()
print(f"Registered datasets: {[d.identifier for d in datasets]}")

# You can then access the data using the datasetio API
# rows = client.datasets.iterrows(dataset_id="my_dataset", start_index=1, limit=2)
rows = client.datasets.iterrows(dataset_id="my_dataset")
print(f"Data: {rows.data}")
```
And play with `start_index` and `limit`.
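Continuing from the script above, a paging loop could look like this
(the `next_start_index` attribute is an assumption for illustration, not
a verified client field):
```python
# Hypothetical paging loop; next_start_index is assumed, not verified
# against the client library.
start = 0
while start is not None:
    page = client.datasets.iterrows(dataset_id="my_dataset", start_index=start, limit=2)
    print(f"Fetched {len(page.data)} rows starting at {start}")
    start = getattr(page, "next_start_index", None)
```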
Signed-off-by: Sébastien Han <seb@redhat.com>