Add a post-processing step that converts anyOf schemas containing
multiple const string values into proper enum types. This fixes the
Schema/EnumDescriptionNotValid error from Stainless by ensuring enum
schemas are properly formatted instead of using anyOf with const values.
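A minimal sketch of such a pass, assuming a recursive walk over the spec's schema dicts (function name and traversal details are illustrative, not the actual implementation):
```python
def convert_const_anyof_to_enum(schema) -> None:
    """Rewrite anyOf-of-const string schemas as a single enum, in place."""
    if isinstance(schema, list):
        for item in schema:
            convert_const_anyof_to_enum(item)
        return
    if not isinstance(schema, dict):
        return
    any_of = schema.get("anyOf")
    if (
        isinstance(any_of, list)
        and any_of
        and all(isinstance(m, dict) and isinstance(m.get("const"), str) for m in any_of)
    ):
        # {"anyOf": [{"const": "a"}, {"const": "b"}]} -> {"type": "string", "enum": ["a", "b"]}
        schema.pop("anyOf")
        schema["type"] = "string"
        schema["enum"] = [m["const"] for m in any_of]
    for value in schema.values():
        convert_const_anyof_to_enum(value)
```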
Signed-off-by: Sébastien Han <seb@redhat.com>
Replace deprecated `post /v1/models` with `get /v1/models` in the headline
example to fix Stainless Endpoint/NotFound error.
Signed-off-by: Sébastien Han <seb@redhat.com>
Filter out deprecated endpoints from the combined OpenAPI spec and remove
their references from the Stainless config to fix Endpoint/NotFound
errors.
Signed-off-by: Sébastien Han <seb@redhat.com>
The error is that Query default values can't be set in Annotated; they
must be set with `=` in the function signature. The offending parameter
was `include_embeddings`.
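For illustration, a minimal FastAPI reproduction of the fix (the route path is a hypothetical stand-in):
```python
from typing import Annotated

from fastapi import FastAPI, Query

app = FastAPI()

# Broken: FastAPI raises an AssertionError at import time when the
# default is set inside Query() while using Annotated, e.g.:
#   async def get_content(include_embeddings: Annotated[bool, Query(default=False)]): ...

# Fixed: keep the Query() metadata in Annotated, set the default with `=`.
@app.get("/content")
async def get_content(include_embeddings: Annotated[bool, Query()] = False):
    return {"include_embeddings": include_embeddings}
```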
Signed-off-by: Sébastien Han <seb@redhat.com>
Added _add_titles_to_unions() to:
- Recursively scan all schemas for anyOf/oneOf unions
- Generate descriptive titles from the union members
- Add those titles to help code generators infer names
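A rough sketch of what such a pass can look like (assumed shape, with an illustrative `_walk` helper; not the actual code):
```python
def _add_titles_to_unions(schemas: dict) -> None:
    """Attach a derived title to anonymous anyOf/oneOf unions, in place."""
    for schema in schemas.values():
        _walk(schema)

def _walk(node) -> None:
    if isinstance(node, list):
        for item in node:
            _walk(item)
        return
    if not isinstance(node, dict):
        return
    for key in ("anyOf", "oneOf"):
        members = node.get(key)
        if isinstance(members, list) and "title" not in node:
            # Derive a name from $ref members, e.g. ["Chunk", "Text"] -> "ChunkOrText".
            names = [m["$ref"].rsplit("/", 1)[-1] for m in members
                     if isinstance(m, dict) and "$ref" in m]
            if names:
                node["title"] = "Or".join(names)
    for value in node.values():
        _walk(value)
```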
Signed-off-by: Sébastien Han <seb@redhat.com>
The _filter_combined_schema function was excluding deprecated
operations. I updated it to include all operations (deprecated and
non-deprecated) for the combined/stainless spec, so these deprecated
endpoints are now included.
Signed-off-by: Sébastien Han <seb@redhat.com>
_filter_combined_schema was using path-level filtering with
_is_path_deprecated, which excluded an entire path if any of its
operations was deprecated. Since /v1/toolgroups has both GET (not
deprecated) and POST (deprecated), the whole path was excluded, removing
the GET operation and its response schema. Updated
_filter_combined_schema to use operation-level filtering, matching
_filter_schema_by_version.
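A sketch of the operation-level variant (constant and function names are illustrative):
```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}

def _filter_deprecated_operations(paths: dict) -> dict:
    """Drop deprecated operations, keeping a path while any operation survives."""
    filtered = {}
    for path, item in paths.items():
        kept = {
            key: value
            for key, value in item.items()
            if key not in HTTP_METHODS or not value.get("deprecated", False)
        }
        # /v1/toolgroups keeps its GET even though its POST is deprecated.
        if any(key in HTTP_METHODS for key in kept):
            filtered[path] = kept
    return filtered
```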
Signed-off-by: Sébastien Han <seb@redhat.com>
Removes the need for the strong_typing and pyopenapi packages; schema
generation now relies purely on Pydantic.
Our generator is built on Pydantic and FastAPI and is available at
`scripts/fastapi_generator.py`. You can run it like so:
```
uv run ./scripts/run_openapi_generator.sh
```
The generator will:
* Generate the deprecated, experimental, stable and combined specs
* Validate all the specs it generates against OpenAPI standards
A few schema changes required some updates to the oasdiff checks, so
I've made the following ignore rules. The new Pydantic-based generator is
likely more correct and follows OpenAPI standards better than the old
pyopenapi generator. Instead of trying to make the new generator match
the old one's quirks, we should focus on what's actually correct
according to OpenAPI standards.
These are non-critical changes:
* response-property-became-nullable: Backward compatible:
existing non-null values still work, now also accepts null
* response-required-property-removed: oasdiff reports a false
positive because it doesn't resolve $refs inside anyOf; we could use a
tool like 'redocly' to flatten the schema into a single file.
* response-property-type-changed: properties are still object
types, but oasdiff doesn't resolve $refs, so it flags the missing
inline `type: object` even though the referenced schemas define
`type: object`
* request-property-one-of-removed: These are false positives
caused by schema restructuring (wrapping in anyOf for nullability,
using -Input variants, or simplifying nested oneOf structures)
that don't change the actual API contract - the same data types are
still accepted, just represented differently in the schema.
* request-parameter-enum-value-removed: These are false
positives caused by oasdiff not resolving $refs - the enum values
(asc, desc, assistants, batch) are still present in the referenced
schemas (Order and OpenAIFilePurpose), just represented via schema
references instead of inline enums.
* request-property-enum-value-removed: this is a false positive caused
by oasdiff not resolving $refs - the enum values (llm, embedding,
rerank) are still present in the referenced ModelType schema,
just represented via schema reference instead of inline enums.
* request-property-type-changed: These are schema quality issues
where type information is missing (due to Any fallback in dynamic
model creation), but the API contract remains unchanged -
properties still exist with correct names and defaults, so the same
requests will work.
* response-body-type-changed: These are false positives caused
by schema representation changes (from inferred/empty types to
explicit $ref schemas, or vice versa) - the actual response types
and API contract remain unchanged, just how they're represented in the
OpenAPI spec.
* response-media-type-removed: This is a false positive caused
by FastAPI's OpenAPI generator not documenting union return types with
AsyncIterator - the streaming functionality with text/event-stream
media type still works when stream=True is passed, it's just not
reflected in the generated OpenAPI spec.
* request-body-type-changed: This is a schema correction - the
old spec incorrectly represented the request body as an object, but
the function signature shows chunks: list[Chunk], so the new spec
correctly shows it as an array, matching the actual API
implementation.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
The directory structure was `src/llama-stack-api/llama_stack_api`;
it should just be `src/llama_stack_api` to match the other packages.
Update the structure and the pyproject/linting config.
---------
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
# What does this PR do?
Without this we get below in server logs
```
RuntimeError: OpenAI response failed: InferenceRouter._construct_metrics() got an unexpected keyword argument
'model_id'
```
Seems the method signature got updated but this callsite was not.
## Test Plan
CI and test with Sabre (Agent framework integration)
# What does this PR do?
Error out when creating vector store with unknown embedding model
Closes https://github.com/llamastack/llama-stack/issues/4047
## Test Plan
Added tests
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
Extract API definitions and provider specifications into a standalone
llama-stack-api package that can be published to PyPI independently of
the main llama-stack server.
see: https://github.com/llamastack/llama-stack/pull/2978 and
https://github.com/llamastack/llama-stack/pull/2978#issuecomment-3145115942
Motivation
External providers currently import from llama-stack, which overrides
the installed version and causes dependency conflicts. This separation
allows external providers to:
- Install only the type definitions they need without server
dependencies
- Avoid version conflicts with the installed llama-stack package
- Be versioned and released independently
This enables us to re-enable external provider module tests that were
previously blocked by these import conflicts.
Changes
- Created llama-stack-api package with minimal dependencies (pydantic,
jsonschema)
- Moved APIs, providers datatypes, strong_typing, and schema_utils
- Updated all imports from llama_stack.* to llama_stack_api.*
- Configured local editable install for development workflow
- Updated linting and type-checking configuration for both packages
Next Steps
- Publish llama-stack-api to PyPI
- Update external provider dependencies
- Re-enable external provider module tests
Precursor PRs to this one:
- #4093
- #3954
- #4064
These PRs moved key pieces _out_ of the Api pkg, limiting the scope of
change here.
relates to #3237
## Test Plan
Package builds successfully and can be imported independently. All
pre-commit hooks pass with expected exclusions maintained.
---------
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
- force a minimum pre-commit version
- pin to >= 4.3.0 when installing
---------
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
# What does this PR do?
Building/Deploying docs is failing here:
5530320962 (step):8:49
Needs the playground file. Updated it to reflect current admin status.
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Fixed a bug where models with no provider_model_id were incorrectly
filtered from the startup config display. The function was checking
multiple fields when it should only filter items with an explicitly
disabled provider_id.
Changes:
- Modified remove_disabled_providers to only check the provider_id field
- Changed the condition from checking multiple fields against None to
only checking provider_id for "__disabled__", None, or empty string
- Added comprehensive unit tests
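A condensed sketch of the corrected check (the signature is illustrative):
```python
def remove_disabled_providers(items: list[dict]) -> list[dict]:
    """Drop only items whose provider_id is explicitly disabled or unset;
    a missing provider_model_id alone must not hide a model."""
    return [
        item
        for item in items
        if item.get("provider_id") not in ("__disabled__", None, "")
    ]
```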
Closes: #4131
Signed-off-by: Derek Higgins <derekh@redhat.com>
We would like to run all OpenAI compatibility tests using only the
openai-client library. This is most friendly for contributors since they
can run tests without needing to update the client SDKs (which is
getting easier but is still a long pole).
This is the first step in enabling that -- not using the "library
client" for any of the Responses tests. This seems like a reasonable
trade-off since the usage of an embeddable library client for Responses
(or any OpenAI-compatible) behavior seems to be uncommon. To do this, we
needed to enable MCP tests (which only worked in library client mode)
for server mode.
docs: Add comprehensive Files API and Vector Store integration
documentation
- Add Files API documentation with OpenAI-compatible endpoints
- Create comprehensive guide for OpenAI-compatible file operations
- Reorganize documentation structure: move file operations to files/
directory
- Add vector store provider documentation for Milvus, SQLite-vec, FAISS
- Clean up redundant files and improve navigation
- Update cross-references and eliminate documentation duplication
- Support for release 0.2.14 FileResponse and Vector Store API features
# What does this PR do?
A few changes to the storage layer to ensure we reduce unnecessary
contention arising out of our design choices (and letting the database
layer do its correct thing):
- SQL stores now share a single `SqlAlchemySqlStoreImpl` per backend,
and `kvstore_impl` caches instances per `(backend, namespace)`. This
avoids spawning multiple SQLite connections for the same file, reducing
lock contention and aligning the cache story for all backends.
- Added an async upsert API (with SQLite/Postgres dialect inserts) and
routed it through `AuthorizedSqlStore`, then switched conversations and
responses to call it. Using native `ON CONFLICT DO UPDATE` eliminates
the insert-then-update retry window that previously caused long WAL lock
retries.
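Two minimal sketches of these changes, with hypothetical names and table layout (the real code paths live behind `kvstore_impl` and `AuthorizedSqlStore`). First, caching per `(backend, namespace)` so the same SQLite file is opened once rather than once per caller (`_create_kvstore` is a hypothetical factory):
```python
_kvstore_cache: dict[tuple[str, str], object] = {}

def kvstore_impl(backend: str, namespace: str):
    key = (backend, namespace)
    if key not in _kvstore_cache:
        _kvstore_cache[key] = _create_kvstore(backend, namespace)
    return _kvstore_cache[key]
```
Second, the dialect-aware upsert using SQLAlchemy's native `ON CONFLICT DO UPDATE` support:
```python
from sqlalchemy import Table
from sqlalchemy.dialects.postgresql import insert as pg_insert
from sqlalchemy.dialects.sqlite import insert as sqlite_insert
from sqlalchemy.ext.asyncio import AsyncSession

async def upsert(session: AsyncSession, table: Table, row: dict, key: str = "id") -> None:
    """Native upsert: one statement, no insert-then-update retry window."""
    insert_fn = pg_insert if session.bind.dialect.name == "postgresql" else sqlite_insert
    stmt = insert_fn(table).values(**row)
    stmt = stmt.on_conflict_do_update(
        index_elements=[key],
        set_={name: value for name, value in row.items() if name != key},
    )
    await session.execute(stmt)
```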
### Test Plan
Existing tests, added a unit test for `upsert()`
Fixes issues in the storage system by guaranteeing immediate durability
for responses and ensuring background writers stay alive. Three related
fixes:
* Responses to the OpenAI-compatible API now write directly to
Postgres/SQLite inside the request instead of detouring through an async
queue that might never drain; this restores the expected
read-after-write behavior and removes the "response not found" races
reported by users.
* The access-control shim was stamping owner_principal/access_attributes
as SQL NULL, which Postgres interprets as non-public rows; fixing it to
use the empty-string/JSON-null pattern means conversations and responses
stored without an authenticated user stay queryable (matching SQLite).
* The inference-store queue remains for batching, but its worker tasks
now start lazily on the live event loop, so server startup doesn't
cancel them; writes keep flowing even when the stack is launched via
`llama stack run`.
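A sketch of the lazy-start pattern (class and method names are hypothetical):
```python
import asyncio

class InferenceWriteQueue:
    """Batching queue whose worker starts on first use, not at server startup."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()
        self._worker: asyncio.Task | None = None

    async def enqueue(self, item) -> None:
        # Created here, the worker is bound to the loop that serves requests,
        # so startup machinery can't cancel it before any write happens.
        if self._worker is None or self._worker.done():
            self._worker = asyncio.create_task(self._drain())
        await self._queue.put(item)

    async def _drain(self) -> None:
        while True:
            item = await self._queue.get()
            ...  # flush the batched write to the store
```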
Closes #4115
### Test Plan
Added a matrix entry to test our "base" suite against Postgres as the
store.
Updated documentation to accurately reflect current behavior where
models are identified as provider_id/provider_model_id in the system.
Changes:
- Clarify that model_id is for configuration purposes only
- Explain that models are accessed as provider_id/provider_model_id
- Remove the outdated aliasing example that suggested model_id could be
used as a custom identifier
This corrects the documentation which previously suggested model_id
could be used to create friendly aliases, which is not how the code
actually works.
Signed-off-by: Derek Higgins <derekh@redhat.com>
Help users find the comprehensive integration testing docs by linking to
the record-replay documentation. This clarifies that the technical
README complements the main docs.
# What does this PR do?
- Updates `/vector_stores/{vector_store_id}/files/{file_id}/content` to
allow returning `embeddings` and `metadata` using the `extra_query`
- Updates the UI accordingly to display them.
- Update UI to support CRUD operations in the Vector Stores section and
adds a new modal exposing the functionality.
- Updates Vector Store update to fail if a user tries to update Provider
ID (which doesn't make sense to allow)
```python
In [1]: client.vector_stores.files.content(
vector_store_id=vector_store.id,
file_id=file.id,
extra_query={"include_embeddings": True, "include_metadata": True}
)
Out [1]: FileContentResponse(attributes={}, content=[Content(text='This is a test document to check if embeddings are generated properly.\n', type='text', embedding=[0.33760684728622437, ...,], chunk_metadata={'chunk_id': '62a63ae0-c202-f060-1b86-0a688995b8d3', 'document_id': 'file-27291dbc679642ac94ffac6d2810c339', 'source': None, 'created_timestamp': 1762053437, 'updated_timestamp': 1762053437, 'chunk_window': '0-13', 'chunk_tokenizer': 'DEFAULT_TIKTOKEN_TOKENIZER', 'chunk_embedding_model': 'sentence-transformers/nomic-ai/nomic-embed-text-v1.5', 'chunk_embedding_dimension': 768, 'content_token_count': 13, 'metadata_token_count': 9}, metadata={'filename': 'test-embedding.txt', 'chunk_id': '62a63ae0-c202-f060-1b86-0a688995b8d3', 'document_id': 'file-27291dbc679642ac94ffac6d2810c339', 'token_count': 13, 'metadata_token_count': 9})], file_id='file-27291dbc679642ac94ffac6d2810c339', filename='test-embedding.txt')
```
Screenshots of UI are displayed below:
### List Vector Store with Added "Create New Vector Store"
<img width="1912" height="491" alt="Screenshot 2025-11-06 at 10 47
25 PM"
src="https://github.com/user-attachments/assets/a3a3ddd9-758d-4005-ac9c-5047f03916f3"
/>
### Create New Vector Store
<img width="1918" height="1048" alt="Screenshot 2025-11-06 at 10 47
49 PM"
src="https://github.com/user-attachments/assets/b4dc0d31-696f-4e68-b109-27915090f158"
/>
### Edit Vector Store
<img width="1916" height="1355" alt="Screenshot 2025-11-06 at 10 48
32 PM"
src="https://github.com/user-attachments/assets/ec879c63-4cf7-489f-bb1e-57ccc7931414"
/>
### Vector Store Files Contents page (with Embeddings)
<img width="1914" height="849" alt="Screenshot 2025-11-06 at 11 54
32 PM"
src="https://github.com/user-attachments/assets/3095520d-0e90-41f7-83bd-652f6c3fbf27"
/>
### Vector Store Files Contents Details page (with Embeddings)
<img width="1916" height="1221" alt="Screenshot 2025-11-06 at 11 55
00 PM"
src="https://github.com/user-attachments/assets/e71dbdc5-5b49-472b-a43a-5785f58d196c"
/>
## Test Plan
Tests added for Middleware extension and Provider failures.
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
Add explicit connection cleanup and shorter timeouts to OpenAI client
fixtures. Fixes CI deadlock after 25+ tests due to connection pool
exhaustion. Also adds 60s timeout to test_conversation_context_loading
as safety net.
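A minimal sketch of the fixture shape (the `base_url` fixture and the specific timeout value are assumptions):
```python
from collections.abc import Iterator

import httpx
import pytest
from openai import OpenAI

@pytest.fixture
def openai_client(base_url: str) -> Iterator[OpenAI]:
    # A short timeout makes a wedged request fail fast instead of hanging CI.
    client = OpenAI(base_url=base_url, api_key="fake", timeout=httpx.Timeout(30.0))
    yield client
    # Explicitly close the HTTP connection pool so sockets are released
    # between tests instead of accumulating until the pool is exhausted.
    client.close()
```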
## Test Plan
tests pass
Signed-off-by: Charlie Doern <cdoern@redhat.com>