llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-22 07:46:17 +00:00

Author	SHA1	Message	Date
Charlie Doern	5d52cb28c2	ci: record-if-missing when coming from stainless (#4408) # What does this PR do? We will typically need to record the missing JSON for net-new APIs. Use record-if-missing so that the integration tests can re-record and commit the files to the PR. Set the stainless inference mode to record-if-missing, and properly pass the pr_head_sha on workflow_call. ## Test Plan See `2031824567`, which uses this commit. Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-12-18 09:40:14 -08:00
Charlie Doern	66f3cf4002	feat: wire Stainless preview SDK into integration tests (#4360) # What does this PR do? Enable the stainless-builds workflow to test preview SDKs by calling the integration-tests workflow with the python_url parameter. Add a stainless matrix config for faster CI runs on SDK changes. - Make integration-tests.yml reusable with workflow_call inputs - Thread python_url through test setup actions to install the preview SDK - Add a matrix_key parameter to generate_ci_matrix.py for custom matrices - Update stainless-builds.yml to call integration tests with the preview URL This allows us to test a client on the PR introducing the new changes before merging. Contributors can even write new tests using the generated client; these should pass on the PR, indicating that they will pass on main upon merge. ## Test Plan See the triggered action using the workflows on this branch: `5810594042`, which installs the stainless SDK from the given URL. --------- Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-12-16 09:20:40 -08:00
Ashwin Bharambe	492f79ca9b	fix: harden storage semantics (#4118) Fixes issues in the storage system by guaranteeing immediate durability for responses and ensuring background writers stay alive. Three related fixes: * Responses to the OpenAI-compatible API now write directly to Postgres/SQLite inside the request instead of detouring through an async queue that might never drain; this restores the expected read-after-write behavior and removes the "response not found" races reported by users. * The access-control shim was stamping owner_principal/access_attributes as SQL NULL, which Postgres interprets as non-public rows; fixing it to use the empty-string/JSON-null pattern means conversations and responses stored without an authenticated user stay queryable (matching SQLite). * The inference-store queue remains for batching, but its worker tasks now start lazily on the live event loop so server startup doesn't cancel them, and writes keep flowing even when the stack is launched via `llama stack run`. Closes #4115 ### Test Plan Added a matrix entry to test our "base" suite against Postgres as the store.	2025-11-12 10:35:39 -08:00
Derek Higgins	c62a09ab76	ci: Add vLLM support to integration testing infrastructure (with qwen) (#3545) Introduces vLLM provider support to the record/replay testing framework, enabling both recording and replay of vLLM API interactions alongside existing Ollama support. The changes enable testing of vLLM functionality. vLLM tests focus on inference capabilities, while Ollama continues to exercise the full API surface, including vision features. -- This is an alternative to #3128 that uses qwen3 instead of llama 3.2 1B; qwen3 appears to be more capable at structured output and tool calls. --------- Signed-off-by: Derek Higgins <derekh@redhat.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-11-06 10:36:40 +01:00
Ashwin Bharambe	4d3069bfa5	chore(ci): remove unused recordings (#4074) Added a script to clean up recordings. While doing this, moved the CI matrix generation to a separate script so there is a single source of truth for the matrix. Ran the cleanup script as: ``` PYTHONPATH=. python scripts/cleanup_recordings.py ``` Also added this as part of the pre-commit workflow to ensure that the recordings are always up to date and that no stale recordings are left in the repo.	2025-11-05 09:21:58 -08:00

5 commits