# What does this PR do?
Resolves #4102
1. Added `web_search_2025_08_26` to the `WebSearchToolTypes` list and
the `OpenAIResponseInputToolWebSearch.type` Literal union (sketched after this list)
2. No changes needed to tool execution logic - all `web_search` types
map to the same underlying tool
3. Backward compatibility is maintained - existing `web_search`,
`web_search_preview`, and `web_search_preview_2025_03_11` types continue
to work
4. Added an integration test case using {"type":
"web_search_2025_08_26"} to verify it works correctly
5. Updated `docs/docs/providers/openai_responses_limitations.mdx` to
reflect that `web_search_2025_08_26` is now supported.
6. Removed incorrect references to `MOD1/MOD2/MOD3` (which don't exist
in the codebase)
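
A minimal sketch of the type-level change from item 1 (the class and list names come from this PR; the exact module layout and any other fields are assumptions):

```python
from typing import Literal

from pydantic import BaseModel

# Sketch only: WebSearchToolTypes and OpenAIResponseInputToolWebSearch are the
# names from this PR; the surrounding layout is an assumption.
WebSearchToolTypes = [
    "web_search",
    "web_search_preview",
    "web_search_preview_2025_03_11",
    "web_search_2025_08_26",  # newly added type
]


class OpenAIResponseInputToolWebSearch(BaseModel):
    type: Literal[
        "web_search",
        "web_search_preview",
        "web_search_preview_2025_03_11",
        "web_search_2025_08_26",
    ] = "web_search"
```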
## Test Plan
---------
Signed-off-by: Aakanksha Duggal <aduggal@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
This dependency has been bothering folks for a long time (cc @leseb). We
really needed it due to "library client" which is primarily used for our
tests and is not a part of the Stack server. Anyone who needs to use the
library client can certainly install `llama-stack-client` in their
environment to make that work.
Updated the notebook references to install `llama-stack-client`
additionally when setting things up.
https://github.com/llamastack/llama-stack/pull/4055 cleaned up the agents
implementation, but in doing so it removed some tests that actually
corresponded to the responses implementation. This PR brings those tests
and associated recordings back.
(We should likely combine all responses tests into one suite, but that
is beyond the scope of this PR.)
- Introduces vLLM provider support to the record/replay testing framework
- Enables both recording and replay of vLLM API interactions alongside
existing Ollama support.

The changes enable testing of vLLM functionality. vLLM tests focus on
inference capabilities, while Ollama continues to exercise the full API
surface, including vision features.
--
This is an alternative to #3128; qwen3 appears to be more capable than
llama 3.2 1B at structured output and tool calls.
---------
Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
# What does this PR do?
- when create vector store is called without a chunking strategy, we now
set the strategy that was actually used so that the value is persisted
instead of `strategy='None'`
## Test Plan
updated tests
# What does this PR do?
1. Make telemetry tests as easy as possible for users by expanding the
`SpanStub` dataclass and creating the `MetricStub` dataclass as a way
to consistently marshal telemetry data in test fixtures and unmarshal
and handle it in tests (see the sketch after this list).
2. Structure server and client tests to always follow the same standards
for a consistent testing experience by using the `SpanStub` and
`MetricStub` dataclass objects.
3. Enable Metrics Testing for completions endpoint
4. Correct token metrics to use histograms instead of counts to capture
tokens per request rather than a cumulative count of tokens over the
lifecycle of the server.
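
A minimal sketch of the stub-based approach from items 1 and 2 (field names below are assumptions, not the actual definitions):

```python
from dataclasses import dataclass, field


# Sketch only: field names are assumptions used to illustrate how tests
# marshal telemetry data into stubs and assert on it.
@dataclass
class SpanStub:
    name: str
    attributes: dict = field(default_factory=dict)


@dataclass
class MetricStub:
    name: str
    value: float
    unit: str = ""
    attributes: dict = field(default_factory=dict)


def assert_completion_token_metrics(metrics: list[MetricStub]) -> None:
    # With histogram-style token metrics, each request contributes its own
    # observation instead of bumping a cumulative counter.
    prompt_tokens = [m for m in metrics if m.name == "prompt_tokens"]
    assert prompt_tokens, "expected at least one prompt_tokens observation"
    assert all(m.value > 0 for m in prompt_tokens)
```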
## Test Plan
The changes are themselves tests.
Added a script to cleanup recordings. While doing this, moved the CI
matrix generation to a separate script so there is a single source of
truth for the matrix.
Ran the cleanup script as:
```
PYTHONPATH=. python scripts/cleanup_recordings.py
```
Also added this as part of the pre-commit workflow to ensure that the
recordings are always up to date and that no stale recordings are left
in the repo.
- Removes the deprecated agents (sessions and turns) API that was marked
alpha in 0.3.0
- Cleans up unused imports and orphaned types after the API removal
- Removes `SessionNotFoundError` and `AgentTurnInputType` which are no
longer needed
The agents API is completely superseded by the Responses + Conversations
APIs, and the client SDK Agent class already uses those implementations.
Corresponding client-side PR:
https://github.com/llamastack/llama-stack-client-python/pull/295
The llama-stack-client now uses `/v1/openai/v1/models`, which returns
OpenAI-compatible model objects with 'id' and 'custom_metadata' fields
instead of the Resource-style 'identifier' field. Updated api_recorder
to handle the new endpoint and modified tests to access model metadata
appropriately. Deleted stale model recordings for re-recording.
**NOTE: CI will be red on this one since it is dependent on
https://github.com/llamastack/llama-stack-client-python/pull/291/files
landing. I verified locally that it is green.**
Without this hint, Qwen3-0.6B tends to reply with the full name
and sometimes doesn't reply with the correct drafted year.
---------
Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
# What does this PR do?
Allow filtering for v1alpha, v1beta, deprecated, and v1 APIs. This is a
backward-incompatible change since, by default, only v1 APIs are returned now.
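A rough sketch of the filtering behavior (the type and attribute names below are illustrative assumptions, not the actual implementation):

```python
from dataclasses import dataclass


# Sketch only: ApiInfo and its attributes are illustrative, not real types.
@dataclass
class ApiInfo:
    name: str
    stability: str  # "v1", "v1alpha", "v1beta", or "deprecated"


def filter_apis(apis: list[ApiInfo], levels: tuple[str, ...] = ("v1",)) -> list[ApiInfo]:
    # The default now returns only stable v1 APIs; callers must opt in to
    # v1alpha, v1beta, or deprecated APIs explicitly.
    return [api for api in apis if api.stability in levels]
```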
## Test Plan
added unit test
# What does this PR do?
Add rerank API for NVIDIA Inference Provider.
Closes #3278
## Test Plan
Unit test:
```
pytest tests/unit/providers/nvidia/test_rerank_inference.py
```
Integration test:
```
pytest -s -v tests/integration/inference/test_rerank.py --stack-config="inference=nvidia" --rerank-model=nvidia/nvidia/nv-rerankqa-mistral-4b-v3 --env NVIDIA_API_KEY="" --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com"
```
… case variations
The ollama/llama3.2:3b-instruct-fp16 model returns string values with
trailing whitespace in structured JSON output. Updated test assertions
to use case-insensitive substring matching instead of exact equality.
- Use `.lower()` for case-insensitive comparison
- Check whether the expected value is contained in the actual value
(handles trailing whitespace), as sketched below
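
A small sketch of the relaxed assertion (the field name and values are illustrative, not taken from the actual test):

```python
# Sketch only: the field name and values are made up for illustration.
actual = {"name": "Alice \n"}  # structured output with trailing whitespace
expected = "alice"
assert expected.lower() in actual["name"].lower()
```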
Closes: #3996
Signed-off-by: Derek Higgins <derekh@redhat.com>
This should be "remote::vllm". This causes some log probs tests to be
skipped with remote vllm. (They
fail if run).
Signed-off-by: Derek Higgins <derekh@redhat.com>
# What does this PR do?
`chunk_id` in the `Chunk` class executes actual logic to compute a chunk ID.
This sort of logic should not live in the API spec.
Instead, the providers should be in charge of calling `generate_chunk_id`
and passing the result to `Chunk`.
This removes the incorrect dependency between the provider impl and the API impl.
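A minimal sketch of the intended split, assuming the call shapes below (`generate_chunk_id` and `Chunk` are the names from this PR; imports are omitted):

```python
# Sketch only: the provider computes the ID and passes it in, rather than
# Chunk computing it inside the API type. Signatures are assumptions.
def build_chunk(document_id: str, chunk_text: str):
    chunk_id = generate_chunk_id(document_id, chunk_text)  # provider-side helper
    return Chunk(
        content=chunk_text,
        chunk_id=chunk_id,  # passed in, not computed by Chunk itself
        metadata={"document_id": document_id},
    )
```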
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
- Adds OpenAI files provider
- Note that file content retrieval is pretty limited by `purpose`
https://community.openai.com/t/file-uploads-error-why-can-t-i-download-files-with-purpose-user-data/1357013?utm_source=chatgpt.com
## Test Plan
Modify run yaml to use openai files provider:
```
files:
  - provider_id: openai
    provider_type: remote::openai
    config:
      api_key: ${env.OPENAI_API_KEY:=}
      metadata_store:
        backend: sql_default
        table_name: openai_files_metadata
# Then run files tests
❯ uv run --no-sync ./scripts/integration-tests.sh --stack-config server:ci-tests --inference-mode replay --setup ollama --suite base --pattern test_files
```
This PR enables routing of fully qualified model IDs of the form
`provider_id/model_id` even when the models are not registered with the
Stack.
Here's the situation: assume a remote inference provider which works
only when users provide their own API keys via
`X-LlamaStack-Provider-Data` header. By definition, we cannot list
models and hence update our routing registry. But because we _require_ a
provider ID in the models now, we can identify which provider to route
to and let that provider decide.
Note that we still try to look up our registry since it may have a
pre-registered alias. Just that we don't outright fail when we are not
able to look it up.
Also, updated inference router so that the responses have the _exact_
model that the request had.
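A rough sketch of the routing idea (not the actual router code):

```python
# Sketch only: a fully qualified "provider_id/model_id" carries its provider,
# so we can route even when the model was never registered with the Stack.
def resolve_provider_id(model_id: str, registry: dict[str, str]) -> str:
    # Prefer a pre-registered alias if the registry knows about it.
    if model_id in registry:
        return registry[model_id]
    # Otherwise fall back to the provider prefix; unqualified IDs still fail.
    provider_id, _, remainder = model_id.partition("/")
    if not remainder:
        raise ValueError(f"cannot route unqualified model id: {model_id}")
    return provider_id
```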
## Test Plan
Added an integration test
Closes#3929
---------
Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>
This patch ensures that if max tokens is not defined, it is set to None
instead of 0 when calling openai_chat_completion. This way, some
providers (like Gemini) that cannot handle `max_tokens = 0` will not
fail.
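A tiny sketch of the intent (the helper is illustrative, not the actual code):

```python
# Sketch only: coerce an unset or zero max_tokens to None so providers such
# as Gemini apply their own default instead of rejecting the request.
def normalize_max_tokens(max_tokens: int | None) -> int | None:
    return max_tokens if max_tokens else None


assert normalize_max_tokens(0) is None      # previously forwarded as 0
assert normalize_max_tokens(None) is None
assert normalize_max_tokens(256) == 256
```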
Issue: #3666
The vector_provider_wrapper was only limiting providers to
faiss/sqlite-vec for replay mode, but CI tests also run in record mode
with the same limited set of providers. This caused test failures when
trying to test against milvus, chromadb, pgvector, weaviate, and qdrant
which aren't configured in the record job.
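A rough sketch of the fix under assumed names (the actual wrapper differs):

```python
# Sketch only: apply the same reduced provider set for both record and replay,
# since the heavier backends aren't configured in the record job either.
RECORDED_VECTOR_PROVIDERS = ["faiss", "sqlite-vec"]
ALL_VECTOR_PROVIDERS = RECORDED_VECTOR_PROVIDERS + [
    "milvus", "chromadb", "pgvector", "weaviate", "qdrant",
]


def vector_providers(inference_mode: str) -> list[str]:
    if inference_mode in ("record", "replay"):  # previously only "replay"
        return RECORDED_VECTOR_PROVIDERS
    return ALL_VECTOR_PROVIDERS
```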
# What does this PR do?
Clean up telemetry code since the telemetry API has been removed.
- moved telemetry files out of providers to core
- removed telemetry from `Api`
## Test Plan
```
❯ OTEL_SERVICE_NAME=llama_stack OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 uv run llama stack run starter

❯ curl http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
```
-> verify traces in Grafana
CI
Let us enable responses suite in CI now.
Also a minor fix: MCP tool tests intentionally trigger authentication
failures to verify error handling, but the resulting error logs clutter
test output.
Move conversation sync logic before the yield to ensure it executes even
when streaming consumers break early after receiving the
response.completed event.
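A minimal sketch of the ordering fix (names are illustrative, not the actual implementation):

```python
# Sketch only: persist conversation state *before* yielding the terminal
# event, so a consumer that breaks right after response.completed cannot
# skip the sync.
async def stream_events(events, response, sync_conversation):
    for event in events:
        yield event
    # Sync first: if this ran after the final yield, a consumer breaking on
    # response.completed would leave the conversation un-synced.
    await sync_conversation(response)
    yield {"type": "response.completed", "response": response}
```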
## Test Plan
```
OLLAMA_URL=http://localhost:11434 \
pytest -sv tests/integration/responses/ \
--stack-config server:ci-tests \
--text-model ollama/llama3.2:3b-instruct-fp16 \
--inference-mode live \
-k conversation_multi
```
This test now passes.
# What does this PR do?
Refactor setting default vector store provider and embedding model to
use an optional `vector_stores` config in the `StackRunConfig` and clean
up code to do so (had to add back in some pieces of VectorDB). Also
added remote Qdrant and Weaviate to starter distro (based on other PR
where inference providers were added for UX).
New config is simply (default for Starter distro):
```yaml
vector_stores:
  default_provider_id: faiss
  default_embedding_model:
    provider_id: sentence-transformers
    model_id: nomic-ai/nomic-embed-text-v1.5
```
## Test Plan
CI and Unit tests.
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
**This PR changes configurations in a backward incompatible way.**
Run configs today repeat full SQLite/Postgres snippets everywhere a
store is needed, which means duplicated credentials, extra connection
pools, and lots of drift between files. This PR introduces named storage
backends so the stack and providers can share a single catalog and
reference those backends by name.
## Key Changes
- Add `storage.backends` to `StackRunConfig`, register each KV/SQL
backend once at startup, and validate that references point to the right
family.
- Move server stores under `storage.stores` with lightweight references
(backend + namespace/table) instead of full configs.
- Update every provider/config/doc to use the new reference style;
docs/codegen now surface the simplified YAML.
## Migration
Before:
```yaml
metadata_store:
  type: sqlite
  db_path: ~/.llama/distributions/foo/registry.db
inference_store:
  type: postgres
  host: ${env.POSTGRES_HOST}
  port: ${env.POSTGRES_PORT}
  db: ${env.POSTGRES_DB}
  user: ${env.POSTGRES_USER}
  password: ${env.POSTGRES_PASSWORD}
conversations_store:
  type: postgres
  host: ${env.POSTGRES_HOST}
  port: ${env.POSTGRES_PORT}
  db: ${env.POSTGRES_DB}
  user: ${env.POSTGRES_USER}
  password: ${env.POSTGRES_PASSWORD}
```
After:
```yaml
storage:
  backends:
    kv_default:
      type: kv_sqlite
      db_path: ~/.llama/distributions/foo/kvstore.db
    sql_default:
      type: sql_postgres
      host: ${env.POSTGRES_HOST}
      port: ${env.POSTGRES_PORT}
      db: ${env.POSTGRES_DB}
      user: ${env.POSTGRES_USER}
      password: ${env.POSTGRES_PASSWORD}
  stores:
    metadata:
      backend: kv_default
      namespace: registry
    inference:
      backend: sql_default
      table_name: inference_store
      max_write_queue_size: 10000
      num_writers: 4
    conversations:
      backend: sql_default
      table_name: openai_conversations
```
Provider configs follow the same pattern—for example, a Chroma vector
adapter switches from:
```yaml
providers:
  vector_io:
    - provider_id: chromadb
      provider_type: remote::chromadb
      config:
        url: ${env.CHROMADB_URL}
        kvstore:
          type: sqlite
          db_path: ~/.llama/distributions/foo/chroma.db
```
to:
```yaml
providers:
  vector_io:
    - provider_id: chromadb
      provider_type: remote::chromadb
      config:
        url: ${env.CHROMADB_URL}
        persistence:
          backend: kv_default
          namespace: vector_io::chroma_remote
```
Once the backends are declared, everything else just points at them, so
rotating credentials or swapping to Postgres happens in one place and
the stack reuses a single connection pool.
# Problem
The current inline provider appends the user provided instructions to
messages as a system prompt, but the returned response object does not
contain the instructions field (as specified in the OpenAI responses
spec).
# What does this PR do?
This pull request adds the instructions field to the response object
definition and updates the inline provider. It also ensures that
instructions from the previous response are not carried over to the next
response (as specified in the OpenAI spec).
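A rough sketch of the intended behavior (not the provider's actual code):

```python
# Sketch only: only the current request's instructions become the system
# prompt; instructions attached to a previous response are not carried over.
def build_messages(instructions, previous_messages, user_input):
    messages = []
    if instructions:
        messages.append({"role": "system", "content": instructions})
    # previous_messages intentionally excludes the prior request's instructions
    messages.extend(previous_messages)
    messages.append({"role": "user", "content": user_input})
    return messages
```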
Closes [#3566](https://github.com/llamastack/llama-stack/issues/3566)
## Test Plan
- Tested manually for the change in model response w.r.t. the supplied
instructions field.
- Added a unit test to check that the instructions from the previous
response are not carried over to the next response.
- Added integration tests to check instructions parameter in the
returned response object.
- Added new recordings for the integration tests.
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
# What does this PR do?
Adds a test and a standardized way to build future tests out for
telemetry in llama stack.
Contributes to https://github.com/llamastack/llama-stack/issues/3806
## Test Plan
This is the test plan 😎
In replay mode, inference is instantaneous. We don't need to wait 15
seconds for the batch to be done. Fixing polling to do exponential
backoff makes things work super fast.
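A small sketch of the polling change (delays and names are assumptions):

```python
import asyncio


# Sketch only: back off exponentially instead of sleeping a fixed 15 seconds,
# so replayed (instantaneous) batches complete almost immediately.
async def wait_for_batch(get_batch, batch_id: str, max_delay: float = 15.0):
    delay = 0.1
    while True:
        batch = await get_batch(batch_id)
        if batch["status"] in ("completed", "failed", "cancelled", "expired"):
            return batch
        await asyncio.sleep(delay)
        delay = min(delay * 2, max_delay)
```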
# What does this PR do?
When the stack config is set to server in docker
(STACK_CONFIG_ARG=--stack-config=http://localhost:8321), the env variable
was not getting correctly set and the test ID was not set, causing:
E openai.BadRequestError: Error code: 400 - {'detail': 'Invalid value:
Test ID is required for file ID allocation'}
5286461406
This fix is needed for test-and-cut to work.
## Test Plan
CI
**!!BREAKING CHANGE!!**
The lookup is also straightforward -- we always look for this identifier
and don't try to find a match for something without the provider_id
prefix.
Note that this ideally means we need to update the `register_model()`
API as well (we should kill "identifier" from there), but I am not doing
that as part of this PR.
## Test Plan
Existing unit tests
Wanted to re-enable Responses CI but it seems to hang for some reason
due to some interactions with conversations_store or responses_store.
## Test Plan
```
# library client
./scripts/integration-tests.sh --stack-config ci-tests --suite responses
# server
./scripts/integration-tests.sh --stack-config server:ci-tests --suite responses
```
# What does this PR do?
Have closed the previous PR due to merge conflicts with multiple PRs.
Addressed all comments from
https://github.com/llamastack/llama-stack/pull/3768 (sorry for carrying
them over to this one).
## Test Plan
Added UTs and integration tests
Handle a base case when no stored messages exist because no Response
call has been made.
## Test Plan
```
./scripts/integration-tests.sh --stack-config server:ci-tests \
--suite responses --inference-mode record-if-missing --pattern test_conversation_responses
```
This PR updates the Conversation item related types and improves a
couple of critical parts of the implementation:
- it creates a streaming output item for the final assistant message
output by the model. Until now we only added content parts and included
that message in the final response.
- it rewrites the conversation update code completely to account for items
other than messages (tool calls, outputs, etc.), as sketched below.
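
A rough sketch of the broader item handling (item kinds follow the OpenAI Responses item types; the actual conversation-update code differs in detail):

```python
# Sketch only: sync tool calls and outputs as first-class conversation items
# instead of handling plain messages only.
def to_conversation_items(output_items: list[dict]) -> list[dict]:
    items = []
    for item in output_items:
        kind = item.get("type")
        if kind == "message":
            items.append(item)
        elif kind in ("function_call", "function_call_output", "mcp_call", "mcp_list_tools"):
            items.append(item)
    return items
```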
## Test Plan
Used the test script from
https://github.com/llamastack/llama-stack-client-python/pull/281 for
this
```
TEST_API_BASE_URL=http://localhost:8321/v1 \
pytest tests/integration/test_agent_turn_step_events.py::test_client_side_function_tool -xvs
```