llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-03 18:00:36 +00:00

Author	SHA1	Message	Date
Sébastien Han	cf949d7fac	Merge `7b93964a16` into `4237eb4aaa`	2025-12-03 01:04:14 +00:00
Derek Higgins	fbf6c30cdc	fix: call setup_logging early to apply category-specific log levels (#4253) Category-specific log levels from LLAMA_STACK_LOGGING were not applied to loggers created before setup_logging() was called. This fix moves the setup_logging() call earlier in the initialization sequence to ensure all loggers respect their configured levels regardless of initialization timing. Closes: #4252 Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-12-02 13:29:04 -08:00
Derek Higgins	2fce5abe34	fix: Add policies to adapters (#4277) The configured policy wasn't being passed in, so the default was used instead (e.g. in the s3 file provider). Closes: #4276 Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-12-02 14:08:03 -05:00
Derek Higgins	4ff0c25c52	fix(files): Enforce DELETE action permission for file deletion (#4275) Previously, file deletion only checked READ permission via the _lookup_file_id() method. This meant any user with READ access to a file could also delete it, making it impossible to configure read-only file access. This change adds an 'action' parameter to fetch_all() and fetch_one() in AuthorizedSqlStore, defaulting to Action.READ for backward compatibility. The openai_delete_file() method now passes Action.DELETE, ensuring proper RBAC enforcement. With this fix, access policies can now distinguish users who can read/list files from users who can also delete them. Closes: #4274 Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-12-02 09:56:59 -08:00
Sébastien Han	7b93964a16	chore: extract the protocol into its own file The protocol lives in api.py now Signed-off-by: Sébastien Han <seb@redhat.com>	2025-12-02 15:19:41 +01:00
Sébastien Han	1ffaa04f09	chore: add a check for None route.methods can be None so let's check for that to make mypy happy :) Signed-off-by: Sébastien Han <seb@redhat.com>	2025-12-02 10:05:04 +01:00
Sébastien Han	3ce509e94a	Merge branch 'main' into routeur	2025-12-02 09:42:09 +01:00
Derek Higgins	9616448213	fix: use string annotations for S3Client type hints (#4242) Remove future annotations import and use quoted string annotations for S3Client to avoid import issues. Changes: o Remove __future__ annotations import o Use "S3Client" string annotations in type hints closes: #4241 Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-12-01 15:47:35 -08:00
Jaideep Rao	89807dc117	feat(api)!: deprecate `toolgroup` and `tool_runtime` apis (#4249) # What does this PR do? Marks `toolgroup` and `tool_runtime` APIs for deprecation. Closes #4233 and #4061 (partially) How long do we wait before we remove deprecated APIs? ## Test Plan Signed-off-by: Jaideep Rao <jrao@redhat.com>	2025-12-01 11:43:58 -08:00
Abhishek Bongale	618c03405c	feat: Add metadata field to request and response (#4237) This change adds an optional metadata field to the OpenAI-compatible request and response objects. fixes: #3564 Signed-off-by: Abhishek Bongale <abhishekbongale@outlook.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-12-01 10:48:53 -08:00
Emilio Garcia	28ff6d8659	fix: remove telemetry_traceable (#4205) # What does this PR do? Removes stale data about the old telemetry system from Llama Stack. Depends on https://github.com/llamastack/llama-stack/pull/4127 Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-12-01 10:40:57 -08:00
Emilio Garcia	7da733091a	feat!: Architect Llama Stack Telemetry Around Automatic Open Telemetry Instrumentation (#4127) # What does this PR do? Fixes: https://github.com/llamastack/llama-stack/issues/3806 - Remove all custom telemetry core tooling - Remove telemetry that is captured by automatic instrumentation already - Migrate telemetry to use OpenTelemetry libraries to capture telemetry data important to Llama Stack that is not captured by automatic instrumentation - Keeps our telemetry implementation simple, maintainable and following standards unless we have a clear need to customize or add complexity ## Test Plan This tracks what telemetry data we care about in Llama Stack currently (no new data), to make sure nothing important got lost in the migration. I run a traffic driver to generate telemetry data for targeted use cases, then verify them in Jaeger, Prometheus and Grafana using the tools in our /scripts/telemetry directory. ### Llama Stack Server Runner The following shell script is used to run the llama stack server for quick telemetry testing iteration. ```sh export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318" export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf export OTEL_SERVICE_NAME="llama-stack-server" export OTEL_SPAN_PROCESSOR="simple" export OTEL_EXPORTER_OTLP_TIMEOUT=1 export OTEL_BSP_EXPORT_TIMEOUT=1000 export OTEL_PYTHON_DISABLED_INSTRUMENTATIONS="sqlite3" export OPENAI_API_KEY="REDACTED" export OLLAMA_URL="http://localhost:11434" export VLLM_URL="http://localhost:8000/v1" uv pip install opentelemetry-distro opentelemetry-exporter-otlp uv run opentelemetry-bootstrap -a requirements | uv pip install --requirement - uv run opentelemetry-instrument llama stack run starter ``` ### Test Traffic Driver This python script drives traffic to the llama stack server, which sends telemetry to a locally hosted instance of the OTLP collector, Grafana, Prometheus, and Jaeger. 
```sh export OTEL_SERVICE_NAME="openai-client" export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf export OTEL_EXPORTER_OTLP_ENDPOINT="http://127.0.0.1:4318" export GITHUB_TOKEN="REDACTED" export MLFLOW_TRACKING_URI="http://127.0.0.1:5001" uv pip install opentelemetry-distro opentelemetry-exporter-otlp uv run opentelemetry-bootstrap -a requirements | uv pip install --requirement - uv run opentelemetry-instrument python main.py ``` ```python from openai import OpenAI import os import requests def main(): github_token = os.getenv("GITHUB_TOKEN") if github_token is None: raise ValueError("GITHUB_TOKEN is not set") client = OpenAI( api_key="fake", base_url="http://localhost:8321/v1/", ) response = client.chat.completions.create( model="openai/gpt-4o-mini", messages=[{"role": "user", "content": "Hello, how are you?"}] ) print("Sync response: ", response.choices[0].message.content) streaming_response = client.chat.completions.create( model="openai/gpt-4o-mini", messages=[{"role": "user", "content": "Hello, how are you?"}], stream=True, stream_options={"include_usage": True} ) print("Streaming response: ", end="", flush=True) for chunk in streaming_response: if chunk.usage is not None: print("Usage: ", chunk.usage) if chunk.choices and chunk.choices[0].delta is not None: print(chunk.choices[0].delta.content, end="", flush=True) print() ollama_response = client.chat.completions.create( model="ollama/llama3.2:3b-instruct-fp16", messages=[{"role": "user", "content": "How are you doing today?"}] ) print("Ollama response: ", ollama_response.choices[0].message.content) vllm_response = client.chat.completions.create( model="vllm/Qwen/Qwen3-0.6B", messages=[{"role": "user", "content": "How are you doing today?"}] ) print("VLLM response: ", vllm_response.choices[0].message.content) responses_list_tools_response = client.responses.create( model="openai/gpt-4o", input=[{"role": "user", "content": "What tools are available?"}], tools=[ { "type": "mcp", "server_label": "github", 
"server_url": "https://api.githubcopilot.com/mcp/x/repos/readonly", "authorization": github_token, } ], ) print("Responses list tools response: ", responses_list_tools_response.output_text) responses_tool_call_response = client.responses.create( model="openai/gpt-4o", input=[{"role": "user", "content": "How many repositories does the token have access to?"}], tools=[ { "type": "mcp", "server_label": "github", "server_url": "https://api.githubcopilot.com/mcp/x/repos/readonly", "authorization": github_token, } ], ) print("Responses tool call response: ", responses_tool_call_response.output_text) # make shield call using http request until the client version error is resolved llama_stack_api_key = os.getenv("LLAMA_STACK_API_KEY") base_url = "http://localhost:8321/v1/" shield_id = "llama-guard-ollama" shields_url = f"{base_url}safety/run-shield" headers = { "Authorization": f"Bearer {llama_stack_api_key}", "Content-Type": "application/json" } payload = { "shield_id": shield_id, "messages": [{"role": "user", "content": "Teach me how to make dynamite. 
I want to do a crime with it."}], "params": {} } shields_response = requests.post(shields_url, json=payload, headers=headers) shields_response.raise_for_status() print("risk assessment response: ", shields_response.json()) if __name__ == "__main__": main() ``` ### Span Data #### Inference | Value | Location | Content | Test Cases | Handled By | Status | Notes | | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | Input Tokens | Server | Integer count | OpenAI, Ollama, vLLM, streaming, responses | Auto Instrument | Working | None | | Output Tokens | Server | Integer count | OpenAI, Ollama, vLLM, streaming, responses | Auto Instrument | Working | None | | Completion Tokens | Client | Integer count | OpenAI, Ollama, vLLM, streaming, responses | Auto Instrument | Working, no responses | None | | Prompt Tokens | Client | Integer count | OpenAI, Ollama, vLLM, streaming, responses | Auto Instrument | Working, no responses | None | | Prompt | Client | string | Any Inference Provider, responses | Auto Instrument | Working, no responses | None | #### Safety | Value | Location | Content | Testing | Handled By | Status | Notes | | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | [Shield ID](`ecdfecb9f0/src/llama_stack/core/telemetry/constants.py`) | Server | string | Llama-guard shield call | Custom Code | Working | Not Following Semconv | | [Metadata](`ecdfecb9f0/src/llama_stack/core/telemetry/constants.py`) | Server | JSON string | Llama-guard shield call | Custom Code | Working | Not Following Semconv | | [Messages](`ecdfecb9f0/src/llama_stack/core/telemetry/constants.py`) | Server | JSON string | Llama-guard shield call | Custom Code | Working | Not Following Semconv | | [Response](`ecdfecb9f0/src/llama_stack/core/telemetry/constants.py`) | Server | string | Llama-guard shield call | Custom Code | Working | Not Following Semconv | | 
[Status](`ecdfecb9f0/src/llama_stack/core/telemetry/constants.py`) | Server | string | Llama-guard shield call | Custom Code | Working | Not Following Semconv | #### Remote Tool Listing & Execution | Value | Location | Content | Testing | Handled By | Status | Notes | | ----- | :---: | :---: | :---: | :---: | :---: | :---: | | Tool name | server | string | Tool call occurs | Custom Code | working | [Not following semconv](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/#execute-tool-span) | | Server URL | server | string | List tools or execute tool call | Custom Code | working | [Not following semconv](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/#execute-tool-span) | | Server Label | server | string | List tools or execute tool call | Custom code | working | [Not following semconv](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/#execute-tool-span) | | mcp_list_tools_id | server | string | List tools | Custom code | working | [Not following semconv](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/#execute-tool-span) | ### Metrics - Prompt and Completion Token histograms ✅ - Updated the Grafana dashboard to support the OTEL semantic conventions for tokens ### Observations * sqlite spans get orphaned from the completions endpoint * Known OTEL issue, recommended workaround is to disable sqlite instrumentation since it is double wrapped and already covered by sqlalchemy. This is covered in documentation. 
```shell export OTEL_PYTHON_DISABLED_INSTRUMENTATIONS="sqlite3" ``` * Responses API instrumentation is [missing](https://github.com/open-telemetry/opentelemetry-python-contrib/issues/3436) in open telemetry for OpenAI clients, even with traceloop or openllmetry * Upstream issues in opentelemetry-python-contrib * Span created for each streaming response, so each chunk → very large spans get created, which is not ideal, but it’s the intended behavior * MCP telemetry needs to be updated to follow semantic conventions. We can probably use a library for this and handle it in a separate issue. ### Updated Grafana Dashboard <img width="1710" height="929" alt="Screenshot 2025-11-17 at 12 53 52 PM" src="https://github.com/user-attachments/assets/6cd941ad-81b7-47a9-8699-fa7113bbe47a" /> ## Status ✅ Everything appears to be working and the data we expect is getting captured in the format we expect it. ## Follow Ups 1. Make tool calling spans follow semconv and capture more data 1. Consider using existing tracing library 2. Make shield spans follow semconv 3. Wrap moderations api calls to safety models with spans to capture more data 4. Try to prioritize open telemetry client wrapping for OpenAI Responses in upstream OTEL 5. This would break the telemetry tests, and they are currently disabled. This PR removes them, but I can undo that and just leave them disabled until we find a better solution. 6. Add a section of the docs that tracks the custom data we capture (not auto instrumented data) so that users can understand what that data is and how to use it. Commit those changes to the OTEL-gen_ai SIG if possible as well. Here is an [example](https://opentelemetry.io/docs/specs/semconv/gen-ai/aws-bedrock/) of how bedrock handles it.	2025-12-01 10:33:18 -08:00
Sébastien Han	98f202b607	Merge branch 'main' into routeur	2025-11-27 09:41:38 +01:00
Charlie Doern	aac494c5ba	fix: bind to proper default hosts (#4232) # What does this PR do? We used to have `host = config.server.host or ["::", "0.0.0.0"]` but now only bind to `host = config.server.host or "0.0.0.0"`. Revert to the old logic; this allows us to curl http://localhost:8321/v1/models on Fedora, which defaults to using IPv6. resolves #4210 Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-11-26 06:16:28 -05:00
Sébastien Han	f330c8eb2f	chore: simplify route addition when calling inspect https://github.com/llamastack/llama-stack/pull/4191/files#r2557411918 Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-25 13:48:47 +01:00
Sébastien Han	ead9e63ef8	fix: no inline import https://github.com/llamastack/llama-stack/pull/4191#discussion_r2557412421 Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-25 11:04:33 +01:00
Sébastien Han	3dc5b5d3a0	fix: more accurate type https://github.com/llamastack/llama-stack/pull/4191#discussion_r2557389025 Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-25 10:57:27 +01:00
Sébastien Han	b0b3034f16	chore: rm leftover Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-25 10:54:43 +01:00
Sébastien Han	9a2b4efabd	chore: clarify function and log about which router It's FastAPI Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-25 10:51:52 +01:00
Sébastien Han	3770963130	Merge branch 'main' into routeur	2025-11-24 14:58:43 +01:00
Sébastien Han	6d76a63eb7	fix: mypy Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-24 14:53:56 +01:00
Sébastien Han	a6aaf18bb6	chore: generate FastAPI dependency functions from Pydantic models to eliminate duplication Added create_query_dependency() and create_path_dependency() helpers that automatically generate FastAPI dependency functions from Pydantic models. This makes the models the single source of truth for field types, descriptions, and defaults, eliminating duplication between models.py and fastapi_routes.py. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-24 14:47:46 +01:00
Sébastien Han	4f08a62fa1	chore: remove telemetry code for routers addressed https://github.com/llamastack/llama-stack/pull/4191/files#r2554273774 Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-24 11:52:29 +01:00
Sébastien Han	87e60bc48f	chore: move dep functions outside of create_router Less indirection and clearer declarations. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-24 11:30:44 +01:00
Sébastien Han	49005f1a39	fix: use hardcoded list and dictionary mapping for router registry Replace dynamic import-based router discovery with an explicit hardcoded list of APIs that have routers. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-24 11:02:25 +01:00
Sébastien Han	03a31269ad	chore: more accurate route parsing Use our built-in version levels. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-24 09:00:41 +01:00
Ken Dreyer	dabebdd230	fix: update hard-coded google model names (#4212) # What does this PR do? When we send the model names to Google's openai API, we must use the "google" name prefix. Google does not recognize the "vertexai" model names. Closes #4211 ## Test Plan ```bash uv venv --python python312 . .venv/bin/activate llama stack list-deps starter | xargs -L1 uv pip install llama stack run starter ``` Test that this shows the gemini models with their correct names: ```bash curl http://127.0.0.1:8321/v1/models | jq '.data | map(select(.custom_metadata.provider_id == "vertexai"))' ``` Test that this chat completion works: ```bash curl -X POST -H "Content-Type: application/json" "http://127.0.0.1:8321/v1/chat/completions" -d '{ "model": "vertexai/google/gemini-2.5-flash", "messages": [ { "role": "system", "content": "You are a helpful assistant." }, { "role": "user", "content": "Hello! Can you tell me a joke?" } ], "temperature": 1.0, "max_tokens": 256 }' ```	2025-11-21 13:12:01 -08:00
Sébastien Han	ac816a6b25	fix: move models.py to top-level init All batch models are now exported from the top level for better discoverability and IDE support. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-21 15:56:44 +01:00
Ken Dreyer	dc4665af17	feat!: change bedrock bearer token env variable to match AWS docs & boto3 convention (#4152) Rename `AWS_BEDROCK_API_KEY` to `AWS_BEARER_TOKEN_BEDROCK` to align with the naming convention used in AWS Bedrock documentation and the AWS web console UI. This reduces confusion when developers compare LLS docs with AWS docs. Closes #4147	2025-11-21 09:48:05 -05:00
Sébastien Han	6f552e0a31	fix: mypy Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-21 12:18:25 +01:00
Sébastien Han	234eaf4709	chore: remove impl_getter function We already have an impl at this point, no need to validate this again. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-21 12:03:06 +01:00
Sébastien Han	95e9455335	chore: removed impl_getter from router function Refactored the router to accept the implementation directly instead of using the impl_getter pattern. The caller already knows which API it's building a router for. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-21 12:02:09 +01:00
Sébastien Han	8a21d8debe	chore: mv router_registry.py to fastapi_router_registry.py For clarity Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-21 11:44:25 +01:00
Sébastien Han	23e74446db	chore: rename routes.py to fastapi_routes.py Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-21 11:41:53 +01:00
Sébastien Han	9595619b9f	chore: remove empty dir Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-20 16:29:14 +01:00
Sébastien Han	20030429e7	chore: same as previous commit but for more fields Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-20 16:12:52 +01:00
Sébastien Han	30cab02083	chore: refactor Batches protocol to use request models This commit refactors the Batches protocol to use Pydantic request models for both create_batch and list_batches methods, improving consistency, readability, and maintainability. - create_batch now accepts a single CreateBatchRequest parameter instead of individual arguments. This aligns the protocol with FastAPI’s request model pattern, allowing the router to pass the request object directly without unpacking parameters. Provider implementations now access fields via request.input_file_id, request.endpoint, etc. - list_batches now accepts a single ListBatchesRequest parameter, replacing individual query parameters. The model includes after and limit fields with proper OpenAPI descriptions. FastAPI automatically parses query parameters into the model for GET requests, keeping router code clean. Provider implementations access fields via request.after and request.limit. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-20 16:00:34 +01:00
Sébastien Han	00e7ea6c3b	fix: adopt FastAPI directly in llama-stack-api This commit migrates the Batches API to use FastAPI routers directly in the API package, removing the need for custom decorator systems and manual router registration. The API package now defines FastAPI routers using standard FastAPI route decorators, making it self-sufficient and eliminating dependencies on the server package. The router implementation has been moved from llama_stack/core/server/routers/batches.py to llama_stack_api/batches/routes.py, where it belongs alongside the protocol and models. Standard error responses (standard_responses) have been moved from the server package to llama_stack_api/router_utils.py, ensuring the API package can define complete routers without server dependencies. FastAPI has been added as an explicit dependency to the llama-stack-api package, making it an intentional dependency rather than an implicit one. Router discovery is now fully automatic. The server discovers routers by checking for routes modules in each API package and looking for a create_router function. This eliminates the need for manual registration and makes the system scalable - new APIs with router modules are automatically discovered and used. The router registry has been simplified to use automatic discovery instead of maintaining a manual registry. The build_router function (renamed from create_router to better reflect its purpose) discovers and combines router factories with implementations to create the final router instances. Exposing Routers from the API is nice for the Bring Your Own API use case too. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-20 15:10:33 +01:00
Sébastien Han	2fe24a6df8	chore: move ListBatchesResponse to models.py Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-20 12:41:24 +01:00
Ashwin Bharambe	d649c3663e	fix: enforce allowed_models during inference requests (#4197 ) The `allowed_models` configuration was only being applied when listing models via the `/v1/models` endpoint, but the actual inference requests weren't checking this restriction. This meant users could directly request any model the provider supports by specifying it in their inference call, completely bypassing the intended cost controls. The fix adds validation to all three inference methods (chat completions, completions, and embeddings) that checks the requested model against the allowed_models list before making the provider API call. ### Test plan Added unit tests	2025-11-19 14:49:44 -08:00
Ian Miller	0757d5a917	feat(responses)!: implement support for OpenAI compatible prompts in Responses API (#3965) # What does this PR do? This PR provides the actual implementation of OpenAI-compatible prompts in the Responses API. It is the follow-up PR with the actual implementation, after #3942 was introduced. The need for this functionality was raised in #3514. > Note, https://github.com/llamastack/llama-stack/pull/3514 is divided into three separate PRs. Current PR is the third of three. Closes #3321 ## Test Plan Manual testing, CI workflow with added unit tests Comprehensive manual testing with new implementation: Test Prompts with Images with text on them in Responses API: I used this image for testing purposes: [iphone 17 image](https://github.com/user-attachments/assets/9e2ee821-e394-4bbd-b1c8-d48a3fa315de) 1. Upload an image: ``` curl -X POST http://localhost:8321/v1/files \ -H "Content-Type: multipart/form-data" \ -F "file=@/Users/ianmiller/iphone.jpeg" \ -F "purpose=assistants" ``` `{"object":"file","id":"file-d6d375f238e14f21952cc40246bc8504","bytes":556241,"created_at":1761750049,"expires_at":1793286049,"filename":"iphone.jpeg","purpose":"assistants"}%` 2. Create prompt: ``` curl -X POST http://localhost:8321/v1/prompts \ -H "Content-Type: application/json" \ -d '{ "prompt": "You are a product analysis expert. 
Analyze the following product:\n\nProduct Name: {{product_name}}\nDescription: {{description}}\n\nImage: {{product_photo}}\n\nProvide a detailed analysis including quality assessment, target audience, and pricing recommendations.", "variables": ["product_name", "description", "product_photo"] }' ``` `{"prompt":"You are a product analysis expert. Analyze the following product:\n\nProduct Name: {{product_name}}\nDescription: {{description}}\n\nImage: {{product_photo}}\n\nProvide a detailed analysis including quality assessment, target audience, and pricing recommendations.","version":1,"prompt_id":"pmpt_7be2208cb82cdbc35356354dae1f335d1e9b7baeca21ea62","variables":["product_name","description","product_photo"],"is_default":false}%` 3. Create response: ``` curl -X POST http://localhost:8321/v1/responses \ -H "Accept: application/json, text/event-stream" \ -H "Content-Type: application/json" \ -d '{ "input": "Please analyze this product", "model": "openai/gpt-4o", "store": true, "prompt": { "id": "pmpt_7be2208cb82cdbc35356354dae1f335d1e9b7baeca21ea62", "version": "1", "variables": { "product_name": { "type": "input_text", "text": "iPhone 17 Pro Max" }, "product_photo": { "type": "input_image", "file_id": "file-d6d375f238e14f21952cc40246bc8504", "detail": "high" } } } }' ``` `{"created_at":1761750427,"error":null,"id":"resp_f897f914-e3b8-4783-8223-3ed0d32fcbc6","model":"openai/gpt-4o","object":"response","output":[{"content":[{"text":"### Product Analysis: iPhone 17 Pro Max\n\nQuality Assessment:\n\n- Display & Design:\n - The 6.9-inch display is large, ideal for streaming and productivity.\n - Anti-reflective technology and 120Hz refresh rate enhance viewing experience, providing smoother visuals and reducing glare.\n - Titanium frame suggests a premium build, offering durability and a sleek appearance.\n\n- Performance:\n - The Apple A19 Pro chip promises significant performance improvements, likely leading to faster processing and efficient multitasking.\n - 12GB RAM 
is substantial for a smartphone, ensuring smooth operation for demanding apps and games.\n\n- Camera System:\n - The triple 48MP camera setup (wide, ultra-wide, telephoto) is designed for versatile photography needs, capturing high-resolution photos and videos.\n - The 24MP front camera will appeal to selfie enthusiasts and content creators needing quality front-facing shots.\n\n- Connectivity:\n - Wi-Fi 7 support indicates future-proof wireless capabilities, providing faster and more reliable internet connectivity.\n\nTarget Audience:\n\n- Tech Enthusiasts: Individuals interested in cutting-edge technology and performance.\n- Content Creators: Users who need a robust camera system for photo and video production.\n- Luxury Consumers: Those who prefer premium materials and top-of-the-line specs.\n- Professionals: Users who require efficient multitasking and productivity features.\n\nPricing Recommendations:\n\n- Given the premium specifications, a higher price point is expected. Consider pricing competitively within the high-end smartphone market while justifying cost through unique features like the titanium frame and advanced connectivity options.\n- Positioning around the $1,200 to $1,500 range would align with expectations for top-tier devices, catering to its target audience while ensuring profitability.\n\nOverall, the iPhone 17 Pro Max showcases a blend of innovative features and premium design, aimed at users seeking high performance and superior aesthetics.","type":"output_text","annotations":[]}],"role":"assistant","type":"message","id":"msg_66f4d844-4d9e-4102-80fc-eb75b34b6dbd","status":"completed"}],"parallel_tool_calls":false,"previous_response_id":null,"prompt":{"id":"pmpt_7be2208cb82cdbc35356354dae1f335d1e9b7baeca21ea62","variables":{"product_name":{"text":"iPhone 17 Pro 
Max","type":"input_text"},"product_photo":{"detail":"high","type":"input_image","file_id":"file-d6d375f238e14f21952cc40246bc8504","image_url":null}},"version":"1"},"status":"completed","temperature":null,"text":{"format":{"type":"text"}},"top_p":null,"tools":[],"truncation":null,"usage":{"input_tokens":830,"output_tokens":394,"total_tokens":1224,"input_tokens_details":{"cached_tokens":0},"output_tokens_details":{"reasoning_tokens":0}},"instructions":null}%` Test Prompts with PDF files in Responses API: I used this PDF file for testing purposes: [invoicesample.pdf](https://github.com/user-attachments/files/22958943/invoicesample.pdf) 1. Upload PDF: ``` curl -X POST http://localhost:8321/v1/files \ -H "Content-Type: multipart/form-data" \ -F "file=@/Users/ianmiller/invoicesample.pdf" \ -F "purpose=assistants" ``` `{"object":"file","id":"file-7fbb1043a4bb468cab60ffe4b8631d8e","bytes":149568,"created_at":1761750730,"expires_at":1793286730,"filename":"invoicesample.pdf","purpose":"assistants"}%` 2. Create prompt: ``` curl -X POST http://localhost:8321/v1/prompts \ -H "Content-Type: application/json" \ -d '{ "prompt": "You are an accounting and financial analysis expert. Analyze the following invoice document:\n\nInvoice Document: {{invoice_doc}}\n\nProvide a comprehensive analysis", "variables": ["invoice_doc"] }' ``` `{"prompt":"You are an accounting and financial analysis expert. Analyze the following invoice document:\n\nInvoice Document: {{invoice_doc}}\n\nProvide a comprehensive analysis","version":1,"prompt_id":"pmpt_72e2a184a86f32a568b6afb5455dca5c16bf3cc3f80092dc","variables":["invoice_doc"],"is_default":false}%` 3. 
Create response: ``` curl -X POST http://localhost:8321/v1/responses \ -H "Content-Type: application/json" \ -d '{ "input": "Please provide a detailed analysis of this invoice", "model": "openai/gpt-4o", "store": true, "prompt": { "id": "pmpt_72e2a184a86f32a568b6afb5455dca5c16bf3cc3f80092dc", "version": "1", "variables": { "invoice_doc": { "type": "input_file", "file_id": "file-7fbb1043a4bb468cab60ffe4b8631d8e", "filename": "invoicesample.pdf" } } } }' ``` `{"created_at":1761750881,"error":null,"id":"resp_da866913-db06-4702-8000-174daed9dbbb","model":"openai/gpt-4o","object":"response","output":[{"content":[{"text":"Here's a detailed analysis of the invoice provided:\n\n### Seller Information\n- Business Name: The invoice features a logo with \"Sunny Farm\" indicating the business identity.\n- Address: 123 Somewhere St, Melbourne VIC 3000\n- Contact Information: Phone number (03) 1234 5678\n\n### Buyer Information\n- Name: Denny Gunawan\n- Address: 221 Queen St, Melbourne VIC 3000\n\n### Transaction Details\n- Invoice Number: #20130304\n- Date of Transaction: Not explicitly mentioned, likely inferred from the invoice number or needs clarification.\n\n### Items Purchased\n1. Apple\n - Price: $5.00/kg\n - Quantity: 1 kg\n - Subtotal: $5.00\n\n2. Orange\n - Price: $1.99/kg\n - Quantity: 2 kg\n - Subtotal: $3.98\n\n3. Watermelon\n - Price: $1.69/kg\n - Quantity: 3 kg\n - Subtotal: $5.07\n\n4. Mango\n - Price: $9.56/kg\n - Quantity: 2 kg\n - Subtotal: $19.12\n\n5. Peach\n - Price: $2.99/kg\n - Quantity: 1 kg\n - Subtotal: $2.99\n\n### Financial Summary\n- Subtotal for Items: $36.00\n- GST (Goods and Services Tax): 10% of $36.00, which amounts to $3.60\n- Total Amount Due: $39.60\n\n### Notes\n- The invoice includes a placeholder text: \"Lorem ipsum dolor sit amet...\" which is typically used as filler text. 
This might indicate a section intended for terms, conditions, or additional notes that haven’t been completed.\n\n### Visual and Design Elements\n- The invoice uses a simple and clear layout, featuring the business logo prominently and stating essential information such as contact and transaction details in a structured manner.\n- There is a \"Thank You\" note at the bottom, which adds a professional and courteous touch.\n\n### Considerations\n- Ensure the date of the transaction is clear if there are any future references needed.\n- Replace filler text with relevant terms and conditions or any special instructions pertaining to the transaction.\n\nThis invoice appears standard, representing a small business transaction with clearly itemized products and applicable taxes.","type":"output_text","annotations":[]}],"role":"assistant","type":"message","id":"msg_39f3b39e-4684-4444-8e4d-e7395f88c9dc","status":"completed"}],"parallel_tool_calls":false,"previous_response_id":null,"prompt":{"id":"pmpt_72e2a184a86f32a568b6afb5455dca5c16bf3cc3f80092dc","variables":{"invoice_doc":{"type":"input_file","file_data":null,"file_id":"file-7fbb1043a4bb468cab60ffe4b8631d8e","file_url":null,"filename":"invoicesample.pdf"}},"version":"1"},"status":"completed","temperature":null,"text":{"format":{"type":"text"}},"top_p":null,"tools":[],"truncation":null,"usage":{"input_tokens":529,"output_tokens":513,"total_tokens":1042,"input_tokens_details":{"cached_tokens":0},"output_tokens_details":{"reasoning_tokens":0}},"instructions":null}%` Test simple text Prompt in Responses API: 1. Create prompt: ``` curl -X POST http://localhost:8321/v1/prompts \ -H "Content-Type: application/json" \ -d '{ "prompt": "Hello {{name}}! You are working at {{company}}. Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}.", "variables": ["name", "company", "role", "tone"] }' ``` `{"prompt":"Hello {{name}}! You are working at {{company}}. Your role is {{role}} at {{company}}. 
Remember, {{name}}, to be {{tone}}.","version":1,"prompt_id":"pmpt_f340a3164a4f65d975c774ffe38ea42d15e7ce4a835919ef","variables":["name","company","role","tone"],"is_default":false}%` 2. Create response: ``` curl -X POST http://localhost:8321/v1/responses \ -H "Accept: application/json, text/event-stream" \ -H "Content-Type: application/json" \ -d '{ "input": "What is the capital of Ireland?", "model": "openai/gpt-4o", "store": true, "prompt": { "id": "pmpt_f340a3164a4f65d975c774ffe38ea42d15e7ce4a835919ef", "version": "1", "variables": { "name": { "type": "input_text", "text": "Alice" }, "company": { "type": "input_text", "text": "Dummy Company" }, "role": { "type": "input_text", "text": "Geography expert" }, "tone": { "type": "input_text", "text": "professional and helpful" } } } }' ``` `{"created_at":1761751097,"error":null,"id":"resp_1b037b95-d9ae-4ad0-8e76-d953897ecaef","model":"openai/gpt-4o","object":"response","output":[{"content":[{"text":"The capital of Ireland is Dublin.","type":"output_text","annotations":[]}],"role":"assistant","type":"message","id":"msg_8e7c72b6-2aa2-4da6-8e57-da4e12fa3ce2","status":"completed"}],"parallel_tool_calls":false,"previous_response_id":null,"prompt":{"id":"pmpt_f340a3164a4f65d975c774ffe38ea42d15e7ce4a835919ef","variables":{"name":{"text":"Alice","type":"input_text"},"company":{"text":"Dummy Company","type":"input_text"},"role":{"text":"Geography expert","type":"input_text"},"tone":{"text":"professional and helpful","type":"input_text"}},"version":"1"},"status":"completed","temperature":null,"text":{"format":{"type":"text"}},"top_p":null,"tools":[],"truncation":null,"usage":{"input_tokens":47,"output_tokens":7,"total_tokens":54,"input_tokens_details":{"cached_tokens":0},"output_tokens_details":{"reasoning_tokens":0}},"instructions":null}%`	2025-11-19 11:48:11 -08:00
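The `{{variable}}` placeholders used in the prompts above can be illustrated with a small substitution sketch. This is not the Llama Stack implementation: `render_prompt` is a hypothetical helper, and it assumes every variable is plain text (the `input_text` case), ignoring image and file variables.

```python
import re


def render_prompt(template: str, variables: dict[str, str]) -> str:
    """Replace each {{name}} placeholder with its text value (hypothetical helper)."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            # Assumption for this sketch: fail loudly on a missing variable.
            raise KeyError(f"missing prompt variable: {name}")
        return variables[name]

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


template = (
    "Hello {{name}}! You are working at {{company}}. "
    "Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}."
)
rendered = render_prompt(
    template,
    {
        "name": "Alice",
        "company": "Dummy Company",
        "role": "Geography expert",
        "tone": "professional and helpful",
    },
)
```

A placeholder that appears more than once, such as `{{company}}`, is filled at every occurrence.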
Ashwin Bharambe	8852666982	chore: remove dead code from openai_compat utility (#4194 ) Removes a bunch of dead code from `openai_compat.py`	2025-11-19 11:23:33 -08:00
Shabana Baig	72ea95e2e0	fix: Fix max_tool_calls for openai provider and add integration tests for the max_tool_calls feat (#4190 ) # Problem OpenAI gpt-4 returned an error when built-in and MCP calls were skipped due to the max_tool_calls parameter. The following is from the server log: ``` RuntimeError: OpenAI response failed: Error code: 400 - {'error': {'message': "An assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'. The following tool_call_ids did not have response messages: call_Yi9V1QNpN73dJCAgP2Arcjej", 'type': 'invalid_request_error', 'param': 'messages', 'code': None}} ``` # What does this PR do? - Fixes the error returned by openai/gpt when calls were skipped due to max_tool_calls. We now return a tool message that explicitly mentions that the call is skipped. - Adds integration tests as a follow-up to PR#[4062](https://github.com/llamastack/llama-stack/pull/4062) Part 2 for issue #[3563](https://github.com/llamastack/llama-stack/issues/3563) ## Test Plan - Added integration tests - Added new recordings --------- Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-11-19 10:27:56 -08:00
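The idea behind the fix in 72ea95e2e0 (answer every `tool_call_id`, even for calls that were skipped) can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: `respond_to_tool_calls`, `run_tool`, and the wording of the skip message are hypothetical; only the `role`/`tool_call_id`/`content` message shape follows the OpenAI chat format quoted in the error.

```python
from typing import Callable


def respond_to_tool_calls(
    tool_calls: list[dict],
    run_tool: Callable[[dict], str],
    max_tool_calls: int,
) -> list[dict]:
    """Return exactly one 'tool' message per tool call, executed or skipped."""
    messages = []
    for index, call in enumerate(tool_calls):
        if index < max_tool_calls:
            content = run_tool(call)
        else:
            # Skipped calls still get a reply, so no tool_call_id is left unanswered.
            content = f"Tool call skipped: max_tool_calls limit ({max_tool_calls}) reached."
        messages.append(
            {"role": "tool", "tool_call_id": call["id"], "content": content}
        )
    return messages


calls = [{"id": "call_a"}, {"id": "call_b"}]
replies = respond_to_tool_calls(calls, lambda call: "ok", max_tool_calls=1)
```

With `max_tool_calls=1`, the first call is executed and the second receives the explicit skip message, so the follow-up request no longer has an unanswered `tool_call_id`.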
Roy Belio	f18870a221	fix: Pydantic validation error with list-type metadata in vector search (#3797 ) (#4173 ) # Fix for Issue #3797 ## Problem Vector store search failed with Pydantic ValidationError when chunk metadata contained list-type values. Error: ``` ValidationError: 3 validation errors for VectorStoreSearchResponse attributes.tags.str: Input should be a valid string attributes.tags.float: Input should be a valid number attributes.tags.bool: Input should be a valid boolean ``` Root Cause: - `Chunk.metadata` accepts `dict[str, Any]` (any type allowed) - `VectorStoreSearchResponse.attributes` requires `dict[str, str | float | bool]` (primitives only) - Direct assignment at line 641 caused validation failure for non-primitive types ## Solution Added utility function to filter metadata to primitive types before creating search response. ## Impact Fixed: - Vector search works with list metadata (e.g., `tags: ["transformers", "gpu"]`) - Lists become searchable as comma-separated strings - No ValidationError on search responses Preserved: - Full metadata still available in `VectorStoreContent.metadata` - No API schema changes - Backward compatible with existing primitive metadata Affected: All vector store providers using `OpenAIVectorStoreMixin`: FAISS, Chroma, Qdrant, Milvus, Weaviate, PGVector, SQLite-vec ## Testing tests/unit/providers/vector_io/test_vector_utils.py::test_sanitize_metadata_for_attributes --------- Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>	2025-11-19 10:16:34 -08:00
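The metadata filtering described in f18870a221 can be sketched roughly as below. This is a minimal illustration, not the utility from the repository: the function name is borrowed from the test name in the commit message, while its signature, the `", "` separator, and the choice to drop other non-primitive values are assumptions.

```python
from typing import Any, Union

Primitive = Union[str, float, bool]


def sanitize_metadata_for_attributes(metadata: dict[str, Any]) -> dict[str, Primitive]:
    """Reduce arbitrary chunk metadata to str | float | bool values.

    Assumptions in this sketch: lists are joined into comma-separated
    strings, and any other non-primitive value is dropped.
    """
    sanitized: dict[str, Primitive] = {}
    for key, value in metadata.items():
        if isinstance(value, bool):
            # Check bool before int/float, since bool is a subclass of int.
            sanitized[key] = value
        elif isinstance(value, (int, float)):
            sanitized[key] = float(value)
        elif isinstance(value, str):
            sanitized[key] = value
        elif isinstance(value, list):
            sanitized[key] = ", ".join(str(item) for item in value)
        # Anything else (dicts, None, ...) is skipped.
    return sanitized


attributes = sanitize_metadata_for_attributes(
    {"tags": ["transformers", "gpu"], "score": 1, "public": True, "nested": {"a": 1}}
)
```

The full, unfiltered metadata would remain available separately, as the commit notes for `VectorStoreContent.metadata`.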
Anik	4e9633f7c3	feat: Make Safety API an optional dependency for meta-reference agents provider (#4169 ) # What does this PR do? Change Safety API from required to optional dependency, following the established pattern used for other optional dependencies in Llama Stack. The provider now starts successfully without Safety API configured. Requests that explicitly include guardrails will receive a clear error message when Safety API is unavailable. This enables local development and testing without Safety API while maintaining clear error messages when guardrail features are requested. Closes #4165 Signed-off-by: Anik Bhattacharjee <anbhatta@redhat.com> ## Test Plan 1. New unit tests added in `tests/unit/providers/agents/meta_reference/test_safety_optional.py` 2. Integration tests performed with the files in https://gist.github.com/anik120/c33cef497ec7085e1fe2164e0705b8d6 (i) test with `test_integration_no_safety_fail.yaml`: Config WITHOUT Safety API, should fail with a helpful error since `require_safety_api` is `true` by default ``` $ uv run llama stack run test_integration_no_safety_fail.yaml 2>&1 | grep -B 5 -A 15 "ValueError.Safety\|Safety API is required" File "/Users/anbhatta/go/src/github.com/llamastack/llama-stack/src/llama_stack/providers/inline/agents/meta_reference/__init__.py", line 27, in get_provider_impl raise ValueError( ...<9 lines>... ) ValueError: Safety API is required but not configured. To run without safety checks, explicitly set in your configuration: providers: agents: - provider_id: meta-reference provider_type: inline::meta-reference config: require_safety_api: false Warning: This disables all safety guardrails for this agents provider. 
``` (ii) test with `test_integration_no_safety_works.yaml`: Config WITHOUT Safety API, but `require_safety_api=false` is explicitly set, should succeed ``` $ uv run llama stack run test_integration_no_safety_works.yaml INFO 2025-11-16 09:49:10,044 llama_stack.cli.stack.run:169 cli: Using run configuration: /Users/anbhatta/go/src/github.com/llamastack/llama-stack/test_integration_no_safety_works.yaml INFO 2025-11-16 09:49:10,052 llama_stack.cli.stack.run:228 cli: HTTPS enabled with certificates: Key: None Cert: None . . . INFO 2025-11-16 09:49:38,528 llama_stack.core.stack:495 core: starting registry refresh task INFO 2025-11-16 09:49:38,534 uvicorn.error:62 uncategorized: Application startup complete. INFO 2025-11-16 09:49:38,535 uvicorn.error:216 uncategorized: Uvicorn running on http://0.0.0.0:8321 (Press CTRL+C ``` Signed-off-by: Anik Bhattacharjee <anbhatta@redhat.com>	2025-11-19 10:04:24 -08:00
Charlie Doern	d5cd0eea14	feat!: standardize base_url for inference (#4177 ) # What does this PR do? Completes #3732 by removing runtime URL transformations and requiring users to provide full URLs in configuration. All providers now use 'base_url' consistently and respect the exact URL provided without appending paths like /v1 or /openai/v1 at runtime. BREAKING CHANGE: Users must update configs to include full URL paths (e.g., http://localhost:11434/v1 instead of http://localhost:11434). Closes #3732 ## Test Plan Existing tests should pass even with the URL changes, due to default URLs being altered. Add unit test to enforce URL standardization across remote inference providers (verifies all use 'base_url' field with HttpUrl | None type) Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-11-19 08:44:28 -08:00
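The behavioural change in d5cd0eea14 (use the configured `base_url` exactly, without appending `/v1` at runtime) can be pictured with a small sketch. `endpoint_url` is a hypothetical helper for illustration only; how each provider actually joins paths is not shown in the commit message.

```python
def endpoint_url(base_url: str, path: str) -> str:
    """Join base_url and path without injecting a version prefix such as /v1."""
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}"


# New-style config: the full URL, including /v1, comes from the user.
full = endpoint_url("http://localhost:11434/v1", "chat/completions")

# Old-style config: nothing is appended on the user's behalf anymore.
bare = endpoint_url("http://localhost:11434", "chat/completions")
```

Under this scheme a config that still uses the bare host produces a request path without `/v1`, which is why the commit is marked as a breaking change.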
Sébastien Han	eb3cab1eec	feat: Implement FastAPI router system This commit introduces a new FastAPI router-based system for defining API endpoints, enabling a migration path away from the legacy @webmethod decorator system. The implementation includes router infrastructure, migration of the Batches API as the first example, and updates to server, OpenAPI generation, and inspection systems to support both routing approaches. The router infrastructure consists of a router registry system that allows APIs to register FastAPI router factories, which are then automatically discovered and included in the server application. Standard error responses are centralized in router_utils to ensure consistent OpenAPI specification generation with proper $ref references to component responses. The Batches API has been migrated to demonstrate the new pattern. The protocol definition and models remain in llama_stack_api/batches, maintaining clear separation between API contracts and server implementation. The FastAPI router implementation lives in llama_stack/core/server/routers/batches, following the established pattern where API contracts are defined in llama_stack_api and server routing logic lives in llama_stack/core/server. The server now checks for registered routers before falling back to the legacy webmethod-based route discovery, ensuring backward compatibility during the migration period. The OpenAPI generator has been updated to handle both router-based and webmethod-based routes, correctly extracting metadata from FastAPI route decorators and Pydantic Field descriptions. The inspect endpoint now includes routes from both systems, with proper filtering for deprecated routes and API levels. Response descriptions are now explicitly defined in router decorators, ensuring the generated OpenAPI specification matches the previous format. Error responses use $ref references to component responses (BadRequest400, TooManyRequests429, etc.) as required by the specification. 
This is neat and will allow us to remove a lot of boilerplate code from our generator once the migration is done. This implementation provides a foundation for incrementally migrating other APIs to the router system while maintaining full backward compatibility with existing webmethod-based APIs. Closes: https://github.com/llamastack/llama-stack/issues/4188 Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-19 17:07:24 +01:00
Charlie Doern	91f1b352b4	chore: add storage sane defaults (#4182 ) # What does this PR do? since `StackRunConfig` requires certain parts of `StorageConfig`, it'd probably make sense to template in some defaults that will "just work" for most usecases specifically introduce `ServerStoresConfig` defaults for inference, metadata, conversations and prompts. We already actually funnel in defaults for these sections ad-hoc throughout the codebase additionally set some `backends` defaults for the `StorageConfig`. This will alleviate some weirdness for `--providers` for run/list-deps and also some work I have to better align our list-deps/run datatypes --------- Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-11-18 15:22:26 -08:00
Ashwin Bharambe	bd5ad2963e	refactor(storage): make { kvstore, sqlstore } as llama stack "internal" APIs (#4181 ) These primitives (used both by the Stack as well as provider implementations) can be thought of fruitfully as internal-only APIs which can themselves have multiple implementations. We use the new `llama_stack_api.internal` namespace for this. In addition: the change moves kv/sql store impls, configs, and dependency helpers under `core/storage` ## Testing `pytest tests/unit/utils/test_authorized_sqlstore.py`, other existing CI	2025-11-18 13:15:16 -08:00
Anastas Stoyanovsky	a3580e6bc0	feat!: Wire through parallel_tool_calls to Responses API (#4124 ) # What does this PR do? Initial PR against #4123 Adds `parallel_tool_calls` spec to Responses API and basic initial implementation where no more than one function call is generated when set to `False`. ## Test Plan * Unit tests have been added to verify no more than one function call is generated. * A followup PR will verify passing through `parallel_tool_calls` to providers. * A followup PR will address verification and/or implementation of incremental function calling across multiple conversational turns. --------- Signed-off-by: Anastas Stoyanovsky <astoyano@redhat.com>	2025-11-18 11:25:08 -08:00


134 commits