llama-stack-mirror

2851 commits 155 branches 116 tags 125 MiB

Author	SHA1	Message	Date
Eric Huang	a93130e323	test # What does this PR do? ## Test Plan # What does this PR do? ## Test Plan # What does this PR do? ## Test Plan Completes the refactoring started in previous commit by: 1. Fix library client (critical): Add logic to detect Pydantic model parameters and construct them properly from request bodies. The key fix is to NOT exclude any params when converting the body for Pydantic models - we need all fields to pass to the Pydantic constructor. Before: _convert_body excluded all params, leaving body empty for Pydantic construction After: Check for Pydantic params first, skip exclusion, construct model with full body 2. Update remaining providers to use new Pydantic-based signatures: - litellm_openai_mixin: Extract extra fields via __pydantic_extra__ - databricks: Use TYPE_CHECKING import for params type - llama_openai_compat: Use TYPE_CHECKING import for params type - sentence_transformers: Update method signatures to use params 3. Update unit tests to use new Pydantic signature: - test_openai_mixin.py: Use OpenAIChatCompletionRequestParams This fixes test failures where the library client was trying to construct Pydantic models with empty dictionaries. The previous fix had a bug: it called _convert_body() which only keeps fields that match function parameter names. For Pydantic methods with signature: openai_chat_completion(params: OpenAIChatCompletionRequestParams) The signature only has 'params', but the body has 'model', 'messages', etc. So _convert_body() returned an empty dict. Fix: Skip _convert_body() entirely for Pydantic params. Use the raw body directly to construct the Pydantic model (after stripping NOT_GIVENs). This properly fixes the ValidationError where required fields were missing. The streaming code path (_call_streaming) had the same issue as non-streaming: it called _convert_body() which returned empty dict for Pydantic params. Applied the same fix as commit 7476c0ae: - Detect Pydantic model parameters before body conversion - Skip _convert_body() for Pydantic params - Construct Pydantic model directly from raw body (after stripping NOT_GIVENs) This fixes streaming endpoints like openai_chat_completion with stream=True. The streaming code path (_call_streaming) had the same issue as non-streaming: it called _convert_body() which returned empty dict for Pydantic params. Applied the same fix as commit 7476c0ae: - Detect Pydantic model parameters before body conversion - Skip _convert_body() for Pydantic params - Construct Pydantic model directly from raw body (after stripping NOT_GIVENs) This fixes streaming endpoints like openai_chat_completion with stream=True.	2025-10-09 13:53:33 -07:00
Ashwin Bharambe	61b4238912	feat(api): add extra_body parameter support with shields example (#3670 ) ## Summary Introduce `ExtraBodyField` annotation to enable parameters that arrive via extra_body in client SDKs but are accessible server-side with full typing. These parameters are documented in OpenAPI specs under `x-llama-stack-extra-body-params` but excluded from generated SDK signatures. Add `shields` parameter to `create_openai_response` as the first implementation using this pattern. ## Test Plan - added an integration test which checks that shields parameter passed via extra_body reaches server implementation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-10-03 13:25:09 -07:00
ehhuang	4c2fcb6b51	chore: refactor server.main (#3462 ) Some checks failed Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Python Package Build Test / build (3.13) (push) Failing after 3s Details Vector IO Integration Tests / test-matrix (push) Failing after 6s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 8s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 13s Details Unit Tests / unit-tests (3.13) (push) Failing after 4s Details Test External API and Providers / test-external (venv) (push) Failing after 7s Details Unit Tests / unit-tests (3.12) (push) Failing after 6s Details Python Package Build Test / build (3.12) (push) Failing after 10s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 18s Details API Conformance Tests / check-schema-compatibility (push) Successful in 22s Details UI Tests / ui-tests (22) (push) Successful in 29s Details Pre-commit / pre-commit (push) Successful in 1m25s Details # What does this PR do? As shown in #3421, we can scale stack to handle more RPS with k8s replicas. This PR enables multi process stack with uvicorn --workers so that we can achieve the same scaling without being in k8s. To achieve that we refactor main to split out the app construction logic. This method needs to be non-async. We created a new `Stack` class to house impls and have a `start()` method to be called in lifespan to start background tasks instead of starting them in the old `construct_stack`. This way we avoid having to manage an event loop manually. ## Test Plan CI > uv run --with llama-stack python -m llama_stack.core.server.server benchmarking/k8s-benchmark/stack_run_config.yaml works. > LLAMA_STACK_CONFIG=benchmarking/k8s-benchmark/stack_run_config.yaml uv run uvicorn llama_stack.core.server.server:create_app --port 8321 --workers 4 works.	2025-09-18 21:11:13 -07:00
ehhuang	9d3a234bf3	chore: remove unused variable (#3389 ) # What does this PR do? ## Test Plan	2025-09-09 15:51:20 -07:00
Mustafa Elbehery	1790fc0f25	feat: Remove initialize() Method from LlamaStackAsLibrary (#2979 ) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> This PR removes `init()` from `LlamaStackAsLibrary` Currently client.initialize() had to be invoked by user. To improve dev experience and to avoid runtime errors, this PR init LlamaStackAsLibrary implicitly upon using the client. It prevents also multiple init of the same client, while maintaining backward ccompatibility. This PR does the following - Automatic Initialization: Constructor calls initialize_impl() automatically. - Client is fully initialized after __init__ completes. - Prevents consecutive initialization after the client has been successfully initialized. - initialize() method still exists but is now a no-op. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> fixes https://github.com/meta-llama/llama-stack/issues/2946 --------- Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>	2025-08-21 15:59:04 -07:00
Mustafa Elbehery	3f8df167f3	chore(pre-commit): add pre-commit hook to enforce llama_stack logger usage (#3061 ) # What does this PR do? This PR adds a step in pre-commit to enforce using `llama_stack` logger. Currently, various parts of the code base uses different loggers. As a custom `llama_stack` logger exist and used in the codebase, it is better to standardize its utilization. Signed-off-by: Mustafa Elbehery <melbeher@redhat.com> Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>	2025-08-20 07:15:35 -04:00
IAN MILLER	c9b78602d3	refactor: modify DELETE API endpoints by returning HTTP 204 No Content + empty body instead of 200 OK + response body with null (#3112 ) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> The purpose of this PR is to make the behavior DELETE API endpoints be consistent with standard RESTful conventions and eliminate confusion for API consumers. Old Behavior ``` HTTP Status: 200 OK Response Body: null ``` Eg. `curl -X DELETE http://localhost:8321/v1/shields/test-shield` `null% ` `INFO 2025-08-12 16:11:57,932 console_span_processor:65 telemetry: 15:11:57.929 [INFO] ::1:59805 - "DELETE /v1/shields/test-shield HTTP/1.1" 200 ` Updated Behavior ``` HTTP Status: 204 No Content Response Body: empty (no body) ``` Eg. `curl -X DELETE http://localhost:8321/v1/shields/test-shield` `INFO 2025-08-12 16:18:16,645 console_span_processor:62 telemetry: 15:18:16.637 [INFO] ::1:60283 - "DELETE /v1/shields/test-shield HTTP/1.1" 204 ` <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Closes #3090 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> Run `./scripts/unit-tests.sh`	2025-08-13 07:56:26 -07:00
Ashwin Bharambe	cc87995e2b	chore: rename templates to distributions (#3035 ) As the title says. Distributions is in, Templates is out. `llama stack build --template` --> `llama stack build --distro`. For backward compatibility, the previous option is kept but results in a warning. Updated `server.py` to remove the "config_or_template" backward compatibility since it has been a couple releases since that change.	2025-08-04 11:34:17 -07:00
Ashwin Bharambe	2665f00102	chore(rename): move llama_stack.distribution to llama_stack.core (#2975 ) We would like to rename the term `template` to `distribution`. To prepare for that, this is a precursor. cc @leseb	2025-07-30 23:30:53 -07:00

Renamed from llama_stack/distribution/library_client.py (Browse further)

9 commits