Commit graph

7 commits

Author SHA1 Message Date
ehhuang
4c2fcb6b51
chore: refactor server.main (#3462)
# What does this PR do?
As shown in #3421, we can scale the stack to handle more RPS with k8s
replicas. This PR enables a multi-process stack with `uvicorn --workers` so
that we can achieve the same scaling without running in k8s.

To achieve that, we refactor main to split out the app construction
logic. This method needs to be non-async. We created a new `Stack` class
to house the impls; its `start()` method is called in lifespan to
start background tasks, instead of starting them in the old
`construct_stack`. This way we avoid having to manage an event loop
manually.
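
The split described above can be illustrated with a minimal, standard-library-only sketch. The `Stack` below is a hypothetical stand-in and not the class from the PR; the `ticks` counter, the `_refresh` job, the `shutdown()` method, and the dict returned by `create_app` are assumptions made for illustration only. The point is that construction stays synchronous while background tasks are created inside the lifespan, where an event loop is already running.

```python
import asyncio
from contextlib import asynccontextmanager


class Stack:
    """Hypothetical stand-in: holds impls and defers all async work to start()."""

    def __init__(self, config: dict):
        # Synchronous construction only; no event loop is required here.
        self.config = config
        self.impls: dict = {}
        self.tasks: list = []
        self.ticks = 0

    async def _refresh(self) -> None:
        # Placeholder background job (assumption, not from the PR).
        while True:
            self.ticks += 1
            await asyncio.sleep(0.01)

    async def start(self) -> None:
        # Called from the lifespan, so a running loop is guaranteed.
        self.tasks.append(asyncio.create_task(self._refresh()))

    async def shutdown(self) -> None:
        for task in self.tasks:
            task.cancel()
        await asyncio.gather(*self.tasks, return_exceptions=True)
        self.tasks.clear()


def create_app(config: dict) -> dict:
    """Non-async factory: builds the stack and wires up a lifespan."""
    stack = Stack(config)

    @asynccontextmanager
    async def lifespan(app):
        await stack.start()
        try:
            yield
        finally:
            await stack.shutdown()

    # A real server would return an ASGI app; a dict keeps the sketch self-contained.
    return {"stack": stack, "lifespan": lifespan}
```

Because the factory itself never awaits anything, each worker process can call it independently and let its own server loop drive the lifespan.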


## Test Plan
CI

> uv run --with llama-stack python -m llama_stack.core.server.server
benchmarking/k8s-benchmark/stack_run_config.yaml

works.

> LLAMA_STACK_CONFIG=benchmarking/k8s-benchmark/stack_run_config.yaml uv
run uvicorn llama_stack.core.server.server:create_app --port 8321
--workers 4

works.
2025-09-18 21:11:13 -07:00
ehhuang
9d3a234bf3
chore: remove unused variable (#3389)
2025-09-09 15:51:20 -07:00
Mustafa Elbehery
1790fc0f25
feat: Remove initialize() Method from LlamaStackAsLibrary (#2979)
# What does this PR do?
This PR removes the need to call `initialize()` on `LlamaStackAsLibrary` explicitly.

Previously, the user had to invoke `client.initialize()`.
To improve the developer experience and avoid runtime errors, this PR
initializes `LlamaStackAsLibrary` implicitly upon using the client.
It also prevents repeated initialization of the same client, while
maintaining backward compatibility.

This PR does the following:

- Automatic initialization: the constructor calls `initialize_impl()`
automatically.
- The client is fully initialized after `__init__` completes.
- Prevents repeated initialization after the client has been
successfully initialized.
- The `initialize()` method still exists but is now a no-op.
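
The behavior listed above could look roughly like the following sketch. `AutoInitClient`, `_initialize_impl`, and `init_count` are hypothetical names used for illustration; this is not the actual `LlamaStackAsLibrary` code.

```python
class AutoInitClient:
    """Hypothetical client showing constructor-time, one-shot initialization."""

    def __init__(self, config: str):
        self.config = config
        self._initialized = False
        self.init_count = 0  # only here to make the behavior observable
        self._initialize_impl()

    def _initialize_impl(self) -> None:
        # Guard against repeated initialization of the same client.
        if self._initialized:
            return
        self.init_count += 1
        self._initialized = True

    def initialize(self) -> bool:
        """Kept for backward compatibility; does nothing now."""
        return True
```

Existing callers that still invoke `initialize()` keep working, and the setup work runs exactly once.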

fixes https://github.com/meta-llama/llama-stack/issues/2946

---------

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-08-21 15:59:04 -07:00
Mustafa Elbehery
3f8df167f3
chore(pre-commit): add pre-commit hook to enforce llama_stack logger usage (#3061)
# What does this PR do?

This PR adds a step in pre-commit to enforce using `llama_stack` logger.

Currently, various parts of the code base use different loggers. Since a
custom `llama_stack` logger exists and is already used in the codebase, it
is better to standardize on it.
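
A check of this kind could be sketched as below. This is an assumption about the general shape of such a hook, not the script added by the PR; the pattern and the `find_violations` helper are hypothetical.

```python
import re

# Hypothetical rule: flag direct imports of the standard-library logging module.
_FORBIDDEN = re.compile(r"^\s*(import logging\b|from logging import\b)")


def find_violations(source: str) -> list:
    """Return 1-based line numbers that import stdlib logging directly."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if _FORBIDDEN.match(line)
    ]
```

A pre-commit step would run a function like this over each staged file and fail when the returned list is non-empty.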

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>
2025-08-20 07:15:35 -04:00
IAN MILLER
c9b78602d3
refactor: modify DELETE API endpoints by returning HTTP 204 No Content + empty body instead of 200 OK + response body with null (#3112)
# What does this PR do?
The purpose of this PR is to make the behavior of DELETE API endpoints
consistent with standard RESTful conventions and eliminate confusion for
API consumers.

Old Behavior
```
HTTP Status: 200 OK
Response Body: null
```

E.g. `curl -X DELETE http://localhost:8321/v1/shields/test-shield`
`null% `
`INFO 2025-08-12 16:11:57,932 console_span_processor:65 telemetry:
15:11:57.929 [INFO] ::1:59805 - "DELETE /v1/shields/test-shield
HTTP/1.1" 200 `

Updated Behavior
```
HTTP Status: 204 No Content
Response Body: empty (no body)
```

E.g. `curl -X DELETE http://localhost:8321/v1/shields/test-shield`
`INFO 2025-08-12 16:18:16,645 console_span_processor:62 telemetry:
15:18:16.637 [INFO] ::1:60283 - "DELETE /v1/shields/test-shield
HTTP/1.1" 204 `
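
The updated behavior can be reproduced in isolation with the standard library. The handler and the `delete_once` helper below are hypothetical and unrelated to the llama-stack server code; they only show what a 204 response with no body looks like from the client side.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DeleteHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: acknowledges DELETE with 204 and no body."""

    def do_DELETE(self):
        self.send_response(204)
        self.end_headers()  # nothing is written after the headers

    def log_message(self, *args):
        pass  # keep the example quiet


def delete_once(path: str) -> tuple:
    """Start a throwaway local server, send one DELETE, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), DeleteHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        conn.request("DELETE", path)
        response = conn.getresponse()
        result = (response.status, response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```

Clients should therefore branch on the status code rather than try to parse a response body.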

Closes #3090 

## Test Plan
Run `./scripts/unit-tests.sh`
2025-08-13 07:56:26 -07:00
Ashwin Bharambe
cc87995e2b
chore: rename templates to distributions (#3035)
As the title says. Distributions is in, Templates is out.

`llama stack build --template` --> `llama stack build --distro`. For
backward compatibility, the previous option is kept but results in a
warning.

Updated `server.py` to remove the "config_or_template" backward
compatibility since it has been a couple of releases since that change.
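
Keeping an old option alive with a warning can be sketched with `argparse`. This is a hypothetical illustration rather than the actual `llama stack build` parser; the `parse_build_args` name and the warning text are assumptions.

```python
import argparse
import warnings


def parse_build_args(argv: list) -> argparse.Namespace:
    """Parse --distro, accepting the older --template spelling with a warning."""
    parser = argparse.ArgumentParser(prog="build")
    parser.add_argument("--distro", dest="distro", default=None)
    # Deprecated spelling, accepted for backward compatibility.
    parser.add_argument("--template", dest="template", default=None)
    args = parser.parse_args(argv)
    if args.template is not None:
        warnings.warn(
            "--template is deprecated; use --distro",
            DeprecationWarning,
            stacklevel=2,
        )
        if args.distro is None:
            args.distro = args.template
    return args
```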
2025-08-04 11:34:17 -07:00
Ashwin Bharambe
2665f00102
chore(rename): move llama_stack.distribution to llama_stack.core (#2975)
We would like to rename the term `template` to `distribution`. This
change is a precursor to that rename.

cc @leseb
2025-07-30 23:30:53 -07:00
Renamed from llama_stack/distribution/library_client.py