llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-19 05:59:42 +00:00


ehhuang 4c2fcb6b51 chore: refactor server.main (#3462)

# What does this PR do?

As shown in #3421, we can scale the stack to handle more RPS with k8s replicas. This PR enables a multi-process stack with uvicorn --workers so that we can achieve the same scaling without being in k8s.

To achieve that, we refactor main to split out the app construction logic. This method needs to be non-async. We created a new `Stack` class to house impls, with a `start()` method that is called in lifespan to start background tasks instead of starting them in the old `construct_stack`. This way we avoid having to manage an event loop manually.

## Test Plan

CI

> uv run --with llama-stack python -m llama_stack.core.server.server benchmarking/k8s-benchmark/stack_run_config.yaml

works.

> LLAMA_STACK_CONFIG=benchmarking/k8s-benchmark/stack_run_config.yaml uv run uvicorn llama_stack.core.server.server:create_app --port 8321 --workers 4

works.

2025-09-18 21:11:13 -07:00
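The pattern the commit message describes (a synchronous app factory, plus a `Stack` object whose `start()` runs inside the server's lifespan) can be sketched roughly as follows. This is a standard-library-only illustration, not the code from the PR: the `App` class, the `lifespan` helper, the `shutdown()` method, the background worker, and the config dict are all hypothetical stand-ins, and only the names `Stack`, `start()`, and `create_app` come from the commit message.

```python
import asyncio
from contextlib import asynccontextmanager


class Stack:
    """Hypothetical stand-in for the `Stack` class named in the commit message."""

    def __init__(self, config: dict):
        # Plain synchronous construction: no event loop is required here.
        self.config = config
        self.tasks = []
        self.ticks = 0

    async def _background_worker(self) -> None:
        # Placeholder for a real background task (assumption for illustration).
        while True:
            self.ticks += 1
            await asyncio.sleep(0.01)

    async def start(self) -> None:
        # Runs inside an already-running loop, so create_task is safe.
        self.tasks.append(asyncio.create_task(self._background_worker()))

    async def shutdown(self) -> None:
        # Hypothetical cleanup: cancel background tasks and wait for them.
        for task in self.tasks:
            task.cancel()
        await asyncio.gather(*self.tasks, return_exceptions=True)
        self.tasks.clear()


class App:
    """Hypothetical minimal app object that only holds the stack."""

    def __init__(self, stack: Stack):
        self.stack = stack


def create_app(config=None) -> App:
    # Non-async factory: each worker process can call it without a loop.
    return App(Stack(config or {}))


@asynccontextmanager
async def lifespan(app: App):
    # Background tasks start once the worker's event loop is running.
    await app.stack.start()
    try:
        yield
    finally:
        await app.stack.shutdown()


async def demo() -> int:
    app = create_app({"workers": 4})
    async with lifespan(app):
        await asyncio.sleep(0.05)  # let the background worker run briefly
    return app.stack.ticks
```

Calling `asyncio.run(demo())` builds the app synchronously, starts the background task only once a loop exists, and cancels it on exit. Keeping construction free of `await` is what lets a process manager import and call the factory independently in each worker.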
access_control	chore(rename): move llama_stack.distribution to llama_stack.core (#2975)	2025-07-30 23:30:53 -07:00
prompts	feat: Adding OpenAI Prompts API (#3319)	2025-09-08 11:05:13 -04:00
routers	chore: introduce write queue for inference_store (#3383)	2025-09-10 11:57:42 -07:00
routing_tables	feat: create HTTP DELETE API endpoints to unregister ScoringFn and Benchmark resources in Llama Stack (#3371)	2025-09-15 12:43:38 -07:00
server	chore: refactor server.main (#3462)	2025-09-18 21:11:13 -07:00
store	fix: Added a bug fix when registering new models (#3453)	2025-09-16 19:09:06 -07:00
ui	docs: update documentation links (#3459)	2025-09-17 10:37:35 -07:00
utils	refactor(logging): rename llama_stack logger categories (#3065)	2025-08-21 17:31:04 -07:00
__init__.py	chore(rename): move llama_stack.distribution to llama_stack.core (#2975)	2025-07-30 23:30:53 -07:00
build.py	feat(distro): no huggingface provider for starter (#3258)	2025-08-26 14:06:36 -07:00
build_container.sh	chore: rename templates to distributions (#3035)	2025-08-04 11:34:17 -07:00
build_venv.sh	fix(ci, tests): ensure uv environments in CI are kosher, record tests (#3193)	2025-08-18 17:02:24 -07:00
client.py	chore(rename): move llama_stack.distribution to llama_stack.core (#2975)	2025-07-30 23:30:53 -07:00
common.sh	refactor: remove Conda support from Llama Stack (#2969)	2025-08-02 15:52:59 -07:00
configure.py	chore(pre-commit): add pre-commit hook to enforce llama_stack logger usage (#3061)	2025-08-20 07:15:35 -04:00
datatypes.py	feat: combine ProviderSpec datatypes (#3378)	2025-09-18 16:10:00 +02:00
distribution.py	feat: combine ProviderSpec datatypes (#3378)	2025-09-18 16:10:00 +02:00
external.py	chore(rename): move llama_stack.distribution to llama_stack.core (#2975)	2025-07-30 23:30:53 -07:00
inspect.py	chore(rename): move llama_stack.distribution to llama_stack.core (#2975)	2025-07-30 23:30:53 -07:00
library_client.py	chore: refactor server.main (#3462)	2025-09-18 21:11:13 -07:00
providers.py	chore(rename): move llama_stack.distribution to llama_stack.core (#2975)	2025-07-30 23:30:53 -07:00
request_headers.py	chore(pre-commit): add pre-commit hook to enforce llama_stack logger usage (#3061)	2025-08-20 07:15:35 -04:00
resolver.py	feat: Adding OpenAI Prompts API (#3319)	2025-09-08 11:05:13 -04:00
stack.py	chore: refactor server.main (#3462)	2025-09-18 21:11:13 -07:00
start_stack.sh	docs: update documentation links (#3459)	2025-09-17 10:37:35 -07:00