llama-stack

forked from phoenix-oss/llama-stack-mirror

Author	SHA1	Message	Date
Francisco Arceo	31ce208bda	fix: Fix requirements from broken github-actions[bot] (#2323)	2025-05-30 19:05:47 -07:00
github-actions[bot]	ad15276da1	build: Bump version to 0.2.9	2025-05-30 19:43:09 +00:00
Sébastien Han	63a9f08c9e	chore: use starlette built-in Route class (#2267 ) # What does this PR do? Use a more common pattern and known terminology from the ecosystem, where Route is more approved than Endpoint. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-05-28 09:53:33 -07:00
Sébastien Han	4f3f28f718	chore: use dependency-groups for dev (#2287 ) # What does this PR do? The previous `[project.optional-dependencies]` was misrepresenting what the packages were. They were NOT optional dependencies to the project but development dependencies. Unlike optional dependencies, development dependencies are local-only and will not be included in the project requirements when published to PyPI or other indexes. As such, development dependencies are not included in the [project] table. Additionally, the dev group is synced by default. Source: https://docs.astral.sh/uv/concepts/projects/dependencies/#development-dependencies Signed-off-by: Sébastien Han <seb@redhat.com>	2025-05-27 23:00:17 +02:00
github-actions[bot]	7105a25b0f	build: Bump version to 0.2.8	2025-05-27 20:28:29 +00:00
Sébastien Han	448f00903d	chore: mark blobpath as optional (#2271 ) # What does this PR do? This is not a core dependency of the distro server. It's only necessary when using `inline::rag-runtime` or `inline::meta-reference` providers. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-05-27 10:55:24 +02:00
Yuan Tang	055f48b6a2	fix(security): Upgrade setuptools to v80.8.0. Fixes CVE-2025-47273 (#2242) # What does this PR do? This fixes a high-severity CVE in `setuptools`: https://github.com/advisories/GHSA-5rjg-fvgr-3xxf Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>	2025-05-24 06:57:24 -07:00
Sébastien Han	c25acedbcd	chore: remove k8s auth in favor of k8s jwks endpoint (#2216) # What does this PR do? Kubernetes since 1.20 exposes a JWKS endpoint that we can use with our recent oauth2 implementation. The CI test has been kept intact for validation. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-05-21 16:23:54 +02:00
Ashwin Bharambe	c7015d3d60	feat: introduce OAuth2TokenAuthProvider and notion of "principal" (#2185) This PR adds a notion of `principal` (aka some kind of persistent identity) to the authentication infrastructure of the Stack. Until now we only used access attributes ("claims" in the more standard OAuth / OIDC setup) but we need the notion of a User fundamentally as well. (Thanks @rhuss for bringing this up.) This value is not yet _used_ anywhere downstream but will be used to segregate access to resources. In addition, the PR introduces a built-in JWT token validator so the Stack does not need to contact an authentication provider to validate the authorization and can merely check the signed token for the represented claims. Public keys are refreshed via the configured JWKS server. This Auth Provider should overwhelmingly be considered the default given the seamless integration it offers with OAuth setups.	2025-05-18 17:54:19 -07:00
Charlie Doern	f02f7b28c1	feat: add huggingface post_training impl (#2132) # What does this PR do? Adds an inline HF SFTTrainer provider. Alongside torchtune -- this is a super popular option for running training jobs. The config allows a user to specify some key fields such as a model, chat_template, device, etc. The provider comes with one recipe, `finetune_single_device`, which works both with and without LoRA. Any model that is a valid HF identifier can be given and the model will be pulled. This has been tested so far with CPU and MPS device types, but should be compatible with CUDA out of the box. The provider processes the given dataset into the proper format, establishes the various steps per epoch, steps per save, steps per eval, sets a sane SFTConfig, and runs n_epochs of training. If checkpoint_dir is none, no model is saved. If there is a checkpoint dir, a model is saved every `save_steps` and at the end of training. ## Test Plan Re-enabled post_training integration test suite with a singular test that loads the simpleqa dataset: https://huggingface.co/datasets/llamastack/simpleqa and a tiny granite model: https://huggingface.co/ibm-granite/granite-3.3-2b-instruct. The test now uses the llama stack client and the proper post_training API, and runs one step with a batch_size of 1. This test runs on CPU on the Ubuntu runner so it needs to be a small batch and a single step. [//]: # (## Documentation) --------- Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-05-16 14:41:28 -07:00
github-actions[bot]	65cf076f13	build: Bump version to 0.2.7	2025-05-16 20:32:06 +00:00
Sébastien Han	a5d14749a5	chore: rehydrate requirements.txt (#2146 ) # What does this PR do? Hiccup with 0.2.6 bot release? Signed-off-by: Sébastien Han <seb@redhat.com>	2025-05-12 12:45:35 -07:00
github-actions[bot]	23d9f3b1fb	build: Bump version to 0.2.6	2025-05-12 18:02:05 +00:00
Ashwin Bharambe	d27a0f276c	fix: pytest.mark.skip, not pytest.skip	2025-05-04 13:22:06 -07:00
github-actions[bot]	6b4c218788	build: Bump version to 0.2.5	2025-05-03 21:31:01 +00:00
Ashwin Bharambe	799286fe52	fix: Bump version to 0.2.4	2025-04-29 10:34:17 -07:00
Sébastien Han	79851d93aa	feat: Add Kubernetes authentication (#1778 ) # What does this PR do? This commit adds a new authentication system to the Llama Stack server with support for Kubernetes and custom authentication providers. Key changes include: - Implemented KubernetesAuthProvider for validating Kubernetes service account tokens - Implemented CustomAuthProvider for validating tokens against external endpoints - this is the same code that was already present. - Added test for Kubernetes - Updated server configuration to support authentication settings - Added documentation for authentication configuration and usage The authentication system supports: - Bearer token validation - Kubernetes service account token validation - Custom authentication endpoints ## Test Plan Setup a Kube cluster using Kind or Minikube. Run a server with: ``` server: port: 8321 auth: provider_type: kubernetes config: api_server_url: http://url ca_cert_path: path/to/cert (optional) ``` Run: ``` curl -s -L -H "Authorization: Bearer $(kubectl create token my-user)" http://127.0.0.1:8321/v1/providers ``` Or replace "my-user" with your service account. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-04-28 22:24:58 +02:00
Yuan Tang	28687b0e85	fix: Bump h11 to 0.16.0 to fix cve-2025-43859 (#2041 ) This resolves a new critical severity on h11. See https://access.redhat.com/security/cve/cve-2025-43859. We should consider releasing a new patch with this fix. This was updated via: ``` uv add "h11>=0.16.0" uv export --frozen --no-hashes --no-emit-project --output-file=requirements.txt ``` Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-04-27 11:45:35 -07:00
Ashwin Bharambe	ff14773fa7	fix: update llama stack client dependency	2025-04-12 18:14:33 -07:00
Ben Browning	2b2db5fbda	feat: OpenAI-Compatible models, completions, chat/completions (#1894) # What does this PR do? This stubs in some OpenAI server-side compatibility with three new endpoints: /v1/openai/v1/models /v1/openai/v1/completions /v1/openai/v1/chat/completions This gives common inference apps using OpenAI clients the ability to talk to Llama Stack using an endpoint like http://localhost:8321/v1/openai/v1 . The two "v1" instances in there aren't awesome, but the thinking is that Llama Stack's API is v1 and then our OpenAI compatibility layer is compatible with OpenAI V1. And, some OpenAI clients implicitly assume the URL ends with "v1", so this gives maximum compatibility. The openai models endpoint is implemented in the routing layer, and just returns all the models Llama Stack knows about. The following providers should be working with the new OpenAI completions and chat/completions API: * remote::anthropic (untested) * remote::cerebras-openai-compat (untested) * remote::fireworks (tested) * remote::fireworks-openai-compat (untested) * remote::gemini (untested) * remote::groq-openai-compat (untested) * remote::nvidia (tested) * remote::ollama (tested) * remote::openai (untested) * remote::passthrough (untested) * remote::sambanova-openai-compat (untested) * remote::together (tested) * remote::together-openai-compat (untested) * remote::vllm (tested) The goal is to support this for every inference provider - proxying directly to the provider's OpenAI endpoint for OpenAI-compatible providers. For providers that don't have an OpenAI-compatible API, we'll add a mixin to translate incoming OpenAI requests to Llama Stack inference requests and translate the Llama Stack inference responses to OpenAI responses. This is related to #1817 but is a bit larger in scope than just chat completions, as I have real use-cases that need the older completions API as well. 
## Test Plan ### vLLM ``` VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" llama stack build --template remote-vllm --image-type venv --run LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "meta-llama/Llama-3.2-3B-Instruct" ``` ### ollama ``` INFERENCE_MODEL="llama3.2:3b-instruct-q8_0" llama stack build --template ollama --image-type venv --run LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="llama3.2:3b-instruct-q8_0" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "llama3.2:3b-instruct-q8_0" ``` ## Documentation Run a Llama Stack distribution that uses one of the providers mentioned in the list above. Then, use your favorite OpenAI client to send completion or chat completion requests with the base_url set to http://localhost:8321/v1/openai/v1 . Replace "localhost:8321" with the host and port of your Llama Stack server, if different. --------- Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-04-11 13:14:17 -07:00
Ashwin Bharambe	5a31e66a91	fix: update llama-stack-client dependency to fix integration tests	2025-04-06 19:11:05 -07:00
Francisco Arceo	9b478f3756	docs: Adding darkmode to documentation (#1843 ) # What does this PR do? docs: Adding darkmode to documentation ## Test Plan Tested locally. Here's the look: ![Screenshot 2025-03-31 at 9 43 05 AM](https://github.com/user-attachments/assets/5989dbc8-ba03-4710-ad8d-6d4b9ac79786) ## Issues Related to https://github.com/meta-llama/llama-stack/issues/1815 Closes https://github.com/meta-llama/llama-stack/issues/1844 Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-03-31 08:31:53 -07:00
github-actions[bot]	b7ab1a9710	build: Bump version to 0.1.19	2025-03-29 00:18:38 +00:00
Ashwin Bharambe	8c351fe432	build: Bump version to 0.1.8	2025-03-23 16:01:10 -07:00
Ashwin Bharambe	93cfade8c9	ci: Bump version to 0.1.7	2025-03-14 15:21:26 -07:00
yyymeta	a626b7bce3	feat: [new open benchmark] BFCL_v3 (#1578 ) # What does this PR do? create a new dataset BFCL_v3 from https://gorilla.cs.berkeley.edu/blogs/13_bfcl_v3_multi_turn.html overall each question asks the model to perform a task described in natural language, and additionally a set of available functions and their schema are given for the model to choose from. the model is required to write the function call form including function name and parameters , to achieve the stated purpose. the results are validated against provided ground truth, to make sure that the generated function call and the ground truth function call are syntactically and semantically equivalent, by checking their AST . ## Test Plan start server by ``` llama stack run ./llama_stack/templates/ollama/run.yaml ``` then send traffic ``` llama-stack-client eval run-benchmark "bfcl" --model-id meta-llama/Llama-3.2-3B-Instruct --output-dir /tmp/gpqa --num-examples 2 ``` [//]: # (## Documentation)	2025-03-14 12:50:49 -07:00
Ashwin Bharambe	bc8daf7fea	fix: include jinja2 as a core llama-stack dependency (#1529 ) We removed `llama-models` as a dep which was pulling this in for us previously. This did not get caught in the release process because the distros we use for testing (fireworks / together) pull that in via sentence transformers which we don't use in all distros (notably ollama.) See #1511 ## Test Plan Ran `llama-stack-ops/actions/test-and-cut/main.sh` with `ONLY_TEST_DONT_CUT=1 COMMIT_ID=origin/fix_jinja2` and by making it build the ollama docker. Ran the docker to ensure it does not error out with jinja2 dependency error. (Unfortunately there is another error with sqlite_vec there.)	2025-03-10 14:59:11 -07:00
Ashwin Bharambe	0db3a2f511	fix: run pre-commit due to release script bumps	2025-03-07 16:31:42 -08:00
ehhuang	1257288361	build: add 'tiktoken' to deps (#1483 ) Summary: Test Plan:	2025-03-07 12:36:02 -08:00
Sébastien Han	ffa32af930	build: bump llama-stack-client version (#1469) ## What does this PR do? Use 0.1.5. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-03-07 11:42:38 -08:00
Ashwin Bharambe	8bbd52bb9f	chore: remove dependency on llama_models completely (#1344)	2025-03-01 12:48:08 -08:00
Charlie Doern	de878e15a9	fix: pre-commit updates (#1243 ) # What does this PR do? PR #1139 caused pre-commit failures on main likely due to improper rebase before merge. run pre-commit on main and commit the changes see runs here: `3775148428` Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-02-24 17:20:29 -08:00
Sébastien Han	9bbe34694d	ci: add mypy for static type checking (#1101 ) # What does this PR do? - Enable mypy to run in the CI on a subset of the repository - Fix a few mypy errors - Run mypy from pre-commit Signed-off-by: Sébastien Han <seb@redhat.com> [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] [//]: # (## Documentation) Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-21 13:15:40 -08:00
Sébastien Han	69eebaf5bf	build: add missing dev dependencies for unit tests (#1004 ) # What does this PR do? Added necessary dependencies to ensure successful execution of unit tests. Without these, the following command would fail due to missing imports: ``` uv run pytest -v -k "ollama" \ --inference-model=llama3.2:3b-instruct-fp16 llama_stack/providers/tests/inference/test_model_registration.py ``` Signed-off-by: Sébastien Han <seb@redhat.com> [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan Run: ``` ollama run llama3.2:3b-instruct-fp16 --keepalive 2m & uv run pytest -v -k "ollama" --inference-model=llama3.2:3b-instruct-fp16 llama_stack/providers/tests/inference/test_model_registration.py ``` You can observe that some tests pass while others fail, but the test runs successfully. [//]: # (## Documentation) [//]: # (- [ ] Added a Changelog entry if the change is significant) Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-02-19 22:26:11 -08:00
Sébastien Han	00613d9014	build: resync uv and deps on 0.1.3 (#1108 ) # What does this PR do? The bot just updated the project to 0.1.3 in https://github.com/meta-llama/llama-stack/commits?author=github-actions%5Bbot%5D but the deps need to be synced. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] [//]: # (## Documentation) Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-14 12:26:04 -08:00
Ashwin Bharambe	314ee09ae3	chore: move all Llama Stack types from llama-models to llama-stack (#1098 ) llama-models should have extremely minimal cruft. Its sole purpose should be didactic -- show the simplest implementation of the llama models and document the prompt formats, etc. This PR is the complement to https://github.com/meta-llama/llama-models/pull/279 ## Test Plan Ensure all `llama` CLI `model` sub-commands work: ```bash llama model list llama model download --model-id ... llama model prompt-format -m ... ``` Ran tests: ```bash cd tests/client-sdk LLAMA_STACK_CONFIG=fireworks pytest -s -v inference/ LLAMA_STACK_CONFIG=fireworks pytest -s -v vector_io/ LLAMA_STACK_CONFIG=fireworks pytest -s -v agents/ ``` Create a fresh venv `uv venv && source .venv/bin/activate` and run `llama stack build --template fireworks --image-type venv` followed by `llama stack run together --image-type venv` <-- the server runs Also checked that the OpenAPI generator can run and there is no change in the generated files as a result. ```bash cd docs/openapi_generator sh run_openapi_generator.sh ```	2025-02-14 09:10:59 -08:00
Sarthak Deshpande	80ba9deab1	chore: Updated requirements.txt (#1017 ) # What does this PR do? [Provide a short summary of what this PR does and why. Link to relevant issues if applicable.] Updated requirements.txt [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] [//]: # (## Documentation) [//]: # (- [ ] Added a Changelog entry if the change is significant) --------- Co-authored-by: sarthakdeshpande <sarthak.deshpande@engati.com>	2025-02-08 11:50:35 -08:00
Ashwin Bharambe	f98efe68c9	Misc fixes (#944) - Make sure torch + torchvision go together as deps, otherwise bad stuff happens - Add a pre-commit for requirements.txt	2025-02-03 14:08:47 -08:00
Ashwin Bharambe	6344b2429b	Kill requirements.txt	2025-01-31 22:38:58 -08:00
Ashwin Bharambe	05d73dd4fd	Bump version to 0.1.0	2025-01-24 09:50:07 -08:00
Ashwin Bharambe	d6fcdefec7	Bump version to 0.0.63	2024-12-17 23:15:27 -08:00
Ashwin Bharambe	eea478618d	Bump version to 0.0.62	2024-12-17 18:19:47 -08:00
Ashwin Bharambe	02b43be9d7	Bump version to 0.0.61	2024-12-10 10:18:44 -08:00
Ashwin Bharambe	1ad691bb04	Bump version to 0.0.60	2024-12-09 22:19:51 -08:00
Ashwin Bharambe	baae4f7b51	Bump version to 0.0.59	2024-12-09 21:22:20 -08:00
Ashwin Bharambe	2c5c73f7ca	Bump version to 0.0.58	2024-12-06 08:36:00 -08:00
dltn	4c7b1a8fb3	Bump version to 0.0.57	2024-12-02 19:48:46 -08:00
Dinesh Yeduguru	fe48b9fb8c	Bump version to 0.0.56	2024-11-30 12:27:31 -08:00
Ashwin Bharambe	45fd73218a	Bump version to 0.0.55	2024-11-23 09:03:58 -08:00
Ashwin Bharambe	2137b0af40	Bump version to 0.0.54	2024-11-21 16:28:30 -08:00

93 commits