Commit graph

1615 commits

Author SHA1 Message Date
Yuan Tang
dce9a24a6c
test: Add default vLLM URL in remote-vllm template (#1736)
# What does this PR do?

This is to avoid errors like the following when running inference
integration tests:

```
ERROR tests/integration/inference/test_text_inference.py::test_text_completion_stop_sequence[txt=8B-inference:completion:stop_sequence] - llama_stack.distribution.stack.EnvVarError: Environment variable 'VLLM_URL' not set or empty at providers.inference[0].config.url
```

It's also good to have a default that is consistent with the vLLM API
server.

## Test Plan

Integration tests can run without the error above.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-03-21 07:31:59 -07:00
Ashwin Bharambe
03b5c61bfc
feat: make sure agent sessions are under access control (#1737)
This builds on top of #1703.

Agent sessions are now properly access controlled.

## Test Plan

Added unit tests
2025-03-21 07:31:16 -07:00
Ashwin Bharambe
d7a6d92466
fix: only invoke openapi generator if APIs or API generator changes (#1744)
As titled
2025-03-21 10:25:18 -04:00
Botao Chen
9114bef484
fix: fix experimental-post-training template (#1740)
## What does this PR do?

Fix the template to make it compatible with the latest dataset and eval
API changes.

## Test
Run `llama stack run llama_stack/templates/experimental-post-training/run.yaml`
and confirm the llama stack server spins up successfully.
2025-03-20 23:07:19 -07:00
Hardik Shah
395203ce0f
Update getting_started.ipynb
Fix numpy version mismatch issue
2025-03-20 22:00:08 -07:00
Hardik Shah
5a68a28263 Revert "install pandas and numpy beforehand to avoid version mismatch"
This reverts commit 6e0bc5b078.
2025-03-20 21:57:52 -07:00
Yuan Tang
934de0a281
ci: Enforce concurrency to reduce CI loads (#1738)
# What does this PR do?

When multiple commits are pushed to a PR, multiple CI builds will be
triggered. This PR ensures that we only run one concurrent build for
each PR to reduce CI loads.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-03-20 22:28:47 -04:00
Hardik Shah
5b9c366614
fix: install pandas and numpy beforehand to avoid version mismatch (#1735)
As titled; due to the recent Colab upgrade, pandas was out of sync with
numpy, breaking `llama stack build` in Colab.
2025-03-20 17:14:05 -07:00
Dinesh Yeduguru
6104bd06a0
feat: add different sinks for otel traces and metrics (#1731)
# What does this PR do?
Since we now record and export metrics, we can no longer use a single
OTEL endpoint to export both traces and metrics. This PR adds two sinks,
OTEL_TRACE and OTEL_METRIC, so the exporters can be enabled selectively.
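
For illustration, a minimal sketch of selective exporter setup with the OpenTelemetry SDK; the sink names and configuration shape here are assumptions, not the provider's actual code:

```
from opentelemetry import metrics, trace
from opentelemetry.exporter.otlp.proto.http.metric_exporter import OTLPMetricExporter
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Hypothetical sink list read from the telemetry provider config.
sinks = {"otel_trace", "otel_metric"}

if "otel_trace" in sinks:
    # Only wire up the span exporter when the trace sink is enabled.
    tracer_provider = TracerProvider()
    tracer_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
    trace.set_tracer_provider(tracer_provider)

if "otel_metric" in sinks:
    # Metrics get their own exporter, so either sink can be enabled independently.
    reader = PeriodicExportingMetricReader(OTLPMetricExporter())
    metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))
```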

## Test Plan
Start server with OTEL_TRACE as sink and verify traces show up in jaeger
![Screenshot 2025-03-20 at 3 12 25 PM](https://github.com/user-attachments/assets/51007f28-b5ed-4853-912a-965a5cfe83af)
2025-03-20 15:51:41 -07:00
Hardik Shah
127bac6869
fix: Default to port 8321 everywhere (#1734)
As titled, moved all instances of 5001 to 8321
2025-03-20 15:50:41 -07:00
Hardik Shah
581e8ae562
fix: docker run with --pull always to fetch the latest image (#1733)
As titled
2025-03-20 15:35:48 -07:00
Ashwin Bharambe
f95bc29ca9
fix: handle registry errors gracefully (#1732)
We need to handle stale registry entries gracefully. More work is needed
for cases where important attributes are deleted from resources that may
already have been persisted, but at the very least the server must not
die.

## Test Plan

Added unit tests
2025-03-20 15:24:07 -07:00
Yuan Tang
f5a5c5d459
docs: Add instruction on enabling tool calling for remote vLLM (#1719)
# What does this PR do?

This PR adds a link to tool calling instructions in vLLM. Users have
asked about this many times, e.g.
https://github.com/meta-llama/llama-stack/issues/1648#issuecomment-2740642077

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-03-20 15:18:17 -07:00
Ihar Hrachyshka
be03cb7523
chore: Don't hide stderr from api generator (#1720)
# What does this PR do?

If the generator fails, pre-commit logs will now show how it failed.

Note: stdout is still suppressed, so that regular informational messages
do not pollute pre-commit output when all the hook does is update
generated files.
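
A rough Python sketch of the behavior (the actual hook wiring lives in the pre-commit config and may differ); the module path below is only an assumption:

```
import subprocess
import sys

# Suppress routine stdout chatter, but leave stderr attached so a crashing
# generator surfaces its traceback in the pre-commit output.
subprocess.run(
    [sys.executable, "-m", "docs.openapi_generator.generate"],  # hypothetical invocation
    check=True,
    stdout=subprocess.DEVNULL,
)
```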

## Test Plan

Inject a failure in the generator code and confirm it's seen in the
output.

```
$ git diff
diff --git a/docs/openapi_generator/pyopenapi/utility.py b/docs/openapi_generator/pyopenapi/utility.py
index f60a33bb..482e26ef 100644
--- a/docs/openapi_generator/pyopenapi/utility.py
+++ b/docs/openapi_generator/pyopenapi/utility.py
@@ -127,6 +127,7 @@ def is_optional_type(type_: Any) -> bool:

 def validate_api_method_return_types() -> List[str]:
     """Validate that all API methods have proper return types."""
+    raise NotImplementedError("This function is not implemented yet")
     errors = []
     protocols = api_protocol_map()
```

```
$ pre-commit run --all-files
check for merge conflicts................................................Passed
trim trailing whitespace.................................................Passed
check for added large files..............................................Passed
fix end of files.........................................................Passed
Insert license in comments...............................................Passed
ruff.....................................................................Passed
ruff-format..............................................................Passed
blacken-docs.............................................................Passed
uv-lock..................................................................Passed
uv-export................................................................Passed
mypy.....................................................................Passed
Distribution Template Codegen............................................Passed
API Spec Codegen.........................................................Failed
- hook id: openapi-codegen
- exit code: 1

warning: `VIRTUAL_ENV=/Users/ihrachys/.cache/pre-commit/repo9p35zuhm/py_env-python3` does not match the project environment path `.venv` and will be ignored; use `--active` to target the active environment instead
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/generate.py", line 91, in <module>
    fire.Fire(main)
  File "/Users/ihrachys/.cache/uv/archive-v0/FBgkcwcN-PaJ0NAur__7J/lib/python3.11/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/.cache/uv/archive-v0/FBgkcwcN-PaJ0NAur__7J/lib/python3.11/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/.cache/uv/archive-v0/FBgkcwcN-PaJ0NAur__7J/lib/python3.11/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/generate.py", line 44, in main
    return_type_errors = validate_api_method_return_types()
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/pyopenapi/utility.py", line 130, in validate_api_method_return_types
    raise NotImplementedError("This function is not implemented yet")
NotImplementedError: This function is not implemented yet
```

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-03-20 15:17:52 -07:00
Dinesh Yeduguru
86f617a197
fix: tracing middleware to not start for lifespan events (#1730)
# What does this PR do?
Tracing middleware should not start tracing for lifespan events.
Lifespan events happen at server startup and shutdown; if we start
tracing for them, we will have an active trace for the lifetime of the
server, which interferes with regular tracing since we expect traces
never to be nested.

We started hitting this issue since
https://github.com/meta-llama/llama-stack/pull/1495.
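
A sketch of the ASGI pattern (not the actual middleware code): lifespan scopes are passed through untouched, and tracing only starts for regular requests.

```
class TracingMiddleware:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] == "lifespan":
            # Startup/shutdown events: don't open a trace here, otherwise a
            # span would stay active for the server's entire lifetime.
            return await self.app(scope, receive, send)
        # ... start a request-scoped trace here, then delegate ...
        return await self.app(scope, receive, send)
```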

## Test Plan
* llama stack run ~/.llama/distributions/fireworks/fireworks-run.yaml
* Verify in sqlite store that the trace now has non null span id
![Screenshot 2025-03-20 at 1 49 47 PM](https://github.com/user-attachments/assets/d77354a7-d5f1-4b53-a946-6adbd7a4f772)
2025-03-20 14:22:19 -07:00
Yuan Tang
029e4fc64d
fix: Add missing gcc in container build. Fixes #1716 (#1727)
# What does this PR do?

This should fix https://github.com/meta-llama/llama-stack/issues/1716

## Test Plan

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-03-20 15:50:56 -04:00
ehhuang
ea6a4a14ce
feat(api): simplify client imports (#1687)
# What does this PR do?
closes #1554 

## Test Plan
test_agents.py
2025-03-20 10:15:49 -07:00
Ihar Hrachyshka
515c16e352
chore: mypy violations cleanup for inline::{telemetry,tool_runtime,vector_io} (#1711)
# What does this PR do?

Clean up mypy violations for inline::{telemetry,tool_runtime,vector_io}.
This also makes the API accept a tool call result without any content
(as the RAG tool may already produce).

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-03-20 10:01:10 -07:00
Ihar Hrachyshka
355134f51d
fix: Support types.UnionType in schemas (#1721)
# What does this PR do?

Since Python 3.10, unions can be expressed as `type1 | type2`. Sadly,
while this is functionally equivalent to `Union[type1, type2]`, the type
of the expression is different (`types.UnionType`, not `typing.Union`).

We should handle both in schemas.
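
A minimal illustration of the distinction (not the generator's actual check), assuming Python 3.10+:

```
import types
import typing
from typing import Union

def is_union_type(tp: typing.Any) -> bool:
    # Union[int, str] has origin typing.Union; `int | str` has origin types.UnionType.
    return typing.get_origin(tp) in (Union, types.UnionType)

assert is_union_type(Union[int, str])
assert is_union_type(int | str)
assert not is_union_type(list[int])
```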

## Test Plan

Switch a schema type from Union to `|` and confirm the generator doesn't
crash with:

```
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/generate.py", line 91, in <module>
    fire.Fire(main)
  File "/Users/ihrachys/.cache/uv/archive-v0/FBgkcwcN-PaJ0NAur__7J/lib/python3.11/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/.cache/uv/archive-v0/FBgkcwcN-PaJ0NAur__7J/lib/python3.11/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/.cache/uv/archive-v0/FBgkcwcN-PaJ0NAur__7J/lib/python3.11/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/generate.py", line 55, in main
    spec = Specification(
           ^^^^^^^^^^^^^^
  File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/pyopenapi/utility.py", line 30, in __init__
    self.document = generator.generate()
                    ^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/pyopenapi/generator.py", line 782, in generate
    operation = self._build_operation(op)
                ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/pyopenapi/generator.py", line 648, in _build_operation
    "application/json": builder.build_media_type(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/pyopenapi/generator.py", line 221, in build_media_type
    schema = self.schema_builder.classdef_to_ref(item_type)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/pyopenapi/generator.py", line 135, in classdef_to_ref
    type_schema = self.classdef_to_schema(typ)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/pyopenapi/generator.py", line 116, in classdef_to_schema
    type_schema, type_definitions = self.schema_generator.classdef_to_schema(typ)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/src/llama-stack/llama_stack/strong_typing/schema.py", line 607, in classdef_to_schema
    types_defined[sub_name] = self._type_to_schema_with_lookup(sub_type)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/src/llama-stack/llama_stack/strong_typing/schema.py", line 564, in _type_to_schema_with_lookup
    type_schema = self.type_to_schema(data_type, force_expand=True)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/src/llama-stack/llama_stack/strong_typing/schema.py", line 320, in type_to_schema
    return self._type_to_schema(data_type, force_expand, json_schema_extra) | common_info
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/src/llama-stack/llama_stack/strong_typing/schema.py", line 487, in _type_to_schema
    property_docstrings = get_class_property_docstrings(typ, self.options.property_description_fun)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/src/llama-stack/llama_stack/strong_typing/schema.py", line 94, in get_class_property_docstrings
    for base in inspect.getmro(data_type):
                ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/w2wykgpkzidnnr6cpw8wf94ghb0p8big-python3-3.11.11/lib/python3.11/inspect.py", line 731, in getmro
    return cls.__mro__
           ^^^^^^^^^^^
AttributeError: 'types.UnionType' object has no attribute '__mro__'. Did you mean: '__or__'?
```

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-03-20 09:54:02 -07:00
Ihar Hrachyshka
5403582582
fix: Restore discriminator for AlgorithmConfig (#1706) 2025-03-20 07:33:26 -07:00
ehhuang
af8b4484a3
fix: update default tool call system prompt (#1712)
# What does this PR do?
closes #1584 

This should be a rather innocuous change. 

## Test Plan

Verify that there's no more tool call parsing error, for example in the issue:
<img width="1216" alt="image" src="https://github.com/user-attachments/assets/a5a6f4e8-2093-4ca2-bc06-794b707a0429" />

LLAMA_STACK_CONFIG=fireworks pytest -s -v
tests/integration/agents/test_agents.py --safety-shield
meta-llama/Llama-Guard-3-8B --text-model
meta-llama/Llama-3.1-8B-Instruct
2025-03-19 22:49:24 -07:00
Ashwin Bharambe
01a25d9744
feat(server): add attribute based access control for resources (#1703)
This PR introduces a way to implement Attribute Based Access Control
(ABAC) for the Llama Stack server.

The rough design is:
- https://github.com/meta-llama/llama-stack/pull/1626 added a way for
the Llama Stack server to query an authenticator
- We build upon that and expect "access attributes" as part of the
response. These attributes indicate the scopes available for the
request.
- We use these attributes to perform access control for registered
resources as well as for constructing the default access control
policies for newly created resources.
- By default, if you support authentication but don't return access
attributes, we will add a unique namespace pointing to the API_KEY. That
way, all resources by default will be scoped to API_KEYs.

An important aspect of this design is that Llama Stack stays out of the
business of credential management or the CRUD for attributes. How you
manage your namespaces or projects is entirely up to you. The design
only implements access control checks for the metadata / book-keeping
information that the Stack tracks.
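
A hypothetical sketch of the idea (the actual `AccessAttributes` fields and check in this PR may differ): a requester needs a matching value in every attribute category the resource declares.

```
from dataclasses import dataclass, field

@dataclass
class AccessAttributes:
    # illustrative categories only; the real structure may differ
    roles: list[str] = field(default_factory=list)
    teams: list[str] = field(default_factory=list)

def is_allowed(resource: AccessAttributes, requester: AccessAttributes) -> bool:
    for category in ("roles", "teams"):
        required = set(getattr(resource, category))
        if required and required.isdisjoint(getattr(requester, category)):
            return False
    return True

# A resource scoped to the "ml-eng" team is visible only to members of that team.
resource = AccessAttributes(teams=["ml-eng"])
assert is_allowed(resource, AccessAttributes(teams=["ml-eng"]))
assert not is_allowed(resource, AccessAttributes(teams=["data"]))
```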

### Limitations

- Currently, read vs. write vs. admin permissions aren't made explicit,
but this can be easily extended by adding appropriate attributes to the
`AccessAttributes` data structure.
- This design does not apply to agent instances since they are not
considered resources the Stack knows about. Agent instances are
completely within the scope of the Agents API provider.

### Test Plan

Added unit tests, existing integration tests
2025-03-19 21:28:52 -07:00
ehhuang
c4e1b8d094
fix: better tool call parsing error message (#1710)
# What does this PR do?
context #1584

## Test Plan
<img width="1366" alt="image"
src="https://github.com/user-attachments/assets/b490b590-3270-43cb-838e-8446a8948f1d"
/>
2025-03-19 20:39:10 -07:00
Ihar Hrachyshka
41bd350539
chore: Don't set type variables from register_schema() (#1713)
# What does this PR do?

Don't set type variables from register_schema().

`mypy` is not happy about it, since the type variables are computed at
runtime and hence the typing hints are not available during static
analysis.

The good news is that there is no good reason to set the variables from
the return type.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-03-19 20:29:00 -07:00
Charlie Doern
a483a58c6e
chore: deprecate /v1/inspect/providers (#1678)
# What does this PR do?

With the new /v1/providers API, /v1/inspect/providers is duplicative.
Deprecate it by removing the route, and add a test for the full
/v1/providers API.

resolves #1623 

## Test Plan

`uv run pytest -v tests/integration/providers --stack-config=ollama
--text-model="meta-llama/Llama-3.2-3B-Instruct"
--embedding-model=all-MiniLM-L6-v2`

<img width="1512" alt="Screenshot 2025-03-18 at 9 18 38 AM"
src="https://github.com/user-attachments/assets/2db30f25-3ff6-4374-b39d-0047f093fe36"
/>

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-03-19 20:27:06 -07:00
Charlie Doern
1f04ca357b
fix: telemetry logger (#1714)
# What does this PR do?

Currently, if you have a run yaml without telemetry, the following error
is hit:

TypeError: TelemetryAdapter.__init__() missing 1 required positional
argument: 'deps'

This is because the TelemetryAdapter requires a deps arg to be passed.
Pass {} to avoid errors.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-03-19 20:26:13 -07:00
Botao Chen
f369871083
feat: [New Eval Benchmark] IfEval (#1708)
# What does this PR do?
In this PR, we added a new open eval benchmark, IfEval, based on the paper
https://arxiv.org/abs/2311.07911, to measure the model's instruction-following
capability.


## Test Plan
spin up a llama stack server with open-benchmark template

run `llama-stack-client --endpoint xxx eval run-benchmark
"meta-reference-ifeval" --model-id "meta-llama/Llama-3.3-70B-Instruct"
--output-dir "/home/markchen1015/" --num-examples 20` on client side and
get the eval aggregate results
2025-03-19 16:39:59 -07:00
Michael Clifford
a7008dc15d
fix: Correctly set CLI_ARGS using BUILD_PLATFORM env with llama stack… (#1702)
# What does this PR do?
This PR updates `build_container.sh` to prevent an "unknown flag" error
when using the `BUILD_PLATFORM` environment variable during `llama stack
build`.


Closes #1699 


## Test Plan

Running the following command without these changes results in an "unknown
flag" error.

```
CONTAINER_BINARY=podman BUILD_PLATFORM=linux/amd64 llama stack build --template ollama --image-type container 
``` 

With these changes, the same command should build the image correctly.

Signed-off-by: Michael Clifford <mcliffor@redhat.com>
2025-03-19 16:18:11 -07:00
ehhuang
b6b103a20d
docs: update for mcp tools (#1705)
# What does this PR do?


## Test Plan
read
2025-03-19 15:45:53 -07:00
yyymeta
d117bfe597
feat: [new open benchmark] DocVQA (#1647)
# What does this PR do?
DocVQA asks the model to look at a picture, then answer a question given
in text, producing a text answer based on the textual information in the
picture. These questions often require understanding the relative
positions of text within the picture.

The original dataset is defined as "Task 1" of
https://www.docvqa.org/datasets


## Test Plan
Set up the llama stack server with

```
llama stack run ./llama_stack/templates/open-benchmark/run.yaml
```


then send traffic:

```
 llama-stack-client eval run-benchmark "meta-reference-docvqa"  --model-id   meta-llama/Llama-3.3-70B-Instruct     --output-dir /tmp/gpqa    --num-examples   200
```
2025-03-19 14:56:14 -07:00
ehhuang
1902e5754c
fix: toolgroups unregister (#1704)
# What does this PR do?
```
FAILED tests/integration/tools/test_tools.py::test_toolsgroups_unregister[None]
- AttributeError: 'coroutine' object has no attribute 'data'
```
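
For reference, that AttributeError is the usual symptom of a missing `await`; a hypothetical sketch of the corrected call pattern (client names are illustrative):

```
async def list_toolgroups(client):
    # Without the await, the call returns a coroutine object, and accessing
    # .data on it raises AttributeError: 'coroutine' object has no attribute 'data'.
    response = await client.toolgroups.list()
    return response.data
```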

## Test Plan
LLAMA_STACK_CONFIG=fireworks pytest -s -v
tests/integration/tools/test_tools.py
---
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/1704).
* #1705
* __->__ #1704
2025-03-19 13:43:51 -07:00
Botao Chen
ab777ef5cd
fix: fix open-benchmark template (#1695)
## What does this PR do?
The open-benchmark template is broken after the datasets API refactor for
2 reasons:
- provider_id and provider_resource_id are no longer needed
- the type in run.yaml will be resolved as dict

This PR fixes the above 2 issues.

## Test
Spin up a llama stack server successfully with
`llama stack run llama_stack/templates/open-benchmark/run.yaml`
2025-03-19 11:27:11 -07:00
Derek Higgins
6949bd1999
fix: Call pandas.read_* in a separate thread (#1698)
These block on I/O reads, which in turn block the
server. Move them to their own thread.

Closes: #1697

# What does this PR do?
To avoid blocking the main event loop, this updates datasetio/localfs to
load data in a separate thread.
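
A minimal sketch of the pattern (not the actual datasetio code), using `asyncio.to_thread` so the blocking read runs off the event loop:

```
import asyncio

import pandas as pd

async def load_rows(path: str) -> list[dict]:
    # pandas.read_csv does blocking file I/O; running it in a worker thread
    # keeps the server's event loop responsive while the file is read.
    df = await asyncio.to_thread(pd.read_csv, path)
    return df.to_dict(orient="records")
```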

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-03-19 10:46:37 -07:00
Hardik Shah
65ca85ba6b
fix: Updating ToolCall.arguments to allow for json strings that can be decoded on client side (#1685)
### What does this PR do?

Currently, `ToolCall.arguments` is a `Dict[str, RecursiveType]`.
However, on the client SDK side the `RecursiveType` gets deserialized
into a number (both int and float get collapsed), so when params are
`int` they get converted to float, which might break client-side tools
that do type checking.
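
A small illustration of why a JSON-encoded string sidesteps the problem (names are illustrative, not the SDK's): the client decodes the string itself, so integer parameters keep their type.

```
import json

args = {"limit": 5, "threshold": 0.25}
encoded = json.dumps(args)     # what the server would ship to the client
decoded = json.loads(encoded)  # the client decodes it locally

assert isinstance(decoded["limit"], int)        # stays an int, not 5.0
assert isinstance(decoded["threshold"], float)
```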

Closes: https://github.com/meta-llama/llama-stack/issues/1683

### Test Plan
Stainless changes --
https://github.com/meta-llama/llama-stack-client-python/pull/204
```
pytest -s -v --stack-config=fireworks tests/integration/agents/test_agents.py  --text-model meta-llama/Llama-3.1-8B-Instruct
```
2025-03-19 10:36:19 -07:00
ehhuang
113f3a259c
docs: add documentation for RAGDocument (#1693)
# What does this PR do?


## Test Plan
2025-03-19 10:16:00 -07:00
Francisco Arceo
5418e63919
chore: Add triagers list #1561 (#1701)
# What does this PR do?
Adds triagers list

## Closes #1561

## Documentation
Was provided here: https://github.com/meta-llama/llama-stack/pull/1621

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-03-19 09:59:17 -07:00
Yuan Tang
7c0448456e
docs: Remove mentions of focus on Llama models (#1690)
# What does this PR do?

This is a follow-up to
https://github.com/meta-llama/llama-stack/issues/965 to avoid implying
exclusive support for Llama models.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-03-19 00:17:22 -04:00
Ashwin Bharambe
5b39d5a76a
feat(auth, rfc): Add support for Bearer (api_key) Authentication (#1626)
This PR adds (or proposes) support for API key authentication on the
Llama Stack server end. `llama-stack-client` already accepts an api_key
parameter and passes it down with every request as an `Authorization:`
header.

Currently, Llama Stack does not propose APIs for handling authentication
or authorization for resources of any kind. Given that, and the fact
that any deployment will typically have _some_ authentication system
present, we simply adopt a delegation mechanism: delegate to an HTTPS
endpoint performing key management / authentication.

It is configured via: 
```yaml
server: 
   auth:
     endpoint: <...>
```

in the run.yaml configuration.


## How It Works

When authentication is enabled:

1. Every API request must include an `Authorization: Bearer <token>`
header
2. The server will send a _POST_ validation request to the configured
endpoint with the following payload:
   ```json
   {
     "api_key": "<token>",
     "request": {
       "path": "/api/path",
       "headers": { "header1": "value1", ... },
       "params": { "param1": "value1", ... }
     }
   }
   ```
3. If the authentication endpoint returns a 200 status code, the request
is allowed to proceed
4. If the authentication endpoint returns any other status code, a 401
Unauthorized response is returned
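
A hedged sketch of the validation call described above (not the server's actual implementation); the function name and `httpx` usage are assumptions:

```
import httpx

async def validate_request(token: str, auth_endpoint: str, path: str, headers: dict, params: dict) -> bool:
    payload = {
        "api_key": token,
        "request": {"path": path, "headers": headers, "params": params},
    }
    async with httpx.AsyncClient() as client:
        # Any status other than 200 means the request should get a 401 back.
        resp = await client.post(auth_endpoint, json=payload)
        return resp.status_code == 200
```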

## Test Plan

Unit tests
2025-03-18 16:24:18 -07:00
yyymeta
b79e0435de
fix: avoid tensor memory error (#1688)
# What does this PR do?

We randomly get errors like the following; it's most likely due to
accessing an object that has already been deallocated:

```

E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732] Traceback (most recent call last):
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]   File "/home/yyy/.conda/envs/myenv/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 90, in _wrap
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]     fn(i, *args)
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]   File "/home/yyy/.conda/envs/myenv/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 611, in _wrap
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]     ret = record(fn)(*args_)
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]   File "/home/yyy/.conda/envs/myenv/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]     return f(*args, **kwargs)
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]   File "/home/yyy/internal-llama-stack/llama_stack/providers/inline/inference/meta_reference/parallel_utils.py", line 249, in worker_process_entrypoint
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]     task = req_gen.send(result)
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]   File "/home/yyy/internal-llama-stack/llama_stack/providers/inline/inference/meta_reference/parallel_utils.py", line 156, in retrieve_requests
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]     torch.distributed.broadcast_object_list(
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]   File "/home/yyy/.conda/envs/myenv/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 81, in wrapper
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]     return func(*args, **kwargs)
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]   File "/home/yyy/.conda/envs/myenv/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 3504, in broadcast_object_list
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]     object_list[i] = _tensor_to_object(obj_view, obj_size, group)
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]   File "/home/yyy/.conda/envs/myenv/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 2961, in _tensor_to_object
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]     return _unpickler(io.BytesIO(buf)).load()
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732] EOFError: Ran out of input
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]
Process SpawnProcess-1:
Traceback (most recent call last):
```

## Test Plan
Start the server, then run:

```
llama-stack-client eval run-benchmark mmmu_v1  --model-id meta-llama/Llama-4-17B-Omni-Instruct  --output-dir /tmp/mmmu_standard --num-examples 30
```

2025-03-18 16:17:29 -07:00
Sarthak Deshpande
9c8e88ea9c
fix: Fixed import errors for UI and playground (#1666)
# What does this PR do?
Fixed import errors for playground and ui

---------

Co-authored-by: sarthakdeshpande <sarthak.deshpande@engati.com>
2025-03-18 15:00:48 -07:00
Ihar Hrachyshka
0cbb7f7f21
chore: fix mypy violations in post_training modules (#1548)
# What does this PR do?

Fixes a bunch of violations.

Note: this patch touches all files except post_training.py, which will be
significantly changed by #1437, hence leaving it out of the picture for
now.


## Test Plan

Testing with https://github.com/meta-llama/llama-stack/pull/1543

Also checked that GPU training works with the change:

```
INFO:     ::1:53316 - "POST /v1/post-training/supervised-fine-tune HTTP/1.1" 200 OK
INFO:     ::1:53316 - "GET /v1/post-training/job/status?job_uuid=test-jobb5ca2d84-d541-42f8-883b-762828b4c0e7 HTTP/1.1" 200 OK
INFO:     ::1:53316 - "GET /v1/post-training/job/artifacts?job_uuid=test-jobb5ca2d84-d541-42f8-883b-762828b4c0e7 HTTP/1.1" 200 OK
21:24:01.161 [END] /v1/post-training/supervised-fine-tune [StatusCode.OK] (32526.75ms)
 21:23:28.769 [DEBUG] Setting manual seed to local seed 3918872849. Local seed is seed + rank = 3918872849 + 0
 21:23:28.996 [INFO] Identified model_type = Llama3_2. Ignoring output.weight in checkpoint in favor of the tok_embedding.weight tied weights.
 21:23:29.933 [INFO] Memory stats after model init:
        GPU peak memory allocation: 6.05 GiB
        GPU peak memory reserved: 6.10 GiB
        GPU peak memory active: 6.05 GiB
 21:23:29.934 [INFO] Model is initialized with precision torch.bfloat16.
 21:23:30.115 [INFO] Tokenizer is initialized.
 21:23:30.118 [INFO] Optimizer is initialized.
 21:23:30.119 [INFO] Loss is initialized.
 21:23:30.896 [INFO] Dataset and Sampler are initialized.
 21:23:30.898 [INFO] Learning rate scheduler is initialized.
 21:23:31.618 [INFO] Memory stats after model init:
        GPU peak memory allocation: 6.24 GiB
        GPU peak memory reserved: 6.30 GiB
        GPU peak memory active: 6.24 GiB
 21:23:31.620 [INFO] Starting checkpoint save...
 21:23:59.428 [INFO] Model checkpoint of size 6.43 GB saved to /home/ec2-user/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/consolidated.00.pth
 21:23:59.445 [INFO] Adapter checkpoint of size 0.00 GB saved to /home/ec2-user/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/adapter/adapter.pth

```


Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-03-18 14:58:16 -07:00
Sébastien Han
f86f3cf878
docs: remove redundant installation instructions (#1138)
# What does this PR do?

The previous installation instructions were mostly duplicating
information already covered in the documentation, either in the “Start a
Server” or “Contributing Guide” sections. Removed these redundant
details to avoid confusion and streamline the setup process.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-03-18 14:52:21 -07:00
Yuan Tang
22e560351e
ci: Add scheduled workflow to update changelog (#1503)
# What does this PR do?

This is a follow-up from
https://github.com/meta-llama/llama-stack/pull/1463. cc @yanxi0830

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
2025-03-18 14:39:22 -07:00
Sarthak Deshpande
5ece262976
chore: Make code interpreter async (#1654)
# What does this PR do?
Made the code interpreter tool call async so that it is non-blocking.

## Test Plan
pytest -s -v tests/integration/agents/test_agents.py
--stack-config=together --text-model=meta-llama/Llama-3.3-70B-Instruct
<img width="1693" alt="image"
src="https://github.com/user-attachments/assets/42520bb6-7acf-42d5-b71f-b35ca149d722"
/>



Co-authored-by: sarthakdeshpande <sarthak.deshpande@engati.com>
2025-03-18 14:13:46 -07:00
Yuan Tang
d609ffce2a
chore: Add links and badges to both unit and integration tests (#1632)
# What does this PR do?

This makes it easier to know the statuses of both and to identify failed
builds.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-03-18 14:12:17 -07:00
Sébastien Han
c029fbcd13
fix: return 4xx for non-existent resources in GET requests (#1635)
# What does this PR do?

- Removed Optional return types for GET methods
- Raised ValueError when requested resource is not found
- Ensures proper 4xx response for missing resources
- Updated the API generator to check for wrong signatures

```
$ uv run --with ".[dev]" ./docs/openapi_generator/run_openapi_generator.sh
Validating API method return types...

API Method Return Type Validation Errors:

Method ScoringFunctions.get_scoring_function returns Optional type
```
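
A minimal sketch of the resulting pattern (the registry lookup below is hypothetical): raise instead of returning `None`, and let the server translate the error into a 4xx response.

```
_REGISTRY: dict[str, dict] = {}  # hypothetical in-memory registry

async def get_model(model_id: str) -> dict:
    model = _REGISTRY.get(model_id)
    if model is None:
        # Returning an Optional previously produced a 200 with a null body;
        # raising lets the server reply with a 4xx and a clear message instead.
        raise ValueError(f"Model '{model_id}' not found")
    return model
```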

Closes: https://github.com/meta-llama/llama-stack/issues/1630

## Test Plan

Run the server then:

```
curl http://127.0.0.1:8321/v1/models/foo     
{"detail":"Invalid value: Model 'foo' not found"}%  
```

Server log:

```
INFO:     127.0.0.1:52307 - "GET /v1/models/foo HTTP/1.1" 400 Bad Request
09:51:42.654 [END] /v1/models/foo [StatusCode.OK] (134.65ms)
 09:51:42.651 [ERROR] Error executing endpoint route='/v1/models/{model_id:path}' method='get'
Traceback (most recent call last):
  File "/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py", line 193, in endpoint
    return await maybe_await(value)
  File "/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py", line 156, in maybe_await
    return await value
  File "/Users/leseb/Documents/AI/llama-stack/llama_stack/providers/utils/telemetry/trace_protocol.py", line 102, in async_wrapper
    result = await method(self, *args, **kwargs)
  File "/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/routers/routing_tables.py", line 217, in get_model
    raise ValueError(f"Model '{model_id}' not found")
ValueError: Model 'foo' not found
```

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-03-18 14:06:53 -07:00
Daniele Martinoli
cca9bd6cc3
feat: Qdrant inline provider (#1273)
# What does this PR do?
Removed local execution option from the remote Qdrant provider and
introduced an explicit inline provider for the embedded execution.
Updated the ollama template to include this option: this part can be
reverted in case we don't want to have two default `vector_io`
providers.

(Closes #1082)

## Test Plan
Build and run an ollama distro:
```bash
llama stack build --template ollama --image-type conda
llama stack run --image-type conda ollama
```

Run one of the sample ingestion applications like
[rag_with_vector_db.py](https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/rag_with_vector_db.py),
but replace this line:
```py
    selected_vector_provider = vector_providers[0]
```
with the following, to use the `qdrant` provider:
```py
    selected_vector_provider = vector_providers[1]
```

After running the test code, verify the timestamp of the Qdrant store:
```bash
% ls -ltr ~/.llama/distributions/ollama/qdrant.db/collection/test_vector_db_*
total 784
-rw-r--r--@ 1 dmartino  staff  401408 Feb 26 10:07 storage.sqlite
```


---------

Signed-off-by: Daniele Martinoli <dmartino@redhat.com>
Co-authored-by: Francisco Arceo <farceo@redhat.com>
2025-03-18 14:04:21 -07:00
Nathan Weinberg
141b3c14dd
docs: fix broken test path in CONTRIBUTING.md (#1679)
# What does this PR do?
fix broken test path in CONTRIBUTING.md

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-03-18 13:39:46 -07:00
Ihar Hrachyshka
814eb75321
chore: enable ruff for ./scripts too (#1643)
# What does this PR do?

Enable ruff for scripts.

## Test Plan

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-03-18 12:17:21 -07:00
Matthew Farrellee
706b4ca651
feat: support nvidia hosted vision models (llama 3.2 11b/90b) (#1278)
# What does this PR do?

Support NVIDIA-hosted Llama 3.2 11B/90B vision models. They are not hosted
on the common https://integrate.api.nvidia.com/v1; they are hosted on their
own individual URLs.

## Test Plan

`LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -s -v
tests/client-sdk/inference/test_vision_inference.py
--inference-model=meta/llama-3.2-11b-vision-instruct -k image`
2025-03-18 11:54:10 -07:00