llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-03 18:00:36 +00:00

Author	SHA1	Message	Date
Ihar Hrachyshka	18bac27d4e	fix: Use CONDA_DEFAULT_ENV presence as a flag to use conda mode (#1555 ) # What does this PR do? This is the second attempt to switch to system packages by default. Now with a hack to detect conda environment - in which case conda image-type is used. Note: Conda will only be used when --image-name is unset and CONDA_DEFAULT_ENV is set. This means that users without conda will correctly fall back to using system packages when no --image-* arguments are passed at all. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan Uses virtualenv: ``` $ llama stack build --template ollama --image-type venv $ llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml [...] Using virtual environment: /home/ec2-user/src/llama-stack/schedule/.local [...] ``` Uses system packages (virtualenv already initialized): ``` $ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml [...] INFO 2025-03-27 20:46:22,882 llama_stack.cli.stack.run:142 server: No image type or image name provided. Assuming environment packages. [...] ``` Attempt to run from environment packages without necessary packages installed: ``` $ python -m venv barebones $ . ./barebones/bin/activate $ pip install -e . # to install llama command $ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml [...] ModuleNotFoundError: No module named 'fastapi' ``` ^ failed as expected because the environment doesn't have necessary packages installed. Now install some packages in the new environment: ``` $ pip install fastapi opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp aiosqlite ollama openai datasets faiss-cpu mcp autoevals $ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml [...] 
Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit) ``` Now see if setting CONDA_DEFAULT_ENV will change what happens by default: ``` $ export CONDA_DEFAULT_ENV=base $ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml [...] Using conda environment: base Conda environment base does not exist. [...] ``` --------- Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-03-27 17:13:22 -04:00
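The selection rule described in #1555 above can be sketched in Python. This is a minimal illustration and not the actual `llama stack run` code: the function name `pick_image_type` and its parameters are hypothetical, and the assumption that an explicit `--image-type` always wins is mine.

```python
import os
from typing import Mapping, Optional


def pick_image_type(
    image_type: Optional[str],
    image_name: Optional[str],
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the image type to use, or None for environment (system) packages.

    Hypothetical helper; assumes an explicit --image-type always takes priority.
    """
    if image_type:
        return image_type
    # Rule from the PR: conda is used only when --image-name is unset
    # and CONDA_DEFAULT_ENV is set.
    if image_name is None and environ.get("CONDA_DEFAULT_ENV"):
        return "conda"
    # Everything else falls through to None in this sketch. The PR only
    # specifies this outcome for the case where no --image-* arguments are passed.
    return None
```

Passing the environment as a parameter keeps the rule testable without mutating `os.environ`.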
Dmitry Rogozhkin	935e706b15	docs: fix remote-vllm instructions (#1805 ) # What does this PR do? * Fix location of `run.yaml` relative to the cloned llama stack repository * Drop `-it` from `docker run` commands as it's not needed for running services ## Test Plan * Verified running the llama stack following the updated instructions CC: @ashwinb Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>	2025-03-27 10:19:51 -04:00
Antonin Stefanutti	9d9ab7e7dd	chore: Remove style tags from log formatter (#1808 ) # What does this PR do? Set a formatter for log file handler that does not pollute log messages with color tags. ## Test Plan Successfully tested with `LLAMA_STACK_LOG_FILE=server.log llama stack run ...`	2025-03-27 10:18:21 -04:00
Sébastien Han	e3578b1c1b	chore: remove distributions dir (#1809 ) # What does this PR do? Followup on https://github.com/meta-llama/llama-stack/pull/1801. Move the deps files to llama_stack/templates. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-03-27 09:03:39 -04:00
Hardik Shah	cb2a9784ab	fix: multiple issues with getting_started notebook (#1795 ) Fixes multiple issues: 1. `llama stack build` of dependencies was breaking with incompatible numpy / pandas versions when importing datasets. Moved the notebook to start a local server instead of using the library as a client. This way the setup is cleaner since it's all contained, and by using `uv run --with` we can also test the server setup process in CI and at release time. 2. The change in [1] surfaced some other issues: - running `llama stack run` was defaulting to the conda env name - provider data was not being managed properly - some notebook cells (telemetry for evals) were not updated with the latest changes. Fixed all the issues and updated the notebook. ### Test 1. Manually ran it all in a local env 2. `pytest -v -s --nbval-lax docs/getting_started.ipynb`	2025-03-26 10:59:12 -07:00
Ihar Hrachyshka	367c08f01e	feat(api): don't return a payload on file delete (#1640 ) # What does this PR do? This is to stay consistent with other APIs. This change registers files in API, even though there are still no providers. Removing tests that require a provider existing for a merged API to enable it in API layer. Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com> [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] [//]: # (## Documentation) Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-03-25 17:12:36 -07:00
ehhuang	2f38851751	chore: Revert "chore(telemetry): remove service_name entirely" (#1785 ) Reverts meta-llama/llama-stack#1755 closes #1781	2025-03-25 14:42:05 -07:00
Rashmi Pawar	1a73f8305b	feat: Add nemo customizer (#1448 ) # What does this PR do? This PR adds support for NVIDIA's NeMo Customizer API to the Llama Stack post-training module. The integration enables users to fine-tune models using NVIDIA's cloud-based customization service through a consistent Llama Stack interface. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] Yet to be done Things pending under this PR: - [x] Integration of fine-tuned model(new checkpoint) for inference with nvidia llm distribution - [x] distribution integration of API - [x] Add test cases for customizer(In Progress) - [x] Documentation ``` LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -v tests/client-sdk/post_training/test_supervised_fine_tuning.py ============================================================================================================================================================================ test session starts ============================================================================================================================================================================= platform linux -- Python 3.10.0, pytest-8.3.4, pluggy-1.5.0 -- /home/ubuntu/llama-stack/.venv/bin/python cachedir: .pytest_cache metadata: {'Python': '3.10.0', 'Platform': 'Linux-6.8.0-1021-gcp-x86_64-with-glibc2.35', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'nbval': '0.11.0', 'metadata': '3.1.1', 'anyio': '4.8.0', 'html': '4.1.1', 'asyncio': '0.25.3'}} rootdir: /home/ubuntu/llama-stack configfile: pyproject.toml plugins: nbval-0.11.0, metadata-3.1.1, anyio-4.8.0, html-4.1.1, asyncio-0.25.3 asyncio: mode=strict, asyncio_default_fixture_loop_scope=None collected 2 items 
tests/client-sdk/post_training/test_supervised_fine_tuning.py::test_post_training_provider_registration[txt=8B] PASSED [ 50%] tests/client-sdk/post_training/test_supervised_fine_tuning.py::test_list_training_jobs[txt=8B] PASSED [100%] ======================================================================================================================================================================== 2 passed, 1 warning in 0.10s ======================================================================================================================================================================== ``` cc: @mattf @dglogo @sumitb --------- Co-authored-by: Ubuntu <ubuntu@llama-stack-customizer-dev-inst-2tx95fyisatvlic4we8hidx5tfj.us-central1-a.c.brevdevprod.internal>	2025-03-25 11:01:10 -07:00
Yuan Tang	441016bee8	feat: Support "stop" parameter in remote:vLLM (#1715 ) # What does this PR do? This adds support for "stop" parameter: https://platform.openai.com/docs/api-reference/completions/create#completions-create-stop ## Test Plan ``` tests/integration/inference/test_text_inference.py::test_text_completion_non_streaming[txt=8B-inference:completion:sanity] PASSED [ 5%] tests/integration/inference/test_text_inference.py::test_text_completion_streaming[txt=8B-inference:completion:sanity] PASSED [ 11%] tests/integration/inference/test_text_inference.py::test_text_completion_stop_sequence[txt=8B-inference:completion:stop_sequence] PASSED [ 16%] tests/integration/inference/test_text_inference.py::test_text_completion_log_probs_non_streaming[txt=8B-inference:completion:log_probs] PASSED [ 22%] tests/integration/inference/test_text_inference.py::test_text_completion_log_probs_streaming[txt=8B-inference:completion:log_probs] PASSED [ 27%] tests/integration/inference/test_text_inference.py::test_text_completion_structured_output[txt=8B-inference:completion:structured_output] PASSED [ 33%] tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=8B-inference:chat_completion:non_streaming_01] PASSED [ 38%] tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=8B-inference:chat_completion:non_streaming_02] PASSED [ 44%] tests/integration/inference/test_text_inference.py::test_text_chat_completion_first_token_profiling[txt=8B-inference:chat_completion:ttft] ^TPASSED [ 50%] tests/integration/inference/test_text_inference.py::test_text_chat_completion_streaming[txt=8B-inference:chat_completion:streaming_01] PASSED [ 55%] tests/integration/inference/test_text_inference.py::test_text_chat_completion_streaming[txt=8B-inference:chat_completion:streaming_02] PASSED [ 61%] 
tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_non_streaming[txt=8B-inference:chat_completion:tool_calling] PASSED [ 66%] tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_streaming[txt=8B-inference:chat_completion:tool_calling] PASSED [ 72%] tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_tool_choice_required[txt=8B-inference:chat_completion:tool_calling] PASSED [ 77%] tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_tool_choice_none[txt=8B-inference:chat_completion:tool_calling] PASSED [ 83%] tests/integration/inference/test_text_inference.py::test_text_chat_completion_structured_output[txt=8B-inference:chat_completion:structured_output] PASSED [ 88%] tests/integration/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[txt=8B-inference:chat_completion:tool_calling_tools_absent-True] PASSED [ 94%] tests/integration/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[txt=8B-inference:chat_completion:tool_calling_tools_absent-False] PASSED [100%] =============================================================== 18 passed, 3 warnings in 755.79s (0:12:35) =============================================================== ``` --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-03-24 12:42:55 -07:00
Francisco Arceo	9e1ddf2b53	chore: Updating sqlite-vec to make non-blocking calls (#1762 ) # What does this PR do? This PR updates the sqlite-vec database calls to be non-blocking. Note that each operation creates a new connection, which incurs some performance overhead but is reasonable given [SQLite's threading and connections constraints](https://www.sqlite.org/threadsafe.html). Summary of changes: - Refactored `SQLiteVecIndex` class to store database path instead of connection object - Added `_create_sqlite_connection()` helper function to create connections on demand - Ensured proper connection closure in all database operations - Fixed test fixtures to use a file-based SQLite database for thread-safety - Updated the `SQLiteVecVectorIOAdapter` class to handle per-operation connections This PR helps chip away at https://github.com/meta-llama/llama-stack/issues/1489 ## Test Plan sqlite-vec unit tests passed locally as well as a test script using the client as a library. ## Misc FYI @varshaprasad96 @kevincogan Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-03-23 17:25:44 -07:00
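The per-operation connection pattern described in #1762 above can be sketched with the standard library alone. This is an illustration under assumptions, not the `SQLiteVecIndex` code: the class `PathBackedStore`, the `chunks` table, and the use of `asyncio.to_thread` are hypothetical choices; only the idea of storing the database path and creating a connection on demand via `_create_sqlite_connection()` comes from the PR description.

```python
import asyncio
import os
import sqlite3
import tempfile


def _create_sqlite_connection(db_path: str) -> sqlite3.Connection:
    # A fresh connection per operation, so no connection object is shared
    # between threads.
    return sqlite3.connect(db_path)


class PathBackedStore:
    """Hypothetical store that keeps the database path, not a connection."""

    def __init__(self, db_path: str) -> None:
        self.db_path = db_path

    def _add_sync(self, text: str) -> None:
        conn = _create_sqlite_connection(self.db_path)
        try:
            conn.execute("CREATE TABLE IF NOT EXISTS chunks (content TEXT)")
            conn.execute("INSERT INTO chunks (content) VALUES (?)", (text,))
            conn.commit()
        finally:
            conn.close()  # always close, even if the statement fails

    def _count_sync(self) -> int:
        conn = _create_sqlite_connection(self.db_path)
        try:
            return conn.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]
        finally:
            conn.close()

    async def add(self, text: str) -> None:
        # Run the blocking call in a worker thread so the event loop stays free.
        await asyncio.to_thread(self._add_sync, text)

    async def count(self) -> int:
        return await asyncio.to_thread(self._count_sync)


async def demo() -> int:
    # A file-based database, since each connection to ":memory:" would see
    # its own separate database.
    with tempfile.TemporaryDirectory() as tmp:
        store = PathBackedStore(os.path.join(tmp, "vec.db"))
        await store.add("first chunk")
        await store.add("second chunk")
        return await store.count()
```

Opening a connection per call costs a little on every operation, which is the trade-off the PR description itself points out.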
Xi Yan	094eb6a5ae	feat(rag): entire document context with attachments (#1763 ) # What does this PR do? Instead of creating a vector DB ad hoc and chunking when documents are sent as attachments to an agent turn, we pass the raw text of each document directly into the messages sent to the model as user context, and let the model perform summarization directly. This removes the magic behaviour and yields better performance than the existing approach. Improved Performance - RAG lifecycle notebook - Model: 0.3 factuality score - (+ websearch) Agent: 0.44 factuality score - (+ vector db) Agent: 0.3 factuality score - (+ raw context) Agent: 0.6 factuality score Closes https://github.com/meta-llama/llama-stack/issues/1478 [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan - [NEW] added section in RAG lifecycle notebook shows better performance <img width="840" alt="image" src="https://github.com/user-attachments/assets/a0c4e816-809a-41c0-9124-89825983e3f5" /> [//]: # (## Documentation)	2025-03-23 16:57:48 -07:00
ehhuang	39e094736f	chore: make mypy happy with webmethod (#1758 ) # What does this PR do? Gets rid of errors like the one below, which appear on all webmethod-decorated functions: llama_stack/apis/agents/agents.py:398: error: Value of type variable "T" of function cannot be "Callable[[Agents, AgentConfig], Coroutine[Any, Any, AgentCreateResponse]]" [type-var] ## Test Plan Run mypy and observe that the errors are gone	2025-03-22 08:17:23 -07:00
ehhuang	06788643b3	feat(telemetry): clean up spans (#1760 )	2025-03-21 20:05:11 -07:00
Dinesh Yeduguru	5eb15684b4	feat: use same trace ids in stack and otel (#1759 ) # What does this PR do? 1) Uses otel compatible id generation for stack 2) Stack starts returning trace id info in the header of response 3) We inject the same trace id that we have into otel in order to force it to use our trace ids. ## Test Plan ``` curl -i --request POST \ --url http://localhost:8321/v1/inference/chat-completion \ --header 'content-type: application/json' \ --data '{ "model_id": "meta-llama/Llama-3.1-70B-Instruct", "messages": [ { "role": "user", "content": { "type": "text", "text": "where do humans live" } } ], "stream": false }' HTTP/1.1 200 OK date: Fri, 21 Mar 2025 21:51:19 GMT server: uvicorn content-length: 1712 content-type: application/json x-trace-id: 595101ede31ece116ebe35b26d67e8cf {"metrics":[{"metric":"prompt_tokens","value":10,"unit":null},{"metric":"completion_tokens","value":320,"unit":null},{"metric":"total_tokens","value":330,"unit":null}],"completion_message":{"role":"assistant","content":"Humans live on the planet Earth, specifically on its landmasses and in its oceans. Here's a breakdown of where humans live:\n\n1. Continents: Humans inhabit all seven continents:\n\t* Africa\n\t* Antarctica ( temporary residents, mostly scientists and researchers)\n\t* Asia\n\t* Australia\n\t* Europe\n\t* North America\n\t* South America\n2. Countries: There are 196 countries recognized by the United Nations, and humans live in almost all of them.\n3. Cities and towns: Many humans live in urban areas, such as cities and towns, which are often located near coastlines, rivers, or other bodies of water.\n4. Rural areas: Some humans live in rural areas, such as villages, farms, and countryside.\n5. Islands: Humans inhabit many islands around the world, including tropical islands, island nations, and islands in the Arctic and Antarctic regions.\n6. Underwater habitats: A few humans live in underwater habitats, such as research stations and submarines.\n7. 
Space: A small number of humans have lived in space, including astronauts on the International Space Station and those who have visited the Moon.\n\nIn terms of specific environments, humans live in a wide range of ecosystems, including:\n\n* Deserts\n* Forests\n* Grasslands\n* Mountains\n* Oceans\n* Rivers\n* Tundras\n* Wetlands\n\nOverall, humans are incredibly adaptable and can be found living in almost every corner of the globe.","stop_reason":"end_of_turn","tool_calls":[]},"logprobs":null} ``` Same trace id in Jaeger and sqlite: ![Screenshot 2025-03-21 at 2 51 53 PM](https://github.com/user-attachments/assets/38cc04b0-568c-4b9d-bccd-d3b90e581c27) ![Screenshot 2025-03-21 at 2 52 38 PM](https://github.com/user-attachments/assets/722383ad-6305-4020-8a1c-6cfdf381c25f)	2025-03-21 15:41:26 -07:00
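The `x-trace-id` value in the response above is 32 hexadecimal characters, which matches the OpenTelemetry trace id format (16 bytes, not all zero; span ids are 8 bytes). A minimal sketch of generating ids in that format follows. The function names are hypothetical, and how Llama Stack actually generates its ids is not shown in the PR description.

```python
import secrets


def generate_trace_id() -> str:
    """Return a random 128-bit trace id as 32 lowercase hex characters."""
    while True:
        trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
        if int(trace_id, 16) != 0:  # an all-zero trace id is invalid in OpenTelemetry
            return trace_id


def generate_span_id() -> str:
    """Return a random 64-bit span id as 16 lowercase hex characters."""
    while True:
        span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
        if int(span_id, 16) != 0:  # an all-zero span id is likewise invalid
            return span_id
```

An id in this format can be handed to an OpenTelemetry exporter unchanged, which is what lets the stack and the tracing backend show the same trace id.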
ehhuang	b9fbfed216	chore(telemetry): remove service_name entirely (#1755 ) # What does this PR do? ## Test Plan LLAMA_STACK_CONFIG=dev pytest -s -v tests/integration/agents/test_agents.py::test_custom_tool --safety-shield meta-llama/Llama-Guard-3-8B --text-model accounts/fireworks/models/llama-v3p1-8b-instruct and verify trace in jaeger UI https://llama-stack.readthedocs.io/en/latest/building_applications/telemetry.html#	2025-03-21 15:11:56 -07:00
Xi Yan	baf68c665c	fix: fix jobs api literal return type (#1757 ) # What does this PR do? - We cannot directly return a literal type > Note: this is not final jobs API change [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan <img width="837" alt="image" src="https://github.com/user-attachments/assets/18a17561-35f9-443d-987d-54afdd6ff40c" /> [//]: # (## Documentation)	2025-03-21 14:04:21 -07:00
Ashwin Bharambe	d6887f46c6	fix: a couple of tests were broken and not yet exercised by our per-PR test workflow	2025-03-21 12:12:14 -07:00
ehhuang	34f89bfbd6	feat(telemetry): use zero-width space to avoid clutter (#1754 ) # What does this PR do? Before <img width="858" alt="image" src="https://github.com/user-attachments/assets/6cefb1ae-5603-4818-85ea-a0c337b986bc" /> Note the redundant 'llama-stack' in front of every span ## Test Plan <img width="1171" alt="image" src="https://github.com/user-attachments/assets/bdc5fd5b-ff1f-4f10-8b40-cff2ea93dd1f" />	2025-03-21 12:02:10 -07:00
Ashwin Bharambe	cb7b9dda6c	fix: compare timezones correctly in download script	2025-03-21 11:46:57 -07:00
ehhuang	f76550ce4e	feat(telemetry): normalize path (#1739 ) # What does this PR do? This will prevent 'operations' from being flooded <img width="401" alt="image" src="https://github.com/user-attachments/assets/c95e0eeb-4a10-4003-88df-9bb6d0a548cd" /> Before <img width="1049" alt="image" src="https://github.com/user-attachments/assets/157fb614-e007-4cb3-a571-226e50525bfa" /> ## Test Plan After <img width="811" alt="image" src="https://github.com/user-attachments/assets/b2b10344-1d73-44e5-abee-a9f039090963" />	2025-03-21 10:17:43 -07:00
Derek Higgins	00917ef5b2	fix: Add 'accelerate' dependency to 'prompt-guard' (#1724 ) Required to start up a distribution with prompt guard. Closes: #1723 ## Test Plan The distribution starts with the patch applied. Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-03-21 07:37:20 -07:00
Yuan Tang	dce9a24a6c	test: Add default vLLM URL in remote-vllm template (#1736 ) # What does this PR do? This is to avoid errors like the following when running inference integration tests: ``` ERROR tests/integration/inference/test_text_inference.py::test_text_completion_stop_sequence[txt=8B-inference:completion:stop_sequence] - llama_stack.distribution.stack.EnvVarError: Environment variable 'VLLM_URL' not set or empty at providers.inference[0].config.url ``` It's also good to have a default, which is consistent with vLLM API server. ## Test Plan Integration tests can run without the error above. --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-03-21 07:31:59 -07:00
Ashwin Bharambe	03b5c61bfc	feat: make sure agent sessions are under access control (#1737 ) This builds on top of #1703. Agent sessions are now properly access controlled. ## Test Plan Added unit tests	2025-03-21 07:31:16 -07:00
Botao Chen	9114bef484	fix: fix experimental-post-training template (#1740 ) ## What does this PR do? fix the template to make it compatible with the latest dataset and eval api change ## test run `llama stack run llama_stack/templates/experimental-post-training/run.yaml` and spin up the llama stack server successfully	2025-03-20 23:07:19 -07:00
Dinesh Yeduguru	6104bd06a0	feat: add different sinks for otel traces and metrics (#1731 ) # What does this PR do? Since we now record and export metrics, we can no longer use a single OTEL endpoint to export both traces and metrics. This PR adds two sinks, OTEL_TRACE and OTEL_METRIC, so that the exporters can be enabled selectively. ## Test Plan Start the server with OTEL_TRACE as the sink and verify that traces show up in Jaeger ![Screenshot 2025-03-20 at 3 12 25 PM](https://github.com/user-attachments/assets/51007f28-b5ed-4853-912a-965a5cfe83af)	2025-03-20 15:51:41 -07:00
Hardik Shah	127bac6869	fix: Default to port 8321 everywhere (#1734 ) As titled, moved all instances of 5001 to 8321	2025-03-20 15:50:41 -07:00
Hardik Shah	581e8ae562	fix: docker run with `--pull always` to fetch the latest image (#1733 ) As titled	2025-03-20 15:35:48 -07:00
Ashwin Bharambe	f95bc29ca9	fix: handle registry errors gracefully (#1732 ) We need to be able to handle stale registry entries gracefully. More needs to be done when we are deleting important attributes from resources which could have been persisted. But at the very least, the server cannot die. ## Test Plan Added unit tests	2025-03-20 15:24:07 -07:00
Yuan Tang	f5a5c5d459	docs: Add instruction on enabling tool calling for remote vLLM (#1719 ) # What does this PR do? This PR adds a link to tool calling instructions in vLLM. Users have asked about this many times, e.g. https://github.com/meta-llama/llama-stack/issues/1648#issuecomment-2740642077 --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-03-20 15:18:17 -07:00
Dinesh Yeduguru	86f617a197	fix: tracing middleware to not start for lifespan events (#1730 ) # What does this PR do? Tracing middleware should not start tracing for lifespan events. Lifespan events happen at server startup and shutdown, and if we start tracing for them, we will have an active trace for the lifetime of the server. This interferes with regular tracing, since we always expect traces never to be nested. We started hitting this issue with https://github.com/meta-llama/llama-stack/pull/1495. ## Test Plan * llama stack run ~/.llama/distributions/fireworks/fireworks-run.yaml * Verify in the sqlite store that the trace now has a non-null span id ![Screenshot 2025-03-20 at 1 49 47 PM](https://github.com/user-attachments/assets/d77354a7-d5f1-4b53-a946-6adbd7a4f772)	2025-03-20 14:22:19 -07:00
Yuan Tang	029e4fc64d	fix: Add missing gcc in container build. Fixes #1716 (#1727 ) # What does this PR do? This should fix https://github.com/meta-llama/llama-stack/issues/1716 ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] [//]: # (## Documentation) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-03-20 15:50:56 -04:00
ehhuang	ea6a4a14ce	feat(api): simplify client imports (#1687 ) # What does this PR do? closes #1554 ## Test Plan test_agents.py	2025-03-20 10:15:49 -07:00
Ihar Hrachyshka	515c16e352	chore: mypy violations cleanup for inline::{telemetry,tool_runtime,vector_io} (#1711 ) # What does this PR do? Clean up mypy violations for inline::{telemetry,tool_runtime,vector_io}. This also makes API accept a tool call result without any content (like RAG tool already may produce). Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-03-20 10:01:10 -07:00
Ihar Hrachyshka	355134f51d	fix: Support types.UnionType in schemas (#1721 ) # What does this PR do? Since Python 3.10, unions can be expressed as `type1 | type2`. Sadly, while this is functionally equivalent to `Union[type1, type2]`, the type of the expression is different (`types.UnionType`, not `typing.Union`). We should handle both in schemas. ## Test Plan Switch a schema type from Union to `|` and confirm the generator doesn't crash with: ``` Traceback (most recent call last): File "<frozen runpy>", line 198, in _run_module_as_main File "<frozen runpy>", line 88, in _run_code File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/generate.py", line 91, in <module> fire.Fire(main) File "/Users/ihrachys/.cache/uv/archive-v0/FBgkcwcN-PaJ0NAur__7J/lib/python3.11/site-packages/fire/core.py", line 135, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/ihrachys/.cache/uv/archive-v0/FBgkcwcN-PaJ0NAur__7J/lib/python3.11/site-packages/fire/core.py", line 468, in _Fire component, remaining_args = _CallAndUpdateTrace( ^^^^^^^^^^^^^^^^^^^^ File "/Users/ihrachys/.cache/uv/archive-v0/FBgkcwcN-PaJ0NAur__7J/lib/python3.11/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^ File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/generate.py", line 55, in main spec = Specification( ^^^^^^^^^^^^^^ File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/pyopenapi/utility.py", line 30, in __init__ self.document = generator.generate() ^^^^^^^^^^^^^^^^^^^^ File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/pyopenapi/generator.py", line 782, in generate operation = self._build_operation(op) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/pyopenapi/generator.py", line 648, in _build_operation "application/json": builder.build_media_type( 
^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/pyopenapi/generator.py", line 221, in build_media_type schema = self.schema_builder.classdef_to_ref(item_type) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/pyopenapi/generator.py", line 135, in classdef_to_ref type_schema = self.classdef_to_schema(typ) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/pyopenapi/generator.py", line 116, in classdef_to_schema type_schema, type_definitions = self.schema_generator.classdef_to_schema(typ) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/ihrachys/src/llama-stack/llama_stack/strong_typing/schema.py", line 607, in classdef_to_schema types_defined[sub_name] = self._type_to_schema_with_lookup(sub_type) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/ihrachys/src/llama-stack/llama_stack/strong_typing/schema.py", line 564, in _type_to_schema_with_lookup type_schema = self.type_to_schema(data_type, force_expand=True) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/ihrachys/src/llama-stack/llama_stack/strong_typing/schema.py", line 320, in type_to_schema return self._type_to_schema(data_type, force_expand, json_schema_extra) | common_info ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/ihrachys/src/llama-stack/llama_stack/strong_typing/schema.py", line 487, in _type_to_schema property_docstrings = get_class_property_docstrings(typ, self.options.property_description_fun) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/ihrachys/src/llama-stack/llama_stack/strong_typing/schema.py", line 94, in get_class_property_docstrings for base in inspect.getmro(data_type): ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/nix/store/w2wykgpkzidnnr6cpw8wf94ghb0p8big-python3-3.11.11/lib/python3.11/inspect.py", line 731, in getmro return cls.__mro__ ^^^^^^^^^^^ AttributeError: 
'types.UnionType' object has no attribute '__mro__'. Did you mean: '__or__'? ``` Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-03-20 09:54:02 -07:00
Ihar Hrachyshka	5403582582	fix: Restore discriminator for AlgorithmConfig (#1706 )	2025-03-20 07:33:26 -07:00
ehhuang	af8b4484a3	fix: update default tool call system prompt (#1712 ) # What does this PR do? closes #1584 This should be a rather innocuous change. ## Test Plan Verify that there's no more tool call parsing error for example in issue <img width="1216" alt="image" src="https://github.com/user-attachments/assets/a5a6f4e8-2093-4ca2-bc06-794b707a0429" /> LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/integration/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B --text-model meta-llama/Llama-3.1-8B-Instruct	2025-03-19 22:49:24 -07:00
Ashwin Bharambe	01a25d9744	feat(server): add attribute based access control for resources (#1703 ) This PR introduces a way to implement Attribute Based Access Control (ABAC) for the Llama Stack server. The rough design is: - https://github.com/meta-llama/llama-stack/pull/1626 added a way for the Llama Stack server to query an authenticator - We build upon that and expect "access attributes" as part of the response. These attributes indicate the scopes available for the request. - We use these attributes to perform access control for registered resources as well as for constructing the default access control policies for newly created resources. - By default, if you support authentication but don't return access attributes, we will add a unique namespace pointing to the API_KEY. That way, all resources by default will be scoped to API_KEYs. An important aspect of this design is that Llama Stack stays out of the business of credential management or the CRUD for attributes. How you manage your namespaces or projects is entirely up to you. The design only implements access control checks for the metadata / book-keeping information that the Stack tracks. ### Limitations - Currently, read vs. write vs. admin permissions aren't made explicit, but this can be easily extended by adding appropriate attributes to the `AccessAttributes` data structure. - This design does not apply to agent instances since they are not considered resources the Stack knows about. Agent instances are completely within the scope of the Agents API provider. ### Test Plan Added unit tests, existing integration tests	2025-03-19 21:28:52 -07:00
ehhuang	c4e1b8d094	fix: better tool call parsing error message (#1710 ) # What does this PR do? context #1584 ## Test Plan <img width="1366" alt="image" src="https://github.com/user-attachments/assets/b490b590-3270-43cb-838e-8446a8948f1d" />	2025-03-19 20:39:10 -07:00
Ihar Hrachyshka	41bd350539	chore: Don't set type variables from register_schema() (#1713 ) # What does this PR do? Don't set type variables from register_schema(). `mypy` is not happy about it since type variables are calculated at runtime and hence the typing hints are not available during static analysis. The good news is that there is no good reason to set the variables from the return type. Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-03-19 20:29:00 -07:00
Charlie Doern	a483a58c6e	chore: deprecate /v1/inspect/providers (#1678 ) # What does this PR do? With the new /v1/providers API, /v1/inspect/providers is redundant. Deprecate it by removing the route, and add a test for the full /v1/providers API. Resolves #1623 ## Test Plan `uv run pytest -v tests/integration/providers --stack-config=ollama --text-model="meta-llama/Llama-3.2-3B-Instruct" --embedding-model=all-MiniLM-L6-v2` <img width="1512" alt="Screenshot 2025-03-18 at 9 18 38 AM" src="https://github.com/user-attachments/assets/2db30f25-3ff6-4374-b39d-0047f093fe36" /> Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-03-19 20:27:06 -07:00
Charlie Doern	1f04ca357b	fix: telemetry logger (#1714 ) # What does this PR do? Currently, if you have a run YAML without telemetry, the following error is hit: `TypeError: TelemetryAdapter.__init__() missing 1 required positional argument: 'deps'`. This is because the TelemetryAdapter requires a deps arg to be passed. Pass {} to avoid errors. Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-03-19 20:26:13 -07:00
Botao Chen	f369871083	feat: [New Eval Benchamark] IfEval (#1708 ) # What does this PR do? In this PR, we added a new eval open benchmark IfEval based on paper https://arxiv.org/abs/2311.07911 to measure the model capability of instruction following. ## Test Plan spin up a llama stack server with open-benchmark template run `llama-stack-client --endpoint xxx eval run-benchmark "meta-reference-ifeval" --model-id "meta-llama/Llama-3.3-70B-Instruct" --output-dir "/home/markchen1015/" --num-examples 20` on client side and get the eval aggregate results	2025-03-19 16:39:59 -07:00
Michael Clifford	a7008dc15d	fix: Correctly set CLI_ARGS using BUILD_PLATFORM env with llama stack… (#1702 ) # What does this PR do? This PR updates `build_container.sh` to prevent an "unknown flag" error when using the `BUILD_PLATFORM` environment variable during `llama stack build`. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) Closes #1699 ## Test Plan Running the following code with out these changes results in an "unknown flag" error. ``` CONTAINER_BINARY=podman BUILD_PLATFORM=linux/amd64 llama stack build --template ollama --image-type container ``` With these changes, the same command should build the image correctly. Signed-off-by: Michael Clifford <mcliffor@redhat.com>	2025-03-19 16:18:11 -07:00
yyymeta	d117bfe597	feat: [new open benchmark] DocVQA (#1647 ) # What does this PR do? DocVQA asks model to look a a picture, then answer a question given in text, with a text answer by text information in the picture. these questions often require understanding of relative positions of texts within the picture. original dataset is defined in the "Task1" of https://www.docvqa.org/datasets ## Test Plan setup llama server with ``` llama stack run ./llama_stack/templates/open-benchmark/run.yaml ``` then send traffic: ``` llama-stack-client eval run-benchmark "meta-reference-docvqa" --model-id meta-llama/Llama-3.3-70B-Instruct --output-dir /tmp/gpqa --num-examples 200 ```	2025-03-19 14:56:14 -07:00
ehhuang	1902e5754c	fix: toolgroups unregister (#1704 ) # What does this PR do? FAILED tests/integration/tools/test_tools.py::test_toolsgroups_unregister[None] - AttributeError: 'coroutine' object has no attribute 'data' ## Test Plan LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/integration/tools/test_tools.py --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/1704). * #1705 * __->__ #1704	2025-03-19 13:43:51 -07:00
Botao Chen	ab777ef5cd	fix: fix open-benchmark template (#1695 ) ## What does this PR do? The open-benchmark template is broken after the datasets API refactor for 2 reasons: - provider_id and provider_resource_id are no longer needed - the type in run.yaml will be resolved as dict. This PR fixes both issues. ## Test Spin up a llama stack server successfully with llama stack run `llama_stack/templates/open-benchmark/run.yaml`	2025-03-19 11:27:11 -07:00
Derek Higgins	6949bd1999	fix: Call pandas.read_* in a separate thread (#1698 ) These block on I/O reads, which in turn block the server. Move them to their own thread. Closes: #1697 # What does this PR do? To avoid blocking the main event loop, updates datasetio/localfs to load data in a separate thread. Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-03-19 10:46:37 -07:00
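The idea behind this fix, moving a blocking read off the event loop, can be sketched with `asyncio.to_thread` (Python 3.9+). This is an illustrative sketch, not the code from the PR: `read_rows_blocking` and `load_rows` are hypothetical names, and a standard-library CSV parse stands in for a `pandas.read_*` call so the example has no third-party dependency.

```python
import asyncio
import csv
import io
from typing import Dict, List


def read_rows_blocking(text: str) -> List[Dict[str, str]]:
    """Stand-in for a blocking reader such as pandas.read_csv (assumption)."""
    return list(csv.DictReader(io.StringIO(text)))


async def load_rows(text: str) -> List[Dict[str, str]]:
    """Run the blocking reader in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(read_rows_blocking, text)


async def main() -> None:
    # Other coroutines can make progress while the read runs in its thread.
    rows = await load_rows("a,b\n1,2\n3,4\n")
    print(rows)


if __name__ == "__main__":
    asyncio.run(main())
```

The same wrapper shape applies to any synchronous reader: pass the function and its arguments to `asyncio.to_thread` and await the result.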
Hardik Shah	65ca85ba6b	fix: Updating `ToolCall.arguments` to allow for json strings that can be decoded on client side (#1685 ) ### What does this PR do? Currently, `ToolCall.arguments` is a `Dict[str, RecursiveType]`. However, on the client SDK side -- the `RecursiveType` gets deserialized into a number ( both int and float get collapsed ) and hence when params are `int` they get converted to float which might break client side tools that might be doing type checking. Closes: https://github.com/meta-llama/llama-stack/issues/1683 ### Test Plan Stainless changes -- https://github.com/meta-llama/llama-stack-client-python/pull/204 ``` pytest -s -v --stack-config=fireworks tests/integration/agents/test_agents.py --text-model meta-llama/Llama-3.1-8B-Instruct ```	2025-03-19 10:36:19 -07:00
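The int-versus-float problem described in this commit can be illustrated in a few lines. This is a sketch under assumptions: the dictionary comprehension below merely simulates a deserializer that has a single numeric type, and the argument names are made up for the example. It shows why carrying the arguments as a JSON string and decoding them on the client side preserves integer parameters.

```python
import json

# Hypothetical tool-call arguments: one int parameter, one float parameter.
arguments = {"count": 3, "ratio": 0.5}

# Simulated lossy path (assumption): every number becomes a float.
collapsed = {name: float(value) for name, value in arguments.items()}

# JSON-string path: the server sends text, the client decodes it itself.
arguments_json = json.dumps(arguments)
decoded = json.loads(arguments_json)

# A client-side tool that type-checks its input only accepts the decoded form.
print(type(collapsed["count"]).__name__, type(decoded["count"]).__name__)
```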
ehhuang	113f3a259c	docs: add documentation for RAGDocument (#1693 ) # What does this PR do? ## Test Plan	2025-03-19 10:16:00 -07:00
Ashwin Bharambe	5b39d5a76a	feat(auth, rfc): Add support for Bearer (api_key) Authentication (#1626 ) This PR adds support for (or is a proposal for) API key authentication on the Llama Stack server end. `llama-stack-client` already supports accepting an api_key parameter and passes it down through every request as an `Authorization: ` header. Currently, Llama Stack does not propose APIs for handling authentication or authorization for resources of any kind. Given that, and the fact that any deployment will typically have _some_ authentication system present, we simply adopt a delegation mechanism: delegate to an HTTPS endpoint performing key management / authentication. It is configured via: ```yaml server: auth: endpoint: <...> ``` in the run.yaml configuration. ## How It Works When authentication is enabled: 1. Every API request must include an `Authorization: Bearer <token>` header 2. The server will send a _POST_ validation request to the configured endpoint with the following payload: ```json { "api_key": "<token>", "request": { "path": "/api/path", "headers": { "header1": "value1", ... }, "params": { "param1": "value1", ... } } } ``` 3. If the authentication endpoint returns a 200 status code, the request is allowed to proceed 4. If the authentication endpoint returns any other status code, a 401 Unauthorized response is returned ## Test Plan Unit tests	2025-03-18 16:24:18 -07:00
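The four steps in this commit message can be sketched as plain functions. This is a minimal sketch, not the server's middleware: the helper names `extract_bearer_token`, `build_validation_payload`, and `is_authorized` are hypothetical, and the HTTP call to the configured endpoint is left out so the example stays self-contained. Only the payload shape and the 200-versus-anything-else rule come from the description above.

```python
import json
from typing import Dict, Optional


def extract_bearer_token(authorization: Optional[str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' value, else None."""
    if not authorization:
        return None
    scheme, _, token = authorization.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def build_validation_payload(
    token: str, path: str, headers: Dict[str, str], params: Dict[str, str]
) -> dict:
    """Build the body POSTed to the auth endpoint (shape from the commit text)."""
    return {
        "api_key": token,
        "request": {"path": path, "headers": dict(headers), "params": dict(params)},
    }


def is_authorized(auth_status_code: int) -> bool:
    """Only a 200 from the auth endpoint lets the request proceed; else reply 401."""
    return auth_status_code == 200


token = extract_bearer_token("Bearer my-secret-key")
payload = build_validation_payload(token, "/api/path", {"header1": "value1"}, {"param1": "value1"})
print(json.dumps(payload, indent=2))
```

A real deployment would send `payload` to the endpoint configured under `server.auth.endpoint` and pass the response status to a check like `is_authorized`.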

1129 commits