llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-03 18:00:36 +00:00

Author	SHA1	Message	Date
Ihar Hrachyshka	9e6561a1ec	chore: enable pyupgrade fixes (#1806 ) # What does this PR do? The goal of this PR is code base modernization. Schema reflection code needed a minor adjustment to handle UnionTypes and collections.abc.AsyncIterator. (Both are preferred for latest Python releases.) Note to reviewers: almost all changes here are automatically generated by pyupgrade. Some additional unused imports were cleaned up. The only change worth of note can be found under `docs/openapi_generator` and `llama_stack/strong_typing/schema.py` where reflection code was updated to deal with "newer" types. Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-05-01 14:23:50 -07:00
Derek Higgins	64829947d0	feat: Add temperature support to responses API (#2065 ) # What does this PR do? Add support for the temperature to the responses API ## Test Plan Manually tested simple case unit tests added for simple case and tool calls Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-05-01 11:47:58 -07:00
Ben Browning	8dfce2f596	feat: OpenAI Responses API (#1989 ) # What does this PR do? This provides an initial [OpenAI Responses API](https://platform.openai.com/docs/api-reference/responses) implementation. The API is not yet complete, and this is more a proof-of-concept to show how we can store responses in our key-value stores and use them to support the Responses API concepts like `previous_response_id`. ## Test Plan I've added a new `tests/integration/openai_responses/test_openai_responses.py` as part of a test-driven development for this new API. I'm only testing this locally with the remote-vllm provider for now, but it should work with any of our inference providers since the only API it requires out of the inference provider is the `openai_chat_completion` endpoint. ``` VLLM_URL="http://localhost:8000/v1" \ INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \ llama stack build --template remote-vllm --image-type venv --run ``` ``` LLAMA_STACK_CONFIG="http://localhost:8321" \ python -m pytest -v \ tests/integration/openai_responses/test_openai_responses.py \ --text-model "meta-llama/Llama-3.2-3B-Instruct" ``` --------- Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-04-28 14:06:00 -07:00
Alexey Rybak	326cbba579	feat(agents): add agent naming functionality (#1922 ) # What does this PR do? Allow users to name an agent and use the name in telemetry instead of relying on randomly generated agent_ids. This improves the developer experience by making it easier to find specific agents in telemetry logs. Closes #1832 ## Test Plan - Added tests to verify the agent name is properly stored and retrieved - Ran `uv run -- pytest -v tests/integration/telemetry/test_telemetry.py::test_agent_name_filtering` from the root of the project and made sure the tests pass - Ran `uv run -- pytest -v tests/integration/telemetry/test_telemetry.py::test_agent_query_spans` to verify existing code without agent names still works correctly ## Use Example ``` agent = Agent( llama_stack_client, model=text_model_id, name="CustomerSupportAgent", # New parameter instructions="You are a helpful customer support assistant" ) session_id = agent.create_session(f"test-session-{uuid4()}") ``` ## Implementation Notes - Agent names are optional string parameters with no additional validation - Names are not required to be unique - multiple agents can have the same name - The agent_id remains the unique identifier for an agent --------- Co-authored-by: raghotham <raghotham@gmail.com>	2025-04-17 07:02:47 -07:00
Sébastien Han	10882bf478	chore: remove unused tempdir in agent (#1896 ) # What does this PR do? The usage of the tempdir was removed in `094eb6a5ae`. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-04-09 09:43:48 +02:00
Ashwin Bharambe	530d4bdfe1	refactor: move all llama code to models/llama out of meta reference (#1887 ) # What does this PR do? Move around bits. This makes the copies from llama-models _much_ easier to maintain and ensures we don't entangle meta-reference specific tidbits into llama-models code even by accident. Also, kills the meta-reference-quantized-gpu distro and rolls quantization deps into meta-reference-gpu. ## Test Plan ``` LLAMA_MODELS_DEBUG=1 \ with-proxy llama stack run meta-reference-gpu \ --env INFERENCE_MODEL=meta-llama/Llama-4-Scout-17B-16E-Instruct \ --env INFERENCE_CHECKPOINT_DIR=<DIR> \ --env MODEL_PARALLEL_SIZE=4 \ --env QUANTIZATION_TYPE=fp8_mixed ``` Start a server with and without quantization. Point integration tests to it using: ``` pytest -s -v tests/integration/inference/test_text_inference.py \ --stack-config http://localhost:8321 --text-model meta-llama/Llama-4-Scout-17B-16E-Instruct ```	2025-04-07 15:03:58 -07:00
Ashwin Bharambe	b8f1561956	feat: introduce llama4 support (#1877 ) As title says. Details in README, elsewhere.	2025-04-05 11:53:35 -07:00
Xi Yan	90efafafb7	chore: change context to content for agent (#1840 )	2025-03-30 10:33:58 -07:00
Xi Yan	094eb6a5ae	feat(rag): entire document context with attachments (#1763 ) # What does this PR do? What Instead of adhoc creating a vectordb and chunking when documents ae sent as an attachment to agent turn, we directly pass raw text from document into messages to model for user context, and let model perform summarization directly. This removes the magic behaviour, and yields better performance than existing approach. Improved Performance - RAG lifecycle notebook - Model: 0.3 factuality score - (+ websearch) Agent: 0.44 factuality score - (+ vector db) Agent: 0.3 factuality score - (+ raw context) Agent: 0.6 factuality score Closes https://github.com/meta-llama/llama-stack/issues/1478 [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan - [NEW] added section in RAG lifecycle notebook shows better performance <img width="840" alt="image" src="https://github.com/user-attachments/assets/a0c4e816-809a-41c0-9124-89825983e3f5" /> [//]: # (## Documentation)	2025-03-23 16:57:48 -07:00
ehhuang	06788643b3	feat(telemetry): clean up spans (#1760 )	2025-03-21 20:05:11 -07:00
Ashwin Bharambe	03b5c61bfc	feat: make sure agent sessions are under access control (#1737 ) This builds on top of #1703. Agent sessions are now properly access controlled. ## Test Plan Added unit tests	2025-03-21 07:31:16 -07:00
ehhuang	37f155e41d	feat(agent): support multiple tool groups (#1556 ) Summary: closes #1488 Test Plan: added new integration test ``` LLAMA_STACK_CONFIG=dev pytest -s -v tests/integration/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B --text-model openai/gpt-4o-mini ``` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/1556). * __->__ #1556 * #1550	2025-03-17 22:13:09 -07:00
ehhuang	c23a7af5d6	fix: agents with non-llama model (#1550 ) # Summary: Includes fixes to get test_agents working with openAI model, e.g. tool parsing and message conversion # Test Plan: ``` LLAMA_STACK_CONFIG=dev pytest -s -v tests/integration/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B --text-model openai/gpt-4o-mini ``` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/1550). * #1556 * __->__ #1550	2025-03-17 22:11:06 -07:00
Sébastien Han	98b1b15e0f	refactor: move all datetime.now() calls to UTC (#1589 ) # What does this PR do? Updated all instances of datetime.now() to use timezone.utc for consistency in handling time across different systems. This ensures that timestamps are always in Coordinated Universal Time (UTC), avoiding issues with time zone discrepancies and promoting uniformity in time-related data. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-03-13 15:34:53 -07:00
ehhuang	a505bf45a3	feat(api): remove tool_name from ToolResponseMessage (#1599 ) Summary: This is not used anywhere. closes #1421 Test Plan: LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/integration/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B --text-model meta-llama/Llama-3.1-8B-Instruct --record-responses	2025-03-12 19:41:48 -07:00
ehhuang	ed6caead72	chore: simplify _get_tool_defs (#1384 ) Summary: Test Plan: LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/integration/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B --text-model meta-llama/Llama-3.1-8B-Instruct	2025-03-12 18:51:18 -07:00
ehhuang	41c9bca1aa	chore: refactor Agent toolgroup processing (#1381 ) Summary: Refactoring only. Centralize logic to preprocess toolgroup to one place. Test Plan: LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/api/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/1381). * #1384 * __->__ #1381	2025-03-12 18:48:03 -07:00
ehhuang	b7a9c45477	chore: deprecate ToolResponseMessage in agent.resume API (#1566 ) # Summary: closes #1431 # Test Plan: LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/integration/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B --text-model meta-llama/Llama-3.1-8B-Instruct	2025-03-12 12:10:21 -07:00
Xi Yan	43044f29e2	fix: fix llama stack run with missing agent impl (#1559 ) # What does this PR do? - recent merge https://github.com/meta-llama/llama-stack/pull/1410 introduce error ``` ValueError: Provider meta-reference (Api.agents) does not implement the following methods: [('list_agent_sessions', 'not_actually_implemented'), ('list_agents', 'not_actually_implemented')] ``` [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan ``` llama stack run ``` ``` LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/agents/test_agents.py --text-model meta-llama/Llama-3.3-70B-Instruct ``` `1379530386` [//]: # (## Documentation)	2025-03-11 11:22:22 -07:00
Ihar Hrachyshka	c3d7d17bc4	chore: fix typing hints for get_provider_impl deps arguments (#1544 ) # What does this PR do? It's a dict that may contain different types, as per resolver:instantiate_provider implementation. (AFAIU it also never contains ProviderSpecs, but instances of provider implementations.) [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan mypy passing if enabled checks for these modules. (See #1543) [//]: # (## Documentation) Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-03-11 10:07:28 -07:00
Dinesh Yeduguru	ead9397e22	fix: tracing fixes for trace context propogation across coroutines (#1522 ) # What does this PR do? This PR has two fixes needed for correct trace context propagation across asycnio boundary Fix 1: Start using context vars to store the global trace context. This is needed since we cannot use the same trace context across coroutines since the state is shared. each coroutine should have its own trace context so that each of it can start storing its state correctly. Fix 2: Start a new span for each new coroutines started for running shields to keep the span tree clean ## Test Plan ### Integration tests with server LLAMA_STACK_DISABLE_VERSION_CHECK=true llama stack run ~/.llama/distributions/together/together-run.yaml LLAMA_STACK_CONFIG=http://localhost:8321 pytest -s --safety-shield meta-llama/Llama-Guard-3-8B --text-model meta-llama/Llama-3.1-8B-Instruct server logs: https://gist.github.com/dineshyv/51ac5d9864ed031d0d89ce77352821fe test logs: https://gist.github.com/dineshyv/e66acc1c4648a42f1854600609c467f3 ### Integration tests with library client LLAMA_STACK_CONFIG=fireworks pytest -s --safety-shield meta-llama/Llama-Guard-3-8B --text-model meta-llama/Llama-3.1-8B-Instruct logs: https://gist.github.com/dineshyv/ca160696a0b167223378673fb1dcefb8 ### Apps test with server: ``` LLAMA_STACK_DISABLE_VERSION_CHECK=true llama stack run ~/.llama/distributions/together/together-run.yaml python -m examples.agents.e2e_loop_with_client_tools localhost 8321 ``` server logs: https://gist.github.com/dineshyv/1717a572d8f7c14279c36123b79c5797 app logs: https://gist.github.com/dineshyv/44167e9f57806a0ba3b710c32aec02f8	2025-03-11 07:12:48 -07:00
ehhuang	23e39cc3c4	fix: handle log errors (#1499 ) Summary: \| File "/Users/erichuang/projects/llama-stack/llama_stack/distribution/server/server.py", line 213, in sse_generator \| logger.exception(f"Error in sse_generator: {e}") \| File "/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py", line 1864, in exception \| self.log(ERROR, msg, args, exc_info=exc_info, kwargs) \| File "/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py", line 1879, in log \| self.logger.log(level, msg, args, kwargs) \| File "/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py", line 1547, in log \| self._log(level, msg, args, kwargs) \| File "/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py", line 1624, in _log \| self.handle(record) \| File "/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py", line 1634, in handle \| self.callHandlers(record) \| File "/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py", line 1696, in callHandlers \| hdlr.handle(record) \| File "/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py", line 968, in handle \| self.emit(record) \| File "/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/site-packages/rich/logging.py", line 167, in emit \| message_renderable = self.render_message(record, message) \| File "/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/site-packages/rich/logging.py", line 193, in render_message \| message_text = Text.from_markup(message) if use_markup else Text(message) \| File "/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/site-packages/rich/text.py", line 287, in from_markup \| rendered_text = render(text, style, emoji=emoji, emoji_variant=emoji_variant) \| File "/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/site-packages/rich/markup.py", line 167, in render \| raise MarkupError( \| rich.errors.MarkupError: closing tag '[/INST]' at position 105 doesn't match any open tag Test Plan: reran failing rag_with_vector_db example	2025-03-07 15:58:26 -08:00
ehhuang	acbae66b9d	chore: escape tool output for logging (#1490 ) Summary: error: llama_stack/providers/inline/agents/meta_reference/agent_instance.py:1032: in execute_tool_call_maybe logger.info(f"tool call {name} completed with result: {result}") /opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py:1841: in info self.log(INFO, msg, args, kwargs) /opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py:1879: in log self.logger.log(level, msg, args, kwargs) /opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py:1547: in log self._log(level, msg, args, kwargs) /opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py:1624: in _log self.handle(record) /opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py:1634: in handle self.callHandlers(record) /opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py:1696: in callHandlers hdlr.handle(record) /opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py:968: in handle self.emit(record) /opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/site-packages/rich/logging.py:167: in emit message_renderable = self.render_message(record, message) /opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/site-packages/rich/logging.py:193: in render_message message_text = Text.from_markup(message) if use_markup else Text(message) /opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/site-packages/rich/text.py:287: in from_markup rendered_text = render(text, style, emoji=emoji, emoji_variant=emoji_variant) /opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/site-packages/rich/markup.py:167: in render raise MarkupError( E rich.errors.MarkupError: closing tag '[/INST]' at position 3274 doesn't match any open tag Test Plan:	2025-03-07 13:33:45 -08:00
Sébastien Han	7cf1e24c4e	feat(logging): implement category-based logging (#1362 ) # What does this PR do? This commit introduces a new logging system that allows loggers to be assigned a category while retaining the logger name based on the file name. The log format includes both the logger name and the category, producing output like: ``` INFO 2025-03-03 21:44:11,323 llama_stack.distribution.stack:103 [core]: Tool_groups: builtin::websearch served by tavily-search ``` Key features include: - Category-based logging: Loggers can be assigned a category (e.g., "core", "server") when programming. The logger can be loaded like this: `logger = get_logger(name=__name__, category="server")` - Environment variable control: Log levels can be configured per-category using the `LLAMA_STACK_LOGGING` environment variable. For example: `LLAMA_STACK_LOGGING="server=DEBUG;core=debug"` enables DEBUG level for the "server" and "core" categories. - `LLAMA_STACK_LOGGING="all=debug"` sets DEBUG level globally for all categories and third-party libraries. This provides fine-grained control over logging levels while maintaining a clean and informative log format. The formatter uses the rich library which provides nice colors better stack traces like so: ``` ERROR 2025-03-03 21:49:37,124 asyncio:1758 [uncategorized]: unhandled exception during asyncio.run() shutdown task: <Task finished name='Task-16' coro=<handle_signal.<locals>.shutdown() done, defined at /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py:146> exception=UnboundLocalError("local variable 'loop' referenced before assignment")> ╭────────────────────────────────────── Traceback (most recent call last) ───────────────────────────────────────╮ │ /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py:178 in shutdown │ │ │ │ 175 │ │ except asyncio.CancelledError: │ │ 176 │ │ │ pass │ │ 177 │ │ finally: │ │ ❱ 178 │ │ │ loop.stop() │ │ 179 │ │ │ 180 │ loop = asyncio.get_running_loop() │ │ 181 │ loop.create_task(shutdown()) │ ╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ UnboundLocalError: local variable 'loop' referenced before assignment ``` Co-authored-by: Ashwin Bharambe <@ashwinb> Signed-off-by: Sébastien Han <seb@redhat.com> [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan ``` python -m llama_stack.distribution.server.server --yaml-config ./llama_stack/templates/ollama/run.yaml INFO 2025-03-03 21:55:35,918 __main__:365 [server]: Using config file: llama_stack/templates/ollama/run.yaml INFO 2025-03-03 21:55:35,925 __main__:378 [server]: Run configuration: INFO 2025-03-03 21:55:35,928 __main__:380 [server]: apis: - agents ``` [//]: # (## Documentation) --------- Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-03-07 11:34:30 -08:00
Ashwin Bharambe	2fe976ed0a	refactor(test): introduce --stack-config and simplify options (#1404 ) You now run the integration tests with these options: ```bash Custom options: --stack-config=STACK_CONFIG a 'pointer' to the stack. this can be either be: (a) a template name like `fireworks`, or (b) a path to a run.yaml file, or (c) an adhoc config spec, e.g. `inference=fireworks,safety=llama-guard,agents=meta- reference` --env=ENV Set environment variables, e.g. --env KEY=value --text-model=TEXT_MODEL comma-separated list of text models. Fixture name: text_model_id --vision-model=VISION_MODEL comma-separated list of vision models. Fixture name: vision_model_id --embedding-model=EMBEDDING_MODEL comma-separated list of embedding models. Fixture name: embedding_model_id --safety-shield=SAFETY_SHIELD comma-separated list of safety shields. Fixture name: shield_id --judge-model=JUDGE_MODEL comma-separated list of judge models. Fixture name: judge_model_id --embedding-dimension=EMBEDDING_DIMENSION Output dimensionality of the embedding model to use for testing. Default: 384 --record-responses Record new API responses instead of using cached ones. --report=REPORT Path where the test report should be written, e.g. --report=/path/to/report.md ``` Importantly, if you don't specify any of the models (text-model, vision-model, etc.) the relevant tests will get skipped! This will make running tests somewhat more annoying since all options will need to be specified. We will make this easier by adding some easy wrapper yaml configs. ## Test Plan Example: ```bash ashwin@ashwin-mbp ~/local/llama-stack/tests/integration (unify_tests) $ LLAMA_STACK_CONFIG=fireworks pytest -s -v inference/test_text_inference.py \ --text-model meta-llama/Llama-3.2-3B-Instruct ```	2025-03-05 17:02:02 -08:00
ehhuang	6cf79437b3	feat: support ClientTool output metadata (#1426 ) # Summary: Client side change in https://github.com/meta-llama/llama-stack-client-python/pull/180 Changes the resume_turn API to accept `ToolResponse` instead of `ToolResponseMessage`: 1. `ToolResponse` contains `metadata` 2. `ToolResponseMessage` is a concept for model inputs. Here we are just submitting the outputs of tool execution. # Test Plan: Ran integration tests with newly added test using client tool with metadata LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/integration/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B --record-responses	2025-03-05 14:30:27 -08:00
Daniele Martinoli	fb998683e0	fix: Agent uses the first configured vector_db_id when documents are provided (#1276 ) # What does this PR do? The agent API allows to query multiple DBs using the `vector_db_ids` argument of the `rag` tool: ```py toolgroups=[ { "name": "builtin::rag", "args": {"vector_db_ids": [vector_db_id]}, } ], ``` This means that multiple DBs can be used to compose an aggregated context by executing the query on each of them. When documents are passed to the next agent turn, there is no explicit way to configure the vector DB where the embeddings will be ingested. In such cases, we can assume that: - if any `vector_db_ids` is given, we use the first one (it probably makes sense to assume that it's the only one in the list, otherwise we should loop on all the given DBs to have a consistent ingestion) - if no `vector_db_ids` is given, we can use the current logic to generate a default DB using the default provider. If multiple providers are defined, the API will fail as expected: the user has to provide details on where to ingest the documents. (Closes #1270) ## Test Plan The issue description details how to replicate the problem. [//]: # (## Documentation) --------- Signed-off-by: Daniele Martinoli <dmartino@redhat.com>	2025-03-04 21:44:13 -08:00
Xi Yan	78962be996	chore: refactor create_and_execute_turn and resume_turn (#1399 ) # What does this PR do? - Closes https://github.com/meta-llama/llama-stack/issues/1212 [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan ``` LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/integration/agents/test_agents.py --inference-model "meta-llama/Llama-3.3-70B-Instruct" ``` <img width="1203" alt="image" src="https://github.com/user-attachments/assets/35b60017-b3f2-4e98-87f2-2868730261bd" /> ``` LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/agents/test_agents.py::test_rag_and_code_agent --inference-model "meta-llama/Llama-3.3-70B-Instruct" ``` [//]: # (## Documentation)	2025-03-04 16:07:30 -08:00
ehhuang	fd8c991393	fix: rag as attachment bug (#1392 ) Summary: Test Plan: added new test LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/api/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B	2025-03-04 13:08:16 -08:00
Xi Yan	158b6dc404	chore: deprecate allow_turn_resume (#1377 ) # What does this PR do? - Deprecate allow_turn_resume flag as this is used for staying backward compat. - Closes https://github.com/meta-llama/llama-stack/issues/1363 [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan ``` LLAMA_STACK_CONFIG=fireworks pytest -v tests/api/agents/test_agents.py --inference-model "meta-llama/Llama-3.3-70B-Instruct" --record-responses ``` <img width="1054" alt="image" src="https://github.com/user-attachments/assets/d31de2d4-0953-41e1-a71a-7e1579fa351a" /> [//]: # (## Documentation)	2025-03-04 12:22:11 -08:00
ehhuang	07a992ef90	feat: deterministic tools ordering (#1380 ) Summary: 1. The `tools` parameter we construct to pass the inference API is non-deterministic. As a result, our recordable mocks is flaky as the ordering change sometimes. This PR makes it so that `tools` ordering is deterministic and aligned with the order user specified. 2. In recordable mock key generation, client tool's parameter type was 'str' and now is 'string' for some reason. I didn't dig into exactly why, but just regenerated the fixtures. Test Plan: Regenerate mocks: ``` LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B --record-responses ``` Rerun tests without --record-responses: ``` LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B ```	2025-03-03 20:38:07 -08:00
Ashwin Bharambe	0a76ece249	feat: add more logs to agent_instance.py	2025-03-03 16:15:47 -08:00
Xi Yan	7d111c7510	feat: unify max_infer_iters in client/server agent loop (#1309 ) # What does this PR do? We currently use `max_infer_iters` in 2 different ways 1/ Server: track number of times 2/ Client side: track number of times we send `resume_turn` request This PR gets rid of the need of (2) and makes server track total number of times we perform inference within a Turn NOTE The PR will assume StopReason is set to - end_of_message: turn is not finished, we could be waiting for client tool call responses - end_of_turn: if the entire turn is finished and there's no more things to be done. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan ``` LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/agents/test_agents.py::test_custom_tool_infinite_loop --inference-model "meta-llama/Llama-3.3-70B-Instruct" ``` [//]: # (## Documentation)	2025-03-03 10:08:36 -08:00
ehhuang	21ec67356c	fix: RAG with documents (#1337 ) Summary: This was broken by https://github.com/meta-llama/llama-stack/pull/1015/files#r1975394190 Test Plan: added e2e test	2025-02-28 16:51:00 -08:00
Sébastien Han	6fa257b475	chore(lint): update Ruff ignores for project conventions and maintainability (#1184 ) - Added new ignores from flake8-bugbear (`B007`, `B008`) - Ignored `C901` (high function complexity) for now, pending review - Maintained PyTorch conventions (`N812`, `N817`) - Allowed `E731` (lambda assignments) for flexibility - Consolidated existing ignores (`E402`, `E501`, `F405`, `C408`, `N812`) - Documented rationale for each ignored rule This keeps our linting aligned with project needs while tracking potential fixes. Signed-off-by: Sébastien Han <seb@redhat.com> Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-28 09:36:49 -08:00
Hardik Shah	8efa53daf1	fix: Agent telemetry inputs/outputs should be structured (#1302 ) Original telemetry outputs for agent turns look like this. Note: how output was a `str(message)` making it difficult to read them back for downstream tasks ( eg. building eval datasets ) ``` { │ │ 'input': [ │ │ │ '{"role":"system","content":"You are a helpful assistant. Use search tool to answer the questions. "}', │ │ │ '{"role":"user","content":"Which teams played in the NBA western conference finals of 2024","context":null}' │ │ ], │ │ 'output': "content: tool_calls: [ToolCall(call_id='8b7294ec-a83f-4798-ad8f-6bed662f08b6', tool_name=<BuiltinTool.brave_search: 'brave_search'>, arguments={'query': 'NBA Western Conference Finals 2024 teams'})]" │ }, ``` Updated the outputs to be structured . ## Test ```python import uuid from llama_stack_client.lib.agents.agent import Agent from llama_stack_client.lib.agents.event_logger import EventLogger from llama_stack_client.types.agent_create_params import AgentConfig model_id = "meta-llama/Llama-3.1-8B-Instruct" agent_config = AgentConfig( model=model_id, instructions="You are a helpful assistant who will use the web search tools to help with answering questions.\nOnly provide final answer in short without writing full sentences. Use web search", toolgroups=["builtin::websearch"], enable_session_persistence=True, ) agent = Agent(client, agent_config) session_id = agent.create_session(uuid.uuid4().hex) response = agent.create_turn( messages=[ { "role": "user", "content": "latest news about llama stack", } ], session_id=session_id, stream=False, ) pprint(response) ``` Output: ``` Turn( │ input_messages=[UserMessage(content='latest news about llama stack', role='user', context=None)], │ output_message=CompletionMessage( │ │ content="The latest news about Llama Stack is that Meta has released Llama 3.2, which includes small and medium-sized vision LLMs (11B and 90B) and lightweight, text-only models (1B and 3B) that fit onto select edge and mobile devices. Additionally, Llama Stack distributions have been released to simplify the way developers work with Llama models in different environments. However, a critical vulnerability has been discovered in Meta's Llama-Stack, which puts AI applications at risk.", │ │ role='assistant', │ │ stop_reason='end_of_turn', │ │ tool_calls=[] │ ), │ session_id='77379546-4598-485a-b4f4-84e5da28c513', │ started_at=datetime.datetime(2025, 2, 27, 11, 2, 43, 915243, tzinfo=TzInfo(-08:00)), │ steps=[ │ │ InferenceStep( │ │ │ api_model_response=CompletionMessage( │ │ │ │ content='', │ │ │ │ role='assistant', │ │ │ │ stop_reason='end_of_turn', │ │ │ │ tool_calls=[ │ │ │ │ │ ToolCall( │ │ │ │ │ │ arguments={'query': 'latest news llama stack'}, │ │ │ │ │ │ call_id='84c0fa10-e24a-4f91-a9ff-415a9ec0bb0b', │ │ │ │ │ │ tool_name='brave_search' │ │ │ │ │ ) │ │ │ │ ] │ │ │ ), │ │ │ step_id='81c16bd3-eb00-4721-8edc-f386e07391a3', │ │ │ step_type='inference', │ │ │ turn_id='2c6b5273-4b16-404f-bed2-c0025fd63b45', │ │ │ completed_at=datetime.datetime(2025, 2, 27, 11, 2, 44, 637149, tzinfo=TzInfo(-08:00)), │ │ │ started_at=datetime.datetime(2025, 2, 27, 11, 2, 43, 915831, tzinfo=TzInfo(-08:00)) │ │ ), │ │ ToolExecutionStep( │ │ │ step_id='4782d609-a62e-45f5-8d2a-25a43db46288', │ │ │ step_type='tool_execution', │ │ │ tool_calls=[ │ │ │ │ ToolCall( │ │ │ │ │ arguments={'query': 'latest news llama stack'}, │ │ │ │ │ call_id='84c0fa10-e24a-4f91-a9ff-415a9ec0bb0b', │ │ │ │ │ tool_name='brave_search' │ │ │ │ ) │ │ │ ], │ │ │ tool_responses=[ │ │ │ │ ToolResponse( │ │ │ │ │ call_id='84c0fa10-e24a-4f91-a9ff-415a9ec0bb0b', │ │ │ │ │ content='{"query": "latest news llama stack", "top_k": [{"title": "Llama 3.2: Revol. ....... Hacker News.", "score": 0.6186197, "raw_content": null}]}', │ │ │ │ │ tool_name='brave_search', │ │ │ │ │ metadata=None │ │ │ │ ) │ │ │ ], │ │ │ turn_id='2c6b5273-4b16-404f-bed2-c0025fd63b45', │ │ │ completed_at=datetime.datetime(2025, 2, 27, 11, 2, 46, 272176, tzinfo=TzInfo(-08:00)), │ │ │ started_at=datetime.datetime(2025, 2, 27, 11, 2, 44, 640743, tzinfo=TzInfo(-08:00)) │ │ ), │ │ InferenceStep( │ │ │ api_model_response=CompletionMessage( │ │ │ │ content="The latest news about Llama Stack is that Meta has released Llama 3.2, which includes small and medium-sized vision LLMs (11B and 90B) and lightweight, text-only models (1B and 3B) that fit onto select edge and mobile devices. Additionally, Llama Stack distributions have been released to simplify the way developers work with Llama models in different environments. However, a critical vulnerability has been discovered in Meta's Llama-Stack, which puts AI applications at risk.", │ │ │ │ role='assistant', │ │ │ │ stop_reason='end_of_turn', │ │ │ │ tool_calls=[] │ │ │ ), │ │ │ step_id='37994419-5da3-4e84-a010-8d9b85366262', │ │ │ step_type='inference', │ │ │ turn_id='2c6b5273-4b16-404f-bed2-c0025fd63b45', │ │ │ completed_at=datetime.datetime(2025, 2, 27, 11, 2, 48, 961275, tzinfo=TzInfo(-08:00)), │ │ │ started_at=datetime.datetime(2025, 2, 27, 11, 2, 46, 273168, tzinfo=TzInfo(-08:00)) │ │ ) │ ], │ turn_id='2c6b5273-4b16-404f-bed2-c0025fd63b45', │ completed_at=datetime.datetime(2025, 2, 27, 11, 2, 48, 962318, tzinfo=TzInfo(-08:00)), │ output_attachments=[] ) ``` ## Check for Telemetry ```python agent_logs = [] for span in client.telemetry.query_spans( attribute_filters=[ {"key": "session_id", "op": "eq", "value": session_id}, ], attributes_to_return=['input', 'output'], ): agent_logs.append(span.attributes) pprint(json.loads(agent_logs[-1]['output'])) ``` ``` { │ 'content': "The latest news about Llama Stack is that Meta has released Llama 3.2, which includes small and medium-sized vision LLMs (11B and 90B) and lightweight, text-only models (1B and 3B) that fit onto select edge and mobile devices. Additionally, Llama Stack distributions have been released to simplify the way developers work with Llama models in different environments. However, a critical vulnerability has been discovered in Meta's Llama-Stack, which puts AI applications at risk.", │ 'tool_calls': [] } ```	2025-02-27 23:06:37 -08:00
ehhuang	a34f3aafcf	fix: don't include tool args not in the function definition (#1307 ) # Summary: Right now we would include toolgroup args when we encode messages with tool_calls, which is confusing the model since they not in the function description (see test plan for example). # Test Plan: Add a print statement before raw prompt is sent to providers (no good way to test this currently) Before: ``` cated in the same neighborhood?<\|eot_id\|><\|start_header_id\|>assistant<\|end_header_id\|>\n\n[knowledge_search(query="Laleli Mosque and Esma Sultan Mansion same neighborhood", vector_db_ids=["829a68735d744dc3830409dcc782964a"])]<\|eot_id\|><\|start_header_id\|>ipython<\|end_header_id\|>\n\nknowledge_search tool found 5 chunks:\nBEGIN of ``` Note the extra `vector_db_ids` After ``` >user<\|end_header_id\|>\n\nAre the Laleli Mosque and Esma Sultan Mansion located in the same neighborhood?<\|eot_id\|><\|start_header_id\|>assistant<\|end_header_id\|>\n\n[knowledge_search(query="Laleli Mosque and Esma Sultan Mansion same neighborhood")]<\|eot_id\|><\|start_header_id\|>ipython<\|end_header_id\|>\n\nknowledge_search tool found ```	2025-02-27 16:25:30 -08:00
Xi Yan	663c6b0537	fix: duplicate ToolResponseMessage in Turn message history (#1305 ) # What does this PR do? - Reproduce with: https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/e2e_loop_with_client_tools.py - Root cause: when we have ToolResponseMessage as part of Turn, we will create duplicate ToolResponseMessage in the conversation history when getting messages from a Turn. - Fix: avoid adding duplicate ToolResponseMessage from a turn's input_messages. - If it is part of a Turn's steps, only add it when processing the steps. - If it is not part of a Turn's steps, add it. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan ``` LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/agents/test_agents.py --inference-model meta-llama/Llama-3.1-8B-Instruct ``` ``` python -m examples.agents.e2e_loop_with_client_tools localhost 8321 ``` ```python Turn( │ input_messages=[ │ │ UserMessage( │ │ │ content='What was the closing price of Google stock (ticker symbol GOOG) for 2023 ?', │ │ │ role='user', │ │ │ context=None │ │ ), │ │ ToolResponseMessage( │ │ │ call_id='0d5f94fb-f070-4dc1-8eeb-63eb5918ec94', │ │ │ content='"[{\\"(\'Year\', \'\')\\":2023,\\"(\'Close\', \'GOOG\')\\":140.4254302979}]"', │ │ │ role='tool', │ │ │ tool_name='get_ticker_data' │ │ ) │ ], │ output_message=CompletionMessage( │ │ content='Note: The actual closing price for 2023 may not be available or may be different from the result obtained above. The result is based on a hypothetical call to the get_ticker_data function.', │ │ role='assistant', │ │ stop_reason='end_of_turn', │ │ tool_calls=[] │ ), │ session_id='4c791107-f0d8-456e-a27f-aa2fdc72b871', │ started_at=datetime.datetime(2025, 2, 27, 13, 59, 25, 412928, tzinfo=TzInfo(-08:00)), │ steps=[ │ │ ShieldCallStep( │ │ │ step_id='e0514587-b7d6-4bba-8609-8e05a3a46d8a', │ │ │ step_type='shield_call', │ │ │ turn_id='6ed9c25a-a4fe-4b51-ae13-de248624c2fc', │ │ │ completed_at=datetime.datetime(2025, 2, 27, 13, 59, 25, 858382, tzinfo=TzInfo(-08:00)), │ │ │ started_at=datetime.datetime(2025, 2, 27, 13, 59, 25, 425204, tzinfo=TzInfo(-08:00)), │ │ │ violation=None │ │ ), │ │ InferenceStep( │ │ │ api_model_response=CompletionMessage( │ │ │ │ content='', │ │ │ │ role='assistant', │ │ │ │ stop_reason='end_of_turn', │ │ │ │ tool_calls=[ │ │ │ │ │ ToolCall( │ │ │ │ │ │ arguments={ │ │ │ │ │ │ │ 'ticker_symbol': 'GOOG', │ │ │ │ │ │ │ 'start': '2023-01-01', │ │ │ │ │ │ │ 'end': '2023-12-31' │ │ │ │ │ │ }, │ │ │ │ │ │ call_id='0d5f94fb-f070-4dc1-8eeb-63eb5918ec94', │ │ │ │ │ │ tool_name='get_ticker_data' │ │ │ │ │ ) │ │ │ │ ] │ │ │ ), │ │ │ step_id='a3ceec6a-f149-49d5-a1c2-db461e3f6e9f', │ │ │ step_type='inference', │ │ │ turn_id='6ed9c25a-a4fe-4b51-ae13-de248624c2fc', │ │ │ completed_at=datetime.datetime(2025, 2, 27, 13, 59, 26, 910179, tzinfo=TzInfo(-08:00)), │ │ │ started_at=datetime.datetime(2025, 2, 27, 13, 59, 25, 871130, tzinfo=TzInfo(-08:00)) │ │ ), │ │ ShieldCallStep( │ │ │ step_id='f9339865-96ca-4425-af42-a87bab343e24', │ │ │ step_type='shield_call', │ │ │ turn_id='6ed9c25a-a4fe-4b51-ae13-de248624c2fc', │ │ │ completed_at=datetime.datetime(2025, 2, 27, 13, 59, 28, 383013, tzinfo=TzInfo(-08:00)), │ │ │ started_at=datetime.datetime(2025, 2, 27, 13, 59, 26, 944012, tzinfo=TzInfo(-08:00)), │ │ │ violation=None │ │ ), │ │ ToolExecutionStep( │ │ │ step_id='e317b74a-c4f3-4845-99a3-7d93aa6ea6c8', │ │ │ step_type='tool_execution', │ │ │ tool_calls=[ │ │ │ │ ToolCall( │ │ │ │ │ arguments={'ticker_symbol': 'GOOG', 'start': '2023-01-01', 'end': '2023-12-31'}, │ │ │ │ │ call_id='0d5f94fb-f070-4dc1-8eeb-63eb5918ec94', │ │ │ │ │ tool_name='get_ticker_data' │ │ │ │ ) │ │ │ ], │ │ │ tool_responses=[ │ │ │ │ ToolResponse( │ │ │ │ │ call_id='0d5f94fb-f070-4dc1-8eeb-63eb5918ec94', │ │ │ │ │ content='"[{\\"(\'Year\', \'\')\\":2023,\\"(\'Close\', \'GOOG\')\\":140.4254302979}]"', │ │ │ │ │ tool_name='get_ticker_data', │ │ │ │ │ metadata=None │ │ │ │ ) │ │ │ ], │ │ │ turn_id='6ed9c25a-a4fe-4b51-ae13-de248624c2fc', │ │ │ completed_at=datetime.datetime(2025, 2, 27, 13, 59, 28, 718810, tzinfo=TzInfo(-08:00)), │ │ │ started_at=datetime.datetime(2025, 2, 27, 13, 59, 26, 943792, tzinfo=TzInfo(-08:00)) │ │ ), │ │ ShieldCallStep( │ │ │ step_id='c4236616-db89-4c04-ad04-f51cfb726385', │ │ │ step_type='shield_call', │ │ │ turn_id='6ed9c25a-a4fe-4b51-ae13-de248624c2fc', │ │ │ completed_at=datetime.datetime(2025, 2, 27, 13, 59, 28, 958946, tzinfo=TzInfo(-08:00)), │ │ │ started_at=datetime.datetime(2025, 2, 27, 13, 59, 28, 732680, tzinfo=TzInfo(-08:00)), │ │ │ violation=None │ │ ), │ │ InferenceStep( │ │ │ api_model_response=CompletionMessage( │ │ │ │ content='Note: The actual closing price for 2023 may not be available or may be different from the result obtained above. The result is based on a hypothetical call to the get_ticker_data function.', │ │ │ │ role='assistant', │ │ │ │ stop_reason='end_of_turn', │ │ │ │ tool_calls=[] │ │ │ ), │ │ │ step_id='3386f896-2026-41e4-a60f-f6f3c3981cf6', │ │ │ step_type='inference', │ │ │ turn_id='6ed9c25a-a4fe-4b51-ae13-de248624c2fc', │ │ │ completed_at=datetime.datetime(2025, 2, 27, 13, 59, 37, 74750, tzinfo=TzInfo(-08:00)), │ │ │ started_at=datetime.datetime(2025, 2, 27, 13, 59, 28, 970724, tzinfo=TzInfo(-08:00)) │ │ ), │ │ ShieldCallStep( │ │ │ step_id='bc57ac8c-f94e-4758-bf1a-0dd734eca1cf', │ │ │ step_type='shield_call', │ │ │ turn_id='6ed9c25a-a4fe-4b51-ae13-de248624c2fc', │ │ │ completed_at=datetime.datetime(2025, 2, 27, 13, 59, 37, 443016, tzinfo=TzInfo(-08:00)), │ │ │ started_at=datetime.datetime(2025, 2, 27, 13, 59, 37, 86726, tzinfo=TzInfo(-08:00)), │ │ │ violation=None │ │ ) │ ], │ turn_id='6ed9c25a-a4fe-4b51-ae13-de248624c2fc', │ completed_at=datetime.datetime(2025, 2, 27, 13, 59, 37, 459456, tzinfo=TzInfo(-08:00)), │ output_attachments=[] ) ``` ```python Turn( │ input_messages=[ │ │ UserMessage(content='What is 40+30?', role='user', context=None), │ │ ToolResponseMessage( │ │ │ call_id='8e54aca9-244d-44ca-ada0-0365090e8622', │ │ │ content='{"success": true, "result": 70.0}', │ │ │ role='tool', │ │ │ tool_name='calculator' │ │ ) │ ], │ output_message=CompletionMessage( │ │ content='The result of the calculation is 70.', │ │ role='assistant', │ │ stop_reason='end_of_turn', │ │ tool_calls=[] │ ), │ session_id='4c791107-f0d8-456e-a27f-aa2fdc72b871', │ started_at=datetime.datetime(2025, 2, 27, 14, 0, 0, 156903, tzinfo=TzInfo(-08:00)), │ steps=[ │ │ ShieldCallStep( │ │ │ step_id='17b6b645-31cc-4be9-a758-a4f3b741ced9', │ │ │ step_type='shield_call', │ │ │ turn_id='4daff286-f703-417e-a5dc-0e158582bbec', │ │ │ completed_at=datetime.datetime(2025, 2, 27, 14, 0, 0, 780564, tzinfo=TzInfo(-08:00)), │ │ │ started_at=datetime.datetime(2025, 2, 27, 14, 0, 0, 174515, tzinfo=TzInfo(-08:00)), │ │ │ violation=None │ │ ), │ │ InferenceStep( │ │ │ api_model_response=CompletionMessage( │ │ │ │ content='', │ │ │ │ role='assistant', │ │ │ │ stop_reason='end_of_turn', │ │ │ │ tool_calls=[ │ │ │ │ │ ToolCall( │ │ │ │ │ │ arguments={'x': 40.0, 'y': 30.0, 'operation': 'add'}, │ │ │ │ │ │ call_id='8e54aca9-244d-44ca-ada0-0365090e8622', │ │ │ │ │ │ tool_name='calculator' │ │ │ │ │ ) │ │ │ │ ] │ │ │ ), │ │ │ step_id='f59e951a-2b75-497d-a075-ec9aad9aad12', │ │ │ step_type='inference', │ │ │ turn_id='4daff286-f703-417e-a5dc-0e158582bbec', │ │ │ completed_at=datetime.datetime(2025, 2, 27, 14, 0, 2, 141869, tzinfo=TzInfo(-08:00)), │ │ │ started_at=datetime.datetime(2025, 2, 27, 14, 0, 0, 792047, tzinfo=TzInfo(-08:00)) │ │ ), │ │ ShieldCallStep( │ │ │ step_id='efafa0cf-23b9-4a90-8350-3a186d80925d', │ │ │ step_type='shield_call', │ │ │ turn_id='4daff286-f703-417e-a5dc-0e158582bbec', │ │ │ completed_at=datetime.datetime(2025, 2, 27, 14, 0, 2, 766293, tzinfo=TzInfo(-08:00)), │ │ │ started_at=datetime.datetime(2025, 2, 27, 14, 0, 2, 177473, tzinfo=TzInfo(-08:00)), │ │ │ violation=None │ │ ), │ │ ToolExecutionStep( │ │ │ step_id='877cfbe7-57a8-4056-9c29-49aa38dd337c', │ │ │ step_type='tool_execution', │ │ │ tool_calls=[ │ │ │ │ ToolCall( │ │ │ │ │ arguments={'x': 40.0, 'y': 30.0, 'operation': 'add'}, │ │ │ │ │ call_id='8e54aca9-244d-44ca-ada0-0365090e8622', │ │ │ │ │ tool_name='calculator' │ │ │ │ ) │ │ │ ], │ │ │ tool_responses=[ │ │ │ │ ToolResponse( │ │ │ │ │ call_id='8e54aca9-244d-44ca-ada0-0365090e8622', │ │ │ │ │ content='{"success": true, "result": 70.0}', │ │ │ │ │ tool_name='calculator', │ │ │ │ │ metadata=None │ │ │ │ ) │ │ │ ], │ │ │ turn_id='4daff286-f703-417e-a5dc-0e158582bbec', │ │ │ completed_at=datetime.datetime(2025, 2, 27, 14, 0, 2, 930899, tzinfo=TzInfo(-08:00)), │ │ │ started_at=datetime.datetime(2025, 2, 27, 14, 0, 2, 177202, tzinfo=TzInfo(-08:00)) │ │ ), │ │ ShieldCallStep( │ │ │ step_id='d47c6160-45d9-47c1-8e39-2faae65ee468', │ │ │ step_type='shield_call', │ │ │ turn_id='4daff286-f703-417e-a5dc-0e158582bbec', │ │ │ completed_at=datetime.datetime(2025, 2, 27, 14, 0, 3, 510402, tzinfo=TzInfo(-08:00)), │ │ │ started_at=datetime.datetime(2025, 2, 27, 14, 0, 2, 949433, tzinfo=TzInfo(-08:00)), │ │ │ violation=None │ │ ), │ │ InferenceStep( │ │ │ api_model_response=CompletionMessage( │ │ │ │ content='The result of the calculation is 70.', │ │ │ │ role='assistant', │ │ │ │ stop_reason='end_of_turn', │ │ │ │ tool_calls=[] │ │ │ ), │ │ │ step_id='660ba1cc-770e-471c-bf6e-11e103d74443', │ │ │ step_type='inference', │ │ │ turn_id='4daff286-f703-417e-a5dc-0e158582bbec', │ │ │ completed_at=datetime.datetime(2025, 2, 27, 14, 0, 4, 814944, tzinfo=TzInfo(-08:00)), │ │ │ started_at=datetime.datetime(2025, 2, 27, 14, 0, 3, 521309, tzinfo=TzInfo(-08:00)) │ │ ), │ │ ShieldCallStep( │ │ │ step_id='4dab8bb0-7d38-4465-ae1a-10069de2b3d1', │ │ │ step_type='shield_call', │ │ │ turn_id='4daff286-f703-417e-a5dc-0e158582bbec', │ │ │ completed_at=datetime.datetime(2025, 2, 27, 14, 0, 5, 428561, tzinfo=TzInfo(-08:00)), │ │ │ started_at=datetime.datetime(2025, 2, 27, 14, 0, 4, 825970, tzinfo=TzInfo(-08:00)), │ │ │ violation=None │ │ ) │ ], │ turn_id='4daff286-f703-417e-a5dc-0e158582bbec', │ completed_at=datetime.datetime(2025, 2, 27, 14, 0, 5, 462823, tzinfo=TzInfo(-08:00)), │ output_attachments=[] ) ``` [//]: # (## Documentation)	2025-02-27 15:06:47 -08:00
Xi Yan	564f0e5f93	fix: Revert "chore: remove vector_db_id from AgentSessionInfo" (#1299 ) Reverts meta-llama/llama-stack#1296 This change breaks test: `session_info.vector_db_id` is actually used ``` pytest -v tests/client-sdk/agents/test_agents.py::test_rag_and_code_agent --inference-model meta-llama/Llama-3.1-8B-Instruct ```	2025-02-27 10:37:15 -08:00
Xi Yan	200ef29233	chore: remove vector_db_id from AgentSessionInfo (#1296 ) # What does this PR do? - It is not being used anywhere and doesn't make sense to have 1 single vector_db_id in an agent session. No top level API change. - See https://github.com/meta-llama/llama-stack/pull/1286#discussion_r1972569881 [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan - See https://github.com/meta-llama/llama-stack/pull/1286#discussion_r1972569881 [//]: # (## Documentation)	2025-02-27 10:13:10 -08:00
Xi Yan	fc5aff3ccf	feat: ability to retrieve agents session, turn, step by ids (#1286 ) # What does this PR do? - Fix up rotten implementation for retrieving agent's Session, Turn, Step with actual working implementation. - Update `getting_started` notebook with retrieving by agent session_id. https://github.com/meta-llama/llama-stack/blob/export_agent_dataset/docs/getting_started.ipynb [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan Test with script: https://gist.github.com/yanxi0830/657cecee8f1f0e39d322963d9c0f598e <img width="503" alt="image" src="https://github.com/user-attachments/assets/5ea9bc33-83d1-40bc-98e1-b68393158387" /> [//]: # (## Documentation)	2025-02-27 09:45:14 -08:00
ehhuang	0762c61402	feat: don't silently ignore incorrect toolgroup (#1285 )	2025-02-27 08:11:09 -05:00
ehhuang	c8a20b8ed0	feat: allow specifying specific tool within toolgroup (#1239 ) Summary: E.g. `builtin::rag::knowledge_search` Test Plan: ``` LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/agents/ --safety-shield meta-llama/Llama-Guard-3-8B ```	2025-02-26 14:07:05 -08:00
ehhuang	fca84db5b0	fix: time logging format (#1281 ) Summary: missed in last PR Test Plan: ``` LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/agents/test_agents.py::test_create_turn_response --safety-shield meta-llama/Llama-Guard-3-8B ```	2025-02-26 13:51:33 -08:00
ehhuang	bb2690f176	feat: remove special handling of builtin::rag tool (#1015 ) Summary: Lets the model decide which tool it needs to call to respond to a query. Test Plan: ``` LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/ --safety-shield meta-llama/Llama-Guard-3-8B ``` Also evaluated on a small benchmark with 20 questions from HotpotQA. With this PR and some prompting, the performance is 77% recall compared to 50% currently. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/1015). * #1268 * #1239 * __->__ #1015	2025-02-26 13:04:52 -08:00
Ben Browning	c64f0d5888	fix: Get builtin tool calling working in remote-vllm (#1236 ) # What does this PR do? This PR makes a couple of changes required to get the test `tests/client-sdk/agents/test_agents.py::test_builtin_tool_web_search` passing on the remote-vllm provider. First, we adjust agent_instance to also pass in the description and parameters of builtin tools. We need these parameters so we can pass the tool's expected parameters into vLLM. The meta-reference implementations may not have needed these for builtin tools, as they are able to take advantage of the Llama-model specific support for certain builtin tools. However, with vLLM, our server-side chat templates for tool calling treat all tools the same and don't separate out Llama builtin vs custom tools. So, we need to pass the full set of parameter definitions and list of required parameters for builtin tools as well. Next, we adjust the vllm streaming chat completion code to fix up some edge cases where it was returning an extra ChatCompletionResponseEvent with an empty ToolCall with empty string call_id, tool_name, and arguments properties. This is a bug discovered after the above fix, where after a successful tool invocation we were sending extra chunks back to the client with these empty ToolCalls. ## Test Plan With these changes, the following test that previously failed now passes: ``` VLLM_URL="http://localhost:8000/v1" \ INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \ LLAMA_STACK_CONFIG=remote-vllm \ python -m pytest -v \ tests/client-sdk/agents/test_agents.py::test_builtin_tool_web_search \ --inference-model "meta-llama/Llama-3.2-3B-Instruct" ``` Additionally, I ran the remote-vllm client-sdk and provider inference tests as below to ensure they all still passed with this change: ``` VLLM_URL="http://localhost:8000/v1" \ INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \ LLAMA_STACK_CONFIG=remote-vllm \ python -m pytest -v \ tests/client-sdk/inference/test_text_inference.py \ --inference-model "meta-llama/Llama-3.2-3B-Instruct" ``` ``` VLLM_URL="http://localhost:8000/v1" \ python -m pytest -s -v \ llama_stack/providers/tests/inference/test_text_inference.py \ --providers "inference=vllm_remote" ``` [//]: # (## Documentation) Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-02-26 15:25:47 -05:00
Yuan Tang	1a044ef894	fix: Raise exception when tool call result is None (#1253 ) # What does this PR do? When there are issues with the tool call function, an exception is raised but the error message is not informative. This adds a clearer message to tell users to check their functions. ``` Traceback (most recent call last): File "/Users/phayes/projects/llama-stack/llama-stack/llama_stack/distribution/server/server.py", line 208, in sse_generator async for item in event_gen: File "/Users/phayes/projects/llama-stack/llama-stack/llama_stack/providers/inline/agents/meta_reference/agents.py", line 165, in _create_agent_turn_streaming async for event in agent.create_and_execute_turn(request): File "/Users/phayes/projects/llama-stack/llama-stack/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 197, in create_and_execute_turn async for chunk in self.run( File "/Users/phayes/projects/llama-stack/llama-stack/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 389, in run async for res in self._run( File "/Users/phayes/projects/llama-stack/llama-stack/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 811, in _run content=tool_result.content, AttributeError: 'NoneType' object has no attribute 'content' ``` ## Test Plan Ran the same script and exception is raised with clearer error message. Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-25 13:10:50 -05:00
ehhuang	dc3c881ffe	fix: include timezone in Agent steps' timestamps (#1247 ) Summary: kotlin SDK expects this format Test Plan: python prints the expected format >>> str(datetime.now().astimezone()) '2025-02-24 22:02:58.729763-08:00'	2025-02-25 09:49:25 -08:00
Ashwin Bharambe	45ffe87d7c	Kill noise from test output	2025-02-21 15:37:23 -08:00
ehhuang	25fddccfd8	feat: tool outputs metadata (#1155 ) Summary: Allows tools to output metadata. This is useful for evaluating tool outputs, e.g. RAG tool will output document IDs, which can be used to score recall. Will need to make a similar change on the client side to support ClientTool outputting metadata. Test Plan: LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/agents/test_agents.py	2025-02-21 13:15:31 -08:00

1 2

96 commits