# What does this PR do?

We currently use `max_infer_iters` in two different ways:

1. Server side: track the number of inference calls performed.
2. Client side: track the number of times we send a `resume_turn` request.

This PR removes the need for (2) and makes the server track the total number of times we perform inference within a Turn.

**NOTE** The PR assumes `StopReason` is set to:

- `end_of_message`: the turn is not finished; we could be waiting for client tool call responses.
- `end_of_turn`: the entire turn is finished and there is nothing more to be done.

## Test Plan

```
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/agents/test_agents.py::test_custom_tool_infinite_loop --inference-model "meta-llama/Llama-3.3-70B-Instruct"
```
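To make the intended control flow concrete, below is a minimal sketch of a server-side turn loop that owns the iteration budget. This is not the actual llama-stack implementation; the helper names (`run_inference`, `wait_for_tool_responses`) and the `Completion` shape are assumptions for illustration, while `max_infer_iters` and the two `StopReason` values come from the PR description above.

```python
from enum import Enum


class StopReason(Enum):
    # Waiting on client tool call responses; the turn is not finished.
    end_of_message = "end_of_message"
    # The entire turn is finished; nothing more to be done.
    end_of_turn = "end_of_turn"


def run_turn(agent, turn, max_infer_iters: int):
    """Hypothetical server-side loop: the server alone counts inference
    iterations across the whole turn, including iterations that happen
    after the client resumes the turn with tool responses, so the client
    no longer needs its own counter of resume_turn requests."""
    n_infer_iters = 0
    while n_infer_iters < max_infer_iters:
        completion = agent.run_inference(turn)  # assumed helper
        n_infer_iters += 1

        if completion.stop_reason == StopReason.end_of_turn:
            # Turn fully finished; return the final result.
            return completion

        if completion.stop_reason == StopReason.end_of_message:
            # Block until the client supplies tool call responses; the
            # same server-side counter keeps incrementing afterwards.
            turn = agent.wait_for_tool_responses(turn)  # assumed helper

    raise RuntimeError(f"exceeded max_infer_iters ({max_infer_iters})")
```

The key point of this sketch is that the client's only responsibility is to supply tool responses when asked; the inference-iteration budget lives entirely on the server.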