llama-stack-mirror/llama_stack/providers/inline/agents/meta_reference
Xi Yan 7d111c7510
feat: unify max_infer_iters in client/server agent loop (#1309)
# What does this PR do?

We currently use `max_infer_iters` in 2 different ways:
1/ Server side: to track the number of inference calls within a turn
2/ Client side: to track the number of times we send a `resume_turn` request

This PR removes the need for (2) and makes the server track the total number
of times we perform inference within a Turn.

**NOTE**
This PR assumes `StopReason` is set as follows:
- `end_of_message`: the turn is not finished; we may be waiting for client
tool call responses
- `end_of_turn`: the entire turn is finished and there is nothing more
to be done
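
The unified loop described above can be sketched roughly as follows. This is a minimal illustration, not the actual `agent_instance.py` implementation: `StopReason`, `run_turn`, and `infer_once` are hypothetical stand-ins, and the key point is that a single server-side counter bounds all inference calls in a Turn, surviving across client `resume_turn` requests.

```python
from enum import Enum


class StopReason(Enum):
    # Hypothetical mirror of the two stop reasons this PR relies on.
    end_of_message = "end_of_message"  # turn paused, e.g. awaiting client tool responses
    end_of_turn = "end_of_turn"        # entire turn is finished


def run_turn(infer_once, max_infer_iters=5):
    """Illustrative server-side loop: one counter caps the total number of
    inference calls within a Turn, so the client no longer needs to count
    its own resume_turn requests.

    infer_once(n) performs one inference step and returns a StopReason.
    Returns (stop_reason, iterations_used).
    """
    for n_iter in range(1, max_infer_iters + 1):
        stop_reason = infer_once(n_iter)
        if stop_reason is StopReason.end_of_turn:
            # Turn fully finished; nothing more to do.
            return stop_reason, n_iter
        # end_of_message: turn not finished (e.g. waiting on client tool
        # call responses); the loop simply continues counting.
    # Inference budget exhausted: force the turn to end.
    return StopReason.end_of_turn, max_infer_iters
```

For example, a model that emits two `end_of_message` steps (client tool calls) before finishing would consume 3 of the budgeted iterations, all counted server-side.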


## Test Plan
```
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/agents/test_agents.py::test_custom_tool_infinite_loop --inference-model "meta-llama/Llama-3.3-70B-Instruct"
```

2025-03-03 10:08:36 -08:00
..
tests chore: move all Llama Stack types from llama-models to llama-stack (#1098) 2025-02-14 09:10:59 -08:00
__init__.py Fix precommit check after moving to ruff (#927) 2025-02-02 06:46:45 -08:00
agent_instance.py feat: unify max_infer_iters in client/server agent loop (#1309) 2025-03-03 10:08:36 -08:00
agents.py feat: ability to retrieve agents session, turn, step by ids (#1286) 2025-02-27 09:45:14 -08:00
config.py Auto-generate distro yamls + docs (#468) 2024-11-18 14:57:06 -08:00
persistence.py feat: unify max_infer_iters in client/server agent loop (#1309) 2025-03-03 10:08:36 -08:00
safety.py build: configure ruff from pyproject.toml (#1100) 2025-02-14 09:01:57 -08:00