llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-06-28 02:53:30 +00:00


Xi Yan 7d111c7510 feat: unify max_infer_iters in client/server agent loop (#1309)	2025-03-03 10:08:36 -08:00
meta_reference	feat: unify max_infer_iters in client/server agent loop (#1309)	2025-03-03 10:08:36 -08:00
__init__.py	add missing inits	2024-11-08 17:54:24 -08:00

feat: unify max_infer_iters in client/server agent loop (#1309)

# What does this PR do?

We currently use `max_infer_iters` in two different ways:
1/ Server side: tracks the number of times we perform inference
2/ Client side: tracks the number of times we send a `resume_turn` request

This PR removes the need for (2) and makes the server track the total
number of times we perform inference within a Turn.
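To illustrate the server-side bookkeeping described above, here is a minimal sketch, assuming a hypothetical `TurnInferenceBudget` helper (not part of llama-stack) that keys the counter by turn ID rather than by request. Because the count survives across the initial request and every `resume_turn` call, the client no longer needs a counter of its own. What the real server does once the budget is spent is not described here; the sketch simply reports `False`.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TurnState:
    # Hypothetical per-turn bookkeeping held by the server.
    turn_id: str
    inference_count: int = 0  # total inference calls across all requests in this turn


class TurnInferenceBudget:
    """Hypothetical helper: one max_infer_iters budget shared by a whole turn."""

    def __init__(self, max_infer_iters: int) -> None:
        self.max_infer_iters = max_infer_iters
        self._turns: Dict[str, TurnState] = {}

    def try_consume(self, turn_id: str) -> bool:
        # The counter is keyed by turn, not by request, so it keeps growing
        # across the initial request and every subsequent resume_turn call.
        state = self._turns.setdefault(turn_id, TurnState(turn_id))
        if state.inference_count >= self.max_infer_iters:
            return False  # budget exhausted: the caller should stop inferring
        state.inference_count += 1
        return True

    def used(self, turn_id: str) -> int:
        # Number of inference calls already charged to this turn.
        state = self._turns.get(turn_id)
        return state.inference_count if state else 0


if __name__ == "__main__":
    budget = TurnInferenceBudget(max_infer_iters=3)
    # Initial request: two inference calls, then the turn pauses for a client tool.
    budget.try_consume("turn-1")
    budget.try_consume("turn-1")
    # resume_turn: the same budget is reused, so only one more call fits.
    print(budget.try_consume("turn-1"))  # True
    print(budget.try_consume("turn-1"))  # False
```

The design choice being illustrated is simply where the counter lives: attaching it to the turn makes the limit independent of how many round trips the client makes.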

**NOTE**
This PR assumes `StopReason` is set to:
- `end_of_message`: the turn is not finished; we may be waiting for client
tool call responses.
- `end_of_turn`: the entire turn is finished and there is nothing more to
be done.
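A client following the `StopReason` convention above reduces to a loop that resumes while the server reports `end_of_message` and stops on `end_of_turn`. The sketch below is illustrative only: `StopReason` here is a local stand-in enum, and `start_turn`, `resume_turn`, and `run_client_tool` are hypothetical callables, not the llama-stack client SDK API. It deliberately keeps no iteration counter; bounding the loop is left to the server's `max_infer_iters`.

```python
from enum import Enum
from typing import Callable, List, Tuple


class StopReason(Enum):
    # Local stand-in for the stop reasons named in the PR description.
    end_of_message = "end_of_message"
    end_of_turn = "end_of_turn"


# Hypothetical server reply: (stop reason, pending client tool calls).
Reply = Tuple[StopReason, List[str]]


def drive_turn(
    start_turn: Callable[[], Reply],
    resume_turn: Callable[[List[str]], Reply],
    run_client_tool: Callable[[str], str],
) -> int:
    """Drive one turn to completion and return how many resume_turn calls were made.

    No client-side max_infer_iters: the loop trusts the server to end the
    turn (end_of_turn) once its own per-turn budget is spent.
    """
    stop_reason, tool_calls = start_turn()
    resumes = 0
    while stop_reason is StopReason.end_of_message:
        # Turn not finished: the server is waiting for client tool results.
        tool_results = [run_client_tool(call) for call in tool_calls]
        stop_reason, tool_calls = resume_turn(tool_results)
        resumes += 1
    # end_of_turn: nothing more to do for this turn.
    return resumes


if __name__ == "__main__":
    # Fake server: pauses twice for a client tool, then finishes the turn.
    replies = iter([
        (StopReason.end_of_message, ["lookup"]),
        (StopReason.end_of_turn, []),
    ])
    n = drive_turn(
        start_turn=lambda: (StopReason.end_of_message, ["lookup"]),
        resume_turn=lambda results: next(replies),
        run_client_tool=lambda call: f"result of {call}",
    )
    print(n)  # 2
```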


## Test Plan
```shell
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/agents/test_agents.py::test_custom_tool_infinite_loop --inference-model "meta-llama/Llama-3.3-70B-Instruct"
```


