- Convert test classes to pytest functions and fixtures
- Replace unittest assertions with pytest assertions
- Use pytest.mark.parametrize for parameterized tests
- Remove unittest.TestCase inheritance and setUp methods
- Maintain all test functionality and coverage
Addresses review feedback from PR #2663
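
For illustration, the conversion pattern listed above can be sketched as follows. This is a minimal example, not code from this change: `normalize`, the `padding` fixture, and the test names are hypothetical stand-ins, and the unittest version is shown only as a comment for comparison.

```python
import pytest


def normalize(text: str) -> str:
    """Hypothetical function under test (not part of this change)."""
    return text.strip().lower()


# Before (unittest style), kept as a comment for comparison:
#
# class NormalizeTests(unittest.TestCase):
#     def setUp(self):
#         self.padding = "  "
#
#     def test_strips_padding(self):
#         self.assertEqual(normalize(self.padding + "Llama" + self.padding), "llama")


@pytest.fixture
def padding() -> str:
    # Replaces the state that setUp() used to attach to self.
    return "  "


def test_strips_padding(padding):
    # A plain assert replaces self.assertEqual.
    assert normalize(padding + "Llama" + padding) == "llama"


@pytest.mark.parametrize(
    ("raw", "expected"),
    [("A", "a"), (" b ", "b"), ("", "")],
)
def test_normalize(raw, expected):
    # One function covers what used to be several near-identical methods.
    assert normalize(raw) == expected
```

With pytest, each parametrized case is reported as a separate test, so coverage stays the same while the boilerplate shrinks.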
This commit fixes tool calling failures with Llama4 models, which returned
500 errors while the same requests against the Together API worked
correctly. The root cause was that the system used Llama3's JSON format
for all models instead of Llama4's python_list format.
Key changes:
- NEW: llama_stack/models/llama/llama4/interface.py - Complete Llama4 interface
  with python_list tool format support
- MODIFIED: prompt_adapter.py - Added model-aware decode_assistant_message()
  that uses Llama4ChatFormat for llama4 models and Llama3ChatFormat for others
- MODIFIED: openai_compat.py - Updated to pass model_id parameter to enable
  model-specific format detection
- MODIFIED: sku_list.py - Enhanced with provider alias support for better
  model resolution
- NEW: tests/unit/models/test_decode_assistant_message.py - Comprehensive unit
  tests for the new decode_assistant_message function
The fix ensures that:
- Llama4 models (meta-llama/Llama-4-*) use python_list format: [func(args)]
- Other models continue using JSON format: {"type": "function", ...}
- Backward compatibility is maintained for existing models
- Tool calling works correctly across different model families
- A graceful fallback applies when Llama4 dependencies are unavailable
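
As a rough illustration of the behavior described above, the sketch below picks a format from the model id and parses each format. It is not the code in prompt_adapter.py: the helper names (`tool_format_for`, `parse_python_list`, `parse_json_call`, `decode_tool_calls_sketch`) are hypothetical, the prefix check is a simplification, and the `name` / `parameters` JSON field names are an assumption about the Llama3-style payload.

```python
import ast
import json


def tool_format_for(model_id: str) -> str:
    """Hypothetical helper: choose a tool-call format from the model id."""
    if model_id.startswith("meta-llama/Llama-4-"):
        return "python_list"
    return "json"


def parse_python_list(text: str) -> list[dict]:
    """Parse '[func(arg=value)]' into a list of name/arguments dicts."""
    tree = ast.parse(text.strip(), mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("expected a list of function calls")
    calls = []
    for node in tree.body.elts:
        if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
            raise ValueError("expected plain function calls")
        if node.args or any(kw.arg is None for kw in node.keywords):
            raise ValueError("only keyword arguments are supported in this sketch")
        # literal_eval only accepts literals, so no code is executed here.
        arguments = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        calls.append({"name": node.func.id, "arguments": arguments})
    return calls


def parse_json_call(text: str) -> list[dict]:
    """Parse a JSON tool call; the field names are assumed for illustration."""
    payload = json.loads(text)
    return [{"name": payload["name"], "arguments": payload.get("parameters", {})}]


def decode_tool_calls_sketch(model_id: str, text: str) -> list[dict]:
    """Dispatch to the parser that matches the model family."""
    if tool_format_for(model_id) == "python_list":
        return parse_python_list(text)
    return parse_json_call(text)
```

Passing the model id down to the decoder is what lets a single code path serve both model families without changing behavior for existing models.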
Testing:
- All 17 unit tests pass (9 original + 8 new)
- Conditional imports prevent torch dependency issues
- Comprehensive test coverage for different model types and scenarios
Fixes #2584
# What does this PR do?
Closes #1584
This should be a rather innocuous change.
## Test Plan
Verify that the tool call parsing error from the example in the issue no longer occurs:
<img width="1216" alt="image"
src="https://github.com/user-attachments/assets/a5a6f4e8-2093-4ca2-bc06-794b707a0429"
/>
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/integration/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B --text-model meta-llama/Llama-3.1-8B-Instruct
### What does this PR do?
Currently, `ToolCall.arguments` is a `Dict[str, RecursiveType]`.
However, on the client SDK side, `RecursiveType` is deserialized into a
generic number (int and float are collapsed), so `int` params are
converted to float. This can break client-side tools that do type
checking.
Closes: https://github.com/meta-llama/llama-stack/issues/1683
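
To make the failure mode concrete, here is a small self-contained sketch. It does not use the client SDK: `lossy_number_decode` only mimics a deserializer that maps every number to float, and `strict_tool` is a hypothetical client-side tool. The JSON-string round trip at the end is shown as one possible way to preserve the int/float distinction, not necessarily the approach taken here.

```python
import json


def lossy_number_decode(arguments: dict) -> dict:
    """Mimic a client that turns every numeric value into a float (assumption)."""
    return {
        key: float(value)
        if isinstance(value, (int, float)) and not isinstance(value, bool)
        else value
        for key, value in arguments.items()
    }


def strict_tool(count):
    """Hypothetical client-side tool that type-checks its parameter."""
    if not isinstance(count, int):
        raise TypeError(f"count must be int, got {type(count).__name__}")
    return list(range(count))


original = {"count": 3}

# The collapsed value is 3.0, so the tool's isinstance check fails.
collapsed = lossy_number_decode(original)

# Carrying the arguments as a JSON string keeps ints as ints after decoding.
encoded = json.dumps(original)
restored = json.loads(encoded)
```

Calling `strict_tool(**collapsed)` raises `TypeError`, while `strict_tool(**restored)` succeeds.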
### Test Plan
Stainless changes --
https://github.com/meta-llama/llama-stack-client-python/pull/204
```
pytest -s -v --stack-config=fireworks tests/integration/agents/test_agents.py --text-model meta-llama/Llama-3.1-8B-Instruct
```
# What does this PR do?
The test class by default enables debug mode, which produces some
unexpected warnings like:
```
tests/unit/models/test_prompt_adapter.py::PrepareMessagesTests::test_completion_message_encoding
WARNING 2025-03-10 20:41:48,577 asyncio:1904 uncategorized: Executing <Task pending name='Task-1' coro=<IsolatedAsyncioTestCase._asyncioLoopRunner() running at /home/ec2-user/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/unittest/async_case.py:95> wait_for=<Future pending cb=[Task.task_wakeup()] created at /home/ec2-user/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/asyncio/base_events.py:429> created at /home/ec2-user/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/unittest/async_case.py:117> took 0.231 seconds
PASSED
```
I suggest we disable these since they are not very useful and can
confuse other developers.
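
One way to turn the debug mode off, sketched here under the assumption that the tests subclass `unittest.IsolatedAsyncioTestCase`, is to reset the flag on the running loop in `asyncSetUp`. The class and helper names below are hypothetical, and the actual change may take a different route.

```python
import asyncio
import unittest


class QuietAsyncTests(unittest.IsolatedAsyncioTestCase):
    async def asyncSetUp(self):
        # IsolatedAsyncioTestCase enables asyncio debug mode on its loop;
        # debug mode logs a warning for any step slower than
        # loop.slow_callback_duration (0.1 seconds by default).
        asyncio.get_running_loop().set_debug(False)

    async def test_debug_is_disabled(self):
        # The test runs on the same loop that asyncSetUp configured.
        self.assertFalse(asyncio.get_running_loop().get_debug())


def run_quiet_tests() -> unittest.TestResult:
    """Run the example test case programmatically and return the result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(QuietAsyncTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The setting is scoped to the test class, so other test classes keep the default debug behavior.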
## Test Plan
Run tests. The warnings are no longer seen.
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>