llama-stack-mirror/llama_stack/providers
Matthew Farrellee 435f34b05e
reduce the accuracy requirements to pass the chat completion structured output test (#522)
i find `test_structured_output` to be flakey. it's both a functionality
and accuracy test -

```
        answer = AnswerFormat.model_validate_json(response.completion_message.content)
        assert answer.first_name == "Michael"
        assert answer.last_name == "Jordan"
        assert answer.year_of_birth == 1963
        assert answer.num_seasons_in_nba == 15
```

it's an accuracy test because it checks the value of first/last name,
birth year, and num seasons.

i find that -
- llama-3.1-8b-instruct and llama-3.2-3b-instruct pass the functionality
portion
- llama-3.2-3b-instruct consistently fails the accuracy portion
(thinking MJ was in the NBA for 14 seasons)
 - llama-3.1-8b-instruct occasionally fails the accuracy portion

suggestions (not mutually exclusive) -
1. turn the test into functionality only, skip the value checks
2. split the test into a functionality version and an xfail accuracy
version
3. add context to the prompt so the llm can answer without accessing
embedded memory

# What does this PR do?

implements option (3) by adding context to the system prompt.


## Test Plan


`pytest -s -v ... llama_stack/providers/tests/inference/ ... -k
structured_output`


## Before submitting

- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [x] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.
2024-12-03 02:55:14 -08:00
..
inline fixes tests & move braintrust api_keys to request headers (#535) 2024-11-26 13:11:21 -08:00
registry fixes tests & move braintrust api_keys to request headers (#535) 2024-11-26 13:11:21 -08:00
remote allow env NVIDIA_BASE_URL to set NVIDIAConfig.url (#531) 2024-11-26 17:46:44 -08:00
tests reduce the accuracy requirements to pass the chat completion structured output test (#522) 2024-12-03 02:55:14 -08:00
utils add missing __init__ 2024-11-25 09:42:46 -08:00
__init__.py API Updates (#73) 2024-09-17 19:51:35 -07:00
datatypes.py unregister for memory banks and remove update API (#458) 2024-11-14 17:12:11 -08:00