llama-stack-mirror/llama_stack
Matthew Farrellee 435f34b05e
reduce the accuracy requirements to pass the chat completion structured output test (#522)
i find `test_structured_output` to be flakey. it's both a functionality
and accuracy test -

```
        answer = AnswerFormat.model_validate_json(response.completion_message.content)
        assert answer.first_name == "Michael"
        assert answer.last_name == "Jordan"
        assert answer.year_of_birth == 1963
        assert answer.num_seasons_in_nba == 15
```

it's an accuracy test because it checks the value of first/last name,
birth year, and num seasons.

i find that -
- llama-3.1-8b-instruct and llama-3.2-3b-instruct pass the functionality
portion
- llama-3.2-3b-instruct consistently fails the accuracy portion
(thinking MJ was in the NBA for 14 seasons)
 - llama-3.1-8b-instruct occasionally fails the accuracy portion

suggestions (not mutually exclusive) -
1. turn the test into functionality only, skip the value checks
2. split the test into a functionality version and an xfail accuracy
version
3. add context to the prompt so the llm can answer without accessing
embedded memory

# What does this PR do?

implements option (3) by adding context to the system prompt.


## Test Plan


`pytest -s -v ... llama_stack/providers/tests/inference/ ... -k
structured_output`


## Before submitting

- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [x] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.
2024-12-03 02:55:14 -08:00
..
apis Fix opentelemetry adapter (#510) 2024-11-22 18:18:11 -08:00
cli No need to use os.path.relpath() when Path() knows everything anyway 2024-11-23 11:45:47 -08:00
distribution move playground ui to llama-stack repo (#536) 2024-11-26 22:04:21 -08:00
providers reduce the accuracy requirements to pass the chat completion structured output test (#522) 2024-12-03 02:55:14 -08:00
scripts Integrate distro docs into the restructured docs 2024-11-20 23:20:05 -08:00
templates Fix URLs to Llama Stack Read the Docs Webpages (#547) 2024-11-29 10:11:50 -06:00
__init__.py API Updates (#73) 2024-09-17 19:51:35 -07:00