llama-stack-mirror/llama_stack/providers/tests/inference
Matthew Farrellee 435f34b05e
reduce the accuracy requirements to pass the chat completion structured output test (#522)
i find `test_structured_output` to be flaky. it's both a functionality test
and an accuracy test -

```
        answer = AnswerFormat.model_validate_json(response.completion_message.content)
        assert answer.first_name == "Michael"
        assert answer.last_name == "Jordan"
        assert answer.year_of_birth == 1963
        assert answer.num_seasons_in_nba == 15
```
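
for context, `AnswerFormat` is presumably a pydantic model along these lines (a sketch inferred from the assertions above, not necessarily the test's exact definition) -

```
from pydantic import BaseModel


# field names and types inferred from the assertions above; the test's
# actual definition may differ
class AnswerFormat(BaseModel):
    first_name: str
    last_name: str
    year_of_birth: int
    num_seasons_in_nba: int
```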

it's an accuracy test because it checks the values of the first/last name,
birth year, and num seasons.

i find that -
- llama-3.1-8b-instruct and llama-3.2-3b-instruct pass the functionality portion
- llama-3.2-3b-instruct consistently fails the accuracy portion (thinking MJ was in the NBA for 14 seasons)
- llama-3.1-8b-instruct occasionally fails the accuracy portion

suggestions (not mutually exclusive) -
1. turn the test into functionality only, skip the value checks
2. split the test into a functionality version and an xfail accuracy version (see the sketch after this list)
3. add context to the prompt so the llm can answer without accessing embedded memory
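
for illustration, a minimal sketch of what option (2) could look like with pytest's `xfail` marker. `get_structured_response` is a hypothetical stand-in for the real chat-completion call, and the model is repeated from the sketch above so this block runs standalone -

```
import pytest
from pydantic import BaseModel


# same model as sketched above, repeated so this block runs standalone
class AnswerFormat(BaseModel):
    first_name: str
    last_name: str
    year_of_birth: int
    num_seasons_in_nba: int


def get_structured_response() -> str:
    # hypothetical stand-in for the real chat-completion call; returns the
    # raw JSON string that response.completion_message.content would hold
    return (
        '{"first_name": "Michael", "last_name": "Jordan", '
        '"year_of_birth": 1963, "num_seasons_in_nba": 14}'
    )


def test_structured_output_functionality():
    # functionality only: the output must parse against the schema
    AnswerFormat.model_validate_json(get_structured_response())


@pytest.mark.xfail(reason="accuracy depends on the model's embedded knowledge")
def test_structured_output_accuracy():
    # accuracy: the parsed values must match the known facts
    answer = AnswerFormat.model_validate_json(get_structured_response())
    assert answer.num_seasons_in_nba == 15
```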

# What does this PR do?

implements option (3) by adding context to the system prompt.
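
for illustration, the added context might look something like this (the exact wording in the PR may differ) -

```
# hypothetical sketch of option (3): the system prompt supplies the facts,
# so the accuracy checks no longer rely on the model's embedded memory
messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant. Michael Jordan was born in "
        "1963. He played basketball for the Chicago Bulls for 15 seasons.",
    },
    {"role": "user", "content": "Please give me information about Michael Jordan."},
]
```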


## Test Plan


`pytest -s -v ... llama_stack/providers/tests/inference/ ... -k structured_output`


## Before submitting

- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section?
- [x] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.
2024-12-03 02:55:14 -08:00
| File | Last commit | Date |
|------|-------------|------|
| __init__.py | Remove "routing_table" and "routing_key" concepts for the user (#201) | 2024-10-10 10:24:13 -07:00 |
| conftest.py | add NVIDIA NIM inference adapter (#355) | 2024-11-23 15:59:00 -08:00 |
| fixtures.py | Tgi fixture (#519) | 2024-11-25 13:17:02 -08:00 |
| pasta.jpeg | Enable vision models for (Together, Fireworks, Meta-Reference, Ollama) (#376) | 2024-11-05 16:22:33 -08:00 |
| test_model_registration.py | Since we are pushing for HF repos, we should accept them in inference configs (#497) | 2024-11-20 16:14:37 -08:00 |
| test_prompt_adapter.py | Added tests for persistence (#274) | 2024-10-22 19:41:46 -07:00 |
| test_text_inference.py | reduce the accuracy requirements to pass the chat completion structured output test (#522) | 2024-12-03 02:55:14 -08:00 |
| test_vision_inference.py | Don't skip meta-reference for the tests | 2024-11-21 13:29:53 -08:00 |
| utils.py | Enable vision models for (Together, Fireworks, Meta-Reference, Ollama) (#376) | 2024-11-05 16:22:33 -08:00 |