test: Make text-based chat completion tests run 10x faster (#1016)

# What does this PR do?

This significantly shortens the test time (about 10x faster), since most
of the time was spent outputting long answers such as "there are several
planets in our solar system that have...". Questions that elicit short
answers return results much faster, which matters especially when
testing even larger models.

## Test Plan

```
LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -v tests/client-sdk/inference/test_text_inference.py -k "test_text_chat_completion_non_streaming or test_text_chat_completion_streaming"
================================================================== test session starts ===================================================================
platform linux -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /home/yutang/.conda/envs/myenv/bin/python3.10
cachedir: .pytest_cache
rootdir: /home/yutang/repos/llama-stack
configfile: pyproject.toml
plugins: anyio-4.7.0
collected 12 items / 8 deselected / 4 selected                                                                                                           

tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-Which planet do humans live on?-Earth] PASSED [ 25%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-Which planet has rings around it with a name starting with letter S?-Saturn] PASSED [ 50%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What's the name of the Sun in latin?-Sol] PASSED [ 75%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What is the name of the US captial?-Washington] PASSED [100%]


```

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Author: Yuan Tang, 2025-02-08 14:49:46 -05:00 (committed by GitHub)
commit 413099ef6a
parent 7766e68e92


```diff
@@ -156,8 +156,8 @@ def test_text_completion_structured_output(llama_stack_client, text_model_id, in
 @pytest.mark.parametrize(
     "question,expected",
     [
-        ("What are the names of planets in our solar system?", "Earth"),
-        ("What are the names of the planets that have rings around them?", "Saturn"),
+        ("Which planet do humans live on?", "Earth"),
+        ("Which planet has rings around it with a name starting with letter S?", "Saturn"),
     ],
 )
 def test_text_chat_completion_non_streaming(llama_stack_client, text_model_id, question, expected):
```
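The test body itself is outside this hunk, but the pattern is that each parametrized question is sent to the model and the expected keyword is asserted in the reply. Below is a minimal self-contained sketch of that shape; the stub client, `check_chat_completion` helper, and canned response are hypothetical stand-ins for the live `llama_stack_client` and server, used only to illustrate the assertion logic:

```python
# Hypothetical sketch: the real test talks to a running Llama Stack
# server; a stub client stands in here so the example is runnable.

class StubResponse:
    def __init__(self, content):
        # Mimics a response object exposing completion_message.content.
        self.completion_message = type("Msg", (), {"content": content})()

class StubClient:
    class inference:
        @staticmethod
        def chat_completion(model_id, messages):
            # Canned short answer; a real server would generate this.
            return StubResponse("Humans live on Earth.")

def check_chat_completion(client, model_id, question, expected):
    response = client.inference.chat_completion(
        model_id=model_id,
        messages=[{"role": "user", "content": question}],
    )
    # Short, pointed questions elicit short answers, which is exactly
    # what makes the reworded parametrized cases run faster.
    assert expected.lower() in response.completion_message.content.lower()

check_chat_completion(
    StubClient,
    "meta-llama/Llama-3.1-8B-Instruct",
    "Which planet do humans live on?",
    "Earth",
)
```

The assertion is case-insensitive substring matching, so the model can answer in a full sentence as long as the expected keyword appears.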