llama-stack-mirror/tests/client-sdk
Yuan Tang 413099ef6a
test: Make text-based chat completion tests run 10x faster (#1016)
# What does this PR do?

This significantly shortens the test time (about 10x faster) since most
of the time is spent on outputing the tokens "there are several planets
in our solar system that have...". We want to have an answer quicker,
especially when testing even larger models.

## Test Plan

```
LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -v tests/client-sdk/inference/test_text_inference.py -k "test_text_chat_completion_non_streaming or test_text_chat_completion_streaming"
================================================================== test session starts ===================================================================
platform linux -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /home/yutang/.conda/envs/myenv/bin/python3.10
cachedir: .pytest_cache
rootdir: /home/yutang/repos/llama-stack
configfile: pyproject.toml
plugins: anyio-4.7.0
collected 12 items / 8 deselected / 4 selected                                                                                                           

tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-Which planet do humans live on?-Earth] PASSED [ 25%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-Which planet has rings around it with a name starting with letter S?-Saturn] PASSED [ 50%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What's the name of the Sun in latin?-Sol] PASSED [ 75%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What is the name of the US captial?-Washington] PASSED [100%]


```

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-08 11:49:46 -08:00
..
agents test: remove flaky agent test (#1006) 2025-02-07 09:35:38 -08:00
inference test: Make text-based chat completion tests run 10x faster (#1016) 2025-02-08 11:49:46 -08:00
safety Fix precommit check after moving to ruff (#927) 2025-02-02 06:46:45 -08:00
tool_runtime Fix precommit check after moving to ruff (#927) 2025-02-02 06:46:45 -08:00
vector_io Fix precommit check after moving to ruff (#927) 2025-02-02 06:46:45 -08:00
__init__.py [tests] add client-sdk pytests & delete client.py (#638) 2024-12-16 12:04:56 -08:00
conftest.py Update client-sdk test config option handling 2025-01-31 15:30:07 -08:00
metadata.py Report generation minor fixes (#884) 2025-01-28 04:58:12 -08:00
README.md test: Split inference tests to text and vision (#1008) 2025-02-07 09:35:49 -08:00
report.py Fix precommit check after moving to ruff (#927) 2025-02-02 06:46:45 -08:00

Llama Stack Integration Tests

You can run llama stack integration tests on either a Llama Stack Library or a Llama Stack endpoint.

To test on a Llama Stack library with certain configuration, run

LLAMA_STACK_CONFIG=./llama_stack/templates/cerebras/run.yaml
pytest -s -v tests/client-sdk/inference/

or just the template name

LLAMA_STACK_CONFIG=together
pytest -s -v tests/client-sdk/inference/

To test on a Llama Stack endpoint, run

LLAMA_STACK_BASE_URL=http//localhost:8089
pytest -s -v tests/client-sdk/inference

Report Generation

To generate a report, run with --report option

LLAMA_STACK_CONFIG=together pytest -s -v report.md tests/client-sdk/ --report

Common options

Depending on the API, there are custom options enabled

  • For tests in inference/ and agents/, we support --inference-model(to be used in text inference tests) and--vision-inference-model` (only used in image inference tests) overrides
  • For tests in vector_io/, we support --embedding-model override
  • For tests in safety/, we support --safety-shield override
  • The param can be --report or --report <path> If path is not provided, we do a best effort to infer based on the config / template name. For url endpoints, path is required.