llama-stack/tests
LESSuseLESS 2370e826bc
test: adding an e2e test for measuring TTFT (#1568)
# What does this PR do?

TTFT number largely depends on input length. Ideally we have a
"standard" test that we can use to measure against any llama stack
serving.

TODO: Once JSON is replaced with YAML, I will add "notes" for each test
to explain purpose of each test in place.

## Test plan

Please refer to e2e test doc for setup.
```
LLAMA_STACK_PORT=8322 pytest -v -s --stack-config="http://localhost:8322" \
--text-model="meta-llama/Llama-3.2-3B-Instruct" \
tests/integration/inference/test_text_inference.py::test_text_chat_completion_first_token_profiling
```
2025-03-11 14:41:55 -07:00
..
integration test: adding an e2e test for measuring TTFT (#1568) 2025-03-11 14:41:55 -07:00
unit fix: Disable async loop warning messages during test run (#1526) 2025-03-10 15:29:08 -07:00
__init__.py refactor(test): introduce --stack-config and simplify options (#1404) 2025-03-05 17:02:02 -08:00