llama-stack/tests/integration/test_cases
LESSuseLESS 2370e826bc
test: adding an e2e test for measuring TTFT (#1568)
# What does this PR do?

TTFT (time to first token) depends largely on input length. Ideally we
would have a "standard" test that can be used to measure TTFT against any
Llama Stack server.
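
As a rough illustration of what such a measurement involves, here is a minimal sketch that times the gap between starting to consume a stream and receiving its first chunk. It is not the code added by this PR: `measure_ttft` and `fake_stream` are hypothetical names, and `fake_stream` is a stand-in that simulates a prefill delay proportional to prompt length rather than calling a real server.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_ttft(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, int]:
    """Return (seconds until the first chunk, total number of chunks)."""
    start = clock()
    ttft = None
    count = 0
    for _ in stream:
        if ttft is None:
            # The first chunk has arrived: record the elapsed time once.
            ttft = clock() - start
        count += 1
    if ttft is None:
        raise ValueError("stream produced no chunks")
    return ttft, count


def fake_stream(prompt: str, per_char_delay: float = 0.0005) -> Iterator[str]:
    """Hypothetical stand-in for a streaming chat completion.

    Assumption for illustration only: the delay before the first chunk
    grows linearly with the prompt length.
    """
    time.sleep(len(prompt) * per_char_delay)
    for token in ["Hello", ",", " world"]:
        yield token


if __name__ == "__main__":
    for length in (10, 200):
        ttft, chunks = measure_ttft(fake_stream("a" * length))
        print(f"prompt_len={length} ttft={ttft * 1000:.1f} ms chunks={chunks}")
```

Because the generator body only starts running on the first iteration, the simulated prefill delay falls inside the timed window. Against a real server, the stream would be the iterator returned by a streaming chat completion call, and the same prompt lengths should be reused across runs so the numbers stay comparable.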

TODO: Once JSON is replaced with YAML, I will add "notes" to each test
case to explain its purpose in place.

## Test plan

Please refer to the e2e test doc for setup.
```
LLAMA_STACK_PORT=8322 pytest -v -s --stack-config="http://localhost:8322" \
--text-model="meta-llama/Llama-3.2-3B-Instruct" \
tests/integration/inference/test_text_inference.py::test_text_chat_completion_first_token_profiling
```
2025-03-11 14:41:55 -07:00
inference test: adding an e2e test for measuring TTFT (#1568) 2025-03-11 14:41:55 -07:00
__init__.py refactor(tests): delete inference, safety and agents tests from providers/tests/ (#1393) 2025-03-04 10:41:57 -08:00
test_case.py refactor(tests): delete inference, safety and agents tests from providers/tests/ (#1393) 2025-03-04 10:41:57 -08:00