llama-stack/tests/integration/inference
LESSuseLESS 2370e826bc
test: adding an e2e test for measuring TTFT (#1568)
# What does this PR do?

TTFT (time to first token) depends largely on input length. Ideally we
would have a "standard" test that can be used to measure TTFT against
any Llama Stack server.
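
To illustrate why input length matters, here is a minimal sketch of timing the first chunk of a streamed response. It is not the code from this PR: `measure_ttft` and `fake_stream` are hypothetical names, and `fake_stream` is a stand-in for a real streaming client that simply sleeps in proportion to the prompt's word count to mimic prefill cost.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_ttft(stream: Iterable[str]) -> Tuple[float, float, int]:
    """Return (ttft_seconds, total_seconds, chunk_count) for a chunk stream."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in stream:
        if ttft is None:
            # The first chunk has arrived: record the time to first token.
            ttft = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    if ttft is None:
        raise ValueError("stream produced no chunks")
    return ttft, total, count


def fake_stream(prompt: str, per_word_delay: float = 0.002) -> Iterator[str]:
    """Hypothetical stand-in for a streaming server (an assumption, not a real API).

    The initial sleep grows with the prompt's word count to mimic prefill.
    """
    time.sleep(per_word_delay * len(prompt.split()))
    for chunk in ["Hello", ",", " world"]:
        yield chunk


if __name__ == "__main__":
    for n_words in (2, 100):
        prompt = " ".join(["word"] * n_words)
        ttft, total, count = measure_ttft(fake_stream(prompt))
        print(f"{n_words:>4} words: ttft={ttft:.4f}s total={total:.4f}s chunks={count}")
```

Because the generator is lazy, the simulated prefill delay runs inside the timed loop, so the longer prompt reports a larger TTFT. With a real server, the same function could wrap whatever iterator the streaming client returns.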

TODO: Once JSON is replaced with YAML, I will add "notes" to each test
case to explain its purpose in place.

## Test plan

Please refer to the e2e test doc for setup.
```
LLAMA_STACK_PORT=8322 pytest -v -s --stack-config="http://localhost:8322" \
--text-model="meta-llama/Llama-3.2-3B-Instruct" \
tests/integration/inference/test_text_inference.py::test_text_chat_completion_first_token_profiling
```
2025-03-11 14:41:55 -07:00

| File | Last commit | Date |
| --- | --- | --- |
| `__init__.py` | fix: remove ruff N999 (#1388) | 2025-03-07 11:14:04 -08:00 |
| `dog.png` | refactor: tests/unittests -> tests/unit; tests/api -> tests/integration | 2025-03-04 09:57:00 -08:00 |
| `test_embedding.py` | refactor: tests/unittests -> tests/unit; tests/api -> tests/integration | 2025-03-04 09:57:00 -08:00 |
| `test_text_inference.py` | test: adding an e2e test for measuring TTFT (#1568) | 2025-03-11 14:41:55 -07:00 |
| `test_vision_inference.py` | test: image downloading is flaky (#1491) | 2025-03-07 13:39:26 -08:00 |