# What does this PR do?

The TTFT (time-to-first-token) number depends largely on input length. Ideally we have a "standard" test that we can use to measure TTFT against any Llama Stack serving deployment.

TODO: Once JSON is replaced with YAML, I will add "notes" for each test to explain the purpose of each test in place.

## Test plan

Please refer to the e2e test doc for setup.

```
LLAMA_STACK_PORT=8322 pytest -v -s --stack-config="http://localhost:8322" \
  --text-model="meta-llama/Llama-3.2-3B-Instruct" \
  tests/integration/inference/test_text_inference.py::test_text_chat_completion_first_token_profiling
```
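At its core, a TTFT measurement times the gap between issuing a streaming request and receiving the first response chunk. The sketch below illustrates that measurement in isolation; `measure_ttft` and `fake_stream` are hypothetical helpers (a simulated stream stands in for an actual inference client), not the API used by the test above.

```python
import time


def measure_ttft(stream):
    """Return (seconds until first chunk, the first chunk itself).

    `stream` is any iterator yielding response chunks; consuming the
    first item marks the time-to-first-token. Hypothetical helper for
    illustration only.
    """
    start = time.perf_counter()
    first_chunk = next(iter(stream))
    return time.perf_counter() - start, first_chunk


def fake_stream(delay_s=0.05):
    """Simulated server stream: first token arrives after ~delay_s."""
    time.sleep(delay_s)
    yield "Hello"
    yield ", world"


ttft, first = measure_ttft(fake_stream())
print(f"TTFT: {ttft * 1000:.1f} ms, first chunk: {first!r}")
```

In a real run, longer prompts inflate TTFT because the server must prefill the whole input before emitting the first token, which is why the test pins a fixed "standard" input.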