llama-stack

History

LESSuseLESS 2370e826bc test: adding an e2e test for measuring TTFT (#1568)		2025-03-11 14:41:55 -07:00

# What does this PR do?

TTFT number largely depends on input length. Ideally we have a "standard" test that we can use to measure against any llama stack serving.

TODO: Once JSON is replaced with YAML, I will add "notes" for each test to explain purpose of each test in place.

## Test plan

Please refer to e2e test doc for setup.

```
LLAMA_STACK_PORT=8322 pytest -v -s --stack-config="http://localhost:8322" \
  --text-model="meta-llama/Llama-3.2-3B-Instruct" \
  tests/integration/inference/test_text_inference.py::test_text_chat_completion_first_token_profiling
```
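To illustrate the measurement the commit message describes, here is a minimal sketch of timing the first chunk of a streamed response. It is not the code added in `test_text_inference.py`. `measure_ttft` and `fake_stream` are hypothetical names, and `fake_stream` only simulates a server whose delay before the first token grows with prompt length.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def measure_ttft(stream: Iterable[str]) -> Tuple[Optional[float], float, int]:
    """Return (ttft_seconds, total_seconds, chunk_count) for a chunk stream.

    ttft_seconds is None when the stream yields nothing.
    """
    start = time.perf_counter()
    ttft: Optional[float] = None
    count = 0
    for _ in stream:
        if ttft is None:
            # The first chunk has arrived: record time to first token.
            ttft = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return ttft, total, count


def fake_stream(prompt: str, per_char_delay: float = 0.0005) -> Iterator[str]:
    """Hypothetical stand-in for a streaming chat completion.

    Sleeps in proportion to the prompt length before the first chunk,
    mimicking the dependence of TTFT on input length.
    """
    time.sleep(len(prompt) * per_char_delay)
    for token in ("Hello", ",", " world"):
        yield token


if __name__ == "__main__":
    for length in (10, 100, 400):
        ttft, total, count = measure_ttft(fake_stream("x" * length))
        print(f"input_len={length} ttft={ttft:.4f}s total={total:.4f}s chunks={count}")
```

Because `measure_ttft` accepts any iterable of chunks, the same helper could in principle wrap a real streaming response. Comparing runs is only meaningful when the input length is held fixed, which is the motivation for a "standard" test.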
__init__.py	fix: remove ruff N999 (#1388)	2025-03-07 11:14:04 -08:00
dog.png	refactor: tests/unittests -> tests/unit; tests/api -> tests/integration	2025-03-04 09:57:00 -08:00
test_embedding.py	refactor: tests/unittests -> tests/unit; tests/api -> tests/integration	2025-03-04 09:57:00 -08:00
test_text_inference.py	test: adding an e2e test for measuring TTFT (#1568)	2025-03-11 14:41:55 -07:00
test_vision_inference.py	test: image downloading is flaky (#1491)	2025-03-07 13:39:26 -08:00