# What does this PR do?
We have several places running tests for different purposes:
- oss llama stack
  - provider tests
  - e2e tests
- provider llama stack
  - unit tests
  - e2e tests
It would be nice if they could *share the same set of test data*, so we
maintain consistency between spec and implementation. That is what this
diff is about: isolating test data from test code, so the same data can
be reused in different places with different test code.
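As a rough sketch of the idea (the file layout, paths, and helper below are illustrative assumptions, not the exact structure introduced by this diff), the shared test data could live in JSON files that every test harness loads:
```
import json
from pathlib import Path

def load_test_case(category: str, name: str) -> dict:
    """Load one named test case from a shared JSON data file."""
    # hypothetical location for the shared data files
    data_file = Path(__file__).parent / "test_cases" / f"{category}.json"
    return json.loads(data_file.read_text())[name]

# Provider tests, unit tests, and client-sdk e2e tests can then consume the
# same inputs and expected outputs instead of hard-coding them per suite.
case = load_test_case("inference", "completion_structured_output")
```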
## Test Plan
== Set up Ollama local server

== Run a provider test

```
conda activate stack
OLLAMA_URL="http://localhost:8321" \
  pytest -v -s -k "ollama" --inference-model="llama3.2:3b-instruct-fp16" \
  llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output
# test_structured_output should also work
```

== Run an e2e test

Build and start the Ollama distribution:

```
conda activate sherpa
with-proxy pip install llama-stack
export INFERENCE_MODEL=llama3.2:3b-instruct-fp16
export LLAMA_STACK_PORT=8322
with-proxy llama stack build --template ollama
with-proxy llama stack run --env OLLAMA_URL=http://localhost:8321 ollama
```

Run the test client:

```
LLAMA_STACK_PORT=8322 LLAMA_STACK_BASE_URL="http://localhost:8322" \
  pytest -v -s --inference-model="llama3.2:3b-instruct-fp16" \
  tests/client-sdk/inference/test_text_inference.py::test_text_completion_structured_output
# test_text_chat_completion_structured_output should also work
```
## Notes
- This PR was automatically generated by oss_sync
- Please refer to D69478008 for more details.
# What does this PR do?
This PR removes the warnings emitted when running tests for the
`remote::vllm` provider:
```
Detected the chat template content format to be 'openai'. You can set `--chat-template-content-format` to override this.
```
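The warning itself points at one way to silence it: pass `--chat-template-content-format` explicitly when starting the vLLM server. This is a sketch only, not necessarily how this PR addresses it, and the model and port below are placeholders taken from the test logs:
```
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --port 5002 \
  --chat-template-content-format openai
```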
## Test Plan
All tests passed without the warning messages shown above.
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
This significantly shortens the test time (about 10x faster), since most
of the time is spent outputting tokens like "there are several planets
in our solar system that have...". We want an answer more quickly,
especially when testing even larger models.
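For illustration, questions with short, unambiguous answers keep the generated output brief. A minimal pytest sketch follows; the fixtures and response shape are assumptions, not necessarily the repo's exact test code:
```
import pytest

# Question/expected pairs with one-word answers keep generation short.
@pytest.mark.parametrize(
    "question,expected",
    [
        ("Which planet do humans live on?", "Earth"),
        ("Which planet has rings around it with a name starting with letter S?", "Saturn"),
    ],
)
def test_text_chat_completion_non_streaming(llama_stack_client, text_model_id, question, expected):
    response = llama_stack_client.inference.chat_completion(
        model_id=text_model_id,
        messages=[{"role": "user", "content": question}],
    )
    assert expected.lower() in response.completion_message.content.lower()
```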
## Test Plan
```
LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -v tests/client-sdk/inference/test_text_inference.py -k "test_text_chat_completion_non_streaming or test_text_chat_completion_streaming"
================================================================== test session starts ===================================================================
platform linux -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /home/yutang/.conda/envs/myenv/bin/python3.10
cachedir: .pytest_cache
rootdir: /home/yutang/repos/llama-stack
configfile: pyproject.toml
plugins: anyio-4.7.0
collected 12 items / 8 deselected / 4 selected
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-Which planet do humans live on?-Earth] PASSED [ 25%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-Which planet has rings around it with a name starting with letter S?-Saturn] PASSED [ 50%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What's the name of the Sun in latin?-Sol] PASSED [ 75%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What is the name of the US captial?-Washington] PASSED [100%]
```
---------
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
This PR splits the inference tests into text and vision to make testing
on the vLLM provider easier, as mentioned in
https://github.com/meta-llama/llama-stack/pull/951. Serving multiple
models (e.g. Llama-3.2-11B-Vision-Instruct and Llama-3.1-8B-Instruct) on
a single port using the OpenAI API is [not supported
yet](https://docs.vllm.ai/en/v0.5.5/serving/faq.html), so it's a bit
tricky to test both at the same time.
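With the suites split, each one can target its own server. A minimal sketch (ports chosen for illustration) of serving the two models from separate vLLM OpenAI-compatible servers:
```
# one server per model, each on its own port
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.1-8B-Instruct --port 5002
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.2-11B-Vision-Instruct --port 5003
```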
## Test Plan
All previously passing tests related to text still pass:
```
LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -v tests/client-sdk/inference/test_text_inference.py
```
All vision tests passed via:
```
LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -v tests/client-sdk/inference/test_vision_inference.py
```
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>