mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-03 09:53:45 +00:00

Ben Browning ac717f38dc chore: Reduce flakes in test_text_inference on smaller models (#1428)	2025-03-05 13:05:30 -08:00

# What does this PR do?

When running `tests/integration/inference/test_text_inference.py` on smaller models, such as Llama-3.2-3B-Instruct, I sometimes get test flakes where the model passes "San Francisco" as an argument to my tool call instead of "San Francisco, CA" which is what we expect. So, this expands upon that tool calling parameter's description to explicitly state that both city and state are required. With this change, the tool calling tests that are checking for this "San Francisco, CA" value are always passing for me instead of sometimes failing.

## Test Plan

I test this locally via vLLM like:

```
VLLM_URL="http://localhost:8000/v1" \
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
LLAMA_STACK_CONFIG=remote-vllm \
python -m pytest -v \
tests/integration/inference/test_text_inference.py \
--inference-model "meta-llama/Llama-3.2-3B-Instruct" \
--vision-inference-model ""
```

I don't expect this would negatively impact the parameter generated for this tool call by other models, as we're providing additional guidance but not removing any of the existing guidance. However, I cannot easily confirm that myself.

Signed-off-by: Ben Browning <bbrownin@redhat.com>
agents	fix: rag as attachment bug (#1392 )	2025-03-04 13:08:16 -08:00
datasetio	refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401 )	2025-03-04 14:53:47 -08:00
eval	refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401 )	2025-03-04 14:53:47 -08:00
fixtures	chore: refactor create_and_execute_turn and resume_turn (#1399 )	2025-03-04 16:07:30 -08:00
inference	refactor(tests): delete inference, safety and agents tests from providers/tests/ (#1393 )	2025-03-04 10:41:57 -08:00
post_training	refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401 )	2025-03-04 14:53:47 -08:00
safety	refactor: tests/unittests -> tests/unit; tests/api -> tests/integration	2025-03-04 09:57:00 -08:00
scoring	refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401 )	2025-03-04 14:53:47 -08:00
test_cases	chore: Reduce flakes in test_text_inference on smaller models (#1428 )	2025-03-05 13:05:30 -08:00
tool_runtime	refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401 )	2025-03-04 14:53:47 -08:00
vector_io	refactor(test): unify vector_io tests and make them configurable (#1398 )	2025-03-04 13:37:45 -08:00
__init__.py	refactor: tests/unittests -> tests/unit; tests/api -> tests/integration	2025-03-04 09:57:00 -08:00
conftest.py	refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401 )	2025-03-04 14:53:47 -08:00
metadata.py	refactor: tests/unittests -> tests/unit; tests/api -> tests/integration	2025-03-04 09:57:00 -08:00
README.md	refactor: tests/unittests -> tests/unit; tests/api -> tests/integration	2025-03-04 09:57:00 -08:00
report.py	refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401 )	2025-03-04 14:53:47 -08:00

README.md

Llama Stack Integration Tests

You can run the Llama Stack integration tests against either a Llama Stack library or a Llama Stack endpoint.

To test against a Llama Stack library with a specific configuration, run

LLAMA_STACK_CONFIG=./llama_stack/templates/cerebras/run.yaml pytest -s -v tests/integration/inference/

or pass just the template name

LLAMA_STACK_CONFIG=together pytest -s -v tests/integration/inference/

To test against a Llama Stack endpoint, run

LLAMA_STACK_BASE_URL=http://localhost:8089 pytest -s -v tests/integration/inference/
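
The three invocations above differ only in which environment variable is set and what it holds. The following is a minimal sketch of how a test harness might tell them apart. It is not the project's actual conftest.py logic: `resolve_target` and its return labels are hypothetical names, and both treating a `.yaml` suffix as a file path and rejecting the case where both variables are set are assumptions made for illustration.

```python
import os
from typing import Mapping, Tuple


def resolve_target(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Hypothetical helper: decide what the tests should run against.

    Returns a (kind, value) pair where kind is one of
    "library-config-file", "library-template", or "endpoint".
    """
    config = env.get("LLAMA_STACK_CONFIG")
    base_url = env.get("LLAMA_STACK_BASE_URL")

    # Assumption of this sketch: the two modes are mutually exclusive.
    if config and base_url:
        raise ValueError(
            "Set only one of LLAMA_STACK_CONFIG or LLAMA_STACK_BASE_URL"
        )

    if config:
        # Assumption: a value ending in .yaml/.yml is a path to a run
        # config; anything else is treated as a template name.
        if config.endswith((".yaml", ".yml")):
            return "library-config-file", config
        return "library-template", config

    if base_url:
        return "endpoint", base_url

    raise ValueError(
        "Set LLAMA_STACK_CONFIG (library) or LLAMA_STACK_BASE_URL (endpoint)"
    )
```

For example, `resolve_target({"LLAMA_STACK_CONFIG": "together"})` yields a template-name result, while passing `{"LLAMA_STACK_BASE_URL": "http://localhost:8089"}` yields an endpoint result.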

Report Generation

To generate a report, run with the --report option

LLAMA_STACK_CONFIG=together pytest -s -v report.md tests/integration/ --report

Common options

Depending on the API under test, the following custom options are available:

For tests in inference/ and agents/, we support the --inference-model (used in text inference tests) and --vision-inference-model (only used in image inference tests) overrides.
For tests in vector_io/, we support the --embedding-model override.
For tests in safety/, we support the --safety-shield override.
The report option can be given as --report or --report <path>. If no path is provided, we make a best effort to infer one from the config or template name. For URL endpoints, a path is required.
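
To illustrate how an option can be accepted either bare or with a value, here is a small sketch that uses Python's standard `argparse` as a stand-in. It is not the project's conftest.py, which may register these options differently. The option names come from the list above; `build_parser`, `describe_report`, and the defaults are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options listed above."""
    parser = argparse.ArgumentParser(prog="integration-options-sketch")
    parser.add_argument("--inference-model", default=None,
                        help="model used in text inference tests")
    parser.add_argument("--vision-inference-model", default=None,
                        help="model used only in image inference tests")
    parser.add_argument("--embedding-model", default=None,
                        help="embedding model for vector_io tests")
    parser.add_argument("--safety-shield", default=None,
                        help="shield used in safety tests")
    # nargs="?" makes the value optional: a bare --report stores `const`,
    # --report <path> stores the path, and omitting it stores `default`.
    parser.add_argument("--report", nargs="?", const=True, default=None,
                        help="write a report, optionally to the given path")
    return parser


def describe_report(value) -> str:
    """Summarize what the parsed --report value means in this sketch."""
    if value is None:
        return "no report requested"
    if value is True:
        return "report requested; path to be inferred"
    return f"report requested at {value}"


if __name__ == "__main__":
    args = build_parser().parse_args(["--inference-model", "some-model", "--report"])
    print(args.inference_model, "|", describe_report(args.report))
```

With an optional-value flag like this, `argparse` would treat a token that follows `--report` as its path, so placing a bare `--report` last on the command line avoids ambiguity.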