Commit graph

108 commits

Author SHA1 Message Date
Ben Browning
ac717f38dc
chore: Reduce flakes in test_text_inference on smaller models (#1428)
# What does this PR do?

When running `tests/integration/inference/test_text_inference.py` on
smaller models, such as Llama-3.2-3B-Instruct, I sometimes get test
flakes where the model passes "San Francisco" as an argument to my tool
call instead of "San Francisco, CA" which is what we expect.

So, this expands upon that tool calling parameter's description to
explicitly state that both city and state are required. With this
change, the tool calling tests that are checking for this "San
Francisco, CA" value are always passing for me instead of sometimes
failing.

## Test Plan

I test this locally via vLLM like:

```
VLLM_URL="http://localhost:8000/v1" \
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
LLAMA_STACK_CONFIG=remote-vllm \
python -m pytest -v \
tests/integration/inference/test_text_inference.py \
--inference-model "meta-llama/Llama-3.2-3B-Instruct" \
--vision-inference-model ""
```

I don't expect this would negatively impact the parameter generated for
this tool call by other models, as we're providing additional guidance
but not removing any of the existing guidance. However, I cannot easily
confirm that myself.

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-03-05 13:05:30 -08:00
Xi Yan
78962be996
chore: refactor create_and_execute_turn and resume_turn (#1399)
# What does this PR do?
- Closes https://github.com/meta-llama/llama-stack/issues/1212

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
```
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/integration/agents/test_agents.py --inference-model "meta-llama/Llama-3.3-70B-Instruct"
```
<img width="1203" alt="image"
src="https://github.com/user-attachments/assets/35b60017-b3f2-4e98-87f2-2868730261bd"
/>

```
LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/agents/test_agents.py::test_rag_and_code_agent --inference-model "meta-llama/Llama-3.3-70B-Instruct"
```

[//]: # (## Documentation)
2025-03-04 16:07:30 -08:00
Ashwin Bharambe
abfbaf3c1b
refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401)
All of the tests from `llama_stack/providers/tests/` are now moved to
`tests/integration`.

I converted the `tools`, `scoring` and `datasetio` tests to use API.
However, `eval` and `post_training` proved to be a bit challenging to
leaving those. I think `post_training` should be relatively
straightforward also.

As part of this, I noticed that `wolfram_alpha` tool wasn't added to
some of our commonly used distros so I added it. I am going to remove a
lot of code duplication from distros next so while this looks like a
one-off right now, it will go away and be there uniformly for all
distros.
2025-03-04 14:53:47 -08:00
Ashwin Bharambe
dd0db8038b
refactor(test): unify vector_io tests and make them configurable (#1398)
## Test Plan


`LLAMA_STACK_CONFIG=inference=sentence-transformers,vector_io=sqlite-vec
pytest -s -v test_vector_io.py --embedding-model all-miniLM-L6-V2
--inference-model='' --vision-inference-model=''`

```
test_vector_io.py::test_vector_db_retrieve[txt=:vis=:emb=all-miniLM-L6-V2] PASSED
test_vector_io.py::test_vector_db_register[txt=:vis=:emb=all-miniLM-L6-V2] PASSED
test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case0] PASSED
test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case1] PASSED
test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case2] PASSED
test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case3] PASSED
test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case4] PASSED
```

Same thing with:
- LLAMA_STACK_CONFIG=inference=sentence-transformers,vector_io=faiss
- LLAMA_STACK_CONFIG=fireworks

(Note that ergonomics will soon be improved re: cmd-line options and env
variables)
2025-03-04 13:37:45 -08:00
ehhuang
fd8c991393
fix: rag as attachment bug (#1392)
Summary:

Test Plan:
added new test
LLAMA_STACK_CONFIG=fireworks pytest -s -v
tests/api/agents/test_agents.py --safety-shield
meta-llama/Llama-Guard-3-8B
2025-03-04 13:08:16 -08:00
Xi Yan
158b6dc404
chore: deprecate allow_turn_resume (#1377)
# What does this PR do?

- Deprecate allow_turn_resume flag as this is used for staying backward
compat.
- Closes https://github.com/meta-llama/llama-stack/issues/1363

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
```
LLAMA_STACK_CONFIG=fireworks pytest -v tests/api/agents/test_agents.py --inference-model "meta-llama/Llama-3.3-70B-Instruct" --record-responses
```

<img width="1054" alt="image"
src="https://github.com/user-attachments/assets/d31de2d4-0953-41e1-a71a-7e1579fa351a"
/>


[//]: # (## Documentation)
2025-03-04 12:22:11 -08:00
Ashwin Bharambe
cad5eed4b5
refactor(tests): delete inference, safety and agents tests from providers/tests/ (#1393)
Continues the refactor of tests. 

Tests from `providers/tests` should be considered deprecated. For this
PR, I deleted most of the tests in
 - inference
 - safety
 - agents
since much more comprehensive tests exist in
`tests/integration/{inference,safety,agents}` already.

I moved `test_persistence.py` from agents, but disabled all the tests
since that test needs to be properly migrated.

## Test Plan

```
LLAMA_STACK_CONFIG=fireworks pytest -s -v agents --vision-inference-model=''
/Users/ashwin/homebrew/Caskroom/miniconda/base/envs/toolchain/lib/python3.10/site-packages/pytest_asyncio/plugin.py:208: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
======================================================================================================= test session starts ========================================================================================================
platform darwin -- Python 3.10.16, pytest-8.3.3, pluggy-1.5.0 -- /Users/ashwin/homebrew/Caskroom/miniconda/base/envs/toolchain/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.10.16', 'Platform': 'macOS-15.3.1-arm64-arm-64bit', 'Packages': {'pytest': '8.3.3', 'pluggy': '1.5.0'}, 'Plugins': {'asyncio': '0.24.0', 'html': '4.1.1', 'metadata': '3.1.1', 'anyio': '4.8.0', 'nbval': '0.11.0'}}
rootdir: /Users/ashwin/local/llama-stack
configfile: pyproject.toml
plugins: asyncio-0.24.0, html-4.1.1, metadata-3.1.1, anyio-4.8.0, nbval-0.11.0
asyncio: mode=strict, default_loop_scope=None
collected 15 items

agents/test_agents.py::test_agent_simple[txt=8B] PASSED
agents/test_agents.py::test_tool_config[txt=8B] PASSED
agents/test_agents.py::test_builtin_tool_web_search[txt=8B] PASSED
agents/test_agents.py::test_builtin_tool_code_execution[txt=8B] PASSED
agents/test_agents.py::test_code_interpreter_for_attachments[txt=8B] PASSED
agents/test_agents.py::test_custom_tool[txt=8B] PASSED
agents/test_agents.py::test_custom_tool_infinite_loop[txt=8B] PASSED
agents/test_agents.py::test_tool_choice[txt=8B] PASSED
agents/test_agents.py::test_rag_agent[txt=8B-builtin::rag/knowledge_search] PASSED
agents/test_agents.py::test_rag_agent[txt=8B-builtin::rag] PASSED
agents/test_agents.py::test_rag_agent_with_attachments[txt=8B] PASSED
agents/test_agents.py::test_rag_and_code_agent[txt=8B] PASSED
agents/test_agents.py::test_create_turn_response[txt=8B] PASSED
agents/test_persistence.py::test_delete_agents_and_sessions SKIPPED (This test needs to be migrated to api / client-sdk world)
agents/test_persistence.py::test_get_agent_turns_and_steps SKIPPED (This test needs to be migrated to api / client-sdk world)

```
2025-03-04 10:41:57 -08:00
Ashwin Bharambe
4ca58eb987 refactor: tests/unittests -> tests/unit; tests/api -> tests/integration 2025-03-04 09:57:00 -08:00