Commit graph

348 commits

Xi Yan
158b6dc404
chore: deprecate allow_turn_resume (#1377)
# What does this PR do?

- Deprecate the `allow_turn_resume` flag, which was only kept for backward
compatibility.
- Closes https://github.com/meta-llama/llama-stack/issues/1363

## Test Plan
```
LLAMA_STACK_CONFIG=fireworks pytest -v tests/api/agents/test_agents.py --inference-model "meta-llama/Llama-3.3-70B-Instruct" --record-responses
```

<img width="1054" alt="image"
src="https://github.com/user-attachments/assets/d31de2d4-0953-41e1-a71a-7e1579fa351a"
/>


2025-03-04 12:22:11 -08:00
Ashwin Bharambe
cad5eed4b5
refactor(tests): delete inference, safety and agents tests from providers/tests/ (#1393)
Continues the refactor of tests. 

Tests from `providers/tests` should be considered deprecated. For this
PR, I deleted most of the tests in
 - inference
 - safety
 - agents
since much more comprehensive tests exist in
`tests/integration/{inference,safety,agents}` already.

I moved `test_persistence.py` from agents, but disabled all of its tests
since they need to be properly migrated.

## Test Plan

```
LLAMA_STACK_CONFIG=fireworks pytest -s -v agents --vision-inference-model=''
/Users/ashwin/homebrew/Caskroom/miniconda/base/envs/toolchain/lib/python3.10/site-packages/pytest_asyncio/plugin.py:208: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
======================================================================================================= test session starts ========================================================================================================
platform darwin -- Python 3.10.16, pytest-8.3.3, pluggy-1.5.0 -- /Users/ashwin/homebrew/Caskroom/miniconda/base/envs/toolchain/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.10.16', 'Platform': 'macOS-15.3.1-arm64-arm-64bit', 'Packages': {'pytest': '8.3.3', 'pluggy': '1.5.0'}, 'Plugins': {'asyncio': '0.24.0', 'html': '4.1.1', 'metadata': '3.1.1', 'anyio': '4.8.0', 'nbval': '0.11.0'}}
rootdir: /Users/ashwin/local/llama-stack
configfile: pyproject.toml
plugins: asyncio-0.24.0, html-4.1.1, metadata-3.1.1, anyio-4.8.0, nbval-0.11.0
asyncio: mode=strict, default_loop_scope=None
collected 15 items

agents/test_agents.py::test_agent_simple[txt=8B] PASSED
agents/test_agents.py::test_tool_config[txt=8B] PASSED
agents/test_agents.py::test_builtin_tool_web_search[txt=8B] PASSED
agents/test_agents.py::test_builtin_tool_code_execution[txt=8B] PASSED
agents/test_agents.py::test_code_interpreter_for_attachments[txt=8B] PASSED
agents/test_agents.py::test_custom_tool[txt=8B] PASSED
agents/test_agents.py::test_custom_tool_infinite_loop[txt=8B] PASSED
agents/test_agents.py::test_tool_choice[txt=8B] PASSED
agents/test_agents.py::test_rag_agent[txt=8B-builtin::rag/knowledge_search] PASSED
agents/test_agents.py::test_rag_agent[txt=8B-builtin::rag] PASSED
agents/test_agents.py::test_rag_agent_with_attachments[txt=8B] PASSED
agents/test_agents.py::test_rag_and_code_agent[txt=8B] PASSED
agents/test_agents.py::test_create_turn_response[txt=8B] PASSED
agents/test_persistence.py::test_delete_agents_and_sessions SKIPPED (This test needs to be migrated to api / client-sdk world)
agents/test_persistence.py::test_get_agent_turns_and_steps SKIPPED (This test needs to be migrated to api / client-sdk world)

```
2025-03-04 10:41:57 -08:00
Ashwin Bharambe
4ca58eb987 refactor: tests/unittests -> tests/unit; tests/api -> tests/integration 2025-03-04 09:57:00 -08:00
Ashwin Bharambe
c6b13b6a24 fix: pre-commit 2025-03-04 09:49:40 -08:00
Ashwin Bharambe
1c63ec981a feat(test): allow specifying simple ad-hoc distributions in LLAMA_STACK_CONFIG 2025-03-04 09:47:11 -08:00
Sébastien Han
468edfd92c
fix: fix end of files for pre-commit (#1387)
# What does this PR do?

Fix the end-of-file hook for pre-commit.

## Test Plan

Run pre-commit without any errors:

```
uv run pre-commit run --all-files
```

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-03-04 07:05:02 -08:00
ehhuang
07a992ef90
feat: deterministic tools ordering (#1380)
Summary:

1. The `tools` parameter we construct to pass to the inference API is
non-deterministic. As a result, our recordable mocks are flaky because the
ordering sometimes changes. This PR makes the `tools` ordering deterministic
and aligned with the order the user specified (see the sketch below).
2. In recordable mock key generation, the client tool's parameter type was
'str' and is now 'string' for some reason. I didn't dig into exactly why; I
just regenerated the fixtures.
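
A rough illustration of the ordering guarantee (an illustrative helper, not the actual agent code):

```python
def build_tools_param(user_tool_names, tool_defs_by_name):
    """Emit tool definitions in exactly the order the user listed them,
    rather than iterating over an unordered collection of registered tools."""
    return [tool_defs_by_name[name] for name in user_tool_names if name in tool_defs_by_name]
```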

Test Plan:
Regenerate mocks:
```
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B --record-responses
```

Rerun tests without `--record-responses`:
```
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B
```
2025-03-03 20:38:07 -08:00
Ashwin Bharambe
86fc514abb
refactor: move more tests, delete some providers tests (#1382)
Move unittests to tests/unittests. Gradually nuking tests from
providers/tests/ and unifying them into tests/api (which are e2e tests
using SDK types)

## Test Plan

`pytest -s -v tests/unittests/`
2025-03-03 20:28:34 -08:00
Ashwin Bharambe
e5ec68f66e fix: fix bugs in relative imports exposed due to dir move 2025-03-03 19:42:45 -08:00
Ashwin Bharambe
55668d3c5b refactor: move a few tests to top-level tests/ directory 2025-03-03 17:33:39 -08:00
Ashwin Bharambe
5736c7d682
refactor: move tests/client-sdk to tests/api (#1376)
This PR moves the client-sdk tests to the api directory to better
reflect their purpose and improve code organization.
2025-03-03 17:28:12 -08:00
Ashwin Bharambe
c3155cb1bc fix: add a bunch more keys to be passed as provider data for client-sdk tests 2025-03-03 17:05:26 -08:00
ehhuang
386c806c70
test: introduce recordable mocks for Agent tests (#1268)
Summary:

Agent tests shouldn't need to run inference and tool calls repeatedly.
This PR introduces a way to record inference/tool calls and reuse them
in subsequent test runs, which makes the tests more reliable and saves
costs.
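
As a rough illustration of the record/replay idea (not the actual fixture code in this PR; the file path, key scheme, and flag handling here are assumptions):

```python
import json
from pathlib import Path

RECORDINGS = Path("recorded_responses.json")  # hypothetical storage location


def recordable(key: str, call_fn, record: bool = False):
    """Replay the stored response for `key`, or make the real call and store it."""
    store = json.loads(RECORDINGS.read_text()) if RECORDINGS.exists() else {}
    if key in store:
        return store[key]
    if not record:
        raise KeyError(f"no recorded response for {key!r}; rerun with --record-responses")
    store[key] = call_fn()  # real inference / tool call happens only while recording
    RECORDINGS.write_text(json.dumps(store, indent=2))
    return store[key]
```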

Test Plan:
Run when no recorded calls have been created yet (fails):
```
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B
```

Run with `--record-responses` to record calls:
```
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B --record-responses
```

Run without `--record-responses` again (succeeds):
```
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B
```
2025-03-03 14:48:32 -08:00
Xi Yan
7d111c7510
feat: unify max_infer_iters in client/server agent loop (#1309)
# What does this PR do?

We currently use `max_infer_iters` in two different ways:
1/ Server: track the number of inference iterations
2/ Client side: track the number of times we send a `resume_turn` request

This PR gets rid of the need for (2) and makes the server track the total
number of times we perform inference within a Turn (see the sketch below).

**NOTE**
The PR assumes StopReason is set to
- end_of_message: the turn is not finished; we could be waiting for client
tool call responses
- end_of_turn: the entire turn is finished and there is nothing more to be
done.
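
A minimal sketch of the server-side accounting this implies (names and fields are illustrative, not the actual agent implementation):

```python
def run_turn(turn, max_infer_iters, run_inference, run_server_tools):
    """Count every inference call within a single turn, regardless of tool round trips."""
    n_infer = 0
    while n_infer < max_infer_iters:
        completion = run_inference(turn)
        n_infer += 1
        if not completion.tool_calls:
            return "end_of_turn"       # nothing left to do
        if any(tc.is_client_tool for tc in completion.tool_calls):
            return "end_of_message"    # waiting for client tool call responses
        run_server_tools(turn, completion.tool_calls)
    return "end_of_turn"               # iteration budget exhausted
```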

## Test Plan
```
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/agents/test_agents.py::test_custom_tool_infinite_loop --inference-model "meta-llama/Llama-3.3-70B-Instruct"
```

2025-03-03 10:08:36 -08:00
Ashwin Bharambe
754feba61f
feat: add a configurable category-based logger (#1352)
A self-respecting server needs good observability which starts with
configurable logging. Llama Stack had little until now. This PR adds a
`logcat` facility towards that. Callsites look like:

```python
logcat.debug("inference", f"params to ollama: {params}")
```

- The first parameter is a category. There is a static list of
categories in `llama_stack/logcat.py`.
- Each category can be associated with a log level, which can be
configured via the `LLAMA_STACK_LOGGING` env var.
- A value like `LLAMA_STACK_LOGGING="inference=debug;server=info"` does the
obvious thing. There is a special key called `all` which is an alias for
all categories (a parsing sketch follows below).
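
A sketch of how such an env var might be parsed into per-category levels (illustrative only; the real category list and parsing live in `llama_stack/logcat.py` and may differ):

```python
import logging
import os

CATEGORIES = ["core", "server", "router", "inference", "agents"]  # illustrative subset


def parse_logging_env(value: str) -> dict:
    """Turn 'inference=debug;server=info' into {category: logging level}."""
    levels = {}
    for part in filter(None, value.split(";")):
        category, _, level = part.partition("=")
        targets = CATEGORIES if category == "all" else [category]
        for target in targets:
            if target in CATEGORIES:
                levels[target] = getattr(logging, level.upper(), logging.INFO)
    return levels


print(parse_logging_env(os.environ.get("LLAMA_STACK_LOGGING", "all=info")))
```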

## Test Plan

Ran with `LLAMA_STACK_LOGGING="all=debug" llama stack run fireworks` and
saw the following:


![image](https://github.com/user-attachments/assets/d24b95ab-3941-426c-9ea0-a4c62542e6f0)

Hit it with a client-sdk test case and saw this:


![image](https://github.com/user-attachments/assets/3fee8c6c-986e-4125-a09c-f5dc019682e2)
2025-03-02 18:51:14 -08:00
Ashwin Bharambe
6609d4ada4
feat: allow conditionally enabling providers in run.yaml (#1321)
# What does this PR do?

We want to bundle a bunch of (typically remote) providers in a distro
template and be able to configure them "on the fly" via environment
variables. So far, we have been able to do this with simple env var
replacements. However, sometimes you want to enable providers only
conditionally (because the relevant remote services may not be alive or
relevant). This was not possible until now.

To aid this, we add a simple (bash-like) env var replacement
enhancement: `${env.FOO+bar}` evaluates to `bar` if the variable is SET
and evaluates to empty string if it is not. On top of that, we update
our main resolver to ignore any provider whose ID is null.

This allows using the distro like this:

```bash
llama stack run dev --env CHROMADB_URL=http://localhost:6001 --env ENABLE_CHROMADB=1
```

when only Chroma is UP. This disables the other `pgvector` provider in
the run configuration.
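
The replacement semantics, roughly (a standalone sketch, not the stack's actual resolver code):

```python
import os
import re

_CONDITIONAL = re.compile(r"\$\{env\.(\w+)\+([^}]*)\}")


def replace_conditional_env(text: str) -> str:
    """${env.FOO+bar} -> 'bar' if FOO is set in the environment, '' otherwise."""
    return _CONDITIONAL.sub(
        lambda m: m.group(2) if m.group(1) in os.environ else "", text
    )


# With ENABLE_CHROMADB=1 exported this prints "chromadb"; unset, it prints ""
# and the resolver then ignores the provider whose ID resolved to null/empty.
print(replace_conditional_env("${env.ENABLE_CHROMADB+chromadb}"))
```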


## Test Plan

Hard code `chromadb` as the vector io provider inside
`test_vector_io.py` and run:

```bash
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -s -v tests/client-sdk/vector_io/ --embedding-model all-MiniLM-L6-v2
```
2025-03-01 11:19:14 -08:00
ehhuang
21ec67356c
fix: RAG with documents (#1337)
Summary:
This was broken by
https://github.com/meta-llama/llama-stack/pull/1015/files#r1975394190

Test Plan:

added e2e test
2025-02-28 16:51:00 -08:00
ehhuang
ba3bedc7e9
test: remove old test (#1334)
Summary:

This test is no longer relevant. We updated the default system prompt in
https://github.com/meta-llama/llama-stack/pull/1310, and system override
behavior is already unit-tested in test_prompt_adapter.py

Test Plan:
read
2025-02-28 16:42:13 -08:00
Xi Yan
82fa0803fa
chore: refactor client tool in test (#1331)
# What does this PR do?

Use the `@client_tool` decorator instead of `ClientTool`.
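
For example, a client tool might now be declared like this (a sketch; the exact import path and docstring conventions are assumptions about the llama-stack-client library):

```python
from llama_stack_client.lib.agents.client_tool import client_tool


@client_tool
def get_boiling_point(liquid_name: str, celcius: bool = True) -> int:
    """Returns the boiling point of a liquid in Celcius or Fahrenheit.

    :param liquid_name: the name of the liquid
    :param celcius: whether to return the boiling point in Celcius
    :return: the boiling point, or -1 if the liquid is unknown
    """
    if liquid_name.lower() == "polyjuice":
        return -100 if celcius else -148
    return -1
```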

## Test Plan
```
LLAMA_STACK_CONFIG=fireworks pytest -v tests/client-sdk/agents/test_agents.py --inference-model "meta-llama/Llama-3.3-70B-Instruct"
```

<img width="1053" alt="image"
src="https://github.com/user-attachments/assets/d3ade884-ef42-494e-8028-3b09d9ef1978"
/>


2025-02-28 12:29:50 -08:00
Matthew Farrellee
83dc8fbdff
test: cleanup embedding model test suite (#1322)
# What does this PR do?

 - skip media tests for models that do not support media
 - skip output_dimension tests for models that do not support it
 - skip task_type tests for models that do not support it
 - provide task_type for models that require it

## Test Plan

```
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/inference/test_embedding.py --embedding-model ...
```
2025-02-28 10:02:36 -08:00
Sébastien Han
6fa257b475
chore(lint): update Ruff ignores for project conventions and maintainability (#1184)
- Added new ignores from flake8-bugbear (`B007`, `B008`)
- Ignored `C901` (high function complexity) for now, pending review
- Maintained PyTorch conventions (`N812`, `N817`)
- Allowed `E731` (lambda assignments) for flexibility
- Consolidated existing ignores (`E402`, `E501`, `F405`, `C408`, `N812`)
- Documented rationale for each ignored rule

This keeps our linting aligned with project needs while tracking
potential fixes.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-28 09:36:49 -08:00
Ashwin Bharambe
ece354eedd test: dont hardcode faiss as provider in the tests please 2025-02-27 22:54:34 -08:00
Yuan Tang
6824d23dc9
test: Only run embedding tests for remote::nvidia (#1317)
This fixes release build failure
3796497240:

```
=================================== FAILURES ===================================
______ test_embedding_truncation_error[txt=8B:emb=MiniLM-long-text-None] _______
llama-stack/tests/client-sdk/inference/test_embedding.py:166: in test_embedding_truncation_error
    with pytest.raises(BadRequestError) as excinfo:
E   Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
______ test_embedding_truncation_error[txt=8B:emb=MiniLM-long-text-none] _______
llama-stack/tests/client-sdk/inference/test_embedding.py:166: in test_embedding_truncation_error
    with pytest.raises(BadRequestError) as excinfo:
E   Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
_______ test_embedding_truncation_error[txt=8B:emb=MiniLM-long-str-None] _______
llama-stack/tests/client-sdk/inference/test_embedding.py:166: in test_embedding_truncation_error
    with pytest.raises(BadRequestError) as excinfo:
E   Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
_______ test_embedding_truncation_error[txt=8B:emb=MiniLM-long-str-none] _______
llama-stack/tests/client-sdk/inference/test_embedding.py:166: in test_embedding_truncation_error
    with pytest.raises(BadRequestError) as excinfo:
E   Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
_________ test_embedding_text_truncation_error[txt=8B:emb=MiniLM-NONE] _________
llama-stack/tests/client-sdk/inference/test_embedding.py:223: in test_embedding_text_truncation_error
    with pytest.raises(BadRequestError) as excinfo:
E   Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
_________ test_embedding_text_truncation_error[txt=8B:emb=MiniLM-END] __________
llama-stack/tests/client-sdk/inference/test_embedding.py:223: in test_embedding_text_truncation_error
    with pytest.raises(BadRequestError) as excinfo:
E   Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
________ test_embedding_text_truncation_error[txt=8B:emb=MiniLM-START] _________
llama-stack/tests/client-sdk/inference/test_embedding.py:223: in test_embedding_text_truncation_error
    with pytest.raises(BadRequestError) as excinfo:
E   Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
_________ test_embedding_text_truncation_error[txt=8B:emb=MiniLM-left] _________
llama-stack/tests/client-sdk/inference/test_embedding.py:223: in test_embedding_text_truncation_error
    with pytest.raises(BadRequestError) as excinfo:
E   Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
________ test_embedding_text_truncation_error[txt=8B:emb=MiniLM-right] _________
llama-stack/tests/client-sdk/inference/test_embedding.py:223: in test_embedding_text_truncation_error
    with pytest.raises(BadRequestError) as excinfo:
E   Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
=========================== short test summary info ============================
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_truncation_error[txt=8B:emb=MiniLM-long-text-None] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_truncation_error[txt=8B:emb=MiniLM-long-text-none] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_truncation_error[txt=8B:emb=MiniLM-long-str-None] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_truncation_error[txt=8B:emb=MiniLM-long-str-none] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_text_truncation_error[txt=8B:emb=MiniLM-NONE] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_text_truncation_error[txt=8B:emb=MiniLM-END] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_text_truncation_error[txt=8B:emb=MiniLM-START] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_text_truncation_error[txt=8B:emb=MiniLM-left] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_text_truncation_error[txt=8B:emb=MiniLM-right] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
= 9 failed, 48 passed, 2 skipped, 3 deselected, 3 xfailed, 1 xpassed, 121 warnings in 90.16s (0:01:30) =
Error: Process completed with exit code 1.
```

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-27 22:35:52 -05:00
Hardik Shah
2f7683bc5f
fix: Structured outputs for recursive models (#1311)
Handle the recursive nature of structured response_formats.

Update the test to include one nested model.

```
 LLAMA_STACK_CONFIG=dev pytest -s -v tests/client-sdk/inference/test_text_inference.py --inference-model "openai/gpt-4o-mini" -k test_text_chat_completion_structured_output
```

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-02-27 17:31:53 -08:00
Matthew Farrellee
e28cedd833
feat: add nvidia embedding implementation for new signature, task_type, output_dimention, text_truncation (#1213)
# What does this PR do?

updates the nvidia inference provider's embedding implementation to use the
new signature

adds support for the task_type, output_dimensions, and text_truncation
parameters (a client-side example follows below)
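
A hedged example of exercising the new parameters through the client SDK (parameter names follow the PR description and may differ slightly in the final API):

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

response = client.inference.embeddings(
    model_id="baai/bge-m3",
    contents=["a sentence to embed"],
    task_type="document",      # query vs. document, for asymmetric retriever models
    output_dimension=512,      # Matryoshka-style dimension reduction
    text_truncation="end",     # what to do when input exceeds the context window
)
print(len(response.embeddings[0]))
```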

## Test Plan

```
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/inference/test_embedding.py --embedding-model baai/bge-m3
```
2025-02-27 16:58:11 -08:00
Ashwin Bharambe
928a39d17b
feat(providers): Groq now uses LiteLLM openai-compat (#1303)
Groq has never supported raw completions anyhow, so this makes it easier
to switch it to LiteLLM. Our entire test suite passes.

I also updated all the openai-compat providers so they work with API keys
passed from headers via `provider_data`.

## Test Plan

```bash
LLAMA_STACK_CONFIG=groq \
   pytest -s -v tests/client-sdk/inference/test_text_inference.py \
   --inference-model=groq/llama-3.3-70b-versatile --vision-inference-model=""
```

Also tested (openai, anthropic, gemini) providers. No regressions.
2025-02-27 13:16:50 -08:00
Ashwin Bharambe
981fc3c93c
fix(test): no need to specify tool prompt format explicitly in tests (#1295)
# What does this PR do?

No need to have complex tool prompt format related machinery in the
tests.

## Test Plan

```bash
LLAMA_STACK_CONFIG=ollama pytest -s -v tests/client-sdk/inference/test_text_inference.py --inference-model=meta-llama/Llama-3.2-3B-Instruct --vision-inference-model=""
```

2025-02-27 10:09:57 -08:00
Ashwin Bharambe
23b65b6cee
fix(test): update client-sdk tests to handle tool format parametrization better (#1287)
# What does this PR do?

Tool format depends on the model. @ehhuang introduced a
`get_default_tool_prompt_format` function for this purpose. We should
use that instead of the hacky model ID matching we had before.

Secondly, non-llama models don't have this concept, so testing with those
models should work as is.

## Test Plan

```bash
for distro in fireworks ollama; do
  LLAMA_STACK_CONFIG=$distro \
    pytest -s -v tests/client-sdk/inference/test_text_inference.py \
       --inference-model=meta-llama/Llama-3.2-3B-Instruct \
       --vision-inference-model=""
done

LLAMA_STACK_CONFIG=dev \
   pytest -s -v tests/client-sdk/inference/test_text_inference.py \
       --inference-model=openai/gpt-4o \
       --vision-inference-model=""

```

2025-02-26 21:16:00 -08:00
ehhuang
c8a20b8ed0
feat: allow specifying specific tool within toolgroup (#1239)
Summary:

E.g. `builtin::rag::knowledge_search`
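
For instance, an agent could be configured with just the `knowledge_search` tool rather than the whole `builtin::rag` group (a sketch; the exact config shape and arguments are assumptions):

```python
from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.types.agent_create_params import AgentConfig

client = LlamaStackClient(base_url="http://localhost:8321")

agent_config = AgentConfig(
    model="meta-llama/Llama-3.3-70B-Instruct",
    instructions="You are a helpful assistant.",
    toolgroups=[
        # a single tool within a toolgroup, instead of all of builtin::rag
        {"name": "builtin::rag::knowledge_search", "args": {"vector_db_ids": ["my_docs"]}},
    ],
    enable_session_persistence=False,
)
agent = Agent(client, agent_config)
```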

Test Plan:
```
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/agents/ --safety-shield meta-llama/Llama-Guard-3-8B
```
2025-02-26 14:07:05 -08:00
ehhuang
bb2690f176
feat: remove special handling of builtin::rag tool (#1015)
Summary:

Lets the model decide which tool it needs to call to respond to a query.

Test Plan:
```
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/ --safety-shield meta-llama/Llama-Guard-3-8B
```

Also evaluated on a small benchmark with 20 questions from HotpotQA.
With this PR and some prompting, the performance is 77% recall compared
to 50% currently.

---
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/1015).
* #1268
* #1239
* __->__ #1015
2025-02-26 13:04:52 -08:00
Ashwin Bharambe
63e6acd0c3
feat: add (openai, anthropic, gemini) providers via litellm (#1267)
# What does this PR do?

This PR introduces more non-llama model support to llama stack.
Providers introduced: openai, anthropic and gemini. All of these
providers use essentially the same piece of code -- the implementation
works via the `litellm` library.

We will expose only specific models for the providers we enable, making
sure they all work well and pass tests. This setup (instead of
automatically enabling _all_ providers and models allowed by LiteLLM)
ensures we can also perform any needed prompt tuning on a per-model basis
(just like we do for llama models).

## Test Plan

```bash
#!/bin/bash

args=("$@")
for model in openai/gpt-4o anthropic/claude-3-5-sonnet-latest gemini/gemini-1.5-flash; do
    LLAMA_STACK_CONFIG=dev pytest -s -v tests/client-sdk/inference/test_text_inference.py \
        --embedding-model=all-MiniLM-L6-v2 \
        --vision-inference-model="" \
        --inference-model=$model "${args[@]}"
done
```
2025-02-25 22:07:33 -08:00
LESSuseLESS
3a31611486
feat: completing text /chat-completion and /completion tests (#1223)
# What does this PR do?

The goal is to have a fairly complete set of provider and e2e tests for
/chat-completion and /completion. This is the current list,
```
grep -oE "def test_[a-zA-Z_+]*" llama_stack/providers/tests/inference/test_text_inference.py | cut -d' ' -f2
```
- test_model_list
- test_text_completion_non_streaming
- test_text_completion_streaming
- test_text_completion_logprobs_non_streaming
- test_text_completion_logprobs_streaming
- test_text_completion_structured_output
- test_text_chat_completion_non_streaming
- test_text_chat_completion_structured_output
- test_text_chat_completion_streaming
- test_text_chat_completion_with_tool_calling
- test_text_chat_completion_with_tool_calling_streaming

```
grep -oE "def test_[a-zA-Z_+]*" tests/client-sdk/inference/test_text_inference.py | cut -d' ' -f2
```
- test_text_completion_non_streaming
- test_text_completion_streaming
- test_text_completion_log_probs_non_streaming
- test_text_completion_log_probs_streaming
- test_text_completion_structured_output
- test_text_chat_completion_non_streaming
- test_text_chat_completion_streaming
- test_text_chat_completion_with_tool_calling_and_non_streaming
- test_text_chat_completion_with_tool_calling_and_streaming
- test_text_chat_completion_with_tool_choice_required
- test_text_chat_completion_with_tool_choice_none
- test_text_chat_completion_structured_output
- test_text_chat_completion_tool_calling_tools_not_in_request

## Test plan

== Set up Ollama local server
```
OLLAMA_HOST=127.0.0.1:8321 with-proxy ollama serve
OLLAMA_HOST=127.0.0.1:8321 ollama run llama3.2:3b-instruct-fp16 --keepalive 60m
```

==  Run a provider test
```
conda activate stack
OLLAMA_URL="http://localhost:8321" \
pytest -v -s -k "ollama" --inference-model="llama3.2:3b-instruct-fp16" \
llama_stack/providers/tests/inference/test_text_inference.py::TestInference
```

== Run an e2e test
```
conda activate sherpa
with-proxy pip install llama-stack
export INFERENCE_MODEL=llama3.2:3b-instruct-fp16
export LLAMA_STACK_PORT=8322
with-proxy llama stack build --template ollama
with-proxy llama stack run --env OLLAMA_URL=http://localhost:8321 ollama
```
```
conda activate stack
LLAMA_STACK_PORT=8322 LLAMA_STACK_BASE_URL="http://localhost:8322" \
pytest -v -s --inference-model="llama3.2:3b-instruct-fp16" \
tests/client-sdk/inference/test_text_inference.py
```
2025-02-25 11:37:04 -08:00
Hardik Shah
27a08b7266 test fix for sometimes tools get called more than once 2025-02-24 13:16:40 -08:00
ehhuang
e8f4efba44
test: fix test_tool_choice (#1234)
Summary:

Test Plan:
---
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/1234).
* __->__ #1234
* #1214
2025-02-24 12:42:42 -08:00
Ashwin Bharambe
b890d7a611 Test be not having prints yo 2025-02-21 16:43:00 -08:00
ehhuang
c9e08cc0a8
test: do not overwrite agent_config (#1216)
Summary:

Test Plan:
2025-02-21 16:38:56 -08:00
Ashwin Bharambe
45ffe87d7c Kill noise from test output 2025-02-21 15:37:23 -08:00
ehhuang
bf38d0aba0
test: fix test_rag_agent test (#1215)
Summary:

Test Plan:
```
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/agents/test_agents.py::test_rag_agent --safety-shield meta-llama/Llama-Guard-3-8B
```
2025-02-21 15:24:28 -08:00
Ashwin Bharambe
e7d261ef4a Fix test infra, sentence embeddings mixin 2025-02-21 15:11:46 -08:00
Ashwin Bharambe
182608d4bf better test naming 2025-02-21 14:27:08 -08:00
Ashwin Bharambe
ab54b8cd58
feat(providers): support non-llama models for inference providers (#1200)
This PR begins the process of supporting non-llama models within Llama
Stack. We start simple by adding support for this functionality within a
few existing providers: fireworks, together and ollama.

## Test Plan

```bash
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/inference/test_text_inference.py \
  --inference-model accounts/fireworks/models/phi-3-vision-128k-instruct
```

^ This passes most of the tests but, as expected, fails the tool-calling
related tests since they are very specific to Llama models.

```
inference/test_text_inference.py::test_text_completion_streaming[accounts/fireworks/models/phi-3-vision-128k-instruct] PASSED
inference/test_text_inference.py::test_completion_log_probs_non_streaming[accounts/fireworks/models/phi-3-vision-128k-instruct] PASSED
inference/test_text_inference.py::test_completion_log_probs_streaming[accounts/fireworks/models/phi-3-vision-128k-instruct] PASSED
inference/test_text_inference.py::test_text_completion_structured_output[accounts/fireworks/models/phi-3-vision-128k-instruct-completion-01] PASSED
inference/test_text_inference.py::test_text_chat_completion_non_streaming[accounts/fireworks/models/phi-3-vision-128k-instruct-Which planet do humans live on?-Earth] PASSED
inference/test_text_inference.py::test_text_chat_completion_non_streaming[accounts/fireworks/models/phi-3-vision-128k-instruct-Which planet has rings around it with a name starting w
ith letter S?-Saturn] PASSED
inference/test_text_inference.py::test_text_chat_completion_streaming[accounts/fireworks/models/phi-3-vision-128k-instruct-What's the name of the Sun in latin?-Sol] PASSED
inference/test_text_inference.py::test_text_chat_completion_streaming[accounts/fireworks/models/phi-3-vision-128k-instruct-What is the name of the US captial?-Washington] PASSED
inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_non_streaming[accounts/fireworks/models/phi-3-vision-128k-instruct] FAILED
inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_streaming[accounts/fireworks/models/phi-3-vision-128k-instruct] FAILED
inference/test_text_inference.py::test_text_chat_completion_with_tool_choice_required[accounts/fireworks/models/phi-3-vision-128k-instruct] FAILED
inference/test_text_inference.py::test_text_chat_completion_with_tool_choice_none[accounts/fireworks/models/phi-3-vision-128k-instruct] PASSED
inference/test_text_inference.py::test_text_chat_completion_structured_output[accounts/fireworks/models/phi-3-vision-128k-instruct] ERROR
inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[accounts/fireworks/models/phi-3-vision-128k-instruct-True] PASSED
inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[accounts/fireworks/models/phi-3-vision-128k-instruct-False] PASSED
```
2025-02-21 13:21:28 -08:00
ehhuang
25fddccfd8
feat: tool outputs metadata (#1155)
Summary:

Allows tools to output metadata. This is useful for evaluating tool
outputs; e.g. the RAG tool will output document IDs, which can be used to
score recall.

We will need to make a similar change on the client side to support
ClientTool outputting metadata.
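
The shape of the idea, sketched with a plain dataclass rather than the actual stack types:

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ToolResult:
    """Illustrative only: a tool's content plus free-form metadata about how it was produced."""
    content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def knowledge_search(query: str) -> ToolResult:
    # a RAG-style tool can report which documents it retrieved so evals can score recall
    retrieved = [("doc-1", "first chunk ..."), ("doc-2", "second chunk ...")]
    return ToolResult(
        content="\n".join(text for _, text in retrieved),
        metadata={"document_ids": [doc_id for doc_id, _ in retrieved]},
    )
```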

Test Plan:

```
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/agents/test_agents.py
```
2025-02-21 13:15:31 -08:00
Xi Yan
0fe071764f
feat(1/n): api: unify agents for handling server & client tools (#1178)
# Problem

Our current Agent framework has discrepancies in how we handle server-side
and client-side tools.

1. Server tools: a single Turn is returned by the agents API, including the
`ToolExecutionStep`s
2. Client tools: `create_agent_turn` is called in a loop, with the client
agent lib yielding the agent chunks

ad6ffc63df/src/llama_stack_client/lib/agents/agent.py (L186-L211)

This makes it inconsistent to work with server & client tools. It also
complicates the telemetry logs when retrieving information about an agent's
turns / history for observability.

#### Principle
The same `turn_id` should be used to represent the steps required to
complete a user message including client tools.

## Solution

1. Add an `AgentTurnResponseEventType.turn_awaiting_input` status to indicate
that the current turn is not completed and is awaiting tool input
2. Add a `continue_agent_turn` endpoint to update the agent turn with the
client's tool response (a client-side sketch follows below)
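
A client-side sketch of the intended flow (function and field names follow the PR text, not a finalized SDK surface):

```python
def drive_turn_with_client_tools(create_turn, continue_turn, run_client_tool, messages):
    """Illustrative loop: one turn_id spans the whole client-tool round trip."""
    turn = create_turn(messages)
    # turn_awaiting_input: the server paused the turn until the client supplies tool results
    while turn.status == "turn_awaiting_input":
        tool_responses = [run_client_tool(call) for call in turn.pending_tool_calls]
        turn = continue_turn(turn.turn_id, tool_responses)  # continue_agent_turn per the PR
    return turn
```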


# What does this PR do?
- Skeleton API as example

## Test Plan

- Just API update, no functionality change
```
llama stack run + client-sdk test
```

<img width="842" alt="image"
src="https://github.com/user-attachments/assets/7ac56b5f-f424-4632-9476-7e0f57555bc3"
/>


2025-02-21 11:48:27 -08:00
Matthew Farrellee
46da187c07
fix: remove list of list tests, no longer relevant after #1161 (#1205)
# What does this PR do?

#1161 updated the embedding signature, making the nested list tests
irrelevant.
2025-02-21 08:07:35 -08:00
Matthew Farrellee
3099c5243f
fix: update URL import, URL -> ImageContentItemImageURL (#1204)
# What does this PR do?

Fixes the test to use the new name for the URL import.

## Test Plan

```
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/inference/test_embedding.py --embedding-model baai/bge-m3
```
2025-02-21 08:02:21 -08:00
Ashwin Bharambe
34226d6c93 Another test_case related breakage fix 2025-02-20 23:10:33 -08:00
Ashwin Bharambe
36b762303c Fix client-sdk inference text -- spurious parameterization of test_case 2025-02-20 22:46:17 -08:00
Matthew Farrellee
832c535aaf
feat(providers): add NVIDIA Inference embedding provider and tests (#935)
# What does this PR do?

add /v1/inference/embeddings implementation to NVIDIA provider

**open topics** -
- *asymmetric models*. NeMo Retriever includes asymmetric models, which
are models that embed differently depending on if the input is destined
for storage or lookup against storage. the /v1/inference/embeddings api
does not allow the user to indicate the type of embedding to perform.
see https://github.com/meta-llama/llama-stack/issues/934
- *truncation*. embedding models typically have a limited context
window, e.g. 1024 tokens is common though newer models have 8k windows.
when the input is larger than this window the endpoint cannot perform
its designed function. two options: 0. return an error so the user can
reduce the input size and retry; 1. perform truncation for the user and
proceed (common strategies are left or right truncation). many users
encounter context window size limits and will struggle to write reliable
programs. this struggle is especially acute without access to the
model's tokenizer. the /v1/inference/embeddings api does not allow the
user to delegate truncation policy. see
https://github.com/meta-llama/llama-stack/issues/933
- *dimensions*. "Matryoshka" embedding models are available. they allow
users to control the number of embedding dimensions the model produces.
this is a critical feature for managing storage constraints. embeddings
of 1024 dimensions that achieve 95% recall for an application may not be
worth the storage cost if 512 dimensions can achieve 93% recall.
controlling embedding dimensions allows applications to determine their
recall and storage tradeoffs. the /v1/inference/embeddings api does not
allow the user to control the output dimensions. see
https://github.com/meta-llama/llama-stack/issues/932

## Test Plan

- `llama stack run llama_stack/templates/nvidia/run.yaml`
- `LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v
tests/client-sdk/inference/test_embedding.py --embedding-model
baai/bge-m3`




## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-02-20 16:59:48 -08:00
LESSuseLESS
2cbe9395b0
feat: D69478008 [llama-stack] turning tests into data-driven (#1180)
# What does this PR do?

We have several places running tests for different purposes.
- oss llama stack
  - provider tests
  - e2e tests
- provider llama stack
  - unit tests
  - e2e tests

It would be nice if they could *share the same set of test data*, so we
maintain consistency between spec and implementation. That is what this
diff is about: isolating test data from test code, so that we can reuse
the same data in different places by writing different test code.

## Test Plan

== Set up Ollama local server

== Run a provider test
```
conda activate stack
OLLAMA_URL="http://localhost:8321" \
pytest -v -s -k "ollama" --inference-model="llama3.2:3b-instruct-fp16" \
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output
# test_structured_output should also work
```

== Run an e2e test
```
conda activate sherpa
with-proxy pip install llama-stack
export INFERENCE_MODEL=llama3.2:3b-instruct-fp16
export LLAMA_STACK_PORT=8322
with-proxy llama stack build --template ollama
with-proxy llama stack run --env OLLAMA_URL=http://localhost:8321 ollama
```

== Run the test client
```
LLAMA_STACK_PORT=8322 LLAMA_STACK_BASE_URL="http://localhost:8322" \
pytest -v -s --inference-model="llama3.2:3b-instruct-fp16" \
tests/client-sdk/inference/test_text_inference.py::test_text_completion_structured_output
# test_text_chat_completion_structured_output should also work
```

## Notes

- This PR was automatically generated by oss_sync
- Please refer to D69478008 for more details.
2025-02-20 14:13:06 -08:00
Sixian Yi
531940aea9
script for running client sdk tests (#895)
# What does this PR do?
Create a script for running all client-sdk tests on the Async Library
client, with the option to generate a report.


## Test Plan

```
python llama_stack/scripts/run_client_sdk_tests.py --templates together fireworks --report
```



2025-02-19 22:38:06 -08:00