Commit graph

96 commits

Author SHA1 Message Date
Ihar Hrachyshka
9e6561a1ec
chore: enable pyupgrade fixes (#1806)
# What does this PR do?

The goal of this PR is code base modernization.

Schema reflection code needed a minor adjustment to handle UnionTypes
and collections.abc.AsyncIterator. (Both are preferred for latest Python
releases.)

Note to reviewers: almost all changes here are automatically generated
by pyupgrade. Some additional unused imports were cleaned up. The only
change worth of note can be found under `docs/openapi_generator` and
`llama_stack/strong_typing/schema.py` where reflection code was updated
to deal with "newer" types.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-05-01 14:23:50 -07:00
Derek Higgins
64829947d0
feat: Add temperature support to responses API (#2065)
# What does this PR do?
Add support for the temperature to the responses API 


## Test Plan
Manually tested simple case
unit tests added for simple case and tool calls

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-05-01 11:47:58 -07:00
Ben Browning
8dfce2f596
feat: OpenAI Responses API (#1989)
# What does this PR do?

This provides an initial [OpenAI Responses
API](https://platform.openai.com/docs/api-reference/responses)
implementation. The API is not yet complete, and this is more a
proof-of-concept to show how we can store responses in our key-value
stores and use them to support the Responses API concepts like
`previous_response_id`.

## Test Plan

I've added a new
`tests/integration/openai_responses/test_openai_responses.py` as part of
a test-driven development for this new API. I'm only testing this
locally with the remote-vllm provider for now, but it should work with
any of our inference providers since the only API it requires out of the
inference provider is the `openai_chat_completion` endpoint.

```
VLLM_URL="http://localhost:8000/v1" \
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
llama stack build --template remote-vllm --image-type venv --run
```

```
LLAMA_STACK_CONFIG="http://localhost:8321" \
python -m pytest -v \
  tests/integration/openai_responses/test_openai_responses.py \
  --text-model "meta-llama/Llama-3.2-3B-Instruct"
 ```

---------

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-04-28 14:06:00 -07:00
Alexey Rybak
326cbba579
feat(agents): add agent naming functionality (#1922)
# What does this PR do?
Allow users to name an agent and use the name in telemetry instead of
relying on randomly generated agent_ids. This improves the developer
experience by making it easier to find specific agents in telemetry
logs.

Closes #1832

## Test Plan

- Added tests to verify the agent name is properly stored and retrieved
- Ran `uv run -- pytest -v
tests/integration/telemetry/test_telemetry.py::test_agent_name_filtering`
from the root of the project and made sure the tests pass
- Ran `uv run -- pytest -v
tests/integration/telemetry/test_telemetry.py::test_agent_query_spans`
to verify existing code without agent names still works correctly

## Use Example
```
agent = Agent(
    llama_stack_client, 
    model=text_model_id, 
    name="CustomerSupportAgent",  # New parameter
    instructions="You are a helpful customer support assistant"
)
session_id = agent.create_session(f"test-session-{uuid4()}")
```

## Implementation Notes
- Agent names are optional string parameters with no additional
validation
- Names are not required to be unique - multiple agents can have the
same name
- The agent_id remains the unique identifier for an agent

---------

Co-authored-by: raghotham <raghotham@gmail.com>
2025-04-17 07:02:47 -07:00
Sébastien Han
10882bf478
chore: remove unused tempdir in agent (#1896)
# What does this PR do?

The usage of the tempdir was removed in
094eb6a5ae.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-04-09 09:43:48 +02:00
Ashwin Bharambe
530d4bdfe1
refactor: move all llama code to models/llama out of meta reference (#1887)
# What does this PR do?

Move around bits. This makes the copies from llama-models _much_ easier
to maintain and ensures we don't entangle meta-reference specific
tidbits into llama-models code even by accident.

Also, kills the meta-reference-quantized-gpu distro and rolls
quantization deps into meta-reference-gpu.

## Test Plan

```
LLAMA_MODELS_DEBUG=1 \
  with-proxy llama stack run meta-reference-gpu \
  --env INFERENCE_MODEL=meta-llama/Llama-4-Scout-17B-16E-Instruct \
   --env INFERENCE_CHECKPOINT_DIR=<DIR> \
   --env MODEL_PARALLEL_SIZE=4 \
   --env QUANTIZATION_TYPE=fp8_mixed
```

Start a server with and without quantization. Point integration tests to
it using:

```
pytest -s -v  tests/integration/inference/test_text_inference.py \
   --stack-config http://localhost:8321 --text-model meta-llama/Llama-4-Scout-17B-16E-Instruct
```
2025-04-07 15:03:58 -07:00
Ashwin Bharambe
b8f1561956
feat: introduce llama4 support (#1877)
As title says. Details in README, elsewhere.
2025-04-05 11:53:35 -07:00
Xi Yan
90efafafb7
chore: change context to content for agent (#1840) 2025-03-30 10:33:58 -07:00
Xi Yan
094eb6a5ae
feat(rag): entire document context with attachments (#1763)
# What does this PR do?
**What**
Instead of adhoc creating a vectordb and chunking when documents ae sent
as an attachment to agent turn, we directly pass raw text from document
into messages to model for user context, and let model perform
summarization directly.

This removes the magic behaviour, and yields better performance than
existing approach.

**Improved Performance**
- RAG lifecycle notebook
  - Model: 0.3 factuality score
  - (+ websearch) Agent: 0.44 factuality score
  - (+ vector db) Agent: 0.3 factuality score
  - (+ raw context) Agent: 0.6 factuality score

Closes https://github.com/meta-llama/llama-stack/issues/1478

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
- [NEW] added section in RAG lifecycle notebook shows better performance

<img width="840" alt="image"
src="https://github.com/user-attachments/assets/a0c4e816-809a-41c0-9124-89825983e3f5"
/>


[//]: # (## Documentation)
2025-03-23 16:57:48 -07:00
ehhuang
06788643b3
feat(telemetry): clean up spans (#1760) 2025-03-21 20:05:11 -07:00
Ashwin Bharambe
03b5c61bfc
feat: make sure agent sessions are under access control (#1737)
This builds on top of #1703.

Agent sessions are now properly access controlled.

## Test Plan

Added unit tests
2025-03-21 07:31:16 -07:00
ehhuang
37f155e41d
feat(agent): support multiple tool groups (#1556)
Summary:
closes #1488 

Test Plan:
added new integration test
```
LLAMA_STACK_CONFIG=dev pytest -s -v tests/integration/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B --text-model openai/gpt-4o-mini
```
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/1556).
* __->__ #1556
* #1550
2025-03-17 22:13:09 -07:00
ehhuang
c23a7af5d6
fix: agents with non-llama model (#1550)
# Summary:
Includes fixes to get test_agents working with openAI model, e.g. tool
parsing and message conversion

# Test Plan:
```
LLAMA_STACK_CONFIG=dev pytest -s -v tests/integration/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B --text-model openai/gpt-4o-mini
```

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/1550).
* #1556
* __->__ #1550
2025-03-17 22:11:06 -07:00
Sébastien Han
98b1b15e0f
refactor: move all datetime.now() calls to UTC (#1589)
# What does this PR do?

Updated all instances of datetime.now() to use timezone.utc for
consistency in handling time across different systems. This ensures that
timestamps are always in Coordinated Universal Time (UTC), avoiding
issues with time zone discrepancies and promoting uniformity in
time-related data.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-03-13 15:34:53 -07:00
ehhuang
a505bf45a3
feat(api): remove tool_name from ToolResponseMessage (#1599)
Summary:
This is not used anywhere.

closes #1421 

Test Plan:
LLAMA_STACK_CONFIG=fireworks pytest -s -v
tests/integration/agents/test_agents.py --safety-shield
meta-llama/Llama-Guard-3-8B --text-model
meta-llama/Llama-3.1-8B-Instruct --record-responses
2025-03-12 19:41:48 -07:00
ehhuang
ed6caead72
chore: simplify _get_tool_defs (#1384)
Summary:

Test Plan:
LLAMA_STACK_CONFIG=fireworks pytest -s -v
tests/integration/agents/test_agents.py --safety-shield
meta-llama/Llama-Guard-3-8B --text-model
meta-llama/Llama-3.1-8B-Instruct
2025-03-12 18:51:18 -07:00
ehhuang
41c9bca1aa
chore: refactor Agent toolgroup processing (#1381)
Summary:
Refactoring only.

Centralize logic to preprocess toolgroup to one place. 

Test Plan:
LLAMA_STACK_CONFIG=fireworks pytest -s -v
tests/api/agents/test_agents.py --safety-shield
meta-llama/Llama-Guard-3-8B
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/1381).
* #1384
* __->__ #1381
2025-03-12 18:48:03 -07:00
ehhuang
b7a9c45477
chore: deprecate ToolResponseMessage in agent.resume API (#1566)
# Summary:
closes #1431 

# Test Plan:
LLAMA_STACK_CONFIG=fireworks pytest -s -v
tests/integration/agents/test_agents.py --safety-shield
meta-llama/Llama-Guard-3-8B --text-model
meta-llama/Llama-3.1-8B-Instruct
2025-03-12 12:10:21 -07:00
Xi Yan
43044f29e2
fix: fix llama stack run with missing agent impl (#1559)
# What does this PR do?

- recent merge https://github.com/meta-llama/llama-stack/pull/1410
introduce error
```
ValueError: Provider meta-reference (Api.agents) does not implement the following methods:
[('list_agent_sessions', 'not_actually_implemented'), ('list_agents', 'not_actually_implemented')]
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
```
llama stack run
```

```
LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/agents/test_agents.py --text-model meta-llama/Llama-3.3-70B-Instruct
```

1379530386

[//]: # (## Documentation)
2025-03-11 11:22:22 -07:00
Ihar Hrachyshka
c3d7d17bc4
chore: fix typing hints for get_provider_impl deps arguments (#1544)
# What does this PR do?

It's a dict that may contain different types, as per
resolver:instantiate_provider implementation. (AFAIU it also never
contains ProviderSpecs, but *instances* of provider implementations.)

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

mypy passing if enabled checks for these modules. (See #1543)

[//]: # (## Documentation)

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-03-11 10:07:28 -07:00
Dinesh Yeduguru
ead9397e22
fix: tracing fixes for trace context propogation across coroutines (#1522)
# What does this PR do?
This PR has two fixes needed for correct trace context propagation
across asycnio boundary
Fix 1: Start using context vars to store the global trace context.
This is needed since we cannot use the same trace context across
coroutines since the state is shared. each coroutine
should have its own trace context so that each of it can start storing
its state correctly.
Fix 2: Start a new span for each new coroutines started for running
shields to keep the span tree clean


## Test Plan

### Integration tests with server
LLAMA_STACK_DISABLE_VERSION_CHECK=true llama stack run
~/.llama/distributions/together/together-run.yaml
LLAMA_STACK_CONFIG=http://localhost:8321 pytest -s --safety-shield
meta-llama/Llama-Guard-3-8B --text-model
meta-llama/Llama-3.1-8B-Instruct
server logs:
https://gist.github.com/dineshyv/51ac5d9864ed031d0d89ce77352821fe
test logs:
https://gist.github.com/dineshyv/e66acc1c4648a42f1854600609c467f3
 
### Integration tests with library client
LLAMA_STACK_CONFIG=fireworks pytest -s --safety-shield
meta-llama/Llama-Guard-3-8B --text-model
meta-llama/Llama-3.1-8B-Instruct

logs: https://gist.github.com/dineshyv/ca160696a0b167223378673fb1dcefb8

### Apps test with server:
```
LLAMA_STACK_DISABLE_VERSION_CHECK=true llama stack run ~/.llama/distributions/together/together-run.yaml
python -m examples.agents.e2e_loop_with_client_tools localhost 8321
```
server logs:
https://gist.github.com/dineshyv/1717a572d8f7c14279c36123b79c5797
app logs:
https://gist.github.com/dineshyv/44167e9f57806a0ba3b710c32aec02f8
2025-03-11 07:12:48 -07:00
ehhuang
23e39cc3c4
fix: handle log errors (#1499)
Summary:
| File
"/Users/erichuang/projects/llama-stack/llama_stack/distribution/server/server.py",
line 213, in sse_generator
    |     logger.exception(f"Error in sse_generator: {e}")
| File
"/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py",
line 1864, in exception
    |     self.log(ERROR, msg, *args, exc_info=exc_info, **kwargs)
| File
"/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py",
line 1879, in log
    |     self.logger.log(level, msg, *args, **kwargs)
| File
"/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py",
line 1547, in log
    |     self._log(level, msg, args, **kwargs)
| File
"/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py",
line 1624, in _log
    |     self.handle(record)
| File
"/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py",
line 1634, in handle
    |     self.callHandlers(record)
| File
"/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py",
line 1696, in callHandlers
    |     hdlr.handle(record)
| File
"/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py",
line 968, in handle
    |     self.emit(record)
| File
"/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/site-packages/rich/logging.py",
line 167, in emit
    |     message_renderable = self.render_message(record, message)
| File
"/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/site-packages/rich/logging.py",
line 193, in render_message
| message_text = Text.from_markup(message) if use_markup else
Text(message)
| File
"/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/site-packages/rich/text.py",
line 287, in from_markup
| rendered_text = render(text, style, emoji=emoji,
emoji_variant=emoji_variant)
| File
"/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/site-packages/rich/markup.py",
line 167, in render
    |     raise MarkupError(
| rich.errors.MarkupError: closing tag '[/INST]' at position 105 doesn't
match any open tag


Test Plan:
reran failing rag_with_vector_db example
2025-03-07 15:58:26 -08:00
ehhuang
acbae66b9d
chore: escape tool output for logging (#1490)
Summary:

error:


llama_stack/providers/inline/agents/meta_reference/agent_instance.py:1032:
in execute_tool_call_maybe
    logger.info(f"tool call {name} completed with result: {result}")

/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py:1841:
in info
    self.log(INFO, msg, *args, **kwargs)

/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py:1879:
in log
    self.logger.log(level, msg, *args, **kwargs)

/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py:1547:
in log
    self._log(level, msg, args, **kwargs)

/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py:1624:
in _log
    self.handle(record)

/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py:1634:
in handle
    self.callHandlers(record)

/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py:1696:
in callHandlers
    hdlr.handle(record)

/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py:968:
in handle
    self.emit(record)

/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/site-packages/rich/logging.py:167:
in emit
    message_renderable = self.render_message(record, message)

/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/site-packages/rich/logging.py:193:
in render_message
message_text = Text.from_markup(message) if use_markup else
Text(message)

/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/site-packages/rich/text.py:287:
in from_markup
rendered_text = render(text, style, emoji=emoji,
emoji_variant=emoji_variant)

/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/site-packages/rich/markup.py:167:
in render
    raise MarkupError(
E rich.errors.MarkupError: closing tag '[/INST]' at position 3274
doesn't match any open tag

Test Plan:
2025-03-07 13:33:45 -08:00
Sébastien Han
7cf1e24c4e
feat(logging): implement category-based logging (#1362)
# What does this PR do?

This commit introduces a new logging system that allows loggers to be
assigned
a category while retaining the logger name based on the file name. The
log
format includes both the logger name and the category, producing output
like:

```
INFO     2025-03-03 21:44:11,323 llama_stack.distribution.stack:103 [core]: Tool_groups: builtin::websearch served by
         tavily-search
```

Key features include:

- Category-based logging: Loggers can be assigned a category (e.g.,
  "core", "server") when programming. The logger can be loaded like
  this: `logger = get_logger(name=__name__, category="server")`
- Environment variable control: Log levels can be configured
per-category using the
  `LLAMA_STACK_LOGGING` environment variable. For example:
`LLAMA_STACK_LOGGING="server=DEBUG;core=debug"` enables DEBUG level for
the "server"
    and "core" categories.
- `LLAMA_STACK_LOGGING="all=debug"` sets DEBUG level globally for all
categories and
    third-party libraries.

This provides fine-grained control over logging levels while maintaining
a clean and
informative log format.

The formatter uses the rich library which provides nice colors better
stack traces like so:

```
ERROR    2025-03-03 21:49:37,124 asyncio:1758 [uncategorized]: unhandled exception during asyncio.run() shutdown
         task: <Task finished name='Task-16' coro=<handle_signal.<locals>.shutdown() done, defined at
         /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py:146>
         exception=UnboundLocalError("local variable 'loop' referenced before assignment")>
         ╭────────────────────────────────────── Traceback (most recent call last) ───────────────────────────────────────╮
         │ /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py:178 in shutdown                │
         │                                                                                                                │
         │   175 │   │   except asyncio.CancelledError:                                                                   │
         │   176 │   │   │   pass                                                                                         │
         │   177 │   │   finally:                                                                                         │
         │ ❱ 178 │   │   │   loop.stop()                                                                                  │
         │   179 │                                                                                                        │
         │   180 │   loop = asyncio.get_running_loop()                                                                    │
         │   181 │   loop.create_task(shutdown())                                                                         │
         ╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
         UnboundLocalError: local variable 'loop' referenced before assignment
```

Co-authored-by: Ashwin Bharambe <@ashwinb>
Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

```
python -m llama_stack.distribution.server.server --yaml-config ./llama_stack/templates/ollama/run.yaml
INFO     2025-03-03 21:55:35,918 __main__:365 [server]: Using config file: llama_stack/templates/ollama/run.yaml           
INFO     2025-03-03 21:55:35,925 __main__:378 [server]: Run configuration:                                                 
INFO     2025-03-03 21:55:35,928 __main__:380 [server]: apis:                                                              
         - agents                                                     
``` 
[//]: # (## Documentation)

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-03-07 11:34:30 -08:00
Ashwin Bharambe
2fe976ed0a
refactor(test): introduce --stack-config and simplify options (#1404)
You now run the integration tests with these options:

```bash
Custom options:
  --stack-config=STACK_CONFIG
                        a 'pointer' to the stack. this can be either be:
                        (a) a template name like `fireworks`, or
                        (b) a path to a run.yaml file, or
                        (c) an adhoc config spec, e.g.
                        `inference=fireworks,safety=llama-guard,agents=meta-
                        reference`
  --env=ENV             Set environment variables, e.g. --env KEY=value
  --text-model=TEXT_MODEL
                        comma-separated list of text models. Fixture name:
                        text_model_id
  --vision-model=VISION_MODEL
                        comma-separated list of vision models. Fixture name:
                        vision_model_id
  --embedding-model=EMBEDDING_MODEL
                        comma-separated list of embedding models. Fixture name:
                        embedding_model_id
  --safety-shield=SAFETY_SHIELD
                        comma-separated list of safety shields. Fixture name:
                        shield_id
  --judge-model=JUDGE_MODEL
                        comma-separated list of judge models. Fixture name:
                        judge_model_id
  --embedding-dimension=EMBEDDING_DIMENSION
                        Output dimensionality of the embedding model to use for
                        testing. Default: 384
  --record-responses    Record new API responses instead of using cached ones.
  --report=REPORT       Path where the test report should be written, e.g.
                        --report=/path/to/report.md

```

Importantly, if you don't specify any of the models (text-model,
vision-model, etc.) the relevant tests will get **skipped!**

This will make running tests somewhat more annoying since all options
will need to be specified. We will make this easier by adding some easy
wrapper yaml configs.

## Test Plan

Example:

```bash
ashwin@ashwin-mbp ~/local/llama-stack/tests/integration (unify_tests) $ 
LLAMA_STACK_CONFIG=fireworks pytest -s -v inference/test_text_inference.py \
   --text-model meta-llama/Llama-3.2-3B-Instruct 
```
2025-03-05 17:02:02 -08:00
ehhuang
6cf79437b3
feat: support ClientTool output metadata (#1426)
# Summary:
Client side change in
https://github.com/meta-llama/llama-stack-client-python/pull/180
Changes the resume_turn API to accept `ToolResponse` instead of
`ToolResponseMessage`:
1. `ToolResponse` contains `metadata`
2. `ToolResponseMessage` is a concept for model inputs. Here we are just
submitting the outputs of tool execution.

# Test Plan:
Ran integration tests with newly added test using client tool with
metadata

LLAMA_STACK_CONFIG=fireworks pytest -s -v
tests/integration/agents/test_agents.py --safety-shield
meta-llama/Llama-Guard-3-8B --record-responses
2025-03-05 14:30:27 -08:00
Daniele Martinoli
fb998683e0
fix: Agent uses the first configured vector_db_id when documents are provided (#1276)
# What does this PR do?
The agent API allows to query multiple DBs using the `vector_db_ids`
argument of the `rag` tool:
```py
        toolgroups=[
            {
                "name": "builtin::rag",
                "args": {"vector_db_ids": [vector_db_id]},
            }
        ],
```
This means that multiple DBs can be used to compose an aggregated
context by executing the query on each of them.

When documents are passed to the next agent turn, there is no explicit
way to configure the vector DB where the embeddings will be ingested. In
such cases, we can assume that:
- if any `vector_db_ids` is given, we use the first one (it probably
makes sense to assume that it's the only one in the list, otherwise we
should loop on all the given DBs to have a consistent ingestion)
- if no `vector_db_ids` is given, we can use the current logic to
generate a default DB using the default provider. If multiple providers
are defined, the API will fail as expected: the user has to provide
details on where to ingest the documents.

(Closes #1270)

## Test Plan
The issue description details how to replicate the problem.

[//]: # (## Documentation)

---------

Signed-off-by: Daniele Martinoli <dmartino@redhat.com>
2025-03-04 21:44:13 -08:00
Xi Yan
78962be996
chore: refactor create_and_execute_turn and resume_turn (#1399)
# What does this PR do?
- Closes https://github.com/meta-llama/llama-stack/issues/1212

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
```
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/integration/agents/test_agents.py --inference-model "meta-llama/Llama-3.3-70B-Instruct"
```
<img width="1203" alt="image"
src="https://github.com/user-attachments/assets/35b60017-b3f2-4e98-87f2-2868730261bd"
/>

```
LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/agents/test_agents.py::test_rag_and_code_agent --inference-model "meta-llama/Llama-3.3-70B-Instruct"
```

[//]: # (## Documentation)
2025-03-04 16:07:30 -08:00
ehhuang
fd8c991393
fix: rag as attachment bug (#1392)
Summary:

Test Plan:
added new test
LLAMA_STACK_CONFIG=fireworks pytest -s -v
tests/api/agents/test_agents.py --safety-shield
meta-llama/Llama-Guard-3-8B
2025-03-04 13:08:16 -08:00
Xi Yan
158b6dc404
chore: deprecate allow_turn_resume (#1377)
# What does this PR do?

- Deprecate allow_turn_resume flag as this is used for staying backward
compat.
- Closes https://github.com/meta-llama/llama-stack/issues/1363

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
```
LLAMA_STACK_CONFIG=fireworks pytest -v tests/api/agents/test_agents.py --inference-model "meta-llama/Llama-3.3-70B-Instruct" --record-responses
```

<img width="1054" alt="image"
src="https://github.com/user-attachments/assets/d31de2d4-0953-41e1-a71a-7e1579fa351a"
/>


[//]: # (## Documentation)
2025-03-04 12:22:11 -08:00
ehhuang
07a992ef90
feat: deterministic tools ordering (#1380)
Summary:

1. The `tools` parameter we construct to pass the inference API is
non-deterministic. As a result, our recordable mocks is flaky as the
ordering change sometimes. This PR makes it so that `tools` ordering is
deterministic and aligned with the order user specified.
2. In recordable mock key generation, client tool's parameter type was
'str' and now is 'string' for some reason. I didn't dig into exactly
why, but just regenerated the fixtures.

Test Plan:
Regenerate mocks:
```
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B --record-responses
```

Rerun tests without  --record-responses:
```
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B
```
2025-03-03 20:38:07 -08:00
Ashwin Bharambe
0a76ece249 feat: add more logs to agent_instance.py 2025-03-03 16:15:47 -08:00
Xi Yan
7d111c7510
feat: unify max_infer_iters in client/server agent loop (#1309)
# What does this PR do?

We currently use `max_infer_iters` in 2 different ways
1/ Server: track number of times 
2/ Client side: track number of times we send `resume_turn` request

This PR gets rid of the need of (2) and makes server track total number
of times we perform inference within a Turn

**NOTE**
The PR will assume StopReason is set to
- end_of_message: turn is not finished, we could be waiting for client
tool call responses
- end_of_turn: if the entire turn is finished and there's no more things
to be done.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
```
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/agents/test_agents.py::test_custom_tool_infinite_loop --inference-model "meta-llama/Llama-3.3-70B-Instruct"
```

[//]: # (## Documentation)
2025-03-03 10:08:36 -08:00
ehhuang
21ec67356c
fix: RAG with documents (#1337)
Summary:
This was broken by
https://github.com/meta-llama/llama-stack/pull/1015/files#r1975394190

Test Plan:

added e2e test
2025-02-28 16:51:00 -08:00
Sébastien Han
6fa257b475
chore(lint): update Ruff ignores for project conventions and maintainability (#1184)
- Added new ignores from flake8-bugbear (`B007`, `B008`)
- Ignored `C901` (high function complexity) for now, pending review
- Maintained PyTorch conventions (`N812`, `N817`)
- Allowed `E731` (lambda assignments) for flexibility
- Consolidated existing ignores (`E402`, `E501`, `F405`, `C408`, `N812`)
- Documented rationale for each ignored rule

This keeps our linting aligned with project needs while tracking
potential fixes.

Signed-off-by: Sébastien Han <seb@redhat.com>

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-28 09:36:49 -08:00
Hardik Shah
8efa53daf1
fix: Agent telemetry inputs/outputs should be structured (#1302)
Original telemetry outputs for agent turns look like this. 
Note: how output was a `str(message)` making it difficult to read them
back for downstream tasks ( eg. building eval datasets )
```
{
│   │   'input': [
│   │   │   '{"role":"system","content":"You are a helpful assistant. Use search tool to answer the questions. "}',
│   │   │   '{"role":"user","content":"Which teams played in the NBA western conference finals of 2024","context":null}'
│   │   ],
│   │   'output': "content:  tool_calls: [ToolCall(call_id='8b7294ec-a83f-4798-ad8f-6bed662f08b6', tool_name=<BuiltinTool.brave_search: 'brave_search'>, arguments={'query': 'NBA Western Conference Finals 2024 teams'})]"
│   },
``` 

Updated the outputs to be structured .

## Test 

```python
import uuid

from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger
from llama_stack_client.types.agent_create_params import AgentConfig

model_id = "meta-llama/Llama-3.1-8B-Instruct"
agent_config = AgentConfig(
    model=model_id,
    instructions="You are a helpful assistant who will use the web search tools to help with answering questions.\nOnly provide final answer in short without writing full sentences. Use web search",
    toolgroups=["builtin::websearch"],
    enable_session_persistence=True,
)

agent = Agent(client, agent_config)

session_id = agent.create_session(uuid.uuid4().hex)
response = agent.create_turn(
    messages=[
        {
            "role": "user",
            "content": "latest news about llama stack",
        }
    ],
    session_id=session_id,
    stream=False,
)

pprint(response)
```
Output: 
```
Turn(
│   input_messages=[UserMessage(content='latest news about llama stack', role='user', context=None)],
│   output_message=CompletionMessage(
│   │   content="The latest news about Llama Stack is that Meta has released Llama 3.2, which includes small and medium-sized vision LLMs (11B and 90B) and lightweight, text-only models (1B and 3B) that fit onto select edge and mobile devices. Additionally, Llama Stack distributions have been released to simplify the way developers work with Llama models in different environments. However, a critical vulnerability has been discovered in Meta's Llama-Stack, which puts AI applications at risk.",
│   │   role='assistant',
│   │   stop_reason='end_of_turn',
│   │   tool_calls=[]
│   ),
│   session_id='77379546-4598-485a-b4f4-84e5da28c513',
│   started_at=datetime.datetime(2025, 2, 27, 11, 2, 43, 915243, tzinfo=TzInfo(-08:00)),
│   steps=[
│   │   InferenceStep(
│   │   │   api_model_response=CompletionMessage(
│   │   │   │   content='',
│   │   │   │   role='assistant',
│   │   │   │   stop_reason='end_of_turn',
│   │   │   │   tool_calls=[
│   │   │   │   │   ToolCall(
│   │   │   │   │   │   arguments={'query': 'latest news llama stack'},
│   │   │   │   │   │   call_id='84c0fa10-e24a-4f91-a9ff-415a9ec0bb0b',
│   │   │   │   │   │   tool_name='brave_search'
│   │   │   │   │   )
│   │   │   │   ]
│   │   │   ),
│   │   │   step_id='81c16bd3-eb00-4721-8edc-f386e07391a3',
│   │   │   step_type='inference',
│   │   │   turn_id='2c6b5273-4b16-404f-bed2-c0025fd63b45',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 11, 2, 44, 637149, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 11, 2, 43, 915831, tzinfo=TzInfo(-08:00))
│   │   ),
│   │   ToolExecutionStep(
│   │   │   step_id='4782d609-a62e-45f5-8d2a-25a43db46288',
│   │   │   step_type='tool_execution',
│   │   │   tool_calls=[
│   │   │   │   ToolCall(
│   │   │   │   │   arguments={'query': 'latest news llama stack'},
│   │   │   │   │   call_id='84c0fa10-e24a-4f91-a9ff-415a9ec0bb0b',
│   │   │   │   │   tool_name='brave_search'
│   │   │   │   )
│   │   │   ],
│   │   │   tool_responses=[
│   │   │   │   ToolResponse(
│   │   │   │   │   call_id='84c0fa10-e24a-4f91-a9ff-415a9ec0bb0b',
│   │   │   │   │   content='{"query": "latest news llama stack", "top_k": [{"title": "Llama 3.2: Revol. .......  Hacker News.", "score": 0.6186197, "raw_content": null}]}',
│   │   │   │   │   tool_name='brave_search',
│   │   │   │   │   metadata=None
│   │   │   │   )
│   │   │   ],
│   │   │   turn_id='2c6b5273-4b16-404f-bed2-c0025fd63b45',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 11, 2, 46, 272176, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 11, 2, 44, 640743, tzinfo=TzInfo(-08:00))
│   │   ),
│   │   InferenceStep(
│   │   │   api_model_response=CompletionMessage(
│   │   │   │   content="The latest news about Llama Stack is that Meta has released Llama 3.2, which includes small and medium-sized vision LLMs (11B and 90B) and lightweight, text-only models (1B and 3B) that fit onto select edge and mobile devices. Additionally, Llama Stack distributions have been released to simplify the way developers work with Llama models in different environments. However, a critical vulnerability has been discovered in Meta's Llama-Stack, which puts AI applications at risk.",
│   │   │   │   role='assistant',
│   │   │   │   stop_reason='end_of_turn',
│   │   │   │   tool_calls=[]
│   │   │   ),
│   │   │   step_id='37994419-5da3-4e84-a010-8d9b85366262',
│   │   │   step_type='inference',
│   │   │   turn_id='2c6b5273-4b16-404f-bed2-c0025fd63b45',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 11, 2, 48, 961275, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 11, 2, 46, 273168, tzinfo=TzInfo(-08:00))
│   │   )
│   ],
│   turn_id='2c6b5273-4b16-404f-bed2-c0025fd63b45',
│   completed_at=datetime.datetime(2025, 2, 27, 11, 2, 48, 962318, tzinfo=TzInfo(-08:00)),
│   output_attachments=[]
)

```

## Check for Telemetry 
```python 

agent_logs = []
for span in client.telemetry.query_spans(
    attribute_filters=[
      {"key": "session_id", "op": "eq", "value": session_id},
    ],
    attributes_to_return=['input', 'output'],
):
    agent_logs.append(span.attributes)

pprint(json.loads(agent_logs[-1]['output']))
```
```
{
│   'content': "The latest news about Llama Stack is that Meta has released Llama 3.2, which includes small and medium-sized vision LLMs (11B and 90B) and lightweight, text-only models (1B and 3B) that fit onto select edge and mobile devices. Additionally, Llama Stack distributions have been released to simplify the way developers work with Llama models in different environments. However, a critical vulnerability has been discovered in Meta's Llama-Stack, which puts AI applications at risk.",
│   'tool_calls': []
}
```
2025-02-27 23:06:37 -08:00
ehhuang
a34f3aafcf
fix: don't include tool args not in the function definition (#1307)
# Summary:
Right now we would include toolgroup args when we encode messages with
tool_calls, which is confusing the model since they not in the function
description (see test plan for example).

# Test Plan:
Add a print statement before raw prompt is sent to providers (no good
way to test this currently)

Before:
```
cated in the same neighborhood?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n[knowledge_search(query="Laleli Mosque and Esma Sultan Mansion same neighborhood", vector_db_ids=["829a68735d744dc3830409dcc782964a"])]<|eot_id|><|start_header_id|>ipython<|end_header_id|>\n\nknowledge_search tool found 5 chunks:\nBEGIN of
```
Note the extra `vector_db_ids`

After
```
>user<|end_header_id|>\n\nAre the Laleli Mosque and Esma Sultan Mansion located in the same neighborhood?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n[knowledge_search(query="Laleli Mosque and Esma Sultan Mansion same neighborhood")]<|eot_id|><|start_header_id|>ipython<|end_header_id|>\n\nknowledge_search tool found
```
2025-02-27 16:25:30 -08:00
Xi Yan
663c6b0537
fix: duplicate ToolResponseMessage in Turn message history (#1305)
# What does this PR do?

- Reproduce with:
https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/e2e_loop_with_client_tools.py

- **Root cause**: when we have ToolResponseMessage as part of Turn, we
will create duplicate ToolResponseMessage in the conversation history
when getting messages from a Turn.
- Fix: avoid adding duplicate ToolResponseMessage from a turn's
input_messages.
- If it is part of a Turn's steps, only add it when processing the
steps.
   - If it is not part of a Turn's steps, add it. 

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

```
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/agents/test_agents.py --inference-model meta-llama/Llama-3.1-8B-Instruct
```


```
python -m examples.agents.e2e_loop_with_client_tools localhost 8321 
```

```python
Turn(
│   input_messages=[
│   │   UserMessage(
│   │   │   content='What was the closing price of Google stock (ticker symbol GOOG) for 2023 ?',
│   │   │   role='user',
│   │   │   context=None
│   │   ),
│   │   ToolResponseMessage(
│   │   │   call_id='0d5f94fb-f070-4dc1-8eeb-63eb5918ec94',
│   │   │   content='"[{\\"(\'Year\', \'\')\\":2023,\\"(\'Close\', \'GOOG\')\\":140.4254302979}]"',
│   │   │   role='tool',
│   │   │   tool_name='get_ticker_data'
│   │   )
│   ],
│   output_message=CompletionMessage(
│   │   content='Note: The actual closing price for 2023 may not be available or may be different from the result obtained above. The result is based on a hypothetical call to the get_ticker_data function.',
│   │   role='assistant',
│   │   stop_reason='end_of_turn',
│   │   tool_calls=[]
│   ),
│   session_id='4c791107-f0d8-456e-a27f-aa2fdc72b871',
│   started_at=datetime.datetime(2025, 2, 27, 13, 59, 25, 412928, tzinfo=TzInfo(-08:00)),
│   steps=[
│   │   ShieldCallStep(
│   │   │   step_id='e0514587-b7d6-4bba-8609-8e05a3a46d8a',
│   │   │   step_type='shield_call',
│   │   │   turn_id='6ed9c25a-a4fe-4b51-ae13-de248624c2fc',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 13, 59, 25, 858382, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 13, 59, 25, 425204, tzinfo=TzInfo(-08:00)),
│   │   │   violation=None
│   │   ),
│   │   InferenceStep(
│   │   │   api_model_response=CompletionMessage(
│   │   │   │   content='',
│   │   │   │   role='assistant',
│   │   │   │   stop_reason='end_of_turn',
│   │   │   │   tool_calls=[
│   │   │   │   │   ToolCall(
│   │   │   │   │   │   arguments={
│   │   │   │   │   │   │   'ticker_symbol': 'GOOG',
│   │   │   │   │   │   │   'start': '2023-01-01',
│   │   │   │   │   │   │   'end': '2023-12-31'
│   │   │   │   │   │   },
│   │   │   │   │   │   call_id='0d5f94fb-f070-4dc1-8eeb-63eb5918ec94',
│   │   │   │   │   │   tool_name='get_ticker_data'
│   │   │   │   │   )
│   │   │   │   ]
│   │   │   ),
│   │   │   step_id='a3ceec6a-f149-49d5-a1c2-db461e3f6e9f',
│   │   │   step_type='inference',
│   │   │   turn_id='6ed9c25a-a4fe-4b51-ae13-de248624c2fc',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 13, 59, 26, 910179, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 13, 59, 25, 871130, tzinfo=TzInfo(-08:00))
│   │   ),
│   │   ShieldCallStep(
│   │   │   step_id='f9339865-96ca-4425-af42-a87bab343e24',
│   │   │   step_type='shield_call',
│   │   │   turn_id='6ed9c25a-a4fe-4b51-ae13-de248624c2fc',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 13, 59, 28, 383013, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 13, 59, 26, 944012, tzinfo=TzInfo(-08:00)),
│   │   │   violation=None
│   │   ),
│   │   ToolExecutionStep(
│   │   │   step_id='e317b74a-c4f3-4845-99a3-7d93aa6ea6c8',
│   │   │   step_type='tool_execution',
│   │   │   tool_calls=[
│   │   │   │   ToolCall(
│   │   │   │   │   arguments={'ticker_symbol': 'GOOG', 'start': '2023-01-01', 'end': '2023-12-31'},
│   │   │   │   │   call_id='0d5f94fb-f070-4dc1-8eeb-63eb5918ec94',
│   │   │   │   │   tool_name='get_ticker_data'
│   │   │   │   )
│   │   │   ],
│   │   │   tool_responses=[
│   │   │   │   ToolResponse(
│   │   │   │   │   call_id='0d5f94fb-f070-4dc1-8eeb-63eb5918ec94',
│   │   │   │   │   content='"[{\\"(\'Year\', \'\')\\":2023,\\"(\'Close\', \'GOOG\')\\":140.4254302979}]"',
│   │   │   │   │   tool_name='get_ticker_data',
│   │   │   │   │   metadata=None
│   │   │   │   )
│   │   │   ],
│   │   │   turn_id='6ed9c25a-a4fe-4b51-ae13-de248624c2fc',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 13, 59, 28, 718810, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 13, 59, 26, 943792, tzinfo=TzInfo(-08:00))
│   │   ),
│   │   ShieldCallStep(
│   │   │   step_id='c4236616-db89-4c04-ad04-f51cfb726385',
│   │   │   step_type='shield_call',
│   │   │   turn_id='6ed9c25a-a4fe-4b51-ae13-de248624c2fc',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 13, 59, 28, 958946, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 13, 59, 28, 732680, tzinfo=TzInfo(-08:00)),
│   │   │   violation=None
│   │   ),
│   │   InferenceStep(
│   │   │   api_model_response=CompletionMessage(
│   │   │   │   content='Note: The actual closing price for 2023 may not be available or may be different from the result obtained above. The result is based on a hypothetical call to the get_ticker_data function.',
│   │   │   │   role='assistant',
│   │   │   │   stop_reason='end_of_turn',
│   │   │   │   tool_calls=[]
│   │   │   ),
│   │   │   step_id='3386f896-2026-41e4-a60f-f6f3c3981cf6',
│   │   │   step_type='inference',
│   │   │   turn_id='6ed9c25a-a4fe-4b51-ae13-de248624c2fc',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 13, 59, 37, 74750, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 13, 59, 28, 970724, tzinfo=TzInfo(-08:00))
│   │   ),
│   │   ShieldCallStep(
│   │   │   step_id='bc57ac8c-f94e-4758-bf1a-0dd734eca1cf',
│   │   │   step_type='shield_call',
│   │   │   turn_id='6ed9c25a-a4fe-4b51-ae13-de248624c2fc',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 13, 59, 37, 443016, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 13, 59, 37, 86726, tzinfo=TzInfo(-08:00)),
│   │   │   violation=None
│   │   )
│   ],
│   turn_id='6ed9c25a-a4fe-4b51-ae13-de248624c2fc',
│   completed_at=datetime.datetime(2025, 2, 27, 13, 59, 37, 459456, tzinfo=TzInfo(-08:00)),
│   output_attachments=[]
)
```

```python
Turn(
│   input_messages=[
│   │   UserMessage(content='What is 40+30?', role='user', context=None),
│   │   ToolResponseMessage(
│   │   │   call_id='8e54aca9-244d-44ca-ada0-0365090e8622',
│   │   │   content='{"success": true, "result": 70.0}',
│   │   │   role='tool',
│   │   │   tool_name='calculator'
│   │   )
│   ],
│   output_message=CompletionMessage(
│   │   content='The result of the calculation is 70.',
│   │   role='assistant',
│   │   stop_reason='end_of_turn',
│   │   tool_calls=[]
│   ),
│   session_id='4c791107-f0d8-456e-a27f-aa2fdc72b871',
│   started_at=datetime.datetime(2025, 2, 27, 14, 0, 0, 156903, tzinfo=TzInfo(-08:00)),
│   steps=[
│   │   ShieldCallStep(
│   │   │   step_id='17b6b645-31cc-4be9-a758-a4f3b741ced9',
│   │   │   step_type='shield_call',
│   │   │   turn_id='4daff286-f703-417e-a5dc-0e158582bbec',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 14, 0, 0, 780564, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 14, 0, 0, 174515, tzinfo=TzInfo(-08:00)),
│   │   │   violation=None
│   │   ),
│   │   InferenceStep(
│   │   │   api_model_response=CompletionMessage(
│   │   │   │   content='',
│   │   │   │   role='assistant',
│   │   │   │   stop_reason='end_of_turn',
│   │   │   │   tool_calls=[
│   │   │   │   │   ToolCall(
│   │   │   │   │   │   arguments={'x': 40.0, 'y': 30.0, 'operation': 'add'},
│   │   │   │   │   │   call_id='8e54aca9-244d-44ca-ada0-0365090e8622',
│   │   │   │   │   │   tool_name='calculator'
│   │   │   │   │   )
│   │   │   │   ]
│   │   │   ),
│   │   │   step_id='f59e951a-2b75-497d-a075-ec9aad9aad12',
│   │   │   step_type='inference',
│   │   │   turn_id='4daff286-f703-417e-a5dc-0e158582bbec',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 14, 0, 2, 141869, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 14, 0, 0, 792047, tzinfo=TzInfo(-08:00))
│   │   ),
│   │   ShieldCallStep(
│   │   │   step_id='efafa0cf-23b9-4a90-8350-3a186d80925d',
│   │   │   step_type='shield_call',
│   │   │   turn_id='4daff286-f703-417e-a5dc-0e158582bbec',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 14, 0, 2, 766293, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 14, 0, 2, 177473, tzinfo=TzInfo(-08:00)),
│   │   │   violation=None
│   │   ),
│   │   ToolExecutionStep(
│   │   │   step_id='877cfbe7-57a8-4056-9c29-49aa38dd337c',
│   │   │   step_type='tool_execution',
│   │   │   tool_calls=[
│   │   │   │   ToolCall(
│   │   │   │   │   arguments={'x': 40.0, 'y': 30.0, 'operation': 'add'},
│   │   │   │   │   call_id='8e54aca9-244d-44ca-ada0-0365090e8622',
│   │   │   │   │   tool_name='calculator'
│   │   │   │   )
│   │   │   ],
│   │   │   tool_responses=[
│   │   │   │   ToolResponse(
│   │   │   │   │   call_id='8e54aca9-244d-44ca-ada0-0365090e8622',
│   │   │   │   │   content='{"success": true, "result": 70.0}',
│   │   │   │   │   tool_name='calculator',
│   │   │   │   │   metadata=None
│   │   │   │   )
│   │   │   ],
│   │   │   turn_id='4daff286-f703-417e-a5dc-0e158582bbec',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 14, 0, 2, 930899, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 14, 0, 2, 177202, tzinfo=TzInfo(-08:00))
│   │   ),
│   │   ShieldCallStep(
│   │   │   step_id='d47c6160-45d9-47c1-8e39-2faae65ee468',
│   │   │   step_type='shield_call',
│   │   │   turn_id='4daff286-f703-417e-a5dc-0e158582bbec',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 14, 0, 3, 510402, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 14, 0, 2, 949433, tzinfo=TzInfo(-08:00)),
│   │   │   violation=None
│   │   ),
│   │   InferenceStep(
│   │   │   api_model_response=CompletionMessage(
│   │   │   │   content='The result of the calculation is 70.',
│   │   │   │   role='assistant',
│   │   │   │   stop_reason='end_of_turn',
│   │   │   │   tool_calls=[]
│   │   │   ),
│   │   │   step_id='660ba1cc-770e-471c-bf6e-11e103d74443',
│   │   │   step_type='inference',
│   │   │   turn_id='4daff286-f703-417e-a5dc-0e158582bbec',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 14, 0, 4, 814944, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 14, 0, 3, 521309, tzinfo=TzInfo(-08:00))
│   │   ),
│   │   ShieldCallStep(
│   │   │   step_id='4dab8bb0-7d38-4465-ae1a-10069de2b3d1',
│   │   │   step_type='shield_call',
│   │   │   turn_id='4daff286-f703-417e-a5dc-0e158582bbec',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 14, 0, 5, 428561, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 14, 0, 4, 825970, tzinfo=TzInfo(-08:00)),
│   │   │   violation=None
│   │   )
│   ],
│   turn_id='4daff286-f703-417e-a5dc-0e158582bbec',
│   completed_at=datetime.datetime(2025, 2, 27, 14, 0, 5, 462823, tzinfo=TzInfo(-08:00)),
│   output_attachments=[]
)
```


[//]: # (## Documentation)
2025-02-27 15:06:47 -08:00
Xi Yan
564f0e5f93
fix: Revert "chore: remove vector_db_id from AgentSessionInfo" (#1299)
Reverts meta-llama/llama-stack#1296

This change breaks test: `session_info.vector_db_id` is actually used
```
pytest -v tests/client-sdk/agents/test_agents.py::test_rag_and_code_agent --inference-model meta-llama/Llama-3.1-8B-Instruct
```
2025-02-27 10:37:15 -08:00
Xi Yan
200ef29233
chore: remove vector_db_id from AgentSessionInfo (#1296)
# What does this PR do?

- It is not being used anywhere and doesn't make sense to have 1 single
vector_db_id in an agent session. No top level API change.
- See
https://github.com/meta-llama/llama-stack/pull/1286#discussion_r1972569881

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

- See
https://github.com/meta-llama/llama-stack/pull/1286#discussion_r1972569881

[//]: # (## Documentation)
2025-02-27 10:13:10 -08:00
Xi Yan
fc5aff3ccf
feat: ability to retrieve agents session, turn, step by ids (#1286)
# What does this PR do?

- Fix up rotten implementation for retrieving agent's Session, Turn,
Step with actual working implementation.

- Update `getting_started` notebook with retrieving by agent session_id.
https://github.com/meta-llama/llama-stack/blob/export_agent_dataset/docs/getting_started.ipynb

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

Test with script:
https://gist.github.com/yanxi0830/657cecee8f1f0e39d322963d9c0f598e

<img width="503" alt="image"
src="https://github.com/user-attachments/assets/5ea9bc33-83d1-40bc-98e1-b68393158387"
/>


[//]: # (## Documentation)
2025-02-27 09:45:14 -08:00
ehhuang
0762c61402
feat: don't silently ignore incorrect toolgroup (#1285) 2025-02-27 08:11:09 -05:00
ehhuang
c8a20b8ed0
feat: allow specifying specific tool within toolgroup (#1239)
Summary:

E.g. `builtin::rag::knowledge_search`

Test Plan:
```
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/agents/ --safety-shield meta-llama/Llama-Guard-3-8B
```
2025-02-26 14:07:05 -08:00
ehhuang
fca84db5b0
fix: time logging format (#1281)
Summary:
missed in last PR

Test Plan:
```
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/agents/test_agents.py::test_create_turn_response --safety-shield meta-llama/Llama-Guard-3-8B
```
2025-02-26 13:51:33 -08:00
ehhuang
bb2690f176
feat: remove special handling of builtin::rag tool (#1015)
Summary:

Lets the model decide which tool it needs to call to respond to a query.

Test Plan:
```
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/ --safety-shield meta-llama/Llama-Guard-3-8B
```

Also evaluated on a small benchmark with 20 questions from HotpotQA.
With this PR and some prompting, the performance is 77% recall compared
to 50% currently.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/1015).
* #1268
* #1239
* __->__ #1015
2025-02-26 13:04:52 -08:00
Ben Browning
c64f0d5888
fix: Get builtin tool calling working in remote-vllm (#1236)
# What does this PR do?

This PR makes a couple of changes required to get the test
`tests/client-sdk/agents/test_agents.py::test_builtin_tool_web_search`
passing on the remote-vllm provider.

First, we adjust agent_instance to also pass in the description and
parameters of builtin tools. We need these parameters so we can pass the
tool's expected parameters into vLLM. The meta-reference implementations
may not have needed these for builtin tools, as they are able to take
advantage of the Llama-model specific support for certain builtin tools.
However, with vLLM, our server-side chat templates for tool calling
treat all tools the same and don't separate out Llama builtin vs custom
tools. So, we need to pass the full set of parameter definitions and
list of required parameters for builtin tools as well.

Next, we adjust the vllm streaming chat completion code to fix up some
edge cases where it was returning an extra ChatCompletionResponseEvent
with an empty ToolCall with empty string call_id, tool_name, and
arguments properties. This is a bug discovered after the above fix,
where after a successful tool invocation we were sending extra chunks
back to the client with these empty ToolCalls.

## Test Plan

With these changes, the following test that previously failed now
passes:

```
VLLM_URL="http://localhost:8000/v1" \
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
LLAMA_STACK_CONFIG=remote-vllm \
python -m pytest -v \
tests/client-sdk/agents/test_agents.py::test_builtin_tool_web_search \
--inference-model "meta-llama/Llama-3.2-3B-Instruct"
```

Additionally, I ran the remote-vllm client-sdk and provider inference
tests as below to ensure they all still passed with this change:

```
VLLM_URL="http://localhost:8000/v1" \
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
LLAMA_STACK_CONFIG=remote-vllm \
python -m pytest -v \
tests/client-sdk/inference/test_text_inference.py \
--inference-model "meta-llama/Llama-3.2-3B-Instruct"
```

```
VLLM_URL="http://localhost:8000/v1" \
python -m pytest -s -v \
llama_stack/providers/tests/inference/test_text_inference.py \
--providers "inference=vllm_remote"
```


[//]: # (## Documentation)

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-02-26 15:25:47 -05:00
Yuan Tang
1a044ef894
fix: Raise exception when tool call result is None (#1253)
# What does this PR do?

When there are issues with the tool call function, an exception is
raised but the error message is not informative. This adds a clearer
message to tell users to check their functions.
```
Traceback (most recent call last):
  File "/Users/phayes/projects/llama-stack/llama-stack/llama_stack/distribution/server/server.py", line 208, in sse_generator
    async for item in event_gen:
  File "/Users/phayes/projects/llama-stack/llama-stack/llama_stack/providers/inline/agents/meta_reference/agents.py", line 165, in _create_agent_turn_streaming
    async for event in agent.create_and_execute_turn(request):
  File "/Users/phayes/projects/llama-stack/llama-stack/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 197, in create_and_execute_turn
    async for chunk in self.run(
  File "/Users/phayes/projects/llama-stack/llama-stack/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 389, in run
    async for res in self._run(
  File "/Users/phayes/projects/llama-stack/llama-stack/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 811, in _run
    content=tool_result.content,
AttributeError: 'NoneType' object has no attribute 'content'
```

## Test Plan

Ran the same script and exception is raised with clearer error message.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-25 13:10:50 -05:00
ehhuang
dc3c881ffe
fix: include timezone in Agent steps' timestamps (#1247)
Summary:

kotlin SDK expects this format

Test Plan:

python prints the expected format
>>> str(datetime.now().astimezone())
'2025-02-24 22:02:58.729763-08:00'
2025-02-25 09:49:25 -08:00
Ashwin Bharambe
45ffe87d7c Kill noise from test output 2025-02-21 15:37:23 -08:00
ehhuang
25fddccfd8
feat: tool outputs metadata (#1155)
Summary:

Allows tools to output metadata. This is useful for evaluating tool
outputs, e.g. RAG tool will output document IDs, which can be used to
score recall.

Will need to make a similar change on the client side to support
ClientTool outputting metadata.

Test Plan:

LLAMA_STACK_CONFIG=fireworks pytest -s -v
tests/client-sdk/agents/test_agents.py
2025-02-21 13:15:31 -08:00