Commit graph

28 commits

Author SHA1 Message Date
Sébastien Han
1a529705da
chore: more mypy fixes (#2029)
# What does this PR do?

Mainly tried to cover the entire llama_stack/apis directory, we only
have one left. Some excludes were just noop.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-06 09:52:31 -07:00
Michael Clifford
fe9b5ef08b
fix: tools page on playground resets agent after every interaction (#2044)
# What does this PR do?

This PR updates how the `AgentType` gets set using the radio button on
the tools page of the playground. This change is needed due to the fact
with its current implementation, the chat interface will resets after
every input, preventing users from having a multi-turn conversation with
the agent.

## Test Plan

Run the Playground without these changes:
```bash
streamlit run llama_stack/distribution/ui/app.py
```
Navigate to the tools page and attempt to have a multi-turn
conversation. You should see the conversation reset after asking a
second question.

Repeat the steps above with these changes and you will see that it works
as expected when asking the agent multiple questions.

Signed-off-by: Michael Clifford <mcliffor@redhat.com>
2025-04-28 23:13:27 +02:00
Andy Xie
f5dae0517c
feat: Support ReAct Agent on Tools Playground (#2012)
# What does this PR do?
ReAct prompting attempts to use the Thinking, Action, Observation loop
to improve the model's reasoning ability via prompt engineering.

With this PR, it now supports the various features in Streamlit's
playground:
1. Adding the selection box for choosing between Agent Type: normal,
ReAct.
2. Adding the Thinking, Action, Observation loop streamlit logic for
ReAct agent, as seen in many LLM clients.
3. Improving tool calling accuracies via ReAct prompting, e.g. using
web_search.


**Folded**
![react_output_folded
png](https://github.com/user-attachments/assets/bf1bdce7-e6ef-455d-b6b0-c22a64e9d5c1)

**Collapsed**

![react_output_collapsed](https://github.com/user-attachments/assets/cda2fc17-df0b-400d-971c-988de821f2a4)

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
Run the playground and uses reasoning prompts to see for yourself. Steps
to test the ReAct agent mode:
1. Setup a llama-stack server as
[getting_started](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html)
describes.
2. Setup your Web Search API keys under
`llama_stack/distribution/ui/modules/api.py`.
3. Run the streamlit playground and try ReAct agent, possibly with
`websearch`, with the command: `streamlit run
llama_stack/distribution/ui/app.py`.

## Test Process
Current results are demonstrated with `llama-3.2-3b-instruct`. Results
will vary with different models.

You should be seeing clear distinction with normal agent and ReAct
agent. Example prompts listed below:
1. Aside from the Apple Remote, what other devices can control the
program Apple Remote was originally designed to interact with?
2. What is the elevation range for the area that the eastern sector of
the Colorado orogeny extends into?

## Example Test Results

**Web search on AppleTV**
<img width="1440" alt="normal_output_appletv"
src="https://github.com/user-attachments/assets/bf6b3273-1c94-4976-8b4a-b2d82fe41330"
/>

<img width="1440" alt="react_output_appletv"
src="https://github.com/user-attachments/assets/687f1feb-88f4-4d32-93d5-5013d0d5fe25"
/>

**Web search on Colorado**
<img width="1440" alt="normal_output_colorado"
src="https://github.com/user-attachments/assets/10bd3ad4-f2ad-466d-9ce0-c66fccee40c1"
/>

<img width="1440" alt="react_output_colorado"
src="https://github.com/user-attachments/assets/39cfd82d-2be9-4e2f-9f90-a2c4840185f7"
/>

**Web search tool + MCP Slack server**
<img width="1250" alt="normal_output_search_slack png"
src="https://github.com/user-attachments/assets/72e88125-cdbf-4a90-bcb9-ab412c51d62d"
/>

<img width="1217" alt="react_output_search_slack"
src="https://github.com/user-attachments/assets/8ae04efb-a4fd-49f6-9465-37dbecb6b73e"
/>


![slack_screenshot](https://github.com/user-attachments/assets/bb70e669-6067-462a-bdf6-7aaac6ccbcef)
2025-04-25 17:01:51 +02:00
Surya Prakash Pathak
59b7593609
feat: Enhance tool display in Tools sidebar by simplifying tool identifiers (#2024)
# What does this PR do?
This PR improves the Tools page in the LlamaStack Playground UI by
enhancing the readability of the active tool list shown in the sidebar.
- Previously, active tools were displayed in a flat JSON array with
verbose identifiers (e.g., builtin::code_interpreter:code_interpreter).
- This PR updates the logic to group tools by their toolgroup (e.g.,
builtin::websearch) and renders each tool name in a simplified,
human-readable format (e.g., web_search).
- This change improves usability when working with multiple toolgroups,
especially in configurations involving MCP tools or complex tool
identifiers.

Before and After Comparison:
**Before**
![Screenshot 2025-04-24 at 1 05
47 PM](https://github.com/user-attachments/assets/44843a79-49dc-4b4d-ab28-c6187f9bb5ba)

**After**
![Screenshot 2025-04-24 at 1 24
08 PM](https://github.com/user-attachments/assets/ebb01006-e0a9-4664-a95a-e6f72eea6f94)

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
- Followed the [LlamaStack UI Developer Setup
instructions](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/distribution/ui)
- Ran the Streamlit UI via: `uv run --with "[.ui]" streamlit run
llama_stack/distribution/ui/app.py`
- Selected multiple built-in toolgroups (e.g., code_interpreter,
websearch, wolfram_alpha) from the sidebar.

[//]: # (## Documentation)
2025-04-25 10:22:22 +02:00
Michael Clifford
64f747fe09
feat: add tool name to chat output in playground (#1996)
# What does this PR do?
This PR adds the name of the tool that is used by the agent on the
"tools" page of the playground. See image below for an example.

![Screenshot 2025-04-18 at 3 14
18 PM](https://github.com/user-attachments/assets/04e97783-4003-4121-9446-9e0ad7209256)

## Test Plan

Run the playground and navigate to the tools page. There users can see
that this additional text is present when tools are invoked and absent
when they are not.
```
streamlit run llama_stack/distribution/ui/app.py
```

Signed-off-by: Michael Clifford <mcliffor@redhat.com>
2025-04-23 15:57:54 +02:00
Ilya Kolchinsky
d39462d073
feat: Hide tool output under an expander in Playground UI (#2003)
# What does this PR do?
Now, tool outputs and retrieved chunks from the vector DB (i.e.,
everything except for the actual model reply) are hidden under an
expander form when presented to the user.

# Test Plan
Navigate to the RAG page in the Playground UI.
2025-04-23 15:32:12 +02:00
Michael Clifford
e4d001c4e4
feat: cleanup sidebar formatting on tools playground (#1998)
# What does this PR do?

This PR cleans up the sidebar on the tools page of the playground in the
following ways:
* created a clearer hierarchy of configuration options and tool
selections.
* Removed the `mcp::` or `builtin::` prefixes from the tool selection
buttons.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

Run the playground and see the updated sidebar does not cause any new
errors.
```
streamlit run llama_stack/distribution/ui/app.py  
```
[//]: # (## Documentation)

Signed-off-by: Michael Clifford <mcliffor@redhat.com>
2025-04-22 10:40:37 +02:00
Michael Clifford
f12011794b
fix: Updated tools playground to allow vdb selection (#1960)
# What does this PR do?

This PR lets users select an existing vdb to use with their agent on the
tools page of the playground. The drop down menu that lets users select
a vdb only appears when the rag tool is selected. Without this change,
there is no way for a user to specify which vdb they want their rag tool
to use on the tools page. I have intentionally left the RAG options
sparse here since the full RAG options are exposed on the RAG page.

## Test Plan

Without these changes the RAG tool will throw the following error:
`name: knowledge_search) does not have any content `

With these changes the RAG tool works as expected.

Signed-off-by: Michael Clifford <mcliffor@redhat.com>
2025-04-17 09:29:40 +02:00
Michael Clifford
093881071a
fix: add max_tokens slider to playground tools page (#1958)
# What does this PR do?

This PR adds a `max_tokens` slider to playground tools page. I have
found that in some instances the llama stack server throws a 500 error
if the max_tokens value is not explicitly set in the agent's
`sampling_params`. This PR, uses the same implementation of the
`max_tokens` slider from the chat page, and includes it on the tools
page.


## Test Plan
1. Attempting to call a tool without these changes results in a `500:
Internal server error: An unexpected error occurred`.
2. Attempting to call a tool with these changes results in the expected
output.

Signed-off-by: Michael Clifford <mcliffor@redhat.com>
2025-04-15 09:11:08 -07:00
Ilya Kolchinsky
40f41af2f7
feat: Add a direct (non-agentic) RAG option to the Playground RAG page (#1940)
# What does this PR do?
This PR makes it possible to switch between agentic and non-agentic RAG
when running the respective Playground page.
When non-agentic RAG is selected, user queries are answered by directly
querying the vector DB, augmenting the prompt, and sending the extended
prompt to the model via Inference API.

## Test Plan
- Launch the Playground and go to the RAG page;
- Select the vector DB ID;
- Adjust other configuration parameters if necessary;
- Set the radio button to Agent-based RAG;
- Send a message to the chat;
- The query will be answered by an agent using the knowledge search tool
as indicated by the output;
- Click the 'Clear Chat' button to make it possible to switch modes;
- Send a message to the chat again;
- This time, the query will be answered by the model directly as can be
deduced from the reply.
2025-04-11 10:16:10 -07:00
Ilya Kolchinsky
79fc81f78f
fix: Playground RAG page errors (#1928)
# What does this PR do?
This PR fixes two issues with the RAG page of the Playground UI:

1. When the user modifies a configurable setting via a widget (e.g.,
system prompt, temperature, etc.), the agent is not recreated. Thus, the
change has no effect and the user gets no indication of that.
2. After the first issue is fixed, it becomes possible to recreate the
agent mid-conversation or even mid-generation. To mitigate this, widgets
related to agent configuration are now disabled when a conversation is
in progress (i.e., when the chat is non-empty). They are automatically
enabled again when the user resets the chat history.

## Test Plan

- Launch the Playground and go to the RAG page;
- Select the vector DB ID;
- Send a message to the agent via the chat;
- The widgets in charge of the agent parameters will become disabled at
this point;
- Send a second message asking the model about the content of the first
message;
- The reply will indicate that the two messages were sent over the same
session, that is, the agent was not recreated;
- Click the 'Clear Chat' button;
- All widgets will be enabled and a new agent will be created (which can
be validated by sending another message).
2025-04-10 13:38:31 -07:00
Michael Clifford
9657105304
feat: Add tools page to playground (#1904)
# What does this PR do?

This PR adds an additional page to the playground called "Tools". This
page connects to a llama-stack server and lists all the available LLM
models, builtin tools and MCP tools in the sidebar. Users can select
whatever combination of model and tools they want from the sidebar for
their agent. Once the selections are made, users can chat with their
agent similarly to the RAG page and test out agent tool use.

closes #1902 

## Test Plan

Ran the following commands with a llama-stack server and the updated
playground worked as expected.
```
export LLAMA_STACK_ENDPOINT="http://localhost:8321"     
streamlit run  llama_stack/distribution/ui/app.py
```

[//]: # (## Documentation)

Signed-off-by: Michael Clifford <mcliffor@redhat.com>
2025-04-09 15:26:52 +02:00
Michael Clifford
c6e93e32f6
feat: Updated playground rag to use session id for persistent conversation (#1870)
# What does this PR do?

This PR updates the [playground RAG
example](llama_stack/distribution/ui/page/playground/rag.py) so that the
agent is able to use its builtin conversation history. Here we are using
streamlit's `cache_resource` functionality to prevent the agent from
re-initializing after every interaction as well as storing its
session_id in the `session_state`. This allows the agent in the RAG
example to behave more closely to how it works using the python-client
directly.

[//]: # (If resolving an issue, uncomment and update the line below)
Closes #1869 

## Test Plan

Without these changes, if you ask it "What is 2 + 2"? followed by the
question "What did I just ask?" It will provide an obviously incorrect
answer.

With these changes, you can ask the same series of questions and it will
provide the correct answer.

[//]: # (## Documentation)

Signed-off-by: Michael Clifford <mcliffor@redhat.com>
2025-04-08 09:46:13 +02:00
Francisco Arceo
af6594f670
fix: Adding chunk_size_in_tokens to playground rag_tool insert (#1826)
# What does this PR do?
Adding chunk_size_in_tokens to playground rag_tool insert.

# Closes #1825 

## Test Plan
Tested locally.

[//]: # (## Documentation)

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-03-28 15:56:25 -04:00
ehhuang
ea6a4a14ce
feat(api): simplify client imports (#1687)
# What does this PR do?
closes #1554 

## Test Plan
test_agents.py
2025-03-20 10:15:49 -07:00
Sarthak Deshpande
9c8e88ea9c
fix: Fixed import errors for UI and playground (#1666)
# What does this PR do?
Fixed import errors for playground and ui

---------

Co-authored-by: sarthakdeshpande <sarthak.deshpande@engati.com>
2025-03-18 15:00:48 -07:00
ehhuang
ca2910d27a
docs: update test_agents to use new Agent SDK API (#1402)
# Summary:
new Agent SDK API is added in
https://github.com/meta-llama/llama-stack-client-python/pull/178

Update docs and test to reflect this.

Closes https://github.com/meta-llama/llama-stack/issues/1365

# Test Plan:
```bash
py.test -v -s --nbval-lax ./docs/getting_started.ipynb

LLAMA_STACK_CONFIG=fireworks \
   pytest -s -v tests/integration/agents/test_agents.py \
  --safety-shield meta-llama/Llama-Guard-3-8B --text-model meta-llama/Llama-3.1-8B-Instruct
```
2025-03-06 15:21:12 -08:00
Sébastien Han
6fa257b475
chore(lint): update Ruff ignores for project conventions and maintainability (#1184)
- Added new ignores from flake8-bugbear (`B007`, `B008`)
- Ignored `C901` (high function complexity) for now, pending review
- Maintained PyTorch conventions (`N812`, `N817`)
- Allowed `E731` (lambda assignments) for flexibility
- Consolidated existing ignores (`E402`, `E501`, `F405`, `C408`, `N812`)
- Documented rationale for each ignored rule

This keeps our linting aligned with project needs while tracking
potential fixes.

Signed-off-by: Sébastien Han <seb@redhat.com>

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-28 09:36:49 -08:00
ehhuang
c8a20b8ed0
feat: allow specifying specific tool within toolgroup (#1239)
Summary:

E.g. `builtin::rag::knowledge_search`

Test Plan:
```
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/agents/ --safety-shield meta-llama/Llama-Guard-3-8B
```
2025-02-26 14:07:05 -08:00
Sébastien Han
e4a1579e63
build: format codebase imports using ruff linter (#1028)
# What does this PR do?

- Configured ruff linter to automatically fix import sorting issues.
- Set --exit-non-zero-on-fix to ensure non-zero exit code when fixes are
applied.
- Enabled the 'I' selection to focus on import-related linting rules.
- Ran the linter, and formatted all codebase imports accordingly.
- Removed the black dep from the "dev" group since we use ruff

Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-13 10:06:21 -08:00
Yuan Tang
34ab7a3b6c
Fix precommit check after moving to ruff (#927)
Lint check in main branch is failing. This fixes the lint check after we
moved to ruff in https://github.com/meta-llama/llama-stack/pull/921. We
need to move to a `ruff.toml` file as well as fixing and ignoring some
additional checks.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-02 06:46:45 -08:00
Ashwin Bharambe
f3d8864c36 Rename builtin::memory -> builtin::rag 2025-01-22 20:22:51 -08:00
Ashwin Bharambe
c9e5578151
[memory refactor][5/n] Migrate all vector_io providers (#835)
See https://github.com/meta-llama/llama-stack/issues/827 for the broader
design.

This PR finishes off all the stragglers and migrates everything to the
new naming.
2025-01-22 10:17:59 -08:00
Xi Yan
9d574f4aee
fix playground for v1 (#799)
# What does this PR do?

- update playground callsites for v1 api changes

## Test Plan

```
cd llama_stack/distribution/ui
streamlit run app.py
```


https://github.com/user-attachments/assets/eace11c6-600a-42dc-b4e7-6948a706509f




## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-16 19:32:07 -08:00
Hardik Shah
a51c8b4efc
Convert SamplingParams.strategy to a union (#767)
# What does this PR do?

Cleans up how we provide sampling params. Earlier, strategy was an enum
and all params (top_p, temperature, top_k) across all strategies were
grouped. We now have a strategy union object with each strategy (greedy,
top_p, top_k) having its corresponding params.
Earlier, 
```
class SamplingParams: 
    strategy: enum ()
    top_p, temperature, top_k and other params
```
However, the `strategy` field was not being used in any providers making
it confusing to know the exact sampling behavior purely based on the
params since you could pass temperature, top_p, top_k and how the
provider would interpret those would not be clear.

Hence we introduced -- a union where the strategy and relevant params
are all clubbed together to avoid this confusion.

Have updated all providers, tests, notebooks, readme and otehr places
where sampling params was being used to use the new format.
   

## Test Plan
`pytest llama_stack/providers/tests/inference/groq/test_groq_utils.py`
// inference on ollama, fireworks and together 
`with-proxy pytest -v -s -k "ollama"
--inference-model="meta-llama/Llama-3.1-8B-Instruct"
llama_stack/providers/tests/inference/test_text_inference.py `
// agents on fireworks 
`pytest -v -s -k 'fireworks and create_agent'
--inference-model="meta-llama/Llama-3.1-8B-Instruct"
llama_stack/providers/tests/agents/test_agents.py
--safety-shield="meta-llama/Llama-Guard-3-8B"`

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [X] Ran pre-commit to handle lint / formatting issues.
- [X] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [X] Updated relevant documentation.
- [X] Wrote necessary unit or integration tests.

---------

Co-authored-by: Hardik Shah <hjshah@fb.com>
2025-01-15 05:38:51 -08:00
Xi Yan
75e72cf2fc model_type=llm for filering available models for playground 2024-12-17 19:42:38 -08:00
Xi Yan
af8f1b3531 model selection playground fix 2024-12-17 18:13:52 -08:00
Xi Yan
16769256b7
[llama stack ui] add native eval & inspect distro & playground pages (#541)
# What does this PR do?

New Pages Added: 

- (1) Inspect Distro
- (2) Evaluations: 
  - (a) native evaluations (including generation)
  - (b) application evaluations (no generation, scoring only)
- (3) Playground: 
  - (a) chat
  - (b) RAG  

## Test Plan

```
streamlit run app.py
```

#### Playground

https://github.com/user-attachments/assets/6ca617e8-32ca-49b2-9774-185020ff5204

#### Inspect

https://github.com/user-attachments/assets/01d52b2d-92af-4e3a-b623-a9b8ba22ba99


#### Evaluations (Generation + Scoring)

https://github.com/user-attachments/assets/345845c7-2a2b-4095-960a-9ae40f6a93cf

#### Evaluations (Scoring)

https://github.com/user-attachments/assets/6cc1659f-eba4-49ca-a0a5-7c243557b4f5


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-12-04 09:47:09 -08:00