llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-06-28 19:04:19 +00:00

Author	SHA1	Message	Date
Ilya Kolchinsky	d39462d073	feat: Hide tool output under an expander in Playground UI (#2003) # What does this PR do? Now, tool outputs and retrieved chunks from the vector DB (i.e., everything except for the actual model reply) are hidden under an expander form when presented to the user. # Test Plan Navigate to the RAG page in the Playground UI.	2025-04-23 15:32:12 +02:00
Ilya Kolchinsky	40f41af2f7	feat: Add a direct (non-agentic) RAG option to the Playground RAG page (#1940) # What does this PR do? This PR makes it possible to switch between agentic and non-agentic RAG when running the respective Playground page. When non-agentic RAG is selected, user queries are answered by directly querying the vector DB, augmenting the prompt, and sending the extended prompt to the model via Inference API. ## Test Plan - Launch the Playground and go to the RAG page; - Select the vector DB ID; - Adjust other configuration parameters if necessary; - Set the radio button to Agent-based RAG; - Send a message to the chat; - The query will be answered by an agent using the knowledge search tool as indicated by the output; - Click the 'Clear Chat' button to make it possible to switch modes; - Send a message to the chat again; - This time, the query will be answered by the model directly as can be deduced from the reply.	2025-04-11 10:16:10 -07:00
Ilya Kolchinsky	79fc81f78f	fix: Playground RAG page errors (#1928) # What does this PR do? This PR fixes two issues with the RAG page of the Playground UI: 1. When the user modifies a configurable setting via a widget (e.g., system prompt, temperature, etc.), the agent is not recreated. Thus, the change has no effect and the user gets no indication of that. 2. After the first issue is fixed, it becomes possible to recreate the agent mid-conversation or even mid-generation. To mitigate this, widgets related to agent configuration are now disabled when a conversation is in progress (i.e., when the chat is non-empty). They are automatically enabled again when the user resets the chat history. ## Test Plan - Launch the Playground and go to the RAG page; - Select the vector DB ID; - Send a message to the agent via the chat; - The widgets in charge of the agent parameters will become disabled at this point; - Send a second message asking the model about the content of the first message; - The reply will indicate that the two messages were sent over the same session, that is, the agent was not recreated; - Click the 'Clear Chat' button; - All widgets will be enabled and a new agent will be created (which can be validated by sending another message).	2025-04-10 13:38:31 -07:00
Michael Clifford	c6e93e32f6	feat: Updated playground rag to use session id for persistent conversation (#1870) # What does this PR do? This PR updates the [playground RAG example](llama_stack/distribution/ui/page/playground/rag.py) so that the agent is able to use its builtin conversation history. Here we are using streamlit's `cache_resource` functionality to prevent the agent from re-initializing after every interaction as well as storing its session_id in the `session_state`. This allows the agent in the RAG example to behave more closely to how it works using the python-client directly. Closes #1869 ## Test Plan Without these changes, if you ask it "What is 2 + 2?" followed by the question "What did I just ask?", it will provide an obviously incorrect answer. With these changes, you can ask the same series of questions and it will provide the correct answer. Signed-off-by: Michael Clifford <mcliffor@redhat.com>	2025-04-08 09:46:13 +02:00
Francisco Arceo	af6594f670	fix: Adding chunk_size_in_tokens to playground rag_tool insert (#1826) # What does this PR do? Adding chunk_size_in_tokens to playground rag_tool insert. # Closes #1825 ## Test Plan Tested locally. Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-03-28 15:56:25 -04:00
ehhuang	ea6a4a14ce	feat(api): simplify client imports (#1687) # What does this PR do? closes #1554 ## Test Plan test_agents.py	2025-03-20 10:15:49 -07:00
Sarthak Deshpande	9c8e88ea9c	fix: Fixed import errors for UI and playground (#1666) # What does this PR do? Fixed import errors for playground and ui --------- Co-authored-by: sarthakdeshpande <sarthak.deshpande@engati.com>	2025-03-18 15:00:48 -07:00
ehhuang	ca2910d27a	docs: update test_agents to use new Agent SDK API (#1402) # Summary: new Agent SDK API is added in https://github.com/meta-llama/llama-stack-client-python/pull/178 Update docs and test to reflect this. Closes https://github.com/meta-llama/llama-stack/issues/1365 # Test Plan: ```bash py.test -v -s --nbval-lax ./docs/getting_started.ipynb LLAMA_STACK_CONFIG=fireworks \ pytest -s -v tests/integration/agents/test_agents.py \ --safety-shield meta-llama/Llama-Guard-3-8B --text-model meta-llama/Llama-3.1-8B-Instruct ```	2025-03-06 15:21:12 -08:00
Sébastien Han	6fa257b475	chore(lint): update Ruff ignores for project conventions and maintainability (#1184) - Added new ignores from flake8-bugbear (`B007`, `B008`) - Ignored `C901` (high function complexity) for now, pending review - Maintained PyTorch conventions (`N812`, `N817`) - Allowed `E731` (lambda assignments) for flexibility - Consolidated existing ignores (`E402`, `E501`, `F405`, `C408`, `N812`) - Documented rationale for each ignored rule This keeps our linting aligned with project needs while tracking potential fixes. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-28 09:36:49 -08:00
ehhuang	c8a20b8ed0	feat: allow specifying specific tool within toolgroup (#1239) Summary: E.g. `builtin::rag::knowledge_search` Test Plan: ``` LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/agents/ --safety-shield meta-llama/Llama-Guard-3-8B ```	2025-02-26 14:07:05 -08:00
Sébastien Han	e4a1579e63	build: format codebase imports using ruff linter (#1028) # What does this PR do? - Configured ruff linter to automatically fix import sorting issues. - Set --exit-non-zero-on-fix to ensure non-zero exit code when fixes are applied. - Enabled the 'I' selection to focus on import-related linting rules. - Ran the linter, and formatted all codebase imports accordingly. - Removed the black dep from the "dev" group since we use ruff Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-13 10:06:21 -08:00
Yuan Tang	34ab7a3b6c	Fix precommit check after moving to ruff (#927) Lint check in main branch is failing. This fixes the lint check after we moved to ruff in https://github.com/meta-llama/llama-stack/pull/921. We need to move to a `ruff.toml` file as well as fixing and ignoring some additional checks. Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-02 06:46:45 -08:00
Ashwin Bharambe	f3d8864c36	Rename builtin::memory -> builtin::rag	2025-01-22 20:22:51 -08:00
Ashwin Bharambe	c9e5578151	[memory refactor][5/n] Migrate all vector_io providers (#835) See https://github.com/meta-llama/llama-stack/issues/827 for the broader design. This PR finishes off all the stragglers and migrates everything to the new naming.	2025-01-22 10:17:59 -08:00
Xi Yan	9d574f4aee	fix playground for v1 (#799) # What does this PR do? - update playground callsites for v1 api changes ## Test Plan ``` cd llama_stack/distribution/ui streamlit run app.py ``` https://github.com/user-attachments/assets/eace11c6-600a-42dc-b4e7-6948a706509f	2025-01-16 19:32:07 -08:00
Hardik Shah	a51c8b4efc	Convert `SamplingParams.strategy` to a union (#767) # What does this PR do? Cleans up how we provide sampling params. Earlier, strategy was an enum and all params (top_p, temperature, top_k) across all strategies were grouped. We now have a strategy union object with each strategy (greedy, top_p, top_k) having its corresponding params. Earlier, ``` class SamplingParams: strategy: enum () top_p, temperature, top_k and other params ``` However, the `strategy` field was not being used in any providers making it confusing to know the exact sampling behavior purely based on the params since you could pass temperature, top_p, top_k and how the provider would interpret those would not be clear. Hence we introduced -- a union where the strategy and relevant params are all clubbed together to avoid this confusion. Have updated all providers, tests, notebooks, readme and other places where sampling params were being used to use the new format. ## Test Plan `pytest llama_stack/providers/tests/inference/groq/test_groq_utils.py` // inference on ollama, fireworks and together `with-proxy pytest -v -s -k "ollama" --inference-model="meta-llama/Llama-3.1-8B-Instruct" llama_stack/providers/tests/inference/test_text_inference.py ` // agents on fireworks `pytest -v -s -k 'fireworks and create_agent' --inference-model="meta-llama/Llama-3.1-8B-Instruct" llama_stack/providers/tests/agents/test_agents.py --safety-shield="meta-llama/Llama-Guard-3-8B"` ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [X] Ran pre-commit to handle lint / formatting issues. - [X] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [X] Updated relevant documentation. - [X] Wrote necessary unit or integration tests. --------- Co-authored-by: Hardik Shah <hjshah@fb.com>	2025-01-15 05:38:51 -08:00
Xi Yan	75e72cf2fc	model_type=llm for filtering available models for playground	2024-12-17 19:42:38 -08:00
Xi Yan	af8f1b3531	model selection playground fix	2024-12-17 18:13:52 -08:00
Xi Yan	16769256b7	[llama stack ui] add native eval & inspect distro & playground pages (#541) # What does this PR do? New Pages Added: - (1) Inspect Distro - (2) Evaluations: - (a) native evaluations (including generation) - (b) application evaluations (no generation, scoring only) - (3) Playground: - (a) chat - (b) RAG ## Test Plan ``` streamlit run app.py ``` #### Playground https://github.com/user-attachments/assets/6ca617e8-32ca-49b2-9774-185020ff5204 #### Inspect https://github.com/user-attachments/assets/01d52b2d-92af-4e3a-b623-a9b8ba22ba99 #### Evaluations (Generation + Scoring) https://github.com/user-attachments/assets/345845c7-2a2b-4095-960a-9ae40f6a93cf #### Evaluations (Scoring) https://github.com/user-attachments/assets/6cc1659f-eba4-49ca-a0a5-7c243557b4f5	2024-12-04 09:47:09 -08:00
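
Commit 40f41af2f7 above describes the direct (non-agentic) RAG path as three steps: query the vector DB, augment the prompt, and send the extended prompt to the model. The following is only a rough sketch of that flow and not the Playground's actual code. Every name in it (`retrieve_chunks`, `build_augmented_prompt`, `direct_rag_answer`) is hypothetical, the word-overlap ranking is a naive stand-in for a real vector DB query, and `generate` is a placeholder for a call to the Inference API.

```python
from typing import Callable, List


def retrieve_chunks(query: str, chunks: List[str], top_k: int = 2) -> List[str]:
    """Hypothetical stand-in for a vector DB query: rank chunks by word overlap."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(c.lower().split())), c) for c in chunks]
    # Keep only chunks that share at least one word, highest overlap first.
    matching = [item for item in scored if item[0] > 0]
    ranked = sorted(matching, key=lambda item: item[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]


def build_augmented_prompt(query: str, context: List[str]) -> str:
    """Prepend the retrieved chunks to the user query."""
    context_block = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{context_block}\n\nQuestion: {query}"


def direct_rag_answer(
    query: str, chunks: List[str], generate: Callable[[str], str]
) -> str:
    """Retrieve, augment, then hand the extended prompt to the model.

    `generate` is an assumed placeholder for the model call.
    """
    context = retrieve_chunks(query, chunks)
    prompt = build_augmented_prompt(query, context)
    return generate(prompt)
```

In the agentic mode the agent decides when to call the knowledge search tool. In this sketch, retrieval always runs exactly once per query before the model is called.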

19 commits
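
Commit a51c8b4efc in the list above replaces an enum-valued `strategy` field with a union in which each strategy (greedy, top_p, top_k) carries only its own parameters. A minimal sketch of that shape using plain dataclasses might look as follows. The class names, default values, and the `max_tokens` field are assumptions made for illustration and are not taken from the llama-stack source.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass(frozen=True)
class GreedyStrategy:
    """Always pick the most likely token; no tuning parameters."""


@dataclass(frozen=True)
class TopPStrategy:
    # Hypothetical defaults, chosen only for the example.
    temperature: float = 1.0
    top_p: float = 0.95


@dataclass(frozen=True)
class TopKStrategy:
    top_k: int = 40  # hypothetical default


# The union: exactly one strategy, bundled with its relevant parameters.
SamplingStrategy = Union[GreedyStrategy, TopPStrategy, TopKStrategy]


@dataclass(frozen=True)
class SamplingParams:
    strategy: SamplingStrategy = field(default_factory=GreedyStrategy)
    max_tokens: int = 0  # assumed example of a strategy-independent parameter


def describe(params: SamplingParams) -> str:
    """Show how a provider could branch on the strategy type."""
    s = params.strategy
    if isinstance(s, GreedyStrategy):
        return "greedy"
    if isinstance(s, TopPStrategy):
        return f"top_p(temperature={s.temperature}, top_p={s.top_p})"
    if isinstance(s, TopKStrategy):
        return f"top_k(top_k={s.top_k})"
    raise TypeError(f"unknown strategy: {type(s).__name__}")
```

With this layout a caller cannot pass `top_k` alongside a greedy strategy, which is the ambiguity the commit message says the old grouped parameters allowed.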