llama-stack

forked from phoenix-oss/llama-stack-mirror

Author	SHA1	Message	Date
Sébastien Han	1a529705da	chore: more mypy fixes (#2029 ) # What does this PR do? Mainly tried to cover the entire llama_stack/apis directory, we only have one left. Some excludes were just noop. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-05-06 09:52:31 -07:00
Ihar Hrachyshka	9e6561a1ec	chore: enable pyupgrade fixes (#1806 ) # What does this PR do? The goal of this PR is code base modernization. Schema reflection code needed a minor adjustment to handle UnionTypes and collections.abc.AsyncIterator. (Both are preferred for latest Python releases.) Note to reviewers: almost all changes here are automatically generated by pyupgrade. Some additional unused imports were cleaned up. The only change worth of note can be found under `docs/openapi_generator` and `llama_stack/strong_typing/schema.py` where reflection code was updated to deal with "newer" types. Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-05-01 14:23:50 -07:00
Sébastien Han	dc94433072	feat(pre-commit): enhance pre-commit hooks with additional checks (#2014 ) # What does this PR do? Add several new pre-commit hooks to improve code quality and security: - no-commit-to-branch: prevent direct commits to protected branches like `main` - check-yaml: validate YAML files - detect-private-key: prevent accidental commit of private keys - requirements-txt-fixer: maintain consistent requirements.txt format and sorting - mixed-line-ending: enforce LF line endings to avoid mixed line endings - check-executables-have-shebangs: ensure executable scripts have shebangs - check-json: validate JSON files - check-shebang-scripts-are-executable: verify shebang scripts are executable - check-symlinks: validate symlinks and report broken ones - check-toml: validate TOML files mainly for pyproject.toml The respective fixes have been included. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-04-30 11:35:49 -07:00
Michael Clifford	fe9b5ef08b	fix: tools page on playground resets agent after every interaction (#2044 ) # What does this PR do? This PR updates how the `AgentType` gets set using the radio button on the tools page of the playground. This change is needed because, with the current implementation, the chat interface resets after every input, preventing users from having a multi-turn conversation with the agent. ## Test Plan Run the Playground without these changes: ```bash streamlit run llama_stack/distribution/ui/app.py ``` Navigate to the tools page and attempt to have a multi-turn conversation. You should see the conversation reset after asking a second question. Repeat the steps above with these changes and you will see that it works as expected when asking the agent multiple questions. Signed-off-by: Michael Clifford <mcliffor@redhat.com>	2025-04-28 23:13:27 +02:00
Andy Xie	f5dae0517c	feat: Support ReAct Agent on Tools Playground (#2012 ) # What does this PR do? ReAct prompting attempts to use the Thinking, Action, Observation loop to improve the model's reasoning ability via prompt engineering. With this PR, it now supports the various features in Streamlit's playground: 1. Adding the selection box for choosing between Agent Type: normal, ReAct. 2. Adding the Thinking, Action, Observation loop streamlit logic for ReAct agent, as seen in many LLM clients. 3. Improving tool calling accuracy via ReAct prompting, e.g. using web_search. Folded ![react_output_folded png](https://github.com/user-attachments/assets/bf1bdce7-e6ef-455d-b6b0-c22a64e9d5c1) Collapsed ![react_output_collapsed](https://github.com/user-attachments/assets/cda2fc17-df0b-400d-971c-988de821f2a4) [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] Run the playground and use reasoning prompts to see for yourself. Steps to test the ReAct agent mode: 1. Set up a llama-stack server as [getting_started](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html) describes. 2. Set up your Web Search API keys under `llama_stack/distribution/ui/modules/api.py`. 3. Run the streamlit playground and try the ReAct agent, possibly with `websearch`, with the command: `streamlit run llama_stack/distribution/ui/app.py`. ## Test Process Current results are demonstrated with `llama-3.2-3b-instruct`. Results will vary with different models. You should see a clear distinction between the normal agent and the ReAct agent. Example prompts listed below: 1. Aside from the Apple Remote, what other devices can control the program Apple Remote was originally designed to interact with? 2. What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into? ## Example Test Results Web search on AppleTV <img width="1440" alt="normal_output_appletv" src="https://github.com/user-attachments/assets/bf6b3273-1c94-4976-8b4a-b2d82fe41330" /> <img width="1440" alt="react_output_appletv" src="https://github.com/user-attachments/assets/687f1feb-88f4-4d32-93d5-5013d0d5fe25" /> Web search on Colorado <img width="1440" alt="normal_output_colorado" src="https://github.com/user-attachments/assets/10bd3ad4-f2ad-466d-9ce0-c66fccee40c1" /> <img width="1440" alt="react_output_colorado" src="https://github.com/user-attachments/assets/39cfd82d-2be9-4e2f-9f90-a2c4840185f7" /> Web search tool + MCP Slack server <img width="1250" alt="normal_output_search_slack png" src="https://github.com/user-attachments/assets/72e88125-cdbf-4a90-bcb9-ab412c51d62d" /> <img width="1217" alt="react_output_search_slack" src="https://github.com/user-attachments/assets/8ae04efb-a4fd-49f6-9465-37dbecb6b73e" /> ![slack_screenshot](https://github.com/user-attachments/assets/bb70e669-6067-462a-bdf6-7aaac6ccbcef)	2025-04-25 17:01:51 +02:00
Surya Prakash Pathak	59b7593609	feat: Enhance tool display in Tools sidebar by simplifying tool identifiers (#2024 ) # What does this PR do? This PR improves the Tools page in the LlamaStack Playground UI by enhancing the readability of the active tool list shown in the sidebar. - Previously, active tools were displayed in a flat JSON array with verbose identifiers (e.g., builtin::code_interpreter:code_interpreter). - This PR updates the logic to group tools by their toolgroup (e.g., builtin::websearch) and renders each tool name in a simplified, human-readable format (e.g., web_search). - This change improves usability when working with multiple toolgroups, especially in configurations involving MCP tools or complex tool identifiers. Before and After Comparison: Before ![Screenshot 2025-04-24 at 1 05 47 PM](https://github.com/user-attachments/assets/44843a79-49dc-4b4d-ab28-c6187f9bb5ba) After ![Screenshot 2025-04-24 at 1 24 08 PM](https://github.com/user-attachments/assets/ebb01006-e0a9-4664-a95a-e6f72eea6f94) [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan - Followed the [LlamaStack UI Developer Setup instructions](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/distribution/ui) - Ran the Streamlit UI via: `uv run --with "[.ui]" streamlit run llama_stack/distribution/ui/app.py` - Selected multiple built-in toolgroups (e.g., code_interpreter, websearch, wolfram_alpha) from the sidebar. [//]: # (## Documentation)	2025-04-25 10:22:22 +02:00
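A minimal sketch of the grouping described in #2024, not the Playground's actual code. It assumes each identifier has the shape `<toolgroup>:<tool_name>`, as in the `builtin::code_interpreter:code_interpreter` example from the PR text; the function name `group_tools` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def group_tools(identifiers: List[str]) -> Dict[str, List[str]]:
    """Group verbose tool identifiers by toolgroup, keeping only the short tool name.

    Assumption: every identifier is "<toolgroup>:<tool_name>", where the
    toolgroup itself may contain "::" (for example "builtin::websearch").
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for identifier in identifiers:
        # rpartition splits on the LAST colon, so the "::" inside the
        # toolgroup stays intact and only the trailing tool name is cut off.
        toolgroup, _, tool_name = identifier.rpartition(":")
        grouped[toolgroup].append(tool_name)
    return dict(grouped)


if __name__ == "__main__":
    active = [
        "builtin::websearch:web_search",
        "builtin::code_interpreter:code_interpreter",
    ]
    for toolgroup, tools in group_tools(active).items():
        print(toolgroup, tools)
```

Identifiers that do not follow the assumed shape (for example a bare toolgroup with no tool name) would need separate handling.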
Michael Clifford	64f747fe09	feat: add tool name to chat output in playground (#1996 ) # What does this PR do? This PR adds the name of the tool that is used by the agent on the "tools" page of the playground. See image below for an example. ![Screenshot 2025-04-18 at 3 14 18 PM](https://github.com/user-attachments/assets/04e97783-4003-4121-9446-9e0ad7209256) ## Test Plan Run the playground and navigate to the tools page. There users can see that this additional text is present when tools are invoked and absent when they are not. ``` streamlit run llama_stack/distribution/ui/app.py ``` Signed-off-by: Michael Clifford <mcliffor@redhat.com>	2025-04-23 15:57:54 +02:00
Ilya Kolchinsky	d39462d073	feat: Hide tool output under an expander in Playground UI (#2003 ) # What does this PR do? Now, tool outputs and retrieved chunks from the vector DB (i.e., everything except for the actual model reply) are hidden under an expander form when presented to the user. # Test Plan Navigate to the RAG page in the Playground UI.	2025-04-23 15:32:12 +02:00
Michael Clifford	e4d001c4e4	feat: cleanup sidebar formatting on tools playground (#1998 ) # What does this PR do? This PR cleans up the sidebar on the tools page of the playground in the following ways: * created a clearer hierarchy of configuration options and tool selections. * Removed the `mcp::` or `builtin::` prefixes from the tool selection buttons. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan Run the playground and see the updated sidebar does not cause any new errors. ``` streamlit run llama_stack/distribution/ui/app.py ``` [//]: # (## Documentation) Signed-off-by: Michael Clifford <mcliffor@redhat.com>	2025-04-22 10:40:37 +02:00
Michael Clifford	f12011794b	fix: Updated tools playground to allow vdb selection (#1960 ) # What does this PR do? This PR lets users select an existing vdb to use with their agent on the tools page of the playground. The drop down menu that lets users select a vdb only appears when the rag tool is selected. Without this change, there is no way for a user to specify which vdb they want their rag tool to use on the tools page. I have intentionally left the RAG options sparse here since the full RAG options are exposed on the RAG page. ## Test Plan Without these changes the RAG tool will throw the following error: `name: knowledge_search) does not have any content ` With these changes the RAG tool works as expected. Signed-off-by: Michael Clifford <mcliffor@redhat.com>	2025-04-17 09:29:40 +02:00
Michael Clifford	093881071a	fix: add max_tokens slider to playground tools page (#1958 ) # What does this PR do? This PR adds a `max_tokens` slider to the playground tools page. I have found that in some instances the llama stack server throws a 500 error if the max_tokens value is not explicitly set in the agent's `sampling_params`. This PR uses the same implementation of the `max_tokens` slider from the chat page, and includes it on the tools page. ## Test Plan 1. Attempting to call a tool without these changes results in a `500: Internal server error: An unexpected error occurred`. 2. Attempting to call a tool with these changes results in the expected output. Signed-off-by: Michael Clifford <mcliffor@redhat.com>	2025-04-15 09:11:08 -07:00
Ilya Kolchinsky	40f41af2f7	feat: Add a direct (non-agentic) RAG option to the Playground RAG page (#1940 ) # What does this PR do? This PR makes it possible to switch between agentic and non-agentic RAG when running the respective Playground page. When non-agentic RAG is selected, user queries are answered by directly querying the vector DB, augmenting the prompt, and sending the extended prompt to the model via Inference API. ## Test Plan - Launch the Playground and go to the RAG page; - Select the vector DB ID; - Adjust other configuration parameters if necessary; - Set the radio button to Agent-based RAG; - Send a message to the chat; - The query will be answered by an agent using the knowledge search tool as indicated by the output; - Click the 'Clear Chat' button to make it possible to switch modes; - Send a message to the chat again; - This time, the query will be answered by the model directly as can be deduced from the reply.	2025-04-11 10:16:10 -07:00
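The direct (non-agentic) flow from #1940 (query the vector DB, augment the prompt, send the extended prompt to the model) can be sketched roughly as below. Everything here is a stand-in: `retrieve` is a toy word-overlap ranking instead of a real vector DB query, and `generate` is a hypothetical callable in place of the Inference API.

```python
from typing import Callable, List


def retrieve(query: str, chunks: List[str], top_k: int = 2) -> List[str]:
    """Toy stand-in for a vector DB query: rank chunks by shared words."""
    query_words = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(query_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, context: List[str]) -> str:
    """Augment the user query with the retrieved chunks."""
    context_block = "\n".join(f"- {chunk}" for chunk in context)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {query}"


def direct_rag(query: str, chunks: List[str], generate: Callable[[str], str]) -> str:
    """Retrieve, augment, then make a single model call. No agent or tool loop."""
    context = retrieve(query, chunks)
    return generate(build_prompt(query, context))


if __name__ == "__main__":
    docs = ["llamas live in the Andes", "streamlit builds web apps"]
    # Echo the prompt instead of calling a model, to show what would be sent.
    print(direct_rag("where do llamas live", docs, generate=lambda prompt: prompt))
```

The contrast with the agent-based mode is that the model never decides whether to search; retrieval always happens once, before the single model call.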
Ilya Kolchinsky	79fc81f78f	fix: Playground RAG page errors (#1928 ) # What does this PR do? This PR fixes two issues with the RAG page of the Playground UI: 1. When the user modifies a configurable setting via a widget (e.g., system prompt, temperature, etc.), the agent is not recreated. Thus, the change has no effect and the user gets no indication of that. 2. After the first issue is fixed, it becomes possible to recreate the agent mid-conversation or even mid-generation. To mitigate this, widgets related to agent configuration are now disabled when a conversation is in progress (i.e., when the chat is non-empty). They are automatically enabled again when the user resets the chat history. ## Test Plan - Launch the Playground and go to the RAG page; - Select the vector DB ID; - Send a message to the agent via the chat; - The widgets in charge of the agent parameters will become disabled at this point; - Send a second message asking the model about the content of the first message; - The reply will indicate that the two messages were sent over the same session, that is, the agent was not recreated; - Click the 'Clear Chat' button; - All widgets will be enabled and a new agent will be created (which can be validated by sending another message).	2025-04-10 13:38:31 -07:00
Sébastien Han	770b38f8b5	chore: simplify running the demo UI (#1907 ) # What does this PR do? * Manage UI deps in pyproject * Use a new "ui" dep group to pull the deps with "uv" * Simplify the run command * Bump versions in requirements.txt Signed-off-by: Sébastien Han <seb@redhat.com>	2025-04-09 11:22:29 -07:00
Michael Clifford	5c010e234a	fix: add tavily_search option to playground api (#1909 ) # What does this PR do? This PR adds the "TAVILY_SEARCH_API_KEY" option to the playground to enable the use of the websearch tool. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan ``` export TAVILY_SEARCH_API_KEY=*** streamlit run llama_stack/distribution/ui/app.py ``` Without this change the builtin websearch tool will fail due to missing API key. [//]: # (## Documentation) Related to #1902 Signed-off-by: Michael Clifford <mcliffor@redhat.com>	2025-04-09 15:56:41 +02:00
Michael Clifford	9657105304	feat: Add tools page to playground (#1904 ) # What does this PR do? This PR adds an additional page to the playground called "Tools". This page connects to a llama-stack server and lists all the available LLM models, builtin tools and MCP tools in the sidebar. Users can select whatever combination of model and tools they want from the sidebar for their agent. Once the selections are made, users can chat with their agent similarly to the RAG page and test out agent tool use. closes #1902 ## Test Plan Ran the following commands with a llama-stack server and the updated playground worked as expected. ``` export LLAMA_STACK_ENDPOINT="http://localhost:8321" streamlit run llama_stack/distribution/ui/app.py ``` [//]: # (## Documentation) Signed-off-by: Michael Clifford <mcliffor@redhat.com>	2025-04-09 15:26:52 +02:00
Jaland	30b49d8dfa	fix: Playground Container Issue (#1868 ) What does this PR do? This PR fixes a build issue with the Containerfile caused by missing requirement `llama-stack`. It updates the Containerfile to include the necessary requirements and upgrades the Python version to ensure successful builds. Test Plan The updated Containerfile has been tested, and the build now completes successfully with the required dependencies included.	2025-04-09 11:45:15 +02:00
Michael Clifford	c6e93e32f6	feat: Updated playground rag to use session id for persistent conversation (#1870 ) # What does this PR do? This PR updates the [playground RAG example](llama_stack/distribution/ui/page/playground/rag.py) so that the agent is able to use its builtin conversation history. Here we are using streamlit's `cache_resource` functionality to prevent the agent from re-initializing after every interaction as well as storing its session_id in the `session_state`. This allows the agent in the RAG example to behave more closely to how it works using the python-client directly. [//]: # (If resolving an issue, uncomment and update the line below) Closes #1869 ## Test Plan Without these changes, if you ask it "What is 2 + 2"? followed by the question "What did I just ask?" It will provide an obviously incorrect answer. With these changes, you can ask the same series of questions and it will provide the correct answer. [//]: # (## Documentation) Signed-off-by: Michael Clifford <mcliffor@redhat.com>	2025-04-08 09:46:13 +02:00
Francisco Arceo	af6594f670	fix: Adding chunk_size_in_tokens to playground rag_tool insert (#1826 ) # What does this PR do? Adding chunk_size_in_tokens to playground rag_tool insert. # Closes #1825 ## Test Plan Tested locally. [//]: # (## Documentation) Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-03-28 15:56:25 -04:00
ehhuang	ea6a4a14ce	feat(api): simplify client imports (#1687 ) # What does this PR do? closes #1554 ## Test Plan test_agents.py	2025-03-20 10:15:49 -07:00
Sarthak Deshpande	9c8e88ea9c	fix: Fixed import errors for UI and playground (#1666 ) # What does this PR do? Fixed import errors for playground and ui --------- Co-authored-by: sarthakdeshpande <sarthak.deshpande@engati.com>	2025-03-18 15:00:48 -07:00
Jamie Land	f4dc290705	feat: Created Playground Containerfile and Image Workflow (#1256 ) # What does this PR do? Adds a container file that can be used to build the playground UI. This file will be built by this PR in the stack-ops repo: https://github.com/meta-llama/llama-stack-ops/pull/9 Docker command in the docs will need to change once I know the address of the official repository. ## Test Plan Tested image on my local Openshift Instance using this helm chart: https://github.com/Jaland/llama-stack-helm/tree/main/llama-stack [//]: # (## Documentation) --------- Co-authored-by: Jamie Land <hokie10@gmail.com>	2025-03-18 09:26:49 -07:00
Xi Yan	5287b437ae	feat(api): (1/n) datasets api clean up (#1573 ) ## PR Stack - https://github.com/meta-llama/llama-stack/pull/1573 - https://github.com/meta-llama/llama-stack/pull/1625 - https://github.com/meta-llama/llama-stack/pull/1656 - https://github.com/meta-llama/llama-stack/pull/1657 - https://github.com/meta-llama/llama-stack/pull/1658 - https://github.com/meta-llama/llama-stack/pull/1659 - https://github.com/meta-llama/llama-stack/pull/1660 Client SDK - https://github.com/meta-llama/llama-stack-client-python/pull/203 CI - `1391130488` <img width="1042" alt="image" src="https://github.com/user-attachments/assets/69636067-376d-436b-9204-896e2dd490ca" /> -- the test_rag_agent_with_attachments is flaky and not related to this PR ## Doc <img width="789" alt="image" src="https://github.com/user-attachments/assets/b88390f3-73d6-4483-b09a-a192064e32d9" /> ## Client Usage ```python client.datasets.register( source={ "type": "uri", "uri": "lsfs://mydata.jsonl", }, schema="jsonl_messages", # optional dataset_id="my_first_train_data" ) # quick prototype debugging client.datasets.register( data_reference={ "type": "rows", "rows": [ "messages": [...], ], }, schema="jsonl_messages", ) ``` ## Test Plan - CI: `1387805545` ``` LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/datasets/test_datasets.py ``` ``` LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/scoring/test_scoring.py ``` ``` pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb ```	2025-03-17 16:55:45 -07:00
ehhuang	ca2910d27a	docs: update test_agents to use new Agent SDK API (#1402 ) # Summary: new Agent SDK API is added in https://github.com/meta-llama/llama-stack-client-python/pull/178 Update docs and test to reflect this. Closes https://github.com/meta-llama/llama-stack/issues/1365 # Test Plan: ```bash py.test -v -s --nbval-lax ./docs/getting_started.ipynb LLAMA_STACK_CONFIG=fireworks \ pytest -s -v tests/integration/agents/test_agents.py \ --safety-shield meta-llama/Llama-Guard-3-8B --text-model meta-llama/Llama-3.1-8B-Instruct ```	2025-03-06 15:21:12 -08:00
Ellis Tarn	24a27baf7c	chore: Make README code blocks more easily copy pastable (#1420 ) # What does this PR do? When going through READMEs, I found that I had to keep editing the code blocks since they were prefixed with `$ `. A common pattern is to triple click (highlight all) a block and then copy paste. This minor change will make this easier for folks to follow the READMEs. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan N/A [//]: # (## Documentation)	2025-03-05 09:11:01 -08:00
Xi Yan	e9a37bad63	chore: rename task_config to benchmark_config (#1397 ) # What does this PR do? - This was missed from previous deprecation: https://github.com/meta-llama/llama-stack/pull/1186 - Part of https://github.com/meta-llama/llama-stack/issues/1396 [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan ``` pytest -v -s --nbval-lax ./llama-stack/docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb ``` [//]: # (## Documentation)	2025-03-04 12:44:04 -08:00
Sébastien Han	6fa257b475	chore(lint): update Ruff ignores for project conventions and maintainability (#1184 ) - Added new ignores from flake8-bugbear (`B007`, `B008`) - Ignored `C901` (high function complexity) for now, pending review - Maintained PyTorch conventions (`N812`, `N817`) - Allowed `E731` (lambda assignments) for flexibility - Consolidated existing ignores (`E402`, `E501`, `F405`, `C408`, `N812`) - Documented rationale for each ignored rule This keeps our linting aligned with project needs while tracking potential fixes. Signed-off-by: Sébastien Han <seb@redhat.com> Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-28 09:36:49 -08:00
ehhuang	c8a20b8ed0	feat: allow specifying specific tool within toolgroup (#1239 ) Summary: E.g. `builtin::rag::knowledge_search` Test Plan: ``` LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/agents/ --safety-shield meta-llama/Llama-Guard-3-8B ```	2025-02-26 14:07:05 -08:00
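One way to read the `builtin::rag::knowledge_search` example from #1239 is sketched below. This is an assumption about the identifier shape, not the server's actual parsing: it treats a toolgroup as exactly two `::`-separated segments and an optional third segment as a specific tool. `parse_tool_spec` is a hypothetical name.

```python
from typing import Optional, Tuple


def parse_tool_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Split a spec into (toolgroup, tool_name or None).

    Assumption: "builtin::rag" selects the whole toolgroup, while
    "builtin::rag::knowledge_search" selects one tool inside it.
    """
    parts = spec.split("::")
    if len(parts) == 2:
        return spec, None
    if len(parts) == 3:
        return "::".join(parts[:2]), parts[2]
    raise ValueError(f"unexpected tool spec: {spec!r}")


if __name__ == "__main__":
    print(parse_tool_spec("builtin::rag"))
    print(parse_tool_spec("builtin::rag::knowledge_search"))
```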
Xi Yan	8b655e3cd2	fix!: update eval-tasks -> benchmarks (#1032 ) # What does this PR do? - Update `/eval-tasks` to `/benchmarks` - ⚠️ Remove differentiation between `app` vs. `benchmark` eval task config. Now we only have `BenchmarkConfig`. The overloaded `benchmark` is confusing and does not add any value. Backward compatibility is being kept as the "type" is not being used anywhere. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan - This change is backward compatible - Run notebook test with ``` pytest -v -s --nbval-lax ./docs/getting_started.ipynb pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb ``` <img width="846" alt="image" src="https://github.com/user-attachments/assets/d2fc06a7-593a-444f-bc1f-10ab9b0c843d" /> [//]: # (## Documentation) [//]: # (- [ ] Added a Changelog entry if the change is significant) --------- Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com> Signed-off-by: Ben Browning <bbrownin@redhat.com> Signed-off-by: Sébastien Han <seb@redhat.com> Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com> Co-authored-by: Ben Browning <ben324@gmail.com> Co-authored-by: Sébastien Han <seb@redhat.com> Co-authored-by: Reid <61492567+reidliu41@users.noreply.github.com> Co-authored-by: reidliu <reid201711@gmail.com> Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-13 16:40:58 -08:00
Sébastien Han	e4a1579e63	build: format codebase imports using ruff linter (#1028 ) # What does this PR do? - Configured ruff linter to automatically fix import sorting issues. - Set --exit-non-zero-on-fix to ensure non-zero exit code when fixes are applied. - Enabled the 'I' selection to focus on import-related linting rules. - Ran the linter, and formatted all codebase imports accordingly. - Removed the black dep from the "dev" group since we use ruff Signed-off-by: Sébastien Han <seb@redhat.com> [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] [//]: # (## Documentation) [//]: # (- [ ] Added a Changelog entry if the change is significant) Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-13 10:06:21 -08:00
Yuan Tang	34ab7a3b6c	Fix precommit check after moving to ruff (#927 ) Lint check in main branch is failing. This fixes the lint check after we moved to ruff in https://github.com/meta-llama/llama-stack/pull/921. We need to move to a `ruff.toml` file as well as fixing and ignoring some additional checks. Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-02 06:46:45 -08:00
snova-edwardm	22dc684da6	Sambanova inference provider (#555 ) # What does this PR do? This PR adds SambaNova as one of the providers. - Add SambaNova as a provider ## Test Plan Test the functional command ``` pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_embeddings.py llama_stack/providers/tests/inference/test_prompt_adapter.py llama_stack/providers/tests/inference/test_text_inference.py llama_stack/providers/tests/inference/test_vision_inference.py --env SAMBANOVA_API_KEY=<sambanova-api-key> ``` Test the distribution template: ``` # Docker LLAMA_STACK_PORT=5001 docker run -it -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \ llamastack/distribution-sambanova \ --port $LLAMA_STACK_PORT \ --env SAMBANOVA_API_KEY=$SAMBANOVA_API_KEY # Conda llama stack build --template sambanova --image-type conda llama stack run ./run.yaml \ --port $LLAMA_STACK_PORT \ --env SAMBANOVA_API_KEY=$SAMBANOVA_API_KEY ``` ## Source [SambaNova API Documentation](https://cloud.sambanova.ai/apis) ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [Y] Ran pre-commit to handle lint / formatting issues. - [Y] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [Y] Updated relevant documentation. - [Y] Wrote necessary unit or integration tests. --------- Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-01-23 12:20:28 -08:00
Ashwin Bharambe	f3d8864c36	Rename builtin::memory -> builtin::rag	2025-01-22 20:22:51 -08:00
Ashwin Bharambe	c9e5578151	[memory refactor][5/n] Migrate all vector_io providers (#835 ) See https://github.com/meta-llama/llama-stack/issues/827 for the broader design. This PR finishes off all the stragglers and migrates everything to the new naming.	2025-01-22 10:17:59 -08:00
Xi Yan	9d574f4aee	fix playground for v1 (#799 ) # What does this PR do? - update playground callsites for v1 api changes ## Test Plan ``` cd llama_stack/distribution/ui streamlit run app.py ``` https://github.com/user-attachments/assets/eace11c6-600a-42dc-b4e7-6948a706509f ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-16 19:32:07 -08:00
Ashwin Bharambe	03ac84a829	Update default port from 5000 -> 8321	2025-01-16 15:26:48 -08:00
Hardik Shah	a51c8b4efc	Convert `SamplingParams.strategy` to a union (#767 ) # What does this PR do? Cleans up how we provide sampling params. Earlier, strategy was an enum and all params (top_p, temperature, top_k) across all strategies were grouped. We now have a strategy union object with each strategy (greedy, top_p, top_k) having its corresponding params. Earlier, ``` class SamplingParams: strategy: enum () top_p, temperature, top_k and other params ``` However, the `strategy` field was not being used in any providers, making it confusing to know the exact sampling behavior purely based on the params, since you could pass temperature, top_p, top_k and how the provider would interpret those would not be clear. Hence we introduced a union where the strategy and relevant params are all clubbed together to avoid this confusion. Have updated all providers, tests, notebooks, readme and other places where sampling params were being used to use the new format. ## Test Plan `pytest llama_stack/providers/tests/inference/groq/test_groq_utils.py` // inference on ollama, fireworks and together `with-proxy pytest -v -s -k "ollama" --inference-model="meta-llama/Llama-3.1-8B-Instruct" llama_stack/providers/tests/inference/test_text_inference.py ` // agents on fireworks `pytest -v -s -k 'fireworks and create_agent' --inference-model="meta-llama/Llama-3.1-8B-Instruct" llama_stack/providers/tests/agents/test_agents.py --safety-shield="meta-llama/Llama-Guard-3-8B"` ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [X] Ran pre-commit to handle lint / formatting issues. - [X] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [X] Updated relevant documentation. - [X] Wrote necessary unit or integration tests. --------- Co-authored-by: Hardik Shah <hjshah@fb.com>	2025-01-15 05:38:51 -08:00
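The shape of the change in #767 can be illustrated with a small sketch using plain dataclasses. The class names and the choice to attach `temperature` to the top_p strategy are assumptions made for illustration; the PR text only says that each strategy (greedy, top_p, top_k) carries its own params.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class GreedyStrategy:
    """Greedy decoding needs no extra parameters."""
    type: str = "greedy"


@dataclass
class TopPStrategy:
    # Assumption: temperature travels with top_p; the PR does not spell this out.
    temperature: float
    top_p: float
    type: str = "top_p"


@dataclass
class TopKStrategy:
    top_k: int
    type: str = "top_k"


# The union replaces the old "enum plus every parameter" layout.
SamplingStrategy = Union[GreedyStrategy, TopPStrategy, TopKStrategy]


@dataclass
class SamplingParams:
    strategy: SamplingStrategy = field(default_factory=GreedyStrategy)


def describe(params: SamplingParams) -> str:
    """Each branch only sees the parameters that belong to its strategy."""
    strategy = params.strategy
    if isinstance(strategy, TopPStrategy):
        return f"top_p(temperature={strategy.temperature}, top_p={strategy.top_p})"
    if isinstance(strategy, TopKStrategy):
        return f"top_k(top_k={strategy.top_k})"
    return "greedy"


if __name__ == "__main__":
    print(describe(SamplingParams()))
    print(describe(SamplingParams(strategy=TopKStrategy(top_k=40))))
```

With this layout a provider cannot receive, say, a `top_k` value alongside a greedy strategy, which is the ambiguity the PR describes.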
Yuan Tang	c1987d6143	Fix failing flake8 E226 check (#701 ) This fixes the pre-commit check when running locally (not sure why this was not caught on CI check): ``` > pre-commit run --show-diff-on-failure --color=always --all-files trim trailing whitespace.................................................Passed check python ast.........................................................Passed check for merge conflicts................................................Passed check for added large files..............................................Passed fix end of files.........................................................Passed Insert license in comments...............................................Passed flake8...................................................................Failed - hook id: flake8 - exit code: 1 llama_stack/distribution/ui/page/evaluations/app_eval.py:132:65: E226 missing whitespace around arithmetic operator llama_stack/distribution/ui/page/evaluations/native_eval.py:235:61: E226 missing whitespace around arithmetic operator llama_stack/providers/utils/telemetry/trace_protocol.py:56:78: E226 missing whitespace around arithmetic operator ``` Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-01-02 09:04:07 -08:00
Xi Yan	75e72cf2fc	model_type=llm for filtering available models for playground	2024-12-17 19:42:38 -08:00
Xi Yan	af8f1b3531	model selection playground fix	2024-12-17 18:13:52 -08:00
Xi Yan	7301403ce3	Add eval/scoring/datasetio API providers to distribution templates & UI developer guide (#564 ) # What does this PR do? - add /eval, /scoring, /datasetio API providers to distribution templates - regenerate build.yaml / run.yaml files - fix `template.py` to take in list of providers instead of only first one - override memory provider as faiss default for all distro (as only 1 memory provider is needed to start basic flow, chromadb/pgvector need additional setup step). ``` python llama_stack/scripts/distro_codegen.py ``` - updated README to start UI via conda builds. ## Test Plan ``` python llama_stack/scripts/distro_codegen.py ``` - Use newly generated `run.yaml` to start server ``` llama stack run ./llama_stack/templates/together/run.yaml ``` <img width="1191" alt="image" src="https://github.com/user-attachments/assets/62f7d179-0cd0-427c-b6e8-e087d4648f09"> #### Registration ``` ❯ llama-stack-client datasets register \ --dataset-id "mmlu" \ --provider-id "huggingface" \ --url "https://huggingface.co/datasets/llamastack/evals" \ --metadata '{"path": "llamastack/evals", "name": "evals__mmlu__details", "split": "train"}' \ --schema '{"input_query": {"type": "string"}, "expected_answer": {"type": "string", "chat_completion_input": {"type": "string"}}}' ❯ llama-stack-client datasets list ┏━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓ ┃ identifier ┃ provider_id ┃ metadata ┃ type ┃ ┡━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩ │ mmlu │ huggingface │ {'path': 'llamastack/evals', 'name': │ dataset │ │ │ │ 'evals__mmlu__details', 'split': │ │ │ │ │ 'train'} │ │ └────────────┴─────────────┴─────────────────────────────────────────┴─────────┘ ``` ``` ❯ llama-stack-client datasets register \ --dataset-id "simpleqa" \ --provider-id "huggingface" \ --url "https://huggingface.co/datasets/llamastack/evals" \ --metadata '{"path": "llamastack/evals", "name": "evals__simpleqa", "split": "train"}' 
\ --schema '{"input_query": {"type": "string"}, "expected_answer": {"type": "string", "chat_completion_input": {"type": "string"}}}' ❯ llama-stack-client datasets list ┏━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓ ┃ identifier ┃ provider_id ┃ metadata ┃ type ┃ ┡━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩ │ mmlu │ huggingface │ {'path': 'llamastack/evals', 'name': 'evals__mmlu__details', │ dataset │ │ │ │ 'split': 'train'} │ │ │ simpleqa │ huggingface │ {'path': 'llamastack/evals', 'name': 'evals__simpleqa', │ dataset │ │ │ │ 'split': 'train'} │ │ └────────────┴─────────────┴───────────────────────────────────────────────────────────────┴─────────┘ ``` ``` ❯ llama-stack-client eval_tasks register \ > --eval-task-id meta-reference-mmlu \ > --provider-id meta-reference \ > --dataset-id mmlu \ > --scoring-functions basic::regex_parser_multiple_choice_answer ❯ llama-stack-client eval_tasks register \ --eval-task-id meta-reference-simpleqa \ --provider-id meta-reference \ --dataset-id simpleqa \ --scoring-functions llm-as-judge::405b-simpleqa ❯ llama-stack-client eval_tasks list ┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓ ┃ dataset_id ┃ identifier ┃ metadata ┃ provider_id ┃ provider_resour… ┃ scoring_functio… ┃ type ┃ ┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩ │ mmlu │ meta-reference-… │ {} │ meta-reference │ meta-reference-… │ ['basic::regex_… │ eval_task │ │ simpleqa │ meta-reference-… │ {} │ meta-reference │ meta-reference-… │ ['llm-as-judge:… │ eval_task │ └────────────┴──────────────────┴──────────┴────────────────┴──────────────────┴──────────────────┴───────────┘ ``` #### Test with UI ``` streamlit run app.py ``` ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if 
that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-12-05 16:29:32 -08:00
Xi Yan	16769256b7	[llama stack ui] add native eval & inspect distro & playground pages (#541 ) # What does this PR do? New Pages Added: - (1) Inspect Distro - (2) Evaluations: - (a) native evaluations (including generation) - (b) application evaluations (no generation, scoring only) - (3) Playground: - (a) chat - (b) RAG ## Test Plan ``` streamlit run app.py ``` #### Playground https://github.com/user-attachments/assets/6ca617e8-32ca-49b2-9774-185020ff5204 #### Inspect https://github.com/user-attachments/assets/01d52b2d-92af-4e3a-b623-a9b8ba22ba99 #### Evaluations (Generation + Scoring) https://github.com/user-attachments/assets/345845c7-2a2b-4095-960a-9ae40f6a93cf #### Evaluations (Scoring) https://github.com/user-attachments/assets/6cc1659f-eba4-49ca-a0a5-7c243557b4f5	2024-12-04 09:47:09 -08:00
Xi Yan	b1a63df8cd	move playground ui to llama-stack repo (#536 ) # What does this PR do? - Move Llama Stack Playground UI to llama-stack repo under llama_stack/distribution - Original PR in llama-stack-apps: https://github.com/meta-llama/llama-stack-apps/pull/127 ## Test Plan ``` cd llama-stack/llama_stack/distribution/ui streamlit run app.py ```	2024-11-26 22:04:21 -08:00

43 commits