Commit graph

132 commits

Author SHA1 Message Date
Dinesh Yeduguru
b46d94d87d do not pass memory tools to inference 2025-01-08 18:53:32 -08:00
Dinesh Yeduguru
a7a55748ca address feedback 2025-01-08 18:25:52 -08:00
Dinesh Yeduguru
e08b7f4432 move _interpret_content_as_attachment to outside 2025-01-08 18:25:52 -08:00
Dinesh Yeduguru
67b35613bb test turn overrides in unit tests 2025-01-08 18:25:52 -08:00
Dinesh Yeduguru
854fef7478 add unit tests for chat agent 2025-01-08 18:25:52 -08:00
Dinesh Yeduguru
db2ec110a1 fix failing code interpreter tests 2025-01-08 18:25:52 -08:00
Dinesh Yeduguru
82395ba654 fix the rag query generator types 2025-01-08 18:25:52 -08:00
Dinesh Yeduguru
efe3189728 client sdk test fixes 2025-01-08 18:25:51 -08:00
Dinesh Yeduguru
c3865faf37 minor fixes 2025-01-08 18:25:51 -08:00
Dinesh Yeduguru
6632d7e410 fix list tools method name 2025-01-08 18:25:51 -08:00
Dinesh Yeduguru
94cca7a72a add wolfram alpha, bing search 2025-01-08 18:25:21 -08:00
Dinesh Yeduguru
f9a98c278a simplify toolgroups registration 2025-01-08 18:25:21 -08:00
Dinesh Yeduguru
ba242c04cc remove memory from available tools to agent 2025-01-08 18:25:21 -08:00
Dinesh Yeduguru
e3775eb6f6 rename UserDefinedToolDef to ToolDef 2025-01-08 18:25:21 -08:00
Dinesh Yeduguru
db0b2a60c1 remove breakpoints 2025-01-08 18:25:21 -08:00
Dinesh Yeduguru
17abffb505 fix handle_docs 2025-01-08 18:25:21 -08:00
Dinesh Yeduguru
9efe30c9d3 add documents to turn 2025-01-08 18:25:21 -08:00
Dinesh Yeduguru
d0e8e1647b add matplotlib_custom_backend.py 2025-01-08 18:25:21 -08:00
Dinesh Yeduguru
229999c572 add __init__.py 2025-01-08 18:25:21 -08:00
Dinesh Yeduguru
0bc876c130 minor fixes to agent instance 2025-01-08 18:25:21 -08:00
Dinesh Yeduguru
16d1f66f55 address feedback 2025-01-08 18:25:21 -08:00
Dinesh Yeduguru
ac46bd5eb4 address feedback 2025-01-08 18:25:21 -08:00
Dinesh Yeduguru
70b2a58bef linter fixes 2025-01-08 18:25:20 -08:00
Dinesh Yeduguru
9a3d7fa33c rebase fixes 2025-01-08 18:25:20 -08:00
Dinesh Yeduguru
f408fd3aca remove attachments, move memory bank to tool metadata 2025-01-08 18:25:20 -08:00
Dinesh Yeduguru
97798c8442 add a RAG test to client SDK 2025-01-08 18:25:20 -08:00
Dinesh Yeduguru
c76f5f418f move brave and tavily to remote 2025-01-08 18:25:20 -08:00
Dinesh Yeduguru
4dd2f4c363 working end to end client sdk tests with custom tools 2025-01-08 18:25:20 -08:00
Dinesh Yeduguru
1a66ddc1b5 add support for built in tool type 2025-01-08 18:25:20 -08:00
Dinesh Yeduguru
40f35f3a8d add code interpreter 2025-01-08 18:25:20 -08:00
Dinesh Yeduguru
2ad67529ef fix agents to run custom tools 2025-01-08 18:24:53 -08:00
Dinesh Yeduguru
9192a9bbb4 add tavily 2025-01-08 18:24:53 -08:00
Dinesh Yeduguru
dcdf9da6ef remove all usages of builtin tools in agents 2025-01-08 18:24:53 -08:00
Dinesh Yeduguru
f90e9c2003 agents to use tools api 2025-01-08 18:24:53 -08:00
Xi Yan
7a90fc5854
move DataSchemaValidatorMixin into standalone utils (#720)
# What does this PR do?

- there's no value in keeping data schema validation logic in a
DataSchemaValidatorMixin
- move the data schema validation logic into standalone utils (a sketch of the shape of this change follows below)
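A hedged sketch of the refactor direction; the standalone function name below is an assumption, not the actual util:

```python
# Before: every consumer inherits validation from a mixin.
class DataSchemaValidatorMixin:
    def validate_dataset_schema(self, dataset_schema: dict, expected_schemas: list) -> None:
        if dataset_schema not in expected_schemas:
            raise ValueError(f"Dataset schema {dataset_schema} not in {expected_schemas}")

# After: the same check is a plain function, imported where needed.
def validate_dataset_schema(dataset_schema: dict, expected_schemas: list) -> None:
    if dataset_schema not in expected_schemas:
        raise ValueError(f"Dataset schema {dataset_schema} not in {expected_schemas}")
```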

## Test Plan
```
pytest -v -s -m llm_as_judge_scoring_together_inference scoring/test_scoring.py --judge-model meta-llama/Llama-3.2-3B-Instruct
pytest -v -s -m basic_scoring_together_inference scoring/test_scoring.py
pytest -v -s -m braintrust_scoring_together_inference scoring/test_scoring.py

pytest -v -s -m meta_reference_eval_together_inference eval/test_eval.py
pytest -v -s -m meta_reference_eval_together_inference_huggingface_datasetio eval/test_eval.py
```

2025-01-06 13:25:09 -08:00
Botao Chen
e86271aeac
support llama3.1 8B instruct in post training (#698)
## What does this PR do? 
- Change to support the Llama 3.1 8B Instruct model instead of the Llama 3 8B model, since the 3.1 Instruct model is a better base to finetune on top of
- Make the file-copy logic in the checkpointer safer in case a file to be copied doesn't exist in the source path (see the sketch after this list)
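A minimal sketch of the safer copy logic, assuming a shutil-based checkpointer; the actual torchtune checkpointer code differs:

```python
import shutil
from pathlib import Path

def copy_if_exists(src: Path, dst_dir: Path) -> None:
    # Skip, rather than crash, when an optional file is missing from the
    # source checkpoint directory (hypothetical helper, for illustration).
    if not src.exists():
        return
    shutil.copy2(src, dst_dir / src.name)
```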

## Test
Issue a post-training request from the client and verify training works as expected.
<img width="1101" alt="Screenshot 2025-01-02 at 12 18 45 PM"
src="https://github.com/user-attachments/assets/47cc4df9-3edc-4afd-b5dd-abe1f039f1ed"
/>

<img width="782" alt="Screenshot 2025-01-02 at 12 18 52 PM"
src="https://github.com/user-attachments/assets/b9435274-ef1d-4570-bd8e-0880c3a4b2e9"
/>
2025-01-03 17:33:05 -08:00
Ashwin Bharambe
21357a6dee Kill autocomplete slop 2025-01-03 09:29:25 -08:00
Botao Chen
4320b0ebb2
[Post training] make validation steps configurable (#715)
## What does this PR do?
The current code hardcodes the number of validation steps to run (left over from testing). This PR makes it configurable via the training config (a hedged sketch follows below).
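A hedged illustration of what "configurable via the training config" means; the field name `max_validation_steps` is an assumption:

```python
from pydantic import BaseModel

class TrainingConfig(BaseModel):
    # Previously hardcoded; now read from the request's training config.
    max_validation_steps: int = 100

def run_validation(config: TrainingConfig) -> None:
    for _step in range(config.max_validation_steps):
        ...  # evaluate one validation batch
```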

## Test
On the client side, issue a post-training request with 20 validation steps; server-side logging shows that all 20 validation steps run successfully.
<img width="1128" alt="Screenshot 2025-01-02 at 8 21 06 PM"
src="https://github.com/user-attachments/assets/7a757516-c6ba-41d4-85c5-361a80ecf46e"
/>
2025-01-03 08:43:24 -08:00
Botao Chen
d9f75cc98f
Import from the right path (#708)
Import BaseModel and Field from pydantic
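The fix is presumably a one-line import correction along these lines (the "wrong" import shown is illustrative):

```python
# Wrong (illustrative): relying on a module that re-exported these names
# before the `import *` cleanup.
# from some_reexporting_module import BaseModel, Field

# Right: import directly from pydantic.
from pydantic import BaseModel, Field
```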
2025-01-02 13:15:31 -08:00
Botao Chen
750604c7af
[Post Training] Fix missing import (#705)
## Context
Post-training APIs were broken by the `import *` refactor in
https://github.com/meta-llama/llama-stack/pull/689. This PR adds the
missing import back.

## Test
Issue a post-training request from the client; the training finishes successfully.

<img width="1101" alt="Screenshot 2025-01-02 at 12 18 45 PM"
src="https://github.com/user-attachments/assets/8c781459-f340-4021-85e1-fc68b1dcb8c8"
/>

<img width="782" alt="Screenshot 2025-01-02 at 12 18 52 PM"
src="https://github.com/user-attachments/assets/14b04b7d-e5c7-4662-8fa6-748446ad3511"
/>
2025-01-02 13:08:20 -08:00
Xi Yan
3a269c4635
[rag evals] refactor & add ability to eval retrieval + generation in agentic eval pipeline (#664)
# What does this PR do?

- See https://github.com/meta-llama/llama-stack/pull/666 &
https://github.com/meta-llama/llama-stack/pull/668

- Refactor BaseScoringFn to be just a minimal interface; add a new
RegistrableBaseScoring
- Refactor the data schema check
- To separately evaluate the retrieval component in RAG, scoring
functions will additionally need a "context" column (see the sketch after this list)
- Refactor braintrust eval (more scoring fns added & tested in a
follow-up PR)
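A hedged sketch of a scoring-input row carrying the extra "context" column; the key names are illustrative, not the exact llama-stack schema:

```python
# One retrieval-eval example row: "context" holds the retrieved chunks so
# a scoring fn can judge retrieval separately from generation.
row = {
    "input_query": "What is the capital of France?",
    "expected_answer": "Paris",
    "generated_answer": "Paris",
    "context": "France's capital and largest city is Paris.",
}
```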

## Test Plan

```
pytest -v -s -m llm_as_judge_scoring_together_inference scoring/test_scoring.py --judge-model meta-llama/Llama-3.2-3B-Instruct
pytest -v -s -m basic_scoring_together_inference scoring/test_scoring.py
pytest -v -s -m braintrust_scoring_together_inference scoring/test_scoring.py
```

<img width="847" alt="image"
src="https://github.com/user-attachments/assets/d099cb2d-6f9c-4bdf-9d0d-f388cf758c0f"
/>

```
pytest -v -s -m meta_reference_eval_together_inference eval/test_eval.py
pytest -v -s -m meta_reference_eval_together_inference_huggingface_datasetio eval/test_eval.py
```
<img width="850" alt="image"
src="https://github.com/user-attachments/assets/dce28fc3-0493-4d34-820a-567260873cc8"
/>

2025-01-02 11:21:33 -08:00
Aidan Do
49ad168336
[#407] Agents: Avoid calling tools that haven't been explicitly enabled (#637)
# What does this PR do?

Contributes to issue (#407)

tl;dr: @subramen was getting a 500 error because llama-stack called
`code_interpreter` when it was never defined as a tool.

Prevents failures like:

<img width="544" alt="image"
src="https://github.com/user-attachments/assets/392683d2-4670-414c-aaba-07ebc006d748"
/>

```
# Server side
Traceback (most recent call last):
  File "/opt/conda/envs/llamastack-vllm-stack/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 206, in sse_generator
    async for item in await event_gen:
  File "/opt/conda/envs/llamastack-vllm-stack/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agents.py", line 138, in _create_agent_turn_streaming
    async for event in agent.create_and_execute_turn(request):
  File "/opt/conda/envs/llamastack-vllm-stack/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 179, in create_and_execute_turn
    async for chunk in self.run(
  File "/opt/conda/envs/llamastack-vllm-stack/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 252, in run
    async for res in self._run(
  File "/opt/conda/envs/llamastack-vllm-stack/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 560, in _run
    result_messages = await execute_tool_call_maybe(
  File "/opt/conda/envs/llamastack-vllm-stack/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 824, in execute_tool_call_maybe
    assert name in tools_dict, f"Tool {name} not found"
AssertionError: Tool code_interpreter not found
```

Instead, if the model hallucinates a tool call, we let it hallucinate and
let the client know (a sketch of the guard follows below).
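A hedged sketch of the guard, modeled on the `assert name in tools_dict` visible in the traceback above; the surrounding signatures are simplified:

```python
# Simplified from agent_instance.execute_tool_call_maybe: return an error
# payload for unknown tools instead of asserting (which produced the 500).
def execute_tool_call_maybe(name: str, args: dict, tools_dict: dict):
    if name not in tools_dict:
        return {"error": f"Tool {name} not found"}
    return tools_dict[name].run(**args)
```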

<img width="544" alt="image"
src="https://github.com/user-attachments/assets/d2418583-d45a-48db-b476-45a584f2986f"
/>

## Test Plan

<details>
<summary>pytest llama_stack/providers/tests/agents/test_agents.py -k ollama</summary>

```
llama stack build --template ollama --image-type conda 
conda activate llamastack-ollama
```

```
llama_stack/providers/tests/agents/test_agents.py ..Fss                                                                                          [100%]

======================================================================= FAILURES =======================================================================
_________________________________________ TestAgents.test_rag_agent_as_attachments[--ollama][ollama] __________________________________________
llama_stack/providers/tests/agents/test_agents.py:261: in test_rag_agent_as_attachments
    turn_response = [
llama_stack/providers/tests/agents/test_agents.py:261: in <listcomp>
    turn_response = [
llama_stack/providers/inline/agents/meta_reference/agents.py:153: in _create_agent_turn_streaming
    async for event in agent.create_and_execute_turn(request):
llama_stack/providers/inline/agents/meta_reference/agent_instance.py:179: in create_and_execute_turn
    async for chunk in self.run(
llama_stack/providers/inline/agents/meta_reference/agent_instance.py:250: in run
    async for res in self._run(
llama_stack/providers/inline/agents/meta_reference/agent_instance.py:363: in _run
    rag_context, bank_ids = await self._retrieve_context(
llama_stack/providers/inline/agents/meta_reference/agent_instance.py:698: in _retrieve_context
    bank_id = await self._ensure_memory_bank(session_id)
llama_stack/providers/inline/agents/meta_reference/agent_instance.py:653: in _ensure_memory_bank
    await self.memory_banks_api.register_memory_bank(
llama_stack/providers/utils/telemetry/trace_protocol.py:101: in async_wrapper
    result = await method(self, *args, **kwargs)
llama_stack/distribution/routers/routing_tables.py:312: in register_memory_bank
    raise ValueError(
E   ValueError: Embeddings are now served via Inference providers. Please upgrade your run.yaml to include inline::sentence-transformer as an additional inference provider. See https://github.com/meta-llama/llama-stack/blob/main/llama_stack/templates/together/run.yaml for an example.
=============================================================== short test summary info ================================================================
FAILED llama_stack/providers/tests/agents/test_agents.py::TestAgents::test_rag_agent_as_attachments[--ollama] - ValueError: Embeddings are now served via Inference providers. Please upgrade your run.yaml to include inline::sentence-transformer as an additiona...
========================================== 1 failed, 2 passed, 2 skipped, 20 deselected, 5 warnings in 14.24s ==========================================
```

An unrelated test is failing (it also fails on main)
</details>

<details>
<summary>Manual</summary>

Using this client code:
7ebc257b27/client.py

<img width="544" alt="Screenshot 2024-12-16 at 17 41 31"
src="https://github.com/user-attachments/assets/7425deaf-c94a-4dda-a635-922728e373f1"
/>

</details>

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-02 09:21:35 -08:00
Xi Yan
3c72c034e6
[remove import *] clean up import *'s (#689)
# What does this PR do?

- as the title says, cleaning up `import *`'s
- upgrade tests to make them more robust to bad model outputs
- remove `import *`'s in llama_stack/apis/* (skip __init__ modules); see the sketch after this list
![image](https://github.com/user-attachments/assets/d8339c13-3b40-4ba5-9c53-0d2329726ee2)
- run `sh run_openapi_generator.sh`; no types get affected
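An illustrative before/after for the cleanup; the module and imported names are representative examples, not an exhaustive diff:

```python
# Before: a wildcard import pulls an unknown set of names into scope.
# from llama_stack.apis.inference import *

# After: each consumer names exactly what it uses.
from llama_stack.apis.inference import ChatCompletionRequest, Inference
```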

## Test Plan

### Providers Tests

**agents**
```
pytest -v -s llama_stack/providers/tests/agents/test_agents.py -m "together" --safety-shield meta-llama/Llama-Guard-3-8B --inference-model meta-llama/Llama-3.1-405B-Instruct-FP8
```

**inference**
```bash
# meta-reference
torchrun $CONDA_PREFIX/bin/pytest -v -s -k "meta_reference" --inference-model="meta-llama/Llama-3.1-8B-Instruct" ./llama_stack/providers/tests/inference/test_text_inference.py
torchrun $CONDA_PREFIX/bin/pytest -v -s -k "meta_reference" --inference-model="meta-llama/Llama-3.2-11B-Vision-Instruct" ./llama_stack/providers/tests/inference/test_vision_inference.py

# together
pytest -v -s -k "together" --inference-model="meta-llama/Llama-3.1-8B-Instruct" ./llama_stack/providers/tests/inference/test_text_inference.py
pytest -v -s -k "together" --inference-model="meta-llama/Llama-3.2-11B-Vision-Instruct" ./llama_stack/providers/tests/inference/test_vision_inference.py

pytest ./llama_stack/providers/tests/inference/test_prompt_adapter.py 
```

**safety**
```
pytest -v -s llama_stack/providers/tests/safety/test_safety.py -m together --safety-shield meta-llama/Llama-Guard-3-8B
```

**memory**
```
pytest -v -s llama_stack/providers/tests/memory/test_memory.py -m "sentence_transformers" --env EMBEDDING_DIMENSION=384
```

**scoring**
```
pytest -v -s -m llm_as_judge_scoring_together_inference llama_stack/providers/tests/scoring/test_scoring.py --judge-model meta-llama/Llama-3.2-3B-Instruct
pytest -v -s -m basic_scoring_together_inference llama_stack/providers/tests/scoring/test_scoring.py
pytest -v -s -m braintrust_scoring_together_inference llama_stack/providers/tests/scoring/test_scoring.py
```


**datasetio**
```
pytest -v -s -m localfs llama_stack/providers/tests/datasetio/test_datasetio.py
pytest -v -s -m huggingface llama_stack/providers/tests/datasetio/test_datasetio.py
```


**eval**
```
pytest -v -s -m meta_reference_eval_together_inference llama_stack/providers/tests/eval/test_eval.py
pytest -v -s -m meta_reference_eval_together_inference_huggingface_datasetio llama_stack/providers/tests/eval/test_eval.py
```

### Client-SDK Tests
```
LLAMA_STACK_BASE_URL=http://localhost:5000 pytest -v ./tests/client-sdk
```

### llama-stack-apps
```
PORT=5000
LOCALHOST=localhost

python -m examples.agents.hello $LOCALHOST $PORT
python -m examples.agents.inflation $LOCALHOST $PORT
python -m examples.agents.podcast_transcript $LOCALHOST $PORT
python -m examples.agents.rag_as_attachments $LOCALHOST $PORT
python -m examples.agents.rag_with_memory_bank $LOCALHOST $PORT
python -m examples.safety.llama_guard_demo_mm $LOCALHOST $PORT
python -m examples.agents.e2e_loop_with_custom_tools $LOCALHOST $PORT

# Vision model
python -m examples.interior_design_assistant.app
python -m examples.agent_store.app $LOCALHOST $PORT
```

### CLI
```
which llama
llama model prompt-format -m Llama3.2-11B-Vision-Instruct
llama model list
llama stack list-apis
llama stack list-providers inference

llama stack build --template ollama --image-type conda
```

### Distributions Tests
**ollama**
```
llama stack build --template ollama --image-type conda
ollama run llama3.2:1b-instruct-fp16
llama stack run ./llama_stack/templates/ollama/run.yaml --env INFERENCE_MODEL=meta-llama/Llama-3.2-1B-Instruct
```

**fireworks**
```
llama stack build --template fireworks --image-type conda
llama stack run ./llama_stack/templates/fireworks/run.yaml
```

**together**
```
llama stack build --template together --image-type conda
llama stack run ./llama_stack/templates/together/run.yaml
```

**tgi**
```
llama stack run ./llama_stack/templates/tgi/run.yaml --env TGI_URL=http://0.0.0.0:5009 --env INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
```

2024-12-27 15:45:44 -08:00
Botao Chen
bae197c37e
Fix post training apis broken by torchtune release (#674)
There was a torchtune release this morning
(https://github.com/pytorch/torchtune/releases/tag/v0.5.0) that breaks the
post-training APIs.
## Test
Spun up the server; post training works again after the fix.
<img width="1314" alt="Screenshot 2024-12-20 at 4 08 54 PM"
src="https://github.com/user-attachments/assets/dfae724d-ebf0-4846-9715-096efa060cee"
/>


## Note
We need to think hard about how to avoid this happening again, and do a
fast follow-up after the holidays; one possible mitigation is sketched below.
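One possible mitigation, as an assumption rather than this repo's actual fix: fail fast at startup when an untested torchtune version is installed, instead of breaking at request time after an upstream release:

```python
from importlib.metadata import version

# Hypothetical guard: the supported version range is an assumption.
installed = version("torchtune")
if not installed.startswith("0.4."):
    raise RuntimeError(
        f"torchtune {installed} is untested with the post-training provider; "
        "pin torchtune<0.5.0 or update the integration."
    )
```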
2024-12-20 16:12:02 -08:00
Botao Chen
06cb0c837e
[torchtune integration] post training + eval (#670)
## What does this PR do?

- Add the related APIs to the experimental-post-training template to enable
eval on the finetuned checkpoint in the template
- A small bug fix on meta reference eval
- A small error-handling improvement on post training


## Test Plan
From the client side, issued an E2E post-training request
(https://github.com/meta-llama/llama-stack-client-python/pull/70) and got
eval results successfully.

<img width="1315" alt="Screenshot 2024-12-20 at 12 06 59 PM"
src="https://github.com/user-attachments/assets/a09bd524-59ae-490c-908f-2e36ccf27c0a"
/>
2024-12-20 13:43:13 -08:00
Dinesh Yeduguru
c8be0bf1c9
Tools API with brave and MCP providers (#639)
This PR adds a new Tools API and two tool runtime providers: Brave and MCP.

Test plan:
```
curl -X POST 'http://localhost:5000/alpha/toolgroups/register' \
-H 'Content-Type: application/json' \
-d '{ "tool_group_id": "simple_tool",
  "tool_group": {
    "type": "model_context_protocol",
    "endpoint": {"uri": "http://localhost:56000/sse"}
  },
  "provider_id": "model-context-protocol"
}'

 curl -X POST 'http://localhost:5000/alpha/toolgroups/register' \
-H 'Content-Type: application/json' \
-d '{
  "tool_group_id": "search", "provider_id": "brave-search",
  "tool_group": {
    "type": "user_defined",
    "tools": [
      {
        "name": "brave_search",
        "description": "A web search tool",
        "parameters": [
          {
            "name": "query",
            "parameter_type": "string",
            "description": "The query to search"
          }
        ],
        "metadata": {},
        "tool_prompt_format": "json"
      }
    ]
  }
}'

 curl -X GET http://localhost:5000/alpha/tools/list | jq .
[
  {
    "identifier": "brave_search",
    "provider_resource_id": "brave_search",
    "provider_id": "brave-search",
    "type": "tool",
    "tool_group": "search",
    "description": "A web search tool",
    "parameters": [
      {
        "name": "query",
        "parameter_type": "string",
        "description": "The query to search"
      }
    ],
    "metadata": {},
    "tool_prompt_format": "json"
  },
  {
    "identifier": "fetch",
    "provider_resource_id": "fetch",
    "provider_id": "model-context-protocol",
    "type": "tool",
    "tool_group": "simple_tool",
    "description": "Fetches a website and returns its content",
    "parameters": [
      {
        "name": "url",
        "parameter_type": "string",
        "description": "URL to fetch"
      }
    ],
    "metadata": {
      "endpoint": "http://localhost:56000/sse"
    },
    "tool_prompt_format": "json"
  }
]

curl -X POST 'http://localhost:5000/alpha/tool-runtime/invoke' \
-H 'Content-Type: application/json' \
-d '{
    "tool_name": "fetch",
    "args": {
        "url": "http://google.com/"
    }
}'

 curl -X POST 'http://localhost:5000/alpha/tool-runtime/invoke' \
-H 'Content-Type: application/json' -H 'X-LlamaStack-ProviderData: {"api_key": "<KEY>"}' \
-d '{
    "tool_name": "brave_search",
    "args": {
        "query": "who is meta ceo"
    }
}'
```
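For reference, the first registration call above expressed with Python's `requests` instead of curl; the endpoint and payload are taken verbatim from the test plan:

```python
import requests

# Register the MCP toolgroup against a locally running stack.
resp = requests.post(
    "http://localhost:5000/alpha/toolgroups/register",
    json={
        "tool_group_id": "simple_tool",
        "tool_group": {
            "type": "model_context_protocol",
            "endpoint": {"uri": "http://localhost:56000/sse"},
        },
        "provider_id": "model-context-protocol",
    },
    timeout=30,
)
resp.raise_for_status()
```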
2024-12-19 21:25:17 -08:00
Ashwin Bharambe
540fc4d717
Fix Meta reference GPU implementation (#663)
By performing in-place mutations, we lost. Never do that.
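The message is terse; the general hazard, as an illustration unrelated to the actual diff:

```python
# Illustrative only -- not the actual GPU-impl change. Mutating a shared
# structure in place leaks the change to every holder of the reference.
def bad(tokens: list) -> list:
    tokens.append(0)      # mutates the caller's list
    return tokens

def good(tokens: list) -> list:
    out = list(tokens)    # copy first, then modify
    out.append(0)
    return out
```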
2024-12-19 14:09:45 -08:00
Ashwin Bharambe
f19eb8eee3 Update types in parallel_utils for meta-reference-gpu impl 2024-12-19 13:58:41 -08:00
Xi Yan
5be2ea37b1 fix context_retriever model->model_id 2024-12-19 12:52:00 -08:00
Dinesh Yeduguru
03607a68c7
remove unused telemetry related code for console (#659)
# What does this PR do?
Remove unused code since this now exists in the meta reference provider
as a sink


## Test Plan

llama stack run ~/.llama/distributions/llamastack-together/together-run.yaml
2024-12-19 11:21:11 -08:00