llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-22 22:22:32 +00:00

Author	SHA1	Message	Date
Dinesh Yeduguru	c3865faf37	minor fixes	2025-01-08 18:25:51 -08:00
Dinesh Yeduguru	6632d7e410	fix list tools method name	2025-01-08 18:25:51 -08:00
Dinesh Yeduguru	94cca7a72a	add wolfram alpha, bing search	2025-01-08 18:25:21 -08:00
Dinesh Yeduguru	f9a98c278a	simplify toolgroups registration	2025-01-08 18:25:21 -08:00
Dinesh Yeduguru	ba242c04cc	remove memory from available tools to agent	2025-01-08 18:25:21 -08:00
Dinesh Yeduguru	e3775eb6f6	rename UserDefinedToolDef to ToolDef	2025-01-08 18:25:21 -08:00
Dinesh Yeduguru	db0b2a60c1	remove breakpoints	2025-01-08 18:25:21 -08:00
Dinesh Yeduguru	17abffb505	fix handle_docs	2025-01-08 18:25:21 -08:00
Dinesh Yeduguru	9efe30c9d3	add documents to turn	2025-01-08 18:25:21 -08:00
Dinesh Yeduguru	d0e8e1647b	add matplotlib_custom_backend.py	2025-01-08 18:25:21 -08:00
Dinesh Yeduguru	229999c572	add init.py	2025-01-08 18:25:21 -08:00
Dinesh Yeduguru	0bc876c130	minor fixes to agent instance	2025-01-08 18:25:21 -08:00
Dinesh Yeduguru	16d1f66f55	address feedback	2025-01-08 18:25:21 -08:00
Dinesh Yeduguru	ac46bd5eb4	address feedback	2025-01-08 18:25:21 -08:00
Dinesh Yeduguru	70b2a58bef	linter fixes	2025-01-08 18:25:20 -08:00
Dinesh Yeduguru	9a3d7fa33c	rebase fixes	2025-01-08 18:25:20 -08:00
Dinesh Yeduguru	f408fd3aca	remove attachments, move memory bank to tool metadata	2025-01-08 18:25:20 -08:00
Dinesh Yeduguru	97798c8442	add a RAG test to client SDK	2025-01-08 18:25:20 -08:00
Dinesh Yeduguru	c76f5f418f	move brave and tavily to remote	2025-01-08 18:25:20 -08:00
Dinesh Yeduguru	4dd2f4c363	working end to end client sdk tests with custom tools	2025-01-08 18:25:20 -08:00
Dinesh Yeduguru	1a66ddc1b5	add support for built in tool type	2025-01-08 18:25:20 -08:00
Dinesh Yeduguru	40f35f3a8d	add code interpreter	2025-01-08 18:25:20 -08:00
Dinesh Yeduguru	2ad67529ef	fix agents to run custom tools	2025-01-08 18:24:53 -08:00
Dinesh Yeduguru	9192a9bbb4	add tavily	2025-01-08 18:24:53 -08:00
Dinesh Yeduguru	dcdf9da6ef	remove all usages of builtin tools in agents	2025-01-08 18:24:53 -08:00
Dinesh Yeduguru	f90e9c2003	agents to use tools api	2025-01-08 18:24:53 -08:00
Xi Yan	7a90fc5854	move DataSchemaValidatorMixin into standalone utils (#720) # What does this PR do? - there's no value in keeping data schema validation logic in a DataSchemaValidatorMixin - move the data schema validation logic into standalone utils ## Test Plan ``` pytest -v -s -m llm_as_judge_scoring_together_inference scoring/test_scoring.py --judge-model meta-llama/Llama-3.2-3B-Instruct pytest -v -s -m basic_scoring_together_inference scoring/test_scoring.py pytest -v -s -m braintrust_scoring_together_inference scoring/test_scoring.py pytest -v -s -m meta_reference_eval_together_inference eval/test_eval.py pytest -v -s -m meta_reference_eval_together_inference_huggingface_datasetio eval/test_eval.py ``` ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-06 13:25:09 -08:00
Botao Chen	e86271aeac	support llama3.1 8B instruct in post training (#698) ## What does this PR do? - Support a model other than llama3 8B, namely llama3.1 8B instruct, which is a better model to finetune on top of - Make the file-copy logic in the checkpointer safer in case the file to be copied doesn't exist in the source path ## test Issue a post training request from the client and verify that training works as expected <img width="1101" alt="Screenshot 2025-01-02 at 12 18 45 PM" src="https://github.com/user-attachments/assets/47cc4df9-3edc-4afd-b5dd-abe1f039f1ed" /> <img width="782" alt="Screenshot 2025-01-02 at 12 18 52 PM" src="https://github.com/user-attachments/assets/b9435274-ef1d-4570-bd8e-0880c3a4b2e9" />	2025-01-03 17:33:05 -08:00
Ashwin Bharambe	21357a6dee	Kill autocomplete slop	2025-01-03 09:29:25 -08:00
Botao Chen	4320b0ebb2	[Post training] make validation steps configurable (#715) ## what does this PR do? The current code hardcodes the number of validation steps to run (forgot to change it after testing). In this PR, we make it configurable via the training config ## test On the client side, issue a post training request with 20 validation steps; server-side logging shows that it runs 20 validation steps successfully <img width="1128" alt="Screenshot 2025-01-02 at 8 21 06 PM" src="https://github.com/user-attachments/assets/7a757516-c6ba-41d4-85c5-361a80ecf46e" />	2025-01-03 08:43:24 -08:00
Botao Chen	d9f75cc98f	Import from the right path (#708 ) Import BaseModel and Field from pydantic	2025-01-02 13:15:31 -08:00
Botao Chen	750604c7af	[Post Training] Fix missing import (#705 ) ## context Post training apis are broken after the import * refactor https://github.com/meta-llama/llama-stack/pull/689. This PR is adding the missing import back ## Test Issue a post training request from client and the training finishes successfully <img width="1101" alt="Screenshot 2025-01-02 at 12 18 45 PM" src="https://github.com/user-attachments/assets/8c781459-f340-4021-85e1-fc68b1dcb8c8" /> <img width="782" alt="Screenshot 2025-01-02 at 12 18 52 PM" src="https://github.com/user-attachments/assets/14b04b7d-e5c7-4662-8fa6-748446ad3511" />	2025-01-02 13:08:20 -08:00
Xi Yan	3a269c4635	[rag evals] refactor & add ability to eval retrieval + generation in agentic eval pipeline (#664 ) # What does this PR do? - See https://github.com/meta-llama/llama-stack/pull/666 & https://github.com/meta-llama/llama-stack/pull/668 - Refactor BaseScoringFn to be just a minimal interface, add new RegistrableBaseScoring - Refactor data schema check - To separately evaluate retrieval component in RAG, we will have scoring functions needing "context" column additionally. - Refactor braintrust eval (more scoring fn added & tested in following PR) ## Test Plan ``` pytest -v -s -m llm_as_judge_scoring_together_inference scoring/test_scoring.py --judge-model meta-llama/Llama-3.2-3B-Instruct pytest -v -s -m basic_scoring_together_inference scoring/test_scoring.py pytest -v -s -m braintrust_scoring_together_inference scoring/test_scoring.py ``` <img width="847" alt="image" src="https://github.com/user-attachments/assets/d099cb2d-6f9c-4bdf-9d0d-f388cf758c0f" /> ``` pytest -v -s -m meta_reference_eval_together_inference eval/test_eval.py pytest -v -s -m meta_reference_eval_together_inference_huggingface_datasetio eval/test_eval.py ``` <img width="850" alt="image" src="https://github.com/user-attachments/assets/dce28fc3-0493-4d34-820a-567260873cc8" /> ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-02 11:21:33 -08:00
Aidan Do	49ad168336	[#407 ] Agents: Avoid calling tools that haven't been explicitly enabled (#637 ) # What does this PR do? Contributes to issue (#407) tl;dr - @subramen was getting a 500 error because llama-stack called code_interpreter when it never was defined as a tool. Prevents failures like: <img width="544" alt="image" src="https://github.com/user-attachments/assets/392683d2-4670-414c-aaba-07ebc006d748" /> ``` # Server side Traceback (most recent call last): File "/opt/conda/envs/llamastack-vllm-stack/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 206, in sse_generator async for item in await event_gen: File "/opt/conda/envs/llamastack-vllm-stack/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agents.py", line 138, in _create_agent_turn_streaming async for event in agent.create_and_execute_turn(request): File "/opt/conda/envs/llamastack-vllm-stack/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 179, in create_and_execute_turn async for chunk in self.run( File "/opt/conda/envs/llamastack-vllm-stack/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 252, in run async for res in self._run( File "/opt/conda/envs/llamastack-vllm-stack/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 560, in _run result_messages = await execute_tool_call_maybe( File "/opt/conda/envs/llamastack-vllm-stack/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 824, in execute_tool_call_maybe assert name in tools_dict, f"Tool {name} not found" AssertionError: Tool code_interpreter not found ``` Instead, if the model hallucinates, we just let it hallucinate and let the client know. 
<img width="544" alt="image" src="https://github.com/user-attachments/assets/d2418583-d45a-48db-b476-45a584f2986f" /> ## Test Plan <details> <summary>pytest llama_stack/providers/tests/agents/test_agents.py -k ollama</summary> ``` llama stack build --template ollama --image-type conda conda activate llamastack-ollama ``` ``` llama_stack/providers/tests/agents/test_agents.py ..Fss [100%] ======================================================================= FAILURES ======================================================================= _________________________________________ TestAgents.test_rag_agent_as_attachments[--ollama][ollama] __________________________________________ llama_stack/providers/tests/agents/test_agents.py:261: in test_rag_agent_as_attachments turn_response = [ llama_stack/providers/tests/agents/test_agents.py:261: in <listcomp> turn_response = [ llama_stack/providers/inline/agents/meta_reference/agents.py:153: in _create_agent_turn_streaming async for event in agent.create_and_execute_turn(request): llama_stack/providers/inline/agents/meta_reference/agent_instance.py:179: in create_and_execute_turn async for chunk in self.run( llama_stack/providers/inline/agents/meta_reference/agent_instance.py:250: in run async for res in self._run( llama_stack/providers/inline/agents/meta_reference/agent_instance.py:363: in _run rag_context, bank_ids = await self._retrieve_context( llama_stack/providers/inline/agents/meta_reference/agent_instance.py:698: in _retrieve_context bank_id = await self._ensure_memory_bank(session_id) llama_stack/providers/inline/agents/meta_reference/agent_instance.py:653: in _ensure_memory_bank await self.memory_banks_api.register_memory_bank( llama_stack/providers/utils/telemetry/trace_protocol.py:101: in async_wrapper result = await method(self, args, *kwargs) llama_stack/distribution/routers/routing_tables.py:312: in register_memory_bank raise ValueError( E ValueError: Embeddings are now served via Inference providers. 
Please upgrade your run.yaml to include inline::sentence-transformer as an additional inference provider. See https://github.com/meta-llama/llama-stack/blob/main/llama_stack/templates/together/run.yaml for an example. =============================================================== short test summary info ================================================================ FAILED llama_stack/providers/tests/agents/test_agents.py::TestAgents::test_rag_agent_as_attachments[--ollama] - ValueError: Embeddings are now served via Inference providers. Please upgrade your run.yaml to include inline::sentence-transformer as an additiona... ========================================== 1 failed, 2 passed, 2 skipped, 20 deselected, 5 warnings in 14.24s ========================================== ``` Unrelated test is failing (also failing on main) </details> <details> <summary>Manual</summary> Using this client code: `7ebc257b27/client.py` <img width="544" alt="Screenshot 2024-12-16 at 17 41 31" src="https://github.com/user-attachments/assets/7425deaf-c94a-4dda-a635-922728e373f1" /> </details> ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-02 09:21:35 -08:00
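The behavior described in the commit above (skip a tool call the model asks for when that tool was never enabled, instead of failing with an assertion) can be sketched roughly as follows. This is a simplified illustration under stated assumptions: `ToolOutcome` and `run_tool_if_enabled` are hypothetical names and do not mirror the actual code in `agent_instance.py`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ToolOutcome:
    # Hypothetical result type: `executed` is False when the tool was not enabled.
    executed: bool
    result: Any = None


def run_tool_if_enabled(
    name: str,
    args: Dict[str, Any],
    enabled_tools: Dict[str, Callable[..., Any]],
) -> ToolOutcome:
    """Run `name` only if it was explicitly enabled; never raise for unknown tools."""
    tool = enabled_tools.get(name)
    if tool is None:
        # The model asked for a tool that was never configured. Rather than
        # asserting (which surfaced as a 500 error), report that nothing ran
        # so the caller can pass the model output back to the client.
        return ToolOutcome(executed=False)
    return ToolOutcome(executed=True, result=tool(**args))
```

With this shape, the caller decides what to show the client when `executed` is False, which matches the intent stated in the commit: let the model hallucinate and let the client know.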
Xi Yan	3c72c034e6	[remove import *] clean up import *'s (#689) # What does this PR do? - as title, cleaning up `import *`'s - upgrade tests to make them more robust to bad model outputs - remove import *'s in llama_stack/apis/* (skip __init__ modules) <img width="465" alt="image" src="https://github.com/user-attachments/assets/d8339c13-3b40-4ba5-9c53-0d2329726ee2" /> - run `sh run_openapi_generator.sh`, no types get affected ## Test Plan ### Providers Tests agents ``` pytest -v -s llama_stack/providers/tests/agents/test_agents.py -m "together" --safety-shield meta-llama/Llama-Guard-3-8B --inference-model meta-llama/Llama-3.1-405B-Instruct-FP8 ``` inference ```bash # meta-reference torchrun $CONDA_PREFIX/bin/pytest -v -s -k "meta_reference" --inference-model="meta-llama/Llama-3.1-8B-Instruct" ./llama_stack/providers/tests/inference/test_text_inference.py torchrun $CONDA_PREFIX/bin/pytest -v -s -k "meta_reference" --inference-model="meta-llama/Llama-3.2-11B-Vision-Instruct" ./llama_stack/providers/tests/inference/test_vision_inference.py # together pytest -v -s -k "together" --inference-model="meta-llama/Llama-3.1-8B-Instruct" ./llama_stack/providers/tests/inference/test_text_inference.py pytest -v -s -k "together" --inference-model="meta-llama/Llama-3.2-11B-Vision-Instruct" ./llama_stack/providers/tests/inference/test_vision_inference.py pytest ./llama_stack/providers/tests/inference/test_prompt_adapter.py ``` safety ``` pytest -v -s llama_stack/providers/tests/safety/test_safety.py -m together --safety-shield meta-llama/Llama-Guard-3-8B ``` memory ``` pytest -v -s llama_stack/providers/tests/memory/test_memory.py -m "sentence_transformers" --env EMBEDDING_DIMENSION=384 ``` scoring ``` pytest -v -s -m llm_as_judge_scoring_together_inference llama_stack/providers/tests/scoring/test_scoring.py --judge-model meta-llama/Llama-3.2-3B-Instruct pytest -v -s -m basic_scoring_together_inference llama_stack/providers/tests/scoring/test_scoring.py pytest -v -s -m 
braintrust_scoring_together_inference llama_stack/providers/tests/scoring/test_scoring.py ``` datasetio ``` pytest -v -s -m localfs llama_stack/providers/tests/datasetio/test_datasetio.py pytest -v -s -m huggingface llama_stack/providers/tests/datasetio/test_datasetio.py ``` eval ``` pytest -v -s -m meta_reference_eval_together_inference llama_stack/providers/tests/eval/test_eval.py pytest -v -s -m meta_reference_eval_together_inference_huggingface_datasetio llama_stack/providers/tests/eval/test_eval.py ``` ### Client-SDK Tests ``` LLAMA_STACK_BASE_URL=http://localhost:5000 pytest -v ./tests/client-sdk ``` ### llama-stack-apps ``` PORT=5000 LOCALHOST=localhost python -m examples.agents.hello $LOCALHOST $PORT python -m examples.agents.inflation $LOCALHOST $PORT python -m examples.agents.podcast_transcript $LOCALHOST $PORT python -m examples.agents.rag_as_attachments $LOCALHOST $PORT python -m examples.agents.rag_with_memory_bank $LOCALHOST $PORT python -m examples.safety.llama_guard_demo_mm $LOCALHOST $PORT python -m examples.agents.e2e_loop_with_custom_tools $LOCALHOST $PORT # Vision model python -m examples.interior_design_assistant.app python -m examples.agent_store.app $LOCALHOST $PORT ``` ### CLI ``` which llama llama model prompt-format -m Llama3.2-11B-Vision-Instruct llama model list llama stack list-apis llama stack list-providers inference llama stack build --template ollama --image-type conda ``` ### Distributions Tests ollama ``` llama stack build --template ollama --image-type conda ollama run llama3.2:1b-instruct-fp16 llama stack run ./llama_stack/templates/ollama/run.yaml --env INFERENCE_MODEL=meta-llama/Llama-3.2-1B-Instruct ``` fireworks ``` llama stack build --template fireworks --image-type conda llama stack run ./llama_stack/templates/fireworks/run.yaml ``` together ``` llama stack build --template together --image-type conda llama stack run ./llama_stack/templates/together/run.yaml ``` tgi ``` llama stack run ./llama_stack/templates/tgi/run.yaml 
--env TGI_URL=http://0.0.0.0:5009 --env INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct ``` ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-12-27 15:45:44 -08:00
Botao Chen	bae197c37e	Fix post training apis broken by torchtune release (#674) A torchtune release this morning, https://github.com/pytorch/torchtune/releases/tag/v0.5.0, broke the post training apis ## test After the fix, spinning up the server and post training work again <img width="1314" alt="Screenshot 2024-12-20 at 4 08 54 PM" src="https://github.com/user-attachments/assets/dfae724d-ebf0-4846-9715-096efa060cee" /> ## Note We need to think hard about how to avoid this happening again and have a fast follow-up on this after the holidays	2024-12-20 16:12:02 -08:00
Botao Chen	06cb0c837e	[torchtune integration] post training + eval (#670) ## What does this PR do? - Add related APIs to the experimental-post-training template to enable eval on the finetuned checkpoint in the template - A small bug fix in meta reference eval - A small error handling improvement in post training ## Test Plan Issued an E2E post training request from the client side (https://github.com/meta-llama/llama-stack-client-python/pull/70) and got eval results successfully <img width="1315" alt="Screenshot 2024-12-20 at 12 06 59 PM" src="https://github.com/user-attachments/assets/a09bd524-59ae-490c-908f-2e36ccf27c0a" />	2024-12-20 13:43:13 -08:00
Dinesh Yeduguru	c8be0bf1c9	Tools API with brave and MCP providers (#639 ) This PR adds a new Tools api and adds two tool runtime providers: brave and MCP. Test plan: ``` curl -X POST 'http://localhost:5000/alpha/toolgroups/register' \ -H 'Content-Type: application/json' \ -d '{ "tool_group_id": "simple_tool", "tool_group": { "type": "model_context_protocol", "endpoint": {"uri": "http://localhost:56000/sse"} }, "provider_id": "model-context-protocol" }' curl -X POST 'http://localhost:5000/alpha/toolgroups/register' \ -H 'Content-Type: application/json' \ -d '{ "tool_group_id": "search", "provider_id": "brave-search", "tool_group": { "type": "user_defined", "tools": [ { "name": "brave_search", "description": "A web search tool", "parameters": [ { "name": "query", "parameter_type": "string", "description": "The query to search" } ], "metadata": {}, "tool_prompt_format": "json" } ] } }' curl -X GET http://localhost:5000/alpha/tools/list \| jq . % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 662 100 662 0 0 333k 0 --:--:-- --:--:-- --:--:-- 646k [ { "identifier": "brave_search", "provider_resource_id": "brave_search", "provider_id": "brave-search", "type": "tool", "tool_group": "search", "description": "A web search tool", "parameters": [ { "name": "query", "parameter_type": "string", "description": "The query to search" } ], "metadata": {}, "tool_prompt_format": "json" }, { "identifier": "fetch", "provider_resource_id": "fetch", "provider_id": "model-context-protocol", "type": "tool", "tool_group": "simple_tool", "description": "Fetches a website and returns its content", "parameters": [ { "name": "url", "parameter_type": "string", "description": "URL to fetch" } ], "metadata": { "endpoint": "http://localhost:56000/sse" }, "tool_prompt_format": "json" } ] curl -X POST 'http://localhost:5000/alpha/tool-runtime/invoke' \ -H 'Content-Type: application/json' \ -d '{ "tool_name": "fetch", "args": { "url": 
"http://google.com/" } }' curl -X POST 'http://localhost:5000/alpha/tool-runtime/invoke' \ -H 'Content-Type: application/json' -H 'X-LlamaStack-ProviderData: {"api_key": "<KEY>"}' \ -d '{ "tool_name": "brave_search", "args": { "query": "who is meta ceo" } }' ```	2024-12-19 21:25:17 -08:00
Ashwin Bharambe	540fc4d717	Fix Meta reference GPU implementation (#663 ) By performing in-place mutations, we lost. Never in life do that.	2024-12-19 14:09:45 -08:00
Ashwin Bharambe	f19eb8eee3	Update types in parallel_utils for meta-reference-gpu impl	2024-12-19 13:58:41 -08:00
Xi Yan	5be2ea37b1	fix context_retriever model->model_id	2024-12-19 12:52:00 -08:00
Dinesh Yeduguru	03607a68c7	remove unused telemetry related code for console (#659 ) # What does this PR do? Remove unused code since this now exists in the meta reference provider as a sink ## Test Plan llama stack run ~/.llama/distributions/llamastack-together/together-run.yaml	2024-12-19 11:21:11 -08:00
Botao Chen	36b4fe02cc	[4/n][torchtune integration] support lazy load model during inference (#620) ## What does this PR do? In this PR, we refactor the meta reference inference logic to - load the model when it is registered instead of when the server spins up - support inference on a finetuned model checkpoint on top of a native llama model ## Why need these changes To solve the existing pain points: - users cannot lazy load the model and hot switch the inference checkpoint after spinning up the server - this blocks us from doing inference and eval on the same server for a finetuned checkpoint after post training - users cannot do inference on a finetuned checkpoint on top of native llama models ## Expected user experience change - The inference model won't be loaded when spinning up the server. Instead, it will be loaded when the model is registered. If the user adds the model as a models resource in run.yaml, it will be registered and loaded automatically when starting the server. There is an optional flag 'skip_initialize' in model metadata to skip model loading during registration. - There is an optional flag 'llama_model' in model metadata to identify the base model of the Model class for validation and to initialize the model arch. 
model identifier no longer needs to be a native llama model - the default inference model name updates from 'meta-llama/Llama-3.2-3B-Instruct' to 'Llama3.2-3B-Instruct' - It aligns with the checkpoint folder name after running 'llama model download' - It aligns with the descriptor name defined in llama-models SKU list `bf5b0c4fe7/models/datatypes.py (L95)` ## test run python llama_stack/scripts/distro_codegen.py run unit test - torchrun $CONDA_PREFIX/bin/pytest -v -s -k "meta_reference" --inference-model="Llama3.1-8B-Instruct" ./llama_stack/providers/tests/inference/test_text_inference.py - torchrun $CONDA_PREFIX/bin/pytest -v -s -k "meta_reference" --inference-model="Llama3.1-8B-Instruct" ./llama_stack/providers/tests/inference/test_model_registration.py test post training experience on server side run: llama stack run llama_stack/templates/experimental-post-training/run.yaml server is spinning up without model loaded <img width="812" alt="Screenshot 2024-12-17 at 1 24 50 PM" src="https://github.com/user-attachments/assets/ce1f606b-3b6f-452f-b48e-b3761ffd90f3" /> on client side, run: llama-stack-client --endpoint http://devgpu018.nha2.facebook.com:5000 models register Llama3.2-3B-Instruct register model successfully and the model is loaded <img width="1111" alt="Screenshot 2024-12-17 at 1 26 30 PM" src="https://github.com/user-attachments/assets/56e02131-cf7d-4de5-8f63-fbdcb8c55c26" /> <img width="1541" alt="Screenshot 2024-12-17 at 1 26 09 PM" src="https://github.com/user-attachments/assets/a83255a1-20f5-40a2-af51-55641410a115" /> if add "skip_initialize" in metadata, model is registered but isn't loaded on client side, run: llama-stack-client --endpoint http://devgpu018.nha2.facebook.com:5000 inference chat-completion --message "hello, what model are you?" 
Inference on the model succeeds <img width="1121" alt="Screenshot 2024-12-17 at 1 27 33 PM" src="https://github.com/user-attachments/assets/8e708545-3fe7-4a73-8754-1470fa5f1e75" /> test inference experience run: llama stack run llama_stack/templates/meta-reference-gpu/run.yaml model is loaded since the model is in the resource list in run.yaml <img width="1537" alt="Screenshot 2024-12-17 at 1 30 19 PM" src="https://github.com/user-attachments/assets/5c8af817-66eb-43f8-bf4c-f5e24b0a12c6" /> on client side, run: llama-stack-client --endpoint http://devgpu018.nha2.facebook.com:5000 inference chat-completion --message "hello, what model are you?" inference succeeds <img width="1123" alt="Screenshot 2024-12-17 at 1 31 08 PM" src="https://github.com/user-attachments/assets/471809aa-c65e-46dc-a37e-7094fb857f97" /> ## inference on a finetuned model register a model that was finetuned by the post training api (torchtune) - the model is registered and loaded successfully - the model shows up in the model list <img width="974" alt="Screenshot 2024-12-18 at 3 56 33 PM" src="https://github.com/user-attachments/assets/2994b4f5-4fa9-40c6-acc6-4b971479f3e2" /> run inference <img width="977" alt="Screenshot 2024-12-18 at 3 57 59 PM" src="https://github.com/user-attachments/assets/d117abbc-b2a0-41d8-a028-1a13128787b2" />	2024-12-18 16:30:53 -08:00
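The lazy-loading behavior described in the commit above (nothing is loaded at server start; a model is loaded when it is registered, unless `skip_initialize` is set in its metadata) can be illustrated with a small sketch. Only the `skip_initialize` flag comes from the commit message; `LazyModelRegistry`, its attributes, and the `loader` callback are hypothetical and are not the provider's real classes.

```python
from typing import Any, Callable, Dict, Optional


class LazyModelRegistry:
    """Hypothetical registry that loads a model at registration time, not at startup."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        # Constructing the registry (the "server start") loads nothing.
        self._loader = loader
        self.registered: Dict[str, Dict[str, Any]] = {}
        self.loaded: Dict[str, Any] = {}

    def register_model(
        self, model_id: str, metadata: Optional[Dict[str, Any]] = None
    ) -> None:
        metadata = dict(metadata or {})
        self.registered[model_id] = metadata
        # 'skip_initialize' is the optional flag named in the commit message:
        # when set, the model is registered but its weights are not loaded.
        if not metadata.get("skip_initialize", False):
            self.loaded[model_id] = self._loader(model_id)
```

Registering a second model later would load a second checkpoint without restarting the process, which is the "hot switch" use case the commit mentions.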
Ashwin Bharambe	0fb4b7de6f	Add more debugging logs to when llama guard fails	2024-12-17 18:52:02 -08:00
Ashwin Bharambe	b7a7caa9a8	Fix conversion to RawMessage everywhere	2024-12-17 14:00:43 -08:00
Ashwin Bharambe	8de8eb03c8	Update the "InterleavedTextMedia" type (#635 ) ## What does this PR do? This is a long-pending change and particularly important to get done now. Specifically: - we cannot "localize" (aka download) any URLs from media attachments anywhere near our modeling code. it must be done within llama-stack. - `PIL.Image` is infesting all our APIs via `ImageMedia -> InterleavedTextMedia` and that cannot be right at all. Anything in the API surface must be "naturally serializable". We need a standard `{ type: "image", image_url: "<...>" }` which is more extensible - `UserMessage`, `SystemMessage`, etc. are moved completely to llama-stack from the llama-models repository. See https://github.com/meta-llama/llama-models/pull/244 for the corresponding PR in llama-models. ## Test Plan ```bash cd llama_stack/providers/tests pytest -s -v -k "fireworks or ollama or together" inference/test_vision_inference.py pytest -s -v -k "(fireworks or ollama or together) and llama_3b" inference/test_text_inference.py pytest -s -v -k chroma memory/test_memory.py \ --env EMBEDDING_DIMENSION=384 --env CHROMA_DB_PATH=/tmp/foobar pytest -s -v -k fireworks agents/test_agents.py \ --safety-shield=meta-llama/Llama-Guard-3-8B \ --inference-model=meta-llama/Llama-3.1-8B-Instruct ``` Updated the client sdk (see PR ...), installed the SDK in the same environment and then ran the SDK tests: ```bash cd tests/client-sdk LLAMA_STACK_CONFIG=together pytest -s -v agents/test_agents.py LLAMA_STACK_CONFIG=ollama pytest -s -v memory/test_memory.py # this one needed a bit of hacking in the run.yaml to ensure I could register the vision model correctly INFERENCE_MODEL=llama3.2-vision:latest LLAMA_STACK_CONFIG=ollama pytest -s -v inference/test_inference.py ```	2024-12-17 11:18:31 -08:00
Xi Yan	99f331f5c8	[bugfix] no shield_call when there's no shields configured (#642) # What does this PR do? Why - When AgentConfig has no `input_shields` / `output_shields` defined, we still output a shield_call step with violation=None. This makes it impossible to distinguish between (1) no violation from running shields and (2) no shields call. What - We should not have a shield_call step when no `input_shields` / `output_shields` are defined. - Also removes a never-reached try/catch code block in the agent loop. `run_multiple_shields` is never called in the try block (verified by stacktrace print) Side Note - pre-commit fix ## Test Plan Tested w/ DirectClient via: https://gist.github.com/yanxi0830/b48f2a53b6f5391b9ff1e39992bc05b3 No Shields <img width="858" alt="image" src="https://github.com/user-attachments/assets/67319370-329f-4954-bd16-d21ce54c6ebf" /> With Input + Output Shields <img width="854" alt="image" src="https://github.com/user-attachments/assets/75ab1bee-3ba9-4549-ab51-23210be83da7" /> Input Shields Only <img width="858" alt="image" src="https://github.com/user-attachments/assets/1897206b-13dd-4ea5-92c2-b39bf68e9286" /> E2E pytest ``` LLAMA_STACK_BASE_URL=http://localhost:5000 pytest -v ./tests/client-sdk/agents/test_agents.py ``` ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-12-17 11:10:19 -08:00
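The distinction drawn in the commit above can be sketched as follows: with no shields configured, no `shield_call` step is produced at all, so a step with `violation=None` can only mean "shields ran and found nothing". The step layout (one dict per shield with `step_type`, `shield_id`, and `violation` keys) and the `run_shield` callback are assumptions made for illustration, not the agent loop's actual types.

```python
from typing import Any, Callable, Dict, List, Optional


def shield_steps(
    shield_ids: List[str],
    run_shield: Callable[[str], Optional[str]],
) -> List[Dict[str, Any]]:
    """Return one shield_call step per configured shield, or nothing if none are configured."""
    if not shield_ids:
        # No input/output shields defined: emit no shield_call step.
        return []
    return [
        {
            "step_type": "shield_call",
            "shield_id": shield_id,
            # None here now unambiguously means the shield ran and found no violation.
            "violation": run_shield(shield_id),
        }
        for shield_id in shield_ids
    ]
```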
Ashwin Bharambe	2e5bfcd42a	Update Telemetry API so OpenAPI generation can work (#640 ) We cannot use recursive types because not only does our OpenAPI generator not like them, even if it did, it is not easy for all client languages to automatically construct proper APIs (especially considering garbage collection) around them. For now, we can return a `Dict[str, SpanWithStatus]` instead of `SpanWithChildren` and rely on the client to reconstruct the tree. Also fixed a super subtle issue with the OpenAPI generation process (monkey-patching of json_schema_type wasn't working because of import reordering.)	2024-12-16 13:00:14 -08:00
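The client-side reconstruction mentioned in the commit above (turning a flat `Dict[str, SpanWithStatus]` back into a tree) might look roughly like this sketch. The real `SpanWithStatus` schema is not shown in this log, so the `span_id` and `parent_span_id` fields and the `SpanNode` helper below are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SpanWithStatus:
    # Hypothetical stand-in for the API type; field names are assumed.
    span_id: str
    name: str
    parent_span_id: Optional[str] = None
    status: Optional[str] = None


@dataclass
class SpanNode:
    # Client-side tree node; the recursion lives here, not in the API schema.
    span: SpanWithStatus
    children: List["SpanNode"] = field(default_factory=list)


def build_span_tree(spans: Dict[str, SpanWithStatus]) -> List[SpanNode]:
    """Rebuild parent/child links from a flat mapping of span_id to span."""
    nodes = {span_id: SpanNode(span) for span_id, span in spans.items()}
    roots: List[SpanNode] = []
    for node in nodes.values():
        parent_id = node.span.parent_span_id
        parent = nodes.get(parent_id) if parent_id is not None else None
        if parent is None:
            # No parent, or the parent is missing from the mapping: treat as a root.
            roots.append(node)
        else:
            parent.children.append(node)
    return roots
```

Keeping the wire format flat and rebuilding the tree in one pass on the client avoids recursive types in the generated OpenAPI schema, which is the constraint the commit describes.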
Botao Chen	20383bfea5	[3/n][torchtune integration] add validation logic (#600 ) ## What does this PR do? - add validation logic in SFT recipe (validation loss and perplexity) - add progress bar in both training and validation to better track the progress on server side (eval has the similar logic) ## Test Plan validation logic shows up in the Checkpoint training_metric part <img width="799" alt="Screenshot 2024-12-12 at 3 21 52 PM" src="https://github.com/user-attachments/assets/36330ffe-0555-4b2d-93f0-9487dfdf7b4e" /> progress bar shows up as <img width="476" alt="Screenshot 2024-12-12 at 3 38 11 PM" src="https://github.com/user-attachments/assets/77306fa2-cb9c-460f-8efc-b41bbe424a7d" /> expected	2024-12-13 16:35:06 -08:00
Botao Chen	c294a01c4b	[2/n][torchtune integration] implement job management and return training artifacts (#593 ) ### Context In this PR, we - Implement the post training job management and get training artifacts apis - get_training_jobs - get_training_job_status - get_training_job_artifacts - get_training_job_logstream is deleted since the trace can be directly accessed by UI with Jaeger https://llama-stack.readthedocs.io/en/latest/building_applications/telemetry.html#jaeger-to-visualize-traces - Refactor the post training and training types definition to make them more intuitive. - Rewrite the checkpointer to make it compatible with llama-stack file system and can be recognized during inference ### Test Unit test `pytest llama_stack/providers/tests/post_training/test_post_training.py -m "torchtune_post_training_huggingface_datasetio" -v -s --tb=short --disable-warnings` <img width="1506" alt="Screenshot 2024-12-10 at 4 06 17 PM" src="https://github.com/user-attachments/assets/16225029-bdb7-48c4-9d13-e580cc769c0a"> e2e test with client side call <img width="888" alt="Screenshot 2024-12-10 at 4 09 44 PM" src="https://github.com/user-attachments/assets/de375e4c-ef67-4dcc-a045-4037d9489191">	2024-12-13 15:00:04 -08:00

124 commits