# What does this PR do?
The builtin implementation of the code interpreter is not robust and relies on
a really weak sandbox (the `bubblewrap` container). Given that better MCP code
interpreter servers are becoming available, we should use those instead of
baking an implementation into the Stack and expanding the vulnerability
surface to the rest of the Stack.
This PR only does the removal. We will add examples of how to integrate with
MCP servers in subsequent PRs.
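For reference, here is a hedged sketch of how an external MCP code interpreter could be wired in through the Tools API once this removal lands; the toolgroup id, provider id, and endpoint URI below are illustrative assumptions, not something this PR adds:
```
# Illustrative only: register an MCP server as a toolgroup in place of the
# removed builtin code interpreter. Ids and the endpoint URI are assumptions.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

client.toolgroups.register(
    toolgroup_id="mcp::code_interpreter",               # hypothetical toolgroup id
    provider_id="model-context-protocol",               # MCP tool runtime provider
    mcp_endpoint={"uri": "http://localhost:8000/sse"},  # your MCP server's endpoint
)

# The toolgroup can then be referenced from an agent's config like any other tool.
```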
## Test Plan
Existing tests.
# What does this PR do?
This is helpful when debugging issues with vLLM + Llama Stack after this
PR https://github.com/vllm-project/vllm/pull/15593
---------
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
This PR adds instructions for setting up a remote vLLM endpoint for the
vllm-remote llama stack distribution.
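As a quick sanity check of the configured endpoint (beyond the pytest run in the Test Plan), a minimal smoke test along these lines can be used; the base URL and model id simply mirror the pytest command below:
```
# Assumes the vllm-remote distribution is serving on localhost:5001, as in the
# Test Plan's pytest invocation.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5001")

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(response.completion_message.content)
```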
## Test Plan
* Verified with manual tests of the configured vllm-remote distribution
against a vLLM endpoint running on a system with an Intel GPU
* Also verified with CI pytests (see command line below). Tests pass to the
same extent as on the A10 NVIDIA setup (some tests do fail, which appears to
be a known issue with the vllm-remote llama stack distribution)
```
pytest -s -v tests/integration/inference/test_text_inference.py \
--stack-config=http://localhost:5001 \
--text-model=meta-llama/Llama-3.2-3B-Instruct
```
CC: @ashwinb
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Add content for using an AMD GPU as the vLLM server. Split the original
section into two sub-chapters:
1. AMD vLLM server
2. NVIDIA vLLM server (original)
---------
Signed-off-by: Alex He <alehe@amd.com>
# What does this PR do?
* Fix the location of `run.yaml` relative to the cloned llama stack
repository
* Drop `-it` from `docker run` commands since it's not needed when running
services
## Test Plan
* Verified by running the llama stack following the updated instructions
CC: @ashwinb
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
All of the tests from `llama_stack/providers/tests/` are now moved to
`tests/integration`.
I converted the `tools`, `scoring` and `datasetio` tests to use the API.
However, `eval` and `post_training` proved to be a bit challenging, so I am
leaving those for now. I think `post_training` should be relatively
straightforward as well.
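To illustrate what "use the API" means here, the converted tests roughly take this shape (the fixture name is assumed from the integration test setup; the specific assertion is illustrative):
```
# Illustrative shape of an API-driven integration test: everything goes
# through the client SDK rather than provider-internal fixtures.
def test_toolgroups_are_listable(llama_stack_client):
    # llama_stack_client is assumed to be a pytest fixture yielding a configured client
    toolgroups = llama_stack_client.toolgroups.list()
    assert len(toolgroups) > 0
```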
As part of this, I noticed that the `wolfram_alpha` tool wasn't added to some
of our commonly used distros, so I added it. I am going to remove a lot of
code duplication from the distros next, so while this looks like a one-off
right now, the duplication will go away and the tool will be there uniformly
for all distros.
# What does this PR do?
Rename environment var for consistency
## Test Plan
No regressions
## Before submitting
- [X] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [X] Ran pre-commit to handle lint / formatting issues.
- [X] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [X] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
---------
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
PR #639 introduced the notion of a Tools API and the ability to invoke tools
through the API just like any other resource. This PR changes the Agents to
start using the Tools API to invoke tools. Major changes include:
1) Ability to specify tool groups with AgentConfig (see the sketch after this
list)
2) Agent gets the corresponding tool definitions for the specified tools and
passes them along to the model
3) Attachments are now named Documents and their behavior is mostly unchanged
from the user's perspective
4) You can specify args that can be injected into a tool call through the
Agent config. This is especially useful in the case of the memory tool, where
you want the tool to operate on a specific memory bank.
5) You can also register tool groups with args, which lets the agent inject
these into the tool call as well.
6) All tests have been migrated to use the new Tools API and fixtures,
including client SDK tests
7) Telemetry just works with the Tools API because of our trace protocol
decorator
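A hedged sketch of what (1), (4) and (5) look like from the client side; the toolgroup names, injected args, and memory bank id are illustrative assumptions rather than code taken from this PR:
```
# Illustrative AgentConfig using the Tools API: toolgroups can be plain names
# or carry args that get injected into tool calls (e.g. a memory bank id).
from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.agent import Agent

client = LlamaStackClient(base_url="http://localhost:5001")

agent_config = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "instructions": "You are a helpful assistant.",
    "toolgroups": [
        "builtin::websearch",                      # plain toolgroup reference
        {
            "name": "builtin::memory",             # toolgroup with injected args
            "args": {"memory_bank_ids": ["my_docs_bank"]},
        },
    ],
    "enable_session_persistence": False,
}

agent = Agent(client, agent_config)
session_id = agent.create_session("tools-api-demo")
```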
## Test Plan
```
pytest -s -v -k fireworks llama_stack/providers/tests/agents/test_agents.py \
--safety-shield=meta-llama/Llama-Guard-3-8B \
--inference-model=meta-llama/Llama-3.1-8B-Instruct
pytest -s -v -k together llama_stack/providers/tests/tools/test_tools.py \
--safety-shield=meta-llama/Llama-Guard-3-8B \
--inference-model=meta-llama/Llama-3.1-8B-Instruct
LLAMA_STACK_CONFIG="/Users/dineshyv/.llama/distributions/llamastack-together/together-run.yaml" pytest -v tests/client-sdk/agents/test_agents.py
```
run.yaml:
https://gist.github.com/dineshyv/0365845ad325e1c2cab755788ccc5994
Notebook:
https://colab.research.google.com/drive/1ck7hXQxRl6UvT-ijNRZ-gMZxH1G3cN2d?usp=sharing