While rebuilding a stack using the `docker` image type with
`LLAMA_STACK_DIR` set so that `llama_stack` is installed from my local
source, I noticed that once the image was built, rebuilds just used
the image build cache and never picked up changes to my source.
This change fixes that in two ways (see the sketch below):
1. Install in editable mode (`pip install -e`) for dev purposes.
2. Mount the source into the container for `configure` and `run` so
that the editable install works.
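A rough sketch of the two changes, assuming the build and run commands
are assembled in Python; the mount path `/app/llama-stack-source` and
the variable names here are illustrative, not the actual scripts:
```
import os

stack_dir = os.environ.get("LLAMA_STACK_DIR")

# 1. Inside the image: install from local source in editable mode when
#    LLAMA_STACK_DIR is set, instead of a regular (cacheable) install.
if stack_dir:
    install_args = ["pip", "install", "--no-cache-dir", "-e",
                    "/app/llama-stack-source"]
else:
    install_args = ["pip", "install", "llama-stack"]

# 2. At `configure` / `run` time: bind-mount the host source to the
#    path the editable install points at, so code changes are visible.
mount_args = ["-v", f"{stack_dir}:/app/llama-stack-source"] if stack_dir else []
run_cmd = ["docker", "run", *mount_args]
```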
Signed-off-by: Russell Bryant <rbryant@redhat.com>
When I ran `llama stack configure` for my `docker` based stack on my
system using podman + SELinux (CentOS Stream 9), the `podman run`
command failed because SELinux blocked access to the volume mount.
As a simple fix, disable SELinux label checking for the container.
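The fix amounts to passing the standard `--security-opt label=disable`
flag to `podman run` (equivalently `docker run`); a minimal sketch,
with the surrounding command assembly being illustrative:
```
import os

stack_dir = os.environ.get("LLAMA_STACK_DIR", ".")

run_cmd = [
    "podman", "run",
    # Disable SELinux labeling for this container so the bind mount is
    # readable inside it; otherwise SELinux denies access on enforcing
    # hosts like CentOS Stream 9.
    "--security-opt", "label=disable",
    "-v", f"{stack_dir}:/app/llama-stack-source",
]
```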
Signed-off-by: Russell Bryant <rbryant@redhat.com>
* Fix safety inference and the safety adapter for the new API spec.
Pinned the `llama_models` version to 0.0.24, since the latest version
(0.0.35) changed the model descriptor names. I was also getting a
missing-package error at runtime, so added the dependency to
requirements.txt.
* Support Llama 3.2 models in the Together inference adapter and
clean up the Together safety adapter.
* Fix model names.
* Add vision guard to Together safety.
We should use the Inference API to execute Llama Guard instead of
depending directly on HuggingFace modeling code. The actual inference
concerns are handled by the Inference provider.
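A minimal sketch of that shape, assuming an Inference API object with
a `chat_completion` method (the names here are illustrative rather
than the exact interface):
```
# Illustrative only: the shield delegates to the Inference API rather
# than loading HuggingFace weights itself.
async def run_llama_guard(inference_api, messages):
    response = await inference_api.chat_completion(
        model="Llama-Guard-3-8B",
        messages=messages,
    )
    content = response.completion_message.content
    # Llama Guard replies "safe", or "unsafe" followed by the violated
    # category codes.
    return content.strip().startswith("safe")
```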
I got this error message and noticed a typo in it: it directed the
user to run `llama stack build first`, which is not a valid command.
Signed-off-by: Russell Bryant <rbryant@redhat.com>
I got this error message and tried to run the command it presented,
but it didn't work. The model needs to be given with `--model-id`
instead of as a positional argument.
Signed-off-by: Russell Bryant <rbryant@redhat.com>
The first time I ran `llama stack build`, I quickly hit enter at the
first prompt asking for a name, assuming it would use the default
given in the help text. This caused a failure later on that wasn't
very obvious. I was using the `docker` image type, and the blank name
produced an invalid image tag that failed the image build.
This change adds validation for the `name` parameter to ensure it is
not empty before proceeding (see the sketch below).
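A sketch of the kind of check this adds (the function name is
illustrative):
```
def validate_build_name(name: str) -> str:
    # An empty name becomes an empty docker tag, which only fails much
    # later during the image build with a confusing error; reject it
    # up front instead.
    name = name.strip()
    if not name:
        raise ValueError("Build name cannot be empty")
    return name
```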
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Test Plan:
First, start a TGI container serving the `meta-llama/Llama-Guard-3-8B`
model on port 5099. See https://github.com/meta-llama/llama-stack/pull/53 and its
description for how.
Then run llama-stack with the following run config:
```
image_name: safety
docker_image: null
conda_env: safety
apis_to_serve:
- models
- inference
- shields
- safety
api_providers:
  inference:
    providers:
    - remote::tgi
  safety:
    providers:
    - meta-reference
  telemetry:
    provider_id: meta-reference
    config: {}
routing_table:
  inference:
  - provider_id: remote::tgi
    config:
      url: http://localhost:5099
      api_token: null
      hf_endpoint_name: null
    routing_key: Llama-Guard-3-8B
  safety:
  - provider_id: meta-reference
    config:
      llama_guard_shield:
        model: Llama-Guard-3-8B
        excluded_categories: []
        disable_input_check: false
        disable_output_check: false
      prompt_guard_shield: null
    routing_key: llama_guard
```
Now simply run `python -m llama_stack.apis.safety.client localhost
<port>` and check that the llama_guard shield calls run correctly. (The
injection_shield calls fail as expected since we have not set up a
router for them.)
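For reference, the client exercises each shield with a call along
these lines (a hedged sketch; the message type and method signature
may not match the actual client exactly):
```
# Hedged sketch of a single shield check as the client performs it.
async def check(safety_client, messages):
    response = await safety_client.run_shield(
        shield_type="llama_guard",
        messages=messages,
    )
    # A response with no violation means the content was deemed safe.
    print(response.violation)
```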
This is yet another of those large PRs (hopefully we will have fewer
and fewer of them as things mature). This one introduces substantial
improvements and some simplifications to the stack.
Most important bits:
* Agents reference implementation now has support for session / turn
persistence. The default implementation uses SQLite, but there is also
support for Redis.
* We have re-architected the structure of the Stack APIs to allow for more flexible routing. The motivating use cases are:
- routing model A to ollama and model B to a remote provider like Together
- routing shield A to local impl while shield B to a remote provider like Bedrock
- routing a vector memory bank to Weaviate while routing a keyvalue memory bank to Redis
* Support for provider-specific parameters to be passed from clients.
A client can pass data via the `x_llamastack_provider_data` parameter,
which can be type-checked and provided to the Adapter implementations
(see the sketch below).
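As a hedged illustration of the provider-data mechanism (the header
name and payload shape are assumptions, not something this PR text
confirms):
```
import json

# Assumed transport: provider-specific data rides along with the
# request as a JSON-encoded header that the server can type-check
# against the provider's declared schema.
provider_data = {"together_api_key": "<your key>"}
headers = {"X-LlamaStack-ProviderData": json.dumps(provider_data)}
# e.g. requests.post(f"{base_url}/inference/chat_completion",
#                    json=payload, headers=headers)
```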