llama-stack

forked from phoenix-oss/llama-stack-mirror

Author	SHA1	Message	Date
Hardik Shah	a7b929f17e	Sec fixes as raised by bandit (#917 ) minor fixes to hashlib and jinja	2025-01-31 13:44:26 -08:00
Dmitry Rogozhkin	7ea14ae62e	feat: enable xpu support for meta-reference stack (#558 ) This commit adds support for XPU and CPU devices into meta-reference stack for text models. On creation stack automatically identifies which device to use checking available accelerate capabilities in the following order: CUDA, then XPU, finally CPU. This behaviour can be overwritten with the `DEVICE` environment variable. In this case explicitly specified device will be used. Tested with: ``` torchrun pytest llama_stack/providers/tests/inference/test_text_inference.py -k meta_reference ``` Results: * Tested on: system with single CUDA device, system with single XPU device and on pure CPU system * Results: all test pass except `test_completion_logprobs` * `test_completion_logprobs` fails in the same way as on a baseline, i.e. unrelated with this change: `AssertionError: Unexpected top_k=3` Requires: https://github.com/meta-llama/llama-models/pull/233 Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>	2025-01-31 12:11:49 -08:00
Hardik Shah	97eb3eecea	Fix Agents to support code and rag simultaneously (#908 ) # What does this PR do? Fixes a bug where agents were not working when both rag and code-interpreter were added as tools. ## Test Plan Added a new client_sdk test which tests for this scenario ``` LLAMA_STACK_CONFIG=together pytest -s -v tests/client-sdk -k 'test_rag_and_code_agent' ``` --------- Co-authored-by: Hardik Shah <hjshah@fb.com>	2025-01-30 17:09:34 -08:00
Xi Yan	94051cfe9e	fix ImageContentItem to take base64 string as image.data (#909 ) # What does this PR do? - Discussion in https://github.com/meta-llama/llama-stack/pull/906#discussion_r1936260819 - image.data should accept base64 string as input instead of binary bytes, change prompt_adapter to account for that. ## Test Plan ``` pytest -v tests/client-sdk/inference/test_inference.py ``` with test in https://github.com/meta-llama/llama-stack/pull/906 ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-30 15:58:23 -08:00
snova-edwardm	7fe2592795	SambaNova supports Llama 3.3 (#905 ) # What does this PR do? - Fix typo - Support Llama 3.3 70B ## Test Plan Run the following scripts and obtain the test results Script ``` pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming --env SAMBANOVA_API_KEY={API_KEY} ``` Result ``` llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[-sambanova] PASSED =========================================== 1 passed, 1 warning in 1.26s ============================================ ``` Script ``` pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming --env SAMBANOVA_API_KEY={API_KEY} ``` Result ``` llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming[-sambanova] PASSED =========================================== 1 passed, 1 warning in 0.52s ============================================ ``` ## Sources Please link relevant resources if necessary. ## Before submitting - [N] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [Y] Ran pre-commit to handle lint / formatting issues. - [Y] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [Y] Updated relevant documentation. - [N] Wrote necessary unit or integration tests.	2025-01-30 09:24:46 -08:00
Sixian Yi	836f47a82d	log probs - mark pytests as xfail for unsupported providers + add support for together (#883 ) # What does this PR do? 1) As per @mattf's suggestion, we want to mark the pytest as xfail for providers that do not support the functionality. In this diff, we xfail the logProbs inference tests for providers who does not support log probs. ( log probs is only supported by together, fireworks and vllm) 2) Added logProbs support for together according to their developer [doc](https://docs.together.ai/docs/logprobs). ## Test Plan 1) Together & Fireworks ``` export LLAMA_STACK_CONFIG=/Users/sxyi/llama-stack/llama_stack/templates/together/run.yaml /opt/miniconda3/envs/stack/bin/pytest -s -v /Users/sxyi/llama-stack/tests/client-sdk/inference/test_inference.py ``` ``` tests/client-sdk/inference/test_inference.py::test_text_completion_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_completion_log_probs_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_completion_log_probs_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_text_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-What are the names of planets in our solar system?-Earth] PASSED tests/client-sdk/inference/test_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-What are the names of the planets that have rings around them?-Saturn] PASSED tests/client-sdk/inference/test_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What's the name of the Sun in latin?-Sol] PASSED tests/client-sdk/inference/test_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What is the name of the US captial?-Washington] PASSED tests/client-sdk/inference/test_inference.py::test_text_chat_completion_with_tool_calling_and_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_text_chat_completion_with_tool_calling_and_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_text_chat_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_image_chat_completion_non_streaming[meta-llama/Llama-3.2-11B-Vision-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_image_chat_completion_streaming[meta-llama/Llama-3.2-11B-Vision-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_image_chat_completion_base64_url[meta-llama/Llama-3.2-11B-Vision-Instruct] PASSED ========================================================================================== 15 passed, 2 warnings in 19.46s =========================================================================================== ``` ``` export LLAMA_STACK_CONFIG=/Users/sxyi/llama-stack/llama_stack/templates/fireworks/run.yaml /opt/miniconda3/envs/stack/bin/pytest -s -v /Users/sxyi/llama-stack/tests/client-sdk/inference/test_inference.py ``` All tests passed 2) Ollama - LogProbs tests are marked as xfailed. ``` tests/client-sdk/inference/test_inference.py::test_completion_log_probs_non_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote::ollama doesn't support log probs yet) tests/client-sdk/inference/test_inference.py::test_completion_log_probs_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote::ollama doesn't support log probs yet) ``` ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-29 23:41:25 -08:00
Dmitry Rogozhkin	80f2032485	Fix running stack built with base conda environment (#903 ) Fixes: #902 For the test verified that llama stack can run if built: * With default "base" conda environment * With new custom conda environment using `--image-name XXX` option In both cases llama stack starts fine (was failing with "base") before this patch. CC: @ashwinb Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>	2025-01-29 21:24:22 -08:00
Aidan Do	39c34dd25f	[#432 ] Groq Provider tool call tweaks (#811 ) # What does this PR do? Follow up for @ashwinb's comments in https://github.com/meta-llama/llama-stack/pull/630 - [x] Contributes to issue (#432) ## Test Plan <details> <summary>Environment</summary> ```shell export GROQ_API_KEY=<api-key> # Create environment if not already conda create --name llamastack-groq python=3.10 conda activate llamastack-groq wget https://raw.githubusercontent.com/aidando73/llama-stack/9165502582cd7cb178bc1dcf89955b45768ab6c1/build.yaml wget https://raw.githubusercontent.com/meta-llama/llama-stack/918172c7fa92522c9ebc586bdb4f386b1d9ea224/run.yaml # Build pip install -e . && llama stack build --config ./build.yaml --image-type conda # Activate built environment conda activate llamastack-groq # Test deps pip install pytest pytest_html pytest_asyncio ``` </details> <details> <summary>Unit tests</summary> ```shell # Setup conda activate llamastack-groq pytest llama_stack/providers/tests/inference/groq/test_groq_utils.py -vv -k groq -s # Result llama_stack/providers/tests/inference/groq/test_groq_utils.py ....................... ========================================= 23 passed, 11 warnings in 0.06s ========================================= ``` </details> <details> <summary>Integration tests</summary> ```shell # Tests pytest llama_stack/providers/tests/inference/test_text_inference.py -k groq -s # Results ___________________________ TestInference.test_chat_completion_with_tool_calling[-groq] ___________________________ llama_stack/providers/tests/inference/test_text_inference.py:403: in test_chat_completion_with_tool_calling assert len(message.tool_calls) > 0 E assert 0 > 0 E + where 0 = len([]) E + where [] = CompletionMessage(role='assistant', content='<function=get_weather>{"location": "San Francisco, CA"}', stop_reason=<StopReason.end_of_turn: 'end_of_turn'>, tool_calls=[]).tool_calls ============================================= short test summary info ============================================= FAILED llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling[-groq] - assert 0 > 0 ======================== 1 failed, 3 passed, 5 skipped, 99 deselected, 7 warnings in 2.13s ======================== ``` (One failure as expected from 3.2 3B - re: https://github.com/meta-llama/llama-stack/pull/630#discussion_r1914056503) </details> ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [x] Wrote necessary unit or integration tests. Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-01-29 12:02:12 -08:00
Ashwin Bharambe	0d96070af9	Update OpenAPI generator to add param and field documentation (#896 ) We desperately need to document our APIs. This is the basic requirement of having a Spec :) This PR updates the OpenAPI generator so documentation for request parameters and object fields can be properly added to the OpenAPI specs. From there, this should get picked by Stainless, etc. ## Test Plan: Updated client-sdk (See https://github.com/meta-llama/llama-stack-client-python/pull/104) and then ran: ```bash cd tests/client-sdk LLAMA_STACK_CONFIG=../../llama_stack/templates/fireworks/run.yaml pytest -s -v inference/test_inference.py agents/test_agents.py ```	2025-01-29 10:04:30 -08:00
Yuan Tang	53721e91ad	Fix validator of "container" image type (#901 ) This was missed in https://github.com/meta-llama/llama-stack/pull/802 somehow.	2025-01-29 09:36:52 -08:00
Matthew Farrellee	11b1cdf31d	add NVIDIA_BASE_URL and NVIDIA_API_KEY to control hosted vs local endpoints (#897 ) # What does this PR do? allows template distribution connect to hosted or local NIM: use --env NVIDIA_BASE_URL=http://localhost:8000 to connect to a local NIM running at localhost:8000 use --env NVIDIA_API_KEY=blah when connecting to hosted NIM, e.g. NVIDIA_BASE_URL=https://integrate.api.nvidia.com ## Test Plan - `llama stack run ./llama_stack/templates/nvidia/run.yaml` -> error, e.g. API key is required for hosted NVIDIA NIM - `llama stack run ./llama_stack/templates/nvidia/run.yaml --env NVIDIA_BASE_URL=https://integrate.api.nvidia.com` -> error, e.g. API key is required for hosted NVIDIA NIM - `llama stack run ./llama_stack/templates/nvidia/run.yaml --env NVIDIA_API_KEY=REDACTED` -> successful connection to NIM on https://integrate.api.nvidia.com - `llama stack run ./llama_stack/templates/nvidia/run.yaml --env NVIDIA_BASE_URL=https://integrate.api.nvidia.com --env NVIDIA_API_KEY=REDACTED` -> successful connection to NIM running on integrate.api.nvidia.com - `llama stack run ./llama_stack/templates/nvidia/run.yaml --env NVIDIA_BASE_URL=http://localhost:8000` -> successful connection to NIM running on localhost:8000 - `llama stack run ./llama_stack/templates/nvidia/run.yaml --env NVIDIA_BASE_URL=http://localhost:8000 --env NVIDIA_API_KEY=REDACTED` -> successful connection to NIM running on http://localhost:8000 - `llama stack run ./llama_stack/templates/nvidia/run.yaml --env NVIDIA_BASE_URL=http://bogus` -> runtime error, e.g. ConnectionError (TODO: this should be a startup error) ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-29 09:31:56 -08:00
Matthew Farrellee	1a5c17a92f	align with CompletionResponseStreamChunk.delta as str (instead of TextDelta) (#900 ) # What does this PR do? fix type mismatch in /v1/inference/completion ## Test Plan `llama stack run ./llama_stack/templates/nvidia/run.yaml` `LLAMA_STACK_BASE_URL="http://localhost:8321" pytest -v tests/client-sdk/inference/test_inference.py` ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-29 09:25:50 -08:00
Ashwin Bharambe	f2feb7d15c	Fix Chroma adapter (#893 ) Chroma method had the wrong signature. ## Test Plan Start Chroma: `chroma run --path /tmp/foo/chroma2 --host localhost --port 6001` Modify run.yaml to include Chroma server pointing to localhost:6001 and run `llama stack run` Then: ```bash LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -s -v agents/test_agents.py -k rag ``` passes	2025-01-28 13:19:47 -08:00
Ashwin Bharambe	41749944a5	Fix ResponseFormat import	2025-01-28 09:34:05 -08:00
Ashwin Bharambe	aee6237685	Small refactor for run_with_pty	2025-01-28 09:32:33 -08:00
Vladislav Bronzov	8332ea23ad	Add run win command for stack (#890 ) # What does this PR do? Add win platform run command for stack - [x] Addresses issue (#issue) ## Test Plan Please describe: - tests you ran to verify your changes with result summaries. - provide instructions so it can be reproduced. ## Sources Please link relevant resources if necessary. https://github.com/meta-llama/llama-stack/pull/889 ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-28 08:04:28 -08:00
Vladislav Bronzov	09299e908e	Add windows support for build execution (#889 ) # What does this PR do? This PR implements windows platform support for build_container.sh execution from terminal. Additionally, it resolves "no support for Terminos and PTY for Window PC" issues. - [x] Addresses issue (#issue) Releates issues: https://github.com/meta-llama/llama-stack/issues/826, https://github.com/meta-llama/llama-stack/issues/726 ## Test Plan Changes were tested manually by executing standard scripts from LLama guide: - llama stack build --template ollama --image-type container - llama stack build --list-templates - llama stack build ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-28 07:41:41 -08:00
Zhonglin Han	229f0d5f7c	Agent response format (#660 ) # What does this PR do? Add response format for agents structured output. - [ ] Using structured output for agents (interior_design app as an example) (#issue) https://github.com/meta-llama/llama-stack-apps/issues/122 ## Test Plan E2E test plan with llama-stack-apps interior_design Please describe: Test ran: - provide instructions so it can be reproduced. Start your distro: llama stack run llama_stack/templates/fireworks/run.yaml --env FIREWORKS_API_KEY=<API_KEY> Run api test: ```PYTHONPATH=. python examples/interior_design_assistant/api.py localhost 5000 examples/interior_design_assistant/resources/documents/ examples/interior_design_assistant/resources/images/fireplaces``` ## Sources Results: https://github.com/meta-llama/llama-stack-client-python/pull/72 ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-28 05:05:38 -08:00
Sixian Yi	ba453c3487	Report generation minor fixes (#884 ) # What does this PR do? fixed report generation: 1) do not initialize a new client in report.py - instead get it from pytest fixture 2) Add "provider" for "safety" and "agents" section 3) add logprobs functionality in "inference" section ## Test Plan See the regenerated report ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-28 04:58:12 -08:00
snova-edwardm	aa65610e75	Sambanova - LlamaGuard (#886 ) # What does this PR do? - Fix loading SambaNovaImpl issue - Add LlamaGuard model support for inference ## Test Plan Run the following unit test scripts and results ### Embedding ``` pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_embeddings.py --inference-model meta-llama/Llama-3.2-11B-Vision-Instruct --env SAMBANOVA_API_KEY={SAMBANOVA_API_KEY} ``` ``` llama_stack/providers/tests/inference/test_embeddings.py::TestEmbeddings::test_embeddings[-sambanova] SKIPPED (This test is only applicable for embedding models) llama_stack/providers/tests/inference/test_embeddings.py::TestEmbeddings::test_batch_embeddings[-sambanova] SKIPPED (This test is only applicable for embedding models) =================================================================================================================== 2 skipped, 1 warning in 0.32s =================================================================================================================== ``` ### Vision ``` pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_vision_inference.py --inference-model meta-llama/Llama-3.2-11B-Vision-Instruct --env SAMBANOVA_API_KEY={SAMBANOVA_API_KEY} ``` ``` llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_non_streaming[-sambanova-image0-expected_strings0] PASSED llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_non_streaming[-sambanova-image1-expected_strings1] PASSED llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_streaming[-sambanova] PASSED =================================================================================================================== 3 passed, 1 warning in 2.68s ==================================================================================================================== ``` ### Text ``` pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming --env SAMBANOVA_API_KEY={SAMBANOVA_API_KEY} ``` ``` llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[-sambanova] PASSED =================================================================================================================== 1 passed, 1 warning in 0.46s ==================================================================================================================== ``` ``` pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming --env SAMBANOVA_API_KEY={SAMBANOVA_API_KEY} ``` ``` llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming[-sambanova] PASSED =================================================================================================================== 1 passed, 1 warning in 0.48s ==================================================================================================================== ``` ## Before submitting - [] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [Y] Ran pre-commit to handle lint / formatting issues. - [Y] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [Y] Updated relevant documentation. - [Y] Wrote necessary unit or integration tests.	2025-01-27 15:46:30 -08:00
Dinesh Yeduguru	3c1a2c3d66	Fix telemetry init (#885 ) # What does this PR do? When you re-initialize the library client in a notebook, we were seeing this error: ``` Getting traces for session_id=5c8d1969-0957-49d2-b852-32cbb8ef8caf --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) [<ipython-input-11-d74bb6cdd3ab>](https://localhost:8080/#) in <cell line: 0>() 7 agent_logs = [] 8 ----> 9 for span in client.telemetry.query_spans( 10 attribute_filters=[ 11 {"key": "session_id", "op": "eq", "value": session_id}, 10 frames [/usr/local/lib/python3.11/dist-packages/llama_stack/providers/inline/telemetry/meta_reference/telemetry.py](https://localhost:8080/#) in query_traces(self, attribute_filters, limit, offset, order_by) 246 ) -> QueryTracesResponse: 247 return QueryTracesResponse( --> 248 data=await self.trace_store.query_traces( 249 attribute_filters=attribute_filters, 250 limit=limit, AttributeError: 'TelemetryAdapter' object has no attribute 'trace_store' ``` This is happening because the we were skipping some required steps for the object state as part of the global _TRACE_PROVIDER check. This PR moves the initialization of the object state out of the TRACE_PROVIDER init.	2025-01-27 11:20:28 -08:00
Ashwin Bharambe	e5936a8df8	Update discriminator to have the correct `mapping` (#881 ) See https://swagger.io/docs/specification/v3_0/data-models/inheritance-and-polymorphism/#discriminator When specifying discriminators, mapping must be specified unless the value of the discriminator is the subtype itself (which in our case is not.) The changes in the YAML are self-explanatory.	2025-01-27 09:18:13 -08:00
Ashwin Bharambe	891bf704eb	Ensure llama stack build --config <> --image-type <> works (#879 ) Fix the issues brought up in https://github.com/meta-llama/llama-stack/issues/870 Test all combinations of (conda, container) vs. (template, config) combos.	2025-01-25 11:13:36 -08:00
Ashwin Bharambe	087a83f673	Bump key for faiss	2025-01-24 12:08:36 -08:00
Hardik Shah	2cebb24d3a	Update doc templates for running safety on self-hosted templates (#874 )	2025-01-24 11:28:20 -08:00
Dinesh Yeduguru	cb11336886	remove logger handler only in notebook (#868 ) remove logger handler only in notebook	2025-01-23 16:58:17 -08:00
Dinesh Yeduguru	a78f1fc70d	make default tool prompt format none in agent config (#863 ) # What does this PR do? Previously the tests hard coded the tool prompt format to be json which will cause it to fail when using 3.2/3.3 family of models. This change make the default to be none for the agent config and just remove the specification in the tests. ## Test Plan LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/agents/test_agents.py	2025-01-23 14:44:59 -08:00
Ashwin Bharambe	d78027f3b5	Move runpod provider to the correct directory Also cleanup the test code to avoid skipping tests. Let failures be known and public.	2025-01-23 12:25:12 -08:00
snova-edwardm	22dc684da6	Sambanova inference provider (#555 ) # What does this PR do? This PR adds SambaNova as one of the Provider - Add SambaNova as a provider ## Test Plan Test the functional command ``` pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_embeddings.py llama_stack/providers/tests/inference/test_prompt_adapter.py llama_stack/providers/tests/inference/test_text_inference.py llama_stack/providers/tests/inference/test_vision_inference.py --env SAMBANOVA_API_KEY=<sambanova-api-key> ``` Test the distribution template: ``` # Docker LLAMA_STACK_PORT=5001 docker run -it -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \ llamastack/distribution-sambanova \ --port $LLAMA_STACK_PORT \ --env SAMBANOVA_API_KEY=$SAMBANOVA_API_KEY # Conda llama stack build --template sambanova --image-type conda llama stack run ./run.yaml \ --port $LLAMA_STACK_PORT \ --env SAMBANOVA_API_KEY=$SAMBANOVA_API_KEY ``` ## Source [SambaNova API Documentation](https://cloud.sambanova.ai/apis) ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [Y] Ran pre-commit to handle lint / formatting issues. - [Y] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [Y] Updated relevant documentation. - [Y ] Wrote necessary unit or integration tests. --------- Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-01-23 12:20:28 -08:00
Marut Pandya	e2b5456e48	Add Runpod Provider + Distribution (#362 ) Add Runpod as a inference provider for openAI compatible managed endpoints. Testing - Configured llama stack from scratch, set `remote::runpod` as a inference provider. - Added Runpod Endpoint URL and API key. - Started llama-stack server - llama stack run my-local-stack --port 3000 ``` curl http://localhost:5000/inference/chat_completion \ -H "Content-Type: application/json" \ -d '{ "model": "Llama3.1-8B-Instruct", "messages": [ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Write me a 2 sentence poem about the moon"} ], "sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512} }' ``` --------- Signed-off-by: pandyamarut <pandyamarut@gmail.com>	2025-01-23 12:19:02 -08:00
Sixian Yi	bfbd773b54	remove test report	2025-01-23 01:06:39 -08:00
Aidan Do	910717c1fd	Add vLLM raw completions API (#823 ) # What does this PR do? Adds raw completions API to vLLM ## Test Plan <details> <summary>Setup</summary> ```bash # Run vllm server conda create -n vllm python=3.12 -y conda activate vllm pip install vllm # Run llamastack conda create --name llamastack-vllm python=3.10 conda activate llamastack-vllm export INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct && \ pip install -e . && \ pip install --no-cache --index-url https://pypi.org/simple/ --extra-index-url https://test.pypi.org/simple/ llama-stack==0.1.0rc7 && \ llama stack build --template remote-vllm --image-type conda && \ llama stack run ./distributions/remote-vllm/run.yaml \ --port 5000 \ --env INFERENCE_MODEL=$INFERENCE_MODEL \ --env VLLM_URL=http://localhost:8000/v1 \| tee -a llama-stack.log ``` </details> <details> <summary>Integration</summary> ```bash # Run conda activate llamastack-vllm export VLLM_URL=http://localhost:8000/v1 pip install pytest pytest_html pytest_asyncio aiosqlite pytest llama_stack/providers/tests/inference/test_text_inference.py -v -k vllm # Results llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_model_list[-vllm_remote] PASSED [ 11%] llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion[-vllm_remote] PASSED [ 22%] llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_logprobs[-vllm_remote] SKIPPED [ 33%] llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output[-vllm_remote] SKIPPED [ 44%] llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming[-vllm_remote] PASSED [ 55%] llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_structured_output[-vllm_remote] PASSED [ 66%] llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[-vllm_remote] PASSED [ 77%] llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling[-vllm_remote] PASSED [ 88%] llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling_streaming[-vllm_remote] PASSED [100%] ====================================== 7 passed, 2 skipped, 99 deselected, 1 warning in 9.80s ====================================== ``` </details> <details> <summary>Manual</summary> ```bash # Install pip install --no-cache --index-url https://pypi.org/simple/ --extra-index-url https://test.pypi.org/simple/ llama-stack==0.1.0rc7 ``` Apply this diff ```diff diff --git a/llama_stack/distribution/server/server.py b/llama_stack/distribution/server/server.py index 8dbb193..95173e2 100644 --- a/llama_stack/distribution/server/server.py +++ b/llama_stack/distribution/server/server.py @@ -250,7 +250,7 @@ class ClientVersionMiddleware: server_version_parts = tuple( map(int, self.server_version.split(".")[:2]) ) - if client_version_parts != server_version_parts: + if False and client_version_parts != server_version_parts: async def send_version_error(send): await send( diff --git a/llama_stack/templates/remote-vllm/run.yaml b/llama_stack/templates/remote-vllm/run.yaml index 4eac4da..32eb50e 100644 --- a/llama_stack/templates/remote-vllm/run.yaml +++ b/llama_stack/templates/remote-vllm/run.yaml @@ -94,7 +94,8 @@ metadata_store: type: sqlite db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/remote-vllm}/registry.db models: -- metadata: {} +- metadata: + llama_model: meta-llama/Llama-3.2-3B-Instruct model_id: ${env.INFERENCE_MODEL} provider_id: vllm-inference model_type: llm ``` Test 1: ```python from llama_stack_client import LlamaStackClient client = LlamaStackClient( base_url="http://localhost:5000", ) response = client.inference.completion( model_id="meta-llama/Llama-3.2-3B-Instruct", content="Hello, world client!", ) print(response) ``` Test 2 ``` from llama_stack_client import LlamaStackClient client = LlamaStackClient( base_url="http://localhost:5000", ) response = client.inference.completion( model_id="meta-llama/Llama-3.2-3B-Instruct", content="Hello, world client!", stream=True, ) for chunk in response: print(chunk.delta, end="", flush=True) ``` ``` I'm excited to introduce you to our latest project, a comprehensive guide to the best coffee shops in [City]. As a coffee connoisseur, you're in luck because we've scoured the city to bring you the top picks for the perfect cup of joe. In this guide, we'll take you on a journey through the city's most iconic coffee shops, highlighting their unique features, must-try drinks, and insider tips from the baristas themselves. From cozy cafes to trendy cafes, we've got you covered. Top 5 Coffee Shops in [City] 1. The Daily Grind: This beloved institution has been serving up expertly crafted pour-overs and lattes for over 10 years. Their expert baristas are always happy to guide you through their menu, which features a rotating selection of single-origin beans from around the world... ``` </details> ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-22 22:58:27 -08:00
Ashwin Bharambe	4d7c8c797f	Kill colons	2025-01-22 22:54:13 -08:00
Ashwin Bharambe	35c71d5bbe	Update OpenAPI generator to output discriminator (#848 ) oneOf should have discriminators so Stainless can generate better code ## Test Plan Going to generate the SDK now and check.	2025-01-22 22:15:23 -08:00
Ashwin Bharambe	6c205e1d5a	Fix tool tests	2025-01-22 20:31:18 -08:00
Ashwin Bharambe	0bff6e1658	Move tool_runtime.memory -> tool_runtime.rag	2025-01-22 20:25:02 -08:00
Ashwin Bharambe	f3d8864c36	Rename builtin::memory -> builtin::rag	2025-01-22 20:22:51 -08:00
Sixian Yi	597869a2aa	add distro report (#847 ) # What does this PR do? Generate distro reports to cover inference, agents, and vector_io. ## Test Plan Report generated through `/opt/miniconda3/envs/stack/bin/pytest -s -v tests/client-sdk/ --report` ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-22 19:20:49 -08:00
Ashwin Bharambe	23f1980f9c	Fix meta-reference GPU implementation for inference	2025-01-22 18:31:59 -08:00
Ashwin Bharambe	a8345f5f76	Fix llama stack build docker creation to have correct entrypoint	2025-01-22 16:53:54 -08:00
Ashwin Bharambe	08dcb9e31e	Accept "query_config" params for the RAG tool	2025-01-22 16:42:36 -08:00
Hardik Shah	deab4f57dd	Improved report generation for providers (#844 ) # What does this PR do? Automates the model list check by querying the distro. Added support for both remote hosted and templates. ## Test Plan Run on a remote hosted distro via `LLAMA_STACK_BASE_URL="https://llamastack-preview.fireworks.ai" pytest -s -v tests/client-sdk --report` Run on a template via `LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk --report`	2025-01-22 15:27:09 -08:00
Botao Chen	8738c3e5a7	fix experimental-post-training template (#842 ) ## What does this PR do? For the completion of https://github.com/meta-llama/llama-stack/pull/835 ## Test Plan llama stack build --template experimental-post-training --image-type conda llama stack run llama_stack/templates/experimental-post-training/run.yaml	2025-01-22 15:04:05 -08:00
Ashwin Bharambe	07b87365ab	[inference api] modify content types so they follow a more standard structure (#841 ) Some small updates to the inference types to make them more standard Specifically: - image data is now located in a "image" subkey - similarly tool call data is located in a "tool_call" subkey The pattern followed is `dict(type="foo", foo=<...>)`	2025-01-22 12:16:18 -08:00
Hardik Shah	caa8387dd2	Fix fireworks client sdk chat completion with images (#840 ) Enable downloads before sending request to fireworks. Test using -- `LLAMA_STACK_CONFIG=./llama_stack/templates/fireworks/run.yaml pytest -s -v -k 'test_image_chat_completion_streaming' tests/client-sdk`	2025-01-22 11:25:10 -08:00
Ashwin Bharambe	a63a43c646	[memory refactor][6/n] Update naming and routes (#839 ) Making a few small naming changes as per feedback: - RAGToolRuntime methods are called `insert` and `query` to keep them more general - The tool names are changed to non-namespaced forms `insert_into_memory` and `query_from_memory` - The REST endpoints are more REST-ful	2025-01-22 10:39:13 -08:00
Ashwin Bharambe	c9e5578151	[memory refactor][5/n] Migrate all vector_io providers (#835 ) See https://github.com/meta-llama/llama-stack/issues/827 for the broader design. This PR finishes off all the stragglers and migrates everything to the new naming.	2025-01-22 10:17:59 -08:00
Ashwin Bharambe	1a7490470a	[memory refactor][3/n] Introduce RAGToolRuntime as a specialized sub-protocol (#832 ) See https://github.com/meta-llama/llama-stack/issues/827 for the broader design. Third part: - we need to make `tool_runtime.rag_tool.query_context()` and `tool_runtime.rag_tool.insert_documents()` methods work smoothly with complete type safety. To that end, we introduce a sub-resource path `tool-runtime/rag-tool/` and make changes to the resolver to make things work. - the PR updates the agents implementation to directly call these typed APIs for memory accesses rather than going through the complex, untyped "invoke_tool" API. the code looks much nicer and simpler (expectedly.) - there are a number of hacks in the server resolver implementation still, we will live with some and fix some Note that we must make sure the client SDKs are able to handle this subresource complexity also. Stainless has support for subresources, so this should be possible but beware. ## Test Plan Our RAG test is sad (doesn't actually test for actual RAG output) but I verified that the implementation works. I will work on fixing the RAG test afterwards. ```bash pytest -s -v tests/agents/test_agents.py -k "rag and together" --safety-shield=meta-llama/Llama-Guard-3-8B ```	2025-01-22 10:04:16 -08:00
Ashwin Bharambe	78a481bb22	[memory refactor][2/n] Update faiss and make it pass tests (#830 ) See https://github.com/meta-llama/llama-stack/issues/827 for the broader design. Second part: - updates routing table / router code - updates the faiss implementation ## Test Plan ``` pytest -s -v -k sentence test_vector_io.py --env EMBEDDING_DIMENSION=384 ```	2025-01-22 10:02:15 -08:00
Ashwin Bharambe	3ae8585b65	[memory refactor][1/n] Rename Memory -> VectorIO, MemoryBanks -> VectorDBs (#828 ) See https://github.com/meta-llama/llama-stack/issues/827 for the broader design. This is the first part: - delete other kinds of memory banks (keyvalue, keyword, graph) for now; we will introduce a keyvalue store API as part of this design but not use it in the RAG tool yet. - renaming of the APIs	2025-01-22 09:59:30 -08:00

1 2 3 4 5 ...

593 commits