# What does this PR do?
If `headers` is not passed, the first row renders empty and the second row can also break; this change makes the `Model` row serve as the `headers` instead. (A rough sketch of the change follows the example output below.)
```
Before:
$ llama model describe -m Llama3.1-70B
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ┃ ┃ <<<---------
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Model │ Llama3.1-70B │ <<<---------
├─────────────────────────────┼────────────────────────────────┤
│ Hugging Face ID │ meta-llama/Llama-3.1-70B │
├─────────────────────────────┼────────────────────────────────┤
│ Description │ Llama 3.1 70b model │
├─────────────────────────────┼────────────────────────────────┤
......
After:
$ llama model describe -m Llama3.1-70B
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Model ┃ Llama3.1-70B ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Hugging Face ID │ meta-llama/Llama-3.1-70B │
├─────────────────────────────┼────────────────────────────────┤
│ Description │ Llama 3.1 70b model │
├─────────────────────────────┼────────────────────────────────┤
......
```
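A rough sketch of the kind of change, assuming the CLI renders this table with `rich.table.Table`; the helper below is illustrative, not the actual command code:
```python
from rich.console import Console
from rich.table import Table

def print_describe_table(rows: list[tuple[str, str]]) -> None:
    # Before: the table was created with no columns, so the header row rendered
    # empty and "Model" ended up as an ordinary body row.
    # After: the first ("Model", <model-id>) pair becomes the column headers.
    headers, *body = rows
    table = Table(*headers)
    for key, value in body:
        table.add_row(key, value)
    Console().print(table)

print_describe_table([
    ("Model", "Llama3.1-70B"),
    ("Hugging Face ID", "meta-llama/Llama-3.1-70B"),
    ("Description", "Llama 3.1 70b model"),
])
```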
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
- Introduced logging in `StackRun` to replace print-based messages
- Improved error handling for config file loading and parsing
- Replaced `cprint` with `logger.error` for consistent error messaging
- Ensured logging is used in `server.py` for startup, shutdown, and
runtime messages
- Added missing exception handling for invalid providers
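A rough sketch of the print-to-logging pattern described above, assuming a module-level logger; the function and config handling here are illustrative, not the actual `StackRun` code:
```python
import logging

import yaml  # assumed: llama-stack run configs are YAML

logger = logging.getLogger(__name__)

def load_run_config(config_path: str) -> dict:
    """Load a run config, logging failures instead of printing them."""
    try:
        with open(config_path) as f:
            return yaml.safe_load(f)
    except FileNotFoundError:
        # Previously something like: cprint(f"Config not found: {config_path}", "red")
        logger.error("Config file not found: %s", config_path)
        raise
    except yaml.YAMLError as exc:
        logger.error("Could not parse config %s: %s", config_path, exc)
        raise
```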
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Merges the two "files" fields for the codegen check. This also fixes the broken CI build on the main branch.
## Test Plan
```
Distribution Template Codegen............................................Passed
- hook id: distro-codegen
- duration: 367.44s
```
---------
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
It appears to have been changed by
8585b95a28,
so clicking the link now shows a `404`.
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
Create a distribution template using Groq as the inference provider.
Link to issue: https://github.com/meta-llama/llama-stack/issues/958
## Test Plan
Run `python llama_stack/scripts/distro_codegen.py` to generate run.yaml
and build.yaml
Test the newly created template by running
`llama stack build --template <template-name>`
`llama stack run <template-name>`
**Description:** This PR removes the "$" symbol from the client CLI
reference so that users can copy and paste the code blocks without also
copying the "$". I know the "$" is useful for indicating user
permissions, but it isn't really used in other parts of the docs, and
removing it makes the copy-and-paste flow for the code blocks easier.
Very small nit PR; it's not a huge deal if this PR is not needed.
# What does this PR do?
Currently, `build_venv.sh` expects a `distribution_type` as its first
argument, but the only things ever passed are:
1. image name
2. pip dependencies
Since `distribution_type` is never passed in, the script errors when called
with something like:
`llama stack build --image-type venv --template ollama --image-name test`
Before:
```
llama stack build --image-type venv --template ollama --image-name venv-test
Usage: /Users/charliedoern/projects/Documents/llama-stack/llama_stack/distribution/build_venv.sh <distribution_type> <env_name> <pip_dependencies> [<special_pip_deps>]
Example: /Users/charliedoern/projects/Documents/llama-stack/llama_stack/distribution/build_venv.sh <distribution_type> mybuild ./my-stack-build.yaml 'numpy pandas scipy'
Failed to build target venv-test with return code 1
Run config path is empty
```
After:
```
llama stack build --image-type venv --template ollama --image-name venv-test
Environment 'venv-test' already exists, re-using it.
Using virtual environment venv-test
Using CPython 3.13.0 interpreter at: /opt/homebrew/opt/python@3.13/bin/python3.13
Creating virtual environment at: venv-test
Activate with: source venv-test/bin/activate
Using Python 3.13.0 environment at: venv-test
Resolved 55 packages in 640ms
Built fire==0.7.0
Prepared 54 packages in 1.14s
Installed 55 packages in 82ms
+ annotated-types==0.7.0
```
## Test Plan
Ran locally with the output shown above.
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
- Fixed type hinting and missing imports across multiple modules.
- Improved compatibility by using `TYPE_CHECKING` for conditional
imports.
- Updated `pyproject.toml` to enforce stricter linting.
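The `TYPE_CHECKING` pattern referred to above looks roughly like this (the module and class names are illustrative):
```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by type checkers such as mypy; never imported at runtime,
    # which avoids circular imports and optional-dependency failures.
    from heavy_optional_dependency import HeavyClient  # illustrative name

def describe(client: "HeavyClient") -> str:
    # The annotation is a string, so HeavyClient is not needed at runtime.
    return type(client).__name__
```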
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
When there are issues with the tool call function, an exception is
raised but the error message is not informative. This adds a clearer
message telling users to check their functions. (A sketch of the kind of
guard follows the traceback below.)
```
Traceback (most recent call last):
File "/Users/phayes/projects/llama-stack/llama-stack/llama_stack/distribution/server/server.py", line 208, in sse_generator
async for item in event_gen:
File "/Users/phayes/projects/llama-stack/llama-stack/llama_stack/providers/inline/agents/meta_reference/agents.py", line 165, in _create_agent_turn_streaming
async for event in agent.create_and_execute_turn(request):
File "/Users/phayes/projects/llama-stack/llama-stack/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 197, in create_and_execute_turn
async for chunk in self.run(
File "/Users/phayes/projects/llama-stack/llama-stack/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 389, in run
async for res in self._run(
File "/Users/phayes/projects/llama-stack/llama-stack/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 811, in _run
content=tool_result.content,
AttributeError: 'NoneType' object has no attribute 'content'
```
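A minimal sketch of the kind of guard this adds; the function and message wording are illustrative, not the exact code:
```python
def unwrap_tool_result(tool_result, tool_name: str):
    """Turn an opaque AttributeError into an actionable error message."""
    if tool_result is None:
        raise ValueError(
            f"Tool call '{tool_name}' returned no result. "
            "Check that your client tool function is defined correctly and returns a value."
        )
    return tool_result.content
```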
## Test Plan
Ran the same script; the exception is now raised with a clearer error message.
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Summary:
The Kotlin SDK expects this format.
Test Plan:
Python prints the expected format:
```
>>> str(datetime.now().astimezone())
'2025-02-24 22:02:58.729763-08:00'
```
# What does this PR do?
Whenever uv is invoked and creates a virtual environment, it will now use
the minimum Python interpreter version supported by the project, which is
3.10.
Closes: https://github.com/meta-llama/llama-stack/issues/1170
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
PR #1139 caused pre-commit failures on main, likely due to an improper
rebase before merge. Run pre-commit on main and commit the changes.
See the runs here:
3775148428
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
Now that llama stack supports running in venv, conda, and container
modes, and the three scripts overlap a lot, combine them into one
`start_stack.sh` script.
## Test Plan
Tested this locally with venv, conda, and container modes.
---------
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
Summary:
Currently we don't set the best tool_prompt_format according to the model,
as promised.
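As a hypothetical sketch of the idea (the real enum and mapping live in llama-stack and may differ):
```python
from enum import Enum

class ToolPromptFormat(Enum):
    # Mirrors the spirit of the llama-stack enum; values here are illustrative.
    json = "json"
    function_tag = "function_tag"
    python_list = "python_list"

def default_tool_prompt_format(model_id: str) -> ToolPromptFormat:
    # Assumption: 3.1 models default to json and newer models to python_list;
    # check the actual llama-stack defaults before relying on this.
    if "3.1" in model_id:
        return ToolPromptFormat.json
    return ToolPromptFormat.python_list

print(default_tool_prompt_format("Llama3.1-8B-Instruct").value)  # json
```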
Test Plan:
Added prints around the raw model input and inspected it manually.
# What does this PR do?
When building providers in a virtual environment or containers, special
pip dependencies may not always be provided (e.g., for Ollama). The
check should only fail if the required number of arguments is missing.
Currently, two arguments are mandatory:
1. Environment name
2. Pip dependencies
Additionally, return statements were replaced with sys.exit(1) in error
conditions to ensure immediate termination on critical failures. Error
handling in the stack build process was also improved to guarantee the
program exits with status 1 when facing configuration issues or build
failures.
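The return-vs-`sys.exit(1)` change described above looks roughly like this, sketched outside the real CLI code:
```python
import subprocess
import sys

def run_build_script(script: str, args: list[str]) -> None:
    """Run a build script and terminate the program on failure."""
    return_code = subprocess.call([script, *args])
    if return_code != 0:
        # Previously an error was printed and the function simply returned;
        # now the process exits with a non-zero status immediately.
        print(f"Failed to build target with return code {return_code}", file=sys.stderr)
        sys.exit(1)
```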
Signed-off-by: Sébastien Han <seb@redhat.com>
## Test Plan
This command shouldn't fail:
```
llama stack build --template ollama --image-type venv
```
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
`--run` runs the stack that was just built, using the same arguments from
the build process (image-name, type, etc.).
This simplifies the workflow a lot and improves the UX for most local
users trying to get started, since they no longer have to match the flags
between the two commands (build and then run).
Also moved `ImageType` to `distribution.utils` since there were circular
import errors with its old location.
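A simplified, self-contained sketch of the flag wiring; the real `llama stack build` parser and helpers differ:
```python
import argparse

def do_build(args: argparse.Namespace) -> str:
    # Stand-in for the real build step; returns the image name it produced.
    print(f"building {args.template} ({args.image_type}) as {args.image_name}")
    return args.image_name

def do_run(image_name: str, image_type: str) -> None:
    # Stand-in for the real run step, reusing the build arguments.
    print(f"running {image_type} image {image_name}")

parser = argparse.ArgumentParser(prog="llama stack build")
parser.add_argument("--template")
parser.add_argument("--image-type")
parser.add_argument("--image-name")
parser.add_argument("--run", action="store_true",
                    help="Run the stack right after a successful build.")

args = parser.parse_args(
    ["--run", "--template", "ollama", "--image-type", "venv", "--image-name", "test"]
)
image = do_build(args)
if args.run:
    do_run(image, args.image_type)
```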
## Test Plan
Tested locally using the following command:
`llama stack build --run --template ollama --image-type venv`
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
`llama model list` and `llama model list --show-all` list many or all of
the models, so add a `--search` option to narrow the output. (A sketch of
the filtering logic follows the example output below.)
```
$ llama model list --help
usage: llama model list [-h] [--show-all] [-s SEARCH]
Show available llama models
options:
-h, --help show this help message and exit
--show-all Show all models (not just defaults)
-s SEARCH, --search SEARCH
Search for the input string as a substring in the model descriptor(ID)
$ llama model list -s 70b
+-----------------------+-----------------------------------+----------------+
| Model Descriptor(ID) | Hugging Face Repo | Context Length |
+-----------------------+-----------------------------------+----------------+
| Llama3.1-70B | meta-llama/Llama-3.1-70B | 128K |
+-----------------------+-----------------------------------+----------------+
| Llama3.1-70B-Instruct | meta-llama/Llama-3.1-70B-Instruct | 128K |
+-----------------------+-----------------------------------+----------------+
| Llama3.3-70B-Instruct | meta-llama/Llama-3.3-70B-Instruct | 128K |
+-----------------------+-----------------------------------+----------------+
$ llama model list -s 3.1-8b
+----------------------+----------------------------------+----------------+
| Model Descriptor(ID) | Hugging Face Repo | Context Length |
+----------------------+----------------------------------+----------------+
| Llama3.1-8B | meta-llama/Llama-3.1-8B | 128K |
+----------------------+----------------------------------+----------------+
| Llama3.1-8B-Instruct | meta-llama/Llama-3.1-8B-Instruct | 128K |
+----------------------+----------------------------------+----------------+
$ llama model list --show-all -s pro
+----------------------+-----------------------------+----------------+
| Model Descriptor(ID) | Hugging Face Repo | Context Length |
+----------------------+-----------------------------+----------------+
| Prompt-Guard-86M | meta-llama/Prompt-Guard-86M | 2K |
+----------------------+-----------------------------+----------------+
$ llama model list -s k
Not found for search.
```
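A sketch of the substring filter itself; the CLI wiring and table rendering are omitted:
```python
def filter_models(descriptors: list[str], search: str | None) -> list[str]:
    """Keep only model descriptors containing the search string (case-insensitive)."""
    if not search:
        return descriptors
    return [d for d in descriptors if search.lower() in d.lower()]

models = ["Llama3.1-70B", "Llama3.1-70B-Instruct", "Llama3.3-70B-Instruct", "Llama3.1-8B"]
matches = filter_models(models, "70b")
if matches:
    print("\n".join(matches))
else:
    print("Not found for search.")
```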
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
This PR begins the process of supporting non-llama models within Llama
Stack. We start simple by adding support for this functionality within a
few existing providers: fireworks, together and ollama.
## Test Plan
```bash
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/inference/test_text_inference.py \
--inference-model accounts/fireworks/models/phi-3-vision-128k-instruct
```
^ This passes most of the tests but, as expected, fails the tool-calling-related
tests since they are very specific to Llama models.
```
inference/test_text_inference.py::test_text_completion_streaming[accounts/fireworks/models/phi-3-vision-128k-instruct] PASSED
inference/test_text_inference.py::test_completion_log_probs_non_streaming[accounts/fireworks/models/phi-3-vision-128k-instruct] PASSED
inference/test_text_inference.py::test_completion_log_probs_streaming[accounts/fireworks/models/phi-3-vision-128k-instruct] PASSED
inference/test_text_inference.py::test_text_completion_structured_output[accounts/fireworks/models/phi-3-vision-128k-instruct-completion-01] PASSED
inference/test_text_inference.py::test_text_chat_completion_non_streaming[accounts/fireworks/models/phi-3-vision-128k-instruct-Which planet do humans live on?-Earth] PASSED
inference/test_text_inference.py::test_text_chat_completion_non_streaming[accounts/fireworks/models/phi-3-vision-128k-instruct-Which planet has rings around it with a name starting w
ith letter S?-Saturn] PASSED
inference/test_text_inference.py::test_text_chat_completion_streaming[accounts/fireworks/models/phi-3-vision-128k-instruct-What's the name of the Sun in latin?-Sol] PASSED
inference/test_text_inference.py::test_text_chat_completion_streaming[accounts/fireworks/models/phi-3-vision-128k-instruct-What is the name of the US captial?-Washington] PASSED
inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_non_streaming[accounts/fireworks/models/phi-3-vision-128k-instruct] FAILED
inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_streaming[accounts/fireworks/models/phi-3-vision-128k-instruct] FAILED
inference/test_text_inference.py::test_text_chat_completion_with_tool_choice_required[accounts/fireworks/models/phi-3-vision-128k-instruct] FAILED
inference/test_text_inference.py::test_text_chat_completion_with_tool_choice_none[accounts/fireworks/models/phi-3-vision-128k-instruct] PASSED
inference/test_text_inference.py::test_text_chat_completion_structured_output[accounts/fireworks/models/phi-3-vision-128k-instruct] ERROR
inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[accounts/fireworks/models/phi-3-vision-128k-instruct-True] PASSED
inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[accounts/fireworks/models/phi-3-vision-128k-instruct-False] PASSED
```
# What does this PR do?
- Enable mypy to run in the CI on a subset of the repository
- Fix a few mypy errors
- Run mypy from pre-commit
Signed-off-by: Sébastien Han <seb@redhat.com>
Summary:
Allows tools to output metadata. This is useful for evaluating tool
outputs; e.g., the RAG tool will output document IDs, which can be used to
score recall.
Will need to make a similar change on the client side to support
ClientTool outputting metadata.
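For illustration only, a tool result that carries metadata might look like this; `ToolResult` and its fields here are a stand-in, not the actual llama-stack type:
```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class ToolResult:
    # Stand-in for the real tool result type; field names are illustrative.
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)

# A RAG tool could report which documents it used so an eval can score recall.
result = ToolResult(
    content="Paris is the capital of France.",
    metadata={"document_ids": ["doc-123", "doc-456"]},
)
print(result.metadata["document_ids"])
```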
Test Plan:
```
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/agents/test_agents.py
```
# Problem
Our current Agent framework has discrepancies in how server-side and
client-side tools are defined and handled:
1. Server tools: a single Turn is returned, including the `ToolExecutionStep`,
by the agents implementation.
2. Client tools: `create_agent_turn` is called in a loop, with the client agent
lib yielding the agent chunks.
ad6ffc63df/src/llama_stack_client/lib/agents/agent.py (L186-L211)
This makes working with server and client tools inconsistent. It also
complicates the telemetry logs needed to get information about an agent's
turns/history for observability.
#### Principle
The same `turn_id` should be used to represent all the steps required to
complete a user message, including client tool calls.
## Solution
1. Add an `AgentTurnResponseEventType.turn_awaiting_input` status to indicate
that the current turn is not completed and is awaiting tool input.
2. Add a `continue_agent_turn` endpoint to update the agent turn with the
client's tool response (see the sketch after this list).
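As a heavily hedged illustration, the intended client-side loop might look roughly like this; the event value, method names (`create_agent_turn`, `continue_agent_turn`), and response fields follow the description above rather than the real llama-stack client API:
```python
def run_turn(client, agent_id, session_id, user_message, client_tools):
    """Drive one turn, resuming it whenever the server awaits client tool input."""
    turn = client.create_agent_turn(agent_id, session_id, messages=[user_message])
    # "turn_awaiting_input" mirrors AgentTurnResponseEventType.turn_awaiting_input.
    while turn.status == "turn_awaiting_input":
        call = turn.pending_tool_call                       # hypothetical field
        result = client_tools[call.name](**call.arguments)  # run the client tool locally
        # Same turn_id: the client tool step is folded into the original turn.
        turn = client.continue_agent_turn(
            agent_id, session_id, turn_id=turn.turn_id, tool_responses=[result]
        )
    return turn
```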
# What does this PR do?
- Skeleton API as example
## Test Plan
- Just API update, no functionality change
```
llama stack run + client-sdk test
```
<img width="842" alt="image"
src="https://github.com/user-attachments/assets/7ac56b5f-f424-4632-9476-7e0f57555bc3"
/>
Embedding models are tiny and can be pulled on-demand. Let's do that so
the user doesn't have to do "yet another thing" to get themselves set
up.
Thanks @hardikjshah for the suggestion.
Also fixed a missed build dependency (TODO: distro_codegen needs to
actually check that the build template contains all providers mentioned
in the run.yaml file).
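A sketch of the on-demand pull using the `ollama` Python client directly; the actual provider code path is not shown, and the client calls (`show`, `pull`, `ResponseError`) are my assumption of that package's API:
```python
import ollama  # assumes the `ollama` Python client and a reachable ollama server

EMBEDDING_MODEL = "all-minilm:latest"

def ensure_embedding_model() -> None:
    """Pull the embedding model only if it is not already present locally."""
    try:
        ollama.show(EMBEDDING_MODEL)  # raises if the model is not present
    except ollama.ResponseError:
        print(f"Pulling embedding model `{EMBEDDING_MODEL}`")
        ollama.pull(EMBEDDING_MODEL)
```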
## Test Plan
First run `ollama rm all-minilm:latest`.
Run `llama stack build --template ollama && llama stack run ollama --env
INFERENCE_MODEL=llama3.2:3b-instruct-fp16`. See that it outputs a
"Pulling embedding model `all-minilm:latest`" output and the stack
starts up correctly. Verify that `ollama list` shows the model is
correctly downloaded.
# What does this PR do?
Addresses issues where the container is unable to run as a non-root user.
Gives write access to the required folders.
Closes #1207
## Test Plan
I built locally and ran `llama stack build --template remote-vllm
--image-type container` and validated I could see my changes in the
output:
```
#11 1.186 Installed 11 packages in 61ms
#11 1.186 + llama-models==0.1.3
#11 1.186 + llama-stack==0.1.3
#11 1.186 + llama-stack-client==0.1.3
#11 1.186 + markdown-it-py==3.0.0
#11 1.186 + mdurl==0.1.2
#11 1.186 + prompt-toolkit==3.0.50
#11 1.186 + pyaml==25.1.0
#11 1.186 + pygments==2.19.1
#11 1.186 + rich==13.9.4
#11 1.186 + tiktoken==0.9.0
#11 1.186 + wcwidth==0.2.13
#11 DONE 1.6s
#12 [ 9/10] RUN mkdir -p /.llama /.cache
#12 DONE 0.3s
#13 [10/10] RUN chmod -R g+rw /app /.llama /.cache
#13 DONE 0.3s
#14 exporting to image
#14 exporting layers
#14 exporting layers 3.7s done
#14 writing image sha256:11cc8bd954db6d036037bcaf471b173ddd5261ac4b1e72074cccf85d18aefb96 done
#14 naming to docker.io/library/distribution-remote-vllm:0.1.3 done
#14 DONE 3.7s
+ set +x
Success!
```
This is what the resulting image looks like: *(screenshot omitted)*
Also tagged the image as `0.1.3-test` and [pushed to
quay](https://quay.io/repository/jland/distribution-remote-vllm?tab=tags)
(note there are a bunch of critical vulnerabilities we may want to look
into)
And for good measure I deployed the resulting image in my OpenShift
environment using the default Security Context and validated that there
were no issues with it coming up.
My validation was all done with the `remote-vllm` distribution, but if I
am understanding everything correctly, the other distributions are just
different run.yaml configs.
Please let me know if there is anything else I need to do.
Co-authored-by: Jamie Land <hokie10@gmail.com>