- **refactor: simplify job status extraction a bit**
- **torchtune: save job status on schedule**
- **refactor: get rid of job_list in torchtune job management code**
# What does this PR do?
A failed job is now registered in the API, and one can consult its status.
## Test Plan
```
$ llama-stack-client post_training status --job-uuid test-jobe244b5b0-5053-4892-a4d9-d8fc8b116e73
JobStatusResponse(checkpoints=[], job_uuid='test-jobe244b5b0-5053-4892-a4d9-d8fc8b116e73', status='failed', completed_at=None, resources_allocated=None, scheduled_at=datetime.datetime(2025, 2, 18, 9, 4, 34, 3252), started_at=datetime.datetime(2025, 2, 18, 9, 4, 34, 10688))
```
---------
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
1. This PR adds batch inserts into sqlite-vec as requested in
https://github.com/meta-llama/llama-stack/pull/1040
- Note: the inserts use a UUID generated from the hash of the document
ID and chunk content (see the sketch after this list).
2. This PR also adds unit tests for sqlite-vec. In a follow-up PR, I can
add similar tests for Faiss.
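A minimal sketch of the deterministic chunk-ID scheme (the helper name matches `test_generate_chunk_id` below; the exact hash construction is an assumption):
```python
import hashlib
import uuid


def generate_chunk_id(document_id: str, chunk_text: str) -> str:
    # Hash the document ID together with the chunk content so that
    # re-inserting the same chunk always yields the same UUID, which
    # keeps batch inserts idempotent.
    hash_input = f"{document_id}:{chunk_text}".encode("utf-8")
    return str(uuid.UUID(hashlib.md5(hash_input).hexdigest()))
```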
## Test Plan
1. Integration tests:
```python
INFERENCE_MODEL=llama3.2:3b-instruct-fp16 LLAMA_STACK_CONFIG=ollama pytest -s -v tests/client-sdk/vector_io/test_vector_io.py
...
PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_retrieve[all-MiniLM-L6-v2-sqlite_vec] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_list PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_register[all-MiniLM-L6-v2-faiss] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_register[all-MiniLM-L6-v2-sqlite_vec] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_unregister[faiss] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_unregister[sqlite_vec] PASSED
```
2. Unit tests:
```python
pytest llama_stack/providers/tests/vector_io/test_sqlite_vec.py -v -s --tb=short --disable-warnings --asyncio-mode=auto
...
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_add_chunks PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_query_chunks PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_chunk_id_conflict PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_register_vector_db PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_unregister_vector_db PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_generate_chunk_id PASSED
```
I also tested using the same example RAG script in
https://github.com/meta-llama/llama-stack/pull/1040 and received the
expected output.
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
There should be a choke-point for llama3.api imports -- this is the
prompt adapter. Creating a ChatFormat() object on demand is inexpensive.
The underlying Tokenizer is a singleton anyway.
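A rough sketch of the choke-point idea (import paths and the singleton accessor are assumptions):
```python
from llama_models.llama3.api.chat_format import ChatFormat
from llama_models.llama3.api.tokenizer import Tokenizer


def get_chat_format() -> ChatFormat:
    # Creating ChatFormat per call is cheap: the expensive state lives in
    # the Tokenizer, which is cached as a process-wide singleton.
    return ChatFormat(Tokenizer.get_instance())
```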
# What does this PR do?
Before this change, `distro_codegen.py` would only work if the user
manually installed multiple provider-specific dependencies (see #1122).
Now, users can run `distro_codegen.py` without any provider-specific
dependencies because we avoid importing the entire provider
implementations just to get the config needed to build the provider
template.
Concretely, this mostly means moving the
MODEL_ALIASES (and related variants) definitions to a new `models.py`
module within the provider implementation for those providers that
require additional dependencies. It also meant moving a couple of
imports from top-level imports to inside `get_adapter_impl` for some
providers, which follows the pattern used by multiple existing
providers.
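A sketch of that lazy-import pattern (the provider names here are hypothetical):
```python
from .config import ExampleProviderConfig


async def get_adapter_impl(config: ExampleProviderConfig, _deps):
    # Import the heavy provider SDK only when the adapter is actually
    # instantiated; importing this module just to read the config class
    # (as distro_codegen.py does) pulls in no provider-specific deps.
    from .example import ExampleInferenceAdapter

    impl = ExampleInferenceAdapter(config)
    await impl.initialize()
    return impl
```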
To ensure we don't regress and accidentally add new imports that cause
distro_codegen.py to fail, the stubbed-in pre-commit hook for
distro_codegen.py was uncommented and slightly tweaked to run via `uv
run python ...` to ensure it runs with only the project's default
dependencies and to run automatically instead of manually.
Lastly, this updates distro_codegen.py itself to keep track of paths it
might have changed and to only `git diff` those specific paths when
checking for changed files instead of doing a diff on the entire working
tree. The latter was overly broad and would require a user have no other
unstaged changes in their working tree, even if those unstaged changes
were unrelated to generated code. Now it only flags uncommitted changes
for paths distro_codegen.py actually writes to.
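A hedged sketch of that targeted check (function and variable names are assumptions, not the script's exact code):
```python
import subprocess


def has_uncommitted_changes(written_paths: list[str]) -> bool:
    # Diff only the paths the codegen wrote; unrelated unstaged changes
    # elsewhere in the working tree no longer trip the check.
    result = subprocess.run(
        ["git", "diff", "--exit-code", "--", *written_paths],
        capture_output=True,
    )
    return result.returncode != 0
```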
Our generated code was also out-of-date, presumably because of these
issues, so this commit also has some updates to the generated code
purely because it was out of sync, and the pre-commit hook now enforces
things to be updated.
(Closes #1122)
## Test Plan
I manually tested distro_codegen.py and the pre-commit hook to verify
those work as expected, flagging any uncommitted changes and catching any
imports that attempt to pull in provider-specific dependencies.
However, I do not have valid api keys to the impacted provider
implementations, and am unable to easily run the inference tests against
each changed provider. There are no functional changes to the provider
implementations here, but I'd appreciate a second set of eyes on the
changed import statements and moving of MODEL_ALIASES type code to a
separate models.py to ensure I didn't make any obvious errors.
---------
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
# What does this PR do?
- Closes #1142
- The root cause is the `Union[str, AgentToolGroupWithArgs]` type
## Test Plan
- Test with script described in issue.
- Print out final converted pydantic object
<img width="1470" alt="image"
src="https://github.com/user-attachments/assets/15dc9cd0-f37a-4b91-905f-3fe4f59a08c6"
/>
# What does this PR do?
Re-checked against the doc: the download model ID is actually the model
descriptor (also without the `meta-llama/` prefix).
https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/index.html
```
$ llama download --source huggingface --model-id Llama-Guard-3-1B:int4 --hf-token xxx # model descriptor
Fetching 8 files: 0%| | 0/8 [00:00<?, ?it/s]
LICENSE.txt: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 7.71k/7.71k [00:00<00:00, 10.5MB/s]
$ llama download --source huggingface --model-id Llama-Guard-3-1B-INT4 --hf-token xxxx # hugging face repo without meta-llama/
usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL] [--max-parallel MAX_PARALLEL]
[--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE]
llama download: error: Model Llama-Guard-3-1B-INT4 not found <<<<---
$ llama download --source meta --model-id Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8
usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL] [--max-parallel MAX_PARALLEL]
[--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE]
llama download: error: Model Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8 not found
$ llama download --source meta --model-id Llama3.2-3B-Instruct:int4-spinquant-eo8
Please provide the signed URL for model Llama3.2-3B-Instruct:int4-spinquant-eo8 you received via email after visiting https://www.llama.com/llama-downloads/ (e.g., https://llama3-1.llamameta.net/*?Policy...): ^CTraceback (most recent call last):
$ llama download --source meta --model-id meta-llama/Llama3.2-3B-Instruct:int4-spinquant-eo8
usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL]
[--max-parallel MAX_PARALLEL] [--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE]
llama download: error: Model meta-llama/Llama3.2-3B-Instruct:int4-spinquant-eo8 not found
```
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
Based on the code:
6b1773d530/llama_stack/cli/download.py (L454)
and testing, commas can be used to specify multiple model IDs, so update
the usage text.
```
$ llama model download --source meta --model-id Llama3.2-1B,Llama3.2-3B
Please provide the signed URL for model Llama3.2-1B you received via email after visiting https://www.llama.com/llama-downloads/ (e.g., https://llama3-1.llamameta.net/*?Policy...):
Downloading checklist.chk ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 156/156 bytes - 0:00:00
Downloading tokenizer.model ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 2.2/2.2 MB - 0:00:00
Downloading params.json ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 220/220 bytes - 0:00:00
Downloading consolidated.00.pth ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 2.5/2.5 GB - 0:00:00
Successfully downloaded model to /Users/xx/.llama/checkpoints/Llama3.2-1B
[Optionally] To run MD5 checksums, use the following command: llama model verify-download --model-id Llama3.2-1B
Please provide the signed URL for model Llama3.2-3B you received via email after visiting https://www.llama.com/llama-downloads/ (e.g., https://llama3-1.llamameta.net/*?Policy...):
Downloading checklist.chk ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 156/156 bytes - 0:00:00
Downloading tokenizer.model ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 2.2/2.2 MB - 0:00:00
Downloading params.json ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 220/220 bytes - 0:00:00
Downloading consolidated.00.pth ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 6.4/6.4 GB - 0:00:00
Successfully downloaded model to /Users/xx/.llama/checkpoints/Llama3.2-3B
$ llama model download --source huggingface --model-id Llama3.2-1B,Llama3.2-3B
original%2Fparams.json: 100%|██████████████████████████████████████████████████████████| 220/220 [00:00<00:00, 564kB/
Successfully downloaded model to /Users/xx/.llama/checkpoints/Llama3.2-1B
...
tokenizer.json: 100%|█████████████████████████████████████████████████████████████| 9.09M/9.09M [00:00<00:00, 9.18MB/s]
Successfully downloaded model to /Users/xxx/.llama/checkpoints/Llama3.2-3B
before:
$ llama model download --help
--model-id MODEL_ID See `llama model list` or `llama model list --show-all` for the list of available models
after:
$ llama model download --help
--model-id MODEL_ID See `llama model list` or `llama model list --show-all` for the list of available models. Specify multiple model IDs with commas, e.g. --model-id Llama3.2-1B,Llama3.2-3B
```
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
Based on the code
6b1773d530/llama_stack/cli/download.py (L379)
and testing, `verify-download` should only be used for models downloaded from Meta.
```
test: no checklist.chk file for hf download
$ llama model download --source meta --model-id Llama3.2-1B
Downloading checklist.chk ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 156/156 bytes - 0:00:00
Downloading tokenizer.model ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 2.2/2.2 MB - 0:00:00
Downloading params.json ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 220/220 bytes - 0:00:00
Downloading consolidated.00.pth ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 2.5/2.5 GB - 0:00:00
before:
$ llama model verify-download --help
usage: llama model verify-download [-h] --model-id MODEL_ID
Verify the downloaded checkpoints' checksums
options:
-h, --help show this help message and exit
--model-id MODEL_ID Model ID to verify
after:
$ llama model verify-download --help
usage: llama model verify-download [-h] --model-id MODEL_ID
Verify the downloaded checkpoints' checksums for models downloaded from Meta
options:
-h, --help show this help message and exit
--model-id MODEL_ID Model ID to verify (only for models downloaded from Meta)
```
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
From the code and actual usage, there seems to be no case that requires
`--no-list-templates`; it only confuses users in the help text, so
remove it.
```
$ llama stack build --no-list-templates
> Enter a name for your Llama Stack (e.g. my-local-stack):
$ llama stack build
> Enter a name for your Llama Stack (e.g. my-local-stack):
before:
$ llama stack build --help
--list-templates, --no-list-templates
Show the available templates for building a Llama Stack distribution (default: False)
after:
--list-templates Show the available templates for building a Llama Stack distribution
```
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
This fixes an issue when running the e2e agent example:
https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/e2e_loop_with_client_tools.py
```
| File "/home/yutang/repos/llama-stack/llama_stack/providers/remote/inference/vllm/vllm.py", line 175, in _process_vllm_chat_completion_stream_response
| tool_call = convert_tool_call(choice.delta.tool_calls[0])
| File "/home/yutang/repos/llama-stack/llama_stack/providers/utils/inference/openai_compat.py", line 441, in convert_tool_call
| return ToolCall(
| File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/site-packages/pydantic/main.py", line 214, in __init__
| validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
| pydantic_core._pydantic_core.ValidationError: 4 validation errors for ToolCall
| call_id
| Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
| For further information visit https://errors.pydantic.dev/2.10/v/string_type
| tool_name.enum[BuiltinTool]
| Input should be 'brave_search', 'wolfram_alpha', 'photogen' or 'code_interpreter' [type=enum, input_value=None, input_type=NoneType]
| For further information visit https://errors.pydantic.dev/2.10/v/enum
| tool_name.str
| Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
| For further information visit https://errors.pydantic.dev/2.10/v/string_type
| arguments
| Input should be a valid dictionary [type=dict_type, input_value=202, input_type=int]
| For further information visit https://errors.pydantic.dev/2.10/v/dict_type
```
This issue happened because not all arguments have been appended to the
tool call buffer yet. The current code assumes that we are ready to
convert the tool call whenever args can be converted to JSON
successfully. In this case, `json.loads("202")` would succeed but the
rest of the arguments have not been properly parsed yet.
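A minimal illustration of why "parses as JSON" is not a completeness signal:
```python
import json

# A partially accumulated arguments buffer can itself be valid JSON:
partial_args = "202"  # only part of a numeric argument has streamed in
print(json.loads(partial_args))  # -> 202, no error, yet args are incomplete
```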
## Test Plan
The e2e example worked successfully (although note that I ran the script
twice with each function call separately due to
https://github.com/meta-llama/llama-stack/issues/1120):
```
tool_execution> Tool:get_ticker_data Args:{'ticker_symbol': 'GOOG', 'start': '2023-01-01', 'end': '2023-12-31'}
tool_execution> Tool:get_ticker_data Response:"[{\"('Year', '')\":2023,\"('Close', 'GOOG')\":140.4254455566}]"
tool_execution> Tool:web_search Args:{'query': '42nd president of the United States'}
tool_execution> Tool:web_search Response:"{\"query\": \"42nd president of the United States\", \"top_k\": [{\"title\": \"William J. Clinton | whitehouse.gov\", \"url\": \"https://obamawhitehouse.archives.gov/1600/presidents/williamjclinton\", \"description\": \"<strong>Bill Clinton</strong> is an American politician from Arkansas who served as the 42nd President of the United States (1993-2001). He took office at the end of the Cold War, and was the first baby-boomer generation President.\", \"type\": \"search_result\"}, {\"title\": \"Bill Clinton - Wikipedia\", \"url\": \"https://en.wikipedia.org/wiki/Bill_Clinton\", \"description\": \"<strong>William Jefferson Clinton</strong> (n\\u00e9 Blythe; born August 19, 1946) is an American politician and lawyer who served as the 42nd president of the United States from 1993 to 2001. A member of the Democratic Party, he previously served as the attorney general of Arkansas from 1977 to 1979 and as the ...\", \"type\": \"search_result\"}, [{\"type\": \"video_result\", \"url\": \"https://www.youtube.com/watch?v=eR2z_1-v87Y\", \"title\": \"A Conversation with Bill Clinton, 42nd President of the United ...\", \"description\": \"William Jefferson Clinton, the first Democratic president in six decades to be elected twice, led the United States to the longest economic expansion in Amer...\"}, {\"type\": \"video_result\", \"url\": \"4484174096/\", \"title\": \"January 20, 1993, President Clinton was sworn in as the 42nd ...\", \"description\": \"WATCH: On January 20, 1993, President Bill Clinton was sworn in as the 42nd President of the United States. #InaugurationDay Video courtesy of the...\"}, {\"type\": \"video_result\", \"url\": \"https://www.youtube.com/watch?v=vI0HGQqEJh0\", \"title\": \"42nd President of the United States, Bill Clinton, shared thoughts ...\", \"description\": \"AboutPressCopyrightContact usCreatorsAdvertiseDevelopersTermsPrivacyPolicy & SafetyHow YouTube worksTest new features \\u00b7 \\u00a9 2024 Google LLC\"}, {\"type\": \"video_result\", \"url\": \"https://www.youtube.com/shorts/vI0HGQqEJh0\", \"title\": \"42nd President of the United States, Bill Clinton, shared ...\", \"description\": \"Enjoy the videos and music you love, upload original content, and share it all with friends, family, and the world on YouTube.\"}, {\"type\": \"video_result\", \"url\": \"https://www.youtube.com/watch?v=PHihhihVth0\", \"title\": \"Bill & Hillary Clinton returning to Little Rock for 20th ...\", \"description\": \"Enjoy the videos and music you love, upload original content, and share it all with friends, family, and the world on YouTube.\"}]]}"
```
All text inference tests passed.
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
```
before:
$ llama stack run --help
usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--disable-ipv6] [--env KEY=VALUE]
[--tls-keyfile TLS_KEYFILE] [--tls-certfile TLS_CERTFILE] [--image-type {conda,container,venv}]
config
start <<<<<<---- the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.
After:
$ llama stack run --help
usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--disable-ipv6] [--env KEY=VALUE]
[--tls-keyfile TLS_KEYFILE] [--tls-certfile TLS_CERTFILE] [--image-type {conda,container,venv}]
config
Start <<<<<<---- the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.
```
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
Remove the empty line from help
```
before:
$ llama model download --help
--max-parallel MAX_PARALLEL
Maximum number of concurrent downloads
--ignore-patterns IGNORE_PATTERNS
<<<<<<<<<empty line>>>>>>>>>>
For source=huggingface, files matching any of the patterns are not downloaded. Defaults to ignoring
safetensors files to avoid downloading duplicate weights.
after:
$ llama model download --help
--max-parallel MAX_PARALLEL
Maximum number of concurrent downloads
--ignore-patterns IGNORE_PATTERNS
For source=huggingface, files matching any of the patterns are not downloaded. Defaults to ignoring
safetensors files to avoid downloading duplicate weights.
```
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
This issue was discovered in
https://github.com/meta-llama/llama-stack/pull/1009#discussion_r1947036518.
## Test Plan
This field is no longer required after the change.
---------
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
llama-models should have extremely minimal cruft. Its sole purpose
should be didactic -- show the simplest implementation of the llama
models and document the prompt formats, etc.
This PR is the complement to
https://github.com/meta-llama/llama-models/pull/279
## Test Plan
Ensure all `llama` CLI `model` sub-commands work:
```bash
llama model list
llama model download --model-id ...
llama model prompt-format -m ...
```
Ran tests:
```bash
cd tests/client-sdk
LLAMA_STACK_CONFIG=fireworks pytest -s -v inference/
LLAMA_STACK_CONFIG=fireworks pytest -s -v vector_io/
LLAMA_STACK_CONFIG=fireworks pytest -s -v agents/
```
Created a fresh venv (`uv venv && source .venv/bin/activate`) and ran
`llama stack build --template fireworks --image-type venv` followed by
`llama stack run together --image-type venv`; the server runs.
Also checked that the OpenAPI generator can run and there is no change
in the generated files as a result.
```bash
cd docs/openapi_generator
sh run_openapi_generator.sh
```
# What does this PR do?
- Remove hardcoded configurations from pre-commit.
- Allow configuration to be set via pyproject.toml.
- Merge .ruff.toml settings into pyproject.toml.
- Ensure the linter and formatter use the defined configuration instead
of being overridden by pre-commit.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
This fixes an import broken by merging #1079 before #1039, which meant
the changes from #1039 needed to update `QdrantConfig` to
`QdrantVectorIOConfig`.
## Test Plan
I ran the remote vllm provider inference tests against the latest main:
```
VLLM_URL="http://localhost:8001/v1" python -m pytest -s -v llama_stack/providers/tests/inference/test_text_inference.py --providers "inference=vllm_remote"
```
That failed with:
```
File "/home/bbrownin/src/llama-stack/llama_stack/providers/tests/vector_io/fixtures.py", line 20, in <module>
from llama_stack.providers.remote.vector_io.qdrant import QdrantConfig
ImportError: Error importing plugin "llama_stack.providers.tests.vector_io.fixtures": cannot import name 'QdrantConfig' from 'llama_stack.providers.remote.vector_io.qdrant' (/home/bbrownin/src/llama-stack/llama_stack/providers/remote/vector_io/qdrant/__init__.py)
```
After this change, the import no longer fails and the tests pass.
Signed-off-by: Ben Browning <bbrownin@redhat.com>
# What does this PR do?
- Update `/eval-tasks` to `/benchmarks`
- ⚠️ Remove the differentiation between `app` vs. `benchmark` eval task
config. Now we only have `BenchmarkConfig`. The overloaded `benchmark`
is confusing and does not add any value. Backward compatibility is kept,
as the "type" is not used anywhere.
## Test Plan
- This change is backward compatible
- Run notebook test with
```
pytest -v -s --nbval-lax ./docs/getting_started.ipynb
pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb
```
<img width="846" alt="image"
src="https://github.com/user-attachments/assets/d2fc06a7-593a-444f-bc1f-10ab9b0c843d"
/>
---------
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Co-authored-by: Ben Browning <ben324@gmail.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Reid <61492567+reidliu41@users.noreply.github.com>
Co-authored-by: reidliu <reid201711@gmail.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
This is a follow-on to #1022. It includes the changes I needed to be
able to test the Qdrant support as requested by @terrytangyuan.
I uncovered a lot of bigger, more systemic issues with the vector DB
testing and I will open a new issue for those. For now, I am just
delivering the work I already did on that.
## Test Plan
As discussed on #1022:
```
podman pull qdrant/qdrant
mkdir qdrant-data
podman run -p 6333:6333 -v $(pwd)/qdrant-data:/qdrant/storage qdrant/qdrant
```
```
ollama pull all-minilm:l6-v2
curl http://localhost:11434/api/embeddings -d '{"model": "all-minilm", "prompt": "Hello world"}'
```
```
EMBEDDING_DIMENSION=384 QDRANT_URL=http://localhost pytest llama_stack/providers/tests/vector_io/test_vector_io.py -m "qdrant" -v -s --tb=short --embedding-model all-minilm:latest --disable-warnings
```
These show 3 tests passing and 15 deselected, which is presumably
working as intended.
---------
Signed-off-by: Bill Murdock <bmurdock@redhat.com>
This was missed in https://github.com/meta-llama/llama-stack/pull/1023.
```
Traceback (most recent call last):
File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/yutang/repos/llama-stack/llama_stack/distribution/server/server.py", line 488, in <module>
main()
File "/home/yutang/repos/llama-stack/llama_stack/distribution/server/server.py", line 389, in main
impls = asyncio.run(construct_stack(config))
File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/home/yutang/repos/llama-stack/llama_stack/distribution/stack.py", line 202, in construct_stack
impls = await resolve_impls(run_config, provider_registry or get_provider_registry(), dist_registry)
File "/home/yutang/repos/llama-stack/llama_stack/distribution/resolver.py", line 230, in resolve_impls
impl = await instantiate_provider(
File "/home/yutang/repos/llama-stack/llama_stack/distribution/resolver.py", line 312, in instantiate_provider
config_type = instantiate_class_type(provider_spec.config_class)
File "/home/yutang/repos/llama-stack/llama_stack/distribution/utils/dynamic.py", line 13, in instantiate_class_type
return getattr(module, class_name)
AttributeError: module 'llama_stack.providers.inline.vector_io.faiss' has no attribute 'FaissImplConfig'
```
---------
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
This changes all VectorIO providers classes to follow the pattern
`<ProviderName>VectorIOConfig` and `<ProviderName>VectorIOAdapter`. All
API endpoints for VectorIOs are currently consistent with `/vector-io`.
Note that the API endpoints for VectorDB stay unchanged at `/vector-dbs`.
## Test Plan
I don't have a way to test all providers. This is a simple renaming so
things should work as expected.
---------
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
- Configured ruff linter to automatically fix import sorting issues.
- Set --exit-non-zero-on-fix to ensure non-zero exit code when fixes are
applied.
- Enabled the 'I' selection to focus on import-related linting rules.
- Ran the linter, and formatted all codebase imports accordingly.
- Removed the `black` dependency from the "dev" group since we use `ruff`.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Remove `:path` from agents routes; we cannot have `:path` params inside
endpoint paths except as the last segment.
## Test Plan
```
llama stack run
```
# What does this PR do?
Since the subcommands use `MODEL_ID`, it would be better to use the same
term in the `llama model list` output to make it easy to find.
```
$ llama model verify-download --help
usage: llama model verify-download [-h] --model-id MODEL_ID <<
$ llama model describe --help
usage: llama model describe [-h] -m MODEL_ID <<
$ llama download --help
--model-id MODEL_ID See `llama model list` or `llama model list --show-all` for the list of available models
before:
$ llama model list
+-----------------------------------------+-----------------------------------------------------+----------------+
| Model Descriptor | Hugging Face Repo | Context Length |
+-----------------------------------------+-----------------------------------------------------+----------------+
after:
$ llama model list
+-----------------------------------------+-----------------------------------------------------+----------------+
| Model Descriptor | Model ID | Context Length |
+-----------------------------------------+-----------------------------------------------------+----------------+
| Llama3.1-8B | meta-llama/Llama-3.1-8B | 128K |
+-----------------------------------------+-----------------------------------------------------+----------------+
```
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
This commit enhances the signal handling mechanism in the server by
improving the `handle_signal` (previously `handle_sigint`) function. It
now properly retrieves the signal name, ensuring clearer logging when a
termination signal is received. Additionally, it cancels all running
tasks and waits for their completion before stopping the event loop,
allowing for a more graceful shutdown. Support for handling
SIGTERM has also been added alongside SIGINT.
Before the changes, `handle_sigint` used `asyncio.run(run_shutdown())`.
However, `asyncio.run()` is meant to start a new event loop, and calling
it inside an existing one (like when running Uvicorn) raises an error.
The fix replaces `asyncio.run(run_shutdown())` with an async function
scheduled on the existing loop using `loop.create_task(shutdown())`. This
ensures that the shutdown coroutine runs within the current event loop
instead of trying to create a new one.
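A minimal sketch of the approach (handler structure assumed, not the exact server code):
```python
import asyncio
import signal


def handle_signal(loop: asyncio.AbstractEventLoop, signum: int) -> None:
    print(f"Received signal {signal.Signals(signum).name} ({signum}). Exiting gracefully...")
    # asyncio.run(...) would raise here because a loop is already running;
    # instead, schedule the shutdown coroutine on the existing loop.
    loop.create_task(shutdown())


async def shutdown() -> None:
    # Cancel every task except ourselves, wait for them to finish,
    # then stop the event loop.
    tasks = [t for t in asyncio.all_tasks() if t is not asyncio.current_task()]
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    asyncio.get_running_loop().stop()
```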
Furthermore, this commit updates the project dependencies. `fastapi` and
`uvicorn` have been added to the development dependencies in
`pyproject.toml` and `uv.lock`, ensuring that the necessary packages are
available for development and execution.
Closes: https://github.com/meta-llama/llama-stack/issues/1043
Signed-off-by: Sébastien Han <seb@redhat.com>
## Test Plan
Run a server and send SIGINT:
```
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" python -m llama_stack.distribution.server.server --yaml-config ./llama_stack/templates/ollama/run.yaml
Using config file: llama_stack/templates/ollama/run.yaml
Run configuration:
apis:
- agents
- datasetio
- eval
- inference
- safety
- scoring
- telemetry
- tool_runtime
- vector_io
container_image: null
datasets: []
eval_tasks: []
image_name: ollama
metadata_store:
db_path: /Users/leseb/.llama/distributions/ollama/registry.db
namespace: null
type: sqlite
models:
- metadata: {}
model_id: meta-llama/Llama-3.2-3B-Instruct
model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType
- llm
provider_id: ollama
provider_model_id: null
- metadata:
embedding_dimension: 384
model_id: all-MiniLM-L6-v2
model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType
- embedding
provider_id: sentence-transformers
provider_model_id: null
providers:
agents:
- config:
persistence_store:
db_path: /Users/leseb/.llama/distributions/ollama/agents_store.db
namespace: null
type: sqlite
provider_id: meta-reference
provider_type: inline::meta-reference
datasetio:
- config: {}
provider_id: huggingface
provider_type: remote::huggingface
- config: {}
provider_id: localfs
provider_type: inline::localfs
eval:
- config: {}
provider_id: meta-reference
provider_type: inline::meta-reference
inference:
- config:
url: http://localhost:11434
provider_id: ollama
provider_type: remote::ollama
- config: {}
provider_id: sentence-transformers
provider_type: inline::sentence-transformers
safety:
- config: {}
provider_id: llama-guard
provider_type: inline::llama-guard
scoring:
- config: {}
provider_id: basic
provider_type: inline::basic
- config: {}
provider_id: llm-as-judge
provider_type: inline::llm-as-judge
- config:
openai_api_key: '********'
provider_id: braintrust
provider_type: inline::braintrust
telemetry:
- config:
service_name: llama-stack
sinks: console,sqlite
sqlite_db_path: /Users/leseb/.llama/distributions/ollama/trace_store.db
provider_id: meta-reference
provider_type: inline::meta-reference
tool_runtime:
- config:
api_key: '********'
max_results: 3
provider_id: brave-search
provider_type: remote::brave-search
- config:
api_key: '********'
max_results: 3
provider_id: tavily-search
provider_type: remote::tavily-search
- config: {}
provider_id: code-interpreter
provider_type: inline::code-interpreter
- config: {}
provider_id: rag-runtime
provider_type: inline::rag-runtime
vector_io:
- config:
kvstore:
db_path: /Users/leseb/.llama/distributions/ollama/faiss_store.db
namespace: null
type: sqlite
provider_id: faiss
provider_type: inline::faiss
scoring_fns: []
server:
port: 8321
tls_certfile: null
tls_keyfile: null
shields: []
tool_groups:
- args: null
mcp_endpoint: null
provider_id: tavily-search
toolgroup_id: builtin::websearch
- args: null
mcp_endpoint: null
provider_id: rag-runtime
toolgroup_id: builtin::rag
- args: null
mcp_endpoint: null
provider_id: code-interpreter
toolgroup_id: builtin::code_interpreter
vector_dbs: []
version: '2'
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:213: Resolved 31 providers
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-inference => ollama
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-inference => sentence-transformers
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: models => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inference => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-vector_io => faiss
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-safety => llama-guard
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: shields => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: safety => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: vector_dbs => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: vector_io => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-tool_runtime => brave-search
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-tool_runtime => tavily-search
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-tool_runtime => code-interpreter
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-tool_runtime => rag-runtime
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: tool_groups => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: tool_runtime => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: agents => meta-reference
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-datasetio => huggingface
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-datasetio => localfs
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: datasets => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: datasetio => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: telemetry => meta-reference
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-scoring => basic
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-scoring => llm-as-judge
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-scoring => braintrust
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: scoring_functions => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: scoring => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-eval => meta-reference
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: eval_tasks => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: eval => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inspect => __builtin__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:216:
INFO 2025-02-12 10:21:03,723 llama_stack.providers.remote.inference.ollama.ollama:148: checking connectivity to Ollama at `http://localhost:11434`...
INFO 2025-02-12 10:21:03,734 httpx:1740: HTTP Request: GET http://localhost:11434/api/ps "HTTP/1.1 200 OK"
INFO 2025-02-12 10:21:03,843 faiss.loader:148: Loading faiss.
INFO 2025-02-12 10:21:03,865 faiss.loader:150: Successfully loaded faiss.
INFO 2025-02-12 10:21:03,868 faiss:173: Failed to load GPU Faiss: name 'GpuIndexIVFFlat' is not defined. Will not load constructor refs for GPU indexes.
Warning: `bwrap` is not available. Code interpreter tool will not work correctly.
INFO 2025-02-12 10:21:04,315 datasets:54: PyTorch version 2.6.0 available.
INFO 2025-02-12 10:21:04,556 httpx:1740: HTTP Request: GET http://localhost:11434/api/ps "HTTP/1.1 200 OK"
INFO 2025-02-12 10:21:04,557 llama_stack.providers.utils.inference.embedding_mixin:42: Loading sentence transformer for all-MiniLM-L6-v2...
INFO 2025-02-12 10:21:07,202 sentence_transformers.SentenceTransformer:210: Use pytorch device_name: mps
INFO 2025-02-12 10:21:07,202 sentence_transformers.SentenceTransformer:218: Load pretrained SentenceTransformer: all-MiniLM-L6-v2
INFO 2025-02-12 10:21:09,500 llama_stack.distribution.stack:102: Models: all-MiniLM-L6-v2 served by sentence-transformers
INFO 2025-02-12 10:21:09,500 llama_stack.distribution.stack:102: Models: meta-llama/Llama-3.2-3B-Instruct served by ollama
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: basic::equality served by basic
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: basic::regex_parser_multiple_choice_answer served by basic
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: basic::subset_of served by basic
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::answer-correctness served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::answer-relevancy served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::answer-similarity served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-entity-recall served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-precision served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-recall served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-relevancy served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::factuality served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::faithfulness served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: llm-as-judge::405b-simpleqa served by llm-as-judge
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: llm-as-judge::base served by llm-as-judge
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Tool_groups: builtin::code_interpreter served by code-interpreter
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Tool_groups: builtin::rag served by rag-runtime
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Tool_groups: builtin::websearch served by tavily-search
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:106:
Serving API eval
POST /v1/eval/tasks/{task_id}/evaluations
DELETE /v1/eval/tasks/{task_id}/jobs/{job_id}
GET /v1/eval/tasks/{task_id}/jobs/{job_id}/result
GET /v1/eval/tasks/{task_id}/jobs/{job_id}
POST /v1/eval/tasks/{task_id}/jobs
Serving API agents
POST /v1/agents
POST /v1/agents/{agent_id}/session
POST /v1/agents/{agent_id}/session/{session_id}/turn
DELETE /v1/agents/{agent_id}
DELETE /v1/agents/{agent_id}/session/{session_id}
GET /v1/agents/{agent_id}/session/{session_id}
GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}/step/{step_id}
GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}
Serving API scoring_functions
GET /v1/scoring-functions/{scoring_fn_id}
GET /v1/scoring-functions
POST /v1/scoring-functions
Serving API safety
POST /v1/safety/run-shield
Serving API inspect
GET /v1/health
GET /v1/inspect/providers
GET /v1/inspect/routes
GET /v1/version
Serving API tool_runtime
POST /v1/tool-runtime/invoke
GET /v1/tool-runtime/list-tools
POST /v1/tool-runtime/rag-tool/insert
POST /v1/tool-runtime/rag-tool/query
Serving API datasetio
POST /v1/datasetio/rows
GET /v1/datasetio/rows
Serving API shields
GET /v1/shields/{identifier}
GET /v1/shields
POST /v1/shields
Serving API eval_tasks
GET /v1/eval-tasks/{eval_task_id}
GET /v1/eval-tasks
POST /v1/eval-tasks
Serving API models
GET /v1/models/{model_id}
GET /v1/models
POST /v1/models
DELETE /v1/models/{model_id}
Serving API datasets
GET /v1/datasets/{dataset_id}
GET /v1/datasets
POST /v1/datasets
DELETE /v1/datasets/{dataset_id}
Serving API vector_io
POST /v1/vector-io/insert
POST /v1/vector-io/query
Serving API inference
POST /v1/inference/chat-completion
POST /v1/inference/completion
POST /v1/inference/embeddings
Serving API tool_groups
GET /v1/tools/{tool_name}
GET /v1/toolgroups/{toolgroup_id}
GET /v1/toolgroups
GET /v1/tools
POST /v1/toolgroups
DELETE /v1/toolgroups/{toolgroup_id}
Serving API vector_dbs
GET /v1/vector-dbs/{vector_db_id}
GET /v1/vector-dbs
POST /v1/vector-dbs
DELETE /v1/vector-dbs/{vector_db_id}
Serving API scoring
POST /v1/scoring/score
POST /v1/scoring/score-batch
Serving API telemetry
GET /v1/telemetry/traces/{trace_id}/spans/{span_id}
GET /v1/telemetry/spans/{span_id}/tree
GET /v1/telemetry/traces/{trace_id}
POST /v1/telemetry/events
GET /v1/telemetry/spans
GET /v1/telemetry/traces
POST /v1/telemetry/spans/export
Listening on ['::', '0.0.0.0']:5001
INFO: Started server process [65372]
INFO: Waiting for application startup.
INFO: ASGI 'lifespan' protocol appears unsupported.
INFO: Application startup complete.
INFO: Uvicorn running on http://['::', '0.0.0.0']:5001 (Press CTRL+C to quit)
^CINFO: Shutting down
INFO: Finished server process [65372]
Received signal SIGINT (2). Exiting gracefully...
INFO 2025-02-12 10:21:11,215 __main__:151: Shutting down ModelsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down InferenceRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ShieldsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down SafetyRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down VectorDBsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down VectorIORouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ToolGroupsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ToolRuntimeRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down MetaReferenceAgentsImpl
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down DatasetsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down DatasetIORouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down TelemetryAdapter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ScoringFunctionsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ScoringRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down EvalTasksRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down EvalRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down DistributionInspectImpl
```
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
The remote-vllm provider was not passing logprobs options from
CompletionRequest or ChatCompletionRequests through to the OpenAI client
parameters. I manually verified this, as well as observed this provider
failing `TestInference::test_completion_logprobs`. This was filed as
issue #1073.
This fixes that by passing the `logprobs.top_k` value through to the
parameters we pass into the OpenAI client.
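A minimal sketch of the pass-through (helper name assumed):
```python
def build_openai_params(request) -> dict:
    params = {}
    if request.logprobs:
        # vLLM's OpenAI-compatible completions API accepts an integer
        # `logprobs` value, so forward the requested top_k.
        params["logprobs"] = request.logprobs.top_k
    return params
```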
Additionally, this fixes a bug in `test_text_inference.py` where it
mistakenly assumed chunk.delta were of type `ContentDelta` for
completion requests. The deltas are of type `ContentDelta` for chat
completion requests, but for basic completion requests the deltas are of
type string. This test was likely failing for other providers that did
properly support logprobs because of this latter issue in the test,
which was hit while fixing the above issue with the remote-vllm
provider.
(Closes #1073)
## Test Plan
First, you need a vllm running. I ran one locally like this:
```
vllm serve meta-llama/Llama-3.2-3B-Instruct --port 8001 --enable-auto-tool-choice --tool-call-parser llama3_json
```
Next, run test_text_inference.py against this vllm using the remote vllm
provider like this:
```
VLLM_URL="http://localhost:8001/v1" python -m pytest -s -v llama_stack/providers/tests/inference/test_text_inference.py --providers "inference=vllm_remote"
```
Before my change, the test failed with this error:
```
llama_stack/providers/tests/inference/test_text_inference.py:155: in test_completion_logprobs
assert 1 <= len(response.logprobs) <= 5
E TypeError: object of type 'NoneType' has no len()
```
After my change, the test passes.
Signed-off-by: Ben Browning <bbrownin@redhat.com>
# What does this PR do?
This PR adds `sqlite_vec` as an additional inline vector DB provider.
Tested with `ollama` by adding the `vector_io` object in
`./llama_stack/templates/ollama/run.yaml`:
```yaml
vector_io:
- provider_id: sqlite_vec
  provider_type: inline::sqlite_vec
  config:
    kvstore:
      type: sqlite
      namespace: null
      db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/sqlite_vec.db
```
I also updated the `./tests/client-sdk/vector_io/test_vector_io.py` test
file with:
```python
INLINE_VECTOR_DB_PROVIDERS = ["faiss", "sqlite_vec"]
```
And parameterized the relevant tests.
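A sketch of that parameterization (fixture shape assumed):
```python
import pytest

INLINE_VECTOR_DB_PROVIDERS = ["faiss", "sqlite_vec"]


@pytest.fixture(params=INLINE_VECTOR_DB_PROVIDERS)
def provider_id(request):
    # Every test taking this fixture runs once per inline provider, which
    # is why the PASSED lines below carry [faiss] / [sqlite_vec] suffixes.
    return request.param
```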
Closes https://github.com/meta-llama/llama-stack/issues/1005
## Test Plan
I ran the tests with:
```bash
INFERENCE_MODEL=llama3.2:3b-instruct-fp16 LLAMA_STACK_CONFIG=ollama pytest -s -v tests/client-sdk/vector_io/test_vector_io.py
```
Which outputs:
```python
...
PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_retrieve[all-MiniLM-L6-v2-sqlite_vec] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_list PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_register[all-MiniLM-L6-v2-faiss] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_register[all-MiniLM-L6-v2-sqlite_vec] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_unregister[faiss] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_unregister[sqlite_vec] PASSED
```
In addition, I ran the `rag_with_vector_db.py`
[example](https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/rag_with_vector_db.py)
using the script below with `uv run rag_example.py`.
<details>
<summary>CLICK TO SHOW SCRIPT 👋 </summary>
```python
#!/usr/bin/env python3
import os
import uuid
from termcolor import cprint
# Set environment variables
os.environ['INFERENCE_MODEL'] = 'llama3.2:3b-instruct-fp16'
os.environ['LLAMA_STACK_CONFIG'] = 'ollama'
# Import libraries after setting environment variables
from llama_stack.distribution.library_client import LlamaStackAsLibraryClient
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger
from llama_stack_client.types.agent_create_params import AgentConfig
from llama_stack_client.types import Document
def main():
# Initialize the client
client = LlamaStackAsLibraryClient("ollama")
vector_db_id = f"test-vector-db-{uuid.uuid4().hex}"
_ = client.initialize()
model_id = 'llama3.2:3b-instruct-fp16'
# Define the list of document URLs and create Document objects
urls = [
"chat.rst",
"llama3.rst",
"memory_optimizations.rst",
"lora_finetune.rst",
]
documents = [
Document(
document_id=f"num-{i}",
content=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/{url}",
mime_type="text/plain",
metadata={},
)
for i, url in enumerate(urls)
]
# (Optional) Use the documents as needed with your client here
client.vector_dbs.register(
provider_id='sqlite_vec',
vector_db_id=vector_db_id,
embedding_model="all-MiniLM-L6-v2",
embedding_dimension=384,
)
client.tool_runtime.rag_tool.insert(
documents=documents,
vector_db_id=vector_db_id,
chunk_size_in_tokens=512,
)
# Create agent configuration
agent_config = AgentConfig(
model=model_id,
instructions="You are a helpful assistant",
enable_session_persistence=False,
toolgroups=[
{
"name": "builtin::rag",
"args": {
"vector_db_ids": [vector_db_id],
}
}
],
)
# Instantiate the Agent
agent = Agent(client, agent_config)
# List of user prompts
user_prompts = [
"What are the top 5 topics that were explained in the documentation? Only list succinct bullet points.",
"Was anything related to 'Llama3' discussed, if so what?",
"Tell me how to use LoRA",
"What about Quantization?",
]
# Create a session for the agent
session_id = agent.create_session("test-session")
# Process each prompt and display the output
for prompt in user_prompts:
cprint(f"User> {prompt}", "green")
response = agent.create_turn(
messages=[
{
"role": "user",
"content": prompt,
}
],
session_id=session_id,
)
# Log and print events from the response
for log in EventLogger().log(response):
log.print()
if __name__ == "__main__":
main()
```
</details>
Which outputs a large summary of RAG generation.
## Documentation
Will handle documentation updates in follow-up PR.
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
Add `--image-type` to `llama stack run`, which takes conda, container, or
venv. Also add `start_venv.sh`, which starts the stack using a venv.
Resolves #1007
## Test Plan
running locally:
`llama stack build --template ollama --image-type venv`
`llama stack run --image-type venv
~/.llama/distributions/ollama/ollama-run.yaml`
...
```
llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml
Using run configuration: /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml
+ python -m llama_stack.distribution.server.server --yaml-config /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml --port 8321
Using config file: /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml
Run configuration:
apis:
- agents
- datasetio
...
```
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
When executing a sub-command like `llama model`, the wrong help text,
sub-commands, and flags are displayed. Each command group needs to call
`.set_defaults` to display this info properly, as sketched after the
before/after output below.
before:
```
llama model
usage: llama [-h] {model,stack,download,verify-download} ...
Welcome to the Llama CLI
options:
-h, --help show this help message and exit
subcommands:
{model,stack,download,verify-download}
```
after:
```
llama model
usage: llama model [-h] {download,list,prompt-format,describe,verify-download} ...
Work with llama models
options:
-h, --help show this help message and exit
model_subcommands:
{download,list,prompt-format,describe,verify-download}
```
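A minimal sketch of the `.set_defaults` pattern (prog and help strings here are illustrative):
```python
import argparse

parser = argparse.ArgumentParser(prog="llama", description="Welcome to the Llama CLI")
subparsers = parser.add_subparsers(title="subcommands")

model_parser = subparsers.add_parser("model", help="Work with llama models")
# Without this default, a bare `llama model` falls through to the
# top-level parser's help instead of the model group's own help.
model_parser.set_defaults(func=lambda args: model_parser.print_help())

args = parser.parse_args()
if hasattr(args, "func"):
    args.func(args)
else:
    parser.print_help()
```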
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
Closes https://github.com/meta-llama/llama-stack/issues/1046.
## Test Plan
```
LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -v tests/client-sdk/inference/test_text_inference.py
================================================================= test session starts =================================================================
platform linux -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /home/yutang/.conda/envs/distribution-myenv/bin/python3.10
cachedir: .pytest_cache
rootdir: /home/yutang/repos/llama-stack
configfile: pyproject.toml
plugins: anyio-4.8.0
collected 14 items
tests/client-sdk/inference/test_text_inference.py::test_text_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 7%]
tests/client-sdk/inference/test_text_inference.py::test_text_completion_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 14%]
tests/client-sdk/inference/test_text_inference.py::test_completion_log_probs_non_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote:...) [ 21%]
tests/client-sdk/inference/test_text_inference.py::test_completion_log_probs_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote::vll...) [ 28%]
tests/client-sdk/inference/test_text_inference.py::test_text_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 35%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-Which planet do humans live on?-Earth] PASSED [ 42%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-Which planet has rings around it with a name starting with letter S?-Saturn] PASSED [ 50%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What's the name of the Sun in latin?-Sol] PASSED [ 57%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What is the name of the US captial?-Washington] PASSED [ 64%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 71%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 78%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 85%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[meta-llama/Llama-3.1-8B-Instruct-True] PASSED [ 92%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[meta-llama/Llama-3.1-8B-Instruct-False] PASSED [100%]
=============================================== 12 passed, 2 xfailed, 1 warning in 366.56s (0:06:06) ================================================
```
---------
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
This PR adds support for tool calling in non-streaming chat completion. Prior to this, tool calls were not passed to chat completion requests, and the tools object needed to be restructured to be compatible with the vLLM provider (a sketch of that restructuring follows).
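A hedged sketch of the kind of restructuring involved; the field names (`tool_name`, `param_type`, etc.) are assumptions for this example, not the PR's actual code. It converts a tool definition into the OpenAI-style `tools` payload that vLLM accepts:
```python
from dataclasses import dataclass, field

@dataclass
class ToolParam:
    param_type: str
    description: str
    required: bool = True

@dataclass
class ToolDef:
    tool_name: str
    description: str
    parameters: dict[str, ToolParam] = field(default_factory=dict)

def to_openai_tool(tool: ToolDef) -> dict:
    # Collect JSON-schema style properties for each declared parameter.
    properties, required = {}, []
    for name, p in tool.parameters.items():
        properties[name] = {"type": p.param_type, "description": p.description}
        if p.required:
            required.append(name)
    # vLLM (like OpenAI) expects tools as {"type": "function", "function": {...}}.
    return {
        "type": "function",
        "function": {
            "name": tool.tool_name,
            "description": tool.description,
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }
```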
## Test Plan
```
LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -v tests/client-sdk/inference/test_text_inference.py
================================================================= test session starts =================================================================
platform linux -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /home/yutang/.conda/envs/distribution-myenv/bin/python3.10
cachedir: .pytest_cache
rootdir: /home/yutang/repos/llama-stack
configfile: pyproject.toml
plugins: anyio-4.8.0
collected 12 items
tests/client-sdk/inference/test_text_inference.py::test_text_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 8%]
tests/client-sdk/inference/test_text_inference.py::test_text_completion_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 16%]
tests/client-sdk/inference/test_text_inference.py::test_completion_log_probs_non_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote:...) [ 25%]
tests/client-sdk/inference/test_text_inference.py::test_completion_log_probs_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote::vll...) [ 33%]
tests/client-sdk/inference/test_text_inference.py::test_text_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 41%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-Which planet do humans live on?-Earth] PASSED [ 50%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-Which planet has rings around it with a name starting with letter S?-Saturn] PASSED [ 58%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What's the name of the Sun in latin?-Sol] PASSED [ 66%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What is the name of the US captial?-Washington] PASSED [ 75%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 83%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_streaming[meta-llama/Llama-3.1-8B-Instruct] FAILED [ 91%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED [100%]
```
---------
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
Make telemetry attributes contain only primitive types, avoiding arbitrary nesting (a sketch of the idea follows).
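A minimal sketch of the idea; the helper name is hypothetical. Attribute values are kept as primitives, and anything nested is serialized to a JSON string instead:
```python
import json
from typing import Any, Union

Primitive = Union[str, int, float, bool]

def to_primitive_attributes(attributes: dict[str, Any]) -> dict[str, Primitive]:
    flattened: dict[str, Primitive] = {}
    for key, value in attributes.items():
        if isinstance(value, (str, int, float, bool)):
            flattened[key] = value  # already a primitive, keep as-is
        else:
            flattened[key] = json.dumps(value)  # serialize instead of nesting
    return flattened
```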
## Test Plan
```
LLAMA_STACK_DISABLE_VERSION_CHECK=true llama stack run ~/.llama/distributions/fireworks/fireworks-run.yaml
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/agents/test_agents.py -k "test_builtin_tool_web_search"
# Verified that attributes still show up correctly in Jaeger
```
# What does this PR do?
Defines a `MetricResponseMixin` that can be inherited by any response class, and adds it to the chat completion response types.
This is a short-term solution to allow the inference API to return metrics. The ideal way to do this is for all response types to include a metrics field, and for all metric events logged to the telemetry API to be included with the response. To do this, we would need to augment all response types with a `metrics` field, but we have hit a blocker in the Stainless SDK that prevents this. The blocker is that if we were to augment the response types that have a `data` field like so:
```python
class ListModelsResponse(BaseModel):
    metrics: Optional[List[MetricEvent]] = None
    data: List[Models]
    ...
```
the client SDK would need to access the data via a `.data` field, which is not ergonomic. The Stainless SDK does support unwrapping the response type, but it requires that the response type have only a single field. We will need a way in the client SDK to signal that metrics are needed and, if they are, to return the full response type without unwrapping it. A minimal sketch of the mixin follows.
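This sketch assumes simplified field shapes; the real llama-stack response types carry more fields than shown here:
```python
from typing import List, Optional
from pydantic import BaseModel

class MetricEvent(BaseModel):
    metric: str
    value: float
    unit: Optional[str] = None

class MetricResponseMixin(BaseModel):
    # Any response class that inherits this mixin gains an optional metrics field.
    metrics: Optional[List[MetricEvent]] = None

class ChatCompletionResponse(MetricResponseMixin):
    completion_text: str  # simplified stand-in for the real response fields
```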
## Test Plan
```
sh run_openapi_generator.sh ./
sh stainless_sync.sh dineshyv/dev add-metrics-to-resp-v4
LLAMA_STACK_CONFIG="/Users/dineshyv/.llama/distributions/fireworks/fireworks-run.yaml" pytest -v tests/client-sdk/agents/test_agents.py
```
# What does this PR do?
`tool_config` is missing from the signature but is used in
`ChatCompletionRequest()`.
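A simplified sketch of the bug shape; the signatures here are assumed for illustration, not SambaNova's actual code. The method builds a `ChatCompletionRequest` with `tool_config`, so the name must appear in its signature:
```python
from dataclasses import dataclass
from typing import Any, List, Optional

@dataclass
class ChatCompletionRequest:
    model: str
    messages: List[Any]
    tool_config: Optional[Any] = None

async def chat_completion(
    model_id: str,
    messages: List[Any],
    tool_config: Optional[Any] = None,  # previously missing from the signature
) -> ChatCompletionRequest:
    # Without the parameter above, this reference raised a NameError at runtime.
    return ChatCompletionRequest(model=model_id, messages=messages, tool_config=tool_config)
```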
## Test Plan
This is a small fix. I don't have SambaNova access to test the change, but I doubt this is currently working.
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>