# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]
Add a `remove` subcommand to help clean up unneeded models:
```
$ llama model --help
usage: llama model [-h] {download,list,prompt-format,describe,verify-download,remove} ...
Work with llama models
options:
-h, --help show this help message and exit
$ llama model remove --help
usage: llama model remove [-h] -m MODEL [-f]
Remove the downloaded llama model
options:
-h, --help show this help message and exit
-m MODEL, --model MODEL
Specify the llama downloaded model name
-f, --force Used to forcefully remove the llama model from the storage without further confirmation
$ llama model remove -m Llama3.2-1B-Instruct:int4-qlora-eo8
Are you sure you want to remove Llama3.2-1B-Instruct:int4-qlora-eo8? (y/n): n
Removal aborted.
$ llama model remove -m Llama3.2-1B-Instruct:int4-qlora-eo8 -f
Llama3.2-1B-Instruct:int4-qlora-eo8 removed.
```
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
[//]: # (## Documentation)
---------
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
See Issue #922
The change is slightly backwards incompatible, but no callsite (in our
client codebases or stack-apps) ever passes a depth-2
`List[List[InterleavedContentItem]]` (which is now disallowed).
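For illustration, a minimal sketch of the accepted shape, with placeholder type names (not the actual llama_stack definitions):
```python
from typing import List, Union


class InterleavedContentItem:  # placeholder for the real content item type
    pass


# Accepted: a flat list of strings and/or content items.
EmbeddingContents = List[Union[str, InterleavedContentItem]]

# No longer accepted: a depth-2 nested list, i.e.
# List[List[InterleavedContentItem]].
```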
## Test Plan
```bash
$ cd llama_stack/providers/tests/inference
$ pytest -s -v -k fireworks test_embeddings.py \
--inference-model nomic-ai/nomic-embed-text-v1.5 --env EMBEDDING_DIMENSION=784
$ pytest -s -v -k together test_embeddings.py \
--inference-model togethercomputer/m2-bert-80M-8k-retrieval --env EMBEDDING_DIMENSION=784
$ pytest -s -v -k ollama test_embeddings.py \
--inference-model all-minilm:latest --env EMBEDDING_DIMENSION=784
```
Also ran `tests/client-sdk/inference/test_embeddings.py`
Summary:
Need this to format the completion message with tool_calls correctly.
See added unittest.
Test Plan:
```
python -m unittest llama_stack.providers.tests.inference.test_prompt_adapter
```
# What does this PR do?
The `tool_name` attribute of `ToolDefinition` instances can either be a
str or a BuiltinTool enum type. This fixes the remote vLLM provider to
use the value of those BuiltinTool enums when serializing to JSON
instead of attempting to serialize the actual enum to JSON.
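A minimal sketch of that pattern (the helper name is illustrative, not the exact provider code):
```python
from enum import Enum


def tool_name_to_json_value(tool_name) -> str:
    # BuiltinTool members carry their wire-format name in .value;
    # plain strings pass through unchanged.
    return tool_name.value if isinstance(tool_name, Enum) else tool_name
```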
Reference of how this is handled in some other areas, since I followed
that same pattern for the remote vLLM provider here:
- [remote nvidia
provider](https://github.com/meta-llama/llama-stack/blob/v0.1.3/llama_stack/providers/remote/inference/nvidia/openai_utils.py#L137-L140)
- [meta reference
provider](https://github.com/meta-llama/llama-stack/blob/v0.1.3/llama_stack/providers/inline/agents/meta_reference/agent_instance.py#L635-L636)
There is an opportunity to reconcile the remote nvidia and
remote vllm bits where they are both translating Llama Stack Inference
APIs to OpenAI client requests, but that's a can of worms I didn't want
to open for this bug fix.
This explicitly fixes this error when using the remote vLLM provider and
the agent tests:
```
TypeError: Object of type BuiltinTool is not JSON serializable
```
So, this is related to #1144 and addresses the immediate issue raised
there. With this fix,
`tests/client-sdk/agents/test_agents.py::test_builtin_tool_web_search`
now gets past the JSON serialization error when using the remote vLLM
provider and actually attempts to call the web search tool. I don't have
any API keys setup for the actual web search providers yet, so I cannot
verify everything works after that point.
## Test Plan
I ran the `test_builtin_tool_web_search` locally with the remote vLLM
provider like:
```
VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" LLAMA_STACK_CONFIG=remote-vllm python -m pytest -v tests/client-sdk/agents/test_agents.py::test_builtin_tool_web_search --inference-model "meta-llama/Llama-3.2-3B-Instruct"
```
Before my change, that reproduced the `TypeError: Object of type
BuiltinTool is not JSON serializable` error. After my change, that error
is gone and the test actually attempts the web search. That failed for
me locally, due to lack of API key, but it gets past the JSON
serialization error.
Signed-off-by: Ben Browning <bbrownin@redhat.com>
# What does this PR do?
add /v1/inference/embeddings implementation to NVIDIA provider
**open topics** -
- *asymmetric models*. NeMo Retriever includes asymmetric models, which
are models that embed differently depending on whether the input is destined
for storage or lookup against storage. the /v1/inference/embeddings api
does not allow the user to indicate the type of embedding to perform.
see https://github.com/meta-llama/llama-stack/issues/934
- *truncation*. embedding models typically have a limited context
window, e.g. 1024 tokens is common though newer models have 8k windows.
when the input is larger than this window the endpoint cannot perform
its designed function. two options: 0. return an error so the user can
reduce the input size and retry; 1. perform truncation for the user and
proceed (common strategies are left or right truncation). many users
encounter context window size limits and will struggle to write reliable
programs. this struggle is especially acute without access to the
model's tokenizer. the /v1/inference/embeddings api does not allow the
user to delegate truncation policy. see
https://github.com/meta-llama/llama-stack/issues/933
- *dimensions*. "Matryoshka" embedding models are available. they allow
users to control the number of embedding dimensions the model produces.
this is a critical feature for managing storage constraints. embeddings
of 1024 dimensions that achieve 95% recall for an application may not be
worth the storage cost if 512 dimensions can achieve 93% recall.
controlling embedding dimensions allows applications to determine their
recall and storage tradeoffs. the /v1/inference/embeddings api does not
allow the user to control the output dimensions. see
https://github.com/meta-llama/llama-stack/issues/932
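For context, a minimal sketch of a request against the current endpoint; the request body field names here are assumptions based on the Llama Stack API conventions, not a verified schema, and note there is nowhere to express embedding type, truncation policy, or output dimensions:
```python
import requests

# Hypothetical request; adjust base URL and model to your deployment.
resp = requests.post(
    "http://localhost:8321/v1/inference/embeddings",
    json={
        "model_id": "baai/bge-m3",
        "contents": ["What is the capital of France?"],
    },
)
print(resp.json())
```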
## Test Plan
- `llama stack run llama_stack/templates/nvidia/run.yaml`
- `LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v
tests/client-sdk/inference/test_embedding.py --embedding-model
baai/bge-m3`
## Sources
Please link relevant resources if necessary.
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.
---------
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
# What does this PR do?
We have support for embeddings in our Inference providers, but so far we
haven't done the final step of actually registering the known embedding
models and making sure they are extremely easy to use. This is one step
towards that.
## Test Plan
Run existing inference tests.
```bash
$ cd llama_stack/providers/tests/inference
$ pytest -s -v -k fireworks test_embeddings.py \
--inference-model nomic-ai/nomic-embed-text-v1.5 --env EMBEDDING_DIMENSION=784
$ pytest -s -v -k together test_embeddings.py \
--inference-model togethercomputer/m2-bert-80M-8k-retrieval --env EMBEDDING_DIMENSION=784
$ pytest -s -v -k ollama test_embeddings.py \
--inference-model all-minilm:latest --env EMBEDDING_DIMENSION=784
```
The value of EMBEDDING_DIMENSION isn't actually used in these tests; it is
merely used by the test fixtures to check whether the model is an LLM or an
embedding model.
# What does this PR do?
We have several places running tests for different purposes.
- oss llama stack
  - provider tests
  - e2e tests
- provider llama stack
  - unit tests
  - e2e tests
It would be nice if they can *share the same set of test data*, so we
maintain the consistency between spec and implementation. This is what
this diff is about, isolating test data from test coding, so that we can
reuse the same data at different places by writing different test
coding.
## Test Plan
== Set up Ollama local server

== Run a provider test
```bash
conda activate stack
OLLAMA_URL="http://localhost:8321" \
pytest -v -s -k "ollama" --inference-model="llama3.2:3b-instruct-fp16" \
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output
# test_structured_output should also work
```

== Run an e2e test
```bash
conda activate sherpa
with-proxy pip install llama-stack
export INFERENCE_MODEL=llama3.2:3b-instruct-fp16
export LLAMA_STACK_PORT=8322
with-proxy llama stack build --template ollama
with-proxy llama stack run --env OLLAMA_URL=http://localhost:8321 ollama
```
- Run the test client:
```bash
LLAMA_STACK_PORT=8322 LLAMA_STACK_BASE_URL="http://localhost:8322" \
pytest -v -s --inference-model="llama3.2:3b-instruct-fp16" \
tests/client-sdk/inference/test_text_inference.py::test_text_completion_structured_output
# test_text_chat_completion_structured_output should also work
```
## Notes
- This PR was automatically generated by oss_sync
- Please refer to D69478008 for more details.
# What does this PR do?
- Fully deprecate eval/tasks
[//]: # (If resolving an issue, uncomment and update the line below)
Closes #1088
NOTE: this will be a breaking change. We have introduced the new API in
0.1.3.
Notebook has been updated to use the new endpoints.
## Test Plan
```
pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb
```
<img width="611" alt="image"
src="https://github.com/user-attachments/assets/79f6efe1-81ba-494e-bf36-1fc0c2b9bc6f"
/>
cc @SLR722 for awareness
[//]: # (## Documentation)
# What does this PR do?
You are now able to run a training cycle on CPU. This is useful for
debugging and testing purposes.
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
On a Mac machine without CUDA devices:
```
17:00:24.417 [START] /v1/post-training/supervised-fine-tune
DEBUG 2025-02-18 12:00:24,419 torchtune.utils._logging:60: Setting manual seed to local seed 3268931494. Local seed is seed + rank = 3268931494 + 0
INFO 2025-02-18 12:00:24,463 torchtune.utils._logging:64: Identified model_type = Llama3_2. Ignoring output.weight in checkpoint in favor of the tok_embedding.weight tied weights.
INFO 2025-02-18 12:00:46,699 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:182: Model is initialized with precision torch.bfloat16.
INFO 2025-02-18 12:00:46,784 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:185: Tokenizer is initialized.
INFO 2025-02-18 12:00:46,786 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:188: Optimizer is initialized.
INFO 2025-02-18 12:00:46,786 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:192: Loss is initialized.
INFO 2025-02-18 12:00:48,997 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:209: Dataset and Sampler are initialized.
INFO 2025-02-18 12:00:48,998 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:227: Learning rate scheduler is initialized.
Writing logs to /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/log_1739898049.txt
1|1|Loss: 1.7414989471435547: 100% 1/1 [03:46<00:00, 226.21s/it]INFO 2025-02-18 12:04:35,227 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:528: Starting checkpoint save...
INFO 2025-02-18 12:04:49,974 torchtune.utils._logging:121: Model checkpoint of size 6.43 GB saved to /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/consolidated.00.pth
INFO 2025-02-18 12:04:49,981 torchtune.utils._logging:132: Adapter checkpoint of size 0.00 GB saved to /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/adapter/adapter.pth
model_file_path /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0
1|1|Loss: 1.7414989471435547: 100% 1/1 [04:01<00:00, 241.18s/it]
INFO: ::1:64990 - "POST /v1/post-training/supervised-fine-tune HTTP/1.1" 200 OK
17:04:50.364 [END] /v1/post-training/supervised-fine-tune [StatusCode.OK] (265947.01ms)
17:00:24.419 [DEBUG] Setting manual seed to local seed 3268931494. Local seed is seed + rank = 3268931494 + 0
17:00:24.463 [INFO] Identified model_type = Llama3_2. Ignoring output.weight in checkpoint in favor of the tok_embedding.weight tied weights.
17:00:46.700 [INFO] Model is initialized with precision torch.bfloat16.
17:00:46.784 [INFO] Tokenizer is initialized.
17:00:46.786 [INFO] Optimizer is initialized.
17:00:46.786 [INFO] Loss is initialized.
17:00:48.997 [INFO] Dataset and Sampler are initialized.
17:00:48.998 [INFO] Learning rate scheduler is initialized.
17:04:35.227 [INFO] Starting checkpoint save...
17:04:49.974 [INFO] Model checkpoint of size 6.43 GB saved to /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/consolidated.00.pth
17:04:49.981 [INFO] Adapter checkpoint of size 0.00 GB saved to /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/adapter/adapter.pth
```
[//]: # (## Documentation)
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
- Updated `test_register_with_llama_model` to skip tests when using the
Ollama provider, as it does not support custom model names.
- Deleted `test_initialize_model_during_registering` since there is no
"load_model" semantic that is exposed publicly on a provider.
These changes ensure that tests do not fail for providers with
incompatible behaviors.
Signed-off-by: Sébastien Han <seb@redhat.com>
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
Run Ollama:
```
uv run pytest -v -s -k "ollama" llama_stack/providers/tests/inference/test_model_registration.py
/Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.13/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"
warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
========================================== test session starts ==========================================
platform darwin -- Python 3.13.1, pytest-8.3.4, pluggy-1.5.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.13.1', 'Platform': 'macOS-15.3-arm64-arm-64bit-Mach-O', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, nbval-0.11.0
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None
collected 65 items / 60 deselected / 5 selected
llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_unsupported_model[-ollama] PASSED
llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_nonexistent_model[-ollama] PASSED
llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_with_llama_model[-ollama] SKIPPED
llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_with_invalid_llama_model[-ollama] PASSED
======================== 3 passed, 1 skipped, 60 deselected, 2 warnings in 0.22s ========================
```
[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Create a script for running all client-sdk tests on Async Library
client, with the option to generate report
## Test Plan
```
python llama_stack/scripts/run_client_sdk_tests.py --templates together fireworks --report
```
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
Added support for MongoDB as a KV store.
Validated in MongoDB that it is able to store agent data, session data, and
turn data.
<img width="1332" alt="image"
src="https://github.com/user-attachments/assets/867700a4-b9ee-4a3c-8278-f39074d39d56">
this is how run.yaml would look:
```
config:
persistence_store:
type: mongodb
namespace: null
host: localhost
port: 27017
db: llamastack
user: ""
password: ""
collection_name: llamastack_kvstore
```
---------
Co-authored-by: shrinitgoyal <shrinit.goyal@engati.com>
# What does this PR do?
This fixes the following issue on the server side when the tool call
response contains empty args. This happens when running
`examples.agents.e2e_loop_with_client_tools` but `get_ticker_data`
returns `[]`:
```
Traceback (most recent call last):
File "/home/yutang/repos/llama-stack/llama_stack/distribution/server/server.py", line 208, in sse_generator
async for item in event_gen:
File "/home/yutang/repos/llama-stack/llama_stack/providers/inline/agents/meta_reference/agents.py", line 169, in _create_agent_turn_streaming
async for event in agent.create_and_execute_turn(request):
File "/home/yutang/repos/llama-stack/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 189, in create_and_execute_turn
async for chunk in self.run(
File "/home/yutang/repos/llama-stack/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 258, in run
async for res in self._run(
File "/home/yutang/repos/llama-stack/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 499, in _run
async for chunk in await self.inference_api.chat_completion(
File "/home/yutang/repos/llama-stack/llama_stack/distribution/routers/routers.py", line 182, in <genexpr>
return (chunk async for chunk in await provider.chat_completion(**params))
File "/home/yutang/repos/llama-stack/llama_stack/providers/remote/inference/vllm/vllm.py", line 296, in _stream_chat_completion
async for chunk in res:
File "/home/yutang/repos/llama-stack/llama_stack/providers/remote/inference/vllm/vllm.py", line 162, in _process_vllm_chat_completion_stream_response
arguments=json.loads(tool_call_buf.arguments),
File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```
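A minimal sketch of the kind of guard this adds (the helper name is illustrative, not the exact provider code):
```python
import json


def parse_tool_arguments(raw_arguments: str) -> dict:
    # Treat an empty argument buffer as "no arguments" instead of passing an
    # empty string to json.loads(), which raises JSONDecodeError.
    return json.loads(raw_arguments) if raw_arguments else {}
```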
## Test Plan
All existing tests in
`tests/client-sdk/inference/test_text_inference.py` passed.
[//]: # (## Documentation)
---------
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
Added necessary dependencies to ensure successful execution of unit
tests. Without these, the following command would fail due to missing
imports:
```
uv run pytest -v -k "ollama" \
--inference-model=llama3.2:3b-instruct-fp16
llama_stack/providers/tests/inference/test_model_registration.py
```
Signed-off-by: Sébastien Han <seb@redhat.com>
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
Run:
```
ollama run llama3.2:3b-instruct-fp16 --keepalive 2m &
uv run pytest -v -k "ollama" --inference-model=llama3.2:3b-instruct-fp16 llama_stack/providers/tests/inference/test_model_registration.py
```
You can observe that some tests pass while others fail, but the test suite
itself now runs without import errors.
[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
# What does this PR do?
I was encountering build issues when building my `ollama` environment
using `llama stack build`
```bash
llama stack build --template ollama --image-type venv
Traceback (most recent call last):
File "/Users/farceo/dev/llama-stack/.venv/bin/llama", line 10, in <module>
sys.exit(main())
^^^^^^
File "/Users/farceo/dev/llama-stack/llama_stack/cli/llama.py", line 46, in main
parser.run(args)
File "/Users/farceo/dev/llama-stack/llama_stack/cli/llama.py", line 40, in run
args.func(args)
File "/Users/farceo/dev/llama-stack/llama_stack/cli/stack/build.py", line 77, in _run_stack_build_command
return run_stack_build_command(args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/farceo/dev/llama-stack/llama_stack/cli/stack/_build.py", line 180, in run_stack_build_command
_run_stack_build_command_from_build_config(
File "/Users/farceo/dev/llama-stack/llama_stack/cli/stack/_build.py", line 272, in _run_stack_build_command_from_build_config
return_code = build_image(
^^^^^^^^^^^^
File "/Users/farceo/dev/llama-stack/llama_stack/distribution/build.py", line 137, in build_image
return_code = run_with_pty(args)
^^^^^^^^^^^^^^^^^^
File "/Users/farceo/dev/llama-stack/llama_stack/distribution/utils/exec.py", line 22, in run_with_pty
return _run_with_pty_unix(command)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/farceo/dev/llama-stack/llama_stack/distribution/utils/exec.py", line 53, in _run_with_pty_unix
process = subprocess.Popen(
^^^^^^^^^^^^^^^^^
File "/Users/farceo/.local/share/uv/python/cpython-3.11.6-macos-aarch64-none/lib/python3.11/subprocess.py", line 1026, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/Users/farceo/.local/share/uv/python/cpython-3.11.6-macos-aarch64-none/lib/python3.11/subprocess.py", line 1950, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/Users/farceo/dev/llama-stack/llama_stack/distribution/build_venv.sh'
make: *** [build-ollama] Error 1
```
I also had to adjust the script when testing the `common.sh` file
because it returned:
```bash
> source llama_stack/distribution/common.sh
llama_stack/distribution/common.sh:6: command not found: ^M
llama_stack/distribution/common.sh:50: parse error near `\n'
```
On my branch, I ran:
```bash
sed -i '' 's/\r$//' llama_stack/distribution/common.sh
```
And then I was able to successfully build the environment.
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
N/A
[//]: # (## Documentation)
N/A
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]
```
$ llama model list --help
usage: llama model list [-h] [--show-all] [--downloaded]
Show available llama models
options:
-h, --help show this help message and exit
--show-all Show all models (not just defaults)
--downloaded List the downloaded models
$ llama model list --downloaded
+-------------+----------+---------------------+
| Model | Size | Modified Time |
+-------------+----------+---------------------+
| Llama3.2-1B | 2.31 GB | 2025-02-16 13:38:04 |
+-------------+----------+---------------------+
| Llama3.1-8B | 14.97 GB | 2025-02-16 10:36:37 |
+-------------+----------+---------------------+
```
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
[//]: # (## Documentation)
---------
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
## What does this PR do?
In this PR, we implement a passthrough inference provider that works with
any endpoint that respects the Llama Stack inference API definition.
## Test Plan
Configured an endpoint that respects the Llama Stack inference API
definition and got inference results successfully.
<img width="1268" alt="Screenshot 2025-02-19 at 8 52 51 PM"
src="https://github.com/user-attachments/assets/447816e4-ea7a-4365-b90c-386dc7dcf4a1"
/>
As the title says: let the scoring function llm_as_judge_405b_simpleqa output
aggregated_results.
We can leverage categorical_count to calculate the percentage of correctness
as an eval benchmark metric.
- **refactor: simplify job status extraction a bit**
- **torchtune: save job status on schedule**
- **refactor: get rid of job_list in torchtune job management code**
# What does this PR do?
A failed job is now registered in the API, and one can consult its status.
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
```
$ llama-stack-client post_training status --job-uuid test-jobe244b5b0-5053-4892-a4d9-d8fc8b116e73
JobStatusResponse(checkpoints=[], job_uuid='test-jobe244b5b0-5053-4892-a4d9-d8fc8b116e73', status='failed', completed_at=None, resources_allocated=None, scheduled_at=datetime.datetime(2025, 2, 18, 9, 4, 34, 3252), started_at=datetime.datetime(2025, 2, 18, 9, 4, 34, 10688))
```
[//]: # (## Documentation)
---------
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
1. This PR adds batch inserts into sqlite-vec as requested in
https://github.com/meta-llama/llama-stack/pull/1040
- Note: the inserts use a UUID generated from the hash of the document id
and chunk content (see the sketch after this list).
2. This PR also adds unit tests for sqlite-vec. In a follow up PR, I can
add similar tests to Faiss.
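For illustration, a minimal sketch of that kind of deterministic chunk-id scheme (the hash choice and helper name are assumptions, not the exact implementation):
```python
import hashlib
import uuid


def generate_chunk_id(document_id: str, chunk_text: str) -> str:
    # Derive a deterministic chunk id from the document id and chunk content,
    # so re-inserting the same chunk yields the same id.
    digest = hashlib.md5(f"{document_id}:{chunk_text}".encode()).hexdigest()
    return str(uuid.UUID(digest))
```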
## Test Plan
1. Integration tests:
```
INFERENCE_MODEL=llama3.2:3b-instruct-fp16 LLAMA_STACK_CONFIG=ollama pytest -s -v tests/client-sdk/vector_io/test_vector_io.py
...
PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_retrieve[all-MiniLM-L6-v2-sqlite_vec] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_list PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_register[all-MiniLM-L6-v2-faiss] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_register[all-MiniLM-L6-v2-sqlite_vec] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_unregister[faiss] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_unregister[sqlite_vec] PASSED
```
2. Unit tests:
```
pytest llama_stack/providers/tests/vector_io/test_sqlite_vec.py -v -s --tb=short --disable-warnings --asyncio-mode=auto
...
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_add_chunks PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_query_chunks PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_chunk_id_conflict PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_register_vector_db PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_unregister_vector_db PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_generate_chunk_id PASSED
```
I also tested using the same example RAG script in
https://github.com/meta-llama/llama-stack/pull/1040 and received the expected
output.
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
There should be a choke-point for llama3.api imports -- this is the
prompt adapter. Creating a ChatFormat() object on demand is inexpensive.
The underlying Tokenizer is a singleton anyway.
# What does this PR do?
Before this change, `distro_codegen.py` would only work if the user
manually installed multiple provider-specific dependencies (see #1122).
Now, users can run `distro_codegen.py` without any provider-specific
dependencies because we avoid importing the entire provider
implementations just to get the config needed to build the provider
template.
Concretely, this mostly means moving the
MODEL_ALIASES (and related variants) definitions to a new models.py
module within the provider implementation for those providers that
require additional dependencies. It also meant moving a couple of
imports from top-level imports to inside `get_adapter_impl` for some
providers, which follows the pattern used by multiple existing
providers.
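As a rough sketch of that deferred-import pattern (the module and class names below are hypothetical, not the actual provider code):
```python
async def get_adapter_impl(config, _deps):
    # The heavy, provider-specific SDK is imported inside the factory, so that
    # importing this module (e.g. from distro_codegen.py) pulls in no extra
    # dependencies.
    from heavy_provider_sdk import HeavyClient  # hypothetical dependency

    return HeavyClient(config)
```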
To ensure we don't regress and accidentally add new imports that cause
distro_codegen.py to fail, the stubbed-in pre-commit hook for
distro_codegen.py was uncommented and slightly tweaked to run via `uv
run python ...` to ensure it runs with only the project's default
dependencies and to run automatically instead of manually.
Lastly, this updates distro_codegen.py itself to keep track of paths it
might have changed and to only `git diff` those specific paths when
checking for changed files instead of doing a diff on the entire working
tree. The latter was overly broad and would require a user have no other
unstaged changes in their working tree, even if those unstaged changes
were unrelated to generated code. Now it only flags uncommitted changes
for paths distro_codegen.py actually writes to.
Our generated code was also out-of-date, presumably because of these
issues, so this commit also has some updates to the generated code
purely because it was out of sync, and the pre-commit hook now enforces
things to be updated.
(Closes #1122)
## Test Plan
I manually tested distro_codegen.py and the pre-commit hook to verify
those work as expected, flagging any uncommitted changes and catching any
imports that attempt to pull in provider-specific dependencies.
However, I do not have valid api keys to the impacted provider
implementations, and am unable to easily run the inference tests against
each changed provider. There are no functional changes to the provider
implementations here, but I'd appreciate a second set of eyes on the
changed import statements and moving of MODEL_ALIASES type code to a
separate models.py to ensure I didn't make any obvious errors.
---------
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
# What does this PR do?
- Closes #1142
- Root cause is having `Union[str, AgentToolGroupWithArgs]`
## Test Plan
- Test with script described in issue.
- Print out final converted pydantic object
<img width="1470" alt="image"
src="https://github.com/user-attachments/assets/15dc9cd0-f37a-4b91-905f-3fe4f59a08c6"
/>
[//]: # (## Documentation)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]
Re-checked, and based on the doc, the download model ID is actually the
model descriptor (also without `meta-llama/`).
https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/index.html
```
$ llama download --source huggingface --model-id Llama-Guard-3-1B:int4 --hf-token xxx # model descriptor
Fetching 8 files: 0%| | 0/8 [00:00<?, ?it/s]
LICENSE.txt: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 7.71k/7.71k [00:00<00:00, 10.5MB/s]
$ llama download --source huggingface --model-id Llama-Guard-3-1B-INT4 --hf-token xxxx # hugging face repo without meta-llama/
usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL] [--max-parallel MAX_PARALLEL]
[--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE]
llama download: error: Model Llama-Guard-3-1B-INT4 not found <<<<---
$ llama download --source meta --model-id Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8
usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL] [--max-parallel MAX_PARALLEL]
[--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE]
llama download: error: Model Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8 not found
$ llama download --source meta --model-id Llama3.2-3B-Instruct:int4-spinquant-eo8
Please provide the signed URL for model Llama3.2-3B-Instruct:int4-spinquant-eo8 you received via email after visiting https://www.llama.com/llama-downloads/ (e.g., https://llama3-1.llamameta.net/*?Policy...): ^CTraceback (most recent call last):
$ llama download --source meta --model-id meta-llama/Llama3.2-3B-Instruct:int4-spinquant-eo8
usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL]
[--max-parallel MAX_PARALLEL] [--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE]
llama download: error: Model meta-llama/Llama3.2-3B-Instruct:int4-spinquant-eo8 not found
```
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
[//]: # (## Documentation)
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]
Based on the code:
6b1773d530/llama_stack/cli/download.py (L454)
and the test, commas can be used to specify multiple model IDs, so update
the usage text.
```
$ llama model download --source meta --model-id Llama3.2-1B,Llama3.2-3B
Please provide the signed URL for model Llama3.2-1B you received via email after visiting https://www.llama.com/llama-downloads/ (e.g., https://llama3-1.llamameta.net/*?Policy...):
Downloading checklist.chk ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 156/156 bytes - 0:00:00
Downloading tokenizer.model ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 2.2/2.2 MB - 0:00:00
Downloading params.json ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 220/220 bytes - 0:00:00
Downloading consolidated.00.pth ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 2.5/2.5 GB - 0:00:00
Successfully downloaded model to /Users/xx/.llama/checkpoints/Llama3.2-1B
[Optionally] To run MD5 checksums, use the following command: llama model verify-download --model-id Llama3.2-1B
Please provide the signed URL for model Llama3.2-3B you received via email after visiting https://www.llama.com/llama-downloads/ (e.g., https://llama3-1.llamameta.net/*?Policy...):
Downloading checklist.chk ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 156/156 bytes - 0:00:00
Downloading tokenizer.model ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 2.2/2.2 MB - 0:00:00
Downloading params.json ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 220/220 bytes - 0:00:00
Downloading consolidated.00.pth ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 6.4/6.4 GB - 0:00:00
Successfully downloaded model to /Users/xx/.llama/checkpoints/Llama3.2-3B
$ llama model download --source huggingface --model-id Llama3.2-1B,Llama3.2-3B
original%2Fparams.json: 100%|██████████████████████████████████████████████████████████| 220/220 [00:00<00:00, 564kB/
Successfully downloaded model to /Users/xx/.llama/checkpoints/Llama3.2-1B
...
tokenizer.json: 100%|█████████████████████████████████████████████████████████████| 9.09M/9.09M [00:00<00:00, 9.18MB/s]
Successfully downloaded model to /Users/xxx/.llama/checkpoints/Llama3.2-3B
before:
$ llama model download --help
--model-id MODEL_ID See `llama model list` or `llama model list --show-all` for the list of available models
after:
$ llama model download --help
--model-id MODEL_ID See `llama model list` or `llama model list --show-all` for the list of available models. Specify multiple model IDs with commas, e.g. --model-id Llama3.2-1B,Llama3.2-3B
```
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
[//]: # (## Documentation)
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]
Based on the code
6b1773d530/llama_stack/cli/download.py (L379)
and the test, `verify-download` should only be used for models downloaded from Meta.
```
test: no checklist.chk file for hf download
$ llama model download --source meta --model-id Llama3.2-1B
Downloading checklist.chk ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 156/156 bytes - 0:00:00
Downloading tokenizer.model ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 2.2/2.2 MB - 0:00:00
Downloading params.json ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 220/220 bytes - 0:00:00
Downloading consolidated.00.pth ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 2.5/2.5 GB - 0:00:00
before:
$ llama model verify-download --help
usage: llama model verify-download [-h] --model-id MODEL_ID
Verify the downloaded checkpoints' checksums
options:
-h, --help show this help message and exit
--model-id MODEL_ID Model ID to verify
after:
$ llama model verify-download --help
usage: llama model verify-download [-h] --model-id MODEL_ID
Verify the downloaded checkpoints' checksums for models downloaded from Meta
options:
-h, --help show this help message and exit
--model-id MODEL_ID Model ID to verify (only for models downloaded from Meta)
```
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
[//]: # (## Documentation)
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]
From the code and the usage, there seems to be no need for
`--no-list-templates`, and the help text confuses users, so remove it.
```
$ llama stack build --no-list-templates
> Enter a name for your Llama Stack (e.g. my-local-stack):
$ llama stack build
> Enter a name for your Llama Stack (e.g. my-local-stack):
before:
$ llama stack build --help
--list-templates, --no-list-templates
Show the available templates for building a Llama Stack distribution (default: False)
after:
--list-templates Show the available templates for building a Llama Stack distribution
```
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
[//]: # (## Documentation)
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
This fixes an issue when running the e2e agent example:
https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/e2e_loop_with_client_tools.py
```
| File "/home/yutang/repos/llama-stack/llama_stack/providers/remote/inference/vllm/vllm.py", line 175, in _process_vllm_chat_completion_stream_response
| tool_call = convert_tool_call(choice.delta.tool_calls[0])
| File "/home/yutang/repos/llama-stack/llama_stack/providers/utils/inference/openai_compat.py", line 441, in convert_tool_call
| return ToolCall(
| File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/site-packages/pydantic/main.py", line 214, in __init__
| validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
| pydantic_core._pydantic_core.ValidationError: 4 validation errors for ToolCall
| call_id
| Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
| For further information visit https://errors.pydantic.dev/2.10/v/string_type
| tool_name.enum[BuiltinTool]
| Input should be 'brave_search', 'wolfram_alpha', 'photogen' or 'code_interpreter' [type=enum, input_value=None, input_type=NoneType]
| For further information visit https://errors.pydantic.dev/2.10/v/enum
| tool_name.str
| Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
| For further information visit https://errors.pydantic.dev/2.10/v/string_type
| arguments
| Input should be a valid dictionary [type=dict_type, input_value=202, input_type=int]
| For further information visit https://errors.pydantic.dev/2.10/v/dict_type
```
This issue happened because not all arguments have been appended to the
tool call buffer yet. The current code assumes that we are ready to
convert the tool call whenever args can be converted to JSON
successfully. In this case, `json.loads("202")` would succeed but the
rest of the arguments have not been properly parsed yet.
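A minimal sketch of the buffering idea (illustrative only, not the exact provider code):
```python
import json


def maybe_parse_tool_arguments(buffered_args: str, call_finished: bool):
    # Only parse the accumulated argument buffer once the stream marks the
    # tool call complete; a partial buffer like "202" is valid JSON on its own
    # but is not the full argument object yet.
    if not call_finished:
        return None  # keep accumulating argument deltas
    return json.loads(buffered_args) if buffered_args else {}
```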
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
The e2e example worked successfully (although note that I ran the script
twice with each function call separately due to
https://github.com/meta-llama/llama-stack/issues/1120):
```
tool_execution> Tool:get_ticker_data Args:{'ticker_symbol': 'GOOG', 'start': '2023-01-01', 'end': '2023-12-31'}
tool_execution> Tool:get_ticker_data Response:"[{\"('Year', '')\":2023,\"('Close', 'GOOG')\":140.4254455566}]"
tool_execution> Tool:web_search Args:{'query': '42nd president of the United States'}
tool_execution> Tool:web_search Response:"{\"query\": \"42nd president of the United States\", \"top_k\": [{\"title\": \"William J. Clinton | whitehouse.gov\", \"url\": \"https://obamawhitehouse.archives.gov/1600/presidents/williamjclinton\", \"description\": \"<strong>Bill Clinton</strong> is an American politician from Arkansas who served as the 42nd President of the United States (1993-2001). He took office at the end of the Cold War, and was the first baby-boomer generation President.\", \"type\": \"search_result\"}, {\"title\": \"Bill Clinton - Wikipedia\", \"url\": \"https://en.wikipedia.org/wiki/Bill_Clinton\", \"description\": \"<strong>William Jefferson Clinton</strong> (n\\u00e9 Blythe; born August 19, 1946) is an American politician and lawyer who served as the 42nd president of the United States from 1993 to 2001. A member of the Democratic Party, he previously served as the attorney general of Arkansas from 1977 to 1979 and as the ...\", \"type\": \"search_result\"}, [{\"type\": \"video_result\", \"url\": \"https://www.youtube.com/watch?v=eR2z_1-v87Y\", \"title\": \"A Conversation with Bill Clinton, 42nd President of the United ...\", \"description\": \"William Jefferson Clinton, the first Democratic president in six decades to be elected twice, led the United States to the longest economic expansion in Amer...\"}, {\"type\": \"video_result\", \"url\": \"4484174096/\", \"title\": \"January 20, 1993, President Clinton was sworn in as the 42nd ...\", \"description\": \"WATCH: On January 20, 1993, President Bill Clinton was sworn in as the 42nd President of the United States. #InaugurationDay Video courtesy of the...\"}, {\"type\": \"video_result\", \"url\": \"https://www.youtube.com/watch?v=vI0HGQqEJh0\", \"title\": \"42nd President of the United States, Bill Clinton, shared thoughts ...\", \"description\": \"AboutPressCopyrightContact usCreatorsAdvertiseDevelopersTermsPrivacyPolicy & SafetyHow YouTube worksTest new features \\u00b7 \\u00a9 2024 Google LLC\"}, {\"type\": \"video_result\", \"url\": \"https://www.youtube.com/shorts/vI0HGQqEJh0\", \"title\": \"42nd President of the United States, Bill Clinton, shared ...\", \"description\": \"Enjoy the videos and music you love, upload original content, and share it all with friends, family, and the world on YouTube.\"}, {\"type\": \"video_result\", \"url\": \"https://www.youtube.com/watch?v=PHihhihVth0\", \"title\": \"Bill & Hillary Clinton returning to Little Rock for 20th ...\", \"description\": \"Enjoy the videos and music you love, upload original content, and share it all with friends, family, and the world on YouTube.\"}]]}"
```
All text inference tests passed.
[//]: # (## Documentation)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>