llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-08-13 05:17:26 +00:00

Author	SHA1	Message	Date
Xi Yan	22355e3b1f	add back 2/n	2025-02-20 17:53:29 -08:00
Xi Yan	157cf320d9	add back 2/n	2025-02-20 17:52:01 -08:00
Xi Yan	ee3c174bb3	add back 2/n	2025-02-20 17:40:39 -08:00
Xi Yan	cd36a77e20	3/n	2025-02-20 17:38:21 -08:00
Xi Yan	01f90dfe0c	Merge branch 'agents-unify-tools-2' into agents-unify-tools-3	2025-02-20 17:27:27 -08:00
Xi Yan	7677f01beb	Merge branch 'agents-unify-tools' into agents-unify-tools-2	2025-02-20 17:27:11 -08:00
Xi Yan	c7e84253e7	Merge branch 'agents-unify-tools' into agents-unify-tools-3	2025-02-20 17:26:58 -08:00
Xi Yan	8fe38d128d	streaming flag	2025-02-20 16:58:45 -08:00
Xi Yan	5fbb159cf6	fix test	2025-02-20 16:48:17 -08:00
Xi Yan	96c521ada6	temp debug	2025-02-20 16:30:27 -08:00
Xi Yan	fb0d992f99	temp debuug	2025-02-20 15:48:55 -08:00
Xi Yan	4dbe3fd9e6	Merge branch 'agents-unify-tools' into agents-unify-tools-2	2025-02-20 15:29:11 -08:00
Xi Yan	5644d10c82	remove usermessages	2025-02-20 15:26:37 -08:00
Xi Yan	beea9ac133	Merge branch 'agents-unify-tools' into agents-unify-tools-2	2025-02-20 15:07:50 -08:00
Xi Yan	afee71604f	api	2025-02-20 15:07:18 -08:00
Xi Yan	07c9222b6f	debug	2025-02-20 14:54:45 -08:00
Xi Yan	5eea2bc44d	Merge branch 'agents-unify-tools' into agents-unify-tools-2	2025-02-20 14:41:47 -08:00
Xi Yan	57ca2c6365	Merge branch 'main' into agents-unify-tools	2025-02-20 14:41:29 -08:00
Ashwin Bharambe	736560ceba	Remove os.getenv() from ollama config	2025-02-20 14:30:32 -08:00
Xi Yan	7b0ff5718e	Merge branch 'agents-unify-tools' into agents-unify-tools-2	2025-02-20 14:19:52 -08:00
Xi Yan	9c9a607b41	merge	2025-02-20 14:17:31 -08:00
LESSuseLESS	2cbe9395b0	feat: D69478008 [llama-stack] turning tests into data-driven (#1180 ) # What does this PR do? We have several places running tests for different purposes. - oss llama stack - provider tests - e2e tests - provider llama stack - unit tests - e2e tests It would be nice if they can share the same set of test data, so we maintain the consistency between spec and implementation. This is what this diff is about, isolating test data from test coding, so that we can reuse the same data at different places by writing different test coding. ## Test Plan == Set up Ollama local server == Run a provider test conda activate stack OLLAMA_URL="http://localhost:8321" \ pytest -v -s -k "ollama" --inference-model="llama3.2:3b-instruct-fp16" \ llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output // test_structured_output should also work == Run an e2e test conda activate sherpa with-proxy pip install llama-stack export INFERENCE_MODEL=llama3.2:3b-instruct-fp16 export LLAMA_STACK_PORT=8322 with-proxy llama stack build --template ollama with-proxy llama stack run --env OLLAMA_URL=http://localhost:8321 ollama - Run test client, LLAMA_STACK_PORT=8322 LLAMA_STACK_BASE_URL="http://localhost:8322" \ pytest -v -s --inference-model="llama3.2:3b-instruct-fp16" \ tests/client-sdk/inference/test_text_inference.py::test_text_completion_structured_output // test_text_chat_completion_structured_output should also work ## Notes - This PR was automatically generated by oss_sync - Please refer to D69478008 for more details.	2025-02-20 14:13:06 -08:00
ehhuang	1166afdf76	fix: some telemetry APIs don't currently work (#1188) Summary: This bug surfaces when using the HTTP LS client. Non-scalar values in `GET` methods are treated as `body` params by FastAPI, but our spec generation script doesn't respect that. We fix this by making those endpoints use the POST method instead. Test Plan: Test API call with newly sync'd client (https://github.com/meta-llama/llama-stack-client-python/pull/149)	2025-02-20 14:09:25 -08:00
Xi Yan	ea1faae50e	chore!: deprecate eval/tasks (#1186) # What does this PR do? - Fully deprecate eval/tasks Closes #1088 NOTE: this is a breaking change. The new API was introduced in 0.1.3. The notebook has been updated to use the new endpoints. ## Test Plan ``` pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb ``` cc @SLR722 for awareness	2025-02-20 14:06:21 -08:00
Xi Yan	7676756778	merge	2025-02-20 14:04:25 -08:00
Ashwin Bharambe	07ccf908f7	ModelAlias -> ProviderModelEntry	2025-02-20 14:02:36 -08:00
Xi Yan	a44d230676	rename	2025-02-20 14:02:17 -08:00
Xi Yan	ff87677102	rename	2025-02-20 14:00:08 -08:00
Xi Yan	82109749ea	rename	2025-02-20 13:56:47 -08:00
Kevin Cogan	561295af76	docs: Fix Links, Add Podman Instructions, Vector DB Unregister, and Example Script (#1129) # What does this PR do? This PR improves the documentation in several ways: - Fixed an incorrect link in `tools.md` to ensure all references point to the correct resources. - Added instructions for running the `code-interpreter` agent in a Podman container, helping users configure and execute the tool in containerized environments. - Introduced an unregister command for single and multiple vector databases, making it easier to manage vector DBs. - Provided a simple example script for using the `code-interpreter` agent, giving users a practical reference for implementation. These updates enhance the clarity, usability, and completeness of the documentation. ## Test Plan The following steps were performed to verify the accuracy of the changes: 1. Validated all fixed links by checking their destinations to ensure correctness. 2. Ran the `code-interpreter` agent in a Podman container following the new instructions to confirm functionality. 3. Executed the vector database unregister commands and verified that both single and multiple databases were correctly removed. 4. Tested the new example script for `code-interpreter`, ensuring it runs without errors. All changes were reviewed and tested successfully, improving the documentation's accuracy and ease of use.	2025-02-20 13:52:14 -08:00
Xi Yan	7a111e39f6	Merge branch 'main' into agents-unify-tools	2025-02-20 13:51:32 -08:00
Vladimir Ivić	f7161611c6	feat: adding endpoints for files and uploads (#1070) Summary: Adds spec definitions for file upload operations. This API centers on two high-level operations: * Initiating and managing an upload session * Accessing uploaded file information Usage examples: To start a file upload session: ``` curl -X POST https://localhost:8321/v1/files \ -d '{ "key": "image123.jpg", "bucket": "images", "mime_type": "image/jpg", "size": 12345 }' # Returns { "id": <session_id>, "url": "https://localhost:8321/v1/files/session:<session_id>", "offset": 0, "size": 12345 } ``` To upload file content to an existing session: ``` curl -i -X POST "https://localhost:8321/v1/files/session:<session_id>" \ --data-binary @<path_to_local_file> # Returns { "key": "image123.jpg", "bucket": "images", "mime_type": "image/jpg", "bytes": 12345, "created_at": 1737492240 } # Implementing on server side (Flask example for simplicity): @app.route('/uploads/<upload_id>', methods=['POST']) def upload_content_to_session(upload_id): try: # Get the binary file data from the request body file_data = request.data # Save the file to disk save_path = f"./uploads/{upload_id}" with open(save_path, 'wb') as f: f.write(file_data) return {__uploaded_file_json__}, 200 except Exception as e: return str(e), 500 ``` To read information about an existing upload session: ``` curl -i -X GET "https://localhost:8321/v1/files/session:<session_id>" # Returns { "id": <session_id>, "url": "https://localhost:8321/v1/files/session:<session_id>", "offset": 1024, "size": 12345 } ``` To list buckets: ``` GET /files # Returns { "data": [ {"name": "bucket1"}, {"name": "bucket2"} ] } ``` To list all files in a bucket: ``` GET /files/{bucket} # Returns { "data": [ { "key": "shiba.jpg", "bucket": "dogs", "mime_type": "image/jpg", "bytes": 82334, "created_at": 1737492240 }, { "key": "persian_cat.jpg", "mime_type": "image/jpg", "bucket": "cats", "bytes": 39924, "created_at": 1727493440 } ] } ``` To get specific file info: ``` GET /files/{bucket}/{key} { "key": "shiba.jpg", "bucket": "dogs", "mime_type": "image/jpg", "bytes": 82334, "created_at": 1737492240 } ``` To delete a specific file: ``` DELETE /files/{bucket}/{key} { "key": "shiba.jpg", "bucket": "dogs", "mime_type": "image/jpg", "bytes": 82334, "created_at": 1737492240 } ```	2025-02-20 13:09:00 -08:00
Xi Yan	7dae81cb68	tmp	2025-02-20 12:57:18 -08:00
Ashwin Bharambe	eddef0b2ae	chore: slight renaming of model alias stuff (#1181 ) Quick test by running: ``` LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk ```	2025-02-20 11:48:46 -08:00
Ashwin Bharambe	2eda050aef	Fix ollama fixture	2025-02-20 11:46:02 -08:00
Ashwin Bharambe	3d891fc9ba	ModelAlias cleanup	2025-02-20 11:44:39 -08:00
Xi Yan	6b6feebc72	update types	2025-02-20 10:53:36 -08:00
Xi Yan	dc1406c25a	dummy impl	2025-02-20 10:51:08 -08:00
Xi Yan	e0fd19531b	openapi gen	2025-02-20 10:41:08 -08:00
Xi Yan	7689ff2b54	naming submit_tool_response_messages	2025-02-20 10:36:14 -08:00
Xi Yan	5d97ee645c	api update	2025-02-20 09:52:37 -08:00
Ben Browning	fbec826883	docs: Add note about distro_codegen.py and provider dependencies (#1175) # What does this PR do? This expands upon the existing distro_codegen.py text in the new API provider documentation to include a note about not including provider-specific dependencies in the code path that builds the distribution's template. Our distro_codegen pre-commit hook will catch this case anyway, but this attempts to inform provider authors ahead of time about that. ## Test Plan I built the docs website locally via the following: ``` pip install -r docs/requirements.txt sphinx-build -M html docs/source docs_output ``` Then, I opened the newly generated `docs_output/html/contributing/new_api_provider.html` in my browser and confirmed everything rendered correctly. Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-02-20 09:23:46 -08:00
Ashwin Bharambe	984a8039ad	Kill unnecessary check on --safety-shield test param	2025-02-20 09:15:23 -08:00
Rashmi Pawar	996f27a308	fix: add logging import (#1174 ) # What does this PR do? Fixes logging import and the logger instance creation cc: @dglogo	2025-02-20 11:26:47 -05:00
Ihar Hrachyshka	fb6a3efb1d	feat: Enable CPU training for torchtune (#1140 ) # What does this PR do? You are now able to run a training cycle on CPU. This is useful for debugging and testing purposes. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan On a Mac machine without CUDA devices: ``` 17:00:24.417 [START] /v1/post-training/supervised-fine-tune DEBUG 2025-02-18 12:00:24,419 torchtune.utils._logging:60: Setting manual seed to local seed 3268931494. Local seed is seed + rank = 3268931494 + 0 INFO 2025-02-18 12:00:24,463 torchtune.utils._logging:64: Identified model_type = Llama3_2. Ignoring output.weight in checkpoint in favor of the tok_embedding.weight tied weights. INFO 2025-02-18 12:00:46,699 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:182: Model is initialized with precision torch.bfloat16. INFO 2025-02-18 12:00:46,784 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:185: Tokenizer is initialized. INFO 2025-02-18 12:00:46,786 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:188: Optimizer is initialized. INFO 2025-02-18 12:00:46,786 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:192: Loss is initialized. INFO 2025-02-18 12:00:48,997 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:209: Dataset and Sampler are initialized. INFO 2025-02-18 12:00:48,998 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:227: Learning rate scheduler is initialized. 
Writing logs to /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/log_1739898049.txt 1\|1\|Loss: 1.7414989471435547: 100% 1/1 [03:46<00:00, 226.21s/it]INFO 2025-02-18 12:04:35,227 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:528: Starting checkpoint save... INFO 2025-02-18 12:04:49,974 torchtune.utils._logging:121: Model checkpoint of size 6.43 GB saved to /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/consolidated.00.pth INFO 2025-02-18 12:04:49,981 torchtune.utils._logging:132: Adapter checkpoint of size 0.00 GB saved to /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/adapter/adapter.pth model_file_path /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0 1\|1\|Loss: 1.7414989471435547: 100% 1/1 [04:01<00:00, 241.18s/it] INFO: ::1:64990 - "POST /v1/post-training/supervised-fine-tune HTTP/1.1" 200 OK 17:04:50.364 [END] /v1/post-training/supervised-fine-tune [StatusCode.OK] (265947.01ms) 17:00:24.419 [DEBUG] Setting manual seed to local seed 3268931494. Local seed is seed + rank = 3268931494 + 0 17:00:24.463 [INFO] Identified model_type = Llama3_2. Ignoring output.weight in checkpoint in favor of the tok_embedding.weight tied weights. 17:00:46.700 [INFO] Model is initialized with precision torch.bfloat16. 17:00:46.784 [INFO] Tokenizer is initialized. 17:00:46.786 [INFO] Optimizer is initialized. 17:00:46.786 [INFO] Loss is initialized. 17:00:48.997 [INFO] Dataset and Sampler are initialized. 17:00:48.998 [INFO] Learning rate scheduler is initialized. 17:04:35.227 [INFO] Starting checkpoint save... 
17:04:49.974 [INFO] Model checkpoint of size 6.43 GB saved to /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/consolidated.00.pth 17:04:49.981 [INFO] Adapter checkpoint of size 0.00 GB saved to /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/adapter/adapter.pth ``` [//]: # (## Documentation) Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-02-19 22:42:58 -08:00
Xi Yan	a324ceb9a9	precommit again	2025-02-19 22:40:45 -08:00
Sébastien Han	4694780d23	test: skip model registration for unsupported providers (#1030 ) # What does this PR do? - Updated `test_register_with_llama_model` to skip tests when using the Ollama provider, as it does not support custom model names. - Delete `test_initialize_model_during_registering` since there is no "load_model" semantic that is exposed publicly on a provider. These changes ensure that tests do not fail for providers with incompatible behaviors. Signed-off-by: Sébastien Han <seb@redhat.com> [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan Run Ollama: ``` uv run pytest -v -s -k "ollama" llama_stack/providers/tests/inference/test_model_registration.py /Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.13/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset. The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. 
Valid fixture loop scopes are: "function", "class", "module", "package", "session" warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET)) ========================================== test session starts ========================================== platform darwin -- Python 3.13.1, pytest-8.3.4, pluggy-1.5.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3 cachedir: .pytest_cache metadata: {'Python': '3.13.1', 'Platform': 'macOS-15.3-arm64-arm-64bit-Mach-O', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}} rootdir: /Users/leseb/Documents/AI/llama-stack configfile: pyproject.toml plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, nbval-0.11.0 asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None collected 65 items / 60 deselected / 5 selected llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_unsupported_model[-ollama] PASSED llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_nonexistent_model[-ollama] PASSED llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_with_llama_model[-ollama] SKIPPED llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_with_invalid_llama_model[-ollama] PASSED ======================== 3 passed, 1 skipped, 60 deselected, 2 warnings in 0.22s ======================== ``` [//]: # (## Documentation) [//]: # (- [ ] Added a Changelog entry if the change is significant) Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-19 22:39:13 -08:00
Sixian Yi	531940aea9	script for running client sdk tests (#895 ) # What does this PR do? Create a script for running all client-sdk tests on Async Library client, with the option to generate report ## Test Plan ``` python llama_stack/scripts/run_client_sdk_tests.py --templates together fireworks --report ``` ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-02-19 22:38:06 -08:00
Xi Yan	a3d8c49459	precommit	2025-02-19 22:37:41 -08:00
Xi Yan	ce040ad111	precommit	2025-02-19 22:35:24 -08:00
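
The upload-session flow described in commit f7161611c6 above (start a session with a declared `size`, send content, then read back `offset` and `size`) can be sketched in memory as follows. This is only an illustration under stated assumptions: the `UploadSession` class, its `append` and `info` methods, and the overflow check are hypothetical names and behavior chosen for the example, not part of the llama-stack spec or any provider implementation.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class UploadSession:
    """Hypothetical in-memory model of one upload session.

    Field names mirror the JSON keys shown in the commit message
    (key, bucket, mime_type, size, offset); everything else is assumed.
    """

    key: str
    bucket: str
    mime_type: str
    size: int  # total number of bytes the client declared up front
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    offset: int = 0  # number of bytes received so far
    chunks: list = field(default_factory=list)

    def info(self) -> dict:
        # Shape of the "read session info" response, minus the URL.
        return {"id": self.id, "offset": self.offset, "size": self.size}

    def append(self, data: bytes) -> dict:
        # Assumed policy: reject content that would exceed the declared size.
        if self.offset + len(data) > self.size:
            raise ValueError("upload exceeds declared size")
        self.chunks.append(data)
        self.offset += len(data)
        return self.info()

    @property
    def complete(self) -> bool:
        return self.offset == self.size


if __name__ == "__main__":
    session = UploadSession(
        key="image123.jpg", bucket="images", mime_type="image/jpg", size=12345
    )
    print(session.append(b"\x00" * 1024))  # offset advances to 1024
    print(session.complete)  # False until all 12345 bytes have arrived
```

Tracking `offset` separately from `size` is what would let a client resume an interrupted upload: it can ask the session how many bytes have arrived and continue from there.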


1235 commits