llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-18 02:59:48 +00:00

Author	SHA1	Message	Date
Botao Chen	cd1fc4fd17	refine	2024-12-18 16:08:17 -08:00
Botao Chen	7b0deee899	refine	2024-12-18 16:05:35 -08:00
Botao Chen	92a367340c	refine	2024-12-18 15:59:45 -08:00
Botao Chen	7ab807ad76	refine	2024-12-18 15:58:51 -08:00
Botao Chen	9e5b7d5c9e	address comment	2024-12-18 14:32:23 -08:00
Botao Chen	75c881770a	Merge branch 'main' into inference_refactor	2024-12-18 14:14:14 -08:00
Botao Chen	0000e1e8c6	address comments	2024-12-18 14:12:57 -08:00
Ashwin Bharambe	3b4b2ea30c	fix replace_env_vars bug	2024-12-18 13:48:30 -08:00
Ashwin Bharambe	12cbed1617	Register Message and ResponseFormat	2024-12-18 10:32:25 -08:00
Ashwin Bharambe	ceadaf1840	Don't include 3B / 1B models for bedrock since they aren't on-demand	2024-12-18 06:30:02 -08:00
Ashwin Bharambe	c39a3777b5	Make bedrock "just" work	2024-12-18 06:22:33 -08:00
Ashwin Bharambe	d6fcdefec7	Bump version to 0.0.63	2024-12-17 23:15:27 -08:00
Ashwin Bharambe	f1d6cb22d7	Update URL type to avoid string-ifying and creating complexity	2024-12-17 22:50:11 -08:00
Botao Chen	d021983b0e	refine	2024-12-17 20:43:20 -08:00
Botao Chen	fadb7deae5	Merge branch 'main' into inference_refactor	2024-12-17 20:10:23 -08:00
Xi Yan	75e72cf2fc	model_type=llm for filtering available models for playground	2024-12-17 19:42:38 -08:00
Ashwin Bharambe	2f9fdb0ea7	Update notebook	2024-12-17 18:52:02 -08:00
Ashwin Bharambe	0fb4b7de6f	Add more debugging logs to when llama guard fails	2024-12-17 18:52:02 -08:00
Ashwin Bharambe	eea478618d	Bump version to 0.0.62	2024-12-17 18:19:47 -08:00
Xi Yan	af8f1b3531	model selection playground fix	2024-12-17 18:13:52 -08:00
Dinesh Yeduguru	3700022d6f	store attributes values in builtin types to avoid otel warnings (#649) # What does this PR do? Serialize objects to built in types to avoid otel warnings ## Test Plan llama stack run ~/.llama/distributions/llamastack-together/together-run.yaml	2024-12-17 17:10:43 -08:00
Henry Tu	0e2a99e223	Update Cerebras from Llama 3.1 to 3.3 (#645) # What does this PR do? Cerebras is rolling out support for llama 3.3 70b and deprecating llama 3.1 70b. This PR updates the documentation, config, and internal mapping to reflect this change. cc: @ashwinb @raghotham	2024-12-17 16:28:24 -08:00
Botao Chen	85d0f5f528	modify doc	2024-12-17 14:09:32 -08:00
Ashwin Bharambe	b7a7caa9a8	Fix conversion to RawMessage everywhere	2024-12-17 14:00:43 -08:00
Botao Chen	486c0bc9c8	refine	2024-12-17 13:41:36 -08:00
Botao Chen	48482ff9c3	refine	2024-12-17 13:38:19 -08:00
Ashwin Bharambe	fbca51d6da	Fix to conda env build script	2024-12-17 12:19:34 -08:00
Ashwin Bharambe	0452c6a0c7	add missing init file	2024-12-17 11:49:03 -08:00
Ashwin Bharambe	8de8eb03c8	Update the "InterleavedTextMedia" type (#635) ## What does this PR do? This is a long-pending change and particularly important to get done now. Specifically: - we cannot "localize" (aka download) any URLs from media attachments anywhere near our modeling code. it must be done within llama-stack. - `PIL.Image` is infesting all our APIs via `ImageMedia -> InterleavedTextMedia` and that cannot be right at all. Anything in the API surface must be "naturally serializable". We need a standard `{ type: "image", image_url: "<...>" }` which is more extensible - `UserMessage`, `SystemMessage`, etc. are moved completely to llama-stack from the llama-models repository. See https://github.com/meta-llama/llama-models/pull/244 for the corresponding PR in llama-models. ## Test Plan ```bash cd llama_stack/providers/tests pytest -s -v -k "fireworks or ollama or together" inference/test_vision_inference.py pytest -s -v -k "(fireworks or ollama or together) and llama_3b" inference/test_text_inference.py pytest -s -v -k chroma memory/test_memory.py \ --env EMBEDDING_DIMENSION=384 --env CHROMA_DB_PATH=/tmp/foobar pytest -s -v -k fireworks agents/test_agents.py \ --safety-shield=meta-llama/Llama-Guard-3-8B \ --inference-model=meta-llama/Llama-3.1-8B-Instruct ``` Updated the client sdk (see PR ...), installed the SDK in the same environment and then ran the SDK tests: ```bash cd tests/client-sdk LLAMA_STACK_CONFIG=together pytest -s -v agents/test_agents.py LLAMA_STACK_CONFIG=ollama pytest -s -v memory/test_memory.py # this one needed a bit of hacking in the run.yaml to ensure I could register the vision model correctly INFERENCE_MODEL=llama3.2-vision:latest LLAMA_STACK_CONFIG=ollama pytest -s -v inference/test_inference.py ```	2024-12-17 11:18:31 -08:00
Arun Brahma	10eb31badf	docs: Update getting_started.ipynb link to correct jupyter notebook path in README.md (#636) # What does this PR do? This PR fixes a broken link in the README.md that was causing a 404 error. The link to `getting_started.ipynb` was pointing to a non-existent file. Updated it to point to the correct notebook `Llama_Stack_Building_AI_Applications.ipynb` which contains the walk-through for text and vision inference llama_stack_client APIs. - [x] Addresses issue (#633) ## Test Plan 1. Verified that the new notebook path exists: ```bash ls docs/notebooks/Llama_Stack_Building_AI_Applications.ipynb ``` 2. Verified the notebook content contains text and vision inference examples by: - Checking the notebook contents - Confirming the presence of vision models like Llama-3.2-11B-Vision-Instruct - Verifying llama_stack_client API usage examples ## Before submitting - [x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section. - [x] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests (N/A - documentation change only).	2024-12-17 11:11:13 -08:00
Xi Yan	99f331f5c8	[bugfix] no shield_call when there's no shields configured (#642) # What does this PR do? Why - When AgentConfig has no `input_shields` / `output_shields` defined, we still output a shield_call step with violation=None. This makes it impossible to distinguish between (1) no violation from running shields and (2) no shield call. What - We should not have a shield_call step when no `input_shields` / `output_shields` are defined. - Also removes a never reached try/catch code block in agent loop. `run_multiple_shields` is never called in the try block (verified by stacktrace print) Side Note - pre-commit fix ## Test Plan Tested w/ DirectClient via: https://gist.github.com/yanxi0830/b48f2a53b6f5391b9ff1e39992bc05b3 No Shields <img width="858" alt="image" src="https://github.com/user-attachments/assets/67319370-329f-4954-bd16-d21ce54c6ebf" /> With Input + Output Shields <img width="854" alt="image" src="https://github.com/user-attachments/assets/75ab1bee-3ba9-4549-ab51-23210be83da7" /> Input Shields Only <img width="858" alt="image" src="https://github.com/user-attachments/assets/1897206b-13dd-4ea5-92c2-b39bf68e9286" /> E2E pytest ``` LLAMA_STACK_BASE_URL=http://localhost:5000 pytest -v ./tests/client-sdk/agents/test_agents.py ```	2024-12-17 11:10:19 -08:00
Botao Chen	415b8f2dbd	temp commit	2024-12-16 22:39:08 -08:00
Botao Chen	81e1957446	temp commit	2024-12-16 21:43:30 -08:00
Botao Chen	30f6eb282f	temp commit	2024-12-16 19:04:47 -08:00
Botao Chen	b2dbb5e3fe	merge	2024-12-16 16:49:02 -08:00
Botao Chen	6a51e2268d	Merge branch 'main' into inference_refactor	2024-12-16 16:47:57 -08:00
Botao Chen	35b1a6f2dc	temp commit	2024-12-16 16:44:15 -08:00
Ashwin Bharambe	c2f7905fa4	Fix bedrock inference impl	2024-12-16 14:22:34 -08:00
Ashwin Bharambe	eb37fba9da	Small fix to library client	2024-12-16 14:08:30 -08:00
Ashwin Bharambe	5e08812bcb	Add Dinesh to be a code owner	2024-12-16 13:00:50 -08:00
Ashwin Bharambe	2e5bfcd42a	Update Telemetry API so OpenAPI generation can work (#640) We cannot use recursive types because not only does our OpenAPI generator not like them, even if it did, it is not easy for all client languages to automatically construct proper APIs (especially considering garbage collection) around them. For now, we can return a `Dict[str, SpanWithStatus]` instead of `SpanWithChildren` and rely on the client to reconstruct the tree. Also fixed a super subtle issue with the OpenAPI generation process (monkey-patching of json_schema_type wasn't working because of import reordering.)	2024-12-16 13:00:14 -08:00
Xi Yan	78e2bfbe7a	[tests] add client-sdk pytests & delete client.py (#638) # What does this PR do? Why - Clean up examples which we will not maintain; reduce the surface area to the minimal showcases What - Delete `client.py` in /apis/* - Move all scripts to unit tests - SDK sync in the future will just require running pytests Side notes - `bwrap` not available on Mac so code_interpreter will not work ## Test Plan ``` LLAMA_STACK_BASE_URL=http://localhost:5000 pytest -v ./tests/client-sdk ``` <img width="725" alt="image" src="https://github.com/user-attachments/assets/36bfe537-628d-43c3-8479-dcfcfe2e4035" />	2024-12-16 12:04:56 -08:00
Aidan Do	cb8a28c128	Doc: Ollama command references non-existent file (#632) # What does this PR do? Fixes: <img width="719" alt="Screenshot 2024-12-15 at 22 04 37" src="https://github.com/user-attachments/assets/1555308a-31fb-41ba-95b7-d47d75504b58" /> ## Before submitting - [x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).	2024-12-15 06:52:28 -08:00
Xi Yan	815f4af6cf	add colab notebook & update docs (#619) # What does this PR do? - add notebooks - restructure docs ## Test Plan <img width="1201" alt="image" src="https://github.com/user-attachments/assets/3f9a09d9-b5ec-406c-b44b-e896e340d209" /> <img width="1202" alt="image" src="https://github.com/user-attachments/assets/fdc1173f-2417-4ad6-845e-4f265fc40a31" /> <img width="1201" alt="image" src="https://github.com/user-attachments/assets/b1e4e2a8-acf6-4ef2-a2fc-00d26cf32359" />	2024-12-13 19:15:15 -08:00
Botao Chen	20383bfea5	[3/n][torchtune integration] add validation logic (#600) ## What does this PR do? - add validation logic in SFT recipe (validation loss and perplexity) - add progress bar in both training and validation to better track the progress on server side (eval has the similar logic) ## Test Plan validation logic shows up in the Checkpoint training_metric part <img width="799" alt="Screenshot 2024-12-12 at 3 21 52 PM" src="https://github.com/user-attachments/assets/36330ffe-0555-4b2d-93f0-9487dfdf7b4e" /> progress bar shows up as <img width="476" alt="Screenshot 2024-12-12 at 3 38 11 PM" src="https://github.com/user-attachments/assets/77306fa2-cb9c-460f-8efc-b41bbe424a7d" /> expected	2024-12-13 16:35:06 -08:00
Botao Chen	c294a01c4b	[2/n][torchtune integration] implement job management and return training artifacts (#593) ### Context In this PR, we - Implement the post training job management and get training artifacts apis - get_training_jobs - get_training_job_status - get_training_job_artifacts - get_training_job_logstream is deleted since the trace can be directly accessed by UI with Jaeger https://llama-stack.readthedocs.io/en/latest/building_applications/telemetry.html#jaeger-to-visualize-traces - Refactor the post training and training types definition to make them more intuitive. - Rewrite the checkpointer to make it compatible with llama-stack file system and can be recognized during inference ### Test Unit test `pytest llama_stack/providers/tests/post_training/test_post_training.py -m "torchtune_post_training_huggingface_datasetio" -v -s --tb=short --disable-warnings` <img width="1506" alt="Screenshot 2024-12-10 at 4 06 17 PM" src="https://github.com/user-attachments/assets/16225029-bdb7-48c4-9d13-e580cc769c0a"> e2e test with client side call <img width="888" alt="Screenshot 2024-12-10 at 4 09 44 PM" src="https://github.com/user-attachments/assets/de375e4c-ef67-4dcc-a045-4037d9489191">	2024-12-13 15:00:04 -08:00
Yuan Tang	5764a95912	Add missing environments field for vLLM provider (#623) @ashwinb sorry I missed this earlier in https://github.com/meta-llama/llama-stack/pull/604. Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2024-12-13 14:06:27 -08:00
Dinesh Yeduguru	516e1a3e59	add embedding model by default to distribution templates (#617) # What does this PR do? Adds the sentence transformer provider and the `all-MiniLM-L6-v2` embedding model to the default models to register in the run.yaml for all providers. ## Test Plan llama stack build --template together --image-type conda llama stack run ~/.llama/distributions/llamastack-together/together-run.yaml	2024-12-13 12:48:00 -08:00
Ashwin Bharambe	e893b22868	export LibraryClient	2024-12-13 12:08:00 -08:00
Yuan Tang	6de92a6c33	Reformat distributions table (#608) This ensures everything is centered correctly and nicely formatted in editor. Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2024-12-13 11:45:17 -08:00
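
Commit 2e5bfcd42a in the list above notes that the Telemetry API returns a flat `Dict[str, SpanWithStatus]` and relies on the client to reconstruct the tree. A minimal sketch of what that client-side reconstruction might look like, assuming each span records its own id and an optional parent id. The `Span` class and its `span_id` / `parent_span_id` / `name` fields are hypothetical stand-ins for illustration, not the actual `SpanWithStatus` schema:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    # Hypothetical stand-in for SpanWithStatus; field names are assumptions.
    span_id: str
    name: str
    parent_span_id: Optional[str] = None


@dataclass
class SpanNode:
    # Client-side tree node wrapping a flat span.
    span: Span
    children: List["SpanNode"] = field(default_factory=list)


def build_span_tree(spans: Dict[str, Span]) -> List[SpanNode]:
    """Rebuild a forest of spans from a flat mapping keyed by span id."""
    nodes = {span_id: SpanNode(span) for span_id, span in spans.items()}
    roots: List[SpanNode] = []
    for node in nodes.values():
        parent_id = node.span.parent_span_id
        parent = nodes.get(parent_id) if parent_id is not None else None
        if parent is None:
            # No parent id, or the parent is absent from the mapping:
            # treat the span as a root rather than dropping it.
            roots.append(node)
        else:
            parent.children.append(node)
    return roots
```

Spans whose parent is missing from the mapping are kept as roots here; a real client might prefer to report them as errors instead.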

769 commits