llama-stack

forked from phoenix-oss/llama-stack-mirror

Author	SHA1	Message	Date
Xi Yan	2da93c8835	fix 3.2-1b fireworks	2024-11-19 14:20:07 -08:00
Xi Yan	189df6358a	codegen docs	2024-11-19 14:16:00 -08:00
Xi Yan	185df4b568	fix fireworks registration	2024-11-19 14:09:00 -08:00
Ashwin Bharambe	38ba3b9f0c	Fix fireworks stream completion	2024-11-19 13:36:14 -08:00
Ashwin Bharambe	05d1ead02f	Update condition in tests to handle llama-3.1 vs llama3.1 (HF names)	2024-11-19 13:25:36 -08:00
Ashwin Bharambe	394519d68a	Add llama-stack-client as a legitimate dependency for llama-stack	2024-11-19 11:44:35 -08:00
Ashwin Bharambe	c46b462c22	Updates to docker build script	2024-11-19 11:36:53 -08:00
Henry Tai	39e99b39fe	update quick start to have the working instruction (#467) # What does this PR do? Fix the instruction in quickstart readme so the new developers/users can run it without issues. ## Test Plan None ## Before submitting - [X] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [X] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [X] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests. Co-authored-by: Henry Tai <henrytai@fb.com>	2024-11-19 10:32:19 -08:00
Xi Yan	1b0f5fff5a	fix curl endpoint	2024-11-19 10:26:05 -08:00
Ashwin Bharambe	1619d37cc6	codegen per-distro dependencies; not hooked into setup.py yet	2024-11-19 09:54:30 -08:00
Ashwin Bharambe	5e4ac1b7c1	Make sure server code uses version prefixed routes	2024-11-19 09:15:05 -08:00
Ashwin Bharambe	84d5f35a48	Update the model alias for llama guard models in ollama	2024-11-19 00:22:24 -08:00
Ashwin Bharambe	e8d3eee095	Fix docs yet again	2024-11-18 23:51:35 -08:00
Dinesh Yeduguru	02f1c47416	support adding alias for models without hf repo/sku entry (#481) # What does this PR do? adds a new method build_model_alias_with_just_llama_model which is needed for cases like ollama's quantized models which do not really have a repo in hf and an entry in SKU list. ## Test Plan pytest -v -s -m "ollama" llama_stack/providers/tests/inference/test_text_inference.py --------- Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-18 23:50:18 -08:00
Ashwin Bharambe	8ed79ad0f3	Fix the pyopenapi generator avoid potential circular imports	2024-11-18 23:37:52 -08:00
Ashwin Bharambe	d463d68e1e	Update docs	2024-11-18 23:21:25 -08:00
Ashwin Bharambe	93abb8e208	Include all yamls	2024-11-18 22:47:00 -08:00
Ashwin Bharambe	0dc7f5fa89	Add version to REST API url (#478) # What does this PR do? Adds a `/alpha/` prefix to all the REST API urls. Also makes them all use hyphens instead of underscores as is more standard practice. (This is based on feedback from our partners.) ## Test Plan The Stack itself does not need updating. However, client SDKs and documentation will need to be updated.	2024-11-18 22:44:14 -08:00
Xi Yan	05e93bd2f7	together default	2024-11-18 22:39:45 -08:00
Ashwin Bharambe	7693786322	Use HF names for registering fireworks and together models	2024-11-18 22:34:47 -08:00
Xi Yan	6765fd76ff	fix llama stack build for together & llama stack build from templates (#479) # What does this PR do? - Fix issue w/ llama stack build using together template - For builds from templates, copy over the `templates/<template-name>/run.yaml` file to the `~/.llama/distributions/<name>/<name>-run.yaml` instead of re-building run config. ## Test Plan ``` $ llama stack build --template together --image-type conda .. Build spec configuration saved at /opt/anaconda3/envs/llamastack-together/together-build.yaml Build Successful! Next steps: 1. Set the environment variables: LLAMASTACK_PORT, TOGETHER_API_KEY 2. `llama stack run /Users/xiyan/.llama/distributions/llamastack-together/together-run.yaml` ``` ``` $ llama stack run /Users/xiyan/.llama/distributions/llamastack-together/together-run.yaml ``` ``` $ llama-stack-client models list $ pytest -v -s -m remote agents/test_agents.py --env REMOTE_STACK_URL=http://localhost:5000 --inference-model meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo ``` ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-11-18 22:29:16 -08:00
Ashwin Bharambe	ea52a3ee1c	minor enhancement for test fixtures	2024-11-18 22:21:17 -08:00
Matthew Farrellee	fcc2132e6f	remove pydantic namespace warnings using model_config (#470) # What does this PR do? remove another model_ pydantic namespace warning and convert old-style 'class Config' to new-style 'model_config' workaround. also a whitespace change to get past - flake8...................................................................Failed llama_stack/cli/download.py:296:85: E226 missing whitespace around arithmetic operator llama_stack/cli/download.py:297:54: E226 missing whitespace around arithmetic operator ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [x] Wrote necessary unit or integration tests.	2024-11-18 19:24:14 -08:00
Riandy	2108a779f2	Update kotlin client docs (#476) # What does this PR do? Add Kotlin package link into readme docs	2024-11-19 08:43:20 +05:30
Kai Wu	d2b7c5aeae	add quantized model ollama support (#471) # What does this PR do? add more quantized model support for ollama. ## Test Plan Tested with ollama docker that run llama3.2 3b 4bit model. ``` root@docker-desktop:/# ollama ps NAME ID SIZE PROCESSOR UNTIL llama3.2:3b a80c4f17acd5 3.5 GB 100% CPU 3 minutes from now ``` ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-11-18 18:55:23 -08:00
Ashwin Bharambe	14c75c3f21	Update CONTRIBUTING to include info about pre-commit	2024-11-18 18:17:54 -08:00
Dinesh Yeduguru	fe19076838	get stack run config based on template name (#477) This PR adds a method in stack to return the stackrunconfig object based on the template name. This will be used to instantiate a direct client without the need for an explicit run.yaml --------- Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-18 18:05:05 -08:00
Xi Yan	50d539e6d7	update tests --inference-model to hf id	2024-11-18 17:36:58 -08:00
Ashwin Bharambe	939056e265	More documentation fixes	2024-11-18 17:06:13 -08:00
Ashwin Bharambe	e40404625b	Update to docs	2024-11-18 16:52:48 -08:00
Ashwin Bharambe	91f3009c67	No more built_at	2024-11-18 16:38:51 -08:00
Ashwin Bharambe	afa4f0b19f	Update remote vllm docs	2024-11-18 16:34:33 -08:00
Ashwin Bharambe	fb15ff4a97	Move to use argparse, fix issues with multiple --env cmdline options	2024-11-18 16:31:59 -08:00
Ashwin Bharambe	b87f3ac499	Allow server to accept --env key pairs	2024-11-18 16:17:59 -08:00
Ashwin Bharambe	1fb61137ad	Add conda_env	2024-11-18 16:08:14 -08:00
Ashwin Bharambe	b822149098	Update start conda	2024-11-18 16:07:27 -08:00
Ashwin Bharambe	47c37fd831	Fixes	2024-11-18 16:03:53 -08:00
Ashwin Bharambe	3aedde2ab4	Add a pre-commit for distro_codegen but it does not work yet	2024-11-18 15:21:13 -08:00
Dinesh Yeduguru	57a9b4d57f	Allow models to be registered as long as llama model is provided (#472) This PR allows models to be registered with provider as long as the user specifies a llama model, even though the model does not match our prebuilt provider specific mapping. Test: pytest -v -s llama_stack/providers/tests/inference/test_model_registration.py -m "together" --env TOGETHER_API_KEY=<KEY> --------- Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-18 15:05:29 -08:00
Ashwin Bharambe	2a31163178	Auto-generate distro yamls + docs (#468) # What does this PR do? Automatically generates - build.yaml - run.yaml - run-with-safety.yaml - parts of markdown docs for the distributions. ## Test Plan At this point, this only updates the YAMLs and the docs. Some testing (especially with ollama and vllm) has been performed but needs to be much more tested.	2024-11-18 14:57:06 -08:00
Xi Yan	0784284ab5	[Agentic Eval] add ability to run agents generation (#469) # What does this PR do? - add ability to run agents generation for full eval (generate + scoring) - pre-register SimpleQA benchmark llm-as-judge scoring function in code ## Test Plan #### Simple QA w/ Search - eval_task_config_simpleqa_search.json ```json { "type": "benchmark", "eval_candidate": { "type": "agent", "config": { "model": "Llama3.1-405B-Instruct", "instructions": "Please use the search tool to answer the question.", "sampling_params": { "strategy": "greedy", "temperature": 1.0, "top_p": 0.9 }, "tools": [ { "type": "brave_search", "engine": "brave", "api_key": "API_KEY" } ], "tool_choice": "auto", "tool_prompt_format": "json", "input_shields": [], "output_shields": [], "enable_session_persistence": false } } } ``` #### SimpleQA w/o Search ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-11-18 11:43:03 -08:00
Vladimir Ivić	f1b9578f8d	Extend shorthand support for the `llama stack run` command (#465) Summary: Extend the shorthand run command so it can run successfully when config exists under DISTRIBS_BASE_DIR (i.e. ~/.llama/distributions). For example, imagine you created a new stack using the `llama stack build` command where you named it "my-awesome-llama-stack". ``` $ llama stack build > Enter a name for your Llama Stack (e.g. my-local-stack): my-awesome-llama-stack ``` To run the stack you created you will have to use long config path: ``` llama stack run ~/.llama/distributions/llamastack-my-awesome-llama-stack/my-awesome-llama-stack-run.yaml ``` With this change, you can start it using the stack name instead of full path: ``` llama stack run my-awesome-llama-stack ``` Test Plan: Verify command fails when stack doesn't exist ``` python3 -m llama_stack.cli.llama stack run my-test-stack ``` Output [FAILURE] ``` usage: llama stack run [-h] [--port PORT] [--disable-ipv6] config llama stack run: error: File /Users/vladimirivic/.llama/distributions/llamastack-my-test-stack/my-test-stack-run.yaml does not exist. Please run `llama stack build` to generate (and optionally edit) a run.yaml file ``` Create a new stack using `llama stack build`. Name it `my-test-stack`. Verify command runs successfully ``` python3 -m llama_stack.cli.llama stack run my-test-stack ``` Output [SUCCESS] ``` Listening on ['::', '0.0.0.0']:5000 INFO: Started server process [80146] INFO: Waiting for application startup. INFO: Application startup complete. INFO: Uvicorn running on http://['::', '0.0.0.0']:5000 (Press CTRL+C to quit) ```	2024-11-15 23:16:42 -08:00
Dinesh Yeduguru	57bafd0f8c	fix faiss serialize and serialize of index (#464) faiss serialize index returns a np object, that we first need to save to buffer and then write to sqllite. Since we are using json, we need to base64 encode the data. Same in the read path, we base64 decode and read into np array and then call into deserialize index. tests: torchrun $CONDA_PREFIX/bin/pytest -v -s -m "faiss" llama_stack/providers/tests/memory/test_memory.py Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-15 18:02:48 -08:00
Dinesh Yeduguru	ff99025875	await initialize in faiss (#463) tests: ``` torchrun $CONDA_PREFIX/bin/pytest -v -s -m "faiss" llama_stack/providers/tests/memory/test_memory.py ``` Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-15 14:21:31 -08:00
Ashwin Bharambe	20bf2f50c2	No more model_id warnings	2024-11-15 12:20:18 -08:00
Xi Yan	e8112b31ab	move hf addapter->remote (#459) # What does this PR do? - move folder ## Test Plan Unit Test ``` pytest -v -s -m "huggingface" datasetio/test_datasetio.py ``` E2E ``` llama stack run ``` ``` llama-stack-client eval run_benchmark meta-reference-mmlu --num-examples 5 --output-dir ./ --eval-task-config ~/eval_task_config.json --visualize ``` ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-11-14 22:41:19 -05:00
Xi Yan	788411b680	categorical score for llm as judge	2024-11-14 22:33:59 -05:00
Dinesh Yeduguru	0850ad656a	unregister for memory banks and remove update API (#458) The semantics of an Update on resources is very tricky to reason about especially for memory banks and models. The best way to go forward here is for the user to unregister and register a new resource. We don't have a compelling reason to support update APIs. Tests: pytest -v -s llama_stack/providers/tests/memory/test_memory.py -m "chroma" --env CHROMA_HOST=localhost --env CHROMA_PORT=8000 pytest -v -s llama_stack/providers/tests/memory/test_memory.py -m "pgvector" --env PGVECTOR_DB=postgres --env PGVECTOR_USER=postgres --env PGVECTOR_PASSWORD=mysecretpassword --env PGVECTOR_HOST=0.0.0.0 $CONDA_PREFIX/bin/pytest -v -s -m "ollama" llama_stack/providers/tests/inference/test_model_registration.py --------- Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-14 17:12:11 -08:00
Xi Yan	2eab3b7ed9	skip aggregation for llm_as_judge	2024-11-14 17:50:46 -05:00
Ashwin Bharambe	bba6edd06b	Fix OpenAPI generation to have text/event-stream for streamable methods	2024-11-14 12:51:38 -08:00

552 commits