llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-06-28 02:53:30 +00:00

Author	SHA1	Message	Date
Ashwin Bharambe	0dc7f5fa89	Add version to REST API url (#478 ) # What does this PR do? Adds a `/alpha/` prefix to all the REST API urls. Also makes them all use hyphens instead of underscores as is more standard practice. (This is based on feedback from our partners.) ## Test Plan The Stack itself does not need updating. However, client SDKs and documentation will need to be updated.	2024-11-18 22:44:14 -08:00
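As a rough illustration of the URL change described in the commit above, the sketch below rewrites an old-style route into the new form. The helper name `versioned_route` and the sample route are hypothetical and not taken from the llama-stack codebase; only the `/alpha/` prefix and the underscore-to-hyphen rule come from the commit message.

```python
def versioned_route(route: str, version: str = "alpha") -> str:
    """Prefix a route with an API version and swap underscores for hyphens.

    Hypothetical helper for illustration only.
    """
    # Drop surrounding slashes so the prefix is joined exactly once.
    path = route.strip("/").replace("_", "-")
    return f"/{version}/{path}"


# Sample route chosen for illustration.
print(versioned_route("/inference/chat_completion"))
```

Clients that build URLs by hand would need the same transformation, which is presumably why the commit notes that SDKs and documentation need updating.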
Xi Yan	05e93bd2f7	together default	2024-11-18 22:39:45 -08:00
Ashwin Bharambe	7693786322	Use HF names for registering fireworks and together models	2024-11-18 22:34:47 -08:00
Xi Yan	6765fd76ff	fix llama stack build for together & llama stack build from templates (#479 ) # What does this PR do? - Fix issue w/ llama stack build using together template <img width="669" alt="image" src="https://github.com/user-attachments/assets/1cbef052-d902-40b9-98f8-37efb494d117"> - For builds from templates, copy over the `templates/<template-name>/run.yaml` file to the `~/.llama/distributions/<name>/<name>-run.yaml` instead of re-building run config. ## Test Plan ``` $ llama stack build --template together --image-type conda .. Build spec configuration saved at /opt/anaconda3/envs/llamastack-together/together-build.yaml Build Successful! Next steps: 1. Set the environment variables: LLAMASTACK_PORT, TOGETHER_API_KEY 2. `llama stack run /Users/xiyan/.llama/distributions/llamastack-together/together-run.yaml` ``` ``` $ llama stack run /Users/xiyan/.llama/distributions/llamastack-together/together-run.yaml ``` ``` $ llama-stack-client models list $ pytest -v -s -m remote agents/test_agents.py --env REMOTE_STACK_URL=http://localhost:5000 --inference-model meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo ``` <img width="764" alt="image" src="https://github.com/user-attachments/assets/b805b6c5-a316-4561-8fe3-24fc3b1f8b80"> ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-11-18 22:29:16 -08:00
Ashwin Bharambe	ea52a3ee1c	minor enhancement for test fixtures	2024-11-18 22:21:17 -08:00
Matthew Farrellee	fcc2132e6f	remove pydantic namespace warnings using model_config (#470 ) # What does this PR do? remove another model_ pydantic namespace warning and convert old-style 'class Config' to new-style 'model_config' workaround. also a whitespace change to get past - flake8...................................................................Failed llama_stack/cli/download.py:296:85: E226 missing whitespace around arithmetic operator llama_stack/cli/download.py:297:54: E226 missing whitespace around arithmetic operator ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [x] Wrote necessary unit or integration tests.	2024-11-18 19:24:14 -08:00
Riandy	2108a779f2	Update kotlin client docs (#476 ) # What does this PR do? Add Kotlin package link into readme docs	2024-11-19 08:43:20 +05:30
Kai Wu	d2b7c5aeae	add quantized model ollama support (#471 ) # What does this PR do? add more quantized model support for ollama. - [ ] Addresses issue (#issue) ## Test Plan Tested with ollama docker that run llama3.2 3b 4bit model. ``` root@docker-desktop:/# ollama ps NAME ID SIZE PROCESSOR UNTIL llama3.2:3b a80c4f17acd5 3.5 GB 100% CPU 3 minutes from now ``` ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-11-18 18:55:23 -08:00
Ashwin Bharambe	14c75c3f21	Update CONTRIBUTING to include info about pre-commit	2024-11-18 18:17:54 -08:00
Dinesh Yeduguru	fe19076838	get stack run config based on template name (#477 ) This PR adds a method in stack to return the stackrunconfig object based on the template name. This will be used to instantiate a direct client without the need for an explicit run.yaml --------- Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-18 18:05:05 -08:00
Xi Yan	50d539e6d7	update tests --inference-model to hf id	2024-11-18 17:36:58 -08:00
Ashwin Bharambe	939056e265	More documentation fixes	2024-11-18 17:06:13 -08:00
Ashwin Bharambe	e40404625b	Update to docs	2024-11-18 16:52:48 -08:00
Ashwin Bharambe	91f3009c67	No more built_at	2024-11-18 16:38:51 -08:00
Ashwin Bharambe	afa4f0b19f	Update remote vllm docs	2024-11-18 16:34:33 -08:00
Ashwin Bharambe	fb15ff4a97	Move to use argparse, fix issues with multiple --env cmdline options	2024-11-18 16:31:59 -08:00
Ashwin Bharambe	b87f3ac499	Allow server to accept --env key pairs	2024-11-18 16:17:59 -08:00
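The two commits above concern repeated `--env` key pairs on the command line. A minimal sketch of that pattern with `argparse` follows; the function name `parse_env_pairs` and the sample values are assumptions for illustration, not the server's actual code.

```python
import argparse


def parse_env_pairs(pairs: list[str]) -> dict[str, str]:
    """Turn ["KEY=VALUE", ...] into a dict, rejecting malformed entries."""
    env = {}
    for pair in pairs:
        # partition splits on the first "=", so values may contain "=".
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        env[key] = value
    return env


parser = argparse.ArgumentParser()
# action="append" collects every occurrence of --env into one list.
parser.add_argument("--env", action="append", default=[], metavar="KEY=VALUE")

args = parser.parse_args(["--env", "A=1", "--env", "URL=http://x/?q=1"])
print(parse_env_pairs(args.env))
```

Using `action="append"` is one way to avoid the last `--env` silently overwriting earlier ones.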
Ashwin Bharambe	1fb61137ad	Add conda_env	2024-11-18 16:08:14 -08:00
Ashwin Bharambe	b822149098	Update start conda	2024-11-18 16:07:27 -08:00
Ashwin Bharambe	47c37fd831	Fixes	2024-11-18 16:03:53 -08:00
Ashwin Bharambe	3aedde2ab4	Add a pre-commit for distro_codegen but it does not work yet	2024-11-18 15:21:13 -08:00
Dinesh Yeduguru	57a9b4d57f	Allow models to be registered as long as llama model is provided (#472 ) This PR allows models to be registered with provider as long as the user specifies a llama model, even though the model does not match our prebuilt provider specific mapping. Test: pytest -v -s llama_stack/providers/tests/inference/test_model_registration.py -m "together" --env TOGETHER_API_KEY=<KEY> --------- Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-18 15:05:29 -08:00
Ashwin Bharambe	2a31163178	Auto-generate distro yamls + docs (#468 ) # What does this PR do? Automatically generates - build.yaml - run.yaml - run-with-safety.yaml - parts of markdown docs for the distributions. ## Test Plan At this point, this only updates the YAMLs and the docs. Some testing (especially with ollama and vllm) has been performed, but much more testing is needed.	2024-11-18 14:57:06 -08:00
Xi Yan	0784284ab5	[Agentic Eval] add ability to run agents generation (#469 ) # What does this PR do? - add ability to run agents generation for full eval (generate + scoring) - pre-register SimpleQA benchmark llm-as-judge scoring function in code ## Test Plan ![image](https://github.com/user-attachments/assets/b4b6f086-1be4-4c2a-8ab0-6839f0067c0a) ![image](https://github.com/user-attachments/assets/05bb7a09-2d7a-4031-8eb6-e1ca670ee439) #### Simple QA w/ Search ![image](https://github.com/user-attachments/assets/0a51e3f3-9fc7-479b-8295-89aed63496e0) - eval_task_config_simpleqa_search.json ```json { "type": "benchmark", "eval_candidate": { "type": "agent", "config": { "model": "Llama3.1-405B-Instruct", "instructions": "Please use the search tool to answer the question.", "sampling_params": { "strategy": "greedy", "temperature": 1.0, "top_p": 0.9 }, "tools": [ { "type": "brave_search", "engine": "brave", "api_key": "API_KEY" } ], "tool_choice": "auto", "tool_prompt_format": "json", "input_shields": [], "output_shields": [], "enable_session_persistence": false } } } ``` #### SimpleQA w/o Search ![image](https://github.com/user-attachments/assets/6301feef-2abb-4bee-b50c-97da1c90482b) ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-11-18 11:43:03 -08:00
Vladimir Ivić	f1b9578f8d	Extend shorthand support for the `llama stack run` command (#465 ) Summary: Extend the shorthand run command so it can run successfully when config exists under DISTRIBS_BASE_DIR (i.e. ~/.llama/distributions). For example, imagine you created a new stack using the `llama stack build` command where you named it "my-awesome-llama-stack". ``` $ llama stack build > Enter a name for your Llama Stack (e.g. my-local-stack): my-awesome-llama-stack ``` To run the stack you created you will have to use long config path: ``` llama stack run ~/.llama/distributions/llamastack-my-awesome-llama-stack/my-awesome-llama-stack-run.yaml ``` With this change, you can start it using the stack name instead of full path: ``` llama stack run my-awesome-llama-stack ``` Test Plan: Verify command fails when stack doesn't exist ``` python3 -m llama_stack.cli.llama stack run my-test-stack ``` Output [FAILURE] ``` usage: llama stack run [-h] [--port PORT] [--disable-ipv6] config llama stack run: error: File /Users/vladimirivic/.llama/distributions/llamastack-my-test-stack/my-test-stack-run.yaml does not exist. Please run `llama stack build` to generate (and optionally edit) a run.yaml file ``` Create a new stack using `llama stack build`. Name it `my-test-stack`. Verify command runs successfully ``` python3 -m llama_stack.cli.llama stack run my-test-stack ``` Output [SUCCESS] ``` Listening on ['::', '0.0.0.0']:5000 INFO: Started server process [80146] INFO: Waiting for application startup. INFO: Application startup complete. INFO: Uvicorn running on http://['::', '0.0.0.0']:5000 (Press CTRL+C to quit) ```	2024-11-15 23:16:42 -08:00
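The shorthand lookup described in the commit above can be sketched as follows. The function name `resolve_run_config` is hypothetical; the directory layout (`llamastack-<name>/<name>-run.yaml` under `~/.llama/distributions`) is taken from the paths shown in the commit message.

```python
from pathlib import Path

# Assumed to correspond to DISTRIBS_BASE_DIR from the commit message.
DISTRIBS_BASE_DIR = Path.home() / ".llama" / "distributions"


def resolve_run_config(config: str, base_dir: Path = DISTRIBS_BASE_DIR) -> Path:
    """Return an explicit config path if it exists, else treat it as a stack name."""
    explicit = Path(config).expanduser()
    if explicit.is_file():
        return explicit
    # Shorthand: <base>/llamastack-<name>/<name>-run.yaml
    return base_dir / f"llamastack-{config}" / f"{config}-run.yaml"


print(resolve_run_config("my-awesome-llama-stack"))
```

A caller would still need to check that the resolved file exists and report an error otherwise, as the failure output in the commit message shows.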
Dinesh Yeduguru	57bafd0f8c	fix faiss serialize and serialize of index (#464 ) faiss serialize index returns a np object, which we first need to save to a buffer and then write to SQLite. Since we are using JSON, we need to base64 encode the data. Likewise in the read path, we base64 decode the data, read it into an np array, and then call deserialize index. tests: torchrun $CONDA_PREFIX/bin/pytest -v -s -m "faiss" llama_stack/providers/tests/memory/test_memory.py Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-15 18:02:48 -08:00
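The base64 round trip described in the commit above can be sketched with the standard library alone. Here plain bytes stand in for the buffer that holds the serialized index, and the JSON key `faiss_index` is a hypothetical name; the real code would obtain the bytes from the saved np array.

```python
import base64
import json


def encode_index(raw: bytes) -> str:
    """Wrap binary index bytes in a JSON document via base64."""
    # JSON cannot carry raw bytes, so encode them as ASCII text first.
    return json.dumps({"faiss_index": base64.b64encode(raw).decode("ascii")})


def decode_index(doc: str) -> bytes:
    """Reverse of encode_index: parse the JSON and base64 decode the payload."""
    return base64.b64decode(json.loads(doc)["faiss_index"])


raw = bytes(range(256))  # stand-in for the serialized index buffer
assert decode_index(encode_index(raw)) == raw
```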
Dinesh Yeduguru	ff99025875	await initialize in faiss (#463 ) tests: ``` torchrun $CONDA_PREFIX/bin/pytest -v -s -m "faiss" llama_stack/providers/tests/memory/test_memory.py ``` Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-15 14:21:31 -08:00
Ashwin Bharambe	20bf2f50c2	No more model_id warnings	2024-11-15 12:20:18 -08:00
Xi Yan	e8112b31ab	move hf adapter->remote (#459 ) # What does this PR do? - move folder ## Test Plan Unit Test ``` pytest -v -s -m "huggingface" datasetio/test_datasetio.py ``` E2E ``` llama stack run ``` ``` llama-stack-client eval run_benchmark meta-reference-mmlu --num-examples 5 --output-dir ./ --eval-task-config ~/eval_task_config.json --visualize ``` <img width="657" alt="image" src="https://github.com/user-attachments/assets/63d53f9d-6c7e-4667-af8c-9d16c91ae6e3"> ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-11-14 22:41:19 -05:00
Xi Yan	788411b680	categorical score for llm as judge	2024-11-14 22:33:59 -05:00
Dinesh Yeduguru	0850ad656a	unregister for memory banks and remove update API (#458 ) The semantics of an Update on resources is very tricky to reason about especially for memory banks and models. The best way to go forward here is for the user to unregister and register a new resource. We don't have a compelling reason to support update APIs. Tests: pytest -v -s llama_stack/providers/tests/memory/test_memory.py -m "chroma" --env CHROMA_HOST=localhost --env CHROMA_PORT=8000 pytest -v -s llama_stack/providers/tests/memory/test_memory.py -m "pgvector" --env PGVECTOR_DB=postgres --env PGVECTOR_USER=postgres --env PGVECTOR_PASSWORD=mysecretpassword --env PGVECTOR_HOST=0.0.0.0 $CONDA_PREFIX/bin/pytest -v -s -m "ollama" llama_stack/providers/tests/inference/test_model_registration.py --------- Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-14 17:12:11 -08:00
Xi Yan	2eab3b7ed9	skip aggregation for llm_as_judge	2024-11-14 17:50:46 -05:00
Ashwin Bharambe	bba6edd06b	Fix OpenAPI generation to have text/event-stream for streamable methods	2024-11-14 12:51:38 -08:00
Ashwin Bharambe	acbecbf8b3	Add a verify-download command to llama CLI (#457 ) # What does this PR do? It is important to verify large checkpoints downloaded via `llama model download` because subtle corruptions can easily happen with large file system writes. This PR adds a `verify-download` subcommand. Note that verification itself is a very time consuming process (and will take several minutes for the 405B model), hence this is a separate subcommand (and not part of the download which can already be time-consuming) and there are spinners and a bit of a "show" around it in the implementation. ## Test Plan <img width="1012" alt="image" src="https://github.com/user-attachments/assets/f82b0d42-2a15-4917-b85e-6d3cd7d31e55">	2024-11-14 11:47:51 -08:00
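The commit above does not say how `verify-download` checks files. One common approach, sketched here purely as an assumption, is to stream each file through a hash and compare it with a known digest. The function names and the choice of SHA-256 are illustrative and not confirmed by the source.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large checkpoints never sit fully in memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare the file's digest against an expected hex string (case-insensitive)."""
    return file_digest(path, algorithm) == expected_hex.lower()
```

Reading every byte of a multi-gigabyte checkpoint is what makes this kind of check slow, which fits the commit's note that verification is kept as a separate subcommand.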
Ashwin Bharambe	0713607b68	Support parallel downloads for `llama model download` (#448 ) # What does this PR do? Enables parallel downloads for `llama model download` CLI command. It is rather necessary for folks having high bandwidth connections to the Internet in order to download checkpoints quickly. ## Test Plan ![image](https://github.com/user-attachments/assets/f5df69e2-ec4f-4360-bf84-91273d8cee22)	2024-11-14 09:56:22 -08:00
Martin Hickey	0c750102c6	Fix build configure deprecation message (#456 ) # What does this PR do? Removes the `--configure` flag from the `llama build configure` deprecation message because the `llama stack run` command does not support this flag. Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>	2024-11-14 09:56:03 -08:00
Xi Yan	58381dbe78	local persistence for eval tasks (#453 ) # What does this PR do? - add local persistence for eval tasks - follow https://github.com/meta-llama/llama-stack/pull/375 ## Test Plan 1. fresh llama stack run 2. kill server 3. restart server: llama stack run <img width="690" alt="image" src="https://github.com/user-attachments/assets/3d76e477-b91a-43a6-86ea-8e3ef2d04ed3"> Using run.yaml ```yaml eval_tasks: - eval_task_id: meta-reference-mmlu provider_id: meta-reference-0 dataset_id: mmlu scoring_functions: - basic::regex_parser_multiple_choice_answer ``` ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-11-14 10:36:23 -05:00
Dinesh Yeduguru	46f0b6606a	init registry once (#450 ) We were calling the registry's initialize function in the common routing table impl, which is incorrect because the common routing table is the base class inherited by each resource's routing table. This change removes that call and initializes the registry where it is created, so it is initialized once per server run. Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-13 22:20:57 -08:00
Dinesh Yeduguru	efe791bab7	Support model resource updates and deletes (#452 ) # What does this PR do? * Changes the registry to store only one RoutableObject per identifier. Before it was a list, which is not really required. * Adds impl for updates and deletes * Updates routing table to handle updates correctly ## Test Plan ``` ❯ llama-stack-client models list +------------------------+---------------+------------------------------------+------------+ \| identifier \| provider_id \| provider_resource_id \| metadata \| +========================+===============+====================================+============+ \| Llama3.1-405B-Instruct \| fireworks-0 \| fireworks/llama-v3p1-405b-instruct \| {} \| +------------------------+---------------+------------------------------------+------------+ \| Llama3.1-8B-Instruct \| fireworks-0 \| fireworks/llama-v3p1-8b-instruct \| {} \| +------------------------+---------------+------------------------------------+------------+ \| Llama3.2-3B-Instruct \| fireworks-0 \| fireworks/llama-v3p2-1b-instruct \| {} \| +------------------------+---------------+------------------------------------+------------+ ❯ llama-stack-client models register dineshyv-model --provider-model-id=fireworks/llama-v3p1-70b-instruct Successfully registered model dineshyv-model ❯ llama-stack-client models list +------------------------+---------------+------------------------------------+------------+ \| identifier \| provider_id \| provider_resource_id \| metadata \| +========================+===============+====================================+============+ \| Llama3.1-405B-Instruct \| fireworks-0 \| fireworks/llama-v3p1-405b-instruct \| {} \| +------------------------+---------------+------------------------------------+------------+ \| Llama3.1-8B-Instruct \| fireworks-0 \| fireworks/llama-v3p1-8b-instruct \| {} \| +------------------------+---------------+------------------------------------+------------+ \| Llama3.2-3B-Instruct \| fireworks-0 \| 
fireworks/llama-v3p2-1b-instruct \| {} \| +------------------------+---------------+------------------------------------+------------+ \| dineshyv-model \| fireworks-0 \| fireworks/llama-v3p1-70b-instruct \| {} \| +------------------------+---------------+------------------------------------+------------+ ❯ llama-stack-client models update dineshyv-model --provider-model-id=fireworks/llama-v3p1-405b-instruct Successfully updated model dineshyv-model ❯ llama-stack-client models list +------------------------+---------------+------------------------------------+------------+ \| identifier \| provider_id \| provider_resource_id \| metadata \| +========================+===============+====================================+============+ \| Llama3.1-405B-Instruct \| fireworks-0 \| fireworks/llama-v3p1-405b-instruct \| {} \| +------------------------+---------------+------------------------------------+------------+ \| Llama3.1-8B-Instruct \| fireworks-0 \| fireworks/llama-v3p1-8b-instruct \| {} \| +------------------------+---------------+------------------------------------+------------+ \| Llama3.2-3B-Instruct \| fireworks-0 \| fireworks/llama-v3p2-1b-instruct \| {} \| +------------------------+---------------+------------------------------------+------------+ \| dineshyv-model \| fireworks-0 \| fireworks/llama-v3p1-405b-instruct \| {} \| +------------------------+---------------+------------------------------------+------------+ llama-stack-client models delete dineshyv-model ❯ llama-stack-client models list +------------------------+---------------+------------------------------------+------------+ \| identifier \| provider_id \| provider_resource_id \| metadata \| +========================+===============+====================================+============+ \| Llama3.1-405B-Instruct \| fireworks-0 \| fireworks/llama-v3p1-405b-instruct \| {} \| +------------------------+---------------+------------------------------------+------------+ \| Llama3.1-8B-Instruct \| 
fireworks-0 \| fireworks/llama-v3p1-8b-instruct \| {} \| +------------------------+---------------+------------------------------------+------------+ \| Llama3.2-3B-Instruct \| fireworks-0 \| fireworks/llama-v3p2-1b-instruct \| {} \| +------------------------+---------------+------------------------------------+------------+ ``` --------- Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-13 21:55:41 -08:00
Xi Yan	4253cfcd7f	local persistent for hf dataset provider (#451 ) # What does this PR do? - local persistence for HF dataset provider - follow https://github.com/meta-llama/llama-stack/pull/375 ## Test Plan e2e 1. fresh llama stack run w/ yaml 2. kill server 3. restart llama stack run w/ yaml ```yaml datasets: - dataset_id: mmlu provider_id: huggingface-0 url: uri: https://huggingface.co/datasets/llamastack/evals metadata: path: llamastack/evals name: evals__mmlu__details split: train dataset_schema: input_query: type: string expected_answer: type: string ``` <img width="686" alt="image" src="https://github.com/user-attachments/assets/d7737931-6a7d-400a-a17d-fef6cbd97eea"> ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-11-14 00:08:37 -05:00
Dinesh Yeduguru	e90ea1ab1e	make distribution registry thread safe and other fixes (#449 ) This PR makes the following changes: 1) Fixes the get_all and initialize impl to actually read the values returned from the range call to kvstore and not keys. 2) The start_key and end_key are fixed to correctly perform the range query after the key format changes. 3) Made the cache registry thread safe since there are multiple initializes called for each routing table. Tests: * Start stack * Register dataset * Kill stack * Bring stack up * dataset list ``` llama-stack-client datasets list +--------------+---------------+---------------------------------------------------------------------------------+---------+ \| identifier \| provider_id \| metadata \| type \| +==============+===============+=================================================================================+=========+ \| alpaca \| huggingface-0 \| {} \| dataset \| +--------------+---------------+---------------------------------------------------------------------------------+---------+ \| mmlu \| huggingface-0 \| {'path': 'llama-stack/evals', 'name': 'evals__mmlu__details', 'split': 'train'} \| dataset \| +--------------+---------------+---------------------------------------------------------------------------------+---------+ ``` Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-13 15:12:34 -08:00
Jeff Tang	15dee2b8b8	Added link to the Colab notebook of the Llama Stack lesson on the Llama 3.2 course on DLAI (#445 ) # What does this PR do? It shows a complete zero-setup Colab using the Llama Stack server implemented and powered by together.ai: using Llama Stack Client API to run inference, agent and 3.2 models. Good for a quick start guide. - [ ] Addresses issue (#issue) ## Test Plan Please describe: - tests you ran to verify your changes with result summaries. - provide instructions so it can be reproduced. ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-11-13 13:59:41 -08:00
Dinesh Yeduguru	787e2034b7	model registration in ollama and vllm check against the available models in the provider (#446 ) tests: pytest -v -s -m "ollama" llama_stack/providers/tests/inference/test_text_inference.py pytest -v -s -m vllm_remote llama_stack/providers/tests/inference/test_text_inference.py --env VLLM_URL="http://localhost:9798/v1" ---------	2024-11-13 13:04:06 -08:00
Ashwin Bharambe	7f6ac2fbd7	allow seeing warnings with traces optionally	2024-11-13 12:27:19 -08:00
Ashwin Bharambe	96e7ef646f	add support for ${env.FOO_BAR} placeholders in run.yaml files (#439 ) # What does this PR do? We'd like our docker steps to require _ZERO EDITS_ to a YAML file in order to get going. This is often not possible because depending on the provider, we do need some configuration input from the user. Environment variables are the best way to obtain this information. This PR allows our run.yaml to contain `${env.FOO_BAR}` placeholders which can be replaced using `docker run -e FOO_BAR=baz` (and similar `docker compose` equivalent). ## Test Plan For remote-vllm, example `run.yaml` snippet looks like this: ```yaml providers: inference: # serves main inference model - provider_id: vllm-0 provider_type: remote::vllm config: # NOTE: replace with "localhost" if you are running in "host" network mode url: ${env.LLAMA_INFERENCE_VLLM_URL:http://host.docker.internal:5100/v1} max_tokens: ${env.MAX_TOKENS:4096} api_token: fake # serves safety llama_guard model - provider_id: vllm-1 provider_type: remote::vllm config: # NOTE: replace with "localhost" if you are running in "host" network mode url: ${env.LLAMA_SAFETY_VLLM_URL:http://host.docker.internal:5101/v1} max_tokens: ${env.MAX_TOKENS:4096} api_token: fake ``` `compose.yaml` snippet looks like this: ```yaml llamastack: depends_on: - vllm-0 - vllm-1 # image: llamastack/distribution-remote-vllm image: llamastack/distribution-remote-vllm:test-0.0.52rc3 volumes: - ~/.llama:/root/.llama - ~/local/llama-stack/distributions/remote-vllm/run.yaml:/root/llamastack-run-remote-vllm.yaml # network_mode: "host" environment: - LLAMA_INFERENCE_VLLM_URL=${LLAMA_INFERENCE_VLLM_URL:-http://host.docker.internal:5100/v1} - LLAMA_INFERENCE_MODEL=${LLAMA_INFERENCE_MODEL:-Llama3.1-8B-Instruct} - MAX_TOKENS=${MAX_TOKENS:-4096} - SQLITE_STORE_DIR=${SQLITE_STORE_DIR:-$HOME/.llama/distributions/remote-vllm} - LLAMA_SAFETY_VLLM_URL=${LLAMA_SAFETY_VLLM_URL:-http://host.docker.internal:5101/v1} - 
LLAMA_SAFETY_MODEL=${LLAMA_SAFETY_MODEL:-Llama-Guard-3-1B} ```	2024-11-13 11:25:58 -08:00
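The `${env.FOO_BAR}` placeholder scheme from the commit above, including the `${env.NAME:default}` form used in its `run.yaml` snippet, can be approximated as below. This is a minimal sketch, not the project's implementation: the function name `substitute_env`, the regular expression, and the error raised for a missing variable are all assumptions.

```python
import os
import re

# Matches ${env.NAME} and ${env.NAME:default}; the default may contain colons.
_PLACEHOLDER = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")


def substitute_env(text: str, env=None) -> str:
    """Replace ${env.NAME[:default]} placeholders using a mapping (os.environ by default)."""
    env = os.environ if env is None else env

    def _replace(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"environment variable {name} is not set and has no default")

    return _PLACEHOLDER.sub(_replace, text)


line = "url: ${env.LLAMA_INFERENCE_VLLM_URL:http://host.docker.internal:5100/v1}"
print(substitute_env(line, env={}))
```

Substituting on the raw text before YAML parsing keeps the logic independent of the config schema; whether the project does it at that stage is not stated in the commit.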
Sarthak Deshpande	838b8d4fb5	PR-437-Fixed bug to allow system instructions after first turn (#440 ) # What does this PR do? In short, provide a summary of what this PR does and why. Usually, the relevant context should be present in a linked issue. - [This PR solves the issue where agents cannot keep track of instructions after executing the first turn because system instructions were not getting appended in the messages list. It also solves the issue where turns are not being fetched in the appropriate sequence.] Addresses issue (#issue) ## Test Plan Please describe: - I have a file which has a precise prompt which requires more than one turn to be executed will share the file below. I ran that file as a python script to make sure that the turns are being executed as per the instructions after making the code change ``` import asyncio from typing import List, Optional, Dict from llama_stack_client import LlamaStackClient from llama_stack_client.lib.agents.event_logger import EventLogger from llama_stack_client.types import SamplingParams, UserMessage from llama_stack_client.types.agent_create_params import AgentConfig LLAMA_STACK_API_TOGETHER_URL="http://10.12.79.177:5001" class Agent: def __init__(self): self.client = LlamaStackClient( base_url=LLAMA_STACK_API_TOGETHER_URL, ) def create_agent(self, agent_config: AgentConfig): agent = self.client.agents.create( agent_config=agent_config, ) self.agent_id = agent.agent_id session = self.client.agents.session.create( agent_id=agent.agent_id, session_name="example_session", ) self.session_id = session.session_id async def execute_turn(self, content: str): response = self.client.agents.turn.create( agent_id=self.agent_id, session_id=self.session_id, messages=[ UserMessage(content=content, role="user"), ], stream=True, ) for chunk in response: if chunk.event.payload.event_type != "turn_complete": yield chunk async def run_main(): system_prompt="""You are an AI Agent tasked with Capturing Book Renting Information for a 
Library. You will politely gather the book and user details one step at a time to send over the book to the user. Here’s how to proceed: 1. Data Security: Inform the user that their data will be kept secure. 2. Optional Participation: Let them know they are not required to share details but that doing so will help them learn about the books offered. 3. Sequential Information Capture: Follow the steps below, one question at a time. Do not skip or combine questions. Steps Step 1: Politely ask to provide the name of the book. Step 2: Ask for the name of the author. Step 3: Ask for the Author's country. Step 4: Ask for the year of publication. Step 5: If any information is missing or seems incorrect, ask the user to re-enter that specific detail. Step 6: Confirm that the user consents to share the entered information. Step 7: Thank the user for providing the details and let them know they will receive an email about the book. Do not do any validation of the user entered information. Do not print the Steps or your internal thoughts in the response. Do not print the prompts or data structure object in the response Do not fill in the requested user data on your own. It has to be entered by the user only. Finally, compile and print the user-provided information as a JSON object in your response. 
""" agent_config = AgentConfig( model="Llama3.2-11B-Vision-Instruct", instructions=system_prompt, enable_session_persistence=True, ) agent = Agent() agent.create_agent(agent_config) print("Agent and Session:", agent.agent_id, agent.session_id) while True: query = input("Enter your query (or type 'exit' to quit): ") if query.lower() == "exit": print("Exiting the loop.") break else: prompt = query print(f"User> {prompt}") response = agent.execute_turn(content=prompt) async for log in EventLogger().log(response): if log is not None: log.print() if __name__ == "__main__": asyncio.run(run_main()) ``` Below is a screenshot of the results of the first commit <img width="1770" alt="Screenshot 2024-11-13 at 3 15 29 PM" src="https://github.com/user-attachments/assets/1a7a090d-fc92-49cc-a786-bfc812e3d9cc"> Below is a screenshot of the results of the second commit <img width="1792" alt="Screenshot 2024-11-13 at 6 40 56 PM" src="https://github.com/user-attachments/assets/a9474f75-cd8c-4d49-82cd-5ff81ff12b07"> Also a screenshot of print statement to show that the turns being fetched now are in a sequence <img width="1783" alt="Screenshot 2024-11-13 at 6 42 22 PM" src="https://github.com/user-attachments/assets/b906404e-a3e4-48a2-b893-69f36bbdcb98"> ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [x] Wrote necessary unit or integration tests.	2024-11-13 10:34:04 -08:00
Xi Yan	94a6f57812	change schema -> dataset_schema for register_dataset api (#443 ) # What does this PR do? - API updates: change schema to dataset_schema for register_dataset for resolving pydantic naming conflict - Note: this OpenAPI update will be synced with llama-stack-client-python SDK. cc @dineshyv ## Test Plan ``` pytest -v -s -m meta_reference_eval_together_inference_huggingface_datasetio eval/test_eval.py ``` ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-11-13 11:17:46 -05:00
Xi Yan	d5b1202c83	change schema -> dataset_schema (#442 ) # What does this PR do? - `schema` should not be used as a field name because it triggers pydantic warnings - change `schema` to `dataset_schema` <img width="855" alt="image" src="https://github.com/user-attachments/assets/47cb6bb9-4be0-46a5-8701-24d24e2eaabd"> ## Test Plan ``` pytest -v -s -m meta_reference_eval_together_inference_huggingface_datasetio eval/test_eval.py ``` ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-11-13 10:58:12 -05:00
Xi Yan	c29fa56dde	add inline:: prefix for localfs provider (#441 ) # What does this PR do? - add inline:: prefix for localfs provider ## Test Plan ``` llama stack run datasetio: - provider_id: localfs-0 provider_type: inline::localfs config: {} ``` ``` pytest -v -s -m meta_reference_eval_fireworks_inference eval/test_eval.py pytest -v -s -m localfs datasetio/test_datasetio.py ``` ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-11-13 10:44:39 -05:00
Ashwin Bharambe	36b052ab10	slightly update README.md	2024-11-12 22:11:46 -08:00


535 commits