llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-10-06 20:44:58 +00:00

Author	SHA1	Message	Date
Matthew Farrellee	435f34b05e	reduce the accuracy requirements to pass the chat completion structured output test (#522 ) i find `test_structured_output` to be flakey. it's both a functionality and accuracy test - ``` answer = AnswerFormat.model_validate_json(response.completion_message.content) assert answer.first_name == "Michael" assert answer.last_name == "Jordan" assert answer.year_of_birth == 1963 assert answer.num_seasons_in_nba == 15 ``` it's an accuracy test because it checks the value of first/last name, birth year, and num seasons. i find that - - llama-3.1-8b-instruct and llama-3.2-3b-instruct pass the functionality portion - llama-3.2-3b-instruct consistently fails the accuracy portion (thinking MJ was in the NBA for 14 seasons) - llama-3.1-8b-instruct occasionally fails the accuracy portion suggestions (not mutually exclusive) - 1. turn the test into functionality only, skip the value checks 2. split the test into a functionality version and an xfail accuracy version 3. add context to the prompt so the llm can answer without accessing embedded memory # What does this PR do? implements option (3) by adding context to the system prompt. ## Test Plan `pytest -s -v ... llama_stack/providers/tests/inference/ ... -k structured_output` ## Before submitting - [x] Ran pre-commit to handle lint / formatting issues. - [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [x] Updated relevant documentation. - [x] Wrote necessary unit or integration tests.	2024-12-03 02:55:14 -08:00
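The distinction drawn in #522 between a functionality check and an accuracy check can be sketched as two separate helpers. This is only an illustration: a standard-library dataclass stands in for the pydantic `AnswerFormat` model used by the real test, and `parse_answer` and `is_accurate` are hypothetical names that do not appear in the repository. The expected values are the ones asserted in the commit message.

```python
import json
from dataclasses import dataclass


@dataclass
class AnswerFormat:
    # Stand-in for the pydantic model used in the real test (assumption).
    first_name: str
    last_name: str
    year_of_birth: int
    num_seasons_in_nba: int


def parse_answer(raw: str) -> AnswerFormat:
    """Functionality check: the reply is JSON with the expected fields and types."""
    answer = AnswerFormat(**json.loads(raw))  # TypeError on missing or extra keys
    for name, expected_type in AnswerFormat.__annotations__.items():
        if not isinstance(getattr(answer, name), expected_type):
            raise TypeError(f"{name} should be {expected_type.__name__}")
    return answer


def is_accurate(answer: AnswerFormat) -> bool:
    """Accuracy check: the values match the facts asserted by the test."""
    return (
        answer.first_name,
        answer.last_name,
        answer.year_of_birth,
        answer.num_seasons_in_nba,
    ) == ("Michael", "Jordan", 1963, 15)
```

A reply that reports 14 seasons would pass `parse_answer` but fail `is_accurate`, which is the failure mode described for llama-3.2-3b-instruct.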
Matthew Farrellee	060b4eb776	allow env NVIDIA_BASE_URL to set NVIDIAConfig.url (#531 ) # What does this PR do? this allows setting an NVIDIA_BASE_URL variable to control the NVIDIAConfig.url option ## Test Plan `pytest -s -v --providers inference=nvidia llama_stack/providers/tests/inference/ --env NVIDIA_BASE_URL=http://localhost:8000` ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-11-26 17:46:44 -08:00
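One way an environment variable can feed a config default, as #531 describes for `NVIDIAConfig.url`, is sketched below. This is not the adapter's actual code: `ExampleNVIDIAConfig` and `DEFAULT_URL` are hypothetical names, a dataclass stands in for the real config class, and the fallback URL is assumed from the hosted endpoint named in the NIM adapter commit (#355).

```python
import os
from dataclasses import dataclass, field

# Assumed fallback: the hosted NIM endpoint mentioned in #355.
DEFAULT_URL = "https://integrate.api.nvidia.com"


@dataclass
class ExampleNVIDIAConfig:
    # default_factory runs at instantiation time, so the environment
    # variable is read when the config object is created, not at import.
    url: str = field(
        default_factory=lambda: os.getenv("NVIDIA_BASE_URL", DEFAULT_URL)
    )
```

An explicit `url=...` argument still takes precedence over the environment variable in this sketch.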
Xi Yan	50cc165077	fixes tests & move braintrust api_keys to request headers (#535 ) # What does this PR do? - braintrust scoring provider requires OPENAI_API_KEY env variable to be set - move this to be able to be set as request headers (e.g. like together / fireworks api keys) - fixes pytest with agents dependency ## Test Plan E2E ``` llama stack run ``` ```yaml scoring: - provider_id: braintrust-0 provider_type: inline::braintrust config: {} ``` Client ```python self.client = LlamaStackClient( base_url=os.environ.get("LLAMA_STACK_ENDPOINT", "http://localhost:5000"), provider_data={ "openai_api_key": os.environ.get("OPENAI_API_KEY", ""), }, ) ``` - run `llama-stack-client eval run_scoring` Unit Test ``` pytest -v -s -m meta_reference_eval_together_inference eval/test_eval.py ``` ``` pytest -v -s -m braintrust_scoring_together_inference scoring/test_scoring.py --env OPENAI_API_KEY=$OPENAI_API_KEY ``` <img width="745" alt="image" src="https://github.com/user-attachments/assets/68f5cdda-f6c8-496d-8b4f-1b3dabeca9c2"> ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-11-26 13:11:21 -08:00
Xi Yan	d3956a1d22	fix description	2024-11-25 22:02:45 -08:00
Xi Yan	2936133f95	precommit	2024-11-25 18:55:54 -08:00
Xi Yan	bbd81231ce	add missing __init__	2024-11-25 17:23:27 -08:00
Dinesh Yeduguru	de7af28756	Tgi fixture (#519 ) # What does this PR do? * Add a test fixture for tgi * Fixes the logic to correctly pass the llama model for chat completion Fixes #514 ## Test Plan pytest -k "tgi" llama_stack/providers/tests/inference/test_text_inference.py --env TGI_URL=http://localhost:$INFERENCE_PORT --env TGI_API_TOKEN=$HF_TOKEN	2024-11-25 13:17:02 -08:00
Xi Yan	60cb7f64af	add missing __init__	2024-11-25 09:42:46 -08:00
Matthew Farrellee	4e6c984c26	add NVIDIA NIM inference adapter (#355 ) # What does this PR do? this PR adds a basic inference adapter to NVIDIA NIMs what it does - - chat completion api - tool calls - streaming - structured output - logprobs - support hosted NIM on integrate.api.nvidia.com - support downloaded NIM containers what it does not do - - completion api - embedding api - vision models - builtin tools - have certainty that sampling strategies are correct ## Feature/Issue validation/testing/test plan `pytest -s -v --providers inference=nvidia llama_stack/providers/tests/inference/ --env NVIDIA_API_KEY=...` all tests should pass. there are pydantic v1 warnings. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Did you read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Was this discussed/approved via a Github issue? Please add a link to it if that's the case. - [ ] Did you make sure to update the documentation with your changes? - [x] Did you write any new necessary tests? Thanks for contributing 🎉!	2024-11-23 15:59:00 -08:00
Ashwin Bharambe	707da55c23	Fix TGI register_model() issue	2024-11-23 08:47:05 -08:00
Martin Hickey	76fc5d9f31	Update Ollama supported llama model list (#483 ) # What does this PR do? Update the llama model supported list for Ollama. - [x] Addresses issue (#462) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>	2024-11-22 21:56:43 -08:00
Dinesh Yeduguru	501e7c9d64	Fix opentelemetry adapter (#510 ) # What does this PR do? This PR fixes some of the issues with our telemetry setup to enable logs to be delivered to opentelemetry and jaeger. Main fixes 1) Updates the open telemetry provider to use the latest OTLP exporters instead of the deprecated ones. 2) Adds a tracing middleware, which injects a trace into each HTTP request that the server receives; this becomes the root trace. Previously, we did this in the create_dynamic_route method, which is not the actual execution flow but more of a config step, and this caused the traces to end prematurely. Through middleware, we plug in the trace start and end at the right location. 3) We manage our own methods to create traces and spans, and this does not fit well with the Opentelemetry SDK since it does not provide a way to take in traces and spans that are already created. It expects us to use the SDK to create them. For now, I have a hacky approach of just maintaining a map from our internal telemetry objects to the open telemetry specific ones. This is not the ideal solution. I will explore other ways to get around this issue. For now, to have something that works, I am going to keep this as is. Addresses: #509	2024-11-22 18:18:11 -08:00
Ashwin Bharambe	97dc5b68e5	model -> model_id for TGI	2024-11-22 15:40:08 -08:00
Dalton Flanagan	b007b062f3	Fix `llama stack build` in 0.0.54 (#505 ) # What does this PR do? Safety provider `inline::meta-reference` is now deprecated. However, we * aren't checking / printing the deprecation message in `llama stack build` * make the deprecated (unusable) provider So I (1) added checking and (2) made `inline::llama-guard` the default ## Test Plan Before ``` Traceback (most recent call last): File "/home/dalton/.conda/envs/nov22/bin/llama", line 8, in <module> sys.exit(main()) File "/home/dalton/all/llama-stack/llama_stack/cli/llama.py", line 46, in main parser.run(args) File "/home/dalton/all/llama-stack/llama_stack/cli/llama.py", line 40, in run args.func(args) File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 177, in _run_stack_build_command self._run_stack_build_command_from_build_config(build_config) File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 305, in _run_stack_build_command_from_build_config self._generate_run_config(build_config, build_dir) File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 226, in _generate_run_config config_type = instantiate_class_type( File "/home/dalton/all/llama-stack/llama_stack/distribution/utils/dynamic.py", line 12, in instantiate_class_type module = importlib.import_module(module_name) File "/home/dalton/.conda/envs/nov22/lib/python3.10/importlib/__init__.py", line 126, in import_module return _bootstrap._gcd_import(name[level:], package, level) File "<frozen importlib._bootstrap>", line 1050, in _gcd_import File "<frozen importlib._bootstrap>", line 1027, in _find_and_load File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked ModuleNotFoundError: No module named 'llama_stack.providers.inline.safety.meta_reference' ``` After ``` Traceback (most recent call last): File "/home/dalton/.conda/envs/nov22/bin/llama", line 8, in <module> sys.exit(main()) File "/home/dalton/all/llama-stack/llama_stack/cli/llama.py", line 46, 
in main parser.run(args) File "/home/dalton/all/llama-stack/llama_stack/cli/llama.py", line 40, in run args.func(args) File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 177, in _run_stack_build_command self._run_stack_build_command_from_build_config(build_config) File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 309, in _run_stack_build_command_from_build_config self._generate_run_config(build_config, build_dir) File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 228, in _generate_run_config raise InvalidProviderError(p.deprecation_error) llama_stack.distribution.resolver.InvalidProviderError: Provider `inline::meta-reference` for API `safety` does not work with the latest Llama Stack. - if you are using Llama Guard v3, please use the `inline::llama-guard` provider instead. - if you are using Prompt Guard, please use the `inline::prompt-guard` provider instead. - if you are using Code Scanner, please use the `inline::code-scanner` provider instead. ``` <img width="469" alt="Screenshot 2024-11-22 at 4 10 24 PM" src="https://github.com/user-attachments/assets/8c2e09fe-379a-4504-b246-7925f80a6ed6"> ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-11-22 16:23:44 -05:00
Ashwin Bharambe	c1025ebfdb	Delete some dead code	2024-11-21 15:20:06 -08:00
Ashwin Bharambe	a0a00f1345	Update telemetry to have TEXT be the default log format	2024-11-21 15:18:45 -08:00
Xi Yan	945db5dac2	fix logging	2024-11-21 15:02:57 -08:00
Ashwin Bharambe	d790be28b3	Don't skip meta-reference for the tests	2024-11-21 13:29:53 -08:00
Xi Yan	654722da7d	fix model id for llm_as_judge_405b	2024-11-21 11:34:49 -08:00
Dinesh Yeduguru	6395dadc2b	use logging instead of prints (#499 ) # What does this PR do? This PR moves all print statements to use logging. Things changed: - Had to add `await start_trace("sse_generator")` to server.py to actually get tracing working. else was not seeing any logs - If no telemetry provider is provided in the run.yaml, we will write to stdout - by default, the logs are going to be in JSON, but we expose an option to configure to output in a human readable way.	2024-11-21 11:32:53 -08:00
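The switch from prints to logging with a configurable output format (#499) might look roughly like the following. This is a minimal sketch under stated assumptions: `JsonFormatter`, `build_logger`, and the `log_format` parameter are hypothetical names, and the real change routes logs through the telemetry API rather than a plain handler.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            }
        )


def build_logger(name: str, log_format: str = "json", stream=None) -> logging.Logger:
    """Return a logger that writes JSON or human-readable text to a stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    logger.propagate = False
    # Fall back to stdout when no stream is given.
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    if log_format == "json":
        handler.setFormatter(JsonFormatter())
    else:
        handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger
```

Calling `build_logger("server", log_format="text").info("started")` would then replace a bare `print("started")`.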
liyunlu0618	4e1105e563	Fix fp8 quantization script. (#500 ) # What does this PR do? Fix fp8 quantization script. ## Test Plan ``` sh run_quantize_checkpoint.sh localhost fp8 /home/yll/fp8_test/ /home/yll/fp8_test/quantized_2 /home/yll/fp8_test/tokenizer.model 1 1 ``` ## Sources Please link relevant resources if necessary. ## Before submitting - [x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [x] Updated relevant documentation. - [x] Wrote necessary unit or integration tests. Co-authored-by: Yunlu Li <yll@meta.com>	2024-11-21 09:15:28 -08:00
Ashwin Bharambe	cd6ccb664c	Integrate distro docs into the restructured docs	2024-11-20 23:20:05 -08:00
Ashwin Bharambe	2411a44833	Update more distribution docs to be simpler and partially codegen'ed	2024-11-20 22:03:44 -08:00
Ashwin Bharambe	e84d4436b5	Since we are pushing for HF repos, we should accept them in inference configs (#497 ) # What does this PR do? As the title says. ## Test Plan This needs `8752149f58` to also land. So the next package (0.0.54) will make this work properly. The test is: ```bash pytest -v -s -m "llama_3b and meta_reference" test_model_registration.py ```	2024-11-20 16:14:37 -08:00
Dinesh Yeduguru	91e7efbc91	fall to back to read from chroma/pgvector when not in cache (#489 ) # What does this PR do? The chroma provider maintains a cache but does not sync up with chroma on a cold start. this change adds a fallback to read from chroma on a cache miss. ## Test Plan ```bash #start stack llama stack run /Users/dineshyv/.llama/distributions/llamastack-together/together-run.yaml # Add documents PYTHONPATH=. python -m examples.agents.rag_with_memory_bank localhost 5000 No available shields. Disable safety. Using model: Llama3.1-8B-Instruct Created session_id=b951b14f-a9d2-43a3-8b80-d80114d58322 for Agent(0687a251-6906-4081-8d4c-f52e19db9dd7) memory_retrieval> Retrieved context from banks: ['test_bank']. ==== Here are the retrieved documents for relevant context: === START-RETRIEVED-CONTEXT === id:num-1; content:_ the template from Llama2 to better support multiturn conversations. The same text in the Lla... > inference> Based on the retrieved documentation, the top 5 topics that were explained are: ............... # Kill stack # Bootup stack llama stack run /Users/dineshyv/.llama/distributions/llamastack-together/together-run.yaml # Run a RAG app with just the agent flow. it discovers the previously added documents No available shields. Disable safety. Using model: Llama3.1-8B-Instruct Created session_id=7a30c1a7-c87e-4787-936c-d0306589fe5d for Agent(b30420f3-c928-498a-887b-d084f0f3806c) memory_retrieval> Retrieved context from banks: ['test_bank']. ==== Here are the retrieved documents for relevant context: === START-RETRIEVED-CONTEXT === id:num-1; content:_ the template from Llama2 to better support multiturn conversations. The same text in the Lla... > inference> Based on the provided documentation, the top 5 topics that were explained are: ..... ```	2024-11-20 10:30:23 -08:00
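The cache-miss fallback described in #489 can be sketched as a read-through cache. The class and parameter names here (`CachedBankIndex`, `load_from_store`) are hypothetical, and a plain callable stands in for the chroma or pgvector lookup.

```python
from typing import Callable, Dict, Optional


class CachedBankIndex:
    """In-memory cache that falls back to the backing store on a miss."""

    def __init__(self, load_from_store: Callable[[str], Optional[dict]]):
        self._cache: Dict[str, dict] = {}
        self._load_from_store = load_from_store

    def get(self, bank_id: str) -> Optional[dict]:
        # Fast path: the bank was registered or read earlier in this process.
        if bank_id in self._cache:
            return self._cache[bank_id]
        # Cold start: the cache is empty, so ask the backing store.
        bank = self._load_from_store(bank_id)
        if bank is not None:
            self._cache[bank_id] = bank
        return bank
```

After a restart the cache starts empty, so the first `get("test_bank")` reaches the store and later calls are served from memory.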
Mengtao Yuan	1086b500f9	Support Tavily as built-in search tool. (#485 ) # What does this PR do? Add Tavily as a built-in search tool, in addition to Brave and Bing. ## Test Plan It's tested using ollama remote, showing parity to the Brave search tool. - Install and run ollama with `ollama run llama3.1:8b-instruct-fp16` - Build ollama distribution `llama stack build --template ollama --image-type conda` - Run ollama `stack run /$USER/.llama/distributions/llamastack-ollama/ollama-run.yaml --port 5001` - Client test command: `python - m agents.test_agents.TestAgents.test_create_agent_turn_with_tavily_search`, with enviroments: MASTER_ADDR=0.0.0.0;MASTER_PORT=5001;RANK=0;REMOTE_STACK_HOST=0.0.0.0;REMOTE_STACK_PORT=5001;TAVILY_SEARCH_API_KEY=tvly-<YOUR-KEY>;WORLD_SIZE=1 Test passes on the specific case (ollama remote). Server output: ``` Listening on ['::', '0.0.0.0']:5001 INFO: Started server process [7220] INFO: Waiting for application startup. INFO: Application startup complete. INFO: Uvicorn running on http://['::', '0.0.0.0']:5001 (Press CTRL+C to quit) INFO: 127.0.0.1:65209 - "POST /agents/create HTTP/1.1" 200 OK INFO: 127.0.0.1:65210 - "POST /agents/session/create HTTP/1.1" 200 OK INFO: 127.0.0.1:65211 - "POST /agents/turn/create HTTP/1.1" 200 OK role='user' content='What are the latest developments in quantum computing?' 
context=None role='assistant' content='' stop_reason=<StopReason.end_of_turn: 'end_of_turn'> tool_calls=[ToolCall(call_id='fc92ccb8-1039-4ce8-ba5e-8f2b0147661c', tool_name=<BuiltinTool.brave_search: 'brave_search'>, arguments={'query': 'latest developments in quantum computing'})] role='ipython' call_id='fc92ccb8-1039-4ce8-ba5e-8f2b0147661c' tool_name=<BuiltinTool.brave_search: 'brave_search'> content='{"query": "latest developments in quantum computing", "top_k": [{"title": "IBM Unveils 400 Qubit-Plus Quantum Processor and Next-Generation IBM ...", "url": "https://newsroom.ibm.com/2022-11-09-IBM-Unveils-400-Qubit-Plus-Quantum-Processor-and-Next-Generation-IBM-Quantum-System-Two", "content": "This system is targeted to be online by the end of 2023 and will be a building b...<more>...onnect large-scale ...", "url": "https://news.mit.edu/2023/quantum-interconnects-photon-emission-0105", "content": "Quantum computers hold the promise of performing certain tasks that are intractable even on the world\'s most powerful supercomputers. In the future, scientists anticipate using quantum computing to emulate materials systems, simulate quantum chemistry, and optimize hard tasks, with impacts potentially spanning finance to pharmaceuticals.", "score": 0.71721, "raw_content": null}]}' Assistant: The latest developments in quantum computing include: * IBM unveiling its 400 qubit-plus quantum processor and next-generation IBM Quantum System Two, which will be a building block of quantum-centric supercomputing. * The development of utility-scale quantum computing, which can serve as a scientific tool to explore utility-scale classes of problems in chemistry, physics, and materials beyond brute force classical simulation of quantum mechanics. 
* The introduction of advanced hardware across IBM's global fleet of 100+ qubit systems, as well as easy-to-use software that users and computational scientists can now obtain reliable results from quantum systems as they map increasingly larger and more complex problems to quantum circuits. * Research on quantum repeaters, which use defects in diamond to interconnect quantum systems and could provide the foundation for scalable quantum networking. * The development of a new source of quantum light, which could be used to improve the efficiency of quantum computers. * The creation of a new mathematical "blueprint" that is accelerating fusion device development using Dyson maps. * Research on canceling noise to improve quantum devices, with MIT researchers developing a protocol to extend the life of quantum coherence. ``` Verified with tool response. The final model response is updated with the search requests. ## Sources ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [x] Updated relevant documentation. - [x] Wrote necessary unit or integration tests. Co-authored-by: Martin Yuan <myuan@meta.com>	2024-11-19 20:59:02 -08:00
Ashwin Bharambe	7bfcfe80b5	Add logs (prints :/) to dump out what URL vllm / tgi is connecting to	2024-11-19 15:50:26 -08:00
Xi Yan	2da93c8835	fix 3.2-1b fireworks	2024-11-19 14:20:07 -08:00
Xi Yan	185df4b568	fix fireworks registration	2024-11-19 14:09:00 -08:00
Ashwin Bharambe	38ba3b9f0c	Fix fireworks stream completion	2024-11-19 13:36:14 -08:00
Ashwin Bharambe	05d1ead02f	Update condition in tests to handle llama-3.1 vs llama3.1 (HF names)	2024-11-19 13:25:36 -08:00
Ashwin Bharambe	84d5f35a48	Update the model alias for llama guard models in ollama	2024-11-19 00:22:24 -08:00
Dinesh Yeduguru	02f1c47416	support adding alias for models without hf repo/sku entry (#481 ) # What does this PR do? adds a new method build_model_alias_with_just_llama_model which is needed for cases like ollama's quantized models which do not really have a repo in hf and an entry in SKU list. ## Test Plan pytest -v -s -m "ollama" llama_stack/providers/tests/inference/test_text_inference.py --------- Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-18 23:50:18 -08:00
Xi Yan	6765fd76ff	fix llama stack build for together & llama stack build from templates (#479 ) # What does this PR do? - Fix issue w/ llama stack build using together template <img width="669" alt="image" src="https://github.com/user-attachments/assets/1cbef052-d902-40b9-98f8-37efb494d117"> - For builds from templates, copy over the `templates/<template-name>/run.yaml` file to the `~/.llama/distributions/<name>/<name>-run.yaml` instead of re-building run config. ## Test Plan ``` $ llama stack build --template together --image-type conda .. Build spec configuration saved at /opt/anaconda3/envs/llamastack-together/together-build.yaml Build Successful! Next steps: 1. Set the environment variables: LLAMASTACK_PORT, TOGETHER_API_KEY 2. `llama stack run /Users/xiyan/.llama/distributions/llamastack-together/together-run.yaml` ``` ``` $ llama stack run /Users/xiyan/.llama/distributions/llamastack-together/together-run.yaml ``` ``` $ llama-stack-client models list $ pytest -v -s -m remote agents/test_agents.py --env REMOTE_STACK_URL=http://localhost:5000 --inference-model meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo ``` <img width="764" alt="image" src="https://github.com/user-attachments/assets/b805b6c5-a316-4561-8fe3-24fc3b1f8b80"> ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-11-18 22:29:16 -08:00
Ashwin Bharambe	ea52a3ee1c	minor enhancement for test fixtures	2024-11-18 22:21:17 -08:00
Kai Wu	d2b7c5aeae	add quantized model ollama support (#471 ) # What does this PR do? add more quantized model support for ollama. - [ ] Addresses issue (#issue) ## Test Plan Tested with ollama docker that run llama3.2 3b 4bit model. ``` root@docker-desktop:/# ollama ps NAME ID SIZE PROCESSOR UNTIL llama3.2:3b a80c4f17acd5 3.5 GB 100% CPU 3 minutes from now ``` ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-11-18 18:55:23 -08:00
Xi Yan	50d539e6d7	update tests --inference-model to hf id	2024-11-18 17:36:58 -08:00
Dinesh Yeduguru	57a9b4d57f	Allow models to be registered as long as llama model is provided (#472 ) This PR allows models to be registered with provider as long as the user specifies a llama model, even though the model does not match our prebuilt provider specific mapping. Test: pytest -v -s llama_stack/providers/tests/inference/test_model_registration.py -m "together" --env TOGETHER_API_KEY=<KEY> --------- Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-18 15:05:29 -08:00
Ashwin Bharambe	2a31163178	Auto-generate distro yamls + docs (#468 ) # What does this PR do? Automatically generates - build.yaml - run.yaml - run-with-safety.yaml - parts of markdown docs for the distributions. ## Test Plan At this point, this only updates the YAMLs and the docs. Some testing (especially with ollama and vllm) has been performed but needs to be much more tested.	2024-11-18 14:57:06 -08:00
Xi Yan	0784284ab5	[Agentic Eval] add ability to run agents generation (#469 ) # What does this PR do? - add ability to run agents generation for full eval (generate + scoring) - pre-register SimpleQA benchmark llm-as-judge scoring function in code ## Test Plan ![image](https://github.com/user-attachments/assets/b4b6f086-1be4-4c2a-8ab0-6839f0067c0a) ![image](https://github.com/user-attachments/assets/05bb7a09-2d7a-4031-8eb6-e1ca670ee439) #### Simple QA w/ Search ![image](https://github.com/user-attachments/assets/0a51e3f3-9fc7-479b-8295-89aed63496e0) - eval_task_config_simpleqa_search.json ```json { "type": "benchmark", "eval_candidate": { "type": "agent", "config": { "model": "Llama3.1-405B-Instruct", "instructions": "Please use the search tool to answer the question.", "sampling_params": { "strategy": "greedy", "temperature": 1.0, "top_p": 0.9 }, "tools": [ { "type": "brave_search", "engine": "brave", "api_key": "API_KEY" } ], "tool_choice": "auto", "tool_prompt_format": "json", "input_shields": [], "output_shields": [], "enable_session_persistence": false } } } ``` #### SimpleQA w/o Search ![image](https://github.com/user-attachments/assets/6301feef-2abb-4bee-b50c-97da1c90482b) ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-11-18 11:43:03 -08:00
Dinesh Yeduguru	57bafd0f8c	fix faiss serialize and serialize of index (#464 ) faiss serialize index returns a np object, that we first need to save to buffer and then write to sqllite. Since we are using json, we need to base64 encode the data. Same in the read path, we base64 decode and read into np array and then call into deserialize index. tests: torchrun $CONDA_PREFIX/bin/pytest -v -s -m "faiss" llama_stack/providers/tests/memory/test_memory.py Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-15 18:02:48 -08:00
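The write and read paths described in #464 amount to a base64 round trip through JSON. The sketch below illustrates only that step: a plain `bytes` payload stands in for the buffer holding the NumPy array that faiss returns, and `encode_index`, `decode_index`, and the `faiss_index` key are hypothetical names.

```python
import base64
import json


def encode_index(index_bytes: bytes) -> str:
    """Write path: binary index -> base64 text -> JSON document."""
    encoded = base64.b64encode(index_bytes).decode("ascii")
    return json.dumps({"faiss_index": encoded})


def decode_index(payload: str) -> bytes:
    """Read path: JSON document -> base64 text -> binary index."""
    encoded = json.loads(payload)["faiss_index"]
    return base64.b64decode(encoded)
```

JSON cannot carry raw bytes, which is why the binary buffer is base64 encoded before it is stored as a string value.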
Dinesh Yeduguru	ff99025875	await initialize in faiss (#463 ) tests: ``` torchrun $CONDA_PREFIX/bin/pytest -v -s -m "faiss" llama_stack/providers/tests/memory/test_memory.py ``` Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-15 14:21:31 -08:00
Xi Yan	e8112b31ab	move hf adapter->remote (#459 ) # What does this PR do? - move folder ## Test Plan Unit Test ``` pytest -v -s -m "huggingface" datasetio/test_datasetio.py ``` E2E ``` llama stack run ``` ``` llama-stack-client eval run_benchmark meta-reference-mmlu --num-examples 5 --output-dir ./ --eval-task-config ~/eval_task_config.json --visualize ``` <img width="657" alt="image" src="https://github.com/user-attachments/assets/63d53f9d-6c7e-4667-af8c-9d16c91ae6e3"> ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-11-14 22:41:19 -05:00
Xi Yan	788411b680	categorical score for llm as judge	2024-11-14 22:33:59 -05:00
Dinesh Yeduguru	0850ad656a	unregister for memory banks and remove update API (#458 ) The semantics of an Update on resources is very tricky to reason about especially for memory banks and models. The best way to go forward here is for the user to unregister and register a new resource. We don't have a compelling reason to support update APIs. Tests: pytest -v -s llama_stack/providers/tests/memory/test_memory.py -m "chroma" --env CHROMA_HOST=localhost --env CHROMA_PORT=8000 pytest -v -s llama_stack/providers/tests/memory/test_memory.py -m "pgvector" --env PGVECTOR_DB=postgres --env PGVECTOR_USER=postgres --env PGVECTOR_PASSWORD=mysecretpassword --env PGVECTOR_HOST=0.0.0.0 $CONDA_PREFIX/bin/pytest -v -s -m "ollama" llama_stack/providers/tests/inference/test_model_registration.py --------- Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-14 17:12:11 -08:00
Xi Yan	2eab3b7ed9	skip aggregation for llm_as_judge	2024-11-14 17:50:46 -05:00
Xi Yan	58381dbe78	local persistence for eval tasks (#453 ) # What does this PR do? - add local persistence for eval tasks - follow https://github.com/meta-llama/llama-stack/pull/375 ## Test Plan 1. fresh llama stack run 2. kill server 3. restart server: llama stack run <img width="690" alt="image" src="https://github.com/user-attachments/assets/3d76e477-b91a-43a6-86ea-8e3ef2d04ed3"> Using run.yaml ```yaml eval_tasks: - eval_task_id: meta-reference-mmlu provider_id: meta-reference-0 dataset_id: mmlu scoring_functions: - basic::regex_parser_multiple_choice_answer ``` ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-11-14 10:36:23 -05:00
Dinesh Yeduguru	efe791bab7	Support model resource updates and deletes (#452 ) # What does this PR do? * Changes the registry to store only one RoutableObject per identifier. Before it was a list, which is not really required. * Adds impl for updates and deletes * Updates routing table to handle updates correctly ## Test Plan ``` ❯ llama-stack-client models list +------------------------+---------------+------------------------------------+------------+ \| identifier \| provider_id \| provider_resource_id \| metadata \| +========================+===============+====================================+============+ \| Llama3.1-405B-Instruct \| fireworks-0 \| fireworks/llama-v3p1-405b-instruct \| {} \| +------------------------+---------------+------------------------------------+------------+ \| Llama3.1-8B-Instruct \| fireworks-0 \| fireworks/llama-v3p1-8b-instruct \| {} \| +------------------------+---------------+------------------------------------+------------+ \| Llama3.2-3B-Instruct \| fireworks-0 \| fireworks/llama-v3p2-1b-instruct \| {} \| +------------------------+---------------+------------------------------------+------------+ ❯ llama-stack-client models register dineshyv-model --provider-model-id=fireworks/llama-v3p1-70b-instruct Successfully registered model dineshyv-model ❯ llama-stack-client models list +------------------------+---------------+------------------------------------+------------+ \| identifier \| provider_id \| provider_resource_id \| metadata \| +========================+===============+====================================+============+ \| Llama3.1-405B-Instruct \| fireworks-0 \| fireworks/llama-v3p1-405b-instruct \| {} \| +------------------------+---------------+------------------------------------+------------+ \| Llama3.1-8B-Instruct \| fireworks-0 \| fireworks/llama-v3p1-8b-instruct \| {} \| +------------------------+---------------+------------------------------------+------------+ \| Llama3.2-3B-Instruct \| fireworks-0 \| 
fireworks/llama-v3p2-1b-instruct \| {} \| +------------------------+---------------+------------------------------------+------------+ \| dineshyv-model \| fireworks-0 \| fireworks/llama-v3p1-70b-instruct \| {} \| +------------------------+---------------+------------------------------------+------------+ ❯ llama-stack-client models update dineshyv-model --provider-model-id=fireworks/llama-v3p1-405b-instruct Successfully updated model dineshyv-model ❯ llama-stack-client models list +------------------------+---------------+------------------------------------+------------+ \| identifier \| provider_id \| provider_resource_id \| metadata \| +========================+===============+====================================+============+ \| Llama3.1-405B-Instruct \| fireworks-0 \| fireworks/llama-v3p1-405b-instruct \| {} \| +------------------------+---------------+------------------------------------+------------+ \| Llama3.1-8B-Instruct \| fireworks-0 \| fireworks/llama-v3p1-8b-instruct \| {} \| +------------------------+---------------+------------------------------------+------------+ \| Llama3.2-3B-Instruct \| fireworks-0 \| fireworks/llama-v3p2-1b-instruct \| {} \| +------------------------+---------------+------------------------------------+------------+ \| dineshyv-model \| fireworks-0 \| fireworks/llama-v3p1-405b-instruct \| {} \| +------------------------+---------------+------------------------------------+------------+ llama-stack-client models delete dineshyv-model ❯ llama-stack-client models list +------------------------+---------------+------------------------------------+------------+ \| identifier \| provider_id \| provider_resource_id \| metadata \| +========================+===============+====================================+============+ \| Llama3.1-405B-Instruct \| fireworks-0 \| fireworks/llama-v3p1-405b-instruct \| {} \| +------------------------+---------------+------------------------------------+------------+ \| Llama3.1-8B-Instruct \| 
fireworks-0 \| fireworks/llama-v3p1-8b-instruct \| {} \| +------------------------+---------------+------------------------------------+------------+ \| Llama3.2-3B-Instruct \| fireworks-0 \| fireworks/llama-v3p2-1b-instruct \| {} \| +------------------------+---------------+------------------------------------+------------+ ``` --------- Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-13 21:55:41 -08:00
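The registry change in #452, one object per identifier with register, update, and delete, can be sketched with a dictionary keyed by identifier. `ModelEntry` and `ModelRegistry` are hypothetical names for illustration and are not the project's `RoutableObject` classes; the identifiers and provider values are the ones from the test plan above.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class ModelEntry:
    identifier: str
    provider_id: str
    provider_resource_id: str


class ModelRegistry:
    """Stores exactly one entry per identifier."""

    def __init__(self) -> None:
        self._entries: Dict[str, ModelEntry] = {}

    def register(self, entry: ModelEntry) -> None:
        if entry.identifier in self._entries:
            raise ValueError(f"{entry.identifier} is already registered")
        self._entries[entry.identifier] = entry

    def update(self, identifier: str, **changes: str) -> ModelEntry:
        # replace() builds a new frozen entry with the changed fields.
        updated = replace(self._entries[identifier], **changes)
        self._entries[identifier] = updated
        return updated

    def delete(self, identifier: str) -> None:
        del self._entries[identifier]

    def get(self, identifier: str) -> ModelEntry:
        return self._entries[identifier]
```

Keying on the identifier makes an update a simple overwrite, which a list of objects per identifier would not allow without extra bookkeeping.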
Xi Yan	4253cfcd7f	local persistence for hf dataset provider (#451 ) # What does this PR do? - local persistence for HF dataset provider - follow https://github.com/meta-llama/llama-stack/pull/375 ## Test Plan e2e 1. fresh llama stack run w/ yaml 2. kill server 3. restart llama stack run w/ yaml ```yaml datasets: - dataset_id: mmlu provider_id: huggingface-0 url: uri: https://huggingface.co/datasets/llamastack/evals metadata: path: llamastack/evals name: evals__mmlu__details split: train dataset_schema: input_query: type: string expected_answer: type: string ``` <img width="686" alt="image" src="https://github.com/user-attachments/assets/d7737931-6a7d-400a-a17d-fef6cbd97eea"> ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-11-14 00:08:37 -05:00
Dinesh Yeduguru	787e2034b7	model registration in ollama and vllm check against the available models in the provider (#446 ) tests: pytest -v -s -m "ollama" llama_stack/providers/tests/inference/test_text_inference.py pytest -v -s -m vllm_remote llama_stack/providers/tests/inference/test_text_inference.py --env VLLM_URL="http://localhost:9798/v1" ---------	2024-11-13 13:04:06 -08:00


416 commits