llama-stack

640 commits 21 branches 64 tags 62 MiB

Author	SHA1	Message	Date
Matthew Farrellee	435f34b05e	reduce the accuracy requirements to pass the chat completion structured output test (#522 ) i find `test_structured_output` to be flakey. it's both a functionality and accuracy test - ``` answer = AnswerFormat.model_validate_json(response.completion_message.content) assert answer.first_name == "Michael" assert answer.last_name == "Jordan" assert answer.year_of_birth == 1963 assert answer.num_seasons_in_nba == 15 ``` it's an accuracy test because it checks the value of first/last name, birth year, and num seasons. i find that - - llama-3.1-8b-instruct and llama-3.2-3b-instruct pass the functionality portion - llama-3.2-3b-instruct consistently fails the accuracy portion (thinking MJ was in the NBA for 14 seasons) - llama-3.1-8b-instruct occasionally fails the accuracy portion suggestions (not mutually exclusive) - 1. turn the test into functionality only, skip the value checks 2. split the test into a functionality version and an xfail accuracy version 3. add context to the prompt so the llm can answer without accessing embedded memory # What does this PR do? implements option (3) by adding context to the system prompt. ## Test Plan `pytest -s -v ... llama_stack/providers/tests/inference/ ... -k structured_output` ## Before submitting - [x] Ran pre-commit to handle lint / formatting issues. - [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [x] Updated relevant documentation. - [x] Wrote necessary unit or integration tests.	2024-12-03 02:55:14 -08:00
Matthew Farrellee	4e6c984c26	add NVIDIA NIM inference adapter (#355 ) # What does this PR do? this PR adds a basic inference adapter to NVIDIA NIMs what it does - - chat completion api - tool calls - streaming - structured output - logprobs - support hosted NIM on integrate.api.nvidia.com - support downloaded NIM containers what it does not do - - completion api - embedding api - vision models - builtin tools - have certainty that sampling strategies are correct ## Feature/Issue validation/testing/test plan `pytest -s -v --providers inference=nvidia llama_stack/providers/tests/inference/ --env NVIDIA_API_KEY=...` all tests should pass. there are pydantic v1 warnings. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Did you read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Was this discussed/approved via a Github issue? Please add a link to it if that's the case. - [ ] Did you make sure to update the documentation with your changes? - [x] Did you write any new necessary tests? Thanks for contributing 🎉!	2024-11-23 15:59:00 -08:00
Ashwin Bharambe	d790be28b3	Don't skip meta-reference for the tests	2024-11-21 13:29:53 -08:00
Ashwin Bharambe	05d1ead02f	Update condition in tests to handle llama-3.1 vs llama3.1 (HF names)	2024-11-19 13:25:36 -08:00
Ashwin Bharambe	12947ac19e	Kill "remote" providers and fix testing with a remote stack properly (#435 ) # What does this PR do? This PR kills the notion of "pure passthrough" remote providers. You cannot specify a single provider you must specify a whole distribution (stack) as remote. This PR also significantly fixes / upgrades testing infrastructure so you can now test against a remotely hosted stack server by just doing ```bash pytest -s -v -m remote test_agents.py \ --inference-model=Llama3.1-8B-Instruct --safety-shield=Llama-Guard-3-1B \ --env REMOTE_STACK_URL=http://localhost:5001 ``` Also fixed `test_agents_persistence.py` (which was broken) and killed some deprecated testing functions. ## Test Plan All the tests.	2024-11-12 21:51:29 -08:00
Dinesh Yeduguru	fdff24e77a	Inference to use provider resource id to register and validate (#428 ) This PR changes the way model id gets translated to the final model name that gets passed through the provider. Major changes include: 1) Providers are responsible for registering an object and as part of the registration returning the object with the correct provider specific name of the model provider_resource_id 2) To help with the common look ups different names a new ModelLookup class is created. Tested all inference providers including together, fireworks, vllm, ollama, meta reference and bedrock	2024-11-12 20:02:00 -08:00
Dinesh Yeduguru	ec644d3418	migrate model to Resource and new registration signature (#410 ) * resource oriented object design for models * add back llama_model field * working tests * register singature fix * address feedback --------- Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-08 16:12:57 -08:00
Ashwin Bharambe	7c340f0236	rename test_inference -> test_text_inference	2024-11-06 16:12:50 -08:00

Renamed from llama_stack/providers/tests/inference/test_inference.py (Browse further)

8 commits