Commit graph

384 commits

Author SHA1 Message Date
Xi Yan
a097bfa761 override faiss memory provider only in run.yaml 2024-12-03 20:41:44 -08:00
Xi Yan
eeb914fe4d add all providers 2024-12-03 20:37:36 -08:00
Xi Yan
e6ed7eabbb add all providers 2024-12-03 20:31:34 -08:00
Xi Yan
1dc337db8b Merge branch 'playground-ui' into ui-compose 2024-12-03 19:12:16 -08:00
Xi Yan
2da1e742e8 Merge branch 'main' into playground-ui 2024-12-03 19:11:49 -08:00
Xi Yan
6e10d0b23e precommit 2024-12-03 18:52:43 -08:00
Xi Yan
fd19a8a517 add missing __init__ 2024-12-03 18:50:18 -08:00
Xi Yan
3fc6b10d22 autogen build/run 2024-12-03 17:04:35 -08:00
Xi Yan
95187891ca add eval provider to distro 2024-12-03 17:01:33 -08:00
Xi Yan
f32092178e native eval flow refactor 2024-12-03 16:29:43 -08:00
Xi Yan
92f79d4dfb expander refactor 2024-12-03 16:20:31 -08:00
Xi Yan
e245f459bb requirements 2024-12-03 16:05:01 -08:00
Matthew Farrellee
435f34b05e
reduce the accuracy requirements to pass the chat completion structured output test (#522)
i find `test_structured_output` to be flaky. it's both a functionality
and an accuracy test -

```
        answer = AnswerFormat.model_validate_json(response.completion_message.content)
        assert answer.first_name == "Michael"
        assert answer.last_name == "Jordan"
        assert answer.year_of_birth == 1963
        assert answer.num_seasons_in_nba == 15
```

it's an accuracy test because it checks the values of first/last name,
birth year, and num seasons.

i find that -
- llama-3.1-8b-instruct and llama-3.2-3b-instruct pass the functionality
portion
- llama-3.2-3b-instruct consistently fails the accuracy portion
(thinking MJ was in the NBA for 14 seasons)
- llama-3.1-8b-instruct occasionally fails the accuracy portion

suggestions (not mutually exclusive) -
1. turn the test into functionality only, skip the value checks
2. split the test into a functionality version and an xfail accuracy
version
3. add context to the prompt so the llm can answer without accessing
embedded memory

# What does this PR do?

implements option (3) by adding context to the system prompt.
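
A hedged sketch of option (3): the facts live in the system prompt, so the test exercises structured-output functionality rather than model recall (the prompt wording and the commented client call are illustrative, not the PR's exact code):

```python
from pydantic import BaseModel


class AnswerFormat(BaseModel):
    first_name: str
    last_name: str
    year_of_birth: int
    num_seasons_in_nba: int


# The system prompt supplies the facts, so a correct answer no longer
# depends on what the model memorized during training.
messages = [
    {
        "role": "system",
        "content": "Michael Jordan was born in 1963. "
        "He played basketball for 15 seasons in the NBA.",
    },
    {"role": "user", "content": "Please give me information about Michael Jordan."},
]

# response = client.inference.chat_completion(...)  # provider under test
# answer = AnswerFormat.model_validate_json(response.completion_message.content)
```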


## Test Plan


`pytest -s -v ... llama_stack/providers/tests/inference/ ... -k
structured_output`


## Before submitting

- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [x] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.
2024-12-03 02:55:14 -08:00
Xi Yan
114595ce71 navigation 2024-12-02 20:11:32 -08:00
Xi Yan
06b9566eb6 more msg 2024-12-02 16:14:04 -08:00
Xi Yan
0e718b9712 native eval 2024-12-02 15:49:34 -08:00
Xi Yan
b59810cd9a native eval 2024-12-02 15:38:58 -08:00
Xi Yan
de2ab1243a native eval 2024-12-02 14:36:17 -08:00
Xi Yan
2f7e39fb10 fix 2024-12-02 13:20:23 -08:00
Xi Yan
6bdad37372 readme 2024-12-02 13:14:15 -08:00
Xi Yan
3335bcd83d cleanup 2024-12-02 13:12:44 -08:00
Xi Yan
7f2ed9622c cleanup 2024-12-02 13:06:36 -08:00
Xi Yan
9bceb1912e Merge branch 'main' into playground-ui 2024-12-02 12:44:50 -08:00
Jeffrey Lind
5fc2ee6f77
Fix URLs to Llama Stack Read the Docs Webpages (#547)
# What does this PR do?

Many of the URLs pointing to the Llama Stack Read the Docs webpages
were broken, presumably due to a recent refactor of the documentation.
This PR fixes all affected URLs throughout the repository.
2024-11-29 10:11:50 -06:00
Xi Yan
9bb6c1346b rag page 2024-11-27 16:56:57 -08:00
Xi Yan
2ecbbd92ed rag page 2024-11-27 16:52:07 -08:00
Xi Yan
5d9faca81b distribution inspect 2024-11-27 16:03:58 -08:00
Xi Yan
73335e4aaf playground 2024-11-27 15:31:57 -08:00
Xi Yan
68b70d1b1f playground 2024-11-27 15:27:10 -08:00
Xi Yan
c544e4b015 chat playground 2024-11-27 15:11:27 -08:00
Xi Yan
b1a63df8cd
move playground ui to llama-stack repo (#536)
# What does this PR do?

- Move Llama Stack Playground UI to llama-stack repo under
llama_stack/distribution
- Original PR in llama-stack-apps:
https://github.com/meta-llama/llama-stack-apps/pull/127

## Test Plan
```
cd llama-stack/llama_stack/distribution/ui
streamlit run app.py
```


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-11-26 22:04:21 -08:00
Xi Yan
371259ca5b readme 2024-11-26 22:02:29 -08:00
Xi Yan
8840cf1d9a readme 2024-11-26 20:16:39 -08:00
Xi Yan
2c8a7a972c rename playground-ui -> ui 2024-11-26 20:15:41 -08:00
Xi Yan
d467638f26 move playground ui to llama-stack repo 2024-11-26 19:57:00 -08:00
Xi Yan
c2cfd2261e move playground ui to llama-stack repo 2024-11-26 19:54:24 -08:00
Matthew Farrellee
060b4eb776
allow env NVIDIA_BASE_URL to set NVIDIAConfig.url (#531)
# What does this PR do?

this allows setting an NVIDIA_BASE_URL environment variable to control
the NVIDIAConfig.url option
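
a sketch of the pattern (assuming a pydantic config; the hosted endpoint shown as the default is illustrative):

```python
import os

from pydantic import BaseModel, Field


class NVIDIAConfig(BaseModel):
    # read NVIDIA_BASE_URL at config-construction time, falling back to
    # the hosted endpoint when the variable is unset
    url: str = Field(
        default_factory=lambda: os.environ.get(
            "NVIDIA_BASE_URL", "https://integrate.api.nvidia.com"
        )
    )
```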


## Test Plan

`pytest -s -v --providers inference=nvidia
llama_stack/providers/tests/inference/ --env
NVIDIA_BASE_URL=http://localhost:8000`


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-11-26 17:46:44 -08:00
Xi Yan
50cc165077
fixes tests & move braintrust api_keys to request headers (#535)
# What does this PR do?

- the braintrust scoring provider requires the OPENAI_API_KEY env
variable to be set
- allow it to be passed via request headers instead (e.g. like the
together / fireworks api keys)
- fixes pytest with the agents dependency

## Test Plan

**E2E**
```
llama stack run 
```
```yaml
scoring:
  - provider_id: braintrust-0
    provider_type: inline::braintrust
    config: {}
```

**Client**
```python
self.client = LlamaStackClient(
    base_url=os.environ.get("LLAMA_STACK_ENDPOINT", "http://localhost:5000"),
    provider_data={
        "openai_api_key": os.environ.get("OPENAI_API_KEY", ""),
    },
)
```
- run `llama-stack-client eval run_scoring`
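
For reference, a hedged sketch of what the client does with `provider_data` — it is serialized into a request header that the stack reads server-side (the header name here is an assumption, not necessarily the exact one used):

```python
import json
import os

# Assumed header name; llama-stack transmits provider data in a
# dedicated request header, whose exact name may differ by version.
PROVIDER_DATA_HEADER = "X-LlamaStack-ProviderData"

headers = {
    PROVIDER_DATA_HEADER: json.dumps(
        {"openai_api_key": os.environ.get("OPENAI_API_KEY", "")}
    )
}
# The braintrust scoring provider can then resolve the key per request
# instead of requiring OPENAI_API_KEY in the server's environment.
```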

**Unit Test**
```
pytest -v -s -m meta_reference_eval_together_inference eval/test_eval.py
```

```
pytest -v -s -m braintrust_scoring_together_inference scoring/test_scoring.py --env OPENAI_API_KEY=$OPENAI_API_KEY
```
<img width="745" alt="image"
src="https://github.com/user-attachments/assets/68f5cdda-f6c8-496d-8b4f-1b3dabeca9c2">

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-11-26 13:11:21 -08:00
Xi Yan
d3956a1d22 fix description 2024-11-25 22:02:45 -08:00
Xi Yan
2936133f95 precommit 2024-11-25 18:55:54 -08:00
Xi Yan
bbd81231ce add missing __init__ 2024-11-25 17:23:27 -08:00
Dinesh Yeduguru
de7af28756
Tgi fixture (#519)
# What does this PR do?

* Add a test fixture for tgi
* Fixes the logic to correctly pass the llama model for chat completion

Fixes #514
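
A minimal sketch of what such a fixture can look like (names and config fields are illustrative, not the PR's exact fixture):

```python
import os

import pytest


@pytest.fixture(scope="session")
def tgi_inference_provider():
    # Skip rather than fail when no TGI endpoint is configured, so the
    # suite still runs for people without a local TGI server.
    url = os.environ.get("TGI_URL")
    if not url:
        pytest.skip("TGI_URL not set")
    return {
        "provider_type": "remote::tgi",
        "config": {"url": url, "api_token": os.environ.get("TGI_API_TOKEN")},
    }
```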

## Test Plan

pytest -k "tgi"
llama_stack/providers/tests/inference/test_text_inference.py --env
TGI_URL=http://localhost:$INFERENCE_PORT --env TGI_API_TOKEN=$HF_TOKEN
2024-11-25 13:17:02 -08:00
Xi Yan
60cb7f64af add missing __init__ 2024-11-25 09:42:46 -08:00
Ashwin Bharambe
34be07e0df Ensure model_local_dir does not mangle "C:\" on Windows 2024-11-24 14:18:59 -08:00
Matthew Farrellee
4e6c984c26
add NVIDIA NIM inference adapter (#355)
# What does this PR do?

this PR adds a basic inference adapter to NVIDIA NIMs

what it does -
 - chat completion api
   - tool calls
   - streaming
   - structured output
   - logprobs
 - support hosted NIM on integrate.api.nvidia.com
 - support downloaded NIM containers

what it does not do -
 - completion api
 - embedding api
 - vision models
 - builtin tools
 - have certainty that sampling strategies are correct
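
a hedged sketch of exercising the adapter's chat completion path (client construction and the model id are illustrative, and exact parameter names may differ by client version):

```python
from llama_stack_client import LlamaStackClient

# assumes a running stack configured with the nvidia inference provider
client = LlamaStackClient(base_url="http://localhost:5000")

response = client.inference.chat_completion(
    model="Llama3.1-8B-Instruct",  # example model id
    messages=[{"role": "user", "content": "Say hello."}],
    stream=False,
)
print(response.completion_message.content)
```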

## Feature/Issue validation/testing/test plan

`pytest -s -v --providers inference=nvidia
llama_stack/providers/tests/inference/ --env NVIDIA_API_KEY=...`

all tests should pass. there are pydantic v1 warnings.


## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Did you read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Was this discussed/approved via a Github issue? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
- [x] Did you write any new necessary tests?

Thanks for contributing 🎉!
2024-11-23 15:59:00 -08:00
Ashwin Bharambe
2cfc41e13b Mark some pages as not-in-toctree explicitly 2024-11-23 15:27:44 -08:00
Ashwin Bharambe
358db3c5b6 No need to use os.path.relpath() when Path() knows everything anyway 2024-11-23 11:45:47 -08:00
Ashwin Bharambe
707da55c23 Fix TGI register_model() issue 2024-11-23 08:47:05 -08:00
Martin Hickey
76fc5d9f31
Update Ollama supported llama model list (#483)
# What does this PR do?

Update the list of supported llama models for Ollama.

- [x] Addresses issue (#462)

Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
2024-11-22 21:56:43 -08:00
Dinesh Yeduguru
501e7c9d64
Fix opentelemetry adapter (#510)
# What does this PR do?

This PR fixes some of the issues with our telemetry setup to enable logs
to be delivered to OpenTelemetry and Jaeger. Main fixes:
1) Updates the OpenTelemetry provider to use the latest OTLP exporters
instead of deprecated ones.
2) Adds a tracing middleware that injects a trace into each HTTP request
the server receives; this becomes the root trace. Previously, we did
this in the create_dynamic_route method, which is really configuration
rather than the actual execution flow, and this caused traces to end
prematurely. Through middleware, we plug in the trace start and end at
the right locations (see the sketch after this list).
3) We manage our own methods to create traces and spans, which does not
fit well with the OpenTelemetry SDK since it does not provide a way to
take in traces and spans that were already created; it expects us to use
the SDK to create them. For now, I have a hacky approach of maintaining
a map from our internal telemetry objects to the OpenTelemetry-specific
ones. This is not the ideal solution, and I will explore other ways
around this issue; for now, to have something that works, I am keeping
it as is.
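
A minimal sketch of the middleware idea in (2), using raw ASGI (the class shape is illustrative; `start_trace`/`end_trace` stand in for our internal telemetry helpers, and the module path is an assumption):

```python
# Illustrative ASGI middleware: open the root trace when a request
# arrives and close it only after the response has been fully sent.
from llama_stack.providers.utils.telemetry.tracing import (  # assumed module path
    end_trace,
    start_trace,
)


class TracingMiddleware:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            return await self.app(scope, receive, send)
        await start_trace(scope["path"])  # root trace for this request
        try:
            return await self.app(scope, receive, send)
        finally:
            await end_trace()  # ends only after the response is sent
```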

Addresses: #509
2024-11-22 18:18:11 -08:00