llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-10-15 22:47:59 +00:00

Author	SHA1	Message	Date
IAN MILLER	007efa6eb5	refactor: replace default all-MiniLM-L6-v2 embedding model by nomic-embed-text-v1.5 in Llama Stack (#3183) # What does this PR do? The purpose of this PR is to replace Llama Stack's default embedding model with nomic-embed-text-v1.5. These are the key reasons why the Llama Stack community decided to switch from all-MiniLM-L6-v2 to nomic-embed-text-v1.5: 1. The training data for [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2#training-data) includes many data sets with various licensing terms, so it is tricky to know when/whether it is appropriate to use this model for commercial applications. 2. The model is not particularly competitive on major benchmarks. For example, if you look at the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) and click on Miscellaneous/BEIR to see English information retrieval accuracy, you see that the top of the leaderboard is dominated by enormous models but also that there are many models of relatively modest size with much higher Retrieval scores. If you want to look closely at the data, I recommend clicking "Download Table" because it is easier to browse that way. More discussion can be found [here](https://github.com/llamastack/llama-stack/issues/2418) Closes #2418 ## Test Plan 1. Run `./scripts/unit-tests.sh` 2. Integration tests via CI workflow --------- Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com> Co-authored-by: Sébastien Han <seb@redhat.com>	2025-10-14 10:44:20 -04:00
Ashwin Bharambe	2665f00102	chore(rename): move llama_stack.distribution to llama_stack.core (#2975) We would like to rename the term `template` to `distribution`. This change is a precursor to prepare for that. cc @leseb	2025-07-30 23:30:53 -07:00
Christian Zaccaria	feb9eb8b0d	docs: Remove datasets.rst and fix llama-stack build commands (#2061) # Issue Closes #2073 # What does this PR do? - Removes `datasets.rst` from the list of document URLs, as it no longer exists in torchtune. Referenced PR: https://github.com/pytorch/torchtune/pull/1781 - Adds a step to run `uv sync`. Previously, I would get the following error: ``` ➜ llama-stack git:(remove-deprecated-rst) uv venv --python 3.10 source .venv/bin/activate Using CPython 3.10.13 interpreter at: /usr/bin/python3.10 Creating virtual environment at: .venv Activate with: source .venv/bin/activate (llama-stack) ➜ llama-stack git:(remove-deprecated-rst) INFERENCE_MODEL=llama3.2:3b llama stack build --template ollama --image-type venv --run zsh: llama: command not found... ``` ## Test Plan To test: Run through the `rag_agent` example in the `detailed_tutorial.md` file.	2025-05-06 09:51:20 -07:00
Xi Yan	094eb6a5ae	feat(rag): entire document context with attachments (#1763) # What does this PR do? What Instead of creating a vectordb ad hoc and chunking when documents are sent as an attachment to an agent turn, we directly pass the raw text from the document into the messages to the model as user context, and let the model perform summarization directly. This removes the magic behaviour, and yields better performance than the existing approach. Improved Performance - RAG lifecycle notebook - Model: 0.3 factuality score - (+ websearch) Agent: 0.44 factuality score - (+ vector db) Agent: 0.3 factuality score - (+ raw context) Agent: 0.6 factuality score Closes https://github.com/meta-llama/llama-stack/issues/1478 ## Test Plan - [NEW] added section in RAG lifecycle notebook shows better performance	2025-03-23 16:57:48 -07:00
ehhuang	ea6a4a14ce	feat(api): simplify client imports (#1687) # What does this PR do? Closes #1554 ## Test Plan test_agents.py	2025-03-20 10:15:49 -07:00
Xi Yan	b8c519ba11	feat: rag eval lifecycle notebook (#1458) # What does this PR do? - Add RAG eval lifecycle notebook - Closes https://github.com/meta-llama/llama-stack/issues/1113 - Best reviewed in https://github.com/meta-llama/llama-stack/blob/rag_eval_notebook/docs/notebooks/Llama_Stack_RAG_Lifecycle.ipynb ## Test Plan Run notebook	2025-03-07 10:41:50 -08:00

6 commits