llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-03 18:00:36 +00:00

History

Xi Yan 094eb6a5ae feat(rag): entire document context with attachments (#1763 ) # What does this PR do? What Instead of adhoc creating a vectordb and chunking when documents ae sent as an attachment to agent turn, we directly pass raw text from document into messages to model for user context, and let model perform summarization directly. This removes the magic behaviour, and yields better performance than existing approach. Improved Performance - RAG lifecycle notebook - Model: 0.3 factuality score - (+ websearch) Agent: 0.44 factuality score - (+ vector db) Agent: 0.3 factuality score - (+ raw context) Agent: 0.6 factuality score Closes https://github.com/meta-llama/llama-stack/issues/1478 [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan - [NEW] added section in RAG lifecycle notebook shows better performance <img width="840" alt="image" src="https://github.com/user-attachments/assets/a0c4e816-809a-41c0-9124-89825983e3f5" /> [//]: # (## Documentation)		2025-03-23 16:57:48 -07:00
..
inline	feat(rag): entire document context with attachments (#1763 )	2025-03-23 16:57:48 -07:00
registry	fix: Add 'accelerate' dependency to 'prompt-guard' (#1724 )	2025-03-21 07:37:20 -07:00
remote	fix: Updating `ToolCall.arguments` to allow for json strings that can be decoded on client side (#1685 )	2025-03-19 10:36:19 -07:00
tests	refactor(test): introduce --stack-config and simplify options (#1404 )	2025-03-05 17:02:02 -08:00
utils	feat: use same trace ids in stack and otel (#1759 )	2025-03-21 15:41:26 -07:00
__init__.py	API Updates (#73 )	2024-09-17 19:51:35 -07:00
datatypes.py	chore: move all Llama Stack types from llama-models to llama-stack (#1098 )	2025-02-14 09:10:59 -08:00