llama-stack-mirror/llama_stack/providers/inline
Xi Yan 094eb6a5ae
feat(rag): entire document context with attachments (#1763)
# What does this PR do?
**What**
Instead of adhoc creating a vectordb and chunking when documents ae sent
as an attachment to agent turn, we directly pass raw text from document
into messages to model for user context, and let model perform
summarization directly.

This removes the magic behaviour, and yields better performance than
existing approach.

**Improved Performance**
- RAG lifecycle notebook
  - Model: 0.3 factuality score
  - (+ websearch) Agent: 0.44 factuality score
  - (+ vector db) Agent: 0.3 factuality score
  - (+ raw context) Agent: 0.6 factuality score

Closes https://github.com/meta-llama/llama-stack/issues/1478

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
- [NEW] added section in RAG lifecycle notebook shows better performance

<img width="840" alt="image"
src="https://github.com/user-attachments/assets/a0c4e816-809a-41c0-9124-89825983e3f5"
/>


[//]: # (## Documentation)
2025-03-23 16:57:48 -07:00
..
agents feat(rag): entire document context with attachments (#1763) 2025-03-23 16:57:48 -07:00
datasetio fix: Call pandas.read_* in a seperate thread (#1698) 2025-03-19 10:46:37 -07:00
eval fix: fix jobs api literal return type (#1757) 2025-03-21 14:04:21 -07:00
inference fix: Updating ToolCall.arguments to allow for json strings that can be decoded on client side (#1685) 2025-03-19 10:36:19 -07:00
ios/inference chore: removed executorch submodule (#1265) 2025-02-25 21:57:21 -08:00
post_training chore: fix mypy violations in post_training modules (#1548) 2025-03-18 14:58:16 -07:00
safety feat(agent): support multiple tool groups (#1556) 2025-03-17 22:13:09 -07:00
scoring fix: a couple of tests were broken and not yet exercised by our per-PR test workflow 2025-03-21 12:12:14 -07:00
telemetry feat: use same trace ids in stack and otel (#1759) 2025-03-21 15:41:26 -07:00
tool_runtime chore: mypy violations cleanup for inline::{telemetry,tool_runtime,vector_io} (#1711) 2025-03-20 10:01:10 -07:00
vector_io chore: mypy violations cleanup for inline::{telemetry,tool_runtime,vector_io} (#1711) 2025-03-20 10:01:10 -07:00
__init__.py impls -> inline, adapters -> remote (#381) 2024-11-06 14:54:05 -08:00