mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-06-28 19:04:19 +00:00
add telemetry docs (#572)
Add an experimental section and telemetry doc  --------- Co-authored-by: Adrian Cole <64215+codefromthecrypt@users.noreply.github.com>
This commit is contained in:
parent
27a27152cd
commit
cb9e9048e7
2 changed files with 251 additions and 1 deletions
|
@ -11,5 +11,12 @@
|
||||||
- memory / RAG; pre-ingesting content or attaching content in a turn
|
- memory / RAG; pre-ingesting content or attaching content in a turn
|
||||||
- how does tool calling work
|
- how does tool calling work
|
||||||
- can you do evaluation?
|
- can you do evaluation?
|
||||||
|
```
|
||||||
|
For details on how to use the telemetry system to debug your applications, export traces to a dataset, and run evaluations, see the [Telemetry](telemetry) section.
|
||||||
|
|
||||||
|
```{toctree}
|
||||||
|
:hidden:
|
||||||
|
:maxdepth: 3
|
||||||
|
|
||||||
|
telemetry
|
||||||
```
|
```
|
||||||
|
|
243
docs/source/building_applications/telemetry.md
Normal file
243
docs/source/building_applications/telemetry.md
Normal file
|
@ -0,0 +1,243 @@
|
||||||
|
# Telemetry
|
||||||
|
```{note}
|
||||||
|
The telemetry system is currently experimental and subject to change. We welcome feedback and contributions to help improve it.
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
The Llama Stack telemetry system provides comprehensive tracing, metrics, and logging capabilities. It supports multiple sink types including OpenTelemetry, SQLite, and Console output.
|
||||||
|
|
||||||
|
## Key Concepts
|
||||||
|
|
||||||
|
### Events
|
||||||
|
The telemetry system supports three main types of events:
|
||||||
|
|
||||||
|
- **Unstructured Log Events**: Free-form log messages with severity levels
|
||||||
|
```python
|
||||||
|
unstructured_log_event = UnstructuredLogEvent(
|
||||||
|
message="This is a log message",
|
||||||
|
severity=LogSeverity.INFO
|
||||||
|
)
|
||||||
|
```
|
||||||
|
- **Metric Events**: Numerical measurements with units
|
||||||
|
```python
|
||||||
|
metric_event = MetricEvent(
|
||||||
|
metric="my_metric",
|
||||||
|
value=10,
|
||||||
|
unit="count"
|
||||||
|
)
|
||||||
|
```
|
||||||
|
- **Structured Log Events**: System events like span start/end. Extensible to add more structured log types.
|
||||||
|
```python
|
||||||
|
structured_log_event = SpanStartPayload(
|
||||||
|
name="my_span",
|
||||||
|
parent_span_id="parent_span_id"
|
||||||
|
)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Spans and Traces
|
||||||
|
- **Spans**: Represent operations with timing and hierarchical relationships
|
||||||
|
- **Traces**: Collection of related spans forming a complete request flow
|
||||||
|
|
||||||
|
### Sinks
|
||||||
|
- **OpenTelemetry**: Send events to an OpenTelemetry Collector. This is useful for visualizing traces in a service like Jaeger.
|
||||||
|
- **SQLite**: Store events in a local SQLite database. This is needed if you want to query the events later through the Llama Stack API.
|
||||||
|
- **Console**: Print events to the console.
|
||||||
|
|
||||||
|
## APIs
|
||||||
|
|
||||||
|
The telemetry API is designed to be flexible for different user flows like debugging/visualization in UI, monitoring, and saving traces to datasets.
|
||||||
|
The telemetry system exposes the following HTTP endpoints:
|
||||||
|
|
||||||
|
### Log Event
|
||||||
|
```http
|
||||||
|
POST /telemetry/log-event
|
||||||
|
```
|
||||||
|
Logs a telemetry event (unstructured log, metric, or structured log) with optional TTL.
|
||||||
|
|
||||||
|
### Query Traces
|
||||||
|
```http
|
||||||
|
POST /telemetry/query-traces
|
||||||
|
```
|
||||||
|
Retrieves traces based on filters with pagination support. Parameters:
|
||||||
|
- `attribute_filters`: List of conditions to filter traces
|
||||||
|
- `limit`: Maximum number of traces to return (default: 100)
|
||||||
|
- `offset`: Number of traces to skip (default: 0)
|
||||||
|
- `order_by`: List of fields to sort by
|
||||||
|
|
||||||
|
### Get Span Tree
|
||||||
|
```http
|
||||||
|
POST /telemetry/get-span-tree
|
||||||
|
```
|
||||||
|
Retrieves a hierarchical view of spans starting from a specific span. Parameters:
|
||||||
|
- `span_id`: ID of the root span to retrieve
|
||||||
|
- `attributes_to_return`: Optional list of specific attributes to include
|
||||||
|
- `max_depth`: Optional maximum depth of the span tree to return
|
||||||
|
|
||||||
|
### Query Spans
|
||||||
|
```http
|
||||||
|
POST /telemetry/query-spans
|
||||||
|
```
|
||||||
|
Retrieves spans matching specified filters and returns selected attributes. Parameters:
|
||||||
|
- `attribute_filters`: List of conditions to filter traces
|
||||||
|
- `attributes_to_return`: List of specific attributes to include in results
|
||||||
|
- `max_depth`: Optional maximum depth of spans to traverse (default: no limit)
|
||||||
|
|
||||||
|
Returns a flattened list of spans with requested attributes.
|
||||||
|
|
||||||
|
### Save Spans to Dataset
|
||||||
|
This is useful for saving traces to a dataset for running evaluations. For example, you can save the input/output of each span that is part of an agent session/turn to a dataset and then run an eval task on it. See example in [Example: Save Spans to Dataset](#example-save-spans-to-dataset).
|
||||||
|
```http
|
||||||
|
POST /telemetry/save-spans-to-dataset
|
||||||
|
```
|
||||||
|
Queries spans and saves their attributes to a dataset. Parameters:
|
||||||
|
- `attribute_filters`: List of conditions to filter traces
|
||||||
|
- `attributes_to_save`: List of span attributes to save to the dataset
|
||||||
|
- `dataset_id`: ID of the dataset to save to
|
||||||
|
- `max_depth`: Optional maximum depth of spans to traverse (default: no limit)
|
||||||
|
|
||||||
|
## Providers
|
||||||
|
|
||||||
|
### Meta-Reference Provider
|
||||||
|
Currently, only the meta-reference provider is implemented. It can be configured to send events to three sink types:
|
||||||
|
1) OpenTelemetry Collector
|
||||||
|
2) SQLite
|
||||||
|
3) Console
|
||||||
|
|
||||||
|
## Configuration
|
||||||
|
|
||||||
|
Here's an example that sends telemetry signals to all three sink types. Your configuration might use only one.
|
||||||
|
```yaml
|
||||||
|
telemetry:
|
||||||
|
- provider_id: meta-reference
|
||||||
|
provider_type: inline::meta-reference
|
||||||
|
config:
|
||||||
|
sinks: ['console', 'sqlite', 'otel']
|
||||||
|
otel_endpoint: "http://localhost:4318/v1/traces"
|
||||||
|
sqlite_db_path: "/path/to/telemetry.db"
|
||||||
|
```
|
||||||
|
|
||||||
|
## Jaeger to visualize traces
|
||||||
|
|
||||||
|
The `otel` sink works with any service compatible with the OpenTelemetry collector. Let's use Jaeger to visualize this data.
|
||||||
|
|
||||||
|
Start a Jaeger instance with the OTLP HTTP endpoint at 4318 and the Jaeger UI at 16686 using the following command:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
$ docker run --rm \
|
||||||
|
--name jaeger jaegertracing/jaeger:2.0.0 \
|
||||||
|
-p 16686:16686 -p 4318:4318 \
|
||||||
|
--set receivers.otlp.protocols.http.endpoint=0.0.0.0:4318
|
||||||
|
```
|
||||||
|
|
||||||
|
Once the Jaeger instance is running, you can visualize traces by navigating to http://localhost:16686.
|
||||||
|
|
||||||
|
## Querying Traces Stored in SQLIte
|
||||||
|
|
||||||
|
The `sqlite` sink allows you to query traces without an external system. Here are some example queries:
|
||||||
|
|
||||||
|
Querying Traces for a agent session
|
||||||
|
The client SDK is not updated to support the new telemetry API. It will be updated soon. You can manually query traces using the following curl command:
|
||||||
|
|
||||||
|
``` bash
|
||||||
|
curl -X POST 'http://localhost:5000/alpha/telemetry/query-traces' \
|
||||||
|
-H 'Content-Type: application/json' \
|
||||||
|
-d '{
|
||||||
|
"attribute_filters": [
|
||||||
|
{
|
||||||
|
"key": "session_id",
|
||||||
|
"op": "eq",
|
||||||
|
"value": "dd667b87-ca4b-4d30-9265-5a0de318fc65" }],
|
||||||
|
"limit": 100,
|
||||||
|
"offset": 0,
|
||||||
|
"order_by": ["start_time"]
|
||||||
|
|
||||||
|
[
|
||||||
|
{
|
||||||
|
"trace_id": "6902f54b83b4b48be18a6f422b13e16f",
|
||||||
|
"root_span_id": "5f37b85543afc15a",
|
||||||
|
"start_time": "2024-12-04T08:08:30.501587",
|
||||||
|
"end_time": "2024-12-04T08:08:36.026463"
|
||||||
|
},
|
||||||
|
........
|
||||||
|
]
|
||||||
|
}'
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
Querying spans for a specifc root span id
|
||||||
|
|
||||||
|
``` bash
|
||||||
|
curl -X POST 'http://localhost:5000/alpha/telemetry/get-span-tree' \
|
||||||
|
-H 'Content-Type: application/json' \
|
||||||
|
-d '{ "span_id" : "6cceb4b48a156913", "max_depth": 2 }'
|
||||||
|
|
||||||
|
{
|
||||||
|
"span_id": "6cceb4b48a156913",
|
||||||
|
"trace_id": "dafa796f6aaf925f511c04cd7c67fdda",
|
||||||
|
"parent_span_id": "892a66d726c7f990",
|
||||||
|
"name": "retrieve_rag_context",
|
||||||
|
"start_time": "2024-12-04T09:28:21.781995",
|
||||||
|
"end_time": "2024-12-04T09:28:21.913352",
|
||||||
|
"attributes": {
|
||||||
|
"input": [
|
||||||
|
"{\"role\":\"system\",\"content\":\"You are a helpful assistant\"}",
|
||||||
|
"{\"role\":\"user\",\"content\":\"What are the top 5 topics that were explained in the documentation? Only list succinct bullet points.\",\"context\":null}"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"children": [
|
||||||
|
{
|
||||||
|
"span_id": "1a2df181854064a8",
|
||||||
|
"trace_id": "dafa796f6aaf925f511c04cd7c67fdda",
|
||||||
|
"parent_span_id": "6cceb4b48a156913",
|
||||||
|
"name": "MemoryRouter.query_documents",
|
||||||
|
"start_time": "2024-12-04T09:28:21.787620",
|
||||||
|
"end_time": "2024-12-04T09:28:21.906512",
|
||||||
|
"attributes": {
|
||||||
|
"input": null
|
||||||
|
},
|
||||||
|
"children": [],
|
||||||
|
"status": "ok"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"status": "ok"
|
||||||
|
}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
## Example: Save Spans to Dataset
|
||||||
|
Save all spans for a specific agent session to a dataset.
|
||||||
|
``` bash
|
||||||
|
curl -X POST 'http://localhost:5000/alpha/telemetry/save-spans-to-dataset' \
|
||||||
|
-H 'Content-Type: application/json' \
|
||||||
|
-d '{
|
||||||
|
"attribute_filters": [
|
||||||
|
{
|
||||||
|
"key": "session_id",
|
||||||
|
"op": "eq",
|
||||||
|
"value": "dd667b87-ca4b-4d30-9265-5a0de318fc65"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"attributes_to_save": ["input", "output"],
|
||||||
|
"dataset_id": "my_dataset",
|
||||||
|
"max_depth": 10
|
||||||
|
}'
|
||||||
|
```
|
||||||
|
|
||||||
|
Save all spans for a specific agent turn to a dataset.
|
||||||
|
```bash
|
||||||
|
curl -X POST 'http://localhost:5000/alpha/telemetry/save-spans-to-dataset' \
|
||||||
|
-H 'Content-Type: application/json' \
|
||||||
|
-d '{
|
||||||
|
"attribute_filters": [
|
||||||
|
{
|
||||||
|
"key": "turn_id",
|
||||||
|
"op": "eq",
|
||||||
|
"value": "123e4567-e89b-12d3-a456-426614174000"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"attributes_to_save": ["input", "output"],
|
||||||
|
"dataset_id": "my_dataset",
|
||||||
|
"max_depth": 10
|
||||||
|
}'
|
||||||
|
```
|
Loading…
Add table
Add a link
Reference in a new issue