mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-06-28 10:54:19 +00:00
# What does this PR do? - A follow-up for #572 - The command in the original PR did not run - Remove `--set` command unnecessary since Jaeger 2.1.0 ## Test Plan ``` $ docker run --rm --name jaeger \ -p 16686:16686 -p 4318:4318 \ jaegertracing/jaeger:2.1.0 2024/12/07 19:07:13 application version: git-commit=65cff3c30823ea20d3dc48bae39d5685ae307da5, git-version=v2.1.0, build-date=2024-12-06T21:17:15Z ... ``` ## Before submitting - [x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [x] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests. Signed-off-by: Yuri Shkuro <github@ysh.us>
242 lines
7.5 KiB
Markdown
# Telemetry
```{note}
The telemetry system is currently experimental and subject to change. We welcome feedback and contributions to help improve it.
```

The Llama Stack telemetry system provides tracing, metrics, and logging capabilities. It supports three sink types: OpenTelemetry, SQLite, and console output.

## Key Concepts

### Events
The telemetry system supports three main types of events:

- **Unstructured Log Events**: Free-form log messages with severity levels
```python
unstructured_log_event = UnstructuredLogEvent(
    message="This is a log message",
    severity=LogSeverity.INFO
)
```
- **Metric Events**: Numerical measurements with units
```python
metric_event = MetricEvent(
    metric="my_metric",
    value=10,
    unit="count"
)
```
- **Structured Log Events**: System events like span start/end. Extensible to add more structured log types.
```python
structured_log_event = SpanStartPayload(
    name="my_span",
    parent_span_id="parent_span_id"
)
```

### Spans and Traces
- **Spans**: Represent operations with timing and hierarchical relationships
- **Traces**: Collections of related spans that together form a complete request flow

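To make the relationship between spans and traces concrete, here is a minimal sketch that links a flat list of spans into a tree using `parent_span_id`. The dictionary layout (`span_id`, `parent_span_id`, `name`, `children`) mirrors the fields shown in the API responses in this document; the `build_span_tree` helper and the sample span names are hypothetical and not part of Llama Stack.

```python
from typing import Optional


def build_span_tree(spans: list[dict]) -> Optional[dict]:
    """Link spans to their parents and return the root span (hypothetical helper)."""
    # Copy each span so the input is not mutated, and give it a children list.
    by_id = {s["span_id"]: {**s, "children": []} for s in spans}
    root = None
    for span in by_id.values():
        parent = by_id.get(span.get("parent_span_id"))
        if parent is None:
            # No known parent in this list: treat the first such span as the root.
            root = root or span
        else:
            parent["children"].append(span)
    return root


# Illustrative data only; real spans carry timestamps and attributes as well.
spans = [
    {"span_id": "a", "parent_span_id": None, "name": "agent_turn"},
    {"span_id": "b", "parent_span_id": "a", "name": "retrieve_rag_context"},
    {"span_id": "c", "parent_span_id": "b", "name": "MemoryRouter.query_documents"},
]
tree = build_span_tree(spans)
```

The resulting `tree` is the root span of the trace, with each nested `children` list holding the operations it triggered.
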
### Sinks
- **OpenTelemetry**: Send events to an OpenTelemetry Collector. This is useful for visualizing traces in a tool like Jaeger.
- **SQLite**: Store events in a local SQLite database. This is needed if you want to query the events later through the Llama Stack API.
- **Console**: Print events to the console.

## APIs

The telemetry API is designed to be flexible for different user flows, such as debugging and visualization in a UI, monitoring, and saving traces to datasets.
The telemetry system exposes the following HTTP endpoints:

### Log Event
```http
POST /telemetry/log-event
```
Logs a telemetry event (unstructured log, metric, or structured log) with an optional TTL.

### Query Traces
```http
POST /telemetry/query-traces
```
Retrieves traces based on filters with pagination support. Parameters:
- `attribute_filters`: List of conditions to filter traces
- `limit`: Maximum number of traces to return (default: 100)
- `offset`: Number of traces to skip (default: 0)
- `order_by`: List of fields to sort by

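The `limit` and `offset` parameters can be combined to page through results. The sketch below shows only the paging logic: it assumes a caller-supplied `fetch_page` function (hypothetical) that performs the actual `POST /telemetry/query-traces` call and returns a list of traces. The `fake_fetch` stand-in exists purely so the example runs without a server.

```python
from typing import Callable, Iterator


def iter_traces(
    fetch_page: Callable[[dict], list],
    attribute_filters: list,
    page_size: int = 100,
) -> Iterator[dict]:
    """Yield traces page by page until a short page signals the end."""
    offset = 0
    while True:
        body = {
            "attribute_filters": attribute_filters,
            "limit": page_size,
            "offset": offset,
            "order_by": ["start_time"],
        }
        page = fetch_page(body)
        yield from page
        if len(page) < page_size:
            break
        offset += page_size


# Stand-in for the HTTP call (assumption): serves slices of an in-memory list.
fake_store = [{"trace_id": f"trace-{i}"} for i in range(5)]


def fake_fetch(body: dict) -> list:
    return fake_store[body["offset"] : body["offset"] + body["limit"]]


traces = list(iter_traces(fake_fetch, attribute_filters=[], page_size=2))
```

Stopping on a short page avoids needing a total count from the server, at the cost of one extra request when the number of traces is an exact multiple of the page size.
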
### Get Span Tree
```http
POST /telemetry/get-span-tree
```
Retrieves a hierarchical view of spans starting from a specific span. Parameters:
- `span_id`: ID of the root span to retrieve
- `attributes_to_return`: Optional list of specific attributes to include
- `max_depth`: Optional maximum depth of the span tree to return

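As a rough illustration of what a depth limit does to a span tree, the sketch below prunes a nested span dictionary on the client side. The `prune_to_depth` helper is hypothetical, and its convention (a depth of 0 keeps only the root, 1 keeps the root and its direct children) is an assumption; the server's exact counting for `max_depth` is not specified in this document.

```python
def prune_to_depth(span: dict, max_depth: int) -> dict:
    """Return a copy of `span` keeping at most `max_depth` levels of descendants."""
    children = span.get("children", [])
    if max_depth > 0:
        kept = [prune_to_depth(child, max_depth - 1) for child in children]
    else:
        # Depth budget exhausted: drop everything below this span.
        kept = []
    return {**span, "children": kept}


# Illustrative three-level tree using the `children` layout of the API response.
tree = {
    "span_id": "root",
    "children": [
        {
            "span_id": "child",
            "children": [{"span_id": "grandchild", "children": []}],
        }
    ],
}
pruned = prune_to_depth(tree, max_depth=1)
```
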
### Query Spans
```http
POST /telemetry/query-spans
```
Retrieves spans matching specified filters and returns selected attributes. Parameters:
- `attribute_filters`: List of conditions to filter traces
- `attributes_to_return`: List of specific attributes to include in results
- `max_depth`: Optional maximum depth of spans to traverse (default: no limit)

Returns a flattened list of spans with requested attributes.

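The idea of a "flattened list of spans with requested attributes" can be sketched as a depth-first walk over a span tree. This is an illustration of the concept only, not the server's implementation; the `flatten_spans` helper and the sample attribute values are hypothetical.

```python
def flatten_spans(span: dict, attributes_to_return: list[str]) -> list[dict]:
    """Depth-first flatten of a span tree, keeping only the requested attributes."""
    attrs = span.get("attributes") or {}
    row = {
        "span_id": span["span_id"],
        "attributes": {k: attrs[k] for k in attributes_to_return if k in attrs},
    }
    rows = [row]
    for child in span.get("children", []):
        rows.extend(flatten_spans(child, attributes_to_return))
    return rows


# Illustrative tree; attribute names other than input/output are made up.
tree = {
    "span_id": "root",
    "attributes": {"input": "hi", "output": "hello", "model": "m"},
    "children": [
        {"span_id": "child", "attributes": {"input": None}, "children": []}
    ],
}
rows = flatten_spans(tree, ["input", "output"])
```
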
### Save Spans to Dataset
This is useful for saving traces to a dataset for running evaluations. For example, you can save the input/output of each span that is part of an agent session or turn to a dataset and then run an eval task on it. See [Example: Save Spans to Dataset](#example-save-spans-to-dataset).
```http
POST /telemetry/save-spans-to-dataset
```
Queries spans and saves their attributes to a dataset. Parameters:
- `attribute_filters`: List of conditions to filter traces
- `attributes_to_save`: List of span attributes to save to the dataset
- `dataset_id`: ID of the dataset to save to
- `max_depth`: Optional maximum depth of spans to traverse (default: no limit)

## Providers

### Meta-Reference Provider
Currently, only the meta-reference provider is implemented. It can be configured to send events to three sink types:
1) OpenTelemetry Collector
2) SQLite
3) Console

## Configuration

Here's an example that sends telemetry signals to all three sink types. Your configuration might use only one.
```yaml
  telemetry:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      sinks: ['console', 'sqlite', 'otel']
      otel_endpoint: "http://localhost:4318/v1/traces"
      sqlite_db_path: "/path/to/telemetry.db"
```

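When trimming the configuration down to a subset of sinks, it can help to check that each enabled sink still has its matching setting. The sketch below works on the `config` mapping after it has been loaded into a Python dictionary. The pairing of `otel` with `otel_endpoint` and `sqlite` with `sqlite_db_path` is inferred from the example above; whether the provider actually requires these keys or falls back to defaults is an assumption, and `missing_settings` is a hypothetical helper.

```python
# Inferred from the example configuration, not from the provider's own validation.
SETTING_FOR_SINK = {"otel": "otel_endpoint", "sqlite": "sqlite_db_path"}


def missing_settings(config: dict) -> list[str]:
    """Return the setting names that enabled sinks appear to need but lack."""
    missing = []
    for sink in config.get("sinks", []):
        setting = SETTING_FOR_SINK.get(sink)
        if setting and not config.get(setting):
            missing.append(setting)
    return missing


full_config = {
    "sinks": ["console", "sqlite", "otel"],
    "otel_endpoint": "http://localhost:4318/v1/traces",
    "sqlite_db_path": "/path/to/telemetry.db",
}
partial_config = {"sinks": ["console", "otel"]}

problems_full = missing_settings(full_config)
problems_partial = missing_settings(partial_config)
```
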
## Jaeger to visualize traces

The `otel` sink works with any service compatible with the OpenTelemetry collector. Let's use Jaeger to visualize this data.

Start a Jaeger instance with the OTLP HTTP endpoint on port 4318 and the Jaeger UI on port 16686 using the following command:

```bash
$ docker run --rm --name jaeger \
  -p 16686:16686 -p 4318:4318 \
  jaegertracing/jaeger:2.1.0
```

Once the Jaeger instance is running, you can visualize traces by navigating to http://localhost:16686/.

## Querying Traces Stored in SQLite

The `sqlite` sink allows you to query traces without an external system. Here are some example queries:

Querying traces for an agent session

The client SDK has not yet been updated to support the new telemetry API. Until it is, you can query traces manually using the following curl command. The response follows the command:

```bash
curl -X POST 'http://localhost:5000/alpha/telemetry/query-traces' \
  -H 'Content-Type: application/json' \
  -d '{
    "attribute_filters": [
      {
        "key": "session_id",
        "op": "eq",
        "value": "dd667b87-ca4b-4d30-9265-5a0de318fc65"
      }
    ],
    "limit": 100,
    "offset": 0,
    "order_by": ["start_time"]
  }'

[
  {
    "trace_id": "6902f54b83b4b48be18a6f422b13e16f",
    "root_span_id": "5f37b85543afc15a",
    "start_time": "2024-12-04T08:08:30.501587",
    "end_time": "2024-12-04T08:08:36.026463"
  },
  ........
]
```

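For scripts, the same request can be assembled with Python's standard library instead of curl. This is a minimal sketch, not an official client: the base URL simply reuses the host, port, and `/alpha` prefix from the curl example, and `build_query_traces_request` is a hypothetical helper. The request is only built here; the commented lines show how it could be sent to a running server.

```python
import json
import urllib.request

# Same host, port, and prefix as the curl example above.
BASE_URL = "http://localhost:5000/alpha"


def build_query_traces_request(
    session_id: str, limit: int = 100, offset: int = 0
) -> urllib.request.Request:
    """Build (but do not send) a query-traces request for one agent session."""
    body = {
        "attribute_filters": [
            {"key": "session_id", "op": "eq", "value": session_id}
        ],
        "limit": limit,
        "offset": offset,
        "order_by": ["start_time"],
    }
    return urllib.request.Request(
        f"{BASE_URL}/telemetry/query-traces",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_traces_request("dd667b87-ca4b-4d30-9265-5a0de318fc65")

# To send it against a running server:
#   with urllib.request.urlopen(request) as response:
#       traces = json.load(response)
```
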
Querying spans for a specific root span ID

```bash
curl -X POST 'http://localhost:5000/alpha/telemetry/get-span-tree' \
  -H 'Content-Type: application/json' \
  -d '{ "span_id" : "6cceb4b48a156913", "max_depth": 2 }'

{
  "span_id": "6cceb4b48a156913",
  "trace_id": "dafa796f6aaf925f511c04cd7c67fdda",
  "parent_span_id": "892a66d726c7f990",
  "name": "retrieve_rag_context",
  "start_time": "2024-12-04T09:28:21.781995",
  "end_time": "2024-12-04T09:28:21.913352",
  "attributes": {
    "input": [
      "{\"role\":\"system\",\"content\":\"You are a helpful assistant\"}",
      "{\"role\":\"user\",\"content\":\"What are the top 5 topics that were explained in the documentation? Only list succinct bullet points.\",\"context\":null}"
    ]
  },
  "children": [
    {
      "span_id": "1a2df181854064a8",
      "trace_id": "dafa796f6aaf925f511c04cd7c67fdda",
      "parent_span_id": "6cceb4b48a156913",
      "name": "MemoryRouter.query_documents",
      "start_time": "2024-12-04T09:28:21.787620",
      "end_time": "2024-12-04T09:28:21.906512",
      "attributes": {
        "input": null
      },
      "children": [],
      "status": "ok"
    }
  ],
  "status": "ok"
}
```

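A span tree like the one returned above is convenient for quick latency checks. The sketch below walks the tree and computes each span's duration from its `start_time` and `end_time` fields. The `response` dictionary is a trimmed copy of the example response (attributes omitted), and `span_durations` is a hypothetical helper.

```python
from datetime import datetime


def span_durations(span: dict, depth: int = 0) -> list[tuple[int, str, float]]:
    """Return (depth, name, duration in milliseconds) for a span and its descendants."""
    start = datetime.fromisoformat(span["start_time"])
    end = datetime.fromisoformat(span["end_time"])
    rows = [(depth, span["name"], (end - start).total_seconds() * 1000)]
    for child in span.get("children", []):
        rows.extend(span_durations(child, depth + 1))
    return rows


# Trimmed copy of the example get-span-tree response shown above.
response = {
    "name": "retrieve_rag_context",
    "start_time": "2024-12-04T09:28:21.781995",
    "end_time": "2024-12-04T09:28:21.913352",
    "children": [
        {
            "name": "MemoryRouter.query_documents",
            "start_time": "2024-12-04T09:28:21.787620",
            "end_time": "2024-12-04T09:28:21.906512",
            "children": [],
        }
    ],
}

durations = span_durations(response)
for depth, name, millis in durations:
    # Indent each span by its depth in the tree.
    print(f"{'  ' * depth}{name}: {millis:.1f} ms")
```
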
## Example: Save Spans to Dataset
Save all spans for a specific agent session to a dataset.
```bash
curl -X POST 'http://localhost:5000/alpha/telemetry/save-spans-to-dataset' \
  -H 'Content-Type: application/json' \
  -d '{
    "attribute_filters": [
      {
        "key": "session_id",
        "op": "eq",
        "value": "dd667b87-ca4b-4d30-9265-5a0de318fc65"
      }
    ],
    "attributes_to_save": ["input", "output"],
    "dataset_id": "my_dataset",
    "max_depth": 10
  }'
```

Save all spans for a specific agent turn to a dataset.
```bash
curl -X POST 'http://localhost:5000/alpha/telemetry/save-spans-to-dataset' \
  -H 'Content-Type: application/json' \
  -d '{
    "attribute_filters": [
      {
        "key": "turn_id",
        "op": "eq",
        "value": "123e4567-e89b-12d3-a456-426614174000"
      }
    ],
    "attributes_to_save": ["input", "output"],
    "dataset_id": "my_dataset",
    "max_depth": 10
  }'
```
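
The two requests above differ only in the filter key and value, so a script that saves spans for many sessions or turns could generate the body from one function. This is a minimal sketch under that observation; `save_spans_body` is a hypothetical helper, and its defaults simply copy the values used in the curl examples.

```python
import json


def save_spans_body(
    filter_key: str,
    filter_value: str,
    dataset_id: str,
    attributes_to_save: tuple[str, ...] = ("input", "output"),
    max_depth: int = 10,
) -> dict:
    """Build the JSON body for save-spans-to-dataset with a single equality filter."""
    return {
        "attribute_filters": [
            {"key": filter_key, "op": "eq", "value": filter_value}
        ],
        "attributes_to_save": list(attributes_to_save),
        "dataset_id": dataset_id,
        "max_depth": max_depth,
    }


session_body = save_spans_body(
    "session_id", "dd667b87-ca4b-4d30-9265-5a0de318fc65", "my_dataset"
)
turn_body = save_spans_body(
    "turn_id", "123e4567-e89b-12d3-a456-426614174000", "my_dataset"
)

# The serialized string is what would be passed to curl's -d option.
session_json = json.dumps(session_body, indent=2)
```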