mirror of https://github.com/meta-llama/llama-stack.git
synced 2025-10-17 07:07:19 +00:00

chore!: BREAKING CHANGE: remove sqlite from telemetry config

# What does this PR do?

## Test Plan

This commit is contained in:
parent d875e427bf
commit 49e9b53e00

21 changed files with 26 additions and 1026 deletions
@@ -10,58 +10,8 @@ import TabItem from '@theme/TabItem';

 # Telemetry

-The Llama Stack telemetry system provides comprehensive tracing, metrics, and logging capabilities. It supports multiple sink types including OpenTelemetry, SQLite, and Console output for complete observability of your AI applications.
+The Llama Stack uses OpenTelemetry to provide comprehensive tracing, metrics, and logging capabilities.

-## Event Types
-
-The telemetry system supports three main types of events:
-
-<Tabs>
-<TabItem value="unstructured" label="Unstructured Logs">
-
-Free-form log messages with severity levels for general application logging:
-
-```python
-unstructured_log_event = UnstructuredLogEvent(
-    message="This is a log message",
-    severity=LogSeverity.INFO
-)
-```
-
-</TabItem>
-<TabItem value="metrics" label="Metric Events">
-
-Numerical measurements with units for tracking performance and usage:
-
-```python
-metric_event = MetricEvent(
-    metric="my_metric",
-    value=10,
-    unit="count"
-)
-```
-
-</TabItem>
-<TabItem value="structured" label="Structured Logs">
-
-System events like span start/end that provide structured operation tracking:
-
-```python
-structured_log_event = SpanStartPayload(
-    name="my_span",
-    parent_span_id="parent_span_id"
-)
-```
-
-</TabItem>
-</Tabs>
-
-## Spans and Traces
-
-- **Spans**: Represent individual operations with timing information and hierarchical relationships
-- **Traces**: Collections of related spans that form a complete request flow across your application
-
-This hierarchical structure allows you to understand the complete execution path of requests through your Llama Stack application.

 ## Automatic Metrics Generation
@@ -129,21 +79,6 @@ Send events to an OpenTelemetry Collector for integration with observability platforms:

 - Compatible with all OpenTelemetry collectors
 - Supports both traces and metrics

 </TabItem>
-<TabItem value="sqlite" label="SQLite">
-
-Store events in a local SQLite database for direct querying:
-
-**Use Cases:**
-- Local development and debugging
-- Custom analytics and reporting
-- Offline analysis of application behavior
-
-**Features:**
-- Direct SQL querying capabilities
-- Persistent local storage
-- No external dependencies
-
-</TabItem>
 <TabItem value="console" label="Console">
@@ -174,9 +109,8 @@ telemetry:

     provider_type: inline::meta-reference
     config:
       service_name: "llama-stack-service"
-      sinks: ['console', 'sqlite', 'otel_trace', 'otel_metric']
+      sinks: ['console', 'otel_trace', 'otel_metric']
       otel_exporter_otlp_endpoint: "http://localhost:4318"
-      sqlite_db_path: "/path/to/telemetry.db"
 ```

 ### Environment Variables
@@ -185,7 +119,7 @@ Configure telemetry behavior using environment variables:

 - **`OTEL_EXPORTER_OTLP_ENDPOINT`**: OpenTelemetry Collector endpoint (default: `http://localhost:4318`)
 - **`OTEL_SERVICE_NAME`**: Service name for telemetry (default: empty string)
-- **`TELEMETRY_SINKS`**: Comma-separated list of sinks (default: `console,sqlite`)
+- **`TELEMETRY_SINKS`**: Comma-separated list of sinks (default: `[]`)

 ### Quick Setup: Complete Telemetry Stack
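The `TELEMETRY_SINKS` default change above means no sinks are enabled unless the variable is set explicitly. A hedged sketch of how such a comma-separated variable could be parsed; the actual llama-stack parsing and validation may differ:

```python
# Illustrative parser for a comma-separated TELEMETRY_SINKS value; the sink
# names reflect the post-change set (sqlite removed). Not the real implementation.
VALID_SINKS = {"console", "otel_trace", "otel_metric"}


def parse_sinks(env: dict[str, str]) -> list[str]:
    raw = env.get("TELEMETRY_SINKS", "")  # new default: no sinks enabled
    sinks = [s.strip() for s in raw.split(",") if s.strip()]
    unknown = set(sinks) - VALID_SINKS
    if unknown:
        raise ValueError(f"unknown telemetry sinks: {sorted(unknown)}")
    return sinks


assert parse_sinks({}) == []
assert parse_sinks({"TELEMETRY_SINKS": "console, otel_trace"}) == ["console", "otel_trace"]
```

With this breaking change, a deployment that previously relied on the implicit `sqlite` default now gets an empty sink list and must opt in via `TELEMETRY_SINKS`.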
@@ -248,37 +182,10 @@ Forward metrics to other observability systems:

 </TabItem>
 </Tabs>

-## SQLite Querying
-
-The `sqlite` sink allows you to query traces without an external system. This is particularly useful for development and custom analytics.
-
-### Example Queries
-
-```sql
--- Query recent traces
-SELECT * FROM traces WHERE timestamp > datetime('now', '-1 hour');
-
--- Analyze span durations
-SELECT name, AVG(duration_ms) as avg_duration
-FROM spans
-GROUP BY name
-ORDER BY avg_duration DESC;
-
--- Find slow operations
-SELECT * FROM spans
-WHERE duration_ms > 1000
-ORDER BY duration_ms DESC;
-```
-
-:::tip[Advanced Analytics]
-Refer to the [Getting Started notebook](https://github.com/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb) for more examples on querying traces and spans programmatically.
-:::

 ## Best Practices

 ### 🔍 **Monitoring Strategy**
 - Use OpenTelemetry for production environments
-- Combine multiple sinks for development (console + SQLite)
 - Set up alerts on key metrics like token usage and error rates

 ### 📊 **Metrics Analysis**
@@ -293,45 +200,8 @@ Refer to the [Getting Started notebook](https://github.com/meta-llama/llama-stac

 ### 🔧 **Configuration Management**
 - Use environment variables for flexible deployment
-- Configure appropriate retention policies for SQLite
 - Ensure proper network access to OpenTelemetry collectors

-## Integration Examples
-
-### Basic Telemetry Setup
-
-```python
-from llama_stack_client import LlamaStackClient
-
-# Client with telemetry headers
-client = LlamaStackClient(
-    base_url="http://localhost:8000",
-    extra_headers={
-        "X-Telemetry-Service": "my-ai-app",
-        "X-Telemetry-Version": "1.0.0"
-    }
-)
-
-# All API calls will be automatically traced
-response = client.chat.completions.create(
-    model="meta-llama/Llama-3.2-3B-Instruct",
-    messages=[{"role": "user", "content": "Hello!"}]
-)
-```
-
-### Custom Telemetry Context
-
-```python
-# Add custom span attributes for better tracking
-with tracer.start_as_current_span("custom_operation") as span:
-    span.set_attribute("user_id", "user123")
-    span.set_attribute("operation_type", "chat_completion")
-
-    response = client.chat.completions.create(
-        model="meta-llama/Llama-3.2-3B-Instruct",
-        messages=[{"role": "user", "content": "Hello!"}]
-    )
-```

 ## Related Resources
@@ -119,7 +119,7 @@ The following environment variables can be configured:

 ### Telemetry Configuration
 - `OTEL_SERVICE_NAME`: OpenTelemetry service name
-- `TELEMETRY_SINKS`: Telemetry sinks (default: `console,sqlite`)
+- `TELEMETRY_SINKS`: Telemetry sinks (default: `[]`)

 ## Enabling Providers
@@ -216,7 +216,6 @@ The starter distribution uses SQLite for local storage of various components:

 - **Files metadata**: `~/.llama/distributions/starter/files_metadata.db`
 - **Agents store**: `~/.llama/distributions/starter/agents_store.db`
 - **Responses store**: `~/.llama/distributions/starter/responses_store.db`
-- **Trace store**: `~/.llama/distributions/starter/trace_store.db`
 - **Evaluation store**: `~/.llama/distributions/starter/meta_reference_eval.db`
 - **Dataset I/O stores**: Various HuggingFace and local filesystem stores
@@ -16,14 +16,12 @@ Meta's reference implementation of telemetry and observability using OpenTelemetry.

 |-------|------|----------|---------|-------------|
 | `otel_exporter_otlp_endpoint` | `str \| None` | No | | The OpenTelemetry collector endpoint URL (base URL for traces, metrics, and logs). If not set, the SDK will use the OTEL_EXPORTER_OTLP_ENDPOINT environment variable. |
 | `service_name` | `<class 'str'>` | No | | The service name to use for telemetry |
-| `sinks` | `list[inline.telemetry.meta_reference.config.TelemetrySink]` | No | [<TelemetrySink.SQLITE: 'sqlite'>] | List of telemetry sinks to enable (possible values: otel_trace, otel_metric, sqlite, console) |
-| `sqlite_db_path` | `<class 'str'>` | No | ~/.llama/runtime/trace_store.db | The path to the SQLite database to use for storing traces |
+| `sinks` | `list[inline.telemetry.meta_reference.config.TelemetrySink]` | No | [] | List of telemetry sinks to enable (possible values: otel_trace, otel_metric, console) |

 ## Sample Configuration

 ```yaml
 service_name: "${env.OTEL_SERVICE_NAME:=}"
-sinks: ${env.TELEMETRY_SINKS:=sqlite}
-sqlite_db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/trace_store.db
+sinks: ${env.TELEMETRY_SINKS:=}
+otel_exporter_otlp_endpoint: ${env.OTEL_EXPORTER_OTLP_ENDPOINT:=}
 ```
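The `${env.NAME:=default}` placeholders in the sample configuration suggest environment-variable substitution with fallback defaults. A hedged sketch of that substitution rule, written against an in-memory environment; the actual llama-stack resolver may behave differently (e.g., around nesting or escaping):

```python
# Illustrative resolver for "${env.NAME:=default}" placeholders: replace with
# the environment value when set, otherwise fall back to the inline default.
import re

_PLACEHOLDER = re.compile(r"\$\{env\.(\w+):=([^}]*)\}")


def substitute(text: str, env: dict[str, str]) -> str:
    return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(2)), text)


line = "sinks: ${env.TELEMETRY_SINKS:=}"
assert substitute(line, {}) == "sinks: "
assert substitute(line, {"TELEMETRY_SINKS": "console"}) == "sinks: console"
```

Under this reading, the new sample config yields an empty `sinks` value unless `TELEMETRY_SINKS` is set, matching the breaking change in this commit.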