Observability is the ability to understand a system's internal behavior from its external outputs: logs (discrete events), metrics (numbers aggregated over time), and traces (request flow across services). Together they support debugging, alerting, and performance analysis.

| Pillar | What | Use |
|---|---|---|
| Logs | Event records (timestamp, level, message, context) | Debugging, audit |
| Metrics | Numeric aggregates (QPS, latency p99, error rate) | Dashboards, SLOs, alerting |
| Tracing | Request path and timing across services (trace/span IDs) | Latency analysis, dependency map |
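
To make the Metrics row concrete, here is a minimal sketch of how the three example aggregates (QPS, error rate, p99 latency) could be derived from raw request records for one time window. The `Request` type, the `summarize` function, and the nearest-rank percentile method are illustrative assumptions, not part of any particular metrics library; real systems usually approximate percentiles with histograms rather than sorting every sample.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One observed request (hypothetical record shape)."""
    duration_ms: float
    ok: bool


def summarize(requests, window_seconds):
    """Return QPS, error rate, and nearest-rank p99 latency for one window."""
    n = len(requests)
    if n == 0:
        return {"qps": 0.0, "error_rate": 0.0, "p99_ms": None}

    durations = sorted(r.duration_ms for r in requests)
    # Nearest-rank percentile: 1-based rank = ceil(0.99 * n), in integer math.
    rank = (99 * n + 99) // 100
    errors = sum(1 for r in requests if not r.ok)

    return {
        "qps": n / window_seconds,
        "error_rate": errors / n,
        "p99_ms": durations[rank - 1],
    }


if __name__ == "__main__":
    # 100 requests in a 10-second window; every tenth request fails.
    sample = [Request(float(i), i % 10 != 0) for i in range(1, 101)]
    print(summarize(sample, window_seconds=10))
```

The aggregates discard per-request detail, which is why metrics are cheap to store and alert on but need logs or traces to explain an individual failure.
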
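
Emit structured logs (JSON) that carry a correlation ID. Export metrics (Prometheus, StatsD) following RED (rate, errors, duration) for request-driven services or USE (utilization, saturation, errors) for resources such as CPU, memory, and disks. Use distributed tracing (OpenTelemetry, Jaeger) and propagate a common trace ID so you can follow a single request across services.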
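
As an illustration of structured logs with correlation IDs, the sketch below uses only the Python standard library. The names `JsonFormatter`, `correlation_id`, and `handle_request` are hypothetical, and the field layout is one reasonable choice rather than a standard; in a real service the ID would typically arrive in a request header or be taken from the active trace.

```python
import json
import logging
import sys
import uuid
from contextvars import ContextVar

# Holds the ID of the request currently being handled (hypothetical name).
correlation_id = ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        }
        return json.dumps(payload)


def handle_request(logger, incoming_id=None):
    """Log two events that share one correlation ID, then restore the context."""
    cid = incoming_id or uuid.uuid4().hex  # reuse the caller's ID if present
    token = correlation_id.set(cid)
    try:
        logger.info("request started")
        logger.info("request finished")
    finally:
        correlation_id.reset(token)
    return cid


if __name__ == "__main__":
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    log = logging.getLogger("demo")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    handle_request(log, incoming_id="abc123")
```

Because every line is a JSON object with the same `correlation_id` field, a log backend can filter on that one value to reassemble everything a request did, including across services that forward the ID.
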