📊 OBSERVABILITY
See Everything
Full-spectrum observability: metrics, logs, traces, profiles, and cost monitoring in a unified stack.
THE FOUR PILLARS OF OBSERVABILITY
Metrics
Mimir
Time-series data with PromQL. Long-term retention, global view queries, Prometheus-compatible API.
Logs
Loki
Label-indexed log aggregation. LogQL queries, minimal storage footprint, Prometheus-style labels.
Traces
Tempo
Distributed tracing with Jaeger/Zipkin/OTLP support. Trace-to-log and trace-to-metric correlation.
Profiles
Pyroscope
Continuous profiling for CPU, memory, goroutines. Flame graphs, cross-time comparison.
Telemetry Pipeline
Grafana Alloy is the single agent that collects all four signals, replacing Promtail, Grafana Agent, and the OpenTelemetry Collector.
graph LR
    WL["Workloads DaemonSet on every node"]:::workloads
    ALLOY["Grafana Alloy StatefulSet + DaemonSet"]:::alloy
    MIMIR["Mimir Metrics"]:::storage
    LOKI["Loki Logs"]:::storage
    TEMPO["Tempo Traces"]:::storage
    PYRO["Pyroscope Profiles"]:::storage
    GRAF["Grafana SSO via Keycloak"]:::grafana

    WL -->|"scrape + forward"| ALLOY
    ALLOY -->|"metrics"| MIMIR
    ALLOY -->|"logs"| LOKI
    ALLOY -->|"traces"| TEMPO
    ALLOY -->|"profiles"| PYRO
    MIMIR --> GRAF
    LOKI --> GRAF
    TEMPO --> GRAF
    PYRO --> GRAF

    classDef workloads fill:#1e293b,stroke:#e2e8f0,color:#e2e8f0,stroke-width:2px
    classDef alloy fill:#0e3a3a,stroke:#06b6d4,color:#67e8f9,stroke-width:2px
    classDef storage fill:#2e2a0e,stroke:#facc15,color:#fde68a,stroke-width:2px
    classDef grafana fill:#14332a,stroke:#4ade80,color:#86efac,stroke-width:2px
All Components
Prometheus
Status: production
Pull-based metrics collection with a multi-dimensional data model and the PromQL query language.
Role: Primary metrics scraping for all platform services via ServiceMonitors
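As a rough illustration of ServiceMonitor-based scraping, the sketch below builds a minimal manifest as a Python dictionary and prints it as JSON, which kubectl accepts in place of YAML. The service name `checkout`, the `platform` namespace, the `http-metrics` port name, the label scheme, and the 30s interval are all hypothetical. Only the general `monitoring.coreos.com/v1` ServiceMonitor shape comes from the Prometheus Operator.

```python
import json


def service_monitor(name: str, namespace: str, port: str, interval: str = "30s") -> dict:
    """Build a minimal Prometheus Operator ServiceMonitor manifest."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Select Services carrying this label (hypothetical label scheme).
            "selector": {"matchLabels": {"app.kubernetes.io/name": name}},
            # Scrape the named Service port at /metrics on the given interval.
            "endpoints": [{"port": port, "path": "/metrics", "interval": interval}],
        },
    }


# All values below are placeholders, not taken from the platform itself.
manifest = service_monitor("checkout", "platform", "http-metrics")
print(json.dumps(manifest, indent=2))
```

Generating manifests from a function like this keeps the selector and endpoint conventions consistent across services; the actual platform may well template them differently.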
Grafana
Status: production
Visualization platform connecting metrics, logs, traces, and profiles in unified dashboards.
Role: Central observability UI with pre-built dashboards for every platform component
Mimir
Status: production
Horizontally-scalable long-term metrics storage with Prometheus-compatible API.
Role: Indefinite metrics retention with high compression and fast queries
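Because Mimir exposes a Prometheus-compatible API, a PromQL query can be issued with a plain HTTP request. The sketch below only builds the request URL and headers and sends nothing. It assumes Mimir's default `/prometheus` API prefix and multi-tenancy via the `X-Scope-OrgID` header; the hostname, tenant ID, and metric name are hypothetical.

```python
from urllib.parse import urlencode


def mimir_instant_query(base_url: str, promql: str, tenant: str) -> tuple:
    """Return (url, headers) for a PromQL instant query against Mimir."""
    # Assumes the default Prometheus HTTP API prefix, /prometheus.
    query_string = urlencode({"query": promql})
    url = f"{base_url.rstrip('/')}/prometheus/api/v1/query?{query_string}"
    # Multi-tenant Mimir identifies the tenant through this header.
    headers = {"X-Scope-OrgID": tenant}
    return url, headers


url, headers = mimir_instant_query(
    "http://mimir.example.internal",  # hypothetical address
    "sum by (namespace) (rate(http_requests_total[5m]))",  # hypothetical metric
    tenant="platform",  # hypothetical tenant ID
)
print(url)
print(headers)
```

The returned pair can be passed to any HTTP client, for example `urllib.request.Request(url, headers=headers)`.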
Loki
Status: production
Log aggregation system inspired by Prometheus that indexes labels, not full log lines.
Role: Centralized logging with LogQL queries across all namespaces
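To make the label-indexing idea concrete, here is a minimal sketch that builds a LogQL range query URL for Loki's `/loki/api/v1/query_range` endpoint. The label matcher selects streams through the index, and the `|=` filter then scans the contents of the matching lines. The hostname, the `payments` namespace, and the search string are hypothetical.

```python
import time
from urllib.parse import urlencode


def loki_query_range(base_url: str, logql: str, minutes: int = 15, limit: int = 100) -> str:
    """Build a Loki query_range URL covering the last `minutes` minutes."""
    end_ns = time.time_ns()  # Loki accepts nanosecond Unix timestamps
    start_ns = end_ns - minutes * 60 * 1_000_000_000
    params = {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


# Stream selector in braces, then a line filter for the substring "timeout".
logql = '{namespace="payments"} |= "timeout"'
url = loki_query_range("http://loki.example.internal", logql)  # hypothetical address
print(url)
```

Narrow label selectors keep such queries cheap, since only the selected streams are read and filtered.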
Tempo
Status: production
Distributed tracing backend supporting Jaeger, Zipkin, and OpenTelemetry formats.
Role: End-to-end request tracing across microservices
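End-to-end tracing depends on every service passing the same trace ID downstream. The sketch below assumes services propagate context with the W3C Trace Context `traceparent` header, which OpenTelemetry SDKs use by default; the page does not state which propagation format the platform uses. The function names are illustrative.

```python
import re
import secrets

# version "00", 32-hex trace ID, 16-hex parent span ID, 2-hex flags
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace at the edge of the system."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def child_traceparent(parent: str) -> str:
    """Keep the trace ID and flags, but mint a fresh span ID for the next hop."""
    match = TRACEPARENT.match(parent)
    if match is None:
        raise ValueError(f"malformed traceparent: {parent!r}")
    trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


incoming = new_traceparent()
outgoing = child_traceparent(incoming)
print(incoming)
print(outgoing)
```

Both headers share the same trace ID, which is what lets a tracing backend stitch spans from different services into a single trace.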
Pyroscope
Status: production
Continuous profiling platform for CPU, memory, goroutine, and lock contention analysis.
Role: Runtime performance profiling with flame graph visualization
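Cross-time comparison boils down to diffing sample counts between two profiles. The sketch below is a conceptual illustration, not Pyroscope's implementation. It assumes profiles exported in the folded-stack text format commonly used for flame graphs (`frame;frame;frame count` per line), and the function names and sample data are made up.

```python
from collections import Counter


def self_samples(folded: str) -> Counter:
    """Sum samples per leaf frame from folded-stack text ("a;b;c 12" per line)."""
    totals = Counter()
    for line in folded.strip().splitlines():
        stack, _, count = line.rpartition(" ")
        leaf = stack.split(";")[-1]  # the frame that was actually on-CPU
        totals[leaf] += int(count)
    return totals


def diff_profiles(before: str, after: str) -> dict:
    """Return the change in self samples per function between two profiles."""
    a, b = self_samples(before), self_samples(after)
    return {fn: b[fn] - a[fn] for fn in sorted(set(a) | set(b)) if b[fn] != a[fn]}


# Hypothetical profiles taken before and after a deploy.
before = """
main;handle;parse_json 40
main;handle;query_db 50
"""
after = """
main;handle;parse_json 95
main;handle;query_db 48
main;handle;render 7
"""
print(diff_profiles(before, after))
# {'parse_json': 55, 'query_db': -2, 'render': 7}
```

A positive delta marks a function that consumed more samples in the later window, which is the same signal a differential flame graph highlights visually.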
Grafana Alloy
Status: production
Unified telemetry collector replacing Promtail, Grafana Agent, and OpenTelemetry Collector.
Role: Single agent collecting metrics, logs, traces, and profiles from all nodes
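The fan-out performed by the collector can be modeled in a few lines. This is a conceptual sketch of the routing described in the Telemetry Pipeline section, not Alloy configuration syntax. The record shape (`signal` and `body` keys) and the sample payloads are hypothetical.

```python
from collections import defaultdict

# Signal-to-backend routing as described in the Telemetry Pipeline section.
ROUTES = {
    "metrics": "mimir",
    "logs": "loki",
    "traces": "tempo",
    "profiles": "pyroscope",
}


def fan_out(records: list) -> dict:
    """Group collected telemetry records by destination backend."""
    batches = defaultdict(list)
    for record in records:
        backend = ROUTES.get(record["signal"])
        if backend is None:
            raise ValueError(f"unknown signal type: {record['signal']!r}")
        batches[backend].append(record["body"])
    return dict(batches)


# Hypothetical records as one agent instance might collect them on a node.
collected = [
    {"signal": "metrics", "body": "http_requests_total 1027"},
    {"signal": "logs", "body": "level=error msg=timeout"},
    {"signal": "metrics", "body": "process_cpu_seconds_total 12.5"},
]
print(fan_out(collected))
```

Running one agent for all four signals means a single routing table like this replaces separate per-signal collectors.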
OpenCost
Status: production
Real-time Kubernetes cost monitoring with per-namespace and per-workload breakdown.
Role: Infrastructure cost visibility and optimization recommendations
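A per-namespace breakdown is essentially a roll-up of per-workload cost records. The sketch below shows that aggregation on made-up data. The field names (`cpu_cost`, `ram_cost`) and the figures are hypothetical and do not reflect OpenCost's actual response schema.

```python
from collections import defaultdict


def cost_by_namespace(allocations: list) -> dict:
    """Roll per-workload costs up to per-namespace totals."""
    totals = defaultdict(float)
    for item in allocations:
        totals[item["namespace"]] += item["cpu_cost"] + item["ram_cost"]
    # Round for display; sort so the output order is stable.
    return {ns: round(total, 2) for ns, total in sorted(totals.items())}


# Hypothetical per-workload costs for one day, in the cluster's billing currency.
allocations = [
    {"namespace": "payments", "workload": "api", "cpu_cost": 1.20, "ram_cost": 0.45},
    {"namespace": "payments", "workload": "worker", "cpu_cost": 0.80, "ram_cost": 0.30},
    {"namespace": "search", "workload": "indexer", "cpu_cost": 2.10, "ram_cost": 1.05},
]
print(cost_by_namespace(allocations))
# {'payments': 2.75, 'search': 3.15}
```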