📊 OBSERVABILITY

See Everything

Full-spectrum observability: metrics, logs, traces, profiles, and cost monitoring in a unified stack.

THE FOUR PILLARS OF OBSERVABILITY

Metrics

Mimir

Time-series data with PromQL. Long-term retention, global view queries, Prometheus-compatible API.
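As a rough illustration of a PromQL instant query against a Prometheus-compatible endpoint, the sketch below only builds the request and does not send it. The host `mimir.example.internal`, the tenant `demo`, and the metric `http_requests_total` are assumptions for illustration. The `/prometheus` path prefix and the `X-Scope-OrgID` tenant header are Mimir defaults and may be configured differently in a given deployment.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_promql_request(base_url: str, promql: str, tenant: str) -> Request:
    """Build (but do not send) an instant-query request for a PromQL expression."""
    query_string = urlencode({"query": promql})
    url = f"{base_url.rstrip('/')}/prometheus/api/v1/query?{query_string}"
    # Multi-tenant Mimir identifies the tenant through this header.
    return Request(url, headers={"X-Scope-OrgID": tenant})


# Hypothetical host, tenant, and metric name.
req = build_promql_request(
    "http://mimir.example.internal",
    "sum by (namespace) (rate(http_requests_total[5m]))",
    "demo",
)
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would execute the query; that step is left out so the sketch runs without a live backend.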

Logs

Loki

Label-indexed log aggregation. LogQL queries, minimal storage footprint, Prometheus-style labels.
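To make the label-indexed model concrete, here is a minimal sketch that builds a LogQL range query URL. The label selector picks streams through the index, while the `|=` line filter scans the content of the matching streams. The host `loki.example.internal`, the `namespace="payments"` label, and the search term are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_logql_url(base_url, selector, needle, start_ns, end_ns, limit=100):
    """Build a query_range URL: stream selector (indexed) plus a line filter (not indexed)."""
    logql = f'{selector} |= "{needle}"'
    params = urlencode(
        {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    )
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{params}"


# Hypothetical host and labels; start and end are Unix timestamps in nanoseconds.
url = build_logql_url(
    "http://loki.example.internal/",
    '{namespace="payments"}',
    "timeout",
    0,
    1_000_000_000,
)
print(url)
```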

Traces

Tempo

Distributed tracing with Jaeger/Zipkin/OTLP support. Trace-to-log and trace-to-metric correlation.
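One way trace-to-log correlation can work is by carrying the trace ID into log lines and searching for it. The sketch below extracts the trace ID from a W3C `traceparent` header and turns it into a LogQL line filter. It assumes that services write the trace ID into their logs as plain text; the selector `{namespace="payments"}` is hypothetical.

```python
import re

# W3C Trace Context layout: version-traceid-parentid-flags, all lowercase hex.
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def trace_id_from_traceparent(header: str) -> str:
    """Return the 32-character trace ID, or raise ValueError if the header is malformed."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    trace_id = match.group(2)
    if trace_id == "0" * 32:
        raise ValueError("an all-zero trace ID is invalid")
    return trace_id


def logs_for_trace(selector: str, trace_id: str) -> str:
    """Build a LogQL query matching log lines that mention the trace ID."""
    return f'{selector} |= "{trace_id}"'


header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(logs_for_trace('{namespace="payments"}', trace_id_from_traceparent(header)))
```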

Profiles

Pyroscope

Continuous profiling for CPU, memory, and goroutines. Flame graphs and comparison across time ranges.
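The idea behind flame graphs and comparing profiles over time can be sketched with the common "folded stacks" text format (`frame;frame;frame count`). This is a simplified illustration, not how Pyroscope stores or renders profiles: it sums inclusive samples per function name and diffs two snapshots. The sample data is invented.

```python
from collections import Counter


def parse_folded(text: str) -> Counter:
    """Parse 'frame;frame;frame count' lines into per-stack sample counts."""
    stacks = Counter()
    for line in text.strip().splitlines():
        stack, _, count = line.rpartition(" ")
        stacks[stack] += int(count)
    return stacks


def inclusive_samples(stacks: Counter) -> Counter:
    """Samples attributed to each function, including time spent in its callees."""
    totals = Counter()
    for stack, count in stacks.items():
        # set() avoids double counting a recursive function within one stack.
        for frame in set(stack.split(";")):
            totals[frame] += count
    return totals


def diff_profiles(before: Counter, after: Counter) -> dict:
    """Per-function change in inclusive samples between two profiles."""
    frames = before.keys() | after.keys()
    return {f: after[f] - before[f] for f in frames if after[f] != before[f]}


# Invented sample data for two time windows.
before = inclusive_samples(parse_folded(
    "main;handle;parse 30\nmain;handle;render 50\nmain;gc 20"
))
after = inclusive_samples(parse_folded(
    "main;handle;parse 70\nmain;handle;render 50\nmain;gc 20"
))
print(diff_profiles(before, after))
```

In this invented data the only stack that grew is the one ending in `parse`, so the diff points at `parse` and its callers while `render` and `gc` drop out.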

Telemetry Pipeline

Grafana Alloy is the single agent that collects everything, replacing Promtail, Grafana Agent, and the OpenTelemetry Collector.

graph LR
  WL["Workloads<br/>DaemonSet on every node"]:::workloads
  ALLOY["Grafana Alloy<br/>StatefulSet + DaemonSet"]:::alloy
  MIMIR["Mimir<br/>Metrics"]:::storage
  LOKI["Loki<br/>Logs"]:::storage
  TEMPO["Tempo<br/>Traces"]:::storage
  PYRO["Pyroscope<br/>Profiles"]:::storage
  GRAF["Grafana<br/>SSO via Keycloak"]:::grafana

  WL -->|"scrape + forward"| ALLOY
  ALLOY -->|"metrics"| MIMIR
  ALLOY -->|"logs"| LOKI
  ALLOY -->|"traces"| TEMPO
  ALLOY -->|"profiles"| PYRO
  MIMIR --> GRAF
  LOKI --> GRAF
  TEMPO --> GRAF
  PYRO --> GRAF

  classDef workloads fill:#1e293b,stroke:#e2e8f0,color:#e2e8f0,stroke-width:2px
  classDef alloy fill:#0e3a3a,stroke:#06b6d4,color:#67e8f9,stroke-width:2px
  classDef storage fill:#2e2a0e,stroke:#facc15,color:#fde68a,stroke-width:2px
  classDef grafana fill:#14332a,stroke:#4ade80,color:#86efac,stroke-width:2px
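
The fan-out in the diagram above can be read as routing by signal type. The following is a conceptual sketch only and is not Alloy configuration or Alloy code: the record shape (`signal`, `payload`) is hypothetical, and the backend names are taken from the diagram.

```python
from collections import defaultdict

# Signal type -> storage backend, as drawn in the diagram.
ROUTES = {
    "metrics": "Mimir",
    "logs": "Loki",
    "traces": "Tempo",
    "profiles": "Pyroscope",
}


def route(batch):
    """Group telemetry payloads by the backend that stores their signal type."""
    outbound = defaultdict(list)
    for item in batch:
        backend = ROUTES.get(item["signal"])
        if backend is None:
            raise ValueError(f"unknown signal type: {item['signal']!r}")
        outbound[backend].append(item["payload"])
    return dict(outbound)


# Hypothetical batch collected from one node.
batch = [
    {"signal": "metrics", "payload": "cpu_seconds_total 12.5"},
    {"signal": "logs", "payload": "GET /health 200"},
    {"signal": "metrics", "payload": "memory_bytes 1048576"},
]
print(route(batch))
```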

All Components

Prometheus

production

Pull-based metrics collection with multi-dimensional data model and powerful PromQL query language.

Role: Primary metrics scraping for all platform services via ServiceMonitors
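
The multi-dimensional data model means every sample is a metric name plus a set of key/value labels. As a minimal sketch, the function below renders one sample in the Prometheus text exposition format that a scrape would read from a `/metrics` endpoint. The metric and label names are invented, and a real service would normally use a client library rather than formatting lines by hand.

```python
def render_sample(name: str, labels: dict, value) -> str:
    """Render one sample as a line of Prometheus text exposition format."""

    def escape(label_value: str) -> str:
        # Label values escape backslash, newline, and double quote.
        return (
            label_value.replace("\\", "\\\\")
            .replace("\n", "\\n")
            .replace('"', '\\"')
        )

    if not labels:
        return f"{name} {value}"
    label_text = ",".join(
        f'{key}="{escape(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{label_text}}} {value}"


# Invented metric: one time series per unique label combination.
line = render_sample("http_requests_total", {"method": "GET", "code": "200"}, 1027)
print(line)
```

Each distinct label combination is its own time series, which is what lets PromQL filter and aggregate by `method`, `code`, or any other label.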

Grafana

production

Visualization platform connecting metrics, logs, traces, and profiles in unified dashboards.

Role: Central observability UI with pre-built dashboards for every platform component

Mimir

production

Horizontally-scalable long-term metrics storage with Prometheus-compatible API.

Role: Indefinite metrics retention with high compression and fast queries

Loki

production

Log aggregation system inspired by Prometheus — indexes labels, not full log lines.

Role: Centralized logging with LogQL queries across all namespaces

Tempo

production

Distributed tracing backend supporting Jaeger, Zipkin, and OpenTelemetry formats.

Role: End-to-end request tracing across microservices

Pyroscope

production

Continuous profiling platform for CPU, memory, goroutine, and lock contention analysis.

Role: Runtime performance profiling with flame graph visualization

Grafana Alloy

production

Unified telemetry collector replacing Promtail, Grafana Agent, and OpenTelemetry Collector.

Role: Single agent collecting metrics, logs, traces, and profiles from all nodes

OpenCost

production

Real-time Kubernetes cost monitoring with per-namespace and per-workload breakdown.

Role: Infrastructure cost visibility and optimization recommendations
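
A per-namespace or per-workload breakdown is, at its core, a grouped sum over allocation records. The sketch below shows that aggregation on invented data. The record fields (`namespace`, `workload`, `cost_usd`) are hypothetical and do not mirror the OpenCost API response shape.

```python
from collections import defaultdict


def cost_by(records, key):
    """Sum cost per value of `key`, returned most expensive first."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["cost_usd"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Invented allocation records for one time window.
records = [
    {"namespace": "payments", "workload": "api", "cost_usd": 1.5},
    {"namespace": "payments", "workload": "worker", "cost_usd": 2.5},
    {"namespace": "search", "workload": "indexer", "cost_usd": 3.0},
]
print(cost_by(records, "namespace"))
print(cost_by(records, "workload"))
```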