CATALOGUE DES COMPOSANTS

Le Stack complet

50 composants de production répartis dans 9 catégories — tous open-source, tous éprouvés.

🏗️

Infrastructure & OS

The bare-metal foundation: immutable OS, container runtime, and cluster orchestration.

Talos Linux

production

Minimal, immutable Linux distribution designed specifically for Kubernetes. No SSH, no shell — managed entirely through API.

Rôle : Node operating system for all 6 cluster nodes (3 control plane, 2 Talos workers, 1 DGX Spark)

containerd

production

Industry-standard container runtime with low overhead and broad compatibility.

Rôle : Container runtime on all nodes

Kubernetes

production

Production-grade container orchestration system for automating deployment, scaling, and management.

Rôle : Core orchestration platform running v1.34.1

🌐

Networking & Service Mesh

eBPF-powered networking, Gateway API ingress, service mesh, and DNS resolution.

Cilium

production

eBPF-based networking, observability, and security. Replaces kube-proxy with high-performance service load balancing.

Rôle : CNI plugin, network policy enforcement, L2 ARP announcement, Gateway API implementation

Hubble

production

Network observability platform built on Cilium eBPF data plane for deep visibility into communication and behavior.

Rôle : Network flow observability, service dependency mapping

Gateway API

production

Next-generation Kubernetes ingress API with expressive routing, TLS termination, and traffic splitting.

Rôle : Single shared gateway handling all HTTP/HTTPS traffic at 192.168.0.200

APISIX

production

High-performance, cloud-native API gateway with rich traffic management features.

Rôle : Advanced API gateway for complex routing scenarios

CoreDNS

production

Flexible, extensible DNS server for Kubernetes service discovery.

Rôle : Cluster DNS with wildcard resolution for *.apps.edgeprime.io

Linkerd

deployed

Ultralight service mesh providing mTLS, observability, and reliability features.

Rôle : Service mesh for zero-trust networking with automatic mTLS

🛡️

Security & Identity

Zero-trust security: SSO, secrets management, policy enforcement, runtime detection, and certificate automation.

Keycloak

production

Enterprise identity and access management with OIDC, SAML, social login, and LDAP integration.

Rôle : Centralized SSO for all platform services — Vault, Harbor, Grafana, ArgoCD, OneDev, AFFiNE

HashiCorp Vault

production

Secrets management, encryption as a service, and privileged access management.

Rôle : HA deployment (3-replica Raft cluster) storing all platform secrets, DNS credentials, TLS certificates

External Secrets Operator

production

Kubernetes operator that synchronizes secrets from external stores into Kubernetes secrets.

Rôle : Bridges Vault ↔ Kubernetes: syncs secrets to pods, pushes certificates back to Vault

cert-manager

production

Automatic TLS certificate management with Let's Encrypt ACME protocol support.

Rôle : Automated certificate issuance via DNS-01 challenges with Cloudflare

Kyverno

production

Kubernetes-native policy engine for validation, mutation, and generation of resources.

Rôle : Enforces security policies: label requirements, container restrictions, cross-tenant isolation

Falco

production

Runtime security monitoring using eBPF probes to detect anomalous container behavior.

Rôle : Real-time threat detection: shell spawning, privilege escalation, sensitive file access

Kubescape

deployed

Kubernetes security platform for continuous scanning against NSA, MITRE, and CIS benchmarks.

Rôle : Compliance scanning and hardening recommendations

Open AppSec

deployed

ML-based web application firewall and API security.

Rôle : WAF protection for exposed services

📊

Observability

Full-spectrum observability: metrics, logs, traces, profiles, and cost monitoring in a unified stack.

Prometheus

production

Pull-based metrics collection with multi-dimensional data model and powerful PromQL query language.

Rôle : Primary metrics scraping for all platform services via ServiceMonitors

Grafana

production

Visualization platform connecting metrics, logs, traces, and profiles in unified dashboards.

Rôle : Central observability UI with pre-built dashboards for every platform component

Mimir

production

Horizontally-scalable long-term metrics storage with Prometheus-compatible API.

Rôle : Indefinite metrics retention with high compression and fast queries

Loki

production

Log aggregation system inspired by Prometheus — indexes labels, not full log lines.

Rôle : Centralized logging with LogQL queries across all namespaces

Tempo

production

Distributed tracing backend supporting Jaeger, Zipkin, and OpenTelemetry formats.

Rôle : End-to-end request tracing across microservices

Pyroscope

production

Continuous profiling platform for CPU, memory, goroutine, and lock contention analysis.

Rôle : Runtime performance profiling with flame graph visualization

Grafana Alloy

production

Unified telemetry collector replacing Promtail, Grafana Agent, and OpenTelemetry Collector.

Rôle : Single agent collecting metrics, logs, traces, and profiles from all nodes

OpenCost

production

Real-time Kubernetes cost monitoring with per-namespace and per-workload breakdown.

Rôle : Infrastructure cost visibility and optimization recommendations

🔄

GitOps & CI/CD

Git-driven deployment pipelines with progressive delivery and infrastructure-as-code.

Argo CD

production

GitOps continuous delivery tool that reconciles desired state from Git with cluster state.

Rôle : Core GitOps engine with App-of-Apps pattern managing 40+ applications

Terraform

production

Infrastructure as Code for provisioning and managing cloud-agnostic resources.

Rôle : Manages Vault secrets, Keycloak OIDC clients, Grafana dashboards, Harbor config

OneDev

production

Self-hosted Git repository manager with integrated CI/CD pipelines and code review.

Rôle : Private Git hosting with container-based CI runners

Kargo

planned

Progressive delivery engine adding multi-stage promotion workflows on top of Argo CD.

Rôle : Environment promotion pipelines: dev → staging → production

💾

Storage & Registry

Distributed block storage, S3-compatible object storage, and secure container registry.

Longhorn

production

Cloud-native distributed block storage with 3-way replication, snapshots, and backups.

Rôle : Primary storage class for all stateful workloads with automatic replication

Harbor

production

Enterprise container registry with vulnerability scanning, image signing, and RBAC.

Rôle : Private registry with Trivy scanning, OIDC auth, and replication policies

Garage

production

S3-compatible distributed object storage designed for self-hosted deployments.

Rôle : Cost-effective object storage for backups, logs, and unstructured data

Velero

deployed

Kubernetes backup and disaster recovery tool with snapshot and restore capabilities.

Rôle : Cluster-wide backup to S3 with scheduled policies

🗄️

Databases & Messaging

Managed PostgreSQL, Redis-compatible cache, distributed KV store, Kafka streaming, and multi-model databases.

CloudNativePG

production

Kubernetes operator for PostgreSQL with HA clustering, automated failover, and point-in-time recovery.

Rôle : Manages PostgreSQL clusters for 5+ applications (Keycloak, Backstage, Matomo, etc.)

Dragonfly

production

Redis-compatible in-memory data store with superior performance through modern algorithms.

Rôle : High-performance caching layer replacing Redis

Strimzi (Apache Kafka)

production

Kubernetes operator for Apache Kafka with native CRD-based management.

Rôle : Event streaming platform for asynchronous communication

TiKV

production

Distributed transactional key-value store with ACID transactions and Raft consensus.

Rôle : Backend storage engine for SurrealDB with strong consistency

SurrealDB

production

Multi-model database supporting document, graph, and key-value data models.

Rôle : Flexible database for applications needing graph + document queries

Qdrant

production

Vector database for similarity search, powering semantic search and AI applications.

Rôle : Vector embeddings store for AI/ML workloads

🚀

Application Platform

Developer portal, BaaS, workflow automation, analytics, and self-service tools.

Backstage

production

Open platform for building developer portals with service catalog and self-service templates.

Rôle : Self-service portal for certificate management and tenant onboarding

Supabase

production

Open-source Firebase alternative: PostgreSQL, auth, real-time, storage, and edge functions.

Rôle : Backend-as-a-Service for rapid application development

n8n

production

Self-hosted workflow automation with 400+ integrations and visual builder.

Rôle : Event-driven automation for platform operations and notifications

Matomo

production

Privacy-focused web analytics platform — self-hosted Google Analytics alternative.

Rôle : Visitor tracking without third-party data sharing

Homepage

production

Application dashboard providing a unified start page for all platform services.

Rôle : Central dashboard linking all 20+ platform services

AFFiNE

production

Privacy-focused knowledge management workspace — alternative to Notion.

Rôle : Team documentation and knowledge management

KubeVirt

deployed

Run virtual machines alongside containers on the same Kubernetes infrastructure.

Rôle : VM workloads for legacy applications that can't be containerized

🤖

AI & Machine Learning

Edge AI inference on NVIDIA DGX Spark with Blackwell GPU — LLM model serving via AIBrix and vLLM on bare-metal Kubernetes.

NVIDIA DGX Spark

production

Desktop AI supercomputer powered by Grace Blackwell GB10 Superchip — 1 PFLOP FP4, 128GB unified LPDDR5x memory, ARM64 architecture.

Rôle : Dedicated GPU worker node (gx10) with Blackwell GPU, CUDA 13.0, and ConnectX-7 networking

AIBrix

production

Open-source Kubernetes-native AI inference platform with prefix-cache-aware routing, LLM-specific autoscaling, and distributed KV cache.

Rôle : LLM model serving control plane — 3-wave ArgoCD deployment with Envoy Gateway routing

vLLM

production

High-throughput LLM inference engine with PagedAttention, continuous batching, and OpenAI-compatible API.

Rôle : Inference runtime serving Qwen, Llama, and Mistral models via NVIDIA NGC images on ARM64

NVIDIA GPU Operator

production

Kubernetes operator automating GPU driver, container toolkit, device plugin, and DCGM exporter lifecycle.

Rôle : GPU resource management with driver-less mode for DGX OS — exposes nvidia.com/gpu to scheduler