🤖 AI & MACHINE LEARNING

الذكاء الاصطناعي على الحافة

شغّل نماذج اللغة الكبيرة على أجهزتك الخاصة. NVIDIA DGX Spark مع GPU Blackwell ومنصة استدلال AIBrix وخدمة vLLM — متكاملة بالكامل في كلستر Kubernetes.

مواصفات NVIDIA DGX SPARK

الشريحة الفائقة

Grace Blackwell GB10

أداء الذكاء الاصطناعي

1 PFLOP FP4 · 1000 TOPS

هندسة المعالج

معالج ARM64 Grace

الذاكرة الموحدة

128 جيجابايت LPDDR5x · 273 جيجابايت/ث

الشبكة

ConnectX-7 · 100/200 GbE

CUDA

13.0 · DGX OS (Ubuntu 24.04)

هندسة الاستدلال

هندسة بوابة مزدوجة مع Cilium لإنهاء TLS وEnvoy Gateway للتوجيه إلى حاويات vLLM على عُقدة DGX Spark.

graph TD
  CLIENT["Client / App"]:::client
  DNS["CoreDNS
llm.exitthecloud.eu"]:::dns
  CGW["Cilium Gateway
192.168.0.200
TLS termination"]:::cgw
  EGW["Envoy Gateway
192.168.0.201
Model routing"]:::egw
  VLLM["vLLM Pod
DGX Spark (gx10)
Blackwell GPU"]:::vllm
  HF["HuggingFace Hub
Model weights"]:::hf

  CLIENT --> DNS --> CGW --> EGW --> VLLM
  HF -.->|"download"| VLLM

  classDef client fill:#1e293b,stroke:#e2e8f0,color:#e2e8f0,stroke-width:2px
  classDef dns fill:#0e3a3a,stroke:#06b6d4,color:#67e8f9,stroke-width:2px
  classDef cgw fill:#14332a,stroke:#4ade80,color:#86efac,stroke-width:2px
  classDef egw fill:#2e2a0e,stroke:#facc15,color:#fde68a,stroke-width:2px
  classDef vllm fill:#1a2e1a,stroke:#10b981,color:#6ee7b7,stroke-width:2px
  classDef hf fill:#2e1a47,stroke:#a78bfa,color:#c4b5fd,stroke-width:2px

النماذج المدعومة

قدّم أي نموذج من HuggingFace يتسع في 128 جيجابايت من الذاكرة الموحدة. وحدة GPU واحدة، نموذج واحد في كل مرة.

🟢

Qwen 2.5 (1.5B / 7B / 32B)

Alibaba Cloud

🟣

Llama 3.1 (8B)

Meta AI

🔵

Mistral (7B)

Mistral AI

التكامل مع المنصة

مكدس الذكاء الاصطناعي ليس إضافة جانبية — بل منسوج بالكامل في منصة Kubernetes.

🔄

GitOps

مزامنة ArgoCD على 3 موجات: CRDs → متحكمات → أحمال عمل. تصريحي بالكامل.

🔐

الأسرار

رمز HuggingFace من Vault عبر ESO. صفر بيانات اعتماد في Git.

📊

المراقبة

مصدّر DCGM + ServiceMonitor. مقاييس GPU في لوحات Grafana.

🌐

الشبكات

Cilium Gateway → Envoy Gateway → vLLM. بوابة مزدوجة مع TLS.

نقطة الوصول API

API متوافق مع OpenAI متاح على:

https://llm.exitthecloud.eu/v1/chat/completions

جميع المكونات

NVIDIA DGX Spark

production

Desktop AI supercomputer powered by Grace Blackwell GB10 Superchip — 1 PFLOP FP4, 128GB unified LPDDR5x memory, ARM64 architecture.

الدور: Dedicated GPU worker node (gx10) with Blackwell GPU, CUDA 13.0, and ConnectX-7 networking

AIBrix

production

Open-source Kubernetes-native AI inference platform with prefix-cache-aware routing, LLM-specific autoscaling, and distributed KV cache.

الدور: LLM model serving control plane — 3-wave ArgoCD deployment with Envoy Gateway routing

vLLM

production

High-throughput LLM inference engine with PagedAttention, continuous batching, and OpenAI-compatible API.

الدور: Inference runtime serving Qwen, Llama, and Mistral models via NVIDIA NGC images on ARM64

NVIDIA GPU Operator

production

Kubernetes operator automating GPU driver, container toolkit, device plugin, and DCGM exporter lifecycle.

الدور: GPU resource management with driver-less mode for DGX OS — exposes nvidia.com/gpu to scheduler

Hindsight

production

Temporal semantic memory system for AI agents — retain, recall, and reflect operations backed by pgvector similarity search.

الدور: Agent memory layer with GPU-accelerated local embeddings and reranking, powered by minimax LLM

← العودة إلى المكونات الكاملة